Re: record identical operator

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-24 13:46:12
Message-ID: 1380030372.48377.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> We need justification to add operators, imv, especially ones that
> expose our internal binary representation of data.

I believe I have done so.

> I worry that adding these will come back to bite us later

How?

> and that we're making promises we won't be able to keep.

The promise that a concurrent refresh will produce the same set of
rows as a non-concurrent one?

> If these inconsistencies in what happens with these data types
> are an issue then REFRESH can be handled as a wholesale
> DELETE/INSERT.

I have real-life experience with handling faux materialized views
by creating tables (at Wisconsin Courts).  We initially did it the
way you describe, but the run time was excessive (in Milwaukee
County the overnight run did not always complete before the start
of business hours the next day).  Switching to logic similar to
what I've implemented here, it completed an order of magnitude
faster, and generated a small fraction of the WAL and logical
replication data that the technique you describe did.  Performing
preliminary steps in temporary tables to minimize the changes
needed to permanent tables seems beneficial enough on the face of
it that I think the burden of proof should be on someone arguing
that deleting all rows and re-inserting (in the same transaction)
is, in general, a superior approach to finding the differences and
applying only those/

> Trying to do this incremental-but-not-really maintenance where
> the whole query is run but we try to skimp on what's actually
> getting updated in the matview is a premature optimization, imv,
> and one which may be less performant and more painful, with more
> gotchas and challenges for our users, to deal with in the long
> run.

I have the evidence of a ten-fold performance improvement plus
minimized WAL and replication work on my side.  What evidence do
you have to back your assertions?  (Don't forget to work in bloat
and vacuum truncation issues to the costs of your proposal.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-09-24 13:49:19 Re: Improving avg performance for numeric
Previous Message Stephen Frost 2013-09-24 13:31:53 Re: record identical operator