Re: record identical operator

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Steve Singer <steve(at)ssinger(dot)info>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-23 20:32:41
Message-ID: 20130923203241.GK2706@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> I don't know why there shouldn't be a question about that.

Because anything else would be an internal optimization which must be
proven to be correct, imv, also..

> Suppose
> that the MAX() aggregate is in use. If 4 or 5 or 6 is updated so as
> to change the maximum of the three, then matrow 1 needs updating. But
> if the maximum remains the same, then it doesn't. The right way to
> decide whether it needs updating is to re-aggregate those three rows
> and then see whether you get the same (read: binary identical) out of
> the aggregate that you got the last time you ran it.

You could argue the same about PG doing that for any row update- check
if anything is actually *binary* different and, if not, then don't
update it. Of course, there's questions about if that's "right" and
what about triggers, etc..

> Also, suppose the same statement updates row 4, row 5, and row 6.
> Instead of updating the materialized view three times, you do it just
> once at end-of-statmement, like an AFTER STATEMENT trigger that
> somehow knows which rows were updated.

Sorry if I wasn't clear, but that's exactly what I was trying to
describe regarding how it should work. I was NOT intending to suggest
that each update immediately update the matview. It's just that we
keep *track* of what was updated and then, at some convenient point,
actually run the process to update the matview rows (maybe in an AFTER
statement, maybe 5 minutes from now).

> In this case even something
> like AVG() could produce the same result as it did before the update.

Sure it could.

> And you'd surely want to avoid updating the matview if the new value
> was the same as what was already stored in the matview (but not if it
> was equal but not the same).

I don't see why updating a row that was built with AVG() should be
avoided over a row that was built with MAX(), unless you're suggesting
there's a different set of rows involved in the two or there's some
additional optimization around figuring out if these particular changes
*should* actually change the result. That's an analysis which could
still happen and wouldn't need to rely on any binary equality test, and
it'd need to have a whole lot more smarts than this approach anyway or
you'll still end up running a query against all of the rows involved in
the AVG() to then only decide at the last moment to not update the row,
which doesn't strike me as a great optimization. Perhaps that's why we
didn't implement it for PG itself?

Thanks,

Stephen

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2013-09-23 20:38:40 Re: record identical operator
Previous Message Tomas Vondra 2013-09-23 20:15:02 Re: Improving avg performance for numeric