Re: record identical operator

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Steve Singer <steve(at)ssinger(dot)info>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-23 19:55:58
Message-ID: 1379966158.9393.YahooMailNeo@web162906.mail.bf1.yahoo.com
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> I'm trying to explain that using that methodology is what landed
> us in this situation to begin with.

I'm trying to figure out what situation you think we're in.
Seriously, if you could apply the patch and show one example that
demonstrates what you see to be a problem, that would be great.

>> I think it is fairly obvious that REFRESH should REgenerate a FRESH
>> copy of the data, versus incremental maintenance -- which attempts
>> to keep the matview up-to-date without regenerating the full set of
>> data.
>
> Having 'REFRESH' regenerate a fresh copy of the data makes sense to me,
> and is what we have now, no?  The only issue there is that it takes out
> a big lock, which I appreciate that you're trying to get rid of.
>
>>   Whenever there is logical replication (and materialized
>> views are, conceptually, one form of that -- within the database) I
>> feel it is important to be able to correct any possible "drift".
>> With matviews, I see the way to do that as the REFRESH command, and
>> I feel that it is important to be able to do that in a way that can
>> run concurrently with readers of the matview -- without blocking
>> them or being blocked by them.
>
> Of course.
>
>> Discussion of incremental maintenance really belongs on a different
>> thread.
>
> I'm really getting tired of everyone saying "this is the only way to do
> it" (or perhaps "well, this is already committed, therefore it must be
> what we're gonna do")

What I'm saying is that REFRESH and incremental maintenance are two
different things, and conflating them just confuses everything.

> when a) we're already planning to rip this out and change it, or
> so I thought,

The entire change to matview-specific code is to use a different
operator in two places.  Outside of that, it consists of adding the
12th non-default opclass to core.
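To make the "different operator" point concrete, here is a minimal Python sketch loosely modeled on a diff-based concurrent refresh: regenerate the query result, compare it row by row with the existing matview contents (keyed by a unique key, as a concurrent refresh needs a unique index), and delete/insert only what differs. `same_by_equality` stands in for the default record `=` and `same_by_identity` for a record-identical comparison. All names are hypothetical, and comparing `repr()` strings is only an analogy for comparing stored images, not what PostgreSQL does internally.

```python
from decimal import Decimal


def same_by_equality(a, b):
    # Analogy for the default record "=": numerics 1.0 and 1.00 compare equal.
    return a == b


def same_by_identity(a, b):
    # Analogy for a "record identical" check: values must have the same type
    # and the same representation, so 1.0 and 1.00 are treated as different.
    return len(a) == len(b) and all(
        type(x) is type(y) and repr(x) == repr(y) for x, y in zip(a, b))


def diff_refresh(current, fresh, same):
    """Return (to_delete, to_insert) keys that bring `current` in line with `fresh`.

    Both arguments map a unique key to a row tuple. Rows judged `same`
    are left untouched; everything else is deleted and/or (re)inserted.
    """
    to_delete = [k for k, row in current.items()
                 if k not in fresh or not same(row, fresh[k])]
    to_insert = [k for k, row in fresh.items()
                 if k not in current or not same(current[k], row)]
    return to_delete, to_insert


def apply_refresh(current, fresh, same):
    # Apply the computed diff to a copy of the current matview contents.
    result = dict(current)
    to_delete, to_insert = diff_refresh(current, fresh, same)
    for k in to_delete:
        del result[k]
    for k in to_insert:
        result[k] = fresh[k]
    return result


current = {1: (1, Decimal("1.0"))}   # what the matview holds now
fresh = {1: (1, Decimal("1.00"))}    # what the defining query returns now

print(apply_refresh(current, fresh, same_by_equality)[1])  # (1, Decimal('1.0'))
print(apply_refresh(current, fresh, same_by_identity)[1])  # (1, Decimal('1.00'))
```

With the equality test the refreshed "matview" still shows `1.0` even though the query now returns `1.00`; with the identity test it matches the query result again.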

> and b) we're trying to make promises we can't keep with this
> approach.

I don't see any such.  If you do, please describe them; or better
yet, give an example.

>> Since I have gone to the trouble to read a lot of papers
>> on the topic, and select one that I think is a good basis for our
>> implementation, I hope everyone will frame discussion in terms of
>> either:
>>   -  how best to implement the techniques from that paper, or
>>   -  why some other paper presents a better technique.
>
> My recollection from the hackers meeting is that I'm trying to simply
> paraphrase what you had said was in the paper wrt keeping track of what
> rows are changed underneath and using that as a basis to implement the
> changes necessary in the view.  Does the paper you're referring to
> describe rerunning the whole query and then trying to figure out what's
> been changed..?  That's really what I'm having trouble understanding
> why anyone would want to implement.  I'll try and find time to hunt down
> the threads and papers on it, but I really could have sworn this was
> gone over at the hacker meeting, and it made a lot of sense to me then.

The only thing the paper says on the topic is that any incremental
maintenance scheme is a heuristic.  There will always be cases when
it would be faster and less resource-intensive to regenerate the
data from the defining query.  There is at least an implication
that a good implementation will try to identify when it is in such
a situation, and ignore the whole incremental maintenance approach
in favor of what we are doing with REFRESH.  The example they give
is an unqualified DELETE of every row in a table that is part of an
inner join generating the result; in that case it would almost be
faster to generate the (empty) result set than to run their
algorithm to determine that all the rows need to be deleted.
One reason for having a REFRESH that re-runs the query like this is
that it *is* a recommended "escape hatch" when a mass operation
makes the incremental calculations too expensive.
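A toy illustration of the "escape hatch" reasoning, in Python. The threshold rule in `choose_strategy` is a hypothetical stand-in for whatever heuristic an implementation might use (nothing here comes from the paper beyond the idea that some heuristic is needed), and `inner_join` is a plain nested-loop join used only to show that once one side of an inner join is empty, regenerating the result is trivial.

```python
def choose_strategy(changed_rows: int, base_rows: int, threshold: float = 0.5) -> str:
    """Hypothetical heuristic: full refresh when a change touches a large share of a base table."""
    if base_rows == 0:
        # Nothing to maintain incrementally against; just regenerate.
        return "refresh"
    if changed_rows / base_rows >= threshold:
        return "refresh"
    return "incremental"


def inner_join(left, right):
    """Nested-loop inner join of (key, value) pairs on key."""
    return [(k1, v1, v2) for k1, v1 in left for k2, v2 in right if k1 == k2]


orders = [(1, "a"), (2, "b")]
customers = []  # state after an unqualified DELETE of every row

print(choose_strategy(changed_rows=2, base_rows=2))  # refresh
print(inner_join(orders, customers))                 # []
```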

>> I really didn't expect to have to burn so much time
>> and energy arguing over whether a REFRESH should leave the matview
>> accurately containing the results of the matview's query.
>
> I appreciate you bringing me up to speed on where things actually are
> here. Again, sorry for not realizing the direction that this was going
> in earlier; it really didn't even occur to me that it would have gone
> down this road.  I, also, didn't expect to spend so much time on this.
>
>>>> We can argue about how it should be named
>
> Really, I'm back to trying to figure out why we want to go down this
> road at all.
>
>>>> and whether it should be documented
>>
>> I thought we had a consensus to document both the existing record
>> comparison operators and these new ones, and I'm fine with that.
>
> If it gets added, it certainly should be documented,

That seems to be the consensus.  In fact, I would have submitted
that with this patch if there had been any documentation for the
default record comparison operators.  It seemed like that might
have been omitted on purpose, and it seemed weird to add
documentation for a non-default operator for records when (a) we
didn't document the default operator for records and (b) we don't
document many of the other non-default operators already in core.

> and heavily caveated.

I'm not sure what caveats would be needed.  It seems to me that a
clear description of what it does would suffice.  Like all the
other non-default opclasses in core, it will be non-default because
it is less frequently useful.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
