Re: record identical operator

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-13 21:59:00
Message-ID: 20130913215900.GB7437@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-09-13 14:36:27 -0700, Kevin Grittner wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote:
> >> The new operator is logically similar to IS NOT DISTINCT FROM for a
> >> record, although its implementation is very different.  For one
> >> thing, it doesn't replace the operation with column level operators
> >> in the parser.  For another thing, it doesn't look up operators for
> >> each type, so the "identical" operator does not need to be
> >> implemented for each type to use it as shown above.  It compares
> >> values byte-for-byte, after detoasting.  The test for identical
> >> records can avoid the detoasting altogether for any values with
> >> different lengths, and it stops when it finds the first column with
> >> a difference.
> >
> > In the general case, that operator sounds dangerous to me. We don't
> > guarantee that a Datum containing the same data always has the same
> > binary representation. E.g. an array may or may not have a null
> > bitmap, depending on how it was created.
> >
> > I am not actually sure whether that's a problem for your use case, but I
> > get headaches when we try circumventing the type abstraction that way.
> >
> > Yes, we do such tricks in other places already, but afaik in all those
> > places erroneously believing two Datums are distinct is not an error, just
> > a missed optimization. Allowing a general operator with such a murky
> > definition to creep into something SQL exposed... Hm. Not sure.
>
> Well, the only two alternatives I could see were to allow
> user-visible differences not to be carried to the matview if the
> old and new values were considered "equal", or to implement an
> "identical" operator or function in every type that was to be
> allowed in a matview.  Given those options, what's in this patch
> seemed to me to be the least evil.
>
> It might be worth noting that this scheme doesn't have a problem
> with correctness if there are multiple equal values which are not
> identical, as long as any two identical values are equal.  If the
> query which generates contents for a matview generates
> non-identical but equal values from one run to the next without any
> particular reason, that might cause performance problems.

I am not actually that concerned with matviews using this, you're quite
capable of analyzing the dangers. What I am wary of is exposing an
operator that's basically broken from the get-go to SQL.
Now, the obvious issue there is that matviews use SQL to refresh :(
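
To make the concern concrete, here is a toy sketch in Python. It is not
PostgreSQL code: the array encoding, `encode_array`, `decode_array`,
`record_equal` and `record_identical` are all hypothetical names invented
for illustration. It only assumes what is described above: the same
logical array may be stored with or without a null bitmap, and the
"identical" test compares lengths first, then bytes, stopping at the first
column that differs.

```python
from typing import List, Optional, Sequence


def encode_array(values: Sequence[Optional[int]], force_bitmap: bool = False) -> bytes:
    """Toy encoding (an assumption, not the real on-disk format):
    flag byte, element count, optional null bitmap, then the non-null values.
    Values must fit in one byte (0..255)."""
    use_bitmap = force_bitmap or any(v is None for v in values)
    out = bytearray([1 if use_bitmap else 0, len(values)])
    if use_bitmap:
        # One byte per element for simplicity: 1 = present, 0 = NULL.
        out.extend(0 if v is None else 1 for v in values)
    out.extend(v for v in values if v is not None)
    return bytes(out)


def decode_array(raw: bytes) -> List[Optional[int]]:
    """Recover the logical value, whichever representation was used."""
    use_bitmap, count = raw[0], raw[1]
    if not use_bitmap:
        return list(raw[2:2 + count])
    bitmap = raw[2:2 + count]
    data = iter(raw[2 + count:])
    return [next(data) if present else None for present in bitmap]


def record_equal(a: Sequence[bytes], b: Sequence[bytes]) -> bool:
    """Type-aware comparison: decode each column, then compare values."""
    return len(a) == len(b) and all(
        decode_array(x) == decode_array(y) for x, y in zip(a, b)
    )


def record_identical(a: Sequence[bytes], b: Sequence[bytes]) -> bool:
    """Byte-wise comparison: length check first, stop at first difference."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if len(x) != len(y):  # different lengths can never be identical
            return False
        if x != y:
            return False
    return True


# The same logical array, built two different ways.
plain = encode_array([1, 2, 3])
with_bitmap = encode_array([1, 2, 3], force_bitmap=True)
```

With these definitions, `record_equal([plain], [with_bitmap])` is true
while `record_identical([plain], [with_bitmap])` is false. That is harmless
when a false "not identical" only costs an extra update, but it is
surprising behaviour for a general-purpose operator visible from SQL.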

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
