Re: record identical operator

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-13 21:36:27
Message-ID: 1379108187.76931.YahooMailNeo@web162904.mail.bf1.yahoo.com
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote:
>> The new operator is logically similar to IS NOT DISTINCT FROM for a
>> record, although its implementation is very different.  For one
>> thing, it doesn't replace the operation with column level operators
>> in the parser.  For another thing, it doesn't look up operators for
>> each type, so the "identical" operator does not need to be
>> implemented for each type to use it as shown above.  It compares
>> values byte-for-byte, after detoasting.  The test for identical
>> records can avoid the detoasting altogether for any values with
>> different lengths, and it stops when it finds the first column with
>> a difference.
>
> In the general case, that operator sounds dangerous to me. We don't
> guarantee that a Datum containing the same data always has the same
> binary representation. E.g. an array may or may not have a null
> bitmap, depending on how it was created.
>
> I am not actually sure whether that's a problem for your use case, but I
> get headaches when we try circumventing the type abstraction that way.
>
> Yes, we do such tricks in other places already, but afaik in all those
> places erroneously believing two Datums are distinct is not an error, just
> a missed optimization. Allowing a general operator with such a murky
> definition to creep into something SQL exposed... Hm. Not sure.

Well, the only two alternatives I could see were to allow
user-visible differences not to be carried to the matview if the
old and new values were considered "equal", or to implement an
"identical" operator or function in every type that was to be
allowed in a matview.  Given those options, what's in this patch
seemed to me to be the least evil.

It might be worth noting that this scheme doesn't have a problem
with correctness if there are multiple equal values which are not
identical, as long as any two identical values are equal.  If the
query which generates contents for a matview generates
non-identical but equal values from one run to the next without any
particular reason, that might cause performance problems.
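
A rough sketch of that property, in Python rather than the patch's C.
Every name here (encode, records_identical, records_equal, refresh) is
made up for illustration, and repr() merely stands in for a value's
stored bytes; none of this is the actual implementation.  Decimal("1.0")
and Decimal("1.00") play the part of two values which are equal but not
identical.

```python
from decimal import Decimal


def encode(value):
    # Hypothetical stand-in for the stored bytes of a value.
    return repr(value).encode("utf-8")


def records_identical(old, new):
    # Byte-for-byte comparison, column by column, stopping at the
    # first column that differs.
    if len(old) != len(new):
        return False
    for a, b in zip(old, new):
        ea, eb = encode(a), encode(b)
        if len(ea) != len(eb):
            return False  # different lengths can never be identical
        if ea != eb:
            return False
    return True


def records_equal(old, new):
    # Ordinary type-level equality for each column.
    return len(old) == len(new) and all(a == b for a, b in zip(old, new))


def refresh(matview, fresh):
    # Rewrite every row that is not identical to its fresh version and
    # drop rows that no longer exist; return the number of rows written.
    written = 0
    for key, row in fresh.items():
        if key not in matview or not records_identical(matview[key], row):
            matview[key] = row
            written += 1
    for key in list(matview):
        if key not in fresh:
            del matview[key]
    return written


old_row = (1, Decimal("1.0"))
new_row = (1, Decimal("1.00"))
matview = {1: old_row}

first = refresh(matview, {1: new_row})   # equal but not identical: rewritten
second = refresh(matview, {1: new_row})  # identical now: nothing to do
```

The first refresh carries the visible change (1.0 becoming 1.00) into
the stored copy even though the two rows compare as equal; the second
writes nothing.  A query that flipped between the two forms on every run
would still give correct results, but would rewrite the row each time,
which is the performance concern mentioned above.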

To mangle Orwell: "Among pairs of equal values, some pairs are more
equal than others."

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
