Re: record identical operator

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-16 14:28:23
Message-ID: 20130916142823.GD5249@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-09-15 19:49:26 -0400, Noah Misch wrote:
> On Sat, Sep 14, 2013 at 08:58:32PM +0200, Andres Freund wrote:
> > On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote:
> > > Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > > > But both arrays don't have the same binary representation since
> > > > the former has a null bitmap, the latter not. So, if you had a
> > > > composite type like (int4[]) and would compare that without
> > > > invoking operators you'd get a false result in some cases
> > > > because of the null bitmaps.
> > >
> > > Not for the = operator. The new "identical" operator would find
> > > them to not be identical, though.
> >
> > Yep. And I think that's a problem if exposed to SQL. People won't
> > understand the hazards and end up using it because it's faster or
> > some such.
>
> The important question is whether to document the new operator and/or provide
> it under a guessable name. If we give the operator a weird name, don't
> document it, and put an "internal use only" comment in the catalogs, that is
> essentially as good as hiding this feature at the SQL level.

That doesn't match my experience.

> Type-specific identity operators seem like overkill, anyway. If we find that
> meaningless variations in a particular data type are causing too many false
> non-matches for the generic identity operator, the answer is to make the
> functions generating datums of that type settle on a canonical form. That
> would be the solution for your example involving array null bitmaps.

I think that's pretty much unrealistic. I am pretty sure that if either
of us starts looking, we will find about a dozen such cases and miss
another dozen. Not to mention external code, which is damn likely to
contain such cases.
And I think that, for efficiency reasons, such normalization will often
be expensive (consider PostGIS, where Datums, AFAIK, can exist either
with or without an internal bounding box).
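To make the null-bitmap hazard concrete, here is a minimal C sketch. It assumes a hypothetical, heavily simplified serialized array format (a count byte, a has-bitmap flag byte, an optional one-byte null bitmap, then the int32 elements); the real PostgreSQL ArrayType layout is different. Two encodings of the same array with no NULL elements compare equal element by element but differ byte for byte, which is the kind of false non-match a generic binary identity test would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified array encoding (NOT PostgreSQL's real ArrayType):
 *   byte 0: number of elements (at most 8 here)
 *   byte 1: 1 if a null bitmap follows, else 0
 *   [byte 2: null bitmap, bit i set => element i is not NULL]
 *   then the int32 elements, copied verbatim.
 * This sketch assumes no element is actually NULL. */
static size_t encode_int_array(const int32_t *items, int n, bool with_bitmap,
                               uint8_t *buf)
{
    size_t len = 0;

    assert(n >= 0 && n <= 8);
    buf[len++] = (uint8_t) n;
    buf[len++] = with_bitmap ? 1 : 0;
    if (with_bitmap)
        buf[len++] = (uint8_t) ((1u << n) - 1);   /* every element present */
    memcpy(buf + len, items, (size_t) n * sizeof(int32_t));
    return len + (size_t) n * sizeof(int32_t);
}

/* Binary identity: same length and same bytes. */
static bool binary_identical(const uint8_t *a, size_t alen,
                             const uint8_t *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* Semantic equality: same element count and same element values,
 * ignoring whether a (redundant) null bitmap was stored. */
static bool semantic_equal(const uint8_t *a, const uint8_t *b)
{
    int n = a[0];
    const uint8_t *pa = a + 2 + (a[1] ? 1 : 0);
    const uint8_t *pb = b + 2 + (b[1] ? 1 : 0);

    if (a[0] != b[0])
        return false;
    for (int i = 0; i < n; i++)
    {
        int32_t x, y;

        /* memcpy avoids unaligned int32 reads from the byte buffer */
        memcpy(&x, pa + i * sizeof(int32_t), sizeof x);
        memcpy(&y, pb + i * sizeof(int32_t), sizeof y);
        if (x != y)
            return false;
    }
    return true;
}
```

Encoding `{1, 2, 3}` once without and once with the bitmap yields buffers of 14 and 15 bytes: `semantic_equal` says they match, `binary_identical` says they do not. Normalizing would require every producer of the type to agree on always, or never, emitting the bitmap.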

I think it's far more realistic to implement an identity operator that
falls back to a type-specific operator iff equals has "strange"
properties.
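One possible reading of this proposal, as a minimal C sketch: the identity test defers to the type's ordinary equality, and only types whose `=` has "strange" properties register a stricter type-specific identity function. The case-insensitive text type here is a hypothetical stand-in in the spirit of citext, and `TypeInfo`, `is_identical` and the other names are hypothetical; they do not correspond to PostgreSQL's catalogs or APIs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-type descriptor; not a PostgreSQL catalog structure. */
typedef bool (*CompareFn)(const void *a, const void *b);

typedef struct
{
    const char *name;
    CompareFn   equal;      /* the type's ordinary "=" */
    CompareFn   identical;  /* set only when "=" has "strange" properties */
} TypeInfo;

static bool int_equal(const void *a, const void *b)
{
    return *(const int *) a == *(const int *) b;
}

/* citext-like equality: case-insensitive, so "abc" = "ABC". */
static bool ci_text_equal(const void *a, const void *b)
{
    const unsigned char *x = a, *y = b;

    for (; *x && *y; x++, y++)
        if (tolower(*x) != tolower(*y))
            return false;
    return *x == *y;            /* true only if both strings ended together */
}

/* Identity for the case-insensitive type must distinguish case. */
static bool ci_text_identical(const void *a, const void *b)
{
    return strcmp(a, b) == 0;
}

static const TypeInfo int_type = {"int", int_equal, NULL};
static const TypeInfo ci_text_type = {"ci_text", ci_text_equal, ci_text_identical};

/* Generic identity: trust "=" unless the type supplies its own identity test. */
static bool is_identical(const TypeInfo *type, const void *a, const void *b)
{
    if (type->identical != NULL)
        return type->identical(a, b);
    return type->equal(a, b);
}
```

With this design, `int` needs no extra code, while `"abc"` and `"ABC"` are equal under `ci_text_type.equal` but not identical under `is_identical(&ci_text_type, ...)`. The cost lands only on types whose equality is looser than identity, rather than requiring every producer of every type to emit a canonical binary form.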

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
