Re: knngist - 0.8

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 01:32:09
Message-ID: 22720.1290475929@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> As far as the syntax of CREATE/ALTER OPERATOR CLASS/FAMILY is concerned,
> I like Robert's proposal (OPERATOR ... FOR ORDER or FOR SEARCH) better
> than Teodor's, though we don't need the multiple-purposes-per-entry
> aspect of it. This is mainly because it looks more easily extensible
> if we ever do get around to having more purposes.

Although having said that ... I was poking around a bit more and noticed
how entirely bogus the current patch's matching of pathkeys is. It pays
no attention at all to which pk_opfamily is specified, which means it
will fail given a query like
SELECT ... ORDER BY indexable_col <-> constant USING <<<
where <<< represents some sort order that has nothing to do with the
order that GIST is expecting. The query will be considered to match the
index and then it will proceed to produce results in the wrong order.

We could perhaps kluge around that by insisting that the pathkey opclass
be the default btree opclass for the relevant datatype; but that's
fairly expensive to check for in match_pathkey_to_index, and it's
getting even further away from having a general-purpose solution.

I'm inclined to think that instead of just specifying "FOR ORDER",
the opclass definition ought to specify "FOR ORDER BY something",
where "something" could perhaps be a pre-existing btree opclass name.
Or maybe it could be a suitable sort operator, though then we'd have
to do a bit of reverse-engineering in match_pathkey_to_index, so that's
still no fun from a lookup-speed point of view. In any case what we'd
be specifying is a sort order for the datatype that the opclass member
operator yields, not the indexed datatype (so in practice it'd usually
be float8_ops or float8 <, no doubt).

The reason I bring this up now is that it affects the decision as to
what the unique key for pg_amop ought to be. Instead of having an
enum "purpose" column, maybe we should consider that the unique key
is (operator oid, opfamily oid, order-by-oid), where order-by-oid
is zero for a search operator and the OID of the btree opclass or sort
operator for an ordering operator. This would be of value if we
imagined that a single opclass could support ordering by more than one
distance ordering operator; which seems a bit far-fetched but perhaps
not impossible. On the other side of the coin it'd mean we aren't
leaving room for other sorts of operator "purposes".

On balance I'm inclined to leave the unique key as per previous proposal
(with a "purpose" column) and add the which-sort-order-is-that
information as payload columns that aren't part of the key.

Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2010-11-23 01:38:48 Re: s/LABEL/VALUE/ for ENUMs
Previous Message Robert Haas 2010-11-23 00:48:31 Re: s/LABEL/VALUE/ for ENUMs