Extending opfamilies for GIN indexes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Extending opfamilies for GIN indexes
Date: 2011-01-19 00:03:43
Message-ID: 7019.1295395423@sss.pgh.pa.us
Lists: pgsql-hackers

I just got annoyed by the fact that contrib/intarray has support for
queries on GIN indexes on integer[] columns, but they only work if you
use the intarray-provided opclass, not the core-provided GIN opclass for
integer[] columns. In general, of course, two different GIN opclasses
aren't compatible, but here there is precious little reason why not:
the contents of the index are the same both ways, i.e., all the individual
integer keys in the arrays. It would be a real usability improvement,
and would eliminate a foot-gun, if contrib/intarray could somehow be an
extension to the core opclass instead of an independent thing.

It seems to me that this should be possible within the opfamily/opclass
data structure. Right now, there isn't any real application for
opfamilies for GIN (or GiST) indexes, because both of those AMs pay
attention only to the "default" support procs that are bound into the
opclass for an index. But that could change.

In particular, only two of the five support procs used by GIN are
actually associated with "the index", in the sense of having some impact
on what's stored in the index: the compare() and extractValue() procs.
The other three are more associated with queries, though they do depend
on having knowledge about the behavior of the compare and extractValue
procs.
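
As a rough illustration of that split, the five procs can be laid out as a
small table. This is a Python sketch, not PostgreSQL source: the support
numbers and the names of the three query-side procs (extractQuery,
consistent, comparePartial) come from the GIN documentation of this era, and
the affects_index_contents flag is only a label for the distinction drawn
above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GinSupportProc:
    number: int                    # GIN support procedure number
    name: str
    affects_index_contents: bool   # illustrative label, not a catalog column


GIN_SUPPORT_PROCS = [
    GinSupportProc(1, "compare", True),
    GinSupportProc(2, "extractValue", True),
    GinSupportProc(3, "extractQuery", False),
    GinSupportProc(4, "consistent", False),
    GinSupportProc(5, "comparePartial", False),
]


def split_procs(procs):
    """Separate procs that shape the stored index from query-side procs."""
    index_bound = [p.name for p in procs if p.affects_index_contents]
    query_side = [p.name for p in procs if not p.affects_index_contents]
    return index_bound, query_side


index_bound, query_side = split_procs(GIN_SUPPORT_PROCS)
```

Under the proposal that follows, only the index_bound group would have to
belong to the opclass itself.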

So here's what I'm thinking: we could redefine a GIN opclass, per se, as
needing only compare() and extractValue() procs to be bound into it.
The other three procs, as well as the query operators, could be "loose"
in the containing opfamily. The index AM would choose which set of the
other support procedures to use for a specific query by matching their
amproclefttype/amprocrighttype to the declared input types of the query
operator, much as btree does.

Having done that, contrib/intarray could work by adding "loose"
operators and support procs to the core opfamily for integer[].
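
The lookup described above might be modeled roughly as follows. This is a
Python sketch under stated assumptions, not a description of any existing
code: OpFamily, add_proc and query_procs_for are hypothetical names, the
function names are placeholders, and the type strings are simplified. The
only real pieces are the support numbers 3 to 5 and the fact that intarray
provides an @@ operator taking integer[] and query_int.

```python
from typing import Dict, NamedTuple, Tuple

# GIN support numbers for the three query-side procs:
# 3 = extractQuery, 4 = consistent, 5 = comparePartial (optional).
QUERY_PROCNUMS = (3, 4, 5)


class Operator(NamedTuple):
    name: str
    lefttype: str
    righttype: str


class OpFamily:
    """Toy model of an opfamily holding "loose" query-side support procs."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Maps (procnum, lefttype, righttype) -> function name.
        self.procs: Dict[Tuple[int, str, str], str] = {}

    def add_proc(self, procnum: int, lefttype: str, righttype: str,
                 function: str) -> None:
        self.procs[(procnum, lefttype, righttype)] = function

    def query_procs_for(self, op: Operator) -> Dict[int, str]:
        """Pick the procs whose declared types match the operator's inputs."""
        found = {}
        for procnum in QUERY_PROCNUMS:
            function = self.procs.get((procnum, op.lefttype, op.righttype))
            if function is not None:
                found[procnum] = function
        return found


# Entries standing in for what the core opfamily would provide
# (placeholder function names).
family = OpFamily("array_ops")
family.add_proc(3, "int4[]", "int4[]", "core_extract_query")
family.add_proc(4, "int4[]", "int4[]", "core_consistent")

# Entries an extension such as intarray could add later, without
# touching the opclass's compare/extractValue procs.
family.add_proc(3, "int4[]", "query_int", "intarray_extract_query")
family.add_proc(4, "int4[]", "query_int", "intarray_consistent")

contains = Operator("@>", "int4[]", "int4[]")
matches = Operator("@@", "int4[]", "query_int")

core_procs = family.query_procs_for(contains)
intarray_procs = family.query_procs_for(matches)
```

In this model the second pair of add_proc calls changes only which procs a
query on @@ is dispatched to; nothing about what the index stores depends on
them, which is the property that would let one index serve both sets of
operators.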

It's possible that this scheme would also make it really useful to have
multiple opclasses within one GIN opfamily; though offhand I'm not sure
of an application for that. (Right now, the only reason to do that is
if you want to give opclasses for different types the same name, as we
do with the core "array_ops".)

Perhaps the same could be done with GiST, although I'm less sure about
the possible usefulness there.

Comments?

BTW, this idea means that amproc entries would no longer be tightly
associated with specific GIN opclasses, so the contentious patch for
getObjectDescription should indeed get applied.

regards, tom lane
