Re: NULL's support in SP-GiST

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NULL's support in SP-GiST
Date: 2012-03-09 22:44:32
Message-ID: 7324.1331333072@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> attached patch introduces NULLs indexing for SP-GiST. With this patch
> Sp-GiST supports IS NULL, IS NOT NULL clauses, as well as full index scan.

I've looked at this patch a bit. I share Jaime's extreme discomfort
with re-using GIN code to handle some pages of an SPGist index. Making
that code serve two masters is going to bite us on the rear sooner or
later, probably sooner. I must also object in the strongest terms to
the proposed rearrangement of SPGiST page special space to make it
sort-of-compatible with GIN special space. That will entirely break
tools such as pg_filedump, which needs the special space to be visibly
different from GIN, or it won't know how to print the page contents.

The other aspect I don't particularly like is the proposed changes to
the opclass interface API. Adding a "satisfyAll" field seems like just
a kluge. Also, it does nothing to fix the complaints I had in
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00804.php
about the search API being inherently inefficient for multiple scan
keys, because it forces repeat reconstruction of the indexed value.

I think a better fix for the opclass API would be to do what I suggested
there:
> * Perhaps it'd be a good idea to move the loop over scankeys to inside
> the opclass consistent methods, ie call them just once to check all the
> scankeys. Then we could meaningfully define zero scankeys as a full
> index scan, and we would also get rid of redundant value reconstruction
> work when there's more than one scankey.

I'm less sure about what to do to store nulls, but one idea is to have a
separate SPGiST tree storing only nulls and descending from its own root
page, similar to the idea in this patch of having a separate root page
for nulls. It'd be a tad less efficient than GIN-based storage for
large numbers of nulls, but you probably don't want to use SPGiST to
index columns with lots of nulls anyway.

Normally, if I felt that a patch needed to be thrown away and rewritten,
I'd just bounce it back to the author for rework. However, in this case
we are under a time crunch, and I feel that it's critical that we try to
get both the opclass API and the on-disk format right for 9.2. It will
be much harder to change either thing once we release. So I'm willing
to spend some time rewriting the patch according to these ideas, and
will go off and do that if there are not objections.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2012-03-09 22:54:10 Re: poll: CHECK TRIGGER?
Previous Message Marko Kreen 2012-03-09 22:42:58 Re: pg_crypto failures with llvm on OSX