Re: NULL's support in SP-GiST

Lists: pgsql-hackers
From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: NULL's support in SP-GiST
Date: 2012-02-02 21:26:13
Message-ID: Pine.LNX.4.64.1201312034280.12612@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi there,

attached patch introduces NULLs indexing for SP-GiST. With this patch
Sp-GiST supports IS NULL, IS NOT NULL clauses, as well as full index scan.

We added boolean satisfyAll field in spgInnerConsistentIn and spgLeafConsistentIn
structures, which informs the user-defined methods, that all search results
satisfy a query and should be returned. Calls of consistent methods are
needed because they know how to reconstruct an original value.

Unlike BTree we can't introduce a rule like "NULL is greater than
anything else", because Sp-GiST doesn't know semantics of indexed data.
Also, Sp-GiST is essentially single-column index, so we can
store null values outside the main structure of index and use separate code
path to work with NULLs. Actually, storing just ItemPointer
(instead of the whole index tuple) is enough for NULLs, so we can reuse
data tree storage from GIN, which used for storing
ItemPointers for each indexed values. For that purpose, GinPageOpaqueData
and SpGistPageOpaqueData should be compatible, at least at flag's
positions and values. In fact, it's needed only for vacuum code,
which produces full index scan.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Attachment Content-Type Size
spgist_null-0.8.gz application/octet-stream 12.0 KB

From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NULL's support in SP-GiST
Date: 2012-02-28 22:20:55
Message-ID: CAJKUy5iq=0i_hQByCxCVBfqdWbkjvanAWndMY7eJh1f074v9gA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 2, 2012 at 4:26 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> Hi there,
>
> attached patch introduces NULLs indexing for SP-GiST. With this patch
> Sp-GiST supports IS NULL, IS NOT NULL clauses, as well as full index scan.
>

I was looking at this.
It passes all regression tests, and seems to work fine.

What i don't like about it is that spgnull.c actually call GIN
functions and even uses GIN flags. Don't know how bad it is, but IMO
there is a module violation here.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NULL's support in SP-GiST
Date: 2012-03-07 20:36:20
Message-ID: CA+TgmoZTRSoQ3yvEcHfpiyYvG2qsbZq8Dqa4aCPh2Jkwfd7_UQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 28, 2012 at 5:20 PM, Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:
> On Thu, Feb 2, 2012 at 4:26 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>> attached patch introduces NULLs indexing for SP-GiST. With this patch
>> Sp-GiST supports IS NULL, IS NOT NULL clauses, as well as full index scan.
>
> I was looking at this.
> It passes all regression tests, and seems to work fine.
>
> What i don't like about it is that spgnull.c actually call GIN
> functions and even uses GIN flags. Don't know how bad it is, but IMO
> there is a module violation here.

That certainly doesn't sound like a good thing.

I guess the question is whether this is a stop-ship item for spgist.
If it is, then we're going to have to spend the time to fix this, but
if not, then since it was submitted more than two weeks after the
start of the CommitFest, it seems we should postpone it to 9.3.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NULL's support in SP-GiST
Date: 2012-03-07 21:03:21
Message-ID: 19145.1331154201@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I guess the question is whether this is a stop-ship item for spgist.
> If it is, then we're going to have to spend the time to fix this, but
> if not, then since it was submitted more than two weeks after the
> start of the CommitFest, it seems we should postpone it to 9.3.

If we buy into the assumption that nulls support can/should be bolted on
after the fact, then postponing it to 9.3 would be reasonable. I wasn't
very happy with that idea though, and would like to look at this issue
before 9.2 gets frozen. So please don't pull it from the CF yet.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NULL's support in SP-GiST
Date: 2012-03-07 21:59:37
Message-ID: CA+TgmoaYgftYTDog6fy-5af1WFDeycDt5nw=h=q+L7khYj3r=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 7, 2012 at 4:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I guess the question is whether this is a stop-ship item for spgist.
>> If it is, then we're going to have to spend the time to fix this, but
>> if not, then since it was submitted more than two weeks after the
>> start of the CommitFest, it seems we should postpone it to 9.3.
>
> If we buy into the assumption that nulls support can/should be bolted on
> after the fact, then postponing it to 9.3 would be reasonable.  I wasn't
> very happy with that idea though, and would like to look at this issue
> before 9.2 gets frozen.  So please don't pull it from the CF yet.

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NULL's support in SP-GiST
Date: 2012-03-09 22:44:32
Message-ID: 7324.1331333072@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> attached patch introduces NULLs indexing for SP-GiST. With this patch
> Sp-GiST supports IS NULL, IS NOT NULL clauses, as well as full index scan.

I've looked at this patch a bit. I share Jaime's extreme discomfort
with re-using GIN code to handle some pages of an SPGist index. Making
that code serve two masters is going to bite us on the rear sooner or
later, probably sooner. I must also object in the strongest terms to
the proposed rearrangement of SPGiST page special space to make it
sort-of-compatible with GIN special space. That will entirely break
tools such as pg_filedump, which needs the special space to be visibly
different from GIN, or it won't know how to print the page contents.

The other aspect I don't particularly like is the proposed changes to
the opclass interface API. Adding a "satisfyAll" field seems like just
a kluge. Also, it does nothing to fix the complaints I had in
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00804.php
about the search API being inherently inefficient for multiple scan
keys, because it forces repeat reconstruction of the indexed value.

I think a better fix for the opclass API would be to do what I suggested
there:
> * Perhaps it'd be a good idea to move the loop over scankeys to inside
> the opclass consistent methods, ie call them just once to check all the
> scankeys. Then we could meaningfully define zero scankeys as a full
> index scan, and we would also get rid of redundant value reconstruction
> work when there's more than one scankey.

I'm less sure about what to do to store nulls, but one idea is to have a
separate SPGiST tree storing only nulls and descending from its own root
page, similar to the idea in this patch of having a separate root page
for nulls. It'd be a tad less efficient than GIN-based storage for
large numbers of nulls, but you probably don't want to use SPGiST to
index columns with lots of nulls anyway.

Normally, if I felt that a patch needed to be thrown away and rewritten,
I'd just bounce it back to the author for rework. However, in this case
we are under a time crunch, and I feel that it's critical that we try to
get both the opclass API and the on-disk format right for 9.2. It will
be much harder to change either thing once we release. So I'm willing
to spend some time rewriting the patch according to these ideas, and
will go off and do that if there are not objections.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NULL's support in SP-GiST
Date: 2012-03-11 16:00:47
Message-ID: 24777.1331481647@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I think a better fix for the opclass API would be to do what I suggested
> there:
>> * Perhaps it'd be a good idea to move the loop over scankeys to inside
>> the opclass consistent methods, ie call them just once to check all the
>> scankeys. Then we could meaningfully define zero scankeys as a full
>> index scan, and we would also get rid of redundant value reconstruction
>> work when there's more than one scankey.

I've committed that ...

> I'm less sure about what to do to store nulls, but one idea is to have a
> separate SPGiST tree storing only nulls and descending from its own root
> page, similar to the idea in this patch of having a separate root page
> for nulls. It'd be a tad less efficient than GIN-based storage for
> large numbers of nulls, but you probably don't want to use SPGiST to
> index columns with lots of nulls anyway.

... and attached is a WIP patch that handles nulls as a separate SPGiST
tree. Tuple layouts are the same as before, but each page is marked as
to whether it stores nulls or non-nulls. The cache mechanism is
modified to keep separate sets of cached pages for the regular and nulls
trees. I'm fairly happy with this from a code cleanliness point of
view, and it passes the regression tests included in your patch (which
I didn't append here). It still needs doc updates, and also I think
that the WAL logic is probably broken --- the page-stores-nulls flag
will probably need to be added to most of SPGiST's WAL record types.

regards, tom lane

Attachment Content-Type Size
spgist-nulls-0.9.patch text/x-patch 59.2 KB