GiST subsplit question

Lists: pgsql-hackers
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: GiST subsplit question
Date: 2012-05-30 19:21:14
Message-ID: 1338405674.18825.17.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

http://archives.postgresql.org/message-id/CAEsn3ybKcnFno_tGQDJ=AfHiR2xd9KA1FQt5CQxxYu3Wz_hAHQ@mail.gmail.com

I was trying to answer that question on -general, and I found it a
little more challenging than I expected.

There was a previous discussion here:
http://archives.postgresql.org/pgsql-general/2007-08/msg01816.php

And it seems to trace back to these commits:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=783a73168b972488f85e48381546db047cb8f982
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1f7ef548ec2e594fa8766781c490fb5b998ea46b

I looked for the follow-up commit to support subsplit in the contrib
modules, figuring that would answer some questions, but I couldn't find
it.

The part that's confusing me is that the commit message says: "pickSplit
should set spl_(l|r)datum_exists to 'false'", but I don't see any
picksplit method that actually does that in contrib, nor in the sample
in the docs.

The code in that area is a bit difficult to follow, so it's not obvious
to me exactly what is supposed to happen.

So, do we demote that message to a DEBUG1? Or do we make it more clear
what the authors of a specific picksplit are supposed to do to avoid
that problem? Or am I misunderstanding something?

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: GiST subsplit question
Date: 2012-05-30 20:35:44
Message-ID: CAPpHfdsfRpuCAJzJpyW95yvqycqp1zdQ5rktW2Vmj3ryeBu49g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 30, 2012 at 11:21 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> I looked for the follow-up commit to support subsplit in the contrib
> modules, figuring that would answer some questions, but I couldn't find
> it.
>
> The part that's confusing me is that the commit message says: "pickSplit
> should set spl_(l|r)datum_exists to 'false'", but I don't see any
> picksplit method that actually does that in contrib, nor in the sample
> in the docs.
>

The only picksplit implementation I know to support secondary split
is fallbackSplit in gistproc.c :). I didn't understand how secondary split
works until I get deep into GiST code.

The code in that area is a bit difficult to follow, so it's not obvious
> to me exactly what is supposed to happen.
>

When GiST split index tupes by first column, it find if some tuples can be
equally placed to any of groups. If so it calls picksplit for second column
with only that tuples. But we already now that some keys of second column
should be placed to particular groups, because some tuples are already
unambiguously
placed to the groups. GiST calculates union of that keys of second column
for each group and pass it as ldatum and rdatum to pisckplit function of
second column. In order to indicate such secondary split GiST set
ldatum_exists and rdatum_exists flags. If picksplit function support
secondary split, it should join given keys to existing ldatum and rdatum
and set off ldatum_exists and rdatum_exists flags. If picksplit function
don't support secondary split than in leave ldatum_exists and rdatum_exists
as is, and GiST decide how to join picksplit result with existing groups
itself by using penalty function. Also it is possible than only one of
ldatum_exists
and rdatum_exists flags is set. It indicated that there aren't not null
keys of second column which unambiguously join group which flag isn't set.
In this case pisksplit function which support secondary split can form one
of ldatum or rdatum (which flag isn't set) without limitations.

> So, do we demote that message to a DEBUG1? Or do we make it more clear
> what the authors of a specific picksplit are supposed to do to avoid
> that problem? Or am I misunderstanding something?
>

+1 for demote message to DEBUG1. I think it shouldn't be so noisy, it just
indicates something could be improved.
Also I think we defenitely need to document secondary split. Now it's no
chances to understand without reverse engeneering it from code.

------
With best regards,
Alexander Korotkov.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: GiST subsplit question
Date: 2012-06-26 15:28:23
Message-ID: CA+Tgmob_t5i9u8WC0SejETqihVpxb=ao_u-FXhCb1As0dxhPcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 30, 2012 at 4:35 PM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:
>> So, do we demote that message to a DEBUG1? Or do we make it more clear
>> what the authors of a specific picksplit are supposed to do to avoid
>> that problem? Or am I misunderstanding something?
>
>
> +1 for demote message to DEBUG1. I think it shouldn't be so noisy, it just
> indicates something could be improved.
> Also I think we defenitely need to document secondary split. Now it's no
> chances to understand without reverse engeneering it from code.

I'm happy to go demote the message if we have consensus on that, but
somebody else is going to need to provide the doc patch. Any takers?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: GiST subsplit question
Date: 2012-06-26 17:43:44
Message-ID: 1340732624.12523.1.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-06-26 at 11:28 -0400, Robert Haas wrote:
> On Wed, May 30, 2012 at 4:35 PM, Alexander Korotkov
> <aekorotkov(at)gmail(dot)com> wrote:
> >> So, do we demote that message to a DEBUG1? Or do we make it more clear
> >> what the authors of a specific picksplit are supposed to do to avoid
> >> that problem? Or am I misunderstanding something?
> >
> >
> > +1 for demote message to DEBUG1. I think it shouldn't be so noisy, it just
> > indicates something could be improved.
> > Also I think we defenitely need to document secondary split. Now it's no
> > chances to understand without reverse engeneering it from code.
>
> I'm happy to go demote the message if we have consensus on that, but
> somebody else is going to need to provide the doc patch. Any takers?

I was planning to do that, but it won't be for a few days at least. If
someone else wants to do it sooner, feel free.

Regards,
Jeff Davis