Proposal to improve multicolumn GiST page split algoriithm.

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal to improve multicolumn GiST page split algoriithm.
Date: 2006-06-07 16:51:25
Message-ID: 4487040D.7040002@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'd like to base on paper "Generalizing "Search" in Generalized Search Trees" by
Paul Aoki, section 4.1 "Multiple Key Support"
(http://www.sai.msu.su/~megera/postgres/gist/papers/csd-97-950.pdf)

Proposed algorithm (without details about nulls etc):
1) n = 1 - set column's number to first one
2) form vector of keys of n-th column
3) set left and right union keys to NULL (see below)
4) call user-defined pickSplit method for n-th column
5) if it was a last column then return - page is splitted
6) try to find keys on left page with zero penalty with right union key and
keys on right page with left union. Note, we check only keys in n-th
column. Let M is a total number of keys with zero penalties with
opposite unions. So we have M keys/tuples which can be freely
distributed between left and right page. Penalty is calculated by call
user-defined Penalty() method for n-th column.
7) if M == 0 then return - page is splitted
8) n++
9) form vector of keys of n-th column, but use only tuples, determined in
step 6
10)set left and right union keys. Its forms from tuples which can't be
freely distributed between page. These tuples are determined in step 6
11)go to step 4.

That algorithm requires to small change in interface of pickSplit() method.
It works with GIST_SPLITVEC structure. But step 6 requires that pickSplit()
should knows: are unions already formed? If not, pickSplit() should work
as now. If yes, pickSplit() must take in consideration formed left and right
unions, and it can't to 'reduce' that unions.

I suggest add one boolean field to GIST_SPLITVEC: isSubSplit (suggest exact
name, pls)
if isSubSplit == false then unions are not formed, pickSplit works as now.
if isSubSplit == true then unions are already formed, pickSplit should use
formed keys. So, after pickSplit() call isSubSplit should be always 'false', if
it's not then GiST's core should say at least a warning about unsupported
feature in opclass. BTW, in this case, GiST may compose old (with formed before)
unions with suggested by pickSplit() by maximizing penalty between left and
right keys.

Also, I plan to split GIST_SPLITVEC to 2 structures: one of its will be argument
for pickSplit() and another will use internally in GiST core. Now GIST_SPLITVEC
contains a lot of field that's needed only for core.

Thoughts, suggestions, objections?

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2006-06-07 17:06:43 Snowball and ispell in tsearch2
Previous Message Tom Lane 2006-06-07 15:47:45 Re: How to avoid transaction ID wrap