Re: why is gist index taking so much space on the disc

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: why is gist index taking so much space on the disc
Date: 2005-11-22 00:28:52
Message-ID: Pine.GSO.4.63.0511220320010.29329@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 21 Nov 2005, Martijn van Oosterhout wrote:

> On Mon, Nov 21, 2005 at 08:14:44PM +0100, Grzegorz Jaskiewicz wrote:
>>> You mean you sometimes put the same elements in the two halves? You
>>> shouldn't do that. The whole point is that the search will descend any
>>> node that matches consistant, but any single key should only appear
>>> once in each index.
>>>
>>> picksplit should *split* the set, not return two sets about the same
>>> size as you started...
>>
>> Nope, I mean that 'masks' created to match either 'half' sometimes
>> match elements in the other one.
>> This shouldn't be a big deal, just one level to go down on query to
>> much more specific result set.
>> I have fixed that with, somewhat hack.
>
> It's not a hack, that's how it's supposed to work. An entry should only
> appear once in the index, but it could appear in multiple places. Like
> you say, some entries can go into either half.
>
> B-Trees are the rather special case that you can always split a set of
> values into two non-overlapping sets. With geometric types (like your
> bitmasks) you can't avoid overlap sometimes so you have to follow
> multiple branches to check if an element is there or not.
>
> Your pseudo code is good and will work fine. However, ideally you want
> to divide the overlap in such a way that later splits work better.
> Maybe by trying to decide which mask is "closer".
>
> The better the splitting the more efficient your tree will become.
> Ofcourse, perfect splitting may be expensive but then it depends on how
> many inserts vs how many selects. If you do a lot of searches it may be
> worth the time.

Martijn is perfectly right here. You, probably, need to read a bit
some classical papers, for example,
"R-TREES: A dynamic index structure for spatial searching" by Antonin
Guttman.

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2005-11-22 00:33:16 Re: PostgreSQL 8.1.0 catalog corruption
Previous Message Tom Lane 2005-11-22 00:18:33 Re: PostgreSQL 8.1.0 catalog corruption