Re: why is gist index taking so much space on the disc

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: why is gist index taking so much space on the disc
Date: 2005-11-21 19:45:32
Message-ID: 20051121194527.GG16764@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 21, 2005 at 08:14:44PM +0100, Grzegorz Jaskiewicz wrote:
> >You mean you sometimes put the same elements in the two halves? You
> >shouldn't do that. The whole point is that the search will descend any
> >node that matches consistant, but any single key should only appear
> >once in each index.
> >
> >picksplit should *split* the set, not return two sets about the same
> >size as you started...
>
> Nope, I mean that 'masks' created to match either 'half' sometimes
> match elements in the other one.
> This shouldn't be a big deal, just one level to go down on query to
> much more specific result set.
> I have fixed that with, somewhat hack.

It's not a hack, that's how it's supposed to work. An entry should only
appear once in the index, but it could appear in multiple places. Like
you say, some entries can go into either half.

B-Trees are the rather special case that you can always split a set of
values into two non-overlapping sets. With geometric types (like your
bitmasks) you can't avoid overlap sometimes so you have to follow
multiple branches to check if an element is there or not.

Your pseudo code is good and will work fine. However, ideally you want
to divide the overlap in such a way that later splits work better.
Maybe by trying to decide which mask is "closer".

The better the splitting the more efficient your tree will become.
Ofcourse, perfect splitting may be expensive but then it depends on how
many inserts vs how many selects. If you do a lot of searches it may be
worth the time.

BTW, I glad you're making progress and hopefully you might be able to
publish some code. PostgreSQL could do with some example GiST indexes
on bitmaps.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-11-21 19:53:34 Re: Time for pgindent?
Previous Message Jim C. Nasby 2005-11-21 19:41:55 Re: bind variables, soft vs hard parse