Re: why is gist index taking so much space on the disc

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: why is gist index taking so much space on the disc
Date: 2005-11-21 18:32:20
Message-ID: 20051121183214.GF16764@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 21, 2005 at 04:58:25PM +0100, Grzegorz Jaskiewicz wrote:
> my conquers with Gist index for custom type are nearly finished. It
> is working as it is now, but there are few problems here and there.
> One of em, being amount of disc space index it self takes. The type
> stucture it self takes 160bytes. Adding 100.000 rows into table -
> CREATE TABLE blah (a serial, b customType);

Let's see, 160bytes means you'll get aboud 50 keys per page. So you
would expect 2000 leaf page, 40 level 1 pages. This should be less than
20-30MB

<snip>

> Is it normal that index is so hudge ? Even tho my type has built in
> masks (element that can match few different values), and %. up front
> the string (which behaves just like the sql % in b ~ '%.something').
> And both are used to build "unions" for pick-split, and other
> operations. Is it because of pick-split it self ? It does good work
> in splitting up table of elements into two separate ones, by sorting
> them first, than creating common "mask" for L and P. And by scanning
> whole table again, and putting elements matching into L or P. L and P
> elements sometimes overlap, but so far I can't find better solution.

You mean you sometimes put the same elements in the two halves? You
shouldn't do that. The whole point is that the search will descend any
node that matches consistant, but any single key should only appear
once in each index.

picksplit should *split* the set, not return two sets about the same
size as you started...

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin McArthur 2005-11-21 19:04:37 Re: order by, for custom types
Previous Message Heikki Linnakangas 2005-11-21 18:03:34 Re: Data directory on read-only media