Re: gistchoose vs. bloat

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-21 13:19:35
Message-ID: 50FD4067.2000902@vmware.com
Lists: pgsql-hackers

On 21.01.2013 15:06, Tom Lane wrote:
> Jeff Davis<pgsql(at)j-davis(dot)com> writes:
>> On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
>>> I looked at this patch. ISTM we should not have the option at all but
>>> just do it always. I cannot believe that always-go-left is ever a
>>> preferable strategy in the long run; the resulting imbalance in the
>>> index will surely kill any possible benefit. Even if there are some
>>> cases where it miraculously fails to lose, how many users are going to
>>> realize that applies to their case and make use of the option?
>
>> Sounds good to me.
>
>> If I remember correctly, there was also an argument that it may be
>> useful for repeatable test results. That's a little questionable for
>> performance (except in those cases where few penalties are identical
>> anyway), but could plausibly be useful for a crash report or something.
>
> Meh. There's already a random decision, in the equivalent place and for
> a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
> complained about that being indeterminate, so I'm unconvinced that
> there's a market for it with gist.
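To make the behaviour under discussion concrete, here is a minimal sketch of choosing the child with the lowest insertion penalty while breaking ties at random instead of always going left. It assumes each downlink has already been reduced to a single float penalty; real GiST penalties are computed per index column. The function name `choose_child` is hypothetical and is not the actual gistchoose() code or the patch under review.

```c
#include <stdlib.h>

/*
 * Hypothetical sketch, not PostgreSQL code: return the index of the entry
 * with the lowest penalty, choosing uniformly among tied minima (reservoir
 * sampling) rather than always taking the leftmost one.
 * Returns -1 if n == 0.
 */
int choose_child(const double *penalty, int n)
{
    int best = -1;
    int nties = 0;      /* how many entries share the current minimum */

    for (int i = 0; i < n; i++)
    {
        if (best < 0 || penalty[i] < penalty[best])
        {
            best = i;   /* strictly better: start a new tie set */
            nties = 1;
        }
        else if (penalty[i] == penalty[best])
        {
            nties++;
            /* keep entry i with probability ~1/nties (rand() % n has a tiny bias) */
            if (rand() % nties == 0)
                best = i;
        }
    }
    return best;
}
```

With a unique minimum the result is deterministic; only exact ties are randomized, so the choice never gets worse than always-go-left, it just spreads inserts across equally good subtrees.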

I wonder if it would work for GiST to do something similar to
_bt_findinsertloc: have a bias towards the left page, but sometimes
descend to one of the pages to the right instead. You would get the cache
locality of usually descending down the same subtree, but also fill the
pages to the right. Jeff / Alexander, want to give that a shot?
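A minimal sketch of that left-biased choice, assuming the equally good candidates have already been collected in left-to-right order. The 1% step probability and the names `MOVE_RIGHT_PERCENT` and `choose_biased_left` are hypothetical illustrations in the spirit of the random decision in _bt_findinsertloc, not a copy of that function's logic.

```c
#include <stdlib.h>

/* Hypothetical tuning knob: percent chance of stepping one candidate further right. */
#define MOVE_RIGHT_PERCENT 1

/*
 * tied[] holds the indexes of equally good candidate pages, left to right.
 * Usually returns tied[0], so consecutive inserts keep descending the same
 * subtree (cache locality); occasionally walks further right so that those
 * pages get filled too. Returns -1 if there are no candidates.
 */
int choose_biased_left(const int *tied, int ntied)
{
    int pos = 0;

    if (ntied <= 0)
        return -1;
    /* each additional step right happens with probability MOVE_RIGHT_PERCENT/100 */
    while (pos + 1 < ntied && rand() % 100 < MOVE_RIGHT_PERCENT)
        pos++;
    return tied[pos];
}
```

The number of steps right is geometrically distributed, so the leftmost page still receives the overwhelming majority of inserts while the pages to its right are not starved entirely.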

When building an index from scratch with the new buffered index build,
you could do a lot better than filling each page as with regular inserts
and splitting when one fills up. You could, e.g., buffer 10 pages' worth of
tuples and perform a 10-way split of all the buffered tuples together,
distributing them equally across 10 pages (or 10+something, to leave some
room for updates). But that's clearly a separate and much larger patch.
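The arithmetic of such an N-way split can be sketched as follows. This covers only the sizing step: how many pages to use for a target fill percentage and how many tuples each page gets, as evenly as possible. Grouping the tuples so that each page stays compact in the indexed space is the hard part and is not shown. The function `plan_nway_split` and its parameters are hypothetical.

```c
/*
 * Hypothetical sketch of N-way split sizing only.
 * ntuples      - number of buffered tuples to distribute
 * per_page     - maximum tuples that fit on one page
 * fill_percent - target fill (e.g. 90 leaves ~10% free for later updates)
 * counts       - output: tuples assigned to each page
 * Returns the number of pages used, or -1 if more than max_pages are needed.
 */
int plan_nway_split(int ntuples, int per_page, int fill_percent,
                    int *counts, int max_pages)
{
    int target = per_page * fill_percent / 100;   /* tuples we aim to put per page */
    int npages;

    if (target < 1)
        target = 1;
    npages = (ntuples + target - 1) / target;     /* ceiling division */
    if (npages > max_pages)
        return -1;
    /* spread the remainder so page sizes differ by at most one tuple */
    for (int i = 0; i < npages; i++)
        counts[i] = ntuples / npages + (i < ntuples % npages ? 1 : 0);
    return npages;
}
```

For example, 1000 tuples with room for 100 per page need exactly 10 full pages at 100% fill, while a 90% target yields 12 pages holding 83 or 84 tuples each: the "10+something" case.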

- Heikki
