Re: GIN improvements part 1: additional information

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part 1: additional information
Date: 2013-10-03 18:43:54
Message-ID: 524DBAEA.9080908@vmware.com
Lists: pgsql-hackers

On 23.09.2013 18:35, Bruce Momjian wrote:
> On Sun, Sep 15, 2013 at 01:14:45PM +0400, Alexander Korotkov wrote:
>> On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<hlinnakangas(at)vmware(dot)com>
>> wrote:
>>
>> There are a few open questions:
>>
>> 1. How are we going to handle pg_upgrade? It would be nice to be able to
>> read the old page format, or convert on-the-fly. OTOH, if it gets too
>> complicated, might not be worth it. The indexes are much smaller with the
>> patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
>> later. Still, I'd like to give it a shot.
>
> We have broken pg_upgrade index compatibility in the past.
> Specifically, hash and GIN index binary format changed from PG 8.3 to
> 8.4. I handled it by invalidating the indexes and providing a
> post-upgrade script to REINDEX all the changed indexes. The user
> message is:
>
> Your installation contains hash and/or GIN indexes. These indexes have
> different internal formats between your old and new clusters, so they
> must be reindexed with the REINDEX command. The file:
>
> ...
>
> when executed by psql by the database superuser will recreate all invalid
> indexes; until then, none of these indexes will be used.
>
> It would be very easy to do this from a pg_upgrade perspective.
> However, I know there have been complaints from others about making
> pg_upgrade more restrictive.
>
> In this specific case, even if you write code to read the old file
> format, we might want to create the REINDEX script to allow _optional_
> reindexing to shrink the index files.
>
> If we do require the REINDEX, --check will clearly warn the user that
> this will be required.

It seems we've all but decided that we'll require reindexing GIN indexes
in 9.4. Let's take the opportunity to change some other annoyances with
the current GIN on-disk format:

1. There's no explicit "page id" field in the opaque struct, like there
is in other index types. Such a field exists for the benefit of
debugging tools like pg_filedump. So far, we've managed to tell GIN
pages apart from those of other index types by the fact that the special
size of GIN pages is 8 and that GIN doesn't use all the high-order bits
in the last byte on the page. But an explicit page id field would be
nice, so let's add that.

2. I'd like to change the way "incomplete splits" are handled.
Currently, WAL recovery keeps track of incomplete splits, and fixes any
that remain at the end of recovery. That concept is slightly broken;
it's not guaranteed that after you've split a leaf page, for example,
you will succeed in inserting the downlink into its parent. You might,
e.g., run out of disk space. To fix that, I'd like to add a flag to the
page header to indicate whether the split has been completed, i.e.
whether the page's downlink has been inserted into the parent, and to
fix incomplete splits lazily on the next insert. I did a similar change
to GiST back in 9.1. (Strictly speaking this doesn't require changing
the on-disk format, though.)

3. I noticed that the GIN b-trees, the main key entry tree and the
posting trees, use a slightly different arrangement of the downlink than
our regular nbtree code does. In nbtree, the downlink for a page is the
*low* key of that page, i.e. if the downlink is 10, all the items on
that child page must be >= 10. But in GIN, we store the *high* key in
the downlink, i.e. all the items on the child page must be <= 10. That
makes inserting new downlinks at a page split slightly more complicated.
For example, when splitting a page containing keys 1-10 into 1-5 and
5-10, you need to insert a new downlink with key 10 for the new right
page, and also update the existing downlink to 5. The nbtree code
doesn't require updating existing entries.
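
To make the difference in point 3 concrete, here is a minimal sketch of
a parent page under the two downlink conventions. It is a toy model, not
the actual nbtree or GIN code: the Downlink and Parent structs, the
function names and the block numbers are all made up for illustration.
Each split function returns how many parent entries it had to touch.

```c
#include <assert.h>

#define MAX_DOWNLINKS 8

/* Hypothetical parent-page model: an ordered array of downlinks. */
typedef struct {
    int key;    /* separator key stored in the downlink */
    int child;  /* made-up child block number */
} Downlink;

typedef struct {
    Downlink items[MAX_DOWNLINKS];
    int      nitems;
} Parent;

/* Insert a downlink at position pos, shifting later entries right. */
static void insert_at(Parent *p, int pos, Downlink d)
{
    assert(p->nitems < MAX_DOWNLINKS);
    assert(pos >= 0 && pos <= p->nitems);
    for (int i = p->nitems; i > pos; i--)
        p->items[i] = p->items[i - 1];
    p->items[pos] = d;
    p->nitems++;
}

/*
 * Low-key convention (as described for nbtree): the downlink key is a
 * lower bound for the child. The left page keeps its existing downlink
 * unchanged; only a new downlink for the right page is added.
 */
static int split_low_key(Parent *p, int pos, int right_low_key, int rightchild)
{
    Downlink d = { right_low_key, rightchild };
    insert_at(p, pos + 1, d);
    return 1;                        /* one parent entry touched */
}

/*
 * High-key convention (as described for GIN): the downlink key is an
 * upper bound for the child. The new right page inherits the old upper
 * bound, and the existing downlink must be lowered to the split key.
 */
static int split_high_key(Parent *p, int pos, int splitkey, int rightchild)
{
    Downlink d = { p->items[pos].key, rightchild };
    p->items[pos].key = splitkey;    /* update the existing entry */
    insert_at(p, pos + 1, d);        /* and add the new one */
    return 2;                        /* two parent entries touched */
}
```

With a single child covering keys 1-10 and a split at 5, the low-key
parent goes from {1} to {1, 5} by adding one entry, while the high-key
parent goes from {10} to {5, 10} by rewriting the old entry and adding a
new one.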
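
The lazy fix-up proposed in point 2 could look roughly like the sketch
below. It is only a toy model under stated assumptions: the Page struct,
the PAGE_INCOMPLETE_SPLIT bit and the function names are hypothetical,
and I'm assuming the flag sits on the left half of the split, the way
GiST's follow-right flag does. The has_downlink field stands in for "the
parent has a downlink to this page".

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_INCOMPLETE_SPLIT 0x01   /* hypothetical flag bit */
#define NO_BLOCK (-1)

/* Hypothetical, heavily simplified page header. */
typedef struct {
    int  flags;
    int  rightlink;      /* right sibling, or NO_BLOCK */
    bool has_downlink;   /* does the parent point at this page? */
} Page;

/*
 * Step 1 of a split: create the right sibling and mark the left page as
 * having an incomplete split. In a real system this would be one atomic,
 * WAL-logged action.
 */
static void split_page(Page *pages, int left, int right)
{
    pages[right].flags = 0;
    pages[right].rightlink = pages[left].rightlink;
    pages[right].has_downlink = false;
    pages[left].rightlink = right;
    pages[left].flags |= PAGE_INCOMPLETE_SPLIT;
}

/*
 * Step 2: insert the downlink for the right sibling into the parent and
 * clear the flag. This step may never run if the inserter fails in
 * between, e.g. because it ran out of disk space.
 */
static void finish_split(Page *pages, int left)
{
    int right = pages[left].rightlink;

    assert(right != NO_BLOCK);
    pages[right].has_downlink = true;
    pages[left].flags &= ~PAGE_INCOMPLETE_SPLIT;
}

/*
 * The next inserter that lands on a page first repairs any split that
 * was left unfinished. Returns true if it had to finish a split.
 */
static bool prepare_insert(Page *pages, int blkno)
{
    if (pages[blkno].flags & PAGE_INCOMPLETE_SPLIT)
    {
        finish_split(pages, blkno);
        return true;
    }
    return false;
}
```

The point of the design is that nothing has to remember the unfinished
split outside the page itself, so there is no end-of-recovery cleanup
step that could fail.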

Anything else?

- Heikki
