Re: Index AM change proposals, redux

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Index AM change proposals, redux
Date: 2008-04-23 16:04:32
Message-ID: 1208966672.4259.1397.camel@ebony.site
Lists: pgsql-hackers

On Wed, 2008-04-09 at 20:30 -0400, Tom Lane wrote:

> * GIT (Grouped Index Tuple) indexes, which achieve index space savings
> in btrees by having a single index tuple represent multiple heap tuples
> (on a single heap page) containing a range of key values. I am not sure
> what the development status is --- Heikki had submitted a completed
> patch but there seemed to be agreement on making changes, and that's not
> been done AFAIK. The really serious problem I've got with it is that
> it'd foreclose the possibility of returning actual index keys from btree
> indexes, thus basically killing the usefulness of that idea. I'm not
> convinced it would offer enough gain to be worth paying that price.

> Another issue is that we'd need to check how much of the use-case for
> GIT has been taken over by HOT.

That seems to be a misunderstanding about HOT and GIT. HOT is an
important requirement for GIT, but other than that they are unrelated.

Testing in 2006/2007 showed that HOT stabilised the effects of repeated
updates, which then showed as a "gain" in performance. But GIT did show
considerable actual performance gains in its target use case.

GIT significantly reduces the size of clustered indexes, greatly
increasing the number of index pointers that can be held in memory for
very large indexes. That translates directly into a reduction in I/O for
large databases on typical hardware, for primary operations, file
backups and recovery (and thus log replication). Test results validated
that and showed increased performance, over and above that experienced
with HOT, when tested together.
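
As a rough illustration of the space argument only: the sketch below is a
toy model, not the GIT patch's data structures. The function names and the
list-of-pages heap representation are hypothetical. It compares one index
entry per heap tuple with one entry per run of adjacent keys that live on
the same heap page.

```python
from itertools import groupby


def plain_index(heap):
    """One entry per heap tuple: (key, (page_no, offset)), sorted by key."""
    return sorted(
        (key, (page_no, offset))
        for page_no, rows in enumerate(heap)
        for offset, key in enumerate(rows)
    )


def grouped_index(heap):
    """One entry per run of key-adjacent tuples on the same heap page.

    Each entry is (min_key, max_key, page_no). The individual keys are
    no longer stored in the index; a lookup has to scan the heap page.
    """
    entries = []
    for page_no, run in groupby(plain_index(heap), key=lambda e: e[1][0]):
        keys = [key for key, _ in run]
        entries.append((keys[0], keys[-1], page_no))
    return entries


# Hypothetical heaps: each inner list is the keys stored on one heap page.
clustered = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
scattered = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]

for name, heap in (("clustered", clustered), ("scattered", scattered)):
    print(name, len(plain_index(heap)), "->", len(grouped_index(heap)))
```

In this toy model the saving depends entirely on the heap being in key
order: the clustered heap collapses to one entry per page, while the
scattered one gains nothing. It also makes the trade-off in the quoted
text visible, since a grouped entry keeps only a key range rather than
each key.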

Now there may be problems with the GIT code as it stands, but we should
acknowledge that the general technique has been proven to improve
performance on a recent PostgreSQL codebase. This is an unsurprising
result, since SQL Server, Sybase, DB2, Oracle and Teradata (at least) all
use indexes of this category to improve real-world performance. The idea
is definitely not a benchmark-only feature.

Many users would be very interested if we could significantly reduce the
size of the main index on their largest tables.

I would at least like to see clustered indexes acknowledged as a TODO
item, so we keep the door open for a future implementation based around
the basic concept of GIT.

Nobody is going to waste their time flogging a dead horse, which is why
the patch isn't ready. Maybe *that* horse is dead, not really for me to
say, but if we can at least agree on a basic statement that equine
animals are fast we may find a rider willing to invest time in them.

I don't see the "returns index keys" idea as being killed by or killing
this concept. Returning keys is valid and useful when we can, but there
are other considerations that, in some use cases, will be a dominant
factor.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
