Re: ToDo List Item - System Table Index Clustering

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simone Aiken <saiken(at)ulfheim(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo List Item - System Table Index Clustering
Date: 2011-01-18 21:36:25
Message-ID: 201101182136.p0ILaPa21058@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> On Tue, Jan 18, 2011 at 8:35 AM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011:
> >>
> >> Hello Postgres Hackers,
> >>
> >> In reference to this todo item about clustering system table indexes,
> >> ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php )
> >> I have been studying the system tables to see which would benefit ?from
> >> clustering. ?I have some index suggestions and a question if you have a
> >> moment.
> >
> > Wow, this is really old stuff. ?I don't know if this is really of any
> > benefit, given that these catalogs are loaded into syscaches anyway.
> > Furthermore, if you cluster at initdb time, they will soon lose the
> > ordering, given that updates move tuples around and inserts put them
> > anywhere. ?So you'd need the catalogs to be re-clustered once in a
> > while, and I don't see how you'd do that (except by asking the user to
> > do it, which doesn't sound so great).
>
> The idea of the TODO seems to have been to set the default clustering
> to something reasonable. That doesn't necessarily seem like a bad
> idea even if we can't automatically maintain the cluster order, but
> there's some question in my mind whether we'd get any measurable
> benefit from the clustering. Even on a database with a gigantic
> number of tables, it seems likely that the relevant system catalogs
> will stay fully cached and, as you point out, the system caches will
> further blunt the impact of any work in this area. I think the first
> thing to do would be to try to come up with a reproducible test case
> where clustering the tables improves performance. If we can't, that
> might mean it's time to remove this TODO.

I think CLUSTER is a win when you are looking up multiple rows in the
same table, either using a non-unique index or a range search. What
places do such lookups? Having them all in adjacent pages would be a
win --- single-row lookups are usually not.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-01-18 21:39:17 Re: Fixing GIN for empty/null/full-scan cases
Previous Message Bruce Momjian 2011-01-18 20:55:12 test_fsync label adjustments