Re: ToDo List Item - System Table Index Clustering

From: "Simone Aiken" <saiken(at)quietlyCompetent(dot)com>
To: "'Bruce Momjian'" <bruce(at)momjian(dot)us>, "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'Alvaro Herrera'" <alvherre(at)commandprompt(dot)com>, "'Simone Aiken'" <saiken(at)ulfheim(dot)net>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo List Item - System Table Index Clustering
Date: 2011-01-19 16:19:47
Message-ID: 007401cbb7f4$b1539820$13fac860$@quietlyCompetent.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> Robert
>
> I think the first
> thing to do would be to try to come up with a reproducible test case
> where clustering the tables improves performance.
>

On that note, is there any standard way you guys do benchmarks?

> Bruce
>
>I think CLUSTER is a win when you are looking up multiple rows in the same
table, either using a non-unique index or a range search. What places do
such lookups? >Having them all in adjacent pages would be a win ---
single-row lookups are usually not.
>

Mostly the tables that track column level data. Typically you will want to
grab rows for multiple columns for a given table at once so it would be
helpful to have them be contiguous on disk.

I could design a benchmark to display this by building a thousand tables one
column at a time using 'alter add column' to scatter the catalog rows for
the tables across many blocks. So they'll be a range with column 1 for each
table and column 2 for each table and column three for each table. Then
fill a couple data tables with a lot of data and set some noise makers to
loop through them over and over with full table scans ... filling up cache
with unrelated data and hopefully ageing out the cache of the pg_tables.
Then do some benchmark index lookup queries to see the retrieval time before
and after clustering the pg_ctalog tables to record a difference.

If the criteria is "doesn't hurt anything and helps a little" I think this
passes. Esp since clusters aren't maintained automatically so adding them
has no negative impact on insert or update. It'd just be a nice thing to do
if you know it can be done that doesn't harm anyone who doesn't know.

-Simone Aiken

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2011-01-19 16:27:51 Re: limiting hint bit I/O
Previous Message Robert Haas 2011-01-19 16:19:26 Re: Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on