Re: clustered indexes?

From: Alvaro Herrera <alvherre(at)atentus(dot)com>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: Varun Kacholia <varunk(at)cse(dot)iitb(dot)ac(dot)in>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: clustered indexes?
Date: 2002-06-22 17:22:01
Message-ID: Pine.LNX.4.44.0206221241060.2135-100000@cm-lcon-46-187.cm.vtr.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Curt Sampson dijo:

> BTW, clustered indexes would be a *really* cool feature to have in
> postgres. I recently saw a query go from 70 seconds down to .6 seconds
> when I clustered the table on the particular column I was selecting a
> value from.

I don't know if the storage manager allows it, but maybe it is possible
to use the free space map to allocate some free space on each page; that
way, anytime a new tuple is added, it is written on the page that
correspond to its clustered value (here you add the overhead of checking
whether a given relation is clustered or not to _every_ relation).

If there is a lot of tuples for a particular value, it should just
allocate free space on the last page that contains tuples of that value.
OTOH, if there are a lot of values with small number of tuples in one
page, it allocates space for all of them at the end of the page.

Another thing would be that if one value has tuples that fit in an
integer number of pages, a page next to it would have to be allocated
for free space also.

Of course, when running out of space for a particular value, some more
space would have to be allocated, moving all the tuples in the rest of
the table. Looks quite inefficient. Or if the clustered values do not
have also to be ordered, maybe just the tuples of the next value have to
be moved to the end of the table, freeing all the space they were using
(but this is a really bad idea if the naxt value has a lot of tuples).

Another way to do this would be having some dummy tuples after each
values' real tuples. When something wants to write a new tuple, it uses
one of the dummy tuples. Variable size tuples would be a problem, of
course. Limiting the feature to only fixed-size-tuple-tables simplifies
the problem somewhat.

All this sounds pretty difficult to me, and I wonder whether it's worth
the trouble. Maybe fixing the problem with the current CLUSTER
implementation (the dropping of indexes and grant permissions) helps
somewhat, but seeing the comments in CVS log from Tom Lane does not
help:

revision 1.63
date: 2001/01/12 01:22:21; author: tgl; state: Exp; lines: +2 -2
Preserve constraints and column defaults during CLUSTER.
Wish they were all this easy ...

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tina Messmann 2002-06-22 19:08:35 Re: NetBSD Live CD/PostgreSQL?
Previous Message Werner Schalk 2002-06-22 17:11:43 Creating a user fails