Re: [HACKERS] Maintaining cluster order on insert

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Maintaining cluster order on insert
Date: 2006-08-11 23:33:26
Message-ID: 44DD13C6.2020609@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Heikki Linnakangas wrote:
> Ron Mayer wrote:
>> In my case my biggest/slowest tables are clustered by zip-code (which
>> does a reasonable job at keeping counties/cities/etc on the
>> same pages too)....
>
> No deletes? If the tables grow over time, you probably would need to run
> CLUSTER every now and then to get the best performance, though the patch
> would alleviate that quite a lot.

Yup, pretty much no deletes; since it's a historical archive of
some government documents with address info. Though I the
live system may periodically expunge data, say, 10+years old.

> Do you have a development environment where you could test what effect
> the patch would have? It would be interesting to have a real-world use
> case, since I don't have one myself at the moment.

I have a development environment, but it doesn't have the same
real-time-growing behavior, and only a small region of the country.
I suppose I could pre-load N-1 years and cluster it, and then
incrementally insert the last year of data to simulate the effect.

But sure, I'll attempt to try the patch; but don't really have any
good benchmarking environment to give any definitive results. If
an anecdotal "this is how it feels to me" is useful, I can give one
of those.

>> > Your best bet might be to partition the table into two subtables, one
>> > with "stable" data and one with the fresh data.
>>
>> Hmm... that should work well for me too....
>
> Umm, if your inserts are uniformly distributed across the country, you
> wouldn't have a stable part, right?

Hmm. Maybe. I was thinking when archiving to the large table
an "order by" clause when inserting from the new partition to the
stable partition could at least make the big table "piecewise"
clustered so most records for a zip code fit in the same few disk
pages, even though those pages would still end up lying around
far apart on the disk.

I wonder what part of "CLUSTER" gives the most benefit - that
most records of a type fit on a few blocks; or that those blocks
are next to each other so can be read sequentially?

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2006-08-12 00:26:45 Re: New variable server_version_num
Previous Message Heikki Linnakangas 2006-08-11 23:16:58 Re: [HACKERS] Maintaining cluster order on insert

Browse pgsql-patches by date

  From Date Subject
Next Message David Fetter 2006-08-12 00:26:45 Re: New variable server_version_num
Previous Message Heikki Linnakangas 2006-08-11 23:16:58 Re: [HACKERS] Maintaining cluster order on insert