Re: Cluster table and order information

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Dario Beraldi <dario(dot)beraldi(at)ed(dot)ac(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Cluster table and order information
Date: 2011-02-01 16:22:27
Message-ID: 4D483343.4010806@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2/1/2011 10:17 AM, Dario Beraldi wrote:
> Quoting Andy Colson <andy(at)squeakycode(dot)net>:
>
>> On 2/1/2011 9:08 AM, Dario Beraldi wrote:
>>> Hello,
>>>
>>> From the documentation of CLUSTER table
>>> (http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I
>>> understand that clustering can be achieved by re-creating the table like
>>> this:
>>>
>>> CREATE TABLE newtable AS
>>> SELECT * FROM table ORDER BY columnlist;
>>>
>>> My question is: If I upload with COPY a datafile which is already
>>> correctly sorted, can I inform postgres of such order, so that no
>>> clustering is necessary after the import? In other words, how can I tell
>>> postgres that my file is order by this and that column?
>>>
>>> Many thanks!
>>>
>>> Dario
>>>
>>
>> The planner has no knowledge of cluster. Meaning PG will query a
>> clustered and unclustered table exactly the same way. A table is not
>> marked or anything as clustered. And in fact, during usage of a table
>> it'll become unclustered.
>>
>> Clustering is only useful when you are going to read multiple records
>> in the same order as an index. It turns "more random seeks" into "more
>> sequential reads".
>>
>> If your COPY loads data in indexed order, then just dont run the cluster.
>>
>> -Andy
>>
> Thanks very much Andy, this clarifies my doubts.
>
> I was misled by the docs saying "When a table is clustered, PostgreSQL
> remembers which index it was clustered by" which made me think that the
> order information is stored somewhere.
>
> All the best
> Dario
>

The next sentience clears it up:

The form "CLUSTER table_name" reclusters the table using the same index
as before.

-Andy

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2011-02-01 16:52:41 Re: Weird performance issue with custom function with a for loop.
Previous Message Dario Beraldi 2011-02-01 16:17:00 Re: Cluster table and order information