Quick Links

Re: Why is sorting on two columns so slower thansortingon one column?

Lists:	pgsql-hackers

From:	Jie Li <jay23jack(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Why is sorting on two columns so slower than sorting on one column?
Date:	2010-12-23 07:33:12
Message-ID:	AANLkTi=VK+kLjEvVdEL7y9P-rPE3OJVzqruR59EPmkj+@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Here is the test table,

The first query sorting on one column:
postgres=# explain analyze select * from big_wf order by age;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
time=11228.155..16427.149 rows=4100000 loops=1)
Sort Key: age
Sort Method: external sort Disk: 72112kB
-> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
(actual time=6.196..4797.620 rows=4100000 loops=1)
Total runtime: 19530.452 ms
(5 rows)

The second query sorting on two columns:
postgres=# explain analyze select * from big_wf order by age,id;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
time=37544.779..48206.702 rows=4100000 loops=1)
Sort Key: age, id
Sort Method: external merge Disk: 72048kB
-> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
(actual time=6.796..5518.663 rows=4100000 loops=1)
Total runtime: 51258.000 ms
(5 rows)

The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
first column(age) of all the tuples are of the same value, so the second
column(id) is always needed for comparison. While the first sorting takes
about only 6 seconds, the second one takes over 30 seconds, Is this too
much than expected? Is there any possible optimization ?

Thanks,
Li Jie

From:	Kenneth Marshall <ktm(at)rice(dot)edu>
To:	Jie Li <jay23jack(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower than sorting on one column?
Date:	2010-12-23 14:03:00
Message-ID:	20101223140300.GJ10252@aart.is.rice.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
> Hi,
>
> Here is the test table,
>
> postgres=# \d big_wf
> Table "public.big_wf"
> Column | Type | Modifiers
> --------+---------+-----------
> age | integer |
> id | integer |
>
> postgres=# \dt+ big_wf
> List of relations
> Schema | Name | Type | Owner | Size | Description
> --------+--------+-------+----------+--------+-------------
> public | big_wf | table | workshop | 142 MB |
>
>
> The first query sorting on one column:
> postgres=# explain analyze select * from big_wf order by age;
> QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------
> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
> time=11228.155..16427.149 rows=4100000 loops=1)
> Sort Key: age
> Sort Method: external sort Disk: 72112kB
> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
> (actual time=6.196..4797.620 rows=4100000 loops=1)
> Total runtime: 19530.452 ms
> (5 rows)
>
> The second query sorting on two columns:
> postgres=# explain analyze select * from big_wf order by age,id;
> QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------
> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
> time=37544.779..48206.702 rows=4100000 loops=1)
> Sort Key: age, id
> Sort Method: external merge Disk: 72048kB
> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
> (actual time=6.796..5518.663 rows=4100000 loops=1)
> Total runtime: 51258.000 ms
> (5 rows)
>
> The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
> first column(age) of all the tuples are of the same value, so the second
> column(id) is always needed for comparison. While the first sorting takes
> about only 6 seconds, the second one takes over 30 seconds, Is this too
> much than expected? Is there any possible optimization ?
>
> Thanks,
> Li Jie

Hi Li,

If I understand your description, in the first query the sort does
not actually have to do anything because the column values for "age"
are all degenerate. In the second query, you actually need to sort
the values which is why it takes longer. If the first column values
are the same, then simply sorting by "id" alone would be faster.
You could also bump up work_mem for the query to perform the sort
in memory.

Regards,
Ken

From:	Marti Raudsepp <marti(at)juffo(dot)org>
To:	Jie Li <jay23jack(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower than sorting on one column?
Date:	2010-12-23 14:17:51
Message-ID:	AANLkTi=ceKpoPDcJGsE4WAkN4dzFqhOQ=zyuyo5eYL3O@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 23, 2010 at 09:33, Jie Li <jay23jack(at)gmail(dot)com> wrote:
> While the first sorting takes
> about only 6 seconds, the second one takes over 30 seconds, Is this too
> much than expected? Is there any possible optimization ?

If you're doing these queries often, you should:
CREATE INDEX ix_big_wf_age_id ON big_wf (age, id)

If that's still not fast enough, you can physically sort rows in the
table using the newly created index:
CLUSTER big_wf USING ix_big_wf_age_id;

Please post back your results. :)

Regards,
Marti

From:	"Li Jie" <jay23jack(at)gmail(dot)com>
To:	"Kenneth Marshall" <ktm(at)rice(dot)edu>
Cc:	"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower than sortingon one column?
Date:	2010-12-23 14:19:46
Message-ID:	000c01cba2ac$757caa10$2ad118ac@A0078508
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Ken,

Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are random.

postgres=# explain analyze select * from big_wf order by id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
Sort Key: id
Sort Method: external merge Disk: 72048kB
-> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000 loops=1)

Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm confused is that, why is additional comparison really so expensive? Does it incur additional I/O? From the cost model, it seems not, all the "cost" are the same (575775.45).

Thanks,
Li Jie

----- Original Message -----
From: "Kenneth Marshall" <ktm(at)rice(dot)edu>
To: "Jie Li" <jay23jack(at)gmail(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Thursday, December 23, 2010 10:03 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sortingon one column?

> On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
>> Hi,
>>
>> Here is the test table,
>>
>> postgres=# \d big_wf
>> Table "public.big_wf"
>> Column | Type | Modifiers
>> --------+---------+-----------
>> age | integer |
>> id | integer |
>>
>> postgres=# \dt+ big_wf
>> List of relations
>> Schema | Name | Type | Owner | Size | Description
>> --------+--------+-------+----------+--------+-------------
>> public | big_wf | table | workshop | 142 MB |
>>
>>
>> The first query sorting on one column:
>> postgres=# explain analyze select * from big_wf order by age;
>> QUERY
>> PLAN
>> -------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
>> time=11228.155..16427.149 rows=4100000 loops=1)
>> Sort Key: age
>> Sort Method: external sort Disk: 72112kB
>> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
>> (actual time=6.196..4797.620 rows=4100000 loops=1)
>> Total runtime: 19530.452 ms
>> (5 rows)
>>
>> The second query sorting on two columns:
>> postgres=# explain analyze select * from big_wf order by age,id;
>> QUERY
>> PLAN
>> -------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
>> time=37544.779..48206.702 rows=4100000 loops=1)
>> Sort Key: age, id
>> Sort Method: external merge Disk: 72048kB
>> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
>> (actual time=6.796..5518.663 rows=4100000 loops=1)
>> Total runtime: 51258.000 ms
>> (5 rows)
>>
>> The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
>> first column(age) of all the tuples are of the same value, so the second
>> column(id) is always needed for comparison. While the first sorting takes
>> about only 6 seconds, the second one takes over 30 seconds, Is this too
>> much than expected? Is there any possible optimization ?
>>
>> Thanks,
>> Li Jie
>
> Hi Li,
>
> If I understand your description, in the first query the sort does
> not actually have to do anything because the column values for "age"
> are all degenerate. In the second query, you actually need to sort
> the values which is why it takes longer. If the first column values
> are the same, then simply sorting by "id" alone would be faster.
> You could also bump up work_mem for the query to perform the sort
> in memory.
>
> Regards,
> Ken
>

From:	"Li Jie" <jay23jack(at)gmail(dot)com>
To:	"Marti Raudsepp" <marti(at)juffo(dot)org>
Cc:	"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower than sorting on one column?
Date:	2010-12-23 14:29:26
Message-ID:	001801cba2ad$cf3b4100$2ad118ac@A0078508
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Marti,

Thanks for your help! I guess I understand what you mean, a clustered index will make sorting as cheap as a seq scan, right?

But what I meant is, is there any potential optimization for the backend implementation? Intuitively, if sorting on one column or two columns will incur the same I/O costs, why should there be so much difference?

Thanks,
Li Jie

----- Original Message -----
From: "Marti Raudsepp" <marti(at)juffo(dot)org>
To: "Jie Li" <jay23jack(at)gmail(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Thursday, December 23, 2010 10:17 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

If you're doing these queries often, you should:
CREATE INDEX ix_big_wf_age_id ON big_wf (age, id)

If that's still not fast enough, you can physically sort rows in the
table using the newly created index:
CLUSTER big_wf USING ix_big_wf_age_id;

Please post back your results. :)

Regards,
Marti

From:	Kenneth Marshall <ktm(at)rice(dot)edu>
To:	Li Jie <jay23jack(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower than sortingon one column?
Date:	2010-12-23 14:30:00
Message-ID:	20101223143000.GK10252@aart.is.rice.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
> Hi Ken,
>
> Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are random.
>
> postgres=# explain analyze select * from big_wf order by id;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
> Sort Key: id
> Sort Method: external merge Disk: 72048kB
> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000 loops=1)
>
> Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm confused is that, why is additional comparison really so expensive? Does it incur additional I/O? From the cost model, it seems not, all the "cost" are the same (575775.45).
>
> Thanks,
> Li Jie

In the first query, the cost is basically the I/O cost to read the
table from disk. The actual sort does not do anything since the
sort values are the same. In the second query, the sort has to
swap things in memory/disk to get them in the correct order for
the result. This actually takes CPU and possibly additional I/O
which is why it is slower. In the case of sorting by just the "id"
column, the size of the sorted values is smaller which would need
fewer batches to complete the sort since the sort is bigger than
the work_mem.

Cheers,
Ken

From:	"Li Jie" <jay23jack(at)gmail(dot)com>
To:	"Kenneth Marshall" <ktm(at)rice(dot)edu>
Cc:	"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower thansortingon one column?
Date:	2010-12-23 14:42:26
Message-ID:	003801cba2af$a04e3e90$2ad118ac@A0078508
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

----- Original Message -----
From: "Kenneth Marshall" <ktm(at)rice(dot)edu>
To: "Li Jie" <jay23jack(at)gmail(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Thursday, December 23, 2010 10:30 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower thansortingon one column?

> On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
>> Hi Ken,
>>
>> Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are random.
>>
>> postgres=# explain analyze select * from big_wf order by id;
>> QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
>> Sort Key: id
>> Sort Method: external merge Disk: 72048kB
>> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000 loops=1)
>>
>> Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm confused is that, why is additional comparison really so expensive? Does it incur additional I/O? From the cost model, it seems not, all the "cost" are the same (575775.45).
>>
>> Thanks,
>> Li Jie
>
> In the first query, the cost is basically the I/O cost to read the
> table from disk. The actual sort does not do anything since the
> sort values are the same. In the second query, the sort has to
> swap things in memory/disk to get them in the correct order for
> the result. This actually takes CPU and possibly additional I/O
> which is why it is slower. In the case of sorting by just the "id"
> column, the size of the sorted values is smaller which would need
> fewer batches to complete the sort since the sort is bigger than
> the work_mem.
>
> Cheers,
> Ken

Hi Ken,

Thanks for your analysis.

But in the last query that sorts on "id", since the query selects all the columns for output, the actual sorted size is the same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the comparison. Am I right?

Thanks,
Li Jie

From:	Kenneth Marshall <ktm(at)rice(dot)edu>
To:	Li Jie <jay23jack(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower thansortingon one column?
Date:	2010-12-23 14:53:05
Message-ID:	20101223145305.GN10252@aart.is.rice.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
> ----- Original Message -----
> From: "Kenneth Marshall" <ktm(at)rice(dot)edu>
> To: "Li Jie" <jay23jack(at)gmail(dot)com>
> Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
> Sent: Thursday, December 23, 2010 10:30 PM
> Subject: Re: [HACKERS] Why is sorting on two columns so slower thansortingon one column?
>
>
> > On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
> >> Hi Ken,
> >>
> >> Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are random.
> >>
> >> postgres=# explain analyze select * from big_wf order by id;
> >> QUERY PLAN
> >> -------------------------------------------------------------------------------------------------------------------------
> >> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
> >> Sort Key: id
> >> Sort Method: external merge Disk: 72048kB
> >> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000 loops=1)
> >>
> >> Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm confused is that, why is additional comparison really so expensive? Does it incur additional I/O? From the cost model, it seems not, all the "cost" are the same (575775.45).
> >>
> >> Thanks,
> >> Li Jie
> >
> > In the first query, the cost is basically the I/O cost to read the
> > table from disk. The actual sort does not do anything since the
> > sort values are the same. In the second query, the sort has to
> > swap things in memory/disk to get them in the correct order for
> > the result. This actually takes CPU and possibly additional I/O
> > which is why it is slower. In the case of sorting by just the "id"
> > column, the size of the sorted values is smaller which would need
> > fewer batches to complete the sort since the sort is bigger than
> > the work_mem.
> >
> > Cheers,
> > Ken
>
> Hi Ken,
>
> Thanks for your analysis.
>
> But in the last query that sorts on "id", since the query selects all the columns for output, the actual sorted size is the same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the comparison. Am I right?
>
> Thanks,
> Li Jie

I think you are right. Sorry for the confusion.

Ken

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Kenneth Marshall <ktm(at)rice(dot)edu>
Cc:	Li Jie <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower thansortingon one column?
Date:	2010-12-23 15:26:57
Message-ID:	23613.1293118017@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
>> But in the last query that sorts on "id", since the query selects all the columns for output, the actual sorted size is the same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the comparison. Am I right?

> I think you are right. Sorry for the confusion.

I doubt the cost of comparing two integers is the issue here; rather
it's more likely one of how many merge passes were needed. You could
find out instead of just speculating by turning on trace_sort and
comparing the log outputs.

regards, tom lane

From:	Jie Li <jay23jack(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower thansortingon one column?
Date:	2010-12-24 05:27:31
Message-ID:	AANLkTik6tPNnOkRO2kzLinLMe=UY3m8nmWZmpqb1envG@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 23, 2010 at 10:26 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> > On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
> >> But in the last query that sorts on "id", since the query selects all
> the columns for output, the actual sorted size is the same, and the only
> difference is the comparison cost. The query sorting on two columns needs to
> do twice the comparison. Am I right?
>
> > I think you are right. Sorry for the confusion.
>
> I doubt the cost of comparing two integers is the issue here; rather
> it's more likely one of how many merge passes were needed. You could
> find out instead of just speculating by turning on trace_sort and
> comparing the log outputs.
>
> regards, tom lane
>

I follow your advice and set the trace_sort, here is the two queries result,
the log outputs looks similar and they seem to use the same number of
tapes. But the time is different. Could you have a look:

postgres=# explain analyze select * from big_wf order by id;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
time=28102.985..39351.309 rows=4100000 loops=1)
Sort Key: id
Sort Method: external merge Disk: 72048kB
-> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) (actual
time=9.190..7262.789 rows=4100000 loops=1)
Total runtime: 42953.855 ms
(5 rows)
STATEMENT: explain analyze select * from big_wf order by id;
LOG: begin tuple sort: nkeys = 1, workMem = 20480, randomAccess = f
STATEMENT: explain analyze select * from big_wf order by id;
LOG: switching to external sort with 74 tapes: CPU 0.29s/0.28u sec elapsed
0.71 sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 1 to tape 0: CPU 0.68s/2.12u sec elapsed 3.41 sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 2 to tape 1: CPU 1.22s/4.24u sec elapsed 6.53 sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 3 to tape 2: CPU 1.67s/6.38u sec elapsed 9.67 sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 4 to tape 3: CPU 2.22s/8.61u sec elapsed 12.93
sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 5 to tape 4: CPU 2.72s/10.80u sec elapsed 16.09
sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 6 to tape 5: CPU 3.23s/12.86u sec elapsed 19.12
sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 7 to tape 6: CPU 3.81s/15.02u sec elapsed 23.84
sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 8 to tape 7: CPU 4.36s/17.13u sec elapsed 27.05
sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: performsort starting: CPU 4.38s/17.20u sec elapsed 27.15 sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing run 9 to tape 8: CPU 4.39s/17.88u sec elapsed 27.97
sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: finished writing final run 10 to tape 9: CPU 4.39s/17.88u sec elapsed
27.97 sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: performsort done (except 10-way final merge): CPU 4.39s/17.98u sec
elapsed 28.10 sec
STATEMENT: explain analyze select * from big_wf order by id;
LOG: external sort ended, 9006 disk blocks used: CPU 8.01s/27.02u sec
elapsed 42.92 sec
STATEMENT: explain analyze select * from big_wf order by id;

postgres=# explain analyze select * from big_wf order by age,id;
Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
time=43709.851..57048.645 rows=4100000 loops=1)
Sort Key: age, id
Sort Method: external merge Disk: 72048kB
-> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) (actual
time=10.090..7075.208 rows=4100000 loops=1)
Total runtime: 60721.824 ms

STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: begin tuple sort: nkeys = 2, workMem = 20480, randomAccess = f
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: switching to external sort with 74 tapes: CPU 0.28s/0.30u sec elapsed
0.67 sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 1 to tape 0: CPU 0.71s/3.63u sec elapsed 4.90 sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 2 to tape 1: CPU 1.30s/7.53u sec elapsed 10.30
sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 3 to tape 2: CPU 1.88s/11.36u sec elapsed 15.39
sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 4 to tape 3: CPU 2.43s/15.20u sec elapsed 20.51
sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 5 to tape 4: CPU 3.05s/18.96u sec elapsed 25.44
sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 6 to tape 5: CPU 3.68s/22.74u sec elapsed 30.47
sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 7 to tape 6: CPU 4.24s/26.63u sec elapsed 36.61
sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 8 to tape 7: CPU 4.78s/30.41u sec elapsed 41.56
sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: performsort starting: CPU 4.81s/30.56u sec elapsed 41.75 sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing run 9 to tape 8: CPU 4.84s/32.07u sec elapsed 43.56
sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: finished writing final run 10 to tape 9: CPU 4.84s/32.07u sec elapsed
43.56 sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: performsort done (except 10-way final merge): CPU 4.85s/32.16u sec
elapsed 43.70 sec
STATEMENT: explain analyze select * from big_wf order by age,id;
LOG: external sort ended, 9006 disk blocks used: CPU 8.60s/41.93u sec
elapsed 60.73 sec
STATEMENT: explain analyze select * from big_wf order by age,id;

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Jie Li <jay23jack(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower thansortingon one column?
Date:	2010-12-27 08:58:36
Message-ID:	1293440316.1193.61988.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2010-12-24 at 00:27 -0500, Jie Li wrote:

> I doubt the cost of comparing two integers is the issue here;
> rather
> it's more likely one of how many merge passes were needed.
> You could
> find out instead of just speculating by turning on trace_sort
> and
> comparing the log outputs.

> postgres=# explain analyze select * from big_wf order by id;

> LOG: switching to external sort with 74 tapes: CPU 0.29s/0.28u sec
> elapsed 0.71 sec
> LOG: external sort ended, 9006 disk blocks used: CPU 8.01s/27.02u
> sec

> elapsed 42.92 sec
> STATEMENT: explain analyze select * from big_wf order by id;

> STATEMENT: explain analyze select * from big_wf order by age,id;

> LOG: begin tuple sort: nkeys = 2, workMem = 20480, randomAccess = f
> STATEMENT: explain analyze select * from big_wf order by age,id;
> LOG: switching to external sort with 74 tapes: CPU 0.28s/0.30u sec
> elapsed 0.67 sec
> LOG: external sort ended, 9006 disk blocks used: CPU 8.60s/41.93u sec
> elapsed 60.73 sec
> STATEMENT: explain analyze select * from big_wf order by age,id;

I think the answer is that only the first column comparison is
optimised. Second and subsequent comparisons are not optimised.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Jie Li <jay23jack(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why is sorting on two columns so slower thansortingon one column?
Date:	2010-12-29 12:15:52
Message-ID:	AANLkTim+r8c92DDepkotBe1jgTDPz9c-oeA6V80XbYHw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 27, 2010 at 3:58 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I think the answer is that only the first column comparison is
> optimised. Second and subsequent comparisons are not optimised.

What sort of optimization are you referring to here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company