Re: Optimizer on sort aggregate

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: fengttt(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimizer on sort aggregate
Date: 2014-10-17 23:35:16
Message-ID: 20141018.083516.694943555601559481.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> The query,
> select count(distinct j) from t group by t, i;
>
> runs for 35 seconds. However, if I change the query to,
> select count(distinct j) from t group by i, t; -- note switching the
> ordering
> select count(distinct j) from t group by decode(t, 'escape'), i; -- convert
> t to bytea
>
> Run times are just about 5 and 6.5 seconds. The reason is clear, compare a
> string with collation is slow, which is well understood by pg hackers.
> However, here, the sorting order is forced by the planner, not user.
> Planner can do the following optimizations,

Interesting. I got following result:

test=# explain analyze select count(distinct j) from t group by t, i;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=137519.84..157519.84 rows=1000000 width=22) (actual time=1332.937..2431.238 rows=1000000 loops=1)
Group Key: t, i
-> Sort (cost=137519.84..140019.84 rows=1000000 width=22) (actual time=1332.922..1507.413 rows=1000000 loops=1)
Sort Key: t, i
Sort Method: external merge Disk: 33232kB
-> Seq Scan on t (cost=0.00..17352.00 rows=1000000 width=22) (actual time=0.006..131.406 rows=1000000 loops=1)
Planning time: 0.031 ms
Execution time: 2484.271 ms
(8 rows)

Time: 2484.520 ms

test=# explain analyze select count(distinct j) from t group by i, t;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=137519.84..157519.84 rows=1000000 width=22) (actual time=602.510..1632.087 rows=1000000 loops=1)
Group Key: i, t
-> Sort (cost=137519.84..140019.84 rows=1000000 width=22) (actual time=602.493..703.274 rows=1000000 loops=1)
Sort Key: i, t
Sort Method: external sort Disk: 33240kB
-> Seq Scan on t (cost=0.00..17352.00 rows=1000000 width=22) (actual time=0.014..129.213 rows=1000000 loops=1)
Planning time: 0.176 ms
Execution time: 1685.575 ms
(8 rows)

Time: 1687.641 ms

Not so big difference here (maybe because I use SSD) but there is
still about 50% difference in execution time. Note that I disable
locale support.

> 1. for the sort we generated for sort agg, planner can switch column
> ordering, put int before string,
> 2. for the sort we generated for sort agg, use bytea compare instead of
> string compare.
>
> They will bring big improvement to this common query. Is this something
> reasonable?

Not looking at sort and planner code closely, I guess planner does
not recognize the cost difference between "external merge" and
"external sort" because cost estimations for sort step in each plan
are exactly same (137519.84..140019.84). However I am not sure if we
should fix the planner or should fix our sort machinery since it is
possible that the sort machinery should not expose such a big
difference in the two sort methods.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2014-10-17 23:36:30 Re: Optimizer on sort aggregate
Previous Message Marko Tiikkaja 2014-10-17 22:49:54 Re: get_actual_variable_range vs idx_scan/idx_tup_fetch