Why percent_rank is so slower than rank?

From: Jie Li <jay23jack(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Why percent_rank is so slower than rank?
Date: 2010-12-09 07:26:09
Message-ID: AANLkTi=Jxs_mG4kJuxEWLdKvwREHYBeSA40AYnXqYhtq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

I'm new to window functions. Recently I run some simple queries but
surprised to find percent_rank is so slower than rank, could anybody tell me
why?

The table schema:
test=# \d inventory1
Table "public.inventory1"
Column | Type | Modifiers
----------------------+---------+-----------
inv_date_sk | integer | not null
inv_item_sk | integer | not null
inv_warehouse_sk | integer | not null
inv_quantity_on_hand | integer |

test=# \dt+ inventory1
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------+-------+----------+---------+-------------
public | inventory1 | table | workshop | 8880 kB |

The rank query result:
test=# explain analyze select inv_date_sk,inv_item_sk, rank()over(partition
by inv_date_sk order by inv_item_sk) from inventory1;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual
time=631.947..1361.158 rows=189000 loops=1)
-> Sort (cost=19563.99..20036.49 rows=189000 width=8) (actual
time=631.924..771.990 rows=189000 loops=1)
Sort Key: inv_date_sk, inv_item_sk
Sort Method: quicksort Memory: 12218kB
-> Seq Scan on inventory1 (cost=0.00..3000.00 rows=189000
width=8) (actual time=0.055..198.948 rows=189000 loops=1)
Total runtime: 1500.193 ms
(6 rows)

The percent_rank result:
test=# explain analyze select inv_date_sk,inv_item_sk,
percent_rank()over(partition by inv_date_sk order by inv_item_sk) from
inventory1;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual
time=766.432..32924.804 rows=189000 loops=1)
-> Sort (cost=19563.99..20036.49 rows=189000 width=8) (actual
time=756.320..905.407 rows=189000 loops=1)
Sort Key: inv_date_sk, inv_item_sk
Sort Method: quicksort Memory: 12218kB
-> Seq Scan on inventory1 (cost=0.00..3000.00 rows=189000
width=8) (actual time=0.102..224.607 rows=189000 loops=1)
Total runtime: 33152.188 ms
(6 rows)

One special thing is that all the values of the partition key(inv_date_sk)
are the same, that is, there is only one window partition. I find that
percent_rank needs to buffer all the tuples to get the total number of rows.
But why is it so expensive?

I use 8.4.4. And I only increase the work_mem to 100M and leave other
parameters untouched.

Thanks,
Li Jie

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2010-12-09 07:52:11 Re: Hot Standby btree delete records and vacuum_defer_cleanup_age
Previous Message Jeff Janes 2010-12-09 05:44:13 Re: BufFreelistLock