Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

From: Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
To: David Rowley <dgrowley(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-04-13 14:50:24
Message-ID: 534AA430.9020105@student.kit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello David,

thanks for your work. The results look promising.

What I'm missing is a test case with multiple fields in the partition by
clauses:

-- should push down, because partid is part of all PARTITION BY clauses
explain analyze select partid,n,m from (
select partid,
count(*) over (partition by partid) n,
count(*) over (partition by partid, partid+0) m
from winagg
) winagg
where partid=1;

current production 9.3.4 is returning


QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on winagg (cost=350955.11..420955.11 rows=20 width=20)
(actual time=2564.360..3802.413 rows=20 loops=1)
Filter: (winagg.partid = 1)
Rows Removed by Filter: 1999980
-> WindowAgg (cost=350955.11..395955.11 rows=2000000 width=4)
(actual time=2564.332..3657.051 rows=2000000 loops=1)
-> Sort (cost=350955.11..355955.11 rows=2000000 width=4)
(actual time=2564.320..2802.444 rows=2000000 loops=1)
Sort Key: winagg_1.partid, ((winagg_1.partid + 0))
Sort Method: external sort Disk: 50840kB
-> WindowAgg (cost=0.43..86948.43 rows=2000000
width=4) (actual time=0.084..1335.081 rows=2000000 loops=1)
-> Index Only Scan using winagg_partid_idx on
winagg winagg_1 (cost=0.43..51948.43 rows=2000000 width=4) (actual
time=0.051..378.232 rows=2000000 loops=1)
Heap Fetches: 0

"Index Only Scan" currently returns all rows (without pushdown) on
current production 9.3.4. What happens with the patch you provided?

-- Already Part of your tests:
-- should NOT push down, because partid is NOT part of all PARTITION BY
clauses
explain analyze select partid,n,m from (
select partid,
count(*) over (partition by partid) n,
count(*) over (partition by partid+0) m
from winagg
) winagg
where partid=1;

Reordering the fields should also be tested:
-- should push down, because partid is part of all PARTITION BY clauses
-- here: partid at the end
explain analyze select partid,n,m from (
select partid,
count(*) over (partition by partid) n,
count(*) over (partition by partid+0, partid) m
from winagg
) winagg
where partid=1;

-- should push down, because partid is part of all PARTITION BY clauses
-- here: partid in the middle
explain analyze select partid,n,m from (
select partid,
count(*) over (partition by partid) n,
count(*) over (partition by partid+0, partid, partid+1) m
from winagg
) winagg
where partid=1;

Best regards
Thomas

Am 13.04.2014 13:32, schrieb David Rowley:
> On this thread
> http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu
> there was some discussion around allowing push downs of quals that
> happen to be in every window clause of the sub query. I've quickly put
> together a patch which does this (see attached)
>
> I'm posting this just mainly to let Thomas know that I'm working on it,
> per his request on the other thread.
>
> The patch seems to work with all my test cases, and I've not quite
> gotten around to thinking of any more good cases to throw at it.
>
> Oh and I know that my
> function var_exists_in_all_query_partition_by_clauses has no business in
> allpaths.c, I'll move it out as soon as I find a better home for it.
>
> Here's my test case:
>
> drop table if exists winagg;
>
> create table winagg (
> id serial not null primary key,
> partid int not null
> );
>
> insert into winagg (partid) select x.x % 100000 from
> generate_series(1,2000000) x(x);
>
>
> create index winagg_partid_idx on winagg(partid);
>
>
> -- Should push: this should push WHERE partid=1 to the inner query as
> partid is in the only parition by clause in the query.
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg) winagg where partid=1;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
> WindowAgg (cost=4.58..82.23 rows=20 width=4) (actual
> time=0.196..0.207 rows=20 loops=1)
> -> Bitmap Heap Scan on winagg (cost=4.58..81.98 rows=20 width=4)
> (actual time=0.102..0.170 rows=20 loops=1)
> Recheck Cond: (partid = 1)
> Heap Blocks: exact=20
> -> Bitmap Index Scan on winagg_partid_idx (cost=0.00..4.58
> rows=20 width=0) (actual time=0.084..0.084 rows=20 loops=1)
> Index Cond: (partid = 1)
> Planning time: 0.208 ms
> Total runtime: 0.276 ms
> (8 rows)
>
> -- Should not push: Added a +0 to partition by clause.
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid + 0) n from winagg) winagg where partid=1;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------
> Subquery Scan on winagg (cost=265511.19..330511.19 rows=20 width=12)
> (actual time=2146.642..4257.267 rows=20 loops=1)
> Filter: (winagg.partid = 1)
> Rows Removed by Filter: 1999980
> -> WindowAgg (cost=265511.19..305511.19 rows=2000000 width=4)
> (actual time=2146.614..4099.169 rows=2000000 loops=1)
> -> Sort (cost=265511.19..270511.19 rows=2000000 width=4)
> (actual time=2146.587..2994.993 rows=2000000 loops=1)
> Sort Key: ((winagg_1.partid + 0))
> Sort Method: external merge Disk: 35136kB
> -> Seq Scan on winagg winagg_1 (cost=0.00..28850.00
> rows=2000000 width=4) (actual time=0.025..418.306 rows=2000000 loops=1)
> Planning time: 0.249 ms
> Total runtime: 4263.933 ms
> (10 rows)
>
>
> -- Should not push: Add a window clause (which is not used) that has a
> partition by clause that does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as (partition by
> id)) winagg where partid=1;
>
> -- Should not push: 1 window clause does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as (order id))
> winagg where partid=1;
>
> -- Should not push: 1 window clause does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as ()) winagg
> where partid=1;
>
> As of now the patch is a couple of hours old, I've not even bothered to
> run the regression tests yet, let alone add any new ones.
>
> Comments are welcome...
>
> Regards
>
> David Rowley
>

--
======================================
Thomas Mayer
Durlacher Allee 61
D-76131 Karlsruhe
Telefon: +49-721-2081661
Fax: +49-721-72380001
Mobil: +49-174-2152332
E-Mail: thomas(dot)mayer(at)student(dot)kit(dot)edu
=======================================

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-04-13 15:01:35 Re: Patch to fix a couple of compiler warnings from 80a5cf64
Previous Message Tom Lane 2014-04-13 14:49:51 Re: integrate pg_upgrade analyze_new_cluster.sh into vacuumdb