Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

Lists: pgsql-hackers
From: David Rowley <dgrowley(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, thomas(dot)mayer(at)student(dot)kit(dot)edu
Subject: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-04-13 11:32:25
Message-ID: CAHoyFK8b4HpSF+NeyiZjcr96nKLaOC7MpgHy5hnymxDVKP6juw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On this thread
http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu there
was some discussion around allowing push downs of quals that happen to be
in every window clause of the sub query. I've quickly put together a patch
which does this (see attached)

I'm posting this just mainly to let Thomas know that I'm working on it, per
his request on the other thread.

The patch seems to work with all my test cases, and I've not quite gotten
around to thinking of any more good cases to throw at it.

Oh and I know that my function var_exists_in_all_query_partition_by_clauses
has no business in allpaths.c, I'll move it out as soon as I find a better
home for it.

Here's my test case:

drop table if exists winagg;

create table winagg (
id serial not null primary key,
partid int not null
);

insert into winagg (partid) select x.x % 100000 from
generate_series(1,2000000) x(x);

create index winagg_partid_idx on winagg(partid);

-- Should push: this should push WHERE partid=1 to the inner query as
partid is in the only parition by clause in the query.
explain analyze select partid,n from (select partid,count(*) over
(partition by partid) n from winagg) winagg where partid=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=4.58..82.23 rows=20 width=4) (actual time=0.196..0.207
rows=20 loops=1)
-> Bitmap Heap Scan on winagg (cost=4.58..81.98 rows=20 width=4)
(actual time=0.102..0.170 rows=20 loops=1)
Recheck Cond: (partid = 1)
Heap Blocks: exact=20
-> Bitmap Index Scan on winagg_partid_idx (cost=0.00..4.58
rows=20 width=0) (actual time=0.084..0.084 rows=20 loops=1)
Index Cond: (partid = 1)
Planning time: 0.208 ms
Total runtime: 0.276 ms
(8 rows)

-- Should not push: Added a +0 to partition by clause.
explain analyze select partid,n from (select partid,count(*) over
(partition by partid + 0) n from winagg) winagg where partid=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on winagg (cost=265511.19..330511.19 rows=20 width=12)
(actual time=2146.642..4257.267 rows=20 loops=1)
Filter: (winagg.partid = 1)
Rows Removed by Filter: 1999980
-> WindowAgg (cost=265511.19..305511.19 rows=2000000 width=4) (actual
time=2146.614..4099.169 rows=2000000 loops=1)
-> Sort (cost=265511.19..270511.19 rows=2000000 width=4) (actual
time=2146.587..2994.993 rows=2000000 loops=1)
Sort Key: ((winagg_1.partid + 0))
Sort Method: external merge Disk: 35136kB
-> Seq Scan on winagg winagg_1 (cost=0.00..28850.00
rows=2000000 width=4) (actual time=0.025..418.306 rows=2000000 loops=1)
Planning time: 0.249 ms
Total runtime: 4263.933 ms
(10 rows)

-- Should not push: Add a window clause (which is not used) that has a
partition by clause that does not have partid
explain analyze select partid,n from (select partid,count(*) over
(partition by partid) n from winagg window stopPushDown as (partition by
id)) winagg where partid=1;

-- Should not push: 1 window clause does not have partid
explain analyze select partid,n from (select partid,count(*) over
(partition by partid) n from winagg window stopPushDown as (order id))
winagg where partid=1;

-- Should not push: 1 window clause does not have partid
explain analyze select partid,n from (select partid,count(*) over
(partition by partid) n from winagg window stopPushDown as ()) winagg where
partid=1;

As of now the patch is a couple of hours old, I've not even bothered to run
the regression tests yet, let alone add any new ones.

Comments are welcome...

Regards

David Rowley

Attachment Content-Type Size
wfunc_pushdown_paritionby_v0.1.patch application/octet-stream 3.2 KB

From: Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
To: David Rowley <dgrowley(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-04-13 14:50:24
Message-ID: 534AA430.9020105@student.kit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello David,

thanks for your work. The results look promising.

What I'm missing is a test case with multiple fields in the partition by
clauses:

-- should push down, because partid is part of all PARTITION BY clauses
explain analyze select partid,n,m from (
select partid,
count(*) over (partition by partid) n,
count(*) over (partition by partid, partid+0) m
from winagg
) winagg
where partid=1;

current production 9.3.4 is returning


QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on winagg (cost=350955.11..420955.11 rows=20 width=20)
(actual time=2564.360..3802.413 rows=20 loops=1)
Filter: (winagg.partid = 1)
Rows Removed by Filter: 1999980
-> WindowAgg (cost=350955.11..395955.11 rows=2000000 width=4)
(actual time=2564.332..3657.051 rows=2000000 loops=1)
-> Sort (cost=350955.11..355955.11 rows=2000000 width=4)
(actual time=2564.320..2802.444 rows=2000000 loops=1)
Sort Key: winagg_1.partid, ((winagg_1.partid + 0))
Sort Method: external sort Disk: 50840kB
-> WindowAgg (cost=0.43..86948.43 rows=2000000
width=4) (actual time=0.084..1335.081 rows=2000000 loops=1)
-> Index Only Scan using winagg_partid_idx on
winagg winagg_1 (cost=0.43..51948.43 rows=2000000 width=4) (actual
time=0.051..378.232 rows=2000000 loops=1)
Heap Fetches: 0

"Index Only Scan" currently returns all rows (without pushdown) on
current production 9.3.4. What happens with the patch you provided?

-- Already Part of your tests:
-- should NOT push down, because partid is NOT part of all PARTITION BY
clauses
explain analyze select partid,n,m from (
select partid,
count(*) over (partition by partid) n,
count(*) over (partition by partid+0) m
from winagg
) winagg
where partid=1;

Reordering the fields should also be tested:
-- should push down, because partid is part of all PARTITION BY clauses
-- here: partid at the end
explain analyze select partid,n,m from (
select partid,
count(*) over (partition by partid) n,
count(*) over (partition by partid+0, partid) m
from winagg
) winagg
where partid=1;

-- should push down, because partid is part of all PARTITION BY clauses
-- here: partid in the middle
explain analyze select partid,n,m from (
select partid,
count(*) over (partition by partid) n,
count(*) over (partition by partid+0, partid, partid+1) m
from winagg
) winagg
where partid=1;

Best regards
Thomas

Am 13.04.2014 13:32, schrieb David Rowley:
> On this thread
> http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu
> there was some discussion around allowing push downs of quals that
> happen to be in every window clause of the sub query. I've quickly put
> together a patch which does this (see attached)
>
> I'm posting this just mainly to let Thomas know that I'm working on it,
> per his request on the other thread.
>
> The patch seems to work with all my test cases, and I've not quite
> gotten around to thinking of any more good cases to throw at it.
>
> Oh and I know that my
> function var_exists_in_all_query_partition_by_clauses has no business in
> allpaths.c, I'll move it out as soon as I find a better home for it.
>
> Here's my test case:
>
> drop table if exists winagg;
>
> create table winagg (
> id serial not null primary key,
> partid int not null
> );
>
> insert into winagg (partid) select x.x % 100000 from
> generate_series(1,2000000) x(x);
>
>
> create index winagg_partid_idx on winagg(partid);
>
>
> -- Should push: this should push WHERE partid=1 to the inner query as
> partid is in the only parition by clause in the query.
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg) winagg where partid=1;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
> WindowAgg (cost=4.58..82.23 rows=20 width=4) (actual
> time=0.196..0.207 rows=20 loops=1)
> -> Bitmap Heap Scan on winagg (cost=4.58..81.98 rows=20 width=4)
> (actual time=0.102..0.170 rows=20 loops=1)
> Recheck Cond: (partid = 1)
> Heap Blocks: exact=20
> -> Bitmap Index Scan on winagg_partid_idx (cost=0.00..4.58
> rows=20 width=0) (actual time=0.084..0.084 rows=20 loops=1)
> Index Cond: (partid = 1)
> Planning time: 0.208 ms
> Total runtime: 0.276 ms
> (8 rows)
>
> -- Should not push: Added a +0 to partition by clause.
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid + 0) n from winagg) winagg where partid=1;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------
> Subquery Scan on winagg (cost=265511.19..330511.19 rows=20 width=12)
> (actual time=2146.642..4257.267 rows=20 loops=1)
> Filter: (winagg.partid = 1)
> Rows Removed by Filter: 1999980
> -> WindowAgg (cost=265511.19..305511.19 rows=2000000 width=4)
> (actual time=2146.614..4099.169 rows=2000000 loops=1)
> -> Sort (cost=265511.19..270511.19 rows=2000000 width=4)
> (actual time=2146.587..2994.993 rows=2000000 loops=1)
> Sort Key: ((winagg_1.partid + 0))
> Sort Method: external merge Disk: 35136kB
> -> Seq Scan on winagg winagg_1 (cost=0.00..28850.00
> rows=2000000 width=4) (actual time=0.025..418.306 rows=2000000 loops=1)
> Planning time: 0.249 ms
> Total runtime: 4263.933 ms
> (10 rows)
>
>
> -- Should not push: Add a window clause (which is not used) that has a
> partition by clause that does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as (partition by
> id)) winagg where partid=1;
>
> -- Should not push: 1 window clause does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as (order id))
> winagg where partid=1;
>
> -- Should not push: 1 window clause does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as ()) winagg
> where partid=1;
>
> As of now the patch is a couple of hours old, I've not even bothered to
> run the regression tests yet, let alone add any new ones.
>
> Comments are welcome...
>
> Regards
>
> David Rowley
>

--
======================================
Thomas Mayer
Durlacher Allee 61
D-76131 Karlsruhe
Telefon: +49-721-2081661
Fax: +49-721-72380001
Mobil: +49-174-2152332
E-Mail: thomas(dot)mayer(at)student(dot)kit(dot)edu
=======================================


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowley(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, thomas(dot)mayer(at)student(dot)kit(dot)edu
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-04-13 15:31:41
Message-ID: 32204.1397403101@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Rowley <dgrowley(at)gmail(dot)com> writes:
> On this thread
> http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu there
> was some discussion around allowing push downs of quals that happen to be
> in every window clause of the sub query. I've quickly put together a patch
> which does this (see attached)

I think you should have check_output_expressions deal with this, instead.
Compare the existing test there for non-DISTINCT output columns.

> Oh and I know that my function var_exists_in_all_query_partition_by_clauses
> has no business in allpaths.c, I'll move it out as soon as I find a better
> home for it.

I might be wrong, but I think you could just embed that search loop in
check_output_expressions, and it wouldn't be too ugly.

regards, tom lane


From: David Rowley <dgrowley(at)gmail(dot)com>
To: Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-04-14 11:03:31
Message-ID: CAHoyFK9c8m16opDnAKnKHU6ahdDcXsLefjYVCb_wW7ksbO8rww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14 April 2014 02:50, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu> wrote:

> Hello David,
>
> thanks for your work. The results look promising.
>

Thanks

>
> What I'm missing is a test case with multiple fields in the partition by
> clauses:
>
>
I've modified the patch and added some regression tests that I think cover
all of your cases, but please let me know if I've missed any. The patch
will follow very soon.

> -- should push down, because partid is part of all PARTITION BY clauses
> explain analyze select partid,n,m from (
> select partid,
> count(*) over (partition by partid) n,
> count(*) over (partition by partid, partid+0) m
> from winagg
> ) winagg
> where partid=1;
>
> current production 9.3.4 is returning
>
>
> QUERY PLAN
> ------------------------------------------------------------
> ------------------------------------------------------------
> ----------------------------------------------------------
> Subquery Scan on winagg (cost=350955.11..420955.11 rows=20 width=20)
> (actual time=2564.360..3802.413 rows=20 loops=1)
>
> Filter: (winagg.partid = 1)
> Rows Removed by Filter: 1999980
> -> WindowAgg (cost=350955.11..395955.11 rows=2000000 width=4) (actual
> time=2564.332..3657.051 rows=2000000 loops=1)
> -> Sort (cost=350955.11..355955.11 rows=2000000 width=4)
> (actual time=2564.320..2802.444 rows=2000000 loops=1)
> Sort Key: winagg_1.partid, ((winagg_1.partid + 0))
> Sort Method: external sort Disk: 50840kB
> -> WindowAgg (cost=0.43..86948.43 rows=2000000 width=4)
> (actual time=0.084..1335.081 rows=2000000 loops=1)
> -> Index Only Scan using winagg_partid_idx on winagg
> winagg_1 (cost=0.43..51948.43 rows=2000000 width=4) (actual
> time=0.051..378.232 rows=2000000 loops=1)
> Heap Fetches: 0
>
> "Index Only Scan" currently returns all rows (without pushdown) on current
> production 9.3.4. What happens with the patch you provided?
>
>
I get a push down as expected.

QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on winagg (cost=82.71..83.31 rows=20 width=20) (actual
time=0.168..0.179 rows=20 loops=1)
-> WindowAgg (cost=82.71..83.11 rows=20 width=4) (actual
time=0.166..0.174 rows=20 loops=1)
-> Sort (cost=82.71..82.76 rows=20 width=4) (actual
time=0.151..0.154 rows=20 loops=1)
Sort Key: ((winagg_1.partid + 0))
Sort Method: quicksort Memory: 17kB
-> WindowAgg (cost=4.58..82.28 rows=20 width=4) (actual
time=0.127..0.135 rows=20 loops=1)
-> Bitmap Heap Scan on winagg winagg_1
(cost=4.58..81.98 rows=20 width=4) (actual time=0.058..0.104 rows=20
loops=1)
Recheck Cond: (partid = 1)
Heap Blocks: exact=20
-> Bitmap Index Scan on winagg_partid_idx
(cost=0.00..4.58 rows=20 width=0) (actual time=0.037..0.037 rows=20
loops=1)
Index Cond: (partid = 1)
Planning time: 0.235 ms
Total runtime: 0.280 ms

> -- Already Part of your tests:
> -- should NOT push down, because partid is NOT part of all PARTITION BY
> clauses
> explain analyze select partid,n,m from (
> select partid,
> count(*) over (partition by partid) n,
> count(*) over (partition by partid+0) m
> from winagg
> ) winagg
> where partid=1;
>
> Reordering the fields should also be tested:
> -- should push down, because partid is part of all PARTITION BY clauses
> -- here: partid at the end
> explain analyze select partid,n,m from (
> select partid,
> count(*) over (partition by partid) n,
> count(*) over (partition by partid+0, partid) m
> from winagg
> ) winagg
> where partid=1;
>
>
Covered in regression and works as expected.

> -- should push down, because partid is part of all PARTITION BY clauses
> -- here: partid in the middle
> explain analyze select partid,n,m from (
> select partid,
> count(*) over (partition by partid) n,
> count(*) over (partition by partid+0, partid, partid+1) m
> from winagg
> ) winagg
> where partid=1;
>
>
I covered this in the regression tests too.

Regards

David Rowley


From: David Rowley <dgrowley(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-04-14 11:19:40
Message-ID: CAHoyFK9ihoSarntWc-NJ5tPHko4Wcausd-1C_0wEcogi9UEKTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14 April 2014 03:31, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Rowley <dgrowley(at)gmail(dot)com> writes:
> > On this thread
> > http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.eduthere
> > was some discussion around allowing push downs of quals that happen to be
> > in every window clause of the sub query. I've quickly put together a
> patch
> > which does this (see attached)
>
> I think you should have check_output_expressions deal with this, instead.
> Compare the existing test there for non-DISTINCT output columns.
>
>
I've moved the code there and it seems like a much better place for it.
Thanks for the tip.

> > Oh and I know that my function
> var_exists_in_all_query_partition_by_clauses
> > has no business in allpaths.c, I'll move it out as soon as I find a
> better
> > home for it.
>
> I might be wrong, but I think you could just embed that search loop in
> check_output_expressions, and it wouldn't be too ugly.
>
>
I've changed the helper function to take the TargetEntry now the same
as targetIsInSortList does for the hasDistinctOn test and I renamed the
helper function to targetExistsInAllQueryPartitionByClauses. It seems much
better, but there's probably a bit of room for improving on the names some
more.

I've included the updated patch with some regression tests.
The final test I added tests that an unused window which would, if used,
cause the pushdown not to take place still stops the pushdownf from
happening... I know you both talked about this case in the other thread,
but I've done nothing for it as Tom mentioned that this should likely be
fixed elsewhere, if it's even worth worrying about at all. I'm not quite
sure if I needed to bother including this test for it, but I did anyway, it
can be removed if it is deemed unneeded.

Per Thomas' comments, I added a couple of tests that ensure that the order
of the columns in the partition by clause don't change the behaviour.
Looking at the implementation of the actual code makes this test seems a
bit unneeded as really but I thought that the test should not assume
anything about the implementation so from that point of view the extra test
seems like a good idea.

For now I can't think of much else to do for the patch, but I'd really
appreciate it Thomas if you could run it through the paces if you can find
any time for it, or maybe just give my regression tests a once over to see
if you can think of any other cases that should be covered.

Regards

David Rowley

Attachment Content-Type Size
wfunc_pushdown_partitionby_v0.2.patch application/octet-stream 14.3 KB

From: Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
To: David Rowley <dgrowley(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-05-25 13:10:36
Message-ID: 5381EBCC.9020908@student.kit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello David,

sorry for the late response. I will try out your changes from the view
of a user in mid-June. However, I can't do a trustworthy code review as
I'm not an experienced postgre-hacker (yet).

Best Regards
Thomas

Am 14.04.2014 13:19, schrieb David Rowley:
> On 14 April 2014 03:31, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
>
> David Rowley <dgrowley(at)gmail(dot)com <mailto:dgrowley(at)gmail(dot)com>> writes:
> > On this thread
> >
> http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu there
> > was some discussion around allowing push downs of quals that
> happen to be
> > in every window clause of the sub query. I've quickly put
> together a patch
> > which does this (see attached)
>
> I think you should have check_output_expressions deal with this,
> instead.
> Compare the existing test there for non-DISTINCT output columns.
>
>
> I've moved the code there and it seems like a much better place for it.
> Thanks for the tip.
>
> > Oh and I know that my function
> var_exists_in_all_query_partition_by_clauses
> > has no business in allpaths.c, I'll move it out as soon as I find
> a better
> > home for it.
>
> I might be wrong, but I think you could just embed that search loop in
> check_output_expressions, and it wouldn't be too ugly.
>
>
> I've changed the helper function to take the TargetEntry now the same
> as targetIsInSortList does for the hasDistinctOn test and I renamed the
> helper function to targetExistsInAllQueryPartitionByClauses. It seems
> much better, but there's probably a bit of room for improving on the
> names some more.
>
> I've included the updated patch with some regression tests.
> The final test I added tests that an unused window which would, if used,
> cause the pushdown not to take place still stops the pushdownf from
> happening... I know you both talked about this case in the other thread,
> but I've done nothing for it as Tom mentioned that this should likely be
> fixed elsewhere, if it's even worth worrying about at all. I'm not quite
> sure if I needed to bother including this test for it, but I did anyway,
> it can be removed if it is deemed unneeded.
>
> Per Thomas' comments, I added a couple of tests that ensure that the
> order of the columns in the partition by clause don't change the
> behaviour. Looking at the implementation of the actual code makes this
> test seems a bit unneeded as really but I thought that the test should
> not assume anything about the implementation so from that point of view
> the extra test seems like a good idea.
>
> For now I can't think of much else to do for the patch, but I'd really
> appreciate it Thomas if you could run it through the paces if you can
> find any time for it, or maybe just give my regression tests a once over
> to see if you can think of any other cases that should be covered.
>
> Regards
>
> David Rowley

--
======================================
Thomas Mayer
Durlacher Allee 61
D-76131 Karlsruhe
Telefon: +49-721-2081661
Fax: +49-721-72380001
Mobil: +49-174-2152332
E-Mail: thomas(dot)mayer(at)student(dot)kit(dot)edu
=======================================


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Cc: David Rowley <dgrowley(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-17 09:27:14
Message-ID: CAApHDvpB-BFSfFoMDuM-EEUYzByDRH0mCma+m33p+SxuX2MxrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, May 25, 2014 at 2:10 PM, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
wrote:

> Hello David,
>
> sorry for the late response. I will try out your changes from the view of
> a user in mid-June. However, I can't do a trustworthy code review as I'm
> not an experienced postgre-hacker (yet).
>
>
Thanks, that sort of review will be a great start.

I've attached an updated patch as it seems there's been a change that
removes redundant sorts which was breaking one of the regression tests in
v0.2 of the patch.

+EXPLAIN (COSTS OFF)
+SELECT depname,depsalary FROM (SELECT depname,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ rank() OVER (ORDER BY enroll_date) emprank
+ FROM empsalary) emp
+WHERE depname = 'sales';

This used to produced a plan which included a sort by enroll_date, but this
is no longer the case. I've updated the expected results to remove the
extra sort from the explain output

Regards

David Rowley

Attachment Content-Type Size
wfunc_pushdown_partitionby_v0.3.patch application/octet-stream 14.1 KB

From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: David Rowley <dgrowley(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-21 00:38:52
Message-ID: 53A4D41C.6030105@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've had a chance to look at this and here is my review.

On 04/14/2014 01:19 PM, David Rowley wrote:
> I've included the updated patch with some regression tests.

The first thing I noticed is there is no documentation, but I don't
think we document such things outside of the release notes, so that's
probably normal.

> The final test I added tests that an unused window which would, if used,
> cause the pushdown not to take place still stops the pushdownf from
> happening... I know you both talked about this case in the other thread,
> but I've done nothing for it as Tom mentioned that this should likely be
> fixed elsewhere, if it's even worth worrying about at all. I'm not quite
> sure if I needed to bother including this test for it, but I did anyway,
> it can be removed if it is deemed unneeded.

I don't think this test has any business being in this patch. Removal
of unused "things" should be a separate patch and once that's done your
patch should work as expected. This regression test you have here
almost demands that that other removal patch shouldn't happen.

> Per Thomas' comments, I added a couple of tests that ensure that the
> order of the columns in the partition by clause don't change the
> behaviour. Looking at the implementation of the actual code makes this
> test seems a bit unneeded as really but I thought that the test should
> not assume anything about the implementation so from that point of view
> the extra test seems like a good idea.

Every possible permutation of everything is overkill, but I think having
a test that makes sure the pushdown happens when the partition in
question isn't first in the list is a good thing.

> For now I can't think of much else to do for the patch, but I'd really
> appreciate it Thomas if you could run it through the paces if you can
> find any time for it, or maybe just give my regression tests a once over
> to see if you can think of any other cases that should be covered.

I'm not Thomas, but...

This patch is very simple. There's nothing wrong with that, of course,
but I wonder how hard it would be to push more stuff down. For example,
you have this case which works perfectly by not pushing down:

SELECT * FROM (
SELECT partid,
winagg() OVER (PARTITION BY partid+0)
FROM t) wa
WHERE partid = 1;

But then you have this case which doesn't push down:

SELECT * FROM (
SELECT partid,
winagg() OVER (PARTITION BY partid+0)
FROM t) wa
WHERE partid+0 = 1;

I don't know what's involved in fixing that, or if we do that sort of
thing elsewhere, but it's something to consider.

Multi-column pushdowns work as expected.

I have an issue with some of the code comments:

In subquery_is_pushdown_safe you reduced the number of "points" from
three to two. The comments used to say "Check point 1" and "Check point
2" but the third was kind of tested throughout the rest of the code so
didn't have a dedicated comment. Now that there are only two checks,
the "Check point 1" without a 2 seems a little bit odd. The attached
patch provides a suggestion for improvement.

The same kind of weirdness is found in check_output_expressions but I
don't really have any suggestions to fix it so I just left it.

The attached patch also fixes some typos.

I'm marking this Waiting on Author pending discussion on pushing down
entire expressions, but on the whole I think this is pretty much ready.
--
Vik

Attachment Content-Type Size
wfunc_pushdown_partitionby_v0.3b.patch text/x-diff 1.5 KB

From: David Rowley <dgrowley(at)gmail(dot)com>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-21 12:03:18
Message-ID: CAHoyFK8hCd0Kodsh3jnEuUR9SmR4fQG4U2Sjrf5h18fKBG6jbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21 June 2014 01:38, Vik Fearing <vik(dot)fearing(at)dalibo(dot)com> wrote:

> I've had a chance to look at this and here is my review.
>
> On 04/14/2014 01:19 PM, David Rowley wrote:
> > I've included the updated patch with some regression tests.
>
> The first thing I noticed is there is no documentation, but I don't
> think we document such things outside of the release notes, so that's
> probably normal.
>
>
This prompted me to have a look to see if there's anything written in the
documents about the reasons why we might not push quals down, but I didn't
manage to find anything to that affect.

> > The final test I added tests that an unused window which would, if used,
> > cause the pushdown not to take place still stops the pushdownf from
> > happening... I know you both talked about this case in the other thread,
> > but I've done nothing for it as Tom mentioned that this should likely be
> > fixed elsewhere, if it's even worth worrying about at all. I'm not quite
> > sure if I needed to bother including this test for it, but I did anyway,
> > it can be removed if it is deemed unneeded.
>
> I don't think this test has any business being in this patch. Removal
> of unused "things" should be a separate patch and once that's done your
> patch should work as expected. This regression test you have here
> almost demands that that other removal patch shouldn't happen.
>
>
Agreed, I've removed that test now.

> > Per Thomas' comments, I added a couple of tests that ensure that the
> > order of the columns in the partition by clause don't change the
> > behaviour. Looking at the implementation of the actual code makes this
> > test seems a bit unneeded as really but I thought that the test should
> > not assume anything about the implementation so from that point of view
> > the extra test seems like a good idea.
>
> Every possible permutation of everything is overkill, but I think having
> a test that makes sure the pushdown happens when the partition in
> question isn't first in the list is a good thing.
>
>
In the updated patch I've removed some a few extra tests that were just
testing the same clauses in the PARTITION BY but in a different order.

> > For now I can't think of much else to do for the patch, but I'd really
> > appreciate it Thomas if you could run it through the paces if you can
> > find any time for it, or maybe just give my regression tests a once over
> > to see if you can think of any other cases that should be covered.
>
> I'm not Thomas, but...
>
> This patch is very simple. There's nothing wrong with that, of course,
> but I wonder how hard it would be to push more stuff down. For example,
> you have this case which works perfectly by not pushing down:
>
> SELECT * FROM (
> SELECT partid,
> winagg() OVER (PARTITION BY partid+0)
> FROM t) wa
> WHERE partid = 1;
>
> But then you have this case which doesn't push down:
>
> SELECT * FROM (
> SELECT partid,
> winagg() OVER (PARTITION BY partid+0)
> FROM t) wa
> WHERE partid+0 = 1;
>
> I don't know what's involved in fixing that, or if we do that sort of
> thing elsewhere, but it's something to consider.
>
>
I had a look at this and at first I thought it was just as simple as
removing the if (tle->resjunk) continue; from check_output_expressions, but
after removing that I see that it's a bit more complex.
In qual_is_pushdown_safe it pulls (using pull_var_clause) the Vars from the
outer WHERE clause and not the whole expression, it then checks, in your
case if "partid" (not partid+0) is safe to push down, and sees that it's
not, due to this patches code marking it as not because partid is not in
the PARTITION BY clause.

I really don't think it's the business of this patch to make changes around
here. Perhaps it would be worth looking into in more detail in the future.
Although, you can probably get what you want by just writing the query as:

SELECT * FROM (
SELECT partid+0 as partid,
count(*) OVER (PARTITION BY partid+0)
FROM t) wa
WHERE partid = 1;

Or if you really need the unmodified partid, then you could add another
column to the target list and just not do select * in the outer query.

Multi-column pushdowns work as expected.
>
>
> I have an issue with some of the code comments:
>
> In subquery_is_pushdown_safe you reduced the number of "points" from
> three to two. The comments used to say "Check point 1" and "Check point
> 2" but the third was kind of tested throughout the rest of the code so
> didn't have a dedicated comment. Now that there are only two checks,
> the "Check point 1" without a 2 seems a little bit odd. The attached
> patch provides a suggestion for improvement.
>
> The same kind of weirdness is found in check_output_expressions but I
> don't really have any suggestions to fix it so I just left it.
>
> The attached patch also fixes some typos.
>
>
That seems like a good change, oh and well spotted on the typo.
I've applied your patch into my local copy of the code here. Thanks

>
> I'm marking this Waiting on Author pending discussion on pushing down
> entire expressions, but on the whole I think this is pretty much ready.
>

As I said above, I don't think playing around with that code is really work
for this patch. It can be done later in another patch if people think it
would be useful.

I've attached a delta patch with just the changes from v0.3 and a complete
updated patch.

Thanks for taking the time to look at this.

Regards

David Rowley

> --
> Vik
>

Attachment Content-Type Size
wfunc_pushdown_partitionby_v0.4.patch application/octet-stream 10.6 KB
wfunc_pushdown_partitionby_v0.4_delta.patch application/octet-stream 7.3 KB

From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: David Rowley <dgrowley(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-26 23:01:22
Message-ID: 53ACA642.8040804@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/21/2014 02:03 PM, David Rowley wrote:
> I'm marking this Waiting on Author pending discussion on pushing down
> entire expressions, but on the whole I think this is pretty much ready.
>
>
> As I said above, I don't think playing around with that code is really
> work for this patch. It can be done later in another patch if people
> think it would be useful.

I tend to agree.

This latest patch is ready for a committer to look at now. The weird
comments have been changed, superfluous regression tests removed, and
nothing done about expression pushdown per (brief) discussion.
--
Vik


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: David Rowley <dgrowley(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-27 00:49:14
Message-ID: 4123.1403830154@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Vik Fearing <vik(dot)fearing(at)dalibo(dot)com> writes:
> This latest patch is ready for a committer to look at now. The weird
> comments have been changed, superfluous regression tests removed, and
> nothing done about expression pushdown per (brief) discussion.

I started to look at this patch and realized that there's an issue that
isn't covered, which is not too surprising because the existing code fails
to cover it too. Remember that the argument for pushing down being safe
at all is that we expect the pushed-down qual to yield the same result at
all rows of a given partition, so that we either include or exclude the
whole partition and thereby don't change window function results. This
means that not only must the qual expression depend only on partitioning
columns, but *it had better not be volatile*.

In exactly the same way, it isn't safe to push down quals into
subqueries that use DISTINCT unless the quals are non-volatile. This
consideration is missed by the current code, and I think that's a bug.

(Pushing down volatile quals would also be unsafe in subqueries involving
aggregation, except that we put them into HAVING so that they're executed
only once per subquery output row anyway.)

Given the lack of prior complaints, I'm not excited about back-patching a
change to prevent pushing down volatile quals in the presence of DISTINCT;
but I think we probably ought to fix it in 9.5, and maybe 9.4 too.

Thoughts?

regards, tom lane


From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowley(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-27 01:31:03
Message-ID: 53ACC957.8010409@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/27/2014 02:49 AM, Tom Lane wrote:
> Vik Fearing <vik(dot)fearing(at)dalibo(dot)com> writes:
>> This latest patch is ready for a committer to look at now. The weird
>> comments have been changed, superfluous regression tests removed, and
>> nothing done about expression pushdown per (brief) discussion.
>
> I started to look at this patch and realized that there's an issue that
> isn't covered, which is not too surprising because the existing code fails
> to cover it too. Remember that the argument for pushing down being safe
> at all is that we expect the pushed-down qual to yield the same result at
> all rows of a given partition, so that we either include or exclude the
> whole partition and thereby don't change window function results. This
> means that not only must the qual expression depend only on partitioning
> columns, but *it had better not be volatile*.
>
> In exactly the same way, it isn't safe to push down quals into
> subqueries that use DISTINCT unless the quals are non-volatile. This
> consideration is missed by the current code, and I think that's a bug.
>
> (Pushing down volatile quals would also be unsafe in subqueries involving
> aggregation, except that we put them into HAVING so that they're executed
> only once per subquery output row anyway.)

Are you going to take care of all this, or should David or I take a
crack at it? The commitfest app still shows Ready for Committer.

> Given the lack of prior complaints, I'm not excited about back-patching a
> change to prevent pushing down volatile quals in the presence of DISTINCT;
> but I think we probably ought to fix it in 9.5, and maybe 9.4 too.
>
> Thoughts?

I didn't test it myself, I'm just taking your word on it.

If it's a bug, it should obviously be fixed in 9.5. As for 9.4, I have
always viewed a beta as a time to fix bugs so I vote to backpatch it at
least that far.

Why wouldn't it go back all the way to 9.0? (assuming 8.4 is dead)
--
Vik


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: David Rowley <dgrowley(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-27 01:35:11
Message-ID: 4570.1403832911@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Vik Fearing <vik(dot)fearing(at)dalibo(dot)com> writes:
> On 06/27/2014 02:49 AM, Tom Lane wrote:
>> In exactly the same way, it isn't safe to push down quals into
>> subqueries that use DISTINCT unless the quals are non-volatile. This
>> consideration is missed by the current code, and I think that's a bug.

>> Given the lack of prior complaints, I'm not excited about back-patching a
>> change to prevent pushing down volatile quals in the presence of DISTINCT;
>> but I think we probably ought to fix it in 9.5, and maybe 9.4 too.

> Why wouldn't it go back all the way to 9.0? (assuming 8.4 is dead)

People get unhappy when minor releases de-optimize queries that had
been working for them. It's not too late to change the behavior of
9.4, but I'm hesitant to do it in already-released branches, especially
in the absence of any complaints from the field.

> Are you going to take care of all this, or should David or I take a
> crack at it? The commitfest app still shows Ready for Committer.

I can deal with it.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>, David Rowley <dgrowley(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-27 01:55:00
Message-ID: 20140627015500.GR16098@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > Why wouldn't it go back all the way to 9.0? (assuming 8.4 is dead)
>
> People get unhappy when minor releases de-optimize queries that had
> been working for them. It's not too late to change the behavior of
> 9.4, but I'm hesitant to do it in already-released branches, especially
> in the absence of any complaints from the field.

Agreed. Back-patching to released versions would likely end up making
for a nasty surprise in a minor point release for some users.

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowley(at)gmail(dot)com>
Cc: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-28 06:12:57
Message-ID: 50687.1403935977@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Rowley <dgrowley(at)gmail(dot)com> writes:
> [ wfunc_pushdown_partitionby_v0.4.patch ]

I've committed this with the addition of a volatility check and some
other basically-cosmetic adjustments.

regards, tom lane


From: David Rowley <dgrowley(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
Subject: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Date: 2014-06-28 09:50:50
Message-ID: CAHoyFK8if_V0xy71ymoZQ4enODqBTT26XD7oZ0FxJVhEucNu6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28 June 2014 18:12, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Rowley <dgrowley(at)gmail(dot)com> writes:
> > [ wfunc_pushdown_partitionby_v0.4.patch ]
>
> I've committed this with the addition of a volatility check and some
> other basically-cosmetic adjustments.
>
>
Great, thank you for making the required changes.

Vik, thank you for reviewing the patch.

Regards

David Rowley

regards, tom lane
>