Re: HAVING push-down

Lists: pgsql-hackers
From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: HAVING push-down
Date: 2007-01-26 13:53:32
Message-ID: 1169819613.3772.329.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've just read a paper that says PostgreSQL doesn't do this. My reading
of the code is that we *do* evaluate the HAVING clause prior to
calculating the aggregates for it. I thought I'd check to resolve the
confusion.

- - -

If not, it seems fairly straightforward to push down some or all of a
HAVING clause so that the qual clause is tested prior to aggregation,
not after aggregation. This could, for certain queries, significantly
reduce the amount of effort that the final Agg node performs.

We might think about deeper push-down within the query, but since the
Agg node already has the havingQual, it seems a straightforward act to
decide whether to apply it before or after the aggregation.

We already do find_unaggregated_cols(), so little additional analysis
seems required.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HAVING push-down
Date: 2007-01-26 15:22:05
Message-ID: 87zm85ak4i.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:

> I've just read a paper that says PostgreSQL doesn't do this. My reading
> of the code is that we *do* evaluate the HAVING clause prior to
> calculating the aggregates for it. I thought I'd check to resolve the
> confusion.
>
> - - -
>
> If not, it seems fairly straightforward to push down some or all of a
> HAVING clause so that the qual clause is tested prior to aggregation,
> not after aggregation. This could, for certain queries, significantly
> reduce the amount of effort that the final Agg node performs.

You mean in cases like this?

postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1;
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..13.61 rows=1 width=12)
-> Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12)
Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
(3 rows)

I think we push having clauses into WHERE clauses whenever there are no
aggregates in them.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HAVING push-down
Date: 2007-01-26 15:30:53
Message-ID: 1169825453.3772.367.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2007-01-26 at 15:22 +0000, Gregory Stark wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>
> > I've just read a paper that says PostgreSQL doesn't do this. My reading
> > of the code is that we *do* evaluate the HAVING clause prior to
> > calculating the aggregates for it. I thought I'd check to resolve the
> > confusion.
> >

> You mean in cases like this?
>
> postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1;
> QUERY PLAN
> ------------------------------------------------------------------------------------
> GroupAggregate (cost=0.00..13.61 rows=1 width=12)
> -> Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12)
> Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
> (3 rows)

OK, thanks. I'll feedback to the author of the paper I was reviewing.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: HAVING push-down
Date: 2007-01-26 15:46:59
Message-ID: 45BA2273.8000800@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Fri, 2007-01-26 at 15:22 +0000, Gregory Stark wrote:
>> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>>
>>> I've just read a paper that says PostgreSQL doesn't do this. My reading
>>> of the code is that we *do* evaluate the HAVING clause prior to
>>> calculating the aggregates for it. I thought I'd check to resolve the
>>> confusion.
>>>
>
>> You mean in cases like this?
>>
>> postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1;
>> QUERY PLAN
>> ------------------------------------------------------------------------------------
>> GroupAggregate (cost=0.00..13.61 rows=1 width=12)
>> -> Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12)
>> Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
>> (3 rows)
>
> OK, thanks. I'll feedback to the author of the paper I was reviewing.
>

Care to share the paper in general? It might be beneficial for all of us.

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: HAVING push-down
Date: 2007-01-26 16:16:23
Message-ID: 24658.1169828183@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> I've just read a paper that says PostgreSQL doesn't do this.

What does he mean by that exactly, and which PG version is he looking
at? As Greg notes, we do know how to push down non-aggregated
conditions, but I'm not sure that's what he's thinking of. There have
been some relevant bug fixes, eg

2004-07-10 14:39 tgl

* src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test
HAVING condition before computing targetlist of an Aggregate node.
This is required by SQL spec to avoid failures in cases like
SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) >
0; AFAICT we have gotten this wrong since day one. Kudos to Holger
Jakobs for being the first to notice.

Also, it's still true that we run all the aggregate transition functions
in parallel, so if you were hoping to use HAVING on an aggregate
condition to prevent an overflow or something in the state accumulation
function for a targetlist aggregate, you'd lose. But I don't see any
way to avoid that without scanning the data twice, which we're surely
not gonna do.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HAVING push-down
Date: 2007-01-26 16:21:04
Message-ID: 1169828465.3772.376.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2007-01-26 at 11:16 -0500, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > I've just read a paper that says PostgreSQL doesn't do this.
>
> What does he mean by that exactly, and which PG version is he looking
> at? As Greg notes, we do know how to push down non-aggregated
> conditions, but I'm not sure that's what he's thinking of.

Yes, it was specifically non-aggregated conditions.

> There have
> been some relevant bug fixes, eg
>
> 2004-07-10 14:39 tgl
>
> * src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test
> HAVING condition before computing targetlist of an Aggregate node.
> This is required by SQL spec to avoid failures in cases like
> SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) >
> 0; AFAICT we have gotten this wrong since day one. Kudos to Holger
> Jakobs for being the first to notice.
>
> Also, it's still true that we run all the aggregate transition functions
> in parallel, so if you were hoping to use HAVING on an aggregate
> condition to prevent an overflow or something in the state accumulation
> function for a targetlist aggregate, you'd lose. But I don't see any
> way to avoid that without scanning the data twice, which we're surely
> not gonna do.

I'll send you the paper off-line, there's some more interesting stuff
also. p.12

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HAVING push-down
Date: 2007-01-26 17:07:37
Message-ID: 1169831257.3772.397.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2007-01-26 at 07:46 -0800, Joshua D. Drake wrote:

> Care to share the paper in general? It might be beneficial for all of us.

I'll ask the author, but don't expect an immediate response.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com