Re: Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]

From: David Fetter <david(at)fetter(dot)org>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Date: 2013-07-15 18:43:04
Message-ID: 20130715184304.GA3520@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 14, 2013 at 10:15:12PM -0400, Noah Misch wrote:
> On Sun, Jul 07, 2013 at 06:37:26PM -0700, David Fetter wrote:
> > On Sat, Jul 06, 2013 at 11:49:21AM +0100, Dean Rasheed wrote:
> > > Overall I think this patch offers useful additional functionality, in
> > > compliance with the SQL spec, which should be handy to simplify
> > > complex grouping queries.
>
> As I understand this feature, it is syntactic sugar for the typical case of an
> aggregate with a strict transition function. For example, "min(x) FILTER
> (WHERE y > 0)" is rigorously equivalent to "min(CASE y > 0 THEN x END)".
> Every SQL aggregate is strict (ISO 9075-2 section 4.15.4), so for standard
> queries it is *only* syntactic sugar. In PostgreSQL, it lets you do novel
> things with, for example, array_agg(). Is that accurate?
>
> > > I think this is ready for committer.
>
> The patch was thorough. I updated applicable comments, revisited some
> cosmetic choices, and made these functional changes:
>
> 1. The pg_stat_statements "query jumble" should incorporate the filter.
>
> 2. The patch did not update costing. I made it add the cost of the filter
> expression the same way we add the cost of the argument expressions. This
> makes "min(x) FILTER (WHERE y > 0)" match "min(case y > 0 THEN x end)" in
> terms of cost, which is a fair start. At some point, we could do better by
> reducing the argument cost by the filter selectivity.
>
>
> A few choices/standard interpretations may deserve discussion. The patch
> makes filter clauses contribute to the subquery level chosen to be the
> "aggregation query". This is observable through the behavior of these two
> standard-conforming queries:
>
> select (select count(outer_c)
> from (values (1)) t0(inner_c))
> from (values (2),(3)) t1(outer_c); -- outer query is aggregation query
> select (select count(outer_c) filter (where inner_c <> 0)
> from (values (1)) t0(inner_c))
> from (values (2),(3)) t1(outer_c); -- inner query is aggregation query
>
> I believe SQL (ISO 9075-2 section 6.9 SR 3,4,6) does require this. Since that
> still isn't crystal-clear to me, I mention it in case anyone has a different
> reading.
>
> Distinct from that, albeit in a similar vein, SQL does not permit outer
> references in a filter clause. This patch permits them; I think that
> qualifies as a reasonable PostgreSQL extension.
>
> > --- a/doc/src/sgml/keywords.sgml
> > +++ b/doc/src/sgml/keywords.sgml
>
> > @@ -3200,7 +3200,7 @@
> > </row>
> > <row>
> > <entry><token>OVER</token></entry>
> > - <entry>reserved (can be function or type)</entry>
> > + <entry>non-reserved</entry>
>
> I committed this one-line correction separately.
>
> > --- a/src/backend/optimizer/plan/planagg.c
> > +++ b/src/backend/optimizer/plan/planagg.c
> > @@ -314,7 +314,7 @@ find_minmax_aggs_walker(Node *node, List **context)
> > ListCell *l;
> >
> > Assert(aggref->agglevelsup == 0);
> > - if (list_length(aggref->args) != 1 || aggref->aggorder != NIL)
> > + if (list_length(aggref->args) != 1 || aggref->aggorder != NIL || aggref->agg_filter != NULL)
> > return true; /* it couldn't be MIN/MAX */
> > /* note: we do not care if DISTINCT is mentioned ... */
>
> I twitched upon reading this, because neither ORDER BY nor FILTER preclude the
> aggregate being MIN or MAX. Perhaps Andrew can explain why he put aggorder
> there back in 2009. All I can figure is that writing max(c ORDER BY x) is so
> unlikely that we'd too often waste the next syscache lookup. But the same
> argument would apply to DISTINCT. With FILTER, the rationale is entirely
> different. The aggregate could well be MIN/MAX; we just haven't implemented
> the necessary support elsewhere in this file.
>
>
> See attached patch revisions. The first patch edits find_minmax_aggs_walker()
> per my comments just now. The second is an update of your FILTER patch with
> the changes to which I alluded above; it applies atop the first patch. Would
> you verify that I didn't ruin anything? Barring problems, will commit.
>
> Are you the sole named author of this patch? That's what the CF page says,
> but that wasn't fully clear to me from the list discussions.
>
> Thanks,
> nm

Tested your changes. They pass regression, etc. :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nicholas White 2013-07-15 18:45:10 Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
Previous Message Robert Haas 2013-07-15 18:29:46 Re: tab-completion for \lo_import