Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Date: 2013-07-01 16:30:38
Message-ID: CAEZATCWgD_jYY0ARXTNTLbP7FSDsfahc4y3WtSOhgXo0ajPnyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1 July 2013 01:44, David Fetter <david(at)fetter(dot)org> wrote:
> On Fri, Jun 28, 2013 at 09:22:52PM +0100, Dean Rasheed wrote:
>> On 21 June 2013 06:16, David Fetter <david(at)fetter(dot)org> wrote:
>> > Please find attached a patch which allows subqueries in the FILTER
>> > clause and adds regression testing for same.
>> >
>>
>> This needs re-basing/merging following Robert's recent commit to make
>> OVER unreserved.
>
> Please find attached. Thanks, Andrew Gierth! In this one, FILTER is
> no longer a reserved word.
>

Looking at this patch again, it appears to be in pretty good shape.

- Applies cleanly to head.
- Compiles with no warnings.
- Includes regression test cases and doc updates.
- Compatible with the relevant part of T612, "Advanced OLAP operations".
- Includes pg_dump support.
- Code changes all look reasonable, and I can't find any corner cases
that have been missed.
- Appears to work as expected. I tested everything I could think of
and couldn't break it.

AFAICT all the bases have been covered. As mentioned upthread, I would
have preferred a few more regression test cases, and a couple of the
tests don't appear to return anything interesting, but I'll leave that
for the committer to decide whether they're sufficient for regression
tests.

I have a few suggestions to improve the docs:

1). In syntax.sgml: "The aggregate_name can also be suffixed with
FILTER as described below". It's not really a suffix to the aggregate
name, since it follows the function arguments and optional order by
clause. Perhaps it would be more consistent with the surrounding text
to say something like

<replaceable>expression</replaceable> is
any value expression that does not itself contain an aggregate
expression or a window function call, and
! <replaceable>order_by_clause</replaceable> and
! <replaceable>filter_clause</replaceable> are optional
! <literal>ORDER BY</> and <literal>FILTER</> clauses as described below.

2). In syntax.sgml: "... or when a FILTER clause is present, each row
matching same". In the context of that paragraph this suggests that
the filter clause only applies to the first form, since that paragraph
is a description of the 4 forms of the aggregate function. I don't
think it's worth mentioning FILTER in this paragraph at all --- it's
adequately described below that.

3). In syntax.sgml: "Adding a FILTER clause to an aggregate specifies
which values of the expression being aggregated to evaluate". How
about something a little more specific, along the lines of

If <literal>FILTER</> is specified, then only input rows for which
the <replaceable>filter_clause</replaceable> evaluates to true are
fed to the aggregate function; input rows for which the
<replaceable>filter_clause</replaceable> evaluates to false or the
null value are discarded. For example...

4). In select.sgml: "In the absence of a FILTER clause, aggregate
functions...". It doesn't seem right to refer to the FILTER clause at
the top level here because it's not part of the SELECT syntax being
described on this page. Also I think this should include a
cross-reference to the aggregate function syntax section, perhaps
something like:

Aggregate functions, if any are used, are computed across all rows
making up each group, producing a separate value for each group
(whereas without <literal>GROUP BY</literal>, an aggregate
produces a single value computed across all the selected rows).
+ The set of rows fed to the aggregate function can be further filtered
+ by attaching a <literal>FILTER</literal> clause to the aggregate
+ function call, see <xref ...> for more information.
When <literal>GROUP BY</literal> is present, it is not valid for
the <command>SELECT</command> list expressions to refer to
ungrouped columns except within aggregate functions or if the
ungrouped column is functionally dependent on the grouped columns,
since there would otherwise be more than one possible value to
return for an ungrouped column. A functional dependency exists if
the grouped columns (or a subset thereof) are the primary key of
the table containing the ungrouped column.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2013-07-01 16:32:17 Re: Randomisation for ensuring nlogn complexity in quicksort
Previous Message Bernd Helmle 2013-07-01 15:34:24 Re: Passing fdw_private data from PlanForeignScan to PlanForeignModify