Re: Final Patch for GROUPING SETS

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-11 02:19:00
Message-ID: 87tx13t9y6.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

More comment on this later, but I want to highlight these specific
points since we need clear answers here to avoid wasting unnecessary
time and effort:

Tom> I've not spent any real effort looking at gsp2.patch yet, but it
Tom> seems even worse off comment-wise: if there's any explanation in
Tom> there at all of what a "chained aggregate" is, I didn't find it.

(Maybe "stacked" would have been a better term.)

What that code does is produce plans that look like this:

GroupAggregate
-> Sort
-> ChainAggregate
-> Sort
-> ChainAggregate

in much the same way that WindowAgg nodes are generated.

Where would you consider the best place to comment this? The WindowAgg
equivalent seems to be discussed primarily in the header comment of
nodeWindowAgg.c.

Tom> I'd also counsel you to find some other way to do it than
Tom> putting bool chain_head fields in Aggref nodes;

There are no chain_head fields in Aggref nodes.

Agg.chain_head is true for the Agg node at the top of the chain (the
GroupAggregate node in the above example), while AggState.chain_head
is set on the ChainAggregate nodes to point to the AggState of the
GroupAggregate node.

What we need to know before doing any further work on this is whether
this idea of stacking up aggregate and sort nodes is a viable one.

(The feedback I've had so far suggests that the performance is
acceptable, even if there are still optimization opportunities that
can be tackled later, like adding HashAggregate support.)

Tom> I took a quick look at gsp-u.patch. It seems like that approach
Tom> should work, with of course the caveat that using CUBE/ROLLUP as
Tom> function names in a GROUP BY list would be problematic. I'm not
Tom> convinced by the commentary in ruleutils.c suggesting that extra
Tom> parentheses would help disambiguate: aren't extra parentheses
Tom> still going to contain grouping specs according to the standard?

The spec is of minimal help here since it does not allow expressions in
GROUP BY at all, last I looked; only column references.

The extra parens do actually disambiguate because CUBE(x) and
(CUBE(x)) are not equivalent anywhere; while CUBE(x) can appear inside
GROUPING SETS (...), it cannot appear inside a (...) list nested inside
a GROUPING SETS list (or anywhere else).

As the comments in gram.y explain, the productions used are intended
to follow the spec with the exception of using a_expr where the spec
requires <ordinary grouping set>. So CUBE and ROLLUP are recognized as
special only as part of a group_by_item (<grouping element> in the
spec), and as soon as we see a paren that isn't part of the "GROUPING
SETS (" opener, we're forced into parsing an a_expr, in which CUBE()
would become a function call.

(The case of upgrading from an old pg version seems to require the use
of --quote-all-identifiers in pg_dump)

Tom> Forcibly schema-qualifying such function names seems like a less
Tom> fragile answer on that end.

That I guess would require keeping more state, unless you applied it
everywhere to any function with a keyword for a name? I dunno.

The question that needs deciding here is less whether the approach
_could_ work but whether we _want_ it. The objection has been made
that we are in effect introducing a new category of "unreserved almost
everywhere" keyword, which I think has a point; on the other hand,
reserving CUBE is a seriously painful prospect.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-12-11 02:44:40 Re: double vacuum in initdb
Previous Message Peter Eisentraut 2014-12-11 01:58:16 Re: Fix typo um vacuumdb tests