Re: Final Patch for GROUPING SETS - unrecognized node type: 347

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-07 16:52:21
Message-ID: 87egvn8k0e.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:

>> As for computing it all twice, there's currently no attempt to
>> optimize multiple identical grouping sets into multiple
>> projections of a single grouping set result. CUBE(a,b,c,a) has
>> twice as many grouping sets as CUBE(a,b,c) does, even though all
>> the extra ones are duplicates.

Tomas> Shouldn't this be solved by eliminating the excessive
Tomas> ChainAggregate? Although it probably changes GROUPING(...),
Tomas> so it's not just about removing the duplicate column(s) from
Tomas> the CUBE.

Eliminating the excess ChainAggregate would not change the number of
grouping sets, only where they are computed.
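To make the counting concrete, here is a rough sketch (not code from the patch) that expands CUBE into one grouping set per subset of argument positions and removes duplicate columns only within each set. The helper name cube_grouping_sets is made up for illustration, and the expansion rule is assumed to follow the description in this thread.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Expand CUBE(columns) into grouping sets (hypothetical helper).

    One grouping set is produced per subset of argument *positions*, so a
    repeated column name yields repeated grouping sets.
    """
    n = len(columns)
    sets = []
    for size in range(n, -1, -1):
        for positions in combinations(range(n), size):
            # Duplicate columns inside a single set are harmless and can be
            # dropped; dict.fromkeys keeps the first occurrence of each name.
            sets.append(tuple(dict.fromkeys(columns[i] for i in positions)))
    return sets


three = cube_grouping_sets(["a", "b", "c"])
four = cube_grouping_sets(["a", "b", "c", "a"])

print(len(three))                          # 8
print(len(four))                           # 16
print(len({frozenset(s) for s in four}))   # 8 distinct sets among the 16
```

Under these assumptions CUBE(a,b,c,a) yields 16 grouping sets, of which only 8 are distinct; the other 8 are the duplicates under discussion.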

Tomas> Maybe preventing this completely (i.e. raising an ERROR with
Tomas> "duplicate columns in CUBE/ROLLUP/... clauses") would be
Tomas> appropriate. Does the standard say anything about this?

The spec does not say anything explicitly about duplicates, so they
are allowed. Duplicate grouping _sets_ can't be removed; only
duplicate columns within a single GROUP BY clause can be, once the
grouping sets have been eliminated by transformation. I have checked
my reading of the spec against Oracle 11 and MSSQL using sqlfiddle.

The way the spec handles grouping sets is to define a sequence of
syntactic transforms that result in a query which is a UNION ALL of
ordinary GROUP BY queries. (We haven't tried to implement the
additional optional feature of GROUP BY DISTINCT.) Since it's UNION
ALL, any duplicates must be preserved, so a query with GROUPING SETS
((a),(a)) reduces to:

SELECT ... GROUP BY a UNION ALL SELECT ... GROUP BY a;

and therefore has duplicates of all its result rows.
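The doubling can be seen with any engine that supports UNION ALL. The sketch below uses an in-memory SQLite database through Python's sqlite3 module purely as a stand-in; the table t, its columns, and the sample rows are invented for the example, and SQLite itself does not implement GROUPING SETS, so only the transformed form is run.

```python
import sqlite3

# Hypothetical sample data: table t with grouping column a and value v.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (1, 20), (2, 5)])

# An ordinary GROUP BY a: one row per group.
single = conn.execute("SELECT a, sum(v) FROM t GROUP BY a").fetchall()

# The transformed form of GROUPING SETS ((a),(a)): UNION ALL keeps duplicates.
doubled = conn.execute(
    "SELECT a, sum(v) FROM t GROUP BY a "
    "UNION ALL "
    "SELECT a, sum(v) FROM t GROUP BY a"
).fetchall()

print(sorted(single))   # [(1, 30), (2, 5)]
print(sorted(doubled))  # [(1, 30), (1, 30), (2, 5), (2, 5)]
```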

I'm quite prepared to concede that I may have read the spec wrong
(wouldn't be the first time), but in this case I require any such
claim to be backed up by an example from some other db showing an
actual difference in behavior.

--
Andrew (irc:RhodiumToad)
