Re: Final Patch for GROUPING SETS

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-01 01:17:52
Message-ID: 20150501011752.GA3416161@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 30, 2015 at 05:35:26AM +0100, Andrew Gierth wrote:
> >>>>> "Andres" == Andres Freund <andres(at)anarazel(dot)de> writes:

> >> + * TODO: AGG_HASHED doesn't support multiple grouping sets yet.
>
> Andres> Are you intending to resolve this before an eventual commit?
>
> Original plan was to tackle AGG_HASHED after a working implementation
> was committed;

+1 for that plan.

> Andres> Possibly after the 'structural' issues are resolved? Or do you
> Andres> think this can safely be put of for another release?
>
> I think the feature is useful even without AGG_HASHED, even though that
> means it can sometimes be beaten on performance by using UNION ALL of
> many separate GROUP BYs; but I'd defer to the opinions of others on that
> point.

It will be a tough call, and PostgreSQL has gone each way on some recent
feature. I recommend considering both GroupAggregate and HashAggregate in all
design discussion but continuing to work toward a first commit implementing
GroupAggregate alone. With that in the tree, we'll be in a better position to
decide whether to release a feature paused at that stage in its development.
Critical facts are uncertain, so a discussion today would be unproductive.

> Andres> So, having quickly scanned through the patch, do I understand
> Andres> correctly that the contentious problems are:
>
> Andres> * Arguably this converts the execution *tree* into a DAG. Tom
> Andres> seems to be rather uncomfortable with that. I am wondering
> Andres> whether this really is a big deal - essentially this only
> Andres> happens in a relatively 'isolated' part of the tree right?
> Andres> I.e. if those chained together nodes were considered one node,
> Andres> there would not be any loops? Additionally, the way
> Andres> parametrized scans works already essentially "violates" the
> Andres> tree paradigma somewhat.

I agree with your assessment. That has been contentious.

> The major downsides as I see them with the current approach are:
>
> 1. It makes plans (and hence explain output) nest very deeply if you
> have complex grouping sets (especially cubes with high dimensionality).
>
> This can make explain very slow in the most extreme cases

I'm not worried about that. If anything, the response is to modify explain to
more-quickly/compactly present affected plan trees.

> 2. A union-based approach would have a chance of including AGG_HASHED
> support without any significant code changes,

> -> CTE x
> -> entire input subplan here
> -> Append
> -> GroupAggregate
> -> Sort
> -> CTE Scan x
> -> GroupAggregate
> -> Sort
> -> CTE Scan x
> -> HashAggregate
> -> CTE Scan x
> [...]

This uses 50-67% more I/O than the current strategy, which makes it a dead end
from my standpoint. Details:
http://www.postgresql.org/message-id/20141221210005.GA1864976@tornado.leadboat.com

> Andres> Are those the two bigger controversial areas? Or are there
> Andres> others in your respective views?

> Another controversial item was the introduction of GroupedVar.

I know of no additional controversies to add to this list.

Thanks,
nm

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2015-05-01 01:20:15 Re: PATCH: adaptive ndistinct estimator v4
Previous Message Bruce Momjian 2015-05-01 01:11:39 Re: proposal: disallow operator "=>" and use it for named parameters