Re: Final Patch for GROUPING SETS

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-31 08:58:45
Message-ID: 20141231085845.GA2148306@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 23, 2014 at 02:29:58AM -0500, Noah Misch wrote:
> On Mon, Dec 22, 2014 at 10:46:16AM -0500, Tom Lane wrote:
> > I still find the ChainAggregate approach too ugly at a system structural
> > level to accept, regardless of Noah's argument about number of I/O cycles
> > consumed. We'll be paying for that in complexity and bugs into the
> > indefinite future, and I wonder if it isn't going to foreclose some other
> > "performance opportunities" as well.
>
> Among GROUPING SETS GroupAggregate implementations, I bet there's a nonempty
> intersection between those having maintainable design and those having optimal
> I/O usage, optimal memory usage, and optimal number of sorts. Let's put more
> effort into finding it. I'm hearing that the shared tuplestore is
> ChainAggregate's principal threat to system structure; is that right?

The underlying algorithm, if naively expressed in terms of our executor
concepts, would call ExecProcNode() on a SortState, then feed the resulting
slot to both a GroupAggregate and to another Sort. That implies a non-tree
graph of executor nodes, which isn't going to fly anytime soon. The CTE
approach bypasses that problem by eliminating cooperation between sorts,
instead reading 2N+1 copies of the source data for N sorts. ChainAggregate is
a bit like a node having two parents, a Sort and a GroupAggregate. However,
the graph edge between ChainAggregate and its GroupAggregate is a tuplestore
instead of the usual, synchronous ExecProcNode().

Suppose one node orchestrated all sorting and aggregation. Call it a
MultiGroupAggregate for now. It wouldn't harness Sort nodes, because it
performs aggregation between tuplesort_puttupleslot() calls. Instead, it
would directly manage two Tuplesortstate, CUR and NEXT. The node would have
an initial phase similar to ExecSort(), in which it drains the outer node to
populate the first CUR. After that, it looks more like agg_retrieve_direct(),
except that CUR is the input source, and each tuple drawn is also put into
NEXT. When done with one CUR, swap CUR with NEXT and reinitialize NEXT. This
design does not add I/O consumption or require a nonstandard communication
channel between executor nodes. Tom, Andrew, does that look satisfactory?

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Atri Sharma 2014-12-31 09:15:29 Re: Final Patch for GROUPING SETS
Previous Message Guillaume Lelarge 2014-12-31 07:46:22 Re: Maximum number of WAL files in the pg_xlog directory