Re: Final Patch for GROUPING SETS

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-31 09:15:29
Message-ID: CAOeZVidbB5v0FPG2=_QcROGvYJe+41LMiJ1Xy_ptqFZjNnp0Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

ChainAggregate is

> a bit like a node having two parents, a Sort and a GroupAggregate.
> However,
> the graph edge between ChainAggregate and its GroupAggregate is a
> tuplestore
> instead of the usual, synchronous ExecProcNode().
>

Well, I dont buy the two parents theory. The Sort nodes are intermediately
stacked amongst ChainAggregate nodes, so there is still the single edge.
However, as you rightly said, there is a shared tuplestore, but note that
only the head of chain ChainAggregate has the top GroupAggregate as its
parent.

>
> Suppose one node orchestrated all sorting and aggregation. Call it a
> MultiGroupAggregate for now. It wouldn't harness Sort nodes, because it
> performs aggregation between tuplesort_puttupleslot() calls. Instead, it
> would directly manage two Tuplesortstate, CUR and NEXT. The node would
> have
> an initial phase similar to ExecSort(), in which it drains the outer node
> to
> populate the first CUR. After that, it looks more like
> agg_retrieve_direct(),
> except that CUR is the input source, and each tuple drawn is also put into
> NEXT. When done with one CUR, swap CUR with NEXT and reinitialize NEXT.
> This
> design does not add I/O consumption or require a nonstandard communication
> channel between executor nodes. Tom, Andrew, does that look satisfactory?
>
>
So you are essentially proposing merging ChainAggregate and its
corresponding Sort node?

So the structure would be something like:

GroupAggregate
--> MultiGroupAgg (a,b)
----> MultiGroupAgg (c,d) ...

I am not sure if I understand you correctly. Only the top level
GroupAggregate node projects the result of the entire operation. The key to
ChainAggregate nodes is that each ChainAggregate node handles grouping sets
that fit a single ROLLUP list i.e. can be done by a single sort order.
There can be multiple lists of this type in a single GS operation, however,
our current design has only a single top GroupAggregate node but a
ChainAggregate node + Sort node per sort order. If you are proposing
replacing GroupAggregate node + entire ChainAggregate + Sort nodes stack
with a single MultiGroupAggregate node, I am not able to understand how it
will handle all the multiple sort orders. If you are proposing replacing
only ChainAggregate + Sort node with a single MultiGroupAgg node, that
still shares the tuplestore with top level GroupAggregate node.

I am pretty sure I have messed up my understanding of your proposal. Please
correct me if I am wrong.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-12-31 10:40:17 Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Previous Message Noah Misch 2014-12-31 08:58:45 Re: Final Patch for GROUPING SETS