Re: Final Patch for GROUPING SETS - unrecognized node type: 347

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-07 12:33:05
Message-ID: 540C5081.90205@fuzzy.cz
Lists: pgsql-hackers

On 6.9.2014 23:34, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>
> Tomas> I have significant doubts about the whole design,
> Tomas> though. Especially the decision not to use HashAggregate,
>
> There is no "decision not to use HashAggregate". There is simply no
> support for HashAggregate yet.
>
> Having it be able to work with GroupAggregate is essential, because
> there are always cases where HashAggregate is simply not permitted
> (e.g. when using distinct or sorted aggs; or unhashable types; or with
> the current code, when the estimated memory usage exceeds work_mem).
> HashAggregate may be a performance improvement, but it's something
> that can be added afterwards rather than an essential part of the
> feature.

Ah, OK. I got confused by the "final patch" subject, and so the
possibility of additional optimization somehow didn't occur to me.
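And agreed that the sorted path has to exist anyway. For example, a query
like this (t is just a made-up table) can't use hashing at all because of
the DISTINCT and ordered aggregate:

    SELECT a, b, count(DISTINCT c), array_agg(d ORDER BY d)
      FROM t
     GROUP BY GROUPING SETS ((a, b), (a), ());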

> Tomas> Now, the chaining only makes this worse, because it
> Tomas> effectively forces a separate sort of the whole table for each
> Tomas> grouping set.
>
> It's not one sort per grouping set, it's the minimal number of sorts
> needed to express the result as a union of ROLLUP clauses. The planner
> code will (I believe) always find the smallest number of sorts needed.

You're probably right. Although when doing GROUP BY CUBE (a,b,c,a) I get
one more ChainAggregate than with CUBE(a,b,c), and we seem to compute
all the aggregates twice. Not sure if we need to address this, though,
because it's mostly the user's fault.
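(What I mean is that CUBE with n elements expands to 2^n grouping sets, so
the duplicate column doubles the number of sets, most of them equivalent to
ones already there:

    -- CUBE(a,b,c)   -> 2^3 = 8 grouping sets:
    --   (a,b,c) (a,b) (a,c) (b,c) (a) (b) (c) ()
    -- CUBE(a,b,c,a) -> 2^4 = 16 grouping sets, many equivalent to the above,
    --   so the same groupings get aggregated repeatedly
    SELECT a, b, c, count(*) FROM t GROUP BY CUBE (a, b, c, a);

with t again being a made-up table.)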

> Each aggregate node can process any number of grouping sets as long as
> they represent a single rollup list (and therefore share a single sort
> order).
>
> Yes, this is slower than using one hashagg. But it solves the general
> problem in a way that does not interfere with future optimization.
>
> (HashAggregate can be added to the current implementation by first
> adding executor support for hashagg with multiple grouping sets, then
> in the planner, extracting as many hashable grouping sets as possible
> from the list before looking for rollup lists. The chained aggregate
> code can work just fine with a HashAggregate as the chain head.
>
> We have not actually tackled this, since I'm not going to waste any
> time adding optimizations before the basic idea is accepted.)

OK, understood.
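Just to check my understanding of the decomposition, with CUBE(a,b) I'd
expect something roughly like this (a sketch of how I read the proposal,
not actual plan output from the patch):

    -- CUBE(a,b) is GROUPING SETS ((a,b), (a), (b), ()), which no single
    -- sort order can cover.  Two rollup chains do:
    --   chain sorted by (a,b):  (a,b) -> (a) -> ()
    --   chain sorted by (b):    (b)
    -- i.e. two sorts of the input.  With hashagg support, the hashable sets
    -- could be peeled off into one hashed node first, and only the rest
    -- would go through the chained sorts.
    SELECT a, b, count(*) FROM t GROUP BY CUBE (a, b);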

>
> Tomas> What I envisioned when considering hacking on this a few
> Tomas> months back, was extending the aggregate API with "merge
> Tomas> state" function,
>
> That's not really on the cards for arbitrary non-trivial aggregate
> functions.
>
> Yes, it can be done for simple ones, and if you want to use that as a
> basis for adding optimizations that's fine. But a solution that ONLY
> works in simple cases isn't sufficient, IMO.

I believe it can be done for most aggregates, assuming you have access
to the internal state somehow (not just the final value). Adding it for
in-core aggregates would not be difficult, in most cases. But you're right
that we don't have this now at all.
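For a simple aggregate like avg() the idea is straightforward: the
transition state is essentially (count, sum), and merging two partial
states just means adding them up. Spelled out in plain SQL (t being a
made-up table again), the equivalent of merging per-(a,b) states into
per-(a) results would be:

    SELECT a,
           sum(cnt)                       AS merged_count,
           sum(total)                     AS merged_sum,
           sum(total)::numeric / sum(cnt) AS merged_avg
      FROM (SELECT a, b, count(x) AS cnt, sum(x) AS total
              FROM t
             GROUP BY a, b) AS partial
     GROUP BY a;

rather than re-aggregating the raw rows for every grouping set.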

regards
Tomas
