Re: Combining Aggregates

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2015-03-04 09:37:19
Message-ID: CAApHDvpgXhghtpmuKPhnBj9ZDeEPy-8C0StXgG-GuTPAMdYp6A@mail.gmail.com
Lists: pgsql-hackers

On 18 February 2015 at 21:13, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>
> This patch itself looks good as infrastructure towards
> the big picture; however, we have not yet reached consensus on
> how combine functions are to be used instead of the usual transition
> functions.

Thank you for taking the time to look at the patch.

>
> An aggregate function usually consumes one or more values extracted
> from a tuple, then accumulates them into its internal state. The
> existing transition function updates that internal state on the
> assumption of one function call per record.
> On the other hand, the new combine function allows the internal
> state to be updated with partially aggregated values produced
> by a preprocessor node.
> An aggregate function is represented by an Aggref node in the plan tree;
> however, we have no certain way to determine which function should
> be called to update the internal state of the aggregate.
>
>
This is true; nothing in the planner sets any sort of state in the
aggregation nodes to tell them whether to call the final function or
not. It's quite hard to know how far to go with this patch. It's really
only intended to provide the necessary infrastructure for things like
parallel query and various other possible uses of aggregate combine
functions. I don't think it's really appropriate for this patch to add
such a property to any nodes, as there would still be nothing in the
planner to actually set those properties. The only way I can think of
to get around this is to implement the simplest use of combine aggregate
functions, and the problem with that is that the simplest case is not at
all simple.

> For example, avg(float) has an internal state of type float[3]
> holding the number of rows, the sum of X and the sum of X^2. If a
> combine function is to update that internal state with partially
> aggregated values, its argument should be float[3]. That is obviously
> incompatible with float8_accum(float), the transition function of
> avg(float). I think we need a new flag on the Aggref node to tell the
> executor which function should be called to update the internal state
> of the aggregate. The executor cannot decide this without the hint.
>
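
To make the mismatch described above concrete, here is a minimal sketch in
Python rather than PostgreSQL's C. Every name in it (accum, combine,
final_avg, advance, input_is_partial) is hypothetical and is not taken from
the patch; the flag merely stands in for the kind of hint on the Aggref node
being suggested. The state layout [N, sum(X), sum(X^2)] follows the
description quoted above.

```python
from typing import List, Optional, Sequence, Union

# Internal state as described in the thread: [N, sum(X), sum(X^2)].
State = List[float]


def accum(state: State, x: float) -> State:
    """Transition step: fold ONE input value into the state."""
    n, sx, sxx = state
    return [n + 1.0, sx + x, sxx + x * x]


def combine(state: State, partial: State) -> State:
    """Combine step: fold a PARTIAL state (same layout) into the state."""
    return [a + b for a, b in zip(state, partial)]


def final_avg(state: State) -> Optional[float]:
    """Final step: turn the state into the aggregate result."""
    n, sx, _ = state
    return sx / n if n > 0 else None


def advance(state: State, value: Union[float, State],
            input_is_partial: bool) -> State:
    """Pick the update function from a hint (hypothetical flag).

    Without the hint the caller cannot tell whether `value` is a raw
    float or a three-element partial state, which is the problem raised
    in the quoted text.
    """
    if input_is_partial:
        return combine(state, value)  # value is a State
    return accum(state, value)        # value is a single float


def partial_state(rows: Sequence[float]) -> State:
    """First stage: aggregate raw rows into a partial state."""
    state: State = [0.0, 0.0, 0.0]
    for x in rows:
        state = advance(state, x, input_is_partial=False)
    return state


def two_stage_avg(chunks: Sequence[Sequence[float]]) -> Optional[float]:
    """Second stage: combine per-chunk partial states, then finalize."""
    state: State = [0.0, 0.0, 0.0]
    for chunk in chunks:
        state = advance(state, partial_state(chunk), input_is_partial=True)
    return final_avg(state)
```

In this sketch the first stage never calls final_avg; only the top-level
stage does, which is the other piece of state the aggregation nodes would
need to be told about.
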
> Also, do you have any ideas about pushing aggregate functions down
> across joins? Even though it is somewhat old research, I found
> a systematic approach to pushing aggregates down across joins.
> https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf
>
>
I've not read the paper yet, but I do have a very incomplete WIP patch to
do this. I've just not had much time to work on it.

> I think it would be great if core supported this query
> rewriting feature, based on cost estimation, without external
> modules.
>

Regards

David Rowley
