Re: Combining Aggregates

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-01-19 21:44:16
Message-ID: CAKJS1f_yVNaHawu+ty-_1p5i7nLX0iK3ntjWcjGqvXNm6YDjaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20 January 2016 at 05:56, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > Now, there has been talk of this previously, on various threads, but I
> don't
> > believe any final decisions were made on how exactly it should be done.
> At
> > the moment I plan to make changes as follows:
> >
> > Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and
> > aggserialtype These will only be required when aggtranstype is INTERNAL.
>
> Check.
>
> > Perhaps we should disallow CREATE AGGREAGET from accepting them for any
> > other type...
>
> Well, we should definitely not accept them and have them be
> meaningless. We should either reject them or accept them and then use
> them. I can't immediately think of a reason to serialize one
> non-internal type as another, but maybe there is one.
>
>
In the latest patch I'm disallowing serial and deserial functions for non
INTERNAL state aggregates during CREATE AGGREGATE.
I think it's best to keep this disabled for now, and do so until we
discover some reason that we might want want to enable it. If we enabled
it, and later decided that was a dumb idea, then it'll be much harder to
add the restriction later, since it may cause errors for users who have
created their own aggregates.

> > Add a new bool field to nodeAgg's state named serialStates. If this is
> field
> > is set to true then when we're in finalizeAgg = false mode, we'll call
> the
> > serialfn on the agg state instead of the finalfn. This will allow the
> > serialized state to be stored in the tuple and sent off to the main
> backend.
> > The combine agg node should also be set to serialStates = true, so that
> it
> > knows to deserialize instead of just assuming that the agg state is of
> type
> > aggtranstype.
>
> I'm not quite sure, but it sounds like you might be overloading
> serialStates with two different meanings here.
>

Hmm, only in the sense that serialStates means "serialise" and
"deserialise". The only time that both could occur in the same node is if
combineStates=true and finalizeAggs=false (in other words 3 or more
aggregation stages).

Let's say we are performing aggregation in 3 stages, stage 1 is operating
on normal values and uses the transfn on these values, but does not
finalise the states. If serialStates = true here then, if one exists, we
call the serialfn, passing in the aggregated state, and pass the return
value of that function up to the next node. Now, this next (middle) node is
important, serialStates must also be true here so that the executor knows
to deserialise the previously serialised states. Now, this node uses the
combinefn to merge states which require, and then since serialStates is
true, it also (re)serialises those new states again. Now if there was some
reason that this middle node should deserialise, but not re-serialise those
states, then we may need an extra parameter to instruct it to do so. I
guess this may be required if we were to perform some partial aggregation
and combining again within a single process (in which case we'd not need to
serialise INTERNAL states, we can just pass the pointer to them in the
Tuple), but then we might require the states to be serialised in order to
hand them over to the main process, from a worker process.

I can imagine cases where we might want to do this in the future, so
perhaps it is worth it:

Imagine:

SELECT COUNT(*) FROM (SELECT * FROM (SELECT * FROM a UNION ALL SELECT *
FROM b) AS ab UNION ALL (SELECT * FROM c UNION ALL SELECT * FROM d)) abcd;

Where the planner may decide to, on 1 worker perform count(*) on a then b,
and combine that into ab, while doing the same for c and d on some other
worker process.

> I believe this should allow us to not cause any performance regressions by
> > moving away from INTERNAL agg states. It should also be very efficient
> for
> > internal states such as Int8TransTypeData, as this struct merely has 2
> int64
> > fields which should be very simple to stuff into a bytea varlena type. We
> > don't need to mess around converting the ->count and ->sum into a
> strings or
> > anything.
>
> I think it would be more user-friendly to emit these as 2-element
> integer arrays rather than bytea. Sure, bytea is fine when PostgreSQL
> is talking to itself, but if you are communicating with an external
> system, it's a different situation. If the remote system is
> PostgreSQL then you are again OK, except for the data going over the
> wire being incomprehensible and maybe byte-order-dependent, but what
> if you want some other database system to do partial aggregation and
> then further aggregate the result? You don't want the intermediate
> state to be some kooky thing that only another PostgreSQL database has
> a chance of generating correctly.
>

If we do that then we also need to invent composite database types for the
more complicated internal states such as NumericAggState.
We should probably also consider performance here too. I had been
considering just using the *send() functions to build a bytea.

> Then in order for the planner to allow parallel aggregation all aggregates
> > must:
> >
> > Not have a DISTINCT or ORDER BY clause
> > Have a combinefn
> > If aggtranstype = INTERNAL, must have a aggserialfn and aggdeserialfn.
> >
> > We can relax the requirement on 3 if we're using 2-stage aggregation, but
> > not parallel aggregation.
>
> When would we do that?

This is probably best explained in one of the code comments:

+ * states as input rather than input tuples. This mode facilitates
multiple
+ * aggregate stages which allows us to support pushing aggregation
down
+ * deeper into the plan rather than leaving it for the final stage.
For
+ * example with a query such as:
+ *
+ * SELECT count(*) FROM (SELECT * FROM a UNION ALL SELECT * FROM b);
+ *
+ * with this functionality the planner has the flexibility to
generate a
+ * plan which performs count(*) on table a and table b separately
and then
+ * add a combine phase to combine both results. In this case the
combine
+ * function would simply add both counts together.

There's also the possibility to use this functionality later to allow GROUP
BY to occur before the final plan stage, e.g before joining to some
relation.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-01-19 21:49:07 Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)
Previous Message Andres Freund 2016-01-19 21:43:21 Re: checkpointer continuous flushing