Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2013-12-16 09:07:19
Message-ID: CAApHDvqsks__ZE47JshFS5PXXpamWuF-_u=s-HwCYuW=LZj5mA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 16, 2013 at 9:39 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

>
> There's another technique we could use which doesn't need a negative
> transition function, assuming the order you feed the values to the aggreate
> function doesn't matter: keep subtotals. For example, if the window first
> contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and then 1 + 2 + 7 =
> 10. Next, 1 leaves the window, and 5 enters it. Now you calculate 2 + 7 +
> 5 = 14. By keeping the subtotal (3 + 4 = 7) around, you saved one addition
> compared to calculating 2 + 3 + 4 + 5 from scratch.
>
> The negative transition function is a lot simpler and faster for count(*)
> and integer operations, so we probably should implement that anyway. But
> the subtotals technique could be very useful for other data types.
>
> - Heikki
>

That's quite interesting. I guess we would need another flag in
pg_aggregate to mark if the order of the tuples matters, string_agg would
be an example of one that would have to skip this.

At least for ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING if the
aggregate order did not matter than it would likely be quite efficient just
to aggregate from the bottom up and materialise the results at each tuple
and store it in the tuple store, then just use that materialised value when
that tuple is processed. This method won't work when the frame base is not
fixed.

I had been thinking ahead on how to improve MIN and MAX cases too. I came
up with something called "tuple store indexes" that could be build as
binary search trees with a composite index on the tuple position and the
aggregate's sort operator... Something similar to how the following query
could use an index on (id,value) to calculate max()
select max(value) from test where id between 1 and 100;
It's certainly not something for this patch, but it was an idea I came up
with which I think would be possible without adding any more columns to
pg_aggregate.

Regards

David Rowley

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2013-12-16 09:55:50 Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Previous Message Albe Laurenz 2013-12-16 09:03:16 Re: Like operator for name type