Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-17 21:42:39
Message-ID: CAApHDvp2XHnS9ypMv+yEbLGYBvGUT49WJ7Xv4fN8D2agDsGOkQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 18, 2014 at 10:17 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> On Jan17, 2014, at 20:34 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> >
> >> I've now shuffled things around so that we can use inverse transition
> >> functions even if only some aggregates provide them, and to allow
> >> inverse transition functions to force a restart by returning NULL.
> >> The rules regarding NULL handling are now the following
> >
> > Maybe this is me thinking out loud, but I'm just thinking about the
> > numeric case again. Since the patch can now handle inverse transition
> > functions returning NULL when they fail to perform inverse transitions,
> > I'm wondering if we could add an "expectedscale" to NumericAggState,
> > set it to -1 initially, when we get the first value set the
> > expectedscale to the dscale of that numeric, then if we get anything
> > but that value we'll set the expectedscale back to -1 again, if we are
> > asked to perform an inverse transition with a expectedscale as -1
> > we'll return null, otherwise we can perform the inverse transition...
>
> You could do better than that - the numeric problem amounts to tracking
> the maximum scale AFAICS, so it'd be sufficient to restart if we remove
> a value whose scale equals the current maximum. But if we do
> SUM(numeric) at all, I think we should do so without requiring restarts
> - I still think the overhead of tracking the maximum scale within the
> aggregated values isn't that bad. If we zero the dscale counters
> lazily, the number of entries we have to zero is bound by the maximum
> dscale we encounter. Since actually summing N digits is probably more
> expensive than zeroing them, and since we pay the zeroing overhead only
> once per aggregation but save potentially many summations, I'm pretty
> sure we come out ahead by quite some margin.

That would work too, and I don't think your idea would take very long to
code up either. I just thought that my idea was good enough and very
cheap. Won't all numerics that are actually stored in a column have the
same scale anyway? Hasn't this only been a problem because we've been
testing with random numeric literals the whole time?

The test in do_numeric_accum turned out to be:

    if (state->expectedScale == -1)
        state->expectedScale = X.dscale;
    else if (state->expectedScale != X.dscale)
        state->expectedScale = -2;

Then, when we perform the inverse transition, I just check whether
expectedScale < 0 and return NULL if it is.

I'm not set on it, and I'm willing to try the lazy zeroing of the scale
tracker array, but I'm just not quite sure what extra real use cases it
covers that the approach above does not. Perhaps my way won't perform
inverse transitions for a query like sum(numericCol / 10), though.

I'll be committing this to the GitHub repo very soon, so we can hack away
at the scale tracking code once it's back in.

David Rowley
