Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-25 08:50:06
Message-ID: CAApHDvqkf9ZhtY7Bk9OdJdNRq3NG0iz6WVzwFsEitdRH5GS3Zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 23, 2014 at 1:57 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> On Jan23, 2014, at 01:17 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> >> If you want to play with
> >> this, I think the first step has to be to find a set of guarantees that
> >> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that
> if the
> >> summands all have the same sign, the error is bound by C * N, where C
> is some
> >> (machine-specific?) constant (about 1e-15 or so), and N is the number
> of input
> >> rows. Or at least so I think from looking at SUMs over floats in
> descending
> >> order, which I guess is the worst case. You could, for example, try to
> see if
> >> you can find a invertibility conditions which guarantees the same, but
> allows
> >> C to be larger. That would put a bound on the number of digits lost by
> the new
> >> SUM(float) compared to the old one.
> >
> > I guess if the case is that normal (non-window) sum(float) results are
> undefined
> > unless you add an order by to the aggregate then I guess there is no
> possible
> > logic to put in for inverse transitions that will make them behave the
> same as
> > an undefined behaviour.
>
> Actually, if sum(float) was really undefined, it'd be *very* easy to
> provide an
> inverse transition function - just make it a no-op. Heck, you could then
> even
> make the forward transition function a no-op, since the very definition of
> "undefined behaviour" is "result can be anything, including setting your
> house
> on fire". The point is, it's *not* undefined - it's just imprecise. And the
> imprecision can even be quantified, it just depends on the number of
> input rows (the equal-sign requirement is mostly there to make
> "imprecision"
> equivalent to "relative error").
>
>
My apologies, I meant to use the term nondeterministic rather than
undefined. There's really not any need that I can see to turn things silly
here.

My point was more that since sum(float) can give different results if it
used an index scan rather than a seq scan, trying to get the inverse
transition to match something that gives varying results sounds like a
tricky task to take on.

> >> > If it seems sound enough, then I may implement it in C to see how much
> >> > overhead it adds to forward aggregation for floating point types, but
> even
> >> > if it did add too much overhead to forward aggregation it might be
> worth
> >> > allowing aggregates to have 2 forward transition functions and if the
> 2nd
> >> > one exists then it could be used in windowing functions where the
> frame
> >> > does not have "unbounded following".
> >>
> >> I don't think adding yet another type of aggregation function is the
> >> solution here.
> >
> > Why not?
> >
> > I thought, if in the cases where we need to burden the forward transition
> > functions with extra code to make inverse transitions possible, then why
> > not have an extra transition function so that it does not slow down
> normal
> > aggregation?
> >
> > My testing of sum(numeric) last night does not show any slow down by that
> > extra dscale tracking code that was added, but if it did then I'd be
> thinking
> > seriously about 2 forward transition functions to get around the problem.
> > I don't understand what would be wrong with that. The core code could
> just
> > make use of the 2nd function if it existed and inverse transitions were
> > thought to be required. i.e in a window context and does not
> > have UNBOUNDED PRECEDING.
>
> First, every additional function increases the maintenance burden, and at
> some point the expected benefit simply isn't worth it. IMHO at least.
>
>
There's little point in arguing for this as we've managed to narrow any
forward transition over head to background noise so far, my point more was
if there was no other way to maintain performance and have an inverse
transition function that this may be a solution to that problem.

> Secondly, you'd really need a second aggregate definition - usually, the
> non-invertible aggregate will get away with a much smaller state
> representation.
> Aggregates with state type "internal" could hack their way around that by
> simply not using the additional parts of their state structure, but e.g.
> plpgsql aggregates would still incur overhead due to repeated copying of
> a unnecessary large state value. If it's possible to represent the
> forward-only but not the invertible state state with a copy-by-value type,
> that overhead could easily be a large part of the total overhead you paid
> for having an inverse transition function.
>
>
I had imagined they keep the same state type and don't use any extra
variables that are defined for the forward transition function that is
invertible.

> And lastly, we could achieve much of the benefit of a second transition
> function by simply telling the forward transition function whether to
> expect inverse transition function calls. We already expose things like
> the aggregation memory context to C-language transition functions, so the
> basic mechanism is already there. In fact, I pondered whether to do this -
> but then figured I'd leave it to however needs that facility first. Though
> it wouldn't be much code - basically, setting a flag in the WindowAggState,
> and exporting a function to be used by aggregates to read it, similar
> to what AggCheckCallContext does.
>
>
Sounds like a good idea, but what would the solution aggregate transition
functions that are not written in C?

> best regards,
> Florian Pflug
>
>
>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2014-01-25 09:15:20 Re: extension_control_path
Previous Message Michael Paquier 2014-01-25 06:24:23 Re: Recovery to backup point