Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-14 06:57:20
Message-ID: CAApHDvqgeouEk+qJETJiJyLo92fpqVFnCLq0HetDT4TCjMEPxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 14, 2014 at 2:00 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> On Jan10, 2014, at 22:27 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > As the patch stands at the moment, I currently have a regression test
> > which currently fails due to these extra zeros after the decimal point:
> >
> > -- This test currently fails due extra trailing 0 digits.
> > SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND
> UNBOUNDED FOLLOWING)
> > FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);
> >
> > Patched produces:
> > 6.01
> > 5.00
> > 3.00
> > Unpatched produces:
> > 6.01
> > 5
> > 3
>
> Hm, that's annoying. I checked the standard to see if it offers any
> guidance
> here, but it doesn't look like it does - the standard just says that SUM()
> ought
> to return a type with the same scale as the input type has. That's fine if
> every NUMERIC type has a fixed scale, but that's not the case in postgres -
> we allow values of unconstrained NUMERIC types (i.e., numeric types
> without an
> explicitly specified precision or scale) to each define their own scale.
>
>
Thanks for digging that out in the standard and thanks for all that
information you supplied here
http://www.postgresql.org/message-id/0FA6C08E-2166-405B-83F7-63B196B88CA3@phlo.org
too.
Sorry I've not had the chance to reply yet. I was kind of hoping that the
answer would be more in my favour to help with inverse transitions for
sum(numeric).

> To fix this, we'd have to track the maximum scale within the current frame.
> That's easier than the general problem of providing an inverse transition
> function for MAX, because AFAIK we limit the scale to at most 1000. So it'd
> be sufficient to track the number of times we saw each scale, and also the
> current maximum. Once we reduce the current maximum's count back to zero
> in the inverse transition function, we'd scan from that value to the left
> to
> find the next non-zero entry.
>
>
I've been thinking about this, but I had thought that the maximum dscale
was bigger than 1000. The docs seem to claim 16383 here -->
http://www.postgresql.org/docs/devel/static/datatype-numeric.html I'd go
ahead and implement this if that number was smaller, but I'm thinking
zeroing out an array of 16383 elements on first call to do_numeric_accum
might be too big an overhead to write off as "background noise". If it was
20 or even 100 then I'd probably try for that.

I think the overhead for each call after that would likely be ok as it
would probably just be an operation like
state->scaleCount[X.dscale]++; which I would imagine would be a very small
percentage overhead on normal aggregate functions. Of course the inverse
would have to do the harder work of looping backwards over the array until
it found an element with the count above 0 and setting that as the current
maximum. I think this would be a winner if it wasn't for the high initial
hit of zeroing that 16383 element array.. Or 1000 whichever.

> We could also choose to ignore this, although I'm not sure I really like
> that.
> It seems entirely OK at first sight - after all, the values all stay the
> same,
> we just emit a different number of trailing zeros. But it still causes
> results
> to be affected by values, even if only in the number of trailing zeros,
> which
> lie outside the value's range. That seems like more non-determinism than as
> database should show.
>
> > With inverse transitions this query still produces correct results, it
> just does
> > not produces the numeric in the same format as it does without
> performing inverse
> > transitions. Personally I'd rather focus on trying to get SUM(numeric)
> in there
> > for 9.4
>
> I think it'd be worthwile to get this into 9.4, if that's still an option,
> even if we only support COUNT.
>
> best regards,
> Florian Pflug
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2014-01-14 07:24:18 Re: [BUGS] surprising to_timestamp behavior
Previous Message Jeevan Chalke 2014-01-14 06:52:30 Re: [PATCH] Filter error log statements by sqlstate