Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-18 05:15:06
Message-ID: CAApHDvpRuu9_nCXe2grZqrcQVTdg++Ht6oh9E7bR4NWLbLQewQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> First, I've got the feeling that I should somehow update the commitfest app,
> but I don't really know in which way. Should I put myself in as a reviewer,
> or as a second author? Or neither? Suggestions welcome...
>
>
Well, I guess you're both now, but it's a bit weird to be author and reviewer,
so I've put your name down as an author too. Hopefully Dean can review our
combined results, and we can review each other's work at the same time.

> On Jan17, 2014, at 23:34 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > The test turned out to become:
> > if (state->expectedScale == -1)
> >     state->expectedScale = X.dscale;
> > else if (state->expectedScale != X.dscale)
> >     state->expectedScale = -2;
> >
> > In do_numeric_accum, then when we do the inverse I just check if
> > expectedScale < 0 then return NULL.
>
> Ok, so this will rescan if and only if the dscales of all inputs match.
> I still think that's overly pessimistic - we've only got a problem when we
> removed the input with the largest dscale, no? So my suggestion would be
>
> state->maxDScale = MAX(X.dscale, state->maxDScale);
>
> in do_numeric_accum, and in the inverse
>
> if (state->maxDScale == X.dscale)
>     PG_RETURN_NULL();
>
> I'd think that this avoids more restarts with about the same effort,
> but I haven't tried this though, so maybe I'm missing something.
>
>
This is not quite right, as it means that if all the values have the same
dscale we reject inverse transitions, since state->maxScale will always be
equal to X.dscale.
But you are right about the overly strict code I've put in; we should allow
values with a dscale less than the maximum to be unaggregated without
complaint. To implement this I needed a maxScaleCounter variable too, so we
only reject once the maxScaleCounter gets back to 0 again.

Note that after this fix the results for my quick benchmark now look like:

create table num (num numeric(10,2) not null);
insert into num (num) select * from generate_series(1,20000);
select sum(num) over(order by num rows between current row and unbounded
following) from num; -- 113 ms
select sum(num / 10) over(order by num rows between current row and
unbounded following) from num; -- 156ms
select sum(num / 1) over(order by num rows between current row and
unbounded following) from num; -- 144 ms

So it seems to be much less prone to falling back to brute-force forward
transitions.
It also seems the / 10 version must previously have had to do one brute-force
rescan, but now it looks like it can do it in a single scan.
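
Just to show what's going on with the scales (this is stock numeric
behaviour, nothing patch-specific): the result scale of a numeric division
is chosen by the division itself, so num / 1 doesn't keep the numeric(10,2)
scale and can even vary from row to row. A quick way to see it:

select 1.00::numeric / 1 as small_val,
       20000.00::numeric / 1 as big_val;
-- if the two columns come back with a different number of trailing zeros,
-- then successive rows of sum(num / 1) feed the aggregate values with
-- different dscales, which is exactly what makes the inverse transition
-- give up.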

> > I'm not set on it, and I'm willing to try the lazy zeroing of the scale
> > tracker array, but I'm just not quite sure what extra real use cases it
> > covers that the one above does not. Perhaps my way won't perform inverse
> > transitions if the query did sum(numericCol / 10) or so.
>
> Dunno how many people SUM over numerics with different dscales. Easily
> possible that it's few enough to not be worth fussing over.
>
>
Going by Tom's comments in the post above, this is possible just by having
an unconstrained numeric column, but I guess there's still a good chance
that even those unconstrained numbers have the same scale, or at least that
the scale won't vary wildly enough to force us to perform brute-force
forward transitions for each row.

> > create table num (num numeric(10,2) not null);
> > insert into num (num) select * from generate_series(1,20000);
> > select sum(num) over(order by num rows between current row and unbounded
> > following) from num; -- 124ms
> > select sum(num / 10) over(order by num rows between current row and
> > unbounded following) from num; -- 254ms
> > select sum(num / 1) over(order by num rows between current row and
> > unbounded following) from num; -- 108156.917 ms
> >
> > The divide by 1 case is slow because of that weird 20 trailing zeros
> > instead of 16 when dividing a numeric by 1 and that causes the inverse
> > transition function to return NULL because the scale changes.
> >
> > I've not tested an unpatched version yet to see how that divide by 1
> > query
> > performs on that but I'll get to that soon.
>
> So without the patch, all three queries should perform similarly, i.e. take
> about 10 seconds, right? If so, the improvement is fantastic!
>
>
Well, it's actually 100 seconds, not 10. I tested the worst-case
performance against an unpatched head and got 107 seconds instead of the
108, so I'm guessing that's pretty good: the worst case is not really any
worse, and it's pretty hard to get to. I guess the results would all have
to have a different scale, with the biggest scale on the first aggregated
values... Reaching that worst case just seems impossible in a real-world
workload.
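
For what it's worth, the sort of input I'd have to construct to hit it
would be something like the sketch below (not benchmarked): every row has a
different dscale and, because of the descending sort, the row with the
current maximum dscale is always the first one to leave the frame, so each
inverse transition should have to give up and force a restart.

select sum(num) over (order by num desc
                      rows between current row and unbounded following)
from (select ('0.' || repeat('1', g))::numeric as num  -- dscale = g
      from generate_series(1, 500) g) t;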

> > I'm thinking that the idea about detecting the numeric range with
> > floating
> > point types and performing an inverse transition providing the range has
> > not gone beyond some defined danger zone is not material for this
> > patch...
> > I think it would be not a great deal of work to code, but the testing
> > involved
> > would probably make this patch not possible for 9.4
>
> Yeah, I never imagined that this would happen for 9.4.
>
> > The latest version of the patch is attached.
>
> OK, there are a few more comments
>
> * build_aggregate_fnexprs() should allow NULL to be passed for
> invtransfn_oid,
> I think. I don't quite like that we construct that even for plain
> aggregates,
> and avoiding requires just an additional if.
>
>
I'm not quite sure what you mean here. It passes InvalidOid in normal
aggregate calls (search for: "InvalidOid, /* invtrans is not needed here
*/") and only looks up the function in build_aggregate_fnexprs if
(OidIsValid(invtransfn_oid)) is true. I'm not sure how this can be improved
since that function is used for window aggregates and normal aggregates.

> * Don't we need to check for volatile functions in the filter expression
> too?
>
>
I did manual testing on this before, and the volatility test for the
aggregate arguments seemed to cover it. I didn't look into why, but it just
did. I've not tested this again since your refactoring. I could test it
easily before, when my numeric case was changing the results because of the
dscale problem: I noticed that if I did FILTER (WHERE random() > 0) the
extra trailing zeros would disappear.
The problem now is that it's pretty hard to determine whether an inverse
transition took place; the only way we can really tell is performance. I'll
see if I can invent a new test case for this by creating a user-defined
aggregate as you described. I'm thinking of just appending '+' to a string
for transitions and '-' for inverse transitions, then making sure we only
get a string of '+'s when doing something like FILTER (WHERE random() >= 0).
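
Roughly what I have in mind for the support functions is below (just a
sketch; the names are made up, and I've left out the CREATE AGGREGATE part
since registering the inverse function depends on whatever syntax the patch
ends up with):

-- forward transition: log a '+' for each row entering the frame
create function log_trans(state text, val int) returns text
language sql immutable as
$$ select coalesce($1, '') || '+' $$;

-- inverse transition: log a '-' for each row leaving the frame
create function log_trans_inv(state text, val int) returns text
language sql immutable as
$$ select coalesce($1, '') || '-' $$;

With those wired into an aggregate and no final function, the per-row
result is just the log, so a window query with FILTER (WHERE random() >= 0)
should produce strings containing only '+' if we're correctly refusing to
perform inverse transitions when the filter is volatile.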

> * As it stands, I don't think intXand_inv and intXor_inv are worth it,
> since
> the case where they return non-NULL is awfully slim (only for inputs
> containing only ones or only zeros, respectively). We should either track the
> number
> of zeros and ones per bit, which would allow us to always perform inverse
> transitions, or just drop them.
>
>
I did think of this when I wrote them. I thought that the removing-0 case
might be quite common and worth it, but that the ~0 case would be less
common; it just seemed weird to do one without the other.
To do more tracking on these it looks like we'd need to change those
aggregates to use an internal state type, and I think the extra tracking
would mean looping over an 8-, 32- or 64-element array of int64s for each
value. I just don't think that would be a winner performance-wise, since
the code that's there now is pretty much a handful of CPU cycles. It's
probably far more worth it for the bool and/or aggregates: we could just
keep track of the number of values aggregated and the number of those that
were "true", then return true if the two counts are equal in the case of
"AND", and check that the true count is > 0 in the case of "OR". I'd feel
more strongly about going and doing that if I'd actually ever used those
aggregates for anything.
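
Just to sketch that counting idea in SQL (names invented; the real thing
would be a C transition pair whose inverse simply decrements the two
counters):

-- state is {rows aggregated, rows that were true}
create function booland_count_trans(state bigint[], val boolean)
returns bigint[] language sql immutable strict as
$$ select array[$1[1] + 1, $1[2] + $2::int] $$;

create function booland_count_final(state bigint[])
returns boolean language sql immutable strict as
$$ select $1[1] = $1[2] $$;  -- AND: every value seen was true

create aggregate booland_counted(boolean) (
    sfunc     = booland_count_trans,
    stype     = bigint[],
    finalfunc = booland_count_final,
    initcond  = '{0,0}'
);

The inverse transition then just subtracts 1 and the value again from the
counters, so it can always succeed; the OR variant would instead check that
the true count is > 0 in the final function.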

> * Quite a few of the inverse transition functions are marked strict, yet
> contain code to handle NULL inputs. You can just remove that code - the
> system
> makes sure that strict functions never receive NULL arguments. Affected
> are,
> AFAICS numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv,
> int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv,
> int4_sum_inv,
> int8_sum_inv. Not sure that list is exhaustive...
>
>
Should be able to get a list from:
select proname, proisstrict from pg_proc where proisstrict = true and oid
in (select agginvtransfn from pg_aggregate);

I might not have time for this today though, so if you feel like checking
these that would be really helpful.

> * For any of the new inverse transition functions, I'd be inclined to just
> elog() if they're called directly and not as an aggregate. In particular
> those which check for that anyway, plus the *smaller_inv and *larger_inv
> ones. I don't see why anyone would ever want to call these directly - and
> if we elog() now, we can easily change these functions later, because no
> external
> code can depend on them. E.g., maybe someone wants to keep the top N
> elements
> in the MIN/MAX aggregates one day...
>
>
Yeah, I guess the way it is now may mean we'd need to support legacy
functions for ever and a day if we changed the way they worked, but I'm not
sure that adding a check for being called as part of an aggregate changes
that, as a user could be using the built-in function in their own
user-defined aggregate, so there could still be complaints if we removed
them in a major version. What would be needed is some way to have functions
internally but not publish them to the user, say only visible from initdb
or something. I don't think that's part of this patch though. Maybe just
the fact that they're undocumented gives us more freedom to remove them
later.

> * The number of new regression tests seems a bit excessive. I don't think
> there's
> really a policy on what to test and what not, but in this case I think it
> suffices
> if we test the basic machinery, plus the more complex functions. But
> e.g. most
> of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum
> internally,
> and the wrapper code basically just does datatype conversion, so testing
> a few
> cases seems enough there. What I think we *should* test, but don't do
> currently,
> is whether the core machinery performs the expected calls of the forward
> and
> reverse transition function. I was thinking about creating an aggregate
> in the
> regression tests which simply concatenates all the calls into a string,
> e.g.
> you might get "F:1 F:2 F:3 I:1" if we aggregated 1,2,3 and then removed
> 1.
> I think that should be possible with an SQL-language forward and inverse
> transfer function, but I haven't tried. I can try, if you want.
>
>
I agree that there are quite a lot of tests, and I think that's why I took
a different approach when it came to all these little *larger_inv and
*smaller_inv functions; there were just so many! I thought the aggregate
tests would run in the blink of an eye anyway, but perhaps I should
consider things other than just processing time. I created most of the
aggregate call tests by writing a whole load of queries on an unpatched
version, running them, and taking the output as my expected results for
the patched copy. These were really useful for checking for regressions
when I was working hard on nodeWindowAgg.c. I'd find it hard to pick and
choose what to remove, given that they all test something slightly
different, even if it's just a different final function. They did pick up
some failures earlier when I forgot to change the strict property on
int8_avg_accum_inv. Maybe someone else has an opinion on the number of
tests?

Overall, I think it's really starting to take shape now and the list of
things to do is pretty small. I'm really happy to see so many aggregate
functions with inverse transition functions now!

Regards

David Rowley

> best regards,
> Florian Pflug
>
>
