Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-04-11 16:40:58
Message-ID: 78601743-5959-4CDC-917C-17BAE1CF1457@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Apr11, 2014, at 17:09 , Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Basically, this comes down to a design judgment call, and my judgment
> is differing from yours. In the absence of opinions from others,
> I'm going to do it my way.

Ok. Are you going to do the necessary changes, or shall I? I'm happy to
leave it to you, but if you lack the time I can try to find some.

>> ... SUM(int4) wouldn't need
>> *any* extra state at all to be invertible, if it weren't for those pesky
>> issues surrounding NULL handling. In fact, an earlier version of the
>> invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The
>> only reason this doesn't work nowadays is that Dean didn't like the
>> forward-nonstrict-but-inverse-strict special case that made this work.
>
> Tell me about the special case here again? Should we revisit the issue?

My original coding allows the combination of non-strict forward with strict
reverse transition functions. The combination behaved like a strict aggregate
regarding NULL handling - i.e., neither the forward nor the reverse transition
function received NULL inputs. But if the initial state was NULL, the forward
transition function *would* be called with that NULL state value upon seeing
the first non-NULL input. In the window case, the aggregation machinery would
also take care to reset the state type to it's initial value when it removed
the last non-NULL input from the aggregation state (just like it does for
strict aggregates today). This had two advantages

1) First, it allows strict aggregates to use state type internal. As it
stands now, aggregates with state type internal must implement their
own NULL handling, even if they behave exactly like most standard
aggregates do, namely ignore NULLS and return NULL only if there were
no non-NULL inputs.

2) It allows strict aggregates whose state type and input type aren't
binary coercible to return NULL if all inputs were NULL without any
special coding. As it stands today, this doesn't work without some
kind of counter in the state, because the final function otherwise
won't know if there were non-NULL inputs or not. Aggregates whose state
and input types are binary coercible get around that by setting the
initial value to NULL while *still* being strict, and relying on the
state replacement behaviour of the aggregate machinery.

It, however, also has a few disadvantages

3) It means that one needs to look at the inverse transition function's
strictness setting even if that function is never used.

4) It feels a bit hacky.

(2) is what affects SUM(int4). The only reason it track the number of inputs
is to be able to return NULL instead of 0 if no inputs remain.

One idea to fix (3) and (4) was *explicitly* flagging aggregates as
NULL-handling or NULL-ignoring, and also as what one might call
"weakly strict", i.e. as returning NULL exactly if there were no non-NULL
inputs. There are various variations of that theme possible - one flag for
each behaviour, or simply a single "common behaviour" flag. In the end, I
decided not to pursue that, mostly because the aggregates affected by that
issued turned out to be relatively easy to fix. For the ones with state type
internal, I simply added a counter, and I made SUM(int4) use AVG's transition
function.

I don't feel too strongly either way on this one - I initially implemented the
exception because I noticed that the NULL handling of some aggregates was
broken otherwise, and it seemed simpler to fix this in one place than going
over all the aggregates separately. OTOH, when I wrote the docs, I noticed
how hard it was to describe the behaviour accurately, which made me like it
less and less. And Dean wasn't happy with it at all, so that finally settled it.

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergey Muraviov 2014-04-11 17:03:25 Re: Problem with displaying "wide" tables in psql
Previous Message Jack.O'Sullivan 2014-04-11 15:59:35 Re: CLOB & BLOB limitations in PostgreSQL