Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-29 08:59:25
Message-ID: CAEZATCUSiuma-OL5kJeU0_9_iu0SQyF-3-iecnh-GybzrzHj=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28 January 2014 20:16, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Jan27, 2014, at 23:28 , Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> strict transfn vs non-strict inv_transfn
>> ----------------------------------------
>>
>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
>> executor. To me, that seems overly restrictive --- if transfn is
>> strict, then you know for sure that no NULL values are added to the
>> aggregate state, so surely it doesn't matter whether or not
>> inv_transfn is strict. It will never be asked to remove NULL values.
>>
>> I think there are definite use-cases where a user might want to use a
>> pre-existing function as the inverse transition function, so it seems
>> harsh to force them to wrap it in a strict function in this case.
>>
>> AFAICS, the code in advance/retreat_windowaggregate should just work
>> if those checks are removed.
>
> True. It didn't use to in earlier version of the patch because the advance
> and retreat functions looked at the strict settings directly, but now that
> there's an ignore_nulls flag in the executor node, the only requirement
> should be that ignore_nulls is set if either of the transition functions
> is strict.
>
> I'm not sure that the likelihood of someone wanting to combine a strict
> forward with a non-strict inverse function is very hight, but neither
>

Me neither, but the checks to forbid it aren't adding anything, and I
think it's best to make it as flexible as possible.

>> non-strict transfn vs strict inv_transfn
>> ----------------------------------------
>>
>> At first this seems as though it must be an error --- the forwards
>> transition function allows NULL values into the aggregate state, and
>> the inverse transition function is strict, so it cannot remove them.
>>
>> But actually what the patch is doing in this case is treating the
>> forwards transition function as partially strict --- it won't be
>> passed NULL values (at least in a window context), but it may be
>> passed a NULL state in order to build the initial state when the first
>> non-NULL value arrives.
>
> Exactly.
>
>> It looks like this behaviour is intended to support aggregates that
>> ignore NULL values, but cannot actually be made to have a strict
>> transition function. I think typically this is because the aggregate's
>> initial state is NULL and it's state type differs from the type of the
>> values being aggregated, and so can't be automatically created from
>> the first non-NULL value.
>
> Yes. I added this because the alternative would haven been to count
> non-NULL values in most of the existing SUM() aggregates.
>
>> That all seems quite ugly though, because now you have a transition
>> function that is not strict, which is passed NULL values when used in
>> a normal aggregate context, and not passed NULL values when used in a
>> window context (whether sliding or not), except for the NULL state for
>> the first non-NULL value.
>
> Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does
> and skip NULL inputs if the aggregate has a strict inverse transition
> function. That fact that we never actually *call* the inverse doesn't
> matter. Will fix that once we decided on the various strictness issues.
>

Yuk!

>> I'm not sure if there is a better way to do it though. If we disallow
>> this case, these aggregates would have to use non-strict functions for
>> both the forward and inverse transition functions, which means they
>> would have to implement their own non-NULL value counting.
>> Alternatively, allowing strict transition functions for these
>> aggregates would require that we provide some other way to initialise
>> the state from the first non-NULL input, such as a new initfn.
>
> I'm not sure an initfn would really help. It would allow us to return
> to the initial requirement that the strict settings match - but you
> already deem the check that the forward transition function can only
> be strict if the inverse is also overly harsh, so would that really be
> an improvement? It's also cost us an additional pg_proc entry per aggregate.
>
> Another idea would be to have an explicit nulls=ignore|process option
> for CREATE AGGREGATE. If nulls=process and either of the transition
> functions are strict, we could either error out, or simply do what
> normal functions calls do and pretend they return NULL for NULL inputs.
> Not sure how the rule that forward transition functions may not return
> NULL if there's an inverse transition function would fit in if we do
> the latter, though.
>
> The question is - is it worth it the effort to add that flag?
>

Yeah, I thought about a flag too. I think that could work quite nicely.

Basically where I'm coming from is trying to make this as flexible as
possible, without pre-judging the kinds of aggregates that users may
want to add.

One use-case I had in mind upthread was suppose you wanted to write a
custom version of array_agg that only collected non-NULL values. With
such a flag, that would be trivial, but with the current patch you'd
have to (count-intuitively) wrap the inverse transition function in a
strict function.

Another use-case I can imagine is suppose you wanted a custom version
of sum() that returned NULL if any of the input values were NULL. If
you knew that most of your data was non-NULL, you might still wish to
benefit from an inverse transition function. Right now the patch won't
allow that because of the error in advance_windowaggregate(), but
possibly that could be relaxed by forcing a restart in that case. If
I've understood correctly that should give similar to current
performance if NULLs are present, and enhanced performance as the
window moved over non-NULL data.

In that second case, it would also be nice if you could simply re-use
the existing sum forward and inverse transition functions, with a
different null-handling flag.

Also, in cases where the forwards transition function cannot be made
strict (e.g., it's state type is internal) and there is no inverse
transition function, there might be a small performance gain to be had
from not calling the transition function with NULL values, rather than
have it ignore them.

So I think an ignore-nulls flag would have real benefits, as well as
being a cleaner design than relying on a strict inverse transition
function.

What do others think?

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2014-01-29 09:13:36 Re: Triggers on foreign tables
Previous Message Dave Page 2014-01-29 08:40:14 Re: proposal: hide application_name from other users