Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Nicholas White <n(dot)j(dot)white(at)gmail(dot)com>, Troels Nielsen <bn(dot)troels(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
Date: 2013-06-30 09:10:46
Message-ID: CAEZATCWT3=P88nv2ThTjvRDLpOsVtAPxaVPe=MaWe-x=GuhSmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 June 2013 17:30, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2013-06-24 at 18:01 +0100, Nicholas White wrote:
>> Good catch - I've attached a patch to address your point 1. It now
>> returns the below (i.e. correctly doesn't fill in the saved value if
>> the index is out of the window. However, I'm not sure whether (e.g.)
>> lead-2-ignore-nulls means count forwards two rows, and if that's null
>> use the last one you've seen (the current implementation) or count
>> forwards two non-null rows (as you suggest). The behaviour isn't
>> specified in a (free) draft of the 2003 standard
>> (http://www.wiscorp.com/sql_2003_standard.zip), and I don't have
>> access to the (non-free) final version. Could someone who does have
>> access to it clarify this? I've also added your example to the
>> regression test cases.
>
> Reading a later version of the draft, it is specified, but is still
> slightly unclear.
>
> As I see it, the standard describes the behavior in terms of eliminating
> the NULL rows entirely before applying the offset. This matches Troels's
> interpretation. Are you aware of any implementations that do something
> different?
>
>> I didn't include this functionality for the first / last value window
>> functions as their implementation is currently a bit different; they
>> just call WinGetFuncArgInFrame to pick out a single value. Making
>> these functions respect nulls would involve changing the single lookup
>> to a walk through the tuples to find the first non-null version, and
>> keeping track of this index in a struct in the context. As this change
>> is reasonably orthogonal I was going to submit it as a separate patch.
>
> Sounds good.
>

I took a quick look at this and I think there are still a few problems:

1). The ignore/respect nulls flag needs to be per-window-function
data, not a window frame option, because the same window may be shared
by multiple window function calls. For example, the following test
causes a crash:

SELECT val,
lead(val, 2) IGNORE NULLS OVER w,
lead(val, 2) RESPECT NULLS OVER w
FROM unnest(ARRAY[1,2,3,4,NULL, NULL, NULL, 5, 6, 7]) AS val
WINDOW w as ();

The connection to the server was lost. Attempting reset: Failed.

2). As Troels Nielsen said up-thread, I think this should throw a
FEATURE_NOT_SUPPORTED error if it is used for window functions that
don't support it, rather than silently ignoring the flag.

3). Similarly, the parser accepts ignore/respect nulls for arbitrary
aggregate functions over a window, so maybe this should also throw a
FEATURE_NOT_SUPPORTED error. Alternatively, it might be trivial to
make all aggregate functions work with ignore nulls in a window
context, simply by using the existing code for strict aggregate
transition functions. That might be quite handy to support things like
array_agg(val) IGNORE NULLS OVER(...).

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-06-30 10:50:51 Re: GIN improvements part 1: additional information
Previous Message Amit kapila 2013-06-30 07:33:00 Re: New regression test time