Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

From: Nicholas White <n(dot)j(dot)white(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Troels Nielsen <bn(dot)troels(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
Date: 2013-07-11 14:51:01
Message-ID: CA+=vxNYsqwLKUHBmmSEGEkQP3a2LtFfAKq32xNqCif0Mhe7WtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've attached a revised version that fixes the issues above:

> changing a reference of the form:
> OVER w
> into:
> OVER (w)

Fixed (and I've updated the tests).

> It's bad form to modify a list while iterating through it.

Fixed

> We shouldn't create an arbitrary number of duplicate windows

Fixed

> Is there a problem with having two windowdefs in
> the p_windowdefs list with the same name
> ...
> You'll have to be a little careful that any other code knows that names
> can be duplicated in the list though.

I'm not sure I really can verify this - as I'm not sure how much
contrib / other third-party code has access to this data structure.
I'd prefer to be cautious and just create a child window if needed.

> I think we should get rid of the bitmapset entirely
> ...
> Instead of the bitmapset, we can keep track of two offsets

I've modified leadlag_common so it uses your suggested algorithm for
constant offsets (although it turns out you only need to keep a single
int64 index in the context). This algorithm calls
WinGetFuncArgInPartition at least twice per row, once to check whether
the current row is null (and so check if we have to move the leading /
lagged index forward) and either once to get leading / lagging value
or more than once to push the leading / lagged value forwards to the
next non-null value.
I've kept the bitmap solution for the non-constant offset case (i.e.
the random partition access case) as I believe it changes the cost of
calculating the lead / lagged values for every row in the partition to
O(partition size) - whereas a non-caching scan-the-partition solution
would be O(partition size * partition size). Is that OK?

Thanks -

Nick

Attachment Content-Type Size
lead-lag-ignore-nulls.patch application/octet-stream 38.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2013-07-11 16:19:01 Re: [PATCH] big test separation POC
Previous Message Sawada Masahiko 2013-07-11 14:42:10 Re: Patch for fail-back without fresh backup