Quick Links

Re: Window functions patch v04 for the September commit fest

Lists:	pgsql-hackers

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Window functions patch v04 for the September commit fest
Date:	2008-08-29 17:04:27
Message-ID:	e08cc0400808291004j3f5d0548hea89dcdd946dcd6d@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Here's the latest window functions patch against HEAD. It seems to be
ready for the September commit fest, as added documents, WINDOW clause
feature and misc tests. I guess this would be the window functions
feature freeze for 8.4. The remaining feature will be implemented for
the later release.

This patch consists of:
- windowed aggregates
- cooperate with GROUP BY aggregates
- some optimizations with multiple windows
- ranking functions
- WINDOW clause
- windowing SQL regression tests
- sgml documents

Looking up the total road map, the dropped features are:

- sliding window (window framing)
- lead(), lag(), etc. that reach for random rows
- user defined window functions

The first and second topics are difficult to implement currently.
Because these features require random row access, it seems that
tuplestore would be able to save multiple positions to mark/restore.
This is fundamental change that is over my capability. Also, user
defined window functions seem to have much more to decide. I think I
can't put into shape the general needs of user's window functions now.
Lacking these feature, this stage looks compatible to SQLServer 2005,
while Oracle and DB2 have almost full of the specification.

Also, current implementation has only a type of plan which uses sort
operation. It should be optimized by re-position the windows and/or
using hashtable.

Oh, git hosting repository is updated as well.

Regards,

--
Hitoshi Harada

Attachment	Content-Type	Size
window_functions.patch.20080830.gz	application/x-gzip	36.5 KB

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-01 03:15:19
Message-ID:	e08cc0400808312015o486ab4a3lacd51b12a1582396@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/8/30 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> Here's the latest window functions patch against HEAD. It seems to be
> ready for the September commit fest, as added documents, WINDOW clause
> feature and misc tests. I guess this would be the window functions
> feature freeze for 8.4. The remaining feature will be implemented for
> the later release.
>
> This patch consists of:
> - windowed aggregates
> - cooperate with GROUP BY aggregates
> - some optimizations with multiple windows
> - ranking functions
> - WINDOW clause
> - windowing SQL regression tests
> - sgml documents
>
> Looking up the total road map, the dropped features are:
>
> - sliding window (window framing)
> - lead(), lag(), etc. that reach for random rows
> - user defined window functions
>
> The first and second topics are difficult to implement currently.
> Because these features require random row access, it seems that
> tuplestore would be able to save multiple positions to mark/restore.
> This is fundamental change that is over my capability. Also, user
> defined window functions seem to have much more to decide. I think I
> can't put into shape the general needs of user's window functions now.
> Lacking these feature, this stage looks compatible to SQLServer 2005,
> while Oracle and DB2 have almost full of the specification.
>
> Also, current implementation has only a type of plan which uses sort
> operation. It should be optimized by re-position the windows and/or
> using hashtable.
>
> Oh, git hosting repository is updated as well.

README is updated.
http://umitanuki.net/pgsql/wfv04/design.html

Please add link to commit fest wiki.

Regards,

--
Hitoshi Harada

From:	David Fetter <david(at)fetter(dot)org>
To:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-01 04:46:29
Message-ID:	20080901044629.GK3717@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Sep 01, 2008 at 12:15:19PM +0900, Hitoshi Harada wrote:
> README is updated.
> http://umitanuki.net/pgsql/wfv04/design.html
>
> Please add link to commit fest wiki.

Added :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-01 16:55:14
Message-ID:	1220288114.4371.196.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 2008-08-30 at 02:04 +0900, Hitoshi Harada wrote:

> Here's the latest window functions patch against HEAD. It seems to be
> ready for the September commit fest, as added documents, WINDOW clause
> feature and misc tests. I guess this would be the window functions
> feature freeze for 8.4. The remaining feature will be implemented for
> the later release.
>
> This patch consists of:
> - windowed aggregates
> - cooperate with GROUP BY aggregates
> - some optimizations with multiple windows
> - ranking functions
> - WINDOW clause
> - windowing SQL regression tests
> - sgml documents
>
> Looking up the total road map, the dropped features are:
>
> - sliding window (window framing)
> - lead(), lag(), etc. that reach for random rows
> - user defined window functions
>
> The first and second topics are difficult to implement currently.
> Because these features require random row access, it seems that
> tuplestore would be able to save multiple positions to mark/restore.
> This is fundamental change that is over my capability. Also, user
> defined window functions seem to have much more to decide. I think I
> can't put into shape the general needs of user's window functions now.
> Lacking these feature, this stage looks compatible to SQLServer 2005,
> while Oracle and DB2 have almost full of the specification.

If you've done all of that, then I'm impressed. Well done.

Few general comments

* The docs talk about "windowing functions", yet you talk about "window
functions" here. I think the latter is correct, but whichever we choose
we should be consistent (and hopefully matching SQL Standard).

* You don't use duplicate the examples from the docs into the tests,
which is always a good way to get conflicting reports from users. :-)

* The tests seem very light for such a huge range of new functionality.
(8 tests is hardly sufficient). I'd like to see a wide range of tests -
probably 5-10 times as many individual test statements. I would also
like to see test failures that illustrate the as-yet unimplemented
features and the warning messages that are thrown - this will help us
understand exactly what is missing also. It would also be useful to see
other common coding mistakes/misconceptions and the corresponding error
messages.

> Also, current implementation has only a type of plan which uses sort
> operation. It should be optimized by re-position the windows and/or
> using hashtable.

I would like to see some performance test results also. It would be good
to know whether they are fast/slow etc.. It will definitely help the
case for inclusion if they are faster than alternative multi-statement
approaches to solving the basic data access problems.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-01 18:00:47
Message-ID:	48BC2DCF.2090103@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hitoshi Harada wrote:
> 2008/8/30 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
>> Here's the latest window functions patch against HEAD. It seems to be
>> ready for the September commit fest, as added documents, WINDOW clause
>> feature and misc tests. I guess this would be the window functions
>> feature freeze for 8.4. The remaining feature will be implemented for
>> the later release.
>>
>> This patch consists of:
>> - windowed aggregates
>> - cooperate with GROUP BY aggregates
>> - some optimizations with multiple windows
>> - ranking functions
>> - WINDOW clause
>> - windowing SQL regression tests
>> - sgml documents
>>
>> Looking up the total road map, the dropped features are:
>>
>> - sliding window (window framing)
>> - lead(), lag(), etc. that reach for random rows
>> - user defined window functions

Ok, I'm starting to read up on SQL2003 window functions, and to review
this patch. Here's a long brain-dump of my first thoughts on the design.

Let's list a couple of requirements first:

1. It's important that what gets committed now can be extended to handle
all of the window function stuff in SQL2003 in the future, as well as
user-defined-window-functions in the spirit of PostgreSQL extensibility.
Even if we don't implement all of it in this release.

2. Performance needs to be reasonable, in the face a large number of
input rows. These are typically used in OLAP queries that process
gazillions of rows. Some window functions need access to all rows in the
window frame or partition, so you can't reasonably expect great
performance with those if the working set is large, but those that
don't, should work with finite memory requirements, and reasonably fast.

Because of 1., let's focus on the interface for
user-defined-window-functions first. Ideally, the interface is such that
all the window functions and aggregates defined in SQL2003, and others
we might want to have in core or as pgfoundry projects, can be
implemented using it.

I think there's still some confusion in the design document about what a
frame is. The terminology section is great, it helped me a lot to get
started, so thank you for that. However, in the "Actual design in the
patch" you describe how a window function works, and you talk about a
window frame, but elsewhere in the doc you note that window frames are
not implemented yet.

As already discussed, there's two different kind of window expressions:
window aggregates, and ranking aggregates (you call them window
functions in your doc). It seems that they are different enough that we
can't bang them together. For example, you can't use a window frame
clause with ranking aggregates, and ranking aggregates need to see the
whole row, whereas window aggregates only see the expressions passed as
argument.

API for window functions (aka. ranking aggregate functions)
------------------------

Borrowing ideas from the many approaches listed in the design doc, I
propose an interface using a set-returning function. Similar to Simon's
proposal, but taking into account that a ranking function might need to
see some or all input tuples before returning anything.

For each input tuple, the window function can return any number of
tuples, including 0. After processing all input values, though, the
total number of output rows returned must match the total number of
input values fed into it. The function must eventually output one value
for each input value, in the same order.

This is a flexible design that allows for different kind of
implementations; the function can return one output tuple for each input
tuple (ROW_NUMBER(), RANK() etc.), or it can swallow all input tuples
first, and only then return all output tuples (e.g. CUME_DIST).

Here's a couple of step-by-step examples of how some functions would
work. Imagine input values 1, 3, 4, 4, 5:

RANK
----

Invocation Input Value returned Internal state (after)
1 1 1 (last value 1, count 1, rank=1)
2 3 2 (last value 2, count 1, rank=2)
3 4 3 (last value 4, count 1, rank=3)
4 4 3 (last value 4, count 2, rank=3)
5 5 5 (last value 5, count 1, rank=5)

So RANK returns one tuple after each input tuple. Just like in your
current patch, keep the last value returned, a row count, and the
current rank as state.

LAG
---

Internal state is a FIFO queue of values. Each input value is pushed to
the FIFO, and the tuple that falls out of the queue is returned.

For example, LAG(<col>,2,0):

Invocation Input row Value returned Internal state (after)
1 1 0 (1)
2 3 0 (3, 1)
3 4 1 (4, 3)
4 4 3 (4, 4)
5 5 4 (5, 4)

LEAD
----

Return nothing for first <offset> input tuples. Then return the current
input tuple for the rest of the input tuples, and after the last input
tuple, return <offset> number of default values.

For example, LEAD(<col>,2,0):

Invocation Input row Value returned Internal state (after)
1 1 <none> (offsetbegin = 1, pad=2)
2 3 <none> (offsetbegin = 0, pad=2)
3 4 4 (offsetbegin = 0, pad=2)
4 4 4 (offsetbegin = 0, pad=2)
5 5 5 (offsetbegin = 0, pad=2)
6 <none> 0 (offsetbegin = 0, pad=1)
7 <none> 0 (offsetbegin = 0, pad=0)

For functions like LEAD or CUME_DIST that don't immediately start
returning values, the executor will need a tuplestore to buffer input
rows until it gets their values from the window function.

The syntax for creating a ranking aggregate might look something like this:

CREATE RANKING AGGREGATE name (
SFUNC = sfunc,
STYPE = state_data_type,
OUTFUNC = outfunc
)

This is very similar to Simon's proposal, and to current CREATE
AGGREGATE, but tweaked a bit to fix the problems Hitoshi pointed out.

sfunc is called once for each input row. Unlike with normal aggregates,
sfunc is passed the whole input row, so that e.g RANK can compare it
against the previous row, or LEAD can buffer it.

outfunc is a set-returning function, and it is called until it returns
no more rows, after each sfunc invocation.

Window aggregates
-----------------

Like normal aggregates, window aggregates process N input values, and
return one output value. In normal GROUP BY expressions, the aggregate
function is called once each group, but in a window aggregate expression
with a window frame (aka sliding window), there is as many "groups" as
there is rows.

In fact, any normal aggregate function can also be used as a window
aggregate and vice versa. The trivial implementation is to have a buffer
to hold all values currently in the window frame, and for each input
row, invoke the aggregate function over all the rows currently in the
frame. I think we'll need support that, to allow using any built-in or
user-defined aggregate as a window aggregate.

However, we also want to provide optimized versions of the common
aggregates. Aggregates like COUNT, SUM, and AVG can easily be
implemented more efficiently, without rescanning all the tuples in the
group.

I propose an extension to the current CREATE AGGREGATE syntax to allow
for optimized versions. Currently, the syntax is like:

CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
SFUNC = sfunc,
STYPE = state_data_type
[ , FINALFUNC = ffunc ]
[ , INITCOND = initial_condition ]
[ , SORTOP = sort_operator ]
)

Let's add a function to allow removing tuples from the current window frame:

CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
SFUNC = sfunc,
STYPE = state_data_type
[ , FINALFUNC = ffunc ]
[ , INITCOND = initial_condition ]
[ , SORTOP = sort_operator ]
[ , SUBTRACTFUNC = subfunc,
)

subfunc is like sfunc, except that the input value is "subtracted" from
the current value. On each input row, the executor calls sfunc for all
new tuples that enter the window frame, and subfunc for all tuples that
exit it. Another difference is that finalfunc can be called many times
during the lifecycle of an aggregate invocation. For example, the
subfunc for COUNT would decrement the row count by one, and for SUM,
subfunc would subtract the removed value from the current sum. Perhaps
we should also support an "opportunistic subfunc", that could either
perform the subtraction, or return a special value indicating that it
can't be done, and we need to calculate the aggregate from scratch as if
subfunc wasn't defined. That would be useful for MIN/MAX, which can be
implemented by just keeping track of the current MIN/MAX value, and
"subtracting" any other value than the current MIN/MAX is a no-op, but
"subtracting" the current MIN/MAX value requires scanning all the rest
of the tuples from scratch.

PS. Have you looked at the "hypothetical set functions" in SQL2003?

PPS. I was just about to send this, when I noticed that LEAD/LAG are in
fact not in the SQL2003 draft I have at hand. In fact, none of the
window functions defined there, RANK, DENSE_RANK, PERCENT_RANK,
CUME_DIST and ROW_NUMBER, require random access to the tuples. Still
seems like a good idea to be have the flexibility for them.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	David Fetter <david(at)fetter(dot)org>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-01 20:39:03
Message-ID:	20080901203903.GP3717@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Sep 01, 2008 at 09:00:47PM +0300, Heikki Linnakangas wrote:
> Hitoshi Harada wrote:
>> 2008/8/30 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
>>> Here's the latest window functions patch against HEAD. It seems to be
>>> ready for the September commit fest, as added documents, WINDOW clause
>>> feature and misc tests. I guess this would be the window functions
>>> feature freeze for 8.4. The remaining feature will be implemented for
>>> the later release.
>>>
>>> This patch consists of:
>>> - windowed aggregates
>>> - cooperate with GROUP BY aggregates
>>> - some optimizations with multiple windows
>>> - ranking functions
>>> - WINDOW clause
>>> - windowing SQL regression tests
>>> - sgml documents
>>>
>>> Looking up the total road map, the dropped features are:
>>>
>>> - sliding window (window framing)
>>> - lead(), lag(), etc. that reach for random rows
>>> - user defined window functions
>
> Ok, I'm starting to read up on SQL2003 window functions,

Maybe it would be better to read the SQL2008 standard
<http://wiscorp.com/sql200n.zip> :)

> and to review this patch. Here's a long brain-dump of my first
> thoughts on the design.
>
> Let's list a couple of requirements first:
>
> 1. It's important that what gets committed now can be extended to
> handle all of the window function stuff in SQL2003 in the future,
> as well as user-defined-window-functions in the spirit of
> PostgreSQL extensibility. Even if we don't implement all of it in
> this release.

> 2. Performance needs to be reasonable, in the face a large number of
> input rows. These are typically used in OLAP queries that process
> gazillions of rows. Some window functions need access to all rows in the
> window frame or partition, so you can't reasonably expect great
> performance with those if the working set is large, but those that
> don't, should work with finite memory requirements, and reasonably fast.
>
> Because of 1., let's focus on the interface for
> user-defined-window-functions first. Ideally, the interface is such
> that all the window functions and aggregates defined in SQL2003,
> and others we might want to have in core or as pgfoundry projects,
> can be implemented using it.

SQL2008 defines the following:

<rank function type> ::=
RANK
| DENSE_RANK
| PERCENT_RANK
| CUME_DIST

<ntile function> ::=
NTILE <left paren> <number of tiles> <right paren>

<lead or lag function> ::=
<lead or lag> <left paren> <lead or lag extent>
[ <comma> <offset> [ <comma> <default expression> ] ] <right paren>
[ <null treatment> ]

<lead or lag> ::=
LEAD | LAG

<null treatment> ::=
RESPECT NULLS | IGNORE NULLS

<first or last value function> ::=
<first or last value> <left paren> <value expression> <right paren> [ <null treatment> ]

<first or last value> ::=
FIRST_VALUE | LAST_VALUE

<nth value function> ::=
NTH_VALUE <left paren> <value expression> <comma> <nth row> <right paren>
[ <from first or last> ] [ <null treatment> ]

<from first or last> ::=
FROM FIRST
| FROM LAST

<in-line window specification> ::=
<window specification>

> For functions like LEAD or CUME_DIST that don't immediately start
> returning values, the executor will need a tuplestore to buffer input
> rows until it gets their values from the window function.

For these aggregates, should there be an API that signals that a tuplestore
will be needed?

> The syntax for creating a ranking aggregate might look something like this:
>
> CREATE RANKING AGGREGATE name (
> SFUNC = sfunc,
> STYPE = state_data_type,
> OUTFUNC = outfunc
> )
>
> This is very similar to Simon's proposal, and to current CREATE
> AGGREGATE, but tweaked a bit to fix the problems Hitoshi pointed
> out.
>
> sfunc is called once for each input row. Unlike with normal
> aggregates, sfunc is passed the whole input row, so that e.g RANK
> can compare it against the previous row, or LEAD can buffer it.
>
> outfunc is a set-returning function, and it is called until it
> returns no more rows, after each sfunc invocation.

> Window aggregates
> -----------------
>
> Like normal aggregates, window aggregates process N input values, and
> return one output value. In normal GROUP BY expressions, the aggregate
> function is called once each group, but in a window aggregate expression
> with a window frame (aka sliding window), there is as many "groups" as
> there is rows.
>
> In fact, any normal aggregate function can also be used as a window
> aggregate and vice versa. The trivial implementation is to have a buffer
> to hold all values currently in the window frame, and for each input
> row, invoke the aggregate function over all the rows currently in the
> frame. I think we'll need support that, to allow using any built-in or
> user-defined aggregate as a window aggregate.
>
> However, we also want to provide optimized versions of the common
> aggregates. Aggregates like COUNT, SUM, and AVG can easily be
> implemented more efficiently, without rescanning all the tuples in the
> group.
>
> I propose an extension to the current CREATE AGGREGATE syntax to allow
> for optimized versions. Currently, the syntax is like:
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
> SFUNC = sfunc,
> STYPE = state_data_type
> [ , FINALFUNC = ffunc ]
> [ , INITCOND = initial_condition ]
> [ , SORTOP = sort_operator ]
> )
>
> Let's add a function to allow removing tuples from the current window frame:
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
> SFUNC = sfunc,
> STYPE = state_data_type
> [ , FINALFUNC = ffunc ]
> [ , INITCOND = initial_condition ]
> [ , SORTOP = sort_operator ]
> [ , SUBTRACTFUNC = subfunc,
> )
>
> subfunc is like sfunc, except that the input value is "subtracted" from
> the current value. On each input row, the executor calls sfunc for all
> new tuples that enter the window frame, and subfunc for all tuples that
> exit it. Another difference is that finalfunc can be called many times
> during the lifecycle of an aggregate invocation. For example, the
> subfunc for COUNT would decrement the row count by one, and for SUM,
> subfunc would subtract the removed value from the current sum. Perhaps
> we should also support an "opportunistic subfunc", that could either
> perform the subtraction, or return a special value indicating that it
> can't be done, and we need to calculate the aggregate from scratch as if
> subfunc wasn't defined. That would be useful for MIN/MAX, which can be
> implemented by just keeping track of the current MIN/MAX value, and
> "subtracting" any other value than the current MIN/MAX is a no-op, but
> "subtracting" the current MIN/MAX value requires scanning all the rest
> of the tuples from scratch.
>
> PS. Have you looked at the "hypothetical set functions" in SQL2003?

They're kinda neat :)

> PPS. I was just about to send this, when I noticed that LEAD/LAG are in
> fact not in the SQL2003 draft I have at hand. In fact, none of the
> window functions defined there, RANK, DENSE_RANK, PERCENT_RANK,
> CUME_DIST and ROW_NUMBER, require random access to the tuples. Still
> seems like a good idea to be have the flexibility for them.

They're in SQL:2008.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-01 21:50:29
Message-ID:	87y72bmt7e.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:

> sfunc is called once for each input row. Unlike with normal aggregates, sfunc
> is passed the whole input row, so that e.g RANK can compare it against the
> previous row, or LEAD can buffer it.

I'm not sure I follow this bit about being passed the whole input row. How
would that relate to something like lag(x*y, 2) for example?

> outfunc is a set-returning function, and it is called until it returns no more
> rows, after each sfunc invocation.

And in any case I feel like this is abstraction on the cheap. The only reason
it's so general is because it's leaving all the work to the functions to
implement.

It also means we get no benefit from cases like

SELECT lag(x,5),lag(y,5),lag(z,5)

where the executor could keep one tuplestore and for all of them. For that
matter it could keep one tuplestore even for cases like lag(x,5),lag(y,4).

What would the executor do for a query like

SELECT lead(x,1),lead(y,2),lead(y,3)

It would not only have to keep a tuplestore to buffer the output but it would
have to deal with receiving data from different SRFs at different times. The
best approach I can think of would be to keep a tuplestore for each SRF using
themas queues, reading old values from the head as soon as they all have at
least one new value in them.

And it doesn't answer how to deal with things like

SELECT lag(x,1) OVER (ORDER BY a), lag(x,1) OVER (ORDER BY b)

I, uh, don't actually have any ideas of how to deal with that one :(

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 04:46:46
Message-ID:	e08cc0400809012146m5da0ecebqbba6d1ceecdf42f4@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thanks for comments,

2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
>
> Ok, I'm starting to read up on SQL2003 window functions, and to review this
> patch. Here's a long brain-dump of my first thoughts on the design.
>
> Let's list a couple of requirements first:
>
> 1. It's important that what gets committed now can be extended to handle all
> of the window function stuff in SQL2003 in the future, as well as
> user-defined-window-functions in the spirit of PostgreSQL extensibility.
> Even if we don't implement all of it in this release.
>
> 2. Performance needs to be reasonable, in the face a large number of input
> rows. These are typically used in OLAP queries that process gazillions of
> rows. Some window functions need access to all rows in the window frame or
> partition, so you can't reasonably expect great performance with those if
> the working set is large, but those that don't, should work with finite
> memory requirements, and reasonably fast.
>
> Because of 1., let's focus on the interface for
> user-defined-window-functions first. Ideally, the interface is such that all
> the window functions and aggregates defined in SQL2003, and others we might
> want to have in core or as pgfoundry projects, can be implemented using it.
>
> I think there's still some confusion in the design document about what a
> frame is. The terminology section is great, it helped me a lot to get
> started, so thank you for that. However, in the "Actual design in the patch"
> you describe how a window function works, and you talk about a window frame,
> but elsewhere in the doc you note that window frames are not implemented
> yet.

Sorry about that. In my understanding, the "Window Frame" is defined
by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
contrast to "Window Partition" defined by "PARTITION BY" clause. A
frame slides within a partition or there's only one frame if those
clauses are not specified. The current patch has not implemented this
yet. I'll update the docs.

>
> As already discussed, there's two different kind of window expressions:
> window aggregates, and ranking aggregates (you call them window functions in
> your doc). It seems that they are different enough that we can't bang them
> together. For example, you can't use a window frame clause with ranking
> aggregates, and ranking aggregates need to see the whole row, whereas window
> aggregates only see the expressions passed as argument.

I don't like to call the second type "ranking aggregates" because it
may refer to only ranking functions though there are more types of
function like ntile(), lead() and lag(). But "window functions"
doesn't seem appropriate either since it's ambiguous with the general
name of window expressions.

>
> API for window functions (aka. ranking aggregate functions)
> ------------------------
>
> Borrowing ideas from the many approaches listed in the design doc, I propose
> an interface using a set-returning function. Similar to Simon's proposal,
> but taking into account that a ranking function might need to see some or
> all input tuples before returning anything.
>
> For each input tuple, the window function can return any number of tuples,
> including 0. After processing all input values, though, the total number of
> output rows returned must match the total number of input values fed into
> it. The function must eventually output one value for each input value, in
> the same order.
>
> This is a flexible design that allows for different kind of implementations;
> the function can return one output tuple for each input tuple (ROW_NUMBER(),
> RANK() etc.), or it can swallow all input tuples first, and only then return
> all output tuples (e.g. CUME_DIST).
>
> Here's a couple of step-by-step examples of how some functions would work.
> Imagine input values 1, 3, 4, 4, 5:
>
> RANK
> ----
>
> Invocation Input Value returned Internal state (after)
> 1 1 1 (last value 1, count 1, rank=1)
> 2 3 2 (last value 2, count 1, rank=2)
> 3 4 3 (last value 4, count 1, rank=3)
> 4 4 3 (last value 4, count 2, rank=3)
> 5 5 5 (last value 5, count 1, rank=5)
>
> So RANK returns one tuple after each input tuple. Just like in your current
> patch, keep the last value returned, a row count, and the current rank as
> state.
>
> LAG
> ---
>
> Internal state is a FIFO queue of values. Each input value is pushed to the
> FIFO, and the tuple that falls out of the queue is returned.
>
> For example, LAG(<col>,2,0):
>
> Invocation Input row Value returned Internal state (after)
> 1 1 0 (1)
> 2 3 0 (3, 1)
> 3 4 1 (4, 3)
> 4 4 3 (4, 4)
> 5 5 4 (5, 4)
>
> LEAD
> ----
>
> Return nothing for first <offset> input tuples. Then return the current
> input tuple for the rest of the input tuples, and after the last input
> tuple, return <offset> number of default values.
>
> For example, LEAD(<col>,2,0):
>
> Invocation Input row Value returned Internal state (after)
> 1 1 <none> (offsetbegin = 1, pad=2)
> 2 3 <none> (offsetbegin = 0, pad=2)
> 3 4 4 (offsetbegin = 0, pad=2)
> 4 4 4 (offsetbegin = 0, pad=2)
> 5 5 5 (offsetbegin = 0, pad=2)
> 6 <none> 0 (offsetbegin = 0, pad=1)
> 7 <none> 0 (offsetbegin = 0, pad=0)
>
>
> For functions like LEAD or CUME_DIST that don't immediately start returning
> values, the executor will need a tuplestore to buffer input rows until it
> gets their values from the window function.
>
>
> The syntax for creating a ranking aggregate might look something like this:
>
> CREATE RANKING AGGREGATE name (
> SFUNC = sfunc,
> STYPE = state_data_type,
> OUTFUNC = outfunc
> )
>
> This is very similar to Simon's proposal, and to current CREATE AGGREGATE,
> but tweaked a bit to fix the problems Hitoshi pointed out.
>
> sfunc is called once for each input row. Unlike with normal aggregates,
> sfunc is passed the whole input row, so that e.g RANK can compare it against
> the previous row, or LEAD can buffer it.
>
> outfunc is a set-returning function, and it is called until it returns no
> more rows, after each sfunc invocation.

Your proposal is smarter than the current implementation. But it
doesn't seem complete in some way. From logical point of view, the
window functions should be allowed to access whole of rows in the
frame the current row belongs to (c.f. inverse distribution
functions). From implementing and performance point of view, there
need to consider such case like mixing window aggregates and ranking
aggregates in the same window, which means it is better that the two
types of functions are processed in the similar flow. Also logically,
SQL spec doesn't forbid ranking aggregates in sliding window frames.
So your design may be flexible enough to cover different requirements
of lead()/lag() but I don't know if it will.

> Window aggregates
> -----------------
>
> Like normal aggregates, window aggregates process N input values, and return
> one output value. In normal GROUP BY expressions, the aggregate function is
> called once each group, but in a window aggregate expression with a window
> frame (aka sliding window), there is as many "groups" as there is rows.
>
> In fact, any normal aggregate function can also be used as a window
> aggregate and vice versa. The trivial implementation is to have a buffer to
> hold all values currently in the window frame, and for each input row,
> invoke the aggregate function over all the rows currently in the frame. I
> think we'll need support that, to allow using any built-in or user-defined
> aggregate as a window aggregate.
>
> However, we also want to provide optimized versions of the common
> aggregates. Aggregates like COUNT, SUM, and AVG can easily be implemented
> more efficiently, without rescanning all the tuples in the group.
>
> I propose an extension to the current CREATE AGGREGATE syntax to allow for
> optimized versions. Currently, the syntax is like:
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
> SFUNC = sfunc,
> STYPE = state_data_type
> [ , FINALFUNC = ffunc ]
> [ , INITCOND = initial_condition ]
> [ , SORTOP = sort_operator ]
> )
>
> Let's add a function to allow removing tuples from the current window frame:
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
> SFUNC = sfunc,
> STYPE = state_data_type
> [ , FINALFUNC = ffunc ]
> [ , INITCOND = initial_condition ]
> [ , SORTOP = sort_operator ]
> [ , SUBTRACTFUNC = subfunc,
> )
>
> subfunc is like sfunc, except that the input value is "subtracted" from the
> current value. On each input row, the executor calls sfunc for all new
> tuples that enter the window frame, and subfunc for all tuples that exit it.
> Another difference is that finalfunc can be called many times during the
> lifecycle of an aggregate invocation. For example, the subfunc for COUNT
> would decrement the row count by one, and for SUM, subfunc would subtract
> the removed value from the current sum. Perhaps we should also support an
> "opportunistic subfunc", that could either perform the subtraction, or
> return a special value indicating that it can't be done, and we need to
> calculate the aggregate from scratch as if subfunc wasn't defined. That
> would be useful for MIN/MAX, which can be implemented by just keeping track
> of the current MIN/MAX value, and "subtracting" any other value than the
> current MIN/MAX is a no-op, but "subtracting" the current MIN/MAX value
> requires scanning all the rest of the tuples from scratch.

Hmmm, its name may be better as "invfunc" than subfunc. I feel
horrible in imagining to manage the combination of functions that
don't support subfunc, ones that does and ranking aggregates but it's
possible.

> PS. Have you looked at the "hypothetical set functions" in SQL2003?

I had a glance but not deeply, since I found I would be lost in design
if deeply diving into it. :)

--
Hitoshi Harada

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 05:03:38
Message-ID:	e08cc0400809012203y7f45deecra2a6dcc9131f782c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/9/2 Gregory Stark <stark(at)enterprisedb(dot)com>:
> And it doesn't answer how to deal with things like
>
> SELECT lag(x,1) OVER (ORDER BY a), lag(x,1) OVER (ORDER BY b)
>
> I, uh, don't actually have any ideas of how to deal with that one :(

For different windows we make different window nodes with different
sort nodes as the patch does.

Regards,

--
Hitoshi Harada

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 06:33:00
Message-ID:	48BCDE1C.60708@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

David Fetter wrote:
> On Mon, Sep 01, 2008 at 09:00:47PM +0300, Heikki Linnakangas wrote:
>> Ok, I'm starting to read up on SQL2003 window functions,
>
> Maybe it would be better to read the SQL2008 standard
> <http://wiscorp.com/sql200n.zip> :)

Ah, thanks!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	David Fetter <david(at)fetter(dot)org>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 06:42:25
Message-ID:	6879.1220337745@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> David Fetter wrote:
>> On Mon, Sep 01, 2008 at 09:00:47PM +0300, Heikki Linnakangas wrote:
>>> Ok, I'm starting to read up on SQL2003 window functions,
>>
>> Maybe it would be better to read the SQL2008 standard
>> <http://wiscorp.com/sql200n.zip> :)

> Ah, thanks!

Er, s/2008 standard/200n draft with uncertain chances of passage/

It's not like we haven't seen a SQL draft go down in flames before.

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 06:51:09
Message-ID:	48BCE25D.8040804@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>
>> sfunc is called once for each input row. Unlike with normal aggregates, sfunc
>> is passed the whole input row, so that e.g RANK can compare it against the
>> previous row, or LEAD can buffer it.
>
> I'm not sure I follow this bit about being passed the whole input row. How
> would that relate to something like lag(x*y, 2) for example?

Well, sfunc needs to be passed not only the whole input tuple, but also
the values of the arguments.

Hmm. The spec says that the offset argument of lead and lag needs to be
a numeric literal. To enforce that at parse time, we'd have to hard-code
that into the grammar. Or we could check it at run-time, and throw an
error if its value changes across invocations of the sfunc function.

>> outfunc is a set-returning function, and it is called until it returns no more
>> rows, after each sfunc invocation.
>
> And in any case I feel like this is abstraction on the cheap. The only reason
> it's so general is because it's leaving all the work to the functions to
> implement.
>
> It also means we get no benefit from cases like
>
> SELECT lag(x,5),lag(y,5),lag(z,5)
>
> where the executor could keep one tuplestore and for all of them. For that
> matter it could keep one tuplestore even for cases like lag(x,5),lag(y,4).

True. I guess it comes down to how much complexity we're willing to have
in the executor node to handle that more efficiently.

Note that the memory usage is the same either way, except for the extra
constant overhead of multiple tuplestores, because each tuplestore
inside the lag implementation only needs to store its own column.

> What would the executor do for a query like
>
> SELECT lead(x,1),lead(y,2),lead(y,3)
>
> It would not only have to keep a tuplestore to buffer the output but it would
> have to deal with receiving data from different SRFs at different times. The
> best approach I can think of would be to keep a tuplestore for each SRF using
> themas queues, reading old values from the head as soon as they all have at
> least one new value in them.

Hitoshi solved that creating a separate Window node for each window
function. So the plan tree for that would look something like:

Window (lead(x,1))
Window (lead(y,2))
Window (lead(y,3))
Seq Scan ...

That keeps the Window node implementation quite simple because it only
needs to handle one window function at a time.

> And it doesn't answer how to deal with things like
>
> SELECT lag(x,1) OVER (ORDER BY a), lag(x,1) OVER (ORDER BY b)
>
> I, uh, don't actually have any ideas of how to deal with that one :(

The above handles that by putting extra sort nodes in between the Window
nodes. Not too efficient, but I don't see any way around it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	David Fetter <david(at)fetter(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 06:55:05
Message-ID:	20080902065505.GA2452@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > David Fetter wrote:
> >> On Mon, Sep 01, 2008 at 09:00:47PM +0300, Heikki Linnakangas wrote:
> >>> Ok, I'm starting to read up on SQL2003 window functions,
> >>
> >> Maybe it would be better to read the SQL2008 standard
> >> <http://wiscorp.com/sql200n.zip> :)
>
> > Ah, thanks!
>
> Er, s/2008 standard/200n draft with uncertain chances of passage/
>
> It's not like we haven't seen a SQL draft go down in flames before.

Do you think that anything in the windowing functions section will
disappear?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 07:14:07
Message-ID:	7359.1220339647@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

David Fetter <david(at)fetter(dot)org> writes:
> On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
>> It's not like we haven't seen a SQL draft go down in flames before.

> Do you think that anything in the windowing functions section will
> disappear?

Who's to say?

I have no objection to looking at the 2003 and 200n documents in
parallel, especially if there are places where 200n clarifies the
intent of 2003. But I'd be suspicious of designing around
entirely-new features presented by 200n.

regards, tom lane

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	David Fetter <david(at)fetter(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 08:13:16
Message-ID:	48BCF59C.8060803@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
>> On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
>>> It's not like we haven't seen a SQL draft go down in flames before.
>
>> Do you think that anything in the windowing functions section will
>> disappear?
>
> Who's to say?
>
> I have no objection to looking at the 2003 and 200n documents in
> parallel, especially if there are places where 200n clarifies the
> intent of 2003. But I'd be suspicious of designing around
> entirely-new features presented by 200n.

well http://www.wiscorp.com/SQLStandards.html

states:

"This points to the documents which wlll likely be the documents that
represent the SQL 2008 Standard. These documents are out for
International Standard ballot at this time. The vote is an Up/Down vote.
No changes allowed."

Stefan

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 08:36:08
Message-ID:	e08cc0400809020136o69c4713fyd020d440d971ea19@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> Gregory Stark wrote:
>> What would the executor do for a query like
>>
>> SELECT lead(x,1),lead(y,2),lead(y,3)
>>
>> It would not only have to keep a tuplestore to buffer the output but it
>> would
>> have to deal with receiving data from different SRFs at different times.
>> The
>> best approach I can think of would be to keep a tuplestore for each SRF
>> using
>> themas queues, reading old values from the head as soon as they all have
>> at
>> least one new value in them.
>
> Hitoshi solved that creating a separate Window node for each window
> function. So the plan tree for that would look something like:
>
> Window (lead(x,1))
> Window (lead(y,2))
> Window (lead(y,3))
> Seq Scan ...
>
> That keeps the Window node implementation quite simple because it only needs
> to handle one window function at a time.

To say more accurately, one Window node can handle more than one
window function. If it is thought to be the same using equalFuncs
comparing targetlists, some functions are put into one node.

Regards,

--
Hitoshi Harada

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 09:44:31
Message-ID:	1220348671.4371.312.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2008-09-01 at 21:00 +0300, Heikki Linnakangas wrote:

> 1. It's important that what gets committed now can be extended to handle
> all of the window function stuff in SQL2003 in the future, as well as
> user-defined-window-functions in the spirit of PostgreSQL extensibility.
> Even if we don't implement all of it in this release.

I think whatever public APIs get published now must be sufficient to
support user-defined-window functions across all future releases, so on
that point I agree completely. (One reason why I argued earlier in
favour of avoiding an API for now).

We shouldn't restrict the implementation of the internals to be upward
compatible though because I foresee some aspect of complexity stalling
and thus killing the patch in the short term if we do that. We don't
have much time left for this release.

If we only have the combined (brain * time) to get a partial
implementation in for this release then I would urge we go for that,
rather than wait for perfection - as long as there are no other negative
effects.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 10:24:21
Message-ID:	20080902102421.GB3664@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 02, 2008 at 10:44:31AM +0100, Simon Riggs wrote:
> If we only have the combined (brain * time) to get a partial
> implementation in for this release then I would urge we go for that,
> rather than wait for perfection - as long as there are no other negative
> effects.

"premature optimization is the root of all evil." (Knuth, Donald.)

"make it work, then make it better".

Getting a partial implementation that works out now is far better than
waiting until its perfect.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	David Fetter <david(at)fetter(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 11:42:45
Message-ID:	1220355765.4371.381.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2008-09-02 at 03:14 -0400, Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
> > On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
> >> It's not like we haven't seen a SQL draft go down in flames before.
>
> > Do you think that anything in the windowing functions section will
> > disappear?
>
> Who's to say?
>
> I have no objection to looking at the 2003 and 200n documents in
> parallel, especially if there are places where 200n clarifies the
> intent of 2003. But I'd be suspicious of designing around
> entirely-new features presented by 200n.

I have confirmation from Michael Gorman, Wiscorp, that

> The new standard was approved in early Summer. SQL 2008 is finished.

So as of now, SQL2008 exists, all hail. SQL2003 and earlier versions
have been superseded and can be ignored.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 12:08:50
Message-ID:	e08cc0400809020508k16180032ub78dd3fd71851e1e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/9/2 Simon Riggs <simon(at)2ndquadrant(dot)com>:
> If you've done all of that, then I'm impressed. Well done.
>
> Few general comments
>
> * The docs talk about "windowing functions", yet you talk about "window
> functions" here. I think the latter is correct, but whichever we choose
> we should be consistent (and hopefully matching SQL Standard).

That's what I'm embarrassed. Now we have "window functions" meaning
two, one for generic name of window expressions and the other for
non-window-aggregates. It is a word play, which is difficult problem
for non-native people, but... let's use "window functions". I'll
revise sgml docs.

> * You don't use duplicate the examples from the docs into the tests,
> which is always a good way to get conflicting reports from users. :-)
>
> * The tests seem very light for such a huge range of new functionality.
> (8 tests is hardly sufficient). I'd like to see a wide range of tests -
> probably 5-10 times as many individual test statements. I would also
> like to see test failures that illustrate the as-yet unimplemented
> features and the warning messages that are thrown - this will help us
> understand exactly what is missing also. It would also be useful to see
> other common coding mistakes/misconceptions and the corresponding error
> messages.
>
>> Also, current implementation has only a type of plan which uses sort
>> operation. It should be optimized by re-position the windows and/or
>> using hashtable.
>
> I would like to see some performance test results also. It would be good
> to know whether they are fast/slow etc.. It will definitely help the
> case for inclusion if they are faster than alternative multi-statement
> approaches to solving the basic data access problems.
>

OK, thanks for your advices. I'll work on tests, docs and benchmarks,
then send in another patch in a week or so.

Regards,

--
Hitoshi Harada

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 12:51:09
Message-ID:	48BD36BD.2070209@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hitoshi Harada wrote:
> 2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> In my understanding, the "Window Frame" is defined
> by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
> contrast to "Window Partition" defined by "PARTITION BY" clause. A
> frame slides within a partition or there's only one frame if those
> clauses are not specified. The current patch has not implemented this
> yet. I'll update the docs.

Yes, that's how I read it as well. Another way to think of it is that
there's always a window frame, but if no ROWS BETWEEN or similar clause
is given, the window frame spans the whole partition (or from the 1st
row of the partition, up to the current row, if ORDER BY is given).

> I don't like to call the second type "ranking aggregates" because it
> may refer to only ranking functions though there are more types of
> function like ntile(), lead() and lag(). But "window functions"
> doesn't seem appropriate either since it's ambiguous with the general
> name of window expressions.

Yep, confusing :-(. The SQL standard says that "a window function is one
of: a rank function, a distribution function, a row number function, a
window aggregate function, the ntile function, the lead function, the
lag function, the first-value function, the last-value function, the
nth-value function". So, window aggregate functions are a special class
of window functions, and there's no term to refer to all the rest of the
window functions excluding window aggregate functions.

Your doc SQL spec
Window expression Window function
Window function Any window function other than a window aggregate function
Window aggregate Window aggregate function

I tried to coin the term "ranking aggregate" for the SQL2008 term "Any
window function other than a window aggregate function", but you're
right that that's still confusing, because the SQL2008 term "rank
function" includes only RANK() and DENSE_RANK().

The spec calls them "group aggregate functions", when they're used with
GROUP BY, rather than as a window function. I think we could use that term.

> Your proposal is smarter than the current implementation. But it
> doesn't seem complete in some way. From logical point of view, the
> window functions should be allowed to access whole of rows in the
> frame the current row belongs to (c.f. inverse distribution
> functions).

By the whole of rows, do you mean
a) the chosen value or expression of all rows, or
b) all columns of the input rows?

Different window functions have different needs. RANK() for example does
need to see all columns, to compare them, but it only needs access to
the previous and current row. CUME_DIST on the other hand needs access
to all columns of all rows, and LAG needs access to a specific column of
a fixed number of rows. And row_number needs nothing.

The needs of access to the rows are so different that it seems best to
me to delegate the buffering to the window function.

Actually, looking closer to the ranking functions, they don't really
need access to all columns. They just need to be able to compare them,
according to the window ordering, so we should probably only provide
access to the arguments of the aggregate, evaluated for any row in the
frame, and a comparator function, that can determine if any given rows
in the frame are <, = or >.

> From implementing and performance point of view, there
> need to consider such case like mixing window aggregates and ranking
> aggregates in the same window, which means it is better that the two
> types of functions are processed in the similar flow. Also logically,
> SQL spec doesn't forbid ranking aggregates in sliding window frames.

It doesn't? Oh, bummer. It seems we need a grand unified theory of
ranking and other aggregates then :-(. How does something like RANK work
with a window frame? What does it return, if the current row is excluded
from the window frame, e.g with EXCLUDE CURRENT ROW?

Let's look at the trivial, generic, and slow implementation first, and
then figure out what tricks we can do to make it faster. I gather that
that generic algorithm, following how the SQL spec defines the window
frame, looks like this:

0. Construct the ordered set of tuples, containing all the tuples in the
partition.
For each row, start with the ordered set constructed in step 0, and:
1. Remove all tuples from the set that are before the start of the
sliding frame
2. Remove all tuples from the set that are after the end of the sliding
frame
3. Remove all tuples specified by the window-frame-exclusion clause
(EXCLUDE CURRENT ROW/GROUP/TIES/NO OTHERS).
4. Run the aggregate over all tuples still in the set.

Now, the first obvious optimization is that we don't want to construct
the window frame from scratch for each row. Instead:

0. Start with an empty ordered set S.
For each row:
1. Read forward until we hit the end of the new sliding frame, and add
those rows to S.
2. Remove all tuples from S that are before the start of the sliding frame.
3. Construct a new set T, which is the same as S, but all tuples
specified by the window-frame-exclusion clause (EXCLUDE CURRENT
ROW/GROUP/TIES/NO OTHERS) are removed.
5. Run the aggregate over all tuples in T.

This is applicable to all ranking and window aggregates. Per spec, if no
window-frame-clause is given, the start of the frame is always the 1st
row in the partition. If an ORDER BY is given, the end of the frame is
the the current row. Otherwise it's the end of partition, which means
that we can just run the aggregate once and return the same value for
all rows in the partition.

Now, let's consider how to speed up step 5. There's the difference
between ranking and window aggregates. Window aggregates don't care
about the position of the current row within the window frame, so the
interface I proposed earlier, with the subfunc, would work fine. But for
ranking aggregates, the position of the current row matters.

Another must-have optimization is that we must be able to eagerly
discard rows that we know won't be accessed anymore. And likewise, we
mustn't read ahead many rows that we don't need to calculate the
aggregate for the current row.

I think I'm coming around to your suggestion of passing a Window object
to the aggregate function, allowing random access to the current frame.
As Simon pointed out, there needs to be a way for the aggregate function
to signal when it doesn't need some rows in the frame anymore, for
performance.

The sliding window frame always moves forward, never backwards.
Therefore it's sufficient if the user-defined function can indicate a
cutoff point, allowing the executor node to discard any preceding rows.
However, the window-frame-exclusion clause complicates that, because the
current row and its peers might or might not be in the frame, which
means that rows can disappear and reappear into the middle of the window
frame. We don't need to implement window-frame-exclusion in this
release, but we should consider how that affects the API.

The API could be like this (in pseudo-syntax):

CREATE AGGREGATE name (
<type> evalfunc(Window object)
enterfunc(Window object, int pos)
exitfunc(Window object, int pos)
)

where evalfunc, enterfunc and exitfunc are functions provided by the
aggregate function author.

Window object {
/* Returns the position of current row within the frame. */
int get_current_pos();
/* Returns the number of rows in the frame. */
int get_max_pos();
/* Compares two tuple in the window frame, according to window
ordering */
int compare(int posa, int posb)
/* Returns the value of an aggregate argument for tuple at given
position */
Datum get_arg_value(int pos, int argno);
/* Promise that we won't call get_arg_value or compare for any pos <
cutoff_pos anymore */
void signal_cutoff(int cutoff_pos);
}

The executor calls enterfunc for each row that enters the window frame,
and exitfunc for each row that exits the frame (before removing the
tuple from the frame, so that it's still accessible to get_arg_value()
and compare()). After calling enter/exitfuncs for all rows that
enter/exit the frame, evalfunc is called once, to return the value for
the current row.

I believe this allows for an efficient implementation of all the window
functions we've discussed. It works fine for implementing regular
aggregates as well, so there no need to treat ranking and other
aggregates any differently. We can completely replace the current method
of defining aggregates with this if we want to, as long as we provide
some sort of glue for backwards-compatibility with aggregates in
external projects.

Here's sketches of some aggregate implementations using this API:

RANK()
------
enterfunc:
if position of the new row is > current row, do nothing. Otherwise,
decrement current rank.
exitfunc:
if position of the new row is > current row, do nothing. Otherwise,
increment current rank.

LEAD(offset)
------------
evalfunc:
Call get_row(get_current_row() + offset).
Call signal_cutoff(get_current_row());

LAG(offset)
-----------
evalfunc:
Call get_row(get_current_row() - offset).
Call signal_cutoff(get_current_row() - offset);

MIN/MAX
-------
enterfunc:
if new value is </> the current min/max value, replace it with the new
value
exitfunc:
if removed value = the current min/max value, rescan the current frame
for new min/max value, iterating through all values (1..get_max_row()).

COUNT
-----
enterfunc: increment counter
exitfunc: decrement counter

CUME_DIST
---------
CUME_DIST is defined as NP/NR, where NP is the number of rows preceding
or peer with the current row, and NR is the total number of rows in the
frame. NR = get_max_pos(). Maintain NP-counter in enterfunc/exitfunc,
incrementing it whenever a row preceding current row enters or exits the
frame. In evalfunc, increment the counter to reflect the new current row
position.

> I feel horrible in imagining to manage the combination of functions that
> don't support subfunc, ones that does and ranking aggregates but it's
> possible.

Well, if you handle each window/ranking aggregate in a separate Window
node, like you did, it should be much easier to manage. If it gets
overwhelmingly complex, you could also split it into two slightly
different Window nodes, one for the generic case, for aggregates that
don't have subfunc, and one for ones that do.

(the point is moot, if we go with the API I proposed above, using a
Window object)

All in all, this is such a monstrous piece of functionality, that if we
try to implement everything in one patch, it'll never get finished. So
we clearly have to implement this in phases, and just make sure we don't
paint ourselves in the corner with the design.

I'd suggest:

1. Implement Window node, with the capability to invoke an aggregate
function, using the above API. Implement required parser/planner
changes. Implement a few simple ranking aggregates using the API.
2. Implement glue to call normal aggregates through the new API
3. Implement the optimization to drop rows that are no longer needed
(signal_cutoff can be a no-op until this phase)
4. Implement window framing (the frame can always be all rows in the
partition, or all rows until the current row, until this phase)
5. Expose the new API to user-defined aggregates. It can be an internal
API only used by built-in functions until this phase

I believe you already have phase 1 in your patch, except for the API
changes.

>> PS. Have you looked at the "hypothetical set functions" in SQL2003?
>
> I had a glance but not deeply, since I found I would be lost in design
> if deeply diving into it. :)

Yeah ;-). A quick glance suggests that it won't affect aggregate the API
we come up with, but will require changes to parser/executor.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 13:09:37
Message-ID:	48BD3B11.905@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Martijn van Oosterhout wrote:
> On Tue, Sep 02, 2008 at 10:44:31AM +0100, Simon Riggs wrote:
>> If we only have the combined (brain * time) to get a partial
>> implementation in for this release then I would urge we go for that,
>> rather than wait for perfection - as long as there are no other negative
>> effects.
>
> "premature optimization is the root of all evil." (Knuth, Donald.)
>
> "make it work, then make it better".
>
> Getting a partial implementation that works out now is far better than
> waiting until its perfect.

Sure. Just have to watch out that we don't follow a dead-end, making it
harder to add missing functionality later on.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	David Fetter <david(at)fetter(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 16:35:43
Message-ID:	20080902163543.GD2452@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 02, 2008 at 12:42:45PM +0100, Simon Riggs wrote:
>
> On Tue, 2008-09-02 at 03:14 -0400, Tom Lane wrote:
> > David Fetter <david(at)fetter(dot)org> writes:
> > > On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
> > >> It's not like we haven't seen a SQL draft go down in flames
> > >> before.
> >
> > > Do you think that anything in the windowing functions section
> > > will disappear?
> >
> > Who's to say?
> >
> > I have no objection to looking at the 2003 and 200n documents in
> > parallel, especially if there are places where 200n clarifies the
> > intent of 2003. But I'd be suspicious of designing around
> > entirely-new features presented by 200n.
>
> I have confirmation from Michael Gorman, Wiscorp, that
>
> > The new standard was approved in early Summer. SQL 2008 is
> > finished.
>
> So as of now, SQL2008 exists, all hail. SQL2003 and earlier versions
> have been superseded and can be ignored.

Any chance we can buy a few copies of the official one for use on the
project?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 17:12:28
Message-ID:	1220375548.4371.476.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2008-09-02 at 09:35 -0700, David Fetter wrote:
> On Tue, Sep 02, 2008 at 12:42:45PM +0100, Simon Riggs wrote:
> >
> > On Tue, 2008-09-02 at 03:14 -0400, Tom Lane wrote:
> > > David Fetter <david(at)fetter(dot)org> writes:
> > > > On Tue, Sep 02, 2008 at 02:42:25AM -0400, Tom Lane wrote:
> > > >> It's not like we haven't seen a SQL draft go down in flames
> > > >> before.
> > >
> > > > Do you think that anything in the windowing functions section
> > > > will disappear?
> > >
> > > Who's to say?
> > >
> > > I have no objection to looking at the 2003 and 200n documents in
> > > parallel, especially if there are places where 200n clarifies the
> > > intent of 2003. But I'd be suspicious of designing around
> > > entirely-new features presented by 200n.
> >
> > I have confirmation from Michael Gorman, Wiscorp, that
> >
> > > The new standard was approved in early Summer. SQL 2008 is
> > > finished.
> >
> > So as of now, SQL2008 exists, all hail. SQL2003 and earlier versions
> > have been superseded and can be ignored.
>
> Any chance we can buy a few copies of the official one for use on the
> project?

You have to buy them from ISO website I think, but it was $00s when I
looked some years back - we'd probably need a dozen copies at least
since we hardly ever meet. Michael's .pdf was the final version, so we
do actually have the final form even if the page formatting somewhat
different.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	"Neil Conway" <neil(dot)conway(at)gmail(dot)com>
To:	"David Fetter" <david(at)fetter(dot)org>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-02 17:52:12
Message-ID:	b4e5ce320809021052p9aadce2hf122edef653586ad@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 2, 2008 at 9:35 AM, David Fetter <david(at)fetter(dot)org> wrote:
> Any chance we can buy a few copies of the official one for use on the
> project?

AFAIK there is no significant difference between the "official"
standard and the draft version available online, so I don't see the
point.

Neil

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-03 02:34:24
Message-ID:	e08cc0400809021934r1bdb8c67w973964d569955800@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> Hitoshi Harada wrote:
>>
>> 2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
>> In my understanding, the "Window Frame" is defined
>> by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
>> contrast to "Window Partition" defined by "PARTITION BY" clause. A
>> frame slides within a partition or there's only one frame if those
>> clauses are not specified. The current patch has not implemented this
>> yet. I'll update the docs.
>
> Yes, that's how I read it as well. Another way to think of it is that
> there's always a window frame, but if no ROWS BETWEEN or similar clause is
> given, the window frame spans the whole partition (or from the 1st row of
> the partition, up to the current row, if ORDER BY is given).
>
>> I don't like to call the second type "ranking aggregates" because it
>> may refer to only ranking functions though there are more types of
>> function like ntile(), lead() and lag(). But "window functions"
>> doesn't seem appropriate either since it's ambiguous with the general
>> name of window expressions.
>
> Yep, confusing :-(. The SQL standard says that "a window function is one of:
> a rank function, a distribution function, a row number function, a window
> aggregate function, the ntile function, the lead function, the lag function,
> the first-value function, the last-value function, the nth-value function".
> So, window aggregate functions are a special class of window functions, and
> there's no term to refer to all the rest of the window functions excluding
> window aggregate functions.
>
> Your doc SQL spec
> Window expression Window function
> Window function Any window function other than a window aggregate
> function
> Window aggregate Window aggregate function
>
> I tried to coin the term "ranking aggregate" for the SQL2008 term "Any
> window function other than a window aggregate function", but you're right
> that that's still confusing, because the SQL2008 term "rank function"
> includes only RANK() and DENSE_RANK().
>
> The spec calls them "group aggregate functions", when they're used with
> GROUP BY, rather than as a window function. I think we could use that term.

Agree. So from now on, we use "window functions" for all kinds of
functions including window aggregates. "Window expression" is
discarded. "Window functions" also means the mechanism to support
these functions to process and this project.

>> Your proposal is smarter than the current implementation. But it
>> doesn't seem complete in some way. From logical point of view, the
>> window functions should be allowed to access whole of rows in the
>> frame the current row belongs to (c.f. inverse distribution
>> functions).
>
> By the whole of rows, do you mean
> a) the chosen value or expression of all rows, or
> b) all columns of the input rows?

a). I mean all input rows in a window frame. But later I found
"inverse distribution function" is not one of window functions. That
is actually one of aggregate functions. Forget about it.

>
> Different window functions have different needs. RANK() for example does
> need to see all columns, to compare them, but it only needs access to the
> previous and current row. CUME_DIST on the other hand needs access to all
> columns of all rows, and LAG needs access to a specific column of a fixed
> number of rows. And row_number needs nothing.
>
> The needs of access to the rows are so different that it seems best to me to
> delegate the buffering to the window function.

Delegating optimization to them depending on functions' needs is a
good idea. So executor can concentrate on the window function process
flow. Let's unify it in executor and let trivial optimizations get
into individual functions.

>
> Actually, looking closer to the ranking functions, they don't really need
> access to all columns. They just need to be able to compare them, according
> to the window ordering, so we should probably only provide access to the
> arguments of the aggregate, evaluated for any row in the frame, and a
> comparator function, that can determine if any given rows in the frame are
> <, = or >.

That is kind of problem. If your task is only to define that window
node executor simply stores window frame rows and pass them to window
functions as they need, the rank functions' needs don't come. As you
point out, rank functions need ordering key columns and its
comparators. So you have to imagine what comes next? What will be
wanted other than ordering key columns, if we think about "universe
window functions" much more than SQL spec says?

>
>> From implementing and performance point of view, there
>> need to consider such case like mixing window aggregates and ranking
>> aggregates in the same window, which means it is better that the two
>> types of functions are processed in the similar flow. Also logically,
>> SQL spec doesn't forbid ranking aggregates in sliding window frames.
>
> It doesn't? Oh, bummer. It seems we need a grand unified theory of ranking
> and other aggregates then :-(. How does something like RANK work with a
> window frame? What does it return, if the current row is excluded from the
> window frame, e.g with EXCLUDE CURRENT ROW?

It doesn't but implicitly does. Also it seems that the Oracle and
other RDMBSs do. Rank functions in sliding frame are possible because
if you take the current row the window frame can be defined and with
it aggregate. Also in implementing, without having catalog info about
which function can take sliding window, parser/planner/executor cannot
see how to forbid it. Have the info, otherwise we have to allow it.

> Let's look at the trivial, generic, and slow implementation first, and then
> figure out what tricks we can do to make it faster. I gather that that
> generic algorithm, following how the SQL spec defines the window frame,
> looks like this:

So as to satisfy all of window functions' needs, Window object does
well. But it is so heavy that we need to optimize as functions need,
by reducing stored rows and avoiding to call redundant functions.
Still I'm afraid if user can define such a complex function...

> Here's sketches of some aggregate implementations using this API:
>
> RANK()
> ------
> enterfunc:
> if position of the new row is > current row, do nothing. Otherwise,
> decrement current rank.
> exitfunc:
> if position of the new row is > current row, do nothing. Otherwise,
> increment current rank.

I don't see why the row's positions affect rank. The rank depends on
comparing two rows ordering columns, doen't it?

> I'd suggest:
>
> 1. Implement Window node, with the capability to invoke an aggregate
> function, using the above API. Implement required parser/planner changes.
> Implement a few simple ranking aggregates using the API.
> 2. Implement glue to call normal aggregates through the new API
> 3. Implement the optimization to drop rows that are no longer needed
> (signal_cutoff can be a no-op until this phase)
> 4. Implement window framing (the frame can always be all rows in the
> partition, or all rows until the current row, until this phase)
> 5. Expose the new API to user-defined aggregates. It can be an internal API
> only used by built-in functions until this phase
>
> I believe you already have phase 1 in your patch, except for the API
> changes.
>

I am willing to challenge to implement the API above, after maintain
the current patch adding docs and tests. Since the API includes
changes much more like Aggregate syntax than current patch, I'm not
sure if I can finish it by next commit fest, which is said to be
"feature freeze". For safety, remain the current patch to review
excluding API and executor then if I fail to finish use it for next
release. Git helps it by cutting a branch, does it? How do you think?

Regards,

--
Hitoshi Harada

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-03 05:47:13
Message-ID:	1220420833.4371.613.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2008-09-02 at 15:51 +0300, Heikki Linnakangas wrote:

> The needs of access to the rows are so different that it seems best to
> me to delegate the buffering to the window function.

That seems sensible in some ways, not others.

Some of the window functions, like lead and lag merely specify window
size and shape for other functions to act upon. For those types of
request I don't see any need for custom functions, whereas for the
comparison/calculation functions there might be a need.

We don't need to implement all the things the SQL Standard calls window
functions with a 1:1 mapping to Postgres functions.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-03 06:51:43
Message-ID:	48BE33FF.4050800@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Tue, 2008-09-02 at 15:51 +0300, Heikki Linnakangas wrote:
>
>> The needs of access to the rows are so different that it seems best to
>> me to delegate the buffering to the window function.
>
> That seems sensible in some ways, not others.

In the API I proposed later in that mail, the buffering is actually done
by the executor node, not by the window function. Instead, the window
function can request abitrary rows of the frame from the executor, and
can signal that some rows are no longer required, allowing them to be
discarded.

> Some of the window functions, like lead and lag merely specify window
> size and shape for other functions to act upon.

I don't understand that. LEAD/LAG return a value. They don't affect the
size or shape of the window in any way. It doesn't affect other functions.

> For those types of
> request I don't see any need for custom functions, whereas for the
> comparison/calculation functions there might be a need.
>
> We don't need to implement all the things the SQL Standard calls window
> functions with a 1:1 mapping to Postgres functions.

Sure, we have special hacks for things like MIN/MAX already. But using
PostgreSQL functions does seem like the simplest solution to me, as the
backend code can get quite complex if we have to add special handling
for different window functions. LEAD/LAG fall quite nicely into the
framework I proposed, but if something comes along that doesn't, then
we'll have to extend the framework or add a special case.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-03 07:25:16
Message-ID:	48BE3BDC.3060601@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hitoshi Harada wrote:
> 2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
>> Hitoshi Harada wrote:
>>> 2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
>>> In my understanding, the "Window Frame" is defined
>>> by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
>>> contrast to "Window Partition" defined by "PARTITION BY" clause. A
>>> frame slides within a partition or there's only one frame if those
>>> clauses are not specified. The current patch has not implemented this
>>> yet. I'll update the docs.
>> Yes, that's how I read it as well. Another way to think of it is that
>> there's always a window frame, but if no ROWS BETWEEN or similar clause is
>> given, the window frame spans the whole partition (or from the 1st row of
>> the partition, up to the current row, if ORDER BY is given).
>>
>>> I don't like to call the second type "ranking aggregates" because it
>>> may refer to only ranking functions though there are more types of
>>> function like ntile(), lead() and lag(). But "window functions"
>>> doesn't seem appropriate either since it's ambiguous with the general
>>> name of window expressions.
>> Yep, confusing :-(. The SQL standard says that "a window function is one of:
>> a rank function, a distribution function, a row number function, a window
>> aggregate function, the ntile function, the lead function, the lag function,
>> the first-value function, the last-value function, the nth-value function".
>> So, window aggregate functions are a special class of window functions, and
>> there's no term to refer to all the rest of the window functions excluding
>> window aggregate functions.
>>
>> Your doc SQL spec
>> Window expression Window function
>> Window function Any window function other than a window aggregate
>> function
>> Window aggregate Window aggregate function
>>
>> I tried to coin the term "ranking aggregate" for the SQL2008 term "Any
>> window function other than a window aggregate function", but you're right
>> that that's still confusing, because the SQL2008 term "rank function"
>> includes only RANK() and DENSE_RANK().
>>
>> The spec calls them "group aggregate functions", when they're used with
>> GROUP BY, rather than as a window function. I think we could use that term.
>
> Agree. So from now on, we use "window functions" for all kinds of
> functions including window aggregates. "Window expression" is
> discarded. "Window functions" also means the mechanism to support
> these functions to process and this project.
>
>>> Your proposal is smarter than the current implementation. But it
>>> doesn't seem complete in some way. From logical point of view, the
>>> window functions should be allowed to access whole of rows in the
>>> frame the current row belongs to (c.f. inverse distribution
>>> functions).
>> By the whole of rows, do you mean
>> a) the chosen value or expression of all rows, or
>> b) all columns of the input rows?
>
> a). I mean all input rows in a window frame. But later I found
> "inverse distribution function" is not one of window functions. That
> is actually one of aggregate functions. Forget about it.
>
>> Different window functions have different needs. RANK() for example does
>> need to see all columns, to compare them, but it only needs access to the
>> previous and current row. CUME_DIST on the other hand needs access to all
>> columns of all rows, and LAG needs access to a specific column of a fixed
>> number of rows. And row_number needs nothing.
>>
>> The needs of access to the rows are so different that it seems best to me to
>> delegate the buffering to the window function.
>
> Delegating optimization to them depending on functions' needs is a
> good idea. So executor can concentrate on the window function process
> flow. Let's unify it in executor and let trivial optimizations get
> into individual functions.
>
>> Actually, looking closer to the ranking functions, they don't really need
>> access to all columns. They just need to be able to compare them, according
>> to the window ordering, so we should probably only provide access to the
>> arguments of the aggregate, evaluated for any row in the frame, and a
>> comparator function, that can determine if any given rows in the frame are
>> <, = or >.
>
> That is kind of problem. If your task is only to define that window
> node executor simply stores window frame rows and pass them to window
> functions as they need, the rank functions' needs don't come. As you
> point out, rank functions need ordering key columns and its
> comparators. So you have to imagine what comes next? What will be
> wanted other than ordering key columns, if we think about "universe
> window functions" much more than SQL spec says?

It might be a good idea to google around what window functions other
DBMSs support, and see if this scheme could support all of them. I
couldn't find any that it couldn't, but I didn't look very hard.

>> Let's look at the trivial, generic, and slow implementation first, and then
>> figure out what tricks we can do to make it faster. I gather that that
>> generic algorithm, following how the SQL spec defines the window frame,
>> looks like this:
>
> So as to satisfy all of window functions' needs, Window object does
> well. But it is so heavy that we need to optimize as functions need,
> by reducing stored rows and avoiding to call redundant functions.
> Still I'm afraid if user can define such a complex function...

An average user probably can't. Creating a regular aggregate using the
current method is not exactly trivial either.

At the moment, I'm more worried about finding a good abstraction to use
within the backend, and worry about exposing it to user-defined
functions later.

>> Here's sketches of some aggregate implementations using this API:
>>
>> RANK()
>> ------
>> enterfunc:
>> if position of the new row is > current row, do nothing. Otherwise,
>> decrement current rank.
>> exitfunc:
>> if position of the new row is > current row, do nothing. Otherwise,
>> increment current rank.
>
> I don't see why the row's positions affect rank. The rank depends on
> comparing two rows ordering columns, doen't it?

Yes. Imagine a window frame like this (e.g. from "ROWS BETWEEN 2
PRECEDING AND 2 FOLLOWING"):

10
20
30 <-- current row
40
50

RANK() doesn't care about rows that enter frame, that are after current
row. For example, if 60 enters the frame:

10
20
30 <-- current row
40
50
60

The RANK() of the current row is still 3, as it was before. RANK() only
advances when the current row is advanced, or new rows that precede the
current row enter the frame, e.g if a peer of the current row enters the
frame, which is possible with a window-frame-exclusion clause:

10
20
30
30 <-- current row
40
50
60

>
>> I'd suggest:
>>
>> 1. Implement Window node, with the capability to invoke an aggregate
>> function, using the above API. Implement required parser/planner changes.
>> Implement a few simple ranking aggregates using the API.
>> 2. Implement glue to call normal aggregates through the new API
>> 3. Implement the optimization to drop rows that are no longer needed
>> (signal_cutoff can be a no-op until this phase)
>> 4. Implement window framing (the frame can always be all rows in the
>> partition, or all rows until the current row, until this phase)
>> 5. Expose the new API to user-defined aggregates. It can be an internal API
>> only used by built-in functions until this phase
>>
>> I believe you already have phase 1 in your patch, except for the API
>> changes.
>
> I am willing to challenge to implement the API above, after maintain
> the current patch adding docs and tests. Since the API includes
> changes much more like Aggregate syntax than current patch, I'm not
> sure if I can finish it by next commit fest, which is said to be
> "feature freeze". For safety, remain the current patch to review
> excluding API and executor then if I fail to finish use it for next
> release. Git helps it by cutting a branch, does it? How do you think?

We do allow changes to the user manual after the feature freeze, so I'd
suggest concentrating on the code and tests first. Code comments and
internal docs are important, though, for easy review.

I'm sure we won't get all the way to phase 5 for 8.4, but if we can even
get 1-3, plus some of the most important window functions, this this
will be a great release!

I'll review the parser/planner changes from the current patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-03 08:17:16
Message-ID:	1220429836.4371.664.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2008-09-03 at 09:51 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Tue, 2008-09-02 at 15:51 +0300, Heikki Linnakangas wrote:
> >
> >> The needs of access to the rows are so different that it seems best to
> >> me to delegate the buffering to the window function.
> >
> > That seems sensible in some ways, not others.
>
> In the API I proposed later in that mail, the buffering is actually done
> by the executor node, not by the window function. Instead, the window
> function can request abitrary rows of the frame from the executor, and
> can signal that some rows are no longer required, allowing them to be
> discarded.

I'm happy with that.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-03 13:33:39
Message-ID:	e08cc0400809030633k58dc5528p87468842565fb67e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/9/3 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> Hitoshi Harada wrote:
>>
>>> I'd suggest:
>>>
>>> 1. Implement Window node, with the capability to invoke an aggregate
>>> function, using the above API. Implement required parser/planner changes.
>>> Implement a few simple ranking aggregates using the API.
>>> 2. Implement glue to call normal aggregates through the new API
>>> 3. Implement the optimization to drop rows that are no longer needed
>>> (signal_cutoff can be a no-op until this phase)
>>> 4. Implement window framing (the frame can always be all rows in the
>>> partition, or all rows until the current row, until this phase)
>>> 5. Expose the new API to user-defined aggregates. It can be an internal
>>> API
>>> only used by built-in functions until this phase
>>>
>>> I believe you already have phase 1 in your patch, except for the API
>>> changes.
>>
>> I am willing to challenge to implement the API above, after maintain
>> the current patch adding docs and tests. Since the API includes
>> changes much more like Aggregate syntax than current patch, I'm not
>> sure if I can finish it by next commit fest, which is said to be
>> "feature freeze". For safety, remain the current patch to review
>> excluding API and executor then if I fail to finish use it for next
>> release. Git helps it by cutting a branch, does it? How do you think?
>
> We do allow changes to the user manual after the feature freeze, so I'd
> suggest concentrating on the code and tests first. Code comments and
> internal docs are important, though, for easy review.
>
> I'm sure we won't get all the way to phase 5 for 8.4, but if we can even get
> 1-3, plus some of the most important window functions, this this will be a
> great release!

OK, so first tests and internal docs/comments, then comes trying to
catch API , finally docs.

BTW, I think it is better to put together the discussion points we
have done as "general roadmap to complete window functions". It is not
about the features for the next release but is the complete tasks.
Where to go? Wiki, or my design docs?

Regards,

--
Hitoshi Harada

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-04 10:10:20
Message-ID:	48BFB40C.80805@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hitoshi Harada wrote:
> BTW, I think it is better to put together the discussion points we
> have done as "general roadmap to complete window functions". It is not
> about the features for the next release but is the complete tasks.
> Where to go? Wiki, or my design docs?

That's up to you, really. I like your design docs page, but you're free
to use the Wiki as well, of course.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-04 18:11:13
Message-ID:	48C024C1.3070107@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> I'll review the parser/planner changes from the current patch.

Looks pretty sane to me. Few issues:

Is it always OK to share a window between two separate window function
invocations, if they both happen to have identical OVER clause? It seems
OK for stable functions, but I'm not sure that's correct for expressions
involving volatile functions. I wonder if the SQL spec has anything to
say about that.

As you noted in the comments, the planner could be a lot smarter about
avoiding sorts. Currently you just always put a sort node below each
Window, and another on top of them if there's an ORDER BY. There's
obviously a lot of room for improvement there.

This line:
> result_plan->targetlist = preprocess_window(tlist, result_plan);
in planner.c, won't work if result_plan is one that can't do projection.
A few screenfuls above that call, there's this:
> /*
> * If the top-level plan node is one that cannot do expression
> * evaluation, we must insert a Result node to project the
> * desired tlist.
> */
> if (!is_projection_capable_plan(result_plan))
> {
> result_plan = (Plan *) make_result(root,
> sub_tlist,
> NULL,
> result_plan);
> }
> else
> {
> /*
> * Otherwise, just replace the subplan's flat tlist with
> * the desired tlist.
> */
> result_plan->targetlist = sub_tlist;
> }
which is what you need to do with the preprocess_window call as well. I
think this is caused by that:
postgres=# explain SELECT row_number() OVER (ORDER BY id*10) FROM
(SELECT * FROM foo UNION ALL SELECT * FROM foo) sq;
ERROR: bogus varattno for OUTER var: 2

And then there's these:

postgres=# explain SELECT * FROm foo WHERE row_number() OVER (ORDER BY
id) > 10;
ERROR: winaggref found in non-Window plan node
postgres=# explain UPDATE foo SET id = 10 RETURNING (ROW_NUMBER() OVER
(ORDER BY random())) ;
ERROR: winaggref found in non-Window plan node
postgres=# explain SELECT row_number() OVER (ORDER BY (1)) FROm foo;
ERROR: ORDER/GROUP BY expression not found in targetlist
postgres=# explain SELECT row_number() OVER (ORDER BY length('foo'))
FROM foo;
ERROR: could not find pathkey item to sort

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-06 08:54:04
Message-ID:	e08cc0400809060154j6a23dc96h4b643419786a7f69@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/9/5 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> Heikki Linnakangas wrote:
>>
>> I'll review the parser/planner changes from the current patch.
>
> Looks pretty sane to me. Few issues:
>
> Is it always OK to share a window between two separate window function
> invocations, if they both happen to have identical OVER clause? It seems OK
> for stable functions, but I'm not sure that's correct for expressions
> involving volatile functions. I wonder if the SQL spec has anything to say
> about that.

It may be here:

---quote---
In general, two <window function>s are computed independently, each
one performing its own sort of its data, even if they use the same
data and the same <sort specification list>. Since sorts may specify
partial orderings,
the computation of <window function>s is inevitably non-deterministic
to the extent that the ordering is not total. Nevertheless, the user
may desire that two <window function>s be computed using the same
ordering, so that, for example, two moving aggregates move through the
rows of a partition in precisely the same order.

Two <window function>s are computed using the same (possibly
non-deterministic) window ordering of the rows if any of the following
are true:

― The <window function>s identify the same window structure descriptor.

― The <window function>s' window structure descriptors have window
partitioning clauses that enumerate the same number of column
references, and those column references are pairwise equivalent in
their order
of occurrence; and their window structure descriptors have window
ordering clauses with the same number of <sort key>s, and those <sort
key>s are all column references, and those column references are
pairwise equivalent in their order of occurrence, and the <sort
specification>s pairwise specify or imply <collate clause>s that
specify equivalent <collation name>s, the same <ordering
specification> (ASC or DESC), and the same <null ordering> (NULLS
FIRST or NULLS LAST).

― The window structure descriptor of one <window function> is the
ordering window of the other <window function>, or both window
structure descriptors identify the same ordering window.
/---quote---

But it doesn't say anything about volatile functions. Do you have
example that is bad with the current design?

The other issuses are OK. I missed those cases. will fix them.

Regards,

--
Hitoshi Harada

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Window functions patch v04 for the September commit fest
Date:	2008-09-09 16:45:21
Message-ID:	e08cc0400809090945m2ee3f6efp22610a633d88eac1@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>> Also, current implementation has only a type of plan which uses sort
>> operation. It should be optimized by re-position the windows and/or
>> using hashtable.
>
> I would like to see some performance test results also. It would be good
> to know whether they are fast/slow etc.. It will definitely help the
> case for inclusion if they are faster than alternative multi-statement
> approaches to solving the basic data access problems.
>

Just for the report, I attach the result I have tested today. You see
the result says the current window function is faster than
sort-operated self-join and slower than hashagg-operated self-join.

This test is on the Redhat Linux ES3 Xeon 2.13GHz with 100,000 rows 2
column integers. I wrote simple perl script using psql invoking the
shell so it may contain the invocation overhead overall.

test0 test1 test2 test3 test4 test5
------------------------------------------------------------
689.502 416.633 257.970 1195.294 954.318 1204.292
687.254 447.676 256.629 1075.342 949.711 1154.754
700.602 421.818 260.742 1105.680 926.462 1203.012
736.594 476.388 334.310 1157.818 978.861 1199.944
676.572 418.782 270.270 1060.900 909.474 1175.079
687.260 428.564 257.032 1069.013 1045.387 1275.988
700.252 429.289 263.216 1074.749 1018.968 1273.965
719.478 445.218 258.464 1087.932 1015.744 1273.637
694.865 453.737 261.286 1065.229 1039.941 1262.208
685.756 430.169 258.017 1124.795 1102.055 1297.603
------------------------------------------------------------
697.81 436.83 267.79 1101.68 994.09 1232.05

test0 SELECT sum(amount) OVER (PARTITION BY sector) FROM bench1;
test1 SELECT amount FROM bench1 ORDER BY sector;
test2 SELECT sum(amount) FROM bench1 GROUP BY sector;
test3 SELECT id, amount - avg(amount) OVER (PARTITION BY sector) FROM bench1;
test4 SELECT id, amount - avg FROM bench1 INNER JOIN(SELECT sector,
avg(amount) FROM bench1 GROUP BY sector)t USING(sector)
test5 SET enable_hashagg TO off; SELECT id, amount - avg FROM bench1
INNER JOIN(SELECT sector, avg(amount) FROM bench1 GROUP BY sector)t
USING(sector)

I'll include this test in my docs later.

Regards,

--
Hitoshi Harada