Quick Links

Window Functions: v07 APIs and buffering strateties

Lists:	pgsql-hackers

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Window Functions: v07 APIs and buffering strateties
Date:	2008-10-27 16:12:22
Message-ID:	e08cc0400810270912u49a6ec83vc23984c01f368f76@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

As I promised, version 7 of the window functions is now released. At
the same time, git repository branch comes back to master.

git: http://git.postgresql.org/?p=~davidfetter/window_functions/.git
patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz

It's too huge to send it to this list :) and I don't have time enough
to update the usual document but more comments in source code were put
than before so check it out!

Thanks to those feedbacks I found the basic cumulative aggregate is
necessary even though FRAME clause is not supported. In window
specifications like:
WINDOW w AS (ORDER BY id)
the window frame is implicitly made as BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW. But still SQL spec window functions like percent_rank()
need all rows of the partition. So I introduce buffering strategy in
Window node.

So now we have:
SELECT sum(i) OVER (ORDER BY i) FROM generate_series(1, 10) i;
sum
-----
1
3
6
10
15
21
28
36
45
55
(10 rows)

and all SQL spec functions.

I guess this release covers all the features I had intended when I was
started. Though FRAME clause support is not done yet, a basic
execution mechanism is prepared so if the parser supports it, the
executor doesn't worry about it. But before heading for FRAME clause,
we must support buffering strategies, and FRAME clause may be in 8.5
cycle. In this release, I wrote around the Partition buffering so no
optimization for row_number() is implemented, though you can be
inspired how to add another buffering when reading nodeWindow.c. For
the future buffering addition, I defined and classified those Window
function APIs:

extern int64 WinRowCurrentPos(WindowObject winobj);
extern Datum WinRowGetArg(WindowObject winobj, ExprState *argstate,
bool *isnull);
extern bool WinRowGetTuple(WindowObject winobj, TupleTableSlot *slot);

extern bool WinFrameShrinked(WindowObject winobj);
extern bool WinFrameExtended(WindowObject winobj);
extern int64 WinFrameGetRowNum(WindowObject winobj);
extern Datum WinFrameGetArg(WindowObject winobj, ExprState *argstate,
int relpos, int seektype, bool *isnull);
extern bool WinFrameGetTuple(WindowObject winobj, TupleTableSlot *slot,
int relpos, int seektype);
extern int WinFrameShrinkingNum(WindowObject winobj);
extern int WinFrameExtendedNum(WindowObject winobj);

extern int64 WinPartGetRowNum(WindowObject winobj);
extern Datum WinPartGetArg(WindowObject winobj, ExprState *argstate,
int relpos, int seektype, bool *isnull);
extern bool WinPartGetTuple(WindowObject winobj, TupleTableSlot *slot,
int relpos, int seektype);

extern WindowIter WinFrameStartIter(WindowObject winobj, int pos);
extern WindowIter WinPartStartIter(WindowObject winobj, int pos);
extern bool WinIterNext(WindowIter iter);
extern Datum WinIterGetArg(WindowIter iter, ExprState *argstate, bool *isnull);
extern bool WinIterGetTuple(WindowIter iter, TupleTableSlot *slot);

With these APIs, you can write advanced window aggregate subtracting
shrinking rows of the frame, and buffering supports.

I believe I can send another patch until the next commit fest, but not
sure about sgml documentation. If someone is interested in this
feature, feel free to help me in documentation! I think the lack is
around SQL spec window functions, how the window frame works (and that
we don't support FRAME clause in this release), and basic concept of
window function architecture. We don't have to touch deep inside of
window function APIs and buffering strategies since the window
functions cannot be defined as user function now.

Comments, feedbacks?

Regards,

--
Hitoshi Harada

From:	"David Rowley" <dgrowley(at)gmail(dot)com>
To:	"'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>, "'PostgreSQL Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 02:01:15
Message-ID:	435B8B37E7DB41259C44714247A6D1AF@amd64
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hitoshi Harada Wrote:

> As I promised, version 7 of the window functions is now released. At
> the same time, git repository branch comes back to master.
>
> git: http://git.postgresql.org/?p=~davidfetter/window_functions/.git
> patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz

I've been testing for a couple of hours now and just comparing results to
the results Oracle gives me. Perhaps not the best method but it's faster
than reading through the spec.

I managed to get a seg-fault with the following query.

select salary,dense_rank() over (order by salary desc)
from employee
group by salary;

It's the group by that does it.

test=# select salary,dense_rank() over (order by salary desc) from employee
group by salary;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

It seems to be possible to crash not just dense_rank() and rank().

This crashes too.

select relid,AVG(seq_Scan) OVER (ORDER BY relid)
FROM pg_stat_user_tables
GROUP BY relid,seq_scan;

Oracle allows both these queries.

Of course I changed table names and column names to make the test case a bit
easier to re-produce.

Looking forward to seeing windowing functions in 8.4!

David

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"David Rowley" <dgrowley(at)gmail(dot)com>
Cc:	"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 02:43:51
Message-ID:	e08cc0400810271943s6b53dafct1587e0eceb87fb58@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/10/28 David Rowley <dgrowley(at)gmail(dot)com>:
> Hitoshi Harada Wrote:
>
>> As I promised, version 7 of the window functions is now released. At
>> the same time, git repository branch comes back to master.
>>
>> git: http://git.postgresql.org/?p=~davidfetter/window_functions/.git
>> patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz
>
> I've been testing for a couple of hours now and just comparing results to
> the results Oracle gives me. Perhaps not the best method but it's faster
> than reading through the spec.
>
> I managed to get a seg-fault with the following query.
>
> select salary,dense_rank() over (order by salary desc)
> from employee
> group by salary;
>
>
> It's the group by that does it.
>
> test=# select salary,dense_rank() over (order by salary desc) from employee
> group by salary;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
>
> It seems to be possible to crash not just dense_rank() and rank().
>
> This crashes too.
>
> select relid,AVG(seq_Scan) OVER (ORDER BY relid)
> FROM pg_stat_user_tables
> GROUP BY relid,seq_scan;

sample=# select relid,AVG(seq_Scan) OVER (ORDER BY relid)
sample-# FROM pg_stat_user_tables
sample-# GROUP BY relid,seq_scan;
relid | avg
-------+--------------------
16385 | 7.0000000000000000
16391 | 3.5000000000000000
16394 | 2.3333333333333333
(3 rows)

Hmm, I tested it on my environment, but doesn't crash. If you have
further information on that, please tell me. And yes, I haven't paid
attention for such cases. I think at least regression test is
necessary to be added.

> Looking forward to seeing windowing functions in 8.4!

Thanks for your testing.

Regards,

--
Hitoshi Harada

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc:	"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 04:20:49
Message-ID:	20081028130511.DC4F.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> wrote:

> As I promised, version 7 of the window functions is now released.
> patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz

I tested the patch on mingw (Windows) and
got the following warning and error:

A. gram.y: conflicts: 3 shift/reduce
B. include/nodes/plannodes.h:650: error: syntax error before "uint"

I have no idea about A.

B is reported on the type 'uint' in struct Window.
I can compile the code if I replace 'uint' to 'uint32'

typedef struct Window
{
...
uint preceding_rows; /* used only when FRAME_ROWS */
uint following_rows; /* used only when FRAME_ROWS */

> SELECT sum(i) OVER (ORDER BY i) FROM generate_series(1, 10) i;

works fine, but...

> select relid,AVG(seq_Scan) OVER (ORDER BY relid)
> FROM pg_stat_user_tables
> GROUP BY relid,seq_scan;

crashes with segfault.

=# SELECT version();
version
------------------------------------------------------------------------------------------------------
PostgreSQL 8.4devel on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.5 (mingw-vista special r3)
(1 row)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 05:05:37
Message-ID:	e08cc0400810272205v1cbd2c5eq62806561b04c972b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thanks for your testing all!

2008/10/28 ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
> "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> wrote:
>
>> As I promised, version 7 of the window functions is now released.
>> patch: http://umitanuki.net/pgsql/window_functions.patch.20081028.gz
>
> I tested the patch on mingw (Windows) and
> got the following warning and error:
>
> A. gram.y: conflicts: 3 shift/reduce
> B. include/nodes/plannodes.h:650: error: syntax error before "uint"
>
> I have no idea about A.

I have noticed it but didn't think it is a problem, but it doesn't
occur in production, does it?

>
> B is reported on the type 'uint' in struct Window.
> I can compile the code if I replace 'uint' to 'uint32'
>
> typedef struct Window
> {
> ...
> uint preceding_rows; /* used only when FRAME_ROWS */
> uint following_rows; /* used only when FRAME_ROWS */

I didn't know it. will fix it.

>
>> SELECT sum(i) OVER (ORDER BY i) FROM generate_series(1, 10) i;
>
> works fine, but...
>
>> select relid,AVG(seq_Scan) OVER (ORDER BY relid)
>> FROM pg_stat_user_tables
>> GROUP BY relid,seq_scan;
>
> crashes with segfault.

I reproduced it on linux of my environment, building without
debug/cassert. It could be a problem around there.

Regards,

--
Hitoshi Harada

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 07:03:30
Message-ID:	e08cc0400810280003v4391fcd7s72fcad660d39d809@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/10/28 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> Thanks for your testing all!
>
> 2008/10/28 ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
>> "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> wrote:
>>
>>
>>> select relid,AVG(seq_Scan) OVER (ORDER BY relid)
>>> FROM pg_stat_user_tables
>>> GROUP BY relid,seq_scan;
>>
>> crashes with segfault.
>
> I reproduced it on linux of my environment, building without
> debug/cassert. It could be a problem around there.
>

And I fixed this problem, confirming with/without debug/cassert/gcc
-O and push it to git. If you want delta patch, please see

http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=fbf19bfd0c8d2ac083b775f4cc724ec66e74fa8f

Thanks for all of your feedbacks.
Regards,

--
Hitoshi Harada

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc:	"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 07:58:10
Message-ID:	20081028162348.DC5D.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> wrote:

> And I fixed this problem, confirming with/without debug/cassert/gcc
> -O and push it to git. If you want delta patch, please see
> http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=fbf19bfd0c8d2ac083b775f4cc724ec66e74fa8f

Now it passed all regression tests!

There is still one trivial warning:
parse_func.c: In function `ParseFuncOrColumn':
parse_func.c:77: warning: 'retval' might be used uninitialized in this function

It looks completely safe, but I suggest to initialize
the variable only to keep compiler quiet.

[src/backend/parser/parse_func.c] ParseFuncOrColumn()
else
+ {
elog(ERROR, "unknown function status");
+ retval = NULL; /* keep compiler quiet */
+ }

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 08:09:50
Message-ID:	e08cc0400810280109k7ea26f5bkac06f1d9ad8b0ec7@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/10/28 ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
>
> "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> wrote:
>
>> And I fixed this problem, confirming with/without debug/cassert/gcc
>> -O and push it to git. If you want delta patch, please see
>> http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=fbf19bfd0c8d2ac083b775f4cc724ec66e74fa8f
>
> Now it passed all regression tests!
>
>
> There is still one trivial warning:
> parse_func.c: In function `ParseFuncOrColumn':
> parse_func.c:77: warning: 'retval' might be used uninitialized in this function
>
> It looks completely safe, but I suggest to initialize
> the variable only to keep compiler quiet.
>
> [src/backend/parser/parse_func.c] ParseFuncOrColumn()
> else
> + {
> elog(ERROR, "unknown function status");
> + retval = NULL; /* keep compiler quiet */
> + }
>

Agreed. I've just noticed it as well and I think it would be more much
like original code that the last "else if" clause gets "else". Anyway,
thanks.

Regards,

--
Hitoshi Harada

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 12:24:30
Message-ID:	18420.1225196670@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
> 2008/10/28 ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
>> I tested the patch on mingw (Windows) and
>> got the following warning and error:
>>
>> A. gram.y: conflicts: 3 shift/reduce
>> B. include/nodes/plannodes.h:650: error: syntax error before "uint"
>>
>> I have no idea about A.

> I have noticed it but didn't think it is a problem, but it doesn't
> occur in production, does it?

We have a zero-tolerance policy for bison warnings. Patches that
introduce shift/reduce conflicts *will* be rejected. (And no, %expect
isn't an acceptable fix. The problem with it is you can't be sure
which warnings it ignored. In a grammar that gets hacked on as often
as PG's does, we couldn't rely on the conflicts to not move around,
possibly resulting in unforeseen misbehavior.)

regards, tom lane

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 12:31:37
Message-ID:	e08cc0400810280531k7c5eafafya6079774f7a138de@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/10/28 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
>> 2008/10/28 ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
>>> I tested the patch on mingw (Windows) and
>>> got the following warning and error:
>>>
>>> A. gram.y: conflicts: 3 shift/reduce
>>> B. include/nodes/plannodes.h:650: error: syntax error before "uint"
>>>
>>> I have no idea about A.
>
>> I have noticed it but didn't think it is a problem, but it doesn't
>> occur in production, does it?
>
> We have a zero-tolerance policy for bison warnings. Patches that
> introduce shift/reduce conflicts *will* be rejected. (And no, %expect
> isn't an acceptable fix. The problem with it is you can't be sure
> which warnings it ignored. In a grammar that gets hacked on as often
> as PG's does, we couldn't rely on the conflicts to not move around,
> possibly resulting in unforeseen misbehavior.)
>
> regards, tom lane
>

OK, I'll try to remove it. I'm not used to bison so my first task is
to find where the conflict is...

Regards,

--
Hitoshi Harada

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 12:55:07
Message-ID:	18811.1225198507@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
> OK, I'll try to remove it. I'm not used to bison so my first task is
> to find where the conflict is...

Use bison -v to get details of where the conflict is. I find that
the most common way to fix things is to postpone where the parser
has to make a decision, which usually ends up meaning a slightly
more verbose set of productions --- for instance instead of

foo: bar opt_baz

opt_baz: baz | /*EMPTY*/

you might have to do

foo: bar baz | bar

(which is actually shorter in this pseudo-example, but wouldn't be
if bar were complicated or foo had a lot of alternatives already).
The reason this can fix it is that the parser doesn't have to commit
to which case applies until after it's read the tokens for baz.
In the first form, it has to decide whether baz is there or not
without looking any further ahead than the first token of baz.

regards, tom lane

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 16:22:47
Message-ID:	e08cc0400810280922p2d71d61fm3f0ca392dec93fd3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/10/28 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
>> OK, I'll try to remove it. I'm not used to bison so my first task is
>> to find where the conflict is...
>
> Use bison -v to get details of where the conflict is. I find that
> the most common way to fix things is to postpone where the parser
> has to make a decision, which usually ends up meaning a slightly
> more verbose set of productions --- for instance instead of
>
> foo: bar opt_baz
>
> opt_baz: baz | /*EMPTY*/
>
> you might have to do
>
> foo: bar baz | bar
>

Thanks for your advice. And I found the problem is around FRAME
clause. Now my question is:

Can "ROWS" be reserved_keyword?

In window specifications, we have

OVER (ORDER BY expr_list [(ROWS|RANGE) ... ])

and currently "ROWS" is not reserved so bison is confused with cases
of "ROWS" included in expr_list and in FRAME clause. Because there is
no delimiter between ORDER BY clause and FRAME (that is (ROWS |
RANGE)) clause, "ROWS" can be in expr_list as a_expr. I see "ROWS" has
been an unreserved keyword and that many places in gram.y such cases
have been avoided by other haking methods, but in this case, isn't it
possible such? If I missed something let me know it.

Regards,

--
Hitoshi Harada

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 16:38:09
Message-ID:	8289.1225211889@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
> Can "ROWS" be reserved_keyword?

> In window specifications, we have

> OVER (ORDER BY expr_list [(ROWS|RANGE) ... ])

> and currently "ROWS" is not reserved so bison is confused with cases
> of "ROWS" included in expr_list and in FRAME clause. Because there is
> no delimiter between ORDER BY clause and FRAME (that is (ROWS |
> RANGE)) clause, "ROWS" can be in expr_list as a_expr.

Right offhand, I don't see any alternative but to make both ROWS and
RANGE reserved. It's pretty annoying since that might break existing
applications that have been using these as identifiers, but the SQL
committee seems to care little about that :-(

BTW, finding this sort of problem is exactly why ignoring shift/reduce
conflicts is a bad idea. You would've ended up with unexpected
behaviors given the wrong input.

regards, tom lane

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 17:23:41
Message-ID:	e08cc0400810281023k157681d0p4d44a7cfeedac033@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/10/29 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
>> Can "ROWS" be reserved_keyword?
>
>> In window specifications, we have
>
>> OVER (ORDER BY expr_list [(ROWS|RANGE) ... ])
>
>> and currently "ROWS" is not reserved so bison is confused with cases
>> of "ROWS" included in expr_list and in FRAME clause. Because there is
>> no delimiter between ORDER BY clause and FRAME (that is (ROWS |
>> RANGE)) clause, "ROWS" can be in expr_list as a_expr.
>
> Right offhand, I don't see any alternative but to make both ROWS and
> RANGE reserved. It's pretty annoying since that might break existing
> applications that have been using these as identifiers, but the SQL
> committee seems to care little about that :-(
>
> BTW, finding this sort of problem is exactly why ignoring shift/reduce
> conflicts is a bad idea. You would've ended up with unexpected
> behaviors given the wrong input.
>

I see it now. This is so good study to me, though it spent me much
time. Thanks anyway.

Regards,

--
Hitoshi Harada

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 17:37:23
Message-ID:	20081028173723.GA17203@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 28, 2008 at 12:38:09PM -0400, Tom Lane wrote:
> "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
> > In window specifications, we have
>
> > OVER (ORDER BY expr_list [(ROWS|RANGE) ... ])
>
> > and currently "ROWS" is not reserved so bison is confused with cases
> > of "ROWS" included in expr_list and in FRAME clause. Because there is
> > no delimiter between ORDER BY clause and FRAME (that is (ROWS |
> > RANGE)) clause, "ROWS" can be in expr_list as a_expr.
>
> Right offhand, I don't see any alternative but to make both ROWS and
> RANGE reserved. It's pretty annoying since that might break existing
> applications that have been using these as identifiers, but the SQL
> committee seems to care little about that :-(

Given that the only problematic case is if expr_list ends with a
postfix operator, wouldn't it be sufficient to simply decree that in
that case you need parentheses? Seems a lot less painful than adding
two reserved words.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 17:50:26
Message-ID:	9811.1225216226@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Tue, Oct 28, 2008 at 12:38:09PM -0400, Tom Lane wrote:
>> Right offhand, I don't see any alternative but to make both ROWS and
>> RANGE reserved. It's pretty annoying since that might break existing
>> applications that have been using these as identifiers, but the SQL
>> committee seems to care little about that :-(

> Given that the only problematic case is if expr_list ends with a
> postfix operator, wouldn't it be sufficient to simply decree that in
> that case you need parentheses? Seems a lot less painful than adding
> two reserved words.

Hmm ... actually, it might be possible to fix it with a suitable
precedence declaration? The trick is to make sure that in
... ORDER BY foo ! ROWS ...
the operator is taken as postfix not infix, which is the exact opposite
of what we did for AS-less column aliases (and, in fact, is the opposite
of what bison will do by default, IIRC). So it might be possible to fix
by attaching some new precedence level to the ROWS token.

On the other hand, this would still mean that ROWS acts oddly
differently from any other column name, and in a way that would be hard
to explain or document. So I'm really not sure that this is a better
solution than reserving it.

Still another trick in our toolbox is to use merged tokens to fix the
lack of lookahead. If I read the spec correctly, the problematic cases
of ROWS and RANGE must be followed by UNBOUNDED or BETWEEN, so folding
these token pairs into four merged-tokens would get rid of the need to
reserve anything. Merged tokens create their own little oddities too
though, as I was just complaining to Peter. I wonder whether it would
be possible to improve on that problem by arranging some way for the
grammar to signal the lexer about whether lookahead is needed, and thus
not do the merging in contexts where it couldn't be appropriate.

regards, tom lane

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 18:14:13
Message-ID:	20081028181413.GB17203@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 28, 2008 at 01:50:26PM -0400, Tom Lane wrote:
> > Given that the only problematic case is if expr_list ends with a
> > postfix operator, wouldn't it be sufficient to simply decree that in
> > that case you need parentheses? Seems a lot less painful than adding
> > two reserved words.
>
> Hmm ... actually, it might be possible to fix it with a suitable
> precedence declaration? The trick is to make sure that in
> ... ORDER BY foo ! ROWS ...
> the operator is taken as postfix not infix, which is the exact opposite
> of what we did for AS-less column aliases (and, in fact, is the opposite
> of what bison will do by default, IIRC). So it might be possible to fix
> by attaching some new precedence level to the ROWS token.

Yes. Bison's default is to shift, which means that if you do nothing it
will treat ROWS as part of the expression if it makes any sense at all.
Given the requirement for a following UNBOUNDED or BETWEEN, the only
problem is that you'll get a syntax error if the expr_list ends in a
postfix operator, I don't see how you get hidden ambiguity.

Operator precedence is exactly the way to do this, since operator
precedence rules exist solely to resolve the shift/reduce conflicts.

You're right about the documentation though. I suppose you could put in
the documentation for the ROWS stuff something along the lines of: If
the last expression of your ORDER by ends in a postfix operator, you
must use parentheses. How many postfix operators are in common use in
ORDER BY expressions anyway?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-28 18:55:16
Message-ID:	10722.1225220116@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Tue, Oct 28, 2008 at 01:50:26PM -0400, Tom Lane wrote:
>> ... So it might be possible to fix
>> by attaching some new precedence level to the ROWS token.

> Yes. Bison's default is to shift, which means that if you do nothing it
> will treat ROWS as part of the expression if it makes any sense at all.
> Given the requirement for a following UNBOUNDED or BETWEEN, the only
> problem is that you'll get a syntax error if the expr_list ends in a
> postfix operator, I don't see how you get hidden ambiguity.

Hmm, now I see what you meant; that's a little different than what I was
envisioning. I was thinking of trying to force a parse decision that
would support the windowing syntax, whereas you propose forcing a
parse decision that does the opposite, and making the user parenthesize
if he's got a conflict.

What the choice seems to come down to is making ROWS and RANGE reserved
(in some form or other) versus creating a corner case for users of
postfix operators. Phrased that way it does seem like the second
alternative is better.

Hitoshi: you can probably make this happen by including ROWS and RANGE
in the %nonassoc IDENT precedence declaration, but you'll want to test
to make sure the right things happen.

regards, tom lane

From:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Window Functions: v07 APIs and buffering strateties
Date:	2008-10-29 00:46:53
Message-ID:	e08cc0400810281746l53a4167elb1254b71a3cc79f7@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2008/10/29 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
>> On Tue, Oct 28, 2008 at 01:50:26PM -0400, Tom Lane wrote:
>>> ... So it might be possible to fix
>>> by attaching some new precedence level to the ROWS token.
>
>> Yes. Bison's default is to shift, which means that if you do nothing it
>> will treat ROWS as part of the expression if it makes any sense at all.
>> Given the requirement for a following UNBOUNDED or BETWEEN, the only
>> problem is that you'll get a syntax error if the expr_list ends in a
>> postfix operator, I don't see how you get hidden ambiguity.
>
> Hmm, now I see what you meant; that's a little different than what I was
> envisioning. I was thinking of trying to force a parse decision that
> would support the windowing syntax, whereas you propose forcing a
> parse decision that does the opposite, and making the user parenthesize
> if he's got a conflict.
>
> What the choice seems to come down to is making ROWS and RANGE reserved
> (in some form or other) versus creating a corner case for users of
> postfix operators. Phrased that way it does seem like the second
> alternative is better.
>
> Hitoshi: you can probably make this happen by including ROWS and RANGE
> in the %nonassoc IDENT precedence declaration, but you'll want to test
> to make sure the right things happen.
>

Bison and parsing are quite new to me so it'll take a little time but
I will try it. One thing, the words following after ROWS/RANGE are not
only UNBOUNDED and BETWEEN but also CURRENT and "unsigned constant"
though. Still the phrasing approach doesn't seem less hope?

Regards,

--
Hitoshi Harada