Re: add more frame types in window functions (ROWS)

Lists: pgsql-hackers
From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: add more frame types in window functions (ROWS)
Date: 2009-11-13 19:25:02
Message-ID: e08cc0400911131125m1a89190fqb0ddd6f69c2b2f00@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Attached is the patch against HEAD to support more frame types in
window functions, including these frame types:

- ROWS BETWEEN s PRECEDING AND e PRECEDING
- ROWS BETWEEN s PRECEDING AND CURRENT ROW
- ROWS BETWEEN s PRECEDING AND e FOLLOWING
- ROWS BETWEEN s PRECEDING AND UNBOUNDED FOLLOWING
- ROWS BETWEEN CURRENT ROW AND e FOLLOWING
- ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
- ROWS BETWEEN s FOLLOWING AND e FOLLOWING
- ROWS BETWEEN s FOLLOWING AND UNBOUNDED FOLLOWING
- RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
- ROWS s PRECEDING

which means that for ROWS types we now support almost all types but
for RANGE types we don't have "value PRECEDING" / "value FOLLOWING".
I'm planning to implement them until CommitFest:2010-01 so this is
"Request for Review" phase though I've arranged the patch to be
committable.

Aggregate cache mechanism is sometimes cleared as discussed the other
day. But it should be kept that the original cache mechanism for basic
(i.e. already implemented) frame types.

Some points to be reviewed are:

- Added WindowFrameDef in parsenode.h
- Is A_Const member for startOffset / endOffset is appropriate?
- Are those data types (including gram.y) well designed?
- For basic frame types, are aggregates still optimized as before?
- Added regression tests but enough?
- All error case covered? For example: ROWS s FOLLOWING AND e PRECEDING
- Added members to WindowAggState to track more information of frame
types, but isn't there more efficient way?
- Not modified docs yet

Regards,

--
Hitoshi Harada

Attachment Content-Type Size
rows_frame_types.20091114.patch application/octet-stream 58.2 KB

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: umi(dot)tanuki(at)gmail(dot)com (Hitoshi Harada), pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-14 20:33:51
Message-ID: 87y6m8najw.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, I've started reviewing your patch.

I've already found some things that need work:

- missing _readWindowFrameDef function (all nodes that are output
from parse analysis must have both _read and _out functions,
otherwise views can't work)

- the A_Const nodes should probably be transformed to Const nodes in
parse analysis, since A_Const has no _read/_out functions, which
means changing the corresponding code in the executor.

(A question here: the spec allows (by my reading) the use of
parameters in the window frame clause, i.e. BETWEEN $1 PRECEDING AND
$2 FOLLOWING. Wouldn't it therefore make more sense to treat the
values as Exprs, albeit very limited ones, and eval them at startup
rather than assuming we know the node type and digging down into it
all over the place?)

You can see the problem here if you do this:

create view v as
select i,sum(i) over (order by i rows between 1 preceding and 1 following)
from generate_series(1,10) i;
select * from v;

(A regression test for this would probably be good, which reminds me that
I need to add one of those myself in the aggregate order by stuff.)

Also, you're going to need to do some work in ruleutils.c
(get_rule_windowspec) in order to be able to deparse your new frame
definitions.

I'll continue reviewing the stuff that does work, so I'm not marking this
as "waiting for author" yet.

--
Andrew (irc:RhodiumToad)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: umi(dot)tanuki(at)gmail(dot)com (Hitoshi Harada), pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-14 20:43:50
Message-ID: 18306.1258231430@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> (A question here: the spec allows (by my reading) the use of
> parameters in the window frame clause, i.e. BETWEEN $1 PRECEDING AND
> $2 FOLLOWING. Wouldn't it therefore make more sense to treat the
> values as Exprs, albeit very limited ones, and eval them at startup
> rather than assuming we know the node type and digging down into it
> all over the place?)

Seems like you might as well allow any expression not containing
local Vars. Compare the handling of LIMIT.

regards, tom lane


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-15 07:04:21
Message-ID: e08cc0400911142304n780c3371vb3c0bb89b0e9216a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/11/15 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>> (A question here: the spec allows (by my reading) the use of
>> parameters in the window frame clause, i.e. BETWEEN $1 PRECEDING AND
>> $2 FOLLOWING.  Wouldn't it therefore make more sense to treat the
>> values as Exprs, albeit very limited ones, and eval them at startup
>> rather than assuming we know the node type and digging down into it
>> all over the place?)
>
> Seems like you might as well allow any expression not containing
> local Vars.  Compare the handling of LIMIT.

Hmm, I've read it wrong, was assuming a constant for <unsigned value
specification> which actually includes any expression. But it's a
fixed value during execution, right? Otherwise, we cannot predicate
frame boundary.

Regards,

--
Hitoshi Harada


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-15 07:11:19
Message-ID: e08cc0400911142311o1b106606g590495aabaf7ff74@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for your review.

2009/11/15 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:
> Hi, I've started reviewing your patch.
>
> I've already found some things that need work:
>
>  - missing _readWindowFrameDef function (all nodes that are output
>   from parse analysis must have both _read and _out functions,
>   otherwise views can't work)

I added _outWindowFramedef() but seem to forget _read one. Will add it.

>
>  - the A_Const nodes should probably be transformed to Const nodes in
>   parse analysis, since A_Const has no _read/_out functions, which
>   means changing the corresponding code in the executor.

Thanks for this comment. I hadn't determined which node should be used
as a value node passed to executor. Including Tom's comment, I must
consider which should be again.

Regards,

--
Hitoshi Harada


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-15 07:33:47
Message-ID: 87eio0mezy.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Hitoshi" == Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:

>>> (A question here: the spec allows (by my reading) the use of
>>> parameters in the window frame clause, i.e. BETWEEN $1 PRECEDING
>>> AND $2 FOLLOWING.  Wouldn't it therefore make more sense to treat
>>> the values as Exprs, albeit very limited ones, and eval them at
>>> startup rather than assuming we know the node type and digging
>>> down into it all over the place?)

>> Seems like you might as well allow any expression not containing
>> local Vars.  Compare the handling of LIMIT.

Hitoshi> Hmm, I've read it wrong, was assuming a constant for <unsigned value
Hitoshi> specification> which actually includes any expression. But it's a
Hitoshi> fixed value during execution, right? Otherwise, we cannot predicate
Hitoshi> frame boundary.

The spec doesn't allow arbitrary expressions there, only literals and
parameters. Allowing more than that would be going beyond the spec, but
as with LIMIT, could be useful nonetheless.

For it to be a fixed value during execution, the same rules apply as
for LIMIT; it can't contain Vars of the current query level.

My thinking is that the executor definitely shouldn't be relying on it
being a specific node type, but should just ExecEvalExpr it on the
first call and store the result; then you don't have to know whether
it's a Const or Param node or a more complex expression.

--
Andrew.


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-15 07:43:41
Message-ID: e08cc0400911142343s20dd9b2dve37d41f24f1673f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/11/15 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:
> My thinking is that the executor definitely shouldn't be relying on it
> being a specific node type, but should just ExecEvalExpr it on the
> first call and store the result; then you don't have to know whether
> it's a Const or Param node or a more complex expression.

Yeah, so that we allow something like ROWS BETWEEN 1 + $1 PRECEDING
AND ... And to support RANGE BETWEEN n PRECEDING ..., we should make
datum to add or to subtract from current value on initial call anyway.

Regards,

--
Hitoshi Harada


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: umi(dot)tanuki(at)gmail(dot)com (Hitoshi Harada), pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-19 08:29:29
Message-ID: 876397c68t.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here's the rest of the review, as far as I've taken it given the
problems with the code.

The patch applied cleanly and includes regression tests but not docs.

Small nitpicks: there are some comments not updated (e.g. the
big one at the start of eval_windowaggregates). A couple of lines are
commented-out using C++ comments.

The overall approach seems ok, and the parser stuff seems fine to me.

These are the issues I've found that make it not committable in its
present form (including the ones I mentioned before):

- missing _readWindowFrameDef function (all nodes that are output
from parse analysis must have both _read and _out functions,
otherwise views can't work)

- the A_Const nodes should probably be transformed to Const nodes in
parse analysis, since A_Const has no _read/_out functions, which
means changing the corresponding code in the executor.

- ruleutils.c not updated to deparse the newly added window options

- leaks memory like it's going out of style

The memory leakage is caused by not resetting any memory contexts when
throwing away all the aggregate state when advancing the start of the
window frame. This looks like it will require a rethink of the memory
management being used; it's not enough just to pfree copies of the
transition values (which you don't appear to be doing), you have to
reset the memory context that was exposed to the transition functions
via context->wincontext. So the current setup of a single long-lived
context won't work; you'll need a long-lived one, plus an additional
one that you can reset any time the aggregates need to be
re-initialized. (And if you're not going to break existing aggregate
functions, WindowAggState.wincontext needs to be the one that gets
reset.)

Tests for memory leaks:

-- tests for failure to free by-ref transition values
select count(*)
from (select i,max(repeat(i::text,100)) over (order by i rows between 1 preceding and current row)
from generate_series(1,1000000) i) s;

-- tests for failure to reset memory context on window advance
select count(*)
from (select i,array_agg(i) over (order by i rows between 1 preceding and current row)
from generate_series(1,1000000) i) s;

--
Andrew (irc:RhodiumToad)


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-19 13:43:59
Message-ID: e08cc0400911190543t13448ec6t4748e039ec8d2940@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/11/19 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:
> Here's the rest of the review, as far as I've taken it given the
> problems with the code.
>
> The patch applied cleanly and includes regression tests but not docs.
>
> Small nitpicks: there are some comments not updated (e.g. the
> big one at the start of eval_windowaggregates). A couple of lines are
> commented-out using C++ comments.

OK. It's tough for me to rewrite that big part of comment but I'll try it.

> The overall approach seems ok, and the parser stuff seems fine to me.
>
> These are the issues I've found that make it not committable in its
> present form (including the ones I mentioned before):
>
>  - missing _readWindowFrameDef function (all nodes that are output
>   from parse analysis must have both _read and _out functions,
>   otherwise views can't work)
>
>  - the A_Const nodes should probably be transformed to Const nodes in
>   parse analysis, since A_Const has no _read/_out functions, which
>   means changing the corresponding code in the executor.

A_Const/Const will be replace by Expr, to cover any expression without
local Var.

>
>  - ruleutils.c not updated to deparse the newly added window options
>
>  - leaks memory like it's going out of style
>
> The memory leakage is caused by not resetting any memory contexts when
> throwing away all the aggregate state when advancing the start of the
> window frame. This looks like it will require a rethink of the memory
> management being used; it's not enough just to pfree copies of the
> transition values (which you don't appear to be doing), you have to
> reset the memory context that was exposed to the transition functions
> via context->wincontext. So the current setup of a single long-lived
> context won't work; you'll need a long-lived one, plus an additional
> one that you can reset any time the aggregates need to be
> re-initialized. (And if you're not going to break existing aggregate
> functions, WindowAggState.wincontext needs to be the one that gets
> reset.)

Hmm, good point. Though I doubt we need two contexts for this because
we have not so far (and we already have tmpcontext for that purpose),
memory leakage probably seems to happen. I'll check it out.

Thanks for your elaborate review anyway. All I was worried about is
now clear. It will be lucky if I can update my patch until next week.
So please keep it "Waiting on Author".

Regards,

--
Hitoshi Harada


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-11-29 08:46:51
Message-ID: e08cc0400911290046t3620ae6cid15b69a70360b154@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/11/19 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> 2009/11/19 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:
>> Small nitpicks: there are some comments not updated (e.g. the
>> big one at the start of eval_windowaggregates). A couple of lines are
>> commented-out using C++ comments.

Fixed. Document patch is included as well.

>> - missing _readWindowFrameDef function (all nodes that are output
>> from parse analysis must have both _read and _out functions,
>> otherwise views can't work)

I changed my mind and WindowFrameDef is alive only in initial parser
stage so that _readWindowFrameDef is now unnecessary. Information of
startOffset and endOffset will be copied to WindowClause members.

>> - the A_Const nodes should probably be transformed to Const nodes in
>> parse analysis, since A_Const has no _read/_out functions, which
>> means changing the corresponding code in the executor.

Fixed.

>> - ruleutils.c not updated to deparse the newly added window options

Fixed.

>> - leaks memory like it's going out of style

As earlier mail, I added aggcontext to WindowAggState.

Still it doesn't contain RANGE ... PRECEDING / FOLLOWING. If it's not
acceptable for commit without RANGE value support, I'd agree with
that. I'm planning to work on that until the next CommitFest.

Regards,

--
Hitoshi Harada

Attachment Content-Type Size
rows_frame_types.20091129.patch application/octet-stream 86.1 KB

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-02 08:16:00
Message-ID: 871vjdu78n.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Hitoshi" == Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:

Hitoshi> As earlier mail, I added aggcontext to WindowAggState.

This issue (as detailed in this post):
http://archives.postgresql.org/pgsql-hackers/2009-11/msg01871.php

is currently the only significant outstanding issue in my review of this
patch. I think we need to see more feedback on whether it is acceptable
to change the aggregate function API again (and if so, what to do with it)
before I can post a final review on this and mark it ready for committer
(or not).

--
Andrew.


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-04 21:15:36
Message-ID: 877ht2o8q3.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Functionally this patch looks excellent; correct format, applies
cleanly, passes regression, and I've been unable to find any issues
with the code itself. But I still have a concern over the interface
change, so I'm setting this back to "waiting on author" for now even
though it's really a matter for general discussion on -hackers.

To take the outstanding issues in descending order of importance:

1) Memory context handling for aggregate calls

Having thought about this carefully, I think the solution used in this
patch has to be rejected for one specific reason: if you compile
existing code (i.e. aggregates that use WindowAggState.wincontext)
written for 8.4 against the patched server, the code compiles
successfully and appears to run, but leaks memory at runtime. (And I've
confirmed that there do exist external modules that would be affected.)

If we're going to change the interface in this way, there should, IMO,
be enough of a change that old code fails to compile; e.g. by renaming
wincontext to partition_context or some equivalent change.

But in my opinion we should go further than this: since there's obviously
a need for aggregates to be able to get at a suitable memory context, I
think we should formalize this and actually define some interface functions
for them to use, so that something like

if (fcinfo->context && IsA(fcinfo->context, AggState))
aggcontext = ((AggState *) fcinfo->context)->aggcontext;
else if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
aggcontext = ((WindowAggState *) fcinfo->context)->aggcontext;
else
ereport(...);

becomes something like

aggcontext = AggGetMemoryContext(fcinfo->context);
if (!aggcontext)
ereport(...);

For completeness, there should be two other functions: one to simply
return whether we are in fact being called as an aggregate, and another
one to return whether it's safe to scribble on the first argument
(while it's currently the case that these two are equivalent, it would
be good not to assume that).

Comments? This is the most significant issue as I see it.

(Also, a function in contrib/tsearch2 that accesses wincontext wasn't
updated by the patch.)

2) Keywords

I didn't find any discussion of this previously; did I miss it? The
patch changes BETWEEN from TYPE_FUNC_NAME_KEYWORD to COL_NAME_KEYWORD,
so it's now allowed as a column name (which it previously wasn't),
but now not allowed as a function. I get why it can't be a function now,
but if it didn't work as a column name before, why does it now?

(UNBOUNDED remains an unreserved word, and seems to behave in a
reasonable fashion even if used as an upper-level var in the query;
the interpretation of a bare UNBOUNDED has the standard behaviour
as far as I can see.)

3) Regression tests

Testing that views work is OK as far as it goes, but I think that view
definition should be left in place rather than dropped (possibly with
even more variants) so that the deparse code gets properly tested too.
(see the "rules" test)

4) Deparse output

The code is forcing explicit casting on the offset expressions, i.e.
the deparsed code looks like

... ROWS BETWEEN 1::bigint PRECEDING AND 1::bigint FOLLOWING ...

This looks a bit ugly; is it avoidable? At least for ROWS it should
be, I think, since the type is known; even for RANGE, the type would
be determined by the sort column.

5) Documentation issues

The entry for BETWEEN in the keywords appendix isn't updated.
(Wouldn't it make more sense for this to be generated from the keyword
list somehow?)

Spelling:
- current row. In <literal>ROWS</> mode this value means phisical row
+ current row. In <literal>ROWS</> mode this value means physical row
(this error appears twice)

The doc could probably do with some wordsmithing but this probably
shouldn't block the patch on its own; that's something that could be
handled separately I guess.

--
Andrew.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-05 06:06:07
Message-ID: 8735.1259993167@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> If we're going to change the interface in this way, there should, IMO,
> be enough of a change that old code fails to compile; e.g. by renaming
> wincontext to partition_context or some equivalent change.

Agreed --- if we have to break things, break them obviously not
silently. I don't have time right now to think about this issue in
detail, but if those are the alternatives I think the choice is clear.
Quietly adding a memory leak to code that used to work well is not
acceptable.

regards, tom lane


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-05 08:50:33
Message-ID: e08cc0400912050050m63d3be04vc13f584c1552b8de@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/12/5 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:

> 1) Memory context handling for aggregate calls
>
>    aggcontext = AggGetMemoryContext(fcinfo->context);
>    if (!aggcontext)
>        ereport(...);
>
> For completeness, there should be two other functions: one to simply
> return whether we are in fact being called as an aggregate, and another
> one to return whether it's safe to scribble on the first argument
> (while it's currently the case that these two are equivalent, it would
> be good not to assume that).
>
> Comments? This is the most significant issue as I see it.

Yep, I agree. The most essential point on this is that external
functions refer to the struct members directly; we should provide
kinds of API.

> (Also, a function in contrib/tsearch2 that accesses wincontext wasn't
> updated by the patch.)

Thanks for noticing. I didn't know it.

> 2) Keywords

The discussion is:

http://archives.postgresql.org/message-id/e08cc0400911241703u4bf53ek1c3910605a3d8778@mail.gmail.com

and ideas are extracted from Tom's mail in the last year just before
committing the first window function patch, except for changing to
COL_NAME_KEYWORD rather than RESERVED_KEYWORD as suggested. The reason
I picked it up was only that it works. I cannot tell the side effect
of COL_NAME_KEYWORD but I guess we tend to avoid RESERVED_KEYWORD as
far as possible.

And the reason BETWEEN cannot work as function name any more is based
on bison's output shift/reduce conflict when SCONST follows BETWEEN in
frame_extent. I don't have clear example that makes this happen,
though.

> 3) Regression tests
>
> Testing that views work is OK as far as it goes, but I think that view
> definition should be left in place rather than dropped (possibly with
> even more variants) so that the deparse code gets properly tested too.
> (see the "rules" test)

OK,

> 4) Deparse output
>
> The code is forcing explicit casting on the offset expressions, i.e.
> the deparsed code looks like
>
>  ... ROWS BETWEEN 1::bigint PRECEDING AND 1::bigint FOLLOWING ...
>
> This looks a bit ugly; is it avoidable? At least for ROWS it should
> be, I think, since the type is known; even for RANGE, the type would
> be determined by the sort column.

Hmm, I'll change it as LIMIT clause does. Pass false as showimplicit
to get_rule_expr() maybe?

> 5) Documentation issues
>
> The entry for BETWEEN in the keywords appendix isn't updated.
> (Wouldn't it make more sense for this to be generated from the keyword
> list somehow?)

I heard that you don't need to send keywords appendix in the patch
because it is auto-generated, if I remember correctly.

>
> Spelling:
> -     current row. In <literal>ROWS</> mode this value means phisical row
> +     current row. In <literal>ROWS</> mode this value means physical row
> (this error appears twice)

Oops, thanks.

I'm now reworking as reviewed. The last issue is whether we accept
extension of frame types without RANGE support.

Regards,

--
Hitoshi Harada


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-05 10:17:59
Message-ID: e08cc0400912050217u25bb9f3fq58fbe1d21bd9cd6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/12/5 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:

> I'm now reworking as reviewed. The last issue is whether we accept
> extension of frame types without RANGE support.

Attached is updated version. I added AggGetMemoryContext() in
executor/nodeAgg.h (though I'm not sure where to go...) and its second
argument "iswindowagg" is output parameter to know whether the call
context is Agg or WindowAgg. Your proposal of APIs to know whether the
function is called as Aggregate or not is also a candidate to be, but
it seems out of this patch scope, so it doesn't touch anything.

Fix tsearch function is also included, as well as typo phisical ->
physical. Pass false to get_rule_expr() of value in
PRECEDING/FOLLOWING.

One thing for rule test, I checked existing regression test cases and
concluded DROP VIEW is necessary, or even VIEW test for a specific
feature is not needed. I remember your aggregate ORDER BY patch
contains "rules" test changes. However, since processing order of
regression tests is not predictable and may change AFAIK, I guess it
shouldn't add those changes in rules.out.

Regards.

--
Hitoshi Harada

Attachment Content-Type Size
rows_frame_types.20091205.patch.gz application/x-gzip 18.0 KB

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-05 10:53:35
Message-ID: 87bpidn1tg.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Hitoshi" == Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:

Hitoshi> One thing for rule test, I checked existing regression test
Hitoshi> cases and concluded DROP VIEW is necessary, or even VIEW
Hitoshi> test for a specific feature is not needed. I remember your
Hitoshi> aggregate ORDER BY patch contains "rules" test
Hitoshi> changes. However, since processing order of regression tests
Hitoshi> is not predictable and may change AFAIK, I guess it
Hitoshi> shouldn't add those changes in rules.out.

Actually, looking more closely, the way you have it currently works only
by chance - "rules" and "window" are running in parallel, therefore the
view creation in "window" can break the output of "rules".

The order of regression tests is set in parallel_schedule and
serial_schedule; it's unpredictable only for tests within the same
parallel group.

I think a modification of the schedule is needed here; the only other
option would be to move the view creation into a different test.

--
Andrew.


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-06 22:40:10
Message-ID: 87my1vhhm9.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Hitoshi" == Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:

Hitoshi> Attached is updated version. I added AggGetMemoryContext()
Hitoshi> in executor/nodeAgg.h (though I'm not sure where to go...)
Hitoshi> and its second argument "iswindowagg" is output parameter to
Hitoshi> know whether the call context is Agg or WindowAgg. Your
Hitoshi> proposal of APIs to know whether the function is called as
Hitoshi> Aggregate or not is also a candidate to be, but it seems out
Hitoshi> of this patch scope, so it doesn't touch anything.

I don't really like the extra argument; aggregate functions should
almost never have to care about whether they're being called as window
functions rather than aggregate functions. And if it does care, I
don't see why this is the appropriate function for it. At the very
least the function should accept NULL for the "iswindowagg" pointer to
avoid useless variables in the caller.

So for this and the regression test problem mentioned in the other mail,
I'm setting this back to "waiting on author".

--
Andrew.


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-07 01:04:53
Message-ID: e08cc0400912061704n6bd1d52s9fd413e7e932ecdc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/12/7 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:
>>>>>> "Hitoshi" == Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>
>  Hitoshi> Attached is updated version. I added AggGetMemoryContext()
>  Hitoshi> in executor/nodeAgg.h (though I'm not sure where to go...)
>  Hitoshi> and its second argument "iswindowagg" is output parameter to
>  Hitoshi> know whether the call context is Agg or WindowAgg. Your
>  Hitoshi> proposal of APIs to know whether the function is called as
>  Hitoshi> Aggregate or not is also a candidate to be, but it seems out
>  Hitoshi> of this patch scope, so it doesn't touch anything.
>
> I don't really like the extra argument; aggregate functions should
> almost never have to care about whether they're being called as window
> functions rather than aggregate functions. And if it does care, I
> don't see why this is the appropriate function for it. At the very
> least the function should accept NULL for the "iswindowagg" pointer to
> avoid useless variables in the caller.

I've just remembered such situation before, around here:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/array_userfuncs.c.diff?r1=1.30;r2=1.31

Of course it is fixed now. Still thinking about error reporting or
something, there should be a way to know which context you are called.
OK, I agree iswindowagg can be nullable, though most "isnull"
parameter cannot be NULL and forcing caller to put boolean stack value
don't seem a hard work.

> So for this and the regression test problem mentioned in the other mail,
> I'm setting this back to "waiting on author".

In my humble opinion, view regression test is not necessary in both my
patch and yours. At least window test has not been testing views so
far since it was introduced. Am I missing?

Regards,

--
Hitoshi Harada


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: umi(dot)tanuki(at)gmail(dot)com (Hitoshi Harada), pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-07 01:48:38
Message-ID: 87ein7h8ky.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Hitoshi" == Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:

>> So for this and the regression test problem mentioned in the other
>> mail, I'm setting this back to "waiting on author".

Hitoshi> In my humble opinion, view regression test is not necessary
Hitoshi> in both my patch and yours. At least window test has not
Hitoshi> been testing views so far since it was introduced. Am I
Hitoshi> missing?

If you don't want to include views in the regression test, then take
them out; I will object to that, but I'll leave it up to the committer
to decide whether it is appropriate, provided the other concerns are
addressed.

But what you have in the regression tests _now_ is, as far as I can
tell, only working by chance; with "rules" and "window" being in the
same parallel group, and window.sql creating a view, you have an
obvious race condition, unless I'm missing something.

I included view tests in my own patch for the very simple reason that
I found actual bugs that way during development.

--
Andrew (irc:RhodiumToad)


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-07 02:53:21
Message-ID: 4B1C6E21.9030602@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth wrote:
> But what you have in the regression tests _now_ is, as far as I can
> tell, only working by chance; with "rules" and "window" being in the
> same parallel group, and window.sql creating a view, you have an
> obvious race condition, unless I'm missing something.
>
You're right. rules.sql does this, among other things that might get
impacted:

SELECT viewname, definition FROM pg_views WHERE schemaname <>
'information_schema' ORDER BY viewname;

Both rules and window are part of the same parallel group:

test: select_views portals_p2 rules foreign_key cluster dependency guc
bitmapops combocid tsearch tsdicts foreign_data window xmlmap

Which means that views created in the window test could absolutely cause
the rules test to fail given a bad race condition. Either rules or
window needs to be moved to another section of the test schedule. (I
guess you could cut down the scope of "rules" to avoid this particular
problem, but hacking other people's regression tests to work around
issues caused by yours is never good practice). I also agree with
Andrew's sentiment that including a view on top of the new window
implementations is good practice, just for general robustness.

Also, while not a strict requirement, I personally hate seeing things
with simple names used in the regression tests, like how "v" is used in
this patch. The better written regression tests use a preface for all
their names to decrease the chance of a conflict here; for example, the
rules test names all of its views with names like rtest_v1 so they're
very clearly not going to conflict with views created by other tests.
No cleverness there can eliminate the conflict with rules though, when
it's looking at every view in the system.

It looks like a lot of progress has been made on this patch through its
review. But there's enough open issues still that I think it could use
a bit more time to mature before we try to get it committed--the fact
that it's been getting bounced around for weeks now and the regression
tests aren't even completely settled down yet is telling. The feature
seems complete, useful, and functionally solid, but still in need of
some general cleanup and re-testing afterwards. I'm going to mark this
one "Returned with Feedback" for now. Please continue to work on
knocking all these issues out, this should be a lot easier to get
committed in our next CF.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: add more frame types in window functions (ROWS)
Date: 2009-12-07 03:11:21
Message-ID: e08cc0400912061911s42b9a106xf4a7979bd2804482@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/12/7 Greg Smith <greg(at)2ndquadrant(dot)com>:
> Which means that views created in the window test could absolutely cause the
> rules test to fail given a bad race condition.  Either rules or window needs
> to be moved to another section of the test schedule.  (I guess you could cut
> down the scope of "rules" to avoid this particular problem, but hacking
> other people's regression tests to work around issues caused by yours is
> never good practice).  I also agree with Andrew's sentiment that including a
> view on top of the new window implementations is good practice, just for
> general robustness.

I've agreed window and rules cannot be in the same group if window has
view test.

> It looks like a lot of progress has been made on this patch through its
> review.  But there's enough open issues still that I think it could use a
> bit more time to mature before we try to get it committed--the fact that
> it's been getting bounced around for weeks now and the regression tests
> aren't even completely settled down yet is telling.  The feature seems
> complete, useful, and functionally solid, but still in need of some general
> cleanup and re-testing afterwards.  I'm going to mark this one "Returned
> with Feedback" for now.  Please continue to work on knocking all these
> issues out, this should be a lot easier to get committed in our next CF.

OK, Andrew, thank you for collaborating on this. This fest made my
patch to progress greatly. More comments from others on the memory
leakage issue are welcome yet.

Regards,

--
Hitoshi Harada