Re: Windowing Function Patch Review -> NTH_VALUE

Lists: pgsql-hackers
From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Cc: <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-05 00:25:18
Message-ID: D4DF765C92DD40BDA5CBBD0BAB37AF87@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> All,
>
> This is my first patch review for PostgreSQL. I did submit a patch last
> commit fest (Boyer-Moore) so I feel I should review one this commit fest.
> I'm quite new to PostgreSQL so please don't rely on me totally. I'll do my
> best. Heikki is also reviewing this patch which makes me feel better.
>
> My aim is to get the author has much feed back as quickly as possible. For
> this reason I'm going to be breaking down my reviews into the following
> topics.
>
> 1. Does patch apply cleanly?
>
> 2. Any compiler warnings?
>
> 3. Do the results follow the SQL standard?
>
> 4. Performance Comparison, does it perform better than alternate ways of
> doing things. Self joins, sub queries etc.
>
> 5. Performance, no comparison. How does it perform with larger tables?

This thread covers part of 3.

Quoted from SQL:2008
"If CUME_DIST is specified, then the relative rank of a row R is defined as
NP/NR, where NP is defined
to be the number of rows preceding or peer with R in the window ordering of
the window partition of R
and NR is defined to be the number of rows in the window partition of R."

So let me create a quick test case...

create table employees (
id INT primary key,
name varchar(30) not null,
department varchar(30) not null,
salary int not null,
check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',10000);
insert into employees values(2,'Sam','IT',12000);

insert into employees values(3,'Richard','Manager',30000);
insert into employees values(4,'Ian','Manager',20000);

insert into employees values(5,'John','IT',60000);
insert into employees values(6,'Matthew','Director',60000);

My interpretation of the standard should make the last two columns in the
following query equal, and they are in the patch.

SELECT name,CAST(r AS FLOAT) / c, cd
FROM (SELECT name,
ROW_NUMBER() OVER(ORDER BY salary) as r,
COUNT(*) OVER() AS c,
CUME_DIST() OVER(ORDER BY salary) AS cd
FROM employees
) t;

Both Oracle and Sybase say otherwise. Have I (we both) misinterpreted the
standard?

name,cast(t.r as real)/t.c,cd
'Jeff',0.16666666666666666,0.16666666666666666
'Sam',0.3333333333333333,0.3333333333333333
'Ian',0.5,0.5
'Richard',0.6666666666666666,0.6666666666666666
'John',0.8333333333333334,1.0
'Matthew',1.0,1.0

Above are the results from Sybase.

Can anyone see who is correct here? Is it possible that both Oracle and
Sybase have it wrong?

David.


From: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-05 01:02:57
Message-ID: 1d709ecc0811041702m7fb8f515gfefb357e28585a63@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Quoted from SQL:2008
> "If CUME_DIST is specified, then the relative *rank *of a row R is defined
> as
> NP/NR, where NP is defined
> to be the number of rows preceding or peer with R in the window ordering of
> the window partition of R
> and NR is defined to be the number of rows in the window partition of R."
>
> I guess there is a difference between "row_number" and "number of rows
preceding or peer with R"

"number of rows preceding or peer with R" == count(*) over (order by salary)

As far as I understand, the following query should calculate cume_dist
properly (and it does so in Oracle):

SELECT name,CAST(r AS FLOAT) / c, cd
FROM (SELECT name,
COUNT(*) OVER(ORDER BY salary) as r,
COUNT(*) OVER() AS c,
CUME_DIST() OVER(ORDER BY salary) AS cd
FROM employees
) t;

Sincerely yours,
Vladimir Sitnikov


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: "David Rowley" <dgrowley(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-05 01:26:59
Message-ID: e08cc0400811041726l26657b20ya37b2fe5624f1253@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/5 Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>:
>
>> Quoted from SQL:2008
>> "If CUME_DIST is specified, then the relative rank of a row R is defined
>> as
>> NP/NR, where NP is defined
>> to be the number of rows preceding or peer with R in the window ordering
>> of
>> the window partition of R
>> and NR is defined to be the number of rows in the window partition of R."
>>
> I guess there is a difference between "row_number" and "number of rows
> preceding or peer with R"
>
> "number of rows preceding or peer with R" == count(*) over (order by salary)
>
> As far as I understand, the following query should calculate cume_dist
> properly (and it does so in Oracle):
>
> SELECT name,CAST(r AS FLOAT) / c, cd
> FROM (SELECT name,
> COUNT(*) OVER(ORDER BY salary) as r,
> COUNT(*) OVER() AS c,
> CUME_DIST() OVER(ORDER BY salary) AS cd
> FROM employees
> ) t;
>

I'm afraid I misinterpreted it. As you say,

"number of rows preceding == row_number()"

and

"rumber of rows preceding or peers to R != row_number() (neither rank())"

"peers to R" in the window function context means "same rows by the
ORDER BY clause", so in the first example, id=5 and id=6 are peers and
in both rows, NP should be 6, as Oracle and Sybase say.

Even though I understand the definition, your suggestion of COUNT(*)
OVER (ORDER BY salary) doesn't make sense. In the patch, it simply
returns the same value as row_number() but is it wrong, too?

Regards,

--
Hitoshi Harada


From: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
To: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc: "David Rowley" <dgrowley(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-05 03:22:47
Message-ID: 1d709ecc0811041922y2af09fderf605eedc098222e4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
>
> Even though I understand the definition, your suggestion of COUNT(*)
> OVER (ORDER BY salary) doesn't make sense.

Why does not that make sense?
I have not read the spec, however Oracle has a default window specification
in case there is only an order by clause. The default window is "range
between unbounded preceding and current row".

"count(*) over (order by salary range between unbounded preceding and
current row)" is perfectly identical to the "number of rows preceding or
peers to R" by the definition, isn't it? I see here a word-by-word
translation from SQL to the English and vice versa.

If the patch returns "row_number" it is wrong since there is no way for
row_number to be a "number of rows preceding or peer with R", is there?

Regards,
Vladimir Sitnikov


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: "David Rowley" <dgrowley(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-05 04:47:22
Message-ID: e08cc0400811042047i92ea376t2843e90e2f5a50e2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/5 Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>:
>>
>> Even though I understand the definition, your suggestion of COUNT(*)
>> OVER (ORDER BY salary) doesn't make sense.
>
> Why does not that make sense?
> I have not read the spec, however Oracle has a default window specification
> in case there is only an order by clause. The default window is "range
> between unbounded preceding and current row".
>
> "count(*) over (order by salary range between unbounded preceding and
> current row)" is perfectly identical to the "number of rows preceding or
> peers to R" by the definition, isn't it? I see here a word-by-word
> translation from SQL to the English and vice versa.
>
> If the patch returns "row_number" it is wrong since there is no way for
> row_number to be a "number of rows preceding or peer with R", is there?
>

I've got it.
I had thought that implicit window framing specified by ORDER BY
clause (such like above) would mean ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW. But actually reading the spec more closely it says:

Otherwise, WF consists of all rows of the partition of R that precede
R or are peers of R in the
window ordering of the window partition defined by the window ordering clause.

So it means RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as you
pointed. And the result of count(*) OVER (ORDER BY salary) doesn't
equal to row_number().

Now my assumption is broken. Let me take time to think about how to solve it...

Regards,

--
Hitoshi Harada


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: "David Rowley" <dgrowley(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-07 15:46:47
Message-ID: e08cc0400811070746p43553f10s6bb09e98b7eafc3b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David, Vladimir,

2008/11/5 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> Now my assumption is broken. Let me take time to think about how to solve it...

I fixed the points we discussed a few days ago. The delta is here:

http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=bfbc039f68aa749b403c84a803d49a02b3d6c199;hp=bbba638f721a7e1d11cb3ee6af3bc1d7d3c11aa8

although attached is the whole (split) patch.

In addition to fixing cume_dist() and implicit frame definition, I
added two window function APIs, WinRowIsPeer() and WinIterFinish(). Up
to now I've avoided to touch ORDER BY clause comparisons deeply,
because I didn't see any abstraction in that except rank(). But now I
know the very important word "peers" appears so many times in the
spec, I'm inclined to implement some general mechanisms for those APIs
like IsPeer(). Also, as with this version part of RANGE is supported,
the road to the FRAME clause support got shorter than before.

Thanks for your feedback and continuing tests!

Regards,

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081107-1.gz application/x-gzip 46.7 KB

From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: "David Rowley" <dgrowley(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-07 15:47:22
Message-ID: e08cc0400811070747n261889c9p1ff256d252d0ba36@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

patch-2

2008/11/8 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> David, Vladimir,
>
> 2008/11/5 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
>> Now my assumption is broken. Let me take time to think about how to solve it...
>
> I fixed the points we discussed a few days ago. The delta is here:
>
> http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=bfbc039f68aa749b403c84a803d49a02b3d6c199;hp=bbba638f721a7e1d11cb3ee6af3bc1d7d3c11aa8
>
> although attached is the whole (split) patch.
>
> In addition to fixing cume_dist() and implicit frame definition, I
> added two window function APIs, WinRowIsPeer() and WinIterFinish(). Up
> to now I've avoided to touch ORDER BY clause comparisons deeply,
> because I didn't see any abstraction in that except rank(). But now I
> know the very important word "peers" appears so many times in the
> spec, I'm inclined to implement some general mechanisms for those APIs
> like IsPeer(). Also, as with this version part of RANGE is supported,
> the road to the FRAME clause support got shorter than before.
>
> Thanks for your feedback and continuing tests!
>
> Regards,
>
>
> --
> Hitoshi Harada
>

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081107-2.gz application/x-gzip 93.0 KB

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Vladimir Sitnikov'" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-08 15:03:37
Message-ID: 34A5A4A343FC4A13BE08FDB63DAA521F@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada Wrote:
> > although attached is the whole (split) patch.

I'm having some trouble getting these patches to patch cleanly. I think it's
because of this that I can't get postgres to compile after applying the
patch.

It errors out at tuptoaster.c some constants seem to be missing from
fmgroids.h

If I open src/include/utils/fmgroids.h

The only constaint being defined is:

#define F__NULL_ 31

I'd assume it was a problem with my setup, but the CVS head compiles fine.

Let me know if I'm doing something wrong.

Here is the output from patch and the compile errors.

david(at)ubuntu1:~/pg8.4/pgsql$ patch -p1 < window_functions.patch.20081107-1
patching file doc/src/sgml/func.sgml
Hunk #1 succeeded at 10061 (offset 11 lines).
patching file doc/src/sgml/keywords.sgml
patching file doc/src/sgml/queries.sgml
patching file doc/src/sgml/query.sgml
patching file doc/src/sgml/ref/select.sgml
patching file doc/src/sgml/syntax.sgml
patching file src/backend/catalog/pg_aggregate.c
patching file src/backend/catalog/pg_proc.c
patching file src/backend/commands/explain.c
patching file src/backend/executor/Makefile
patching file src/backend/executor/execAmi.c
patching file src/backend/executor/execGrouping.c
patching file src/backend/executor/execProcnode.c
patching file src/backend/executor/execQual.c
Hunk #3 succeeded at 4167 (offset 14 lines).
patching file src/backend/executor/nodeAgg.c
patching file src/backend/executor/nodeWindow.c
patching file src/backend/nodes/copyfuncs.c
patching file src/backend/nodes/equalfuncs.c
patching file src/backend/nodes/nodeFuncs.c
patching file src/backend/nodes/outfuncs.c
patching file src/backend/nodes/readfuncs.c
patching file src/backend/optimizer/path/allpaths.c
patching file src/backend/optimizer/path/equivclass.c
patching file src/backend/optimizer/plan/createplan.c
patching file src/backend/optimizer/plan/planagg.c
patching file src/backend/optimizer/plan/planmain.c
patching file src/backend/optimizer/plan/planner.c
patching file src/backend/optimizer/plan/setrefs.c
patching file src/backend/optimizer/plan/subselect.c
patching file src/backend/optimizer/prep/prepjointree.c
patching file src/backend/optimizer/util/clauses.c
patching file src/backend/optimizer/util/tlist.c
patching file src/backend/optimizer/util/var.c
patching file src/backend/parser/analyze.c
patching file src/backend/parser/gram.y
Hunk #6 succeeded at 6433 (offset 8 lines).
Hunk #7 succeeded at 6443 (offset 8 lines).
Hunk #8 succeeded at 8212 (offset 9 lines).
Hunk #9 succeeded at 8220 (offset 9 lines).
Hunk #10 succeeded at 8232 (offset 9 lines).
Hunk #11 succeeded at 8244 (offset 9 lines).
Hunk #12 succeeded at 8256 (offset 9 lines).
Hunk #13 succeeded at 8272 (offset 9 lines).
Hunk #14 succeeded at 8284 (offset 9 lines).
Hunk #15 succeeded at 8306 (offset 9 lines).
Hunk #16 succeeded at 8754 (offset 9 lines).
Hunk #17 succeeded at 9629 (offset 9 lines).
Hunk #18 succeeded at 9637 (offset 9 lines).
Hunk #19 succeeded at 9703 (offset 9 lines).
Hunk #20 succeeded at 9720 (offset 9 lines).
Hunk #21 succeeded at 9772 (offset 9 lines).
Hunk #22 succeeded at 9959 (offset 9 lines).
Hunk #23 succeeded at 9980 (offset 9 lines).
patching file src/backend/parser/keywords.c
patching file src/backend/parser/parse_agg.c
patching file src/backend/parser/parse_clause.c
patching file src/backend/parser/parse_expr.c
patching file src/backend/parser/parse_func.c
patching file src/backend/rewrite/rewriteManip.c
patching file src/backend/utils/adt/Makefile
patching file src/backend/utils/adt/ruleutils.c
patching file src/backend/utils/adt/wfunc.c
patching file src/backend/utils/sort/tuplestore.c
patching file src/include/catalog/pg_attribute.h
patching file src/include/catalog/pg_class.h
david(at)ubuntu1:~/pg8.4/pgsql$ patch -p1 < window_functions.patch.20081107-2
(Stripping trailing CRs from patch.)
patching file src/include/catalog/pg_proc.h
Hunk #4 FAILED at 106.
1 out of 4 hunks FAILED -- saving rejects to file
src/include/catalog/pg_proc.h.rej
(Stripping trailing CRs from patch.)
patching file src/include/executor/executor.h
(Stripping trailing CRs from patch.)
patching file src/include/executor/nodeWindow.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/execnodes.h
Hunk #3 succeeded at 509 (offset 2 lines).
Hunk #4 succeeded at 1586 (offset 2 lines).
(Stripping trailing CRs from patch.)
patching file src/include/nodes/nodes.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/parsenodes.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/plannodes.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/primnodes.h
(Stripping trailing CRs from patch.)
patching file src/include/nodes/relation.h
(Stripping trailing CRs from patch.)
patching file src/include/optimizer/planmain.h
(Stripping trailing CRs from patch.)
patching file src/include/optimizer/var.h
(Stripping trailing CRs from patch.)
patching file src/include/parser/parse_agg.h
(Stripping trailing CRs from patch.)
patching file src/include/parser/parse_clause.h
(Stripping trailing CRs from patch.)
patching file src/include/parser/parse_func.h
(Stripping trailing CRs from patch.)
patching file src/include/parser/parse_node.h
(Stripping trailing CRs from patch.)
patching file src/include/rewrite/rewriteManip.h
(Stripping trailing CRs from patch.)
patching file src/include/utils/builtins.h
Hunk #1 succeeded at 975 (offset 4 lines).
(Stripping trailing CRs from patch.)
patching file src/include/utils/errcodes.h
(Stripping trailing CRs from patch.)
patching file src/include/utils/tuplestore.h
(Stripping trailing CRs from patch.)
patching file src/interfaces/ecpg/preproc/preproc.y
(Stripping trailing CRs from patch.)
patching file src/test/regress/expected/opr_sanity.out
(Stripping trailing CRs from patch.)
patching file src/test/regress/expected/window.out
(Stripping trailing CRs from patch.)
patching file src/test/regress/parallel_schedule
(Stripping trailing CRs from patch.)
patching file src/test/regress/serial_schedule
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/create_table.sql
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/opr_sanity.sql
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/window.sql
david(at)ubuntu1:~/pg8.4/pgsql$

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv
-I../../../../src/include -D_GNU_SOURCE -c -o tuptoaster.o tuptoaster.c
tuptoaster.c: In function âtoast_delete_datumâ:
tuptoaster.c:1293: error: âF_OIDEQâ undeclared (first use in this function)
tuptoaster.c:1293: error: (Each undeclared identifier is reported only once
tuptoaster.c:1293: error: for each function it appears in.)
tuptoaster.c: In function âtoast_fetch_datumâ:
tuptoaster.c:1372: error: âF_OIDEQâ undeclared (first use in this function)
tuptoaster.c: In function âtoast_fetch_datum_sliceâ:
tuptoaster.c:1567: error: âF_OIDEQâ undeclared (first use in this function)
tuptoaster.c:1577: error: âF_INT4EQâ undeclared (first use in this function)
tuptoaster.c:1585: error: âF_INT4GEâ undeclared (first use in this function)
tuptoaster.c:1589: error: âF_INT4LEâ undeclared (first use in this function)
make[4]: *** [tuptoaster.o] Error 1
make[4]: Leaving directory `/home/david/pg8.4/pgsql/src/backend/access/heap'
make[3]: *** [heap-recursive] Error 2
make[3]: Leaving directory `/home/david/pg8.4/pgsql/src/backend/access'
make[2]: *** [access-recursive] Error 2
make[2]: Leaving directory `/home/david/pg8.4/pgsql/src/backend'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/david/pg8.4/pgsql/src'
make: *** [all] Error 2


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-08 15:51:10
Message-ID: e08cc0400811080751j262f2920q3eb1d13cca349207@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
> Hitoshi Harada Wrote:
>> > although attached is the whole (split) patch.
>
> I'm having some trouble getting these patches to patch cleanly. I think it's
> because of this that I can't get postgres to compile after applying the
> patch.
>
> It errors out at tuptoaster.c some constants seem to be missing from
> fmgroids.h
>
> If I open src/include/utils/fmgroids.h
>
> The only constaint being defined is:
>
> #define F__NULL_ 31
>
> I'd assume it was a problem with my setup, but the CVS head compiles fine.
>
> Let me know if I'm doing something wrong.

> patching file src/include/catalog/pg_proc.h
> Hunk #4 FAILED at 106.
> 1 out of 4 hunks FAILED -- saving rejects to file
> src/include/catalog/pg_proc.h.rej

As it says pg_proc.h may have conflicts. The patch is not against HEAD
but based on the same as the previous (v08) patch. I am remote now so
I'm not sure but could you try to find conflicts in pg_proc.h? The
pg_proc catalog has been added 1 column called prociswfunc (bool) in
the patch. All you have to do is add 'f' in the middle of new-coming
lines.
Sorry for annoying, and I'll be back in hours.

Regards,

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, "'Vladimir Sitnikov'" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-08 18:36:50
Message-ID: 19447.1226169410@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David Rowley" <dgrowley(at)gmail(dot)com> writes:
> patching file src/include/catalog/pg_proc.h
> Hunk #4 FAILED at 106.
> 1 out of 4 hunks FAILED -- saving rejects to file
> src/include/catalog/pg_proc.h.rej

I imagine you'll find that "hunk #4" covers the entire DATA() body of
the file :-(. It can't possibly apply cleanly if anyone's added or
altered pg_proc entries since the patch was made.

What you'd need to do is manually insert proiswfunc = 'f' entries in
all the existing DATA lines (this is usually not too hard with sed or
an emacs macro), then add whatever new functions the patch defines.
Even figuring out the latter from the patch representation can be a
serious PITA, since they'll be a few lines out of a multi-thousand-line
failed diff hunk.

I'm not sure if Hitoshi is in a position to submit the pg_proc changes
as two separate diffs --- one to add the new column and a separate one
to add in the new functions --- but it'd be a lot easier to deal with
merge issues if he could.

(Now I'll sit back and wait for some fanboy to claim that
$his_favorite_scm could solve this automatically ...)

regards, tom lane


From: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowley(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "heikki(dot)linnakangas(at)enterprisedb(dot)com" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-08 18:46:58
Message-ID: B3E86191-D692-4CBB-9387-13E1E15321AD@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Instead of a patch it might be easier to submit the new columns as a
perl script or sed command. We do something like that to make merging
pg_proc easier.

greg

On 8 Nov 2008, at 01:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "David Rowley" <dgrowley(at)gmail(dot)com> writes:
>> patching file src/include/catalog/pg_proc.h
>> Hunk #4 FAILED at 106.
>> 1 out of 4 hunks FAILED -- saving rejects to file
>> src/include/catalog/pg_proc.h.rej
>
> I imagine you'll find that "hunk #4" covers the entire DATA() body of
> the file :-(. It can't possibly apply cleanly if anyone's added or
> altered pg_proc entries since the patch was made.
>
> What you'd need to do is manually insert proiswfunc = 'f' entries in
> all the existing DATA lines (this is usually not too hard with sed or
> an emacs macro), then add whatever new functions the patch defines.
> Even figuring out the latter from the patch representation can be a
> serious PITA, since they'll be a few lines out of a multi-thousand-
> line
> failed diff hunk.
>
> I'm not sure if Hitoshi is in a position to submit the pg_proc changes
> as two separate diffs --- one to add the new column and a separate one
> to add in the new functions --- but it'd be a lot easier to deal with
> merge issues if he could.
>
> (Now I'll sit back and wait for some fanboy to claim that
> $his_favorite_scm could solve this automatically ...)
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-09 04:24:48
Message-ID: e08cc0400811082024p3a724444s91fe9c69f8da41e3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/9 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> 2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
>> Hitoshi Harada Wrote:
>>> > although attached is the whole (split) patch.
>>
>> I'm having some trouble getting these patches to patch cleanly. I think it's
>> because of this that I can't get postgres to compile after applying the
>> patch.
>>
>> It errors out at tuptoaster.c some constants seem to be missing from
>> fmgroids.h
>>
>> If I open src/include/utils/fmgroids.h
>>
>> The only constaint being defined is:
>>
>> #define F__NULL_ 31
>>
>> I'd assume it was a problem with my setup, but the CVS head compiles fine.
>>
>> Let me know if I'm doing something wrong.
>
>> patching file src/include/catalog/pg_proc.h
>> Hunk #4 FAILED at 106.
>> 1 out of 4 hunks FAILED -- saving rejects to file
>> src/include/catalog/pg_proc.h.rej
>
> As it says pg_proc.h may have conflicts. The patch is not against HEAD
> but based on the same as the previous (v08) patch. I am remote now so
> I'm not sure but could you try to find conflicts in pg_proc.h? The
> pg_proc catalog has been added 1 column called prociswfunc (bool) in
> the patch. All you have to do is add 'f' in the middle of new-coming
> lines.
> Sorry for annoying, and I'll be back in hours.
>

I recreate the patch against current HEAD, in the git it's here:

http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543d873bb7228f4c057c23e0

I tested `patch -p1` with the attached and succeeded to make it work
cleanly. It seems to me that this patch can be applied until another
pg_proc.h entry would introduced in the HEAD.

Regards,

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081109-1.gz application/x-gzip 46.6 KB

From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-09 04:25:20
Message-ID: e08cc0400811082025t30d5a94et53a5b9585155adc1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

patch-2

2008/11/9 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> 2008/11/9 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
>> 2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
>>> Hitoshi Harada Wrote:
>>>> > although attached is the whole (split) patch.
>>>
>>> I'm having some trouble getting these patches to patch cleanly. I think it's
>>> because of this that I can't get postgres to compile after applying the
>>> patch.
>>>
>>> It errors out at tuptoaster.c some constants seem to be missing from
>>> fmgroids.h
>>>
>>> If I open src/include/utils/fmgroids.h
>>>
>>> The only constaint being defined is:
>>>
>>> #define F__NULL_ 31
>>>
>>> I'd assume it was a problem with my setup, but the CVS head compiles fine.
>>>
>>> Let me know if I'm doing something wrong.
>>
>>> patching file src/include/catalog/pg_proc.h
>>> Hunk #4 FAILED at 106.
>>> 1 out of 4 hunks FAILED -- saving rejects to file
>>> src/include/catalog/pg_proc.h.rej
>>
>> As it says pg_proc.h may have conflicts. The patch is not against HEAD
>> but based on the same as the previous (v08) patch. I am remote now so
>> I'm not sure but could you try to find conflicts in pg_proc.h? The
>> pg_proc catalog has been added 1 column called prociswfunc (bool) in
>> the patch. All you have to do is add 'f' in the middle of new-coming
>> lines.
>> Sorry for annoying, and I'll be back in hours.
>>
>
> I recreate the patch against current HEAD, in the git it's here:
>
> http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543d873bb7228f4c057c23e0
>
> I tested `patch -p1` with the attached and succeeded to make it work
> cleanly. It seems to me that this patch can be applied until another
> pg_proc.h entry would introduced in the HEAD.
>
>
> Regards,
>
> --
> Hitoshi Harada
>

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081109-2.gz application/x-gzip 93.0 KB

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Vladimir Sitnikov'" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-09 09:59:16
Message-ID: CD929D16910E443E8E04C0F27068CCB9@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada wrote:
> I recreate the patch against current HEAD, in the git it's here:
>
> http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543
> d873bb7228f4c057c23e0
>
> I tested `patch -p1` with the attached and succeeded to make it work
> cleanly. It seems to me that this patch can be applied until another
> pg_proc.h entry would introduced in the HEAD.
>
>
> Regards,
>
> --
> Hitoshi Harada

Thanks Hitoshi,

You've done what I planned to do first thing this morning. I was planning to
write a script that checks for the new lines then I could have removed
those, patched then re-added them with the extra f.

It looks like it's not easy to keep it working with CVS head. I counted 3
patches that touched pg_proc.h just this month. I must have missed another
as I still couldn't get it to apply. Hence the need for the script.

Thanks for the new patch I'll get testing now.

David.


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-09 10:18:59
Message-ID: e08cc0400811090218t1420b232n411d97ca051a2538@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
> Hitoshi Harada wrote:
>> I recreate the patch against current HEAD, in the git it's here:
>>
>> http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543
>> d873bb7228f4c057c23e0
>>
>> I tested `patch -p1` with the attached and succeeded to make it work
>> cleanly. It seems to me that this patch can be applied until another
>> pg_proc.h entry would introduced in the HEAD.
>>
>>
>> Regards,
>>
>> --
>> Hitoshi Harada
>
> Thanks Hitoshi,
>
> You've done what I planned to do first thing this morning. I was planning to
> write a script that checks for the new lines then I could have removed
> those, patched then re-added them with the extra f.
>
> It looks like it's not easy to keep it working with CVS head. I counted 3
> patches that touched pg_proc.h just this month. I must have missed another
> as I still couldn't get it to apply. Hence the need for the script.
>
> Thanks for the new patch I'll get testing now.
>
> David.
>

I'm glad to hear that. Actually thanks to git it is quite easy for me
to merge my own repository with the HEAD. It tells me which lines are
new coming and which lines I modified are newer than else in CVS. This
is my first project where I use git (and I am not guru of cvs either
-:) but I love it now.
So feel free to tell me to recreate a new patch against head. Not so
heavy work as making on-the-fly script.

Regards,

--
Hitoshi Harada


From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Vladimir Sitnikov'" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-09 11:08:15
Message-ID: D564243DB5D14ACFAD0023585E1D9783@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada wrote:
> I'm glad to hear that. Actually thanks to git it is quite easy for me
> to merge my own repository with the HEAD. It tells me which lines are
> new coming and which lines I modified are newer than else in CVS. This
> is my first project where I use git (and I am not guru of cvs either
> -:) but I love it now.
> So feel free to tell me to recreate a new patch against head. Not so
> heavy work as making on-the-fly script.
>

CUME_DIST and COUNT(*) seem to be working as expected now with the original
tests. I'll continue to test more and also test the other windowing
functions that I've not got to yet.

I did receive a small warning from the compiler when compiling nodeWindow.c.
It was complaining about a possible un-initialised f_shrinking. Just to keep
it quiet I changed the break's to return's after the elog() call.

*** nodeWindow.c 2008-11-09 10:54:50.000000000 +0000
--- nodeWindow.c 2008-11-09 10:39:24.000000000 +0000
***************
*** 818,824 ****
case FRAME_CURRENT_RANGE:
/* UNSUPPORTED */
elog(ERROR, "unknown preceding type %d",
node->preceding_type);
! break;
case FRAME_VALUE_ROWS:
if (node->preceding_rows > 0)
f_shrinking = 0;
--- 818,824 ----
case FRAME_CURRENT_RANGE:
/* UNSUPPORTED */
elog(ERROR, "unknown preceding type %d",
node->preceding_type);
! return; /* keep compiler quiet */
case FRAME_VALUE_ROWS:
if (node->preceding_rows > 0)
f_shrinking = 0;
***************
*** 922,928 ****
case FRAME_CURRENT_RANGE:
/* UNSUPPORTED */
elog(ERROR, "unknown preceding type %d",
node->preceding_type);
! break;
case FRAME_VALUE_ROWS:
if (node->preceding_rows <=
winobj->p_currentpos + 1)
f_shrinking = 1;
--- 922,928 ----
case FRAME_CURRENT_RANGE:
/* UNSUPPORTED */
elog(ERROR, "unknown preceding type %d",
node->preceding_type);
! return; /* keep compiler quiet */
case FRAME_VALUE_ROWS:
if (node->preceding_rows <=
winobj->p_currentpos + 1)
f_shrinking = 1;


From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Vladimir Sitnikov'" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-09 11:30:17
Message-ID: E1304A0E35F34CB080A83F60AD9B71EF@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Using one of my original test tables I'm testing windowing functions with a
GROUP BY.

The following query works as I would expect.

-- Works
SELECT department,
SUM(Salary),
ROW_NUMBER() OVER (ORDER BY department),
SUM(SUM(salary)) OVER (ORDER BY department)
FROM employees
GROUP BY department;

The following one fails with the message.
ERROR: variable not found in subplan target list

-- Does not work.
SELECT department,
SUM(Salary),
ROW_NUMBER() OVER (ORDER BY department),
SUM(SUM(salary)) OVER (ORDER BY department DESC)
FROM employees
GROUP BY department;

I just added the DESC to force it into creating 2 separate windows.

I can re-write the non working query to work using the following:

SELECT department,
salary,
ROW_NUMBER() OVER (ORDER BY department),
SUM(salary) OVER (ORDER BY department DESC)
FROM (SELECT department,
SUM(salary) AS salary
FROM employees
GROUP BY department
) t;

Testing with:

create table employees (
id INT primary key,
name varchar(30) not null,
department varchar(30) not null,
salary int not null,
check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',10000);
insert into employees values(2,'Sam','IT',12000);

insert into employees values(3,'Richard','Manager',30000);
insert into employees values(4,'Ian','Manager',20000);

insert into employees values(5,'John','IT',60000);
insert into employees values(6,'Matthew','Director',60000);


From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Windowing Function Patch Review -> NTILE function
Date: 2008-11-09 12:32:02
Message-ID: F8B51ABADE2343029236663D0E64EAEB@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I've done a little testing with NTILE(). I think a check should be added to
the ntile() function in wfunc.c.

david=# select name,salary,ntile(0) over (order by salary) as n from
employees;
ERROR: floating-point exception
DETAIL: An invalid floating-point operation was signaled. This probably
means an out-of-range result or an invalid operation, such as division by
zero.

I tracked that message back to the signal handler in postgres.c :-( simple
fix though. Any value less than 1 does not really make sense to me.

Maybe we should add something like:

if (PG_WINDOW_ARG(0) < 1)
elog(ERROR, "negative or zero ntile argument not allowed");

What do you think?

Oracle errors out on less than 1, Sybase seems not to have ntile.
MSSQL 2008 also errors out on less than 1

David.


From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Windowing Function Patch Review -> NTH_VALUE
Date: 2008-11-09 14:05:37
Message-ID: 2FD448EBB6A840B2A81205FBE55DEFD4@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm having a little trouble understanding the standard for NTH_VALUE(). I
would have assumed that NTH_VALUE(name,1) would return the first name in the
window. The current patch is using 0 for the first.

Here is the paragraph I'm reading in the standard:

"The nth-value function takes an arbitrary <value expression> VE and a
<simple value specification> or a <dynamic parameter specification> that
evaluates to an exact numeric value n with scale 0 (zero) as arguments
and, for each row R of a windowed table, returns the value of VE evaluated
on the n-th row from the first (if FROM FIRST is specified or implied) or
the last (if FROM LAST is specified) row of the window frame of R defined by
a window structure descriptor. In addition, RESPECT NULLS or IGNORE NULLS
can be specified to indicate whether the rows for which VE evaluates to the
null value are preserved or eliminated."

The text "returns the value of VE evaluated on the n-th row from the first".
I find the "from the first" quite difficult to understand. If it said "in
the window partition" then that seems simple. I'm not sure if "from the
first" includes or does not include the first row in the window partition.

Perhaps it's easier to see in an example.

(Using employees table from another thread) for those who missed it:

create table employees (
id INT primary key,
name varchar(30) not null,
department varchar(30) not null,
salary int not null,
check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',10000);
insert into employees values(2,'Sam','IT',12000);

insert into employees values(3,'Richard','Manager',30000);
insert into employees values(4,'Ian','Manager',20000);

insert into employees values(5,'John','IT',60000);
insert into employees values(6,'Matthew','Director',60000);

david=# select *,nth_value(name,1) over (order by id) from employees;
id | name | department | salary | nth_value
----+---------+------------+--------+-----------
1 | Jeff | IT | 10000 |
2 | Sam | IT | 12000 | Sam
3 | Richard | Manager | 30000 | Sam
4 | Ian | Manager | 20000 | Sam
5 | John | IT | 60000 | Sam
6 | Matthew | Director | 60000 | Sam
(6 rows)

"Sam" is the name from the 2nd row in the window partition.

david=# select *,nth_value(name,0) over (order by id) from employees;
id | name | department | salary | nth_value
----+---------+------------+--------+-----------
1 | Jeff | IT | 10000 | Jeff
2 | Sam | IT | 12000 | Jeff
3 | Richard | Manager | 30000 | Jeff
4 | Ian | Manager | 20000 | Jeff
5 | John | IT | 60000 | Jeff
6 | Matthew | Director | 60000 | Jeff

Also does anyone think that a negative nth_value should be disallowed. The
standard does not seem to give any details on this.

david=# select *,nth_value(name,-1) over (order by id) from employees;
id | name | department | salary | nth_value
----+---------+------------+--------+-----------
1 | Jeff | IT | 10000 |
2 | Sam | IT | 12000 |
3 | Richard | Manager | 30000 |
4 | Ian | Manager | 20000 |
5 | John | IT | 60000 |
6 | Matthew | Director | 60000 |

I also cannot find another RDBMS that implements NTH_VALUE to see what they
do.

Does anyone know if one exists?

Anyone out there able to understand what the standard requires in this case?

It just seems strange to have NTH_VALUE(col,1) return the 2nd row when
functions like ROW_NUMBER() work with base 1 rather than base 0.

Any help or comments on this would be appreciated.

David.


From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Cc: <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Vladimir Sitnikov'" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Windowing Function Patch Review -> ROW_NUMBER without ORDER BY
Date: 2008-11-09 21:35:01
Message-ID: CA36904CC62E4C3BB37C177425A22D02@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've been trying to think of a use case for using ROW_NUMBER() with no ORDER
BY in the window clause.

Using the example table I always seem to be using, for those who missed it
in other threads.

create table employees (
id INT primary key,
name varchar(30) not null,
department varchar(30) not null,
salary int not null,
check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',10000);
insert into employees values(2,'Sam','IT',12000);
insert into employees values(3,'Richard','Manager',30000);
insert into employees values(4,'Ian','Manager',20000);
insert into employees values(5,'John','IT',60000);
insert into employees values(6,'Matthew','Director',60000);

david=# select *,row_number() over () from employees;
id | name | department | salary | row_number
----+---------+------------+--------+------------
1 | Jeff | IT | 10000 | 1
2 | Sam | IT | 12000 | 2
4 | Ian | Manager | 20000 | 3
5 | John | IT | 60000 | 4
6 | Matthew | Director | 60000 | 5
3 | Richard | Manager | 30000 | 6
(6 rows)

row_number seems to assign the rows a number in order of how it reads them
from the heap. Just to confirm...

update employees set salary = salary where id = 3;

david=# select *,row_number() over () from employees;
id | name | department | salary | row_number
----+---------+------------+--------+------------
1 | Jeff | IT | 10000 | 1
2 | Sam | IT | 12000 | 2
4 | Ian | Manager | 20000 | 3
5 | John | IT | 60000 | 4
6 | Matthew | Director | 60000 | 5
3 | Richard | Manager | 30000 | 6
(6 rows)

The spec says: "The ROW_NUMBER function computes the sequential row number,
starting with 1 (one) for the first row, of the row within its window
partition according to the window ordering of the window."

I'm just not sure if we should block this or not.

Does anyone see this as a feature?

Does anyone see this as a bug?

Any feedback is welcome

David.


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> ROW_NUMBER without ORDER BY
Date: 2008-11-09 23:30:33
Message-ID: 200811100030.34057.andreak@officenet.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday 09 November 2008 22:35:01 David Rowley wrote:
> I've been trying to think of a use case for using ROW_NUMBER() with no ORDER
> BY in the window clause.
>
> Using the example table I always seem to be using, for those who missed it
> in other threads.
>
> create table employees (
> id INT primary key,
> name varchar(30) not null,
> department varchar(30) not null,
> salary int not null,
> check (salary >= 0)
> );
>
> insert into employees values(1,'Jeff','IT',10000);
> insert into employees values(2,'Sam','IT',12000);
> insert into employees values(3,'Richard','Manager',30000);
> insert into employees values(4,'Ian','Manager',20000);
> insert into employees values(5,'John','IT',60000);
> insert into employees values(6,'Matthew','Director',60000);
>
>
> david=# select *,row_number() over () from employees;
> id | name | department | salary | row_number
> ----+---------+------------+--------+------------
> 1 | Jeff | IT | 10000 | 1
> 2 | Sam | IT | 12000 | 2
> 4 | Ian | Manager | 20000 | 3
> 5 | John | IT | 60000 | 4
> 6 | Matthew | Director | 60000 | 5
> 3 | Richard | Manager | 30000 | 6
> (6 rows)
>
> row_number seems to assign the rows a number in order of how it reads them
> from the heap. Just to confirm...
>
> update employees set salary = salary where id = 3;
>
> david=# select *,row_number() over () from employees;
> id | name | department | salary | row_number
> ----+---------+------------+--------+------------
> 1 | Jeff | IT | 10000 | 1
> 2 | Sam | IT | 12000 | 2
> 4 | Ian | Manager | 20000 | 3
> 5 | John | IT | 60000 | 4
> 6 | Matthew | Director | 60000 | 5
> 3 | Richard | Manager | 30000 | 6
> (6 rows)
>
> The spec says: "The ROW_NUMBER function computes the sequential row number,
> starting with 1 (one) for the first row, of the row within its window
> partition according to the window ordering of the window."
>
> I'm just not sure if we should block this or not.
>
> Does anyone see this as a feature?
>
> Does anyone see this as a bug?
>
> Any feedback is welcome

I see this as a greate feature.
It will hopefully be possible to write:

SELECT *, max(row_number()) over() as total_rows from employees;

To get the maximum number of rows in a separate column. Very usefull when writing queries to retrieve "paged" results. Like "Give me the 20 top articles sorted on date and also the total number of articles" in *one* query, eliminating the need for a separate count(*) query.

There was some discussion regarding this here:
http://archives.postgresql.org/pgsql-hackers/2008-10/msg00729.php

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / CEO
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Karenslyst Allé 11 | know how to do a thing and to watch |
PO. Box 529 Skøyen | somebody else doing it wrong, without |
0214 Oslo | comment. |
NORWAY | |
Tlf: +47 24 15 38 90 | |
Fax: +47 24 15 38 91 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+


From: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
To: "Andreas Joseph Krogh" <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> ROW_NUMBER without ORDER BY
Date: 2008-11-10 00:01:04
Message-ID: 1d709ecc0811091601x4896ee72ib0c3cbf64f5ce3a6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
>
> I see this as a greate feature.

I would treat ranking functions without explicit order by clause as a
feature rather than a bug. However, I believe, in most cases optimizer will
avoid additional sort when possible, thus an "order by" in a windowing
clause would not cause any performance degradation.

>
> It will hopefully be possible to write:
>
> SELECT *, max(row_number()) over() as total_rows from employees;

I believe this query does not make sense. At least, "row_number" without
"over" sounds odd.

To count all the rows (if you really want to) you might use "count(*) over
()".

>
> To get the maximum number of rows in a separate column. Very usefull when
> writing queries to retrieve "paged" results. Like "Give me the 20 top
> articles sorted on date and also the total number of articles" in *one*
> query, eliminating the need for a separate count(*) query.

Sometimes it is better to perform several separate queries since optimizer
could use an index scan to get "20 top" and seq scan to get the "count(*)"

Regards,
Vladimir Sitnikov


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> ROW_NUMBER without ORDER BY
Date: 2008-11-10 00:38:58
Message-ID: e08cc0400811091638p10c28103m59bc705f75766cec@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/10 David Rowley <dgrowley(at)gmail(dot)com>:
> I've been trying to think of a use case for using ROW_NUMBER() with no ORDER
> BY in the window clause.
>
> Using the example table I always seem to be using, for those who missed it
> in other threads.
>
> create table employees (
> id INT primary key,
> name varchar(30) not null,
> department varchar(30) not null,
> salary int not null,
> check (salary >= 0)
> );
>
> insert into employees values(1,'Jeff','IT',10000);
> insert into employees values(2,'Sam','IT',12000);
> insert into employees values(3,'Richard','Manager',30000);
> insert into employees values(4,'Ian','Manager',20000);
> insert into employees values(5,'John','IT',60000);
> insert into employees values(6,'Matthew','Director',60000);
>
>
> david=# select *,row_number() over () from employees;
> id | name | department | salary | row_number
> ----+---------+------------+--------+------------
> 1 | Jeff | IT | 10000 | 1
> 2 | Sam | IT | 12000 | 2
> 4 | Ian | Manager | 20000 | 3
> 5 | John | IT | 60000 | 4
> 6 | Matthew | Director | 60000 | 5
> 3 | Richard | Manager | 30000 | 6
> (6 rows)
>
> row_number seems to assign the rows a number in order of how it reads them
> from the heap. Just to confirm...
>
> update employees set salary = salary where id = 3;
>
> david=# select *,row_number() over () from employees;
> id | name | department | salary | row_number
> ----+---------+------------+--------+------------
> 1 | Jeff | IT | 10000 | 1
> 2 | Sam | IT | 12000 | 2
> 4 | Ian | Manager | 20000 | 3
> 5 | John | IT | 60000 | 4
> 6 | Matthew | Director | 60000 | 5
> 3 | Richard | Manager | 30000 | 6
> (6 rows)
>
> The spec says: "The ROW_NUMBER function computes the sequential row number,
> starting with 1 (one) for the first row, of the row within its window
> partition according to the window ordering of the window."
>
> I'm just not sure if we should block this or not.
>
> Does anyone see this as a feature?

I don't see any reason to take it as a bug. It may be confusing some
people but it is consistent enough and not ambiguous. Many users
already know if they don't specify ORDER BY clause in a simple regular
query they wouldn't receive ordered rows so it will match their
senses.

Regards,

--
Hitoshi Harada


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-10 13:48:23
Message-ID: e08cc0400811100548s1d31af73s888062c05c8512bd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
> Using one of my original test tables I'm testing windowing functions with a
> GROUP BY.
>
> The following query works as I would expect.
>
> -- Works
> SELECT department,
> SUM(Salary),
> ROW_NUMBER() OVER (ORDER BY department),
> SUM(SUM(salary)) OVER (ORDER BY department)
> FROM employees
> GROUP BY department;
>
>
> The following one fails with the message.
> ERROR: variable not found in subplan target list
>
> -- Does not work.
> SELECT department,
> SUM(Salary),
> ROW_NUMBER() OVER (ORDER BY department),
> SUM(SUM(salary)) OVER (ORDER BY department DESC)
> FROM employees
> GROUP BY department;
>
> I just added the DESC to force it into creating 2 separate windows.
>
> I can re-write the non working query to work using the following:
>
>
> SELECT department,
> salary,
> ROW_NUMBER() OVER (ORDER BY department),
> SUM(salary) OVER (ORDER BY department DESC)
> FROM (SELECT department,
> SUM(salary) AS salary
> FROM employees
> GROUP BY department
> ) t;
>
>

Thank you for your tests again. Attached are both of delta from the
previous and the whole version agains today's HEAD. Note that HEAD has
changed pg_class.h, which conflicted with my local repository so if
you try it update your local with HEAD and apply the patch. Though,
the delta one seem to be applicable without sync.

Regards,

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081110-1.gz application/x-gzip 46.7 KB
wfdelta.20081110.patch text/x-diff 4.1 KB

From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-10 13:48:59
Message-ID: e08cc0400811100548w4da4abc5g1b1f7053e95e4582@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

patch-2

2008/11/10 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> 2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
>> Using one of my original test tables I'm testing windowing functions with a
>> GROUP BY.
>>
>> The following query works as I would expect.
>>
>> -- Works
>> SELECT department,
>> SUM(Salary),
>> ROW_NUMBER() OVER (ORDER BY department),
>> SUM(SUM(salary)) OVER (ORDER BY department)
>> FROM employees
>> GROUP BY department;
>>
>>
>> The following one fails with the message.
>> ERROR: variable not found in subplan target list
>>
>> -- Does not work.
>> SELECT department,
>> SUM(Salary),
>> ROW_NUMBER() OVER (ORDER BY department),
>> SUM(SUM(salary)) OVER (ORDER BY department DESC)
>> FROM employees
>> GROUP BY department;
>>
>> I just added the DESC to force it into creating 2 separate windows.
>>
>> I can re-write the non working query to work using the following:
>>
>>
>> SELECT department,
>> salary,
>> ROW_NUMBER() OVER (ORDER BY department),
>> SUM(salary) OVER (ORDER BY department DESC)
>> FROM (SELECT department,
>> SUM(salary) AS salary
>> FROM employees
>> GROUP BY department
>> ) t;
>>
>>
>
> Thank you for your tests again. Attached are both of delta from the
> previous and the whole version agains today's HEAD. Note that HEAD has
> changed pg_class.h, which conflicted with my local repository so if
> you try it update your local with HEAD and apply the patch. Though,
> the delta one seem to be applicable without sync.
>
>
> Regards,
>
> --
> Hitoshi Harada
>

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081110-2.gz application/x-gzip 93.1 KB

From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> NTILE function
Date: 2008-11-10 14:19:02
Message-ID: e08cc0400811100619w46a1b013x2051f6a8f2dad116@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
>
> I've done a little testing with NTILE(). I think a check should be added to
> the ntile() function in wfunc.c.
>
> david=# select name,salary,ntile(0) over (order by salary) as n from
> employees;
> ERROR: floating-point exception
> DETAIL: An invalid floating-point operation was signaled. This probably
> means an out-of-range result or an invalid operation, such as division by
> zero.
>
> I tracked that message back to the signal handler in postgres.c :-( simple
> fix though. Any value less than 1 does not really make sense to me.
>
> Maybe we should add something like:
>
> if (PG_WINDOW_ARG(0) < 1)
> elog(ERROR, "negative or zero ntile argument not allowed");
>
> What do you think?
>
> Oracle errors out on less than 1, Sybase seems not to have ntile.
> MSSQL 2008 also errors out on less than 1
>
> David.
>
>

I am so sorry but I missed this thread.

I found in the spec:
1) If NT is the null value, then the result is the null value.
2) If NT is less than or equal to 0 (zero), then an exception
condition is raised: data exception
― invalid argument for NTILE function.

My patch violates both of two :-( As you point, we must add the value
check and also allow null case to return null.

will be fixed soon.

Regards,

--
Hitoshi Harada


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> NTH_VALUE
Date: 2008-11-10 14:24:41
Message-ID: e08cc0400811100624o77744b15me1bc009b74d292c1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
> I'm having a little trouble understanding the standard for NTH_VALUE(). I
> would have assumed that NTH_VALUE(name,1) would return the first name in the
> window. The current patch is using 0 for the first.
>

Hmmm, good point... I haven't thought about it enough, just followed
the internal design. The window_seek() in nodeWindow.c, which is an
internal API, counts rows from 0 so the external behavior is simlar.
Nothing more :-P

Giving your comment, actually it seems to me strange that row_number()
returns 1 for the first row whereas the ntile(ve, 1) returns the
second. If there're no objections I will change it as you told.

Regards,

--
Hitoshi Harada


From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Vladimir Sitnikov'" <sitnikov(dot)vladimir(at)gmail(dot)com>
Subject: Re: Windowing Function Patch Review -> ROW_NUMBER without ORDER BY
Date: 2008-11-10 20:22:52
Message-ID: 9031019CBED74CB4B3FFDDA0C7448A4D@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada wrote:
> > The spec says: "The ROW_NUMBER function computes the sequential row
> number,
> > starting with 1 (one) for the first row, of the row within its window
> > partition according to the window ordering of the window."
> >
> > I'm just not sure if we should block this or not.
> >
> > Does anyone see this as a feature?
>
> I don't see any reason to take it as a bug. It may be confusing some
> people but it is consistent enough and not ambiguous. Many users
> already know if they don't specify ORDER BY clause in a simple regular
> query they wouldn't receive ordered rows so it will match their
> senses.

"Bug" was probably the wrong word for me to use. At the time I was thinking
it could easily be misused. The last sentence in the above quote seemed to
change my mind about this. Perhaps it is slightly unusual but it may come in
useful for someone.

David.


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> NTH_VALUE
Date: 2008-11-11 16:09:02
Message-ID: e08cc0400811110809o3fb66e4cu50350c70a3732b20@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/10 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> 2008/11/9 David Rowley <dgrowley(at)gmail(dot)com>:
>> I'm having a little trouble understanding the standard for NTH_VALUE(). I
>> would have assumed that NTH_VALUE(name,1) would return the first name in the
>> window. The current patch is using 0 for the first.
>>
>
> Hmmm, good point... I haven't thought about it enough, just followed
> the internal design. The window_seek() in nodeWindow.c, which is an
> internal API, counts rows from 0 so the external behavior is simlar.
> Nothing more :-P
>

Reading the spec more closely, it says:

If <window function type> is <nth value function>, then:
i) Let RN be the value of <nth row>.
ii) Case:
1) If RN is the null value, then the result is the null value.
2) If RN is less than or equal to 0 (zero), then an exception
condition is raised: data exception ― invalid argument for NTH_VALUE
function.

So obviously nth_value(expr, 0) causes error and nth_value(expr, 1)
returns the first row. I will update my patch soon.

Regards,

--
Hitoshi Harada


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> NTH_VALUE
Date: 2008-11-12 09:46:31
Message-ID: e08cc0400811120146r3c47b320g8aac5e22c05c8829@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/11/12 1:09 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> So obviously nth_value(expr, 0) causes error and nth_value(expr, 1)
> returns the first row. I will update my patch soon.

Fixed these issues:
- ntile() value check
- nth_value() value check and behavior change

Also, I found a bug that is fixed in this release:

SELECT sum(salary) OVER w1, count(*) OVER w2 FROM empsalary WINDOW w1
AS (ORDER BY salary), w2 AS (ORDER BY salary);

The identical windows named differently were not processed
appropriately. It works now.

Regards,

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081112-1.gz application/x-gzip 47.2 KB

From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Windowing Function Patch Review -> NTH_VALUE
Date: 2008-11-12 09:47:12
Message-ID: e08cc0400811120147x7351e119p13c2b08c00bcb906@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

patch-2.
They are against HEAD today.

2008/11/12 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> 2008/11/12 1:09 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
>> So obviously nth_value(expr, 0) causes error and nth_value(expr, 1)
>> returns the first row. I will update my patch soon.
>
> Fixed these issues:
> - ntile() value check
> - nth_value() value check and behavior change
>
> Also, I found a bug that is fixed in this release:
>
> SELECT sum(salary) OVER w1, count(*) OVER w2 FROM empsalary WINDOW w1
> AS (ORDER BY salary), w2 AS (ORDER BY salary);
>
> The identical windows named differently were not processed
> appropriately. It works now.
>
>
> Regards,
>
>
>
> --
> Hitoshi Harada
>

--
Hitoshi Harada

Attachment Content-Type Size
window_functions.patch.20081112-2.gz application/x-gzip 93.2 KB