We are not following the spec for HAVING without GROUP BY

Lists: pgsql-bugspgsql-hackers
From: "Peter Wright" <pete(at)flooble(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #1528: Rows returned that should be excluded by WHERE clause
Date: 2005-03-06 22:47:13
Message-ID: 20050306224713.7FBCAF0C69@svr2.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


The following bug has been logged online:

Bug reference: 1528
Logged by: Peter Wright
Email address: pete(at)flooble(dot)net
PostgreSQL version: 7.4.7, 8.0.1
Operating system: Debian Linux (unstable)
Description: Rows returned that should be excluded by WHERE clause
Details:

Hopefully this example SQL will paste correctly -
I think this demonstrates the problem much better than I could explain in
words. The bug is shown in the two
SELECT queries with a WHERE clause. Very bizarre.

The same bug crops up on 7.4.6, 7.4.7 and 8.0.1.

pete(at)serf [07/Mar 6:28:50] pts/10 !19 ~ $ createdb test1

CREATE DATABASE

pete(at)serf [07/Mar 6:28:59] pts/10 !20 ~ $ psql test1

Welcome to psql 7.4.7, the PostgreSQL interactive terminal.



Type: \copyright for distribution terms

\h for help with SQL commands

\? for help on internal slash commands

\g or terminate with semicolon to execute query

\q to quit



test1=# create table t1 ( a smallint primary key, b smallint ) ;

NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for
table "t1"
CREATE TABLE

test1=# create table t2 ( a smallint primary key, b smallint ) ;

NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t2_pkey" for
table "t2"
CREATE TABLE

test1=# insert into t1 values (1, 1);

INSERT 118413888 1

test1=# insert into t1 values (2, 2);

INSERT 118413889 1

test1=# insert into t2 values (1, 4);

INSERT 118413890 1

test1=# insert into t2 values (2, 8);

INSERT 118413891 1

test1=# select id, min(b) from ( select 1 as id, max(b) as b from t1 union
select 2 as id, max(b) from t2 ) as q1 group by id ;
id | min

----+-----

1 | 2

2 | 8

(2 rows)



test1=# create view qry1 as select id, min(b) from ( select 1 as id, max(b)
as b from t1 union select 2 as id, max(b) from t2 ) as q1 group by id ;


CREATE VIEW

test1=# select * from qry1 where id = 1;

id | min

----+-----

1 | 2

2 |

(2 rows)



test1=# select * from qry1 where id = 2;

id | min

----+-----

1 |

2 | 8

(2 rows)



test1=# select * from qry1;

id | min

----+-----

1 | 2

2 | 8

(2 rows)



test1=#


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Wright" <pete(at)flooble(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1528: Rows returned that should be excluded by WHERE clause
Date: 2005-03-08 08:07:13
Message-ID: 4017.1110269233@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

"Peter Wright" <pete(at)flooble(dot)net> writes:
> Description: Rows returned that should be excluded by WHERE clause

Interesting point. The view and union don't seem to be the issue;
I think the problem can be expressed as

regression=# select 2 as id, max(b) from t2 having 2 = 1;
id | max
----+-----
2 |
(1 row)

Now, if this were a WHERE clause, I think the answer would be right:

regression=# select 2 as id, max(b) from t2 where 2 = 1;
id | max
----+-----
2 |
(1 row)

but since it's HAVING I think this is probably wrong. Looking at the
EXPLAIN output

regression=# explain select 2 as id, max(b) from t2 having 2 = 1;
QUERY PLAN
----------------------------------------------------------------
Aggregate (cost=3.68..3.68 rows=1 width=2)
-> Result (cost=0.00..3.14 rows=214 width=2)
One-Time Filter: false
-> Seq Scan on t2 (cost=0.00..3.14 rows=214 width=2)
(4 rows)

the issue is clearly that the known-false HAVING clause is pushed down
inside the aggregation, as though it were WHERE. The existing code
pushes down HAVING to WHERE if the clause contains no aggregates, but
evidently this is too simplistic. What are the correct conditions for
pushing down HAVING clauses to WHERE?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org, pgsql-bugs(at)postgreSQL(dot)org
Subject: Re: BUG #1528: Rows returned that should be excluded by WHERE clause
Date: 2005-03-09 23:40:47
Message-ID: 8109.1110411647@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I wrote:
> I think the problem can be expressed as

> regression=# select 2 as id, max(b) from t2 having 2 = 1;
> id | max
> ----+-----
> 2 |
> (1 row)

> the issue is clearly that the known-false HAVING clause is pushed down
> inside the aggregation, as though it were WHERE. The existing code
> pushes down HAVING to WHERE if the clause contains no aggregates, but
> evidently this is too simplistic. What are the correct conditions for
> pushing down HAVING clauses to WHERE?

After reading the spec a little, I think that we have oversimplified our
handling of aggregate-free HAVING clauses. If you look in planner.c
you'll find that such a clause is converted into a WHERE clause, but
this is not what the spec says to do, and you can tell the difference
in cases like the above.

What the spec actually says, or at least implies, is that a HAVING
clause is to be evaluated only once per group --- where the "group"
is the whole table if there's no GROUP BY clause. The group is
to be discarded if the HAVING clause doesn't return true. SQL92 7.8:

1) Let T be the result of the preceding <from clause>, <where
clause>, or <group by clause>. If that clause is not a <group
by clause>, then T consists of a single group and does not have
a grouping column.

2) The <search condition> is applied to each group of T. The result
of the <having clause> is a grouped table of those groups of T
for which the result of the <search condition> is true.

So it's clear that what the above case should return is a grouped table
having no groups ... ie, no rows out. What we are actually returning is
one group containing no rows, which is visibly different because of the
presence of the aggregate function in the SELECT list.

There are really four cases to think about, depending on whether the
query has GROUP BY and on whether it has any aggregates outside the
HAVING clause:

1. No GROUP BY, no aggregates

Per spec, the HAVING clause should be evaluated once and either we
return the whole input or none of it. Since there are no grouped
columns and (by assumption) no aggregates in the HAVING clause, the
HAVING clause must in fact be variable-free, ie, it's a pseudoconstant
clause. (Only pseudoconstant, because it might contain outer-level
variables or volatile functions.) I think the correct implementation
in this case is to generate a gating Result node with the HAVING clause
as a one-time filter, so that we don't evaluate any of the query if the
HAVING is false. The current code gets this almost right: it will make
a variable-free WHERE clause into a Result gating condition *if it
contains no volatile functions*. So it's wrong for the volatile
function case but right otherwise.

2. GROUP BY, no aggregates

In this case the HAVING clause might contain references to the grouping
columns. It is legitimate to push down the HAVING to become WHERE,
but *only* if it doesn't contain any volatile functions --- otherwise it
might be possible to tell that the HAVING clause was executed more than
once. It would be useful to push down the HAVING if, for example, it
could become an indexscan qualifier. However if the HAVING condition
is expensive to compute (eg it contains a subselect) we'd probably be
better off not to push it into WHERE, but to arrange to evaluate it
only once per group. Right now the executor cannot support testing
such a condition, but I think it would be easy enough to improve nodeGroup.c
to allow testing a qual condition for each group.

3. No GROUP BY, has aggregates

As in case 1, the HAVING clause must be variable-free, so the best
implementation would be to put it into a gating Result node. It would
be correct to treat it the same way as we do for a HAVING clause
containing aggregates (ie, attach it as a qual condition to the Agg plan
node) --- but that would mean computing and throwing away the aggregate
result when the HAVING fails, when we could skip computing it altogether.

4. GROUP BY and has aggregates

This is really the same as case 2: we could push down the HAVING
condition if it contains no volatile functions, but unless it is
cheap to evaluate we are probably best off to attach it as a qual
condition to the Agg node, ie, evaluate it only once per group.
The only difference is that we don't need an executor fix to support
this, since Agg does quals already.

So, aside from the originally reported bug, there are two other problems
in this logic: it isn't ensuring that volatile functions will be
evaluated only once per group, and it isn't considering evaluation
cost in deciding whether a clause that could be converted to WHERE
should be or not.

I haven't yet tried to make a patch that fixes all of these things.
It'll likely come out complex enough that we don't want to back-patch
it into 8.0 or before. If so, I'll try to make a simpler variant that
fixes the semantic bugs but doesn't try to be smart about evaluation
cost.

Comments?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org, pgsql-bugs(at)postgreSQL(dot)org
Subject: We are not following the spec for HAVING without GROUP BY
Date: 2005-03-10 02:21:36
Message-ID: 10126.1110421296@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I wrote in reference to bug#1528:
> What the spec actually says, or at least implies, is that a HAVING
> clause is to be evaluated only once per group --- where the "group"
> is the whole table if there's no GROUP BY clause.

In fact, reading the spec more closely, it is clear that the presence
of HAVING turns the query into a grouped query even if there is no
GROUP BY. I quote SQL92 7.8 again:

7.8 <having clause>

Function

Specify a grouped table derived by the elimination of groups from
^^^^^^^^^^^^^^^^^^^^^^^
the result of the previously specified clause that do not meet the
<search condition>.

...

1) Let T be the result of the preceding <from clause>, <where
clause>, or <group by clause>. If that clause is not a <group
by clause>, then T consists of a single group and does not have
a grouping column.

2) The <search condition> is applied to each group of T. The result
of the <having clause> is a grouped table of those groups of T
^^^^^^^^^^^^^^^^^^
for which the result of the <search condition> is true.

This is quite clear that the output of a HAVING clause is a "grouped
table" no matter whether the query uses GROUP BY or aggregates or not.

What that means is that neither the HAVING clause nor the targetlist
can use any ungrouped columns except within aggregate calls; that is,

select col from tab having 2>1

is in fact illegal per SQL spec, because col isn't a grouping column
(there are no grouping columns in this query).

What we are currently doing with this construct is pretending that it
means

select col from tab where 2>1

but it does not mean that according to the spec.

As I look into this, I find that several warty special cases in the
parser and planner arise from our misunderstanding of this point,
and could be eliminated if we enforced the spec's interpretation.
In particular this whole business of "moving HAVING into WHERE" is
wrong and should go away.

Comments? Can anyone confirm whether DB2 or other databases allow
ungrouped column references with HAVING?

regards, tom lane


From: Kevin Brown <kevin(at)sysexperts(dot)com>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP BY
Date: 2005-03-10 03:30:08
Message-ID: 20050310033008.GC30977@filer
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> What that means is that neither the HAVING clause nor the targetlist
> can use any ungrouped columns except within aggregate calls; that is,
>
> select col from tab having 2>1
>
> is in fact illegal per SQL spec, because col isn't a grouping column
> (there are no grouping columns in this query).

[...]

> Comments? Can anyone confirm whether DB2 or other databases allow
> ungrouped column references with HAVING?

Oracle does not allow such references. It issues "ORA-00979: not a
GROUP BY expression" when you try to hand it such a reference.

MS SQL Server does not allow such references either, yielding
"columnname is invalid in the HAVING clause because it is not
contained in either an aggregate function or the GROUP BY clause.".

Can't comment about DB2.

--
Kevin Brown kevin(at)sysexperts(dot)com


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP BY
Date: 2005-03-10 03:46:26
Message-ID: 87acpc5fjh.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> In particular this whole business of "moving HAVING into WHERE" is
> wrong and should go away.

It sort of seems like "select aggregate(col) from tab" with no GROUP BY clause
is a bit of a special case. The consistent thing to do would be to return no
records. It's only due to the special case that SQL returns a single record
for this case.

It seems like this special case is the only way to expose this difference
between a WHERE clause and a HAVING clause with an aggregate-free expression.

It seems like all that's needed is a simple flag on the Aggregate node that
says whether to output a single record if there are no input records or to
output no records.

--
greg


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Kevin Brown <kevin(at)sysexperts(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP
Date: 2005-03-10 04:00:26
Message-ID: 422FC65A.5060806@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

>>Comments? Can anyone confirm whether DB2 or other databases allow
>>ungrouped column references with HAVING?
>
> Oracle does not allow such references. It issues "ORA-00979: not a
> GROUP BY expression" when you try to hand it such a reference.
>
> MS SQL Server does not allow such references either, yielding
> "columnname is invalid in the HAVING clause because it is not
> contained in either an aggregate function or the GROUP BY clause.".
>
> Can't comment about DB2.

MySQL allows it:

mysql> create table tab (col integer);
Query OK, 0 rows affected (0.01 sec)

mysql> select col from tab having 2 > 1;
Empty set (0.00 sec)

mysql> insert into tab values (1);
Query OK, 1 row affected (0.00 sec)

mysql> select col from tab having 2 > 1;
+------+
| col |
+------+
| 1 |
+------+
1 row in set (0.00 sec)

Of course, that's not saying much!

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Kevin Brown <kevin(at)sysexperts(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP
Date: 2005-03-10 04:44:55
Message-ID: 11032.1110429895@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
>>> Comments? Can anyone confirm whether DB2 or other databases allow
>>> ungrouped column references with HAVING?

> MySQL allows it:

A slightly tighter experiment shows that they treat HAVING like WHERE
in this case:

mysql> create table tab(col int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tab values(1);
Query OK, 1 row affected (0.00 sec)

mysql> insert into tab values(2);
Query OK, 1 row affected (0.01 sec)

mysql> select col from tab having col>1;
+------+
| col |
+------+
| 2 |
+------+
1 row in set (0.00 sec)

I think it's fairly likely that they copied our misinterpretation ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP BY
Date: 2005-03-10 05:06:24
Message-ID: 11171.1110431184@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> It sort of seems like "select aggregate(col) from tab" with no GROUP BY clause
> is a bit of a special case. The consistent thing to do would be to return no
> records.

I don't think so. SQL99 defines this stuff in a way that might make you
feel better: it says that the presence of either HAVING or any aggregate
functions in the target list implies "GROUP BY ()", which is the case
that they identify as <grand total> in the <group by clause> syntax.
Basically this legitimizes the concept of turning the whole input table
into one group, which is what's really going on here. We get this right
in the case where it's driven by the appearance of aggregate functions,
but not when it's just driven by HAVING.

> It seems like all that's needed is a simple flag on the Aggregate node that
> says whether to output a single record if there are no input records or to
> output no records.

The implementation problem is that there *is* no aggregate node if there
are no aggregates. The definitional problem is that we are allowing
cases that are illegal per spec and are going to be difficult to
continue to support if we handle all the spec-required cases properly.

regards, tom lane


From: Mark Shewmaker <mark(at)primefactor(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP
Date: 2005-03-10 05:20:06
Message-ID: 1110432006.12946.2.camel@k9
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, 2005-03-09 at 21:21 -0500, Tom Lane wrote:
> Comments? Can anyone confirm whether DB2 or other databases allow
> ungrouped column references with HAVING?

In Sybase:

1> select 2 as id, max(myfield) from mytable where 2=1
2> go
id
----------- ----------
2 NULL

(1 row affected)
1> select 2 as id, max(myfield) from mytable having 2=1
2> go
id
----------- ----------

(0 rows affected)

--
Mark Shewmaker
mark(at)primefactor(dot)com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [HACKERS] We are not following the spec for HAVING without GROUP BY
Date: 2005-03-10 17:44:50
Message-ID: 16153.1110476690@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I wrote:
> This is quite clear that the output of a HAVING clause is a "grouped
> table" no matter whether the query uses GROUP BY or aggregates or not.

> What that means is that neither the HAVING clause nor the targetlist
> can use any ungrouped columns except within aggregate calls; that is,
> select col from tab having 2>1
> is in fact illegal per SQL spec, because col isn't a grouping column
> (there are no grouping columns in this query).

Actually, it's even more than that: a query with HAVING and no GROUP BY
should always return 1 row (if the HAVING succeeds) or 0 rows (if not).
If there are no aggregates, the entire from/where clause can be thrown
away, because it can have no impact on the result!

Would those of you with access to other DBMSes try this:

create table tab (col integer);
select 1 from tab having 1=0;
select 1 from tab having 1=1;
insert into tab values(1);
insert into tab values(2);
select 1 from tab having 1=0;
select 1 from tab having 1=1;

I claim that a SQL-conformant database will return 0, 1, 0, and 1 rows
from the 4 selects --- that is, the contents of tab make no difference
at all. (MySQL returns 0, 0, 0, and 2 rows, so they are definitely
copying our mistake...)

regards, tom lane


From: Kevin HaleBoyes <khaleboyes(at)chartwelltechnology(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [HACKERS] We are not following the spec for HAVING without GROUP
Date: 2005-03-10 18:21:14
Message-ID: 4230901A.3080101@chartwelltechnology.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Would those of you with access to other DBMSes try this:
>
> create table tab (col integer);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
> insert into tab values(1);
> insert into tab values(2);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
>
> I claim that a SQL-conformant database will return 0, 1, 0, and 1 rows
> from the 4 selects --- that is, the contents of tab make no difference
> at all. (MySQL returns 0, 0, 0, and 2 rows, so they are definitely
> copying our mistake...)
>
> regards, tom lane

From SQL server 2000 with a service pack, I get:

zero rows from the first query (having 1=0);
one row, col value 1, from second query (having 1=1);
...run inserts...
zero rows from the third query (having 1=0);
one row, col value 1, from forth query (having 1=1);

K.


From: Gary Doades <gpd(at)gpdnet(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [HACKERS] We are not following the spec for HAVING without
Date: 2005-03-10 19:12:48
Message-ID: 42309C30.7060107@gpdnet.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
>
> Would those of you with access to other DBMSes try this:
>
> create table tab (col integer);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
> insert into tab values(1);
> insert into tab values(2);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
>
> I claim that a SQL-conformant database will return 0, 1, 0, and 1 rows

MS SQL Server 2000 returns 0, 1, 0 and 1 rows correctly.

Cheers,
Gary.


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [HACKERS] We are not following the spec for HAVING without
Date: 2005-03-10 19:24:41
Message-ID: 20050310192441.GA86977@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
>
> Would those of you with access to other DBMSes try this:
>
> create table tab (col integer);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
> insert into tab values(1);
> insert into tab values(2);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
>
> I claim that a SQL-conformant database will return 0, 1, 0, and 1 rows

Not that this means much, but I'll mention it for the sake of
completeness: SQLite 3.0.8 disallows all of the above SELECT
statements:

sqlite> create table tab (col integer);
sqlite> select 1 from tab having 1=0;
SQL error: a GROUP BY clause is required before HAVING
sqlite> select 1 from tab having 1=1;
SQL error: a GROUP BY clause is required before HAVING
sqlite> insert into tab values(1);
sqlite> insert into tab values(2);
sqlite> select 1 from tab having 1=0;
SQL error: a GROUP BY clause is required before HAVING
sqlite> select 1 from tab having 1=1;
SQL error: a GROUP BY clause is required before HAVING

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


From: johnnnnnn <john(at)phaedrusdeinus(dot)org>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [HACKERS] We are not following the spec for HAVING without GROUP BY
Date: 2005-03-10 20:01:28
Message-ID: 20050310200128.GA32629@performics.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Mar 10, 2005 at 12:44:50PM -0500, Tom Lane wrote:
> Would those of you with access to other DBMSes try this:

DB2/LINUX 8.1.6

> create table tab (col integer);
> select 1 from tab having 1=0;

1
-----------

0 record(s) selected.

> select 1 from tab having 1=1;

1
-----------
1

1 record(s) selected.

> insert into tab values(1);
> insert into tab values(2);
> select 1 from tab having 1=0;

1
-----------

0 record(s) selected.

> select 1 from tab having 1=1;

1
-----------
1

1 record(s) selected.

-johnnnnnnnn


From: Jaime Casanova <systemguards(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP BY
Date: 2005-03-10 20:14:23
Message-ID: c2d9e70e05031012143fddf4c1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, 10 Mar 2005 12:44:50 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Would those of you with access to other DBMSes try this:
>
On informix 9.21.UC4

> create table tab (col integer);
> select 1 from tab having 1=0;
>
returns no rows

> select 1 from tab having 1=1;
>
returns no rows

> insert into tab values(1);
> insert into tab values(2);
> select 1 from tab having 1=0;
>
returns no rows

> select 1 from tab having 1=1;
>
returns 2 rows

regards,
Jaime Casanova


From: Mark Kirkwood <markir(at)coretech(dot)co(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [HACKERS] We are not following the spec for HAVING without GROUP
Date: 2005-03-10 21:37:13
Message-ID: 4230BE09.9070305@coretech.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:

> Would those of you with access to other DBMSes try this:
>
> create table tab (col integer);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
> insert into tab values(1);
> insert into tab values(2);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
>
> I claim that a SQL-conformant database will return 0, 1, 0, and 1 rows
> from the 4 selects --- that is, the contents of tab make no difference
> at all. (MySQL returns 0, 0, 0, and 2 rows, so they are definitely
> copying our mistake...)

Firebird 1.5.1 FreeBSD 5.3
Database: test
SQL> drop table tab;
SQL> create table tab (col integer);
SQL> select 1 from tab having 1=0;
SQL> select 1 from tab having 1=1;

============

1

SQL> insert into tab values(1);
SQL> insert into tab values(2);
SQL> select 1 from tab having 1=0;
SQL> select 1 from tab having 1=1;

============

1

SQL>


From: Mark Shewmaker <mark(at)primefactor(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [HACKERS] We are not following the spec for HAVING without GROUP BY
Date: 2005-03-10 22:42:07
Message-ID: 20050310224207.GB20501@primefactor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Mar 10, 2005 at 12:44:50PM -0500, Tom Lane wrote:
>
> Would those of you with access to other DBMSes try this:
>
> create table tab (col integer);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
> insert into tab values(1);
> insert into tab values(2);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
>
> I claim that a SQL-conformant database will return 0, 1, 0, and 1 rows
> from the 4 selects --- that is, the contents of tab make no difference
> at all.

Sybase ASE version 12.5.2 returns 0, 0, 0, and 1 rows.

A plain "select 1 from tab" returns zero rows when tab is empty.

--
Mark Shewmaker
mark(at)primefactor(dot)com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Wright" <pete(at)flooble(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1528: Rows returned that should be excluded by WHERE clause
Date: 2005-03-10 23:49:47
Message-ID: 26929.1110498587@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

"Peter Wright" <pete(at)flooble(dot)net> writes:
> I think this demonstrates the problem much better than I could explain in
> words. The bug is shown in the two
> SELECT queries with a WHERE clause. Very bizarre.

I've applied a patch that corrects this problem in CVS HEAD, but since
it changes the behavior of HAVING in a nontrivial way, I'm inclined to
think that we should not backpatch it into existing release branches.

regards, tom lane


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP
Date: 2005-03-11 00:32:53
Message-ID: 4230E735.8000805@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

>>select 1 from tab having 1=1;
>>
>
> returns 2 rows

I'm curious whats in those two rows... {{1} {1}} ?


From: Peter Wright <pete(at)flooble(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1528: Rows returned that should be excluded by WHERE clause
Date: 2005-03-11 05:45:18
Message-ID: 20050311054518.GA5375@cartman.flooble.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


Hi Tom, others,

First I must say that I appreciate the effort you've invested already
into finding the best "correct" solution. It's very encouraging. :)

I think I understand your analysis of the problem being that HAVING is
erroneously optimised/simplified to WHERE in some cases - and so the
initial "bug" I reported is technically the correct behaviour(?).

....Okay, maybe I'm not completely sure I've understood you correctly. :)

On 08/03 03:07:13, Tom Lane wrote:
> "Peter Wright" <pete(at)flooble(dot)net> writes:
> > Description: Rows returned that should be excluded by WHERE clause
>
> Interesting point. The view and union don't seem to be the issue;

I think the view _is_ the issue (well, at least for me and my
(limited) understanding of how things should work :)). See below.

> I think the problem can be expressed as
[ snip ]
> Now, if this were a WHERE clause, I think the answer would be right:
>
> regression=# select 2 as id, max(b) from t2 where 2 = 1;
> id | max
> ----+-----
> 2 |
> (1 row)
>
> but since it's HAVING I think this is probably wrong.
[ ... ]

On 08/03 12:14:35, Tom Lane wrote:
> "Gill, Jerry T." <JTGill(at)west(dot)com> writes:
> > Just an interesting side note here, this behavior is identical to
> > DB2. I am not sure if that makes it correct or not, but here is an
> > example.
> > [gill(at)c2n2 gill]$ db2 "select 2 as id, max(apn3) from phoenix.client where 2 =1"
>
> > ID 2
> > ----------- ------
> > 2 -
>
> > 1 record(s) selected.
>
> In the WHERE case I think there's no question that the above is
> correct: WHERE is defined to filter rows before application of
> aggregates, so zero rows arrive at the MAX aggregate, and that means
> it produces a NULL.

Now this _does_ make sense - but in the case of a view (or
sub-select), how can it be correct that a WHERE _outside_ the view can
affect the behaviour of that view?

At the very least I'd call that grossly anti-intuitive:

----------------------------------------------------------------------
test1=# create table tab1 ( a integer, b integer );
CREATE TABLE
test1=# insert into tab1 values ( 1, 1 );
INSERT 118421921 1
test1=# insert into tab1 values ( 2, 2 );
INSERT 118421922 1
test1=# create view qry1 as select 2 as id, max(b) as b from tab1;
CREATE VIEW
test1=# create table tab2 ( id integer, b integer );
CREATE TABLE
test1=# insert into tab2 values ( 2, 2);
INSERT 118421931 1
test1=# select * from tab2;
id | b
----+---
2 | 2
(1 row)

test1=# select * from qry1;
id | b
----+---
2 | 2
(1 row)

test1=# select * from qry1 where id = 1;
id | b
----+---
2 |
(1 row)

test1=# select * from tab2 where id = 1;
id | b
----+---
(0 rows)

test1=#
----------------------------------------------------------------------

You say, "WHERE is defined to filter rows before application of
aggregates", but I'd _think_ that should be interpreted to apply only
to aggregates in the _current_ query (ie. not in sub-queries).

In my example just above, I'd _expect_ the view should be fully
evaluated and the results (of that view) treated as though it were
just another table.

Perhaps I'm just showing my limited experience with database theory
here :-), but if you can explain why it makes sense that WHERE must be
applied before aggregation in _all_ subqueries, that'd be good *wry grin*.

> But HAVING is supposed to filter after aggregation, so I think
> probably there should be no row out in that case.

I have no problem with this.

> regards, tom lane

Thanks again for your efforts with this issue, Tom.

Currently I'm working around it by adding an extra kludge-clause,
effectively "AND b is not null", but it'd be preferable to have
Postgres do the Right Thing(tm), whatever that might be.... :)

Pete.
--
http://akira.apana.org.au/~pete/
And anyway, we know that 2 + 2 = 5, for very large values of two...


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Wright <pete(at)flooble(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1528: Rows returned that should be excluded by WHERE clause
Date: 2005-03-11 06:48:47
Message-ID: 29260.1110523727@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Peter Wright <pete(at)flooble(dot)net> writes:
> [various stuff snipped]
> You say, "WHERE is defined to filter rows before application of
> aggregates", but I'd _think_ that should be interpreted to apply only
> to aggregates in the _current_ query (ie. not in sub-queries).

Well, the subtext of this discussion is that Postgres, like every other
DBMS on the planet, will aggressively push query restrictions down as
far as it's allowed to by the semantic rules. Consider
CREATE VIEW v1 AS SELECT c1, sum(c2) FROM tab GROUP BY c1;
SELECT * FROM v1 WHERE c1 = 42;
A naive implementation would compute every row of the view v1
(ie, every sum of c2 over each existing value of c1) and then
throw away each result except the one for c1 = 42. This is
obviously not acceptable. So we have to transform the query to
SELECT c1, sum(c2) FROM tab WHERE c1 = 42 GROUP BY c1;
which gives the execution engine a fair shot at doing something
reasonable, ie, pulling only the rows of tab that have c1 = 42,
which we could expect would be done with the aid of an index on c1.

(The GROUP BY step is actually redundant in this formulation,
but the cost of doing it is probably negligible; certainly it's
not the major problem compared to computing all the useless
sums over c1 groups other than 42.)

Point here is that to get reasonably efficient behavior we have to be
able to push the WHERE c1 = 42 condition down inside the view's
GROUP BY clause; and therefore we have to understand the exact
semantic conditions under which that is an allowable transformation.
Your bug report is essentially pointing out an error in our rules
for thinking that this transformation is allowable.

regards, tom lane


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [HACKERS] We are not following the spec for HAVING without GROUP
Date: 2005-03-11 08:20:00
Message-ID: Pine.OSF.4.61.0503111018210.164442@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, 10 Mar 2005, Tom Lane wrote:

> Would those of you with access to other DBMSes try this:
>
> create table tab (col integer);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
> insert into tab values(1);
> insert into tab values(2);
> select 1 from tab having 1=0;
> select 1 from tab having 1=1;
>
> I claim that a SQL-conformant database will return 0, 1, 0, and 1 rows
> from the 4 selects --- that is, the contents of tab make no difference
> at all. (MySQL returns 0, 0, 0, and 2 rows, so they are definitely
> copying our mistake...)

DB2 (version 8.1) gives 0, 1, 0, 1.

- Heikki


From: Terry Yapt <yapt(at)NOtechSAPMnovell(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: We are not following the spec for HAVING without GROUP
Date: 2005-03-11 13:37:37
Message-ID: sj73311bugnc7r9vkvp43kjnukmgphs4su@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Results from Oracle 9.2.0.3 (9.2 patch 2)

===========
Connected to Oracle9i Release 9.2.0.3.0
Connected as system

SQL>
SQL> DROP TABLE TESTTAB;

DROP TABLE TESTTAB

ORA-00942: table or view does not exist

SQL> create table TESTtab (col integer);

Table created

SQL> select 1 as col from TESTtab having 1=0;

COL
----------

SQL> select 1 as col from TESTtab having 1=1;

COL
----------

SQL> insert into TESTtab values(1);

1 row inserted

SQL> insert into TESTtab values(2);

1 row inserted

SQL> select 1 as col from TESTtab having 1=0;

COL
----------

SQL> select 1 as col from TESTtab having 1=1;

COL
----------
1
1

SQL> DROP TABLE TESTTAB;

Table dropped

SQL>
===========


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 04:24:18
Message-ID: 87mzt67t3h.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

> On Wed, 2005-03-09 at 21:21 -0500, Tom Lane wrote:
> > Comments? Can anyone confirm whether DB2 or other databases allow
> > ungrouped column references with HAVING?

Mysql treats ungrouped columns as an assertion that those columns will all be
equal for the group and it can pick an arbitrary one. Essentially it does an
implicit "first(x) AS x". The expected use case is for things like:

select count(*),a.*
from a,b
where a.pk = b.a_fk
group by a.pk

I've noticed quite frequently scenarios where this idiom would be very handy.
I usually either end up rewriting the query to have nested subqueries so I can
push the grouping into the subquery. This doesn't always work though and
sometimes I end up listing several, sometimes dozens, of columns like
"first(x) AS x" or else end up

--
greg


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 04:56:55
Message-ID: 20050314045655.GC30090@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sun, Mar 13, 2005 at 23:24:18 -0500,
Greg Stark <gsstark(at)mit(dot)edu> wrote:
>
> I've noticed quite frequently scenarios where this idiom would be very handy.
> I usually either end up rewriting the query to have nested subqueries so I can
> push the grouping into the subquery. This doesn't always work though and
> sometimes I end up listing several, sometimes dozens, of columns like
> "first(x) AS x" or else end up

If someone did a naive implementation of first() and last() aggregates
for 8.1, is that something that would likely be accepted?


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 05:35:32
Message-ID: 8764zu7psr.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


Bruno Wolff III <bruno(at)wolff(dot)to> writes:

> If someone did a naive implementation of first() and last() aggregates
> for 8.1, is that something that would likely be accepted?

You mean like this?

CREATE FUNCTION first_accum(anyelement,anyelement) RETURNS anyelement as 'select coalesce($1,$2)' LANGUAGE SQL;
CREATE AGGREGATE first (BASETYPE=anyelement, SFUNC=first_accum, STYPE = anyelement);

Though I suspect it would be faster as a native C implementation.

--
greg


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 06:12:53
Message-ID: 20050314061253.GA1274@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Mar 14, 2005 at 00:35:32 -0500,
Greg Stark <gsstark(at)mit(dot)edu> wrote:
>
> Bruno Wolff III <bruno(at)wolff(dot)to> writes:
>
> > If someone did a naive implementation of first() and last() aggregates
> > for 8.1, is that something that would likely be accepted?
>
> You mean like this?
>
> CREATE FUNCTION first_accum(anyelement,anyelement) RETURNS anyelement as 'select coalesce($1,$2)' LANGUAGE SQL;
> CREATE AGGREGATE first (BASETYPE=anyelement, SFUNC=first_accum, STYPE = anyelement);
>
> Though I suspect it would be faster as a native C implementation.

Pretty much that idea.

It just seemed odd to me that first and last weren't implemented, since they
seemed to be simple and could be used in cases where max or min couldn't
(because of no ordering) to do the same thing.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 06:49:04
Message-ID: 5753.1110782944@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Wed, 2005-03-09 at 21:21 -0500, Tom Lane wrote:
>> Comments? Can anyone confirm whether DB2 or other databases allow
>> ungrouped column references with HAVING?

> Mysql treats ungrouped columns as an assertion that those columns will all be
> equal for the group and it can pick an arbitrary one. Essentially it does an
> implicit "first(x) AS x". The expected use case is for things like:

> select count(*),a.*
> from a,b
> where a.pk = b.a_fk
> group by a.pk

[ Your comment is completely unrelated to my question, but anyway: ]

Well, that query is actually legitimate per SQL99 (though not per SQL92)
if a.pk actually is a primary key. A correct implementation of SQL99
would deduce that the columns of A are all functionally dependent on
a.pk and not make you list them in GROUP BY. I dunno whether mysql goes
through that pushup or whether they just assume the user knows what he's
doing (though from what I know of their design philosophy I bet the
latter).

I'm not sure if we have a TODO item about working on the SQL99 grouping
rules, but I'd like to see us implement at least the simpler cases,
such as this one.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 06:52:59
Message-ID: 5788.1110783179@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> If someone did a naive implementation of first() and last() aggregates
> for 8.1, is that something that would likely be accepted?

For the purpose that Greg is suggesting, these would have no advantage
over min() or max() --- since the system wouldn't know how to optimize
them --- and they'd be considerably less standard. So my inclination
would be to say it's a waste of effort.

regards, tom lane


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 07:26:34
Message-ID: 20050314072634.GA3860@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Mar 14, 2005 at 01:52:59 -0500,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> > If someone did a naive implementation of first() and last() aggregates
> > for 8.1, is that something that would likely be accepted?
>
> For the purpose that Greg is suggesting, these would have no advantage
> over min() or max() --- since the system wouldn't know how to optimize
> them --- and they'd be considerably less standard. So my inclination
> would be to say it's a waste of effort.

The case I was thinking of were datatypes without a defined ordering
where max and min wouldn't be usable. But if GROUP BY was going to
changed to allow any columns if the primary key was used in the GROUP
BY clause, I can't see any use for those functions.


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 09:13:23
Message-ID: 87oedmvbd8.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruno Wolff III <bruno(at)wolff(dot)to> writes:

> The case I was thinking of were datatypes without a defined ordering
> where max and min wouldn't be usable. But if GROUP BY was going to
> changed to allow any columns if the primary key was used in the GROUP
> BY clause, I can't see any use for those functions.

Well any other case will basically be a another spelling for DISTINCT ON.

Except DISTINCT ON only handles a limited range of cases. Basically DISTINCT
ON is a special case of GROUP BY where the _only_ aggregate function you're
allowed is first().

Handling the same cases using GROUP BY would let you mix other aggregate
functions so where you have:

select distinct on (x) x,y,z
order by x,y,z

You can do the equivalent:

select x, first(y), first(z)
order by x,y,z
group by x

But you can also handle the more general case like:

select x, first(y), first(z), avg(a), sum(s)
order by x,y,z
group by x

I don't really care one way or the other about the "first" function per se.

But it seems odd to have a feature to handle a special case of an existing
much more general feature separately. It seems it would be more worthwhile to
handle the general case of aggregate functions that don't need all the records
to generate an answer, including first(), last(), min(), and max(). That would
better handle the cases DISTINCT ON handles but also solve many other
problems.

--
greg


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Mark Kirkwood <markir(at)coretech(dot)co(dot)nz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [HACKERS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 12:40:38
Message-ID: 1h1b31d9if4586q25g19iobm43e89sn6nq@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Fri, 11 Mar 2005 10:37:13 +1300, Mark Kirkwood
<markir(at)coretech(dot)co(dot)nz> wrote:
>Firebird 1.5.1 FreeBSD 5.3
>[correct results]

Interbase 6.0:
SQL> create table tab (col integer);
SQL> select 1 from tab having 1=0;
SQL> select 1 from tab having 1=1;

============

0 <------- :-)

SQL> insert into tab values(1);
SQL> insert into tab values(2);
SQL> select 1 from tab having 1=0;
SQL> select 1 from tab having 1=1;

============

1

SQL>

Servus
Manfred


From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruno Wolff III <bruno(at)wolff(dot)to>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] We are not following the spec for HAVING without
Date: 2005-03-14 12:59:45
Message-ID: Pine.LNX.4.44.0503141301240.16409-100000@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 14 Mar 2005, Greg Stark wrote:

> select distinct on (x) x,y,z
> order by x,y,z
>
> You can do the equivalent:
>
> select x, first(y), first(z)
> order by x,y,z
> group by x
>
> But you can also handle the more general case like:
>
> select x, first(y), first(z), avg(a), sum(s)
> order by x,y,z
> group by x
>
> I don't really care one way or the other about the "first" function per se.

The standard (sql2003) have what is called windows where one can do these
things and much more.

A window is like a group by, but you keep all rows in the result. This can
be used to for example enumrate the rows within a window partition using
ROW_NUMBER(). It can later can be used in a WHERE to select the top 3 rows
in each window, or something like that.

Here is an example that calculate the avg and sum for each window. It
return all the rows (x values) in the window together with a row number
(within the window) and the 2 aggregate results. In this case the
aggregates will be the same for all rows in the partition but one can also
get it to do a kind of of sliding window aggregate (for example the
avarage of the row before and the row after the current row):

SELECT ROW_NUMBER() OVER bar AS num,
x,
avg(a) OVER bar,
sum (a) OVER bar
FROM foo
WINDOW bar AS PARTITION BY x ORDER BY x, y, z;

and then one can put that whole thing as a subselect and just select the
rows with num = 1.

This doesn't mean that we don't want functions like first() and last(),
they are also be useful. I just wanted to inform that with sql2003 one can
write queries with the same effect as the above (but much more
complicated, of course :-).

ps. All I know about the window functions is from what I've read in the
draft of the sql2003 standard. It's not the perfect way to learn about new
features so I wont bet my life on that the above example works as is. If
someone knows better I'd like to hear about it.

ps2. I'd love to read a book that discusses the sql2003 (or even sql99)
that explain features, give examples, and so on. But i guess the market
don't want books that explain things that no database have implemented yet
(Oracle have window functions but i've never used that).

--
/Dennis Björklund


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruno Wolff III <bruno(at)wolff(dot)to>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 15:49:04
Message-ID: 87d5u2ut1r.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:

> The standard (sql2003) have what is called windows where one can do these
> things and much more.

OLAP functions would be very nice. But they're not the same thing. In fact
that's precisely *why* they would be really nice. They allow you to do things
that are not feasible to do with just plain old GROUP BY.

But I stared at this spec for a while and it seemed like implementing it would
be pretty hard. Like, I don't see any way to implement your query below
without sorting every record in the group repeatedly for every record. That
would be a lot of sorts.

> SELECT ROW_NUMBER() OVER bar AS num,
> x,
> avg(a) OVER bar,
> sum (a) OVER bar
> FROM foo
> WINDOW bar AS PARTITION BY x ORDER BY x, y, z;

Note that as you said, this returns just as many records as are in the
original table. The OLAP functions here are just regular functions, not
aggregate functions -- albeit functions that use data from other records other
than the one being output.

DISTINCT ON is just another spelling for GROUP BY, it always outputs only one
record per group.

> ps2. I'd love to read a book that discusses the sql2003 (or even sql99)
> that explain features, give examples, and so on. But i guess the market
> don't want books that explain things that no database have implemented yet
> (Oracle have window functions but i've never used that).

DB2 has them too. Check out this DB2 documentation, it might be more helpful
than the specs.

http://publib.boulder.ibm.com/infocenter/rb63help/index.jsp?topic=/com.ibm.redbrick.doc6.3/sqlrg/sqlrg35.htm

--
greg


From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruno Wolff III <bruno(at)wolff(dot)to>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] We are not following the spec for HAVING without
Date: 2005-03-14 15:59:28
Message-ID: Pine.LNX.4.44.0503141652580.16409-100000@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 14 Mar 2005, Greg Stark wrote:

> > SELECT ROW_NUMBER() OVER bar AS num,
> > x,
> > avg(a) OVER bar,
> > sum (a) OVER bar
> > FROM foo
> > WINDOW bar AS PARTITION BY x ORDER BY x, y, z;
>
> Note that as you said, this returns just as many records as are in the
> original table. The OLAP functions here are just regular functions, not
> aggregate functions

They are aggregate functions, the avg() is a window aggregate function
according to the standard. It runs over all values in the same partition.

> -- albeit functions that use data from other records other
> than the one being output.

Yes, and not just one other record, but a number of them. Isn't that what
aggregate functions are?

Anyway, I just wanted to point to this area in the standard to tell others
what are possible using standard constructs. It doesn't really help
anything in this specific case. Pg will not have any of this implemented
in the nearest future (I guess). There is always a chance that someone see
the mail, get interested, learn about it and then implements it :-)

--
/Dennis Björklund


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruno Wolff III <bruno(at)wolff(dot)to>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] We are not following the spec for HAVING without GROUP
Date: 2005-03-14 16:51:18
Message-ID: 87vf7utbll.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:

> They are aggregate functions, the avg() is a window aggregate function
> according to the standard. It runs over all values in the same partition.
>
> > -- albeit functions that use data from other records other
> > than the one being output.
>
> Yes, and not just one other record, but a number of them. Isn't that what
> aggregate functions are?

Er, yeah. They're aggregate functions in that they consider many records as
input. However they're not aggregate functions in that they produce an output
for every record, as opposed to outputting only one value for a whole group.

--
greg