When is a record NULL?

Lists: pgsql-hackers
From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: When is a record NULL?
Date: 2009-07-23 22:19:36
Message-ID: 48BDABE9-88AB-46E9-BABE-F70DDBFB98BD@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

How can a record be neither NULL or NOT NULL?

try=# select ROW(1, NULL) IS NULL;
?column?
----------
f
(1 row)

try=# select ROW(1, NULL) IS NOT NULL;
?column?
----------
f
(1 row)

This makes it rather hard to tell, in PL/pgSQL, when I've fetched the
last record from a cursor…

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-23 22:29:24
Message-ID: AD0542DC-FC37-40A1-8450-38EE82B37892@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 23, 2009, at 3:19 PM, David E. Wheeler wrote:

> How can a record be neither NULL or NOT NULL?
>
> try=# select ROW(1, NULL) IS NULL;
> ?column?
> ----------
> f
> (1 row)
>
> try=# select ROW(1, NULL) IS NOT NULL;
> ?column?
> ----------
> f
> (1 row)
>
> This makes it rather hard to tell, in PL/pgSQL, when I've fetched
> the last record from a cursor…

Also:

select ROW(1, NULL) IS DISTINCT FROM ROW(2, NULL);
?column?
----------
t

As expected, but the IS NULL above is not expected (by this bunny,
anyway).

Best,

David


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-23 22:33:19
Message-ID: 1248388399.25874.12.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2009-07-23 at 15:19 -0700, David E. Wheeler wrote:
> How can a record be neither NULL or NOT NULL?

You could do:

not ROW(1, NULL) is null and not ROW(1, NULL) is not null

I don't know what the SQL standard says about this.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-23 23:08:28
Message-ID: 14776.1248390508@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> How can a record be neither NULL or NOT NULL?

This is per SQL standard. IS NULL is true if *all* the record's
fields are null; IS NOT NULL is true if *none* of them are.

Yeah, it's a bit dubious, but that's what they said.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-23 23:17:00
Message-ID: C2F66615-CC76-4BD5-A956-6BE7A5E4A5F6@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 23, 2009, at 4:08 PM, Tom Lane wrote:

> This is per SQL standard. IS NULL is true if *all* the record's
> fields are null; IS NOT NULL is true if *none* of them are.
>
> Yeah, it's a bit dubious, but that's what they said.

And yet they're DISTINCT FROM each other if either or both contain
NULLs and some other values? It seems to me that, to be consistent, it
should be:

select ROW(1, NULL) IS DISTINCT FROM ROW(2, NULL);
?column?
----------
f

No?

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-23 23:22:37
Message-ID: 15091.1248391357@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> And yet they're DISTINCT FROM each other if either or both contain
> NULLs and some other values?

Well, that would depend on what the values were and in what columns...

> It seems to me that, to be consistent, it
> should be:

> select ROW(1, NULL) IS DISTINCT FROM ROW(2, NULL);
> ?column?
> ----------
> f

Dunno how you can possibly come to that conclusion. Those row
values are certainly distinct (according to both PG and the spec).

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 00:32:03
Message-ID: CB36F40C-9917-416C-AC94-C30DD481AFD8@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 23, 2009, at 4:22 PM, Tom Lane wrote:

> Dunno how you can possibly come to that conclusion. Those row
> values are certainly distinct (according to both PG and the spec).

Are they not both null?

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 00:35:33
Message-ID: 91890A93-3CF5-40B0-A76E-B2FC2456D6BE@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 23, 2009, at 3:33 PM, Jeff Davis wrote:

> not ROW(1, NULL) is null and not ROW(1, NULL) is not null
>
> I don't know what the SQL standard says about this

Thanks Jeff, that's just what I needed.

Best,

David


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 01:06:32
Message-ID: 1248397592.25874.51.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2009-07-23 at 17:32 -0700, David E. Wheeler wrote:
> On Jul 23, 2009, at 4:22 PM, Tom Lane wrote:
>
> > Dunno how you can possibly come to that conclusion. Those row
> > values are certainly distinct (according to both PG and the spec).
>
> Are they not both null?

[ Is that a play on words? ;) ]

Here's the logical breakdown of your argument:

x = ROW(1, NULL)

0. x IS NOT NULL = false
1. => NOT x IS NOT NULL = true
2. => NOT NOT x IS NULL = true
3. => x IS NULL = true

However, in step 2, you transformed:
x IS NOT NULL => NOT x IS NULL

But in SQL that is not a tautology!

I don't think it's wise to assume SQL is consistent. I think it is
possible to create a consistent 3VL system, but you have to give up some
other very important property. I can't remember the details at the
moment, but there's an interesting proof in "Logic and Databases" by
C.J. Date.

Does the SQL spec lay out the tautologies anywhere, so that we can get a
clear picture of what's going on with NULLs? I won't make the claim that
SQL is inconsistent without actually seeing the system as a whole, but,
at a minimum, many of the tautologies that people are accustomed to are
not true in SQL.

Regards,
Jeff Davis


From: Greg Stark <gsstark(at)mit(dot)edu>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 01:21:01
Message-ID: 407d949e0907231821y79028370h848aa23da2763dfe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 24, 2009 at 1:32 AM, David E. Wheeler<david(at)kineticode(dot)com> wrote:
> On Jul 23, 2009, at 4:22 PM, Tom Lane wrote:
>
>> Dunno how you can possibly come to that conclusion.  Those row
>> values are certainly distinct (according to both PG and the spec).
>
> Are they not both null?

Isn't that just what you were complaining about not being the case:

On Thu, Jul 23, 2009 at 11:29 PM, David E. Wheeler<david(at)kineticode(dot)com> wrote:
> try=# select ROW(1, NULL) IS NULL;
> ?column?
> ----------
> f
> (1 row)

--
greg
http://mit.edu/~gsstark/resume.pdf


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 01:46:25
Message-ID: 5D3ECB26-B5B8-4539-ACD9-A0152E48182B@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 23, 2009, at 6:21 PM, Greg Stark wrote:

>> Are they not both null?
>
> Isn't that just what you were complaining about not being the case:

Yes, but given that the standard says that `ROW(1, NULL)` is NULL,
then I would expect it to be NOT DISTINCT from `ROW(2, NULL)`.

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 01:52:18
Message-ID: 27AF328C-D3F5-4D65-9BBE-EDCD39E9291C@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 23, 2009, at 6:06 PM, Jeff Davis wrote:

> However, in step 2, you transformed:
> x IS NOT NULL => NOT x IS NULL
>
> But in SQL that is not a tautology!

No, that's not the problem I see -- that solved the problem in my
particular code. The problem I see is that, given that the standard
says (according to Tom) that if any value is NULL then the record is
NULL, then I would expect this to return false:

SELECT ROW(1, NULL) IS DISTINCT FROM ROW(2, NULL);

But it returns true.

> I don't think it's wise to assume SQL is consistent. I think it is
> possible to create a consistent 3VL system, but you have to give up
> some
> other very important property. I can't remember the details at the
> moment, but there's an interesting proof in "Logic and Databases" by
> C.J. Date.

If the standard says that, in the case of records, two NULLs are
distinct, then fine. Completely bizarre, but fine. But I suspect that
such is not the case.

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: David E(dot) Wheeler <david(at)kineticode(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 01:58:18
Message-ID: B0B6B859-131A-4D19-94BE-D6EDA552E456@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 23, 2009, at 6:52 PM, David E. Wheeler wrote:

> No, that's not the problem I see -- that solved the problem in my
> particular code. The problem I see is that, given that the standard
> says (according to Tom) that if any value is NULL then the record is
> NULL, then I would expect this to return false:
>
> SELECT ROW(1, NULL) IS DISTINCT FROM ROW(2, NULL);
>
> But it returns true.

Sorry, I'm confused. Understandable I think. So:

ROW(1, NULL) is neither NULL nor NOT NULL. I've no idea what state it
is, but I guess that's the standard. In this case, IS DISTINCT FROM
gives me a sensible return value -- it assumes that the records are
NOT NULL, I guess, for its purposes. This is still inconsistent, since
the records are neither NULL nor NOT NULL, but perhaps sensible.

It's pretty insane, frankly. Gotta love SQL.

Best,

David


From: Brendan Jurd <direvus(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 04:34:10
Message-ID: 37ed240d0907232134g7cdd4139l19a79d81f8c98666@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/7/24 David E. Wheeler <david(at)kineticode(dot)com>:
> ROW(1, NULL) is neither NULL nor NOT NULL. I've no idea what state it is,
> but I guess that's the standard.

Well, a ROW is an ordered set of values, each one of which may be
either NULL or NOT NULL.

It doesn't really make sense to talk about the ROW itself being NULL
or NOT NULL, only its member values (but for extra confusion, contrast
with the treatment of arrays, which can themselves be NULL).

It does make sense, however, to talk about the ROW's member values
being entirely NULL or entirely NOT NULL, and that's what the IS NULL
and IS NOT NULL tests tell you about.

I guess the spec authors figured they might as well make IS [NOT] NULL
do something useful when applied to a row rather than throwing an
error. I tend to agree.

I hope that provides some clarity.

Cheers,
BJ


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 09:15:56
Message-ID: EBA99F2A-C04F-4CE1-A629-63F01125EEF7@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 23, 2009, at 9:34 PM, Brendan Jurd wrote:

> Well, a ROW is an ordered set of values, each one of which may be
> either NULL or NOT NULL.

Right.

> It doesn't really make sense to talk about the ROW itself being NULL
> or NOT NULL, only its member values (but for extra confusion, contrast
> with the treatment of arrays, which can themselves be NULL).

Well then maybe a record (row) should *never* be null.

> It does make sense, however, to talk about the ROW's member values
> being entirely NULL or entirely NOT NULL, and that's what the IS NULL
> and IS NOT NULL tests tell you about.

Ah! So that's where the three-valued logic comes in to play with
records:

try=# SELECT ROW(1, NULL) IS NULL, ROW (1, 1) IS NULL, ROW(NULL,
NULL) IS NULL;
?column? | ?column? | ?column?
----------+----------+----------
f | f | t

> I guess the spec authors figured they might as well make IS [NOT] NULL
> do something useful when applied to a row rather than throwing an
> error. I tend to agree.

Frankly, I find the state where a record with a NULL and a not-null
value being neither NULL nor not NULL bizarre.

> I hope that provides some clarity.

It's useful to learn that `ROW(NULL, NULL)` is NULL, but I find the
whole thing totally bizarre. Is it me?

Best,

David


From: Richard Huxton <dev(at)archonet(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 10:16:56
Message-ID: 4A698A18.6090300@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David E. Wheeler wrote:
> On Jul 23, 2009, at 9:34 PM, Brendan Jurd wrote:

>> I guess the spec authors figured they might as well make IS [NOT] NULL
>> do something useful when applied to a row rather than throwing an
>> error. I tend to agree.
>
> Frankly, I find the state where a record with a NULL and a not-null
> value being neither NULL nor not NULL bizarre.

I'm guessing the justification (and presumably this was worked out based
on the behaviour of one or more of the big DB providers and then
justified afterwards) is that the composite is "partially unknown". Of
course you should either introduce a new code or throw an error, but
four-valued logic isn't going to win you any friends.

If the argument *is* that because you know part of the overall value the
composite isn't null then I'd argue that ('abc' || null) isn't null
either. After all, the first three characters are perfectly well
established.

>> I hope that provides some clarity.
>
> It's useful to learn that `ROW(NULL, NULL)` is NULL, but I find the
> whole thing totally bizarre. Is it me?

Yes, just you. None of the rest of us have any problems with this at all :-)

--
Richard Huxton
Archonet Ltd


From: Brendan Jurd <direvus(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 10:17:16
Message-ID: 37ed240d0907240317s13faff4wb4dc70d2b51e708f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/7/24 David E. Wheeler <david(at)kineticode(dot)com>:
> It's useful to learn that `ROW(NULL, NULL)` is NULL, but I find the whole
> thing totally bizarre. Is it me?
>

*shrug* The IS [NOT] NULL tests mean something different when applied
to a ROW than they do when applied to a scalar value or an array.

"SELECT 1 IS NULL"
=> means "is this scalar set to the special value NULL?".

"SELECT ROW(1, 2) IS NULL"
=> means "are all the member values of this row set to the special
value NULL?"

So it is wrong to talk about ROW(NULL, NULL) being NULL. It doesn't
have the property of being NULL or not NULL, because it is a composite
value. "ROW(NULL, NULL) IS NULL" returns true, but that is not the
same as saying that it actually is NULL, because of the different
semantics above.

It's slightly different semantics from what you get with ordinary
scalar values, but that is cognisant with the fact that composites are
fundamentally different things from ordinary scalar values.

Cheers,
BJ


From: Michael Gould <mgould(at)intermodalsoftwaresolutions(dot)net>
To: Richard Huxton <dev(at)archonet(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 13:10:26
Message-ID: 4a5378f420212b02a470edc1bc276a69@intermodalsoftwaresolutions.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Does this also apply at the column level?

In Postgres is If column IS NOT NULL or If column IS NULL? Do all
columns in the record have to be NULL to have IF column IS NULL = true?

Best Regards
--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 13:31:13
Message-ID: 25925.1248442273@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Brendan Jurd <direvus(at)gmail(dot)com> writes:
> So it is wrong to talk about ROW(NULL, NULL) being NULL. It doesn't
> have the property of being NULL or not NULL, because it is a composite
> value. "ROW(NULL, NULL) IS NULL" returns true, but that is not the
> same as saying that it actually is NULL, because of the different
> semantics above.

It's worse than that, because there actually is also such a thing as
the row value being NULL --- ie, there's no row structure at all.
At least internally, that's a completely different thing from having
a row all of whose fields are null.

SQL doesn't provide a test for this case that's separate from the test
involving null-ness of individual fields. Not much we can do about
it though. I'm not entirely sure that exposing the distinction would
be helpful anyway ...

regards, tom lane


From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 13:36:46
Message-ID: 20090724133646.GB6606@eddie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 23, 2009 at 06:46:25PM -0700, David E. Wheeler wrote:
> Yes, but given that the standard says that `ROW(1, NULL)` is NULL, then I
> would expect it to be NOT DISTINCT from `ROW(2, NULL)`.

Wait, didn't we decide upthread that the standard said ROW(1, NULL) isn't
NULL?

(From Tom):
> This is per SQL standard. IS NULL is true if *all* the record's
> fields are null; IS NOT NULL is true if *none* of them are.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 14:01:10
Message-ID: 26379.1248444070@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua Tolley <eggyknap(at)gmail(dot)com> writes:
> On Thu, Jul 23, 2009 at 06:46:25PM -0700, David E. Wheeler
>> Yes, but given that the standard says that `ROW(1, NULL)` is NULL, then I
>> would expect it to be NOT DISTINCT from `ROW(2, NULL)`.

> Wait, didn't we decide upthread that the standard said ROW(1, NULL) isn't
> NULL?

David misspoke in the quoted statement, as I believe he figured out soon
thereafter. For that row value, neither IS NULL nor IS NOT NULL will
return true. The spec defines them in such a way that they are not inverses
for row values.

SQL2008 points out:

NOTE 219 - For all R, "R IS NOT NULL" has the same result as
"NOT R IS NULL" if and only if R is of degree 1. Table 14,
"<null predicate> semantics", specifies this behavior.

That table looks like this:

R IS R IS NOT NOT R IS NOT R IS NOT
_Expression_______NULL____NULL__________NULL__________NULL_________

| degree 1: null | true_ | false_ | false_ | true_ |
| | | | | |
| degree 1: not | false_| true_ | true_ | false_ |
null

| degree > 1: | true_ | false_ | false_ | true_ |
| all null | | | | |
| | | | | |
| degree > 1: | false_| false_ | true_ | true_ |
| some null | | | | |
| | | | | |
| degree > 1: | false_| true_ | true_ | false_ |
|_none_null______|_______|_____________|____________|______________|

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 16:40:22
Message-ID: 1248453622.5201.88.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2009-07-24 at 09:31 -0400, Tom Lane wrote:
> Brendan Jurd <direvus(at)gmail(dot)com> writes:
> It's worse than that, because there actually is also such a thing as
> the row value being NULL --- ie, there's no row structure at all.
> At least internally, that's a completely different thing from having
> a row all of whose fields are null.

Here is an example:
select * from (values (row(NULL)), (null)) v;

But it's hard to do much useful with row values once you have them. You
can't even count them:
select count(column1) from (values (row(NULL)), (null)) v;
ERROR: record type has not been registered

> SQL doesn't provide a test for this case that's separate from the test
> involving null-ness of individual fields. Not much we can do about
> it though. I'm not entirely sure that exposing the distinction would
> be helpful anyway ...

There's no single test, but you can see if it's an empty row with:
x IS NULL AND x IS DISTINCT FROM NULL
And you can see if it's a "real" NULL by:
x IS NULL AND x IS NOT DISTINCT FROM NULL

It's funny until you try it -- then it's just scary.

Regards,
Jeff Davis


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: When is a record NULL?
Date: 2009-07-24 16:56:14
Message-ID: 20090724165614.GD5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 24, 2009 at 09:31:13AM -0400, Tom Lane wrote:
> Brendan Jurd <direvus(at)gmail(dot)com> writes:
> > So it is wrong to talk about ROW(NULL, NULL) being NULL. It doesn't
> > have the property of being NULL or not NULL, because it is a composite
> > value. "ROW(NULL, NULL) IS NULL" returns true, but that is not the
> > same as saying that it actually is NULL, because of the different
> > semantics above.
>
> It's worse than that, because there actually is also such a thing as
> the row value being NULL --- ie, there's no row structure at all.
> At least internally, that's a completely different thing from having
> a row all of whose fields are null.

Hope nobody minds if I go off on a somewhat pedagogic bent here!

Not quite sure what you mean by the above; but I think maybe something
like:

SELECT NULL::RECORD AS r;

PG may well treat this internally as a special case, but from a type
level I don't see any difference between the above and, say:

SELECT ROW(1)::RECORD AS r;

In both cases we get a result that has exactly one column and this
column is of type RECORD (or ROW depending how you spell things).
The fact that it happens to be a NULL *value* in one case shouldn't
affect things at the level of *types*--unless PG has suddenly become
dependently-typed which I don't believe it wants to be.

I'm also aware that PG's handling of types with ROW values is somewhat
inconsistent when compared to other values, for example:

SELECT (r).a
FROM (
SELECT a,b
FROM (VALUES
(1,2),
(2,3)) x(a,b)) r;

here, we can look inside the RECORD named by "r" and pull out the value
associated with attribute "a", but inside:

SELECT (r).a
FROM (VALUES
(ROW(1,2)),
(ROW(2,3))) x(r);

we get a message saying that the "record type has not been registered"
when I'd expect to get an error saying that it doesn't know which
attribute "a" is. We also fail to get an error in the following case:

SELECT r
FROM (VALUES
(ROW(1,2)),
(ROW('a','b'))) x(r);

which (to me) seems wrong. The (allegedly) static types in PG appear
dynamic when it comes to RECORDs.

I'd be interested in fixing this behavior but every time I've started
trying to fix this I've tended to get horribly lost in the code.

> SQL doesn't provide a test for this case that's separate from the test
> involving null-ness of individual fields. Not much we can do about
> it though. I'm not entirely sure that exposing the distinction would
> be helpful anyway ...

I think it would; I tend to write the following and have just realized
that it doesn't do what I thought it did:

SELECT a.*
FROM tbla a
LEFT JOIN tblb b ON a.id = b.id
WHERE b IS NULL;

The intuition being that the row valued "b" would only be considered to
be NULL (i.e. IS NULL returned TRUE) when the LEFT JOIN failed. The
standard way to write this is of course to write "WHERE b.id IS NULL",
but I'm unsure why it's necessary to "look inside" the record "b" to get
out attribute "id" to see if it's NULL when it should just be possible
to look at "b" directly.

--
Sam http://samason.me.uk/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: When is a record NULL?
Date: 2009-07-24 17:30:39
Message-ID: 4A69A96F0200002500028D22@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> On Fri, Jul 24, 2009 at 09:31:13AM -0400, Tom Lane wrote:

> Hope nobody minds if I go off on a somewhat pedagogic bent here!

Not as long as you don't mind replies in kind. ;-)

> The fact that it happens to be a NULL *value* in one case

Well, according to Codd (and I tend to go with him on this) there is
no such thing. NULL is a way to flag a place where a value could be
stored, but is not -- because is unknown or is not applicable in that
context. (He seemed to feel it was a big weakness of SQL that it
didn't differentiate between these two conditions, but that's another
argument.) "NULL value" is an oxymoron.

>> SQL doesn't provide a test for this case that's separate from the
>> test involving null-ness of individual fields. Not much we can do
>> about it though. I'm not entirely sure that exposing the
>> distinction would be helpful anyway ...
>
> I think it would

The distinction between not having a tuple and having a tuple for
which you don't know any applicable values seems thin. I'm not sure
what that would really mean.

-Kevin


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 17:35:08
Message-ID: b42b73150907241035ja15c1c7v9ad90c29626ce352@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 24, 2009 at 5:15 AM, David E. Wheeler<david(at)kineticode(dot)com> wrote:
> On Jul 23, 2009, at 9:34 PM, Brendan Jurd wrote:
>
>> Well, a ROW is an ordered set of values, each one of which may be
>> either NULL or NOT NULL.
>
> Right.
>
>> It doesn't really make sense to talk about the ROW itself being NULL
>> or NOT NULL, only its member values (but for extra confusion, contrast
>> with the treatment of arrays, which can themselves be NULL).
>
> Well then maybe a record (row) should *never* be null.

I disagree, and I think our current way of treating things is
incorrect (although harmless). I rowtype can be null:

select null::somerowtype;

I think the following should _not_ return true:
select (null, null)::somerowtype is null;

The reasoning being that while the rowtype members are null, the
record variable itself is not; these are two distinct cases and should
be checked for and treated differently.

Another line of reasoning for this is that if something gives 'true'
for the is null operator, it should behave as null does, giving null
for any operations on it and giving null for STRICT functions, to give
a couple of examples.

create table foo (a int, b int);
select (null, null)::foo is null;
?column?
----------
t

create or replace function doit(foo) returns void as $$ begin raise
notice '!'; end; $$ language plpgsql strict;

select doit(row(null, null)::foo);
NOTICE: ! <-- what???!?

I think this is wrong, and if the sql standard sez it is so, then the
standard is wrong :-).

merlin


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 17:44:37
Message-ID: b42b73150907241044l46b88302q73d1b35204727721@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 24, 2009 at 1:35 PM, Merlin Moncure<mmoncure(at)gmail(dot)com> wrote:
> On Fri, Jul 24, 2009 at 5:15 AM, David E. Wheeler<david(at)kineticode(dot)com> wrote:
>> On Jul 23, 2009, at 9:34 PM, Brendan Jurd wrote:
>>
>>> Well, a ROW is an ordered set of values, each one of which may be
>>> either NULL or NOT NULL.
>>
>> Right.
>>
>>> It doesn't really make sense to talk about the ROW itself being NULL
>>> or NOT NULL, only its member values (but for extra confusion, contrast
>>> with the treatment of arrays, which can themselves be NULL).
>>
>> Well then maybe a record (row) should *never* be null.
>
> I disagree, and I think our current way of treating things is
> incorrect (although harmless).  I rowtype can be null:
>
> select null::somerowtype;
>
> I think the following should _not_ return true:
> select (null, null)::somerowtype is null;
>
> The reasoning being that while the rowtype members are null, the
> record variable itself is not; these are two distinct cases and should
> be checked for and treated differently.
>
> Another line of reasoning for this is that if something gives 'true'
> for the is null operator, it should behave as null does, giving null
> for any operations on it and giving null for STRICT functions, to give
> a couple of examples.
>
> create table foo (a int, b int);
> select (null, null)::foo is null;
>  ?column?
> ----------
>  t
>
> create or replace function doit(foo) returns void as $$ begin raise
> notice '!'; end; $$ language plpgsql strict;
>
> select doit(row(null, null)::foo);
> NOTICE:  !  <-- what???!?
>
> I think this is wrong, and if the sql standard sez it is so, then the
> standard is wrong :-).

Just to clarify...I think this is the right behavior to run the
function that way, but (null,null) is null == true is not correct.
There are not 'degrees' of null...something is either null or it
isn't.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Brendan Jurd <direvus(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 18:05:55
Message-ID: 1664.1248458755@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> I think the following should _not_ return true:
> select (null, null)::somerowtype is null;

Take it up with the SQL standards committee. The spec is 100% not
ambiguous about this.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 18:10:31
Message-ID: 1764.1248459031@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> There's no single test, but you can see if it's an empty row with:
> x IS NULL AND x IS DISTINCT FROM NULL
> And you can see if it's a "real" NULL by:
> x IS NULL AND x IS NOT DISTINCT FROM NULL

Hmmm ... that suggests that we may not be handling IS DISTINCT FROM
correctly for rowtypes. I haven't read that part of the spec, but
I would've guessed that the spec wants it to be consistent with
IS NULL.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 18:13:20
Message-ID: 4788B002-21DD-4B87-B4A4-213DC6650A00@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 24, 2009, at 3:17, Brendan Jurd <direvus(at)gmail(dot)com> wrote:

> "ROW(NULL, NULL) IS NULL" returns true, but that is not the
> same as saying that it actually is NULL

I think my head just exploded. Loving the dark corners of SQL…

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Brendan Jurd <direvus(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 18:19:39
Message-ID: 07247004-538C-493D-8D7A-EF964FB8E054@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 24, 2009, at 11:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Hmmm ... that suggests that we may not be handling IS DISTINCT FROM
> correctly for rowtypes. I haven't read that part of the spec, but
> I would've guessed that the spec wants it to be consistent with
> IS NULL.

Yes, that's precisely what I was trying to get at last night.

Best,

David


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Brendan Jurd <direvus(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 18:20:44
Message-ID: 1248459644.25874.69.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2009-07-24 at 13:35 -0400, Merlin Moncure wrote:
> I think the following should _not_ return true:
> select (null, null)::somerowtype is null;
>
> The reasoning being that

It's not at all clear to me that you can "reason" effectively about SQL
logic semantics. Upon which laws are you reasoning? Certainly not 2VL
boolean logic.

The standard is what it is. If it says that some NULLs are red and some
NULLs are blue, then we'd probably support it.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 18:37:13
Message-ID: 1248460633.25874.84.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2009-07-24 at 14:10 -0400, Tom Lane wrote:
> Hmmm ... that suggests that we may not be handling IS DISTINCT FROM
> correctly for rowtypes. I haven't read that part of the spec, but
> I would've guessed that the spec wants it to be consistent with
> IS NULL.
>

Our behavior appears to match the standard, which covers this case in
8.15.General Rules.1.c.iv.

Regards,
Jeff Davis


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Brendan Jurd <direvus(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 18:49:51
Message-ID: b42b73150907241149n2fd65c9dqb9eed70e5b3eca0b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 24, 2009 at 2:05 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> I think the following should _not_ return true:
>> select (null, null)::somerowtype is null;
>
> Take it up with the SQL standards committee.  The spec is 100% not
> ambiguous about this.

I'm not proposing to change the current behavior...just playing
devil's advocate here. Still, why:

*) is select null::foo is distinct from (null, null)::foo; true?
(null should not be distinct from null)

*) does (null, null) allow STRICT functions to execute?

*) is (null, null)::text not null?

The upshot of this is that 'is null' really means 'is approximately
null', or our handling of the three above cases is wrong. Or put
another way, I'd like a clarification:

is (null, null) really null (in which case I'd say our handling of it
is out of standard), or just a special case for the IS NULL operator,
so that it returns null but in all other respect not null? ISTM you
can't have it both ways.

merlin


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Brendan Jurd <direvus(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 20:29:34
Message-ID: 407d949e0907241329x3d495e65yecb57e82af4e86fd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 24, 2009 at 7:49 PM, Merlin Moncure<mmoncure(at)gmail(dot)com> wrote:
> Still, why:
>
> *) is select null::foo is distinct from (null, null)::foo; true?
> (null should not be distinct from null)
>
> *) does (null, null) allow STRICT functions to execute?
>
> *) is (null, null)::text not null?

These are all good questions. Are you interested in checking what the
spec says should happen in these cases?

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Brendan Jurd <direvus(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 21:18:28
Message-ID: b42b73150907241418h70f1ce02mac6522beb72d25f9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 24, 2009 at 4:29 PM, Greg Stark<gsstark(at)mit(dot)edu> wrote:
> On Fri, Jul 24, 2009 at 7:49 PM, Merlin Moncure<mmoncure(at)gmail(dot)com> wrote:
>> Still, why:
>>
>> *) is select null::foo is distinct from (null, null)::foo; true?
>> (null should not be distinct from null)
>>
>> *) does (null, null) allow STRICT functions to execute?
>>
>> *) is (null, null)::text not null?
>
> These are all good questions. Are you interested in checking what the
> spec says should happen in these cases?

I don't have a copy :-). (Am I being obtuse, and not noticing
something obvious?)

I think though that one of three cases is true:

*) postgresql is not treating (null, null) as null except in the case
where checked with 'is null'.
*) postgresql is following spec, which basically contradicts itself.
*) the behavior is undefined

If the first case is true (i bet it is), then many things we do with
composite types are wrong on some level, seriously in some cases. for
example:

select count(rowtype) from foo;
select distinct rowtype from foo;
select * from foo join bar using (rowtype);

can all give the wrong answer.

regardless, I think the sql standard definition of null for rowtypes
is lame -- the way null works for arrays is much better, in that the
elements inside the array an be null, independently can the array
itself.

I like the current behavior of composites (all composite fields being
null is a pretty rare case in practice), and would argue against
bringing the database closer to spec if it turns out that we are doing
it incorrectly.

merin


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brendan Jurd <direvus(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 21:59:23
Message-ID: 69692EBE-15BE-405C-9E0C-C5D69380E4D2@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 24, 2009, at 9:40 AM, Jeff Davis wrote:

> There's no single test, but you can see if it's an empty row with:
> x IS NULL AND x IS DISTINCT FROM NULL
> And you can see if it's a "real" NULL by:
> x IS NULL AND x IS NOT DISTINCT FROM NULL
>
> It's funny until you try it -- then it's just scary.

Amen to that.

So here's what I'm doing, essentially (some error handling removed for
clarity):

FETCH have INTO rec_have;
FETCH want INTO rec_want;
WHILE NOT rec_have IS NULL OR NOT rec_want IS NULL LOOP
IF rec_have IS DISTINCT FROM rec_want THEN
RETURN false;
END IF;
rownum = rownum + 1;
FETCH have INTO rec_have;
FETCH want INTO rec_want;
END LOOP;
RETURN true;

So far this seems to work for the tests I've thrown at it, telling me
when two cursors return results that are row-by-row equivalent,
including when columns have the NULLs, though I've not yet tried rows
that are nothing but nulls. That's probably not very useful, but it is
possible.

Does this look like it's a reasonable implementation for what I'm
testing? Have I missed anything in the swirl of the bizarre that this
thread has triggered?

Thanks,

David


From: David E(dot) Wheeler <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brendan Jurd <direvus(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 22:05:27
Message-ID: 0AC123E8-CAC3-4227-860C-31DA38A5D286@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 24, 2009, at 11:37 AM, Jeff Davis wrote:

>> Hmmm ... that suggests that we may not be handling IS DISTINCT FROM
>> correctly for rowtypes. I haven't read that part of the spec, but
>> I would've guessed that the spec wants it to be consistent with
>> IS NULL.
>
> Our behavior appears to match the standard, which covers this case in
> 8.15.General Rules.1.c.iv.

Good, because what I'm doing is comparing rows returned by two
different cursors using IS DISTINCT FROM, and what I need the
comparison to do is to ensure that two rows are not equivalent
(distinct) unless all of their values are the same, column to column,
including NULLs.

Best,

David


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: When is a record NULL?
Date: 2009-07-24 22:15:20
Message-ID: 20090724221520.GE5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 24, 2009 at 12:30:39PM -0500, Kevin Grittner wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> > The fact that it happens to be a NULL *value* in one case
>
> Well, according to Codd (and I tend to go with him on this) there is
> no such thing. NULL is a way to flag a place where a value could be
> stored, but is not -- because is unknown or is not applicable in that
> context. (He seemed to feel it was a big weakness of SQL that it
> didn't differentiate between these two conditions, but that's another
> argument.) "NULL value" is an oxymoron.

I think then maybe we're talking about different things; I was trying
to draw attention to the distinction between types and values---types
allow some invariants of the code to be automatically checked before it
is run, a value only has meaning at run time with the set of possible
values an expression is defined over being constrained by its type.

In this dichotomy a NULL is most definitely a value and with my current
experience I don't understand the distinction you're trying to draw.

> >> SQL doesn't provide a test for this case that's separate from the
> >> test involving null-ness of individual fields. Not much we can do
> >> about it though. I'm not entirely sure that exposing the
> >> distinction would be helpful anyway ...
> >
> > I think it would
>
> The distinction between not having a tuple and having a tuple for
> which you don't know any applicable values seems thin. I'm not sure
> what that would really mean.

Other languages/type systems do define this precisely. For example,
in object orientated languages there's a big difference between a
reference to an object being NULL and some member of an object being
NULL. Databases obviously have their own semantics, but the distinction
is well defined.

Any implementation that tries to be faithful to a standard has its hands
somewhat tied and PG is no exception. PG currently seems to be some
hybrid half way between, it internally knows there is a distinction
between the two but it doesn't like to expose this. For example (and
this appears particularly awkward because of annoying limitations in the
syntax PG accepts):

SELECT y
FROM (SELECT 1) x(a)
LEFT JOIN (SELECT 1,2) y(a,b) ON FALSE;

I think it should be valid to express this as:
SELECT (SELECT 1,2 WHERE FALSE);
but PG doesn't like sub-queries returning two columns--but this is
material for another discussion.

This returns a single row whose only attribute is NULL (i.e. it's
rendered as '') and not as a record whose attributes are all NULL (i.e.
rendered as '(,)'). The fact PG does the former says that your mental
model isn't congruent with PGs behavior.

If your model is correct then when the IS DISTINCT FROM operator works
on RECORDs the following should return FALSE for all of the following:

SELECT NULL IS DISTINCT FROM ROW(NULL);
SELECT NULL IS DISTINCT FROM ROW(NULL,NULL);
SELECT NULL IS DISTINCT FROM ROW(NULL,ROW(NULL,NULL));
SELECT ROW(NULL) IS DISTINCT FROM ROW(NULL,ROW(NULL,NULL));

i.e. there is *no* difference between a NULL record and a record
consisting entirely of NULLs.

--
Sam http://samason.me.uk/


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: David E(dot) Wheeler <david(at)kineticode(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brendan Jurd <direvus(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-24 22:47:39
Message-ID: 2CA937FC-2501-442E-BB3E-879D12132370@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 24, 2009, at 2:59 PM, David E. Wheeler wrote:

> FETCH have INTO rec_have;
> FETCH want INTO rec_want;
> WHILE NOT rec_have IS NULL OR NOT rec_want IS NULL LOOP
> IF rec_have IS DISTINCT FROM rec_want THEN
> RETURN false;
> END IF;
> rownum = rownum + 1;
> FETCH have INTO rec_have;
> FETCH want INTO rec_want;
> END LOOP;
> RETURN true;

Bah. It fails to do what I want when I pass cursors that return:

VALUES (NULL, NULL), (NULL, NULL)
VALUES (NULL, NULL)

So when it gets to that second row in the first cursor, it doesn't
know it's a row with NULLs as opposed to an empty row. So this bit:

WHILE NOT rec_have IS NULL OR NOT rec_want IS NULL LOOP

Obviously isn't detecting the difference. I tried

WHILE (NOT rec_have IS NULL AND rec_have IS DISTINCT FROM NULL)
OR (NOT rec_want IS NULL AND rec_want IS DISTINCT FROM NULL)

and

WHILE (NOT rec_have IS NULL AND NOT rec_have IS NOT DISTINCT FROM
NULL)
OR (NOT rec_want IS NULL AND NOT rec_want IS NOT DISTINCT FROM
NULL)

But they didn't work, either.

There's got to be a way to do this; better, there ought to be an easy
way to tell the difference. :-(

Thanks,

David


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "David E(dot)Wheeler" <david(at)kineticode(dot)com>
Cc: "Brendan Jurd" <direvus(at)gmail(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: When is a record NULL?
Date: 2009-07-25 16:15:39
Message-ID: 4A6AE95B0200002500028D34@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> wrote:

> So when it gets to that second row in the first cursor, it doesn't
> know it's a row with NULLs as opposed to an empty row.

> there ought to be an easy way to tell the difference. :-(

I would have thought that the correct thing is to check SQLSTATE for
'02000'. I can't see how PostgreSQL allows this, however. :-(

-Kevin


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brendan Jurd <direvus(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-25 16:24:11
Message-ID: 1248539051.2759.627.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2009-07-24 at 15:47 -0700, David E. Wheeler wrote:
> WHILE (NOT rec_have IS NULL AND rec_have IS DISTINCT FROM NULL)
> OR (NOT rec_want IS NULL AND rec_want IS DISTINCT FROM NULL)
>

I don't think you want the "NOT x IS NULL" part at all -- that will
evaluate to false when x = rec(NULL,NULL). I think you just want the "x
IS DISTINCT FROM NULL" part, right? Will that work?

Regards,
Jeff Davis


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: When is a record NULL?
Date: 2009-07-25 16:42:04
Message-ID: 4A6AEF8C0200002500028D3C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> On Fri, Jul 24, 2009 at 12:30:39PM -0500, Kevin Grittner wrote:

> In this dichotomy a NULL is most definitely a value and with my
> current experience I don't understand the distinction you're trying
> to draw.

There can be a place where a value *could* go which does not contain a
value. Codd considered it crucial, from a mathematical correctness
point of view, that the absence of a value not be indicated by some
special "magic value", but rather by some other technique which
indicates that there *is* no value there. In SQL this is done with
NULL. Based on reading his books, it seems to me that Codd always
seemed uncomfortable with this, since it made it appear to be some
special value, which he was adamant that it is *not*. It seems he
would have preferred a relational language use a term like "FLAGGED AS
MISSING" rather than "IS NULL". It also would have allowed the
flexibility to differentiate various types of missing values, such as
"FLAGGED AS UNKNOWN" or "FLAGGED AS NOT APPLICABLE".

>> The distinction between not having a tuple and having a tuple for
>> which you don't know any applicable values seems thin. I'm not
>> sure what that would really mean.
>
> Other languages/type systems do define this precisely.

Yeah, I've made my living programming for decades, and worked in
dozens of languages, so I know how this is usually done. I do think
that set logic in relational data involves some slightly different
twists on things than most language have. I tend, for bettor or
worse, to come down in agreement with the positions Codd espoused on
most of these things.

> [PG] ... internally knows there is a distinction
> between the two but it doesn't like to expose this.

Well, to some extent I think it's a tough problem, since the set logic
of a relational database is implemented in C, which doesn't have the
same concepts. There's got to be a little slight of hand in there
somewhere.

> If your model is correct then when the IS DISTINCT FROM operator
> works on RECORDs the following should return FALSE for all of the
> following:
>
> SELECT NULL IS DISTINCT FROM ROW(NULL);
> SELECT NULL IS DISTINCT FROM ROW(NULL,NULL);
> SELECT NULL IS DISTINCT FROM ROW(NULL,ROW(NULL,NULL));
> SELECT ROW(NULL) IS DISTINCT FROM ROW(NULL,ROW(NULL,NULL));
>
> i.e. there is *no* difference between a NULL record and a record
> consisting entirely of NULLs.

Well, on that I would go with whatever the SQL standard says, and hope
it's not too ambiguous. (I haven't tried to sort though this one in
the standard, so far.) I was going into the theory both because it is
the basis for some of the seemingly odd aspects of SQL, and because at
least half the time I see someone put the word NULL immediately in
front of the word VALUE, they are wandering into confusion on these
issues. (I will admit that using such technically incorrect language
is sometimes hard to avoid without sounding stilted, even if all
parties to the conversation know that NULL is *not* a value.)

I know that Codd was insistent that any relation (which included the
result of any query) which could contain duplicate rows should be
called a "corrupted relation". (In fact, in one of his books I think
he averaged a comment on this point about once every two pages.) So I
shudder to think what his reaction would be to a relation with a row
which contained no values. I have a really hard time figuring out
what useful information such a row could represent.

-Kevin


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brendan Jurd <direvus(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-25 23:20:17
Message-ID: 45BA8563-CF9A-40ED-80D5-31A9BB7FBA84@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 25, 2009, at 9:24 AM, Jeff Davis wrote:

> I don't think you want the "NOT x IS NULL" part at all -- that will
> evaluate to false when x = rec(NULL,NULL). I think you just want the
> "x
> IS DISTINCT FROM NULL" part, right? Will that work?

Nope, infinite loop when because `ROW(null, null)` and `ROW()` are
both distinct from null:

try=# select row(null, null) is distinct from null, row() is distinct
from null;
?column? | ?column?
----------+----------
t | t
(1 row)

So I still can't tell when I've exhausted a cursor.

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: When is a record NULL?
Date: 2009-07-25 23:41:44
Message-ID: EE822D5E-C6F4-4D7F-9D07-B7773BC60D71@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 25, 2009, at 9:42 AM, Kevin Grittner wrote:

> I know that Codd was insistent that any relation (which included the
> result of any query) which could contain duplicate rows should be
> called a "corrupted relation". (In fact, in one of his books I think
> he averaged a comment on this point about once every two pages.) So I
> shudder to think what his reaction would be to a relation with a row
> which contained no values. I have a really hard time figuring out
> what useful information such a row could represent.

I agree that it's pathological, but it's clearly allowed by SQL, so we
need to be able to deal with it effectively. Intuitively would be
nice, but effectively will do.

Consider:

CREATE TABLE peeps (
name TEXT NOT NULL,
dob date,
ssn text,
active boolean NOT NULL DEFAULT true
);

INSERT INTO peeps
VALUES ('Tom', '1963-03-23', '123-45-6789', true),
('Damian', NULL, NULL, true),
('Larry', NULL, '932-45-3456', true),
('Bruce', '1965-12-31', NULL, true);

% SELECT dob, ssn from peeps where active;
dob | ssn
------------+-------------
1963-03-23 | 123-45-6789
[null] | [null]
[null] | 932-45-3456
1965-12-31 | [null]

Useless perhaps, but it's gonna happen, and someone may even have a
reason for it. Until such time as NULLs are killed off, we need to be
able to deal with SQL's pathologies.

Best,

David


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: When is a record NULL?
Date: 2009-07-26 00:40:31
Message-ID: 20090726004030.GO5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jul 25, 2009 at 11:42:04AM -0500, Kevin Grittner wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> > In this dichotomy a NULL is most definitely a value and with my
> > current experience I don't understand the distinction you're trying
> > to draw.
>
> There can be a place where a value *could* go which does not contain a
> value. Codd considered it crucial, from a mathematical correctness
> point of view, that the absence of a value not be indicated by some
> special "magic value", but rather by some other technique which
> indicates that there *is* no value there. In SQL this is done with
> NULL. Based on reading his books, it seems to me that Codd always
> seemed uncomfortable with this, since it made it appear to be some
> special value, which he was adamant that it is *not*.

Have you used more academic languages like ML or Haskell? Their
"option" and "Maybe" types, respectively, provide the nicest practical
treatment I've seen of this.

> It seems he
> would have preferred a relational language use a term like "FLAGGED AS
> MISSING" rather than "IS NULL". It also would have allowed the
> flexibility to differentiate various types of missing values, such as
> "FLAGGED AS UNKNOWN" or "FLAGGED AS NOT APPLICABLE".

Those sorts of lists get domain specific *very* quickly. I think
languages are better off providing a type system of enough complexity to
express maybe types and letting users invent whatever is most useful for
the job at hand.

> I do think
> that set logic in relational data involves some slightly different
> twists on things than most language have.

You must live in a very different world from me then! :)

> I tend, for bettor or
> worse, to come down in agreement with the positions Codd espoused on
> most of these things.

I've not read much of his writings, any canonical references for this
sort of discussion?

> > [PG] ... internally knows there is a distinction
> > between the two but it doesn't like to expose this.
>
> Well, to some extent I think it's a tough problem, since the set logic
> of a relational database is implemented in C, which doesn't have the
> same concepts. There's got to be a little slight of hand in there
> somewhere.

That's a pretty bad excuse; everything ends up as machine code
in the end. Many languages expose very abstract and consistent
views of things, some of them a lot more awkward than that of a
relational database. PG is admittedly hampered by a desire to follow a
particularly "innovative" standard and correctness for PG commonly means
keeping data safe. Being logically consistent normally takes a minor
role, although there is quite a lot of overlap between the two.

> > If your model is correct then when the IS DISTINCT FROM operator
> > works on RECORDs the following should return FALSE for all of the
> > following:
> >
> > SELECT NULL IS DISTINCT FROM ROW(NULL);
> > SELECT NULL IS DISTINCT FROM ROW(NULL,NULL);
> > SELECT NULL IS DISTINCT FROM ROW(NULL,ROW(NULL,NULL));
> > SELECT ROW(NULL) IS DISTINCT FROM ROW(NULL,ROW(NULL,NULL));
> >
> > i.e. there is *no* difference between a NULL record and a record
> > consisting entirely of NULLs.
>
> Well, on that I would go with whatever the SQL standard says, and hope
> it's not too ambiguous. (I haven't tried to sort though this one in
> the standard, so far.)

I've had a reasonable browse around an old copy of SQL'08 I found, but
couldn't find much helpful. The definition of distinct (3.1.6.8 in my
copy) seems particularly vacuous, it would seem to leave it down to how
rows are constructed and I can't find many details of that.

> I was going into the theory both because it is
> the basis for some of the seemingly odd aspects of SQL, and because at
> least half the time I see someone put the word NULL immediately in
> front of the word VALUE, they are wandering into confusion on these
> issues. (I will admit that using such technically incorrect language
> is sometimes hard to avoid without sounding stilted, even if all
> parties to the conversation know that NULL is *not* a value.)

I think that depends on what definition of "value" you're using. If
you're considering it to be a value like a NULL pointer then I'd agree
as this is just a convention to treat "pointers with a value of zero"
specially. If you treat values as members of a set with the set defined
as their type and a sub-type relation existing between types then a NULL
value is the only member of an unnamed type (in SQL) that's the subtype
of all other types. There are other ways of formalizing this, and I've
probably explained it badly here, but it's a rough sketch of how I think
about it.

> I know that Codd was insistent that any relation (which included the
> result of any query) which could contain duplicate rows should be
> called a "corrupted relation". (In fact, in one of his books I think
> he averaged a comment on this point about once every two pages.) So I
> shudder to think what his reaction would be to a relation with a row
> which contained no values. I have a really hard time figuring out
> what useful information such a row could represent.

There's a *big* difference between what's allowed in a table and what
can come back from an arbitrary calculation. Think about what "b" would
be in the following:

SELECT b
FROM a LEFT JOIN b ON FALSE;

SQL does say "A table is a multiset of rows. A row is a nonempty
sequence of values." PG still needs some way of representing the above,
and the example David gave while I was writing this.

--
Sam http://samason.me.uk/

[1] http://hol.sourceforge.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: When is a record NULL?
Date: 2009-07-26 17:49:32
Message-ID: 4A6C50DC0200002500028D51@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

> I've not read much of his writings, any canonical references for
> this sort of discussion?

I think this is the one, although it's been a while since I read it,
and I might be getting it confused with something else he wrote:

Codd, E.F. (1990). The Relational Model for Database Management
(Version 2 ed.). Addison Wesley Publishing Company.
ISBN 0-201-14192-2.

I believe that he puts forward a list of about 200 things he feels
should be true of a database in order for him to consider it a
relational database. Since he was first and foremost a mathematician,
and was something of a perfectionist, I don't think some of these are
achievable (at least in the foreseeable future) without tanking
performance, but it makes for an interesting read. I find most of it
to be on target, and it gives a unique chance to see things from the
perspective of the inventor of relational model for database
management.

I don't, of course, agree with him on everything. If you think that
the SQL standard date handling is weird, wait until you see how a
perfectionist mathematician attempts to deal with it. :-) Also, the
requirement that, to be considered a relational database, it must be
impossible to write two queries which can be shown to be logically
equivalent but which optimize to different access plans to be, well, a
bit "ivory tower."

It appears that the "no duplicate rows in a relation" rule is to
Codd's relational theory what the speed of light is to relativity. I
think it is basically a corollary to the rule that each datum must be
addressable by specifying its table name, column name, and some set of
key values which uniquely identify the row.

-Kevin


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: When is a record NULL?
Date: 2009-07-26 21:57:26
Message-ID: 407d949e0907261457s434c6deekecb08096f1decf1a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jul 26, 2009 at 6:49 PM, Kevin
Grittner<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Also, the
> requirement that, to be considered a relational database, it must be
> impossible to write two queries which can be shown to be logically
> equivalent but which optimize to different access plans to be, well, a
> bit "ivory tower."

Personally I think that's a fine goal to aim for. I'm not sure what
"to be considered a relational database" means but I consider a bug
whenever there's a case where this isn't true. It may be a bug that we
don't have a good solution for or a bug that's too minor for the
amount of effort it would require but it's still not right and if we
found a solution that we were happy with we would definitely want to
fix it.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: When is a record NULL?
Date: 2009-07-26 22:24:14
Message-ID: 20090726222414.GU5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jul 26, 2009 at 12:49:32PM -0500, Kevin Grittner wrote:
> Codd, E.F. (1990). The Relational Model for Database Management
> (Version 2 ed.). Addison Wesley Publishing Company.
> ISBN 0-201-14192-2.

Looks as though I've got some reading to do then--somewhat annoying that
only second hand copies available from the US, but never mind!

> I believe that he puts forward a list of about 200 things he feels
> should be true of a database in order for him to consider it a
> relational database. Since he was first and foremost a mathematician,
> and was something of a perfectionist, I don't think some of these are
> achievable (at least in the foreseeable future) without tanking
> performance, but it makes for an interesting read. I find most of it
> to be on target, and it gives a unique chance to see things from the
> perspective of the inventor of relational model for database
> management.

Yup, I've heard lots and read a few smaller articles but don't think
I've got around to any of his books.

> I don't, of course, agree with him on everything. If you think that
> the SQL standard date handling is weird, wait until you see how a
> perfectionist mathematician attempts to deal with it. :-) Also, the
> requirement that, to be considered a relational database, it must be
> impossible to write two queries which can be shown to be logically
> equivalent but which optimize to different access plans to be, well, a
> bit "ivory tower."

Sounds as though he's using a different definition than what I would
use, but I'm sure I'll find out.

--
Sam http://samason.me.uk/


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-26 22:46:19
Message-ID: 69F6A881-08D2-4C94-A925-A4BA2820DBDA@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 25, 2009, at 4:41 PM, David E. Wheeler wrote:

> Useless perhaps, but it's gonna happen, and someone may even have a
> reason for it. Until such time as NULLs are killed off, we need to
> be able to deal with SQL's pathologies.

And something I'd like to be able to handle in a while loop, as I'm
actually fetching one row at a time from two cursors and need to be
able to tell when I've reached the end of a cursor. This example
highlights the issue:

\set QUIET 1
SET client_min_messages = warning;
BEGIN;

CREATE TABLE peeps (
name TEXT NOT NULL,
dob date,
ssn text,
active boolean NOT NULL DEFAULT true
);

INSERT INTO peeps
VALUES ('Tom', '1963-03-23', '123-45-6789', true),
('Damian', NULL, NULL, true),
('Larry', NULL, '932-45-3456', true),
('Bruce', '1965-12-31', NULL, true);

CREATE TYPE dobssn AS ( dob date, ssn text );

CREATE FUNCTION using_loop() RETURNS SETOF dobssn LANGUAGE
plpgsql AS $$
DECLARE
stuff CURSOR FOR SELECT dob, ssn from peeps where active
ORDER BY name;
BEGIN
FOR rec IN stuff LOOP
RETURN NEXT rec;
END LOOP;
END;
$$;

CREATE FUNCTION using_while() RETURNS SETOF dobssn LANGUAGE
plpgsql AS $$
DECLARE
stuff CURSOR FOR SELECT dob, ssn from peeps where active
ORDER BY name;
rec dobssn;
BEGIN
open stuff;
FETCH stuff INTO rec;
WHILE NOT rec IS NULL LOOP
RETURN NEXT rec;
FETCH stuff INTO rec;
END LOOP;
END;
$$;

SELECT * FROM using_loop();
SELECT * FROM using_while();

ROLLBACK;

Output:

dob | ssn
------------+-------------
1965-12-31 |
|
| 932-45-3456
1963-03-23 | 123-45-6789
(4 rows)

dob | ssn
------------+-----
1965-12-31 |
(1 row)

So somehow the use of the loop to go right through the cursor can tell
the difference between a record that's all nulls and the when the end
of the cursor has been reached. My use of the while loop, however,
cannot tell the difference, and AFAICT, there is no way to detect the
difference in SQL. Is that correct? Is there some way to get
using_while() to properly return all the records?

FYI, using:

WHILE rec IS DISTINCT FROM NULL LOOP

Results in an infinite loop. So does:

WHILE NOT rec IS NOT DISTINCT FROM NULL LOOP

And this, of course:

WHILE rec IS NOT NULL LOOP

Returns no rows at all.

Surely someone has run into this before, no?

Thanks,

David


From: "Eric B(dot) Ridge" <ebr(at)tcdi(dot)com>
To: David E(dot) Wheeler <david(at)kineticode(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-26 23:02:53
Message-ID: 114FE040-D9BD-4000-9A49-74F191C67A71@tcdi.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 26, 2009, at 6:46 PM, David E. Wheeler wrote:

> Is there some way to get using_while() to properly return all the
> records?

I'm just a random lurker, but FOUND seems to work just fine (I suppose
it's PG-specific?).

http://www.postgresql.org/docs/8.1/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS

BEGIN
OPEN stuff;
FETCH stuff INTO rec;
WHILE FOUND LOOP
RETURN NEXT rec;
FETCH stuff INTO rec;
END LOOP;
END;

HTH,

eric


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: When is a record NULL?
Date: 2009-07-26 23:26:56
Message-ID: 20090726232656.GW5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jul 26, 2009 at 03:46:19PM -0700, David E. Wheeler wrote:
> And something I'd like to be able to handle in a while loop, as I'm
> actually fetching one row at a time from two cursors and need to be
> able to tell when I've reached the end of a cursor.

I'm sure I'm missing something obvious, but why doesn't the "FOUND"
magic variable tell you what you want?

--
Sam http://samason.me.uk/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Greg Stark" <gsstark(at)mit(dot)edu>
Cc: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: When is a record NULL?
Date: 2009-07-27 13:48:43
Message-ID: 4A6D69EB0200002500028D64@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> wrote:
> Kevin Grittner<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

>> impossible to write two queries which can be shown to be logically
>> equivalent but which optimize to different access plans
>
> Personally I think that's a fine goal to aim for.

Sure, but from my experience, there aren't any database products which
come close to having an optimizer which can do a good enough job that
it is yet feasible. I'm not sure I've even seen any which reliably
treat '(a AND b) OR (c and d)' as equivalent to 'NOT ((a OR b) AND (c
OR d))', much less the fairly common 'a > b OR (a = b AND c > d)' into
'a >= b AND (a > b OR c > d)'. Two commonly heard arguments on this
list are that:

(1) a particular improvement in this area is not worth it because it
would cost more in CPU time to recognize the equivalence that it would
save in run time from the better plan, and

(2) someone sometimes finds it useful to not recognize the equivalence
so that they can coerce a certain type of plan.

The latter really *is* a form of optimizer hint, it's just an
undocumented, arcane hint for the Illuminati.

But anyway, I didn't say that it was a bad thing toward which to
strive, just that it's so far from realization that as an absolute
requirement to be designated an RDBMS, it's a bit "ivory tower."

> I'm not sure what "to be considered a relational database" means

In that context I was talking about Codd's book where he provides a
list of properties which a DBMS should have to be considered, in his
opinion (as the inventor of the relational model for database
management) a relational database management system. It just occurs
to me that I think some of these entries were *required* for his
blessing, and others were just *desirable*. I don't remember which
designation he gave this point.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: When is a record NULL?
Date: 2009-07-27 14:09:56
Message-ID: 4A6D6EE40200002500028D6A@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

> I've heard lots and read a few smaller articles but don't think
> I've got around to any of his books.

Having just poked around on the Internet, I think perhaps this was his
only full-fledge book, per se. The rest of his work appears to have
been papers published in academia or with the ACM.

>> to be considered a relational database, it must be
>> impossible to write two queries which can be shown to be logically
>> equivalent but which optimize to different access plans
>
> Sounds as though he's using a different definition than what I would
> use, but I'm sure I'll find out.

I think that as the inventor of the relational model for database
management, he felt that things were being done using the name of the
technology which didn't match his vision of it. This book, and some
of his papers, seem to have been geared toward preserving the
integrity of his vision of RDBMS.

-Kevin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: When is a record NULL?
Date: 2009-07-27 15:41:05
Message-ID: 603c8f070907270841y1197b081x46e563bddb1bc852@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 27, 2009 at 9:48 AM, Kevin
Grittner<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> The latter really *is* a form of optimizer hint, it's just an
> undocumented, arcane hint for the Illuminati.

Well said.

...Robert


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Eric B(dot) Ridge <ebr(at)tcdi(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-28 00:19:44
Message-ID: 5BE64390-BB4B-4FCC-B3CF-4853471429D3@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 26, 2009, at 4:02 PM, Eric B. Ridge wrote:

> I'm just a random lurker, but FOUND seems to work just fine (I
> suppose it's PG-specific?).
>
> http://www.postgresql.org/docs/8.1/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS
>
> BEGIN
> OPEN stuff;
> FETCH stuff INTO rec;
> WHILE FOUND LOOP
> RETURN NEXT rec;
> FETCH stuff INTO rec;
> END LOOP;
> END;

Yep, that's just what I needed, thanks. I think I'll send a patch for
the "Cursors" section of the PL/pgSQL documentation that mentions
this. Would have saved me a bunch of hassle.

Best,

David


From: David E(dot) Wheeler <david(at)kineticode(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: When is a record NULL?
Date: 2009-07-28 19:22:34
Message-ID: F5020398-BEE9-467F-BD10-876F43EA87CE@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 27, 2009, at 5:19 PM, David E. Wheeler wrote:

> Yep, that's just what I needed, thanks. I think I'll send a patch
> for the "Cursors" section of the PL/pgSQL documentation that
> mentions this. Would have saved me a bunch of hassle.

So would have reading two more sentences of the docs, which state, to
my surprise (that I missed it!), “As with `SELECT INTO`, the special
variable <code>FOUND</code> can be checked to see whether a row was
obtained or not.” D’oh!

So now I've got things working the way I need, and I've blogged about
these SQL pathologies so that I don't forget.

http://www.justatheory.com/computers/databases/postgresql/neither-null-nor-not-null.html

Thanks for the help, everyone.

Best,

David