COALESCE and NULLIF semantics

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: COALESCE and NULLIF semantics
Date: 2009-09-08 18:18:02
Message-ID: 4AA6598A020000250002AB80@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Given the desire to defer pinning the type of a literal, for better
support of user defined types, I think the current semantics of all
the conditional expressions:

http://www.postgresql.org/docs/8.4/interactive/functions-conditional.html

are somewhat broken.

As a quick sample of something which I believe implements the correct
semantics for COALESCE and NULLIF, see the functions below. I'm not
suggesting that it would be best to treat these as functions, just
showing what I think the correct semantics would be. I also think
that the ease of definition shows that such an interpretation would
not necessarily be fairly characterized as a huge wart on the type
system.

create function "coalesce"(v1 unknown, v2 unknown)
returns unknown language plpgsql as
'begin if v1 is not null then return v1; end if; return v2; end;';

create function "coalesce"(v1 anyelement, v2 anyelement)
returns anyelement language plpgsql as
'begin if v1 is not null then return v1; end if; return v2; end;';

create function "nullif"(v1 unknown, v2 unknown)
returns unknown language plpgsql as
'begin if v1 = v2 then return null; end if; return v1; end;';

create function "nullif"(v1 anyelement, v2 anyelement)
returns anyelement language plpgsql as
'begin if v1 = v2 then return null; end if; return v1; end;';

select "coalesce"(null, null), pg_typeof(("coalesce"(null, null)));
coalesce | pg_typeof
----------+-----------
| unknown
(1 row)

select "coalesce"(null, '1'), pg_typeof(("coalesce"(null, '1')));
coalesce | pg_typeof
----------+-----------
1 | unknown
(1 row)

select "coalesce"('1', '2'), pg_typeof(("coalesce"('1', '2')));
coalesce | pg_typeof
----------+-----------
1 | unknown
(1 row)

select "coalesce"(null, '1'::text), pg_typeof(("coalesce"(null,
'1'::text)));
coalesce | pg_typeof
----------+-----------
1 | text
(1 row)

select "coalesce"(null, 1), pg_typeof(("coalesce"(null, 1)));
coalesce | pg_typeof
----------+-----------
1 | integer
(1 row)

The CASE predicate would also need to behave in this manner. Probably
GREATEST and LEAST, as well.

-Kevin


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-09 12:13:32
Message-ID: 1252498412.15729.9.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2009-09-08 at 13:18 -0500, Kevin Grittner wrote:
> As a quick sample of something which I believe implements the correct
> semantics for COALESCE and NULLIF, see the functions below.

You might want to show before and after, so it's clear what you are
suggesting to change.


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-09 13:38:55
Message-ID: 4AA7699F020000250002ABAE@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Tue, 2009-09-08 at 13:18 -0500, Kevin Grittner wrote:
>> As a quick sample of something which I believe implements the
>> correct semantics for COALESCE and NULLIF, see the functions below.
>
> You might want to show before and after, so it's clear what you are
> suggesting to change.

OK. The only time it would be different from current behavior is when
all parameters are of unknown type -- the result would be unknown
rather than text:

select "coalesce"(null, null), pg_typeof("coalesce"(null, null));
coalesce | pg_typeof
----------+-----------
| unknown
(1 row)

select coalesce(null, null), pg_typeof(coalesce(null, null));
coalesce | pg_typeof
----------+-----------
| text
(1 row)

select "coalesce"(null, '1'), pg_typeof("coalesce"(null, '1'));
coalesce | pg_typeof
----------+-----------
1 | unknown
(1 row)

select coalesce(null, '1'), pg_typeof(coalesce(null, '1'));
coalesce | pg_typeof
----------+-----------
1 | text
(1 row)

select "coalesce"('1', '2'), pg_typeof("coalesce"('1', '2'));
coalesce | pg_typeof
----------+-----------
1 | unknown
(1 row)

select coalesce('1', '2'), pg_typeof(coalesce('1', '2'));
coalesce | pg_typeof
----------+-----------
1 | text
(1 row)

All other situations would follow current behavior. For example, this
is the same either way:

select "coalesce"(null, 1), pg_typeof("coalesce"(null, 1));
coalesce | pg_typeof
----------+-----------
1 | integer
(1 row)

I believe this would work better both for those coming from a straight
SQL standard perspective and for those who want to treat user defined
types as first class types.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-09 14:25:34
Message-ID: 29989.1252506334@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> OK. The only time it would be different from current behavior is when
> all parameters are of unknown type -- the result would be unknown
> rather than text:

The difficulty with that is that it implies eventually having to coerce
from unknown to something else, only at runtime instead of parse time.
There is not actually any such thing as a runtime coercion from unknown.
What there is is parse-time determination of the type of a literal
constant.

Now admittedly there's probably not any major technical obstacle to
making a runtime conversion happen --- it's merely delayed invocation of
the destination type's input function. But I find it really ugly from a
theoretical point of view. Doing calculations with "unknown" values
just seems wrong. As an example consider

INSERT INTO tab (date_column) VALUES(COALESCE('2009-09-09', 'boo'));

If we made it work like you suggest, the system would never notice
that 'boo' is not a legal value of type date. I don't find that
to be a good idea.

For NULLIF the concept fails entirely, because you *can not* compare two
values without having determined what data type you intend to treat them
as. Ex: is '007' different from '7'?

regards, tom lane


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-09 21:00:03
Message-ID: 20090909210003.GN5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 09, 2009 at 10:25:34AM -0400, Tom Lane wrote:
> Now admittedly there's probably not any major technical obstacle to
> making a runtime conversion happen --- it's merely delayed invocation of
> the destination type's input function. But I find it really ugly from a
> theoretical point of view. Doing calculations with "unknown" values
> just seems wrong.

It's pretty grim. This seems to be some strange halfway house on the
way to real type-inference, with broken semantics to boot. How would it
prevent weirdos like:

SELECT 'msg'||v, date_trunc('year',v), v+10
FROM (SELECT 'hi ho') x(v);

If Kevin wants something like this it should either be real
type-inference or leave it as it is. The problem is that each variable
should have exactly one type, type inference lets you delay finding that
type until later down the tree and then back-propagate it back up again.
Your suggestions seems to allow variables to have as many types as it
wants.

--
Sam http://samason.me.uk/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 16:37:43
Message-ID: 4AAA3687020000250002ACE8@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> On Wed, Sep 09, 2009 at 10:25:34AM -0400, Tom Lane wrote:
>> Now admittedly there's probably not any major technical obstacle to
>> making a runtime conversion happen --- it's merely delayed
>> invocation of the destination type's input function. But I find it
>> really ugly from a theoretical point of view. Doing calculations
>> with "unknown" values just seems wrong.

Agreed. I did say that I didn't actually want to actually turn these
into functions -- I was just putting together a simple demonstration
(usable only for simple test cases) of what I thought the parse-time
behavior should be, to facilitate discussion. I *was* thinking that
showing that COALESCE could behave that way for simple cases with two
one-line plpgsql functions might show that the semantics weren't
excessively bizarre.

I don't think that the suggestion could be a problem for COALESCE. In
fact, I think somewhere in another thread, Tom conceded that much, but
(understandably) didn't want that one form of CASE behaving
differently than everything else did. I'll try to address that.

The first point is that if any of the expressions used for any result
value in one of these predicates is typed, nothing at all would
change. This is only about the behavior when each result value is
NULL or an untyped literal. All subsequent comments assume that, to
avoid the tedium of restating it each time.

I don't think explicit CASE predicates in either form would be a
problem, because there is nothing to suggest a connection between a
literal in the expression *which chooses* a result value and a literal
*used as* a result value.

I think that LEAST and GREATEST are a lost cause in terms of changing
much, since there are obviously compares to be made using *some* type
before a value can be derived -- at least if there is more than one
non-NULL value. Since these are PostgreSQL extensions which don't
even behave consistently with other products' extensions using the
same words, I'm not too concerned about them being "irregular". (I'm
not sure what the justification for the current behavior would even be
-- since NULL means *unknown*, how can you declare that you know the
greatest or least value in a set of values when any are unknown? It
seems like these should be named LEAST_KNOWN and GREATEST_KNOWN for
their current semantics.) In any event, the current behavior is to
treat them as text; I don't think we can improve on that, beyond
perhaps using unknown if all values are NULL, or all but one are NULL
and the remaining one is an untyped literal. Not sure whether that's
sane or worth it.

NULLIF presents a problem only with two arguments which are *both*
untyped literals. That case currently resolves to text. If both are
NULL, or one is NULL and the other is an untyped literal, I don't see
how there is a problem declaring the result type as unknown. I think
it would be sane to continue using text with two untyped literals.
This would require users to declare the type of one or both literals
if they want something else. (Frankly, I've never had a use for
NULLIF; it seems like a kludge which is there to encourage
substitution of magic values for NULL and then allow those magic
values to be transformed back to NULL on demand. Does anybody who
expects sane behavior really use this?)

Finally, there is one "minor" extension to what I said above. Any of
these conditional expressions which evaluate to an untyped literal or
NULL would be considered the same as a bare untyped literal or NULL
for all purposes, including their use in an enclosing conditional
expression. I don't *think* that adds a lot of complexity to the
issue, but I'm not sure on that one.

> It's pretty grim. This seems to be some strange halfway house on
> the way to real type-inference, with broken semantics to boot. How
> would it prevent weirdos like:
>
> SELECT 'msg'||v, date_trunc('year',v), v+10
> FROM (SELECT 'hi ho') x(v);

I don't see where what I'm proposing would change the behavior of that
at all. I'm only proposing parse-time changes for conditional
expressions -- the CASE predicate and its abbreviations.

I have looked at the code where the parser resolves types for these.
I think it would be within my skill set to produce a patch if others
agree this makes sense; although so far such agreement doesn't seem
too likely. :-(

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org, "Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 16:59:04
Message-ID: 22373.1252688344@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> I'm only proposing parse-time changes for conditional
> expressions -- the CASE predicate and its abbreviations.

No, you are not; you are proposing run-time changes, specifically the
need to coerce unknown to something else long after the point where
the unknown is just a literal constant.

As far as I can see, this entire discussion turns on the complaint that
IS NULL gives different results for plain NULL and ROW(NULL,NULL,...);
if that weren't true then we wouldn't be arguing about whether COALESCE
is wrong. We really ought to be focusing on that and not making random
adjustments to the behavior of "unknown".

I've been wondering whether it would be sensible to make the
composite-datum constructors check for all-null fields and generate
a plain NULL if so. If so then ROW(NULL,NULL) would be
indistinguishable from NULL and the semantic gripes seem to largely
go away. It would be a problem for anyone who actually wanted to
distinguish those two cases, but how much do we care?

regards, tom lane


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 17:24:22
Message-ID: 20090911172422.GP5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 11, 2009 at 12:59:04PM -0400, Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> > I'm only proposing parse-time changes for conditional
> > expressions -- the CASE predicate and its abbreviations.
>
> No, you are not; you are proposing run-time changes, specifically the
> need to coerce unknown to something else long after the point where
> the unknown is just a literal constant.

One thing I've just realized these discussions have pointed out is
that PG isn't doing the correct thing all the time with types. When
is it ever valid to see an "unknown" after type checking? AFAICT, it
shouldn't ever appear and hence doing:

CREATE VIEW v AS SELECT 'foo';

Should be doing the normal default to TEXT type. Is that right? or does
"unknown" have more meaning besides just being something that needs to
be fixed up during type checking.

> I've been wondering whether it would be sensible to make the
> composite-datum constructors check for all-null fields and generate
> a plain NULL if so. If so then ROW(NULL,NULL) would be
> indistinguishable from NULL and the semantic gripes seem to largely
> go away. It would be a problem for anyone who actually wanted to
> distinguish those two cases, but how much do we care?

I'd prefer these semantics; it would make it do "the right thing" in
more cases than now.

--
Sam http://samason.me.uk/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 17:26:45
Message-ID: 4AAA4205020000250002ACF6@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> I'm only proposing parse-time changes for conditional
>> expressions -- the CASE predicate and its abbreviations.
>
> No, you are not; you are proposing run-time changes, specifically
> the need to coerce unknown to something else long after the point
> where the unknown is just a literal constant.

I was thinking of changing what is currently done, for example, here:

newc->coalescetype = select_common_type(pstate, newargs, "COALESCE",
NULL);

Is that so late as you say, or is there a reason that can't work?

> As far as I can see, this entire discussion turns on the complaint
> that IS NULL gives different results for plain NULL and
> ROW(NULL,NULL,...);

No, I'm not proposing any change to that. (Others are, but that's not
my focus, personally.)

> if that weren't true then we wouldn't be arguing about whether
> COALESCE is wrong.

Yeah, I am. When you have queries built based on which fields on a
QBE window are filled by a user, it's not hard to come up with a
clause like:

AND (somedate < COALESCE(NULL, NULL) OR ...)

We solved this by modifying our framework to pass down metadata about
the values in addition to the values themselves. We were always able
to look at an object's class to generate the correct literal type -- a
Date object would generate a DATE '2009-09-11' format literal; but a
NULL had been bare in that situation. We now generate CAST(NULL AS
type) whenever we insert a NULL literal, so we are no longer burned by
this. I'm just thinking that it would reduce pain for others.

-Kevin


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 17:29:09
Message-ID: 20090911172909.GQ5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 11, 2009 at 06:24:22PM +0100, Sam Mason wrote:
> One thing I've just realized these discussions have pointed out is
> that PG isn't doing the correct thing all the time with types. When
> is it ever valid to see an "unknown" after type checking? AFAICT, it
> shouldn't ever appear and hence doing:
>
> CREATE VIEW v AS SELECT 'foo';
>
> Should be doing the normal default to TEXT type. Is that right? or does
> "unknown" have more meaning besides just being something that needs to
> be fixed up during type checking.

Doh, sorry I shouldn't have sent that. I wanted to spend some time to
see if I could find any other examples, but I hit send by accident.

--
Sam http://samason.me.uk/


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 17:33:10
Message-ID: 20090911173310.GR5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 11, 2009 at 12:26:45PM -0500, Kevin Grittner wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > if that weren't true then we wouldn't be arguing about whether
> > COALESCE is wrong.
>
> Yeah, I am. When you have queries built based on which fields on a
> QBE window are filled by a user, it's not hard to come up with a
> clause like:
>
> AND (somedate < COALESCE(NULL, NULL) OR ...)
>
> We solved this by modifying our framework to pass down metadata about
> the values in addition to the values themselves.

You need a *much* more invasive change to fix this. PG's type checker
only looks one level deep when choosing what types to replace "unknown"
with; what you you want is full type-inference as it's only that which
will allow you to track back up the layers and assign consistent types
to arbitrary expressions like the above.

--
Sam http://samason.me.uk/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org, "Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 17:37:00
Message-ID: 22996.1252690620@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> I was thinking of changing what is currently done, for example, here:

> newc->coalescetype = select_common_type(pstate, newargs, "COALESCE",
> NULL);

> Is that so late as you say, or is there a reason that can't work?

It's what happens afterwards that's the problem --- try it and see.

> Yeah, I am. When you have queries built based on which fields on a
> QBE window are filled by a user, it's not hard to come up with a
> clause like:

> AND (somedate < COALESCE(NULL, NULL) OR ...)

Right. The only real way to fix that is to propagate the later
discovery that type 'date' would be preferred back to the inputs of the
COALESCE, which is what Sam Mason has been on about (IIUC). I'm afraid
that such a thing would make the behavior even more full of surprises
than what we have now. Resolving unknown from context is already
"action at a distance", as it were, and the longer the distance involved
the more chance for unexpected behavior. Not to mention the
implementation difficulties.

> We solved this by modifying our framework to pass down metadata about
> the values in addition to the values themselves. We were always able
> to look at an object's class to generate the correct literal type -- a
> Date object would generate a DATE '2009-09-11' format literal; but a
> NULL had been bare in that situation. We now generate CAST(NULL AS
> type) whenever we insert a NULL literal, so we are no longer burned by
> this. I'm just thinking that it would reduce pain for others.

Of course that's what the SQL spec would tell you to do anyway ;-)

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 17:41:21
Message-ID: 4AAA4571020000250002ACFD@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

> what you you want is full type-inference as it's only that which
> will allow you to track back up the layers and assign consistent
> types to arbitrary expressions like the above.

Well, obviously that would fix it; I'm not clear on why *only* that
would fix it. It seemed to me that we wouldn't have to go back up
like that if we deferred the assignment of a type in conditional
expressions. I've only scanned that part of the code, so it's well
within the range of possibility that I misunderstood something, but I
thought the type assigned to a CASE or COALESCE is used in the context
of evaluating enclosing expressions on the way *down*, no?

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 18:01:02
Message-ID: 4AAA4A0E020000250002AD05@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> I was thinking of changing what is currently done, for example,
>> here:
>
>> newc->coalescetype = select_common_type(pstate, newargs,
>> "COALESCE", NULL);
>
>> Is that so late as you say, or is there a reason that can't work?
>
> It's what happens afterwards that's the problem --- try it and see.

Anything in particular I should test or be looking for, or will the
error of my ways be glaringly obvious on any usage?

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org, "Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 18:10:44
Message-ID: 23591.1252692644@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It's what happens afterwards that's the problem --- try it and see.

> Anything in particular I should test or be looking for, or will the
> error of my ways be glaringly obvious on any usage?

I'm expecting coerce_type to fail, along the lines of
ERROR: failed to find conversion function from unknown to whatever

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 21:19:20
Message-ID: 1252703960.9975.89.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2009-09-11 at 12:59 -0400, Tom Lane wrote:
> If so then ROW(NULL,NULL) would be
> indistinguishable from NULL and the semantic gripes seem to largely
> go away. It would be a problem for anyone who actually wanted to
> distinguish those two cases, but how much do we care?

Does that violate the standard?

To make that interpretation work I think you would need to say that
ROW(NULL,NULL) _is_ the null value, and you would have to allow things
like:

select 1 + row(null,null);

which seems strange to me.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 21:35:07
Message-ID: 11581.1252704907@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> To make that interpretation work I think you would need to say that
> ROW(NULL,NULL) _is_ the null value,

Right...

> and you would have to allow things like:

> select 1 + row(null,null);

Eh? It's a null value of a composite type. The above is a type
violation.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>,"Sam Mason" <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 22:10:04
Message-ID: 4AAA846C020000250002ADBA@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm expecting coerce_type to fail, along the lines of
> ERROR: failed to find conversion function from unknown to whatever

OK. After playing with that and reading the code in more depth, I now
see what you've been saying.

[picks up lance and takes aim at windmill]

It still seems solvable without getting too extreme.

It seems to me that all of the conditional expressions besides the
ones using the CASE keyword *could* be resolved down to a literal of
unknown type at parse time, since they fall into this behavior *only*
when all parameters are NULL or literals of unknown type. Behavior
would be deterministic at parse time. The biggest argument against
doing this is that it would be bad to have behavior for the CASE
abbreviations which doesn't match the behavior of the CASE predicate
itself. Less trivial than what I thought was needed, but doable.

[spurs donkey to a trot]

I think the explicit CASE predicate can be solved, too.

The thing which makes the CASE predicate harder, is that you can have
the information that the result is guaranteed to be NULL or an untyped
literal, but you might not know *which* of the values will be chosen
until run time. (I don't know why it took me this long to see that
distinction. Oh, well.) We currently coerce all of the return values
to text for this. OK. But...

[breaks into a gallop]

... you know that it *will* be a NULL or a literal of unknown type,
and at parse time you can determine *which* typinput function will
need to be used once the actual value is determined at run time.

[reaches windmill]

So, couldn't the plan just include something to call the typinput at
run time against the text value we're already building up?

[brushes dust from clothing]

What did I miss this time?

-Kevin


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 22:13:44
Message-ID: 1252707224.9975.118.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2009-09-11 at 17:35 -0400, Tom Lane wrote:
> Eh? It's a null value of a composite type. The above is a type
> violation.

The spec calls it "the null value" which is included in all domains
(Framework 4.4.2). However, in the same section, it mentions "the data
type of the null value", so apparently each null does have a specific
type.

It seems to me like the spec would have something to say about
ROW(NULL,NULL) versus NULL. Do other systems make a distinction?

Regards,
Jeff Davis


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 22:37:18
Message-ID: 20090911223718.GS5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 11, 2009 at 01:37:00PM -0400, Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> > Yeah, I am. When you have queries built based on which fields on a
> > QBE window are filled by a user, it's not hard to come up with a
> > clause like:
>
> > AND (somedate < COALESCE(NULL, NULL) OR ...)
>
> Right. The only real way to fix that is to propagate the later
> discovery that type 'date' would be preferred back to the inputs of the
> COALESCE, which is what Sam Mason has been on about (IIUC).

Yup; sounds right. The fact that the inputs to COALESCE here are just
simple NULL literals is making the example seem too simple. The general
case is that of them being an arbitrary expression and you somehow need
to get the DATE type all the way back up to the top literal and make
sure that no other branch uses it as anything else. This was what my
example with using a sub-select was about, maybe this would be better
though:

SELECT date '2001-1-1' < COALESCE(NULL,v)
FROM (SELECT NULL) x(v)
WHERE v = 10;

This should fail to type check; v is being treated as both a DATE and
an INT. Getting a compiler to do this is pretty easy (store the type
constraints some where else during type checking and then make sure they
all match at the end), but isn't the way PG works at the moment.

> I'm afraid
> that such a thing would make the behavior even more full of surprises
> than what we have now. Resolving unknown from context is already
> "action at a distance", as it were, and the longer the distance involved
> the more chance for unexpected behavior. Not to mention the
> implementation difficulties.

Most of my experience says that type-inference actually makes things
easier. If you're dealing with dynamically *checked* languages then I
can see where you comment comes from; but for typed languages, where
everything has to be perfect at compile time, then this doesn't seem to
be true. This is why people who use Haskell tend to be the ones saying
things like "when it type checks you can be pretty certain it's going to
work". I'm not saying we should be going that complicated, just that
in my experience more complicated type systems imply simpler and more
understandable runtime behavior.

Implementation is a bit harder, but it won't be much more complicated
than what PG already has. It's already dealing with most of the issues
(in a somewhat ad-hoc way) and I'd expect that getting type-inference in
would help clean other things up a bit.

--
Sam http://samason.me.uk/


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COALESCE and NULLIF semantics
Date: 2009-09-11 23:01:54
Message-ID: 20090911230153.GT5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 11, 2009 at 12:41:21PM -0500, Kevin Grittner wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
>
> > what you you want is full type-inference as it's only that which
> > will allow you to track back up the layers and assign consistent
> > types to arbitrary expressions like the above.
>
> Well, obviously that would fix it; I'm not clear on why *only* that
> would fix it.

Because, I think, if you did come up with "another" solution and gave it
another name most type-theorists would call it type-inference anyway.

Type inference is just a general idea and is implemented in lots of
different ways depending on the specifics of the problem. You could
argue that PG has a limited form of type inference already.

> It seemed to me that we wouldn't have to go back up
> like that if we deferred the assignment of a type in conditional
> expressions. I've only scanned that part of the code, so it's well
> within the range of possibility that I misunderstood something, but I
> thought the type assigned to a CASE or COALESCE is used in the context
> of evaluating enclosing expressions on the way *down*, no?

Maybe we're using different terms; but when a literal is declared you
don't know what type it is, just that it needs at most one. It's only
later on when the variable is actually used that you find out what its
type constraints are.

--
Sam http://samason.me.uk/