Implicit coercions, choosing types for constants, etc (yet again)

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Implicit coercions, choosing types for constants, etc (yet again)
Date: 2002-11-10 18:58:57
Message-ID: 21045.1036954737@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

In the last episode of this long-running issue, we decided that life
would be better if we make small-enough integer constants be initially
typed as int2 rather than int4, and then auto-promote them up to int4
when necessary. This would solve problems with, for example,
SELECT ... WHERE smallintcol = 42
not using an index.

I tried doing this and immediately got a bunch of failures in the
regression tests. It looks like we still have more work to do in
thinking out how the system should deal with this stuff.

One instructive example:

regression=# SELECT SUBSTRING('1234567890' FROM 3);
substring
-----------
34567890
(1 row)

regression=# SELECT SUBSTRING('1234567890' FROM 3::int2);
substring
-----------
3
(1 row)

Why did that happen? Well, the parser has to choose among four possible
candidates for 2-parameter substring():

bit | pg_catalog | substring | bit, integer
bytea | pg_catalog | substring | bytea, integer
text | pg_catalog | substring | text, integer
text | pg_catalog | substring | text, text

When the second argument is initially typed as 'integer', the fourth
candidate is immediately dropped because it has no exact matches, and
then the third candidate ultimately wins because the unknown-literal
input is preferentially assumed to be of string category. But when
the second argument is initially 'int2', we fail to eliminate any
candidates on the exact-match test, and then the fourth candidate
wins on the test that counts the number of preferred types at positions
requiring coercion --- text is a preferred type and integer isn't.
So we end up using the pattern-match interpretation of substring(),
even though we had to apply an int2-to-text implicit coercion to do it.

I think this is pretty clearly a bug, or at least undesirable behavior,
even if we weren't thinking of changing the initial typing of constants.
You can get this behavior today in existing releases, if you happen to
use a smallint variable as the second argument of substring().

One way to fix this is to make cross-category coercions to text not be
implicit casts. (I've opined before that that's a bad idea, but not
gotten a lot of support for changing it. Now that we have a distinction
between implicit and assignment casts, perhaps we could consider making
coercions to text be assignment casts, as a compromise?)

Alternatively we could try to rejigger the rules followed by
func_select_candidate(), which seem a little ad-hoc anyway. I don't
have a better proposal to make at the moment, though.

Here's another problem:

regression=# select to_hex(42);
to_hex
--------
2a
(1 row)

regression=# select to_hex(42::int2);
ERROR: Function to_hex(smallint) does not exist
Unable to identify a function that satisfies the given argument types
You may need to add explicit typecasts

The problem here is that the parser can't choose between to_hex(integer)
and to_hex(bigint). Neither integer nor bigint is a preferred type, so
there's no basis to prefer one over the other.

I suppose we could fix this particular case by eliminating
to_hex(integer), or adding to_hex(smallint), but that seems a very
unsatisfying answer. I am wondering about adding some notion of
"conversion distance" associated with casts, and preferring choices
that require a smaller conversion distance; perhaps this could replace
the concept of "preferred type", too. But again I don't have a specific
proposal to make. Any thoughts?

The test case that was actually in the regression tests was

select to_hex(256*256*256 - 1) AS "ffffff";
! ERROR: Function to_hex(smallint) does not exist
! Unable to identify a function that satisfies the given argument types
! You may need to add explicit typecasts

Even had the parser resolved the overloaded to_hex call, this test would
have failed, because int2 multiplication overflows:

regression=# select 256::int2*256::int2*256::int2;
?column?
----------
0
(1 row)

I am thinking that it might be good to eliminate all the basic
arithmetic operators on int2, so that you get int4 as the minimum
width for arithmetic. But this cannot work unless we have some concept
like conversion distance, or the parser will just fail to make a choice
between int4, int8, etc alternatives.

In short: we still have a bunch of shortcomings in the type-resolution
process, and we can't allow small integers to be typed as int2 until
we fix 'em.

Comments? Ideas?

regards, tom lane


From: Dennis Björklund <db(at)zigo(dot)dhs(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implicit coercions, choosing types for constants, etc
Date: 2002-11-11 14:16:57
Message-ID: Pine.LNX.4.44.0211111425470.18875-100000@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 10 Nov 2002, Tom Lane wrote:

> In the last episode of this long-running issue, we decided that life
> would be better if we make small-enough integer constants be initially
> typed as int2 rather than int4, and then auto-promote them up to int4
> when necessary.

What kind of type system do postgresql (or SQL in general) use? I don't
know much about the postgresql codebase but I do know something about type
inference of functional languages.

It seems to me that one would like to have a polymorphic typesystem with
constraints to handle overloading, subtyping or whatever is needed in
SQL.

> This would solve problems with, for example,
> SELECT ... WHERE smallintcol = 42
> not using an index.

Using a suitable typesystem 42 could be said to have a type like

isInt z => z

which should be read that z is the type and isInt z is a constraint on z
saying that z is an integer type (that means for example that z can never
be instantiated to Bool). Then the use of smallintcol = 42 where
smallintcol has type int2 and where equality = is overloaded for types
Int2 -> Int2 -> Bool, Int4 -> Int4 -> Bool (and so on) would force 42 to
be the type Int2 in this case, since the first argument of = had type
Int2.

Is there anywhere I can read about the typesystem in SQL in general and
postgresql in particular?

There are a number of type systems in the functional world with properties
like this. Some very advanced and some simpler. I have a feeling from
reading this list that the type inference in postgresql as a bit ad hook.
But i've not read the source at all so it might be unfair to say such a
thing.

In the functional language Haskell there is also a defaulting rule that is
used if you end up with constants like 42 still with type isInt z => z. If
the type was left like this it just imply that any numeric type for 42
would do. In this case maybe z is defaulted to Int4. In most cases they
way you use the constant would force it to be a particular type.

Of the examples you gave in the letter I don't see anything that shouldn't
work with a more advanced typesystem like this. But I'm sure there are
other strange constructs in postgresql that I don't know about.

I would love to make some small toy implementation to try out things but
in the nearest future I don't have time for that. But this interests me so
maybe I'll give it a try some day (like next summer :-). At least I would
like to know more about how it works in postgresql today. It's possible
that what I talk about do not apply to SQL or that postgresql already
implements similar things.

--
/Dennis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dennis Björklund <db(at)zigo(dot)dhs(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implicit coercions, choosing types for constants, etc (yet again)
Date: 2002-11-11 16:56:35
Message-ID: 2710.1037033795@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

=?ISO-8859-1?Q?Dennis_Bj=F6rklund?= <db(at)zigo(dot)dhs(dot)org> writes:
> It seems to me that one would like to have a polymorphic typesystem with
> constraints to handle overloading, subtyping or whatever is needed in
> SQL.

What we have now is indeed pretty ad-hoc, but a full-blown type
inference system might be overkill. Not sure. It would be interesting
to look into it.

> Is there anywhere I can read about the typesystem in SQL in general and
> postgresql in particular?

There's the user's guide
http://developer.postgresql.org/docs/postgres/typeconv.html
and there's the source code (see src/backend/parser/, particularly
parse_coerce.c, parse_func.c, parse_oper.c). Not much in between
I'm afraid, but the source code is reasonably well-commented.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Implicit coercions, choosing types for constants, etc
Date: 2002-11-11 20:09:31
Message-ID: Pine.LNX.4.44.0211111937480.12428-100000@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane writes:

> One way to fix this is to make cross-category coercions to text not be
> implicit casts. (I've opined before that that's a bad idea, but not
> gotten a lot of support for changing it. Now that we have a distinction
> between implicit and assignment casts, perhaps we could consider making
> coercions to text be assignment casts, as a compromise?)

I thought we had agreed to make those explicit. In fact, I vaguely recall
you not liking that notion ...

> I suppose we could fix this particular case by eliminating
> to_hex(integer), or adding to_hex(smallint), but that seems a very
> unsatisfying answer. I am wondering about adding some notion of
> "conversion distance" associated with casts, and preferring choices
> that require a smaller conversion distance; perhaps this could replace
> the concept of "preferred type", too. But again I don't have a specific
> proposal to make. Any thoughts?

A couple of months ago I played around with the notion of adding a numeric
preference ("distance", as you call it) to the casts, but in the end this
solved only a small number of cases and created a big mess at the same
time. When you have to pick arbitrary distances, any algorithm will give
you arbitrary answers, after all.

I think we can construct a type precedence list using the existing catalog
information. Considering the example of choosing between int2->int4 and
int2->int8, the system could notice that there is an implicit cast
int4->int8 (and no implicit cast the other way), so int4 is "less" than
int8. (If there were an implicit cast int8->int4, then the system would
have to consider int4 and int8 to be equivalent and picking one at random
would be reasonable.)

--
Peter Eisentraut peter_e(at)gmx(dot)net