RFC for adding typmods to functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: RFC for adding typmods to functions
Date: 2009-11-17 22:09:27
Message-ID: 5192.1258495767@sss.pgh.pa.us
Lists: pgsql-hackers

Pavel submitted a patch to add typmods to function declarations, but there
was no prior design discussion and it desperately needs some. Let me try
to summarize the issues that seem to need agreement.

The proposed patch allows optional typmods to be attached to the declared
argument and result types of a function; for example you could say
"create function foo(numeric(2)) returns numeric(4)". (Note: in existing
releases, this syntax works but the typmod information is simply
discarded.) An immediate application, not implemented here but which
we'd like to have for 8.5, is multiple anyelement types -- for example,

    create function foo(anyelement, anyelement, anyelement(1), anyelement(1))
    returns anyelement(1)

says that the first and second arguments must be of the same type, the
third and fourth must also be of the same type but not necessarily the
same as the first two, and the result is of this second type.
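A minimal sketch of how such tagged polymorphic arguments might be matched, assuming the tag in anyelement(1) simply names an equivalence class of arguments. The `resolve` function and the tuple encoding are hypothetical illustrations, not anything in the patch or in PostgreSQL.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (declared type, class tag). The tag is the "typmod"
# written as anyelement(1); None stands for the untagged anyelement class.
Decl = Tuple[str, Optional[int]]


def resolve(args: List[Decl], result: Decl, actual: List[str]) -> Optional[str]:
    """Return the concrete result type, or None if the call does not match."""
    if len(args) != len(actual):
        return None
    bound: Dict[Optional[int], str] = {}
    for (declared, tag), typ in zip(args, actual):
        if declared != "anyelement":
            # Ordinary argument: this sketch requires an exact type match.
            if declared != typ:
                return None
            continue
        # All anyelement arguments sharing a tag must bind to one type.
        if bound.setdefault(tag, typ) != typ:
            return None
    declared, tag = result
    if declared != "anyelement":
        return declared
    return bound.get(tag)


FOO_ARGS = [("anyelement", None), ("anyelement", None),
            ("anyelement", 1), ("anyelement", 1)]
FOO_RESULT = ("anyelement", 1)

print(resolve(FOO_ARGS, FOO_RESULT, ["int4", "int4", "text", "text"]))  # text
print(resolve(FOO_ARGS, FOO_RESULT, ["int4", "text", "text", "text"]))  # None
```

The first call binds the untagged class to int4 and class 1 to text, so the result is text. The second fails because the two untagged arguments disagree.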

I can see the following definitional issues:

1. Are the typmods of input arguments part of the function signature,
ie, could foo(numeric(2)) and foo(numeric(3)) coexist? The proposed
patch answers "no, they are the same function and you can have only one".
This may be good enough, but there are some possible uses that we are
foreclosing by doing this. Two sample applications:

    foo(numeric)        a general-purpose function
    foo(numeric(2))     same definition but optimized for short inputs

    foo(anyelement, anyelement(1))  general case
    foo(anyelement, anyelement)     optimized for identical input types

The major obstacle to allowing such cases is that we'd need to invent new
ambiguous-function resolution rules that would let us figure out which
function to prefer for a given set of inputs, and it's not at all clear
how to do that --- in particular deciding that one is preferable to
another seems to require type-specific knowledge about the meaning of
different typmods. So that looks like a major can of worms, probably
requiring new APIs for custom data types.

A possible compromise is to say that you can have only one now but leave
the door open to allow more than one later. However, the function
signature is the function identity for many purposes, so it's hard to be
fuzzy about this. For example, given "CREATE FUNCTION foo(numeric(2))",
which of the following should drop the function?
    DROP FUNCTION foo(numeric(2));
    DROP FUNCTION foo(numeric);
    DROP FUNCTION foo(numeric(3));
The traditional behavior is that any of these would work, since the
typmod was ignored anyway. If the typmod means something then the
second one is a bit surprising and the third definitely doesn't
satisfy the POLA. Are we prepared to possibly break existing apps
now by disallowing the third and/or second?
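The choices for DROP FUNCTION can be laid out as a small model. This is only a sketch: the policy names "ignore", "lenient" and "strict" are hypothetical labels for the traditional behavior, for disallowing only the third form, and for disallowing both the second and third.

```python
from typing import Optional, Tuple

Sig = Tuple[str, Optional[int]]  # (type name, typmod or None if not given)


def drop_matches(declared: Sig, requested: Sig, policy: str) -> bool:
    """Would DROP FUNCTION foo(requested) remove foo(declared)?"""
    if declared[0] != requested[0]:
        return False
    if policy == "ignore":    # traditional: the typmod is discarded
        return True
    if policy == "lenient":   # a bare type name matches; a wrong typmod does not
        return requested[1] is None or requested[1] == declared[1]
    if policy == "strict":    # the typmod is part of the function identity
        return requested[1] == declared[1]
    raise ValueError(policy)


declared = ("numeric", 2)  # CREATE FUNCTION foo(numeric(2))
for policy in ("ignore", "lenient", "strict"):
    # Try foo(numeric(2)), foo(numeric), foo(numeric(3)) in that order.
    row = [drop_matches(declared, ("numeric", t), policy) for t in (2, None, 3)]
    print(policy, row)
# ignore [True, True, True]
# lenient [True, True, False]
# strict [True, False, False]
```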

2. What is the exact meaning of attaching a typmod to an input argument?
As the patch has it, doing so means nothing at all for the purposes of
resolving which function to call, and then once we have identified the
function we will attempt to apply an implicit coercion to the actual input
argument to make it match the typmod. The first part of that is probably
reasonable if you accept the "there can be only one" answer to point #1;
but if you don't then it's completely unworkable. In any case it's worth
noting that foo(anyelement, anyelement) will accept two arguments of the
same type but different typmods, which might surprise people. The second
part is trickier, in particular the fact that the coercion is implicit.
Up to now there have been only assignment and explicit coercions that
could try to apply a typmod to a value. Our existing API for coercion
functions (see the CREATE CAST man page if you don't recall details)
doesn't even provide a way for the coercion function to distinguish
implicit from assignment coercions. Maybe this is fine --- on that same
page we say it's bad design for coercion functions to pay attention to the
cast context anyhow. But we had better agree that it's okay for such
coercions to behave more like assignment than like a traditional implicit
cast. If you want to distinguish the cases, we need to break that API.

3. What is the exact meaning of attaching a typmod to a result or output
argument? There are two fundamentally different views you can take on
this point: that the typmod is an assertion that the function result
matches the typmod, or that the typmod requests a run-time coercion step
to make the result match the typmod. For C-level functions the first of
these seems more natural; after all we take it on faith that the result is
of the declared type. In particular, you *have to* adopt that viewpoint
towards the coercion functions of the type, because the system has no
other knowledge of what a typmod means than "the results of the type's
coercion functions have the correct properties for the given typmod
value". For PL functions I doubt we want to trust the function writer
completely that his results match the typmod, but should we adopt an
approach of "check the result" (and, presumably, throw error if it doesn't
meet the typmod) or "force a coercion" (and if so, with which semantics
--- explicit, assignment, implicit)? The former would require
infrastructure we have not currently got, ie, a "check typmod" function
for datatypes supporting typmods. The latter seems a bit ugly because it
gives PL functions a subtly different set of semantics from C functions.
In either case it seems we'd have to hope that all PL authors remember
to insert code to perform the check or coercion, or else we have a hole in
the type system: functions returning values that don't meet the typmod the
system thinks they do. We can fix all the built-in PLs, but I'll gladly
wager that at least one third-party PL will forget to deal with this, and
nobody will notice until it's reported as a security bug.
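The three possible treatments of a result typmod can be contrasted in a small model. This sketch makes several assumptions: numeric(p, s) is modeled with Python's Decimal, ties round away from zero, and the names `coerce`, `finish_result` and the policy labels are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def coerce(value: Decimal, precision: int, scale: int) -> Decimal:
    """Round to `scale` fractional digits; fail if too many digits remain."""
    rounded = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    if abs(rounded) >= Decimal(10) ** (precision - scale):
        raise ValueError("numeric field overflow")
    return rounded


def finish_result(value: Decimal, precision: int, scale: int,
                  policy: str) -> Decimal:
    """Apply a declared result typmod to a function's return value."""
    if policy == "trust":    # assertion view, as for C-level functions
        return value
    if policy == "coerce":   # run-time coercion step
        return coerce(value, precision, scale)
    if policy == "check":    # verify the result, raise an error on mismatch
        if coerce(value, precision, scale) != value:
            raise ValueError("result does not match declared typmod")
        return value
    raise ValueError(policy)


raw = Decimal("123.456")  # value produced by a function declared numeric(4)
print(finish_result(raw, 4, 0, "trust"))   # 123.456
print(finish_result(raw, 4, 0, "coerce"))  # 123
```

Under "trust" the value 123.456 escapes with a numeric(4) label, which is the hole in the type system described above. Under "check" the same call raises an error.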

4. What about functions whose output typmod should depend on the input
typmod(s)? I mentioned earlier the example that concatenation of
varchar(M) and varchar(N) should produce varchar(M+N). We could possibly
punt on this for the time being; supporting only fixed output typmods for
now doesn't obviously foreclose us from adding support for computed
typmods later. However there is still one nasty case that we cannot
push off till later: given a function that takes and returns a polymorphic
type such as anyelement, and an actual argument with a typmod (eg
numeric(2)), is the result numeric(2) or just numeric? As things stand
we would have little choice but to say the latter, because we don't know
what the function might do with the value, and there are too many real
cases where the result might not have the same typmod. But there are
also a lot of cases where you *would* want the result to keep the typmod,
and this patch raises the stakes for throwing away typmods mid-expression.
Is this okay, and if not what could we do about it?
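Both cases can be written as typmod-derivation rules. This is a sketch under assumptions: typmods are plain declared lengths or precisions (not any internal encoding), None means "no typmod", and `concat_typmod` and `passthrough_typmod` are hypothetical names.

```python
from typing import Optional


def concat_typmod(m: Optional[int], n: Optional[int]) -> Optional[int]:
    """varchar(M) || varchar(N) -> varchar(M+N); unknown if either is unbounded."""
    if m is None or n is None:
        return None
    return m + n


def passthrough_typmod(arg_typmod: Optional[int],
                       preserve: bool) -> Optional[int]:
    """Result typmod of a function taking and returning anyelement.

    preserve=False models the "just numeric" answer: the typmod is dropped
    because nothing is known about what the function does with the value.
    """
    return arg_typmod if preserve else None


print(concat_typmod(10, 20))                  # 30
print(concat_typmod(10, None))                # None
print(passthrough_typmod(2, preserve=False))  # None
print(passthrough_typmod(2, preserve=True))   # 2
```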

Unless we have consensus on all of these points I don't think we should
proceed with the patch. Comments?

regards, tom lane
