Rethinking user-defined-typmod before it's too late

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 16:14:45
Message-ID: 5146.1181924085@sss.pgh.pa.us

The current discussion about the tsearch-in-core patch has convinced me
that there are plausible use-cases for typmod values that aren't simple
integers. For instance it could be sane for a type to want a locale or
language selection as a typmod, eg tsvector('ru') or tsvector('sv').
(I'm not saying we are actually going to do that to tsvector, just that
it's now clear to me that there are use-cases for such things.)

Teodor's work a few months ago generalized things enough so that
something like this is within reach. The grammar will actually allow
darn near anything for a typmod, since the grammar production is
expr_list to avoid shift/reduce conflict with the very similar-looking
productions for function calls. The only place where we are
constraining what a typmod can be is that the defined API for
user-written typmodin functions is "integer array".

At the time that patch was being worked on, I think I argued that
integer typmods were enough because you'd have to pack them into such a
small output representation anyway. The hole in that logic is that you
might have a fairly small enumerated set of possibilities, but that
doesn't mean you want to make the user use a numeric code for them.
You could even make the typmod be an integer key for a lookup table,
if the set of possibilities is not hardwired.

Since this code hasn't been released yet, the API isn't set in stone
... but as soon as we ship 8.3, it will be, or at least changing it will
be orders of magnitude more painful than it is today. So, late as this
is in the devel cycle, I think now is the time to reconsider.

I propose changing the typmodin signature to "typmodin(cstring[]) returns
int4", that is, the typmods will be passed as strings not integers. This
will incur a bit of extra conversion overhead for the normal uses where
the typmods are integers, but I think the gain in flexibility is worth
it. I'm inclined to make the code in parse_type.c take either integer
constants, simple string literals, or unqualified names as input ---
so you could write either tsvector('ru') or tsvector(ru) when using a
type that wants a nonintegral typmod.
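
As a rough illustration of the proposed shape (string typmods in, int4 out), here is a minimal standalone sketch. It assumes a hypothetical type whose typmod is a language code. The names lang_typmodin, lang_typmodout and lang_codes are invented for this example, and it uses plain C strings rather than the real cstring[] array machinery of the backend.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical set of language codes; the array index is the stored typmod. */
static const char *const lang_codes[] = { "en", "ru", "sv" };

#define N_LANGS (sizeof(lang_codes) / sizeof(lang_codes[0]))

/*
 * Stand-in for a typmodin function: receives the typmod arguments as
 * strings and returns the int32 that would be stored. Returns -1 for
 * invalid input (backend code would raise an error instead).
 */
int32_t
lang_typmodin(const char *const *args, int nargs)
{
    size_t i;

    assert(nargs >= 0);
    if (nargs != 1 || args[0] == NULL)
        return -1;
    for (i = 0; i < N_LANGS; i++)
    {
        if (strcmp(args[0], lang_codes[i]) == 0)
            return (int32_t) i;
    }
    return -1;
}

/* Stand-in for typmodout: maps the stored integer back to its string. */
const char *
lang_typmodout(int32_t typmod)
{
    if (typmod < 0 || (size_t) typmod >= N_LANGS)
        return NULL;
    return lang_codes[typmod];
}
```

With this layout the user writes a readable code such as 'ru', while only a small integer is stored.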

Note that the typmodout side is already OK since it is defined to return
a string.

Comments?

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 16:22:42
Message-ID: 20070615162242.GJ13394@fetter.org

On Fri, Jun 15, 2007 at 12:14:45PM -0400, Tom Lane wrote:

[snip]

> I propose changing the typmodin signature to "typmodin(cstring[])
> returns int4", that is, the typmods will be passed as strings not
> integers. This will incur a bit of extra conversion overhead for
> the normal uses where the typmods are integers, but I think the gain
> in flexibility is worth it. I'm inclined to make the code in
> parse_type.c take either integer constants, simple string literals,
> or unqualified names as input --- so you could write either
> tsvector('ru') or tsvector(ru) when using a type that wants a
> nonintegral typmod.
>
> Note that the typmodout side is already OK since it is defined to
> return a string.
>
> Comments?

+1 :)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter



From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 16:40:54
Message-ID: 4672C116.6050007@sigaev.ru

> I propose changing the typmodin signature to "typmodin(cstring[]) returns
> int4", that is, the typmods will be passed as strings not integers. This
> will incur a bit of extra conversion overhead for the normal uses where
> the typmods are integers, but I think the gain in flexibility is worth
Agreed.

> it. I'm inclined to make the code in parse_type.c take either integer

And modify ArrayGetTypmods() to ArrayGetIntegerTypmods()

Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 16:59:36
Message-ID: 20070615165936.GK7531@tamriel.snowman.net

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> I propose changing the typmodin signature to "typmodin(cstring[]) returns
> int4", that is, the typmods will be passed as strings not integers. This
> will incur a bit of extra conversion overhead for the normal uses where
> the typmods are integers, but I think the gain in flexibility is worth
> it. I'm inclined to make the code in parse_type.c take either integer
> constants, simple string literals, or unqualified names as input ---
> so you could write either tsvector('ru') or tsvector(ru) when using a
> type that wants a nonintegral typmod.
>
Would this allow for 'multi-value' typmods for user-defined types?
That's something that would greatly help and simplify PostGIS. It was
brought up on the PostGIS lists here:
http://postgis.refractions.net/pipermail/postgis-users/2006-September/013086.html
and on -hackers here:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg81281.html

The 'geometry' type really needs to have a typmod which has the
dimensions, SRID and type of the geometry. At the moment the PostGIS
folks are using constraints and essentially a side-table to work around
this, which gets really, really ugly. It sounds like this might work
for them, and while it'd incur a bit of overhead to parse the string I'm
pretty sure it'd be worth it.

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 17:29:45
Message-ID: 6271.1181928585@sss.pgh.pa.us

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> I propose changing the typmodin signature to "typmodin(cstring[]) returns
>> int4", that is, the typmods will be passed as strings not integers.

> And modify ArrayGetTypmods() to ArrayGetIntegerTypmods()

Right --- the decoding work will only have to happen in one place for
our existing uses.
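
To make the "one place" point concrete, here is a hedged sketch of such a decoding helper. It is not the actual ArrayGetIntegerTypmods code. The name strings_to_int_typmods and its signature are assumptions for illustration, and it works on a plain array of C strings instead of a backend array datum.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert n string typmods into int32 values.
 * Returns 0 on success, or -1 if any element is empty, is not a decimal
 * integer, or falls outside the int32 range.
 */
int
strings_to_int_typmods(const char *const *args, int n, int32_t *out)
{
    int i;

    assert(n == 0 || (args != NULL && out != NULL));
    for (i = 0; i < n; i++)
    {
        char *end;
        long  val;

        if (args[i] == NULL || args[i][0] == '\0')
            return -1;
        errno = 0;
        val = strtol(args[i], &end, 10);
        /* Reject overflow, trailing junk, and values too wide for int32. */
        if (errno != 0 || *end != '\0' || val < INT32_MIN || val > INT32_MAX)
            return -1;
        out[i] = (int32_t) val;
    }
    return 0;
}
```

A type with integer typmods would call a helper like this first and then apply its usual range checks to the resulting integers.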

Is it worth providing an ArrayGetStringTypmods in core, when it won't
be used by any existing core datatypes?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 17:32:02
Message-ID: 6303.1181928722@sss.pgh.pa.us

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> Would this allow for 'multi-value' typmods for user-defined types?

If you can squeeze them into 31 bits of stored typmod, yes. That
may mean that you still need the side table (with stored typmod being a
lookup key for the table). But this gets you out of exposing that
detail to users.

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 18:29:16
Message-ID: 4672DA7C.1010503@sigaev.ru

> Is it worth providing an ArrayGetStringTypmods in core, when it won't
> be used by any existing core datatypes?
I don't think so: cstring[] is already a set of strings. I don't believe we
could offer something generally useful without some real-world examples.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 19:01:52
Message-ID: 20070615190152.GN7531@tamriel.snowman.net

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > Would this allow for 'multi-value' typmods for user-defined types?
>
> If you can squeeze them into 31 bits of stored typmod, yes. That
> may mean that you still need the side table (with stored typmod being a
> lookup key for the table). But this gets you out of exposing that
> detail to users.

I see, the user could put in:
geometry(123456789,MULTIPOLYGON,3);

But we'd only get 31 bits of room to encode that into. I'm not sure if
that's enough. :( At the moment there are three columns we're talking
about in the side-table:
SRID (integer)
TYPE (varchar(30))
DIMENSIONS (integer)

Now, the type is a small enumerated set, and we can probably limit
dimensions to a few bits (maybe one for 2d/3d, but we might have some
other cases...), and still be following the OGC standard, but I don't
think there are any restrictions on SRID beyond '32 bit integer'. As
such, I'm not sure if we can encode it all directly into 31 bits (which
would obviously be preferred to a side-table with each case we come
across being enumerated in it). Then again, at the *moment*, anyway,
the SRIDs we have only go up to about 32,000, so we could dedicate 16
bits to it and probably be alright.
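
The bit budget described above can be sketched as follows. The field widths (16 bits for SRID, 5 for the geometry type, 3 for dimensions) and the function names are assumptions chosen for illustration, not an actual PostGIS layout. Together they use 24 bits, which fits within the 31 available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths: 16 + 5 + 3 = 24 bits, within the 31 usable. */
#define SRID_BITS 16
#define TYPE_BITS 5
#define DIMS_BITS 3

/* Pack the three fields into a non-negative int32; -1 if a field overflows. */
int32_t
pack_geom_typmod(uint32_t srid, uint32_t type, uint32_t dims)
{
    if (srid >= (1u << SRID_BITS) ||
        type >= (1u << TYPE_BITS) ||
        dims >= (1u << DIMS_BITS))
        return -1;
    return (int32_t) ((srid << (TYPE_BITS + DIMS_BITS)) |
                      (type << DIMS_BITS) |
                      dims);
}

/* Recover the three fields from a typmod produced by pack_geom_typmod. */
void
unpack_geom_typmod(int32_t typmod, uint32_t *srid, uint32_t *type,
                   uint32_t *dims)
{
    uint32_t v = (uint32_t) typmod;

    assert(typmod >= 0);
    *dims = v & ((1u << DIMS_BITS) - 1);
    *type = (v >> DIMS_BITS) & ((1u << TYPE_BITS) - 1);
    *srid = (v >> (TYPE_BITS + DIMS_BITS)) & ((1u << SRID_BITS) - 1);
}
```

Under this layout an SRID such as 123456789 is rejected because it does not fit in 16 bits, which is exactly the limitation raised above.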

Any chance of this being increased? Obviously would like to avoid the
side-table, if possible.

Thanks!

Stephen


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 19:07:23
Message-ID: 200706152107.25785.peter_e@gmx.net

Am Freitag, 15. Juni 2007 18:14 schrieb Tom Lane:
> The current discussion about the tsearch-in-core patch has convinced me
> that there are plausible use-cases for typmod values that aren't simple
> integers.  For instance it could be sane for a type to want a locale or
> language selection as a typmod, eg tsvector('ru') or tsvector('sv').

That would also be very useful for the XML type with an optional XML schema
modification. I guess in a lot of use cases you would have to store the
mapping in a side table, if the typmod on disk remains an integer.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-15 20:38:03
Message-ID: 21886.1181939883@sss.pgh.pa.us

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> Any chance of this being increased?

No. Changing typmod to something other than int32 would require many
thousands of lines of diffs just in the core distro. I don't even want
to think about how much outside code would break.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Stephen Frost" <sfrost(at)snowman(dot)net>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-06-19 18:47:25
Message-ID: 1182278845.6855.335.camel@silverbirch.site

On Fri, 2007-06-15 at 15:01 -0400, Stephen Frost wrote:

> But we'd only get 31 bits of room to encode that into. I'm not sure if
> that's enough. :( At the moment there are three columns we're talking
> about in the side-table:
> SRID (integer)
> TYPE (varchar(30))
> DIMENSIONS (integer)
>
> Now, the type is a small enumerated set, and we can probably limit
> dimensions to a few bits (maybe one for 2d/3d, but we might have some
> other cases...), and still be following the OGC standard, but I don't
> think there are any restrictions on SRID beyond '32 bit integer'. As
> such, I'm not sure if we can encode it all directly into 31 bits (which
> would obviously be preferred to a side-table with each case we come
> across being enumerated in it). Then again, at the *moment*, anyway,
> the SRIDs we have only go up to about 32,000, so we could dedicate 16
> bits to it and probably be alright.

This is for type/column definitions, so you'd only have a problem if you
had more than 2 billion defined combinations of (SRID, TYPE, DIMENSIONS)
in the database. Admittedly this would need to cope with all user
defined typmods created during SQL execution e.g. X::typmod(A, B, C),
but ISTM that would never realistically be a problem.

The typmod function could cache the top ten combinations, etc.
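
A minimal in-memory sketch of the lookup-key idea follows, assuming the stored typmod is simply an index into a table of distinct (SRID, TYPE, DIMENSIONS) combinations. The names GeomCombo, intern_combo and lookup_combo are hypothetical, and a real implementation would keep the table in a catalog or user table rather than a static array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_COMBOS 64           /* arbitrary capacity for this sketch */

typedef struct
{
    int32_t srid;
    char    type[31];           /* room for varchar(30) plus terminator */
    int32_t dims;
} GeomCombo;

static GeomCombo combos[MAX_COMBOS];
static int32_t   ncombos = 0;

/*
 * Return the key for a combination, adding it to the table if it is new.
 * Returns -1 if the type name is too long or the table is full.
 */
int32_t
intern_combo(int32_t srid, const char *type, int32_t dims)
{
    int32_t i;

    assert(ncombos >= 0 && ncombos <= MAX_COMBOS);
    if (type == NULL || strlen(type) >= sizeof(combos[0].type))
        return -1;
    for (i = 0; i < ncombos; i++)
    {
        if (combos[i].srid == srid && combos[i].dims == dims &&
            strcmp(combos[i].type, type) == 0)
            return i;
    }
    if (ncombos == MAX_COMBOS)
        return -1;
    combos[ncombos].srid = srid;
    strcpy(combos[ncombos].type, type);     /* length checked above */
    combos[ncombos].dims = dims;
    return ncombos++;
}

/* Map a stored key back to its combination; NULL if the key is unknown. */
const GeomCombo *
lookup_combo(int32_t key)
{
    if (key < 0 || key >= ncombos)
        return NULL;
    return &combos[key];
}
```

Because keys are handed out per distinct combination, a full 32-bit SRID is no obstacle here. Only the number of distinct combinations is bounded by the key space.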

> Any chance of this being increased? Obviously would like to avoid the
> side-table, if possible.

If you had more than 2 billion permutations you'd definitely want that
in a table.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-07-17 05:02:57
Message-ID: 200707170502.l6H52vB09793@momjian.us


Is this something for 8.3 or 8.4?

---------------------------------------------------------------------------

Tom Lane wrote:
> The current discussion about the tsearch-in-core patch has convinced me
> that there are plausible use-cases for typmod values that aren't simple
> integers. For instance it could be sane for a type to want a locale or
> language selection as a typmod, eg tsvector('ru') or tsvector('sv').
> (I'm not saying we are actually going to do that to tsvector, just that
> it's now clear to me that there are use-cases for such things.)
>
> Teodor's work a few months ago generalized things enough so that
> something like this is within reach. The grammar will actually allow
> darn near anything for a typmod, since the grammar production is
> expr_list to avoid shift/reduce conflict with the very similar-looking
> productions for function calls. The only place where we are
> constraining what a typmod can be is that the defined API for
> user-written typmodin functions is "integer array".
>
> At the time that patch was being worked on, I think I argued that
> integer typmods were enough because you'd have to pack them into such a
> small output representation anyway. The hole in that logic is that you
> might have a fairly small enumerated set of possibilities, but that
> doesn't mean you want to make the user use a numeric code for them.
> You could even make the typmod be an integer key for a lookup table,
> if the set of possibilities is not hardwired.
>
> Since this code hasn't been released yet, the API isn't set in stone
> ... but as soon as we ship 8.3, it will be, or at least changing it will
> be orders of magnitude more painful than it is today. So, late as this
> is in the devel cycle, I think now is the time to reconsider.
>
> I propose changing the typmodin signature to "typmodin(cstring[]) returns
> int4", that is, the typmods will be passed as strings not integers. This
> will incur a bit of extra conversion overhead for the normal uses where
> the typmods are integers, but I think the gain in flexibility is worth
> it. I'm inclined to make the code in parse_type.c take either integer
> constants, simple string literals, or unqualified names as input ---
> so you could write either tsvector('ru') or tsvector(ru) when using a
> type that wants a nonintegral typmod.
>
> Note that the typmodout side is already OK since it is defined to return
> a string.
>
> Comments?
>
> regards, tom lane

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rethinking user-defined-typmod before it's too late
Date: 2007-07-17 05:11:29
Message-ID: 2233.1184649089@sss.pgh.pa.us

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Is this something for 8.3 or 8.4?

My goodness, you are a bit behind on the email. We fixed that a month ago.

regards, tom lane