Re: Type Categories for User-Defined Types

Lists: pgsql-hackers
From: David E(dot) Wheeler <david(at)kineticode(dot)com>
To: pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Type Categories for User-Defined Types
Date: 2008-07-29 18:53:57
Message-ID: 1E80EF30-FA7F-467D-B1DE-4D56DAE5246E@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 29, 2008, at 11:41, Tom Lane wrote:

> Okay, it's committed with minor revisions --- the biggest thing I
> fixed
> was the lack of an uninstall script.

Great, thanks!

> I saw what you were talking about in terms of still having some
> casting
> issues: having to put in a quote_literal(citext) alias function seems
> like a huge hack,

Yes, and I've been adding more hacks along the lines of:

CREATE OR REPLACE FUNCTION int8(citext)
RETURNS int8
AS 'SELECT int8( $1::text )'
LANGUAGE SQL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION citext(int8)
RETURNS citext
AS 'SELECT text( $1 )::citext'
LANGUAGE SQL IMMUTABLE STRICT;

CREATE CAST (int8 AS citext) WITH FUNCTION citext(int8) AS ASSIGNMENT;
CREATE CAST (citext AS int8) WITH FUNCTION int8(citext);

I've been doing this for all the types, writing tests to see how text
behaves and replicating it with these hack functions. No, it's not
ideal.

> and I notice that cases like
>
> contrib_regression=# select 'a'::text || 'b'::citext;
> ERROR: operator is not unique: text || citext
>
> still don't work even though you put in an alias || operator.

Damn, I didn't even notice that! Can that be fixed?

> It seems to me that trying to fix these things retail is a losing
> proposition. The reason you need these, instead of having everything
> "just work" like varchar does, is that citext isn't seen as a member
> of the string type category, and so the "preferred type" preference
> for
> text isn't applied. What we ought to do about that IMHO is make a
> way for user-defined types to declare what category they belong to.
> This has been foreseen as needed for a *very* long time, but we never
> really had a forcing function to make us do it before.

Yes, this would be a *much* nicer way to do it, IMO.

> Obviously the solution should involve a new column in pg_type and
> a new type property in CREATE TYPE, but what should the representation
> be? A full-on approach would make the type categories be real SQL
> objects with their own system catalog and reference them by OID,
> but I can't help thinking that that's overkill.

It kinda sounds that way, yeah. What happens with DOMAINs, BTW? Do
they need to write hacky functions like the above, or are they aware
of their types because of the types from which they inherit?

> Anyway, debating that is probably material for a separate thread ...

Here you go! ;-)

Thanks again for the commit, Tom.

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 20:12:09
Message-ID: 15217.1217362329@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Jul 29, 2008, at 11:41, Tom Lane wrote:
>> and I notice that cases like
>> contrib_regression=# select 'a'::text || 'b'::citext;
>> ERROR: operator is not unique: text || citext
>> still don't work even though you put in an alias || operator.

> Damn, I didn't even notice that! Can that be fixed?

Given the present infrastructure I think the only way would be with
two more alias operators, text||citext and citext||text. But that way
madness lies.

>> Obviously the solution should involve a new column in pg_type and
>> a new type property in CREATE TYPE, but what should the representation
>> be? A full-on approach would make the type categories be real SQL
>> objects with their own system catalog and reference them by OID,
>> but I can't help thinking that that's overkill.

> It kinda sounds that way, yeah. What happens with DOMAINs, BTW? Do
> they need to write hacky functions like the above, or are they aware
> of their types because of the types from which they inherit?

Domains are treated as their base types in general. Elein has been
complaining about that for years ;-) ... but I think improving it
is unrelated to this issue.

>> Anyway, debating that is probably material for a separate thread ...

> Here you go! ;-)

After a quick look to verify my recollection: the only two things
that the system does with type categories are

extern CATEGORY TypeCategory(Oid type);

Returns the category a type belongs to.

extern bool IsPreferredType(CATEGORY category, Oid type);

Detects whether a type is a preferred type in its category (there can
be more than one preferred type in a category, and in fact the
traditional setup is that *every* user-defined type is a preferred
type in the USER_TYPE category).

The categories themselves are pretty much opaque values, except that
parse_func.c has special behavior to prefer STRING_TYPE when in doubt.

So this can fairly obviously be replaced by two new pg_type columns,
say "typcategory" and "typpreferred", where the latter is a bool.
Since the list of categories is pretty short and there's no obvious
reason to extend it a lot, I propose that we just represent typcategory
as a "char", using a mapping along the lines of

BITSTRING_TYPE b
BOOLEAN_TYPE B
DATETIME_TYPE D
GENERIC_TYPE P (think "pseudotype")
GEOMETRIC_TYPE G
INVALID_TYPE \0 (not allowed in catalog anyway)
NETWORK_TYPE n
NUMERIC_TYPE N
STRING_TYPE S
TIMESPAN_TYPE T
UNKNOWN_TYPE u
USER_TYPE U

Users would be allowed to select any single ASCII character as the
"category" of a user-defined type, should they have a need to make their
own new category. Of course CREATE TYPE's default is category = U and
preferred = true for backward compatibility reasons. We could put down
a rule that system-defined categories are always upper or lower case
letters (or even always upper, if we wanted to strain some of the
assignments a bit) so that it's clear what can be used for a
user-defined category.

It might possibly be worth making new categories for arrays, composites,
and enums; they're currently effectively USER_TYPE but that doesn't seem
quite right. Also, the rules for domains should likely be "same
category as base type, never a preferred type" instead of the current
behavior where they're user types. (I think the latter doesn't really
matter now, because we always smash a domain to its base type before
inquiring about categories anyway. But it might give Elein a bit more
room to maneuver with the functions-on-domains issue.)

A possible objection is that this will make TypeCategory and
IsPreferredType slower than before, since they'll involve a syscache
lookup instead of a simple switch statement. I don't think this will
be too bad though; all the paths they are used in are full of catalog
lookups anyway, so it's hard to credit that there would be much
percentage slowdown.

Thoughts?

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 20:24:58
Message-ID: 36F67A2D-BC51-4919-A46F-6C35CE1415C0@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 29, 2008, at 13:12, Tom Lane wrote:

>> Damn, I didn't even notice that! Can that be fixed?
>
> Given the present infrastructure I think the only way would be with
> two more alias operators, text||citext and citext||text. But that way
> madness lies.

I suppose, then, that you're saying that there are lots of other
functions for which this sort of thing would need to be done? Because
two more aliases for this one operator is no big deal, AFAIC.

>> It kinda sounds that way, yeah. What happens with DOMAINs, BTW? Do
>> they need to write hacky functions like the above, or are they aware
>> of their types because of the types from which they inherit?
>
> Domains are treated as their base types in general. Elein has been
> complaining about that for years ;-) ... but I think improving it
> is unrelated to this issue.

I see.

> After a quick look to verify my recollection: the only two things
> that the system does with type categories are
>
> extern CATEGORY TypeCategory(Oid type);
>
> Returns the category a type belongs to.
>
> extern bool IsPreferredType(CATEGORY category, Oid type);
>
> Detects whether a type is a preferred type in its category (there can
> be more than one preferred type in a category, and in fact the
> traditional setup is that *every* user-defined type is a preferred
> type in the USER_TYPE category).

Perhaps tangential: What does it mean for a type to be "preferred"?

> The categories themselves are pretty much opaque values, except that
> parse_func.c has special behavior to prefer STRING_TYPE when in doubt.
>
> So this can fairly obviously be replaced by two new pg_type columns,
> say "typcategory" and "typpreferred", where the latter is a bool.
> Since the list of categories is pretty short and there's no obvious
> reason to extend it a lot, I propose that we just represent
> typcategory
> as a "char", using a mapping along the lines of
>
> BITSTRING_TYPE b
> BOOLEAN_TYPE B
> DATETIME_TYPE D
> GENERIC_TYPE P (think "pseudotype")
> GEOMETRIC_TYPE G
> INVALID_TYPE \0 (not allowed in catalog anyway)
> NETWORK_TYPE n
> NUMERIC_TYPE N
> STRING_TYPE S
> TIMESPAN_TYPE T
> UNKNOWN_TYPE u
> USER_TYPE U
>
> Users would be allowed to select any single ASCII character as the
> "category" of a user-defined type, should they have a need to make
> their
> own new category.

Wouldn't this then limit them to 52 possible categories? Does that
matter? Given your suggestion, I'm assuming that a single character is
somehow more efficient than an enum, yes?

> Of course CREATE TYPE's default is category = U and
> preferred = true for backward compatibility reasons. We could put
> down
> a rule that system-defined categories are always upper or lower case
> letters (or even always upper, if we wanted to strain some of the
> assignments a bit) so that it's clear what can be used for a
> user-defined category.

Makes sense.

> It might possibly be worth making new categories for arrays,
> composites,
> and enums; they're currently effectively USER_TYPE but that doesn't
> seem
> quite right. Also, the rules for domains should likely be "same
> category as base type, never a preferred type" instead of the current
> behavior where they're user types. (I think the latter doesn't really
> matter now, because we always smash a domain to its base type before
> inquiring about categories anyway. But it might give Elein a bit more
> room to maneuver with the functions-on-domains issue.)

Yes, this all sounds like it'd be an important improvement.

> A possible objection is that this will make TypeCategory and
> IsPreferredType slower than before, since they'll involve a syscache
> lookup instead of a simple switch statement. I don't think this will
> be too bad though; all the paths they are used in are full of catalog
> lookups anyway, so it's hard to credit that there would be much
> percentage slowdown.
>
> Thoughts?

Obviously I don't know much about the internals, but your explanation
here seems very clear to me. I like it. +1

Thank you, Tom.

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 21:00:29
Message-ID: 16440.1217365229@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Jul 29, 2008, at 13:12, Tom Lane wrote:
>> Given the present infrastructure I think the only way would be with
>> two more alias operators, text||citext and citext||text. But that way
>> madness lies.

> I suppose, then, that you're saying that there are lots of other
> functions for which this sort of thing would need to be done? Because
> two more aliases for this one operator is no big deal, AFAIC.

Well, a rough estimate of the places where implicit coercion to text
might be relevant to resolving ambiguity is

select proname from pg_proc
where 'text'::regtype = any(proargtypes)
group by proname having count(*)>1;

select oprname from pg_operator
where oprleft='text'::regtype or oprright='text'::regtype
group by oprname having count(*)> 1;

I count 37 functions and 10 operators as of CVS HEAD. Perhaps not all
would need to be fixed in practical use, but if you wanted seamless
integration of citext it's quite possible that you'd need alias
functions/operators (maybe more than one) in each of those cases.

[ squint... ] Actually, this is an underestimate since these queries
aren't finding cases like quote_literal, where there is ambiguity but
only one of the alternatives takes 'text'. I'm too lazy to work out a
better query though.

> Perhaps tangential: What does it mean for a type to be "preferred"?

See the ambiguous-function resolution rules in chapter 10 of the fine
manual ...

>> Users would be allowed to select any single ASCII character as the
>> "category" of a user-defined type, should they have a need to make
>> their own new category.

> Wouldn't this then limit them to 52 possible categories?

It'd be either 94 - 26 or 94 - 26 - 26 depending on what the policy is
about lower-case letters (and assuming they wanted to stay away from
control characters, which seems like a good idea). Considering the
world supply of categories up to now has been about ten, it's hard
to imagine that this is really a limitation.

> Does that
> matter? Given your suggestion, I'm assuming that a single character is
> somehow more efficient than an enum, yes?

Marginally so; but an enum wouldn't help anyway unless we are prepared
to invent ALTER ENUM. We'd have to go to an actual new system catalog
if we wanted something noticeably better than the poor-mans-enum
approach, and as I mentioned earlier, that just seems like overkill.
(Besides, we could always add it later if there's suddenly a gold rush
for categories. The only thing we'd be locking ourselves into, if
we view this as a stopgap implementation, is the need to accept
single-character abbreviations in future, even after the system knows
actual names for categories.)

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 21:15:54
Message-ID: 055450C5-1386-45F3-B2D3-8FD72E781B0A@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 29, 2008, at 14:00, Tom Lane wrote:

> Well, a rough estimate of the places where implicit coercion to text
> might be relevant to resolving ambiguity is
>
> select proname from pg_proc
> where 'text'::regtype = any(proargtypes)
> group by proname having count(*)>1;
>
> select oprname from pg_operator
> where oprleft='text'::regtype or oprright='text'::regtype
> group by oprname having count(*)> 1;
>
> I count 37 functions and 10 operators as of CVS HEAD. Perhaps not all
> would need to be fixed in practical use, but if you wanted seamless
> integration of citext it's quite possible that you'd need alias
> functions/operators (maybe more than one) in each of those cases.

Well, there are already citext aliases for all of those operators, for
this very reason. There are citext aliases for a bunch of the
functions, too (ltrim(), substring(), etc.), so I wouldn't worry about
adding more. I've added more of them since I last sent a patch, mainly
for the regexp functions, replace(), strpos(), etc. I'd guess that I'm
about half-way there already, and there probably are a few I wouldn't
bother with (like timezone()).

Anyway, would this issue then go away once the type stuff was added
and citext was specified as TYPE = 'S'?

> [ squint... ] Actually, this is an underestimate since these queries
> aren't finding cases like quote_literal, where there is ambiguity but
> only one of the alternatives takes 'text'. I'm too lazy to work out a
> better query though.

Thanks.

>> Perhaps tangential: What does it mean for a type to be "preferred"?
>
> See the ambiguous-function resolution rules in chapter 10 of the fine
> manual ...

I see this:

> C. Run through all candidates and keep those that accept preferred
> types (of the input data type's type category) at the most positions
> where type conversion will be required. Keep all candidates if none
> accept preferred types. If only one candidate remains, use it; else
> continue to the next step.

That doesn't exactly explain what "preferred" means, just that it
seems to prioritize the resolution of a function a bit. Which, I
guess, is the point.

>> Wouldn't this then limit them to 52 possible categories?
>
> It'd be either 94 - 26 or 94 - 26 - 26 depending on what the policy is
> about lower-case letters (and assuming they wanted to stay away from
> control characters, which seems like a good idea). Considering the
> world supply of categories up to now has been about ten, it's hard
> to imagine that this is really a limitation.

Okay.

>> Does that
>> matter? Given your suggestion, I'm assuming that a single character
>> is
>> somehow more efficient than an enum, yes?
>
> Marginally so; but an enum wouldn't help anyway unless we are prepared
> to invent ALTER ENUM. We'd have to go to an actual new system catalog
> if we wanted something noticeably better than the poor-mans-enum
> approach, and as I mentioned earlier, that just seems like overkill.
> (Besides, we could always add it later if there's suddenly a gold rush
> for categories. The only thing we'd be locking ourselves into, if
> we view this as a stopgap implementation, is the need to accept
> single-character abbreviations in future, even after the system knows
> actual names for categories.)

Makes sense.

Thanks,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 21:33:46
Message-ID: 16848.1217367226@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> Well, there are already citext aliases for all of those operators, for
> this very reason. There are citext aliases for a bunch of the
> functions, too (ltrim(), substring(), etc.), so I wouldn't worry about
> adding more. I've added more of them since I last sent a patch, mainly
> for the regexp functions, replace(), strpos(), etc. I'd guess that I'm
> about half-way there already, and there probably are a few I wouldn't
> bother with (like timezone()).

That's exactly what I don't really want to do; if you are adding aliases
*only* to get rid of ambiguity-errors, and not to alter functionality,
then I think you're doing the wrong thing. Adding more aliases can
easily make the situation worse.

> Anyway, would this issue then go away once the type stuff was added
> and citext was specified as TYPE = 'S'?

Yeah, that's the point of the proposal. I think the issue has come up
once or twice before, too, else I'd not be so interested in a general
solution. (digs in archives ... there was some discussion of this
in connection with unsigned integer types, and I seem to recall older
threads but can't find any right now.)

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 22:06:08
Message-ID: 88562268-F221-4C31-B435-868C03B88197@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 29, 2008, at 14:33, Tom Lane wrote:

> That's exactly what I don't really want to do; if you are adding
> aliases
> *only* to get rid of ambiguity-errors, and not to alter functionality,
> then I think you're doing the wrong thing. Adding more aliases can
> easily make the situation worse.

Actually, most seem to resolve to text okay. I'm adding aliases to
change behavior (e.g., case-insensitive matching in replace()). It's
the bazillion cast functions I'm having to add that are annoying (see
my previous post with the int8 example).

>> Anyway, would this issue then go away once the type stuff was added
>> and citext was specified as TYPE = 'S'?
>
> Yeah, that's the point of the proposal. I think the issue has come up
> once or twice before, too, else I'd not be so interested in a general
> solution. (digs in archives ... there was some discussion of this
> in connection with unsigned integer types, and I seem to recall older
> threads but can't find any right now.)

No worries, it seems like a really good idea to me, regardless.

Thanks!

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 22:35:10
Message-ID: 26907.1217370910@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> "David E. Wheeler" <david(at)kineticode(dot)com> writes:
>> Anyway, would this issue then go away once the type stuff was added
>> and citext was specified as TYPE = 'S'?

> Yeah, that's the point of the proposal.

BTW, I did confirm (by temporarily hacking up TypeCategory()) that
causing citext to appear to be of STRING category eliminates the need
for the extra || operator and quote_literal() function that are in
the current citext code. So the proposed solution really will work.

You might still want to keep the aliases in cases where the point is to
have the function or operator output resolve as citext rather than text.
I'm not sure how many of these cases that's really important for,
though.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 23:22:06
Message-ID: 559754CA-52DC-4D03-8BB1-18EB0F673122@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 29, 2008, at 15:35, Tom Lane wrote:

> BTW, I did confirm (by temporarily hacking up TypeCategory()) that
> causing citext to appear to be of STRING category eliminates the need
> for the extra || operator and quote_literal() function that are in
> the current citext code. So the proposed solution really will work.

Nice! Does it also allow all of the casts to and from text to
implicitly work, e.g,

SELECT 4::int8::citext = '4' AS t;
SELECT '4'::citext::int8 = 4 AS t;
SELECT 4::bigint::citext = '4' AS t;
SELECT '4'::citext::bigint = 4 AS t;

> You might still want to keep the aliases in cases where the point is
> to
> have the function or operator output resolve as citext rather than
> text.
> I'm not sure how many of these cases that's really important for,
> though.

Not many, I should think.

Thanks,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 23:45:11
Message-ID: 2313.1217375111@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Jul 29, 2008, at 15:35, Tom Lane wrote:
>> BTW, I did confirm (by temporarily hacking up TypeCategory()) that
>> causing citext to appear to be of STRING category eliminates the need
>> for the extra || operator and quote_literal() function that are in
>> the current citext code. So the proposed solution really will work.

> Nice! Does it also allow all of the casts to and from text to
> implicitly work, e.g,

No, but we could think about that. Do we really want that to work for
any member of the string category? It seems a bit overly broad to me
...

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-29 23:53:23
Message-ID: DDFB50E9-CB6B-4CC3-B229-57F3FFD792E5@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 29, 2008, at 16:45, Tom Lane wrote:

>> Nice! Does it also allow all of the casts to and from text to
>> implicitly work, e.g,
>
> No, but we could think about that. Do we really want that to work for
> any member of the string category? It seems a bit overly broad to me
> ...

I was thinking about other text-style types, like UUID and UPC, and
was thinking probably not, at least for them. OTOH, some strings
should work that way, like citext or enums, the difference being that,
in most contexts, they can be treated as plain text, since they're
usually thought of that way. Or so it seems to me. Might there be a
way to create that distinction?

Best,

David


From: "Ryan Bradetich" <rbradetich(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 07:24:38
Message-ID: e739902b0807300024p35b658f5pd9fd060b15e5a6ad@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom,

On Tue, Jul 29, 2008 at 2:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, that's the point of the proposal. I think the issue has come up
> once or twice before, too, else I'd not be so interested in a general
> solution. (digs in archives ... there was some discussion of this
> in connection with unsigned integer types, and I seem to recall older
> threads but can't find any right now.)

Anything I should be looking into and/or testing for unsigned integer support?

Thanks!

- Ryan

P.S. I have most of the uint2 and uint4 types implemented. I am currently
working on the unit and performance testing. I need to upgrade one of
my Linux boxes first before I can complete my testing. I am hoping to post
my code and performance testing later this week.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 10:08:54
Message-ID: 87wsj3hefd.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> So this can fairly obviously be replaced by two new pg_type columns,
> say "typcategory" and "typpreferred", where the latter is a bool.
> Since the list of categories is pretty short and there's no obvious
> reason to extend it a lot, I propose that we just represent typcategory
> as a "char", using a mapping along the lines of
>
> BITSTRING_TYPE b
> BOOLEAN_TYPE B
> DATETIME_TYPE D
> GENERIC_TYPE P (think "pseudotype")
> GEOMETRIC_TYPE G
> INVALID_TYPE \0 (not allowed in catalog anyway)
> NETWORK_TYPE n
...

I had a different thought when you posted the original prompt earlier. Instead
of a static list of type categories we could re-use type oids. So if you
belong to a type category you store the oid of the preferred type of that
category in typcategory.

I can't help thinking from the list above that there's nothing special about
datetime, geometric, and network data types that some user defined set of
types wouldn't necessarily want to define.

I do agree that having SQL commands to create new type categories, even a new
catalog table is overkill, but not because we wouldn't want to create new
ones. Just because there isn't really any other meta data we want to store
about type categories. Aside from the preferred type and the members there
isn't anything more to say about them.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 14:25:39
Message-ID: 24925.1217427939@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> I had a different thought when you posted the original prompt earlier. Instead
> of a static list of type categories we could re-use type oids. So if you
> belong to a type category you store the oid of the preferred type of that
> category in typcategory.

The problem with that is it assumes that each category has one and only
one preferred type. This isn't the case in NUMERIC_TYPE for instance.

What is actually going on in NUMERIC_TYPE is that the OID and OID-alias
types form a sub-category; but you have to look at the set of available
implicit casts to realize that. We could maybe get away with one
preferred type per category if we wanted to invent an explicit
representation of sub-categories, but that seems too complicated.
In any case it's not clear to me that the flexibility to have more than
one preferred type might not be helpful in other, less clearly
structured categories.

> I can't help thinking from the list above that there's nothing special about
> datetime, geometric, and network data types that some user defined set of
> types wouldn't necessarily want to define.

Well, that's why the set of categories needs to be extensible. I agree
that some of the existing categories aren't really pulling their weight:
unless you have enough implicit casts within the category to cause
ambiguity, you don't really need preferred types and hence not your own
category.

For the moment I'm just trying to model the behavior of the existing
functions exactly; but once the catalog infrastructure is in, we could
revisit some of those old decisions and try to eliminate useless
categories. In particular I'm thinking that the historical behavior of
treating all user-defined types as preferred might be backwards.

> I do agree that having SQL commands to create new type categories, even a new
> catalog table is overkill, but not because we wouldn't want to create new
> ones. Just because there isn't really any other meta data we want to store
> about type categories. Aside from the preferred type and the members there
> isn't anything more to say about them.

Agreed.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ryan Bradetich" <rbradetich(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 14:50:49
Message-ID: 25300.1217429449@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Ryan Bradetich" <rbradetich(at)gmail(dot)com> writes:
> On Tue, Jul 29, 2008 at 2:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah, that's the point of the proposal. I think the issue has come up
>> once or twice before, too, else I'd not be so interested in a general
>> solution. (digs in archives ... there was some discussion of this
>> in connection with unsigned integer types, and I seem to recall older
>> threads but can't find any right now.)

> Anything I should be looking into and/or testing for unsigned integer support?

Dunno, I forget what the conclusion was about implicit casting for the
unsigned types in your proposal. Have you experimented with seeing
whether, eg, UNION'ing an unsigned with some signed-integer value
behaves sensibly?

The thread I mention above was a year or so back and was originated by
someone who wanted to duplicate mysql's behavior. Your proposal is
a lot more limited and might not really need to try to put the unsigned
types into the numeric category.

regards, tom lane


From: "Ryan Bradetich" <rbradetich(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 15:11:37
Message-ID: e739902b0807300811v5c6e7942tf3d0767bc6905f9f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello Tom,

On Wed, Jul 30, 2008 at 7:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dunno, I forget what the conclusion was about implicit casting for the
> unsigned types in your proposal. Have you experimented with seeing
> whether, eg, UNION'ing an unsigned with some signed-integer value
> behaves sensibly?

That was not one of my tests, but I will add it :)

Here are the performance tests I am planning / working on:

1. Loading of both integer and unsigned integers into a table. This test is
mainly a sanity check to ensure the unsigned integers do not add
significant time during inserts. In a perfect world, I believe they should
take the same time when the unsigned integers and integers have equal
widths. Experimentation is showing the unsigned integer to take slightly
longer. I suspect this is due to the additional integer->unsigned
integer cast? I am still digging into this for my personal curiosity.

2. Testing the < operator. The unsigned integer type provides a native cast
for the < operator.

3. Testing the & operator. The unsigned integer type does not provide a
native cast for the & operator, so they are cast up to the next larger size.

I am testing this for both signed and unsigned integers with data sets of the
following sizes: 100, 1000, 10000, 100000, 500000, 1M, 5M, and 10M rows.
I am planning to test on both 32-bit and 64-bit x86 platforms. If there is
interest, I can also run these tests on 32-bit and 64-bit PowerPC platforms.

I will add the union test to my test plan. Anything else I should add or any
holes I am missing with this test plan?

> The thread I mention above was a year or so back and was originated by
> someone who wanted to duplicate mysql's behavior. Your proposal is
> a lot more limited and might not really need to try to put the unsigned
> types into the numeric category.

Ah, ok. I will not worry about it for now.

Thanks!

- Ryan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 17:34:49
Message-ID: 913.1217439289@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Jul 29, 2008, at 16:45, Tom Lane wrote:
>> [ move preferred-type info into the system catalogs ]

I've committed changes along this line. Please look at CVS HEAD and
re-evaluate which alias functions/operators you still really need.

>>> Nice! Does it also allow all of the casts to and from text to
>>> implicitly work, e.g,
>>
>> No, but we could think about that. Do we really want that to work for
>> any member of the string category? It seems a bit overly broad to me
>> ...

> I was thinking about other text-style types, like UUID and UPC, and
> was thinking probably not, at least for them. OTOH, some strings
> should work that way, like citext or enums, the difference being that,
> in most contexts, they can be treated as plain text, since they're
> usually thought of that way. Or so it seems to me. Might there be a
> way to create that distinction?

UUID and so on aren't considered part of the string category, and
shouldn't be IMHO ... any type that has semantics significantly
different from "arbitrary string of text" doesn't belong. At the
same time I'm not entirely sure that we want the I/O conversions
to work for everything in the category. As of CVS HEAD, what we've
got in string category by default are

template1=# select oid::regtype, typtype, typispreferred from pg_type where typcategory = 'S';
oid | typtype | typispreferred
-----------------------------------+---------+----------------
"char" | b | f
name | b | f
text | b | t
character | b | f
character varying | b | f
information_schema.character_data | d | f
information_schema.sql_identifier | d | f
(7 rows)

and you have to remember that *any* domain created over a string type
will also be considered to be of string category.

The behavior that's hard-wired into parse_coerce.c at the moment
is that only text, varchar, bpchar can be sources or targets of
I/O conversions. While opening it up to citext sounds reasonable,
I'm a lot less sure about domains.

[ pokes at it ... ] Oh, I hadn't realized this: find_coercion_pathway
is looking at types that it's already smashed to base types, so
actually you can get an I/O conversion for a domain over one of these
types already!

regression=# create domain d2 as varchar(2);
CREATE DOMAIN
regression=# select 123::int4::d2;
d2
----
12
(1 row)

So maybe the domain issue isn't so important.

Comments anyone?

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 17:42:24
Message-ID: 011A2FCD-8775-4119-B8B2-A76DF111F652@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 30, 2008, at 10:34, Tom Lane wrote:

>>> [ move preferred-type info into the system catalogs ]
>
> I've committed changes along this line. Please look at CVS HEAD and
> re-evaluate which alias functions/operators you still really need.

Okay, I'll hunt down some tuits today to hack on that.

> UUID and so on aren't considered part of the string category, and
> shouldn't be IMHO ... any type that has semantics significantly
> different from "arbitrary string of text" doesn't belong.

Yes, that was essentially my point. "arbitrary string of text" types
are probably fairly rare, since one can just use text or citext or
varchar.

> At the
> same time I'm not entirely sure that we want the I/O conversions
> to work for everything in the category. As of CVS HEAD, what we've
> got in string category by default are
>
> template1=# select oid::regtype, typtype, typispreferred from
> pg_type where typcategory = 'S';
> oid | typtype | typispreferred
> -----------------------------------+---------+----------------
> "char" | b | f
> name | b | f
> text | b | t
> character | b | f
> character varying | b | f
> information_schema.character_data | d | f
> information_schema.sql_identifier | d | f
> (7 rows)
>
> and you have to remember that *any* domain created over a string type
> will also be considered to be of string category.

Right, that all makes sense.

> The behavior that's hard-wired into parse_coerce.c at the moment
> is that only text, varchar, bpchar can be sources or targets of
> I/O conversions. While opening it up to citext sounds reasonable,
> I'm a lot less sure about domains.

So who might we open it up so that citext can take advantage of it?
I'd love to get rid of all these ugly cast functions I've been writing.

> [ pokes at it ... ] Oh, I hadn't realized this: find_coercion_pathway
> is looking at types that it's already smashed to base types, so
> actually you can get an I/O conversion for a domain over one of these
> types already!
>
> regression=# create domain d2 as varchar(2);
> CREATE DOMAIN
> regression=# select 123::int4::d2;
> d2
> ----
> 12
> (1 row)
>
> So maybe the domain issue isn't so important.

What about enums?

Best,

David


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 17:55:10
Message-ID: 489064AE.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> and you have to remember that *any* domain created over a string
type
> will also be considered to be of string category.

> find_coercion_pathway
> is looking at types that it's already smashed to base types, so
> actually you can get an I/O conversion for a domain over one of
these
> types already!
>
> regression=# create domain d2 as varchar(2);
> CREATE DOMAIN
> regression=# select 123::int4::d2;
> d2
> ----
> 12
> (1 row)

I would find is surprising if a domain didn't share these properties
with its underlying type.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 18:09:50
Message-ID: 1551.1217441390@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Jul 30, 2008, at 10:34, Tom Lane wrote:
>> The behavior that's hard-wired into parse_coerce.c at the moment
>> is that only text, varchar, bpchar can be sources or targets of
>> I/O conversions. While opening it up to citext sounds reasonable,
>> I'm a lot less sure about domains.
>> [ uh, no, actually domains do work here ]
>> So maybe the domain issue isn't so important.

> What about enums?

Er, what about them? I don't see that they enter into this at all.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 18:26:17
Message-ID: 349814F9-4BE9-446D-B7DC-E103156A018C@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 30, 2008, at 11:09, Tom Lane wrote:

>> What about enums?
>
> Er, what about them? I don't see that they enter into this at all.

Well, from a user's point of view, they're pretty much just strings…

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 18:27:34
Message-ID: 64792473-E306-414E-B66E-622D333D7ADC@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 30, 2008, at 10:42, David E. Wheeler wrote:

>> I've committed changes along this line. Please look at CVS HEAD and
>> re-evaluate which alias functions/operators you still really need.
>
> Okay, I'll hunt down some tuits today to hack on that.

The ones you removed all seem to be fine. I'll have to re-evaluate all
the one's I've been adding, though. I expect I'll still have to
replace those that need to behave differently (i.e., case-
insensitively) than text.

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 19:58:29
Message-ID: 19842.1217447909@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Jul 30, 2008, at 11:09, Tom Lane wrote:
>>> What about enums?
>>
>> Er, what about them? I don't see that they enter into this at all.

> Well, from a user's point of view, they're pretty much just strings

But they're a very restricted set of strings; I can't see that there's
any general-purpose use for allowing them as the source or destination
of I/O conversions.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 20:10:22
Message-ID: 20069.1217448622@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Jul 30, 2008, at 10:34, Tom Lane wrote:
>> UUID and so on aren't considered part of the string category, and
>> shouldn't be IMHO ... any type that has semantics significantly
>> different from "arbitrary string of text" doesn't belong.

> Yes, that was essentially my point. "arbitrary string of text" types
> are probably fairly rare, since one can just use text or citext or
> varchar.

Good point --- so new members of STRING category aren't going to be that
common, except for domains which apparently aren't bothering people
anyway. I'll go ahead and make the change. (I think it's just a
trivial change in find_coercion_pathway, and everything else should
Just Work. If it turns out not to be trivial maybe we should
reconsider.)

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-30 20:11:46
Message-ID: 3121EED5-EB0A-4666-B36A-CB08801B9DEA@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 30, 2008, at 12:58, Tom Lane wrote:

>> Well, from a user's point of view, they're pretty much just strings
>
> But they're a very restricted set of strings; I can't see that there's
> any general-purpose use for allowing them as the source or destination
> of I/O conversions.

*shrug* Okay, no big deal, I think.

Best,

David


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-31 07:49:48
Message-ID: 200807310949.48245.dfontaine@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Le mercredi 30 juillet 2008, David E. Wheeler a écrit :
> On Jul 30, 2008, at 10:34, Tom Lane wrote:
> >>> [ move preferred-type info into the system catalogs ]
> > UUID and so on aren't considered part of the string category, and
> > shouldn't be IMHO ... any type that has semantics significantly
> > different from "arbitrary string of text" doesn't belong.
>
> Yes, that was essentially my point. "arbitrary string of text" types
> are probably fairly rare, since one can just use text or citext or
> varchar.

I'm not following this thread closely (enough) but my prefix_range type (from
the pgfoundry prefix module) certainly is a user defined (in C) "arbitrary
string of text" type. The fact that it's user visible could be questionned,
its usage is for GiST indexing prefix searches when the prefix is in the
table, not in the query literal:
SELECT * FROM prefixes WHERE prefix @> 'literal';

This query currently only profits from the GiST indexing if prefix column is
of type prefix_range, I'd like to be able to index text and store
prefix_range in the index, it's on the TODO list (and certainly is doable
AFAIUI).

I'm not sure this input is valuable for the topic, but would rather not miss
the example in case it is :)

> > The behavior that's hard-wired into parse_coerce.c at the moment
> > is that only text, varchar, bpchar can be sources or targets of
> > I/O conversions. While opening it up to citext sounds reasonable,
> > I'm a lot less sure about domains.

I still remember the sour taste of cancelling a domain usage for some type
needs here after discovering you can't create CAST for domains. Would the
categorising stuff help here?

Regards,
--
dim


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-07-31 17:42:00
Message-ID: 2FD031D4-7B94-4EEF-9C4D-34A77F3FE256@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 30, 2008, at 13:10, Tom Lane wrote:

>> Yes, that was essentially my point. "arbitrary string of text" types
>> are probably fairly rare, since one can just use text or citext or
>> varchar.
>
> Good point --- so new members of STRING category aren't going to be
> that
> common, except for domains which apparently aren't bothering people
> anyway. I'll go ahead and make the change. (I think it's just a
> trivial change in find_coercion_pathway, and everything else should
> Just Work. If it turns out not to be trivial maybe we should
> reconsider.)

Wow. Really nice, Tom. Thanks!

The attached patch has all the tests I added to my svn version against
8.3, and for which I had to write 60 additional cast functions. With
your changes, I had to add only two more:

CREATE OR REPLACE FUNCTION citext(boolean)
RETURNS citext
AS 'booltext'
LANGUAGE 'internal' IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION citext(inet)
RETURNS citext
AS 'SELECT text( $1 )::citext'
LANGUAGE SQL IMMUTABLE STRICT;

So the I/O casts are working beautifully. This is true even for ENUMs,
which I couldn't get to magically cast with a function in 8.3. Thank
you!

If you don't want so many cast tests, I can remove all but a few of
them. I wrote them for all the core user-visible types I could
identify so that I could see how they behaved with text and then make
citext work the same way. Such is not as important in HEAD, thanks to
the I/O casting, so let me know if you want me to cut down on the
number of tests (I was particularly uncertain about the xml type,
since --with-libxml won't always be true, I expect).

The other change in this patch is the addition of functions to make
various string-comparison functions behave case-insensitively. I've
started with all those that I was aware of from the previous
documentation, and I've made them behave case-insensitively by writing
SQL functions to hack it in. Ideally these would be done in C, but
that started to get beyond my abilities. Suggestions welcome.

The only other thing I wanted to look at doing with citext was to look
for any other string-comparison functions I might have missed and do
the same for them. Otherwise, I think we're golden.

Many thanks,

David

Attachment Content-Type Size
citext_casting.patch.gz application/x-gzip 8.8 KB

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-08-04 17:43:35
Message-ID: 20A114CA-0FC9-47B0-9E7A-2AB80325FFE4@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 31, 2008, at 10:42, David E. Wheeler wrote:

>> Good point --- so new members of STRING category aren't going to be
>> that
>> common, except for domains which apparently aren't bothering people
>> anyway. I'll go ahead and make the change. (I think it's just a
>> trivial change in find_coercion_pathway, and everything else should
>> Just Work. If it turns out not to be trivial maybe we should
>> reconsider.)
>
> Wow. Really nice, Tom. Thanks!
>
> The attached patch has all the tests I added to my svn version
> against 8.3, and for which I had to write 60 additional cast
> functions.

Ping! Just wanted to make sure this wasn't lost in the shuffle…

Thanks,

David


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-08-04 18:02:26
Message-ID: 20080804180226.GC4483@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David E. Wheeler wrote:
> On Jul 31, 2008, at 10:42, David E. Wheeler wrote:

>> Wow. Really nice, Tom. Thanks!
>>
>> The attached patch has all the tests I added to my svn version against
>> 8.3, and for which I had to write 60 additional cast functions.
>
> Ping! Just wanted to make sure this wasn't lost in the shuffle…

Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-08-04 18:09:39
Message-ID: 41C85208-0D9E-4F15-8533-AE49471425F0@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug 4, 2008, at 11:02, Alvaro Herrera wrote:

>> Ping! Just wanted to make sure this wasn't lost in the shuffle…
>
> Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09

Sure, although it's a simple refinement (read: tests, mainly) of an
accepted July patch, submitted before the end of July. I can put it in
for Sept, but thought it might be best to deal with it while the
original patch is still fresh in (Tom's) mind and since it's something
Tom asked me for. :-)

Anyway, added to the 09 commitfest page.

Best,

David


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-08-04 18:15:52
Message-ID: 48974758.6030700@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David E. Wheeler wrote:
> On Aug 4, 2008, at 11:02, Alvaro Herrera wrote:
>
>>> Ping! Just wanted to make sure this wasn't lost in the shuffle…
>>
>> Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09
>
> Sure, although it's a simple refinement (read: tests, mainly) of an
> accepted July patch, submitted before the end of July. I can put it in
> for Sept, but thought it might be best to deal with it while the
> original patch is still fresh in (Tom's) mind and since it's something
> Tom asked me for. :-)
>
> Anyway, added to the 09 commitfest page.
>
>

There is nothing to prevent such things from being committed ahead of
that commitfest.

cheers

andrew


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-08-11 03:58:40
Message-ID: 08157471-DF74-46F2-B34A-6D34C160C1DA@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Just realized that I forgot to add the DROP FUNCTION statements to the
uninstall script. New patch attached.

Best,

David

Attachment Content-Type Size
citext_casting2.patch.gz application/x-gzip 9.0 KB

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Type Categories for User-Defined Types
Date: 2008-08-11 17:43:36
Message-ID: F721EFF1-553C-4E25-A293-7BD08D6957F4@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug 10, 2008, at 20:58, David E. Wheeler wrote:

> Just realized that I forgot to add the DROP FUNCTION statements to
> the uninstall script. New patch attached.

And this one also includes the casts I added. :-)

Best,

David

Attachment Content-Type Size
citext_casting3.patch.gz application/x-gzip 8.0 KB