Re: enums

Lists: pgsql-hackers
From: "Jim Nasby" <jnasby(at)pervasive(dot)com>
To: "Gregory Maxwell" <gmaxwell(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: enums
Date: 2005-10-27 22:23:58
Message-ID: D1D2D51E3BE3FC4E98598248901F7594027F9025@ausmail2k4.aus.pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Adding -hackers back to the list...

> -----Original Message-----
> From: Gregory Maxwell [mailto:gmaxwell(at)gmail(dot)com]
> Sent: Thursday, October 27, 2005 5:03 PM
> To: Jim Nasby
> Subject: Re: [HACKERS] enums
>
>
> On 10/27/05, Jim C. Nasby <jnasby(at)pervasive(dot)com> wrote:
> > On Thu, Oct 27, 2005 at 04:54:36PM -0400, Ted Rolle wrote:
> > > This little snippet is great! The only problem I see is
> that the enums must
> > > be consistent across all modules.
> > >
> > > What about loading a variable with a "default" value?
> Then it could be
> > > adjusted to 'play'.
> >
> > Huh? Sorry, but you completely lost me here...
> >
> > On another note, I noticed that the comparison operators seem to be
> > comparing the underlying numeric value used to store the
> enum, which is
> > wrong IMO. Consider:
> >
> > ENUM color '"red","blue","green"'
> > CREATE TABLE t (c color);
> > INSERT INTO t VALUES('blue');
> > INSERT INTO t VALUES('green');
> > INSERT INTO t VALUES('red');
> > SELECT c FROM t ORDER BY c;
> > red
> > blue
> > green
> >
> > That seems counter-intuitive. It's also exposing an implimentation
> > detail (that the enum is stored internally as a number).
>
> You could as equally say that it's ordering it by the order of the
> enum declaration, which seems quite reasonable to me.

I don't really see why that's considered reasonable, especially as a default. I could maybe see an argument for having a means to see this ordering, but IMO anything depending on that is broken. I don't think we should be making any guarantees about how enums are stored in the database (including ordering).

> Now, if you can multiply one enum with another or with an integer,
> that would be a bit odd. But the collation behavior seems quite sane
> to me.
>
> The behavior in mysql is the same:
>
> mysql> create table t (
> -> color enum ('red','blue','green')
> -> );
> mysql> INSERT INTO t VALUES ('blue');
> mysql> INSERT INTO t VALUES ('green');
> mysql> INSERT INTO t VALUES ('red');
> mysql> select color from t order by color;
> +-------+
> | color |
> +-------+
> | red |
> | blue |
> | green |
> +-------+
> 3 rows in set (0.04 sec)
>
> Hopefully we the end implimentation is a bit better than mysql and
> actually cares what you're inserting:
>
> mysql> select color*2 from t;
> +---------+
> | color*2 |
> +---------+
> | 4 |
> | 6 |
> | 2 |
> +---------+
> 3 rows in set (0.00 sec)
> mysql> INSERT INTO t VALUES (1);
> Query OK, 1 row affected (0.00 sec)
> mysql> INSERT INTO t VALUES ('monkey');
> Query OK, 1 row affected, 1 warning (0.00 sec)
> mysql> select color from t;
> +-------+
> | color |
> +-------+
> | blue |
> | green |
> | red |
> | red |
> | |
> +-------+
> 5 rows in set (0.00 sec)

Your examples show why I don't think it's a good idea to use MySQL as a guide for how to do enums.

If we do decide to include the concept of ordering in enums, then it should be fully supported and not just an artifact of our storage mechanism. This means supporting things like being able to re-order the accepted values in an enum. But like I said, I just don't see the use case for doing that.


From: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
To: Jim Nasby <jnasby(at)pervasive(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-27 22:46:24
Message-ID: e692861c0510271546ud7a08c7h3ed3f97315671f38@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/27/05, Jim Nasby <jnasby(at)pervasive(dot)com> wrote:
> Adding -hackers back to the list...
> > You could as equally say that it's ordering it by the order of the
> > enum declaration, which seems quite reasonable to me.
>
> I don't really see why that's considered reasonable, especially as a default. I could maybe see an argument for having a means to see this ordering, but IMO anything depending on that is broken. I don't think we should be making any guarantees about how enums are stored in the database (including ordering).

> Your examples show why I don't think it's a good idea to use MySQL as a guide for how to do enums.

Yes, MySQL is broken in some regards, as usual. However, the API isn't
bad (except for the fact that it doesn't care what invalid crap you
throw at it), and more importantly there are thousands of apps and
developers who think around that interface. We should copy it without
the brokenness as much as possible unless we have good cause
otherwise.

> If we do decide to include the concept of ordering in enums, then it should be fully supported and not just an artifact of our storage mechanism. This means supporting things like being able to re-order the accepted values in an enum. But like I said, I just don't see the use case for doing that.

So what do you propose we do for a default ordering? I hope you don't
think we should force a sort as though the enum labels were text...
That almost certainly incorrect for most applications of enums, which
are used to make opaque labels more human compatible.

MySQL's behavior of allowing the user to specify the collation in the
typedef makes a lot of sense to me, it doesn't matter that it actually
works as an artifact of the storage backend. I'd argue that it would
make sense to sort by the specification order even if we changed the
backend to use varchars rather than numbers.


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-27 23:04:48
Message-ID: 20051027230448.GX63747@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 27, 2005 at 06:46:24PM -0400, Gregory Maxwell wrote:
> So what do you propose we do for a default ordering? I hope you don't
> think we should force a sort as though the enum labels were text...

I do think that. Or default ordering on whatever type the enum is (I can
see enums that are something other than text as useful, though that's a
secondary goal).

> That almost certainly incorrect for most applications of enums, which
> are used to make opaque labels more human compatible.

Sorting red before blue doesn't sound very opaque to me...

> MySQL's behavior of allowing the user to specify the collation in the
> typedef makes a lot of sense to me, it doesn't matter that it actually
> works as an artifact of the storage backend. I'd argue that it would
> make sense to sort by the specification order even if we changed the
> backend to use varchars rather than numbers.

Like I said, if we're going to support a concept of ordering of items in
an enum then we need to support it fully. For starters that means having
the ability to re-order things in an enum seamlessly.

If our primary concern is MySQL compatability then we should look at
offering two types of enums; one that mirrors their broken stuff and one
that works they way you'd actually want it to.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
Cc: Jim Nasby <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-27 23:17:16
Message-ID: 43615FFC.9070004@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Maxwell wrote:

>Yes, MySQL is broken in some regards, as usual. However, the API isn't
>bad (except for the fact that it doesn't care what invalid crap you
>throw at it), and more importantly there are thousands of apps and
>developers who think around that interface. We should copy it without
>the brokenness as much as possible unless we have good cause
>otherwise.
>
>

mmm ... no. It is too broken. We should do enums orthogonally with other
type definitions in PostgreSQL. Where I would like to get to is that we
have a flavor of CREATE TYPE that will create the enum type for us,
including all the support that I build into my little kit. And if you
want to change the enumeration set on a column, you would use ALTER
TABLE foo ALTER COLUMN bar TYPE newtype USING ...

Inline declarations of enums does not strike me as good.

>
>
>>If we do decide to include the concept of ordering in enums, then it should be fully supported and not just an artifact of our storage mechanism. This means supporting things like being able to re-order the accepted values in an enum. But like I said, I just don't see the use case for doing that.
>>
>>
>
>So what do you propose we do for a default ordering? I hope you don't
>think we should force a sort as though the enum labels were text...
>That almost certainly incorrect for most applications of enums, which
>are used to make opaque labels more human compatible.
>
>

Yeah, lexical ordering is surely wrong. I believe that the enumeration
order is the only correct ordering. If you wanted lexical ordering, you
could always do "order by myenum::text".

>MySQL's behavior of allowing the user to specify the collation in the
>typedef makes a lot of sense to me, it doesn't matter that it actually
>works as an artifact of the storage backend. I'd argue that it would
>make sense to sort by the specification order even if we changed the
>backend to use varchars rather than numbers.
>
>
>

Can't see the point in it, I must confess.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Gregory Maxwell <gmaxwell(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-27 23:22:19
Message-ID: 4361612B.8030804@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim C. Nasby wrote:

>
>Like I said, if we're going to support a concept of ordering of items in
>an enum then we need to support it fully. For starters that means having
>the ability to re-order things in an enum seamlessly.
>
>

I do not see this at all. An enumeration defines an ordering and a set
of labels. Why should you be able to change it? If you want a different
ordering, create a new enumeration. Let's do this right because it's a
feature worth having, not just mimic the competition's idiocy.

cheers

andrew


From: Trent Shipley <tshipley(at)deru(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-28 00:23:02
Message-ID: 200510271723.03405.tshipley@deru.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 2005-10-27 16:22, Andrew Dunstan wrote:
> Jim C. Nasby wrote:
> >Like I said, if we're going to support a concept of ordering of items in
> >an enum then we need to support it fully. For starters that means having
> >the ability to re-order things in an enum seamlessly.
>
> I do not see this at all. An enumeration defines an ordering and a set
> of labels. Why should you be able to change it? If you want a different
> ordering, create a new enumeration. Let's do this right because it's a
> feature worth having, not just mimic the competition's idiocy
>

The symbols in the set have no _per se_ order.
A collation rule is necessary to sort the symbols consistently.
ASCII is an enumeration
Unicode is a large enumeration with a simple naive collation and a complex
default collation.

Defining a set results in an unordered specification of symbols.
Defining a collation produces an ordering for the set.
There can be many collations for a set.

An enumeration is just a computer science short-hand way to define a set and a
"native" collation for the set.
An enumeration's native collation need not be the only, or even the most
common, collation for the enumerated set of symbols.


From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: tshipley(at)deru(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-28 00:39:31
Message-ID: 20B975DA-2A24-4E50-AF2E-47CFFE9BDDC7@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Oct 28, 2005, at 9:23 , Trent Shipley wrote:

> On Thursday 2005-10-27 16:22, Andrew Dunstan wrote:
>
>> Jim C. Nasby wrote:
>>
>>> Like I said, if we're going to support a concept of ordering of
>>> items in
>>> an enum then we need to support it fully. For starters that means
>>> having
>>> the ability to re-order things in an enum seamlessly.
>>>
>>
>> I do not see this at all. An enumeration defines an ordering and a
>> set
>> of labels. Why should you be able to change it? If you want a
>> different
>> ordering, create a new enumeration. Let's do this right because
>> it's a
>> feature worth having, not just mimic the competition's idiocy
>>
>>
>
> The symbols in the set have no _per se_ order.
> A collation rule is necessary to sort the symbols consistently.
> ASCII is an enumeration
> Unicode is a large enumeration with a simple naive collation and a
> complex
> default collation.
>
> Defining a set results in an unordered specification of symbols.
> Defining a collation produces an ordering for the set.
> There can be many collations for a set.
>
> An enumeration is just a computer science short-hand way to define
> a set and a
> "native" collation for the set.
> An enumeration's native collation need not be the only, or even the
> most
> common, collation for the enumerated set of symbols.

Relational databases already have a type for unordered sets: tables.
IMO, if there's going to be a separate enumerated type, it should be
more than just an alternative way of defining a set of key-value pairs.

Michael Glaesemann
grzm myrealbox com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: tshipley(at)deru(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-28 01:05:40
Message-ID: 43617964.2070403@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Trent Shipley wrote:

>
>An enumeration is just a computer science short-hand way to define a set and a
>"native" collation for the set.
>An enumeration's native collation need not be the only, or even the most
>common, collation for the enumerated set of symbols.
>
>
>
>

No it's not. Many languages define enumerated types as having a fixed
ordering. It is not just a set. That is the sense in which I am using
the term. And, after all, you can always write a function that gives you
an alternative ordering. All we are building in is an ordering based in
the enumeration order, which you are at perfect liberty not to use.
Postgres lets you define an alternative operator class for any type, so
we are very flexible. People are getting way too hung up over this.

cheers

andrew


From: "Cristian Prieto" <cristian(at)clickdiario(dot)com>
To: "'Andrew Dunstan'" <andrew(at)dunslane(dot)net>, <tshipley(at)deru(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: enums
Date: 2005-10-28 01:35:41
Message-ID: 003901c5db5f$e9078f50$6500a8c0@gt.ClickDiario.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

What about use the declaration order as the enum order?, for example: if I
declare something like: "CREATE ENUM hola ('item1', 'item3', 'item2');"
-this is just assuming an hypothetical approach to use enum types in this
way- and the logical order of the items could be 'item1', 'item3', 'item2'
just because the user decided to create them in this way.

Well, that's just an idea...

-----Original Message-----
From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Andrew Dunstan
Sent: Jueves, 27 de Octubre de 2005 07:06 p.m.
To: tshipley(at)deru(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] enums

Trent Shipley wrote:

>
>An enumeration is just a computer science short-hand way to define a set
and a
>"native" collation for the set.
>An enumeration's native collation need not be the only, or even the most
>common, collation for the enumerated set of symbols.
>
>
>
>

No it's not. Many languages define enumerated types as having a fixed
ordering. It is not just a set. That is the sense in which I am using
the term. And, after all, you can always write a function that gives you
an alternative ordering. All we are building in is an ordering based in
the enumeration order, which you are at perfect liberty not to use.
Postgres lets you define an alternative operator class for any type, so
we are very flexible. People are getting way too hung up over this.

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
message can get through to the mailing list cleanly


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Cristian Prieto <cristian(at)clickdiario(dot)com>
Cc: tshipley(at)deru(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-28 01:46:41
Message-ID: 43618301.3070405@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Cristian Prieto wrote:

>What about use the declaration order as the enum order?, for example: if I
>declare something like: "CREATE ENUM hola ('item1', 'item3', 'item2');"
>-this is just assuming an hypothetical approach to use enum types in this
>way- and the logical order of the items could be 'item1', 'item3', 'item2'
>just because the user decided to create them in this way.
>
>Well, that's just an idea...
>
>
>

Using the declaration order is exactly what I have done.

cheers

andrew


From: Trent Shipley <tshipley(at)deru(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-28 01:58:21
Message-ID: 200510271858.21920.tshipley@deru.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 2005-10-27 17:39, Michael Glaesemann wrote:
> On Oct 28, 2005, at 9:23 , Trent Shipley wrote:
> > On Thursday 2005-10-27 16:22, Andrew Dunstan wrote:
> >> Jim C. Nasby wrote:
<snip/>
> Relational databases already have a type for unordered sets: tables.
> IMO, if there's going to be a separate enumerated type, it should be
> more than just an alternative way of defining a set of key-value pairs.
>
> Michael Glaesemann
> grzm myrealbox com

Of course, what is an enumeration except an *ordered* list of key-value pairs;
that is, a set with a built-in collation.

Glaesemann is right. To me that implies that no database NEEDS an enumeration
type.

We just started discussing it because it would greatly enhance MySQL
migration. Even more important, enumerations would be tremendously
convenient (even if they are not strictly necessary).

Enumerations would be good marketing and good engineering.

The debate is about implementation:

1) Pure list (seems to be why MySQL does). Each enumeration is a list. If
you want to create a variant collation for the list, create a new, parallel
list.

This is very straightforward but violates the "store once, read many times"
principle of database design.

2) Hybrid list + secondary collations. You create and store an enumeration
(call it enum_a). If you want to reorder the underlying set, just declare a
new collation for the enumeration (call the result enum_b). enum_b is
effectively a virtual enumeration. The relationship of enum_b to enum_a is
like that between a view and its table.

On the downside, this approach is a theoretical stew. It should be relatively
easy to implement.

3) Set + collation functions. You define a set. You define a collation for
the set. Having declared set_a and a collation_a you can then declare
enumeration_a. (Note that the result defined by the developer's collation
function may not necessarily result in a _per se_ enumeration.)

This has the appeal of separating the symbol declaration from its ordering.
Furthermore, of all the options it is the most powerful. Unfortunately, it
may be verbose, unintuitive, and the most difficult to implement.

There is probably no reason approach #1 or #2 could not be implemented using
the machinery for approach #3 under the bonnet. That way we could have
something like:

CREATE SYMBOL SET
{possibly a disguised create table, but probably not for performance
reasons}
CREATE COLLATION USING function_name
CREATE ENUMERATION
and
CREATE MYSQL_ENUMERATION. {probably just overload CREATE ENUMERATION}

===============================

http://dev.mysql.com/doc/refman/5.1/en/string-type-overview.html
http://dev.mysql.com/doc/refman/5.1/en/enum.html
(Note that unlike C enumerations MySql enumerations are two way and do some
context dependent magic.)


From: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jim Nasby <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-28 06:02:43
Message-ID: e692861c0510272302g689db8d0uca13e3897dd82b86@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/27/05, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >Yes, MySQL is broken in some regards, as usual. However, the API isn't
> >bad (except for the fact that it doesn't care what invalid crap you
> >throw at it), and more importantly there are thousands of apps and
> >developers who think around that interface. We should copy it without
> >the brokenness as much as possible unless we have good cause
> >otherwise.
> >
>
> mmm ... no. It is too broken. We should do enums orthogonally with other
> type definitions in PostgreSQL. Where I would like to get to is that we
> have a flavor of CREATE TYPE that will create the enum type for us,
> including all the support that I build into my little kit. And if you
> want to change the enumeration set on a column, you would use ALTER
> TABLE foo ALTER COLUMN bar TYPE newtype USING ...

eh, Well that we have a reasonable user extensiable type system is
reasonable reason. What I was mostly objecting to was the use of
lexical collation the "don't mess with what people already expect"
argument was just the most handy strawman available. :)

And in doing so you could insert a enum in the middle of the existing
list without breaking the values already in the table? If so that
would be very useful.

> Inline declarations of enums does not strike me as good.

You're right, it's a property of a type.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
Cc: Jim Nasby <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: enums
Date: 2005-10-28 12:39:59
Message-ID: 43621C1F.4050409@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Maxwell wrote:

>And in doing so you could insert a enum in the middle of the existing
>list without breaking the values already in the table? If so that
>would be very useful.
>
>
>
You do it by altering the column type, not by altering the type itself.
MySQL's way of doing this is made necessary by its horrid non-orthogonal
way of doing enums. Here's how it works in PostgreSQL. (To make this
example work I had to add a text conversion - an inadvertant omission
from the original. This is in a revised version of the enumkit,
available at the same location.)

andrew=# create table foo (i serial, c rgb);
NOTICE: CREATE TABLE will create implicit sequence "foo_i_seq" for
serial column "foo.i"
CREATE TABLE
andrew=# insert into foo (c) values ('blue');
INSERT 8711471 1
andrew=# insert into foo (c) values ('green');
INSERT 8711472 1
andrew=# insert into foo (c) values ('red');
INSERT 8711473 1
andrew=# select * from foo order by c;
i | c
---+-------
3 | red
2 | green
1 | blue
(3 rows)

andrew=# insert into foo (c) values ('yellow');
ERROR: invalid input value for enum: "yellow"
andrew=# alter table foo alter column c type rainbow using c::text;
ALTER TABLE
andrew=# select * from foo order by c;
i | c
---+-------
3 | red
2 | green
1 | blue
(3 rows)

andrew=# insert into foo (c) values ('yellow');
INSERT 8711477 1
andrew=# select * from foo order by c;
i | c
---+--------
3 | red
4 | yellow
2 | green
1 | blue
(4 rows)

cheers

andrew