Quick Links

enums

Lists:	pgsql-hackers

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	enums
Date:	2005-10-27 19:47:53
Message-ID:	43612EE9.5010209@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

If people would like to play, I have created a little kit to help in
creating first class enum types in a few seconds. It works something
like this:

make TYPENAME=rainbow ENUMS=' "red", "orange", "yellow", "green",
"blue", "indigo", "violet" '
make TYPENAME=rainbow install
psql -f /path/to/contrib/rainbow-install.sql yourdb

and you are done. Now you can do:

create table foo( r rainbow);
insert into foo values('red');
select 'red'::rainbow < 'green'::rainbow; <-- yields true
select rainbow_order('yellow'); <-- yields 2

The kit is at http://developer.postgresql.org/~adunstan/enumkit.tgz

Needs 8.0 or later, since it use PGXS.

Maximum number of values is 32767 - but if you use that many you're
insane anyway :-)

I did this as part of thinking about how we might do enums properly. AS
Chris KL recently noted - it is very often asked for. So this is not the
end of the road, just a tiny step at the beginning.

cheers

andrew

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-27 20:46:03
Message-ID:	20051027204603.GN63747@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

This is cool; it's something people can use today if nothing else.
Long-term, is it practical to have the enums compiled in? ISTM that's
not very workable, but I'm completely guessing. The other issue is that
this version makes it very difficult to change what's in the enum (not
that that's at all easy with MySQL...)

On Thu, Oct 27, 2005 at 03:47:53PM -0400, Andrew Dunstan wrote:
>
> If people would like to play, I have created a little kit to help in
> creating first class enum types in a few seconds. It works something
> like this:
>
> make TYPENAME=rainbow ENUMS=' "red", "orange", "yellow", "green",
> "blue", "indigo", "violet" '
> make TYPENAME=rainbow install
> psql -f /path/to/contrib/rainbow-install.sql yourdb
>
> and you are done. Now you can do:
>
> create table foo( r rainbow);
> insert into foo values('red');
> select 'red'::rainbow < 'green'::rainbow; <-- yields true
> select rainbow_order('yellow'); <-- yields 2
>
> The kit is at http://developer.postgresql.org/~adunstan/enumkit.tgz
>
> Needs 8.0 or later, since it use PGXS.
>
> Maximum number of values is 32767 - but if you use that many you're
> insane anyway :-)
>
> I did this as part of thinking about how we might do enums properly. AS
> Chris KL recently noted - it is very often asked for. So this is not the
> end of the road, just a tiny step at the beginning.
>
> cheers
>
> andrew
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Ted Rolle <ted(dot)rolle(at)gmail(dot)com>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-27 20:54:36
Message-ID:	11dc8b9e0510271354w76677229oa19774cf8bd45ce8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

This little snippet is great! The only problem I see is that the enums must
be consistent across all modules.

What about loading a variable with a "default" value? Then it could be
adjusted to 'play'.

On 10/27/05, Jim C. Nasby <jnasby(at)pervasive(dot)com> wrote:
>
> This is cool; it's something people can use today if nothing else.
> Long-term, is it practical to have the enums compiled in? ISTM that's
> not very workable, but I'm completely guessing. The other issue is that
> this version makes it very difficult to change what's in the enum (not
> that that's at all easy with MySQL...)
>
> On Thu, Oct 27, 2005 at 03:47:53PM -0400, Andrew Dunstan wrote:
> >
> > If people would like to play, I have created a little kit to help in
> > creating first class enum types in a few seconds. It works something
> > like this:
> >
> > make TYPENAME=rainbow ENUMS=' "red", "orange", "yellow", "green",
> > "blue", "indigo", "violet" '
> > make TYPENAME=rainbow install
> > psql -f /path/to/contrib/rainbow-install.sql yourdb
> >
> > and you are done. Now you can do:
> >
> > create table foo( r rainbow);
> > insert into foo values('red');
> > select 'red'::rainbow < 'green'::rainbow; <-- yields true
> > select rainbow_order('yellow'); <-- yields 2
> >
> > The kit is at http://developer.postgresql.org/~adunstan/enumkit.tgz
> >
> > Needs 8.0 or later, since it use PGXS.
> >
> > Maximum number of values is 32767 - but if you use that many you're
> > insane anyway :-)
> >
> > I did this as part of thinking about how we might do enums properly. AS
> > Chris KL recently noted - it is very often asked for. So this is not the
> > end of the road, just a tiny step at the beginning.
> >
> > cheers
> >
> > andrew
>
>

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	ted(at)php(dot)net
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-27 21:26:13
Message-ID:	20051027212613.GP63747@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2005 at 04:54:36PM -0400, Ted Rolle wrote:
> This little snippet is great! The only problem I see is that the enums must
> be consistent across all modules.
>
> What about loading a variable with a "default" value? Then it could be
> adjusted to 'play'.

Huh? Sorry, but you completely lost me here...

On another note, I noticed that the comparison operators seem to be
comparing the underlying numeric value used to store the enum, which is
wrong IMO. Consider:

ENUM color '"red","blue","green"'
CREATE TABLE t (c color);
INSERT INTO t VALUES('blue');
INSERT INTO t VALUES('green');
INSERT INTO t VALUES('red');
SELECT c FROM t ORDER BY c;
red
blue
green

That seems counter-intuitive. It's also exposing an implimentation
detail (that the enum is stored internally as a number).
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-27 23:02:45
Message-ID:	43615C95.3090508@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim C. Nasby wrote:

>
>On another note, I noticed that the comparison operators seem to be
>comparing the underlying numeric value used to store the enum, which is
>wrong IMO. Consider:
>
>ENUM color '"red","blue","green"'
>CREATE TABLE t (c color);
>INSERT INTO t VALUES('blue');
>INSERT INTO t VALUES('green');
>INSERT INTO t VALUES('red');
>SELECT c FROM t ORDER BY c;
>red
>blue
>green
>
>That seems counter-intuitive. It's also exposing an implimentation
>detail (that the enum is stored internally as a number).
>
>

No it is not. Not in the slightest. It is honoring the enumeration order
defined for the type. That is the ONLY correct behaviour, IMNSHO.
Otherwise, you could just as easily use a domain with a check constraint.

In fact, mysql's behaviour is laughably, even ludicrously, inconsistent:

mysql> select * from t where color < 'green';
+-------+
| color |
+-------+
| blue |
+-------+

So for "order by" it honors the enumeration order, but for < it uses the
lexical ordering. Lovely, eh?

cheers

andrew

From:	Michael Fuhr <mike(at)fuhr(dot)org>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-27 23:41:01
Message-ID:	20051027234101.GA50434@winnie.fuhr.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2005 at 07:02:45PM -0400, Andrew Dunstan wrote:
> Jim C. Nasby wrote:
> >On another note, I noticed that the comparison operators seem to be
> >comparing the underlying numeric value used to store the enum, which is
> >wrong IMO. Consider:
> >
> >ENUM color '"red","blue","green"'
> >CREATE TABLE t (c color);
> >INSERT INTO t VALUES('blue');
> >INSERT INTO t VALUES('green');
> >INSERT INTO t VALUES('red');
> >SELECT c FROM t ORDER BY c;
> >red
> >blue
> >green
> >
> >That seems counter-intuitive. It's also exposing an implimentation
> >detail (that the enum is stored internally as a number).
>
> No it is not. Not in the slightest. It is honoring the enumeration order
> defined for the type. That is the ONLY correct behaviour, IMNSHO.

I agree. Honoring the enumeration order makes sense if you consider
the values as things that should be ordered based on some property
of their thingness instead of based on what their labels happen to
be in a particular language. If I have an enumeration of colors I
might want values sorted by their position in the spectrum, so
whether the labels are (red, green, blue) or (gorri, berde, urdin)
I might want to maintain that particular order.

If you want values ordered lexically then you can enumerate them
that way. Why force that behavior on people who want to order based
on some other criteria?

--
Michael Fuhr

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Michael Fuhr <mike(at)fuhr(dot)org>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 01:24:57
Message-ID:	20051028012457.GC63747@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2005 at 05:41:01PM -0600, Michael Fuhr wrote:
> If you want values ordered lexically then you can enumerate them
> that way. Why force that behavior on people who want to order based
> on some other criteria?

Well, I was arguing about the default behavior. I'd bet that we're going
to have a constant set of people wondering why ORDER BY is doing the
'wrong thing' when ordering an ENUM, which is why I argued that the
default behavior should be ordering based on the external type, not how
we're storing it or some other order.

But I'm clearly in the minority in this view, so I'm droping it. :)

On to other issues...

Andrew, you mentioned that if you want to change the ordering you should
just create a new type. What about if you need to change the values that
are in the enum? MySQL does (or at least did, it's been some time since
I've messed with this) a horrible job at that. There's no way to rename
anything; you have to add the new names you want, then do a bulk update,
then delete the (now old) names. IMO this is broken.

Also, if we are going to maintain ordering and mapping (presumably via
the internal number that we're storing), then I think we should expose
that, at least optionally. So for example, you should be able to define
what a specific enum value means. Not everyone will want a linear
numbering starting at 0 afterall.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Michael Fuhr <mike(at)fuhr(dot)org>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 01:45:05
Message-ID:	436182A1.7050806@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim C. Nasby wrote:

>Andrew, you mentioned that if you want to change the ordering you should
>just create a new type. What about if you need to change the values that
>are in the enum? MySQL does (or at least did, it's been some time since
>I've messed with this) a horrible job at that. There's no way to rename
>anything; you have to add the new names you want, then do a bulk update,
>then delete the (now old) names. IMO this is broken.
>
>

It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE
newtype USING expression" operation. You would write a function that
took a value of the old type and returned a value of the new type and
use a cll to that function in the expression. Since these would be named
types, unlike the case in mysql where they are anonymously defined
inline, this would present no difficulties at all.

>Also, if we are going to maintain ordering and mapping (presumably via
>the internal number that we're storing), then I think we should expose
>that, at least optionally. So for example, you should be able to define
>what a specific enum value means. Not everyone will want a linear
>numbering starting at 0 afterall.
>
>

What on earth for? Users should not care in the slightest what the
internal representation is . Users who want a map where the values are
exposed should create a lookup table.

You keep saying that we are using the internal representation as the
ordering. This is simply the wrong way to look at it. The internal
representation REFLECTS the ordering; it doesn't impose it. The user has
imposed the ordering when defining the type. In my enumkit I did provide
a function that gave back the internal representation, but I am not by
any means certain that that's a good idea.

cheers

andrew

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Michael Fuhr <mike(at)fuhr(dot)org>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 02:14:15
Message-ID:	20051028021415.GH63747@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2005 at 09:45:05PM -0400, Andrew Dunstan wrote:
>
>
> Jim C. Nasby wrote:
>
> >Andrew, you mentioned that if you want to change the ordering you should
> >just create a new type. What about if you need to change the values that
> >are in the enum? MySQL does (or at least did, it's been some time since
> >I've messed with this) a horrible job at that. There's no way to rename
> >anything; you have to add the new names you want, then do a bulk update,
> >then delete the (now old) names. IMO this is broken.
> >
> >
>
>
> It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE
> newtype USING expression" operation. You would write a function that
> took a value of the old type and returned a value of the new type and
> use a cll to that function in the expression. Since these would be named
> types, unlike the case in mysql where they are anonymously defined
> inline, this would present no difficulties at all.

But why force a re-write of the entire table just to change the name of
something?

Or is ALTER COLUMN TYPE smart enough to not touch anything if the
mapping function is equality?

> >Also, if we are going to maintain ordering and mapping (presumably via
> >the internal number that we're storing), then I think we should expose
> >that, at least optionally. So for example, you should be able to define
> >what a specific enum value means. Not everyone will want a linear
> >numbering starting at 0 afterall.
> >
> >
>
> What on earth for? Users should not care in the slightest what the
> internal representation is . Users who want a map where the values are
> exposed should create a lookup table.
>
> You keep saying that we are using the internal representation as the
> ordering. This is simply the wrong way to look at it. The internal
> representation REFLECTS the ordering; it doesn't impose it. The user has
> imposed the ordering when defining the type. In my enumkit I did provide
> a function that gave back the internal representation, but I am not by
> any means certain that that's a good idea.

Well, someone was arguing that enum should be used as a convenient way
to map human labels on a set of values. To me, that means you should be
able to define exactly what that set of values is.

Personally, I don't see why enum can't just be syntactic sugar on top of
a side-table of values and a foreign key. And I guess a view to hide the
internals from normal viewing. That would certainly allow the most
flexibility, although it probably wouldn't perform as well as what you
wrote.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	ted(at)php(dot)net
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 02:20:51
Message-ID:	43618B03.8050704@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ted Rolle wrote:

> This little snippet is great! The only problem I see is that the
> enums must be consistent across all modules.
>
> What about loading a variable with a "default" value? Then it could
> be adjusted to 'play'.
>
>

You can set a default for a variable using one of these types, as you
can for any other postgres type:

create table bar (color rainbow default 'blue');
insert into bar values(default);

As for any postgres type, the default must be a valid value for the type.

cheers

andrew

From:	Rod Taylor <pg(at)rbt(dot)ca>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, ted(at)php(dot)net, Michael Fuhr <mike(at)fuhr(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: enums
Date:	2005-10-28 02:23:27
Message-ID:	1130466207.846.40.camel@home
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> > It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE
> > newtype USING expression" operation. You would write a function that
> > took a value of the old type and returned a value of the new type and
> > use a cll to that function in the expression. Since these would be named
> > types, unlike the case in mysql where they are anonymously defined
> > inline, this would present no difficulties at all.
>
> But why force a re-write of the entire table just to change the name of
> something?
>
> Or is ALTER COLUMN TYPE smart enough to not touch anything if the
> mapping function is equality?

Nearly all ALTER TABLE commands are processed in the same way. In fact,
in some cases they are combined (ADD column is broken down then
re-combined later for a single rewrite).

You could pretty easily add this type of logic to skip the rewrite stage
if not needed.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Michael Fuhr <mike(at)fuhr(dot)org>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 02:34:57
Message-ID:	43618E51.2000407@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim C. Nasby wrote:

>On Thu, Oct 27, 2005 at 09:45:05PM -0400, Andrew Dunstan wrote:
>
>
>>Jim C. Nasby wrote:
>>
>>
>>
>>>Andrew, you mentioned that if you want to change the ordering you should
>>>just create a new type. What about if you need to change the values that
>>>are in the enum? MySQL does (or at least did, it's been some time since
>>>I've messed with this) a horrible job at that. There's no way to rename
>>>anything; you have to add the new names you want, then do a bulk update,
>>>then delete the (now old) names. IMO this is broken.
>>>
>>>
>>>
>>>
>>It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE
>>newtype USING expression" operation. You would write a function that
>>took a value of the old type and returned a value of the new type and
>>use a cll to that function in the expression. Since these would be named
>>types, unlike the case in mysql where they are anonymously defined
>>inline, this would present no difficulties at all.
>>
>>
>
>But why force a re-write of the entire table just to change the name of
>something?
>
>

Because you are not just changing the name of something.

You can't decide you want to insert a new digit between 3 and 4 for an
integer column and just affect a few rows, rather than change the whole
set of rows for that field. Nor change the order of the integers. An
enumerated type is just like that, except that the values are labels
instead of digit sequences.

>>
>>You keep saying that we are using the internal representation as the
>>ordering. This is simply the wrong way to look at it. The internal
>>representation REFLECTS the ordering; it doesn't impose it. The user has
>>imposed the ordering when defining the type. In my enumkit I did provide
>>a function that gave back the internal representation, but I am not by
>>any means certain that that's a good idea.
>>
>>
>
>Well, someone was arguing that enum should be used as a convenient way
>to map human labels on a set of values. To me, that means you should be
>able to define exactly what that set of values is.
>
>

Well, that's not my conception at all. Then it is not an enumeration in
my view.

>Personally, I don't see why enum can't just be syntactic sugar on top of
>a side-table of values and a foreign key. And I guess a view to hide the
>internals from normal viewing. That would certainly allow the most
>flexibility, although it probably wouldn't perform as well as what you
>wrote.
>
>

The other issue is ease of use.

We used lookup tables in bugzilla when it was converted to work with
Postgres. But many users will find having to do that annoying, to say
the least. I think there's a very good case for providing true enums.
There is a technical part of the puzzle I can't quite see yet, though :-)

cheers

andrew

From:	Rod Taylor <pg(at)rbt(dot)ca>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Subject:	Re: enums
Date:	2005-10-28 03:07:19
Message-ID:	1130468839.846.55.camel@home
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> The other issue is ease of use.
>
> We used lookup tables in bugzilla when it was converted to work with
> Postgres. But many users will find having to do that annoying, to say
> the least. I think there's a very good case for providing true enums.

Then why did you use lookup tables instead of a varchar and a
constraint? Probably performance.

A much more general purpose but just as good solution would be the
ability to create a hidden surrogate key for a structure.

CREATE TABLE status (code varchar(20) PRIMARY KEY) WITH SURROGATE;
CREATE TABLE account (name varchar(60), status varchar(20) references
status);

Behind the scenes (transparent to the user) this gets converted to:

CREATE TABLE status (id SERIAL UNIQUE, code varchar(20) PRIMARY KEY)
WITH SURROGATE;
CREATE TABLE account (name varchar(60), status integer references
status(id));

SELECT * FROM account; would be rewritten as
SELECT * FROM (SELECT name, code FROM account JOIN status USING (id)) AS
account;

Enum might be good for a short list of items but something like the
above should be good for any common value that we manually create
surrogate keys for today but without the clutter or the application
needing to know.

If PostgreSQL had an updatable view implementation it would be pretty
simple to implement.

From:	Gregory Maxwell <gmaxwell(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 05:57:38
Message-ID:	e692861c0510272257n6fb8943fl1bcfd30942f1c35b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/27/05, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >That seems counter-intuitive. It's also exposing an implimentation
> >detail (that the enum is stored internally as a number).
>
> No it is not. Not in the slightest. It is honoring the enumeration order
> defined for the type. That is the ONLY correct behaviour, IMNSHO.
> Otherwise, you could just as easily use a domain with a check constraint.
>
> In fact, mysql's behaviour is laughably, even ludicrously, inconsistent:
[snip]
> So for "order by" it honors the enumeration order, but for < it uses the
> lexical ordering. Lovely, eh?

Oh wow. That is broken, I didn't try that case because I figured it
would do it right (i.e. use the enum order).

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Rod Taylor <pg(at)rbt(dot)ca>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Subject:	Re: enums
Date:	2005-10-28 14:16:15
Message-ID:	436232AF.6050600@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Rod Taylor wrote:

>>The other issue is ease of use.
>>
>>We used lookup tables in bugzilla when it was converted to work with
>>Postgres. But many users will find having to do that annoying, to say
>>the least. I think there's a very good case for providing true enums.
>>
>>
>
>Then why did you use lookup tables instead of a varchar and a
>constraint? Probably performance.
>
>

To be honest, I forget why. Possible because we also needed to be able
to get a list of allowed values, although I don't know how one does that
in mysql. Maybe because it just seemed like a good idea at the time and
nobody spoke up against it.

>A much more general purpose but just as good solution would be the
>ability to create a hidden surrogate key for a structure.
>
>CREATE TABLE status (code varchar(20) PRIMARY KEY) WITH SURROGATE;
>CREATE TABLE account (name varchar(60), status varchar(20) references
>status);
>
>Behind the scenes (transparent to the user) this gets converted to:
>
>CREATE TABLE status (id SERIAL UNIQUE, code varchar(20) PRIMARY KEY)
>WITH SURROGATE;
>CREATE TABLE account (name varchar(60), status integer references
>status(id));
>
>
>SELECT * FROM account; would be rewritten as
>SELECT * FROM (SELECT name, code FROM account JOIN status USING (id)) AS
>account;
>
>Enum might be good for a short list of items but something like the
>above should be good for any common value that we manually create
>surrogate keys for today but without the clutter or the application
>needing to know.
>
>If PostgreSQL had an updatable view implementation it would be pretty
>simple to implement.
>
>
>

That won't make it easier to change the ordering or the value set, which
some people seem concerned about.

But it too might be a nice feature. I suspect it would be a lot more
work than simple enums, for which there is significant demand.

cheers

andrew

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Rod Taylor <pg(at)rbt(dot)ca>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>
Subject:	Re: enums
Date:	2005-10-28 18:20:51
Message-ID:	20051028182051.GJ13187@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2005 at 11:07:19PM -0400, Rod Taylor wrote:
> > The other issue is ease of use.
> >
> > We used lookup tables in bugzilla when it was converted to work with
> > Postgres. But many users will find having to do that annoying, to say
> > the least. I think there's a very good case for providing true enums.
>
> Then why did you use lookup tables instead of a varchar and a
> constraint? Probably performance.
>
> A much more general purpose but just as good solution would be the
> ability to create a hidden surrogate key for a structure.
>
> CREATE TABLE status (code varchar(20) PRIMARY KEY) WITH SURROGATE;
> CREATE TABLE account (name varchar(60), status varchar(20) references
> status);
>
> Behind the scenes (transparent to the user) this gets converted to:
>
> CREATE TABLE status (id SERIAL UNIQUE, code varchar(20) PRIMARY KEY)
> WITH SURROGATE;
> CREATE TABLE account (name varchar(60), status integer references
> status(id));
>
>
> SELECT * FROM account; would be rewritten as
> SELECT * FROM (SELECT name, code FROM account JOIN status USING (id)) AS
> account;
>
> Enum might be good for a short list of items but something like the
> above should be good for any common value that we manually create
> surrogate keys for today but without the clutter or the application
> needing to know.
>
> If PostgreSQL had an updatable view implementation it would be pretty
> simple to implement.

I'm not quiet following the WITH SURROGATE bit, but what you've
described certainly looks valuable. Note that I would still want to be
able to get at the raw numeric values in some fasion.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Michael Fuhr <mike(at)fuhr(dot)org>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 18:23:27
Message-ID:	20051028182327.GK13187@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2005 at 10:34:57PM -0400, Andrew Dunstan wrote:
>
>
> Jim C. Nasby wrote:
>
> >On Thu, Oct 27, 2005 at 09:45:05PM -0400, Andrew Dunstan wrote:
> >
> >
> >>Jim C. Nasby wrote:
> >>
> >>
> >>
> >>>Andrew, you mentioned that if you want to change the ordering you should
> >>>just create a new type. What about if you need to change the values that
> >>>are in the enum? MySQL does (or at least did, it's been some time since
> >>>I've messed with this) a horrible job at that. There's no way to rename
> >>>anything; you have to add the new names you want, then do a bulk update,
> >>>then delete the (now old) names. IMO this is broken.
> >>>
> >>>
> >>>
> >>>
> >>It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE
> >>newtype USING expression" operation. You would write a function that
> >>took a value of the old type and returned a value of the new type and
> >>use a cll to that function in the expression. Since these would be named
> >>types, unlike the case in mysql where they are anonymously defined
> >>inline, this would present no difficulties at all.
> >>
> >>
> >
> >But why force a re-write of the entire table just to change the name of
> >something?
> >
> >
>
> Because you are not just changing the name of something.

No, I was refering specifically to the case of wanting to rename
something. IE: you setup an enum for sky colors (blue, black), and then
the PHB issues an edict that the daytime sky is now green. In this case
you (or at least I) don't want to define a new enum, I just want to
change 'blue' to 'green' in that enum. There's no reason it needs to hit
the table at all.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Rod Taylor <pg(at)rbt(dot)ca>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>
Subject:	Re: enums
Date:	2005-10-28 18:57:03
Message-ID:	1130525823.846.131.camel@home
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2005-10-28 at 13:20 -0500, Jim C. Nasby wrote:
> On Thu, Oct 27, 2005 at 11:07:19PM -0400, Rod Taylor wrote:
> > > The other issue is ease of use.
> > >
> > > We used lookup tables in bugzilla when it was converted to work with
> > > Postgres. But many users will find having to do that annoying, to say
> > > the least. I think there's a very good case for providing true enums.
> >
> > Then why did you use lookup tables instead of a varchar and a
> > constraint? Probably performance.
> >
> > A much more general purpose but just as good solution would be the
> > ability to create a hidden surrogate key for a structure.
> >
> > CREATE TABLE status (code varchar(20) PRIMARY KEY) WITH SURROGATE;
> > CREATE TABLE account (name varchar(60), status varchar(20) references
> > status);
> >
> > Behind the scenes (transparent to the user) this gets converted to:
> >
> > CREATE TABLE status (id SERIAL UNIQUE, code varchar(20) PRIMARY KEY)
> > WITH SURROGATE;
> > CREATE TABLE account (name varchar(60), status integer references
> > status(id));
> >
> >
> > SELECT * FROM account; would be rewritten as
> > SELECT * FROM (SELECT name, code FROM account JOIN status USING (id)) AS
> > account;
> >
> > Enum might be good for a short list of items but something like the
> > above should be good for any common value that we manually create
> > surrogate keys for today but without the clutter or the application
> > needing to know.
> >
> > If PostgreSQL had an updatable view implementation it would be pretty
> > simple to implement.
>
> I'm not quiet following the WITH SURROGATE bit, but what you've
> described certainly looks valuable. Note that I would still want to be
> able to get at the raw numeric values in some fasion.

The basic idea is that most of us break out schemas by creating fake
primary keys for the purpose of obtaining performance because using the
proper primary key (single or multiple columns) is often very slow.

The automatic and transparent creation of a surrogate key by PostgreSQL
would allow us to dramatically clean up the presentation of our schema
to the users using the database without the performance hit we currently
get.

It puts surrogate keys (fake primary keys) back to the level of table
spaces, indexes and other performance enhancements where they belong.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Michael Fuhr <mike(at)fuhr(dot)org>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 20:12:01
Message-ID:	43628611.2020906@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim C. Nasby wrote:

>>>>
>>>>
>>>But why force a re-write of the entire table just to change the name of
>>>something?
>>>
>>>
>>>
>>>
>>Because you are not just changing the name of something.
>>
>>
>
>No, I was refering specifically to the case of wanting to rename
>something. IE: you setup an enum for sky colors (blue, black), and then
>the PHB issues an edict that the daytime sky is now green. In this case
>you (or at least I) don't want to define a new enum, I just want to
>change 'blue' to 'green' in that enum. There's no reason it needs to hit
>the table at all.
>
>

Well, with enumkit you can't, because the values are hardwired in the
.so file. With a builtin facility you would be able to, because the
values would live in the catalog. However, hacking the catalog is not
something I would encourage - what you are suggesting basically breaks
the abstraction. But sure, it would be possible. I would not provide an
SQL level facility to do it, though. My approved way to do it would be
like the example I gave earlier.

cheers

andrew

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Rod Taylor <pg(at)rbt(dot)ca>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>
Subject:	Re: enums
Date:	2005-10-28 20:21:59
Message-ID:	20051028202159.GT13187@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Oct 28, 2005 at 02:57:03PM -0400, Rod Taylor wrote:
> The basic idea is that most of us break out schemas by creating fake
> primary keys for the purpose of obtaining performance because using the
> proper primary key (single or multiple columns) is often very slow.
>
> The automatic and transparent creation of a surrogate key by PostgreSQL
> would allow us to dramatically clean up the presentation of our schema
> to the users using the database without the performance hit we currently
> get.
>
>
> It puts surrogate keys (fake primary keys) back to the level of table
> spaces, indexes and other performance enhancements where they belong.

Ahh. Yes, that would definately be great to have. Although it would
probably take me months if not years to get used to not seeing a bunch
of _id fields laying all over the place...

Is SURROGATE part of any of the ANSI specs?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Michael Fuhr <mike(at)fuhr(dot)org>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 20:23:26
Message-ID:	20051028202326.GU13187@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Oct 28, 2005 at 04:12:01PM -0400, Andrew Dunstan wrote:
> Well, with enumkit you can't, because the values are hardwired in the
> .so file. With a builtin facility you would be able to, because the
> values would live in the catalog. However, hacking the catalog is not
> something I would encourage - what you are suggesting basically breaks
> the abstraction. But sure, it would be possible. I would not provide an
> SQL level facility to do it, though. My approved way to do it would be
> like the example I gave earlier.

Why not allow renaming though? It seems like a logical feature to have,
and an easy one to add. What am I missing?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Michael Fuhr <mike(at)fuhr(dot)org>, ted(at)php(dot)net, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: enums
Date:	2005-10-28 20:31:36
Message-ID:	43628AA8.4030601@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim C. Nasby wrote:

>On Fri, Oct 28, 2005 at 04:12:01PM -0400, Andrew Dunstan wrote:
>
>
>>Well, with enumkit you can't, because the values are hardwired in the
>>.so file. With a builtin facility you would be able to, because the
>>values would live in the catalog. However, hacking the catalog is not
>>something I would encourage - what you are suggesting basically breaks
>>the abstraction. But sure, it would be possible. I would not provide an
>>SQL level facility to do it, though. My approved way to do it would be
>>like the example I gave earlier.
>>
>>
>
>Why not allow renaming though? It seems like a logical feature to have,
>and an easy one to add. What am I missing?
>
>

That it is not changing a name, but a value. It's roughly the equivalent
of inserting a new digit between 3 and 4. Your "feature" breaks the
abstraction I am trying to implement.

cheers

andrew

From:	Rod Taylor <pg(at)rbt(dot)ca>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>
Subject:	Re: enums
Date:	2005-10-28 20:36:26
Message-ID:	1130531786.846.139.camel@home
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2005-10-28 at 15:21 -0500, Jim C. Nasby wrote:
> On Fri, Oct 28, 2005 at 02:57:03PM -0400, Rod Taylor wrote:
> > The basic idea is that most of us break out schemas by creating fake
> > primary keys for the purpose of obtaining performance because using the
> > proper primary key (single or multiple columns) is often very slow.
> >
> > The automatic and transparent creation of a surrogate key by PostgreSQL
> > would allow us to dramatically clean up the presentation of our schema
> > to the users using the database without the performance hit we currently
> > get.
> >
> >
> > It puts surrogate keys (fake primary keys) back to the level of table
> > spaces, indexes and other performance enhancements where they belong.
>
> Ahh. Yes, that would definately be great to have. Although it would
> probably take me months if not years to get used to not seeing a bunch
> of _id fields laying all over the place...
>
> Is SURROGATE part of any of the ANSI specs?

No, but neither is an index, rollback segment, or table space. The ANSI
spec doesn't usually deal with performance tweaks that are the
responsibility of the DBA.

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Rod Taylor <pg(at)rbt(dot)ca>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>
Subject:	Re: enums
Date:	2005-10-28 21:28:37
Message-ID:	20051028212837.GX13187@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Oct 28, 2005 at 04:36:26PM -0400, Rod Taylor wrote:
> On Fri, 2005-10-28 at 15:21 -0500, Jim C. Nasby wrote:
> > On Fri, Oct 28, 2005 at 02:57:03PM -0400, Rod Taylor wrote:
> > > The basic idea is that most of us break out schemas by creating fake
> > > primary keys for the purpose of obtaining performance because using the
> > > proper primary key (single or multiple columns) is often very slow.
> > >
> > > The automatic and transparent creation of a surrogate key by PostgreSQL
> > > would allow us to dramatically clean up the presentation of our schema
> > > to the users using the database without the performance hit we currently
> > > get.
> > >
> > >
> > > It puts surrogate keys (fake primary keys) back to the level of table
> > > spaces, indexes and other performance enhancements where they belong.
> >
> > Ahh. Yes, that would definately be great to have. Although it would
> > probably take me months if not years to get used to not seeing a bunch
> > of _id fields laying all over the place...
> >
> > Is SURROGATE part of any of the ANSI specs?
>
> No, but neither is an index, rollback segment, or table space. The ANSI
> spec doesn't usually deal with performance tweaks that are the
> responsibility of the DBA.

True, but none of those other things you mention affect external
representation of data. But I was more wondering if we were inventing
syntax on the fly here or not...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Rod Taylor <pg(at)rbt(dot)ca>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>
Subject:	Re: enums
Date:	2005-10-28 22:10:26
Message-ID:	1130537426.846.146.camel@home
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2005-10-28 at 16:28 -0500, Jim C. Nasby wrote:
> On Fri, Oct 28, 2005 at 04:36:26PM -0400, Rod Taylor wrote:
> > On Fri, 2005-10-28 at 15:21 -0500, Jim C. Nasby wrote:
> > > On Fri, Oct 28, 2005 at 02:57:03PM -0400, Rod Taylor wrote:
> > > > The basic idea is that most of us break out schemas by creating fake
> > > > primary keys for the purpose of obtaining performance because using the
> > > > proper primary key (single or multiple columns) is often very slow.
> > > >
> > > > The automatic and transparent creation of a surrogate key by PostgreSQL
> > > > would allow us to dramatically clean up the presentation of our schema
> > > > to the users using the database without the performance hit we currently
> > > > get.
> > > >
> > > >
> > > > It puts surrogate keys (fake primary keys) back to the level of table
> > > > spaces, indexes and other performance enhancements where they belong.
> > >
> > > Ahh. Yes, that would definately be great to have. Although it would
> > > probably take me months if not years to get used to not seeing a bunch
> > > of _id fields laying all over the place...
> > >
> > > Is SURROGATE part of any of the ANSI specs?
> >
> > No, but neither is an index, rollback segment, or table space. The ANSI
> > spec doesn't usually deal with performance tweaks that are the
> > responsibility of the DBA.
>
> True, but none of those other things you mention affect external
> representation of data. But I was more wondering if we were inventing
> syntax on the fly here or not...

It isn't supposed to impact the external representation of the data and
generally neither is an ENUM outside of the potential sorting ability. I
was just getting the impression that the big push for enums was to be
able to use a 'real word' but without a performance hit.

A regular old table, foreign key to a varchar gives you the 'real word'
and the surrogate key allows you to do so without a performance hit.

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Rod Taylor <pg(at)rbt(dot)ca>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Fuhr <mike(at)fuhr(dot)org>
Subject:	Re: enums
Date:	2005-10-28 22:21:54
Message-ID:	20051028222154.GD13187@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Oct 28, 2005 at 06:10:26PM -0400, Rod Taylor wrote:
> It isn't supposed to impact the external representation of the data and
> generally neither is an ENUM outside of the potential sorting ability. I
> was just getting the impression that the big push for enums was to be
> able to use a 'real word' but without a performance hit.
>
> A regular old table, foreign key to a varchar gives you the 'real word'
> and the surrogate key allows you to do so without a performance hit.

I think there's probably good use cases for each. If you've got
something small like a status field, 'enum' might be better. For bigger
things, SURROGATE could be nice syntactic sugar.

Now that I finally understand what Andrew's been getting at with enums,
I'm wondering if we might want to expand on the typical usage a bit.
Looking at a plain-old C enum, you're just representing some magic
labels with a number to save space. Things like say, SLRU_PAGE_CLEAN,
SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_CLEAN. Those names are great from
a code standpoint, but they're not something you'd typically want to
display to the user. So, imho a useful extension would be to allow for
enums to contain both the 'machine name' and a 'human name', where the
human name could be renamed freely. To put this in a more concrete
example; I hate the default priorities that ship with bugzilla; P1 - P5.
Is 1 high or is 5? So I always rename them to Very Low, Low ... Very
High. That means making changes both to the database and to the code.
But if Bugzilla was using my idea of an enum then the code would refer
to priorities with P1...P5 (or whatever else they wanted to call it) and
I could easily change the human names to something that can't be
confused.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Surrogate keys (Was: enums)
Date:	2006-01-13 12:42:55
Message-ID:	loom.20060113T133918-638@post.gmane.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Rod Taylor <pg <at> rbt.ca> writes:

> The basic idea is that most of us break out schemas by creating fake
> primary keys for the purpose of obtaining performance because using the
> proper primary key (single or multiple columns) is often very slow.

This is one thing I simply can't understand.

If you still declare the natural key(s) as UNIQUEs, you have just made
performance worse. Now there are two keys to be checked on UPDATEs and
INSERTs, two indexes to be updated, and probably a SEQUENCE too.

If you don't, you have just thrown away centralised, optimised integrity
checking, and will probably have to remember to do a slower SELECT before
updating.

Certainly decoupling presentation from storage would be nice, but even before
that generalised use of surrogate keys seems to me a knee-jerk reaction.

From:	Michael Glaesemann <grzm(at)myrealbox(dot)com>
To:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-14 10:28:21
Message-ID:	812B6253-520A-46E1-A120-011F23B558D7@myrealbox.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jan 13, 2006, at 21:42 , Leandro Guimarães Faria Corcete DUTRA wrote:

> If you still declare the natural key(s) as UNIQUEs, you have just made
> performance worse. Now there are two keys to be checked on UPDATEs
> and
> INSERTs, two indexes to be updated, and probably a SEQUENCE too.

For UPDATEs and INSERTs, the "proper" primary key also needs to be
checked, but keys are used for more than just checking uniqueness:
they're also often used in JOINs. Joining against a single integer
I'd think it quite a different proposition (I'd think faster in terms
of performance) than joining against, say, a text column or a
composite key.

Michael Glaesemann
grzm myrealbox com

From:	Lukas Smith <smith(at)pooteeweet(dot)org>
To:	Michael Glaesemann <grzm(at)myrealbox(dot)com>
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-14 10:34:31
Message-ID:	43C8D3B7.7070007@pooteeweet.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Michael Glaesemann wrote:
>
> On Jan 13, 2006, at 21:42 , Leandro Guimarães Faria Corcete DUTRA wrote:
>
>> If you still declare the natural key(s) as UNIQUEs, you have just made
>> performance worse. Now there are two keys to be checked on UPDATEs and
>> INSERTs, two indexes to be updated, and probably a SEQUENCE too.
>
> For UPDATEs and INSERTs, the "proper" primary key also needs to be
> checked, but keys are used for more than just checking uniqueness:
> they're also often used in JOINs. Joining against a single integer I'd
> think it quite a different proposition (I'd think faster in terms of
> performance) than joining against, say, a text column or a composite key.

Well this is a balancing decision. You certainly slow down inserts. You
might also increase the stress on the table because you have to
translate between the different keys. It also depends on the join type
you end up doing. It also obviously depends on how large your original
primary key is. However whatever your situation is: make sure you do not
end up doing premature optimization.

regards,
Lukas

From:	Rod Taylor <pg(at)rbt(dot)ca>
To:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-14 15:32:34
Message-ID:	1137252754.7815.90.camel@home
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2006-01-13 at 12:42 +0000, Leandro Guimarães Faria Corcete DUTRA
wrote:
> Rod Taylor <pg <at> rbt.ca> writes:
>
> > The basic idea is that most of us break out schemas by creating fake
> > primary keys for the purpose of obtaining performance because using the
> > proper primary key (single or multiple columns) is often very slow.
>
> This is one thing I simply can't understand.
>
> If you still declare the natural key(s) as UNIQUEs, you have just made
> performance worse. Now there are two keys to be checked on UPDATEs and
> INSERTs, two indexes to be updated, and probably a SEQUENCE too.

Indeed. Using a surrogate key is not free and that is why it would be
something the DBA would specify during table creation.

The main goal would be to give the option of using a surrogate key
without being forced to expose it to the applications using the
database. It is a feature akin to table spaces in that it can help
performance but without the application or standard users knowing why.

From:	Lukas Smith <smith(at)pooteeweet(dot)org>
To:	Rod Taylor <pg(at)rbt(dot)ca>
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-14 15:42:23
Message-ID:	43C91BDF.5010103@pooteeweet.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Rod Taylor wrote:

>> If you still declare the natural key(s) as UNIQUEs, you have just made
>> performance worse. Now there are two keys to be checked on UPDATEs and
>> INSERTs, two indexes to be updated, and probably a SEQUENCE too.
>
> Indeed. Using a surrogate key is not free and that is why it would be
> something the DBA would specify during table creation.
>
> The main goal would be to give the option of using a surrogate key
> without being forced to expose it to the applications using the
> database. It is a feature akin to table spaces in that it can help
> performance but without the application or standard users knowing why.

Just this morning my father came to me (he is getting into SQL now that
he is retired) with an issue where a surrogate key probably makes sense.
He is storing a tree of plant families that can get fairly deep. The
primary key is the scientific name. In order to improve performance and
get rid of the recursive lookups he currently does he now wants to use
materialized paths (<parent name>/<sub name>/<sub sub name>). He decided
not to go with nested paths since that makes it very hard to hand fix
things in the tree structure. Obviously using the scientific name in the
materialized paths can quickly give you a really wide column if you have
a fairly deep tree. In that case it could be beneficial to introduce a
surrogate key.

The only annoying bit is that he frequently needs to sync with an
external database where they use no surrogate key so the import slows
down because he needs to check if a surrogate key has been introduced
for every given scientific name before writing to the database.

regards,
Lukas

From:	mark(at)mark(dot)mielke(dot)cc
To:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-14 16:06:07
Message-ID:	20060114160607.GA10058@mark.mielke.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2006 at 12:42:55PM +0000, Leandro Guimarães Faria Corcete DUTRA wrote:
> Rod Taylor <pg <at> rbt.ca> writes:
> > The basic idea is that most of us break out schemas by creating fake
> > primary keys for the purpose of obtaining performance because using the
> > proper primary key (single or multiple columns) is often very slow.
> This is one thing I simply can't understand.

> If you still declare the natural key(s) as UNIQUEs, you have just made
> performance worse. Now there are two keys to be checked on UPDATEs and
> INSERTs, two indexes to be updated, and probably a SEQUENCE too.

Not to completely defend the practice - but in some applications,
INSERT is much less frequent than UPDATE, and that UPDATE requires a
unique check on the primary key and the surrogate key, as well as an
update, should be considered (and I believe is considered) a
PostgreSQL performance bug. It's undesirable and unnecessary behaviour
for the majority of uses (where they key does not change as a part of
the update).

> Certainly decoupling presentation from storage would be nice, but even before
> that generalised use of surrogate keys seems to me a knee-jerk reaction.

Yes, I agree. As per a previous thread, I'm one of those using it to
generalize my query / update implementation into common base code. I
have other reasons - but I confess to this being the real reason.

In my case, the cost of maintaining the code that queries / updates is
more expensive than the cost of having an extra unique index, and the
storage and performance impacts this has on my data. :-)

Is my primary reason good on its own, without the other more legitimate
justifications? It's good enough for me. I expect others to strongly
disagree.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	mark(at)mark(dot)mielke(dot)cc
Cc:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-14 20:33:39
Message-ID:	20060114203339.GA25248@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2006 at 11:06:07AM -0500, mark(at)mark(dot)mielke(dot)cc wrote:
> Not to completely defend the practice - but in some applications,
> INSERT is much less frequent than UPDATE, and that UPDATE requires a
> unique check on the primary key and the surrogate key, as well as an
> update, should be considered (and I believe is considered) a
> PostgreSQL performance bug. It's undesirable and unnecessary behaviour
> for the majority of uses (where they key does not change as a part of
> the update).

Unique check? An index is an index and when you do an UPDATE the new
tuple has to be added to the index. At this point it doesn't matter if
the index is unique or not, all indexes cost something.

Since after the UPDATE the tuple with that primary key appears two (or
more) times in the table, a check needs to be made that they don't
overlap timewise. Are you claiming you could avoid this check and still
guarentee correctness in the face of concurrent transactions?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

From:	mark(at)mark(dot)mielke(dot)cc
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-14 21:08:44
Message-ID:	20060114210844.GA14387@mark.mielke.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2006 at 09:33:39PM +0100, Martijn van Oosterhout wrote:
> On Sat, Jan 14, 2006 at 11:06:07AM -0500, mark(at)mark(dot)mielke(dot)cc wrote:
> > Not to completely defend the practice - but in some applications,
> > INSERT is much less frequent than UPDATE, and that UPDATE requires a
> > unique check on the primary key and the surrogate key, as well as an
> > update, should be considered (and I believe is considered) a
> > PostgreSQL performance bug. It's undesirable and unnecessary behaviour
> > for the majority of uses (where they key does not change as a part of
> > the update).
> Unique check? An index is an index and when you do an UPDATE the new
> tuple has to be added to the index. At this point it doesn't matter if
> the index is unique or not, all indexes cost something.

> Since after the UPDATE the tuple with that primary key appears two (or
> more) times in the table, a check needs to be made that they don't
> overlap timewise. Are you claiming you could avoid this check and still
> guarentee correctness in the face of concurrent transactions?

I'm claiming that I agree with this TODO item:

- Prevent index uniqueness checks when UPDATE does not modify the column

Uniqueness (index) checks are done when updating a column
even if the column is not modified by the UPDATE.

Definately, the check is unnecessary. If it was unique before we made
the change, we know it will be unique after we've made the change.
The check shouldn't be performed for the primary key, or for the
surrogate key, if neither of these keys are modified in any way.

Perhaps you are challenging my addition of the phrase "as well as an
update", with a hint on my part, that I feel the update is unnecessary
as well. I may have been wrong to add these 5 words. The MVCC
implementation has numerous costs, and perhaps this is one of them
that cannot be avoided. :-(

Cheers,
mark

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-15 05:08:42
Message-ID:	87lkxixelh.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm> writes:

> Certainly decoupling presentation from storage would be nice, but even before
> that generalised use of surrogate keys seems to me a knee-jerk reaction.

I hate knee-jerk reactions too, but just think of all the pain of people
dealing with databases where they used Social Security numbers for primary
keys. I would never use an attribute that represents some real-world datum as
a primary key any more.

In my experience there are very few occasions where I want a real non-sequence
generated primary key. I've never regretted having a sequence generated
primary key, and I've certainly had occasions to regret not having one.

--
greg

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Michael Glaesemann <grzm(at)myrealbox(dot)com>
Cc:	Leandro Guimar?es Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-16 18:52:16
Message-ID:	20060116185216.GF67693@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2006 at 07:28:21PM +0900, Michael Glaesemann wrote:
>
> On Jan 13, 2006, at 21:42 , Leandro Guimar?es Faria Corcete DUTRA wrote:
>
> >If you still declare the natural key(s) as UNIQUEs, you have just made
> >performance worse. Now there are two keys to be checked on UPDATEs
> >and
> >INSERTs, two indexes to be updated, and probably a SEQUENCE too.
>
> For UPDATEs and INSERTs, the "proper" primary key also needs to be
> checked, but keys are used for more than just checking uniqueness:
> they're also often used in JOINs. Joining against a single integer
> I'd think it quite a different proposition (I'd think faster in terms
> of performance) than joining against, say, a text column or a
> composite key.

a) the optimizer does a really poor job on multi-column index statistics
b) If each parent record will have many children, the space savings from
using a surrogate key can be quite large
c) depending on how you view things, putting actual keys all over the
place is denormalized

Generally, I just use surrogate keys for everything unless performance
dictates something else.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-18 13:08:53
Message-ID:	loom.20060118T140650-865@post.gmane.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim C. Nasby <jnasby <at> pervasive.com> writes:

> a) the optimizer does a really poor job on multi-column index statistics

So it should be fixed?

And there are a *lot* of singular, natural keys.

> b) If each parent record will have many children, the space savings from
> using a surrogate key can be quite large

Not such a common case.

> c) depending on how you view things, putting actual keys all over the
> place is denormalized

How come? Never!

> Generally, I just use surrogate keys for everything unless performance
> dictates something else.

What I am proposing is the reverse: use natural keys for everything unless
performance dictates something else.

In support of my PoV:
http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1

From:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-18 13:11:44
Message-ID:	loom.20060118T141047-101@post.gmane.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark <at> mit.edu> writes:

> I hate knee-jerk reactions too, but just think of all the pain of people
> dealing with databases where they used Social Security numbers for primary
> keys. I would never use an attribute that represents some real-world datum as
> a primary key any more.

I am not familiar with the situation.

> In my experience there are very few occasions where I want a real non-sequence
> generated primary key. I've never regretted having a sequence generated
> primary key, and I've certainly had occasions to regret not having one.

http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1

From:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: enums
Date:	2006-01-18 13:24:36
Message-ID:	loom.20060118T141331-965@post.gmane.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan <andrew <at> dunslane.net> writes:

> If people would like to play, I have created a little kit to help in
> creating first class enum types in a few seconds.

Isn't what we actually want possreps?

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-18 13:32:46
Message-ID:	20060118133246.GC27070@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 18, 2006 at 01:08:53PM +0000, Leandro Guimarães Faria Corcete DUTRA wrote:
> Jim C. Nasby <jnasby <at> pervasive.com> writes:
> > Generally, I just use surrogate keys for everything unless performance
> > dictates something else.
>
> What I am proposing is the reverse: use natural keys for everything unless
> performance dictates something else.
>
> In support of my PoV:
> http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1

Interesting. However, in my experience very few things have "natural
keys". There are no combination of attributes for people, phone calls
or even real events that make useful natural keys.

You don't say what the primary key on your events table was but I can
see one possibility:

(place,datetime)

A unique on this won't prevent overlapping events. Sure, it'll get rid
of the obvious duplicates but won't solve the problem. It also fails
the criteria that keys stable, since you can move events. You do need a
constraint on that table, but a unique constraint isn't it.

While I agree with your statement that it's the abuse of these keys
thats the problem, I find people are far too likely to see natural keys
where none exist.

BTW, the way I deal with people mixing up surrogate keys is by (usually
by chance) having the sequences for different tables start at wildly
different points. By starting one counter at a million and the other at
one, the chances that you'll be able to mix them up is reduced. On some
systems I can even identify the table a key comes from by looking at the
number, just because I know only one table has keys in the 30,000
range.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: enums
Date:	2006-01-18 13:42:08
Message-ID:	43CE45B0.3030501@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Leandro Guimarães Faria Corcete DUTRA wrote:

>Andrew Dunstan <andrew <at> dunslane.net> writes:
>
>
>
>>If people would like to play, I have created a little kit to help in
>>creating first class enum types in a few seconds.
>>
>>
>
>Isn't what we actually want possreps?
>
>
>
>
>

You appear to be responding to mail from months ago. Please catch up
before replying, so we don't rehash old discussions. As previously
discussed, I intend to do first class enums for the next release of
postgres, if I get enough time. Enumkit was just a very small step along
the research road, although it is useful in itself, which is why I
released it.

cheers

andrew

From:	Michael Glaesemann <grzm(at)myrealbox(dot)com>
To:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-18 13:56:14
Message-ID:	C17C695B-BEDB-4BA8-843C-9A1EF7AAF00C@myrealbox.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jan 18, 2006, at 22:08 , Leandro Guimarães Faria Corcete DUTRA wrote:

> Jim C. Nasby <jnasby <at> pervasive.com> writes:
>
>> a) the optimizer does a really poor job on multi-column index
>> statistics
>
> So it should be fixed?

Of course! Patches welcome!

Michael Glaesemann
grzm myrealbox com

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-18 14:53:54
Message-ID:	87slrlwprx.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm> writes:

> Greg Stark <gsstark <at> mit.edu> writes:
>
> > I hate knee-jerk reactions too, but just think of all the pain of people
> > dealing with databases where they used Social Security numbers for primary
> > keys. I would never use an attribute that represents some real-world datum as
> > a primary key any more.
>
> I am not familiar with the situation.

The US gov't handed out unique numbers to every worker for their old age
pension program. Many early database designers thought that made a wonderful
natural primary key.

It turns out that:

a) not everyone has a social insurance number: when their business expanded to
include foreign nationals these databases had to make up fake social insurance
numbers.

b) Occasionally people's social insurance numbers change, either because they
got it wrong in the first place or because of identity theft later on. Even
dealing with it changing isn't good enough because the old records don't
disappear; the person essentially has *two* social insurance numbers.

c) For security reasons it turns out to be a bad idea to be passing around
social insurance numbers in the first place. So these database designers had a
major problem adapting when people started refusing to give them social
insurance numbers or complaining when their application leaked their social
insurance number.

In short, what seemed like the clearest possible example of a natural primary
key became a great example of how hard it is to deal with changing business
requirements when you've tied your database design to the old rules. Using
natural primary keys makes an iron-clad design assumption that the business
rules surrounding that datum will never change. And the one thing constant in
business is that business rules change.

In the past I've used "username" as a primary key for a users table, what
could be safer?

Later we had to create a sequence generated userid column because some data
partners couldn't handle an text column without corrupting it. And of course
one day the question arose whether we could handle someone wanting to change
their username. Then another day we were asked whether we could have two
different people with the same username if they belonged to separate branded
subsites.

--
greg

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Leandro Guimar??es Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-18 19:11:50
Message-ID:	20060118191150.GL17896@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 18, 2006 at 01:08:53PM +0000, Leandro Guimar??es Faria Corcete DUTRA wrote:
> > b) If each parent record will have many children, the space savings from
> > using a surrogate key can be quite large
>
> Not such a common case.

Hmmm...

Many blog entries per user... Many blog comments per entry

Many PO's per customer... many line items per PO...

Etc., etc. I would argue that one-many relationships are far more common
than one-one, and it's very common for an integer ID to be a more
compact representation than a real key.

> > c) depending on how you view things, putting actual keys all over the
> > place is denormalized
>
> How come? Never!

Huh?

One of the tenants of normalization is that you don't repeat data. You
don't use customer name in your PO table, because it's asking for
problems; what if a customer changes names (as just one example).

> > Generally, I just use surrogate keys for everything unless performance
> > dictates something else.
>
> What I am proposing is the reverse: use natural keys for everything unless
> performance dictates something else.
>
> In support of my PoV:
> http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1

Read the bottom of it:

"I am not saying that you should avoid autonumber surrogate keys like an
SCO executive. The danger is not in their use but in their abuse. The
"events_id" column in the "events" table didn't give us any trouble
until we began to rely on it as the sole key for the table. The
accounting application gave us problems because we were using the ID as
the entire handle for the records. That crossed the line from use to
misuse, and we suffered for it."

To paraphrase, the issue isn't that surrogate keys were used for RI; the
issue is that proper keys were not setup to begin with. Does it make
sense to have a customer table where customer_name isn't unique? Almost
certainly not. But that's just one possible constraint you might put on
that table. To put words in Josh's mouth, the issue isn't with using a
surrogate key, it's with not thinking about what constraints you should
be placing on your data.

Take a look at cbk's comment; he does a great job of summing the issue
up.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-18 23:58:50
Message-ID:	200601181558.50878.josh@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Martjin,

> Interesting. However, in my experience very few things have "natural
> keys". There are no combination of attributes for people, phone calls
> or even real events that make useful natural keys.

I certainly hope that I never have to pick up one of your projects. A
table without a natural key is a data management disaster. Without a
key, it's not data, it's garbage.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-19 07:27:17
Message-ID:	20060119072717.GA9949@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 18, 2006 at 03:58:50PM -0800, Josh Berkus wrote:
> Martjin,
>
> > Interesting. However, in my experience very few things have "natural
> > keys". There are no combination of attributes for people, phone calls
> > or even real events that make useful natural keys.
>
> I certainly hope that I never have to pick up one of your projects. A
> table without a natural key is a data management disaster. Without a
> key, it's not data, it's garbage.

???

Please provides natural keys for any of the following:

- A Person
- A phone call: (from,to,date,time,duration) is not enough
- A physical address
- A phone line: (phone numbers arn't unique over time)
- An internet account: (usernames not unique over time either)

In any of these either misspellings, changes of names, ownership or
even structure over time render the obvious useless as keys. There are
techniques for detecting and reducing duplication but the point is that
for any of these duplicates *can* be valid data.

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-19 18:09:26
Message-ID:	43CFD5D6.7000109@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Martjin,

> In any of these either misspellings, changes of names, ownership or
> even structure over time render the obvious useless as keys. There are
> techniques for detecting and reducing duplication but the point is that
> for any of these duplicates *can* be valid data.

Please point me out where, in the writings of E.F. Codd or in the SQL
Standard, it says that keys have to be immutable for the life of the row.

Duplicate *values* can be valid data. Duplicate *tuples* show some
serious flaws in your database design. If you have a personnel
directory on which you've not bothered to define any unique constraints
other than the ID column, then you can't match your data to reality. If
you have two rows with the same first and last name, you don't know if
they are two different people or the same person, duplicated. Which
will be a big problem come paycheck time.

Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising a
set of values definining a *unique* data entity. i.e. "The employeee
named "John" "Little" at extension "4531". There is nothing anywhere
said about keys never changing.

This is Databases 101 material. Really!

--Josh

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-19 21:29:20
Message-ID:	20060119212920.GA17981@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 19, 2006 at 10:09:26AM -0800, Josh Berkus wrote:
> Martjin,
>
> >In any of these either misspellings, changes of names, ownership or
> >even structure over time render the obvious useless as keys. There are
> >techniques for detecting and reducing duplication but the point is that
> >for any of these duplicates *can* be valid data.
>
> Please point me out where, in the writings of E.F. Codd or in the SQL
> Standard, it says that keys have to be immutable for the life of the row.

Possibly nowhere. But when you send invoices to customers, any details
on there *are* immutable. Sure, in your database you don't care if
things change, but then they don't match reality anymore do they?

> Duplicate *values* can be valid data. Duplicate *tuples* show some
> serious flaws in your database design. If you have a personnel
> directory on which you've not bothered to define any unique constraints
> other than the ID column, then you can't match your data to reality. If
> you have two rows with the same first and last name, you don't know if
> they are two different people or the same person, duplicated. Which
> will be a big problem come paycheck time.

I never said there were duplicate tuples, just that the data has no
natural keys. The tuples are unique because there's a surrogate key. It
is entirely possible to have two people with the same first name, last
name and date of birth. Rather uncommon, but the database must be able
to support it.

I don't understand your example though. If you have a personnel
directory with two rows with the same first and last name, what does
that tell you. Nothing. You have to go find out whether there really
are two of those people or not. You can simplify the process by taking
into account the fact that it's very unlikely, but a unique constraint
is not the answer. Besides, it's far more likely the same person will
appear twice with two different spellings of their name. :)

Anyway, the discussion was about surrogate vs natural keys. Nothing
here has convinced me that there are any useful natural keys to be
found in the examples I gave. Most of the examples I gave come from a
system I had to maintain where some designer had assumed there was some
kind of natural key and in *each* and *every* case it caused
problems...

From:	Leandro Guimarães Faria Corcete Dutra <leandro(at)dutra(dot)fastmail(dot)fm>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Surrogate keys (Was: enums)
Date:	2006-01-29 01:48:57
Message-ID:	1138499338.6336.9.camel@bege.exemplo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Em Qui, 2006-01-19 às 22:29 +0100, Martijn van Oosterhout escreveu:
> Possibly nowhere. But when you send invoices to customers, any details
> on there *are* immutable. Sure, in your database you don't care if
> things change, but then they don't match reality anymore do they?

Then what you need is a temporal database -- at least some form of
historical records. Nothing to do with keys in themselves.

> I never said there were duplicate tuples, just that the data has no
> natural keys. The tuples are unique because there's a surrogate key.

This does not guarantee uniqueness, as the key is artificially and
internally generated.

> It
> is entirely possible to have two people with the same first name, last
> name and date of birth. Rather uncommon, but the database must be able
> to support it.

And the way to support it is to take into account additional data --
place of birth, parents' data etc -- as part of the candidate keys. Not
to allow duplicates.

> I don't understand your example though. If you have a personnel
> directory with two rows with the same first and last name, what does
> that tell you. Nothing. You have to go find out whether there really
> are two of those people or not.

And how will you do that if you don't store additional data?

> You can simplify the process by taking
> into account the fact that it's very unlikely, but a unique constraint
> is not the answer.

Oh yes, it is. They only one.

> Besides, it's far more likely the same person will
> appear twice with two different spellings of their name. :)

So what?

--
+55 (11) 5685 2219 xmpp:leandrod(at)jabber(dot)org
+55 (11) 9406 7191 Yahoo!: lgcdutra
+55 (11) 5686 9607 MSN: leandro(at)dutra(dot)fastmail(dot)fm
+55 (11) 4390 5383 ICQ/AIM: 61287803