Re: UUID column as pimrary key?

Lists: pgsql-general
From: Dennis Gearon <gearond(at)sbcglobal(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: UUID column as pimrary key?
Date: 2011-01-04 19:07:00
Message-ID: 189550.43158.qm@web82103.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I haven't been able to find anywhere, easily, in the documentation using google
where a list of allowed data types for primary keys is.

So, UUIDs can be primary keys?
Any issues wtih them on sorting or paging of index tables, etc.?

Also, the documentation says that UUIDs are 128 bit value, but never explicitly
says that's how it's stored. Nor does it use one of the nice, blue headered
tables for UUID (or ENUM) showing storage and other attributes as it does for
numeric, character,boolean, date/time, binary, monetary, geometric, or network
types.

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.


From: David Wall <d(dot)wall(at)computer(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-04 19:16:52
Message-ID: 4D237224.2010800@computer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

We're using UUID for primary keys in PG 8.4 without any issues. I have
no real insights into the details or performance issues, but always
figured it was stored as a binary 128-bit value, but with added benefits
of being able to enter and view them using a standard string format. We
don't sort them as they have no real meaning for us.

On 1/4/2011 11:07 AM, Dennis Gearon wrote:
> I haven't been able to find anywhere, easily, in the documentation using google
> where a list of allowed data types for primary keys is.
>
> So, UUIDs can be primary keys?
> Any issues wtih them on sorting or paging of index tables, etc.?
>
> Also, the documentation says that UUIDs are 128 bit value, but never explicitly
> says that's how it's stored. Nor does it use one of the nice, blue headered
> tables for UUID (or ENUM) showing storage and other attributes as it does for
> numeric, character,boolean, date/time, binary, monetary, geometric, or network
> types.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Dennis Gearon <gearond(at)sbcglobal(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-04 19:24:30
Message-ID: 1294169070.6949.39.camel@jd-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 2011-01-04 at 11:07 -0800, Dennis Gearon wrote:
> I haven't been able to find anywhere, easily, in the documentation using google
> where a list of allowed data types for primary keys is.

Anything that can be UNIQUE NOT NULL

>
> So, UUIDs can be primary keys?

Yes.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-04 21:11:14
Message-ID: 874o9o1ij1.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

d(dot)wall(at)computer(dot)org (David Wall) writes:
> We're using UUID for primary keys in PG 8.4 without any issues. I
> have no real insights into the details or performance issues, but
> always figured it was stored as a binary 128-bit value, but with added
> benefits of being able to enter and view them using a standard string
> format. We don't sort them as they have no real meaning for us.

In principle, this might be a reason to want to do the long-outstanding
work on hash indexes; with UUIDs, it mayn't be useful to sort the
values, but you *do* want to be able to validate that they're unique.
--
output = ("cbbrowne" "@" "gmail.com")
http://www3.sympatico.ca/cbbrowne/
"Computers are like air conditioners: They stop working properly if
you open windows."


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Dennis Gearon <gearond(at)sbcglobal(dot)net>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 08:31:51
Message-ID: 05af4d43490ca257564b66dd9a3de971@softperience.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 4 Jan 2011 11:07:00 -0800 (PST), Dennis Gearon
<gearond(at)sbcglobal(dot)net> wrote:
> I haven't been able to find anywhere, easily, in the documentation
> using google
> where a list of allowed data types for primary keys is.
>
> So, UUIDs can be primary keys?
> Any issues wtih them on sorting or paging of index tables, etc.?

Disadvantage
* aren't ordered in way as records are added to DB.
* 128bit length (PSQL stores them as 128bit value)
* slower to generate you need to use random number generator
* ... if you do select * on table with two uuids you will need to
scroll GUI to see data :)
* ... really unhandy if you want to make manual updates :)

Advantage:
* simple to generate, and 128bit random is almost globally unique,
* you have your id, before executing query, (in contrast to all this
autoincrement) so you may put it in dependant rows
* almost every platform has UUID generator

Advantage / disadvantage
* depending on UUID generator, UUID can store some "privacy"
information e.g. MAC address of your card, such UUID.

Personally I prefer pooled incremental id's. Fast, unique, you have Id
before query - but you need to write "code" by self.

> Also, the documentation says that UUIDs are 128 bit value, but never
> explicitly
> says that's how it's stored. Nor does it use one of the nice, blue
> headered
> tables for UUID (or ENUM) showing storage and other attributes as it
> does for
> numeric, character,boolean, date/time, binary, monetary, geometric,
> or network
> types.
>
>
>
> Dennis Gearon
>
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is
> usually a better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Dennis Gearon <gearond(at)sbcglobal(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 10:50:11
Message-ID: 4D244CE3.8010504@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 01/05/2011 07:31 PM, Radosław Smogura wrote:

> * you have your id, before executing query, (in contrast to all this
> autoincrement) so you may put it in dependant rows

Do you mean that with a UUID, you don't need to talk to the database at
all, you can generate an ID with no interaction with / involvement with
the database at all? Because other than that, there's not much
difference in how you normally work with them.

With a sequence, you might:

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
y integer
);
INSERT INTO x(y) VALUES (1);

With a uuid, you'd:

CREATE TABLE x (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
y integer
);
INSERT INTO x(y) VALUES (1);

In either case, you can explicitly call the generator function for
seq/uuid - nextval(seqname) or uuid_generate_v4() respectively - or you
can omit the PK column in your inserts and let the database generate it.

> Personally I prefer pooled incremental id's. Fast, unique, you have Id
> before query - but you need to write "code" by self.

Many libraries / ORMs / etc that interact with Pg will happily take care
of this for you. In fact, I had to fight to convince Hibernate that I
*didn't* want it to increment all my counters in steps of 50.

--
Craig Ringer


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Dennis Gearon <gearond(at)sbcglobal(dot)net>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 14:28:40
Message-ID: da91718b5ed40d52ae34278f274119da@softperience.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 05 Jan 2011 21:50:11 +1100, Craig Ringer
<craig(at)postnewspapers(dot)com(dot)au> wrote:
> On 01/05/2011 07:31 PM, Radosław Smogura wrote:
>
>> * you have your id, before executing query, (in contrast to all this
>> autoincrement) so you may put it in dependant rows
>
> Do you mean that with a UUID, you don't need to talk to the database
> at all, you can generate an ID with no interaction with / involvement
> with the database at all? Because other than that, there's not much
> difference in how you normally work with them.
>
>
> With a sequence, you might:
>
> CREATE SEQUENCE x_id_seq;
> CREATE TABLE x (
> id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
> y integer
> );
> INSERT INTO x(y) VALUES (1);
>
I mean situation, when You create e.g. in one transaction, book and
chapters, in some way You need retrieve book's id, by returning clause
of insert, or by obtaining id form sequence.
It's simpler to write:
book_id = new uuid();
insert into book values(book_id,....);
insert into chapters values(new uuid(), book_id, ...);
isn't it?


From: Szymon Guz <mabewlun(at)gmail(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Dennis Gearon <gearond(at)sbcglobal(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 14:34:00
Message-ID: AANLkTim2UJKr1foZBP5a=C5DgL-nOA6YigDNDNoZqznd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 5 January 2011 15:28, Radosław Smogura <rsmogura(at)softperience(dot)eu> wrote:

> On Wed, 05 Jan 2011 21:50:11 +1100, Craig Ringer <
> craig(at)postnewspapers(dot)com(dot)au> wrote:
>
>> On 01/05/2011 07:31 PM, Radosław Smogura wrote:
>>
>> * you have your id, before executing query, (in contrast to all this
>>> autoincrement) so you may put it in dependant rows
>>>
>>
>> Do you mean that with a UUID, you don't need to talk to the database
>> at all, you can generate an ID with no interaction with / involvement
>> with the database at all? Because other than that, there's not much
>> difference in how you normally work with them.
>>
>>
>> With a sequence, you might:
>>
>> CREATE SEQUENCE x_id_seq;
>> CREATE TABLE x (
>> id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
>> y integer
>> );
>> INSERT INTO x(y) VALUES (1);
>>
>> I mean situation, when You create e.g. in one transaction, book and
> chapters, in some way You need retrieve book's id, by returning clause of
> insert, or by obtaining id form sequence.
> It's simpler to write:
> book_id = new uuid();
> insert into book values(book_id,....);
> insert into chapters values(new uuid(), book_id, ...);
> isn't it?
>
>
>
For me it is simpler just to write this:

bookid = insert into books(...) values(...) returning book_id;
insert into chapters(book_id, ...) values( bookid, ...);

but it's a matter of taste, I think.

regards
Szymon


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 14:37:18
Message-ID: 780FB717-BF19-4789-B862-21343CAA26C6@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:

> * simple to generate, and 128bit random is almost globally unique,

Almost? Should be totally unique, as long as your random source is decent quality.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 14:55:26
Message-ID: 0F3F66BC-7713-4782-876C-C747E8984FCC@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 7:28 AM, Radosław Smogura wrote:

> It's simpler to write:
...
> isn't it?

Depends on the situation, the libraries you're using, and so on.

Now, if you're generating records in a distributed system, where your node might be disconnected when it's creating a record, it is *much* simpler in that case ;-)

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 14:55:53
Message-ID: AANLkTimOa4zZxCApCNw6DGQAmWdT2yuzJuJUgjqj-0X1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe(at)elevated-dev(dot)com> wrote:
> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
>
>> * simple to generate, and 128bit random is almost globally unique,
>
> Almost? Should be totally unique, as long as your random source is decent quality.

But I would never rely on that alone. You always have a strategy in
place, in case there's a duplicate.

--
GJ


From: Mike Christensen <mike(at)kitchenpc(dot)com>
To: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
Cc: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 15:03:12
Message-ID: AANLkTi=2s6Q_16yzj8-iYa2=9stGE-mtxJjJk4YAOnji@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

2011/1/5 Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>:
> On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe(at)elevated-dev(dot)com> wrote:
>> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
>>
>>> * simple to generate, and 128bit random is almost globally unique,
>>
>> Almost? Should be totally unique, as long as your random source is decent quality.
>
> But I would never rely on that alone. You always have a strategy in
> place, in case there's a duplicate.

As long as all your UUIDs are generated with the same algorithm, they
are guaranteed to be unique.


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 15:04:45
Message-ID: 241AC0A2-1533-440C-8A97-ED4C02930EF5@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 7:55 AM, Grzegorz Jaśkiewicz wrote:

> But I would never rely on that alone. You always have a strategy in
> place, in case there's a duplicate.

That's really unnecessary, basically a total waste of effort.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 15:05:29
Message-ID: 20110105100529.6407650b.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In response to Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:

> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
>
> > * simple to generate, and 128bit random is almost globally unique,
>
> Almost? Should be totally unique, as long as your random source is decent quality.

This is going off-topic, but I did some research on this because we
were considering using UUIDs for various keys ...

Fact is, UUIDs are not _guaranteed_ to be unique. If you use the generating
system that includes a MAC address, then in theory, they are guaranteed
to be unique, but in practice, MAC addresses aren't guaranteed to be
unique either, so that's not 100% either.

Beyond that, the namespace size for a UUID is so incomprehensibly huge
that the chance of two randomly generated UUIDs having the same value
is incomprehensibly unlikely ... it is, however, not a 100% guarantee.

Anyway, in our case, we determined that the chance of UUID collision
for the dataset in question was extremely unlikely, however, the
consequences of such a collision were pretty bad. We also determined
that we were able to control a "unit ID" for each independent system
that would generate IDs, which could (a) be part of a unique seed for
UUIDs, or (b) be a prefix to a autonumber ID that would be a lot easier
to read and work with manually. In the end, we chose b for the human
factor.

Face it, reading, remembering, and typing UUIDs kinda sucks.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Mike Christensen <mike(at)kitchenpc(dot)com>
Cc: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 15:06:16
Message-ID: ED5F8B5E-969D-427C-80A0-FD7ED7598E93@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 8:03 AM, Mike Christensen wrote:

> As long as all your UUIDs are generated with the same algorithm, they
> are guaranteed to be unique.

There is no requirement that they be generated with the same algorithm in order to be unique. A MAC/time-based UUID cannot duplicate a random one, and vice versa. (Also applies to the 3rd flavor of UUID whose details I do not remember.)

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 15:21:49
Message-ID: F00ABD29-930E-4835-B219-B0F300CAC9F9@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:

> Beyond that, the namespace size for a UUID is so incomprehensibly huge
> that the chance of two randomly generated UUIDs having the same value
> is incomprehensibly unlikely

Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneously struck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wiped out by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earth will fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong, or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishingly small that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including the chance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that fires start in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely, I mean many many many orders of magnitude...)

> In the end, we chose b for the human
> factor.

A very good decision, in the case where you're actually able to control each independent system.

> Face it, reading, remembering, and typing UUIDs kinda sucks.

Lots of copy & paste, or custom GUI tools for devs & DBAs, or abuse like '...%', all of them painful in their own way.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 15:55:22
Message-ID: 20110105105522.4b3314a2.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In response to Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:

> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
>
> > Beyond that, the namespace size for a UUID is so incomprehensibly huge
> > that the chance of two randomly generated UUIDs having the same value
> > is incomprehensibly unlikely
>
> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneously struck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wiped out by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earth will fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong, or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishingly small that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including the chance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that fires start in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely, I mean many many many orders of magnitude...)

That statement demonstrates a lack of investigation and/or consideration
of the circumstances.

I can't find my math or I'd reproduce it here, but consider this:

If you have 50 devices, each generating 100 UUIDs per hour, and you'll
keep records for 1 year, then your argument above is probably accurate.

However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
be keeping those records for 10+ years, the chances of collisions near
the end of that 10 year span get high enough to actually make developers
nervous.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 16:01:52
Message-ID: 1466.1294243312@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Scott Ribe <scott_ribe(at)elevated-dev(dot)com> writes:
> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
>> Beyond that, the namespace size for a UUID is so incomprehensibly huge
>> that the chance of two randomly generated UUIDs having the same value
>> is incomprehensibly unlikely

> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneously struck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wiped out by an asteroid impact,

I see those sorts of arguments all the time, and I consider them pure BS.
Yes, the namespace is theoretically large. However, the questions you
really have to answer are (a) how much of the namespace is actually
being used by the UUID generation methods in use in a particular
application; (b) how sure can you be that there is not correlation
between UUIDs generated in different places/sessions.

In practical use I think the odds of a collision are *far* higher than
you are suggesting, unless the UUID generation is being done with a lot
more care than is likely if the user takes these sorts of claims at face
value. The odds may still be low enough to be a very good risk, but
you need to think about it not just bet your database on it without
thinking.

Being paranoid is a good thing. It's what DBAs are paid for.

regards, tom lane


From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 16:03:47
Message-ID: 4D249663.3080904@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 01/05/2011 08:55 AM, Bill Moran wrote:
> In response to Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:
>
>> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
>>
>>> Beyond that, the namespace size for a UUID is so incomprehensibly huge
>>> that the chance of two randomly generated UUIDs having the same value
>>> is incomprehensibly unlikely
>>
>> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneously struck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wiped out by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earth will fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong, or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishingly small that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including the chance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that fires start in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely, I mean many many many orders of magnitude...)
>
> That statement demonstrates a lack of investigation and/or consideration
> of the circumstances.
>
> I can't find my math or I'd reproduce it here, but consider this:
>
> If you have 50 devices, each generating 100 UUIDs per hour, and you'll
> keep records for 1 year, then your argument above is probably accurate.
>
> However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
> be keeping those records for 10+ years, the chances of collisions near
> the end of that 10 year span get high enough to actually make developers
> nervous.
>

But we're talking about a primary key. Postgres guarantees the
uniqueness. 1 transaction in 10^^100 rolls back due to a second
instance of an (otherwise/almost) uuid. Big deal.


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 16:18:37
Message-ID: A6A6ABF5-71F7-4941-9D0B-D1CEE5C8ED0D@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 8:55 AM, Bill Moran wrote:

> That statement demonstrates a lack of investigation and/or consideration
> of the circumstances.

No, it doesn't.

> However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
> be keeping those records for 10+ years, the chances of collisions near
> the end of that 10 year span get high enough to actually make developers
> nervous.

No, they don't. At the end of your hypothetical 10-year period, you will have used about 43,000,000,000 UUIDs, or about 1/100,000,000,000,000,000,000,000,000th of the UUID space (assuming random UUIDs). Leaving you with a chance of a single collision of about 1/18,000,000,000,000,000.

Assuming of course good entropy. If the generation of random numbers is bad, then UUIDs are not so useful ;-)

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Leif Biberg Kristensen <leif(at)solumslekt(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 16:19:22
Message-ID: 201101051719.22141.leif@solumslekt.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wednesday 5. January 2011 16.05.29 Bill Moran wrote:

> Beyond that, the namespace size for a UUID is so incomprehensibly huge
> that the chance of two randomly generated UUIDs having the same value
> is incomprehensibly unlikely ... it is, however, not a 100% guarantee.

I can't help thinking of the «Birthday Paradox»:

http://en.wikipedia.org/wiki/Birthday_problem

regards, Leif


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 16:29:59
Message-ID: B6BFE569-C4DE-4873-95C2-15EE2AC5603C@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:

> In practical use I think the odds of a collision are *far* higher than
> you are suggesting, unless the UUID generation is being done with a lot
> more care than is likely if the user takes these sorts of claims at face
> value.

Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation algorithm provided by the OS. So, unless we're talking about users coming up with their own algorithms, it seems reasonable to assume that the generation is done with a great deal of care. (And if we are talking about users coming up with their own algorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X & Linux. I would be shocked if it were not the case on Solaris. I would even be surprised if it were not the case on Windows.

The IETF Network Working Group designed UUIDs to ensure that their uniqueness guarantee would be strong enough that no application would need to worry about duplicates, ever. Claims that collisions are too likely to depend on UUIDs being unique really are claims that the IETF Network Working Group didn't know what it was doing, which I find a bit ridiculous.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Leif Biberg Kristensen <leif(at)solumslekt(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 16:33:51
Message-ID: 16A02EED-B639-4150-92DF-24FEA7352EBA@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 9:19 AM, Leif Biberg Kristensen wrote:

> I can't help thinking of the «Birthday Paradox»:

Yes, the calculation of the probability of a collision is the same for the "birthday paradox" as for random UUID collisions.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Mike Christensen <mike(at)kitchenpc(dot)com>
Cc: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 16:46:27
Message-ID: AANLkTik=Z_H1xN7CuTWkvJ5yC8o61y0q+qhVYtj-VzsU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jan 5, 2011 at 3:03 PM, Mike Christensen <mike(at)kitchenpc(dot)com> wrote:
> 2011/1/5 Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>:
>> On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe(at)elevated-dev(dot)com> wrote:
>>> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
>>>
>>>> * simple to generate, and 128bit random is almost globally unique,
>>>
>>> Almost? Should be totally unique, as long as your random source is decent quality.
>>
>> But I would never rely on that alone. You always have a strategy in
>> place, in case there's a duplicate.
>
> As long as all your UUIDs are generated with the same algorithm, they
> are guaranteed to be unique.
>

Good luck with that ....

--
GJ


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bill Moran <wmoran(at)potentialtech(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 16:57:45
Message-ID: 4D24A309.3070509@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 01/05/2011 08:29 AM, Scott Ribe wrote:
> On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:
>
>> In practical use I think the odds of a collision are *far* higher than
>> you are suggesting, unless the UUID generation is being done with a lot
>> more care than is likely if the user takes these sorts of claims at face
>> value.
>
> Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation algorithm provided by the OS. So, unless we're talking about users coming up with their own algorithms, it seems reasonable to assume that the generation is done with a great deal of care. (And if we are talking about users coming up with their own algorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X& Linux. I would be shocked if it were not the case on Solaris. I would even be surprised if it were not the case on Windows.
>
> The IETF Network Working Group designed UUIDs to ensure that their uniqueness guarantee would be strong enough that no application would need to worry about duplicates, ever. Claims that collisions are too likely to depend on UUIDs being unique really are claims that the IETF Network Working Group didn't know what it was doing, which I find a bit ridiculous.
>

Maybe or maybe not:)

http://www.ietf.org/rfc/rfc4122.txt

"4.3. Algorithm for Creating a Name-Based UUID

The version 3 or 5 UUID is meant for generating UUIDs from "names"
that are drawn from, and unique within, some "name space". The
concept of name and name space should be broadly construed, and not
limited to textual names. For example, some name spaces are the
domain name system, URLs, ISO Object IDs (OIDs), X.500 Distinguished
Names (DNs), and reserved words in a programming language. The
mechanisms or conventions used for allocating names and ensuring
their uniqueness within their name spaces are beyond the scope of
this specification.

The requirements for these types of UUIDs are as follows:

o The UUIDs generated at different times from the same name in the
same namespace MUST be equal.

......
"

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: pgsql-general(at)postgresql(dot)org
Cc: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, Scott Ribe <scott_ribe(at)elevated-dev(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bill Moran <wmoran(at)potentialtech(dot)com>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 17:30:42
Message-ID: 201101051830.42778.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sorry for not citation...

When I was talking about "almost unique", I was meaning that the UUID is
random so there is no guarantee that you will not generate two indencital
UUIDs even in subsequent calls, but it has looooow probability (you have
greater chances to win in LOTTO).

128bits is huge for now, but what will happen in next 2,3 years? In 20th
century, people think storing only last two digit of year will be enaugh!!!
Suppose we are creating gloabl, distributed database for storing information
about mobile device and base station it logged in.

If we want to guarantee uniquness of UUID across calls, we could talk about
much more far _pseudo_ random generator, then "normal" pseudo - randoms, and
what I think we need to keep state of such random generator, and share and
lock it for multiple concurrent calls. So it will not be something different
then ordinal serial column...

Now I think it's clear there is no "magical" algorithm for UUIDs.

My opinion about all of those UUID with MAC, IP addresses, Microsoft "growing"
UUIDs. All of this decrases chance of uniqness of UUID. If we will keep n
first bits of UUID constant, then we have only 128-n bits random (truly in
UUIDs we have one decimal field reserved for UUID version). If we want next
UUID to be greater then previous, at each call we will remove (next-prev) of
possible values.

Shouldn't this be enaugh for namespace UUIDs
new UUID("namespece".hashCode(), "name".hashChode())

or a little joke...
new UUID(1,1) meats this condition
> o The UUIDs generated at different times from the same name in the
>
> same namespace MUST be equal.


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Subject: Re: *****SPAM***** Re: UUID column as pimrary key?
Date: 2011-01-05 17:48:37
Message-ID: 99698C36-E3AE-4277-9DF6-8C7F81E690DF@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 10:30 AM, Radosław Smogura wrote:

> 128bits is huge for now, but what will happen in next 2,3 years?

It will still be large. When you get up to around 100 trillion UUIDs, you'll be getting up to around a 1 in a billion chance of a single collision. Before you claim that we'll outgrow them in a few years, you might want to consider the actual numbers, the growth rate of storage density, the number of disks required to hold that many records. I have a feeling the answers might surprise you ;-)

> If we want to guarantee uniquness of UUID across calls, we could talk about
> much more far _pseudo_ random generator, then "normal" pseudo - randoms, and
> what I think we need to keep state of such random generator, and share and
> lock it for multiple concurrent calls. So it will not be something different
> then ordinal serial column...

No, we don't really have to think about it much at all. The IETF and OS engineers already did. Random UUIDs are not generated with some sloppy amateurish algorithm.

> My opinion about all of those UUID with MAC, IP addresses, Microsoft "growing"
> UUIDs. All of this decrases chance of uniqness of UUID.

Well, a decrease from one insanely small chance to another insanely small chance is not anything to worry about. After all, you could argue that 128 bits is a "decrease" from 256 bits. It's the same argument. UUIDs were designed to avoid collisions, by people who knew what they were doing. More significant bits would lower the chance, from "already low enough" to "even more low enough".

> Shouldn't this be enaugh for namespace UUIDs
> new UUID("namespece".hashCode(), "name".hashChode())
>
> or a little joke...
> new UUID(1,1) meats this condition
>> o The UUIDs generated at different times from the same name in the
>>
>> same namespace MUST be equal.
>

People, people, people, please. *Namespace* UUIDs are intended to map *unique* names to UUIDs in the case where you already have *unique* names and just need to map them to a more compact form, thus the requirement that the same namespace + name always yields the same UUID. This is *not* a weakness in UUIDs, nor is it the kind of UUID you get from a simple uuid_gen or similar call, nor is it a possible source of collisions for database UUIDs (unless you do something enormously stupid, like use database fields to construct a name to give to a UUID generator).

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Radosław Smogura <mail(at)smogura(dot)eu>
To: pgsql-general(at)postgresql(dot)org
Cc: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>, Leif Biberg Kristensen <leif(at)solumslekt(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 18:31:37
Message-ID: 201101051931.37534.mail@smogura.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Scott Ribe <scott_ribe(at)elevated-dev(dot)com> Wednesday 05 January 2011 17:33:51
> On Jan 5, 2011, at 9:19 AM, Leif Biberg Kristensen wrote:
> > I can't help thinking of the «Birthday Paradox»:
> Yes, the calculation of the probability of a collision is the same for the
> "birthday paradox" as for random UUID collisions.

Depends on probability measure we will choose. The true is that probability
that in two coin drops we will get two reverses is 1/4, but true is, too, as
Newton said, it's 1/3, because if in 1st drop we don't get reverse we don't
need to drop again.


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 18:57:59
Message-ID: 20110105135759.de43017f.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In response to Rob Sargent <robjsargent(at)gmail(dot)com>:

>
>
> On 01/05/2011 08:55 AM, Bill Moran wrote:
> > In response to Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:
> >
> >> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
> >>
> >>> Beyond that, the namespace size for a UUID is so incomprehensibly huge
> >>> that the chance of two randomly generated UUIDs having the same value
> >>> is incomprehensibly unlikely
> >>
> >> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneously struck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wiped out by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earth will fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong, or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishingly small that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including the chance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that fires start in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely, I mean many many many orders of magnitude...)
> >
> > That statement demonstrates a lack of investigation and/or consideration
> > of the circumstances.
> >
> > I can't find my math or I'd reproduce it here, but consider this:
> >
> > If you have 50 devices, each generating 100 UUIDs per hour, and you'll
> > keep records for 1 year, then your argument above is probably accurate.
> >
> > However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
> > be keeping those records for 10+ years, the chances of collisions near
> > the end of that 10 year span get high enough to actually make developers
> > nervous.
> >
>
> But we're talking about a primary key. Postgres guarantees the
> uniqueness. 1 transaction in 10^^100 rolls back due to a second
> instance of an (otherwise/almost) uuid. Big deal.

That doesn't make any sense. If you're using a single PostgreSQL instance,
then why not just use the built in SERIAL mechanism that guarantees that
you will NEVER have a conflict?

In our case (and I expect it's the case with most people considering UUIDs)
we're talking about independent devices that occasionally synchronize
data between themselves. These devices need to generate a unique ID
of some sort without having to check with every other device. This is
one of the problems that UUIDs were intended to fix.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 19:03:24
Message-ID: 20110105140324.526b5a95.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In response to Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:

> On Jan 5, 2011, at 8:55 AM, Bill Moran wrote:
>
> > That statement demonstrates a lack of investigation and/or consideration
> > of the circumstances.
>
> No, it doesn't.
>
> > However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
> > be keeping those records for 10+ years, the chances of collisions near
> > the end of that 10 year span get high enough to actually make developers
> > nervous.
>
> No, they don't. At the end of your hypothetical 10-year period, you will have used about 43,000,000,000 UUIDs, or about 1/100,000,000,000,000,000,000,000,000th of the UUID space (assuming random UUIDs). Leaving you with a chance of a single collision of about 1/18,000,000,000,000,000.

For crying out loud. If you're going to pick me apart with numbers, then
actually do it with some intelligence.

I could easily counter your argument by upping the numbers to 500,000
mobile devices generating 10000 UUIDs per hour over 20 years ... or raise
it even higher if you come back with that same argument ...

But the point (that you are trying to sidestep) is that the UUID namespace
is finite, so therefore you WILL hit a problem with conflicts at some point.
Just because that point is larger than most people have to concern themselves
with isn't an invalidation.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Radosław Smogura <mail(at)smogura(dot)eu>
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: *****SPAM***** Re: UUID column as pimrary key?
Date: 2011-01-05 19:31:55
Message-ID: 3671266E-1AE4-4AB0-BA5F-A7286D0045FC@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 11:31 AM, Radosław Smogura wrote:

> The true is that probability
> that in two coin drops we will get two reverses is 1/4, but true is, too, as
> Newton said, it's 1/3, because if in 1st drop we don't get reverse we don't
> need to drop again.

Nonsense. You don't stop generating UUIDs just because you haven't yet got a collision.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Dennis Gearon <gearond(at)sbcglobal(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 19:40:59
Message-ID: 197654.49025.qm@web82105.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Yes, the ID is generated before the database is contacted, in my case anyway.

The type of UUID that I want is a hybrid, so I would have to write a stored
procedure and then a trigger upon insert to get the UUID. Not rocket science, I
just have more on my plate than I can handle. So PHP is my 'main thing', so I
quickly wrote it in that. Plus, it makes it more database agnostic.

I don't get next/currval behavior, but I don't think I'll need it.

BTW, Switching from Postgres? Not likely anytime soon. I'd have to be up in the
500M+ rows and be in the data warehousing/map reducing arena before I'd
consider THAT. And there's 'flavors' of Postgres that will do that, anyway.

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better

idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

----- Original Message ----
From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Dennis Gearon <gearond(at)sbcglobal(dot)net>; pgsql-general(at)postgresql(dot)org
Sent: Wed, January 5, 2011 2:50:11 AM
Subject: Re: [GENERAL] UUID column as pimrary key?

On 01/05/2011 07:31 PM, Radosław Smogura wrote:

> * you have your id, before executing query, (in contrast to all this
> autoincrement) so you may put it in dependant rows

Do you mean that with a UUID, you don't need to talk to the database at all, you

can generate an ID with no interaction with / involvement with the database at
all? Because other than that, there's not much difference in how you normally
work with them.

With a sequence, you might:

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
y integer
);
INSERT INTO x(y) VALUES (1);

With a uuid, you'd:

CREATE TABLE x (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
y integer
);
INSERT INTO x(y) VALUES (1);

In either case, you can explicitly call the generator function for seq/uuid -
nextval(seqname) or uuid_generate_v4() respectively - or you can omit the PK
column in your inserts and let the database generate it.

> Personally I prefer pooled incremental id's. Fast, unique, you have Id
> before query - but you need to write "code" by self.

Many libraries / ORMs / etc that interact with Pg will happily take care of this

for you. In fact, I had to fight to convince Hibernate that I *didn't* want it
to increment all my counters in steps of 50.

--
Craig Ringer


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 19:41:43
Message-ID: E6ED48EC-0A7A-4A06-8F4E-75181BD55633@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 12:03 PM, Bill Moran wrote:

> For crying out loud. If you're going to pick me apart with numbers, then
> actually do it with some intelligence.

If you're going to get nasty, at least try to be accurate.

> I could easily counter your argument by upping the numbers to 500,000
> mobile devices generating 10000 UUIDs per hour over 20 years ... or raise
> it even higher if you come back with that same argument ...

Yeah, then you get into the realm of 1 in 10s of millions of a chance of collision. But you would need to explain to me how you would get that many records into the database committed to disk, when the UUIDs alone without any other data represent a stream of 22MB/s ;-) Or, looked at another way, inserting 1,388,889 rows/second would indeed be difficult to sustain.

> But the point (that you are trying to sidestep) is that the UUID namespace
> is finite, so therefore you WILL hit a problem with conflicts at some point.
> Just because that point is larger than most people have to concern themselves
> with isn't an invalidation.

I'm not sidestepping the point at all. The point is that the finiteness of the space is a red herring. The space is large enough that there's no chance of collision in any realistic scenario. In order to get to a point where the probability of collision is high enough to worry about, you have to generate (and collect) UUIDs at a rate that is simply not realistic--as in your second example quoted above. If you just keep raising your numbers, you could go for 100,000,000,000,000 devices generating 100,000,000,000,000 UUIDs an hour for 10,000 years. Collisions would be guaranteed, but that does not make it a useful scenario to consider.

2^256 is a finite space as well. Would you argue that because it "is finite, so therefore you WILL hit a problem with conflicts at some point"? How about 2^512? (Bearing in mind that even though finite that space would be large enough to assign approximately 10^74 UUIDs to every atom in the observable universe, or 10^51 UUIDs to every atom in the total universe using high-end estimates of the size of the non-observable universe)?

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 19:58:12
Message-ID: 4D24CD54.2070005@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 01/05/2011 11:57 AM, Bill Moran wrote:
> In response to Rob Sargent <robjsargent(at)gmail(dot)com>:
>
>>
>>
>> On 01/05/2011 08:55 AM, Bill Moran wrote:
>>> In response to Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:
>>>
>>>> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
>>>>
>>>>> Beyond that, the namespace size for a UUID is so incomprehensibly huge
>>>>> that the chance of two randomly generated UUIDs having the same value
>>>>> is incomprehensibly unlikely
>>>>
>>>> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be simultaneously struck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be wiped out by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and earth will fly off into space... If you're worried about UUID collisions, then either your priorities are completely wrong, or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so vanishingly small that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including the chance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that fires start in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more likely, I mean many many many orders of magnitude...)

>>>
>>> That statement demonstrates a lack of investigation and/or consideration
>>> of the circumstances.
>>>
>>> I can't find my math or I'd reproduce it here, but consider this:
>>>
>>> If you have 50 devices, each generating 100 UUIDs per hour, and you'll
>>> keep records for 1 year, then your argument above is probably accurate.
>>>
>>> However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
>>> be keeping those records for 10+ years, the chances of collisions near
>>> the end of that 10 year span get high enough to actually make developers
>>> nervous.
>>>
>>
>> But we're talking about a primary key. Postgres guarantees the
>> uniqueness. 1 transaction in 10^^100 rolls back due to a second
>> instance of an (otherwise/almost) uuid. Big deal.
>
> That doesn't make any sense. If you're using a single PostgreSQL instance,
> then why not just use the built in SERIAL mechanism that guarantees that
> you will NEVER have a conflict?
>
> In our case (and I expect it's the case with most people considering UUIDs)
> we're talking about independent devices that occasionally synchronize
> data between themselves. These devices need to generate a unique ID
> of some sort without having to check with every other device. This is
> one of the problems that UUIDs were intended to fix.
>
Indeed there is a finite space. A very large, but finite space, just as
sequence has I suspect. If your software cannot handle a rollback for
whatever reason, you have much bigger problem on your hand than the the
remote chance of a collision in uuid generation.

From wikipedia, "only after generating 1 billion UUIDs every second for
the next 100 years, the probability of creating just one duplicate would
be about 50%. The probability of one duplicate would be about 50% if
every person on earth owns 600 million UUIDs."


From: Raymond O'Donnell <rod(at)iol(dot)ie>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 20:13:27
Message-ID: 4D24D0E7.2000302@iol.ie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 05/01/2011 19:41, Scott Ribe wrote:
> to every atom in the observable universe, or 10^51 UUIDs to every
> atom in the total universe using high-end estimates of the size of
> the non-observable universe)?

Is that taking dark matter into account? :-)

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod(at)iol(dot)ie


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: rod(at)iol(dot)ie
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: *****SPAM***** Re: UUID column as pimrary key?
Date: 2011-01-05 20:14:37
Message-ID: 8E8FA1E0-3D9D-4C9F-A9BC-EB454ED20705@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 1:13 PM, Raymond O'Donnell wrote:

> Is that taking dark matter into account? :-)

It's not clear to me ;-)

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 20:43:01
Message-ID: 20110105154301.7897de27.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In response to Rob Sargent <robjsargent(at)gmail(dot)com>:
> >
> > In our case (and I expect it's the case with most people considering UUIDs)
> > we're talking about independent devices that occasionally synchronize
> > data between themselves. These devices need to generate a unique ID
> > of some sort without having to check with every other device. This is
> > one of the problems that UUIDs were intended to fix.
>
> Indeed there is a finite space. A very large, but finite space, just as
> sequence has I suspect. If your software cannot handle a rollback for
> whatever reason, you have much bigger problem on your hand than the the
> remote chance of a collision in uuid generation.

You missed the point.

Despite the fact that the chance of a collision is very, very small, there
is no easy way to fix it if it happens. Zero. It can't be done without
shutting the system down, recalling all the remote devices and manually
reconciling the problem ... which is not an option.

Also, it's hard to DETECT the collision. If a device creates a new record
with a duplicate UUID, how do we tell that apart, during synchronization,
from an update to the record? Now I have your SSN or private medical
data included as part of my record, which violates laws, could lead to
incorrect medical care that could kill someone ... etc.

So, despite the chance being very small, the consequences are HUGE. I
can't just detect it and roll back.

If you have a single DB backend, these problem are easy to solve -- so
easy, in fact, that using UUIDs is overkill.

But when you have mission-critical data that must be correct and available
under every circumstance, you have to consider that UUIDs are not a
guarantee. And when a method that DOES have a guarantee is one of the
options, why would you take ANY risk at all, no matter how small?

(BTW: I hope that the people who think that the risk is acceptable aren't
writing medical software. Even if it only kills one person every 10,000
years because they were given the wrong medicine, that's too often in
my opinion)

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 21:12:26
Message-ID: 4D24DEBA.2030106@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 01/05/2011 01:43 PM, Bill Moran wrote:
> In response to Rob Sargent <robjsargent(at)gmail(dot)com>:
>>>
>>> In our case (and I expect it's the case with most people considering UUIDs)
>>> we're talking about independent devices that occasionally synchronize
>>> data between themselves. These devices need to generate a unique ID
>>> of some sort without having to check with every other device. This is
>>> one of the problems that UUIDs were intended to fix.
>>
>> Indeed there is a finite space. A very large, but finite space, just as
>> sequence has I suspect. If your software cannot handle a rollback for
>> whatever reason, you have much bigger problem on your hand than the the
>> remote chance of a collision in uuid generation.
>
> You missed the point.
>
> Despite the fact that the chance of a collision is very, very small, there
> is no easy way to fix it if it happens. Zero. It can't be done without
> shutting the system down, recalling all the remote devices and manually
> reconciling the problem ... which is not an option.
>
> Also, it's hard to DETECT the collision. If a device creates a new record
> with a duplicate UUID, how do we tell that apart, during synchronization,
> from an update to the record? Now I have your SSN or private medical
> data included as part of my record, which violates laws, could lead to
> incorrect medical care that could kill someone ... etc.
>
> So, despite the chance being very small, the consequences are HUGE. I
> can't just detect it and roll back.
>
> If you have a single DB backend, these problem are easy to solve -- so
> easy, in fact, that using UUIDs is overkill.
>
> But when you have mission-critical data that must be correct and available
> under every circumstance, you have to consider that UUIDs are not a
> guarantee. And when a method that DOES have a guarantee is one of the
> options, why would you take ANY risk at all, no matter how small?
>
> (BTW: I hope that the people who think that the risk is acceptable aren't
> writing medical software. Even if it only kills one person every 10,000
> years because they were given the wrong medicine, that's too often in
> my opinion)
>

Seems the software would need to know whether an update or an add was
begin performed: is this a new patient or one all ready in the db. New
record patient record and new id (of some arbitrary nature) and any new
subsidiary data associated appropriately.

BTW, you're opinion of efficacy of current medical practice is also out
of whack. From a JAMA artical: "Similar to previous studies, almost a
quarter (22.7%) of active-care patient deaths were rated as at least
possibly preventable by optimal care, with 6.0% rated as probably or
definitely preventable. " That's 3 in 50. How d'ya like them odds?


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 22:03:12
Message-ID: 20110105220311.GE19652@shinkuro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote:
> I'm not sidestepping the point at all.

You may be missing it, however, because. . .

> The point is that the finiteness of the space is a red herring. The
> space is large enough that there's no chance of collision in any
> realistic scenario.
> In order to get to a point where the probability
> of collision is high enough to worry about, you have to generate
> (and collect) UUIDs at a rate that is simply not realistic--as in
> your second example quoted above.

. . .the example was not that UUIDs are being generated and collected
in one place at that rate, but that they're being generated in several
independent places at a time, and if the cost of the collision is
extremely high, there might be reasons not to use the UUID strategy
but instead to use something else that is generated algorithmically by
the database. There's a trade-off in having distributed systems
acting completely independently, and while I have lots of confidence
in my colleagues at the IETF (and agree with you that for the
overwhelming majority of cases UUIDs are guaranteed-unique enough),
correctly making these trade-offs still requires thought and
analysis. It's exactly the kind of of analysis that professional
paranoids like DBAs are for.

A

--
Andrew Sullivan
ajs(at)crankycanuck(dot)ca


From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 22:39:09
Message-ID: BD60A226-95FD-4D13-9F6B-E7CB94602DCB@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>> From wikipedia, "only after generating 1 billion UUIDs every second for
> the next 100 years, the probability of creating just one duplicate would
> be about 50%. The probability of one duplicate would be about 50% if
> every person on earth owns 600 million UUIDs."

Even if the chances of one person encountering a UUID collision are about once in say 100 billion years, that could be tomorrow. It could also not happen at all in that time span. That's how chance works. You can never assume it won't happen. If the costs incurred by a collision are lower than the costs of preventing it, you can choose to just take the hit, but that doesn't go for _every_ problem domain. The medical systems mentioned up-thread are an example of that.

Next to that, UUID's are generated by computers. I have no doubts that the numeric space that makes up a UUID allows for collision chances as low as described, but are computers capable of generating those numbers sufficiently random that they actually achieve that low a chance? I think that's pushing it.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d24f31d11541020617287!


From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 22:49:01
Message-ID: 4D24F55D.8070409@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 01/05/2011 03:39 PM, Alban Hertroys wrote:
>>> From wikipedia, "only after generating 1 billion UUIDs every second for
>> the next 100 years, the probability of creating just one duplicate would
>> be about 50%. The probability of one duplicate would be about 50% if
>> every person on earth owns 600 million UUIDs."
>
>
> Even if the chances of one person encountering a UUID collision are about once in say 100 billion years, that could be tomorrow. It could also not happen at all in that time span. That's how chance works. You can never assume it won't happen. If the costs incurred by a collision are lower than the costs of preventing it, you can choose to just take the hit, but that doesn't go for _every_ problem domain. The medical systems mentioned up-thread are an example of that.
>
> Next to that, UUID's are generated by computers. I have no doubts that the numeric space that makes up a UUID allows for collision chances as low as described, but are computers capable of generating those numbers sufficiently random that they actually achieve that low a chance? I think that's pushing it.
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>
>
> !DSPAM:1214,4d24f31711541026711723!
>
>
I'm not talking about chance. I'm talking about robust software. You
have to expect some sort of rollback for any of a number of reasons and
deal with them. The potential UUID collision is just one example of a
contingency one might plan for after having taken care of all the more
probable failure points.


From: Michael Satterwhite <michael(at)weblore(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 23:11:35
Message-ID: 201101051711.35715.michael@weblore.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I've been enjoying this discussion, but it seems to me there are two
possibilities that guarantee unique results, use less storage, and neither
would use a UUID for the primary key.

The first would assume that only one machine is doing the work on the database.
As has been pointed out earlier, in this case the simplest solution would be
to simply use a series. As that is guaranteed unique on a single machine and
is incredibly simple to implement.

Once multiple machines are linked to maintain the database, this has a flaw in
it as a series is not guaranteed to be unique between machines (actually I
think they're pretty well guaranteed *NOT* to be, but that's another issue. In
that case, another field is needed as follows:

Create table test
( id serial,
machine_id integer,
.....
primary_key(machine_id, id);
)

Each machine would have a unique machine_id. This would guarantee uniqueness
and be very easy to maintain. In addition - *IF* it were ever necessary - it
would be easy to determine which machine generated records.

OK, I'm probably going to regret jumping in here, but I couldn't resist.

Sorry


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 23:22:08
Message-ID: 878vyzc4wv.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

ajs(at)crankycanuck(dot)ca (Andrew Sullivan) writes:
> On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote:
>> I'm not sidestepping the point at all.
>
> You may be missing it, however, because. . .
>
>> The point is that the finiteness of the space is a red herring. The
>> space is large enough that there's no chance of collision in any
>> realistic scenario.
>> In order to get to a point where the probability
>> of collision is high enough to worry about, you have to generate
>> (and collect) UUIDs at a rate that is simply not realistic--as in
>> your second example quoted above.
>
> . . .the example was not that UUIDs are being generated and collected
> in one place at that rate, but that they're being generated in several
> independent places at a time, and if the cost of the collision is
> extremely high, there might be reasons not to use the UUID strategy
> but instead to use something else that is generated algorithmically by
> the database. There's a trade-off in having distributed systems
> acting completely independently, and while I have lots of confidence
> in my colleagues at the IETF (and agree with you that for the
> overwhelming majority of cases UUIDs are guaranteed-unique enough),
> correctly making these trade-offs still requires thought and
> analysis. It's exactly the kind of of analysis that professional
> paranoids like DBAs are for.

But it seems to me that some of the analytics are getting a little *too*
paranoid, on the "perhaps UUIDs are the wrong answer" side of the
column.

There's no panaceas, here; if the process that is using IDs is fragile,
then things can break down whether one is using UUID or SERIAL.

I prefer the "probably unique enough" side of the fence, myself.

And the process that uses the IDs needs to be robust enough that things
won't just fall apart in tatters if it runs into non-uniqueness.

I'd expect that to not need to be a terribly big deal - if there's a
UNIQUE index on a UUID-based column, then an insert will fail, and the
process can pick between things like:
- Responding that it had a problem, or
- Retrying.

And if the system isn't prepared for that sort of condition, then it's
also not prepared for some seemingly more likely error conditions such
as:
- The DB connection timed out because something fuzzed out on the
network
- The DB server fell over and is restarting because (power failed,
someone kicked the switch, disk ran out, ...)

It seems rather silly to be *totally* paranoid about the
not-infinite-uniqueness of UUIDs when there are plenty of other risks
lurking around that also need erro checking.
--
"cbbrowne","@","gmail.com"
http://linuxdatabases.info/info/slony.html
"How can you dream the impossible dream when you can't get any sleep?"
-- Sam Robb


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 23:27:55
Message-ID: 874o9nc4n8.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

dalroi(at)solfertje(dot)student(dot)utwente(dot)nl (Alban Hertroys) writes:
>>> From wikipedia, "only after generating 1 billion UUIDs every second for
>> the next 100 years, the probability of creating just one duplicate would
>> be about 50%. The probability of one duplicate would be about 50% if
>> every person on earth owns 600 million UUIDs."
>
>
> Even if the chances of one person encountering a UUID collision are
> about once in say 100 billion years, that could be tomorrow. It could
> also not happen at all in that time span. That's how chance works. You
> can never assume it won't happen. If the costs incurred by a collision
> are lower than the costs of preventing it, you can choose to just take
> the hit, but that doesn't go for _every_ problem domain. The medical
> systems mentioned up-thread are an example of that.
>
> Next to that, UUID's are generated by computers. I have no doubts that
> the numeric space that makes up a UUID allows for collision chances as
> low as described, but are computers capable of generating those
> numbers sufficiently random that they actually achieve that low a
> chance? I think that's pushing it.

I think I'd rather analyze this by:
a) Reading the RFC, and
b) Looking at some of the common implementations likely to get used

rather than to merely have "a doubt."

RFC 4122 does NOT point to randomness as the only criterion to
discourage collisions, and treating UUIDs as if they were merely about
being "sufficiently random to achieve low chance of collision" is
insulting to the drafters of the standard, because they were certainly
NOT so naive as to think that was sufficient.
--
"cbbrowne","@","acm.org"
"What I find most amusing about com and .NET is that they are trying
to solve a problem I only had when programming using MS tools."
-- Max M <maxm(at)mxm(dot)dk> (on comp.lang.python)


From: dennis jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com>
To: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 23:32:58
Message-ID: AANLkTikoCeBXyKtGkeWeV72o932jyy6udfm+XB29PMFU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jan 5, 2011 at 1:03 PM, Bill Moran <wmoran(at)potentialtech(dot)com> wrote:
>
> But the point (that you are trying to sidestep) is that the UUID namespace
> is finite, so therefore you WILL hit a problem with conflicts at some point.
> Just because that point is larger than most people have to concern themselves
> with isn't an invalidation.

The UUID itself is 128 bits. Some of those bits are pre-determined.
I don't recall, but I think that a "normal" UUID has 121 bits of
randomness.

How many would one have to store in a database before a collision
would even be a concern. Such a database would be freaking huge.
Probably far larger than anything that anyone has.

Lets say (I'm pulling numbers out of my ass here), that you wanted to
store 2^100 rows in a table. Each row would have a UUID and some
other meaningful data. Maybe a short string or something. I don't
recall what the postgresql row overhead is (~20 bytes?), but lets say
that each row in your magic table of death required 64 bytes. A table
with 2^100 rows would require nearly 10^31 bytes ( = log_10(64 *
2^100)). How on Earth would you store that much data? And why would
you ever need to?

I postulate that UUID collisions in Postgresql, using a "good" source
for UUID generation, is unlikely to have collisions for any reasonable
database.

Food for thought:
http://blogs.sun.com/dcb/entry/zfs_boils_the_ocean_consumes

ps- If my math is off, I apologize. Its been a long day...


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Michael Satterwhite <michael(at)weblore(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 23:44:40
Message-ID: 763ec3dcb2cbd8a6917d80e4c3fc01cf@softperience.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Pooled id.. child machine connects to main and says give 10000 of ids,
main increments counter by 10000, child allocates in given pool.

On Wed, 5 Jan 2011 17:11:35 -0600, Michael Satterwhite
<michael(at)weblore(dot)com> wrote:
> Once multiple machines are linked to maintain the database, this has
> a flaw in
> it as a series is not guaranteed to be unique between machines
> (actually I
> think they're pretty well guaranteed *NOT* to be


From: Szymon Guz <mabewlun(at)gmail(dot)com>
To: dennis jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com>
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-05 23:47:44
Message-ID: AANLkTinANzm61UOURi27D+k56T=Mn33OvKuA7Tm6WBD8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 6 January 2011 00:32, dennis jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com> wrote:

> On Wed, Jan 5, 2011 at 1:03 PM, Bill Moran <wmoran(at)potentialtech(dot)com>
> wrote:
> >
> > But the point (that you are trying to sidestep) is that the UUID
> namespace
> > is finite, so therefore you WILL hit a problem with conflicts at some
> point.
> > Just because that point is larger than most people have to concern
> themselves
> > with isn't an invalidation.
>
> The UUID itself is 128 bits. Some of those bits are pre-determined.
> I don't recall, but I think that a "normal" UUID has 121 bits of
> randomness.
>
> How many would one have to store in a database before a collision
> would even be a concern. Such a database would be freaking huge.
> Probably far larger than anything that anyone has.
>

Not necessarily.

Bad luck: two inserts, and one collision.
Good luck: many many inserts, without any collision.

If you have ten pairs of socks... how many do you need to choose, to have
two from the same pair (to have a collision)?
Good luck: 2
Bad luck: 11

With uuid you can write a program that most of the time works correctly, but
sometimes not.
If you need a program that works correctly all the time, you can use UUID,
with the overhead of a procedure, that checks the collisions, and does
something with that, or you could generate the PK without UUID, and be sure
that the generator won't have collisions, just because it is a normal
sequence.

regards
Szymon


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: *****SPAM***** Re: UUID column as pimrary key?
Date: 2011-01-06 02:44:29
Message-ID: DA58279F-DBD8-4B36-A6D5-03DFE60C6338@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 3:03 PM, Andrew Sullivan wrote:

> ...the example was not that UUIDs are being generated and collected
> in one place at that rate, but that they're being generated in several
> independent places at a time...

In order for a collision to matter, really in order for there to *be* a collision, the duplicate pair has to be collected in one place.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Michael Satterwhite <michael(at)weblore(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: *****SPAM***** Re: UUID column as pimrary key?
Date: 2011-01-06 02:46:18
Message-ID: 42C12932-F9D3-4204-A1C8-210493644BDD@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 5, 2011, at 4:11 PM, Michael Satterwhite wrote:

> Each machine would have a unique machine_id. This would guarantee uniqueness
> and be very easy to maintain.

And if somebody clones the disk to a new machine, but leaves the old one in service? Or do you use the MAC address and hope that's unique?

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 03:23:26
Message-ID: 20110106032325.GA25891@shinkuro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jan 05, 2011 at 06:22:08PM -0500, Chris Browne wrote:
> But it seems to me that some of the analytics are getting a little *too*
> paranoid, on the "perhaps UUIDs are the wrong answer" side of the
> column.

That could be. I was simply noting that there are cases where one
could legitimately decide that the UUID is a bad fit. I was just
objecting to the seeming (to me anyway) claim that UUIDs can always be
used.

> There's no panaceas, here; if the process that is using IDs is fragile,
> then things can break down whether one is using UUID or SERIAL.

Not necessarily. In a distributed system where some rows will
sometimes (and unpredictably) be shared, guaranteeing that two systems
cannot generate the same ID could be really important. In those
cases, it's probably better to have a fixed generator ID plus a serial
(or, for that matter, a UUID) because then you can be sure you don't
run into one another. (Of course, some UUID examples already have
this built in. It sort of depends on the algorithm one is using.)

> I prefer the "probably unique enough" side of the fence, myself.

Me too, most of the time.

> And the process that uses the IDs needs to be robust enough that things
> won't just fall apart in tatters if it runs into non-uniqueness.

But that robustness requirement might be impossible in a distributed
system, was all I was trying to point out.

> It seems rather silly to be *totally* paranoid about the
> not-infinite-uniqueness of UUIDs when there are plenty of other risks
> lurking around that also need erro checking.

I fully agree with this.

A
--
Andrew Sullivan
ajs(at)crankycanuck(dot)ca


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 03:26:50
Message-ID: 20110106032648.GB25891@shinkuro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jan 05, 2011 at 07:44:29PM -0700, Scott Ribe wrote:

> In order for a collision to matter, really in order for there to *be* a collision, the duplicate pair has to be collected in one place.

Not at the time of generation, though. They only have to end up in
the same place at once. For most cases, the value of the data
compared to the improbably low liklihood of collision means that you
deal with this case by saying, "Yuck. Well, someone loses." But if
the data is valuable enough (say, unique identifiers for nuclear
warheads), that's just not an acceptable trade-off.

A

--
Andrew Sullivan
ajs(at)crankycanuck(dot)ca


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 05:30:45
Message-ID: 20110106053045.GD25891@shinkuro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jan 05, 2011 at 07:46:18PM -0700, Scott Ribe wrote:
> And if somebody clones the disk to a new machine, but leaves the old one in service? Or do you use the MAC address and hope that's unique?
>

Excellent questions, and exactly the sort (as I was arguing elsewhere
in this thread) one has to answer in order to decide what the right
strategy is.

For instance, it may be that your application need is dealing with
incredibly valuable identifiers, that not cloning the disk is a matter
of national security, that the cost of finding and fixing a failed
unique identification case runs into the billions of dollars (not to
mention the millions of deaths), and that the assignment is performed
in real time by distributed systems that only eventually, if at all,
deliver data to a unified system. In this case, the cost of the
failure of uniqueness is very high, and it would be prudent to arrange
something that guaranteed that no assigner node could possibly create
a duplicate identifier. There are UUID approaches that can do this;
they mostly work by guaranteeing a fixed and assignable local part.
UUID vs. serial for one of those columns seems to me to be nothing but
an application efficiency problem.

What is far more likely is that ordinary-value identifiers are being
generated, and that you are using them for more or less pedestrian
reasons. The cost of an undetected duplicate is still maybe millions
of dollars, but the chances of a duplicate going undetected at
creation time (like, say, the MAC address of the creating machine is
used, the useful lifetime of the UUID is on the order of single-digit
years, and there are no more than 10 database back ends involved) are
fairly low. In this case, using a proven UUID generator seems like a
no-brainer to me (and in fact, I work on an application where we do
this).

I don't understand the number of knees in this conversation that seem
to be jerking against the answer, "It depends." Of course it depends.
If there were one answer for everything, developing good
database-backed systems wouldn't be something people would pay any of
us for.

A

--
Andrew Sullivan
ajs(at)crankycanuck(dot)ca


From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 08:02:43
Message-ID: 9D9C59E3-7ADA-4AD5-913A-20F9450BA007@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 6 Jan 2011, at 24:27, Chris Browne wrote:

>> Next to that, UUID's are generated by computers. I have no doubts that
>> the numeric space that makes up a UUID allows for collision chances as
>> low as described, but are computers capable of generating those
>> numbers sufficiently random that they actually achieve that low a
>> chance? I think that's pushing it.
>
> RFC 4122 does NOT point to randomness as the only criterion to
> discourage collisions, and treating UUIDs as if they were merely about
> being "sufficiently random to achieve low chance of collision" is
> insulting to the drafters of the standard, because they were certainly
> NOT so naive as to think that was sufficient.

I'm sure the designers knew what they were getting into. This comment was aimed at people claiming things like "with this and that huge number of events a collision won't occur in 100 billion years", which - to me at least - looks like they're only looking at the big number of bits involved without understanding statistical analysis.
Let's just say, if the developers of "Microsoft Visual Nuclear Power Plant Designer Professional" were claiming things like that, would you trust their product?

The main point with the randomness of UUID's remains that you _can_ have a collision at any given moment. It's unlikely to ever happen, but you can't predict when it will happen if it does. The possible consequences of a collision matter a lot in designing whether and how to handle these collisions. Maybe it doesn't matter at all, maybe you should get a really hefty insurance, or maybe you need to evacuate the country.

Opposed to that, a sequence isn't random and therefore you can predict when you will run into collisions - namely once the sequence wraps. Considering that even a 32-bit sequence allows for several billions of rows before collisions _can_ occur, you can be certain that your problem is pretty far into the future.
It _will_ be a big problem without an obvious solution if it occurs though, as from that point on you will run into a lot of collisions and the resolution to the problem is rather dependent on what you're working on.

Now that is not an argument against protecting your application against collisions, if there is a chance that you will run into collisions (you won't in a 10-record lookup table, for example) then you need to take that into consideration in your designs, but there are many (usually obvious) cases in which it's safe to omit it. With UUID's that's a little more complicated.

I don't think anyone in this discussion is saying "Don't use UUID's!". Just be aware of their limitations and the problem domains where they are sensible to use. The same goes for sequences.
It would, for example, be (obviously) pretty insane to use UUID's for a 10-record lookup table. There's plenty of examples in this thread where they shine, I don't need to repeat that.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d25777f11541886517442!


From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 08:52:15
Message-ID: AANLkTim_Zqg+NNKNxcq+qMX4vXM0FrX2DMfAGKFL=D_8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>>> Next to that, UUID's are generated by computers. I have no doubts that
>>> the numeric space that makes up a UUID allows for collision chances as
>>> low as described, but are computers capable of generating those
>>> numbers sufficiently random that they actually achieve that low a
>>> chance? I think that's pushing it.

> The main point with the randomness of UUID's remains that you _can_ have a collision at any given moment. It's unlikely to ever happen, but you can't predict when it will happen if it does. The possible consequences of a collision matter a lot in designing whether and how to handle these collisions. Maybe it doesn't matter at all, maybe you should get a really hefty insurance, or maybe you need to evacuate the country.

Given the math, I suspect the chance of a UUID collision generated by
a respected library is roughly the same or less than any other method
of choosing a unique id. If you are looking at these extreme
improbabilities, your SERIAL isn't guaranteed unique either when you
take into account cosmic rays flipping the right bits in your ECC
memory or on your disk platter.

So if you are worried about such things, give up now because your
application is going to have to run with hardware and software in this
universe with all its limitations.

Maybe I should start a business in providing UUID collision insurance?

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/


From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 09:51:03
Message-ID: ig43a7$2qv$2@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 2011-01-05, Scott Ribe <scott_ribe(at)elevated-dev(dot)com> wrote:
> On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:
>
>> In practical use I think the odds of a collision are *far* higher than
>> you are suggesting, unless the UUID generation is being done with a lot
>> more care than is likely if the user takes these sorts of claims at face
>> value.
>
> Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation algorithm provided by the OS. So, unless we're talking about users coming up with their own algorithms, it seems reasonable to assume that the generation is done with a great deal of care. (And if we are talking about users coming up with their own algorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X & Linux. I would be shocked if it were not the case on Solaris. I would even be surprised if it were not the case on Windows.

> The IETF Network Working Group designed UUIDs to ensure that their
> uniqueness guarantee would be strong enough that no application would
> need to worry about duplicates, ever. Claims that collisions are too
> likely to depend on UUIDs being unique really are claims that the IETF
> Network Working Group didn't know what it was doing, which I find a
> bit ridiculous.

Who was it that decided on 32 bits for IP addresses?

--
⚂⚃ 100% natural


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 14:28:04
Message-ID: 56CE2C4D-A0AE-4AAC-8F61-1C9E1AB13302@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 6, 2011, at 2:51 AM, Jasen Betts wrote:

> Who was it that decided on 32 bits for IP addresses?

Nice try, but that was rather long before the IETF existed ;-)

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Stuart Bishop <stuart(at)stuartbishop(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 14:29:48
Message-ID: 1395F6F7-B084-4C00-922D-D36FE22E6A84@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:

> If you are looking at these extreme
> improbabilities, your SERIAL isn't guaranteed unique either when you
> take into account cosmic rays flipping the right bits in your ECC
> memory or on your disk platter.

Yes, that's rather the point, the probability is so extremely low that it in most cases it should be treated as 0. Some people seem to have a problem wrapping their heads around relative magnitudes that extreme.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Stuart Bishop <stuart(at)stuartbishop(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 15:14:00
Message-ID: 20110106101400.c69e1918.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In response to Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:

> On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:
>
> > If you are looking at these extreme
> > improbabilities, your SERIAL isn't guaranteed unique either when you
> > take into account cosmic rays flipping the right bits in your ECC
> > memory or on your disk platter.
>
> Yes, that's rather the point, the probability is so extremely low that it in most cases it should be treated as 0. Some people seem to have a problem wrapping their heads around relative magnitudes that extreme.

My head is totally wrapped around probability -- yours is not. It's
amazing the number of people on this list who are confused by a
bunch of x in y #s being thrown around.

If the chance of a duplicate is 1 in a hundred gazillion, then I can
still hit a dupe the VERY FIRST TIME I USE IT.

I'm writing software that is intended to be used to save lives in the
event of an earthquake or flood or cosmic ray flipping bits or any
other massive disaster. The last thing I need while a bunch of EMTs
are digging bodies out of the rubble trying to save their lives is to
need to resolve a key conflict with a bunch of mobile devices, most of
which are not reachable because the cell network is down because of
the disaster. There's going to be enough other shit going wrong ...
my software is supposed to be part of the solution.

I don't give a fuck how small the chance of conflict is, the only
viable option for that chance is 0. Period. Any argument to the
contrary is stupid, asinine and outright negligent.

Now, if I were writing facebook or twitter or some other pointless
entertainment app, the odd chance of losing a post or whatever to a
key conflict is not something I'm going to worry about for more than
the time it takes to yawn.

And that's been my point all along, despite people trying to dilute it
with nonsense numbers that they don't understand: UUIDs are sufficiently
unique for 99.99999999999% of the applications out there. However, you
should always take 5 or 10 minutes to consider whether your application
is one of the .00000000001% that can't tolerate the tiny risk.

And also, if your entire solution to that risk is to rollback the
transaction in the event of a conflict, then your application is simple
enough that UUIDs are overkill anyway. Use them anyway if you want.

As far as statistics are concerned, the chance that someone as batfuck
insane as Hitler would rise to power in a major country is 1 in
1,102,196,287,287,859,992,396,273,293,203 -- yet it happened.

There. I Godwined the damn thing.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Michael Satterwhite <michael(at)weblore(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: *****SPAM***** Re: UUID column as pimrary key?
Date: 2011-01-06 15:19:14
Message-ID: 201101060919.14621.michael@weblore.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wednesday, January 05, 2011 08:46:18 pm Scott Ribe wrote:
> On Jan 5, 2011, at 4:11 PM, Michael Satterwhite wrote:
> > Each machine would have a unique machine_id. This would guarantee
> > uniqueness and be very easy to maintain.
>
> And if somebody clones the disk to a new machine, but leaves the old one in
> service? Or do you use the MAC address and hope that's unique?

That would be a matter of incompetent administration. *NOTHING* can protect
against that.


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: Stuart Bishop <stuart(at)stuartbishop(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 15:46:59
Message-ID: 96857559-BBD2-4ECA-9F8F-2892FCF27AD3@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 6, 2011, at 8:14 AM, Bill Moran wrote:

> I don't give a fuck how small the chance of conflict is, the only
> viable option for that chance is 0. Period. Any argument to the
> contrary is stupid, asinine and outright negligent.

Do you give a fuck about the chance that bits will flip in the RAM and not be detected? Do you give a fuck about the chance that bits will flip in whatever persistent storage is in your device and not be detected? Do you give a fuck about the chance that bits will be flipped in the network and not be detected? Do you give a fuck about the chance that a transistor will go into a metastable state instead of flipping, and lock up the device and lose data not yet saved? Well, of course you do. But now what are the relative odds? All of these things can happen already (and all of them can happen the very first time you use it), so already your system is not precisely 100% reliable. Now is the risk of UUID collision somewhere near the same as these risks, or orders of magnitude higher, or orders of magnitude lower? It does matter.

> However, you
> should always take 5 or 10 minutes to consider whether your application
> is one of the .00000000001% that can't tolerate the tiny risk.

If an application/device truly can't tolerate a risk of an error of 1 in 10^-16, that's a problem because you can't build a device without risk of error below some threshold (in that general neighborhood I think). You can only push risks to lower & lower probability, and it makes no sense to focus on a single risk and spend time and effort to push it to orders of magnitude lower probability than all the other risks in the system. (As long as there are risks at orders of magnitude higher priority, those should get the time & expense.)

> And that's been my point all along, despite people trying to dilute it
> with nonsense numbers that they don't understand...

No, it hasn't been your point all along. Your point has shifted twice now as you've been shown to be wrong about the odds. And the numbers used are not nonsense at all. All of which somewhat contradicts your statement that your "head is totally wrapped around probability" ;-) Added to your apparent ignoring of other error sources in order to focus on one extremely unlikely one, well...

> And also, if your entire solution to that risk is to rollback the
> transaction in the event of a conflict, then your application is simple
> enough that UUIDs are overkill anyway.

I kind of doubt that the person who posted that intended it as the entire solution. It seemed to me that was intended as just the event that triggers conflict resolution and the next step would be to inform the device that the conflicting record is getting a new UUID, update appropriately, and so on.

Just so you know, I'm done talking to you. Your arrogance, rudeness, insults, condescension and personal attacks are not something that I will deal with anymore.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, Scott Ribe <scott_ribe(at)elevated-dev(dot)com>, Stuart Bishop <stuart(at)stuartbishop(dot)net>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 15:55:38
Message-ID: 201101060755.39255.adrian.klaver@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thursday 06 January 2011 7:14:00 am Bill Moran wrote:
> In response to Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:
> > On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:
> > > If you are looking at these extreme
> > > improbabilities, your SERIAL isn't guaranteed unique either when you
> > > take into account cosmic rays flipping the right bits in your ECC
> > > memory or on your disk platter.
> >
> > Yes, that's rather the point, the probability is so extremely low that it
> > in most cases it should be treated as 0. Some people seem to have a
> > problem wrapping their heads around relative magnitudes that extreme.
>

>
> There. I Godwined the damn thing.
>
> --
> Bill Moran

Maybe a wrap up is in order:)
As I said earlier this is one of those arguments that could go forever because
everyone is right, so to summarize:
1) UUIDs can have a very to extremely large namespace but less than infinite.
2) There are other alternatives i.e SERIAL
3) Managing the above is based on the interaction of three components -
software,hardware,wetware(people). Any one of which can have a weakness and in
combination their are many permutations.
4) DBAs need to plan for the worse. Worse being somewhat contextual. Real time
control of a nuclear plant versus Web social media. Choosing a unique number
generator and dealing with possible collisions is contingent on this context.

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Michael Satterwhite <michael(at)weblore(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 15:59:56
Message-ID: 9C94EDBC-B2C2-4540-9E45-11ACBB2DB5C8@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 6, 2011, at 8:19 AM, Michael Satterwhite wrote:

> That would be a matter of incompetent administration. *NOTHING* can protect
> against that.

Well, no, not necessarily. It might well be a goal (in fact, is a goal with some software that I'm developing), that users/admins don't have to worry about data caches moving across machines. My primary point, which I stated incompletely, was that in order to depend on node ids as part of unique ids, requires a degree of control over the administration of nodes, and for a given application this might or might not be practical. For instance, if your app runs on cell phones, and the OSs you deploy on give you access to the device id, and you don't mind using a rather long prefix to form your unique ids, then you have an obvious solution that, as far as I know, is guaranteed to be unique. (Ignoring the possibility of hacking of the device id, because no matter what you choose as a prefix, if an adversary manages to deliberately change the prefix, you can get duplicates.) My secondary point was that this is rather difficult to detect in time to prevent conflicts.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Anthony <osm(at)inbox(dot)org>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: Rob Sargent <robjsargent(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 16:22:18
Message-ID: AANLkTi=cOfTyxt1EpzKYpN6MFjdTAHT7kk=vZd2duJyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jan 5, 2011 at 3:43 PM, Bill Moran <wmoran(at)potentialtech(dot)com> wrote:
> Despite the fact that the chance of a collision is very, very small, there
> is no easy way to fix it if it happens.  Zero.  It can't be done without
> shutting the system down, recalling all the remote devices and manually
> reconciling the problem ... which is not an option.

Sounds like a security problem - an intentional collision is much more
likely than a random one.


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 16:31:38
Message-ID: 87zkreat91.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

dennis(dot)jenkins(dot)75(at)gmail(dot)com (dennis jenkins) writes:
> The UUID itself is 128 bits. Some of those bits are pre-determined.
> I don't recall, but I think that a "normal" UUID has 121 bits of
> randomness.

That doesn't match RFC 4122 very well...

It indicates 5 forms of UUIDs:

1) Time-based, where about 1/2 the data comes from local timestamp data,
and 48 bits come from MAC address (or similar)

2) "DCE Security" (about which it says little)

3) Name-based, using MD5 hashing

4) Randomly generated UUIDs (which are quite likely what you're thinking
about) have 122 bits of random data

5) Name-based, using SHA-1 hashing

The reasonable choices for a would-be artificial primary key seem to be
1 and 3; in a distributed system, I'd expect to prefer 1, as the time +
host data are likely to eliminate the "oh, it might just randomly match"
problem.

Note the set of functions in the uuid-ossp contrib module fit this, down
to omitting Version 2 :-).

test(at)localhost-> \df+ public.uuid_generate*
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Owner | Language | Source code | Description
--------+--------------------+------------------+---------------------------+--------+------------+----------+----------+--------------------+-------------
public | uuid_generate_v1 | uuid | | normal | volatile | postgres | c | uuid_generate_v1 |
public | uuid_generate_v1mc | uuid | | normal | volatile | postgres | c | uuid_generate_v1mc |
public | uuid_generate_v3 | uuid | namespace uuid, name text | normal | immutable | postgres | c | uuid_generate_v3 |
public | uuid_generate_v4 | uuid | | normal | volatile | postgres | c | uuid_generate_v4 |
public | uuid_generate_v5 | uuid | namespace uuid, name text | normal | immutable | postgres | c | uuid_generate_v5 |
(5 rows)
--
"I'm all for advancement. However rich text on an ephemeral media is
a totally brain-dead idea. Nobody in their right mind would take the
effort to prettyfy documents that are going to be gone in a few days."
-- Jay Denebeim <denebeim(at)deepthot(dot)ml(dot)org>


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 16:51:25
Message-ID: 87vd22asc2.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

wmoran(at)potentialtech(dot)com (Bill Moran) writes:
> If the chance of a duplicate is 1 in a hundred gazillion, then I can
> still hit a dupe the VERY FIRST TIME I USE IT.
>
> I'm writing software that is intended to be used to save lives in the
> event of an earthquake or flood or cosmic ray flipping bits or any
> other massive disaster. The last thing I need while a bunch of EMTs
> are digging bodies out of the rubble trying to save their lives is to
> need to resolve a key conflict with a bunch of mobile devices, most of
> which are not reachable because the cell network is down because of
> the disaster. There's going to be enough other shit going wrong ...
> my software is supposed to be part of the solution.
>
> I don't give a fuck how small the chance of conflict is, the only
> viable option for that chance is 0. Period. Any argument to the
> contrary is stupid, asinine and outright negligent.

If your system is sufficiently negligently designed that this particular
conflict causes it to kill people, then I wouldn't be too inclined to
point at this issue with UUIDs being the Real Problem with the system.

This is NOT the only risk that the system faces; you can't get *nearly*
as low probabilities attached to hardware and network issues such as:
- Disks failing
- Cosmic rays twiddling bits in memory
- Network connections failing part way through the work
- Dumb techs blindly cloning the same "host key" onto every one of the
EMTs' data collection devices

That last item is an argument in *FAVOR* of using UUIDs for the "EMTs
digging bodies out of rubble" scenario... The typical alternative to
UUIDs would be to have (Sequence + Host Data), and I'd expect there to
be a considerable risk of Dumb Techs making mistakes there. It wouldn't
be difficult for such a scenario to induce systematic key collisions.

It's never just about one risk; it's about *all the risks.*

> And that's been my point all along, despite people trying to dilute it
> with nonsense numbers that they don't understand: UUIDs are sufficiently
> unique for 99.99999999999% of the applications out there. However, you
> should always take 5 or 10 minutes to consider whether your application
> is one of the .00000000001% that can't tolerate the tiny risk.

If your application is so fragile that it cannot tolerate *that* tiny
risk, then I have to wonder if your system isn't Just Plain Too Fragile,
because there are other unavoidable risks likely to be of much greater
probability.
--
"Have you noticed that, when we were young, we were told that
`everybody else is doing it' was a really stupid reason to do
something, but now it's the standard reason for picking a particular
software package?" -- Barry Gehm


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 17:09:15
Message-ID: 64E36981-C1B1-44D2-9718-F1113431757E@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 6, 2011, at 9:31 AM, Chris Browne wrote:

> The reasonable choices for a would-be artificial primary key seem to be
> 1 and 3; in a distributed system, I'd expect to prefer 1, as the time +
> host data are likely to eliminate the "oh, it might just randomly match"
> problem.

In some contexts, 1 is considered a security weakness, as it reveals information about which machine generated it and when, which is why most OS-supplied uuid generators now default to 4 (random). This tends to be more of a concern with encryption/security uses, and if it's not a concern for your db[*], then your are correct that 1 is likely the best choice.

[*] After all, in many dbs we log all sorts of explicit where/who/when for auditing purposes. In that case, having ids that provide a clue of where/when most certainly does not add any legitimate security concern.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>, Stuart Bishop <stuart(at)stuartbishop(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Apology
Date: 2011-01-06 18:24:06
Message-ID: 20110106132406.a0f2c6c5.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


It was recently brought to my attention that in my enthusiasm to
have my point of view understood, that I may have been crude,
or "self-important" or something else that people find offensive.

My apologies to anyone who was offended, and to anyone who considers
this thread a bikeshed or flamewar that I have been contributing to.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 18:33:05
Message-ID: 68E9C3DF-8A66-4D0A-9208-E62B2BCEB3DC@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 6 Jan 2011, at 17:51, Chris Browne wrote:

> wmoran(at)potentialtech(dot)com (Bill Moran) writes:
> If your system is sufficiently negligently designed that this particular
> conflict causes it to kill people, then I wouldn't be too inclined to
> point at this issue with UUIDs being the Real Problem with the system.
>
> This is NOT the only risk that the system faces; you can't get *nearly*
> as low probabilities attached to hardware and network issues such as:
> - Disks failing
> - Cosmic rays twiddling bits in memory
> - Network connections failing part way through the work
> - Dumb techs blindly cloning the same "host key" onto every one of the
> EMTs' data collection devices

Let's say that you actually build a mission critical system for which you'd need to evacuate the country if it fails. You pick the best ECC RAM you can find, the most reliable type of disk storage available, your fallback network has a fallback network of it's own, etc. Basically you have done everything you could to ensure that the chances of the system failing are as small as technically possible.

All those little failure chances add up to a certain number. Using UUID's for your ID's is not required for the design of the system, yet you chose to do so. You added a nearly infinite chance of UUID collisions to the accumulated chance of the system failing.

Now the system miraculously fails and the country needs evacuating. A committee is going to investigate why it failed. If the dumb techy above is responsible, they just found themselves a scape-goat. If they didn't, but stumble upon your unnecessary usage of UUID's instead... Let's just say I don't want to be that person.

I have to agree with Bill here, if lives depend on your system then anything that adds to the failure chances is very hard to defend. In the end it often boils down to responsibility in case of failure, not to mention what it does to your own peace of mind.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d260af211541129314545!


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 19:36:21
Message-ID: 87oc7tbz9m.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

dalroi(at)solfertje(dot)student(dot)utwente(dot)nl (Alban Hertroys) writes:
> On 6 Jan 2011, at 17:51, Chris Browne wrote:
>
>> wmoran(at)potentialtech(dot)com (Bill Moran) writes:
>> If your system is sufficiently negligently designed that this particular
>> conflict causes it to kill people, then I wouldn't be too inclined to
>> point at this issue with UUIDs being the Real Problem with the system.
>>
>> This is NOT the only risk that the system faces; you can't get *nearly*
>> as low probabilities attached to hardware and network issues such as:
>> - Disks failing
>> - Cosmic rays twiddling bits in memory
>> - Network connections failing part way through the work
>> - Dumb techs blindly cloning the same "host key" onto every one of the
>> EMTs' data collection devices
>
> Let's say that you actually build a mission critical system for which
> you'd need to evacuate the country if it fails. You pick the best ECC
> RAM you can find, the most reliable type of disk storage available,
> your fallback network has a fallback network of it's own,
> etc. Basically you have done everything you could to ensure that the
> chances of the system failing are as small as technically possible.
>
> All those little failure chances add up to a certain number. Using
> UUID's for your ID's is not required for the design of the system, yet
> you chose to do so. You added a nearly infinite chance of UUID
> collisions to the accumulated chance of the system failing.

Infinite? The probability can't conceivably exceed 1.

It's scarcely likely to exceed "infinitesimal."

I've built clustered systems, and frequently, the resulting Rube
Goldberg apparatus that tries to protect against failures of the other
apparatus trying to protect against failures of further apparatus that
tries to protect against failures introduces a tall and unwieldy stack
of intricately interwoven components such that operators need to be
*mighty* careful not to tip anything over lest the protective apparatus
collapse, knocking over the system it was supposed to protect.

> Now the system miraculously fails and the country needs evacuating. A
> committee is going to investigate why it failed. If the dumb techy
> above is responsible, they just found themselves a scape-goat. If they
> didn't, but stumble upon your unnecessary usage of UUID's
> instead... Let's just say I don't want to be that person.

If the system is that mission critical, then it well and truly warrants
doing enough proper analysis of the risks to *know* the risks of the
various expectable failure conditions, and to do so in rather more
detail than the oversimplification of characterizing them as
"infinitesimal" or "infinite."

> I have to agree with Bill here, if lives depend on your system then
> anything that adds to the failure chances is very hard to defend. In
> the end it often boils down to responsibility in case of failure, not
> to mention what it does to your own peace of mind.

It seems to me that using serially assigned values, along with manually
assigned server IDs, to construct a would-be-unique value, is likely to
introduce quite a lot *more risk* of system failure than would the use
of UUIDs.

So someone that rules out UUIDs based on some fallacious imagined
"infinite chance of collisions" is jumping away from a small risk, and
accepting one much more likely to take lives.

We haven't seen any indication that would distinguish between "infinite"
and "infinitesimal," beyond the fact that "infinite" is infinitely
larger than the largest probability that one can find for an event,
which is 1.
--
(format nil "~S(at)~S" "cbbrowne" "gmail.com")
"But life wasn't yes-no, on-off. Life was shades of gray, and
rainbows not in the order of the spectrum."
-- L. E. Modesitt, Jr., _Adiamante_


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 20:01:01
Message-ID: E0AD7ADE-9259-4C27-B649-FADC6662B4C9@themactionfaction.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Jan 6, 2011, at 3:52 AM, Stuart Bishop wrote:
> Maybe I should start a business in providing UUID collision insurance?

Your ideas are intriguing to me and I wish to subscribe to your newsletter.

-M


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 20:39:34
Message-ID: 20110106153934.a655b5c2.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In response to Chris Browne <cbbrowne(at)acm(dot)org>:
>
> It seems to me that using serially assigned values, along with manually
> assigned server IDs, to construct a would-be-unique value, is likely to
> introduce quite a lot *more risk* of system failure than would the use
> of UUIDs.

First off, server IDs are not randomly assigned. They are assigned
automatically when the software is activated and tracked by the master
server.

Second, there is a very simple mechanism to prevent attempted "device
cloning" from causing a problem. (it's not a particularly difficult
problem to solve, actually)

Third, and very important, is the following conversation we had during
planning:
"Can we get good quality random data on mobile devices?"
"We'll have to research it, we don't know yet."
"Wait, before we do the research -- if we find that some mobile devices
have good quality random data and others don't -- can we tell the client
which devices they're allowed to use?"
"No, the client will dictate which devices it must work on."

So basically, we realized that even if our concerns about UUID
collisions we're unreasonably paranoid, we still couldn't guarantee
that the devices would be up to the task of generating UUIDs with
sufficient randomness.

As I tried to point out earlier in the thread, the collision issue was
one of MANY things we considered in this design. It is, however, the
ONLY one that's directly relevant to the original conversation. Now
that my description of our design seems to be coming under fire, I
feel the need to at least point out that we thought it through more
than that. I suppose that's mostly my own fault for trying to bring
in an example that it's impractical to discuss in full detail.

As other people have pointed out, there are concerns about malicious
impersonation, device failure, data corruption ... each of these may
be more or less likely or dangerous than UUID collision, but not all
of them are _intended_ to be solved by not using UUIDs, so claiming
that abandoning UUIDs does not fix these problems is completely
correct, and also a straw man.

> So someone that rules out UUIDs based on some fallacious imagined
> "infinite chance of collisions" is jumping away from a small risk, and
> accepting one much more likely to take lives.

The possibility of collisions is not fallacious, however, the use of
"infinite" (I don't remember who wrote that) is obviously not correct.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-06 22:06:55
Message-ID: B6FEE77F-2807-407F-B555-A136581A1BEF@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 6 Jan 2011, at 20:36, Chris Browne wrote:

> Infinite? The probability can't conceivably exceed 1.

Don't start picking om words please, "infinitely small" or "infinitesimal" is obviously what I meant to write there.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d263d0511543498117537!


From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UUID column as pimrary key?
Date: 2011-01-08 10:48:24
Message-ID: ig9fdo$8v3$2@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 2011-01-06, Chris Browne <cbbrowne(at)acm(dot)org> wrote:

> (Sequence + Host Data), and I'd expect there to
> be a considerable risk of Dumb Techs making mistakes there. It wouldn't
> be difficult for such a scenario to induce systematic key collisions.

I've seen that happen. cleanup of the effected records was tricky,
a few customers were inconvenienced

--
⚂⚃ 100% natural