Re: [HACKERS] Patch for UUID datatype (beta)

Lists: pgsql-hackers pgsql-patches
From: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
To: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Patch for UUID datatype (beta)
Date: 2006-09-17 23:00:21
Message-ID: 1158534021.9228.9.camel@voyager.truesoftware.net
Lists: pgsql-hackers pgsql-patches

Folks,

The following patch implements the UUID datatype. I am sending this
beta patch to check whether I am still on the right track. Please send
your comments.

Description of UUID:

- The type is called uuid.
- btree and hash indexes are supported.
- uuid array is supported.
- uuid text i/o is supported.
- uuid binary i/o is supported.
- uuid_to_text and text_to_uuid casting is supported.
- uuid_to_varchar and varchar_to_uuid casting is supported.
- the < <= = >= > <> operators are supported. Please note that some of
these operators have no mathematical meaning and are only useful for
sorting.

- new_guid() function is supported. This function is based on the V4
random uuid value. It generates 16 random bytes and sets the uuid
'variant' and 'version' fields. According to the docs it is not
guaranteed to produce unique values, but I have inserted 6 million
records and it did not create any duplicates :)

- the uuid datatype supports 3 input formats:
1. "00000000-0000-0000-0000-000000000000"
2. "00000000000000000000000000000000"
3. "{00000000-0000-0000-0000-000000000000}"

- the uuid datatype supports the output format defined by the RFC:
"00000000-0000-0000-0000-000000000000"

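For readers who want to try the three input forms, here is a minimal
sketch in Python. It is not the patch's C code. The helper name
parse_uuid_text is hypothetical, and the sketch assumes the canonical
RFC 4122 layout of 32 hexadecimal digits grouped 8-4-4-4-12.

```python
import re

# Hypothetical helper for illustration only; the patch itself is C code.
_HEX32 = re.compile(r"[0-9a-fA-F]{32}")
_DASHED = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def parse_uuid_text(text: str) -> str:
    """Accept the dashed, plain-hex, or braced form; return the dashed form."""
    s = text.strip()
    if s.startswith("{") and s.endswith("}"):
        s = s[1:-1]                      # form 3: braces around the dashed form
        if not _DASHED.fullmatch(s):
            raise ValueError(f"invalid uuid text: {text!r}")
    elif not (_DASHED.fullmatch(s) or _HEX32.fullmatch(s)):
        raise ValueError(f"invalid uuid text: {text!r}")
    digits = s.replace("-", "").lower()
    # Always emit the single canonical output format.
    return "-".join(
        (digits[0:8], digits[8:12], digits[12:16], digits[16:20], digits[20:32])
    )


for form in (
    "A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11",
    "a0eebc999c0b4ef8bb6d6bb9bd380a11",
    "{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}",
):
    print(parse_uuid_text(form))
```

All three inputs map to the same canonical string, which mirrors the
"several input formats, one output format" behaviour described above.
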
Areas yet in development and testing:

- uuid array indexing.
- testing with joins (merge,hash,gin)
- new_guid() fail proof testing
- performance testing
- testing with internal storage and compression.
- regression test addition
- proper documentation
- overall sanity testing/checking

Please note that I consider this a beta patch.
You can download it from:
http://www.truesoftware.net/pgsql/uuid/patch-0.1/

Regards,
Gevik.


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
Cc: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Patch for UUID datatype (beta)
Date: 2006-09-18 07:21:37
Message-ID: 450E4901.5060206@pse-consulting.de
Lists: pgsql-hackers pgsql-patches

Gevik Babakhani wrote:
> - new_guid() function is supported. This function is based on V4 random
> uuid value. It generated 16 random bytes with uuid 'variant' and
> 'version'. It is not guaranteed to produce unique values

Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
there's a commonly accepted algorithm providing this.

Regards,
Andreas


From: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Patch for UUID datatype (beta)
Date: 2006-09-18 08:41:38
Message-ID: 1158568898.19958.14.camel@voyager.truesoftware.net
Lists: pgsql-hackers pgsql-patches

On Mon, 2006-09-18 at 09:21 +0200, Andreas Pflug wrote:
> Gevik Babakhani wrote:
> > - new_guid() function is supported. This function is based on V4 random
> > uuid value. It generated 16 random bytes with uuid 'variant' and
> > 'version'. It is not guaranteed to produce unique values
>
> Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> there's a commonly accepted algorithm providing this.
>

Uniqueness is never guaranteed; that is according to the RFC docs.
However, new_guid() generates a random value in the range of 256^256.
The random value is in turn based on PG's randomizer, which is a very
good one.

Uniqueness is never guaranteed in the sense that there is a tiny
chance someone on the other side of the planet might generate the same
guid. Or if you set your PC's clock back to the past (1981), you have a
tiny chance of generating the same guid twice.

I have been running a test for the past two days; it has
generated over 14 million guids with new_guid() and so far no
duplicates :)
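
The V4 scheme described for new_guid() (16 random bytes with the
'version' and 'variant' fields set) can be sketched in Python as
follows. This is only an illustration, not the patch's code: the name
new_guid_sketch is hypothetical, and it draws bytes from os.urandom
rather than from PG's randomizer.

```python
import os
import uuid


def new_guid_sketch() -> uuid.UUID:
    """Build a version 4 UUID by hand from 16 random bytes."""
    raw = bytearray(os.urandom(16))      # 16 random bytes from the OS
    raw[6] = (raw[6] & 0x0F) | 0x40      # version field = 4 (random)
    raw[8] = (raw[8] & 0x3F) | 0x80      # variant field = RFC 4122 (bits 10xx)
    return uuid.UUID(bytes=bytes(raw))


# A small duplicate check in the spirit of the test described above.
sample = {new_guid_sketch() for _ in range(100_000)}
print(len(sample), "distinct values out of 100000")
```

Six of the 128 bits are fixed by the version and variant fields, so
each value carries 122 random bits.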

Regards,
Gevik

> Regards,
> Andreas
>
>


From: "Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com>
To: "Gevik Babakhani" <pgdev(at)xs4all(dot)nl>
Cc: "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Patch for UUID datatype (beta)
Date: 2006-09-18 09:11:04
Message-ID: 7be3f35d0609180211p752e4662u3373d988d984ffa5@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

Gevik,
>uniqueness is never a guaranteed. that is according to the RFC docs.

>uniqueness is never a guaranteed in the sense that there is a tiny
>chance someone of the other side of the planet might generate the same
>guid.

As far as I have learned, it is recommended to state the "grade of
uniqueness". I think it would be valuable to know which information
your UUID generator takes into account, and what the "grade of
uniqueness" is.

(I know of the Windows UUID, which takes the MAC address of the installed
Ethernet card into its calculation, which may be guaranteed to be unique)

Some more questions about UUIDs and your patch:

a) compatibility of UUIDs -> I have generated a lot of UUIDs via the WIN32
provided function (for the unix-only-people: Windows uses UUIDs all around
its registry, software IDs and on and on). How unique are those UUIDs when
mixed with "your" UUIDs ?

b) I read some time ago about the problems with UUIDs as primary keys in
contrast to serials: serials get produced in ascending order; and often data
which was produced in one timespan is also connected semantically. "near
serial values" are also local within a btree-index; but UUIDs generated in
"near times" are usually spread around the possible bitranges.
(example for sequence of serials: 1 - 2 - 3 - 4 - 5 - 6
example for sequence of UUIDs : 1 - 999919281921843191 - 782 -
18291831912318971231)
that is supposed to affect the locality of the index, and from that also the
performance of the system.
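
The locality point can be illustrated with a small Python sketch. It
uses a sorted list as a stand-in for the key order of a btree index and
counts how many new keys land at the right-hand end. This is an
assumption-laden toy, not a measurement of PostgreSQL; the function
name count_tail_inserts is hypothetical.

```python
import bisect
import random


def count_tail_inserts(keys):
    """Count keys that are inserted at the right-hand end of a sorted list."""
    ordered = []
    tail = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos == len(ordered):
            tail += 1                    # new key is larger than all earlier keys
        ordered.insert(pos, key)
    return tail


n = 2000
rng = random.Random(0)                   # fixed seed so the run is repeatable
serial_keys = list(range(1, n + 1))
random_keys = [rng.getrandbits(128) for _ in range(n)]

print("serial keys appended at the end:", count_tail_inserts(serial_keys))
print("random keys appended at the end:", count_tail_inserts(random_keys))
```

Every serial key goes to the same end of the ordering, while random
128-bit keys are scattered across it, which is the effect on index
locality described above.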

I do not know how valid this information is, so I am asking you for your
feedback, especially since you have put a lot of thought into this UUID
patch. Maybe you already took care of this situation when constructing the
index operators?

Thanks

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all.


From: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
To: Harald Armin Massa <haraldarminmassa(at)gmail(dot)com>
Cc: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-18 09:29:03
Message-ID: 1158571743.19958.40.camel@voyager.truesoftware.net
Lists: pgsql-hackers pgsql-patches

On Mon, 2006-09-18 at 11:11 +0200, Harald Armin Massa wrote:
> Gevik,
> >uniqueness is never a guaranteed. that is according to the RFC docs.
>
> >uniqueness is never a guaranteed in the sense that there is a tiny
> >chance someone of the other side of the planet might generate the
> same
> >guid.
>
> As much as I learned, it is recommended to give information about
> "grade of uniqueness". I think it would be a valuable information,
> which information your UUID-generator takes into account, and what the
> "grade of uniqueness" is.
>
> (I know of the Windows UUID, which takes the MAC-Address of the
> included Ethernet-Card into it's calculation, which may be guaranteed
> to be unique)
>

>
> Some more questions about UUIDs and your patch:
>
> a) compatibility of UUIDs -> I have generated a lot of UUIDs via the
> WIN32 provided function (for the unix-only-people: Windows uses UUIDs
> all around its registry, software IDs and on and on). How unique are
> those UUIDs when mixed with "your" UUIDs ?

new_guid() generates a random guid in the range of 256^256, which is
3.231700607131100730071487668867e+616 (easy to imagine), using PG's
randomizer. I wonder how often someone could actually generate a
duplicate guid in this range. The same goes for the MS version of the
guid. It uses the MAC address and a timestamp, but what happens if by
chance your PC's clock is set back to the past?

>
> b) I read some time ago about the problems with UUIDs as primary keys
> in contrast to serials: serials get produced in ascending order; and
> often data which was produced in one timespan is also connected
> semantically. "near serial values" are also local within a
> btree-index; but UUIDs generated in "near times" are usually spread
> around the possible bitranges.
> (example for sequence of serials: 1 - 2 - 3 - 4 - 5 - 6
> example for sequence of UUIDs : 1 - 999919281921843191 - 782 -
> 18291831912318971231)
> that is supposed to affect the locality of the index, and from that
> also the performance of the system.
>
> I do not know how valid this information is; so I am asking you for
> your feedback; especially since you put a lot of thoughts into this
> UUID patch. Maybe you took allready care of this situation when
> constructing the index operators?

I am running many tests on indexing of the uuid datatype with large
numbers of records, but the performance testing is still limited by
hardware capacity.

Thank you.
>
> Thanks
>
> Harald
>
>
>
>
>
>
> --
> GHUM Harald Massa
> persuadere et programmare
> Harald Armin Massa
> Reinsburgstraße 202b
> 70197 Stuttgart
> 0173/9409607
> -
> Let's set so double the killer delete select all.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Patch for UUID datatype (beta)
Date: 2006-09-18 14:33:22
Message-ID: 27536.1158590002@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> there's a commonly accepted algorithm providing this.

Anyone who thinks UUIDs are guaranteed unique has been drinking too much
of the kool-aid. They're at best probably unique. Some generator
algorithms might make it more probable than others, but you simply
cannot "guarantee" it for UUIDs generated on noncommunicating machines.

One of the big reasons that I'm hesitant to put a UUID generation
function into core is the knowledge that none of them are or can be
perfect ... so people might need different ones depending on local
conditions. I'm inclined to think that a reasonable setup would put
the datatype (with input, output, comparison and indexing support)
into core, but provide a generation function as a contrib module,
making it easily replaceable.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-18 14:47:22
Message-ID: 200609181647.23654.peter_e@gmx.net
Lists: pgsql-hackers pgsql-patches

Am Montag, 18. September 2006 09:21 schrieb Andreas Pflug:
> Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> there's a commonly accepted algorithm providing this.

There are several such algorithms, which is part of the problem. If someone
could sort that out, we might get somewhere.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Patch for UUID datatype (beta)
Date: 2006-09-18 15:14:22
Message-ID: 1158592462.24177.2.camel@voyager.truesoftware.net
Lists: pgsql-hackers pgsql-patches

Completely agreed. I can remove the function from the patch. The
temptation was just too high not to include the new_guid() in the
patch :)

On Mon, 2006-09-18 at 10:33 -0400, Tom Lane wrote:
> Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> > Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> > there's a commonly accepted algorithm providing this.
>
> Anyone who thinks UUIDs are guaranteed unique has been drinking too much
> of the kool-aid. They're at best probably unique. Some generator
> algorithms might make it more probable than others, but you simply
> cannot "guarantee" it for UUIDs generated on noncommunicating machines.
>
> One of the big reasons that I'm hesitant to put a UUID generation
> function into core is the knowledge that none of them are or can be
> perfect ... so people might need different ones depending on local
> conditions. I'm inclined to think that a reasonable setup would put
> the datatype (with input, output, comparison and indexing support)
> into core, but provide a generation function as a contrib module,
> making it easily replaceable.
>
> regards, tom lane
>


From: "Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com>
To: "Gevik Babakhani" <pgdev(at)xs4all(dot)nl>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-18 15:29:34
Message-ID: 7be3f35d0609180829u568ecd38sad599945fc6e1491@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

>
>
> > Anyone who thinks UUIDs are guaranteed unique has been drinking too much
> > of the kool-aid.
>

> Identifier uniqueness considerations:
> This document specifies three algorithms to generate UUIDs: the
> first leverages the unique values of 802 MAC addresses to
> guarantee uniqueness, the second uses pseudo-random number
> generators, and the third uses cryptographic hashing and
> application-provided text strings. As a result, the UUIDs
> generated according to the mechanisms here will be unique from all
> other UUIDs that have been or will be assigned.

That is a quote from ftp://ftp.rfc-editor.org/in-notes/rfc4122.txt

And to quote ITU-T:
"""
If generated according to one of the mechanisms defined in ITU-T Rec. X.667
| ISO/IEC 9834-8 <http://fpweb/ITU-T/studygroups/com17/oid.html>, a UUID is
either guaranteed to be different from all other UUIDs generated before 3603
A.D., or is extremely likely to be different (depending on the mechanism
chosen). The UUID generation algorithm specified in this standard supports
very high allocation rates: 10 million per second per machine if necessary,
so UUIDs can also be used as transaction IDs.
"""

They also talk about a "guaranteed differentness", and as far as I
understand, they are unique as long as the MAC addresses of the network
cards are unique, and fall back to "extremely likely" when there is no
network card present.

I would really like PostgreSQL to include a UUID generation function
crafted along the recommendations in rfc4122 or ISO/IEC 9834-8, so that
those UUIDs have an "ISO/IEC-defined uniqueness" or at least an
"ISO/IEC-defined extreme likelihood of being unique".

As of now there are at least 3 implementations of UUID creation for
PostgreSQL in the wild. As far as I understand, UUIDs created by the
same algorithm are much more likely to be unique relative to each other
than UUIDs created by different algorithms.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all.


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Harald Armin Massa <haraldarminmassa(at)gmail(dot)com>
Cc: Gevik Babakhani <pgdev(at)xs4all(dot)nl>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-18 15:53:54
Message-ID: 20060918155354.GG8796@svana.org
Lists: pgsql-hackers pgsql-patches

On Mon, Sep 18, 2006 at 05:29:34PM +0200, Harald Armin Massa wrote:
> I would really like PostgreSQL to include an uuid-generation function
> crafted along the recommendations in rfc4122 or ISO/IEC 9834-8; so those
> UUIDs have a "ISO/IEC-defined uniqueness" or at least a "ISO/IEC-defined
> extreme likelyness to be unique"

The code to get things like the MAC address is going to be a pile of
very OS specific code, which I really don't think is in the realm of
code postgresql wants to maintain. The easier and better solution is to
include a module in contrib (at best) that calls some standard
cross-platform library to do the job.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com>
Cc: "Gevik Babakhani" <pgdev(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-18 16:15:51
Message-ID: 29546.1158596151@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com> writes:
> They also talk about a "guaranteed differentness" - and as much as I
> understand, they are Unique as long as the MAC-Adresses of the Network-Cards
> are unique, and fall back to "extremly likely" when there is no network card
> present.

MAC addresses are not guaranteed unique (heck, on Apple machines they're
user-assignable, and I think you can change 'em on Linux too). Another
unrelated-to-reality assumption in the above claim is that the local
system clock is always accurate (is never, say, set backwards).

You can have a reasonably strong probability that UUIDs generated per spec
within a single well-run network are unique, but that's about as far as
I'd care to believe it.

regards, tom lane


From: mark(at)mark(dot)mielke(dot)cc
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-18 16:23:16
Message-ID: 20060918162316.GB31239@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Mon, Sep 18, 2006 at 10:33:22AM -0400, Tom Lane wrote:
> Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> > Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> > there's a commonly accepted algorithm providing this.
> Anyone who thinks UUIDs are guaranteed unique has been drinking too much
> of the kool-aid. They're at best probably unique. Some generator
> algorithms might make it more probable than others, but you simply
> cannot "guarantee" it for UUIDs generated on noncommunicating machines.

The versions that include a MAC address, time, and serial number for
the machine come pretty close, presuming that the user has not
overwritten the MAC address with something else. It's unique at
manufacturing time. If the generation is performed from a library
with the same state, on the same machine, on the off chance that you
do request multiple generations at the same exact time (from my
experience, this is already unlikely) the serial number should be
bumped for that time.

So yeah - if you set your MAC address, or if your machine time is ever
set back, or if you assume a serial number of 0 each time (generation
routine isn't shared among processes on the system), you can get overlap.
All of these can be controlled, making it possible to eliminate overlap.
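
The MAC address, time, and serial number combination can be sketched
with Python's standard uuid module. RFC 4122 calls the serial number
the "clock sequence". The function make_v1 and the sample node and
timestamp values are hypothetical, chosen only to show when two
generated values coincide.

```python
import uuid


def make_v1(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Pack a 60-bit timestamp, 14-bit clock sequence and 48-bit node."""
    fields = (
        timestamp & 0xFFFFFFFF,          # time_low
        (timestamp >> 32) & 0xFFFF,      # time_mid
        (timestamp >> 48) & 0x0FFF,      # time_hi (version bits set below)
        (clock_seq >> 8) & 0x3F,         # clock_seq_hi (variant bits set below)
        clock_seq & 0xFF,                # clock_seq_low
        node & 0xFFFFFFFFFFFF,           # node, for example a MAC address
    )
    # version=1 makes the constructor set the version and variant bits.
    return uuid.UUID(fields=fields, version=1)


node = 0x001122334455                    # made-up MAC address
ts = 0x1B21DD213814000                   # made-up count of 100 ns intervals

same_time_bumped = {make_v1(ts, seq, node) for seq in range(3)}
same_time_not_bumped = {make_v1(ts, 0, node) for _ in range(3)}
print(len(same_time_bumped), len(same_time_not_bumped))
```

With the same node and timestamp, bumping the clock sequence keeps the
values distinct, while assuming a serial number of 0 each time yields
the same UUID again.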

> One of the big reasons that I'm hesitant to put a UUID generation
> function into core is the knowledge that none of them are or can be
> perfect ... so people might need different ones depending on local
> conditions. I'm inclined to think that a reasonable setup would put
> the datatype (with input, output, comparison and indexing support)
> into core, but provide a generation function as a contrib module,
> making it easily replaceable.

I have UUID generation in core in my current implementation. In the
last year that I've been using it, I have already chosen twice to
generate UUIDs from my calling program. I find it faster, as it avoids
having to call out to PostgreSQL twice: once to generate the UUID, and
once to insert the row using it. I have no strong need for UUID
generation to be in core, and believe there are strong reasons not to
put it there. Performance is better when not in core. Portability of
PostgreSQL is better when not in core. Ability to control how the UUID
is defined is better when not in core.

The only thing an in-core version provides is convenience for those
that do not have easy access to a UUID generation library. I don't
care for that convenience.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com
Neighbourhood Coder, Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
To: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-18 21:00:22
Message-ID: 20060918210021.GE47167@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

If you're going to yank it, please at least include a generator in
contrib.

Personally, I'd like to see at least some kind of generator in core,
with appropriate info/disclaimers in the docs. A simple random-number
generator is probably the best way to go in that regard. I think that
most people know that UUID generation isn't 100.00000% perfect.

BTW, at a former company we used SHA1s to identify files that had been
uploaded. We were wondering about the odds of 2 different files hashing
to the same value and found some statistical comparisons of probabilities.
I don't recall the details, but the odds of duplicating a SHA1 (1 in
2^160) are so insanely small that it's hard to find anything in the
physical world that compares. To duplicate random 256^256 numbers you'd
probably have to search until the heat-death of the universe.

On Mon, Sep 18, 2006 at 05:14:22PM +0200, Gevik Babakhani wrote:
> Completely agreed. I can remove the function from the patch. The
> temptation was just too high not to include the new_guid() in the
> patch :)
>
>
> On Mon, 2006-09-18 at 10:33 -0400, Tom Lane wrote:
> > Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> > > Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> > > there's a commonly accepted algorithm providing this.
> >
> > Anyone who thinks UUIDs are guaranteed unique has been drinking too much
> > of the kool-aid. They're at best probably unique. Some generator
> > algorithms might make it more probable than others, but you simply
> > cannot "guarantee" it for UUIDs generated on noncommunicating machines.
> >
> > One of the big reasons that I'm hesitant to put a UUID generation
> > function into core is the knowledge that none of them are or can be
> > perfect ... so people might need different ones depending on local
> > conditions. I'm inclined to think that a reasonable setup would put
> > the datatype (with input, output, comparison and indexing support)
> > into core, but provide a generation function as a contrib module,
> > making it easily replaceable.
> >
> > regards, tom lane
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

--
Jim Nasby jimn(at)enterprisedb(dot)com
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
Cc: Gevik Babakhani <pgdev(at)xs4all(dot)nl>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-18 21:13:27
Message-ID: 20060918211327.GJ8796@svana.org
Lists: pgsql-hackers pgsql-patches

On Mon, Sep 18, 2006 at 04:00:22PM -0500, Jim C. Nasby wrote:
> BTW, at a former company we used SHA1s to identify files that had been
> uploaded. We were wondering on the odds of 2 different files hashing to
> the same value and found some statistical comparisons of probabilities.
> I don't recall the details, but the odds of duplicating a SHA1 (1 in
> 2^160) are so insanely small that it's hard to find anything in the
> physical world that compares. To duplicate random 256^256 numbers you'd
> probably have to search until the heat-death of the universe.

The birthday paradox gives you about 2^80 (about 10^24) files before a
SHA1 match, which is huge enough as it is. AIUI a UUID is only 128
bits, so that would make 2^64 (about 10^19) random strings before you
get a duplicate. Embed the time in there and the chance becomes
*really* small, because then you have to get it in the same second.
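
These estimates can be checked with the usual birthday approximation.
The sketch below is Python, with hypothetical function names. Note that
16 random bytes give 256^16 = 2^128 possible values, and a V4 UUID
fixes 6 of those bits, leaving 122 random bits.

```python
import math


def draws_for_half_chance(bits: int) -> float:
    """Approximate number of random values before a duplicate is 50% likely."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (bits / 2)


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one duplicate among n random values."""
    return -math.expm1(-n * (n - 1) / (2 * 2.0 ** bits))


assert 256 ** 16 == 2 ** 128             # 16 random bytes

for label, bits in (("SHA1", 160), ("128-bit value", 128), ("V4 UUID", 122)):
    print(f"{label}: about {draws_for_half_chance(bits):.2e} draws")

# Chance of a duplicate among 14 million V4 UUIDs from a good random source.
print(f"{collision_probability(14_000_000, 122):.2e}")
```

The 50% points come out on the order of 2^80 and 2^64 for 160 and 128
bits, in line with the figures above, and the duplicate chance for a
test of 14 million random V4 values is vanishingly small.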

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-18 21:17:50
Message-ID: 20060918211750.GF47167@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

On Mon, Sep 18, 2006 at 12:23:16PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> I have UUID generation in core in my current implementation. In the
> last year that I've been using it, I have already chosen twice to
> generate UUIDs from my calling program. I find it faster, as it avoids
> have to call out to PostgreSQL twice. Once to generate the UUID, and
> once to insert the row using it. I have no strong need for UUID
> generation to be in core, and believe there does exist strong reasons
> not to. Performance is better when not in core. Portability of
> PostgreSQL is better when not in core. Ability to control how UUID is
> defined is better when not in control.

That's kinda short-sighted. You're assuming that the only place you'll
want to generate UUIDs is outside the database. What about a stored
procedure that's adding data to the database? How about populating a
table via a SELECT INTO? There's any number of cases where you'd want to
generate a UUID inside the database.

> The only thing an in-core version provides is convenience for those
> that do not have easy access to a UUID generation library. I don't
> care for that convenience.

It's not about access to a library, it's about how do you get to that
library from inside the database, which may not be very easy.

You may not care for that convenience, but I certainly would.
--
Jim Nasby jimn(at)enterprisedb(dot)com
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
Cc: Gevik Babakhani <pgdev(at)xs4all(dot)nl>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-18 21:38:24
Message-ID: 1158615504.30652.15.camel@dogma.v10.wvs
Lists: pgsql-hackers pgsql-patches

On Mon, 2006-09-18 at 16:00 -0500, Jim C. Nasby wrote:
> BTW, at a former company we used SHA1s to identify files that had been
> uploaded. We were wondering on the odds of 2 different files hashing to
> the same value and found some statistical comparisons of probabilities.
> I don't recall the details, but the odds of duplicating a SHA1 (1 in
> 2^160) are so insanely small that it's hard to find anything in the
> physical world that compares. To duplicate random 256^256 numbers you'd
> probably have to search until the heat-death of the universe.

That assumes you have good random data. Usually there is some kind of
tradeoff between the randomness and the performance. If you
read /dev/random each time, that eliminates some applications that need
to generate UUIDs very quickly. If you use pseudorandom data, you are
vulnerable in the case a clock is set back or the data repeats.
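
The pseudorandom risk can be shown with a short Python sketch. It
assumes, purely for illustration, a generator that seeds a PRNG from
the clock reading; the function names and the clock value are
hypothetical.

```python
import random
import secrets


def pseudo_random_id(seed: int) -> int:
    """128-bit value from a PRNG that is seeded with the given value."""
    return random.Random(seed).getrandbits(128)


def os_random_id() -> int:
    """128-bit value from the operating system's entropy source."""
    return int.from_bytes(secrets.token_bytes(16), "big")


clock_reading = 1158615504               # made-up clock value used as the seed
first = pseudo_random_id(clock_reading)
after_clock_set_back = pseudo_random_id(clock_reading)

print("seeded values repeat:", first == after_clock_set_back)
print("OS values repeat:", os_random_id() == os_random_id())
```

A PRNG that sees the same seed again replays the same output, which is
the repeat case described above. The OS source avoids that, at the
cost mentioned for applications that need UUIDs very quickly.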

Regards,
Jeff Davis


From: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
To: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Patch for UUID datatype (beta)
Date: 2006-09-18 22:27:30
Message-ID: 1158618450.22265.7.camel@voyager.truesoftware.net
Lists: pgsql-hackers pgsql-patches

If you have trouble with duplicate OIDs, please use patch-0.2 for
testing. I have changed the OIDs to the 5000 range.

You can download it from:
http://www.truesoftware.net/pgsql/uuid/patch-0.2/

On Mon, 2006-09-18 at 01:00 +0200, Gevik Babakhani wrote:
> Folks,
>
> The following patch implements the UUID datatype. I would like to send
> this beta patch to see if I still am on the right track. Please send
> your comments.
>
> Description of UUID:
>
> - The type is called uuid.
> - btree and hash indexes are supported.
> - uuid array is supported.
> - uuid text i/o is supported.
> - uuid binary i/o is supported.
> - uuid_to_text and text_to_uuid casting is supported.
> - uuid_to_varchar and varchar_to_uuid casting is supported.
> - the < <= = >= > <> operators are supported. Please note that some of
> these operators mathematically have no meaning and are only good for
> sorting.
>
> - new_guid() function is supported. This function is based on V4 random
> uuid value. It generated 16 random bytes with uuid 'variant' and
> 'version'. It is not guaranteed to produce unique values according to
> the docs but I have inserted 6 million records and it did not create any
> duplicates :)
>
> - the uuid datatype supports 3 input formats:
> 1. "00000000-0000-0000-0000-000000000000"
> 2. "00000000000000000000000000000000"
> 3. "{00000000-0000-0000-0000-000000000000}"
>
> - the uuid datatype supports the output format defined by the RFC:
> "00000000-0000-0000-0000-000000000000"
>
>
> Areas yet in development and testing:
>
> - uuid array indexing.
> - testing with joins (merge,hash,gin)
> - new_guid() fail proof testing
> - performance testing
> - testing with internal storage and compression.
> - regression test addition
> - proper documentation
> - overall sanity testing/checking
>
> Please note that I consider this a beta patch.
> You can download it from:
> http://www.truesoftware.net/pgsql/uuid/patch-0.1/
>
>
> Regards,
> Gevik.
>
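
A minimal sketch of what the quoted description of new_guid() amounts to:
take 16 random bytes, then overwrite the 'version' and 'variant' bits. This
is an illustration in Python, not the code from the patch; the function name
new_random_uuid is hypothetical. The last lines check that the three input
spellings listed above parse to the same value.

```python
import os
import uuid


def new_random_uuid() -> uuid.UUID:
    """Build a version 4 UUID by hand from 16 random bytes."""
    raw = bytearray(os.urandom(16))
    # Version: the high nibble of byte 6 becomes 0100 (version 4).
    raw[6] = (raw[6] & 0x0F) | 0x40
    # Variant: the two high bits of byte 8 become 10 (RFC 4122 variant).
    raw[8] = (raw[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(raw))


u = new_random_uuid()
canonical = str(u)  # 8-4-4-4-12 hex digits

# Hyphenated, bare hex, and brace-wrapped spellings all parse to the same value.
assert uuid.UUID(canonical) == u
assert uuid.UUID(canonical.replace("-", "")) == u
assert uuid.UUID("{" + canonical + "}") == u
```

That leaves 122 of the 128 bits random, which is why uniqueness is very
likely rather than guaranteed.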


From: mark(at)mark(dot)mielke(dot)cc
To: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-18 23:45:07
Message-ID: 20060918234507.GA16056@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Mon, Sep 18, 2006 at 04:17:50PM -0500, Jim C. Nasby wrote:
> On Mon, Sep 18, 2006 at 12:23:16PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> > I have UUID generation in core in my current implementation. In the
> > last year that I've been using it, I have already chosen twice to
> > > generate UUIDs from my calling program. I find it faster, as it avoids
> > > having to call out to PostgreSQL twice: once to generate the UUID, and
> > > once to insert the row using it. I have no strong need for UUID
> > > generation to be in core, and believe there do exist strong reasons
> > > not to. Performance is better when not in core. Portability of
> > > PostgreSQL is better when not in core. Ability to control how UUID is
> > > defined is better when not in core.
> That's kinda short-sighted. You're assuming that the only place you'll
> want to generate UUIDs is outside the database. What about a stored
> procedure that's adding data to the database? How about populating a
> table via a SELECT INTO? There's any number of cases where you'd want to
> generate a UUID inside the database.

contrib module.

> > The only thing an in-core version provides is convenience for those
> > that do not have easy access to a UUID generation library. I don't
> > care for that convenience.
> It's not about access to a library, it's about how do you get to that
> library from inside the database, which may not be very easy.
> You may not care for that convenience, but I certainly would.

Then load the contrib module. I do both. I'd happily reduce my contrib
module to be based upon a built-in UUID type within PostgreSQL, providing
the necessary UUID generation routines.

I would not use a 100% random number generator for a UUID value as was
suggested. I prefer inserting the MAC address and the time, to at
least allow me to control if a collision is possible. This is not easy
to do using a few lines of C code. I'd rather have a UUID type in core
with no generation routine, than no UUID type in core because the code
is too complicated to maintain, or not portable enough.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-19 13:20:13
Message-ID: 20060919132013.GT47167@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

On Mon, Sep 18, 2006 at 07:45:07PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> I would not use a 100% random number generator for a UUID value as was
> suggested. I prefer inserting the MAC address and the time, to at
> least allow me to control if a collision is possible. This is not easy
> to do using a few lines of C code. I'd rather have a UUID type in core
> with no generation routine, than no UUID type in core because the code
> is too complicated to maintain, or not portable enough.

As others have mentioned, using MAC address doesn't remove the
possibility of a collision.

Maybe a good compromise that would allow a generator function to go into
the backend would be to combine the current time with a random number.
That will ensure that you won't get a dupe, so long as your clock never
runs backwards.
--
Jim Nasby jimn(at)enterprisedb(dot)com
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
To: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
Cc: mark(at)mark(dot)mielke(dot)cc, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-19 13:35:55
Message-ID: 1158672955.10757.4.camel@voyager.truesoftware.net
Lists: pgsql-hackers pgsql-patches

> As others have mentioned, using MAC address doesn't remove the
> possibility of a collision.
>
> Maybe a good compromise that would allow a generator function to go into
> the backend would be to combine the current time with a random number.
> That will ensure that you won't get a dupe, so long as your clock never
> runs backwards.

I think that is a reasonable solution. I just wonder if there is a
cross-platform way to get the MAC address on all the OSes we support.


From: mark(at)mark(dot)mielke(dot)cc
To: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-19 13:51:23
Message-ID: 20060919135123.GA2422@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Tue, Sep 19, 2006 at 08:20:13AM -0500, Jim C. Nasby wrote:
> On Mon, Sep 18, 2006 at 07:45:07PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> > I would not use a 100% random number generator for a UUID value as was
> > suggested. I prefer inserting the MAC address and the time, to at
> > least allow me to control if a collision is possible. This is not easy
> > to do using a few lines of C code. I'd rather have a UUID type in core
> > with no generation routine, than no UUID type in core because the code
> > is too complicated to maintain, or not portable enough.
> As others have mentioned, using MAC address doesn't remove the
> possibility of a collision.

It does, as I control the MAC address. I can choose not to overwrite it.
I can choose to ensure that any cases where it is overwritten, it is
overwritten with a unique value. Random number does not provide this
level of control.

> Maybe a good compromise that would allow a generator function to go into
> the backend would be to combine the current time with a random number.
> That will ensure that you won't get a dupe, so long as your clock never
> runs backwards.

Which standard UUID generation function would you be thinking of?
Inventing a new one doesn't seem sensible. I'll have to read over the
versions again...

Cheers,
mark



From: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
To: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
Cc: mark(at)mark(dot)mielke(dot)cc, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-19 13:53:05
Message-ID: 20060919135305.GV47167@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

On Tue, Sep 19, 2006 at 03:35:55PM +0200, Gevik Babakhani wrote:
> > As others have mentioned, using MAC address doesn't remove the
> > possibility of a collision.
> >
> > Maybe a good compromise that would allow a generator function to go into
> > the backend would be to combine the current time with a random number.
> > That will ensure that you won't get a dupe, so long as your clock never
> > runs backwards.
>
> I think that is a reasonable solution. I just wonder if there is a
> cross-platform way to get the MAC address on all the OSes we support.

Well... how much OS-specific code do you want? :)

Another (not as good) possibility would be to use the IP address (along
with time and a random number).
--
Jim Nasby jimn(at)enterprisedb(dot)com
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: mark(at)mark(dot)mielke(dot)cc
Cc: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-19 14:11:39
Message-ID: 450FFA9B.9050203@dunslane.net
Lists: pgsql-hackers pgsql-patches


[-patches trimmed from list]

mark(at)mark(dot)mielke(dot)cc wrote:
>> As others have mentioned, using MAC address doesn't remove the
>> possibility of a collision.
>>
>
> It does, as I control the MAC address. I can choose not to overwrite it.
> I can choose to ensure that any cases where it is overwritten, it is
> overwritten with a unique value.
>

How do you know somebody else isn't using that MAC value?

cheers

andrew


From: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-19 14:16:31
Message-ID: 20060919141631.GW47167@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

On Tue, Sep 19, 2006 at 09:51:23AM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> On Tue, Sep 19, 2006 at 08:20:13AM -0500, Jim C. Nasby wrote:
> > On Mon, Sep 18, 2006 at 07:45:07PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> > > I would not use a 100% random number generator for a UUID value as was
> > > suggested. I prefer inserting the MAC address and the time, to at
> > > least allow me to control if a collision is possible. This is not easy
> > > to do using a few lines of C code. I'd rather have a UUID type in core
> > > with no generation routine, than no UUID type in core because the code
> > > is too complicated to maintain, or not portable enough.
> > As others have mentioned, using MAC address doesn't remove the
> > possibility of a collision.
>
> It does, as I control the MAC address. I can choose not to overwrite it.
> I can choose to ensure that any cases where it is overwritten, it is
> overwritten with a unique value. Random number does not provide this
> level of control.
>
> > Maybe a good compromise that would allow a generator function to go into
> > the backend would be to combine the current time with a random number.
> > That will ensure that you won't get a dupe, so long as your clock never
> > runs backwards.
>
> Which standard UUID generation function would you be thinking of?
> Inventing a new one doesn't seem sensible. I'll have to read over the
> versions again...

I don't think it exists, but I don't see how that's an issue. Let's look
at an extreme case: take the amount of random entropy used for the
random-only generation method. Append that to the current time in UTC,
and hash it. Thanks to the time component, you've now greatly reduced
the odds of a duplicate, probably by many orders of magnitude.
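
A rough sketch of that extreme case, in Python purely for illustration. The
function name time_plus_random_uuid, the choice of SHA-1, and the truncation
to 128 bits are all assumptions; as said above, this is not one of the
standard UUID versions, so no version or variant bits are set.

```python
import hashlib
import os
import time
import uuid


def time_plus_random_uuid() -> uuid.UUID:
    """Hash the current UTC time together with 16 random bytes."""
    now_ns = time.time_ns()   # nanoseconds since the Unix epoch (UTC)
    entropy = os.urandom(16)  # same amount of randomness as a random-only value
    digest = hashlib.sha1(now_ns.to_bytes(8, "big") + entropy).digest()
    # SHA-1 yields 20 bytes; keep the first 16 to get a 128-bit value.
    return uuid.UUID(bytes=digest[:16])


first = time_plus_random_uuid()
second = time_plus_random_uuid()
```

Two values generated at different times differ in their hash input even if
the random parts happened to match, which is the property being argued for.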

Ultimately, I'm OK with a generator that's only in contrib, provided
that there's at least one that will work on all OSes.
--
Jim Nasby jimn(at)enterprisedb(dot)com
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: mark(at)mark(dot)mielke(dot)cc
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-19 21:12:13
Message-ID: 20060919211213.GA14890@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Tue, Sep 19, 2006 at 10:11:39AM -0400, Andrew Dunstan wrote:
> mark(at)mark(dot)mielke(dot)cc wrote:
> >>As others have mentioned, using MAC address doesn't remove the
> >>possibility of a collision.
> >It does, as I control the MAC address. I can choose not to overwrite it.
> >I can choose to ensure that any cases where it is overwritten, it is
> >overwritten with a unique value.
> How do you know somebody else isn't using that MAC value?

Different UUID forms can be unique within their domain. As long as I
control the MAC address assignment for all of my units, my MAC address
can be guaranteed to be unique across space and time, within the
generous range provided by a UUID. My UUIDs may not be unique in your
database, or in your domain, but they will be unique within mine.

If I use a UUID form based upon the MD5 or SHA-1 of a unique URL, there
is a great chance that it is unique. Better than that of a random number
generator, in that I control the URL.
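
For illustration, the name-based forms can be sketched with Python's
standard uuid module (version 3 uses MD5, version 5 uses SHA-1). The URL
below is a made-up example, not one from this thread.

```python
import uuid

# Hypothetical URL under the generating party's own control.
url = "http://example.org/units/42"

by_sha1 = uuid.uuid5(uuid.NAMESPACE_URL, url)  # version 5: SHA-1 of namespace + name
by_md5 = uuid.uuid3(uuid.NAMESPACE_URL, url)   # version 3: MD5 of namespace + name

# The same URL always maps to the same UUID, so uniqueness of the UUID
# follows from uniqueness of the URL (barring hash collisions).
assert by_sha1 == uuid.uuid5(uuid.NAMESPACE_URL, url)
assert by_sha1 != uuid.uuid5(uuid.NAMESPACE_URL, url + "/other")
```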

I'm not in favour of the random number based UUID forms, as I believe
I am sacrificing control, thereby allowing for generation to result in
non-unique output. Where it is currently impossible for me to generate
the same UUID (I control the MAC address, time, and the generator uses
the clock sequence), using a random number generator turns the
impossibility into a possibility.

If you don't have control over the MAC address, time, or generator,
then yeah - random number generator might suffice.
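
To put a number on that possibility, here is a back-of-the-envelope
estimate using the usual birthday approximation, assuming the 122 random
bits of a version 4 UUID and a perfectly uniform random source. The function
name collision_probability is illustrative; the 6 million figure is the
insert count reported for the patch earlier in the thread.

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among n random values."""
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))
    exponent = n * (n - 1) / 2 ** (random_bits + 1)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


p_six_million = collision_probability(6_000_000)
```

Under those assumptions the probability is tiny but not zero, which is
exactly the distinction being drawn here.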

Cheers,
mark



From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-19 22:02:40
Message-ID: slrneh0q80.2ea3.andrew+nonews@atlantis.supernews.net
Lists: pgsql-hackers pgsql-patches

On 2006-09-19, mark(at)mark(dot)mielke(dot)cc <mark(at)mark(dot)mielke(dot)cc> wrote:
> Different UUID forms can be unique within their domain. As long as I
> control the MAC address assignment for all of my units, my MAC address
> can be guaranteed to be unique across space and time,

You do not know (and can never know) that no-one else is using the same
MAC address. Anyone with substantial experience in networking will tell
you that the supposed "uniqueness" of manufacturer-assigned MACs is often
a myth, with (in extreme cases) entire batches of NICs being manufactured
with the same assigned MAC.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


From: mark(at)mark(dot)mielke(dot)cc
To: andrew(at)supernews(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-20 01:39:13
Message-ID: 20060920013913.GA22242@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Tue, Sep 19, 2006 at 10:02:40PM -0000, Andrew - Supernews wrote:
> On 2006-09-19, mark(at)mark(dot)mielke(dot)cc <mark(at)mark(dot)mielke(dot)cc> wrote:
> > Different UUID forms can be unique within their domain. As long as I
> > control the MAC address assignment for all of my units, my MAC address
> > can be guaranteed to be unique across space and time,
> You do not know (and can never know) that no-one else is using the same
> MAC address. Anyone with substantial experience in networking will tell
> you that the supposed "uniqueness" of manufacturer-assigned MACs is often
> a myth, with (in extreme cases) entire batches of NICs being manufactured
> with the same assigned MAC.

I have the impression I'm not being heard.

*I* control the MAC address assignment for all of *MY* units.

Clear? :-)

Cheers,
mark



From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: mark(at)mark(dot)mielke(dot)cc
Cc: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-20 03:21:51
Message-ID: 20060920032151.GF31466@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

mark(at)mark(dot)mielke(dot)cc wrote:
> On Tue, Sep 19, 2006 at 08:20:13AM -0500, Jim C. Nasby wrote:
> > On Mon, Sep 18, 2006 at 07:45:07PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> > > I would not use a 100% random number generator for a UUID value as was
> > > suggested. I prefer inserting the MAC address and the time, to at
> > > least allow me to control if a collision is possible. This is not easy
> > > to do using a few lines of C code. I'd rather have a UUID type in core
> > > with no generation routine, than no UUID type in core because the code
> > > is too complicated to maintain, or not portable enough.
> > As others have mentioned, using MAC address doesn't remove the
> > possibility of a collision.
>
> It does, as I control the MAC address.

What happens if you have two postmasters running on the same machine?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: mark(at)mark(dot)mielke(dot)cc
To: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gevik Babakhani <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-20 04:04:27
Message-ID: 20060920040427.GA25866@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Tue, Sep 19, 2006 at 11:21:51PM -0400, Alvaro Herrera wrote:
> mark(at)mark(dot)mielke(dot)cc wrote:
> > On Tue, Sep 19, 2006 at 08:20:13AM -0500, Jim C. Nasby wrote:
> > > On Mon, Sep 18, 2006 at 07:45:07PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> > > > I would not use a 100% random number generator for a UUID value as was
> > > > suggested. I prefer inserting the MAC address and the time, to at
> > > > least allow me to control if a collision is possible. This is not easy
> > > > to do using a few lines of C code. I'd rather have a UUID type in core
> > > > with no generation routine, than no UUID type in core because the code
> > > > is too complicated to maintain, or not portable enough.
> > > As others have mentioned, using MAC address doesn't remove the
> > > possibility of a collision.
> > It does, as I control the MAC address.
> What happens if you have two postmasters running on the same machine?

Could be bad things. :-)

For the case of two postmaster processes, I assume you mean two
different databases? If you never intend to merge the data between the
two databases, the problem is irrelevant. There is a much greater
chance that any UUID form is more unique, or can be guaranteed to be
unique, within a single application instance, than across all
application instances in existence. If you do intend to merge the
data, you may have a problem.

If I have two connections to PostgreSQL - would the plpgsql procedures
be executed from two different processes? With an in-core generation
routine, I think it is possible for it to collide unless inter-process
synchronization is used (unlikely) to ensure generation of unique
time/sequence combinations each time. I use this right now (mostly),
but as I've mentioned, it isn't my favourite. It's convenient. I don't
believe it provides the sort of guarantees that a SERIAL provides.

A model that intended to try and guarantee uniqueness would provide a
UUID generation service for the entire host, that was not specific to
any application, or database, possibly accessible via the loopback
address. It would ensure that at any given time, either the time is
new, or the sequence is new for the time. If computer time ever went
backwards, it could keep the last time issued persistent, and
increment from this point forward through the clock sequence values
until real time catches up. An alternative would be along the lines of
a /dev/uuid device, that like /dev/random, would be responsible for
outputting unique uuid values for the system. Who does this? Probably
nobody. I'm tempted to implement it, though, for my uses. :-)
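
The core rule of such a service (either the time is new, or the sequence is
new for the time) could be sketched as below. This is an in-memory Python
illustration only: the class name TimeSequenceIssuer is hypothetical, the
14-bit limit mirrors the UUID clock sequence field, and a real service would
also have to persist the last issued time across restarts.

```python
import threading
import time
from typing import Callable, Tuple


class TimeSequenceIssuer:
    """Hands out (timestamp, sequence) pairs that never repeat."""

    MAX_SEQUENCE = 0x3FFF  # 14 bits, the size of the UUID clock sequence field

    def __init__(self, clock: Callable[[], int] = time.time_ns) -> None:
        self._clock = clock
        self._lock = threading.Lock()  # one issuer shared by all callers
        self._last_time = -1
        self._sequence = 0

    def next_pair(self) -> Tuple[int, int]:
        with self._lock:
            now = self._clock()
            if now > self._last_time:
                # The time is new: start a fresh sequence for it.
                self._last_time = now
                self._sequence = 0
            else:
                # Clock stalled or went backwards: keep the last issued time
                # and step through sequence values until real time catches up.
                if self._sequence == self.MAX_SEQUENCE:
                    raise RuntimeError("sequence exhausted for this timestamp")
                self._sequence += 1
            return self._last_time, self._sequence


# Simulated clock that runs backwards on its third reading.
readings = iter([100, 101, 99, 102])
issuer = TimeSequenceIssuer(clock=lambda: next(readings))
pairs = [issuer.next_pair() for _ in range(4)]
# pairs == [(100, 0), (101, 0), (101, 1), (102, 0)]
```

The lock stands in for the inter-process synchronization mentioned above; a
host-wide service would get the same effect by being the only issuer.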

Cheers,
mark



From: "Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com>
To: "mark(at)mark(dot)mielke(dot)cc" <mark(at)mark(dot)mielke(dot)cc>
Cc: "Jim C(dot) Nasby" <jimn(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, "Gevik Babakhani" <pgdev(at)xs4all(dot)nl>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Patch for UUID datatype (beta)
Date: 2006-09-20 07:02:56
Message-ID: 7be3f35d0609200002n19af5287r642614ab572f7352@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

Mark,

>
> A model that intended to try and guarantee uniqueness would provide a
> UUID generation service for the entire host, that was not specific to
> any application, or database, possibly accessible via the loopback
> address. It would ensure that at any given time, either the time is
> new, or the sequence is new for the time. If computer time ever went
> backwards, it could keep the last time issued persistent, and
> increment from this point forward through the clock sequence values
> until real time catches up. An alternative would be along the lines of
> a /dev/uuid device, that like /dev/random, would be responsible for
> outputting unique uuid values for the system. Who does this? Probably
> nobody. I'm tempted to implement it, though, for my uses. :-)
>

That is an excellent summary. There is just one wrong assumption in it:

>Probably nobody.

Within win32 there is an API call which provides you with a GUID/UUID
with, to my knowledge, exactly the features you are describing. win32 is
installed on some computers. So for PostgreSQL on win32, the new_guid() you
describe in detail would be quite simple to implement: a call to
CoCreateGuid.

The challenging part is: I use PostgreSQL in a mixed environment. And Linux,
for example, does not provide CoCreateGuid. That's why I am voting to have it
in PostgreSQL :)

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all.


From: Gregory Stark <gsstark(at)mit(dot)edu>
To: mark(at)mark(dot)mielke(dot)cc
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-20 09:04:00
Message-ID: 87k63y9af3.fsf@stark.xeocode.com
Lists: pgsql-hackers pgsql-patches


mark(at)mark(dot)mielke(dot)cc writes:

> I have the impression I'm not being heard.
>
> *I* control the MAC address assignment for all of *MY* units.

No, you're missing the point. How does that help *me* avoid collisions with
your UUIDs? UUIDs are supposed to be unique period, not just unique on your
database.

If all you want is unique number generation in your database then you can just
use sequences and they'll take a lot less space and perform much better.
(16-byte foreign keys throughout the whole database, *shudder*)

The reason to use UUIDs is when you want to have unique identifiers that you
can send outside the database and know they won't conflict with other unique
identifiers generated elsewhere.

Really this whole debate only reinforces the point that there isn't a single
way of doing UUID generation. There are multiple libraries out there each with
pros and cons. It makes more sense to have multiple pgfoundry UUID generating
modules.

--
greg


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: mark(at)mark(dot)mielke(dot)cc, andrew(at)supernews(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-20 09:59:38
Message-ID: Pine.BSO.4.63.0609200239580.29136@resin2.csoft.net
Lists: pgsql-hackers pgsql-patches

On Wed, 20 Sep 2006, Gregory Stark wrote:

>
> mark(at)mark(dot)mielke(dot)cc writes:
>
> > I have the impression I'm not being heard.
> >
> > *I* control the MAC address assignment for all of *MY* units.
>
> No, you're missing the point. How does that help *me* avoid collisions with
> your UUIDs? UUIDs are supposed to be unique period, not just unique on your
> database.

I must jump in with my amusement at this whole conversation. I just
looked up the standard (http://www.ietf.org/rfc/rfc4122.txt) and it
includes this abstract:

Abstract

This specification defines a Uniform Resource Name namespace for
UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
Unique IDentifier). A UUID is 128 bits long, and can guarantee
uniqueness across space and time. UUIDs were originally used in the
Apollo Network Computing System and later in the Open Software
Foundation's (OSF) Distributed Computing Environment (DCE), and then
in Microsoft Windows platforms.

It then goes on to detail multiple versions of them which are generated in
various ways. But they are all called UUID, and thus should all be
UNIVERSALLY unique, and the statement "can guarantee uniqueness across
space and time" should apply equally to all versions, as it is an absolute
statement. So perhaps the ietf have been drinking the kool-aid (or
whatever), or perhaps you plan to use your databases in multiple
universes. But the standard seems to make the whole discussion moot by
guaranteeing all UUIDs to be unique across space and time. Or am I
misreading that?

So I guess I am just ROFL at the fact that people can't seem to get their
definition of "universe" quite straight. Either the UUID is misnamed, or
some people here are vastly underestimating the scope of the universe, or
perhaps both. Or perhaps it's just that it's 3am and this thing seems
extraordinarily funny to me right now ;)

--
Menu, n.:
A list of dishes which the restaurant has just run out of.


From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-20 12:11:32
Message-ID: 45112FF4.9020408@logix-tt.com
Lists: pgsql-hackers pgsql-patches

Hi, Mark,

mark(at)mark(dot)mielke(dot)cc wrote:

> The versions that include a MAC address, time, and serial number for
> the machine come pretty close, presuming that the user has not
> overwritten the MAC address with something else. It's unique at
> manufacturing time.

Not even that is guaranteed. I remember that, about 8 years ago, a fellow
student and I each bought a cheap "network starter kit" containing two
network cards and a crossover cable.

Now, it turned out that the first cards in both packages had the same
MAC address, and the second cards as well, so we could not network
together using proper cabling and a hub.

Luckily, the MAC address was flashable in an EEPROM, and so my friend
"fixed" his cards with the addresses from two 10 MBit coax cards we had
abandoned in favour of the new twisted pair network.

AFAIR, in the end it turned out that the whole batch of cards was
manufactured this way. Officially, it was a bug in the EEPROM content
generating software, but there were rumours that the manufacturer wanted
to avoid paying the registration fees for the MAC address ranges...

Just gettin' off topic,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


From: mark(at)mark(dot)mielke(dot)cc
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-20 13:17:52
Message-ID: 20060920131752.GA2410@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Wed, Sep 20, 2006 at 05:04:00AM -0400, Gregory Stark wrote:
> mark(at)mark(dot)mielke(dot)cc writes:
> > I have the impression I'm not being heard.
> > *I* control the MAC address assignment for all of *MY* units.
> No, you're missing the point. How does that help *me* avoid collisions with
> your UUIDs? UUIDs are supposed to be unique period, not just unique on your
> database.

As you already said, they can't be. I don't see how random is better than
unique by intent (MAC address).

> If all you want is unique number generation in your database then
> you can just use sequences and they'll take a lot less space and
> perform much better. (16-byte foreign keys throughout the whole
> database, *shudder*)

I want unique number generation from several separate databases, and
I don't like the idea of maintaining complicated SERIAL ranges, or using
one of the increment by X, offset Y techniques. Too hard.
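
For reference, the "increment by X, offset Y" technique looks roughly like
this: each database gets its own offset and all share the same increment, so
the ranges interleave without overlapping. A Python sketch with made-up
values for three databases; the helper name interleaved_ids is hypothetical.

```python
from itertools import count, islice


def interleaved_ids(offset: int, increment: int):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    return count(start=offset, step=increment)


# Three databases share increment 3 and use offsets 1, 2, and 3.
db_a = list(islice(interleaved_ids(1, 3), 4))  # [1, 4, 7, 10]
db_b = list(islice(interleaved_ids(2, 3), 4))  # [2, 5, 8, 11]

# No value is ever issued by both databases.
assert not set(db_a) & set(db_b)
```

The catch is that the increment fixes the maximum number of databases up
front, which is part of what makes the scheme awkward to maintain.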

> The reason to use UUIDs is when you want to have unique identifiers that you
> can send outside the database and know they won't conflict with other unique
> identifiers generated elsewhere.

If you don't control the factors that influence the UUID generation, this
is a cross your fingers type of merge. Random numbers might collide.
Shared MAC address might collide. Not controlling the time source might
collide. Although it will probably work, if I know my domain, if I know
what will need to be merged, I can ensure that they can be merged.
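
To put a rough number on "random numbers might collide", the birthday approximation can be evaluated directly. This is a sketch, assuming values with 122 random bits (a version 4 UUID: 128 bits minus 4 version bits and 2 variant bits) and a perfectly uniform random source; collision_probability is a name invented for the example.

```python
import math

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance that n uniformly random values contain a duplicate."""
    space = 2.0 ** random_bits
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * space))
    return -math.expm1(-n * (n - 1) / (2.0 * space))

# Example: chance of any duplicate among one billion random values.
p_billion = collision_probability(10**9)
```

The estimate says nothing about a weak or badly seeded random generator, which is the practical risk being debated here.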

> Really this whole debate only reinforces the point that there isn't
> a single way of doing UUID generation. There are multiple libraries
> out there each with pros and cons. It makes more sense to have
> multiple pgfoundry UUID generating modules.

Exactly. If I led you to the impression that I want UUIDv1 in core, this
was not the intent. What I intend to say is that different people want
different implementations, and one of the most useful versions, in my
opinion, is difficult to implement portably.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com
Neighbourhood Coder
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: Tom Dunstan <pgsql(at)tomd(dot)cc>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Gregory Stark <gsstark(at)mit(dot)edu>, andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-20 14:06:26
Message-ID: 45114AE2.2000002@tomd.cc
Lists: pgsql-hackers pgsql-patches

mark(at)mark(dot)mielke(dot)cc wrote:
>> Really this whole debate only reinforces the point that there isn't
>> a single way of doing UUID generation. There are multiple libraries
>> out there each with pros and cons. It makes more sense to have
>> multiple pgfoundry UUID generating modules.
>
> Exactly. If I led you to the impression that I want UUIDv1 in core, this
> was not the intent. What I intend to say is that different people want
> different implementations, and one of the most useful versions, in my
> opinion, is difficult to implement portably.

Actually, you could do it very portably, at the cost of a minute or so's
worth of configuration. Simply have a GUC variable called, say,
uuid_mac_address. Then the person who gets a box of dud NICs or who,
like me, has a virtual server somewhere without a true ethernet port
visible to the operating system, can easily set it. No cross-platform
code, no requirement to build a third party module in contrib (at least
not for v1 uuids).
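
A minimal sketch of the idea, using Python's standard uuid module in place of server code. The setting name uuid_mac_address comes from the proposal above; the constant UUID_MAC_ADDRESS, its value, and the helpers parse_mac and new_v1_uuid are hypothetical and exist only for this illustration.

```python
import uuid

def parse_mac(text: str) -> int:
    """Turn 'aa:bb:cc:dd:ee:ff' into a 48-bit integer node id."""
    octets = text.split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError("expected six two-digit hex octets separated by ':'")
    node = 0
    for octet in octets:
        node = (node << 8) | int(octet, 16)
    return node

# Assumption: stands in for the value of a uuid_mac_address setting.
UUID_MAC_ADDRESS = "02:00:00:00:00:01"

def new_v1_uuid() -> uuid.UUID:
    """Generate a version 1 UUID using the configured node id."""
    return uuid.uuid1(node=parse_mac(UUID_MAC_ADDRESS))

example = new_v1_uuid()
```

No platform-specific code is needed to discover a network card, because the node id is supplied by configuration.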

I actually DO think that we should have at least one default generation
routine in core, even if the above idea doesn't float and it's just v4
random numbers. If we advertise that we have uuids, people will not
expect to have to install a contrib module just to get some values
generated. The SQL Server function newsequentialid(), which gives v1
uuids, sort of, is ONLY available as a default value for a column, you
can't use it in normal expressions (figure that out). So people clearly
will expect to be able to generate these at the database level.
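
A default v4 generator is small. The sketch below follows the RFC 4122 layout: 16 random bytes with the version and variant bits overwritten. It is an illustration in Python rather than the patch's C code, and new_v4_uuid is a name chosen for the example.

```python
import os
import uuid

def new_v4_uuid() -> uuid.UUID:
    """Build a version 4 UUID from 16 random bytes."""
    raw = bytearray(os.urandom(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # version 4 in the high nibble of byte 6
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC 4122 variant: top two bits are 10
    return uuid.UUID(bytes=bytes(raw))

example_v4 = new_v4_uuid()
```

That leaves 122 random bits per value, and the quality of the result depends entirely on the random source.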

Using either v1s as configured above or v4s, there's no portability
issue. Indeed MS SQL Server has both available (newsequentialid() and
newid()). And sufficient documentation should allow people to make their
minds up regarding what their needs are. If they really want funky v3
namespace ones then they can install a contrib, no problem with that.

Cheers

Tom


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Gregory Stark <gsstark(at)mit(dot)edu>, mark(at)mark(dot)mielke(dot)cc, andrew(at)supernews(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-20 14:22:29
Message-ID: 2698.1158762149@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> I must jump in with my amusement at this whole conversation. I just
> looked up the standard (http://www.ietf.org/rfc/rfc4122.txt) and it
> includes this abstract:

> A UUID is 128 bits long, and can guarantee
> uniqueness across space and time.

The only meaningful word in that claim is "can". Which boils down to
"if everybody always follows best practices and no failures ever occur,
maybe they're really unique". We already know that two of the more
critical assumptions embedded in those best practices (unique MAC
addresses and always-correct system clocks) are seriously flawed in
the real world.

To see just how much of the kool-aid that RFC's authors have been
drinking, note that their "sample implementation" in Appendix A
implements the unique node identifier as ... a random number.
So much for guaranteed uniqueness.

regards, tom lane


From: Thomas Hallgren <thomas(at)tada(dot)se>
To: pgsql-hackers(at)postgresql(dot)org
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Patch for UUID datatype (beta)
Date: 2006-09-27 20:00:59
Message-ID: 451AD87B.8000005@tada.se
Lists: pgsql-hackers pgsql-patches

mark(at)mark(dot)mielke(dot)cc wrote:
> On Tue, Sep 19, 2006 at 11:21:51PM -0400, Alvaro Herrera wrote:
>> mark(at)mark(dot)mielke(dot)cc wrote:
>>> On Tue, Sep 19, 2006 at 08:20:13AM -0500, Jim C. Nasby wrote:
>>>> On Mon, Sep 18, 2006 at 07:45:07PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
>>>>> I would not use a 100% random number generator for a UUID value as was
>>>>> suggested. I prefer inserting the MAC address and the time, to at
>>>>> least allow me to control if a collision is possible. This is not easy
>>>>> to do using a few lines of C code. I'd rather have a UUID type in core
>>>>> with no generation routine, than no UUID type in core because the code
>>>>> is too complicated to maintain, or not portable enough.
>>>> As others have mentioned, using MAC address doesn't remove the
>>>> possibility of a collision.
>>> It does, as I control the MAC address.
>> What happens if you have two postmaster running on the same machine?
>
> Could be bad things. :-)
>
> For the case of two postmaster processes, I assume you mean two
> different databases? If you never intend to merge the data between the
> two databases, the problem is irrelevant. There is a much greater
> chance that any UUID form is more unique, or can be guaranteed to be
> unique, within a single application instance, than across all
> application instances in existence. If you do intend to merge the
> data, you may have a problem.
>
You may, but it's not very likely, since a) there is a 14-bit random number in addition to
the MAC address (the clock sequence) and b) the timestamp has a granularity of 100 nanoseconds.
An implementation could be made to prevent clock-sequence collisions on the same machine and
thereby avoid this altogether.
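
The three fields involved can be inspected with Python's standard uuid module. This sketch only reads the fields of a freshly generated version 1 UUID; RFC 4122 defines the timestamp as a 60-bit count of 100-nanosecond intervals, the clock sequence as a 14-bit field, and the node as 48 bits. The variable names are chosen for the example.

```python
import uuid

u = uuid.uuid1()

timestamp_100ns = u.time   # 60-bit count of 100 ns ticks since 1582-10-15
clock_seq = u.clock_seq    # 14-bit clock sequence
node = u.node              # 48-bit node id (normally the MAC address)
```

Two generators on the same machine produce the same UUID only if all three fields match, which is why coordinating the clock sequence locally removes the remaining risk.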

Kind Regards,
Thomas Hallgren