Re: pervasiveness of surrogate (also called synthetic) keys

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pervasiveness of surrogate (also called synthetic) keys
Date: 2011-05-03 00:25:10
Message-ID: 1304382310.2038.115.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, 2011-05-02 at 11:10 -0400, Greg Smith wrote:
> The position Merlin
> has advocated here, that there should always be a natural key available
> if you know the data well enough, may be true. But few people are good
> enough designers to be sure they've made the decision correctly, and the
> downsides of being wrong can be a long, painful conversion process.
> Easier for most people to just eliminate the possibility of making a
> mistake by using auto-generated surrogate keys, where the primary
> problem you'll run into is merely using more space/resources than you
> might otherwise need to have. It minimizes the worst-case--mistake make
> in the model, expensive re-design--by adding overhead that makes the
> average case more expensive.

Once you really try to define "natural" and "surrogate" keys, I think a
lot of the arguments disappear. I wrote about this a few years back:

http://thoughts.j-davis.com/2007/12/11/terminology-confusion/

In particular, I think you are falsely assuming that a natural key must
be generated from an outside source (or some source outside of your
control), and is therefore not reliably unique.

You can generate your own keys, and if you hand them out to customers
and include them on paperwork, they are now a part of the reality that
your database models -- and therefore become natural keys. Invoice
numbers, driver's license numbers, etc., are all natural keys, because
they are known about, and used, in reality. Usernames are, too, the only
difference is that you let the user choose it.

In contrast, a pointer or a UUID typically does not represent reality,
because no humans ever see it and no computer systems outside yours know
about it. So, it's merely an implementation detail and should not be a
part of the model.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2011-05-03 01:24:24 Re: Help with database recovery ...
Previous Message Craig Ringer 2011-05-02 23:26:22 Re: Needs Suggestion