partitioning question -- how to guarantee uniqueness across partitions

Lists: pgsql-general
From: Whit Armstrong <armstrong(dot)whit(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: partitioning question -- how to guarantee uniqueness across partitions
Date: 2009-06-28 16:28:21
Message-ID: 8ec76080906280928k2aaf90c7yf3025b2db2ef7805@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I have a simple example copied from the 8.3 manual on partitioning
(http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html).

My question is, if you create a serial type in the parent table which
is meant to be the primary key across all the partitions, how does one
guarantee uniqueness of that key?

This example is contrived, but it shows that there is no constraint to
prevent insertion of duplicate 'id' values into separate partition
tables. (after running the example there should be two id=100 rows in
the table). This makes sense because each partition only guarantees
uniqueness for it's own rows, and the primary key constraint on the
parent table never comes into play. Do I need to add a test in the
trigger function to prevent insertion of duplicate 'id' values?

I realize that if I used a partitioning function that was based on
ranges of the primary key, then I would be guaranteed unique ranges of
id for each table, but most partitioning examples I've seen usually
test for date ranges which is similar to this example.

Any practical thoughts on this issue would be appreciated.

Thanks,
Whit

begin;

CREATE TABLE part_main (
id serial primary key,
data_value integer not null
);

CREATE TABLE part_10 (CHECK (data_value <= 10)) inherits (part_main);
ALTER TABLE part_10 ADD PRIMARY KEY (id);

CREATE TABLE part_20 (CHECK (data_value > 10 and data_value <= 20 ))
inherits (part_main);
ALTER TABLE part_20 ADD PRIMARY KEY (id);

CREATE OR REPLACE FUNCTION part_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.data_value <= 10) THEN INSERT INTO part_10 VALUES (NEW.*);
ELSEIF ( NEW.data_value > 10 and NEW.data_value <= 20) THEN INSERT
INTO part_20 VALUES (NEW.*);
ELSE RAISE EXCEPTION 'value out of range. Fix the
part_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_part_trigger
BEFORE INSERT ON part_main
FOR EACH ROW EXECUTE PROCEDURE part_insert_trigger();

insert into part_main (id,data_value)
values
(100,10),
(100,20);

commit;


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Whit Armstrong <armstrong(dot)whit(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: partitioning question -- how to guarantee uniqueness across partitions
Date: 2009-06-28 17:27:35
Message-ID: 10405.1246210055@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Whit Armstrong <armstrong(dot)whit(at)gmail(dot)com> writes:
> I have a simple example copied from the 8.3 manual on partitioning
> (http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html).

> My question is, if you create a serial type in the parent table which
> is meant to be the primary key across all the partitions, how does one
> guarantee uniqueness of that key?

One doesn't. That is not an appropriate way to set up a partitioned
table. You need a primary key that can actually be used as a meaningful
partitioning key. In this example, the id is completely useless and
what you should be looking at is making the data_value be the primary
key.

regards, tom lane


From: Whit Armstrong <armstrong(dot)whit(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: partitioning question -- how to guarantee uniqueness across partitions
Date: 2009-06-28 18:45:04
Message-ID: 8ec76080906281145t7116fb52l4eaf7b4d626fd62e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thanks, Tom.

Let me give a little more detail on my actual data rather than the
simple example I sent.

I have a 60GB table of loan balances, which I've partitioned into 26 tables.

The loan id's are a sequence of 6 characters, so the partitioning rule
I've used is the first character of the loan id, which yields roughly
equal sized partitions of 2.8 GB or so.

Each loan can only have one balance per month, so the primary key on
each partition is set to be loan_id and asofdate.

However, this data is meant to be available via a rails application,
hence, the need for a surrogate key of integers which is unique across
the entire set of partitions.

Creation of new rows in the partitioned tables should not be an issue
under normal circumstances because I see that all of the child tables
use the same sequence for generating new id's.

However, what makes me nervous is that there is no explicit constraint
in the database that prevents duplicate id's from being created, and
I'm not sure how the rails app would react if for whatever reason
duplicate id keys wound up in the table.

Any suggestions?

Thanks,
Whit

On Sun, Jun 28, 2009 at 1:27 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Whit Armstrong <armstrong(dot)whit(at)gmail(dot)com> writes:
>> I have a simple example copied from the 8.3 manual on partitioning
>> (http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html).
>
>> My question is, if you create a serial type in the parent table which
>> is meant to be the primary key across all the partitions, how does one
>> guarantee uniqueness of that key?
>
> One doesn't.  That is not an appropriate way to set up a partitioned
> table.  You need a primary key that can actually be used as a meaningful
> partitioning key.  In this example, the id is completely useless and
> what you should be looking at is making the data_value be the primary
> key.
>
>                        regards, tom lane
>


From: Arndt Lehmann <arndt(dot)lehmann(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: partitioning question -- how to guarantee uniqueness across partitions
Date: 2009-06-29 01:16:49
Message-ID: 9b0a3e00-9daf-4d63-be57-11784edab5fb@q14g2000vbn.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jun 29, 3:45 am, armstrong(dot)w(dot)(dot)(dot)(at)gmail(dot)com (Whit Armstrong) wrote:
> Thanks, Tom.
>
> Let me give a little more detail on my actual data rather than the
> simple example I sent.
>
> I have a 60GB table of loan balances, which I've partitioned into 26 tables.
>
> The loan id's are a sequence of 6 characters, so the partitioning rule
> I've used is the first character of the loan id, which yields roughly
> equal sized partitions of 2.8 GB or so.
>
> Each loan can only have one balance per month, so the primary key on
> each partition is set to be loan_id and asofdate.
>
> However, this data is meant to be available via a rails application,
> hence, the need for a surrogate key of integers which is unique across
> the entire set of partitions.
>
> Creation of new rows in the partitioned tables should not be an issue
> under normal circumstances because I see that all of the child tables
> use the same sequence for generating new id's.
>
> However, what makes me nervous is that there is no explicit constraint
> in the database that prevents duplicate id's from being created, and
> I'm not sure how the rails app would react if for whatever reason
> duplicate id keys wound up in the table.
>
> Any suggestions?
>
> Thanks,
> Whit
>
>
>
> On Sun, Jun 28, 2009 at 1:27 PM, Tom Lane<t(dot)(dot)(dot)(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Whit Armstrong <armstrong(dot)w(dot)(dot)(dot)(at)gmail(dot)com> writes:
> >> I have a simple example copied from the 8.3 manual on partitioning
> >> (http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html).
>
> >> My question is, if you create a serial type in the parent table which
> >> is meant to be the primary key across all the partitions, how does one
> >> guarantee uniqueness of that key?
>
> > One doesn't.  That is not an appropriate way to set up a partitioned
> > table.  You need a primary key that can actually be used as a meaningful
> > partitioning key.  In this example, the id is completely useless and
> > what you should be looking at is making the data_value be the primary
> > key.
>
> >                        regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-gene(dot)(dot)(dot)(at)postgresql(dot)org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general

Hi Whit,

you could consider using GUIDs instead of Integer for primary keys.
Here is a nice blog post explaining how to do this in Rails:
http://ariejan.net/2008/08/12/ruby-on-rails-uuid-as-your-activerecord-primary-key/

Best Regards,
Arndt Lehmann


From: Arndt Lehmann <arndt(dot)lehmann(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: partitioning question -- how to guarantee uniqueness across partitions
Date: 2009-06-29 01:27:58
Message-ID: f33ed165-de6d-4e4c-bb6f-ac458c1fb67f@d32g2000yqh.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> Hi Whit,
>
> you could consider using GUIDs instead of Integer for primary keys.
> Here is a nice blog post explaining how to do this in Rails:
>    http://ariejan.net/2008/08/12/ruby-on-rails-uuid-as-your-activerecord...
>
> Best Regards,
>   Arndt Lehmann

just a note: As per above blog post, the GUID are generated using a
Ruby module.
PostgreSQL actually comes with a baked in UUID column type. That
probably will be a more efficient solution.

Regards,
Arndt


From: Arndt Lehmann <arndt(dot)lehmann(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: partitioning question -- how to guarantee uniqueness across partitions
Date: 2009-06-29 01:28:05
Message-ID: 2763fb2a-5211-4205-b19f-ec0d3b37f683@z9g2000yqi.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> Hi Whit,
>
> you could consider using GUIDs instead of Integer for primary keys.
> Here is a nice blog post explaining how to do this in Rails:
>    http://ariejan.net/2008/08/12/ruby-on-rails-uuid-as-your-activerecord...
>
> Best Regards,
>   Arndt Lehmann

just a note: As per above blog post, the GUID are generated using a
Ruby module.
PostgreSQL actually comes with a baked in UUID column type. That
probably will be a more efficient solution.

Regards,
Arndt


From: Erik Jones <ejones(at)engineyard(dot)com>
To: Whit Armstrong <armstrong(dot)whit(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: partitioning question -- how to guarantee uniqueness across partitions
Date: 2009-06-29 19:15:57
Message-ID: 8DBEB4C7-5E7E-4DD0-94E9-29A6F05A1A9A@engineyard.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Jun 28, 2009, at 11:45 AM, Whit Armstrong wrote:

> Thanks, Tom.
>
> Let me give a little more detail on my actual data rather than the
> simple example I sent.
>
> I have a 60GB table of loan balances, which I've partitioned into 26
> tables.
>
> The loan id's are a sequence of 6 characters, so the partitioning rule
> I've used is the first character of the loan id, which yields roughly
> equal sized partitions of 2.8 GB or so.
>
> Each loan can only have one balance per month, so the primary key on
> each partition is set to be loan_id and asofdate.
>
> However, this data is meant to be available via a rails application,
> hence, the need for a surrogate key of integers which is unique across
> the entire set of partitions.
>
> Creation of new rows in the partitioned tables should not be an issue
> under normal circumstances because I see that all of the child tables
> use the same sequence for generating new id's.
>
> However, what makes me nervous is that there is no explicit constraint
> in the database that prevents duplicate id's from being created, and
> I'm not sure how the rails app would react if for whatever reason
> duplicate id keys wound up in the table.

As long as your inserts always use the default value,
nextval('sequence_name'), for the id values then that can never happen
unless you at some point use setval('sequence_name', X) where X <= the
max value already present in your partitioned table, which you should
never be doing anyway.

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k