Quick Links

Partitioning/inherited tables vs FKs

Lists:	pgsql-hackers

From:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To:	pgsql-hackers(at)postgreSQL(dot)org
Cc:	Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Partitioning/inherited tables vs FKs
Date:	2010-05-06 08:52:42
Message-ID:	4BE2835A.5020601@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

we came across an interesting problem.

=# create table parent (id serial primary key, t text);
NOTICE: CREATE TABLE will create implicit sequence "parent_id_seq" for
serial column "parent.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"parent_pkey" for table "parent"
CREATE TABLE
=# create table child () inherits (parent);
CREATE TABLE
=# create table refer (id serial primary key, parent_id integer
references parent (id));
NOTICE: CREATE TABLE will create implicit sequence "refer_id_seq" for
serial column "refer.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"refer_pkey" for table "refer"
CREATE TABLE
=# begin;
BEGIN
=# insert into child (t) values ('a') returning id;
id
----
1
(1 sor)

INSERT 0 1
=# select * from parent;
id | t
----+---
1 | a
(1 sor)

=# insert into refer (parent_id) values (1);
ERROR: insert or update on table "refer" violates foreign key
constraint "refer_parent_id_fkey"
DETAIL: Key (parent_id)=(1) is not present in table "parent".

The use case for this was there were different news items,
and there were another table for summaries, that could point
to any of the news items table. Another use case could be
a large partitioned table with an FK to the main table where
the referring table might only contain very few "interesting" data.

No matter what are the semantics, the parent table in the
inheritance chain cannot be used as and endpoint for FKs.

Is it a bug, or intentional?

The only solution currently is that the referring table has to be
partitioned the same way as the referred table in the FK, and
its parent table has to be queried.

Best regards,
Zoltán Böszörményi

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Cc:	pgsql-hackers(at)postgreSQL(dot)org, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-06 09:31:37
Message-ID:	FAA7C1DF-63CC-4E6D-940B-83EBC5A50D0B@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 6, 2010, at 10:52 , Boszormenyi Zoltan wrote:
> =# create table parent (id serial primary key, t text);
> ...
> =# create table child () inherits (parent);
> ...
> =# create table refer (id serial primary key, parent_id integer
> ...
> =# insert into child (t) values ('a') returning id;
> ...
> =# select * from parent;
> id | t
> ----+---
> 1 | a
> (1 sor)
>
> =# insert into refer (parent_id) values (1);
> ERROR: insert or update on table "refer" violates foreign key
> constraint "refer_parent_id_fkey"
> DETAIL: Key (parent_id)=(1) is not present in table "parent".
>
> The use case for this was there were different news items,
> and there were another table for summaries, that could point
> to any of the news items table. Another use case could be
> a large partitioned table with an FK to the main table where
> the referring table might only contain very few "interesting" data.

Yeah, this is a long-standing issue with inheritance. Table inheritance in postgres isn't much more than an implicit UNION done on selects plus some logic in ALTER TABLE to keep propagate structural changes. Indices and constraints basically always behave as if ONLY had been specified. I'm not even sure if the ids are globally unique in your example - it might be that each child's "id serial" column gets its very own sequence.

One possible workaround is no create a table, say referred_ids, that contains all the ids from parent and all of its children, kept up-to-date via triggers, and point the FK constraint to that table. That also allows for a global unique constraint on the ids by definition a suitable unique or primary key constraint on referred_ids.

What lies at the heart of this problem is the lack of multi-table indices and hence multi-table unique constraints in postgres. AFAIK with those in place the rest amounts to the removal of ONLY from the constraint check queries plus some code to propagate constraint triggers to child tables.

best regards,
Florian Pflug

From:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Cc:	pgsql-hackers(at)postgresql(dot)org, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-06 10:37:42
Message-ID:	v2i3073cc9b1005060337x4883e16ak8e6bdf1996a29ffe@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2010/5/6 Boszormenyi Zoltan <zb(at)cybertec(dot)at>:
>
> =# insert into refer (parent_id) values (1);
> ERROR: insert or update on table "refer" violates foreign key
> constraint "refer_parent_id_fkey"
> DETAIL: Key (parent_id)=(1) is not present in table "parent".
>
> The use case for this was there were different news items,
> and there were another table for summaries, that could point
> to any of the news items table. Another use case could be
> a large partitioned table with an FK to the main table where
> the referring table might only contain very few "interesting" data.
>
> No matter what are the semantics, the parent table in the
> inheritance chain cannot be used as and endpoint for FKs.
>
> Is it a bug, or intentional?

i would call it a bug, but this is a known issue

>
> The only solution currently is that the referring table has to be
> partitioned the same way as the referred table in the FK, and
> its parent table has to be queried.
>

no, you can install a trigger on the child table that verifies the
existence of the id on your partitioned parent table, the SELECT
you'll use inside that trigger will look at the entire set of tables
(as long as you don't use FROM ONLY)

also could be useful to put an index (even a PK) on every child to
ensure uniqueness and make the SELECT more efficient, and of course a
check constraint in every child emulating a partition key

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-06 12:35:53
Message-ID:	AANLkTilTGApDV2mm3HhewBuYqqUITY-gbj28NDrFIEaG@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 6, 2010 at 6:37 AM, Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:
> i would call it a bug, but this is a known issue
>
>>
>> The only solution currently is that the referring table has to be
>> partitioned the same way as the referred table in the FK, and
>> its parent table has to be queried.
>>
>
> no, you can install a trigger on the child table that verifies the
> existence of the id on your partitioned parent table, the SELECT
> you'll use inside that trigger will look at the entire set of tables
> (as long as you don't use FROM ONLY)
>
> also could be useful to put an index (even a PK) on every child to
> ensure uniqueness and make the SELECT more efficient, and of course a
> check constraint in every child emulating a partition key

The referential integrity triggers contain some extra magic that isn't
easily simulatable in userland, and that is necessary to make the
foreign key constraints airtight. We've discussed this previously but
I don't remember which thread it was or the details of when things
blow up. I think it's something like this: the parent has a tuple
that is not referenced by any child. Transaction 1 begins, deletes
the parent tuple (checking that it has no children), and pauses.
Transaction 2 begins, adds a child tuple that references the parent
tuple (checking that the parent exists, which it does), and commits.
Transaction 1 commits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-06 14:38:12
Message-ID:	16409.1273156692@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> What lies at the heart of this problem is the lack of multi-table
> indices and hence multi-table unique constraints in postgres. AFAIK
> with those in place the rest amounts to the removal of ONLY from the
> constraint check queries plus some code to propagate constraint
> triggers to child tables.

Well, the lack of multi-table indexes certainly is the heart of the
problem, but I'm not sure that inventing such a thing is the solution.
Quite aside from the implementation difficulties involved in it,
doing things that way would destroy some of the major reasons to
partition tables at all:

* the index grows as the size of the total data set, it's not limited
by partition size

* can't cheaply drop one partition any more, you have to vacuum the
(big) index first

* probably some other things I'm not thinking of at the moment.

I think the real solution is to upgrade the partitioning infrastructure
so that we can understand that columns are unique across the whole
partitioned table, when the partitioning is done on that column and each
partition has a unique index.

regards, tom lane

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-06 16:04:07
Message-ID:	6DA8D72F-1195-4F18-8DAF-3C4B8A5C88E7@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 6, 2010, at 16:38 , Tom Lane wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> What lies at the heart of this problem is the lack of multi-table
>> indices and hence multi-table unique constraints in postgres. AFAIK
>> with those in place the rest amounts to the removal of ONLY from the
>> constraint check queries plus some code to propagate constraint
>> triggers to child tables.
>
> Well, the lack of multi-table indexes certainly is the heart of the
> problem, but I'm not sure that inventing such a thing is the solution.
> Quite aside from the implementation difficulties involved in it,
> doing things that way would destroy some of the major reasons to
> partition tables at all:
>
> * the index grows as the size of the total data set, it's not limited
> by partition size
>
> * can't cheaply drop one partition any more, you have to vacuum the
> (big) index first
>
> * probably some other things I'm not thinking of at the moment.
>
> I think the real solution is to upgrade the partitioning infrastructure
> so that we can understand that columns are unique across the whole
> partitioned table, when the partitioning is done on that column and each
> partition has a unique index.

True, for partitioned tables multi-table indices reintroduce some of the performance problems that partitioning is supposed to avoid.

But OTOH if you use table inheritance as a means to map data models (e.g. EER) more naturally to SQL, then multi-table indices have advantages over the partitioning-friendly solution you sketched above.

With a multi-table index, SELECT * FROM PARENT WHERE ID=?? has complexity LOG(N*M) where M is the number of tables inheriting from PARENT (including PARENT itself), and N the average number of rows in these tables. With one index per child, the complexity is M*LOG(N) which is significantly higher if M is large. Constraint exclusion could reduce that to LOG(N), but only if each child is has it's own private ID range which precludes ID assignment from a global sequence and hence makes ID assignment much more complex and error-prone.

Anyway, I was wondering why we need guaranteed uniqueness for FK relationships anyway. Because if we don't (which I didn't check prior to posting this I must admit), then why can't we simply remove the "ONLY" from the RI queries and let ALTER TABLE attach the RI triggers not only to the parent but also to all children. What am I missing?

best regards,
Florian Pflug

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-06 16:06:54
Message-ID:	18308.1273162014@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> Anyway, I was wondering why we need guaranteed uniqueness for FK
> relationships anyway.

It's required by spec, and the semantics aren't terribly sensible
without it.

regards, tom lane

From:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, "Hans-Juergen Schoenig" <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 06:16:52
Message-ID:	201005111316.52819.fozzy@ac-sw.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> The referential integrity triggers contain some extra magic that isn't
> easily simulatable in userland, and that is necessary to make the
> foreign key constraints airtight. We've discussed this previously but
> I don't remember which thread it was or the details of when things
> blow up. I think it's something like this: the parent has a tuple
> that is not referenced by any child. Transaction 1 begins, deletes
> the parent tuple (checking that it has no children), and pauses.
> Transaction 2 begins, adds a child tuple that references the parent
> tuple (checking that the parent exists, which it does), and commits.
> Transaction 1 commits.

Will SELECT ... FOR SHARE not help?

Regargs,
Dmitry

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 11:29:36
Message-ID:	AANLkTinkbEikDBhP28QryQp_rHgtbJ3rPIT82DemcxpE@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy(at)ac-sw(dot)com> wrote:
>> The referential integrity triggers contain some extra magic that isn't
>> easily simulatable in userland, and that is necessary to make the
>> foreign key constraints airtight. We've discussed this previously but
>> I don't remember which thread it was or the details of when things
>> blow up. I think it's something like this: the parent has a tuple
>> that is not referenced by any child. Transaction 1 begins, deletes
>> the parent tuple (checking that it has no children), and pauses.
>> Transaction 2 begins, adds a child tuple that references the parent
>> tuple (checking that the parent exists, which it does), and commits.
>> Transaction 1 commits.
>
> Will SELECT ... FOR SHARE not help?

Try it, with the example above. I think you'll find that it doesn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 12:10:00
Message-ID:	4BE94918.6080809@cs.helsinki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2010-05-11 14:29 +0200, Robert Haas wrote:
> On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy(at)ac-sw(dot)com> wrote:
>>> The referential integrity triggers contain some extra magic that isn't
>>> easily simulatable in userland, and that is necessary to make the
>>> foreign key constraints airtight. We've discussed this previously but
>>> I don't remember which thread it was or the details of when things
>>> blow up. I think it's something like this: the parent has a tuple
>>> that is not referenced by any child. Transaction 1 begins, deletes
>>> the parent tuple (checking that it has no children), and pauses.
>>> Transaction 2 begins, adds a child tuple that references the parent
>>> tuple (checking that the parent exists, which it does), and commits.
>>> Transaction 1 commits.
>>
>> Will SELECT ... FOR SHARE not help?
>
> Try it, with the example above. I think you'll find that it doesn't.

TXA => delete from foo;
DELETE 1

TXB => select a from foo for share; -- waits

What am I missing?

Regards,
Marko Tiikkaja

From:	Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 12:55:12
Message-ID:	h2hb0f3f5a11005110555v4dc49b55v26e14ddbe4e3d9dd@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2010/5/11 Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>:

> On 2010-05-11 14:29 +0200, Robert Haas wrote:
>
>> On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy(at)ac-sw(dot)com> wrote:
>>
>>>> The referential integrity triggers contain some extra magic that isn't
>>>> easily simulatable in userland, and that is necessary to make the
>>>> foreign key constraints airtight. We've discussed this previously but
>>>> I don't remember which thread it was or the details of when things
>>>> blow up. I think it's something like this: the parent has a tuple
>>>> that is not referenced by any child. Transaction 1 begins, deletes
>>>> the parent tuple (checking that it has no children), and pauses.
>>>> Transaction 2 begins, adds a child tuple that references the parent
>>>> tuple (checking that the parent exists, which it does), and commits.
>>>> Transaction 1 commits.
>>>
>>> Will SELECT ... FOR SHARE not help?
>>
>> Try it, with the example above. I think you'll find that it doesn't.
>
> TXA => delete from foo;
> DELETE 1
>
> TXB => select a from foo for share; -- waits
>
> What am I missing?

Slightly verbose example of what can go wrong:

CREATE TABLE a (i int PRIMARY KEY);
INSERT INTO a VALUES (1);

CREATE TABLE b (a_id int);

>>>>>> Start with T1:

T1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
T1> SELECT i FROM a WHERE i = 1 FOR SHARE; -- Does a with i = 1 exist?
i
---
1
(1 Zeile)

T1> INSERT INTO b VALUES (1); -- Great, it existed, insert row
pointing to it in b.
INSERT 0 1

>>>>>> Switch to T2:

T2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; -- Evil
transaction T2 is intervening!
BEGIN
T2> SELECT i FROM a WHERE i = 1 FOR SHARE; -- Lock a with i = 1 FOR SHARE.
i
---
1
(1 Zeile)

T2> SELECT a_id FROM b WHERE a_id = 1; -- Check whether it's got
anything pointing to it.
a_id
------
(0 Zeilen)

T2> DELETE FROM a WHERE i = 1; -- Nope, so delete a with i = 1 (this
blocks, because T1 is still holding the lock).

>>>>>> Switch to T1:

1> COMMIT; -- Commit the insertion of a row pointing to a with i = 1
(this releases all locks that T1 is holding).
COMMIT

>>>>>> T2 continues:

DELETE 1
T2> COMMIT; -- Commit the deletion of a with i = 1.
COMMIT
T2> SELECT * FROM b EXCEPT SELECT * FROM a; -- Check for inconsistencies.
a_id
------
1
(1 Zeile)

Woops.

Nicolas

From:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To:	Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 12:59:00
Message-ID:	4BE95494.7080209@cs.helsinki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

This is getting way off topic, but:

On 5/11/10 3:55 PM +0300, Nicolas Barbier wrote:
> T2> SELECT i FROM a WHERE i = 1 FOR SHARE; -- Lock a with i = 1 FOR SHARE.
> i
> ---
> 1
> (1 Zeile)
>
> T2> SELECT a_id FROM b WHERE a_id = 1; -- Check whether it's got
> anything pointing to it.
> a_id
> ------
> (0 Zeilen)
>
> T2> DELETE FROM a WHERE i = 1; -- Nope, so delete a with i = 1 (this
> blocks, because T1 is still holding the lock).

Obviously you wouldn't delete anything with a SHARE lock.

Regards,
Marko Tiikkaja

From:	Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 13:07:53
Message-ID:	u2nb0f3f5a11005110607qfbe75589j74660f9a87d75021@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2010/5/11 Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>:

> This is getting way off topic, but:
>
> On 5/11/10 3:55 PM +0300, Nicolas Barbier wrote:
>>
>> T2> SELECT i FROM a WHERE i = 1 FOR SHARE; -- Lock a with i = 1 FOR
>> SHARE.
>> i
>> ---
>> 1
>> (1 Zeile)
>>
>> T2> SELECT a_id FROM b WHERE a_id = 1; -- Check whether it's got
>> anything pointing to it.
>> a_id
>> ------
>> (0 Zeilen)
>>
>> T2> DELETE FROM a WHERE i = 1; -- Nope, so delete a with i = 1 (this
>> blocks, because T1 is still holding the lock).
>
> Obviously you wouldn't delete anything with a SHARE lock.

So where would you put a SELECT ... FOR SHARE to fix the problem? (Per
"Will SELECT ... FOR SHARE not help?".) I agree that my second FOR
SHARE doesn't really make a lot of sense, but that doesn't disprove
the fact that the first FOR SHARE fails to ensure consistency.

Nicolas

From:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To:	Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 13:11:42
Message-ID:	4BE9578E.4060801@cs.helsinki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/11/10 4:07 PM +0300, Nicolas Barbier wrote:
> 2010/5/11 Marko Tiikkaja<marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>:
>
>> This is getting way off topic, but:
>>
>> On 5/11/10 3:55 PM +0300, Nicolas Barbier wrote:
>>>
>>> T2> SELECT i FROM a WHERE i = 1 FOR SHARE; -- Lock a with i = 1 FOR
>>> SHARE.
>>> i
>>> ---
>>> 1
>>> (1 Zeile)
>>>
>>> T2> SELECT a_id FROM b WHERE a_id = 1; -- Check whether it's got
>>> anything pointing to it.
>>> a_id
>>> ------
>>> (0 Zeilen)
>>>
>>> T2> DELETE FROM a WHERE i = 1; -- Nope, so delete a with i = 1 (this
>>> blocks, because T1 is still holding the lock).
>>
>> Obviously you wouldn't delete anything with a SHARE lock.
>
> So where would you put a SELECT ... FOR SHARE to fix the problem? (Per
> "Will SELECT ... FOR SHARE not help?".) I agree that my second FOR
> SHARE doesn't really make a lot of sense, but that doesn't disprove
> the fact that the first FOR SHARE fails to ensure consistency.

I took the "SELECT ... FOR SHARE" suggestion in a more general way,
suggesting the use of row-level locks. T2 should be holding an
exclusive row-level lock (SELECT ... FOR UPDATE) when checking for
references.

Regards,
Marko Tiikkaja

From:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To:	Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 13:19:10
Message-ID:	4BE9594E.10106@cs.helsinki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/11/10 4:11 PM +0300, I wrote:
> I took the "SELECT ... FOR SHARE" suggestion in a more general way,
> suggesting the use of row-level locks. T2 should be holding an
> exclusive row-level lock (SELECT ... FOR UPDATE) when checking for
> references.

Hmm. Right, that transaction wouldn't see the rows in a serializable
transaction so this doesn't solve the problem.

Regards,
Marko Tiikkaja

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc:	Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 14:03:50
Message-ID:	5573.1273586630@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> On 5/11/10 4:11 PM +0300, I wrote:
>> I took the "SELECT ... FOR SHARE" suggestion in a more general way,
>> suggesting the use of row-level locks. T2 should be holding an
>> exclusive row-level lock (SELECT ... FOR UPDATE) when checking for
>> references.

> Hmm. Right, that transaction wouldn't see the rows in a serializable
> transaction so this doesn't solve the problem.

Yeah. The hidden "magic" in the built-in FK code is not locking
(it does actually use SELECT FOR SHARE to lock rows). Rather, it's
about doing tuple liveness checks using snapshots that aren't available
at the SQL level, particularly in serializable transactions.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Marko Tiikkaja" <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, "Nicolas Barbier" <nicolas(dot)barbier(at)gmail(dot)com>
Cc:	"Jaime Casanova" <jaime(at)2ndquadrant(dot)com>, "Dmitry Fefelov" <fozzy(at)ac-sw(dot)com>, "Hans-Juergen Schoenig" <hs(at)cybertec(dot)at>, <sandor(at)cybertec(dot)at>,"Boszormenyi Zoltan" <zb(at)cybertec(dot)at>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-11 14:07:17
Message-ID:	4BE91E450200002500031507@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:

>>>>>>> Switch to T1:
>
> 1> COMMIT; -- Commit the insertion...
> COMMIT
>
>>>>>>> T2 continues:
>
> DELETE 1
> T2> COMMIT; -- Commit the deletion of a with i = 1.
> COMMIT
> T2> SELECT * FROM b EXCEPT SELECT * FROM a;
> a_id
> ------
> 1
> (1 Zeile)
>
> Woops.

This is exactly the sort of issue for which true serializable
behavior will provide a solution. I will be offering a patch to
implement that for 9.1 once 9.0 settles down. FWIW when you commit
T1, the patched code rolls back T2 with this message:

T2> DELETE FROM a WHERE i = 1;
ERROR: could not serialize access due to read/write dependencies
among transactions
HINT: The transaction might succeed if retried.

Thanks for the example; I will it to the others.

-Kevin

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	SHARE locks vs. DELETE in SERIALIZABLE mode (Was: Partitioning/inherited tables vs FKs)
Date:	2010-05-11 14:28:25
Message-ID:	FB6F8A87-E87A-4454-9709-F43A874A38A3@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 11, 2010, at 13:29 , Robert Haas wrote:
> On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy(at)ac-sw(dot)com> wrote:
>>> The referential integrity triggers contain some extra magic that isn't
>>> easily simulatable in userland, and that is necessary to make the
>>> foreign key constraints airtight. We've discussed this previously but
>>> I don't remember which thread it was or the details of when things
>>> blow up. I think it's something like this: the parent has a tuple
>>> that is not referenced by any child. Transaction 1 begins, deletes
>>> the parent tuple (checking that it has no children), and pauses.
>>> Transaction 2 begins, adds a child tuple that references the parent
>>> tuple (checking that the parent exists, which it does), and commits.
>>> Transaction 1 commits.
>>
>> Will SELECT ... FOR SHARE not help?
>
> Try it, with the example above. I think you'll find that it doesn't.

That example does in fact work. Here is the precise sequence of commands I tested with constraint checking triggers implemented in PL/PGSQL.

C1: BEGIN
C1: DELETE FROM parent WHERE parent_id = 0
C2: BEGIN
C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE -- Optional
C2: INSERT INTO child (parent_id) VALUES (0) -- Waits for C1 to commit
C1: COMMIT -- Now C2 fails either with a constraint_violation or serialization_error

The reason this works is that C2's attempt to SHARE-lock the parent row blocks until C1 commits. In READ COMMITTED mode C2 will then realize that the parent row is now gone. In SERIALIZABLE mode it won't get that far, because the SHARE-locking attempt throws a serialization error since the parent row was concurrently modified.

The serialization error, however, disappears if the two transactions are swapped. The following sequence of commands succeeds, even though the FK constraint is not satisfied.

C1: BEGIN
C1: INSERT INTO child (parent_id) VALUES (0)
C2: BEGIN
C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
C2: SELECT TRUE -- Take snapshot *before* C1 commits
C1: COMMIT
C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
C2: COMMIT

It seems that while SHARE-locking a concurrently deleted row causes a serialization error, deleting a concurrently SHARE-locked is allowed. I do wonder if this shouldn't be considered a bug - whether locks conflict or not does not usually depend on the other in which they are taken.

The build-in constraint triggers avoid the second case by checking not only for rows visible under the transaction's snapshot but also for rows visible under a freshly taken snapshot in the ri_parent PERFORM statement. I do wonder if the recheck was still needed if the DELETE in the second case threw a serialization_error also. Does anyone have an example that proves it necessary?

best regards,
Florian Pflug

Here are the table definitions and trigger functions I used:

CREATE TABLE parent (parent_id SERIAL NOT NULL PRIMARY KEY);
CREATE TABLE child (child_id SERIAL NOT NULL PRIMARY KEY, parent_id INTEGER NOT NULL);

CREATE FUNCTION ri_parent() RETURNS TRIGGER AS $body$
BEGIN
PERFORM TRUE FROM child WHERE parent_id = OLD.parent_id;
IF FOUND THEN
RAISE SQLSTATE '23503' USING MESSAGE = 'Parent ' || OLD.parent_id || ' still referenced during ' || TG_OP;
END IF;
RETURN NULL;
END;
$body$ LANGUAGE PLPGSQL VOLATILE;
CREATE TRIGGER ri_parent AFTER UPDATE OR DELETE ON parent FOR EACH ROW EXECUTE PROCEDURE ri_parent();

CREATE FUNCTION ri_child() RETURNS TRIGGER AS $body$
BEGIN
PERFORM TRUE FROM parent WHERE parent_id = NEW.parent_id FOR SHARE OF parent;
IF NOT FOUND THEN
RAISE SQLSTATE '23503' USING MESSAGE = 'Parent ' || NEW.parent_id || ' does not exist during ' || TG_OP;
END IF;
RETURN NULL;
END;
$body$ LANGUAGE PLPGSQL VOLATILE;
CREATE TRIGGER ri_child AFTER INSERT OR UPDATE ON child FOR EACH ROW EXECUTE PROCEDURE ri_child();

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: SHARE locks vs. DELETE in SERIALIZABLE mode (Was: Partitioning/inherited tables vs FKs)
Date:	2010-05-11 15:04:29
Message-ID:	AANLkTikL1BusXgcrnK6Zq9ZFeJbX1CaXyCSbuckHKG6F@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2010/5/11 Florian Pflug <fgp(at)phlo(dot)org>:
> On May 11, 2010, at 13:29 , Robert Haas wrote:
>> On Tue, May 11, 2010 at 2:16 AM, Dmitry Fefelov <fozzy(at)ac-sw(dot)com> wrote:
>>>> The referential integrity triggers contain some extra magic that isn't
>>>> easily simulatable in userland, and that is necessary to make the
>>>> foreign key constraints airtight. We've discussed this previously but
>>>> I don't remember which thread it was or the details of when things
>>>> blow up. I think it's something like this: the parent has a tuple
>>>> that is not referenced by any child. Transaction 1 begins, deletes
>>>> the parent tuple (checking that it has no children), and pauses.
>>>> Transaction 2 begins, adds a child tuple that references the parent
>>>> tuple (checking that the parent exists, which it does), and commits.
>>>> Transaction 1 commits.
>>>
>>> Will SELECT ... FOR SHARE not help?
>>
>> Try it, with the example above. I think you'll find that it doesn't.
>
> That example does in fact work. Here is the precise sequence of commands I tested with constraint checking triggers implemented in PL/PGSQL.
[...]
> The serialization error, however, disappears if the two transactions are swapped. The following sequence of commands succeeds, even though the FK constraint is not satisfied.

Thanks for figuring this out. I thought there was a case like this
but I couldn't remember exactly how to reproduce it.

> C1: BEGIN
> C1: INSERT INTO child (parent_id) VALUES (0)
> C2: BEGIN
> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
> C2: SELECT TRUE -- Take snapshot *before* C1 commits
> C1: COMMIT
> C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
> C2: COMMIT
>
> It seems that while SHARE-locking a concurrently deleted row causes a serialization error, deleting a concurrently SHARE-locked is allowed. I do wonder if this shouldn't be considered a bug - whether locks conflict or not does not usually depend on the other in which they are taken.

Wait - I'm confused. The DELETE in your example happens after C1
commits, so C1 can't still be holding any locks (nor does C2 take any
locks prior to the commit of C1).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Florian Pflug" <fgp(at)phlo(dot)org>
Cc:	"Jaime Casanova" <jaime(at)2ndquadrant(dot)com>, "Dmitry Fefelov" <fozzy(at)ac-sw(dot)com>, "Hans-Juergen Schoenig" <hs(at)cybertec(dot)at>, <sandor(at)cybertec(dot)at>,"Boszormenyi Zoltan" <zb(at)cybertec(dot)at>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SHARE locks vs. DELETE in SERIALIZABLE mode (Was: Partitioning/inherited tables vs FKs)
Date:	2010-05-11 16:36:05
Message-ID:	4BE941260200002500031521@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> wrote:

> The serialization error, however, disappears if the two
> transactions are swapped. The following sequence of commands
> succeeds, even though the FK constraint is not satisfied.
>
> C1: BEGIN
> C1: INSERT INTO child (parent_id) VALUES (0)
> C2: BEGIN
> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
> C2: SELECT TRUE -- Take snapshot *before* C1 commits
> C1: COMMIT
> C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
> C2: COMMIT

Thanks for another good example. Added to serializable test suite.

C2> DELETE FROM parent WHERE parent_id = 0;
ERROR: could not serialize access due to read/write dependencies
among transactions
HINT: The transaction might succeed if retried.
CONTEXT: SQL statement "SELECT TRUE FROM child WHERE parent_id =
OLD.parent_id"
PL/pgSQL function "ri_parent" line 2 at PERFORM

By the way, when adding these, I'm taking off the "FOR SHARE" or
"FOR UPDATE" clauses; they're not needed with true serializable
transactions. Otherwise, examples used as presented.

-Kevin

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: SHARE locks vs. DELETE in SERIALIZABLE mode (Was: Partitioning/inherited tables vs FKs)
Date:	2010-05-11 16:39:23
Message-ID:	CA7E9CA4-A213-4F57-ABE7-8706D9210281@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 11, 2010, at 17:04 , Robert Haas wrote:
> 2010/5/11 Florian Pflug <fgp(at)phlo(dot)org>:
>> C1: BEGIN
>> C1: INSERT INTO child (parent_id) VALUES (0)
>> C2: BEGIN
>> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
>> C2: SELECT TRUE -- Take snapshot *before* C1 commits
>> C1: COMMIT
>> C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
>> C2: COMMIT
>>
>> It seems that while SHARE-locking a concurrently deleted row causes a serialization error, deleting a concurrently SHARE-locked is allowed. I do wonder if this shouldn't be considered a bug - whether locks conflict or not does not usually depend on the other in which they are taken.
>
> Wait - I'm confused. The DELETE in your example happens after C1
> commits, so C1 can't still be holding any locks (nor does C2 take any
> locks prior to the commit of C1).

I used the word "lock" a bit sloppy there.

What I did want to point out is that any UPDATE by a SERIALIZABLE transaction to a row that has been concurrently updated causes a serialization error. The same happens when it instead SHARE- or UPDATE-locks the concurrently updated row. This is also independent from the commit-time of the concurrent transaction, as long as it is deemed invisible by the UPDATE/LOCK-ing transaction. In other words, any attempt to UPDATE, SHARE-lock or UPDATE-lock a row from within a SERIALIZABLE transaction fails if the visible row version isn't the latest row version. If, however, the order of the events is the other way around, such that the SHARE-locking or UPDATE-locking happens first, and the UPDATE afterwards, then no serialization error occurs!

That might seem sensible if you view SHARE-locks and UPDATE-locks as locks, and the "taint" that marks a row (the existence of a newer row version) after it has been updated by a transaction as "something else". After all, as you pointed out, the lock is gone as soon as the transaction commits. If, however, you view that "taint" as a slightly strange kind of lock that a transaction holds on the rows it updated even *after* the transaction committed, then it stops making sense. You now have a "locking" behavior with order-dependent conflicts.

Viewing those "taints" as locks is consistent with how that true serializability algorithm Kevin Grittner is working on deals with those things, I believe - or at least it's probably that paper in the back of my mind that made me call it "lock" in the first place.

It would be interesting to formalize this in the language of that paper - unfortunately, I probably lack the time to do this in the near future :-(

To avoid more confusion, here are the sequences of commands I have in mind:

No serialization error (and neither with "FOR UPDATE" instead of "FOR SHARE")
C1: BEGIN
C1: SELECT t.* FROM t WHERE id = 1 FOR SHARE
C2: BEGIN
C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
C2: SELECT TRUE --Take snapshot before c1 commits
C1: COMMIT
C2: UPDATE t SET id = 2 WHERE id = 1
C2: COMMIT

Serialization error (and also with "FOR UPDATE" instead of "FOR SHARE")
C1: BEGIN
C1: UPDATE t SET id = 2 WHERE id = 1
C2: BEGIN
C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
C2: SELECT TRUE --Take snapshot before c1 commits
C1: COMMIT
C2: SELECT t.* FROM t WHERE id = 1 FOR SHARE --serialization error
C2: COMMIT

best regards,
Florian Pflug

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: SHARE locks vs. DELETE in SERIALIZABLE mode (Was: Partitioning/inherited tables vs FKs)
Date:	2010-05-11 18:05:59
Message-ID:	4BE99C87.8060701@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/11/2010 12:39 PM, Florian Pflug wrote:
> On May 11, 2010, at 17:04 , Robert Haas wrote:
>> 2010/5/11 Florian Pflug <fgp(at)phlo(dot)org>:
>>> C1: BEGIN
>>> C1: INSERT INTO child (parent_id) VALUES (0)
>>> C2: BEGIN
>>> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
>>> C2: SELECT TRUE -- Take snapshot *before* C1 commits
>>> C1: COMMIT
>>> C2: DELETE FROM parent WHERE parent_id = 0 -- Works!
>>> C2: COMMIT
>>>
>>> It seems that while SHARE-locking a concurrently deleted row causes a serialization error, deleting a concurrently SHARE-locked is allowed. I do wonder if this shouldn't be considered a bug - whether locks conflict or not does not usually depend on the other in which they are taken.
>>
>> Wait - I'm confused. The DELETE in your example happens after C1
>> commits, so C1 can't still be holding any locks (nor does C2 take any
>> locks prior to the commit of C1).
>
> I used the word "lock" a bit sloppy there.
>
> What I did want to point out is that any UPDATE by a SERIALIZABLE transaction to a row that has been concurrently updated causes a serialization error. The same happens when it instead SHARE- or UPDATE-locks the concurrently updated row. This is also independent from the commit-time of the concurrent transaction, as long as it is deemed invisible by the UPDATE/LOCK-ing transaction. In other words, any attempt to UPDATE, SHARE-lock or UPDATE-lock a row from within a SERIALIZABLE transaction fails if the visible row version isn't the latest row version. If, however, the order of the events is the other way around, such that the SHARE-locking or UPDATE-locking happens first, and the UPDATE afterwards, then no serialization error occurs!
>
> That might seem sensible if you view SHARE-locks and UPDATE-locks as locks, and the "taint" that marks a row (the existence of a newer row version) after it has been updated by a transaction as "something else". After all, as you pointed out, the lock is gone as soon as the transaction commits. If, however, you view that "taint" as a slightly strange kind of lock that a transaction holds on the rows it updated even *after* the transaction committed, then it stops making sense. You now have a "locking" behavior with order-dependent conflicts.
>
> Viewing those "taints" as locks is consistent with how that true serializability algorithm Kevin Grittner is working on deals with those things, I believe - or at least it's probably that paper in the back of my mind that made me call it "lock" in the first place.
>
> It would be interesting to formalize this in the language of that paper - unfortunately, I probably lack the time to do this in the near future :-(
>
> To avoid more confusion, here are the sequences of commands I have in mind:
>
> No serialization error (and neither with "FOR UPDATE" instead of "FOR SHARE")
> C1: BEGIN
> C1: SELECT t.* FROM t WHERE id = 1 FOR SHARE
> C2: BEGIN
> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
> C2: SELECT TRUE --Take snapshot before c1 commits
> C1: COMMIT
> C2: UPDATE t SET id = 2 WHERE id = 1
> C2: COMMIT
>
> Serialization error (and also with "FOR UPDATE" instead of "FOR SHARE")
> C1: BEGIN
> C1: UPDATE t SET id = 2 WHERE id = 1
> C2: BEGIN
> C2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
> C2: SELECT TRUE --Take snapshot before c1 commits
> C1: COMMIT
> C2: SELECT t.* FROM t WHERE id = 1 FOR SHARE --serialization error
> C2: COMMIT

The problem really is that in the case of deleting a PK row while a
concurrent transaction creates such a reference cannot be solved with
user level visibility rules in case of a serializable transacton, unless
you go really expensive routes.

One corner case is that the transaction doing the FK INSERT commits
after the serializable transaction doing the PK DELETE got its snapshot
and also does the PK check before the PK DELETE got the lock on it. No
user level visibility allows it to see that newly created reference. And
unless the FK INSERTer actually UPDATE's the PK row (expensive), the PK
DELETE will not throw anything. It will wait to get the lock and go
ahead with the delete.

The PK DELETE needs to be able to do some sort of dirty scan in order to
see those new references. That is what I think Tom was referring to.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Fefelov <fozzy(at)ac-sw(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: SHARE locks vs. DELETE in SERIALIZABLE mode (Was: Partitioning/inherited tables vs FKs)
Date:	2010-05-11 18:57:50
Message-ID:	69F9DCFD-D65A-4A14-82BD-A5DFF1C0AFEA@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 11, 2010, at 20:05 , Jan Wieck wrote:
> The problem really is that in the case of deleting a PK row while a concurrent transaction creates such a reference cannot be solved with user level visibility rules in case of a serializable transacton, unless you go really expensive routes.

Yeah. The information to detect this is there, though - the xmax of the PK row will be a multixact in this case, and one member of that set won't be deemed visible by the deleting transaction.

> One corner case is that the transaction doing the FK INSERT commits after the serializable transaction doing the PK DELETE got its snapshot and also does the PK check before the PK DELETE got the lock on it. No user level visibility allows it to see that newly created reference. And unless the FK INSERTer actually UPDATE's the PK row (expensive), the PK DELETE will not throw anything. It will wait to get the lock and go ahead with the delete.

Exactly. It consciously waits for the lock (knowing that it was held by a concurrent transaction *not* visible to the deleting transaction), and after obtaining the lock goes on to delete the row. If the concurrent transaction hadn't held a mere lock, but had instead UPDATEd the row, this would cause a serialization error.

> The PK DELETE needs to be able to do some sort of dirty scan in order to see those new references. That is what I think Tom was referring to.

Yeah. Though the need for that "dirty scan" (it's not actually a scan with DIRTY READ semantics, but rather one with READ COMMITTED semantics) might vanish if a SHARE lock had the same effect (causing a serialization error) on concurrent transactions that an UPDATE has.

I'm not yet convinced that this is true, nor do I necessarily think that making all SHARE locks behave that way would be a good idea. But if my assertion is in fact true it would allow for robust user-level referential constraints by either modifying SHARE-lock behavior or adding a new row-lock type.

best regards,
Florian Pflug

From:	Jim Nasby <decibel(at)decibel(dot)org>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>, pgsql-hackers(at)postgreSQL(dot)org, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-17 17:45:14
Message-ID:	8A11AFD4-BDA4-4748-8E5F-482F092DAB4E@decibel.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 6, 2010, at 4:31 AM, Florian Pflug wrote:
>> The use case for this was there were different news items,
>> and there were another table for summaries, that could point
>> to any of the news items table. Another use case could be
>> a large partitioned table with an FK to the main table where
>> the referring table might only contain very few "interesting" data.
>
> Yeah, this is a long-standing issue with inheritance. Table inheritance in postgres isn't much more than an implicit UNION done on selects plus some logic in ALTER TABLE to keep propagate structural changes. Indices and constraints basically always behave as if ONLY had been specified. I'm not even sure if the ids are globally unique in your example - it might be that each child's "id serial" column gets its very own sequence.
>
> One possible workaround is no create a table, say referred_ids, that contains all the ids from parent and all of its children, kept up-to-date via triggers, and point the FK constraint to that table. That also allows for a global unique constraint on the ids by definition a suitable unique or primary key constraint on referred_ids.
>
> What lies at the heart of this problem is the lack of multi-table indices and hence multi-table unique constraints in postgres. AFAIK with those in place the rest amounts to the removal of ONLY from the constraint check queries plus some code to propagate constraint triggers to child tables.

FWIW, we use inheritance for something other than partitioning, and I created a trigger that provides a crude form of a foreign key constraint, as well as one that provides a crude global unique constraint on the PK. Both probably have holes and race conditions, but I figure they're better than just hoping no one screws something up.

BTW, my intention is to release all the generic tools we've developed to pgFoundry, it just hasn't happened yet. If enough people find this stuff interesting I can try and up the priority on getting that done. (And if you're *really* wanting this stuff you could pay 2nd Quadrant or CMD to get it for you.)

test_us(at)workbook(dot)local=# \df+ payment_instruments.tg_payment_instruments_unique
List of functions
-[ RECORD 1 ]-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Schema | payment_instruments
Name | tg_payment_instruments_unique
Result data type | trigger
Argument data types |
Volatility | volatile
Owner | cnuadmin
Language | plpgsql
Source code |
:
: DECLARE
: name CONSTANT text := 'payment_instruments.tg_payment_instruments_unique';
: c_full_table_name CONSTANT text := TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME;
: BEGIN
: PERFORM tools.assert( TG_WHEN = 'BEFORE', TG_NAME || ' ON ' || c_full_table_name ||' must be an BEFORE trigger' );
: PERFORM tools.assert( TG_LEVEL = 'ROW', TG_NAME || ' ON ' || c_full_table_name ||' must be a row-level trigger' );
:
: -- Deleting would break RI, so don't allow it. Granted, this should probably be a separate trigger, but...
: PERFORM tools.assert( 'payment_instruments__payment_instruments__inherit__no_delete'
: , TG_OP != 'DELETE'
: , 'DELETEs are not allowed on ' || c_full_table_name || ' (they would break inheritance RI)'
: );
:
: RAISE DEBUG '%:
: TG_OP = %
: TG_TABLE_NAME = %
: NEW.payment_instrument_id = %'
: , name
: , TG_OP
: , TG_TABLE_NAME
: , NEW.payment_instrument_id
: ;
:
: -- Changing the PK would break RI, so we shouldn't allow it. Granted, this should probably be a separate trigger, but...
: IF TG_OP = 'UPDATE' THEN
: PERFORM tools.assert( 'payment_instruments__payment_instruments__inherit__pk_no_change'
: , NEW.payment_instrument_id IS NOT DISTINCT FROM OLD.payment_instrument_id
: , 'Changing payment_instrument_id on ' || c_full_table_name || ' is not allowed (it would break inheritance RI)'
: );
: ELSE
: -- Only check for dupes on insert, otherwise we'll see our own ID
: PERFORM tools.assert( 'payment_instruments__payment_instruments__inherit__unique'
: , NOT EXISTS( SELECT * FROM payment_instruments.payment_instruments WHERE payment_instrument_id = NEW.payment_instrument_id )
: , 'duplicate row violation, payment_instrument_id ' || coalesce( NEW.payment_instrument_id::text, '<NULL>' ) || ' already exists'
: );
: END IF;
:
: RETURN NEW;
: END;
:
:
Description | Trigger to try and prevent duplicated payment_instrument_ids. This trigger is in no way perfect and has a huge race condition, but generally these IDs should be getting assigned by a sequence, so we should not normally have an issue with duped IDs anyway.

test_us(at)workbook(dot)local=# \df+ payment_instruments.tg_payment_instruments_ri
List of functions
-[ RECORD 1 ]-------+------------------------------------------------------------------------------------------------------------------------------------------
Schema | payment_instruments
Name | tg_payment_instruments_ri
Result data type | trigger
Argument data types |
Volatility | volatile
Owner | cnuadmin
Language | plpgsql
Source code |
:
: DECLARE
: name CONSTANT text := 'payment_instruments.tg_payment_instruments_ri';
: c_full_table_name CONSTANT text := TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME;
: v_payment_instrument_type payment_instruments.payment_instrument_types.payment_instrument_type%TYPE;
: v_table_name text;
: v_only text := '';
: v_result int;
: sql text;
: BEGIN
: PERFORM tools.assert( TG_WHEN = 'AFTER', TG_NAME || ' ON ' || c_full_table_name ||' must be an AFTER trigger' );
: PERFORM tools.assert( TG_LEVEL = 'ROW', TG_NAME || ' ON ' || c_full_table_name ||' must be a row-level trigger' );
: PERFORM tools.assert( TG_OP IN ( 'INSERT', 'UPDATE' ), TG_NAME || ' ON ' || c_full_table_name ||' must be on INSERT or UPDATE' );
:
: -- Th generally won't be allowed, but the trigger should still support it
: IF NEW.payment_instrument_id IS NULL THEN
: RAISE DEBUG '%: payment_instrument_id is NULL, skipping check', name;
: RETURN NULL;
: END IF;
:
: -- If we're updating and we haven't modified payment_instrument_id, just bail
: IF TG_OP = 'UPDATE' THEN
: IF NEW.payment_instrument_id IS NOT DISTINCT FROM OLD.payment_instrument_id THEN
: RAISE DEBUG '%: payment_instrument_id ( = % ) is unchanged, skipping check', name, NEW.payment_instrument_id;
: RETURN NULL;
: END IF;
: END IF;
:
: /*
: * We want to not only check for existence of the desired row, we also want
: * to share-lock it. Unfortunately, sharelocks aren't implemented for
: * inherited tables, so we need to find the record in the correct table. We
: * can do this fairly easily if we can find out what "type" of record it is.
: */
:
: -- Figure out what type of record this is (of course, record might not exist here)
: SELECT INTO v_payment_instrument_type 'payment_instruments.' || payment_instrument_type || 's'
: FROM payment_instruments.payment_instrument_types__get( (
: SELECT payment_instrument_type_id
: FROM payment_instruments.payment_instruments pi
: WHERE pi.payment_instrument_id = NEW.payment_instrument_id
: ) )
: ;
: IF NOT FOUND OR v_payment_instrument_type IS NULL THEN
: -- There wasn't a record at all
: v_only := 'ONLY '; -- FOR SHARE won't work if we select from both the parent and the children
: v_table_name := 'payment_instruments.payment_instruments';
: ELSE
: -- We figured out what type of record this is, now try and lock the row in the right table
: v_table_name := v_payment_instrument_type;
: END IF;
:
: sql := 'SELECT 1 FROM ' || v_only || v_table_name || '
: WHERE payment_instrument_id = ' || NEW.payment_instrument_id || '
: FOR SHARE'
: ;
: RAISE DEBUG '%: Executing SQL %', name, sql;
: -- Note that simply using a PERFORM here might get optimized out.
: EXECUTE sql INTO v_result;
:
: /*
: * Normally we should always find an record, but if it was somehow
: * put in the wrong table, or if it was deleted after our initial
: * select then we wouldn't have one.
: */
: IF v_result = 1 THEN
: RAISE DEBUG '%: record for payment_instrument_id = % found in table %', name, NEW.payment_instrument_id, v_table_name;
: RETURN NULL;
: END IF;
:
: RAISE EXCEPTION 'insert on update on table "%.%" violates foreign key; payment_instrument_id=(%) is not present in table "%"'
: , TG_TABLE_SCHEMA
: , TG_TABLE_NAME
: , NEW.payment_instrument_id
: , v_table_name
: ;
: END;
:
:
Description | Enables Refferential Integrity at the payment_instruments level (normal RI on a table that is an inheritance parent does not really work)

--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Sándor Miglécz <sandor(at)cybertec(dot)at>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject:	Re: Partitioning/inherited tables vs FKs
Date:	2010-05-18 10:20:51
Message-ID:	AANLkTilQrV6i8AG1bMIGYU7WEXrhu8x6gLtCqu-VnUh8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 6, 2010 at 10:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> * the index grows as the size of the total data set, it's not limited
> by partition size
>
> * can't cheaply drop one partition any more, you have to vacuum the
> (big) index first

So I wholeheartedly agree with the general sentiment that if you need
global indexes then partitioning just isn't really the right tool for
you.

But it occurs to me that we could defer the vacuum safely. I'm
assuming a index heap-tid pointer for a global index would include a
relid or some other identifier to specify which partition the tuple is
in. If you drop that partition those can all just be left as dangling
pointers as long as we don't reuse that id. So all we would need is
some way to leave a catalog entry reserving that id. The data files
can be truncated and deleted normally and whenever vacuum does run
against the index it can clean up the catalog entries for the deleted
partitions.

But I would rather work on having unique and foreign key constraints
that work on keys which include the partition key than work on global
indexes.
--
greg