Re: Question about RI checks

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Nick Barnes <nickbarnes01(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Question about RI checks
Date: 2014-10-24 11:53:47
Message-ID: AC19E024-578A-4E1A-87DC-BF068094785A@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Oct23, 2014, at 17:45 , Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Every way I look at it, inside a REPEATABLE READ or SERIALIZABLE
> transaction a check for child rows when validating a parent DELETE
> should consider both rows which exist according to the transaction
> snapshot and according to a "current" snapshot.

I've pondered this further, and unfortunately it seems that this
isn't sufficient to guarantee true serializability :-(

Verifying that both snapshots contain exactly the same rows does not
prevent a child row from being inserted and immediately deleted again,
not if both actions happen *after* the parent-updating transaction
took its snapshot, but *before* it takes the crosscheck snapshot.

Let parent(id) and child(id, parent_id) again be two tables with a
FK constraint between them, let <child> be initially empty, and let
<parent> contain a single row (1).

Assume PD is a transaction which deletes all rows from <parent>,
CI a transaction which inserts the row (1, 1) into <child>, and
CD a transaction which deletes that row again.

Even with the extended cross-checking you propose, we'd still allow
the following concurrent schedule

1. PD takes snapshot
2. CI starts and completes
3. CD starts and completes
4. PD deletes from <parent> without complaining, since there were
no conflicting rows at time (1), and none at time (4).

So far, all is well. But add two more tables, called <ci_before_pd>
and <pd_before_cd>, both initially containing one row. Let CI scan
<ci_before_pd>, let PD delete from <ci_before_pd> and scan <pd_before_cd>,
and let CD delete from <pd_before_cd>. In the concurrent schedule from
above, CI will see the row in <ci_before_pd>, and PD will delete it, and
PD will see the row in <pd_before_cd> that CD deletes. Note that SSI *will*
allow that schedule to occur without raising a serialization error
The only serial schedule which yields the same results for the various
queries pertaining <ci_before_pd> and <pd_before_cd> is

CI -> PD -> CD,

i.e. PD has to run *between* CI and CD. But in that serial schedule,
PD *should* see a FK key violation, since CI has inserted a child which
CD hasn't yet deleted.

There is thus *no* serial schedule which yields the same results as the
concurrent schedule above for the queries pertaining <parent> and <child>,
*and* for the queries pertaining <ci_before_pd> and <pd_before_cd>, i.e
the concurrent schedule is *not* serializable. Yet even the extended cross-
check won't detect this.

Attached is an isolationtester spec file which implements this example,
and the corresponding out-file which shows that SSI permits the concurrent
schedule. Since SSI doesn't concern itself with RI enforcement queries,
it would also permit that schedule if we extended the cross-check, I think.

(I used REL9_4_STABLE as of today to try this, commit
1cf54b00ba2100083390223a8244430643c1ec07)

best regards,
Florian Pflug

Attachment Content-Type Size
fk-consistency2.spec application/octet-stream 2.0 KB
fk-consistency2.out application/octet-stream 1.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-10-24 11:58:07 Re: Function array_agg(array)
Previous Message Michael Paquier 2014-10-24 11:46:19 Re: Function array_agg(array)