Re: FOR KEY LOCK foreign keys

Lists:	pgsql-hackers

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	FOR KEY LOCK foreign keys
Date:	2011-01-13 21:58:09
Message-ID:	1294953201-sup-2099@alvh.no-ip.org

Hello,

As previously commented, here's a proposal with patch to turn foreign
key checks into something less intrusive.

The basic idea, as proposed by Simon Riggs, was discussed in a previous
pgsql-hackers thread here:
http://archives.postgresql.org/message-id/AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com

It goes like this: instead of acquiring a shared lock on the involved
tuple, we only acquire a "key lock", that is, something that prevents
the tuple from going away entirely but not from updating fields that are
not covered by any unique index.

As discussed, this is still more restrictive than necessary (we could
lock only those columns that are involved in the foreign key being
checked), but that has all sorts of implementation level problems, so we
settled for this, which is still much better than the current state of
affairs.

I published about this here:
http://commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/

So, as a rough design,

1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK
2. This will acquire a new type of lock in the tuple, dubbed a "keylock".
3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE.
4. It also conflicts with UPDATE if the UPDATE modifies an attribute
indexed by a unique index.
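
Rule 4 can be sketched as a set-overlap test. This is only an illustration, assuming column sets are represented as bitmasks; the type and function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: bit n set means column n is in the set. */
typedef uint32_t colset;

/*
 * Returns true when an UPDATE would conflict with a key lock on the tuple,
 * i.e. when at least one modified column is covered by a unique index.
 * An UPDATE that touches only other columns is allowed to proceed.
 */
bool
update_conflicts_with_keylock(colset modified_cols, colset unique_index_cols)
{
    return (modified_cols & unique_index_cols) != 0;
}
```

With column 0 as the only uniquely indexed column, an update of column 1 alone would not conflict, while an update touching column 0 would.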

Here's a patch for this, on which I need to do some more testing and
update docs.

Some patch details:

1. We use a new bit in t_infomask for HEAP_XMAX_KEY_LOCK, 0x0010.
2. Key-locking a tuple means setting the XMAX_KEY_LOCK bit, and setting the
Xmax to the locker (just like the other lock marks). If the tuple is
already key-locked, a MultiXactId needs to be created from the
original locker(s) and the new transaction.
3. The original tuple needs to be marked with the Cmax of the locking
command, to prevent it from being seen in the same transaction.
4. A non-conflicting update to the tuple must carry forward some fields
from the original tuple into the updated copy. Those include Xmax,
XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.
5. We check for the is-indexed condition early in heap_update. This
check is independent of the HOT check, which occurs later in the
routine.
6. The relcache entry now keeps two lists of indexed attributes; the new
one only covers unique indexes. Both lists are built in a single
pass over the index list and saved in the relcache entry, so a
heap_update call only does this once. The main difference between
the two checks is that the one for HOT is done after the tuple has
been toasted. This cannot be done for this check, because the
toaster runs too late. This means some work is duplicated. We
could optimize this further.

Something else that might be of interest: the patch as presented here
does NOT solve the deadlock problem originally presented by Joel
Jacobson. It does solve the second, simpler example I presented in my
blog article referenced above, however. I need to have a closer look at
that problem to figure out if we could fix the deadlock too.

I need to thank Simon Riggs for the original idea, and Robert Haas for
some thoughtful discussion on IM that helped me figure out some
roadblocks. Of course, without the pgsql-hackers discussion there
wouldn't be any patch at all.

I also have to apologize to everyone for the lateness in this. Some
severe illness brought me down, then the holiday season slowed
everything almost to a halt, then a rushed but very much welcome move to
a larger house prevented me from dedicating the time I originally
intended. All those things are settled now, hopefully.

--
Álvaro Herrera

Attachment	Content-Type	Size
fklocks.patch	application/octet-stream	66.3 KB

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joel Jacobson <joel(at)gluefinance(dot)com>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-01-14 18:00:48
Message-ID:	98402EDE-8D91-41BE-8387-2C687F14265B@kineticode.com

On Jan 13, 2011, at 1:58 PM, Alvaro Herrera wrote:

> Something else that might be of interest: the patch as presented here
> does NOT solve the deadlock problem originally presented by Joel
> Jacobson. It does solve the second, simpler example I presented in my
> blog article referenced above, however. I need to have a closer look at
> that problem to figure out if we could fix the deadlock too.

Sounds like a big win already. Should this be considered a WIP patch, though, if you still plan to look at Joel's deadlock example?

Best,

David

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joel Jacobson <joel(at)gluefinance(dot)com>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-01-14 18:08:27
Message-ID:	AANLkTi=+=5fqPRaWa_YKrQA3BYKmZubfPoL7Mbb=W9+4@mail.gmail.com

On Fri, Jan 14, 2011 at 1:00 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> On Jan 13, 2011, at 1:58 PM, Alvaro Herrera wrote:
>
>> Something else that might be of interest: the patch as presented here
>> does NOT solve the deadlock problem originally presented by Joel
>> Jacobson. It does solve the second, simpler example I presented in my
>> blog article referenced above, however. I need to have a closer look at
>> that problem to figure out if we could fix the deadlock too.
>
> Sounds like a big win already. Should this be considered a WIP patch, though, if you still plan to look at Joel's deadlock example?

Alvaro, are you planning to add this to the CF?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	David E(dot) Wheeler <david(at)kineticode(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joel Jacobson <joel(at)gluefinance(dot)com>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-01-14 18:14:18
Message-ID:	1295028828-sup-2187@alvh.no-ip.org

Excerpts from David E. Wheeler's message of Fri Jan 14 15:00:48 -0300 2011:
> On Jan 13, 2011, at 1:58 PM, Alvaro Herrera wrote:
>
> > Something else that might be of interest: the patch as presented here
> > does NOT solve the deadlock problem originally presented by Joel
> > Jacobson. It does solve the second, simpler example I presented in my
> > blog article referenced above, however. I need to have a closer look at
> > that problem to figure out if we could fix the deadlock too.
>
> Sounds like a big win already. Should this be considered a WIP patch, though, if you still plan to look at Joel's deadlock example?

Not necessarily -- we can implement that as a later refinement/improvement.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	David E(dot) Wheeler <david(at)kineticode(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joel Jacobson <joel(at)gluefinance(dot)com>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-01-14 18:14:27
Message-ID:	1295028859-sup-6640@alvh.no-ip.org

Excerpts from Robert Haas's message of Fri Jan 14 15:08:27 -0300 2011:
> On Fri, Jan 14, 2011 at 1:00 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> > On Jan 13, 2011, at 1:58 PM, Alvaro Herrera wrote:
> >
> >> Something else that might be of interest: the patch as presented here
> >> does NOT solve the deadlock problem originally presented by Joel
> >> Jacobson. It does solve the second, simpler example I presented in my
> >> blog article referenced above, however. I need to have a closer look at
> >> that problem to figure out if we could fix the deadlock too.
> >
> > Sounds like a big win already. Should this be considered a WIP patch, though, if you still plan to look at Joel's deadlock example?
>
> Alvaro, are you planning to add this to the CF?

Eh, yes.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-01-22 21:25:26
Message-ID:	m21v44y6k9.fsf@2ndQuadrant.fr

Hi,

This is a first-level review of the patch. In the end I didn't get as
much time as I had hoped, so I couldn't get familiar with the locking
internals and machinery; as a result, I can't comment much on the code.

The patch applies cleanly (patch moves one hunk all by itself) and
compiles with no warnings. It includes no docs, and I think the new
user-visible SELECT … FOR KEY LOCK OF x feature will need to be
documented.

Code-wise, I have very few comments. Reading the patch, the new code
looks as if it had been there from the beginning. I only have one
question, about a variable name:

! COPY_SCALAR_FIELD(forUpdate);
! COPY_SCALAR_FIELD(strength);

forUpdate used to be a boolean; strength is now one of LCS_FORUPDATE,
LCS_FORSHARE or LCS_FORKEYLOCK. I wonder whether that's a fortunate
choice of name, but IANANS (I Am Not A Native Speaker).

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> As previously commented, here's a proposal with patch to turn foreign
> key checks into something less intrusive.
>
> The basic idea, as proposed by Simon Riggs, was discussed in a previous
> pgsql-hackers thread here:
> http://archives.postgresql.org/message-id/AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com

This link here provides a test case that will issue a deadlock, and

> Something else that might be of interest: the patch as presented here
> does NOT solve the deadlock problem originally presented by Joel

Indeed, that's the first thing I tried… I'm not sure why fixing the
deadlock issue wouldn't be within the scope of this patch.

The thing that I'm able to confirm by running this test case is that the
RI trigger check is done with the new code from the patch:

CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."a" x WHERE "aid" OPERATOR(pg_catalog.=) $1 FOR KEY LOCK OF x"

Sorry for not posting more tests yet, but given how late I am in finding
the time for this first-level review, I figured I might as well send what
I have already. I will try some other test cases, but sure enough, that
should be part of the user level documentation…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-01-23 01:46:17
Message-ID:	AANLkTik7nPN10As8-6=hTrPEw5aZTc0a-LAQA3HwTYPv@mail.gmail.com

On Sat, Jan 22, 2011 at 4:25 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Hi,
>
> This is a first level of review for the patch. I finally didn't get as
> much time as I hoped I would, so couldn't get familiar with the locking
> internals and machinery… as a result, I can't much comment on the code.
>
> The patch applies cleanly (patch moves one hunk all by itself) and
> compiles with no warning. It includes no docs, and I think it will be
> required to document the user visible SELECT … FOR KEY LOCK OF x new
> feature.

I feel like this should be called "KEY SHARE" rather than "KEY LOCK".
It's essentially a weaker version of the SHARE lock we have now, but
that's not clear from the name.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Marti Raudsepp <marti(at)juffo(dot)org>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-01-28 20:42:20
Message-ID:	AANLkTi=X4tghfgiXJ-eht52+_c3aWCamiWhwMCToq7LO@mail.gmail.com

On Thu, Jan 13, 2011 at 23:58, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> It goes like this: instead of acquiring a shared lock on the involved
> tuple, we only acquire a "key lock", that is, something that prevents
> the tuple from going away entirely but not from updating fields that are
> not covered by any unique index.
>
> As discussed, this is still more restrictive than necessary (we could
> lock only those columns that are involved in the foreign key being
> checked), but that has all sorts of implementation level problems, so we
> settled for this, which is still much better than the current state of
> affairs.

Seems to me that you can go a bit further without much trouble, if you
only consider indexes that *can* be referenced by foreign keys --
indexes that don't have expressions or predicates.

I frequently create unique indexes on (lower(name)) where I want
case-insensitive unique indexes, or use predicates like WHERE
deleted=false to allow duplicates after deleting the old item.

So, instead of:
    if (indexInfo->ii_Unique)
you can write:
    if (indexInfo->ii_Unique
        && indexInfo->ii_Expressions == NIL
        && indexInfo->ii_Predicate == NIL)

This would slightly simplify RelationGetIndexAttrBitmap() because you
no longer have to worry about including columns that are part of index
expressions/predicates.
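
As a standalone illustration of that test, here is a sketch using a hypothetical stand-in struct rather than the real IndexInfo; only the three field names come from the snippet above, and NULL plays the role of NIL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three IndexInfo fields used above. */
typedef struct
{
    bool        ii_Unique;
    const void *ii_Expressions;  /* NULL stands in for NIL (no expressions) */
    const void *ii_Predicate;    /* NULL stands in for NIL (not partial) */
} MockIndexInfo;

/*
 * Returns true only for plain unique indexes: no expression columns and
 * no partial-index predicate, which is the kind a foreign key can reference.
 */
bool
index_can_back_foreign_key(const MockIndexInfo *info)
{
    return info->ii_Unique
        && info->ii_Expressions == NULL
        && info->ii_Predicate == NULL;
}
```

Under this filter, a unique index on (lower(name)) or one with WHERE deleted=false would be left out of the key-column set.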

I guess rd_uindexattr should be renamed to something like
rd_keyindexattr or rd_keyattr.

Is this worthwhile? I can write and submit a patch if it sounds good.

Regards,
Marti

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-11 07:13:22
Message-ID:	20110211071322.GB26971@tornado.leadboat.com

Hi Alvaro,

On Thu, Jan 13, 2011 at 06:58:09PM -0300, Alvaro Herrera wrote:
> As previously commented, here's a proposal with patch to turn foreign
> key checks into something less intrusive.
>
> The basic idea, as proposed by Simon Riggs, was discussed in a previous
> pgsql-hackers thread here:
> http://archives.postgresql.org/message-id/AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
>
> It goes like this: instead of acquiring a shared lock on the involved
> tuple, we only acquire a "key lock", that is, something that prevents
> the tuple from going away entirely but not from updating fields that are
> not covered by any unique index.

First off, this is highly valuable work. My experience echoes that of some
other commenter (I *think* it was Josh Berkus, but I can't find the original
reference now): this is the #1 cause of production deadlocks. To boot, the
patch is small and fits cleanly into the current code.

The patch had a trivial conflict in planner.c, plus plenty of offsets. I've
attached the rebased patch that I used for review. For anyone following along,
all the interesting hunks touch heapam.c; the rest is largely mechanical. A
"diff -w" patch is also considerably easier to follow.

Incidentally, HeapTupleSatisfiesMVCC has some bits of code like this (not new):

/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);

They're specifically only allowed for SHARE and KEY locks, right?
heap_lock_tuple seems to assume as much.

Having read [1], I tried to work out what kind of table-level lock we must hold
before proceeding with a DDL operation that changes the set of "key" columns.
The thing we must prevent is an UPDATE making a concurrent decision about its
need to conflict with a FOR KEY LOCK lock. Therefore, it's sufficient for the
DDL to take ShareLock. CREATE INDEX does just this, so we're good.

[1] http://archives.postgresql.org/message-id/22196.1282757644@sss.pgh.pa.us

I observe visibility breakage with this test case:

-- Setup
BEGIN;
DROP TABLE IF EXISTS child, parent;
CREATE TABLE parent (
    parent_key int PRIMARY KEY,
    aux text NOT NULL
);
CREATE TABLE child (
    child_key int PRIMARY KEY,
    parent_key int NOT NULL REFERENCES parent
);
INSERT INTO parent VALUES (1, 'foo');
COMMIT;
TABLE parent; -- set hint bit
SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));
to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
902 | 1 | 8160 | 1 | 32 | 1125 | 0 | 33 | (0,1) | 2 | 2306 | 24 | NULL | NULL

-- Interleaved part
P0:
BEGIN;
INSERT INTO child VALUES (1, 1);
P1:
BEGIN;
SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));
to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
112 | 1 | 8160 | 1 | 32 | 1125 | 1126 | 33 | (0,1) | 2 | 274 | 24 | NULL | NULL
UPDATE parent SET aux = 'baz'; -- UPDATE 1
TABLE parent; -- 0 rows
SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));
to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
102 | 1 | 8160 | 1 | 32 | 1125 | 1128 | 0 | (0,2) | 16386 | 258 | 24 | NULL | NULL
2012 | 2 | 8128 | 1 | 32 | 1128 | 1126 | 2249 | (0,2) | -32766 | 8210 | 24 | NULL | NULL

The problem seems to be that funny t_cid (2249). Tracing through heap_update,
the new code is not setting t_cid during this test case.
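
For anyone decoding the to_hex(t_infomask) values above by hand, a small sketch follows. The bit values are the htup.h definitions as I recall them for this era, plus the 0x0010 bit the patch proposes; the helper function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAP_HASVARWIDTH     0x0002
#define HEAP_XMAX_KEY_LOCK   0x0010  /* new bit proposed by the patch */
#define HEAP_XMIN_COMMITTED  0x0100
#define HEAP_XMAX_INVALID    0x0800
#define HEAP_UPDATED         0x2000

/* True when the tuple header carries the proposed key-lock mark. */
bool
infomask_is_key_locked(uint16_t infomask)
{
    return (infomask & HEAP_XMAX_KEY_LOCK) != 0;
}

/* Returns whichever bits of infomask are not among the five named above. */
uint16_t
infomask_unexplained_bits(uint16_t infomask)
{
    const uint16_t known = HEAP_HASVARWIDTH | HEAP_XMAX_KEY_LOCK |
                           HEAP_XMIN_COMMITTED | HEAP_XMAX_INVALID |
                           HEAP_UPDATED;

    return (uint16_t) (infomask & (uint16_t) ~known);
}
```

Read this way, 0x902 is HASVARWIDTH | XMIN_COMMITTED | XMAX_INVALID, 0x112 swaps XMAX_INVALID for the key-lock bit, 0x102 is the old tuple after the update, and 0x2012 is the new tuple with HEAP_UPDATED and the carried-forward key-lock bit.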

My own deadlock test case, which is fixed by the patch, uses the same setup.
Its interleaved part is as follows:

P0: INSERT INTO child VALUES (1, 1);
P1: INSERT INTO child VALUES (2, 1);
P0: UPDATE parent SET aux = 'bar';
P1: UPDATE parent SET aux = 'baz';

> As discussed, this is still more restrictive than necessary (we could
> lock only those columns that are involved in the foreign key being
> checked), but that has all sorts of implementation level problems, so we
> settled for this, which is still much better than the current state of
> affairs.

Agreed. What about locking only the columns that are actually used in any
incoming foreign key (not just the FK in question at the time)? We'd just have
more work to do on a cold relcache, a pg_depend scan per unique index.

Usually, each of my tables has no more than one candidate key referenced by
FOREIGN KEY constraints: the explicit or notional primary key. I regularly add
UNIQUE indexes not used by any foreign key, though. YMMV. Given this
optimization, constraining the lock even further by individual FOREIGN KEY
constraint would be utterly unimportant for my databases.

> I published about this here:
> http://commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
>
> So, as a rough design,
>
> 1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK
> 2. This will acquire a new type of lock in the tuple, dubbed a "keylock".
> 3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE.

It does not conflict with SELECT FOR SHARE, does it?

> 4. It also conflicts with UPDATE if the UPDATE modifies an attribute
> indexed by a unique index.

This is the per-tuple lock conflict table before your change:

FOR SHARE conflicts with FOR UPDATE
FOR UPDATE conflicts with FOR UPDATE and FOR SHARE

After:

FOR KEY LOCK conflicts with FOR UPDATE
FOR SHARE conflicts with FOR UPDATE
FOR UPDATE conflicts with FOR UPDATE, FOR SHARE, (FOR KEY LOCK if cols <@ keycols)

The odd thing here is the checking of an outside condition to decide whether
locks conflict. Normally, to get a different conflict list, we add another lock
type. What about this?

FOR KEY SHARE conflicts with FOR KEY UPDATE
FOR SHARE conflicts with FOR KEY UPDATE, FOR UPDATE
FOR UPDATE conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE
FOR KEY UPDATE conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE, FOR KEY SHARE

This would also fix Joel's test case. A disadvantage is that we'd check for
changes in FK-referenced columns even when there's no key lock activity.
That seems acceptable, but it's a point for debate.
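
To make the four-mode proposal concrete, here is a sketch of the conflict list written as a lookup table. The enum and function names are hypothetical; only the conflict entries come from the list above.

```c
#include <stdbool.h>

/* Hypothetical names for the four proposed tuple lock strengths. */
typedef enum
{
    TL_KEY_SHARE = 0,
    TL_SHARE,
    TL_UPDATE,
    TL_KEY_UPDATE,
    TL_NMODES
} TupleLockStrength;

/* conflict_table[held][wanted] is true when the two locks conflict. */
static const bool conflict_table[TL_NMODES][TL_NMODES] = {
    /*                 KEY SHARE  SHARE   UPDATE  KEY UPDATE */
    /* KEY SHARE  */ { false,     false,  false,  true },
    /* SHARE      */ { false,     false,  true,   true },
    /* UPDATE     */ { false,     true,   true,   true },
    /* KEY UPDATE */ { true,      true,   true,   true },
};

bool
tuple_locks_conflict(TupleLockStrength held, TupleLockStrength wanted)
{
    return conflict_table[held][wanted];
}
```

The table is symmetric, and the conflict decision depends only on the two lock types rather than on an outside condition such as which columns changed.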

Either way, SELECT ... FOR UPDATE will probably end up different than a true
update. The full behavior relies on having an old tuple to bear the UPDATE lock
and a new tuple to bear the KEY lock. In the current patch, SELECT ... FOR
UPDATE blocks on KEY just like SHARE. So there will be that wart in the
conflict lists, no matter what.

> Here's a patch for this, on which I need to do some more testing and
> update docs.
>
> Some patch details:
>
> 1. We use a new bit in t_infomask for HEAP_XMAX_KEY_LOCK, 0x0010.
> 2. Key-locking a tuple means setting the XMAX_KEY_LOCK bit, and setting the
> Xmax to the locker (just like the other lock marks). If the tuple is
> already key-locked, a MultiXactId needs to be created from the
> original locker(s) and the new transaction.

Makes sense.

> 3. The original tuple needs to be marked with the Cmax of the locking
> command, to prevent it from being seen in the same transaction.

Could you elaborate on this requirement?

> 4. A non-conflicting update to the tuple must carry forward some fields
> from the original tuple into the updated copy. Those include Xmax,
> XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.

HeapTupleHeaderGetCmax() has this assertion:

/* We do not store cmax when locking a tuple */
Assert(!(tup->t_infomask & (HEAP_MOVED | HEAP_IS_LOCKED)));

Assuming that assertion is still valid, there will never be a HEAP_COMBOCID flag
to copy. Right?

> 5. We check for the is-indexed condition early in heap_update. This
> check is independent of the HOT check, which occurs later in the
> routine.
> 6. The relcache entry now keeps two lists of indexed attributes; the new
> one only covers unique indexes. Both lists are built in a single
> pass over the index list and saved in the relcache entry, so a
> heap_update call only does this once. The main difference between
> the two checks is that the one for HOT is done after the tuple has
> been toasted. This cannot be done for this check, because the
> toaster runs too late. This means some work is duplicated. We
> could optimize this further.

Seems reasonable.

One thing that helped me to think through Joel's test case is that the two
middle statements take tuple-level locks, but that's inessential. Granted, FOR
UPDATE tuple locks are by far the most common kind of blocking in production.
Here's another formulation that also still gets a deadlock:

P1: BEGIN;
P2: BEGIN;
P1: UPDATE A SET Col1 = 1 WHERE AID = 1; -- FOR UPDATE tuple lock
P2: LOCK TABLE pg_am IN ROW SHARE MODE
P1: LOCK TABLE pg_am IN ROW SHARE MODE -- blocks
P2: UPDATE B SET Col2 = 1 WHERE BID = 2; -- blocks for KEY => deadlock

As best I can tell, the explanation is that this patch only improves things when
the FOR KEY LOCK precedes the FOR UPDATE. Splitting out FOR KEY UPDATE fixes
that. It would also optimize this complement to your own blog post example,
which still blocks needlessly:

-- Session 1
CREATE TABLE foo (a int PRIMARY KEY, b text);
CREATE TABLE bar (a int NOT NULL REFERENCES foo);
INSERT INTO foo VALUES (42);

BEGIN;
UPDATE foo SET b = 'Hello World' ;

-- Session 2
INSERT INTO bar VALUES (42);

Automated tests would go a long way toward building confidence that this patch
does the right thing. Thanks to the SSI patch, we now have an in-tree test
framework for testing interleaved transactions. The only thing it needs to be
suitable for this work is a way to handle blocked commands. If you like, I can
try to whip something up for that.

Hunk-specific comments (based on diff -w version of patch):

> *** a/src/backend/access/heap/heapam.c
> --- b/src/backend/access/heap/heapam.c

> ***************
> *** 2484,2489 **** l2:
> --- 2487,2508 ----
> xwait = HeapTupleHeaderGetXmax(oldtup.t_data);
> infomask = oldtup.t_data->t_infomask;
>
> + /*
> + * if it's only key-locked and we're not updating an indexed column,
> + * we can act though MayBeUpdated was returned, but the resulting tuple
> + * needs a bunch of fields copied from the original.
> + */
> + if ((infomask & HEAP_XMAX_KEY_LOCK) &&
> + !(infomask & HEAP_XMAX_SHARED_LOCK) &&
> + HeapSatisfiesHOTUpdate(relation, keylck_attrs,
> + &oldtup, newtup))
> + {
> + result = HeapTupleMayBeUpdated;
> + keylocked_update = true;
> + }

The condition for getting here is "result == HeapTupleBeingUpdated && wait". If
!wait, we'd never get the chance to see if this would avoid the wait. Currently
all callers pass wait = true, so this is academic.

> +
> + if (!keylocked_update)
> + {
> LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>
> /*
> ***************
> *** 2563,2568 **** l2:
> --- 2582,2588 ----
> else
> result = HeapTupleUpdated;
> }
> + }
>
> if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
> {
> ***************
> *** 2609,2621 **** l2:
>
> newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
> newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
> ! newtup->t_data->t_infomask |= (HEAP_XMAX_INVALID | HEAP_UPDATED);
> HeapTupleHeaderSetXmin(newtup->t_data, xid);
> - HeapTupleHeaderSetCmin(newtup->t_data, cid);
> - HeapTupleHeaderSetXmax(newtup->t_data, 0); /* for cleanliness */
> newtup->t_tableOid = RelationGetRelid(relation);
>
> /*
> * Replace cid with a combo cid if necessary. Note that we already put
> * the plain cid into the new tuple.
> */
> --- 2629,2671 ----
>
> newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
> newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
> ! newtup->t_data->t_infomask |= HEAP_UPDATED;
> HeapTupleHeaderSetXmin(newtup->t_data, xid);
> newtup->t_tableOid = RelationGetRelid(relation);
>
> /*
> + * If this update is touching a tuple that was key-locked, we need to
> + * carry forward some bits from the old tuple into the new copy.
> + */
> + if (keylocked_update)
> + {
> + HeapTupleHeaderSetXmax(newtup->t_data,
> + HeapTupleHeaderGetXmax(oldtup.t_data));
> + newtup->t_data->t_infomask |= (oldtup.t_data->t_infomask &
> + (HEAP_XMAX_IS_MULTI |
> + HEAP_XMAX_KEY_LOCK));
> + /*
> + * we also need to copy the combo CID stuff, but only if the original
> + * tuple was created by us; otherwise the combocid module complains
> + * (Alternatively we could use HeapTupleHeaderGetRawCommandId)
> + */

This comment should describe why it's correct, not just indicate that another
module complains if we do otherwise.

> + if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(oldtup.t_data)))
> + {
> + newtup->t_data->t_infomask |= (oldtup.t_data->t_infomask &
> + HEAP_COMBOCID);

HeapTupleHeaderSetCmin unsets HEAP_COMBOCID, so this is a no-op.

> + HeapTupleHeaderSetCmin(newtup->t_data,
> + HeapTupleHeaderGetCmin(oldtup.t_data));
> + }
> +
> + }
> + else
> + {
> + newtup->t_data->t_infomask |= HEAP_XMAX_INVALID;
> + HeapTupleHeaderSetXmax(newtup->t_data, 0); /* for cleanliness */
> + HeapTupleHeaderSetCmin(newtup->t_data, cid);
> + }

As mentioned above, this code can fail to set Cmin entirely.

> +
> + /*
> * Replace cid with a combo cid if necessary. Note that we already put
> * the plain cid into the new tuple.
> */
> ***************
> *** 3142,3148 **** heap_lock_tuple(Relation relation, HeapTuple tuple, Buffer *buffer,
> LOCKMODE tuple_lock_type;
> bool have_tuple_lock = false;
>
> ! tuple_lock_type = (mode == LockTupleShared) ? ShareLock : ExclusiveLock;
>
> *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> --- 3192,3211 ----
> LOCKMODE tuple_lock_type;
> bool have_tuple_lock = false;
>
> ! /* in FOR KEY LOCK mode, we use a share lock temporarily */

I found this comment confusing. The first several times I read it, I thought it
meant that we start out by setting HEAP_XMAX_SHARED_LOCK in the tuple, then
downgrade it. However, this is talking about the ephemeral heavyweight lock.
Maybe it's just me, but consider deleting this comment.

> ! switch (mode)
> ! {
> ! case LockTupleShared:
> ! case LockTupleKeylock:
> ! tuple_lock_type = ShareLock;
> ! break;
> ! case LockTupleExclusive:
> ! tuple_lock_type = ExclusiveLock;
> ! break;
> ! default:
> ! elog(ERROR, "invalid tuple lock mode");
> ! tuple_lock_type = 0; /* keep compiler quiet */
> ! }
>
> *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> ***************
> *** 3175,3192 **** l3:
> LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
>
> /*
> ! * If we wish to acquire share lock, and the tuple is already
> ! * share-locked by a multixact that includes any subtransaction of the
> * current top transaction, then we effectively hold the desired lock
> * already. We *must* succeed without trying to take the tuple lock,
> * else we will deadlock against anyone waiting to acquire exclusive
> * lock. We don't need to make any state changes in this case.
> */
> ! if (mode == LockTupleShared &&
> (infomask & HEAP_XMAX_IS_MULTI) &&
> MultiXactIdIsCurrent((MultiXactId) xwait))
> {
> ! Assert(infomask & HEAP_XMAX_SHARED_LOCK);
> /* Probably can't hold tuple lock here, but may as well check */
> if (have_tuple_lock)
> UnlockTuple(relation, tid, tuple_lock_type);
> --- 3238,3255 ----
> LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
>
> /*
> ! * If we wish to acquire a key or share lock, and the tuple is already
> ! * share- or key-locked by a multixact that includes any subtransaction of the
> * current top transaction, then we effectively hold the desired lock
> * already. We *must* succeed without trying to take the tuple lock,
> * else we will deadlock against anyone waiting to acquire exclusive
> * lock. We don't need to make any state changes in this case.
> */
> ! if ((mode == LockTupleShared || mode == LockTupleKeylock) &&
> (infomask & HEAP_XMAX_IS_MULTI) &&
> MultiXactIdIsCurrent((MultiXactId) xwait))
> {
> ! Assert(infomask & HEAP_IS_SHARE_LOCKED);
> /* Probably can't hold tuple lock here, but may as well check */
> if (have_tuple_lock)
> UnlockTuple(relation, tid, tuple_lock_type);

If we're upgrading from KEY LOCK to a SHARE, we can't take this shortcut. At a
minimum, we need to update t_infomask.

Then there's a choice: do we queue up normally and risk deadlock, or do we skip
the heavyweight lock queue and risk starvation? Your last blog post suggests a
preference for the latter. I haven't formed a strong preference, but given this
behavior, ...

P0: FOR SHARE -- acquired
P1: UPDATE -- blocks
P2: FOR SHARE -- blocks

... I'm not sure why making the first lock FOR KEY LOCK ought to change things.

Some documentation may be in order about the deadlock hazards of mixing FOR
SHARE locks with foreign key usage.

> ***************
> *** 3217,3226 **** l3:
> have_tuple_lock = true;
> }
>
> ! if (mode == LockTupleShared && (infomask & HEAP_XMAX_SHARED_LOCK))
> {
> /*
> ! * Acquiring sharelock when there's at least one sharelocker
> * already. We need not wait for him/them to complete.
> */
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> --- 3280,3290 ----
> have_tuple_lock = true;
> }
>
> ! if ((mode == LockTupleShared || mode == LockTupleKeylock) &&
> ! (infomask & HEAP_IS_SHARE_LOCKED))
> {
> /*
> ! * Acquiring sharelock or keylock when there's at least one such locker
> * already. We need not wait for him/them to complete.
> */
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);

Likewise: we cannot implicitly upgrade someone else's KEY LOCK to SHARE.
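
To make the two upgrade remarks above concrete, here is a minimal sketch (not code from the patch) of the test that both shortcuts would need before skipping the t_infomask update. Only the bit values come from the htup.h hunk; the helper name recorded_lock_covers and the enum's numeric values are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the patch's htup.h hunk. */
#define HEAP_XMAX_KEY_LOCK    0x0010
#define HEAP_XMAX_SHARED_LOCK 0x0080
#define HEAP_IS_SHARE_LOCKED  (HEAP_XMAX_SHARED_LOCK | HEAP_XMAX_KEY_LOCK)

/* Stand-in for the patch's lock modes; the numeric values are assumed. */
typedef enum
{
    LockTupleKeylock,
    LockTupleShared,
    LockTupleExclusive
} LockTupleMode;

/*
 * Hypothetical helper: given a tuple whose xmax already records a key or
 * share lock, report whether that recorded lock is strong enough for the
 * requested mode, so that t_infomask can be left untouched.
 */
bool
recorded_lock_covers(LockTupleMode mode, uint16_t infomask)
{
    /* Callers only ask about tuples that are key- or share-locked. */
    assert(infomask & HEAP_IS_SHARE_LOCKED);

    switch (mode)
    {
        case LockTupleKeylock:
            /* Either recorded strength is at least as strong as a key lock. */
            return true;
        case LockTupleShared:
            /* A recorded key lock alone is weaker than the share lock wanted. */
            return (infomask & HEAP_XMAX_SHARED_LOCK) != 0;
        case LockTupleExclusive:
            /* Shared-style locks never cover an exclusive request. */
            return false;
    }
    return false;               /* unreachable for valid modes */
}
```

With a check of this shape, a share-lock request against a tuple that is only key-locked falls through to the normal path, which is where the infomask would be rewritten.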

> ***************
> *** 3476,3482 **** l3:
> xlrec.target.tid = tuple->t_self;
> xlrec.locking_xid = xid;
> xlrec.xid_is_mxact = ((new_infomask & HEAP_XMAX_IS_MULTI) != 0);
> ! xlrec.shared_lock = (mode == LockTupleShared);
> rdata[0].data = (char *) &xlrec;
> rdata[0].len = SizeOfHeapLock;
> rdata[0].buffer = InvalidBuffer;
> --- 3543,3549 ----
> xlrec.target.tid = tuple->t_self;
> xlrec.locking_xid = xid;
> xlrec.xid_is_mxact = ((new_infomask & HEAP_XMAX_IS_MULTI) != 0);
> ! xlrec.lock_strength = mode == LockTupleShared ? 's' : mode == LockTupleKeylock ? 'k' : 'x';

Seems strange having these character literals. Why not just cast the mode to a
char? Could even set the enum values to the ASCII values of those characters,
if you were so inclined. Happily, they fall in the right order.

> rdata[0].data = (char *) &xlrec;
> rdata[0].len = SizeOfHeapLock;
> rdata[0].buffer = InvalidBuffer;
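
The enum-as-ASCII suggestion above could look roughly like the following. This is a sketch under the assumption that nothing else depends on LockTupleMode starting at zero (for example as an array index); the two function names are hypothetical.

```c
#include <assert.h>

/*
 * Assumed variant of LockTupleMode whose values are the ASCII codes of the
 * WAL characters.  'k' (107) < 's' (115) < 'x' (120), so weaker modes still
 * compare lower than stronger ones.
 */
typedef enum
{
    LockTupleKeylock = 'k',
    LockTupleShared = 's',
    LockTupleExclusive = 'x'
} LockTupleMode;

/* Encode the mode for the WAL record with a plain cast. */
char
lock_strength_from_mode(LockTupleMode mode)
{
    return (char) mode;
}

/* Decode the character read back from the WAL record. */
LockTupleMode
mode_from_lock_strength(char strength)
{
    assert(strength == 'k' || strength == 's' || strength == 'x');
    return (LockTupleMode) strength;
}
```

The nested conditional in the hunk then reduces to a single cast, and redo code can cast the stored character back.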

> *** a/src/backend/executor/execMain.c
> --- b/src/backend/executor/execMain.c

> ***************
> *** 112,119 **** lnext:
> /* okay, try to lock the tuple */
> if (erm->markType == ROW_MARK_EXCLUSIVE)
> lockmode = LockTupleExclusive;
> ! else
> lockmode = LockTupleShared;
>
> test = heap_lock_tuple(erm->relation, &tuple, &buffer,
> &update_ctid, &update_xmax,
> --- 112,126 ----
> /* okay, try to lock the tuple */
> if (erm->markType == ROW_MARK_EXCLUSIVE)
> lockmode = LockTupleExclusive;
> ! else if (erm->markType == ROW_MARK_SHARE)
> lockmode = LockTupleShared;
> + else if (erm->markType == ROW_MARK_KEYLOCK)
> + lockmode = LockTupleKeylock;
> + else
> + {
> + elog(ERROR, "unsupported rowmark type");
> + lockmode = LockTupleExclusive; /* keep compiler quiet */
> + }

A switch statement would be more consistent with what you've done elsewhere.
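
For illustration, the same mapping written as a switch might look like this. It is a standalone sketch: the enums are stand-ins with assumed values, lockmode_for_rowmark is a hypothetical name, and the boolean return takes the place of the elog(ERROR) call that backend code would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the backend enums; the values here are assumptions. */
typedef enum
{
    ROW_MARK_EXCLUSIVE,
    ROW_MARK_SHARE,
    ROW_MARK_KEYLOCK,
    ROW_MARK_REFERENCE          /* a mark type that takes no tuple lock */
} RowMarkType;

typedef enum
{
    LockTupleKeylock,
    LockTupleShared,
    LockTupleExclusive
} LockTupleMode;

/*
 * Map a row mark type to a tuple lock mode.  Returns false for mark types
 * that have no tuple lock.
 */
bool
lockmode_for_rowmark(RowMarkType markType, LockTupleMode *lockmode)
{
    assert(lockmode != NULL);

    switch (markType)
    {
        case ROW_MARK_EXCLUSIVE:
            *lockmode = LockTupleExclusive;
            return true;
        case ROW_MARK_SHARE:
            *lockmode = LockTupleShared;
            return true;
        case ROW_MARK_KEYLOCK:
            *lockmode = LockTupleKeylock;
            return true;
        default:
            /* In the backend: elog(ERROR, "unsupported rowmark type"). */
            return false;
    }
}
```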

> *** a/src/backend/nodes/outfuncs.c
> --- b/src/backend/nodes/outfuncs.c

> ***************
> *** 2181,2187 **** _outRowMarkClause(StringInfo str, RowMarkClause *node)
> WRITE_NODE_TYPE("ROWMARKCLAUSE");
>
> WRITE_UINT_FIELD(rti);
> ! WRITE_BOOL_FIELD(forUpdate);
> WRITE_BOOL_FIELD(noWait);
> WRITE_BOOL_FIELD(pushedDown);
> }
> --- 2181,2187 ----
> WRITE_NODE_TYPE("ROWMARKCLAUSE");
>
> WRITE_UINT_FIELD(rti);
> ! WRITE_BOOL_FIELD(strength);

WRITE_ENUM_FIELD?

> WRITE_BOOL_FIELD(noWait);
> WRITE_BOOL_FIELD(pushedDown);
> }
> *** a/src/backend/nodes/readfuncs.c
> --- b/src/backend/nodes/readfuncs.c
> ***************
> *** 299,305 **** _readRowMarkClause(void)
> READ_LOCALS(RowMarkClause);
>
> READ_UINT_FIELD(rti);
> ! READ_BOOL_FIELD(forUpdate);
> READ_BOOL_FIELD(noWait);
> READ_BOOL_FIELD(pushedDown);
>
> --- 299,305 ----
> READ_LOCALS(RowMarkClause);
>
> READ_UINT_FIELD(rti);
> ! READ_BOOL_FIELD(strength);

READ_ENUM_FIELD?
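
A small standalone sketch of why the field type matters here: pushing a three-valued enum through a boolean writer and reader collapses two of the values into one. The four functions below are simplified stand-ins written for this example, not the real outfuncs.c/readfuncs.c macros.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum
{
    LCS_FORKEYLOCK,
    LCS_FORSHARE,
    LCS_FORUPDATE
} LockClauseStrength;

/* Stand-in for a boolean field writer: any non-zero value becomes "true". */
void
write_as_bool(char *buf, size_t len, LockClauseStrength s)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s", s ? "true" : "false");
}

/* Stand-in for a boolean field reader: "true" comes back as 1. */
LockClauseStrength
read_as_bool(const char *token)
{
    return (token[0] == 't') ? (LockClauseStrength) 1 : (LockClauseStrength) 0;
}

/* Stand-in for an enum field writer: store the integer value. */
void
write_as_enum(char *buf, size_t len, LockClauseStrength s)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%d", (int) s);
}

/* Stand-in for an enum field reader: parse the integer back. */
LockClauseStrength
read_as_enum(const char *token)
{
    return (LockClauseStrength) atoi(token);
}
```

In this sketch, LCS_FORUPDATE written as a boolean reads back as LCS_FORSHARE, while the integer round trip preserves it.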

> *** a/src/backend/optimizer/plan/planner.c
> --- b/src/backend/optimizer/plan/planner.c

> ***************
> *** 1887,1896 **** preprocess_rowmarks(PlannerInfo *root)
> newrc = makeNode(PlanRowMark);
> newrc->rti = newrc->prti = rc->rti;
> newrc->rowmarkId = ++(root->glob->lastRowMarkId);
> ! if (rc->forUpdate)
> newrc->markType = ROW_MARK_EXCLUSIVE;
> ! else
> newrc->markType = ROW_MARK_SHARE;
> newrc->noWait = rc->noWait;
> newrc->isParent = false;
>
> --- 1887,1904 ----
> newrc = makeNode(PlanRowMark);
> newrc->rti = newrc->prti = rc->rti;
> newrc->rowmarkId = ++(root->glob->lastRowMarkId);
> ! switch (rc->strength)
> ! {
> ! case LCS_FORUPDATE:
> newrc->markType = ROW_MARK_EXCLUSIVE;
> ! break;
> ! case LCS_FORSHARE:
> newrc->markType = ROW_MARK_SHARE;
> + break;
> + case LCS_FORKEYLOCK:
> + newrc->markType = ROW_MARK_KEYLOCK;
> + break;
> + }

This needs a "default" clause throwing an error. (Seems like the default could
be in #ifdef USE_ASSERT_CHECKING, but we don't seem to ever do that.)

> *** a/src/backend/tcop/utility.c
> --- b/src/backend/tcop/utility.c
> ***************
> *** 2205,2214 **** CreateCommandTag(Node *parsetree)
> else if (stmt->rowMarks != NIL)
> {
> /* not 100% but probably close enough */
> ! if (((RowMarkClause *) linitial(stmt->rowMarks))->forUpdate)
> tag = "SELECT FOR UPDATE";
> ! else
> tag = "SELECT FOR SHARE";
> }
> else
> tag = "SELECT";
> --- 2205,2225 ----
> else if (stmt->rowMarks != NIL)
> {
> /* not 100% but probably close enough */
> ! switch (((RowMarkClause *) linitial(stmt->rowMarks))->strength)
> ! {
> ! case LCS_FORUPDATE:
> tag = "SELECT FOR UPDATE";
> ! break;
> ! case LCS_FORSHARE:
> tag = "SELECT FOR SHARE";
> + break;
> + case LCS_FORKEYLOCK:
> + tag = "SELECT FOR KEY LOCK";
> + break;
> + default:
> + tag = "???";
> + break;

elog(ERROR) in the default clause, perhaps? See earlier comment.

> *** a/src/backend/utils/adt/ruleutils.c
> --- b/src/backend/utils/adt/ruleutils.c
> ***************
> *** 2837,2848 **** get_select_query_def(Query *query, deparse_context *context,
> if (rc->pushedDown)
> continue;
>
> ! if (rc->forUpdate)
> ! appendContextKeyword(context, " FOR UPDATE",
> -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> ! else
> appendContextKeyword(context, " FOR SHARE",
> -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> appendStringInfo(buf, " OF %s",
> quote_identifier(rte->eref->aliasname));
> if (rc->noWait)
> --- 2837,2858 ----
> if (rc->pushedDown)
> continue;
>
> ! switch (rc->strength)
> ! {
> ! case LCS_FORKEYLOCK:
> ! appendContextKeyword(context, " FOR KEY LOCK",
> -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> ! break;
> ! case LCS_FORSHARE:
> appendContextKeyword(context, " FOR SHARE",
> -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> + break;
> + case LCS_FORUPDATE:
> + appendContextKeyword(context, " FOR UPDATE",
> + -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> + break;
> + }

Another switch statement; see earlier comment.

> *** a/src/backend/utils/cache/relcache.c
> --- b/src/backend/utils/cache/relcache.c

> ***************
> *** 3661,3675 **** RelationGetIndexAttrBitmap(Relation relation)
> --- 3665,3688 ----
> int attrnum = indexInfo->ii_KeyAttrNumbers[i];
>
> if (attrnum != 0)
> + {
> indexattrs = bms_add_member(indexattrs,
> attrnum - FirstLowInvalidHeapAttributeNumber);
> + if (indexInfo->ii_Unique)
> + uindexattrs = bms_add_member(uindexattrs,
> + attrnum - FirstLowInvalidHeapAttributeNumber);
> + }
> }
>
> /* Collect all attributes used in expressions, too */
> pull_varattnos((Node *) indexInfo->ii_Expressions, &indexattrs);
> + if (indexInfo->ii_Unique)
> + pull_varattnos((Node *) indexInfo->ii_Expressions, &uindexattrs);

No need; as Marti mentioned, such indexes are not usable for FOREIGN KEY.

>
> /* Collect all attributes in the index predicate, too */
> pull_varattnos((Node *) indexInfo->ii_Predicate, &indexattrs);
> + if (indexInfo->ii_Unique)
> + pull_varattnos((Node *) indexInfo->ii_Predicate, &uindexattrs);

Likewise.
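
As a rough sketch of the narrower rule implied above, the following collects only the columns that a foreign key could reference, skipping unique indexes that carry a predicate or expressions. IndexSketch and fk_referenceable_attrs are hypothetical names, a 64-bit mask stands in for the backend's Bitmapset, and skipping partial indexes entirely is an assumption that goes a step beyond the two hunks quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_KEYS 8

/* Simplified, hypothetical description of one index on a table. */
typedef struct
{
    bool        unique;
    bool        has_predicate;      /* partial index */
    bool        has_expressions;    /* expression index */
    int         nkeys;
    int         key_attnums[SKETCH_MAX_KEYS];  /* 1-based column numbers */
} IndexSketch;

/*
 * Return a bitmask (bit N set for column N) of columns covered by unique
 * indexes that are neither partial nor expression indexes.
 */
uint64_t
fk_referenceable_attrs(const IndexSketch *indexes, int nindexes)
{
    uint64_t    attrs = 0;

    for (int i = 0; i < nindexes; i++)
    {
        const IndexSketch *idx = &indexes[i];

        /* Such indexes cannot back a foreign key, so ignore them. */
        if (!idx->unique || idx->has_predicate || idx->has_expressions)
            continue;

        for (int k = 0; k < idx->nkeys; k++)
        {
            int         attnum = idx->key_attnums[k];

            assert(attnum > 0 && attnum < 64);
            attrs |= UINT64_C(1) << attnum;
        }
    }
    return attrs;
}
```

For a table with a primary key on column 1, a partial unique index on column 2 and a unique expression index, only bit 1 ends up set, so an UPDATE touching only column 2 would not conflict with a key lock.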

> *** a/src/include/access/htup.h
> --- b/src/include/access/htup.h
> ***************
> *** 163,174 **** typedef HeapTupleHeaderData *HeapTupleHeader;
> #define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
> #define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
> #define HEAP_HASOID 0x0008 /* has an object-id field */
> ! /* bit 0x0010 is available */
> #define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
> #define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
> #define HEAP_XMAX_SHARED_LOCK 0x0080 /* xmax is shared locker */
> /* if either LOCK bit is set, xmax hasn't deleted the tuple, only locked it */
> ! #define HEAP_IS_LOCKED (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_SHARED_LOCK)
> #define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
> #define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
> #define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
> --- 163,177 ----
> #define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
> #define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
> #define HEAP_HASOID 0x0008 /* has an object-id field */
> ! #define HEAP_XMAX_KEY_LOCK 0x0010 /* xmax is a "key" locker */
> #define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
> #define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
> #define HEAP_XMAX_SHARED_LOCK 0x0080 /* xmax is shared locker */
> + /* if either SHARE or KEY lock bit is set, this is a "shared" lock */
> + #define HEAP_IS_SHARE_LOCKED (HEAP_XMAX_SHARED_LOCK | HEAP_XMAX_KEY_LOCK)
> /* if either LOCK bit is set, xmax hasn't deleted the tuple, only locked it */

"either" should now be "any".

> ! #define HEAP_IS_LOCKED (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_SHARED_LOCK | \
> ! HEAP_XMAX_KEY_LOCK)
> #define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
> #define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
> #define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */

> *** a/src/include/nodes/parsenodes.h
> --- b/src/include/nodes/parsenodes.h
> ***************
> *** 554,571 **** typedef struct DefElem
> } DefElem;
>
> /*
> ! * LockingClause - raw representation of FOR UPDATE/SHARE options
> *
> * Note: lockedRels == NIL means "all relations in query". Otherwise it
> * is a list of RangeVar nodes. (We use RangeVar mainly because it carries
> * a location field --- currently, parse analysis insists on unqualified
> * names in LockingClause.)
> */
> typedef struct LockingClause
> {
> NodeTag type;
> List *lockedRels; /* FOR UPDATE or FOR SHARE relations */
> ! bool forUpdate; /* true = FOR UPDATE, false = FOR SHARE */
> bool noWait; /* NOWAIT option */
> } LockingClause;
>
> --- 554,579 ----
> } DefElem;
>
> /*
> ! * LockingClause - raw representation of FOR UPDATE/SHARE/KEY LOCK options
> *
> * Note: lockedRels == NIL means "all relations in query". Otherwise it
> * is a list of RangeVar nodes. (We use RangeVar mainly because it carries
> * a location field --- currently, parse analysis insists on unqualified
> * names in LockingClause.)
> */
> + typedef enum LockClauseStrength
> + {
> + /* order is important -- see applyLockingClause */
> + LCS_FORKEYLOCK,
> + LCS_FORSHARE,
> + LCS_FORUPDATE
> + } LockClauseStrength;
> +

It's sure odd having this enum precisely mirror LockTupleMode. Is there
precedent for this? They are at opposite ends of the processing stack, I suppose.

> typedef struct LockingClause
> {
> NodeTag type;
> List *lockedRels; /* FOR UPDATE or FOR SHARE relations */
> ! LockClauseStrength strength;
> bool noWait; /* NOWAIT option */
> } LockingClause;
>
> ***************
> *** 839,856 **** typedef struct WindowClause
> * parser output representation of FOR UPDATE/SHARE clauses
> *
> * Query.rowMarks contains a separate RowMarkClause node for each relation
> ! * identified as a FOR UPDATE/SHARE target. If FOR UPDATE/SHARE is applied
> ! * to a subquery, we generate RowMarkClauses for all normal and subquery rels
> ! * in the subquery, but they are marked pushedDown = true to distinguish them
> ! * from clauses that were explicitly written at this query level. Also,
> ! * Query.hasForUpdate tells whether there were explicit FOR UPDATE/SHARE
> ! * clauses in the current query level.
> */
> typedef struct RowMarkClause
> {
> NodeTag type;
> Index rti; /* range table index of target relation */
> ! bool forUpdate; /* true = FOR UPDATE, false = FOR SHARE */
> bool noWait; /* NOWAIT option */
> bool pushedDown; /* pushed down from higher query level? */
> } RowMarkClause;
> --- 847,864 ----
> * parser output representation of FOR UPDATE/SHARE clauses
> *
> * Query.rowMarks contains a separate RowMarkClause node for each relation
> ! * identified as a FOR UPDATE/SHARE/KEY LOCK target. If one of these clauses
> ! * is applied to a subquery, we generate RowMarkClauses for all normal and
> ! * subquery rels in the subquery, but they are marked pushedDown = true to
> ! * distinguish them from clauses that were explicitly written at this query
> ! * level. Also, Query.hasForUpdate tells whether there were explicit FOR
> ! * UPDATE/SHARE clauses in the current query level.

Need a "/KEY LOCK" in the last sentence.

> */
> typedef struct RowMarkClause
> {
> NodeTag type;
> Index rti; /* range table index of target relation */
> ! LockClauseStrength strength;
> bool noWait; /* NOWAIT option */
> bool pushedDown; /* pushed down from higher query level? */
> } RowMarkClause;

I'd like to do some more testing around HOT and TOAST, plus run performance
tests. Figured I should get this much fired off, though.

Thanks,
nm

Attachment	Content-Type	Size
fklocks-20110211.patch	text/plain	66.4 KB

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-11 17:15:20
Message-ID:	1297443290-sup-3183@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Noah Misch's message of Fri Feb 11 04:13:22 -0300 2011:

Hello,

First, thanks for the very thorough review.

> On Thu, Jan 13, 2011 at 06:58:09PM -0300, Alvaro Herrera wrote:

> Incidentally, HeapTupleSatisfiesMVCC has some bits of code like this (not new):
>
> /* MultiXacts are currently only allowed to lock tuples */
> Assert(tuple->t_infomask & HEAP_IS_LOCKED);
>
> They're specifically only allowed for SHARE and KEY locks, right?
> heap_lock_tuple seems to assume as much.

Yeah, since FOR UPDATE acquires an exclusive lock on the tuple, you
can't have a multixact there. Maybe we can make the assert more
specific; I'll have a look.

> [ test case with funny visibility behavior ]

Looking into the visibility bug.

> > I published about this here:
> > http://commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
> >
> > So, as a rough design,
> >
> > 1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK
> > 2. This will acquire a new type of lock in the tuple, dubbed a "keylock".
> > 3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE.
>
> It does not conflict with SELECT FOR SHARE, does it?

It doesn't; I think I copied old text there. (I had originally thought
that they would conflict, but I had to change that due to implementation
restrictions).

> The odd thing here is the checking of an outside condition to decide whether
> locks conflict. Normally, to get a different conflict list, we add another lock
> type. What about this?
>
> FOR KEY SHARE conflicts with FOR KEY UPDATE
> FOR SHARE conflicts with FOR KEY UPDATE, FOR UPDATE
> FOR UPDATE conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE
> FOR KEY UPDATE conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE, FOR KEY SHARE

Hmm, let me see about this.
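
For reference, the four-mode proposal quoted above can be written down as a plain conflict table. This is only a sketch of that proposal: the ROWLOCK_* names and rowlocks_conflict are made up for the example and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the four proposed row lock strengths. */
typedef enum
{
    ROWLOCK_KEY_SHARE,
    ROWLOCK_SHARE,
    ROWLOCK_UPDATE,
    ROWLOCK_KEY_UPDATE,
    ROWLOCK_NUM_MODES
} RowLockMode;

#define ROWLOCK_BIT(m) (1u << (m))

/* One bitmask per mode, listing the modes it conflicts with. */
static const unsigned rowlock_conflicts[ROWLOCK_NUM_MODES] = {
    [ROWLOCK_KEY_SHARE] = ROWLOCK_BIT(ROWLOCK_KEY_UPDATE),
    [ROWLOCK_SHARE] = ROWLOCK_BIT(ROWLOCK_KEY_UPDATE) |
                      ROWLOCK_BIT(ROWLOCK_UPDATE),
    [ROWLOCK_UPDATE] = ROWLOCK_BIT(ROWLOCK_KEY_UPDATE) |
                       ROWLOCK_BIT(ROWLOCK_UPDATE) |
                       ROWLOCK_BIT(ROWLOCK_SHARE),
    [ROWLOCK_KEY_UPDATE] = ROWLOCK_BIT(ROWLOCK_KEY_UPDATE) |
                           ROWLOCK_BIT(ROWLOCK_UPDATE) |
                           ROWLOCK_BIT(ROWLOCK_SHARE) |
                           ROWLOCK_BIT(ROWLOCK_KEY_SHARE)
};

/* Does a lock already held in one mode block a request for another? */
bool
rowlocks_conflict(RowLockMode held, RowLockMode wanted)
{
    assert((int) held >= 0 && (int) held < ROWLOCK_NUM_MODES);
    assert((int) wanted >= 0 && (int) wanted < ROWLOCK_NUM_MODES);
    return (rowlock_conflicts[held] & ROWLOCK_BIT(wanted)) != 0;
}
```

Expressed this way, the conflict decision depends only on the two modes, with no outside condition such as which columns an UPDATE touches; that choice would be made earlier, when picking between the UPDATE and KEY UPDATE strengths.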

> > 3. The original tuple needs to be marked with the Cmax of the locking
> > command, to prevent it from being seen in the same transaction.
>
> Could you elaborate on this requirement?

Consider an open cursor with a snapshot prior to the lock. If we leave
the old tuple as is, the cursor would see that old tuple as visible.
But the locked copy of the tuple is also visible, because the Cmax is
just a locker, not an updater.

> > 4. A non-conflicting update to the tuple must carry forward some fields
> > from the original tuple into the updated copy. Those include Xmax,
> > XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.
>
> HeapTupleHeaderGetCmax() has this assertion:
>
> /* We do not store cmax when locking a tuple */
> Assert(!(tup->t_infomask & (HEAP_MOVED | HEAP_IS_LOCKED)));
>
> Assuming that assertion is still valid, there will never be a HEAP_COMBOCID flag
> to copy. Right?

Hmm, I think the assert is wrong, but I'm still paging in the details of
the patch after being away from it for so long. Let me think more about it.

> [ Lots more stuff ]

I'll give careful consideration to all this.

Thanks again for the detailed review.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-11 18:17:59
Message-ID:	20110211181759.GC30425@tornado.leadboat.com
Lists:	pgsql-hackers

On Fri, Feb 11, 2011 at 02:15:20PM -0300, Alvaro Herrera wrote:
> Excerpts from Noah Misch's message of Fri Feb 11 04:13:22 -0300 2011:
> > On Thu, Jan 13, 2011 at 06:58:09PM -0300, Alvaro Herrera wrote:
> > > 3. The original tuple needs to be marked with the Cmax of the locking
> > > command, to prevent it from being seen in the same transaction.
> >
> > Could you elaborate on this requirement?
>
> Consider an open cursor with a snapshot prior to the lock. If we leave
> the old tuple as is, the cursor would see that old tuple as visible.
> But the locked copy of the tuple is also visible, because the Cmax is
> just a locker, not an updater.

Thanks. Today, a lock operation leaves t_cid unchanged, and an update fills its
own cid into Cmax of the old tuple and Cmin of the new tuple. So, the cursor
would only see the old tuple. What will make that no longer sufficient?

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-14 20:56:10
Message-ID:	1297716803-sup-1524@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Noah Misch's message of Fri Feb 11 04:13:22 -0300 2011:

> I observe visibility breakage with this test case:
>
> [ ... ]
>
> The problem seems to be that funny t_cid (2249). Tracing through heap_update,
> the new code is not setting t_cid during this test case.

So I can fix this problem by simply adding a call to
HeapTupleHeaderSetCmin when the stuff about ComboCid does not hold, but
seeing that screenful plus the subsequent call to
HeapTupleHeaderAdjustCmax feels wrong. I think this needs to be
rethought ...

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Marti Raudsepp <marti(at)juffo(dot)org>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-14 22:39:25
Message-ID:	AANLkTimgf2OPZ8-25Q3_v+avF2fjcxVpyejSjm3_yuxr@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 11, 2011 at 09:13, Noah Misch <noah(at)leadboat(dot)com> wrote:
> The patch had a trivial conflict in planner.c, plus plenty of offsets. I've
> attached the rebased patch that I used for review. For anyone following along,
> all the interesting hunks touch heapam.c; the rest is largely mechanical. A
> "diff -w" patch is also considerably easier to follow.

Here's a simple patch for the RelationGetIndexAttrBitmap() function,
as explained in my last post. I don't know if it's any help to you,
but since I wrote it I might as well send it up. This applies on top
of Noah's rebased patch.

I did some tests and it seems to work, although I also hit the same
visibility bug as Noah.

Test case I used:

THREAD A:
create table foo (pk int primary key, ak int);
create unique index on foo (ak) where ak != 0;
create unique index on foo ((-ak));

create table bar (foo_pk int references foo (pk));
insert into foo values(1,1);
begin; insert into bar values(1);

THREAD B:
begin; update foo set ak=2 where ak=1;

Regards,
Marti

Attachment	Content-Type	Size
0001-Only-acquire-KEY-LOCK-for-colums-that-can-be-referen.patch	text/x-patch	5.3 KB

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Marti Raudsepp <marti(at)juffo(dot)org>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-14 23:49:58
Message-ID:	1297726905-sup-2727@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Marti Raudsepp's message of Mon Feb 14 19:39:25 -0300 2011:
> On Fri, Feb 11, 2011 at 09:13, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > The patch had a trivial conflict in planner.c, plus plenty of offsets. I've
> > attached the rebased patch that I used for review. For anyone following along,
> > all the interesting hunks touch heapam.c; the rest is largely mechanical. A
> > "diff -w" patch is also considerably easier to follow.
>
> Here's a simple patch for the RelationGetIndexAttrBitmap() function,
> as explained in my last post. I don't know if it's any help to you,
> but since I wrote it I might as well send it up. This applies on top
> of Noah's rebased patch.

Got it, thanks.

> I did some tests and it seems to work, although I also hit the same
> visibility bug as Noah.

Yeah, that bug is fixed with the attached, though I am rethinking this
bit.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment	Content-Type	Size
0001-Fix-visibility-bug-and-poorly-worded-comment.patch	application/octet-stream	1.4 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-15 21:15:38
Message-ID:	AANLkTikHp3cb+pB_d1ROGhFsTaoqKUbQMVM3iVA44v3J@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 14, 2011 at 6:49 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Marti Raudsepp's message of Mon Feb 14 19:39:25 -0300 2011:
>> On Fri, Feb 11, 2011 at 09:13, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > The patch had a trivial conflict in planner.c, plus plenty of offsets. I've
>> > attached the rebased patch that I used for review. For anyone following along,
>> > all the interesting hunks touch heapam.c; the rest is largely mechanical. A
>> > "diff -w" patch is also considerably easier to follow.
>>
>> Here's a simple patch for the RelationGetIndexAttrBitmap() function,
>> as explained in my last post. I don't know if it's any help to you,
>> but since I wrote it I might as well send it up. This applies on top
>> of Noah's rebased patch.
>
> Got it, thanks.
>
>> I did some tests and it seems to work, although I also hit the same
>> visibility bug as Noah.
>
> Yeah, that bug is fixed with the attached, though I am rethinking this
> bit.

I am thinking that the statute of limitations has expired on this
patch, and that we should mark it Returned with Feedback and continue
working on it for 9.2. I know it's a valuable feature, but I think
we're out of time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-15 21:32:36
Message-ID:	E1661B84-6931-4A71-9061-52C933A0F165@kineticode.com
Lists:	pgsql-hackers

On Feb 15, 2011, at 1:15 PM, Robert Haas wrote:

>> Yeah, that bug is fixed with the attached, though I am rethinking this
>> bit.
>
> I am thinking that the statute of limitations has expired on this
> patch, and that we should mark it Returned with Feedback and continue
> working on it for 9.2. I know it's a valuable feature, but I think
> we're out of time.

How is such a determination made, exactly?

Best,

David

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-15 21:32:45
Message-ID:	1297805461-sup-6497@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of Tue Feb 15 18:15:38 -0300 2011:

> I am thinking that the statute of limitations has expired on this
> patch, and that we should mark it Returned with Feedback and continue
> working on it for 9.2. I know it's a valuable feature, but I think
> we're out of time.

Okay, I've marked it as such in the commitfest app. It'll be in 9.2's
first commitfest.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-02-16 00:17:56
Message-ID:	4D5B17B4.7010306@agliodbs.com
Lists:	pgsql-hackers

> How is such a determination made, exactly?

It's Feb 15th, and portions of the patch need a rework according to the
author. I'm with Robert on this one.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-03-11 15:51:14
Message-ID:	20110311155114.GA29175@tornado.gateway.2wire.net
Lists:	pgsql-hackers

On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote:
> Automated tests would go a long way toward building confidence that this patch
> does the right thing. Thanks to the SSI patch, we now have an in-tree test
> framework for testing interleaved transactions. The only thing it needs to be
> suitable for this work is a way to handle blocked commands. If you like, I can
> try to whip something up for that.
[off-list ACK followed]

Here's a patch implementing that. It applies to master, with or without your
KEY LOCK patch also applied, though the expected outputs reflect the
improvements from your patch. I add three isolation test specs:

fk-contention: blocking-only test case from your blog post
fk-deadlock: the deadlocking test case I used during patch review
fk-deadlock2: Joel Jacobson's deadlocking test case

When a spec permutation would have us run a command in a currently-blocked
session, we cannot implement that permutation. Such permutations represent
impossible real-world scenarios, anyway. For now, I just explicitly name the
valid permutations in each spec file. If the test harness detects this problem,
we abort the current test spec. It might be nicer to instead cancel all
outstanding queries, issue rollbacks in all sessions, and continue with other
permutations. I hesitated to do that, because we currently leave all
transaction control in the hands of the test spec.

I only support one waiting command at a time. As long as one command continues
to wait, I run other commands to completion synchronously. This decision has no
impact on the current test specs, which all have two sessions. It avoided a
touchy policy decision concerning deadlock detection. If two commands have
blocked, it may be that a third command needs to run before they will unblock,
or it may be that the two commands have formed a deadlock. We won't know for
sure until deadlock_timeout elapses. If it's possible to run the next step in
the permutation (i.e., it uses a different session from any blocked command), we
can either do so immediately or wait out the deadlock_timeout first. The latter
slows the test suite, but it makes the output more natural -- more like what one
would typically see after running the commands by hand. If anyone can think of a
sound general policy, that would be helpful. For now, I've punted.
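
The dispatch rule described in the last two paragraphs can be summarized in a few lines. This is an illustrative sketch only, not code from the attached patch; HarnessState, StepDecision and decide_next_step are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NO_WAITING_SESSION (-1)

typedef enum
{
    STEP_RUN_MAY_WAIT,          /* nothing is blocked; this step may block */
    STEP_RUN_SYNCHRONOUSLY,     /* another session waits; run to completion */
    STEP_ABORT_SPEC             /* step targets the blocked session */
} StepDecision;

/* At most one command may be waiting at a time. */
typedef struct
{
    int         waiting_session;    /* session index, or NO_WAITING_SESSION */
} HarnessState;

/* Decide how to handle the next step of a permutation. */
StepDecision
decide_next_step(const HarnessState *state, int step_session)
{
    assert(state != NULL && step_session >= 0);

    if (state->waiting_session == NO_WAITING_SESSION)
        return STEP_RUN_MAY_WAIT;
    if (state->waiting_session == step_session)
        return STEP_ABORT_SPEC;     /* impossible permutation */
    return STEP_RUN_SYNCHRONOUSLY;
}
```

What to do when a synchronously run step itself blocks is exactly the policy question left open above, so the sketch does not model it.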

With a default postgresql.conf, deadlock_timeout constitutes most of the run
time. Reduce it to 20ms to accelerate things when running the tests repeatedly.

Since timing dictates which query participating in a deadlock will be chosen for
cancellation, the expected outputs bearing deadlock errors are unstable. I'm
not sure how much it will come up in practice, so I have not included expected
output variations to address this.

I think this will work on Windows as well as pgbench does, but I haven't
verified that.

Sorry for the delay on this.

Attachment	Content-Type	Size
fklocks-tests-v1.patch	text/plain	22.2 KB

From:	Jesper Krogh <jesper(at)krogh(dot)cc>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-06-19 16:30:41
Message-ID:	4DFE2431.9080804@krogh.cc
Lists:	pgsql-hackers

I hope this hasn't been forgotten, but I can't see that it has been committed
or moved into the commitfest process.

Jesper

On 2011-03-11 16:51, Noah Misch wrote:
> On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote:
>> Automated tests would go a long way toward building confidence that this patch
>> does the right thing. Thanks to the SSI patch, we now have an in-tree test
>> framework for testing interleaved transactions. The only thing it needs to be
>> suitable for this work is a way to handle blocked commands. If you like, I can
>> try to whip something up for that.
> [off-list ACK followed]
>
> Here's a patch implementing that. It applies to master, with or without your
> KEY LOCK patch also applied, though the expected outputs reflect the
> improvements from your patch. I add three isolation test specs:
>
> fk-contention: blocking-only test case from your blog post
> fk-deadlock: the deadlocking test case I used during patch review
> fk-deadlock2: Joel Jacobson's deadlocking test case
>
> When a spec permutation would have us run a command in a currently-blocked
> session, we cannot implement that permutation. Such permutations represent
> impossible real-world scenarios, anyway. For now, I just explicitly name the
> valid permutations in each spec file. If the test harness detects this problem,
> we abort the current test spec. It might be nicer to instead cancel all
> outstanding queries, issue rollbacks in all sessions, and continue with other
> permutations. I hesitated to do that, because we currently leave all
> transaction control in the hands of the test spec.
>
> I only support one waiting command at a time. As long as one command continues
> to wait, I run other commands to completion synchronously. This decision has no
> impact on the current test specs, which all have two sessions. It avoided a
> touchy policy decision concerning deadlock detection. If two commands have
> blocked, it may be that a third command needs to run before they will unblock,
> or it may be that the two commands have formed a deadlock. We won't know for
> sure until deadlock_timeout elapses. If it's possible to run the next step in
> the permutation (i.e., it uses a different session from any blocked command), we
> can either do so immediately or wait out the deadlock_timeout first. The latter
> slows the test suite, but it makes the output more natural -- more like what one
> would typically see after running the commands by hand. If anyone can think of a
> sound general policy, that would be helpful. For now, I've punted.
>
> With a default postgresql.conf, deadlock_timeout constitutes most of the run
> time. Reduce it to 20ms to accelerate things when running the tests repeatedly.
>
> Since timing dictates which query participating in a deadlock will be chosen for
> cancellation, the expected outputs bearing deadlock errors are unstable. I'm
> not sure how much it will come up in practice, so I have not included expected
> output variations to address this.
>
> I think this will work on Windows as well as pgbench does, but I haven't
> verified that.
>
> Sorry for the delay on this.
>

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Jesper Krogh <jesper(at)krogh(dot)cc>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-06-20 20:11:10
Message-ID:	20110620201110.GB17037@tornado.leadboat.com
Lists:	pgsql-hackers

On Sun, Jun 19, 2011 at 06:30:41PM +0200, Jesper Krogh wrote:
> I hope this hasn't been forgotten. But I can't see that it has been committed
> or moved into the commitfest process?

If you're asking about that main patch for $SUBJECT rather than those
isolationtester changes specifically, I can't speak to the plans for it. I
wasn't planning to move the test suite work forward independent of the core
patch it serves, but we could do that if there's another application.

Thanks,
nm

> On 2011-03-11 16:51, Noah Misch wrote:
>> On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote:
>>> Automated tests would go a long way toward building confidence that this patch
>>> does the right thing. Thanks to the SSI patch, we now have an in-tree test
>>> framework for testing interleaved transactions. The only thing it needs to be
>>> suitable for this work is a way to handle blocked commands. If you like, I can
>>> try to whip something up for that.
>> [off-list ACK followed]
>>
>> Here's a patch implementing that. It applies to master, with or without your
>> KEY LOCK patch also applied, though the expected outputs reflect the
>> improvements from your patch. I add three isolation test specs:
>>
>> fk-contention: blocking-only test case from your blog post
>> fk-deadlock: the deadlocking test case I used during patch review
>> fk-deadlock2: Joel Jacobson's deadlocking test case

From:	Jesper Krogh <jesper(at)krogh(dot)cc>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-06-21 04:50:44
Message-ID:	4E002324.3000608@krogh.cc
Lists:	pgsql-hackers

On 2011-06-20 22:11, Noah Misch wrote:
> On Sun, Jun 19, 2011 at 06:30:41PM +0200, Jesper Krogh wrote:
>> I hope this hasn't been forgotten. But I can't see that it has been committed
>> or moved into the commitfest process?
> If you're asking about that main patch for $SUBJECT rather than those
> isolationtester changes specifically, I can't speak to the plans for it. I
> wasn't planning to move the test suite work forward independent of the core
> patch it serves, but we could do that if there's another application.

Yes, I was actually asking about the main patch for foreign key locks.

Jesper

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-07-12 21:59:01
Message-ID:	1310507494-sup-7461@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Noah Misch's message of Fri Mar 11 12:51:14 -0300 2011:
> On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote:
> > Automated tests would go a long way toward building confidence that this patch
> > does the right thing. Thanks to the SSI patch, we now have an in-tree test
> > framework for testing interleaved transactions. The only thing it needs to be
> > suitable for this work is a way to handle blocked commands. If you like, I can
> > try to whip something up for that.
> [off-list ACK followed]
>
> Here's a patch implementing that. It applies to master, with or without your
> KEY LOCK patch also applied, though the expected outputs reflect the
> improvements from your patch. I add three isolation test specs:
>
> fk-contention: blocking-only test case from your blog post
> fk-deadlock: the deadlocking test case I used during patch review
> fk-deadlock2: Joel Jacobson's deadlocking test case

Thanks for this patch. I have applied it, adjusting the expected output
of these tests to the HEAD code. I'll adjust it when I commit the
fklocks patch, I guess, but it seemed simpler to have it out of the way;
besides it might end up benefitting other people who might be messing
with the locking code.

> I only support one waiting command at a time. As long as one command continues
> to wait, I run other commands to completion synchronously.

Should be fine for now, I guess.

> I think this will work on Windows as well as pgbench does, but I haven't
> verified that.

We will find out shortly.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Noah Misch <noah(at)2ndQuadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-07-13 05:34:10
Message-ID:	20110713053407.GA19443@tornado.leadboat.com
Lists:	pgsql-hackers

On Tue, Jul 12, 2011 at 05:59:01PM -0400, Alvaro Herrera wrote:
> Excerpts from Noah Misch's message of Fri Mar 11 12:51:14 -0300 2011:
> > On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote:
> > > Automated tests would go a long way toward building confidence that this patch
> > > does the right thing. Thanks to the SSI patch, we now have an in-tree test
> > > framework for testing interleaved transactions. The only thing it needs to be
> > > suitable for this work is a way to handle blocked commands. If you like, I can
> > > try to whip something up for that.
> > [off-list ACK followed]
> >
> > Here's a patch implementing that. It applies to master, with or without your
> > KEY LOCK patch also applied, though the expected outputs reflect the
> > improvements from your patch. I add three isolation test specs:
> >
> > fk-contention: blocking-only test case from your blog post
> > fk-deadlock: the deadlocking test case I used during patch review
> > fk-deadlock2: Joel Jacobson's deadlocking test case
>
> Thanks for this patch. I have applied it, adjusting the expected output
> of these tests to the HEAD code. I'll adjust it when I commit the
> fklocks patch, I guess, but it seemed simpler to have it out of the way;
> besides it might end up benefitting other people who might be messing
> with the locking code.

Great. There have been a few recent patches where I would have used this
functionality to provide tests, so I'm glad to have it in.

> > I think this will work on Windows as well as pgbench does, but I haven't
> > verified that.
>
> We will find out shortly.

I see you've added a fix for the MSVC animals; thanks.

coypu failed during the run of the test due to a different session being chosen
as the deadlock victim. We can now vary deadlock_timeout to prevent this; see
attached fklocks-tests-deadlock_timeout.patch. This also makes the tests much
faster on a default postgresql.conf.
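
To illustrate why per-session deadlock_timeout values can pin down the victim, here is a minimal Python sketch of a simplified timing model. It is not the isolationtester code or the attached patch. It assumes that each waiting session runs its deadlock detector once, deadlock_timeout after it began waiting, and that the session whose detector finds the cycle is the one cancelled. The function name, session labels, and numbers are hypothetical.

```python
def deadlock_victim(block_times_ms, timeouts_ms):
    """Return the session whose detector is first to run once the cycle exists.

    block_times_ms[s]: time at which session s started waiting on a lock.
    timeouts_ms[s]:    that session's deadlock_timeout.

    Simplifying assumptions (not taken from the patch): the cycle is complete
    once the last session has blocked, every waiter checks exactly once, and
    the detecting session is the one that gets cancelled.
    """
    cycle_complete = max(block_times_ms.values())
    check_times = {s: block_times_ms[s] + timeouts_ms[s] for s in block_times_ms}
    # A check that runs before the cycle is complete finds nothing.
    eligible = {s: t for s, t in check_times.items() if t >= cycle_complete}
    return min(eligible, key=eligible.get)


blocked_at = {"s1": 0, "s2": 5}  # hypothetical times, in milliseconds

# A short timeout in s1 and a long one in s2 makes s1 the victim.
print(deadlock_victim(blocked_at, {"s1": 100, "s2": 10000}))  # s1
# Swapping the timeouts moves the detection to the other session.
print(deadlock_victim(blocked_at, {"s1": 10000, "s2": 100}))  # s2
```

Under this model, giving the sessions clearly different timeouts makes the outcome depend on the configured values rather than on small scheduling differences.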

crake failed when it reported waiting on the first step of an existing isolation
test ("two-ids.spec"). I will need to look into that further.

Thanks,
nm

Attachment	Content-Type	Size
fklocks-tests-deadlock_timeout.patch	text/plain	6.0 KB

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Noah Misch <noah(at)2ndquadrant(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-07-15 23:01:26
Message-ID:	1310770360-sup-340@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Noah Misch's message of Wed Jul 13 01:34:10 -0400 2011:

> coypu failed during the run of the test due to a different session being chosen
> as the deadlock victim. We can now vary deadlock_timeout to prevent this; see
> attached fklocks-tests-deadlock_timeout.patch. This also makes the tests much
> faster on a default postgresql.conf.

I applied your patch, thanks. I couldn't reproduce the failures without
it, even running only the three new tests in a loop a few dozen times.

> crake failed when it reported waiting on the first step of an existing isolation
> test ("two-ids.spec"). I will need to look into that further.

Actually, there are four failures in tests other than the two fixed by
your patch. These are:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2011-07-12%2022:32:02
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2011-07-14%2016:27:00
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pitta&dt=2011-07-15%2015:00:08
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2011-07-15%2018:32:02

The last two are an identical failure in multiple-row-versions:
***************
*** 1,11 ****
Parsed test spec with 4 sessions

starting permutation: rx1 wx2 c2 wx3 ry3 wy4 rz4 c4 c3 wz1 c1
! step rx1: SELECT * FROM t WHERE id = 1000000;
id txt

1000000
- step wx2: UPDATE t SET txt = 'b' WHERE id = 1000000;
step c2: COMMIT;
step wx3: UPDATE t SET txt = 'c' WHERE id = 1000000;
step ry3: SELECT * FROM t WHERE id = 500000;
--- 1,12 ----
Parsed test spec with 4 sessions

starting permutation: rx1 wx2 c2 wx3 ry3 wy4 rz4 c4 c3 wz1 c1
! step rx1: SELECT * FROM t WHERE id = 1000000; <waiting ...>
! step wx2: UPDATE t SET txt = 'b' WHERE id = 1000000;
! step rx1: <... completed>
id txt

1000000
step c2: COMMIT;
step wx3: UPDATE t SET txt = 'c' WHERE id = 1000000;
step ry3: SELECT * FROM t WHERE id = 500000;

The other failure by crake in two-ids:

***************
*** 440,447 ****
step c3: COMMIT;

starting permutation: rxwy2 wx1 ry3 c2 c3 c1
! step rxwy2: update D2 set id = (select id+1 from D1);
step wx1: update D1 set id = id + 1;
step ry3: select id from D2;
id

--- 440,448 ----
step c3: COMMIT;

starting permutation: rxwy2 wx1 ry3 c2 c3 c1
! step rxwy2: update D2 set id = (select id+1 from D1); <waiting ...>
step wx1: update D1 set id = id + 1;
+ step rxwy2: <... completed>
step ry3: select id from D2;
id

And the most problematic one, in nightjar, is a failure to send two
async commands, which is not supported by the new code:

--- 255,260 ----
ERROR: could not serialize access due to read/write dependencies among transactions

starting permutation: ry2 wx2 rx1 wy1 c2 c1
! step ry2: SELECT count(*) FROM project WHERE project_manager = 1; <waiting ...>
! failed to send query: another command is already in progress

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Noah Misch <noah(at)2ndQuadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-07-16 17:11:49
Message-ID:	20110716171121.GB2047@tornado.leadboat.com
Lists:	pgsql-hackers

On Fri, Jul 15, 2011 at 07:01:26PM -0400, Alvaro Herrera wrote:
> Excerpts from Noah Misch's message of Wed Jul 13 01:34:10 -0400 2011:
>
> > coypu failed during the run of the test due to a different session being chosen
> > as the deadlock victim. We can now vary deadlock_timeout to prevent this; see
> > attached fklocks-tests-deadlock_timeout.patch. This also makes the tests much
> > faster on a default postgresql.conf.
>
> I applied your patch, thanks. I couldn't reproduce the failures without
> it, even running only the three new tests in a loop a few dozen times.

It's probably more likely to crop up on a loaded system. I did not actually
reproduce it myself. However, if you swap the timeouts, the opposite session
finds the deadlock. From there, I'm convinced that the right timing
perturbations could yield the symptom coypu exhibited.

> > crake failed when it reported waiting on the first step of an existing isolation
> > test ("two-ids.spec"). I will need to look into that further.
>
> Actually, there are four failures in tests other than the two fixed by
> your patch. These are:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2011-07-12%2022:32:02
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2011-07-14%2016:27:00
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pitta&dt=2011-07-15%2015:00:08
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2011-07-15%2018:32:02

Thanks for summarizing. These all boil down to lock waits not anticipated by
the test specs. Having pondered this, I've been able to come up with just one
explanation. If autovacuum runs VACUUM during the test and finds that it can
truncate dead space from the end of a relation, it will acquire an
AccessExclusiveLock. When I decrease autovacuum_naptime to 1s, I do see
plenty of pg_type and pg_attribute truncations during a test run.

When I sought to reproduce this, what I first saw instead was an indefinite
test suite hang. That turned out to arise from an unrelated thinko -- I
assumed that backend IDs were stable for the life of the backend, but they're
only stable for the life of a pgstat snapshot. This fell down when a backend
older than one of the test backends exited during the test:

4199 2011-07-16 03:33:28.733 EDT DEBUG: forked new backend, pid=23984 socket=8
23984 2011-07-16 03:33:28.737 EDT LOG: statement: SET client_min_messages = warning;
23984 2011-07-16 03:33:28.739 EDT LOG: statement: SELECT i FROM pg_stat_get_backend_idset() t(i) WHERE pg_stat_get_backend_pid(i) = pg_backend_pid()
23985 2011-07-16 03:33:28.740 EDT DEBUG: autovacuum: processing database "postgres"
4199 2011-07-16 03:33:28.754 EDT DEBUG: forked new backend, pid=23986 socket=8
23986 2011-07-16 03:33:28.754 EDT LOG: statement: SET client_min_messages = warning;
4199 2011-07-16 03:33:28.755 EDT DEBUG: server process (PID 23985) exited with exit code 0
23986 2011-07-16 03:33:28.755 EDT LOG: statement: SELECT i FROM pg_stat_get_backend_idset() t(i) WHERE pg_stat_get_backend_pid(i) = pg_backend_pid()
4199 2011-07-16 03:33:28.766 EDT DEBUG: forked new backend, pid=23987 socket=8
23987 2011-07-16 03:33:28.766 EDT LOG: statement: SET client_min_messages = warning;
23987 2011-07-16 03:33:28.767 EDT LOG: statement: SELECT i FROM pg_stat_get_backend_idset() t(i) WHERE pg_stat_get_backend_pid(i) = pg_backend_pid()

This led isolationtester to initialize backend_ids = {1,2,2}, making us unable
to detect lock waits correctly. That's also consistent with the symptoms Rémi
Zara just reported. With that fixed, I was able to reproduce the failure due
to autovacuum-truncate-induced transient waiting using this recipe:
- autovacuum_naptime = 1s
- src/test/isolation/Makefile changed to pass --use-existing during installcheck
- Run 'make installcheck' in a loop
- A concurrent session running this in a loop:
CREATE TABLE churn (a int, b int, c int, d int, e int, f int, g int, h int);
DROP TABLE churn;

That yields a steady stream of vacuum truncations, and an associated lock wait
generally capsized the suite within 5-10 runs. Frankly, I have some
difficulty believing that this mechanic alone produced all four failures you
cite above; I suspect I'm still missing some more-frequent cause. Any other
theories on which system background activities can cause a transient lock
wait? It would have to produce a "pgstat_report_waiting(true)" call, so I
believe that excludes all LWLock and lighter contention.

In any event, I have attached a patch that fixes the problems I have described
here. To ignore autovacuum, it only recognizes a wait when one of the
backends under test holds a conflicting lock. (It occurs to me that perhaps
we should expose a pg_lock_conflicts(lockmode_held text, lockmode_req text)
function to simplify this query -- this is a fairly common monitoring need.)
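
As a rough illustration of the idea (not the query in the attached patch, and not an existing PostgreSQL function), the following Python sketch encodes the documented conflict matrix for table-level lock modes. It then treats a wait as genuine only when another backend under test holds a conflicting granted lock on the same target. The names lock_conflicts and blocked_by_test_backend, the dict layout, and the sample PIDs are assumptions made for this example.

```python
# Table-level lock modes, weakest to strongest, named as in pg_locks.mode.
MODES = [
    "AccessShareLock", "RowShareLock", "RowExclusiveLock",
    "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
    "ExclusiveLock", "AccessExclusiveLock",
]

# For each held mode, the requested modes it conflicts with.
CONFLICTS = {
    "AccessShareLock": set(MODES[7:]),
    "RowShareLock": set(MODES[6:]),
    "RowExclusiveLock": set(MODES[4:]),
    "ShareUpdateExclusiveLock": set(MODES[3:]),
    "ShareLock": {"RowExclusiveLock", "ShareUpdateExclusiveLock"} | set(MODES[5:]),
    "ShareRowExclusiveLock": set(MODES[2:]),
    "ExclusiveLock": set(MODES[1:]),
    "AccessExclusiveLock": set(MODES),
}


def lock_conflicts(held, requested):
    """Hypothetical stand-in for the proposed pg_lock_conflicts()."""
    return requested in CONFLICTS[held]


def blocked_by_test_backend(waiter, locks, test_pids):
    """True if a backend under test holds a lock that conflicts with waiter.

    waiter and each entry of locks are dicts with the keys pid, target, mode
    and granted; "target" is a simplification of the pg_locks columns that
    identify the locked object.
    """
    return any(
        lock["granted"]
        and lock["pid"] in test_pids
        and lock["pid"] != waiter["pid"]
        and lock["target"] == waiter["target"]
        and lock_conflicts(lock["mode"], waiter["mode"])
        for lock in locks
    )


test_pids = {23984, 23986}
waiter = {"pid": 23984, "target": "pg_type", "mode": "AccessShareLock", "granted": False}
autovacuum = {"pid": 23985, "target": "pg_type", "mode": "AccessExclusiveLock", "granted": True}
print(blocked_by_test_backend(waiter, [autovacuum], test_pids))  # False
```

In this sketch the autovacuum lock conflicts with the request, but it is ignored because its holder is not one of the backends under test.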

With that change in place, my setup survived through about fifty suite runs at
a time. The streak would end when session 2 would unexpectedly detect a
deadlock that session 1 should have detected. The session 1 deadlock_timeout
I chose, 20ms, is too aggressive. When session 2 is to issue the command that
completes the deadlock, it must do so before session 1 runs the deadlock
detector. Since we burn 10ms just noticing that the previous statement has
blocked, that left only 10ms to issue the next statement. This patch bumps
the figure from 20ms to 100ms; hopefully that will be enough for even a
decently-loaded virtual host. We should keep it as low as is reasonable,
because it contributes directly to the isolation suite runtime. Each addition
to deadlock_timeout slows the suite by 12x that amount.
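
The arithmetic above can be spelled out in a few lines of Python. This is only a back-of-the-envelope sketch using the figures stated in this message (10ms to notice a blocked statement, a 12x runtime multiplier); the function and constant names are made up for illustration.

```python
NOTICE_DELAY_MS = 10   # time spent noticing that the previous statement blocked
SLOWDOWN_FACTOR = 12   # suite runtime grows by 12x any deadlock_timeout increase


def issue_window_ms(deadlock_timeout_ms, notice_delay_ms=NOTICE_DELAY_MS):
    """Time left for session 2 to issue the statement that completes the deadlock."""
    return deadlock_timeout_ms - notice_delay_ms


def added_suite_runtime_ms(old_timeout_ms, new_timeout_ms, factor=SLOWDOWN_FACTOR):
    """Extra isolation-suite runtime caused by raising deadlock_timeout."""
    return factor * (new_timeout_ms - old_timeout_ms)


print(issue_window_ms(20))              # 10
print(issue_window_ms(100))             # 90
print(added_suite_runtime_ms(20, 100))  # 960
```

So the move from 20ms to 100ms widens the window from 10ms to 90ms at a cost of roughly one extra second per suite run.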

With this patch in its final form, I have completed 180+ suite runs without a
failure. In the absence of better theories on the cause for the buildfarm
failures, we should give the buildfarm a whirl with this patch.

I apologize for the quantity of errata this change is entailing.

Thanks,
nm

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
fklocks-tests-harden.patch	text/plain	9.8 KB

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Noah Misch <noah(at)2ndquadrant(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-07-19 18:47:16
Message-ID:	1311100780-sup-3636@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Noah Misch's message of Sat Jul 16 13:11:49 -0400 2011:

> In any event, I have attached a patch that fixes the problems I have described
> here. To ignore autovacuum, it only recognizes a wait when one of the
> backends under test holds a conflicting lock. (It occurs to me that perhaps
> we should expose a pg_lock_conflicts(lockmode_held text, lockmode_req text)
> function to simplify this query -- this is a fairly common monitoring need.)

Applied it. I agree that having such a utility function is worthwhile,
particularly if we're working on making pg_locks more usable as a whole.

(I wasn't able to reproduce Rémi's hangups here, so I wasn't able to
reproduce the other bits either.)

> With that change in place, my setup survived through about fifty suite runs at
> a time. The streak would end when session 2 would unexpectedly detect a
> deadlock that session 1 should have detected. The session 1 deadlock_timeout
> I chose, 20ms, is too aggressive. When session 2 is to issue the command that
> completes the deadlock, it must do so before session 1 runs the deadlock
> detector. Since we burn 10ms just noticing that the previous statement has
> blocked, that left only 10ms to issue the next statement. This patch bumps
> the figure from 20ms to 100ms; hopefully that will be enough for even a
> decently-loaded virtual host.

Committed this too.

> With this patch in its final form, I have completed 180+ suite runs without a
> failure. In the absence of better theories on the cause for the buildfarm
> failures, we should give the buildfarm a whirl with this patch.

Great. If there is some other failure mechanism, we'll find out ...

> I apologize for the quantity of errata this change is entailing.

No need to apologize. I might as well apologize myself because I didn't
detect these problems on review. But we don't do that -- we just fix
the problems and move on. It's great that you were able to come up with
a fix quickly.

And this is precisely why I committed this way ahead of the patch that
it was written to help: we're now not fixing problems in both
simultaneously. By the time we get that other patch in, this test
harness will be fully robust.

Thanks for all your effort in this.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-07-27 23:16:44
Message-ID:	1311807810-sup-1055@alvh.no-ip.org
Lists:	pgsql-hackers

Hackers,

This is an updated version of the patch I introduced here:

http://archives.postgresql.org/message-id/1294953201-sup-2099@alvh.no-ip.org

Mainly, this patch addresses the numerous comments by Noah Misch here:
http://archives.postgresql.org/message-id/20110211071322.GB26971@tornado.leadboat.com
My thanks to Noah for the very exhaustive review and ideas.

I also removed the bit about copying the ComboCid to the new version of
the tuple during an update. I think that must have been the result of
very fuzzy thinking; I cannot find any reasoning that leads to it being
necessary, or even correct.

I also included Marti Raudsepp's patch to consider only indexes usable
in foreign keys.

One thing I have not addressed is Noah's idea about creating a new lock
mode, KEY UPDATE, that would let us solve the initial problem that this
patch set out to resolve in the first place. I am not clear on exactly how
that is to be implemented, because currently heap_update and heap_delete
do not grab any kind of lock but instead do their own ad-hoc waiting. I
think that might need to be reshuffled a bit, to which I haven't gotten
yet, and is a radical enough idea that I would like it to be discussed
by the hackers community at large before setting sail on developing it.
In the meantime, this patch does improve the current situation quite a
lot.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment	Content-Type	Size
fklocks-2.patch	application/octet-stream	74.9 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-08-03 16:14:15
Message-ID:	CA+Tgmob4Z3ick+LMqzuoJNdJ4h3SS5RKQMb1zwGDkg8Z0sB5Ww@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jul 27, 2011 at 7:16 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> One thing I have not addressed is Noah's idea about creating a new lock
> mode, KEY UPDATE, that would let us solve the initial problem that this
> patch set out to resolve in the first place. I am not clear on exactly how
> that is to be implemented, because currently heap_update and heap_delete
> do not grab any kind of lock but instead do their own ad-hoc waiting. I
> think that might need to be reshuffled a bit, to which I haven't gotten
> yet, and is a radical enough idea that I would like it to be discussed
> by the hackers community at large before setting sail on developing it.
> In the meantime, this patch does improve the current situation quite a
> lot.

I haven't looked at the patch yet, but do you have a pointer to Noah's
proposal? And/or a description of how it differs from what you
implemented here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: FOR KEY LOCK foreign keys
Date:	2011-08-03 17:03:49
Message-ID:	1312390939-sup-3097@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of Wed Aug 03 12:14:15 -0400 2011:
> On Wed, Jul 27, 2011 at 7:16 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > One thing I have not addressed is Noah's idea about creating a new lock
> > mode, KEY UPDATE, that would let us solve the initial problem that this
> > patch set out to resolve in the first place. I am not clear on exactly how
> > that is to be implemented, because currently heap_update and heap_delete
> > do not grab any kind of lock but instead do their own ad-hoc waiting. I
> > think that might need to be reshuffled a bit, to which I haven't gotten
> > yet, and is a radical enough idea that I would like it to be discussed
> > by the hackers community at large before setting sail on developing it.
> > In the meantime, this patch does improve the current situation quite a
> > lot.
>
> I haven't looked at the patch yet, but do you have a pointer to Noah's
> proposal? And/or a description of how it differs from what you
> implemented here?

Yes, see his review email here:
http://archives.postgresql.org/message-id/20110211071322.GB26971@tornado.leadboat.com

It's long, but search for the part where he talks about "KEY UPDATE".
The way my patch works is explained by Noah there.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support