improving foreign key locks

Lists: pgsql-hackers
From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: improving foreign key locks
Date: 2010-11-25 22:01:19
Message-ID: 1290721684-sup-3951@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

So I've been working on improving locks for foreign key checks, as
discussed in a thread started by Joel Jacobson a while ago. I've posted
about this:
http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/
http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
(Note [1] below).

There's a question that arose in internal CMD discussion, which is: is
there an use case for keeping SELECT FOR SHARE locks, with the semantics
that they currently have, if we replace the FK checks with some other
lock implementation?

I've been assuming that we would keep FOR SHARE, because it's a feature
that's been exposed to users directly as a SQL command for five
releases, and so presumably someone may be depending on it. So the new
code would be triggered by a different SQL option, and it needs to work
in conjunction with FOR SHARE.

Now, if the majority opinion here is that we don't need to keep the
current FOR SHARE semantics, a few things would be different.

Thoughts on keeping vs. removing FOR SHARE?

I will be posting more extensively on the implementation of this on this
list, later.

[1] The blog posts says that FOR SHARE would conflict with FOR KEY LOCK,
but I'm having second thoughts about this for various reasons; so they
will not conflict (in other words, transaction A can take a FOR SHARE
lock in a tuple, and transaction B can take FOR KEY LOCK, and they both
can continue). Please consider this if you want to comment on the
design presented in those articles.

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-11-26 13:48:39
Message-ID: 034160BB-0CD5-471F-A9B3-CFA94C140DC3@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov25, 2010, at 23:01 , Alvaro Herrera wrote:
> So I've been working on improving locks for foreign key checks, as
> discussed in a thread started by Joel Jacobson a while ago. I've posted
> about this:
> http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/
> http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/

To me, the whole thing seems to be special case of allowing to not only lock whole tuples FOR UPDATE or FOR SHARE, but also individual fields or sets of fields. Except that for simplicity, only two sets are supported, which are
A) All fields
B) All fields which are included in some unique constraint, including primary keys.

I'd therefore suggest to extend the FOR SHARE / FOR UPDATE syntax to be
SELECT FOR { SHARE | UPDATE } [ OF <table1>[.<field1>], ... ]
and obtain what you call a "KEY LOCK" if (for a particular table) the set of fields is a subset of (B). Otherwise, we'd obtain a full SHARE lock. Thus we'd always lock at least the fields the user told us to, but sometimes more than those, for the sake of a more efficient implementation.

This leads quite naturally to the following behaviour regarding lock conflicts
.) An UPDATE conflicts with a SHARE or UPDATE lock if the update touches fields locked by the SHARE or UPDATE lock. Thus, in case (A) they always conflict while in case (B) they only conflict if the UPDATE touches a field contained in some unique index
.) A DELETE conflicts always conflicts with a SHARE or UPDATE lock since it, in a way, touches all fields
.) A UPDATE lock always conflicts with a SHARE lock, since there are no two non-overlapping sets of fields that we could lock.
.) SHARE locks never conflict with one another.

best regards,
Florian Pflug


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-11-26 20:06:30
Message-ID: 1290801672-sup-1415@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Florian Pflug's message of vie nov 26 10:48:39 -0300 2010:
> On Nov25, 2010, at 23:01 , Alvaro Herrera wrote:
> > So I've been working on improving locks for foreign key checks, as
> > discussed in a thread started by Joel Jacobson a while ago. I've posted
> > about this:
> > http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/
> > http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
>
> To me, the whole thing seems to be special case of allowing to not only lock whole tuples FOR UPDATE or FOR SHARE, but also individual fields or sets of fields. Except that for simplicity, only two sets are supported, which are
> A) All fields
> B) All fields which are included in some unique constraint, including primary keys.
>
> I'd therefore suggest to extend the FOR SHARE / FOR UPDATE syntax to be
> SELECT FOR { SHARE | UPDATE } [ OF <table1>[.<field1>], ... ]
> and obtain what you call a "KEY LOCK" if (for a particular table) the set of fields is a subset of (B). Otherwise, we'd obtain a full SHARE lock. Thus we'd always lock at least the fields the user told us to, but sometimes more than those, for the sake of a more efficient implementation.

The problem with this idea is that it's not possible to implement it.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-11-27 04:29:39
Message-ID: 69BD776F-7A2C-48A8-96F7-BBAD9409DC0F@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov26, 2010, at 21:06 , Alvaro Herrera wrote:
> Excerpts from Florian Pflug's message of vie nov 26 10:48:39 -0300 2010:
>> On Nov25, 2010, at 23:01 , Alvaro Herrera wrote:
>>> So I've been working on improving locks for foreign key checks, as
>>> discussed in a thread started by Joel Jacobson a while ago. I've posted
>>> about this:
>>> http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/
>>> http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
>>
>> To me, the whole thing seems to be special case of allowing to not only lock whole tuples FOR UPDATE or FOR SHARE, but also individual fields or sets of fields. Except that for simplicity, only two sets are supported, which are
>> A) All fields
>> B) All fields which are included in some unique constraint, including primary keys.
>>
>> I'd therefore suggest to extend the FOR SHARE / FOR UPDATE syntax to be
>> SELECT FOR { SHARE | UPDATE } [ OF <table1>[.<field1>], ... ]
>> and obtain what you call a "KEY LOCK" if (for a particular table) the set of fields is a subset of (B). Otherwise, we'd obtain a full SHARE lock. Thus we'd always lock at least the fields the user told us to, but sometimes more than those, for the sake of a more efficient implementation.
>
> The problem with this idea is that it's not possible to implement it.

How so? The implementation you proposed in your blog should work fine for this. XMAX_KEY_LOCK would signal that only fields from set (B) are locked, while XMAX_SHARE_LOCK (or however thats called, didn't check the code) would signal that all fields are locked. These are the only two field sets that we'd support, and for any set of columns the user specified we'd pick the smallest superset of the set we *do* support and use that (Thus, we obtain a key lock if only fields from a unique index where specified, and a share lock otherwise).

The only difference is that instead of presenting this to the user as an entirely new lock type, we instead present it as a generalization of SHARE locks. The advantage being that *if* we ever figure out a way to support more fine-grained locking of fields, (say, locking only the fields contain in some *specific* index, maybe by storing locking the index tuple), we can do so completely transparent to the user.

greetings,
Florian Pflug


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-11-29 19:14:12
Message-ID: 1291057849-sup-5721@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Florian Pflug's message of sáb nov 27 01:29:39 -0300 2010:
> On Nov26, 2010, at 21:06 , Alvaro Herrera wrote:

> > The problem with this idea is that it's not possible to implement it.
>
> How so? The implementation you proposed in your blog should work fine for this. XMAX_KEY_LOCK would signal that only fields from set (B) are locked, while XMAX_SHARE_LOCK (or however thats called, didn't check the code) would signal that all fields are locked. These are the only two field sets that we'd support, and for any set of columns the user specified we'd pick the smallest superset of the set we *do* support and use that (Thus, we obtain a key lock if only fields from a unique index where specified, and a share lock otherwise).
>
> The only difference is that instead of presenting this to the user as an entirely new lock type, we instead present it as a generalization of SHARE locks. The advantage being that *if* we ever figure out a way to support more fine-grained locking of fields, (say, locking only the fields contain in some *specific* index, maybe by storing locking the index tuple), we can do so completely transparent to the user.

Oh, I see. Yeah, perhaps this could work. I'll have a look at both
ends.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-11-29 21:00:55
Message-ID: 1291063929-sup-5331@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Florian Pflug's message of vie nov 26 10:48:39 -0300 2010:

> To me, the whole thing seems to be special case of allowing to not only lock whole tuples FOR UPDATE or FOR SHARE, but also individual fields or sets of fields. Except that for simplicity, only two sets are supported, which are
> A) All fields
> B) All fields which are included in some unique constraint, including primary keys.
>
> I'd therefore suggest to extend the FOR SHARE / FOR UPDATE syntax to be
> SELECT FOR { SHARE | UPDATE } [ OF <table1>[.<field1>], ... ]
> and obtain what you call a "KEY LOCK" if (for a particular table) the set of fields is a subset of (B). Otherwise, we'd obtain a full SHARE lock. Thus we'd always lock at least the fields the user told us to, but sometimes more than those, for the sake of a more efficient implementation.

This would require some additions in ri_FetchConstraintInfo(). Right
now it does a single syscache lookup and then extracts a bunch of
attributes from there. If we're going to implement as you suggest, we'd
have to:

1. add a relcache lookup in there, and extract column names involved in
the FK.

2. store those column names in RI_ConstraintInfo; currently it's about
68 bytes, it'd grow to ~2116 bytes (current size plus RI_MAX_NUMKEYS * NAMEDATALEN).

Additionally, we'd have to expend some more cycles at the parse analysis
phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
columns belong into some non-partial unique index.

Is the performance gain sufficient to pay these costs?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-11-29 21:12:56
Message-ID: 1291065111-sup-2613@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Alvaro Herrera's message of lun nov 29 18:00:55 -0300 2010:

> This would require some additions in ri_FetchConstraintInfo(). Right
> now it does a single syscache lookup and then extracts a bunch of
> attributes from there. If we're going to implement as you suggest, we'd
> have to:
>
> 1. add a relcache lookup in there, and extract column names involved in
> the FK.
>
> 2. store those column names in RI_ConstraintInfo; currently it's about
> 68 bytes, it'd grow to ~2116 bytes (current size plus RI_MAX_NUMKEYS * NAMEDATALEN).
>
> Additionally, we'd have to expend some more cycles at the parse analysis
> phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
> columns belong into some non-partial unique index.

Hmm, actually there's already a relcache lookup when we execute whatever
action the FK needs to execute, so maybe we don't need to do any of
this.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-11-29 21:33:19
Message-ID: 4806.1291066399@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Alvaro Herrera's message of lun nov 29 18:00:55 -0300 2010:
>> Additionally, we'd have to expend some more cycles at the parse analysis
>> phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
>> columns belong into some non-partial unique index.

> Hmm, actually there's already a relcache lookup when we execute whatever
> action the FK needs to execute, so maybe we don't need to do any of
> this.

Checking for existence of a unique index at parse analysis time is quite
horrid anyway, because it means the validity of the query can change
from parse time to execution time. We got stuck with some of that in
relation to GROUP BY dependencies, but I don't want to buy into it
anywhere that we're not forced to by the letter of the SQL spec.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-11-29 21:56:50
Message-ID: 1291067699-sup-7174@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Tom Lane's message of lun nov 29 18:33:19 -0300 2010:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Excerpts from Alvaro Herrera's message of lun nov 29 18:00:55 -0300 2010:
> >> Additionally, we'd have to expend some more cycles at the parse analysis
> >> phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
> >> columns belong into some non-partial unique index.

> Checking for existence of a unique index at parse analysis time is quite
> horrid anyway, because it means the validity of the query can change
> from parse time to execution time. We got stuck with some of that in
> relation to GROUP BY dependencies, but I don't want to buy into it
> anywhere that we're not forced to by the letter of the SQL spec.

Hmm. Are you less opposed to the idea of some new nonstandard syntax in
the locking clause, then? I propose "SELECT FOR KEY LOCK", though
anything that the RI code could use would be the same to me, of course.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-01 16:10:46
Message-ID: A5AF0437-B4A2-499D-8713-7FF0B2D122BF@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov29, 2010, at 22:33 , Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Excerpts from Alvaro Herrera's message of lun nov 29 18:00:55 -0300 2010:
>>> Additionally, we'd have to expend some more cycles at the parse analysis
>>> phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
>>> columns belong into some non-partial unique index.
>
>> Hmm, actually there's already a relcache lookup when we execute whatever
>> action the FK needs to execute, so maybe we don't need to do any of
>> this.
>
> Checking for existence of a unique index at parse analysis time is quite
> horrid anyway, because it means the validity of the query can change
> from parse time to execution time. We got stuck with some of that in
> relation to GROUP BY dependencies, but I don't want to buy into it
> anywhere that we're not forced to by the letter of the SQL spec.

The validity wouldn't change, only the kind of lock taken. If all columns to be locked are part of some unique index, we'd record that fact in the locked tuple's infomask, and thus know that only a certain subset of columns are to be prevented from being updated.

So I figured we could do the check while executing the query, probably when we're about to lock the first tuple. The result could be cached then, so we'd only incur the overhead once. I'd also expect the overhead to be pretty small compared to the subsequent update of the tuple's xmax and the IO that this entails.

best regards,
Florian Pflug


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-01 16:17:19
Message-ID: 17049.1291220239@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> The validity wouldn't change, only the kind of lock taken. If all columns to be locked are part of some unique index, we'd record that fact in the locked tuple's infomask, and thus know that only a certain subset of columns are to be prevented from being updated.

There's not enough space in the infomask to record which columns (or
which unique index) are involved. And if you're talking about data that
could remain on disk long after the unique index is gone, that's not
going to be good enough.

regards, tom lane


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-01 17:09:15
Message-ID: 186A0050-2557-4695-A7CB-4ABB356F81E9@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec1, 2010, at 17:17 , Tom Lane wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> The validity wouldn't change, only the kind of lock taken. If all columns to be locked are part of some unique index, we'd record that fact in the locked tuple's infomask, and thus know that only a certain subset of columns are to be prevented from being updated.
>
> There's not enough space in the infomask to record which columns (or
> which unique index) are involved. And if you're talking about data that
> could remain on disk long after the unique index is gone, that's not
> going to be good enough.

We'd distinguish two cases
A) The set of locked columns is a subset of the set of columns mentioned in
*any* unique index. (In other words, for every locked column there is a
unique index which includes that column, though not necessarily one index
which includes them all)
B) The set of locked columns does not satisfy (A)

We'd mark case (A) by a bit in the infomask (say, HEAP_XMAX_SHARED_LOCK_KEYONLY) when SHARE locking the row. An UPDATE on such a SHARE locked row would be allowed despite the lock if it only changed columns not mentioned by any unique index.

Creating indices shouldn't pose a problem, since it would only enlarge the set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEY set. Dropping an index requires some care, since it retroactively reduces the set of locked columns for pre-existing HEAP_XMAX_SHARED_LOCK_KEY locks. A possible solution for that might be to disregard HEAP_XMAX_SHARED_LOCK_KEYONLY, thus treating the row as fully share-locked, if a unique index was recently dropped. Assuming one can find a suitable definition of "recently" in this context, of course. Alvaro's initial proposal suffers from essentially the same problem I believe. It is somewhat mitigated though by the fact that he expects KEY locks to only be used by FOREIGN KEYS. Thus, if a unique index is dropped it either wasn't used by any FK constraint, in which case the retroactive reduction of lock strength doesn't matter, or the FK constraint must haven been dropped first, in which case the importance of enforcing the FK is somewhat diminished.

best regards,
Florian Pflug


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-01 17:44:03
Message-ID: 18626.1291225443@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> On Dec1, 2010, at 17:17 , Tom Lane wrote:
>> There's not enough space in the infomask to record which columns (or
>> which unique index) are involved. And if you're talking about data that
>> could remain on disk long after the unique index is gone, that's not
>> going to be good enough.

> We'd distinguish two cases
> A) The set of locked columns is a subset of the set of columns mentioned in
> *any* unique index. (In other words, for every locked column there is a
> unique index which includes that column, though not necessarily one index
> which includes them all)
> B) The set of locked columns does not satisfy (A)

How's that fix it? The on-disk flags are still falsifiable by
subsequent index changes.

> Creating indices shouldn't pose a problem, since it would only enlarge the set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEY set.

Not with that definition. I could create a unique index that doesn't
contain some column that every previous unique index contained.

regards, tom lane


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-01 21:57:18
Message-ID: 976C0EBC-2184-49F1-9042-27219E9FB2E3@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec1, 2010, at 18:44 , Tom Lane wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> On Dec1, 2010, at 17:17 , Tom Lane wrote:
>>> There's not enough space in the infomask to record which columns (or
>>> which unique index) are involved. And if you're talking about data that
>>> could remain on disk long after the unique index is gone, that's not
>>> going to be good enough.
>
>> We'd distinguish two cases
>> A) The set of locked columns is a subset of the set of columns mentioned in
>> *any* unique index. (In other words, for every locked column there is a
>> unique index which includes that column, though not necessarily one index
>> which includes them all)
>> B) The set of locked columns does not satisfy (A)
>
>> Creating indices shouldn't pose a problem, since it would only enlarge the set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEY set.
>
> Not with that definition. I could create a unique index that doesn't
> contain some column that every previous unique index contained.
Hm, I seem to be unable to put the definition into unambiguous words. Let me try again.

Set A contains all columns for which an unique index exist which includes said column. More formally, if C_i is the set of columns included in the index I, then A is the union over all C_i with I ranging over the unique indices.

If for example index unique index I1 contains the columns (a,b) and unique index I2 contains the columns (b,c) (and no other unique indices exist) then the set contains the columns a,b,c. If there is an additional index I3 on (d), then A = {a,b,c,d}. If a row is SHARE locked and HEAP_XMAX_SHARED_LOCK_KEY is set, only these columns are considered to be locked. If the flag is not set, all columns are considered locked.

best regards,
Florian Pflug


From: Jim Nasby <jim(at)nasby(dot)net>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-01 23:59:11
Message-ID: 5DF35D5E-53B4-4D12-8652-8FA3823567D1@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 1, 2010, at 11:09 AM, Florian Pflug wrote:
> An UPDATE on such a SHARE locked row would be allowed despite the lock if it only changed columns not mentioned by any unique index.

On a side-note, by "changed columns" do you mean the column appeared in the UPDATE statement, or the data actually changed? I suspect the former might be easier to implement, but it's really going to fsck with some applications (Rails is one example that comes to mind).
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-02 03:53:59
Message-ID: DFE17B65-07DD-469E-AFEA-536756F9D3B1@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec2, 2010, at 00:59 , Jim Nasby wrote:
> On Dec 1, 2010, at 11:09 AM, Florian Pflug wrote:
>> An UPDATE on such a SHARE locked row would be allowed despite the lock if it only changed columns not mentioned by any unique index.
>
> On a side-note, by "changed columns" do you mean the column appeared in the UPDATE statement, or the data actually changed? I suspect the former might be easier to implement, but it's really going to fsck with some applications (Rails is one example that comes to mind).

The most sensible thing to do is probably to make it mean "columns whose new value's binary representation differs from the old value's binary representation". That is also what is checked for HOT updated I believe, though I didn't recheck...

best regards,
Florian Pflug


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-02 13:52:35
Message-ID: 1291297625-sup-368@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Tom Lane's message of mié dic 01 14:44:03 -0300 2010:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
> > On Dec1, 2010, at 17:17 , Tom Lane wrote:
> >> There's not enough space in the infomask to record which columns (or
> >> which unique index) are involved. And if you're talking about data that
> >> could remain on disk long after the unique index is gone, that's not
> >> going to be good enough.
>
> > We'd distinguish two cases
> > A) The set of locked columns is a subset of the set of columns mentioned in
> > *any* unique index. (In other words, for every locked column there is a
> > unique index which includes that column, though not necessarily one index
> > which includes them all)
> > B) The set of locked columns does not satisfy (A)
>
> How's that fix it? The on-disk flags are still falsifiable by
> subsequent index changes.
>
> > Creating indices shouldn't pose a problem, since it would only enlarge the set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEY set.
>
> Not with that definition. I could create a unique index that doesn't
> contain some column that every previous unique index contained.

This is not a problem, because a later UPDATE that tried to modify a
locked tuple on a column only indexed by the new index, would consider
it locked. Which is OK. There would only be a problem if we were
allowed to drop an index (which would reduce the set of locked columns),
but we don't allow this because DROP INDEX requires an exclusive lock on
the table and thus there cannot be anyone with key-locked tuples in it.

AFAICT Florian's definition is good.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving foreign key locks
Date: 2010-12-03 16:31:27
Message-ID: 1291393399-sup-3906@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Of course, there is a much more serious problem with the whole idea. I
have worked through most of the necessary changes and I'm now down to
changing heap_udate to comply with the new locking protocol.

The problem I'm now facing is that we need to know the set of updated
columns pretty early -- just after calling HeapTupleSatisfiesUpdate, in
fact, so that we can change a BeingUpdated result into MayBeUpdated if
the set of columns is right. However, we don't know the set of updated
columns until much later, when the function has already undergone a lot
of other work and checked other possible error conditions.

Short of resturcturing the function so that we can obtain the set of
updated columns early (which I'm not fond of doing and may not even be
possible), I'm thinking that we should be "optimistic" about the set of
updated columns, and recheck them later. The problem with this idea, of
course, is that if the test turns out to fail, we could have possibly
caused a lot of work (such as TOAST changes) that now need to be
discarded. In other words, the optimization is pessimizing some cases
:-(

I'm not seeing any other way around this ATM.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support