augmenting MultiXacts to improve foreign keys

Lists: pgsql-hackers
From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: augmenting MultiXacts to improve foreign keys
Date: 2011-08-09 17:01:04
Message-ID: 1312907125-sup-9346@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Previously, we had some discussion to introduce a new type of tuple lock
to let foreign keys be lighter weight, allowing for more concurrency.
During the course of that discussion, it became evident that the
solution being proposed didn't go all the way to solve the problem. A
proposal by Noah Misch seems like it would fix the whole problem, but it
requires some more rejiggering than I had considered.

This email summarizes what I just posted in a blog article here:
http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/
and adds some implementation details.

The proposal there is to create two new tuple lock types, not one. They
have been called "KEY UPDATE" and "KEY SHARE".

Proposed conflict table:
KEY UPDATE FOR UPDATE FOR SHARE KEY SHARE
KEY UPDATE X X X X
FOR UPDATE X X X
FOR SHARE X X
KEY SHARE X

DELETE always grabs KEY UPDATE lock on a tuple.
UPDATE grabs KEY UPDATE if the key is being modified, otherwise FOR UPDATE.
SELECT FOR UPDATE grabs FOR UPDATE.
SELECT FOR SHARE grabs FOR SHARE.
SELECT FOR KEY SHARE grabs FOR KEY SHARE. This is the mode used by RI triggers.

Note that this means that UPDATE would always have to check whether key
columns are being modified. (I loosely talk about "key columns" here
referring to columns involving unique indexes. On tables with no unique
indexes, I think this would have to mean all columns, but I haven't
thought much about this bit.)

Note that the KEY UPDATE lock would be an internal option, not exposed
to SQL. I think we already have enough extensions in this area. We are
forced to expose KEY SHARE because RI triggers get to it via SPI, but I
would be happy to avoid doing it if I knew how. I would also love to
get rid of FOR SHARE because I don't think they serve any purpose
anymore, but since it has already been exposed to SQL for several
releases, I'm not sure that it's a viable thing to do.

To implement this, we need to augment MultiXact to store the lock type
that each comprising Xid holds on the tuple. Two bits per Xid are
needed. My thinking is that we could simply extend the "members" to add
a byte for every four members. This should be relatively simple, though
this forecloses the option of using MultiXact for some other purpose
than locking tuples. This being purely theoretical, I don't have a
problem with that.

Note that the original keylocks patch I posted a couple of weeks ago has
a caveat: if transaction A grabs share lock and transaction B grabs key
lock, there's no way to know who owns which. I dismissed that problem
as unimportant (and probably infrequent); the good thing about this new
idea is that we wouldn't have that problem.

This would also help us find a solution to the problem that an aborted
subtransaction that updates or excl-locks a tuple causes an earlier
shared lock to be forgotten. We would deal with this by marking the
Xmax with a new MultiXact that includes both the lock and the update.
This means that this MultiXact would have to be WAL-logged. I would
leave that for a later patch, but I think it's good that there's a way
to fix it.

Thoughts?

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-09 18:41:14
Message-ID: 1312915274.19455.7.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-08-09 at 13:01 -0400, Alvaro Herrera wrote:
> Note that the KEY UPDATE lock would be an internal option, not exposed
> to SQL. I think we already have enough extensions in this area. We are
> forced to expose KEY SHARE because RI triggers get to it via SPI, but I
> would be happy to avoid doing it if I knew how.

Right now, FKs aren't really very special, they are mostly just
syntactic sugar (right?). This proposal would make FKs special internal
mechanisms, and I don't see the benefit in doing so.

[ I didn't read through the previous threads yet, so perhaps this was
already discussed. ]

Regards,
Jeff Davis


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-09 18:51:03
Message-ID: 1312915676-sup-6214@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Jeff Davis's message of mar ago 09 14:41:14 -0400 2011:
> On Tue, 2011-08-09 at 13:01 -0400, Alvaro Herrera wrote:
> > Note that the KEY UPDATE lock would be an internal option, not exposed
> > to SQL. I think we already have enough extensions in this area. We are
> > forced to expose KEY SHARE because RI triggers get to it via SPI, but I
> > would be happy to avoid doing it if I knew how.
>
> Right now, FKs aren't really very special, they are mostly just
> syntactic sugar (right?). This proposal would make FKs special internal
> mechanisms, and I don't see the benefit in doing so.

Well, you can get the same behavior by adding the constraint triggers
manually. But those triggers are written in C, so you could equally
claim that they are "special internal" already. The SPI interface has
some special entry points to allow them to work correctly (for example
passing a snapshot for the checks to run with).

In any case, this is certainly not something I'm really interested in
doing. I don't have a problem with simply adding the new syntax to SQL
and documenting it appropriately ("this is only for internal RI use").

> [ I didn't read through the previous threads yet, so perhaps this was
> already discussed. ]

Nope.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-09 19:10:37
Message-ID: 64B402D3-368C-48EE-AA4B-879F4004F3B2@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug9, 2011, at 19:01 , Alvaro Herrera wrote:
> This would also help us find a solution to the problem that an aborted
> subtransaction that updates or excl-locks a tuple causes an earlier
> shared lock to be forgotten. We would deal with this by marking the
> Xmax with a new MultiXact that includes both the lock and the update.
> This means that this MultiXact would have to be WAL-logged. I would
> leave that for a later patch, but I think it's good that there's a way
> to fix it.

Yeah, I pondered something similar in the past, but never got around to
figuring out the details.

This would also allow us to modify the behaviour of tuple locks under
isolation level REPEATABLE READ to throw a serialization error if the
row was locked by an invisible transaction. Doing so would allow us to
get rid of the strange re-checking logic that RI triggers must currently
use in REPEATABLE READ mode, and would thus make it possible to implement
custom RI triggers in PL/pgSQL.

I'll try to find some time to check how that fits in with the new
per-tuple lock levels you proposed.

best regards,
Florian Pflug


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-09 19:41:00
Message-ID: 68C9008E-69A2-42EC-B56F-51B4EED7F56A@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug9, 2011, at 19:01 , Alvaro Herrera wrote:
> Note that this means that UPDATE would always have to check whether key
> columns are being modified. (I loosely talk about "key columns" here
> referring to columns involving unique indexes. On tables with no unique
> indexes, I think this would have to mean all columns, but I haven't
> thought much about this bit.)
>
> To implement this, we need to augment MultiXact to store the lock type
> that each comprising Xid holds on the tuple. Two bits per Xid are
> needed. My thinking is that we could simply extend the "members" to add
> a byte for every four members. This should be relatively simple, though
> this forecloses the option of using MultiXact for some other purpose
> than locking tuples. This being purely theoretical, I don't have a
> problem with that.

Hm, I'm still not a huge fan of having the only the choice between locking
all columns, or all columns that are part of a unique index. For multiple
reasons.

First, I'd like to see us support FKs which reference non-unqiue columns.
As long as we restrict such FK constraints to ON UPDATE/DELETE RESTRICT
(or NO ACTION), there's no conceptual difficulty in doing that. Yeah, I
know that there'll probably be a lot of push-back on the grounds of standards
compliance, but I still believe it'd be a nice feature.

Second, there are lots of reasons for adding UNIQUE constraints to columns
beside being on the referencing side of a FK constraint. If we make the
lock strength taken by UPDATE depend on whether or not the UPDATE modified a
column included in a UNIQUE constraint, we force people to drop UNIQUE
constraints to avoid deadlocks, and thus indirectly support sloppy schema
design.

Couldn't we simply give the user a way to specify, per column, whether or
not that column is a "KEY" column? Creating a foreign key constraint could
still implicitly mark all referenced columns as KEY columns, but columns
would no longer be "KEY" columns simply because they're part of a UNIQUE
constraint. Users would be free to add arbitrary columns to the set of
"KEY" columns by doing ALTER TABLE ALTER COLUMN SET KEY.

best regards,
Florian Pflug


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-09 19:50:27
Message-ID: 1312919106-sup-3707@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Florian Pflug's message of mar ago 09 15:41:00 -0400 2011:

> First, I'd like to see us support FKs which reference non-unqiue columns.
> As long as we restrict such FK constraints to ON UPDATE/DELETE RESTRICT
> (or NO ACTION), there's no conceptual difficulty in doing that. Yeah, I
> know that there'll probably be a lot of push-back on the grounds of standards
> compliance, but I still believe it'd be a nice feature.
>
> Second, there are lots of reasons for adding UNIQUE constraints to columns
> beside being on the referencing side of a FK constraint. If we make the
> lock strength taken by UPDATE depend on whether or not the UPDATE modified a
> column included in a UNIQUE constraint, we force people to drop UNIQUE
> constraints to avoid deadlocks, and thus indirectly support sloppy schema
> design.
>
> Couldn't we simply give the user a way to specify, per column, whether or
> not that column is a "KEY" column? Creating a foreign key constraint could
> still implicitly mark all referenced columns as KEY columns, but columns
> would no longer be "KEY" columns simply because they're part of a UNIQUE
> constraint. Users would be free to add arbitrary columns to the set of
> "KEY" columns by doing ALTER TABLE ALTER COLUMN SET KEY.

What you propose seems reasonable to me (allegedly not an expert in
keys). However, I don't see that it conflicts with my proposal in any
way -- I mean, on top of my patch you could build something that changes
what columns are considered "keys". This is why the "KEY SHARE" rowmark
option is going to be documented as "only for internal use", because it
might change depending on how we define foreign-key-referenceable
fields.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-09 20:40:15
Message-ID: 12383.1312922415@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Jeff Davis's message of mar ago 09 14:41:14 -0400 2011:
>> Right now, FKs aren't really very special, they are mostly just
>> syntactic sugar (right?). This proposal would make FKs special internal
>> mechanisms, and I don't see the benefit in doing so.

> Well, you can get the same behavior by adding the constraint triggers
> manually. But those triggers are written in C, so you could equally
> claim that they are "special internal" already. The SPI interface has
> some special entry points to allow them to work correctly (for example
> passing a snapshot for the checks to run with).

Yeah, the crosscheck-snapshot logic already puts the lie to any idea
that the RI triggers are equivalent to anything available at the SQL
level.

ISTM you could pass down the "please do this with FOR KEY UPDATE not
just FOR SHARE" flag similarly to the way the crosscheck snapshot is
passed down, or maybe even just do it if you see a crosscheck snapshot
is present in the executor state (though that's a bit ugly).

Like Florian, I'm considerably more concerned about the aspect of
deciding which columns are "key columns" and whether they changed.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-09 21:03:24
Message-ID: 1312923126-sup-4707@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Tom Lane's message of mar ago 09 16:40:15 -0400 2011:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Excerpts from Jeff Davis's message of mar ago 09 14:41:14 -0400 2011:
> >> Right now, FKs aren't really very special, they are mostly just
> >> syntactic sugar (right?). This proposal would make FKs special internal
> >> mechanisms, and I don't see the benefit in doing so.
>
> > Well, you can get the same behavior by adding the constraint triggers
> > manually. But those triggers are written in C, so you could equally
> > claim that they are "special internal" already. The SPI interface has
> > some special entry points to allow them to work correctly (for example
> > passing a snapshot for the checks to run with).
>
> Yeah, the crosscheck-snapshot logic already puts the lie to any idea
> that the RI triggers are equivalent to anything available at the SQL
> level.
>
> ISTM you could pass down the "please do this with FOR KEY UPDATE not
> just FOR SHARE" flag similarly to the way the crosscheck snapshot is
> passed down, or maybe even just do it if you see a crosscheck snapshot
> is present in the executor state (though that's a bit ugly).

You mean FOR KEY SHARE ...

Yeah, I could pass it down that way. But if we go that route, do we
need to specify any specific rowmark clause at all? Maybe we could go
back to FOR UPDATE and use the flag to tell the executor that it's
actually FOR KEY SHARE, and deprecate the FOR SHARE clause altogether.

> Like Florian, I'm considerably more concerned about the aspect of
> deciding which columns are "key columns" and whether they changed.

I will keep this in mind. I think this problem is considerably easier
to solve than the different rowmark per xact in MultiXact, anyway.
If we require the user to specify which unique indexes or constraints
are appropriate for FKs, it becomes trivial.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-10 00:00:32
Message-ID: 221E6C16-CC0F-460D-9B84-675CD6312DE2@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug 9, 2011, at 4:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Like Florian, I'm considerably more concerned about the aspect of
> deciding which columns are "key columns" and whether they changed.

Should we consider trying implement a system that can lock individual columns?

Either way, my main concern is that we're going to end up pessimizing the common case of UPDATE, by making it do extra work to reduce the chances of a lock conflict from an incoming foreign key. Most of the time there won't be an incoming foreign key, or the referring row won't get updated, and that work will be wasted. It would be nice to just lock the row "for some kind of update" and sort out what exactly we locked only if a possible conflict comes along.

...Robert


From: Noah Misch <noah(at)2ndQuadrant(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-10 06:45:37
Message-ID: 20110810064534.GA15182@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 09, 2011 at 09:41:00PM +0200, Florian Pflug wrote:
> Couldn't we simply give the user a way to specify, per column, whether or
> not that column is a "KEY" column? Creating a foreign key constraint could
> still implicitly mark all referenced columns as KEY columns, but columns
> would no longer be "KEY" columns simply because they're part of a UNIQUE
> constraint. Users would be free to add arbitrary columns to the set of
> "KEY" columns by doing ALTER TABLE ALTER COLUMN SET KEY.

What would be the use case for manually expanding the set of key columns?

If you have a heavily-updated table referenced by slowly-changing tables, it
might pay off to disable the optimization entirely. That would be an esoteric
escape hatch to provide for an uncommon use case, though.

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Noah Misch <noah(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-10 07:11:16
Message-ID: 20110810071115.GB15182@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 09, 2011 at 01:01:04PM -0400, Alvaro Herrera wrote:
> KEY UPDATE FOR UPDATE FOR SHARE KEY SHARE
> KEY UPDATE X X X X
> FOR UPDATE X X X
> FOR SHARE X X
> KEY SHARE X
>
> DELETE always grabs KEY UPDATE lock on a tuple.
> UPDATE grabs KEY UPDATE if the key is being modified, otherwise FOR UPDATE.
> SELECT FOR UPDATE grabs FOR UPDATE.
> SELECT FOR SHARE grabs FOR SHARE.
> SELECT FOR KEY SHARE grabs FOR KEY SHARE. This is the mode used by RI triggers.
>
> Note that this means that UPDATE would always have to check whether key
> columns are being modified. (I loosely talk about "key columns" here
> referring to columns involving unique indexes. On tables with no unique
> indexes, I think this would have to mean all columns, but I haven't
> thought much about this bit.)

On tables with no key columns, we can skip the datum comparisons and use KEY
UPDATE, because it does not matter: nobody would try to take KEY SHARE anyway.
(If KEY SHARE is SQL-exposed, a manual acquisition remains possible. It does
not seem important to cater to that, though.) Key columns should be those
columns actually referenced by incoming foreign keys; the relcache can maintain
that list. This was less important for the previous version, which didn't
compare datums prior to encountering a live KEY LOCK. It will now be more
important to avoid that overhead for tables lacking incoming FKs.

> To implement this, we need to augment MultiXact to store the lock type
> that each comprising Xid holds on the tuple. Two bits per Xid are
> needed. My thinking is that we could simply extend the "members" to add
> a byte for every four members. This should be relatively simple, though
> this forecloses the option of using MultiXact for some other purpose
> than locking tuples. This being purely theoretical, I don't have a
> problem with that.
>
> Note that the original keylocks patch I posted a couple of weeks ago has
> a caveat: if transaction A grabs share lock and transaction B grabs key
> lock, there's no way to know who owns which. I dismissed that problem
> as unimportant (and probably infrequent); the good thing about this new
> idea is that we wouldn't have that problem.

Is there a case not involving manual SELECT ... FOR <locktype> that will depend
on this for correctness? Even if not, there's a lot to like about this
proposal. However, I think I may be missing the condition you had in mind when
designing it.

> This would also help us find a solution to the problem that an aborted
> subtransaction that updates or excl-locks a tuple causes an earlier
> shared lock to be forgotten. We would deal with this by marking the
> Xmax with a new MultiXact that includes both the lock and the update.
> This means that this MultiXact would have to be WAL-logged. I would
> leave that for a later patch, but I think it's good that there's a way
> to fix it.

Interesting. Growing pg_multixact/members by 6% seems reasonable for those two
benefits. It also makes possible a manual FOR UPDATE over a KEY LOCK; that
previously looked problematic due to the lack of a later tuple version to
continue bearing the KEY LOCK.

Consider the simple case of a tuple with a single KEY LOCK which we then proceed
to UPDATE, not touching any key column. Will that old tuple version get a
multixact bearing both the FOR UPDATE locker and the KEY LOCK locker, or will it
carry just the FOR UPDATE locker with the new tuple witnessing the KEY LOCK?
The latter seems preferable to avoid an increase in pg_multixact usage, but I
haven't worked out whether it covers enough of the problem space.

Thanks,
nm

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-10 08:14:10
Message-ID: 8EF1FA5A-89D8-484D-BE00-2F60CF93555D@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug9, 2011, at 22:40 , Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Excerpts from Jeff Davis's message of mar ago 09 14:41:14 -0400 2011:
>>> Right now, FKs aren't really very special, they are mostly just
>>> syntactic sugar (right?). This proposal would make FKs special internal
>>> mechanisms, and I don't see the benefit in doing so.
>
>> Well, you can get the same behavior by adding the constraint triggers
>> manually. But those triggers are written in C, so you could equally
>> claim that they are "special internal" already. The SPI interface has
>> some special entry points to allow them to work correctly (for example
>> passing a snapshot for the checks to run with).
>
> Yeah, the crosscheck-snapshot logic already puts the lie to any idea
> that the RI triggers are equivalent to anything available at the SQL
> level.

True, but I still considered that to be a quite unfortunate limitation,
and one that I hope to one day remove. So I'd hate to see us adding more
stumbling blocks along that road.

Even today, you can do user-space FK-like constraint if you restrict
yourself to either running all transaction in isolation level READ COMMITTED,
or all transactions in isolation level SERIALIZABLE (Though I in the later
case, you don't need locks anyway..)

best regards,
Florian Pflug


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Noah Misch <noah(at)2ndQuadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-08-10 08:15:49
Message-ID: 562B5BB5-DA06-4CD6-881B-A99D1EEC8CA6@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug10, 2011, at 08:45 , Noah Misch wrote:
> On Tue, Aug 09, 2011 at 09:41:00PM +0200, Florian Pflug wrote:
>> Couldn't we simply give the user a way to specify, per column, whether or
>> not that column is a "KEY" column? Creating a foreign key constraint could
>> still implicitly mark all referenced columns as KEY columns, but columns
>> would no longer be "KEY" columns simply because they're part of a UNIQUE
>> constraint. Users would be free to add arbitrary columns to the set of
>> "KEY" columns by doing ALTER TABLE ALTER COLUMN SET KEY.
>
> What would be the use case for manually expanding the set of key columns?

You'd use that if you implemented a FK-like constraint (e.g. a FK on an
array-values field). Without a way to manually expand the set of KEY columns,
such a non-core FK constraint couldn't use the lighter locks.

best regards,
Florian Pflug


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-09-09 21:31:30
Message-ID: cmdap.323308e530.1315601945-sup-7377@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Alvaro Herrera's message of mar ago 09 13:01:04 -0400 2011:

> To implement this, we need to augment MultiXact to store the lock type
> that each comprising Xid holds on the tuple. Two bits per Xid are
> needed. My thinking is that we could simply extend the "members" to add
> a byte for every four members.

So I've been working on this, and I've noticed that somehow it seems to
have turned into a giant snowball. I'd like opinions on the items below
before I continue work here; if any of these ideas turns out to be a
showstopper, I'd like to know sooner rather than later.

1. since MultiXacts are going to contain updates and not just locks, it
means they will need to persist beyond OldestXmin -- in fact,
pg_multixact is going to have the same truncation rules as pg_clog,
namely the vacuum freeze horizon. Currently they are truncated very
quickly; this is not going to be true anymore.

2. This also means that we may have to resolve MultiXacts to their
comprising TransactionIds when tqual.c is doing visibility checks on the
tuples. Right now, the code simply does things like this:

if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);
return true;
}
/* further code checks HeapTupleHeaderGetXmax(tuple) */

It's now going to need to do something more like

if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
xmax = HeapTupleGetUpdateXid(tuple);
}
else
xmax = HeapTupleHeaderGetXmax(tuple);
/* further code checks our derived xmax */

where the HeapTupleGetUpdateXid function looks more or less like this

TransactionId
HeapTupleGetUpdateXid(HeapTupleHeader tuple)
{
TransactionId update_xact;

Assert(!(tuple->t_infomask & HEAP_XMAX_IS_NOT_UPDATE));
Assert(tuple->t_infomask & HEAP_XMAX_IS_MULTI);

MultiXactMember *members;

nmembers = GetMultiXactIdMembers(xwait, &members);

if (nmembers > 0)
{
int i;

for (i = 0; i < nmembers; i++)
{
/* KEY SHARE lockers are okay -- ignore it */
if (members[i].status == MultiXactStatusKeyShare)
continue;
/* there should be at most one updater */
Assert(update_xact == InvalidTransactionId);
/* other status values not acceptable because they
* conflict with update */
Assert(members[i].status == MultiXactStatusUpdate);
update_xact = members[i].xid;
}
}

return update_xact;
}

Which leads us to:

3. I've come up with HEAP_XMAX_IS_NOT_UPDATE in t_infomask, which means
that the Xmax, being a MultiXact, does not contain an update Xid. This
reuses the old value of HEAP_XMAX_SHARED_LOCK. I've used this rather
weird semantics for these reasons:

a) it's pg_upgrade compatible. Any tuple that has the SHARED_LOCK bit
from the older version set cannot contain an update, and so the bit is
automatically right.
b) it quickly avoids having to run the GetMultiXactIdMembers thingy
(which is expensive) in the common case that there's no update.
c) it lets me keep the HEAP_IS_LOCKED semantics; which means "this tuple
is only locked by Xmax, not updated", which is used extensively in
various places.
/*
* if any of these bits is set, xmax hasn't deleted the tuple, only locked it.
*/
#define HEAP_IS_LOCKED (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_IS_NOT_UPDATE | \
HEAP_XMAX_KEYSHR_LOCK)

4. FOR SHARE is not a very widely used clause, I think. FOR UPDATE is
part of the standard and thus it warrants quick innards, i.e. its own
hint bit. FOR KEY SHARE is going to be used by RI triggers and so it
should also be quick; I've also given it its own hint bit. However,
FOR SHARE is probably not used much and I've relegated it to being
mandatorily stored in MultiXact. That is, if someone requests a FOR
SHARE lock on a tuple, it will get a singleton MultiXact. The reason
for this is that I didn't want to use one more hint bit.

5. I've now used three hint bits -- reused HEAP_XMAX_SHARED_LOCK as
HEAP_XMAX_IS_NOT_UPDATE (already explained); used the free hint bit from
t_infomask as HEAP_XMAX_KEYSHR_LOCK (should be obvious); and I've used
0x2000 in t_infomask2 as HEAP_UPDATE_KEY_INTACT, to mean that this tuple
has been updated but the key columns have not been modified. This lets
heapam.c know that this tuple can be further key-share locked.

6. When locking a tuple that is being concurrently updated, the locking
transaction must obviously walk up the update chain (t_self) and lock
the updated version too. I have not tried to implement this yet but I
think it's going to be a bit weird -- I think I might need to modify
tuples that are HeapTupleInvisible and/or HeapTupleSelfUpdated according
to HeapTupleSatisfiesUpdate. (I even wonder if I'm going to need to
create a new HeapTupleSatisfies routine for this purpose).

Thoughts?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-09-12 21:22:40
Message-ID: 4E6E7820.3000002@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Anyone on all of this?

On 09/09/2011 02:31 PM, Alvaro Herrera wrote:
>
>
> Excerpts from Alvaro Herrera's message of mar ago 09 13:01:04 -0400 2011:
>
>> To implement this, we need to augment MultiXact to store the lock type
>> that each comprising Xid holds on the tuple. Two bits per Xid are
>> needed. My thinking is that we could simply extend the "members" to add
>> a byte for every four members.
>
> So I've been working on this, and I've noticed that somehow it seems to
> have turned into a giant snowball. I'd like opinions on the items below
> before I continue work here; if any of these ideas turns out to be a
> showstopper, I'd like to know sooner rather than later.
>
> 1. since MultiXacts are going to contain updates and not just locks, it
> means they will need to persist beyond OldestXmin -- in fact,
> pg_multixact is going to have the same truncation rules as pg_clog,
> namely the vacuum freeze horizon. Currently they are truncated very
> quickly; this is not going to be true anymore.
>
> 2. This also means that we may have to resolve MultiXacts to their
> comprising TransactionIds when tqual.c is doing visibility checks on the
> tuples. Right now, the code simply does things like this:
>
> if (tuple->t_infomask& HEAP_XMAX_IS_MULTI)
> {
> /* MultiXacts are currently only allowed to lock tuples */
> Assert(tuple->t_infomask& HEAP_IS_LOCKED);
> return true;
> }
> /* further code checks HeapTupleHeaderGetXmax(tuple) */
>
> It's now going to need to do something more like
>
> if (tuple->t_infomask& HEAP_XMAX_IS_MULTI)
> {
> if (tuple->t_infomask& HEAP_IS_LOCKED)
> return true;
> xmax = HeapTupleGetUpdateXid(tuple);
> }
> else
> xmax = HeapTupleHeaderGetXmax(tuple);
> /* further code checks our derived xmax */
>
> where the HeapTupleGetUpdateXid function looks more or less like this
>
> TransactionId
> HeapTupleGetUpdateXid(HeapTupleHeader tuple)
> {
> TransactionId update_xact;
>
> Assert(!(tuple->t_infomask& HEAP_XMAX_IS_NOT_UPDATE));
> Assert(tuple->t_infomask& HEAP_XMAX_IS_MULTI);
>
> MultiXactMember *members;
>
> nmembers = GetMultiXactIdMembers(xwait,&members);
>
> if (nmembers> 0)
> {
> int i;
>
> for (i = 0; i< nmembers; i++)
> {
> /* KEY SHARE lockers are okay -- ignore it */
> if (members[i].status == MultiXactStatusKeyShare)
> continue;
> /* there should be at most one updater */
> Assert(update_xact == InvalidTransactionId);
> /* other status values not acceptable because they
> * conflict with update */
> Assert(members[i].status == MultiXactStatusUpdate);
> update_xact = members[i].xid;
> }
> }
>
> return update_xact;
> }
>
>
> Which leads us to:
>
> 3. I've come up with HEAP_XMAX_IS_NOT_UPDATE in t_infomask, which means
> that the Xmax, being a MultiXact, does not contain an update Xid. This
> reuses the old value of HEAP_XMAX_SHARED_LOCK. I've used this rather
> weird semantics for these reasons:
>
> a) it's pg_upgrade compatible. Any tuple that has the SHARED_LOCK bit
> from the older version set cannot contain an update, and so the bit is
> automatically right.
> b) it quickly avoids having to run the GetMultiXactIdMembers thingy
> (which is expensive) in the common case that there's no update.
> c) it lets me keep the HEAP_IS_LOCKED semantics; which means "this tuple
> is only locked by Xmax, not updated", which is used extensively in
> various places.
> /*
> * if any of these bits is set, xmax hasn't deleted the tuple, only locked it.
> */
> #define HEAP_IS_LOCKED (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_IS_NOT_UPDATE | \
> HEAP_XMAX_KEYSHR_LOCK)
>
>
> 4. FOR SHARE is not a very widely used clause, I think. FOR UPDATE is
> part of the standard and thus it warrants quick innards, i.e. its own
> hint bit. FOR KEY SHARE is going to be used by RI triggers and so it
> should also be quick; I've also given it its own hint bit. However,
> FOR SHARE is probably not used much and I've relegated it to being
> mandatorily stored in MultiXact. That is, if someone requests a FOR
> SHARE lock on a tuple, it will get a singleton MultiXact. The reason
> for this is that I didn't want to use one more hint bit.
>
> 5. I've now used three hint bits -- reused HEAP_XMAX_SHARED_LOCK as
> HEAP_XMAX_IS_NOT_UPDATE (already explained); used the free hint bit from
> t_infomask as HEAP_XMAX_KEYSHR_LOCK (should be obvious); and I've used
> 0x2000 in t_infomask2 as HEAP_UPDATE_KEY_INTACT, to mean that this tuple
> has been updated but the key columns have not been modified. This lets
> heapam.c know that this tuple can be further key-share locked.
>
> 6. When locking a tuple that is being concurrently updated, the locking
> transaction must obviously walk up the update chain (t_self) and lock
> the updated version too. I have not tried to implement this yet but I
> think it's going to be a bit weird -- I think I might need to modify
> tuples that are HeapTupleInvisible and/or HeapTupleSelfUpdated according
> to HeapTupleSatisfiesUpdate. (I even wonder if I'm going to need to
> create a new HeapTupleSatisfies routine for this purpose).
>
> Thoughts?
>

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-09-13 14:02:51
Message-ID: CA+TgmoZBnUDBki_ku6Kxza+m32PEwZAk273dwRHfEu7hn1J+7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 9, 2011 at 5:31 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> [ multixact complexity ]

I wonder if it's a mistake to be thinking about solving this problem
by extending the MultiXact mechanism. Pushing xmax out-of-line so
that we have room to store tuple information seems expensive,
especially because there's no convenient way to undo it once the locks
are old enough not to be interesting any more. The current system
works because we never need both pieces of information at the same
time, but that's not going to be true any more.

I'm wondering if it would be possible to modify the main lock manager,
or create a special-purpose tuple lock manager, to record all tuple
locks, both awaited and granted. You'd need to make sure that if
there were more than a few locks the information could spill to disk
somehow, and you'd also need to make sure that you didn't pull that
information in from disk more often than necessary - i.e. you should
try to keep enough summary info in memory to determine whether there
*might* be a conflicting lock that spilled out, so that you only need
to go examine the spilled data if there's a possibility that you might
find something interesting there.

A system like this would make it possible to clean up all the lock
entries at transaction end, which would avoid a lot of the complexity
you mention. On the other hand, it's clearly not simple, either, and
I haven't thought through all the details...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-09-13 14:27:08
Message-ID: 1315923028-sup-1337@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Robert Haas's message of mar sep 13 11:02:51 -0300 2011:
>
> On Fri, Sep 9, 2011 at 5:31 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > [ multixact complexity ]
>
> I wonder if it's a mistake to be thinking about solving this problem
> by extending the MultiXact mechanism. Pushing xmax out-of-line so
> that we have room to store tuple information seems expensive,
> especially because there's no convenient way to undo it once the locks
> are old enough not to be interesting any more. The current system
> works because we never need both pieces of information at the same
> time, but that's not going to be true any more.

Hmm, it doesn't look that way to me: whenever you lock a row, all
previous lockers that are gone can now be forgotten. Locks that are old
enough not to be interesting, are constantly and automatically gone.

The only reason that multixact now needs to persist beyond currently
running transaction is the chance that there might be an "update xmax"
hiding somewhere; and tuples marked with those are going to be removed
by vacuum anyway. (I have been thinking that long before vacuum, we
could remove the multixact and replace it with a plain Xid, if the
lockers are all gone -- which is another part of your "undo it once the
locks are old enough".)

The expensive bit is the reason why I used a hint bit to mark this
possibility; we distinguish the cheap case of locked-but-not-updated
from the expensive one of locked-and-updated with hint bits, so the
cheap case stays cheap; and the expensive one requires a bit more work,
yes, but this brings more concurrency overall.

> I'm wondering if it would be possible to modify the main lock manager,
> or create a special-purpose tuple lock manager, to record all tuple
> locks, both awaited and granted. You'd need to make sure that if
> there were more than a few locks the information could spill to disk
> somehow, and you'd also need to make sure that you didn't pull that
> information in from disk more often than necessary - i.e. you should
> try to keep enough summary info in memory to determine whether there
> *might* be a conflicting lock that spilled out, so that you only need
> to go examine the spilled data if there's a possibility that you might
> find something interesting there.

This is where we started, back when we were creating SELECT FOR SHARE:
trying to spill the lock table. That idea went down in flames; consider
that someone might request to block an entire huge table, and you're in
trouble. There might have been other problems I don't recall with that
idea.

I don't want to go back to that drawing board -- obviously I'm not very
keen of going great lengths down the same path, only to fail, twice.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: augmenting MultiXacts to improve foreign keys
Date: 2011-09-14 12:07:26
Message-ID: CA+TgmoaOtxZqbBn_rWcN6rW7VFE-85ryEBbWy4uJdcTbxqbKxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 13, 2011 at 9:27 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
>> I wonder if it's a mistake to be thinking about solving this problem
>> by extending the MultiXact mechanism.  Pushing xmax out-of-line so
>> that we have room to store tuple information seems expensive,
>> especially because there's no convenient way to undo it once the locks
>> are old enough not to be interesting any more.  The current system
>> works because we never need both pieces of information at the same
>> time, but that's not going to be true any more.
>
> Hmm, it doesn't look that way to me: whenever you lock a row, all
> previous lockers that are gone can now be forgotten.  Locks that are old
> enough not to be interesting, are constantly and automatically gone.

Yes... but as you pointed out, you'll still have to keep the
MultiXacts around for a long time, because you have no way of knowing
for certain whether there might be an xmax referring to them.

> The only reason that multixact now needs to persist beyond currently
> running transaction is the chance that there might be an "update xmax"
> hiding somewhere; and tuples marked with those are going to be removed
> by vacuum anyway.  (I have been thinking that long before vacuum, we
> could remove the multixact and replace it with a plain Xid, if the
> lockers are all gone -- which is another part of your "undo it once the
> locks are old enough".)

You could do that, but odds are it won't buy you much. Generally by
the time you could do this, you'll either be able to mark xmax invalid
or truncate the tuple to a dead line pointer. The exception would be
the case where xmax outlives all the lockers, but I'm not sure whether
it's worth having code just to cater to that case.

> The expensive bit is the reason why I used a hint bit to mark this
> possibility; we distinguish the cheap case of locked-but-not-updated
> from the expensive one of locked-and-updated with hint bits, so the
> cheap case stays cheap; and the expensive one requires a bit more work,
> yes, but this brings more concurrency overall.

Right. I don't think what you're trying to do is necessarily bad; I
was just casting around for other solutions. From an efficiency
standpoint, I can see two things to worry about: first, keeping
multixacts around for longer consumes more disk space than not doing
so, and most of the time, that disk space will be wasted; and second,
we'll need to deference multixact XIDs more frequently, and that's not
free. Now, it may well be that both of those problems are
sufficiently small that we needn't worry about them.

>> I'm wondering if it would be possible to modify the main lock manager,
>> or create a special-purpose tuple lock manager, to record all tuple
>> locks, both awaited and granted.  You'd need to make sure that if
>> there were more than a few locks the information could spill to disk
>> somehow, and you'd also need to make sure that you didn't pull that
>> information in from disk more often than necessary - i.e. you should
>> try to keep enough summary info in memory to determine whether there
>> *might* be a conflicting lock that spilled out, so that you only need
>> to go examine the spilled data if there's a possibility that you might
>> find something interesting there.
>
> This is where we started, back when we were creating SELECT FOR SHARE:
> trying to spill the lock table.  That idea went down in flames; consider
> that someone might request to block an entire huge table, and you're in
> trouble.

Yeah, that's not much fun. On the other hand, our current
representation involves dirtying every page in the table, which isn't
much fun either. I'm not sure that this couldn't be made to work -
with some highly-compressed lock representation - but I can see why
you're not interested in trying it.

I wish I had some better ideas here, but I think this is just a tough
problem and it may be that any solution will be somewhat messy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company