From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: alternative model for handling locking in parallel groups
Date: 2014-11-13 20:59:11
Message-ID: CA+TgmoYGpLoJo+LG1beBOs8gdjwjTQ0qdmxsYJ4ihFyJ11Tr-g@mail.gmail.com
Lists: pgsql-hackers

Discussion of my incomplete group locking patch seems to have
converged around two points: (1) Everybody agrees that undetected
deadlocks are unacceptable. (2) Nobody agrees with my proposal to
treat locks held by group members as mutually non-conflicting. As was
probably evident from the emails on the other thread, it was not
initially clear to me why you'd EVER want heavyweight locks held by
different group members to mutually conflict, but after thinking it
over for a while, I started to think of cases where you would
definitely want that:

1. Suppose two or more parallel workers are doing a parallel COPY.
Each time the relation needs to be extended, one backend or the other
will need to take the relation extension lock in Exclusive mode.
Clearly, taking this lock needs to exclude both workers in the same
group and also unrelated processes (see the sketch just after case 2
below).

2. Suppose two or more parallel workers are doing a parallel
sequential scan, with a filter condition of myfunc(a.somecol), and
that myfunc(a.somecol) updates a tuple in some other table. Access to
update that tuple will be serialized using tuple locks, and it's no
safer for two parallel workers to do this at the same time than it
would be for two unrelated processes to do it at the same time.
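
To make case 1 concrete, the extension dance each worker goes through
looks roughly like this (the lmgr and bufmgr calls are the real ones;
the wrapper function is invented here and simplified from the
heap-insert path):

    /* Simplified sketch, not actual heap code: every backend,
     * parallel worker or not, must take the relation extension lock
     * in exclusive mode before adding a block, so two workers in the
     * same group must exclude each other here exactly as two
     * unrelated backends would. */
    static Buffer
    extend_relation_by_one_block(Relation relation)
    {
        Buffer      buffer;

        LockRelationForExtension(relation, ExclusiveLock);
        buffer = ReadBuffer(relation, P_NEW);   /* extends the relation */
        UnlockRelationForExtension(relation, ExclusiveLock);
        return buffer;
    }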

On the other hand, I think there are also some cases where you pretty
clearly DO want the locks among group members to be mutually
non-conflicting, such as:

3. Parallel VACUUM. VACUUM takes ShareUpdateExclusiveLock, so that
only one process can be vacuuming a relation at the same time. Now,
if you've got several processes in a group that are collaborating to
vacuum that relation, they clearly need to avoid excluding each other,
but they still need to exclude other people. And in particular,
nobody else should get to start vacuuming that relation until the last
member of the group exits. So what you want is a
ShareUpdateExclusiveLock that is, in effect, shared across the whole
group, so that it's only released when the last process exits.

4. Parallel query on a locked relation. Parallel query should work on
a table created in the current transaction, or one explicitly locked
by user action. It's not acceptable for that to just randomly
deadlock, and skipping parallelism altogether, while it'd probably be
acceptable for a first version, is not going to be a good long-term
solution. It also sounds buggy and fragile for the query planner to
try to guess whether the lock requests in the parallel workers will
succeed or fail when issued. Figuring such details out is the job of
the lock manager or the parallelism infrastructure, not the query
planner.

After thinking about these cases for a bit, I came up with a new
possible approach to this problem. Suppose that, at the beginning of
parallelism, when we decide to start up workers, we grant all of the
locks already held by the master to each worker (ignoring the normal
rules for lock conflicts). Thereafter, we do everything the same as
now, with no changes to the deadlock detector. That allows the lock
conflicts to happen normally in the first two cases above, while
preventing the unwanted lock conflicts in the second two cases.
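
To make the proposal concrete, here is a toy model in C (invented for
illustration, not PostgreSQL code): each lock is just a table of
(pid, mode) grants, and the one new operation copies every grant the
leader holds to the worker with no conflict check at all, while every
later acquisition still goes through the normal conflict-checked path:

    #include <stddef.h>

    #define MAX_GRANTS 64

    typedef struct Grant { int pid; int mode; } Grant;
    typedef struct Lock  { Grant grants[MAX_GRANTS]; int ngrants; } Lock;

    /*
     * Toy version of the proposed startup step: for every lock the
     * leader already holds, give the worker an identical grant.
     * Deliberately no conflict check here; that is the whole point.
     */
    static void
    grant_leader_locks_to_worker(Lock *locks, size_t nlocks,
                                 int leader_pid, int worker_pid)
    {
        for (size_t i = 0; i < nlocks; i++)
        {
            int preexisting = locks[i].ngrants;

            for (int j = 0; j < preexisting; j++)
            {
                if (locks[i].grants[j].pid == leader_pid &&
                    locks[i].ngrants < MAX_GRANTS)
                {
                    locks[i].grants[locks[i].ngrants].pid = worker_pid;
                    locks[i].grants[locks[i].ngrants].mode =
                        locks[i].grants[j].mode;
                    locks[i].ngrants++;
                }
            }
        }
    }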

I believe that it can also be made safe against undetected deadlocks.
Suppose A1 and A2 are members of the same parallel group. If A1 waits
for some unrelated process B which waits for A2, then there are two
possibilities. One is that the blocking lock held by A2 was acquired
before the beginning of parallelism. In that case, both A1 and A2
will hold the problematic lock, so the deadlock detector will notice
that A1 waits for B waits for A1 and report the deadlock. Woohoo!
The other option is that the blocking lock was acquired after the
start of parallelism. But if that's the case, it probably *isn't* a
deadlock, because the lock in question is likely one we intend to hold
only briefly, like a relation extension lock, tuple lock, or a
relation lock on a system catalog that we're scanning for a system
cache lookup. So it's fine to let A1 wait, in this case. In due
time, A2 will release the lock, and everyone will be happy.
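
A toy illustration of the first case (invented code, in the same
spirit as the earlier sketch): because the grant-copying step made A1
a holder of the very same lock as A2, any process queued behind A2 is
also queued behind A1, so the waits-for graph contains the edge that
closes the cycle:

    #include <stdbool.h>
    #include <stdio.h>

    #define NPROC 3                 /* 0 = A1, 1 = A2, 2 = B */

    /* waits_for[i][j]: process i waits for a lock that j holds */
    static bool waits_for[NPROC][NPROC];

    /* Depth-first search for a cycle that returns to "start". */
    static bool
    find_cycle(int start, int cur, bool visited[NPROC])
    {
        for (int next = 0; next < NPROC; next++)
        {
            if (!waits_for[cur][next])
                continue;
            if (next == start)
                return true;        /* closed the loop: deadlock */
            if (!visited[next])
            {
                visited[next] = true;
                if (find_cycle(start, next, visited))
                    return true;
            }
        }
        return false;
    }

    int
    main(void)
    {
        waits_for[0][2] = true;     /* A1 waits for a lock B holds */
        waits_for[2][1] = true;     /* B waits for the lock A2 holds... */
        waits_for[2][0] = true;     /* ...which A1 was also granted */

        bool visited[NPROC] = {false};
        printf(find_cycle(0, 0, visited)
               ? "deadlock: A1 -> B -> A1\n" : "no deadlock\n");
        return 0;
    }

Drop the third edge (the copied grant) and the same search finds no
cycle even though the group can never make progress; that undetected
deadlock is exactly what granting the leader's locks to the worker
avoids.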

But wait, I hear you cry! What if (as in case #2 above) the lock in
question is long-lived, one which we intend to retain until
transaction commit? I think we can just decide not to support this
case for right now. Impose a coding rule that nothing done in
parallel mode can acquire such a lock; when a process finishes the
parallel section of a computation, before it waits for other members
of the parallel group or terminates, verify that it has not
accumulated any such locks, and throw an error if it has. We could
make this case
work, but it's tricky. Aside from needing to work out how to make
deadlock detection work, we'd need to transfer the lock in the
opposite direction, from child to parent. Otherwise, we'd have a
behavior difference vs. the non-parallel case: instead of being
retained until transaction commit, the lock would only be retained
until the current parallel phase ended. For now, it seems fine not to
allow this: lots of useful things will be parallel-safe even without
support for this case.
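
One way the end-of-phase check could look (a sketch only:
LockMethodLocalHash, LOCALLOCK, and the dynahash scan are the real
lock.c machinery, but the acquiredInParallelMode flag is invented
here and would have to be set by LockAcquire while parallel mode is
active):

    /*
     * Sketch: before waiting for the rest of the group or exiting,
     * scan this backend's local lock table and complain about any
     * lock taken during parallel mode that is still held.
     * "acquiredInParallelMode" is a hypothetical LOCALLOCK field.
     */
    static void
    CheckNoLocksRetainedFromParallelMode(void)
    {
        HASH_SEQ_STATUS status;
        LOCALLOCK  *locallock;

        hash_seq_init(&status, LockMethodLocalHash);
        while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
        {
            if (locallock->acquiredInParallelMode && locallock->nLocks > 0)
                elog(ERROR, "cannot retain lock acquired in parallel mode");
        }
    }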

We might have problems if the following sequence of events occurs: (1)
the user backend acquires AccessExclusiveLock on a relation; (2) it
spawns a parallel worker; (3) either of the two processes now attempts
to get AccessShareLock on that relation. LockCheckConflicts() will
(correctly) realize that the acquiring process's own AccessExclusiveLock
doesn't conflict with its desire for AccessShareLock, but it will
still think that the other one does. To fix that, we can teach
LockCheckConflicts() that it's always OK to get a lock if your
existing locks conflict with all modes with which the new lock would
also conflict. I'm not absolutely positive that's a big enough
band-aid, but it's certainly simple to implement, and I can't think of
a problem with it off-hand.
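
Expressed against lock.c's conflict tables, the rule could look like
this (HeldModesSubsumeRequest is an invented name; conflictTab,
LOCKBIT_ON, and LockMethod are the real lock.h machinery):

    /*
     * Sketch of the proposed relaxation.  If some mode we already
     * hold conflicts with every mode that the requested mode
     * conflicts with, then any third party that could block the
     * request would already be blocked by us, so granting the
     * request cannot create a new conflict.
     */
    static bool
    HeldModesSubsumeRequest(LockMethod lockMethodTable,
                            LOCKMASK heldMask, LOCKMODE waitMode)
    {
        LOCKMASK    waitConflicts = lockMethodTable->conflictTab[waitMode];
        int         mode;

        for (mode = 1; mode <= lockMethodTable->numLockModes; mode++)
        {
            if ((heldMask & LOCKBIT_ON(mode)) != 0 &&
                (lockMethodTable->conflictTab[mode] & waitConflicts)
                    == waitConflicts)
                return true;
        }
        return false;
    }

In the AccessExclusiveLock-then-AccessShareLock example above, the
held AccessExclusiveLock conflicts with every mode, which certainly
includes everything AccessShareLock conflicts with, so the test
passes and LockCheckConflicts() could report no conflict immediately.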

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
