Re: group locking: incomplete patch, just for discussion

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: group locking: incomplete patch, just for discussion
Date: 2014-11-06 02:26:37
Message-ID: CA+TgmoZZRz61LdjyT129=w_UJq+7QXHHwvSQsQAxLaK5anfBCg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The procgloballist stuff should be the subject of a separate patch
> which I agree with.

Yes, I think that's probably a net improvement in robustness quite
apart from what we decide to do about any of the rest of this. I've
attached it here as revise-procglobal-tracking.patch and will commit
that bit if nobody objects. The remainder is reattached without
change as group-locking-v0.1.patch.

Per your other comment, I've developed the beginnings of a testing
framework, which I've attached here as test_group_locking-v0.patch. That
doesn't look to have much hope of evolving into something we'd want
even in contrib, but I think it'll be rather useful for debugging. It
works like this:

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# select test_group_locking('1.0:start,2.0:start,1.0:lock:AccessExclusiveLock:foo,2.0:lock:AccessExclusiveLock:foo');
NOTICE: starting worker 1.0
NOTICE: starting worker 2.0
NOTICE: instructing worker 1.0 to acquire AccessExclusiveLock on relation with OID 16387
NOTICE: instructing worker 2.0 to acquire AccessExclusiveLock on relation with OID 16387
ERROR: could not obtain AccessExclusiveLock on relation with OID 16387
CONTEXT: background worker, group 2, task 0

The syntax is a little arcane, I guess, but it's "documented" in the
comments within. In this case I asked it to start up two background
workers and have them both try to take AccessExclusiveLock on table
foo. As expected, the second one fails. The idea is that workers are
identified by a pair of numbers X.Y; two workers with the same X-value
are in the same locking group. So if I call the second worker 1.1
rather than 2.0, it'll join the same locking group as worker 1.0 and
... then it does the wrong thing, and then it crashes the server,
because my completely-untested code is unsurprisingly riddled with
bugs.
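
To make the step syntax concrete, here is a minimal Python sketch of how
a string like the one above could be split into steps, and how workers
could be grouped by X-value. It is an illustration only, not the parser
in test_group_locking-v0.patch: the names Step, parse_steps, and
locking_groups are hypothetical, and it handles only the two step forms
that appear in the example (start and lock).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Step(NamedTuple):
    group: int                       # X in "X.Y": the locking group
    task: int                        # Y in "X.Y": the worker within that group
    action: str                      # "start" or "lock"
    lock_mode: Optional[str] = None  # e.g. "AccessExclusiveLock" (lock steps only)
    relation: Optional[str] = None   # e.g. "foo" (lock steps only)


def parse_steps(spec: str) -> List[Step]:
    """Split a comma-separated spec into Step tuples (hypothetical helper)."""
    steps = []
    for item in spec.split(","):
        fields = item.strip().split(":")
        if len(fields) < 2:
            raise ValueError(f"malformed step: {item!r}")
        group_str, dot, task_str = fields[0].partition(".")
        if not dot:
            raise ValueError(f"worker must be written X.Y: {item!r}")
        group, task = int(group_str), int(task_str)
        action = fields[1]
        if action == "start" and len(fields) == 2:
            steps.append(Step(group, task, "start"))
        elif action == "lock" and len(fields) == 4:
            steps.append(Step(group, task, "lock", fields[2], fields[3]))
        else:
            raise ValueError(f"unsupported step: {item!r}")
    return steps


def locking_groups(steps: List[Step]) -> Dict[int, List[int]]:
    """Map each group number X to the sorted task numbers Y started in it."""
    groups = defaultdict(set)
    for step in steps:
        if step.action == "start":
            groups[step.group].add(step.task)
    return {group: sorted(tasks) for group, tasks in groups.items()}
```

With the spec from the session above, workers 1.0 and 2.0 land in two
separate groups; renaming the second worker to 1.1 puts both in group 1.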

Eventually, this needs to be generalized a bit so that we can use it
to test deadlock detection. That's tricky, because what you really
want to do is tell worker A to wait for some lock and then, once
you're sure it's on the wait queue, tell worker B to go take some
other lock and check that you see the resulting deadlock. There
doesn't seem to be a good API for the user backend to find out whether
some background worker is waiting for some particular lock, so I may
have to resort to the hacky expedient of having the driver process
wait for a few seconds and assume that's long enough that the
background worker will be on the wait queue by then. Or maybe I can
come up with a better solution, but anyway it's not done yet.
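
For comparison with the fixed sleep, here is a minimal Python sketch of
the usual alternative: poll a condition until it becomes true or a
deadline passes. This is not something the patch does. The function
wait_until is hypothetical, and the predicate is a stand-in for
whatever check might tell the driver that the worker is on the wait
queue, which is exactly the piece that is missing today.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float,
               interval: float = 0.05) -> bool:
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the condition was observed, False on timeout.
    The predicate is checked at least once, even with a zero timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The driver would then move on to the next step as soon as the condition
holds, and report a test failure on timeout rather than silently
assuming the worker got there.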

The value of this test code is that we can easily reproduce locking
scenarios which would be hard to reproduce in a real workload - e.g.
because they're timing-dependent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment                        Content-Type  Size
group-locking-v0.1.patch          text/x-patch  28.6 KB
revise-procglobal-tracking.patch  text/x-patch  4.9 KB
test_group_locking-v0.patch       text/x-patch  30.8 KB
