Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Date: 2014-02-15 03:20:17
Message-ID: 8918EF09-4727-4896-AD34-3DAEC3063883@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Feb14, 2014, at 19:21 , Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-02-14 18:49:33 +0100, Florian Pflug wrote:
>> Well, the assumption isn't all that new. We already have the situation that
>> a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
>> Currently, the process who took it off the queue is responsible for rectifying
>> that eventually, with the changed proposed below it'll be the backend who
>> owns the PGPROC. From the point of view of other backends, nothing really
>> changes.
>
> That window is absolutely tiny tho.

True, but it's there, so if anything counts on that never being the case, it's
still already broken.

>>>> b) resetting lwWaitLink introduces a race condition (however unlikely)
>>>>
>>>> we'll trade correctness for cleanliness if we continue to reset lwWaitLink
>>>> without protecting against the race. That's a bit of a weird trade-off to make.
>>>
>>> It's not just cleanliness, it's being able to actually debug crashes.
>>
>> We could still arrange for the stray lwWaitLink from being visible only
>> momentarily. If a backend is woken up and detects that lwWaiting is false,
>> it knows that it cannot be on any wait queue - it was just removed, and
>> hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
>> back to NULL. The stray value would thus only be visible while a backend executes
>> the PGSemaphoreLock() loop, and whether or not this is the case can be deduced
>> from a stack trace. So I'm not convinced that this makes debugging any harder.
>
> That's not actually safe on an out of order architecture afaics. Without
> barriers the store to lwWaitLink in the woken up backend can preempt the
> read for the next element in the PGSemaphoreUnlock() loop.

Hm, true. I guess we could prevent that by introducing an artificial dependency
between the read and the write - something like

proc = head;
head = proc->lwWaitLink
/*
* We don't ever expect to actually PANIC here, but the test forces the
* load to execute before the store to lwWaiting. This prevents a race
* between reading lwWaitLink here and resetting it back to zero in the
* awoken backend (Note that backends can wake up spuriously, so just
* reading it before doing PGSemaphoreUnlock is insufficient)
*/
if (head != MyProc)
proc->lwWaiting = false;
else
elog(PANIC, ...)
PGSemaphoreUnlock(&proc->sem);

(This assumes that proc is a volatile pointer)

Another idea would be to do as you suggest and only mark the PGPROC pointers
volatile, but to additionally add a check for queue corruption somewhere. We should
be able to detect that - if we ever hit this issue, LWLockRelease should find a
PGPROC while traversing the queue whose lwWaitLink is NULL but which isn't equal to
lock->tail. If that ever happens, we'd simply PANIC.

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-02-15 03:30:45 Re: Recovery inconsistencies, standby much larger than primary
Previous Message Tom Lane 2014-02-15 02:42:26 Re: narwhal and PGDLLIMPORT