Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Date: 2014-02-10 18:13:11
Message-ID: 52F916B7.4090701@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/10/2014 03:46 PM, Andres Freund wrote:
> Hi,
>
> During the lwlock scalability work I noticed a longstanding issue with
> the lwlock code. LWLockRelease() and the other mentioned locations do
> the following to wake up any waiters, without holding the lock's
> spinlock:
> /*
> * Awaken any waiters I removed from the queue.
> */
> while (head != NULL)
> {
> LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
> proc = head;
> head = proc->lwWaitLink;
> proc->lwWaitLink = NULL;
> proc->lwWaiting = false;
> PGSemaphoreUnlock(&proc->sem);
> }
>
> which means they manipulate the lwWaitLink queue without
> protection. That's done intentionally. The code tries to protect against
> corruption of the list to do a woken up backend acquiring a lock (this
> or an independent one) by only continuing when the lwWaiting flag is set
> to false. Unfortunately there's absolutely no guarantee that a) the
> assignment to lwWaitLink and lwWaiting are done in that order b) that
> the stores are done in-order from the POV of other backends.
> So what we need to do is to acquire a write barrier between the
> assignments to lwWaitLink and lwWaiting, i.e.
> proc->lwWaitLink = NULL;
> pg_write_barrier();
> proc->lwWaiting = false;
> the reader side already uses an implicit barrier by using spinlocks.
>
> I've fixed this as part 1 of the lwlock scalability work in [1], but
> Heikki rightfully suggested that a) this should be backpatched b) done
> in a separate commit.
>
> There is the question what to do about the branches without barriers? I
> guess a SpinLockAcquire()/Release() would do?

The other alternative we discussed on IM is to unlink the waiters from
the linked list, while still holding the spinlock. We can't do the
PGSemaphoreUnlock() call to actually wake up the waiters while holding
the spinlock, but all the shared memory manipulations we could. It would
move all the modifications of the shared structures under the spinlock,
which feels comforting.

It would require using some-sort of a backend-private data structure to
hold the list of processes to wake up. We don't want to palloc() in
LWLockRelease(), but we could malloc() a large-enough array once at
process initialization, and use that on all LWLockRelease() calls.
- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-02-10 18:15:53 Re: jsonb and nested hstore
Previous Message Peter Geoghegan 2014-02-10 18:05:31 Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE