Yes, WaitLatch is vulnerable to weak-memory-ordering bugs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Date: 2011-08-07 17:47:49
Message-ID: 24241.1312739269@sss.pgh.pa.us
Lists: pgsql-hackers

I suspected $SUBJECT from the beginning, and I've now put in enough work
to be able to prove it. The attached test program reliably fails within
a few minutes of being started, when run with 8 worker processes on an
8-core PPC machine. It's a pretty simple "token passing ring" protocol,
and at some point one of the processes sees its latch set without seeing
its flag set, so it goes back to sleep and the token stops getting passed.

The original worry about the latch code was that there was some internal
race condition of this sort, but I think we were failing to see the forest
for the trees. The specification of how to use a latch reads like this:

* The correct pattern to wait for an event is:
*
* for (;;)
* {
*     ResetLatch();
*     if (work to do)
*         Do Stuff();
*
*     WaitLatch();
* }
*
* It's important to reset the latch *before* checking if there's work to
* do. Otherwise, if someone sets the latch between the check and the
* ResetLatch call, you will miss it and Wait will block.
*
* To wake up the waiter, you must first set a global flag or something
* else that the main loop tests in the "if (work to do)" part, and call
* SetLatch *after* that.

Any protocol of that sort has *obviously* got a race condition between
changes of the latch state and changes of the separate flag state, if run
in a weak-memory-ordering machine.
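
To make the hazard concrete, here is a minimal sketch of the two sides of
such a protocol (hypothetical names; this is neither the attached test
program nor the real latch API), with comments marking where the
reordering bites:

    /* Sketch only: "latch" and "flag" reduced to two shared variables. */
    static volatile int work_to_do = 0;     /* the separate flag */
    static volatile int latch_is_set = 0;   /* stands in for the latch */

    static void
    setter_side(void)
    {
        work_to_do = 1;         /* store A: publish the work */
        /*
         * Without a write barrier here, a weak-memory-ordering CPU may
         * make store B visible to other processors before store A.
         */
        latch_is_set = 1;       /* store B: what SetLatch effectively does */
    }

    static void
    waiter_side(void)
    {
        for (;;)
        {
            latch_is_set = 0;   /* ResetLatch */
            /*
             * A matching read barrier is needed here; otherwise the load
             * below can be satisfied by a stale value of work_to_do even
             * though the latch was already observed as set.
             */
            if (work_to_do)
            {
                work_to_do = 0;
                /* do stuff */
            }
            while (!latch_is_set)
                ;               /* WaitLatch really sleeps, of course */
        }
    }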

At least on the hardware I'm testing, it seems that the key requirement
for a failure to be seen is that there's a lot of contention for the cache
line containing the flag, but not for the cache line containing the latch.
This allows the write to the flag to take longer to propagate to main
memory than the write to the latch. I could not get it to fail until
I added the extra shared-memory traffic in the delay loop around line 175.
This probably also explains why it doesn't fail with small numbers of
workers --- you need enough contending CPUs to saturate the memory
hardware. (On Red Hat's test machines, 7 or 8 workers would reliably
fail within a few minutes, but 6 or fewer didn't always.)

I'm not sure whether we need to do anything about this for 9.1; it may be
that none of the existing uses of latches are subject to the kind of
contention that would create a risk. But I'm entirely convinced that we
need to insert some memory-barrier instructions into the latch code before
we expand our uses of latches much more. Since we were already finding
that memory barriers will be needed for performance reasons elsewhere,
I think it's time to bite the bullet and develop that infrastructure.

regards, tom lane

Attachment Content-Type Size
latch_test.c text/x-patch 4.7 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Date: 2011-08-08 12:47:28
Message-ID: CA+TgmobyqhvXtizQ7ugR7M27fdtto78LB4c-x407KVVH=SxJug@mail.gmail.com
Lists: pgsql-hackers

On Sun, Aug 7, 2011 at 1:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Any protocol of that sort has *obviously* got a race condition between
> changes of the latch state and changes of the separate flag state, if run
> in a weak-memory-ordering machine.
>
> At least on the hardware I'm testing, it seems that the key requirement
> for a failure to be seen is that there's a lot of contention for the cache
> line containing the flag, but not for the cache line containing the latch.
> This allows the write to the flag to take longer to propagate to main
> memory than the write to the latch.

This is interesting research, especially because it provides sobering
evidence of how easily a real problem could be overlooked in testing.
If it took several minutes for this to fail on an 8-CPU system, it's
easy to imagine a less egregious example sitting in our code-base for
years undetected. (Indeed, I think we probably have a few of those
already...)

On the flip side, I'm not sure that anyone ever really expected that a
latch could be safely used this way. Normally, I'd expect the flag to
be some piece of state protected by an LWLock, and I think that ought
to be OK provided that the lwlock is released before setting or
checking the latch. Maybe we should try to document the contract here
a little better; I think it's just that there must be a full memory
barrier (such as LWLockRelease) in both processes involved in the
interaction.
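
For concreteness, a rough sketch of that arrangement (the lock, struct,
and do_the_work names are hypothetical, and the WaitLatch call assumes
the 9.1-era latch-plus-timeout signature):

    /* Hypothetical shared state; WorkFlagLock is an LWLock protecting it. */
    typedef struct
    {
        Latch       latch;
        bool        work_pending;
    } WorkQueueShared;

    static WorkQueueShared *shared;     /* points into shared memory */

    static void
    notify_worker(void)
    {
        LWLockAcquire(WorkFlagLock, LW_EXCLUSIVE);
        shared->work_pending = true;
        LWLockRelease(WorkFlagLock);    /* full barrier before ... */
        SetLatch(&shared->latch);       /* ... setting the latch */
    }

    static void
    worker_main_loop(void)
    {
        for (;;)
        {
            bool        work_pending;

            ResetLatch(&shared->latch);

            LWLockAcquire(WorkFlagLock, LW_EXCLUSIVE);
            work_pending = shared->work_pending;
            shared->work_pending = false;
            LWLockRelease(WorkFlagLock);    /* full barrier before sleeping */

            if (work_pending)
                do_the_work();              /* hypothetical */

            WaitLatch(&shared->latch, -1L); /* assumed 9.1 signature */
        }
    }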

> I'm not sure whether we need to do anything about this for 9.1; it may be
> that none of the existing uses of latches are subject to the kind of
> contention that would create a risk.  But I'm entirely convinced that we
> need to insert some memory-barrier instructions into the latch code before
> we expand our uses of latches much more.  Since we were already finding
> that memory barriers will be needed for performance reasons elsewhere,
> I think it's time to bite the bullet and develop that infrastructure.

I've been thinking about this too and actually went so far as to do
some research and put together something that I hope covers most of
the interesting cases. The attached patch is pretty much entirely
untested, but reflects my present belief about how things ought to
work.
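
Purely as a rough illustration of the general shape (this is not the
contents of the attached patch): use a compiler builtin where a full
barrier is available, and otherwise fall back to acquiring and releasing
a spinlock, which should itself act as a full barrier.

    /* Illustrative only -- not the contents of barrier-v1.patch. */
    #include "storage/s_lock.h"         /* for slock_t, S_LOCK, S_UNLOCK */

    #if defined(__GNUC__)
    /* GCC builtin emitting a full hardware memory barrier */
    #define pg_memory_barrier()     __sync_synchronize()
    /* empty asm with a "memory" clobber: compiler barrier only */
    #define pg_compiler_barrier()   __asm__ __volatile__("" ::: "memory")
    #else
    /* Fallback: acquiring and releasing a spinlock is a full barrier. */
    extern slock_t dummy_spinlock;
    #define pg_memory_barrier() \
        do { S_LOCK(&dummy_spinlock); S_UNLOCK(&dummy_spinlock); } while (0)
    #define pg_compiler_barrier()   pg_memory_barrier()
    #endif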

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
barrier-v1.patch application/octet-stream 6.3 KB

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Date: 2011-08-08 15:28:45
Message-ID: CAEYLb_VUgtq=aSZkTZ5Z45L0-x0VTFOpQpNQPt06Gwr6xmd9aA@mail.gmail.com
Lists: pgsql-hackers

On 8 August 2011 13:47, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On the flip side, I'm not sure that anyone ever really expected that a
> latch could be safely used this way.  Normally, I'd expect the flag to
> be some piece of state protected by an LWLock, and I think that ought
> to be OK provided that the lwlock is released before setting or
> checking the latch.

I'm inclined to agree. FWIW, in addition to your point, it's worth
remembering that a shared latch isn't always necessary (the archiver,
for example), so this shouldn't fundamentally shake our confidence in
the latch implementation.

> Maybe we should try to document the contract here
> a little better; I think it's just that there must be a full memory
> barrier (such as LWLockRelease) in both processes involved in the
> interaction.

Or, maybe we should think about pushing that into the latch
implementation, while providing an interface that is easy to use
correctly and hard to use incorrectly. The latch code already has a
pretty complicated contract, and I had to bend over backwards using it
in the AVLauncher patch that I submitted to the upcoming commitfest.
Weakening the contract any further, or making it more equivocal, should
be considered a last resort.

> I've been thinking about this too and actually went so far as to do
> some research and put together something that I hope covers most of
> the interesting cases.  The attached patch is pretty much entirely
> untested, but reflects my present belief about how things ought to
> work.

That's interesting. Nice work.

It seems likely that Windows will support ARM in some way in the next
couple of years, so it's good that you're handling this using a win32
abstraction where available. Of course, Itanic is currently supported
on Windows, though not by us, and it is obviously never going to be
worth pursuing such support. The point is that it generally isn't safe
to assume that we'll only ever support Windows on x86 and x86_64.

I'd like to see a #warning where you fall back on acquiring and
releasing a spinlock, or at least a #warning if that in turn falls
back on the semaphore-based spinlock implementation. Granted, that
directive isn't in the C standard, but it's pretty portable in
practice. If it breaks the build on some very esoteric platform, that
may not be a bad thing - in fact, maybe you should use the portable
#error directive to make sure it breaks. I'd rather have someone
complain about that than lots more people complain about Postgres
being inexplicably slow, or worse not complain and dismiss Postgres
out of hand for their use-case.
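
Concretely, something along these lines, with the guard macros invented
purely for the sake of the example:

    #if defined(PG_BARRIERS_USE_SPINLOCK)       /* hypothetical guard */
    #warning "memory barriers fall back to spinlock acquire/release"
    #if defined(PG_SPINLOCKS_USE_SEMAPHORE)     /* hypothetical guard */
    /* doubly degraded: barriers -> spinlocks -> semaphores */
    #error "no native barrier or spinlock support on this platform"
    #endif
    #endif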

By the way, I don't think that the comment "Won't work on Visual
Studio 2003" is accurate. Everything you do for the
defined(WIN32_ONLY_COMPILER) case is documented as working with 2003.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Date: 2011-08-08 15:52:24
Message-ID: CA+TgmoZa0qi=j0h=ZgAfXTYNTYSLD=CdEiP-2s3WZFLR1F45Gw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 8, 2011 at 11:28 AM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
>> Maybe we should try to document the contract here
>> a little better; I think it's just that there must be a full memory
>> barrier (such as LWLockRelease) in both processes involved in the
>> interaction.
>
> Or, maybe we should think about pushing that into the latch
> implementation, while providing an interface that is easy to use
> correctly and hard to use incorrectly.

That would probably result in unnecessary double-synchronization in
many cases, though. Memory barriers are expensive, and I think we'd
better adopt a policy of trying to figure out where they are truly
needed and using them only in those places. Or to put that another
way, having spent the last couple of months trying to reduce our
synchronization overhead, I'm not real keen on adding more unless we
really need it.

>> I've been thinking about this too and actually went so far as to do
>> some research and put together something that I hope covers most of
>> the interesting cases.  The attached patch is pretty much entirely
>> untested, but reflects my present belief about how things ought to
>> work.
>
> That's interesting. Nice work.

Thanks!

> It seems likely that Windows will support ARM in some way in the next
> couple of years, so it's good that you're handling this using a win32
> abstraction where available. Of course, Itanic is currently supported
> on Windows, though not by us, and it is obviously never going to be
> worth pursuing such support. The point is that it generally isn't safe
> to assume that we'll only ever support Windows on x86 and x86_64.

Agreed. I saw an MSDN article that referred to "architectures on
which Windows is supported". It didn't say which those were, but I
figured I'd better not assume they were all platforms with strong
memory ordering.

> I'd like to see a #warning where you fall back on acquiring and
> releasing a spinlock, or at least a #warning if that in turn falls
> back on the semaphore-based spinlock implementation. Granted, that
> directive isn't in the C standard, but it's pretty portable in
> practice. If it breaks the build on some very esoteric platform, that
> may not be a bad thing - in fact, maybe you should use the portable
> #error directive to make sure it breaks. I'd rather have someone
> complain about that than lots more people complain about Postgres
> being inexplicably slow, or worse not complain and dismiss Postgres
> out of hand for their use-case.

I was thinking about whether and how we could expose that information.
I was almost tempted to add a read-only GUC that would output the
details of what kind of synchronization we're using. So the bad case
would look something like this:

spinlock=semaphore,memory_barrier=spinlock,read_barrier=spinlock,write_barrier=spinlock,compiler_barrier=spinlock

And the good case would look something like this:

spinlock=gcc-inline-asm,memory_barrier=gcc-inline-asm,read_barrier=gcc-inline-asm,write_barrier=gcc-inline-asm,compiler_barrier=gcc-inline-asm

And you can imagine other values, like compiler-intrinsic. Of course,
jamming a whole CSV file into a GUC value is a bit unappealing. Maybe
it would be better to have a system view with two columns,
synchronization_primitive and implementation.

Either way, I'd prefer this to a #warning, because that's only going
to be visible at compile-time, which means that when $PGCOMPANY gets
called in to figure out why the system is dog slow, it's going to take
a bit of forensics to figure out where the build came from and what
options are in use. Is that a huge problem? Maybe not, but it seems
like it would be a nice thing to make it easy for people to log into a
running system and immediately be able to determine which forms of
wizardry we're using there.

> By the way, I don't think that the comment "Won't work on Visual
> Studio 2003" is accurate. Everything you do for the
> defined(WIN32_ONLY_COMPILER) case is documented as working with 2003.

Oh, really? I thought I ran across something that said otherwise, but
it might have been wrong, or maybe I got confused somewhere along the
line. I haven't tested anything more than that this compiles without
erroring out on MacOS X, so the probability of errors and omissions is
not small.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Date: 2011-08-08 16:39:32
Message-ID: 27512.1312821572@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Aug 7, 2011 at 1:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Any protocol of that sort has *obviously* got a race condition between
>> changes of the latch state and changes of the separate flag state, if run
>> in a weak-memory-ordering machine.

> On the flip side, I'm not sure that anyone ever really expected that a
> latch could be safely used this way. Normally, I'd expect the flag to
> be some piece of state protected by an LWLock, and I think that ought
> to be OK provided that the lwlock is released before setting or
> checking the latch. Maybe we should try to document the contract here
> a little better; I think it's just that there must be a full memory
> barrier (such as LWLockRelease) in both processes involved in the
> interaction.

Heikki argued the same thing in
http://archives.postgresql.org/message-id/4CEA5A0F.1030602@enterprisedb.com
but I don't think anyone has actually done the legwork to verify that
current uses are safe. So [ ... warms up grep ... ]

Current callers of WaitLatch(OrSocket):

XLogPageRead waits on XLogCtl->recoveryWakeupLatch

    latch is set by WakeupRecovery, which is called by the process's
    own signal handlers (OK) and XLogWalRcvFlush. The latter is OK
    because there's a spinlock protecting the transmitted data.

pgarch_MainLoop waits on mainloop_latch

    non-issue because latch is process-local

WalSndLoop waits on MyWalSnd->latch

    latch is set by signal handlers and WalSndWakeup. The latter is
    OK because it's called right after XLogFlush (which certainly
    locks whatever it touches) or a spinlock release.

SyncRepWaitForLSN waits on MyProc->waitLatch

    latch is set by signal handlers and SyncRepWakeQueue. The
    latter appears unsafe at first glance, because it sets the
    shared variable (thisproc->syncRepState) and immediately
    sets the latch. However, the receiver does this curious dance:

        syncRepState = MyProc->syncRepState;
        if (syncRepState == SYNC_REP_WAITING)
        {
            LWLockAcquire(SyncRepLock, LW_SHARED);
            syncRepState = MyProc->syncRepState;
            LWLockRelease(SyncRepLock);
        }

    which I believe makes it safe, though rather underdocumented;
    if a race does occur, the receiver will acquire the lock and
    recheck, and that lock acquisition will block until the caller of
    SyncRepWakeQueue has released SyncRepLock, and that release
    will flush any pending writes to shared memory.

Conclusions:

(1) 9.1 does not have a problem of this sort.

(2) Breathing hard on any of this code could break it.

(3) I'm not going to hold still for not inserting a memory barrier
into SetLatch, once we have the code. Unsubstantiated performance
worries are not a sufficient argument not to --- as a wise man once
said, you can make the code arbitrarily fast if it doesn't have to
give the right answer.

(4) In the meantime, we'd better document that it's caller's
responsibility to ensure that the flag variable is adequately
protected by locking. I think SyncRepWaitForLSN could use a warning
comment, too. I will go do that.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Date: 2011-08-08 16:45:07
Message-ID: 4E401293.50108@enterprisedb.com
Lists: pgsql-hackers

On 08.08.2011 19:39, Tom Lane wrote:
> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>> On the flip side, I'm not sure that anyone ever really expected that a
>> latch could be safely used this way. Normally, I'd expect the flag to
>> be some piece of state protected by an LWLock, and I think that ought
>> to be OK provided that the lwlock is released before setting or
>> checking the latch. Maybe we should try to document the contract here
>> a little better; I think it's just that there must be a full memory
>> barrier (such as LWLockRelease) in both processes involved in the
>> interaction.
>
> Heikki argued the same thing in
> http://archives.postgresql.org/message-id/4CEA5A0F.1030602@enterprisedb.com
> but I don't think anyone has actually done the legwork to verify that
> current uses are safe. So [ ... warms up grep ... ]

Thanks for double-checking.

> (3) I'm not going to hold still for not inserting a memory barrier
> into SetLatch, once we have the code. Unsubstantiated performance
> worries are not a sufficient argument not to --- as a wise man once
> said, you can make the code arbitrarily fast if it doesn't have to
> give the right answer.

I agree we definitely should add the memory barrier instructions there
once we have them.

> (4) In the meantime, we'd better document that it's caller's
> responsibility to ensure that the flag variable is adequately
> protected by locking. I think SyncRepWaitForLSN could use a warning
> comment, too. I will go do that.

Ok, thanks.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com