Re: sinval synchronization considered harmful

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <robertmhaas(at)gmail(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 16:16:28
Message-ID: 4E280A8C020000250003F661@gw.wicourts.gov
Lists: pgsql-hackers

Robert Haas wrote:

> SIGetDataEntries(). I believe we need to bite the bullet and
> rewrite this using a lock-free algorithm, using memory barriers on
> processors with weak memory ordering.

> [32 processors; 80 clients]

> On unpatched master

> tps = 132518.586371 (including connections establishing)
> tps = 130968.749747 (including connections establishing)
> tps = 132574.338942 (including connections establishing)

> With the lazy vxid locks patch

> tps = 119215.958372 (including connections establishing)
> tps = 113056.859871 (including connections establishing)
> tps = 160562.770998 (including connections establishing)

> gets rid of SInvalReadLock and instead gives each backend its own
> spinlock.

> tps = 167392.042393 (including connections establishing)
> tps = 171336.145020 (including connections establishing)
> tps = 170500.529303 (including connections establishing)

> SIGetDataEntries() can pretty easily be made lock-free.

> tps = 203256.701227 (including connections establishing)
> tps = 190637.957571 (including connections establishing)
> tps = 190228.617178 (including connections establishing)

> Thoughts? Comments? Ideas?

Very impressive! Those numbers definitely justify some #ifdef code
to provide alternatives for weak memory ordering machines versus
others. With the number of CPUs climbing as it is, this is very
important work!

-Kevin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 18:31:15
Message-ID: CA+TgmobXqaqx7g52uH4F+VVwcmvTd9L-ZjzdT4FTvCgTvUARYg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 12:16 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Very impressive!  Those numbers definitely justify some #ifdef code
> to provide alternatives for weak memory ordering machines versus
> others.  With the number of CPUs climbing as it is, this is very
> important work!

Thanks. I'm not thinking so much about #ifdef (although that could
work, too) as I am about providing some primitives to allow this sort
of code-writing to be done in a somewhat less ad-hoc fashion. It
seems like there are basically two categories of machines we need to
worry about.

1. Machines with strong memory ordering. On this category of machines
(which include x86), the CPU basically does not perform loads or
stores out of order. On some of these machines, it is apparently
possible for there to be some reordering of stores relative to loads,
but if the program stores two values or loads two values, those
operations will be performed in the same order they appear in the
program. The main thing you need to make your code work reliably on
these machines is a primitive that keeps the compiler from reordering
your code during optimization. On x86, certain categories of exotic
instructions (non-temporal stores, for example) do require explicit
fences, but ordinary loads and stores do not.

2. Machines with weak memory ordering. On this category of machines
(which includes PowerPC, Dec Alpha, and maybe some others), the CPU
reorders memory accesses arbitrarily unless you explicitly issue
instructions that enforce synchronization. You still need to keep the
compiler from moving things around, too. Alpha is particularly
pernicious, because something like a->b can fetch the pointed-to value
before loading the pointer itself. This is otherwise known as "we
have basically no cache coherency circuits on this chip at all". On
these machines, you need to issue an explicit memory barrier
instruction at each sequence point, or just acquire and release a
spinlock.

So you can imagine a primitive that is defined to be a compiler
barrier on machines with strong memory ordering, and as a memory
fencing instruction on machines with weak memory ordering.
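
As a rough sketch of the kind of thing I mean (gcc-style inline asm;
the macro name and the per-architecture details are purely
illustrative, not a worked-out proposal):

#if defined(__i386__) || defined(__x86_64__)
/* strong ordering: restraining the compiler is the main requirement */
#define pg_fence()  __asm__ __volatile__("" ::: "memory")
#elif defined(__powerpc__) || defined(__powerpc64__)
/* weak ordering: emit a real fence (which is also a compiler barrier) */
#define pg_fence()  __asm__ __volatile__("sync" ::: "memory")
#elif defined(__alpha__)
#define pg_fence()  __asm__ __volatile__("mb" ::: "memory")
#else
#error "no fence primitive known for this architecture"
#endif

Callers would just write pg_fence() wherever ordering matters and get
the cheapest thing that is safe on the platform at hand.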

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 18:50:08
Message-ID: 18073.1311274208@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> ... On these machines, you need to issue an explicit memory barrier
> instruction at each sequence point, or just acquire and release a
> spinlock.

Right, and the reason that a spinlock fixes it is that we have memory
barrier instructions built into the spinlock code sequences on machines
where it matters.

To get to the point where we could do the sort of optimization Robert
is talking about, someone will have to build suitable primitives for
all the platforms we support. In the cases where we use gcc ASM in
s_lock.h, it shouldn't be too hard to pull out the barrier
instruction(s) ... but on platforms where we rely on OS-supplied
functions, some research is going to be needed.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 19:15:56
Message-ID: CA+Tgmobxcm9wD7zJgtM3r3EfJQNDc3dD9wYJB5wa9EfxgK7QBQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 2:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> ... On these machines, you need to issue an explicit memory barrier
>> instruction at each sequence point, or just acquire and release a
>> spinlock.
>
> Right, and the reason that a spinlock fixes it is that we have memory
> barrier instructions built into the spinlock code sequences on machines
> where it matters.
>
> To get to the point where we could do the sort of optimization Robert
> is talking about, someone will have to build suitable primitives for
> all the platforms we support.  In the cases where we use gcc ASM in
> s_lock.h, it shouldn't be too hard to pull out the barrier
> instruction(s) ... but on platforms where we rely on OS-supplied
> functions, some research is going to be needed.

Yeah, although falling back to SpinLockAcquire() and SpinLockRelease()
on a backend-private slock_t should work anywhere that PostgreSQL
works at all[1]. That will probably be slower than a memory fence
instruction and certainly slower than a compiler barrier, but the
point is that - right now - we're doing it the slow way everywhere.
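
Roughly like this (the lock variable name is made up; assume it's been
run through SpinLockInit() at backend start):

#include "storage/spin.h"

static slock_t dummy_barrier_lock;      /* backend-private, never shared */

#define fallback_memory_barrier() \
    do { \
        SpinLockAcquire(&dummy_barrier_lock); \
        SpinLockRelease(&dummy_barrier_lock); \
    } while (0)

The acquire and release already carry whatever fencing the platform's
spinlock implementation uses, so this acts as a barrier even though no
other process ever touches the lock.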

I think the real challenge is going to be testing. If anyone has a
machine with weak memory ordering they can give me access to, that
would be really helpful for flushing the bugs out of this stuff.
Getting it to work on x86 is not the hard part.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] This was a suggestion from Noah Misch. I wasn't quite convinced
when he initially made it, but having studied the issue a lot more, I
now am. The CPU doesn't know how many processes have the memory
mapped into their address space.


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 19:22:27
Message-ID: CA+OCxoyOJ82gOVE5v_Dx7o-z505Yj2fZ4BEqCQiPkSQCmkyHxg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 8:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> I think the real challenge is going to be testing.  If anyone has a
> machine with weak memory ordering they can give me access to, that
> would be really helpful for flushing the bugs out of this stuff.
> Getting it to work on x86 is not the hard part.

I believe there's a PPC box in our storage facility in NJ that we
might be able to dig out for you. There's also a couple in our India
office. Let me know if they'd be of help.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 19:25:56
Message-ID: CA+TgmoZ9YEmy+dwybkLOBJbm2Hyc4yvr=c7p52oNK0SX6CDv9Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 3:22 PM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
> On Thu, Jul 21, 2011 at 8:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think the real challenge is going to be testing.  If anyone has a
>> machine with weak memory ordering they can give me access to, that
>> would be really helpful for flushing the bugs out of this stuff.
>> Getting it to work on x86 is not the hard part.
>
> I believe there's a PPC box in our storage facility in NJ that we
> might be able to dig out for you. There's also a couple in our India
> office. Let me know if they'd be of help.

Yes!

More processors is better, of course, but having anything at all to
test on would be an improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 19:29:41
Message-ID: CA+OCxow6a4G27isfAigUG7_48kgXkr63KO0LDeXS_5W4KVD8gQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 8:25 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jul 21, 2011 at 3:22 PM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>> On Thu, Jul 21, 2011 at 8:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I think the real challenge is going to be testing.  If anyone has a
>>> machine with weak memory ordering they can give me access to, that
>>> would be really helpful for flushing the bugs out of this stuff.
>>> Getting it to work on x86 is not the hard part.
>>
>> I believe there's a PPC box in our storage facility in NJ that we
>> might be able to dig out for you. There's also a couple in our India
>> office. Let me know if they'd be of help.
>
> Yes!
>
> More processors is better, of course, but having anything at all to
> test on would be an improvement.

OK, will check with India first, as it'll be easier for them to deploy.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 19:53:52
Message-ID: 19175.1311278032@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think the real challenge is going to be testing. If anyone has a
> machine with weak memory ordering they can give me access to, that
> would be really helpful for flushing the bugs out of this stuff.

There are multi-CPU PPCen in the buildfarm, or at least there were last
time I broke the sinval code ;-). Note that testing on a single-core
PPC will prove nothing.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 20:02:35
Message-ID: CA+TgmoZCYDyRbM0bjHbL+uMJ5AchHx6vFGSsmXbRv=RKNe+Qfw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 3:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I think the real challenge is going to be testing.  If anyone has a
>> machine with weak memory ordering they can give me access to, that
>> would be really helpful for flushing the bugs out of this stuff.
>
> There are multi-CPU PPCen in the buildfarm, or at least there were last
> time I broke the sinval code ;-).  Note that testing on a single-core
> PPC will prove nothing.

Yeah, I was just thinking about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 22:22:09
Message-ID: 79FCA27A-B912-48B4-90A4-562FEFB1EE75@phlo.org
Lists: pgsql-hackers

On Jul21, 2011, at 21:15 , Robert Haas wrote:
> On Thu, Jul 21, 2011 at 2:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> ... On these machines, you need to issue an explicit memory barrier
>>> instruction at each sequence point, or just acquire and release a
>>> spinlock.
>>
>> Right, and the reason that a spinlock fixes it is that we have memory
>> barrier instructions built into the spinlock code sequences on machines
>> where it matters.
>>
>> To get to the point where we could do the sort of optimization Robert
>> is talking about, someone will have to build suitable primitives for
>> all the platforms we support. In the cases where we use gcc ASM in
>> s_lock.h, it shouldn't be too hard to pull out the barrier
>> instruction(s) ... but on platforms where we rely on OS-supplied
>> functions, some research is going to be needed.
>
> Yeah, although falling back to SpinLockAcquire() and SpinLockRelease()
> on a backend-private slock_t should work anywhere that PostgreSQL
> works at all[1]. That will probably be slower than a memory fence
> instruction and certainly slower than a compiler barrier, but the
> point is that - right now - we're doing it the slow way everywhere.

As I discovered while playing with various lockless algorithms to
improve our LWLocks, spin locks aren't actually a replacement for
a (full) barrier.

Lock acquisition only really needs to guarantee that loads and stores
which come after the acquisition operation in program order (i.e., in
the instruction stream) aren't globally visible before that operation
completes. This kind of barrier behaviour is often fittingly called
"acquire barrier".

Similarly, a lock release operation only needs to guarantee that loads
and stores which occur before that operation in program order are
globally visible before the release operation completes. This, again,
is fittingly called "release barrier".

Now assume the following code fragment

global1 = 1;
SpinLockAcquire();
SpinLockRelease();
global2 = 1;

If SpinLockAcquire() has "acquire barrier" semantics, and SpinLockRelease()
has "release barrier" semantics, then it's possible for the store to global1
to be delayed until after SpinLockAcquire(), and similarly for the store
to global2 to be executed before SpinLockRelease() completes. In other
words, what happens is

SpinLockAcquire();
global1 = 1;
global2 = 1;
SpinLockRelease();

But once that can happen, there's no reason that it couldn't also be

SpinLockAcquire();
global2 = 1;
global1 = 1;
SpinLockRelease();

I didn't check if any of our spin lock implementations is actually affected
by this, but it doesn't seem wise to rely on them being full barriers, even
if it may be true today.

best regards,
Florian Pflug


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 22:44:59
Message-ID: 20110721224458.GD66121@csail.mit.edu
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 02:31:15PM -0400, Robert Haas wrote:
> 1. Machines with strong memory ordering. On this category of machines
> (which include x86), the CPU basically does not perform loads or
> stores out of order. On some of these machines, it is apparently
> possible for there to be some reordering of stores relative to loads,
> but if the program stores two values or loads two values, those
> operations will be performed in the same order they appear in the
> program.

This is all correct, but...

> The main thing you need to make your code work reliably on
> these machines is a primitive that keeps the compiler from reordering
> your code during optimization.

If you're suggesting that hardware memory barriers aren't going to be
needed to implement lock-free code on x86, that isn't true. Because a
read can be reordered with respect to a write to a different memory
location, you can still have problems. So you do still need memory
barriers, just fewer of them.

Dekker's algorithm is the classic example: two threads each set a flag
and then check whether the other thread's flag is set. In any
sequential execution, at least one should see the other's flag set, but
on the x86 that doesn't always happen. One thread's read might be
reordered before its write.
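
Spelled out (C-ish sketch; enter_critical_section() is just a stand-in
for the protected work):

volatile int flag0 = 0, flag1 = 0;      /* shared, both start at zero */

void thread0(void)
{
    flag0 = 1;
    /* without an mfence (or locked instruction) here ... */
    if (flag1 == 0)
        enter_critical_section();
}

void thread1(void)
{
    flag1 = 1;
    /* ... and here ... */
    if (flag0 == 0)
        enter_critical_section();
}

The x86 is allowed to satisfy each thread's load before its own store
becomes globally visible, so both loads can return 0 and both threads
walk into the critical section at once.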

> 2. Machines with weak memory ordering. On this category of machines
> (which includes PowerPC, Dec Alpha, and maybe some others), the CPU
> reorders memory accesses arbitrarily unless you explicitly issue
> instructions that enforce synchronization. You still need to keep the
> compiler from moving things around, too. Alpha is particularly
> pernicious, because something like a->b can fetch the pointed-to value
> before loading the pointer itself. This is otherwise known as "we
> have basically no cache coherency circuits on this chip at all". On
> these machines, you need to issue an explicit memory barrier
> instruction at each sequence point, or just acquire and release a
> spinlock.

The Alpha is pretty much unique (thankfully!) in allowing dependent
reads to be reordered. That makes it even weaker than the typical
weak-ordering machine. Since reading a pointer and then dereferencing
it is a pretty reasonable thing to do regularly in RCU code, you
probably don't want to emit barriers in between on architectures where
it's not actually necessary. That argues for another operation that's
defined to be a barrier (mb) on the Alpha but a no-op elsewhere.
Certainly the Linux kernel found it useful to do so
(read_barrier_depends).
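
The usage pattern is just (sketch, using the Linux name; whatever we'd
call ours is a separate question):

struct node *p;

p = shared_head;                /* load the published pointer          */
read_barrier_depends();         /* "mb" on Alpha, no-op everywhere else */
use(p->value);                  /* dependent load through that pointer */

which costs nothing except on the one architecture that actually needs
it.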

Alternatively, one might question how important it is to support the
Alpha these days...

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 23:34:58
Message-ID: CA+Tgmobgm3OVyv7_30Y0bhwgo8oeR_FOH5Xvk2fGLvGyDrnAxg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 6:22 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Jul21, 2011, at 21:15 , Robert Haas wrote:
>> On Thu, Jul 21, 2011 at 2:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> ... On these machines, you need to issue an explicit memory barrier
>>>> instruction at each sequence point, or just acquire and release a
>>>> spinlock.
>>>
>>> Right, and the reason that a spinlock fixes it is that we have memory
>>> barrier instructions built into the spinlock code sequences on machines
>>> where it matters.
>>>
>>> To get to the point where we could do the sort of optimization Robert
>>> is talking about, someone will have to build suitable primitives for
>>> all the platforms we support.  In the cases where we use gcc ASM in
>>> s_lock.h, it shouldn't be too hard to pull out the barrier
>>> instruction(s) ... but on platforms where we rely on OS-supplied
>>> functions, some research is going to be needed.
>>
>> Yeah, although falling back to SpinLockAcquire() and SpinLockRelease()
>> on a backend-private slock_t should work anywhere that PostgreSQL
>> works at all[1]. That will probably be slower than a memory fence
>> instruction and certainly slower than a compiler barrier, but the
>> point is that - right now - we're doing it the slow way everywhere.
>
> As I discovered while playing with various lockless algorithms to
> improve our LWLocks, spin locks aren't actually a replacement for
> a (full) barrier.
>
> Lock acquisition only really needs to guarantee that loads and stores
> which come after the acquisition operation in program order (i.e., in
> the instruction stream) aren't globally visible before that operation
> completes. This kind of barrier behaviour is often fittingly called
> "acquire barrier".
>
> Similarly, a lock release operation only needs to guarantee that loads
> and stores which occur before that operation in program order are
> globally visible before the release operation completes. This, again,
> is fittingly called "release barrier".
>
> Now assume the following code fragment
>
> global1 = 1;
> SpinLockAcquire();
> SpinLockRelease();
> global2 = 1;
>
> If SpinLockAcquire() has "acquire barrier" semantics, and SpinLockRelease()
> has "release barrier" semantics, then it's possible for the store to global1
> to be delayed until after SpinLockAcquire(), and similarly for the store
> to global2 to be executed before SpinLockRelease() completes. In other
> words, what happens is
>
> SpinLockAcquire();
> global1 = 1;
> global2 = 1;
> SpinLockRelease();
>
> But once that can happen, there's no reason that it couldn't also be
>
> SpinLockAcquire();
> global2 = 1;
> global1 = 1;
> SpinLockRelease();
>
> I didn't check if any of our spin lock implementations is actually affected
> by this, but it doesn't seem wise to rely on them being full barriers, even
> if it may be true today.

Hmm. I'm not worried about that. AFAIK, only IA64 has such an
implementation, and our existing spinlock implementation doesn't use
it. If we were to add something like that in the future, we'd
presumably know that we were doing it, and would add the appropriate
memory barrier primitive at the same time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-22 00:24:51
Message-ID: CA+TgmobOy=+6Gp9v3JnGyVs0Fi1WUeM9dU5+bC3sN7J0Z_v9-Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 6:44 PM, Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
> If you're suggesting that hardware memory barriers aren't going to be
> needed to implement lock-free code on x86, that isn't true. Because a
> read can be reordered with respect to a write to a different memory
> location, you can still have problems. So you do still need memory
> barriers, just fewer of them.
>
> Dekker's algorithm is the classic example: two threads each set a flag
> and then check whether the other thread's flag is set. In any
> sequential execution, at least one should see the other's flag set, but
> on the x86 that doesn't always happen. One thread's read might be
> reordered before its write.

In the case of sinval, what we need to do for SIGetDataEntries() is,
approximately, a bunch of loads, followed by a store to one of the
locations we loaded (which no one else can have written meanwhile).
So I think that's OK.

In SIInsertDataEntries(), what we need to do is, approximately, take a
lwlock, load from a location which can only be written while holding
the lwlock, do a bunch of stores, ending with a store to that first
location, and release the lwlock. I think that's OK, too.
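
Schematically, on the reader side (field names borrowed from
sinvaladt.c, everything else hand-waved):

max = segP->maxMsgNum;      /* the word the writer stamps last */
/* a read fence would be needed here on weak-memory machines */
while (stateP->nextMsgNum < max)
{
    copy_out(segP->buffer[stateP->nextMsgNum % MAXNUMMESSAGES]);
    stateP->nextMsgNum++;
}

The only stores are to stateP->nextMsgNum, which nobody but this
backend ever writes, so there's no Dekker-style store-then-load race
on shared data.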

>> 2. Machines with weak memory ordering.  On this category of machines
>> (which includes PowerPC, Dec Alpha, and maybe some others), the CPU
>> reorders memory accesses arbitrarily unless you explicitly issue
>> instructions that enforce synchronization.  You still need to keep the
>> compiler from moving things around, too.  Alpha is particularly
>> pernicious, because something like a->b can fetch the pointed-to value
>> before loading the pointer itself.  This is otherwise known as "we
>> have basically no cache coherency circuits on this chip at all".  On
>> these machines, you need to issue an explicit memory barrier
>> instruction at each sequence point, or just acquire and release a
>> spinlock.
>
> The Alpha is pretty much unique (thankfully!) in allowing dependent
> reads to be reordered. That makes it even weaker than the typical
> weak-ordering machine. Since reading a pointer and then dereferencing
> it is a pretty reasonable thing to do regularly in RCU code, you
> probably don't want to emit barriers in between on architectures where
> it's not actually necessary. That argues for another operation that's
> defined to be a barrier (mb) on the Alpha but a no-op elsewhere.
> Certainly the Linux kernel found it useful to do so
> (read_barrier_depends)
>
> Alternatively, one might question how important it is to support the
> Alpha these days...

Well, currently, we do, so we probably don't want to drop support for
that without some careful thought. I searched the archive and found
someone trying to compile 8.3.something on Alpha just a few years ago,
so it's apparently not totally dead yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company