Re: better atomics

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Ants Aasma <ants(at)cybertec(dot)at>
Subject: Re: better atomics
Date: 2013-10-16 20:51:59
Message-ID: 20131016205159.GC13024@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-10-16 22:39:07 +0200, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2013-10-16 14:30:34 -0400, Robert Haas wrote:
> >>> But _and, _or are really useful because they can be used to atomically
> >>> clear and set flag bits.
>
> >> Until we have some concrete application that requires this
> >> functionality for adequate performance, I'd be inclined not to bother.
> >> I think we have bigger fish to fry, and (to extend my nautical
> >> metaphor) trying to get this working everywhere might amount to trying
> >> to boil the ocean.
>
> > Well, I actually have a very, very preliminary patch using them in the
> > course of removing the spinlocks in PinBuffer() (via LockBufHdr()).
>
> I think we need to be very, very wary of making our set of required
> atomics any larger than it absolutely has to be. The more stuff we
> require, the closer we're going to be to making PG a program that only
> runs well on Intel.

Well, essentially I am advocating to support basically three operations:
* compare and swap
* atomic add (and by that sub)
* test and set

The other operations are just porcelain around that.

With compare and swap both the others can be easily implemented if
neccessary.

Note that e.g. linux - running on all platforms we're talking about but
VAX - exensively and unconditionally uses atomic operations widely. So I
think we don't have to be too afraid about non-intel performance here.

> I am not comforted by the "gcc will provide good
> implementations of the atomics everywhere" argument, because in fact
> it won't. See for example the comment associated with our only existing
> use of gcc atomics:
>
> * On ARM, we use __sync_lock_test_and_set(int *, int) if available, and if
> * not fall back on the SWPB instruction. SWPB does not work on ARMv6 or
> * later, so the compiler builtin is preferred if available. Note also that
> * the int-width variant of the builtin works on more chips than other widths.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> That's not a track record that should inspire much faith that complete
> sets of atomics will exist elsewhere. What's more, we don't just need
> atomics that *work*, we need atomics that are *fast*, else the whole
> exercise turns into pessimization not improvement. The more atomic ops
> we depend on, the more likely it is that some of them will involve kernel
> support on some chips, destroying whatever performance improvement we
> hoped to get.

Hm. I can't really follow. We *prefer* to use __sync_lock_test_and_set
in contrast to our own open-coded implementation, right? And the comment
about some hardware only supporting "int-width" matches that I only want to
require u32 atomics support, right?

I completely agree that we cannot rely on 8byte math or similar (16byte
cmpxchg) to be supported by all platforms. That indeed would require
kernel fallbacks on e.g. ARM.

> > ... The only thing I'd touch around platforms in that patch is
> > adding a generic fallback pg_atomic_test_and_set() to s_lock.h and
> > remove the special case usages of __sync_lock_test_and_set() (Arm64,
> > some 32 bit arms).
>
> Um, if we had a "generic" pg_atomic_test_and_set(), s_lock.h would be
> about one line long. The above quote seems to me to be exactly the kind
> of optimism that is not warranted in this area.

Well, I am somewhat hesitant to change s_lock for more platforms than
necessary. So I proposed to restructure it in a way that leaves all
existing implementations in place that do not already rely on
__sync_lock_test_and_set().

There's also SPIN_DELAY btw, which I don't see any widely provided
intrinsics for.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-10-16 20:54:27 Re: removing old ports and architectures
Previous Message Claudio Freire 2013-10-16 20:50:30 Re: Auto-tuning work_mem and maintenance_work_mem