Re: better atomics - v0.5

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Ants Aasma <ants(at)cybertec(dot)at>
Subject: Re: better atomics - v0.5
Date: 2014-07-02 13:06:10
Message-ID: 20140702130610.GO21169@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2014-06-30 20:28:52 +0200, Andres Freund wrote:
> To quantify this, on my 2 socket xeon E5520 workstation - which is too
> small to heavily show the contention problems in pgbench -S - the
> numbers are:
>
> pgbench -M prepared -c 16 -j 16 -T 10 -S (best of 5, noticeably variability)
> master: 152354.294117
> lwlocks-atomics: 170528.784958
> lwlocks-atomics-spin: 159416.202578
>
> pgbench -M prepared -c 1 -j 1 -T 10 -S (best of 5, noticeably variability)
> master: 16273.702191
> lwlocks-atomics: 16768.712433
> lwlocks-atomics-spin: 16744.913083
>
> So, there really isn't a problem with the emulation. It's not actually
> that surprising - the absolute number of atomic ops is prety much the
> same. Where we earlier took a spinlock to increment shared we now still
> take one.

Tom, you've voiced concern that using atomics will regress performance
for platforms that don't use atomics in a nearby thread. Can these
numbers at least ease those a bit?
I've compared the atomics vs emulated atomics on 32bit x86, 64bit x86
and POWER7. That's all I've access to at the moment. In all cases the
performance with lwlocks ontop emulated atomics was the same as the old
spinlock based algorithm or even a bit better. I'm sure that it's
possible to design atomics based algorithms where that's not the case,
but I don't think we need to solve those before we meet them.

I think for most contended places which possibly can be improved two
things matter: The number of instructions that need to work atomically
(i.e. TAS/CAS/xchg for spinlocks, xadd/cmpxchg for atomics) and the
duration locks are held. When converting to atomics, even if emulated,
the hot paths shouldn't need more locked instructions than before and
often enough the duration during which spinlocks are held will be
smaller.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan Corbet 2014-07-02 13:08:10 Re: /proc/self/oom_adj is deprecated in newer Linux kernels
Previous Message Alexander Korotkov 2014-07-02 13:05:25 Re: 9.5 CF1