Re: xlc atomics

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>
Subject: Re: xlc atomics
Date: 2016-02-15 19:02:41
Message-ID: 12444.1455562961@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> That commit (0d32d2e) permitted things to compile and usually pass tests, but
> I missed the synchronization bug. Since 2015-10-01, the buildfarm has seen
> sixteen duplicate-catalog-OID failures.

I'd been wondering about those ...

> These suggested OidGenLock wasn't doing its job. I've seen similar symptoms
> around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
> 5765-J08)" for ppc64le. The problem is generic-xlc.h
> pg_atomic_compare_exchange_u32_impl() issuing __isync() before
> __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our
> own s_lock.h, its references, and other projects' usage:

Nice catch!

> This patch's test case would have failed about half the time under today's
> generic-xlc.h. Fast machines run it in about 1s. A test that detects the bug
> 99% of the time would run far longer, hence this compromise.

Sounds like a reasonable compromise to me, although I wonder about the
value of it if we stick it into pgbench's TAP tests. How many of the
slower buildfarm members are running the TAP tests? Certainly mine are
not.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-02-15 19:20:02 Re: [PATCH] Code refactoring related to -fsanitize=use-after-scope
Previous Message Greg Stark 2016-02-15 18:42:49 A bit of PG archeology uncovers an interesting Linux/Unix factoid