Re: Online enabling of checksums

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Online enabling of checksums
Date: 2018-04-06 00:28:17
Message-ID: FB948276-7B32-4B77-83E6-D00167F8EEB4@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 05 Apr 2018, at 23:13, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> (And yes, we've noticed it's failing in isolationtester on a number of boxes -- Daniel is currently investigating)

Looking into the isolationtester failure on piculet, which builds using
--disable-atomics, and locust which doesn’t have atomics, the code for
pg_atomic_test_set_flag seems a bit odd.

TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
defined to return True if it could set. When running without atomics, don’t we
need to do something like the below diff to make these APIs match? :

--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
bool
pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
{
- return TAS((slock_t *) &ptr->sema);
+ return TAS((slock_t *) &ptr->sema) == 0;
}

Applying this makes the _cancel test pass, moving the failure instead to the
following _enable test (which matches what coypu and mylodon are seeing).

AFAICT there are no other callers of this than the online checksum code, and
this is not executed by the tests when running without atomics, which could
explain why nothing else is broken.

Before continuing the debugging, does this theory hold any water? This isn’t
code I’m deeply familiar with so would appreciate any pointers.

cheers ./daniel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-04-06 00:56:54 Re: Checkpoint not retrying failed fsync?
Previous Message David Rowley 2018-04-06 00:06:08 Re: [HACKERS] path toward faster partition pruning