8.3.5: Crash in CountActiveBackends() - lockless race?

Lists: pgsql-hackers
From: Marko Kreen <markokr(at)gmail(dot)com>
To: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-30 10:11:48
Message-ID: e51f66da0903300311x18ababa4rdc584593aa33cb91@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We got a crash in our test-server, which has huge number of backends
running:

(gdb) bt
#0 CountActiveBackends () at procarray.c:1094
#1 0x0000000000475f45 in RecordTransactionCommit () at xact.c:945
#2 0x000000000047601c in CommitTransaction () at xact.c:1675
#3 0x0000000000476247 in CommitTransactionCommand () at xact.c:2373
#4 0x00000000005b7872 in finish_xact_command () at postgres.c:2322
#5 0x00000000005b8865 in exec_simple_query (query_string=0xa23070
"END") at postgres.c:1017
#6 0x00000000005ba1b1 in PostgresMain (argc=4, argv=<value optimized
out>, username=0x90b180 "replicator") at postgres.c:3577
#7 0x000000000058ea8b in ServerLoop () at postmaster.c:3207
#8 0x000000000058f7ae in PostmasterMain (argc=5, argv=0x9061e0) at
postmaster.c:1029
#9 0x0000000000545865 in main (argc=5, argv=<value optimized out>) at
main.c:188

$ uname -a
Linux test 2.6.23.17 #1 SMP Fri Oct 31 10:36:17 GMT 2008 x86_64 GNU/Linux

Postgres 8.3.5 / Debian etch / gcc 3.3.5 (Debian 1:3.3.5-13)

------------------------------

My theory:

CountActiveBackends() tries to do lockless access:

/*
* Note: for speed, we don't acquire ProcArrayLock. This is a little bit
* bogus, but since we are only testing fields for zero or nonzero, it
* should be OK. The result is only used for heuristic purposes anyway...
*/
for (index = 0; index < arrayP->numProcs; index++)
{
volatile PGPROC *proc = arrayP->procs[index];

if (proc == MyProc)
continue; /* do not count myself */
if (proc->pid == 0) <--
continue; /* do not count prepared xacts */

ProcArrayAdd() does proper locking, but does not consider lockless access:

LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

....

arrayP->procs[arrayP->numProcs] = proc;
arrayP->numProcs++; // numProcs can be visible before 'proc' ptr

LWLockRelease(ProcArrayLock);

Because there is no memory barrier between setting the ptr and
numProcs++, the numProcs can be visible before the ptr to
CountActivebackends & co. Also, as the ProcArray does not clear
the ptr, which means the once-used slots will point into shared mem,
the race is only fatal if it happens for unused slots.

I see 2 ways to fix it:

1. Add memory barrier to ProcArrayAdd/ProcArrayRemove between pointer
and count update. This guarantees that partial slots will not be seen.

2. Always clear the pointer in ProcArrayRemove and check for NULL
in all "lockless" access points. This guarantees that partial slots
will be either NULL or just-freed ones, before the barrier in
LWLockRelease(), which means the contents should be still sensible.

#1 seems to require platform-specific code, which we don't have yet?
So #2 may be easier solution.

--
marko


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-30 13:09:34
Message-ID: 49D0C48E.6030602@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Kreen wrote:
> 1. Add memory barrier to ProcArrayAdd/ProcArrayRemove between pointer
> and count update. This guarantees that partial slots will not be seen.
>
> 2. Always clear the pointer in ProcArrayRemove and check for NULL
> in all "lockless" access points. This guarantees that partial slots
> will be either NULL or just-freed ones, before the barrier in
> LWLockRelease(), which means the contents should be still sensible.
>
> #1 seems to require platform-specific code, which we don't have yet?

Marking the pointer as volatile should work.

> So #2 may be easier solution.

Agreed. And more importantly, it puts the onus of getting it right into
CountActiveBackends, which is the one who's breaking the rules. We don't
necessarily need to clear the pointer in ProcArrayRemove either, the
count doesn't need to be accurate.

Barring objections, I'll do #2:

*** procarray.c.~1.40.~ 2008-01-09 23:52:36.000000000 +0200
--- procarray.c 2009-03-30 16:04:00.000000000 +0300
***************
*** 1088,1093 ****
--- 1088,1101 ----
for (index = 0; index < arrayP->numProcs; index++)
{
volatile PGPROC *proc = arrayP->procs[index];
+
+ /*
+ * Since we're not holding a lock, need to check that the pointer
+ * is valid. Someone holding the lock could have increased numProcs
+ * already, but not yet assigned a valid pointer to the array.
+ */
+ if (proc != NULL)
+ continue;

if (proc == MyProc)
continue; /* do not count myself */

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-30 13:47:55
Message-ID: e51f66da0903300647n29fe8946n51354fe1f0f710c7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/30/09, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Marko Kreen wrote:
>
> > 1. Add memory barrier to ProcArrayAdd/ProcArrayRemove between pointer
> > and count update. This guarantees that partial slots will not be seen.
> >
> > 2. Always clear the pointer in ProcArrayRemove and check for NULL
> > in all "lockless" access points. This guarantees that partial slots
> > will be either NULL or just-freed ones, before the barrier in
> > LWLockRelease(), which means the contents should be still sensible.
> >
> > #1 seems to require platform-specific code, which we don't have yet?
> >
>
> Marking the pointer as volatile should work.

No, "volatile" affects only compiler, we need to notify CPU.

> > So #2 may be easier solution.
>
> Agreed. And more importantly, it puts the onus of getting it right into
> CountActiveBackends, which is the one who's breaking the rules. We don't
> necessarily need to clear the pointer in ProcArrayRemove either, the count
> doesn't need to be accurate.

Without reset in ProcArrayRemove we may use some ancient pointer that
may point to garbage? I don't think it's good coding style to allow
that to happen. Lockless operations are unobvious enough...

Also, are there other functions that try lockless access on proc_array?

--
marko


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-30 14:02:02
Message-ID: 49D0D0DA.2000501@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Kreen wrote:
> On 3/30/09, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Agreed. And more importantly, it puts the onus of getting it right into
>> CountActiveBackends, which is the one who's breaking the rules. We don't
>> necessarily need to clear the pointer in ProcArrayRemove either, the count
>> doesn't need to be accurate.
>
> Without reset in ProcArrayRemove we may use some ancient pointer that
> may point to garbage? I don't think it's good coding style to allow
> that to happen.

Well, that can happen anyway. CountActiveBackends() could fetch the
pointer and determine that it's not NULL, just before ProcArrayRemove
clears it.

I agree it's a bit dirty, but seems safe as the PGPROC entries are in
shared memory.

(clearing the pointer might be a good idea anyway, though, for debugging
purposes)

> Also, are there other functions that try lockless access on proc_array?

We do set fields in MyProc without holding the lock, but that should be
fine.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-30 14:09:07
Message-ID: e51f66da0903300709w5b591a55n2d18e085f1ddeb81@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/30/09, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Marko Kreen wrote:
>
> > On 3/30/09, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> >
> > > Agreed. And more importantly, it puts the onus of getting it right into
> > > CountActiveBackends, which is the one who's breaking the rules. We don't
> > > necessarily need to clear the pointer in ProcArrayRemove either, the
> count
> > > doesn't need to be accurate.
> > >
> >
> > Without reset in ProcArrayRemove we may use some ancient pointer that
> > may point to garbage? I don't think it's good coding style to allow
> > that to happen.
> >
>
> Well, that can happen anyway. CountActiveBackends() could fetch the pointer
> and determine that it's not NULL, just before ProcArrayRemove clears it.

Yes, but that way it's well-defined what pointer you can get there -
it can only be a pointer that is just being removed.

And you know that if the ProcArrayRemove does not invalidate any fields
before LWunlock, the struct data is valid.

> I agree it's a bit dirty, but seems safe as the PGPROC entries are in
> shared memory.

I understand that the pointer is valid, but what about data
it is pointing to?

> (clearing the pointer might be a good idea anyway, though, for debugging
> purposes)

Yes, I think it would be good idea.

> > Also, are there other functions that try lockless access on proc_array?
> >
>
> We do set fields in MyProc without holding the lock, but that should be
> fine.

Ok.

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-30 14:18:29
Message-ID: 3072.1238422709@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Marko Kreen wrote:
>> Without reset in ProcArrayRemove we may use some ancient pointer that
>> may point to garbage? I don't think it's good coding style to allow
>> that to happen.

> Well, that can happen anyway. CountActiveBackends() could fetch the
> pointer and determine that it's not NULL, just before ProcArrayRemove
> clears it.

Dead PGPROC entries are just put into a list for reuse, so the pointer
would still point at storage that looked like a PGPROC. I concur with
Heikki's theory that the observed crash must have been from fetching a
pointer that was never yet not NULL.

regards, tom lane


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-30 14:36:41
Message-ID: e51f66da0903300736g4062100cu6675edcdd4c9f992@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/30/09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>
> > Marko Kreen wrote:
> >> Without reset in ProcArrayRemove we may use some ancient pointer that
> >> may point to garbage? I don't think it's good coding style to allow
> >> that to happen.
>
> > Well, that can happen anyway. CountActiveBackends() could fetch the
> > pointer and determine that it's not NULL, just before ProcArrayRemove
> > clears it.
>
>
> Dead PGPROC entries are just put into a list for reuse, so the pointer
> would still point at storage that looked like a PGPROC. I concur with
> Heikki's theory that the observed crash must have been from fetching a
> pointer that was never yet not NULL.

Well, that was also my theory. But my point is that such lockless code
should be written in more stricter way so it's effects can be clearly
deduced. Or at least such roundabout effects should be commented -
"Ancient pointer here would still point to PGPROC struct".

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-30 14:51:24
Message-ID: 10088.1238424684@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> Well, that was also my theory. But my point is that such lockless code
> should be written in more stricter way so it's effects can be clearly
> deduced.

We don't really care that much, for what CountActiveBackends is used for.

> Or at least such roundabout effects should be commented -
> "Ancient pointer here would still point to PGPROC struct".

Agreed, the comment should mention all of these possibilities.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.5: Crash in CountActiveBackends() - lockless race?
Date: 2009-03-31 05:19:57
Message-ID: 49D1A7FD.7000301@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Marko Kreen <markokr(at)gmail(dot)com> writes:
>> Or at least such roundabout effects should be commented -
>> "Ancient pointer here would still point to PGPROC struct".
>
> Agreed, the comment should mention all of these possibilities.

Fixed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com