Re: Recursive ReceiveSharedInvalidMessages not safe

Lists: pgsql-hackers
From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Recursive ReceiveSharedInvalidMessages not safe
Date: 2014-05-05 10:05:31
Message-ID: 20140505100531.GT12715@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

While investigating an issue pointed out by valgrind around undefined
bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a
bug in ReceiveSharedInvalidMessages(). It tries to be safe against
recursion but it's not:
When it recurses into ReceiveSharedInvalidMessages() from it's main loop
from inside the inval callback while nextmsg = nummsgs it'll overwrite
the 'messages' array with new contents. But at this point the old
content of one entry in the messages array is still passed to
the LocalExecuteInvalidationMessage() that caused the recursion.

It looks to me like SHAREDINVALRELCACHE_ID messages are the only ones
actually affected by this in reality because all the other ones either
operate on stack memory or don't recurse. What could happen there is
that a different relcache entry is invalidated than the relcache
invalidation callback is called for.
This actually could explain the relfilenodemap error we've seen a couple
weeks back.

It looks to me like this is broken since at least fad153ec. I think the
fix is just to make the current 'SharedInvalidationMessage *msg' not be
pointers but a local copiy of the to-be-processed entry.

Comments?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Recursive ReceiveSharedInvalidMessages not safe
Date: 2014-05-05 18:15:58
Message-ID: 11846.1399313758@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> While investigating an issue pointed out by valgrind around undefined
> bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a
> bug in ReceiveSharedInvalidMessages(). It tries to be safe against
> recursion but it's not:
> When it recurses into ReceiveSharedInvalidMessages() from it's main loop
> from inside the inval callback while nextmsg = nummsgs it'll overwrite
> the 'messages' array with new contents. But at this point the old
> content of one entry in the messages array is still passed to
> the LocalExecuteInvalidationMessage() that caused the recursion.

Hm, yeah, so if the called inval function continues to use the message
contents after doing something that could result in a recursive call,
it might be looking at trashed data.

> It looks to me like this is broken since at least fad153ec. I think the
> fix is just to make the current 'SharedInvalidationMessage *msg' not be
> pointers but a local copiy of the to-be-processed entry.

Yeah, that should do it. I think I'd been trying to avoid copying
messages more times than necessary, but evidently I optimized away
one copy step too many :-(

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Recursive ReceiveSharedInvalidMessages not safe
Date: 2014-05-05 19:28:57
Message-ID: 20140505192857.GG17909@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-05-05 14:15:58 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > While investigating an issue pointed out by valgrind around undefined
> > bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a
> > bug in ReceiveSharedInvalidMessages(). It tries to be safe against
> > recursion but it's not:
> > When it recurses into ReceiveSharedInvalidMessages() from it's main loop
> > from inside the inval callback while nextmsg = nummsgs it'll overwrite
> > the 'messages' array with new contents. But at this point the old
> > content of one entry in the messages array is still passed to
> > the LocalExecuteInvalidationMessage() that caused the recursion.
>
> Hm, yeah, so if the called inval function continues to use the message
> contents after doing something that could result in a recursive call,
> it might be looking at trashed data.

FWIW, with a bit of changes around it to make it more visible
(malloc/freeing the array) this sometimes triggers during make check. So
it's quite plausible that this is the caused the relfilenodemap
regression error we were a bit confused about.

I started to look at this code because valgrind was bleating at me
inside inval.c and for the heck of I couldn't understand wh. Since then
this lead me into quite a wild goose chase. Leading me to discover a
couple of things I am not perfectly happy about:

a) SICleanupQueue() sometimes releases and reacquires the write lock
held on the outside. That's pretty damn fragile, not to mention
ugly. Even slight reformulations of the code in SIInsertDataEntries()
can break this... Can we please extend the comment in the latter that
to mention the lock dropping explicitly?
b) we right/left shift -1 in a signed int by 16 in
CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
implementation defined behaviour.
c) The ProcessMessageList() calls access msg->rc.id to test for the type
of the existing message. That's not nice. The compiler is entirely
free to e.g. copy the entire struct to local memory causing
unitialized memory to be accessed. Entirely cosmetic, but ...

d)
The valgrind splats I was very occasionally getting were:
==21013== Conditional jump or move depends on uninitialised value(s)
==21013== at 0x8DD755: hash_search_with_hash_value (dynahash.c:885)
==21013== by 0x8DD60F: hash_search (dynahash.c:811)
==21013== by 0x799A58: smgrclosenode (smgr.c:357)
==21013== by 0x8B6ABF: LocalExecuteInvalidationMessage (inval.c:566)
==21013== by 0x77BBCF: ReceiveSharedInvalidMessages (sinval.c:127)
==21013== by 0x8B6C3F: AcceptInvalidationMessages (inval.c:640)
==21013== by 0x77FDCC: LockRelationOid (lmgr.c:107)
==21013== by 0x4A6F33: relation_open (heapam.c:1029)
==21013== by 0x4A71EB: heap_open (heapam.c:1195)
==21013== by 0x8B4228: SearchCatCache (catcache.c:1223)
==21013== by 0x8C61B1: SearchSysCache (syscache.c:909)
==21013== by 0x8C62CD: GetSysCacheOid (syscache.c:987)
==21013== Uninitialised value was created by a stack allocation
==21013== at 0x8B64C3: AddCatcacheInvalidationMessage (inval.c:328)

After far, far too much confused head scratching, code reading, random
elog()s et al I found out that this is just because of a deficiency in
valgrind's undefinedness tracking. The problem is that it doesn't really
understand shared memory sufficiently. When a backend stored a message
in the SI ringbuffer it sets a bitmask about initialized memory for a
specific position in the ringubffer. If the *same* backend reads from
the same position in the ringbuffer (4096 messages later) into which
another backend has stored a *different* type of message the bitmap of
initialized bytes will possibly be wrong if the padding is distributed
differently.

Unfortunately this cannot precisely be caught by valgrind's
suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
0) in AddCatcacheInvalidationMessage() et al. to suppress these
warnings. Imo we can just add them unconditionally, but if somebody else
prefers we can add #ifdef USE_VALGRIND around them.

I can do a), c), d), if others agree but I am not sure about b)?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Recursive ReceiveSharedInvalidMessages not safe
Date: 2014-05-05 19:41:22
Message-ID: 20808.1399318882@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> a) SICleanupQueue() sometimes releases and reacquires the write lock
> held on the outside. That's pretty damn fragile, not to mention
> ugly. Even slight reformulations of the code in SIInsertDataEntries()
> can break this... Can we please extend the comment in the latter that
> to mention the lock dropping explicitly?

Want to write a better comment?

> b) we right/left shift -1 in a signed int by 16 in
> CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
> implementation defined behaviour.

Looks all right to me. Yeah, the right shift might have undefined
high-order bits, but we don't care because we're storing the result
into an int16.

> c) The ProcessMessageList() calls access msg->rc.id to test for the type
> of the existing message. That's not nice.

Huh?

> After far, far too much confused head scratching, code reading, random
> elog()s et al I found out that this is just because of a deficiency in
> valgrind's undefinedness tracking. [...]
> Unfortunately this cannot precisely be caught by valgrind's
> suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
> 0) in AddCatcacheInvalidationMessage() et al. to suppress these
> warnings. Imo we can just add them unconditionally, but if somebody else
> prefers we can add #ifdef USE_VALGRIND around them.

I'd be okay with USE_VALGRIND. I'm not particularly hot on adding a
memset for everybody just to make valgrind less confused. Especially
since that's really going to hide any problems, not fix them.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Recursive ReceiveSharedInvalidMessages not safe
Date: 2014-05-05 20:05:01
Message-ID: 20140505200500.GE27691@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > a) SICleanupQueue() sometimes releases and reacquires the write lock
> > held on the outside. That's pretty damn fragile, not to mention
> > ugly. Even slight reformulations of the code in SIInsertDataEntries()
> > can break this... Can we please extend the comment in the latter that
> > to mention the lock dropping explicitly?
>
> Want to write a better comment?

Will do.

> > b) we right/left shift -1 in a signed int by 16 in
> > CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
> > implementation defined behaviour.
>
> Looks all right to me. Yeah, the right shift might have undefined
> high-order bits, but we don't care because we're storing the result
> into an int16.

Doesn't at the very least
rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
have to be
rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;

> > c) The ProcessMessageList() calls access msg->rc.id to test for the type
> > of the existing message. That's not nice.
>
> Huh?

The checks should access msg->id, not msg->rc.id...

> > After far, far too much confused head scratching, code reading, random
> > elog()s et al I found out that this is just because of a deficiency in
> > valgrind's undefinedness tracking. [...]
> > Unfortunately this cannot precisely be caught by valgrind's
> > suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
> > 0) in AddCatcacheInvalidationMessage() et al. to suppress these
> > warnings. Imo we can just add them unconditionally, but if somebody else
> > prefers we can add #ifdef USE_VALGRIND around them.
>
> I'd be okay with USE_VALGRIND. I'm not particularly hot on adding a
> memset for everybody just to make valgrind less confused. Especially
> since that's really going to hide any problems, not fix them.

I can't see it having any measureable overhead. Even old compilers will
optimize the memset() away and just initialize the padding... But
anyway, USE_VALGRIND is fine with me.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Recursive ReceiveSharedInvalidMessages not safe
Date: 2014-05-05 22:50:59
Message-ID: 24366.1399330259@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
>> Looks all right to me. Yeah, the right shift might have undefined
>> high-order bits, but we don't care because we're storing the result
>> into an int16.

> Doesn't at the very least
> rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
> have to be
> rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;

A promotion to int (or wider) is implicit in any arithmetic operation,
last I checked the C standard. The "(int)" on the other side isn't
necessary either.

We might actually be better off casting both sides to unsigned int,
just to enforce that the left shifting is done with unsigned semantics.

>>> c) The ProcessMessageList() calls access msg->rc.id to test for the type
>>> of the existing message. That's not nice.

>> Huh?

> The checks should access msg->id, not msg->rc.id...

Meh. If they're not the same thing, we've got big trouble anyway.
But if you want to change it as a cosmetic thing, no objection.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Recursive ReceiveSharedInvalidMessages not safe
Date: 2014-05-08 16:39:53
Message-ID: 20140508163953.GB5556@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
> > After far, far too much confused head scratching, code reading, random
> > elog()s et al I found out that this is just because of a deficiency in
> > valgrind's undefinedness tracking. [...]
> > Unfortunately this cannot precisely be caught by valgrind's
> > suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
> > 0) in AddCatcacheInvalidationMessage() et al. to suppress these
> > warnings. Imo we can just add them unconditionally, but if somebody else
> > prefers we can add #ifdef USE_VALGRIND around them.
>
> I'd be okay with USE_VALGRIND. I'm not particularly hot on adding a
> memset for everybody just to make valgrind less confused. Especially
> since that's really going to hide any problems, not fix them.

The attached patch does VALGRIND_MAKE_MEM_DEFINED() on the relevant
structs. That's already defined to empty if USE_VALGRIND isn't defined.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Silence-a-spurious-valgrind-warning-in-inval.c.patch text/x-patch 3.3 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Recursive ReceiveSharedInvalidMessages not safe
Date: 2014-05-08 16:44:33
Message-ID: 20140508164433.GC5556@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-05-05 18:50:59 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
> >> Looks all right to me. Yeah, the right shift might have undefined
> >> high-order bits, but we don't care because we're storing the result
> >> into an int16.
>
> > Doesn't at the very least
> > rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
> > have to be
> > rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
>
> A promotion to int (or wider) is implicit in any arithmetic operation,
> last I checked the C standard. The "(int)" on the other side isn't
> necessary either.

Done in the attached patch.

> We might actually be better off casting both sides to unsigned int,
> just to enforce that the left shifting is done with unsigned semantics.
>
> >>> c) The ProcessMessageList() calls access msg->rc.id to test for the type
> >>> of the existing message. That's not nice.
>
> >> Huh?
>
> > The checks should access msg->id, not msg->rc.id...
>
> Meh. If they're not the same thing, we've got big trouble anyway.
> But if you want to change it as a cosmetic thing, no objection.

Changed as well.

On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > a) SICleanupQueue() sometimes releases and reacquires the write lock
> > held on the outside. That's pretty damn fragile, not to mention
> > ugly. Even slight reformulations of the code in SIInsertDataEntries()
> > can break this... Can we please extend the comment in the latter that
> > to mention the lock dropping explicitly?
>
> Want to write a better comment?

Edited the comment and, in a perhaps debatable fashion, moved some
variable declarations + volatile into the retry loop for some added
cozyness. If the compiler inlines the cleanup function it could very
well decide to fetch some of the variables only once.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Minor-cleanups-and-comment-improvements-in-the-cache.patch text/x-patch 3.3 KB