Re: missing RelationCloseSmgr in FreeFakeRelcacheEntry?

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: missing RelationCloseSmgr in FreeFakeRelcacheEntry?
Date: 2013-11-04 07:38:27
Message-ID: 52774EF3.6080705@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29.10.2013 03:16, Andres Freund wrote:
> Hi,
>
> I've started a valgrind run earlier when trying to run the regression
> tests with valgrind --error-exitcode=122 (to cause the regression tests
> to fail visibly) but it crashed frequently...
>
> One of them was:
>
> ==2184== Invalid write of size 8
> ==2184== at 0x76787F: smgrclose (smgr.c:284)
> ==2184== by 0x4ED717: xact_redo_commit_internal (xact.c:4676)
> ==2184== by 0x4ED7FF: xact_redo_commit (xact.c:4712)
> ==2184== by 0x4EDB0D: xact_redo (xact.c:4838)
> ==2184== by 0x50355D: StartupXLOG (xlog.c:6809)
> ==2184== by 0x70AA1E: StartupProcessMain (startup.c:224)
> ==2184== by 0x512B38: AuxiliaryProcessMain (bootstrap.c:429)
> ==2184== by 0x709C43: StartChildProcess (postmaster.c:5063)
> ==2184== by 0x7086EA: PostmasterStateMachine (postmaster.c:3544)
> ==2184== by 0x7072F1: reaper (postmaster.c:2801)
> ==2184== by 0x57B325F: ??? (in /lib/x86_64-linux-gnu/libc-2.17.so)
> ==2184== by 0x585F822: __select_nocancel (syscall-template.S:81)
> ==2184== Address 0x5f63410 is 5,584 bytes inside a recently re-allocated block of size 8,192 alloc'd
> ==2184== at 0x402ACEC: malloc (vg_replace_malloc.c:270)
> ==2184== by 0x8B3F8E: AllocSetAlloc (aset.c:854)
> ==2184== by 0x8B623B: MemoryContextAlloc (mcxt.c:581)
> ==2184== by 0x8B5F93: MemoryContextCreate (mcxt.c:522)
> ==2184== by 0x8B33C4: AllocSetContextCreate (aset.c:444)
> ==2184== by 0x8B55DD: MemoryContextInit (mcxt.c:110)
> ==2184== by 0x703B17: PostmasterMain (po
>
> Which seems to indicate a dangling ->rd_smgr pointer, confusing the heck
> out of me because I couldn't see how that'd happen.
>
> Looking a bit closer it seems to me that the fake relcache
> infrastructure seems to neglect the chance that something used the fake
> entry to read something which will have done a RelationOpenSmgr(). Which
> in turn will have set rel->rd_smgr->smgr_owner to rel.
> When we then pfree() the fake relation in FreeFakeRelcacheEntry we have
> a dangling pointer.

Yeah, that's clearly a bug.

> diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
> index 5429d5e..ee7c892 100644
> --- a/src/backend/access/transam/xlogutils.c
> +++ b/src/backend/access/transam/xlogutils.c
> @@ -433,6 +433,7 @@ CreateFakeRelcacheEntry(RelFileNode rnode)
> void
> FreeFakeRelcacheEntry(Relation fakerel)
> {
> + RelationCloseSmgr(fakerel);
> pfree(fakerel);
> }

Hmm, I don't think that's a very good solution. Firstly, it will force
the underlying files to be closed, hurting performance. Fake relcache
entries are used in heapam when clearing visibility map bits, which
might happen frequently enough for that to matter. Secondly, it will
fail if you create two fake relcache entries for the same relfilenode.
Freeing the first will close the smgr entry, and freeing the second will
try to close the same smgr entry again. We never do that in the current
code, but it seems like a possible source of bugs in the future.

How about the attached instead?

- Heikki

Attachment Content-Type Size
fake-relcache-fix-2.patch text/x-diff 625 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2013-11-04 07:40:19 Removal of archive in wal_level
Previous Message Tom Lane 2013-11-04 05:20:25 Re: Shave a few instructions from child-process startup sequence