Re: We need to rethink relation cache entry rebuild

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: We need to rethink relation cache entry rebuild
Date: 2010-01-09 01:01:35
Message-ID: 24663.1262998895@sss.pgh.pa.us
Lists: pgsql-hackers

I mentioned earlier that buildfarm member jaguar (that's the one that
builds with CLOBBER_CACHE_ALWAYS) was showing suspicious intermittent
failures. There's another one today:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2010-01-08%2004:00:02
and I also managed to reproduce a similar problem locally after running
the parallel regression tests with CLOBBER_CACHE_ALWAYS for about a day.
I'm not entirely sure yet that I understand everything that is going
wrong, but one thing is really clear: honoring a SIGINT or
SIGTERM during relcache rebuild leads to an Assert failure or worse.
The stack trace at the bottom of the above-mentioned page shows the
problem pretty clearly. What appears to have happened is that a DROP
DATABASE was issued, causing a cancel to be sent to an autovac worker in
that database, and the worker happened to be in the middle of a relcache
entry rebuild when it accepted the signal. (CLOBBER_CACHE_ALWAYS makes
this hugely more probable, but it could happen in ordinary builds too.)
The problem is that the relcache entry being rebuilt is referenced, so
it has a positive reference count that is tracked in a ResourceOwner
... but the refcount field has been temporarily zeroed while
RelationBuildDesc runs, so when transaction abort tries to release the
refcount we hit that Assert in RelationDecrementReferenceCount.
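
(For reference, the check that trips is essentially the following; this is a
paraphrase of RelationDecrementReferenceCount in relcache.c, so the exact
details may differ by branch:)

void
RelationDecrementReferenceCount(Relation rel)
{
	/* this is what fails: the rebuild has temporarily zeroed rd_refcnt */
	Assert(rel->rd_refcnt > 0);
	rel->rd_refcnt -= 1;
	if (!IsBootstrapProcessingMode())
		ResourceOwnerForgetRelationRef(CurrentResourceOwner, rel);
}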

In a non-Assert build you wouldn't notice an immediate problem, but that
doesn't mean things are okay in the field. The problem can be stated
more generally as: relcache rebuild temporarily messes up a relcache
entry that other places have got pointers to. If an error occurs during
rebuild, those other places might still try to use their pointers,
expecting to reference a valid relcache entry. I think this could
happen for example if the error occurred inside a subtransaction, and
we trapped the exception and allowed the outer transaction to resume.
Code in the outer transaction would naturally still expect its pointer
to a valid, reference-count-incremented relcache entry to be good.
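
To make that concrete, here is a purely illustrative fragment (the function
name and OID parameter are made up for the example) of outer-transaction code
that would be left holding a stale pointer:

static void
outer_code(Oid foo_oid)
{
	/* outer transaction opens the rel; its relcache entry now has rd_refcnt > 0 */
	Relation	rel = heap_open(foo_oid, AccessShareLock);

	/*
	 * Suppose a PL/pgSQL BEGIN ... EXCEPTION block (i.e. a subtransaction)
	 * runs here: a cache inval forces a rebuild of rel's entry, the rebuild
	 * errors out partway through, and the error is trapped.
	 */

	/* back in the outer transaction, this pointer is still assumed valid */
	(void) RelationGetDescr(rel);

	heap_close(rel, AccessShareLock);
}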

RelationClearRelation does take the step of de-linking the entry it's
going to rebuild from the hash table. When that code was designed,
before resource owners or subtransactions, I think it was actually
safe --- an error could result in a leaked entry in CacheMemoryContext,
but there could not be any live pointers to the partially-rebuilt entry
after we did transaction abort cleanup. Now, however, it's not safe
at all, and only the rather low probability of failure has kept us
from spotting the problem sooner.

Basically I think we have to fix this by ensuring that an error escape
can't occur while a relcache entry is in a partially rebuilt state.
What I have in mind is to rewrite RelationClearRelation so that it
does a rebuild this way:

1. Build a new relcache entry from scratch. This entry won't be linked
from anywhere. A failure here results in no worse than some leaked
memory.

2. Swap the contents of the old and new relcache entries. Do this in
straight-line code that has no possibility of CHECK_FOR_INTERRUPTS.
But don't swap the refcount fields, nor certain infrastructure pointers
that we try to preserve intact if possible --- for example, don't swap
the tupledesc pointers if the old and new tupledescs are equal.

3. Free the new relcache entry along with the (possibly swapped)
infrastructure for it.

One slightly tricky spot is that the index infrastructure has to be
swapped all or nothing, because it all lives in a single subsidiary
memory context. I think this is all right, but if it isn't we'll need
to expend more code on cleaning that infrastructure up properly instead
of just destroying a context.
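
In pseudo-C, the rebuild path would then look roughly like this (all of the
helper names and the exact RelationBuildDesc signature below are placeholders
for illustration, not necessarily what the final patch will use):

static void
RelationRebuildEntry(Relation relation)
{
	Relation	newrel;
	bool		keep_tupdesc;

	/*
	 * 1. Build a brand-new entry for the same relation, without linking it
	 * into the hash table (the second argument is a hypothetical
	 * don't-insert flag).  If we error out here, all we lose is memory.
	 */
	newrel = RelationBuildDesc(RelationGetRelid(relation), false);

	/* decide which infrastructure can be preserved in the live entry */
	keep_tupdesc = equalTupleDescs(relation->rd_att, newrel->rd_att);

	/*
	 * 2. Swap the contents of the two entries in straight-line code, with
	 * no possibility of CHECK_FOR_INTERRUPTS.  The refcount stays with the
	 * live entry, and preserved infrastructure (here, the tupledesc) is
	 * swapped back so that existing pointers to it remain valid.
	 */
	{
		RelationData tmp;

		memcpy(&tmp, relation, sizeof(RelationData));
		memcpy(relation, newrel, sizeof(RelationData));
		memcpy(newrel, &tmp, sizeof(RelationData));

		relation->rd_refcnt = tmp.rd_refcnt;	/* keep the old refcount */
		newrel->rd_refcnt = 0;

		if (keep_tupdesc)
		{
			TupleDesc	swap = relation->rd_att;

			relation->rd_att = newrel->rd_att;	/* old tupledesc stays live */
			newrel->rd_att = swap;				/* new one gets freed below */
		}
	}

	/* 3. Free the temporary entry along with whatever was swapped into it. */
	RelationDestroyRelation(newrel);
}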

Comments?

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: We need to rethink relation cache entry rebuild
Date: 2010-01-09 01:58:53
Message-ID: 407d949e1001081758w747e9953n7493baa54a69111@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 9, 2010 at 1:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Comments?
>

How old is this problem? It doesn't sound like a backpatchable fix...

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: We need to rethink relation cache entry rebuild
Date: 2010-01-09 02:10:08
Message-ID: 25590.1263003008@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> How old is this problem? It doesn't sound like a backpatchable fix...

Presumably it goes back to 8.0.

I was planning to defer thinking about whether to back-patch it until
we had a working fix and could see how big a change it really is.

regards, tom lane


From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Build farm tweaks
Date: 2010-01-09 02:50:11
Message-ID: 35cd5cbc9dfbccbbe90ed456c1f0cda8@biglumber.com
Lists: pgsql-hackers


> I mentioned earlier that buildfarm member jaguar (that's the one that
> builds with CLOBBER_CACHE_ALWAYS) was showing suspicious intermittent
> failures.

Tom, this brings up another question: is there any flag, environment,
forced resource limitation, etc. that you or others would like to see
in the build farm? I'd be happy to apply anything to my animal, for
example, as it's running on a fairly standard Linux distro and I'd
like it to be more 'useful'.

If such a list turns out to be more than a few items, perhaps we
should document it somewhere like the wiki?

--
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 201001082147
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: We need to rethink relation cache entry rebuild
Date: 2010-01-10 11:09:05
Message-ID: 4B49B551.3020706@kaltenbrunner.cc
Lists: pgsql-hackers

Tom Lane wrote:
> I mentioned earlier that buildfarm member jaguar (that's the one that
> builds with CLOBBER_CACHE_ALWAYS) was showing suspicious intermittent
> failures. There's another one today:

Hmm, I was just doing a CLOBBER_CACHE_ALWAYS build on one of my ARM-based
boxes and it seems to fall over very quickly as well; however, I'm not
getting a useful backtrace from gdb...

Stefan


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: We need to rethink relation cache entry rebuild
Date: 2010-01-10 11:25:52
Message-ID: 1263122752.19367.139183.camel@ebony
Lists: pgsql-hackers

On Fri, 2010-01-08 at 20:01 -0500, Tom Lane wrote:
> I mentioned earlier that buildfarm member jaguar (that's the one that
> builds with CLOBBER_CACHE_ALWAYS) was showing suspicious intermittent
> failures. There's another one today:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2010-01-08%2004:00:02

Thanks for looking into that; I'm sorry that I was too busy on HS things
and had to hand it back when the issue occurred earlier.

> What I have in mind is to rewrite RelationClearRelation

That sounds like it is an isolated fix, so that's good from where I'm
sitting.

I will leave off coding anything around VFI until this is done. I was
realistically around 2 weeks from doing that anyway, so I don't foresee
any delay for HS.

--
Simon Riggs www.2ndQuadrant.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: We need to rethink relation cache entry rebuild
Date: 2010-01-10 23:57:44
Message-ID: 4389.1263167864@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Basically I think we have to fix this by ensuring that an error escape
> can't occur while a relcache entry is in a partially rebuilt state.

Attached is a draft patch for this. In addition to fixing the stated
problem, it also takes care of a thinko that I found along the way:
RelationClearRelation assumes that any rd_indexprs or rd_indpred trees
can be freed by deleting the rd_indexcxt context, as the comments in
rel.h imply. But actually the code that loads those fields was putting
the trees directly into CacheMemoryContext, meaning that a cache flush
on an index that has expressions or a predicate would result in a
session-lifespan memory leak.
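
(The fix for that part should just amount to building the finished trees in
rd_indexcxt rather than CacheMemoryContext; roughly this pattern, shown here
as a sketch rather than the literal patch text:)

	MemoryContext oldcxt;

	/* ... "result" is the finished expression list, built in a working context ... */

	/*
	 * Save the copy in the index's own context, not CacheMemoryContext, so
	 * that deleting rd_indexcxt really does release it.
	 */
	oldcxt = MemoryContextSwitchTo(relation->rd_indexcxt);
	relation->rd_indexprs = (List *) copyObject(result);
	MemoryContextSwitchTo(oldcxt);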

I think this is not too complicated to back-patch --- if anything the
logic is simpler than before. Comments?

regards, tom lane

Attachment: relcache-rebuild-fix.patch (text/x-patch, 21.0 KB)

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Build farm tweaks
Date: 2010-01-18 15:40:35
Message-ID: 20100118154034.GA3617@alvh.no-ip.org
Lists: pgsql-hackers

Greg Sabino Mullane wrote:
> > I mentioned earlier that buildfarm member jaguar (that's the one that
> > builds with CLOBBER_CACHE_ALWAYS) was showing suspicious intermittent
> > failures.
>
> Tom, this brings up another question: is there any flag, environment,
> forced resource limitation, etc. that you or others would like to see
> in the build farm? I'd be happy to apply anything to my animal, for
> example, as it's running on a fairly standard Linux distro and I'd
> like it to be more 'useful'.

At the very least I think we would benefit from more cache-clobbering
animals. (It'd be nice to display that flag in the buildfarm dashboard
too ...)

BTW I find it a bit surprising that we have more 8.3 animals than 8.4
... I guess this is just because there's no auto-add of branches to
animals when new major versions are released.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.