Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Keith Fiske <keith(at)omniti(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, levertond(at)googlemail(dot)com, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process
Date: 2014-06-06 23:11:49
Message-ID: 20140606231149.GA24880@awork2.anarazel.de
Lists: pgsql-bugs

On 2014-06-06 18:21:45 -0400, Tom Lane wrote:
> Also, there are a bunch of fsync_fname() calls inside critical sections in
> replication/slot.c. Seems at best pretty damn risky; what's more, the
> critical sections cover only the fsyncs and not anything else, which is
> flat out broken. If it was okay to fail just before calling the fsync,
> why is it critical to not fail inside it? Somebody was not thinking
> clearly there.

No, it actually makes sense. If:
* the open, write or fsync to the temp file fails: no permanent state
has changed. We can gracefully error out.
* rename(tmpfile, realname) fails: we know (by POSIX) that the file
hasn't been renamed. The old state is still valid.
* if the fsync() of the new file fails (damn unlikely) we don't know
which state is valid. So if we crashed at that moment we might lose
our reservation on resources (e.g. catalog xmin), and might start to
decode with the wrong catalog state. Bad. On startup we'll try to
fsync the slot files again, so we won't start up until that's clear.
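
To make the three cases above concrete, here is a minimal standalone sketch
of the write/rename/fsync sequence. It is not the code in
replication/slot.c: the function name save_state() and the SaveResult codes
are made up for illustration, and the directory fsyncs a real implementation
would also want are left out for brevity. The only point is which failures
leave the old state intact and which one does not.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not PostgreSQL names. */
typedef enum
{
    SAVE_OK,            /* new state is durably on disk */
    SAVE_RETRYABLE,     /* failed at or before rename(): old state still valid */
    SAVE_INDETERMINATE  /* failed after rename(): unknown which state is on disk */
} SaveResult;

SaveResult
save_state(const char *path, const char *tmppath, const void *buf, size_t len)
{
    int fd;

    assert(path != NULL && tmppath != NULL && buf != NULL);

    /* Case 1: open, write or fsync of the temp file. Nothing permanent changed. */
    fd = open(tmppath, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return SAVE_RETRYABLE;
    if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return SAVE_RETRYABLE;
    }
    close(fd);

    /* Case 2: rename() failed, so the file under the real name is untouched. */
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return SAVE_RETRYABLE;
    }

    /* Case 3: the rename happened, but it is not known to be durable yet. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return SAVE_INDETERMINATE;
    if (fsync(fd) != 0)
    {
        close(fd);
        return SAVE_INDETERMINATE;
    }
    close(fd);

    return SAVE_OK;
}
```

In this sketch a caller would report SAVE_RETRYABLE as an ordinary error and
carry on with the old state, whereas SAVE_INDETERMINATE is the one outcome
it cannot recover from gracefully. That last step is the part that
corresponds to the critical section.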

Why is it that risky? We fdatasync() files while inside a critical
section all the time. And we've done the space allocation (the fsync on
the old filename) and the rename() outside the critical section.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
