Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Keith Fiske <keith(at)omniti(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, levertond(at)googlemail(dot)com, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process
Date: 2014-06-06 22:21:45
Message-ID: 14003.1402093305@sss.pgh.pa.us
Lists: pgsql-bugs

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-06-06 18:03:53 -0400, Tom Lane wrote:
>> The point here seems to be that lazy_vacuum_page does the visibility map
>> ops inside its own critical section. Why? Setting a visibility bit
>> doesn't seem like it's critical. Why can't we just move the
>> END_CRIT_SECTION() to before the PageIsAllVisible test?

> Yea, that's what I am proposing upthread. If we move the visibility
> tests out of the critical section this will get rid of the original
> report as well.

I went trolling for other critical sections ...

lazy_scan_heap has the same disease, but it looks like it can be fixed the same way.
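
To illustrate the proposed change, here is a minimal standalone sketch, not the actual vacuumlazy.c code. It assumes only the documented behavior that an ERROR raised while a critical section is open is promoted to PANIC. The counter, the Outcome enum, and the functions set_visibility_bit, vacuum_page_before, and vacuum_page_after are hypothetical stand-ins invented for this example; only the START_CRIT_SECTION/END_CRIT_SECTION names come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the backend's critical-section bookkeeping. */
static int crit_section_count = 0;

#define START_CRIT_SECTION() (crit_section_count++)
#define END_CRIT_SECTION() \
    do { assert(crit_section_count > 0); crit_section_count--; } while (0)

typedef enum { OUTCOME_OK, OUTCOME_ERROR, OUTCOME_PANIC } Outcome;

/* An error reported inside a critical section is escalated to PANIC. */
static Outcome report_error(void)
{
    return crit_section_count > 0 ? OUTCOME_PANIC : OUTCOME_ERROR;
}

/* Hypothetical visibility map update that may fail. */
static Outcome set_visibility_bit(bool fail)
{
    return fail ? report_error() : OUTCOME_OK;
}

/* Before: the visibility map work sits inside the critical section. */
static Outcome vacuum_page_before(bool vm_fails)
{
    Outcome result;

    START_CRIT_SECTION();
    /* ... modify the heap page and write WAL here ... */
    result = set_visibility_bit(vm_fails);
    END_CRIT_SECTION();
    return result;
}

/* After: END_CRIT_SECTION() is moved ahead of the visibility work. */
static Outcome vacuum_page_after(bool vm_fails)
{
    START_CRIT_SECTION();
    /* ... modify the heap page and write WAL here ... */
    END_CRIT_SECTION();
    return set_visibility_bit(vm_fails);
}
```

In this model, vacuum_page_before(true) yields OUTCOME_PANIC while vacuum_page_after(true) yields an ordinary OUTCOME_ERROR, which is the effect of moving END_CRIT_SECTION() as suggested.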

Also, there are a bunch of fsync_fname() calls inside critical sections in
replication/slot.c. Seems at best pretty damn risky; what's more, the
critical sections cover only the fsyncs and not anything else, which is
flat out broken. If it was okay to fail just before calling the fsync,
why is it critical to not fail inside it? Somebody was not thinking
clearly there.
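
The inconsistency can be shown with a small standalone model. This is a sketch under stated assumptions and not the code in replication/slot.c: the step names, the Outcome enum, and save_state_fsync_only_critical are hypothetical, and the sequence of steps is only a plausible shape for saving state to disk.

```c
#include <assert.h>

/* Hypothetical steps of saving some state to disk. */
typedef enum { STEP_NONE, STEP_WRITE, STEP_FSYNC, STEP_RENAME } Step;
typedef enum { OUTCOME_OK, OUTCOME_ERROR, OUTCOME_PANIC } Outcome;

/* Hypothetical stand-ins for the backend's critical-section bookkeeping. */
static int crit_section_count = 0;

#define START_CRIT_SECTION() (crit_section_count++)
#define END_CRIT_SECTION() \
    do { assert(crit_section_count > 0); crit_section_count--; } while (0)

/* A failing step is an ERROR, or a PANIC if a critical section is open. */
static Outcome run_step(Step step, Step failing_step)
{
    if (step != failing_step)
        return OUTCOME_OK;
    return crit_section_count > 0 ? OUTCOME_PANIC : OUTCOME_ERROR;
}

/* Only the fsync is wrapped; the steps around it can still fail softly. */
static Outcome save_state_fsync_only_critical(Step failing_step)
{
    Outcome r;

    r = run_step(STEP_WRITE, failing_step);
    if (r != OUTCOME_OK)
        return r;

    START_CRIT_SECTION();
    r = run_step(STEP_FSYNC, failing_step);
    END_CRIT_SECTION();
    if (r != OUTCOME_OK)
        return r;

    return run_step(STEP_RENAME, failing_step);
}
```

A failure at STEP_WRITE or STEP_RENAME comes back as OUTCOME_ERROR, while a failure at STEP_FSYNC becomes OUTCOME_PANIC, even though the operation is equally incomplete in all three cases.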

regards, tom lane
