Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Keith Fiske <keith(at)omniti(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, levertond(at)googlemail(dot)com, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process
Date: 2014-06-19 23:30:59
Message-ID: 20140619233059.GF16260@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2014-06-06 18:21:45 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-06-06 18:03:53 -0400, Tom Lane wrote:
> >> The point here seems to be that lazy_vacuum_page does the visibility map
> >> ops inside its own critical section. Why? Setting a visibility bit
> >> doesn't seem like it's critical. Why can't we just move the
> >> END_CRIT_SECTION() to before the PageIsAllVisible test?
>
> > Yea, that's what I am proposing upthread. If we move the visibility
> > tests out of the critical section this will get rid of the original
> > report as well.
>
> I went trolling for other critical sections ...
>
> lazy_scan_heap has same disease, but looks like it can be fixed same way.

Hm. I'm writing a fix for this issue right now and went to look into
lazy_scan_heap() - but I'm not seeing a problem atm. Do you mean that
bit:
START_CRIT_SECTION();

/* mark buffer dirty before writing a WAL record */
MarkBufferDirty(buf);

/*
* It's possible that another backend has extended the heap,
* initialized the page, and then failed to WAL-log the page
* due to an ERROR. Since heap extension is not WAL-logged,
* recovery might try to replay our record setting the page
* all-visible and find that the page isn't initialized, which
* will cause a PANIC. To prevent that, check whether the
* page has been previously WAL-logged, and if not, do that
* now.
*/
if (RelationNeedsWAL(onerel) &&
PageGetLSN(page) == InvalidXLogRecPtr)
log_newpage_buffer(buf, true);

PageSetAllVisible(page);
visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId);
END_CRIT_SECTION();
If so, I think that should actually be safe - visibilitymap_set()
shouldn't ever do IO - that's why it gets the vmbuffer argument passed
in.
Now, afaics there isn't really any need for the critical section in this
case but the assertion in log_newpage_buffer(). So, afaics, we could
just end the do the END_CRIT_SECTION() two lines earlier, to keep
it as short as possible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2014-06-19 23:59:57 Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process
Previous Message Alvaro Herrera 2014-06-19 22:12:41 Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts