Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-06 14:35:42
Message-ID: 52CABF3E.2050004@vmware.com
Lists: pgsql-bugs pgsql-hackers

On 01/06/2014 03:48 PM, Andres Freund wrote:
> Hi,
>
> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
>> page 14833 of relation base/16436/3321003988 is uninitialized
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
>> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
>> lastBlockVacuumed 0
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL
>> contains references to invalid pages
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
>> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
>> lastBlockVacuumed 0
>> 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: startup
>> process (PID 19938) was terminated by signal 6: Aborted
>> 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG:
>> terminating any other active server processes
>
> There just was another case of this reported on IRC by MatheusOl and for
> some reason in his case I noticed the pertinent details and it quickly
> clicked:
> * page 14833 is the one with the error
> * we're actually vacuuming page 38538
> * lastBlockVacuumed is 0
>
> In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed
> and the page vacuumed and acquire a cleanup lock on each of them. But
> there isn't any guarantee that the intermediate pages are valid, filled
> pages, afaics.

Hmm. So the problem arises if there's an uninitialized page in the
middle of the b-tree relation for some reason. It's unusual for an
uninitialized page to be left in the middle of the relation, but it's
certainly possible, if e.g. you crash just after extending the relation.
In a heap, vacuum will initialize such pages and emit a WARNING like
"page %u is uninitialized --- fixing", but we don't do that for b-tree.

> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.

That'd be horrendously dangerous. It would silently zap any page with
any error on it. But we could add a new ReadBufferMode that returns
InvalidBuffer on error, without zeroing the page.

- Heikki
