What's the condition of bug "PANIC: WAL contains references to invalid pages"?

From: "MauMau" <maumau307(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: What's the condition of bug "PANIC: WAL contains references to invalid pages"?
Date: 2014-01-15 13:12:07
Message-ID: 1C1E6786B6704DD5B39B52CF244045A2@maumau
Lists: pgsql-hackers

Hello,

Please tell me a bit about the following bug, which has just been fixed. I
hope this is exactly the problem that has been troubling us for a year.

Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
http://www.postgresql.org/message-id/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com

I've read the discussion, but I'm wondering under what conditions this
failure happens. My understanding is that the following conditions need to
hold. Are there any other conditions?

* The database server crashes while a btree index is being extended (by page
split).
* Hot standby is used.
* The standby is rebuilt and started.

When I last investigated the bug, the user was doing repeated failover
testing --- stop the master by running "pg_ctl stop -mi" while some
application was performing database updates, promote the standby, rebuild
the standby with pg_basebackup, and start the new standby. In one of those
iterations, the newly rebuilt standby crashed with "WAL contains references
to invalid pages". This seems to match the above mail thread.
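
For reference, one iteration of that test can be sketched roughly as below.
This is only an illustration: the directory paths and host name are
placeholders, not the user's actual setup, and the script only prints the
commands instead of running them. Emptying the old data directory and
creating recovery.conf before the last step are not shown.

```python
import shlex


def failover_iteration(master_dir, standby_dir, new_master_host):
    """Return the commands for one failover-test iteration, in order.

    All arguments are hypothetical placeholders for illustration.
    """
    return [
        # 1. Stop the master in immediate mode (same as "pg_ctl stop -mi")
        #    while the application is still updating the database.
        ["pg_ctl", "stop", "-D", master_dir, "-m", "immediate"],
        # 2. Promote the standby so that it becomes the new master.
        ["pg_ctl", "promote", "-D", standby_dir],
        # 3. Rebuild the old master as a standby from the new master.
        #    Assumes master_dir has been emptied beforehand.
        ["pg_basebackup", "-h", new_master_host, "-D", master_dir],
        # 4. Start the newly rebuilt standby.
        #    Assumes recovery.conf has been placed in master_dir.
        ["pg_ctl", "start", "-D", master_dir],
    ]


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for cmd in failover_iteration("/path/to/old_master",
                                  "/path/to/standby",
                                  "new-master-host"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```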

However, I don't understand why btree_xlog_vacuum() encountered an all-zero
page. How did the all-zero page appear on the standby? Was it transferred
from the master by pg_basebackup? FYI, the server log didn't contain any
messages related to disk full, nor any ERROR messages.

Regards
MauMau
