Re: BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery

From: Maciek Sakrejda <maciek(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery
Date: 2013-02-24 00:16:39
Message-ID: CAKwe89DvzPSFECrwkrbbgpr0JwZUhSfRxas6Z0khRnw=zDSbAw@mail.gmail.com
Lists: pgsql-bugs

On Thu, Feb 21, 2013 at 4:04 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> I'd like to see the contents of the WAL, starting from the last
> checkpoint, up to the point where failover happened. In particular, any
> actions on the relation base/16385/16430, which caused the error.
> pg_controldata output on the base backup would also be interesting, as well as
> the contents of the backup_label file.
>
> How long did the standby run between the base backup and the failover? How
> many WAL segments?
>
> One more thing you could try to narrow down the error: restore from the
> base backup, and let it run up to the point of failover, but shut it down
> just before the failover with "pg_ctl stop -m fast". That should create a
> restartpoint, at the latest checkpoint record. Then restart, and perform
> failover. If it still throws the same error, we know that the WAL record
> that touched the page that doesn't exist was after the last checkpoint.
>
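
For reference, the narrowing-down procedure suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: the data directory path is a placeholder, the base backup is assumed to be already restored there with recovery configured, and "pg_ctl promote" is assumed as the failover mechanism (a trigger file would work equally well). The `run` helper and the `DRY_RUN` switch are hypothetical conveniences, not part of PostgreSQL; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only. PGDATA is a placeholder for the restored base backup directory.
PGDATA="${PGDATA:-/path/to/restored/base/backup}"
# Hypothetical safety switch: 1 prints the commands, 0 actually runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Start the standby and let it replay WAL up to just before the failover
#    point (deciding when that point is reached is a manual step here).
run pg_ctl -D "$PGDATA" start

# 2. Fast shutdown, which should create a restartpoint at the latest
#    checkpoint record.
run pg_ctl -D "$PGDATA" stop -m fast

# 3. Capture the control file contents for the report.
run pg_controldata "$PGDATA"

# 4. Restart, then perform the failover.
run pg_ctl -D "$PGDATA" start
run pg_ctl -D "$PGDATA" promote
```

If the PANIC still appears after step 4, the offending WAL record came after the last checkpoint, as described in the quoted text.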

Unfortunately, it looks like we lost the bad WAL segments and the necessary
base backup due to our archiving mechanism. We don't yet have a principled
way of saving systems for forensics. I thought I had manually accounted for
everything to keep this "on ice," but I missed a step and the system was
archived. I apologize. I'll see if I can add something so that we can better
support this.

For what it's worth, the failover was done at 2013-02-14 23:55:44 +0000 and
the base backup used was dated 2013-02-15 00:49:22 +0000.

I'll follow up in case we run into this again.
