Re: corrupt pages detected by enabling checksums

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: corrupt pages detected by enabling checksums
Date: 2013-05-09 21:22:34
Message-ID: CA+U5nMLCQcfzjZ1vXG6tEma7VCK7XLcz+rb-c9W=2jjLW_mEQg@mail.gmail.com
Lists: pgsql-hackers

On 9 May 2013 20:28, Jim Nasby <jim(at)nasby(dot)net> wrote:

>> Unfortunately, it seems that doing any kind of validation to determine
>> that we have a valid end-of-the-WAL inherently requires some kind of
>> separate durable write somewhere. It would be a tiny amount of data (an
>> LSN and maybe some extra crosscheck information), so I could imagine
>> that would be just fine given the right hardware; but if we just write
>> to disk that would be pretty bad. Ideas welcome.

Not so sure.

If the WAL record's length field is intact, and it probably is, then we
can locate the next WAL record and test whether it is valid too.

If the current WAL record is corrupt and the next WAL record is
corrupt, then we have a problem.

If the current WAL record is corrupt and the next WAL record is in
every way valid, we can potentially continue. But we need to keep
track of accumulated errors to avoid getting into a worse situation.
Obviously, we would need to treat the next WAL record with complete
scepticism, but I have seen cases where only a single WAL record was
corrupt.
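
To make the idea concrete, here is a rough sketch of the look-ahead check. It is an illustration only, not PostgreSQL's WAL format or recovery code: the record layout (a total-length field, a CRC-32 of the payload, then the payload) and the names make_record, read_record, scan and max_errors are all hypothetical, chosen just to show "skip a corrupt record only if the next one is valid, and count how often we do it".

```python
import struct
import zlib

# Hypothetical toy layout: total record length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")


def make_record(payload: bytes) -> bytes:
    """Build one toy record: header followed by payload."""
    return HEADER.pack(HEADER.size + len(payload), zlib.crc32(payload)) + payload


def read_record(buf: bytes, offset: int):
    """Return (payload, next_offset, crc_ok), or None if the length is implausible."""
    if offset + HEADER.size > len(buf):
        return None
    length, crc = HEADER.unpack_from(buf, offset)
    if length < HEADER.size or offset + length > len(buf):
        return None
    payload = buf[offset + HEADER.size:offset + length]
    return payload, offset + length, zlib.crc32(payload) == crc


def scan(buf: bytes, max_errors: int = 1):
    """Replay records, skipping a corrupt one only if its successor is valid."""
    accepted, errors, offset = [], 0, 0
    while True:
        rec = read_record(buf, offset)
        if rec is None:
            break  # no plausible record here: treat as end of log
        payload, next_offset, crc_ok = rec
        if crc_ok:
            accepted.append(payload)
        else:
            nxt = read_record(buf, next_offset)
            if nxt is None or not nxt[2] or errors >= max_errors:
                break  # cannot tell corruption from end of log; stop here
            errors += 1  # accumulated-error budget, as discussed above
        offset = next_offset
    return accepted, errors, offset
```

Note that a corrupt record at the very end of the log has no successor to vouch for it, so this sketch still stops there; it only helps when a damaged record is followed by one that checks out.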

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
