Re: corrupt pages detected by enabling checksums

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: corrupt pages detected by enabling checksums
Date: 2013-05-10 17:23:30
Message-ID: 12258.1368206610@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> A single WAL record can be over 24kB.

<pedantic>
Actually, WAL records can run to megabytes. Consider for example a
commit record for a transaction that dropped thousands of tables ---
there'll be info about each such table in the commit record, to cue
replay to remove those files.
</pedantic>

> If you replayed the following record but not this record you would
> have an inconsistent database. ...
> Or it could be an index insert for that tuple that would result in a
> physically inconsistent database with index pointers that point to
> incorrect tuples. Index scans would return tuples that didn't match
> the index or would miss tuples that should be returned.

Skipping actions such as index page splits would lead to even more fun.

Even in simple cases such as successive inserts and deletions in the
same heap page, failing to replay some of the actions is going to be
disastrous. The *best case* scenario for that is that WAL replay
PANICs when it notices that the action it's trying to replay is
inconsistent with the current state of the page, e.g. it's trying to
insert at a TID that already exists.
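
To make the heap-page case concrete, here is a toy sketch in Python. It is
not PostgreSQL's page format or replay code: the dict-as-page model and the
names ReplayPanic and replay are assumptions made up for illustration. It
shows the two outcomes of skipping one record: the lucky one, where a later
insert collides and replay stops, and the unlucky one, where nothing
notices and a deleted tuple quietly comes back.

```python
class ReplayPanic(Exception):
    """Raised when a record is inconsistent with the current page state."""


def replay(records, skip=()):
    """Apply (op, slot, value) records to a toy page, skipping given indexes."""
    page = {}  # slot number -> tuple value; stands in for one heap page
    for i, (op, slot, value) in enumerate(records):
        if i in skip:
            continue  # pretend this record was unreadable and we carried on
        if op == "insert":
            if slot in page:
                raise ReplayPanic(f"record {i}: slot {slot} already occupied")
            page[slot] = value
        elif op == "delete":
            if slot not in page:
                raise ReplayPanic(f"record {i}: slot {slot} is empty")
            del page[slot]
        else:
            raise ValueError(f"unknown op {op!r}")
    return page


records = [
    ("insert", 1, "a"),
    ("delete", 1, None),
    ("insert", 1, "b"),   # reuses the slot freed by the delete
    ("insert", 2, "c"),
    ("delete", 2, None),
]

full = replay(records)                 # every record applied

try:
    replay(records, skip={1})          # first delete lost: re-insert collides
    outcome = "no panic"
except ReplayPanic as exc:
    outcome = str(exc)

silent = replay(records, skip={4})     # last delete lost: nothing complains
```

With the full log the page ends up holding only slot 1. Skipping record 1
trips the consistency check at record 2, which is the best case. Skipping
record 4 raises nothing and leaves slot 2 populated with a tuple that
should be gone.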

IMO we can't proceed past a broken WAL record. The actually useful
suggestion upthread was that we try to notice whether there seem
to be valid WAL records past the broken one, so that we could warn
the DBA that some commits might have been lost. I don't think we
can do much in the way of automatic data recovery, but we could give
the DBA a chance to do forensics rather than blindly starting up (and
promptly overwriting all the evidence).
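
A rough sketch of that "look but don't replay" idea, again in Python and
again a toy: the framing used here (a 4-byte length plus a CRC-32 of the
payload) and the names encode, read_record and recover are assumptions for
illustration, not the real WAL record layout or any existing PostgreSQL
function. Replay stops at the first record that fails validation; the
remainder of the log is only probed for records that still validate, so
that a warning can be raised.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # toy framing: payload length, CRC-32 of payload


def encode(payload: bytes) -> bytes:
    """Frame one payload as header + payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_record(log: bytes, pos: int):
    """Return (payload, next_pos) if a valid record starts at pos, else None."""
    if pos + HEADER.size > len(log):
        return None
    length, crc = HEADER.unpack_from(log, pos)
    end = pos + HEADER.size + length
    if length == 0 or end > len(log):
        return None  # zero length is treated as end of log in this toy
    payload = log[pos + HEADER.size:end]
    if zlib.crc32(payload) != crc:
        return None
    return payload, end


def recover(log: bytes):
    """Replay up to the first bad record, then only inspect what follows."""
    replayed, pos = [], 0
    while True:
        rec = read_record(log, pos)
        if rec is None:
            break
        replayed.append(rec[0])
        pos = rec[1]
    stop = pos
    # Forensic probe: these records are reported, never applied.
    suspicious, probe = [], stop + 1
    while probe < len(log):
        rec = read_record(log, probe)
        if rec is None:
            probe += 1
        else:
            suspicious.append(rec[0])
            probe = rec[1]
    return replayed, stop, suspicious


records = [b"begin", b"insert row 1", b"commit 100",
           b"insert row 2", b"commit 101"]
log = bytearray(b"".join(encode(r) for r in records))
broken_at = len(encode(records[0])) + len(encode(records[1]))
log[broken_at + HEADER.size] ^= 0xFF  # damage the third record's payload

replayed, stop, suspicious = recover(bytes(log))
if suspicious:
    print(f"WARNING: replay stopped at offset {stop}, but {len(suspicious)} "
          "valid-looking record(s) follow; commits may have been lost")
```

The byte-by-byte probe is the crude part. A 32-bit checksum can match by
accident, and a real implementation would have to cope with stale data
from recycled segments, so a hit here is grounds for a warning and a look
by a human, not for replaying anything.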

regards, tom lane
