Re: corrupt pages detected by enabling checksums

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: corrupt pages detected by enabling checksums
Date: 2013-04-05 23:39:37
Message-ID: 1365205177.7580.3183.camel@sussancws0025
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2013-04-05 at 10:34 +0200, Florian Pflug wrote:
> Maybe we could scan forward to check whether a corrupted WAL record is
> followed by one or more valid ones with sensible LSNs. If it is,
> chances are high that we haven't actually hit the end of the WAL. In
> that case, we could either log a warning, or (better, probably) abort
> crash recovery.

+1.

> Corruption of fields which we require to scan past the record would
> cause false negatives, i.e. no trigger an error even though we do
> abort recovery mid-way through. There's a risk of false positives too,
> but they require quite specific orderings of writes and thus seem
> rather unlikely. (AFAICS, the OS would have to write some parts of
> record N followed by the whole of record N+1 and then crash to cause a
> false positive).

Does the xlp_pageaddr help solve this?

Also, we'd need to be a little careful when written-but-not-flushed WAL
data makes it to disk, which could cause a false positive and may be a
fairly common case.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2013-04-06 00:09:42 Re: Drastic performance loss in assert-enabled build in HEAD
Previous Message Jeff Davis 2013-04-05 23:29:47 Re: corrupt pages detected by enabling checksums