Re: corrupt pages detected by enabling checksums

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: corrupt pages detected by enabling checksums
Date: 2013-05-09 22:18:47
Message-ID: 1368137927.24407.85.camel@jdavis
Lists: pgsql-hackers

On Thu, 2013-05-09 at 14:28 -0500, Jim Nasby wrote:
> What about moving some critical data from the beginning of the WAL
> record to the end? That would make it easier to detect that we don't
> have a complete record. It wouldn't necessarily replace the CRC
> though, so maybe that's not good enough.
>
> Actually, what if we actually *duplicated* some of the same WAL header
> info at the end of the record? Given a reasonable amount of data that
> would damn-near ensure that a torn record was detected, because the
> odds of having the exact same sequence of random bytes would be so
> low. Potentially even just duplicating the LSN would suffice.
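
To make the trailer idea quoted above concrete, here is a minimal sketch in C. The record layout (RecHeader, RecTrailer) and the helpers build_record() and trailer_matches() are hypothetical, assumed only for illustration; they are not PostgreSQL's actual XLogRecord format or xlog code. The writer repeats the header's LSN as the last bytes of the record, and the reader treats a mismatch as a torn record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL record layout for illustration only; this is not
 * PostgreSQL's actual XLogRecord format. */
typedef struct {
    uint64_t lsn;        /* position where this record starts */
    uint32_t total_len;  /* header + payload + trailer, in bytes */
} RecHeader;

typedef struct {
    uint64_t lsn;        /* duplicate of the header's LSN, at the very end */
} RecTrailer;

/* Serialize header, payload and trailer into buf (capacity cap).
 * Returns the total record length. */
static size_t build_record(unsigned char *buf, size_t cap, uint64_t lsn,
                           const void *payload, uint32_t payload_len)
{
    size_t total = sizeof(RecHeader) + payload_len + sizeof(RecTrailer);
    assert(total <= cap);

    RecHeader h = { .lsn = lsn, .total_len = (uint32_t) total };
    RecTrailer t = { .lsn = lsn };

    memcpy(buf, &h, sizeof h);
    memcpy(buf + sizeof h, payload, payload_len);
    memcpy(buf + sizeof h + payload_len, &t, sizeof t);
    return total;
}

/* Returns true if the trailer repeats the header's LSN, i.e. the tail of
 * the record appears to have reached disk. avail is the number of bytes
 * actually read back. */
static bool trailer_matches(const unsigned char *buf, size_t avail)
{
    RecHeader h;
    RecTrailer t;

    if (avail < sizeof h)
        return false;
    memcpy(&h, buf, sizeof h);
    if (h.total_len < sizeof h + sizeof t || h.total_len > avail)
        return false;           /* implausible length, or record cut short */
    memcpy(&t, buf + h.total_len - sizeof t, sizeof t);
    return t.lsn == h.lsn;
}
```

For example, building a record and reading it back intact makes trailer_matches() return true; zeroing the trailer bytes to simulate a tail that never reached disk makes it return false.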

I think both of these ideas have some false positives and false
negatives.

If the corruption happens at the record boundary and wipes out the
special information at the end of the record, then recovery would
conclude that the record was simply not fully flushed, and we're in the
same position as today.

If the WAL record is large, and somehow the beginning and the end get
written to disk but not the middle, then it will look like corruption,
when really the WAL was just not completely flushed. This seems pretty
unlikely, but not impossible.
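
The two failure modes above can be written down as a small decision table. This is a hypothetical sketch, assuming recovery has both a duplicated-trailer check and the existing CRC; classify() and RecVerdict are invented names for illustration, not PostgreSQL code.

```c
#include <stdbool.h>

/* What recovery would conclude about the last WAL record it reads.
 * HYPOTHETICAL: assumes a trailer duplicating header info, plus the CRC. */
typedef enum {
    REC_VALID,        /* trailer and CRC both check out */
    REC_NOT_FLUSHED,  /* treat as end of WAL: stop replay quietly */
    REC_CORRUPT       /* report corruption */
} RecVerdict;

static RecVerdict classify(bool trailer_ok, bool crc_ok)
{
    if (!trailer_ok)
        return REC_NOT_FLUSHED;  /* tail missing: assume an unflushed write */
    if (!crc_ok)
        return REC_CORRUPT;      /* tail present but contents do not match */
    return REC_VALID;
}
```

classify(false, false) is the first case: corruption that wipes the trailer is reported as REC_NOT_FLUSHED, exactly as today. classify(true, false) is the second: a large record whose beginning and end reached disk but whose middle did not is reported as REC_CORRUPT, even though the WAL was merely not completely flushed.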

That being said, I like the idea of introducing some extra checks if a
perfect solution is not possible.

> On the separate write idea, if that could be controlled by a GUC I
> think it'd be worth doing. Anyone that needs to worry about this
> corner case probably has hardware that would support that.

It sounds pretty easy to do that naively. I'm just worried that the
performance will be so bad for so many users that it's not a very
reasonable choice.

Today, it would probably make more sense to just use sync rep. If the
master's WAL is corrupt, and it starts up too early, then that should be
obvious when you try to reconnect streaming replication. I haven't tried
it, but I'm assuming that it gives a useful error message.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
