Re: Enabling Checksums

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Enabling Checksums
Date: 2013-03-08 06:35:45
Message-ID: CAAZKuFaby3ShuAxq7wiDCPUHoiEYjkO73LDufwdhpQ2MgTbSFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 7, 2013 at 7:31 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote:
>> Putting aside the not-so-rosy predictions seen elsewhere in this
>> thread about the availability of a high performance, reliable
>> checksumming file system available on common platforms, I'd like to
>> express what benefit this feature will have to me:
>>
>> Corruption has easily occupied more than one person-month of time last
>> year for us. This year to date I've burned two weeks, although
>> admittedly this was probably the result of statistical clustering.
>> Other colleagues of mine have probably put in a week or two in
>> aggregate in this year to date. The ability to quickly, accurately,
>> and maybe at some later date proactively finding good backups to run
>> WAL recovery from is one of the biggest strides we can make in the
>> operation of Postgres. The especially ugly cases are where the page
>> header is not corrupt, so full page images can carry along malformed
>> tuples...basically, when the corruption works its way into the WAL,
>> we're in much worse shape. Checksums would hopefully prevent this
>> case, converting them into corrupt pages that will not be modified.
>>
>> It would be better yet if I could write tools to find the last-good
>> version of pages, and so I think tight integration with Postgres will
>> see a lot of benefits that would be quite difficult and non-portable
>> when relying on file system checksumming.
>
> I see Heroku has corruption experience, and I know Jim Nasby has
> struggled with corruption in the past.

More than a little: it has entered the realm of the routine, and
happens frequently enough that it has become worthwhile to start
looking for patterns.

Our methods so far rely heavily on our archives to deal with it: it's
time consuming but the 'simple' case of replaying WAL from some
earlier base backup resulting in a non-corrupt database is easily the
most common. Interestingly, the WAL has never failed to recover
halfway through because of CRC failures while treating corruption[0].
We know this fairly convincingly because we constantly sample txid and
wal positions while checking the database, as we typically do about
every thirty seconds.

I think this unreasonable effectiveness of this strategy of old backup
and WAL replay might suggest that database checksums would prove
useful. In my mind, the ways this formula could work so well if the
bug was RAM or CPU based is slimmed considerably.

[0] I have seen -- very rarely -- substantial periods of severe WAL
corruption (files are not even remotely the correct size) propagated
to the archives in the case of disaster recovery where the machine met
its end because of the WAL disk being marked as dead.

--
fdr

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message KONDO Mitsumasa 2013-03-08 06:56:14 Re: 9.2.3 crashes during archive recovery
Previous Message Pavel Stehule 2013-03-08 06:07:19 Re: Enabling Checksums