Re: [GENERAL] Undetected corruption of table files

Lists: pgsql-generalpgsql-hackers
From: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Undetected corruption of table files
Date: 2007-08-23 13:46:07
Message-ID: D960CB61B694CF459DCFB4B0128514C218D95B@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

I am slightly worried that corruption of data files may
remain undetected in PostgreSQL.

As an experiment, I created a simple table with a primary key
index and inserted a couple of rows. The corresponding data file
is 1 page = 8K long.

Now when I stop the server, zero out the data file with
dd if=/dev/zero of=45810 bs=8192 count=1
and start the server again, the table is empty when I SELECT
from it and no errors are reported.

Only a VACUUM gives me the
WARNING: relation "test" page 0 is uninitialized --- fixing
and the file is truncated to length zero.

The next thing I tried is to randomly scribble into the 8K data
file with a hex editor at different locations.

Some of these actions provoked error messages ranging from
ERROR: invalid page header in block 0 of relation "test"
over
ERROR: could not access status of transaction 1954047348
to
LOG: server process (PID 28149) was terminated by signal 11

Frequently, though, the result was that some of the rows were
"missing", i.e. there was no error message when I SELECTed
from the table, but some of the rows were gone.

I got no errors or warnings from VACUUM either.

As far as I know there is no tool to verify the integrity of
a PostgreSQL table.

- Shouldn't there be an error, some kind of 'missing magic
number' or similar, when a table file consists of only
zeros?

- Wouldn't it be desirable to have some means to verify the
integrity of a table file or a whole database?

Yours,
Laurenz Albe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Undetected corruption of table files
Date: 2007-08-23 17:08:01
Message-ID: 8963.1187888881@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

"Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at> writes:
> - Shouldn't there be an error, some kind of 'missing magic
> number' or similar, when a table file consists of only
> zeros?

The particular case of an all-zeroes page is specifically allowed,
and has to be because it's a valid transient state in various
scenarios.

> - Wouldn't it be desirable to have some means to verify the
> integrity of a table file or a whole database?

SELECT * usually does reasonably well at that.

regards, tom lane


From: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
To: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Undetected corruption of table files
Date: 2007-08-24 07:57:41
Message-ID: D960CB61B694CF459DCFB4B0128514C218DA78@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
>> - Shouldn't there be an error, some kind of 'missing magic
>> number' or similar, when a table file consists of only
>> zeros?
>
> The particular case of an all-zeroes page is specifically allowed,
> and has to be because it's a valid transient state in various
> scenarios.

I see, that was a pathological case.

>> - Wouldn't it be desirable to have some means to verify the
>> integrity of a table file or a whole database?
>
> SELECT * usually does reasonably well at that.

Would it be an option to have a checksum somewhere in each
data block that is verified upon read?

Would this be a lot of work to add? I'd be willing to try it
if it is desirable and feasible.

Yours,
Laurenz Albe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Undetected corruption of table files
Date: 2007-08-24 14:59:36
Message-ID: 29791.1187967576@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

"Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at> writes:
> Would it be an option to have a checksum somewhere in each
> data block that is verified upon read?

That's been proposed before and rejected before. See the archives ...

regards, tom lane


From: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
To: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Undetected corruption of table files
Date: 2007-08-27 08:06:38
Message-ID: D960CB61B694CF459DCFB4B0128514C222079E@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
>> Would it be an option to have a checksum somewhere in each
>> data block that is verified upon read?
>
> That's been proposed before and rejected before. See the archives ...

I searched for "checksum" and couldn't find it. Could someone
give me a pointer? I'm not talking about WAL files here.

Thanks,
Laurenz Albe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undetected corruption of table files
Date: 2007-08-27 15:16:55
Message-ID: 617.1188227815@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

"Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at> writes:
> Tom Lane wrote:
>>> Would it be an option to have a checksum somewhere in each
>>> data block that is verified upon read?

>> That's been proposed before and rejected before. See the archives ...

> I searched for "checksum" and couldn't find it. Could someone
> give me a pointer? I'm not talking about WAL files here.

"CRC" maybe? Also, make sure your search goes all the way back; I think
the prior discussions were around the same time WAL was initially put
in, and/or when we dropped the WAL CRC width from 64 to 32 bits.
The very measurable overhead of WAL CRCs are the main thing that's
discouraged us from having page CRCs. (Well, that and the lack of
evidence that they'd actually gain anything.)

regards, tom lane


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Undetected corruption of table files
Date: 2007-08-27 15:26:40
Message-ID: 36e682920708270826s2cac1fer5e79738d3dafaf46@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 8/27/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> that and the lack of evidence that they'd actually gain anything

I find it somewhat ironic that PostgreSQL strives to be fairly
non-corruptable, yet has no way to detect a corrupted page. The only
reason for not having CRCs is because it will slow down performance...
which is exactly opposite of conventional PostgreSQL wisdom (no
performance trade-off for durability).

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris(at)enterprisedb(dot)com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/


From: "Trevor Talbot" <quension(at)gmail(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Undetected corruption of table files
Date: 2007-08-27 15:48:19
Message-ID: 90bce5730708270848p3d8ae18y56325c5a269c0566@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 8/27/07, Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
> On 8/27/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > that and the lack of evidence that they'd actually gain anything
>
> I find it somewhat ironic that PostgreSQL strives to be fairly
> non-corruptable, yet has no way to detect a corrupted page. The only
> reason for not having CRCs is because it will slow down performance...
> which is exactly opposite of conventional PostgreSQL wisdom (no
> performance trade-off for durability).

But how does detecting a corrupted data page gain you any durability?
All it means is that the platform underneath screwed up, and you've
already *lost* durability. What do you do then?

It seems like the same idea as an application trying to detect RAM errors.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>, <pgsql-general(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Undetected corruption of table files
Date: 2007-08-27 15:50:06
Message-ID: 87lkbxf041.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at> writes:
>> Tom Lane wrote:
>>>> Would it be an option to have a checksum somewhere in each
>>>> data block that is verified upon read?
>
>>> That's been proposed before and rejected before. See the archives ...
>
>> I searched for "checksum" and couldn't find it. Could someone
>> give me a pointer? I'm not talking about WAL files here.
>
> "CRC" maybe? Also, make sure your search goes all the way back; I think
> the prior discussions were around the same time WAL was initially put
> in, and/or when we dropped the WAL CRC width from 64 to 32 bits.
> The very measurable overhead of WAL CRCs are the main thing that's
> discouraged us from having page CRCs. (Well, that and the lack of
> evidence that they'd actually gain anything.)

I thought we determined the reason WAL CRCs are expensive is because we have
to checksum each WAL record individually. I recall the last time this came up
I ran some microbenchmarks and found that the cost to CRC an entire 8k block
was on the order of tens of microseconds.

The last time it came up was in the context of allowing turning off
full_page_writes but offering a guarantee that torn pages would be detected on
recovery and no later. I was a proponent of using writev to embed bytes in
each 512 byte block and Jonah said it would be no faster than a CRC (and
obviously considerably more complicated). My benchmarks showed that Jonah was
right and the CRC was cheaper than a the added cost of using writev.

I do agree the benefits of having a CRC are overstated. Most times corruption
is caused by bad memory and a CRC will happily checksum the corrupted memory
just fine. A checksum is no guarantee. But I've also seen data corruption
caused by bad memory in an i/o controller, for example. There are always going
to be cases where it could help.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Undetected corruption of table files
Date: 2007-08-27 15:58:24
Message-ID: 46D2F4A0.3050708@magproductions.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Jonah H. Harris wrote:
> On 8/27/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> that and the lack of evidence that they'd actually gain anything
>
> I find it somewhat ironic that PostgreSQL strives to be fairly
> non-corruptable, yet has no way to detect a corrupted page. The only
> reason for not having CRCs is because it will slow down performance...
> which is exactly opposite of conventional PostgreSQL wisdom (no
> performance trade-off for durability).

Why? I can't say I speak for the developers, but I think the reason is
that data corruption can (with the very rare exception of undetected
programming errors) only be caused by hardware problems.

If you have a "proper" production database server, your memory has error
checking, and your RAID controller has something of the kind as well. If
not you would probably be running the database on a filesystem that has
reliable integrity verification mechanisms.

In the worst case (all the above mechanisms fail), you have backups.

IMHO the problem is covered quite adequately. The operating system and
the hardware cover for the database, as they should; it's _their_ job.

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Trevor Talbot" <quension(at)gmail(dot)com>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Undetected corruption of table files
Date: 2007-08-27 16:00:34
Message-ID: 1433.1188230434@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

"Trevor Talbot" <quension(at)gmail(dot)com> writes:
> On 8/27/07, Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
>> I find it somewhat ironic that PostgreSQL strives to be fairly
>> non-corruptable, yet has no way to detect a corrupted page.

> But how does detecting a corrupted data page gain you any durability?
> All it means is that the platform underneath screwed up, and you've
> already *lost* durability. What do you do then?

Indeed. In fact, the most likely implementation of this (refuse to do
anything with a page with a bad CRC) would be a net loss from that
standpoint, because you couldn't get *any* data out of a page, even if
only part of it had been zapped.

regards, tom lane


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Trevor Talbot" <quension(at)gmail(dot)com>, "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Undetected corruption of table files
Date: 2007-08-27 16:08:17
Message-ID: 36e682920708270908y73bf818xf412ca31f57dd2b1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 8/27/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Indeed. In fact, the most likely implementation of this (refuse to do
> anything with a page with a bad CRC) would be a net loss from that
> standpoint, because you couldn't get *any* data out of a page, even if
> only part of it had been zapped.

At least you would know it was corrupted, instead of getting funky
errors and/or crashes.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris(at)enterprisedb(dot)com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/


From: Decibel! <decibel(at)decibel(dot)org>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Trevor Talbot <quension(at)gmail(dot)com>, Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Undetected corruption of table files
Date: 2007-08-27 18:55:47
Message-ID: 20070827185546.GH54309@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Aug 27, 2007 at 12:08:17PM -0400, Jonah H. Harris wrote:
> On 8/27/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Indeed. In fact, the most likely implementation of this (refuse to do
> > anything with a page with a bad CRC) would be a net loss from that
> > standpoint, because you couldn't get *any* data out of a page, even if
> > only part of it had been zapped.

I think it'd be perfectly reasonable to have a mode where you could
bypass the check so that you could see what was in the corrupted page
(as well as deleting everything on the page so that you could "fix" the
corruption). Obviously, this should be restricted to superusers.

> At least you would know it was corrupted, instead of getting funky
> errors and/or crashes.

Or worse, getting what appears to be perfectly valid data, but isn't.
--
Decibel!, aka Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
To: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] Undetected corruption of table files
Date: 2007-08-28 08:14:17
Message-ID: D960CB61B694CF459DCFB4B0128514C2220A7F@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
>>>> Would it be an option to have a checksum somewhere in each
>>>> data block that is verified upon read?
>
>>> That's been proposed before and rejected before. See the
>>> archives ...
>
> I think
> the prior discussions were around the same time WAL was initially put
> in, and/or when we dropped the WAL CRC width from 64 to 32 bits.
> The very measurable overhead of WAL CRCs are the main thing that's
> discouraged us from having page CRCs. (Well, that and the lack of
> evidence that they'd actually gain anything.)

Hmmm - silence me if I'm misunderstanding this, but the most
conclusive hit I had was a mail by you:

http://archives.postgresql.org/pgsql-general/2001-10/msg01142.php

which only got affirmative feedback.

Also, there's a TODO entry:

- Add optional CRC checksum to heap and index pages

This seems to me to be exactly what I wish for...

To the best of my knowledge, the most expensive thing in databases
today is disk I/O, because CPU speed is increasing faster. Although
calculating a checksum upon writing a block to disk will
certainly incur CPU overhead, what may have seemed too expensive
a couple of years ago could be acceptable today.

I understand the argument that it's the task of hardware and
OS to see that data don't get corrupted, but it would improve
PostgreSQL's reliabitity if it can detect such errors and at
least issue a warning.
This wouldn't fix the underlying problem, but it would tell you
to not overwrite last week's backup tape...

Not all databases are on enterprise scale storage systems, and
there's also the small possibility of PostgreSQL bugs that could
be detected that way.

Yours,
Laurenz Albe


From: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To: ("Trevor Talbot" <quension(at)gmail(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>)
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Undetected corruption of table files
Date: 2007-08-28 14:33:27
Message-ID: 200708281436.l7SEajp6077914@smtp6.jaring.my
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

At 11:48 PM 8/27/2007, Trevor Talbot wrote:
>On 8/27/07, Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
> > On 8/27/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > that and the lack of evidence that they'd actually gain anything
> >
> > I find it somewhat ironic that PostgreSQL strives to be fairly
> > non-corruptable, yet has no way to detect a corrupted page. The only
> > reason for not having CRCs is because it will slow down performance...
> > which is exactly opposite of conventional PostgreSQL wisdom (no
> > performance trade-off for durability).
>
>But how does detecting a corrupted data page gain you any durability?
>All it means is that the platform underneath screwed up, and you've
>already *lost* durability. What do you do then?

The benefit I see is you get to change the platform underneath
earlier than later.

Whether that's worth it or not I don't know - real world stats/info
would be good.

Even my home PATA drives tend to grumble about stuff first before
they fail, so it might not be worthwhile doing the extra work.

Regards,
Link.


From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Alban Hertroys <alban(at)magproductions(dot)nl>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Undetected corruption of table files
Date: 2007-08-29 06:59:55
Message-ID: 82ejhmddw4.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

* Alban Hertroys:

> If you have a "proper" production database server, your memory has
> error checking, and your RAID controller has something of the kind
> as well.

To my knowledge, no readily available controller performs validation
on reads (not even for RAID-1 or RAID-10, where it would be pretty
straightforward).

Something like an Adler32 checksum (not a full CRC) on each page might
be helpful. However, what I'd really like to see is something that
catches missed writes, but this is very difficult to implement AFAICT.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>
Cc: Tom Lane *EXTERN* <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Undetected corruption of table files
Date: 2007-08-31 12:08:43
Message-ID: 46D804CB.5020807@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 8/28/2007 4:14 AM, Albe Laurenz wrote:
> Not all databases are on enterprise scale storage systems, and
> there's also the small possibility of PostgreSQL bugs that could
> be detected that way.

Computing a checksum just before writing the block will NOT detect any
faulty memory or Postgres bug that corrupted the block. You will have a
perfectly fine checksum over the corrupted data.

A checksum only detects corruptions that happen between write and read.
Most data corruptions that happen during that time however lead to some
sort of read error reported by the disk.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
To: "Jan Wieck *EXTERN*" <JanWieck(at)Yahoo(dot)com>
Cc: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] Undetected corruption of table files
Date: 2007-08-31 12:34:09
Message-ID: D960CB61B694CF459DCFB4B0128514C2221576@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Jan Wieck wrote:
> Computing a checksum just before writing the block will NOT detect any

> faulty memory or Postgres bug that corrupted the block. You will have
a
> perfectly fine checksum over the corrupted data.
>
> A checksum only detects corruptions that happen between write and
read.
> Most data corruptions that happen during that time however lead to
some
> sort of read error reported by the disk.

I have thought some more about it, and tend to agree now:
Checksums will only detect disk failure, and that's only
one of the many integrity problems that can happen.
And one that can be reduced to a reasonable degree with good
storage systems.

So the benefit of checksums is not enough to bother.

Yours,
Laurenz Albe


From: Decibel! <decibel(at)decibel(dot)org>
To: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>
Cc: Jan Wieck *EXTERN* <JanWieck(at)Yahoo(dot)com>, Tom Lane *EXTERN* <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Undetected corruption of table files
Date: 2007-08-31 15:49:08
Message-ID: 20070831154908.GW38801@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Aug 31, 2007 at 02:34:09PM +0200, Albe Laurenz wrote:
> I have thought some more about it, and tend to agree now:
> Checksums will only detect disk failure, and that's only
> one of the many integrity problems that can happen.
> And one that can be reduced to a reasonable degree with good
> storage systems.
>
> So the benefit of checksums is not enough to bother.

Uhm... how often do we get people asking about corruption on -admin
alone? 2-3x a month? ISTM it would be very valuable to those folks to
be able to tell them if the corruption occurred between writing a page
out and reading it back in.

Even if we don't care about folks running on suspect hardware, having a
CRC would make it far more reasonable to recommend full_page_writes=off.
I never turn that off and recommend to folks that they don't turn it off
because there's no way to know if it will or has corrupted data.

BTW, a method that would buy additional protection would be to compute
the CRC for a page every time you modify it in such a way that generates
a WAL record, and record that CRC with the WAL record. That would
protect from corruption that happened anytime after the page was
modified, instead of just when smgr went to write it out. How useful
that is I don't know...
--
Decibel!, aka Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Decibel! <decibel(at)decibel(dot)org>
Cc: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Undetected corruption of table files
Date: 2007-08-31 19:11:29
Message-ID: 291.1188587489@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Decibel! <decibel(at)decibel(dot)org> writes:
> Even if we don't care about folks running on suspect hardware, having a
> CRC would make it far more reasonable to recommend full_page_writes=3Doff.

This argument seems ridiculous. Finding out that you have corrupt data
is no substitute for not having corrupt data.

> BTW, a method that would buy additional protection would be to compute
> the CRC for a page every time you modify it in such a way that generates
> a WAL record, and record that CRC with the WAL record. That would
> protect from corruption that happened anytime after the page was
> modified, instead of just when smgr went to write it out. How useful
> that is I don't know...

Two words: hint bits.

regards, tom lane


From: Decibel! <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Undetected corruption of table files
Date: 2007-08-31 21:17:50
Message-ID: 20070831211750.GJ38801@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Aug 31, 2007 at 03:11:29PM -0400, Tom Lane wrote:
> Decibel! <decibel(at)decibel(dot)org> writes:
> > Even if we don't care about folks running on suspect hardware, having a
> > CRC would make it far more reasonable to recommend full_page_writes=3Doff.
>
> This argument seems ridiculous. Finding out that you have corrupt data
> is no substitute for not having corrupt data.

Of course. But how will you discover you have corrupt data if there's no
mechanism to detect it?
--
Decibel!, aka Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)