Re: Enabling Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-03-03 18:24:32
Message-ID: 51339560.8030701@2ndQuadrant.com
Lists: pgsql-hackers

The 16-bit checksum feature seems functional, with two sources of
overhead: some CPU time burned computing checksums as pages enter the
system, and extra WAL volume from having to log hint bit changes.
I'll quantify both of those better in another message.
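
For anyone wanting to reproduce the setup: the comparison is between a
stock build and a cluster initialized with checksums enabled. As a
minimal sketch, assuming the initdb switch from the patch series
(check the attached patches for the exact spelling):

$ initdb -D $PGDATA --data-checksums    # per-cluster, set once at initdb time
$ pg_ctl -D $PGDATA start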

For completeness' sake I've attached the latest versions of the patches
I tested (the same set as my last message), along with the testing
programs and source changes that have been useful for my review. I now
have a test case demonstrating a tricky issue in page header handling
that my gut told me was possible, and that's what I talk about most
here.

= Handling bit errors in page headers =

The thing I've been stuck on is finding a case where turning checksums
on makes data that could otherwise be read completely unavailable after
a single bit of corruption. That seemed to me the biggest risk of this
feature: if checksumming can result in lost data, where before that
data would have been available, just with some potential for error in
it, that's kind of bad. I've now created a program that does exactly
that, with a repeatable shell script test case (check-check.sh).

This builds on the example I gave before, where I corrupt a single bit
of data in pgbench_accounts (the lowest bit of byte 14 in the page),
and a build without checksums then reads that page without complaint:

$ psql -c "select sum(abalance) from pgbench_accounts"
sum
-----
0
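
For reference, the corruption step itself boils down to flipping one
bit of one byte in the on-disk block. Here's a minimal sketch of the
idea; the real logic is in the attached check-check.sh and pg_corrupt,
and the relfilenode path is just the one from my test install:

$ pg_ctl -D $PGDATA stop    # so no cached copy masks the change
$ FILE=$PGDATA/base/16384/16397
$ BYTE=$(dd if=$FILE bs=1 skip=14 count=1 2>/dev/null | od -An -tu1)
$ printf "\\$(printf '%03o' $((BYTE ^ 1)))" | \
    dd of=$FILE bs=1 seek=14 count=1 conv=notrunc
$ pg_ctl -D $PGDATA start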

Corrupting the same bit on a checksums-enabled build catches the problem:

WARNING: page verification failed, calculated checksum 5900 but
expected 9227
ERROR: invalid page header in block 0 of relation base/16384/16397

This is good, because it's exactly the sort of quiet corruption that the
feature is supposed to find. But clearly it's *possible* to still read
all of the data in this page, because the build without checksums does
just that. All of these fail now:

$ psql -c "select sum(abalance) from pgbench_accounts"
WARNING: page verification failed, calculated checksum 5900 but
expected 9227
ERROR: invalid page header in block 0 of relation base/16384/16397

$ psql -c "select * from pgbench_accounts"
WARNING: page verification failed, calculated checksum 5900 but
expected 9227
ERROR: invalid page header in block 0 of relation base/16384/16397

And you get this sort of mess out of pg_dump:

COPY pgbench_accounts (aid, bid, abalance, filler) FROM stdin;
pg_dump: WARNING: page verification failed, calculated checksum 5900
but expected 9227
\.

pg_dump: Dumping the contents of table "pgbench_accounts" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid page header in block
0 of relation base/16384/16397
pg_dump: The command was: COPY public.pgbench_accounts (aid, bid,
abalance, filler) TO stdout;

I think an implicit goal of this feature was to soldier on whenever
it's possible to do so. The case where something in the page header is
corrupted seems the weakest part of that idea. I would still be happy
to enable this feature on a lot of servers, because stopping in the
case of subtle header corruption just means going to another known good
copy of the data, probably a standby server.

I could see some people getting surprised by this change, though. I'm
not sure whether a checksum failure in a page header could be treated
as something that's only WARNed about, rather than always being a hard
failure that leaves the data unavailable (without page inspection
tools, at least). That seems like the main thing that might be
improved in this feature right now.
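
The closest existing knob is zero_damaged_pages, which already turns
the invalid-header case into a WARNING, at the price of zeroing the
whole page rather than reading through it. I haven't tested exactly
how the checksum failure path interacts with it yet, but a
session-level override of that general shape is what I have in mind:

$ psql -c "SET zero_damaged_pages = on; SELECT sum(abalance) FROM pgbench_accounts"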

= Testing issues =

It is surprisingly hard to get a repeatable test program that corrupts a
bit on a data page. If you already have a copy of the page in memory
and you corrupt the copy on disk, the corrupted copy won't be noticed.
And if you happen to trigger a write of that page, the corruption will
quietly be fixed. This is all good, but it's something to be aware of
when writing test code.
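
Any repeatable test here ends up doing a dance shaped roughly like
this (check-check.sh goes through a sequence along these lines):

$ pg_ctl -D $PGDATA stop    # clean stop: dirty pages written out, buffers dropped
  ...corrupt the target block on disk here...
$ pg_ctl -D $PGDATA start   # the next read has to come from disk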

The other thing to watch out for is making sure you're not hitting an
index-only scan anywhere, because then you bypass the heap page you
corrupted.
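
The simple way to take that variable out of the picture is disabling
the optimization for the test session:

$ psql -c "SET enable_indexonlyscan = off; SELECT sum(abalance) FROM pgbench_accounts"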

What I've done is come up with a repeatable test case that shows the
checksum patch finding a single bit of corruption that is missed by a
regular server. The program is named check-check.sh, and a full output
run is attached as check-check.log.

I also added a developer-only debugging patch,
show_block_verifications.patch. This makes every block read spew a
message about what relation it's touching, and proves the checksum
mechanism is being hit each time. The main reason I needed that was to
make sure the pages I expected to be read were actually the ones being
read. When I was accidentally hitting index-only scans, for example, I
could tell because it was touching something from pgbench_accounts_pkey
instead of the pgbench_accounts table data I was corrupting.
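
A lower-tech cross-check that doesn't need the debugging patch is just
looking at the query plan, to confirm it actually visits the table's
heap pages:

$ psql -c "EXPLAIN SELECT sum(abalance) FROM pgbench_accounts"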

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment Content-Type Size
check-check.log text/plain 9.1 KB
check-check.sh application/x-sh 1.5 KB
checksums-20130124.patch text/plain 72.0 KB
checksums-20130224.patch text/plain 63.5 KB
pg_corrupt text/plain 4.3 KB
replace-tli-with-checksums-20130124.patch text/plain 47.1 KB
show_block_verifications.patch text/plain 1.1 KB
