Re: Enabling Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enabling Checksums
Date: 2013-03-19 18:32:31
Message-ID: 5148AF3F.5040801@2ndQuadrant.com
Lists: pgsql-hackers

On 3/8/13 4:40 PM, Greg Stark wrote:
> On Fri, Mar 8, 2013 at 5:46 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> After some examination of the systems involved, we concluded that the
>> issue was the FreeBSD drivers for the new storage, which were unstable
>> and had custom source patches. However, without PostgreSQL checksums,
>> we couldn't *prove* it wasn't PostgreSQL at fault. It ended up taking
>> weeks of testing, most of which was useless, to prove to them they had a
>> driver problem so it could be fixed. If Postgres had had checksums, we
>> could have avoided wasting a couple weeks looking for non-existent
>> PostgreSQL bugs.
>
> How would Postgres checksums have proven that?

It's hard to prove this sort of thing definitively. I see this more as
a source of evidence that can increase confidence that the database is
doing the right thing, most usefully in a replication environment.
Systems that care about data integrity nowadays are running with a WAL
shipping replica of some sort. Right now there's no way to compare the
master and standby copies of the data and figure out which is likely to be
the better copy. In a checksum environment, here's a new troubleshooting
workflow that becomes possible (a toy sketch of the comparison follows the
list):

1) Checksum error happens on the master.
2) The same block is checked on the standby. It has the same stored 16-bit
checksum as the master's copy, but different data, and that checksum matches
its data.
3) The copy of that block on the standby, which was shipped over the
network instead of being stored locally, is probably good.
4) The database must have been consistent when the data was in RAM on
the master.
5) Conclusion: there's probably something wrong at a storage layer
below the database on the master.
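
To make that concrete, here's a toy sketch of the comparison in Python. It
is not PostgreSQL's actual checksum algorithm (I'm just truncating a CRC32
to 16 bits as a stand-in), and the function and variable names are made up
for illustration; it only shows how verifying both copies narrows down where
the corruption happened.

    # Toy illustration of the workflow above -- not the real page checksum.
    import zlib

    PAGE_SIZE = 8192  # PostgreSQL's default block size

    def page_checksum(data):
        # Stand-in 16-bit checksum: a CRC32 truncated to 16 bits.
        return zlib.crc32(data) & 0xFFFF

    def diagnose(master_page, master_sum, standby_page, standby_sum):
        master_ok = page_checksum(master_page) == master_sum
        standby_ok = page_checksum(standby_page) == standby_sum
        if master_ok and standby_ok:
            return "both copies verify; no evidence of corruption"
        if not master_ok and standby_ok and master_sum == standby_sum:
            # Same stored checksum, standby verifies, master doesn't: the
            # page was intact when it went out over WAL shipping, so suspect
            # the storage layer below the database on the master (step 5).
            return "suspect storage below the database on the master"
        if master_ok and not standby_ok:
            return "suspect storage on the standby"
        return "inconclusive; both copies fail verification"

    # A bad driver flips one bit in the master's copy after the checksum
    # was already set.
    good = bytes(PAGE_SIZE)
    bad = bytearray(good)
    bad[100] ^= 0x01
    stored = page_checksum(good)
    print(diagnose(bytes(bad), stored, good, stored))
    # -> suspect storage below the database on the master

In that bit-flip scenario only the standby copy verifies, which is exactly
what shifts suspicion toward the master's storage stack.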

Now, of course this doesn't automatically point the finger correctly for
every possible form of corruption. But this example is a situation I've
seen in the real world, where a bad driver flips a random bit in a block.
If Josh had been able to show his client that the standby server built from
streaming replication was just fine, and that corruption was limited to the
master, that wouldn't *prove* the database isn't the problem. But it would
usefully shift the perception of which faults are likely and unlikely away
from it. Right now, when I see master/standby differences in data blocks,
it's the old problem of telling the true time when you have two clocks.
Having a checksum helps pick the right copy when there is more than one and
one of them has been corrupted by storage-layer issues.

> If I understand the performance issues right, the main problem is the
> extra round trip to the wal log which can require a sync. Is that
> right?

I don't think this changes things such that there is a second fsync per
transaction. That is a worthwhile test workload to add, though. Right now
the tests Jeff and I have run have specifically avoided systems with slow
fsync, because you can't really test the CPU/memory overhead very well if
you're hitting the rotational latency bottleneck.
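
To give a rough sense of the scale involved (illustrative numbers only, not
measurements from the tests Jeff and I ran, and again using a truncated
CRC32 rather than the real page checksum), checksumming an 8 KB page costs
a few microseconds of CPU, while a single fsync on a commodity rotating
disk costs several milliseconds, so an fsync-bound workload can't show the
checksum overhead at all:

    # Back-of-envelope only; CRC32 truncated to 16 bits as a stand-in.
    import time
    import zlib

    PAGE_SIZE = 8192
    PAGES = 100000
    page = bytes(PAGE_SIZE)

    start = time.perf_counter()
    for _ in range(PAGES):
        zlib.crc32(page) & 0xFFFF
    per_page_us = (time.perf_counter() - start) / PAGES * 1e6

    print("~%.1f microseconds per 8 KB page to checksum" % per_page_us)
    print("vs. roughly 2-10 ms per fsync on a rotating disk")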

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
