Re: Enabling Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-03-18 19:40:48
Message-ID: 51476DC0.9000106@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/18/13 10:52 AM, Bruce Momjian wrote:
> With a potential 10-20% overhead, I am unclear who would enable this at
> initdb time.

If you survey people who are running PostgreSQL on "cloud" hardware, be
it Amazon's EC2 or similar options from other vendors, you will find a
high percentage of them would pay quite a bit of performance to make
their storage more reliable. To pick one common measurement for
popularity, a Google search on "ebs corruption" returns 17 million hits.
To quote one of those, Baron Schwartz of Percona talking about MySQL
on EC2:

"BTW, I have seen data corruption on EBS volumes. It’s not clear whether
it was InnoDB’s fault (extremely unlikely IMO), the operating system’s
fault, EBS’s fault, or something else."

http://www.mysqlperformanceblog.com/2011/08/04/mysql-performance-on-ec2ebs-versus-rds/

*That* uncertainty is where a lot of the demand for this feature is
coming from. People deploy into the cloud, their data gets corrupted,
and no one call tell them what/why/how it happened. And that means they
don't even know what to change to make it better. The only people I see
really doing something about this problem all seem years off, and I'm
not sure they are going to help--especially since some of them are
targeting "enterprise" storage rather than the cloud-style installations.

> I assume a user would wait until they suspected corruption to turn it
> on, and because it is only initdb-enabled, they would have to
> dump/reload their cluster. The open question is whether this is a
> usable feature as written, or whether we should wait until 9.4.

The reliability issues of both physical and virtual hardware are so
widely known that many people will deploy with this on as their default
configuration.

If you don't trust your existing data, you can't retroactively check it.
A checksum of an already corrupt block is useless. Accordingly, there
is no use case for converting an installation with real or even
suspected problems to a checksummed one. If you wait until you suspect
corruption to care about checksums, it's really too late. There is only
one available next step: you must do a dump to figure out what's
readable. That is the spot that all of the incoming data recovery
customers we see at 2ndQuadrant are already in when we're called. The
cluster is suspicious, sometimes they can get data out of it with a
dump, and if we hack up their install we can usually recover a bit more
than they could.

After the data from a partially corrupted database is dumped, someone
who has just been through that pain might decide they should turn
checksums on when they restore the dump. When it's on, they can access
future damage easily at the block level when it happens, and possibly
repair it without doing a full dump/reload. What's implemented in the
feature we're talking about has a good enough UI to handle this entire
cycle I see damaged installations go through.

> In fact, this feature is going to need
> pg_upgrade changes to detect from pg_controldata that the old/new
> clusters have the same checksum setting.

I think that's done already, but it's certainly something to test out too.

Good questions, Bruce, I don't think the reasons behind this feature's
demand have been highlighted very well before. I try not to spook the
world by talking regularly about how many corrupt PostgreSQL databases
I've seen, but they do happen. Most of my regular ranting on crappy
SSDs that lie about writes comes from a TB scale PostgreSQL install that
got corrupted due to the write-cache flaws of the early Intel
SSDs--twice. The would have happily lost even the worst-case 20% of
regular performance to avoid going down for two days each time they saw
corruption, where we had to dump/reload to get them going again. If the
install had checksums, I could have figured out which blocks were
damaged and manually fixed them. Without checksums, there's no way to
even tell for sure what is broken.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2013-03-18 19:46:32 Re: Enabling Checksums
Previous Message Andrew Dunstan 2013-03-18 19:29:01 Re: JSON Function Bike Shedding