Re: Enabling Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-03-07 00:44:44
Message-ID: 5137E2FC.6050708@2ndQuadrant.com
Lists: pgsql-hackers

On 3/6/13 1:34 PM, Robert Haas wrote:
> We've had a few EnterpriseDB customers who have had fantastically
> painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
> ZFS block size to the PostgreSQL block size is supposed to make these
> problems go away, but in my experience it does not have that effect.

There are a couple of major tuning issues you have to get right for good
ZFS performance, like its tendency to gobble more RAM than is
necessarily appropriate for a PostgreSQL host. If you nail down all
those and carefully set up everything, it can work OK. When Sun had a
bunch of good engineers working on the problem they certainly pulled it
off. I managed a 3TB database on a ZFS volume for a while myself.
Being able to make filesystem snapshots cleanly and easily was very nice.

As for the write performance implications of COW, though, at a couple of
points I was only able to keep that system ingesting data fast enough if
I turned fsync off :( It's not as if even ZFS makes all the filesystem
issues the database worries about go away either. Take a look at
http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-fsck.html
as an example. That should leave you with a healthy concern over ZFS
handling of power interruption and lying drives. "[NTFS and ext3] have
the same problem, but it has different effects, that aren't as visible
as in ZFS." ext4 actually fixed this for most hardware though, and I
believe ZFS still has the same uberblock concern. ZFS reliability and
its page checksums are good, but they're not magic for eliminating torn
page issues.

Normally I would agree with Heikki's "let's wait a few years
and see if the filesystem will take care of it" idea. But for me, the
"when do we get checksums?" clock started ticking in 2006 when ZFS
popularized its implementation, and now it's gone off and it keeps
ringing at new places. I would love it if FreeBSD had caught a massive
popularity wave in the last few years, so ZFS was running in a lot more
places. Instead what I keep seeing is deployments on Linux with filesystem
choices skewed toward conservative. Forget about the leading edge--I'd
be happy if I could get one large customer to migrate off of ext3...

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
