Re: Enabling Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-03-18 00:50:11
Message-ID: 514664C3.9080404@2ndQuadrant.com
Lists: pgsql-hackers

On 3/17/13 1:41 PM, Simon Riggs wrote:
> So I'm now moving towards commit using a CRC algorithm. I'll put in a
> feature to allow algorithm be selected at initdb time, though that is
> mainly a convenience to allow us to more easily do further testing on
> speedups and whether there are any platform specific regressions
> there.

That sounds reasonable. As I just posted, I'm hoping Ants can help make
a pass over a CRC16 version, since his pass over the Fletcher one seemed
very productive. If you're spending time looking at this, I know I'd
prefer to see you poking at the WAL-related aspects instead. More of us
are capable of crunching CRC code than have your practice at WAL
changes.

I see the situation with checksums right now as being similar to the
commit/postpone situation for Hot Standby in 9.0. The code is uglier
and surely buggier than we'd like, but it has been getting beaten on
regularly for over a year now to knock problems out. There are surely
more bugs left to find. The improved testing that comes only from
something being committed is probably necessary to really advance the
testing coverage though. But with the feature being strictly opt-in,
the exposure for non-adopters isn't that broad. The TLI rearrangements
make up a lot of the patch, but that's pretty mechanical work that
doesn't seem that risky.

There was one question that kept coming up in person this week (Simon,
Jeff, Daniel, Josh Berkus, and I were all in the same place for a
few days) that I wanted to address with some thoughts on-list. Given
that the current overhead is right on the edge of being acceptable, the
concern is whether committing this will lock the project into a
permanent problem that can't be improved later. I think it's
manageable, though. Here's how I interpret the data we have:

-The checksum has to change from Fletcher-16 to CRC-16. The "hairy"
parts of the feature don't change very much from that though. I see
exactly which checksum is produced as a pretty small detail from a code
correctness perspective. It's not like this will be starting the
testing cycle over completely. The performance change should be
quantified though; a minimal CRC-16 sketch appears after this list.

-Some common workloads will show no performance drop, like things that
fit into shared_buffers and don't write hint bits.

-Some common workloads that write things seem to hit about a 2% drop,
presumably because they hit one of the slower situations around 10% of
the time.

-There are a decent number of hard to deal with workloads that have
shared_buffers <-> OS cache thrashing, and any approach here will
regularly hit them with around a 20% drop. There's some hope that this
will improve later, especially if a CRC is used and later versions can
pick up the Intel i7 CRC32 hardware acceleration (see the hardware CRC
sketch after this list). The magnitude of this overhead doesn't seem
too negotiable though. We've heard enough comparisons with other
people's implementations now to see that's near the best anyone does
here. If the weird slowdowns some people report with very large values
of shared_buffers are fixed, that will make this situation better.
That's on my hit list of things I really want to see sorted in the
next release.

-The worst of the worst case behavior is Jeff's "SELECTs now write a
WAL-logged hint bit" test, which can easily exceed a 20% drop. (With
checksums on, even a hint-bit-only change has to be WAL-logged, because
a torn write could otherwise leave a page that fails verification.)
There have been lots of features submitted in the last two releases
that try to improve hint bit operations. Some of those didn't show
enough of a win to be worth the trouble. It may be the case, though,
that in a checksummed environment those wins are suddenly big enough
to matter.
If any of those go in later, the worst case for checksums could then
improve too. Having to test both ways, with and without checksums,
complicates the performance testing. But the project has to start
adopting a better approach to that in the next year regardless IMHO, and
I'm scheduling time to help as much as I can with it. (That's a whole
other discussion)

-Having COPY FREEZE available now is a useful tool to eliminate a lot
of the load/expensive hint bit write scenarios I know exist in the real
world. I think the docs for checksumming should even highlight that
synergy (the usage sketch after this list shows the pattern).
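
To make the scope of the checksum swap concrete, here is a minimal
bitwise CRC-16 sketch. The CCITT polynomial (0x1021) and the initial
value are assumptions for illustration, since no specific CRC-16
variant has been settled on, and a committed version would want a
table-driven or sliced loop rather than this bit-at-a-time one:

#include <stdint.h>
#include <stddef.h>

/*
 * Bitwise CRC-16 over a buffer using the CCITT polynomial 0x1021.
 * Polynomial and initial value are illustrative only.
 */
static uint16_t
crc16_ccitt(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint16_t crc = 0xFFFF;
    int bit;

    while (len-- > 0)
    {
        crc ^= (uint16_t) (*p++) << 8;
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) : (crc << 1);
    }
    return crc;
}

Usage would be along the lines of crc16_ccitt(page, BLCKSZ), ignoring
details like skipping the checksum field itself.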
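
The hardware path mentioned above could look roughly like this, using
the SSE4.2 CRC32 instruction through compiler intrinsics. This assumes
an SSE4.2-capable CPU and building with -msse4.2; a real implementation
would need a runtime CPU check and a software fallback, and would
consume eight bytes per instruction instead of one:

#include <stdint.h>
#include <stddef.h>
#include <nmmintrin.h>   /* SSE4.2 intrinsics; build with -msse4.2 */

/*
 * CRC-32C using the dedicated CRC32 instruction, one byte per step
 * for clarity.
 */
static uint32_t
crc32c_hw(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFF;

    while (len-- > 0)
        crc = _mm_crc32_u8(crc, *p++);

    return crc ^ 0xFFFFFFFF;
}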
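
For the COPY FREEZE pattern, the key detail is that the table has to be
created or truncated in the same transaction for FREEZE to take effect,
so rows go in already frozen and never need the later hint bit rewrite.
Here's a libpq sketch; the table and file names are made up, and a
server-side COPY like this needs superuser (COPY FROM STDIN works too):

#include <stdio.h>
#include <libpq-fe.h>

static void
run(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
    PQclear(res);
}

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    run(conn, "BEGIN");
    /* truncating in the same transaction is what lets FREEZE apply */
    run(conn, "TRUNCATE bulk_target");
    run(conn, "COPY bulk_target FROM '/tmp/bulk.dat' WITH (FREEZE)");
    run(conn, "COMMIT");

    PQfinish(conn);
    return 0;
}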

As long as the feature is off by default, so that people have to turn it
on to hit the biggest changed code paths, the exposure to potential bugs
doesn't seem too bad. New WAL data is no fun, but it's not like this
hasn't happened before.

For version <9.3+1>, there's a decent sized list of potential
performance improvements that seem possible. I don't see any reason to
believe committing a CRC16 based version of this will lock the
implementation into a bad form that can't be optimized later. The
comparison with Hot Standby seems apt again here. There was a decent
list of rough edges that were hit by early 9.0 adopters only when they
turned the feature on. Then many were improved in 9.1. Checksumming
could follow the same path: committed for 9.3, improvements expected
during <9.3+1> work, generally considered well tested by the release of
<9.3+1>.

On the testing front, we've seen on-list interest in this feature from
companies like Heroku and Enova, who both have the resources and
practice to help with testing too. Heroku can spin up test instances with
workloads any number of ways. Enova can make a Londiste standby with
checksums turned on to hit it with a logical replicated workload, while
the master stays un-checksummed.

If this goes in, I fully intend to hold both companies to hitting the
feature with as many workloads as they can help generate during (and
beyond) beta. I have my own stress tests I'll keep running too. If the
bug rate from the beta adopters is bad and doesn't improve, there is
always the uncomfortable possibility of reverting it before the first RC.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
