Re: Enabling Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-03-03 23:05:22
Message-ID: 5133D732.4090801@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/19/12 6:30 PM, Jeff Davis wrote:
> I ran a few tests.
> Test 1 - find worst-case overhead for the checksum calculation on write:
> Test 2 - worst-case overhead for calculating checksum while reading data
> Test 3 - worst-case WAL overhead

What I've done is wrap all of these tests into a shell script that runs
them 3 times each, with and without checksums. That includes steps like
the spots where Jeff found a "sync" helped improve repeatability. I ran
these manually before and didn't notice enough of a difference to argue
with any of his results at the time. Having them packaged up usefully
means I can try some additional platforms too, and other testers should
be easily able to take a crack at it.

On the last one, in addition to runtime I directly measure how many
bytes of WAL are written. It's 0 in the usual case, where the hint bit
changes triggered by the first "SELECT * FROM foo" don't generate any WAL.

Detailed results with both my and Jeff's numbers are in the attached
spreadsheet. I did my tests on a Mac writing to SSD, to try and get
some variety in the test platforms. The main difference there is that
Test 1 is much slower on my system, enough so that the slowdown isn't as
pronounced.

Remember, these are a set of tests designed to magnify the worst case
here. I don't feel any of these results make the feature uncommittable.
The numbers I'm getting are not significantly different from the ones
Jeff posted back in December, and those were acceptable to some of the
early adopter candidates I've been surveying informally. These numbers
are amplifying overhead without doing much in the way of real disk I/O,
which can easily be a lot more expensive than any of this. I do think
there needs to be a bit more documentation of the potential downsides to
checksumming written though, since they are pretty hefty in some situations.

I'm going to get some pgbench results next, to try and put this into a
more realistic context too. The numbers for this round break down like
this:

= Test 1 - find worst-case overhead for the checksum calculation on write =

This can hit 25% of runtime when you isolate it out. I'm not sure if
how I'm running this multiple times makes sense yet. This one is so
much slower on my Mac that I can't barely see a change at all.

= Test 2 - worst-case overhead for calculating checksum while reading data =

Jeff saw an 18% slowdown, I get 24 to 32%. This one bothers me because
the hit is going to happen during the very common situation where data
is shuffling a lot between a larger OS cache and shared_buffers taking a
relatively small fraction. If that issue were cracked, such that
shared_buffers could be >50% of RAM, I think the typical real-world
impact of this would be easier to take.

= Test 3 - worst-case WAL overhead =

This is the really nasty one. The 10,000,000 rows touched by the SELECT
statement here create no WAL in a non-checksum environment. When
checksums are on, 368,513,656 bytes of WAL are written, so about 37
bytes per row.

Jeff saw this increase runtime by 135%, going from 1000ms to 2350ms. My
multiple runs are jumping around in a way I also don't trust fully yet.
But the first and best of the ones I'm seeing goes from 1660ms to
4013ms, which is a 140% increase. The others are even worse. I suspect
I'm filling a cache that isn't cleared before the second and third run
are over. I'll know for sure when I switch back to Linux.

The really nasty case I can see making people really cranky is where
someone has fsync on, a slowly rotating drive, and then discovers this
slowing read statements. There's already a decent share of "why is it
writing when I do 'SELECT *'?" complaints around the block I/O, which is
fully asynchronous in a lot of cases.

Right now the whole hint bit mechanism and its overhead are treated as
an internal detail that isn't in the regular documentation. I think
committing this sort of checksum patch will require exposing some of the
implementation to the user in the documentation, so people can
understand what the trouble cases are--either in advance or when trying
to puzzle out why they're hitting one of them.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment Content-Type Size
ChecksumPerfResults.xls application/vnd.ms-excel 9.0 KB
perf-checksum.conf text/plain 399 bytes
perf-checksum.sh application/x-sh 4.0 KB
wal-functions.sql text/plain 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2013-03-03 23:07:35 Re: transforms
Previous Message Nicolas Barbier 2013-03-03 23:01:18 Re: Materialized views WIP patch