Re: Enabling Checksums

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-04-13 15:26:26
Message-ID: 20130413152626.GD10556@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-04-13 18:14:28 +0300, Ants Aasma wrote:
> On Sat, Apr 13, 2013 at 5:58 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> >> On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote:
> >>> As I understand it, SIMD is just a CPU-optimized method for producing a
> >>> CRC checksum. Is that right? Does it produce the same result as a
> >>> non-CPU-optimized CRC calculation?
> >
> >> No we are talking about a different algorithm that results in different
> >> results, thats why its important to choose now since we can't change it
> >> later without breaking pg_upgrade in further releases.
> >> http://en.wikipedia.org/wiki/SIMD_%28hash_function%29
> >
> > [ squint... ] We're talking about a *cryptographic* hash function?
> > Why in the world was this considered a good idea for page checksums?
> >
> > In the first place, it's probably not very fast compared to some
> > alternatives, and in the second place, the criteria by which people
> > would consider it a good crypto hash function have approximately nothing
> > to do with what we need for a checksum function. What we want for a
> > checksum function is high probability of detection of common hardware
> > failure modes, such as burst errors and all-zeroes. This is
> > particularly critical when we're going with only a 16-bit checksum ---
> > the probabilities need to be skewed in the right direction, or it's not
> > going to be all that terribly useful.
> >
> > CRCs are known to be good for that sort of thing; it's what they were
> > designed for. I'd like to see some evidence that any substitute
> > algorithm has similar properties. Without that, I'm going to vote
> > against this idea.
>
> Sorry for creating confusion here by playing fast and loose with the
> terminology. We are not talking about that hash function at all. What
> we are talking about here is Fowler-Noll-Vo-ish
> (http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function)
> hash function that is restructured to be parallelisable with SIMD
> instructions with the explicit goal of being as fast as possible. The
> resulting hash function is roughly two orders of magnitude faster than
> 1-byte-at-a-time CRC32 currently in use. Performance is about
> comparable with optimized fixed size memcpy running in cache.

Gah, one shouldn't look to quick for a reference, sorry.

> Based on current analysis, it is particularly good at detecting single
> bit errors, as good at detecting burst errors as can be expected from
> 16 bits and not horrible at detecting burst writes of zeroes. It is
> quite bad at detecting multiple uncorrelated single bit errors and
> extremely bad at detecting repeating patterns of errors in low order
> bits.

> All in all I would say that the performance is worth the loss in
> detection capability as we are not talking about using the checksum to
> prove correctness.

Is it actually a loss compared to our 16bit flavor of crc32 we now use?
I didn't think so far from the properties you described?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2013-04-13 16:44:26 Re: PROPOSAL: tracking aggregated numbers from pg_stat_database
Previous Message Ants Aasma 2013-04-13 15:14:28 Re: Enabling Checksums