Re: Enabling Checksums

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-04-18 17:15:20
Message-ID: E7FF4EB0-BC20-4C7B-9A0F-7075B61C90DB@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Apr18, 2013, at 18:48 , Ants Aasma <ants(at)cybertec(dot)at> wrote:
> On Thu, Apr 18, 2013 at 5:57 PM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
>> I'll generate an avalanche diagram for CRC32C too, but it will take a
>> while even if I use a smaller dataset.
>
> Well that was useless... In CRC flipping each bit in the input flips
> preset pattern of bits in the output regardless of the actual data on
> the page. Some stats for CRC32C - input bits affect 28344 different
> bit combinations. Count of bits by number of duplicated bitpatterns:

Yup, CRC is linear too. CRC is essentially long division for polynomials,
i.e. you interpret the N input bits as the coefficients of a (large)
polynomial of degree (N-1), and divide by the CRC polynomial. The remainder
is the checksum, and consists of B bits where B is the degree of the
CRC polynomial. (Polynomial here means polynomial over GF(2), i.e. over
a field with only two values 0 and 1)

I'm currently trying to see if one can easily explain the partial-write
behaviour from that. Having lots of zeros at the end end corresponds
to an input polynomial of the form

p(x) * x^l

where l is the number of zero bits. The CRC (q(x) is the CRC polynomial) is

p(x) * x^l mod q(x) = (p(x) mod q(x)) * (x^l mod q(x)) mod q(x)

That still doesn't explain it, though - the result *should* simply
be the checksum of p(x), scrambled a bit by the multiplication with
(x^l mod q(x)). But if q(x) is irreducible, that scrambling is invertible
(as multiplication module some irreducible element always is), and thus
shouldn't matter much.

So either the CRC32-C polynomial isn't irreducible, or there something
fishy going on. Could there be a bug in your CRC implementation? Maybe
a mixup between big and little endian, or something like that?

The third possibility is that I've overlooking something, of course ;-)
Will think more about this tomorrow if time permits

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ants Aasma 2013-04-18 17:24:32 Re: Enabling Checksums
Previous Message Ants Aasma 2013-04-18 17:13:15 Re: Enabling Checksums