Re: 16-bit page checksums for 9.2

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>
Cc: "Andres Freund" <andres(at)anarazel(dot)de>,<aidan(at)highrise(dot)ca>, <stark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-03 22:21:42
Message-ID: 4F032B1602000025000442AA@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> v2 of checksum patch, using a conditional copy if checksumming is
> enabled, so locking is removed.
>
> Thanks to Andres for thwacking me with the cluestick, though I
> have used a simple copy rather than a copy & calc.
>
> Tested using make installcheck with parameter on/off, then restart
> and vacuumdb to validate all pages.
>
> Reviews, objections, user interface tweaks all welcome.

I'm happy with how this looks, except (as noted in a code comment)
that there seems to be room for optimization of the calculation
itself. Details below:

(1) I like the choice of Fletcher-16. It should be very good at
detecting problems while being a lot less expensive that an official
CRC calculation. The fact that it was developed at Lawrence
Livermore Labs and has been subjected to peer review in the IEEE
Transactions on Communications generates confidence in the
technique. According to what I've read, though, the technique is
conceptually based around the modulus of the sums. Doing the
modulus calculation for each addition is often done to prevent
overflow, but if we define sum1 and sum2 as uint I don't see how we
can get an overflow with 8k byes, so I suggest we declare both of
these local variables as uint and leave the modulus to the end. It
can't hurt to leave off 16000 division operations per checksum
generation.

(2) I'm not sure about doing this in three parts, to skip the
checksum itself and the hole in the middle of the page. Is this
because the hole might not have predictable data? Why would that
matter, as long as it is read back the same? Is it expected that
this will reduce the cost by checking fewer bytes? That we could
tolerate the read of an actual corrupted disk sector in the middle
of a page if it didn't contain any data? If we can set the checksum
to zero before starting, might it be faster to load four bytes at a
time, and iterate fewer times? (Like I said, I'm not sure about any
of this, but it seemed worth asking the questions.)

(3) Rather than having PageSetVerificationInfo() use memcpy,
followed by pass through the copied data to calculate the checksum,
might it be better to have a "copy and calculate" version of the
function (as in VMware's original patch) to save an extra pass over
the page image?

Other than these performance tweaks around the calculation phase, I
didn't spot any problems. I beat up on it a bit on a couple
machines without hitting any bugs or seeing any regression test
failures.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Kreen 2012-01-03 22:52:51 Re: [RFC] grants vs. inherited tables
Previous Message Robert Haas 2012-01-03 22:21:36 Re: spinlocks on powerpc