Re: 16-bit page checksums for 9.2

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <simon(at)2ndQuadrant(dot)com>,<stark(at)mit(dot)edu>
Cc: <aidan(at)highrise(dot)ca>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-25 13:01:21
Message-ID: 4EF6CA410200002500044100@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Simon Riggs wrote:
> On Sat, Dec 24, 2011 at 8:06 PM, Greg Stark wrote:

>> The problem is that there is no WAL indicating the hint bit
>> change. And if the torn page includes the new checksum but not the
>> new hint bit or vice versa it will be a checksum mismatch.

With *just* this patch, true. An OS crash or hardware failure could
sometimes create an invalid page.

>> The strategy discussed in the past was moving all the hint bits to
>> a common area and skipping them in the checksum. No amount of
>> double writing or buffering or locking will avoid this problem.

I don't believe that. Double-writing is a technique to avoid torn
pages, but it requires a checksum to work. This chicken-and-egg
problem requires the checksum to be implemented first.

> I completely agree we should do this, but we are unable to do it
> now, so this patch is a stop-gap and provides a much requested
> feature *now*.

Yes, for people who trust their environment to prevent torn pages, or
who are willing to tolerate one bad page per OS crash in return for
quick reporting of data corruption from unreliable file systems, this
is a good feature even without double-writes.

> In the future, we will be able to tell the difference between an
> acceptable and an unacceptable bit error.

A double-write patch would provide that, and it sounds like VMware
has a working patch for that which is being polished for submission.
It would need to wait until we have some consensus on the checksum
patch before it can be finalized. I'll try to review the patch from
this thread today, to do what I can to move that along.

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: simon(at)2ndQuadrant(dot)com, stark(at)mit(dot)edu, aidan(at)highrise(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-27 20:05:11
Message-ID: 4EFA24F7.6090007@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25.12.2011 15:01, Kevin Grittner wrote:
> I don't believe that. Double-writing is a technique to avoid torn
> pages, but it requires a checksum to work. This chicken-and-egg
> problem requires the checksum to be implemented first.

I don't think double-writes require checksums on the data pages
themselves, just on the copies in the double-write buffers. In the
double-write buffer, you'll need some extra information per-page anyway,
like a relfilenode and block number that indicates which page it is in
the buffer.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, stark(at)mit(dot)edu, aidan(at)highrise(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-27 23:39:48
Message-ID: CA+U5nMJK89jFTKCmegnoKRShaSTamFs4RLr6Fyshu1R05htq4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 27, 2011 at 8:05 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 25.12.2011 15:01, Kevin Grittner wrote:
>>
>> I don't believe that.  Double-writing is a technique to avoid torn
>> pages, but it requires a checksum to work.  This chicken-and-egg
>> problem requires the checksum to be implemented first.
>
>
> I don't think double-writes require checksums on the data pages themselves,
> just on the copies in the double-write buffers. In the double-write buffer,
> you'll need some extra information per-page anyway, like a relfilenode and
> block number that indicates which page it is in the buffer.

How would you know when to look in the double write buffer?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: stark(at)mit(dot)edu, aidan(at)highrise(dot)ca, pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-28 02:54:02
Message-ID: CA+U5nM+WUMu5FvuyK+hPGoxdgMq4xRqd5tFW-qBaDYNt2b8A=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 25, 2011 at 1:01 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> This chicken-and-egg
> problem requires the checksum to be implemented first.

v2 of checksum patch, using a conditional copy if checksumming is
enabled, so locking is removed.

Thanks to Andres for thwacking me with the cluestick, though I have
used a simple copy rather than a copy & calc.

Tested using make installcheck with parameter on/off, then restart and
vacuumdb to validate all pages.

Reviews, objections, user interface tweaks all welcome.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
checksum16.v2.patch text/x-patch 25.5 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, stark(at)mit(dot)edu, aidan(at)highrise(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-28 07:42:10
Message-ID: 4EFAC852.3080808@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28.12.2011 01:39, Simon Riggs wrote:
> On Tue, Dec 27, 2011 at 8:05 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 25.12.2011 15:01, Kevin Grittner wrote:
>>>
>>> I don't believe that. Double-writing is a technique to avoid torn
>>> pages, but it requires a checksum to work. This chicken-and-egg
>>> problem requires the checksum to be implemented first.
>>
>>
>> I don't think double-writes require checksums on the data pages themselves,
>> just on the copies in the double-write buffers. In the double-write buffer,
>> you'll need some extra information per-page anyway, like a relfilenode and
>> block number that indicates which page it is in the buffer.
>
> How would you know when to look in the double write buffer?

You scan the double-write buffer, and every page in the double write
buffer that has a valid checksum, you copy to the main storage. There's
no need to check validity of pages in the main storage.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, stark(at)mit(dot)edu, aidan(at)highrise(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-28 09:22:14
Message-ID: CA+U5nMJhRbkmNqVWCJTvCLfDcFJNu0j-H_XGZzv7=P0Uamfnsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 28, 2011 at 7:42 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

>> How would you know when to look in the double write buffer?
>
>
> You scan the double-write buffer, and every page in the double write buffer
> that has a valid checksum, you copy to the main storage. There's no need to
> check validity of pages in the main storage.

OK, then we are talking at cross purposes. Double write buffers, in
the way you explain them allow us to remove full page writes. They
clearly don't do anything to check page validity on read. Torn pages
are not the only fault we wish to correct against... and the double
writes idea is orthogonal to the idea of checksums.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, stark(at)mit(dot)edu, aidan(at)highrise(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-28 17:45:37
Message-ID: 4EFB55C1.9090603@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28.12.2011 11:22, Simon Riggs wrote:
> On Wed, Dec 28, 2011 at 7:42 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>>> How would you know when to look in the double write buffer?
>>
>>
>> You scan the double-write buffer, and every page in the double write buffer
>> that has a valid checksum, you copy to the main storage. There's no need to
>> check validity of pages in the main storage.
>
> OK, then we are talking at cross purposes. Double write buffers, in
> the way you explain them allow us to remove full page writes. They
> clearly don't do anything to check page validity on read. Torn pages
> are not the only fault we wish to correct against... and the double
> writes idea is orthogonal to the idea of checksums.

The reason we're talking about double write buffers in this thread is
that double write buffers can be used to solve the problem with hint
bits and checksums.

You're right, though, that it's academical whether double write buffers
can be used without checksums on data pages, if the whole point of the
exercise is to make it possible to have checksums on data pages..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, stark(at)mit(dot)edu, aidan(at)highrise(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-29 10:58:38
Message-ID: CA+U5nMKu2Oy097fyV8qFTWVO7fyx5D_-5q=Nhd_64Yr0Bzq0+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 28, 2011 at 5:45 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 28.12.2011 11:22, Simon Riggs wrote:
>>
>> On Wed, Dec 28, 2011 at 7:42 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com>  wrote:
>>
>>>> How would you know when to look in the double write buffer?
>>>
>>>
>>>
>>> You scan the double-write buffer, and every page in the double write
>>> buffer
>>> that has a valid checksum, you copy to the main storage. There's no need
>>> to
>>> check validity of pages in the main storage.
>>
>>
>> OK, then we are talking at cross purposes. Double write buffers, in
>> the way you explain them allow us to remove full page writes. They
>> clearly don't do anything to check page validity on read. Torn pages
>> are not the only fault we wish to correct against... and the double
>> writes idea is orthogonal to the idea of checksums.
>
>
> The reason we're talking about double write buffers in this thread is that
> double write buffers can be used to solve the problem with hint bits and
> checksums.

Torn pages are not the only problem we need to detect.

You said "You scan the double write buffer...". When exactly would you do that?

Please explain how a double write buffer detects problems that do not
occur as the result of a crash.

We don't have much time, so please be clear and lucid.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>
Cc: "Andres Freund" <andres(at)anarazel(dot)de>,<aidan(at)highrise(dot)ca>, <stark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-03 22:21:42
Message-ID: 4F032B1602000025000442AA@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> v2 of checksum patch, using a conditional copy if checksumming is
> enabled, so locking is removed.
>
> Thanks to Andres for thwacking me with the cluestick, though I
> have used a simple copy rather than a copy & calc.
>
> Tested using make installcheck with parameter on/off, then restart
> and vacuumdb to validate all pages.
>
> Reviews, objections, user interface tweaks all welcome.

I'm happy with how this looks, except (as noted in a code comment)
that there seems to be room for optimization of the calculation
itself. Details below:

(1) I like the choice of Fletcher-16. It should be very good at
detecting problems while being a lot less expensive that an official
CRC calculation. The fact that it was developed at Lawrence
Livermore Labs and has been subjected to peer review in the IEEE
Transactions on Communications generates confidence in the
technique. According to what I've read, though, the technique is
conceptually based around the modulus of the sums. Doing the
modulus calculation for each addition is often done to prevent
overflow, but if we define sum1 and sum2 as uint I don't see how we
can get an overflow with 8k byes, so I suggest we declare both of
these local variables as uint and leave the modulus to the end. It
can't hurt to leave off 16000 division operations per checksum
generation.

(2) I'm not sure about doing this in three parts, to skip the
checksum itself and the hole in the middle of the page. Is this
because the hole might not have predictable data? Why would that
matter, as long as it is read back the same? Is it expected that
this will reduce the cost by checking fewer bytes? That we could
tolerate the read of an actual corrupted disk sector in the middle
of a page if it didn't contain any data? If we can set the checksum
to zero before starting, might it be faster to load four bytes at a
time, and iterate fewer times? (Like I said, I'm not sure about any
of this, but it seemed worth asking the questions.)

(3) Rather than having PageSetVerificationInfo() use memcpy,
followed by pass through the copied data to calculate the checksum,
might it be better to have a "copy and calculate" version of the
function (as in VMware's original patch) to save an extra pass over
the page image?

Other than these performance tweaks around the calculation phase, I
didn't spot any problems. I beat up on it a bit on a couple
machines without hitting any bugs or seeing any regression test
failures.

-Kevin


From: Jim Nasby <jim(at)nasby(dot)net>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Andres Freund" <andres(at)anarazel(dot)de>, <aidan(at)highrise(dot)ca>, <stark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-03 23:00:15
Message-ID: A5220FCC-1321-47CF-8C23-2DD40467B761@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jan 3, 2012, at 4:21 PM, Kevin Grittner wrote:
> (2) I'm not sure about doing this in three parts, to skip the
> checksum itself and the hole in the middle of the page. Is this
> because the hole might not have predictable data? Why would that
> matter, as long as it is read back the same?

IMO not checksumming the free space would be a really bad idea. It's entirely possible to have your hardware crapping on your free space, and I'd still want to know that that was happening. Now, it would be interesting if the free space could be checksummed separately, since there's no reason to refuse to read the page if only the free space is screwed up... But given the choice, I'd rather get an error when the free space is "corrupted" and be forced to look into things rather than blissfully ignore corrupted free space only to be hit later with real data loss.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)anarazel(dot)de>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 08:26:57
Message-ID: CA+U5nMJ936dqbvJAkZ040HeX=_3ckVBU6ChE6GMskXaw3ow7ZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 3, 2012 at 11:00 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Jan 3, 2012, at 4:21 PM, Kevin Grittner wrote:
>> (2)  I'm not sure about doing this in three parts, to skip the
>> checksum itself and the hole in the middle of the page.  Is this
>> because the hole might not have predictable data?  Why would that
>> matter, as long as it is read back the same?
>
> IMO not checksumming the free space would be a really bad idea. It's entirely possible to have your hardware crapping on your free space, and I'd still want to know that that was happening. Now, it would be interesting if the free space could be checksummed separately, since there's no reason to refuse to read the page if only the free space is screwed up... But given the choice, I'd rather get an error when the free space is "corrupted" and be forced to look into things rather than blissfully ignore corrupted free space only to be hit later with real data loss.

I see that argument. We don't have space for 2 checksums.

We can either

(1) report all errors on a page, including errors that don't change
PostgreSQL data. This involves checksumming long strings of zeroes,
which the checksum algorithm can't tell apart from long strings of
ones.

(2) report only errors that changed PostgreSQL data.

We already do (1) for WAL CRCs so doing the same thing for page
checksums makes sense and is much faster.

If enough people think we should do (2) then its a simple change to the patch.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, David Fetter <david(at)fetter(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 08:38:19
Message-ID: CA+U5nMLbFTYSP=FDPBOa_5WQ8nwUeKurAcjfwA+8rOe-OE6BvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 3, 2012 at 10:21 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> I'm happy with how this looks, except (as noted in a code comment)
> that there seems to be room for optimization of the calculation
> itself.  Details below:

...

> (3)  Rather than having PageSetVerificationInfo() use memcpy,
> followed by pass through the copied data to calculate the checksum,
> might it be better to have a "copy and calculate" version of the
> function (as in VMware's original patch) to save an extra pass over
> the page image?

> Other than these performance tweaks around the calculation phase, I
> didn't spot any problems.  I beat up on it a bit on a couple
> machines without hitting any bugs or seeing any regression test
> failures.

My focus was on getting something working first, then tuning. If we're
agreed that we have everything apart from the tuning then we can
proceed with tests to see which works better.

The copy and calculate approach might get in the way of hardware
prefetch since in my understanding the memory fetch time exceeds the
calculation time. As discussed elsewhere using that code or not would
not stop that work being credited.

David, please can you rework the VMware calc patch to produce an
additional 16-bit checksum mechanism in a way compatible with the
16bit patch, so we can test the two versions of the calculation? We
can make the GUC an enum so that the page checksum is selectable (for
testing).

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 09:20:20
Message-ID: 201201041020.20781.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday, January 03, 2012 11:21:42 PM Kevin Grittner wrote:
> (1) I like the choice of Fletcher-16. It should be very good at
> detecting problems while being a lot less expensive that an official
> CRC calculation.
I wonder if CRC32c wouldn't be a good alternative given more and more cpus
(its in SSE 4.2) support calculating it in silicon.

Andres


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 10:08:35
Message-ID: CA+U5nMLNpgFPu1ER6KGb8p54h7Sv0w+_ZaYZMJZqQcJAf36jAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 4, 2012 at 9:20 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On Tuesday, January 03, 2012 11:21:42 PM Kevin Grittner wrote:
>> (1)  I like the choice of Fletcher-16.  It should be very good at
>> detecting problems while being a lot less expensive that an official
>> CRC calculation.
> I wonder if CRC32c wouldn't be a good alternative given more and more cpus
> (its in SSE 4.2) support calculating it in silicon.

We're trying to get something that fits in 16bits for this release.
I'm guessing CRC32c doesn't?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 11:53:04
Message-ID: CAP-rdTYtSvCRQ=2FbjryT8G7eeDFw+PDp1Py7i07mSAmhkLtUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/4 Simon Riggs <simon(at)2ndquadrant(dot)com>:

> On Wed, Jan 4, 2012 at 9:20 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> I wonder if CRC32c wouldn't be a good alternative given more and more cpus
>> (its in SSE 4.2) support calculating it in silicon.
>
> We're trying to get something that fits in 16bits for this release.
> I'm guessing CRC32c doesn't?

What happens to the problem-detecting performance of a 16 bit part of
a CRC32c vs. a real 16 bit checksum? If it is still as good, it might
make sense to use the former, assuming that there is a way to easily
trigger the silicon support and enough CPUs support it.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)anarazel(dot)de>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 13:31:57
Message-ID: 20120104133157.GK24234@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon, all,

* Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> (1) report all errors on a page, including errors that don't change
> PostgreSQL data. This involves checksumming long strings of zeroes,
> which the checksum algorithm can't tell apart from long strings of
> ones.

Do we actually know when/where it's supposed to be all zeros, and hence
could we check for that explicitly? If we know what it's supposed to
be, in order to be consistent with other information, I could certainly
see value in actually checking that.

I don't think that's valuable enough to go breaking abstraction layers
or bending over backwards to do it though. If we don't have the
knowledge, at the right level, that the data should all be zeros then
including those pieces in the CRC certainly makes sense to me.

Just my 2c.

Thanks,

Stephen


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)anarazel(dot)de>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 15:13:05
Message-ID: CA+U5nM+L_RFkiSoW7C5rYEaaQkgTnw+P3Y4yhS-S_9n1=+rsKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 4, 2012 at 1:31 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Simon, all,
>
> * Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
>> (1) report all errors on a page, including errors that don't change
>> PostgreSQL data. This involves checksumming long strings of zeroes,
>> which the checksum algorithm can't tell apart from long strings of
>> ones.
>
> Do we actually know when/where it's supposed to be all zeros, and hence
> could we check for that explicitly?  If we know what it's supposed to
> be, in order to be consistent with other information, I could certainly
> see value in actually checking that.

Yes, we can. Excellent suggestion, will implement.

That means we can keep the CRC calc fast as well as check the whole of
the page inbound.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)anarazel(dot)de>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-05 15:29:59
Message-ID: CA+U5nM+Q7sNVw4+1mUYi=xykrFuO9V+v8WVAPwjVfAp8NGdxjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 4, 2012 at 3:13 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Jan 4, 2012 at 1:31 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> Simon, all,
>>
>> * Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
>>> (1) report all errors on a page, including errors that don't change
>>> PostgreSQL data. This involves checksumming long strings of zeroes,
>>> which the checksum algorithm can't tell apart from long strings of
>>> ones.
>>
>> Do we actually know when/where it's supposed to be all zeros, and hence
>> could we check for that explicitly?  If we know what it's supposed to
>> be, in order to be consistent with other information, I could certainly
>> see value in actually checking that.
>
> Yes, we can. Excellent suggestion, will implement.

No, we can't.

I discover that non-all-zeroes holes are fairly common, just not very frequent.

That may or may not be a problem, but not something to be dealt with
here and now.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)anarazel(dot)de>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-06 01:10:17
Message-ID: 20120106011017.GM24234@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> I discover that non-all-zeroes holes are fairly common, just not very frequent.

Curious, might be interesting to find out why.

> That may or may not be a problem, but not something to be dealt with
> here and now.

But I agree that it's not the job of this patch/effort. It sounds like
we have clear indication, however, that those areas, as they are not
necessairly all zeros, should be included in the checksum.

Thanks,

Stephen


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)anarazel(dot)de>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-06 10:30:53
Message-ID: CA+U5nMK9xVMb69UPnseQ5pjnUuh85NW+daREFayjgZ8FXj1WUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 6, 2012 at 1:10 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
>> I discover that non-all-zeroes holes are fairly common, just not very frequent.
>
> Curious, might be interesting to find out why.
>
>> That may or may not be a problem, but not something to be dealt with
>> here and now.
>
> But I agree that it's not the job of this patch/effort.  It sounds like
> we have clear indication, however, that those areas, as they are not
> necessairly all zeros, should be included in the checksum.

Disagree. Full page writes ignore the hole, so its appropriate to do
so here also.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-06 10:36:08
Message-ID: 201201061136.08476.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Friday, January 06, 2012 11:30:53 AM Simon Riggs wrote:
> On Fri, Jan 6, 2012 at 1:10 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> >> I discover that non-all-zeroes holes are fairly common, just not very
> >> frequent.
> >
> > Curious, might be interesting to find out why.
> >
> >> That may or may not be a problem, but not something to be dealt with
> >> here and now.
> >
> > But I agree that it's not the job of this patch/effort. It sounds like
> > we have clear indication, however, that those areas, as they are not
> > necessairly all zeros, should be included in the checksum.
>
> Disagree. Full page writes ignore the hole, so its appropriate to do
> so here also.
Well, ignoriging them in fpw has clear space benefits. Ignoring them while
checksumming doesn't have that much of a benefit.

Andres


From: Jim Nasby <jim(at)nasby(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-10 02:51:30
Message-ID: 74709685-CEFB-4B64-8599-F3F05D6F9BAC@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jan 6, 2012, at 4:36 AM, Andres Freund wrote:
> On Friday, January 06, 2012 11:30:53 AM Simon Riggs wrote:
>> On Fri, Jan 6, 2012 at 1:10 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>>> * Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
>>>> I discover that non-all-zeroes holes are fairly common, just not very
>>>> frequent.
>>>
>>> Curious, might be interesting to find out why.
>>>
>>>> That may or may not be a problem, but not something to be dealt with
>>>> here and now.
>>>
>>> But I agree that it's not the job of this patch/effort. It sounds like
>>> we have clear indication, however, that those areas, as they are not
>>> necessairly all zeros, should be included in the checksum.
>>
>> Disagree. Full page writes ignore the hole, so its appropriate to do
>> so here also.
> Well, ignoriging them in fpw has clear space benefits. Ignoring them while
> checksumming doesn't have that much of a benefit.

I agree with Andres... we should checksum zero bytes, because if they're screwed up then something is wrong with your system, even if you got lucky with what data got trashed.

As I mentioned before, 2 separate checksums would be nice, but if we can't have that I think we need to fail on any checksum error.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net