Re: Enabling Checksums

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-03-05 09:35:14
Message-ID: 5135BC52.3020609@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04.03.2013 09:11, Simon Riggs wrote:
> On 3 March 2013 18:24, Greg Smith<greg(at)2ndquadrant(dot)com> wrote:
>
>> The 16-bit checksum feature seems functional, with two sources of overhead.
>> There's some CPU time burned to compute checksums when pages enter the
>> system. And there's extra overhead for WAL logging hint bits. I'll
>> quantify both of those better in another message.
>
> It's crunch time. Do you and Jeff believe this patch should be
> committed to Postgres core?
>
> Are there objectors?

In addition to my hostility towards this patch in general, there are
some specifics in the patch I'd like to raise (read out in a grumpy voice):

If you enable checksums, the free space map never gets updated in a
standby. It will slowly drift to be completely out of sync with reality,
which could lead to significant slowdown and bloat after failover.

Since the checksums are an all-or-nothing cluster-wide setting, the
three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and
PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the
code simpler, and leaves the bits free for future use. If we want to
enable such per-page setting in the future, we can add it later. For a
per-relation scheme, they're not needed.

> + * The checksum algorithm is a modified Fletcher 64-bit (which is
> + * order-sensitive). The modification is because, at the end, we have two
> + * 64-bit sums, but we only have room for a 16-bit checksum. So, instead of
> + * using a modulus of 2^32 - 1, we use 2^8 - 1; making it also resemble a
> + * Fletcher 16-bit. We don't use Fletcher 16-bit directly, because processing
> + * single bytes at a time is slower.

How does the error detection rate of this compare with e.g CRC-16? Is
there any ill effect from truncating the Fletcher sums like this?

> + /*
> + * Store the sums as bytes in the checksum. We add one to shift the range
> + * from 0..255 to 1..256, to make zero invalid for checksum bytes (which
> + * seems wise).
> + */
> + p8Checksum[0] = (sum1 % 255) + 1;
> + p8Checksum[1] = (sum2 % 255) + 1;

That's a bit odd. We don't avoid zero in the WAL crc, and I don't recall
seeing that in other checksum implementations either. 16-bits is not
very wide for a checksum, and this eats about 1% of the space of valid
values.

I can see that it might be a handy debugging aid to avoid 0. But there's
probably no need to avoid 0 in both bytes, it seems enough to avoid a
completely zero return value.

XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN
without a lock. That's not atomic, so it could incorrectly determine
that a page doesn't need to be backed up. We used to always hold an
exclusive lock on the buffer when it's called, which prevents
modifications to the LSN, but that's no longer the case.

Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I
think it will generate WAL records for unlogged tables as it is.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2013-03-05 10:23:26 Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
Previous Message Andres Freund 2013-03-05 09:27:59 Re: Support for REINDEX CONCURRENTLY