Checksums, state of play

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Checksums, state of play
Date: 2012-03-05 15:03:18
Message-ID: CA+U5nMJYJXzFiTBwaa94W5WWymWBCXThotRNZUMGs_cR+Gt6zw@mail.gmail.com
Lists: pgsql-hackers

To avoid any confusion as to where this proposed feature is now, I'd
like to summarise my understanding, make proposals and also request
clear feedback on them.

A number of objections to checksums remain outstanding.

1. We don't need them because there will be something better in a
later release. I don't think anybody disagrees that a better solution
is possible in the future; doubts have been expressed as to what will
be required and when that is likely to happen. Opinions differ. We can
and should do something now unless there is reason not to.

2. Turning checksums on/off/on/off in rapid succession can cause false
positive reports of checksum failure if crashes occur and are ignored.
That may lead to the feature and PostgreSQL being held in disrepute.
This can be resolved, if desired, by having a two-stage
enabling process where we must issue a command that scans every block
in the database before checksum checking begins. VACUUM is easily
modified to the task; we just need to agree that it is suitable and
agree on the syntax.
A suggestion is VACUUM ENABLE CHECKSUMS; others are possible.
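
The two-stage enabling process in point 2 can be pictured as a small state machine. What follows is only an illustrative Python sketch, not code from any patch: the class, the state names and the methods are all hypothetical, and the full scan is simulated by a loop over block ids.

```python
from enum import Enum


class ChecksumState(Enum):
    OFF = "off"            # checksums are neither written nor verified
    ENABLING = "enabling"  # checksums are written, but not yet verified
    ON = "on"              # every block has been scanned; verification is active


class Cluster:
    """Toy model of a database: a list of block ids and a cluster-wide state."""

    def __init__(self, block_ids):
        self.block_ids = list(block_ids)
        self.scanned = set()
        self.state = ChecksumState.OFF

    def begin_enable(self):
        # Stage 1: start writing checksums, but do not trust them yet.
        self.scanned.clear()
        self.state = ChecksumState.ENABLING

    def scan_all_blocks(self):
        # Stage 2: stands in for the proposed full scan (for example by VACUUM).
        if self.state is not ChecksumState.ENABLING:
            raise RuntimeError("scan requested while not enabling checksums")
        for block_id in self.block_ids:
            self.scanned.add(block_id)  # a real scan would rewrite the block
        self.state = ChecksumState.ON

    def disable(self):
        self.scanned.clear()
        self.state = ChecksumState.OFF

    def should_verify(self):
        # Verification only starts once the full scan has completed.
        return self.state is ChecksumState.ON
```

In this model, switching checksums off and on again always forces a fresh full scan before any block is verified, which is what rules out the false positives described above.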

3. Pages with checksums set need to have a version marking to show
that they are a later version of the page layout. That version number
needs to be extensible to many later versions. Pages of multiple
versions need to exist within the server to allow simple upgrades and
migration.

4. Checksums that are dependent upon a bit setting on the block are
somewhat fragile. Requests have been made to add bits in certain
positions and also to remove them again. No set of bits seems to
please everyone.

(3) and (4) are in conflict with each other, but there is a solution.
We mark the block with a version number, but we don't make the
checking dependent upon the version number. We simply avoid making any
checks until the command to scan all blocks is complete, per point
(2). That way we need to use 1 flag bit to mark the new version and
zero flag bits to indicate checks should happen.
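
To make the difference concrete, here is a rough Python sketch comparing a check that depends on a bit in the block with one that depends only on cluster-wide state. Everything in it is an assumption made for illustration: the 64-byte toy page, the field offsets, the FLAG_NEW_VERSION bit and the use of Fletcher-16 as a stand-in checksum. None of it reflects the real page header or any proposed algorithm.

```python
PAGE_SIZE = 64           # toy size, far smaller than a real block
CHECKSUM_OFFSET = 0      # bytes 0..1 hold the checksum (toy layout)
FLAGS_OFFSET = 2         # byte 2 holds flag bits (toy layout)
FLAG_NEW_VERSION = 0x01  # hypothetical "new page layout" marker


def fletcher16(data: bytes) -> int:
    """Fletcher-16, used only as a stand-in 16-bit checksum."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


def page_checksum(page: bytes) -> int:
    # Cover everything except the checksum field itself.
    return fletcher16(page[CHECKSUM_OFFSET + 2:])


def stored_checksum(page: bytes) -> int:
    return int.from_bytes(page[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2], "little")


def stamp(page: bytearray) -> None:
    # Mark the page as new-version, then store its checksum.
    page[FLAGS_OFFSET] |= FLAG_NEW_VERSION
    value = page_checksum(bytes(page))
    page[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = value.to_bytes(2, "little")


def verify_flag_dependent(page: bytes) -> bool:
    # Fragile: a page whose marker bit is clear is never checked.
    if not page[FLAGS_OFFSET] & FLAG_NEW_VERSION:
        return True
    return stored_checksum(page) == page_checksum(page)


def verify_flag_independent(page: bytes, checks_enabled: bool) -> bool:
    # The decision to check comes from cluster-wide state, not from the page.
    if not checks_enabled:
        return True
    return stored_checksum(page) == page_checksum(page)
```

If corruption clears the marker bit on a stamped page, verify_flag_dependent skips the check and reports the page as fine, while verify_flag_independent with checks enabled still reports a mismatch.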

(Various other permutations of solutions for (2), (3), (4) have been
discussed and may also still be open.)

5. The part of the page header that can be used as a checksum has been
disputed. The 16 bits dedicated to a version number seem to be the
least useful consecutive 2 bytes of data in the page header, which
makes them the obvious candidate. The checksum can't be smaller than
16 bits because that wouldn't be an effective checksum for
database blocks. We might prefer 32 bits, but that would require use
of some other parts of the page header and possibly split that into
two parts. Splitting the checksum into 2 parts will cause the code to
be more complex and fragile.
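
As a rough illustration of the trade-off in point 5, the Python sketch below stores a 16-bit checksum in one field and, for comparison, a 32-bit checksum split across two non-adjacent 16-bit fields. The 8-byte toy header and all offsets are hypothetical and do not correspond to the real page header.

```python
import struct

HEADER_SIZE = 8        # toy header, not the real page header
CHECKSUM16_OFFSET = 0  # hypothetical single 2-byte slot
LOW_HALF_OFFSET = 0    # hypothetical slot for the low 16 bits
HIGH_HALF_OFFSET = 6   # hypothetical, non-adjacent slot for the high 16 bits


def put_checksum16(header: bytearray, value: int) -> None:
    # One field, one write.
    struct.pack_into("<H", header, CHECKSUM16_OFFSET, value & 0xFFFF)


def get_checksum16(header: bytes) -> int:
    return struct.unpack_from("<H", header, CHECKSUM16_OFFSET)[0]


def put_checksum32_split(header: bytearray, value: int) -> None:
    # Two fields that must always be written and read together.
    struct.pack_into("<H", header, LOW_HALF_OFFSET, value & 0xFFFF)
    struct.pack_into("<H", header, HIGH_HALF_OFFSET, (value >> 16) & 0xFFFF)


def get_checksum32_split(header: bytes) -> int:
    low = struct.unpack_from("<H", header, LOW_HALF_OFFSET)[0]
    high = struct.unpack_from("<H", header, HIGH_HALF_OFFSET)[0]
    return (high << 16) | low
```

Assuming a well-mixed checksum, where a corrupted block is equally likely to produce any checksum value, a 16-bit checksum misses roughly 1 corruption in 65,536 and a 32-bit checksum roughly 1 in 4,294,967,296. The split variant buys that extra strength at the cost of two header fields whose meaning now depends on each other.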

6. Performance impacts. Measured to be a small regression.

7. Hint bit setting requires WAL logging. The worst case for that
would be setting hints on newly loaded tables. Work has been done on
other patches to remove that case. If those don't fly, this would be a
cost paid by those that wish to take advantage of this feature.

If there are other points I've missed for whatever reason, please add
them here again for clarity.

My own assessment of the above is that the checksum feature can be
added to 9.2, as long as we agree on the changes above, proceed to
implement them, and no further serious problems emerge.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
