Re: Enabling Checksums

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enabling Checksums
Date: 2012-11-13 10:12:58
Message-ID: 50A21D2A.60703@bluegap.ch
Lists: pgsql-hackers

On 11/13/2012 01:22 AM, Greg Smith wrote:
> Once you accept that eventually there need to be online conversion
> tools, there needs to be some easy way to distinguish which pages have
> been processed for several potential implementations.

Agreed. What I'm saying is that this identification doesn't need to be
as fine-grained as a per-page bit. A single "horizon" or "border" is
enough, given an ordering of relations (for example by OID) and an
ordering of pages within each relation (by block number).
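
To illustrate the horizon idea, here is a minimal sketch in Python (not
PostgreSQL code). The names PagePos and is_checksummed are hypothetical,
and it assumes conversion proceeds in ascending (OID, block number)
order, so every page strictly before the horizon has been checksummed.

```python
from typing import NamedTuple, Optional


class PagePos(NamedTuple):
    """Position of a page in the assumed global conversion order."""
    rel_oid: int   # relations are ordered by OID
    block_no: int  # pages are ordered by block number within a relation


def is_checksummed(page: PagePos, horizon: Optional[PagePos]) -> bool:
    """Return True if the page lies strictly before the horizon."""
    if horizon is None:
        # Assumption: None means conversion has not started yet.
        return False
    # Tuples compare lexicographically: OID first, then block number.
    return page < horizon


horizon = PagePos(rel_oid=16385, block_no=100)
print(is_checksummed(PagePos(16384, 5000), horizon))  # earlier relation
print(is_checksummed(PagePos(16385, 100), horizon))   # the horizon page itself
```

The whole conversion state is one (OID, block number) pair, rather than
a bit on every page.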

> All of the table-based checksum enabling ideas ...

This is not really one - it doesn't allow per-table switching. It's just
meant to be a more compact way of representing which pages have been
checksummed and which have not.

> I'm thinking of this in some ways like the way creation of a new (but
> not yet valid) foreign key works. Once that's active, new activity is
> immediately protected moving forward. And eventually there's this
> cleanup step needed, one that you can inch forward over a few days.

I understand that. However, I question whether users really care. If
corruption is detected, the clever DBA tells his trainee to immediately
check the file system and disk subsystem, no matter whether the
corruption was on old or new data.

You have a point in that pages with "newer" data are often more likely
to be re-read and thus checked, whereas the checksums written to pages
with old data might not be re-read any time soon. Starting to write
checksums from the end of the relation could mitigate this to some
extent, though.

Also keep in mind the "quietly corrupted after checked once, but still
in the middle of checking a relation" case. Thus a single bit doesn't
really give us the guarantee you ask for. Sure, we can add more than one
bit. And yeah, if done properly, adding more bits exponentially reduces
the likelihood of a corruption inadvertently turning off checksumming
for a page.
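
As a rough sketch of that exponential effect, assume (purely for
illustration) that a corruption leaves the flag bits uniformly random
and that exactly one bit pattern means "no checksum on this page". The
function name below is hypothetical.

```python
def false_disable_probability(num_bits: int) -> float:
    """Chance that uniformly random flag bits equal the single
    'no checksum' pattern, under the assumptions stated above."""
    if num_bits < 1:
        raise ValueError("need at least one flag bit")
    return 2.0 ** -num_bits


for bits in (1, 2, 4, 8):
    print(bits, false_disable_probability(bits))
```

Each additional bit halves the chance under this model. Real
corruption is rarely uniform, so this is an illustration rather than a
guarantee.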

All that said, I'm not opposed to using a few bits of the page header. I
wanted to outline an alternative that I think is viable and less intrusive.

> This is why I think any good solution to this problem needs to
> incorporate restartable conversion.

I fully agree with that.
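
A restartable conversion driven by a single horizon could look roughly
like the sketch below (Python, not PostgreSQL code). The function
convert_step, the write_checksum callback and the page budget are all
hypothetical. The sketch assumes relations are given as a mapping from
OID to page count, and that the caller durably stores the returned
horizon between runs.

```python
from typing import Callable, Dict, Optional, Tuple

Pos = Tuple[int, int]  # (relation OID, block number) of the next unconverted page


def convert_step(
    relations: Dict[int, int],
    horizon: Pos,
    write_checksum: Callable[[int, int], None],
    budget: int,
) -> Optional[Pos]:
    """Checksum up to `budget` pages starting at `horizon`.

    Returns the new horizon, or None once every page is converted.
    """
    done = 0
    for oid in sorted(relations):
        if oid < horizon[0]:
            continue  # relation lies entirely before the horizon
        start = horizon[1] if oid == horizon[0] else 0
        for block in range(start, relations[oid]):
            if done == budget:
                return (oid, block)  # stop here; resume from this page later
            write_checksum(oid, block)
            done += 1
    return None


# Usage: two relations, converted two pages at a time.
relations = {10: 3, 20: 2}
horizon: Optional[Pos] = (10, 0)
while horizon is not None:
    horizon = convert_step(relations, horizon, lambda oid, blk: None, budget=2)
```

After a restart, only the stored horizon is needed to pick up where the
previous run stopped.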

Regards

Markus Wanner
