Re: Protecting against unexpected zero-pages: proposal

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Protecting against unexpected zero-pages: proposal
Date: 2010-11-09 17:06:41
Message-ID: AANLkTi=ypc=nd4opsVntRdM+OvCbcKDkxjeH=oVMizQe@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 9, 2010 at 11:26 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:

>> Huh, this implies that if we did go through all the work of
>> segregating the hint bits and could arrange that they all appear on
>> the same 512-byte sector and if we buffered them so that we were
>> writing the same bits we checksummed then we actually *could* include
>> them in the CRC after all since even a torn page will almost certainly
>> not tear an individual sector.
>
> If there's a torn page then we've crashed, which means we go through crash recovery, which puts a valid page (with valid CRC) back in place from the WAL. What am I missing?

The problem case is where hint-bits have been set. Hint bits have
always been "we don't really care, but we write them".

A torn page on hint-bit-only writes is OK, because with a torn page
(assuming you don't get zeroed pages), you get the old or new chunks
of the complete 8K buffer, but they are identical except for the
hint bits, for which either the old or new state is sufficient.

But with a checksum, a torn page with only hint-bit updates now gets
noticed. Before, it might have happened, but we wouldn't have noticed
or cared.
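
A minimal sketch of that failure mode, not PostgreSQL's actual page layout: it assumes a hypothetical 4-byte CRC stored at the start of an 8K page, 512-byte sectors, and a crash after only the first sector of the new image reached disk. The helper names (`with_checksum`, `checksum_ok`, `torn_write`) are made up for illustration.

```python
import zlib

PAGE_SIZE = 8192
SECTOR_SIZE = 512
CRC_BYTES = 4  # assumption: checksum occupies the first 4 bytes of the page


def with_checksum(body: bytes) -> bytes:
    """Prefix the page body with a CRC-32 computed over that body."""
    return zlib.crc32(body).to_bytes(CRC_BYTES, "little") + body


def checksum_ok(page: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the stored one."""
    stored = int.from_bytes(page[:CRC_BYTES], "little")
    return stored == zlib.crc32(page[CRC_BYTES:])


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash: only the leading sectors of `new` replaced `old`."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


# Two page images that differ only in two "hint bits".
body_old = bytearray(PAGE_SIZE - CRC_BYTES)
body_new = bytearray(body_old)
body_new[100] |= 0x01   # hint bit that lands in the first sector
body_new[5000] |= 0x01  # hint bit that lands in a later sector

old_page = with_checksum(bytes(body_old))
new_page = with_checksum(bytes(body_new))

# The first sector (new CRC, first hint bit) made it out; the rest is old.
torn = torn_write(old_page, new_page, sectors_written=1)

print(checksum_ok(old_page), checksum_ok(new_page), checksum_ok(torn))
```

Every tuple on the torn page carries either its old or its new hint bits, so the contents are usable, but the stored CRC describes the new image and no longer matches the mix that is on disk.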

So, to get checksums, we have to give up a few things:
1) zero-copy writes: we need to buffer the write to get a consistent
checksum (or lock the buffer tight)
2) saving hint bits on an otherwise unchanged page: we either need to
just not write that page, and lose the work the hint bits did, or
write a full-page image of it to WAL, so the torn-page checksum is fixed
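
Point 1 can be sketched as follows. This is only an illustration of the copy-then-checksum idea, not the buffer manager's code; `checksummed_copy` and the 8K `shared` buffer are hypothetical stand-ins for a shared buffer that other backends may still set hint bits on.

```python
import zlib
from typing import Tuple


def checksummed_copy(shared_buffer: bytearray) -> Tuple[bytes, int]:
    """Take a private snapshot of the buffer and checksum the snapshot.

    The bytes handed to the write call are the same bytes the CRC was
    computed over, whatever happens to the shared buffer afterwards.
    """
    snapshot = bytes(shared_buffer)  # the extra copy that zero-copy avoids
    return snapshot, zlib.crc32(snapshot)


shared = bytearray(8192)
snapshot, crc = checksummed_copy(shared)

# A concurrent hint-bit setter touches the shared buffer after the snapshot.
shared[100] |= 0x01

print(zlib.crc32(snapshot) == crc)       # snapshot still matches its CRC
print(zlib.crc32(bytes(shared)) == crc)  # live buffer no longer does
```

The cost is one extra page copy per write; the alternative mentioned above, locking the buffer for the duration of the write, trades that copy for blocked hint-bit setters.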

Both of these are theoretical performance tradeoffs. How badly do we
want to verify on read that it is *exactly* what we thought we wrote?

a.

--
Aidan Van Dyk                                             Create like a god,
aidan(at)highrise(dot)ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
