Re: Checkpoint cost, looks like it is WAL/CRC

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruno Wolff III <bruno(at)wolff(dot)to>, Greg Stark <gsstark(at)mit(dot)edu>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, josh(at)agliodbs(dot)com, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checkpoint cost, looks like it is WAL/CRC
Date: 2005-07-07 15:15:06
Message-ID: 200507071515.j67FF6B07875@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> > SCSI tagged queueing certainly allows 512-byte blocks to be reordered
> > during writes.
>
> Then a torn-page tell-tale is required that will tell us of any change
> to any of the 512-byte sectors that make up a block/page.
>
> Here's an idea:
>
> We read the page that we would have backed up, calc the CRC and write a
> short WAL record with just the CRC, not the block. When we recover we
> re-read the database page, calc its CRC and compare it with the CRC from
> the transaction log. If they differ, we know that the page was torn and
> we know the database needs recovery. (So we calc the CRC when we log AND
> when we recover).
>
> This avoids the need to write full pages, though slightly slows down
> recovery.

Yes, that is a good idea! That torn page thing sounded like a mess, and
I love that we can check them on recovery rather than whenever you
happen to access the page.

What would be great would be to implement this when full_page_writes is
off, _and_ have the page writes happen when the page is written to disk
rather than modified in the shared buffers.

I will add those to the TODO list now. Updated item:

* Eliminate need to write full pages to WAL before page modification
[wal]

Currently, to protect against partial disk page writes, we write
full page images to WAL before they are modified so we can correct any
partial page writes during recovery. These pages can also be
eliminated from point-in-time archive files.

o -Add ability to turn off full page writes
o When off, write CRC to WAL and check file system blocks
on recovery
o Write full pages during file system write and not when
the page is modified in the buffer cache

This allows most full page writes to happen in the background
writer.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-07-07 15:18:14 Re: Checkpoint cost, looks like it is WAL/CRC
Previous Message Andrew Dunstan 2005-07-07 14:30:29 windows regression failure - prepared xacts