Re: Block-level CRC checks

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Chuck McDevitt <cmcdevitt(at)greenplum(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, decibel <decibel(at)decibel(dot)org>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-07 17:54:04
Message-ID: 407d949e0912070954h52525034ka3366d2dd2250e52@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 4, 2009 at 10:47 PM, Chuck McDevitt <cmcdevitt(at)greenplum(dot)com> wrote:
> A curiosity question regarding torn pages:  How does this work on file systems that don't write in-place, but instead always do copy-on-write?
>
> My example would be Sun's ZFS file system (In Solaris & BSD).  Because of its "snapshot & rollback" functionality, it never writes a page in-place, but instead always copies it to another place on disk.  How does this affect the corruption caused by a torn write?
>
> Can we end up with horrible corruption on this type of filesystem where we wouldn't on normal file systems, where we are writing to a previously zeroed area on disk?
>
> Sorry if this is a stupid question... Hopefully somebody can reassure me that this isn't an issue.

It's not a stupid question. We're not 100% sure, but we believe ZFS
doesn't need full page writes because it's immune to torn pages.

I think the idea of ZFS is that a partially written new page is never
visible, because it isn't linked into the tree until it has been
completely written. To me it appears this would depend on the drive
system ordering writes very strictly, and it seems hard to be sure that
is happening. Perhaps this is tied to the tricks they do to avoid
contention on the root. If they issue a write barrier before every root
update, that seems like it should be sufficient to me, but I don't know
ZFS at that level of detail.
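
To make the argument concrete, here is a toy Python model of the idea
described above. It is only a sketch and not how ZFS is implemented: the
names CowStore, in_place_write and crash_after are hypothetical, the
"disk" is an in-memory list, and the write barrier is represented by a
comment rather than a real flush. It contrasts an in-place write that is
interrupted halfway with a copy-on-write update whose root pointer only
moves after the new block is complete.

```python
from typing import Optional


def in_place_write(block: bytearray, data: bytes,
                   crash_after: Optional[int] = None) -> None:
    """Overwrite block byte by byte; crash_after simulates power loss."""
    for i, b in enumerate(data):
        if crash_after is not None and i >= crash_after:
            return  # torn page: a mix of old and new bytes stays visible
        block[i] = b


class CowStore:
    """Toy copy-on-write store (hypothetical): blocks plus one root pointer."""

    def __init__(self, initial: bytes) -> None:
        self.blocks = [bytearray(initial)]  # stand-in for disk blocks
        self.root = 0                       # index of the visible block

    def read(self) -> bytes:
        return bytes(self.blocks[self.root])

    def write(self, data: bytes, crash_after: Optional[int] = None) -> None:
        new_block = bytearray(len(data))
        self.blocks.append(new_block)       # never touch the old block
        new_index = len(self.blocks) - 1
        for i, b in enumerate(data):
            if crash_after is not None and i >= crash_after:
                return  # partial block exists, but the root still points at old data
            new_block[i] = b
        # Assumption: a real system needs a write barrier here, so the
        # new block is durable before the root update reaches the disk.
        self.root = new_index


# In-place: an interrupted write leaves a torn page.
page = bytearray(b"AAAAAAAA")
in_place_write(page, b"BBBBBBBB", crash_after=4)

# Copy-on-write: an interrupted write leaves the old page visible.
store = CowStore(b"AAAAAAAA")
store.write(b"BBBBBBBB", crash_after=4)
```

In this model the safety depends entirely on the root update happening
after the new block is complete, which is the ordering concern raised
above: without a barrier at the commented line, the drive could make the
root update durable first and expose the partial block.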

--
greg
