Re: zero_damaged_pages doesn't work

Lists: pgsql-general
From: David Boreham <david_list(at)boreham(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: zero_damaged_pages doesn't work
Date: 2010-09-27 21:07:24
Message-ID: 4CA1078C.7060901@boreham.org


Is the zero_damaged_pages feature expected to work in 8.3.11?

I have a fair bit of evidence that it doesn't (you get nice messages
saying that the page is being zeroed, but the on-disk data does not
change).
I also see quite a few folk reporting similar findings in various forum
and mailing list posts over the past few years.

I can use dd to zero the on-disk data, but it'd be nice to know
definitively if this feature is expected to work, and if so under
what conditions it might not.
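
For illustration (a sketch, not from the original message): zeroing one page by hand amounts to overwriting a single block-sized range of the file in place, roughly what dd does with bs=8192, seek=N, count=1 and conv=notrunc. The Python below assumes the default 8192-byte PostgreSQL page size; the helper name zero_block is hypothetical, and the demonstration runs against a scratch file, not a real relation file.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed: PostgreSQL's default page size


def zero_block(path, block_number, block_size=BLOCK_SIZE):
    """Overwrite one block of `path` with zero bytes, in place."""
    offset = block_number * block_size
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        if offset + block_size > f.tell():
            raise ValueError("block %d lies beyond end of file" % block_number)
        f.seek(offset)
        f.write(b"\x00" * block_size)
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros actually reach the disk


# Demonstration on a three-block scratch file filled with 0xAA bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xAA" * (3 * BLOCK_SIZE))
    scratch = tmp.name

zero_block(scratch, 1)

with open(scratch, "rb") as f:
    data = f.read()
os.remove(scratch)

# Split the file back into blocks so each one can be inspected.
blocks = [data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(3)]
```

Only block 1 is rewritten and the file length is unchanged. On a real cluster this would only make sense with the server stopped and a copy of the file kept aside.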

fwiw I am enabling zero_damaged_pages using a set command
in a client session, not in the server's config file. The symptoms
I observe are that a query that previously errored out due to
a bad page header error will succeed when zero_damaged_pages
is enabled, and the log says that the page is being zeroed.
However the same query run subsequently without zero_damaged_pages
will again fail, and pg_filedump shows that the on-disk data
hasn't changed.

Thanks.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: David Boreham <david_list(at)boreham(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: zero_damaged_pages doesn't work
Date: 2010-09-27 22:40:28
Message-ID: 1285627228.32386.5.camel@jdavis-ux.asterdata.local

On Mon, 2010-09-27 at 15:07 -0600, David Boreham wrote:
> Is the zero_damaged_pages feature expected to work in 8.3.11?
>
> I have a fair bit of evidence that it doesn't (you get nice messages
> saying that the page is being zeroed, but the on-disk data does not
> change).
> I also see quite a few folk reporting similar findings in various forum
> and mailing list posts over the past few years.
>
> I can use dd to zero the on-disk data, but it'd be nice to know
> definitively if this feature is expected to work, and if so under
> what conditions it might not.

It does zero the page in the buffer, but I don't think it marks it as
dirty. So, it never really makes it to disk as all-zeros.

> fwiw I am enabling zero_damaged_pages using a set command
> in a client session, not in the server's config file. The symptoms
> I observe are that a query that previously errored out due to
> a bad page header error will succeed when zero_damaged_pages
> is enabled, and the log says that the page is being zeroed.
> However the same query run subsequently without zero_damaged_pages
> will again fail, and pg_filedump shows that the on-disk data
> hasn't changed.

The subsequent queries may succeed if the page is still in the buffer
cache.
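
For illustration (a sketch, not from the original message): the behavior described above can be modeled with a toy buffer cache in Python. Everything here is an assumption made for the example, including the class ToyBufferCache, the helper header_ok with its validity rule, and the 8192-byte page size. It is not PostgreSQL's actual buffer manager.

```python
PAGE_SIZE = 8192


def header_ok(page):
    # Toy validity rule (an assumption): an all-zero page counts as a valid
    # empty page, otherwise the page must start with the marker b"OK".
    return not any(page) or page[:2] == b"OK"


class ToyBufferCache:
    def __init__(self, disk):
        self.disk = disk      # block number -> bytes; stands in for the data file
        self.buffers = {}     # block number -> (page bytes, dirty flag)

    def read_page(self, block, zero_damaged_pages=False):
        if block in self.buffers:
            return self.buffers[block][0]      # cache hit: disk is not re-read
        page = self.disk[block]
        if not header_ok(page):
            if not zero_damaged_pages:
                raise IOError("invalid page header in block %d" % block)
            page = bytes(PAGE_SIZE)            # zeroed in memory only
        self.buffers[block] = (page, False)    # loaded clean, so never written back
        return page

    def evict(self, block):
        page, dirty = self.buffers.pop(block)
        if dirty:                              # clean buffers are simply dropped
            self.disk[block] = page


def try_read(cache, block, **kwargs):
    try:
        cache.read_page(block, **kwargs)
        return "ok"
    except IOError:
        return "error"


disk = {0: b"\xff" * PAGE_SIZE}                 # block 0 is damaged on disk
cache = ToyBufferCache(disk)

results = [
    try_read(cache, 0),                           # bad header, setting off
    try_read(cache, 0, zero_damaged_pages=True),  # zeroed in the buffer
    try_read(cache, 0),                           # setting off, page still cached
]
cache.evict(0)                                    # buffer was clean, nothing written
results.append(try_read(cache, 0))                # read from disk again
disk_unchanged = disk[0] == b"\xff" * PAGE_SIZE
```

In this model the reads give error, ok, ok, error: the query works while the zeroed page stays cached and fails again once the page has to be re-read from disk, which never changed.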

zero_damaged_pages is not meant as a recovery tool. It's meant to allow
you to pg_dump whatever data is not damaged, so that you can restore
into a fresh location.

Regards,
Jeff Davis


From: David Boreham <david_list(at)boreham(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: zero_damaged_pages doesn't work
Date: 2010-09-27 22:45:32
Message-ID: 4CA11E8C.7040104@boreham.org

On 9/27/2010 4:40 PM, Jeff Davis wrote:
> It does zero the page in the buffer, but I don't think it marks it as
> dirty. So, it never really makes it to disk as all-zeros.

Aha! This is certainly consistent with the observed behavior.

> zero_damaged_pages is not meant as a recovery tool. It's meant to allow
> you to pg_dump whatever data is not damaged, so that you can restore
> into a fresh location.

It'd be useful for future generations if this were included in the doc.

The latest version:
http://www.postgresql.org/docs/9.0/static/runtime-config-developer.html
still talks about destroying data (which at least to me implies a
persistent change to the on-disk bits) and fails to mention that the
zeroing only occurs in the page pool sans write-back.

If it helps, I'd be happy to contribute some time to fix up the docs,
but imho a simple copy/paste of your text above would be sufficient.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Boreham <david_list(at)boreham(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: zero_damaged_pages doesn't work
Date: 2010-09-27 22:53:42
Message-ID: 6752.1285628022@sss.pgh.pa.us

David Boreham <david_list(at)boreham(dot)org> writes:
> On 9/27/2010 4:40 PM, Jeff Davis wrote:
>> zero_damaged_pages is not meant as a recovery tool. It's meant to allow
>> you to pg_dump whatever data is not damaged, so that you can restore
>> into a fresh location.

> It'd be useful for future generations if this were included in the doc.

> The latest version:
> http://www.postgresql.org/docs/9.0/static/runtime-config-developer.html
> still talks about destroying data (which at least to me implies a
> persistent change to the on-disk bits) and fails to mention that the
> zeroing only occurs in the page pool sans write-back.

The reason it tells you that data will be destroyed is that that could
very well happen. If the system decides to put new data into what will
appear to it to be an empty page, then the damaged data on disk will be
overwritten, and then there's no hope of recovering anything.
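
For illustration (a sketch, not from the original message): the scenario above can be shown with a toy write-back model in Python. The names load_zeroed, insert_row and flush, the dictionaries standing in for disk and cache, and the 8192-byte page size are all assumptions made for the example, not PostgreSQL internals.

```python
PAGE_SIZE = 8192

disk = {0: b"\xff" * PAGE_SIZE}   # block 0 holds a damaged page
cache = {}                        # block -> {"data": bytearray, "dirty": bool}


def load_zeroed(block):
    """Stand-in for reading a damaged page with zero_damaged_pages on."""
    cache[block] = {"data": bytearray(PAGE_SIZE), "dirty": False}


def insert_row(block, row):
    """Put new data into a page that looks empty, and mark it dirty."""
    buf = cache[block]
    buf["data"][:len(row)] = row
    buf["dirty"] = True


def flush(block):
    """Write the buffer back only if it is dirty."""
    buf = cache[block]
    if buf["dirty"]:
        disk[block] = bytes(buf["data"])
        buf["dirty"] = False


load_zeroed(0)
flush(0)
# Zeroing alone leaves the damaged bytes on disk.
untouched = disk[0] == b"\xff" * PAGE_SIZE

insert_row(0, b"new tuple")
flush(0)
# Once new data lands in the "empty" page, the damaged bytes are gone.
overwritten = (disk[0][:9] == b"new tuple"
               and disk[0][9:] == bytes(PAGE_SIZE - 9))
```

The first flush writes nothing because the zeroed buffer is clean. The second replaces the whole damaged block on disk, after which nothing of the old contents can be recovered.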

Like Jeff said, this is not a recovery tool. It's certainly not meant
to be something that you keep turned on for any length of time, and so
the possibility of repeat messages is really not a design consideration
at all.

regards, tom lane


From: David Boreham <david_list(at)boreham(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: zero_damaged_pages doesn't work
Date: 2010-09-27 22:58:56
Message-ID: 4CA121B0.9000802@boreham.org

On 9/27/2010 4:53 PM, Tom Lane wrote:
> The reason it tells you that data will be destroyed is that that could
> very well happen. If the system decides to put new data into what will
> appear to it to be an empty page, then the damaged data on disk will be
> overwritten, and then there's no hope of recovering anything.
>
> Like Jeff said, this is not a recovery tool. It's certainly not meant
> to be something that you keep turned on for any length of time, and so
> the possibility of repeat messages is really not a design consideration
> at all.

No argument with any of this, although I'm not the intended audience for
these warnings -- I know what I'm doing ;)

I'm not sure, though, whether you're disagreeing with my
suggestion that the documentation be improved/corrected.
Is that the case? (If so, then I will argue.)


From: David Boreham <david_list(at)boreham(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: zero_damaged_pages doesn't work
Date: 2010-09-27 23:13:52
Message-ID: 4CA12530.1040905@boreham.org

On 9/27/2010 4:53 PM, Tom Lane wrote:
> The reason it tells you that data will be destroyed is that that could
> very well happen.

Re-parsing this, I think there was a mis-communication:

I'm not at all suggesting that the doc should _not_ say that data will
be corrupted. I'm suggesting that in addition to what it currently says,
it also should say that the on-disk data won't be changed by the page
zeroing mode.

In my searching I found countless people over the past few years who had
been similarly confused into believing that it would write back the
zeroed page to disk.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: David Boreham <david_list(at)boreham(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: zero_damaged_pages doesn't work
Date: 2011-02-01 21:44:51
Message-ID: 201102012144.p11Lipj17478@momjian.us

David Boreham wrote:
> On 9/27/2010 4:53 PM, Tom Lane wrote:
> > The reason it tells you that data will be destroyed is that that could
> > very well happen.
>
> Re-parsing this, I think there was a mis-communication:
>
> I'm not at all suggesting that the doc should _not_ say that data will
> be corrupted. I'm suggesting that in addition to what it currently says,
> it also should say that the on-disk data won't be changed by the page
> zeroing mode.
>
> In my searching I found countless people over the past few years who had
> been similarly confused into believing that it would write back the
> zeroed page to disk.

Based on this discussion from September, I have applied the attached
documentation patch to clarify that pages zeroed by zero_damaged_pages
are not forced to disk, and to explain when to set this parameter off
again.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment       Content-Type  Size
/rtmp/zero.diff  text/x-diff   2.3 KB