read() returns ERANGE in Mac OS X

Lists:	pgsql-hackers

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	read() returns ERANGE in Mac OS X
Date:	2012-05-16 13:39:13
Message-ID:	1337175236-sup-4510@alvh.no-ip.org

Hi,

We just came across a situation where a corrupted HFS+ filesystem
appears to return ERANGE on a customer machine. Our first reaction was
to turn zero_damaged_pages on to allow taking a pg_dump backup of the
database, but surprisingly this does not work. A quick glance at the
code shows the reason:

    if (nbytes != BLCKSZ)
    {
        if (nbytes < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read block %u in file \"%s\": %m",
                            blocknum, FilePathName(v->mdfd_vfd))));

        /*
         * Short read: we are at or past EOF, or we read a partial block at
         * EOF. Normally this is an error; upper levels should never try to
         * read a nonexistent block. However, if zero_damaged_pages is ON or
         * we are InRecovery, we should instead return zeroes without
         * complaining. This allows, for example, the case of trying to
         * update a block that was later truncated away.
         */
        if (zero_damaged_pages || InRecovery)
            MemSet(buffer, 0, BLCKSZ);
        else
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                            blocknum, FilePathName(v->mdfd_vfd),
                            nbytes, BLCKSZ)));
    }

Note that zero_damaged_pages only enters the picture if it's a short
read, not if the read actually fails completely.
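The ordering of the two checks is what matters here. As a minimal standalone sketch (not PostgreSQL code: the function name, the outcome enum, and the fixed BLCKSZ value are assumptions made for illustration), the same decision can be written so that it compiles outside the backend:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define BLCKSZ 8192             /* assumed block size for this sketch */

/* Hypothetical outcome labels; these are not PostgreSQL identifiers. */
typedef enum
{
    READ_OK,                    /* a full block was read */
    READ_ZEROED,                /* short read, buffer replaced with zeroes */
    READ_SHORT_ERROR,           /* short read, reported as an error */
    READ_FAILED                 /* read() returned -1, errno holds the cause */
} ReadOutcome;

/*
 * Mirrors the order of the checks quoted above: a failed read is handled
 * before zero_damaged_pages is ever consulted.
 */
static ReadOutcome
classify_block_read(ssize_t nbytes, int zero_damaged_pages, int in_recovery,
                    char *buffer)
{
    assert(buffer != NULL);

    if (nbytes == BLCKSZ)
        return READ_OK;

    if (nbytes < 0)
        return READ_FAILED;     /* the setting has no effect on this path */

    if (zero_damaged_pages || in_recovery)
    {
        memset(buffer, 0, BLCKSZ);
        return READ_ZEROED;
    }

    return READ_SHORT_ERROR;
}
```

With nbytes set to -1 the function returns READ_FAILED and leaves the buffer untouched whatever the setting is, which matches the behavior seen on the customer machine.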

Is this by design, or is this just an oversight?

See
http://lists.gnu.org/archive/html/rdiff-backup-users/2007-12/msg00053.html

I don't yet have any evidence that the filesystem is actually corrupt,
but the error message from the kernel is "Result out of range", which is
not documented to be possible on read() in Mac OS X.

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-16 13:51:26
Message-ID:	24254.1337176286@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> We just came across a situation where a corrupted HFS+ filesystem
> appears to return ERANGE on a customer machine. Our first reaction was
> to turn zero_damaged_pages on to allow taking a pg_dump backup of the
> database, but surprisingly this does not work. A quick glance at the
> code shows the reason:
> ...
> Note that zero_damaged_pages only enters the picture if it's a short
> read, not if the read actually fails completely.

> Is this by design, or is this just an oversight?

It is by design, in that the only contemplated case was truncated-away
pages. I'm pretty hesitant to consider allowing arbitrary kernel errors
to be ignored here ...

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-16 15:38:16
Message-ID:	1337182602-sup-8471@alvh.no-ip.org

Excerpts from Tom Lane's message of Wed May 16 09:51:26 -0400 2012:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > We just came across a situation where a corrupted HFS+ filesystem
> > appears to return ERANGE on a customer machine. Our first reaction was
> > to turn zero_damaged_pages on to allow taking a pg_dump backup of the
> > database, but surprisingly this does not work. A quick glance at the
> > code shows the reason:
> > ...
> > Note that zero_damaged_pages only enters the picture if it's a short
> > read, not if the read actually fails completely.
>
> > Is this by design, or is this just an oversight?
>
> It is by design, in that the only contemplated case was truncated-away
> pages. I'm pretty hesitant to consider allowing arbitrary kernel errors
> to be ignored here ...

Understood. This is a bit at odds with what the docs say about
the feature though:

Detection of a damaged page header normally causes PostgreSQL to report an
error, aborting the current transaction. Setting zero_damaged_pages to on
causes the system to instead report a warning, zero out the damaged page in
memory, and continue processing. This behavior will destroy data, namely all
the rows on the damaged page. However, it does allow you to get past the error
and retrieve rows from any undamaged pages that might be present in the table.
It is useful for recovering data if corruption has occurred due to a hardware
or software error. You should generally not set this on until you have given up
hope of recovering data from the damaged pages of a table. Zeroed-out pages are
not forced to disk so it is recommended to recreate the table or the index
before turning this parameter off again. The default setting is off, and it can
only be changed by a superuser.
http://www.postgresql.org/docs/9.1/static/runtime-config-developer.html

Maybe I just need another setting, zero_pages_damaged_at_the_os_level or
something like that.

... just kidding.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-17 13:08:26
Message-ID:	16EA3A6E-B527-4FA6-87D3-1465BD02B948@phlo.org

On May 16, 2012, at 15:51, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
>> We just came across a situation where a corrupted HFS+ filesystem
>> appears to return ERANGE on a customer machine. Our first reaction was
>> to turn zero_damaged_pages on to allow taking a pg_dump backup of the
>> database, but surprisingly this does not work. A quick glance at the
>> code shows the reason:
>> ...
>> Note that zero_damaged_pages only enters the picture if it's a short
>> read, not if the read actually fails completely.
>
>> Is this by design, or is this just an oversight?
>
> It is by design, in that the only contemplated case was truncated-away
> pages. I'm pretty hesitant to consider allowing arbitrary kernel errors
> to be ignored here …

Maybe we should have zero_missing_pages which would only zero on short reads,
and zero_damaged_pages which would zero on all IO errors?

Or we could have zero_damaged_pages zero only if read() reports EIO, and then add
any platform-specific additional error codes as we learn about them. ERANGE
on darwin would be the first such addition.

In any case, it seems to me that at least EIO should trigger zeroing, since
that is presumably what you'd get on a filesystem with integrated checksums
like ZFS.
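A minimal sketch of the second option, assuming a hypothetical helper (errno_means_damaged_page is not an existing PostgreSQL function) and assuming the __APPLE__ macro is an acceptable way to detect darwin:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the errno left behind by a failed
 * read() should be treated as "this page is damaged" rather than as an
 * arbitrary kernel error.
 */
static bool
errno_means_damaged_page(int err)
{
    assert(err >= 0);           /* errno values are never negative */

    switch (err)
    {
        case EIO:               /* generic I/O error */
            return true;
#ifdef __APPLE__
        case ERANGE:            /* the darwin case reported in this thread */
            return true;
#endif
        default:
            return false;
    }
}
```

Every other error code would still be reported as an error, so the list can only grow as specific platform behavior is confirmed.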

best regards,
Florian Pflug

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-18 21:18:57
Message-ID:	1337375767-sup-8837@alvh.no-ip.org

Excerpts from Florian Pflug's message of Thu May 17 09:08:26 -0400 2012:
> On May 16, 2012, at 15:51, Tom Lane wrote:

> > It is by design, in that the only contemplated case was truncated-away
> > pages. I'm pretty hesitant to consider allowing arbitrary kernel errors
> > to be ignored here …
>
> Maybe we should have zero_missing_pages which would only zero on short reads,
> and zero_damaged_pages which would zero on all IO errors?
>
> Or we could have zero_damaged_pages zero only if read() reports EIO, and then add
> any platform-specific additional error codes as we learn about them. ERANGE
> on darwin would be the first such addition.

Seems to me that we could make zero_damaged_pages an enum. The default
value of "on" would only catch truncated-away pages; another value would
also capture kernel-level error conditions.

The thing is, once you start getting kernel-level errors you're pretty
much screwed and there's no way to just recover whatever data is
recoverable.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-19 07:48:51
Message-ID:	002767BE-2F61-47E9-9B22-5B3DAE665376@phlo.org

On May 18, 2012, at 23:18, Alvaro Herrera wrote:
> Excerpts from Florian Pflug's message of Thu May 17 09:08:26 -0400 2012:
>> On May 16, 2012, at 15:51, Tom Lane wrote:
>
>>> It is by design, in that the only contemplated case was truncated-away
>>> pages. I'm pretty hesitant to consider allowing arbitrary kernel errors
>>> to be ignored here …
>>
>> Maybe we should have zero_missing_pages which would only zero on short reads,
>> and zero_damaged_pages which would zero on all IO errors?
>>
>> Or we could have zero_damaged_pages zero only if read() reports EIO, and then add
>> any platform-specific additional error codes as we learn about them. ERANGE
>> on darwin would be the first such addition.
>
> Seems to me that we could make zero_damaged_pages an enum. The default
> value of "on" would only catch truncated-away pages; another value would
> also capture kernel-level error conditions.

Yeah, an enum would be nicer than an additional GUC. I kinda keep forgetting
that we have those. Though to bikeshed, the GUC should probably be just called
'zero_pages' and take the values 'never', 'missing', 'unreadable' ;-)
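As a rough sketch of how such a three-valued setting could drive the decision (the enum, its member names, and should_zero_page are hypothetical; only the values 'never', 'missing' and 'unreadable' come from this thread):

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical in-memory form of a 'zero_pages' enum setting. */
typedef enum
{
    ZERO_PAGES_NEVER,           /* always report an error */
    ZERO_PAGES_MISSING,         /* zero only on short reads */
    ZERO_PAGES_UNREADABLE       /* also zero when read() itself fails */
} ZeroPagesMode;

/*
 * Decide whether to hand back a zeroed page for a read that did not
 * return a full block. nbytes is the return value of read().
 */
static bool
should_zero_page(ZeroPagesMode mode, ssize_t nbytes, ssize_t blcksz)
{
    assert(nbytes != blcksz);   /* full reads never reach this decision */

    if (nbytes < 0)
        return mode == ZERO_PAGES_UNREADABLE;   /* kernel-level failure */

    return mode != ZERO_PAGES_NEVER;            /* short read */
}
```

Here 'missing' corresponds to the short-read case that the current code already handles, and 'unreadable' adds the failed-read case on top of it.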

> The thing is, once you start getting kernel-level errors you're pretty
> much screwed and there's no way to just recover whatever data is
> recoverable.

I thought your initial gripe was precisely that you got a kernel-level error,
yet the filesystem was still in pretty good shape?

Which actually seemed quite likely to me - the cause could be, for example,
simply a single bad block. Or a filesystem-level checksum error if you're using
a filesystem with built-in integrity checks.

best regards,
Florian Pflug

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-20 18:12:07
Message-ID:	1337537195-sup-5129@alvh.no-ip.org

Excerpts from Florian Pflug's message of Sat May 19 03:48:51 -0400 2012:
>
> On May 18, 2012, at 23:18, Alvaro Herrera wrote:
> > Excerpts from Florian Pflug's message of Thu May 17 09:08:26 -0400 2012:

> > Seems to me that we could make zero_damaged_pages an enum. The default
> > value of "on" would only catch truncated-away pages; another value would
> > also capture kernel-level error conditions.
>
> Yeah, an enum would be nicer than an additional GUC. I kinda keep forgetting
> that we have those. Though to bikeshed, the GUC should probably be just called
> 'zero_pages' and take the values 'never', 'missing', 'unreadable' ;-)

Sounds reasonable to me ..

> > The thing is, once you start getting kernel-level errors you're pretty
> > much screwed and there's no way to just recover whatever data is
> > recoverable.
>
> I thought your initial gripe was precisely that you got a kernel-level error,
> yet the filesystem was still in pretty good shape?

Uhm. I'm not really sure what the actual problem is, but I think it is
precisely a corrupted filesystem.

> Which actually seemed quite likely to me - the cause could be, for example,
> simply a single bad block. Or a filesystem-level checksum error if you're using
> a filesystem with built-in integrity checks.

I guess ERANGE is the sort of thing that's not quite expected here -- I
mean you might get EIO if there's an I/O problem such as a checksum
error, but ERANGE suggests to me that the kernel might be leaking some
internal error that's not supposed to be thrown to the user.

In any case I don't think we can distinguish kernel-level problems such
as this one, from filesystem level problems. I mean, they all come from
the kernel, as far as we're concerned.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-21 15:54:15
Message-ID:	CA+Tgmoa74ieK=aEQ_x33cHrBj_2G7WfwAHzWiBciA2VSqSdibQ@mail.gmail.com

On Sun, May 20, 2012 at 2:12 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
>> Yeah, an enum would be nicer than an additional GUC. I kinda keep forgetting
>> that we have those. Though to bikeshed, the GUC should probably be just called
>> 'zero_pages' and take the values 'never', 'missing', 'unreadable' ;-)
>
> Sounds reasonable to me ..

It seems like it would be nicer to have a setting that somehow makes
the system disregard errors and soldier on rather than actively
destroying your data. Not that I have an exact design in mind, but
zero_damaged_pages is a really fast way to destroy your data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-21 16:23:28
Message-ID:	17911.1337617408@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, May 20, 2012 at 2:12 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
>>> Yeah, an enum would be nicer than an additional GUC. I kinda keep forgetting
>>> that we have those. Though to bikeshed, the GUC should probably be just called
>>> 'zero_pages' and take the values 'never', 'missing', 'unreadable' ;-)

>> Sounds reasonable to me ..

> It seems like it would be nicer to have a setting that somehow makes
> the system disregard errors and soldier on rather than actively
> destroying your data. Not that I have an exact design in mind, but
> zero_damaged_pages is a really fast way to destroy your data.

If we were sure that the kernel error was permanent, then this argument
would be moot: the data is gone already. The scary thought here is that
it might be a transient error, such as a not-always-repeatable kernel
bug. In that case, zeroing the page would indeed lose data that had
been recoverable before.

I'm not entirely sure how we would "soldier on" though; there is no good
reason to think that the kernel has loaded any data at all into
userspace when read() fails.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-21 16:27:17
Message-ID:	CA+Tgmob848Rc0oG0BAc_BvS1J6HqZAx-mey+eCu4jURKV5B96w@mail.gmail.com

On Mon, May 21, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> If we were sure that the kernel error was permanent, then this argument
> would be moot: the data is gone already. The scary thought here is that
> it might be a transient error, such as a not-always-repeatable kernel
> bug. In that case, zeroing the page would indeed lose data that had
> been recoverable before.

Yeah, and in fact I think that's probably not a terribly remote
scenario. Also, if you're running on dying hardware, you really do
NOT want to force the kernel to write a whole bunch of pages back to
the dying disk in the midst of trying to pg_dump it before it falls
over. You just want to read what you can of what's there now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-21 16:43:34
Message-ID:	18365.1337618614@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, May 21, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If we were sure that the kernel error was permanent, then this argument
>> would be moot: the data is gone already. The scary thought here is that
>> it might be a transient error, such as a not-always-repeatable kernel
>> bug. In that case, zeroing the page would indeed lose data that had
>> been recoverable before.

> Yeah, and in fact I think that's probably not a terribly remote
> scenario. Also, if you're running on dying hardware, you really do
> NOT want to force the kernel to write a whole bunch of pages back to
> the dying disk in the midst of trying to pg_dump it before it falls
> over. You just want to read what you can of what's there now.

Hm? zero_damaged_pages doesn't cause the buffer to be marked dirty,
so I dunno where these alleged writes are coming from.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-21 17:59:02
Message-ID:	CA+TgmoaS31wHFKitYkk9dYpP2pqFxwZzXLeQAxgNgNGQ0Mg25w@mail.gmail.com

On Mon, May 21, 2012 at 12:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, May 21, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If we were sure that the kernel error was permanent, then this argument
>>> would be moot: the data is gone already. The scary thought here is that
>>> it might be a transient error, such as a not-always-repeatable kernel
>>> bug. In that case, zeroing the page would indeed lose data that had
>>> been recoverable before.
>
>> Yeah, and in fact I think that's probably not a terribly remote
>> scenario. Also, if you're running on dying hardware, you really do
>> NOT want to force the kernel to write a whole bunch of pages back to
>> the dying disk in the midst of trying to pg_dump it before it falls
>> over. You just want to read what you can of what's there now.
>
> Hm? zero_damaged_pages doesn't cause the buffer to be marked dirty,
> so I dunno where these alleged writes are coming from.

I'm not sure either, but I'm pretty sure I've seen at least one case
where turning it on caused a whole lotta data to disappear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-21 18:20:22
Message-ID:	3442.1337624422@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, May 21, 2012 at 12:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hm? zero_damaged_pages doesn't cause the buffer to be marked dirty,
>> so I dunno where these alleged writes are coming from.

> I'm not sure either, but I'm pretty sure I've seen at least one case
> where turning it on caused a whole lotta data to disappear.

[ thinks about that for awhile... ] The only plausible way for a zeroed
heap page to be silently overwritten by the system is if lazy_scan_heap()
reclaims it for re-use during an autovacuum. I was going to say that
autovacuum.c is careful to force zero_damaged_pages OFF to forestall
exactly this scenario, but on reflection I realize there is a hole in
that defense: the broken-on-disk page could be read in by some other
backend that has zero_damaged_pages ON, then left in shared buffers,
and then an autovacuum scan could find it and reclaim it.

I wonder whether we should dedicate a buffer status bit to show that
the buffer has been zeroed by zero_damaged_pages and thus doesn't
reflect what's on disk. Then we could teach autovacuum to not overwrite
such pages. On the other hand, such an approach would mean that you
couldn't use vacuum to forcibly clean up broken pages, so while this
might be "safer" it's not clear it makes things more useful.

regards, tom lane

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: read() returns ERANGE in Mac OS X
Date:	2012-05-21 22:08:51
Message-ID:	1CDD0EA0-5534-4801-8C88-96B46D6A96C2@phlo.org

On May 21, 2012, at 20:20, Tom Lane wrote:
> I wonder whether we should dedicate a buffer status bit to show that
> the buffer has been zeroed by zero_damaged_pages and thus doesn't
> reflect what's on disk. Then we could teach autovacuum to not overwrite
> such pages.

+1. The idea of us overwriting valid pages because of transient errors sure
is scary.

> On the other hand, such an approach would mean that you
> couldn't use vacuum to forcibly clean up broken pages, so while this
> might be "safer" it's not clear it makes things more useful.

If we're concerned about this, we could always add a separate GUC
fix_damaged_pages which controls whether zero'd pages are written back
or not.

best regards,
Florian Pflug