Re: Protecting against unexpected zero-pages: proposal

From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Protecting against unexpected zero-pages: proposal
Date: 2010-11-09 16:44:04
Message-ID: AANLkTi=NFr9kP6bhfwfmB5TEmdwwidbXe9UyqR7z30mF@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 9, 2010 at 12:32 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> There are also crosschecks that you can apply: if it's a heap page, are
> there any index pages with pointers to it? If it's an index page, are
> there downlink or sibling links to it from elsewhere in the index?
> A page that Postgres left as zeroes would not have any references to it.
>
> IMO there are a lot of methods that can separate filesystem misfeasance
> from Postgres errors, probably with greater reliability than this hack.
> I would also suggest that you don't really need to prove conclusively
> that any particular instance is one or the other --- a pattern across
> multiple instances will tell you what you want to know.
>
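
Tom's cross-check can be sketched roughly as follows. This is a minimal C illustration, not PostgreSQL code. It assumes someone has already collected two lists: the block numbers of all-zero heap pages, and the heap block numbers referenced by index TIDs. The names (BlockNum, count_referenced_zero_pages) are hypothetical. A zero page that an index still points to could not have been left that way by Postgres, so it counts against the filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap block number. */
typedef uint32_t BlockNum;

/* Return 1 if blk appears in refs (heap blocks referenced by index TIDs). */
static int block_is_referenced(BlockNum blk, const BlockNum *refs, size_t nrefs)
{
    for (size_t i = 0; i < nrefs; i++)
        if (refs[i] == blk)
            return 1;
    return 0;
}

/*
 * Count the all-zero pages that some index still points at.  A page that
 * Postgres left as zeroes would have no references, so each hit here is
 * evidence of lost writes below Postgres rather than a Postgres error.
 */
static size_t count_referenced_zero_pages(const BlockNum *zero_blocks, size_t nzero,
                                          const BlockNum *refs, size_t nrefs)
{
    assert(nzero == 0 || zero_blocks != NULL);
    size_t suspicious = 0;
    for (size_t i = 0; i < nzero; i++)
        if (block_is_referenced(zero_blocks[i], refs, nrefs))
            suspicious++;
    return suspicious;
}
```

Per Tom's point about patterns, the useful signal is the count across many incidents, not any single verdict.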

Doing this post-mortem analysis on a regular deployment and fixing the
problem would not be too difficult. But this platform, of which Postgres is
a part, will be left mostly unattended once deployed (pardon me for not
sharing the details; I am not sure whether I can).

An external HA component is supposed to detect any problems (by querying
Postgres or by external means) and take evasive action. It is this
automation of problem detection that we are seeking.

As Greg pointed out, even with this hack in place, we might still get zero
pages from the FS (say, when ext3 journals metadata but not data blocks). In
that case we'd rely on recovery's WAL replay of relation extension to
reintroduce the magic number in those pages.
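
A rough sketch of the marker idea follows. Each page written on relation extension carries a marker, so a page later read back as pure zeroes is known not to have come from Postgres. The marker value, its placement in the first four bytes, and all names here are hypothetical. A real patch would have to put the marker somewhere that does not collide with the actual page header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8192
#define EXTEND_MAGIC 0x5A45524FU   /* hypothetical marker value */

typedef enum { PAGE_ALL_ZERO, PAGE_EXTENDED_UNUSED, PAGE_OTHER } PageState;

/* On relation extension, write the marker instead of leaving the page all zeroes. */
static void init_extended_page(unsigned char *page)
{
    assert(page != NULL);
    uint32_t magic = EXTEND_MAGIC;
    memset(page, 0, BLOCK_SIZE);
    memcpy(page, &magic, sizeof magic);   /* hypothetical location: first 4 bytes */
}

/*
 * PAGE_ALL_ZERO: Postgres never wrote this page under the scheme, so the
 * zeroes came from below (e.g. the filesystem).
 * PAGE_EXTENDED_UNUSED: extended by Postgres but not yet used.
 */
static PageState classify_page(const unsigned char *page)
{
    uint32_t magic;
    memcpy(&magic, page, sizeof magic);
    for (size_t i = sizeof magic; i < BLOCK_SIZE; i++)
        if (page[i] != 0)
            return PAGE_OTHER;
    if (magic == 0)
        return PAGE_ALL_ZERO;
    if (magic == EXTEND_MAGIC)
        return PAGE_EXTENDED_UNUSED;
    return PAGE_OTHER;
}
```

Replaying the WAL record for the extension would call the equivalent of init_extended_page again, which is how the marker would be restored after a crash.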

> What's more, if I did believe that this was a safe and
> reliable technique, I'd be unhappy about the opportunity cost of
> reserving it for zero-page testing rather than other purposes.
>

This is one of those times when you are a bit too terse for me. What does
zero-page testing imply that this hack wouldn't?

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device
