Re: page is uninitialized --- fixing

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: page is uninitialized --- fixing
Date: 2009-06-09 20:51:41
Message-ID: 1244580701.15799.374.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2009-06-09 at 16:17 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > A couple of people in recent years have had a problem with "page X is
> > uninitialised -- fixing" messages.
>
> > I have a case now with 569357 consecutive pages that required fixing in
> > pg_attribute. We looked at pages by hand and they really are
> > uninitialised, but otherwise what we would expect for size, name etc.
>
> > Clearly this is way too many pages to be easily explainable.
>
> It's probably too late to tell now, but I wonder if those pages actually
> existed or were just a "hole" in the file. A perhaps-plausible
> mechanism for them to appear is that the FSM spits out some ridiculously
> large page number as being the next place to insert something into
> pg_attribute, the system plops down a new tuple into that page, and
> behold you have a large hole that reads as zeroes.
>
> Another interesting question is whether the range began or ended at a
> 1GB segment boundary, in which case something in or around the
> segmenting logic could be at fault. (Hmm ... actually 1GB is only
> 131072 pages anyway, so your "hole" definitely spanned several segments.
> That seems like the next place to look.)

The "hole" started about 0.75GB in file 0 and spanned 4 complete 1GB
segments before records started again in file 5. The "hole" segments
were all 1GB in size, and the pages on either side of the hole were
undamaged.
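
As a rough cross-check of those figures, here is a small sketch of the arithmetic. It assumes the default 8kB block size (which is what 131072 pages per 1GB segment implies) and takes "about 0.75GB" literally. The function name is hypothetical, purely for illustration.

```python
BLCKSZ = 8192                      # assumed default PostgreSQL block size
SEGMENT_BYTES = 1024 ** 3          # relation files are split into 1GB segments
PAGES_PER_SEGMENT = SEGMENT_BYTES // BLCKSZ


def locate_hole_end(start_page, hole_pages):
    """Return (segment number, page within that segment) of the first
    page after a hole of hole_pages pages beginning at start_page."""
    end_page = start_page + hole_pages
    return divmod(end_page, PAGES_PER_SEGMENT)


# Assumption: the hole starts exactly 0.75GB into segment 0.
start_page = int(0.75 * PAGES_PER_SEGMENT)
segment, page_in_segment = locate_hole_end(start_page, 569357)
print(PAGES_PER_SEGMENT, start_page, segment, page_in_segment)
```

Under those assumptions the first page after the hole lands in segment 5, which is consistent with the hole covering the tail of file 0, all of files 1 to 4, and the records resuming in file 5.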

A corrupted block number in a WAL record would do this in
XLogReadBuffer() if we had full page writes enabled. But the value would
have to be corrupted after being set correctly and before the CRC check
on the WAL record, which is a fairly small window of believability.

Should there be a sanity check on how far a relation can be extended in
recovery?

Not sure if that would work with normal mode ReadBuffer() - it should
fail somewhere in smgr or in bufmgr.
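
To make the question concrete, such a check might look roughly like the sketch below. This is only an illustration of the logic in Python, not PostgreSQL code: the function name, the exception, and the default limit of one segment's worth of pages are all hypothetical choices, and the 8kB block size is assumed.

```python
PAGES_PER_SEGMENT = 131072  # 1GB / 8kB, assuming the default block size


class SuspiciousExtension(Exception):
    """Raised when a write would extend a relation implausibly far."""


def check_extension(current_nblocks, target_block, max_gap=PAGES_PER_SEGMENT):
    """Return the number of pages that would be skipped (left as zeroes)
    by writing target_block into a relation of current_nblocks pages.

    Raises SuspiciousExtension if that gap exceeds max_gap (a hypothetical
    threshold, not an existing setting).
    """
    gap = max(target_block - current_nblocks, 0)
    if gap > max_gap:
        raise SuspiciousExtension(
            f"block {target_block} is {gap} pages past the end of a "
            f"{current_nblocks}-page relation"
        )
    return gap
```

With the numbers from this case, `check_extension(98304, 667661)` would raise, since the gap is 569357 pages, while an ordinary one-page extension such as `check_extension(98304, 98304)` passes with a gap of zero. What a sensible limit would be, and whether the check belongs in recovery only, is the open question.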

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
