Re: Allow WAL information to recover corrupted pg_controldata

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: "'Alvaro Herrera'" <alvherre(at)commandprompt(dot)com>, 'Cédric Villemain' <cedric(at)2ndquadrant(dot)com>, "'Pg Hackers'" <pgsql-hackers(at)postgresql(dot)org>, "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Subject: Re: Allow WAL information to recover corrupted pg_controldata
Date: 2012-06-21 07:10:51
Message-ID: 17994.1340262651@sss.pgh.pa.us
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila(at)huawei(dot)com> writes:
>> The reason I'm concerned about selecting a next-LSN that's certainly beyond every LSN in the database is that not doing
>> so could result in introducing further corruption, which would be entirely avoidable with more care in choosing the
>> next-LSN.

> The further corruption can only be possible when we replay some wrong
> WAL by selecting wrong LSN.

No, this is mistaken. Pages in the database that have LSN ahead of
where the server thinks the end of WAL is cause lots of problems
unrelated to replay; for example, inability to complete a checkpoint.
That might not directly lead to additional corruption, but consider
the case where such a page gets further modified, and the server decides
it doesn't need to create a full-page image because the LSN is ahead of
where the last checkpoint was. A crash or two later, you have new
problems.
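
To make the two failure modes above concrete, here is a minimal sketch. It is a toy model under stated assumptions, not PostgreSQL source: LSNs are treated as plain integers, and the function names needs_full_page_image and can_flush_page, along with all the numeric values, are hypothetical and chosen only for illustration.

```python
# Toy model (assumption: LSNs are plain integers; names are hypothetical).


def needs_full_page_image(page_lsn: int, redo_ptr: int) -> bool:
    """Return True if the page looks unmodified since the last checkpoint.

    Only such pages get a full-page image in WAL on their next change.
    """
    return page_lsn <= redo_ptr


def can_flush_page(page_lsn: int, wal_end: int) -> bool:
    """Return True if WAL can be made durable up to the page's LSN.

    A page may only be written out once WAL is flushed that far, so an
    LSN beyond the known end of WAL cannot be satisfied.
    """
    return page_lsn <= wal_end


# Hypothetical state after a reset that picked too small a next-LSN.
redo_ptr = 1_000          # redo pointer of the last checkpoint
wal_end = 1_200           # where the server thinks WAL ends
healthy_page_lsn = 900    # page last touched before the checkpoint
stale_page_lsn = 5_000    # page carrying an LSN from before the reset

for name, lsn in (("healthy", healthy_page_lsn), ("stale", stale_page_lsn)):
    print(name,
          "full-page image:", needs_full_page_image(lsn, redo_ptr),
          "flushable:", can_flush_page(lsn, wal_end))
```

In this model the stale page is skipped for a full-page image even though it has not been logged since the checkpoint, and it also cannot be flushed, which corresponds to the checkpoint being unable to complete.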

(Admittedly, once you've run pg_resetxlog you're best advised to just be
trying to dump what you've got, and not modify it more. But sometimes
you have to hack the data just to get pg_dump to complete.)

regards, tom lane
