Re: Archive recovery won't be completed on some situation.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hlinnakangas(at)vmware(dot)com
Cc: masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Archive recovery won't be completed on some situation.
Date: 2014-03-19 08:28:06
Message-ID: 20140319.172806.193015541.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, thank you for the suggestions.

The *problematic* operation sequence I saw was performed by
pgsql-RA/Pacemaker. It stops the server in immediate mode,
starts the former Master as a Standby first, and then promotes
it. Focusing on this situation, it would be reasonable to reset
the backup positions. 9.4 cancels backup mode even on immediate
shutdown, so the operation causes no problem there, but 9.3 and
earlier do not. In summary, the amendments needed per version
are:

9.4: Nothing more is needed (but resetting backup mode with
pg_resetxlog is acceptable)

9.3: Can be recovered without resetting the backup positions in
the control file (but it is smarter with the reset)

9.2: Same as 9.3

9.1: Cannot be recovered without directly resetting the backup
position in the control file. The resetting feature is needed.

At Mon, 17 Mar 2014 15:59:09 +0200, Heikki Linnakangas wrote
> On 03/15/2014 05:59 PM, Fujii Masao wrote:
> > What about adding new option into pg_resetxlog so that we can
> > reset the pg_control's backup start location? Even after we've
> > accidentally entered into the situation that you described, we can
> > exit from that by resetting the backup start location in pg_control.
> > Also this option seems helpful to salvage the data as a last resort
> > from the corrupted backup.
>
> Yeah, seems reasonable. After you run pg_resetxlog, there's no hope
> that the backup end record would arrive any time later. And if it
> does, it won't really do much good after you've reset the WAL.
>
> We probably should just clear out the backup start/stop location
> always when you run pg_resetxlog. Your database is potentially broken
> if you reset the WAL before reaching consistency, but if forcibly do
> that with "pg_resetxlog -f", you've been warned.

Agreed. The attached patches do that, and with them I could
"recover" the database state with the following steps:

(1) Remove recovery.conf and run pg_resetxlog -bf
(the option name 'b' is open to debate)
(2) Start the server (with crash recovery)
(3) Stop the server (in any mode)
(4) Create recovery.conf and start the server with archive recovery.

Steps 2 and 3 are somewhat annoying, but I don't want to go any
further in supporting Pacemaker's in-a-sense broken sequence :(
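
To make clear what "resetting the backup position" in step (1) amounts
to, here is a minimal sketch in C. It is not the attached patch:
SketchControlFile and the sketch_* functions are hypothetical names, and
the struct is a heavily reduced stand-in for the real pg_control
contents. The only thing taken from the code quoted later in this mail
is that a valid (non-zero) backupStartPoint is what makes recovery
expect an end-of-backup record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a WAL location; 0 stands for "invalid / not set". */
typedef uint64_t SketchRecPtr;

/* Hypothetical, heavily reduced model of the backup field kept in pg_control. */
typedef struct SketchControlFile
{
    SketchRecPtr backupStartPoint;  /* non-zero while a backup is considered in progress */
} SketchControlFile;

/* True if recovery would still wait for an end-of-backup record. */
static bool
sketch_backup_in_progress(const SketchControlFile *cf)
{
    assert(cf != NULL);
    return cf->backupStartPoint != 0;
}

/*
 * Models the reset: forget that an online backup was in progress.
 * Returns true if there was a backup position to clear.
 */
static bool
sketch_reset_backup_position(SketchControlFile *cf)
{
    bool was_set;

    assert(cf != NULL);
    was_set = (cf->backupStartPoint != 0);
    cf->backupStartPoint = 0;
    return was_set;
}
```

Calling sketch_reset_backup_position() on a control file whose
backupStartPoint is set returns true and leaves the field at 0; a second
call returns false because nothing is left to clear.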

The same can also be achieved by the following steps suggested in
Masao's previous mail for 9.2 and later, but 9.1 needs
backupStartPoint to be reset forcibly.

At Sun, 16 Mar 2014 00:59:01 +0900, Fujii Masao wrote
> Though this is formal way, you can exit from that situation by
>
> (1) Remove recovery.conf and start the server with crash recovery
> (2) Execute pg_start_backup() after crash recovery ends
> (3) Copy backup_label to somewhere
> (4) Execute pg_stop_backup() and shutdown the server
> (5) Copy backup_label back to $PGDATA
> (6) Create recovery.conf and start the server with archive recovery

This worked for 9.2, 9.3 and HEAD but failed for 9.1 at step 1.

| 2014-03-19 15:53:02.512 JST FATAL: WAL ends before end of online backup
| 2014-03-19 15:53:02.512 JST HINT: Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.

This seems inevitable, given this check at the end of recovery:

| if (InRecovery &&
|     (XLByteLT(EndOfLog, minRecoveryPoint) ||
|      !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
| {
...
|     /*
|      * Ran off end of WAL before reaching end-of-backup WAL record, or
|      * minRecoveryPoint.
|      */
|     if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
|         ereport(FATAL,
|                 (errmsg("WAL ends before end of online backup"),
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment            Content-Type    Size
resetxlog_9.4.patch   text/x-patch    1.6 KB
resetxlog_9.3.patch   text/x-patch    1.6 KB
resetxlog_9.2.patch   text/x-patch    1.8 KB
resetxlog_9.1.patch   text/x-patch    1.7 KB
