Re: 9.2.3 crashes during archive recovery

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, michael(dot)paquier(at)gmail(dot)com, ants(at)cybertec(dot)at, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.2.3 crashes during archive recovery
Date: 2013-03-07 08:05:42
Message-ID: 51384A56.1010906@lab.ntt.co.jp
Lists: pgsql-hackers

(2013/03/06 16:50), Heikki Linnakangas wrote:
>> Hi,
>>
>> Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord();
>> the attached patch records minRecoveryPoint.
>> [crash recovery -> record minRecoveryPoint in control file -> archive
>> recovery]
>> I think that this is the original intention of Heikki's patch.
>
> Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error, if archive recovery was requested but we're still in crash recovery. The real issue here is that I missed the two "return NULL;"s in ReadRecord(), so the code that I put in the next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. Attached patch is the proper fix for this.
>
Thanks for creating the patch! I tested your patch on 9.2_STABLE, but the promote command does not work with it...
When XLogPageRead() returns false, it means the end of the standby loop, the crash recovery loop, and the archive recovery loop.
Your patch does not work for promoting a standby to master: it never leaves the standby loop.
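
To make the control-flow problem Heikki describes above easier to see, here is a toy model in Python. It is only a sketch and not the xlog.c code: the function names, the `state` dictionary and its keys are all hypothetical. It shows how an early return can skip the step that switches from crash recovery to archive recovery, and how routing every failure through one shared path avoids that.

```python
# Toy model only; all names are hypothetical and this is not PostgreSQL source.

def read_record_buggy(page_found, state):
    """Early return: the switch to archive recovery is never reached."""
    if not page_found:
        return None  # returns before the fallback logic can run
    return "record"


def read_record_fixed(page_found, state):
    """Every failure goes through one shared fallback path."""
    if page_found:
        return "record"
    # Shared failure path: crash recovery has run out of WAL, so switch to
    # archive recovery if it was requested and has not started yet.
    if state["archive_recovery_requested"] and not state["in_archive_recovery"]:
        state["in_archive_recovery"] = True
    return None
```

In this model the buggy version leaves `state["in_archive_recovery"]` untouched when the page is missing, while the fixed version sets it.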

So I made a new patch based on Heikki's and Horiguchi's patches.
I also attach a test script, modified from Horiguchi's script. This script does not depend on the shell environment; you only need to fix PGPATH.
Please run this test script.

>> I also found a bug in latest 9.2_stable. It does not get latest timeline
>> and
>> recovery history file in archive recovery when master and standby
>> timeline is different.
>
> Works for me.. Can you create a test script for that? Remember to set "recovery_target_timeline='latest'".
I did set recovery_target_timeline='latest'. Hmm...

Here is my recovery.conf.
> [mitsu-ko(at)localhost postgresql]$ cat Standby/recovery.conf
> standby_mode = 'yes'
> recovery_target_timeline='latest'
> primary_conninfo='host=localhost port=65432'
> restore_command='cp ../arc/%f %p'
And here are my system's log messages.
> waiting for server to start....[Standby] LOG: database system was shut down in recovery at 2013-03-07 02:56:05 EST
> [Standby] LOG: restored log file "00000002.history" from archive
> cp: cannot stat `../arc/00000003.history': No such file or directory
> [Standby] FATAL: requested timeline 2 is not a child of database system timeline 1
> [Standby] LOG: startup process (PID 20941) exited with exit code 1
> [Standby] LOG: aborting startup due to startup process failure
It can be reproduced in my test script, too.
The final master start command in my test script may look unusual,
but it is a common step in PostgreSQL systems managed by Pacemaker.
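
For readers unfamiliar with the FATAL message in the log above, the following Python sketch illustrates the kind of ancestry check behind "requested timeline N is not a child of database system timeline M". It is an illustration only and does not reproduce the bug. It assumes that each non-comment line of a timeline history file begins with an ancestor timeline ID; the function names are hypothetical.

```python
def ancestors_from_history(history_text, target_tli):
    """Return the target timeline plus every ancestor listed in its history file."""
    tlis = [target_tli]
    for line in history_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumption: the first whitespace-separated field is the ancestor TLI.
        tlis.append(int(line.split()[0]))
    return tlis


def is_child_of(history_text, target_tli, database_tli):
    """True if the database's timeline is the target itself or one of its ancestors."""
    return database_tli in ancestors_from_history(history_text, target_tli)
```

In this sketch, a requested timeline is accepted only if the database's own timeline appears in that list; otherwise startup would be refused, as in the log.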

Best regards,
--
Mitsumasa KONDO
NTT OSS Center

Attachment                                  Content-Type  Size
crash-archive-recovery9.2_stable_v2.patch   text/x-diff   1.3 KB
fix-not-recovery-target-timeline-bug.patch  text/x-diff   446 bytes
run_v2.sh                                   text/plain    2.0 KB
