Re: Switching timeline over streaming replication

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: Switching timeline over streaming replication
Date: 2012-12-26 20:31:37
Message-ID: 50DB5EA9.7010406@vmware.com
Lists: pgsql-hackers

On 23.12.2012 16:37, Fujii Masao wrote:
> On Fri, Dec 21, 2012 at 1:48 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Sat, Dec 15, 2012 at 9:36 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>>> I found another "requested timeline does not contain minimum recovery point"
>>> error scenario in HEAD:
>>>
>>> 1. Set up the master 'M', one standby 'S1', and one cascade standby 'S2'.
>>> 2. Shutdown the master 'M' and promote the standby 'S1', and wait for 'S2'
>>> to reconnect to 'S1'.
>>> 3. Set up new cascade standby 'S3' connecting to 'S2'.
>>> Then 'S3' fails to start the recovery because of the following error:
>>>
>>> FATAL: requested timeline 2 does not contain minimum recovery
>>> point 0/3000000 on timeline 1
>>> LOG: startup process (PID 33104) exited with exit code 1
>>> LOG: aborting startup due to startup process failure
>>>
>>> The result of pg_controldata of 'S3' is:
>>>
>>> Latest checkpoint location: 0/3000088
>>> Prior checkpoint location: 0/2000060
>>> Latest checkpoint's REDO location: 0/3000088
>>> Latest checkpoint's REDO WAL file: 000000020000000000000003
>>> Latest checkpoint's TimeLineID: 2
>>> <snip>
>>> Min recovery ending location: 0/3000000
>>> Min recovery ending loc's timeline: 1
>>> Backup start location: 0/0
>>> Backup end location: 0/0
>>>
>>> The content of the timeline history file '00000002.history' is:
>>>
>>> 1 0/3000088 no recovery target specified
>>
>> I could still reproduce this problem. Attached is a shell script
>> that reproduces it.
>
> This problem happens when a new standby starts up from a backup
> taken from another standby and its recovery starts from the shutdown
> checkpoint record that causes the timeline switch. In this case,
> the timeline of the minimum recovery point can differ from that of
> the latest checkpoint (i.e., the shutdown checkpoint). But the following
> check in StartupXLOG() wrongly assumes that they are always the same,
> which causes the problem.
>
> /*
> * The min recovery point should be part of the requested timeline's
> * history, too.
> */
> if (!XLogRecPtrIsInvalid(ControlFile->minRecoveryPoint) &&
>     tliOfPointInHistory(ControlFile->minRecoveryPoint - 1, expectedTLEs) !=
>     ControlFile->minRecoveryPointTLI)
>     ereport(FATAL,
>             (errmsg("requested timeline %u does not contain minimum recovery point %X/%X on timeline %u",
>                     recoveryTargetTLI,
>                     (uint32) (ControlFile->minRecoveryPoint >> 32),
>                     (uint32) ControlFile->minRecoveryPoint,
>                     ControlFile->minRecoveryPointTLI)));

No, that check doesn't assume that the min recovery point is on the same
timeline as the checkpoint record. This is another variant of the
"timeline history files are not included in the backup" problem discussed
on the other thread with the subject "pg_basebackup from cascading standby
after timeline switch". If you remove the min recovery point check above,
the test case still fails, with a different error message:

LOG: unexpected timeline ID 1 in log segment 000000020000000000000003, offset 0

If you modify the test script to copy the 00000002.history file to
data-standby3/pg_xlog after running pg_basebackup, the test case works.
(We still need to fix it, of course.)
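
To illustrate why the missing history file trips the check, here is a
simplified, self-contained model of the lookup. It is only a sketch, not
the actual xlog.c code: the names TimelineEntry, tli_of_point and
min_recovery_point_ok are made up for illustration, and it assumes that
without 00000002.history the startup process knows only about the
requested timeline itself, with no switch point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* WAL position; 0/3000088 is 0x3000088 */
typedef uint32_t TimeLineID;

/* Hypothetical, simplified history entry: [begin, end) belongs to tli. */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;          /* 0 means "from the start" */
    XLogRecPtr  end;            /* 0 means "no end", i.e. still current */
} TimelineEntry;

/* Return the timeline that covers ptr, or 0 if no entry covers it. */
TimeLineID
tli_of_point(XLogRecPtr ptr, const TimelineEntry *history, size_t n)
{
    assert(history != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
    {
        if (ptr >= history[i].begin &&
            (history[i].end == 0 || ptr < history[i].end))
            return history[i].tli;
    }
    return 0;
}

/* Model of the quoted check: true means startup would proceed. */
bool
min_recovery_point_ok(XLogRecPtr min_recovery_point,
                      TimeLineID min_recovery_tli,
                      const TimelineEntry *history, size_t n)
{
    if (min_recovery_point == 0)
        return true;            /* no min recovery point recorded */
    return tli_of_point(min_recovery_point - 1, history, n) ==
        min_recovery_tli;
}

/* With 00000002.history present ("1 0/3000088 ..."), newest first. */
const TimelineEntry with_history_file[] = {
    {2, 0x3000088, 0},
    {1, 0, 0x3000088},
};

/* Assumed state without the file: only the requested timeline is known. */
const TimelineEntry without_history_file[] = {
    {2, 0, 0},
};
```

With the values from the pg_controldata output (min recovery point
0/3000000 on timeline 1), the lookup in with_history_file finds
timeline 1, so the check passes. In without_history_file every position
appears to belong to timeline 2, so the same check fails even though the
control file itself is consistent.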

- Heikki
