Re: Cascading replication and recovery_target_timeline='latest'

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-03 23:25:02
Message-ID: CAHGQGwGrLAvWvV23VLJez4qjovPGaJNtJJ2o=9g_UTwjSQy8dg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 03.09.2012 10:43, Fujii Masao wrote:
>>
>> On Sat, Sep 1, 2012 at 2:32 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> On Fri, Aug 31, 2012 at 5:03 PM, Heikki Linnakangas<hlinnaka(at)iki(dot)fi> wrote:
>>>>
>>>> Aside from the missing locking, I wonder what that does to a cascaded
>>>> standby. If there is an active walsender running while RecoveryTargetTLI
>>>> is changed, I think what will happen is that the walsender will continue
>>>> to stream WAL from the old timeline, but because the startup process is
>>>> now actually replaying from a different timeline, the walsender will
>>>> send bogus WAL to the standby.
>>>
>>>
>>> Good catch! That's really a problem. To address that, should we terminate
>>> all cascading walsenders when the timeline history file is read and
>>> the recovery target timeline is changed?
>>
>>
>> This is not the right fix. After the cascading walsenders are asked to
>> terminate, it might take them some time to exit, and during that time they
>> might send bogus WAL from the old timeline. Currently there is no safeguard
>> against sending bogus WAL from the old timeline. To implement such a
>> safeguard, a cascading walsender needs to know when the timeline is updated
>> and which is the last valid WAL file of the timeline, as the startup
>> process does. IOW, we need to change cascading walsenders so that they also
>> read and understand the timeline history files. This is not an easy fix at
>> this stage (9.2.0 is about to be released...).
>>
>> So, as one idea, I'm thinking of just forbidding cascading replication when
>> recovery_target_timeline is set to 'latest'. Thoughts?
>
>
> Hmm, I was thinking that when walsender gets the position it can send the
> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
> recovery timeline. If it has changed, refuse to send the new WAL and
> terminate. That would be a fairly small change; it would just close the
> window between requesting walsenders to terminate and them actually
> terminating.

Yeah, sounds good. Could you implement the patch? If you don't have time,
I will.
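
For illustration, the check Heikki describes above might look roughly like
the sketch below. This is not the actual walsender code and not a proposed
patch: RecoveryState, get_standby_flush_ptr() and walsender_may_send() are
hypothetical names, and a pthread mutex stands in for whatever lock protects
the shared recovery state. The only point is that the flush position and the
recovery timeline are read under the same lock, so a walsender cannot pick up
a send limit that belongs to a timeline it is not streaming.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real types; sizes are assumptions. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/* Hypothetical shared state updated by the startup process. */
typedef struct
{
    pthread_mutex_t lock;        /* stands in for the lock on shared state */
    XLogRecPtr      flush_ptr;   /* WAL position that is safe to send up to */
    TimeLineID      target_tli;  /* current recovery target timeline */
} RecoveryState;

/*
 * Read the flush position and the recovery timeline in one critical
 * section, so the two values are always consistent with each other.
 */
static XLogRecPtr
get_standby_flush_ptr(RecoveryState *st, TimeLineID *tli)
{
    XLogRecPtr ptr;

    pthread_mutex_lock(&st->lock);
    ptr = st->flush_ptr;
    *tli = st->target_tli;
    pthread_mutex_unlock(&st->lock);

    return ptr;
}

/*
 * Returns true and sets *send_upto if the walsender may keep sending.
 * Returns false, leaving *send_upto untouched, if the recovery timeline
 * no longer matches the one being streamed; the caller should then
 * refuse to send more WAL and terminate.
 */
static bool
walsender_may_send(RecoveryState *st, TimeLineID sending_tli,
                   XLogRecPtr *send_upto)
{
    TimeLineID current_tli;
    XLogRecPtr ptr;

    assert(send_upto != NULL);

    ptr = get_standby_flush_ptr(st, &current_tli);
    if (current_tli != sending_tli)
        return false;

    *send_upto = ptr;
    return true;
}
```

With a check like this on every send cycle, the window between asking the
walsenders to terminate and their actual exit no longer matters: a walsender
that is still alive after the timeline switch sees the mismatch before it
sends anything new.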

Regards,

--
Fujii Masao
