Re: Cascading replication and recovery_target_timeline='latest'

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-03 22:07:44
Message-ID: 50452A30.4090906@iki.fi
Lists: pgsql-hackers

On 03.09.2012 10:43, Fujii Masao wrote:
> On Sat, Sep 1, 2012 at 2:32 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Fri, Aug 31, 2012 at 5:03 PM, Heikki Linnakangas<hlinnaka(at)iki(dot)fi> wrote:
>>> Aside from the missing locking, I wonder what that does to a cascaded
>>> standby. If there is an active walsender running while RecoveryTargetTLI is
>>> changed, I think what will happen is that the walsender will continue to
>>> stream WAL from the old timeline, but because the startup process is now
>>> actually replaying from a different timeline, the walsender will send bogus
>>> WAL to the standby.
>>
>> Good catch! That's really a problem. To address that, we should terminate
>> all cascading walsenders when the timeline history file is read and
>> the recovery target timeline is changed?
>
> This is not the right fix. After the cascading walsenders are asked to
> terminate, it might take them some time to exit, and during that time they
> might send bogus WAL from the old timeline. Currently there is no safeguard
> against sending bogus WAL from the old timeline. To implement such a
> safeguard, a cascading walsender needs to know when the timeline is updated
> and which is the last valid WAL file of the timeline, as the startup process
> knows. IOW, we need to change cascading walsenders so that they also read and
> understand the timeline history files. This is not an easy fix at this stage
> (9.2.0 is about to be released...).
>
> So, as one idea, I'm thinking of just forbidding cascading replication when
> recovery_target_timeline is set to 'latest'. Thoughts?

Hmm, I was thinking that when a walsender fetches the position up to which it
can send WAL, in GetStandbyFlushRecPtr(), it could atomically check the
current recovery timeline as well. If the timeline has changed, it would
refuse to send the new WAL and terminate. That would be a fairly small change;
it would just close the window between requesting walsenders to terminate and
them actually terminating.
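
To make the idea concrete, here is a rough standalone sketch. It is not
PostgreSQL code and not a patch: SharedRecoveryState, SenderState,
startup_advance() and get_send_limit() are hypothetical names, and a pthread
mutex stands in for whatever lock would protect the real shared state. The
only point it shows is that the send limit and the recovery timeline are read
in the same critical section, so a sender can never pair a new position with
an old timeline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared recovery state (assumption, not the real struct). */
typedef struct
{
    pthread_mutex_t lock;
    uint64_t        flush_ptr;      /* WAL position that may be streamed */
    uint32_t        recovery_tli;   /* timeline the startup process is replaying */
} SharedRecoveryState;

/* Hypothetical per-walsender state. */
typedef struct
{
    uint32_t        sending_tli;    /* timeline this sender started streaming from */
} SenderState;

static SharedRecoveryState shared = { PTHREAD_MUTEX_INITIALIZER, 0, 1 };

/* Startup-process side: position and timeline are updated under one lock. */
void
startup_advance(uint64_t new_flush_ptr, uint32_t tli)
{
    pthread_mutex_lock(&shared.lock);
    shared.flush_ptr = new_flush_ptr;
    shared.recovery_tli = tli;
    pthread_mutex_unlock(&shared.lock);
}

/*
 * Walsender side: fetch the send limit and the timeline together.
 * Returns false if the recovery timeline no longer matches the one this
 * sender is streaming; the caller must then stop sending and exit.
 * *limit is only written when the function returns true.
 */
bool
get_send_limit(const SenderState *sender, uint64_t *limit)
{
    uint64_t    ptr;
    uint32_t    tli;

    assert(sender != NULL && limit != NULL);

    pthread_mutex_lock(&shared.lock);
    ptr = shared.flush_ptr;
    tli = shared.recovery_tli;
    pthread_mutex_unlock(&shared.lock);

    if (tli != sender->sending_tli)
        return false;           /* timeline switched: refuse to send */

    *limit = ptr;
    return true;
}
```

With this shape, a termination request that arrives late is harmless: the
next time the sender asks how far it may send, it notices the timeline
mismatch and gives up before sending anything past the switch.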

- Heikki
