Cascading replication and recovery_target_timeline='latest'

Lists: pgsql-hackers
From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Cascading replication and recovery_target_timeline='latest'
Date: 2012-08-31 08:03:34
Message-ID: 50406FD6.8050903@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

When a cascading standby launches a new walsender, it fetches the
current recovery timeline:

/*
* Use the recovery target timeline ID during recovery
*/
if (am_cascading_walsender)
ThisTimeLineID = GetRecoveryTargetTLI();

Comment in GetRecoveryTargetTLI() does this:

/* RecoveryTargetTLI doesn't change so we need no lock to copy it */
return XLogCtl->RecoveryTargetTLI;

That comment is not true. RecoveryTargetTLI can change during recovery,
if you set recovery_target_timeline='latest'. In 'latest' mode, when the
(apparent) end of WAL is reached, the archive is scanned for any new
timeline history files that may have appeared. If a new timeline is
found, RecoveryTargetTLI is updated, and recovery is continued on the
new timeline.

Aside from the missing locking, I wonder what that does to a cascaded
standby. If there is an active walsender running while RecoveryTargetTLI
is changed, I think what will happen is that the walsender will continue
to stream WAL from the old timeline, but because the startup process is
now actually replaying from a different timeline, the walsender will
send bogus WAL to the standby.

When a standby ends recovery, creates a new timeline, and switches to
normal operation, postmaster terminates all walsenders because of the
timeline change. But don't we have a race condition there, with similar
effect? It might take a while for a walsender to die, and in that
window, it might send bogus WAL to the cascaded standby.

- Heikki


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-08-31 17:32:16
Message-ID: CAHGQGwEM1-F2XfbcPdhphUadjtYZF3teLWhAHQXLHFYBsudK-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 31, 2012 at 5:03 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> When a cascading standby launches a new walsender, it fetches the current
> recovery timeline:
>
> /*
> * Use the recovery target timeline ID during recovery
> */
> if (am_cascading_walsender)
> ThisTimeLineID = GetRecoveryTargetTLI();
>
> Comment in GetRecoveryTargetTLI() does this:
>
> /* RecoveryTargetTLI doesn't change so we need no lock to copy it */
> return XLogCtl->RecoveryTargetTLI;
>
>
> That comment is not true. RecoveryTargetTLI can change during recovery, if
> you set recovery_target_timeline='latest'. In 'latest' mode, when the
> (apparent) end of WAL is reached, the archive is scanned for any new
> timeline history files that may have appeared. If a new timeline is found,
> RecoveryTargetTLI is updated, and recovery is continued on the new timeline.

Right. We need lock there for now.

> Aside from the missing locking, I wonder what that does to a cascaded
> standby. If there is an active walsender running while RecoveryTargetTLI is
> changed, I think what will happen is that the walsender will continue to
> stream WAL from the old timeline, but because the startup process is now
> actually replaying from a different timeline, the walsender will send bogus
> WAL to the standby.

Good catch! That's really problem. To address that, we should terminate
all cascading walsenders when the timeline history file is read and
the recovery target timeline is changed?

> When a standby ends recovery, creates a new timeline, and switches to normal
> operation, postmaster terminates all walsenders because of the timeline
> change. But don't we have a race condition there, with similar effect? It
> might take a while for a walsender to die, and in that window, it might send
> bogus WAL to the cascaded standby.

No, I think. The cascading walsender can send only WAL data, up to
min(replay_location, receive_location, restore_location). IOW, it sends
only replayed, received (i.e., streamed), and restored WAL files.
These WAL files belong to old timeline, so ISTM that the cascading
walsender cannot send any new-timeline WAL files.

Regards,

--
Fujii Masao


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-03 17:43:00
Message-ID: CAHGQGwGVLVTzz1QqZLL8=mt7oB5MUqDC-w=b1Thd5wPKi256OQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 1, 2012 at 2:32 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Aug 31, 2012 at 5:03 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> When a cascading standby launches a new walsender, it fetches the current
>> recovery timeline:
>>
>> /*
>> * Use the recovery target timeline ID during recovery
>> */
>> if (am_cascading_walsender)
>> ThisTimeLineID = GetRecoveryTargetTLI();
>>
>> Comment in GetRecoveryTargetTLI() does this:
>>
>> /* RecoveryTargetTLI doesn't change so we need no lock to copy it */
>> return XLogCtl->RecoveryTargetTLI;
>>
>>
>> That comment is not true. RecoveryTargetTLI can change during recovery, if
>> you set recovery_target_timeline='latest'. In 'latest' mode, when the
>> (apparent) end of WAL is reached, the archive is scanned for any new
>> timeline history files that may have appeared. If a new timeline is found,
>> RecoveryTargetTLI is updated, and recovery is continued on the new timeline.
>
> Right. We need lock there for now.
>
>> Aside from the missing locking, I wonder what that does to a cascaded
>> standby. If there is an active walsender running while RecoveryTargetTLI is
>> changed, I think what will happen is that the walsender will continue to
>> stream WAL from the old timeline, but because the startup process is now
>> actually replaying from a different timeline, the walsender will send bogus
>> WAL to the standby.
>
> Good catch! That's really problem. To address that, we should terminate
> all cascading walsenders when the timeline history file is read and
> the recovery target timeline is changed?

This is not right fix. After terminating cascading walsenders, it
might take them
some time to come to an end, and during that time they might send bogus WAL
from old timeline. Currently there is no safeguard against sending bogus WAL
from old timeline. To implement such a safeguard, cascading walsender needs
to know when the timeline is updated and which is the last valid WAL file of
the timeline as the startup process knows. IOW, we need to change cascading
walsenders so that they also read and understand the timeline history files.
This is not easy fix at this stage (9.2.0 is about to be released...).

So, as one idea, I'm thiking to just forbid cascading replication when
recovery_target_timeline is set to 'latest'. Thought?

Regards,

--
Fujii Masao


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-03 22:07:44
Message-ID: 50452A30.4090906@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03.09.2012 10:43, Fujii Masao wrote:
> On Sat, Sep 1, 2012 at 2:32 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Fri, Aug 31, 2012 at 5:03 PM, Heikki Linnakangas<hlinnaka(at)iki(dot)fi> wrote:
>>> Aside from the missing locking, I wonder what that does to a cascaded
>>> standby. If there is an active walsender running while RecoveryTargetTLI is
>>> changed, I think what will happen is that the walsender will continue to
>>> stream WAL from the old timeline, but because the startup process is now
>>> actually replaying from a different timeline, the walsender will send bogus
>>> WAL to the standby.
>>
>> Good catch! That's really problem. To address that, we should terminate
>> all cascading walsenders when the timeline history file is read and
>> the recovery target timeline is changed?
>
> This is not right fix. After terminating cascading walsenders, it
> might take them
> some time to come to an end, and during that time they might send bogus WAL
> from old timeline. Currently there is no safeguard against sending bogus WAL
> from old timeline. To implement such a safeguard, cascading walsender needs
> to know when the timeline is updated and which is the last valid WAL file of
> the timeline as the startup process knows. IOW, we need to change cascading
> walsenders so that they also read and understand the timeline history files.
> This is not easy fix at this stage (9.2.0 is about to be released...).
>
> So, as one idea, I'm thiking to just forbid cascading replication when
> recovery_target_timeline is set to 'latest'. Thought?

Hmm, I was thinking that when walsender gets the position it can send
the WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the
current recovery timeline. If it has changed, refuse to send the new WAL
and terminate. That would be a fairly small change, it would just close
the window between requesting walsenders to terminate and them actually
terminating.

- Heikki


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-03 23:25:02
Message-ID: CAHGQGwGrLAvWvV23VLJez4qjovPGaJNtJJ2o=9g_UTwjSQy8dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 03.09.2012 10:43, Fujii Masao wrote:
>>
>> On Sat, Sep 1, 2012 at 2:32 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> On Fri, Aug 31, 2012 at 5:03 PM, Heikki Linnakangas<hlinnaka(at)iki(dot)fi>
>>> wrote:
>>>>
>>>> Aside from the missing locking, I wonder what that does to a cascaded
>>>>
>>>> standby. If there is an active walsender running while RecoveryTargetTLI
>>>> is
>>>> changed, I think what will happen is that the walsender will continue to
>>>> stream WAL from the old timeline, but because the startup process is now
>>>> actually replaying from a different timeline, the walsender will send
>>>> bogus
>>>> WAL to the standby.
>>>
>>>
>>> Good catch! That's really problem. To address that, we should terminate
>>> all cascading walsenders when the timeline history file is read and
>>> the recovery target timeline is changed?
>>
>>
>> This is not right fix. After terminating cascading walsenders, it
>> might take them
>> some time to come to an end, and during that time they might send bogus
>> WAL
>> from old timeline. Currently there is no safeguard against sending bogus
>> WAL
>> from old timeline. To implement such a safeguard, cascading walsender
>> needs
>> to know when the timeline is updated and which is the last valid WAL file
>> of
>> the timeline as the startup process knows. IOW, we need to change
>> cascading
>> walsenders so that they also read and understand the timeline history
>> files.
>> This is not easy fix at this stage (9.2.0 is about to be released...).
>>
>> So, as one idea, I'm thiking to just forbid cascading replication when
>> recovery_target_timeline is set to 'latest'. Thought?
>
>
> Hmm, I was thinking that when walsender gets the position it can send the
> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
> recovery timeline. If it has changed, refuse to send the new WAL and
> terminate. That would be a fairly small change, it would just close the
> window between requesting walsenders to terminate and them actually
> terminating.

Yeah, sounds good. Could you implement the patch? If you don't have time,
I will....

Regards,

--
Fujii Masao


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-03 23:26:33
Message-ID: 50453CA9.9000801@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03.09.2012 16:25, Fujii Masao wrote:
> On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas<hlinnaka(at)iki(dot)fi> wrote:
>> Hmm, I was thinking that when walsender gets the position it can send the
>> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
>> recovery timeline. If it has changed, refuse to send the new WAL and
>> terminate. That would be a fairly small change, it would just close the
>> window between requesting walsenders to terminate and them actually
>> terminating.
>
> Yeah, sounds good. Could you implement the patch? If you don't have time,
> I will....

I'll give it a shot..

- Heikki


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: hlinnaka(at)iki(dot)fi
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-04 00:40:22
Message-ID: 50454DF6.4070400@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03.09.2012 16:26, Heikki Linnakangas wrote:
> On 03.09.2012 16:25, Fujii Masao wrote:
>> On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas<hlinnaka(at)iki(dot)fi>
>> wrote:
>>> Hmm, I was thinking that when walsender gets the position it can send
>>> the
>>> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the
>>> current
>>> recovery timeline. If it has changed, refuse to send the new WAL and
>>> terminate. That would be a fairly small change, it would just close the
>>> window between requesting walsenders to terminate and them actually
>>> terminating.
>>
>> Yeah, sounds good. Could you implement the patch? If you don't have time,
>> I will....
>
> I'll give it a shot..

So, this is what I came up with, please review.

- Heikki

Attachment Content-Type Size
disconnect-walsenders-on-target-tli-change.patch text/x-diff 8.2 KB

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: hlinnaka(at)iki(dot)fi
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-04 10:02:55
Message-ID: m2r4qiw1f4.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> Hmm, I was thinking that when walsender gets the position it can send the
> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
> recovery timeline. If it has changed, refuse to send the new WAL and
> terminate. That would be a fairly small change, it would just close the
> window between requesting walsenders to terminate and them actually
> terminating.

It looks to me like a bug fix that also applies to non cascading
situation. Is that right?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-04 22:41:03
Message-ID: 5046837F.3050001@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Guys,

Is this a patch to 9.3?

i.e. do we need to delay the release for this?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-04 22:44:31
Message-ID: 5046844F.6000800@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.09.2012 03:02, Dimitri Fontaine wrote:
> Heikki Linnakangas<hlinnaka(at)iki(dot)fi> writes:
>> Hmm, I was thinking that when walsender gets the position it can send the
>> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
>> recovery timeline. If it has changed, refuse to send the new WAL and
>> terminate. That would be a fairly small change, it would just close the
>> window between requesting walsenders to terminate and them actually
>> terminating.
>
> It looks to me like a bug fix that also applies to non cascading
> situation. Is that right?

No, only cascading replication is affected. In non-cascading situation,
the timeline never changes in the master. It's only in cascading mode
that you have a problem, where the standby can cross timelines while
it's replaying the WAL, and also sending it over to cascading standby.

- Heikki


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-04 23:01:47
Message-ID: 5046885B.4080904@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.09.2012 15:41, Josh Berkus wrote:
> Guys,
>
> Is this a patch to 9.3?
>
> i.e. do we need to delay the release for this?

It is for 9.2. I'll do a little bit more testing, and barring any
issues, commit the patch. What exactly is the schedule? Do we need to do
a RC2 because of this?

- Heikki


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-04 23:14:01
Message-ID: 50468B39.4010008@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki,

> It is for 9.2. I'll do a little bit more testing, and barring any
> issues, commit the patch. What exactly is the schedule? Do we need to do
> a RC2 because of this?

We're currently scheduled to release next week. If we need to do an
RC2, we're going to have to do some fast rescheduling; we've already
started the publicity machine.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-04 23:50:16
Message-ID: 12902.1346802616@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Heikki,
>> It is for 9.2. I'll do a little bit more testing, and barring any
>> issues, commit the patch. What exactly is the schedule? Do we need to do
>> a RC2 because of this?

> We're currently scheduled to release next week. If we need to do an
> RC2, we're going to have to do some fast rescheduling; we've already
> started the publicity machine.

At this point I would argue that the only thing that should abort the
launch is a bad regression. Minor bugs in new features (and this must
be minor if it wasn't noticed before) don't qualify.

Having said that, it'd be good to get it fixed if we can. The schedule
says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed
tomorrow (Wednesday)?

regards, tom lane


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: hlinnaka(at)iki(dot)fi
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 00:14:11
Message-ID: 50469953.1070603@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03.09.2012 17:40, Heikki Linnakangas wrote:
> On 03.09.2012 16:26, Heikki Linnakangas wrote:
>> On 03.09.2012 16:25, Fujii Masao wrote:
>>> On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas<hlinnaka(at)iki(dot)fi>
>>> wrote:
>>>> Hmm, I was thinking that when walsender gets the position it can send
>>>> the
>>>> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the
>>>> current
>>>> recovery timeline. If it has changed, refuse to send the new WAL and
>>>> terminate. That would be a fairly small change, it would just close the
>>>> window between requesting walsenders to terminate and them actually
>>>> terminating.
>>>
>>> Yeah, sounds good. Could you implement the patch? If you don't have
>>> time,
>>> I will....
>>
>> I'll give it a shot..
>
> So, this is what I came up with, please review.

While testing, I bumped into another related bug: When a WAL segment is
restored from the archive, we let a walsender to send that whole WAL
segment to a cascading standby. However, there's no guarantee that the
restored WAL segment is complete. In particular, if a timeline changes
within that segment, e.g 000000010000000000000004, that segment will be
only partially full, and the WAL continues at segment
000000020000000000000004, at the next timeline. This can also happen if
you copy a partial WAL segment to the archive, for example from a
crashed master server. Or if you have set up record-based WAL shipping
not using streaming replication, per
http://www.postgresql.org/docs/devel/static/log-shipping-alternative.html#WARM-STANDBY-RECORD.
That manual page says you can only deal with whole WAL files that way,
but I think with standby_mode='on', that's actually no longer true.

So all in all, it seems like a shaky assumption that once you've
restored a WAL file from the archive, you're free to stream it to a
cascading slave. I think it would be more robust to limit it to
streaming the file only up to the point that it's been replayed - and
thus verified - in the 1st standby. If everyone is OK with that change
in behavior, the fix is simple.

- Heikki


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 00:34:59
Message-ID: 50469E33.902@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.09.2012 16:50, Tom Lane wrote:
> Josh Berkus<josh(at)agliodbs(dot)com> writes:
>> Heikki,
>>> It is for 9.2. I'll do a little bit more testing, and barring any
>>> issues, commit the patch. What exactly is the schedule? Do we need to do
>>> a RC2 because of this?
>
>> We're currently scheduled to release next week. If we need to do an
>> RC2, we're going to have to do some fast rescheduling; we've already
>> started the publicity machine.
>
> At this point I would argue that the only thing that should abort the
> launch is a bad regression. Minor bugs in new features (and this must
> be minor if it wasn't noticed before) don't qualify.
>
> Having said that, it'd be good to get it fixed if we can. The schedule
> says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed
> tomorrow (Wednesday)?

The attached patch fixes it for me. It fixes the original problem, by
adding the missing locking and terminating walsenders on a target
timeline change, and also changes the behavior wrt. WAL segments
restored from the archive, as I just suggested in another email
(http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).

The test case I've been using is a master and two standbys. The first
standby is set up to connect to the master with streaming replication,
and the other standby is set up to connect to the 1st standby, ie. it's
a cascading slave. In addition, the master is set up to do WAL archiving
to a directory, and both standbys have a restore_command to read from
that archive, and restore_target_timeline='latest'. After the master and
both standbys are running, I create a dummy recovery.conf file in
master's data directory, with just "restore_command='/bin/false'" in it,
and restart the master. That forces a timeline change in the master.
With the patch, the 1st standby will notice the new timeline in the
archive, switch to that, and reconnect to the master. The cascading
connection to the 2nd standby is terminated because of the timeline
change, the 2nd standby will also scan the archive and pick up the new
timeline, reconnect to the 1st standby, and be in sync again.

- Heikki

Attachment Content-Type Size
disconnect-walsenders-on-target-tli-change-2.patch text/x-diff 6.5 KB

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 02:34:59
Message-ID: 5046BA53.8000707@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.09.2012 17:34, Heikki Linnakangas wrote:
> On 04.09.2012 16:50, Tom Lane wrote:
>> Josh Berkus<josh(at)agliodbs(dot)com> writes:
>>> Heikki,
>>>> It is for 9.2. I'll do a little bit more testing, and barring any
>>>> issues, commit the patch. What exactly is the schedule? Do we need
>>>> to do
>>>> a RC2 because of this?
>>
>>> We're currently scheduled to release next week. If we need to do an
>>> RC2, we're going to have to do some fast rescheduling; we've already
>>> started the publicity machine.
>>
>> At this point I would argue that the only thing that should abort the
>> launch is a bad regression. Minor bugs in new features (and this must
>> be minor if it wasn't noticed before) don't qualify.
>>
>> Having said that, it'd be good to get it fixed if we can. The schedule
>> says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed
>> tomorrow (Wednesday)?
>
> The attached patch fixes it for me. It fixes the original problem, by
> adding the missing locking and terminating walsenders on a target
> timeline change, and also changes the behavior wrt. WAL segments
> restored from the archive, as I just suggested in another email
> (http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).

Committed that.

- Heikki


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: hlinnaka(at)iki(dot)fi
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 04:56:55
Message-ID: 1346821015.16318.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-09-04 at 19:34 -0700, Heikki Linnakangas wrote:
> > The attached patch fixes it for me. It fixes the original problem, by
> > adding the missing locking and terminating walsenders on a target
> > timeline change, and also changes the behavior wrt. WAL segments
> > restored from the archive, as I just suggested in another email
> > (http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).
>
> Committed that.

New compiler warnings:

xlog.c: In function ‘XLogFileRead’:
xlog.c:2785:14: error: unused variable ‘endptr’ [-Werror=unused-variable]
xlog.c:2784:25: error: unused variable ‘xlogctl’ [-Werror=unused-variable]


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 05:09:24
Message-ID: 5046DE84.7020107@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.09.2012 21:56, Peter Eisentraut wrote:
> On Tue, 2012-09-04 at 19:34 -0700, Heikki Linnakangas wrote:
>>> The attached patch fixes it for me. It fixes the original problem, by
>>> adding the missing locking and terminating walsenders on a target
>>> timeline change, and also changes the behavior wrt. WAL segments
>>> restored from the archive, as I just suggested in another email
>>> (http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).
>>
>> Committed that.
>
> New compiler warnings:
>
> xlog.c: In function ‘XLogFileRead’:
> xlog.c:2785:14: error: unused variable ‘endptr’ [-Werror=unused-variable]
> xlog.c:2784:25: error: unused variable ‘xlogctl’ [-Werror=unused-variable]

Fixed, thanks.

- Heikki


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: hlinnaka(at)iki(dot)fi
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 08:03:47
Message-ID: m2wr08vqu4.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> On 04.09.2012 03:02, Dimitri Fontaine wrote:
>> Heikki Linnakangas<hlinnaka(at)iki(dot)fi> writes:
>>> Hmm, I was thinking that when walsender gets the position it can send the
>>> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
>>> recovery timeline. If it has changed, refuse to send the new WAL and
>>> terminate. That would be a fairly small change, it would just close the
>>> window between requesting walsenders to terminate and them actually
>>> terminating.
>
> No, only cascading replication is affected. In non-cascading situation, the
> timeline never changes in the master. It's only in cascading mode that you
> have a problem, where the standby can cross timelines while it's replaying
> the WAL, and also sending it over to cascading standby.

It seems to me that it applies to connecting a standby to a newly
promoted standby too, as the timeline did change in this case too.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 14:15:19
Message-ID: 50475E77.9000701@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05.09.2012 01:03, Dimitri Fontaine wrote:
> Heikki Linnakangas<hlinnaka(at)iki(dot)fi> writes:
>> On 04.09.2012 03:02, Dimitri Fontaine wrote:
>>> Heikki Linnakangas<hlinnaka(at)iki(dot)fi> writes:
>>>> Hmm, I was thinking that when walsender gets the position it can send the
>>>> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
>>>> recovery timeline. If it has changed, refuse to send the new WAL and
>>>> terminate. That would be a fairly small change, it would just close the
>>>> window between requesting walsenders to terminate and them actually
>>>> terminating.
>>
>> No, only cascading replication is affected. In non-cascading situation, the
>> timeline never changes in the master. It's only in cascading mode that you
>> have a problem, where the standby can cross timelines while it's replaying
>> the WAL, and also sending it over to cascading standby.
>
> It seems to me that it applies to connecting a standby to a newly
> promoted standby too, as the timeline did change in this case too.

I was worried about that too at first, but Fujii pointed out that's OK:
see last paragraph at
http://archives.postgresql.org/pgsql-hackers/2012-08/msg01203.php.

If you connect to a standby that was already promoted to new master,
it's no different from connecting to a master in general. It works. If
you connect just before a standby is promoted, it works because a
cascading standby pays attention to the recovery target timeline, and
the pointer to last replayed WAL record. Promoting a standby doesn't
change recovery target timeline or the last replayed WAL record, it sets
XLogCtl->ThisTimeLineID. So the walsender in cascading mode will send
the WAL up to where the promotion happened, but will stop there until
it's terminated by the signal.

- Heikki


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: hlinnaka(at)iki(dot)fi
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 14:55:28
Message-ID: m2mx14tt7j.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> I was worried about that too at first, but Fujii pointed out that's OK: see
> last paragraph at
> http://archives.postgresql.org/pgsql-hackers/2012-08/msg01203.php.

Mmm, ok.

I'm worried about master-standby-standby setup where the master
disappear, we promote a standby and the second standby now feeds from
the newly promoted standby. Well we have to reconnect manually in this
case, but don't we need some similar stopgaps?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 17:08:43
Message-ID: 5047871B.4040005@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05.09.2012 07:55, Dimitri Fontaine wrote:
> Heikki Linnakangas<hlinnaka(at)iki(dot)fi> writes:
>> I was worried about that too at first, but Fujii pointed out that's OK: see
>> last paragraph at
>> http://archives.postgresql.org/pgsql-hackers/2012-08/msg01203.php.
>
> Mmm, ok.
>
> I'm worried about master-standby-standby setup where the master
> disappear, we promote a standby and the second standby now feeds from
> the newly promoted standby. Well we have to reconnect manually in this
> case, but don't we need some similar stopgaps?

The second standby will have to reconnect, but it will happen automatically.

- Heikki