Re: Tracking latest timeline in standby mode

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Tracking latest timeline in standby mode
Date: 2010-10-27 14:42:24
Message-ID: 4CC83A50.7070807@enterprisedb.com

At the moment, when you specify recovery_target_timeline='latest', we
scan for the latest timeline at the beginning of recovery, and pick that
as the target. If new timelines appear during recovery, we stick to the
target chosen at the beginning; the new timelines are ignored. That's
undesirable if you have one master and two standby servers, and failover
happens to one of the standbys. The other standby won't automatically
start tracking the new TLI created by the promoted new master; it
requires a restart to notice.

This was discussed a while ago:
http://archives.postgresql.org/pgsql-hackers/2010-10/msg00620.php

More work needs to be done to make that work over streaming replication,
sending history files over the wire, for example, but let's take baby
steps. At the very minimum the startup process should notice new
timelines appearing in the archive. The attached patch does that.
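
To make the "scan for the latest timeline" step concrete, here is a minimal
sketch in C. It is not the attached patch: the callback type and the function
names are hypothetical, and it assumes timeline IDs are assigned without
gaps, so the first missing history file ends the search.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical callback: true if a history file for tli can be found. */
typedef bool (*history_exists_fn)(TimeLineID tli, void *arg);

/*
 * Probe upward from start_tli and return the highest timeline for which a
 * history file exists. Assumes timeline IDs are handed out without gaps,
 * so the first missing history file ends the search.
 */
TimeLineID
find_newest_timeline(TimeLineID start_tli, history_exists_fn exists, void *arg)
{
    TimeLineID newest = start_tli;

    for (TimeLineID probe = start_tli + 1; exists(probe, arg); probe++)
        newest = probe;

    return newest;
}

/* Toy callback for experiments: timelines up to *(TimeLineID *) arg exist. */
bool
exists_up_to(TimeLineID tli, void *arg)
{
    return tli <= *(const TimeLineID *) arg;
}
```

Running such a probe repeatedly during recovery, rather than once at startup,
is the behaviour this message asks for.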

Comments?

A related issue is that we should have a check for the issue I also
mentioned in the comments:

> /*
> * If the current timeline is not part of the history of the
> * new timeline, we cannot proceed to it.
> *
> * XXX This isn't foolproof: The new timeline might have forked from
> * the current one, but before the current recovery location. In that
> * case we will still switch to the new timeline and proceed replaying
> * from it even though the history doesn't match what we already
> * replayed. That's not good. We will likely notice at the next online
> * checkpoint, as the TLI won't match what we expected, but it's
> * not guaranteed. The admin needs to make sure that doesn't happen.
> */

but that's a pre-existing and orthogonal issue; it can happen with the
current code too if you restart the standby, so let's handle that as a
separate patch. I'll focus on that next.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
rescan-latest-tli-1.patch text/x-diff 4.2 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2010-11-01 10:32:49
Message-ID: AANLkTimkP8pa+nGLYS4v0EUNKmOUi9oGwdDCNBtZSakj@mail.gmail.com

On Wed, Oct 27, 2010 at 11:42 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> At the moment, when you specify recovery_target_timeline='latest', we scan
> for the latest timeline at the beginning of recovery, and pick that as the
> target. If new timelines appear during recovery, we stick to the target
> chosen in the beginning, the new timelines are ignored. That's undesirable
> if you have one master and two standby servers, and failover happens to one
> of the standbys. The other standby won't automatically start tracking the
> new TLI created by the promoted new master, it requires a restart to notice.
>
> This was discussed a while ago:
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg00620.php
>
> More work needs to be done to make that work over streaming replication,
> sending history files over the wire, for example, but let's take baby steps.
> At the very minimum the startup process should notice new timelines
> appearing in the archive. The attached patch does that.
>
> Comments?

Currently the startup process rescans the timeline history files only when
walreceiver is not running. But if walreceiver receives those files from the
master in the future, shouldn't the startup process rescan them even while
walreceiver is running?

> A related issue is that we should have a check for the issue I also
> mentioned in the comments:
>
>>        /*
>>         * If the current timeline is not part of the history of the
>>         * new timeline, we cannot proceed to it.
>>         *
>>         * XXX This isn't foolproof: The new timeline might have forked from
>>         * the current one, but before the current recovery location. In that
>>         * case we will still switch to the new timeline and proceed replaying
>>         * from it even though the history doesn't match what we already
>>         * replayed. That's not good. We will likely notice at the next online
>>         * checkpoint, as the TLI won't match what we expected, but it's
>>         * not guaranteed. The admin needs to make sure that doesn't happen.
>>         */
>
> but that's a pre-existing and orthogonal issue; it can happen with the
> current code too if you restart the standby, so let's handle that as a
> separate patch.

I'm thinking of writing the timeline switch LSN to the timeline history file,
and comparing that LSN with the location of the last applied WAL record when
the file is rescanned. If the last applied record is already past the timeline
switch LSN, we cannot do the switch.

Currently the timeline history file contains the timeline switch WAL filename,
but it's not used at all. As a first step, what about replacing that filename
with the switch LSN?
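
The comparison proposed above could look roughly like the following sketch.
It models an LSN as a single 64-bit position, which is an assumption made for
brevity, and the function name is hypothetical. It reads the proposal as:
refuse the switch once replay has gone past the fork point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: an LSN modelled as one 64-bit byte position in the WAL. */
typedef uint64_t Lsn;

/*
 * Return true if a standby that has replayed up to last_replayed on the
 * parent timeline may follow a child timeline that forked at switch_point.
 * If replay already went past the fork, the replayed WAL is not part of the
 * child's history, so the switch has to be refused.
 */
bool
timeline_switch_allowed(Lsn switch_point, Lsn last_replayed)
{
    return last_replayed <= switch_point;
}
```

For example, with a fork at 0x1000000, a standby at 0x0800000 may switch,
while a standby that has already replayed to 0x3000000 may not.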

+ /* Switch target */
+ recoveryTargetTLI = newtarget;
+ expectedTLIs = newExpectedTLIs;

Before "expectedTLIs = newExpectedTLIs", shouldn't we call
list_free_deep(expectedTLIs)?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2010-11-01 11:32:59
Message-ID: 4CCEA56B.5030307@enterprisedb.com

On 01.11.2010 12:32, Fujii Masao wrote:
>> A related issue is that we should have a check for the issue I also
>> mentioned in the comments:
>>
>>> /*
>>> * If the current timeline is not part of the history of the
>>> * new timeline, we cannot proceed to it.
>>> *
>>> * XXX This isn't foolproof: The new timeline might have forked from
>>> * the current one, but before the current recovery location. In that
>>> * case we will still switch to the new timeline and proceed replaying
>>> * from it even though the history doesn't match what we already
>>> * replayed. That's not good. We will likely notice at the next online
>>> * checkpoint, as the TLI won't match what we expected, but it's
>>> * not guaranteed. The admin needs to make sure that doesn't happen.
>>> */
>>
>> but that's a pre-existing and orthogonal issue; it can happen with the
>> current code too if you restart the standby, so let's handle that as a
>> separate patch.
>
> I'm thinking of writing the timeline switch LSN to the timeline history file,
> and comparing that LSN with the location of the last applied WAL record when
> the file is rescanned. If the last applied record is already past the timeline
> switch LSN, we cannot do the switch.

Yeah, that's one approach. Another is to validate the TLI in the xlog
page header, it should always match the current timeline we're on. That
would feel more robust to me.

We're a bit fuzzy about what TLI is written in the page header when the
timeline changing checkpoint record is written, though. If the
checkpoint record fits in the previous page, the page will carry the old
TLI, but if the checkpoint record begins a new WAL page, the new page is
initialized with the new TLI. I think we should rearrange that so that
the page header will always carry the old TLI.

>
> + /* Switch target */
> + recoveryTargetTLI = newtarget;
> + expectedTLIs = newExpectedTLIs;
>
> Before "expectedTLIs = newExpectedTLIs", we should call
> list_free_deep(expectedTLIs)?

It's an integer list so list_free(expectedTLIs) is enough, and I doubt
that leakage will ever be a problem in practice, but in principle you're
right.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2010-11-02 05:15:26
Message-ID: AANLkTim-1_=UHHRbZyx059YoQJVSMmmVHiAqT-qgVK_k@mail.gmail.com

On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Yeah, that's one approach. Another is to validate the TLI in the xlog page
> header, it should always match the current timeline we're on. That would
> feel more robust to me.

Yeah, that seems better.

> We're a bit fuzzy about what TLI is written in the page header when the
> timeline changing checkpoint record is written, though. If the checkpoint
> record fits in the previous page, the page will carry the old TLI, but if
> the checkpoint record begins a new WAL page, the new page is initialized
> with the new TLI. I think we should rearrange that so that the page header
> will always carry the old TLI.

Or after rescanning the timeline history files, what about refetching the last
applied record and checking whether the TLI in the xlog page header is the
same as the previous TLI? IOW, what about using the header of the xlog page
including the last applied record instead of the following checkpoint record?

Anyway, ISTM we should also check that the min recovery point is not ahead
of the TLI switch location. So we need to fetch the record at the min recovery
point and validate the TLI of the xlog page header. Otherwise, the database
might get corrupted. This can happen, for example, when you remove all the
WAL files in the pg_xlog directory and restart the standby.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2010-11-02 15:08:44
Message-ID: 4CD0297C.9080004@enterprisedb.com

On 02.11.2010 07:15, Fujii Masao wrote:
> On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Yeah, that's one approach. Another is to validate the TLI in the xlog page
>> header, it should always match the current timeline we're on. That would
>> feel more robust to me.
>
> Yeah, that seems better.
>
>> We're a bit fuzzy about what TLI is written in the page header when the
>> timeline changing checkpoint record is written, though. If the checkpoint
>> record fits in the previous page, the page will carry the old TLI, but if
>> the checkpoint record begins a new WAL page, the new page is initialized
>> with the new TLI. I think we should rearrange that so that the page header
>> will always carry the old TLI.
>
> Or after rescanning the timeline history files, what about refetching the last
> applied record and checking whether the TLI in the xlog page header is the
> same as the previous TLI? IOW, what about using the header of the xlog page
> including the last applied record instead of the following checkpoint record?

I guess that would work too, but it seems problematic to move backwards
during recovery.

> Anyway ISTM we should also check that the min recovery point is not ahead
> of the TLI switch location. So we need to fetch the record in the min recovery
> point and validate the TLI of the xlog page header. Otherwise, the database
> might get corrupted. This can happen, for example, when you remove all the
> WAL files in pg_xlog directory and restart the standby.

Yes, that's another problem. We don't know which timeline the min
recovery point refers to. We should store TLI along with
minRecoveryPoint, then we can at least check that we're on the right
timeline when we reach minRecoveryPoint and throw an error.
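
A rough sketch of that check follows. The struct, enum and function names are
hypothetical, an LSN is modelled as a flat 64-bit value for brevity, and the
sketch assumes replay must be on exactly the stored timeline when it reaches
the stored point.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t Lsn;           /* assumption: LSN as a flat 64-bit position */

/* Hypothetical pair kept together in the control file. */
typedef struct MinRecoveryInfo
{
    Lsn        min_recovery_point;
    TimeLineID min_recovery_tli;
} MinRecoveryInfo;

typedef enum MinRecoveryCheck
{
    MIN_RECOVERY_NOT_REACHED,    /* keep replaying, nothing to verify yet */
    MIN_RECOVERY_OK,             /* reached the point on the stored timeline */
    MIN_RECOVERY_WRONG_TIMELINE  /* reached it on another timeline: error out */
} MinRecoveryCheck;

/* Verify the timeline once replay has reached the stored minimum point. */
MinRecoveryCheck
check_min_recovery_point(const MinRecoveryInfo *info,
                         Lsn replayed_upto, TimeLineID replay_tli)
{
    if (replayed_upto < info->min_recovery_point)
        return MIN_RECOVERY_NOT_REACHED;

    return (replay_tli == info->min_recovery_tli)
        ? MIN_RECOVERY_OK
        : MIN_RECOVERY_WRONG_TIMELINE;
}
```

A caller would raise an error on MIN_RECOVERY_WRONG_TIMELINE instead of
declaring the database consistent.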

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-01-04 20:08:18
Message-ID: 4D237E32.2070204@enterprisedb.com

On 02.11.2010 07:15, Fujii Masao wrote:
> On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Yeah, that's one approach. Another is to validate the TLI in the xlog page
>> header, it should always match the current timeline we're on. That would
>> feel more robust to me.
>
> Yeah, that seems better.

I finally got around to looking at this. I wrote a patch to validate that
the TLI on the xlog page header matches ThisTimeLineID during recovery, and
noticed quickly in testing that it doesn't catch all the cases I'd like
to catch :-(.

The problem scenario is this:

TLI 1 -----------+C-------+------->Standby
                 .
                 .
TLI 2            +C-------+------->

The two horizontal lines represent two timelines. TLI 2 forks off from
TLI 1, because of a failover to a not-completely up-to-date standby
server, for example. The plus-signs represent WAL segment boundaries and
C's represent checkpoint records.

Another standby server has replayed all the WAL on TLI 1. Its latest
restartpoint is C. The checkpoint records on the different timelines are
at the same location, at the beginning of the WAL files - not all that
unlikely if you have archive_timeout set, for example.

Now, if you stop and restart the standby, it will try to recover to the
latest timeline, which is TLI 2. But before the restart, it had already
replayed the WAL from TLI 1, so it's wrong to replay the WAL from the
parallel universe of TLI 2. At the moment, it will go ahead and do it,
and you end up with an inconsistent database.

I planned to fix that by checking the TLI on the xlog page header, but
that alone isn't enough in the above scenario. The TLIs on the page
headers on timeline 2 are what's expected; the first page on the segment
has TLI==1, because it was just forked off from timeline 1, and the
subsequent pages have TLI==2, as they should after the checkpoint record.

So we have to remember which timeline we were on before the restart. We
already remember how far we had replayed (that's the minRecoveryPoint we
store in the control file), but we have to memorize the timeline along
with that.

On reflection, your idea of checking the history file before replaying
anything seems much easier. We'll still need to add the timeline
alongside minRecoveryPoint to do the checking, but it's a lot easier to
do against the history file. And we can validate the TLIs on page
headers against the information from the history file as we read in the WAL.
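
As a sketch of the history-file check at restart, assuming the history file
has been parsed into (timeline, switch point) pairs and the control file also
stores the timeline of minRecoveryPoint. All names here are hypothetical, an
LSN is modelled as a flat 64-bit value, and page-header validation is left
out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t Lsn;           /* assumption: LSN as a flat 64-bit position */

/* Hypothetical parsed history line: timeline tli was left at switch_point. */
typedef struct HistoryEntry
{
    TimeLineID tli;
    Lsn        switch_point;
} HistoryEntry;

/*
 * Decide before replaying anything whether target_tli is a safe recovery
 * target, given what was replayed before the restart. history holds the
 * ancestors of target_tli as read from its history file.
 */
bool
restart_target_is_consistent(const HistoryEntry *history, size_t n,
                             TimeLineID target_tli,
                             Lsn min_recovery_point,
                             TimeLineID min_recovery_tli)
{
    /* Already on the target timeline: nothing to cross-check. */
    if (min_recovery_tli == target_tli)
        return true;

    for (size_t i = 0; i < n; i++)
    {
        /* An ancestor is fine only if we did not replay past the fork. */
        if (history[i].tli == min_recovery_tli)
            return min_recovery_point <= history[i].switch_point;
    }

    /* The timeline we were on is not an ancestor of the target at all. */
    return false;
}
```

In the scenario drawn above, the standby was on TLI 1 well past the point
where TLI 2 forked off, so the function returns false for target TLI 2.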

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-01-24 07:00:02
Message-ID: AANLkTikUeT0t91WbvihbUokXvfEAOsjJhyjNZUe9_9D-@mail.gmail.com

On Wed, Jan 5, 2011 at 5:08 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I finally got around to look at this. I wrote a patch to validate that the
> TLI on xlog page header matches ThisTimeLineID during recovery, and noticed
> quickly in testing that it doesn't catch all the cases I'd like to catch
> :-(.

The patch added to the CF doesn't solve this problem yet. Are you planning
to solve it in 9.1? Or are you planning to just commit the patch for 9.1, and
postpone the issue to 9.2 or later? I'm OK either way. Of course, the former
would be much better, though.

Anyway, you need to add documentation for this feature.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-02-08 04:27:31
Message-ID: AANLkTinStVP7X5fiLfgjBf6yWyJmOJchizJ2UFk9QKpK@mail.gmail.com

On Mon, Jan 24, 2011 at 2:00 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Jan 5, 2011 at 5:08 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> I finally got around to look at this. I wrote a patch to validate that the
>> TLI on xlog page header matches ThisTimeLineID during recovery, and noticed
>> quickly in testing that it doesn't catch all the cases I'd like to catch
>> :-(.
>
> The patch added into the CF hasn't solved this problem yet. Are you planning
> to solve it in 9.1? Or are you planning to just commit the patch for 9.1, and
> postpone the issue to 9.2 or later? I'm OK either way. Of course, the former
> is quite better, though.
>
> Anyway, you have to add the documentation about this feature.

This patch is erroneously marked Needs Review in the CommitFest
application, but I think really it's Waiting on Author, and has been
for a long time. I'm thinking we should push this out to 9.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-03-07 10:52:29
Message-ID: 4D74B8ED.9040700@enterprisedb.com

On 08.02.2011 06:27, Robert Haas wrote:
> On Mon, Jan 24, 2011 at 2:00 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Jan 5, 2011 at 5:08 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> I finally got around to look at this. I wrote a patch to validate that the
>>> TLI on xlog page header matches ThisTimeLineID during recovery, and noticed
>>> quickly in testing that it doesn't catch all the cases I'd like to catch
>>> :-(.
>>
>> The patch added into the CF hasn't solved this problem yet. Are you planning
>> to solve it in 9.1? Or are you planning to just commit the patch for 9.1, and
>> postpone the issue to 9.2 or later? I'm OK either way. Of course, the former
>> is quite better, though.
>>
>> Anyway, you have to add the documentation about this feature.
>
> This patch is erroneously marked Needs Review in the CommitFest
> application, but I think really it's Waiting on Author, and has been
> for a long time. I'm thinking we should push this out to 9.2.

I dropped the ball on this one, but now that we have pg_basebackup and
"pg_ctl promote" which make it easy to set up a standby and failover, I
think we should still do this in 9.1. Otherwise you need a restart to
have a 2nd standby server track the TLI change that failover causes.

I wanted to add those extra safeguards, and to support streaming
replication in addition to restoring from archive, but that's 9.2
material. However, the original patch
(http://archives.postgresql.org/message-id/4CC83A50.7070807@enterprisedb.com)
was non-intrusive and no-one objected. While the extra safeguards
would've been nice, this patch doesn't make the situation any worse than
it is already when you restart the standby.

Here's an updated version of that patch, now with a little bit of
documentation. Barring objections, I'll commit this.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
rescan-latest-tli-2.patch text/x-diff 6.0 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-03-07 12:06:14
Message-ID: AANLkTikTcaVzu=OoXvgqFntds2jRu_etwh_0DUcY7UJg@mail.gmail.com

On Mon, Mar 7, 2011 at 11:52, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 08.02.2011 06:27, Robert Haas wrote:
>> On Mon, Jan 24, 2011 at 2:00 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Wed, Jan 5, 2011 at 5:08 AM, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> I finally got around to look at this. I wrote a patch to validate that
>>>> the TLI on xlog page header matches ThisTimeLineID during recovery, and
>>>> noticed quickly in testing that it doesn't catch all the cases I'd like
>>>> to catch :-(.
>>>
>>> The patch added into the CF hasn't solved this problem yet. Are you
>>> planning to solve it in 9.1? Or are you planning to just commit the patch
>>> for 9.1, and postpone the issue to 9.2 or later? I'm OK either way. Of
>>> course, the former is quite better, though.
>>>
>>> Anyway, you have to add the documentation about this feature.
>>
>> This patch is erroneously marked Needs Review in the CommitFest
>> application, but I think really it's Waiting on Author, and has been
>> for a long time.  I'm thinking we should push this out to 9.2.
>
> I dropped the ball on this one, but now that we have pg_basebackup and
> "pg_ctl promote" which make it easy to set up a standby and failover, I
> think we should still do this in 9.1. Otherwise you need a restart to have a
> 2nd standby server track the TLI change that failover causes.

+1 for doing this!

(haven't had time to look through the actual patch, so obviously don't
do it if it's broken..)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-03-07 12:35:33
Message-ID: AANLkTikkZDV3Nj5WwZ5HuVRpqfOBNWy7a8eS7D8+Fa2C@mail.gmail.com

On Mon, Mar 7, 2011 at 9:06 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> I dropped the ball on this one, but now that we have pg_basebackup and
>> "pg_ctl promote" which make it easy to set up a standby and failover, I
>> think we should still do this in 9.1. Otherwise you need a restart to have a
>> 2nd standby server track the TLI change that failover causes.
>
> +1 for doing this!

+1

Comments:

+ if (!list_member_int(expectedTLIs,
+ (int) recoveryTargetTLI))
+ ereport(LOG,
+ (errmsg("new timeline %u is not a child of database system timeline %u",

Shouldn't we check whether recoveryTargetTLI is a member of newExpectedTLIs
instead of expectedTLIs?

>> + /* Switch target */
>> + recoveryTargetTLI = newtarget;
>> + expectedTLIs = newExpectedTLIs;
>>
>> Before "expectedTLIs = newExpectedTLIs", we should call
>> list_free_deep(expectedTLIs)?
>
> It's an integer list so list_free(expectedTLIs) is enough, and I doubt that
> leakage will ever be a problem in practice, but in principle you're right.

True. But I think it's a good habit to fix a leak, no matter how
small it is.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-03-07 19:16:55
Message-ID: 4D752F27.40707@enterprisedb.com

On 07.03.2011 14:35, Fujii Masao wrote:
> Comments:
>
> + if (!list_member_int(expectedTLIs,
> + (int) recoveryTargetTLI))
> + ereport(LOG,
> + (errmsg("new timeline %u is not a child of database system timeline %u",
>
> We should check whether recoveryTargetTLI is a member of newExpectedTLIs
> instead of expectedTLIs?

Thanks, fixed.

>>> + /* Switch target */
>>> + recoveryTargetTLI = newtarget;
>>> + expectedTLIs = newExpectedTLIs;
>>>
>>> Before "expectedTLIs = newExpectedTLIs", we should call
>>> list_free_deep(expectedTLIs)?
>>
>> It's an integer list so list_free(expectedTLIs) is enough, and I doubt that
>> leakage will ever be a problem in practice, but in principle you're right.
>
> True. But I think it's a good habit to fix a leak, no matter how
> small it is.

Ah, thanks for the reminder.

Added that and committed.
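
The two review points in this exchange (check the current target against the
new list, and free the old list before replacing it) can be sketched in
standalone C as follows. This is not the committed code: it uses plain
malloc'd arrays instead of PostgreSQL's List, and every name in it is
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TimeLineID;

/* Hypothetical stand-in for recoveryTargetTLI plus expectedTLIs. */
typedef struct RecoveryTarget
{
    TimeLineID  target_tli;
    TimeLineID *expected_tlis;  /* malloc'd: the target and its ancestors */
    size_t      n_expected;
} RecoveryTarget;

static bool
tli_in_list(const TimeLineID *list, size_t n, TimeLineID tli)
{
    for (size_t i = 0; i < n; i++)
    {
        if (list[i] == tli)
            return true;
    }
    return false;
}

/* Retarget recovery to new_target; leaves rt untouched on failure. */
bool
switch_recovery_target(RecoveryTarget *rt, TimeLineID new_target,
                       const TimeLineID *new_expected, size_t n_new)
{
    TimeLineID *copy;

    /* The timeline we are on must appear in the NEW target's history. */
    if (!tli_in_list(new_expected, n_new, rt->target_tli))
        return false;

    copy = malloc(n_new * sizeof *copy);
    if (copy == NULL)
        return false;
    memcpy(copy, new_expected, n_new * sizeof *copy);

    free(rt->expected_tlis);    /* release the old list before replacing it */
    rt->expected_tlis = copy;
    rt->n_expected = n_new;
    rt->target_tli = new_target;
    return true;
}
```

Checking against the old list instead would reject every new timeline, since
the old list cannot contain a timeline created after it was built.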

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: senthilnathan <senthilnathan(dot)t(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-10-03 06:18:03
Message-ID: 1317622683255-4863900.post@n5.nabble.com

Is this feature available in version 9.1.0?



From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: senthilnathan <senthilnathan(dot)t(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-10-04 04:20:17
Message-ID: CAHGQGwGCVfgm6X3-Yjt3W0MtGm+WbSGDZTSa13i06LVo34Vn+g@mail.gmail.com

On Mon, Oct 3, 2011 at 3:18 PM, senthilnathan <senthilnathan(dot)t(at)gmail(dot)com> wrote:
> Is this feature available in version 9.1.0?

Yes, it's available in 9.1.x.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: senthilnathan <senthilnathan(dot)t(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Tracking latest timeline in standby mode
Date: 2011-12-08 05:28:10
Message-ID: 1323322090821-5057733.post@n5.nabble.com

We are using 9.1.

We have a setup with a master and two standby servers (M --> S1, S2). Both
standbys, S1 and S2, share the same archive. The master has a virtual IP,
and both standby servers replicate from that virtual IP.

Assume the master fails. Our heartbeat mechanism then binds the virtual IP
to S1 (if S1's XLOG position is ahead of or equal to S2's).

Is it required to copy the timeline history file that is generated when S1
is promoted to master into the archive directory of S2 for replication
(i.e. from S1, the new master, to S2) to work?

Without copying this history file from S1 to S2, S2 keeps throwing the
following error messages:

2011-12-07 17:29:46 IST::@:[18879]:FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

cp: cannot stat `../archive/000000010000000000000005': No such file or directory
2011-12-07 17:29:49 IST::@:[18875]:LOG: record with zero length at 0/5D8FFC0
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/00000002.history': No such file or directory
2011-12-07 17:29:49 IST::@:[20362]:FATAL: timeline 2 of the primary does not match recovery target timeline 1
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/00000002.history': No such file or directory
2011-12-07 17:29:54 IST::@:[20367]:FATAL: timeline 2 of the primary does not match recovery target timeline 1
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/000000010000000000000005': No such file or directory
cp: cannot stat `../archive/00000002.history': No such file or directory
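
For readers decoding the file names in this log: the history file is named
after the timeline ID as 8 hex digits, and a WAL segment name of this era is
the timeline ID, log ID and segment number, each as 8 hex digits. The helper
names below are hypothetical; only the formats are taken from the names
above.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TimeLineID;

/* "<8 hex digits>.history", e.g. 00000002.history for timeline 2. */
int
history_file_name(char *buf, size_t len, TimeLineID tli)
{
    return snprintf(buf, len, "%08X.history", (unsigned) tli);
}

/* Timeline, log ID and segment number, 8 upper-case hex digits each. */
int
wal_segment_file_name(char *buf, size_t len,
                      TimeLineID tli, uint32_t log, uint32_t seg)
{
    return snprintf(buf, len, "%08X%08X%08X",
                    (unsigned) tli, (unsigned) log, (unsigned) seg);
}
```

So the standby here is asking for segment 5 of log 0 on timeline 1, and for
the history file of timeline 2, the timeline created by the promotion.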
