Re: How should pg_standby get over the gap of timeline?

Lists: pgsql-hackers
From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: How should pg_standby get over the gap of timeline?
Date: 2008-11-20 13:41:59
Message-ID: 3f0b79eb0811200541p2b995e06p880485ebd37bf370@mail.gmail.com

Hi,

In the current Synch Rep patch, the standby cannot catch up with the
primary which has a bigger timeline. So, whenever making the standby
catch up, a fresh base backup is required. This is obviously undesirable,
and I'd like to get rid of this restriction.

Postgres itself can recover up to a bigger timeline without a base
backup. The remaining problem is that pg_standby cannot get over the
gap of timeline. It continues waiting for the XLOG file with out-of-date
timeline, and redo doesn't progress.

My idea is to introduce a new option into pg_standby that makes the
restore fail if an XLOG file with the same logid and segid exists,
even though the target file itself doesn't. Once the restore fails, the
startup process can switch timelines and try to restore the XLOG file
on the new timeline.
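
A minimal sketch of the check such an option might perform, in Python rather than pg_standby's C and not taken from any patch. It assumes the usual 24-hex-digit XLOG file name (8 digits each for timeline, logid, and segid) and that only a file on a higher timeline should trigger the failure, which the proposal does not spell out. The names `parse_wal_name` and `should_fail_restore` are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional

# XLOG segment names are 24 uppercase hex digits: timeline, logid, segid.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


class WalName(NamedTuple):
    timeline: int
    logid: int
    segid: int


def parse_wal_name(name: str) -> Optional[WalName]:
    """Split an XLOG file name into its parts; None for any other file."""
    if not WAL_NAME.match(name):
        return None
    return WalName(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))


def should_fail_restore(target: str, archived: Iterable[str]) -> bool:
    """Hypothetical check: True when the target is absent from the archive
    but the same logid/segid exists there on a higher timeline (assumption)."""
    want = parse_wal_name(target)
    if want is None:
        raise ValueError(f"not an XLOG segment name: {target!r}")
    names = set(archived)
    if target in names:
        return False  # the requested file is there, restore it normally
    for name in names:
        got = parse_wal_name(name)
        if got is None:
            continue  # skip .history, .backup and other non-segment files
        same_segment = (got.logid, got.segid) == (want.logid, want.segid)
        if same_segment and got.timeline > want.timeline:
            return True
    return False
```

With `000000010000000000000003` missing and `000000020000000000000003` present in the archive, the check returns True, so the restore would fail instead of waiting and the startup process could move on to timeline 2.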

Is this idea reasonable? Any comments welcome!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-20 14:24:37
Message-ID: 49257325.5020505@enterprisedb.com

Fujii Masao wrote:
> In the current Synch Rep patch, the standby cannot catch up with the
> primary which has a bigger timeline.

That would only happen if you've performed an archive recovery in the
primary. If you've done PITR in the primary, I don't think there's any
guarantee that it's even possible to catch up the standby. The standby
might already have replayed a WAL file from an earlier timeline, that
isn't part of the history of the bigger timeline.
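
The condition can be sketched as follows, in Python for brevity. The flat list of (parent timeline, switch position) pairs is a simplification assumed for illustration, not the on-disk format of a timeline history file, and `can_follow` is a hypothetical name.

```python
from typing import List, Tuple


def can_follow(
    history: List[Tuple[int, int]],
    target_tli: int,
    standby_tli: int,
    standby_pos: int,
) -> bool:
    """Return True if a standby that has replayed up to standby_pos on
    standby_tli is still on the history of target_tli.

    history holds one (parent_tli, switch_pos) pair per ancestor timeline,
    where switch_pos is the WAL position at which that ancestor was left.
    """
    if standby_tli == target_tli:
        return True
    for parent_tli, switch_pos in history:
        if parent_tli == standby_tli:
            # WAL replayed past the switch point is not part of the
            # target timeline's history.
            return standby_pos <= switch_pos
    return False  # the standby's timeline is not an ancestor at all
```

A standby that stopped before the switch point can follow the newer timeline; one that replayed past it has applied WAL the newer timeline never contained.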

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-20 14:50:34
Message-ID: 3f0b79eb0811200650x1f34c741wd8f895bb7bee3ad7@mail.gmail.com

Hi, Heikki. Thanks for the comment!

On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>>
>> In the current Synch Rep patch, the standby cannot catch up with the
>> primary which has a bigger timeline.
>
> That would only happen if you've performed an archive recovery in the
> primary. If you've done PITR in the primary, I don't think there's any
> guarantee that it's even possible to catch up the standby. The standby might
> already have replayed a WAL file from an earlier timeline, that isn't part
> of the history of the bigger timeline.

I assume the situation of making the standby (the original primary) catch up
with the primary (the original standby) after failover. Since a timeline is
incremented when a failover finishes archive recovery on a standby, the
timelines differ between two servers.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-20 15:06:36
Message-ID: 49257CFC.900@enterprisedb.com

Fujii Masao wrote:
> Hi, Heikki. Thanks for the comment!
>
> On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Fujii Masao wrote:
>>> In the current Synch Rep patch, the standby cannot catch up with the
>>> primary which has a bigger timeline.
>> That would only happen if you've performed an archive recovery in the
>> primary. If you've done PITR in the primary, I don't think there's any
>> guarantee that it's even possible to catch up the standby. The standby might
>> already have replayed a WAL file from an earlier timeline, that isn't part
>> of the history of the bigger timeline.
>
> I assume the situation of making the standby (the original primary) catch up
> with the primary (the original standby) after failover. Since a timeline is
> incremented when a failover finishes archive recovery on a standby, the
> timelines differ between two servers.

That seems like a dangerous assumption. What if the standby had fallen
behind before the failover? It's not safe to failover back to the
original primary in that case. We'd need some kind of safeguards against
that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-20 15:15:21
Message-ID: 2e78013d0811200715q36ab6ab5x8ec03fa2d712157c@mail.gmail.com

On Thu, Nov 20, 2008 at 8:36 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> That seems like a dangerous assumption. What if the standby had fallen
> behind before the failover? It's not safe to failover back to the original
> primary in that case. We'd need some kind of safeguards against that.

For synchronous replication, what if we ensure that the standby has received
the WAL (at least in its buffers) before writing it to disk on the primary?
If we do that, I think the old standby can never fall behind the primary and
it would be easy for the old primary to rejoin replication without a
fresh backup.

Of course, this doesn't work for async replication.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-21 03:03:45
Message-ID: 3f0b79eb0811201903v1be9742dp48bf944db76de3df@mail.gmail.com

On Fri, Nov 21, 2008 at 12:06 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>> Hi, Heikki. Thanks for the comment!
>>
>> On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Fujii Masao wrote:
>>>> In the current Synch Rep patch, the standby cannot catch up with the
>>>> primary which has a bigger timeline.
>>>
>>> That would only happen if you've performed an archive recovery in the
>>> primary. If you've done PITR in the primary, I don't think there's any
>>> guarantee that it's even possible to catch up the standby. The standby
>>> might already have replayed a WAL file from an earlier timeline, that
>>> isn't part of the history of the bigger timeline.
>>
>> I assume the situation of making the standby (the original primary) catch
>> up with the primary (the original standby) after failover. Since a timeline
>> is incremented when a failover finishes archive recovery on a standby, the
>> timelines differ between two servers.
>
> That seems like a dangerous assumption. What if the standby had fallen
> behind before the failover? It's not safe to failover back to the original
> primary in that case. We'd need some kind of safeguards against that.

Yeah, it's a legitimate concern. As a safeguard, I'm going to delete the
XLOG files that may be inconsistent from the standby before making it
catch up. The XLOG file containing the recovery starting point and all
subsequent ones may be inconsistent, so they need to be copied from
the primary. I'm drafting this procedure on the wiki:
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Procedure

But it's overkill to overwrite every XLOG file that may be inconsistent.
In the future, I'm going to provide a tool that compares the content of XLOG
between the two servers and tells the user which files should be overwritten.
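
A rough sketch of what such a comparison could look like, in Python. It assumes both pg_xlog directories are readable from one machine and that comparing SHA-256 digests of whole segment files is precise enough. Neither assumption comes from the Synch Rep patch, and `digest_dir` and `files_to_overwrite` are hypothetical names.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict, List

# Only 24-hex-digit XLOG segment files are compared (assumption).
WAL_NAME = re.compile(r"[0-9A-F]{24}")


def digest_dir(xlog_dir: str) -> Dict[str, str]:
    """Map each XLOG segment file in xlog_dir to the SHA-256 of its content."""
    digests = {}
    for path in sorted(Path(xlog_dir).iterdir()):
        if path.is_file() and WAL_NAME.fullmatch(path.name):
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def files_to_overwrite(primary: Dict[str, str], standby: Dict[str, str]) -> List[str]:
    """Segments the standby lacks or holds with different content, sorted by name."""
    return sorted(
        name for name, digest in primary.items() if standby.get(name) != digest
    )
```

Calling `files_to_overwrite(digest_dir(primary_dir), digest_dir(standby_dir))` would list only the segments that actually differ, rather than everything after the recovery starting point.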

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-21 03:39:57
Message-ID: 3f0b79eb0811201939j6858f99ay16b0d453d6c0f934@mail.gmail.com

On Fri, Nov 21, 2008 at 12:15 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>
> On Thu, Nov 20, 2008 at 8:36 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>
>> That seems like a dangerous assumption. What if the standby had fallen
>> behind before the failover? It's not safe to failover back to the original
>> primary in that case. We'd need some kind of safeguards against that.
>>
>
> For synchronous replication, what if we ensure that the standby has received
> the WAL (at least in its buffers) before writing it to disk on the primary?
> If we do that, I think the old standby can never fall behind the primary and
> it would be easy for the old primary to join back the replication without a
> fresh backup.

In the current patch, WAL is written and sent concurrently for
performance, so we cannot guarantee that the old standby has not fallen
behind. I think we need a setup procedure that handles both cases.

> Of course, this doesn't work for async replication.

Yeah, in asynch replication, some committed transactions may disappear
regardless of whether a fresh backup is used. But since the current
patch guarantees the "Replicate Ahead Log" rule even in the asynch case,
we can recover the old primary consistently by using the WAL on the
old standby.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-21 17:09:23
Message-ID: 1227287363.7015.96.camel@hp_dx2400_1


On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote:

> In the current Synch Rep patch, the standby cannot catch up with the
> primary which has a bigger timeline. So, whenever making the standby
> catch up, a fresh base backup is required. This is obviously undesirable,
> and I'd like to get rid of this restriction.
>
> Postgres itself can recover up to a bigger timeline without a base
> backup. The remaining problem is that pg_standby cannot get over the
> gap of timeline. It continues waiting for the XLOG file with out-of-date
> timeline, and redo doesn't progress.

We've discussed this before. My answer is the same: you are assuming it
is safe to re-enter recovery, which is not correct (currently). You are
also assuming that taking a base backup is an expensive operation - it
need not be so if you simply move only the files/data that have changed,
e.g. rsync.

So if you want this to work, hacking pg_standby is not the way to do it.
But I'm not convinced there is a problem worth solving.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-21 18:39:47
Message-ID: 3f0b79eb0811211039t578f4499t375091e63972376a@mail.gmail.com

Hi, Simon. Thanks for the comment!!

On Sat, Nov 22, 2008 at 2:09 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote:
>
>> In the current Synch Rep patch, the standby cannot catch up with the
>> primary which has a bigger timeline. So, whenever making the standby
>> catch up, a fresh base backup is required. This is obviously undesirable,
>> and I'd like to get rid of this restriction.
>>
>> Postgres itself can recover up to a bigger timeline without a base
>> backup. The remaining problem is that pg_standby cannot get over the
>> gap of timeline. It continues waiting for the XLOG file with out-of-date
>> timeline, and redo doesn't progress.
>
> We've discussed this before. My answer is the same: you are assuming it
> is safe to re-enter recovery, which is not correct (currently).

I'm afraid you might be right. But I cannot understand yet why it's not
safe to re-enter recovery. Is it safe to re-enter recovery from the
restart point after PITR stopped halfway? If it's safe, ISTM that PITR
without a base backup also is safe. Please let me know what might
violate a re-entry of recovery. What is your worry?

> You are
> also assuming that taking a base backup is an expensive operation - it
> need not be so if you simply move only the files/data that have changed,
> e.g. rsync.

It depends on DB size and type. I think that it's important that the user
*can* choose the better method according to his situation.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-22 09:28:52
Message-ID: 1227346132.7015.110.camel@hp_dx2400_1


On Sat, 2008-11-22 at 03:39 +0900, Fujii Masao wrote:
> Hi, Simon. Thanks for the comment!!
>
> On Sat, Nov 22, 2008 at 2:09 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >
> > On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote:
> >
> >> In the current Synch Rep patch, the standby cannot catch up with the
> >> primary which has a bigger timeline. So, whenever making the standby
> >> catch up, a fresh base backup is required. This is obviously undesirable,
> >> and I'd like to get rid of this restriction.
> >>
> >> Postgres itself can recover up to a bigger timeline without a base
> >> backup. The remaining problem is that pg_standby cannot get over the
> >> gap of timeline. It continues waiting for the XLOG file with out-of-date
> >> timeline, and redo doesn't progress.
> >
> > We've discussed this before. My answer is the same: you are assuming it
> > is safe to re-enter recovery, which is not correct (currently).
>
> I'm afraid you might be right. But I cannot understand yet why it's not
> safe to re-enter recovery. Is it safe to re-enter recovery from the
> restart point after PITR stopped halfway? If it's safe, ISTM that PITR
> without a base backup also is safe. Please let me know what might
> violate a re-entry of recovery. What is your worry?

My worry is that there has not been an exhaustive analysis. "Almost
correct" and "probably correct" is not the same thing as "correct". We
need to look through all of the changes that occur at the end of
recovery to be certain we can do this. Luckily normal data blocks don't
know anything about such state changes, so that is a good start. We must
look at

Timelines
control file
startupclog, startup multixact etc
autovacuum starting
relcache init file
flat files
archive status
pg_xlog
two phase commit
...
every single file type in Postgres...

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How should pg_standby get over the gap of timeline?
Date: 2008-11-25 06:54:21
Message-ID: 3f0b79eb0811242254m1efea91av1f6e82866708d599@mail.gmail.com

On Sat, Nov 22, 2008 at 6:28 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> My worry is that there has not been an exhaustive analysis. "Almost
> correct" and "probably correct" is not the same thing as "correct". We
> need to look through all of the changes that occur at the end of
> recovery to be certain we can do this. Luckily normal data blocks don't
> know anything about such state changes, so that is a good start. We must
> look at

That's a reasonable worry. Thanks a lot, Simon. I will examine it next time
(probably 8.5).

I'd also like to clarify which recovery methods are safe now. My
understanding is as follows; is it right?

Safe (proved to be safe):
- PITR with a base backup.
  That is, we don't always need a fresh backup when setting up, and
  can make the standby catch up by using an old or fresh backup.
  If we can use an old backup, I think it might be worth changing
  pg_standby to get over the gap of timeline. What is your opinion?

- PITR with a database cluster including a recovery restart point.
  That is, we can make the standby catch up without a base backup
  after it fails.

Not safe (further examination is needed):
- PITR with a database cluster not including a recovery restart point.
  That is, we cannot make the standby (old primary) catch up without
  a base backup.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center