Quick Links

Problem with streaming replication, backups, and recovery (9.0.x)

Lists:	pgsql-hackers

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-25 20:31:20
Message-ID:	20110325203120.GA5946@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hi,

So, I hit a strange problem with Streaming Replication, that I cannot explain.

Executive summary:

when using hot backup made on straming replication slave, sometimes
(depending on load) generated backup is created in such a way, that while it
can be brough back as standalone Pg, and it can be brough back as streaming
slave, such slave (created off the backup) cannot be promoted to standalone.

Disclaimer:

I know that making hot backups on slave is not the suggested way, yet I was
doing it without any problem on earlier Postgres versions (8.2,8.3,8.4), and do
not have this problem with backups generated from the masters, so the problem I
hit now is so" peculiar, that I thought that it might be just an effect of some
underlying, more serious, condition.

Longer explanation:

First, let me explain how omnipitr-backup-slave works, because it's the tool
that I use to make backups on slave.

Steps that it does:

1. gets pg_controldata for $PGDATADIR
2. compresses $PGDATA to data tar.gz, putting inside backup_label file,
which contains:
START WAL LOCATION: %s (file %s)
CHECKPOINT LOCATION: %s
START TIME: %s
LABEL: OmniPITR_Slave_Hot_Backup
where START WAL LOCATION uses value from "Latest checkpoint's REDO
location" from pg_controldata from step #1, "CHECKPOINT LOCATION" is
taken from "Latest checkpoint location" from pg_controldata taken in step
#1, and START TIME is based on current (before starting compression of
$PGDATA) timestamp.
3. gets another copy of pg_controldata for $PGDATA
4. repeats step #3 until value in "Latest checkpoint location" will change
5. wait until file that contains WAL location, from "Minimum recovery ending
location" from pg_controldata from step #4, will be available
6. creates .backup file which is named based on "START WAL
LOCATION" (from step #2), and contains the same lines as backup_label file
from step #2, plus two more lines:
STOP WAL LOCATION: %s (file %s)
STOP TIME: %s
where STOP WAL LOCATION is taken from "Minimum recovery ending location"
from pg_controldata from step #4, and STOP time is current timestamp as
of before starting compression of wal segments.
7. compresses xlogs plus the .backup file generated in step #6.

This approach worked for a long time on various hosts, systems, versions, etc.

But now, it fails.

I'm using for tests PostgreSQL 9.0.2 and 9.0.3 (mostly 9.0.2 as this is the
most critical for me, but I tested on 9.0.3 too, and the problem is the same),
on linux (ubuntu), 64bit.

I do the procedure as always, and it produces backup. With this backup I can
setup new standalone server, and it works.

I can also setup streaming slave, and it also works, but when I create
trigger file to promote this slave to master it fails with error:
2011-03-24 21:01:58.051 CET @ 9680 LOG: trigger file found: /home/depesz/slave2/finish.recovery
2011-03-24 21:01:58.051 CET @ 9930 FATAL: terminating walreceiver process due to administrator command
2011-03-24 21:01:58.151 CET @ 9680 LOG: redo done at 0/1F000058
2011-03-24 21:01:58.151 CET @ 9680 LOG: last completed transaction was at log time 2011-03-24 20:58:25.836333+01
2011-03-24 21:01:58.238 CET @ 9680 FATAL: WAL ends before consistent recovery point

Which is interesting, because this particular backup was done using .backup file containing:

START WAL LOCATION: 0/A000020 (file 00000001000000000000000A)
STOP WAL LOCATION: 0/12C9D7E8 (file 000000010000000000000012)
CHECKPOINT LOCATION: 0/B803050
START TIME: 2011-03-24 20:52:46 CET
STOP TIME: 2011-03-24 20:53:41 CET
LABEL: OmniPITR_Slave_Hot_Backup

Which means that minimum recovery ending location was in fact reached (it was
on 0/12C9D7E8, and recovery continued till 0/1F000058).

I have set of script that can be used to replicate the problem, but the test
takes some time (~ 30 minutes).

What's most interesting is that this problem does not happen always. It
happens only when there was non-trivial load on db server - this is in my tests
where both master and slave are the same machine. I think that in normal cases
load on slave is more important.

If anyone would be able to help, I can give you access to test machine
and/or provide set of script which replicate (usually) the problem.

Alternatively - if there is anything I can do to help you solve the mystery
- I'd be very willing to.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	depesz(at)depesz(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 04:48:13
Message-ID:	AANLkTimp4UKgSfq4tvgRmA6TV743k4vCoqWpqRC-hntc@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Mar 26, 2011 at 5:31 AM, hubert depesz lubaczewski
<depesz(at)depesz(dot)com> wrote:
> I can also setup streaming slave, and it also works, but when I create
> trigger file to promote this slave to master it fails with error:
> 2011-03-24 21:01:58.051 CET @ 9680 LOG: trigger file found: /home/depesz/slave2/finish.recovery
> 2011-03-24 21:01:58.051 CET @ 9930 FATAL: terminating walreceiver process due to administrator command
> 2011-03-24 21:01:58.151 CET @ 9680 LOG: redo done at 0/1F000058
> 2011-03-24 21:01:58.151 CET @ 9680 LOG: last completed transaction was at log time 2011-03-24 20:58:25.836333+01
> 2011-03-24 21:01:58.238 CET @ 9680 FATAL: WAL ends before consistent recovery point
>
> Which is interesting, because this particular backup was done using .backup file containing:
>
> START WAL LOCATION: 0/A000020 (file 00000001000000000000000A)
> STOP WAL LOCATION: 0/12C9D7E8 (file 000000010000000000000012)
> CHECKPOINT LOCATION: 0/B803050
> START TIME: 2011-03-24 20:52:46 CET
> STOP TIME: 2011-03-24 20:53:41 CET
> LABEL: OmniPITR_Slave_Hot_Backup
>
> Which means that minimum recovery ending location was in fact reached (it was
> on 0/12C9D7E8, and recovery continued till 0/1F000058).

In 9.0, recovery doesn't read a backup history file. That FATAL error happens
if recovery ends before it reads the WAL record which was generated by
pg_stop_backup(). IOW, recovery gets the recovery ending location from WAL
record not backup history file. Since you didn't run pg_stop_backup() and there
is no WAL record containing the recovery ending location, you got that error.

If you want to take hot backup from the standby, you need to do the procedure
explained in
http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	depesz(at)depesz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 12:18:03
Message-ID:	AANLkTimoQdAM6J9eqnBz3BjriekqsHHhbRA0ATdHBzeO@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 28, 2011 at 12:48 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> If you want to take hot backup from the standby, you need to do the procedure
> explained in
> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

It'd be nice to improve this in 9.2. Relying on users to get this
just right seems both inconvenient and error-prone.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 12:19:33
Message-ID:	20110328121933.GA17060@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
> In 9.0, recovery doesn't read a backup history file. That FATAL error happens
> if recovery ends before it reads the WAL record which was generated by
> pg_stop_backup(). IOW, recovery gets the recovery ending location from WAL
> record not backup history file. Since you didn't run pg_stop_backup() and there
> is no WAL record containing the recovery ending location, you got that error.
>
> If you want to take hot backup from the standby, you need to do the procedure
> explained in
> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups

Is it intentional and/or does it serve some greater good? I mean -
ability to make backups on slave without ever bothering master was
pretty interesting.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 15:11:48
Message-ID:	20110328151148.GA2436@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

one more question. how come that I can use this backup to make
standalone pg, and it starts without any problem, but when I start it as
sr slave, let it run for some time, and then promote to standalone, it
breaks?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<depesz(at)depesz(dot)com>,"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 21:24:23
Message-ID:	4D90B637020000250003BDF2@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:

> how come that I can use this backup to make standalone pg, and it
> starts without any problem, but when I start it as sr slave, let
> it run for some time, and then promote to standalone, it breaks?

We need more detail to make much of a guess about that.

-Kevin

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 21:42:23
Message-ID:	20110328214223.GA12802@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 28, 2011 at 04:24:23PM -0500, Kevin Grittner wrote:
> hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:
>
> > how come that I can use this backup to make standalone pg, and it
> > starts without any problem, but when I start it as sr slave, let
> > it run for some time, and then promote to standalone, it breaks?
>
> We need more detail to make much of a guess about that.

what details can I provide?

I can provide scripts that I use to test it, and also access to test
machine that I was testing it on.

if you'd need something else - just tell me what, i'll do my best to
provide.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<depesz(at)depesz(dot)com>
Cc:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 21:53:37
Message-ID:	4D90BD11020000250003BE0C@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:
> On Mon, Mar 28, 2011 at 04:24:23PM -0500, Kevin Grittner wrote:
>> hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:
>>
>>> how come that I can use this backup to make standalone pg, and
<<< it starts without any problem, but when I start it as sr slave,
>>> let it run for some time, and then promote to standalone, it
>>> breaks?
>>
>> We need more detail to make much of a guess about that.
>
> what details can I provide?
>
> I can provide scripts that I use to test it, and also access to
> test machine that I was testing it on.

For starters, what do you mean by "it breaks"? What, exactly
happens? What is in the logs? What version of PostgreSQL? Are you
using pg_standby or custom scripts?

-Kevin

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 22:18:56
Message-ID:	20110328221855.GB30401@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 28, 2011 at 04:53:37PM -0500, Kevin Grittner wrote:
> hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:
> > On Mon, Mar 28, 2011 at 04:24:23PM -0500, Kevin Grittner wrote:
> >> hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:
> >>
> >>> how come that I can use this backup to make standalone pg, and
> <<< it starts without any problem, but when I start it as sr slave,
> >>> let it run for some time, and then promote to standalone, it
> >>> breaks?
> >>
> >> We need more detail to make much of a guess about that.
> >
> > what details can I provide?
> >
> > I can provide scripts that I use to test it, and also access to
> > test machine that I was testing it on.
>
> For starters, what do you mean by "it breaks"? What, exactly
> happens? What is in the logs? What version of PostgreSQL? Are you
> using pg_standby or custom scripts?

hmm ... i thought that all details are in the first mail in thread.

I can probably repost it, but it seems to me that it includes all of the
information - which scripts, how it fails, in what cases, and what
exactly i'm doing.

have you seen this mail -
http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php ?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<depesz(at)depesz(dot)com>
Cc:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 22:29:22
Message-ID:	4D90C572020000250003BE12@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:

> have you seen this mail -
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php

Ah, OK.

I have a theory. Can you try it in what would be the failure case,
but run an explicit a CHECKPOINT on the master, wait for
pg_controldata to show that checkpoint on the slave, and (as soon as
you see that) try to trigger the slave to come up in production?

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<depesz(at)depesz(dot)com>
Cc:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 22:43:15
Message-ID:	4D90C8B3020000250003BE1F@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:

> have you seen this mail -
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php

One more thing: Am I correct in understanding that you are trying to
do a PITR-style backup without using pg_start_backup() and
pg_stop_backup()? If so, why?

-Kevin

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 22:50:03
Message-ID:	20110328225002.GA22515@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 28, 2011 at 05:29:22PM -0500, Kevin Grittner wrote:
> hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:
>
> > have you seen this mail -
> > http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php
>
> Ah, OK.
>
> I have a theory. Can you try it in what would be the failure case,
> but run an explicit a CHECKPOINT on the master, wait for
> pg_controldata to show that checkpoint on the slave, and (as soon as
> you see that) try to trigger the slave to come up in production?

yes. will check, but it will happen in ~ 10 hours.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-28 22:51:11
Message-ID:	20110328225110.GB22515@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 28, 2011 at 05:43:15PM -0500, Kevin Grittner wrote:
> hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:
>
> > have you seen this mail -
> > http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php
>
> One more thing: Am I correct in understanding that you are trying to
> do a PITR-style backup without using pg_start_backup() and
> pg_stop_backup()? If so, why?

because this is backup on slave, and the point was to make the backup
work without *any* bothering master.

so far it worked fine. and generally even with 9.0 it still works, and
backup *can* be used to setup new pg instance. but it cannot be used to
make sr slave, which we could later on promote.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	depesz(at)depesz(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-29 02:13:07
Message-ID:	AANLkTi=83zR2QaDMJJCigHVJ11P=VbrpFDifvWHA0_rL@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 28, 2011 at 9:19 PM, hubert depesz lubaczewski
<depesz(at)depesz(dot)com> wrote:
> On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
>> In 9.0, recovery doesn't read a backup history file. That FATAL error happens
>> if recovery ends before it reads the WAL record which was generated by
>> pg_stop_backup(). IOW, recovery gets the recovery ending location from WAL
>> record not backup history file. Since you didn't run pg_stop_backup() and there
>> is no WAL record containing the recovery ending location, you got that error.
>>
>> If you want to take hot backup from the standby, you need to do the procedure
>> explained in
>> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
>
> Is it intentional and/or does it serve some greater good?

Yes, it's intentional. In streaming replication, at first the master must stream
a backup history file to the standby in order to let it know the recovery ending
position. But streaming replication doesn't have ability to send a text file, so
we changed the code so that the recovery ending position was also written as
WAL record which can be streamed.

IIRC another reason is that it's more reliable to write down the important
information like the recovery ending position to WAL record than a backup
history file.

> I mean -
> ability to make backups on slave without ever bothering master was
> pretty interesting.

Me, too. We would need to implement that in 9.2.

BTW, in my system, I use another trick to take a base backup from the
standby:

(All of these operations are expected to be performed on the standby)
(1) Run CHECKPOINT
(2) Copy pg_control to temporary area
(3) Take a base backup of $PGDATA
(4) Copy back pg_control from temporary area to the backup taken in (2).
(5) Calculate the recovery ending position from current pg_control in
$PGDATA by using pg_controldata

When recovery starts from that backup, it doesn't automatically check
whether it has reached the ending position or not. So the user needs to
check that manually.

Yeah, this trick is very fragile and complicated. I'd like to improve the way
in 9.2.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	depesz(at)depesz(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-29 02:20:48
Message-ID:	AANLkTikNQd8jXYuarhcoYe_2sRBPSf0_3N_o7OrkuLj-@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 29, 2011 at 12:11 AM, hubert depesz lubaczewski
<depesz(at)depesz(dot)com> wrote:
> On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
>> In 9.0, recovery doesn't read a backup history file. That FATAL error happens
>> if recovery ends before it reads the WAL record which was generated by
>> pg_stop_backup(). IOW, recovery gets the recovery ending location from WAL
>> record not backup history file. Since you didn't run pg_stop_backup() and there
>> is no WAL record containing the recovery ending location, you got that error.
>>
>> If you want to take hot backup from the standby, you need to do the procedure
>> explained in
>> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
>
> one more question. how come that I can use this backup to make
> standalone pg, and it starts without any problem, but when I start it as
> sr slave, let it run for some time, and then promote to standalone, it
> breaks?

Did you use recovery.conf to start standalone PostgreSQL? If not,
recovery doesn't check whether it reaches the recovery ending position
or not. So I guess no problem didn't happen.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-29 09:41:42
Message-ID:	20110329094142.GA16699@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 28, 2011 at 05:29:22PM -0500, Kevin Grittner wrote:
> I have a theory. Can you try it in what would be the failure case,
> but run an explicit a CHECKPOINT on the master, wait for
> pg_controldata to show that checkpoint on the slave, and (as soon as
> you see that) try to trigger the slave to come up in production?

=$ ( pg_controldata master/; pg_controldata slave2/ ) | grep "Latest checkpoint location:"
Latest checkpoint location: 0/2D000058
Latest checkpoint location: 0/2C000058

=$ psql -p 54001 -c "checkpoint"
CHECKPOINT

=$ ( pg_controldata master/; pg_controldata slave2/ ) | grep "Latest checkpoint location:"
Latest checkpoint location: 0/2E000058
Latest checkpoint location: 0/2C000058

... ~ 1.5 minute later

=$ ( pg_controldata master/; pg_controldata slave2/ ) | grep "Latest checkpoint location:"
Latest checkpoint location: 0/2E000058
Latest checkpoint location: 0/2E000058

=$ touch /home/depesz/slave2/finish.recovery

it worked. now the slave2 is working as stand alone.

what does it tell us? will any work happening after checkpoint break it anyway?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-29 09:46:39
Message-ID:	20110329094638.GB16699@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 29, 2011 at 11:20:48AM +0900, Fujii Masao wrote:
> On Tue, Mar 29, 2011 at 12:11 AM, hubert depesz lubaczewski
> <depesz(at)depesz(dot)com> wrote:
> > On Mon, Mar 28, 2011 at 01:48:13PM +0900, Fujii Masao wrote:
> >> In 9.0, recovery doesn't read a backup history file. That FATAL error happens
> >> if recovery ends before it reads the WAL record which was generated by
> >> pg_stop_backup(). IOW, recovery gets the recovery ending location from WAL
> >> record not backup history file. Since you didn't run pg_stop_backup() and there
> >> is no WAL record containing the recovery ending location, you got that error.
> >>
> >> If you want to take hot backup from the standby, you need to do the procedure
> >> explained in
> >> http://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
> >
> > one more question. how come that I can use this backup to make
> > standalone pg, and it starts without any problem, but when I start it as
> > sr slave, let it run for some time, and then promote to standalone, it
> > breaks?
>
> Did you use recovery.conf to start standalone PostgreSQL? If not,
> recovery doesn't check whether it reaches the recovery ending position
> or not. So I guess no problem didn't happen.

no, i don't use.

hmm .. i am nearly 100% certain that previous pgs did in fact check if the end
of recovery is reached.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-29 09:49:17
Message-ID:	20110329094917.GC16699@depesz.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 29, 2011 at 11:13:07AM +0900, Fujii Masao wrote:
> Yes, it's intentional. In streaming replication, at first the master must stream
> a backup history file to the standby in order to let it know the recovery ending
> position. But streaming replication doesn't have ability to send a text file, so
> we changed the code so that the recovery ending position was also written as
> WAL record which can be streamed.

ok, this makes sense.

> BTW, in my system, I use another trick to take a base backup from the
> standby:
>
> (All of these operations are expected to be performed on the standby)
> (1) Run CHECKPOINT
> (2) Copy pg_control to temporary area
> (3) Take a base backup of $PGDATA
> (4) Copy back pg_control from temporary area to the backup taken in (2).
> (5) Calculate the recovery ending position from current pg_control in
> $PGDATA by using pg_controldata
>
> When recovery starts from that backup, it doesn't automatically check
> whether it has reached the ending position or not. So the user needs to
> check that manually.
> Yeah, this trick is very fragile and complicated. I'd like to improve the way
> in 9.2.

I know about it, but I feel very worried about doing stuff like this -
i.e. meddling with internal files of pg.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	depesz(at)depesz(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-29 11:27:21
Message-ID:	AANLkTinTXNt7imZuAKPKHc84bTVSX2Q9Arb0fQ-rc=sW@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 29, 2011 at 6:46 PM, hubert depesz lubaczewski
<depesz(at)depesz(dot)com> wrote:
>> Did you use recovery.conf to start standalone PostgreSQL? If not,
>> recovery doesn't check whether it reaches the recovery ending position
>> or not. So I guess no problem didn't happen.
>
> no, i don't use.
>
> hmm .. i am nearly 100% certain that previous pgs did in fact check if the end
> of recovery is reached.

Yes. In 8.4, that was checked only when starting recovery from the backup
(i.e., which includes backup_label and backup history file) without
recovery.conf.
But in 9.0, the behavior was changed so that only archive recovery (i.e., with
recovery.conf) checks that. IIRC, we don't have strong opinion about
this change.
We should revert, in order to make even crash recovery check whether it
reaches the ending location? Of course, even if we do that, your problem is
not solved at all. So I think that the right direction is to implement
the ability
to easily take a base backup from the standby, in 9.2.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<depesz(at)depesz(dot)com>
Cc:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-29 13:49:03
Message-ID:	4D919CFF020000250003BE79@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:

> it worked. now the slave2 is working as stand alone.
>
> what does it tell us? will any work happening after checkpoint
> break it anyway?

I'm less sure about what will put it into a bad state again than I
was that an immediate checkpoint would put you into a good state.
I have a vague feeling that I've seen or heard something which
suggests that doing this during a spread checkpoint might be a
problem as things currently stand. I can't be more specific without
digging through code, and I'm pretty swamped at the moment.

-Kevin

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	depesz(at)depesz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-29 15:54:17
Message-ID:	4D9200A9.5090406@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29.03.2011 14:27, Fujii Masao wrote:
> On Tue, Mar 29, 2011 at 6:46 PM, hubert depesz lubaczewski
> <depesz(at)depesz(dot)com> wrote:
>>> Did you use recovery.conf to start standalone PostgreSQL? If not,
>>> recovery doesn't check whether it reaches the recovery ending position
>>> or not. So I guess no problem didn't happen.
>>
>> no, i don't use.
>>
>> hmm .. i am nearly 100% certain that previous pgs did in fact check if the end
>> of recovery is reached.
>
> Yes. In 8.4, that was checked only when starting recovery from the backup
> (i.e., which includes backup_label and backup history file) without
> recovery.conf.
> But in 9.0, the behavior was changed so that only archive recovery (i.e., with
> recovery.conf) checks that. IIRC, we don't have strong opinion about
> this change.
> We should revert, in order to make even crash recovery check whether it
> reaches the ending location?

Hmm, why did we change that? It seems like a mistake, the database is
not consistent until you reach the backup stop location, whether or not
you're doing archive recovery. +1 for reverting that, and backpatching
it as well.

"pg_basebackup -x", which includes all the WAL required to restore in
the pg_xlog directory of the base backup itself, is also affected.
Without the check that you reach the end-of-backup, an aborted base
backup will appear to restore fine, even though some WAL segments are
missing and the backup is incomplete.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	depesz(at)depesz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-30 02:39:24
Message-ID:	AANLkTik0eCRcFiK0GrtKDmVaUdWnHKWzFYN=v-_YRSzh@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Mar 30, 2011 at 12:54 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Hmm, why did we change that?

I'm not sure, but I guess that's because I missed the case where crash
recovery starts from the backup :(

> It seems like a mistake, the database is not
> consistent until you reach the backup stop location, whether or not you're
> doing archive recovery. +1 for reverting that, and backpatching it as well.

Agreed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	depesz(at)depesz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-30 06:25:30
Message-ID:	AANLkTimnkcYKnazLrUrGZE2mGnFW45k2KyXoJXvZVPtn@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Mar 30, 2011 at 11:39 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Mar 30, 2011 at 12:54 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Hmm, why did we change that?
>
> I'm not sure, but I guess that's because I missed the case where crash
> recovery starts from the backup :(
>
>> It seems like a mistake, the database is not
>> consistent until you reach the backup stop location, whether or not you're
>> doing archive recovery. +1 for reverting that, and backpatching it as well.
>
> Agreed.

Attached patch reverts that. Comments?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
backup_stop_location_v1.patch	application/octet-stream	1.6 KB

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	depesz(at)depesz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-30 07:56:43
Message-ID:	4D92E23B.5070200@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 30.03.2011 09:25, Fujii Masao wrote:
> On Wed, Mar 30, 2011 at 11:39 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Mar 30, 2011 at 12:54 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Hmm, why did we change that?
>>
>> I'm not sure, but I guess that's because I missed the case where crash
>> recovery starts from the backup :(
>>
>>> It seems like a mistake, the database is not
>>> consistent until you reach the backup stop location, whether or not you're
>>> doing archive recovery. +1 for reverting that, and backpatching it as well.
>>
>> Agreed.
>
> Attached patch reverts that. Comments?

Looks good, committed.

We could also improve the error message. If we haven't reached the
end-of-backup location, we could say something along the lines of:

ERROR: WAL ends before the end of online backup
HINT: Online backup must be ended with pg_stop_backup(), and all the WAL
up to that point must be available at recovery.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	depesz(at)depesz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Problem with streaming replication, backups, and recovery (9.0.x)
Date:	2011-03-30 08:12:27
Message-ID:	AANLkTimZtVf7UCn-TsDfTX4CgHJdVmW0dEXo5CnLYts9@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Mar 30, 2011 at 4:56 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Attached patch reverts that. Comments?
>
> Looks good, committed.

Thanks!

> We could also improve the error message. If we haven't reached the
> end-of-backup location, we could say something along the lines of:
>
> ERROR: WAL ends before the end of online backup
> HINT: Online backup must be ended with pg_stop_backup(), and all the WAL up
> to that point must be available at recovery.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center