Problem with streaming replication, backups, and recovery (9.0.x)

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Problem with streaming replication, backups, and recovery (9.0.x)
Date: 2011-03-25 20:31:20
Message-ID: 20110325203120.GA5946@depesz.com
Lists: pgsql-hackers

hi,

So, I hit a strange problem with Streaming Replication that I cannot explain.

Executive summary:

when using a hot backup made on a streaming replication slave, sometimes
(depending on load) the generated backup is created in such a way that, while it
can be brought back as a standalone Pg, and it can be brought back as a streaming
slave, such a slave (created off the backup) cannot be promoted to standalone.

Disclaimer:

I know that making hot backups on a slave is not the suggested way, yet I was
doing it without any problem on earlier Postgres versions (8.2, 8.3, 8.4), and I do
not have this problem with backups generated from the masters. The problem I
hit now is so peculiar that I thought it might be just an effect of some
underlying, more serious condition.

Longer explanation:

First, let me explain how omnipitr-backup-slave works, because it's the tool
that I use to make backups on the slave.

Steps that it does (a rough sketch follows the list):

1. gets pg_controldata for $PGDATADIR
2. compresses $PGDATA to a data tar.gz, putting inside it a backup_label file,
which contains:
START WAL LOCATION: %s (file %s)
CHECKPOINT LOCATION: %s
START TIME: %s
LABEL: OmniPITR_Slave_Hot_Backup
where START WAL LOCATION uses the value of "Latest checkpoint's REDO
location" from the pg_controldata in step #1, "CHECKPOINT LOCATION" is
taken from "Latest checkpoint location" from the same pg_controldata, and
START TIME is the current timestamp as of just before starting compression
of $PGDATA.
3. gets another copy of pg_controldata for $PGDATA
4. repeats step #3 until the value of "Latest checkpoint location" changes
5. waits until the WAL file that contains the location given by "Minimum recovery
ending location" in the pg_controldata from step #4 is available
6. creates a .backup file, which is named based on "START WAL
LOCATION" (from step #2) and contains the same lines as the backup_label file
from step #2, plus two more lines:
STOP WAL LOCATION: %s (file %s)
STOP TIME: %s
where STOP WAL LOCATION is taken from "Minimum recovery ending location"
from the pg_controldata from step #4, and STOP TIME is the current timestamp as
of just before starting compression of the WAL segments.
7. compresses xlogs plus the .backup file generated in step #6.
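
Here is that rough sketch in Python, just to make the logic concrete. It is not
the actual omnipitr-backup-slave code (which is Perl); DATA_DIR, the 16 MB
segment size, timeline 1 and the 5-second poll interval are my assumptions, and
the actual tar/compression of $PGDATA and of the xlogs is omitted:

# Illustration only - not the real tool.
import os
import subprocess
import time

DATA_DIR = "/home/depesz/slave/data"        # hypothetical $PGDATA
WAL_DIR  = os.path.join(DATA_DIR, "pg_xlog")
SEG_SIZE = 16 * 1024 * 1024                 # default WAL segment size
LABEL    = "OmniPITR_Slave_Hot_Backup"

def controldata(data_dir):
    # Run pg_controldata and return its "key: value" lines as a dict.
    out = subprocess.check_output(["pg_controldata", data_dir], text=True)
    info = {}
    for line in out.splitlines():
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info

def wal_file_name(lsn, timeline=1):
    # Map an LSN like "0/A000020" to its segment file name,
    # e.g. 00000001000000000000000A (16 MB segments assumed).
    log, offset = (int(part, 16) for part in lsn.split("/"))
    return "%08X%08X%08X" % (timeline, log, offset // SEG_SIZE)

def now():
    return time.strftime("%Y-%m-%d %H:%M:%S %Z")

# Step 1: control data taken before copying $PGDATA.
before         = controldata(DATA_DIR)
start_lsn      = before["Latest checkpoint's REDO location"]
checkpoint_lsn = before["Latest checkpoint location"]
start_time     = now()

# Step 2: backup_label that goes inside the data tarball.
backup_label = (
    "START WAL LOCATION: %s (file %s)\n"
    "CHECKPOINT LOCATION: %s\n"
    "START TIME: %s\n"
    "LABEL: %s\n"
) % (start_lsn, wal_file_name(start_lsn), checkpoint_lsn, start_time, LABEL)
print("backup_label:\n" + backup_label)

# Steps 3-4: poll pg_controldata until "Latest checkpoint location" changes.
after = controldata(DATA_DIR)
while after["Latest checkpoint location"] == checkpoint_lsn:
    time.sleep(5)
    after = controldata(DATA_DIR)

# Step 5: wait for the WAL file containing "Minimum recovery ending location".
stop_lsn = after["Minimum recovery ending location"]
while not os.path.exists(os.path.join(WAL_DIR, wal_file_name(stop_lsn))):
    time.sleep(5)
stop_time = now()

# Step 6: the .backup file - the backup_label lines plus STOP WAL LOCATION and
# STOP TIME, named after the starting segment and the start offset within it.
start_offset = int(start_lsn.split("/")[1], 16) % SEG_SIZE
backup_name  = "%s.%08X.backup" % (wal_file_name(start_lsn), start_offset)
backup_body  = (
    "START WAL LOCATION: %s (file %s)\n"
    "STOP WAL LOCATION: %s (file %s)\n"
    "CHECKPOINT LOCATION: %s\n"
    "START TIME: %s\n"
    "STOP TIME: %s\n"
    "LABEL: %s\n"
) % (start_lsn, wal_file_name(start_lsn), stop_lsn, wal_file_name(stop_lsn),
     checkpoint_lsn, start_time, stop_time, LABEL)
print("would write %s:\n%s" % (backup_name, backup_body))

# Step 7: compress the xlogs plus this .backup file (omitted).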

This approach worked for a long time on various hosts, systems, versions, etc.

But now, it fails.

For tests I'm using PostgreSQL 9.0.2 and 9.0.3 (mostly 9.0.2, as this is the
most critical for me, but I tested on 9.0.3 too, and the problem is the same),
on Linux (Ubuntu), 64-bit.

I do the procedure as always, and it produces a backup. With this backup I can
set up a new standalone server, and it works.

I can also set up a streaming slave, and it also works, but when I create a
trigger file to promote this slave to master, it fails with this error:
2011-03-24 21:01:58.051 CET @ 9680 LOG: trigger file found: /home/depesz/slave2/finish.recovery
2011-03-24 21:01:58.051 CET @ 9930 FATAL: terminating walreceiver process due to administrator command
2011-03-24 21:01:58.151 CET @ 9680 LOG: redo done at 0/1F000058
2011-03-24 21:01:58.151 CET @ 9680 LOG: last completed transaction was at log time 2011-03-24 20:58:25.836333+01
2011-03-24 21:01:58.238 CET @ 9680 FATAL: WAL ends before consistent recovery point
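
(For reference, the slave being promoted runs with a recovery.conf along these
lines - the primary_conninfo details here are made up, only the trigger_file
path is the real one from the log above:)

standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5432 user=postgres'
trigger_file = '/home/depesz/slave2/finish.recovery'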

The "WAL ends before consistent recovery point" error is interesting, because this particular backup was done using a .backup file containing:

START WAL LOCATION: 0/A000020 (file 00000001000000000000000A)
STOP WAL LOCATION: 0/12C9D7E8 (file 000000010000000000000012)
CHECKPOINT LOCATION: 0/B803050
START TIME: 2011-03-24 20:52:46 CET
STOP TIME: 2011-03-24 20:53:41 CET
LABEL: OmniPITR_Slave_Hot_Backup

Which means that the minimum recovery ending location was in fact reached (it was
at 0/12C9D7E8, and recovery continued until 0/1F000058).
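
(A trivial way to double-check that claim, comparing the two locations as
(xlogid, offset) pairs - just a sanity check of mine, not anything from the tool:)

def lsn(s):
    # Split "X/Y" into a pair of ints so the locations compare correctly.
    return tuple(int(part, 16) for part in s.split("/"))

print(lsn("0/1F000058") > lsn("0/12C9D7E8"))   # True - redo went past the minimum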

I have a set of scripts that can be used to replicate the problem, but the test
takes some time (~30 minutes).

What's most interesting is that this problem does not always happen. It
happens only when there was non-trivial load on the db server - at least in my
tests, where both master and slave are on the same machine. I think that in
normal cases the load on the slave is the more important factor.

If anyone is able to help, I can give you access to a test machine
and/or provide the set of scripts which (usually) replicate the problem.

Alternatively - if there is anything I can do to help you solve the mystery
- I'd be very willing to.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
