Re: Too strict check when starting from a basebackup taken off a standby

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Too strict check when starting from a basebackup taken off a standby
Date: 2014-12-18 08:30:01
Message-ID: 20141218083001.GY5023@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-12-16 18:37:48 +0200, Heikki Linnakangas wrote:
> On 12/11/2014 04:21 PM, Marco Nenciarini wrote:
> >On 11/12/14 12:38, Andres Freund wrote:
> >>On December 11, 2014 9:56:09 AM CET, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> >>>On 12/11/2014 05:45 AM, Andres Freund wrote:
> >>>
> >>>Yeah. I was not able to reproduce this, but I'm clearly missing
> >>>something, since both you and Sergey have seen this happening. Can you
> >>>write a script to reproduce?
> >>
> >>Not right now, I only have my mobile... It's quite easy though. Take a pg_basebackup from a standby. Create a recovery.conf with a broken primary_conninfo. Start. Shut down. Fix conninfo. Start.
> >>
> >
> >Just tested it. These steps are not sufficient to reproduce the issue on
> >a test installation, I suppose because on a small test datadir the
> >checkpoint location and the redo location in pg_control are the same as
> >those present in the backup_label.
> >
> >To trigger this bug, at least one restartpoint must have happened on the
> >standby between the start and the end of the backup.
> >
> >You could simulate it by issuing a checkpoint on the master, then a
> >checkpoint on the standby (to force a restartpoint), then copying the
> >pg_control from the standby.
> >
> >This way I've been able to reproduce it.
>
> Ok, got it. I was able to reproduce this by using pg_basebackup
> --max-rate=1024, and issuing "CHECKPOINT" in the standby while the backup
> was running.

FWIW, I can reproduce it without any such hangups. I've just tested it
using my local scripts:
# create primary
$ reinit-pg-dev-master
$ run-pg-dev-master
# create first standby
$ reinit-pg-dev-master-standby
$ run-pg-dev-master-standby
# create 2nd standby
$ pg_basebackup -h /tmp/ -p 5441 -D /tmp/tree --write-recovery-conf
$ PGHOST=frakbar run-pg-dev-master-standby -D /tmp/tree
LOG: creating missing WAL directory "pg_xlog/archive_status"
LOG: entering standby mode
FATAL: could not connect to the primary server: could not translate host name "frakbar" to address: Name or service not known
$ PGHOST=/tmp run-pg-dev-master-standby -D /tmp/tree
LOG: started streaming WAL from primary at 0/2000000 on timeline 1
FATAL: backup_label contains data inconsistent with control file
HINT: This means that the backup is corrupted and you will have to use another backup for recovery.

After the fix I just pushed, that sequence works.
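
To check Marco's observation quoted above (that on a small test datadir the checkpoint and redo locations in pg_control match those in backup_label), something like the following rough sketch might help. The helper names lsn_from and compare_backup_label are made up for illustration. It assumes the pg_controldata output has been saved to a file and is in English, and that backup_label has its usual "START WAL LOCATION" and "CHECKPOINT LOCATION" lines. It only compares the text of the LSNs and is not the check the server itself performs.

```shell
# Hypothetical helper: print the LSN (e.g. 0/2000028) from the first line
# of file $2 that starts with label $1.
lsn_from() {
    grep -m1 "^$1" "$2" |
        sed -E 's#^[^:]*:[[:space:]]*([0-9A-Fa-f]+/[0-9A-Fa-f]+).*#\1#'
}

# Hypothetical helper: compare the locations in a backup_label file ($1)
# with those in saved pg_controldata output ($2).
# Prints "same" if both the checkpoint and the redo location match,
# "differ" otherwise.
compare_backup_label() {
    label_file=$1
    control_dump=$2

    # In backup_label, START WAL LOCATION is the checkpoint's redo location.
    label_redo=$(lsn_from "START WAL LOCATION" "$label_file")
    label_ckpt=$(lsn_from "CHECKPOINT LOCATION" "$label_file")

    ctl_ckpt=$(lsn_from "Latest checkpoint location" "$control_dump")
    ctl_redo=$(lsn_from "Latest checkpoint's REDO location" "$control_dump")

    if [ "$label_redo" = "$ctl_redo" ] && [ "$label_ckpt" = "$ctl_ckpt" ]; then
        echo "same"
    else
        echo "differ"
    fi
}
```

With the /tmp/tree backup from the transcript, and before starting the server on it, this could be used as:
$ pg_controldata /tmp/tree > /tmp/control.txt
$ compare_backup_label /tmp/tree/backup_label /tmp/control.txt
If Marco's reasoning holds, "same" would be the case where the problem does not show up, and "differ" the case where a restartpoint happened on the standby during the backup.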

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
