
Re: Too strict check when starting from a basebackup taken off a standby

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>,pgsql-hackers(at)postgresql(dot)org
Cc:	eshkinkot(at)gmail(dot)com
Subject:	Re: Too strict check when starting from a basebackup taken off a standby
Date:	2014-12-11 11:38:05
Message-ID:	B1E95D69-931A-40D1-A8A9-9DC0B88392A2@2ndquadrant.com
Lists:	pgsql-hackers

On December 11, 2014 9:56:09 AM CET, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>On 12/11/2014 05:45 AM, Andres Freund wrote:
>> A customer recently reported getting "backup_label contains data
>> inconsistent with control file" after taking a basebackup from a standby
>> and starting it with a typo in primary_conninfo.
>>
>> When starting postgres from a basebackup StartupXLOG() has the following
>> code to deal with backup labels:
>> if (haveBackupLabel)
>> {
>>     ControlFile->backupStartPoint = checkPoint.redo;
>>     ControlFile->backupEndRequired = backupEndRequired;
>>
>>     if (backupFromStandby)
>>     {
>>         if (dbstate_at_startup != DB_IN_ARCHIVE_RECOVERY)
>>             ereport(FATAL,
>>                     (errmsg("backup_label contains data inconsistent with control file"),
>>                      errhint("This means that the backup is corrupted and you will "
>>                              "have to use another backup for recovery.")));
>>         ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
>>     }
>> }
>>
>> While I'm not enthusiastic about the error message, that bit of code
>> looks sane at first glance. We certainly expect the control file to
>> indicate we're in recovery. Since we're unlinking the backup label
>> shortly afterwards we'd normally not expect to hit that case after a
>> shutdown in recovery.
>
>Check.
>
>> The problem is that after reading the backup label we also have to read
>> the corresponding checkpoint from pg_xlog. If primary_conninfo and/or
>> restore_command are misconfigured and can't restore files, that can only
>> be fixed by shutting down the cluster and fixing up recovery.conf -
>> which sets DB_SHUTDOWNED_IN_RECOVERY in the control file.
>
>No it doesn't. The state is set to DB_SHUTDOWNED_IN_RECOVERY in
>CreateRestartPoint(). If you shut down the server before it has even
>read the initial checkpoint record, it will not attempt to create a
>restartpoint nor update the control file.

Yes, it does. There's a shortcut that just sets the state in the control file and then exits.
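
To make the sequence concrete, here is a toy model in C. It is only a sketch, not the actual xlog.c code: the state names follow pg_control.h, but ToyControlFile and the toy_* functions are made up for illustration, and the relaxed check is just one way of writing the "also allow that state" idea from the original mail.

```c
#include <stdbool.h>

/* Subset of the control-file states; the names follow pg_control.h. */
typedef enum DBState
{
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_IN_CRASH_RECOVERY,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION
} DBState;

/* Hypothetical stand-in for the control file: only the state field. */
typedef struct ToyControlFile
{
    DBState state;
} ToyControlFile;

/*
 * Hypothetical model of the shutdown shortcut: the server is stopped
 * before the checkpoint record could be read, the state in the control
 * file is rewritten, and backup_label is still on disk.
 */
void
toy_shutdown_during_recovery(ToyControlFile *cf)
{
    if (cf->state == DB_IN_ARCHIVE_RECOVERY)
        cf->state = DB_SHUTDOWNED_IN_RECOVERY;
}

/* The check as quoted above: only DB_IN_ARCHIVE_RECOVERY passes. */
bool
toy_strict_check_ok(DBState dbstate_at_startup)
{
    return dbstate_at_startup == DB_IN_ARCHIVE_RECOVERY;
}

/* Assumed relaxed variant: a clean shutdown in recovery passes too. */
bool
toy_relaxed_check_ok(DBState dbstate_at_startup)
{
    return dbstate_at_startup == DB_IN_ARCHIVE_RECOVERY ||
           dbstate_at_startup == DB_SHUTDOWNED_IN_RECOVERY;
}
```

Starting from DB_IN_ARCHIVE_RECOVERY and calling toy_shutdown_during_recovery() once leaves a state that toy_strict_check_ok() rejects and toy_relaxed_check_ok() accepts, which is the second start in the failing scenario.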

>> The easiest solution seems to be to simply also allow that as a state in
>> the above check. It might be nicer to not allow a ShutdownXLOG to modify
>> the control file et al at that stage, but I think that'd end up being
>> more invasive.
>>
>> A short search shows that that also looks like a credible explanation
>> for #12128...
>
>Yeah. I was not able to reproduce this, but I'm clearly missing
>something, since both you and Sergey have seen this happening. Can you
>write a script to reproduce?

Not right now, I only have my mobile... It's quite easy though. Take a base backup from a standby with pg_basebackup. Create a recovery.conf with a broken primary_conninfo. Start. Shut down. Fix primary_conninfo. Start.

Andres

--
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: Too strict check when starting from a basebackup taken off a standby at 2014-12-11 08:56:09 from Heikki Linnakangas

Responses

Re: Too strict check when starting from a basebackup taken off a standby at 2014-12-11 14:21:03 from Marco Nenciarini
