backup_label in a crash recovery

Lists: pgsql-hackers
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: backup_label in a crash recovery
Date: 2009-11-02 06:51:29
Message-ID: 3f0b79eb0911012251x111092e3r33dbd918389f6519@mail.gmail.com

Hi,

When a crash occurs before pg_stop_backup() is called,
the subsequent crash recovery fails with a FATAL error
and outputs the following HINT message:

If you are not restoring from a backup, try removing the file
"%s/backup_label".

I wonder why backup_label isn't automatically removed
in the normal crash recovery case. Is this a fail-safe
to prevent an admin from wrongly restoring from a backup
without creating recovery.conf? Or is there another reason?

If that's intentional, clusterware for a shared-disk
failover system should remove backup_label whenever
it performs a failover. Otherwise, if a crash occurs during
an online backup, the failover would fail.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Fujii Masao *EXTERN*" <masao(dot)fujii(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label in a crash recovery
Date: 2009-11-02 10:05:47
Message-ID: D960CB61B694CF459DCFB4B0128514C203937FF1@exadv11.host.magwien.gv.at
Lists: pgsql-hackers

Fujii Masao wrote:
> When a crash occurs before calling pg_stop_backup(),
> the subsequent crash recovery causes the FATAL error
> and outputs the following HINT message.
>
> If you are not restoring from a backup, try removing the file
> "%s/backup_label".
>
> I wonder why backup_label isn't automatically removed
> in normal crash recovery case. Is this for the fail-safe
> protection; prevent admin from restoring from a backup
> wrongly without creating recovery.conf? Or another?
>
> If that's intentional, a clusterware for shared disk
> failover system should remove backup_label whenever
> doing failover. Otherwise, when a crash occurs during
> online-backup, the failover would fail.

I do not know if there is a good reason why the server does
not ignore backup_label if recovery.conf is not present.

But as it is, any failover system should definitely remove
backup_label.
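A minimal sketch of such a pre-start step, assuming the clusterware runs a Python hook before launching the postmaster on the surviving node. The function name, the renamed file name, and the `was_running_primary` flag are all hypothetical; the flag stands for knowledge the cluster manager must bring from outside, because the data directory alone does not reveal where it came from. The label is moved aside rather than deleted so it can still be inspected.

```python
from pathlib import Path


def prepare_datadir_for_failover(datadir: str, was_running_primary: bool) -> bool:
    """Hypothetical clusterware pre-start hook for a shared-disk failover.

    Moves backup_label aside so the postmaster performs ordinary crash
    recovery. Returns True if the file was moved, False if nothing was done.

    ASSUMPTION: was_running_primary comes from the cluster manager's own
    knowledge that this directory belonged to a live primary, not from
    anything found inside the directory.
    """
    data = Path(datadir)
    label = data / "backup_label"
    if not label.exists():
        return False  # no online backup was in progress; nothing to do
    if (data / "recovery.conf").exists():
        return False  # archive recovery was requested; leave the label alone
    if not was_running_primary:
        # Origin unknown: keep the label so the server refuses to guess.
        raise RuntimeError("backup_label present and origin unknown; not removing it")
    # Move rather than delete, keeping the contents for inspection
    # (the target name is a hypothetical choice).
    label.rename(data / "backup_label.removed_by_failover")
    return True
```

Keeping the refusal path explicit means the hook cannot silently strip the label from a directory whose history the clusterware does not actually know.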

Yours,
Laurenz Albe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label in a crash recovery
Date: 2009-11-02 14:24:54
Message-ID: 24115.1257171894@sss.pgh.pa.us
Lists: pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> I wonder why backup_label isn't automatically removed
> in normal crash recovery case.

Removing it automatically could be catastrophic if done incorrectly, no?

> If that's intentional, a clusterware for shared disk
> failover system should remove backup_label whenever
> doing failover.

It would be no less catastrophic if done incorrectly from outside the
postmaster; see for example the problems people have had historically
with startup scripts that think they should remove postmaster.pid.

regards, tom lane


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label in a crash recovery
Date: 2009-11-03 10:56:30
Message-ID: D960CB61B694CF459DCFB4B0128514C203937FF7@exadv11.host.magwien.gv.at
Lists: pgsql-hackers

Tom Lane wrote:
> > I wonder why backup_label isn't automatically removed
> > in normal crash recovery case.
>
> Removing it automatically could be catastrophic if done
> incorrectly, no?
>
> It would be no less catastrophic if done incorrectly from outside the
> postmaster; see for example the problems people have had historically
> with startup scripts that think they should remove postmaster.pid.

I beg to differ.

Removing postmaster.pid can lead to a corrupt database.
Removing backup_label means that one of your backups will go wrong,
and a subsequent pg_stop_backup() will throw an error.

If you have a cluster failover during an online backup, I think
any reasonable person would suspect that the backup went wrong.
And if nothing else does, the error on pg_stop_backup() will tell you.

Given a choice, I expect that everybody who is intent enough
on availability to implement such a solution will want the
database to come up if it can be done without data loss.

Is there a flaw in my reasoning?

Yours,
Laurenz Albe


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: laurenz(dot)albe(at)wien(dot)gv(dot)at ("Albe Laurenz"), "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label in a crash recovery
Date: 2009-11-03 15:01:15
Message-ID: 873a4vfynl.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Albe" == "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> writes:

Albe> Removing postmaster.pid can lead to a corrupt database.
Albe> Removing backup_label means that one of your backups will go
Albe> wrong, and a subsequent pg_stop_backup() will throw an error.

Albe> If you have a cluster failover during an online backup, I think
Albe> any reasonable person would suspect that the backup went wrong.
Albe> And if nothing else does, the error on pg_stop_backup() will
Albe> tell you.
[...]
Albe> Is there a flaw in my reasoning?

Yes.

Imagine the following scenario: the system crashed while pg_start_backup
was in effect (so backup_label exists), and the postmaster is about to
start up; i.e., you're at the point where you might naively imagine that
you can delete the backup_label.

How do you distinguish between these two scenarios:

1) you're starting up in a data dir where you crashed in the middle of
a backup

2) you're starting up in a data dir that is a restore of a base backup,
but no recovery.conf has been created

(hint: you can't)

If, in scenario 2, you remove the backup_label and proceed with
recovery, there is a significant chance (depending on the timing, and
if you didn't exclude pg_xlog from the backup) that recovery will
_think_ it has succeeded but will actually leave you with a corrupt data
directory.
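A toy model of the argument, reducing the data directory to the two files under discussion. The `StartupEvidence` type and the naive rule are hypothetical illustrations, not PostgreSQL code; the point is only that both scenarios present identical evidence, so a rule that looks at the files alone must treat them the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupEvidence:
    """Simplified model of what a just-started postmaster can see on disk."""
    backup_label: bool
    recovery_conf: bool


# Scenario 1: the primary crashed while pg_start_backup() was in effect.
CRASHED_DURING_BACKUP = StartupEvidence(backup_label=True, recovery_conf=False)

# Scenario 2: a base backup was restored (backup_label was copied along with
# the data files), but nobody created recovery.conf.
RESTORED_WITHOUT_RECOVERY_CONF = StartupEvidence(backup_label=True, recovery_conf=False)


def naive_should_remove_label(ev: StartupEvidence) -> bool:
    # The tempting rule: "no recovery.conf, so this must be a crash recovery".
    return ev.backup_label and not ev.recovery_conf


# The evidence is identical, so the naive rule also removes the label in
# scenario 2, which is exactly the unsafe case.
for name, ev in [("crashed", CRASHED_DURING_BACKUP),
                 ("restored", RESTORED_WITHOUT_RECOVERY_CONF)]:
    print(name, naive_should_remove_label(ev))
```

Any extra bit of information that separates the two cases has to come from somewhere other than the data directory's contents.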

--
Andrew (irc:RhodiumToad)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: laurenz(dot)albe(at)wien(dot)gv(dot)at ("Albe Laurenz"), "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label in a crash recovery
Date: 2009-11-03 16:01:09
Message-ID: 5091.1257264069@sss.pgh.pa.us
Lists: pgsql-hackers

[ after further thought... ]

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> How do you distinguish between these two scenarios:

> 1) you're starting up in a data dir where you crashed in the middle of
> a backup

> 2) you're starting up in a data dir that is a restore of a base backup,
> but no recovery.conf has been created

> (hint: you can't)

Hmm ... you cannot tell this if the postmaster has just started, and
I agree that removing backup_label in such a case is too risky.
However, in a typical crash scenario the postmaster doesn't die,
it just kills off and restarts its children; and in that scenario
we do have additional knowledge, namely that the postmaster was
already up. I think it could be safe and useful to forcibly remove
backup_label before commencing recovery, *if* we know that the system
had previously been in fully-operational status.

However, this raises the question: does a backend crash necessarily imply
that an in-progress base backup has to be canceled and restarted from
scratch? It's not clear to me why you wouldn't consider that the backup
can keep going. So maybe what we really want here is not to remove the
label file, but to have the postmaster signal to the recovery process
that it knows this is a crash recovery and any backup_label should be
ignored.
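A minimal sketch of the proposed decision, assuming the postmaster can tell the startup process why it is being launched. The `StartupReason` enum and `honor_backup_label` function are hypothetical names for illustration; they model the proposal, not existing PostgreSQL code. Note that the label file is left in place rather than deleted, so the in-progress base backup could keep going.

```python
from enum import Enum, auto


class StartupReason(Enum):
    """Why the startup process is being launched (hypothetical model)."""
    FRESH_POSTMASTER_START = auto()       # the postmaster itself just started
    RESTART_AFTER_BACKEND_CRASH = auto()  # postmaster survived; it had reached
                                          # fully-operational status before


def honor_backup_label(label_exists: bool, reason: StartupReason) -> bool:
    """Return True if recovery should start from backup_label's checkpoint."""
    if not label_exists:
        return False
    if reason is StartupReason.RESTART_AFTER_BACKEND_CRASH:
        # The postmaster was already up on this data directory, so it is the
        # live cluster, not a restored base backup: ignore the label (without
        # removing it) and do ordinary crash recovery.
        return False
    # Fresh start: a crashed primary and a restored backup look the same,
    # so keep the conservative behavior of honoring the label.
    return True
```

The design choice is that only the postmaster's in-memory knowledge, never the on-disk state, is allowed to override the label.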

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label in a crash recovery
Date: 2009-11-04 01:58:38
Message-ID: 3f0b79eb0911031758s248c13e5vcfd594c3d30f12bf@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Wed, Nov 4, 2009 at 12:01 AM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> 2) you're starting up in a data dir that is a restore of a base backup,
>   but no recovery.conf has been created

Is scenario 2 (i.e., a normal crash recovery without recovery.conf)
supported in Postgres? Either way, it can happen through an admin's
operational error. So maybe backup_label should not be removed
automatically, as a fail-safe for that case.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center