Re: backup_label and server start

Lists: pgsql-hackers
From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: backup_label and server start
Date: 2007-11-20 14:19:26
Message-ID: D960CB61B694CF459DCFB4B0128514C293CEB7@exadv11.host.magwien.gv.at

If the postmaster is stopped with 'pg_ctl stop' while an
online backup is in progress, the 'backup_label' file will remain
in the data directory.

There is no recovery.conf file present.

When the server is started again, it attempts to recover from
the checkpoint marked in the backup_label file even if the
shutdown was clean.

If the WAL file mentioned in backup_label is not in pg_xlog
(it has already been archived and removed because there was
enough database activity since pg_start_backup()), the startup
process will fail with a message like this:

LOG: could not open file "pg_xlog/000000020000000000000084" (log file 0, segment 132): No such file or directory
LOG: invalid checkpoint record
PANIC: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".
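
The failure condition described above can be checked without modifying the data directory. The following is a minimal, read-only sketch, not a PostgreSQL tool. It assumes that the first line of backup_label has the form "START WAL LOCATION: <location> (file <24-hex-digit segment name>)", a layout that is not quoted anywhere in this thread, and that WAL segments live in pg_xlog as in the log output above. The function name diagnose() is hypothetical. It only reports which situation applies; it does not decide whether removing the file is safe.

```python
import os
import re

# Assumed layout of the first line of backup_label (not quoted in this thread):
#   START WAL LOCATION: 0/84000020 (file 000000020000000000000084)
START_LINE = re.compile(r"^START WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)")


def diagnose(data_dir):
    """Classify a data directory. Read-only: nothing is removed or renamed."""
    label_path = os.path.join(data_dir, "backup_label")
    if not os.path.exists(label_path):
        return "no backup_label"
    if os.path.exists(os.path.join(data_dir, "recovery.conf")):
        return "backup_label with recovery.conf"
    # Only the first line is inspected; it names the starting WAL segment.
    with open(label_path) as f:
        match = START_LINE.match(f.readline())
    if match is None:
        return "backup_label present, start segment unreadable"
    segment = match.group(1)
    if os.path.exists(os.path.join(data_dir, "pg_xlog", segment)):
        return "backup_label present, segment %s available" % segment
    return "backup_label present, segment %s missing" % segment
```

The "segment ... missing" result corresponds to the PANIC shown above: backup_label points at a WAL file that has already been archived and removed from pg_xlog.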

My question:
Is it safe to just delete the file as the hint suggests?

I see the following comment in src/backend/access/transam/xlog.c:

/*
* read_backup_label: check to see if a backup_label file is present
*
* If we see a backup_label during recovery, we assume that we are recovering
* from a backup dump file, and we therefore roll forward from the checkpoint
* identified by the label file, NOT what pg_control says. This avoids the
* problem that pg_control might have been archived one or more checkpoints
* later than the start of the dump, and so if we rely on it as the start
* point, we will fail to restore a consistent database state.

"We will fail to restore a consistent database state"
sounds rather intimidating.

*If* - on the other hand - it is safe to follow the hint
and remove the backup_label, wouldn't it be a good thing
for the startup process to ignore (and rename) the backup_label
file if no recovery.conf is present?

Or, alternatively, the backup_label file could be removed by a
clean shutdown.

Thanks,
Laurenz Albe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label and server start
Date: 2007-11-20 15:48:46
Message-ID: 25388.1195573726@sss.pgh.pa.us
Lists: pgsql-hackers

"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> writes:
> wouldn't it be a good thing
> for the startup process to ignore (and rename) the backup_label
> file if no recovery.conf is present?

No, it certainly wouldn't.

I don't see why we should simplify the bizarre case you're talking about
at the price of putting land mines under the feet of people who are
actually trying to do a restore. It hasn't lost any data for you,
and it gave you a correct HINT, so I don't have a problem with the
current behavior.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label and server start
Date: 2007-11-20 17:11:53
Message-ID: 1195578713.4217.283.camel@ebony.site
Lists: pgsql-hackers

On Tue, 2007-11-20 at 15:19 +0100, Albe Laurenz wrote:

> "We will fail to restore a consistent database state"
> sounds rather intimidating.

Well, how else should a warning of data loss sound? :-)

It's vaguely possible that the database state could be consistent, if
the server were quiet when you stopped it. But that is unlikely *and*
there is no way of knowing for certain; that is why we introduced
pg_stop_backup() in the first place.

> *If* - on the other hand - it is safe to follow the hint
> and remove the backup_label, wouldn't it be a good thing
> for the startup process to ignore (and rename) the backup_label
> file if no recovery.conf is present?

The hint is telling you how to restart the original server, not a crafty
way of cheating the process to allow you to use it for backup.

What are you trying to do?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "Simon Riggs *EXTERN*" <simon(at)2ndquadrant(dot)com>
Subject: Re: backup_label and server start
Date: 2007-11-21 08:04:21
Message-ID: D960CB61B694CF459DCFB4B0128514C293D051@exadv11.host.magwien.gv.at
Lists: pgsql-hackers

>> If the postmaster is stopped with 'pg_ctl stop' while an
>> online backup is in progress, the 'backup_label' file will remain
>> in the data directory.
[...]
>> the startup process will fail with a message like this:
[...]
>> PANIC: could not locate required checkpoint record
>> HINT: If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".
>>
>> wouldn't it be a good thing
>> for the startup process to ignore (and rename) the backup_label
>> file if no recovery.conf is present?

Tom Lane replied:
> No, it certainly wouldn't.

Point taken. When backup_label is present and recovery.conf isn't,
there is the risk that the data directory has been restored from
an online backup, in which case using the latest available
checkpoint would be detrimental.

> I don't see why we should simplify the bizarre case you're
> talking about

Well, it's not a bizarre case, it has happened twice here.

If somebody stops the postmaster while an online backup is
in progress, there is no warning or nothing. Only the server
will fail to restart.

One of our databases is running in a RedHat cluster, which
in this case cannot failover to another node.
And this can also happen during an online backup.

Simon Riggs replied:
> The hint is telling you how to restart the original server, not a crafty
> way of cheating the process to allow you to use it for backup.
>
> What are you trying to do?

You misunderstood me, I'm not trying to cheat anything, nor do
I want to restore a backup that way.

All I want to do is restart a server after a clean shutdown.

How about my second suggestion:

Remove backup_label when the server shuts down cleanly.
In that case an online backup in progress will not be useful
anyway, and there is no need to recover on server restart.

What do you think?

Yours,
Laurenz Albe


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Tom Lane *EXTERN* <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label and server start
Date: 2007-11-21 09:07:45
Message-ID: 1195636065.4217.431.camel@ebony.site
Lists: pgsql-hackers

On Wed, 2007-11-21 at 09:04 +0100, Albe Laurenz wrote:

> If somebody stops the postmaster while an online backup is
> in progress, there is no warning or nothing. Only the server
> will fail to restart.

Well, it seems best not to do this. There is always a need for a careful
procedure to manually shutdown a live server, interlocking with other
applications. ISTM like a manual procedure will resolve this for you.

If we remove the file in the place you suggest then an Archive Recovery
will succeed when it should fail, with no possibility of a hint, which
seems a worse error.

> All I want to do is restart a server after a clean shutdown.
>
> How about my second suggestion:
>
> Remove backup_label when the server shuts down cleanly.
> In that case an online backup in progress will not be useful
> anyway, and there is no need to recover on server restart.

That will make PITRs fail:

1. pg_start_backup()
2. backup
3. shutdown, removes backup_label
4. pg_stop_backup()

step 4 will now fail because of a missing backup_label file.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: "Peter Childs" <peterachilds(at)gmail(dot)com>
To:
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label and server start
Date: 2007-11-21 09:47:22
Message-ID: a2de01dd0711210147r7ba662e9l861e785129ef757e@mail.gmail.com
Lists: pgsql-hackers

On 21/11/2007, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Wed, 2007-11-21 at 09:04 +0100, Albe Laurenz wrote:
>
> > If somebody stops the postmaster while an online backup is
> > in progress, there is no warning or nothing. Only the server
> > will fail to restart.
>
> Well, it seems best not to do this. There is always a need for a careful
> procedure to manually shutdown a live server, interlocking with other
> applications. ISTM like a manual procedure will resolve this for you.
>
> If we remove the file in the place you suggest then an Archive Recovery
> will succeed when it should fail, with no possibility of a hint, which
> seems a worse error.
>
> > All I want to do is restart a server after a clean shutdown.
> >
> > How about my second suggestion:
> >
> > Remove backup_label when the server shuts down cleanly.
> > In that case an online backup in progress will not be useful
> > anyway, and there is no need to recover on server restart.
>
> That will make PITRs fail:
>
> 1. pg_start_backup()
> 2. backup
> 3. shutdown, removes backup_label
> 4. pg_stop_backup()
>
> step 4 will now fail because of a missing backup_label file.
>
>
How about this, emit a warning on shutdown and fail to shutdown until the
backup has finished.

Seems to me that either way you're sunk if you shut down a server while a
backup is in progress. Your only way out is to work out whether to use the
previous PITR backups plus logs or remove the label. Doing it automatically
would be very, very dangerous.

Peter.


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Peter Childs <peterachilds(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label and server start
Date: 2007-11-21 13:33:52
Message-ID: 1195652032.4246.7.camel@ebony.site
Lists: pgsql-hackers

On Wed, 2007-11-21 at 09:47 +0000, Peter Childs wrote:

> How about this, emit a warning on shutdown and fail to shutdown until
> the backup has finished.

That would be reasonable for -m smart shutdown.

We would then be treating the backup as a connection.

...but not for a fast shutdown.

Any comments against?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Simon Riggs *EXTERN*" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label and server start
Date: 2007-11-21 14:04:47
Message-ID: D960CB61B694CF459DCFB4B0128514C293D33A@exadv11.host.magwien.gv.at
Lists: pgsql-hackers

Simon Riggs wrote:
>> If somebody stops the postmaster while an online backup is
>> in progress, there is no warning or nothing. Only the server
>> will fail to restart.
>
> Well, it seems best not to do this. There is always a need
> for a careful
> procedure to manually shutdown a live server, interlocking with other
> applications. ISTM like a manual procedure will resolve this for you.

You're arguing that there *should* be a manual intervention
if a server was shutdown while a backup was active.

> If we remove the file in the place you suggest then an Archive Recovery
> will succeed when it should fail, with no possibility of a hint, which
> seems a worse error.
>
>> How about my second suggestion:
>>
>> Remove backup_label when the server shuts down cleanly.
>> In that case an online backup in progress will not be useful
>> anyway, and there is no need to recover on server restart.
>
> That will make PITRs fail:
>
> 1. pg_start_backup()
> 2. backup
> 3. shutdown, removes backup_label
> 4. pg_stop_backup()
>
> step 4 will now fail because of a missing backup_label file.

Using the same kind of argument as you did above I would
say that pg_stop_backup() *should* fail if the server
restarted (and recovered!) in between - there was certainly something
fishy going on during the online backup.

In your list, you left out step 3.5: restart the server.
This step may fail if you do *not* remove the backup_label.

What is worse:
- Have pg_stop_backup() fail if the server was shut down
during the backup
or
- Prevent the server from restarting at all without manual
intervention.

I would say the latter.

Yours,
Laurenz Albe


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Tom Lane *EXTERN* <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: backup_label and server start
Date: 2007-11-21 14:32:22
Message-ID: 1195655542.4246.27.camel@ebony.site
Lists: pgsql-hackers

On Wed, 2007-11-21 at 15:04 +0100, Albe Laurenz wrote:
> Simon Riggs wrote:
> >> If somebody stops the postmaster while an online backup is
> >> in progress, there is no warning or nothing. Only the server
> >> will fail to restart.
> >
> > Well, it seems best not to do this. There is always a need
> > for a careful
> > procedure to manually shutdown a live server, interlocking with other
> > applications. ISTM like a manual procedure will resolve this for you.
>
> You're arguing that there *should* be a manual intervention
> if a server was shutdown while a backup was active.

Shutting down the server was a manual action, so what is wrong in a
manual action to recover from that mistake?

If the shutdown was automatic, then it needs to be properly scheduled so
automatic actions do not conflict with one another.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Simon Riggs *EXTERN*" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label and server start
Date: 2007-11-21 15:29:57
Message-ID: D960CB61B694CF459DCFB4B0128514C293D3DD@exadv11.host.magwien.gv.at
Lists: pgsql-hackers

Simon Riggs wrote:
> That will make PITRs fail:
>
> 1. pg_start_backup()
> 2. backup
> 3. shutdown, removes backup_label
> 4. pg_stop_backup()
>
> step 4 will now fail because of a missing backup_label file.

Wait a minute:
pg_stop_backup() will also fail in the current setup,
because after recovery backup_label gets renamed
to backup_label.old.

So what do we lose if we remove (or rename) backup_label
on a clean server shutdown?
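
The comparison can be made concrete with a toy model. This is only a sketch of the file-level bookkeeping as described in this thread, not of the server's actual code: it assumes the restart in step 3.5 succeeds (the required WAL segment is still in pg_xlog) and that recovery renames backup_label to backup_label.old, as stated above. The function name is hypothetical.

```python
def label_present_at_stop_backup(remove_on_clean_shutdown):
    """Toy model: is backup_label still there when pg_stop_backup() runs?"""
    files = {"backup_label"}            # 1. pg_start_backup() writes the label
    # 2. the file-level backup runs (no effect on this model)
    if remove_on_clean_shutdown:        # 3. clean shutdown (proposed behaviour)
        files.discard("backup_label")
    if "backup_label" in files:         # 3.5 restart: recovery from the label,
        files.remove("backup_label")    #     then rename to backup_label.old
        files.add("backup_label.old")
    return "backup_label" in files      # 4. what pg_stop_backup() would find


for proposed in (False, True):
    print(proposed, label_present_at_stop_backup(proposed))
```

Under these assumptions the label is absent at step 4 in both cases, so the only difference in the model is whether the restart has to go through recovery from the label.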

Yours,
Laurenz Albe


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Simon Riggs *EXTERN*" <simon(at)2ndquadrant(dot)com>, "Peter Childs" <peterachilds(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label and server start
Date: 2007-11-22 07:48:36
Message-ID: D960CB61B694CF459DCFB4B0128514C299CAF9@exadv11.host.magwien.gv.at
Lists: pgsql-hackers

Simon Riggs wrote:
> On Wed, 2007-11-21 at 09:47 +0000, Peter Childs wrote:
>> How about this, emit a warning on shutdown and fail to shutdown until
>> the backup has finished.
>
> That would be reasonable for -m smart shutdown.
>
> We would then be treating the backup as a connection.
>
> ...but not for a fast shutdown.
>
> Any comments against?

No, that would be ok with me.

Anything that gets us out of the trap that you can shut down
a server without any warning and then cannot restart it without
manual intervention.

What about: refuse shutdown for "smart" if a backup is in progress,
but shutdown with a loud warning for "fast".
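
A minimal sketch of that policy, as a pure decision function rather than anything resembling the postmaster's code. The function name, the return shape and the message texts are hypothetical; only the "smart" and "fast" modes discussed here are handled.

```python
def shutdown_decision(mode, backup_in_progress):
    """Return (proceed, message) for a shutdown request.

    Hypothetical policy from this thread: "smart" refuses while an online
    backup is in progress, "fast" proceeds but warns loudly.
    """
    if mode not in ("smart", "fast"):
        # Other shutdown modes are not discussed in the thread.
        raise ValueError("unsupported mode: %r" % (mode,))
    if not backup_in_progress:
        return (True, "")
    if mode == "smart":
        return (False, "WARNING: online backup in progress, shutdown refused")
    return (True, "WARNING: online backup in progress, shutting down anyway")


print(shutdown_decision("smart", True))
print(shutdown_decision("fast", True))
```

Keeping the decision separate from the act of shutting down makes the two cases easy to compare: the smart case returns proceed=False, the fast case returns proceed=True with a warning.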

... I still don't know what's wrong with removing backup_label
upon a clean server shutdown ...

Yours,
Laurenz Albe


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Simon Riggs *EXTERN*" <simon(at)2ndquadrant(dot)com>
Subject: Re: backup_label and server start
Date: 2007-11-24 20:19:21
Message-ID: 200711242019.lAOKJL521577@momjian.us
Lists: pgsql-hackers


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Albe Laurenz wrote:
> >> If the postmaster is stopped with 'pg_ctl stop' while an
> >> online backup is in progress, the 'backup_label' file will remain
> >> in the data directory.
> [...]
> >> the startup process will fail with a message like this:
> [...]
> >> PANIC: could not locate required checkpoint record
> >> HINT: If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".
> >>
> >> wouldn't it be a good thing
> >> for the startup process to ignore (and rename) the backup_label
> >> file if no recovery.conf is present?
>
> Tom Lane replied:
> > No, it certainly wouldn't.
>
> Point taken. When backup_label is present and recovery.conf isn't,
> there is the risk that the data directory has been restored from
> an online backup, in which case using the latest available
> checkpoint would be detrimental.
>
> > I don't see why we should simplify the bizarre case you're
> > talking about
>
> Well, it's not a bizarre case, it has happened twice here.
>
> If somebody stops the postmaster while an online backup is
> in progress, there is no warning or nothing. Only the server
> will fail to restart.
>
> One of our databases is running in a RedHat cluster, which
> in this case cannot failover to another node.
> And this can also happen during an online backup.
>
> Simon Riggs replied:
> > The hint is telling you how to restart the original server, not a crafty
> > way of cheating the process to allow you to use it for backup.
> >
> > What are you trying to do?
>
> You misunderstood me, I'm not trying to cheat anything, nor do
> I want to restore a backup that way.
>
> All I want to do is restart a server after a clean shutdown.
>
> How about my second suggestion:
>
> Remove backup_label when the server shuts down cleanly.
> In that case an online backup in progress will not be useful
> anyway, and there is no need to recover on server restart.
>
> What do you think?
>
> Yours,
> Laurenz Albe

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Simon Riggs *EXTERN*" <simon(at)2ndquadrant(dot)com>
Subject: Re: backup_label and server start
Date: 2008-03-17 21:49:27
Message-ID: 200803172149.m2HLnSK08467@momjian.us
Lists: pgsql-hackers


Added to TODO:

o Fix server restart problem when the server was shutdown during
a PITR backup

http://archives.postgresql.org/pgsql-hackers/2007-11/msg00800.php

---------------------------------------------------------------------------

Albe Laurenz wrote:
> >> If the postmaster is stopped with 'pg_ctl stop' while an
> >> online backup is in progress, the 'backup_label' file will remain
> >> in the data directory.
> [...]
> >> the startup process will fail with a message like this:
> [...]
> >> PANIC: could not locate required checkpoint record
> >> HINT: If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".
> >>
> >> wouldn't it be a good thing
> >> for the startup process to ignore (and rename) the backup_label
> >> file if no recovery.conf is present?
>
> Tom Lane replied:
> > No, it certainly wouldn't.
>
> Point taken. When backup_label is present and recovery.conf isn't,
> there is the risk that the data directory has been restored from
> an online backup, in which case using the latest available
> checkpoint would be detrimental.
>
> > I don't see why we should simplify the bizarre case you're
> > talking about
>
> Well, it's not a bizarre case, it has happened twice here.
>
> If somebody stops the postmaster while an online backup is
> in progress, there is no warning or nothing. Only the server
> will fail to restart.
>
> One of our databases is running in a RedHat cluster, which
> in this case cannot failover to another node.
> And this can also happen during an online backup.
>
> Simon Riggs replied:
> > The hint is telling you how to restart the original server, not a crafty
> > way of cheating the process to allow you to use it for backup.
> >
> > What are you trying to do?
>
> You misunderstood me, I'm not trying to cheat anything, nor do
> I want to restore a backup that way.
>
> All I want to do is restart a server after a clean shutdown.
>
> How about my second suggestion:
>
> Remove backup_label when the server shuts down cleanly.
> In that case an online backup in progress will not be useful
> anyway, and there is no need to recover on server restart.
>
> What do you think?
>
> Yours,
> Laurenz Albe

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +