Proposal: improve shutdown during online backup

Lists: pgsql-hackers
From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal: improve shutdown during online backup
Date: 2008-03-26 13:36:53
Message-ID: D960CB61B694CF459DCFB4B0128514C201E6780F@exadv11.host.magwien.gv.at

I'm referring to the discussion in this thread:
http://archives.postgresql.org/pgsql-hackers/2007-11/msg00946.php

As expressed in the thread, I think that there should not be
a backup_label file in the data directory after a clean shutdown,
because
a) it's probably an oversight anyway (someone forgot to
call pg_stop_backup) and
b) it will force an unnecessary recovery at server restart,
which will sometimes fail (if the WAL file is no longer there).

This is my proposal:

1) On "pg_ctl stop|restart -m smart", check if online backup is
in progress and do not shutdown in this case (treat the online
backup like an open connection).
2) On "pg_ctl stop|restart -m fast", remove backup_label after
the server has been brought down successfully.
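
A minimal sketch of the two cases above, not PostgreSQL code. It assumes
that an online backup is in progress exactly when backup_label exists in
the data directory; the names plan_shutdown, backup_in_progress and Action
are hypothetical and exist only for this illustration.

```python
import os
from enum import Enum


class Action(Enum):
    STOP = "stop the server"
    WAIT = "keep running"
    STOP_AND_REMOVE_LABEL = "stop the server, then remove backup_label"


def backup_in_progress(data_dir):
    # Assumption: backup_label in the data directory means that
    # pg_start_backup was called and pg_stop_backup has not been.
    return os.path.exists(os.path.join(data_dir, "backup_label"))


def plan_shutdown(mode, data_dir, open_connections):
    """Decide what a shutdown request should do under the proposal."""
    in_backup = backup_in_progress(data_dir)
    if mode == "smart":
        # Case 1: an online backup blocks a smart shutdown the same
        # way an open connection does.
        if open_connections > 0 or in_backup:
            return Action.WAIT
        return Action.STOP
    if mode == "fast":
        # Case 2: a fast shutdown always stops the server; the label
        # is dealt with only after the server is down.
        if in_backup:
            return Action.STOP_AND_REMOVE_LABEL
        return Action.STOP
    raise ValueError("unsupported shutdown mode: %r" % (mode,))
```

Immediate mode is left out on purpose, since the proposal only covers
smart and fast.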

If that's acceptable, I'd be willing to work on it.

Yours,
Laurenz Albe


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: improve shutdown during online backup
Date: 2008-03-26 20:51:07
Message-ID: 1206564667.4285.1223.camel@ebony.site

On Wed, 2008-03-26 at 14:36 +0100, Albe Laurenz wrote:
> I'm referring to the discussion in this thread:
> http://archives.postgresql.org/pgsql-hackers/2007-11/msg00946.php
>
> As expressed in the thread, I think that there should not be
> a backup_label file in the data directory after a clean shutdown,
> because
> a) it's probably an oversight anyway (someone forgot to
> call pg_stop_backup) and
> b) it will force an unnecessary recovery at server restart,
> which will sometimes fail (if the WAL file is no longer there).
>
> This is my proposal:
>
> 1) On "pg_ctl stop|restart -m smart", check if online backup is
> in progress and do not shutdown in this case (treat the online
> backup like an open connection).
> 2) On "pg_ctl stop|restart -m fast", remove backup_label after
> the server has been brought down successfully.
>
> If that's acceptable, I'd be willing to work on it.

Seems reasonable. Go for it.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: improve shutdown during online backup
Date: 2008-03-26 23:54:50
Message-ID: Pine.GSO.4.64.0803261917180.2012@westnet.com

On Wed, 26 Mar 2008, Albe Laurenz wrote:

> 1) On "pg_ctl stop|restart -m smart", check if online backup is
> in progress and do not shutdown in this case (treat the online
> backup like an open connection).

As long as you give a warning as to the cause. While you're in there, I
think more output in general about the reason why a smart shutdown failed
would be nice as well. I haven't looked at the code to see if it's
practical but I'd love "shutdown blocked by pid 53213,53216" rather than
having to go search for them myself after it quietly fails.

> 2) On "pg_ctl stop|restart -m fast", remove backup_label after
> the server has been brought down successfully.

And you need a warning here as well about this fact. I think the actual
details associated with that label should be both printed and put into the
logs at this time, so you know which backup you just hosed. Maybe the
label file could get renamed instead? Just deleting the file without
saving it somewhere doesn't seem right, that's the sort of thing MySQL
would do. If there's [one|some] of those failed backup logs inside $PGDATA
that gives an additional clue to an admin who doesn't watch the logs that
something is wrong with the backups.
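
A rough sketch of the "log it, then rename it" idea, again not PostgreSQL
code. The function name retire_backup_label, the logger name and the
"backup_label.aborted.<timestamp>" naming scheme are all assumptions made
for illustration; nothing in this thread settles what the renamed file
should be called.

```python
import logging
import os
import time

log = logging.getLogger("shutdown_sketch")  # hypothetical logger name


def retire_backup_label(data_dir, now=None):
    """Log the contents of backup_label, then rename it out of the way.

    Returns the new path, or None if there was no label to retire.
    Meant to run only after the server has been brought down.
    """
    label = os.path.join(data_dir, "backup_label")
    if not os.path.exists(label):
        return None

    # Read the label first so the warning can say which backup was lost.
    with open(label, "r", errors="replace") as f:
        contents = f.read()

    # Hypothetical naming scheme: keep the file in the data directory
    # under a timestamped name so the admin can still find it.
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    target = os.path.join(data_dir, "backup_label.aborted." + stamp)

    log.warning(
        "online backup aborted by fast shutdown; label saved as %s:\n%s",
        target,
        contents,
    )
    os.rename(label, target)
    return target
```

Renaming instead of deleting means the next start no longer sees
backup_label, while the leftover file still tells the admin that a backup
was cut short.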

There are three options here for how "-m fast" could handle things:

1) Warning, remove backup label.

2) Warning and server is not stopped. This is unacceptable because too
many scripts expect fast shutdown will usually take the server down
(/etc/init.d/postgresql being the most popular).

3) Server stops but you do get a stern warning that it will not start
again until you remove the backup label yourself--the current behavior
with a warning. The problem with this one is that some shutdowns don't
have any human involvement (again, consider server reboot) and therefore
you can't assume anyone will ever see this message.

If you want to remove the root problem here, you have to follow (1) and
remove the label. Otherwise it's still the case that the person who
starts the database will be surprised if the person stopping it isn't
paying attention (or isn't a person).

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Greg Smith *EXTERN*" <gsmith(at)gregsmith(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: improve shutdown during online backup
Date: 2008-03-27 07:51:40
Message-ID: D960CB61B694CF459DCFB4B0128514C201ED197D@exadv11.host.magwien.gv.at

Greg Smith wrote:
> > 1) On "pg_ctl stop|restart -m smart", check if online backup is
> > in progress and do not shutdown in this case (treat the online
> > backup like an open connection).
>
> As long as you give a warning as to the cause. While you're in there, I
> think more output in general about the reason why a smart shutdown failed
> would be nice as well.

I'll see what I can do.

> > 2) On "pg_ctl stop|restart -m fast", remove backup_label after
> > the server has been brought down successfully.
>
> And you need a warning here as well about this fact. I think the actual
> details associated with that label should be both printed and put into the
> logs at this time, so you know which backup you just hosed.

Sounds right.

> Maybe the label file could get renamed instead?

I agree.

> There are three options here for how "-m fast" could handle things:
>
> 1) Warning, remove backup label.

I prefer that.

Thanks for the feedback,
Laurenz Albe