Re: backup_label revisited

Lists: pgsql-hackers
From: Greg Stark <stark(at)mit(dot)edu>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: backup_label revisited
Date: 2014-05-29 12:12:14
Message-ID: CAM-w4HPAmArty712RtOvWAZbxOH9=zUGafEoDmPnYc9rS_XL3w@mail.gmail.com

So I ran into the case again where a system crashed while a hot backup
was being taken. Postgres couldn't start up automatically because the
backup_label was present. This has come up before e.g.
http://www.postgresql.org/message-id/CAAZKuFaP1GxcOJtyzCh13rvevJeVwro1VVfbYsQTWGUD9iS1_g@mail.gmail.com
but I believe no progress was made.

I was trying to think if we could somehow identify if the backup_label
was from a backup in progress or a restore in progress. Obvious
choices like putting the server IP address in it are not
going to work for several reasons.

However, at least on Linux wouldn't it be sufficient to put the inode
number of the backup_label file in the backup_label? If it's still the
same inode then it's just restarting, not a restore since afaik
there's no way for tar or the like to recreate the file with the same
inode on any filesystem. That would even protect against another
restore on the same host.
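
A minimal sketch of that check, for illustration only. It assumes a
hypothetical extra "INODE: <n>" line in backup_label (no such field exists
in the real file format) and a POSIX filesystem where st_ino is meaningful.
The function names are made up for this example.

```python
import os

# Hypothetical field name; the real backup_label format has no such line.
INODE_PREFIX = "INODE: "


def write_label_with_inode(path, body):
    """Write the label contents, then record the file's own inode number in it."""
    with open(path, "w") as f:
        f.write(body)
        if not body.endswith("\n"):
            f.write("\n")
        # The inode is fixed once the file is created, so it can be
        # read back from the open descriptor and stored in the file itself.
        inode = os.fstat(f.fileno()).st_ino
        f.write(f"{INODE_PREFIX}{inode}\n")


def label_matches_its_inode(path):
    """Return True if the inode recorded in the label equals the file's current inode.

    True suggests the same file survived a crash and restart; False suggests
    the file was recreated, for example by unpacking a tar archive.
    """
    current = os.stat(path).st_ino
    with open(path) as f:
        for line in f:
            if line.startswith(INODE_PREFIX):
                return int(line[len(INODE_PREFIX):]) == current
    # No recorded inode: treat it as a restore, the conservative choice.
    return False
```

A label copied to a new file while the original still exists necessarily
gets a different inode, so the check returns False for the copy.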

--
greg


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label revisited
Date: 2014-06-02 13:10:27
Message-ID: CAHGQGwGfbDa2iL9ZEqrDhX4Kk9cjho1HUxv_tnUkCR4tBvzwFA@mail.gmail.com

On Thu, May 29, 2014 at 9:12 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> So I ran into the case again where a system crashed while a hot backup
> was being taken. Postgres couldn't start up automatically because the
> backup_label was present. This has come up before e.g.
> http://www.postgresql.org/message-id/CAAZKuFaP1GxcOJtyzCh13rvevJeVwro1VVfbYsQTWGUD9iS1_g@mail.gmail.com
> but I believe no progress was made.
>
> I was trying to think if we could somehow identify if the backup_label
> was from a backup in progress or a restore in progress. Obvious
> choices like putting the server IP address in it are not
> going to work for several reasons.
>
> However, at least on Linux wouldn't it be sufficient to put the inode
> number of the backup_label file in the backup_label? If it's still the
> same inode then it's just restarting, not a restore since afaik
> there's no way for tar or the like to recreate the file with the same
> inode on any filesystem.

Could you let me know the link to the page explaining this?

> That would even protect against another
> restore on the same host.

What about the case where we restore the backup to another server and
start the recovery? In this case, ISTM inode can be the same. No?

Regards,

--
Fujii Masao


From: Greg Stark <stark(at)mit(dot)edu>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label revisited
Date: 2014-06-04 17:17:29
Message-ID: CAM-w4HODJejOnLYYYcYF8vv+jNn1_FFP7K2jGFMO8=EtxYcRFg@mail.gmail.com

On Mon, Jun 2, 2014 at 2:10 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Could you let me know the link to the page explaining this?
>
>> That would even protect against another
>> restore on the same host.
>
> What about the case where we restore the backup to another server and
> start the recovery? In this case, ISTM inode can be the same. No?

Hm, I was about to comment that that seems very unlikely. Even on the
same server if you delete the old database root and then unpack a
backup it could possibly reuse the exact same inode again. But it's
really not likely to happen.

However in the brave new world of filesystem snapshots there is the
possibility someone has "restored" a backup by opening one of their
snapshots read-write. In which case the backup_label would have the
same inode number. That would still be fine if the snapshot is
consistent but if they have tablespaces and those tablespaces were
snapshotted separately then it wouldn't be ok.

I think that's a fatal flaw unless anyone can see a way to distinguish
a filesystem from a snapshot of the filesystem.

--
greg


From: Noah Misch <noah(at)leadboat(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup_label revisited
Date: 2014-06-08 23:57:19
Message-ID: 20140608235719.GA585541@tornado.leadboat.com

On Wed, Jun 04, 2014 at 06:17:29PM +0100, Greg Stark wrote:
> On Mon, Jun 2, 2014 at 2:10 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > What about the case where we restore the backup to another server and
> > start the recovery? In this case, ISTM inode can be the same. No?
>
> Hm, I was about to comment that that seems very unlikely. Even on the
> same server if you delete the old database root and then unpack a
> backup it could possibly reuse the exact same inode again. But it's
> really not likely to happen.
>
> However in the brave new world of filesystem snapshots there is the
> possibility someone has "restored" a backup by opening one of their
> snapshots read-write. In which case the backup_label would have the
> same inode number. That would still be fine if the snapshot is
> consistent but if they have tablespaces and those tablespaces were
> snapshotted separately then it wouldn't be ok.
>
> I think that's a fatal flaw unless anyone can see a way to distinguish
> a filesystem from a snapshot of the filesystem.

Implementations of the "dump" program likely have that property of preserving
inode metadata while not promising a consistent snapshot. If we keep such
backup methods fully supported, I agree with your conclusion.

PostgreSQL can't, in general, distinguish an almost-snapshot from a master
that crashed during a backup. But the administrator can track the difference.
If you use pg_start_backup(), your init.d script that gets control after a
crash can have 'rm -f "$PGDATA"/backup_label'. Use a different script to
recover hot backups. Perhaps what ought to change is the documentation and
contrib/start-scripts? Maybe even add a --definitely-not-a-backup postmaster
option, so scripts need not hardcode knowledge of a semi-internal fact like
the name of the file to delete.
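
As a rough sketch of the crash-restart half of that split, here is the
equivalent of the 'rm -f' step written as a small Python helper. The
function name is hypothetical, and the helper is only appropriate in the
script that restarts a server on which pg_start_backup() was used. It must
not be used in the script that recovers a hot backup, where backup_label
has to stay in place.

```python
import os


def remove_stale_backup_label(pgdata):
    """Delete backup_label from the data directory, like `rm -f`.

    Returns True if a file was removed, False if none was present.
    The caller is responsible for knowing that this data directory
    belongs to a crashed server and not to a restored backup.
    """
    label = os.path.join(pgdata, "backup_label")
    try:
        os.remove(label)
        return True
    except FileNotFoundError:
        # Like `rm -f`, a missing file is not an error.
        return False
```

The return value lets the calling script log whether a leftover label was
actually found before the server is started.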

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com