backup tools ought to ensure created backups are durable

Lists: pgsql-hackers
From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: backup tools ought to ensure created backups are durable
Date: 2016-03-27 23:30:33
Message-ID: 20160327233033.GD20662@awork2.anarazel.de

Hi,

As pointed out in
http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
make any efforts to ensure their output is durable.

I think for backup tools of possibly critical data, that's pretty much
unacceptable.

There are cases where we can't ensure durability (e.g. pg_dump | gzip >
file), but that's out of our hands.

Greetings,

Andres Freund


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup tools ought to ensure created backups are durable
Date: 2016-03-28 01:11:55
Message-ID: CAB7nPqS=h+5dF3dea1EzsSHB_D23kLhsgS7Lg1jMqs04dSZR6Q@mail.gmail.com

On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> As pointed out in
> http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
> our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
> make any efforts to ensure their output is durable.
>
> I think for backup tools of possibly critical data, that's pretty much
> unacceptable.

Definitely agreed, once a backup/dump has been taken and those
utilities exit, we had better ensure that they are durably on disk.
For pg_basebackup and pg_dump, as everything except pg_dump/plain
require a target directory for the location of the output result, we
really can make things better.
--
Michael


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup tools ought to ensure created backups are durable
Date: 2016-03-28 09:35:57
Message-ID: CABUevEwTBLLAxz9V6YAj6v+2WKWW2emBkAmec4vb0=DgGFYzsA@mail.gmail.com

On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > As pointed out in
> > http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
> > our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
> > make any efforts to ensure their output is durable.
> >
> > I think for backup tools of possibly critical data, that's pretty much
> > unacceptable.
>
> Definitely agreed, once a backup/dump has been taken and those
> utilities exit, we had better ensure that they are durably on disk.
> For pg_basebackup and pg_dump, as everything except pg_dump/plain
> require a target directory for the location of the output result, we
> really can make things better.
>
>
Definitely agreed on fixing it. But I don't think your summary is right.

pg_basebackup in tar mode can be sent to stdout, does not require a
directory. And the same for pg_dump in any mode except for directory. So we
can't just drive it off the mode, some more detailed checks are required.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Andres Freund <andres(at)anarazel(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup tools ought to ensure created backups are durable
Date: 2016-03-28 13:12:22
Message-ID: 20160328131222.wdtngvibt6k2p5id@awork2.anarazel.de

On 2016-03-28 11:35:57 +0200, Magnus Hagander wrote:
> On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> wrote:
>
> > On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > As pointed out in
> > > http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
> > > our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
> > > make any efforts to ensure their output is durable.
> > >
> > > I think for backup tools of possibly critical data, that's pretty much
> > > unacceptable.
> >
> > Definitely agreed, once a backup/dump has been taken and those
> > utilities exit, we had better ensure that they are durably on disk.
> > For pg_basebackup and pg_dump, as everything except pg_dump/plain
> > require a target directory for the location of the output result, we
> > really can make things better.
> >
> >
> Definitely agreed on fixing it. But I don't think your summary is right.
>
> pg_basebackup in tar mode can be sent to stdout, does not require a
> directory. And the same for pg_dump in any mode except for directory. So we
> can't just drive it off the mode, some more detailed checks are required.

if (!isatty(fileno(stdout))) ought to do the trick, afaics? And maybe add a
warning somewhere in the docs about the tools not fsyncing if you pipe
their output data somewhere?

Andres


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup tools ought to ensure created backups are durable
Date: 2016-03-28 16:03:15
Message-ID: CABUevEw3yey6Eoeur4k3MzMY6FStpr5F6r475r0rDKXxMaGVHQ@mail.gmail.com

On Mon, Mar 28, 2016 at 3:12 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2016-03-28 11:35:57 +0200, Magnus Hagander wrote:
> > On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> > wrote:
> >
> > > On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > As pointed out in
> > > > http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
> > > > our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
> > > > make any efforts to ensure their output is durable.
> > > >
> > > > I think for backup tools of possibly critical data, that's pretty much
> > > > unacceptable.
> > >
> > > Definitely agreed, once a backup/dump has been taken and those
> > > utilities exit, we had better ensure that they are durably on disk.
> > > For pg_basebackup and pg_dump, as everything except pg_dump/plain
> > > require a target directory for the location of the output result, we
> > > really can make things better.
> > >
> > >
> > Definitely agreed on fixing it. But I don't think your summary is right.
> >
> > pg_basebackup in tar mode can be sent to stdout, does not require a
> > directory. And the same for pg_dump in any mode except for directory. So we
> > can't just drive it off the mode, some more detailed checks are required.
>
> if (!isatty(fileno(stdout))) ought to do the trick, afaics? And maybe add a
> warning somewhere in the docs about the tools not fsyncing if you pipe
> their output data somewhere?
>

That should work yeah. And given that we already use that check in other
places, it seems it should be perfectly safe. And as long as we only do a
WARNING and not abort if the fsync fails, we should be OK if people
intentionally store their backups on an fs that doesn't speak fsync (if
that exists), in which case I don't really think we even need a switch to
turn it off.
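
A rough sketch of the "warn, don't abort" behaviour described above.
The function name, the message wording, and the choice of open flags
are assumptions for illustration, not the code that was eventually
committed.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: fsync a file or directory by path. Any failure
 * produces a WARNING on stderr and a -1 return; the caller carries on.
 */
static int
fsync_path_or_warn(const char *path, bool is_dir)
{
	/* Assumption: directories are opened read-only, files read-write. */
	int			flags = is_dir ? O_RDONLY : O_RDWR;
	int			fd = open(path, flags);

	if (fd < 0)
	{
		fprintf(stderr, "WARNING: could not open \"%s\": %s\n",
				path, strerror(errno));
		return -1;
	}

	if (fsync(fd) != 0)
	{
		fprintf(stderr, "WARNING: could not fsync \"%s\": %s\n",
				path, strerror(errno));
		close(fd);
		return -1;
	}

	close(fd);
	return 0;
}
```

For a newly created backup file, a caller would typically sync the file
and then its containing directory, so that the directory entry is
durable as well.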

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup tools ought to ensure created backups are durable
Date: 2016-03-29 06:46:34
Message-ID: 56FA24CA.8060609@BlueTreble.com

On 3/28/16 11:03 AM, Magnus Hagander wrote:
>
> That should work yeah. And given that we already use that check in other
> places, it seems it should be perfectly safe. And as long as we only do
> a WARNING and not abort if the fsync fails, we should be OK if people
> intentionally store their backups on an fs that doesn't speak fsync (if
> that exists), in which case I don't really think we even need a switch
> to turn it off.

I'd even go so far as spitting out a warning any time we can't fsync
(maybe that's what you're suggesting?)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup tools ought to ensure created backups are durable
Date: 2016-03-29 08:06:20
Message-ID: CABUevEwsHPC3g1wkavbL6p0YhpqaaGMasFkk8h2T2x1maYjv7Q@mail.gmail.com

On Tue, Mar 29, 2016 at 8:46 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> On 3/28/16 11:03 AM, Magnus Hagander wrote:
>
>>
>> That should work yeah. And given that we already use that check in other
>> places, it seems it should be perfectly safe. And as long as we only do
>> a WARNING and not abort if the fsync fails, we should be OK if people
>> intentionally store their backups on an fs that doesn't speak fsync (if
>> that exists), in which case I don't really think we even need a switch
>> to turn it off.
>>
>
> I'd even go so far as spitting out a warning any time we can't fsync
> (maybe that's what you're suggesting?)

That is pretty much what I was suggesting, yes.

Though we might want to consolidate them in for example pg_basebackup -Fp
and pg_dump -Fd into something like "failed to fsync <n> files".
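
One way the consolidated message could look, as a sketch only: the
function below is hypothetical, takes an already collected list of
paths, and prints a single summary line instead of one warning per
file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: fsync every path in the list, count the
 * failures, and emit one consolidated warning at the end.
 * Returns the number of files that could not be synced.
 */
static size_t
fsync_files_summarized(const char *const *paths, size_t npaths)
{
	size_t		failed = 0;

	for (size_t i = 0; i < npaths; i++)
	{
		int			fd = open(paths[i], O_RDWR);

		if (fd < 0)
		{
			failed++;
			continue;
		}
		if (fsync(fd) != 0)
			failed++;
		close(fd);
	}

	if (failed > 0)
		fprintf(stderr, "WARNING: failed to fsync %zu files\n", failed);

	return failed;
}
```

The trade-off is that the summary hides which files failed and why, so
a real implementation might still want a verbose mode.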

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Andres Freund <andres(at)anarazel(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup tools ought to ensure created backups are durable
Date: 2016-03-29 08:12:29
Message-ID: 20160329081229.GA27646@awork2.anarazel.de

On 2016-03-29 10:06:20 +0200, Magnus Hagander wrote:
> On Tue, Mar 29, 2016 at 8:46 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>
> > On 3/28/16 11:03 AM, Magnus Hagander wrote:
> >
> >>
> >> That should work yeah. And given that we already use that check in other
> >> places, it seems it should be perfectly safe. And as long as we only do
> >> a WARNING and not abort if the fsync fails, we should be OK if people
> >> intentionally store their backups on an fs that doesn't speak fsync (if
> >> that exists), in which case I don't really think we even need a switch
> >> to turn it off.
> >>
> >
> > I'd even go so far as spitting out a warning any time we can't fsync
> > (maybe that's what you're suggesting?)
>
>
> That is pretty much what I was suggesting, yes.
>
> Though we might want to consolidate them in for example pg_basebackup -Fp
> and pg_dump -Fd into something like "failed to fsync <n> files".

I'd just not output anything if ENOTSUP or similar is returned, and not
bother with something as complex as collecting errors.
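
A sketch of that simpler approach, assuming the "not supported" family
means EINVAL, ENOTSUP, and EOPNOTSUPP. Which errno values a given
filesystem actually returns is an assumption here, and both function
names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical classifier: true if the errno value suggests the target
 * simply cannot be fsynced, as opposed to a real I/O failure.
 */
static bool
fsync_errno_is_unsupported(int err)
{
	return err == EINVAL || err == ENOTSUP || err == EOPNOTSUPP;
}

/*
 * Hypothetical wrapper: stay silent when fsync is unsupported, warn on
 * any other failure. Returns 0 on success or "unsupported", else -1.
 */
static int
fsync_quiet_if_unsupported(int fd)
{
	if (fsync(fd) == 0)
		return 0;

	if (fsync_errno_is_unsupported(errno))
		return 0;

	fprintf(stderr, "WARNING: fsync failed: %s\n", strerror(errno));
	return -1;
}
```

No error collection is needed: each call either succeeds, is silently
skipped, or reports its own failure immediately.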


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup tools ought to ensure created backups are durable
Date: 2016-03-29 08:22:34
Message-ID: CABUevEznZAo+aCf6+w2tAnOe=kynQjMXF3MrZb9SLeFAWHQQNg@mail.gmail.com

On Tue, Mar 29, 2016 at 10:12 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2016-03-29 10:06:20 +0200, Magnus Hagander wrote:
> > On Tue, Mar 29, 2016 at 8:46 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> >
> > > On 3/28/16 11:03 AM, Magnus Hagander wrote:
> > >
> > >>
> > >> That should work yeah. And given that we already use that check in other
> > >> places, it seems it should be perfectly safe. And as long as we only do
> > >> a WARNING and not abort if the fsync fails, we should be OK if people
> > >> intentionally store their backups on an fs that doesn't speak fsync (if
> > >> that exists), in which case I don't really think we even need a switch
> > >> to turn it off.
> > >>
> > >
> > > I'd even go so far as spitting out a warning any time we can't fsync
> > > (maybe that's what you're suggesting?)
> >
> >
> > That is pretty much what I was suggesting, yes.
> >
> > Though we might want to consolidate them in for example pg_basebackup -Fp
> > and pg_dump -Fd into something like "failed to fsync <n> files".
>
> I'd just not output anything if ENOTSUP or similar is returned, and not
> bother with something as complex as collecting errors.
>

That'll work too, I guess. Won't necessarily make people aware of the
problem, but in the unlikely event they use a fs like that they should be
aware of it already.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/