Re: incremental backups

Lists: pgsql-general
From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: incremental backups
Date: 2006-01-26 17:33:23
Message-ID: 9282E52E-4127-4B2F-925E-11331B5429D7@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I am looking into using WAL archiving for incremental backups. It
all seems fairly straightforward except for one thing.

So you set up the archiving of the WAL files. Then you set up cron
or something to regularly do a physical backup of the data
directory. But when you do the physical backup you don't have the
last WAL file archived yet that you need to restore that physical
backup. So you always need to keep at least two physical backups
around so that you know that at least one of them has the WAL files
needed for recovery.

The question I have is: how do I know if I can use the latest one?
That is if I first do physical backup A and then later do physical
backup B and then I want to do a restore. How do I know when I've
got the files I need to use B so that I don't have to go all the way
back to A?

My initial thoughts are that I could:

a) just before or after calling pg_stop_backup check the file system
to see what the last archived WAL file is on disk and make sure that
that I get the next one before I try restoring from that backup.

b) just before or after calling pg_stop_backup check postgres to see
to see what the current active WAL file is and make sure it has been
archived before I try to restore from that backup.

c) Always just use backup A.

No c seems the easiest but is that even fail safe? I realize it
wouldn't really ever happen in an active production environment that
was set up right but say you did backup A and backup B and during
that whole time you had few writes in postgres that you never filled
up a whole WAL file so both of the backups are invalid. Then you
would have to always go to option a or b above to verify that a given
backup was good so that any previous backups could be deleted.

Wouldn't it make things a lot easier if the backup history file not
only gave you the name of the first file that you need but also the
last one? Then you could look at a given backup and say I need this
start file and this end file. Then you could delete all archived WAL
files before start file. And you could delete any old physical dumps
because you know that your last physical dump was good. It would
just save you the step in the backups process of figuring out what
that file is. And it seems like pg_stop_backup could determine that
on it's own.

Does that make sense? Am I totally off base here?

Rick


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-26 17:41:23
Message-ID: 1138297283.24321.45.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I didn't read your mail very carefully, but I guess you want:

- turn on WAL archiving, and archive all WAL logs;
- take the file system backup at regular time points, optionally you
can keep them also for point in time recovery;

Then you always have all the WAL files you need to recover to any point
in time you need. You can then supply all the WAL files which are needed
by the last file system backup to recover after a crash, or you can
supply all the WAL files up to the time point just before your student
DBA deleted all your data.

HTH,
Csaba.

On Thu, 2006-01-26 at 18:33, Rick Gigger wrote:
> I am looking into using WAL archiving for incremental backups. It
> all seems fairly straightforward except for one thing.
>
> So you set up the archiving of the WAL files. Then you set up cron
> or something to regularly do a physical backup of the data
> directory. But when you do the physical backup you don't have the
> last WAL file archived yet that you need to restore that physical
> backup. So you always need to keep at least two physical backups
> around so that you know that at least one of them has the WAL files
> needed for recovery.
>
> The question I have is: how do I know if I can use the latest one?
> That is if I first do physical backup A and then later do physical
> backup B and then I want to do a restore. How do I know when I've
> got the files I need to use B so that I don't have to go all the way
> back to A?
>
> My initial thoughts are that I could:
>
> a) just before or after calling pg_stop_backup check the file system
> to see what the last archived WAL file is on disk and make sure that
> that I get the next one before I try restoring from that backup.
>
> b) just before or after calling pg_stop_backup check postgres to see
> to see what the current active WAL file is and make sure it has been
> archived before I try to restore from that backup.
>
> c) Always just use backup A.
>
> No c seems the easiest but is that even fail safe? I realize it
> wouldn't really ever happen in an active production environment that
> was set up right but say you did backup A and backup B and during
> that whole time you had few writes in postgres that you never filled
> up a whole WAL file so both of the backups are invalid. Then you
> would have to always go to option a or b above to verify that a given
> backup was good so that any previous backups could be deleted.
>
> Wouldn't it make things a lot easier if the backup history file not
> only gave you the name of the first file that you need but also the
> last one? Then you could look at a given backup and say I need this
> start file and this end file. Then you could delete all archived WAL
> files before start file. And you could delete any old physical dumps
> because you know that your last physical dump was good. It would
> just save you the step in the backups process of figuring out what
> that file is. And it seems like pg_stop_backup could determine that
> on it's own.
>
> Does that make sense? Am I totally off base here?
>
> Rick
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-26 17:48:26
Message-ID: 2173B363-D11E-4F86-B622-0BF2FD808797@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Um, no you didn't read my email at all. I am aware of all of that
and it is clearly outlined in the docs. My email was about a
specific detail in the process. Please read it if you want to know
what my actual question was.

Thanks,

Rick

On Jan 26, 2006, at 10:41 AM, Csaba Nagy wrote:

> I didn't read your mail very carefully, but I guess you want:
>
> - turn on WAL archiving, and archive all WAL logs;
> - take the file system backup at regular time points, optionally you
> can keep them also for point in time recovery;
>
> Then you always have all the WAL files you need to recover to any
> point
> in time you need. You can then supply all the WAL files which are
> needed
> by the last file system backup to recover after a crash, or you can
> supply all the WAL files up to the time point just before your student
> DBA deleted all your data.
>
> HTH,
> Csaba.
>
>
> On Thu, 2006-01-26 at 18:33, Rick Gigger wrote:
>> I am looking into using WAL archiving for incremental backups. It
>> all seems fairly straightforward except for one thing.
>>
>> So you set up the archiving of the WAL files. Then you set up cron
>> or something to regularly do a physical backup of the data
>> directory. But when you do the physical backup you don't have the
>> last WAL file archived yet that you need to restore that physical
>> backup. So you always need to keep at least two physical backups
>> around so that you know that at least one of them has the WAL files
>> needed for recovery.
>>
>> The question I have is: how do I know if I can use the latest one?
>> That is if I first do physical backup A and then later do physical
>> backup B and then I want to do a restore. How do I know when I've
>> got the files I need to use B so that I don't have to go all the way
>> back to A?
>>
>> My initial thoughts are that I could:
>>
>> a) just before or after calling pg_stop_backup check the file system
>> to see what the last archived WAL file is on disk and make sure that
>> that I get the next one before I try restoring from that backup.
>>
>> b) just before or after calling pg_stop_backup check postgres to see
>> to see what the current active WAL file is and make sure it has been
>> archived before I try to restore from that backup.
>>
>> c) Always just use backup A.
>>
>> No c seems the easiest but is that even fail safe? I realize it
>> wouldn't really ever happen in an active production environment that
>> was set up right but say you did backup A and backup B and during
>> that whole time you had few writes in postgres that you never filled
>> up a whole WAL file so both of the backups are invalid. Then you
>> would have to always go to option a or b above to verify that a given
>> backup was good so that any previous backups could be deleted.
>>
>> Wouldn't it make things a lot easier if the backup history file not
>> only gave you the name of the first file that you need but also the
>> last one? Then you could look at a given backup and say I need this
>> start file and this end file. Then you could delete all archived WAL
>> files before start file. And you could delete any old physical dumps
>> because you know that your last physical dump was good. It would
>> just save you the step in the backups process of figuring out what
>> that file is. And it seems like pg_stop_backup could determine that
>> on it's own.
>>
>> Does that make sense? Am I totally off base here?
>>
>> Rick
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>
>


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-27 10:32:42
Message-ID: 1138357962.24321.51.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

OK, that was before going home from work, so it could be excusable :-D
I read your mail now in more detail, and I can't answer it other than
that we use here a standby data base based on WAL log shipping, and the
procedure of building the standby finishes with a script
inserting/deleting a few 1000s of lines in a bogus table so there is for
sure a WAL file archived. That might fit your needs or might not...

Cheers,
Csaba.

On Thu, 2006-01-26 at 18:48, Rick Gigger wrote:
> Um, no you didn't read my email at all. I am aware of all of that
> and it is clearly outlined in the docs. My email was about a
> specific detail in the process. Please read it if you want to know
> what my actual question was.
>
> Thanks,
>
> Rick
>
> On Jan 26, 2006, at 10:41 AM, Csaba Nagy wrote:
>
> > I didn't read your mail very carefully, but I guess you want:
> >
> > - turn on WAL archiving, and archive all WAL logs;
> > - take the file system backup at regular time points, optionally you
> > can keep them also for point in time recovery;
> >
> > Then you always have all the WAL files you need to recover to any
> > point
> > in time you need. You can then supply all the WAL files which are
> > needed
> > by the last file system backup to recover after a crash, or you can
> > supply all the WAL files up to the time point just before your student
> > DBA deleted all your data.
> >
> > HTH,
> > Csaba.
> >
> >
> > On Thu, 2006-01-26 at 18:33, Rick Gigger wrote:
> >> I am looking into using WAL archiving for incremental backups. It
> >> all seems fairly straightforward except for one thing.
> >>
> >> So you set up the archiving of the WAL files. Then you set up cron
> >> or something to regularly do a physical backup of the data
> >> directory. But when you do the physical backup you don't have the
> >> last WAL file archived yet that you need to restore that physical
> >> backup. So you always need to keep at least two physical backups
> >> around so that you know that at least one of them has the WAL files
> >> needed for recovery.
> >>
> >> The question I have is: how do I know if I can use the latest one?
> >> That is if I first do physical backup A and then later do physical
> >> backup B and then I want to do a restore. How do I know when I've
> >> got the files I need to use B so that I don't have to go all the way
> >> back to A?
> >>
> >> My initial thoughts are that I could:
> >>
> >> a) just before or after calling pg_stop_backup check the file system
> >> to see what the last archived WAL file is on disk and make sure that
> >> that I get the next one before I try restoring from that backup.
> >>
> >> b) just before or after calling pg_stop_backup check postgres to see
> >> to see what the current active WAL file is and make sure it has been
> >> archived before I try to restore from that backup.
> >>
> >> c) Always just use backup A.
> >>
> >> No c seems the easiest but is that even fail safe? I realize it
> >> wouldn't really ever happen in an active production environment that
> >> was set up right but say you did backup A and backup B and during
> >> that whole time you had few writes in postgres that you never filled
> >> up a whole WAL file so both of the backups are invalid. Then you
> >> would have to always go to option a or b above to verify that a given
> >> backup was good so that any previous backups could be deleted.
> >>
> >> Wouldn't it make things a lot easier if the backup history file not
> >> only gave you the name of the first file that you need but also the
> >> last one? Then you could look at a given backup and say I need this
> >> start file and this end file. Then you could delete all archived WAL
> >> files before start file. And you could delete any old physical dumps
> >> because you know that your last physical dump was good. It would
> >> just save you the step in the backups process of figuring out what
> >> that file is. And it seems like pg_stop_backup could determine that
> >> on it's own.
> >>
> >> Does that make sense? Am I totally off base here?
> >>
> >> Rick
> >>
> >> ---------------------------(end of
> >> broadcast)---------------------------
> >> TIP 6: explain analyze is your friend
> >
> >
>


From: Richard Huxton <dev(at)archonet(dot)com>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-27 10:33:01
Message-ID: 43D9F6DD.7020401@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Rick Gigger wrote:
> Um, no you didn't read my email at all. I am aware of all of that and
> it is clearly outlined in the docs. My email was about a specific
> detail in the process. Please read it if you want to know what my
> actual question was.

I'm not sure your email is quite right as regards the process. You need:
1. the filesystem backup
2. the WAL file indicated in the history-file
3. all the WAL files later than that
to get up to "now".

If you don't want to replay up to "now" then you will not need some of
the more recent WAL files. You can't afford to throw them away though
since you've got a rolling backup system running and the whole point is
so you can recover to any point you like.

You can however throw away any WAL files older than that indicated in
the history file for your current filesystem-backup. You can then only
restore from that point in time forward.

There is no "last one" in the WAL set unless you know the time you want
to restore to. Indeed, the "last one" might not be "full" yet and
therefore archived if you want to restore to 10 seconds ago.

Or am I mis-understanding your email too?

--
Richard Huxton
Archonet Ltd


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-27 16:39:25
Message-ID: 2DA8891D-AD85-4B78-BFA6-DA029C4E269B@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sorry for my sharp reply! It looks like we are after the same thing
so that does help a little although it doesn't really answer my
question. I set up my backups system using pg_dump back in 7.3
because that's all there was. I am finally moving to 8.1 and want to
switch to doing incrementals because the dumps are just getting too
big. If you didn't mind showing me your scripts it would probably be
a good staring point for me.

Thanks,

Rick

On Jan 27, 2006, at 3:32 AM, Csaba Nagy wrote:

> OK, that was before going home from work, so it could be excusable :-D
> I read your mail now in more detail, and I can't answer it other than
> that we use here a standby data base based on WAL log shipping, and
> the
> procedure of building the standby finishes with a script
> inserting/deleting a few 1000s of lines in a bogus table so there
> is for
> sure a WAL file archived. That might fit your needs or might not...
>
> Cheers,
> Csaba.
>
>
> On Thu, 2006-01-26 at 18:48, Rick Gigger wrote:
>> Um, no you didn't read my email at all. I am aware of all of that
>> and it is clearly outlined in the docs. My email was about a
>> specific detail in the process. Please read it if you want to know
>> what my actual question was.
>>
>> Thanks,
>>
>> Rick
>>
>> On Jan 26, 2006, at 10:41 AM, Csaba Nagy wrote:
>>
>>> I didn't read your mail very carefully, but I guess you want:
>>>
>>> - turn on WAL archiving, and archive all WAL logs;
>>> - take the file system backup at regular time points,
>>> optionally you
>>> can keep them also for point in time recovery;
>>>
>>> Then you always have all the WAL files you need to recover to any
>>> point
>>> in time you need. You can then supply all the WAL files which are
>>> needed
>>> by the last file system backup to recover after a crash, or you can
>>> supply all the WAL files up to the time point just before your
>>> student
>>> DBA deleted all your data.
>>>
>>> HTH,
>>> Csaba.
>>>
>>>
>>> On Thu, 2006-01-26 at 18:33, Rick Gigger wrote:
>>>> I am looking into using WAL archiving for incremental backups. It
>>>> all seems fairly straightforward except for one thing.
>>>>
>>>> So you set up the archiving of the WAL files. Then you set up cron
>>>> or something to regularly do a physical backup of the data
>>>> directory. But when you do the physical backup you don't have the
>>>> last WAL file archived yet that you need to restore that physical
>>>> backup. So you always need to keep at least two physical backups
>>>> around so that you know that at least one of them has the WAL files
>>>> needed for recovery.
>>>>
>>>> The question I have is: how do I know if I can use the latest one?
>>>> That is if I first do physical backup A and then later do physical
>>>> backup B and then I want to do a restore. How do I know when I've
>>>> got the files I need to use B so that I don't have to go all the
>>>> way
>>>> back to A?
>>>>
>>>> My initial thoughts are that I could:
>>>>
>>>> a) just before or after calling pg_stop_backup check the file
>>>> system
>>>> to see what the last archived WAL file is on disk and make sure
>>>> that
>>>> that I get the next one before I try restoring from that backup.
>>>>
>>>> b) just before or after calling pg_stop_backup check postgres to
>>>> see
>>>> to see what the current active WAL file is and make sure it has
>>>> been
>>>> archived before I try to restore from that backup.
>>>>
>>>> c) Always just use backup A.
>>>>
>>>> No c seems the easiest but is that even fail safe? I realize it
>>>> wouldn't really ever happen in an active production environment
>>>> that
>>>> was set up right but say you did backup A and backup B and during
>>>> that whole time you had few writes in postgres that you never
>>>> filled
>>>> up a whole WAL file so both of the backups are invalid. Then you
>>>> would have to always go to option a or b above to verify that a
>>>> given
>>>> backup was good so that any previous backups could be deleted.
>>>>
>>>> Wouldn't it make things a lot easier if the backup history file not
>>>> only gave you the name of the first file that you need but also the
>>>> last one? Then you could look at a given backup and say I need
>>>> this
>>>> start file and this end file. Then you could delete all
>>>> archived WAL
>>>> files before start file. And you could delete any old physical
>>>> dumps
>>>> because you know that your last physical dump was good. It would
>>>> just save you the step in the backups process of figuring out what
>>>> that file is. And it seems like pg_stop_backup could determine
>>>> that
>>>> on it's own.
>>>>
>>>> Does that make sense? Am I totally off base here?
>>>>
>>>> Rick
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 6: explain analyze is your friend
>>>
>>>
>>
>
>


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-27 17:18:57
Message-ID: 06F853D6-AFA7-4B71-83BF-E56831287414@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I guess my email wasn't all that clear. I will try to rephrase. I
am moving from using the old style pg_dump for backups to using
incrementals and want to make sure I understand the process before I
go about writing a bunch of scritps.

To me setting up incremental backup consists of the following
components:

1) Setting up the WAL archiving. This one is trivial.
2) Doing physical dumps of the $PGDATA directory. This one is once
again trivial.
3) Knowing which physical dumps are Good and Not Good. For a given
physical dump D there is are WAL archive files Dstart and Dend for
which you much have Dstart and Dend and all files in between. If you
have all those files then the physical dump is Good. If you don't
have them then the dump is worthless to you.
4) Knowing which dumps and which archive files can be deleted. This
depends on a number of factors.
a) How far back do you want to be able to do PITR
b) How much space do you have / want to use for PITR
c) Which physical dumps are Good and which are Not Good. (see #3)

Now I think I have a pretty good plan here except for #3 (and so #4
then also suffers).

Just as an example lets say I'm not concerned so much with PITR as I
am recovering from a db crash. I've got all the backups files saved
to my backup db server so I can failover to it if my primary db
server dies. I just want to make sure I've got one physical dump
that is good. (This is not my actual situation but it illustrated my
point better.)

Now when I do a physical dump it is not a Good dump. That is I don't
have the end archive file necessary to recover from that physical
dump. That is to say that when I call pg_backup_start() then copy
$PGDATA then call pg_backup_stop() postgres might be on say WAL
archive file #5. Once the physical dump is completed WAL archive
file #5 hasn't been archived yet. I only have up to #4. So if I
delete my old physical dumps and all I've got is this most recent one
and my database crashes before #5 gets archived then I am hosed. I
have no good physical backups to start from.

My main question is about the best way to figure out when a physical
dump is Good.

One strategy is to always keep around lots of physical dumps. If you
keep around 100 dumps you can be pretty sure that in the space of
time that those physical dumps take place that at least one WAL file
was archived. In fact if you keep 2 physical dumps you can be fairly
certain of this. If not then you really need to space our your dumps
more.

Is this making sense at this point?

The problem is that the WAL archiving is triggered by postgres and
the rate at which the db is updated. The physical dumps are
triggered by cron and on a purely time based schedule. So in theory
if you had the physical dumps happening once a day but for some odd
reason no one updated the database for 4 days then all of a sudden
you'd have 2 physical backups and neither of them are good. If
you're db crashes during that time you are hosed.

Maybe I am arguing a point that is just stupid because this will
never happen in real life. But in that it is my backups system that
I will be using to recover from complete and total disaster I just
want to have all my bases covered.

So my ideas on how to determine if a physical dump is Good are as
follows.

1) When you do the physical backup (after dumping the $PGDATA dir but
before calling pg_stop_backup() ) determine the current WAL archive
file. Mark somewhere in the backed up physical dump the last file
needed for the dump to be considered good. Then your deletion
scripts can look at the WAL archive files you have and the last one
required for the dump to be Good and determine if the dump is Good or
not.

2) After doing the physical dump but before calling pg_stop_backup()
just copy the current WAL file to the physical dump. If that file
later gets archived then the restore commands overwrites your
partially completed one so it doesn't hurt but you know that when you
call pg_stop_backup() that that physical dump is good. (Is it ok to
copy the current WAL file while it is still in use?)

Is anyone taking one of these or any other precautions to make sure
they've got a good physical dump or does everyone just keep a whole
bunch of dumps around, and then actually restore the dump to see if
it is good and if not go back to a previous dump?

I hope that makes more sense.

Thanks,

Rick

On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:

> Rick Gigger wrote:
>> Um, no you didn't read my email at all. I am aware of all of that
>> and it is clearly outlined in the docs. My email was about a
>> specific detail in the process. Please read it if you want to
>> know what my actual question was.
>
> I'm not sure your email is quite right as regards the process. You
> need:
> 1. the filesystem backup
> 2. the WAL file indicated in the history-file
> 3. all the WAL files later than that
> to get up to "now".
>
> If you don't want to replay up to "now" then you will not need some
> of the more recent WAL files. You can't afford to throw them away
> though since you've got a rolling backup system running and the
> whole point is so you can recover to any point you like.
>
> You can however throw away any WAL files older than that indicated
> in the history file for your current filesystem-backup. You can
> then only restore from that point in time forward.
>
> There is no "last one" in the WAL set unless you know the time you
> want to restore to. Indeed, the "last one" might not be "full" yet
> and therefore archived if you want to restore to 10 seconds ago.
>
> Or am I mis-understanding your email too?
>
> --
> Richard Huxton
> Archonet Ltd
>


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-30 18:13:27
Message-ID: 200601301813.k0UIDRl11357@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


Unfortunately, I think I understand your question. :-)

These TODO items are what you need:

* Point-In-Time Recovery (PITR)

o Allow point-in-time recovery to archive partially filled
write-ahead logs [pitr]

Currently only full WAL files are archived. This means that the
most recent transactions aren't available for recovery in case
of a disk failure. This could be triggered by a user command or
a timer.

o Automatically force archiving of partially-filled WAL files when
pg_stop_backup() is called or the server is stopped

Doing this will allow administrators to know more easily when
the archive contains all the files needed for point-in-time
recovery.

I will try to push to have them done for 8.2.

---------------------------------------------------------------------------

Rick Gigger wrote:
> I guess my email wasn't all that clear. I will try to rephrase. I
> am moving from using the old style pg_dump for backups to using
> incrementals and want to make sure I understand the process before I
> go about writing a bunch of scritps.
>
> To me setting up incremental backup consists of the following
> components:
>
> 1) Setting up the WAL archiving. This one is trivial.
> 2) Doing physical dumps of the $PGDATA directory. This one is once
> again trivial.
> 3) Knowing which physical dumps are Good and Not Good. For a given
> physical dump D there is are WAL archive files Dstart and Dend for
> which you much have Dstart and Dend and all files in between. If you
> have all those files then the physical dump is Good. If you don't
> have them then the dump is worthless to you.
> 4) Knowing which dumps and which archive files can be deleted. This
> depends on a number of factors.
> a) How far back do you want to be able to do PITR
> b) How much space do you have / want to use for PITR
> c) Which physical dumps are Good and which are Not Good. (see #3)
>
> Now I think I have a pretty good plan here except for #3 (and so #4
> then also suffers).
>
> Just as an example lets say I'm not concerned so much with PITR as I
> am recovering from a db crash. I've got all the backups files saved
> to my backup db server so I can failover to it if my primary db
> server dies. I just want to make sure I've got one physical dump
> that is good. (This is not my actual situation but it illustrated my
> point better.)
>
> Now when I do a physical dump it is not a Good dump. That is I don't
> have the end archive file necessary to recover from that physical
> dump. That is to say that when I call pg_backup_start() then copy
> $PGDATA then call pg_backup_stop() postgres might be on say WAL
> archive file #5. Once the physical dump is completed WAL archive
> file #5 hasn't been archived yet. I only have up to #4. So if I
> delete my old physical dumps and all I've got is this most recent one
> and my database crashes before #5 gets archived then I am hosed. I
> have no good physical backups to start from.
>
> My main question is about the best way to figure out when a physical
> dump is Good.
>
> One strategy is to always keep around lots of physical dumps. If you
> keep around 100 dumps you can be pretty sure that in the space of
> time that those physical dumps take place that at least one WAL file
> was archived. In fact if you keep 2 physical dumps you can be fairly
> certain of this. If not then you really need to space our your dumps
> more.
>
> Is this making sense at this point?
>
> The problem is that the WAL archiving is triggered by postgres and
> the rate at which the db is updated. The physical dumps are
> triggered by cron and on a purely time based schedule. So in theory
> if you had the physical dumps happening once a day but for some odd
> reason no one updated the database for 4 days then all of a sudden
> you'd have 2 physical backups and neither of them are good. If
> you're db crashes during that time you are hosed.
>
> Maybe I am arguing a point that is just stupid because this will
> never happen in real life. But in that it is my backups system that
> I will be using to recover from complete and total disaster I just
> want to have all my bases covered.
>
> So my ideas on how to determine if a physical dump is Good are as
> follows.
>
> 1) When you do the physical backup (after dumping the $PGDATA dir but
> before calling pg_stop_backup() ) determine the current WAL archive
> file. Mark somewhere in the backed up physical dump the last file
> needed for the dump to be considered good. Then your deletion
> scripts can look at the WAL archive files you have and the last one
> required for the dump to be Good and determine if the dump is Good or
> not.
>
> 2) After doing the physical dump but before calling pg_stop_backup()
> just copy the current WAL file to the physical dump. If that file
> later gets archived then the restore commands overwrites your
> partially completed one so it doesn't hurt but you know that when you
> call pg_stop_backup() that that physical dump is good. (Is it ok to
> copy the current WAL file while it is still in use?)
>
> Is anyone taking one of these or any other precautions to make sure
> they've got a good physical dump or does everyone just keep a whole
> bunch of dumps around, and then actually restore the dump to see if
> it is good and if not go back to a previous dump?
>
> I hope that makes more sense.
>
> Thanks,
>
> Rick
>
> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:
>
> > Rick Gigger wrote:
> >> Um, no you didn't read my email at all. I am aware of all of that
> >> and it is clearly outlined in the docs. My email was about a
> >> specific detail in the process. Please read it if you want to
> >> know what my actual question was.
> >
> > I'm not sure your email is quite right as regards the process. You
> > need:
> > 1. the filesystem backup
> > 2. the WAL file indicated in the history-file
> > 3. all the WAL files later than that
> > to get up to "now".
> >
> > If you don't want to replay up to "now" then you will not need some
> > of the more recent WAL files. You can't afford to throw them away
> > though since you've got a rolling backup system running and the
> > whole point is so you can recover to any point you like.
> >
> > You can however throw away any WAL files older than that indicated
> > in the history file for your current filesystem-backup. You can
> > then only restore from that point in time forward.
> >
> > There is no "last one" in the WAL set unless you know the time you
> > want to restore to. Indeed, the "last one" might not be "full" yet
> > and therefore archived if you want to restore to 10 seconds ago.
> >
> > Or am I mis-understanding your email too?
> >
> > --
> > Richard Huxton
> > Archonet Ltd
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-30 23:29:15
Message-ID: 9A194B8E-2932-4EB5-AC8E-14E42B003262@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Yes! Thanks you! That is exactly what I was looking for.

So I take it that this means that it is save to copy the current in
use WAL file even as it is being written to?
And it also means that if I copy it with my physical file system
backup then I should have the last file that I need to restore from
that physical backup?

So if I write my own backup_latest_WAL_file.sh script (I think I
found one on the list from Simon Riggs) then I can do what I need to
do before those todo items get done? Or will I need to wait till
postgres gives me the ability to safely copy the file?

On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote:

>
> Unfortunately, I think I understand your question. :-)
>
> These TODO items are what you need:
>
> * Point-In-Time Recovery (PITR)
>
> o Allow point-in-time recovery to archive partially filled
> write-ahead logs [pitr]
>
> Currently only full WAL files are archived. This means
> that the
> most recent transactions aren't available for recovery
> in case
> of a disk failure. This could be triggered by a user
> command or
> a timer.
>
> o Automatically force archiving of partially-filled WAL
> files when
> pg_stop_backup() is called or the server is stopped
>
> Doing this will allow administrators to know more
> easily when
> the archive contains all the files needed for point-in-
> time
> recovery.
>
> I will try to push to have them done for 8.2.
>
> ----------------------------------------------------------------------
> -----
>
> Rick Gigger wrote:
>> I guess my email wasn't all that clear. I will try to rephrase. I
>> am moving from using the old style pg_dump for backups to using
>> incrementals and want to make sure I understand the process before I
>> go about writing a bunch of scritps.
>>
>> To me setting up incremental backup consists of the following
>> components:
>>
>> 1) Setting up the WAL archiving. This one is trivial.
>> 2) Doing physical dumps of the $PGDATA directory. This one is once
>> again trivial.
>> 3) Knowing which physical dumps are Good and Not Good. For a given
>> physical dump D there is are WAL archive files Dstart and Dend for
>> which you much have Dstart and Dend and all files in between. If you
>> have all those files then the physical dump is Good. If you don't
>> have them then the dump is worthless to you.
>> 4) Knowing which dumps and which archive files can be deleted. This
>> depends on a number of factors.
>> a) How far back do you want to be able to do PITR
>> b) How much space do you have / want to use for PITR
>> c) Which physical dumps are Good and which are Not Good. (see #3)
>>
>> Now I think I have a pretty good plan here except for #3 (and so #4
>> then also suffers).
>>
>> Just as an example lets say I'm not concerned so much with PITR as I
>> am recovering from a db crash. I've got all the backups files saved
>> to my backup db server so I can failover to it if my primary db
>> server dies. I just want to make sure I've got one physical dump
>> that is good. (This is not my actual situation but it illustrated my
>> point better.)
>>
>> Now when I do a physical dump it is not a Good dump. That is I don't
>> have the end archive file necessary to recover from that physical
>> dump. That is to say that when I call pg_backup_start() then copy
>> $PGDATA then call pg_backup_stop() postgres might be on say WAL
>> archive file #5. Once the physical dump is completed WAL archive
>> file #5 hasn't been archived yet. I only have up to #4. So if I
>> delete my old physical dumps and all I've got is this most recent one
>> and my database crashes before #5 gets archived then I am hosed. I
>> have no good physical backups to start from.
>>
>> My main question is about the best way to figure out when a physical
>> dump is Good.
>>
>> One strategy is to always keep around lots of physical dumps. If you
>> keep around 100 dumps you can be pretty sure that in the space of
>> time that those physical dumps take place that at least one WAL file
>> was archived. In fact if you keep 2 physical dumps you can be fairly
>> certain of this. If not then you really need to space our your dumps
>> more.
>>
>> Is this making sense at this point?
>>
>> The problem is that the WAL archiving is triggered by postgres and
>> the rate at which the db is updated. The physical dumps are
>> triggered by cron and on a purely time based schedule. So in theory
>> if you had the physical dumps happening once a day but for some odd
>> reason no one updated the database for 4 days then all of a sudden
>> you'd have 2 physical backups and neither of them are good. If
>> you're db crashes during that time you are hosed.
>>
>> Maybe I am arguing a point that is just stupid because this will
>> never happen in real life. But in that it is my backups system that
>> I will be using to recover from complete and total disaster I just
>> want to have all my bases covered.
>>
>> So my ideas on how to determine if a physical dump is Good are as
>> follows.
>>
>> 1) When you do the physical backup (after dumping the $PGDATA dir but
>> before calling pg_stop_backup() ) determine the current WAL archive
>> file. Mark somewhere in the backed up physical dump the last file
>> needed for the dump to be considered good. Then your deletion
>> scripts can look at the WAL archive files you have and the last one
>> required for the dump to be Good and determine if the dump is Good or
>> not.
>>
>> 2) After doing the physical dump but before calling pg_stop_backup()
>> just copy the current WAL file to the physical dump. If that file
>> later gets archived then the restore commands overwrites your
>> partially completed one so it doesn't hurt but you know that when you
>> call pg_stop_backup() that that physical dump is good. (Is it ok to
>> copy the current WAL file while it is still in use?)
>>
>> Is anyone taking one of these or any other precautions to make sure
>> they've got a good physical dump or does everyone just keep a whole
>> bunch of dumps around, and then actually restore the dump to see if
>> it is good and if not go back to a previous dump?
>>
>> I hope that makes more sense.
>>
>> Thanks,
>>
>> Rick
>>
>> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:
>>
>>> Rick Gigger wrote:
>>>> Um, no you didn't read my email at all. I am aware of all of that
>>>> and it is clearly outlined in the docs. My email was about a
>>>> specific detail in the process. Please read it if you want to
>>>> know what my actual question was.
>>>
>>> I'm not sure your email is quite right as regards the process. You
>>> need:
>>> 1. the filesystem backup
>>> 2. the WAL file indicated in the history-file
>>> 3. all the WAL files later than that
>>> to get up to "now".
>>>
>>> If you don't want to replay up to "now" then you will not need some
>>> of the more recent WAL files. You can't afford to throw them away
>>> though since you've got a rolling backup system running and the
>>> whole point is so you can recover to any point you like.
>>>
>>> You can however throw away any WAL files older than that indicated
>>> in the history file for your current filesystem-backup. You can
>>> then only restore from that point in time forward.
>>>
>>> There is no "last one" in the WAL set unless you know the time you
>>> want to restore to. Indeed, the "last one" might not be "full" yet
>>> and therefore archived if you want to restore to 10 seconds ago.
>>>
>>> Or am I mis-understanding your email too?
>>>
>>> --
>>> Richard Huxton
>>> Archonet Ltd
>>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>> http://archives.postgresql.org
>>
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square,
> Pennsylvania 19073
>


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-30 23:43:30
Message-ID: 19532E3F-5BB4-479B-8D58-8A2947402E14@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

And here is the real million dollar question. Let's say for some
reason I don't have the last WAL file I need for my backup to be
valid. Will it die and tell me it's bad or will it just start up
with a screwed up data directory?

On Jan 30, 2006, at 4:29 PM, Rick Gigger wrote:

> Yes! Thanks you! That is exactly what I was looking for.
>
> So I take it that this means that it is save to copy the current in
> use WAL file even as it is being written to?
> And it also means that if I copy it with my physical file system
> backup then I should have the last file that I need to restore from
> that physical backup?
>
> So if I write my own backup_latest_WAL_file.sh script (I think I
> found one on the list from Simon Riggs) then I can do what I need
> to do before those todo items get done? Or will I need to wait
> till postgres gives me the ability to safely copy the file?
>
>
>
> On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote:
>
>>
>> Unfortunately, I think I understand your question. :-)
>>
>> These TODO items are what you need:
>>
>> * Point-In-Time Recovery (PITR)
>>
>> o Allow point-in-time recovery to archive partially filled
>> write-ahead logs [pitr]
>>
>> Currently only full WAL files are archived. This means
>> that the
>> most recent transactions aren't available for recovery
>> in case
>> of a disk failure. This could be triggered by a user
>> command or
>> a timer.
>>
>> o Automatically force archiving of partially-filled WAL
>> files when
>> pg_stop_backup() is called or the server is stopped
>>
>> Doing this will allow administrators to know more
>> easily when
>> the archive contains all the files needed for point-in-
>> time
>> recovery.
>>
>> I will try to push to have them done for 8.2.
>>
>> ---------------------------------------------------------------------
>> ------
>>
>> Rick Gigger wrote:
>>> I guess my email wasn't all that clear. I will try to rephrase. I
>>> am moving from using the old style pg_dump for backups to using
>>> incrementals and want to make sure I understand the process before I
>>> go about writing a bunch of scritps.
>>>
>>> To me setting up incremental backup consists of the following
>>> components:
>>>
>>> 1) Setting up the WAL archiving. This one is trivial.
>>> 2) Doing physical dumps of the $PGDATA directory. This one is once
>>> again trivial.
>>> 3) Knowing which physical dumps are Good and Not Good. For a given
>>> physical dump D there is are WAL archive files Dstart and Dend for
>>> which you much have Dstart and Dend and all files in between. If
>>> you
>>> have all those files then the physical dump is Good. If you don't
>>> have them then the dump is worthless to you.
>>> 4) Knowing which dumps and which archive files can be deleted. This
>>> depends on a number of factors.
>>> a) How far back do you want to be able to do PITR
>>> b) How much space do you have / want to use for PITR
>>> c) Which physical dumps are Good and which are Not Good. (see #3)
>>>
>>> Now I think I have a pretty good plan here except for #3 (and so #4
>>> then also suffers).
>>>
>>> Just as an example lets say I'm not concerned so much with PITR as I
>>> am recovering from a db crash. I've got all the backups files saved
>>> to my backup db server so I can failover to it if my primary db
>>> server dies. I just want to make sure I've got one physical dump
>>> that is good. (This is not my actual situation but it
>>> illustrated my
>>> point better.)
>>>
>>> Now when I do a physical dump it is not a Good dump. That is I
>>> don't
>>> have the end archive file necessary to recover from that physical
>>> dump. That is to say that when I call pg_backup_start() then copy
>>> $PGDATA then call pg_backup_stop() postgres might be on say WAL
>>> archive file #5. Once the physical dump is completed WAL archive
>>> file #5 hasn't been archived yet. I only have up to #4. So if I
>>> delete my old physical dumps and all I've got is this most recent
>>> one
>>> and my database crashes before #5 gets archived then I am hosed. I
>>> have no good physical backups to start from.
>>>
>>> My main question is about the best way to figure out when a physical
>>> dump is Good.
>>>
>>> One strategy is to always keep around lots of physical dumps. If
>>> you
>>> keep around 100 dumps you can be pretty sure that in the space of
>>> time that those physical dumps take place that at least one WAL file
>>> was archived. In fact if you keep 2 physical dumps you can be
>>> fairly
>>> certain of this. If not then you really need to space our your
>>> dumps
>>> more.
>>>
>>> Is this making sense at this point?
>>>
>>> The problem is that the WAL archiving is triggered by postgres and
>>> the rate at which the db is updated. The physical dumps are
>>> triggered by cron and on a purely time based schedule. So in theory
>>> if you had the physical dumps happening once a day but for some odd
>>> reason no one updated the database for 4 days then all of a sudden
>>> you'd have 2 physical backups and neither of them are good. If
>>> you're db crashes during that time you are hosed.
>>>
>>> Maybe I am arguing a point that is just stupid because this will
>>> never happen in real life. But in that it is my backups system that
>>> I will be using to recover from complete and total disaster I just
>>> want to have all my bases covered.
>>>
>>> So my ideas on how to determine if a physical dump is Good are as
>>> follows.
>>>
>>> 1) When you do the physical backup (after dumping the $PGDATA dir
>>> but
>>> before calling pg_stop_backup() ) determine the current WAL archive
>>> file. Mark somewhere in the backed up physical dump the last file
>>> needed for the dump to be considered good. Then your deletion
>>> scripts can look at the WAL archive files you have and the last one
>>> required for the dump to be Good and determine if the dump is
>>> Good or
>>> not.
>>>
>>> 2) After doing the physical dump but before calling pg_stop_backup()
>>> just copy the current WAL file to the physical dump. If that file
>>> later gets archived then the restore commands overwrites your
>>> partially completed one so it doesn't hurt but you know that when
>>> you
>>> call pg_stop_backup() that that physical dump is good. (Is it ok to
>>> copy the current WAL file while it is still in use?)
>>>
>>> Is anyone taking one of these or any other precautions to make sure
>>> they've got a good physical dump or does everyone just keep a whole
>>> bunch of dumps around, and then actually restore the dump to see if
>>> it is good and if not go back to a previous dump?
>>>
>>> I hope that makes more sense.
>>>
>>> Thanks,
>>>
>>> Rick
>>>
>>> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:
>>>
>>>> Rick Gigger wrote:
>>>>> Um, no you didn't read my email at all. I am aware of all of that
>>>>> and it is clearly outlined in the docs. My email was about a
>>>>> specific detail in the process. Please read it if you want to
>>>>> know what my actual question was.
>>>>
>>>> I'm not sure your email is quite right as regards the process. You
>>>> need:
>>>> 1. the filesystem backup
>>>> 2. the WAL file indicated in the history-file
>>>> 3. all the WAL files later than that
>>>> to get up to "now".
>>>>
>>>> If you don't want to replay up to "now" then you will not need some
>>>> of the more recent WAL files. You can't afford to throw them away
>>>> though since you've got a rolling backup system running and the
>>>> whole point is so you can recover to any point you like.
>>>>
>>>> You can however throw away any WAL files older than that indicated
>>>> in the history file for your current filesystem-backup. You can
>>>> then only restore from that point in time forward.
>>>>
>>>> There is no "last one" in the WAL set unless you know the time you
>>>> want to restore to. Indeed, the "last one" might not be "full" yet
>>>> and therefore archived if you want to restore to 10 seconds ago.
>>>>
>>>> Or am I mis-understanding your email too?
>>>>
>>>> --
>>>> Richard Huxton
>>>> Archonet Ltd
>>>>
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 4: Have you searched our list archives?
>>>
>>> http://archives.postgresql.org
>>>
>>
>> --
>> Bruce Momjian | http://candle.pha.pa.us
>> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
>> + If your life is a hard drive, | 13 Roberts Road
>> + Christ can be your backup. | Newtown Square,
>> Pennsylvania 19073
>>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-31 01:58:12
Message-ID: 27731.1138672692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Rick Gigger <rick(at)alpinenetworking(dot)com> writes:
> And here is the real million dollar question. Let's say for some
> reason I don't have the last WAL file I need for my backup to be
> valid. Will it die and tell me it's bad or will it just start up
> with a screwed up data directory?

It'll restore up to the end of the data it has. The only case that's
actually "invalid" is not restoring far enough to cover the time window
that the original base backup was taken over. Otherwise it's just a
situation of restoring up to a particular point in time...

regards, tom lane


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-31 04:04:23
Message-ID: A2D53AB0-042D-43E3-A0BE-F40137275C0F@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 30, 2006, at 6:58 PM, Tom Lane wrote:

> Rick Gigger <rick(at)alpinenetworking(dot)com> writes:
>> And here is the real million dollar question. Let's say for some
>> reason I don't have the last WAL file I need for my backup to be
>> valid. Will it die and tell me it's bad or will it just start up
>> with a screwed up data directory?
>
> It'll restore up to the end of the data it has. The only case that's
> actually "invalid" is not restoring far enough to cover the time
> window
> that the original base backup was taken over. Otherwise it's just a
> situation of restoring up to a particular point in time...
>

That's what I mean by invalid. Let's say I do something stupid and
do a physical backup and I don't grab the current WAL file. All I
have is the last one to be archived before I did my backup, which is
not late enough to do a valid restore. Will postgres know that the
restore process failed because I didn't have that last necessary WAL
file or will it just start up in a potentially inconsistent state.
Obviously that would be my fault not postgres' since I am the one
that didn't give it the data it needed to do a full restore. But I
am just wondering if that is a potential area to shoot yourself in
the foot or if postgres will put the safety on for me.

Rick


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-31 05:20:38
Message-ID: 200601310520.k0V5Kc108901@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


Yes, I think copying it while it is being written is safe.

---------------------------------------------------------------------------

Rick Gigger wrote:
> Yes! Thanks you! That is exactly what I was looking for.
>
> So I take it that this means that it is save to copy the current in
> use WAL file even as it is being written to?
> And it also means that if I copy it with my physical file system
> backup then I should have the last file that I need to restore from
> that physical backup?
>
> So if I write my own backup_latest_WAL_file.sh script (I think I
> found one on the list from Simon Riggs) then I can do what I need to
> do before those todo items get done? Or will I need to wait till
> postgres gives me the ability to safely copy the file?
>
>
>
> On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote:
>
> >
> > Unfortunately, I think I understand your question. :-)
> >
> > These TODO items are what you need:
> >
> > * Point-In-Time Recovery (PITR)
> >
> > o Allow point-in-time recovery to archive partially filled
> > write-ahead logs [pitr]
> >
> > Currently only full WAL files are archived. This means
> > that the
> > most recent transactions aren't available for recovery
> > in case
> > of a disk failure. This could be triggered by a user
> > command or
> > a timer.
> >
> > o Automatically force archiving of partially-filled WAL
> > files when
> > pg_stop_backup() is called or the server is stopped
> >
> > Doing this will allow administrators to know more
> > easily when
> > the archive contains all the files needed for point-in-
> > time
> > recovery.
> >
> > I will try to push to have them done for 8.2.
> >
> > ----------------------------------------------------------------------
> > -----
> >
> > Rick Gigger wrote:
> >> I guess my email wasn't all that clear. I will try to rephrase. I
> >> am moving from using the old style pg_dump for backups to using
> >> incrementals and want to make sure I understand the process before I
> >> go about writing a bunch of scritps.
> >>
> >> To me setting up incremental backup consists of the following
> >> components:
> >>
> >> 1) Setting up the WAL archiving. This one is trivial.
> >> 2) Doing physical dumps of the $PGDATA directory. This one is once
> >> again trivial.
> >> 3) Knowing which physical dumps are Good and Not Good. For a given
> >> physical dump D there is are WAL archive files Dstart and Dend for
> >> which you much have Dstart and Dend and all files in between. If you
> >> have all those files then the physical dump is Good. If you don't
> >> have them then the dump is worthless to you.
> >> 4) Knowing which dumps and which archive files can be deleted. This
> >> depends on a number of factors.
> >> a) How far back do you want to be able to do PITR
> >> b) How much space do you have / want to use for PITR
> >> c) Which physical dumps are Good and which are Not Good. (see #3)
> >>
> >> Now I think I have a pretty good plan here except for #3 (and so #4
> >> then also suffers).
> >>
> >> Just as an example lets say I'm not concerned so much with PITR as I
> >> am recovering from a db crash. I've got all the backups files saved
> >> to my backup db server so I can failover to it if my primary db
> >> server dies. I just want to make sure I've got one physical dump
> >> that is good. (This is not my actual situation but it illustrated my
> >> point better.)
> >>
> >> Now when I do a physical dump it is not a Good dump. That is I don't
> >> have the end archive file necessary to recover from that physical
> >> dump. That is to say that when I call pg_backup_start() then copy
> >> $PGDATA then call pg_backup_stop() postgres might be on say WAL
> >> archive file #5. Once the physical dump is completed WAL archive
> >> file #5 hasn't been archived yet. I only have up to #4. So if I
> >> delete my old physical dumps and all I've got is this most recent one
> >> and my database crashes before #5 gets archived then I am hosed. I
> >> have no good physical backups to start from.
> >>
> >> My main question is about the best way to figure out when a physical
> >> dump is Good.
> >>
> >> One strategy is to always keep around lots of physical dumps. If you
> >> keep around 100 dumps you can be pretty sure that in the space of
> >> time that those physical dumps take place that at least one WAL file
> >> was archived. In fact if you keep 2 physical dumps you can be fairly
> >> certain of this. If not then you really need to space our your dumps
> >> more.
> >>
> >> Is this making sense at this point?
> >>
> >> The problem is that the WAL archiving is triggered by postgres and
> >> the rate at which the db is updated. The physical dumps are
> >> triggered by cron and on a purely time based schedule. So in theory
> >> if you had the physical dumps happening once a day but for some odd
> >> reason no one updated the database for 4 days then all of a sudden
> >> you'd have 2 physical backups and neither of them are good. If
> >> you're db crashes during that time you are hosed.
> >>
> >> Maybe I am arguing a point that is just stupid because this will
> >> never happen in real life. But in that it is my backups system that
> >> I will be using to recover from complete and total disaster I just
> >> want to have all my bases covered.
> >>
> >> So my ideas on how to determine if a physical dump is Good are as
> >> follows.
> >>
> >> 1) When you do the physical backup (after dumping the $PGDATA dir but
> >> before calling pg_stop_backup() ) determine the current WAL archive
> >> file. Mark somewhere in the backed up physical dump the last file
> >> needed for the dump to be considered good. Then your deletion
> >> scripts can look at the WAL archive files you have and the last one
> >> required for the dump to be Good and determine if the dump is Good or
> >> not.
> >>
> >> 2) After doing the physical dump but before calling pg_stop_backup()
> >> just copy the current WAL file to the physical dump. If that file
> >> later gets archived then the restore commands overwrites your
> >> partially completed one so it doesn't hurt but you know that when you
> >> call pg_stop_backup() that that physical dump is good. (Is it ok to
> >> copy the current WAL file while it is still in use?)
> >>
> >> Is anyone taking one of these or any other precautions to make sure
> >> they've got a good physical dump or does everyone just keep a whole
> >> bunch of dumps around, and then actually restore the dump to see if
> >> it is good and if not go back to a previous dump?
> >>
> >> I hope that makes more sense.
> >>
> >> Thanks,
> >>
> >> Rick
> >>
> >> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:
> >>
> >>> Rick Gigger wrote:
> >>>> Um, no you didn't read my email at all. I am aware of all of that
> >>>> and it is clearly outlined in the docs. My email was about a
> >>>> specific detail in the process. Please read it if you want to
> >>>> know what my actual question was.
> >>>
> >>> I'm not sure your email is quite right as regards the process. You
> >>> need:
> >>> 1. the filesystem backup
> >>> 2. the WAL file indicated in the history-file
> >>> 3. all the WAL files later than that
> >>> to get up to "now".
> >>>
> >>> If you don't want to replay up to "now" then you will not need some
> >>> of the more recent WAL files. You can't afford to throw them away
> >>> though since you've got a rolling backup system running and the
> >>> whole point is so you can recover to any point you like.
> >>>
> >>> You can however throw away any WAL files older than that indicated
> >>> in the history file for your current filesystem-backup. You can
> >>> then only restore from that point in time forward.
> >>>
> >>> There is no "last one" in the WAL set unless you know the time you
> >>> want to restore to. Indeed, the "last one" might not be "full" yet
> >>> and therefore archived if you want to restore to 10 seconds ago.
> >>>
> >>> Or am I mis-understanding your email too?
> >>>
> >>> --
> >>> Richard Huxton
> >>> Archonet Ltd
> >>>
> >>
> >>
> >> ---------------------------(end of
> >> broadcast)---------------------------
> >> TIP 4: Have you searched our list archives?
> >>
> >> http://archives.postgresql.org
> >>
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> > + If your life is a hard drive, | 13 Roberts Road
> > + Christ can be your backup. | Newtown Square,
> > Pennsylvania 19073
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-31 14:14:14
Message-ID: 3350.1138716854@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Rick Gigger <rick(at)alpinenetworking(dot)com> writes:
> That's what I mean by invalid. Let's say I do something stupid and
> do a physical backup and I don't grab the current WAL file. All I
> have is the last one to be archived before I did my backup, which is
> not late enough to do a valid restore. Will postgres know that the
> restore process failed because I didn't have that last necessary WAL
> file or will it just start up in a potentially inconsistent state.

Yes:

/*
* Complain if we did not roll forward far enough to render the backup
* dump consistent.
*/
if (XLByteLT(EndOfLog, recoveryMinXlogOffset))
{
if (needNewTimeLine) /* stopped because of stop request */
ereport(FATAL,
(errmsg("requested recovery stop point is before end time of backup dump")));
else
/* ran off end of WAL */
ereport(FATAL,
(errmsg("WAL ends before end time of backup dump")));
}

regards, tom lane


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-01-31 16:09:31
Message-ID: B41FD9DE-C1F7-4080-B4D1-A01A50864F1C@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Wonderful. That is good news. Thanks.

Rick

On Jan 31, 2006, at 7:14 AM, Tom Lane wrote:

> Rick Gigger <rick(at)alpinenetworking(dot)com> writes:
>> That's what I mean by invalid. Let's say I do something stupid and
>> do a physical backup and I don't grab the current WAL file. All I
>> have is the last one to be archived before I did my backup, which is
>> not late enough to do a valid restore. Will postgres know that the
>> restore process failed because I didn't have that last necessary WAL
>> file or will it just start up in a potentially inconsistent state.
>
> Yes:
>
> /*
> * Complain if we did not roll forward far enough to render the
> backup
> * dump consistent.
> */
> if (XLByteLT(EndOfLog, recoveryMinXlogOffset))
> {
> if (needNewTimeLine) /* stopped because of stop request */
> ereport(FATAL,
> (errmsg("requested recovery stop point is
> before end time of backup dump")));
> else
> /* ran off end of WAL */
> ereport(FATAL,
> (errmsg("WAL ends before end time of backup
> dump")));
> }
>
> regards, tom lane
>


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-02-15 11:11:32
Message-ID: 29C25929-A57B-4AA3-A46F-37D4A18EE67E@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I've started writing some scripts to set up incremental backup to my
taste. I just discovered something and thought I would revisit this
thread briefly.

When you go to restore from a give base file system backup you need
to know the start WAL file that you need and the end WAL file that
you need. (You will most likely have many files beyond the "stop"
file but you must have at least up to the "stop" file for the restore
to work.

Now if you try to restore but you don't have the "stop" WAL file
postges will die on recovery and tell you that it can't recover
forward far enough to make the backup consistent. But I wanted to
know the easiest way to verify if you indeed had the necessary files
without having to actually do a restore and have postgres tell you if
it succeeded or not.

Perhaps no one understood me because the answer I was looking for was
too obvious. But what I really wanted to know was how do you know
what the "stop" file is. It informs you of the start file all over
the place when doing the base backups but I thought I would have to
do something clever to figure out the stop file on my own. But
luckily I don't. The backup history file has too lines like this:

START WAL LOCATION: 0/88F21D0C (file 000000010000000000000088)
STOP WAL LOCATION: 0/88F21D50 (file 000000010000000000000088)

It was clear to me from the docs how to figure out what the start
file is but the end file was a mystery until I actually created a
backup history file and looked in it. The only place I can find in
the Online Backup instructions where this is indicated is this
paragraph:

"To make use of this backup, you will need to keep around all the WAL
segment files generated during and after the file system backup. To
aid you in doing this, the pg_stop_backup function creates a backup
history file that is immediately stored into the WAL archive area.
This file is named after the first WAL segment file that you need to
have to make use of the backup. For example, if the starting WAL file
is 0000000100001234000055CD the backup history file will be named
something like 0000000100001234000055CD.007C9330.backup. (The second
number in the file name stands for an exact position within the WAL
file, and can ordinarily be ignored.) Once you have safely archived
the file system backup and the WAL segment files used during the
backup (as specified in the backup history file), all archived WAL
segments with names numerically less are no longer needed to recover
the file system backup and may be deleted. However, you should
consider keeping several backup sets to be absolutely certain that
you can recover your data. Keep in mind that only completed WAL
segment files are archived, so there will be delay between running
pg_stop_backup and the archiving of all WAL segment files needed to
make the file system backup consistent."

Reading it now it seems obvious that the file would contain not only
the start WAL file but also the Stop WAL file but when going over the
directions the first time it did not pick up on it. And it left me
thinking I would have to use some hack to figure it out if I ever
wanted to test a base backup. It would have been less confusing to
me if it just said right in the docs: "The backup history file
contains both the start WAL file name and the Stop WAL file name" or
something like that just to make it perfectly clear.

Now that I know this I can extract that filename from the backup
history file, check to see if it has been archived and copy it
somewhere if it hasn't been archived yet. I'm pretty sure that I can
assume that all files before the stop file have already been
archived. So once I backup the stop file I can be positive that the
base backup I just made will be valid when I try to restore from it.

This lessens my need for the "get current WAL file" functionality in
this context. It will still be nice to have in the context of
backing it up every five minutes or so in case a WAL file takes a
long time to fill up.

Anyway I would have been less confused if the docs had made it more
clear that the name of the stop wal file was in the backup history file.

Rick

On Jan 30, 2006, at 10:20 PM, Bruce Momjian wrote:

>
> Yes, I think copying it while it is being written is safe.
>
> ----------------------------------------------------------------------
> -----
>
> Rick Gigger wrote:
>> Yes! Thanks you! That is exactly what I was looking for.
>>
>> So I take it that this means that it is save to copy the current in
>> use WAL file even as it is being written to?
>> And it also means that if I copy it with my physical file system
>> backup then I should have the last file that I need to restore from
>> that physical backup?
>>
>> So if I write my own backup_latest_WAL_file.sh script (I think I
>> found one on the list from Simon Riggs) then I can do what I need to
>> do before those todo items get done? Or will I need to wait till
>> postgres gives me the ability to safely copy the file?
>>
>>
>>
>> On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote:
>>
>>>
>>> Unfortunately, I think I understand your question. :-)
>>>
>>> These TODO items are what you need:
>>>
>>> * Point-In-Time Recovery (PITR)
>>>
>>> o Allow point-in-time recovery to archive partially filled
>>> write-ahead logs [pitr]
>>>
>>> Currently only full WAL files are archived. This means
>>> that the
>>> most recent transactions aren't available for recovery
>>> in case
>>> of a disk failure. This could be triggered by a user
>>> command or
>>> a timer.
>>>
>>> o Automatically force archiving of partially-filled WAL
>>> files when
>>> pg_stop_backup() is called or the server is stopped
>>>
>>> Doing this will allow administrators to know more
>>> easily when
>>> the archive contains all the files needed for point-in-
>>> time
>>> recovery.
>>>
>>> I will try to push to have them done for 8.2.
>>>
>>> --------------------------------------------------------------------
>>> --
>>> -----
>>>
>>> Rick Gigger wrote:
>>>> I guess my email wasn't all that clear. I will try to rephrase. I
>>>> am moving from using the old style pg_dump for backups to using
>>>> incrementals and want to make sure I understand the process
>>>> before I
>>>> go about writing a bunch of scritps.
>>>>
>>>> To me setting up incremental backup consists of the following
>>>> components:
>>>>
>>>> 1) Setting up the WAL archiving. This one is trivial.
>>>> 2) Doing physical dumps of the $PGDATA directory. This one is once
>>>> again trivial.
>>>> 3) Knowing which physical dumps are Good and Not Good. For a given
>>>> physical dump D there is are WAL archive files Dstart and Dend for
>>>> which you much have Dstart and Dend and all files in between.
>>>> If you
>>>> have all those files then the physical dump is Good. If you don't
>>>> have them then the dump is worthless to you.
>>>> 4) Knowing which dumps and which archive files can be deleted.
>>>> This
>>>> depends on a number of factors.
>>>> a) How far back do you want to be able to do PITR
>>>> b) How much space do you have / want to use for PITR
>>>> c) Which physical dumps are Good and which are Not Good. (see #3)
>>>>
>>>> Now I think I have a pretty good plan here except for #3 (and so #4
>>>> then also suffers).
>>>>
>>>> Just as an example lets say I'm not concerned so much with PITR
>>>> as I
>>>> am recovering from a db crash. I've got all the backups files saved
>>>> to my backup db server so I can failover to it if my primary db
>>>> server dies. I just want to make sure I've got one physical dump
>>>> that is good. (This is not my actual situation but it
>>>> illustrated my
>>>> point better.)
>>>>
>>>> Now when I do a physical dump it is not a Good dump. That is I
>>>> don't
>>>> have the end archive file necessary to recover from that physical
>>>> dump. That is to say that when I call pg_backup_start() then copy
>>>> $PGDATA then call pg_backup_stop() postgres might be on say WAL
>>>> archive file #5. Once the physical dump is completed WAL archive
>>>> file #5 hasn't been archived yet. I only have up to #4. So if I
>>>> delete my old physical dumps and all I've got is this most
>>>> recent one
>>>> and my database crashes before #5 gets archived then I am hosed. I
>>>> have no good physical backups to start from.
>>>>
>>>> My main question is about the best way to figure out when a
>>>> physical
>>>> dump is Good.
>>>>
>>>> One strategy is to always keep around lots of physical dumps.
>>>> If you
>>>> keep around 100 dumps you can be pretty sure that in the space of
>>>> time that those physical dumps take place that at least one WAL
>>>> file
>>>> was archived. In fact if you keep 2 physical dumps you can be
>>>> fairly
>>>> certain of this. If not then you really need to space our your
>>>> dumps
>>>> more.
>>>>
>>>> Is this making sense at this point?
>>>>
>>>> The problem is that the WAL archiving is triggered by postgres and
>>>> the rate at which the db is updated. The physical dumps are
>>>> triggered by cron and on a purely time based schedule. So in
>>>> theory
>>>> if you had the physical dumps happening once a day but for some odd
>>>> reason no one updated the database for 4 days then all of a sudden
>>>> you'd have 2 physical backups and neither of them are good. If
>>>> you're db crashes during that time you are hosed.
>>>>
>>>> Maybe I am arguing a point that is just stupid because this will
>>>> never happen in real life. But in that it is my backups system
>>>> that
>>>> I will be using to recover from complete and total disaster I just
>>>> want to have all my bases covered.
>>>>
>>>> So my ideas on how to determine if a physical dump is Good are as
>>>> follows.
>>>>
>>>> 1) When you do the physical backup (after dumping the $PGDATA
>>>> dir but
>>>> before calling pg_stop_backup() ) determine the current WAL archive
>>>> file. Mark somewhere in the backed up physical dump the last file
>>>> needed for the dump to be considered good. Then your deletion
>>>> scripts can look at the WAL archive files you have and the last one
>>>> required for the dump to be Good and determine if the dump is
>>>> Good or
>>>> not.
>>>>
>>>> 2) After doing the physical dump but before calling
>>>> pg_stop_backup()
>>>> just copy the current WAL file to the physical dump. If that file
>>>> later gets archived then the restore commands overwrites your
>>>> partially completed one so it doesn't hurt but you know that
>>>> when you
>>>> call pg_stop_backup() that that physical dump is good. (Is it
>>>> ok to
>>>> copy the current WAL file while it is still in use?)
>>>>
>>>> Is anyone taking one of these or any other precautions to make sure
>>>> they've got a good physical dump or does everyone just keep a whole
>>>> bunch of dumps around, and then actually restore the dump to see if
>>>> it is good and if not go back to a previous dump?
>>>>
>>>> I hope that makes more sense.
>>>>
>>>> Thanks,
>>>>
>>>> Rick
>>>>
>>>> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:
>>>>
>>>>> Rick Gigger wrote:
>>>>>> Um, no you didn't read my email at all. I am aware of all of
>>>>>> that
>>>>>> and it is clearly outlined in the docs. My email was about a
>>>>>> specific detail in the process. Please read it if you want to
>>>>>> know what my actual question was.
>>>>>
>>>>> I'm not sure your email is quite right as regards the process. You
>>>>> need:
>>>>> 1. the filesystem backup
>>>>> 2. the WAL file indicated in the history-file
>>>>> 3. all the WAL files later than that
>>>>> to get up to "now".
>>>>>
>>>>> If you don't want to replay up to "now" then you will not need
>>>>> some
>>>>> of the more recent WAL files. You can't afford to throw them away
>>>>> though since you've got a rolling backup system running and the
>>>>> whole point is so you can recover to any point you like.
>>>>>
>>>>> You can however throw away any WAL files older than that indicated
>>>>> in the history file for your current filesystem-backup. You can
>>>>> then only restore from that point in time forward.
>>>>>
>>>>> There is no "last one" in the WAL set unless you know the time you
>>>>> want to restore to. Indeed, the "last one" might not be "full" yet
>>>>> and therefore archived if you want to restore to 10 seconds ago.
>>>>>
>>>>> Or am I mis-understanding your email too?
>>>>>
>>>>> --
>>>>> Richard Huxton
>>>>> Archonet Ltd
>>>>>
>>>>
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 4: Have you searched our list archives?
>>>>
>>>> http://archives.postgresql.org
>>>>
>>>
>>> --
>>> Bruce Momjian | http://candle.pha.pa.us
>>> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
>>> + If your life is a hard drive, | 13 Roberts Road
>>> + Christ can be your backup. | Newtown Square,
>>> Pennsylvania 19073
>>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>> http://archives.postgresql.org
>>
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square,
> Pennsylvania 19073
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: incremental backups
Date: 2006-02-24 14:12:12
Message-ID: 200602241412.k1OECC121000@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


I have applied the following patch adds to the paragraph after the one
you quoted below. I just added mention that the start/stop time _and_
wal file names are in the history file.

---------------------------------------------------------------------------

Rick Gigger wrote:
> I've started writing some scripts to set up incremental backup to my
> taste. I just discovered something and thought I would revisit this
> thread briefly.
>
> When you go to restore from a give base file system backup you need
> to know the start WAL file that you need and the end WAL file that
> you need. (You will most likely have many files beyond the "stop"
> file but you must have at least up to the "stop" file for the restore
> to work.
>
> Now if you try to restore but you don't have the "stop" WAL file
> postges will die on recovery and tell you that it can't recover
> forward far enough to make the backup consistent. But I wanted to
> know the easiest way to verify if you indeed had the necessary files
> without having to actually do a restore and have postgres tell you if
> it succeeded or not.
>
> Perhaps no one understood me because the answer I was looking for was
> too obvious. But what I really wanted to know was how do you know
> what the "stop" file is. It informs you of the start file all over
> the place when doing the base backups but I thought I would have to
> do something clever to figure out the stop file on my own. But
> luckily I don't. The backup history file has too lines like this:
>
> START WAL LOCATION: 0/88F21D0C (file 000000010000000000000088)
> STOP WAL LOCATION: 0/88F21D50 (file 000000010000000000000088)
>
> It was clear to me from the docs how to figure out what the start
> file is but the end file was a mystery until I actually created a
> backup history file and looked in it. The only place I can find in
> the Online Backup instructions where this is indicated is this
> paragraph:
>
> "To make use of this backup, you will need to keep around all the WAL
> segment files generated during and after the file system backup. To
> aid you in doing this, the pg_stop_backup function creates a backup
> history file that is immediately stored into the WAL archive area.
> This file is named after the first WAL segment file that you need to
> have to make use of the backup. For example, if the starting WAL file
> is 0000000100001234000055CD the backup history file will be named
> something like 0000000100001234000055CD.007C9330.backup. (The second
> number in the file name stands for an exact position within the WAL
> file, and can ordinarily be ignored.) Once you have safely archived
> the file system backup and the WAL segment files used during the
> backup (as specified in the backup history file), all archived WAL
> segments with names numerically less are no longer needed to recover
> the file system backup and may be deleted. However, you should
> consider keeping several backup sets to be absolutely certain that
> you can recover your data. Keep in mind that only completed WAL
> segment files are archived, so there will be delay between running
> pg_stop_backup and the archiving of all WAL segment files needed to
> make the file system backup consistent."
>
> Reading it now it seems obvious that the file would contain not only
> the start WAL file but also the Stop WAL file but when going over the
> directions the first time it did not pick up on it. And it left me
> thinking I would have to use some hack to figure it out if I ever
> wanted to test a base backup. It would have been less confusing to
> me if it just said right in the docs: "The backup history file
> contains both the start WAL file name and the Stop WAL file name" or
> something like that just to make it perfectly clear.
>
> Now that I know this I can extract that filename from the backup
> history file, check to see if it has been archived and copy it
> somewhere if it hasn't been archived yet. I'm pretty sure that I can
> assume that all files before the stop file have already been
> archived. So once I backup the stop file I can be positive that the
> base backup I just made will be valid when I try to restore from it.
>
> This lessens my need for the "get current WAL file" functionality in
> this context. It will still be nice to have in the context of
> backing it up every five minutes or so in case a WAL file takes a
> long time to fill up.
>
> Anyway I would have been less confused if the docs had made it more
> clear that the name of the stop wal file was in the backup history file.
>
> Rick
>
>
> On Jan 30, 2006, at 10:20 PM, Bruce Momjian wrote:
>
> >
> > Yes, I think copying it while it is being written is safe.
> >
> > ----------------------------------------------------------------------
> > -----
> >
> > Rick Gigger wrote:
> >> Yes! Thanks you! That is exactly what I was looking for.
> >>
> >> So I take it that this means that it is save to copy the current in
> >> use WAL file even as it is being written to?
> >> And it also means that if I copy it with my physical file system
> >> backup then I should have the last file that I need to restore from
> >> that physical backup?
> >>
> >> So if I write my own backup_latest_WAL_file.sh script (I think I
> >> found one on the list from Simon Riggs) then I can do what I need to
> >> do before those todo items get done? Or will I need to wait till
> >> postgres gives me the ability to safely copy the file?
> >>
> >>
> >>
> >> On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote:
> >>
> >>>
> >>> Unfortunately, I think I understand your question. :-)
> >>>
> >>> These TODO items are what you need:
> >>>
> >>> * Point-In-Time Recovery (PITR)
> >>>
> >>> o Allow point-in-time recovery to archive partially filled
> >>> write-ahead logs [pitr]
> >>>
> >>> Currently only full WAL files are archived. This means
> >>> that the
> >>> most recent transactions aren't available for recovery
> >>> in case
> >>> of a disk failure. This could be triggered by a user
> >>> command or
> >>> a timer.
> >>>
> >>> o Automatically force archiving of partially-filled WAL
> >>> files when
> >>> pg_stop_backup() is called or the server is stopped
> >>>
> >>> Doing this will allow administrators to know more
> >>> easily when
> >>> the archive contains all the files needed for point-in-
> >>> time
> >>> recovery.
> >>>
> >>> I will try to push to have them done for 8.2.
> >>>
> >>> --------------------------------------------------------------------
> >>> --
> >>> -----
> >>>
> >>> Rick Gigger wrote:
> >>>> I guess my email wasn't all that clear. I will try to rephrase. I
> >>>> am moving from using the old style pg_dump for backups to using
> >>>> incrementals and want to make sure I understand the process
> >>>> before I
> >>>> go about writing a bunch of scritps.
> >>>>
> >>>> To me setting up incremental backup consists of the following
> >>>> components:
> >>>>
> >>>> 1) Setting up the WAL archiving. This one is trivial.
> >>>> 2) Doing physical dumps of the $PGDATA directory. This one is once
> >>>> again trivial.
> >>>> 3) Knowing which physical dumps are Good and Not Good. For a given
> >>>> physical dump D there is are WAL archive files Dstart and Dend for
> >>>> which you much have Dstart and Dend and all files in between.
> >>>> If you
> >>>> have all those files then the physical dump is Good. If you don't
> >>>> have them then the dump is worthless to you.
> >>>> 4) Knowing which dumps and which archive files can be deleted.
> >>>> This
> >>>> depends on a number of factors.
> >>>> a) How far back do you want to be able to do PITR
> >>>> b) How much space do you have / want to use for PITR
> >>>> c) Which physical dumps are Good and which are Not Good. (see #3)
> >>>>
> >>>> Now I think I have a pretty good plan here except for #3 (and so #4
> >>>> then also suffers).
> >>>>
> >>>> Just as an example lets say I'm not concerned so much with PITR
> >>>> as I
> >>>> am recovering from a db crash. I've got all the backups files saved
> >>>> to my backup db server so I can failover to it if my primary db
> >>>> server dies. I just want to make sure I've got one physical dump
> >>>> that is good. (This is not my actual situation but it
> >>>> illustrated my
> >>>> point better.)
> >>>>
> >>>> Now when I do a physical dump it is not a Good dump. That is I
> >>>> don't
> >>>> have the end archive file necessary to recover from that physical
> >>>> dump. That is to say that when I call pg_backup_start() then copy
> >>>> $PGDATA then call pg_backup_stop() postgres might be on say WAL
> >>>> archive file #5. Once the physical dump is completed WAL archive
> >>>> file #5 hasn't been archived yet. I only have up to #4. So if I
> >>>> delete my old physical dumps and all I've got is this most
> >>>> recent one
> >>>> and my database crashes before #5 gets archived then I am hosed. I
> >>>> have no good physical backups to start from.
> >>>>
> >>>> My main question is about the best way to figure out when a
> >>>> physical
> >>>> dump is Good.
> >>>>
> >>>> One strategy is to always keep around lots of physical dumps.
> >>>> If you
> >>>> keep around 100 dumps you can be pretty sure that in the space of
> >>>> time that those physical dumps take place that at least one WAL
> >>>> file
> >>>> was archived. In fact if you keep 2 physical dumps you can be
> >>>> fairly
> >>>> certain of this. If not then you really need to space our your
> >>>> dumps
> >>>> more.
> >>>>
> >>>> Is this making sense at this point?
> >>>>
> >>>> The problem is that the WAL archiving is triggered by postgres and
> >>>> the rate at which the db is updated. The physical dumps are
> >>>> triggered by cron and on a purely time based schedule. So in
> >>>> theory
> >>>> if you had the physical dumps happening once a day but for some odd
> >>>> reason no one updated the database for 4 days then all of a sudden
> >>>> you'd have 2 physical backups and neither of them are good. If
> >>>> you're db crashes during that time you are hosed.
> >>>>
> >>>> Maybe I am arguing a point that is just stupid because this will
> >>>> never happen in real life. But in that it is my backups system
> >>>> that
> >>>> I will be using to recover from complete and total disaster I just
> >>>> want to have all my bases covered.
> >>>>
> >>>> So my ideas on how to determine if a physical dump is Good are as
> >>>> follows.
> >>>>
> >>>> 1) When you do the physical backup (after dumping the $PGDATA
> >>>> dir but
> >>>> before calling pg_stop_backup() ) determine the current WAL archive
> >>>> file. Mark somewhere in the backed up physical dump the last file
> >>>> needed for the dump to be considered good. Then your deletion
> >>>> scripts can look at the WAL archive files you have and the last one
> >>>> required for the dump to be Good and determine if the dump is
> >>>> Good or
> >>>> not.
> >>>>
> >>>> 2) After doing the physical dump but before calling
> >>>> pg_stop_backup()
> >>>> just copy the current WAL file to the physical dump. If that file
> >>>> later gets archived then the restore commands overwrites your
> >>>> partially completed one so it doesn't hurt but you know that
> >>>> when you
> >>>> call pg_stop_backup() that that physical dump is good. (Is it
> >>>> ok to
> >>>> copy the current WAL file while it is still in use?)
> >>>>
> >>>> Is anyone taking one of these or any other precautions to make sure
> >>>> they've got a good physical dump or does everyone just keep a whole
> >>>> bunch of dumps around, and then actually restore the dump to see if
> >>>> it is good and if not go back to a previous dump?
> >>>>
> >>>> I hope that makes more sense.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Rick
> >>>>
> >>>> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:
> >>>>
> >>>>> Rick Gigger wrote:
> >>>>>> Um, no you didn't read my email at all. I am aware of all of
> >>>>>> that
> >>>>>> and it is clearly outlined in the docs. My email was about a
> >>>>>> specific detail in the process. Please read it if you want to
> >>>>>> know what my actual question was.
> >>>>>
> >>>>> I'm not sure your email is quite right as regards the process. You
> >>>>> need:
> >>>>> 1. the filesystem backup
> >>>>> 2. the WAL file indicated in the history-file
> >>>>> 3. all the WAL files later than that
> >>>>> to get up to "now".
> >>>>>
> >>>>> If you don't want to replay up to "now" then you will not need
> >>>>> some
> >>>>> of the more recent WAL files. You can't afford to throw them away
> >>>>> though since you've got a rolling backup system running and the
> >>>>> whole point is so you can recover to any point you like.
> >>>>>
> >>>>> You can however throw away any WAL files older than that indicated
> >>>>> in the history file for your current filesystem-backup. You can
> >>>>> then only restore from that point in time forward.
> >>>>>
> >>>>> There is no "last one" in the WAL set unless you know the time you
> >>>>> want to restore to. Indeed, the "last one" might not be "full" yet
> >>>>> and therefore archived if you want to restore to 10 seconds ago.
> >>>>>
> >>>>> Or am I mis-understanding your email too?
> >>>>>
> >>>>> --
> >>>>> Richard Huxton
> >>>>> Archonet Ltd
> >>>>>
> >>>>
> >>>>
> >>>> ---------------------------(end of
> >>>> broadcast)---------------------------
> >>>> TIP 4: Have you searched our list archives?
> >>>>
> >>>> http://archives.postgresql.org
> >>>>
> >>>
> >>> --
> >>> Bruce Momjian | http://candle.pha.pa.us
> >>> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> >>> + If your life is a hard drive, | 13 Roberts Road
> >>> + Christ can be your backup. | Newtown Square,
> >>> Pennsylvania 19073
> >>>
> >>
> >>
> >> ---------------------------(end of
> >> broadcast)---------------------------
> >> TIP 4: Have you searched our list archives?
> >>
> >> http://archives.postgresql.org
> >>
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> > + If your life is a hard drive, | 13 Roberts Road
> > + Christ can be your backup. | Newtown Square,
> > Pennsylvania 19073
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> > http://archives.postgresql.org
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
unknown_filename text/plain 1.5 KB