Re: Incremental Backup Script

Lists: pgsql-hackers
From: "Gregor Zeitlinger" <gregor(dot)zeitlinger(at)torexretail(dot)de>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Incremental Backup Script
Date: 2005-12-25 13:02:13
Message-ID: 5DE489C997EC984FA3DD0935879DAE1255EB1B@ex09-00-z002.torexretail.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

as far as I have understood, the WAL backup that you control via "archive_command" is the PostgreSQL equivalent to what other databases let you do with an incremental backup. That is, if you don't forget to include the current WAL block.

I have found a script to determine the current WAL on the admin mailing list.
Based on this script, I intend to write two scripts that do the following (unless something like this already exists).

basebackup -> basebackup.<bbd>.bz2
incrementalbackup -> incrementalbackup.<bbd>.<ibc>.<ibd>.bz2 (incremental backup relative to the last incremental backup)
restore (a file produced by the above commands) -> restore database (either base, or base + 1..n incremental backups)

<bbd>: base backup date (e.g. 2005-12-25-14-00)
<ibc>: incremental backup counter (1..n)
<ibd>: incremental backup date

The central idea is that base backups are guaranteed to include all information up to <bbd> and incremental backups all data up to <ibd>. I hope that this makes it easier for administrators.

archive_command:
copy the files to a local backup directory (LWB = local wal backup)

basebackup:
1) tar the data directory
2) add any WALs that are produced while the backup is running.
3) delete all WAL that are included in the tar
4) I still wonder how <bbd> must be chosen (that of pg_start_backup?)

incremental backup:
1) add all WAL that are currently in the LWB to the tar
2) add the current WAL to the tar
3) verify that all WALs prior to the current WAL are included (i.e. that no WAL is currently being copied to the LWB)
4) delete all WAL that are included in the tar

restore:
1) if it's a base backup, just restore that
2) if it's an incremental backup, check that the corresponding base backup and all incremental backups with lower <ibc> are available. Then restore the base backup and all incremental backups up to the one specified

Also, I was wondering whether it is always safe to copy the current WAL file, i.e. may the current WAL file be invalid in any circumstance?

Is this a sensible idea?

Regards,

Gregor


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incremental Backup Script
Date: 2005-12-25 22:04:02
Message-ID: don4ue$1qrr$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


""Gregor Zeitlinger"" <gregor(dot)zeitlinger(at)torexretail(dot)de> wrote
>
> Also, I was wondering whether it is always safe to copy the current WAL
> file, i.e. may the current WAL file be invalid in any circumstance?
>

If you mean "current WAL file" is the xlog segment in use, then it is
dangerous. We only backup the xlog segments that have been fully used up.

Regards,
Qingqing


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Gregor Zeitlinger <gregor(dot)zeitlinger(at)torexretail(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incremental Backup Script
Date: 2005-12-26 11:04:26
Message-ID: 1135595066.2964.660.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2005-12-25 at 14:02 +0100, Gregor Zeitlinger wrote:
> as far as I have understood, the WAL backup that you control via
> "archive_command" is the PostgreSQL equivalent to what other databases
> let you do with an incremental backup

No it is not an incremental backup of changed data blocks, it is a
transactional log archival. So, other parts of your thinking are
slightly off - but not by much. The only way to do a partial recovery is
to follow the PITR notes.

Best Regards, Simon Riggs


From: Zach Bagnall <zach(dot)bagnall(at)bulletinwireless(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incremental Backup Script
Date: 2006-01-04 03:41:43
Message-ID: 43BB43F7.4020105@bulletinwireless.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/26/05 11:04, Qingqing Zhou wrote:
> ""Gregor Zeitlinger"" <gregor(dot)zeitlinger(at)torexretail(dot)de> wrote
>
>>Also, I was wondering whether it is always safe to copy the current WAL
>>file, i.e. may the current WAL file be invalid in any circumstance?
>>
>
> If you mean "current WAL file" is the xlog segment in use, then it is
> dangerous. We only backup the xlog segments that have been fully used up.

As per docs, if the databases are rarely updated it could take a long
time for the WAL segment to "roll over". We need to backup the current
segment to guarantee we have the latest trasactions archived at time of
failure.

http://www.postgresql.org/docs/8.1/interactive/backup-online.html
"If you are concerned about being able to recover right up to the
current instant, you may want to take additional steps to ensure that
the current, partially-filled WAL segment is also copied someplace. This
is particularly important if your server generates only little WAL
traffic (or has slack periods where it does so), since it could take a
long time before a WAL segment file is completely filled and ready to
archive. One possible way to handle this is to set up a cron job that
periodically (once a minute, perhaps) identifies the current WAL segment
file and saves it someplace safe."

Gregor: can you explain how to identify the current file? I had
implemented a backup and restore script for PITR but stumbled at this
point. The page above does not specify how this is to be done.

I appreciate the addition of PITR - it's better than nothing (nothing
being full dumps) in some respects. Ideally, we need to be able to dump
deltas for a single database. In practice, restoration using the PITR
method is awkward. I guess you would tarball the current data files, do
a full restore, do a full dump of the database you are interested in,
ditch the restored data files and replace them with the ones you
tarballed, then do a database load from the full dump. The only way to
avoid having the other databases on the server offline is to restore to
a second postgresql instance. Not complaining, just saying :-)

> Regards,
> Qingqing

Zach.


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Zach Bagnall <zach(dot)bagnall(at)bulletinwireless(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incremental Backup Script
Date: 2006-01-04 07:52:08
Message-ID: 1FEC4B88-9936-4608-B997-51FC93E5AB8D@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I would certainly like some instructions on this as well.

On Jan 3, 2006, at 8:41 PM, Zach Bagnall wrote:

> On 12/26/05 11:04, Qingqing Zhou wrote:
>> ""Gregor Zeitlinger"" <gregor(dot)zeitlinger(at)torexretail(dot)de> wrote
>>> Also, I was wondering whether it is always safe to copy the
>>> current WAL file, i.e. may the current WAL file be invalid in any
>>> circumstance?
>>>
>> If you mean "current WAL file" is the xlog segment in use, then it
>> is dangerous. We only backup the xlog segments that have been
>> fully used up.
>
> As per docs, if the databases are rarely updated it could take a
> long time for the WAL segment to "roll over". We need to backup the
> current segment to guarantee we have the latest trasactions
> archived at time of failure.
>
> http://www.postgresql.org/docs/8.1/interactive/backup-online.html
> "If you are concerned about being able to recover right up to the
> current instant, you may want to take additional steps to ensure
> that the current, partially-filled WAL segment is also copied
> someplace. This is particularly important if your server generates
> only little WAL traffic (or has slack periods where it does so),
> since it could take a long time before a WAL segment file is
> completely filled and ready to archive. One possible way to handle
> this is to set up a cron job that periodically (once a minute,
> perhaps) identifies the current WAL segment file and saves it
> someplace safe."
>
> Gregor: can you explain how to identify the current file? I had
> implemented a backup and restore script for PITR but stumbled at
> this point. The page above does not specify how this is to be done.
>
> I appreciate the addition of PITR - it's better than nothing
> (nothing being full dumps) in some respects. Ideally, we need to be
> able to dump deltas for a single database. In practice, restoration
> using the PITR method is awkward. I guess you would tarball the
> current data files, do a full restore, do a full dump of the
> database you are interested in, ditch the restored data files and
> replace them with the ones you tarballed, then do a database load
> from the full dump. The only way to avoid having the other
> databases on the server offline is to restore to a second
> postgresql instance. Not complaining, just saying :-)
>
>
>
>> Regards,
>> Qingqing
>
> Zach.
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>