Implementing incremental backup

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Implementing incremental backup
Date: 2013-06-19 10:13:46
Message-ID: 20130619.191346.515430917508820927.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

Hi,

I'm thinking of implementing an incremental backup tool for
PostgreSQL. The use case for the tool would be taking a backup of a
huge database. For a database of that size, pg_dump is too slow, and
even WAL archiving is too slow/ineffective. However, even in a
TB-scale database, the set of actually modified blocks is sometimes
not that big, maybe only several GB. So if we could back up only those
modified blocks, that would be an effective incremental backup method.

For now, my idea is pretty vague.

- Record info about modified blocks. We don't need to remember the
whole history of a block if the block was modified multiple times.
We just remember that the block has been modified since the last
incremental backup was taken.

- The info could be obtained by trapping calls to mdwrite() etc. We need
to be careful to skip blocks belonging to xlogs and temporary
tables so as not to waste resources.

- If many blocks in a file were modified, we may be able to condense
the info to "the whole file was modified" to reduce the amount of
info (see the sketch below).

- How to take a consistent incremental backup is an issue. I can't
think of a clean way other than "locking the whole cluster", which is
obviously unacceptable. Maybe we should give up on "hot backup"?
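
A rough sketch of what that bookkeeping could look like, written here as a
Python pseudo-model just to illustrate the data structures (the mdwrite()
trap itself would of course be C inside the backend, and every name below
is made up):

from collections import defaultdict

BLOCKS_PER_SEGMENT = 131072      # a 1 GB segment holds 131072 8 kB blocks
CONDENSE_THRESHOLD = 0.5         # condense once half the blocks have changed

modified = defaultdict(set)      # relation file path -> set of block numbers
whole_file = set()               # files recorded simply as "entirely modified"

def note_block_write(relpath, blockno, is_temp=False, is_xlog=False):
    # Conceptually called from the mdwrite() trap.
    if is_temp or is_xlog:
        return                   # skip xlog and temporary tables
    if relpath in whole_file:
        return                   # already condensed, nothing more to remember
    blocks = modified[relpath]
    blocks.add(blockno)
    if len(blocks) > BLOCKS_PER_SEGMENT * CONDENSE_THRESHOLD:
        whole_file.add(relpath)  # condense to "the whole file was modified"
        del modified[relpath]

def reset_after_incremental_backup():
    modified.clear()
    whole_file.clear()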

Comments, thoughts are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 12:55:21
Message-ID: 20130619125521.GT23363@tamriel.snowman.net
Lists: pgsql-hackers

Tatsuo,

* Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
> I'm thinking of implementing an incremental backup tool for
> PostgreSQL. The use case for the tool would be taking a backup of huge
> database. For that size of database, pg_dump is too slow, even WAL
> archive is too slow/ineffective as well. However even in a TB
> database, sometimes actual modified blocks are not that big, may be
> even several GB. So if we can backup those modified blocks only,
> that would be an effective incremental backup method.

I'm trying to figure out how that's actually different from WAL..? It
sounds like you'd get what you're suggesting with simply increasing the
checkpoint timeout until the WAL stream is something which you can keep
up with. Of course, the downside there is that you'd have to replay
more WAL when recovering.

What about a tool which receives WALs but then "compresses" them across
a longer period of time than the normal checkpointing by simply keeping
in memory the current set of modified blocks and applying each WAL
record against those blocks in memory as it reads the WAL? It would then
purge those blocks out using full-page WAL writes at some pre-defined
point, perhaps at the end of the overall backup?

Consider this: connect the WAL-compressor to a PG backend and issue a
'start backup', which the WAL-compressor detects; it then starts keeping
track of every changed block in memory, applying the WAL stream of
full-page and non-full-page changes to the in-memory set, until 'stop
backup' is called, at which point the WAL-compressor simply dumps all
the records as full-page writes into this new WAL stream.
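
A minimal sketch of that compressor loop, assuming an invented record shape
(rec.blocks, full_page_image, apply_redo are placeholders, not the actual
WAL reader API):

def compress_wal(records, read_base_page, emit_full_page):
    # records:        decoded WAL records seen between start/stop backup
    # read_base_page: callback fetching a page image from the base backup
    # emit_full_page: callback writing one full-page image to the output
    pages = {}                                # (relfile, blockno) -> bytes
    for rec in records:
        for blk in rec.blocks:                # blocks touched by this record
            key = (blk.relfile, blk.blockno)
            if blk.full_page_image is not None:
                pages[key] = blk.full_page_image     # an FPI resets the page
            else:
                base = pages.get(key) or read_base_page(key)
                pages[key] = blk.apply_redo(base)    # replay the change
    for key, image in pages.items():          # 'stop backup' was called:
        emit_full_page(key, image)            # dump everything as full pages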

Or perhaps some combination of an 'always running' WAL compressor which
simply reduces the overall size of the WAL stream with coordination
around full backups.

Thanks,

Stephen


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 12:56:31
Message-ID: CA+CSw_swVaWb7-tvSh70sdE2564VZZbeoPrD9hBN4K09EvBOwg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 1:13 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> I'm thinking of implementing an incremental backup tool for
> PostgreSQL. The use case for the tool would be taking a backup of huge
> database. For that size of database, pg_dump is too slow, even WAL
> archive is too slow/ineffective as well. However even in a TB
> database, sometimes actual modified blocks are not that big, may be
> even several GB. So if we can backup those modified blocks only,
> that would be an effective incremental backup method.

PostgreSQL definitely needs better tools to cope with TB-scale
databases, especially once the ideas for getting rid of anti-wraparound
vacuums materialize and make huge databases more practical.

> For now, my idea is pretty vague.
>
> - Record info about modified blocks. We don't need to remember the
> whole history of a block if the block was modified multiple times.
> We just remember that the block was modified since the last
> incremental backup was taken.
>
> - The info could be obtained by trapping calls to mdwrite() etc. We need
> to be careful to avoid such blocks used in xlogs and temporary
> tables to not waste resource.

Unless I'm missing something, the information about modified blocks
can also be obtained by reading WAL, not requiring any modifications
to core.

> - If many blocks were modified in a file, we may be able to condense
> the info as "the whole file was modified" to reduce the amount of
> info.

You could keep a list of modified block ranges and, when the list gets
too large, merge ranges that are close together.
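
For example, a toy sketch of that range list (not tied to any actual
PostgreSQL data structure):

import bisect

def add_range(ranges, start, end, max_entries=1024):
    # ranges: sorted list of (start_block, end_block) tuples for one file
    bisect.insort(ranges, (start, end))
    if len(ranges) > max_entries:
        # merge the two neighbouring ranges with the smallest gap between them
        gaps = [(ranges[i + 1][0] - ranges[i][1], i)
                for i in range(len(ranges) - 1)]
        _, i = min(gaps)
        s1, e1 = ranges[i]
        s2, e2 = ranges[i + 1]
        ranges[i:i + 2] = [(min(s1, s2), max(e1, e2))]
    return ranges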

> - How to take a consistent incremental backup is an issue. I can't
> think of a clean way other than "locking whole cluster", which is
> obviously unacceptable. Maybe we should give up "hot backup"?

I don't see why the usual approach (pg_start_backup(), copy out the
modified blocks, pg_stop_backup(), then copy the WAL needed to recover)
wouldn't work here.

A good feature for the tool would be to apply the incremental backup to
the previous full backup while copying out the old blocks, so you always
have the latest full backup available plus the incremental changes needed
to rewind it to the previous version.
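
Something along these lines, with a purely hypothetical on-disk layout (one
incremental file per relation, a stream of 4-byte block number plus 8 kB
image pairs) and ignoring edge cases such as newly extended files:

import os

BLOCK_SIZE = 8192

def apply_incremental(full_dir, incr_dir, rewind_dir):
    os.makedirs(rewind_dir, exist_ok=True)
    for relname in os.listdir(incr_dir):
        with open(os.path.join(incr_dir, relname), "rb") as incr, \
             open(os.path.join(full_dir, relname), "r+b") as full, \
             open(os.path.join(rewind_dir, relname), "wb") as rewind:
            while True:
                header = incr.read(4)
                if not header:
                    break
                blockno = int.from_bytes(header, "little")
                newblock = incr.read(BLOCK_SIZE)
                full.seek(blockno * BLOCK_SIZE)
                rewind.write(header + full.read(BLOCK_SIZE))  # save old block
                full.seek(blockno * BLOCK_SIZE)
                full.write(newblock)           # roll the full backup forward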

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 16:02:31
Message-ID: CAGTBQpZhBifNKP+g4PeKFAVV=yNEXeczq8an0gqBSvoXbUS4Ag@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
> For now, my idea is pretty vague.
>
> - Record info about modified blocks. We don't need to remember the
> whole history of a block if the block was modified multiple times.
> We just remember that the block was modified since the last
> incremental backup was taken.
>
> - The info could be obtained by trapping calls to mdwrite() etc. We need
> to be careful to avoid such blocks used in xlogs and temporary
> tables to not waste resource.
>
> - If many blocks were modified in a file, we may be able to condense
> the info as "the whole file was modified" to reduce the amount of
> info.
>
> - How to take a consistent incremental backup is an issue. I can't
> think of a clean way other than "locking whole cluster", which is
> obviously unacceptable. Maybe we should give up "hot backup"?

I don't see how this is better than snapshotting at the filesystem
level. I have no experience with TB-scale databases (I've been limited
to only hundreds of GB), but from my limited mid-size db experience,
filesystem snapshotting is pretty much the same thing you propose
there (xfs_freeze), and it works pretty well. There are even automated
tools to do that, like bacula, and they can handle incremental
snapshots.


From: Jim Nasby <jim(at)nasby(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 18:54:15
Message-ID: 51C1FE57.2010207@nasby.net
Lists: pgsql-hackers

On 6/19/13 11:02 AM, Claudio Freire wrote:
> On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>
>> For now, my idea is pretty vague.
>>
>> - Record info about modified blocks. We don't need to remember the
>> whole history of a block if the block was modified multiple times.
>> We just remember that the block was modified since the last
>> incremental backup was taken.
>>
>> - The info could be obtained by trapping calls to mdwrite() etc. We need
>> to be careful to avoid such blocks used in xlogs and temporary
>> tables to not waste resource.
>>
>> - If many blocks were modified in a file, we may be able to condense
>> the info as "the whole file was modified" to reduce the amount of
>> info.
>>
>> - How to take a consistent incremental backup is an issue. I can't
>> think of a clean way other than "locking whole cluster", which is
>> obviously unacceptable. Maybe we should give up "hot backup"?
>
>
> I don't see how this is better than snapshotting at the filesystem
> level. I have no experience with TB scale databases (I've been limited
> to only hundreds of GB), but from my limited mid-size db experience,
> filesystem snapshotting is pretty much the same thing you propose
> there (xfs_freeze), and it works pretty well. There's even automated
> tools to do that, like bacula, and they can handle incremental
> snapshots.

A snapshot is not the same as an incremental backup; it presents itself as a full copy of the filesystem. Actually, since it's on the same underlying storage a snapshot isn't really a good backup at all.

The proposal (at least as I read it) is to provide a means to easily deal with *only* the data that has actually *changed* since the last backup was taken.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 19:36:39
Message-ID: CAGTBQpbk7c2CbJd9PJ1z_ObtDTL9=6_7=1pf3i904iumL31nVA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 3:54 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On 6/19/13 11:02 AM, Claudio Freire wrote:
>>
>> On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org>
>> wrote:
>>>
>>>
>>> For now, my idea is pretty vague.
>>>
>>> - Record info about modified blocks. We don't need to remember the
>>> whole history of a block if the block was modified multiple times.
>>> We just remember that the block was modified since the last
>>> incremental backup was taken.
>>>
>>> - The info could be obtained by trapping calls to mdwrite() etc. We need
>>> to be careful to avoid such blocks used in xlogs and temporary
>>> tables to not waste resource.
>>>
>>> - If many blocks were modified in a file, we may be able to condense
>>> the info as "the whole file was modified" to reduce the amount of
>>> info.
>>>
>>> - How to take a consistent incremental backup is an issue. I can't
>>> think of a clean way other than "locking whole cluster", which is
>>> obviously unacceptable. Maybe we should give up "hot backup"?
>>
>>
>>
>> I don't see how this is better than snapshotting at the filesystem
>> level. I have no experience with TB scale databases (I've been limited
>> to only hundreds of GB), but from my limited mid-size db experience,
>> filesystem snapshotting is pretty much the same thing you propose
>> there (xfs_freeze), and it works pretty well. There's even automated
>> tools to do that, like bacula, and they can handle incremental
>> snapshots.
>
>
> A snapshot is not the same as an incremental backup; it presents itself as a
> full copy of the filesystem. Actually, since it's on the same underlying
> storage a snapshot isn't really a good backup at all.

Read up on bacula[0] (it is huge, so this info may be hard to find):
you can take that snapshot, which of course will be on the same
filesystem, and *then* back it up. So you get a consistent snapshot in
your backup, which means a correct backup, and the backup itself
certainly doesn't have to be on the same filesystem. It even works for
ext3 if you install the right kernel modules.

Yes, it's a snapshot of the entire filesystem, so it's not the same as
a database-only backup. But it does have a huge overlap, don't you
think?

When WAL archiving can get you PITR, and bacula-like tools can get you
incremental and consistent full-FS-snapshot backups, what does the
proposed feature add? I don't think you can get PITR with the proposed
feature, as it takes a snapshot only when told to, and it can't take
multiple snapshots. The only way to get PITR AFAIK is with WAL
archiving, so whether it's viable or not for TB-sized databases is
moot, if it's the only option.

And it will add an overhead. A considerable overhead. Even if you only
have to flip a bit on some page map, it amplifies writes twofold
(unless writes can be coalesced, of which there is no guarantee).

In the end, it may be preferable to just alter PG's behavior slightly
to make the job of bacula, rsync or whichever tool easier, for example
by trying hard not to write to cold segments, so that entire segments
can be skipped by quick mtime checks.
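
The quick check itself would be trivial; a toy sketch (assuming mtime is a
good-enough signal, which it only is if PostgreSQL really leaves cold
segments untouched):

import os

def segments_to_copy(data_dir, last_backup_mtime):
    # last_backup_mtime: POSIX timestamp of the previous backup run
    changed = []
    for root, _, files in os.walk(os.path.join(data_dir, "base")):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_backup_mtime:
                changed.append(path)   # segment rewritten since last backup
    return changed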

[0] http://www.bacula.org/en/


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 21:20:21
Message-ID: 20130619212021.GV23363@tamriel.snowman.net
Lists: pgsql-hackers

* Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
> I don't see how this is better than snapshotting at the filesystem
> level. I have no experience with TB scale databases (I've been limited
> to only hundreds of GB), but from my limited mid-size db experience,
> filesystem snapshotting is pretty much the same thing you propose
> there (xfs_freeze), and it works pretty well. There's even automated
> tools to do that, like bacula, and they can handle incremental
> snapshots.

Large databases tend to have multiple filesystems and getting a single,
consistent, snapshot across all of them while under load is..
'challenging'. It's fine if you use pg_start/stop_backup() and you're
saving the XLOGs off, but if you can't do that..

Thanks,

Stephen


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 22:02:28
Message-ID: CAGTBQpa-URuu4Oc2NT0Prm7-N0GifLMv-6FeLNjf2YqaPCDLvA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
>> I don't see how this is better than snapshotting at the filesystem
>> level. I have no experience with TB scale databases (I've been limited
>> to only hundreds of GB), but from my limited mid-size db experience,
>> filesystem snapshotting is pretty much the same thing you propose
>> there (xfs_freeze), and it works pretty well. There's even automated
>> tools to do that, like bacula, and they can handle incremental
>> snapshots.
>
> Large databases tend to have multiple filesystems and getting a single,
> consistent, snapshot across all of them while under load is..
> 'challenging'. It's fine if you use pg_start/stop_backup() and you're
> saving the XLOGs off, but if you can't do that..

Good point there.

I still don't like the idea of having to mark each modified page. The
WAL compressor idea sounds a lot more workable. As in scalable.


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 22:18:57
Message-ID: 20130619221857.GU3537@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Claudio Freire escribió:
> On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
> >> I don't see how this is better than snapshotting at the filesystem
> >> level. I have no experience with TB scale databases (I've been limited
> >> to only hundreds of GB), but from my limited mid-size db experience,
> >> filesystem snapshotting is pretty much the same thing you propose
> >> there (xfs_freeze), and it works pretty well. There's even automated
> >> tools to do that, like bacula, and they can handle incremental
> >> snapshots.
> >
> > Large databases tend to have multiple filesystems and getting a single,
> > consistent, snapshot across all of them while under load is..
> > 'challenging'. It's fine if you use pg_start/stop_backup() and you're
> > saving the XLOGs off, but if you can't do that..
>
> Good point there.
>
> I still don't like the idea of having to mark each modified page. The
> WAL compressor idea sounds a lot more workable. As in scalable.

There was a project that removed "useless" WAL records from the stream,
to make it smaller and useful for long-term archiving. It only removed
FPIs as far as I recall. It's dead now, and didn't compile on recent
(9.1?) Postgres because of changes in the WAL structs, IIRC.

This doesn't help if you have a lot of UPDATEs that touch the same
set of rows over and over, though. Tatsuo-san's proposal would allow
this use-case to work nicely because you only keep one copy of such
data, not one for each modification.

If you have the two technologies, you could teach them to work in
conjunction: you set up WAL replication, and tell the WAL compressor to
prune updates for high-update tables (avoid useless traffic), then use
incremental backup to back these up. This seems like it would have a
lot of moving parts and be rather bug-prone, though.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 22:29:34
Message-ID: CAGTBQpZcoNKwTpMKzRy0kNQmpefxss7SneDdVixLDCKGopO8cA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 7:18 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> If you have the two technologies, you could teach them to work in
> conjunction: you set up WAL replication, and tell the WAL compressor to
> prune updates for high-update tables (avoid useless traffic), then use
> incremental backup to back these up. This seems like it would have a
> lot of moving parts and be rather bug-prone, though.

I don't think it would be worse than storage-manager-level stuff. And
though more complex, don't underestimate the pros: lower footprint,
better scalability, and you get consistent online backups.

That mechanism can also be used to distill a list of modified pages,
mind you, instead of hooking into storage-manager stuff. The pro there
is that it wouldn't amplify writes. The con is that you don't get
consistent online backups.


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: sfrost(at)snowman(dot)net
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 22:39:13
Message-ID: 20130620.073913.1461847419334299815.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

>> I'm thinking of implementing an incremental backup tool for
>> PostgreSQL. The use case for the tool would be taking a backup of huge
>> database. For that size of database, pg_dump is too slow, even WAL
>> archive is too slow/ineffective as well. However even in a TB
>> database, sometimes actual modified blocks are not that big, may be
>> even several GB. So if we can backup those modified blocks only,
>> that would be an effective incremental backup method.
>
> I'm trying to figure out how that's actually different from WAL..? It
> sounds like you'd get what you're suggesting with simply increasing the
> checkpoint timeout until the WAL stream is something which you can keep
> up with. Of course, the downside there is that you'd have to replay
> more WAL when recovering.

Yeah, at first I thought using WAL was a good idea. However, I realized
that the problem with using WAL is that we cannot back up unlogged
tables, because they are not written to WAL.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 22:44:06
Message-ID: CAGTBQpbmwcv1tP3q4ee8KbQp_y0vmc3nNr8jYjeg3-xud+fOhw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 7:39 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>> I'm thinking of implementing an incremental backup tool for
>>> PostgreSQL. The use case for the tool would be taking a backup of huge
>>> database. For that size of database, pg_dump is too slow, even WAL
>>> archive is too slow/ineffective as well. However even in a TB
>>> database, sometimes actual modified blocks are not that big, may be
>>> even several GB. So if we can backup those modified blocks only,
>>> that would be an effective incremental backup method.
>>
>> I'm trying to figure out how that's actually different from WAL..? It
>> sounds like you'd get what you're suggesting with simply increasing the
>> checkpoint timeout until the WAL stream is something which you can keep
>> up with. Of course, the downside there is that you'd have to replay
>> more WAL when recovering.
>
> Yeah, at first I thought using WAL was a good idea. However I realized
> that the problem using WAL is we cannot backup unlogged tables because
> they are not written to WAL.

How does replication handle that?

Because I doubt that's an issue only with backups.


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: klaussfreire(at)gmail(dot)com
Cc: sfrost(at)snowman(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 22:54:29
Message-ID: 20130620.075429.92168895872404702.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

>>> I'm trying to figure out how that's actually different from WAL..? It
>>> sounds like you'd get what you're suggesting with simply increasing the
>>> checkpoint timeout until the WAL stream is something which you can keep
>>> up with. Of course, the downside there is that you'd have to replay
>>> more WAL when recovering.
>>
>> Yeah, at first I thought using WAL was a good idea. However I realized
>> that the problem using WAL is we cannot backup unlogged tables because
>> they are not written to WAL.
>
> How does replication handle that?
>
> Because I doubt that's an issue only with backups.

Unlogged tables are not replicated to streaming replication
standbys. This is clearly stated in the docs.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 22:56:19
Message-ID: 20130619225619.GW23363@tamriel.snowman.net
Lists: pgsql-hackers

* Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
> Yeah, at first I thought using WAL was a good idea. However I realized
> that the problem using WAL is we cannot backup unlogged tables because
> they are not written to WAL.

Unlogged tables are also nuked on recovery, so I'm not sure why you
think an incremental backup would help.. If you're recovering (even
from a simple crash), unlogged tables are going to go away.

Put simply, unlogged tables should not be used for any data you care
about, period.

Thanks,

Stephen


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: sfrost(at)snowman(dot)net
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 23:12:23
Message-ID: 20130620.081223.1792528449674354429.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> * Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
>> Yeah, at first I thought using WAL was a good idea. However I realized
>> that the problem using WAL is we cannot backup unlogged tables because
>> they are not written to WAL.
>
> Unlogged tables are also nuked on recovery, so I'm not sure why you
> think an incremental backup would help.. If you're recovering (even
> from a simple crash), unlogged tables are going to go away.

If my memory serves, unlogged tables are not nuked when PostgreSQL is
stopped by a planned shutdown (as opposed to a crash or "pg_ctl -m i
stop"). If PostgreSQL works that way, an incremental backup should be
able to recover unlogged tables as well; at least, people would expect
so IMO.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 23:20:28
Message-ID: 20130619232028.GY23363@tamriel.snowman.net
Lists: pgsql-hackers

* Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
> > * Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
> >> Yeah, at first I thought using WAL was a good idea. However I realized
> >> that the problem using WAL is we cannot backup unlogged tables because
> >> they are not written to WAL.
> >
> > Unlogged tables are also nuked on recovery, so I'm not sure why you
> > think an incremental backup would help.. If you're recovering (even
> > from a simple crash), unlogged tables are going to go away.
>
> If my memory serves, unlogged tables are not nuked when PostgeSQL is
> stopped by a planned shutdown (not by crash or by "pg_ctl -m i
> stop"). If PostgreSQL works so, incremental backup should be able to
> recover unlogged tables as well, at least people would expect so IMO.

Sure, if you shut down PG, rsync the entire thing and then bring it back
up then unlogged tables should work when "backed up".

They're not WAL'd, so expecting them to work when restoring a backup of
a PG that had been running at the time of the backup is folly.

Thanks,

Stephen


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: sfrost(at)snowman(dot)net
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 23:29:29
Message-ID: 20130620.082929.1073642141629305444.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> * Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
>> > * Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
>> >> Yeah, at first I thought using WAL was a good idea. However I realized
>> >> that the problem using WAL is we cannot backup unlogged tables because
>> >> they are not written to WAL.
>> >
>> > Unlogged tables are also nuked on recovery, so I'm not sure why you
>> > think an incremental backup would help.. If you're recovering (even
>> > from a simple crash), unlogged tables are going to go away.
>>
>> If my memory serves, unlogged tables are not nuked when PostgeSQL is
>> stopped by a planned shutdown (not by crash or by "pg_ctl -m i
>> stop"). If PostgreSQL works so, incremental backup should be able to
>> recover unlogged tables as well, at least people would expect so IMO.
>
> Sure, if you shut down PG, rsync the entire thing and then bring it back
> up then unlogged tables should work when "backed up".

I don't think using rsync (or tar or whatever general file utils)
against a TB-scale database for incremental backup is practical. If it
were practical, I would never have proposed my idea.

> They're not WAL'd, so expecting them to work when restoring a backup of
> a PG that had been running at the time of the backup is folly.

Probably you forgot about our nice pg_dump tool :-)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: klaussfreire(at)gmail(dot)com
Cc: sfrost(at)snowman(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 23:40:42
Message-ID: 20130620.084042.49354019011506078.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
>>> I don't see how this is better than snapshotting at the filesystem
>>> level. I have no experience with TB scale databases (I've been limited
>>> to only hundreds of GB), but from my limited mid-size db experience,
>>> filesystem snapshotting is pretty much the same thing you propose
>>> there (xfs_freeze), and it works pretty well. There's even automated
>>> tools to do that, like bacula, and they can handle incremental
>>> snapshots.
>>
>> Large databases tend to have multiple filesystems and getting a single,
>> consistent, snapshot across all of them while under load is..
>> 'challenging'. It's fine if you use pg_start/stop_backup() and you're
>> saving the XLOGs off, but if you can't do that..
>
> Good point there.
>
> I still don't like the idea of having to mark each modified page. The
> WAL compressor idea sounds a lot more workable. As in scalable.

Why do you think the WAL compressor idea is more scalable? I really want
to know why. Besides the unlogged tables issue, I can accept the idea
if a WAL-based solution is much more efficient. If there's no perfect,
ideal solution, we need to prioritize things. My #1 priority is being
able to create an incremental backup of a TB-scale database, where the
backup file is small enough and the time to create it is acceptable. I
just don't know why scanning the WAL stream is much cheaper than
recording modified page information.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 23:43:24
Message-ID: 20130619234324.GZ23363@tamriel.snowman.net
Lists: pgsql-hackers

* Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
> I don't think using rsync (or tar or whatever general file utils)
> against TB database for incremental backup is practical. If it's
> practical, I would never propose my idea.

You could use rsync for incremental updates if you wanted; it'd
certainly be faster in some cases, and it's entirely possible to use it
against TB-scale databases in some cases.

> > They're not WAL'd, so expecting them to work when restoring a backup of
> > a PG that had been running at the time of the backup is folly.
>
> Probably you forget about our nice pg_dump tool:-)

I don't consider pg_dump a mechanism for backing up TB databases.
You're certainly welcome to use it for dumping unlogged tables, but I
can't support the notion that unlogged tables should be supported
through WAL-supported file-based backups. If we're going down this
road, I'd much rather see support for exporting whole files from and
importing them back into PG in some way which completely avoids the need
to re-parse or re-validate data and supports pulling in indexes as part
of the import.

Thanks,

Stephen


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: klaussfreire(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 23:55:46
Message-ID: 20130619235546.GA23363@tamriel.snowman.net
Lists: pgsql-hackers

* Tatsuo Ishii (ishii(at)postgresql(dot)org) wrote:
> Why do you think WAL compressor idea is more scalable? I really want
> to know why. Besides the unlogged tables issue, I can accept the idea
> if WAL based solution is much more efficient. If there's no perfect,
> ideal solution, we need to prioritize things. My #1 priority is
> allowing to create incremental backup against TB database, and the
> backup file should be small enough and the time to create it is
> acceptable. I just don't know why scanning WAL stream is much cheaper
> than recording modified page information.

Because that's what the WAL *is*..?

Why would you track what's changed twice?

Thanks,

Stephen


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-20 00:04:12
Message-ID: CAGTBQpbd5qXoTqFBRKvY0x41LQwF3uTYMPtGHuc28vtXVwCsFg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>> On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>>> * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
>>>> I don't see how this is better than snapshotting at the filesystem
>>>> level. I have no experience with TB scale databases (I've been limited
>>>> to only hundreds of GB), but from my limited mid-size db experience,
>>>> filesystem snapshotting is pretty much the same thing you propose
>>>> there (xfs_freeze), and it works pretty well. There's even automated
>>>> tools to do that, like bacula, and they can handle incremental
>>>> snapshots.
>>>
>>> Large databases tend to have multiple filesystems and getting a single,
>>> consistent, snapshot across all of them while under load is..
>>> 'challenging'. It's fine if you use pg_start/stop_backup() and you're
>>> saving the XLOGs off, but if you can't do that..
>>
>> Good point there.
>>
>> I still don't like the idea of having to mark each modified page. The
>> WAL compressor idea sounds a lot more workable. As in scalable.
>
> Why do you think WAL compressor idea is more scalable? I really want
> to know why. Besides the unlogged tables issue, I can accept the idea
> if WAL based solution is much more efficient. If there's no perfect,
> ideal solution, we need to prioritize things. My #1 priority is
> allowing to create incremental backup against TB database, and the
> backup file should be small enough and the time to create it is
> acceptable. I just don't know why scanning WAL stream is much cheaper
> than recording modified page information.

Because it aggregates updates.

When you work at the storage manager level, you only see block-sized
operations. This results in the need to WAL-log bit-sized updates on
some hypothetical dirty-map index. Even when done 100% efficiently,
this implies at least one write per dirtied block, which could as much
as double write I/O in the worst (and totally expectable) case.

When you do it at WAL segment recycle time, or better yet during
checkpoints, you deal with checkpoint-scale operations. You can
aggregate dirty-map updates, if you keep a dirty-map, which could not
only reduce I/O considerably (by a much increased likelihood of write
coalescence), but also let you schedule it better (toss it in the
background, with checkpoints). This is for gathering dirty-map
updates, which still leaves you with the complex problem of then
actually snapshotting those pages consistently without interfering
with ongoing transactions.

If you do a WAL compressor, WAL entries are write-once, so you'll have
no trouble snapshotting those pages. You have the checkpoint's initial
full page write, so you don't even have to read the page, and you can
accumulate all further partial writes into one full page write, and
dump that on an "incremental archive". So, you get all the I/O
aggregation from above, which reduces I/O to the point where it only
doubles WAL I/O. It's bound by a constant, and in contrast to
dirty-map updates, it's sequential I/O so it's a lot faster. It's thus
perfectly scalable.

Not only that, but you're also amortizing incremental backup costs
over time, as you're making them constantly as opposed to regular
intervals. You'll have one incremental backup per checkpoint. If you
want to coalesce backups, you launch another compressor to merge the
last incremental checkpoint with the new one. And, now this is the
cherry on top, you only have to do this on the archived WALs, which
means you could very well do it on another system, freeing your main
cluster from all this I/O. It's thus perfectly scalable.
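
The coalescing step is cheap once each per-checkpoint increment is just a
map of latest page images; a sketch, with the archive representation purely
assumed:

def coalesce(archives):
    # archives: per-checkpoint incremental archives, oldest first, each a
    # dict mapping (relfile, blockno) -> latest full-page image
    merged = {}
    for archive in archives:
        merged.update(archive)         # pages from later checkpoints win
    return merged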

The only bottleneck here, is WAL archiving. This assumes you can
afford WAL archiving at least to a local filesystem, and that the WAL
compressor is able to cope with WAL bandwidth. But I have no reason to
think you'd be able to cope with dirty-map updates anyway if you were
saturating the WAL compressor, as the compressor is more efficient on
amortized cost per transaction than the dirty-map approach.


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: klaussfreire(at)gmail(dot)com
Cc: sfrost(at)snowman(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-20 01:25:21
Message-ID: 20130620.102521.1131455910749185193.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>> On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>>>> * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
>>>>> I don't see how this is better than snapshotting at the filesystem
>>>>> level. I have no experience with TB scale databases (I've been limited
>>>>> to only hundreds of GB), but from my limited mid-size db experience,
>>>>> filesystem snapshotting is pretty much the same thing you propose
>>>>> there (xfs_freeze), and it works pretty well. There's even automated
>>>>> tools to do that, like bacula, and they can handle incremental
>>>>> snapshots.
>>>>
>>>> Large databases tend to have multiple filesystems and getting a single,
>>>> consistent, snapshot across all of them while under load is..
>>>> 'challenging'. It's fine if you use pg_start/stop_backup() and you're
>>>> saving the XLOGs off, but if you can't do that..
>>>
>>> Good point there.
>>>
>>> I still don't like the idea of having to mark each modified page. The
>>> WAL compressor idea sounds a lot more workable. As in scalable.
>>
>> Why do you think WAL compressor idea is more scalable? I really want
>> to know why. Besides the unlogged tables issue, I can accept the idea
>> if WAL based solution is much more efficient. If there's no perfect,
>> ideal solution, we need to prioritize things. My #1 priority is
>> allowing to create incremental backup against TB database, and the
>> backup file should be small enough and the time to create it is
>> acceptable. I just don't know why scanning WAL stream is much cheaper
>> than recording modified page information.
>
> Because it aggregates updates.
>
> When you work at the storage manager level, you only see block-sized
> operations. This results in the need to WAL-log bit-sized updates on
> some hypothetical dirty-map index. Even when done 100% efficiently,
> this implies at least one write per dirtied block, which could as much
> as double write I/O in the worse (and totally expectable) case.
>
> When you do it at WAL segment recycle time, or better yet during
> checkpoints, you deal with checkpoint-scale operations. You can
> aggregate dirty-map updates, if you keep a dirty-map, which could not
> only reduce I/O considerably (by a much increased likelihood of write
> coalescence), but also let you schedule it better (toss it in the
> background, with checkpoints). This is for gathering dirty-map
> updates, which still leaves you with the complex problem of then
> actually snapshotting those pages consistently without interfering
> with ongoing transactions.
>
> If you do a WAL compressor, WAL entries are write-once, so you'll have
> no trouble snapshotting those pages. You have the checkpoint's initial
> full page write, so you don't even have to read the page, and you can
> accumulate all further partial writes into one full page write, and
> dump that on an "incremental archive". So, you get all the I/O
> aggregation from above, which reduces I/O to the point where it only
> doubles WAL I/O. It's bound by a constant, and in contrast to
> dirty-map updates, it's sequential I/O so it's a lot faster. It's thus
> perfectly scalable.
>
> Not only that, but you're also amortizing incremental backup costs
> over time, as you're making them constantly as opposed to regular
> intervals. You'll have one incremental backup per checkpoint. If you
> want to coalesce backups, you launch another compressor to merge the
> last incremental checkpoint with the new one. And, now this is the
> cherry on top, you only have to do this on the archived WALs, which
> means you could very well do it on another system, freeing your main
> cluster from all this I/O. It's thus perfectly scalable.
>
> The only bottleneck here, is WAL archiving. This assumes you can
> afford WAL archiving at least to a local filesystem, and that the WAL
> compressor is able to cope with WAL bandwidth. But I have no reason to
> think you'd be able to cope with dirty-map updates anyway if you were
> saturating the WAL compressor, as the compressor is more efficient on
> amortized cost per transaction than the dirty-map approach.

Thank you for the detailed explanation. I will think more about this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-20 07:57:33
Message-ID: CABUevEw+pnxV7zJ_oZe97v3SMr_0e_YD6aGbe-tJYVqxfyWKCQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 20, 2013 at 12:18 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Claudio Freire escribió:
>> On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> > * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
>> >> I don't see how this is better than snapshotting at the filesystem
>> >> level. I have no experience with TB scale databases (I've been limited
>> >> to only hundreds of GB), but from my limited mid-size db experience,
>> >> filesystem snapshotting is pretty much the same thing you propose
>> >> there (xfs_freeze), and it works pretty well. There's even automated
>> >> tools to do that, like bacula, and they can handle incremental
>> >> snapshots.
>> >
>> > Large databases tend to have multiple filesystems and getting a single,
>> > consistent, snapshot across all of them while under load is..
>> > 'challenging'. It's fine if you use pg_start/stop_backup() and you're
>> > saving the XLOGs off, but if you can't do that..
>>
>> Good point there.
>>
>> I still don't like the idea of having to mark each modified page. The
>> WAL compressor idea sounds a lot more workable. As in scalable.
>
> There was a project that removed "useless" WAL records from the stream,
> to make it smaller and useful for long-term archiving. It only removed
> FPIs as far as I recall. It's dead now, and didn't compile on recent
> (9.1?) Postgres because of changes in the WAL structs, IIRC.
>
> This doesn't help if you have a large lot of UPDATEs that touch the same
> set of rows over and over, though. Tatsuo-san's proposal would allow
> this use-case to work nicely because you only keep one copy of such
> data, not one for each modification.
>
> If you have the two technologies, you could teach them to work in
> conjunction: you set up WAL replication, and tell the WAL compressor to
> prune updates for high-update tables (avoid useless traffic), then use
> incremental backup to back these up. This seems like it would have a
> lot of moving parts and be rather bug-prone, though.

Just as a datapoint, I think this is basically what at least one
other database engine (SQL Server) calls "incremental" vs
"differential" backup.

"Differential" backup keep tracks of which blocks have changed (by one
way or another - maybe as simple as the LSN, but it doesn't matter
how, really) and backs up just those blocks (diffed back to the base
backup).
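
A rough sketch of that LSN-based selection (it reads pd_lsn, the first 8
bytes of each page header; this ignores subtleties such as hint-bit-only
changes that don't bump the LSN, and assumes the file isn't being written
concurrently):

import struct

BLOCK_SIZE = 8192

def changed_blocks(relfile_path, base_backup_lsn):
    # pd_lsn is stored as two native-endian uint32s (xlogid, xrecoff)
    with open(relfile_path, "rb") as f:
        blockno = 0
        while True:
            page = f.read(BLOCK_SIZE)
            if len(page) < BLOCK_SIZE:
                break
            xlogid, xrecoff = struct.unpack("=II", page[:8])
            page_lsn = (xlogid << 32) | xrecoff
            if page_lsn == 0 or page_lsn > base_backup_lsn:
                yield blockno          # new or modified since the base backup
            blockno += 1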

"Incremental" does the transaction log, which is basically what we do
with log archiving except it's not done in realtime - it's all saved
on the master until the backup command runs.

Of course, it's been quite a few years since I set up one of those
in anger, so disclaimer for that info being out of date :)

Didn't pg_rman try to do something based on the page LSN to achieve
something similar to this?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: "Jehan-Guillaume (ioguix) de Rorthais" <ioguix(at)free(dot)fr>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: klaussfreire(at)gmail(dot)com, sfrost(at)snowman(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-21 23:09:20
Message-ID: 51C4DD20.3000103@free.fr
Lists: pgsql-hackers

On 20/06/2013 03:25, Tatsuo Ishii wrote:
>> On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>>> On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>>>>> * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
[...]
>>
>> The only bottleneck here, is WAL archiving. This assumes you can
>> afford WAL archiving at least to a local filesystem, and that the WAL
>> compressor is able to cope with WAL bandwidth. But I have no reason to
>> think you'd be able to cope with dirty-map updates anyway if you were
>> saturating the WAL compressor, as the compressor is more efficient on
>> amortized cost per transaction than the dirty-map approach.
>
> Thank you for detailed explanation. I will think more about this.

Just for the record, I have been mulling over this idea for quite a
few months. I even talked about it with Dimitri Fontaine some weeks ago
over some beers :)

My idea came from a customer during a training session who explained to
me the difference between differential and incremental backup in
Oracle.

My approach would have been to create a standalone tool (say
pg_walaggregate) which takes a bunch of WAL files from the archives and
merges them into a single big file, keeping only the very last version
of each page after aggregating all of their changes. The resulting
file, aggregating all the changes from the given WAL files, would be
the "differential backup".

A differential backup built from the WAL between W1 and Wn would let
you recover to the time of Wn much faster than replaying all the WALs
between W1 and Wn, and it would save a lot of space.

I was hoping to find some time to dig into this idea, but since the
subject came up here, here are my 2¢!

Cheers,
--
Jehan-Guillaume (ioguix) de Rorthais


From: Cédric Villemain <cedric(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Jehan-Guillaume (ioguix) de Rorthais" <ioguix(at)free(dot)fr>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, klaussfreire(at)gmail(dot)com, sfrost(at)snowman(dot)net
Subject: Re: Implementing incremental backup
Date: 2013-06-22 13:58:35
Message-ID: 201306221558.39966.cedric@2ndquadrant.com
Lists: pgsql-hackers

On Saturday, June 22, 2013 at 01:09:20, Jehan-Guillaume (ioguix) de
Rorthais wrote:
> On 20/06/2013 03:25, Tatsuo Ishii wrote:
> >> On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> >>>> On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> >>>>> * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
> [...]
>
> >> The only bottleneck here, is WAL archiving. This assumes you can
> >> afford WAL archiving at least to a local filesystem, and that the WAL
> >> compressor is able to cope with WAL bandwidth. But I have no reason to
> >> think you'd be able to cope with dirty-map updates anyway if you were
> >> saturating the WAL compressor, as the compressor is more efficient on
> >> amortized cost per transaction than the dirty-map approach.
> >
> > Thank you for detailed explanation. I will think more about this.
>
> Just for the record, I was mulling over this idea since a bunch of
> month. I even talked about that with Dimitri Fontaine some weeks ago
> with some beers :)
>
> My idea came from a customer during a training explaining me the
> difference between differential and incremental backup in Oracle.
>
> My approach would have been to create a standalone tool (say
> pg_walaggregate) which takes a bunch of WAL from archives and merge them
> in a single big file, keeping only the very last version of each page
> after aggregating all their changes. The resulting file, aggregating all
> the changes from given WAL files would be the "differential backup".
>
> A differential backup resulting from a bunch of WAL between W1 and Wn
> would help to recover much faster to the time of Wn than replaying all
> the WALs between W1 and Wn and saves a lot of space.
>
> I was hoping to find some time to dig around this idea, but as the
> subject rose here, then here are my 2¢!

Something like that, maybe:

./pg_xlogdump -b \
    ../data/pg_xlog/000000010000000000000001 \
    ../data/pg_xlog/000000010000000000000005 | \
    grep 'backup bkp' | awk '{print ($5,$9)}'

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Cédric Villemain <cedric(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Jehan-Guillaume (ioguix) de Rorthais" <ioguix(at)free(dot)fr>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, klaussfreire(at)gmail(dot)com, sfrost(at)snowman(dot)net
Subject: Re: Implementing incremental backup
Date: 2013-06-22 14:08:51
Message-ID: 20130622140851.GA1254@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-22 15:58:35 +0200, Cédric Villemain wrote:
> > A differential backup resulting from a bunch of WAL between W1 and Wn
> > would help to recover much faster to the time of Wn than replaying all
> > the WALs between W1 and Wn and saves a lot of space.
> >
> > I was hoping to find some time to dig around this idea, but as the
> > subject rose here, then here are my 2¢!
>
> something like that maybe :
> ./pg_xlogdump -b \
> ../data/pg_xlog/000000010000000000000001 \
> ../data/pg_xlog/000000010000000000000005| \
> grep 'backup bkp' | awk '{print ($5,$9)}'

Note that it's a bit more complex than that for a number of reasons:
* we don't log full page images for e.g. new heap pages, we just set the
XLOG_HEAP_INIT_PAGE flag on the record
* there also are XLOG_FPI records
* How do you get a base backup as the basis to apply those to? You need
it to be recovered exactly to a certain point...

But yes, I think something can be done in the end. I think Heikki's
pg_rewind already has quite a bit of the required logic.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services