Re: Implementing incremental backup

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing incremental backup
Date: 2013-06-19 12:56:31
Message-ID: CA+CSw_swVaWb7-tvSh70sdE2564VZZbeoPrD9hBN4K09EvBOwg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 1:13 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> I'm thinking of implementing an incremental backup tool for
> PostgreSQL. The use case for the tool would be taking a backup of huge
> database. For that size of database, pg_dump is too slow, even WAL
> archive is too slow/ineffective as well. However even in a TB
> database, sometimes actual modified blocks are not that big, may be
> even several GB. So if we can backup those modified blocks only,
> that would be an effective incremental backup method.

PostgreSQL definitely needs better tools to cope with TB-scale
databases, especially once the ideas for getting rid of
anti-wraparound vacuums materialize and make huge databases more
practical.

> For now, my idea is pretty vague.
>
> - Record info about modified blocks. We don't need to remember the
> whole history of a block if the block was modified multiple times.
> We just remember that the block was modified since the last
> incremental backup was taken.
>
> - The info could be obtained by trapping calls to mdwrite() etc. We need
> to be careful to avoid such blocks used in xlogs and temporary
> tables to not waste resource.

Unless I'm missing something, the information about modified blocks
can also be obtained by reading WAL, not requiring any modifications
to core.
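To sketch the idea: the block references could be harvested from a
textual WAL dump (e.g. pg_xlogdump, new in 9.3) without touching core.
The "blkref" line format below is an assumption for illustration and
varies between PostgreSQL versions.

```python
import re

# Assumed blkref line shape, e.g.:
#   "blkref #0: rel 1663/16384/16385 blk 7"
# (real pg_xlogdump output differs across versions)
BLKREF_RE = re.compile(r"blkref #\d+: rel (\d+)/(\d+)/(\d+) .*?blk (\d+)")

def modified_blocks(waldump_lines):
    """Return the set of (tablespace, database, relfilenode, block)
    tuples referenced by the dumped WAL records. Remembering only the
    set, not the history, is enough for an incremental backup."""
    blocks = set()
    for line in waldump_lines:
        for m in BLKREF_RE.finditer(line):
            spc, db, rel, blk = map(int, m.groups())
            blocks.add((spc, db, rel, blk))
    return blocks
```

Records for temporary relations and the WAL files themselves never
show up as blkrefs, so the resource-waste concern above is avoided
for free.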

> - If many blocks were modified in a file, we may be able to condense
> the info as "the whole file was modified" to reduce the amount of
> info.

You could keep a list of modified block ranges and, when the list gets
too large, merge ranges that are close together.
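A minimal sketch of that condensing step (names and the cap on list
size are illustrative assumptions, not a proposed on-disk format):

```python
def condense_ranges(ranges, max_entries):
    """Given a list of (start_block, end_block) ranges (inclusive),
    repeatedly merge the pair with the smallest gap between them
    until at most max_entries remain. This trades copying a few
    unmodified blocks for a bounded bookkeeping structure; in the
    extreme the list degenerates to "the whole file was modified"."""
    ranges = sorted(ranges)
    while len(ranges) > max_entries:
        # find the adjacent pair separated by the smallest gap
        gaps = [(ranges[i + 1][0] - ranges[i][1], i)
                for i in range(len(ranges) - 1)]
        _, i = min(gaps)
        ranges[i:i + 2] = [(ranges[i][0], ranges[i + 1][1])]
    return ranges
```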

> - How to take a consistent incremental backup is an issue. I can't
> think of a clean way other than "locking whole cluster", which is
> obviously unacceptable. Maybe we should give up "hot backup"?

I don't see why the regular approach (pg_start_backup(), copy out the
modified blocks, pg_stop_backup(), then copy the WAL needed to
recover) wouldn't work here.
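The flow, written out as a sketch with caller-supplied stand-ins (the
run_sql/copy_* callables are assumptions for illustration, not a real
client API):

```python
def incremental_hot_backup(run_sql, copy_modified_blocks, copy_wal):
    """Hot-backup dance applied to an incremental copy:
    pg_start_backup() forces a checkpoint, the modified blocks are
    copied while the cluster stays fully available, and
    pg_stop_backup() plus the WAL generated in between make the
    (torn, inconsistent) copy consistent on recovery."""
    run_sql("SELECT pg_start_backup('incremental', true)")
    try:
        copy_modified_blocks()
    finally:
        run_sql("SELECT pg_stop_backup()")
    # WAL written between start and stop is what repairs any blocks
    # that were changing while we copied them
    copy_wal()
```

No cluster-wide lock is needed: the copied blocks may be internally
inconsistent, but replaying the accompanying WAL fixes them, exactly
as with a full base backup.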

A good feature for the tool would be to apply the incremental backup
to the previous full backup while copying out the overwritten blocks,
so you always have the latest full backup available plus the
incremental changes needed to rewind it to the previous version.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
