Implementing incremental backup

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Implementing incremental backup
Date: 2013-06-19 10:13:46
Message-ID: 20130619.191346.515430917508820927.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

Hi,

I'm thinking of implementing an incremental backup tool for
PostgreSQL. The use case would be taking backups of a huge
database. At that size, pg_dump is too slow, and even WAL
archiving is too slow/ineffective. However, even in a TB-scale
database, the set of actually modified blocks is often not that
large, maybe only several GB. So if we could back up only those
modified blocks, that would be an effective incremental backup
method.

For now, my idea is pretty vague.

- Record info about modified blocks. We don't need to remember the
whole history of a block even if it was modified multiple times;
we just remember that it was modified since the last incremental
backup was taken. (A rough sketch of this bookkeeping follows the
list.)

- The info could be obtained by trapping calls to mdwrite() etc. We
need to be careful to exclude blocks belonging to xlog and temporary
tables so as not to waste resources.

- If many blocks in a file were modified, we may be able to condense
the info to "the whole file was modified" to reduce its size.

- How to take a consistent incremental backup is an open issue. I
can't think of a clean way other than locking the whole cluster,
which is obviously unacceptable. Maybe we should give up on "hot
backup"?
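
To make the bookkeeping a little more concrete, below is a rough
standalone sketch of one possible scheme: one bit per block per
relation file, set from the write path, condensed to "whole file
modified" once most of a file has been touched, and reset after each
incremental backup. None of this is actual backend code; the types and
names (BlockMap, blkmap_note_write(), the 75% threshold) are made up
for discussion, and the real thing would have to hang off the smgr/md
layer and skip xlog and temp relations as noted above.

/*
 * Standalone sketch (not actual PostgreSQL code) of per-file
 * modified-block tracking.  All names here are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKS_PER_FILE    131072    /* 1 GB segment / 8 kB blocks */
#define CONDENSE_THRESHOLD 0.75      /* fraction before "whole file" */

typedef struct BlockMap
{
    uint8_t *bits;        /* one bit per block in the file */
    uint32_t nblocks;     /* number of blocks tracked */
    uint32_t nmodified;   /* distinct blocks marked so far */
    bool     whole_file;  /* condensed: whole file is modified */
} BlockMap;

static BlockMap *
blkmap_create(uint32_t nblocks)
{
    BlockMap *map = calloc(1, sizeof(BlockMap));
    map->bits = calloc((nblocks + 7) / 8, 1);
    map->nblocks = nblocks;
    return map;
}

/*
 * Would be called from a write hook (e.g. around mdwrite()).  We only
 * record *that* a block changed, never its contents or history.
 */
static void
blkmap_note_write(BlockMap *map, uint32_t blkno)
{
    uint8_t mask;

    if (map->whole_file)
        return;                      /* already condensed */

    mask = 1 << (blkno % 8);
    if (!(map->bits[blkno / 8] & mask))
    {
        map->bits[blkno / 8] |= mask;
        map->nmodified++;
    }

    /* If most of the file changed, condense to "whole file modified". */
    if ((double) map->nmodified / map->nblocks >= CONDENSE_THRESHOLD)
        map->whole_file = true;
}

/* Reset after an incremental backup has copied the marked blocks. */
static void
blkmap_reset(BlockMap *map)
{
    memset(map->bits, 0, (map->nblocks + 7) / 8);
    map->nmodified = 0;
    map->whole_file = false;
}

int
main(void)
{
    BlockMap *map = blkmap_create(BLOCKS_PER_FILE);

    blkmap_note_write(map, 42);
    blkmap_note_write(map, 42);      /* repeated writes not re-counted */
    blkmap_note_write(map, 4711);

    printf("modified blocks: %u, whole file: %s\n",
           map->nmodified, map->whole_file ? "yes" : "no");

    blkmap_reset(map);
    free(map->bits);
    free(map);
    return 0;
}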

Comments, thoughts are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
