Re: Proposal: Incremental Backup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: desmodemone <desmodemone(at)gmail(dot)com>
Cc: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Incremental Backup
Date: 2014-08-01 03:35:33
Message-ID: CAA4eK1JOfrmurgzYhGoz8GMkVGb2tgERW_Fs0nOWLJ_qTCesZA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 31, 2014 at 1:56 PM, desmodemone <desmodemone(at)gmail(dot)com> wrote:
>
> Hi Amit, thank you for your comments.
> However, about the drawbacks:
> a) It's not clear to me why the method needs checksums enabled. I mean, if
> the bgwriter or another process flushes a dirty buffer, it only has to
> signal in the map that the blocks have changed by updating the value
> from 0 to 1. It does not need to verify the checksum of the block; we could
> assume that when a dirty buffer is flushed, the block has changed [or
> better, in my idea, the chunk of N blocks].
> We could think of an advanced setting that verifies the checksum, but I think
> it would be heavier.

I was thinking of enabling it for hint bit updates: if an operation
changes a page only through a hint bit update, it will not mark the buffer
dirty unless wal_log_hints or checksums are enabled. Now I think
that if we don't want to track page changes due to hint bit updates,
this will not be required.

> b) Yes, the backends need to update the map, but it's in memory and, as I
> showed, it could be very small if we use chunks of blocks. If we do not
> compress the map, I do not think it could be a bottleneck.

This map has to reside in shared memory, so how will you
estimate the size of this map during startup? Even if you
have some way to do that, I think you still need to detail
how your chunk scheme will work in case multiple backends
try to flush pages that are part of the same chunk.
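
To make the two questions above concrete, here is a rough sketch of a
one-bit-per-chunk map. It is not taken from the proposal: the names
CHUNK_BLOCKS, map_size_bytes and ChangeMap are hypothetical, the chunk
size of 128 blocks is an arbitrary assumption, and the lock only stands in
for whatever atomic operation or lightweight lock a real shared-memory
implementation would use. It also sizes the map for a fixed number of
blocks chosen up front, which is exactly the estimate that would be hard
to make at startup.

```python
import threading

BLOCK_SIZE = 8192      # PostgreSQL default block size, in bytes
CHUNK_BLOCKS = 128     # hypothetical: number of blocks covered by one map bit


def map_size_bytes(cluster_bytes, chunk_blocks=CHUNK_BLOCKS):
    """Bytes needed for one bit per chunk, rounding up at every step."""
    blocks = -(-cluster_bytes // BLOCK_SIZE)   # ceiling division
    chunks = -(-blocks // chunk_blocks)
    return -(-chunks // 8)


class ChangeMap:
    """One bit per chunk; a set bit means some block in the chunk was flushed."""

    def __init__(self, total_blocks, chunk_blocks=CHUNK_BLOCKS):
        self.chunk_blocks = chunk_blocks
        nchunks = -(-total_blocks // chunk_blocks)
        self.bits = bytearray(-(-nchunks // 8))
        # Assumption: a lock stands in for an atomic read-modify-write.
        self.lock = threading.Lock()

    def mark_flushed(self, block_no):
        """Called by whichever process flushes block_no; only ever sets 0 -> 1."""
        chunk = block_no // self.chunk_blocks
        with self.lock:
            self.bits[chunk // 8] |= 1 << (chunk % 8)

    def changed_chunks(self):
        """Chunk numbers whose bit is set, in ascending order."""
        return [i for i in range(len(self.bits) * 8)
                if (self.bits[i // 8] >> (i % 8)) & 1]


def flush_range(change_map, start, stop):
    """Simulate one backend flushing blocks start..stop-1."""
    for block_no in range(start, stop):
        change_map.mark_flushed(block_no)


if __name__ == "__main__":
    cmap = ChangeMap(total_blocks=1024)
    # Two "backends" flush pages that fall into the same chunk (chunk 0).
    workers = [threading.Thread(target=flush_range, args=(cmap, 0, 64)),
               threading.Thread(target=flush_range, args=(cmap, 64, 128))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(cmap.changed_chunks())
    print(map_size_bytes(1 << 40))
```

Because a flush only ever turns a bit from 0 to 1, two backends hitting the
same chunk cannot lose each other's update as long as the read-modify-write
on the containing byte is atomic. Under these assumptions a 1 TiB cluster
needs a 128 kB map, but that figure depends entirely on the maximum size
assumed when the map is allocated.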

Also, as I mentioned previously, there are some operations which
are done without the use of shared buffers, so you need to think
about how to track the changes made by those operations.

> c) The map is not crash-safe by design, because it is needed only for
> incremental backup, to track which blocks need to be backed up, not for
> consistency or recovery of the whole cluster, so it's not a heavy cost for
> the whole cluster to maintain it. We could think of an option (but it's heavy)
> to write it to a file at every flush to have a crash-safe map, but I do not
> think it's so useful. I think it's acceptable, and probably it's better to
> force that, to say: "if your db crashes, you need a full backup".

I am not sure this assumption is right or acceptable: how can
we say that in such a case users will be okay with taking a full backup?
In general, taking a full backup is a very heavy operation, and we should
try to avoid such a situation.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
