Re: Synch Rep for CommitFest 2009-07

From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synch Rep for CommitFest 2009-07
Date: 2009-07-16 18:45:26
Message-ID: 2493CEC8-2752-42E9-85C8-5D75F27D9C3A@alpinenetworking.com
Lists: pgsql-hackers

On Jul 16, 2009, at 11:09 AM, Greg Stark wrote:

> On Thu, Jul 16, 2009 at 4:41 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Rick Gigger wrote:
>>> If you use an rsync like algorithm for doing the base backups
>>> wouldn't that increase the size of the database for which it would
>>> still be practical to just re-sync? Couldn't you in fact sync a very
>>> large database if the amount of actual change in the files was a
>>> small percentage of the total size?
>>
>> It would certainly help to reduce the network traffic, though you'd
>> still have to scan all the data to see what has changed.
>
> The fundamental problem with pushing users to start over with a new
> base backup is that there's no relationship between the size of the
> WAL and the size of the database.
>
> You can plausibly have a system with extremely high transaction rate
> generating WAL very quickly, but where the whole database fits in a
> few hundred megabytes. In that case you could be behind by only a few
> minutes and have it be faster to take a new base backup.
>
> Or you could have a petabyte database which is rarely updated. In
> which case it might be faster to apply weeks' worth of logs than to
> try to take a base backup.
>
> Only the sysadmin is actually going to know which makes more sense.
> Unless we start tying WAL parameters to the database size or
> something like that.

Once again, wouldn't an rsync-like algorithm help here? Couldn't you
have the default be to just create a new base backup for the user, but
then allow them to specify an existing base backup if they've already
got one?
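
To make the idea concrete, here is a rough sketch of the kind of
comparison I have in mind. It is a simplified fixed-offset block
comparison, not the real rsync rolling-checksum algorithm, and none of
it is existing PostgreSQL code: the function names and the choice of
SHA-1 are assumptions made for illustration, and the 8 kB block size
is only assumed to line up with the default page size. As Heikki
points out, both sides still have to read every block to compute the
digests; only the blocks reported as changed would need to cross the
network.

```python
import hashlib

# Assumption: blocks line up with PostgreSQL's default 8 kB page size.
BLOCK_SIZE = 8192


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-1 digest per fixed-size block of `data`."""
    return [
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indexes of blocks in `new` that differ from `old` or lie past its end."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i
        for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or old_digests[i] != digest
    ]


if __name__ == "__main__":
    # Hypothetical data: an existing four-block backup with one block modified since.
    existing_backup = bytes(BLOCK_SIZE * 4)
    current = bytearray(existing_backup)
    current[BLOCK_SIZE * 2 + 5] = 1
    print(changed_blocks(existing_backup, bytes(current)))
```

In a real setup the existing base backup would supply the old digests
and the primary would supply the new ones, so only the digest lists
and the changed blocks would be transferred.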
