Re: [RFC] Incremental backup v2: add backup profile to base backup

Lists:	pgsql-hackers

From:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	[RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-03 15:31:45
Message-ID:	542EC161.7030603@2ndquadrant.it

Hi Hackers,

I've updated the wiki page
https://wiki.postgresql.org/wiki/Incremental_backup following the result
of discussion on hackers.

Compared to first version, we switched from a timestamp+checksum based
approach to one based on LSN.

This patch adds an option to pg_basebackup and to replication protocol
BASE_BACKUP command to generate a backup_profile file. It is almost
useless by itself, but it is the foundation on which we will build the
file based incremental backup (and hopefully a block based incremental
backup after it).

Any comment will be appreciated. In particular I'd appreciate comments
on correctness of relnode files detection and LSN extraction code.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

Attachment	Content-Type	Size
backup_profile_v2.patch.gz	application/x-gzip	7.1 KB

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-03 15:53:48
Message-ID:	542EC68C.9090606@vmware.com

On 10/03/2014 06:31 PM, Marco Nenciarini wrote:
> Hi Hackers,
>
> I've updated the wiki page
> https://wiki.postgresql.org/wiki/Incremental_backup following the result
> of discussion on hackers.
>
> Compared to first version, we switched from a timestamp+checksum based
> approach to one based on LSN.
>
> This patch adds an option to pg_basebackup and to replication protocol
> BASE_BACKUP command to generate a backup_profile file. It is almost
> useless by itself, but it is the foundation on which we will build the
> file based incremental backup (and hopefully a block based incremental
> backup after it).

I'd suggest jumping straight to block-based incremental backup. It's not
significantly more complicated to implement, and if you implement both
separately, then we'll have to support both forever. If you really need
to, you can implement file-level diff as a special case, where the
server sends all blocks in the file, if any of them have an LSN > the
cutoff point. But I'm not sure if there's a point in that, once you have
block-level support.

If we're going to need a profile file - and I'm not convinced of that -
is there any reason to not always include it in the backup?

> Any comment will be appreciated. In particular I'd appreciate comments
> on correctness of relnode files detection and LSN extraction code.

I didn't look at it in detail, but one future problem comes to mind:
Once you implement the server-side code that only sends a file if its
LSN is higher than the cutoff point that the client gave, you'll have to
scan the whole file first, to see if there are any blocks with a higher
LSN. At least until you find the first such block. So with a file-level
implementation of this sort, you'll have to scan all files twice, in the
worst case.
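
A minimal sketch of the early-exit scan described above, not code from the patch. It assumes the file contents are already in memory, and that each page starts with its LSN stored as two 32-bit halves (high, then low), which is how pd_lsn sits at the start of the page header. The names page_lsn and file_changed_since are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192				/* default PostgreSQL block size */

/* Read the LSN from the first 8 bytes of a page (high half, then low half). */
static uint64_t
page_lsn(const unsigned char *page)
{
	uint32_t	xlogid;
	uint32_t	xrecoff;

	memcpy(&xlogid, page, sizeof(xlogid));
	memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));
	return ((uint64_t) xlogid << 32) | xrecoff;
}

/*
 * Hypothetical helper: return true as soon as one block has an LSN above
 * the cutoff. In the worst case (no such block) every page is visited.
 */
static bool
file_changed_since(const unsigned char *data, size_t len, uint64_t cutoff)
{
	size_t		off;

	for (off = 0; off + BLCKSZ <= len; off += BLCKSZ)
	{
		if (page_lsn(data + off) > cutoff)
			return true;		/* stop at the first newer block */
	}
	return false;
}
```

If the function returns true, the file still has to be read again from the start to send it, which is the double scan mentioned above.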

- Heikki

From:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-03 16:08:47
Message-ID:	542ECA0F.3030507@2ndquadrant.it

On 03/10/14 17:53, Heikki Linnakangas wrote:
> If we're going to need a profile file - and I'm not convinced of that -
> is there any reason to not always include it in the backup?
>

The main reason is to have a centralized list of files that need to be
present. Without a profile, you have to insert some sort of placeholder
for skipped files. Moreover, the profile allows you to quickly know the
size of the recovered backup (by simply summing the individual sizes).
Another use could be to 'validate' the presence of all required files in
a backup.
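
A rough sketch of those two uses. The ProfileEntry struct and both function names are hypothetical stand-ins; the real backup_profile format is defined by the patch and the wiki page, not here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one profile line: a path and its size. */
typedef struct
{
	const char *path;
	uint64_t	size;			/* size of the file once recovered, in bytes */
} ProfileEntry;

/* Size of the recovered backup: just the sum of the listed sizes. */
static uint64_t
profile_total_size(const ProfileEntry *entries, size_t n)
{
	uint64_t	total = 0;
	size_t		i;

	for (i = 0; i < n; i++)
		total += entries[i].size;
	return total;
}

/* Count profile entries that are absent from the list of files we hold. */
static size_t
profile_count_missing(const ProfileEntry *entries, size_t n,
					  const char *const *present, size_t npresent)
{
	size_t		missing = 0;
	size_t		i;
	size_t		j;

	for (i = 0; i < n; i++)
	{
		for (j = 0; j < npresent; j++)
		{
			if (strcmp(entries[i].path, present[j]) == 0)
				break;
		}
		if (j == npresent)
			missing++;			/* listed in the profile, not found locally */
	}
	return missing;
}
```

A result of zero from profile_count_missing would mean every file named in the profile is present in the backup.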

>> Any comment will be appreciated. In particular I'd appreciate comments
>> on correctness of relnode files detection and LSN extraction code.
>
> I didn't look at it in detail, but one future problem comes to mind:
> Once you implement the server-side code that only sends a file if its
> LSN is higher than the cutoff point that the client gave, you'll have to
> scan the whole file first, to see if there are any blocks with a higher
> LSN. At least until you find the first such block. So with a file-level
> implementation of this sort, you'll have to scan all files twice, in the
> worst case.
>

It's true. To solve this you have to keep a central maxLSN directory,
but I think it introduces more issues than it solves.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-03 16:23:09
Message-ID:	CAGTBQpYixGSqwxZspG_PzSbFTWOgJyar+hhQK+70tMEw0ZLo7Q@mail.gmail.com

On Fri, Oct 3, 2014 at 1:08 PM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>>> Any comment will be appreciated. In particular I'd appreciate comments
>>> on correctness of relnode files detection and LSN extraction code.
>>
>> I didn't look at it in detail, but one future problem comes to mind:
>> Once you implement the server-side code that only sends a file if its
>> LSN is higher than the cutoff point that the client gave, you'll have to
>> scan the whole file first, to see if there are any blocks with a higher
>> LSN. At least until you find the first such block. So with a file-level
>> implementation of this sort, you'll have to scan all files twice, in the
>> worst case.
>>
>
> It's true. To solve this you have to keep a central maxLSN directory,
> but I think it introduces more issues than it solves.

I see that as a worthy optimization on the server side, regardless of
whether file or block-level backups are used, since it allows
efficient skipping of untouched segments (common for append-only
tables).

Still, it would be something to do after it already works (i.e., it's an
optimization).

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-03 18:19:20
Message-ID:	20141003181920.GH14522@momjian.us

On Fri, Oct 3, 2014 at 06:08:47PM +0200, Marco Nenciarini wrote:
> >> Any comment will be appreciated. In particular I'd appreciate comments
> >> on correctness of relnode files detection and LSN extraction code.
> >
> > I didn't look at it in detail, but one future problem comes to mind:
> > Once you implement the server-side code that only sends a file if its
> > LSN is higher than the cutoff point that the client gave, you'll have to
> > scan the whole file first, to see if there are any blocks with a higher
> > LSN. At least until you find the first such block. So with a file-level
> > implementation of this sort, you'll have to scan all files twice, in the
> > worst case.
> >
>
> It's true. To solve this you have to keep a central maxLSN directory,
> but I think it introduces more issues than it solves.

The central issue Heikki is pointing out is whether we should implement
a file-based system if we already know that a block-based system will be
superior in every way. I agree with that and agree that implementing
just file-based isn't worth it as we would have to support it forever.

So, in summary, if you target just a file-based system, be prepared that
it might be rejected.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-03 20:47:38
Message-ID:	CA+TgmoY+=GCkMVaUXLsB756C+VXyL_6u+Smcu3aLCOxTMnQtQQ@mail.gmail.com

On Fri, Oct 3, 2014 at 12:08 PM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
> On 03/10/14 17:53, Heikki Linnakangas wrote:
>> If we're going to need a profile file - and I'm not convinced of that -
>> is there any reason to not always include it in the backup?
>
> The main reason is to have a centralized list of files that need to be
> present. Without a profile, you have to insert some sort of placeholder
> for skipped files.

Why do you need to do that? And where do you need to do that?

It seems to me that there are three interesting operations:

1. Take a full backup. Basically, we already have this. In the
backup label file, make sure to note the newest LSN guaranteed to be
present in the backup.

2. Take a differential backup. In the backup label file, note the LSN
of the full backup to which the differential backup is relative, and the
newest LSN guaranteed to be present in the differential backup. The
actual backup can consist of a series of 20-byte buffer tags, those
being the exact set of blocks newer than the base-backup's
latest-guaranteed-to-be-present LSN. Each buffer tag is followed by
an 8kB block of data. If a relfilenode is truncated or removed, you
need some way to indicate that in the backup; e.g. include a buffertag
with forknum = -(forknum + 1) and blocknum = the new number of blocks,
or InvalidBlockNumber if removed entirely.

3. Apply a differential backup to a full backup to create an updated
full backup. This is just a matter of scanning the full backup and
the differential backup and applying the changes in the differential
backup to the full backup.
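
To make the record format in step 2 concrete, here is a hedged sketch. The struct names are hypothetical mirrors of the server's buffer tag (three OIDs, a fork number, a block number), and the marker helpers simply apply the forknum = -(forknum + 1) encoding proposed above; none of this is from an actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of a relfilenode: tablespace, database, relation. */
typedef struct
{
	uint32_t	spcNode;
	uint32_t	dbNode;
	uint32_t	relNode;
} RelFileNodeSketch;

/* Hypothetical mirror of a buffer tag: 12 + 4 + 4 = 20 bytes. */
typedef struct
{
	RelFileNodeSketch rnode;
	int32_t		forkNum;
	uint32_t	blockNum;
} BufferTagSketch;

#define INVALID_BLOCK_NUMBER 0xFFFFFFFFu	/* value of InvalidBlockNumber */

/* Build a "truncated to new_nblocks" (or "removed") marker for a fork. */
static BufferTagSketch
make_truncation_marker(RelFileNodeSketch rnode, int32_t forknum,
					   uint32_t new_nblocks)
{
	BufferTagSketch tag;

	tag.rnode = rnode;
	tag.forkNum = -(forknum + 1);	/* negative fork number flags a marker */
	tag.blockNum = new_nblocks;		/* INVALID_BLOCK_NUMBER means removed */
	return tag;
}

/* A negative fork number means the tag carries no block data. */
static bool
is_truncation_marker(const BufferTagSketch *tag)
{
	return tag->forkNum < 0;
}

/* Recover the real fork number from a marker. */
static int32_t
marker_forknum(const BufferTagSketch *tag)
{
	return -(tag->forkNum) - 1;
}
```

A reader applying the differential backup would check is_truncation_marker first: ordinary tags are followed by an 8kB block, markers are not.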

You might want combinations of these, like something that does 2+3 as
a single operation, for efficiency, or a way to copy a full backup and
apply a differential backup to it as you go. But that's it, right?
What else do you need?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-03 21:12:25
Message-ID:	20141003211225.GU7158@awork2.anarazel.de

On 2014-10-03 17:31:45 +0200, Marco Nenciarini wrote:
> I've updated the wiki page
> https://wiki.postgresql.org/wiki/Incremental_backup following the result
> of discussion on hackers.
>
> Compared to first version, we switched from a timestamp+checksum based
> approach to one based on LSN.
>
> This patch adds an option to pg_basebackup and to replication protocol
> BASE_BACKUP command to generate a backup_profile file. It is almost
> useless by itself, but it is the foundation on which we will build the
> file based incremental backup (and hopefully a block based incremental
> backup after it).
>
> Any comment will be appreciated. In particular I'd appreciate comments
> on correctness of relnode files detection and LSN extraction code.

Can you describe the algorithm you implemented in words?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-04 06:35:37
Message-ID:	CAB7nPqTpddzF0S5VxBGz0LTfu4KOes7QXrb=vSfrhO6LNcMy2A@mail.gmail.com

On Sat, Oct 4, 2014 at 12:31 AM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
> Compared to first version, we switched from a timestamp+checksum based
> approach to one based on LSN.
Cool.

> This patch adds an option to pg_basebackup and to replication protocol
> BASE_BACKUP command to generate a backup_profile file. It is almost
> useless by itself, but it is the foundation on which we will build the
> file based incremental backup (and hopefully a block based incremental
> backup after it).
Hm. I am not convinced by the backup profile file. What's wrong with
having a client send only an LSN position to get a set of files (or
partial files filled with blocks) newer than the position given, and
have the client do all the rebuild analysis?

> Any comment will be appreciated. In particular I'd appreciate comments
> on correctness of relnode files detection and LSN extraction code.
Please include some documentation with the patch once you consider
that this is worth adding to a commit fest. This is clearly WIP yet so
it does not matter much, but that's something not to forget.

Regards,
--
Michael

From:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 11:30:10
Message-ID:	54327D42.9020504@2ndquadrant.it

On 03/10/14 23:12, Andres Freund wrote:
> On 2014-10-03 17:31:45 +0200, Marco Nenciarini wrote:
>> I've updated the wiki page
>> https://wiki.postgresql.org/wiki/Incremental_backup following the result
>> of discussion on hackers.
>>
>> Compared to first version, we switched from a timestamp+checksum based
>> approach to one based on LSN.
>>
>> This patch adds an option to pg_basebackup and to replication protocol
>> BASE_BACKUP command to generate a backup_profile file. It is almost
>> useless by itself, but it is the foundation on which we will build the
>> file based incremental backup (and hopefully a block based incremental
>> backup after it).
>>
>> Any comment will be appreciated. In particular I'd appreciate comments
>> on correctness of relnode files detection and LSN extraction code.
>
> Can you describe the algorithm you implemented in words?
>

Here is the relnode file detection algorithm:

I've added a has_relfiles parameter to the sendDir function. If
has_relfiles is true every file in the directory is tested against the
validateRelfilenodeName function. If the response is true, the maxLSN
value is computed for the file.

The sendDir function is called with has_relfiles=true by the sendTablespace
function, and by sendDir itself when it recurses into a subdirectory:

* if has_relfiles is true
* if we are recursing into a "./global" or "./base" directory

The validateRelfilenodeName function has been taken from the
pg_computemaxlsn patch.

It's short enough to be pasted here:

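/*
 * Return true if name looks like a relation file name: digits,
 * optionally followed by "_<fork>" and/or ".<segment number>".
 */
static bool
validateRelfilenodename(char *name)
{
    int         pos = 0;

    /* relfilenode: leading digits */
    while ((name[pos] >= '0') && (name[pos] <= '9'))
        pos++;

    /* optional fork suffix, e.g. "_fsm" or "_vm" */
    if (name[pos] == '_')
    {
        pos++;
        while ((name[pos] >= 'a') && (name[pos] <= 'z'))
            pos++;
    }

    /* optional segment number, e.g. ".1" */
    if (name[pos] == '.')
    {
        pos++;
        while ((name[pos] >= '0') && (name[pos] <= '9'))
            pos++;
    }

    /* valid only if the whole name has been consumed */
    if (name[pos] == 0)
        return true;
    return false;
}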
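
Since the request was for comments on the correctness of the detection code, the following sketch may help. It reproduces the function as pasted (with a const parameter) and adds a stricter variant. The variant, validateRelfilenodenameStrict, is hypothetical and not part of any patch; it only illustrates that the pasted version accepts an empty name, or a name with no leading digits, because each of the three parts may match zero characters.

```c
#include <stdbool.h>

/* Same logic as the function pasted above. */
static bool
validateRelfilenodename(const char *name)
{
	int			pos = 0;

	while (name[pos] >= '0' && name[pos] <= '9')
		pos++;
	if (name[pos] == '_')
	{
		pos++;
		while (name[pos] >= 'a' && name[pos] <= 'z')
			pos++;
	}
	if (name[pos] == '.')
	{
		pos++;
		while (name[pos] >= '0' && name[pos] <= '9')
			pos++;
	}
	return name[pos] == '\0';
}

/* Hypothetical stricter variant: every part present must be non-empty. */
static bool
validateRelfilenodenameStrict(const char *name)
{
	int			pos = 0;
	int			start;

	start = pos;
	while (name[pos] >= '0' && name[pos] <= '9')
		pos++;
	if (pos == start)
		return false;			/* no relfilenode digits at all */
	if (name[pos] == '_')
	{
		start = ++pos;
		while (name[pos] >= 'a' && name[pos] <= 'z')
			pos++;
		if (pos == start)
			return false;		/* "_" with no fork name */
	}
	if (name[pos] == '.')
	{
		start = ++pos;
		while (name[pos] >= '0' && name[pos] <= '9')
			pos++;
		if (pos == start)
			return false;		/* "." with no segment number */
	}
	return name[pos] == '\0';
}
```

Both versions accept names such as "16384", "16384_fsm" and "16384_vm.2" and reject "pg_control"; they differ on inputs such as "", "_fsm" and "16384.", which only the strict variant rejects.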

To compute the maxLSN for a file, I rely on the fact that the file is sent
in TAR_SEND_SIZE chunks (32kB), a size that is always a multiple of the
block size. I've added the following code inside the send cycle:

+ char *page;
+
+ /* Scan every page to find the max file LSN */
+ for (page = buf; page < buf + (off_t) cnt; page += (off_t) BLCKSZ) {
+ pagelsn = PageGetLSN(page);
+ if (filemaxlsn < pagelsn)
+ filemaxlsn = pagelsn;
+ }
+
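
For readers without the patch at hand, a self-contained sketch of the same loop follows. It is an approximation under stated assumptions: page_lsn stands in for PageGetLSN and reads the LSN from the first 8 bytes of the page (high 32 bits, then low 32 bits), and chunk_max_lsn is a hypothetical name for the loop body above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ			8192	/* default block size */
#define TAR_SEND_SIZE	32768	/* chunk size mentioned above */

/* Stand-in for PageGetLSN(): the LSN is the first field of the page header. */
static uint64_t
page_lsn(const unsigned char *page)
{
	uint32_t	xlogid;
	uint32_t	xrecoff;

	memcpy(&xlogid, page, sizeof(xlogid));
	memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));
	return ((uint64_t) xlogid << 32) | xrecoff;
}

/*
 * Fold one chunk into the running maximum for the file. Only whole pages
 * are examined, so a short trailing fragment is ignored.
 */
static uint64_t
chunk_max_lsn(const unsigned char *buf, size_t cnt, uint64_t filemaxlsn)
{
	size_t		off;

	for (off = 0; off + BLCKSZ <= cnt; off += BLCKSZ)
	{
		uint64_t	pagelsn = page_lsn(buf + off);

		if (filemaxlsn < pagelsn)
			filemaxlsn = pagelsn;
	}
	return filemaxlsn;
}
```

The caller would start with filemaxlsn = 0 and pass the result of each call into the next one, so that the value after the last chunk is the maximum LSN of the whole file.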

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

From:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 12:59:42
Message-ID:	5432923E.8080603@2ndquadrant.it

On 04/10/14 08:35, Michael Paquier wrote:
> On Sat, Oct 4, 2014 at 12:31 AM, Marco Nenciarini
> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>> Compared to first version, we switched from a timestamp+checksum based
>> approach to one based on LSN.
> Cool.
>
>> This patch adds an option to pg_basebackup and to replication protocol
>> BASE_BACKUP command to generate a backup_profile file. It is almost
>> useless by itself, but it is the foundation on which we will build the
>> file based incremental backup (and hopefully a block based incremental
>> backup after it).
> Hm. I am not convinced by the backup profile file. What's wrong with
> having a client send only an LSN position to get a set of files (or
> partial files filled with blocks) newer than the position given, and
> have the client do all the rebuild analysis?
>

The main problem I see is the following: how can a client detect a
truncated or removed file?

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 14:51:28
Message-ID:	CA+TgmoafN8M1MT025t_Z2tJQAwMjrbkOPxCd9gxtoPxBwbMtSg@mail.gmail.com

On Mon, Oct 6, 2014 at 8:59 AM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
> On 04/10/14 08:35, Michael Paquier wrote:
>> On Sat, Oct 4, 2014 at 12:31 AM, Marco Nenciarini
>> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>>> Compared to first version, we switched from a timestamp+checksum based
>>> approach to one based on LSN.
>> Cool.
>>
>>> This patch adds an option to pg_basebackup and to replication protocol
>>> BASE_BACKUP command to generate a backup_profile file. It is almost
>>> useless by itself, but it is the foundation on which we will build the
>>> file based incremental backup (and hopefully a block based incremental
>>> backup after it).
>> Hm. I am not convinced by the backup profile file. What's wrong with
>> having a client send only an LSN position to get a set of files (or
>> partial files filled with blocks) newer than the position given, and
>> have the client do all the rebuild analysis?
>>
>
> The main problem I see is the following: how can a client detect a
> truncated or removed file?

When you take a differential backup, the server needs to send some
piece of information about every file so that the client can compare
that list against what it already has. But a full backup does not
need to include similar information.
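
As a sketch of that comparison: assume the server's list carries a path and a length in blocks for each file (the exact contents are not settled in this thread). The FileListEntry struct, the FileFate enum and classify_local_file are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry in the list sent with a differential backup. */
typedef struct
{
	const char *path;
	uint32_t	nblocks;		/* current length on the server, in blocks */
} FileListEntry;

typedef enum
{
	FILE_SAME_LENGTH,
	FILE_TRUNCATED,
	FILE_EXTENDED,
	FILE_REMOVED
} FileFate;

/*
 * Decide what happened to a file the client holds from the earlier backup,
 * given the server's current list.
 */
static FileFate
classify_local_file(const FileListEntry *server, size_t nserver,
					const char *path, uint32_t local_nblocks)
{
	size_t		i;

	for (i = 0; i < nserver; i++)
	{
		if (strcmp(server[i].path, path) == 0)
		{
			if (server[i].nblocks < local_nblocks)
				return FILE_TRUNCATED;
			if (server[i].nblocks > local_nblocks)
				return FILE_EXTENDED;
			return FILE_SAME_LENGTH;
		}
	}
	return FILE_REMOVED;		/* not in the server's list any more */
}
```

The linear search keeps the sketch short; a client with many files would more likely sort both lists or use a hash table.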

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 15:33:40
Message-ID:	5432B654.6000003@2ndquadrant.it

On 03/10/14 22:47, Robert Haas wrote:
> On Fri, Oct 3, 2014 at 12:08 PM, Marco Nenciarini
> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>> On 03/10/14 17:53, Heikki Linnakangas wrote:
>>> If we're going to need a profile file - and I'm not convinced of that -
>>> is there any reason to not always include it in the backup?
>>
>> The main reason is to have a centralized list of files that need to be
>> present. Without a profile, you have to insert some sort of placeholder
>> for skipped files.
>
> Why do you need to do that? And where do you need to do that?
>
> It seems to me that there are three interesting operations:
>
> 1. Take a full backup. Basically, we already have this. In the
> backup label file, make sure to note the newest LSN guaranteed to be
> present in the backup.

Don't we already have it in "START WAL LOCATION"?

>
> 2. Take a differential backup. In the backup label file, note the LSN
> of the full backup to which the differential backup is relative, and the
> newest LSN guaranteed to be present in the differential backup. The
> actual backup can consist of a series of 20-byte buffer tags, those
> being the exact set of blocks newer than the base-backup's
> latest-guaranteed-to-be-present LSN. Each buffer tag is followed by
> an 8kB block of data. If a relfilenode is truncated or removed, you
> need some way to indicate that in the backup; e.g. include a buffertag
> with forknum = -(forknum + 1) and blocknum = the new number of blocks,
> or InvalidBlockNumber if removed entirely.

To have a working backup you need to ship each block which is newer than
latest-guaranteed-to-be-present in full backup and not newer than
latest-guaranteed-to-be-present in the current backup. Also, as a
further optimization, you can think about not sending the empty space in
the middle of each page.

My main concern here is about how postgres can remember that a
relfilenode has been deleted, in order to send the appropriate "deletion
tag".

IMHO the easiest way is to send the full list of files along with the backup
and leave to the client the task of deleting unneeded files. The backup
profile has this purpose.

Moreover, I do not like the idea of using only a stream of blocks as the
actual differential backup, for the following reasons:

* AFAIK, with the current infrastructure, you cannot do a backup with a
block stream only. To have a valid backup you need many files for which
the concept of LSN doesn't apply.

* I don't like to have all the data from the various
tablespace/db/whatever all mixed in the same stream. I'd prefer to have
the blocks saved on a per file basis.

>
> 3. Apply a differential backup to a full backup to create an updated
> full backup. This is just a matter of scanning the full backup and
> the differential backup and applying the changes in the differential
> backup to the full backup.
>
> You might want combinations of these, like something that does 2+3 as
> a single operation, for efficiency, or a way to copy a full backup and
> apply a differential backup to it as you go. But that's it, right?
> What else do you need?
>

Nothing else. Once we agree on the definition of the files involved and
the protocol formats, only the actual coding remains.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 15:50:07
Message-ID:	CA+TgmoYdG1JvymERkGozpfazJBHTNbxSAvWMHGmK7dRioP8bAQ@mail.gmail.com

On Mon, Oct 6, 2014 at 11:33 AM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>> 1. Take a full backup. Basically, we already have this. In the
>> backup label file, make sure to note the newest LSN guaranteed to be
>> present in the backup.
>
> Don't we already have it in "START WAL LOCATION"?

Yeah, probably. I was too lazy to go look for it, but that sounds
like the right thing.

>> 2. Take a differential backup. In the backup label file, note the LSN
>> of the full backup to which the differential backup is relative, and the
>> newest LSN guaranteed to be present in the differential backup. The
>> actual backup can consist of a series of 20-byte buffer tags, those
>> being the exact set of blocks newer than the base-backup's
>> latest-guaranteed-to-be-present LSN. Each buffer tag is followed by
>> an 8kB block of data. If a relfilenode is truncated or removed, you
>> need some way to indicate that in the backup; e.g. include a buffertag
>> with forknum = -(forknum + 1) and blocknum = the new number of blocks,
>> or InvalidBlockNumber if removed entirely.
>
> To have a working backup you need to ship each block which is newer than
> latest-guaranteed-to-be-present in full backup and not newer than
> latest-guaranteed-to-be-present in the current backup. Also, as a
> further optimization, you can think about not sending the empty space in
> the middle of each page.

Right. Or compressing the data.

> My main concern here is about how postgres can remember that a
> relfilenode has been deleted, in order to send the appropriate "deletion
> tag".

You also need to handle truncation.

> IMHO the easiest way is to send the full list of files along with the backup
> and leave to the client the task of deleting unneeded files. The backup
> profile has this purpose.
>
> Moreover, I do not like the idea of using only a stream of blocks as the
> actual differential backup, for the following reasons:
>
> * AFAIK, with the current infrastructure, you cannot do a backup with a
> block stream only. To have a valid backup you need many files for which
> the concept of LSN doesn't apply.
>
> * I don't like to have all the data from the various
> tablespace/db/whatever all mixed in the same stream. I'd prefer to have
> the blocks saved on a per file basis.

OK, that makes sense. But you still only need the file list when
sending a differential backup, not when sending a full backup. So
maybe a differential backup looks like this:

- Ship a table-of-contents file with a list of relation files currently
present and the length of each in blocks.
- For each block that's been modified since the original backup, ship
a file called delta_<original file name> which is of the form <block
number><changed block contents> [...].
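
A hedged sketch of how a client might apply one such delta file. The on-disk encoding is an assumption: each record is taken to be a 4-byte block number in the host's byte order followed by one full block, and the file being patched is held in memory. The apply_delta name and its return convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192				/* default block size */

/*
 * Apply delta records of the assumed form <block number><block contents>
 * to a file image that is nblocks long (the length would come from the
 * table-of-contents file). Returns the number of blocks applied, or -1 if
 * the delta is malformed or refers to a block past the end of the file.
 */
static long
apply_delta(unsigned char *file, uint32_t nblocks,
			const unsigned char *delta, size_t delta_len)
{
	const size_t reclen = sizeof(uint32_t) + BLCKSZ;
	long		applied = 0;
	size_t		off;

	if (delta_len % reclen != 0)
		return -1;				/* trailing partial record */

	for (off = 0; off < delta_len; off += reclen)
	{
		uint32_t	blkno;

		memcpy(&blkno, delta + off, sizeof(blkno));
		if (blkno >= nblocks)
			return -1;			/* block outside the listed length */

		/* overwrite the old block with the changed contents */
		memcpy(file + (size_t) blkno * BLCKSZ,
			   delta + off + sizeof(blkno), BLCKSZ);
		applied++;
	}
	return applied;
}
```

Before calling this, the client would resize its copy of the file to the length given in the table of contents, which is how truncation and extension would be handled in this scheme.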

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 15:51:11
Message-ID:	5432BA6F.7070301@2ndquadrant.it

On 06/10/14 16:51, Robert Haas wrote:
> On Mon, Oct 6, 2014 at 8:59 AM, Marco Nenciarini
> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>> On 04/10/14 08:35, Michael Paquier wrote:
>>> On Sat, Oct 4, 2014 at 12:31 AM, Marco Nenciarini
>>> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>>>> Compared to first version, we switched from a timestamp+checksum based
>>>> approach to one based on LSN.
>>> Cool.
>>>
>>>> This patch adds an option to pg_basebackup and to replication protocol
>>>> BASE_BACKUP command to generate a backup_profile file. It is almost
>>>> useless by itself, but it is the foundation on which we will build the
>>>> file based incremental backup (and hopefully a block based incremental
>>>> backup after it).
>>> Hm. I am not convinced by the backup profile file. What's wrong with
>>> having a client send only an LSN position to get a set of files (or
>>> partial files filled with blocks) newer than the position given, and
>>> have the client do all the rebuild analysis?
>>>
>>
>> The main problem I see is the following: how can a client detect a
>> truncated or removed file?
>
> When you take a differential backup, the server needs to send some
> piece of information about every file so that the client can compare
> that list against what it already has. But a full backup does not
> need to include similar information.
>

I agree that a full backup does not need to include a profile.

I've added the option to require the profile even for a full backup, as
it can be useful for backup software. We could remove the option and
build the profile only during incremental backups, if required. However,
I would like to avoid having to scan the whole backup to know the size of
the recovered data directory, hence the backup profile.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 15:55:30
Message-ID:	CA+TgmoaPXp=S5OSQKqZXZ52G0fx2Bx5qVKXnkpjdu_C+hBVodQ@mail.gmail.com

On Mon, Oct 6, 2014 at 11:51 AM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
> I agree that a full backup does not need to include a profile.
>
> I've added the option to require the profile even for a full backup, as
> it can be useful for backup software. We could remove the option and
> build the profile only during incremental backups, if required. However,
> I would like to avoid having to scan the whole backup to know the size of
> the recovered data directory, hence the backup profile.

That doesn't seem to be buying you much. Calling stat() on every file
in a directory tree is a pretty cheap operation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 16:00:40
Message-ID:	CAHNtfO4=mU3cQstDNTAG4rYGHw7cEdR23dfKJuSpwThp4UHS5g@mail.gmail.com

Hello,

2014-10-06 17:51 GMT+02:00 Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>:

> I agree that a full backup does not need to include a profile.
>
> I've added the option to require the profile even for a full backup, as
> it can be useful for backup softwares. We could remove the option and
> build the profile only during incremental backups, if required. However,
> I would avoid the needing to scan the whole backup to know the size of
> the recovered data directory, hence the backup profile.
>

I really like this approach.

I think we should leave users the ability to ship a profile file even in
case of full backup (by default disabled).

Thanks,
Gabriele

From:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 16:06:47
Message-ID:	5432BE17.1080009@2ndquadrant.it
Lists:	pgsql-hackers

Il 06/10/14 17:55, Robert Haas ha scritto:
> On Mon, Oct 6, 2014 at 11:51 AM, Marco Nenciarini
> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>> I agree that a full backup does not need to include a profile.
>>
>> I've added the option to require the profile even for a full backup, as
>> it can be useful for backup softwares. We could remove the option and
>> build the profile only during incremental backups, if required. However,
>> I would avoid the needing to scan the whole backup to know the size of
>> the recovered data directory, hence the backup profile.
>
> That doesn't seem to be buying you much. Calling stat() on every file
> in a directory tree is a pretty cheap operation.
>

In the case of an incremental backup that is not true. You have to read
the delta file to know the final size. You can optimize this by putting
the information in the first few bytes, but with the compressed tar
format you will need to scan the whole archive.
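
To illustrate the scan, here is a rough sketch using Python's standard
`tarfile` module. The function name `restored_size_from_tar` is
hypothetical, and the sketch only sums the sizes stored in the tar
headers; it does not model delta files:

```python
import tarfile

def restored_size_from_tar(archive_path):
    """Sum the sizes of regular files recorded in a (possibly compressed) tar."""
    total = 0
    # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
    with tarfile.open(archive_path, "r:*") as tar:
        # Member headers are interleaved with the file data, so reaching the
        # last header means decompressing the stream up to that point.
        for member in tar:
            if member.isreg():
                total += member.size
    return total
```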

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 16:06:59
Message-ID:	5432BE23.7060808@vmware.com
Lists:	pgsql-hackers

On 10/06/2014 06:33 PM, Marco Nenciarini wrote:
> Il 03/10/14 22:47, Robert Haas ha scritto:
>> 2. Take a differential backup. In the backup label file, note the LSN
>> of the fullback to which the differential backup is relative, and the
>> newest LSN guaranteed to be present in the differential backup. The
>> actual backup can consist of a series of 20-byte buffer tags, those
>> being the exact set of blocks newer than the base-backup's
>> latest-guaranteed-to-be-present LSN. Each buffer tag is followed by
>> an 8kB block of data. If a relfilenode is truncated or removed, you
>> need some way to indicate that in the backup; e.g. include a buffertag
>> with forknum = -(forknum + 1) and blocknum = the new number of blocks,
>> or InvalidBlockNumber if removed entirely.
>
> To have a working backup you need to ship each block which is newer than
> latest-guaranteed-to-be-present in full backup and not newer than
> latest-guaranteed-to-be-present in the current backup. Also, as a
> further optimization, you can think about not sending the empty space in
> the middle of each page.
>
> My main concern here is about how postgres can remember that a
> relfilenode has been deleted, in order to send the appropriate "deletion
> tag".
>
> IMHO the easiest way is to send the full list of files along the backup
> and let to the client the task to delete unneeded files. The backup
> profile has this purpose.

Right, but the server doesn't need to send a separate backup profile
file for that. Rather, anything that the server *didn't* send, should be
deleted.

I think the missing piece in this puzzle is that even for unmodified
blocks, the server should send a note saying the blocks were present,
but not modified. So for each file present in the server, the server
sends a block stream. For each block, it sends either the full block
contents, if it was modified, or a simple indicator that it was not
modified.
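
A rough sketch of that per-file stream, under stated assumptions: the
one-byte markers `U` and `M` are invented for illustration (no wire
format has been agreed in this thread), and a block counts as modified
when its page LSN is greater than the base backup's LSN:

```python
# Hypothetical one-byte markers; the thread does not define a wire format.
UNCHANGED = b"U"
MODIFIED = b"M"

def encode_block_stream(blocks, base_lsn):
    """blocks: iterable of (page_lsn, page_bytes) in block-number order.
    Yields one record per block: a marker, plus the page if it is newer
    than base_lsn."""
    for page_lsn, page in blocks:
        if page_lsn > base_lsn:
            yield MODIFIED + page
        else:
            yield UNCHANGED

def apply_block_stream(old_pages, records):
    """Rebuild the list of pages from the old backup plus the stream."""
    new_pages = []
    for blkno, rec in enumerate(records):
        if rec[:1] == MODIFIED:
            new_pages.append(rec[1:])
        else:
            new_pages.append(old_pages[blkno])
    return new_pages
```

Because every block present on the server produces a record, the number
of records also gives the new length of the file, so a truncation needs
no separate marker in this model.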

There's a downside to this, though. The client has to read the whole
stream, before it knows which files were present. So when applying a
block stream directly over an old backup, the client cannot delete files
until it has applied all the other changes. That needs more needs more
disk space. With a separate profile file that's sent *before* the rest
of the backup, you could delete the obsolete files first. But that's not
a very big deal. I would suggest that you leave out the profile file in
the first version, and add it as an optimization later, if needed.

> Moreover, I do not like the idea of using only a stream of block as the
> actual differential backup, for the following reasons:
>
> * AFAIK, with the current infrastructure, you cannot do a backup with a
> block stream only. To have a valid backup you need many files for which
> the concept of LSN doesn't apply.

Those should be sent in whole. At least in the first version. The
non-relation files are small compared to relation files, so it's not too
bad to just include them in full.

>> 3. Apply a differential backup to a full backup to create an updated
>> full backup. This is just a matter of scanning the full backup and
>> the differential backup and applying the changes in the differential
>> backup to the full backup.
>>
>> You might want combinations of these, like something that does 2+3 as
>> a single operation, for efficiency, or a way to copy a full backup and
>> apply a differential backup to it as you go. But that's it, right?
>> What else do you need?
>
> Nothing else. Once we agree on definition of involved files and
> protocols formats, only the actual coding remains.

BTW, regarding the protocol, I have an idea. Rather than invent a whole
new file format to represent the modified blocks, can we reuse some
existing binary diff file format? For example, the VCDIFF format (RFC
3284). For each unmodified block, the server would send a vcdiff COPY
instruction, to "copy" the block from the old backup, and for a modified
block, the server would send an ADD instruction, with the new block
contents. The VCDIFF file format is quite flexible, but we would only
use a small subset of it. I believe that subset would be just as easy to
generate in the backend as a custom file format, but you could then use
an external tool (xdelta3, open-vcdiff) to apply the diff manually, in
case of emergency. In essence, the server would send a tar stream as
usual, but for each relation file, it would send a VCDIFF file with name
"<relfilenode>.vcdiff" instead.

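The COPY/ADD idea can be modelled in a few lines. This is only an
in-memory sketch of the two instruction types, not the RFC 3284 byte
encoding; the tuple layout and the names `make_delta` and `apply_delta`
are assumptions made for illustration:

```python
def make_delta(block_states, block_size=8192):
    """block_states: list where each entry is None (block unchanged) or the
    new block contents as bytes. Returns a list of instructions."""
    delta = []
    for blkno, new_data in enumerate(block_states):
        if new_data is None:
            offset = blkno * block_size
            last = delta[-1] if delta else None
            # Merge adjacent unchanged blocks into one longer COPY.
            if last and last[0] == "COPY" and last[1] + last[2] == offset:
                delta[-1] = ("COPY", last[1], last[2] + block_size)
            else:
                delta.append(("COPY", offset, block_size))
        else:
            delta.append(("ADD", new_data))
    return delta

def apply_delta(source, delta):
    """Rebuild the new file contents from the old file (source) and the delta."""
    out = bytearray()
    for instr in delta:
        if instr[0] == "COPY":
            _, offset, length = instr
            out += source[offset:offset + length]
        else:
            out += instr[1]
    return bytes(out)
```

A long run of unmodified blocks collapses into a single COPY, so a
mostly unchanged relation file yields a very small delta.
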
- Heikki

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 16:17:37
Message-ID:	5432C0A1.3090203@vmware.com
Lists:	pgsql-hackers

On 10/06/2014 07:06 PM, Marco Nenciarini wrote:
> Il 06/10/14 17:55, Robert Haas ha scritto:
>> On Mon, Oct 6, 2014 at 11:51 AM, Marco Nenciarini
>> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>>> I agree that a full backup does not need to include a profile.
>>>
>>> I've added the option to require the profile even for a full backup, as
>>> it can be useful for backup softwares. We could remove the option and
>>> build the profile only during incremental backups, if required. However,
>>> I would avoid the needing to scan the whole backup to know the size of
>>> the recovered data directory, hence the backup profile.
>>
>> That doesn't seem to be buying you much. Calling stat() on every file
>> in a directory tree is a pretty cheap operation.
>>
>
> In case of incremental backup it is not true. You have to read the delta
> file to know the final size. You can optimize it putting this
> information in the first few bytes, but in case of compressed tar format
> you will need to scan the whole archive.

I think you're pretty much screwed with the compressed tar format
anyway. The files in the .tar can be in different order in the 'diff'
and the base backup, so you need to do random access anyway when you try
to apply the diff. And random access isn't very easy with uncompressed tar
format either. I think it would be acceptable to only support
incremental backups with the directory format.

In hindsight, our compressed tar format was not a very good choice,
because it makes random access impossible.

- Heikki

From:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 16:18:43
Message-ID:	5432C0E3.9000201@2ndquadrant.it
Lists:	pgsql-hackers

Il 06/10/14 17:50, Robert Haas ha scritto:
> On Mon, Oct 6, 2014 at 11:33 AM, Marco Nenciarini
> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>>> 2. Take a differential backup. In the backup label file, note the LSN
>>> of the fullback to which the differential backup is relative, and the
>>> newest LSN guaranteed to be present in the differential backup. The
>>> actual backup can consist of a series of 20-byte buffer tags, those
>>> being the exact set of blocks newer than the base-backup's
>>> latest-guaranteed-to-be-present LSN. Each buffer tag is followed by
>>> an 8kB block of data. If a relfilenode is truncated or removed, you
>>> need some way to indicate that in the backup; e.g. include a buffertag
>>> with forknum = -(forknum + 1) and blocknum = the new number of blocks,
>>> or InvalidBlockNumber if removed entirely.
>>
>> To have a working backup you need to ship each block which is newer than
>> latest-guaranteed-to-be-present in full backup and not newer than
>> latest-guaranteed-to-be-present in the current backup. Also, as a
>> further optimization, you can think about not sending the empty space in
>> the middle of each page.
>
> Right. Or compressing the data.

If we want to introduce compression on server side, I think that
compressing the whole tar stream would be more effective.

>
>> My main concern here is about how postgres can remember that a
>> relfilenode has been deleted, in order to send the appropriate "deletion
>> tag".
>
> You also need to handle truncation.

Yes, of course. The current backup profile contains the file size, and
it can be used to truncate the file to the right size.
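
As a sketch of how a client could use that information, assume the
profile has already been parsed into a dict mapping each relative file
path to its size in bytes. That dict and the function name
`apply_profile` are hypothetical; the real profile format is not
reproduced here:

```python
import os

def apply_profile(data_dir, profile):
    """profile: dict mapping relative file path -> expected size in bytes.
    Removes files absent from the profile and truncates oversized ones."""
    for dirpath, dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, data_dir)
            if rel not in profile:
                # The file no longer exists on the server.
                os.remove(path)
            elif os.path.getsize(path) > profile[rel]:
                # The file shrank on the server: cut it to the listed size.
                os.truncate(path, profile[rel])
```

The same pass handles both cases, deleted relfilenodes and truncated
ones, without any deletion tag in the block data.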

>> IMHO the easiest way is to send the full list of files along the backup
>> and let to the client the task to delete unneeded files. The backup
>> profile has this purpose.
>>
>> Moreover, I do not like the idea of using only a stream of block as the
>> actual differential backup, for the following reasons:
>>
>> * AFAIK, with the current infrastructure, you cannot do a backup with a
>> block stream only. To have a valid backup you need many files for which
>> the concept of LSN doesn't apply.
>>
>> * I don't like to have all the data from the various
>> tablespace/db/whatever all mixed in the same stream. I'd prefer to have
>> the blocks saved on a per file basis.
>
> OK, that makes sense. But you still only need the file list when
> sending a differential backup, not when sending a full backup. So
> maybe a differential backup looks like this:
>
> - Ship a table-of-contents file with a list relation files currently
> present and the length of each in blocks.

Having the size in bytes allows you to use the same format for non-block
files. Am I missing any advantage of having the size in blocks over
having the size in bytes?

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 16:19:03
Message-ID:	CA+Tgmoapc8mCwT7YzdY6Po6-RfHyqjS2VxxfCGja0868qu4aJA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Oct 6, 2014 at 12:06 PM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
> Il 06/10/14 17:55, Robert Haas ha scritto:
>> On Mon, Oct 6, 2014 at 11:51 AM, Marco Nenciarini
>> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>>> I agree that a full backup does not need to include a profile.
>>>
>>> I've added the option to require the profile even for a full backup, as
>>> it can be useful for backup softwares. We could remove the option and
>>> build the profile only during incremental backups, if required. However,
>>> I would avoid the needing to scan the whole backup to know the size of
>>> the recovered data directory, hence the backup profile.
>>
>> That doesn't seem to be buying you much. Calling stat() on every file
>> in a directory tree is a pretty cheap operation.
>>
>
> In case of incremental backup it is not true. You have to read the delta
> file to know the final size. You can optimize it putting this
> information in the first few bytes, but in case of compressed tar format
> you will need to scan the whole archive.

Well, sure. But I never objected to sending a profile in a
differential backup. I'm just objecting to sending one in a full
backup. At least not without a more compelling reason why we need it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 16:19:44
Message-ID:	CA+TgmoZGMJn-G8XsdscZgA+Ggr0Y=17K6n42BPcR8S+tZRMXtA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Oct 6, 2014 at 12:18 PM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>> - Ship a table-of-contents file with a list relation files currently
>> present and the length of each in blocks.
>
> Having the size in bytes allow you to use the same format for non-block
> files. Am I missing any advantage of having the size in blocks over
> having the size in bytes?

Size in bytes would be fine, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 16:24:32
Message-ID:	5432C240.1020207@vmware.com
Lists:	pgsql-hackers

On 10/06/2014 07:00 PM, Gabriele Bartolini wrote:
> Hello,
>
> 2014-10-06 17:51 GMT+02:00 Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>:
>
>> I agree that a full backup does not need to include a profile.
>>
>> I've added the option to require the profile even for a full backup, as
>> it can be useful for backup softwares. We could remove the option and
>> build the profile only during incremental backups, if required. However,
>> I would avoid the needing to scan the whole backup to know the size of
>> the recovered data directory, hence the backup profile.
>
> I really like this approach.
>
> I think we should leave users the ability to ship a profile file even in
> case of full backup (by default disabled).

I don't see the point of making the profile optional. Why burden the
user with that decision? I'm not convinced we need it at all, but if
we're going to have a profile file, it should always be included.

- Heikki

From:	David Fetter <david(at)fetter(dot)org>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [RFC] Incremental backup v2: add backup profile to base backup
Date:	2014-10-06 22:55:47
Message-ID:	20141006225547.GD18762@fetter.org
Lists:	pgsql-hackers

On Mon, Oct 06, 2014 at 07:24:32PM +0300, Heikki Linnakangas wrote:
> On 10/06/2014 07:00 PM, Gabriele Bartolini wrote:
> >Hello,
> >
> >2014-10-06 17:51 GMT+02:00 Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>:
> >
> >>I agree that a full backup does not need to include a profile.
> >>
> >>I've added the option to require the profile even for a full backup, as
> >>it can be useful for backup softwares. We could remove the option and
> >>build the profile only during incremental backups, if required. However,
> >>I would avoid the needing to scan the whole backup to know the size of
> >>the recovered data directory, hence the backup profile.
> >
> >I really like this approach.
> >
> >I think we should leave users the ability to ship a profile file even in
> >case of full backup (by default disabled).
>
> I don't see the point of making the profile optional. Why burden the user
> with that decision? I'm not convinced we need it at all, but if we're going
> to have a profile file, it should always be included.

+1 for fewer user decisions, especially with something light-weight in
resource consumption like the profile.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate