Re: [PATCH] Incremental backup: add backup profile to base backup

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-18 00:04:00
Message-ID: 53F142F0.1060500@2ndquadrant.it
Lists: pgsql-hackers

Hi Hackers,

This is the first piece of file-level incremental backup support, as
described on the wiki page https://wiki.postgresql.org/wiki/Incremental_backup

It is not yet complete, but I wish to share it on the list to receive
comments and suggestions.

The point of the patch is to add an option to pg_basebackup and to the
replication protocol BASE_BACKUP command to generate a backup_profile file.

When taking a full backup with pg_basebackup, the user can ask Postgres
to generate a backup_profile file through the --profile option
(-B as the short option, which I've arbitrarily picked because both -P
and -p were already taken).
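
Purely as an illustration (the exact option syntax is whatever the patch
implements; the target directory here is made up), a profile-generating
run would look something like:

pg_basebackup -D /srv/backups/base --profile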

At the moment the backup profile is a file with one line per file,
detailing modification time, md5 checksum, size, tablespace, and path
relative to the tablespace root (PGDATA or the tablespace directory).
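
Just to give an idea of the content (the field order and formatting shown
here are a mock-up, not necessarily what the patch emits), a profile line
carrying those fields might look like:

1408031641 0f343b0931126a20f133d67c2b018a3b 8192 pg_default base/12345/16384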

To calculate the md5 checksum I've used the md5 code present in pgcrypto
contrib as the code in src/include/libpq/md5.h is not suitable for large
files. Since a core feature cannot depend on a piece of contrib, I've
moved the files

contrib/pgcrypto/md5.c
contrib/pgcrypto/md5.h

to

src/backend/utils/hash/md5.c
src/include/utils/md5.h

changing the pgcrypto extension to use them.

There are still some TODOs:

* User documentation

* Remove the pg_basebackup code duplication I've introduced with the
ReceiveAndUnpackTarFileToDir function, which is almost the same as
ReceiveAndUnpackTarFile but does not expect to handle a tablespace; it
instead simply extracts a tar stream into a destination directory. The
latter could probably be rewritten using the former as a component, but it
needs some adjustment to the "progress reporting" part, which is not
present at the moment in ReceiveAndUnpackTarFileToDir.

* Add a header section to the backup_profile file, which at the moment
contains only the body. I'm thinking of changing the original design and
putting the whole backup label in the header, which is IMHO clearer and
better known. I would use something like:

START WAL LOCATION: 0/E000028 (file 00000001000000000000000E)
CHECKPOINT LOCATION: 0/E000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2014-08-14 18:54:01 CEST
LABEL: pg_basebackup base backup
END LABEL

I've attached the current patch based on master.

Any comment will be appreciated.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

Attachment: pg_basebackup_profile.patch (text/plain, 47.1 KB)

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-18 05:05:30
Message-ID: 20140818050528.GA8062@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Marco Nenciarini wrote:

> To calculate the md5 checksum I've used the md5 code present in pgcrypto
> contrib as the code in src/include/libpq/md5.h is not suitable for large
> files. Since a core feature cannot depend on a piece of contrib, I've
> moved the files
>
> contrib/pgcrypto/md5.c
> contrib/pgcrypto/md5.h
>
> to
>
> src/backend/utils/hash/md5.c
> src/include/utils/md5.h
>
> changing the pgcrypto extension to use them.

We already have the FNV checksum implementation in the backend -- can't
we use that one for this and avoid messing with MD5?

(I don't think we're looking for a cryptographic hash here. Am I wrong?)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-18 12:40:45
Message-ID: 53F1F44D.6010409@vmware.com
Lists: pgsql-hackers

On 08/18/2014 03:04 AM, Marco Nenciarini wrote:
> Hi Hackers,
>
> This is the first piece of file level incremental backup support, as
> described on wiki page https://wiki.postgresql.org/wiki/Incremental_backup
>
> It is not yet complete, but I wish to share it on the list to receive
> comments and suggestions.
>
> The point of the patch is adding an option to pg_basebackup and
> replication protocol BASE_BACKUP command to generate a backup_profile file.

This is difficult to review in isolation; we would have to see the whole
solution to judge whether this makes sense.

If we ever want to make this more granular than per-file (and I think we
should, because otherwise you might as well just use rsync), how would
this be extended?

- Heikki


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-18 13:05:07
Message-ID: 53F1FA03.40407@vmware.com
Lists: pgsql-hackers

On 08/18/2014 08:05 AM, Alvaro Herrera wrote:
> Marco Nenciarini wrote:
>
>> To calculate the md5 checksum I've used the md5 code present in pgcrypto
>> contrib as the code in src/include/libpq/md5.h is not suitable for large
>> files. Since a core feature cannot depend on a piece of contrib, I've
>> moved the files
>>
>> contrib/pgcrypto/md5.c
>> contrib/pgcrypto/md5.h
>>
>> to
>>
>> src/backend/utils/hash/md5.c
>> src/include/utils/md5.h
>>
>> changing the pgcrypto extension to use them.
>
> We already have the FNV checksum implementation in the backend -- can't
> we use that one for this and avoid messing with MD5?
>
> (I don't think we're looking for a cryptographic hash here. Am I wrong?)

Hmm. Any user that can update a table can craft such an update that its
checksum matches an older backup. That may seem like an onerous task; to
correctly calculate the checksum of a file in a previous backup, you need
to know the LSNs and the exact data, including deleted data, on every
block in the table, and then construct a suitable INSERT or UPDATE that
modifies the table such that you get a collision. But for some tables it
could be trivial; you might know that a table was bulk-loaded with a
particular LSN and there are no dead tuples. Or you can simply create
your own table and insert exactly the data you want. Messing with your
own table might seem harmless, but it'll e.g. let you construct a case
where an index points to a tuple that doesn't exist anymore, or there's
a row that doesn't pass a CHECK constraint that was added later. Even if
there's no direct security issue with that, you don't want that kind of
uncertainty from a backup solution.

But more to the point, I thought the consensus was to use the highest
LSN of all the blocks in the file, no? That's essentially free to
calculate (if you have to read all the data anyway), and isn't
vulnerable to collisions.
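
Just to make that concrete, here is a minimal standalone sketch (not part
of the patch, and only a sketch) that scans a relation segment file for
its highest page LSN. It assumes standard 8kB blocks whose first eight
bytes are the pd_lsn field, read in the server's native byte order:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
    unsigned char page[BLCKSZ];
    uint64_t    max_lsn = 0;
    FILE       *f;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <segment-file>\n", argv[0]);
        return 1;
    }
    if ((f = fopen(argv[1], "rb")) == NULL)
    {
        perror(argv[1]);
        return 1;
    }
    while (fread(page, 1, BLCKSZ, f) == BLCKSZ)
    {
        /* pd_lsn is the first field of the page header: two uint32s */
        uint32_t    xlogid;
        uint32_t    xrecoff;
        uint64_t    lsn;

        memcpy(&xlogid, page, 4);
        memcpy(&xrecoff, page + 4, 4);
        lsn = ((uint64_t) xlogid << 32) | xrecoff;
        if (lsn > max_lsn)
            max_lsn = lsn;
    }
    fclose(f);
    printf("highest LSN: %X/%X\n",
           (unsigned int) (max_lsn >> 32), (unsigned int) max_lsn);
    return 0;
}

Since the base backup has to stream every block anyway, keeping a running
maximum like this adds essentially nothing to the cost of the read itself.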

- Heikki


From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-18 13:13:33
Message-ID: CAO_YK0UcnV8oUNg7zKnEFf21K0F0+R58vfLsNg1E+A9K1YdO4w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 18, 2014 at 10:05 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 08/18/2014 08:05 AM, Alvaro Herrera wrote:
>
>> Marco Nenciarini wrote:
>>
>> To calculate the md5 checksum I've used the md5 code present in pgcrypto
>>> contrib as the code in src/include/libpq/md5.h is not suitable for large
>>> files. Since a core feature cannot depend on a piece of contrib, I've
>>> moved the files
>>>
>>> contrib/pgcrypto/md5.c
>>> contrib/pgcrypto/md5.h
>>>
>>> to
>>>
>>> src/backend/utils/hash/md5.c
>>> src/include/utils/md5.h
>>>
>>> changing the pgcrypto extension to use them.
>>>
>>
>> We already have the FNV checksum implementation in the backend -- can't
>> we use that one for this and avoid messing with MD5?
>>
>> (I don't think we're looking for a cryptographic hash here. Am I wrong?)
>>
>
> Hmm. Any user that can update a table can craft such an update that its
> checksum matches an older backup. That may seem like an onerous task; to
> correctly calculate the checksum of a file in a previous, you need to know
> the LSNs and the exact data, including deleted data, on every block in the
> table, and then construct a suitable INSERT or UPDATE that modifies the
> table such that you get a collision. But for some tables it could be
> trivial; you might know that a table was bulk-loaded with a particular LSN
> and there are no dead tuples. Or you can simply create your own table and
> insert exactly the data you want. Messing with your own table might seem
> harmless, but it'll e.g. let you construct a case where an index points to
> a tuple that doesn't exist anymore, or there's a row that doesn't pass a
> CHECK-constraint that was added later. Even if there's no direct security
> issue with that, you don't want that kind of uncertainty from a backup
> solution.
>
> But more to the point, I thought the consensus was to use the highest LSN
> of all the blocks in the file, no? That's essentially free to calculate (if
> you have to read all the data anyway), and isn't vulnerable to collisions.
>
> - Heikki

We also have both crc32 and crc64 implementations in pg_crc. If the goal is
just verifying file integrity (we can't really protect against intentional
modification), CRC sounds more appropriate to me.


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-18 16:33:37
Message-ID: 20140818163337.GA6817@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> On 08/18/2014 08:05 AM, Alvaro Herrera wrote:

> >We already have the FNV checksum implementation in the backend -- can't
> >we use that one for this and avoid messing with MD5?
> >
> >(I don't think we're looking for a cryptographic hash here. Am I wrong?)
>
> Hmm. Any user that can update a table can craft such an update that
> its checksum matches an older backup. That may seem like an onerous
> task; to correctly calculate the checksum of a file in a previous,
> you need to know the LSNs and the exact data, including deleted
> data, on every block in the table, and then construct a suitable
> INSERT or UPDATE that modifies the table such that you get a
> collision. But for some tables it could be trivial; you might know
> that a table was bulk-loaded with a particular LSN and there are no
> dead tuples.

What would anybody obtain by doing that? The only benefit is that the
file you so carefully crafted is not included in the next incremental
backup. How is this of any interest?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-18 20:55:13
Message-ID: 53F26831.2000006@vmware.com
Lists: pgsql-hackers

On 08/18/2014 07:33 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> On 08/18/2014 08:05 AM, Alvaro Herrera wrote:
>
>>> We already have the FNV checksum implementation in the backend --
>>> can't we use that one for this and avoid messing with MD5?
>>>
>>> (I don't think we're looking for a cryptographic hash here. Am I
>>> wrong?)
>>
>> Hmm. Any user that can update a table can craft such an update
>> that its checksum matches an older backup. That may seem like an
>> onerous task; to correctly calculate the checksum of a file in a
>> previous, you need to know the LSNs and the exact data, including
>> deleted data, on every block in the table, and then construct a
>> suitable INSERT or UPDATE that modifies the table such that you get
>> a collision. But for some tables it could be trivial; you might
>> know that a table was bulk-loaded with a particular LSN and there
>> are no dead tuples.
>
> What would anybody obtain by doing that? The only benefit is that
> the file you so carefully crafted is not included in the next
> incremental backup. How is this of any interest?

You're not thinking evil enough ;-). Let's say that you have a table
that stores bank transfers. You can do a bank transfer to pay a
merchant, get the goods delivered to you, and then do a second transfer
to yourself with a specially crafted message attached to it that makes
the checksum match the state before the first transfer. If the backup is
restored (e.g. by a daily batch job to a reporting system), it will
appear as if neither transfer happened, and you get to keep your money.

Far-fetched? Well, how about this scenario: a vandal just wants to cause
damage. Creating a situation where a restore from backup causes the
system to be inconsistent will certainly cause headaches to the admins,
and leave them wondering what else is corrupt.

Or how about this: you can do the trick to a system catalog, say
pg_attribute, to make it look like a column is of type varlena, when
it's actually since been ALTERed to be an integer. Now you can access
arbitrary memory in the server, and take over the whole system.

I'm sure any or all of those scenarios are highly impractical when you
actually sit down and try to do it, but you don't want to rely on that.
You have to be able to trust your backups.

- Heikki


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-19 03:16:47
Message-ID: CAA4eK1KLa72xsCPdFm1EwwoDbXkejMaXdVaUyuqdDNJfjAzjJg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 18, 2014 at 6:35 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
wrote:
>
> On 08/18/2014 08:05 AM, Alvaro Herrera wrote:
>>
>> Marco Nenciarini wrote:
>>
>>> To calculate the md5 checksum I've used the md5 code present in pgcrypto
>>> contrib as the code in src/include/libpq/md5.h is not suitable for large
>>> files. Since a core feature cannot depend on a piece of contrib, I've
>>> moved the files
>>>
>>> contrib/pgcrypto/md5.c
>>> contrib/pgcrypto/md5.h
>>>
>>> to
>>>
>>> src/backend/utils/hash/md5.c
>>> src/include/utils/md5.h
>>>
>>> changing the pgcrypto extension to use them.
>>
>>
>> We already have the FNV checksum implementation in the backend -- can't
>> we use that one for this and avoid messing with MD5?
>>
>> (I don't think we're looking for a cryptographic hash here. Am I wrong?)
>
>
> Hmm. Any user that can update a table can craft such an update that its
> checksum matches an older backup. That may seem like an onerous task; to
> correctly calculate the checksum of a file in a previous, you need to know
> the LSNs and the exact data, including deleted data, on every block in the
> table, and then construct a suitable INSERT or UPDATE that modifies the
> table such that you get a collision. But for some tables it could be
> trivial; you might know that a table was bulk-loaded with a particular LSN
> and there are no dead tuples. Or you can simply create your own table and
> insert exactly the data you want. Messing with your own table might seem
> harmless, but it'll e.g. let you construct a case where an index points to
> a tuple that doesn't exist anymore, or there's a row that doesn't pass a
> CHECK-constraint that was added later. Even if there's no direct security
> issue with that, you don't want that kind of uncertainty from a backup
> solution.
>
> But more to the point, I thought the consensus was to use the highest LSN
> of all the blocks in the file, no?

If we want to use the highest LSN, then maybe it would be
helpful for the author to reuse some of the code I wrote
for the Compute Max LSN utility, which didn't get committed
because there wasn't enough of a use case for it.
The latest version of that patch can be found at:
http://www.postgresql.org/message-id/CA+TgmoY1Wr0KUDpgSkuSPTre0vkt1CoTc+w+t3ZOwxOejX0SpA@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-20 12:36:47
Message-ID: 53F4965F.4030905@vmware.com
Lists: pgsql-hackers

I think this has had enough review for a WIP patch. I'm marking this as
Returned with Feedback in the commitfest because:

* it should use LSNs instead of an md5 checksum
* this doesn't do anything useful on its own, hence we would need to see
the whole solution before committing
* it's not clear how this would be extended if you wanted to do diffs
more fine-grained than file-level.

But please feel free to continue discussing those items.

On 08/18/2014 03:04 AM, Marco Nenciarini wrote:
> Hi Hackers,
>
> This is the first piece of file level incremental backup support, as
> described on wiki page https://wiki.postgresql.org/wiki/Incremental_backup
>
> It is not yet complete, but I wish to share it on the list to receive
> comments and suggestions.
>
> The point of the patch is adding an option to pg_basebackup and
> replication protocol BASE_BACKUP command to generate a backup_profile file.
>
> When taking a full backup with pg_basebackup, the user can request
> Postgres to generate a backup_profile file through the --profile option
> (-B short option, which I've arbitrarily picked up because both -P and
> -p was already taken)
>
> At the moment the backup profile consists of a file with one line per
> file detailing modification time, md5, size, tablespace and path
> relative to tablespace root (PGDATA or the tablespace)
>
> To calculate the md5 checksum I've used the md5 code present in pgcrypto
> contrib as the code in src/include/libpq/md5.h is not suitable for large
> files. Since a core feature cannot depend on a piece of contrib, I've
> moved the files
>
> contrib/pgcrypto/md5.c
> contrib/pgcrypto/md5.h
>
> to
>
> src/backend/utils/hash/md5.c
> src/include/utils/md5.h
>
> changing the pgcrypto extension to use them.
>
> There are still some TODOs:
>
> * User documentation
>
> * Remove the pg_basebackup code duplication I've introduced with the
> ReceiveAndUnpackTarFileToDir function, which is almost the same of
> ReceiveAndUnpackTarFile but does not expect to handle a tablespace. It
> instead simply extract a tar stream in a destination directory. The
> latter could probably be rewritten using the former as component, but it
> needs some adjustment to the "progress reporting" part, which is not
> present at the moment in ReceiveAndUnpackTarFileToDir.
>
> * Add header section to backup_profile file which at the moment contains
> only the body part. I'm thinking to change the original design and put
> the whole backup label as header, which is IMHO clearer and well known.
> I would use something like:
>
> START WAL LOCATION: 0/E000028 (file 00000001000000000000000E)
> CHECKPOINT LOCATION: 0/E000060
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2014-08-14 18:54:01 CEST
> LABEL: pg_basebackup base backup
> END LABEL
>
> I've attached the current patch based on master.
>
> Any comment will be appreciated.
>
> Regards,
> Marco
>

--
- Heikki


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-20 23:24:20
Message-ID: 20140820232420.GD17822@momjian.us
Lists: pgsql-hackers

On Mon, Aug 18, 2014 at 04:05:07PM +0300, Heikki Linnakangas wrote:
> But more to the point, I thought the consensus was to use the
> highest LSN of all the blocks in the file, no? That's essentially
> free to calculate (if you have to read all the data anyway), and
> isn't vulnerable to collisions.

The highest-LSN approach allows you to read only the tail part of each
8k block. Assuming 512-byte storage sector sizes, you only have to read
1/16 of the file.

Now, the problem is that you lose kernel prefetch, but maybe
posix_fadvise() would fix that problem.
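
As a rough sketch of that idea (again not from any patch; it assumes the
LSN can be pulled out of the first eight bytes of each 8kB page header,
and it uses posix_fadvise() purely as a hint):

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
    unsigned char hdr[8];
    uint64_t    max_lsn = 0;
    off_t       off = 0;
    int         fd;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <segment-file>\n", argv[0]);
        return 1;
    }
    if ((fd = open(argv[1], O_RDONLY)) < 0)
    {
        perror(argv[1]);
        return 1;
    }
    /* tell the kernel we will sweep the file front to back */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    /* read only the LSN bytes of each block instead of whole blocks */
    while (pread(fd, hdr, sizeof(hdr), off) == (ssize_t) sizeof(hdr))
    {
        uint32_t    xlogid;
        uint32_t    xrecoff;
        uint64_t    lsn;

        memcpy(&xlogid, hdr, 4);
        memcpy(&xrecoff, hdr + 4, 4);
        lsn = ((uint64_t) xlogid << 32) | xrecoff;
        if (lsn > max_lsn)
            max_lsn = lsn;
        off += BLCKSZ;
    }
    close(fd);
    printf("highest LSN: %X/%X\n",
           (unsigned int) (max_lsn >> 32), (unsigned int) max_lsn);
    return 0;
}

Whether the partial reads actually beat plain sequential 8kB reads would
need measuring, of course.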

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-20 23:33:20
Message-ID: CAGTBQpZatSD0JZ-1D7V2cxeV-9cVQS6_Hx_ANdc+Lr+wOacMFg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 20, 2014 at 8:24 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Mon, Aug 18, 2014 at 04:05:07PM +0300, Heikki Linnakangas wrote:
>> But more to the point, I thought the consensus was to use the
>> highest LSN of all the blocks in the file, no? That's essentially
>> free to calculate (if you have to read all the data anyway), and
>> isn't vulnerable to collisions.
>
> The highest-LSN approach allows you to read only the tail part of each
> 8k block. Assuming 512-byte storage sector sizes, you only have to read
> 1/16 of the file.
>
> Now, the problem is that you lose kernel prefetch, but maybe
> posix_fadvise() would fix that problem.

Sequential reads of 512-byte blocks or 8k blocks take the same amount
of time on rotating media (if they're scheduled right). Maybe not on
SSD media.

Not only that, the kernel will read in 4k blocks instead of 8k (at least on Linux).

So the benefit is dubious.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-22 15:59:09
Message-ID: CA+TgmoZha+Yb1qP-Tbjf2xOPK4orpyVVSN90SvZCcp7yFzbbNA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 18, 2014 at 4:55 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> You're not thinking evil enough ;-). Let's say that you have a table that
> stores bank transfers. You can do a bank transfer to pay a merchant, get the
> goods delivered to you, and then a second transfer to yourself with a
> specially crafted message attached to it that makes the checksum match the
> state before the first transfer. If the backup is restored (e.g. by a daily
> batch job to a reporting system), it will appear as if neither transfer
> happened, and you get to keep your money.
>
> Far-fetched? Well, how about this scenario: a vandal just wants to cause
> damage. Creating a situation where a restore from backup causes the system
> to be inconsistent will certainly cause headaches to the admins, and leave
> them wondering what else is corrupt.
>
> Or how about this: you can do the trick to a system catalog, say
> pg_attribute, to make it look like a column is of type varlena, when it's
> actually since been ALTERed to be an integer. Now you can access arbitrary
> memory in the server, and take over the whole system.
>
> I'm sure any or all of those scenarios are highly inpractical when you
> actually sit down and try to do it, but you don't want to rely on that. You
> have to be able to trust your backups.

Yeah. I agree that these scenarios are far-fetched; however, they're
also preventable, so we should prevent them.

Also, with respect to checksum collisions, you figure to have an
*accidental* checksum collision every so often as well. For block
checksums, we're using a 16-bit value, which is OK because we'll still
detect 65535/65536 = 99.998% of corruption events. The requirements
are much higher for incremental backup, because a checksum collision
here means automatic data corruption. If we were crazy enough to use
a 16-bit block-level checksum in this context, about one
actually-modified block would fail to get copied out of every 8kB *
64k = 512MB of modified data, which would not make anybody very happy.
A 32-bit checksum would be much safer, and a 64-bit checksum would be
better still, but block LSNs seem best of all.

Of course, there's one case where block LSNs aren't better, which is
where somebody goes backwards in time - i.e. back up the database, run
it for a while, take a base backup, shut down, restore from backup, do
stuff that's different from what you did the first time through, try
to take an incremental backup against your base backup. LSNs won't
catch that; checksums will. Do we want to allow incremental backup in
that kind of situation? It seems like playing with fire, but it's
surely not useless.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-22 16:00:19
Message-ID: CA+TgmobMgZeL6e3kCcgCke5RQ87z4qN=pD7X-sSgWDHTbN6-NQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 20, 2014 at 7:33 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Wed, Aug 20, 2014 at 8:24 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> On Mon, Aug 18, 2014 at 04:05:07PM +0300, Heikki Linnakangas wrote:
>>> But more to the point, I thought the consensus was to use the
>>> highest LSN of all the blocks in the file, no? That's essentially
>>> free to calculate (if you have to read all the data anyway), and
>>> isn't vulnerable to collisions.
>>
>> The highest-LSN approach allows you to read only the tail part of each
>> 8k block. Assuming 512-byte storage sector sizes, you only have to read
>> 1/16 of the file.
>>
>> Now, the problem is that you lose kernel prefetch, but maybe
>> posix_fadvise() would fix that problem.
>
> Sequential read of 512-byte blocks or 8k blocks takes the same amount
> of time in rotating media (if they're scheduled right). Maybe not in
> SSD media.
>
> Not only, the kernel will read in 4k blocks, instead of 8k (at least in linux).
>
> So, the benefit is dubious.

Agreed. But, there could be a CPU benefit, too. Pulling the LSN out
of a block is probably a lot cheaper than checksumming the whole
thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-22 19:15:15
Message-ID: 20140822191515.GD21456@momjian.us
Lists: pgsql-hackers

On Fri, Aug 22, 2014 at 12:00:19PM -0400, Robert Haas wrote:
> On Wed, Aug 20, 2014 at 7:33 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> > On Wed, Aug 20, 2014 at 8:24 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >> On Mon, Aug 18, 2014 at 04:05:07PM +0300, Heikki Linnakangas wrote:
> >>> But more to the point, I thought the consensus was to use the
> >>> highest LSN of all the blocks in the file, no? That's essentially
> >>> free to calculate (if you have to read all the data anyway), and
> >>> isn't vulnerable to collisions.
> >>
> >> The highest-LSN approach allows you to read only the tail part of each
> >> 8k block. Assuming 512-byte storage sector sizes, you only have to read
> >> 1/16 of the file.
> >>
> >> Now, the problem is that you lose kernel prefetch, but maybe
> >> posix_fadvise() would fix that problem.
> >
> > Sequential read of 512-byte blocks or 8k blocks takes the same amount
> > of time in rotating media (if they're scheduled right). Maybe not in
> > SSD media.
> >
> > Not only, the kernel will read in 4k blocks, instead of 8k (at least in linux).
> >
> > So, the benefit is dubious.
>
> Agreed. But, there could be a CPU benefit, too. Pulling the LSN out
> of a block is probably a lot cheaper than checksumming the whole
> thing.

Oh, good, I hoped someone would find something useful about my idea. :-)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +