Re: [PATCH] Incremental backup: add backup profile to base backup

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-18 13:13:33
Message-ID: CAO_YK0UcnV8oUNg7zKnEFf21K0F0+R58vfLsNg1E+A9K1YdO4w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 18, 2014 at 10:05 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> On 08/18/2014 08:05 AM, Alvaro Herrera wrote:
>
>> Marco Nenciarini wrote:
>>
>>> To calculate the md5 checksum I've used the md5 code present in pgcrypto
>>> contrib as the code in src/include/libpq/md5.h is not suitable for large
>>> files. Since a core feature cannot depend on a piece of contrib, I've
>>> moved the files
>>>
>>> contrib/pgcrypto/md5.c
>>> contrib/pgcrypto/md5.h
>>>
>>> to
>>>
>>> src/backend/utils/hash/md5.c
>>> src/include/utils/md5.h
>>>
>>> changing the pgcrypto extension to use them.
>>>
>>
>> We already have the FNV checksum implementation in the backend -- can't
>> we use that one for this and avoid messing with MD5?
>>
>> (I don't think we're looking for a cryptographic hash here. Am I wrong?)
>>
>
> Hmm. Any user that can update a table can craft such an update that its
> checksum matches an older backup. That may seem like an onerous task; to
> correctly calculate the checksum of a file in a previous backup, you need to know
> the LSNs and the exact data, including deleted data, on every block in the
> table, and then construct a suitable INSERT or UPDATE that modifies the
> table such that you get a collision. But for some tables it could be
> trivial; you might know that a table was bulk-loaded with a particular LSN
> and there are no dead tuples. Or you can simply create your own table and
> insert exactly the data you want. Messing with your own table might seem
> harmless, but it'll e.g. let you construct a case where an index points to
> a tuple that doesn't exist anymore, or there's a row that doesn't pass a
> CHECK-constraint that was added later. Even if there's no direct security
> issue with that, you don't want that kind of uncertainty from a backup
> solution.
>
> But more to the point, I thought the consensus was to use the highest LSN
> of all the blocks in the file, no? That's essentially free to calculate (if
> you have to read all the data anyway), and isn't vulnerable to collisions.
>
> - Heikki
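
The "highest LSN of all the blocks in the file" idea can be sketched as below. This is only an illustration, not code from the patch. It assumes the default 8192-byte block size, that each page starts with its LSN stored as two native-endian 32-bit words (high word first), and that the script runs on a machine with the same byte order as the server that wrote the file. The function name `max_page_lsn` is made up for this example.

```python
import struct

BLCKSZ = 8192  # assumption: default PostgreSQL block size


def max_page_lsn(path, block_size=BLCKSZ):
    """Return the highest page LSN found in a relation file, as an integer.

    Assumes the first 8 bytes of every page hold the page LSN as two
    native-endian uint32 values: the high 32 bits, then the low 32 bits.
    """
    max_lsn = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(block_size)
            if len(page) < 8:
                # End of file (or a truncated trailing fragment).
                break
            high, low = struct.unpack_from("=II", page, 0)
            max_lsn = max(max_lsn, (high << 32) | low)
    return max_lsn
```

Under those assumptions, a file would need to be copied into an incremental backup only if `max_page_lsn(path)` is greater than the LSN at which the previous backup started. Pages that have never been written carry an LSN of 0 and so never raise the maximum.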

We also have both CRC-32 and CRC-64 implementations in pg_crc. If the goal is
just to verify file integrity (we can't really protect against intentional
modification), a CRC sounds more appropriate to me.
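
To make the suggestion concrete, here is a rough sketch of a streaming per-file CRC. It uses the CRC-32 from Python's `zlib` module purely for illustration; it does not call the pg_crc code, and the two may not produce identical values. The name `file_crc32` and the chunk size are made up for this example.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it in fixed-size chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so the result equals the
            # CRC-32 of the whole file, independent of the chunk size.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

A backup profile could store this value per file and compare it on the next run. That catches accidental corruption, but as noted above it gives no protection against someone deliberately crafting data with a matching checksum.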
