Re: MD5 aggregate

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, david(at)fetter(dot)org
Subject: Re: MD5 aggregate
Date: 2013-06-27 11:29:22
Message-ID: CACMqXCJNrpTttpMFW8u5fvy7sEJCkYCep5278nJB3-vpGHcdcw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 27, 2013 at 11:28 AM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> On 26 June 2013 21:46, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On 6/26/13 4:04 PM, Dean Rasheed wrote:
>>> A quick google search reveals several people asking for something like
>>> this, and people recommending md5(string_agg(...)) or
>>> md5(string_agg(md5(...))) based solutions, which are doomed to failure
>>> on larger tables.
>>
>> The thread discussed several other options of checksumming tables that
>> did not have the air of a cryptographic offering, as Noah put it.
>>
>
> True but md5 has the advantage of being directly comparable with the
> output of Unix md5sum, which would be useful if you loaded data from
> external files and wanted to confirm that your import process didn't
> mangle it.

The problem with md5_agg() is that it's only useful in toy scenarios.

It's more useful to give people a script that computes the same
sum(hash(row)) on a dump file than to try to run MD5 on ordered rows.

Also, I don't think anybody actually cares about MD5(table-as-bytes); instead,
people want a way to check whether two tables, or a table and a dump, are the same.
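
A minimal sketch of the sum(hash(row)) idea, applied to a dump file, might
look like the following. It is an illustration under assumptions, not an
existing tool: it assumes the dump holds one row per line (as in COPY text
output), uses the first 8 bytes of MD5 as the per-row hash, and sums the
hashes modulo 2**64 so the result does not depend on row order. The names
row_hash, table_checksum and dump_checksum are hypothetical. The database
side would have to compute the same hash over the same text representation
of each row for the two results to be comparable.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep the running sum within 64 bits


def row_hash(line):
    """Map one row (bytes) to a 64-bit integer: first 8 bytes of its MD5."""
    return int.from_bytes(hashlib.md5(line).digest()[:8], "big")


def table_checksum(rows):
    """Return (row_count, sum of row hashes mod 2**64).

    Addition is commutative, so the result is independent of row order.
    The row count is returned alongside the sum as a cheap extra check.
    """
    count = 0
    total = 0
    for line in rows:
        # Strip the line terminator so LF and CRLF dumps hash identically.
        total = (total + row_hash(line.rstrip(b"\r\n"))) & MASK64
        count += 1
    return count, total


def dump_checksum(path):
    """Checksum a dump file, assuming one row per line (an assumption)."""
    with open(path, "rb") as f:
        return table_checksum(f)
```

Two dumps of the same table, taken in different row orders, give the same
(count, sum) pair, so no ORDER BY is needed on either side.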

--
marko
