Re: MD5 aggregate

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, david(at)fetter(dot)org
Subject: Re: MD5 aggregate
Date: 2013-06-27 15:44:28
Message-ID: CA+Tgmoaa5kMEVZRoGQkUmZ8ykBN9Qv3iqUxyZi8o7U8Y_V_9YA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 27, 2013 at 7:29 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> On Thu, Jun 27, 2013 at 11:28 AM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> On 26 June 2013 21:46, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>> On 6/26/13 4:04 PM, Dean Rasheed wrote:
>>>> A quick google search reveals several people asking for something like
>>>> this, and people recommending md5(string_agg(...)) or
>>>> md5(string_agg(md5(...))) based solutions, which are doomed to failure
>>>> on larger tables.
>>>
>>> The thread discussed several other options of checksumming tables that
>>> did not have the air of a cryptographic offering, as Noah put it.
>>>
>>
>> True, but md5 has the advantage of being directly comparable with the
>> output of Unix md5sum, which would be useful if you loaded data from
>> external files and wanted to confirm that your import process didn't
>> mangle it.
>
> The problem with md5_agg() is that it's only useful in toy scenarios.
>
> It's more useful to give people a script that does the same sum(hash(row))
> on the dump file than to try to run MD5 on ordered rows.
>
> Also, I don't think anybody actually cares about MD5(table-as-bytes); instead,
> people want a way to check whether two tables, or a table and a dump, are the same.

I think you're trying to tell Dean to write the patch that you want
instead of the patch that he wants. There are certainly other things
that could be done that some people might sometimes prefer, but that
doesn't mean what he did isn't useful.

That having been said, I basically agree with Noah: I think this would
be a useful extension (perhaps even in contrib?) but I don't think we
need to install it by default. It's useful, but it's also narrow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
