Re: MD5 aggregate

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MD5 aggregate
Date: 2013-06-14 13:59:01
Message-ID: 51BB21A5.3060300@dunslane.net
Lists: pgsql-hackers


On 06/14/2013 09:40 AM, Stephen Frost wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> Marko Kreen <markokr(at)gmail(dot)com> writes:
>>> On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>>>> Attached is a patch implementing a new aggregate function md5_agg() to
>>>> compute the aggregate MD5 sum across a number of rows.
>>> It's more efficient to calculate per-row md5, and then sum() them.
>>> This avoids the need for ORDER BY.
>> Good point. The aggregate md5 function also fails to distinguish the
>> case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
>> from the case where they contain 'xyz' followed by 'zyxyz'.
>>
>> Now, as against that, you lose any sensitivity to the ordering of the
>> values.
>>
>> Personally I'd be a bit inclined to xor the per-row md5's rather than
>> sum them, but that's a small matter.
> Where I'd take this is actually in a completely different direction..
> I'd like the aggregate to be able to match the results of running the
> 'md5sum' unix utility on a file that's been COPY'd out. Yes, that means
> we'd need a way to get back "what would this row look like if it was
> sent through COPY with these parameters", but I've long wanted that
> also.
>
> No, no clue about how to put all that together. Yes, having this would
> be better than nothing, so I'm still for adding this even if we can't
> make it match COPY output. :)
>
>

I'd rather go the other way: hash the records as they are, without
having to process them in any other way first. Turning things into text
must slow things down, surely.
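
To make the trade-offs quoted above concrete, here is a minimal Python
sketch, not the patch itself. It assumes the aggregate effectively hashes
the concatenation of the row values (which is what the 'xyzzy'/'xyz'
example suggests), and compares that with XOR-ing per-row digests. The
function names md5_concat and md5_xor are made up for illustration.

```python
import hashlib

def md5_concat(rows):
    """Hash all rows as one running stream (assumed md5_agg-like behaviour)."""
    h = hashlib.md5()
    for r in rows:
        h.update(r.encode("utf-8"))
    return h.hexdigest()

def md5_xor(rows):
    """XOR the 128-bit MD5 digests of the individual rows together."""
    acc = 0
    for r in rows:
        digest = hashlib.md5(r.encode("utf-8")).digest()
        acc ^= int.from_bytes(digest, "big")
    return format(acc, "032x")

# Row boundaries are invisible to the streaming hash: both inputs
# concatenate to 'xyzzyxyz'.
same_concat = md5_concat(["xyzzy", "xyz"]) == md5_concat(["xyz", "zyxyz"])

# Per-row digests keep the boundaries, but lose the ordering.
same_xor = md5_xor(["xyzzy", "xyz"]) == md5_xor(["xyz", "zyxyz"])
order_blind = md5_xor(["a", "b"]) == md5_xor(["b", "a"])

# With XOR, two identical rows cancel each other out entirely.
dup_cancels = md5_xor(["a", "a"]) == "0" * 32
```

The last line shows one thing to weigh when choosing between xor and
sum: with xor, any row that appears an even number of times drops out of
the result, which a sum of the digests would not do.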

cheers

andrew
