Re: MD5 aggregate

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MD5 aggregate
Date: 2013-06-14 14:20:25
Message-ID: CAEZATCWZcij6dpv=q9KjxqpPtdh9pbwDebqd=u5YboAMjZyrDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14 June 2013 14:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marko Kreen <markokr(at)gmail(dot)com> writes:
>> On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>>> Attached is a patch implementing a new aggregate function md5_agg() to
>>> compute the aggregate MD5 sum across a number of rows.
>
>> It's more efficient to calculate per-row md5, and then sum() them.
>> This avoids the need for ORDER BY.
>
> Good point. The aggregate md5 function also fails to distinguish the
> case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
> from the case where they contain 'xyz' followed by 'zyxyz'.
>

Well, if you aggregated foo.*::text as in my original example, then
the textual representation of the row would protect you from that. But
yes, if you were just doing it with a single text column that might be
a risk.

> Now, as against that, you lose any sensitivity to the ordering of the
> values.
>
> Personally I'd be a bit inclined to xor the per-row md5's rather than
> sum them, but that's a small matter.
>

But this would be a much riskier thing to do with a single column,
because if you updated multiple rows in the same way (e.g., UPDATE t
SET x='foo' WHERE x='bar') then xor'ing the md5's would cancel out if
there were an even number of matches.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2013-06-14 14:22:19 Re: Add visibility map information to pg_freespace.
Previous Message Stephen Frost 2013-06-14 14:19:09 Re: MD5 aggregate