Re: MD5 aggregate

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MD5 aggregate
Date: 2013-06-14 14:47:25
Message-ID: 9910.1371221245@sss.pgh.pa.us
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On 14 June 2013 14:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Personally I'd be a bit inclined to xor the per-row md5's rather than
>> sum them, but that's a small matter.

> But this would be a much riskier thing to do with a single column,
> because if you updated multiple rows in the same way (e.g., UPDATE t
> SET x='foo' WHERE x='bar') then xor'ing the md5's would cancel out if
> there were an even number of matches.

I was implicitly thinking that the sum would be a modulo sum so that the
final result is still the size of an md5 signature. If that's true,
then leaking bits via carry out is just as bad as xor's deficiencies.
Now, you could certainly make it a non-modulo sum and not lose any
information to carries, if you're willing to do the arithmetic in
NUMERIC and have a variable-width result. Sounds a bit slow though.
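
[Editorial illustration, not part of the original message.] The three
combining strategies discussed above can be compared with a minimal
sketch. It is not the proposed patch: the helper names (md5_int, xor_agg,
modsum_agg, fullsum_agg) and the sample rows are hypothetical, and Python
integers stand in for NUMERIC arithmetic. It shows the even-count
cancellation Dean describes for xor, and that a modulo sum is just the
full sum with the carry bits discarded.

```python
import hashlib

MD5_BITS = 128
MOD = 1 << MD5_BITS  # a modulo sum keeps the result at md5 width


def md5_int(value: str) -> int:
    """Return the md5 digest of a row value as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def xor_agg(rows):
    """Order-independent aggregate: xor of the per-row md5 values."""
    acc = 0
    for row in rows:
        acc ^= md5_int(row)
    return acc


def modsum_agg(rows):
    """Fixed-width aggregate: sum of per-row md5 values modulo 2**128."""
    return sum(md5_int(row) for row in rows) % MOD


def fullsum_agg(rows):
    """Variable-width aggregate: plain sum, no information lost to carries."""
    return sum(md5_int(row) for row in rows)


# Hypothetical table contents before and after
#   UPDATE t SET x='foo' WHERE x='bar'
# with an even number (two) of matching rows.
before = ["bar", "bar", "baz"]
after = ["foo", "foo", "baz"]

# xor: the two identical digests cancel, so the update goes undetected.
print("xor changed:     ", xor_agg(before) != xor_agg(after))

# Both sums change here, but the modulo sum drops any carry out of bit 127.
print("mod sum changed: ", modsum_agg(before) != modsum_agg(after))
print("full sum changed:", fullsum_agg(before) != fullsum_agg(after))
print("full sum width:  ", fullsum_agg(after).bit_length(), "bits")
```

Any pair of identical rows contributes nothing to the xor, whatever the
values are. The modulo sum keeps the result at 128 bits at the price of
discarding carries, and the full sum grows with the row count, which is
the variable-width cost noted above.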

regards, tom lane
