Re: MD5 aggregate

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MD5 aggregate
Date: 2013-06-14 14:56:57
Message-ID: 20130614145657.GH19500@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-06-14 15:49:31 +0100, Dean Rasheed wrote:
> On 14 June 2013 15:19, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Andrew Dunstan (andrew(at)dunslane(dot)net) wrote:
> >> I'd rather go the other way, processing the records without having
> >> to process them otherwise at all. Turning things into text must slow
> >> things down, surely.
> >
> > That's certainly an interesting idea also..
> >
>
> md5_agg(record) ?
>
> Yes, I like it.

It's more complex than just memcmp()ing HeapTupleData though. At least
if the Datum contains varlena columns there's so many different
representations (short, long, compressed, external, external compressed)
of the same data that a md5 without normalizing that wouldn't be very
interesting.
So you would at least need a normalizing version of
toast_flatten_tuple() that also deals with short/long varlenas. But even
after that, you would still need to deal with Datums that can have
different representation (like short numerics, old style hstore, ...).

It might be more realistic to use the binary output functions, but I am
not sure whether all of those are sufficiently reproduceable.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2013-06-14 15:09:46 Re: MD5 aggregate
Previous Message Dean Rasheed 2013-06-14 14:49:31 Re: MD5 aggregate