Re: updated hash functions for postgresql v1

From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Kenneth Marshall" <ktm(at)rice(dot)edu>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org, twraney(at)comcast(dot)net, neilc(at)samurai(dot)com
Subject: Re: updated hash functions for postgresql v1
Date: 2008-04-07 07:42:51
Message-ID: e51f66da0804070042g6fa1f903g29b93540a8ae3153@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On 4/6/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So adopting the mixing changes would make it faster yet. What we need
> to be certain of is that this doesn't expose us to poorer hashing.
> We know that it is critical that all bits of the input affect all bits
> of the hash fairly uniformly --- otherwise we are subject to very
> serious performance hits at higher levels in hash join, for instance.
> The comments in the new code led me to worry that Jenkins had
> compromised on that property in search of speed. I looked at his
> website but couldn't find any real discussion of the design principles
> for the new mixing code ...

Scroll at the end of doobs.html, there is the longer discussion.

My understanding is following:

His design is based on 2 properties of mixing function:

- reversible - that means for each input tuple of (a,b,c) corresponds
exactly one output tuple of (a,b,c). Such property guarantees that
after repeatedly applying mixing function, no bits get lost.

- avalanche - that means any single bit change in input (a,b,c)
half of the output bits are affected.

His "insight" (as he called it) when creating lookup3 was that
the bulk mixing that is applied repeatedly does not need to
have avalanche, it only needs to be reversible, meaning all the
bits that went in, are still there after mixing repeatedly.

And only final mixing needs to have avalanche as it produces
final result, but it does not need to be reversible as it wont
be applied repeatedly and most of the result is dropped anyway.

IMHO his choices are reasonable.

--
marko

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Dunstan 2008-04-07 08:03:48 Re: [HACKERS] Database owner installable modules patch
Previous Message Tom Dunstan 2008-04-07 06:16:37 Re: [HACKERS] Database owner installable modules patch