Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date: 2013-10-07 12:50:23
Message-ID: 20131007125023.GQ16128@aart.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote:
> > 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2].
> > For fun, try not hashing those ints at all and see how that performs (that,
> > I think, is what you get from HashSet<int> in Java/C#).
>
> I've used crc32 mostly because it's easily available in the code (i.e.
> I'm lazy), but I've done some quick research / primitive benchmarking
> too. For example hashing 2e9 integers takes this much time:
>
> FNV-1 = 11.9
> FNV-1a = 11.9
> jenkins = 38.8
> crc32 = 32.0
>
> So it's not really "slow" and the uniformity seems to be rather good.
>
> I'll try FNV in the future, however I don't think that's the main issue
> right now.
>
Hi Tomas,

If you are going to use a function that is not currently in the code,
please consider xxhash:

http://code.google.com/p/xxhash/

Here are some benchmarks for some of the faster hash functions:

Name Speed Q.Score Author
xxHash 5.4 GB/s 10
MumurHash 3a 2.7 GB/s 10 Austin Appleby
SpookyHash 2.0 GB/s 10 Bob Jenkins
SBox 1.4 GB/s 9 Bret Mulvey
Lookup3 1.2 GB/s 9 Bob Jenkins
CityHash64 1.05 GB/s 10 Pike & Alakuijala
FNV 0.55 GB/s 5 Fowler, Noll, Vo
CRC32 0.43 GB/s 9
MD5-32 0.33 GB/s 10 Ronald L. Rivest
SHA1-32 0.28 GB/s 10

Regards,
Ken

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-10-07 13:32:32 Re: logical changeset generation v6.1
Previous Message Fabien COELHO 2013-10-07 12:02:31 Re: pgbench progress report improvements - split 3 v2 - A