From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-20 00:04:24
Message-ID: 5638.1419033864@sss.pgh.pa.us
Lists: pgsql-hackers

"ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu> writes:
> If we are going to consider changing the hash function, we should
> consider something like xxhash which runs at 13.8GB/s on a 2.7GHz
> x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant which
> is double the speed of fast-hash according to the page running on a
> 3GHz x86_64.

Well, as the Google page points out, raw speed is not the only figure of
merit; otherwise we'd just xor all the bytes and call it good. We need
the hash function to spread out the hash values well, or else we lose more
cycles chasing inordinately-long hash chains than we saved with a cheap
hash function. Google claims their fast-hash is actually better on this
point than Jenkins, which if true would be very nice indeed.

Keep in mind also that very very few of our hash keys are longer than
about 20 bytes, so that speed-per-byte is not that exciting anyway.
Longer setup/finish times could easily swamp any per-byte advantages,
for example.

regards, tom lane
