Re: speedup tidbitmap patch: hash BlockNumber

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: speedup tidbitmap patch: hash BlockNumber
Date: 2014-10-22 13:14:43
Message-ID: 5447ADC3.6000408@sigaev.ru
Lists: pgsql-hackers

Just replacing tag_hash in tidbitmap, which goes through hash_any, with a
direct call to hash_uint32 saves ~5% of execution time.
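
Roughly, the shim amounts to this (a sketch, not the exact patch text;
hash_uint32 returns a Datum, hence the DatumGetUInt32):

static uint32
tag_blocknumber(const void *key, Size keysize)
{
	Assert(keysize == sizeof(BlockNumber));
	return DatumGetUInt32(hash_uint32(*((const BlockNumber *) key)));
}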

An example:
# create extension btree_gin;
# select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
# create index idx on t using gin (i);
# set enable_seqscan = off;

# explain analyze select * from t where i >= 0;
without patch: Execution time: 2427.692 ms
with patch: Execution time: 2319.376 ms

# explain analyze select * from t where i >= 100 and i <= 100;
without patch: Execution time: 524.441 ms
with patch: Execution time: 504.478 ms
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
tbm_blocknumber-2.patch.gz application/x-gzip 572 bytes

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: hash BlockNumber
Date: 2014-12-15 13:36:48
Message-ID: 548EE3F0.8040504@vmware.com
Lists: pgsql-hackers

On 10/22/2014 04:14 PM, Teodor Sigaev wrote:
> Just replacing tag_hash in tidbitmap, which goes through hash_any, with a
> direct call to hash_uint32 saves ~5% of execution time.
>
> An example:
> # create extension btree_gin;
> # select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
> # create index idx on t using gin (i);
> # set enable_seqscan = off;
>
> # explain analyze select * from t where i >= 0;
> without patch: Execution time: 2427.692 ms
> with patch: Execution time: 2319.376 ms
>
> # explain analyze select * from t where i >= 100 and i <= 100;
> without patch: Execution time: 524.441 ms
> with patch: Execution time: 504.478 ms

Nice little speedup, for such a simple patch. On my laptop, "perf"
confirms that with the patch, about 5% of the CPU time is spent in
tag_blocknumber, and without it, about 8% is spent in tag_hash. That
gives a 3% speedup, which is in the same ballpark as what you saw.

I'd suggest putting the new function in hashfn.c, where we already have
similar functions, string_hash, tag_hash, oid_hash and bitmap_hash. And
name it "blocknumber_hash", for consistency.

- Heikki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: hash BlockNumber
Date: 2014-12-15 15:01:05
Message-ID: 22082.1418655665@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> On 10/22/2014 04:14 PM, Teodor Sigaev wrote:
>> Just replacing tag_hash in tidbitmap, which goes through hash_any, with a
>> direct call to hash_uint32 saves ~5% of execution time.

> I'd suggest putting the new function in hashfn.c, where we already have
> similar functions, string_hash, tag_hash, oid_hash and bitmap_hash. And
> name it "blocknumber_hash", for consistency.

I think this suggestion is misguided, and the patch itself needs
rethinking. Instead of doing this, let's hack dynahash.c itself
to substitute a shim like this when it's told function == tag_hash and
keysize == sizeof(uint32). Then we can remove any similar shims that
already exist, and possibly end up with a net savings of code rather than
adding more.
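
In hash_create() that could look about like this (just a sketch;
uint32_hash being a one-line wrapper around hash_uint32, like the
tag_blocknumber sketch upthread, and note that info->keysize is only
meaningful when HASH_ELEM is given):

	/* where hash_create() currently picks the hash support function */
	if (flags & HASH_FUNCTION)
	{
		/* quietly swap in a cheaper function for 4-byte tag_hash keys */
		if (info->hash == tag_hash &&
			(flags & HASH_ELEM) && info->keysize == sizeof(uint32))
			hashp->hash = uint32_hash;
		else
			hashp->hash = info->hash;
	}
	else
		hashp->hash = string_hash;	/* default hash function */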

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: hash BlockNumber
Date: 2014-12-16 17:07:38
Message-ID: 549066DA.6090800@sigaev.ru
Lists: pgsql-hackers

> I think this suggestion is misguided, and the patch itself needs
> rethinking. Instead of doing this, let's hack dynahash.c itself
> to substitute a shim like this when it's told function == tag_hash and
> keysize == sizeof(uint32). Then we can remove any similar shims that
> already exist, and possibly end up with a net savings of code rather than
> adding more.
Done. Actually, I found only the oid_hash shim.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
tbm_blocknumber-3.patch.gz application/x-gzip 3.0 KB

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: hash BlockNumber
Date: 2014-12-17 05:27:01
Message-ID: CAApHDvrb8kGdVE31_4-wrKyTXhDBAg3=V=mCxjL2Hu7YRo8_wA@mail.gmail.com
Lists: pgsql-hackers

On 17 December 2014 at 06:07, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>> I think this suggestion is misguided, and the patch itself needs
>> rethinking. Instead of doing this, let's hack dynahash.c itself
>> to substitute a shim like this when it's told function == tag_hash and
>> keysize == sizeof(uint32). Then we can remove any similar shims that
>> already exist, and possibly end up with a net savings of code rather than
>> adding more.
>
> Done. Actually, I found only the oid_hash shim.

- hash_ctl.hash = oid_hash; /* a bit more efficient than tag_hash */
+ hash_ctl.hash = tag_hash; /* a bit more efficient than tag_hash */

I think the comment may need to be removed here.

Regards

David Rowley


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: hash BlockNumber
Date: 2014-12-17 12:42:15
Message-ID: 54917A27.5040409@sigaev.ru
Lists: pgsql-hackers

> -hash_ctl.hash = oid_hash; /* a bit more efficient than tag_hash */
> +hash_ctl.hash = tag_hash; /* a bit more efficient than tag_hash */
>
> I think the comment may need to be removed here.
Thank you, fixed

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
tbm_blocknumber-4.patch.gz application/x-gzip 3.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-17 17:34:19
Message-ID: 3817.1418837659@sss.pgh.pa.us
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> -hash_ctl.hash = oid_hash; /* a bit more efficient than tag_hash */
>> +hash_ctl.hash = tag_hash; /* a bit more efficient than tag_hash */
>>
>> I think the comment may need to be removed here.

> Thank you, fixed

I looked at this patch. It's not right at all here:

+	if (hashp->hash == sizeof(int32) && info->hash == tag_hash)
+		hashp->hash = int32_hash;
+	else
+		hashp->hash = info->hash;

hashp->hash is not defined at this point, and if it were defined it would
not be the size of the key, and replacing that with info->keysize wouldn't
be exactly the right thing either since info->keysize isn't necessarily
set (there is a default, though I suspect it's mostly unused).

I hacked up a solution to that and was about to commit when I had second
thoughts about the whole approach. The thing that's worrying me about
this is that it depends on the assumption that the passed-in info->hash
pointer is identical to what dynahash.c thinks is the address of tag_hash.
I'm not convinced that that's necessarily true on all platforms; in
particular, I'm worried that a call from a loadable module (such as
plperl) might be passing the address of a trampoline or thunk function
that is a jump to tag_hash and not tag_hash itself. I don't see this
happening on my Linux box but it might well happen on, say, Windows.

There are already some comparisons of the hash function pointer in
dynahash.c, so you might think this concern is moot, but if you look
closely they're just comparisons to the address of string_hash ---
which, in normal use-cases, would be supplied by dynahash.c itself.
So I don't think the existing coding proves this will work.

Now, we could do it like this anyway, and just ignore the fact that we
might not get the optimization on every platform for hash tables created
by loadable modules. However, there's another way we could attack this,
which is to invent a new hash option flag bit that says "pick a suitable
hash function for me, assuming that all bits of the specified key size are
significant". So instead of

	ctl.keysize = sizeof(...);
	ctl.entrysize = sizeof(...);
	ctl.hash = tag_hash;
	tab = hash_create("...", ..., &ctl,
					  HASH_ELEM | HASH_FUNCTION);

you'd write

	ctl.keysize = sizeof(...);
	ctl.entrysize = sizeof(...);
	tab = hash_create("...", ..., &ctl,
					  HASH_ELEM | HASH_BLOBS);

which would save some code space and is arguably cleaner than the
current approach of specifying some support functions and not others.

This would be more invasive than the current patch, since we'd have
to run around and touch code that currently mentions tag_hash as
well as the places that currently mention oid_hash. But the patch
is already pretty invasive just doing the latter.

Thoughts? Anyone have a decent suggestion for what to call the
flag bit? I'm not enamored of "HASH_BLOBS", it's just the first
thing that came to mind.

Also, depending on how much we want to annoy third-party authors,
we could consider making a clean break from the Old Way here and say
that callers must specify all or none of the hash support functions,
thus letting us get rid of the very ad-hoc logic in hash_create()
that fills in defaults for some functions based on what you said
for others. So the rule would be something like:

No flag mentioned: use functions suitable for null-terminated strings
HASH_BLOBS: use functions suitable for arbitrary fixed-size tags
HASH_FUNCTIONS: caller must supply all three support functions

and we'd remove the existing flag bits HASH_FUNCTION, HASH_COMPARE,
HASH_KEYCOPY. But this would just be in the line of making the API
simpler/more logical, it wouldn't buy us performance as such. I'm
not sure whether it's a good idea to go this far.
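
In dynahash terms that rule might boil down to roughly this (again just
a sketch of the proposal, not committed behavior):

	if (flags & HASH_FUNCTIONS)
	{
		/* caller supplies all three support functions */
		hashp->hash = info->hash;
		hashp->match = info->match;
		hashp->keycopy = info->keycopy;
	}
	else if (flags & HASH_BLOBS)
	{
		/* fixed-size binary tag keys: compare and copy them raw */
		hashp->hash = (info->keysize == sizeof(uint32)) ?
			uint32_hash : tag_hash;
		hashp->match = memcmp;
		hashp->keycopy = memcpy;
	}
	else
	{
		/* default: null-terminated string keys */
		hashp->hash = string_hash;
		hashp->match = (HashCompareFunc) strncmp;
		hashp->keycopy = (HashCopyFunc) strncpy;
	}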

Comments?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-18 02:20:34
Message-ID: 32316.1418869234@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> However, there's another way we could attack this,
> which is to invent a new hash option flag bit that says "pick a suitable
> hash function for me, assuming that all bits of the specified key size are
> significant". So instead of

>	ctl.keysize = sizeof(...);
>	ctl.entrysize = sizeof(...);
>	ctl.hash = tag_hash;
>	tab = hash_create("...", ..., &ctl,
>					  HASH_ELEM | HASH_FUNCTION);

> you'd write

>	ctl.keysize = sizeof(...);
>	ctl.entrysize = sizeof(...);
>	tab = hash_create("...", ..., &ctl,
>					  HASH_ELEM | HASH_BLOBS);

> which would save some code space and is arguably cleaner than the
> current approach of specifying some support functions and not others.

Here's a proposed patch along this line. I left in oid_hash (in the
form of a macro) so that this does not cause any API break for existing
third-party modules. However, no callers in our own code directly
refer to tag_hash or oid_hash anymore.
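
The compatibility shim can be as small as a macro in hsearch.h, roughly
(the committed spelling may differ slightly):

	extern uint32 uint32_hash(const void *key, Size keysize);

	/* Oids are plain uint32s, so old callers keep working unchanged */
	#define oid_hash uint32_hash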

regards, tom lane

Attachment Content-Type Size
make-dynahash-select-normal-hash-functions-1.patch text/x-diff 60.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-18 18:48:23
Message-ID: 27181.1418928503@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Here's a proposed patch along this line. I left in oid_hash (in the
> form of a macro) so that this does not cause any API break for existing
> third-party modules. However, no callers in our own code directly
> refer to tag_hash or oid_hash anymore.

Committed that version after some further comment wordsmithing.

On Teodor's original test cases, I see about 8% speedup compared to
the 4%-ish numbers he originally reported. This may be random variation
or it might mean that we got a win someplace else besides tidbitmap.c.
I've not tried to sleuth it down exactly. I am wondering though if
this suggests that it'd be worth our while to add a similar fast path
for 8-byte hash keys. That would be quite painless to add now (with
the exception of actually coding the fast hash function, perhaps).
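
For instance, the 8-byte analogue of uint32_hash might start life as
something like this (purely illustrative; the crude xor-fold is exactly
the "actually coding the fast hash function" part that would need real
thought, since it discards structure before mixing):

	static uint32
	uint64_hash(const void *key, Size keysize)
	{
		uint64		k = *((const uint64 *) key);

		Assert(keysize == sizeof(uint64));
		/* crude: xor-fold the halves, then mix through hash_uint32 */
		return DatumGetUInt32(hash_uint32((uint32) k ^ (uint32) (k >> 32)));
	}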

regards, tom lane


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-18 23:00:08
Message-ID: 54935C78.8020709@BlueTreble.com
Lists: pgsql-hackers

On 12/18/14, 12:48 PM, Tom Lane wrote:
> I wrote:
>> Here's a proposed patch along this line. I left in oid_hash (in the
>> form of a macro) so that this does not cause any API break for existing
>> third-party modules. However, no callers in our own code directly
>> refer to tag_hash or oid_hash anymore.
>
> Committed that version after some further comment wordsmithing.
>
> On Teodor's original test cases, I see about 8% speedup compared to
> the 4%-ish numbers he originally reported. This may be random variation
> or it might mean that we got a win someplace else besides tidbitmap.c.
> I've not tried to sleuth it down exactly. I am wondering though if
> this suggests that it'd be worth our while to add a similar fast path
> for 8-byte hash keys. That would be quite painless to add now (with
> the exception of actually coding the fast hash function, perhaps).

I stuck an elog in to report when a hash wasn't found, and came up with these results (first number is how many times a hash wasn't found, second is keysize). I don't see a lot of 8's in there...

Second set of numbers is the same thing, except counting every time we looked for the hash value, regardless of result. (Both cases generated by running make check.)

BTW, as part of this I found some hash values were hit over 30k times in the first case. I don't know if that means a boatload of collisions or something else. Command to generate that data is: $ egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log |cut -d\| -f2,4|sort|uniq -c|cut -d' ' -f1|sort -n|tail

Before hitting the raw data, here is a summary of hash searches by key size (generated by patch 1):

   count  keysize
      43  48
     327  256
    1411  416
    9321  136
   12436  12
   31270  64
  107017  8
  203127  16
  628107  4
 2201582  20    -- Mostly LOCALLOCK and Shared Buffer

egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log |cut -d\| -f2,6|sort|uniq -c (patch 2)
581 Analyzed elements table|8
1141 Analyzed lexemes table|16
46 Array distinct element count table|4
167 Attopt cache|8
26 Btree proof lookup cache|8
95 CFuncHash|4
2387 Combo CIDs|8
99 Databases hash|4
22 Event Trigger Cache|4
85 JoinRelHashTable|8
460690 LOCALLOCK hash|20
1 LOCALLOCK hash|LOCALLOCK hash
49608 LOCK hash|16
951 Local Buffer Lookup Table|20
5 Local predicate lock|16
581 Operator class cache|4
1658 Operator lookup cache|136
301 PLpgSQL function cache|416
5 PREDICATELOCK hash|16
14 PREDICATELOCKTARGET hash|16
52278 PROCLOCK hash|16
1064 Pending Ops Table|12
14790 Per-database table|4
15297 Portal hash|64
21 Prepared Queries|64
126 PrivateRefCount|4
4 RI compare cache|8
55 RI constraint cache|4
90 RI query cache|8
72 Record information cache|64
24040 Relcache by OID|4
742 RelfilenodeMap cache|8
21 Rendezvous variable hash|64
4 Rewrite / Old to new tid map|12
49 Rewrite / Unresolved ctids|12
5 SERIALIZABLEXID hash|4
60 Sequence values|4
9603 Shared Buffer Lookup Table|20
43 ShmemIndex|48
5335 TIDBitmap|4
138 TableSpace cache|4
16516 Temporary table of OIDs|4
169 Timezones|256
6 Tsearch configuration cache|4
4 Tsearch dictionary cache|4
1 Tsearch parser cache|4
11550 TupleHashTable|8
327 Type information cache|4
59 json object hashtable|64
17430 smgr relation table|16

egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log |cut -d\| -f2,6|sort|uniq -c (patch 1)
2250 Analyzed elements table|8
28817 Analyzed lexemes table|16
428 Array distinct element count table|4
447 Attopt cache|8
224 Btree proof lookup cache|8
1363 CFuncHash|4
5219 Combo CIDs|8
2415 Databases hash|4
12063 Event Trigger Cache|4
178 JoinRelHashTable|8
990753 LOCALLOCK hash|20
72129 LOCK hash|16
19858 Local Buffer Lookup Table|20
2 Local Buffer Lookup Table|LOG: Hashtable
7 Local predicate lock|16
4557 Operator class cache|4
9321 Operator lookup cache|136
1 Operator lookup cache|136LOG: Hashtable
1411 PLpgSQL function cache|416
8 PREDICATELOCK hash|16
106 PREDICATELOCKTARGET hash|16
73102 PROCLOCK hash|16
10929 Pending Ops Table|12
23534 Per-database table|4
29502 Portal hash|64
233 Prepared Queries|64
689 PrivateRefCount|4
91 RI compare cache|8
310 RI constraint cache|4
289 RI query cache|8
1471 Record information cache|64
1 Record information cache|64LOG: Hashtable
1 Record information cache|6LOG: Hashtable
426050 Relcache by OID|4
1308 RelfilenodeMap cache|8
12 Rendezvous variable hash|64
24 Rewrite / Old to new tid map|12
1483 Rewrite / Unresolved ctids|12
12 SERIALIZABLEXID hash|4
263 Sequence values|4
1190971 Shared Buffer Lookup Table|20
30 Shared Buffer Lookup Table|20LOG: Hashtable
20 Shared Buffer Lookup Table|2LOG: Hashtable
1 Shared Buffer Lookup Table|Event Trigger Cache
27 Shared Buffer Lookup Table|LOCALLOCK hash
2 Shared Buffer Lookup Table|LOCK hash
14 Shared Buffer Lookup Table|LOG: Hashtable
9 Shared Buffer Lookup Table|PROCLOCK hash
10 Shared Buffer Lookup Table|Relcache by OID
20 Shared Buffer Lookup Table|Shared Buffer Lookup Table
3 Shared Buffer Lookup Table|TIDBitmap
1 Shared Buffer Lookup Table|smgr relation table
43 ShmemIndex|48
104683 TIDBitmap|4
17 TOAST to main relid map|4
19542 TableSpace cache|4
14058 Temporary table of OIDs|4
327 Timezones|256
5 Tsearch configuration cache|4
68 Tsearch dictionary cache|4
3 Tsearch parser cache|4
1 TupleHashTable| hash
97011 TupleHashTable|8
18045 Type information cache|4
2 db hash|4
52 json object hashtable|64
28958 smgr relation table|16

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachment Content-Type Size
1.patch text/plain 452 bytes
2.patch text/plain 558 bytes

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-19 22:41:51
Message-ID: 5494A9AF.6010604@BlueTreble.com
Lists: pgsql-hackers

On 12/18/14, 5:00 PM, Jim Nasby wrote:
> 2201582 20 -- Mostly LOCALLOCK and Shared Buffer

Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64.

It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls.

But I wonder if we could still do better, because we typically also won't have that many relations. Is there some fast way we could combine dbid, forkNum and relid into a uint32? That gets us down to 8 bytes, which means we could use fast-hash, or a stripped-down mix().

Unfortunately I don't know much about hash algorithms, so I don't know how practical any of this actually is, or what a good method for combining those fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ relid, but if you got unlucky with your oid values you could end up with a lot of collisions from that.
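
Spelled out, with rot() written as a plain rotate-left, the idea is
something like this (hypothetical; the names here are invented):

	/* rotate-left on 32 bits */
	#define ROT32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

	/* cram dbid, forkNum and relid into a single uint32 */
	static inline uint32
	combine_relfile_key(Oid dbid, ForkNumber forkNum, Oid relid)
	{
		return (ROT32((uint32) forkNum, 2) | (uint32) dbid) ^ (uint32) relid;
	}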

I can put some effort into this, but I'd like some guidance.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-19 23:13:29
Message-ID: 4467.1419030809@sss.pgh.pa.us
Lists: pgsql-hackers

Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:
> On 12/18/14, 5:00 PM, Jim Nasby wrote:
>> 2201582 20 -- Mostly LOCALLOCK and Shared Buffer

> Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64.

> It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls.

I don't see this working. The lock table in shared memory can surely take
no such shortcuts. We could make a backend's locallock table omit fields
that are predictable within the set of objects that backend could ever
lock, but (1) this doesn't help unless we can reduce the tag size for all
LockTagTypes, which we probably can't, and (2) having the locallock's tag
be different from the corresponding shared tag would be a mess too.
I think dealing with (2) might easily eat all the cycles we could hope to
save from a smaller hash tag ... and that's not even considering the added
logical complexity and potential for bugs.

Switching to a different hash algorithm could be feasible, perhaps.
I think we're likely stuck with Jenkins hashing for hashes that go to
disk, but hashes for dynahash tables don't do that.

regards, tom lane


From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-19 23:37:07
Message-ID: 20141219233707.GA19678@aart.rice.edu
Lists: pgsql-hackers

On Fri, Dec 19, 2014 at 04:41:51PM -0600, Jim Nasby wrote:
> On 12/18/14, 5:00 PM, Jim Nasby wrote:
> >2201582 20 -- Mostly LOCALLOCK and Shared Buffer
>
> Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64.
>
> It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls.
>
> But I wonder if we could still do better, because we typically also won't have that many relations. Is there some fast way we could combine dbid, forkNum and relid into a uint32? That gets us down to 8 bytes, which means we could use fast-hash, or a stripped-down mix().
>
> Unfortunately I don't know much about hash algorithms, so I don't know how practical any of this actually is, or what a good method for combining those fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ relid, but if you got unlucky with your oid values you could end up with a lot of collisions from that.
>
> I can put some effort into this, but I'd like some guidance.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>

Hi,

If we are going to consider changing the hash function, we should
consider something like xxhash, which runs at 13.8GB/s on a 2.7GHz
x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant, which
is double the speed of fast-hash according to that project's page
(running on a 3GHz x86_64). In addition, something like that could be
used as a checksum instead of the current CRC32, although some work has
already gone into speeding that up as-is. Otherwise, it probably makes
sense to just stick with creating an 8-byte fast path analogous to the
4-byte fast path that was just added. Is calculating the hash the
bottleneck?

Regards,
Ken


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-20 00:04:24
Message-ID: 5638.1419033864@sss.pgh.pa.us
Lists: pgsql-hackers

"ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu> writes:
> If we are going to consider changing the hash function, we should
> consider something like xxhash, which runs at 13.8GB/s on a 2.7GHz
> x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant, which
> is double the speed of fast-hash according to that project's page
> (running on a 3GHz x86_64).

Well, as the google page points out, raw speed is not the only figure of
merit; otherwise we'd just xor all the bytes and call it good. We need
the hash function to spread out the hash values well, or else we lose more
cycles chasing inordinately-long hash chains than we saved with a cheap
hash function. Google claims their fast-hash is actually better on this
point than Jenkins, which if true would be very nice indeed.

Keep in mind also that very very few of our hash keys are longer than
about 20 bytes, so that speed-per-byte is not that exciting anyway.
Longer setup/finish times could easily swamp any per-byte advantages,
for example.

regards, tom lane


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-20 04:03:55
Message-ID: 5494F52B.7060008@BlueTreble.com
Lists: pgsql-hackers

On 12/19/14, 5:13 PM, Tom Lane wrote:
> Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:
>> On 12/18/14, 5:00 PM, Jim Nasby wrote:
>>> 2201582 20 -- Mostly LOCALLOCK and Shared Buffer
>
>> Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64.
>
>> It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls.
>
> I don't see this working. The lock table in shared memory can surely take
> no such shortcuts. We could make a backend's locallock table omit fields
> that are predictable within the set of objects that backend could ever
> lock, but (1) this doesn't help unless we can reduce the tag size for all
> LockTagTypes, which we probably can't, and (2) having the locallock's tag
> be different from the corresponding shared tag would be a mess too.
> I think dealing with (2) might easily eat all the cycles we could hope to
> save from a smaller hash tag ... and that's not even considering the added
> logical complexity and potential for bugs.

I think we may be thinking different things here...

I'm not suggesting we change BufferTag or BufferLookupEnt; clearly we can't simply throw away any of the fields I was talking about (well, except possibly tablespace ID. AFAICT that's completely redundant for searching because relid is UNIQUE).

What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:

hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)

perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?

> Switching to a different hash algorithm could be feasible, perhaps.
> I think we're likely stuck with Jenkins hashing for hashes that go to
> disk, but hashes for dynahash tables don't do that.

Yeah, I plan on testing the performance of fast-hash for HASH_BLOBS just to see how it compares.

Why would we be stuck with Jenkins hashing for on-disk data? pg_upgrade, or something else?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-20 05:15:42
Message-ID: 20141220051542.GM5023@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
> I'm not suggesting we change BufferTag or BufferLookupEnt; clearly we
> can't simply throw away any of the fields I was talking about (well,
> except possibly tablespace ID. AFAICT that's completely redundant for
> searching because relid is UNIQUE).

It's actually not. BufferTags contain relNodes via RelFileNode; that's
not the relation's oid, but the filenode. And that's *not* guaranteed
unique across databases, unfortunately.

> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>
> hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>
> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?

I don't think that'd improve anything. Jenkins' hash has quite good
mixing properties; I don't believe that the above would improve the
quality of the hash.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-20 17:51:19
Message-ID: 4814.1419097879@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
>> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>> hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>>
>> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?

> I don't think that'd improve anything. Jenkins' hash has quite good
> mixing properties; I don't believe that the above would improve the
> quality of the hash.

I think what Jim is suggesting is to intentionally degrade the quality of
the hash in order to let it be calculated a tad faster. We could do that
but I doubt it would be a win, especially in systems with lots of buffers.
IIRC, when we put in Jenkins hashing to replace the older homebrew hash
function, it improved performance even though the hash itself was slower.

regards, tom lane


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-20 20:13:37
Message-ID: 5495D871.5060509@BlueTreble.com
Lists: pgsql-hackers

On 12/20/14, 11:51 AM, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
>>> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>>> hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>>>
>>> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?
>
>> I don't think that'd improve anything. Jenkins' hash has quite good
>> mixing properties; I don't believe that the above would improve the
>> quality of the hash.
>
> I think what Jim is suggesting is to intentionally degrade the quality of
> the hash in order to let it be calculated a tad faster. We could do that
> but I doubt it would be a win, especially in systems with lots of buffers.
> IIRC, when we put in Jenkins hashing to replace the older homebrew hash
> function, it improved performance even though the hash itself was slower.

Right. Now that you mention it, I vaguely recall the discussions about changing the hash function to reduce collisions.

I'll still take a look at fast-hash, but it's looking like there may not be anything we can do here unless we change how we identify relation files (combining dbid, tablespace id, fork number and file id, at least for searching). If we had 64-bit hash support then maybe that'd be a significant win, since you wouldn't need to hash at all. But that certainly doesn't seem to be low-hanging fruit to me...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-24 06:27:39
Message-ID: 549A5CDB.50802@BlueTreble.com
Lists: pgsql-hackers

On 12/20/14, 2:13 PM, Jim Nasby wrote:
> On 12/20/14, 11:51 AM, Tom Lane wrote:
>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>>> On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
>>>> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>>>> hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>>>>
>>>> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?
>>
>>> I don't think that'd improve anything. Jenkins' hash has quite good
>>> mixing properties; I don't believe that the above would improve the
>>> quality of the hash.
>>
>> I think what Jim is suggesting is to intentionally degrade the quality of
>> the hash in order to let it be calculated a tad faster. We could do that
>> but I doubt it would be a win, especially in systems with lots of buffers.
>> IIRC, when we put in Jenkins hashing to replace the older homebrew hash
>> function, it improved performance even though the hash itself was slower.
>
> Right. Now that you mention it, I vaguely recall the discussions about changing the hash function to reduce collisions.
>
> I'll still take a look at fash-hash, but it's looking like there may not be anything we can do here unless we change how we identify relation files (combining dbid, tablespace id, fork number and file id, at least for searching). If we had 64bit hash support then maybe that'd be a significant win, since you wouldn't need to hash at all. But that certainly doesn't seem to be low-hanging fruit to me...

I've taken a look at a number of different hash algorithms, testing them initially with SMHasher [1] and then with pgbench.

It's worth noting that a lot of work has been done in the area of hash algo's since we last updated the hash_any algorithm in 2009 [2]. It's probably worth revisiting this every other release or so.

Most of my SMHasher results are at [3]. I suspect SpookyHash is close to what we currently have, so that's what I used for a baseline. I determined that fast-hash (called superfast in the results) was faster than Spooky, but not as fast as CityHash64[4] or xxhash[5]. CityHash and xxhash had similar performance, but xxhash seems to have slightly better characteristics according to SMHasher, and more important, it's in C, not C++. However, CityHash has been replaced by farmhash[7], which might be faster than xxhash.

I did a quick hack at using xxhash ONLY for shared buffer lookup [6]. I've attached the full patch, as well as a version that omits the new files. Note that currently xxhash is set up in such a way that we'd get different results depending on endianness, so we couldn't just drop this in as-is across the board. Of course, there's also the question of whether we'd want to force a REINDEX on hash indexes.
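
The dynahash-facing part of the hack is just a thin adapter, roughly (a
sketch assuming a vendored copy of xxhash.h; the commit linked above is
the real thing):

	#include "xxhash.h"

	/* dynahash-compatible wrapper around XXH32, with a fixed seed */
	static uint32
	xxh32_hash(const void *key, Size keysize)
	{
		return XXH32(key, keysize, 0);
	}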

pgbench results are below. Select-only testing was done first, then read-write. There were several select-only runs on both to warm shared_buffers (set to 512MB for this test, and fsync is off). Database initialized with bin/pgbench -i -F 100 -s 10.

pgbench -S -T10 -c 4 -j 4
master:
tps = 9556.356145 (excluding connections establishing)
tps = 9897.324917 (excluding connections establishing)
tps = 9287.286907 (excluding connections establishing)
tps = 10210.130833 (excluding connections establishing)

XXH32:
tps = 32462.754437 (excluding connections establishing)
tps = 33232.144511 (excluding connections establishing)
tps = 33082.436760 (excluding connections establishing)
tps = 33597.904532 (excluding connections establishing)

pgbench -T10 -c 4 -j 4
master:
tps = 2299.314145 (excluding connections establishing)
tps = 2029.956749 (excluding connections establishing)
tps = 2067.462362 (excluding connections establishing)

XXH32:
tps = 2653.919321 (excluding connections establishing)
tps = 2615.764370 (excluding connections establishing)
tps = 2952.144461 (excluding connections establishing)

Questions:
Do we want to do what we've previously done and cut/paste/modify this code into our repo? Given how rapidly hash algorithms seem to be changing I'm inclined to minimize the amount of effort required to pull a new one in...

Can someone with a big-endian CPU check what the impact of XXH_FORCE_NATIVE_FORMAT 1 vs 0 is? If there's a notable difference we might want to do different things for on-disk vs in-memory hashes.

For that matter, assuming we adopt this, do we want to replace the index hash functions too? SMHasher shows that CityHash is ~50% faster than XXHash for 262144 byte keys. Even SpookyHash is about 17% faster on that key size. So if we want to do something with hash indexes, we'd probably be better off with City or Farm hash than XXHash.

[1] https://code.google.com/p/smhasher/
[2] https://github.com/postgres/postgres/commit/8205258fa675115439017b626c4932d5fefe2ea8
[3] https://github.com/decibel/hash_testing/tree/master/results
[4] https://code.google.com/p/cityhash/
[5] https://code.google.com/p/xxhash/
[6] https://github.com/decibel/postgres/commit/b82e7a3b936b3950b296514f6ee0542132f11e4a
[7] https://code.google.com/p/farmhash/
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachment Content-Type Size
0001-Fugly-hack-to-test-out-XXHash.patch text/plain 36.4 KB
PG-code-only.patch text/plain 3.1 KB

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-24 06:37:21
Message-ID: 549A5F21.6020402@BlueTreble.com
Lists: pgsql-hackers

On 12/24/14, 12:27 AM, Jim Nasby wrote:
> There were several select-only runs on both to warm shared_buffers (set to 512MB for this test, and fsync is off).

BTW, presumably this ~380M database isn't big enough to show any problems with hash collisions, and I'm guessing you'd need way more memory than what I have on this laptop. Might be worth looking into that on a machine with a serious amount of memory.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-24 09:32:16
Message-ID: 20141224093216.GJ23613@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:
> pgbench -S -T10 -c 4 -j 4
> master:
> tps = 9556.356145 (excluding connections establishing)
> tps = 9897.324917 (excluding connections establishing)
> tps = 9287.286907 (excluding connections establishing)
> tps = 10210.130833 (excluding connections establishing)
>
> XXH32:
> tps = 32462.754437 (excluding connections establishing)
> tps = 33232.144511 (excluding connections establishing)
> tps = 33082.436760 (excluding connections establishing)
> tps = 33597.904532 (excluding connections establishing)

FWIW, I don't believe these results for one second. It's quite plausible
that there's a noticeable performance benefit, but a factor of three is
completely unrealistic... Could you please recheck?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-24 16:58:45
Message-ID: 8466.1419440325@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:
>> pgbench -S -T10 -c 4 -j 4
>> master:
>> tps = 9556.356145 (excluding connections establishing)
>> tps = 9897.324917 (excluding connections establishing)
>> tps = 9287.286907 (excluding connections establishing)
>> tps = 10210.130833 (excluding connections establishing)
>>
>> XXH32:
>> tps = 32462.754437 (excluding connections establishing)
>> tps = 33232.144511 (excluding connections establishing)
>> tps = 33082.436760 (excluding connections establishing)
>> tps = 33597.904532 (excluding connections establishing)

> FWIW, I don't believe these results for one second. It's quite plausible
> that there's a noticeable performance benefit, but a factor of three is
> completely unrealistic... Could you please recheck?

A possible theory is that the hash change moved some locks into
different partitions causing a large reduction in contention,
but even then 3X seems unlikely. And of course if that was
the mechanism, the result is still pure luck; other examples
might get worse by the same amount.

regards, tom lane


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-24 19:58:41
Message-ID: 549B1AF1.4020501@BlueTreble.com
Lists: pgsql-hackers

On 12/24/14, 10:58 AM, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:
>>> pgbench -S -T10 -c 4 -j 4
>>> master:
>>> tps = 9556.356145 (excluding connections establishing)
>>> tps = 9897.324917 (excluding connections establishing)
>>> tps = 9287.286907 (excluding connections establishing)
>>> tps = 10210.130833 (excluding connections establishing)
>>>
>>> XXH32:
>>> tps = 32462.754437 (excluding connections establishing)
>>> tps = 33232.144511 (excluding connections establishing)
>>> tps = 33082.436760 (excluding connections establishing)
>>> tps = 33597.904532 (excluding connections establishing)
>
>> FWIW, I don't believe these results for one second. It's quite plausible
>> that there's a noticeable performance benefit, but a factor of three is
>> completely unrealistic... Could you please recheck?
>
> A possible theory is that the hash change moved some locks into
> different partitions causing a large reduction in contention,
> but even then 3X seems unlikely. And of course if that was
> the mechanism, the result is still pure luck; other examples
> might get worse by the same amount.

I must have screwed something up, because if anything I see a small loss for XXH now (but my laptop isn't consistent enough to be sure).

This surprises me given that SMHasher shows XXH to be 50% faster than Spooky for 20 byte keys.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-24 22:28:37
Message-ID: 24160.1419460117@sss.pgh.pa.us
Lists: pgsql-hackers

Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:
> On 12/24/14, 10:58 AM, Tom Lane wrote:
>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>>> FWIW, I don't believe these results for one second. It's quite plausible
>>> that there's a noticeable performance benefit, but a factor of three is
>>> completely unrealistic... Could you please recheck?

>> A possible theory is that the hash change moved some locks into
>> different partitions causing a large reduction in contention,
>> but even then 3X seems unlikely. And of course if that was
>> the mechanism, the result is still pure luck; other examples
>> might get worse by the same amount.

> I must have screwed something up, because if anything I see a small loss for XXH now (but my laptop isn't consistent enough to be sure).

> This surprises me given that SMHasher shows XXH to be 50% faster than
> Spooky for 20 byte keys.

The speed of the hash calculation itself is unlikely to move the needle as
much as 1% either direction, because I've seldom seen any profile in which
hash_any accounted for more than a percent or so of total runtime. What
is far more likely to be causing visible performance changes is downstream
effects, ie changes in the distribution of entries across hash buckets,
causing changes in bucket list search times and/or changes in contention
(for shared memory tables that are locked on a hash-partition basis).
But even then it's hard to credit 3X.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-25 11:07:57
Message-ID: 20141225110757.GD31801@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-12-24 13:58:41 -0600, Jim Nasby wrote:
> This surprises me given that SMHasher shows XXH to be 50% faster than Spooky for 20 byte keys.

Note that there are quite some differences in how smhasher and postgres
use the hash functions. The former exercises the hash function almost
exclusively, while postgres also executes a lot of other code, which
means that the instruction cache and branch predictor are much less
likely to contain relevant entries. That can change the performance
characteristics noticeably. Additionally, there are differences in how
much pipelining can be employed: the likelihood that the CPU is able to
do so is much higher in tight loops.

Also note that in many workloads where you can see tag_hash/hash_any,
part of the cost won't actually be the computation of the hashes
themselves but the cache misses triggered by the hash computation
accessing the data.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services