Re: Making hash indexes worthwhile

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making hash indexes worthwhile
Date: 2009-10-31 00:02:59
Message-ID: f67928030910301702k5a8c5fb9s77380aeae42cbe1f@mail.gmail.com
Lists: pgsql-hackers

On Sun, Oct 4, 2009 at 7:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> I see that the docs were recently changed from discouraging hash
>> indexes both because there were no known uses in which they were
>> meaningfully better than btree, and because they aren't recoverable;
>> to now just discouraging them because they are not recoverable.  Does
>> that mean there are now uses in which hash indexes are provably better
>> than btree if one is willing to overlook the lack of recoverability?
>> If so, what are those situations?
>
> One reason is that you can index values that are too big for btrees;
> since hash indexes now store only the hash codes, they don't have a hard
> length limit on the underlying value.  I'm not sure how useful that
> really is in practice, but it's at least an argument for considering
> them in certain cases.

I haven't tested this situation at all yet, as I am hoping to show that
there is a substantially less restrictive use case under which they can
be made to work well.

>
>> I've played around a bit with hash indexes, and it seems to me that
>> making them generally worthwhile will take (at least) reducing or
>> entirely doing away with the heavy-weight locks.
>
> Concurrency is really the least of the issues for hash indexes.  So far
> it's not clear that they're fast enough even in sequential use ...

Hash index performance came up again in another thread, so I wanted to
come back to this and report my findings.

I've done testing using select-only statements with either a btree or a
hash index on pgbench_accounts.aid, using only one active backend. For
all tests, CPU was the bottleneck, running at essentially 100%.

I used 600MB of shared_buffers and -s 30. I figure that if each
table-page fetch is a physical IO, then that will dominate the
performance and the index's nature will be irrelevant, so if the hash
index has a strength, it will show best when the table fits in memory.
(The alternative would be a table so large that not even the btree's
index pages fit in memory, causing IO for the index pages as well as
the table's, but I didn't have the patience or an adequate IO subsystem
to set up a test for that.)
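
For reference, the setup was roughly the following (commands
reconstructed; the database and index names are just illustrative):

  pgbench -i -s 30 bench    # ~3 million row pgbench_accounts,
                            # small enough to fit in 600MB shared_buffers

  -- for the hash runs; the btree runs just use pgbench's primary-key
  -- index on aid
  CREATE INDEX pgbench_accounts_aid_hash
      ON pgbench_accounts USING hash (aid);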

With pgbench -S (select only, one tuple at a time), the overhead of
communication between pgbench and the backend dominated the run time,
and moving the select loop into a pl/pgsql routine did not help much.
So I made a temp table with 10,000 randomly selected ids (from 1 to
3,000,000) and used it as the outer member to drive a nested loop join
against the pgbench_accounts table, with a custom query of

select sum(abalance) from pgbench_accounts, foo where aid=random_id
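
Roughly, that looked like this (a reconstruction; the driver table is
shown as an ordinary table so it is visible to the connections pgbench
opens for the -f script):

  CREATE TABLE foo AS
      SELECT (1 + floor(random() * 3000000))::int AS random_id
      FROM generate_series(1, 10000);
  ANALYZE foo;

  -- nested_loop.sql, run as: pgbench -T 600 -f nested_loop.sql
  SELECT sum(abalance) FROM pgbench_accounts, foo WHERE aid = random_id;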

The HEAD code could do 16-17 tps (160,000 to 170,000 inner-loop tuple
fetches per second) using the hash index, and 18 tps with the btree.
(Occasionally the rank order would be reversed over many alternating
runs of reindexing, vacuuming, and then doing pgbench -T 600 -f
nested_loop.sql.) When I disabled the lmgr page locking of the hash
index, by making it think it was a temporary relation, the hash index
jumped to 24 tps.

If I make all the aid values in pgbench_accounts even, and load "foo"
with only odd ids so that no rows are actually found, then the speed of
the hash index jumps to about 48 tps. So about half the remaining CPU
usage with the hash index pertains to dealing with the rows once they
are found, not to using the index to try to find them. (I don't know
how much of this time is spent rechecking the qual, which btree doesn't
need to do.)
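
The all-misses variant was set up roughly like this (again a
reconstruction; rebuilding pgbench_accounts with only even aids is not
shown):

  -- only odd ids in the driver table, so the index probe never finds a row
  CREATE TABLE foo AS
      SELECT (1 + 2 * floor(random() * 1500000))::int AS random_id
      FROM generate_series(1, 10000);
  ANALYZE foo;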

So my conclusions are:

1) btree indexes are so good at equality lookups that there may not be
much room for hash indexes to improve on them by a substantial amount.

2) Heavy-weight locks are called that for a reason: they use a lot of
CPU even without contention.

3) The CPU usage of the hash-index code proper is quite small; more
time is spent in the heavy-weight page locks (taken only for hash
indexes, but not part of the hash index code itself) and in executor
code common to all index methods than in the hash index code. Based on
this, I don't think that making the bucket size smaller than the 8K
block size is going to make much difference to hash index performance.

Thinking about how to get rid of heavy-weight locks and add WAL, I was
wondering if we couldn't get rid of bucket splits altogether, requiring
the bucket count to be set immutably at build time based on the
estimated eventual max/equilibrium size of the table. Maybe this would
render hash indexes somewhat of a niche feature, but probably much less
so than they currently are. I think that getting rid of dynamic bucket
splits would make it easier to get rid of heavy-weight locks, and also
much easier to implement WAL: WAL-logging the addition of an overflow
page seems about the same as a btree node split, while WAL-logging a
bucket split, when the bucket potentially has overflow pages in it,
seems much harder.
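
Concretely, I am imagining something like the following hypothetical
syntax (there is no "buckets" storage parameter today; it is only meant
to illustrate the idea):

  CREATE INDEX pgbench_accounts_aid_hash
      ON pgbench_accounts USING hash (aid)
      WITH (buckets = 4194304);  -- hypothetical: bucket count fixed at
                                 -- build time, never split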

Cheers,

Jeff
