Re: Hash index todo list item

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Brian Hurt <bhurt(at)janestcapital(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-08 22:56:23
Message-ID: 46E32897.8080106@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kenneth Marshall wrote:
> The value of storing the
> actual value, possibly as an option, is that the key check can be done
> in the index without requiring a heap lookup to check the actual value
> which would be a benefit for a unique index. I agree that supporting
> variable length data would complicate the index and reduce performance.
> I am not willing to assume that ~1 entry per hash bucket is necessarily
> what we want, at least without some testing.
I think that if the case of >1 entry per hash becomes common enough to
be significant, and the key is stored in the hash, that a btree will
perform equal or better, and there is no point in pursuing such a hash
index model. This is where we are today.

> It seems reasonable that
> with the performance differences between L1/L2/L3 cache, main memory,
> and the disk subsystem a higher load factor would result in better
> overall system performance by reducing cache line misses and improving
> hardware pre-fetch efficiency.
If the key is stored, all of these factors likely favor the btree format
over the hash format. Again, this is where we are today.

> Along with the hypothetical performance
> wins, the hash index space efficiency would be improved by a similar
> factor. Obviously, all of these ideas would need to be tested in
> various workload environments. In the large index arena, 10^6 to 10^9
> keys and more, space efficiency will help keep the index manageable
> in todays system memories.
>
Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).
> Please keep the ideas and comments coming. I am certain that a synthesis
> of them will provide an implementation with the performance characteristics
> that we are seeking.
>
The subject interests me. :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian G. Pflug 2007-09-09 00:26:02 Re: WIP patch for latestCompletedXid method of computing snapshot xmax
Previous Message apoc9009 2007-09-08 22:46:49 Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)