Quick Links

Re: Hash index todo list item

From:	Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To:	Kenneth Marshall <ktm(at)rice(dot)edu>
Cc:	Brian Hurt <bhurt(at)janestcapital(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Hash index todo list item
Date:	2007-09-08 22:56:23
Message-ID:	46E32897.8080106@mark.mielke.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Kenneth Marshall wrote:
> The value of storing the
> actual value, possibly as an option, is that the key check can be done
> in the index without requiring a heap lookup to check the actual value
> which would be a benefit for a unique index. I agree that supporting
> variable length data would complicate the index and reduce performance.
> I am not willing to assume that ~1 entry per hash bucket is necessarily
> what we want, at least without some testing.
I think that if the case of >1 entry per hash becomes common enough to
be significant, and the key is stored in the hash, that a btree will
perform equal or better, and there is no point in pursuing such a hash
index model. This is where we are today.

> It seems reasonable that
> with the performance differences between L1/L2/L3 cache, main memory,
> and the disk subsystem a higher load factor would result in better
> overall system performance by reducing cache line misses and improving
> hardware pre-fetch efficiency.
If the key is stored, all of these factors likely favor the btree format
over the hash format. Again, this is where we are today.

> Along with the hypothetical performance
> wins, the hash index space efficiency would be improved by a similar
> factor. Obviously, all of these ideas would need to be tested in
> various workload environments. In the large index arena, 10^6 to 10^9
> keys and more, space efficiency will help keep the index manageable
> in todays system memories.
>
Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).
> Please keep the ideas and comments coming. I am certain that a synthesis
> of them will provide an implementation with the performance characteristics
> that we are seeking.
>
The subject interests me. :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

In response to

Re: Hash index todo list item at 2007-09-08 21:49:00 from Kenneth Marshall

Responses

Re: Hash index todo list item at 2007-09-10 08:56:52 from Jens-Wolfhard Schicke
Re: Hash index todo list item at 2007-09-10 15:04:44 from Martijn van Oosterhout

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Florian G. Pflug	2007-09-09 00:26:02	Re: WIP patch for latestCompletedXid method of computing snapshot xmax
Previous Message	apoc9009	2007-09-08 22:46:49	Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)