Quick Links

Re: Hash index todo list item

From:	Brian Hurt <bhurt(at)janestcapital(dot)com>
To:	Kenneth Marshall <ktm(at)rice(dot)edu>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Hash index todo list item
Date:	2007-09-07 14:36:41
Message-ID:	46E161F9.8070705@janestcapital.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Kenneth Marshall wrote:

>I understand that a hash value is a many-to-one mapping. That is the
>point of the flag in the index. The flag means that there is only one
>item in the heap corresponding to that hash value. In this case we
>know that the value in the heap is the correct one and a possibly
>very expensive string comparison can be skipped. Given that the hash
>function is doing its job, almost every string comparison can be skipped.
>How long would it take to compare 1-32K of data? How much CPU usage?
>With this field in place, you only need to check tuple visibility.
>
>

How likely is it that you will get a hash collision, two strings that
are different that will hash to the same value? To avoid this requires
a very large hash key (128 bits, minimum)- otherwise you get into
birthday attack problems. With a 32-bit hash, the likelyhood is greater
than 50% that two strings in a collection of 100,000 will hash to the
same value. With a 64-bit hash, the likelyhood is greater than 50% that
two strings in a collection of 10 billion will has to same value. 10
billion is a large number, but not an unreasonable number, of strings to
want to put into a hash table- and it's exactly this case where the O(1)
cost of hashtables starts being a real win.

Brian

In response to

Re: Hash index todo list item at 2007-09-07 14:01:00 from Kenneth Marshall

Responses

Re: Hash index todo list item at 2007-09-07 14:55:07 from Kenneth Marshall
Re: Hash index todo list item at 2007-09-08 20:21:22 from Kenneth Marshall

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kenneth Marshall	2007-09-07 14:39:13	Re: Hash index todo list item
Previous Message	Mark Mielke	2007-09-07 14:30:30	Re: Hash index todo list item