Re: Hash index todo list item

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Brian Hurt <bhurt(at)janestcapital(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 15:15:26
Message-ID: 20070907151526.GK19403@it.is.rice.edu
Lists: pgsql-hackers

On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>>>>
>>> How likely is it that you will get a hash collision, i.e. two different
>>> strings that hash to the same value? Avoiding this requires a very
>>> large hash key (128 bits, minimum); otherwise you run into birthday
>>> attack problems. With a 32-bit hash, the likelihood is greater than 50%
>>> that two strings in a collection of 100,000 will hash to the same value.
>>> With a 64-bit hash, the likelihood is greater than 50% that two strings
>>> in a collection of 10 billion will hash to the same value. 10 billion is
>>> a large, but not unreasonable, number of strings to want to put into a
>>> hash table, and it is exactly this case where the O(1) cost of hash
>>> tables starts being a real win.
>>>
>>> Brian
>>>
>>>
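The figures quoted above follow from the birthday bound: for n keys hashed uniformly into d = 2^bits values, the chance of at least one collision is approximately 1 - exp(-n(n-1)/(2d)), and it reaches 50% near n = sqrt(2 ln 2 * d). The sketch below is only an illustration of that arithmetic; it assumes an ideal, uniformly distributed hash function, and the function names are hypothetical.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that at least two of n uniformly random
    `bits`-bit hash values are equal (birthday bound).

    Assumes an ideal hash: P ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    d = 2.0 ** bits
    # -expm1(-x) == 1 - exp(-x), computed accurately even for tiny x.
    return -math.expm1(-n * (n - 1) / (2.0 * d))


def items_for_half_chance(bits: int) -> float:
    """Approximate number of keys at which the collision chance hits 50%:
    n ~= sqrt(2 * ln 2 * 2**bits)."""
    return math.sqrt(2.0 * math.log(2.0) * 2.0 ** bits)


if __name__ == "__main__":
    print(collision_probability(100_000, 32))         # about 0.69
    print(collision_probability(10_000_000_000, 64))  # about 0.93
    print(items_for_half_chance(32))                   # roughly 77,000
    print(items_for_half_chance(64))                   # roughly 5 billion
```

Both quoted cases land above 50%. The even-odds points (about 77,000 keys for 32 bits and about 5 billion for 64 bits) show why Brian's numbers are, if anything, conservative.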
>> Yes, there is a non-negligible chance of collision (in a DB, is there
>> any chance that is negligible? :) ), so the values must be checked
>> against the actual tuples. The win is the much smaller index size and
>> having to check only a small fraction of the actual tuples.
>>
>>
>>
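The scheme Ken describes, where the index keeps only hash codes and every candidate is rechecked against the real tuple, can be sketched as follows. This is a toy in-memory model, not PostgreSQL's hash index code. `HashOnlyIndex`, `hash32`, the `heap` mapping, and the tuple ids are all hypothetical names, and the 32-bit hash is assumed here to be a truncated BLAKE2b digest purely for illustration.

```python
import hashlib
from collections import defaultdict


def hash32(key: str) -> int:
    """Hypothetical 32-bit hash: a 4-byte BLAKE2b digest read as an int."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


class HashOnlyIndex:
    """Toy index that stores only (hash code -> tuple ids), never the keys."""

    def __init__(self, hash_fn=hash32):
        self.hash_fn = hash_fn
        self.buckets = defaultdict(list)  # hash code -> list of tuple ids

    def insert(self, key: str, tid: int) -> None:
        self.buckets[self.hash_fn(key)].append(tid)

    def lookup(self, key: str, heap: dict) -> list:
        """Return the tuple ids whose stored value really equals `key`.

        The hash code only narrows the search to a few candidates; each
        candidate is rechecked against the actual tuple in `heap`, so a
        hash collision costs an extra comparison but never a wrong answer.
        """
        candidates = self.buckets.get(self.hash_fn(key), [])
        return [tid for tid in candidates if heap[tid] == key]


if __name__ == "__main__":
    heap = {0: "apple", 1: "banana", 2: "cherry", 3: "apple"}
    index = HashOnlyIndex()
    for tid, value in heap.items():
        index.insert(value, tid)
    print(index.lookup("apple", heap))  # [0, 3]
```

Passing a degenerate `hash_fn` (for example one that always returns 0) makes every key collide. Lookups still return correct results, just more slowly, which is the point of checking against the actual values.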
>
> Ah, OK, I misunderstood you. I thought you were saying that the hash
> values would need to be unique and that you wouldn't check the original
> values at all. My bad.
>
> Brian
>
No, you were correct. I misstated it originally, and you and Mark both
pointed out my mistake.

Regards,
Ken
