Re: Hash indexes

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Gregory Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, mark(at)mark(dot)mielke(dot)cc, Bruce Momjian <bruce(at)momjian(dot)us>, Jie Zhang <jzhang(at)greenplum(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash indexes
Date: 2006-08-01 23:41:36
Message-ID: 873bcgxbv3.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hannu Krosing <hannu(at)skype(dot)net> writes:

> Ühel kenal päeval, T, 2006-08-01 kell 10:54, kirjutas Andrew Dunstan:
> > Gregory Stark wrote:
> > >
> > > I looked a while back and was suspicious about the actual hash functions too.
> > > It seemed like a lot of them were vastly suboptimal. That would mean we're
> > > often dealing with mostly empty and mostly full buckets instead of well
> > > distributed hash tables.
> > >
> > >
> > >
> >
> > This is now sounding like a lot of low hanging fruit ... highly
> > performant hash indexed tables could possibly be a very big win.
> >
>
> Are you sure about the badness of our hash functions ?
>
> I just tested and hashtext(text) has about 1.4% of collisions on about
> 120M distinct texts, which is not bad considering thet total space for
> hashes is 4G, meaning that 120M covers itself already 3% of possible
> hash space.

I don't recall exactly what triggered my suspicions, but I think it had more
to do with things like how int4 and int8 were mapped to hash codes and what
the hash function looks like that compresses the hash codes into the actual
bucket. IIRC it's just the low order bits for int8 and it's the actual int4
which then only takes the low order bits. I seem to recall wondering what
would happen if you had, say, a list of /24 network addresses for example.

I didn't actually do any tests though so it's quite possible I missed
something that mitigated the issues I feared.

--
greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2006-08-01 23:45:53 Re: Replication Documentation
Previous Message Tom Lane 2006-08-01 23:16:35 Re: Values list-of-targetlists patch for comments (was Re: [HACKERS] 8.2 features?)