Re: [PATCH]-hash index improving

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: "Xiao Meng" <mx(dot)cogito(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Kenneth Marshall" <ktm(at)rice(dot)edu>
Subject: Re: [PATCH]-hash index improving
Date: 2008-07-17 23:37:58
Message-ID: D425483C2C5C9F49B5B7A41F8944154701000F81@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

-----Original Message-----

> From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Simon Riggs
> Sent: Thursday, July 17, 2008 4:10 PM

> To: Jonah H. Harris

> Cc: Xiao Meng; pgsql-hackers(at)postgresql(dot)org; Kenneth Marshall

> Subject: Re: [HACKERS] [PATCH]-hash index improving

>

>

> On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote:

> > I'm not really seeing any performance improvements over b-tree.

>

> I'd like to see a theoretical analysis on the benefit case before we

> spend too many teraflops on investigation.

>

> In which cases do we expect that hash indexes will beat btrees?

Large table unique index equality search should be very fast with hashed
index (and the only place where any advantage will be seen). Hashed
indexes are useless for any search besides equality and gain more and
more when the levels of the b-tree index increase.

Here is a hash index lookup that is frightfully fast:
http://www.corpit.ru/mjt/tinycdb.html

It's a constant database, but the file format might be worth
examination. Here is a quickie definition of the CDB format:
http://cr.yp.to/cdb/cdb.txt

I think that the problem with hashed indexes tends to be the blocking of
index pages on disk. For instance, the FastDB/GigaBase author was able
to make FastDB memory based hash indexes that are faster than the memory
based b-tree versions, but not for the disk based GigaBase database.
The only way to get better performance from hash based indexes is to
read fewer index pages than if a tree-based index were used. So I think
that the scheme used to create the index pages is the focus to make them
worthwhile.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2008-07-18 00:10:26 Re: [PATCH]-hash index improving
Previous Message Tom Lane 2008-07-17 23:13:03 TABLE-function patch vs plpgsql