Re: [PATCH]-hash index improving

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Xiao Meng" <mx(dot)cogito(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Kenneth Marshall" <ktm(at)rice(dot)edu>
Subject: Re: [PATCH]-hash index improving
Date: 2008-07-18 05:37:10
Message-ID: 10542.1216359430@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:
> The real question now has been listed above; why are hash index
> queries, including this patch, no better than b-tree even for straight
> equality comparisons?

Well, the theoretical advantage is that a hash probe is O(1) while a
btree probe is O(log N) (ignoring a lot of possible nonoptimalities
on each side of course). So you'd need a test case large enough for
log N to be noticeably more than 1, which I think means in practice that
the upper levels of the btree don't fit in memory. So small test cases
aren't going to show any improvement.

I think the historical problem with our hash implementation has been
that it was terribly space-inefficient, because of the fixed (and large)
bucket size. A dense btree index can probably beat a sparse hash up to
exceedingly large values of N. It's not clear to me how much the patch
at hand does to fix that.

The patch might introduce a new problem, which is extra heap visits
caused by the lossiness of the hash value. Again, that doesn't hurt
much ideally, but maybe you chanced to use a test case where it does
hurt. It would be worth sticking in a bit of debug instrumentation to
see whether the number of failed heap visits is significant.

In any case, the reported test seems to be dealing with index sizes in
the tens-of-megabytes range, and it doesn't surprise me a whole lot that
an O(log N) penalty isn't showing up there. Try a test case where the
index size is significantly larger than your RAM.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2008-07-18 06:01:06 Re: TABLE-function patch vs plpgsql
Previous Message Jonah H. Harris 2008-07-18 05:21:37 Re: [PATCH]-hash index improving