Re: What is wrong with hashed index usage?

Lists: pgsql-hackers
From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Neil Conway" <nconway(at)klamath(dot)dyndns(dot)org>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-22 22:04:22
Message-ID: D90A5A6C612A39408103E6ECDD77B82920CD96@voyager.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: Neil Conway [mailto:nconway(at)klamath(dot)dyndns(dot)org]
> Sent: Monday, April 22, 2002 2:59 PM
> To: Dann Corbit
> Cc: pgsql-hackers(at)postgreSQL(dot)org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> On Mon, 22 Apr 2002 14:15:37 -0700
> "Dann Corbit" <DCorbit(at)connx(dot)com> wrote:
> > From here:
> > http://osdb.sourceforge.net/
> > We find this quote:
> > "For you long-suffering OSDB PostgreSQL users, we offer
> >
> > --postgresql=no_hash_index
> >
> > to work around the hash index problems of OSDB with
> PostgreSQL V7.1 and
> > 7.2. As always, let us know of any problems. May the source be with
> > you!"
> >
> > Does anyone know what the above is all about?
>
> Yes -- search the list archives, or check the PostgreSQL
> docs. This problem
> has been brought up several times: hash indexes deadlock
> under concurrent
> load. A run of pgbench with a reasonably high concurrency
> level (10 or 15)
> produces the problem consistently.
>
> Previously, I had volunteered to fix this, but
>
> (a) I'm busy with the PREPARE/EXECUTE stuff at the moment.
>
> (b) I'm not sure it's worth the investment of time: AFAIK,
> hash indexes don't have many advantages over btrees for
> scalar data.
>
> On the other hand, if someone steps forward with some data on a
> specific advantage that hash indexes have over btrees, I don't
> expect that the concurrency problems should be too difficult to
> solve.

Here is where a hashed index shines:
To find a single item using a key, hashed indexes are enormously faster
than a btree.

That is typically speaking. I have not done performance benchmarks with
PostgreSQL.

In general, hashed indexes are much to be preferred when you are doing
frequent keyed lookups for single items. Hashed indexes are (of course)
completely useless for an ordered scan or for wide ranges of continuous
data.


From: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-22 22:13:48
Message-ID: 20020422181348.3296b6f6.nconway@klamath.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 22 Apr 2002 15:04:22 -0700
"Dann Corbit" <DCorbit(at)connx(dot)com> wrote:
> Here is where a hashed index shines:
> To find a single item using a key, hashed indexes are enormously faster
> than a btree.
>
> That is typically speaking. I have not done performance benchmarks with
> PostgreSQL.

Yes -- but in the benchmarks I've done, the performance different
is not more than 5% (for tables with ~ 600,000 rows, doing lookups
based on a PK with "="). That said, my benchmarks could very well
be flawed, I didn't spend a lot of time on it. If you'd like to
generate some interest in improving hash indexes, I'd like to see
some empirical data supporting your performance claims.

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC


From: Michael Loftis <mloftis(at)wgops(dot)com>
To: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
Cc: Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-22 23:47:59
Message-ID: 3CC4A12F.70700@wgops.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The benchmarks will depend mostly on the depth of the Btree. Hashes
will be markedly faster only in the case(s) where descending into the
tree to produce a matching leaf node would take longer than walking to
the appropriate item in a hash.

Most of the time until the btree gets deep they are nearly equivalent.
When the tree ends up becoming many levels deep it can take longer to
walk than the hash.

Neil Conway wrote:

>On Mon, 22 Apr 2002 15:04:22 -0700
>"Dann Corbit" <DCorbit(at)connx(dot)com> wrote:
>
>>Here is where a hashed index shines:
>>To find a single item using a key, hashed indexes are enormously faster
>>than a btree.
>>
>>That is typically speaking. I have not done performance benchmarks with
>>PostgreSQL.
>>
>
>Yes -- but in the benchmarks I've done, the performance different
>is not more than 5% (for tables with ~ 600,000 rows, doing lookups
>based on a PK with "="). That said, my benchmarks could very well
>be flawed, I didn't spend a lot of time on it. If you'd like to
>generate some interest in improving hash indexes, I'd like to see
>some empirical data supporting your performance claims.
>
>Cheers,
>
>Neil
>


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Michael Loftis <mloftis(at)wgops(dot)com>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-23 23:54:53
Message-ID: 200204232354.g3NNsr002293@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Loftis wrote:
> The benchmarks will depend mostly on the depth of the Btree. Hashes
> will be markedly faster only in the case(s) where descending into the
> tree to produce a matching leaf node would take longer than walking to
> the appropriate item in a hash.
>
> Most of the time until the btree gets deep they are nearly equivalent.
> When the tree ends up becoming many levels deep it can take longer to
> walk than the hash.

And what causes the btree to get deep? Is it just the number of rows in
the index?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Loftis <mloftis(at)wgops(dot)com>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-25 20:05:49
Message-ID: 6100.1019765149@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Loftis <mloftis(at)wgops(dot)com> writes:
> [ on hash vs btree indexing ]
> Most of the time until the btree gets deep they are nearly equivalent.
> When the tree ends up becoming many levels deep it can take longer to
> walk than the hash.

Maybe. I've just completed a simple benchmark of btree vs hash indexes
as implemented in Postgres, and I can't see any advantage.

Using current sources on Red Hat Linux 7.2, I built a simple test table
containing one integer column, and filled it with 16 million random
integers generated by int4(1000000000 * random()). With a btree index,
"explain analyze select * from foo where f1 = 314888455" (matching a
single row of the table) took about 22 msec on first try (nothing in
cache), and subsequent repetitions about 0.11 msec. With a hash index,
the first try took about 28 msec and repetitions about 0.15 msec.
Moreover, the hash index was a whole lot bigger: main table size 674
meg, btree 301 meg, hash 574 meg, which possibly offers part of the
explanation for the greater access time.

I would have tried a larger test case, but this one already taxed
my patience: it took 36 hours to build the hash index (vs 19 minutes
for the btree index). It looks like hash index build has an O(N^2)
performance curve --- the thing had 100 meg of hash index built within
an hour of starting, but got slower and slower after that.

In short, lack of support for concurrent operations is hardly the
worst problem with Postgres' hash indexes. If you wanna fix 'em,
be my guest ... but I think I shall spend my time elsewhere.

regards, tom lane


From: Michael Loftis <mloftis(at)wgops(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-25 20:19:33
Message-ID: 3CC864D5.6070809@wgops.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

>Michael Loftis <mloftis(at)wgops(dot)com> writes:
>
>>[ on hash vs btree indexing ]
>>Most of the time until the btree gets deep they are nearly equivalent.
>>When the tree ends up becoming many levels deep it can take longer to
>>walk than the hash.
>>
>
>Maybe. I've just completed a simple benchmark of btree vs hash indexes
>as implemented in Postgres, and I can't see any advantage.
>
>Using current sources on Red Hat Linux 7.2, I built a simple test table
>containing one integer column, and filled it with 16 million random
>integers generated by int4(1000000000 * random()). With a btree index,
>"explain analyze select * from foo where f1 = 314888455" (matching a
>single row of the table) took about 22 msec on first try (nothing in
>cache), and subsequent repetitions about 0.11 msec. With a hash index,
>the first try took about 28 msec and repetitions about 0.15 msec.
>Moreover, the hash index was a whole lot bigger: main table size 674
>meg, btree 301 meg, hash 574 meg, which possibly offers part of the
>explanation for the greater access time.
>
>I would have tried a larger test case, but this one already taxed
>my patience: it took 36 hours to build the hash index (vs 19 minutes
>for the btree index). It looks like hash index build has an O(N^2)
>performance curve --- the thing had 100 meg of hash index built within
>an hour of starting, but got slower and slower after that.
>
>In short, lack of support for concurrent operations is hardly the
>worst problem with Postgres' hash indexes. If you wanna fix 'em,
>be my guest ... but I think I shall spend my time elsewhere.
>
I said can, no will. The particular btree implementation dictates what
sorts of operations become bogged down. I do agree that in pretty much
every case, a well implemented btree will be better than a hash though.
I don't know about PGs implementation but since I assume oyu all
inhereted atleast part of it from the berkely boys you should be in very
solid form.

>
> regards, tom lane
>


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Loftis <mloftis(at)wgops(dot)com>, Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-25 20:38:00
Message-ID: 200204252038.g3PKc0h15943@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Nice report. I think we should start thinking of hiding the hash option
from users, or warn them more forcefully, rather than hold it out as a
possible option for them.

People think hash is best for equals-only queries, and btree for others,
and we can now see this clearly isn't the case.

---------------------------------------------------------------------------

Tom Lane wrote:
> Michael Loftis <mloftis(at)wgops(dot)com> writes:
> > [ on hash vs btree indexing ]
> > Most of the time until the btree gets deep they are nearly equivalent.
> > When the tree ends up becoming many levels deep it can take longer to
> > walk than the hash.
>
> Maybe. I've just completed a simple benchmark of btree vs hash indexes
> as implemented in Postgres, and I can't see any advantage.
>
> Using current sources on Red Hat Linux 7.2, I built a simple test table
> containing one integer column, and filled it with 16 million random
> integers generated by int4(1000000000 * random()). With a btree index,
> "explain analyze select * from foo where f1 = 314888455" (matching a
> single row of the table) took about 22 msec on first try (nothing in
> cache), and subsequent repetitions about 0.11 msec. With a hash index,
> the first try took about 28 msec and repetitions about 0.15 msec.
> Moreover, the hash index was a whole lot bigger: main table size 674
> meg, btree 301 meg, hash 574 meg, which possibly offers part of the
> explanation for the greater access time.
>
> I would have tried a larger test case, but this one already taxed
> my patience: it took 36 hours to build the hash index (vs 19 minutes
> for the btree index). It looks like hash index build has an O(N^2)
> performance curve --- the thing had 100 meg of hash index built within
> an hour of starting, but got slower and slower after that.
>
> In short, lack of support for concurrent operations is hardly the
> worst problem with Postgres' hash indexes. If you wanna fix 'em,
> be my guest ... but I think I shall spend my time elsewhere.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-25 21:04:44
Message-ID: 20020425170444.604ae4af.nconway@klamath.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 25 Apr 2002 16:38:00 -0400 (EDT)
"Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
> Nice report. I think we should start thinking of hiding the hash option
> from users, or warn them more forcefully, rather than hold it out as a
> possible option for them.

Why not do something Peter E. suggested earlier: if the functionality of
hash indexes is a subset of that offered by btrees, it might be good to
remove the hash index code and treat USING 'hash' as an alias for
USING 'btree'?

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Loftis <mloftis(at)wgops(dot)com>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-25 21:14:43
Message-ID: 7406.1019769283@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Loftis <mloftis(at)wgops(dot)com> writes:
> I don't know about PGs implementation but since I assume oyu all
> inhereted atleast part of it from the berkely boys you should be in very
> solid form.

One would have thought so, wouldn't one? AFAIK the hash index code is
lock-stock-and-barrel straight from Berkeley; we've not touched it
except for minor tweaking (portability issues and such).

I spent a little time reading the code whilst I was waiting for the hash
index build to complete, and was kind of wondering why it bothers to
maintain bitmaps of free space. Seems like it could just keep all the
free pages chained together in a list, for zero overhead cost, and skip
the bitmaps. It locks the metapage anyway when allocating or freeing
a page, so keeping the freelist head pointer there doesn't seem like it
would have any performance penalty...

<<whacks self on head>> NO <<whack>> I am not getting involved with the
hash index code. I don't think it's worth our trouble.

regards, tom lane


From: Michael Loftis <mloftis(at)wgops(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-25 21:33:57
Message-ID: 3CC87645.6030508@wgops.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The idea behind hte bitmap (correct me if I'm wrong) is that when larger
allocationsa re asked for they can be quickly found and there is no need
to maintain the coalescing of smaller adjacent blocks into larger ones.

I don't know if pg does this or not, but thats the only sane reason I
can come up with.

*quietly installs an rm -rf trigger if tom does any I/O on the has files
outside of the compiler* This is for your own safety Tom... Well that
and our amusement.... :)

Tom Lane wrote:

>Michael Loftis <mloftis(at)wgops(dot)com> writes:
>
>> I don't know about PGs implementation but since I assume oyu all
>>inhereted atleast part of it from the berkely boys you should be in very
>>solid form.
>>
>
>One would have thought so, wouldn't one? AFAIK the hash index code is
>lock-stock-and-barrel straight from Berkeley; we've not touched it
>except for minor tweaking (portability issues and such).
>
>I spent a little time reading the code whilst I was waiting for the hash
>index build to complete, and was kind of wondering why it bothers to
>maintain bitmaps of free space. Seems like it could just keep all the
>free pages chained together in a list, for zero overhead cost, and skip
>the bitmaps. It locks the metapage anyway when allocating or freeing
>a page, so keeping the freelist head pointer there doesn't seem like it
>would have any performance penalty...
>
><<whacks self on head>> NO <<whack>> I am not getting involved with the
>hash index code. I don't think it's worth our trouble.
>
> regards, tom lane
>


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-26 02:25:06
Message-ID: 200204260225.g3Q2P6011728@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway wrote:
> On Thu, 25 Apr 2002 16:38:00 -0400 (EDT)
> "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> >
> > Nice report. I think we should start thinking of hiding the hash option
> > from users, or warn them more forcefully, rather than hold it out as a
> > possible option for them.
>
> Why not do something Peter E. suggested earlier: if the functionality of
> hash indexes is a subset of that offered by btrees, it might be good to
> remove the hash index code and treat USING 'hash' as an alias for
> USING 'btree'?

I hate to do that because it makes people think something special is
happening for hash, but it isn't. We could throw an elog(NOTICE)
stating that hash is not recommended and btree is faster, or something
like that.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-04-26 02:32:14
Message-ID: 20276.1019788334@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I hate to do that because it makes people think something special is
> happening for hash, but it isn't. We could throw an elog(NOTICE)
> stating that hash is not recommended and btree is faster, or something
> like that.

I think the only action called for is some improvement in the
documentation. Right now the docs are not honest about the state
of any of the non-btree index methods. Ain't none of 'em ready
for prime time IMHO. GIST is the only one that's getting any
development attention --- and probably the only one that deserves
it, given limited resources. Hash offers no compelling advantage
over btree AFAICS, and rtree is likewise dominated by GIST (or would
be, if we shipped rtree-equivalent GIST opclasses in the standard
distribution).

I do not like "throw an elog" as a substitute for documentation.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 03:25:23
Message-ID: 200206210325.g5L3PN529603@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I hate to do that because it makes people think something special is
> > happening for hash, but it isn't. We could throw an elog(NOTICE)
> > stating that hash is not recommended and btree is faster, or something
> > like that.
>
> I think the only action called for is some improvement in the
> documentation. Right now the docs are not honest about the state
> of any of the non-btree index methods. Ain't none of 'em ready
> for prime time IMHO. GIST is the only one that's getting any
> development attention --- and probably the only one that deserves
> it, given limited resources. Hash offers no compelling advantage
> over btree AFAICS, and rtree is likewise dominated by GIST (or would
> be, if we shipped rtree-equivalent GIST opclasses in the standard
> distribution).
>
> I do not like "throw an elog" as a substitute for documentation.

OK, documentation changes for hash attached. Do we need to also throw
a elog(WARNING) about its use? I don't think everyone is going to see
these documentation changes, and I hate to add it to the FAQ.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Attachment Content-Type Size
unknown_filename text/plain 2.8 KB

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, <mloftis(at)wgops(dot)com>, <DCorbit(at)connx(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 05:45:46
Message-ID: Pine.GSO.4.44.0206210842150.5193-100000@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We have documented current GiST interface but in russian.
http://www.sai.msu.su/~megera/postgres/gist/doc/gist-inteface-r.shtml
We have no time to translate it to english :-)
I'd appreciate if somebody could help us in documentation -

Oleg
On Thu, 20 Jun 2002, Bruce Momjian wrote:

> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > I hate to do that because it makes people think something special is
> > > happening for hash, but it isn't. We could throw an elog(NOTICE)
> > > stating that hash is not recommended and btree is faster, or something
> > > like that.
> >
> > I think the only action called for is some improvement in the
> > documentation. Right now the docs are not honest about the state
> > of any of the non-btree index methods. Ain't none of 'em ready
> > for prime time IMHO. GIST is the only one that's getting any
> > development attention --- and probably the only one that deserves
> > it, given limited resources. Hash offers no compelling advantage
> > over btree AFAICS, and rtree is likewise dominated by GIST (or would
> > be, if we shipped rtree-equivalent GIST opclasses in the standard
> > distribution).
> >
> > I do not like "throw an elog" as a substitute for documentation.
>
> OK, documentation changes for hash attached. Do we need to also throw
> a elog(WARNING) about its use? I don't think everyone is going to see
> these documentation changes, and I hate to add it to the FAQ.
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 13:25:38
Message-ID: 18837.1024665938@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> <para>
> ! Because of the limited utility of hash indexes, a B-tree index
> ! should generally be preferred over a hash index. We do not have
> ! sufficient evidence that hash indexes are actually faster than
> ! B-trees even for <literal>=</literal> comparisons. Moreover,
> ! hash indexes require coarser locks; see <xref
> ! linkend="locking-indexes">.
> </para>
> </note>
> </para>
> --- 181,189 ----
> </synopsis>
> <note>
> <para>
> ! Testing has shown that hash indexes are slower than btree indexes,
> ! and the size and build time for hash indexes is much worse. For
> ! these reasons, hash index use is discouraged.

This change strikes me as a step backwards. The existing wording tells
the truth; the proposed revision removes the facts in favor of a blanket
assertion that is demonstrably false.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 13:32:19
Message-ID: 200206211332.g5LDWJY10224@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > <para>
> > ! Because of the limited utility of hash indexes, a B-tree index
> > ! should generally be preferred over a hash index. We do not have
> > ! sufficient evidence that hash indexes are actually faster than
> > ! B-trees even for <literal>=</literal> comparisons. Moreover,
> > ! hash indexes require coarser locks; see <xref
> > ! linkend="locking-indexes">.
> > </para>
> > </note>
> > </para>
> > --- 181,189 ----
> > </synopsis>
> > <note>
> > <para>
> > ! Testing has shown that hash indexes are slower than btree indexes,
> > ! and the size and build time for hash indexes is much worse. For
> > ! these reasons, hash index use is discouraged.
>
> This change strikes me as a step backwards. The existing wording tells
> the truth; the proposed revision removes the facts in favor of a blanket
> assertion that is demonstrably false.

OK, which part of is "demonstrably false"? I think the old "should
generally be preferred" is too vague. No one has come up with a case
where hash has shown to be faster, and a lot of cases where it is slower.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 14:47:20
Message-ID: 19635.1024670840@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> OK, which part of is "demonstrably false"? I think the old "should
> generally be preferred" is too vague. No one has come up with a case
> where hash has shown to be faster, and a lot of cases where it is slower.

The only thing I recall being lots worse is initial index build.

I have not tested it much, but I would expect that hash holds up better
in the presence of many equal keys than btree does...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 15:03:25
Message-ID: 200206211503.g5LF3PP20126@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > OK, which part of is "demonstrably false"? I think the old "should
> > generally be preferred" is too vague. No one has come up with a case
> > where hash has shown to be faster, and a lot of cases where it is slower.
>
> The only thing I recall being lots worse is initial index build.
>
> I have not tested it much, but I would expect that hash holds up better
> in the presence of many equal keys than btree does...

I remember three problems: build time, index size, and concurrency
problems. I was wondering about the equal key case myself, and assumed
hash may be a win there, but with the concurrency problems, is that even
possible?

OK, I have reworded it. Is that better? How about an elog(NOTICE) for
hash use?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Attachment Content-Type Size
unknown_filename text/plain 1.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 15:47:17
Message-ID: 20186.1024674437@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I remember three problems: build time, index size, and concurrency
> problems. I was wondering about the equal key case myself, and assumed
> hash may be a win there, but with the concurrency problems, is that even
> possible?

Sure. Many-equal-keys are a problem for btree whether you have any
concurrency or not.

> OK, I have reworded it. Is that better?

It's better, but you've still discarded the original's explicit mention
of concurrency problems. Why do you want to remove information?

> How about an elog(NOTICE) for hash use?

I don't think that's appropriate.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 16:51:53
Message-ID: 200206211651.g5LGprw28745@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I remember three problems: build time, index size, and concurrency
> > problems. I was wondering about the equal key case myself, and assumed
> > hash may be a win there, but with the concurrency problems, is that even
> > possible?
>
> Sure. Many-equal-keys are a problem for btree whether you have any
> concurrency or not.
>
> > OK, I have reworded it. Is that better?
>
> It's better, but you've still discarded the original's explicit mention
> of concurrency problems. Why do you want to remove information?

OK, concurrency added. How is that?

>
> > How about an elog(NOTICE) for hash use?
>
> I don't think that's appropriate.

I was thinking of this during CREATE INDEX ... hash:

NOTICE: Hash index use is discouraged. See the CREATE INDEX
reference page for more information.

Does anyone else like/dislike that?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Attachment Content-Type Size
unknown_filename text/plain 2.1 KB

From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 20:09:19
Message-ID: 1024690164.751.54.camel@lerlaptop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2002-06-21 at 11:51, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

>
> >
> > > How about an elog(NOTICE) for hash use?
> >
> > I don't think that's appropriate.
>
> I was thinking of this during CREATE INDEX ... hash:
>
> NOTICE: Hash index use is discouraged. See the CREATE INDEX
> reference page for more information.
>
> Does anyone else like/dislike that?
I dislike it. Some clients/dba's will wonder why we even have them.

Why should we bug the DBA on EVERY index that is a hash?

I know I personally hate the FreeBSD linker warnings about certain
functions, and don't like that precedent.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 20:12:23
Message-ID: 200206212012.g5LKCNB14956@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Larry Rosenman wrote:
> On Fri, 2002-06-21 at 11:51, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
> >
> > >
> > > > How about an elog(NOTICE) for hash use?
> > >
> > > I don't think that's appropriate.
> >
> > I was thinking of this during CREATE INDEX ... hash:
> >
> > NOTICE: Hash index use is discouraged. See the CREATE INDEX
> > reference page for more information.
> >
> > Does anyone else like/dislike that?
> I dislike it. Some clients/dba's will wonder why we even have them.
>
> Why should we bug the DBA on EVERY index that is a hash?
>
> I know I personally hate the FreeBSD linker warnings about certain
> functions, and don't like that precedent.

OK, that's enough of a negative vote for me. So you feel the
documentation change is enough? Tom thinks so too.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, mloftis(at)wgops(dot)com, DCorbit(at)connx(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 20:13:55
Message-ID: 1024690436.751.56.camel@lerlaptop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2002-06-21 at 15:12, Bruce Momjian wrote:
> Larry Rosenman wrote:
> > On Fri, 2002-06-21 at 11:51, Bruce Momjian wrote:
> > > Tom Lane wrote:
> > > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> >
> > >
> > > >
> > > > > How about an elog(NOTICE) for hash use?
> > > >
> > > > I don't think that's appropriate.
> > >
> > > I was thinking of this during CREATE INDEX ... hash:
> > >
> > > NOTICE: Hash index use is discouraged. See the CREATE INDEX
> > > reference page for more information.
> > >
> > > Does anyone else like/dislike that?
> > I dislike it. Some clients/dba's will wonder why we even have them.
> >
> > Why should we bug the DBA on EVERY index that is a hash?
> >
> > I know I personally hate the FreeBSD linker warnings about certain
> > functions, and don't like that precedent.
>
> OK, that's enough of a negative vote for me. So you feel the
> documentation change is enough? Tom thinks so too.
Yup.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749