Hash index todo list item

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Hash index todo list item
Date: 2007-09-02 18:04:04
Message-ID: 20070902180404.GE16568@it.is.rice.edu
Lists: pgsql-hackers

Dear PostgreSQL Hackers:

After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to improve hash index performance. Below are the pieces of
this project that I am currently considering:

1. Characterize the current hash index implementation against
the BTree index, with a focus on space utilization and
lookup performance against a collection of test data. This
will give a baseline performance test to evaluate the impact
of changes. I initially do not plan to bench the hash creation
process since my initial focus will be on lookup performance.

2. Evaluate the performance of different hash index implementations
and/or changes to the current implementation. My current plan is
to keep the implementation as simple as possible and still provide
the desired performance. Several hash index suggestions deal with
changing the layout of the keys on a page to improve lookup
performance, including reducing the bucket size to a fraction of
a page or only storing the hash value on the page, instead of
the index value itself. My goal in this phase is to produce one
or more versions with better performance than the current BTree.

3. Look at build time and concurrency issues with the addition of
some additional tests to the test bed. (1)

4. Repeat as needed.

This is the rough plan. Does anyone see anything critical that
is missing at this point? Please send me any suggestions for test
data and various performance test ideas, since I will be working
on that first.

Regards,
Ken Marshall


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-03 02:41:22
Message-ID: 9163.1188787282@sss.pgh.pa.us
Lists: pgsql-hackers

Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> ... This is the rough plan. Does anyone see anything critical that
> is missing at this point?

Sounds pretty good. Let me brain-dump one item on you: one thing that
hash currently has over btree is the ability to handle index items up
to a full page. Now, if you go with a scheme that only stores hash
codes and not the underlying data, you can not only handle that but
improve on it; but if you reduce the bucket size and don't remove the
data, it'd be a step backward. The idea I had about dealing with that
was to only reduce the size of primary buckets --- if it's necessary to
add overflow space to a bucket, the overflow units are still full pages.
So an index tuple up to a page in size can always be accommodated by
adding an overflow page to the bucket.

Just a thought, but AFAIR it's not in the archives anywhere.
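A toy sketch of that layout idea (all names and sizes here are invented for illustration, not the actual hash AM code): primary buckets occupy only a fraction of a page, while overflow units remain full pages, so a near-page-sized index tuple can always be placed on an overflow page.

```python
# Hypothetical illustration of reduced primary buckets with full-page
# overflow units; the sizes and structures are invented, not PostgreSQL's.
PAGE_SIZE = 8192
PRIMARY_BUCKET_SIZE = PAGE_SIZE // 4   # e.g. four primary buckets per page

class Bucket:
    def __init__(self):
        self.primary_used = 0          # bytes used in the small primary area
        self.overflow_pages = []       # each overflow unit is a full page

    def insert(self, item_size):
        # small tuples pack into the reduced primary bucket
        if item_size <= PRIMARY_BUCKET_SIZE - self.primary_used:
            self.primary_used += item_size
            return "primary"
        # anything else, up to a full page, always fits on an overflow page
        assert item_size <= PAGE_SIZE, "index tuple larger than a page"
        for page in self.overflow_pages:
            if item_size <= PAGE_SIZE - page["used"]:
                page["used"] += item_size
                return "overflow"
        self.overflow_pages.append({"used": item_size})
        return "overflow"

b = Bucket()
print(b.insert(100))    # prints "primary"
print(b.insert(8000))   # prints "overflow": page-sized tuple accommodated
```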

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-03 09:33:54
Message-ID: 1188812034.4175.25.camel@ebony.site
Lists: pgsql-hackers

On Sun, 2007-09-02 at 13:04 -0500, Kenneth Marshall wrote:
> Dear PostgreSQL Hackers:
>
> After following the hackers mailing list for quite a while,
> I am going to start investigating what will need to be done
> to improve hash index performance. Below are the pieces of
> this project that I am currently considering:
>
> 1. Characterize the current hash index implementation against
> the BTree index, with a focus on space utilization and
> lookup performance against a collection of test data. This
> will give a baseline performance test to evaluate the impact
> of changes. I initially do not plan to bench the hash creation
> process since my initial focus will be on lookup performance.
>
> 2. Evaluate the performance of different hash index implementations
> and/or changes to the current implementation. My current plan is
> to keep the implementation as simple as possible and still provide
> the desired performance. Several hash index suggestions deal with
> changing the layout of the keys on a page to improve lookup
> performance, including reducing the bucket size to a fraction of
> a page or only storing the hash value on the page, instead of
> the index value itself. My goal in this phase is to produce one
> or more versions with better performance than the current BTree.
>
> 3. Look at build time and concurrency issues with the addition of
> some additional tests to the test bed. (1)
>
> 4. Repeat as needed.
>
> This is the rough plan. Does anyone see anything critical that
> is missing at this point? Please send me any suggestions for test
> data and various performance test ideas, since I will be working
> on that first.

Sounds good.

I'd be particularly interested in large indexes, say ~ 0.5 - 2GB. There
are likely to be various effects apparent as the indexes grow. It would
be too easy to do all the tests with smaller indexes and miss things.

Other factors are:
- volatility
- concurrency

My general experience is that hash-based indexes are better when the
range of inputs is relatively well-known, allowing a fast lookup. If
that is the only benefit of hash indexes, a flexible hashing scheme may
simply weaken the benefit-case for using them. If that's true, should
the index build process examine the key values in the data to determine
the best parameters to use? Kind of ANALYZE before build.

My current feeling is that they ought to be very good at handling
read-mostly situations such as privilege checking or UPDATE-intensive
situations such as Customer-Current-Credit tracking, when the number of
customers is large.

It might also be worth looking at lossy hash indexes, i.e. the index
stores only the block numbers. That would need to be part of the
discussion around how lossy we will allow indexes to be.

We currently have two kinds of full text index with different
concurrency use cases, so it should be acceptable to have hash indexes
have a clear benefit in one use case but a clear loss in another.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-03 17:18:08
Message-ID: 20070903171808.GG16568@it.is.rice.edu
Lists: pgsql-hackers

On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> > ... This is the rough plan. Does anyone see anything critical that
> > is missing at this point?
>
> Sounds pretty good. Let me brain-dump one item on you: one thing that
> hash currently has over btree is the ability to handle index items up
> to a full page. Now, if you go with a scheme that only stores hash
> codes and not the underlying data, you can not only handle that but
> improve on it; but if you reduce the bucket size and don't remove the
> data, it'd be a step backward. The idea I had about dealing with that
> was to only reduce the size of primary buckets --- if it's necessary to
> add overflow space to a bucket, the overflow units are still full pages.
> So an index tuple up to a page in size can always be accommodated by
> adding an overflow page to the bucket.
>
> Just a thought, but AFAIR it's not in the archives anywhere.
>
> regards, tom lane
>
Tom,

Thank you for the input. I agree that the ability to accommodate an
index tuple up to a page in size is worth keeping. I think that your
goal in reducing the bucket size is to improve the packing efficiency
of the index. Since the on-disk page size remains the same, it may be
possible to use a different structure overlaid on the current bucket
size and still improve the packing efficiency of the index. After some
more mulling, here are some further thoughts on the improved hash table
implementation:

- Hash lookup is O(1) while btree is O(logN). Is there a value
in optimizing the NOT case, i.e. the entry is not in the table?

- Space versus performance trade-off. This may tie into cache
  efficiency and use of L2/L3, shared buffers, and main memory.
  Denser layouts with a higher load factor may be a bit slower
  on lookups but play much nicer in a multi-user system. Look
  at the possibility of a lossy mapping?

- Build time versus update time. How does concurrency enter into
the picture regarding simultaneous updates, inserts, deletes,
and lookups?

- Could a hybrid structure with some type of prefix compression
give a more efficient layout and possibly improve performance?

- Index larger fields. Btree is limited to blocksize/3, while the
  current hash implementation can go up to a full block.

- What about multi-column indexes? The current implementation
only supports 1 column.

More ideas are welcome and I will add them to the list for
investigation.

Regards,
Ken


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-03 17:36:42
Message-ID: 20070903173641.GH16568@it.is.rice.edu
Lists: pgsql-hackers

On Mon, Sep 03, 2007 at 10:33:54AM +0100, Simon Riggs wrote:
> >
> > This is the rough plan. Does anyone see anything critical that
> > is missing at this point? Please send me any suggestions for test
> > data and various performance test ideas, since I will be working
> > on that first.
>
> Sounds good.
>
> I'd be particularly interested in large indexes, say ~ 0.5 - 2GB. There
> are likely to be various effects apparent as the indexes grow. It would
> be too easy to do all the tests with smaller indexes and miss things.
>
> Other factors are:
> - volatility
> - concurrency
>
> My general experience is that hash-based indexes are better when the
> range of inputs is relatively well-known, allowing a fast lookup. If
> that is the only benefit of hash indexes, a flexible hashing scheme may
> simply weaken the benefit-case for using them. If that's true, should
> the index build process examine the key values in the data to determine
> the best parameters to use? Kind of ANALYZE before build.
>
> My current feeling is that they ought to be very good at handling
> read-mostly situations such as privilege checking or UPDATE-intensive
> situations such as Customer-Current-Credit tracking, when the number of
> customers is large.
>
> It might also be worth looking at lossy hash indexes, i.e. the index
> stores only the block numbers. That would need to be part of the
> discussion around how lossy we will allow indexes to be.
>
> We currently have two kinds of full text index with different
> concurrency use cases, so it should be acceptable to have hash indexes
> have a clear benefit in one use case but a clear loss in another.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
>

Simon,

Thank you for your input. I would like to include some tests with large
indexes too. Do you have any ideas for a test corpus, or should we try
to generate the test data programmatically? Many people in the text
retrieval literature use the TREC* data for at least some of their runs. I
am going to check at work to see if the campus has access to the data;
otherwise I will do some web crawling to generate some sample data. I
have just posted a reply to Tom Lane with some further ideas for consideration
in the new hash index support. Like you, I suspect that volatile data that
results in many index changes may not work well with hash indexes, in general.
PostgreSQL has the additional burden of needing to access both the index and
the data heap. Obviously, the less I/O that is needed the better the
performance is likely to be. The new HOT functionality, plus clustering the
table data on the hash index, would effectively organize the table into the
"hash buckets", which could help reduce churn in both the index and the
tables.

Regards,
Ken


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Kenneth Marshall" <ktm(at)rice(dot)edu>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash index todo list item
Date: 2007-09-03 22:24:20
Message-ID: 87642rie0b.fsf@oxford.xeocode.com
Lists: pgsql-hackers


"Kenneth Marshall" <ktm(at)rice(dot)edu> writes:

> On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
>> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
>> > ... This is the rough plan. Does anyone see anything critical that
>> > is missing at this point?
>>
>> Sounds pretty good. Let me brain-dump one item on you: one thing that
>> hash currently has over btree is the ability to handle index items up
>> to a full page. Now, if you go with a scheme that only stores hash
>> codes and not the underlying data, you can not only handle that but
>> improve on it;

I think that would be a big selling point for hash indexes. It would let you
index even toasted data which are larger than a page. I'm not sure whether you
can make it work for unique indexes though. But for non-unique indexes I think
it would be a solid win and mean you cover a set of use cases quite distinct
from btree indexes.

> - Hash lookup is O(1) while btree is O(logN).

That's not really true. There's a tradeoff between insertion time and lookup
time. In order to get O(1) lookups you need to work pretty hard to maintain
the hash table including spending a lot of time reorganizing it when you grow
it. If you don't want to spend the time on inserts then you end up with
buckets and the hash table is basically just a linear speedup to whatever
algorithm you use to scan the buckets.

> - What about multi-column indexes? The current implementation
> only supports 1 column.

That seems kind of weird. It seems obvious that you mix the three hashes
together which reduces it to the solved problem.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Kenneth Marshall" <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-03 23:55:30
Message-ID: 5266.1188863730@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Kenneth Marshall" <ktm(at)rice(dot)edu> writes:
>> - What about multi-column indexes? The current implementation
>> only supports 1 column.

> That seems kind of weird. It seems obvious that you mix the three hashes
> together which reduces it to the solved problem.

No, because part of the deal is that you can do lookups using only the
leading index columns. At least, all the existing multicolumn index
types can do that.

regards, tom lane


From: "Ben Tilly" <btilly(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "Kenneth Marshall" <ktm(at)rice(dot)edu>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-04 00:20:34
Message-ID: acc274b30709031720k5d16b8fh3e7188e461129967@mail.gmail.com
Lists: pgsql-hackers

On 9/3/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>
> "Kenneth Marshall" <ktm(at)rice(dot)edu> writes:
>
> > On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
> >> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> >> > ... This is the rough plan. Does anyone see anything critical that
> >> > is missing at this point?
> >>
> >> Sounds pretty good. Let me brain-dump one item on you: one thing that
> >> hash currently has over btree is the ability to handle index items up
> >> to a full page. Now, if you go with a scheme that only stores hash
> >> codes and not the underlying data, you can not only handle that but
> >> improve on it;
>
> I think that would be a big selling point for hash indexes. It would let you
> index even toasted data which are larger than a page. I'm not sure whether you
> can make it work for unique indexes though. But for non-unique indexes I think
> it would be a solid win and mean you cover a set of use cases quite distinct
> from btree indexes.
>
> > - Hash lookup is O(1) while btree is O(logN).
>
> That's not really true. There's a tradeoff between insertion time and lookup
> time. In order to get O(1) lookups you need to work pretty hard to maintain
> the hash table including spending a lot of time reorganizing it when you grow
> it. If you don't want to spend the time on inserts then you end up with
> buckets and the hash table is basically just a linear speedup to whatever
> algorithm you use to scan the buckets.

These facts notwithstanding, average insert performance remains O(1)
if you grow the hash exponentially every time it needs to be grown.
Suppose, for example, that you use a power of 2 arrangement. Then the
worst case scenario, right after a split, is that all of your keys had
to be inserted, all had to be moved once, half had to be moved twice,
a quarter 3 times, etc. So the ratio of moves to keys is 1 + 1/2 +
1/4 + ... which is a well-known geometric series converging on 2.
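That amortized argument is easy to check numerically with a quick sketch (illustrative only): count how many times keys get moved when the table doubles every time it fills.

```python
# Count key moves under repeated doubling: every time the table fills,
# double its capacity and rehash (move) every existing key once.
def total_moves(n_keys, initial_capacity=1):
    capacity, size, moves = initial_capacity, 0, 0
    for _ in range(n_keys):
        if size == capacity:      # table full: double and move all keys
            moves += size
            capacity *= 2
        size += 1
    return moves

# The moves-to-keys ratio stays below the geometric-series bound of 2.
for n in (10, 1000, 100000):
    print(n, total_moves(n) / n)
```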

True, when you cross the threshold a lot of work needs to be done.
Life would be simpler if you could just put up a lock while you split
the hash. You can't do that for a busy transactional database though.
But if you want to be clever about it, you build into your hash
implementation the intelligence to be able to have 1 or 2 hash
locations to search. When they are both present, all inserts go into
one of them, all deletes and updates are performed against both. Then
you're able to have a background job reorganize your hash while the
database continues to use it.
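A minimal sketch of that two-location scheme (structures and names invented for illustration): while a reorganization is in progress, inserts go only to the new table, lookups and deletes consult both, and a background step migrates entries until the old table is drained.

```python
# Toy hash table that stays usable while a background job reorganizes it.
class ReorganizingHash:
    def __init__(self):
        self.old = {}              # table being drained by the background job
        self.new = {}              # table being filled during reorganization
        self.reorganizing = False

    def insert(self, key, value):
        # during reorganization, all inserts go to the new location
        (self.new if self.reorganizing else self.old)[key] = value

    def lookup(self, key):
        # both locations are searched while reorganization is in progress
        if self.reorganizing and key in self.new:
            return self.new[key]
        return self.old.get(key)

    def delete(self, key):
        # deletes (and updates) are performed against both locations
        self.old.pop(key, None)
        self.new.pop(key, None)

    def migrate_step(self, n=1):
        # background job: move up to n entries, finish when old is empty
        for key in list(self.old)[:n]:
            self.new[key] = self.old.pop(key)
        if not self.old:
            self.old, self.new, self.reorganizing = self.new, {}, False

h = ReorganizingHash()
h.insert("a", 1)
h.reorganizing = True
h.insert("b", 2)                        # lands in the new location
print(h.lookup("a"), h.lookup("b"))     # prints: 1 2
while h.reorganizing:
    h.migrate_step()
print(h.lookup("a"), h.lookup("b"))     # prints: 1 2
```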

> > - What about multi-column indexes? The current implementation
> > only supports 1 column.
>
> That seems kind of weird. It seems obvious that you mix the three hashes
> together which reduces it to the solved problem.

That raises a very random thought. One of the nicer features of
Oracle is the ability to have function-based indexes. So you could
index, say, trim(lower(person.name)). There are a *lot* of practical
situations where that comes in handy. The best workaround that I can
think of for not having that is to have a column defined to hold the
result of the function, maintain that column with a trigger, then
index that column. Which works, but is inelegant. (It also requires
storing completely redundant data.)

Is there any prospect of postgres acquiring that functionality?

Ben


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Ben Tilly <btilly(at)gmail(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-04 00:27:31
Message-ID: 20070904002730.GI16568@it.is.rice.edu
Lists: pgsql-hackers

On Mon, Sep 03, 2007 at 05:20:34PM -0700, Ben Tilly wrote:
>
> That raises a very random thought. One of the nicer features of
> Oracle is the ability to have function-based indexes. So you could
> index, say, trim(lower(person.name)). There are a *lot* of practical
> situations where that comes in handy. The best workaround that I can
> think of for not having that is to have a column defined to hold the
> result of the function, maintain that column with a trigger, then
> index that column. Which works, but is inelegant. (It also requires
> storing completely redundant data.)
>
> Is there any prospect of postgres acquiring that functionality?
>
> Ben
>
I believe that PostgreSQL already supports functional indexes. In fact,
one suggestion to address the egregiously poor performance of the current
hash index was to replace it with a functional index.

Regards,
Ken


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ben Tilly" <btilly(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Kenneth Marshall" <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-04 00:34:14
Message-ID: 5977.1188866054@sss.pgh.pa.us
Lists: pgsql-hackers

"Ben Tilly" <btilly(at)gmail(dot)com> writes:
> That raises a very random thought. One of the nicer features of
> Oracle is the ability to have function-based indexes. So you could
> index, say, trim(lower(person.name)).

> Is there any prospect of postgres acquiring that functionality?

Uh, no, since it's already there; has been since Berkeley days ...

regards, tom lane


From: "Ben Tilly" <btilly(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Kenneth Marshall" <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-04 02:06:35
Message-ID: acc274b30709031906j483ea2e9pe8fb6bcfd148c64e@mail.gmail.com
Lists: pgsql-hackers

On 9/3/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Ben Tilly" <btilly(at)gmail(dot)com> writes:
> > That raises a very random thought. One of the nicer features of
> > Oracle is the ability to have function-based indexes. So you could
> > index, say, trim(lower(person.name)).
>
> > Is there any prospect of postgres acquiring that functionality?
>
> Uh, no, since it's already there; has been since Berkeley days ...

Nice!

I know of at least one DBA who is moving from Oracle to postgres who
will be *very* happy to hear that.

Ben


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-05 20:07:03
Message-ID: 20070905200702.GG14336@it.is.rice.edu
Lists: pgsql-hackers

On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
> Dear PostgreSQL Hackers:
>
> After following the hackers mailing list for quite a while,
> I am going to start investigating what will need to be done
> to improve hash index performance. Below are the pieces of
> this project that I am currently considering:
>
> 1. Characterize the current hash index implementation against
> the BTree index, with a focus on space utilization and
> lookup performance against a collection of test data. This
> will give a baseline performance test to evaluate the impact
> of changes. I initially do not plan to bench the hash creation
> process since my initial focus will be on lookup performance.
>

Here are very basic results for a table with 1.6m entries:

postgres=# CREATE TABLE dict (word varchar(100));
CREATE TABLE

postgres=# COPY dict FROM '/tmp/words';
COPY 1648379
postgres=# select count(*) from dict;
count
---------
1648379
(1 row)

Time: 11187.418 ms
postgres=# select count(*) from dict;
count
---------
1648379
(1 row)

Time: 6040.912 ms
postgres=# CREATE INDEX wordhash ON dict USING hash (word);
CREATE INDEX
Time: 11108707.160 ms

postgres=# select * from dict where word = 'avatar';
word
--------
avatar
(1 row)

Time: 79.823 ms
postgres=# select * from dict where word = 'zebra';
word
-------
zebra
(1 row)

Time: 9.864 ms
postgres=# select * from dict where word = 'turkey';
word
--------
turkey
(1 row)

Time: 18.418 ms
Time: 1.045 ms
Time: 1.257 ms
Time: 1.080 ms

postgres=# CREATE INDEX wordbtree ON dict USING btree (word);
CREATE INDEX

Time: 25438.884 ms

postgres=# select * from dict where word = 'avatar';
word
--------
avatar
(1 row)

Time: 13.400 ms
postgres=# select * from dict where word = 'zebra';
word
-------
zebra
(1 row)

Time: 1.173 ms
postgres=# select * from dict where word = 'turkey';
word
--------
turkey
(1 row)

Time: 1.186 ms
Time: 1.103 ms
Time: 1.099 ms
Time: 1.108 ms

------------------------------
Size of table = 87556096

Size of hash index = 268451840
Size of btree index = 53510144

From my very small sample on an unloaded machine, a hash index lookup
took the least amount of time. It had a much larger initial time which
could be attributable to cache population effects. The size is 5X that
of the Btree index. I will continue to improve the test suite as more
granularity is needed. If anyone has a good data generator, please let
me know. Otherwise I will just roll my own.

Regards,
Ken


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-06 10:24:48
Message-ID: 1189074288.7470.5.camel@hannu-laptop
Lists: pgsql-hackers

On Mon, 2007-09-03 at 19:55, Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > "Kenneth Marshall" <ktm(at)rice(dot)edu> writes:
> >> - What about multi-column indexes? The current implementation
> >> only supports 1 column.
>
> > That seems kind of weird. It seems obvious that you mix the three hashes
> > together which reduces it to the solved problem.
>
> No, because part of the deal is that you can do lookups using only the
> leading index columns. At least, all the existing multicolumn index
> types can do that.

One approach is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash.

If you do it smartly you can use any column for index lookups, not just
the leading one.

> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-06 13:38:18
Message-ID: 21432.1189085898@sss.pgh.pa.us
Lists: pgsql-hackers

Hannu Krosing <hannu(at)skype(dot)net> writes:
> On Mon, 2007-09-03 at 19:55, Tom Lane wrote:
>> No, because part of the deal is that you can do lookups using only the
>> leading index columns. At least, all the existing multicolumn index
>> types can do that.

> One approach is not to mix hashes, but to partition the hash, so that
> each column gets its N bits in the hash.

How does that help? You still need all the keys to find out which
bucket to look in.

regards, tom lane


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-06 13:59:52
Message-ID: 1189087192.7470.16.camel@hannu-laptop
Lists: pgsql-hackers

On Thu, 2007-09-06 at 09:38, Tom Lane wrote:
> Hannu Krosing <hannu(at)skype(dot)net> writes:
> > On Mon, 2007-09-03 at 19:55, Tom Lane wrote:
> >> No, because part of the deal is that you can do lookups using only the
> >> leading index columns. At least, all the existing multicolumn index
> >> types can do that.
>
> > One approach is not to mix hashes, but to partition the hash, so that
> > each column gets its N bits in the hash.
>
> How does that help? You still need all the keys to find out which
> bucket to look in.

No, you need to look only at the buckets where that part of the hash
matches.

Say you allocate bits 4-7 for column 2 and then need to look up a
column 2 value with hash 3. You need to look only at buckets N*16 + 3;
that is, you need to examine only every 16th bucket.
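Following the bucket arithmetic above (buckets N*16 + 3, i.e. column 2's 4-bit partial hash in the low bits), a toy illustration with 256 buckets; the function names and bit widths are invented purely for illustration:

```python
# Toy partitioned hash: each column contributes 4 bits of the combined
# 8-bit bucket number (256 buckets total).
def combined_hash(col1, col2):
    h1 = hash(col1) & 0xF        # column 1: high four bits
    h2 = hash(col2) & 0xF        # column 2: low four bits
    return h1 * 16 + h2          # bucket number in 0..255

def buckets_for_col2(col2):
    # a lookup on column 2 alone must scan only buckets N*16 + h2,
    # i.e. every 16th bucket out of 256
    h2 = hash(col2) & 0xF
    return [n * 16 + h2 for n in range(16)]

print(combined_hash("alice", 42) in buckets_for_col2(42))   # prints True
print(len(buckets_for_col2(42)))                            # prints 16
```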

---------
Hannu


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-06 15:53:45
Message-ID: 46E02289.9050406@mark.mielke.cc
Lists: pgsql-hackers

Hannu Krosing wrote:
>>> One approach is not to mix hashes, but to partition the hash, so that
>>> each column gets its N bits in the hash.
>>>
>> How does that help? You still need all the keys to find out which
>> bucket to look in.
>>
>
> no. you need to look at only the buckets where that part of hash matches
>
> say you allocate bits 4-7 for column 2 and then need to look up column 2
> value with hash 3 . here you need to look at only buckets N*16 + 3, that
> is, you need to examine only each 16th bucket
>
>

I don't like the truncating hash suggestion because it limits the
ability of a hash code to uniquely identify a key.

If a user requires the ability to search on both (column1) and (column1,
column2), they can create two hash indexes and the planner can decide
which to use.
Or, they can use a btree. I think hash has a subset of uses where it
would be a significant gain, and focus should be spent on this subset.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-06 17:07:24
Message-ID: 113CF969-9481-4BE8-838C-41866D147163@seespotcode.net
Lists: pgsql-hackers


On Sep 6, 2007, at 10:53 , Mark Mielke wrote:

> I don't like the truncating hash suggestion because it limits the
> ability of a hash code to uniquely identify a key.

AIUI, a hash can't be used as a unique identifier: it always needs to
be rechecked due to the chance of collisions. There might be other
issues with truncation, but preventing hashes from being unique isn't
one of them.

Michael Glaesemann
grzm seespotcode net


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-06 18:08:59
Message-ID: 20070906180858.GB13405@it.is.rice.edu
Lists: pgsql-hackers

On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote:
> Hannu Krosing wrote:
>>>> One approach is not to mix hashes, but to partition the hash, so that
>>>> each column gets its N bits in the hash.
>>> How does that help? You still need all the keys to find out which
>>> bucket to look in.
>>>
>>
>> No, you need to look only at the buckets where that part of the hash
>> matches.
>>
>> Say you allocate bits 4-7 for column 2 and then need to look up a column 2
>> value with hash 3. You need to look only at buckets N*16 + 3; that is,
>> you need to examine only every 16th bucket.
>>
>>
>
> I don't like the truncating hash suggestion because it limits the ability
> of a hash code to uniquely identify a key.
>
> If a user requires the ability to search on both (column1) and (column1,
> column2), they can create two hash indexes and the planner can decide which
> to use.
> Or, they can use a btree. I think hash has a subset of uses where it would
> be a significant gain, and focus should be spent on this subset.
>
> Cheers,
> mark
>
I agree that we should focus primarily on the subset of uses for hash
indexes where there would be a significant gain. I do think that being
able to use a single O(1) hash lookup against all the values specified
in a pseudo-multi-column index could be very beneficial in reducing
access time and I/O.
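Hannu's bit-partitioning idea quoted above can be sketched in plain Python (all names, the 256-bucket count, and the 4-bit-per-column layout are illustrative only; I follow his "buckets N*16 + 3" arithmetic, i.e. column 2 owns the low nibble):

```python
NBUCKETS = 256  # illustrative power-of-two bucket count

def combined_hash(h1, h2):
    # Toy layout: column 2 owns the low 4 bits, column 1 the next 4.
    return ((h1 & 0x0F) << 4) | (h2 & 0x0F)

def buckets_for_col2(h2):
    # A lookup on column 2 alone visits only the buckets whose low
    # nibble matches column 2's partial hash: every 16th bucket.
    return [n * 16 + (h2 & 0x0F) for n in range(NBUCKETS // 16)]
```

With hash value 3 for column 2, only buckets N*16 + 3 are examined: 16 of the 256.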

Since we already have to check the actual tuple values for any index
lookup in PostgreSQL, we could store only the full hash value and the
corresponding TIDs in the bucket. Then, when we look up an item by
calculating its hash, if the exact hash is not present in the bucket
we know that the item is not in the index. If the value exists, we
would check the heap tuples before returning the results. Thus a
negative lookup needs to check only the index, and if the hash
function is "good" there will optimally be only one possibly valid
heap tuple when there is a match. One very big win for this change is
that it allows a much smaller index (hash value + relevant TIDs), with
the large column values stored only in the actual data tuples.
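As a rough model of the lookup logic described above (plain Python, not the index access method code; HashOnlyIndex and the dict-based heap are invented for illustration):

```python
from collections import defaultdict

class HashOnlyIndex:
    """Toy model: the index stores only full hash values and TIDs;
    positive matches are rechecked against the heap, since distinct
    keys can share a hash value."""
    def __init__(self, heap):
        self.heap = heap                  # tid -> key value
        self.buckets = defaultdict(list)  # full hash -> [tid, ...]
        for tid, key in heap.items():
            self.buckets[hash(key)].append(tid)

    def lookup(self, key):
        tids = self.buckets.get(hash(key))
        if tids is None:
            return []  # negative lookup: settled by the index alone
        # positive lookup: recheck heap values before returning results
        return [tid for tid in tids if self.heap[tid] == key]
```

The negative-lookup case never touches the heap, which is where the I/O saving comes from.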

Regards,
Ken
> --
> Mark Mielke <mark(at)mielke(dot)cc>
>


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Michael Glaesemann <grzm(at)seespotcode(dot)net>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-06 21:17:12
Message-ID: 46E06E58.1080708@mark.mielke.cc
Lists: pgsql-hackers

Michael Glaesemann wrote:
> On Sep 6, 2007, at 10:53 , Mark Mielke wrote:
>> I don't like the truncating hash suggestion because it limits the
>> ability of a hash code to uniquely identify a key.
> AIUI, a hash can't be used as a unique identifier: it always needs to
> be rechecked due to the chance of collisions. There might be other
> issues with truncation, but preventing hashes from being unique isn't
> one of them.

Of course - that's why I used the word "limit".

Hash works best when the key is unique, however. A 32-bit hash will be
many powers of 2 more selective than an 8-bit hash.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 06:20:38
Message-ID: 20070907062038.GA23630@svana.org
Lists: pgsql-hackers

On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote:
> Since we already have to check the actual tuple values for any index
> lookup in PostgreSQL, we could store only the full hash value and the
> corresponding TIDs in the bucket. Then, when we look up an item by
> calculating its hash, if the exact hash is not present in the bucket
> we know that the item is not in the index.

Sounds like you'd be returning a bitmap for use with a bitmap scan.
That's a different take on other suggestions I've heard, and it would
allow a hash index to have an almost unlimited key size yet flexible
matching... (combined with other indexes, or even just the same index).

Neat.

Have a nice day,
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Neil Conway <neilc(at)samurai(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 06:56:25
Message-ID: 1189148185.2231.27.camel@goldbach
Lists: pgsql-hackers

On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> 2. Evaluate the performance of different hash index implementations
> and/or changes to the current implementation. My current plan is
> to keep the implementation as simple as possible and still provide
> the desired performance. Several hash index suggestions deal with
> changing the layout of the keys on a page to improve lookup
> performance, including reducing the bucket size to a fraction of
> a page or only storing the hash value on the page, instead of
> the index value itself.

You might find this patch useful:

http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php

It implements the "just store the hash in the index" idea; it also sorts
the entries in a bucket by the hash value, which allows binary search to
be used to locate candidate matches.
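The sorted-bucket lookup Neil describes can be sketched like this (illustrative Python, not the patch itself; it assumes non-negative integer TIDs so the sentinel (h, -1) sorts before every real entry with hash h):

```python
import bisect

def candidates(bucket, h):
    """bucket is a list of (hash, tid) pairs kept sorted by hash;
    binary search finds the first candidate, then we scan the run of
    equal hash values."""
    i = bisect.bisect_left(bucket, (h, -1))
    out = []
    while i < len(bucket) and bucket[i][0] == h:
        out.append(bucket[i][1])
        i += 1
    return out
```

Each probe is O(log n) within the bucket instead of a linear scan of every entry.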

I was surprised that this didn't result in a performance improvement for
the benchmarks that I ran, but I never got around to investigating
further (either running more benchmarks or checking whether there was a
bug in the implementation).

Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
it up to HEAD if you'd like.

-Neil


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Neil Conway" <neilc(at)samurai(dot)com>
Cc: "Kenneth Marshall" <ktm(at)rice(dot)edu>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Hash index todo list item
Date: 2007-09-07 11:55:37
Message-ID: 46E13C39.8060005@enterprisedb.com
Lists: pgsql-hackers

Neil Conway wrote:
> You might find this patch useful:
>
> http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php

Oh, I had forgotten about that.

> It implements the "just store the hash in the index" idea; it also sorts
> the entries in a bucket by the hash value, which allows binary search to
> be used to locate candidate matches.
>
> I was surprised that this didn't result in a performance improvement for
> the benchmarks that I ran, but I never got around to investigating
> further (either running more benchmarks or checking whether there was a
> bug in the implementation).

You did get a smaller index, which has cache benefits with larger
tables. You didn't include a size comparison against b-tree, but a
smaller index is a good thing.

I think you left some money on the table on that front. Since all the
HashItems are fixed size, you can get rid of the line pointers
altogether. Given that sizeof(HashItemData) is 8 bytes, and a line
pointer is 4 bytes, that should give a further 1/3 reduction in index
size. If you're willing to give up on the alignment of HashItemData, you
can get it down to 6 bytes.
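Heikki's arithmetic, spelled out (the 4- and 8-byte figures are taken from his message):

```python
LINE_POINTER = 4      # bytes per line pointer
HASH_ITEM = 8         # sizeof(HashItemData), per the message

with_pointers = LINE_POINTER + HASH_ITEM          # 12 bytes per entry
without_pointers = HASH_ITEM                      # 8 bytes per entry
reduction = 1 - without_pointers / with_pointers  # exactly 1/3
```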

Even from a CPU point of view, as the table becomes bigger and the
b-tree becomes deeper, the difference between O(1) and O(log n) lookup
starts to become more significant.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 13:29:28
Message-ID: 20070907132928.GC19403@it.is.rice.edu
Lists: pgsql-hackers

On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
> On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> > 2. Evaluate the performance of different hash index implementations
> > and/or changes to the current implementation. My current plan is
> > to keep the implementation as simple as possible and still provide
> > the desired performance. Several hash index suggestions deal with
> > changing the layout of the keys on a page to improve lookup
> > performance, including reducing the bucket size to a fraction of
> > a page or only storing the hash value on the page, instead of
> > the index value itself.
>
> You might find this patch useful:
>
> http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
>
> It implements the "just store the hash in the index" idea; it also sorts
> the entries in a bucket by the hash value, which allows binary search to
> be used to locate candidate matches.
>
> I was surprised that this didn't result in a performance improvement for
> the benchmarks that I ran, but I never got around to investigating
> further (either running more benchmarks or checking whether there was a
> bug in the implementation).
>
> Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
> it up to HEAD if you'd like.
>
> -Neil
>
This is a great starting point. I would appreciate it if you have the
time and could make it apply cleanly to HEAD. I remember when you first
posted it but had forgotten, probably because of the lackluster results.
From a quick glance at the patch, tagging the index as lossy causes a
lot more work to be done than should be needed in theory. Currently the
index-scan machinery will recheck the value against the original value
for lossy indexes. However, given that we are using a good hash function
with a low chance of collision, if we mark the unique items in the index
then they do not actually have to be rechecked during the scan. Do you
have any suggestions for implementing that optimization, or is there an
option to tell the scan machinery to re-check only a certain list of
tuples? Thank you again for pointing this patch out, and please let me
know when you have a version against HEAD.

Regards,
Ken
>
>


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 13:38:01
Message-ID: 20070907133801.GD19403@it.is.rice.edu
Lists: pgsql-hackers

On Fri, Sep 07, 2007 at 12:55:37PM +0100, Heikki Linnakangas wrote:
> Neil Conway wrote:
> > You might find this patch useful:
> >
> > http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
>
> Oh, I had forgotten about that.
>
> > It implements the "just store the hash in the index" idea; it also sorts
> > the entries in a bucket by the hash value, which allows binary search to
> > be used to locate candidate matches.
> >
> > I was surprised that this didn't result in a performance improvement for
> > the benchmarks that I ran, but I never got around to investigating
> > further (either running more benchmarks or checking whether there was a
> > bug in the implementation).
>
> You did get a smaller index, which has cache benefits with larger
> tables. You didn't include a size comparison against b-tree, but a
> smaller index is a good thing.
>
> I think you left some money on the table on that front. Since all the
> HashItems are fixed size, you can get rid of the line pointers
> altogether. Given that sizeof(HashItemData) is 8 bytes, and a line
> pointer is 4 bytes, that should give a further 1/3 reduction in index
> size. If you're willing to give up on the alignment of HashItemData, you
> can get it down to 6 bytes.
>
> Even from a CPU point of view, as the table becomes bigger and the
> b-tree becomes deeper, the difference between O(1) and O(log n) lookup
> starts to become more significant.
>
If you use the size values from my initial tests, the hash index is
down to 3X the btree size instead of 5X. If we can make these additional
changes and add a unique flag to the index entry, we would have a much
smaller index. I had not thought of it at the time, but the addition of
the unique/sole-item flag would allow the hash index to support unique
indexes and be used for primary keys. In some use cases, the O(1)
lookup would be very advantageous.

Regards,
Ken


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 13:50:07
Message-ID: 46E1570F.6070501@mark.mielke.cc
Lists: pgsql-hackers

Kenneth Marshall wrote:
> On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
>
>> You might find this patch useful:
>>
>> http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
>> ...
>>
>> Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
>> it up to HEAD if you'd like.
>>
> This is a great starting point. I would appreciate it if you have the
> time and could make it apply cleanly to HEAD. I remember when you first
> posted it but had forgotten, probably because of the lackluster results.
> Just a quick glance at the patch and from what I can tell, tagging the
> index as lossy causes a lot more work to be done than should be needed
> in theory. Currently the index-scan machinery will recheck the value
> against the original value for lossy indexes. However, given that we
> are using a good hash function with a low chance of collision, if we
> mark the unique items in the index then they do not actually have to
> be rechecked during the scan. Do you have any suggestions for implementing
> that optimization or is there any option to tell the scan machinery to
> only re-check a certain list of tuples? Thank you again for pointing
> this patch out and please let me know when you have a version against
> HEAD.
>
What do you mean by "mark the unique items in the index then they do not
actually have to be rechecked during the scan." Even if there is a
unique hash value mapping to a unique key, there is no guarantee that a
new value won't result in that same hash value. Such is the nature of
hashes. A hash key match does not mean a value match. The value must be
checked. The opposite, however, may be true. If the hash key is not
found, then we know the row for the value does not exist.

Have you measured the performance of re-checking? I have always assumed
the performance of re-checking was near free when compared to the cost
of looking up the tuples in the table to determine whether or not they
are "live". This is why I have not been upset that bitmap index scans
often re-check.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 14:01:00
Message-ID: 20070907140059.GF19403@it.is.rice.edu
Lists: pgsql-hackers

On Fri, Sep 07, 2007 at 09:50:07AM -0400, Mark Mielke wrote:
> Kenneth Marshall wrote:
>> On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
>>
>>> You might find this patch useful:
>>>
>>> http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
>>> ...
>>>
>>> Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
>>> it up to HEAD if you'd like.
>>>
>> This is a great starting point. I would appreciate it if you have the
>> time and could make it apply cleanly to HEAD. I remember when you first
>> posted it but had forgotten, probably because of the lackluster results.
>> Just a quick glance at the patch and from what I can tell, tagging the
>> index as lossy causes a lot more work to be done than should be needed
>> in theory. Currently the index-scan machinery will recheck the value
>> against the original value for lossy indexes. However, given that we
>> are using a good hash function with a low chance of collision, if we
>> mark the unique items in the index then they do not actually have to
>> be rechecked during the scan. Do you have any suggestions for implementing
>> that optimization or is there any option to tell the scan machinery to
>> only re-check a certain list of tuples? Thank you again for pointing
>> this patch out and please let me know when you have a version against
>> HEAD.
>>
> What do you mean by "mark the unique items in the index then they do not
> actually have to be rechecked during the scan." Even if there is a unique
> hash value mapping to a unique key, there is no guarantee that a new value
> won't result in that same hash value. Such is the nature of hashes. A hash
> key match does not mean a value match. The value must be checked. The
> opposite, however, may be true. If the hash key is not found, then we know
> the row for the value does not exist.
>
> Have you measured the performance of re-checking? I have always assumed the
> performance of re-checking was near free when compared to the cost of
> looking up the tuples in the table to determine whether or not they are
> "live". This is why I have not been upset that bitmap index scans often
> re-check.
>
> Cheers,
> mark

I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
very expensive string comparison can be skipped. Given that the hash
function is doing its job, almost every string comparison can be skipped.
How long would it take to compare 1-32K of data? How much CPU usage?
With this field in place, you only need to check tuple visibility.

Regards,
Ken


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 14:12:15
Message-ID: 20070907141215.GG19403@it.is.rice.edu
Lists: pgsql-hackers

On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
> On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> > 2. Evaluate the performance of different hash index implementations
> > and/or changes to the current implementation. My current plan is
> > to keep the implementation as simple as possible and still provide
> > the desired performance. Several hash index suggestions deal with
> > changing the layout of the keys on a page to improve lookup
> > performance, including reducing the bucket size to a fraction of
> > a page or only storing the hash value on the page, instead of
> > the index value itself.
>
> You might find this patch useful:
>
> http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
>
> It implements the "just store the hash in the index" idea; it also sorts
> the entries in a bucket by the hash value, which allows binary search to
> be used to locate candidate matches.
>
> I was surprised that this didn't result in a performance improvement for
> the benchmarks that I ran, but I never got around to investigating
> further (either running more benchmarks or checking whether there was a
> bug in the implementation).
>
> Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
> it up to HEAD if you'd like.
>
> -Neil
>
I have another question. Did the scan code at that time use
heap-order scanning? Could that have had an impact on the patch's
performance?

Ken


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 14:30:30
Message-ID: 46E16086.1010007@mark.mielke.cc
Lists: pgsql-hackers

Kenneth Marshall wrote:
> I understand that a hash value is a many-to-one mapping. That is the
> point of the flag in the index. The flag means that there is only one
> item in the heap corresponding to that hash value. In this case we
> know that the value in the heap is the correct one and a possibly
> very expensive string comparison can be skipped. Given that the hash
> function is doing its job, almost every string comparison can be skipped.
> How long would it take to compare 1-32K of data? How much CPU usage?
> With this field in place, you only need to check tuple visibility
The value comparison cannot be skipped. I do not think you understand
the many-to-one mapping in its entirety.

Example:

Table has: a(1), b(2), c(3)
Index has: 1 => 1 (unique), 2 => 2 (unique), 3 => 3 (unique)

Query:

select * from table where key = 'z';

If 'z' hashes to '3' (completely possible), then the index record 3
points to tuple 3, and it "exists". Only, it doesn't, because 'c' <> 'z'.
You MUST check the value.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Brian Hurt <bhurt(at)janestcapital(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 14:36:41
Message-ID: 46E161F9.8070705@janestcapital.com
Lists: pgsql-hackers

Kenneth Marshall wrote:

>I understand that a hash value is a many-to-one mapping. That is the
>point of the flag in the index. The flag means that there is only one
>item in the heap corresponding to that hash value. In this case we
>know that the value in the heap is the correct one and a possibly
>very expensive string comparison can be skipped. Given that the hash
>function is doing its job, almost every string comparison can be skipped.
>How long would it take to compare 1-32K of data? How much CPU usage?
>With this field in place, you only need to check tuple visibility.
>
>

How likely is it that you will get a hash collision, two strings that
are different that will hash to the same value? To avoid this requires
a very large hash key (128 bits, minimum)- otherwise you get into
birthday attack problems. With a 32-bit hash, the likelihood is greater
than 50% that two strings in a collection of 100,000 will hash to the
same value. With a 64-bit hash, the likelihood is greater than 50% that
two strings in a collection of 10 billion will hash to the same value. 10
billion is a large number, but not an unreasonable number, of strings to
want to put into a hash table- and it's exactly this case where the O(1)
cost of hashtables starts being a real win.
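Brian's 50% figures follow from the standard birthday approximation p ~ 1 - exp(-n(n-1)/2^(b+1)) for n keys under an ideal b-bit hash; a quick check in plain Python (not part of any patch):

```python
import math

def collision_probability(n, bits):
    """Approximate P(at least one collision) among n random keys
    under an ideal bits-bit hash (birthday approximation)."""
    return 1.0 - math.exp(-n * (n - 1) / 2.0 ** (bits + 1))
```

This gives roughly 0.69 for 100,000 keys with a 32-bit hash and roughly 0.93 for 10 billion keys with a 64-bit hash, both past the 50% mark as stated.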

Brian


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 14:39:13
Message-ID: 20070907143913.GH19403@it.is.rice.edu
Lists: pgsql-hackers

On Fri, Sep 07, 2007 at 10:30:30AM -0400, Mark Mielke wrote:
> Kenneth Marshall wrote:
>> I understand that a hash value is a many-to-one mapping. That is the
>> point of the flag in the index. The flag means that there is only one
>> item in the heap corresponding to that hash value. In this case we
>> know that the value in the heap is the correct one and a possibly
>> very expensive string comparison can be skipped. Given that the hash
>> function is doing its job, almost every string comparison can be skipped.
>> How long would it take to compare 1-32K of data? How much CPU usage?
>> With this field in place, you only need to check tuple visibility
> The value comparison cannot be skipped. I do not think you understand the
> many-to-one mapping in its entirety.
>
> Example:
>
> Table has: a(1), b(2), c(3)
> Index has: 1 => 1 (unique), 2 => 2 (unique), 3 => 3 (unique)
>
> Query:
>
> select * from table where key = 'z';
>
> If 'z' hashes to '3' (completely possible), then the index record 3 points
> to tuple 3, and it "exists". Only, it doesn't, because 'c' <> 'z'. You MUST
> check the value.
>
> Cheers,
> mark
>
Yes, you are completely correct. Thank you for the reminder.

Regards,
Ken


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Brian Hurt <bhurt(at)janestcapital(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 14:55:07
Message-ID: 20070907145507.GJ19403@it.is.rice.edu
Lists: pgsql-hackers

On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>> I understand that a hash value is a many-to-one mapping. That is the
>> point of the flag in the index. The flag means that there is only one
>> item in the heap corresponding to that hash value. In this case we
>> know that the value in the heap is the correct one and a possibly
>> very expensive string comparison can be skipped. Given that the hash
>> function is doing its job, almost every string comparison can be skipped.
>> How long would it take to compare 1-32K of data? How much CPU usage?
>> With this field in place, you only need to check tuple visibility.
>>
>
> How likely is it that you will get a hash collision, two strings that are
> different that will hash to the same value? To avoid this requires a very
> large hash key (128 bits, minimum)- otherwise you get into birthday attack
> problems. With a 32-bit hash, the likelihood is greater than 50% that two
> strings in a collection of 100,000 will hash to the same value. With a
> 64-bit hash, the likelihood is greater than 50% that two strings in a
> collection of 10 billion will hash to the same value. 10 billion is a large
> number, but not an unreasonable number, of strings to want to put into a
> hash table- and it's exactly this case where the O(1) cost of hashtables
> starts being a real win.
>
> Brian
>
Yes, there is a non-negligible chance of collision (in a DB, is there
any chance that is negligible? :) ) and the values must be checked
against the actual values. The win is the collapse of the index size,
plus only needing to check a small fraction of the actual tuples.

Regards,
Ken


From: Brian Hurt <bhurt(at)janestcapital(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 15:08:13
Message-ID: 46E1695D.5000909@janestcapital.com
Lists: pgsql-hackers

Kenneth Marshall wrote:

>>>
>>>
>>>
>>How likely is it that you will get a hash collision, two strings that are
>>different that will hash to the same value? To avoid this requires a very
>>large hash key (128 bits, minimum)- otherwise you get into birthday attack
>>problems. With a 32-bit hash, the likelihood is greater than 50% that two
>>strings in a collection of 100,000 will hash to the same value. With a
>>64-bit hash, the likelihood is greater than 50% that two strings in a
>>collection of 10 billion will hash to the same value. 10 billion is a large
>>number, but not an unreasonable number, of strings to want to put into a
>>hash table- and it's exactly this case where the O(1) cost of hashtables
>>starts being a real win.
>>
>>Brian
>>
>>
>>
>Yes, there is a non-negligible chance of collision (In a DB is there
>any chance that is non-negligible? :) ) and the values must be checked
>against the actual. The win is the collapse of the index size and only
>needed to check a small fraction of the actual tuples.
>
>
>
>

Ah, OK- I misunderstood you. I thought you were saying that the hash
values would need to be unique, and you wouldn't check the original
values at all. My bad.

Brian


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Brian Hurt <bhurt(at)janestcapital(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-07 15:15:26
Message-ID: 20070907151526.GK19403@it.is.rice.edu
Lists: pgsql-hackers

On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>>>>
>>> How likely is it that you will get a hash collision, two strings that are
>>> different that will hash to the same value? To avoid this requires a
>>> very large hash key (128 bits, minimum)- otherwise you get into birthday
>>> attack problems. With a 32-bit hash, the likelihood is greater than 50%
>>> that two strings in a collection of 100,000 will hash to the same value.
>>> With a 64-bit hash, the likelihood is greater than 50% that two strings
>>> in a collection of 10 billion will hash to the same value. 10 billion is a
>>> large number, but not an unreasonable number, of strings to want to put
>>> into a hash table- and it's exactly this case where the O(1) cost of
>>> hashtables starts being a real win.
>>>
>>> Brian
>>>
>>>
>> Yes, there is a non-negligible chance of collision (in a DB, is there
>> any chance that is negligible? :) ) and the values must be checked
>> against the actual values. The win is the collapse of the index size,
>> plus only needing to check a small fraction of the actual tuples.
>>
>>
>>
>
> Ah, OK- I misunderstood you. I thought you were saying that the hash
> values would need to be unique, and you wouldn't check the original values
> at all. My bad.
>
> Brian
>
No, you were correct. I misstated originally and you and Mark both pointed
out my mistake.

Regards,
Ken


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Brian Hurt <bhurt(at)janestcapital(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-08 20:21:22
Message-ID: 20070908202122.GA5679@it.is.rice.edu
Lists: pgsql-hackers

On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>> I understand that a hash value is a many-to-one mapping. That is the
>> point of the flag in the index. The flag means that there is only one
>> item in the heap corresponding to that hash value. In this case we
>> know that the value in the heap is the correct one and a possibly
>> very expensive string comparison can be skipped. Given that the hash
>> function is doing its job, almost every string comparison can be skipped.
>> How long would it take to compare 1-32K of data? How much CPU usage?
>> With this field in place, you only need to check tuple visibility.
>>
>
> How likely is it that you will get a hash collision, two strings that are
> different that will hash to the same value? To avoid this requires a very
> large hash key (128 bits, minimum)- otherwise you get into birthday attack
> problems. With a 32-bit hash, the likelihood is greater than 50% that two
> strings in a collection of 100,000 will hash to the same value. With a
> 64-bit hash, the likelihood is greater than 50% that two strings in a
> collection of 10 billion will hash to the same value. 10 billion is a large
> number, but not an unreasonable number, of strings to want to put into a
> hash table- and it's exactly this case where the O(1) cost of hashtables
> starts being a real win.
>
> Brian
>
Continuing this train of thought.... While it would make sense for larger
keys to store the hash in the index, if the key is smaller, particularly
if it is of fixed size, it would make sense to store the key in the index
instead. This would have the benefit of allowing use of the hash index
in non-lossy mode, albeit with a slight increase in complexity.

Ken


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Brian Hurt <bhurt(at)janestcapital(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-08 21:14:09
Message-ID: 46E310A1.8000008@mark.mielke.cc
Lists: pgsql-hackers

Kenneth Marshall wrote:
> Continuing this train of thought.... While it would make sense for larger
> keys to store the hash in the index, if the key is smaller, particularly
> if it is of fixed size, it would make sense to store the key in the index
> instead. This would have the benefit of allowing use of the hash index
> in non-lossy mode albeit with a slight increase in complexity.
>
I suspect there is no value in designing a hash implementation to work
well for a context where a btree index would already perform equally well.

If there are too few hash buckets, performance is not O(1). For a hash
index to function better than btree, I believe focus should be spent on
the O(1) case, which means ensuring that enough hash buckets are used to
provide O(1).

All of these must match: 1) Hash value, 2) Key value, 3) Tuple visibility.

In the optimum O(1) scenario, each existing key will map to a hash
bucket that contains ~1 entry. For this case, there is no value to
having the key stored in the index row, as 3) Tuple visibility, will
still require access to the table row. In this optimum scenario, I do
not believe anything of value is saved by storing the key in the index
row. The loss, however, is that the hash index data structures become
more complex, and would likely require support for variable length data.
The resulting increase in hash index size and code complexity would
reduce performance.

Just an opinion.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Brian Hurt <bhurt(at)janestcapital(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-08 21:49:00
Message-ID: 20070908214900.GB5679@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 08, 2007 at 05:14:09PM -0400, Mark Mielke wrote:
> Kenneth Marshall wrote:
>> Continuing this train of thought.... While it would make sense for larger
>> keys to store the hash in the index, if the key is smaller, particularly
>> if it is of fixed size, it would make sense to store the key in the index
>> instead. This would have the benefit of allowing use of the hash index
>> in non-lossy mode albeit with a slight increase in complexity.
>>
> I suspect there is no value in designing a hash implementation to work well
> for a context where a btree index would already perform equally well.
>
> If there are too few hash buckets, performance is not O(1). For a hash
> index to function better than btree, I believe focus should be spent on the
> O(1) case, which means ensuring that enough hash buckets are used to
> provide O(1).
>
> All of these must match: 1) Hash value, 2) Key value, 3) Tuple visibility.
>
> In the optimum O(1) scenario, each existing key will map to a hash bucket
> that contains ~1 entry. For this case, there is no value to having the key
> stored in the index row, as 3) Tuple visibility, will still require access
> to the table row. In this optimum scenario, I do not believe anything of
> value is saved by storing the key in the index row. The loss, however, is
> that the hash index data structures become more complex, and would likely
> require support for variable length data. The resulting increase in hash
> index size and code complexity would reduce performance.
>
> Just an opinion.
>
> Cheers,
> mark
>

I agree that we should focus on the O(1) area. The value of storing the
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value
which would be a benefit for a unique index. I agree that supporting
variable length data would complicate the index and reduce performance.
I am not willing to assume that ~1 entry per hash bucket is necessarily
what we want, at least without some testing. It seems reasonable that
with the performance differences between L1/L2/L3 cache, main memory,
and the disk subsystem a higher load factor would result in better
overall system performance by reducing cache line misses and improving
hardware pre-fetch efficiency. Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in today's system memories.
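The space argument can be made concrete with some back-of-the-envelope arithmetic (entry sizes and fill factors below are illustrative assumptions, not PostgreSQL's actual on-disk layout):

```python
# Assumed per-entry sizes, for illustration only: a hash-only entry
# needs a 4-byte hash code plus a 6-byte heap tuple pointer; an entry
# that also stores the key adds the key bytes (here a 32-byte text key).
HASH_ONLY_ENTRY = 4 + 6
KEYED_ENTRY = 4 + 6 + 32

def index_size_gb(n_keys, entry_bytes, fill_factor):
    """Index size in GB for n_keys entries at a given page fill factor."""
    return n_keys * entry_bytes / fill_factor / 2**30

# 10^9 keys: densely packed hash-only pages vs. half-full key-storing pages.
dense = index_size_gb(10**9, HASH_ONLY_ENTRY, 0.9)   # ~10 GB
sparse = index_size_gb(10**9, KEYED_ENTRY, 0.5)      # ~78 GB
print(round(dense, 1), round(sparse, 1))
```

Under these assumptions the hash-only, higher-load-factor layout is several times smaller, which is exactly the regime where the index can stay resident in memory.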

Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance characteristics
that we are seeking.

Regards,
Ken

> --
> Mark Mielke <mark(at)mielke(dot)cc>
>


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Brian Hurt <bhurt(at)janestcapital(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-08 22:56:23
Message-ID: 46E32897.8080106@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kenneth Marshall wrote:
> The value of storing the
> actual value, possibly as an option, is that the key check can be done
> in the index without requiring a heap lookup to check the actual value
> which would be a benefit for a unique index. I agree that supporting
> variable length data would complicate the index and reduce performance.
> I am not willing to assume that ~1 entry per hash bucket is necessarily
> what we want, at least without some testing.
I think that if the case of >1 entry per hash becomes common enough to
be significant, and the key is stored in the hash, that a btree will
perform equal or better, and there is no point in pursuing such a hash
index model. This is where we are today.

> It seems reasonable that
> with the performance differences between L1/L2/L3 cache, main memory,
> and the disk subsystem a higher load factor would result in better
> overall system performance by reducing cache line misses and improving
> hardware pre-fetch efficiency.
If the key is stored, all of these factors likely favor the btree format
over the hash format. Again, this is where we are today.

> Along with the hypothetical performance
> wins, the hash index space efficiency would be improved by a similar
> factor. Obviously, all of these ideas would need to be tested in
> various workload environments. In the large index arena, 10^6 to 10^9
> keys and more, space efficiency will help keep the index manageable
> in today's system memories.
>
Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).
> Please keep the ideas and comments coming. I am certain that a synthesis
> of them will provide an implementation with the performance characteristics
> that we are seeking.
>
The subject interests me. :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-10 02:42:58
Message-ID: 20070910024258.GC5679@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> > ... This is the rough plan. Does anyone see anything critical that
> > is missing at this point?
>
> Sounds pretty good. Let me brain-dump one item on you: one thing that
> hash currently has over btree is the ability to handle index items up
> to a full page. Now, if you go with a scheme that only stores hash
> codes and not the underlying data, you can not only handle that but
> improve on it; but if you reduce the bucket size and don't remove the
> data, it'd be a step backward. The idea I had about dealing with that
> was to only reduce the size of primary buckets --- if it's necessary to
> add overflow space to a bucket, the overflow units are still full pages.
> So an index tuple up to a page in size can always be accommodated by
> adding an overflow page to the bucket.
>
> Just a thought, but AFAIR it's not in the archives anywhere.
>
> regards, tom lane
>
I was thinking about this some more, and it strikes me that we can
keep the page size = bucket size = overflow size in the new scheme
of storing the hash value in the index and still reduce the effective
bucket size. Let's make the new size (page size / 2^n), where n
is chosen appropriately. Then, when we store the value in the page, we
simply use n bits of the hash to determine which sub-piece of the
page to use to actually store the value. We may need to add a multiplier
to adjust the decision to split the page based on the mini-page. This
should allow us to pack the pages much more densely and approach
1 item per bucket. This could easily shrink the size of the index by
a factor of 2^n. Let me know what you think.
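The mini-bucket selection described above can be sketched as follows (a hypothetical illustration; the names and bit assignments are assumptions, not code from the proposal):

```python
def mini_bucket(hash_value, bucket_bits, n):
    """Split one physical page into 2**n mini-buckets: the low
    bucket_bits of the hash pick the page, and the next n bits pick
    the sub-piece of the page in which the entry is stored."""
    page = hash_value & ((1 << bucket_bits) - 1)
    sub = (hash_value >> bucket_bits) & ((1 << n) - 1)
    return page, sub

# With 2**10 pages and n = 2, each page holds four quarter-page
# mini-buckets, so the effective bucket count is 2**12 without
# allocating any extra pages.
page, sub = mini_bucket(55083, 10, 2)
print(page, sub)  # 811 1
```

The point of the sketch is that the effective bucket size shrinks by 2^n while the on-disk unit (page = bucket = overflow size) stays unchanged.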

Regards,
Ken


From: Jens-Wolfhard Schicke <ml+pgsql-hackers(at)asco(dot)de>
To:
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-10 08:56:52
Message-ID: 6CD3179DA44575D994F805D7@[192.168.1.85]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

--On Saturday, September 08, 2007 18:56:23 -0400 Mark Mielke
<mark(at)mark(dot)mielke(dot)cc> wrote:
> Kenneth Marshall wrote:
> Along with the hypothetical performance
> wins, the hash index space efficiency would be improved by a similar
> factor. Obviously, all of these ideas would need to be tested in
> various workload environments. In the large index arena, 10^6 to 10^9
> keys and more, space efficiency will help keep the index manageable
> in today's system memories.
>
>
> Space efficiency is provided by not storing the key, nor the header data
> required (length prefix?).
Space efficiency at ~1 entry per bucket: How about using closed hashing,
storing at the front of each page a bitmask that specifies which slots hold
valid entries, and in the rest of the page row-pointers (is this the correct
expression? I don't know...) without further data. This should provide a
reasonably simple data structure and good alignment for the pointers.
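The page layout suggested here could look roughly like this (an editorial sketch with assumed sizes; "row pointer" stands in for PostgreSQL's heap item pointer, and real pages would store the bitmask packed, not as a list):

```python
class ClosedHashPage:
    """One index page using closed hashing (open addressing): a
    validity bitmask at the front, then a fixed array of row-pointer
    slots and no further per-entry data."""

    def __init__(self, nslots=1024):
        self.valid = [False] * nslots      # the leading bitmask
        self.slots = [None] * nslots       # fixed-size row pointers

    def insert(self, hash_value, row_pointer):
        n = len(self.slots)
        for probe in range(n):             # linear probing within the page
            i = (hash_value + probe) % n
            if not self.valid[i]:
                self.valid[i] = True
                self.slots[i] = row_pointer
                return True
        return False                       # page full: would need overflow

    def lookup(self, hash_value):
        """Yield candidate row pointers along the probe chain; hitting
        an empty slot proves the key is absent (no heap visit needed)."""
        n = len(self.slots)
        for probe in range(n):
            i = (hash_value + probe) % n
            if not self.valid[i]:
                return
            yield self.slots[i]

page = ClosedHashPage(8)
page.insert(3, "tid-a")
page.insert(11, "tid-b")        # collides with slot 3, probes to slot 4
print(list(page.lookup(3)))     # candidates; caller re-checks via the heap
```

Because the hash is lossy, lookup returns candidates that still need verification against the heap row, but an empty slot answers negative lookups immediately.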

> Please keep the ideas and comments coming. I am certain that a synthesis
> of them will provide an implementation with the performance
> characteristics
> that we are seeking.

One should look into new plan nodes for "!= ANY()", "NOT EXISTS" and
similar. A node like "look into hash and true if bucket is empty" would
work without checking tuple visibility when the bucket is empty and could
be a win in some situations.

Do we want special cases for short keys like INT4? In those cases the
implementation might use hash == key and put that knowledge to use in
plans. Even a unique constraint might then be doable. Does the
postgresql-storage backend on linux support sparse files? Might be a win
when holes in the sequence turn up.
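The "hash == key" idea for short fixed-size keys amounts to using an identity mapping as the hash function, which makes the index non-lossy for those types (a sketch of the idea only; this is not how PostgreSQL's hash operator classes are actually defined):

```python
def hash_int4(key):
    """Identity 'hash' for a 32-bit integer key. Since the mapping is
    injective, equal hash codes imply equal keys: a match in the index
    needs no heap fetch to re-check the key, and a unique constraint
    could in principle be enforced on hash codes alone."""
    assert -2**31 <= key < 2**31
    return key & 0xFFFFFFFF            # reinterpret as unsigned 32 bits

print(hash_int4(-1))   # 4294967295: distinct keys keep distinct codes
```

The trade-off is that an identity hash gives up the mixing a general-purpose hash provides, so bucket distribution then depends directly on the key distribution.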


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-10 14:36:28
Message-ID: 46E5566C.3050106@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kenneth Marshall wrote:
> On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
>
>> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
>>
>>> ... This is the rough plan. Does anyone see anything critical that
>>> is missing at this point?
>>>
>> Sounds pretty good. Let me brain-dump one item on you: one thing that
>> hash currently has over btree is the ability to handle index items up
>> to a full page. Now, if you go with a scheme that only stores hash
>> codes and not the underlying data, you can not only handle that but
>> improve on it; but if you reduce the bucket size and don't remove the
>> data, it'd be a step backward. The idea I had about dealing with that
>> was to only reduce the size of primary buckets --- if it's necessary to
>> add overflow space to a bucket, the overflow units are still full pages.
>> So an index tuple up to a page in size can always be accommodated by
>> adding an overflow page to the bucket.
>>
>> Just a thought, but AFAIR it's not in the archives anywhere.
>>
>> regards, tom lane
>>
>>
> I was thinking about this some more, and it strikes me that we can
> keep the page size = bucket size = overflow size in the new scheme
> of storing the hash value in the index and still reduce the effective
> bucket size. Let's make the new size (page size / 2^n), where n
> is chosen appropriately. Then, when we store the value in the page, we
> simply use n bits of the hash to determine which sub-piece of the
> page to use to actually store the value. We may need to add a multiplier
> to adjust the decision to split the page based on the mini-page. This
> should allow us to pack the pages much more densely and approach
> 1 item per bucket. This could easily shrink the size of the index by
> a factor of 2^n. Let me know what you think.
>
My personal opinion is that something like this is required to take best
advantage of hashes. I didn't respond immediately because I don't have
advice on what the best implementation would look like.

I have also been wondering about the effect of a hash index that
includes multiple values to the same key (either a non-unique key, or
different tuples from the same key due to table updates). I started by
thinking that the maximum number of hash entries per hash bucket should
be kept to 2 or 3 to prevent reduction in performance to that of btree,
but I think this doesn't work if multiple tuples can have the same key.
Unless the mapping is: hash value =1:1=> index page =1:1=> hash bucket
=1:N=> hash value =1:M=> tuples. Guarantee that N is small (either <= 2
or <= 4, depending on performance evaluation) by re-hashing if N ever
becomes > 2 or > 4. Choose a sparse hash. Let 1:M be indefinite? Also,
optimize the case where 1:M is 1:1, such that the value can be inline?

For most cases, I would think the above model would make it cheap to
determine if the key does not exist, as well as the 1:1 (hash value:key)
case requiring a single page lookup. Update performance would suffer,
but lookup should be faster than btree in these cases, as btree often
requires > 1 index page scan.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, Brian Hurt <bhurt(at)janestcapital(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-10 15:04:44
Message-ID: 20070910150443.GB16512@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote:
> I think that if the case of >1 entry per hash becomes common enough to
> be significant, and the key is stored in the hash, that a btree will
> perform equal or better, and there is no point in pursuing such a hash
> index model. This is where we are today.

There is the point that if a user does an UPDATE of a row without
changing the key your index will have to store entries with the same
hash. If your goal is mostly write-once tables, then that's cool, but
otherwise you're going to have to find a way of dealing with that. It's
not just going to happen a lot, it is going to be common, even for
unique indexes.

Presumably HOT will help with this, but that relies on none of the
indexed columns changing.

The major benefits will mostly come from not storing the key at all, I
think.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Raney <twraney(at)comcast(dot)net>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-21 00:12:45
Message-ID: 46F30C7D.6060402@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We are pleased to announce an upcoming patch to the hash index code
which improves build time and index size, based on this item in the
TODO list:
During index creation, pre-sort the tuples to improve build speed
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.
For example, for a particular 12 million row relation, the
8.2.4 release requires over 2.8 hours to build the hash index.
With our patch, the index is built in 80 seconds.
Here is the link for a graph, showing a comparative analysis of
btree and hash index builds and describing the benchmark data.
http://web.cecs.pdx.edu/~raneyt/pg/

We are currently cleaning up the patch and will submit it asap.
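The TODO item being implemented ("pre-sort the tuples to improve build speed") can be outlined as follows (a hypothetical sketch of the technique, not the submitted patch itself): rather than inserting tuples in heap order, which scatters writes randomly across bucket pages, compute each entry's destination bucket up front, sort by bucket, and fill the buckets sequentially.

```python
def build_hash_index(entries, hash_func, nbuckets):
    """Sort-based hash index build. entries is a list of (key, tid)
    pairs; returns a list of buckets, each a list of (hash_code, tid)."""
    # Compute each hash code once, then sort by destination bucket so
    # every bucket page is filled once, in order (sequential I/O),
    # instead of being revisited randomly throughout the build.
    coded = []
    for key, tid in entries:
        code = hash_func(key)
        coded.append((code % nbuckets, code, tid))
    coded.sort(key=lambda t: t[0])
    buckets = [[] for _ in range(nbuckets)]
    for bucket, code, tid in coded:
        buckets[bucket].append((code, tid))
    return buckets

idx = build_hash_index([("a", 1), ("b", 2), ("c", 3)], hash, 4)
print(sum(len(b) for b in idx))  # 3: every entry landed in some bucket
```

Sizing the index for the known tuple count before loading also avoids the repeated bucket splits that dominate the incremental build, which is consistent with the hours-to-seconds improvement reported above.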

Regards,
Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com>
Tom Raney <twraney(at)comcast(dot)net>

Kenneth Marshall wrote:
> Dear PostgreSQL Hackers:
>
> After following the hackers mailing list for quite a while,
> I am going to start investigating what will need to be done
> to improve hash index performance. Below are the pieces of
> this project that I am currently considering:
>
> 1. Characterize the current hash index implementation against
> the BTree index, with a focus on space utilization and
> lookup performance against a collection of test data. This
> will give a baseline performance test to evaluate the impact
> of changes. I initially do not plan to bench the hash creation
> process since my initial focus will be on lookup performance.
>
> 2. Evaluate the performance of different hash index implementations
> and/or changes to the current implementation. My current plan is
> to keep the implementation as simple as possible and still provide
> the desired performance. Several hash index suggestions deal with
> changing the layout of the keys on a page to improve lookup
> performance, including reducing the bucket size to a fraction of
> a page or only storing the hash value on the page, instead of
> the index value itself. My goal in this phase is to produce one
> or more versions with better performance than the current BTree.
>
> 3. Look at build time and concurrency issues with the addition of
> some additional tests to the test bed. (1)
>
> 4. Repeat as needed.
>
> This is the rough plan. Does anyone see anything critical that
> is missing at this point? Please send me any suggestions for test
> data and various performance test ideas, since I will be working
> on that first.
>
> Regards,
> Ken Marshall
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Raney <twraney(at)comcast(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-25 13:17:16
Message-ID: 20070925131716.GJ14440@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
> We are pleased to announce an upcoming patch to the hash index code
> which improves build time and index size, based on this item in the
> TODO list:
> During index creation, pre-sort the tuples to improve build speed
> http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php
>
> Using our implementation, build times and index sizes are
> comparable with btree index build times and index sizes.
> For example, for a particular 12 million row relation, the
> 8.2.4 release requires over 2.8 hours to build the hash index. With our
> patch, the index is built in 80 seconds.
> Here is the link for a graph, showing a comparative analysis of
> btree and hash index builds and describing the benchmark data.
> http://web.cecs.pdx.edu/~raneyt/pg/
>
> We are currently cleaning up the patch and will submit it asap.
>
> Regards,
> Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com>
> Tom Raney <twraney(at)comcast(dot)net>
>

That is super! (and timely)

I was just looking at that some myself since the large index build
times make testing a very laborious process. I am looking forward to
seeing the details.

Regards,
Ken


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Kenneth Marshall" <ktm(at)rice(dot)edu>
Cc: "Tom Raney" <twraney(at)comcast(dot)net>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Hash index todo list item
Date: 2007-09-25 14:35:47
Message-ID: 87y7eun77g.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kenneth Marshall" <ktm(at)rice(dot)edu> writes:

> On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
>
>> Using our implementation, build times and index sizes are
>> comparable with btree index build times and index sizes.
>...
> That is super! (and timely)

It is pretty super. I have a few comments to raise but don't take it to be too
negative, it sounds like this is a big step forward towards making hash
indexes valuable.

Firstly in the graphs it seems the index size graph has an exponential x-axis
but a linear y-axis. This makes it hard to see what I would expect to be
pretty much linear growth. The curves look exponential which would mean linear
growth but of course it's hard to tell.

Also, the growth in the time chart looks pretty much linear. That seems weird
since I would expect there would be a visible extra component since sort times
are n-log(n). Perhaps you need to test still larger tables to see that though.

In any case it's clear from the results you have there that the change is a
positive one and fixes a fundamental problem with the hash index build code.

Something else you should perhaps test is indexing something which is
substantially larger than hash function output. A hash function is going to
make the most sense when indexing something like strings for which you want to
avoid the long comparison costs. Especially consider testing this on a UTF8
locale with expensive comparisons (like a CJK locale for example).

Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good a choice as btree indexes but a
significantly better choice, at least for some specific
circumstances.

Also, if you're going to submit a patch you should check out a copy of the CVS
HEAD and work from that. I don't think there are huge differences in the area
of hash indexes though. But in most other areas you would be spending quite a
bit of time dealing with details which have changed since.

Finally note that we're in the final throes of the 8.3 feature freeze.
Normally any patch submitted now would be held until 8.3 is released and
development on 8.4 is started. I could imagine an exception being made for
hash indexes since they're so moribund currently but probably not. The flip
side of that argument is that there's not much point in making an exception
for something which will only be really useful once further work is done in
the same area.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Raney <twraney(at)comcast(dot)net>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-25 16:26:05
Message-ID: 20070925162604.GO14440@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
> "Kenneth Marshall" <ktm(at)rice(dot)edu> writes:
>
> > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
> >
> >> Using our implementation, build times and index sizes are
> >> comparable with btree index build times and index sizes.
> >...
> > That is super! (and timely)
>
> It is pretty super. I have a few comments to raise but don't take it to be too
> negative, it sounds like this is a big step forward towards making hash
> indexes valuable.
>
> Firstly in the graphs it seems the index size graph has an exponential x-axis
> but a linear y-axis. This makes it hard to see what I would expect to be
> pretty much linear growth. The curves look exponential which would mean linear
> growth but of course it's hard to tell.
>
> Also, the growth in the time chart looks pretty much linear. That seems weird
> since I would expect there would be a visible extra component since sort times
> are n-log(n). Perhaps you need to test still larger tables to see that though.
>
> In any case it's clear from the results you have there that the change is a
> positive one and fixes a fundamental problem with the hash index build code.
>
> Something else you should perhaps test is indexing something which is
> substantially larger than hash function output. A hash function is going to
> make the most sense when indexing something like strings for which you want to
> avoid the long comparison costs. Especially consider testing this on a UTF8
> locale with expensive comparisons (like a CJK locale for example).
>
> Note that the bottom line for the problems with hash indexes is that the
> current implementation doesn't offer any advantages over btree indexes. Hash
> indexes need to be not only as good a choice as btree indexes but a
> significantly better choice, at least for some specific
> circumstances.
>
> Also, if you're going to submit a patch you should check out a copy of the CVS
> HEAD and work from that. I don't think there are huge differences in the area
> of hash indexes though. But in most other areas you would be spending quite a
> bit of time dealing with details which have changed since.
>
> Finally note that we're in the final throes of the 8.3 feature freeze.
> Normally any patch submitted now would be held until 8.3 is released and
> development on 8.4 is started. I could imagine an exception being made for
> hash indexes since they're so moribund currently but probably not. The flip
> side of that argument is that there's not much point in making an exception
> for something which will only be really useful once further work is done in
> the same area.
>

Although I am very excited about this patch, I do not see any real value
in including it in 8.3. As you mention, we need to have a hash index
implementation that outperforms btree in some problem regime and that is
currently not the case. I have just recently started the process of
gathering ideas and having discussions on various approaches to making
hash indexes more performant and we have a number of ideas on which to
start our investigation. I do think that this patch will make the testing
and evaluation, that will be needed to truly improve the hash index, much
much easier.

Regards,
Ken


From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Raney <twraney(at)comcast(dot)net>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-25 16:38:43
Message-ID: 4BD2CDA5-F59E-4C04-A059-0ECBC3B8D91D@seespotcode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Sep 25, 2007, at 11:26 , Kenneth Marshall wrote:

> Although I am very excited about this patch, I do not see any real
> value
> in including it in 8.3.

I don't think you have to worry about it being in 8.3. Feature freeze
was months ago.

Michael Glaesemann
grzm seespotcode net


From: Tom Raney <twraney(at)comcast(dot)net>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-26 05:43:18
Message-ID: 46F9F176.1060705@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kenneth Marshall wrote:
> On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
>
>> "Kenneth Marshall" <ktm(at)rice(dot)edu> writes:
>>
>>
>>> On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
>>>
>>>
>>>> Using our implementation, build times and index sizes are
>>>> comparable with btree index build times and index sizes.
>>>>
>>> ...
>>> That is super! (and timely)
>>>
>> It is pretty super. I have a few comments to raise but don't take it to be too
>> negative, it sounds like this is a big step forward towards making hash
>> indexes valuable.
>>
>> Firstly in the graphs it seems the index size graph has an exponential x-axis
>> but a linear y-axis. This makes it hard to see what I would expect to be
>> pretty much linear growth. The curves look exponential which would mean linear
>> growth but of course it's hard to tell.
>>
>> Also, the growth in the time chart looks pretty much linear. That seems weird
>> since I would expect there would be a visible extra component since sort times
>> are n-log(n). Perhaps you need to test still larger tables to see that though.
>>
>> In any case it's clear from the results you have there that the change is a
>> positive one and fixes a fundamental problem with the hash index build code.
>>
>> Something else you should perhaps test is indexing something which is
>> substantially larger than hash function output. A hash function is going to
>> make the most sense when indexing something like strings for which you want to
>> avoid the long comparison costs. Especially consider testing this on a UTF8
>> locale with expensive comparisons (like a CJK locale for example).
>>
>> Note that the bottom line for the problems with hash indexes is that the
>> current implementation doesn't offer any advantages over btree indexes. Hash
>> indexes need to be not only as good a choice as btree indexes but a
>> significantly better choice, at least for some specific
>> circumstances.
>>
>> Also, if you're going to submit a patch you should check out a copy of the CVS
>> HEAD and work from that. I don't think there are huge differences in the area
>> of hash indexes though. But in most other areas you would be spending quite a
>> bit of time dealing details which have changed since.
>>
>> Finally note that we're in the final throes of the 8.3 feature freeze.
>> Normally any patch submitted now would be held until 8.3 is released and
>> development on 8.4 is started. I could imagine an exception being made for
>> hash indexes since they're so moribund currently but probably not. The flip
>> side of that argument is that there's not much point in making an exception
>> for something which will only be really useful once further work is done in
>> the same area.
>>
>>
>
> Although I am very excited about this patch, I do not see any real value
> in including it in 8.3. As you mention, we need to have a hash index
> implementation that outperforms btree in some problem regime and that is
> currently not the case. I have just recently started the process of
> gathering ideas and having discussions on various approaches to making
> hash indexes more performant and we have a number of ideas on which to
> start our investigation. I do think that this patch will make the testing
> and evaluation, that will be needed to truly improve the hash index, much
> much easier.
>
> Regards,
> Ken
>
>
We're glad to contribute and be a part of Postgres. The patch has been
posted to pgsql-patches(at)postgresql(dot)org.

Speeding up the creation time of hash indexes on non-trivial relations
was our goal. This will allow some interesting performance tests of the
hash index on very large relations. It may be that the near-constant
lookup time of the hash index outperforms the Btree index for some large
data sets and for certain types of data and distributions.

Sincerely,
Tom Raney


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Raney <twraney(at)comcast(dot)net>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-28 18:48:07
Message-ID: 200709281848.l8SIm8O22044@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Tom Raney wrote:
> We are pleased to announce an upcoming patch to the hash index code
> which improves build time and index size, based on this item in the
> TODO list:
> During index creation, pre-sort the tuples to improve build speed
> http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php
>
> Using our implementation, build times and index sizes are
> comparable with btree index build times and index sizes.
> For example, for a particular 12 million row relation, the
> 8.2.4 release requires over 2.8 hours to build the hash index.
> With our patch, the index is built in 80 seconds.
> Here is the link for a graph, showing a comparative analysis of
> btree and hash index builds and describing the benchmark data.
> http://web.cecs.pdx.edu/~raneyt/pg/
>
> We are currently cleaning up the patch and will submit it asap.
>
> Regards,
> Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com>
> Tom Raney <twraney(at)comcast(dot)net>
>
>
> Kenneth Marshall wrote:
> > Dear PostgreSQL Hackers:
> >
> > After following the hackers mailing list for quite a while,
> > I am going to start investigating what will need to be done
> > to improve hash index performance. Below are the pieces of
> > this project that I am currently considering:
> >
> > 1. Characterize the current hash index implementation against
> > the BTree index, with a focus on space utilization and
> > lookup performance against a collection of test data. This
> > will give a baseline performance test to evaluate the impact
> > of changes. I initially do not plan to bench the hash creation
> > process since my initial focus will be on lookup performance.
> >
> > 2. Evaluate the performance of different hash index implementations
> > and/or changes to the current implementation. My current plan is
> > to keep the implementation as simple as possible and still provide
> > the desired performance. Several hash index suggestions deal with
> > changing the layout of the keys on a page to improve lookup
> > performance, including reducing the bucket size to a fraction of
> > a page or only storing the hash value on the page, instead of
> > the index value itself. My goal in this phase is to produce one
> > or more versions with better performance than the current BTree.
> >
> > 3. Look at build time and concurrency issues with the addition of
> > some additional tests to the test bed. (1)
> >
> > 4. Repeat as needed.
> >
> > This is the rough plan. Does anyone see anything critical that
> > is missing at this point? Please send me any suggestions for test
> > data and various performance test ideas, since I will be working
> > on that first.
> >
> > Regards,
> > Ken Marshall
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: don't forget to increase your free space map settings
> >
>
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-10-12 23:50:42
Message-ID: 20071012235042.GQ12467@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote:
> On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
> > Dear PostgreSQL Hackers:
> >
> > After following the hackers mailing list for quite a while,
> > I am going to start investigating what will need to be done
> > to improve hash index performance. Below are the pieces of
> > this project that I am currently considering:
> >
> > 1. Characterize the current hash index implementation against
> > the BTree index, with a focus on space utilization and
> > lookup performance against a collection of test data. This
> > will give a baseline performance test to evaluate the impact
> > of changes. I initially do not plan to bench the hash creation
> > process since my initial focus will be on lookup performance.
> >
>
> Here are very basic results for a table with 1.6m entries:
>
> postgres=# CREATE TABLE dict (word varchar(100));
> CREATE TABLE
>
> postgres=# COPY dict FROM '/tmp/words';
> COPY 1648379
> postgres=# select count(*) from dict;
> count
> ---------
> 1648379
> (1 row)
>
> Time: 11187.418 ms
> postgres=# select count(*) from dict;
> count
> ---------
> 1648379
> (1 row)
>
> Time: 6040.912 ms
> postgres=# CREATE INDEX wordhash ON dict USING hash (word);
> CREATE INDEX
> Time: 11108707.160 ms
>
> postgres=# select * from dict where word = 'avatar';
> word
> --------
> avatar
> (1 row)
>
> Time: 79.823 ms
> postgres=# select * from dict where word = 'zebra';
> word
> -------
> zebra
> (1 row)
>
> Time: 9.864 ms
> postgres=# select * from dict where word = 'turkey';
> word
> --------
> turkey
> (1 row)
>
> Time: 18.418 ms
> Time: 1.045 ms
> Time: 1.257 ms
> Time: 1.080 ms
>
> postgres=# CREATE INDEX wordbtree ON dict USING btree (word);
> CREATE INDEX
>
> Time: 25438.884 ms
>
> postgres=# select * from dict where word = 'avatar';
> word
> --------
> avatar
> (1 row)
>
> Time: 13.400 ms
> postgres=# select * from dict where word = 'zebra';
> word
> -------
> zebra
> (1 row)
>
> Time: 1.173 ms
> postgres=# select * from dict where word = 'turkey';
> word
> --------
> turkey
> (1 row)
>
> Time: 1.186 ms
> Time: 1.103 ms
> Time: 1.099 ms
> Time: 1.108 ms
>
> ------------------------------
> Size of table = 87556096
>
> Size of hash index = 268451840
> Size of btree index = 53510144
>
> From my very small sample on an unloaded machine, a hash index lookup
> took the least amount of time. It had a much larger initial time which
> could be attributable to cache population effects. The size is 5X that
> of the Btree index. I will continue to improve the test suite as more
> granularity is needed. If anyone has a good data generator, please let
> me know. Otherwise I will just roll my own.
>
> Regards,
> Ken
>
I have just re-run the previous benchmark against 8.3beta1 (versus 8.2),
including the hash index sorted build patch, and the results are below:

test=# CREATE TABLE dict (word varchar(100));
CREATE TABLE
Time: 24.463 ms
test=# COPY dict FROM '/tmp/cracklib-words';
LOG: checkpoints are occurring too frequently (21 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
COPY 1648379
Time: 44470.235 ms
test=# select count(*) from dict;
count
---------
1648379
(1 row)

Time: 4553.924 ms
test=# CREATE INDEX wordhash ON dict USING hash (word);
CREATE INDEX
Time: 85905.960 ms
test=# select * from dict where word = 'avatar';
word
--------
avatar
(1 row)

Time: 592.906 ms
test=# select * from dict where word = 'zebra';
word
-------
zebra
(1 row)

Time: 21.458 ms
test=# select * from dict where word = 'turkey';
word
--------
turkey
(1 row)

Time: 22.532 ms
Time: 1.224 ms
Time: 1.200 ms
Time: 1.264 ms
test=# CREATE INDEX wordbtree ON dict USING btree (word);
CREATE INDEX
Time: 33714.874 ms
test=# select * from dict where word = 'avatar';
word
--------
avatar
(1 row)

Time: 24.296 ms
test=# select * from dict where word = 'zebra';
word
-------
zebra
(1 row)

Time: 1.400 ms
test=# select * from dict where word = 'turkey';
word
--------
turkey
(1 row)

Time: 28.302 ms
Time: 1.284 ms
Time: 1.313 ms

---------------------------------------
Size of table = 69877760

Size of hash index = 268451840
Size of btree index = 48570368

I just wanted to have this baseline for future patches since
8.3 is just around the corner. As expected, the sorted build
process is a huge improvement over the unsorted original.

Regards,
Ken Marshall


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: twraney(at)comcast(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-10-22 19:09:18
Message-ID: 20071022190918.GM27320@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
> On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
> > This is a great starting point. I would appreciate it if you have the
> > time and could make it apply cleanly to HEAD.
>
> Just to give you an update on this, I'll try to find the time to get it
> done soon, but my day job is keeping me really busy these days, so I'm
> not sure when I'll be able to get to it...
>
> -Neil
>
Neil,

I am currently working on integrating your previous patch with the
8.3beta1 hash code + the sorted build patch by Tom Raney. I have
been adding back pieces that were removed and I had a question
about index tuples, in particular what does the size field indicate?
Is it the size in bytes of the data after the IndexTupleData or the
total size in bytes including the IndexTupleData?

In order to use the hitem with the sorted build process, I think
that the flags should provide the size info so that the tuplesort
code can be used. This will also allow the use of >32-bit hash
values. Am I completely off base?

Regards,
Ken


From: Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-11-06 06:47:17
Message-ID: 148682.2744.qm@web53408.mail.re2.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good a choice as btree indexes but
a significantly better choice, at least for some specific
circumstances."

We performed some probe tests of our patch on the
hash index and the original btree index to compare the
performance of the two. We used an 80 million tuple table.
The table contained a single integer attribute, and the data
range was 0 - 100000000 (generated by a random generator).
We did the following:

1. Populate the table with 80 million tuples.
2. Create HASH index on the table.
3. clear both linux cache & psql buffers.
(exiting psql and restarting it cleared the psql buffers;
to clear linux cache, we used drop_cache command)
4. start psql
5. select on an integer in the range of values in the table.
(all test numbers were big ones, like 98934599)
6. record the time.
7. exit psql.
8. drop caches.(as described above)
9. repeat 4-8 for different numbers.
10. Drop Hash index.
11. Create Btree index and repeat 3-9.

The results are as shown below. All trials are in ms.
Count is the number of such tuples in the table.

HASH INDEX:
Number Count Trial1 Trial2 Trial3
21367898 2 152.0 129.5 131.1
95678699 2 140.6 145.6 137.5
99899799 2 148.0 147.4 152.6
97677899 2 142.0 153.5 112.0
94311123 2 137.6 146.3 141.3
67544455 2 141.6 144.6 152.7
88877677 2 122.1 123.1 130.8
88788988 2 150.2 150.0 171.7
44333344 1 119.9 116.3 117.6
34267878 1 97.5 99.9 114.8
55489781 2 115.7 117.2 145.3
97676767 1 169.5 168.5 181.7
99767564 1 142.7 133.6 132.7
21367989 4 198.0 193.2 186.7

BTREE INDEX

Number Trial1 Trial2 Trial3
21367898 145.5 145.6 150.6
95678699 177.5 170.0 176.4
99899799 175.4 181.2 185.4
97677899 136.9 149.0 130.8
94311123 179.0 175.3 166.3
67544455 161.7 162.2 170.4
88877677 138.7 135.2 149.9
88788988 165.7 179.3 186.3
44333344 166.0 172.8 184.3
34267878 159.1 168.8 147.8
55489781 183.2 195.4 185.1
97676767 147.2 143.6 132.6
99767564 167.8 178.3 162.1
21367989 232.3 238.1 216.9
From the results obtained, the average of all the hash probes is 141.8 ms and
the average for btree is 168.5 ms, a difference of about 27 ms. The standard
deviations are about 23, so this is a statistically significant difference.

Our prediction was that the hash index would take on average one
probe at about 10 ms and the btree would take three probes at about 30 ms,
a difference of about 20 ms; this was borne out pretty well by the difference
of about 27 ms that we measured. We hope these data points will help with
some questions about the performance differences between hash and btree
indexes.

Regards,
Shreya

Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
"Kenneth Marshall" writes:

> On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
>
>> Using our implementation, build times and index sizes are
>> comparable with btree index build times and index sizes.
>...
> That is super! (and timely)

It is pretty super. I have a few comments to raise but don't take it to be too
negative, it sounds like this is a big step forward towards making hash
indexes valuable.

Firstly, in the graphs it seems the index size graph has a logarithmically
scaled x-axis but a linear y-axis. This makes it hard to see what I would
expect to be pretty much linear growth. The curves look exponential, which
on that scale would mean linear growth, but of course it's hard to tell.

Also, the growth in the time chart looks pretty much linear. That seems odd,
since I would expect a visible extra component given that sort times are
O(n log n). Perhaps you need to test still larger tables to see it, though.

In any case it's clear from the results you have there that the change is a
positive one and fixes a fundamental problem with the hash index build code.

Something else you should perhaps test is indexing something which is
substantially larger than hash function output. A hash function is going to
make the most sense when indexing something like strings for which you want to
avoid the long comparison costs. Especially consider testing this on a UTF8
locale with expensive comparisons (like a CJK locale for example).

Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good a choice as btree indexes but
a significantly better choice, at least for some specific
circumstances.

Also, if you're going to submit a patch you should check out a copy of the CVS
HEAD and work from that. I don't think there are huge differences in the area
of hash indexes though. But in most other areas you would be spending quite a
bit of time dealing with details which have changed since.

Finally note that we're in the final throes of the 8.3 feature freeze.
Normally any patch submitted now would be held until 8.3 is released and
development on 8.4 is started. I could imagine an exception being made for
hash indexes since they're so moribund currently but probably not. The flip
side of that argument is that there's not much point in making an exception
for something which will only be really useful once further work is done in
the same area.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com





From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Shreya Bhargava" <shreya_bhargav(at)yahoo(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-11-06 08:27:42
Message-ID: 9362e74e0711060027h3fd1370eu3d060b579a01b18f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have not followed this thread very closely. But just wanted to give my inputs.

> From the results obtained, the average of all the hash probes is 141.8 ms,
> the average for btree is 168.5 ms, a difference of about 27 ms. The standard
> deviations are about 23, so this is a statistically significant difference.
> Our prediction that the hash index would take on the average one
> probe for about 10ms and the btree would take three probes for about 30 ms
> or a difference of about 20ms was pretty well shown by the difference we
> got of about 27. Hope these data points will help with some questions
> about the performance differences between Hash and Btree index.
>

We all know that hash indexes are good for equality queries and btree
indexes are good for both equality queries and range queries. So for
equality queries, a hash index should do only one logical I/O (if there are
no hash collisions) and a btree index should do at least 3 (I don't know
the depth of the B-tree that resulted in this case). You can enable
the logical I/O count by applying this patch.

*** postgresql-8.3beta1/src/backend/storage/buffer/bufmgr.c	Tue Sep 25 18:11:48 2007
--- postgresql-8.3patch/src/backend/storage/buffer/bufmgr.c	Fri Oct 19 23:18:36 2007
***************
*** 1470,1477 ****
  	localhitrate = (float) LocalBufferHitCount * 100.0 / ReadLocalBufferCount;
  
  	appendStringInfo(&str,
! 		"!\tShared blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
! 		ReadBufferCount - BufferHitCount, BufferFlushCount, hitrate);
  	appendStringInfo(&str,
  		"!\tLocal blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
  		ReadLocalBufferCount - LocalBufferHitCount,
  		LocalBufferFlushCount, localhitrate);
--- 1470,1477 ----
  	localhitrate = (float) LocalBufferHitCount * 100.0 / ReadLocalBufferCount;
  
  	appendStringInfo(&str,
! 		"!\tShared blocks: %10ld Logical Reads, %10ld Physical Reads, %10ld written, buffer hit rate = %.2f%%\n",
! 		ReadBufferCount, ReadBufferCount - BufferHitCount, BufferFlushCount, hitrate);
  	appendStringInfo(&str,
  		"!\tLocal blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
  		ReadLocalBufferCount - LocalBufferHitCount,
  		LocalBufferFlushCount, localhitrate);

If possible, it would be useful for you to present the reduction in
Logical I/O count, as it very well might translate to Physical I/Os
for simple index scans. At least check whether it is in the ratio 1:3.

Hope my suggestion helps.

Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-11-06 10:59:54
Message-ID: 4730492A.7090609@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Shreya Bhargava wrote:
> 1. Populate the table with 80 million tuples.
> 2. Create HASH index on the table.
> 3. clear both linux cache & psql buffers.
> (exiting psql and restarting it cleared the psql buffers;
> to clear linux cache, we used drop_cache command)
> 4. start psql
> 5. select on an integer in the range of values in the table.
> (all test numbers were big ones, like 98934599)
> 6. record the time.
> 7. exit psql.
> 8. drop caches.(as described above)
> 9. repeat 4-8 for different numbers.
> 10. Drop Hash index.
> 11. Create Btree index and repeat 3-9.

It seems you're mostly measuring the overhead of starting a backend,
populating the relcache etc.

Restarting psql doesn't clear the postgres shared buffer cache. Or did
you mean that you restarted postgres?

Anyway, I don't think it's interesting to test with cleared caches.
Surely the metapage and first 1-2 levels of the b-tree would stay cached
all the time in real life.

> From the results obtained, the average of all the hash probes is 141.8 ms,
> the average for btree is 168.5 ms, a difference of about 27 ms. The standard
> deviations are about 23, so this is a statistically significant difference.

I don't trust those numbers much, but in any case I don't think that
edge is big enough to justify the existence of hash indexes.

If you're looking for a use case where hash index is faster, I'd suggest
using a data type with an expensive comparison function. Like long
multi-byte strings in UTF-8 encoding.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jens-Wolfhard Schicke <drahflow(at)gmx(dot)de>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-11-06 11:07:04
Message-ID: 47304AD8.9020506@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Shreya Bhargava wrote:
> "Note that the bottom line for the problems with hash indexes is that the
> current implementation doesn't offer any advantages over btree indexes. Hash
> indexes need to be not only as good a choice as btree indexes but
> a significantly better choice, at least for some
> specific circumstances."
Oh, it does. I recently used a hash index to speed up my database. Namely,
I found it improved performance when indexing a non-integer column
containing English words.

I don't know how much of that data was cached; judging by the sound
of my hard drive it wasn't all of it. Consider this anecdotal
evidence, but the speedup was noticeable.

> We performed some probe tests of our patch on the
> hash index and the original btree index to compare the
> performance of the two. We used an 80 million tuple table.
> The table contained a single integer attribute, and the data
> range was 0 - 100000000 (generated by a random generator).

I'd be interested in how much difference there is with non-integer
indexes. I at least had the impression that in my case the sorted
strings in the btree pages didn't compare very quickly.

> Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> "Kenneth Marshall" writes:
> > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
> >> Using our implementation, build times and index sizes are
> >> comparable with btree index build times and index sizes.
Way to go! Currently building hash indices is no fun.

Regards,
Jens-W. Schicke


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-11-06 14:34:42
Message-ID: 18874.1194359682@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com> writes:
> We performed some probe tests of our patch on the
> hash index and the original btree index to compare the
> performance of the two.

Interesting, but ...

> From the results obtained, the average of all the hash probes is 141.8 ms,
> the average for btree is 168.5 ms, a difference of about 27 ms. The standard
> deviations are about 23, so this is a statistically significant difference.

> Our prediction that the hash index would take on the average one
> probe for about 10ms and the btree would take three probes for about 30 ms
> or a difference of about 20ms was pretty well shown by the difference we
> got of about 27.

... unfortunately, a zero-cache situation isn't very reflective of the
real world. In reality, for an index that is getting used often enough
to impact application performance at all, the root page will stay in
cache and likely so will some of the upper tree levels. So I don't
think this experiment proves anything at all about real-world
performance.

What I'd like to see is an aggregate time measurement for millions of
random probes into an index that's noticeably larger than memory.
It would also be interesting to measure the speed of the fully-cached
case (same test, but index smaller than memory).

regards, tom lane


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-11-15 00:12:56
Message-ID: 20071115001255.GJ11020@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
> On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
> > This is a great starting point. I would appreciate it if you have the
> > time and could make it apply cleanly to HEAD.
>
> Just to give you an update on this, I'll try to find the time to get it
> done soon, but my day job is keeping me really busy these days, so I'm
> not sure when I'll be able to get to it...
>
> -Neil
>
Neil,

I have been working on merging an updated version of your
patch into the current source. My first attempt was to apply
your patch directly, but it differed so much from the
current tree that it was not obvious how to accommodate things
like the current hash index sorted build patch, which I need
in order to test with indexes of any real size. My current
approach is to replace the _hash_formitem() calls with a function
called _hash_form_tuple() that returns an IndexTuple
rather than a HashItem. This will allow it to be used quite
naturally with the current sorted build patch. Here is what
it looks like now:

/*
 * _hash_form_tuple -- construct index tuple using hash(value), not value
 */
IndexTuple
_hash_form_tuple(IndexTuple itup, Relation rel)
{
	IndexTuple	result;
	Size		size;
	uint32		hashkey;
	Datum		datum;
	bool		isnull;

	/* disallow nulls in hash keys */
	if (IndexTupleHasNulls(itup))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("hash indexes cannot contain null keys")));

	if (rel->rd_rel->relnatts != 1)
		elog(ERROR, "hash indexes support only one index key");

	/* hash the tuple; we only store the hash value in the index */
	datum = index_getattr(itup, 1, RelationGetDescr(rel), &isnull);
	Assert(!isnull);
	hashkey = _hash_datum2hashkey(rel, datum);

	/* for now, just return a copy of the original tuple */
	size = IndexTupleSize(itup);
	result = (IndexTuple) palloc(size);
	memcpy(result, itup, size);
	return result;
}

I am not currently doing anything other than returning the current
IndexTuple that was created with index_form_tuple(). Am I daft, or
can I just memcpy() the 6 bytes of TID, add the 2 bytes of t_info
(everything 0 and the size set to 6 + 2 + sizeof(hash) = 12), and
the 4 bytes of hash? This will allow me to handle 8-byte hashes
in the future. If you see a problem with this approach, please
let me know. I would appreciate any feedback you can give.

Regards,
Ken