Hi, hackers.

I posted a hash patch in a previous thread:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00794.php
I apologize for the poor readability of that patch. Thank you all for your comments.

Here is a new patch that fixes some bugs in the previous one. I am posting it here to get feedback and further suggestions. Any comment is welcome.

Changes since v1:
- fix a crash in _h_spool when testing a big data set
- adjust the target-fillfactor calculation in _hash_metapinit
- remove the HASHVALUE_ONLY macro
- replace _create_hash_desc with _get_hash_desc to get a hard-coded hash index tuple descriptor
- replace index_getattr with _hash_get_datum to get the hash key datum and avoid too many calls to _get_hash_desc and index_getattr

Here is what I intend to do next.

Todo:
- get statistics on block access I/O
- write unit tests using pgunitest to test the following (as Josh Berkus suggested in this thread:
  http://archives.postgresql.org/pgsql-hackers/2008-05/msg00535.php ):
    bulk load, both COPY and INSERT
    single-row updates, inserts and deletes
    batch update by key
    batch update by other index
    batch delete by key
    batch delete by other index
    concurrent index updates (64 connections inserting/deleting concurrently)

I ran the simple tests mentioned here
( http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php )
using a word list of 3628800 unique words. The table size is 139MB. I'll run tests on a bigger data set later.

Index        BuildTime        IndexSize
----         ----             ----
btree        51961.123 ms     93MB
hash         411069.264 ms    2048MB
hash-patch   36288.931 ms     128MB

dict=# SELECT * from hash-dict where word = '0234567891' ;
    word
------------
 0234567891
(1 row)

Time: 33.960 ms

dict=# SELECT * from btree-dict where word = '0234567891' ;
    word
------------
 0234567891
(1 row)

Time: 1.662 ms

dict=# SELECT * from hash2-dict where word = '0234567891' ;
    word
------------
 0234567891
(1 row)

Time: 1.457 ms

Finally, there is one problem I have run into. I'm confused by the function _hash_checkqual.
IMHO, the index tuple stores only one column here, so key->sk_attno should always be 1. scanKeySize should also be 1, since multi-column hash indexes are not supported yet. Am I misunderstanding something?

    /*
     * _hash_checkqual -- does the index tuple satisfy the scan conditions?
     */
    bool
    _hash_checkqual(IndexScanDesc scan, IndexTuple itup)
    {
        TupleDesc   tupdesc = RelationGetDescr(scan->indexRelation);
        ScanKey     key = scan->keyData;
        int         scanKeySize = scan->numberOfKeys;

        IncrIndexProcessed();

        while (scanKeySize > 0)
        {
            Datum       datum;
            bool        isNull;
            Datum       test;

            datum = index_getattr(itup,
                                  key->sk_attno,
                                  tupdesc,
                                  &isNull);

            /* assume sk_func is strict */
            if (isNull)
                return false;
            if (key->sk_flags & SK_ISNULL)
                return false;

            test = FunctionCall2(&key->sk_func, datum, key->sk_argument);

            if (!DatumGetBool(test))
                return false;

            key++;
            scanKeySize--;
        }

        return true;
    }

Hope to hear from you.

-- 
Best Regards,
Xiao Meng

DKERC, Harbin Institute of Technology, China
Gtalk: mx(dot)cogito(at)gmail(dot)com
MSN: cnEnder(at)live(dot)com
http://xiaomeng.yo2.cn
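P.S. To make the _hash_checkqual question concrete, here is a minimal standalone sketch of the same loop shape. It is only a model, not PostgreSQL code: MockScanKey, MockIndexTuple and mock_checkqual are hypothetical names invented for illustration, the attribute values are plain longs instead of Datums, and the comparison is a hard-coded equality test instead of a call through sk_func. It only shows that, when there is exactly one key on attribute 1, the general loop runs its body once and behaves like a single-column check; it does not say whether the real function could be simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scan key; NOT PostgreSQL's ScanKeyData. */
typedef struct MockScanKey
{
    int  attno;        /* 1-based attribute number, modelled on sk_attno */
    bool arg_is_null;  /* models the SK_ISNULL flag on the key */
    long argument;     /* comparison value, modelled on sk_argument */
} MockScanKey;

/* Hypothetical stand-in for an index tuple with natts attributes. */
typedef struct MockIndexTuple
{
    int         natts;   /* number of attributes stored in the tuple */
    const long *values;  /* attribute values */
    const bool *isnull;  /* per-attribute null flags */
} MockIndexTuple;

/*
 * Returns true only if every key matches the tuple. The comparison is
 * assumed to be strict, so any null input makes the tuple fail.
 */
bool
mock_checkqual(const MockIndexTuple *itup, const MockScanKey *keys, int nkeys)
{
    assert(itup != NULL && nkeys >= 0);

    for (int i = 0; i < nkeys; i++)
    {
        const MockScanKey *key = &keys[i];

        /* With a single-column index, attno can only ever be 1. */
        assert(key->attno >= 1 && key->attno <= itup->natts);

        if (itup->isnull[key->attno - 1])
            return false;
        if (key->arg_is_null)
            return false;
        if (itup->values[key->attno - 1] != key->argument)
            return false;
    }

    return true;
}
```

With nkeys = 1 and attno = 1 the loop degenerates to one null check and one comparison, which is the case I believe the hash index always hits today.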
Attachment:
hash-v2.patch
Description: Text Data