Wildcard search support for pg_trgm

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Wildcard search support for pg_trgm
Date: 2010-12-11 21:07:32
Message-ID: AANLkTi=2iL7_ZdQgnm2d1qiNC2rJM-faCVtFu2K5A8Lz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hackers,

Here is first version of patch, which enable index support of wildcard
search in pg_trgm contrib module. The idea of the patch is to extract from
wildcard trigrams which should occurs in wildcard matching string. For
example, for '%sector%' wildcard such trigrams would be: 'sec', 'ect',
'tor'.

create table words (word text);
copy words from '/usr/share/dict/american-english';

test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN

------------------------------------------------------------------------------------------------------
Seq Scan on words (cost=0.00..1703.11 rows=10 width=9) (actual
time=18.818..174.146 rows=7 loops=1)
Filter: (word ~~* '%independ%'::text)
Total runtime: 174.200 ms
(3 rows)

CREATE INDEX trgm_idx ON words USING gist (word gist_trgm_ops);

test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN

------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words (cost=4.36..40.11 rows=10 width=9) (actual
time=2.445..2.529 rows=7 loops=1)
Recheck Cond: (word ~~* '%independ%'::text)
-> Bitmap Index Scan on trgm_idx (cost=0.00..4.35 rows=10 width=0)
(actual time=2.406..2.406 rows=7 loops=1)
Index Cond: (word ~~* '%independ%'::text)
Total runtime: 2.612 ms
(5 rows)

CREATE INDEX trgm_idx ON words USING gin (word gin_trgm_ops);

test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words (cost=76.08..111.83 rows=10 width=9) (actual
time=2.675..2.755 rows=7 loops=1)
Recheck Cond: (word ~~* '%independ%'::text)
-> Bitmap Index Scan on trgm_idx (cost=0.00..76.07 rows=10 width=0)
(actual time=2.642..2.642 rows=7 loops=1)
Index Cond: (word ~~* '%independ%'::text)
Total runtime: 2.839 ms
(5 rows)

I've encountered with following problems:
1) Indexing support for ilike is possible only with case-insensetive
wildcards, e.g. when IGNORECASE macro is enabled. But I can't use this macro
in pg_trgm.sql.in, where list of operators is defined. Probably, is it
enuogh to put comment near IGNORECASE, which tells that if one disable this
macro he should also remove oparators from pg_trgm.sql.in?
2) I found gist index not very useful with default SIGLENINT = 3. I've
changed this value to 15 and I found gist index performs very good on
dictionary. But on longer strings greater values of SIGLENINT may be
required (probably even SIGLENINT > 122 will give benefit in some cases in
spite of TOAST).

----
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
trgm_wildcard-0.1.patch text/x-patch 13.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2010-12-11 21:08:32 pg_execute_from_file, patch v10
Previous Message Heikki Linnakangas 2010-12-11 21:03:23 Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records