Re: trgm regex index peculiarity

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: trgm regex index peculiarity
Date: 2013-06-21 10:40:38
Message-ID: dafad644f268ce1503e1b8b682aae38a.squirrel@webmail.xs4all.nl
Lists: pgsql-hackers

On Fri, June 21, 2013 05:25, Tom Lane wrote:
> "Erik Rijkers" <er(at)xs4all(dot)nl> writes:
>> In a 112 MB test table (containing randomly generated text) with a trgm index (gin_trgm_ops),
>> I consistently get these timings:
>> select txt from azjunk6 where txt ~ '^abcd';
>> 130 ms
>> select txt from azjunk6
>> where txt ~ 'abcd' and substr(txt,1,4) = 'abcd';
>> 3 ms
>
> Hm, could you provide a self-contained test case?
>

Yes, sorry. I tested on a 1M-row table:

#!/bin/sh

# create table:
# (gin_trgm_ops comes from the pg_trgm extension, which must already be installed)
for power in 6;
do
  table=azjunk${power}
  index=${table}_trgm_re_idx
  # generate 1e${power} rows of 80 random characters (a-z and space) and load them via COPY
  perl -E'
    sub ss{ join"",@_[ map{rand @_} 1 .. shift ] };
    say(ss(80,"a".."g"," ","h".."m"," ","n".."s"," ","t".."z")) for 1 .. 1e'"${power};" \
  | psql -aqXc "
    drop table if exists $table;
    create table $table(txt text);
    copy $table from stdin;";
  # build the trigram GIN index and refresh planner statistics
  echo "set session maintenance_work_mem='1GB';
    create index $index on $table using gin (txt gin_trgm_ops);
    analyze $table;" | psql -qtAX;
done

# test:
echo "
\\timing on
explain analyze select txt from azjunk6 where txt ~ '^abcd'; -- slow (140 ms)
explain analyze select txt from azjunk6 where txt ~ 'abcd' and substr(txt,1,4) = 'abcd'; -- fast (5 ms)
" | psql -Xqa
