From: | "Erik Rijkers" <er(at)xs4all(dot)nl> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | trgm regex index peculiarity |
Date: | 2013-06-21 01:38:24 |
Message-ID: | 786ef170d3cb6b991425a8b364e77c57.squirrel@webmail.xs4all.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
9.4devel (but same in 9.3)
In a 112 MB test table (containing random generated text) with a trgm index (gin_trgm_ops), I consistently get these timings:
select txt from azjunk6 where txt ~ '^abcd';
130 ms
select txt from azjunk6
where txt ~ 'abcd' and substr(txt,1,4) = 'abcd';
3 ms
(a similar performance difference occurs when using a regex, i.e. 'abc[de]' )
This difference is so large that I wonder if there is not something wrong in the first case. (The returned results are
correct though)
Here are the two explains:
explain analyze select txt from azjunk6 where txt ~ '^abcd';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on azjunk6 (cost=108.78..484.93 rows=100 width=81) (actual time=129.557..129.742 rows=1 loops=1)
Recheck Cond: (txt ~ '^abcd'::text)
Rows Removed by Index Recheck: 17
-> Bitmap Index Scan on azjunk6_trgm_re_idx (cost=0.00..108.75 rows=100 width=0) (actual time=129.503..129.503 rows=18
loops=1)
Index Cond: (txt ~ '^abcd'::text)
Total runtime: 130.008 ms
(6 rows)
explain analyze select txt from azjunk6 where txt ~ 'abcd' and substr(txt,1,4) = 'abcd';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on azjunk6 (cost=56.75..433.40 rows=1 width=81) (actual time=2.064..3.379 rows=1 loops=1)
Recheck Cond: (txt ~ 'abcd'::text)
Rows Removed by Index Recheck: 14
Filter: (substr(txt, 1, 4) = 'abcd'::text)
Rows Removed by Filter: 112
-> Bitmap Index Scan on azjunk6_trgm_re_idx (cost=0.00..56.75 rows=100 width=0) (actual time=1.911..1.911 rows=127
loops=1)
Index Cond: (txt ~ 'abcd'::text)
Total runtime: 3.409 ms
(8 rows)
The results in both cases are correct, but does this difference not almost amount to a bug?
( Interestingly, the variant WHERE txt ~ 'abcd$'
is as fast as the non-anchored variant )
Thanks,
Erik Rijkers
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2013-06-21 02:19:04 | Re: fallocate / posix_fallocate for new WAL file creation (etc...) |
Previous Message | Bruce Momjian | 2013-06-21 01:35:34 | Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL) |