B-Tree emulation for GIN - some numbers

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: B-Tree emulation for GIN - some numbers
Date: 2008-11-19 16:32:32
Message-ID: Pine.LNX.4.64.0811191828050.7862@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi there,

we got no comments on our proposal about changing GIN interface to
support multicolumn feature of GIN, see
http://archives.postgresql.org/message-id/490B3752.3010800@sigaev.ru
for details, so here are some numbers demonstrating benefits and some
issues. Any help and feedback are welcome.

We used geonames database with 5793013 rows, btree_gin patch (v.0.5)
for CVS HEAD and contrib/btree_gin, which adds GIN support for all
data types, like contrib/btree_gist does.

postgres=# \d spots
Table "public.spots"
Column | Type | Modifiers
--------+----------+-----------
lat | real |
long | real |
name | text |
name1 | text |
fts | tsvector |

We used 3 indexes - btree index on lat column (lat_idx),
multicolumn GIN index on lat,fts columns (lat_fts_idx) and
GIN index on fts column (fts_idx).
The test query returns 7653 rows.
explain analyze select name from spots where fts @@ to_tsquery('river') and lat < 20 and lat > 10;

T1. lat_idx + fts_idx : 152.612 ms
T2. lat_fts_idx : 34.051 ms
T3. lat_idx + lat_fts_idx : 152.430 ms
T4. Disabled bitmapscan : 30.005 ms

T2 vs T1 show >4 times better performance for multicolumn GIN index (T2) than
BitmapAnd scan over two separate indexes (T1).

Several issues:
1. in T3 we see than planner prefer to use Btree index on lat_idx instead of
using lat_fts_idx as in T2, so we got performance of T1 instead of T2.

2. Disabling bitmapscan produces the best results in all cases and
the same plan. Changing statistics for lat column (we tried 100,1000)
doesn't change plan.

==============================

T1. btree index + GIN index
Indexes:
"fts_idx" gin (fts)
"lat_idx" btree (lat)

Q1.
postgres=# explain analyze select name from spots where fts @@ to_tsquery('river') and lat < 20 and lat > 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on spots (cost=12838.77..23940.34 rows=3391 width=12) (actual time=143.451..149.288 rows=7653 loops=1)
Recheck Cond: ((fts @@ to_tsquery('river'::text)) AND (lat < 20::double precision) AND (lat > 10::double precision))
-> BitmapAnd (cost=12838.77..12838.77 rows=3391 width=0) (actual time=143.097..143.097 rows=0 loops=1)
-> Bitmap Index Scan on fts_idx (cost=0.00..893.21 rows=30896 width=0) (actual time=8.543..8.543 rows=32820 loops=1)
Index Cond: (fts @@ to_tsquery('river'::text))
-> Bitmap Index Scan on lat_idx (cost=0.00..11943.62 rows=635886 width=0) (actual time=132.686..132.686 rows=618548 loops=1)
Index Cond: ((lat < 20::double precision) AND (lat > 10::double precision))
Total runtime: 152.040 ms
(8 rows)

Time: 152.612 ms

T2. One multicolumn GIN index
Indexes:
"lat_fts_idx" gin (lat, fts)

postgres=# explain analyze select name from spots where fts @@ to_tsquery('river') and lat < 20 and lat > 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on spots (cost=1011.83..54427.00 rows=3043 width=12) (actual time=10.374..31.349 rows=7653 loops=1)
Recheck Cond: (fts @@ to_tsquery('river'::text))
Filter: ((lat < 20::double precision) AND (lat > 10::double precision))
-> Bitmap Index Scan on lat_fts_idx (cost=0.00..1011.07 rows=28965 width=0) (actual time=8.439..8.439 rows=32820 loops=1)
Index Cond: (fts @@ to_tsquery('river'::text))
Total runtime: 34.051 ms
(6 rows)

T3. btree index + multicolumn GIN index

"lat_fts_idx" gin (lat, fts)
"lat_idx" btree (lat)

postgres=# explain analyze select name from spots where fts @@ to_tsquery('river') and lat < 20 and lat > 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on spots (cost=12442.97..22532.06 rows=3043 width=12) (actual time=142.979..149.162 rows=7653 loops=1)
Recheck Cond: ((fts @@ to_tsquery('river'::text)) AND (lat < 20::double precision) AND (lat > 10::double precision))
-> BitmapAnd (cost=12442.97..12442.97 rows=3043 width=0) (actual time=142.608..142.608 rows=0 loops=1)
-> Bitmap Index Scan on lat_fts_idx (cost=0.00..1011.07 rows=28965 width=0) (actual time=8.493..8.493 rows=32820 loops=1)
Index Cond: (fts @@ to_tsquery('river'::text))
-> Bitmap Index Scan on lat_idx (cost=0.00..11430.13 rows=608537 width=0) (actual time=132.225..132.225 rows=618548 loops=1)
Index Cond: ((lat < 20::double precision) AND (lat > 10::double precision))
Total runtime: 151.880 ms
(8 rows)

Time: 152.430 ms

T4. Disabled bitmapscan

postgres=# set enable_bitmapscan to off;
SET
Time: 0.101 ms

postgres=# explain analyze select name from spots where fts @@ to_tsquery('river') and lat < 20 and lat > 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using lat_fts_idx on spots (cost=0.00..96821.54 rows=3043 width=12) (actual time=0.040..26.847 rows=7653 loops=1)
Index Cond: (fts @@ to_tsquery('river'::text))
Filter: ((lat < 20::double precision) AND (lat > 10::double precision))
Total runtime: 29.543 ms
(4 rows)

Time: 30.005 ms

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Browse pgsql-hackers by date

  From Date Subject
Next Message Alex Hunsaker 2008-11-19 17:01:37 Re: Client certificate authentication
Previous Message Emmanuel Cecchet 2008-11-19 15:40:38 Re: Transactions and temp tables