Re: Fixing GIN for empty/null/full-scan cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Abe Ingersoll <abe(at)abe(dot)us>
Subject: Re: Fixing GIN for empty/null/full-scan cases
Date: 2011-01-18 21:58:15
Message-ID: 3234.1295387895@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> These numbers are a bit crazy-making, but the upshot is that Gist is
> slow out of the gate, but with data cached, it's pretty speedy. With
> indexscan and bitmapscan disabled, these queries all took 300-400
> ms. So GIN was never better performing than a table scan.

I could not replicate that here at all --- GIN indexscans were
consistently better than seqscans for me, eg

regression=# set enable_bitmapscan TO 1;
SET
Time: 0.673 ms
regression=# explain analyze SELECT count(*) FROM listings
WHERE features @@ '(1368799&1368800&1369043)'::query_int
AND deleted_at IS NULL AND status = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1159.20..1159.21 rows=1 width=0) (actual time=23.964..23.964 rows=1 loops=1)
-> Bitmap Heap Scan on listings (cost=31.15..1158.18 rows=406 width=0) (actual time=23.014..23.876 rows=772 loops=1)
Recheck Cond: ((features @@ '1368799 & 1368800 & 1369043'::query_int) AND (deleted_at IS NULL) AND (status = 1))
-> Bitmap Index Scan on idx_gin_features (cost=0.00..31.05 rows=406 width=0) (actual time=22.913..22.913 rows=772 loops=1)
Index Cond: (features @@ '1368799 & 1368800 & 1369043'::query_int)
Total runtime: 24.040 ms
(6 rows)

Time: 24.968 ms
regression=# set enable_bitmapscan TO 0;
SET
Time: 0.458 ms
regression=# explain analyze SELECT count(*) FROM listings
WHERE features @@ '(1368799&1368800&1369043)'::query_int
AND deleted_at IS NULL AND status = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=9158.24..9158.25 rows=1 width=0) (actual time=145.121..145.121 rows=1 loops=1)
-> Seq Scan on listings (cost=0.00..9157.22 rows=406 width=0) (actual time=0.025..144.982 rows=772 loops=1)
Filter: ((deleted_at IS NULL) AND (features @@ '1368799 & 1368800 & 1369043'::query_int) AND (status = 1))
Total runtime: 145.177 ms
(4 rows)

Time: 146.228 ms

I'm noticing also that I get different rowcounts than you do, although
possibly that has something to do with the partial-index conditions,
which I'm not trying to duplicate here (all rows in my table pass those
two tests).

> * Why does it take 3-4x longer to create the GIN than the GiST index
> on tsvector?

Perhaps more maintenance_work_mem would help with that; although the
fine manual says specifically that GIN text search indexes take about
three times longer to build than equivalent GiST indexes, so maybe that
behavior is as designed.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-01-18 22:00:21 Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Previous Message A.M. 2011-01-18 21:57:39 Re: test_fsync label adjustments