The cost of visibillity testing? (gin-search)

Lists: pgsql-hackers
From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-hackers(at)postgresql(dot)org
Subject: The cost of visibillity testing? (gin-search)
Date: 2010-12-21 19:25:16
Message-ID: 4D10FF1C.9030203@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Hackers.

I have a feeling that GIN is "cheating" on the visibillity checks:

test=# set enable_seqscan = off;
SET
Time: 0.129 ms
test=# select count(id) from fts_test where fts @@ to_tsquery('core');
count
--------
158827
(1 row)

Time: 95.530 ms
test=# explain select count(id) from fts_test where fts @@
to_tsquery('core');
QUERY PLAN
----------------------------------------------------------------------------------------------
Aggregate (cost=211571.52..211571.53 rows=1 width=4)
-> Bitmap Heap Scan on fts_test (cost=134925.95..211174.01
rows=159004 width=4)
Recheck Cond: (fts @@ to_tsquery('core'::text))
-> Bitmap Index Scan on fts_idx (cost=0.00..134886.20
rows=159004 width=0)
Index Cond: (fts @@ to_tsquery('core'::text))
(5 rows)

Time: 0.609 ms

test=# select count(id) from fts_test;
count
--------
168556
(1 row)

Time: 164.655 ms

test=# explain select count(id) from fts_test;
QUERY PLAN
------------------------------------------------------------------------------------------------
Aggregate (cost=10000075969.95..10000075969.96 rows=1 width=4)
-> Seq Scan on fts_test (cost=10000000000.00..10000075548.56
rows=168556 width=4)
(2 rows)

Time: 0.338 ms

This is run multiple times for both queries and the seqscan of the table
is consistently about 1.8 times more expensive than the fts-scan.
This is all on a fully memory cached dataset.

The first query should have the cost of the GIN-search +
visibillity-test of 158K tuples,
the latter should have the cost of visibillity-testing 168K tuples. If
we set the cost
of actually searching GIN to 0 then the gin-search - visibillity costs:
95/158000 0.000373ms/tuple
where the seq-scan case costs close to 0.001ms/tuple (close to 3 times
as much).

So I have a strong feeling that GIN is cheating on the visibillity tests
otherwise I have problems imagining how it ever can become faster to execute
than the seq_scan of the table.

Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for
visibillity-testing?

What have I missed in the logic?

Thanks.

--
Jesper


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: The cost of visibillity testing? (gin-search)
Date: 2010-12-21 19:35:03
Message-ID: 4D110167.9010006@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21.12.2010 21:25, Jesper Krogh wrote:
> The first query should have the cost of the GIN-search +
> visibillity-test of 158K tuples,
> the latter should have the cost of visibillity-testing 168K tuples. If
> we set the cost
> of actually searching GIN to 0 then the gin-search - visibillity costs:
> 95/158000 0.000373ms/tuple
> where the seq-scan case costs close to 0.001ms/tuple (close to 3 times
> as much).
>
> So I have a strong feeling that GIN is cheating on the visibillity tests
> otherwise I have problems imagining how it ever can become faster to
> execute
> than the seq_scan of the table.
>
> Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for
> visibillity-testing?

It certainly shouldn't be.

> What have I missed in the logic?

Perhaps you have a lot of empty space or dead tuples that don't match
the query in the table, which the sequential scan has to grovel through,
but the bitmap scan skips? What does EXPLAIN ANALYZE of both queries say?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>
Subject: Re: The cost of visibillity testing? (gin-search)
Date: 2010-12-21 20:28:15
Message-ID: 201012212128.16969.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday 21 December 2010 20:25:16 Jesper Krogh wrote:
> What have I missed in the logic?
A reproducible testcase ;-)

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The cost of visibillity testing? (gin-search)
Date: 2010-12-21 23:37:52
Message-ID: 8036.1292974672@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 21.12.2010 21:25, Jesper Krogh wrote:
>> Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for
>> visibillity-testing?

> It certainly shouldn't be.

>> What have I missed in the logic?

> Perhaps you have a lot of empty space or dead tuples that don't match
> the query in the table, which the sequential scan has to grovel through,
> but the bitmap scan skips? What does EXPLAIN ANALYZE of both queries say?

Another possibility is that the seqscan is slowed by trying to operate
in a limited number of buffers (the buffer strategy stuff).

regards, tom lane


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: The cost of visibillity testing? (gin-search)
Date: 2010-12-22 05:24:48
Message-ID: 4D118BA0.6000903@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-12-21 21:28, Andres Freund wrote:
> On Tuesday 21 December 2010 20:25:16 Jesper Krogh wrote:
>
>> What have I missed in the logic?
>>
> A reproducible testcase ;-)
>
Yes, I did a complete dump/restore of the dataset and the numbers
looked like expected. So table bloat seems to be the problem/challenge.

I must have hit a strange sitauation where my table-bloat proportionally
was significantly higher than my gin-index-bloat.

Jesper

--
Jesper