Choosing between seqscan and bitmap scan

Lists: pgsql-hackers
From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Choosing between seqscan and bitmap scan
Date: 2010-04-29 09:33:46
Message-ID: 4BD9527A.60905@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

There is some strange on current CVS with correct choosing of scans. Although
bitmap scan is cheaper but postgresql chooses seqscan. Test suite:

CREATE OR REPLACE FUNCTION genvect()
RETURNS tsvector AS
$$
SELECT

array_to_string(
ARRAY(
SELECT
(random()*random()*random()*1000.0)::int::text
FROM
generate_series(1, 10 + (100.0*random())::bigint)
),
' '
)::tsvector;
$$
LANGUAGE SQL VOLATILE;

SELECT
t::int4 AS id, genvect() AS ts INTO foo
FROM
generate_series(1, 100000) AS t;

CREATE INDEX foo_idx ON foo USING gin (ts);

VACCUM ANALYZE foo;

postgres=# explain select count(*) from foo where ts @@ '259';
QUERY PLAN
---------------------------------------------------------------
Aggregate (cost=5817.27..5817.28 rows=1 width=0)
-> Seq Scan on foo (cost=0.00..5805.00 rows=4907 width=0)
Filter: (ts @@ '''259'''::tsquery)
(3 rows)

Time: 6,370 ms
postgres=# set enable_seqscan = off;
SET
Time: 2,014 ms
postgres=# explain select count(*) from foo where ts @@ '259';
QUERY PLAN
---------------------------------------------------------------------------------
Aggregate (cost=5767.35..5767.36 rows=1 width=0)
-> Bitmap Heap Scan on foo (cost=942.46..5755.08 rows=4907 width=0)
Recheck Cond: (ts @@ '''259'''::tsquery)
-> Bitmap Index Scan on foo_idx (cost=0.00..941.24 rows=4907 width=0)
Index Cond: (ts @@ '''259'''::tsquery)
(5 rows)

Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)?

Changed options in postgresql.conf:
shared_buffers=128MB
temp_buffers=16MB
work_mem=16MB
maintenance_work_mem=256MB
effective_cache_size=1024MB

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Choosing between seqscan and bitmap scan
Date: 2010-04-29 10:02:09
Message-ID: t2ge94e14cd1004290302oe3eff2cewcc295dbd89b6bcf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/4/29 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
> Hi!
>
> There is some strange on current CVS with correct choosing of scans.

Also true with 8.4, default configuration.

> Although bitmap scan is cheaper but postgresql chooses seqscan. Test suite:
>
> CREATE OR REPLACE FUNCTION genvect()
> RETURNS tsvector AS
> $$
>    SELECT
>
>        array_to_string(
>            ARRAY(
>                SELECT
>                    (random()*random()*random()*1000.0)::int::text
>                FROM
>                    generate_series(1, 10 + (100.0*random())::bigint)
>            ),
>            ' '
>        )::tsvector;
> $$
> LANGUAGE SQL VOLATILE;
>
> SELECT
>    t::int4 AS id, genvect() AS ts INTO foo
> FROM
>    generate_series(1, 100000) AS t;
>
> CREATE INDEX foo_idx ON foo USING gin (ts);
>
> VACCUM ANALYZE foo;
>
> postgres=# explain  select count(*) from foo where ts @@ '259';
>                          QUERY PLAN
> ---------------------------------------------------------------
>  Aggregate  (cost=5817.27..5817.28 rows=1 width=0)
>   ->  Seq Scan on foo  (cost=0.00..5805.00 rows=4907 width=0)
>         Filter: (ts @@ '''259'''::tsquery)
> (3 rows)
>
> Time: 6,370 ms
> postgres=# set enable_seqscan = off;
> SET
> Time: 2,014 ms
> postgres=# explain  select count(*) from foo where ts @@ '259';
>                                   QUERY PLAN
> ---------------------------------------------------------------------------------
>  Aggregate  (cost=5767.35..5767.36 rows=1 width=0)
>   ->  Bitmap Heap Scan on foo  (cost=942.46..5755.08 rows=4907 width=0)
>         Recheck Cond: (ts @@ '''259'''::tsquery)
>         ->  Bitmap Index Scan on foo_idx  (cost=0.00..941.24 rows=4907
> width=0)
>               Index Cond: (ts @@ '''259'''::tsquery)
> (5 rows)
>
> Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)?
>
> Changed options in postgresql.conf:
> shared_buffers=128MB
> temp_buffers=16MB
> work_mem=16MB
> maintenance_work_mem=256MB
> effective_cache_size=1024MB
>
>
>
> --
> Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru
>                                                   WWW: http://www.sigaev.ru/
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Cédric Villemain


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Choosing between seqscan and bitmap scan
Date: 2010-04-29 13:38:21
Message-ID: 10295.1272548301@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> [ planner prefers ]
> -> Seq Scan on foo (cost=0.00..5805.00 rows=4907 width=0)
> to
> -> Bitmap Heap Scan on foo (cost=942.46..5755.08 rows=4907 width=0)

> Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)?

There's a fuzz factor of (IIRC) 1% in path cost comparisons. It's
deciding that the seqscan and bitmapscan total costs are not
meaningfully different; then since the startup costs *are* meaningfully
different, it's making the choice on the basis of cheaper startup cost.

regards, tom lane


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Choosing between seqscan and bitmap scan
Date: 2010-04-29 14:02:41
Message-ID: Pine.LNX.4.64.1004291800370.7097@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 29 Apr 2010, Tom Lane wrote:

> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> [ planner prefers ]
>> -> Seq Scan on foo (cost=0.00..5805.00 rows=4907 width=0)
>> to
>> -> Bitmap Heap Scan on foo (cost=942.46..5755.08 rows=4907 width=0)
>
>> Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)?
>
> There's a fuzz factor of (IIRC) 1% in path cost comparisons. It's
> deciding that the seqscan and bitmapscan total costs are not
> meaningfully different; then since the startup costs *are* meaningfully
> different, it's making the choice on the basis of cheaper startup cost.

hmm, what if we add index scan preference inside 1% tolerance ?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Choosing between seqscan and bitmap scan
Date: 2010-04-29 14:10:36
Message-ID: 10945.1272550236@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> On Thu, 29 Apr 2010, Tom Lane wrote:
>> There's a fuzz factor of (IIRC) 1% in path cost comparisons. It's
>> deciding that the seqscan and bitmapscan total costs are not
>> meaningfully different; then since the startup costs *are* meaningfully
>> different, it's making the choice on the basis of cheaper startup cost.

> hmm, what if we add index scan preference inside 1% tolerance ?

Why? IMO this behavior is perfectly reasonable; in fact I've sometimes
thought the fuzz threshold should be a lot more than 1%. There is no
reason for the planner to believe that the bitmapscan is meaningfully
superior on total cost, while it is clearly inferior on startup cost.

If your problem is that the seqscan is a lot slower in reality,
the answer to that is to twiddle the cost parameters so that the
planner knows that, not to object to this logic.

regards, tom lane