Re: Question on Explain : Index Scan

Lists: pgsql-general
From: DM <dm(dot)aeqa(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Question on Explain : Index Scan
Date: 2010-10-21 01:47:30
Message-ID: AANLkTimxr7YnvfWcTgALca3TwEmQhOT2sDiMw7JXSOZ0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Question on Index scan:
--------------------------------------------------------------------------->
test=# \d test_seqindex1
Table "public.test_seqindex1"
Column | Type | Modifiers
--------+-----------------------+-----------
sid | character varying(13) | not null
name | character varying(80) |
Indexes:
"test_seqindex1_pkey" PRIMARY KEY, btree (sid)

test=# explain select * from test_seqindex1 where sid='AA023';
QUERY
PLAN
-------------------------------------------------------------------------------------------
Index Scan using test_seqindex1_pkey on test_seqindex1 (cost=0.00..8.27
rows=1 width=28)
Index Cond: ((sid)::text = 'AA023'::text)

test=# \d test_seqindex2
Table "public.test_seqindex2"
Column | Type | Modifiers
--------+-----------------------+-----------
eid | integer | not null
sid | character varying(13) |
ename | character varying(80) |
Indexes:
"test_seqindex2_pkey" PRIMARY KEY, btree (eid)
"idx_test_seqindex2_sid" btree (sid)

test=# explain select * from test_seqindex2 where sid='AA023';
QUERY
PLAN
--------------------------------------------------------------------------------------
Bitmap Heap Scan on test_seqindex2 (cost=4.95..275.53 rows=73 width=30)
Recheck Cond: ((sid)::text = 'AA023'::text)
-> Bitmap Index Scan on idx_test_seqindex2_sid (cost=0.00..4.93 rows=73
width=0)
Index Cond: ((sid)::text = 'AA023'::text)

test=explain select * from test_seqindex1 t1,test_seqindex2 t2 where
t1.sid=t2.sid;
QUERY
PLAN
------------------------------------------------------------------------------------
Hash Join (cost=1231.55..46386.19 rows=920544 width=58)
Hash Cond: ((t2.sid)::text = (t1.sid)::text)
-> Seq Scan on test_seqindex2 t2 (cost=0.00..16225.97 rows=920697
width=30)
-> Hash (cost=581.80..581.80 rows=33580 width=28)
-> Seq Scan on test_seqindex1 t1 (cost=0.00..581.80 rows=33580
width=28)
(5 rows)

*I was hoping the optimizer would do a join using index scan*.

*Could some one please explain me why its not doing an index scan rather
than sequential scan .*

Thanks
Deepak


From: Mathieu De Zutter <mathieu(at)dezutter(dot)org>
To: DM <dm(dot)aeqa(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Question on Explain : Index Scan
Date: 2010-10-21 10:20:43
Message-ID: AANLkTik+AFrvM+ExApccijUN0pnbi=9Nt0iZotP88z31@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Oct 21, 2010 at 3:47 AM, DM <dm(dot)aeqa(at)gmail(dot)com> wrote:
> I was hoping the optimizer would do a join using index scan.
>
> Could some one please explain me why its not doing an index scan rather than
> sequential scan .

A index scan would be probably slower here because you're asking for a
lot of rows. A lot of rows means a lot of I/O, and an index scan is
more I/O intensive (since it has to read the index too). If you limit
the result (by being more selective in your where clause, just like
you do in the first two queries), postgres will most likely switch to
index scan.

You can see for yourself if index-scan would be faster in your case by
running the following command before "explain (analyze)":

set enable_seqscan = off;

BTW, try to use explain analyze instead of explain, that way you'll
see the actual timings too instead of just the planner estimates.

Kind regards,
Mathieu


From: DM <dm(dot)aeqa(at)gmail(dot)com>
To: Mathieu De Zutter <mathieu(at)dezutter(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Question on Explain : Index Scan
Date: 2010-10-21 18:09:54
Message-ID: AANLkTinCPyy5reWELSJX3YRZPwFH_hgA8_YNcAOkxyv9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

perfecto, thank you for the explanation.

- Deepak

On Thu, Oct 21, 2010 at 3:20 AM, Mathieu De Zutter <mathieu(at)dezutter(dot)org>wrote:

> On Thu, Oct 21, 2010 at 3:47 AM, DM <dm(dot)aeqa(at)gmail(dot)com> wrote:
> > I was hoping the optimizer would do a join using index scan.
> >
> > Could some one please explain me why its not doing an index scan rather
> than
> > sequential scan .
>
>
> A index scan would be probably slower here because you're asking for a
> lot of rows. A lot of rows means a lot of I/O, and an index scan is
> more I/O intensive (since it has to read the index too). If you limit
> the result (by being more selective in your where clause, just like
> you do in the first two queries), postgres will most likely switch to
> index scan.
>
> You can see for yourself if index-scan would be faster in your case by
> running the following command before "explain (analyze)":
>
> set enable_seqscan = off;
>
> BTW, try to use explain analyze instead of explain, that way you'll
> see the actual timings too instead of just the planner estimates.
>
> Kind regards,
> Mathieu
>


From: DM <dm(dot)aeqa(at)gmail(dot)com>
To: Mathieu De Zutter <mathieu(at)dezutter(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Question on Explain : Index Scan
Date: 2010-10-21 18:51:05
Message-ID: AANLkTinm7iZuJDcXqQAojAsLEYXpvfUyRm--KEx9nNvv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

*Why is the difference in query plan, and the total runtime. Both tables
have the same btree index*

test=# explain analyze select * from test_seqindex1 where sid='AA023';
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Scan using test_seqindex1_pkey on test_seqindex1 (cost=0.00..8.27
rows=1 width=28) (actual time=0.017..0.018 rows=1 loops=1)
Index Cond: ((sid)::text = 'AA023'::text)
Total runtime: 0.035 ms
(3 rows)

test=# explain analyze select * from test_seqindex2 where sid='AA023';
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_seqindex2 (cost=4.95..275.53 rows=73 width=30)
(actual time=57.833..71.577 rows=2 loops=1)
Recheck Cond: ((sid)::text = 'AA023'::text)
-> Bitmap Index Scan on idx_test_seqindex2_sid (cost=0.00..4.93 rows=73
width=0) (actual time=34.374..34.374 rows=2 loops=1)
Index Cond: ((sid)::text = 'AA023'::text)
Total runtime: 71.599 ms
(5 rows)

test=# \d test_seqindex1
Table "public.test_seqindex1"
Column | Type | Modifiers
--------+-----------------------+-----------
sid | character varying(13) | not null
name | character varying(80) |
Indexes:
"test_seqindex1_pkey" PRIMARY KEY, btree (sid)

test=# \d test_seqindex2
Table "public.test_seqindex2"
Column | Type | Modifiers
--------+-----------------------+-----------
eid | integer | not null
sid | character varying(13) |
ename | character varying(80) |
Indexes:
"test_seqindex2_pkey" PRIMARY KEY, btree (eid)
"idx_test_seqindex2_sid" btree (sid)

===============================================================================================

On Thu, Oct 21, 2010 at 11:09 AM, DM <dm(dot)aeqa(at)gmail(dot)com> wrote:

> perfecto, thank you for the explanation.
>
> - Deepak
>
>
> On Thu, Oct 21, 2010 at 3:20 AM, Mathieu De Zutter <mathieu(at)dezutter(dot)org>wrote:
>
>> On Thu, Oct 21, 2010 at 3:47 AM, DM <dm(dot)aeqa(at)gmail(dot)com> wrote:
>> > I was hoping the optimizer would do a join using index scan.
>> >
>> > Could some one please explain me why its not doing an index scan rather
>> than
>> > sequential scan .
>>
>>
>> A index scan would be probably slower here because you're asking for a
>> lot of rows. A lot of rows means a lot of I/O, and an index scan is
>> more I/O intensive (since it has to read the index too). If you limit
>> the result (by being more selective in your where clause, just like
>> you do in the first two queries), postgres will most likely switch to
>> index scan.
>>
>> You can see for yourself if index-scan would be faster in your case by
>> running the following command before "explain (analyze)":
>>
>> set enable_seqscan = off;
>>
>> BTW, try to use explain analyze instead of explain, that way you'll
>> see the actual timings too instead of just the planner estimates.
>>
>> Kind regards,
>> Mathieu
>>
>
>


From: Mathieu De Zutter <mathieu(at)dezutter(dot)org>
To: DM <dm(dot)aeqa(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Question on Explain : Index Scan
Date: 2010-10-22 08:26:33
Message-ID: AANLkTinFAjdrAAEdZLN3LFshty75X-Jz8TEaZ+2xvpKR@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Oct 21, 2010 at 8:51 PM, DM <dm(dot)aeqa(at)gmail(dot)com> wrote:
> Why is the difference in query plan, and the total runtime. Both tables have
> the same  btree index
>
>
> test=# explain analyze select * from test_seqindex1 where sid='AA023';
>                                                              QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_seqindex1_pkey on test_seqindex1  (cost=0.00..8.27
> rows=1 width=28) (actual time=0.017..0.018 rows=1 loops=1)
>    Index Cond: ((sid)::text = 'AA023'::text)
>  Total runtime: 0.035 ms
> (3 rows)
>
> test=# explain analyze select * from test_seqindex2 where sid='AA023';
>                                                             QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on test_seqindex2  (cost=4.95..275.53 rows=73 width=30)
> (actual time=57.833..71.577 rows=2 loops=1)
>    Recheck Cond: ((sid)::text = 'AA023'::text)
>    ->  Bitmap Index Scan on idx_test_seqindex2_sid  (cost=0.00..4.93 rows=73
> width=0) (actual time=34.374..34.374 rows=2 loops=1)
>          Index Cond: ((sid)::text = 'AA023'::text)
>  Total runtime: 71.599 ms
> (5 rows)
>
>
> test=# \d test_seqindex1
>        Table "public.test_seqindex1"
>  Column |         Type          | Modifiers
> --------+-----------------------+-----------
>  sid    | character varying(13) | not null
>  name   | character varying(80) |
> Indexes:
>     "test_seqindex1_pkey" PRIMARY KEY, btree (sid)
>
> test=# \d test_seqindex2
>        Table "public.test_seqindex2"
>  Column |         Type          | Modifiers
> --------+-----------------------+-----------
>  eid    | integer               | not null
>  sid    | character varying(13) |
>  ename  | character varying(80) |
> Indexes:
>     "test_seqindex2_pkey" PRIMARY KEY, btree (eid)
>     "idx_test_seqindex2_sid" btree (sid)

Index scans are very efficient when they return very few rows. If they
return more (like in this case: estimate is 73 rows), the cost goes up
linearly (here: 73 times slower). An index does not contain the
elements according to their physical location, so it results in a lot
of random I/O. Here the planner thinks it's better to do a bitmap
index scan (first mark all pages that contain results and then read
all these pages sequentially) so to have sequential I/O instead, which
is usually faster. If your query really returns 73 rows, it would be a
good choice.

However, for your case, the estimates are off (73 matches estimated vs
2 matches actual). Therefore the planner needs better statistics on
the actual values in your table.

Look for "SET STATISTICS" here:
http://www.postgresql.org/docs/9.0/static/sql-altertable.html

Kind regards,
Mathieu