Understanding sequential versus index scans.

Lists: pgsql-general
From: Robert James <srobertjames(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Understanding sequential versus index scans.
Date: 2009-07-19 22:58:19
Message-ID: e09785e00907191558q4da127f1k46d6c59575503a5c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why
is this so? And how can I shut this off?
select * from dict
where
word in (select substr('moon', 0, generate_series(3,length('moon')))) --
this is my X above
OR word like 'moon%' -- this is my Y above

Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual
time=16.635..28.580 rows=8 loops=1)
Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
SubPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019 rows=2
loops=1)
Total runtime: 28.658 ms
(Using just X or Y alone uses the index, and completes in 0.150 ms)
Is this a bug?


From: Robert James <srobertjames(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-19 22:59:38
Message-ID: e09785e00907191559j4a4fa660kf2e4bf715ac468be@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

PS Running "PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe
(GCC) 3.4.2 (mingw-special)"

On Sun, Jul 19, 2009 at 6:58 PM, Robert James <srobertjames(at)gmail(dot)com>wrote:

> Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
> do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why
> is this so? And how can I shut this off?
> select * from dict
> where
> word in (select substr('moon', 0, generate_series(3,length('moon')))) --
> this is my X above
> OR word like 'moon%' -- this is my Y above
>
> Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual
> time=16.635..28.580 rows=8 loops=1)
> Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
> SubPlan
> -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019
> rows=2 loops=1)
> Total runtime: 28.658 ms
> (Using just X or Y alone uses the index, and completes in 0.150 ms)
> Is this a bug?
>
>
>
>
>


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-19 23:27:24
Message-ID: 407d949e0907191627s4206eedapfaf48f69aae9350c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sun, Jul 19, 2009 at 11:59 PM, Robert James<srobertjames(at)gmail(dot)com> wrote:
> PS Running "PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe
> (GCC) 3.4.2 (mingw-special)"
>
> On Sun, Jul 19, 2009 at 6:58 PM, Robert James <srobertjames(at)gmail(dot)com>
> wrote:
>>
>> Hi.  I notice that when I do a WHERE x, Postgres uses an index, and when I
>> do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't.  Why
>> is this so? And how can I shut this off?
>> select * from dict
>> where
>>  word in (select substr('moon', 0, generate_series(3,length('moon')))) --
>> this is my X above
>>  OR word like 'moon%' -- this is my Y above
>> Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual
>> time=16.635..28.580 rows=8 loops=1)
>>  Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
>>  SubPlan
>>  -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019
>> rows=2 loops=1)
>> Total runtime: 28.658 ms
>> (Using just X or Y alone uses the index, and completes in 0.150 ms)
>> Is this a bug?

Well there are known bugs in 8.2.1 -- that's why the current 8.2
release is 8.2.13.

The next step here is to set enable_seqscan=off and run explain
analyze again. You may have to adjust some costs to sync the estimated
cost with actual run-time.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-19 23:47:40
Message-ID: 11961.1248047260@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Robert James <srobertjames(at)gmail(dot)com> writes:
> Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
> do WHERE y, it does so as well, but when I do WHERE x OR y, it
> doesn't.

It can use indexes for OR conditions, but not for arbitrary OR
conditions...

> select * from dict
> where
> word in (select substr('moon', 0, generate_series(3,length('moon')))) --
> this is my X above
> OR word like 'moon%' -- this is my Y above

... and that one is pretty arbitrary. You might have some luck with
using a UNION instead, viz

select * from dict where X
union all
select * from dict where Y

regards, tom lane


From: John R Pierce <pierce(at)hogranch(dot)com>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-19 23:48:07
Message-ID: 4A63B0B7.4030109@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Robert James wrote:
> Hi. I notice that when I do a WHERE x, Postgres uses an index, and
> when I do WHERE y, it does so as well, but when I do WHERE x OR y, it
> doesn't. Why is this so? And how can I shut this off?

maybe its because you have no index on (X OR Y) ? or maybe because the
analyzer thinks that X or Y includes enough rows that a sequential scan
is more effective ?


From: Robert James <srobertjames(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-20 00:10:22
Message-ID: e09785e00907191710q7368d50fx2589b5005f49d7cf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

UNION was better, but still 5 times as slow as either query done
individually.
set enable_seqscan=off didn't help at all - it was totally ignored
Is there anything else I can do?

On Sun, Jul 19, 2009 at 7:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Robert James <srobertjames(at)gmail(dot)com> writes:
> > Hi. I notice that when I do a WHERE x, Postgres uses an index, and when
> I
> > do WHERE y, it does so as well, but when I do WHERE x OR y, it
> > doesn't.
>
> It can use indexes for OR conditions, but not for arbitrary OR
> conditions...
>
> > select * from dict
> > where
> > word in (select substr('moon', 0, generate_series(3,length('moon')))) --
> > this is my X above
> > OR word like 'moon%' -- this is my Y above
>
> ... and that one is pretty arbitrary. You might have some luck with
> using a UNION instead, viz
>
> select * from dict where X
> union all
> select * from dict where Y
>
> regards, tom lane
>


From: Robert James <srobertjames(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-20 00:58:05
Message-ID: e09785e00907191758r43a6187ja0b372a5a7547fc2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Is there anyway to tell Postgres "Run these two queries, and union their
results, but don't change the plan as to a UNION - just run them
separately"?
Something seems funny to me that running a UNION should be twice as slow as
running the two queries one after the other.

On Sun, Jul 19, 2009 at 8:10 PM, Robert James <srobertjames(at)gmail(dot)com>wrote:

> UNION was better, but still 5 times as slow as either query done
> individually.
> set enable_seqscan=off didn't help at all - it was totally ignored
> Is there anything else I can do?
>
> On Sun, Jul 19, 2009 at 7:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Robert James <srobertjames(at)gmail(dot)com> writes:
>> > Hi. I notice that when I do a WHERE x, Postgres uses an index, and when
>> I
>> > do WHERE y, it does so as well, but when I do WHERE x OR y, it
>> > doesn't.
>>
>> It can use indexes for OR conditions, but not for arbitrary OR
>> conditions...
>>
>> > select * from dict
>> > where
>> > word in (select substr('moon', 0, generate_series(3,length('moon'))))
>> --
>> > this is my X above
>> > OR word like 'moon%' -- this is my Y above
>>
>> ... and that one is pretty arbitrary. You might have some luck with
>> using a UNION instead, viz
>>
>> select * from dict where X
>> union all
>> select * from dict where Y
>>
>> regards, tom lane
>>
>
>


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-20 01:05:00
Message-ID: dcc563d10907191805s37af2aaby39351ef0d783fb2f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sun, Jul 19, 2009 at 6:10 PM, Robert James<srobertjames(at)gmail(dot)com> wrote:
> UNION was better, but still 5 times as slow as either query done
> individually.
> set enable_seqscan=off didn't help at all - it was totally ignored
> Is there anything else I can do?

Did you try union, or union all?


From: Robert James <srobertjames(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-20 13:22:19
Message-ID: e09785e00907200622t532ff2baqa367b7418dda3011@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Yes, I had done UNION. UNION ALL achives the expected plan and speed! Thank
you!
BTW, this is interesting, because there are only about 5 or 6 rows max
returned from both queries - but I guess the planner expects more and hence
changes the plan to remove duplicates.

On Sun, Jul 19, 2009 at 9:05 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>wrote:

> On Sun, Jul 19, 2009 at 6:10 PM, Robert James<srobertjames(at)gmail(dot)com>
> wrote:
> > UNION was better, but still 5 times as slow as either query done
> > individually.
> > set enable_seqscan=off didn't help at all - it was totally ignored
> > Is there anything else I can do?
>
> Did you try union, or union all?
>


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Understanding sequential versus index scans.
Date: 2009-07-20 14:13:10
Message-ID: 407d949e0907200713y6e0a081an3c3908ec1ac0e362@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Jul 20, 2009 at 2:22 PM, Robert James<srobertjames(at)gmail(dot)com> wrote:
> BTW, this is interesting, because there are only about 5 or 6 rows max
> returned from both queries - but I guess the planner expects more and hence
> changes the plan to remove duplicates.

If you sent the plans for the various attempts we might be able to
explain what's going on.

--
greg
http://mit.edu/~gsstark/resume.pdf