Re: Problem with index not always being used

Lists: pgsql-general
From: "Rob Tester" <robtester(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org(dot)
Subject: Re: Problem with index not always being used
Date: 2007-02-17 18:44:20
Message-ID: f5f60fb50702171044w14f805a2md2369b0f511d1f9e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I am using pg 8.1.4, I have a table with 1.1 million rows of data (see below
for table definition). One field state is numeric and has an index. The
index is not always picked up when searching the table by state only and I
can't figure out why.

So:

SELECT * FROM STUFF WHERE state=12; --causes a seq scan of the table

where

SELECT * FROM STUFF WHERE state=16 --Uses the index.

I have run Analyze on the table as well as vacuumed it and reindexed it. At
first I thought it might be a type mismatch but forcing the number to
numeric (i.e cast(12 as numeric(2,0)) doesn't change the behavior. However
setting the enable_seqscan=off does force both queries to use the index.
Using the index in all cases is faster than a seq scan according to explain
analyze.

Any thoughts on how to get the optimizer to pick up the index at all times?
I am desperate for fresh ideas.

Thanks,

Rob.

Table/index definitions:

CREATE TABLE stuff(
id serial NOT NULL,
module character(8),
tlid numeric(10),
dirp character(2),
name character(30),
type character(4),
dirs character(2),
zip numeric(5),
state numeric(2),
county numeric(3),
CONSTRAINT stuff_pk PRIMARY KEY (id),
)
WITHOUT OIDS;

CREATE INDEX ndx_cc_state
ON stuff
USING btree
(state);


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Rob Tester" <robtester(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Problem with index not always being used
Date: 2007-02-17 19:58:51
Message-ID: 23718.1171742331@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Rob Tester" <robtester(at)gmail(dot)com> writes:
> SELECT * FROM STUFF WHERE state=12; --causes a seq scan of the table
> where
> SELECT * FROM STUFF WHERE state=16 --Uses the index.

This behavior is intended and appropriate, if there are lots of rows
with state=12 and not many with state=16. As an example, if nearly the
whole table had state=12 you would certainly not wish it to use an
indexscan for that. The correct way to think about your gripe is that
the planner's cutting over at the wrong row density. There are a couple
of places to look for a solution:

First, are the planner's estimated row counts for both cases reasonably
close to reality, according to EXPLAIN ANALYZE? If not, you may need to
increase the statistics target (either globally with
default_statistics_target or for the state column with ALTER TABLE).
Don't forget to re-ANALYZE the table after changing the target.

If the statistics are good then you need to fool with the planner's cost
parameters to get it to make decisions that reflect your environment.
Decreasing random_page_cost is usually the thing to do if it's choosing
seqscans too readily. But be wary of choosing a new value on the basis
of just one test case.

You can find a lot about this in the pgsql-performance list archives,
and there are several relevant articles at techdocs:
http://www.postgresql.org/docs/techdocs

regards, tom lane


From: "Rob Tester" <robtester(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Problem with index not always being used
Date: 2007-02-18 00:26:57
Message-ID: f5f60fb50702171626i1c1e2ae3l165e57a2830a2eb7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thanks for the thoughts, certainly I will look into what you have explained.
However, the behavior that you expressed isn't what is occuring. In the 12,
16 example 12 does have more rows than 16. However, there are many cases
when this isn't true, that is other states have more rows than 12 and the
optomizer does use the index when I query them. There are 6 states total
that the optomizer doesn't use the index. The other 5 states would rank row
rise in the minority (of number rows) which would make me believe the
optomizer would want to use the index. That said I am investigating the
statistics and the random_page_cost.

Thank you for your insight.

Rob.

On 2/17/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Rob Tester" <robtester(at)gmail(dot)com> writes:
> > SELECT * FROM STUFF WHERE state=12; --causes a seq scan of the table
> > where
> > SELECT * FROM STUFF WHERE state=16 --Uses the index.
>
> This behavior is intended and appropriate, if there are lots of rows
> with state=12 and not many with state=16. As an example, if nearly the
> whole table had state=12 you would certainly not wish it to use an
> indexscan for that. The correct way to think about your gripe is that
> the planner's cutting over at the wrong row density. There are a couple
> of places to look for a solution:
>
> First, are the planner's estimated row counts for both cases reasonably
> close to reality, according to EXPLAIN ANALYZE? If not, you may need to
> increase the statistics target (either globally with
> default_statistics_target or for the state column with ALTER TABLE).
> Don't forget to re-ANALYZE the table after changing the target.
>
> If the statistics are good then you need to fool with the planner's cost
> parameters to get it to make decisions that reflect your environment.
> Decreasing random_page_cost is usually the thing to do if it's choosing
> seqscans too readily. But be wary of choosing a new value on the basis
> of just one test case.
>
> You can find a lot about this in the pgsql-performance list archives,
> and there are several relevant articles at techdocs:
> http://www.postgresql.org/docs/techdocs
>
> regards, tom lane
>