Forcing query to use an index

From: Michael Nachbaur <mike(at)nachbaur(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Forcing query to use an index
Date: 2003-03-03 21:05:22
Message-ID: D8B0CD90-4DBB-11D7-90E0-000A27935D5A@nachbaur.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello everyone,

I have a search query that does a whole bunch of LEFT OUTER JOINs
between multiple tables, since this is a generic search and some
records may not exist for certain customers (e.g. searching for the
text "kate" should bring up people whose customer name, street address
or email addresses match that word). This is for an ISP's customer
management database.

Unfortunately one stage in the query keeps using a sequence scan rather
than the index. Here is the "EXPLAIN ANALYZE" results for the 115 line
SQL query.

Sort (cost=6666.08..6666.08 rows=268 width=265) (actual
time=949.00..949.00 rows=1 loops=1)
-> Aggregate (cost=6487.84..6655.27 rows=268 width=265) (actual
time=948.86..948.86 rows=1 loops=1)
-> Group (cost=6487.84..6648.58 rows=2679 width=265) (actual
time=948.70..948.70 rows=1 loops=1)
-> Sort (cost=6487.84..6487.84 rows=2679 width=265)
(actual time=948.66..948.66 rows=1 loops=1)
-> Merge Join (cost=6106.42..6335.30 rows=2679
width=265) (actual time=859.77..948.06 rows=1 loops=1)
-> Merge Join (cost=6101.24..6319.77
rows=2679 width=247) (actual time=554.11..674.17 rows=2679 loops=1)
-> Index Scan using customer_id_key on
customer c (cost=0.00..129.63 rows=2679 width=156) (actual
time=0.40..43.43 rows=2679 loops=1)
-> Sort (cost=6101.24..6101.24
rows=8117 width=91) (actual time=553.64..559.58 rows=8117 loops=1)
-> Seq Scan on
customer_month_summary cms (cost=0.00..5574.17 rows=8117 width=91)
(actual time=258.03..477.11 rows=8117 loops=1)
-> Sort (cost=5.18..5.18 rows=77 width=18)
(actual time=0.70..0.80 rows=77 loops=1)
-> Seq Scan on emailaddress ea
(cost=0.00..2.77 rows=77 width=18) (actual time=0.08..0.35 rows=77
loops=1)
Total runtime: 951.70 msec

The table in question is "customer_month_summary"; it has 8117 rows.
Essentially, there is one record in the customer_month_summary table
for every month for every customer that has a cable modem (this doesn't
include dial-up users). I have two columns in the summary table that
I'm matching by: CustomerID and MonthStart, which is the first day of
the month in question. I also have an index on this table on
"Customer_Month_Summary(MonthStart, CustomerID)". It keeps trying to
do a sequence scan, and while the query only takes 951 msec right now,
the summary table will keep growing, and I don't want performance to
suffer a few months/years down the line (also, having to wait a second
between search results is a bit too much).

This is currently running on a dual-proc PIII-800 with 4G of ram. I've
put a lot of effort to make this application very responsive, but it's
this one query that keeps killing me. Any help you can provide would
therefore be much appreciated.

--man
Michael A Nachbaur <mike(at)nachbaur(dot)com>

"I used to hate writing assignments, but now I enjoy them. I realized
that the purpose of writing is to inflate weak ideas, obscure poor
reasoning and inhibit clarity.  With a little practice, writing can be
an intimidating and impenetrable fog!" -- Calvin

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Greg Stark 2003-03-03 21:29:18 Gist indexes on int arrays
Previous Message Achilleus Mantzios 2003-03-03 20:57:09 Re: Querying Hierarchical Data