Assistance with optimizing query - same SQL, different category_id = Seq Scan

From: Brendan Duddridge <brendan(at)clickspace(dot)com>
To: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Assistance with optimizing query - same SQL, different category_id = Seq Scan
Date: 2006-05-09 01:29:32
Message-ID: A8C726F2-AE42-4812-9B71-B6CFE68F18B3@clickspace.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I have a query that generates two different plans when there's only a
change in the category_id used in the query.

The first query has category_id = 1001573 and return 3117 rows from
the category_product table.
The second query has category_id = 1001397 and returns 27889 rows
from the category_product table.

The first query does all access via indexes.
The second query does all access via indexes except for a sequential
scan on the Price table.

Here is the explain analyze for the first query:

explain analyze
select distinct pr.amount
from merchant_product mp,
category_product cp,
price pr
where cp.category_id = 1001573 and
cp.product_id = mp.product_id and
cp.product_status_code = 'complete' and
cp.product_is_active = 'true' and
mp.is_active = 'true' and
mp.merchant_product_id = pr.merchant_product_id
order by amount asc;


QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------------------
Unique (cost=24311.37..24325.11 rows=2748 width=11) (actual
time=277.953..280.844 rows=622 loops=1)
-> Sort (cost=24311.37..24318.24 rows=2748 width=11) (actual
time=277.952..278.490 rows=4007 loops=1)
Sort Key: pr.amount
-> Nested Loop (cost=0.00..24154.40 rows=2748 width=11)
(actual time=0.295..262.225 rows=4007 loops=1)
-> Nested Loop (cost=0.00..14658.32 rows=2750
width=4) (actual time=0.229..84.908 rows=4007 loops=1)
-> Index Scan using
x_category_product__category_id_fk_idx on category_product cp
(cost=0.00..3054.20 rows=2369 width=4) (actual time=0.136..20.746
rows=2832 loops=1)
Index Cond: (category_id = 1001573)
Filter: (((product_status_code)::text =
'complete'::text) AND ((product_is_active)::text = 'true'::text))
-> Index Scan using
merchant_product__product_id_fk_idx on merchant_product mp
(cost=0.00..4.89 rows=1 width=8) (actual time=0.019..0.021 rows=1
loops=2832)
Index Cond: ("outer".product_id =
mp.product_id)
Filter: ((is_active)::text = 'true'::text)
-> Index Scan using
price__merchant_product_id_fk_idx on price pr (cost=0.00..3.44
rows=1 width=15) (actual time=0.042..0.043 rows=1 loops=4007)
Index Cond: ("outer".merchant_product_id =
pr.merchant_product_id)
Total runtime: 281.709 ms

Here is the explain analyze for the second (slow) query:

explain analyze
select distinct pr.amount
from merchant_product mp,
category_product cp,
price pr
where cp.category_id = 1001397 and
cp.product_id = mp.product_id and
cp.product_status_code = 'complete' and
cp.product_is_active = 'true' and
mp.is_active = 'true' and
mp.merchant_product_id = pr.merchant_product_id
order by amount asc;


QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
--------------------------------------------------------------
Unique (cost=106334.48..106452.38 rows=6050 width=11) (actual
time=7140.302..7162.345 rows=2567 loops=1)
-> Sort (cost=106334.48..106393.43 rows=23580 width=11) (actual
time=7140.300..7143.873 rows=26949 loops=1)
Sort Key: pr.amount
-> Hash Join (cost=77475.88..104621.95 rows=23580
width=11) (actual time=4213.546..7015.639 rows=26949 loops=1)
Hash Cond: ("outer".merchant_product_id =
"inner".merchant_product_id)
-> Seq Scan on price pr (cost=0.00..20782.51
rows=1225551 width=15) (actual time=0.059..1482.238 rows=1225551
loops=1)
-> Hash (cost=77416.91..77416.91 rows=23590
width=4) (actual time=4212.042..4212.042 rows=26949 loops=1)
-> Merge Join (cost=22632.74..77416.91
rows=23590 width=4) (actual time=1851.012..4186.067 rows=26949 loops=1)
Merge Cond: ("outer".product_id =
"inner".product_id)
-> Index Scan using
merchant_product__product_id_fk_idx on merchant_product mp
(cost=0.00..51365.12 rows=1226085 width=8) (actual
time=0.073..3141.654 rows=1208509 loops=1)
Filter: ((is_active)::text =
'true'::text)
-> Sort (cost=22632.74..22683.55
rows=20325 width=4) (actual time=507.110..511.076 rows=26949 loops=1)
Sort Key: cp.product_id
-> Index Scan using
x_category_product__category_id_fk_idx on category_product cp
(cost=0.00..21178.38 rows=20325 width=4) (actual time=0.145..440.113
rows=26949 loops=1)
Index Cond: (category_id =
1001397)
Filter:
(((product_status_code)::text = 'complete'::text) AND
((product_is_active)::text = 'true'::text))
Total runtime: 7172.359 ms

Notice the sequential scan of the Price table? It scanned 1,225,551
rows in the second query.

Do you have any suggestions on how I can optimize the query so both
versions of the query come back fast without doing a sequential scan
on the price table?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9

http://www.clickspace.com

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message kah_hang_ang 2006-05-09 01:39:13 Re: extremely slow when execute select/delete for certain tables
Previous Message Klint Gore 2006-05-09 00:10:19 Re: performance question (something to do w/ parameterized