Query works when kludged, but would prefer "best practice" solution
Hi all,
Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan below.
You can see it's pretty slow. Oddly enough, an index for facility_address_id
is available but not being used, but I suspect it's questionable whether it
would be an improvement.
I knew that the filter was best applied to the results of the join - my
attempts to restructure the query with subqueries, etc didn't fool the
planner - it always figured out a plan that had this problem SEQ SCAN +
FILTER in it.
Finally, I "hid" the condition from the planner with a coalesce function -
see "SOLUTION" in the "KLUDGED QUERY" plan below.
Sure enough, a new plan appeared with a remarkable performance improvement!
The purpose of this query is to find facilities within a geographical area
when the complete address data is missing (hence the facility_address_id is
NULL).
PG is 8.4.2 on RH linux server with 1GB ram, HDD is RAID 1.
I don't like kludging like this - so any and all help or advice is
appreciated!
Carlo
ORIGINAL QUERY
select
pp.provider_id,
pp.provider_practice_id,
nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join (select * from mdx_core.zips_in_mile_range('08820', 10)) as nearby
on f.default_country_code = 'US'
and f.default_postal_code = nearby.zip
where facility_address_id is null
Hash Join (cost=30258.99..107702.53 rows=9438 width=16) (actual
time=169.516..3064.188 rows=872 loops=1)
Hash Cond: (pp.facility_id = f.facility_id)
PROBLEM:
------------
-> Seq Scan on provider_practice pp (cost=0.00..74632.55 rows=724429
width=12) (actual time=0.039..1999.457 rows=728396 loops=1)
Filter: (facility_address_id IS NULL)
------------
-> Hash (cost=29954.15..29954.15 rows=24387 width=12) (actual
time=156.668..156.668 rows=907 loops=1)
-> Nested Loop (cost=0.00..29954.15 rows=24387 width=12) (actual
time=149.891..155.343 rows=907 loops=1)
-> Function Scan on zips_in_mile_range (cost=0.00..12.50
rows=1000 width=40) (actual time=149.850..149.920 rows=66 loops=1)
-> Index Scan using facility_country_postal_code_idx on
facility f (cost=0.00..29.64 rows=24 width=15) (actual time=0.015..0.048
rows=14 loops=66)
Index Cond: ((f.default_country_code = 'US'::bpchar) AND
((f.default_postal_code)::text = zips_in_mile_range.zip))
Total runtime: 3065.338 ms
KLUDGED QUERY
select
pp.provider_id,
pp.provider_practice_id,
nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join (select * from mdx_core.zips_in_mile_range('08820', 10)) as nearby
on f.default_country_code = 'US'
and f.default_postal_code = nearby.zip
and coalesce(pp.facility_address_id, -1) = -1
Nested Loop (cost=0.00..112618.87 rows=180 width=16) (actual
time=149.680..167.261 rows=872 loops=1)
-> Nested Loop (cost=0.00..29954.15 rows=24387 width=12) (actual
time=149.659..155.018 rows=907 loops=1)
-> Function Scan on zips_in_mile_range (cost=0.00..12.50 rows=1000
width=40) (actual time=149.620..149.698 rows=66 loops=1)
-> Index Scan using facility_country_postal_code_idx on facility f
(cost=0.00..29.64 rows=24 width=15) (actual time=0.015..0.045 rows=14
loops=66)
Index Cond: ((f.default_country_code = 'US'::bpchar) AND
((f.default_postal_code)::text = zips_in_mile_range.zip))
SOLUTION
-------------
-> Index Scan using provider_practice_facility_idx on provider_practice
pp (cost=0.00..3.38 rows=1 width=12) (actual time=0.007..0.009 rows=1
loops=907)
Index Cond: (f.facility_id = pp.facility_id)
Filter: (COALESCE(facility_address_id, -1) = -1)
-------------
Total runtime: 168.275 ms
Home |
Main Index |
Thread Index