Re: Performance Optimization for Dummies 2 - the SQL

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance Optimization for Dummies 2 - the SQL
Date: 2006-10-04 18:22:48
Message-ID: eg0u6e$hbr$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> can you do explain analyze on the two select queries on either side of
> the union separatly? the subquery is correctly written and unlikely
> to be a problem (in fact, good style imo). so lets have a look at
> both sides of facil query and see where the problem is.

Sorry for the delay, the server was down yesterday and couldn't get
anything.

I have modified the sub-queries a little, trying to get the index scans to
fire - all the tables involved here are large enough to benefit from index
scans over sequential scans. I am mystified as to why PART 1 is giving me:

"Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12)
(actual time=0.007..99.033 rows=128268 loops=1)"

which I assume is for the:

"join mdx_core.facility_address as fa on fa.facility_id = f.facility_id"

Then again, I am not sure how to read the EXPLAIN ANALYSE performance
numbers.

The other part of the UNION (PART 2) I have also modified, I think it's
working nicely. Let me know if I'm mistaken on thinking that!

The one remaining problem is that the UNION of these two sub-queries has a
column which is a call to a custom TCL function that does a lexical analysis
on the results, ranking the result names by their proximity to the imported
name. his definitely eats up the performance and I hope that my decision to
call this function on the results of the union (assuming union deletes
redundent rows) is the correct one.

Thanks!

Carlo

/* PART 1.
The redundant expression "facility_address_id is NULL" was removed because
only an OUTER join would have made this meaningful. We use only INNER
joins in this sub-query
Both facility_address and address have seq scans, even though there is an
index for
facility_address(facility_id( and an index for address( country_code,
postal_code, address).
The "like" operator appears to be making things expensive. This is used
because we have to take
into account that perhaps the import row is using the 5-number US ZIP,
not the 9-number USZIP+4
standard (although this is not the case in this sample).
/*
explain analyse select
f.facility_id,
fa.facility_address_id,
a.address_id,
f.facility_type_code,
f.name,
a.address,
a.city,
a.state_code,
a.postal_code,
a.country_code
from
mdx_core.facility as f
join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
a.country_code = 'US'
and a.state_code = 'IL'
and a.postal_code like '60640-5759'||'%'
order by facility_id

"Sort (cost=6392.50..6392.50 rows=1 width=103) (actual
time=189.133..189.139 rows=12 loops=1)"
" Sort Key: f.facility_id"
" -> Nested Loop (cost=2732.88..6392.49 rows=1 width=103) (actual
time=14.006..188.967 rows=12 loops=1)"
" -> Hash Join (cost=2732.88..6388.91 rows=1 width=72) (actual
time=13.979..188.748 rows=12 loops=1)"
" Hash Cond: ("outer".address_id = "inner".address_id)"
" -> Seq Scan on facility_address fa (cost=0.00..3014.68
rows=128268 width=12) (actual time=0.004..98.867 rows=128268 loops=1)"
" -> Hash (cost=2732.88..2732.88 rows=1 width=64) (actual
time=6.430..6.430 rows=3 loops=1)"
" -> Bitmap Heap Scan on address a (cost=62.07..2732.88
rows=1 width=64) (actual time=2.459..6.417 rows=3 loops=1)"
" Recheck Cond: ((country_code = 'US'::bpchar) AND
((state_code)::text = 'IL'::text))"
" Filter: ((postal_code)::text ~~
'60640-5759%'::text)"
" -> Bitmap Index Scan on
address_country_state_postal_code_address_idx (cost=0.00..62.07 rows=3846
width=0) (actual time=1.813..1.813 rows=3554 loops=1)"
" Index Cond: ((country_code = 'US'::bpchar)
AND ((state_code)::text = 'IL'::text))"
" -> Index Scan using facility_pkey on facility f (cost=0.00..3.56
rows=1 width=35) (actual time=0.012..0.013 rows=1 loops=12)"
" Index Cond: ("outer".facility_id = f.facility_id)"
"Total runtime: 189.362 ms"

/* PART 2 - can you see anything that could work faster? */

explain analyse select
f.facility_id,
null as facility_address_id,
null as address_id,
f.facility_type_code,
f.name,
null as address,
f.default_city as city,
f.default_state_code as state_code,
f.default_postal_code as postal_code,
f.default_country_code as country_code
from
mdx_core.facility as f
left outer join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
where
fa.facility_address_id is null
and f.default_country_code = 'US'
and f.default_state_code = 'IL'
and '60640-5759' like f.default_postal_code||'%'

"Nested Loop Left Join (cost=0.00..6042.41 rows=32 width=73) (actual
time=14.923..14.923 rows=0 loops=1)"
" Filter: ("inner".facility_address_id IS NULL)"
" -> Index Scan using facility_country_state_postal_code_idx on facility f
(cost=0.00..5914.69 rows=32 width=73) (actual time=10.118..14.773 rows=10
loops=1)"
" Index Cond: ((default_country_code = 'US'::bpchar) AND
(default_state_code = 'IL'::bpchar))"
" Filter: ('60640-5759'::text ~~ ((default_postal_code)::text ||
'%'::text))"
" -> Index Scan using facility_address_facility_idx on facility_address fa
(cost=0.00..3.97 rows=2 width=8) (actual time=0.009..0.011 rows=1 loops=10)"
" Index Cond: (fa.facility_id = "outer".facility_id)"
"Total runtime: 15.034 ms"

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Graham Davis 2006-10-04 18:22:51 Re: Multi-key index not beeing used - bug?
Previous Message me 2006-10-04 18:13:33 Re: UPDATE becomes mired / win32