Re: Performance Optimization for Dummies 2 - the SQL

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance Optimization for Dummies 2 - the SQL
Date: 2006-10-04 22:41:40
Message-ID: eg1dai$1tfa$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Merlin,

Here are the results. The query returned more rows (65 vs 12) because of the
vague postal_code.

In reality, we would have to modify the postal_code logic to take advantage
of full zip codes when they were avalable, not unconditionally truncate
them.

Carlo

explain analyze select
f.facility_id,
fa.facility_address_id,
a.address_id,
f.facility_type_code,
f.name,
a.address,
a.city,
a.state_code,
a.postal_code,
a.country_code
from
mdx_core.facility as f
join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
(a.country_code, a.state_code, mdx_core.zip_trunc(a.postal_code)) =
('US', 'IL', mdx_core.zip_trunc('60640-5759'))
order by facility_id

"Sort (cost=6474.78..6474.84 rows=25 width=103) (actual
time=217.279..217.311 rows=65 loops=1)"
" Sort Key: f.facility_id"
" -> Nested Loop (cost=2728.54..6474.20 rows=25 width=103) (actual
time=35.828..217.059 rows=65 loops=1)"
" -> Hash Join (cost=2728.54..6384.81 rows=25 width=72) (actual
time=35.801..216.117 rows=65 loops=1)"
" Hash Cond: ("outer".address_id = "inner".address_id)"
" -> Seq Scan on facility_address fa (cost=0.00..3014.68
rows=128268 width=12) (actual time=0.007..99.072 rows=128268 loops=1)"
" -> Hash (cost=2728.50..2728.50 rows=19 width=64) (actual
time=33.618..33.618 rows=39 loops=1)"
" -> Bitmap Heap Scan on address a (cost=48.07..2728.50
rows=19 width=64) (actual time=2.569..33.491 rows=39 loops=1)"
" Recheck Cond: ((country_code = 'US'::bpchar) AND
((state_code)::text = 'IL'::text))"
" Filter: (mdx_core.zip_trunc(postal_code) =
'60640'::text)"
" -> Bitmap Index Scan on
address_country_state_zip_trunc_idx (cost=0.00..48.07 rows=3846 width=0)
(actual time=1.783..1.783 rows=3554 loops=1)"
" Index Cond: ((country_code = 'US'::bpchar)
AND ((state_code)::text = 'IL'::text))"
" -> Index Scan using facility_pkey on facility f (cost=0.00..3.56
rows=1 width=35) (actual time=0.009..0.010 rows=1 loops=65)"
" Index Cond: ("outer".facility_id = f.facility_id)"
"Total runtime: 217.520 ms"

""Merlin Moncure"" <mmoncure(at)gmail(dot)com> wrote in message
news:b42b73150610041407y3554f311u1329c4f3bdc53999(at)mail(dot)gmail(dot)com(dot)(dot)(dot)
> On 10/4/06, Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca> wrote:
>> > can you do explain analyze on the two select queries on either side of
>> > the union separatly? the subquery is correctly written and unlikely
>> > to be a problem (in fact, good style imo). so lets have a look at
>> > both sides of facil query and see where the problem is.
>>
>> Sorry for the delay, the server was down yesterday and couldn't get
>> anything.
>>
>> I have modified the sub-queries a little, trying to get the index scans
>> to
>> fire - all the tables involved here are large enough to benefit from
>> index
>> scans over sequential scans. I am mystified as to why PART 1 is giving
>> me:
>>
>
>> "Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268
>> width=12)
>> (actual time=0.007..99.033 rows=128268 loops=1)"
>
> not sure on this, lets go back to that.
>
>> into account that perhaps the import row is using the 5-number US ZIP,
>> not the 9-number USZIP+4
>
>
>> where
>> a.country_code = 'US'
>> and a.state_code = 'IL'
>> and a.postal_code like '60640-5759'||'%'
>> order by facility_id
>
> 1. create a small function, sql preferred which truncates the zip code
> to 5 digits or reduces to so called 'fuzzy' matching criteria. lets
> call it zip_trunc(text) and make it immutable which it is. write this
> in sql, not tcl if possible (trust me).
>
> create index address_idx on address(country_code, state_code,
> zip_trunc(postal_code));
>
> rewrite above where clause as
>
> where (a.country_code, a.state_code, zip_trunc(postal_code)) = ('US',
> 'IL', zip_trunc('60640-5759'));
>
> try it out, then lets see how it goes and then we can take a look at
> any seqscan issues.
>
> merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ben 2006-10-05 01:35:15 pg_trgm indexes giving bad estimations?
Previous Message Ben 2006-10-04 21:40:47 Re: any hope for my big query?