Lot'sa joins - performance tip-up, please?

From: Mario Splivalo <msplival(at)jagor(dot)srce(dot)hr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Lot'sa joins - performance tip-up, please?
Date: 2006-05-02 01:27:54
Message-ID: 1146533274.14006.11.camel@localhost.localdomain
Lists: pgsql-performance

I have a quite large query that takes over a minute to run on my
laptop. On the db server it takes almost 20 seconds, but I have 200+
concurrent users who will be running similar queries, and while the
query runs the I/O goes berserk: iostat shows around 30 MB/s of reads.
So, before resorting to denormalization, I wonder if I could do
something to speed things up.
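
In case it matters, these are the server settings I'd check first;
I haven't pasted the actual values here, this is just how I'd look
them up:

-- How I'd read the memory/cache settings off the server (placeholder,
-- no real values included):
show shared_buffers;
show work_mem;               -- sort_mem on 7.x releases
show effective_cache_size;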

The query is like this:

select
    *
from
    messages
    join services on services.id = messages.service_id
    join ticketing_messages on messages.id = ticketing_messages.message_id
    left join ticketing_winners
        on ticketing_winners.message_id = ticketing_messages.message_id
    left join
    (
        select
            *
        from
            ticketing_codes_played
            join ticketing_codes
                on ticketing_codes.code_id = ticketing_codes_played.code_id
    ) as codes on codes.message_id = ticketing_messages.message_id
where
    services.type_id = 10
    and messages.receiving_time between '2006-02-12' and '2006-03-18 23:00:00';
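
One rewrite I've been playing with (untested, and only equivalent to
the original if every ticketing_codes_played.code_id really has a
matching row in ticketing_codes) is to flatten the subselect into
plain left joins, in the hope that the planner gets more freedom in
how it orders the joins:

-- Flattened version of the same query; semantics match only under the
-- FK assumption above (every code_id in ticketing_codes_played exists
-- in ticketing_codes):
select
    *
from
    messages
    join services on services.id = messages.service_id
    join ticketing_messages on messages.id = ticketing_messages.message_id
    left join ticketing_winners
        on ticketing_winners.message_id = ticketing_messages.message_id
    left join ticketing_codes_played
        on ticketing_codes_played.message_id = ticketing_messages.message_id
    left join ticketing_codes
        on ticketing_codes.code_id = ticketing_codes_played.code_id
where
    services.type_id = 10
    and messages.receiving_time between '2006-02-12' and '2006-03-18 23:00:00';

No idea yet whether it actually produces a better plan.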

The explain analyze of the original query produces this:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=221692.04..222029.29 rows=3772 width=264) (actual time=539169.163..541579.504 rows=75937 loops=1)
   Merge Cond: ("outer".message_id = "inner".message_id)
   ->  Sort  (cost=40080.17..40089.60 rows=3772 width=238) (actual time=8839.072..9723.371 rows=75937 loops=1)
         Sort Key: messages.id
         ->  Hash Left Join  (cost=2259.40..39856.10 rows=3772 width=238) (actual time=1457.451..7870.830 rows=75937 loops=1)
               Hash Cond: ("outer".message_id = "inner".message_id)
               ->  Nested Loop  (cost=2234.64..39811.76 rows=3772 width=230) (actual time=1418.911..7063.299 rows=75937 loops=1)
                     ->  Index Scan using pk_services on services  (cost=0.00..4.46 rows=1 width=54) (actual time=28.261..28.271 rows=1 loops=1)
                           Index Cond: (1102 = id)
                           Filter: (type_id = 10)
                     ->  Hash Join  (cost=2234.64..39769.58 rows=3772 width=176) (actual time=1390.621..6297.501 rows=75937 loops=1)
                           Hash Cond: ("outer".id = "inner".message_id)
                           ->  Bitmap Heap Scan on messages  (cost=424.43..32909.53 rows=74408 width=162) (actual time=159.796..4329.125 rows=75937 loops=1)
                                 Recheck Cond: (service_id = 1102)
                                 ->  Bitmap Index Scan on idx_service_id  (cost=0.00..424.43 rows=74408 width=0) (actual time=95.197..95.197 rows=75937 loops=1)
                                       Index Cond: (service_id = 1102)
                           ->  Hash  (cost=1212.37..1212.37 rows=75937 width=14) (actual time=940.372..940.372 rows=75937 loops=1)
                                 ->  Seq Scan on ticketing_messages  (cost=0.00..1212.37 rows=75937 width=14) (actual time=12.122..461.960 rows=75937 loops=1)
               ->  Hash  (cost=21.21..21.21 rows=1421 width=8) (actual time=38.496..38.496 rows=1421 loops=1)
                     ->  Seq Scan on ticketing_winners  (cost=0.00..21.21 rows=1421 width=8) (actual time=24.534..31.347 rows=1421 loops=1)
   ->  Sort  (cost=181611.87..181756.68 rows=57925 width=26) (actual time=530330.060..530647.055 rows=57925 loops=1)
         Sort Key: ticketing_codes_played.message_id
         ->  Nested Loop  (cost=0.00..176144.30 rows=57925 width=26) (actual time=68.322..529472.026 rows=57925 loops=1)
               ->  Seq Scan on ticketing_codes_played  (cost=0.00..863.25 rows=57925 width=8) (actual time=0.042..473.881 rows=57925 loops=1)
               ->  Index Scan using ticketing_codes_pk on ticketing_codes  (cost=0.00..3.01 rows=1 width=18) (actual time=9.102..9.108 rows=1 loops=57925)
                     Index Cond: (ticketing_codes.code_id = "outer".code_id)
 Total runtime: 542000.093 ms
(27 rows)
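
Almost all of the 542 seconds seems to go into the inner nested loop:
the index scan on ticketing_codes runs 57925 times at roughly 9 ms per
probe, which looks like pure random I/O. Things I'm thinking of trying
(the index name below is made up, nothing like it exists yet):

-- Cluster ticketing_codes on its primary key, so the 57925 probes hit
-- neighbouring pages instead of random ones (CLUSTER locks and
-- rewrites the table, so it would have to wait for a maintenance
-- window):
cluster ticketing_codes_pk on ticketing_codes;

-- Give the planner the option of probing ticketing_codes_played by
-- message_id instead of seq-scanning and sorting all 57925 rows
-- (index name is just a placeholder):
create index idx_ticketing_codes_played_message_id
    on ticketing_codes_played (message_id);

analyze ticketing_codes;
analyze ticketing_codes_played;

Does any of that sound like the right direction, or am I missing
something more obvious?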

I'll be more than happy to provide any additional information that I
may be able to gather. I'd be most happy if someone would scream
something like "four joins, smells like a poor design", because the
design is poor, but the system is in production and I have to bear
with it.

Mario
--
"I can do it quick, I can do it cheap, I can do it well. Pick any two."

Mario Splivalo
msplival(at)jagor(dot)srce(dot)hr
