Encouraging multi-table join order

From: Dan Harris <fbsd(at)drivefaster(dot)net>
To: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Encouraging multi-table join order
Date: 2006-04-10 19:04:06
Message-ID: 443AAC26.4070802@drivefaster.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a query that is intended to select from multiple "small tables"
to get a limited subset of "incidentid" and then join with a "very
large" table. One of the operations will require a sequential scan, but
the planner is doing the scan on the very large table before joining the
small ones, resulting in a huge amount of disk I/O. How would I make
this query join the large table only after narrowing down the possible
selections from the smaller tables? This is running on version 8.0.3.

Thanks for any ideas.

-Dan

QUERY
########################################
explain analyze
select distinct
eventmain.incidentid,
eventmain.entrydate,
eventgeo.eventlocation,
recordtext as retdata
from
eventactivity
join (
select
incidentid
from
k_h
where
id = 33396 and
k_h.entrydate >= '2006-1-1 00:00' and
k_h.entrydate < '2006-4-8 00:00'
) id_keywords using ( incidentid ) ,

eventmain,
eventgeo
where
eventmain.incidentid = eventactivity.incidentid and
eventmain.incidentid = eventgeo.incidentid and
( ' ' || recordtext || ' ' like '%HAL%' ) and
eventactivity.entrydate >= '2006-1-1 00:00' and
eventactivity.entrydate < '2006-4-8 00:00'
order by
eventmain.entrydate limit 10000;

EXPLAIN ANALYZE OUTPUT
########################################
Limit (cost=2521191.65..2521191.90 rows=6 width=187) (actual
time=1360935.787..1361072.277 rows=1400 loops=1)
-> Unique (cost=2521191.65..2521191.90 rows=6 width=187) (actual
time=1360935.779..1361067.853 rows=1400 loops=1)
-> Sort (cost=2521191.65..2521191.66 rows=6 width=187)
(actual time=1360935.765..1360958.258 rows=16211 loops=1)
Sort Key: eventmain.entrydate, eventmain.incidentid,
eventactivity.recordtext, eventgeo.eventlocation
-> Nested Loop (cost=219.39..2521191.57 rows=6
width=187) (actual time=1123.115..1360579.798 rows=16211 loops=1)
-> Nested Loop (cost=219.39..2521173.23 rows=6
width=154) (actual time=1105.773..1325907.716 rows=16211 loops=1)
-> Hash Join (cost=219.39..2521153.37
rows=6 width=66) (actual time=1069.476..1289608.261 rows=16211 loops=1)
Hash Cond: (("outer".incidentid)::text
= ("inner".incidentid)::text)
-> Seq Scan on eventactivity
(cost=0.00..2518092.06 rows=1532 width=52) (actual
time=57.205..1288514.530 rows=2621 loops=1)
Filter: ((((' '::text ||
(recordtext)::text) || ' '::text) ~~ '%HAL%'::text) AND (entrydate >=
'2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate <
'2006-04-08 00:00:00'::timestamp without time zone))
-> Hash (cost=217.53..217.53 rows=741
width=14) (actual time=899.128..899.128 rows=0 loops=1)
-> Index Scan using k_h_id_idx
on k_h (cost=0.00..217.53 rows=741 width=14) (actual
time=55.097..893.883 rows=1162 loops=1)
Index Cond: (id = 33396)
Filter: ((entrydate >=
'2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate <
'2006-04-08 00:00:00'::timestamp without time zone))
-> Index Scan using eventmain_incidentid_idx
on eventmain (cost=0.00..3.30 rows=1 width=88) (actual
time=1.866..2.227 rows=1 loops=16211)
Index Cond:
((eventmain.incidentid)::text = ("outer".incidentid)::text)
-> Index Scan using eventgeo_incidentid_idx on
eventgeo (cost=0.00..3.04 rows=1 width=75) (actual time=1.770..2.126
rows=1 loops=16211)
Index Cond: ((eventgeo.incidentid)::text =
("outer".incidentid)::text)
Total runtime: 1361080.787 ms
(19 rows)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Brad Nicholson 2006-04-10 19:40:58 Re: pg 8.1.3, AIX, huge box, painfully slow.
Previous Message Vivek Khera 2006-04-10 18:48:13 Re: Restore performance?