More question about plans & explain (long)

From: "Nick Fankhauser" <nickf(at)ontko(dot)com>
To: "pgsql-admin" <pgsql-admin(at)postgresql(dot)org>
Subject: More question about plans & explain (long)
Date: 2002-04-10 19:19:20
Message-ID: NEBBLAAHGLEEPCGOBHDGMEOFELAA.nickf@ontko.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Thanks to everyone who responded to my question about outer joins.

Now I have some more <grin>.

I'm trying to learn what explain can tell me & whether there are any
reasonable ways to push it's choices in one direction based on my knowledge
of the tables without really screwing up general performance. I haven't
found many resources on the explain info in the documentation, so I'm making
a few guesses about what is meant. Corrections are, of course, welcome!

I'm trying to get better performance out of this query:

explain
select
event.event_date_time,
event.event_duration,
event.event_ical_status,
event.event_location_name,
event.event_type_code,
event.event_hearing_type_desc,
event.event_summary,
case_data.case_public_id || ' ' ||case_data.case_title,
court_config.court_name,
event.event_id,
case_data.case_id,
court_config.court_id,
actor.actor_id,
actor_identifier.actor_identifier_text,
actor_identifier.actor_identifier_type
from
actor_identifier,
actor,
actor_case_assignment,
case_data,
event,
court_config
where
actor_identifier.actor_id = actor.actor_id and
actor.actor_id = actor_case_assignment.actor_id and
actor_case_assignment.case_id = case_data.case_id and
case_data.case_id = event.case_id and
case_data.court_id = court_config.court_id and
actor_identifier_text = '7313 53' and
actor_identifier_type = 'AttorneyStateBarID' and
event_date_time > '4/1/2002' and
event_date_time < '6/1/2002';

All of the columns used for a join or constraint have an index.
A vaccum analyze was done just before this explain.

The explain looks like this (I've numbered the lines so I can refer to
them):

1)Merge Join (cost=1399914.02..1460681.09 rows=154522243 width=248)
2) -> Sort (cost=1374154.11..1374154.11 rows=3605411 width=72)
3) -> Merge Join (cost=138124.99..144473.64 rows=3605411 width=72)
4) -> Sort (cost=93198.01..93198.01 rows=118189 width=60)
5) -> Hash Join (cost=3285.17..75120.12 rows=118189
width=60)
6) -> Seq Scan on actor_case_assignment
(cost=0.00..18880.77 rows=814677 width=24)
7) -> Hash (cost=3285.13..3285.13 rows=15
width=36)
8) -> Index Scan using
actor_identifier_actor_id_text on actor_identifier (cost=0.00..3285.13
rows=15 width=36)
9) -> Sort (cost=44926.98..44926.98 rows=305054 width=12)
10) -> Seq Scan on actor (cost=0.00..7728.54
rows=305054 width=12)
11) -> Materialize (cost=26017.06..26017.06 rows=428584 width=176)
12) -> Merge Join (cost=25759.91..26017.06 rows=428584 width=176)
13) -> Sort (cost=18978.66..18978.66 rows=18219 width=72)
14) -> Hash Join (cost=1.12..17251.58 rows=18219
width=72)
15) -> Seq Scan on case_data (cost=0.00..5407.91
rows=182191 width=48)
16) -> Hash (cost=1.10..1.10 rows=10 width=24)
17) -> Seq Scan on court_config
(cost=0.00..1.10 rows=10 width=24)
18) -> Sort (cost=6781.25..6781.25 rows=2352 width=104)
19) -> Index Scan using event_event_date_time on event
(cost=0.00..6649.52 rows=2352 width=104)

Here are my questions-

(14-17)It looks like on lines 14-17, case_data and court_config are being
joined without any constraint, thus returning 182191 rows, but the join &
sort on 13 & 14 indicate 18219 rows are expected - why does the planner
expect 1/10th of what I expect? Is there a way to affect this expectation?

10)This table has a unique index on the field being scanned, and this field
is also the only row returned from the table- why is a seq scan chosen here?

15)This table has a unique index on the field being scanned, and only 3 rows
are returned to the query- why the seq scan?

(18-19)Is the row count of 2352 from lines 18 and 19 based on a reasonable
guess that out of the 235239 rows in the table, we'll select one out of 100?
Again, where does this number come from, and can I affect it?

12)Line 12 seems to indicate that the planner expects this join to produce
more rows rather than fewer as in the case of line 14 and line 18. In fact,
the real situation is that the join on line 14 will produce exactly 182191
rows, and since the results of line 19 will produce 8537 rows, 12 will also
produce about 8537 rows. So... how does the planner decide whether a join
will increase or decrease the number of rows?

8)The planner seems to expect 15 rows when the table actually has 66559.
This is so much different from the 1:100 ratio in 19 that I wonder how it
was arrived at? (It *is* a very good guess, since the actual number retturn
will be 1.)

6)Indicates a seq scan occuring on a field that is indexed. The table has
800K rows, but it is only 4 columns wide. Is the narrow width making this
choice a good one?

5)I would expect this join to typically result in 1 to 1000 cases, so the
estimate seems very high in this case.

Some other questions-

What is the difference between a hash join & a merge join?

Does materialize fetch the other fields given a list of index values?

Does a seq scan read the entire table, or can it just retrieve the field?

I guess that's enough questions for now. If you've made it this far, thanks
for reading all of this! I look forward to hearing your thoughts.

-Nick
--------------------------------------------------------------------------
Nick Fankhauser nickf(at)ontko(dot)com Phone 1.765.935.4283 Fax 1.765.962.9788
Ray Ontko & Co. Software Consulting Services http://www.ontko.com/

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Alastair D'Silva 2002-04-11 07:31:49 SSL problems
Previous Message Bruce Momjian 2002-04-10 17:03:03 Re: performance "tests"