Re: -HEAD planner issue wrt hash_joins on dbt3 ?

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date: 2006-09-13 07:35:30
Message-ID: 4507B4C2.9060208@kaltenbrunner.cc
Lists: pgsql-hackers

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> Tom Lane wrote:
>>> Could we see the actual EXPLAIN ANALYZE results for the slow plan?
>
>> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt
>
> Well, indeed it seems that the hash join is just an innocent bystander:
> the bulk of the runtime (all but about 120 sec in fact) is spent here:
>
> ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
>       ->  Merge Join  (cost=0.00..10248.66 rows=20000 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
>       ...
>       ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
>             Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>             ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
>                   Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>
> I suppose that the profile result you showed was taken during the
> startup transient where it was computing the hashtables that this loop's
> results are joined to ... but that's not where the problem is. The
> problem is repeating that bitmap scan on lineitem for nearly 20000
> different l_suppkeys.

possible - I actually took the profiles over a longer period of time.
(fwiw the numbers add up: ~1254 ms per bitmap heap scan times 19837 loops
comes to roughly 24900 seconds, i.e. essentially all of the nested loop's
24924 s)
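
for what it's worth, an easy way to double-check that the nested loop is
the culprit would be to discourage it for one session and compare plans -
just a sketch, the DBT-3 query itself is stood in by a comment here:

-- temporarily discourage nested-loop joins for this session only,
-- re-run the query under EXPLAIN ANALYZE, then restore the default
SET enable_nestloop = off;
-- ... run the DBT-3 query under EXPLAIN ANALYZE here ...
RESET enable_nestloop;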

>
> Apparently we've made the planner a bit too optimistic about the savings
> that can be expected from repeated indexscans occurring on the inside of
> a join. The other plan uses a different join order and doesn't try to
> join lineitem until it's got orders.o_orderkey, whereupon it does a
> mergejoin against an indexscan on lineitem:
>
> -> Index Scan using i_l_orderkey on lineitem (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)
>
> The runtimes for the remainders of the plans are roughly comparable, so
> it's the cost of joining lineitem that is hurting here.
>
> Is lineitem sorted (or nearly sorted) by l_orderkey? Part of the
> problem could be overestimating the cost of this indexscan.
>
> What are the physical sizes of lineitem and its indexes, and how do
> those compare to your RAM? What are you using for planner settings
> (particularly effective_cache_size)?

ouch - you are right (as usual) here.
effective_cache_size was set to 10GB (my fault for copying over the conf
from a 16GB box) during the run - lowering it by just a few megabytes(!),
or to a more realistic 6GB, results in the following MUCH better plan:

http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
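
for reference, this is roughly how I'd reproduce the comparison per-session
instead of editing postgresql.conf - a sketch only; the memory-unit syntax
assumes a server new enough to accept units (otherwise the value goes in as
8kB pages), and the DBT-3 query itself is stood in by a comment:

-- check the current setting, then lower it for this session only
SHOW effective_cache_size;
SET effective_cache_size = '6GB';  -- on older servers: 786432 (8kB pages)
-- then re-run the DBT-3 query under EXPLAIN ANALYZE and compare the plans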

as for the relation sizes:

dbt3=# select pg_relation_size('lineitem');
 pg_relation_size
------------------
      10832764928
(1 row)

dbt3=# select pg_total_relation_size('lineitem');
 pg_total_relation_size
------------------------
            22960259072
(1 row)

there are nine btree indexes on lineitem, all between 1.1GB and 1.4GB in
size.
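
for the record, this is roughly how those per-index figures can be pulled
out, along with the physical ordering you asked about - a sketch; a
pg_stats.correlation value near 1.0 for l_orderkey would mean lineitem is
laid out nearly in l_orderkey order:

-- size of each index on lineitem
select c.relname as index_name,
       pg_size_pretty(pg_relation_size(c.oid)) as index_size
from pg_index i
join pg_class c on c.oid = i.indexrelid
where i.indrelid = 'lineitem'::regclass
order by pg_relation_size(c.oid) desc;

-- how well the heap order tracks l_orderkey (and l_suppkey, for comparison)
select attname, correlation
from pg_stats
where tablename = 'lineitem'
  and attname in ('l_orderkey', 'l_suppkey');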

Stefan
