Re: -HEAD planner issue wrt hash_joins on dbt3 ?

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date: 2006-09-25 16:51:08
Message-ID: 451808FC.1030804@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[RESENT in a modified version since the original reply seems to have ben
lost by the listserver which seems to happen sometimes lately]

Tom Lane wrote:
> Matteo Beccati <php(at)beccati(dot)com> writes:
>> Tom Lane ha scritto:
>>> Matteo Beccati <php(at)beccati(dot)com> writes:
>>>> I cannot see anything bad by using something like that:
>>>> if (histogram is large/representative enough)
>>> Well, the question is exactly what is "large enough"? I feel a bit
>>> uncomfortable about applying the idea to a histogram with only 10
>>> entries (especially if we ignore two of 'em). With 100 or more,
>>> it sounds all right. What's the breakpoint?
>
>> Yes, I think 100-200 could be a good breakpoint.
>
> I've committed this change with (for now) 100 as the minimum histogram
> size to use. Stefan, are you interested in retrying your benchmark?

spent some time retesting that and I got the following results(this is
the same box as before but with a much slower disk-setup and a newly
initdb'd cluster):

http://www.kaltenbrunner.cc/files/analyze_q9_beta1.txt

all that is with 2GB of effective_cache_size(plan does not change with
much smaller settings btw) and a statistic target of 1000 and the
following parameters:

default planner settings for the EXPLAIN and the first EXPLAIN ANALYZE -
then the same query with disabled hash_joins, the next one is with
disabled nest_loops and the fastest one is with both nest_loop and
hash_joins disabled (all run in that order - so there are possible
caching effects).

in comparision to:

http://www.kaltenbrunner.cc/files/analyze_q9.txt

we nearly got a 7 figure speedup due to the latest changes(much better
estimates at least) - however the mergejoin+sort only plan is still faster.

the other troubling query is the following:

query: http://www.kaltenbrunner.cc/files/7/power1/db/plans/power_query21.txt

plans: http://www.kaltenbrunner.cc/files/analyze_q21_beta1.txt

(default,default,enable_nestloop=off,enable_nestloop=off and
enable_hashjoin=off)

despite having not-too bad estimates for most of the key-points in the
plan the actual runtime of the choosen plan is quite disappointing.

Stefan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-09-25 17:07:17 Re: Increase default effective_cache_size?
Previous Message Josh Berkus 2006-09-25 16:37:18 Re: Release Notes: Major Changes in 8.2