Re: The science of optimization in practical terms?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, decibel <decibel(at)decibel(dot)org>, Greg Smith <gsmith(at)gregsmith(dot)com>, jd(at)commandprompt(dot)com, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bernd Helmle <mailings(at)oopsware(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The science of optimization in practical terms?
Date: 2009-02-18 20:13:14
Message-ID: 603c8f070902181213q1ad6693ci6169ad144eeb5f47@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 18, 2009 at 2:46 PM, Ron Mayer
<rm_pg(at)cheapcomplexdevices(dot)com> wrote:
> Robert Haas wrote:
>> experience, most bad plans are caused by bad selectivity estimates,
>> and the #1 source of bad selectivity estimates is selectivity
>> estimates for unknown expressions.
>
> ISTM unknown expressions should be modeled as a range of
> values rather than one single arbitrary value.
>
> For example, rather than just guessing 1000 rows, if an
> unknown expression picked a wide range (say, 100 - 10000
> rows; or maybe even 1 - table_size), the planner could
> choose a plan which wouldn't be pathologically slow
> regardless of whether the guess was too low or too high.
>
> For that matter, if all estimates used a range rather than
> a single value, ISTM that in general we would produce less
> fragile plans.

It would be interesting to find out if something like this could be
made to work, but it's more than I'd be willing to bite off. I think
this would require reworking large portions of the planner, and I am
doubtful that it could be done without a substantial loss of
performance. The existing code considers A LOT of plans, to the point
where even a few more or fewer floating-point operations per plan
result in a change in planning time that can be measured in
macro-benchmarks.
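
To make the arithmetic concrete, here's a toy sketch of what propagating
a [low, high] row estimate through a single join step might look like.
This isn't planner code; the type and function names are invented just
to illustrate the point:

    /* Toy sketch only: nothing below comes from the actual planner. */
    #include <stdio.h>

    typedef struct RowRange
    {
        double      lo;     /* optimistic row-count estimate */
        double      hi;     /* pessimistic row-count estimate */
    } RowRange;

    /*
     * Join cardinality when the input sizes and the join selectivity
     * are each known only as a range: every estimate now does twice
     * the floating-point work it did with a single number.
     */
    static RowRange
    join_rows(RowRange outer, RowRange inner, double sel_lo, double sel_hi)
    {
        RowRange    result;

        result.lo = outer.lo * inner.lo * sel_lo;
        result.hi = outer.hi * inner.hi * sel_hi;
        return result;
    }

    int
    main(void)
    {
        RowRange    outer = {100.0, 10000.0};   /* unknown expression */
        RowRange    inner = {500.0, 500.0};     /* well-estimated side */
        RowRange    joined = join_rows(outer, inner, 0.001, 0.01);

        printf("join estimate: %.0f .. %.0f rows\n", joined.lo, joined.hi);
        return 0;
    }

Multiply that by the number of paths we build for a large join problem
and the extra work is not obviously free.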

If we could somehow tamp down the amount of time spent considering plans
that turn out to be dead ends, it might free up some time to perform
some of these other computations. But I'm not sure how to go about
that. The best ideas I've come up with so far involve refactoring
joinpath.c to eliminate some of the duplicate computation and/or
somehow be more intelligent about which nested loops we generate. But
I haven't come up with anything yet that's demonstrably better than
the add_path patch that I submitted a few weeks ago, which is not bad
but not earth-shattering either. At any rate, we'd need to save quite
a bit to pay for carting around best and worst case costs for every
plan we consider.
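
Roughly speaking, carting around best and worst case costs would mean a
dominance test along these lines -- purely a sketch with made-up names,
not the real add_path() logic:

    /* Hypothetical sketch; the struct and function names are invented. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct CostRange
    {
        double      best_cost;      /* cost under optimistic estimates */
        double      worst_cost;     /* cost under pessimistic estimates */
    } CostRange;

    /*
     * With a single cost number, the old path wins if old <= new.  With
     * a range, the old path can only discard the new one if it is at
     * least as cheap in both the best and the worst case -- a stricter
     * test, so fewer paths get pruned.
     */
    static bool
    old_path_dominates(const CostRange *old_path, const CostRange *new_path)
    {
        return old_path->best_cost <= new_path->best_cost &&
               old_path->worst_cost <= new_path->worst_cost;
    }

    int
    main(void)
    {
        CostRange   hash_join = {120.0, 180.0};     /* predictable */
        CostRange   nest_loop = {15.0, 90000.0};    /* fast if guess holds */

        printf("hash join prunes nested loop: %s\n",
               old_path_dominates(&hash_join, &nest_loop) ? "yes" : "no");
        return 0;
    }

A nested loop with a great best case but an awful worst case can no
longer be discarded by a plan that is merely cheaper on the single-number
estimate, so fewer paths get pruned early -- which is exactly where the
extra planning time would come from.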

...Robert
