Re: The science of optimization in practical terms?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: decibel <decibel(at)decibel(dot)org>, Greg Smith <gsmith(at)gregsmith(dot)com>, jd(at)commandprompt(dot)com, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bernd Helmle <mailings(at)oopsware(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The science of optimization in practical terms?
Date: 2009-02-18 12:50:40
Message-ID: 603c8f070902180450q653d0a2mc249607b2fe9bf64@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 18, 2009 at 1:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I'm interested to know whether anyone else shares my belief that
>> nested loops are the cause of most really bad plans. What usually
>> happens to me is that the planner develops some unwarranted optimism
>> about the number of rows likely to be generated by the outer side of
>> the join and decides that it's not worth sorting the inner side or
>> building a hash table or using an index, and that the right thing to
>> do is just rescan the inner node on every pass. When the outer side
>> returns three or four orders of magnitude more results than expected,
>> ka-pow!
>
> And then there is the other half of the world, who complain because it
> *didn't* pick a nestloop for some query that would have run in much less
> time if it had.

Well, that's my question: is that really the other half of the world,
or is it the other 5% of the world? And how does it happen? In my
experience, most bad plans are caused by bad selectivity estimates,
and the #1 source of bad selectivity estimates is selectivity
estimates for unknown expressions.

(Now it appears that Josh is having problems that are caused by
overestimating the cost of a page fetch, perhaps due to caching
effects. Those are discussed upthread, and I'm still interested to
see whether we can arrive at any sort of consensus about what might be
a reasonable approach to attacking that problem. My own experience
has been that this problem is not quite as bad, because it can throw
the cost off by a factor of 5, but not by a factor of 800,000, as in
my example of three unknown expressions with a combined selectivity of
0.1.)

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-02-18 14:00:49 pg_migrator progress
Previous Message Simon Riggs 2009-02-18 12:40:08 Re: Hot standby, recovery infra