Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas H(dot)" <me(at)alternize(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join
Date: 2006-11-10 00:39:54
Message-ID: 13288.1163119194@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Thomas H." <me(at)alternize(dot)com> writes:
>> Um, what's the datatype of sm_info_ean and dvd_ean exactly?

> varchar(15) and varchar(14)

OK. I was wondering if you'd tried to use the new contrib/isn code and
it was messing up the estimates somehow. Seems like a red herring.

After looking more closely, I think the issue is that the size of the
dvds/movies join is estimated at 39900 rows when it's really only 2,
and this estimate discourages the planner from using a nestloop join of
that join against data_soundmedia. We were thinking this was entirely
because of poor estimation of the ~~ operators, but I suddenly realize
that what's happening is that the size of the left join is being clamped
to be at least the size of its left input --- that is, the planner is
failing to distinguish JOIN/ON clauses (which can't suppress left-side
rows) from WHERE clauses (which can). Per comment in
set_joinrel_size_estimates:

* Basically, we multiply size of Cartesian product by selectivity.
*
* If we are doing an outer join, take that into account: the output must
* be at least as large as the non-nullable input. (Is there any chance
* of being even smarter?) (XXX this is not really right, because it
* assumes all the restriction clauses are join clauses; we should figure
* pushed-down clauses separately.)

This didn't matter a whole lot back when the planner couldn't reorder
outer joins, but now that it can, it's more important that the estimates
be accurate.

I'm not sure if this is feasible to fix before 8.2, but I'll take a
look.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dawid Kuroczko 2006-11-10 00:55:25 Remote tables infrastructure.
Previous Message Cornelia Boenigk 2006-11-10 00:33:42 zero values in statistic views