Re: BUG #5059: Planner ignores estimates when planning an IN () subquery

From: Kenaniah Cerny <kenaniah(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5059: Planner ignores estimates when planning an IN () subquery
Date: 2009-09-17 05:32:50
Message-ID: f647f4600909162232h27d115f4of2e299177a1d3a67@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

My apologies, I meant user_activity_log, and not user_anime_log in the
previous email.

This isn't a huge issue as the query could easily be rewritten using a more
pragmatic join structure, but I thought it would be worth mentioning as
occasionally bugs involve more than just what meets the eye. Thank you for
the help.

On Wed, Sep 16, 2009 at 7:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Sep 16, 2009 at 6:39 PM, Kenaniah Cerny <kenaniah(at)gmail(dot)com>
> wrote:
> > I can provide the output of statistics queries if you would like. Just
> let
> > me know which statistics you want and I'll pastebin them.
> >
> > As far as selectivity goes, the selectivity estimate for the
> > user_anime_log.user_account_id was definitely miscalculated. The
> > user_anime_log contains up to 15 entries per user (70,000 users, but only
> > 475,811 rows). The default statistics target on that relation is set to
> > 1000. But even with poor statistics, guessing 62,000 rows when there's a
> > maximum of 15 per user account is still quite a miss.
>
> You've changed the table name on me, vs. what you pasted in the query,
> which had a user_activity_log but no user_anime_log...
>
> > Analysis of SELECT * FROM user_activity_log WHERE user_account_id =
> 17238;
> > estimates 13 rows and returns 15, which is quite reasonable considering
> the
> > statistics targets.
> >
> > Please forgive my ignorance, but in the case of my subquery, is the
> > estimated number of rows and cost being taken into account (or only the
> > selectivity)?
>
> Well, selectivity is just a term that refers to the fraction of rows
> that match some condition (rows themselves do not have selectivity).
> Usually the initial estimating is done in terms of selectivity, which
> is then multiplied by the total number of rows to find the number of
> rows that will remain after the condition is applied.
>
> So, yes, rows and cost are taken into account. The problem here is
> that the planner is mis-estimating the selectivity, therefore it
> computes the wrong number of rows (way too high), therefore it makes
> the wrong decision.
>
> > Granted I don't understand much about the planner internals,
> > but it seems strange that a nested loop semi join would be chosen when
> the
> > inner table is estimated to return an extremely low number of rows with
> low
> > cost and low selectivity. Shouldn't the planner also estimate the cost of
> an
> > inner (er, left?) join in that scenario?
>
> Well... you can't replace a semi join with an inner or left join,
> because it doesn't do the same thing. You could use a hash semi join
> or merge join semi join, but that doesn't make sense if, as you say,
> the inner table is estimated to return an extremely low number of
> rows.
>
> It might be a bit easier to analyze this if you stripped out all the
> joins that aren't necessary to reproduce the problem. Also, I would
> prefer EXPLAIN ANALYZE output posted in-line to the mailing list
> rather than pasted to a separate web site - it screws up the
> formatting.
>
> But honestly I'm not sure how much time it's worth spending on this.
> You have a way to rewrite the query that works... and fixing the
> estimator is going to be hard... so I suggest doing it the way that
> works, and moving on!
>
> ...Robert
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andre Rothe 2009-09-17 12:15:28 Wrong default values of columns
Previous Message Tom Lane 2009-09-17 03:26:13 Re: 답장: bug report