Re: how to help the planner

From: Marty Frasier <m(dot)frasier(at)escmatrix(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org, James Quinn <j(dot)quinn(at)escmatrix(dot)com>
Subject: Re: how to help the planner
Date: 2013-03-28 20:45:26
Message-ID: CAAX5s64Cq_1L8xmNZw7JufM+bDuC5yQNniRAuNcuGU8JBoKGmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom,

I cranked (join|from)_collapse_limit up to 50, then 500 just to exclude the
limits completey, and attempted the query both times. The planner came up
with an estimate close to the other estimates (1,944,276) and I stopped
actual execution after some length of time.

The t12 subquery is grouped differently because that particular test can be
valid at mutliple schools per student.

I had set session pg_default_statistics to 10000 and analyzed prior to the
earlier runs to allow it to have the best stats it could. I've looked at
it a little more closely, setting pg_default_statistics back to default of
100 and re-ran analyze on that database.

The value 'cahsee_ela' occurs 75,000 times in column
analysis.iteration__student__test__year.test which totals 11M rows. It's
ranked about 60 of 91 values in frequency.
By setting statistics=1000 on the column 'test' the MCV from pg_stats
contains all 91 distinct values (there are no nulls) and there is no
histogram_bounds value for the column. From MCV: cahsee_ela = 0.00658
which is accurate.
I think that should give the planner good info on the selectivity of the
where clause. It appears from the var_eq_const function that it will use
that exact value when found. It doesn' t seem to help the outcome though
as it had good stats before. I just understand it a little better now -
which is good.

Do you have any suggestions where to probe next?
I see some statistics hooks mentioned in some of the source codes but don't
know how to take advantage of them or whether it would be of use.
I suppose the answer could eventually be we have to reorganize our queries?

Thanks,
Marty

On Thu, Mar 28, 2013 at 12:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Marty Frasier <m(dot)frasier(at)escmatrix(dot)com> writes:
> > We've been using postgreSQL for a few years. This is my first post here
> > and first real dive into query plans.
>
> One quick thought is that it's probably worth cranking up
> join_collapse_limit and/or from_collapse_limit, since the number of
> relations in the query is considerably more than the default values of
> those limits. This will make planning take longer but possibly find
> better plans. I'm not sure it will help a lot, since most of the
> problem is evidently bad rowcount estimates, but it might help.
>
> Also it seems like the major rowcount failing is in the estimate for the
> t12 subquery. I can't tell why that particular combination of WHERE
> clauses is giving it such a hard time --- is there something odd about
> the distribution of 'cahsee_ela' tests? Why is that particular subquery
> grouped over school/student when all the others are grouped over just
> student?
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2013-03-28 21:03:42 Re: Question about postmaster's CPU usage
Previous Message kelphet xiong 2013-03-28 19:07:01 Question about postmaster's CPU usage