Planner mis-estimation using nested loops followup

From: "Chris Kratz" <chris(dot)kratz(at)vistashare(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner mis-estimation using nested loops followup
Date: 2008-03-18 15:35:08
Message-ID: 3642025c0803180835n20062314g94870b137d8c3ff7@mail.gmail.com
Lists: pgsql-performance

A number of weeks ago, I posted a request for help regarding join
estimates in pg 8.2.6. In moderately complex to very complex ad hoc queries
in our system, the planner was consistently and massively underestimating the
number of rows coming out of a join at a low level, making these queries very
slow and inefficient. At times the mis-estimation was 1000:1, i.e. when a join
should have returned 2000 rows, the planner assumed 1 or 2. Raising the
statistics targets on the join columns up to the maximum made little difference
(yes, we analyzed the tables in question after each change). Since the planner
sees only one row coming out of the low-level join, it uses nested loops all
the way up the chain when it would be more efficient to use another join type.
In our informal testing, we found that by disabling nested loops and forcing
other join types, we could get fantastic speedups. The queries that seem to
benefit most from this have a lot of sub-queries being built up into a final
query set, as well as a fair number of joins in the sub-queries. Since these
are user-created and then generated via our tools, they can be quite messy at
times.
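
For anyone who wants to try the statistics route first, what we did was roughly
the following (the table and column names here are placeholders, not our real
schema):

-- Placeholder table/column names, shown only to illustrate the approach.
-- Raise the per-column statistics target to the 8.2 maximum, then re-analyze
-- so the planner rebuilds its statistics for the join column.
ALTER TABLE activity ALTER COLUMN account_id SET STATISTICS 1000;
ANALYZE activity;

In our case this never moved the row estimates enough to change the plans.
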
After doing this testing, we have since added some functionality to our ad hoc
reporting tool that lets us tune individual queries by turning individual join
types on and off at runtime. As we hear of slow reports, we've been turning off
nested loops for those reports individually. Almost always, this has increased
the performance of the reports, sometimes in a completely amazing fashion (from
many, many minutes down to seconds at times). It of course doesn't help
everything, and turning off nested loops globally causes an overall slowdown in
other parts of the system.
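
To give a rough idea of what the tool does per report (this is a simplified
sketch, not the actual generated SQL, and report_view is just a stand-in name):

SET enable_nestloop = off;        -- or enable_mergejoin / enable_hashjoin
SELECT * FROM report_view;        -- the generated report query runs here
RESET enable_nestloop;            -- restore the default for the session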

As this has gone on over the last couple of weeks, it feels like we either
have a misconfiguration on the server, or we are tickling a mis-estimation
bug in the planner. I'm hoping it's the former. The db server has 8GB of
memory and a RAID1 (WAL) / RAID10 (data) disk configuration; the OS is Linux
2.6.9 and the db is 8.2.6. The db is UTF-8, if that has any bearing, and
autovacuum and the bgwriter are on.

Non-default settings of interest from postgresql.conf:

shared_buffers = 1024MB # min 128kB or max_connections*16kB
work_mem = 256MB # min 64kB
maintenance_work_mem = 256MB # min 1MB
random_page_cost = 1.75 # same scale as above
effective_cache_size = 4096MB
default_statistics_target = 100 # range 1-1000

If nothing else, perhaps this will help somebody else who has run into the
same problem. If EXPLAIN ANALYZE of a query shows a large mis-estimation of the
rows returned by a join (estimate=1, actual=2k) causing the planner to choose
nested loops instead of another join type, you might try running the query with
enable_nestloop set to off and see if that helps with performance.
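
The least intrusive way to test is inside a transaction with SET LOCAL, so the
setting reverts automatically and nothing else in the session is affected (the
query below is just a stand-in for the slow report query):

BEGIN;
SET LOCAL enable_nestloop = off;             -- only for this transaction
EXPLAIN ANALYZE SELECT * FROM report_view;   -- substitute the slow query
ROLLBACK;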

Thanks,

-Chris
