how to help the planner

From: Marty Frasier <m(dot)frasier(at)escmatrix(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: James Quinn <j(dot)quinn(at)escmatrix(dot)com>
Subject: how to help the planner
Date: 2013-03-28 15:59:05
Message-ID: CAAX5s64o2LCF0cWNqDMTiM1JuSyyESvh1T1QW1cmDTpi4n2rdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greetings,

We've been using postgreSQL for a few years. This is my first post here
and first real dive into query plans.

A description of what you are trying to achieve and what results you expect.:
Query results of nested joins of table. Results are correct - just
takes a long time with selected plan.

PostgreSQL version number you are running:
PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

How you installed PostgreSQL:

yum, using PGDG repo (package
postgresql92-server-9.2.3-2PGDG.rhel6.x86_64 and friends)

Changes made to the settings in the postgresql.conf file
DateStyle = ISO, MDY
default_tablespace = esc_data
default_text_search_config = pg_catalog.english
effective_cache_size = 24GB
lc_messages = en_US.UTF-8
lc_monetary = en_US.UTF-8
lc_numeric = en_US.UTF-8
lc_time = en_US.UTF-8
listen_addresses = 0.0.0.0
log_connections = on
log_destination = stderr
log_disconnections = on
log_line_prefix = %t %c
log_rotation_age = 1d
log_timezone = US/Eastern
logging_collector = on
maintenance_work_mem = 96MB
max_connections = 100
search_path = "$user", esc_funcs, public
shared_buffers = 8GB
TimeZone = US/Eastern
track_functions = all
track_io_timing = on

Operating system and version:
Red Hat Enterprise Linux Server release 6.4 (Santiago)

What program you're using to connect to PostgreSQL:

java(jdbc) and psql

Is there anything relevant or unusual in the PostgreSQL server logs?:

no

The issue is similar on PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled
by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3 on Ubuntu 10.04 64-bit
although we're doing troubleshooting on our new RHEL server.

We have a particular query that takes about 75 minutes to complete. The
selected execution plan estimates 1 row from several of the outermost
results so picks nested loop join resolutions. That turns out to be a bad
choice since actual row counts are in the thirty to fifty thousand range.
original selected plan: http://explain.depesz.com/s/muR
SQL: http://pastebin.com/f40Xp0JM

I set enable_nestloop=false to hint at the planner not to use nested loop.
That resulted in 13 second runtime. It appears this plan was considered
originally but estimated cost was higher than the plan above.
enable_nestloop=false: http://explain.depesz.com/s/mAa
SQL: http://pastebin.com/CgcSe7r6

We tried rewriting the query using WITH clauses. That took 82 seconds but
plan thought it would take much longer.
using with clauses: http://explain.depesz.com/s/GEZ
SQL: http://pastebin.com/ZRvRK2TV

We have been looking into the issue to the best of our ability but can't
figure out how to help the planner. I've looked at the planner source some
and see where row count is set to 1 if it's <= 1. I haven't found where
it's set yet but presume it was unable to determine the result set row
count and defaulted to 1.

I've run analyze manually and tried it with default_statistics_target=10000
to see if that helped. It didn't.
The table is static - no new rows are being added and there is no other
load on the database.

schema dump: http://pastebin.com/pUU0BJvr

What can we do to help the planner estimate better?

Thanks in advance,
Marty Frasier

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2013-03-28 16:18:46 Re: how to help the planner
Previous Message Josh Berkus 2013-03-27 22:52:10 Re: 9.2.3 upgrade reduced pgbench performance by 60%