Any idea on how to improve the statistics estimates for this plan?

From: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Any idea on how to improve the statistics estimates for this plan?
Date: 2012-12-05 19:39:14
Message-ID: CALt0+o9EdN9HDWuzNO2-+Y4XWU6OT03jzvHdzOCrHsLa5dgyMQ@mail.gmail.com
Lists: pgsql-performance

Hi,

I have been struggling with a query for some time. Its main problem is
that the row estimate is badly wrong for one particular operation:
-> Nested Loop (cost=3177.72..19172.84 rows=*2* width=112) (actual time=139.221..603.929 rows=*355331* loops=1)
     Join Filter: (l.location_id = r.location_id)
     -> Hash Join (cost=3177.71..7847.52 rows=*33914* width=108) (actual time=138.343..221.852 rows=*36664* loops=1)
          Hash Cond: (el.location_id = l.location_id)
          ...
     -> Index Scan using idx_test1 on representations r (cost=0.01..0.32 rows=*1* width=12) (actual time=0.002..0.008 rows=*10* loops=36664)
          ...
(extracted from the original plan which is quite massive)
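
To put numbers on the gap, here is a small sketch that computes the actual/estimated row ratio for each node in the excerpt. The row counts are copied from the plan above; the dictionary layout and the helper name misestimate_factor are only illustrative. It assumes the usual EXPLAIN ANALYZE convention that "rows" on a node with loops > 1 is a rounded per-loop average, so the inner-scan total is approximate.

```python
# Row counts copied from the plan excerpt: (estimated rows, actual rows).
# For the index scan, both numbers are per loop (loops=36664).
plan_nodes = {
    "Nested Loop": (2, 355331),
    "Hash Join": (33914, 36664),
    "Index Scan using idx_test1 (per loop)": (1, 10),
}


def misestimate_factor(estimated, actual):
    """Return how many times larger the actual row count is than the estimate."""
    return actual / estimated


# Ratio of actual to estimated rows for every node.
factors = {
    name: misestimate_factor(est, act)
    for name, (est, act) in plan_nodes.items()
}

for name, factor in factors.items():
    print(f"{name}: actual/estimated = {factor:,.2f}x")

# Approximate number of rows returned by the inner index scan over all loops
# (per-loop average times loop count), i.e. what feeds the join filter.
inner_rows_total = 10 * 36664
print(f"Inner scan rows over all loops: about {inner_rows_total:,}")
```

With these figures the hash join estimate is within about 8% of the actual count, the inner index scan is off by a factor of about 10 per loop, and the nested loop is off by a factor of roughly 177,000. The inner scan total of about 366,640 rows is close to the 355,331 rows the nested loop actually returns.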

I tried raising the statistics target for l.location_id, el.location_id,
r.location_id and idx_test1.location_id (up to 5000), but the estimate
does not get any better.

Any idea how I could get better statistics in this particular example,
and why the estimate for the nested loop is so wrong while the ones for
each individual operation are quite good?

This is with PostgreSQL 9.2.1.

Thanks.

--
Guillaume
