Increasing statistics results in worse estimates

From: Shelby Cain <alyandon(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Increasing statistics results in worse estimates
Date: 2005-04-28 21:58:17
Message-ID: 20050428215817.18174.qmail@web50105.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'm having a problem with analyze in 8.0.2 that is
really bothering me. I have a table that contains
address, city and state with about 7.8m rows in it.

On that table I have two non-unique indexes for city
and zipcode respectively. This table was loaded in a
manner such that it is essentially sorted by zipcode.
Sorting by zipcode implies that there should exist a
pretty strong correlation on the city column as well.

With a statistics target of 50 on city I'm getting
good estimates for row counts for arbitrary cities
(ie: explain select count(*) from addresses where city
= 'DALLAS' estimates 474k rows out of 500k actual) but
a poor estimate for the correlation (0.13 according to
pg_stats). This seems to be causing the planner to
pick a table scan for "select count(*) from test_zipc
where city = 'DALLAS' (est 474k rows)" vs picking an
index scan for "select count(*) from test_zipc where
zipcode like '75%' (est 2m rows)".

Increasing the statistics target on the city column
and re-analyzing the table seems to make the
correlation estimate better (shows about 0.5) but the
row count estimates are thrown off by 2 orders of
magnitude in some cases. Repeating the above queries
I get a row estimate of 8k for "select count(*) from
test_zipc where city = 'DALLAS'" and a row estimate of
6m for the "select count(*) from test_zipc where
zipcode like '75%'". In this case, the planner picked
an index scan for the city = 'X' condition but for
what I feel are the wrong reasons because it under
estimated the row count. Re-analyzing the table
multiple times always shows about an 8k estimate.

Is my data set that promblematic? Has anyone seen
similar behavior? Any suggestions on how to improve
these stats?

Regards,

Shelby Cain

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-04-28 22:24:44 Re: info on strange error messages on postgresql
Previous Message Keatis 2005-04-28 20:52:21 Problem: message type 0xxx arrived from server while idle