correlation in pg_stats

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: rm_pg(at)cheapcomplexdevices(dot)com
Subject: correlation in pg_stats
Date: 2005-02-08 17:44:07
Message-ID: Pine.LNX.4.58.0501271445430.17387@greenie.cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Short summary:

* It looks to me like the planner vastly overestimates
the # of pages read by index scan in quite a few of my
tables even though stats collected by ANALYZE are correct.

* The problem happens any time you have multiple columns
that have a number of repeated values in them, and
you CLUSTER the table by a sort using both columns
(like "city,state,zip,phone#" or "firstname,lastname").

* I think this is the problem that Mark Kirkwood is seeing
in his threads Query optimizer 8.0.1 and "One Big trend
vs multiple smaller trends" in hackers.

* A test script demonstrating the issue also follows.

* I think keeping one more stat per attribute in
pg_stastic that could describe this behavior.

Longer:

If I understand the optimizer correctly, correlation is used
to both guess how much random disk access will be required in
a query; as well as estimate how many pages will be read.

Unfortunately, many tables in my larger databases have
columns with values that are tightly packed on a few pages;
even though there is no total-ordering across the whole table.
Stephan Szabo described this as a "clumping effect":
http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php

The test script below shows a table with 6 columns and
1 million rows.

Note that ANALYZE correctly observes that the correlation
for all the columns except "A" is near zero.

However, also note that columns A,B,C,and even D will have
extremely few UNIQUE values on any given page of data. And
conversely, an index scan on any of those columns be a good
choice (as seen by EXPLAIN ANALYZE output below).

Instead of just storing "correlation", I would also like
to store a variation of "correlation" that "pretends" that
the disk-pages for a particular column were sorted by the
min value for that column. Wherever the optimizer would
use the existing 'correlation' to estimate the number
of pages that would be accesses; it could use this
"correlation discounting the order of blocks" value instead.
All the math/stastics/theory that suggests correlation is
good at estimating # of pages would remain intact.
Anyway... I've talked too much...

The test script showing
1) the tables
2) the values in pg_stats
3) EXPLAIN ANALYZE of columns showing the problem
follows:

********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************

fli=# create temporary table tmp1mil as
select * from
(select generate_series as a from generate_series(0,9)) as a,
(select generate_series as b from generate_series(0,9)) as b,
(select generate_series as c from generate_series(0,9)) as c,
(select generate_series as d from generate_series(0,9)) as d,
(select generate_series as e from generate_series(0,9)) as e,
(select generate_series as f from generate_series(0,9)) as f
order by a,b,c,d,e,f;
fli-# fli-# fli-# fli-# fli-# fli-# fli-# fli-#
SELECT
fli=# fli=# vacuum analyze tmp1mil;
VACUUM
fli=# select * from pg_stats where tablename='tmp1mil';
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-----------+---------+-----------+-----------+------------+-----------------------+------------------------------------------------------------------------------------------------+------------------+-------------
pg_temp_4 | tmp1mil | a | 0 | 4 | 10 | {7,8,5,0,6,3,4,1,2,9} | {0.113,0.106667,0.105667,0.104333,0.101333,0.097,0.095,0.0936667,0.092,0.0913333} | | 1
pg_temp_4 | tmp1mil | b | 0 | 4 | 10 | {1,4,0,5,6,2,9,7,3,8} | {0.119333,0.112667,0.107,0.101,0.099,0.0976667,0.0946667,0.092,0.0886667,0.088} | | 0.229754
pg_temp_4 | tmp1mil | c | 0 | 4 | 10 | {9,5,0,1,8,6,4,7,3,2} | {0.114667,0.107,0.103,0.101667,0.101667,0.0996667,0.0956667,0.094,0.0933333,0.0893333} | | 0.142119
pg_temp_4 | tmp1mil | d | 0 | 4 | 10 | {4,3,2,8,7,9,5,6,1,0} | {0.114667,0.11,0.108,0.104,0.102667,0.099,0.098,0.0903333,0.0893333,0.084} | | 0.0930835
pg_temp_4 | tmp1mil | e | 0 | 4 | 10 | {0,5,1,9,4,7,8,2,3,6} | {0.109333,0.109333,0.108333,0.100667,0.100333,0.0983333,0.0966667,0.0936667,0.092,0.0913333} | | 0.138575
pg_temp_4 | tmp1mil | f | 0 | 4 | 10 | {2,8,6,0,3,5,9,1,7,4} | {0.104667,0.103667,0.102667,0.102333,0.101667,0.100667,0.100667,0.0986667,0.0943333,0.0906667} | | 0.139991
(6 rows)

fli=# create index tmp1mil__c on tmp1mil(c);
CREATE INDEX
fli=# explain analyze select * from tmp1mil where c=5;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on tmp1mil (cost=0.00..19853.00 rows=107001 width=24) (actual time=2.164..480.863 rows=100000 loops=1)
Filter: (c = 5)
Total runtime: 531.345 ms
(3 rows)

fli=# explain analyze select * from tmp1mil where c=5;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on tmp1mil (cost=0.00..19853.00 rows=107001 width=24) (actual time=2.191..481.849 rows=100000 loops=1)
Filter: (c = 5)
Total runtime: 531.858 ms
(3 rows)

fli=# set enable_seqscan=false;
SET
fli=# explain analyze select * from tmp1mil where c=5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Scan using tmp1mil__c on tmp1mil (cost=0.00..364309.19 rows=107001 width=24) (actual time=0.102..101.460 rows=100000 loops=1)
Index Cond: (c = 5)
Total runtime: 152.103 ms
(3 rows)

fli=# explain analyze select * from tmp1mil where c=5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Scan using tmp1mil__c on tmp1mil (cost=0.00..364309.19 rows=107001 width=24) (actual time=0.082..101.298 rows=100000 loops=1)
Index Cond: (c = 5)
Total runtime: 151.914 ms
(3 rows)

fli=#

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2005-02-08 17:51:25 Re: Cross column statistics
Previous Message Bruno Wolff III 2005-02-08 16:50:08 Re: Query optimizer 8.0.1 (and 8.0)