Re: Identifying diskspace leakage

From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Identifying diskspace leakage
Date: 2004-05-15 16:59:29
Message-ID: 200405151059.29052.pgsql@bluepolka.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Here's an attempt at a query to estimate diskspace leakage. This
leakage might occur when max_fsm_pages and/or max_fsm_relations are
set too low. Not sure which of the two approaches below (leak1 or
leak2) is more accurate? Is there a better way via SQL?

The query uses the 'dbsize' project from contrib. Dbsize has a
function called relation_size() which performs a 'stat' to get
actual disk usage for a database and/or table. I use the column
pg_class.reltuples instead of actually counting rows because I
suspect that would essentially flush our OS cache of useful pages,
degrading performance. This query assumes you're keeping stats
updated.

SELECT c.relname,
SUM(s.avg_width) as width,
CAST(c.reltuples as BIGINT) AS tuples,
CAST(SUM(s.avg_width) * c.reltuples/1048576 AS INTEGER) AS tupdu,
c.relpages AS pages,
CAST(c.relpages * 8192 / 1048576 AS INTEGER) AS pgdu,
relation_size(s.tablename)/1048576 AS reldu,
CAST((relation_size(s.tablename)
- SUM(s.avg_width) * c.reltuples)/1048576 AS INTEGER) AS leak1,
CAST((relation_size(s.tablename)
- c.relpages * 8192) / 1048576 AS INTEGER) AS leak2
FROM pg_stats s, pg_class c
WHERE c.relname NOT LIKE 'pg_%'
AND c.relname = s.tablename
GROUP BY c.oid, s.tablename, c.reltuples, c.relpages, pgdu
ORDER BY tupdu;

relname | width | tuples | tupdu | pages | pgdu | reldu | leak1 | leak2
---------------+-------+---------+-------+--------+------+-------+-------+-------
table_1766485 | 27 | 198 | 0 | 12 | 0 | 0 | 0 | 0
table_1766443 | 186 | 0 | 0 | 9317 | 72 | 72 | 73 | 0
table_1766439 | 83 | 0 | 0 | 10 | 0 | 0 | 0 | 0
table_1766435 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0
table_1766437 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0
table_1766421 | 23 | 2 | 0 | 1 | 0 | 0 | 0 | 0
table_1766451 | 30 | 189822 | 5 | 1754 | 13 | 13 | 8 | 0
table_1766396 | 48 | 278781 | 13 | 3185 | 24 | 24 | 12 | 0
table_1766391 | 74 | 200826 | 14 | 3271 | 25 | 25 | 11 | 0
table_1766446 | 36 | 504594 | 17 | 4881 | 38 | 38 | 21 | 0
table_1766426 | 149 | 2241719 | 319 | 55555 | 434 | 434 | 116 | 0
table_1766456 | 888 | 390657 | 331 | 637949 | 887 | 4983 | 4653 | 4096
table_1766399 | 596 | 732708 | 416 | 41876 | 327 | 327 | -89 | 0
(13 rows)

The basic column definitions are:

tupdu(MB) = avg_width * reltuples
pgdu(MB) = relpages * 8K/page
reldu(MB) = relation_size(tablename) (src/contrib/dbsize)
leak1 = reldu - tupdu
leak2 = reldu - pgdu

Not sure how we ended up with a couple of cases where the number of
mb on disk was less than the estimated size; maybe we had some
deletions after the last update of pg_stats?

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Dennis Gearon 2004-05-15 21:29:32 Interesting Unique Index Needed - Functional?
Previous Message Tom Lane 2004-05-15 16:44:27 Re: pg_xlog becomes extremely large during CREATE INDEX