Stopgap solution for table-size-estimate updating problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Stopgap solution for table-size-estimate updating problem
Date: 2004-11-26 22:55:56
Message-ID: 19915.1101509756@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

There's been some previous discussion of getting rid of the pg_class
columns relpages and reltuples, in favor of having the planner check the
current relation block count directly (RelationGetNumberOfBlocks) and
extrapolate the current tuple count based on the most recently measured
tuples-per-page density. A couple of past threads are
http://archives.postgresql.org/pgsql-performance/2004-10/msg00367.php
http://archives.postgresql.org/pgsql-general/2004-08/msg01422.php
and the point came up again today:
http://archives.postgresql.org/pgsql-performance/2004-11/msg00401.php
where we were again reminded of the problems incurred by obsolete
estimates.

It occurs to me that we could get most of the bang for the buck without
making any incompatible changes: just leave the existing fields in place
but make the planner use reltuples-divided-by-relpages as the density
estimate. That is, in place of where we have

rel->pages = relation->rd_rel->relpages;
rel->tuples = relation->rd_rel->reltuples;

in plancat.c, just do

rel->pages = RelationGetNumberOfBlocks(relation);
if (relation->rd_rel->relpages > 0)
density = relation->rd_rel->reltuples / relation->rd_rel->relpages;
else
density = some_default_estimate;
rel->tuples = round(rel->pages * density);

In addition to this we'd perhaps want to hack VACUUM so that when the
table is empty, it doesn't simply zero out relpages/reltuples, but
somehow preserves the previous density value so we don't have to fall
back to the default density estimate. (This of course assumes that we
will refill the table with a density roughly similar to the last meaured
density; which might be wrong but it's still better than just using a
default, I think.) One way to do that is to set relpages = zero
(truthfully) but set reltuples to the previously estimated density
(we can do this because it's already a float field). It might look a
little funny to have nonzero reltuples when relpages is zero, but I
think it wouldn't break anything. Then the above logic becomes

rel->pages = RelationGetNumberOfBlocks(relation);
if (relation->rd_rel->relpages > 0)
density = relation->rd_rel->reltuples / relation->rd_rel->relpages;
else if (relation->rd_rel->reltuples > 0) /* already a density */
density = relation->rd_rel->reltuples;
else
density = some_default_estimate;
rel->tuples = round(rel->pages * density);

A variant of this is to set reltuples = density, relpages = 1 instead
of 0, which makes the relpages value a lie but would be even less likely
to confuse client-side code.

Comments? Does this seem like something reasonable to do for 8.0?

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Zeugswetter Andreas DAZ SD 2004-11-27 00:14:50 Re: Stopgap solution for table-size-estimate updating problem
Previous Message Tom Lane 2004-11-26 21:19:18 Re: problem installing 8.0.0beta5 on OS X 10.3