Partial vacuum versus pg_class.reltuples

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Partial vacuum versus pg_class.reltuples
Date: 2009-06-06 19:44:50
Message-ID: 14616.1244317490@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I complained a couple days ago that in HEAD, vacuum is putting
very bogus values into pg_class.reltuples for indexes:
http://archives.postgresql.org/pgsql-bugs/2009-06/msg00037.php

After looking through the code a bit, I've confirmed my prior guess that
this is caused by the partial-vacuum patch. The number that's being
used is the number of live tuples found on the pages that were actually
scanned. There's a kluge in there to keep from storing this number as
the relation's own reltuples, but the implications for indexes were not
considered. The index AMs generally assume that what they're told in
IndexVacuumInfo.num_heap_tuples can be trusted, and that's not true
anymore.

I think probably the only workable solution for 8.4 is to use the prior
value of the relation's reltuples field as num_heap_tuples when we have
not scanned the whole heap. This will effectively mean that index
reltuples values don't change either in a partial vacuum.

In the longer term, we need to do something else. As the code now
stands, reltuples is close to being completely untrustworthy: it will
get updated only by VACUUM FULL or anti-wraparound vacuums, neither of
which will happen often in a well-run installation. So the value will
inevitably diverge from reality, perhaps arbitrarily far from reality.
I wonder whether we shouldn't get rid of it altogether (and relpages
too), and make the planner look to the counts maintained by the stats
collector instead of using reltuples/relpages. The main objection I can
think of to that is that turning off stats collection will no longer be
a viable option ... but how many people do that anyway?

Another interesting question is why successive vacuums aren't causing
the index reltuples counts to go to zero. Shouldn't a partial vacuum
result in *all* pages of the relation being marked as not needing to
be examined by the next vacuum?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2009-06-06 20:05:30 Re: PostgreSQL Developer meeting minutes up
Previous Message Joe Conway 2009-06-06 19:23:31 Re: [Fwd: Re: dblink patches for comment]