Re: Possible TODO item? VACUUM on empty table

Lists: pgsql-hackers
From: Richard Huxton <dev(at)archonet(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Possible TODO item? VACUUM on empty table
Date: 2004-11-26 09:32:43
Message-ID: 41A6F83B.8010505@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Over on the support lists we've seen a couple of examples this week of
people doing a bulk import/delete/replace cycle with a vacuum analyse
just after the delete.

Instinctively, it's where you'd vacuum and I'm guessing the analyse just
gets typed at the same time from habit (I know I've done so more than
once). Of course, it flattens your stats and your import can then take
forever.

So - would it be worth, in the case of vacuum+analyse (but not analyse
alone), where the table is completely empty then just halve the existing
stats.

That way, if you're in the middle of a delete/replace cycle your stats
will still be in the realms of reasonableness. On the other hand, if the
table is genuinely empty and you're running a nightly vacuum the stats
will tend toward 0.

Against the idea:
1. It's a special-case in the code
2. There's a work-around (unless you're running pg_autovacuum I'd guess)
2. We perhaps want a more general table description (this is a
bulk-loading table, this is a log table, this is a mostly static lookup
table)

--
Richard Huxton
Archonet Ltd


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible TODO item? VACUUM on empty table
Date: 2004-11-26 10:52:02
Message-ID: 87k6s8x4q5.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Richard Huxton <dev(at)archonet(dot)com> writes:

> So - would it be worth, in the case of vacuum+analyse (but not analyse alone),
> where the table is completely empty then just halve the existing stats.
>
> That way, if you're in the middle of a delete/replace cycle your stats will
> still be in the realms of reasonableness. On the other hand, if the table is
> genuinely empty and you're running a nightly vacuum the stats will tend toward
> 0.
>
> Against the idea:
> 1. It's a special-case in the code

Hm. I suppose the generalization of this would be to always average the new
stats with the existing stats.

I suppose you could make the argument that if I run analyze and then run
analyze again without changing anything then I've effectively doubled the
sample size and the stats can take advantage of that.

But not all of the stats are simple totals. I'm not sure this would make sense
for the histogram arrays.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible TODO item? VACUUM on empty table
Date: 2004-11-26 18:48:34
Message-ID: 10380.1101494914@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> But not all of the stats are simple totals. I'm not sure this would make sense
> for the histogram arrays.

Yeah, I don't know how you "halve" a histogram.

But the problem is not with the pg_statistic stats, I think. It is
already true that ANALYZE punts without touching pg_statistic if it was
unable to find any live rows, and AFAIR it always has. So the problem
case of delete all/analyze doesn't clobber pg_statistic.

The issue is only with the relpages and reltuples counts in pg_class.
I already have a proposal on the table to get rid of these in favor of
a "tuples per page" entry, see eg
http://archives.postgresql.org/pgsql-general/2004-08/msg01422.php

An objection that I forgot to mention in that message is that any such
change would break autovacuum in its current form; although that issue
largely vanishes if we integrate autovacuum into the backend, and in any
case we could offer a built-in function to return the current number of
pages in a table.

regards, tom lane