Re: Vacuum, analyze, and setting reltuples of pg_class

From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, analyze, and setting reltuples of pg_class
Date: 2006-12-14 10:39:58
Message-ID: E1539E0ED7043848906A8FF995BDA5790198EFE5@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> >>> Short version: is it optimal for vacuum to always populate
reltuples
> >>> with live rows + dead rows?
> >>
> >> If we didn't do that, it would tend to encourage the use of
seqscans on
> >> tables with lots of dead rows, which is probably a bad thing.
>
> > So then why does vacuum do that? ISTM that it makes more sense for
it to
> > act the same as analyze and only count live rows.
>
> I think what you misread what I said: it's better to have the larger
> count in reltuples so that the planner won't try to use a seqscan when
> there are, say, 3 live tuples and 100K dead ones.

I don't agree. The metric to avoid scans should be/is table size.
(number of pages needed to be read for expected number of rows)
The number of tuples is relevant to estimate call frequency of
related nodes. So from that perspective we do not want dead tuples
in the count.
Maybe we need to improve the estimate in the large table few live
tuples case, but I think we should adjust vacuum and not analyze.

If you have a join with the said table with 3 rows and join it
to a same size but lots of visible tuples table, you would want to
start with the table with 3 rows.

Andreas

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Yoshiyuki Asaba 2006-12-14 11:13:34 pg_restore fails with a custom backup file
Previous Message Teodor Sigaev 2006-12-14 10:32:21 Re: Operator class group proposal