Re: visibility map and reltuples

Lists: pgsql-hackers
From: "Ned T(dot) Crigler" <crigler(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: visibility map and reltuples
Date: 2008-12-13 22:57:44
Message-ID: 20081213225744.GA2268@xavtug.hell-city.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It appears that the visibility map patch is causing pg_class.reltuples to be
set improperly after a vacuum. For example, it is set to 0 if the map
indicated that no pages in the heap needed to be scanned.

Perhaps reltuples should not be updated unless every page was scanned during
the vacuum?

--
Ned T. Crigler


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: "Ned T(dot) Crigler" <crigler(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: visibility map and reltuples
Date: 2008-12-14 11:15:16
Message-ID: 4944EAC4.4070002@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ned T. Crigler wrote:
> It appears that the visibility map patch is causing pg_class.reltuples to be
> set improperly after a vacuum. For example, it is set to 0 if the map
> indicated that no pages in the heap needed to be scanned.
>
> Perhaps reltuples should not be updated unless every page was scanned during
> the vacuum?

Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all pages.

The interplay of vacuum and analyze in VACUUM ANALYZE needs to be
changed too. Currently, the analyze after vacuum doesn't overwrite
reltuples, because the one calculated by vacuum is based on scanning all
pages, and is thus more accurate than the one estimated from the sample
(which is not true anymore, as you pointed out). I think the vacuum
needs to somehow tell analyze whether it updated reltuples or not, so
that analyze can update reltuples if the vacuum didn't scan all pages.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: "Ned T(dot) Crigler" <crigler(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: visibility map and reltuples
Date: 2008-12-15 09:01:28
Message-ID: 49461CE8.4050007@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Ned T. Crigler wrote:
>> It appears that the visibility map patch is causing pg_class.reltuples
>> to be
>> set improperly after a vacuum. For example, it is set to 0 if the map
>> indicated that no pages in the heap needed to be scanned.
>>
>> Perhaps reltuples should not be updated unless every page was scanned
>> during
>> the vacuum?
>
> Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all pages.

Because we use reltuples divided by relpages in the planner, we probably
shouldn't update relpages either if we don't update reltuples.
Otherwise, if the table has grown a lot since we last updated reltuples,
the reltuples / relpages ratio would be less, not more, accurate, if
relpages is updated to a new higher value but reltuples is not.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Ned T(dot) Crigler" <crigler(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: visibility map and reltuples
Date: 2008-12-15 13:13:45
Message-ID: CE7582AD-B5EF-4C28-A48F-0450472D0C76@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wonder if we should switch to keeping reltuplesperpage instead. Then
a partial vacuum could update it by taking the average number of
tuples per page forbthe pages it saw. Perhaps adjusting it to the
weights average between the old value and the new value based on how
many pages were seen.

I suppose there's no reason we can't update reltuples using that same
logic though it would be a big opaque.

--
Greg

On 15 Dec 2008, at 04:01, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com
> wrote:

> Heikki Linnakangas wrote:
>> Ned T. Crigler wrote:
>>> It appears that the visibility map patch is causing
>>> pg_class.reltuples to be
>>> set improperly after a vacuum. For example, it is set to 0 if the
>>> map
>>> indicated that no pages in the heap needed to be scanned.
>>>
>>> Perhaps reltuples should not be updated unless every page was
>>> scanned during
>>> the vacuum?
>> Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all
>> pages.
>
> Because we use reltuples divided by relpages in the planner, we
> probably shouldn't update relpages either if we don't update
> reltuples. Otherwise, if the table has grown a lot since we last
> updated reltuples, the reltuples / relpages ratio would be less, not
> more, accurate, if relpages is updated to a new higher value but
> reltuples is not.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
Cc: "Ned T(dot) Crigler" <crigler(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: visibility map and reltuples
Date: 2008-12-15 13:24:10
Message-ID: 49465A7A.2070103@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> I wonder if we should switch to keeping reltuplesperpage instead. Then a
> partial vacuum could update it by taking the average number of tuples
> per page forbthe pages it saw. Perhaps adjusting it to the weights
> average between the old value and the new value based on how many pages
> were seen.

The pages scanned by a partial vacuum isn't a random sample of pages in
the table. That would bias the reltuplesperpage value towards those
pages that are updated more.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, "Ned T(dot) Crigler" <crigler(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: visibility map and reltuples
Date: 2008-12-15 15:55:50
Message-ID: 1223.1229356550@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Greg Stark wrote:
>> I wonder if we should switch to keeping reltuplesperpage instead. Then a
>> partial vacuum could update it by taking the average number of tuples
>> per page forbthe pages it saw. Perhaps adjusting it to the weights
>> average between the old value and the new value based on how many pages
>> were seen.

> The pages scanned by a partial vacuum isn't a random sample of pages in
> the table. That would bias the reltuplesperpage value towards those
> pages that are updated more.

Yeah ... and it's highly likely that repeatedly-updated pages would have
more dead space than never-updated ones, so there'd be a systematic
creep towards underestimation of the total tuple count.

I think your previous sketch is right: suppress update of reltuples (and
relpages) from a partial vacuum scan, and ensure that the analyze phase
is allowed to do it instead if it happens during VACUUM ANALYZE.

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Ned T(dot) Crigler" <crigler(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: visibility map and reltuples
Date: 2008-12-15 17:17:14
Message-ID: Pine.GSO.4.64.0812151207180.7153@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 15 Dec 2008, Greg Stark wrote:

> I wonder if we should switch to keeping reltuplesperpage instead.

It would be preferrable to not touch the user side of reltuples if
possible, since it's the only instant way to get a good estimate of the
number of rows in a table right now. That's been a regular application
technique for at least two years now, since
http://www.varlena.com/GeneralBits/120.php popularized it.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: "Ned T(dot) Crigler" <crigler(at)users(dot)sourceforge(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: visibility map and reltuples
Date: 2008-12-17 09:15:56
Message-ID: 4948C34C.1000209@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I think your previous sketch is right: suppress update of reltuples (and
> relpages) from a partial vacuum scan, and ensure that the analyze phase
> is allowed to do it instead if it happens during VACUUM ANALYZE.

We also mustn't reset n_live_tuples in pgstat in partial vacuum.
Committed a patch to do that.

Thanks for the report!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com