Estimation error in n_dead_tuples

Lists: pgsql-hackers
From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Estimation error in n_dead_tuples
Date: 2007-02-01 10:43:45
Message-ID: 20070201184722.5AD4.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

AFAICS, the statistics information "the number of dead tuples"
(n_dead_tuples) has an estimation error.

VACUUM sends a message to stats collector process when it has swept a table.
The stats collector receives the message and sets n_dead_tuples of the table
to zero. However, we can update or delete tuples even if a concurrent vacuum
is running through the table. There might be some dead tuples that were
created after start of the vacuum. Therefore, it's not always correct to set
n_dead_tuples to zero at the end of vacuum. Especially, the error will be
worse when a vacuum takes long time.

The autovacuum sees the stats information to decide when to vacuum.
The error in n_dead_tuples misleads it and vacuum starvation might occur
because we always underestimate the number of dead tuples.

I'm thinking to add "the number of vacuumed tuples" to the message from
vacuum. The stats collector will subtract the value from n_dead_tuples
instead of setting it to zero. This is also needed if we want to make
some kinds of "partial" vacuum methods.

Thoughts? Is this worth doing, I'd like to implement it.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Estimation error in n_dead_tuples
Date: 2007-02-01 16:57:31
Message-ID: 19286.1170349051@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> I'm thinking to add "the number of vacuumed tuples" to the message from
> vacuum. The stats collector will subtract the value from n_dead_tuples
> instead of setting it to zero. This is also needed if we want to make
> some kinds of "partial" vacuum methods.

This seems awfully dangerous to me, because then you are operating on
dead reckoning forever: there will be nothing that can correct an
inaccurate rowcount estimate, and in practice that means it will diverge
arbitrarily far from reality :-(, because of the inherent inaccuracies
of the stats system. I think the risk of that is far worse than the
relatively small (or at least bounded) error arising from tuples not
seen by vacuum.

regards, tom lane


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Estimation error in n_dead_tuples
Date: 2007-02-02 02:47:50
Message-ID: 20070202105422.59FD.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> > I'm thinking to add "the number of vacuumed tuples" to the message from
> > vacuum. The stats collector will subtract the value from n_dead_tuples
> > instead of setting it to zero.

> This seems awfully dangerous to me, because then you are operating on
> dead reckoning forever

Ah, I see. We need 'reset' of statistics sometimes.

Now I'll propose to save the n_dead_tuples value at the beginning of
vacuum, and send the value with a stat message to subtract. The added
number to n_dead_tuples during vacuum will be left.
(This is true for now, but will be incorrect after "Recalculating
OldestXmin in a long-running vacuum" patch is commited.)

I'll send a patch that adds the above to patches.

> > This is also needed if we want to make
> > some kinds of "partial" vacuum methods.

We can adjust the number of vacuumed tuples for patial vacuums
that scan a part of heap pages. For example,
n_dead_duples_at_start * scanned_pages / rel_pages .
We can also use the actually removed tuples in vacuum for adjustments.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Jim Nasby <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Estimation error in n_dead_tuples
Date: 2007-02-02 06:03:09
Message-ID: 017466B3-56C8-4B9F-97BC-48934B8C6361@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Feb 1, 2007, at 10:57 AM, Tom Lane wrote:
> ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
>> I'm thinking to add "the number of vacuumed tuples" to the message
>> from
>> vacuum. The stats collector will subtract the value from
>> n_dead_tuples
>> instead of setting it to zero. This is also needed if we want to make
>> some kinds of "partial" vacuum methods.
>
> This seems awfully dangerous to me, because then you are operating on
> dead reckoning forever: there will be nothing that can correct an
> inaccurate rowcount estimate, and in practice that means it will
> diverge
> arbitrarily far from reality :-(, because of the inherent inaccuracies
> of the stats system. I think the risk of that is far worse than the
> relatively small (or at least bounded) error arising from tuples not
> seen by vacuum.

Yeah, it'd be better for vacuum to send a message stating how many
dead rows it couldn't remove, ala:

DETAIL: 0 dead row versions cannot be removed yet.

Granted, not perfect, but better than what we have now.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)