Re: Heavily modified big table bloat even in auto vacuum is running

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Heavily modified big table bloat even in auto vacuum is running
Date: 2013-11-08 13:05:11
Message-ID: CAA4eK1KwrvP2TggtZaCBx9AuZ5e+4an0V5fC3zktarWMUfPkQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
<haribabu(dot)kommi(at)huawei(dot)com> wrote:
> On 07 November 2013 09:42 Amit Kapila wrote:
>> I am not sure whether the same calculation as done for new_rel_tuples
>> works for new_dead_tuples, you can once check it.
>
> I didn't find any way to calculate new_dead_tuples like new_rel_tuples.
> I will check it.
>
>> I am thinking that if we have to do estimation anyway, then wouldn't it
>> be better to do the way Tom had initially suggested (Maybe we could
>> have VACUUM copy the n_dead_tuples value as it exists when VACUUM
>> starts, and then send that as the value to subtract when it's done?)
>>
>> I think the reason you gave that due to tuple visibility check the
>> number of dead tuples calculated by above logic is not accurate is
>> right but still it will make the value of dead tuples more appropriate
>> than it's current value.
>>
>> You can check if there is a way to do estimation of dead tuples similar
>> to new tuples, and it will be as solid as current logic of
>> vac_estimate_reltuples(), then it's okay, otherwise use the other
>> solution (using the value of n_dead_tuples at start of Vacuum) to solve
>> the problem.
>
> The two approaches calculations are approximation values only.
>
> 1. Taking a copy of n_dead_tuples before VACUUM starts and then subtract it once it is done.
> This approach doesn't include the tuples which are remains during the vacuum operation.

Wouldn't next or future vacuum's will make the estimate more appropraite?

> 2. nkeep counter contains the tuples which are still visible to other transactions.
> This approach doesn't include tuples which are deleted on pages where vacuum operation is already finished.
>
> In my opinion the second approach gives the value nearer to the actual value,
> because it includes some of the new dead tuples also. Please correct me if anything wrong in my analysis.
I think main problem in nkeep logic is to come up with an
estimation algorithm similar to live tuples.

By the way, do you have test case or can you try to write a test case
which can show this problem and
then after fix, you can verify if the problem is resolved.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-11-08 14:00:09 Re: shared memory message queues
Previous Message Etsuro Fujita 2013-11-08 09:23:37 Improve code in tidbitmap.c