Re: Heavily modified big table bloat even in auto vacuum is running

From: Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Heavily modified big table bloat even in auto vacuum is running
Date: 2013-11-26 13:56:48
Message-ID: 8977CB36860C5843884E0A18D8747B0372BEFFE0@szxeml558-mbs.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25 November 2013 10:43 Amit Kapila wrote:
> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
> > On 19 November 2013 10:33 Amit Kapila wrote:
> >> If I understood correctly, then your patch's main intention is to
> >> correct the estimate of dead tuples, so that it can lead to Vacuum
> >> cleaning the table/index which otherwise is not happening as per
> >> configuration value (autovacuum_vacuum_threshold) in some of the
> >> cases, also it is not reducing the complete bloat (Unpatched -
> 1532MB
> >> ~Patched - 1474MB), as the main reason of bloat is extra space in
> >> index which can be reclaimed by reindex operation.
> >>
> >> So if above is correct then this patch has 3 advantages:
> >> a. Extra Vacuum on table/index due to better estimation of dead
> tuples.
> >> b. Space reclaim due to this extra vacuum c. may be some performance
> >> advantage as it will avoid the delay in cleaning dead tuples
> >>
> >> I think better way to test the patch is to see how much benefit is
> >> there due to above (a and b points) advantages. Different values of
> >> autovacuum_vacuum_threshold can be used to test.
> >
> >
> > The performance effect of the patch is not much visible as I think
> the
> > analyze on the table estimates the number of dead tuples of the table
> with some estimation.
>
> Yes, that seems to be the reason why you are not seeing any
> performance benefit, but still I think this is useful optimization to
> do, as
> analyze updates both the livetuples and dead tuples and similarly
> vacuum should also update both the counts. Do you see any reason
> why Vacuum should only update live tuples and not deadtuples?

As vacuum touches all the pages where the dead tuples are present. This is not the
Same with analyzer. Because of this reason, the analyzer estimates the dead tuples also.
With the proposed patch the vacuum also estimates the dead tuples.

> > Because of this reason not much performance improvement is not
> visible
> > as the missed dead tuple calculation in vacuum is covered by the
> analyze.
>
> Yeah, so might be we can check once by configuring
> analyze_threshold/scalefactor in a way that analyze doesn't get trigger
> during your test.

I ran the test for one hour with a high analyze_threshold and results are below.

Auto vacuum count Bloat size
Master 15 155MB
Patched 23 134MB

Updated test script and configuration is attached in the mail.

Regards,
Hari babu.

Attachment Content-Type Size
test_script.tar.gz application/x-gzip 8.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sawada Masahiko 2013-11-26 15:08:23 Re: Logging WAL when updating hintbit
Previous Message Marko Tiikkaja 2013-11-26 13:43:14 Re: psql shows line number