Re: Heavily modified big table bloat even in auto vacuum is running

From: Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Heavily modified big table bloat even in auto vacuum is running
Date: 2013-11-29 13:25:21
Message-ID: 8977CB36860C5843884E0A18D8747B0372BF0B5B@szxeml558-mbs.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 November 2013 12:00 Amit Kapila wrote:
> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
> > On 25 November 2013 10:43 Amit Kapila wrote:
> >> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
> >> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
> >> > On 19 November 2013 10:33 Amit Kapila wrote:
> >> >> If I understood correctly, then your patch's main intention is to
> >> >> correct the estimate of dead tuples, so that it can lead to
> Vacuum
> >> >> cleaning the table/index which otherwise is not happening as per
> >> >> configuration value (autovacuum_vacuum_threshold) in some of the
> >> >> cases, also it is not reducing the complete bloat (Unpatched -
> >> 1532MB
> >> >> ~Patched - 1474MB), as the main reason of bloat is extra space
> in
> >> >> index which can be reclaimed by reindex operation.
> >> >>
> >> >> So if above is correct then this patch has 3 advantages:
> >> >> a. Extra Vacuum on table/index due to better estimation of dead
> >> tuples.
> >> >> b. Space reclaim due to this extra vacuum c. may be some
> >> >> performance advantage as it will avoid the delay in cleaning dead
> >> >> tuples
> >> >>
> >> >> I think better way to test the patch is to see how much benefit
> is
> >> >> there due to above (a and b points) advantages. Different values
> >> >> of autovacuum_vacuum_threshold can be used to test.
> >> >
> >> >
> >> > The performance effect of the patch is not much visible as I think
> >> the
> >> > analyze on the table estimates the number of dead tuples of the
> >> > table
> >> with some estimation.
> >>
> >> Yes, that seems to be the reason why you are not seeing any
> >> performance benefit, but still I think this is useful optimization
> to
> >> do, as
> >> analyze updates both the livetuples and dead tuples and similarly
> >> vacuum should also update both the counts. Do you see any reason
> >> why Vacuum should only update live tuples and not deadtuples?
> >
> > As vacuum touches all the pages where the dead tuples are present.
> > This is not the Same with analyzer. Because of this reason, the
> analyzer estimates the dead tuples also.
> > With the proposed patch the vacuum also estimates the dead tuples.
>
> Few questions about your latest patch:
> a. Is there any reason why you are doing estimation of dead tuples only
> for Autovacuum and not for Vacuum.

No, changed.

> /* clear and get the new stats for calculating proper dead tuples */
> pgstat_clear_snapshot(); tabentry =
> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
> b. In the above code, to get latest data you are first clearing
> snapshot and then calling pgstat function. It will inturn perform I/O
> (read of stats file) and send/receive message from stats collector
> to ensure it can read latest data. I think it will add overhead
> to Vacuum, especially if 'nkeep' calculated in function
> lazy_scan_heap() can serve the purpose. In my simple test[1], I
> observed
> that value of keep can serve the purpose.
>
> Can you please once try the test on 'nkeep' approach patch.

Using the nkeep and snapshot approach, I ran the test for 40 mins with a
high analyze_threshold and results are below.

Auto vacuum count Bloat size
Master 11 220MB
Patched_nkeep 14 215MB
Patched_snapshot 18 198MB

Both the approaches are showing good improvement in the test.
Updated patches, test script and configuration is attached in the mail.

Regards,
Hari babu.

Attachment Content-Type Size
vacuum_fix_v6_nkeep.patch application/octet-stream 3.6 KB
test_script.tar.gz application/x-gzip 8.3 KB
vacuum_fix_v6_snapshot.patch application/octet-stream 3.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajeev rastogi 2013-11-29 13:50:27 Re: COPY table FROM STDIN doesn't show count tag
Previous Message Peter Eisentraut 2013-11-29 13:15:52 commit fest 2013-11 week 2 report