Re: Heavily modified big table bloat even in auto vacuum is running

From: Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Heavily modified big table bloat even in auto vacuum is running
Date: 2013-12-12 06:54:51
Message-ID: 8977CB36860C5843884E0A18D8747B0372BF34D5@szxeml558-mbs.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06 December 2013 11:57 Amit Kapila wrote:
> On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
> > On 29 November 2013 12:00 Amit Kapila wrote:
> >> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
> >> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
> >> Few questions about your latest patch:
> >> a. Is there any reason why you are doing estimation of dead tuples
> >> only for Autovacuum and not for Vacuum.
> >
> > No, changed.
> >
> >> /* clear and get the new stats for calculating proper dead tuples */
> >> pgstat_clear_snapshot(); tabentry =
> >> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
> >> b. In the above code, to get latest data you are first clearing
> >> snapshot and then calling pgstat function. It will inturn perform
> I/O
> >> (read of stats file) and send/receive message from stats
> >> collector to ensure it can read latest data. I think it will add
> overhead
> >> to Vacuum, especially if 'nkeep' calculated in function
> >> lazy_scan_heap() can serve the purpose. In my simple test[1], I
> >> observed
> >> that value of keep can serve the purpose.
> >>
> >> Can you please once try the test on 'nkeep' approach patch.
> >
> > Using the nkeep and snapshot approach, I ran the test for 40 mins
> with
> > a high analyze_threshold and results are below.
> >
> > Auto vacuum count Bloat size
> > Master 11 220MB
> > Patched_nkeep 14 215MB
> > Patched_snapshot 18 198MB
> >
> > Both the approaches are showing good improvement in the test.
> > Updated patches, test script and configuration is attached in the
> mail.
>
> I think updating dead tuple count using nkeep is good idea as similar
> thing is done for Analyze as well in acquire_sample_rows().
> One minor point, I think it is better to log dead tuples is below error
> message:
> ereport(LOG,
> (errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
> "pages: %d removed, %d remain\n"
> "tuples: %.0f removed, %.0f remain\n"
>
> "tuples: %.0f removed, %.0f remain, %.0f dead\n"
>
>
> About your test, how to collect the data by running this script, are
> you manually stopping it after 40 mins, because I ran it for more than
> an hour, the final result didn't came.
> As I mentioned you last time, please simplify your test, for other
> person in its current form, it is difficult to make meaning out of it.
> Write comments on top of it in steps form to explain what exactly it is
> doing and how to take data using it (for example, do I need to wait,
> till script ends; how long this test can take to complete).

A simplified test and updated patch by taking care the above comment are attached in the mail.
I am not able to reduce the test duration but changed as the test automatically exists after 45 mins run.
Please check vacuum_test.sh file more details for running the test.

Auto vacuum count Bloat size
Master 15 220MB
Patched_nkeep 18 213MB

Please let me know your suggestions.

Regards,
Hari babu.

Attachment Content-Type Size
vacuum_fix_v7_nkeep.patch application/octet-stream 5.0 KB
test_script.tar.gz application/x-gzip 8.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2013-12-12 07:02:41 Re: pgbench with large scale factor
Previous Message knizhnik 2013-12-12 06:53:00 Re: In-Memory Columnar Store