Re: Heavily modified big table bloat even in auto vacuum is running

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Heavily modified big table bloat even in auto vacuum is running
Date: 2013-12-06 06:26:52
Message-ID: CAA4eK1KpnqQb4R7BMiVHEyjivbch8wzniYSDQ-YAAFBi77Km6A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
<haribabu(dot)kommi(at)huawei(dot)com> wrote:
> On 29 November 2013 12:00 Amit Kapila wrote:
>> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
>> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
>> Few questions about your latest patch:
>> a. Is there any reason why you are doing estimation of dead tuples only
>> for Autovacuum and not for Vacuum.
>
> No, changed.
>
>> /* clear and get the new stats for calculating proper dead tuples */
>> pgstat_clear_snapshot(); tabentry =
>> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
>> b. In the above code, to get latest data you are first clearing
>> snapshot and then calling pgstat function. It will inturn perform I/O
>> (read of stats file) and send/receive message from stats collector
>> to ensure it can read latest data. I think it will add overhead
>> to Vacuum, especially if 'nkeep' calculated in function
>> lazy_scan_heap() can serve the purpose. In my simple test[1], I
>> observed
>> that value of keep can serve the purpose.
>>
>> Can you please once try the test on 'nkeep' approach patch.
>
> Using the nkeep and snapshot approach, I ran the test for 40 mins with a
> high analyze_threshold and results are below.
>
>                    Auto vacuum count   Bloat size
> Master                     11             220MB
> Patched_nkeep              14             215MB
> Patched_snapshot           18             198MB
>
> Both the approaches are showing good improvement in the test.
> Updated patches, test script and configuration is attached in the mail.

I think updating the dead tuple count using nkeep is a good idea, as a
similar thing is done for Analyze in acquire_sample_rows().
One minor point: I think it is better to log dead tuples in the below log message:
ereport(LOG,
        (errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
                "pages: %d removed, %d remain\n"
                "tuples: %.0f removed, %.0f remain\n"

i.e. change the tuples line to:

                "tuples: %.0f removed, %.0f remain, %.0f dead\n"
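For context on why an accurate dead-tuple count matters here: autovacuum
triggers a table when n_dead_tup exceeds autovacuum_vacuum_threshold +
autovacuum_vacuum_scale_factor * reltuples, so if vacuum resets the
collector's counter without accounting for the nkeep tuples it could not
remove, the next vacuum is delayed and bloat grows. A rough illustrative
sketch of that check (not PostgreSQL code; the numbers are hypothetical):

```python
# Sketch of the autovacuum trigger formula from the PostgreSQL docs:
#   vacuum when n_dead_tup > autovacuum_vacuum_threshold
#                            + autovacuum_vacuum_scale_factor * reltuples
# Table sizes and counts below are made up for illustration.

def needs_autovacuum(n_dead_tup, reltuples,
                     vacuum_threshold=50, scale_factor=0.2):
    """Return True when the dead-tuple count exceeds the trigger point."""
    return n_dead_tup > vacuum_threshold + scale_factor * reltuples

reltuples = 1_000_000        # planner's row estimate for the table
actual_dead = 300_000        # dead tuples really present in the heap

# If vacuum resets the stats counter to zero while 'nkeep' tuples were
# kept (not removable), the collector underestimates n_dead_tup and the
# next autovacuum is postponed even though bloat keeps growing.
stale_count = 0              # counter after a reset that ignores nkeep

print(needs_autovacuum(actual_dead, reltuples))  # True  -> would vacuum
print(needs_autovacuum(stale_count, reltuples))  # False -> vacuum delayed
```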

About your test: how do I collect the data from this script? Do you stop
it manually after 40 minutes? I ran it for more than an hour and it never
produced a final result.
As I mentioned last time, please simplify your test; in its current form
it is difficult for another person to make sense of it. Write comments on
top of it, in steps, explaining what exactly it does and how to collect
data with it (for example, whether I need to wait until the script ends,
and how long the test takes to complete).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
