Re: [HACKERS] Autovacuum Improvements

From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: [HACKERS] Autovacuum Improvements
Date: 2007-01-22 23:39:10
Message-ID: 46EC7F08-54DE-49AD-A546-CB2824E8092D@blighty.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


On Jan 22, 2007, at 11:16 AM, Richard Huxton wrote:

> Bruce Momjian wrote:
>> Yep, agreed on the random I/O issue. The larger question is if
>> you have
>> a huge table, do you care to reclaim 3% of the table size, rather
>> than
>> just vacuum it when it gets to 10% dirty? I realize the vacuum is
>> going
>> to take a lot of time, but vacuuming to relaim 3% three times
>> seems like
>> it is going to be more expensive than just vacuuming the 10%
>> once. And
>> vacuuming to reclaim 1% ten times seems even more expensive. The
>> partial vacuum idea is starting to look like a loser to me again.
>
> Buying a house with a 25-year mortgage is much more expensive than
> just paying cash too, but you don't always have a choice.
>
> Surely the key benefit of the partial vacuuming thing is that you
> can at least do something useful with a large table if a full
> vacuum takes 24 hours and you only have 4 hours of idle I/O.
>
> It's also occurred to me that all the discussion of scheduling way
> back when isn't directly addressing the issue. What most people
> want (I'm guessing) is to vacuum *when the user-workload allows*
> and the time-tabling is just a sysadmin first-approximation at that.

Yup. I'd really like for my app to be able to say "Hmm. No
interactive users at the moment, no critical background tasks. Now
would be a really good time for the DB to do some maintenance." but
also to be able to interrupt the maintenance process if some new
users or other system load show up.

> With partial vacuuming possible, we can arrange things with just
> three thresholds and two measurements:
> Measurement 1 = system workload
> Measurement 2 = a per-table "requires vacuuming" value
> Threshold 1 = workload at which we do more vacuuming
> Threshold 2 = workload at which we do less vacuuming
> Threshold 3 = point at which a table is considered worth vacuuming.
> Once every 10 seconds, the manager compares the current workload to
> the thresholds and starts a new vacuum, kills one or does nothing.
> New vacuum processes keep getting started as long as there is
> workload spare and tables that need vacuuming.
>
> Now the trick of course is how you measure system workload in a
> meaningful manner.

I'd settle for a "start maintenance", "stop maintenance" API.
Anything else (for instance the heuristics you suggest above) would
definitely be gravy.

It's not going to be simple to do, though, I don't think.

Cheers,
Steve

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Paul Lambert 2007-01-22 23:51:44 Re: Installing Postegres side-by-side with M$ SQL server]]
Previous Message George Weaver 2007-01-22 23:32:13 Problems compiling from source

Browse pgsql-hackers by date

  From Date Subject
Next Message elein 2007-01-23 00:05:49 Re: -f <output file> option for pg_dumpall
Previous Message Jim C. Nasby 2007-01-22 23:30:21 Re: autovacuum process handling