Re: [HACKERS] Autovacuum Improvements

From: Richard Huxton <dev(at)archonet(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, Pavan Deolasee <pavan(at)enterprisedb(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Autovacuum Improvements
Date: 2007-01-22 19:16:35
Message-ID: 45B50D93.3060509@archonet.com
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Yep, agreed on the random I/O issue. The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty? I realize the vacuum is going
> to take a lot of time, but vacuuming to reclaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once. And
> vacuuming to reclaim 1% ten times seems even more expensive. The
> partial vacuum idea is starting to look like a loser to me again.

Buying a house with a 25-year mortgage is much more expensive than just
paying cash too, but you don't always have a choice.
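Bruce's comparison rests on a simple cost model: if every vacuum pass reads the whole table, I/O cost grows with the number of passes rather than with the amount of space each pass reclaims. A minimal Python sketch of that model, assuming whole-table scans and ignoring the random-vs-sequential I/O difference; the function name and the page counts are illustrative only, not anything from PostgreSQL:

```python
def full_vacuum_pages_read(table_pages: int, dead_pct_total: int, trigger_pct: int) -> int:
    """Heap pages read to clean up dead_pct_total percent of a table when a
    whole-table vacuum runs each time trigger_pct percent has become dead.

    Assumption: every pass scans the entire table, so the cost depends on the
    number of passes, not on how much each pass reclaims.
    """
    passes = -(-dead_pct_total // trigger_pct)  # ceiling division on integers
    return passes * table_pages


if __name__ == "__main__":
    pages = 1_000_000  # hypothetical table size in heap pages
    for trigger in (10, 3, 1):
        print(f"vacuum at {trigger}% dead:",
              full_vacuum_pages_read(pages, 30, trigger), "pages read")
```

Under this model, cleaning up 30% dead space means 3, 10 or 30 full-table reads for 10%, 3% or 1% triggers. That total is what the mortgage analogy concedes; the point of partial vacuuming is when that I/O can be spent.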

Surely the key benefit of the partial vacuuming thing is that you can at
least do something useful with a large table if a full vacuum takes 24
hours and you only have 4 hours of idle I/O.
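The window arithmetic can be made concrete. A minimal sketch, assuming a hypothetical partial vacuum that can be restricted to a range of heap pages and a constant scan rate: a table whose full vacuum takes 24 hours splits into six slices that each fit a 4-hour idle window. `plan_partial_vacuums` and the figures are illustrative, not a PostgreSQL command or API.

```python
def plan_partial_vacuums(total_pages: int, pages_per_hour: int,
                         window_hours: int) -> list[tuple[int, int]]:
    """Split a table into [start, end) page ranges that each fit one idle window.

    Hypothetical helper: assumes a constant scan rate of pages_per_hour,
    which real I/O will not deliver.
    """
    chunk = pages_per_hour * window_hours  # pages one idle window can cover
    if chunk <= 0:
        raise ValueError("pages_per_hour and window_hours must be positive")
    return [(start, min(start + chunk, total_pages))
            for start in range(0, total_pages, chunk)]


if __name__ == "__main__":
    # 2.4M pages at 100k pages/hour is a 24-hour full vacuum.
    slices = plan_partial_vacuums(total_pages=2_400_000, pages_per_hour=100_000,
                                  window_hours=4)
    print(len(slices), "idle windows:", slices[0], "...", slices[-1])
```

Each slice does useful work on its own, so a night with only one free window still makes progress instead of starting a 24-hour scan it cannot finish.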

It's also occurred to me that all the discussion of scheduling way back
when isn't directly addressing the issue. What most people want (I'm
guessing) is to vacuum *when the user-workload allows* and the
time-tabling is just a sysadmin first-approximation at that.

With partial vacuuming possible, we can arrange things with just three
thresholds and two measurements:
Measurement 1 = system workload
Measurement 2 = a per-table "requires vacuuming" value
Threshold 1 = workload at which we do more vacuuming
Threshold 2 = workload at which we do less vacuuming
Threshold 3 = point at which a table is considered worth vacuuming.

Once every 10 seconds, the manager compares the current workload to the
thresholds and starts a new vacuum, kills one, or does nothing. New
vacuum processes keep getting started as long as there is spare workload
capacity and there are tables that need vacuuming.
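The manager described above can be sketched as a per-tick decision function. This is a minimal illustration, assuming workload and the per-table "requires vacuuming" score are both normalized numbers (the proposal leaves how to measure them open). `Thresholds`, `decide`, `simulate` and the table names are hypothetical, and the simulation ignores vacuums finishing on their own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    START = "start a new vacuum"
    KILL = "kill one vacuum"
    NOTHING = "do nothing"


@dataclass
class Thresholds:
    more_vacuum_below: float      # Threshold 1: workload at which we do more vacuuming
    less_vacuum_above: float      # Threshold 2: workload at which we do less vacuuming
    table_worth_vacuuming: float  # Threshold 3: per-table score worth vacuuming


def decide(workload: float, needs: dict[str, float], running: set[str],
           t: Thresholds) -> tuple[Action, Optional[str]]:
    """One manager tick.

    workload is Measurement 1; needs maps table name -> Measurement 2;
    running holds the tables that currently have a vacuum process.
    """
    if workload > t.less_vacuum_above and running:
        # Too busy: stop the vacuum on the table that least needs it.
        return Action.KILL, min(running, key=lambda name: needs.get(name, 0.0))
    if workload < t.more_vacuum_below:
        candidates = [name for name, score in needs.items()
                      if score >= t.table_worth_vacuuming and name not in running]
        if candidates:
            # Spare capacity: start on the table that most needs vacuuming.
            return Action.START, max(candidates, key=lambda name: needs[name])
    return Action.NOTHING, None


def simulate(workloads: list[float], needs: dict[str, float],
             t: Thresholds) -> list[tuple[Action, Optional[str]]]:
    """Replay one workload sample per tick (every 10 seconds in the proposal).

    Simplification: vacuums never finish on their own and needs never change.
    """
    running: set[str] = set()
    log = []
    for workload in workloads:
        action, table = decide(workload, needs, running, t)
        if action is Action.START:
            running.add(table)
        elif action is Action.KILL:
            running.discard(table)
        log.append((action, table))
    return log


if __name__ == "__main__":
    t = Thresholds(more_vacuum_below=0.3, less_vacuum_above=0.7,
                   table_worth_vacuuming=0.2)
    needs = {"orders": 0.9, "items": 0.5, "log": 0.1}
    for action, table in simulate([0.1, 0.1, 0.1, 0.5, 0.9, 0.9, 0.9], needs, t):
        print(action.value, table or "")
```

Keeping Threshold 1 below Threshold 2 leaves a band where the manager does nothing, which should stop it from starting and killing vacuums on alternate ticks when the workload hovers near a single cut-off.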

Now the trick of course is how you measure system workload in a
meaningful manner.

--
Richard Huxton
Archonet Ltd
