Re: tuning autovacuum

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuning autovacuum
Date: 2011-06-09 15:53:22
Message-ID: 4DF0EC72.5000003@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> Well, if there are more tables that need vacuuming than there are
> workers available at any given time, there will be a delay. We
> probably don't keep track of that delay at present, but we could.
>

There are at least four interesting numbers to collect each time
autovacuum runs:

1) This one, when was the threshold crossed. I believe one of the AV
workers would have to pause periodically to update these if they're all
busy doing work.
2) What time did the last autovacuum start at
3) How many dead rows were there at the point when it started
4) When did the last autovacuum end (currently the only value stored)

There may be a 5th piece of state I haven't looked at yet worth
exposing/saving, something related to how much work was skipped by the
partial vacuum logic introduced in 8.4. I haven't looked at that code
enough to know which is the right metric to measure its effectiveness
by, but I have tis gut feel it's eventually going to be critical for
distinguishing between the various common types of vacuum-heavy
workloads that show up.

All of these need to be stored in a system table/view, so that an admin
can run a query to answer questions like:

-What is AV doing right now?
-How far behind is AV on tables it needs to clean but hasn't even
started on?
-How long is the average AV taking on my big tables?
-As I change the AV parameters, what does it do to the runtimes against
my big tables?

As someone who is found by a lot of people whose problems revolve around
databases with heavy writes or update churn, limitations in the current
state of tracking what autovacuum does have been moving way up my
priority list the last year. I now have someone who is always running
autovacuum on the same table, 24x7. It finishes every two days, and
when it does the 20% threshold is already crossed for it to start
again. The "wait until a worker was available" problem isn't there, but
I need a good wasy to track all of the other three things to have a hope
of improving their situation. Right now getting the data I could use
takes parsing log file output and periodic dumps of pg_stat_user_tables,
then stitching the whole mess together.

You can't run a heavily updated database in the TB+ range and make sense
of what autovacuum is doing without a large effort matching output from
log_autovacuum_min_duration and the stats that are visible in
pg_stat_user_tables. It must get easier than that to support the sort
of bigger tables it's possible to build now. And if this data starts
getting tracked, we can start to move toward AV parameters that are
actually aiming at real-world units, too.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-06-09 15:54:19 Re: Core Extensions relocation
Previous Message Greg Smith 2011-06-09 15:51:01 Re: Core Extensions relocation