Re: Log levels for checkpoint/bgwriter monitoring

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Log levels for checkpoint/bgwriter monitoring
Date: 2007-03-13 02:42:04
Message-ID: Pine.GSO.4.64.0703121958540.27778@westnet.com
Lists: pgsql-hackers

On Fri, 9 Mar 2007, Tom Lane wrote:

> It strikes me that the patch would be more useful if it produced a
> histogram of the observed usage_counts

I don't have something worth releasing yet, but I did code a first rev of
this today. The results are quite instructive and well worth looking at.

The main server I work on has shared_buffers=60000 for pgbench testing,
with the background writers turned way up (in hopes of reducing checkpoint
times; that's a whole other topic). I run pgbench with s=100 (~1.6GB
database). Here's what I get as statistics on the buffer pool after a
scan when the server is "happy", from a run with 20 clients:

writes=38.3MB (8.2%) pinned+used=38.3MB (8.2%)
dirty buffer usage count histogram:
0=0.1% 1=0.3% 2=26% 3=17% 4=21% 5+=36%

Basically, everything that's left dirty is also heavily used; as I noted
before, when I inspect with pg_buffercache these are mostly index blocks.
I should probably also generate a second histogram covering the clean
buffers for comparison. The all scan is writing pages out as soon as they
are dirtied, so the LRU background writer is lucky if it can find anything
to do. (Note that I don't rely on the LRU writer more because that causes
a significant lull in writes after a checkpoint. By the time it gets going
again it's harder to catch up, and delaying PostgreSQL writes also
aggravates issues with the way Linux caches writes.)
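
For reference, here's a rough sketch of how a histogram like the one above
can be tallied while scanning the buffer pool. The field and macro names
(BufferDescriptors, NBuffers, BM_DIRTY, usage_count, LockBufHdr) follow
the current buffer manager headers, but the function itself is only an
illustration of the idea, not the actual patch:

#include "postgres.h"
#include "storage/buf_internals.h"
#include "storage/bufmgr.h"

/*
 * Illustrative only: walk the buffer pool and bucket the dirty buffers
 * by usage_count, capping the last bucket at "5 or more".  The real
 * output converts these counts into percentages of the dirty total.
 */
#define USAGE_BUCKETS 6                         /* 0,1,2,3,4, and 5+ */

static void
log_dirty_usage_histogram(void)
{
    int         histogram[USAGE_BUCKETS] = {0};
    int         dirty_total = 0;
    int         i;

    for (i = 0; i < NBuffers; i++)
    {
        volatile BufferDesc *bufHdr = &BufferDescriptors[i];

        LockBufHdr(bufHdr);
        if (bufHdr->flags & BM_DIRTY)
        {
            int         bucket = Min(bufHdr->usage_count, USAGE_BUCKETS - 1);

            histogram[bucket]++;
            dirty_total++;
        }
        UnlockBufHdr(bufHdr);
    }

    if (dirty_total > 0)
        elog(DEBUG1,
             "dirty buffer usage count histogram: 0=%d 1=%d 2=%d 3=%d 4=%d 5+=%d",
             histogram[0], histogram[1], histogram[2],
             histogram[3], histogram[4], histogram[5]);
}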

What I found really interesting was comparing that against a "sad" section
where performance breaks down. This is from a minute later:

writes=441.6MB (94.2%) pinned+used=356.2MB (76.0%)
dirty buffer usage count histogram:
0=18.7% 1=26.4% 2=31% 3=11% 4=9% 5+=4%

Note how the whole buffer distribution has shifted toward lower usage
counts. The breakdown seems to involve evicting the index blocks to make
room for the recently dirty data when it can't be written out fast enough.
As the index blocks go poof, things slow further, and into the vicious
circle you go. Eventually you can end up blocked on a combination of
buffer evictions and disk seeks for uncached data that are fighting with
the writes.

The bgwriter change this suggested to me is defining a triage behavior
where the all scan switches to acting like the LRU one:

-Each sample period, note what % of the usage_count=0 records are dirty
-If that number is above a tunable threshold, switch to only writing
usage_count=0 records until it drops back below the threshold (a rough
sketch of this rule follows).
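
Here's a minimal sketch of that rule; the names bgwriter_triage_threshold
and all_scan_in_triage are invented for illustration and aren't from any
existing patch:

#include <stdbool.h>

static double bgwriter_triage_threshold = 0.05;     /* first guess: 2-5% */
static bool all_scan_in_triage = false;

/*
 * Called once per sample period with counts taken from the buffer pool:
 * the number of usage_count=0 buffers and how many of those are dirty.
 */
static void
update_triage_mode(int zero_usage_total, int zero_usage_dirty)
{
    double      dirty_fraction;

    if (zero_usage_total == 0)
    {
        all_scan_in_triage = false;
        return;
    }

    dirty_fraction = (double) zero_usage_dirty / zero_usage_total;

    /*
     * Above the threshold the all scan acts like the LRU one, writing
     * only usage_count=0 buffers; once the fraction falls back below it,
     * normal behavior resumes.
     */
    all_scan_in_triage = (dirty_fraction > bgwriter_triage_threshold);
}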

On my system a first guess for that tunable would be 2-5%, based on the
values I see on either side of the sad periods. No doubt some systems
would set that much higher; I haven't tested my system at home yet to get
a guideline for a more typical PC.

As for why this behavior matters so much to me, I actually have a
prototype auto-tuning background writer design that was hung up on this
particular problem. It notes how often you write out max_pages, uses that
to model the average percentage of the buffer cache you're traversing each
pass, then runs a couple of weighted-average feedback loops to aim for a
target seconds/sweep.
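
To give the flavor of it, here's a rough sketch of one such
weighted-average loop; the names and the exact scaling are assumptions
made for illustration, not the prototype itself:

/* Weighted average of the fraction of the pool written per round */
static double smoothed_scan_fraction = 0.0;

/*
 * Given what happened in one bgwriter round, estimate the current
 * seconds/sweep and scale the per-round page quota toward the target.
 */
static int
tune_pages_per_round(int pages_written, int pool_size,
                     double round_interval_secs,
                     double target_sweep_secs,
                     int current_pages_per_round)
{
    const double weight = 0.2;      /* smoothing factor for the average */
    double      scan_fraction;
    double      secs_per_sweep;
    int         new_pages;

    /* Fraction of the buffer pool this round's writes covered */
    scan_fraction = (double) pages_written / pool_size;
    smoothed_scan_fraction = weight * scan_fraction +
        (1.0 - weight) * smoothed_scan_fraction;

    if (smoothed_scan_fraction <= 0.0)
        return current_pages_per_round;

    /* At this rate, how long would one pass over the whole pool take? */
    secs_per_sweep = round_interval_secs / smoothed_scan_fraction;

    /*
     * Writing too slowly makes the sweep longer than the target, so scale
     * the per-round quota up proportionally, and vice versa.
     */
    new_pages = (int) (current_pages_per_round *
                       (secs_per_sweep / target_sweep_secs));

    return (new_pages > 0) ? new_pages : 1;
}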

The problem was that it went berserk when the whole buffer cache was dirty
(I hope someone appreciates that I've been calling this "Felix Unger on
crack mode" in my notes). I needed some way to prioritize which buffers
to concentrate on when that happens, and so far the above has been a good
first-cut way to help with that.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
