From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final background writer cleanup for 8.3
Date: 2007-08-26 21:16:15
Message-ID: Pine.GSO.4.64.0708261637030.3811@westnet.com
Lists: pgsql-hackers

On Sun, 26 Aug 2007, Kevin Grittner wrote:

> usagecount | count | isdirty
> ------------+-------+---------
>          0 |  8711 | f
>          1 |  9394 | f
>          2 |  1188 | f
>          3 |   869 | f
>          4 |   160 | f
>          5 |   157 | f

Here's a typical sample from your set. Notice how you've got very few
buffers with a high usage count. This is a situation the background
writer is good at working with. Either the old LRU writer or the new
work-in-progress one can aggressively pound away at any of the buffers
with a 0 usage count shortly after they get dirty, and that won't be
inefficient because there aren't large numbers of other clients using
them.
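
If you want to keep an eye on just that pool, counting the cheap-to-write
buffers is straightforward with the same view those numbers came from (a
sketch, assuming the pg_buffercache view with the usagecount column my
patch exposes):

-- Dirty buffers sitting at usage count 0: the ones the LRU writer can
-- flush without getting in anybody's way.
SELECT count(*)
  FROM pg_buffercache
 WHERE usagecount = 0 AND isdirty;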

Compare against this other sample:

> usagecount | count | isdirty
> ------------+-------+---------
>          0 |  9093 | f
>          1 |  6702 | f
>          2 |  2267 | f
>          3 |   602 | f
>          4 |   428 | f
>          5 |  1388 | f

Notice that you have a much larger number of buffers where the usage count
is 4 or 5. The all-scan part of the 8.2 background writer will waste a
lot of writes when you have a profile that's more like this. If there
have been 4+ client backends touching the buffer recently, you'd be crazy
to write it out right now if you could instead be focusing on banging out
the ones where the usage count is 0. The 8.2 background writer would
write them out anyway, which meant that when you hit a checkpoint both the
OS and the controller cache were filled with such buffers before you even
started writing the checkpoint data. The new setup in 8.3 only worries
about the high usage count buffers when you hit a checkpoint, at which
point it streams them out over a longer, adjustable period than the 8.2
design did (so as not to spike the I/O more than necessary and block your
readers); the 8.2 design just dumped them all immediately.
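
For what it's worth, the adjustable period there is driven by the new
checkpoint_completion_target setting (assuming that ends up being the
final name for it); it's a postgresql.conf parameter, but you can peek
at it from psql:

-- Fraction of the checkpoint interval the checkpoint writes are spread
-- across; 0.5 by default in the current 8.3 code.
SHOW checkpoint_completion_target;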

> Just to be sure that I understand, are you saying it would be a bad scene if
> the physical writes happened, or that the overhead of pushing them out to
> the OS would be crippling?

If you have a lot of buffers where the usage count is high, it would be
problematic to write them out every time they were touched; odds are good
somebody else is going to dirty them again soon, so why bother. On your
workload, that doesn't seem to be the case. But it is the case on some
other test workloads, and balancing for that situation has been central
to the parts of the redesign I've been injecting suggestions into. One
of the systems I was tormented by had a usage count of 5 on >20% of the
buffers in the cache under heavy load, and had a physical write been
executed every time one of those was touched, that would have been
crippling (even if the OS were smart enough to cache and therefore make
some of the writes redundant, which is behavior I would prefer not to
rely on).

> This contrib module seems pretty safe, patch and all. Does anyone think
> there is significant risk to slipping it into the 8.2.4 database where we
> have massive public exposure on the web site handling 2 million hits per
> day?

I think it's fairly safe, and my patch was pretty small; it just exposes
some data that nobody had been looking at before. Think how much easier
your life would have been during your earlier tuning if you had been
looking at the data in these terms. Just be aware that running the query
is itself intensive and causes its own tiny hiccup in throughput every
time it executes, so you may want to treat this more as a snapshot you
take periodically to learn about your data than as something you run
very regularly.
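
For reference, the sort of snapshot query that produces the distributions
quoted above looks roughly like this (a sketch; it assumes pg_buffercache
is installed with the usagecount column exposed):

-- Distribution of shared buffers by usage count and dirty state.
-- Scanning the whole buffer cache is what causes the throughput hiccup.
SELECT usagecount, count(*) AS count, isdirty
  FROM pg_buffercache
 GROUP BY usagecount, isdirty
 ORDER BY usagecount, isdirty;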

> I also think we need to somehow develop a set of tests which report
> maximum response time on (what should be) fast queries while the
> database is under different loads, so that those of us for whom reliable
> response time is more important than maximum overall throughput are
> protected from performance regressions.

My guess is that the DBT2 tests that Heikki has been running are more
complicated than you think they are; there are response time guarantee
requirements in there as well as the throughput numbers. The tests that
I run (which I haven't been publishing yet but will be with the final
patch soon) also report worst-case and 90th-percentile latency numbers
in addition to TPS. A "regression" that improved TPS at the expense of
those two would not be considered an improvement by anyone involved here.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
