Count backend self-sync calls

Lists: pgsql-hackers
From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Count backend self-sync calls
Date: 2010-11-14 22:07:53
Message-ID: 4CE05DB9.9080309@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The attached patch adds a new field to pg_stat_bgwriter, counting the
number of times backends execute their own fsync calls. Normally, when
a backend needs to fsync data, it passes a request to the background
writer, which then absorbs the call into its own queue of work to do.
However, under some types of heavy system load, the associated queue can
fill. When this happens, backends are forced to do their own fsync
call. This is potentially much worse than when they do a regular write.

The really nasty situation is when the background writer is busy because
it's executing a checkpoint. In that case, it's possible for the
backend fsync calls to start competing with the ones the background
writer is trying to get done, causing the checkpoint sync phase to
execute slower than it should. I've seen the sync phase take over 45
minutes on a really busy server once it got into this condition, where
hundreds of clients doing their own backend fsync calls were fighting
against the checkpoint fsync work. With this patch, you can observe
that happening as an upwards spike in
pg_stat_bgwriter.buffers_backend_sync, which as documented is an
inclusive subset of the total shown in buffers_backend.

While it takes a busier system than I can useful show how to simulate
here to show a really bad situation, I'm able to see some of these
unabsorbed backend fsync calls when initializing a pgbench database, to
prove they happen in the lab. The attached test program takes as its
input a pgbench scale counter. It then creates a pgbench database
(deleting any existing pgbench database, so watch out for that) and
shows the values accumulated in pg_stat_bgwriter during that period.
Here's an example, using the script's default scale of 100 on a server
with 8GB of RAM and fake fsync (the hard drives are lying about it):

-[ RECORD 1 ]--------+-----------------------------
now | 2010-11-14 16:08:41.36421-05
...
Initializing pgbench
-[ RECORD 1 ]--------+------------------------------
now | 2010-11-14 16:09:46.713693-05
checkpoints_timed | 0
checkpoints_req | 0
buffers_checkpoint | 0
buffers_clean | 0
maxwritten_clean | 0
buffers_backend | 654716
buffers_backend_sync | 90
buffers_alloc | 803

This is with default sizing for memory structures. As you increase
shared_buffers, one of the queues involved here increases
proportionately, making it less likely to run into this problem. That
just changes it to the kind of problem I've only seen on a larger system
with a difficult to simulate workload. The production system getting
hammered with this problem (running a web application) that prompted
writing the patch had shared_buffers=4GB at the time.

The patch also adds some logging to the innards involved here, to help
with understanding problems in this area. I don't think that should be
in the version committed as is. May want to drop the logging level or
make it disabled in regular builds, since it is sitting somewhere it
generates a lot of log data and adds overhead. It is nice for now, as
it lets you get an idea how much fsync work *is* being absorbed by the
BGW, as well as showing what relation is suffering from this issue.
Example of both those things, with the default config for everything
except log_checkpoints (on) and log_min_messages (debug1):

DEBUG: Absorbing 4096 fsync requests
DEBUG: Absorbing 150 fsync requests
DEBUG: Unable to forward fsync request, executing directly
CONTEXT: writing block 158638 of relation base/16385/16398

Here 4096 is the most entries the BGW will ever absorb at once, and all
90 of the missed sync calls are logged so you can see what files they
came from.

As a high-level commentary about this patch, I'm not sure what most end
users will ever do with this data. At the same time, I wasn't sure what
a typical end user would do with anything else in pg_stat_bgwriter
either when it was added, and it turns out the answer is "wait for
people who know the internals to write things that monitor it". For
example, Magnus has updated recent versions of the Munin plug-in for
PostgreSQL to usefully graph pg_stat_bgwriter data over time. As I have
some data to suggest checkpoint problems on Linux in particular are
getting worse as total system memory increases, I expect that having a
way to easily instrument for this particular problem will be
correspondingly more important in the future too.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

Attachment Content-Type Size
backend-syncs-v2.patch text/x-patch 13.3 KB
fsync-stress-v2.sh application/x-sh 349 bytes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-14 22:32:34
Message-ID: AANLkTimc4jdP+0z3OpgMHtGKHDLAg9cL6KZeTJxi1VYR@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 14, 2010 at 5:07 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> The patch also adds some logging to the innards involved here, to help with
> understanding problems in this area.  I don't think that should be in the
> version committed as is.  May want to drop the logging level or make it
> disabled in regular builds, since it is sitting somewhere it generates a lot
> of log data and adds overhead.

I think this one could be removed:

+ if (n > 0)
+ elog(DEBUG1,"Absorbing %d fsync requests",n);

But if this is generating a lot of log data or adding a lot of
overhead, then you have bigger problems anyway:

+ elog(DEBUG1, "Unable to forward fsync request, executing directly");

I'm inclined to change that to an ereport(), but otherwise it seems
OK. Also, how about referring to this as buffers_backend_fsync
consistently throughout, instead of dropping the "f" in some places?

With those changes, I think this is committable, and will do so,
barring objections.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-14 23:10:49
Message-ID: 13214.1289776249@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> With those changes, I think this is committable, and will do so,
> barring objections.

Obey message style guidelines, please, especially if you're going
to promote any of those to ereports.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-14 23:46:16
Message-ID: AANLkTik2ErwRZynB=v7O=nLuCmJP1tPZvSLm4qjGuFW8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 14, 2010 at 6:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> With those changes, I think this is committable, and will do so,
>> barring objections.
>
> Obey message style guidelines, please, especially if you're going
> to promote any of those to ereports.

The only new message would be the one Greg has as:

Unable to forward fsync request, executing directly

For that, we could just go with:

could not forward fsync request

(Lower case, avoid use of unable, telegram style with program as
implicit subject.)

It might be even better to mention that the reason why we couldn't
forward the fsync request is that the fsync request queue is full.
I'm not sure exactly how to phrase that. I thought about:

fsync request queue is full

But that seems not to answer the "so what" question. There is an
example like this in the docs:

could not forward fsync request (fsync request queue is full)

...but I'm not sure I like that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-15 00:19:41
Message-ID: 4CE07C9D.9000403@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> I think this one could be removed:
>
> + if (n > 0)
> + elog(DEBUG1,"Absorbing %d fsync requests",n);
>

Right; that one is just there to let you know the patch is working, and
how much the background writer does for you per pass, mainly for the
purpose of reviewing the patch.

> But if this is generating a lot of log data or adding a lot of
> overhead, then you have bigger problems anyway:
>
> + elog(DEBUG1, "Unable to forward fsync request, executing directly");
>

The argument against this log line even existing is that if the field is
added to pg_stat_bgwriter, that's probably how you want to monitor this
data anyway. I don't think there's actually much value to it, which is
one reason I didn't worry too much about matching style guidelines,
translation, etc. You've found the two things that I think both need to
go away before commit, but I left them in because I think they're
valuable for testing the patch itself does what it's supposed to.

> Also, how about referring to this as buffers_backend_fsync
> consistently throughout, instead of dropping the "f" in some places?
>

I started out touching code that called it just "sync", but then crossed
to other code that called it "fsync", and made the external UI use that
name. No objections to sorting that out within my patch so it's consistent.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-15 00:27:50
Message-ID: 17874.1289780870@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It might be even better to mention that the reason why we couldn't
> forward the fsync request is that the fsync request queue is full.
> I'm not sure exactly how to phrase that. I thought about:

> fsync request queue is full

> But that seems not to answer the "so what" question. There is an
> example like this in the docs:

> could not forward fsync request (fsync request queue is full)

> ...but I'm not sure I like that.

Well, that example is meant to cover cases where you have to assemble a
couple of independently created phrases. In this case I'd suggest
could not forward fsync request because request queue is full
or, if you think there might sometime be a need to have a strerror
variant, ie
could not forward fsync request: %m
then maybe this would make the most sense:
could not forward fsync request: request queue is full

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-15 01:31:09
Message-ID: AANLkTikGnJ=1H1B9=XsJ2VOnh4JqeA-6J07fKKL9Hbt0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>> But if this is generating a lot of log data or adding a lot of
>> overhead, then you have bigger problems anyway:
>>
>> +               elog(DEBUG1, "Unable to forward fsync request, executing
>> directly");
>>
>
> The argument against this log line even existing is that if the field is
> added to pg_stat_bgwriter, that's probably how you want to monitor this data
> anyway.

I'll remove it if you really want it gone, but personally I think it's
useful to have. I've more than once had to debug a problem given a
PostgreSQL log file with the debug level cranked up and not a whole
lot else. Rare events that cause performance to tank are worth
logging, IMHO.

> I started out touching code that called it just "sync", but then crossed to
> other code that called it "fsync", and made the external UI use that name.
>  No objections to sorting that out within my patch so it's consistent.

OK, I'll do that before I commit it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-15 01:31:42
Message-ID: AANLkTimOVcjatpEf=W_oGCJhEXr_Z_MbrJ_yrFCuG5JY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 14, 2010 at 7:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> It might be even better to mention that the reason why we couldn't
>> forward the fsync request is that the fsync request queue is full.
>> I'm not sure exactly how to phrase that.  I thought about:
>
>> fsync request queue is full
>
>> But that seems not to answer the "so what" question.  There is an
>> example like this in the docs:
>
>> could not forward fsync request (fsync request queue is full)
>
>> ...but I'm not sure I like that.
>
> Well, that example is meant to cover cases where you have to assemble a
> couple of independently created phrases.  In this case I'd suggest
>        could not forward fsync request because request queue is full

Sounds good to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-15 17:48:58
Message-ID: AANLkTik+2f=BYdkUZdySbZrfZ4r1Va1QJHH92C77qR+j@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 14, 2010 at 8:31 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>>> But if this is generating a lot of log data or adding a lot of
>>> overhead, then you have bigger problems anyway:
>>>
>>> +               elog(DEBUG1, "Unable to forward fsync request, executing
>>> directly");
>>>
>>
>> The argument against this log line even existing is that if the field is
>> added to pg_stat_bgwriter, that's probably how you want to monitor this data
>> anyway.
>
> I'll remove it if you really want it gone, but personally I think it's
> useful to have.  I've more than once had to debug a problem given a
> PostgreSQL log file with the debug level cranked up and not a whole
> lot else.  Rare events that cause performance to tank are worth
> logging, IMHO.
>
>> I started out touching code that called it just "sync", but then crossed to
>> other code that called it "fsync", and made the external UI use that name.
>>  No objections to sorting that out within my patch so it's consistent.
>
> OK, I'll do that before I commit it.

Committed with (I think) all the changes discussed, plus a catversion bump.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-15 21:06:35
Message-ID: AANLkTimLWzqEkiNfBm0Ox9Z15DZuDQYbqLdH0dAJojKu@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 14, 2010 at 2:07 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> The attached patch adds a new field to pg_stat_bgwriter, counting the number
> of times backends execute their own fsync calls.  Normally, when a backend
> needs to fsync data, it passes a request to the background writer, which
> then absorbs the call into its own queue of work to do.  However, under some
> types of heavy system load, the associated queue can fill.  When this
> happens, backends are forced to do their own fsync call.  This is
> potentially much worse than when they do a regular write.
>
> The really nasty situation is when the background writer is busy because
> it's executing a checkpoint.  In that case, it's possible for the backend
> fsync calls to start competing with the ones the background writer is trying
> to get done,

Do you know where this competition is happening? Is it on the
platters, or is it in the hard drive write cache (I thought high-end
hardware had tagged writes to avoid that), or in the kernel?

...
>
> DEBUG:  Absorbing 4096 fsync requests
> DEBUG:  Absorbing 150 fsync requests
> DEBUG:  Unable to forward fsync request, executing directly
> CONTEXT:  writing block 158638 of relation base/16385/16398
>
> Here 4096 is the most entries the BGW will ever absorb at once, and all 90
> of the missed sync calls are logged so you can see what files they came
> from.

Looking in src/backend/postmaster/bgwriter.c line 1071:

* Note: we presently make no attempt to eliminate duplicate requests
* in the requests[] queue. The bgwriter will have to eliminate dups
* internally anyway, so we may as well avoid holding the lock longer
* than we have to here.

This makes sense if we just need to append to a queue. But once the
queue is full and we are about to do a backend fsync, might it make
sense to do a little more work to look for dups?

Cheers,

Jeff


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Count backend self-sync calls
Date: 2010-11-16 02:49:08
Message-ID: 4CE1F124.30003@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes wrote:
> Do you know where this competition is happening? Is it on the
> platters, or is it in the hard drive write cache (I thought high-end
> hardware had tagged writes to avoid that), or in the kernel?
>

Kernel. Linux systems with lots of memory will happily queue up
gigabytes of memory in their write cache, only getting serious about
writing it out to disk when demanded to by fsync.

> This makes sense if we just need to append to a queue. But once the
> queue is full and we are about to do a backend fsync, might it make
> sense to do a little more work to look for dups?
>

One of the paths I'd like to follow is experimenting with both sorting
writes by file and looking for duplication in the queues. I think a
basic, simple sync spreading approach needs to get finished first
through; this sort of thing would then be an optimization on top of it.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us