Re: Instrument checkpoint sync calls

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Instrument checkpoint sync calls
Date: 2010-12-05 21:23:36
Message-ID: 4CFC02D8.70302@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Janes wrote:
> I've attached a tiny patch to apply over yours, to deal with this and
> with the case where no files are synced.
>

Thanks for that. That obvious error eluded me because in most of the
patch update testing I was doing (on ext3), the longest sync was always
about the same length as the total sync time.

Attached patch (in correct diff form this time!) collects up all
changes. That includes elimination of a potential race condition if
someone changes log_checkpoints while a long sync phase is executing. I
don't know whether that can happen, and it obviously won't give accurate
stats going back to the beginning of the checkpoint in that case, but
it tries to defend aginst producing complete garbage if that value
changes out from under it.

This is the first version of this patch I feel fairly good about; no
open concerns left on my side. Jeff, since you're now the de-facto
credited reviewer of this one by virtue of suggesting bug fixes, could
you take this update out for a spin too?

> Combining this instrumentation patch with the backend sync one already
> committed, the net result under log_min_messages=debug1is somewhat
> undesirable in that I can now see the individual sync times for the
> syncs done by the checkpoint writer, but I do not get times for the
> syncs done by the backend (I only get informed of their existence).
>

On a properly working system, backend syncs shouldn't be happening. So
if you see them, I think the question shouldn't be "how long are they
taking?", it's "how do I get rid of them?" From that perspective,
knowing of their existence is sufficient to suggest the necessary tuning
changes, such as dropping bgwriter_delay.

When you get into a situation where they are showing up, a whole lot of
them can happen in a very brief period; enough that I'd actually be
concerned about the added timing overhead, something I normally don't
care about very much. That's the main reason I didn't bother
instrumenting those too--the situations where they happen are ones
already running low on resources.

Big writes for things that can only be written out at checkpoint time,
on the other hand, are unavoidable in the current design. Providing
detail on them is going to be relevant unless there's a major
refactoring of how those happen.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

Attachment Content-Type Size
log-sync-v4.patch text/x-patch 7.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2010-12-05 21:53:41 Re: Spread checkpoint sync
Previous Message Kevin Grittner 2010-12-05 21:23:07 Re: V3: Idle in transaction cancellation