Re: postgresql latency & bgwriter not doing its job

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgresql latency & bgwriter not doing its job
Date: 2014-08-30 18:45:59
Message-ID: 20140830184558.GA31166@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-08-30 14:16:10 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-08-30 13:50:40 -0400, Tom Lane wrote:
> >> A possible compromise is to sort a limited number of
> >> buffers ---- say, collect a few thousand dirty buffers then sort, dump and
> >> fsync them, repeat as needed.
>
> > Yea, that's what I suggested nearby. But I don't really like it, because
> > it robs us of the the chance to fsync() a relfilenode immediately after
> > having synced all its buffers.
>
> Uh, how so exactly? You could still do that. Yeah, you might fsync a rel
> once per sort-group and not just once per checkpoint, but it's not clear
> that that's a loss as long as the group size isn't tiny.

Because it wouldn't have the benefit of sycing the minimal amount of
data anymore. If lots of other relfilenodes have been synced inbetween
the amount of newly dirtied pages in the os' buffercache (written by
backends, bgwriter) for a individual relfilenode is much higher.

A fsync() on a file with dirty data often causes *serious* latency
spikes - we should try hard to avoid superflous calls.

As an example: Calling fsync() on pgbench_accounts's underlying files,
from outside postgres, *before* postgres even started its first
checkpoint does this:
progress: 72.0 s, 4324.9 tps, lat 41.481 ms stddev 40.567
progress: 73.0 s, 4704.9 tps, lat 38.465 ms stddev 35.436
progress: 74.0 s, 4448.5 tps, lat 40.058 ms stddev 32.634
progress: 75.0 s, 4634.5 tps, lat 39.229 ms stddev 33.463
progress: 76.8 s, 2753.1 tps, lat 48.693 ms stddev 75.309
progress: 77.1 s, 126.6 tps, lat 773.433 ms stddev 222.667
progress: 78.0 s, 183.7 tps, lat 786.401 ms stddev 395.954
progress: 79.1 s, 170.3 tps, lat 975.949 ms stddev 596.751
progress: 80.0 s, 2116.6 tps, lat 168.608 ms stddev 398.933
progress: 81.0 s, 4436.1 tps, lat 40.313 ms stddev 34.198
progress: 82.0 s, 4383.9 tps, lat 41.811 ms stddev 37.241

Note the dip from 4k tps to 130 tps.

We can get a handle on that (on some platforms at least) for writes
issued during the buffer sync by forcing the kernel to write out the
pages in small increments; but I doubt we want to do that for writes by
backends themselves.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-08-30 19:17:36 Re: Selectivity estimation for inet operators
Previous Message Tom Lane 2014-08-30 18:16:10 Re: postgresql latency & bgwriter not doing its job