Re: postgresql latency & bgwriter not doing its job

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgresql latency & bgwriter not doing its job
Date: 2014-08-27 08:30:26
Message-ID: 20140827083026.GB21544@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-08-27 09:32:16 +0200, Fabien COELHO wrote:
>
> Hello Andres,
>
> >[...]
> >I think you're misunderstanding how spread checkpoints work.
>
> Yep, definitely:-) On the other hand I thought I was seeking something
> "simple", namely correct latency under small load, that I would expect out
> of the box.

Yea. The current situation *sucks*, both because of the utterly borked
behaviour of ext4 and other filesystems and because of the lack of a workaround in postgres.

> >When the checkpointer process starts a spread checkpoint it first writes
> >all buffers to the kernel in a paced manner.
> >That pace is determined by checkpoint_completion_target and
> >checkpoint_timeout.
>
> This pacing does not seem to work, even at slow pace.

It definitely does in some cases. What's your evidence that the pacing
doesn't work? Afaik it's the fsync that causes the problem, not the
writes themselves.
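
The pacing described above can be modelled roughly as follows. This is a simplified sketch, not the checkpointer's actual code: it assumes the writer compares the fraction of buffers written (scaled by checkpoint_completion_target) against the fraction of checkpoint_timeout that has elapsed, naps in small ticks while it is ahead, and ignores the WAL-volume (checkpoint_segments) side of the schedule. The function names and the 0.1 s tick are hypothetical.

```python
def on_schedule(buffers_written, buffers_total, elapsed_s,
                checkpoint_timeout_s, completion_target):
    """Simplified model: is the write phase at or ahead of its time-based target?"""
    # Fraction of the checkpoint's buffers already handed to the kernel,
    # scaled so that 100% written maps to completion_target of the interval.
    progress = (buffers_written / buffers_total) * completion_target
    # Fraction of checkpoint_timeout elapsed since the checkpoint started.
    elapsed = elapsed_s / checkpoint_timeout_s
    return progress >= elapsed


def simulate_write_phase(buffers_total, checkpoint_timeout_s,
                         completion_target, tick_s=0.1):
    """Return the modelled duration (seconds) of the paced write phase."""
    t = 0.0
    written = 0
    while written < buffers_total:
        written += 1  # hand one buffer to the kernel (modelled as instantaneous)
        # Nap in small ticks while ahead of schedule; once behind, keep writing.
        while written < buffers_total and on_schedule(
                written, buffers_total, t, checkpoint_timeout_s, completion_target):
            t += tick_s
    return t


# First checkpoint from the test: 48725 buffers, 30 min timeout, 0.9 target.
print(f"modelled write phase: {simulate_write_phase(48725, 1800, 0.9):.1f} s")
```

In this model the write phase ends close to checkpoint_timeout × checkpoint_completion_target = 1800 s × 0.9 = 1620 s, regardless of how many buffers there are; only the write rate changes.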

> >If you have a stall of roughly the same magnitude (say, differing by a
> >factor of two), the smaller one once a minute and the larger one once an
> >hour, then obviously the once-an-hour one will give better latency for
> >many, many more transactions.
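
To put numbers on the stall-frequency argument: a minimal sketch, assuming transactions arrive at a steady rate and that only those arriving during a stall see its latency. The 10 s and 20 s stall lengths are hypothetical, chosen only to match the "factor of two" above.

```python
def stalled_fraction(stall_s, interval_s):
    """Fraction of wall-clock time (and, at a steady rate, of transactions) inside a stall."""
    return stall_s / interval_s


per_minute = stalled_fraction(10, 60)    # hypothetical 10 s stall every minute
per_hour = stalled_fraction(20, 3600)    # hypothetical 20 s stall (twice as long) every hour

print(f"once a minute: {per_minute:.1%} of transactions affected")
print(f"once an hour:  {per_hour:.2%} of transactions affected")
print(f"ratio: {per_minute / per_hour:.0f}x")
```

Even with the hourly stall twice as long, about 30 times fewer transactions are affected.
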
>
> I do not believe that delaying disk writes as much as possible is a viable
> strategy for handling a small load. However, to show my good will, I have
> tried to follow your advice: I've launched a 5000-second test with 50
> segments, 30 min timeout, 0.9 completion target, at 25 tps, which is less
> than 1/10 of the maximum throughput.
>
> There are only two time-triggered checkpoints:
>
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 48725 buffers (47.6%);
> 1 transaction log file(s) added, 0 removed, 0 recycled;
> write=1619.750 s, sync=27.675 s, total=1647.932 s;
> sync files=14, longest=27.593 s, average=1.976 s
>
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 22533 buffers (22.0%);
> 0 transaction log file(s) added, 0 removed, 23 recycled;
> write=826.919 s, sync=9.989 s, total=837.023 s;
> sync files=8, longest=6.742 s, average=1.248 s
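
The log lines above can be checked with a little arithmetic. A minimal sketch; the 8 kB block size is an assumption (the PostgreSQL default), and the other inputs are copied from the log and the test description. The first checkpoint's write phase (1619.750 s) lands almost exactly on checkpoint_timeout × checkpoint_completion_target = 1800 s × 0.9 = 1620 s, while a single file accounts for about 99.7% of its 27.675 s sync phase.

```python
CHECKPOINT_TIMEOUT_S = 30 * 60   # checkpoint_timeout = 30 min (from the test description)
COMPLETION_TARGET = 0.9          # checkpoint_completion_target = 0.9
TEST_DURATION_S = 5000           # length of the test run
BLOCK_SIZE = 8192                # assumption: default 8 kB PostgreSQL block size

# (buffers written, write s, sync s, longest single-file sync s, total s), from the log
checkpoints = [
    (48725, 1619.750, 27.675, 27.593, 1647.932),
    (22533, 826.919, 9.989, 6.742, 837.023),
]


def summarize(buffers, write_s, sync_s, longest_s, total_s):
    return {
        "buffers_per_s": buffers / write_s,
        "mib_per_s": buffers * BLOCK_SIZE / write_s / 2**20,
        "sync_share_of_total": sync_s / total_s,
        "longest_file_share_of_sync": longest_s / sync_s,
    }


target_write_s = CHECKPOINT_TIMEOUT_S * COMPLETION_TARGET
# Assuming the checkpoint timer starts with the test, this many time-triggered
# checkpoints fit into the run.
time_triggered = TEST_DURATION_S // CHECKPOINT_TIMEOUT_S
first, second = (summarize(*c) for c in checkpoints)

for name, s in (("first", first), ("second", second)):
    print(f"{name}: {s['buffers_per_s']:.1f} buffers/s ({s['mib_per_s']:.2f} MiB/s), "
          f"sync {s['sync_share_of_total']:.1%} of total, "
          f"longest fsync {s['longest_file_share_of_sync']:.1%} of sync")
print(f"write target {target_write_s:.0f} s vs. first write phase {checkpoints[0][1]} s")
print(f"time-triggered checkpoints expected: {time_triggered}")
```

The write phase runs at only about 30 buffers/s, so the writes themselves are gentle; the stall is concentrated in the unpaced sync phase.
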

The write pacing itself doesn't seem to be bad. The bad thing is the
'sync' times here. Those are *NOT* paced, and the kernel has probably delayed
flushing out much of the writes...
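
A minimal, PostgreSQL-independent sketch of why the sync phase can stall: write() normally only copies data into the kernel page cache, and the cost of getting it to storage is paid later, either by background writeback or by whoever calls fsync(). The helper name and sizes are hypothetical; absolute timings depend on the filesystem, the device, and the kernel's dirty-page settings (on tmpfs, for example, fsync() is nearly free).

```python
import os
import tempfile
import time


def write_then_fsync(n_blocks=256, block_size=8192):
    """Write n_blocks to a temp file, then fsync it; return (write_seconds, fsync_seconds)."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.perf_counter()
        for _ in range(n_blocks):
            os.write(fd, block)  # normally just copies into the kernel page cache
        t1 = time.perf_counter()
        os.fsync(fd)             # blocks until this file's dirty data reaches storage
        t2 = time.perf_counter()
        return t1 - t0, t2 - t1
    finally:
        os.close(fd)
        os.remove(path)


w, s = write_then_fsync()
print(f"write: {w * 1000:.1f} ms, fsync: {s * 1000:.1f} ms")
```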

> (1) the ability to set checkpoint_timeout to values smaller than 30s could
> help, although obviously there would be other consequences. But the ability
> to avoid periodic offline time looks like a desirable objective.

I'd rather not do that. It's an utterly horrible hack to go this way.

> (2) I still think that a parameter to force bgwriter to write more stuff
> could help, but I have not tested this.

It's going to be random writes. That's not going to be helpful.

> (3) Any other effective idea to configure for responsiveness is welcome!

I've a couple of ideas on how to improve the situation, but so far I've not
had the time to investigate them properly. Would you be willing to test
a couple of simple patches?

Have you tested xfs already?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
