Re: postgresql latency & bgwriter not doing its job

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgresql latency & bgwriter not doing its job
Date: 2014-08-27 09:05:52
Message-ID: alpine.DEB.2.10.1408271035050.8876@sto
Lists: pgsql-hackers


> [...] What's your evidence the pacing doesn't work? Afaik it's the fsync
> that causes the problem, not the writes themselves.

Hmmm. My (poor) understanding is that fsync would work fine if everything
was already written beforehand:-) That is, it would have nothing to do but
check that everything is already written. If there is remaining write
work, it starts doing it "now", with the disastrous effects I'm
complaining about.
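
The write/fsync distinction above can be observed directly. What follows
is only a minimal sketch, assuming Python and a scratch file; the name
time_write_then_fsync and the page count are illustrative and do not come
from this thread. It times the buffered writes separately from the final
fsync:

```python
import os
import tempfile
import time

PAGE_SIZE = 8192  # PostgreSQL's default block size


def time_write_then_fsync(n_pages=1000, directory=None):
    """Write n_pages pages, then fsync; return (write_s, fsync_s, size)."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        page = b"\0" * PAGE_SIZE
        t0 = time.perf_counter()
        for _ in range(n_pages):
            os.write(fd, page)  # usually lands in the OS page cache only
        t1 = time.perf_counter()
        os.fsync(fd)            # forces whatever is still dirty to storage
        t2 = time.perf_counter()
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)
    return t1 - t0, t2 - t1, size
```

If the OS has not flushed anything in the meantime, most of the elapsed
time is expected to show up in the fsync figure rather than in the write
figure. The actual split depends on the file system and storage.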

When I say "pacing does not work", I mean that things were not written
out to disk by the OS. It does not mean that pg did not ask for it.

However it does not make much sense for an OS scheduler to wait several
minutes with tens of thousands of pages to write and do nothing about
it... So I'm wondering.
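
One way to check how long the OS is allowed to sit on dirty pages is to
look at the kernel writeback settings. A minimal sketch, assuming a Linux
host (the thread does not state which settings are in use); the helper
name read_writeback_settings is hypothetical, while the four file names
are the standard Linux vm sysctls:

```python
import os

# Linux sysctl files controlling when the kernel flushes dirty pages.
VM_KNOBS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_writeback_settings(base="/proc/sys/vm"):
    """Return {knob: int} for the knobs that exist; empty dict off Linux."""
    settings = {}
    for name in VM_KNOBS:
        try:
            with open(os.path.join(base, name)) as f:
                settings[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # knob missing or unreadable on this system
    return settings
```

Calling read_writeback_settings() on the database host shows the
thresholds and the expiry age (in centiseconds) after which dirty pages
become eligible for background writeback.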

> [...]
>> (1) the ability to put checkpoint_timeout to values smaller than 30s could
>> help, although obviously there would be other consequences. But the ability
>> to avoid periodic offline time looks like a desirable objective.
>
> I'd rather not do that. It's an utterly horrible hack to go this route.

Hmmm. It does solve the issue, though:-) It would be the administrator's
choice. It is better than nothing, which is the current status.

>> (2) I still think that a parameter to force bgwriter to write more stuff
>> could help, but this is not tested.
>
> It's going to be random writes. That's not going to be helpful.

The -N small OLTP load on a large (GB) table *is* random writes anyway,
whether they occur at checkpoint or at any other time. Random writes are
fine in this case, the load is small, there should be no problem.
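
To get a feel for how scattered such a load is, the number of distinct
pages dirtied between two checkpoints can be estimated. A rough sketch,
assuming each update hits one uniformly random heap page and ignoring
index and WAL pages; the function name expected_dirty_pages is
hypothetical, and any numbers fed to it are illustrative, not
measurements from this thread:

```python
def expected_dirty_pages(n_updates, n_pages):
    """Expected number of distinct pages hit by n_updates uniform updates.

    Each page is missed by one update with probability (1 - 1/n_pages),
    so it is hit at least once with probability 1 - (1 - 1/n_pages)**n.
    """
    if n_pages <= 0:
        raise ValueError("n_pages must be positive")
    miss = (1.0 - 1.0 / n_pages) ** n_updates
    return n_pages * (1.0 - miss)
```

When the table has far more pages than there are updates in a checkpoint
interval, the result is close to n_updates: nearly every update dirties a
different page, so the writes are random whenever they are issued.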

>> (3) Any other effective idea to configure for responsiveness is
>> welcome!
>
> I've a couple of ideas how to improve the situation, but so far I've not
> had the time to investigate them properly. Would you be willing to test
> a couple of simple patches?

I can test a couple of patches. I already did one on someone's advice
(make bgwriter round all stuff in 1s instead of 120s), without positive
effect.

> Did you test xfs already?

No. I cannot without reinstalling, which I cannot do on a remote host, and
I will probably not have time to do it when I have physical access.
Only one partition on the host. My mistake. Will not do it again. Shame on
me.

If someone out there has an XFS setup, it is very easy to test and only
takes a couple of minutes, really. It takes less time to do it than to
write a mail about it afterwards:-)

I have tested FreeBSD/UFS with similar results: a few periodic offlines.
A journaled UFS file system is probably not ideal for database work, but
then again the load is small, so it should be able to cope without going
offline.

--
Fabien.
