Re: Improvement of checkpoint IO scheduler for stable transaction responses

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improvement of checkpoint IO scheduler for stable transaction responses
Date: 2013-07-14 19:13:41
Message-ID: 51E2F865.8030406@2ndQuadrant.com
Lists: pgsql-hackers

On 6/27/13 11:08 AM, Robert Haas wrote:
> I'm pretty sure Greg Smith tried it the fixed-sleep thing before and
> it didn't work that well.

That's correct, I spent about a year whipping that particular horse and
submitted improvements on it to the community.
http://www.postgresql.org/message-id/4D4F9A3D.5070700@2ndquadrant.com
and its updates downthread are good ones to compare this current work
against.

The important thing to realize about just delaying fsync calls is that
it *cannot* increase TPS throughput. It's not possible in theory, and
it doesn't happen in practice. The most efficient way to write things
out is to delay those writes as long as possible. The longer you
postpone a
write, the more elevator sorting and write combining you get out of the
OS. This is why operating systems like Linux come tuned for such
delayed writes in the first place. Throughput and latency are linked;
any patch that aims to decrease latency will probably slow throughput.
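
As a toy illustration of the write-combining point, here is a minimal
Python sketch. It is not how any OS actually schedules I/O: the
function name, the "window" parameter, and the workload are all
hypothetical. It only counts how many physical writes are needed when
repeated writes to the same block within one flush window are merged.

```python
def physical_writes(block_writes, window):
    """Count device writes when dirty blocks are flushed every `window`
    logical writes (hypothetical model, not a real OS policy).

    Within one window, repeated writes to the same block are combined
    into a single physical write.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    total = 0
    for start in range(0, len(block_writes), window):
        # Each distinct block in the window costs one physical write.
        total += len(set(block_writes[start:start + window]))
    return total


# A made-up workload that keeps re-dirtying a small set of blocks.
workload = [1, 2, 1, 3, 2, 1, 4, 1, 2, 3, 1, 2]

for window in (1, 4, 12):
    print(window, physical_writes(workload, window))
```

With this workload the count drops from 12 to 9 to 4 as the window
grows, which is the sense in which postponing writes buys throughput.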

Accordingly, the current behavior--no delay--already gives the best
possible throughput. If you apply a write timing change and it seems to
increase TPS, that's almost certainly because it executed fewer
checkpoint writes. It's not a fair comparison. You have to adjust any
delaying to still hit the same end point on the checkpoint schedule.
That's what my later submissions did, and under that sort of controlled
condition most of the improvements went away.
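
To make "hit the same end point" concrete, here is a rough Python
sketch. It is not the code from any submitted patch; the function
names, the fixed per-file sync time, and the numbers are assumptions
for illustration. The pause before each fsync is sized from the time
left until the scheduled end point, and the result is compared with a
fixed sleep.

```python
def fsync_pause(time_left, files_left, est_sync_time):
    """Pause before each remaining fsync so the sync phase still ends
    on schedule (hypothetical helper).

    time_left: seconds until the checkpoint's scheduled end point
    files_left: number of files still waiting for fsync
    est_sync_time: assumed average seconds one fsync takes
    """
    if files_left <= 0:
        return 0.0
    slack = time_left - files_left * est_sync_time
    # If we are already behind schedule, do not pause at all.
    return max(0.0, slack / files_left)


def sync_phase_end(deadline, n_files, sync_time, fixed_pause=None):
    """Simulated time at which the sync phase finishes."""
    now = 0.0
    for left in range(n_files, 0, -1):
        if fixed_pause is None:
            pause = fsync_pause(deadline - now, left, sync_time)
        else:
            pause = fixed_pause
        now += pause + sync_time
    return now


print(sync_phase_end(30.0, 10, 1.0))                   # budgeted pauses
print(sync_phase_end(30.0, 10, 1.0, fixed_pause=3.0))  # fixed sleep
```

In this made-up case the budgeted pauses finish at 30 seconds, right on
the deadline, while the fixed 3 second sleep finishes at 40. A run that
overshoots like that completes fewer checkpoints in a timed benchmark,
which is where an apparent TPS gain can come from.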

Now, I still do really believe that better spacing of fsync calls helps
latency in the real world. As far as I know, the server that I
developed that patch for originally in 2010 is still running with that
change.
The result is not a throughput change though; there is a throughput drop
with a latency improvement. That is the unbreakable trade-off in this
area if all you touch is scheduling.

The reason why I was ignoring this discussion and working on pgbench
throttling until now is that you need to measure latency at a constant
throughput to make progress on this topic, and that's exactly what the
new pgbench feature enables. If we can take the current checkpoint
scheduler and an altered one, run both at exactly the same rate, and one
gives lower latency, now we're onto something. It's possible to do that
with DBT-2 as well, but I wanted something really simple that people
could replicate results with in pgbench.
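
The kind of comparison described above can be sketched as a small
Python simulation. This is not pgbench and does not use its code: it
assumes a single client, evenly spaced scheduled start times, and
invented service times, and every name in it is hypothetical. Latency
is counted from the scheduled start, so time spent queued behind a
stall is included. Both runs offer the same number of transactions at
the same rate and carry the same total stall time; only the spacing of
the stalls differs.

```python
def latencies_at_fixed_rate(rate, service_times):
    """Per-transaction latency when starts are scheduled at a fixed rate.

    rate: scheduled transactions per second
    service_times: seconds the server needs for each transaction
    """
    interval = 1.0 / rate
    free_at = 0.0  # time at which the server finishes its current work
    latencies = []
    for i, service in enumerate(service_times):
        scheduled = i * interval
        start = max(scheduled, free_at)  # wait if the server is busy
        free_at = start + service
        latencies.append(free_at - scheduled)
    return latencies


def workload(n, stalls, base=0.002):
    """n transactions at `base` seconds each, with some stalled ones."""
    times = [base] * n
    for index, stall in stalls.items():
        times[index] = stall
    return times


# Same total stall time (0.5 s), delivered as one burst or as five
# smaller stalls spread through the run.
one_burst = workload(1000, {100: 0.5})
spread = workload(1000, {i: 0.1 for i in (100, 300, 500, 700, 900)})

lat_burst = latencies_at_fixed_rate(100, one_burst)
lat_spread = latencies_at_fixed_rate(100, spread)
print(max(lat_burst), max(lat_spread))
```

Because the offered rate is identical in both runs, any difference in
worst-case latency comes from the scheduling alone, which is the
property a throttled benchmark is meant to provide.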

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
