Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Dave Chinner <david(at)fromorbit(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Kara <jack(at)suse(dot)cz>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Trond Myklebust <trondmy(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, James Bottomley <James(dot)Bottomley(at)hansenpartnership(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-14 23:37:41
Message-ID: 20140114233741.GI3431@dastard
Lists: pgsql-hackers

On Tue, Jan 14, 2014 at 03:03:39PM -0800, Kevin Grittner wrote:
> Dave Chinner <david(at)fromorbit(dot)com> wrote:
>
> > Essentially, changing dirty_background_bytes, dirty_bytes and
> > dirty_expire_centisecs to be much smaller should make the
> > kernel start writeback much sooner and so you shouldn't have to
> > limit the amount of buffers the application has to prevent major
> > fsync triggered stalls...
>
> Is there any "rule of thumb" about where to start with these?

There's no absolute rule here, but the threshold for background
writeback needs to consider the amount of dirty data being
generated, the rate at which it can be retired and the checkpoint
period the application is configured with. i.e. it needs to be slow
enough to not cause serious read IO perturbations, but still fast
enough that it avoids peaks at synchronisation points. And most
importantly, it needs to be fast enough that it can complete
writeback of all the dirty data in a checkpoint before the next
checkpoint is triggered.
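
The last constraint above can be checked with simple arithmetic. The
following is only a sketch: the function name, its parameters and the
example figures are illustrative assumptions, not measurements or
settings from this thread, and it assumes a steady writeback rate.

```python
# Sketch only: names and figures are illustrative assumptions.
GB = 1000 ** 3
MB = 1000 ** 2


def writeback_fits_checkpoint(dirty_bytes_per_checkpoint,
                              writeback_rate_bytes_per_s,
                              checkpoint_period_s):
    """Return True if all dirty data from one checkpoint can be written
    back before the next checkpoint is triggered, assuming a constant
    writeback rate."""
    if writeback_rate_bytes_per_s <= 0:
        raise ValueError("writeback rate must be positive")
    drain_time_s = dirty_bytes_per_checkpoint / writeback_rate_bytes_per_s
    return drain_time_s < checkpoint_period_s


if __name__ == "__main__":
    # Hypothetical workload: 30 GB dirtied per checkpoint, 500 MB/s of
    # sustained writeback, checkpoints every 300 seconds.
    print(writeback_fits_checkpoint(30 * GB, 500 * MB, 300))
```

If the check fails, either the checkpoint period is too short for the
storage or writeback is starting too late to keep up.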

In general, I find that threshold to be somewhere around 2-5s worth
of data writeback - enough to keep a good amount of write combining
and the IO pipeline full as work is done, but no more.

e.g. if your workload results in writeback rates of 500MB/s, then
I'd be setting the dirty limit somewhere around 1-2GB as an initial
guess. It's basically a simple trade-off of buffering space for
writeback latency. Some applications perform well with increased
buffering space (e.g. 10-20s of writeback) while others perform
better with extremely low writeback latency (e.g. 0.5-1s).
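
As a rough sketch of the arithmetic above: multiply the measured
writeback rate by the number of seconds of writeback you want to
buffer. The function name and the 2-4 second default window are
assumptions chosen to reproduce the 500MB/s example, and which sysctl
the result is applied to (e.g. vm.dirty_background_bytes) is left to
the reader.

```python
# Sketch only: names and the default window are illustrative assumptions.
MB = 1000 ** 2


def dirty_limit_range(writeback_rate_bytes_per_s,
                      min_seconds=2.0, max_seconds=4.0):
    """Return (low, high) dirty limits in bytes, sized as a number of
    seconds' worth of writeback at the given rate."""
    if writeback_rate_bytes_per_s <= 0:
        raise ValueError("writeback rate must be positive")
    if not 0 < min_seconds <= max_seconds:
        raise ValueError("need 0 < min_seconds <= max_seconds")
    low = int(writeback_rate_bytes_per_s * min_seconds)
    high = int(writeback_rate_bytes_per_s * max_seconds)
    return low, high


if __name__ == "__main__":
    # 500 MB/s of writeback with a 2-4 second window.
    low, high = dirty_limit_range(500 * MB)
    print(low, high)
```

Passing a window of 10-20 seconds or 0.5-1 seconds gives starting
points for the buffering-heavy and latency-sensitive cases instead.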

>   For
> example, should a database server maybe have dirty_background_bytes
> set to 75% of the non-volatile write cache present on the
> controller, in an attempt to make sure that there is always some
> "slack" space for writes?

I don't think the hardware cache size matters, as it's easy to fill
such caches very quickly and so after a couple of seconds the controller
will fall back to disk speed anyway. IMO, what matters is that the
threshold is large enough to adequately buffer writes to smooth
peaks and troughs in the pipeline.

Cheers,

Dave.
--
Dave Chinner
david(at)fromorbit(dot)com
