Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Date: 2013-09-13 21:10:04
Message-ID: CAHyXU0yvKy2jgqPWO1ZdYSdVuH6YS_H==RDZ0jxuGCEOk-q1cw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 13, 2013 at 4:04 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
>> Absolutely not claiming the contrary. I think it sucks that we
>> couldn't fully figure out what's happening in detail. I'd love to
>> get my hands on a setup where it can be reliably reproduced.
>
> I have seen two completely different causes for symptoms like this,
> and I suspect that these aren't the only two.
>
> (1) The dirty page avalanche: PostgreSQL hangs on to a large
> number of dirty buffers and then dumps a lot of them at once. The
> OS does the same. When PostgreSQL dumps its buffers to the OS it
> pushes the OS over a "tipping point" where it is writing dirty
> buffers too fast for the controller's BBU cache to absorb them.
> Everything freezes until the controller writes and accepts OS
> writes for a lot of data. This can take several minutes, during
> which time the database seems "frozen". Cure is some combination
> of these: reduce shared_buffers, make the background writer more
> aggressive, checkpoint more often, make the OS dirty page writing
> more aggressive, add more BBU RAM to the controller.

Yeah -- I've seen this too, and it's a well-understood problem.
Getting the OS to flush dirty pages out faster is the name of the
game, I think. Storage is getting so fast that it's (mostly) moot
anyway. Also, this falls under the umbrella of 'high I/O' -- the
stuff I've been seeing is low- or no-I/O.
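
For anyone wanting to tell the two cases apart, here is a minimal
sketch (not from the thread) that reports how much dirty data the OS
is currently holding. It assumes a Linux host exposing the usual
"Dirty" and "Writeback" fields in /proc/meminfo; the function names
are hypothetical. A large backlog just before a stall would point at
the dirty page avalanche; a near-zero one would not.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts with no unit suffix.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def dirty_backlog_kb(text):
    """Return Dirty + Writeback in kB (0 for any field that is absent)."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0) + info.get("Writeback", 0)


def read_dirty_backlog_kb(path="/proc/meminfo"):
    """Read the live value; the default path is a Linux-only assumption."""
    with open(path) as f:
        return dirty_backlog_kb(f.read())
```

Sampling read_dirty_backlog_kb() once a second around an episode
should be enough to see whether a backlog builds up and then drains.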

> (2) Transparent huge page support goes haywire on its defrag work.
> Clues on this include very high "system" CPU time during an
> episode, and `perf top` shows more time in kernel spinlock
> functions than anywhere else. The database doesn't completely lock
> up like with the dirty page avalanche, but it is slow enough that
> users often describe it that way. So far I have only seen this
> cured by disabling THP support (in spite of some people urging that
> just the defrag be disabled). It does make me wonder whether there
> is something we could do in PostgreSQL to interact better with
> THPs.

Ah, that's a useful tip; I need to research that, thanks. Josh
might be able to give it a whirl...
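
As a starting point for that research, a minimal sketch (not from the
thread) that reports which THP mode is active. It assumes the Linux
sysfs files under /sys/kernel/mm/transparent_hugepage/, where the
active choice is the one shown in square brackets, e.g.
"always madvise [never]". Some distributions use a different
directory, so the paths are an assumption, and the function names are
hypothetical.

```python
import re

# Assumed mainline-kernel location; some distributions differ.
THP_DIR = "/sys/kernel/mm/transparent_hugepage"


def active_thp_mode(text):
    """Return the bracketed (active) option, or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_thp_settings(base=THP_DIR):
    """Return the active modes for the 'enabled' and 'defrag' knobs."""
    settings = {}
    for knob in ("enabled", "defrag"):
        with open(f"{base}/{knob}") as f:
            settings[knob] = active_thp_mode(f.read())
    return settings
```

If "enabled" comes back as "always" on a box showing high system CPU
time during episodes, that would match the case Kevin describes.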

merlin
