Re: Scaling shared buffer eviction

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-16 16:51:24
Message-ID: CA+TgmoZsWZcQnfCj1jUfSOkH0zkiYO_HOfutA7kHrU5mBtEFmg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 16, 2014 at 8:18 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> In most cases performance with the patch is slightly lower than
> HEAD; the difference is generally less than 1%, and in a case or
> two close to 2%. I think the main reason for the slight difference is
> that when the size of shared buffers is almost the same as the data
> size, the number of buffers needed from the clock sweep is very
> small. As an example, in the first case (shared buffers of 12286MB),
> it actually needs at most 256 additional buffers (2MB) via the clock
> sweep, whereas bgreclaimer will put 2000 additional buffers (the
> high water mark, since 0.5% of shared buffers is greater than 2000)
> into the free list. So bgreclaimer does extra work when it is not
> required, and it also leads to the condition you mentioned below
> (the freelist will contain buffers that have already been touched
> since we added them). For case 2 (12166MB), we need more than
> 2000 additional buffers, but not too many, so it can have a
> similar effect.
>

So there are two suboptimal things that can happen, and they pull in
opposite directions. I think you should instrument the server to see how
often each is happening. #1 is that we can pop a buffer from the freelist and
find that it's been touched. That means we wasted the effort of putting it
on the freelist in the first place. #2 is that we can want to pop a buffer
from the freelist and find it empty and thus be forced to run the clock
sweep ourselves. If we're having problem #1, we could improve things by
reducing the water marks. If we're having problem #2, we could improve
things by increasing the water marks. If we're having both problems, then
I dunno. But let's get some numbers on the frequency of these specific
things, rather than just overall tps numbers.

> I think we have the below options related to this observation:
> a. Do some further tuning in bgreclaimer so that instead of putting
> buffers into the freelist up to the high water mark, it puts in just
> 1/4th or 1/2 of the high water mark, then checks whether the free
> list still contains less than or equal to the low water mark's worth;
> if yes, it continues, and if not, it waits (or perhaps some other way).
>

That sounds suspiciously like just reducing the high water mark.

> b. Instead of waking bgreclaimer when the number of buffers falls
> below the low water mark, wake it when the number of times
> backends run the clock sweep crosses a certain threshold.
>

That doesn't sound helpful.

> c. Expose the low and high water marks as config knobs, so that in
> some rare cases users can use them for tuning.
>

Yuck.

> d. Let's not do anything: if a user has such a configuration, he
> should be educated to configure shared buffers in a better way,
> and/or the performance hit doesn't seem to justify any further
> work.
>

At least worth entertaining.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
