Re: Scaling shared buffer eviction

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-11 14:15:57
Message-ID: CA+Tgmoatoh6c2vWdNgEDOK7vn4iaCdmLuMjw8sebdL29Wu06CQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 11, 2014 at 10:03 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-09-11 09:48:10 -0400, Robert Haas wrote:
>> On Thu, Sep 11, 2014 at 9:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > I wonder if we should recheck the number of freelist items before
>> > sleeping. As the latch currently is reset before sleeping (IIRC) we
>> > might miss being woken up soon. It very well might be that bgreclaim
>> > needs to run for more than one cycle in a row to keep up...
>>
>> The outer loop in BgMoveBuffersToFreelist() was added to address
>> precisely this point, which I raised in a previous review.
>
> Hm, right. But then let's move BgWriterStats.m_buf_alloc +=,
> ... pgstat_send_bgwriter(); into that loop. Otherwise it'd possibly end
> up being continuously busy without being visible.

Good idea.

>> I'm not
>> blind to the possibility that the current logic is inadequate, but
>> testing proves that it works well enough to produce a massive
>> performance boost over where we are now.
>
> But, to be honest, the testing so far was pretty "narrow" in the kind of
> workloads that were run if I crossread things accurately. Don't get me
> wrong, I'm *really* happy about having this patch, that just doesn't
> mean every detail is right ;)

Oh, sure. Totally agreed. And, to the extent that we're improving
things based on actual testing, I'm A-OK with that. I just don't want
to start speculating, or we'll never get this thing off the ground.

Some possibly-interesting test cases would be:

(1) A read-only pgbench workload that is just a tiny bit larger than
shared_buffers, say size of shared_buffers plus 0.01%. Such workloads
tend to stress buffer eviction heavily.

(2) A workload that maximizes the rate of concurrent buffer eviction
relative to other tasks. Read-only pgbench is not bad for this, but
maybe somebody's got a better idea.

As I sort of mentioned in what I was writing for the bufmgr README,
there are, more or less, three ways this can fall down, at least that
I can see: (1) if the high water mark is too high, then we'll start
finding buffers in the freelist that have already been touched since
we added them; (2) if the low water mark is too low, the freelist will
run dry; and (3) if the low and high water marks are too close
together, the bgreclaimer will be constantly getting woken up and
going to sleep again. I can't personally think of a workload that
will enable us to get a better handle on those cases than
high-concurrency pgbench, but you're known to be ingenious at coming
up with destruction workloads, so if you have an idea, by all means
fire away.
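
To make failure modes (2) and (3) a bit more concrete, here is a toy
model in C. It is only a sketch, not code from the patch: every name in
it (FreelistModel, model_init, model_alloc, model_reclaim, model_run) is
made up for illustration, and it assumes the reclaimer only gets a
chance to run once per "tick", after a fixed burst of allocations. It
does not try to model case (1), which would need per-buffer usage
tracking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a freelist that is refilled between two water marks. */
typedef struct FreelistModel
{
	int			low_water;	/* reclaimer wakes when nfree drops below this */
	int			high_water;	/* reclaimer refills until nfree reaches this */
	int			nfree;		/* buffers currently on the freelist */
	long		wakeups;	/* number of times the reclaimer was woken */
	long		dry_allocs;	/* allocations that found the freelist empty */
} FreelistModel;

/* Start with a full freelist (nfree == high_water). */
void
model_init(FreelistModel *m, int low_water, int high_water)
{
	assert(0 <= low_water && low_water < high_water);
	m->low_water = low_water;
	m->high_water = high_water;
	m->nfree = high_water;
	m->wakeups = 0;
	m->dry_allocs = 0;
}

/* One backend asks for a buffer; returns true if the freelist supplied it. */
bool
model_alloc(FreelistModel *m)
{
	if (m->nfree > 0)
	{
		m->nfree--;
		return true;
	}
	m->dry_allocs++;		/* a real backend would have to find a victim itself */
	return false;
}

/* Reclaimer: sleeps unless below the low mark, then refills toward the high mark. */
void
model_reclaim(FreelistModel *m, int budget)
{
	if (m->nfree >= m->low_water)
		return;				/* nothing to do, stay asleep */
	m->wakeups++;
	while (m->nfree < m->high_water && budget > 0)
	{
		m->nfree++;
		budget--;
	}
}

/* Each tick: 'demand' allocations, then the reclaimer gets one chance to run. */
void
model_run(FreelistModel *m, int ticks, int demand, int budget)
{
	for (int t = 0; t < ticks; t++)
	{
		for (int i = 0; i < demand; i++)
			(void) model_alloc(m);
		model_reclaim(m, budget);
	}
}
```

With a demand of 3 buffers per tick and a generous refill budget, marks
of 2 and 4 wake the reclaimer on every one of 100 ticks, while marks of
2 and 100 wake it only 3 times over the same run. Marks of 1 and 10 let
4 of the 24 allocations in 8 ticks find the freelist empty. The numbers
mean nothing in themselves; the point is only that wakeup count and
dry allocations are the two things one would want a real test to
report.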

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
