Re: Scaling shared buffer eviction

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-23 14:31:24
Message-ID: CA+Tgmob6yOedteBB461grFnoSV2MXxD0VqGq_f0xTHdbAm7Nnw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 19, 2014 at 7:21 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> Specific numbers of both the configurations for which I have
> posted data in previous mail are as follows:
>
> Scale Factor - 800
> Shared_Buffers - 12286MB (Total db size is 12288MB)
> Client and Thread Count = 64
> buffers_touched_freelist - count of buffers that backends found touched after
> popping from the freelist.
> buffers_backend_clocksweep - count of buffer allocations not satisfied from
> the freelist.
>
> buffers_alloc 1531023 buffers_backend_clocksweep 0
> buffers_touched_freelist 0
>

I didn't believe these numbers, so I did some testing. I used the same
configuration you mention here, scale factor = 800, shared_buffers = 12286
MB, and I also saw buffers_backend_clocksweep = 0. I didn't see
buffers_touched_freelist showing up anywhere, so I don't know whether that
would have been zero or not. Then I tried reducing the high watermark for
the freelist from 2000 buffers to 25 buffers, and
buffers_backend_clocksweep was *still* 0. At that point I started to smell
a rat. It turns out that, with this test configuration, there's no buffer
allocation going on at all. Everything fits in shared_buffers, or it did
on my test. I had to reduce shared_buffers down to 10491800kB before I got
any significant buffer eviction.

At that level, a 100-buffer high watermark wasn't sufficient to prevent the
freelist from occasionally going empty. A 2000-buffer high watermark was
by and large sufficient, although I was able to see small numbers of
buffers being allocated via clocksweep right at the very beginning of the
test, I guess before the reclaimer really got cranking. So the watermarks
seem to be broadly in the right ballpark, but I think the statistics
reporting needs improving. We need an easy way to measure the amount of
work that bgreclaimer is actually doing.

I suggest we count these things:

1. The number of buffers the reclaimer has put back on the free list.
2. The number of times a backend has run the clocksweep.
3. The number of buffers past which the reclaimer has advanced the clock
sweep (i.e. the number of buffers it had to examine in order to reclaim the
number counted by #1).
4. The number of buffers past which a backend has advanced the clocksweep
(i.e. the number of buffers it had to examine in order to allocate the
number of buffers counted by #2).
5. The number of buffers allocated from the freelist which the backend did
not use because they'd been touched (what you're calling
buffers_touched_freelist).

It's hard to come up with good names for all of these things that are
consistent with the somewhat wonky existing names. Here's an attempt:

1. bgreclaim_freelist
2. buffers_alloc_clocksweep (you've got buffers_backend_clocksweep, but I
think we want to make it more parallel with buffers_alloc, which is the
number of buffers allocated, not buffers_backend, the number of buffers
*written* by a backend)
3. clocksweep_bgreclaim
4. clocksweep_backend
5. freelist_touched

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
