Re: Move unused buffers to freelist

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Andres Freund'" <andres(at)2ndquadrant(dot)com>
Cc: "'Greg Smith'" <greg(at)2ndquadrant(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Move unused buffers to freelist
Date: 2013-06-28 04:52:59
Message-ID: 004e01ce73bb$5d85ee20$1891ca60$@kapila@huawei.com
Lists: pgsql-hackers

On Thursday, June 27, 2013 5:54 PM Robert Haas wrote:
> On Wed, Jun 26, 2013 at 8:09 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> wrote:
> > Configuration Details
> > O/S - Suse-11
> > RAM - 128GB
> > Number of Cores - 16
> > Server Conf - checkpoint_segments = 300, checkpoint_timeout = 15 min,
> > synchronous_commit = off, shared_buffers = 14GB, autovacuum = off
> > Pgbench - Select-only, Scalefactor - 1200, Time - 30 mins
> >
> >              8C-8T   16C-16T   32C-32T   64C-64T
> > Head         62403    101810     99516     94707
> > Patch        62827    101404     99109     94744
> >
> > On 128GB RAM, if we use scalefactor=1200 (database = approx. 17GB) and
> > 14GB shared buffers, there is no major difference.
> > One of the reasons could be that there is not much swapping in shared
> > buffers, as most data already fits in shared buffers.
>
> I'd like to just back up a minute here and talk about the broader
> picture here. What are we trying to accomplish with this patch? Last
> year, I did some benchmarking on a big IBM POWER7 machine (16 cores,
> 64 hardware threads). Here are the results:
>
> http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html
>
> Now, if you look at these results, you see something interesting.
> When there aren't too many concurrent connections, the higher scale
> factors are only modestly slower than the lower scale factors. But as
> the number of connections increases, the performance continues to rise
> at the lower scale factors, and at the higher scale factors, this
> performance stops rising and in fact drops off. So in other words,
> there's no huge *performance* problem for a working set larger than
> shared_buffers, but there is a huge *scalability* problem. Now why is
> that?
>
> As far as I can tell, the answer is that we've got a scalability
> problem around BufFreelistLock. Contention on the buffer mapping
> locks may also be a problem, but all of my previous benchmarking (with
> LWLOCK_STATS) suggests that BufFreelistLock is, by far, the elephant
> in the room. My interest in having the background writer add buffers
> to the free list is basically around solving that problem. It's a
> pretty dramatic problem, as the graph above shows, and this patch
> doesn't solve it. There may be corner cases where this patch improves
> things (or, equally, makes them worse) but as a general point, the
> difficulty I've had reproducing your test results and the specificity
> of your instructions for reproducing them suggests to me that what we
> have here is not a clear improvement on general workloads. Yet such
> an improvement should exist, because there are other products in the
> world that have scalable buffer managers; we currently don't. Instead
> of spending a lot of time trying to figure out whether there's a small
> win in narrow cases here (and there may well be), I think we should
> back up and ask why this isn't a great big win, and what we'd need to
> do to *get* a great big win. I don't see much point in tinkering
> around the edges here if things are broken in the middle; things that
> seem like small wins or losses now may turn out otherwise in the face
> of a more comprehensive solution.
>
> One thing that occurred to me while writing this note is that the
> background writer doesn't have any compelling reason to run on a
> read-only workload. It will still run at a certain minimum rate, so
> that it cycles the buffer pool every 2 minutes, if I remember
> correctly. But it won't run anywhere near fast enough to keep up with
> the buffer allocation demands of 8, or 32, or 64 sessions all reading
> data not all of which is in shared_buffers at top speed. In fact,
> we've had reports that the background writer isn't too effective even
> on read-write workloads. The point is - if the background writer
> isn't waking up and running frequently enough, what it does when it
> does wake up isn't going to matter very much. I think we need to
> spend some energy poking at that.

Currently it wakes up based on the bgwriter_delay config parameter, which is
200ms by default. So you mean we should think of waking up the bgwriter based
on allocations and the number of elements left in the freelist?
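
For reference, the relevant part of the bgwriter main loop today looks
roughly like the following (modeled on bgwriter.c, simplified and not the
exact code). Since the only wakeups are the timeout and the latch, a backend
that notices a near-empty freelist could wake it early with SetLatch():

    for (;;)
    {
        int     rc;

        ResetLatch(&MyProc->procLatch);

        BgBufferSync();     /* run one round of the cleaning scan */

        /* Sleep until bgwriter_delay elapses or someone sets our latch. */
        rc = WaitLatch(&MyProc->procLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       BgWriterDelay /* milliseconds, default 200 */);

        if (rc & WL_POSTMASTER_DEATH)
            exit(1);
    }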

As per my understanding, the summary of the points raised by you and Andres
that this patch should address to get a bigger win is:

1. Bgwriter needs to be improved so that it can help in reducing usage counts
and finding the next victim buffer
(run the clock sweep and add buffers to the freelist).
2. SetLatch for bgwriter (wake up bgwriter) when the number of elements in
the freelist gets low.
3. Split the single global lock (BufFreelistLock) taken in StrategyGetBuffer
(a spinlock for the freelist, and an lwlock for the clock sweep); a rough
sketch of how points 2 and 3 might fit together follows this list.
4. Separate processes for writing dirty buffers and for moving buffers to the
freelist.
5. Bgwriter needs to be more aggressive; the logic based on which it
calculates how many buffers it needs to process needs to be improved.
6. There can be contention around the buffer mapping locks, but we can focus
on that later.
7. Cacheline bouncing around the buffer header spinlocks - is there anything
we can do to reduce it?
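
To make points 2 and 3 concrete, here is a hypothetical sketch of how they
might fit together in StrategyGetBuffer(). This is illustration only, not
working PostgreSQL code; freelist_lock, numFreeBuffers,
FREELIST_LOW_WATERMARK, ClockSweepLock and bgwriterLatch are assumed names,
not existing fields:

    static volatile BufferDesc *
    StrategyGetBuffer_sketch(void)
    {
        volatile BufferDesc *buf = NULL;
        bool        wake_bgwriter = false;

        /* Point 3: pop from the freelist under a cheap spinlock. */
        SpinLockAcquire(&StrategyControl->freelist_lock);
        if (StrategyControl->firstFreeBuffer >= 0)
        {
            buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
            StrategyControl->firstFreeBuffer = buf->freeNext;
            buf->freeNext = FREENEXT_NOT_IN_LIST;
            StrategyControl->numFreeBuffers--;
        }
        /* Point 2: note whether the bgwriter should be woken. */
        if (StrategyControl->numFreeBuffers < FREELIST_LOW_WATERMARK)
            wake_bgwriter = true;
        SpinLockRelease(&StrategyControl->freelist_lock);

        /* SetLatch outside the spinlock keeps the critical section short. */
        if (wake_bgwriter && StrategyControl->bgwriterLatch)
            SetLatch(StrategyControl->bgwriterLatch);

        if (buf != NULL)
            return buf;         /* caller rechecks pin/usage as today */

        /* Point 3: fall back to the clock sweep under its own lwlock. */
        LWLockAcquire(ClockSweepLock, LW_EXCLUSIVE);
        /* ... advance nextVictimBuffer, decrement usage_count, pick victim ... */
        LWLockRelease(ClockSweepLock);

        return buf;
    }

Whether the watermark should be a constant or scale with NBuffers, and
whether the wakeup should be rate-limited, are open questions.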

Kindly let me know if I have missed any point.

With Regards,
Amit Kapila.
