Re: Page replacement algorithm in buffer cache

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Jim Nasby'" <jim(at)nasby(dot)net>, "'Ants Aasma'" <ants(at)cybertec(dot)at>
Cc: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>, "'Greg Stark'" <stark(at)mit(dot)edu>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page replacement algorithm in buffer cache
Date: 2013-03-23 09:43:51
Message-ID: 000001ce27aa$edf71680$c9e54380$@kapila@huawei.com
Lists: pgsql-hackers

On Saturday, March 23, 2013 9:34 AM Jim Nasby wrote:
> On 3/22/13 7:27 PM, Ants Aasma wrote:
> > On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
> > One other interesting idea I have seen is closeable scalable nonzero
> > indication (C-SNZI) from scalable rw-locks [1]. The idea there is to
> > use a tree structure to dynamically stripe access to the shared lock
> > counter when contention is detected. Downside is that considerable
> > amount of shared memory is needed so there needs to be some way to
> > limit the resource usage. This is actually somewhat isomorphic to the
> > nailing idea.
> >
> > The issue with the current buffer management algorithm is that it
> > seems to scale badly with increasing shared_buffers. I think the
> > improvements should concentrate on finding out what is the problem
> > there and figuring out how to fix it. A simple idea to test would be
> > to just partition shared buffers along with the whole clock sweep
> > machinery into smaller ones, like the buffer mapping hash tables
> > already are. This should at the very least reduce contention for the
> > clock sweep even if it doesn't reduce work done per page miss.
> >
> > [1] http://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.pdf
>
> Partitioned clock sweep strikes me as a bad idea... you could certainly
> get unlucky and end up with a lot of hot stuff in one partition.
>
> Another idea that's been brought up in the past is to have something in
> the background keep a minimum number of buffers on the free list.
> That's how OS VM systems I'm familiar with work, so there's precedent
> for it.
>
> I recall there were at least some theoretical concerns about this, but
> I don't remember if anyone actually tested the idea.

I have tried one of these ideas: adding the buffers that the background
writer finds reusable to the freelist.
http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97(at)szxeml509-mbs
This can reduce clock sweep work, as a backend can take a buffer from the
freelist instead of sweeping for one.
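
To make the idea concrete, here is a minimal toy sketch in Python. It is not
the patch itself and not PostgreSQL's actual code: the names ToyBufferPool,
background_refill, get_victim and sweep_steps are made up for illustration,
and the "background writer" here is just a function that scans the whole pool.
It only shows the general shape: a background step puts buffers with a zero
usage count on a freelist, and an allocating backend tries the freelist first
and falls back to the clock sweep.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    usage_count: int = 0      # bumped on access, decremented by the sweep
    pinned: bool = False      # pinned buffers cannot be evicted
    on_freelist: bool = False


class ToyBufferPool:
    """Toy model only; names and structure are assumptions, not PostgreSQL's."""

    MAX_USAGE = 5  # cap on usage_count (an assumption of this sketch)

    def __init__(self, nbuffers):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.freelist = deque()
        self.hand = 0          # clock hand used by allocating backends
        self.sweep_steps = 0   # buffers inspected by the clock sweep

    def touch(self, buf_id):
        """Record an access to a buffer."""
        b = self.buffers[buf_id]
        b.usage_count = min(b.usage_count + 1, self.MAX_USAGE)

    def background_refill(self, target):
        """Hypothetical background step: keep up to `target` reusable
        buffers on the freelist."""
        for buf_id, b in enumerate(self.buffers):
            if len(self.freelist) >= target:
                break
            if b.usage_count == 0 and not b.pinned and not b.on_freelist:
                b.on_freelist = True
                self.freelist.append(buf_id)

    def get_victim(self):
        """Return the id of a buffer that can be reused."""
        # Fast path: take from the freelist, rechecking that the buffer
        # was not touched or pinned after it was put there.
        while self.freelist:
            buf_id = self.freelist.popleft()
            b = self.buffers[buf_id]
            b.on_freelist = False
            if b.usage_count == 0 and not b.pinned:
                return buf_id
        # Slow path: clock sweep, decrementing usage counts as it goes.
        n = len(self.buffers)
        for _ in range(n * (self.MAX_USAGE + 1)):
            buf_id = self.hand
            self.hand = (self.hand + 1) % n
            self.sweep_steps += 1
            b = self.buffers[buf_id]
            if b.pinned:
                continue
            if b.usage_count == 0:
                return buf_id
            b.usage_count -= 1
        raise RuntimeError("no unpinned buffers available")
```

In this toy model, with 8 buffers of which the first 6 have been touched
once, get_victim() has to inspect 7 buffers with the clock sweep when the
freelist is empty, and none at all if background_refill(2) ran first. The
model says nothing about the cost of the background step itself or about
I/O, which matters for the results below.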

The patch shows a performance improvement for read loads when the data fits
in shared buffers, but when the data becomes larger and I/O is involved, it
also shows some dip.

With Regards,
Amit Kapila.
