Re: Readme of Buffer Management seems to have wrong sentence

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Readme of Buffer Management seems to have wrong sentence
Date: 2012-05-22 20:36:00
Message-ID: CA+CSw_t-qgt7d7iN_Tb4zCXi-OTAr+p-F-EcAfX46x=RUcvmqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 22, 2012 at 8:28 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Totally agreed.  We're not the first people to think of this, either:
> CLOCK and GLOCK have been extensively studied and found to be almost
> as good as LRU in selecting good victim pages, but with less
> contention.  That's why people are using them.
>
> Here's a paper that defines GLOCK to be the algorithm we use (page 2,
> second column, second paragraph from the bottom), and furthermore
> mentions PostgreSQL (top of page 3):
>
> http://staff.aist.go.jp/m.yui/publications/ICDE10_conf_full_409.pdf

Interesting paper, they make the whole buffer management lock free.
Getting rid of not only the BufFreelistLock but also BufMappingLocks
and buffer header spinlocks. Now AFAIK BufMappingLocks and buffer
header aren't really a big issue for us, and atleast the latter looks
like turning it lock-free would entail lots of pretty hairy and
bug-prone code. On the other hand, making the clock sweep lock-free
looks relatively easy.

As far as I can see there is no reason why nextVictimBuffer couldn't
be read, incremented and atomically cmpxchg'ed to let other continue
before checking for the usage count. Likewise completePasses and
numBufferAllocs can easily be atomically incremented, bgwriter
shouldn't mind if the values are slightly out of date. The free list
itself is a bit trickier, but if it's still necessary/useful then
SC->firstFreeBuffer and buf->freeNext are in effect a linked-list
stack, there should plenty of tested lock free algorithms floating
around for that. (btw. lastFreeBuffer looks like dead code, is that
correct?)

As for better buffer management algorithms, have you read about
CLOCK-Pro? [1] It looks like it's an adaptive variant of LIRS, the
algorithm the MySQL uses (or atleast used some time ago). Looks like
Linux kernel devs also at least thought about implementing it [2] [3]
(hard to tell exactly, their docs are pretty chaotic compared pg).
According to [4], LIRS is almost as good as or better than LRU, by
extension I'd expect CLOCK-Pro to be better than GLOCK. It still has a
single clock mechanism, so better replacement policies won't help a
whole lot with lock contention on buffer allocation.

[1] http://www.cse.ohio-state.edu/~fchen/paper/papers/usenix05.pdf
[2] http://linux-mm.org/ClockProApproximation
[3] http://linux-mm.org/PageReplacementDesign
[4] http://www.almaden.ibm.com/cs/people/dmodha/ARC.pdf

Cheers,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2012-05-22 21:06:23 Re: Changing the concept of a DATABASE
Previous Message Stephen Frost 2012-05-22 20:35:22 Re: Per-Database Roles