Re: Readme of Buffer Management seems to have wrong sentence

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Readme of Buffer Management seems to have wrong sentence
Date: 2012-05-22 17:28:36
Message-ID: CA+TgmoaBgkBFpfriD39MSdHFg1Ofuv+8+F3L-Bo8mSeD02dC1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 22, 2012 at 12:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> If we're going to throw our current algorithm over wholesale, I'd
>> rather use some approach that has been demonstrated to work well in
>> other systems.  Buffer eviction is a problem that's been around since
>> the 1970s, and our algorithm is just about that old.
>
> Um, if you're claiming that that code dates from Berkeley days, you're
> quite mistaken.

Note the code: the algorithm. I believe that what we have implemented
is GCLOCK, which is very old.

In the interest of full disclosure, descriptions of exactly what
GCLOCK is seem to vary a bit from paper to paper, but at least some
papers describe the exact algorithm we use.

> We adopted the clock sweep in 2005, after trying a few
> other things whose performance was worse.  I've not seen any argument in
> this thread that suggests we should abandon clock sweep + usage counts
> entirely.  Rather, to me the issue is that we haven't completely gotten
> rid of the last vestiges of the old global freelist approach.

Well, I think that switching from one clock sweep to a clock sweep per
backend would be basically an abandonment of the current approach.
The results might be better or worse, but they'd surely be different.

> BTW, it might be worth pointing out something I was trying to explain
> to Amit at PGCon: the key reason that we went with usage counters rather
> than something like a global LRU chain is that in the fast path where
> ReadBuffer finds the requested block already in a buffer, it does not
> have to contend for any global data structure to update the buffer's
> usage information.  It just has to bump the usage count in the buffer's
> header.  The free list, and the contention for BufFreelistLock, only
> matter in the slow path where you're going to have to incur I/O anyway
> (or at least a visit to the kernel).  That seemed plenty good enough
> in 2005.  Our ambitions have now advanced further, so I'm on board with
> trying to reduce contention here too, but I think it would be a mistake
> to make the fast case any slower.

Totally agreed. We're not the first people to think of this, either:
CLOCK and GLOCK have been extensively studied and found to be almost
as good as LRU in selecting good victim pages, but with less
contention. That's why people are using them.

Here's a paper that defines GLOCK to be the algorithm we use (page 2,
second column, second paragraph from the bottom), and furthermore
mentions PostgreSQL (top of page 3):

http://staff.aist.go.jp/m.yui/publications/ICDE10_conf_full_409.pdf

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2012-05-22 17:33:34 Re: Per-Database Roles
Previous Message Stephen Frost 2012-05-22 17:28:07 Re: Per-Database Roles