Re: BufFreelistLock

From: Jim Nasby <jim(at)nasby(dot)net>
To: Nasby Jim <Jim(at)Nasby(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Herrera Alvaro <alvherre(at)commandprompt(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BufFreelistLock
Date: 2010-12-13 21:12:04
Message-ID: 39023AD9-494D-43F3-A8DF-54DCEF9A8366@nasby.net
Lists: pgsql-hackers

On Dec 12, 2010, at 8:48 PM, Jim Nasby wrote:
>> There might be some advantage in having it move buffers
>> to a freelist that's just protected by a simple spinlock (or at least,
>> a lock different from the one that protects the clock sweep). The
>> idea would be that most of the time, backends just need to lock the
>> freelist for long enough to take a buffer off it, and don't run clock
>> sweep at all.
>
> Yeah, the clock sweep code is very intensive compared to pulling a buffer from the freelist, yet AFAICT nothing will run the clock sweep except backends. Unless I'm missing something, the free list is practically useless because buffers are only put there by InvalidateBuffer, which is only called by DropRelFileNodeBuffers and DropDatabaseBuffers. So we make backends queue up behind the freelist lock with very little chance of getting a buffer, then we make them queue up for the clock sweep lock and make them actually run the clock sweep.

Looking at the code, it seems to be pretty trivial to have SyncOneBuffer decrement the usage count of every buffer it's handed. The challenge is that the code that estimates how many buffers we need to sync looks at where the clock hand is, and I think it uses that information as part of its calculation.

So the real challenge here is coming up with a good model for how many buffers we need to sync on each pass *and* how far the clock needs to be swept. There is also (currently) an interdependency here: the LRU scan will not sync buffers that have a usage_count > 0. So unless the clock sweep runs often enough to bring usage counts down to zero, the LRU scan becomes completely useless.
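
To make that interdependency concrete, here is a toy model in Python. It is not PostgreSQL code: the Buf class and both functions are made up for illustration, and pins, locking and real I/O are ignored. It only shows that a scan which skips buffers with usage_count > 0 writes nothing until something has swept the usage counts down.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    # Hypothetical stand-in for a buffer header; only the two fields we need.
    dirty: bool
    usage_count: int


def lru_scan(pool, start, n):
    """Clean up to n buffers from start, skipping any with usage_count > 0."""
    written = 0
    for i in range(n):
        buf = pool[(start + i) % len(pool)]
        if buf.usage_count > 0:
            continue  # recently used: the scan leaves it alone
        if buf.dirty:
            buf.dirty = False  # stands in for writing the page out
            written += 1
    return written


def clock_sweep(pool, start, n):
    """Decrement the usage count of n buffers from start, as a sweep would."""
    for i in range(n):
        buf = pool[(start + i) % len(pool)]
        if buf.usage_count > 0:
            buf.usage_count -= 1


# Every buffer is dirty and was touched once since the last sweep.
pool = [Buf(dirty=True, usage_count=1) for _ in range(8)]

before = lru_scan(pool, 0, 8)  # no buffer is eligible yet
clock_sweep(pool, 0, 8)        # usage counts drop to zero
after = lru_scan(pool, 0, 8)   # now every buffer gets cleaned
```

In this model the first scan cleans nothing and the second cleans all eight buffers, which is the dependency described above in miniature.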

My thought is that the clock sweep should be scheduled the same way that OS virtual memory systems manage their free lists: they try to keep X pages on the free list at all times. We already track the rate of buffer allocations, so that can be used to estimate how many pages are consumed per cycle. On top of that we'd want some number of extra pages as headroom.
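
A rough sketch of what that scheduling could look like, again as a toy Python model rather than actual bufmgr code. The names sweep_target and refill_free_list, the headroom parameter and the max_steps cap are all assumptions for illustration; dirty and pinned buffers are ignored.

```python
def sweep_target(free_now, allocs_per_cycle, headroom):
    """How many buffers to add so the free list covers one cycle plus headroom."""
    want = allocs_per_cycle + headroom
    return max(0, want - free_now)


def refill_free_list(usage_counts, free_list, hand, need, max_steps):
    """Advance the clock hand until `need` buffers are freed or max_steps is hit.

    Buffers with a zero usage count move to the free list; the others have
    their count decremented. Returns the new position of the clock hand.
    """
    on_list = set(free_list)
    n = len(usage_counts)
    freed = 0
    steps = 0
    while freed < need and steps < max_steps:
        if hand not in on_list:
            if usage_counts[hand] == 0:
                free_list.append(hand)
                on_list.add(hand)
                freed += 1
            else:
                usage_counts[hand] -= 1
        hand = (hand + 1) % n
        steps += 1
    return hand


# Assumed numbers: 2 allocations per cycle, 1 extra buffer as headroom,
# and an empty free list to start with.
usage_counts = [0, 2, 0, 1, 0, 0]
free_list = []
need = sweep_target(len(free_list), allocs_per_cycle=2, headroom=1)
hand = refill_free_list(usage_counts, free_list, hand=0, need=need,
                        max_steps=2 * len(usage_counts))
```

The max_steps cap is there so the sweep cannot spin forever when every buffer is hot; how far it should really be allowed to go in one cycle is exactly the modelling question raised above.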
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
