Thinking about breaking up the BufMgrLock

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Thinking about breaking up the BufMgrLock
Date: 2005-02-07 00:30:37
Message-ID: 17716.1107736237@sss.pgh.pa.us
Lists: pgsql-hackers

We've been speculating for a while about ways to reduce contention for
the BufMgrLock, in hopes of fixing the "context swap storm" problem
exhibited by certain patterns of concurrent queries. The traditional
way to fix such problems is to divide what is protected by the lock
into finer-grain lockable objects. (I'm not totally convinced that this
approach can help us, because more locks means more low-level locking
overhead. But let's pursue it and see where it goes; we sure don't have
any other ideas at the moment.) Neil Conway took a stab at this about a
year ago:
http://archives.postgresql.org/pgsql-hackers/2004-01/msg00173.php
but that approach had some problems IMHO, and in any case he never got
the patch to the point of working. Here's a bit of fresh thinking about
the problem.

To fix the test cases we have been looking at, there are two things
we need to fix the performance of: ReadBuffer for the case where the
requested page is already in a shared buffer, and ReleaseBuffer. The
test cases essentially put multiple backends into tight loops doing
these operations. These cases do not require I/O or any kernel call,
so ideally they should be fast, but what we are seeing is that they
suffer from excessive contention for the BufMgrLock.

ReadBuffer needs to do a lookup to map the page ID to a buffer ID,
which in principle requires only a shared lock on the page-to-buffer
mapping (embodied in the buf_table hash table). Assuming success, it
also needs to mark the buffer pinned and update the LRU-list position
of the buffer. Marking pinned is certainly a buffer-local change,
so we could imagine splitting up the BufMgrLock like this:

1. A global LWLock for the page-to-buffer mapping, call it say
BufMappingLock. Share lock on this is sufficient to allow reading the
hash table; must get exclusive lock when reassigning any buffer's page
identity.

2. A global LWLock for the LRU strategy information, say BufLRULock
or BufStrategyLock.

3. Per-buffer LWLocks (or maybe just spinlocks) protecting the state of
each buffer header; you need this lock to examine or change a buffer
header.
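
To make that concrete, here is a rough data-structure sketch, assuming
PostgreSQL's existing LWLock and spinlock primitives. Every other name
below (SketchBufferDesc, hdr_lock, and friends) is invented for
illustration, not a worked-out patch:

    #include "storage/buf_internals.h"  /* BufferTag */
    #include "storage/lwlock.h"         /* LWLock, LWLockAcquire */
    #include "storage/spin.h"           /* slock_t, SpinLockAcquire */

    /* Two global locks replacing the single BufMgrLock: */
    LWLock *BufMappingLock;     /* guards buf_table (page ID -> buffer ID) */
    LWLock *BufLRULock;         /* guards the LRU/strategy lists */

    /* Per-buffer header; the spinlock covers only this buffer's state. */
    typedef struct SketchBufferDesc
    {
        BufferTag   tag;        /* page identity; changes only under
                                 * exclusive BufMappingLock */
        slock_t     hdr_lock;   /* per-buffer spinlock (case 3) */
        int         refcount;   /* pin count */
        int         flags;      /* status bits */
    } SketchBufferDesc;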

ReleaseBuffer has no locking problems in this formulation: it just grabs
the per-buffer lock, decrements the buffer's refcount, and releases the
lock.
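
In code form that fast path is trivial, continuing the sketch above:

    static void
    SketchReleaseBuffer(SketchBufferDesc *buf)
    {
        SpinLockAcquire(&buf->hdr_lock);
        buf->refcount--;                   /* drop our pin */
        SpinLockRelease(&buf->hdr_lock);
        /* note that no global lock is taken anywhere */
    }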

ReadBuffer looks like:

* Acquire share lock on BufMappingLock.
* Search hash table for desired page ID (we assume success).
* Acquire per-buffer lock.
* Increment buffer refcount.
* Release per-buffer lock.
* Release share lock on BufMappingLock.
* Update the LRU state.

(We have to bump the pin count on the target buffer before releasing the
BufMappingLock, otherwise someone could reassign the buffer as soon as
we release BufMappingLock.)
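
Here is the same sequence as C, again continuing the sketch above;
buf_table_lookup() and update_lru_state() are invented placeholders for
the hash probe and for whatever the LRU bookkeeping turns out to be:

    static SketchBufferDesc *
    SketchReadBuffer(BufferTag *tag)
    {
        SketchBufferDesc *buf;

        LWLockAcquire(BufMappingLock, LW_SHARED);
        buf = buf_table_lookup(tag);       /* assume we get a hit */

        SpinLockAcquire(&buf->hdr_lock);
        buf->refcount++;                   /* pin while the mapping is
                                            * still locked, so the buffer
                                            * can't be reassigned under us */
        SpinLockRelease(&buf->hdr_lock);

        LWLockRelease(BufMappingLock);

        update_lru_state(buf);             /* the problem step: wants
                                            * exclusive access to global
                                            * LRU data, per below */
        return buf;
    }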

This would be pretty good from a locking point of view, except that
"update the LRU state" seems to require taking an exclusive lock on a
global data structure, which puts us about back where we were.
Contention for a global BufLRULock might be a bit better than for the
existing overall BufMgrLock, but it'll still cripple the performance
of ReadBuffer.

Perhaps someone knows of a neat data structure that can maintain an LRU
list with only local updates? I don't, though.

The best idea I've had about it is to add flag bits to the per-buffer
state that essentially indicate "pending move to head of T1" or "pending
move to head of T2". We would make ReadBuffer set the appropriate bit
when it grabs a buffer, but not actually update the global LRU state.
Operations such as the periodic bgwriter scan for work to do, or an
attempt to locate a reusable buffer, would scan the LRU list from tail
to front. When they hit a buffer that's got one of these bits set, they
would not process it immediately, but would move it to the list head and
clear the bit. Doing this would require a write lock on the LRU list, as
well as the per-buffer lock on each buffer as it's examined or changed,
but *not* BufMappingLock. So these operations wouldn't
interfere too badly with the success path in ReadBuffer.
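
Here is a sketch of such a scan, simplified to a single list rather
than ARC's T1/T2 pair; BUF_PENDING_HEAD stands in for the pending-move
bits, and the lru_* list helpers are invented:

    static SketchBufferDesc *
    SketchScanForVictim(void)
    {
        SketchBufferDesc *buf;
        SketchBufferDesc *prev;

        LWLockAcquire(BufLRULock, LW_EXCLUSIVE);  /* write lock on list */
        for (buf = lru_tail(); buf != NULL; buf = prev)
        {
            prev = lru_prev(buf);                 /* save before any move */

            SpinLockAcquire(&buf->hdr_lock);
            if (buf->flags & BUF_PENDING_HEAD)
            {
                /* Recently referenced: apply the deferred promotion
                 * instead of considering this buffer for reuse. */
                buf->flags &= ~BUF_PENDING_HEAD;
                SpinLockRelease(&buf->hdr_lock);
                lru_move_to_head(buf);            /* we hold BufLRULock */
                continue;
            }
            if (buf->refcount == 0)
            {
                buf->refcount++;                  /* pin the victim so no
                                                   * one grabs it meanwhile */
                SpinLockRelease(&buf->hdr_lock);
                break;
            }
            SpinLockRelease(&buf->hdr_lock);      /* pinned; keep scanning */
        }
        LWLockRelease(BufLRULock);
        return buf;                               /* NULL if none reusable */
    }

Note that BufMappingLock is never touched here, which is the point: the
scan can proceed concurrently with ReadBuffer hits.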

This would convert the existing "strict LRU" behavior into an
"approximate LRU". I'm worried that the change might be penny-wise and
pound-foolish: if a poorer cache management algorithm causes us to have
to do more I/O, it's probably a net loss to save some lock contention.
But without a smart idea about data structures I don't see how to do
better.

Thoughts?

regards, tom lane
