Reducing overhead of frequent table locks

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Reducing overhead of frequent table locks
Date: 2011-05-13 20:16:13
Message-ID: 20110513201613.GA22947@tornado.gateway.2wire.net
Lists: pgsql-hackers

On Fri, May 13, 2011 at 09:07:34AM -0400, Robert Haas wrote:
> Actually, it's occurred to me from time to time that it would be nice
> to eliminate ACCESS SHARE (and while I'm dreaming, maybe ROW SHARE and
> ROW EXCLUSIVE) locks for tables as well. Under normal operating
> conditions (i.e. no DDL running), these locks generate a huge amount
> of lock manager traffic even though none of the locks conflict with
> each other. Unfortunately, I don't really see a way to make this
> work. But maybe it would at least be possible to create some sort of
> fast path. For example, suppose every backend opens a file and uses
> that file to record lock tags for the objects on which it is taking
> "weak" (ACCESS SHARE/ROW SHARE/ROW EXCLUSIVE) locks on. Before taking
> a "strong" lock (anything that conflicts with one of those lock
> types), the exclusive locker is required to open all of those files
> and transfer the locks into the lock manager proper. Of course, it's
> also necessary to nail down the other direction: you have to have some
> way of making sure that the backend can't record in its local file a
> lock that would have conflicted had it been taken in the actual lock
> manager. But maybe there's some lightweight way we could detect that,
> as well. For example, we could keep, say, a 1K array in shared
> memory, representing a 1024-way partitioning of the locktag space.
> Each byte is 1 if there are any "strong" locks on objects with that
> locktag in the lock manager, and 0 if there are none (or maybe you
> need a 4K array with exact counts, for bookkeeping). When a backend
> wants to take a "weak" lock, it checks the array: if it finds a 0 then
> it just records the lock in its file; otherwise, it goes through the
> lock manager. When a backend wants a "strong" lock, it first sets the
> byte (or bumps the count) in the array, then transfers any existing
> weak locks from individual backends to the lock manager, then tries to
> get its own lock. Possibly the array operations could be done with
> memory synchronization primitives rather than spinlocks, especially on
> architectures that support an atomic fetch-and-add. Of course I don't
> know quite how we recover if we try to do one of these "lock
> transfers" and run out of shared memory... and overall I'm hand-waving
> here quite a bit, but in theory it seems like we ought to be able to
> rejigger this locking so that we reduce the cost of obtaining a "weak"
> lock, perhaps at the expense of making it more expensive to obtain a
> "strong" lock, which are relatively rare by comparison.
>
> <end of rambling digression>

The key is putting a rapid hard stop to all fast-path lock acquisitions and
then reconstructing a valid global picture of the affected lock table regions.
Your 1024-way table of strong lock counts sounds promising. (Offhand, I do
think they would need to be counts, not just flags.)

If I'm understanding correctly, your pseudocode would look roughly like this:

    if (level >= ShareUpdateExclusiveLock)
        ++strong_lock_counts[my_strong_lock_count_partition]
        sfence
        if (strong_lock_counts[my_strong_lock_count_partition] == 1)
            /* marker 1 */
            import_all_local_locks
        normal_LockAcquireEx
    else if (level <= RowExclusiveLock)
        lfence
        if (strong_lock_counts[my_strong_lock_count_partition] == 0)
            /* marker 2 */
            local_only
            /* marker 3 */
        else
            normal_LockAcquireEx
    else
        normal_LockAcquireEx
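
To make that concrete, here is a rough standalone C rendering of the same
flow.  It is only a sketch: C11 atomics stand in for whatever fence
primitives the backend would really use, and locktag_partition, local_only,
import_all_local_locks and normal_LockAcquireEx are placeholder names, not
existing functions.

    #include <stdatomic.h>
    #include <stdint.h>

    #define STRONG_LOCK_PARTITIONS 1024

    /* Shared-memory array: one counter per partition of the locktag space. */
    static atomic_uint strong_lock_counts[STRONG_LOCK_PARTITIONS];

    /* Hash a locktag down to its strong-lock partition. */
    static unsigned
    locktag_partition(uint32_t locktag_hash)
    {
        return locktag_hash % STRONG_LOCK_PARTITIONS;
    }

    /* Stubs standing in for the real acquisition paths. */
    static void local_only(uint32_t locktag_hash) { (void) locktag_hash; }
    static void import_all_local_locks(unsigned partition) { (void) partition; }
    static void normal_LockAcquireEx(uint32_t locktag_hash) { (void) locktag_hash; }

    /* Weak lock: ACCESS SHARE / ROW SHARE / ROW EXCLUSIVE. */
    static void
    weak_lock_acquire(uint32_t locktag_hash)
    {
        unsigned    part = locktag_partition(locktag_hash);

        /* The load is the lfence: no strong lock, no lock-manager traffic. */
        if (atomic_load(&strong_lock_counts[part]) == 0)
            local_only(locktag_hash);       /* markers 2..3 */
        else
            normal_LockAcquireEx(locktag_hash);
    }

    /* Strong lock: SHARE UPDATE EXCLUSIVE and above. */
    static void
    strong_lock_acquire(uint32_t locktag_hash)
    {
        unsigned    part = locktag_partition(locktag_hash);

        /*
         * Bump the counter first; the atomic RMW supplies the sfence.  It
         * must be a count, not a flag: a second strong locker in the same
         * partition must not clear the indicator when it releases first.
         */
        if (atomic_fetch_add(&strong_lock_counts[part], 1) == 0)
        {
            /* marker 1: pull everyone's fast-path locks into the lock manager */
            import_all_local_locks(part);
        }
        normal_LockAcquireEx(locktag_hash);
    }

A strong locker's release path would, of course, decrement the same counter.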

At marker 1, we need to block until no code is running between markers two and
three. You could do that with a per-backend lock (LW_SHARED by the strong
locker, LW_EXCLUSIVE by the backend). That would probably still be a win over
the current situation, but it would be nice to have something even cheaper.
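
Sketched with POSIX rwlocks standing in for LWLocks (the array size and all
names are made up), that per-backend gate could look like this:

    #include <pthread.h>

    #define MAX_BACKENDS 128

    /* One gate per backend; in shared memory in the real thing. */
    static pthread_rwlock_t fastpath_gate[MAX_BACKENDS];

    static void
    fastpath_gate_init(void)
    {
        for (int i = 0; i < MAX_BACKENDS; i++)
            pthread_rwlock_init(&fastpath_gate[i], NULL);
    }

    /* Weak locker: hold its own gate exclusively across markers 2..3. */
    static void
    fastpath_weak_acquire(int my_backend_id)
    {
        pthread_rwlock_wrlock(&fastpath_gate[my_backend_id]);
        /* marker 2: re-check strong_lock_counts, then record the lock locally */
        /* marker 3 */
        pthread_rwlock_unlock(&fastpath_gate[my_backend_id]);
    }

    /*
     * Strong locker, at marker 1: take each backend's gate in shared mode,
     * waiting out any acquisition currently between markers 2 and 3, then
     * transfer that backend's fast-path locks.  Shared mode keeps two
     * concurrent strong lockers from serializing against each other.
     */
    static void
    transfer_all_fastpath_locks(void)
    {
        for (int i = 0; i < MAX_BACKENDS; i++)
        {
            pthread_rwlock_rdlock(&fastpath_gate[i]);
            /* move backend i's locally recorded weak locks into the lock table */
            pthread_rwlock_unlock(&fastpath_gate[i]);
        }
    }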

Then you have the actual procedure for transfer of local locks to the global
lock manager. Using record locks in a file could work, but that's a system call
per lock acquisition regardless of the presence of strong locks. Is that cost
sufficiently trivial? I wonder if, instead, we could signal all backends at
marker 1 to dump the applicable parts of their local (memory) lock tables to
files. Or to another shared memory region, if that didn't mean statically
allocating the largest possible required amount. If we were willing to wait
until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the
global insertions directly. That might require a fair amount of bug swatting to
fill in missing CHECK_FOR_INTERRUPTS calls, though.
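
For scale, the record-lock variant above means roughly one fcntl() per weak
acquisition, along these lines (the file descriptor and the tag-to-offset
mapping are hypothetical):

    #include <fcntl.h>
    #include <stdint.h>

    /* Record one weak lock as a shared record lock in the backend's lock file. */
    static int
    record_weak_lock(int lock_fd, uint32_t locktag_hash)
    {
        struct flock fl = {0};

        fl.l_type = F_RDLCK;                /* weak locks never conflict */
        fl.l_whence = SEEK_SET;
        fl.l_start = (off_t) locktag_hash;  /* one byte of the file per locktag */
        fl.l_len = 1;

        /* One system call per acquisition, strong locks present or not. */
        return fcntl(lock_fd, F_SETLK, &fl);
    }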

Improvements in this area would also have good synergy with efforts to reduce
the global impact of temporary table usage. CREATE TEMP TABLE can be the
major source of AccessExclusiveLock acquisitions. However, with the strong
lock indicator partitioned 1024 ways or more, that shouldn't be a deal killer.

nm
