Re: Wait free LW_SHARED acquisition - v0.2

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wait free LW_SHARED acquisition - v0.2
Date: 2014-10-21 14:26:05
Message-ID: CAA4eK1+2Tk_geUAUepgNiX-mhtL=KN5_0sxp-6xN5q3Xi=suJQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> > 2.
> > LWLockWakeup()
> > {
> > ..
> > #ifdef LWLOCK_STATS
> > lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> > #else
> > SpinLockAcquire(&lock->mutex);
> > #endif
> > ..
> > }
> >
> > Earlier, while releasing the lock, we didn't count it towards the
> > LWLock stats spin_delay_count. I think if we see other places in
> > lwlock.c, it only gets counted when we try to acquire it in a loop.
>
> I think the previous situation was clearly suboptimal. I've now modified
> things so all spinlock acquisitions are counted.

The code has mainly four stats (sh_acquire_count, ex_acquire_count,
block_count, spin_delay_count) to track; if I look at all the stats
together to understand a contention situation, the unpatched code
makes sense. spin_delay_count gives how much delay was incurred to
acquire the spinlock, which when combined with the other stats gives
a clear picture of the contention around acquisition of the
corresponding LWLock. Now, if we count the spinlock delay for the
Release call as well, the meaning of the stat changes. The new
meaning of spin_delay_count might be more useful in some situations,
but the older one has its own benefits, so I am not sure that
changing this as part of this patch is the best thing to do.
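
To spell out the difference (a minimal sketch; the #ifdef shape is
taken from the quoted code, the comments are mine):

    /* Acquire side: spins here have always counted towards the stat. */
    #ifdef LWLOCK_STATS
    lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
    #else
    SpinLockAcquire(&lock->mutex);
    #endif

    /*
     * Release/wakeup side: un-patched this was a plain SpinLockAcquire(),
     * so spin_delay_count measured acquisition-side contention only; the
     * patch counts spins here as well, which is what changes the meaning
     * of the stat.
     */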

> > 3.
> > LWLockRelease()
> > {
> > ..
> > /* grant permission to run, even if a spurious share lock increases
> > lockcount */
> > else if (mode == LW_EXCLUSIVE && have_waiters)
> > check_waiters = true;
> > /* nobody has this locked anymore, potential exclusive lockers get a
> > chance */
> > else if (lockcount == 0 && have_waiters)
> > else if (lockcount == 0 && have_waiters)
> > check_waiters = true;
> > ..
> > }
> >
> > It seems comments have been reversed in above code.
>
> No, they look right. But I've expanded them in the version I'm going to
> post in a couple minutes.

okay.
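
For the archive, my reading of those two branches is roughly the
following (variable names from the quoted snippet; the comments are my
own paraphrase, not the patch text):

    if (mode == LW_EXCLUSIVE && have_waiters)
    {
        /*
         * We held the lock exclusively, so no waiter could have run.
         * Check waiters even if lockcount != 0, because a concurrent
         * shared acquirer may have spuriously bumped lockcount before
         * backing off again.
         */
        check_waiters = true;
    }
    else if (lockcount == 0 && have_waiters)
    {
        /* last shared holder is gone: exclusive waiters get a chance */
        check_waiters = true;
    }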

> > 5.
> > LWLockWakeup()
> > {
> > ..
> > dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> > {
> > PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
> > LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> > dlist_delete(&waiter->lwWaitLink);
> > pg_write_barrier();
> > waiter->lwWaiting = false;
> > PGSemaphoreUnlock(&waiter->sem);
> > }
> > ..
> > }
> >
> > Why can't we decrement the nwaiters after waking up? I don't think
> > there is any major problem even if callers do that themselves, but
> > in some rare cases LWLockRelease() might spuriously assume that
> > there are some waiters and try to call LWLockWakeup(). Although
> > this doesn't create any problem, keeping the value sane is good unless
> > there is some problem in doing so.
>
> That won't work because then LWLockWakeup() wouldn't be called when
> necessary - precisely because nwaiters is 0.

> The reason I've done so is that it's otherwise much harder to debug
> issues where there are backends that have been woken up already, but
> haven't rerun yet. Without this there's simply no evidence of that
> state. I can't see this being relevant for performance, so I'd rather
> have it stay that way.

I am not sure what useful information we can get during debugging by
not doing this in LWLockWakeup(), and w.r.t. performance it can lead
to an extra function call and a few checks, and I think in some cases
can even acquire/release the spinlock.
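
For concreteness, the variant I am suggesting would look roughly like
this (assuming the patch's atomic nwaiters counter;
pg_atomic_fetch_sub_u32() is from the new atomics API, and this is a
sketch, not the patch text):

    dlist_foreach_modify(iter, (dlist_head *) &wakeup)
    {
        PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);

        dlist_delete(&waiter->lwWaitLink);

        /* decrement here, so the wakee needn't do it after re-running */
        pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);

        pg_write_barrier();
        waiter->lwWaiting = false;
        PGSemaphoreUnlock(&waiter->sem);
    }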

> > 6.
> > LWLockWakeup()
> > {
> > ..
> > dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> > {
> > ..
> > if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
> > continue;
> > ..
> > if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
> > {
> > ..
> > wokeup_somebody = true;
> > }
> > ..
> > }
> > ..
> > }
> >
> > a.
> > IIUC above logic, if the waiter queue is as follows:
> > (S-Shared; X-Exclusive) S X S S S X S S
> >
> > it can skip the exclusive waiters and release shared waiter.
> >
> > If my understanding is right, then I think instead of continue, there
> > should be *break* in above logic.
>
> No, it looks correct to me. What happened is that the first S was woken
> up. So there's no point in waking up an exclusive locker, but further
> non-exclusive lockers can be woken up.

Okay, but even then it changes the current wakeup logic, which I am
not sure is what this patch is intended for.
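
To make the behavioural difference concrete, here is the quoted scan
again with comments tracing the S X S S S X S S example (illustration
only, names from the quoted snippet):

    dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
    {
        PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);

        /*
         * 'continue' skips each X once somebody was woken, so every S
         * in S X S S S X S S gets woken; a 'break' here would stop the
         * scan at the first X, waking only the leading S.
         */
        if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
            continue;

        dlist_delete(&waiter->lwWaitLink);
        dlist_push_tail(&wakeup, &waiter->lwWaitLink);

        if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
            wokeup_somebody = true;
    }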

> > b.
> > Consider below sequence of waiters:
> > (S-Shared; X-Exclusive) S S X S S
> >
> > I think as per the un-patched code, it will wake up waiters up till
> > (including) the first Exclusive, but the patch will wake up till
> > (*excluding*) the first Exclusive.
>
> I don't think the current code does that.

LWLockRelease()
{
..
	/*
	 * If the front waiter wants exclusive lock, awaken him only.
	 * Otherwise awaken as many waiters as want shared access.
	 */
	if (proc->lwWaitMode != LW_EXCLUSIVE)
	{
		while (proc->lwWaitLink != NULL &&
			   proc->lwWaitLink->lwWaitMode != LW_EXCLUSIVE)
		{
			if (proc->lwWaitMode != LW_WAIT_UNTIL_FREE)
				releaseOK = false;
			proc = proc->lwWaitLink;
		}
	}
	/* proc is now the last PGPROC to be released */
	lock->head = proc->lwWaitLink;
	proc->lwWaitLink = NULL;
..
}

In the above code, if the first waiter to be woken up is an exclusive
waiter, it wakes only that waiter; otherwise it wakes shared waiters
until it reaches the first exclusive waiter, and then that exclusive
waiter as well.

> And it'd be a pretty pointless
> behaviour, leading to useless increased contention. The only time it'd
> make sense for X to be woken up is when it gets run faster than the S
> processes.

Do we get any major benefit by changing the logic of waking up waiters?

> > 7.
> > LWLockWakeup()
> > {
> > ..
> > dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> > {
> > ..
> > dlist_delete(&waiter->lwWaitLink);
> > dlist_push_tail(&wakeup, &waiter->lwWaitLink);
> > ..
> > }
> > ..
> > }
> >
> > Use of dlist has simplified the code, but I think there might be a slight
> > overhead of maintaining the wakeup queue as compared to the un-patched
> > mechanism, especially when there is a long waiter queue.
>
> I don't see that as being relevant. The difference is an instruction or
> two - in the slow path we'll enter the kernel and sleep. This doesn't
> matter in comparison.

Okay; however, I see that Robert has also raised a point on this
issue, which I am not sure has been concluded.

> And the code is *so* much more readable.

The code is more readable, but I don't understand why you want to do
this refactoring as part of this patch, which ideally doesn't get any
benefit from it.

> > 8.
> > LWLockConditionalAcquire()
> > {
> > ..
> > /*
> > * We ran into an exclusive lock and might have blocked another
> > * exclusive lock from taking a shot because it took a time to back
> > * off. Retry till we are either sure we didn't block somebody (because
> > * somebody else certainly has the lock) or till we got it.
> > *
> > * We cannot rely on the two-step lock-acquisition protocol as in
> > * LWLockAcquire because we're not using it.
> > */
> > if (potentially_spurious)
> > {
> > SPIN_DELAY();
> > goto retry;
> > }
> > ..
> > }
> >
> > Due to the above logic, I think it can keep on retrying for a long time
> > before it actually concludes whether it got the lock or not, in case
> > other backend(s) take the Exclusive lock after *double_check* and
> > release it before the unconditional increment of the shared lock in
> > function LWLockAttemptLock. I understand that it might be difficult to
> > have such a practical scenario, however there is still a theoretical
> > possibility of the same.
>
> I'm not particularly concerned. We could optimize it a bit, but I really
> don't think it's necessary.

No issues. I was slightly worried about cases where this API wasn't
supposed to take time earlier (like for the content lock in
BufferAlloc) but now starts taking time. I am not able to see any
specific issue, so let's proceed with the code as you have it in the
patch and optimize later in case we face any problem.

> > Is there any advantage of retrying in LWLockConditionalAcquire()?
>
> It's required for correctness. We only retry if we potentially blocked
> an exclusive acquirer (by spuriously incrementing/decrementing lockcount
> with 1). We need to be sure to either get the lock (in which case we can
> wake up the waiter on release), or be sure that we didn't disturb
> anyone.

Okay, got the point.
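
For readers of the archive, the race that makes the retry necessary is
roughly the following (lockcount and EXCLUSIVE_LOCK are names from the
patch; the exact atomics calls here are my assumption):

    /* Shared fast path: increment first, check for an exclusive holder after. */
    uint32 oldstate = pg_atomic_fetch_add_u32(&lock->lockcount, 1);

    if (oldstate & EXCLUSIVE_LOCK)
    {
        /*
         * Spurious acquisition: undo it. But while lockcount was
         * transiently raised, a concurrent exclusive acquirer may have
         * seen it and backed off, so we must retry until we either get
         * the lock (and can wake waiters on release) or are sure we
         * didn't disturb anyone.
         */
        pg_atomic_fetch_sub_u32(&lock->lockcount, 1);
        *potentially_spurious = true;
    }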

> > 9.
> > LWLockAcquireOrWait()
> > {
> > ..
> > /*
> > * NB: We're using nearly the same twice-in-a-row lock acquisition
> > * protocol as LWLockAcquire(). Check its comments for details.
> > */
> > mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious_first);
> >
> > if (mustwait)
> > {
> > LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);
> >
> > mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious_second);
> >
> > }
> >
> > In this function, it doesn't seem to be required to use the return value
> > *mustwait* of the second LWLockAttemptLock() call as the return value of
> > the function; as per the usage of this function, if we don't get the
> > lock at the first attempt, then it needs to check if the corresponding
> > WAL record is flushed.
>
> I don't think that's the appropriate comparison. Acquiring the lock
> in the new implementation essentially consists of these two
> steps. We *DID* get the lock here. Without sleeping. So returning the
> appropriate return code is correct.
> In fact, returning false would break things, because the caller would
> hold the lock without freeing it again?

Okay, I understand this point, but OTOH with this new logic (attempt
twice), in certain cases backends will get WALWriteLock when it is
not really required, which I am not sure can cause any problem, so
let's retain it as you have done in the patch.
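
For context, the caller pattern in question is roughly this
(simplified from XLogFlush(); not a verbatim excerpt):

    if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
    {
        /*
         * The lock is free, but we didn't acquire it. Loop back and
         * check whether someone else already flushed the record we
         * care about before trying again.
         */
        continue;
    }
    /* Got the lock, possibly via the second attempt: flush ourselves. */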

> > 11.
> > LWLockAcquireOrWait()
> > {
> > ..
> > Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);
> > }
> >
> > Isn't it better to use AssertArg() rather than Assert in above usage?
>
> I've changed that. But I have to admit, I don't really see the point of
> AssertArg(). If it'd output the value, then it'd be beneficial, but it
> doesn't.

It seems you forgot to change this in LWLockAcquireOrWait(); refer to
the code below.
@@ -773,14 +1191,14 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
int extraWaits = 0;
#ifdef LWLOCK_STATS
lwlock_stats *lwstats;
-#endif
-
- PRINT_LWDEBUG("LWLockAcquireOrWait", lock);

-#ifdef LWLOCK_STATS
lwstats = get_lwlock_stats_entry(lock);
#endif

+ Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);
+
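
For what it's worth, the two macros check the same condition and
differ only in the reported error type ("FailedAssertion" for Assert()
versus "BadArgument" for AssertArg()); neither prints the offending
value:

    Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);    /* "FailedAssertion" */
    AssertArg(mode == LW_SHARED || mode == LW_EXCLUSIVE); /* "BadArgument" */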

> > 15.
> > /* must be greater than MAX_BACKENDS */
> > #define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)
> >
> > a. how can we guarantee it to be greater than MaxBackends,
> > as MaxBackends is an int (whose max value will be equal to
> > SHARED_LOCK_MASK)?
>
> MaxBackends luckily is limited to something lower. I've added a comment
> to that regard.
>
> > b. This is used only for LWLOCK_STATS, so shouldn't we
> > define this under LWLOCK_STATS?
>
> It's a general value, so I don't think that's appropriate.

No issues.
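
As a reading aid, the relationship between the two macros is as below
(the exact bit position of EXCLUSIVE_LOCK is my assumption for
illustration, not quoted from the patch):

    #define EXCLUSIVE_LOCK    (((uint32) 1) << 31)  /* assumed: top bit */
    #define SHARED_LOCK_MASK  (~EXCLUSIVE_LOCK)     /* low bits: shared count */

    /*
     * SHARED_LOCK_MASK can then count up to 2^31 - 1 shared holders,
     * while MAX_BACKENDS caps MaxBackends well below that, so the
     * "greater than MAX_BACKENDS" requirement holds even though
     * MaxBackends is an int.
     */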

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
