Quick Links

Re: Wait free LW_SHARED acquisition - v0.2

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Wait free LW_SHARED acquisition - v0.2
Date:	2014-10-08 12:47:44
Message-ID:	20141008124744.GA11524@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> 2.
> LWLockWakeup()
> {
> ..
> #ifdef LWLOCK_STATS
> lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> #else
> SpinLockAcquire(&lock->mutex);
> #endif
> ..
> }
>
> Earlier while releasing lock, we don't count it towards LWLock stats
> spin_delay_count. I think if we see other places in lwlock.c, it only gets
> counted when we try to acquire it in a loop.

I think the previous situation was clearly suboptimal. I've now modified
things so all spinlock acquirations are counted.

> 3.
> LWLockRelease()
> {
> ..
> /* grant permission to run, even if a spurious share lock increases
> lockcount */
> else if (mode == LW_EXCLUSIVE && have_waiters)
> check_waiters = true;
> /* nobody has this locked anymore, potential exclusive lockers get a chance
> */
> else if (lockcount == 0 && have_waiters)
> check_waiters = true;
> ..
> }
>
> It seems comments have been reversed in above code.

No, they look right. But I've expanded them in the version I'm going to
post in a couple minutes.

> 5.
> LWLockWakeup()
> {
> ..
> dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> {
> PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
> LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> dlist_delete(&waiter->lwWaitLink);
> pg_write_barrier();
> waiter->lwWaiting = false;
> PGSemaphoreUnlock(&waiter->sem);
> }
> ..
> }
>
> Why can't we decrement the nwaiters after waking up? I don't think
> there is any major problem even if callers do that themselves, but
> in some rare cases LWLockRelease() might spuriously assume that
> there are some waiters and tries to call LWLockWakeup(). Although
> this doesn't create any problem, keeping the value sane is good unless
> there is some problem in doing so.

That won't work because then LWLockWakeup() wouldn't be called when
necessary - precisely because nwaiters is 0.

> 6.
> LWLockWakeup()
> {
> ..
> dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> {
> ..
> if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
> continue;
> ..
> if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
> {
> ..
> wokeup_somebody = true;
> }
> ..
> }
> ..
> }
>
> a.
> IIUC above logic, if the waiter queue is as follows:
> (S-Shared; X-Exclusive) S X S S S X S S
>
> it can skip the exclusive waiters and release shared waiter.
>
> If my understanding is right, then I think instead of continue, there
> should be *break* in above logic.

No, it looks correct to me. What happened is that the first S was woken
up. So there's no point in waking up an exclusive locker, but further
non-exclusive lockers can be woken up.

> b.
> Consider below sequence of waiters:
> (S-Shared; X-Exclusive) S S X S S
>
> I think as per un-patched code, it will wakeup waiters uptill (including)
> first Exclusive, but patch will wake up uptill (*excluding*) first
> Exclusive.

I don't think the current code does that. And it'd be a pretty pointless
behaviour, leading to useless increased contention. The only time it'd
make sense for X to be woken up is when it gets run faster than the S
processes.

> 7.
> LWLockWakeup()
> {
> ..
> dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> {
> ..
> dlist_delete(&waiter->lwWaitLink);
> dlist_push_tail(&wakeup, &waiter->lwWaitLink);
> ..
> }
> ..
> }
>
> Use of dlist has simplified the code, but I think there might be a slight
> overhead of maintaining wakeup queue as compare to un-patched
> mechanism especially when there is a long waiter queue.

I don't see that as being relevant. The difference is an instruction or
two - in the slow path we'll enter the kernel and sleep. This doesn't
matter in comparison.
And the code is *so* much more readable.

> 8.
> LWLockConditionalAcquire()
> {
> ..
> /*
> * We ran into an exclusive lock and might have blocked another
> * exclusive lock from taking a shot because it took a time to back
> * off. Retry till we are either sure we didn't block somebody (because
> * somebody else certainly has the lock) or till we got it.
> *
> * We cannot rely on the two-step lock-acquisition protocol as in
> * LWLockAcquire because we're not using it.
> */
> if (potentially_spurious)
> {
> SPIN_DELAY();
> goto retry;
> }
> ..
> }
>
> Due to above logic, I think it can keep on retrying for long time before
> it actually concludes whether it got lock or not incase other backend/'s
> takes Exclusive lock after *double_check* and release before
> unconditional increment of shared lock in function LWLockAttemptLock.
> I understand that it might be difficult to have such a practical scenario,
> however still there is a theoratical possibility of same.

I'm not particularly concerned. We could optimize it a bit, but I really
don't think it's necessary.

> Is there any advantage of retrying in LWLockConditionalAcquire()?

It's required for correctness. We only retry if we potentially blocked
an exclusive acquirer (by spuriously incrementing/decrementing lockcount
with 1). We need to be sure to either get the lock (in which case we can
wake up the waiter on release), or be sure that we didn't disturb
anyone.

> 9.
> LWLockAcquireOrWait()
> {
> ..
> /*
> * NB: We're using nearly the same twice-in-a-row lock acquisition
> * protocol as LWLockAcquire(). Check its comments for details.
> */
> mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious_first);
>
> if (mustwait)
> {
> LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);
>
> mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious_second);
>
> }
>
> In this function, it doesn't seem to be required to use the return value
> *mustwait* of second LWLockAttemptLock() call as a return value of
> function, as per usage of this function if we don't get the lock at first
> attempt, then it needs to check if the corresponding WAL record is
> flushed.

I don't think that's the appropriate comparison. Acquiring the lock in
the lock in the new implementation essentially consists out of these two
steps. We *DID* get the lock here. Without sleeping. So returning the
appropriate return code is correct.
In fact, returning false would break things, because the caller would
hold the lock without freeing it again?

> 11.
> LWLockAcquireOrWait()
> {
> ..
> Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);
> }
>
> Isn't it better to use AssertArg() rather than Assert in above usage?

I've changed that. But I have to admit, I don't really see the point of
AssertArg(). If it'd output the value, then it'd be beneficial, but it
doesn't.

> 15.
> /* must be greater than MAX_BACKENDS */
> #define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)
>
> a. how can we guarantee it to be greater than MaxBackends,
> as MaxBackends is int (the max value for which will be
> equal to SHARED_LOCK_MASK)?

MaxBackends luckily is limited to something lower. I've added a comment
to that regard.

> b. This is used only for LWLOCK_STATS, so shouldn't we
> define this under LWLOCK_STATS.

It's a general value, so I don't think that's appropriate.

Thanks for the review!

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: Wait free LW_SHARED acquisition - v0.2 at 2014-06-25 13:36:32 from Amit Kapila

Responses

Re: Wait free LW_SHARED acquisition - v0.2 at 2014-10-08 13:00:13 from Andres Freund
Re: Wait free LW_SHARED acquisition - v0.2 at 2014-10-08 17:13:33 from Robert Haas
Re: Wait free LW_SHARED acquisition - v0.2 at 2014-10-21 14:26:05 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2014-10-08 13:00:13	Re: Wait free LW_SHARED acquisition - v0.2
Previous Message	Heikki Linnakangas	2014-10-08 11:50:07	Re: pg_receivexlog --status-interval add fsync feedback