Re: Strange Windows problem, lock_timeout test request

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, 'Craig Ringer' <craig(at)2ndQuadrant(dot)com>, 'Hans-Jürgen Schönig' <hs(at)cybertec(dot)at>, 'Ants Aasma' <ants(at)cybertec(dot)at>, 'PostgreSQL Hackers' <pgsql-hackers(at)postgresql(dot)org>, 'Amit kapila' <amit(dot)kapila(at)huawei(dot)com>
Subject: Re: Strange Windows problem, lock_timeout test request
Date: 2013-03-16 15:59:27
Message-ID: 514496DF.6070706@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2013-03-15 18:53 keltezéssel, Tom Lane írta:
> Boszormenyi Zoltan<zb(at)cybertec(dot)at> writes:
>> [ 2-lock_timeout-v33.patch ]
> I looked at this patch a bit. I don't understand why you've chosen to
> alter the API of the enable_timeout variants to have a bool result that
> says "I didn't bother to process the timeout because it would have fired
> immediately".

I don't know either...

> That is not an advantage for any call site that I can
> see: it just means that the caller needs an extra, very difficult to
> test code path to handle what would normally be handled by a timeout
> interrupt. Even if it were a good API choice, you've broken a number of
> existing call sites that the patch fails to touch (eg in postmaster.c
> and postgres.c). It's not acceptable to define a failure return from
> enable_timeout_after and then let callers assume that the failure can't
> happen.
>
> Please undo that.

Done.

> Also, I'm not really enamored of the choice to use List* infrastructure
> for enable_timeouts(). That doesn't appear to be especially convenient
> for any caller, and what it does do is create memory-leak risks for most
> of them (if they forget to free the list cells, perhaps as a result of
> an error exit). I think a local-variable array of TimeoutParams[]
> structs would serve better for most use-cases.

Changed. However, the first member of the structure is
"TimeoutId id" and a sensible end-of-array value can be -1.
Some versions of GCC (maybe other compilers, too) complain
if a constant is assigned to an enum which is outside of the
declared values of the enum. So I added a "NO_TIMEOUT = -1"
to enum TimeoutId. Comments?

> Another thought is that the PGSemaphoreTimedLock implementations all
> share the same bug, which is that if the "condition" callback returns
> true immediately after we acquire the semaphore, they will all wrongly
> return false indicating that the semaphore wasn't acquired.

You are right. Fixed.

> (BTW,
> I do not like either the variable name "condition" or the typedef name
> "TimeoutCondition" for something that's actually a callback function
> pointer.)

How about "TimeoutCallback" and "callback_fn"?

> In the same vein, this comment change:
>
> * NOTE: Think not to put any shared-state cleanup after the call to
> * ProcSleep, in either the normal or failure path. The lock state must
> ! * be fully set by the lock grantor, or by CheckDeadLock if we give up
> ! * waiting for the lock. This is necessary because of the possibility
> ! * that a cancel/die interrupt will interrupt ProcSleep after someone else
> ! * grants us the lock, but before we've noticed it. Hence, after granting,
> ! * the locktable state must fully reflect the fact that we own the lock;
> ! * we can't do additional work on return.
> *
> * We can and do use a PG_TRY block to try to clean up after failure, but
> * this still has a major limitation: elog(FATAL) can occur while waiting
> --- 1594,1606 ----
> /*
> * NOTE: Think not to put any shared-state cleanup after the call to
> * ProcSleep, in either the normal or failure path. The lock state must
> ! * be fully set by the lock grantor, or by CheckDeadLock or by ProcSleep
> ! * itself in case a timeout is detected or if we give up waiting for the lock.
> ! * This is necessary because of the possibility that a cancel/die interrupt
> ! * will interrupt ProcSleep after someone else grants us the lock, but
> ! * before we've noticed it. Hence, after granting, the locktable state must
> ! * fully reflect the fact that we own the lock; we can't do additional work
> ! * on return.
> *
>
> suggests that you've failed to grasp the fundamental race-condition
> problem here. ProcSleep can't do cleanup after the fact any more than
> its callers can, because otherwise it has a race condition with some
> other process deciding to grant it the lock at about the same time its
> timeout goes off. I think the changes in ProcSleep that alter the
> state-cleanup behavior are just plain wrong, and what you need to do
> is make that look more like the existing mechanisms that clean up when
> statement timeout occurs.

This was the most enlightening comment up to now.
It seems no one else understood the timer code but you...
Thanks very much.

I hope the way I did it is right. I factored out the core of
StatementCancelHandler() into a common function that can
be used by the lock_timeout interrupt as its timeout callback
function. Now the code doesn't need PGSemaphoreTimedLock().

While I was thinking about how this thing works, I recognized
that I also need to set the timeout indicator to false after checking
it in ProcessInterrupts().

The reason is that it's needed is this scenario:
1. lock_timeout is set and the transaction throws its error
2. lock_timeout is unset before the next transaction
3. the user presses Ctrl-C during the next transaction
In this case, the second transaction would report the
lock timeout error, since the indicator was still set.

The other way would be to call disable_all_timeouts(false)
unconditionally in ProcessInterrupts() but setting only the indicator
is faster and it doesn't interfere with unknown registered timeout.
Also, the reason is that it's disable_timeout_indicator() instead of
a generic set_timeout_indicator() is that the generic one may be
abused to produce false positives.

Best regards,
Zoltán Böszörményi

> regards, tom lane
>
>

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web:http://www.postgresql-support.de
http://www.postgresql.at/

Attachment Content-Type Size
2-lock_timeout-v36.patch text/x-patch 25.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-03-16 16:42:51 Re: Strange Windows problem, lock_timeout test request
Previous Message Fujii Masao 2013-03-16 15:35:54 Re: Support for REINDEX CONCURRENTLY