Re: [PATCH] lock_timeout and common SIGALRM framework

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>, Ants Aasma <ants(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] lock_timeout and common SIGALRM framework
Date: 2012-09-24 21:30:41
Message-ID: 5060D101.8020908@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

2012-09-22 20:49 keltezéssel, Tom Lane írta:
> Boszormenyi Zoltan <zb(at)cybertec(dot)at> writes:
>> new version with a lot more cleanup is attached.
> I looked at this patch, and frankly I'm rather dismayed. It's a mess.

I hope you won't find this one a mess. I tried to address all your complaints.

> [rather long diatribe on modifying PGSemaphoreLock improperly]

I have returned to the previous version that used PGSemaphoreTimedLock
and this time I save and restore errno around calling the timeout condition
checker.

> The whole lmgrtimeout module seems to me to be far more mechanism than is
> warranted, for too little added functionality. [...]

lmgrtimeout is no more, back to hardcoding things.

> The minimum thing I would want it to do is avoid calling
> timeout.c multiple times, which is what would happen right now (leading
> to four extra syscalls per lock acquisition, which is enough new overhead
> to constitute a strong objection to committing this patch at all).

I have added enable_multiple_timeouts() and disable_multiple_timeouts()
that minimize the number setitimer() calls.

> There's considerable lack of attention to updating comments, too.
> For instance in WaitOnLock you only bothered to update the comment
> immediately adjacent to the changed code, and not the two comment
> blocks above that, which both have specific references to deadlocks
> being the reason for failure.

I modified the comment in question. I hope the wording is right.

> Also, the "per statement" mode for lock timeout doesn't seem to be
> any such thing, because it's implemented like this:
>
> + case LOCK_TIMEOUT_PER_STMT:
> + enable_timeout_at(LOCK_TIMEOUT,
> + TimestampTzPlusMilliseconds(
> + GetCurrentStatementStartTimestamp(),
> + LockTimeout));
> + break;
>
> That doesn't provide anything like "you can spend at most N milliseconds
> waiting for locks during a statement". What it is is "if you happen to be
> waiting for a lock N milliseconds after the statement starts, or if you
> attempt to acquire any lock more than N milliseconds after the statement
> starts, you lose instantly". I don't think that definition actually adds
> any useful functionality compared to setting statement_timeout to N
> milliseconds, and it's certainly wrongly documented. To do what the
> documentation implies would require tracking and adding up the time spent
> waiting for locks during a statement. Which might be a good thing to do,
> especially if the required gettimeofday() calls could be shared with what
> timeout.c probably has to do anyway at start and stop of a lock wait.
> But this code doesn't do it.

The code now properly accumulates the time spent in waiting for
LOCK_TIMEOUT. This means that if STATEMENT_TIMEOUT and
LOCK_TIMEOUT are set to the same value, STATEMENT_TIMEOUT
will trigger because it considers the time as one contiguous unit,
LOCK_TIMEOUT only accounts the time spent in waiting, not the time
spent with useful work. This means that LOCK_TIMEOUT doesn't
need any special code in its handler function, it's a NOP. The
relation between timeouts is only handled by the timeout.c module.

> Lastly, I'm not sure where is the best place to be adding the control
> logic for this, but I'm pretty sure postinit.c is not it. It oughta be
> somewhere under storage/lmgr/, no?

The above change means that there is no control logic outside of
storage/lmgr now.

I temporarily abandoned the idea of detailed error reporting on
the object type and name/ID. WaitOnLock() reports "canceling statement
due to lock timeout" and LockAcquire() kept its previous semantics.
This can be quickly revived in case of demand, it would be another ~15K patch.

I hope you can find another time slot in this CF to review this one.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

Attachment Content-Type Size
2-lock_timeout-v26.patch.gz application/x-tar 11.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2012-09-24 21:52:46 Re: [PoC] load balancing in libpq
Previous Message Alvaro Herrera 2012-09-24 20:17:20 Re: External Replication