Re: Escaping from blocked send() reprised.

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Escaping from blocked send() reprised.
Date: 2014-08-26 06:55:28
Message-ID: 53FC2F60.6050907@vmware.com
Lists: pgsql-hackers

On 08/26/2014 09:17 AM, Kyotaro HORIGUCHI wrote:
>> but I don't think we want to define the behavior as "usually,
>> pg_terminate_backend() will kill a backend that's blocked on sending
>> to the client, but sometimes you have to call it twice (or more!) to
>> really kill it".
>
> I agree that it is not desirable behavior, if there is any way to
> avoid it. But I think it's still better than doing kill -9, which
> takes down all the innocent backends too.
>
>> A more robust way is to set ImmediateInterruptOK before calling
>> send(). That wouldn't let you send data that can be sent without
>> blocking though. For that, you could put the socket to non-blocking
>> mode, and sleep with select(), also waiting for the process' latch at
>> the same time (die() sets the latch, so that will wake up the select()
>> if a termination request arrives).
>
> I considered it, but select() frequently (in fact, in most cases
> where send() blocks because the send buffer is exhausted) fails to
> predict that the following send() will block. (If my memory is
> correct.) So the remaining problem would still be a blocked send()...

My point was to put the socket in non-blocking mode, so that send() will
return immediately with EAGAIN instead of blocking, if the send buffer
is full. See WalSndWriteData for how that would work; it does something
similar.
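A minimal sketch of that approach, assuming plain POSIX sockets and using a self-pipe as a stand-in for the process latch. The names `set_nonblocking`, `send_all_interruptible` and `send_result` are hypothetical; this is not PostgreSQL code and not how WalSndWriteData itself is written. With O_NONBLOCK set, send() returns EAGAIN when the buffer is full, and the sleep happens in poll() on both the socket and the pipe, so a termination request that writes to the pipe wakes the sleeper on the first try.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of a send attempt (hypothetical names, for illustration). */
typedef enum { SEND_OK, SEND_TERMINATED, SEND_ERROR } send_result;

/* Make send() fail with EAGAIN instead of blocking on a full buffer. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Send all of buf on a non-blocking socket. When the send buffer is
 * full, sleep in poll() on the socket (POLLOUT) and on wake_fd, the read
 * end of a self-pipe playing the role of the latch. A signal handler
 * such as die() would write one byte to the pipe to wake us up. */
send_result send_all_interruptible(int sock, const char *buf, size_t len,
                                   int wake_fd)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(sock, buf + sent, len - sent, 0);
        if (n > 0) {
            sent += (size_t) n;
            continue;
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return SEND_ERROR;
        }

        /* Buffer full: wait until writable or until woken up. */
        struct pollfd fds[2] = {
            { .fd = sock, .events = POLLOUT },
            { .fd = wake_fd, .events = POLLIN },
        };
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;       /* handler ran; pipe is checked next time */
            return SEND_ERROR;
        }
        if (fds[1].revents & POLLIN)
            return SEND_TERMINATED;  /* termination was requested */
    }
    return SEND_OK;
}
```

Data that fits in the buffer still goes out without sleeping; only when the peer stops reading does the loop park in poll(), and there it is always interruptible. Whether it is then safe to report the error to the client is the separate protocol question raised in this thread.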

>> Is it actually safe to process the die-interrupt where send() is
>> called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
>> attempt to send a message to the client. If that happens in the middle
>> of constructing some other message, that will violate the protocol.
>
> So I would strongly agree with you, if select() worked the way the
> man page suggests.

Not sure what you mean, but the above is a fatal problem with the patch
right now, regardless of how you do the sleeping.
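To make the protocol problem concrete: in the v3 frontend/backend protocol each message is a type byte, then an Int32 length in network byte order that counts itself but not the type byte, then the body. The sketch below uses hypothetical helper names and arbitrary body bytes rather than a real DataRow layout. It splits a byte stream into messages the way a client would, and shows that an 'E' (ErrorResponse) written after a partially sent 'D' message is swallowed into the 'D' body instead of being seen as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one message (type byte + big-endian Int32 length that counts
 * itself + body) to out at offset off; returns the new offset. */
size_t put_message(unsigned char *out, size_t off, char type,
                   const void *body, uint32_t body_len)
{
    uint32_t len = body_len + 4;
    out[off++] = (unsigned char) type;
    out[off++] = (unsigned char) (len >> 24);
    out[off++] = (unsigned char) (len >> 16);
    out[off++] = (unsigned char) (len >> 8);
    out[off++] = (unsigned char) len;
    memcpy(out + off, body, body_len);
    return off + body_len;
}

/* Split a stream into messages as a client would, storing each type.
 * Returns the number of complete messages, or -1 if the stream does not
 * end on a message boundary (or more than max messages are found). */
int split_messages(const unsigned char *buf, size_t n, char *types, int max)
{
    size_t off = 0;
    int count = 0;

    while (off < n) {
        if (n - off < 5 || count == max)
            return -1;
        uint32_t len = ((uint32_t) buf[off + 1] << 24) |
                       ((uint32_t) buf[off + 2] << 16) |
                       ((uint32_t) buf[off + 3] << 8) |
                       (uint32_t) buf[off + 4];
        if (len < 4 || n - off - 1 < len)
            return -1;
        types[count++] = (char) buf[off];
        off += 1 + len;
    }
    return count;
}
```

With a complete 'D' followed by 'E', the client sees two messages. If only part of the 'D' body was sent before the FATAL error is emitted, the 'E' bytes are consumed as the tail of 'D' and the stream no longer ends on a message boundary.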

>>>> 2. I think it would be reasonable to try to kill off the connection
>>>> without notifying the client if we're unable to send the data to the
>>>> client in a reasonable period of time. But I'm unsure what "a
>>>> reasonable period of time" means. This patch would basically do it
>>>> after no delay at all, which seems like it might be too aggressive.
>>>> However, I'm not sure.
>>>
>>> I think there's no such reasonable time.
>>
>> I agree it's pretty hard to define any reasonable timeout here. I
>> think it would be fine to just cut the connection; even if you don't
>> block while sending, you'll probably reach a CHECK_FOR_INTERRUPTS()
>> somewhere higher in the stack and kill the connection almost as
>> abruptly anyway. (you can't violate the protocol, however)
>
> Yes, closing the blocked connection seems one of the smartest
> ways, and checking for the interrupt that occurred could avoid a
> protocol violation. But the problem is that there seems to be no
> means to close the socket other than through the blocking handle. A
> dup(2)'ed handle cannot release the resource by itself.

I didn't understand that; surely you can just close() the socket? There
is no dup(2) involved. And we don't necessarily need to close the
socket, we just need to avoid writing to it when we're already in the
middle of sending a message.
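A sketch of the "don't write mid-message" rule, assuming a hypothetical per-connection flag that is set while a protocol message is only partly written. `conn_state`, `begin_message`, `end_message` and `handle_terminate` are illustrative names, not PostgreSQL code. The connection is dropped in both cases, so one termination request is always enough; the flag only decides whether the FATAL error may be reported first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connection state, for illustration only. */
typedef struct {
    bool in_message;   /* a protocol message is partially written */
    bool closed;       /* the connection has been dropped */
    int  errors_sent;  /* ErrorResponse messages written to the client */
} conn_state;

void begin_message(conn_state *c) { c->in_message = true; }
void end_message(conn_state *c)   { c->in_message = false; }

/* Handle a termination request. Reporting the error is only safe at a
 * message boundary; in the middle of a message, the protocol-safe choice
 * is to drop the connection without writing anything more. */
void handle_terminate(conn_state *c)
{
    if (!c->in_message)
        c->errors_sent++;   /* stand-in for sending ErrorResponse */
    c->closed = true;
}
```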

I'm marking this as Waiting on Author in the commitfest app, because:
1. the protocol violation needs to be avoided one way or another, and
2. the behavior needs to be consistent so that a single
pg_terminate_backend() is enough to always kill the connection.

- Heikki
