Re: Escaping from blocked send() reprised.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hlinnakangas(at)vmware(dot)com
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Escaping from blocked send() reprised.
Date: 2014-08-26 06:17:08
Message-ID: 20140826.151708.233374120.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Sorry, I was absorbed in other tasks.

Thank you for reviewing this.

> On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:
> > At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas
> > <robertmhaas(at)gmail(dot)com> wrote in
> > <CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ(at)mail(dot)gmail(dot)com>
> >> 1. I think it's the case that there are platforms around where a
> >> signal won't cause send() to return EINTR.... and I'd be entirely
> >> unsurprised if SSL_write() doesn't necessarily return EINTR in that
> >> case. I'm not sure what, if anything, we can do about that.
>
> We use a custom "write" routine with SSL_write, where we call send()
> ourselves, so that's not a problem as long as we put the check in the
> right place (in secure_raw_write(), after my recent SSL refactoring -
> the patch needs to be rebased).
>
> > man 2 send on FreeBSD has not description about EINTR.. And even
> > on linux, send won't return EINTR for most cases, at least I
> > haven't seen that. So send()=-1,EINTR seems to me as only an
> > equivalent of send() = 0. I have no idea about what the
> > implementer thought the difference is.
>
> As the patch stands, there's a race condition: if the SIGTERM arrives
> *before* the send() call, the send() won't return EINTR anyway. So
> there's a chance that you still block. Calling pq_terminate_backend()
> again will dislodge it (assuming send() returns with EINTR on signal),

Yes, that window wouldn't be closed without introducing
something more. EINTR is set only when nothing has been sent by
the call. So, as far as I can see, the chance of getting EINTR is
far smaller than expected.
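
To make the window concrete, here is a minimal sketch (not the
patch itself) of a send loop that checks a termination flag and
treats EINTR as "nothing was sent". The names terminate_requested
and send_interruptible are hypothetical, and the flag is assumed
to be set by a SIGTERM handler that is not shown.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical flag, assumed to be set by a SIGTERM handler (not shown). */
static volatile sig_atomic_t terminate_requested = 0;

/*
 * Hypothetical helper: send up to len bytes on a blocking socket.
 * Returns the number of bytes sent before completion or before a
 * termination request was noticed, or -1 on a real error.
 */
static ssize_t
send_interruptible(int fd, const char *buf, size_t len)
{
    size_t      sent = 0;

    while (sent < len)
    {
        ssize_t     n;

        /* Check for a pending termination request before blocking. */
        if (terminate_requested)
            break;

        /*
         * The race window is here: a signal delivered after the check
         * above but before send() enters the kernel sets the flag, yet
         * send() can still block and will not return EINTR for it.
         */
        n = send(fd, buf + sent, len - sent, 0);

        if (n > 0)
        {
            /* Partial progress is reported as a short count, not EINTR. */
            sent += (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* nothing was sent; re-check the flag */
        if (n < 0)
            return -1;          /* real error, errno is set by send() */
        break;                  /* n == 0: no progress, give up */
    }
    return (ssize_t) sent;
}
```

A caller that gets back fewer bytes than it asked for would then
look at the flag to decide whether to give up on the connection.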

> but I don't think we want to define the behavior as "usually,
> pq_terminate_backend() will kill a backend that's blocked on sending
> to the client, but sometimes you have to call it twice (or more!) to
> really kill it".

I agree that it is not desirable behavior, if there is any
measure to avoid it. But I think it's better than doing kill -9,
which takes down all the innocent backends as well.

> A more robust way is to set ImmediateInterruptOK before calling
> send(). That wouldn't let you send data that can be sent without
> blocking though. For that, you could put the socket to non-blocking
> mode, and sleep with select(), also waiting for the process' latch at
> the same time (die() sets the latch, so that will wake up the select()
> if a termination request arrives).

I considered it, but select() frequently (in fact, in most cases
where send() blocks because the send buffer is exhausted) fails
to predict that the following send() will block, if my memory is
correct. So the final problem would still be the blocked send().
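
For reference, the approach suggested in the quoted text could be
sketched roughly as below. This is only an illustration under
stated assumptions: send_or_wakeup and wakeup_fd are hypothetical
names, and wakeup_fd (the read end of a pipe written to by a
signal handler) stands in for the process latch. With O_NONBLOCK
set, a send() that select() wrongly predicted as writable returns
EAGAIN or EWOULDBLOCK instead of blocking, so the loop simply goes
back to sleep in select().

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send len bytes on sock without ever blocking in
 * send(). wakeup_fd is assumed to become readable when a termination
 * request arrives. Returns the number of bytes sent (less than len if
 * woken up early), or -1 on a real error.
 */
static ssize_t
send_or_wakeup(int sock, int wakeup_fd, const char *buf, size_t len)
{
    size_t      sent = 0;
    int         maxfd = (sock > wakeup_fd) ? sock : wakeup_fd;
    int         flags = fcntl(sock, F_GETFL, 0);

    /* Put the socket into non-blocking mode. */
    if (flags < 0 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    while (sent < len)
    {
        fd_set      rfds;
        fd_set      wfds;
        ssize_t     n;

        /* Send whatever can be sent right now without blocking. */
        n = send(sock, buf + sent, len - sent, 0);
        if (n > 0)
        {
            sent += (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Would block: sleep until writable or until woken up. */
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        FD_SET(wakeup_fd, &rfds);
        FD_SET(sock, &wfds);
        if (select(maxfd + 1, &rfds, &wfds, NULL, NULL) < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (FD_ISSET(wakeup_fd, &rfds))
            break;              /* termination requested; stop sending */
    }
    return (ssize_t) sent;
}
```

In this sketch the wakeup is only honored once send() would
otherwise block, so data that fits in the send buffer still goes
out.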

> Is it actually safe to process the die-interrupt where send() is
> called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
> attempt to send a message to the client. If that happens in the middle
> of constructing some other message, that will violate the protocol.

So I strongly agree with you, provided that select() works the
way the man page gives the impression it does.

> >> 2. I think it would be reasonable to try to kill off the connection
> >> without notifying the client if we're unable to send the data to the
> >> client in a reasonable period of time. But I'm unsure what "a
> >> reasonable period of time" means. This patch would basically do it
> >> after no delay at all, which seems like it might be too aggressive.
> >> However, I'm not sure.
> >
> > I think there's no such a reasonable time.
>
> I agree it's pretty hard to define any reasonable timeout here. I
> think it would be fine to just cut the connection; even if you don't
> block while sending, you'll probably reach a CHECK_FOR_INTERRUPT()
> somewhere higher in the stack and kill the connection almost as
> abruptly anyway. (you can't violate the protocol, however)

Yes, closing the blocked connection seems to be one of the
smartest ways, and checking for the pending interrupt could avoid
a protocol violation. But the problem is that there seems to be
no means of closing the socket from anywhere other than the
blocked handle. A dup(2)'ed handle cannot release the resource by
itself.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
