Re: Escaping from blocked send() reprised.

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Escaping from blocked send() reprised.
Date: 2014-10-05 22:16:38
Message-ID: 20141005221638.GB3462@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-10-03 18:29:23 +0300, Heikki Linnakangas wrote:
> On 10/03/2014 05:26 PM, Andres Freund wrote:
> >On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
> >>On 09/28/2014 01:54 AM, Andres Freund wrote:
> >>>0003 Sinval/notify processing got simplified further. There really isn't
> >>> any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> >>> anymore. Also begin_client_read/client_read_ended don't make much
> >>> sense anymore. Instead introduce ProcessClientReadInterrupt (which
> >>> wants a better name).
> >>>There's also a very WIP
> >>>0004 Allows secure_read/write be interrupted when ProcDiePending is
> >>> set. All of that happens via the latch mechanism, nothing happens
> >>> inside signal handlers. So I do think it's quite an improvement
> >>> over what's been discussed in this thread.
> >>> But it (and the other approaches) do noticeably increase the
> >>> likelihood of clients not getting the error message if the client
> >>> isn't actually dead. The likelihood of write() being blocked
> >>> *temporarily* due to normal bandwidth constraints is quite high
> >>> when you consider COPY FROM and similar. Right now we'll wait till
> >>> we can put the error message into the socket afaics.
> >>>
> >>>1-3 need some serious comment work, but I think the approach is
> >>>basically sound. I'm much, much less sure about allowing send() to be
> >>>interrupted.
> >>
> >>Yeah, 1-3 seem sane.
> >
> >I think 3 also needs a careful look. Have you looked through it? While
> >imo much less complex than before, there's some complex interactions in
> >the touched code. And we have terrible coverage of both catchup
> >interrupts and notify stuff...
>
> I only looked at the .patch, I didn't apply it, so I didn't look at the
> context much. But I don't see any fundamental problem with it. I would like
> to have a closer look before it's committed, though.

I'd appreciate that. I don't want to commit it without a careful review
of another committer.

> >There's also the concern that using a latch for client communication
> >increases the number of syscalls for the same work. We should at least
> >try to quantify that...
>
> I'm not too concerned about that, since we only do extra syscalls when the
> socket isn't immediately available for reading/writing, i.e. when we have to
> sleep anyway.

Well, kernels actually do some nice optimizations for blocking reads -
at least for local sockets. Like switching to the other process
immediately and such.
I'm not super concerned either, but I think we should try to measure
it. And if we're failing, we probably should try to address these
problems - if possible in the latch code.

> >>4 also looks OK to me at a quick glance. It basically
> >>enables handling the "die" interrupt immediately, if we're blocked in a read
> >>or write. It won't be handled in the signal handler, but within the
> >>secure_read/write call anyway.
> >
> >What are you thinking about the concern that it'll reduce the likelihood
> >of transferring the error message to the client? I tried to reduce that
> >by only allowing errors when write() blocks, but that's not an
> >infrequent event.
>
> I'm not too concerned about that either. I mean, it's probably true that it
> reduces the likelihood, but I don't particularly care myself. But if we
> care, we could use a timeout there, so that if we receive a SIGTERM while
> blocked on a send(), we wait for a few seconds to see if we can send
> whatever we were sending, before terminating the backend.
>
>
> What should we do with this patch in the commitfest?

I think feature should be ddeclare as 'returned with feedback' for this
commitfest. I've done so.

I don't see much of a reason waiting with patch 1 till the next
commitfest. It imo looks uncontroversial and doesn't have any far
reaching consequences.

> Are you planning to clean up and commit these patches?

I plan to do so one by one, yes. If you'd like to pick up any of them,
feel free to do (after telling me, to avoid duplicated efforts). I don't
feel proprietary about any of them. But I guess you have other stuff
you'd like to work on too ;)

I'll try to send out a version with the stuff you mentioned earlier in
the next couple days.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Flower 2014-10-05 22:29:54 Re: Proposal for better support of time-varying timezone abbreviations
Previous Message Tom Lane 2014-10-05 21:33:20 Proposal for better support of time-varying timezone abbreviations