Re: Some 9.5beta2 backend processes not terminating properly?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Shay Rojansky <roji(at)roji(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some 9.5beta2 backend processes not terminating properly?
Date: 2016-01-02 20:38:37
Message-ID: 20160102203837.owqacjmk7ceunjle@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-01-02 15:11:42 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I found a few more resources confirming that FD_CLOSE is edge
> > triggered. Which probably doesn't just make our code buggy when waiting
> > twice on the same socket, but probably also makes it very timing
> > dependent: As the event is only triggered when the close actually occurs
> > it's possible that we don't have any event associated with that socket:
> > We only do so for short amounts of time in WaitLatchOrSocket() and
> > pgwin32_waitforsinglesocket().
>
> Does the timing dependence explain why we've not been able to trigger this
> by killing psql?

I think so. Possibly it'd be reproducible in psql by sending an
interrupt while executing pg_sleep(3000), or with something involving the
copy protocol?
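To make the timing dependence concrete, here is a toy C model of the behaviour described above. It is a simplification under stated assumptions, not Winsock or PostgreSQL code: the hypothetical model_socket type and the peer_close(), begin_wait(), end_wait_edge() and wait_level() helpers only encode the idea that an edge-triggered close event is recorded if a waiter happens to be registered at the moment the peer closes, and that a wait consumes it.

```c
#include <stdbool.h>

/*
 * Hypothetical model (not Winsock or PostgreSQL code) of a socket whose
 * "peer closed" condition is reported either edge- or level-triggered.
 */
typedef struct
{
    bool closed;            /* level: has the peer closed the connection? */
    bool edge_pending;      /* edge: was a close event recorded? */
    bool waiter_registered; /* is an event currently associated/waiting? */
} model_socket;

/* The peer closes. The edge event is only recorded if someone is waiting. */
static void peer_close(model_socket *s)
{
    s->closed = true;
    if (s->waiter_registered)
        s->edge_pending = true;
}

/* Start a short wait window (cf. WaitLatchOrSocket()). */
static void begin_wait(model_socket *s)
{
    s->waiter_registered = true;
}

/* End the wait window; report and consume any recorded close event. */
static bool end_wait_edge(model_socket *s)
{
    bool seen = s->edge_pending;

    s->edge_pending = false;    /* consumed: a later wait won't see it */
    s->waiter_registered = false;
    return seen;
}

/* A level-triggered check reports the current state every time. */
static bool wait_level(const model_socket *s)
{
    return s->closed;
}
```

In this model a close that happens while the backend is busy (for instance inside pg_sleep()) is never seen by a later edge-triggered wait, while the level-triggered check still reports it; likewise a second wait on the same socket misses an event the first wait already consumed.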

> If the bug only occurs when the client connection drops when we're not
> waiting for input, that would likely explain why nobody noticed it for
> ten months.

Yeah. I think there might also be another issue: Windows' recv()
(which we're not using, but I don't see different documentation for
WSARecv()) returns 0 bytes if a socket was closed 'gracefully',
i.e. shutdown(SD_SEND) was called on the client side (similar to
Unix recv()).
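For comparison, the Unix behaviour referred to above can be checked with a small POSIX program. This only illustrates plain recv() semantics on a local socket pair, with shutdown(SHUT_WR) standing in for the client's shutdown(SD_SEND); the helper name recv_after_peer_shutdown() is made up for the example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns what recv() reports on one end of a connected stream socket
 * pair after the other end calls shutdown(SHUT_WR), the POSIX
 * counterpart of Winsock's shutdown(SD_SEND). Returns -2 if setup fails.
 */
static ssize_t recv_after_peer_shutdown(void)
{
    int     fds[2];
    char    buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -2;

    /* "Client" end: gracefully stop sending. */
    shutdown(fds[1], SHUT_WR);

    /* "Server" end: a blocking recv() now returns 0 (EOF), not an error. */
    n = recv(fds[0], buf, sizeof(buf), 0);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling recv_after_peer_shutdown() should return 0, i.e. end-of-file rather than an error or EWOULDBLOCK.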

pgwin32_recv() converts a 0-byte return from WSARecv() into an
EWOULDBLOCK error when in emulated non-blocking mode:
    if (pgwin32_noblock)
    {
        /*
         * No data received, and we are in "emulated non-blocking mode", so
         * return indicating that we'd block if we were to continue.
         */
        errno = EWOULDBLOCK;
        return -1;
    }

which would explain why we'd eat the FD_CLOSE and then just continue
waiting...
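A hypothetical sketch of why such a conversion would lose the close. The functions below are illustrative, not the real pgwin32_recv(), and raw_bytes simply stands in for what WSARecv() reported. If a 0-byte result is turned into EWOULDBLOCK, a caller that retries on EWOULDBLOCK (where real code would wait on the socket again) never learns that the peer has gone away; passing the 0 through lets it see EOF.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical wrapper mirroring the conversion described above. */
static int wrap_recv_buggy(int raw_bytes, bool noblock)
{
    if (raw_bytes == 0 && noblock)
    {
        /* 0 bytes treated as "no data yet": the graceful close is lost */
        errno = EWOULDBLOCK;
        return -1;
    }
    return raw_bytes;
}

/* Hypothetical alternative: 0 bytes means EOF, so pass it through. */
static int wrap_recv_fixed(int raw_bytes, bool noblock)
{
    (void) noblock;
    return raw_bytes;
}

typedef int (*recv_fn) (int raw_bytes, bool noblock);

/*
 * Caller loop: the peer has already closed, so every underlying read
 * reports 0 bytes. Retries on EWOULDBLOCK, stops on EOF. Returns true
 * if EOF was detected within max_tries attempts.
 */
static bool caller_detects_eof(recv_fn fn, int max_tries)
{
    for (int i = 0; i < max_tries; i++)
    {
        int r = fn(0, true);

        if (r == 0)
            return true;
        if (r < 0 && errno == EWOULDBLOCK)
            continue;   /* real code would wait again for an event */
        return false;
    }
    return false;
}
```

With the buggy mapping the loop only ends because of the artificial max_tries cap; in a real backend it would keep waiting for an event that has already fired.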

That seems like a pretty straightforward bug. But it hinges on the
client side calling shutdown() on the socket. I don't know enough about
.NET's internals to judge whether it does so. I've traced things far
enough to find
"Disposing a Stream object flushes any buffered data, and essentially
calls the Flush method for you. Dispose also releases operating system
resources such as file handles, network connections, or memory used for
any internal buffering. The BufferedStream class provides the capability
of wrapping a buffered stream around another stream in order to improve
read and write performance."
https://msdn.microsoft.com/en-us/library/system.io.stream%28v=vs.110%29.aspx

which would plausibly involve calling shutdown().

Andres
