Re: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn

Lists: pgsql-hackers
From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-17 14:16:22
Message-ID: 20130617141622.GH5875@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

When postgres on linux receives connection on a high rate client
connections sometimes error out with:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected

To reproduce start something like on a server with sufficiently high
max_connections:
pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql

Now that's strange since that error should happen at connect(2) time,
not when sending the startup packet. Some investigation led me to
fe-secure.c's PQConnectPoll:

if (connect(conn->sock, addr_cur->ai_addr,
addr_cur->ai_addrlen) < 0)
{
if (SOCK_ERRNO == EINPROGRESS ||
SOCK_ERRNO == EWOULDBLOCK ||
SOCK_ERRNO == EINTR ||
SOCK_ERRNO == 0)
{
/*
* This is fine - we're in non-blocking mode, and
* the connection is in progress. Tell caller to
* wait for write-ready on socket.
*/
conn->status = CONNECTION_STARTED;
return PGRES_POLLING_WRITING;
}
/* otherwise, trouble */
}

So, we're accepting EWOULDBLOCK as a valid return value for
connect(2). Which it isn't. EAGAIN in contrast is on some BSDs and on
linux. Unfortunately POSIX allows those two to share the same value...

My manpage tells me:
EAGAIN No more free local ports or insufficient entries in the routing cache. For
AF_INET see the description of
/proc/sys/net/ipv4/ip_local_port_range ip(7)
for information on how to increase the number of local
ports.

So, the problem is that we took a failed connection as having been
initially successfull but in progress.

Not accepting EWOULDBLOCK in the above if() results in:
could not connect to server: Resource temporarily unavailable
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5440"?

which makes more sense.

Trivial patch attached.

Now, the question is why we cannot complete connections on unix sockets?
Some code reading reading shows net/unix/af_unix.c:unix_stream_connect()
shows:
if (unix_recvq_full(other)) {
err = -EAGAIN;
if (!timeo)
goto out_unlock;
So, if we're in nonblocking mode - which we are - and the receive queue
is full we return EAGAIN. The receive queue for unix sockets is defined
as
static inline int unix_recvq_full(struct sock const *sk)
{
return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
}
Where sk_max_ack_backlog is whatever has been passed to the
listen(backlog) on the listening side.

Question: But postgres does listen(fd, MaxBackends * 2), how can that be
a problem?
Answer:
If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn,
then it is silently truncated to that value; the default value in this file is
128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with
the value 128.

Setting somaxconn to something higher indeed makes the problem go away.

I'd guess that pretty much the same holds true for tcp connections,
although I didn't verify that which would explain some previous reports
on the lists.

TLDR: Increase /proc/sys/net/core/somaxconn

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-libpq-Handle-connect-2-returning-EAGAIN-EWOULDBLOCK-.patch text/x-patch 1.6 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-26 11:22:58
Message-ID: 20130626112258.GD8637@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2013-06-17 16:16:22 +0200, Andres Freund wrote:
> When postgres on linux receives connection on a high rate client
> connections sometimes error out with:
> could not send data to server: Transport endpoint is not connected
> could not send startup packet: Transport endpoint is not connected
>
> To reproduce start something like on a server with sufficiently high
> max_connections:
> pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql
>
> Now that's strange since that error should happen at connect(2) time,
> not when sending the startup packet. Some investigation led me to
> fe-secure.c's PQConnectPoll:

> So, we're accepting EWOULDBLOCK as a valid return value for
> connect(2). Which it isn't. EAGAIN in contrast is on some BSDs and on
> linux. Unfortunately POSIX allows those two to share the same value...
>
> My manpage tells me:
> EAGAIN No more free local ports or insufficient entries in the routing cache. For
> AF_INET see the description of
> /proc/sys/net/ipv4/ip_local_port_range ip(7)
> for information on how to increase the number of local
> ports.
>
> So, the problem is that we took a failed connection as having been
> initially successfull but in progress.
>
> Not accepting EWOULDBLOCK in the above if() results in:
> could not connect to server: Resource temporarily unavailable
> Is the server running locally and accepting
> connections on Unix domain socket "/tmp/.s.PGSQL.5440"?
>
> which makes more sense.
>
> Trivial patch attached.

Could I convince a committer to NACK or commit & backpatch that patch?

It has come up before:
http://www.postgresql.org/message-id/CAMnJ+Beq0hCBuTY_=nz=ru0U-No543_RAEunLVSAYU8tugd6NA@mail.gmail.com
possibly also:
http://lists.pgfoundry.org/pipermail/pgpool-general/2007-March/000595.html

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-26 16:07:54
Message-ID: 8365.1372262874@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-06-17 16:16:22 +0200, Andres Freund wrote:
>> Not accepting EWOULDBLOCK in the above if() results in:
>> could not connect to server: Resource temporarily unavailable
>> Is the server running locally and accepting
>> connections on Unix domain socket "/tmp/.s.PGSQL.5440"?
>> which makes more sense.

> Could I convince a committer to NACK or commit & backpatch that patch?

Some trawling in the commit history shows that the current logic dates
from my commit 6d0d838cebdf2bcd5c03f5449a1888f1e120496f, which unified
Windows and non-Windows code paths; the check for EWOULDBLOCK was added
in the earlier commit ca5a51627919c6fb6ab5e23739615a674caa4037 which
(claimed to) add support for non-blocking connect on Windows. So I'm
concerned that your patch could break that platform. A possible
compromise is

{
if (SOCK_ERRNO == EINPROGRESS ||
+#ifndef WIN32
SOCK_ERRNO == EWOULDBLOCK ||
+#endif
SOCK_ERRNO == EINTR ||
SOCK_ERRNO == 0)

but I wonder whether it's safe to remove the case altogether. Could
anyone research the situation for non-blocking connect() on Windows?
Perhaps on Windows we shouldn't test for EINPROGRESS at all?

I'm also pretty suspicious of the SOCK_ERRNO == 0 case, now that I look
at it, especially in view of the lack of any attempt to set errno to 0
before the call. The Single Unix Spec is pretty clear that EINTR and
EINPROGRESS are the only legit cases, so unless Windows is doing its own
thing, we could probably get rid of that too.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-26 18:12:00
Message-ID: 20130626181200.GB19047@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-26 12:07:54 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2013-06-17 16:16:22 +0200, Andres Freund wrote:
> >> Not accepting EWOULDBLOCK in the above if() results in:
> >> could not connect to server: Resource temporarily unavailable
> >> Is the server running locally and accepting
> >> connections on Unix domain socket "/tmp/.s.PGSQL.5440"?
> >> which makes more sense.
>
> > Could I convince a committer to NACK or commit & backpatch that patch?
>
> Some trawling in the commit history shows that the current logic dates
> from my commit 6d0d838cebdf2bcd5c03f5449a1888f1e120496f, which unified
> Windows and non-Windows code paths; the check for EWOULDBLOCK was added
> in the earlier commit ca5a51627919c6fb6ab5e23739615a674caa4037 which
> (claimed to) add support for non-blocking connect on Windows. So I'm
> concerned that your patch could break that platform.

> A possible compromise is
>
> {
> if (SOCK_ERRNO == EINPROGRESS ||
> +#ifndef WIN32
> SOCK_ERRNO == EWOULDBLOCK ||
> +#endif
> SOCK_ERRNO == EINTR ||
> SOCK_ERRNO == 0)
>
> but I wonder whether it's safe to remove the case altogether. Could
> anyone research the situation for non-blocking connect() on Windows?
> Perhaps on Windows we shouldn't test for EINPROGRESS at all?

The way EWOULDBLOCK is mentioned on msdn
(http://msdn.microsoft.com/en-us/library/windows/desktop/ms737625%28v=vs.85%29.aspx)
it indeed seems to be required.
I don't see how we could trigger the conditions for EINPROGRESS on
windows that msdn lists, but since we need it on unixoid systems and its
valid to treat the connect as partiall successfull on windows, there
seems little benefit in dropping it. So I guess the #ifdef you suggested
is the way to go.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-27 00:07:40
Message-ID: 1681.1372291660@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-06-26 12:07:54 -0400, Tom Lane wrote:
>> ... I wonder whether it's safe to remove the case altogether. Could
>> anyone research the situation for non-blocking connect() on Windows?
>> Perhaps on Windows we shouldn't test for EINPROGRESS at all?

> The way EWOULDBLOCK is mentioned on msdn
> (http://msdn.microsoft.com/en-us/library/windows/desktop/ms737625%28v=vs.85%29.aspx)
> it indeed seems to be required.

Yeah, that seems pretty clear: WSAEWOULDBLOCK is what's returned when a
nonblocking connect() has been started successfully.

> I don't see how we could trigger the conditions for EINPROGRESS on
> windows that msdn lists, but since we need it on unixoid systems and its
> valid to treat the connect as partiall successfull on windows, there
> seems little benefit in dropping it.

I was about to argue for removing the EINPROGRESS check on Windows,
on the grounds that this would be the same type of bug as you're
complaining of on Linux, ie, if the call does return this error we'll
mistakenly think the connection is in progress and go on to deliver an
unhelpful failure message later.

However, some more trolling of the intertubes suggests that Cygwin's
emulation of socket() does indeed return EINPROGRESS; see for instance
this ancient thread of ours:
http://www.postgresql.org/message-id/flat/14855(dot)49635(dot)565990(dot)716645(at)kryten(dot)bedford(dot)waii(dot)com#14855(dot)49635(dot)565990(dot)716645@kryten.bedford.waii.com
Unless we want to distinguish Cygwin from native Windows in this code
chunk, maybe we'd better leave well enough alone.

I still want to delete the test for SOCK_ERRNO == 0. I traced that back
to commit da9501bddb42222dc33c031b1db6ce2133bcee7b, but I can't find
anything in the mailing list archives to explain that. I have an
inquiry in to Jan to see if he can remember the reason ...

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-27 06:17:57
Message-ID: 20130627061757.GB8025@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-26 20:07:40 -0400, Tom Lane wrote:
> > I don't see how we could trigger the conditions for EINPROGRESS on
> > windows that msdn lists, but since we need it on unixoid systems and its
> > valid to treat the connect as partiall successfull on windows, there
> > seems little benefit in dropping it.

> I was about to argue for removing the EINPROGRESS check on Windows,
> on the grounds that this would be the same type of bug as you're
> complaining of on Linux, ie, if the call does return this error we'll
> mistakenly think the connection is in progress and go on to deliver an
> unhelpful failure message later.

> However, some more trolling of the intertubes suggests that Cygwin's
> emulation of socket() does indeed return EINPROGRESS; see for instance
> this ancient thread of ours:
> http://www.postgresql.org/message-id/flat/14855(dot)49635(dot)565990(dot)716645(at)kryten(dot)bedford(dot)waii(dot)com#14855(dot)49635(dot)565990(dot)716645@kryten.bedford.waii.com
> Unless we want to distinguish Cygwin from native Windows in this code
> chunk, maybe we'd better leave well enough alone.

After some looking it gets funnier. There currently doesn't seem to be
any chance that we actually can get an EINPROGRESS at that level on
windows. We #define connect() to pgwin32_connect() which fudges errno
around. Where WSAEINPROGRESS is mapped to EINVAL.

Also for some reason the connect is performed synchronously to the
outside by doing pgwin32_waitforsinglesocket() if EWOULDBLOCK is
returned by connect()...
That seems to be the case since
a4c40f140d23cefbf94e00283f7688c772867f1b.

> I still want to delete the test for SOCK_ERRNO == 0. I traced that back
> to commit da9501bddb42222dc33c031b1db6ce2133bcee7b, but I can't find
> anything in the mailing list archives to explain that. I have an
> inquiry in to Jan to see if he can remember the reason ...

That looks strange, yes.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-27 13:48:25
Message-ID: 19850.1372340905@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-06-26 20:07:40 -0400, Tom Lane wrote:
>> However, some more trolling of the intertubes suggests that Cygwin's
>> emulation of socket() does indeed return EINPROGRESS; see for instance
>> this ancient thread of ours:
>> http://www.postgresql.org/message-id/flat/14855(dot)49635(dot)565990(dot)716645(at)kryten(dot)bedford(dot)waii(dot)com#14855(dot)49635(dot)565990(dot)716645@kryten.bedford.waii.com
>> Unless we want to distinguish Cygwin from native Windows in this code
>> chunk, maybe we'd better leave well enough alone.

> After some looking it gets funnier. There currently doesn't seem to be
> any chance that we actually can get an EINPROGRESS at that level on
> windows. We #define connect() to pgwin32_connect() which fudges errno
> around. Where WSAEINPROGRESS is mapped to EINVAL.

That's only in the backend though: note the #define is controlled by
#ifndef FRONTEND, and we don't link backend/port/win32/socket.c into
libpq anyway. The signal managing it's doing wouldn't work at all in
client-side programs.

But yeah, on the backend side we would definitely treat WSAEINPROGRESS
as a hard error. OTOH, we don't use nonblocking connect (much?) in the
backend so I'm not sure whether that's a directly comparable case or
not.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-27 16:42:47
Message-ID: 24398.1372351367@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-06-26 20:07:40 -0400, Tom Lane wrote:
>> I still want to delete the test for SOCK_ERRNO == 0. I traced that back
>> to commit da9501bddb42222dc33c031b1db6ce2133bcee7b, but I can't find
>> anything in the mailing list archives to explain that. I have an
>> inquiry in to Jan to see if he can remember the reason ...

> That looks strange, yes.

Committed that way. We can always undo it if Jan can defend the logic.

regards, tom lane