libpq connection status and closed fd

Lists: pgsql-hackers
From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: libpq connection status and closed fd
Date: 2014-09-22 06:42:01
Message-ID: CA+mi_8YF2XRJKKkiaN_D1h4f=5wGqL-0h43qdeA0wU9CHsNVtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

a psycopg user is reporting [1] that the library is not marking the
connection as closed and/or bad after certain errors, such as a
connection timeout. He is emulating the error by closing the
connection fd (I don't know if the two conditions result in the same
effect, but I'll stick to this hypothesis for now).

[1] https://github.com/psycopg/psycopg2/issues/263

Testing with a short C program [2] it seems that indeed, after closing
the fd and causing an error in `PQconsumeInput`, the connection is
deemed in good state by `PQstatus`. A similar test gives the same
result after `PQexec` is attempted on a connection whose fd is closed
(tests performed with PostgreSQL 9.3.5).

[2] https://gist.github.com/dvarrazzo/065f343c95f8ea67cf8f

Is this intentional? Is there a better way to check for a broken connection?

If we mark the connection as closed every time `PQconsumeInput`
returns 0 (or `PQexec` returns null, which are the two code paths
affecting psycopg) would it be too aggressive and cause false
positives?

Thank you very much.

-- Daniele


From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq connection status and closed fd
Date: 2014-09-22 07:17:32
Message-ID: CAAfz9KPXzfXGRoRwY47fuJzxGWvpMwn_ZQi36jsruyC8WzSS6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-09-22 10:42 GMT+04:00 Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>:

> Hello,
>
> a psycopg user is reporting [1] that the library is not marking the
> connection as closed and/or bad after certain errors, such as a
> connection timeout. He is emulating the error by closing the
> connection fd (I don't know if the two conditions result in the same
> effect, but I'll stick to this hypothesis for now).
>
> [1] https://github.com/psycopg/psycopg2/issues/263
>
> Testing with a short C program [2] it seems that indeed, after closing
> the fd and causing an error in `PQconsumeInput`, the connection is
> deemed in good state by `PQstatus`. A similar test gives the same
> result after `PQexec` is attempted on a connection whose fd is closed
> (tests performed with PostgreSQL 9.3.5).
>
> [2] https://gist.github.com/dvarrazzo/065f343c95f8ea67cf8f

Why are you using close() instead of PQfinish()?

>
>
> Is this intentional? Is there a better way to check for a broken
> connection?
>
BTW, PQsocket() returns -1 after PQfinish().

>
> If we mark the connection as closed every time `PQconsumeInput`
> returns 0 (or `PQexec` returns null, which are the two code paths
> affecting psycopg) would it be too aggressive and cause false
> positives?
>
> Thank you very much.
>
> -- Daniele
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
// Dmitriy.


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq connection status and closed fd
Date: 2014-09-22 07:35:14
Message-ID: CA+mi_8a7rtLD5xnbWNaQHtMnQ_HtK3ZgeDwHkbu6PLW6Y3f1RA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 22, 2014 at 8:17 AM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com> wrote:
>
> 2014-09-22 10:42 GMT+04:00 Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>:

>> [2] https://gist.github.com/dvarrazzo/065f343c95f8ea67cf8f
>
> Why are you using close() instead of PQfinish()?

Because I'm testing for an error, please read my message and the
original bug report.

-- Daniele


From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq connection status and closed fd
Date: 2014-09-22 07:45:34
Message-ID: CAAfz9KPxTPbx8u-PbtmcPN3JiG2nt=Rs7n3y00qS+ME-LdFKgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-09-22 11:35 GMT+04:00 Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>:

> On Mon, Sep 22, 2014 at 8:17 AM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
> wrote:
> >
> > 2014-09-22 10:42 GMT+04:00 Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com
> >:
>
> >> [2] https://gist.github.com/dvarrazzo/065f343c95f8ea67cf8f
> >
> > Why are you using close() instead of PQfinish()?
>
> Because I'm testing for an error, please read my message and the
> original bug report.
>
I read it. You are testing for an error, but use libpq in a wrong way in
your test case. You must use PQfinish() to close the connection
and PQstatus() will works for you.

> -- Daniele
>

--
// Dmitriy.


From: Marko Tiikkaja <marko(at)joh(dot)to>
To: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq connection status and closed fd
Date: 2014-09-22 08:36:26
Message-ID: 541FDF8A.8000403@joh.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/22/14 9:45 AM, Dmitriy Igrishin wrote:
> 2014-09-22 11:35 GMT+04:00 Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>:
>
>> On Mon, Sep 22, 2014 at 8:17 AM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
>> wrote:
>>> Why are you using close() instead of PQfinish()?
>>
>> Because I'm testing for an error, please read my message and the
>> original bug report.
>>
> I read it. You are testing for an error, but use libpq in a wrong way in
> your test case. You must use PQfinish() to close the connection
> and PQstatus() will works for you.

Perhaps you should go back and re-read it then. The point of the test
case is not to test connection closure; it's to test behaviour in the
presence of network errors.

.marko


From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq connection status and closed fd
Date: 2014-09-22 08:45:51
Message-ID: CAAfz9KPjVLi+i9x=C-YYQ5zakZcJfJRdERQnmVpgXB5_4Fg-dw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-09-22 12:36 GMT+04:00 Marko Tiikkaja <marko(at)joh(dot)to>:

> On 9/22/14 9:45 AM, Dmitriy Igrishin wrote:
>
>> 2014-09-22 11:35 GMT+04:00 Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>:
>>
>> On Mon, Sep 22, 2014 at 8:17 AM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
>>> wrote:
>>>
>>>> Why are you using close() instead of PQfinish()?
>>>>
>>>
>>> Because I'm testing for an error, please read my message and the
>>> original bug report.
>>>
>>> I read it. You are testing for an error, but use libpq in a wrong way in
>> your test case. You must use PQfinish() to close the connection
>> and PQstatus() will works for you.
>>
>
> Perhaps you should go back and re-read it then. The point of the test
> case is not to test connection closure; it's to test behaviour in the
> presence of network errors.

And where the network error emulation in the test case? By closing fd?
I'm sorry if I don't understand something, but really, I don't see any
problem
or incorrect behavior of *libpq*. It's behavior adequate to a test case.

>
>
>
> .marko
>

--
// Dmitriy.


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq connection status and closed fd
Date: 2014-09-22 08:57:10
Message-ID: 20140922085710.GA10971@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-09-22 07:42:01 +0100, Daniele Varrazzo wrote:
> Hello,
>
> a psycopg user is reporting [1] that the library is not marking the
> connection as closed and/or bad after certain errors, such as a
> connection timeout. He is emulating the error by closing the
> connection fd (I don't know if the two conditions result in the same
> effect, but I'll stick to this hypothesis for now).
>
> [1] https://github.com/psycopg/psycopg2/issues/263
>
> Testing with a short C program [2] it seems that indeed, after closing
> the fd and causing an error in `PQconsumeInput`, the connection is
> deemed in good state by `PQstatus`. A similar test gives the same
> result after `PQexec` is attempted on a connection whose fd is closed
> (tests performed with PostgreSQL 9.3.5).
>
> [2] https://gist.github.com/dvarrazzo/065f343c95f8ea67cf8f
>
> Is this intentional? Is there a better way to check for a broken connection?

Note that the libpq code treats connection resets differently from
other, arbitrary, errors:

int
pqReadData(PGconn *conn)
{
...
nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
conn->inBufSize - conn->inEnd);
if (nread < 0)
{
if (SOCK_ERRNO == EINTR)
goto retry3;
/* Some systems return EAGAIN/EWOULDBLOCK for no data */
#ifdef EAGAIN
if (SOCK_ERRNO == EAGAIN)
return someread;
#endif
#if defined(EWOULDBLOCK) && (!defined(EAGAIN) || (EWOULDBLOCK != EAGAIN))
if (SOCK_ERRNO == EWOULDBLOCK)
return someread;
#endif
/* We might get ECONNRESET here if using TCP and backend died */
#ifdef ECONNRESET
if (SOCK_ERRNO == ECONNRESET)
goto definitelyFailed;
#endif
/* pqsecure_read set the error message for us */
return -1;
}

I.e. if the kernel returns that the connection has been closed it'll do
different stuff.

So I'm not convinced that the testcaseq is valid. This isn't the error
you'd get after a timeout or similar. We could add a special case path
for EBADFD, but why?

> If we mark the connection as closed every time `PQconsumeInput`
> returns 0 (or `PQexec` returns null, which are the two code paths
> affecting psycopg) would it be too aggressive and cause false
> positives?

Likely, yes.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Marko Tiikkaja <marko(at)joh(dot)to>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq connection status and closed fd
Date: 2014-09-22 09:13:36
Message-ID: 541FE840.5070908@joh.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/22/14 10:57 AM, Andres Freund wrote:
> On 2014-09-22 07:42:01 +0100, Daniele Varrazzo wrote:
>> Is this intentional? Is there a better way to check for a broken connection?
>
> Note that the libpq code treats connection resets differently from
> other, arbitrary, errors:
>
> I.e. if the kernel returns that the connection has been closed it'll do
> different stuff.
>
> So I'm not convinced that the testcaseq is valid. This isn't the error
> you'd get after a timeout or similar. We could add a special case path
> for EBADFD, but why?

Probably not for EBADFD, but how about e.g. ETIMEDOUT? The SSL code
path seems to be putting EPIPE into the same category as ECONNRESET, too.

.marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq connection status and closed fd
Date: 2014-09-22 13:54:15
Message-ID: 12708.1411394055@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> writes:
> a psycopg user is reporting [1] that the library is not marking the
> connection as closed and/or bad after certain errors, such as a
> connection timeout. He is emulating the error by closing the
> connection fd

That seems like a completely illegitimate test procedure. A network
failure does not result in automatic closure of the FD so there is
no reason for the libpq code to be expecting that to happen. Instead,
it expects to see an appropriate error code next time it tries to
use the socket.

If you want to test for connection loss, consider doing a kill -9 on
the connected backend (best not to do that with a production server
of course).

A larger point is that your user may well have false expectations
about how quickly libpq will detect connection loss. Generally
that won't happen until it next tries to transmit or receive some
data; so a simple PQstatus() call doesn't prove a lot about whether
the connection has been lost since the last query. This is not a
bug either IMO.

regards, tom lane