Re: BUG #5837: PQstatus() fails to report lost connection

Lists: pgsql-bugs
From: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-14 01:36:29
Message-ID: 201101140136.p0E1aTFb017852@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5837
Logged by: Murray S. Kucherawy
Email address: msk(at)cloudmark(dot)com
PostgreSQL version: 9.0.2
Operating system: FreeBSD
Description: PQstatus() fails to report lost connection
Details:

I'm accessing libpq via OpenDBX, but it appears to be doing the right thing.
This is reproducible.

1) establish a connection to postgresql
2) initiate a query, collect results, etc.; all normal
3) while client is idle, restart the server
4) initiate the very same query as before
5) call PQgetResult(), returns non-NULL
6) call PQresultStatus(), returns PGRES_FATAL_ERROR
7) call PQstatus(), returns CONNECTION_OK

This causes the caller not to try PQreset() or equivalent, because the
connection is presumably fine. However, of course, no further queries will
succeed.

A call to PQerrorMessage() returns the string "FATAL: terminating connection
due to administrator command", which would be expected.

I would hate to have to have the application pull that out each time and
look for this specific string to detect the dead connection.

What happens internally prior to (7) appears to be a bug. I looked through
the "TODO" list and didn't see any mention of this.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-23 02:51:33
Message-ID: AANLkTinZAdPCX685ecg2_eVXJ1vDNXL4Ki-B1rFN=Z3h@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Thu, Jan 13, 2011 at 8:36 PM, Murray S. Kucherawy <msk(at)cloudmark(dot)com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5837
> Logged by:          Murray S. Kucherawy
> Email address:      msk(at)cloudmark(dot)com
> PostgreSQL version: 9.0.2
> Operating system:   FreeBSD
> Description:        PQstatus() fails to report lost connection
> Details:
>
> I'm accessing libpq via OpenDBX, but it appears to be doing the right thing.
>  This is reproducible.
>
> 1) establish a connection to postgresql
> 2) initiate a query, collect results, etc.; all normal
> 3) while client is idle, restart the server
> 4) initiate the very same query as before
> 5) call PQgetResult(), returns non-NULL
> 6) call PQresultStatus(), returns PGRES_FATAL_ERROR
> 7) call PQstatus(), returns CONNECTION_OK
>
> This causes the caller not to try PQreset() or equivalent, because the
> connection is presumably fine.  However, of course, no further queries will
> succeed.

I can reproduce this by hacking up src/test/examples/testlibpq to
loop, but I'm not totally sure what's causing the behavior. I think
the problem may be that libpq only reads enough from the connection to
get the FATAL error. It doesn't keep reading to see whether there's
an EOF afterward, and thus doesn't immediately realize that the
connection has been closed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-23 03:44:04
Message-ID: 1950.1295754244@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jan 13, 2011 at 8:36 PM, Murray S. Kucherawy <msk(at)cloudmark(dot)com> wrote:
>> 1) establish a connection to postgresql
>> 2) initiate a query, collect results, etc.; all normal
>> 3) while client is idle, restart the server
>> 4) initiate the very same query as before
>> 5) call PQgetResult(), returns non-NULL
>> 6) call PQresultStatus(), returns PGRES_FATAL_ERROR
>> 7) call PQstatus(), returns CONNECTION_OK

> I can reproduce this by hacking up src/test/examples/testlibpq to
> loop, but I'm not totally sure what's causing the behavior.

What did you do exactly? testlibpq.c just uses PQexec(), and AFAICS the
connection status does end up BAD if the backend is terminated before a
PQexec starts.

> I think
> the problem may be that libpq only reads enough from the connection to
> get the FATAL error. It doesn't keep reading to see whether there's
> an EOF afterward, and thus doesn't immediately realize that the
> connection has been closed.

I think the OP's mistake is to assume that the first PQgetResult ought
to set this. As you say, it hasn't discovered the EOF condition at the
time it returns that error-message result, and we are certainly not
going to add another kernel call to every query cycle to check for EOF.

The reason I don't see a problem when using PQexec is that PQexec will
internally do another PQgetResult, and it's the second one that will
fail and reset the connection status. In a non-connection-termination
situation, the second internal PQgetResult call consumes the
ReadyForQuery message, and at that point we fall out of PQexec (without
any extra kernel call). Here, though, there won't be a ReadyForQuery,
and it's the read() to try to collect one that discovers the loss of
connection.

In short: I don't think there's a bug here, just failure to understand
proper use of PQgetResult.

regards, tom lane


From: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 03:59:04
Message-ID: F5833273385BB34F99288B3648C4F06F1341E73FC4@EXCH-C2.corp.cloudmark.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Saturday, January 22, 2011 7:44 PM
> To: Robert Haas
> Cc: Murray S. Kucherawy; pgsql-bugs(at)postgresql(dot)org
> Subject: Re: [BUGS] BUG #5837: PQstatus() fails to report lost connection
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Thu, Jan 13, 2011 at 8:36 PM, Murray S. Kucherawy <msk(at)cloudmark(dot)com> wrote:
> >> 1) establish a connection to postgresql
> >> 2) initiate a query, collect results, etc.; all normal
> >> 3) while client is idle, restart the server
> >> 4) initiate the very same query as before
> >> 5) call PQgetResult(), returns non-NULL
> >> 6) call PQresultStatus(), returns PGRES_FATAL_ERROR
> >> 7) call PQstatus(), returns CONNECTION_OK
>
> I think the OP's mistake is to assume that the first PQgetResult ought
> to set this. As you say, it hasn't discovered the EOF condition at the
> time it returns that error-message result, and we are certainly not
> going to add another kernel call to every query cycle to check for EOF.
>
> The reason I don't see a problem when using PQexec is that PQexec will
> internally do another PQgetResult, and it's the second one that will
> fail and reset the connection status. In a non-connection-termination
> situation, the second internal PQgetResult call consumes the
> ReadyForQuery message, and at that point we fall out of PQexec (without
> any extra kernel call). Here, though, there won't be a ReadyForQuery,
> and it's the read() to try to collect one that discovers the loss of
> connection.
>
> In short: I don't think there's a bug here, just failure to understand
> proper use of PQgetResult.

For the sake of context, I'm using opendbx which is a layer between my application and any SQL backend it can support. The code that hits this problem is in there. See http://www.linuxnetworks.de/doc/index.php/OpenDBX. I'll forward this thread to that code's maintainer.

As for the reply above, I disagree. PQstatus(), as documented, doesn't say anything about certain conditions in which it won't report that the connection is dead, when it actually is, once the connection was already established and working.

I would also suggest that everything you're describing is internal details of libpq implementation and not stuff that should be visible to consumers of libpq. The level of knowledge described strikes me as the kind of thing your average libpq consumer shouldn't need to have; it's exactly why you want to have a library providing this sort of service in the first place.

Moreover, the description of PQgetResult() doesn't say or illustrate anything about proper use of it in this context, so how would a reader know he/she got it wrong? The documentation I can find online of PQgetResult() doesn't enumerate the conditions where PQstatus() gives a false indication of whether or not a reconnect is required, nor does any part of the documentation I could find state that PGRES_FATAL_ERROR always implies the connection is no longer usable and must be re-established; "FATAL" could be referring to the transaction/request, not the connection.

So, if this isn't a bug, then I think the documentation needs a bit of work in this area.

-MSK


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 04:26:49
Message-ID: AANLkTikkjFMKoa6PQfJSc31Yn_M5+T3u7dRO4zUi-Bw2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sat, Jan 22, 2011 at 10:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> What did you do exactly?  testlibpq.c just uses PQexec(), and AFAICS the
> connection status does end up BAD if the backend is terminated before a
> PQexec starts.

PFA.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
testlibpq-loop-hack application/octet-stream 619 bytes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 04:58:44
Message-ID: AANLkTi=rSATsN+3UD5feWm3ntsB=6jqAFBZWdTZ7ZfaO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sun, Jan 23, 2011 at 10:59 PM, Murray S. Kucherawy <msk(at)cloudmark(dot)com> wrote:
> As for the reply above, I disagree.  PQstatus(), as documented, doesn't say anything about certain conditions in which it won't report that the connection is dead, when it actually is, once the connection was already established and working.

I understand that gut feeling, but I think if you think about it a
little, you may realize that at some level it's an unreasonable
expectation. For example, suppose you were to connect to the server
(successfully), unplug the network cable, and then call PQstatus().
There is literally no way for PQstatus() to know whether that
connection is still there. It's just returning some internal flag out
of the object. And even if we wanted it to do something else, what?
Send a keepalive packet of some sort and wait for a response? Given
default TCP parameter settings, that could take many minutes to give
an answer - which is probably not what people want when they call
PQstatus() - and it would introduce a lot of otherwise unnecessary
network traffic for people who want to query the internal state of the
PGconn, not ping the server.

Now, in this case, things are a bit less clear, because it's not as if
the remote side has given us no indication that the connection is
about to get closed. This doesn't strike me as awesome protocol
design, but 14 years on we're probably stuck with it. I think if you
want to have some special handling when an administrative shutdown
happens on the server, you should use PQresultErrorField() to check
PG_DIAG_SQLSTATE; or if you want FATAL and PANIC conditions to be
handled specially you can check PG_DIAG_SEVERITY.

> Moreover, the description of PQgetResult() doesn't say or illustrate anything about proper use of it in this context, so how would a reader know he/she got it wrong?  The documentation I can find online of PQgetResult() doesn't enumerate the conditions where PQstatus() gives a false indication of whether or not a reconnect is required, nor does any part of the documentation I could find state that PGRES_FATAL_ERROR always implies the connection is no longer usable and must be re-established; "FATAL" could be referring to the transaction/request, not the connection.
>
> So, if this isn't a bug, then I think the documentation needs a bit of work in this area.

The description of PGRES_FATAL_ERROR in the documentation does pretty
much suck. I believe you'll get that if the backend returns either an
ERROR (in which case you need to retry the transaction) or a FATAL (in
which case you need to reset the connection), but that's not at all
clear either from the documentation or the naming of the constant
(which, alas, is hard to change at this point for
backward-compatibility reasons).

The description of PQstatus() looks correct to me. It says that "a
communications failure might result in the status changing to
CONNECTION_BAD prematurely". In the scenario you describe, no
communications failure has occurred. The server sent back an error
message, and closed the connection (though libpq hasn't noticed yet)
but there's no communications error anywhere until the client tries to
send a query over a connection that doesn't exist any more. Maybe we
could flesh that description out a bit to make it more clear that this
is really only going to tell you if TCP/IP has explicitly blown up in
your face, and not any other reason why things might not be working,
although I think there are hints of that there already.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 05:59:42
Message-ID: F5833273385BB34F99288B3648C4F06F1341E73FC6@EXCH-C2.corp.cloudmark.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Sunday, January 23, 2011 8:59 PM
> To: Murray S. Kucherawy
> Cc: Tom Lane; pgsql-bugs(at)postgresql(dot)org
> Subject: Re: [BUGS] BUG #5837: PQstatus() fails to report lost connection
>
> > As for the reply above, I disagree.  PQstatus(), as documented,
> > doesn't say anything about certain conditions in which it won't report
> > that the connection is dead, when it actually is, once the connection
> > was already established and working.
>
> I understand that gut feeling, but I think if you think about it a
> little, you may realize that at some level it's an unreasonable
> expectation. For example, suppose you were to connect to the server
> (successfully), unplug the network cable, and then call PQstatus().
> There is literally no way for PQstatus() to know whether that
> connection is still there. It's just returning some internal flag out
> of the object.

True, of course. But that's not the case here; PQsendRequest() was used to try to contact the server to submit some SQL operation after the server had been restarted. Enough of that operation failed so as to make PQresultStatus() return PQRES_FATAL_ERROR. There was enough I/O for libpq to figure out that something has gone wrong, in fact enough to know that the cause was administrative termination of the server for some reason (since the error string returned is "FATAL: terminating connection due to administrator command"). Basically it appears, then, that enough communication (or attempts at it) took place to observe that the connection not usable anymore and mark it as such, and even why.

Again, I'm looking at this as a consumer of the API; I don't know, and to some degree shouldn't care, what's going on inside. The results simply look inconsistent.

I don't expect libpq to do any kind of keep-alive work, but once something has reported PGRES_FATAL_ERROR, I would expect the result returned by PQstatus() to be accurate. In this case, it isn't. It gives the impression that an incomplete state change occurred or something like that.

> Now, in this case, things are a bit less clear, because it's not as if
> the remote side has given us no indication that the connection is
> about to get closed. This doesn't strike me as awesome protocol
> design, but 14 years on we're probably stuck with it. I think if you
> want to have some special handling when an administrative shutdown
> happens on the server, you should use PQresultErrorField() to check
> PG_DIAG_SQLSTATE; or if you want FATAL and PANIC conditions to be
> handled specially you can check PG_DIAG_SEVERITY.

The model in OpenDBX appears to be to query and then iterate getting results until there are none; if there's ever an error, see if it's a kind of error that requires a reconnection attempt before continuing. Without assuming PGRES_FATAL_ERROR always means a reconnect is required, it seems like PQstatus() is the right test for the latter of those two "ifs", and that's what they're doing. It doesn't really matter if the disconnect was a result of administrative action or not; the caller just needs to know if it has to try to reconnect before continuing.

> The description of PGRES_FATAL_ERROR in the documentation does pretty
> much suck. I believe you'll get that if the backend returns either an
> ERROR (in which case you need to retry the transaction) or a FATAL (in
> which case you need to reset the connection), but that's not at all
> clear either from the documentation or the naming of the constant
> (which, alas, is hard to change at this point for
> backward-compatibility reasons).

Well my other suggestion would be to assume PGRES_FATAL_ERROR always means the connection needs to be reset. But this blows that idea away; this would cause a connection reset that wouldn't actually solve the problem when it's an ERROR as you described.

> The description of PQstatus() looks correct to me. It says that "a
> communications failure might result in the status changing to
> CONNECTION_BAD prematurely". In the scenario you describe, no
> communications failure has occurred. The server sent back an error
> message, and closed the connection (though libpq hasn't noticed yet)
> but there's no communications error anywhere until the client tries to
> send a query over a connection that doesn't exist any more. Maybe we
> could flesh that description out a bit to make it more clear that this
> is really only going to tell you if TCP/IP has explicitly blown up in
> your face, and not any other reason why things might not be working,
> although I think there are hints of that there already.

Given what you say here, this seems to be an exception for which there is currently no detection mechanism short of looking for that one specific error string indicating it was administrative action causing the error. I think your suggested changes would certainly help, as well as some generalized way to know that the connection handle is now unusable, either because of administrative server action or because the connection became unreliable for some other reason. I'd be fine with having to do PQstatus() || PQadminStatus() too, if such a thing were to appear. Better documentation of PGRES_FATAL_ERROR would help as well.

-MSK


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 11:55:10
Message-ID: AANLkTimDRAO_xe4h=vL2j0rdGF3_u2uTk0OHmv2rZK+=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Jan 24, 2011 at 12:59 AM, Murray S. Kucherawy <msk(at)cloudmark(dot)com> wrote:
> Well my other suggestion would be to assume PGRES_FATAL_ERROR always means the connection needs to be reset.  But this blows that idea away; this would cause a connection reset that wouldn't actually solve the problem when it's an ERROR as you described.

Well, actually it's the other way around: resetting the connection
would *work* in the case of an ERROR, but it'd be overkill. Most of
the things that can go wrong are ERROR rather than FATAL, and you just
issue a ROLLBACK to balance out any BEGIN you previously issued, and
then do whatever it is you want to do next. A full connection reset
would be pretty expensive in this context and, as you might guess,
there are a lot more things that cause ERROR than there are that cause
FATAL.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 15:23:10
Message-ID: 6379.1295882590@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Jan 22, 2011 at 10:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What did you do exactly? testlibpq.c just uses PQexec(), and AFAICS the
>> connection status does end up BAD if the backend is terminated before a
>> PQexec starts.

> PFA.

Yeah, that was my first cut at it too, but it's not terribly interesting
because it doesn't tell you anything about what PQstatus says after the
first failed PQexec.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 15:29:41
Message-ID: 6493.1295882981@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Murray S. Kucherawy" <msk(at)cloudmark(dot)com> writes:
> Given what you say here, this seems to be an exception for which there
> is currently no detection mechanism short of looking for that one
> specific error string indicating it was administrative action causing
> the error.

This is complete nonsense. If you followed the documented method for
using PQgetResult -- ie, keep calling it till it returns NULL, rather
than assuming there will be a specific number of results --- then
PQstatus would show you the bad status afterwards.

regards, tom lane


From: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 17:26:36
Message-ID: F5833273385BB34F99288B3648C4F06F1341E73FCB@EXCH-C2.corp.cloudmark.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Monday, January 24, 2011 7:30 AM
> To: Murray S. Kucherawy
> Cc: Robert Haas; pgsql-bugs(at)postgresql(dot)org
> Subject: Re: [BUGS] BUG #5837: PQstatus() fails to report lost connection
>
> "Murray S. Kucherawy" <msk(at)cloudmark(dot)com> writes:
> > Given what you say here, this seems to be an exception for which there
> > is currently no detection mechanism short of looking for that one
> > specific error string indicating it was administrative action causing
> > the error.
>
> This is complete nonsense. If you followed the documented method for
> using PQgetResult -- ie, keep calling it till it returns NULL, rather
> than assuming there will be a specific number of results --- then
> PQstatus would show you the bad status afterwards.

Your assertion here of "Read the result set to exhaustion, THEN check connection status to see if it was bad to begin with," can be politely described as counter-intuitive.

Please, at a minimum, add some documentation about it.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 17:41:36
Message-ID: 9838.1295890896@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Murray S. Kucherawy" <msk(at)cloudmark(dot)com> writes:
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>> This is complete nonsense. If you followed the documented method for
>> using PQgetResult -- ie, keep calling it till it returns NULL, rather
>> than assuming there will be a specific number of results --- then
>> PQstatus would show you the bad status afterwards.

> Your assertion here of "Read the result set to exhaustion, THEN check connection status to see if it was bad to begin with," can be politely described as counter-intuitive.

[ shrug... ] I guess if your intuition is that PQstatus should be
expected to intuit the failure of a call that hasn't happened yet,
then it's counter-intuitive.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 17:49:52
Message-ID: 4D3D67600200002500039B2C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Murray S. Kucherawy" <msk(at)cloudmark(dot)com> wrote:

> Please, at a minimum, add some documentation about it.

Current documentation at:

http://www.postgresql.org/docs/9.0/interactive/libpq-async.html

says:

| PQgetResult must be called repeatedly until it returns a null
| pointer, indicating that the command is done.

What do you think would make this more clear?

-Kevin


From: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 18:04:14
Message-ID: F5833273385BB34F99288B3648C4F06F1341E73FCD@EXCH-C2.corp.cloudmark.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> -----Original Message-----
> From: Kevin Grittner [mailto:Kevin(dot)Grittner(at)wicourts(dot)gov]
> Sent: Monday, January 24, 2011 9:50 AM
> To: Murray S. Kucherawy; Tom Lane
> Cc: Robert Haas; pgsql-bugs(at)postgresql(dot)org
> Subject: Re: [BUGS] BUG #5837: PQstatus() fails to report lost connection
>
> "Murray S. Kucherawy" <msk(at)cloudmark(dot)com> wrote:
>
> > Please, at a minimum, add some documentation about it.
>
> Current documentation at:
>
> http://www.postgresql.org/docs/9.0/interactive/libpq-async.html
>
> says:
>
> | PQgetResult must be called repeatedly until it returns a null
> | pointer, indicating that the command is done.
>
> What do you think would make this more clear?

"command is done" sounds like normal operation to me, but we're talking about an obvious exception here. The command (I assume that means the query that ultimately failed) can't even have started because the connection was gone. That's why I find the description misleading.

Indeed, the function order you're talking about is PQgetResult(), which returns non-NULL in this condition (which looks like there's a result available), but then PQresultStatus() returns PGRES_FATAL_ERROR. It's not very obvious to me that the right thing to do here is call PQgetResult() again just to get PQstatus() to tell me the truth. Now we're dealing with an exception rather than something normal.

So maybe something like this after the paragraph you cited would help:

"Note that after returning a PGresult object, PQresultStatus() could indicate a fatal error. The caller should still iterate calling PQgetResult() to completion (i.e. until it returns NULL) in order to allow libpq to process the error information completely."


From: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 18:14:43
Message-ID: F5833273385BB34F99288B3648C4F06F1341E73FCE@EXCH-C2.corp.cloudmark.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Monday, January 24, 2011 9:42 AM
> To: Murray S. Kucherawy
> Cc: Robert Haas; pgsql-bugs(at)postgresql(dot)org
> Subject: Re: [BUGS] BUG #5837: PQstatus() fails to report lost connection
>
> >> This is complete nonsense. If you followed the documented method for
> >> using PQgetResult -- ie, keep calling it till it returns NULL, rather
> >> than assuming there will be a specific number of results --- then
> >> PQstatus would show you the bad status afterwards.
>
> > Your assertion here of "Read the result set to exhaustion, THEN check
> > connection status to see if it was bad to begin with," can be politely
> > described as counter-intuitive.
>
> [ shrug... ] I guess if your intuition is that PQstatus should be
> expected to intuit the failure of a call that hasn't happened yet,
> then it's counter-intuitive.

You're mixing internal implementation details with what the API consumer sees. I don't expect PQstatus() to do anything other than tell me that a connection is no longer usable, which is what it's documented to do. I'm fine with checking it after an error has been returned by some query-related function, which is what OpenDBX is doing. I don't know why that's "complete nonsense".

Look at the order of operations here:

0) A previous connection exists. The server is manually restarted.

1) A query is issued. This succeeds.

2) A request for results is issued. This returns non-NULL, apparently indicating at least one result object came back.

3) PQresultStatus() is called, and it returns PGRES_FATAL_ERROR.

4) PQstatus() is checked, and indicates no problem.

4a) A call to PQresultErrorMessage() indicates that the library knows the connection was administratively reset.

As a consumer of the API, not knowing (or wanting to know) what's going on under-the-hood, this looks like a bug; the library certainly did know that the connection can no longer be used or 4a wouldn't happen, but PQstatus() is not saying so. The documentation I see at http://www.postgresql.org/docs/9.0/interactive/libpq-status.html doesn't lead me to believe that any of this logic is faulty. That I have to iterate on step 2 above to get PQstatus() to eventually tell me the truth even though I know something has gone wrong already seems strange, and that's why I raised this issue.

Another possible explanation is that PQresultStatus() doesn't give a consistent result until after PQgetResult() has returned NULL. If that's the case, it should be documented, because right now it isn't.

-MSK


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 18:47:18
Message-ID: 4D3D74D60200002500039B3C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Murray S. Kucherawy" <msk(at)cloudmark(dot)com> wrote:
> From: Kevin Grittner [mailto:Kevin(dot)Grittner(at)wicourts(dot)gov]

>> What do you think would make this more clear?

> So maybe something like this after the paragraph you cited would
> help:
>
> "Note that after returning a PGresult object, PQresultStatus()
> could indicate a fatal error. The caller should still iterate
> calling PQgetResult() to completion (i.e. until it returns NULL)
> in order to allow libpq to process the error information
> completely."

A patch based on this suggestion attached for consideration by the
community.

-Kevin

Attachment Content-Type Size
PQgetResult-doc.patch text/plain 687 bytes

From: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-24 18:48:25
Message-ID: F5833273385BB34F99288B3648C4F06F1341E73FD2@EXCH-C2.corp.cloudmark.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> -----Original Message-----
> From: Kevin Grittner [mailto:Kevin(dot)Grittner(at)wicourts(dot)gov]
> Sent: Monday, January 24, 2011 10:47 AM
> To: Murray S. Kucherawy; Tom Lane
> Cc: Robert Haas; pgsql-bugs(at)postgresql(dot)org
> Subject: RE: [BUGS] BUG #5837: PQstatus() fails to report lost connection
>
> A patch based on this suggestion attached for consideration by the
> community.

That would work for me, thanks.

-MSK


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-25 15:32:41
Message-ID: AANLkTi=vbzmBP9MyDvXcOOTRuzPMPYY5cEVfO+haDkrN@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Jan 24, 2011 at 1:47 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> "Murray S. Kucherawy" <msk(at)cloudmark(dot)com> wrote:
>> From: Kevin Grittner [mailto:Kevin(dot)Grittner(at)wicourts(dot)gov]
>
>>> What do you think would make this more clear?
>
>> So maybe something like this after the paragraph you cited would
>> help:
>>
>> "Note that after returning a PGresult object, PQresultStatus()
>> could indicate a fatal error.  The caller should still iterate
>> calling PQgetResult() to completion (i.e. until it returns NULL)
>> in order to allow libpq to process the error information
>> completely."
>
> A patch based on this suggestion attached for consideration by the
> community.

I think this patch would only be adding to the confusion. When
PQgetResult() is called, we read enough data from the connection to
create and return one result object. It's true that this doesn't
necessarily detect an EOF, but IIUC calling PQgetResult() again is
just ONE way that you could trigger another read against the socket,
not the only one. I think it would also work to call
PQconsumeInput(), for example.

I think the real, underlying problem here is that Murray would like a
behavior change: when a FATAL or PANIC condition is reported by the
server, he'd like the client to immediately close the socket and set
its status to CONNECTION_BAD. If we're going to clarify the
documentation, I think that the lack of that behavior is what we
should try to clarify. For example, it would be good to mention that
a PGRES_FATAL_ERROR is might corresponding to the server levels ERROR,
FATAL, or PANIC, and that only in the latter two cases is the
connection presumably of no further use; and we could also mention
that libpq doesn't understand anything about the meanings of those
error levels, so that PQstatus() won't automatically barf just because
the server has sent a FATAL, *unless* it's read far enough in the
input stream to detect the problem.

Perhaps, as Murray says, this is exposing an implementation detail.
But in my experience when coding against libraries written by other
people it's often necessary to have at least a superficial
understanding of what operations they are performing under the hood.
I don't think there's any way we can eliminate that problem
completely. What we CAN try to do is make sure that the documentation
clearly explains enough of what's going on under the hood to allow
people to code against it successfully, without (hopefully) going
overboard into irrelevant details that the user doesn't need to care
about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-25 15:54:05
Message-ID: 4D3E9DBD0200002500039C36@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I think this patch would only be adding to the confusion. When
> PQgetResult() is called, we read enough data from the connection
> to create and return one result object. It's true that this
> doesn't necessarily detect an EOF, but IIUC calling PQgetResult()
> again is just ONE way that you could trigger another read against
> the socket, not the only one. I think it would also work to call
> PQconsumeInput(), for example.

I find it hard to reconcile the above with this:

http://archives.postgresql.org/message-id/6493.1295882981@sss.pgh.pa.us

and the quote from our documentation referenced here:

http://archives.postgresql.org/message-id/4D3D67600200002500039B2C@gw.wicourts.gov

> I think the real, underlying problem here is that Murray would
> like a behavior change

More than that I think he wants to be able to read the manual and
know what will work, without spending loads of time getting in tune
with The Tao of Libpq. Based on his initial reading of the docs he
expected different behavior; that can be fixed by changing the
behavior or changing the docs.

-Kevin


From: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-25 18:01:58
Message-ID: F5833273385BB34F99288B3648C4F06F1341E7400C@EXCH-C2.corp.cloudmark.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Tuesday, January 25, 2011 7:33 AM
> To: Kevin Grittner
> Cc: Murray S. Kucherawy; Tom Lane; pgsql-bugs(at)postgresql(dot)org
> Subject: Re: [BUGS] BUG #5837: PQstatus() fails to report lost connection
>
> I think the real, underlying problem here is that Murray would like a
> behavior change: when a FATAL or PANIC condition is reported by the
> server, he'd like the client to immediately close the socket and set
> its status to CONNECTION_BAD. If we're going to clarify the
> documentation, I think that the lack of that behavior is what we
> should try to clarify. For example, it would be good to mention that
> a PGRES_FATAL_ERROR is might corresponding to the server levels ERROR,
> FATAL, or PANIC, and that only in the latter two cases is the
> connection presumably of no further use; and we could also mention
> that libpq doesn't understand anything about the meanings of those
> error levels, so that PQstatus() won't automatically barf just because
> the server has sent a FATAL, *unless* it's read far enough in the
> input stream to detect the problem.

First and foremost, I think a behavior change would be the right thing to do. I can't think of any API that follows this odd model; to me it's like saying select() has reported a socket that's both read-ready and in an exception state, but we won't actually set errno until you iterate on read() until EOF first. Counter-intuitive behaviors in APIs lead to application problems or even user attrition in extreme cases.

That said, I've maintained complex APIs before (in fact, this is all the result of one of them) and I understand what this sort of a change might entail. Thus, the next-best option is to document the idiosyncrasy that's been uncovered; i.e., make the proposed change to the documentation with perhaps an added hint as to why it's necessary. You might want to demonstrate why after one iteration PGgetResult() and PQstatus() seem to indicate no problem while PGresultStatus() and PGresultErrorString() both already seem to know the severity and even the nature of the error.

I think Kevin summed it up nicely. Quite simply, the documentation doesn't match the behavior in this case. One or the other should be adjusted so they are in alignment, while disrupting both your installed base and user experience as little as possible.

-MSK


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-01-25 19:09:28
Message-ID: AANLkTikZr7Zg-BxC5bSNtzskBUK-nn_dOcTxVOnGLF_P@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, Jan 25, 2011 at 10:54 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think this patch would only be adding to the confusion.  When
>> PQgetResult() is called, we read enough data from the connection
>> to create and return one result object.  It's true that this
>> doesn't necessarily detect an EOF, but IIUC calling PQgetResult()
>> again is just ONE way that you could trigger another read against
>> the socket, not the only one.  I think it would also work to call
>> PQconsumeInput(), for example.
>
> I find it hard to reconcile the above with this:
>
> http://archives.postgresql.org/message-id/6493.1295882981@sss.pgh.pa.us
>
> and the quote from our documentation referenced here:
>
> http://archives.postgresql.org/message-id/4D3D67600200002500039B2C@gw.wicourts.gov

IIUC, Tom's point was that doing it that way would detect the error,
not that it was the ONLY way to detect the error.

But it's easily testable.

>> I think the real, underlying problem here is that Murray would
>> like a behavior change
>
> More than that I think he wants to be able to read the manual and
> know what will work, without spending loads of time getting in tune
> with The Tao of Libpq.  Based on his initial reading of the docs he
> expected different behavior; that can be fixed by changing the
> behavior or changing the docs.

That is why I suggested the type of doc correction that I thought
would be most helpful and accurate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-03-11 10:56:01
Message-ID: 201103111056.p2BAu1L22204@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Robert Haas wrote:
> On Tue, Jan 25, 2011 at 10:54 AM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> I think this patch would only be adding to the confusion. ?When
> >> PQgetResult() is called, we read enough data from the connection
> >> to create and return one result object. ?It's true that this
> >> doesn't necessarily detect an EOF, but IIUC calling PQgetResult()
> >> again is just ONE way that you could trigger another read against
> >> the socket, not the only one. ?I think it would also work to call
> >> PQconsumeInput(), for example.
> >
> > I find it hard to reconcile the above with this:
> >
> > http://archives.postgresql.org/message-id/6493.1295882981@sss.pgh.pa.us
> >
> > and the quote from our documentation referenced here:
> >
> > http://archives.postgresql.org/message-id/4D3D67600200002500039B2C@gw.wicourts.gov
>
> IIUC, Tom's point was that doing it that way would detect the error,
> not that it was the ONLY way to detect the error.
>
> But it's easily testable.
>
> >> I think the real, underlying problem here is that Murray would
> >> like a behavior change
> >
> > More than that I think he wants to be able to read the manual and
> > know what will work, without spending loads of time getting in tune
> > with The Tao of Libpq. ?Based on his initial reading of the docs he
> > expected different behavior; that can be fixed by changing the
> > behavior or changing the docs.
>
> That is why I suggested the type of doc correction that I thought
> would be most helpful and accurate.

Doc patch attached and applied. I used "should be called" instead of
"must".

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/libpq.diff text/x-diff 791 bytes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-03-11 14:32:48
Message-ID: AANLkTim4X1ACvODZkgoAbCii70e6qRHJ_vBO0+LBfRbq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Fri, Mar 11, 2011 at 5:56 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> On Tue, Jan 25, 2011 at 10:54 AM, Kevin Grittner
>> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> > Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> >> I think this patch would only be adding to the confusion. ?When
>> >> PQgetResult() is called, we read enough data from the connection
>> >> to create and return one result object. ?It's true that this
>> >> doesn't necessarily detect an EOF, but IIUC calling PQgetResult()
>> >> again is just ONE way that you could trigger another read against
>> >> the socket, not the only one. ?I think it would also work to call
>> >> PQconsumeInput(), for example.
>> >
>> > I find it hard to reconcile the above with this:
>> >
>> > http://archives.postgresql.org/message-id/6493.1295882981@sss.pgh.pa.us
>> >
>> > and the quote from our documentation referenced here:
>> >
>> > http://archives.postgresql.org/message-id/4D3D67600200002500039B2C@gw.wicourts.gov
>>
>> IIUC, Tom's point was that doing it that way would detect the error,
>> not that it was the ONLY way to detect the error.
>>
>> But it's easily testable.
>>
>> >> I think the real, underlying problem here is that Murray would
>> >> like a behavior change
>> >
>> > More than that I think he wants to be able to read the manual and
>> > know what will work, without spending loads of time getting in tune
>> > with The Tao of Libpq. ?Based on his initial reading of the docs he
>> > expected different behavior; that can be fixed by changing the
>> > behavior or changing the docs.
>>
>> That is why I suggested the type of doc correction that I thought
>> would be most helpful and accurate.
>
> Doc patch attached and applied.  I used "should be called" instead of
> "must".

I notice that your patch has exactly the same conceptual flaw I
complained about with respect to the previous version.

But I'm not sure it's worth arguing about...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-03-11 14:46:27
Message-ID: 201103111446.p2BEkRI03016@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Robert Haas wrote:
> On Fri, Mar 11, 2011 at 5:56 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Robert Haas wrote:
> >> On Tue, Jan 25, 2011 at 10:54 AM, Kevin Grittner
> >> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> >> > Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> >> I think this patch would only be adding to the confusion. ?When
> >> >> PQgetResult() is called, we read enough data from the connection
> >> >> to create and return one result object. ?It's true that this
> >> >> doesn't necessarily detect an EOF, but IIUC calling PQgetResult()
> >> >> again is just ONE way that you could trigger another read against
> >> >> the socket, not the only one. ?I think it would also work to call
> >> >> PQconsumeInput(), for example.
> >> >
> >> > I find it hard to reconcile the above with this:
> >> >
> >> > http://archives.postgresql.org/message-id/6493.1295882981@sss.pgh.pa.us
> >> >
> >> > and the quote from our documentation referenced here:
> >> >
> >> > http://archives.postgresql.org/message-id/4D3D67600200002500039B2C@gw.wicourts.gov
> >>
> >> IIUC, Tom's point was that doing it that way would detect the error,
> >> not that it was the ONLY way to detect the error.
> >>
> >> But it's easily testable.
> >>
> >> >> I think the real, underlying problem here is that Murray would
> >> >> like a behavior change
> >> >
> >> > More than that I think he wants to be able to read the manual and
> >> > know what will work, without spending loads of time getting in tune
> >> > with The Tao of Libpq. ?Based on his initial reading of the docs he
> >> > expected different behavior; that can be fixed by changing the
> >> > behavior or changing the docs.
> >>
> >> That is why I suggested the type of doc correction that I thought
> >> would be most helpful and accurate.
> >
> > Doc patch attached and applied. ?I used "should be called" instead of
> > "must".
>
> I notice that your patch has exactly the same conceptual flaw I
> complained about with respect to the previous version.
>
> But I'm not sure it's worth arguing about...

You mentioned that other functions could be used, so I said "should"
rather than "must". Other ideas?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: "Murray S(dot) Kucherawy" <msk(at)cloudmark(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Date: 2011-03-11 15:12:46
Message-ID: AANLkTi=73odDFY_-9Cg3=w7s17fdKjHFqsZu2+iOJRKm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, Jan 25, 2011 at 12:01 PM, Murray S. Kucherawy <msk(at)cloudmark(dot)com> wrote:
>> -----Original Message-----
>> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
>> Sent: Tuesday, January 25, 2011 7:33 AM
>> To: Kevin Grittner
>> Cc: Murray S. Kucherawy; Tom Lane; pgsql-bugs(at)postgresql(dot)org
>> Subject: Re: [BUGS] BUG #5837: PQstatus() fails to report lost connection
>>
>> I think the real, underlying problem here is that Murray would like a
>> behavior change: when a FATAL or PANIC condition is reported by the
>> server, he'd like the client to immediately close the socket and set
>> its status to CONNECTION_BAD.  If we're going to clarify the
>> documentation, I think that the lack of that behavior is what we
>> should try to clarify.  For example, it would be good to mention that
>> a PGRES_FATAL_ERROR is might corresponding to the server levels ERROR,
>> FATAL, or PANIC, and that only in the latter two cases is the
>> connection presumably of no further use; and we could also mention
>> that libpq doesn't understand anything about the meanings of those
>> error levels, so that PQstatus() won't automatically barf just because
>> the server has sent a FATAL, *unless* it's read far enough in the
>> input stream to detect the problem.
>
> First and foremost, I think a behavior change would be the right thing to do.  I can't think of any API that follows this odd model; to me it's like saying select() has reported a socket that's both read-ready and in an exception state, but we won't actually set errno until you iterate on read() until EOF first.  Counter-intuitive behaviors in APIs lead to application problems or even user attrition in extreme cases.

libpq is odd and clunky on many levels; it's old, but it's also very
rich and useful. many people here would admit that given infinite time
and resources it could use a rewrite. however. this is not high at
all on the priority list because at the end of the day you can make it
do just about anything you really need it to do, and the old api would
have to be maintained for a good while.

just wrap the behavior you don't like out -- libpq simply demands a
good set of wrappers unfortunately.

merlin