Re: Detecting backend failures via libpq / DBD::Pg

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detecting backend failures via libpq / DBD::Pg
Date: 2015-01-03 02:21:25
Message-ID: 54A75225.9080104@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/30/2014 08:43 AM, Greg Sabino Mullane wrote:
> I am working on enhancing the ping() method of DBD::Pg. The goal of
> that is for a user to be able to determine if the connection to the
> database is still valid.

This entire concept is flawed. IMO the documentation for it should be
marked with a warning pointing out that use of this method to validate
a connection before use implies a race condition; it is never safe to
"validate" a connection then assume it'll actually be valid and
working for the following statement(s).

Correctly written applications must instead tolerate failures,
trapping errors/exceptions at each statement and retrying the transaction.

> The basic way we do this is to send a simple SELECT via PQexec and
> then check for a valid return value (and when in doubt, we check
> PQstatus).

At the protocol level all that's needed, and all that should be done
IMO, is to send a Sync message and wait for the server's ReadyForQuery
response.

It might be nice to expose a libpq function to do this.

It's only safe to do this when there's no in-progress query, but
that's true of *any* validation method you might choose.

> After some experimenting, the best solution I found is to send the
> PQexec, and then check if PQresultErrorField(result, 'C') is
> '25P02'. If it is, then all is "well", in that the server is still
> alive. If it is not, then we can assume the backend is bad (for
> example, PQerrorMessage gives a "could not receive data from
> server: Bad file descriptor"). Being that we cannot do a rollback
> before calling the PQexec, is this a decent solution? Can we depend
> on really serious errors always trumping the expected 25P02?

I struggle to understand the utility of this.

Your transaction could be in the failed state because of an internal
error caused by the server's storage failing. So it's "alive" in the
sense of responding to queries, but completely useless.

This is another aspect of why the whole approach is wrong. If you want
to find out if the server can run a particular query, run the query.
If you get an error, then it couldn't run that query.

- --
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJUp1IhAAoJELBXNkqjr+S2hvwH/3V1jNIYTzjVKlMKGfiOmXAH
FAdy1PznxZx/DWidcIl7AfSa+9SDoF+cqR9leH1ju0MCy1VkS0W/Lx9lFEfDm6fU
s75kN0zno0N9wbSb/sMLGCFEv4wyX3On0rC401NY7/2HXDWco227JfH7O0fAz/lv
dNUDdmIg2+d0J1lKyTQ9Z5T8hl8SvMuRvnoaT0/5/6sqSRL3S/hSE0pObFGpKG0I
hWpklz3nQwMXZgLOt1YmSAprBd6HyUIzQDG0mV8QQ4SKn7M92J5KSgN1ORyVbMGZ
ImKJ+EpnUVEA+n/yG/CV/u27OfKVSYVQZJLZE3XepLY+/eBI3Ai2d+wK7x9Yrfk=
=NkUj
-----END PGP SIGNATURE-----

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2015-01-03 02:35:57 Re: History of WAL_LEVEL (archive vs hot_standby)
Previous Message Peter Geoghegan 2015-01-03 00:43:00 Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)