Dealing with network-dead clients

Lists: pgsql-hackers
From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Dealing with network-dead clients
Date: 2005-02-14 01:45:09
Message-ID: 421002A5.6090701@opencloud.com

I'm currently trying to find a clean way to deal with network-dead
clients that are in a transaction and holding locks etc.

The normal "client closes socket" case works fine. The scenario I'm
worried about is when the client machine falls off the network entirely
for some reason (Ethernet problem, kernel panic, machine catches
fire...). From what I can see, if the connection is idle at that point,
the server won't notice this until TCP-level SO_KEEPALIVE kicks in,
which by default takes over 2 hours on an idle connection. I'm looking
for something more like a 30-60 second turnaround if the client is
holding locks.

The options I can see are:

1) tweak TCP keepalive intervals down to a low value, system-wide
2) use (nonportable) setsockopt calls to tweak TCP keepalive settings on
a per-socket basis.
3) implement an idle timeout on the server so that open transactions
that are idle for longer than some period are automatically aborted.

(1) is very ugly because it is system-wide.
(2) is not portable.

Also I'm not sure how well extremely low keepalive settings behave.
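
For reference, a minimal sketch of what option (2) could look like, assuming
a Linux-style TCP stack that exposes the TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT socket options. set_keepalive() is a hypothetical helper, not an
existing PostgreSQL function, and the values in the usage note are
illustrative only. Where the options are missing, the sketch falls back to
plain SO_KEEPALIVE with the system-wide defaults, which is exactly the
portability problem noted above.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): enable keepalive on one
 * socket and, where the platform supports it, override the timings for
 * that socket only.
 *
 *   idle_secs     - idle time before the first probe is sent
 *   interval_secs - delay between unanswered probes
 *   probe_count   - unanswered probes before the connection is dropped
 *
 * Returns 0 on success, -1 on failure (errno set by setsockopt).
 */
static int
set_keepalive(int fd, int idle_secs, int interval_secs, int probe_count)
{
    int on = 1;

    /* SO_KEEPALIVE itself is portable; only the tuning below is not. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probe_count, sizeof(probe_count)) < 0)
        return -1;
#else
    /* No per-socket tuning available: system-wide defaults apply. */
    (void) idle_secs;
    (void) interval_secs;
    (void) probe_count;
#endif

    return 0;
}
```

With set_keepalive(fd, 30, 10, 3), a dead peer would be noticed after roughly
30 + 10 * 3 = 60 seconds of silence, assuming the stack honours the settings.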

(3) seems like a proper solution. I've searched the archives a bit and
transaction timeouts have been suggested before, but there seems to be
some resistance to them.

I was thinking along the lines of a SIGALRM-driven timeout that starts
at the top of the query-processing loop when in a transaction and is
cancelled when client traffic is received. I'm not sure exactly what
should happen when the timeout occurs, though. Should it kill the entire
connection, or just roll back the current transaction? If the connection
stays alive, the fun part seems to be in avoiding confusing the client
about the current transaction state.

Any suggestions on what I should do here?

-O


From: Richard Huxton <dev(at)archonet(dot)com>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dealing with network-dead clients
Date: 2005-02-14 08:47:30
Message-ID: 421065A2.7060300@archonet.com

Oliver Jowett wrote:
> I'm currently trying to find a clean way to deal with network-dead
> clients that are in a transaction and holding locks etc.
>
> The normal "client closes socket" case works fine. The scenario I'm
> worried about is when the client machine falls off the network entirely
> for some reason (Ethernet problem, kernel panic, machine catches
> fire...). From what I can see, if the connection is idle at that point,
> the server won't notice this until TCP-level SO_KEEPALIVE kicks in,
> which by default takes over 2 hours on an idle connection. I'm looking
> for something more like a 30-60 second turnaround if the client is
> holding locks.

> 3) implement an idle timeout on the server so that open transactions
> that are idle for longer than some period are automatically aborted.

> (3) seems like a proper solution. I've searched the archives a bit and
> transaction timeouts have been suggested before, but there seems to be
> some resistance to them.

Have you come across the pgpool connection-pooling project?
http://pgpool.projects.postgresql.org/

Might be easier to put a timeout+disconnect in there.

--
Richard Huxton
Archonet Ltd


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dealing with network-dead clients
Date: 2005-02-14 10:52:59
Message-ID: 4210830B.5000103@opencloud.com

Richard Huxton wrote:
> Oliver Jowett wrote:
>
>> I'm currently trying to find a clean way to deal with network-dead
>> clients that are in a transaction and holding locks etc.
>>
> Have you come across the pgpool connection-pooling project?
> http://pgpool.projects.postgresql.org/

I've looked at it, but haven't used it.

> Might be easier to put a timeout+disconnect in there.

It seems like I have the same design issues even if the code lives in
pgpool. Also, I'm reluctant to introduce another bit of software into
the system just for the sake of timeouts; we have no other need for
pgpool functionality.

-O