Keep-alive support

From: Leandro Lucarella <llucarella(at)integratech(dot)com(dot)ar>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Keep-alive support
Date: 2006-11-29 19:49:07
Message-ID: 456DE433.8000407@integratech.com.ar
Lists: pgsql-hackers pgsql-interfaces

Is there any keep alive support in libpq? I'm not really using libpq
directly, I'm using libpqxx and there is no keep-alive support there, so
I'm trying to use TCP's own keep-alive support, but I have a problem:
libpq seems to reconnect the socket when the connection is lost.

What I do is this:

/* Connect with libpq as in Example 1 of the manual. */

/* Before any query is sent (Linux 2.6.18): */

int sd = PQsocket(conn);
// Use the TCP keep-alive feature
set_sock_opt(sd, SOL_SOCKET, SO_KEEPALIVE, 1);
// Maximum number of keep-alive probes before assuming the connection is lost
set_sock_opt(sd, IPPROTO_TCP, TCP_KEEPCNT, 5);
// Interval (in seconds) between keep-alive probes
set_sock_opt(sd, IPPROTO_TCP, TCP_KEEPINTVL, 2);
// Maximum idle time (in seconds) before keep-alive probes start
set_sock_opt(sd, IPPROTO_TCP, TCP_KEEPIDLE, 10);
(see set_sock_opt() below; it is just a simple setsockopt wrapper)

Then I do a sleep(10) and continue with Example 1 of the manual (which
runs a simple transaction query). During the sleep I unplug the network
cable and monitor the TCP connection using netstat -pano. I found that
all the TCP keep-alive timers time out perfectly, closing the
connection, but immediately afterwards I see a new connection (without
the keep-alive parameters, so it takes forever to time out again). So I
guess libpq is re-opening the socket. This is making my life a nightmare =)

Is there any way to avoid this behavior? Please tell me there is =)
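
For reference, the settings above imply a detection time of roughly
TCP_KEEPIDLE + TCP_KEEPINTVL x TCP_KEEPCNT = 10 + 2 x 5 = 20 seconds for a
silent peer on an idle connection. The sketch below is not part of the
original test program; the struct and function names are made up for
illustration. It computes that bound and reads the effective values back
with getsockopt(), which is a cheap way to confirm that the options really
landed on the socket returned by PQsocket().

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper type: keep-alive settings as used on Linux. */
struct keepalive_cfg {
    int idle;      /* TCP_KEEPIDLE: idle seconds before the first probe */
    int interval;  /* TCP_KEEPINTVL: seconds between unanswered probes */
    int count;     /* TCP_KEEPCNT: unanswered probes before giving up */
};

/* Approximate seconds until the kernel drops a silent peer on an idle
 * connection: the idle period plus one interval per unanswered probe. */
int keepalive_worst_case(const struct keepalive_cfg *cfg)
{
    return cfg->idle + cfg->interval * cfg->count;
}

/* Read the effective keep-alive settings back from a TCP socket.
 * Returns 0 on success, -1 on failure (errno is set by getsockopt). */
int keepalive_read(int sd, int *enabled, struct keepalive_cfg *cfg)
{
    socklen_t len = sizeof(int);
    if (getsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, enabled, &len) == -1)
        return -1;
    len = sizeof(int);
    if (getsockopt(sd, IPPROTO_TCP, TCP_KEEPIDLE, &cfg->idle, &len) == -1)
        return -1;
    len = sizeof(int);
    if (getsockopt(sd, IPPROTO_TCP, TCP_KEEPINTVL, &cfg->interval, &len) == -1)
        return -1;
    len = sizeof(int);
    if (getsockopt(sd, IPPROTO_TCP, TCP_KEEPCNT, &cfg->count, &len) == -1)
        return -1;
    return 0;
}
```

Note that the sleep(10) in the test is shorter than this 20-second bound.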

PS: This thread originated on libpqxx's mailing list, but I'm moving
it here because it looks like a libpq issue. If you want to take a look
at the original thread, you can find it here:
http://gborg.postgresql.org/pipermail/libpqxx-general/2006-November/001511.html

TIA

--------8<--------8<--------8<--------8<--------8<--------8<--------

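#include <stdio.h>      /* perror */
#include <stdlib.h>     /* abort */
#include <sys/socket.h> /* setsockopt */

/* Set an integer socket option; abort on failure. */
void set_sock_opt(int sd, int level, int name, int val)
{
    if (setsockopt(sd, level, name, &val, sizeof(val)) == -1)
    {
        perror("setsockopt");
        abort();
    }
}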

-------->8-------->8-------->8-------->8-------->8-------->8--------

--
Leandro Lucarella
Integratech S.A.
4571-5252


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Leandro Lucarella <llucarella(at)integratech(dot)com(dot)ar>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Keep-alive support
Date: 2006-11-29 20:18:37
Message-ID: 3930.1164831517@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-interfaces

Leandro Lucarella <llucarella(at)integratech(dot)com(dot)ar> writes:
> libpq seems to reconnect the socket when the connection is lost.

libpq does no such thing. Better recheck your own code (look
for calls of PQreset(), perhaps).

regards, tom lane


From: Tomasz Myrta <jasiek(at)klaster(dot)net>
To: Leandro Lucarella <llucarella(at)integratech(dot)com(dot)ar>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Keep-alive support
Date: 2006-11-29 20:31:57
Message-ID: 456DEE3D.3020108@klaster.net
Lists: pgsql-hackers pgsql-interfaces

Leandro Lucarella wrote on 2006-11-29 20:49:
> Is there any keep alive support in libpq? I'm not really using libpq
> directly, I'm using libpqxx and there is no keep-alive support there,
> so I'm trying to use TCP's own keep-alive support, but I have a
> problem: libpq seems to reconnect the socket when the connection is lost.
<cut>
> During the sleep I unplug the network cable and monitor the TCP
> connection using netstat -pano. I found that all the TCP keep-alive
> timers time out perfectly, closing the connection, but immediately
> afterwards I see a new connection (without the keep-alive parameters,
> so it takes forever to time out again). So I guess libpq is re-opening
> the socket. This is making my life a nightmare =)

I used keepalive the same way as you (reconfiguring the socket directly)
and I don't remember libpq trying to reconnect by itself. I think it's
libpqxx's behaviour - I haven't used it, but it looks like it is called
"reactivation".

Regards,
Tomasz Myrta


From: "Jeroen T(dot) Vermeulen" <jtv(at)xs4all(dot)nl>
To: "Tomasz Myrta" <jasiek(at)klaster(dot)net>
Cc: "Leandro Lucarella" <llucarella(at)integratech(dot)com(dot)ar>, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Keep-alive support
Date: 2006-11-30 06:33:03
Message-ID: 17164.125.24.223.202.1164868383.squirrel@webmail.xs4all.nl
Lists: pgsql-hackers pgsql-interfaces

On Thu, November 30, 2006 03:31, Tomasz Myrta wrote:

> I used keepalive the same way as you (reconfiguring the socket directly)
> and I don't remember libpq trying to reconnect by itself. I think it's
> libpqxx's behaviour - I haven't used it, but it looks like it is called
> "reactivation".

That's right. It's libpqxx, not libpq, that restores the connection. (It
couldn't really be any other way because libpq doesn't have enough
information to know it's safe--you could be in the middle of a
transaction, or you could be losing a temp table). Automatic reactivation
can also be disabled explicitly if you don't want it (or just *when* you
don't want it--e.g. when you're working with temp tables).

I do think that the long TCP timeouts are something that should be handled
at the lower levels. We can't really do real keepalives, I guess, simply
because libpq is synchronous to the application. But perhaps we could
demand that the server at least acknowledge a request in some way within a
particular time limit? It'd have to be at the lowest level possible and
as "cheap" as possible, so it doesn't break when the server is merely very
busy.
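
One way to picture the "acknowledge within a time limit" idea from the
client side is a bounded wait on the connection's socket. The sketch below
is only an illustration under assumptions, not anything libpq or libpqxx
provides: wait_for_reply() is a made-up name, and the descriptor is
assumed to come from PQsocket() after a cheap request has been sent with
libpq's asynchronous interface (for example PQsendQuery()).

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait until the descriptor has something to report
 * (data, end-of-file or an error condition) or the deadline passes.
 * Returns 1 if the descriptor is ready, 0 on timeout, -1 if poll() failed. */
int wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc > 0)
            return 1;            /* ready: the caller should now read */
        if (rc == 0)
            return 0;            /* nothing arrived within the limit */
        if (errno != EINTR)
            return -1;           /* real failure */
        /* Interrupted by a signal: retry. For simplicity this restarts
         * the full timeout instead of tracking the remaining time. */
    }
}
```

A return value of 0 only says that nothing arrived in time; as noted
above, a merely busy server looks the same as a dead link, so the limit
has to be chosen generously.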

Jeroen


From: Leandro Lucarella <llucarella(at)integratech(dot)com(dot)ar>
To: "Jeroen T(dot) Vermeulen" <jtv(at)xs4all(dot)nl>
Cc: Tomasz Myrta <jasiek(at)klaster(dot)net>, pgsql-interfaces(at)postgresql(dot)org, libpqxx-general(at)gborg(dot)postgresql(dot)org
Subject: Re: Keep-alive support
Date: 2006-11-30 14:11:08
Message-ID: 456EE67C.3050501@integratech.com.ar
Lists: pgsql-hackers pgsql-interfaces

Jeroen T. Vermeulen wrote:
> On Thu, November 30, 2006 03:31, Tomasz Myrta wrote:
>
>> I used keepalive the same way as you (reconfiguring the socket directly)
>> and I don't remember libpq trying to reconnect by itself. I think it's
>> libpqxx's behaviour - I haven't used it, but it looks like it is called
>> "reactivation".
>
> That's right. It's libpqxx, not libpq, that restores the connection. (It
> couldn't really be any other way because libpq doesn't have enough
> information to know it's safe--you could be in the middle of a
> transaction, or you could be losing a temp table). Automatic reactivation
> can also be disabled explicitly if you don't want it (or just *when* you
> don't want it--e.g. when you're working with temp tables).
>
> I do think that the long TCP timeouts are something that should be handled
> at the lower levels. We can't really do real keepalives, I guess, simply
> because libpq is synchronous to the application. But perhaps we could
> demand that the server at least acknowledge a request in some way within a
> particular time limit? It'd have to be at the lowest level possible and
> as "cheap" as possible, so it doesn't break when the server is merely very
> busy.

Thanks all for your responses, but this is *not* a libpqxx issue, simply
because I'm running the test with plain libpq. Anyway, I have a little
more information about my problem, and it's not a libpq issue either =)

The problem shows up when the time between unplugging the wire and the
next use of the connection is too short for keep-alive to kill the
connection. The connection then becomes active, and the TCP timers seem
to fall back to the defaults because there is unsent data in the socket
queue. So it's an OS/TCP issue.

I don't see any way to control this without using an application-level
keep-alive, so I appreciate any ideas and suggestions =)
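
As a pointer that is not from this thread: Linux kernels from 2.6.37 on
(newer than the 2.6.18 used here) provide the TCP_USER_TIMEOUT socket
option, which limits how long transmitted data may stay unacknowledged
before the kernel gives up on the connection. That is the situation
described above, where queued data takes over from the keep-alive timers.
A minimal sketch, assuming such a kernel; set_user_timeout() is a made-up
name, and the value is given in milliseconds:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: cap the time (in milliseconds) that sent data may
 * remain unacknowledged before the kernel aborts the connection.
 * Requires Linux 2.6.37 or later. Returns 0 on success, -1 on failure. */
int set_user_timeout(int sd, unsigned int timeout_ms)
{
    return setsockopt(sd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

With a libpq connection this would be applied to the descriptor returned
by PQsocket(), next to the keep-alive options, for example
set_user_timeout(sd, 20000) for a 20-second limit.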

--
Leandro Lucarella
Integratech S.A.
4571-5252


From: Leandro Lucarella <llucarella(at)integratech(dot)com(dot)ar>
To: Leandro Lucarella <llucarella(at)integratech(dot)com(dot)ar>
Cc: pgsql-interfaces(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Keep-alive support
Date: 2006-12-01 15:44:22
Message-ID: 45704DD6.1040201@integratech.com.ar
Lists: pgsql-hackers pgsql-interfaces

For pgsql-hackers, here is the original thread (I think this mail is
appropriate for this list, correct me if I'm wrong):
http://archives.postgresql.org/pgsql-interfaces/2006-11/msg00014.php

Leandro Lucarella wrote:
> Thanks all for your responses, but this is *not* a libpqxx issue, simply
> because I'm running the test with plain libpq. Anyway, I have a little
> more information about my problem, and it's not a libpq issue either =)
>
> The problem shows up when the time between unplugging the wire and the
> next use of the connection is too short for keep-alive to kill the
> connection. The connection then becomes active, and the TCP timers seem
> to fall back to the defaults because there is unsent data in the socket
> queue. So it's an OS/TCP issue.
>
> I don't see any way to control this without using an application-level
> keep-alive, so I appreciate any ideas and suggestions =)

Hi! It's me again =)

I was thinking about solutions for my problem, and I've come up with
(mainly) these 3 ideas:

1) Add TIPC[1] support to PostgreSQL. This is the cleanest solution, I
think, but also the hardest, and it could take a lot of time. If I use
one of the other hacks in the meantime, and if there is interest in
adding this to PostgreSQL officially, I can evaluate working on this
seriously. What I'm sure of is that I don't want to maintain my own
PostgreSQL fork. So, what do you think about this? Or where should I ask?

2) Use a dummy "monitor" connection to Postgres, do the TCP keep-alive
tuning on it, and select() on the socket waiting for a disconnection.
Since this socket will never be active (is that right? Or does PostgreSQL
send any kind of control information on an idle connection?), TCP
keep-alive will be enough to detect a lost connection within a short
period of time. If there is no problem with this, I think it could be a
quick and not-so-nasty solution =)

3) Use Heartbeat[2] or make some other specific solution like it
(probably using TIPC too). I don't like it at all, since I'm looking for
a more self-contained solution, but it's another option.

I would really appreciate any thoughts on this, and any suggestions.

TIA.

[1] http://tipc.sourceforge.net/
[2] http://www.linux-ha.org/HeartbeatProgram

--
Leandro Lucarella
Integratech S.A.
4571-5252