Re: Unsuccessful SIGINT

Lists: pgsql-general
From: "Albe Laurenz" <all(at)adv(dot)magwien(dot)gv(dot)at>
To: "Brian Wipf *EXTERN*" <brian(at)clickspace(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Unsuccessful SIGINT
Date: 2006-12-04 08:43:02
Message-ID: 52EF20B2E3209443BC37736D00C3C1380BBCE706@EXADV1.host.magwien.gv.at

> I have a connection that I am unable to kill with a sigint.
>
> ps auxww for the process in question:
> postgres 3578 0.3 3.6 6526396 1213344 ? S Dec01 0:32
> postgres: postgres ssprod 192.168.0.52(49333) SELECT
>
> and gdb shows:
> (gdb) bt
> #0 0x00002ba62c18f085 in send () from /lib64/libc.so.6
> #1 0x0000000000504765 in internal_flush ()
> #2 0x0000000000504896 in internal_putbytes ()
> #3 0x00000000005048fc in pq_putmessage ()
> #4 0x0000000000505ea4 in pq_endmessage ()
> #5 0x000000000043e37a in printtup ()
> #6 0x00000000004e9349 in ExecutorRun ()
> #7 0x0000000000567931 in PortalRunSelect ()
> #8 0x00000000005685f0 in PortalRun ()
> #9 0x0000000000565ea8 in PostgresMain ()
> #10 0x0000000000540624 in ServerLoop ()
> #11 0x000000000054131a in PostmasterMain ()
> #12 0x000000000050676e in main ()
>
> lsof on the client machine (192.168.0.52) shows no connections on
> port 49333, so it doesn't appear to be a simple matter of killing the
> client connection. If I have to, I can reboot the client machine, but
> this seems like overkill and I'm not certain this will fix the
> problem. Anything else I can try on the server or the client short of
> restarting the database or rebooting the client?

Do I get it right that there is no process on the client machine
using port 49333?
Maybe you can reboot the client machine to make sure.

I'd wait for some time, because the send() might be stuck in kernel
space, and I guess it should time out at some point. Then the process
will go away.
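
To illustrate why a process can sit in send() like this, here is a
minimal Python sketch. It is only a local analogy and not PostgreSQL
code: it uses a socket pair rather than a TCP connection to a remote
client, and the function name and the timeout are assumptions made for
the example. The writer keeps sending to a peer that never reads; once
the kernel buffers are full, send() stops returning. The timeout is
there only so the demonstration ends; a plain blocking socket would
simply wait.

```python
import socket


def bytes_accepted_before_stall(timeout_s=0.5, chunk=64 * 1024):
    """Write to a peer that never reads; return bytes accepted before send() stalls."""
    # "reader" stands in for the client that has stopped consuming data.
    writer, reader = socket.socketpair()
    # Without a timeout, the final send() below would block indefinitely.
    writer.settimeout(timeout_s)
    payload = b"x" * chunk
    total = 0
    try:
        while True:
            # send() succeeds while the kernel still has buffer space.
            total += writer.send(payload)
    except socket.timeout:
        # Buffers are full: this is where a blocking send() would hang.
        pass
    finally:
        writer.close()
        reader.close()
    return total


if __name__ == "__main__":
    print("bytes accepted before send() stalled:", bytes_accepted_before_stall())
```

The number printed depends on the platform's socket buffer sizes, so no
particular value should be expected.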

If the server process is still there after a couple of hours, hmm,
I don't know. Maybe resort to a kill -9. If that does not get rid
of the server process, it is stuck in kernel space for good and
probably nothing except a reboot will get rid of it.

Yours,
Laurenz Albe


From: Brian Wipf <brian(at)clickspace(dot)com>
To: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Unsuccessful SIGINT
Date: 2006-12-04 14:26:13
Message-ID: 488E022C-59FB-4409-90E2-1E894FA35169@clickspace.com

On 4-Dec-06, at 1:43 AM, Albe Laurenz wrote:
>> lsof on the client machine (192.168.0.52) shows no connections on
>> port 49333, so it doesn't appear to be a simple matter of killing the
>> client connection. If I have to, I can reboot the client machine, but
>> this seems like overkill and I'm not certain this will fix the
>> problem. Anything else I can try on the server or the client short of
>> restarting the database or rebooting the client?
>
> Do I get it right that there is no process on the client machine
> using port 49333?
> Maybe you can reboot the client machine to make sure.
>
> I'd wait for some time, because the send() might be stuck in kernel
> space, and I guess it should time out at some point. Then the process
> will go away.
The Java process on the client machine that held the connection was
killed off and lsof no longer showed a process with a connection on
port 49333. I waited about 7 hours and the database server still
showed the hung connection from port 49333 of the client. I finally
rebooted the client computer, which fixed the problem. I suppose
something at a lower level than the application process was hanging
on to the connection somehow and lsof couldn't even detect it. The
client is a Mac OS X 10.4.8 box. It would have been nice if I could
have killed the process from the server side as well, but I'm sure
there's a good reason why you can't when it's in this state:
send () from /lib64/libc.so.6
in internal_flush ()
in internal_putbytes ()
in pq_putmessage ()
in pq_endmessage ()
in printtup ()
in ExecutorRun ()
in PortalRunSelect ()

> If the server process is still there after a couple of hours, hmm,
> I don't know. Maybe resort to a kill -9. If that does not get rid
> of the server process, it is stuck in kernel space for good and
> probably nothing except a reboot will get rid of it.
The last time I tried a kill -9 on a server process, the database
instantly restarted itself and had to perform some kind of crash
recovery. Is a kill -9 okay in some cases? I suppose a restart of the
database would have worked as well, but that was my last resort.