BUG #5465: dblink TCP connection hangs blocking translation from being terminated

Lists: pgsql-bugs
From: "Valentine Gogichashvili" <valgog(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5465: dblink TCP connection hangs blocking translation from being terminated
Date: 2010-05-19 09:10:29
Message-ID: 201005190910.o4J9AT3A063957@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5465
Logged by: Valentine Gogichashvili
Email address: valgog(at)gmail(dot)com
PostgreSQL version: 8.2.1
Operating system: Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
Description: dblink TCP connection hangs blocking translation from
being terminated
Details:

Hi all,

we have an issue on our productive server. A stored procedure, that uses
dblink to get some data from the remote database hangs not responding to
kill signal and holds several locks on some tables as well as an advisory
lock. So I have this transaction to be completed in order to have a
possibility to operate the database normally.

It was exactly on the time, that the procedure was accessing remote
database, the machine hosting this remote database had a panic attack and
rebooted. But the ESTABLISHED connection is still hanging on the production
database machine:

$ netstat | grep remote_db_host
tcp 0 0 production_db_host:60248 remote_db_host:postgres ESTABLISHED

$ lsof | grep remote_db_host
postgres 1365 postgres 199u IPv4 23003779784 TCP
production_db_host:60248->remote_db_host:postgres (ESTABLISHED)

On the database session list one can see the hanging transaction:

production_db=# select procpid, now() - query_start as running, waiting,
substr(current_query,1,120) as current_query from pg_stat_activity where
current_query not like '%----STATQ-----%' and current_query != '<IDLE>'
order by query_start desc;
procpid | running | waiting |
current_query
---------+------------------------+---------+-------------------------------
----------------------------------------------------------------------------
------------------------------------
1365 | 2 days 00:17:57.992004 | f | SELECT * FROM
get_remote_data()

It seems like the dblink is waiting for the connection to be closed or
reseted and also makes the hole transaction hang not processing kill
signals.

Does the dblink TCP connection have any timeout?

How would it be possible to shutdown the DB in case this session process is
not responding to normal kill signals? Will it hinder the database from
shutting down normally? My previous experience with issuing immediate stops
or killing with -9 had been quite catastrophic and I could not start the DB
afterwards. What would you suggest in this case?

With best regards,

-- Valentine Gogichashvili


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Valentine Gogichashvili <valgog(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5465: dblink TCP connection hangs blocking translation from being terminated
Date: 2010-05-19 15:17:19
Message-ID: AANLkTindNKOoALvWo362Dn0FOPDRN7vfzUiLmYnQjAbC@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Wed, May 19, 2010 at 5:10 AM, Valentine Gogichashvili
<valgog(at)gmail(dot)com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5465
> Logged by:          Valentine Gogichashvili
> Email address:      valgog(at)gmail(dot)com
> PostgreSQL version: 8.2.1
> Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
> Description:        dblink TCP connection hangs blocking translation from
> being terminated
> Details:
>
> Hi all,
>
> we have an issue on our productive server. A stored procedure, that uses
> dblink to get some data from the remote database hangs not responding to
> kill signal and holds several locks on some tables as well as an advisory
> lock. So I have this transaction to be completed in order to have a
> possibility to operate the database normally.

I believe this is a known issue in dblink, where it's not possible to
cancel it when it's waiting in the TCP layer in the kernel.
Unfortunately, there is no fix ATM - there was some work towards it
for 9.0 at one point, but I think this is actually the first real
bug-report on the issue...

> It seems like the dblink is waiting for the connection to be closed or
> reseted and also makes the hole transaction hang not processing kill
> signals.
>
> Does the dblink TCP connection have any timeout?

It does not. But it would detect a conneciton that goes away, so TCP
keepalives should be able to deal with this problem. Once the kernel
notices the other end is gone, dblink should notice it and roll back.

> How would it be possible to shutdown the DB in case this session process is
> not responding to normal kill signals? Will it hinder the database from
> shutting down normally? My previous experience with issuing immediate stops
> or killing with -9 had been quite catastrophic and I could not start the DB
> afterwards. What would you suggest in this case?

kill -9 on a client will make the postmaster restart the whole
process, so yes, it's a very heavy operation.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Joseph Conway <mail(at)joeconway(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Valentine Gogichashvili <valgog(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5465: dblink TCP connection hangs blocking translation from being terminated
Date: 2010-05-19 18:41:41
Message-ID: 4BF430E5.1040503@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Magnus Hagander wrote:
> On Wed, May 19, 2010 at 5:10 AM, Valentine Gogichashvili
> <valgog(at)gmail(dot)com> wrote:
>> The following bug has been logged online:
>>
>> Bug reference: 5465
>> Logged by: Valentine Gogichashvili
>> Email address: valgog(at)gmail(dot)com
>> PostgreSQL version: 8.2.1
>> Operating system: Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
>> Description: dblink TCP connection hangs blocking translation from
>> being terminated
>> Details:
>>
>> Hi all,
>>
>> we have an issue on our productive server. A stored procedure, that uses
>> dblink to get some data from the remote database hangs not responding to
>> kill signal and holds several locks on some tables as well as an advisory
>> lock. So I have this transaction to be completed in order to have a
>> possibility to operate the database normally.
>
> I believe this is a known issue in dblink, where it's not possible to
> cancel it when it's waiting in the TCP layer in the kernel.
> Unfortunately, there is no fix ATM - there was some work towards it
> for 9.0 at one point, but I think this is actually the first real
> bug-report on the issue...

I thought the known issue was only on Windows though...
Note that this is not dblink specific but rather libpq.

>> How would it be possible to shutdown the DB in case this session process is
>> not responding to normal kill signals? Will it hinder the database from
>> shutting down normally? My previous experience with issuing immediate stops
>> or killing with -9 had been quite catastrophic and I could not start the DB
>> afterwards. What would you suggest in this case?
>
> kill -9 on a client will make the postmaster restart the whole
> process, so yes, it's a very heavy operation.

Can you grab the process with gdb and call elog() manually?

Joe


From: valgog <valgog(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5465: dblink TCP connection hangs blocking translation from being terminated
Date: 2010-05-21 07:45:37
Message-ID: 96a41caa-20c3-455d-a557-680fb9e3cdc5@v37g2000vbv.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On May 19, 8:41 pm, m(dot)(dot)(dot)(at)joeconway(dot)com (Joseph Conway) wrote:
> Magnus Hagander wrote:
> > On Wed, May 19, 2010 at 5:10 AM, Valentine Gogichashvili
> > <val(dot)(dot)(dot)(at)gmail(dot)com> wrote:
> >> The following bug has been logged online:
>
> >> Bug reference:      5465
> >> Logged by:          Valentine Gogichashvili
> >> Email address:      val(dot)(dot)(dot)(at)gmail(dot)com
> >> PostgreSQL version: 8.2.1
> >> Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
> >> Description:        dblink TCP connection hangs blocking translation from
> >> being terminated
> >> Details:
>
> >> Hi all,
>
> >> we have an issue on our productive server. A stored procedure, that uses
> >> dblink to get some data from the remote database hangs not responding to
> >> kill signal and holds several locks on some tables as well as an advisory
> >> lock. So I have this transaction to be completed in order to have a
> >> possibility to operate the database normally.
>
> > I believe this is a known issue in dblink, where it's not possible to
> > cancel it when it's waiting in the TCP layer in the kernel.
> > Unfortunately, there is no fix ATM - there was some work towards it
> > for 9.0 at one point, but I think this is actually the first real
> > bug-report on the issue...
>
> I thought the known issue was only on Windows though...
> Note that this is not dblink specific but rather libpq.
>
> >> How would it be possible to shutdown the DB in case this session process is
> >> not responding to normal kill signals? Will it hinder the database from
> >> shutting down normally? My previous experience with issuing immediate stops
> >> or killing with -9 had been quite catastrophic and I could not start the DB
> >> afterwards. What would you suggest in this case?
>
> > kill -9 on a client will make the postmaster restart the whole
> > process, so yes, it's a very heavy operation.
>
> Can you grab the process with gdb and call elog() manually?
>
> Joe
>
> --
> Sent via pgsql-bugs mailing list (pgsql-b(dot)(dot)(dot)(at)postgresql(dot)org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-bugs

Unfortunately I could not install gdb on that machine :-( some
dependencies are not installable and I cannot upgrade that production
machine...

-- Valentine


From: valgog <valgog(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5465: dblink TCP connection hangs blocking transaction from being terminated
Date: 2010-05-21 07:48:36
Message-ID: ee84b8c1-1c40-440b-a1a9-8942e2b415df@y12g2000vbr.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Oh, found an type in the subject. Transaction, not Translation.