Re: PostgreSQL 7.4.10 hanging on delete

Lists: pgsql-admin
From: Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com>
To: PostgreSQL Admin Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: PostgreSQL 7.4.10 hanging on delete
Date: 2005-12-20 16:21:53
Message-ID: 1135095712.14411.1260.camel@jparkin.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

I have a reasonably large, live, system-critical database. A perl
script on another machine connects and issues a sequence of commands in
a transaction, the last of which is a delete. The delete never returns
a response, and the connection never times out. The postgres process
handling the delete is in a scheduled state, but stracing it produces no
output at all.

The table contains entries for a "parent" and it's "children".
Grandchildren and other descendents never exist in the table. A before
delete trigger removes the children when the parent is deleted and the
delete is always issued for a parent. Children have exactly one parent.

By tcpdumping the SQL connection the system consistently hangs on the
same delete for one of a set of rows. As the delete never commits the
transaction is never finished. Other rows (parents and children) can
be deleted without issue.

Killing the perl process at the other end, and subsequently the
connection timing out on the perl-server end does not stop the postgres
process handling the delete, and it continues to produce no output when
straced.

Restarting the server, doing a VACUUM FULL ANALYZE and a REINDEX of all
tables in the database (including system tables) has no effect.

This was first noticed using PostgreSQL 7.4.7. Upgrading to 7.4.10 has
not helped. Upgrading to 8.x may not be an option due to the systems
connecting to the database (this is being investigated).

My next logical step is to stop access to the database, take a dump of
it, remove it, and rebuild it. Can anyone think of a reason why I
should not do this?

Can anyone think of anything else I should try?

I welcome all suggestions, no matter how obvious they may appear.

Thanks,

Jonathan

--
Best Regards

Jonathan Parkin
Developer

Legend Communications plc
T: 0844 390 2049
F: 0844 390 2001
E: jonathan(dot)parkin(at)legendplc(dot)com
W: http://www.legend.co.uk/

The information in this message is confidential and may be legally
privileged. Unauthorised disclosure, copying or distribution, either
whole or in part; or action taken in reliance on its content is
prohibited. If you are not the intended recipient, please notify Legend
Communications immediately.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com>
Cc: PostgreSQL Admin Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PostgreSQL 7.4.10 hanging on delete
Date: 2005-12-20 16:37:14
Message-ID: 300.1135096634@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com> writes:
> I have a reasonably large, live, system-critical database. A perl
> script on another machine connects and issues a sequence of commands in
> a transaction, the last of which is a delete. The delete never returns
> a response, and the connection never times out. The postgres process
> handling the delete is in a scheduled state, but stracing it produces no
> output at all.

Sounds like it's stuck in a tight loop. Can you attach to the postgres
process with gdb and get a stack trace to find out where the loop is?

If this is a specific row causing the issue, then I'd wonder about
data corruption of some sort. It might be worth looking at the
table with pg_filedump (from http://sources.redhat.com/rhdb/).

regards, tom lane


From: Pandurangan R S <pandurangan(dot)r(dot)s(at)gmail(dot)com>
To: Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com>
Cc: PostgreSQL Admin Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PostgreSQL 7.4.10 hanging on delete
Date: 2005-12-21 08:02:41
Message-ID: 5e744e3d0512210002o7bb0f782oaacdb9a64160e0fe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

When the delete statements hangs, look into the pg_locks table, to
check if the delete statement is waiting because it has conflict with
rows locked by other transactions.

On 20 Dec 2005 16:21:53 +0000, Jonathan Parkin
<jonathan(dot)parkin(at)legendplc(dot)com> wrote:
> I have a reasonably large, live, system-critical database. A perl
> script on another machine connects and issues a sequence of commands in
> a transaction, the last of which is a delete. The delete never returns
> a response, and the connection never times out. The postgres process
> handling the delete is in a scheduled state, but stracing it produces no
> output at all.
>
> The table contains entries for a "parent" and it's "children".
> Grandchildren and other descendents never exist in the table. A before
> delete trigger removes the children when the parent is deleted and the
> delete is always issued for a parent. Children have exactly one parent.
>
> By tcpdumping the SQL connection the system consistently hangs on the
> same delete for one of a set of rows. As the delete never commits the
> transaction is never finished. Other rows (parents and children) can
> be deleted without issue.
>
> Killing the perl process at the other end, and subsequently the
> connection timing out on the perl-server end does not stop the postgres
> process handling the delete, and it continues to produce no output when
> straced.
>
> Restarting the server, doing a VACUUM FULL ANALYZE and a REINDEX of all
> tables in the database (including system tables) has no effect.
>
> This was first noticed using PostgreSQL 7.4.7. Upgrading to 7.4.10 has
> not helped. Upgrading to 8.x may not be an option due to the systems
> connecting to the database (this is being investigated).
>
> My next logical step is to stop access to the database, take a dump of
> it, remove it, and rebuild it. Can anyone think of a reason why I
> should not do this?
>
> Can anyone think of anything else I should try?
>
> I welcome all suggestions, no matter how obvious they may appear.
>
> Thanks,
>
> Jonathan
>
> --
> Best Regards
>
> Jonathan Parkin
> Developer
>
> Legend Communications plc
> T: 0844 390 2049
> F: 0844 390 2001
> E: jonathan(dot)parkin(at)legendplc(dot)com
> W: http://www.legend.co.uk/
>
> The information in this message is confidential and may be legally
> privileged. Unauthorised disclosure, copying or distribution, either
> whole or in part; or action taken in reliance on its content is
> prohibited. If you are not the intended recipient, please notify Legend
> Communications immediately.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

--
Regards
Pandu


From: "Andrea" <andrea(at)compesa(dot)com(dot)br>
To: "Jonathan Parkin" <jonathan(dot)parkin(at)legendplc(dot)com>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PostgreSQL 7.4.10 hanging on delete
Date: 2005-12-21 09:54:44
Message-ID: 00ce01c60614$91f7b340$ca01a8c0@DBA
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi,
I dont´t know if i understand exactly your problem, but ...
When i tried to delete all of the rows of a table , it seems to never end.
So i created an index to each foreing key and the command works very fast.
Good Luck!
Andréa Pereira
Brasil-Recife

----- Original Message -----
From: "Jonathan Parkin" <jonathan(dot)parkin(at)legendplc(dot)com>
To: "PostgreSQL Admin Mailing List" <pgsql-admin(at)postgresql(dot)org>
Sent: Tuesday, December 20, 2005 2:21 PM
Subject: [ADMIN] PostgreSQL 7.4.10 hanging on delete

> I have a reasonably large, live, system-critical database. A perl
> script on another machine connects and issues a sequence of commands in
> a transaction, the last of which is a delete. The delete never returns
> a response, and the connection never times out. The postgres process
> handling the delete is in a scheduled state, but stracing it produces no
> output at all.
>
> The table contains entries for a "parent" and it's "children".
> Grandchildren and other descendents never exist in the table. A before
> delete trigger removes the children when the parent is deleted and the
> delete is always issued for a parent. Children have exactly one parent.
>
> By tcpdumping the SQL connection the system consistently hangs on the
> same delete for one of a set of rows. As the delete never commits the
> transaction is never finished. Other rows (parents and children) can
> be deleted without issue.
>
> Killing the perl process at the other end, and subsequently the
> connection timing out on the perl-server end does not stop the postgres
> process handling the delete, and it continues to produce no output when
> straced.
>
> Restarting the server, doing a VACUUM FULL ANALYZE and a REINDEX of all
> tables in the database (including system tables) has no effect.
>
> This was first noticed using PostgreSQL 7.4.7. Upgrading to 7.4.10 has
> not helped. Upgrading to 8.x may not be an option due to the systems
> connecting to the database (this is being investigated).
>
> My next logical step is to stop access to the database, take a dump of
> it, remove it, and rebuild it. Can anyone think of a reason why I
> should not do this?
>
> Can anyone think of anything else I should try?
>
> I welcome all suggestions, no matter how obvious they may appear.
>
> Thanks,
>
> Jonathan
>
> --
> Best Regards
>
> Jonathan Parkin
> Developer
>
> Legend Communications plc
> T: 0844 390 2049
> F: 0844 390 2001
> E: jonathan(dot)parkin(at)legendplc(dot)com
> W: http://www.legend.co.uk/
>
> The information in this message is confidential and may be legally
> privileged. Unauthorised disclosure, copying or distribution, either
> whole or in part; or action taken in reliance on its content is
> prohibited. If you are not the intended recipient, please notify Legend
> Communications immediately.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>
> --
> Esta mensagem foi verificada pelo sistema de antivírus e
> acredita-se estar livre de perigo.
>
>
>
> --
> Internal Virus Database is out-of-date.
> Checked by AVG Anti-Virus.
> Version: 7.0.300 / Virus Database: 265.6.5 - Release Date: 26/12/2004
>

--
Internal Virus Database is out-of-date.
Checked by AVG Anti-Virus.
Version: 7.0.300 / Virus Database: 265.6.5 - Release Date: 26/12/2004

--
Esta mensagem foi verificada pelo sistema de antivírus e
acredita-se estar livre de perigo.


From: Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com>
To: Pandurangan R S <pandurangan(dot)r(dot)s(at)gmail(dot)com>
Cc: PostgreSQL Admin Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PostgreSQL 7.4.10 hanging on delete
Date: 2005-12-21 10:10:05
Message-ID: 1135159805.27920.70.camel@jparkin.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

There are no conflicting locks; this occurs both concurrently with other
processes (dealing with other rows) and when the only two processes with
locks on the database are the one that hangs and the one I'm using to
examine pg_locks.

On Wed, 2005-12-21 at 08:02, Pandurangan R S wrote:
> When the delete statements hangs, look into the pg_locks table, to
> check if the delete statement is waiting because it has conflict with
> rows locked by other transactions.
>
> On 20 Dec 2005 16:21:53 +0000, Jonathan Parkin
> <jonathan(dot)parkin(at)legendplc(dot)com> wrote:
> > I have a reasonably large, live, system-critical database. A perl
> > script on another machine connects and issues a sequence of commands in
> > a transaction, the last of which is a delete. The delete never returns
> > a response, and the connection never times out. The postgres process
> > handling the delete is in a scheduled state, but stracing it produces no
> > output at all.

[Remainder of original message cut for brevity]

--
Best Regards

Jonathan Parkin
Developer

Legend Communications plc
T: 0844 390 2049
F: 0844 390 2001
E: jonathan(dot)parkin(at)legendplc(dot)com
W: http://www.legend.co.uk/

The information in this message is confidential and may be legally
privileged. Unauthorised disclosure, copying or distribution, either
whole or in part; or action taken in reliance on its content is
prohibited. If you are not the intended recipient, please notify Legend
Communications immediately.


From: Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Admin Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PostgreSQL 7.4.10 hanging on delete
Date: 2005-12-21 10:15:36
Message-ID: 1135160136.27915.77.camel@jparkin.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Tue, 2005-12-20 at 16:37, Tom Lane wrote:
> Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com> writes:
> > I have a reasonably large, live, system-critical database. A perl
> > script on another machine connects and issues a sequence of commands in
> > a transaction, the last of which is a delete. The delete never returns
> > a response, and the connection never times out. The postgres process
> > handling the delete is in a scheduled state, but stracing it produces no
> > output at all.
>
> Sounds like it's stuck in a tight loop. Can you attach to the postgres
> process with gdb and get a stack trace to find out where the loop is?

I've attached two back traces. The first (process 23755) is unique
across the ten or so current hangign processes. All the rest follow the
same trace as the second (process 25376). I notice that first (process
23755) has the lowest-numbered PID of the hanging processes, suggesting
it began before the rest.

The "ignoring modified tuple in DELETE trigger" is reaching syslog and
being logged in /var/log/messages. This normally occurs during
processing. Syslog apears to be operating correctly, and HUPing it has
had no effect.

I should note that, other than the upgrade mentioned previously, no
changes have been made recently that I know of, and that the system has
previously appeared to be working correctly. It is not clear when this
issue began.

> If this is a specific row causing the issue, then I'd wonder about
> data corruption of some sort. It might be worth looking at the
> table with pg_filedump (from http://sources.redhat.com/rhdb/).

Thanks, I hadn't heard of that. I'll see what I can turn up with it.
Anything in particular I should be looking for?

Our current plan is to do a rebuild of the database tonight (UK time).
This would hopefully eliminate any such data corruption.

--
Best Regards

Jonathan Parkin
Developer

Legend Communications plc
T: 0844 390 2049
F: 0844 390 2001
E: jonathan(dot)parkin(at)legendplc(dot)com
W: http://www.legend.co.uk/

The information in this message is confidential and may be legally
privileged. Unauthorised disclosure, copying or distribution, either
whole or in part; or action taken in reliance on its content is
prohibited. If you are not the intended recipient, please notify Legend
Communications immediately.

Attachment Content-Type Size
pgsql-traces text/plain 7.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com>
Cc: PostgreSQL Admin Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PostgreSQL 7.4.10 hanging on delete
Date: 2005-12-21 15:19:05
Message-ID: 13239.1135178345@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com> writes:
> On Tue, 2005-12-20 at 16:37, Tom Lane wrote:
>> If this is a specific row causing the issue, then I'd wonder about
>> data corruption of some sort. It might be worth looking at the
>> table with pg_filedump (from http://sources.redhat.com/rhdb/).

> Our current plan is to do a rebuild of the database tonight (UK time).
> This would hopefully eliminate any such data corruption.

Given the stack traces, I no longer think it's a data corruption issue
--- somehow libc is screwing up. Backends are single-threaded and there
is *no* reason for pthread_mutex_lock to block, ever.

I have a vague recollection that we saw something similar reported once
before, but it was awhile ago [ digs... ] Here it is:
http://archives.postgresql.org/pgsql-admin/2003-07/msg00303.php
and later
http://archives.postgresql.org/pgsql-admin/2003-08/msg00064.php

We never did figure out what was up, but it is striking that the other
reporter was also using plpython. It begins to look like Python is
somehow contributing to the problem. What versions of python and
glibc are you running, exactly?

If you just want to get your production machine back to stability,
I'd recommend the workaround suggested in that thread, which is to
not use syslog-based logging.

regards, tom lane


From: Jonathan Parkin <jonathan(dot)parkin(at)legendplc(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Admin Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PostgreSQL 7.4.10 hanging on delete
Date: 2005-12-21 18:24:51
Message-ID: 1135189491.27921.104.camel@jparkin.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

> Given the stack traces, I no longer think it's a data corruption issue
> --- somehow libc is screwing up. Backends are single-threaded and there
> is *no* reason for pthread_mutex_lock to block, ever.
>
> I have a vague recollection that we saw something similar reported once
> before, but it was awhile ago [ digs... ] Here it is:
> http://archives.postgresql.org/pgsql-admin/2003-07/msg00303.php
> and later
> http://archives.postgresql.org/pgsql-admin/2003-08/msg00064.php
>
> We never did figure out what was up, but it is striking that the other
> reporter was also using plpython. It begins to look like Python is
> somehow contributing to the problem. What versions of python and
> glibc are you running, exactly?

glibc-2.3.2
python-2.2.2
postgresql-python-7.4.10

Linux 2.4.20

> If you just want to get your production machine back to stability,
> I'd recommend the workaround suggested in that thread, which is to
> not use syslog-based logging.

Which indeed appears to have stopped this from being a problem.

Thank you very much for all the help. If you'd like any more
information (e.g. on versions of things) I may not be around until the
new year, but I'll happily provide them once I'm back.

Thanks again,

Jon
--
Best Regards

Jonathan Parkin
Developer

Legend Communications plc
T: 0844 390 2049
F: 0844 390 2001
E: jonathan(dot)parkin(at)legendplc(dot)com
W: http://www.legend.co.uk/

The information in this message is confidential and may be legally
privileged. Unauthorised disclosure, copying or distribution, either
whole or in part; or action taken in reliance on its content is
prohibited. If you are not the intended recipient, please notify Legend
Communications immediately.