Re: Database deadlock/hanging

Lists: pgsql-general
From: John Gateley <gateley(at)jriver(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Database deadlock/hanging
Date: 2007-03-07 16:08:42
Message-ID: 20070307100842.bebaee38.gateley@jriver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

My database stopped responding last night (Postgres 8.1.4).

at 2 am, a vacuum began running:
/usr/local/pgsql/bin/vacuumdb -afz

When I came in to work this morning, I could query some tables
but not others. There were many(100) processes like:
postgres 11791 6901 0 02:07 ? 00:00:00 postgres: www-data yadb2 127.0.0.1(40883) SELECT waiting

I began running an update yesterday that may or may not have completed.
I was moving data out of one table into a new table, and setting
an ID field to point to the new table. That may or may not have
completed (it was done by this morning). It is not a regular occurence
and today was the first time I saw this hanging behavior, so it's
probably related.

The postmaster was using a lot of CPU.

I stopped the postmaster with a SIGINT, it stopped quickly
and came back up automatically (I'm using daemontools) and
everything is fine.

I'm guessing it's something related to table locks.

Any pointers on what I should be looking for to prevent this from
happening again? What information I should be tracking to figure
out what is exactly happening?

Thanks very much,

j


From: Reece Hart <reece(at)harts(dot)net>
To: John Gateley <gateley(at)jriver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Database deadlock/hanging
Date: 2007-03-07 17:27:04
Message-ID: 1173288424.5374.22.camel@snafu.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 2007-03-07 at 10:08 -0600, John Gateley wrote:
> I'm guessing it's something related to table locks.
...
> Any pointers on what I should be looking for to prevent this from
> happening again? What information I should be tracking to figure
> out what is exactly happening?

Your inserts almost certainly have a table or index exclusively locked
and thereby causing a backlog of selects.

You can fish current and waiting locks out of pg_locks, but those use
internal identifiers rather than names. Here's a view that will make
pg_locks more readable:

rkh(at)csb-dev=> CREATE OR REPLACE VIEW pgutils.locks AS
SELECT l.pid, d.datname AS "database", n.nspname AS "schema", c.relname AS relation, l.locktype, l."mode",
CASE l."granted"
WHEN true THEN 'RUN'::text
ELSE 'WAIT'::text
END AS state, a.usename, a.current_query, to_char(now() - a.query_start, 'HH24:MI:SS'::text) AS duration
FROM pg_locks l
JOIN pg_database d ON l."database" = d.oid
JOIN pg_class c ON l.relation = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
JOIN pg_stat_activity a ON l.pid = a.procpid
ORDER BY l.pid, d.datname, n.nspname, c.relname, l."granted";

rkh(at)csb-dev=> select * from pgutils.locks ;
pid | database | schema | relation | locktype | mode | state | usename | current_query | duration
-------+----------+------------+--------------------+----------+-----------------+-------+---------+---------------+----------
28434 | csb-dev | pg_catalog | pg_class | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pg_catalog | pg_class_oid_index | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pg_catalog | pg_locks | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pg_catalog | pg_namespace | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pg_catalog | pg_stat_activity | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pgutils | locks | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
(6 rows)

-Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


From: John Gateley <gateley(at)jriver(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Database deadlock/hanging
Date: 2007-03-08 15:53:10
Message-ID: 20070308095310.f2dc476b.gateley@jriver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 07 Mar 2007 09:27:04 -0800
Reece Hart <reece(at)harts(dot)net> wrote:

> On Wed, 2007-03-07 at 10:08 -0600, John Gateley wrote:
> > I'm guessing it's something related to table locks.
> ...
> > Any pointers on what I should be looking for to prevent this from
> > happening again? What information I should be tracking to figure
> > out what is exactly happening?
>
> Your inserts almost certainly have a table or index exclusively locked
> and thereby causing a backlog of selects.

Thanks. It turns out it was my nightly vacuuming of the database.
I had the full option set, and I had added a large table, and I
think it was just trying to finish the vacuum (it took about
15 minutes to do a vacuumdb -az, would several hours be reasonable
for afz? Or maybe there was some deadlock with table access?)

Moral - full isn't always better (and if I RTFM, I would have
known that...)

j