Lists: | pgsql-general |
---|
From: | John Gateley <gateley(at)jriver(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Database deadlock/hanging |
Date: | 2007-03-07 16:08:42 |
Message-ID: | 20070307100842.bebaee38.gateley@jriver.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Hi,
My database stopped responding last night (Postgres 8.1.4).
at 2 am, a vacuum began running:
/usr/local/pgsql/bin/vacuumdb -afz
When I came in to work this morning, I could query some tables
but not others. There were many(100) processes like:
postgres 11791 6901 0 02:07 ? 00:00:00 postgres: www-data yadb2 127.0.0.1(40883) SELECT waiting
I began running an update yesterday that may or may not have completed.
I was moving data out of one table into a new table, and setting
an ID field to point to the new table. That may or may not have
completed (it was done by this morning). It is not a regular occurence
and today was the first time I saw this hanging behavior, so it's
probably related.
The postmaster was using a lot of CPU.
I stopped the postmaster with a SIGINT, it stopped quickly
and came back up automatically (I'm using daemontools) and
everything is fine.
I'm guessing it's something related to table locks.
Any pointers on what I should be looking for to prevent this from
happening again? What information I should be tracking to figure
out what is exactly happening?
Thanks very much,
j
From: | Reece Hart <reece(at)harts(dot)net> |
---|---|
To: | John Gateley <gateley(at)jriver(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Database deadlock/hanging |
Date: | 2007-03-07 17:27:04 |
Message-ID: | 1173288424.5374.22.camel@snafu.site |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On Wed, 2007-03-07 at 10:08 -0600, John Gateley wrote:
> I'm guessing it's something related to table locks.
...
> Any pointers on what I should be looking for to prevent this from
> happening again? What information I should be tracking to figure
> out what is exactly happening?
Your inserts almost certainly have a table or index exclusively locked
and thereby causing a backlog of selects.
You can fish current and waiting locks out of pg_locks, but those use
internal identifiers rather than names. Here's a view that will make
pg_locks more readable:
rkh(at)csb-dev=> CREATE OR REPLACE VIEW pgutils.locks AS
SELECT l.pid, d.datname AS "database", n.nspname AS "schema", c.relname AS relation, l.locktype, l."mode",
CASE l."granted"
WHEN true THEN 'RUN'::text
ELSE 'WAIT'::text
END AS state, a.usename, a.current_query, to_char(now() - a.query_start, 'HH24:MI:SS'::text) AS duration
FROM pg_locks l
JOIN pg_database d ON l."database" = d.oid
JOIN pg_class c ON l.relation = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
JOIN pg_stat_activity a ON l.pid = a.procpid
ORDER BY l.pid, d.datname, n.nspname, c.relname, l."granted";
rkh(at)csb-dev=> select * from pgutils.locks ;
pid | database | schema | relation | locktype | mode | state | usename | current_query | duration
-------+----------+------------+--------------------+----------+-----------------+-------+---------+---------------+----------
28434 | csb-dev | pg_catalog | pg_class | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pg_catalog | pg_class_oid_index | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pg_catalog | pg_locks | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pg_catalog | pg_namespace | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pg_catalog | pg_stat_activity | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
28434 | csb-dev | pgutils | locks | relation | AccessShareLock | RUN | rkh | <IDLE> | 00:00:21
(6 rows)
-Reece
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.
From: | John Gateley <gateley(at)jriver(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Database deadlock/hanging |
Date: | 2007-03-08 15:53:10 |
Message-ID: | 20070308095310.f2dc476b.gateley@jriver.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On Wed, 07 Mar 2007 09:27:04 -0800
Reece Hart <reece(at)harts(dot)net> wrote:
> On Wed, 2007-03-07 at 10:08 -0600, John Gateley wrote:
> > I'm guessing it's something related to table locks.
> ...
> > Any pointers on what I should be looking for to prevent this from
> > happening again? What information I should be tracking to figure
> > out what is exactly happening?
>
> Your inserts almost certainly have a table or index exclusively locked
> and thereby causing a backlog of selects.
Thanks. It turns out it was my nightly vacuuming of the database.
I had the full option set, and I had added a large table, and I
think it was just trying to finish the vacuum (it took about
15 minutes to do a vacuumdb -az, would several hours be reasonable
for afz? Or maybe there was some deadlock with table access?)
Moral - full isn't always better (and if I RTFM, I would have
known that...)
j