Re: rogue process maxing cpu and unresponsive to signals
On Aug 15, 2007, at 9:27 PM, Jon Jensen wrote:
I've got a simple select query that runs every 10 minutes in order
to update data in some external rrds (it lets us make pretty graphs
and so forth). This has been working fine for months on end, when
suddenly yesterday the badness happened. For some reason, this same
query that normally takes a couple seconds has now been stuck
running for over 24 hours, maxing the CPU and generally slowing
other queries down.
The external script that initiates the query has been restarted,
and netstat no longer shows that connection. All subsequent calls
of the same query are quick as usual, but the renegade process
lingers on, unresponsive to signals. Some of the things I've tried
so far (unsuccessfully):
1. I've tried killing the process using kill from the command-line
(INT, TERM and HUP), as well as using pg_cancel_backend() via psql.
2. I've tried attaching gdb to the renegade process to see what
it's doing, but that hangs, forcing me to kill gdb (no problems
attaching to other postgres processes however).
Any other ideas? I'd like to avoid doing a kill -9 if at all
possible. The machine is debian (sarge) running postgres 8.1.

There are a lot of parts of the code that don't check for signals,
because normally they don't run for any real length of time... until
they do. :) The factorial calculation is an example that was recently
fixed. So it's possible that something in your query is in that same
condition. You may be stuck with a kill -9, but it would be good to
identify what part of the code is hung up so we can determine if it
makes sense to add signal handling.
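
To illustrate what "checking for signals" means here, below is a minimal sketch in plain C, not actual PostgreSQL code. In the backend this polling is done with the CHECK_FOR_INTERRUPTS() macro; the names in the sketch (cancel_requested, install_cancel_handler, sum_with_cancel_check) are made up for illustration. The handler only sets a flag, and the long-running loop has to look at that flag itself. A loop that never looks keeps running regardless of how many INT or TERM signals arrive.

```c
#include <signal.h>

/* Set by the signal handler, polled by the long-running loop.
 * Hypothetical name; PostgreSQL keeps its own pending-interrupt flags. */
static volatile sig_atomic_t cancel_requested = 0;

/* The handler does nothing except record that a cancel was requested. */
static void handle_cancel(int signo)
{
    (void) signo;
    cancel_requested = 1;
}

/* Install the handler for SIGINT. Returns 0 on success, -1 on failure. */
int install_cancel_handler(void)
{
    return (signal(SIGINT, handle_cancel) == SIG_ERR) ? -1 : 0;
}

/*
 * Hypothetical long-running computation: sums 1..n.
 * It polls the flag once per iteration, so a cancel request takes
 * effect at the next iteration. Returns the sum, or -1 if cancelled;
 * *steps_done reports how many iterations completed.
 */
long sum_with_cancel_check(long n, long *steps_done)
{
    long total = 0;
    long i;

    for (i = 1; i <= n; i++)
    {
        if (cancel_requested)
        {
            *steps_done = i - 1;
            return -1;
        }
        total += i;
    }
    *steps_done = n;
    return total;
}
```

If the check inside the loop is removed, the handler still runs and still sets the flag, but nothing ever acts on it. That is the situation described above: the signal is delivered, yet the process carries on until the loop finishes by itself.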
--
Decibel!, aka Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)