Re: Killing "stuck" queries and preventing queries from getting "stuck"

Lists: pgsql-general
From: Tim Uckun <timuckun(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Killing "stuck" queries and preventing queries from getting "stuck"
Date: 2010-09-27 23:41:36
Message-ID: AANLkTikfk-3tVf_NzwjLo4OS+vrcSsWXv4k7jhdoWOZ3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I have a situation where there are dozens of daemons hitting the same
postgres database. They all run different types of queries but each
daemon runs the same set of queries over and over again.

Sometimes some queries get "stuck" in that they run for hours and
hours. They never stop running. Killing the deamon does not stop the
query from running.

Once there are three of four of these "stuck" queries the database
slows down drastically.

Is there a way to tell postgres to stop any query that runs longer
than a specified amount of time? Say an hour?

Failing that what is a good strategy for detecting stuck queries and
killing them.

Thanks.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Killing "stuck" queries and preventing queries from getting "stuck"
Date: 2010-09-28 02:39:33
Message-ID: 13583.1285641573@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tim Uckun <timuckun(at)gmail(dot)com> writes:
> Is there a way to tell postgres to stop any query that runs longer
> than a specified amount of time? Say an hour?

Setting statement_timeout would do that. You ought to figure out
what's causing the performance problem, though, instead of just
zapping things ...

regards, tom lane


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Killing "stuck" queries and preventing queries from getting "stuck"
Date: 2010-09-28 03:25:58
Message-ID: AANLkTim_k6GUmqsZP5GjrC3NStv=DqNqJ-Sm4Dn_zMD=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Sep 28, 2010 at 3:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Tim Uckun <timuckun(at)gmail(dot)com> writes:
>> Is there a way to tell postgres to stop any query that runs longer
>> than a specified amount of time? Say an hour?
>
> Setting statement_timeout would do that.  You ought to figure out
> what's causing the performance problem, though, instead of just
> zapping things ...

Well the query is pretty heavy but it gets run a lot. There is a
distinct in there which seems to be the cause of most of the headaches
but it's going to take a while to redo the application to not use
distinct.

The query gets run a lot and 99.99% of the time it runs succesfully
and the daemon goes on it's merry way. Occasionally it seems to "get
stuck" and killing the daemon does not unstick it. I have not been
willing to kill -9 the process and at this stage I can afford to
restart the postgres. Eventually I won't be able to do that though so
I want to fix the app so it uses a more reasonable query and detect
and stop stuck queries in case other queries sneak into the process
during development.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Killing "stuck" queries and preventing queries from getting "stuck"
Date: 2010-09-28 06:21:12
Message-ID: 4CA18958.7020106@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 28/09/10 11:25, Tim Uckun wrote:
> On Tue, Sep 28, 2010 at 3:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Tim Uckun <timuckun(at)gmail(dot)com> writes:
>>> Is there a way to tell postgres to stop any query that runs longer
>>> than a specified amount of time? Say an hour?
>>
>> Setting statement_timeout would do that. You ought to figure out
>> what's causing the performance problem, though, instead of just
>> zapping things ...
>
> Well the query is pretty heavy but it gets run a lot. There is a
> distinct in there which seems to be the cause of most of the headaches
> but it's going to take a while to redo the application to not use
> distinct.
>
> The query gets run a lot and 99.99% of the time it runs succesfully
> and the daemon goes on it's merry way. Occasionally it seems to "get
> stuck" and killing the daemon does not unstick it.

Useful things to try when you have a "stuck" backend:

- attach strace to it and see if it's doing anything
that involves system calls

- attach gdb to it and get a backtrace to see what
it's up to. If it's using CPU, do this multiple times
to see if it's in some kind of infinite loop, as you'll
get a snapshot of different stacks if so. See:

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

- (on linux; you didn't mention your OS):
cat /proc/$pid/stack , where $pid is the process id
of the stuck backend, to see what the backend process is
up to in the kernel.

... then post the output of all those tests here, along with the
contents of "select * from pg_stat_activity", "select * from pg_locks"
and anything from the postgresql log files that looks possibly relevant.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/


From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Killing "stuck" queries and preventing queries from getting "stuck"
Date: 2010-09-28 17:57:34
Message-ID: 778964D5-3005-4FEC-82BB-15AC70019D9B@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 28 Sep 2010, at 1:41, Tim Uckun wrote:

> Sometimes some queries get "stuck" in that they run for hours and
> hours. They never stop running. Killing the deamon does not stop the
> query from running.

You really should try to find out why they get "stuck". Killing stuck clients isn't going to solve your problem (aside from the fact that you probably shouldn't be using kill -9 on them, that's like using a jackhammer on a jammed door).

Some things to look into: Are those queries waiting on a lock by another daemon maybe? Are some of them keeping transactions open for a long time without committing them (or rolling them back)?

I recall you were having another problem (with deleting records). This all smells like you either are waiting for locks on records or that the statistics used for query planning aren't reflecting the actual situation.

Have a look in pg_locks and check the query plans of some of your more problematic queries (see: explain analyse) to see what's going on there. Posting the results of those here would allow more eyes to look into your issues, in case it isn't obvious to you. There's some good documentation on these subjects too.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4ca22c9f678304378921584!


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Killing "stuck" queries and preventing queries from getting "stuck"
Date: 2010-09-28 20:14:46
Message-ID: AANLkTimtYSOZyP9pDnkGBTuXmhbtfO=YsRLwcjcj1P90@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>
> You really should try to find out why they get "stuck". Killing stuck clients isn't going to solve your problem (aside from the fact that you probably shouldn't be using kill -9 on them, that's like using a jackhammer on a jammed door).

Well I didn't use kill -9 I used the pg_cancel_backend command.

>
> Some things to look into: Are those queries waiting on a lock by another daemon maybe? Are some of them keeping transactions open for a long time without committing them (or rolling them back)?

I'll take a look at that. It certainly would be simpler than attaching
a gdb session to the pid and getting a stacktrace.

> I recall you were having another problem (with deleting records). This all smells like you either are waiting for locks on records or that the statistics used for query planning aren't reflecting the actual situation.
>

I am having some performance issues with the database. I am also
trying to clean out a lot of records out of the system. Once all the
records I want to delete are gone perhaps the problem will go away. I
am also looking at how the application can be refactored not to use
this particular DISTINCT query.

> Have a look in pg_locks and check the query plans of some of your more problematic queries (see: explain analyse) to see what's going on there. Posting the results of those here would allow more eyes to look into your issues, in case it isn't obvious to you. There's some good documentation on these subjects too.

I did look at the analyze and basically postgres is saying the
distinct is killing me. I remove that and the query is fine. I
didn't look at the locks because the queries are read only so I didn't
think they would be effected by locks but I will look at them post
them here.