Re: <IDLE> and waiting

From: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: <IDLE> and waiting
Date: 2008-02-01 22:26:08
Message-ID: 65937bea0802011426p4812b7f0k8e07be9530f5b12a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The situation seems pretty bad!!

Here are the steps to reproduce in 'PostgreSQL 8.3beta2 on
x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)':

session 1: begin;
session 1: update test set a = 112 where a = 112;
session 2: update test set a = 113 where a = 112; --waits
session 1: select * from pg_stat_activity; -- executed this a few times
before executing 'select version()' and then following:
session 1: <stat query1> -- see end of mail for this query

procpid | current_query | waiting |
duration | backend_start
---------+----------------------------------------+---------+------------------+-------------------------------
12577 | update test set a = 113 where a = 112; | t | -00:01:
35.782881 | 2008-02-01 13:36:15.31027-08
11975 | select * from pg_stat_activity ; | f | -00:01:
52.554697 | 2008-02-01 13:30:40.396392-08
(2 rows)

session 1: select * from pg_locks

locktype | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid
| mode | granted
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+---------------
-----+-------+------------------+---------
transactionid | | | | | |
390 | | | | 2/14 | 12577 | ShareLock | f
transactionid | | | | | |
390 | | | | 1/9 | 11975 | ExclusiveLock | t
<snip irrelevant>

Then,
session 2: ^C
Cancel request sent
ERROR: canceling statement due to user request

session1: <stat query1>

procpid | current_query | waiting |
duration | backend_start
---------+----------------------------------------+---------+------------------+-------------------------------
12577 | update test set a = 113 where a = 112; | t | -00:01:
35.782881 | 2008-02-01 13:36:15.31027-08
11975 | select * from pg_stat_activity ; | f | -00:01:
52.554697 | 2008-02-01 13:30:40.396392-08
(2 rows)

session 1: select * from pg_locks ;

<no traces of pid 12577>

session 1: select pg_backend_pid();

pg_backend_pid
----------------
11975

The last mentioned output of <stat query> shows session 1 executing 'select
* from p_s_a', whereas the <stat query> _is_ being executed in session 1!!!
This result is consistently returned for a while, and later...

session 2: select pg_backend_pid();

pg_backend_pid
----------------
12577

session 1: <stat query1>

procpid | current_query | waiting | duration |
backend_start
---------+-----------------------+---------+-----------------+-------------------------------
11975 | <IDLE> in transaction | f | 00:06:08.671029 | 2008-02-01
13:30:40.396392-08
(1 row)

After a while again:

session 1: <stat query2> -- notice 2 not 1; 'select *' comes back to
haunt!!!

procpid | current_query | waiting |
duration | backend_start
---------+----------------------------------------+---------+------------------+-------------------------------
12577 | update test set a = 113 where a = 112; | t | -00:01:
35.782881 | 2008-02-01 13:36:15.31027-08
11975 | select * from pg_stat_activity ; | f | -00:01:
52.554697 | 2008-02-01 13:30:40.396392-08
(2 rows)

session 1: <stat query1> -- 1 back in action

procpid | current_query | waiting |
duration | backend_start
---------+----------------------------------------+---------+------------------+-------------------------------
12577 | update test set a = 113 where a = 112; | t | -00:01:
35.782881 | 2008-02-01 13:36:15.31027-08
11975 | select * from pg_stat_activity ; | f | -00:01:
52.554697 | 2008-02-01 13:30:40.396392-08
(2 rows)

The <stat query1> is:
select
procpid, current_query::varchar(50), waiting, now() - query_start as
duration, backend_start
from pg_stat_activity
where current_query <> '<IDLE>'
and current_query not like '%DONT COUNT ME1 %'
order by duration desc
limit 10;

The <stat query2> is:
select
procpid, current_query::varchar(50), waiting, now() - query_start as
duration, backend_start
from pg_stat_activity
where current_query not like '%DONT COUNT ME1 %'
order by duration desc
limit 10;

Found more bugs than I was looking for, to reproduce!!!

The reporter also made an observation (on 8.2.4) that there were
deadlocks detected at around the same time. Looked at WaitOnLock(), and
clearly there's a problem, but is it at the same/only place we are
suspecting it to be?

Best regards,

PS: Ran the <stat query>ies 1 and 2 again, just before hitting 'send', and
the result is the same:
procpid | current_query | waiting |
duration | backend_start
---------+----------------------------------------+---------+------------------+-------------------------------
12577 | update test set a = 113 where a = 112; | t | -00:01:
35.782881 | 2008-02-01 13:36:15.31027-08
11975 | select * from pg_stat_activity ; | f | -00:01:
52.554697 | 2008-02-01 13:30:40.396392-08
(2 rows)

Clearly, there's something wrong

On Feb 1, 2008 8:16 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com> writes:
> > I saw a strange behaviour on one of the production boxes. The
> > pg_stat_activity shows a process as <IDLE> and yet 'waiting' !!! On top
> of
> > it (understandably, since its IDLE), there are no entries for this pid
> in
> > pg_locks!
>
> Hmm, I can reproduce something like this by aborting a wait for lock.
> It seems the problem is that WaitOnLock() is ignoring its own good
> advice, assuming that it can do cleanup work after waiting.
>
> regards, tom lane
>

--
gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

17° 29' 34.37"N, 78° 30' 59.76"E - Hyderabad
18° 32' 57.25"N, 73° 56' 25.42"E - Pune
37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco *

http://gurjeet.frihost.net

Mail sent from my BlackLaptop device

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2008-02-01 22:33:36 configurability of OOM killer
Previous Message Gregory Stark 2008-02-01 22:16:23 Re: BUG #3921: CREATE TABLE / INCLUDING INDEXES fails with permission denied