Quick Links

Re: [JDBC] ERROR: canceling query due to user request

Lists:	pgsql-adminpgsql-jdbc

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	<pgsql-admin(at)postgresql(dot)org>
Subject:	Re: ERROR: canceling query due to user request
Date:	2005-09-13 20:14:37
Message-ID:	s326eceb.025@gwmta.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-admin pgsql-jdbc

Thanks, Tom. An excellent suggestion. (This 50 year old dog has
today learned a new trick.)

There is good news and bad news. The good news is that I found
the cause, and we can keep this from happening with a change on
our end. The bad news is that I think it also points to a backend bug,
although not as serious as the potential (hypothetical) one I was
asking about earlier.

For the record (and the benefit of anyone with similar problems who
may search the archives), I got the connection pool set up, and found
the pids for the connection processes like this:

# ps aux|grep postgres
postgres 18307 0.0 0.0 3052 1100 pts/0 S Sep09 0:00 su postgres
postgres 18308 0.0 0.0 3408 1828 pts/0 S+ Sep09 0:00 bash
postgres 15463 0.0 0.1 89508 3852 pts/0 S 10:09 0:05 /usr/local/pgsql/bin/postmaster -D /var/pgsql/data
postgres 15466 0.0 4.0 89652 83004 pts/0 S 10:09 0:00 postgres: writer process
postgres 15467 0.0 0.1 7052 2796 pts/0 S 10:09 0:02 postgres: stats buffer process
postgres 15468 0.0 0.0 6388 1996 pts/0 S 10:09 0:03 postgres: stats collector process
postgres 926 0.0 0.3 90484 7576 pts/0 S 14:34 0:00 postgres: dtr dtr 165.219.88.77(4436) idle
postgres 927 0.0 0.2 89956 4684 pts/0 S 14:34 0:00 postgres: dtr dtr 165.219.88.77(4437) idle
postgres 928 3.0 1.1 90404 23764 pts/0 S 14:34 0:07 postgres: dtr dtr 165.219.88.77(4438) idle
postgres 929 2.7 1.1 90468 23260 pts/0 S 14:34 0:07 postgres: dtr dtr 165.219.88.77(4439) idle
root 935 0.0 0.0 1796 648 pts/2 S+ 14:38 0:00 grep postgres

I then established an strace session for each connection like this:

strace -tt -o strace.926 -p 926

Other flags may have been useful, but strace is new to me, so I took
the route I was sure I understood. I ran until we got an error, which
involved running through 7,278 transactions. I used Ctrl+C to stop
each strace and searched the results for SIGINT. There were 1,799
of them. They always came in a set of lines like this:

13:59:19.625498 recv(7, "P\0\0\0Y\0SELECT lcmtr.\"relationName"..., 8192, 0) = 125
13:59:19.625976 _llseek(32, 0, [0], SEEK_END) = 0
13:59:19.626057 _llseek(33, 0, [8192], SEEK_END) = 0
13:59:19.626159 _llseek(32, 0, [0], SEEK_END) = 0
13:59:19.626257 send(7, "1\0\0\0\0042\0\0\0\4T\0\0\0D\0\2relationName\0\0\0"..., 97, 0) = 97
13:59:19.626352 recv(7, 0x82b1000, 8192, 0) = ? ERESTARTSYS (To be restarted)
13:59:19.628477 --- SIGINT (Interrupt) @ 0 (0) ---
13:59:19.628559 sigreturn() = ? (mask now [])

The SELECT statement was easy to find, and it became clear that a
programmer had code which was incorrectly canceling a JDBC
Statement after reaching the end of the (empty) ResultSet. One
time out of 1,799 this was causing the error we were seeing on
the subsequent commit, which strikes me as a bug.

We've already changed the offending code to avoid the invocation
of the Statement.cancel method. Not that it merits high priority, but
it might make sense for PostgreSQL to behave more consistently on
a commit when a statement within the database transaction has been
canceled. There currently is a race condition where if the commit
comes fast enough after the Statement.cancel, it receives the error
which is the subject of this thread.

-Kevin

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 09/13/05 1:18 PM >>>
"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> I'm having a really hard time coming up with theories about the cause
> or things to check.

Have you tried strace'ing the backend process to see if you can see a
signal being delivered to it?

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	pgsql-admin(at)postgresql(dot)org, pgsql-jdbc(at)postgresql(dot)org
Subject:	Re: ERROR: canceling query due to user request
Date:	2005-09-13 22:18:40
Message-ID:	27126.1126649920@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-admin pgsql-jdbc

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> The SELECT statement was easy to find, and it became clear that a
> programmer had code which was incorrectly canceling a JDBC
> Statement after reaching the end of the (empty) ResultSet. One
> time out of 1,799 this was causing the error we were seeing on
> the subsequent commit, which strikes me as a bug.

> We've already changed the offending code to avoid the invocation
> of the Statement.cancel method. Not that it merits high priority, but
> it might make sense for PostgreSQL to behave more consistently on
> a commit when a statement within the database transaction has been
> canceled. There currently is a race condition where if the commit
> comes fast enough after the Statement.cancel, it receives the error
> which is the subject of this thread.

Yeah. The Query Cancel protocol has a potential race condition in it:
the cancel request is not synchronous with respect to your existing
connection (it's actually sent over a different connection). If you try
to send another command immediately, it's possible that the new command
gets to the backend before the cancel signal does, and then the cancel
zaps your new command instead of the one you intended.

This has been discussed before (try the pgsql-jdbc list archives).
I believe we concluded that an appropriate fix was to not consider the
cancel request "done" until the client sees the separate connection
dropped by the postmaster. libpq's cancel functions wait for that to
happen, and I thought that JDBC had been fixed as well --- maybe you are
using an old driver version?

Of course, if the internal threading in your app is such that you might
issue another command before the Statement.cancel method finishes, then
it's your own bug.

regards, tom lane

From:	Oliver Jowett <oliver(at)opencloud(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-admin(at)postgresql(dot)org, pgsql-jdbc(at)postgresql(dot)org
Subject:	Re: [JDBC] ERROR: canceling query due to user request
Date:	2005-09-13 23:00:41
Message-ID:	43275A19.8030704@opencloud.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-admin pgsql-jdbc

Tom Lane wrote:

> This has been discussed before (try the pgsql-jdbc list archives).
> I believe we concluded that an appropriate fix was to not consider the
> cancel request "done" until the client sees the separate connection
> dropped by the postmaster. libpq's cancel functions wait for that to
> happen, and I thought that JDBC had been fixed as well --- maybe you are
> using an old driver version?

I thought this got done too, but looking at the current JDBC driver code
I don't see it..

-O