idle in txn query cancellation

Lists: pgsql-hackers
From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: idle in txn query cancellation
Date: 2010-02-13 21:37:41
Message-ID: 201002132237.43930.andres@anarazel.de

Hi all,

I know it is late in the cycle, but I still think that the current behaviour
of ERRORing during execution of a query but FATALing during IDLE IN
TRANSACTION is very confusing to the user, especially since you are not even
able to read the reason for getting disconnected, because the client doesn't
expect input in that state...

The first patch adds the capability to pass a flag to ereport, like:
ereport(ERROR | LOG_NO_CLIENT)
Tom earlier suggested using COMMERROR, but that's just a version of LOG which
doesn't report to the client. The patch makes it a synonym of LOG |
LOG_NO_CLIENT.
While it's not the prettiest API, I don't think it's that bad, because where
a message goes is to some extent a property of the log level. Besides, the
alternatives would generate a lot of useless noise and breakage.
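For illustration, a minimal C sketch of how a flag can travel in the same int as the severity, assuming a simple bitmask layout. The numeric values of LOG, ERROR and LOG_NO_CLIENT and the mask names are hypothetical, not taken from the patch or from elog.h; only the names LOG_NO_CLIENT and COMMERROR come from the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical severity values; illustrative only, not elog.h's. */
#define LOG   15
#define ERROR 20

/* Hypothetical layout: low byte = severity, higher bits = flags. */
#define ELEVEL_LEVEL_MASK 0x00FF
#define ELEVEL_FLAG_MASK  0xFF00
#define LOG_NO_CLIENT     0x0100   /* never send this report to the client */

/* As proposed: COMMERROR becomes a synonym of LOG | LOG_NO_CLIENT. */
#define COMMERROR (LOG | LOG_NO_CLIENT)

/* Extract the severity, ignoring any flag bits. */
static int elevel_severity(int elevel)
{
    /* Only bits covered by the two masks are expected to be in use. */
    assert((elevel & ~(ELEVEL_LEVEL_MASK | ELEVEL_FLAG_MASK)) == 0);
    return elevel & ELEVEL_LEVEL_MASK;
}

/*
 * Whether the report may reach the client at all. This looks only at the
 * flag; a real implementation would also honour client_min_messages.
 */
static bool elevel_may_reach_client(int elevel)
{
    return (elevel & LOG_NO_CLIENT) == 0;
}
```

Callers that pass a plain level keep working unchanged, which is presumably what makes overloading the elevel argument cheaper than adding a new parameter.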

The second patch changes the FATAL raised when cancelling an idle-in-transaction
query into ERROR | LOG_NO_CLIENT.
To avoid breaking the protocol state the client knows about, no "ready for
query" message may be sent either. The patch ensures that by setting and
checking a "silent_error_while_idle" variable.
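A simplified C model of that suppression logic, assuming a single-threaded backend loop. Apart from the flag name silent_error_while_idle, every struct, field and function here is hypothetical; it is a sketch of the idea, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified backend state. */
typedef struct BackendState
{
    bool in_transaction;
    bool transaction_aborted;
    bool silent_error_while_idle;
    int  ready_for_query_sent;   /* number of ReadyForQuery messages emitted */
} BackendState;

/* Cancellation while idle in a transaction: abort without telling the client. */
static void cancel_idle_transaction(BackendState *st)
{
    assert(st->in_transaction);
    st->transaction_aborted = true;      /* the ERROR | LOG_NO_CLIENT part */
    st->silent_error_while_idle = true;  /* remember not to announce readiness */
}

/*
 * Top of the main loop. The client sent no query, so an unsolicited
 * ReadyForQuery after a silent error would desynchronize the protocol.
 */
static void maybe_send_ready_for_query(BackendState *st)
{
    if (st->silent_error_while_idle)
        return;
    st->ready_for_query_sent++;
}

/* The next client message arrives; normal ReadyForQuery handling resumes. */
static void receive_client_message(BackendState *st)
{
    st->silent_error_while_idle = false;
}
```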

That way the client will not see that an error occurred until it sends the next
command, but I don't think there is a solution to that in the 9.0 timeframe, if at all.

The patch only adds that for the recovery conflict path for now.

What do you think? Is it worth applying something like that now? If so, I
would test the patch some more (obviously the patch survives the regression
tests, but they do not seem to exercise the extended query protocol at all).

One could argue that the LOG_NO_CLIENT flag should also be added when an idle
transaction gets terminated by force, but I wouldn't bother.

On a related note, I would also like to get rid of the restriction that a
normal query cancellation is only done if no subtransactions are stacked.
But I guess it's too late for that? (I have a patch ready; some cleanup would
be needed.)
The latter works by:
- adding an explicit error code (which should be done regardless of this
discussion)
- not catching such errors in a few places (plperl, plpython)
- recursively aborting the subtransactions once the main loop is reached
- relying on the fact that the cancellation signal will get resent
- possibly escalating to a FATAL if nothing happens after a certain number of
tries
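The steps above could be modelled roughly like this. This is a hedged C sketch, not the actual patch: the Backend struct, MAX_CANCEL_ATTEMPTS and the outcome enum are hypothetical, and decrementing a depth counter stands in for really aborting a subtransaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical number of resent signals tolerated before escalating. */
#define MAX_CANCEL_ATTEMPTS 3

typedef enum { OUTCOME_NONE, OUTCOME_ABORTED, OUTCOME_FATAL } CancelOutcome;

typedef struct Backend
{
    int  subxact_depth;    /* number of stacked subtransactions */
    bool top_aborted;      /* top-level transaction aborted */
    int  cancel_attempts;  /* times the cancellation signal was processed */
} Backend;

/* Recursively abort the innermost subtransaction, then its parents. */
static void abort_subtransactions(Backend *b)
{
    assert(b->subxact_depth >= 0);
    if (b->subxact_depth == 0)
    {
        b->top_aborted = true;   /* finally abort the top-level transaction */
        return;
    }
    b->subxact_depth--;          /* stands in for aborting one level */
    abort_subtransactions(b);
}

/*
 * Called each time the (resent) cancellation signal is processed.
 * reached_main_loop says whether the error propagated all the way up
 * instead of being swallowed by some catch block on the way.
 */
static CancelOutcome handle_cancel_signal(Backend *b, bool reached_main_loop)
{
    b->cancel_attempts++;
    if (reached_main_loop)
    {
        abort_subtransactions(b);
        return OUTCOME_ABORTED;
    }
    if (b->cancel_attempts >= MAX_CANCEL_ATTEMPTS)
        return OUTCOME_FATAL;    /* nothing happened: escalate */
    return OUTCOME_NONE;         /* rely on the signal being resent */
}
```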

Andres

PS: I know I sort-of-promised a patch earlier, but it didn't work out
time-wise.

Attachments (Content-Type, Size):
- 0002-Dont-FATAL-a-IDLE-IN-TRANSACTIOn-backend-during-conf.patch (text/x-patch, 3.9 KB)
- 0001-Support-transporting-flags-in-the-elevel-argument-of.patch (text/x-patch, 6.1 KB)

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: idle in txn query cancellation
Date: 2010-02-14 05:29:45
Message-ID: 1266125385.7341.8576.camel@ebony

On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> I know it is late in the cycle

No problem here. Thanks for your diligence. Will review.

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: idle in txn query cancellation
Date: 2010-02-15 08:47:09
Message-ID: 1266223629.7341.9527.camel@ebony

On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> On a related note, I would also like to get rid of the restriction that
> a normal query cancellation is only done if no subtransactions
> are stacked.
> But I guess it's too late for that? (I have a patch ready; some cleanup
> would be needed.)
> The latter works by:
> - adding an explicit error code (which should be done regardless of
> this discussion)
> - not catching such errors in a few places (plperl, plpython)
> - recursively aborting the subtransactions once the main loop is
> reached
> - relying on the fact that the cancellation signal will get resent
> - possibly escalating to a FATAL if nothing happens after a certain
> number of tries

Such an action needs to have a good, clear theoretical explanation with
it to show that the interaction with savepoints is a good one.

I toyed with the idea of a new level between ERROR and FATAL to allow
ERRORs to be handled by savepoints still in all cases.

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: idle in txn query cancellation
Date: 2010-02-15 08:50:08
Message-ID: 1266223808.7341.9529.camel@ebony

On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> The first patch adds the capability to pass a flag to ereport, like:
> ereport(ERROR | LOG_NO_CLIENT)
> Tom earlier suggested using COMMERROR, but that's just a version of LOG
> which doesn't report to the client. The patch makes it a synonym of
> LOG | LOG_NO_CLIENT.
> While it's not the prettiest API, I don't think it's that bad, because
> where a message goes is to some extent a property of the log level.
> Besides, the alternatives would generate a lot of useless noise and
> breakage.
>
> The second patch changes the FATAL raised when cancelling an
> idle-in-transaction query into ERROR | LOG_NO_CLIENT.
> To avoid breaking the protocol state the client knows about, no "ready
> for query" message may be sent either. The patch ensures that by
> setting and checking a "silent_error_while_idle" variable.
>
> That way the client will not see that an error occurred until it sends
> the next command, but I don't think there is a solution to that in the
> 9.0 timeframe, if at all.
>
> The patch only adds that for the recovery conflict path for now.
>
> What do you think? Is it worth applying something like that now? If
> so, I would test the patch some more (obviously the patch survives the
> regression tests, but they do not seem to exercise the extended query
> protocol at all).

I think that is much better than FATAL. If it works I think we should
apply it for this release.

--
Simon Riggs www.2ndQuadrant.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: idle in txn query cancellation
Date: 2010-02-15 10:38:06
Message-ID: 201002151138.08407.andres@anarazel.de

On Monday 15 February 2010 09:50:08 Simon Riggs wrote:
> On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > The first patch adds the capability to pass a flag to ereport, like:
> > ereport(ERROR | LOG_NO_CLIENT)
> > Tom earlier suggested using COMMERROR, but that's just a version of LOG
> > which doesn't report to the client. The patch makes it a synonym of
> > LOG | LOG_NO_CLIENT.
> > While it's not the prettiest API, I don't think it's that bad, because
> > where a message goes is to some extent a property of the log level.
> > Besides, the alternatives would generate a lot of useless noise and
> > breakage.
> >
> > The second patch changes the FATAL raised when cancelling an
> > idle-in-transaction query into ERROR | LOG_NO_CLIENT.
> > To avoid breaking the protocol state the client knows about, no "ready
> > for query" message may be sent either. The patch ensures that by
> > setting and checking a "silent_error_while_idle" variable.
> >
> > That way the client will not see that an error occurred until it sends
> > the next command, but I don't think there is a solution to that in the
> > 9.0 timeframe, if at all.
> >
> > The patch only adds that for the recovery conflict path for now.
> >
> > What do you think? Is it worth applying something like that now? If
> > so, I would test the patch some more (obviously the patch survives the
> > regression tests, but they do not seem to exercise the extended query
> > protocol at all).
>
> I think that is much better than FATAL. If it works I think we should
> apply it for this release.
It does work for me at least ;-). I have only done marginal testing with the
extended query protocol, though, and I think the error message needs some
improvement.

I plan to make testing the extended query protocol easier by making pgbench
able to restart after such an error (that's why I like the separate error
code for such cancellations...).

The problem with the error message is that errdetail_abort() uses
MyProc->recoveryConflictPending, which is already unset when the errdetail is
used. Unless you beat me to it, I plan to provide a patch here (I haven't
looked at how to do so yet, though).
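To make the ordering problem concrete, a minimal C sketch with a hypothetical FakeProc struct standing in for the real PGPROC. It shows one possible fix (an assumption, not necessarily what the eventual patch does): snapshot the flag before clearing it and pass that value to the detail routine. The detail string is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of PGPROC. */
typedef struct FakeProc
{
    bool recoveryConflictPending;
} FakeProc;

/* Chooses the detail text from a flag value passed in explicitly. */
static const char *abort_detail(bool conflict_pending)
{
    return conflict_pending ? "abort reason: recovery conflict" : NULL;
}

/* Problematic ordering: the flag is cleared before the detail is built. */
static const char *report_flag_already_cleared(FakeProc *proc)
{
    assert(proc != NULL);
    proc->recoveryConflictPending = false;              /* reset first... */
    return abort_detail(proc->recoveryConflictPending); /* ...so detail is lost */
}

/* One possible fix: remember the flag before resetting it. */
static const char *report_flag_snapshotted(FakeProc *proc)
{
    assert(proc != NULL);
    bool was_pending = proc->recoveryConflictPending;
    proc->recoveryConflictPending = false;
    return abort_detail(was_pending);
}
```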

Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: idle in txn query cancellation
Date: 2010-02-15 10:43:36
Message-ID: 201002151143.37382.andres@anarazel.de

On Monday 15 February 2010 09:47:09 Simon Riggs wrote:
> On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > On a related note, I would also like to get rid of the restriction that
> > a normal query cancellation is only done if no subtransactions
> > are stacked.
> > But I guess it's too late for that? (I have a patch ready; some cleanup
> > would be needed.)
> > The latter works by:
> > - adding an explicit error code (which should be done regardless of
> > this discussion)
> > - not catching such errors in a few places (plperl, plpython)
> > - recursively aborting the subtransactions once the main loop is
> > reached
> > - relying on the fact that the cancellation signal will get resent
> > - possibly escalating to a FATAL if nothing happens after a certain
> > number of tries
>
> Such an action needs to have a good, clear theoretical explanation with
> it to show that the interaction with savepoints is a good one.
I can provide a bit more explanation. The patch (in the other thread) already
added some more comments, but it's definitely good to explain/define things
further. I'll post that to the thread with the patch, OK?

> I toyed with the idea of a new level between ERROR and FATAL to allow
> ERRORs to be handled by savepoints still in all cases.
I have a hard time believing that it will help in that situation. Either you
allow process-local resources to be cleaned up in PG_TRY/PG_CATCH blocks, in
which case you can't abort recursively everywhere, because the catching code
block may very well reference resources associated with that snapshot; or you
abort the process in a way that leaves no process-local resources.

How would the middle ground between those two work?

Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: idle in txn query cancellation
Date: 2010-03-14 18:50:46
Message-ID: 201003141950.46916.andres@anarazel.de

On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
> On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > I know it is late in the cycle
>
> No problem here. Thanks for your diligence. Will review.
Got a chance to look at it?

Andres


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: idle in txn query cancellation
Date: 2010-03-14 19:12:00
Message-ID: 1268593920.3825.6362.camel@ebony

On Sun, 2010-03-14 at 19:50 +0100, Andres Freund wrote:
> On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
> > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > > I know it is late in the cycle
> >
> > No problem here. Thanks for your diligence. Will review.
> Got a chance to look at it?

I need to spend my time on ensuring we can avoid the cancellation
altogether, so I apologise for not reviewing. That's not a comment on
your work or the possible effectiveness of the patch. Possibly others
have the time to review?

--
Simon Riggs www.2ndQuadrant.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: idle in txn query cancellation
Date: 2010-04-19 16:47:42
Message-ID: 201004191847.43616.andres@anarazel.de

Hi Simon,

On Sunday 14 March 2010 20:12:00 Simon Riggs wrote:
> On Sun, 2010-03-14 at 19:50 +0100, Andres Freund wrote:
> > On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
> > > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > > > I know it is late in the cycle
> > >
> > > No problem here. Thanks for your diligence. Will review.
> >
> > Got a chance to look at it?
>
> I need to spend my time on ensuring we can avoid the cancellation
> altogether, so I apologise for not reviewing. That's not a comment on
> your work or the possible effectiveness of the patch. Possibly others
> have the time to review?
I guess that won't go anywhere before 9.1?

I think at least the error code should be adjusted before 9.0.

Andres