Re: Hot Standy introduced problem with query cancel behavior

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Kris Jurka <books(at)ejurka(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Hot Standy introduced problem with query cancel behavior
Date: 2010-01-07 20:47:47
Message-ID: 16887.1262897267@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> The reason I suggested adding CHECK_FOR_INTERRUPTS into the recv code path was
> that it should allow a relatively "natural" handling of canceling "IDLE IN
> TRANSACTION" queries without doing anything in the interrupt handler.

> I think it shouldn't be too hard to make that code path safe for
> CHECK_FOR_INTERRUPTS().

Idle in transaction isn't the problem (except for what it does to the
FE/BE protocol state). The problem is what happens inside a non-idle
transaction.

Since apparently I'm still not being clear enough about this, let me
spell it out:

1. Outer transaction calls, say, a plperl function.
2. plperl function executes some query via SPI, thereby starting
a subtransaction.
3. We receive an HS query-cancel interrupt. Since
!ImmediateInterruptOK, this just sets QueryCancelPending.
4. At the next occurrence of CHECK_FOR_INTERRUPTS, ProcessInterrupts
is entered.
5. According to both Simon's committed patch and his recent variant,
ProcessInterrupts executes AbortOutOfAnyTransaction and then throws
elog(ERROR).
6. plperl.c catches the elog longjmp and tries to abort its
subtransaction (loss #1), then returns to the Perl interpreter,
which is under no obligation to abort processing its Perl script
(loss #2). Whenever it does exit, or else calls SPI to try to
process another query, we're screwed because the outer transaction
is already dead (loss #3).
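
The six steps above can be replayed in a toy model. This is only a sketch in Python, not PostgreSQL code: ToyBackend, spi_execute, cancel_signal and the event strings are all hypothetical names, a Python exception stands in for the elog longjmp, and a plain list stands in for the transaction stack.

```python
class QueryCanceled(Exception):
    """Toy stand-in for the elog(ERROR) longjmp."""


class ToyBackend:
    """Hypothetical model of a backend; not the real PostgreSQL structures."""

    def __init__(self):
        self.xacts = []                    # open transactions, outermost first
        self.query_cancel_pending = False
        self.events = []

    def begin(self, name):
        self.xacts.append(name)

    def cancel_signal(self):
        # Step 3: the interrupt cannot be serviced immediately, so only set a flag.
        self.query_cancel_pending = True

    def check_for_interrupts(self):
        # Steps 4-5: abort every transaction level first, then raise the error.
        if self.query_cancel_pending:
            self.query_cancel_pending = False
            self.xacts.clear()
            self.events.append("aborted all transactions")
            raise QueryCanceled()

    def spi_execute(self, query, cancel_arrives=False):
        if not self.xacts:
            # Loss 3: the outer transaction is already dead.
            self.events.append("no transaction for " + query)
            return False
        self.begin("subxact")              # step 2
        if cancel_arrives:
            self.cancel_signal()           # step 3, while the query is running
        try:
            self.check_for_interrupts()
        except QueryCanceled:
            # Step 6: the PL's handler tries to roll back its own subtransaction.
            if "subxact" in self.xacts:
                self.xacts.remove("subxact")
            else:
                self.events.append("subxact already gone")   # loss 1
            return False
        self.xacts.pop()                   # release the subtransaction
        self.events.append("ran " + query)
        return True


def run_scenario():
    be = ToyBackend()
    be.begin("outer")                          # step 1
    be.spi_execute("q1")                       # an ordinary query succeeds
    be.spi_execute("q2", cancel_arrives=True)  # steps 2 through 6
    be.events.append("script keeps running")   # loss 2: nothing stops the script
    be.spi_execute("q3")                       # loss 3
    return be.events
```

Calling run_scenario() returns the event list in order, with the three losses showing up as "subxact already gone", "script keeps running" and "no transaction for q3".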

The situation with Perl or Python or some other PL is pretty much
the worst case, since we have no control whatever over that code
layer --- but in reality this type of scenario can play out even
without any third-party code involved. Anyplace that catches an
elog longjmp will be broken by AbortOutOfAnyTransaction inside
ProcessInterrupts, because things aren't supposed to happen in that
order.
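
For contrast, here is a sketch of the order that code catching an error appears to assume: the error is only raised, each catcher rolls back its own level, and the outer transaction is aborted last at the top level. Again this is a toy Python model with made-up names (ToyError, run_in_order), not the actual backend code, and it says nothing about what a PL interpreter does with the error afterwards.

```python
class ToyError(Exception):
    """Toy stand-in for an elog(ERROR) longjmp."""


def run_in_order():
    xacts = ["outer"]      # open transactions, outermost first
    events = []

    def check_for_interrupts():
        # Only raise; transaction state is left for the catchers to unwind.
        raise ToyError("canceling statement")

    def inner_work():
        xacts.append("subxact")
        try:
            check_for_interrupts()
        except ToyError:
            # The catcher still finds its own subtransaction on the stack.
            events.append("rolled back " + xacts.pop())
            raise              # pass the error on to the next level out

    try:
        inner_work()
    except ToyError:
        # Top level: the outer transaction is aborted last.
        events.append("aborted " + xacts.pop())
    return events, xacts
```

Here every level is cleaned up innermost first, so no catcher ever looks for a transaction that has already been torn down underneath it.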

regards, tom lane
