Re: Hot Standby conflict resolution handling

Lists: pgsql-hackers
From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Hot Standby conflict resolution handling
Date: 2012-12-04 07:00:43
Message-ID: CABOikdP48wU561h5FvWL_qvvAz=GAiAjmMsfk30YVXU_MqED0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I was trying some simple queries on a Hot Standby with streaming
replication.

On standby, I do this:
postgres=# begin transaction isolation level repeatable read;
BEGIN
postgres=# explain verbose select count(b) from test WHERE a > 100000;

On master, I insert a bunch of tuples in the table and run VACUUM ANALYZE.
postgres=# INSERT INTO test VALUES (generate_series(110001,120000), 'foo',
1);
INSERT 0 10000
postgres=# VACUUM ANALYZE test;
VACUUM

After max_standby_streaming_delay, the standby starts cancelling the
queries. I get an error like this on the standby:
postgres=# explain verbose select count(b) from test WHERE a > 100000;
FATAL: terminating connection due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

So I've couple questions/concerns here

1. Why to throw a FATAL error here ? A plain ERROR should be enough to
abort the transaction. There are four places in ProcessInterrupts() where
we throw these kind of errors and three of them are FATAL.

2911 if (DoingCommandRead)
2912 ereport(FATAL,
2913 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
2914 errmsg("terminating connection due to
conflict with recovery"),
2915 errdetail_recovery_conflict(),
2916 errhint("In a moment you should be able to reconnect
to the"
2917 " database and repeat your command.")));
2918 else
2919 ereport(ERROR,
2920 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
2921 errmsg("canceling statement due to conflict with
recovery"),
2922 errdetail_recovery_conflict()));

In this particular test case, the backend is DoingCommandRead and that
forces a FATAL error. I'm not sure why is that required. And even if its
necessary, IMHO we should add a comment explaining that. In the other two
places where we throw FATAL, one looks legitimate, but I'm not sure about
the other.

2836 else if (RecoveryConflictPending && RecoveryConflictRetryable)
2837 {
2838 pgstat_report_recovery_conflict(RecoveryConflictReason);
2839 ereport(FATAL,
2840 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
2841 errmsg("terminating connection due to conflict with
recovery"),
2842 errdetail_recovery_conflict()));
2843 }
2844 else if (RecoveryConflictPending)
2845 {
2846 /* Currently there is only one non-retryable recovery
conflict */
2847 Assert(RecoveryConflictReason ==
PROCSIG_RECOVERY_CONFLICT_DATABASE);
2848 pgstat_report_recovery_conflict(RecoveryConflictReason);
2849 ereport(FATAL,
2850 (errcode(ERRCODE_DATABASE_DROPPED),
2851 errmsg("terminating connection due to conflict with
recovery"),
2852 errdetail_recovery_conflict()));
2853 }

AFAICS the first of these should be ereport(ERROR). Otherwise irrespective
of whether RecoveryConflictRetryable is true or false, we will always
ereport(FATAL).

2. For my test, the error message itself looks wrong because I did not
actually remove any rows on the master. VACUUM probably marked a bunch of
pages as all-visible and that should have triggered a conflict on the
standby in order to support index-only scans. IMHO we should improve the
error message to avoid any confusion. Or we can add a new ProcSignalReason
to differentiate between a cancel due to clean up vs visibilitymap_set()
operation.

BTW, my current set up is using 9.2.1, but I don't see any code changes in
the master. So I would assume the issues will exist there too.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby conflict resolution handling
Date: 2012-12-04 08:14:46
Message-ID: 20121204081229.GA26353@alap2.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2012-12-04 12:30:43 +0530, Pavan Deolasee wrote:
> I was trying some simple queries on a Hot Standby with streaming
> replication.
>
> On standby, I do this:
> postgres=# begin transaction isolation level repeatable read;
> BEGIN
> postgres=# explain verbose select count(b) from test WHERE a > 100000;
>
> On master, I insert a bunch of tuples in the table and run VACUUM ANALYZE.
> postgres=# INSERT INTO test VALUES (generate_series(110001,120000), 'foo',
> 1);
> INSERT 0 10000
> postgres=# VACUUM ANALYZE test;
> VACUUM
>
> After max_standby_streaming_delay, the standby starts cancelling the
> queries. I get an error like this on the standby:
> postgres=# explain verbose select count(b) from test WHERE a > 100000;
> FATAL: terminating connection due to conflict with recovery
> DETAIL: User query might have needed to see row versions that must be
> removed.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> So I've couple questions/concerns here
>
> 1. Why to throw a FATAL error here ? A plain ERROR should be enough to
> abort the transaction. There are four places in ProcessInterrupts() where
> we throw these kind of errors and three of them are FATAL.

The problem here is that were in IDLE IN TRANSACTION in this case. Which
currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
anything).

There are two problems making this non-trivial. For one, while we're in
IDLE IN TXN the client doesn't expect a response on a protocol level, so
we can't simply ereport() at that time.
For another, when were in IDLE IN TXN we're potentially inside openssl
so we can't jump out of there anyway because that would quite likely
corrupt the internal state of openssl.

I tried to fix this before (c.f. "Idle in transaction cancellation" or
similar) but while I had some kind of fix for the first issue (i saved
the error and reported it later when the protocol state allows it) I
missed the jumping out of openssl bit. I think its not that hard to
solve though. I remember having something preliminary but I never had
the time to finish it. If I remember correctly the trick was to set
openssl into non-blocking mode temporarily and return to the caller
inside be-secure.c:my_sock_read.
At that location ProcessInterrupts can run safely, error out silently,
and reraise the error once were in the right protocol state.

> 2836 else if (RecoveryConflictPending && RecoveryConflictRetryable)
> 2837 {
> 2838 pgstat_report_recovery_conflict(RecoveryConflictReason);
> 2839 ereport(FATAL,
> 2840 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> 2841 errmsg("terminating connection due to conflict with
> recovery"),
> 2842 errdetail_recovery_conflict()));
> 2843 }
> 2844 else if (RecoveryConflictPending)
> 2845 {
> 2846 /* Currently there is only one non-retryable recovery
> conflict */
> 2847 Assert(RecoveryConflictReason ==
> PROCSIG_RECOVERY_CONFLICT_DATABASE);
> 2848 pgstat_report_recovery_conflict(RecoveryConflictReason);
> 2849 ereport(FATAL,
> 2850 (errcode(ERRCODE_DATABASE_DROPPED),
> 2851 errmsg("terminating connection due to conflict with
> recovery"),
> 2852 errdetail_recovery_conflict()));
> 2853 }
>
> AFAICS the first of these should be ereport(ERROR). Otherwise irrespective
> of whether RecoveryConflictRetryable is true or false, we will always
> ereport(FATAL).

Which is fine, because were below if (ProcDiePending). Note there's a
separate path for QueryCancelPending. We go on to kill connections once
the normal conflict handling has tried several times.

> 2. For my test, the error message itself looks wrong because I did not
> actually remove any rows on the master. VACUUM probably marked a bunch of
> pages as all-visible and that should have triggered a conflict on the
> standby in order to support index-only scans. IMHO we should improve the
> error message to avoid any confusion. Or we can add a new ProcSignalReason
> to differentiate between a cancel due to clean up vs visibilitymap_set()
> operation.

I think we desparately need to improve *all* of these message with
significantly more detail (cause for cancellation, relation, current
xid, conflicting xid, current/last query).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby conflict resolution handling
Date: 2012-12-04 12:31:52
Message-ID: CABOikdOCtU9bw-FGCr4+gbJdGosMddAK3EE1yfBamxBTNLRDEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 4, 2012 at 1:44 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

>
> >
> > After max_standby_streaming_delay, the standby starts cancelling the
> > queries. I get an error like this on the standby:
> > postgres=# explain verbose select count(b) from test WHERE a > 100000;
> > FATAL: terminating connection due to conflict with recovery
> > DETAIL: User query might have needed to see row versions that must be
> > removed.
> > HINT: In a moment you should be able to reconnect to the database and
> > repeat your command.
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Succeeded.
> >
> > So I've couple questions/concerns here
> >
> > 1. Why to throw a FATAL error here ? A plain ERROR should be enough to
> > abort the transaction. There are four places in ProcessInterrupts() where
> > we throw these kind of errors and three of them are FATAL.
>
> The problem here is that were in IDLE IN TRANSACTION in this case. Which
> currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
> anything).
>
> There are two problems making this non-trivial. For one, while we're in
> IDLE IN TXN the client doesn't expect a response on a protocol level, so
> we can't simply ereport() at that time.
> For another, when were in IDLE IN TXN we're potentially inside openssl
> so we can't jump out of there anyway because that would quite likely
> corrupt the internal state of openssl.
>
> I tried to fix this before (c.f. "Idle in transaction cancellation" or
> similar) but while I had some kind of fix for the first issue (i saved
> the error and reported it later when the protocol state allows it) I
> missed the jumping out of openssl bit. I think its not that hard to
> solve though. I remember having something preliminary but I never had
> the time to finish it. If I remember correctly the trick was to set
> openssl into non-blocking mode temporarily and return to the caller
> inside be-secure.c:my_sock_read.
>

Thanks Andres. I also read the original thread and I now understand why we
are using FATAL here, at least until we have a better solution. Obviously
the connection reset is no good either because as someone commented in the
original discussion, I thought that I'm seeing a server crash while it was
not.

>
> >
> > AFAICS the first of these should be ereport(ERROR). Otherwise
> irrespective
> > of whether RecoveryConflictRetryable is true or false, we will always
> > ereport(FATAL).
>
> Which is fine, because were below if (ProcDiePending). Note there's a
> separate path for QueryCancelPending. We go on to kill connections once
> the normal conflict handling has tried several times.
>
>
Ok. Understood.I now see that every path below if (ProcDiePending) will
call FATAL, albeit with different error codes. That explains the current
code.

>
> I think we desparately need to improve *all* of these message with
> significantly more detail (cause for cancellation, relation, current
> xid, conflicting xid, current/last query).
>
>
I agree.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby conflict resolution handling
Date: 2012-12-04 13:24:53
Message-ID: CABOikdM6=ZYsJCQDAt8hN63n3LYFCASdYsU-qW16k03asxu5Ag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>wrote:

>
>
> Thanks Andres. I also read the original thread and I now understand why we
> are using FATAL here, at least until we have a better solution. Obviously
> the connection reset is no good either because as someone commented in the
> original discussion, I thought that I'm seeing a server crash while it was
> not.
>
>
How about attached comment to be added at the appropriate place ? I've
extracted this mostly from Tom's explanation in the original thread.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Attachment Content-Type Size
clarify-fatal-error-in-conflict-resolve.patch application/octet-stream 1.0 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby conflict resolution handling
Date: 2012-12-29 19:23:45
Message-ID: 20121229192345.GY16126@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan,

* Pavan Deolasee (pavan(dot)deolasee(at)gmail(dot)com) wrote:
> On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>wrote:
> > Thanks Andres. I also read the original thread and I now understand why we
> > are using FATAL here, at least until we have a better solution. Obviously
> > the connection reset is no good either because as someone commented in the
> > original discussion, I thought that I'm seeing a server crash while it was
> > not.
>
> How about attached comment to be added at the appropriate place ? I've
> extracted this mostly from Tom's explanation in the original thread.

I was hoping to see an update to the actual error messages returned in
this patch.. I agree that it's good to add the comments but that
doesn't do anything to help the user out in this case.

Regarding the actual comment, here's the wording that I'd use:

-----------------------
If we are in DoingCommandRead state, we can't use ereport(ERROR)
because we can't long jumps in this state. If we attempt to
longjmps in this state, we not only risk breaking protocol at
our level, but also risk leaving openssl in an inconsistent
state, either violating the ssl protocol or having its internal
state trashed because it was interrupted while in the middle of
changing that state.

Currently, the only option is to promote ERROR to FATAL until
we figure out a way to handle errors more effectively while in
this state.
-----------------------

If you agree with that wording update, can you update the patch?

Thanks,

Stephen


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 03:56:49
Message-ID: 20130117035649.GA3253@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2012-12-29 14:23:45 -0500, sfrost(at)snowman(dot)net wrote:
>
> Regarding the actual comment, here's the wording that I'd use:

Sorry for nitpicking, but "we can't long jumps" made me cringe.
Here's a slightly more condensed version:

/*
* We can't use ereport(ERROR) here, because any longjmps
* in DoingCommandRead state run the risk of violating our
* protocol or the SSL protocol, by interrupting OpenSSL in
* the middle of changing its internal state.
*
* Currently, the only option is to promote ERROR to FATAL
* until we figure out a better way to handle errors in this
* state.
*/

Patch along these lines attached, which also removes trailing
whitespace from the original patch.

-- Abhijit

Attachment Content-Type Size
clarify-fatal-error-in-conflict-resolution.diff text/x-diff 841 bytes

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 05:47:40
Message-ID: CABOikdObnOUhkKnxjZNEeNqsJgSVF4oXxDSMPGeyjJG4nMF8NQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 9:26 AM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>wrote:

> At 2012-12-29 14:23:45 -0500, sfrost(at)snowman(dot)net wrote:
> >
> > Regarding the actual comment, here's the wording that I'd use:
>
> Sorry for nitpicking, but "we can't long jumps" made me cringe.
> Here's a slightly more condensed version:
>
> /*
> * We can't use ereport(ERROR) here, because any longjmps
> * in DoingCommandRead state run the risk of violating our
> * protocol or the SSL protocol, by interrupting OpenSSL in
> * the middle of changing its internal state.
> *
> * Currently, the only option is to promote ERROR to FATAL
> * until we figure out a better way to handle errors in this
> * state.
> */
>
> Patch along these lines attached, which also removes trailing
> whitespace from the original patch.
>
>
Thanks Stephen and Abhijit for improving the comments. I like this wording.
So +1 from my side. Abhijit, do you want to add the patch and change the CF
status appropriately ?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Andres Freund <andres(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 06:38:31
Message-ID: 29354.1358404711@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
> Sorry for nitpicking, but "we can't long jumps" made me cringe.

Agreed :-(

> Here's a slightly more condensed version:

I find this still not quite right, because where it's placed, it applies
to both the DoingCommandRead and !DoingCommandRead branches of the
if/else statement. The wording would be okay if placed inside the first
if branch, but I think visually that would look ugly. I'm inclined to
suggest wording it like

* If we're in DoingCommandRead state, we can't use ereport(ERROR),
* because any longjmp would risk interrupting OpenSSL operations
* and thus confusing that library and/or violating wire protocol.

plus your second para as-is.

But having said that ... are we sure this code is not actually broken?
ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
cannot safely attempt to send an error message to the client either;
but ereport(FATAL) will try exactly that.

regards, tom lane


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 07:49:47
Message-ID: CABOikdOcMSdC=L83ScqwkdctMcqRYwqnkJetFHA5=Vq-Up_KqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> But having said that ... are we sure this code is not actually broken?
>

I'm not.

> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
> cannot safely attempt to send an error message to the client either;
> but ereport(FATAL) will try exactly that.
>

I thought since FATAL will force the backend to exit, we don't care much
about corrupted OpenSSL state. I even thought that's why we raise ERROR to
FATAL so that the backend can start in a clean state. But clearly I'm
missing a point here because you don't think that way.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 10:47:44
Message-ID: 20130117104744.GA4314@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-01-17 01:38:31 -0500, Tom Lane wrote:
> But having said that ... are we sure this code is not actually broken?
> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
> cannot safely attempt to send an error message to the client either;
> but ereport(FATAL) will try exactly that.

You're absolutely right.

ISTM, to fix it we would have to either provide a COMERROR like facility
for FATAL errors or just remove the requirement of raising an error
exactly there.
If I remember what I tried before correctly the latter seems to involve
setting openssl into nonblocking mode and check for the saved error in
the EINTR loop in be-secure and re-raise the error from there.

Do we want to backport either - there hasn't been any report that I
could link to it right now, but on the other hand it could possibly
cause rather ugly problems (data leakage, segfaults, code execution
aren't all that improbable)?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 15:19:23
Message-ID: 8463.1358435963@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
>> cannot safely attempt to send an error message to the client either;
>> but ereport(FATAL) will try exactly that.

> I thought since FATAL will force the backend to exit, we don't care much
> about corrupted OpenSSL state. I even thought that's why we raise ERROR to
> FATAL so that the backend can start in a clean state. But clearly I'm
> missing a point here because you don't think that way.

If we were to simply exit(1), leaving the kernel to close the client
socket, it'd be safe enough because control would never have returned to
OpenSSL. But this code doesn't do that. What we're looking at is that
we've interrupted OpenSSL at some arbitrary point, and now we're going
to make fresh calls to it to try to pump the FATAL error message out to
the client. It seems fairly unlikely that that's safe. I'm not sure
I credit Andres' worry of arbitrary code execution, but I do fear that
OpenSSL could get confused to the point of freezing up, or even more
likely that it would transmit garbage to the client, which rather
defeats the purpose.

Don't see a nice fix. The COMMERROR approach (ie, don't try to send
anything to the client, only the log) is not nice at all since the
client would get the impression that the server crashed. On the other
hand, anything else requires waiting till we get control back from
OpenSSL, which might be a long time, and meanwhile we're still holding
locks that prevent WAL recovery from proceeding.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 16:03:19
Message-ID: 20130117160319.GA22844@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-01-17 10:19:23 -0500, Tom Lane wrote:
> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
> >> cannot safely attempt to send an error message to the client either;
> >> but ereport(FATAL) will try exactly that.
>
> > I thought since FATAL will force the backend to exit, we don't care much
> > about corrupted OpenSSL state. I even thought that's why we raise ERROR to
> > FATAL so that the backend can start in a clean state. But clearly I'm
> > missing a point here because you don't think that way.
>
> If we were to simply exit(1), leaving the kernel to close the client
> socket, it'd be safe enough because control would never have returned to
> OpenSSL. But this code doesn't do that. What we're looking at is that
> we've interrupted OpenSSL at some arbitrary point, and now we're going
> to make fresh calls to it to try to pump the FATAL error message out to
> the client. It seems fairly unlikely that that's safe. I'm not sure
> I credit Andres' worry of arbitrary code execution, but I do fear that
> OpenSSL could get confused to the point of freezing up, or even more
> likely that it would transmit garbage to the client, which rather
> defeats the purpose.

I don't think its likely either, I seem to remember it copying arround
function pointers though, so it seems possible with some bad luck.

> Don't see a nice fix. The COMMERROR approach (ie, don't try to send
> anything to the client, only the log) is not nice at all since the
> client would get the impression that the server crashed. On the other
> hand, anything else requires waiting till we get control back from
> OpenSSL, which might be a long time, and meanwhile we're still holding
> locks that prevent WAL recovery from proceeding.

I think we can make openssl return pretty much immediately if we assume
recv() can reliably interrupted by signals, possibly by setting the
socket to nonblocking in the signal handler.
We just need to tell openssl not to retry immediately and we should be
fine. Given that quite some people use openssl with nonblocking sockets,
that code path should be reasonably safe.

That still requires ugliness around saving the error and reraising it
after returning from openssl though...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services