Re: loss of transactions in streaming replication

Lists:	pgsql-hackers

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	loss of transactions in streaming replication
Date:	2011-10-12 09:45:42
Message-ID:	CAHGQGwEEm1TpNZvuMh08ZB2kVCvyAeFkdUwdXPJYYC8jaf_N1A@mail.gmail.com

Hi,

In 9.2dev and 9.1, when walreceiver detects an error while sending data to
WAL stream, it always emits ERROR even if there are data available in the
receive buffer. This might lead to loss of transactions because such
remaining data are not received by walreceiver :(

To prevent transaction loss, I'm thinking of changing walreceiver so that it
always ignores an error (specifically, emits COMMERROR instead of ERROR)
while sending data. Then walreceiver receives data if any is available. If an
error occurs while receiving data, walreceiver can emit ERROR at that point.
Comments? Better ideas?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-12 13:29:06
Message-ID:	CA+Tgmob+ma3LZdwtATF_J=BzocHx4FY4o3EyXMr-M-S6gvh6Uw@mail.gmail.com

On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> In 9.2dev and 9.1, when walreceiver detects an error while sending data to
> WAL stream, it always emits ERROR even if there are data available in the
> receive buffer. This might lead to loss of transactions because such
> remaining data are not received by walreceiver :(

Won't it just reconnect?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-13 01:08:37
Message-ID:	CAHGQGwEV5qAU+n+QC894oTCmgOdeX3bT2tKY_JEU4bdkp-P9GQ@mail.gmail.com

On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> In 9.2dev and 9.1, when walreceiver detects an error while sending data to
>> WAL stream, it always emits ERROR even if there are data available in the
>> receive buffer. This might lead to loss of transactions because such
>> remaining data are not received by walreceiver :(
>
> Won't it just reconnect?

Yes if the master is running normally. OTOH, if the master is not running (i.e.,
failover case), the standby cannot receive again the data which it failed to
receive.

I found this issue when I shut down the master. When the master shuts down,
it sends the shutdown checkpoint record, but I found that the standby failed
to receive it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-14 11:51:21
Message-ID:	CAHGQGwEQ9qq8Rx83RAfVCtpiAM5Uguf_KHmcMzCwzLtzvxm3Uw@mail.gmail.com

On Thu, Oct 13, 2011 at 10:08 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> In 9.2dev and 9.1, when walreceiver detects an error while sending data to
>>> WAL stream, it always emits ERROR even if there are data available in the
>>> receive buffer. This might lead to loss of transactions because such
>>> remaining data are not received by walreceiver :(
>>
>> Won't it just reconnect?
>
> Yes if the master is running normally. OTOH, if the master is not running (i.e.,
> failover case), the standby cannot receive again the data which it failed to
> receive.
>
> I found this issue when I shut down the master. When the master shuts down,
> it sends the shutdown checkpoint record, but I found that the standby failed
> to receive it.

Patch attached.

The patch changes walreceiver so that it doesn't emit ERROR just yet even
if it fails to send data to WAL stream. Then, after all available data have been
received and flushed to the disk, it emits ERROR.
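
The deferred-error idea described above can be illustrated outside PostgreSQL. The following is a minimal Python sketch, not the attached patch: a local `socket.socketpair()` stands in for the replication connection, and the function name `drain_after_send_failure` is hypothetical. It assumes a Unix-like system, where sending to a closed peer fails while bytes delivered earlier remain readable.

```python
import socket


def drain_after_send_failure(sock, reply):
    """Send a reply; if that fails, read what is still buffered first.

    Returns (received_bytes, deferred_error). Both are empty/None when the
    send succeeds, because nothing needs to be salvaged in that case.
    """
    try:
        sock.sendall(reply)
        return b"", None
    except OSError as exc:
        # Do not raise yet: data the peer wrote before closing may still
        # be waiting in the local receive buffer.
        deferred_error = exc

    chunks = []
    while True:
        try:
            chunk = sock.recv(4096)
        except OSError:
            break  # receiving failed too; report the original send error
        if not chunk:
            break  # empty read: peer closed and the buffer is drained
        chunks.append(chunk)
    return b"".join(chunks), deferred_error


# Demo: the stand-in "master" writes its last record and closes at once.
primary, standby = socket.socketpair()
primary.sendall(b"shutdown checkpoint record")
primary.close()

received, error = drain_after_send_failure(standby, b"status reply")
standby.close()
print(received)
print(type(error).__name__)
```

In this sketch the caller decides what to do with `deferred_error` once the salvaged bytes have been handled, which mirrors the ordering described above: receive and flush first, report the failure afterwards.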

If the patch is OK, it should be backported to v9.1.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
walrcv_avoid_data_loss_v1.patch	text/x-diff	5.7 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-19 02:28:12
Message-ID:	CA+TgmobdUdG-2D_=kpLwzpyoed9PH8+pHsubRjBUgCz_OaGwqQ@mail.gmail.com

On Fri, Oct 14, 2011 at 7:51 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Oct 13, 2011 at 10:08 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> In 9.2dev and 9.1, when walreceiver detects an error while sending data to
>>>> WAL stream, it always emits ERROR even if there are data available in the
>>>> receive buffer. This might lead to loss of transactions because such
>>>> remaining data are not received by walreceiver :(
>>>
>>> Won't it just reconnect?
>>
>> Yes if the master is running normally. OTOH, if the master is not running (i.e.,
>> failover case), the standby cannot receive again the data which it failed to
>> receive.
>>
>> I found this issue when I shut down the master. When the master shuts down,
>> it sends the shutdown checkpoint record, but I found that the standby failed
>> to receive it.
>
> Patch attached.
>
> The patch changes walreceiver so that it doesn't emit ERROR just yet even
> if it fails to send data to WAL stream. Then, after all available data have been
> received and flushed to the disk, it emits ERROR.
>
> If the patch is OK, it should be backported to v9.1.

Convince me. :-)

My reading of the situation is that you're talking about a problem
that will only occur if, while the master is in the process of
shutting down, a network error occurs. I am not sure it's a good idea
to convolute the code to handle that case, because (1) there are going
to be many similar situations where nothing within our power is
sufficient to prevent WAL from failing to make it to the standby and
(2) for this marginal improvement, you're giving up including
PQerrorMessage(streamConn) in the error message that ultimately gets
emitted, which seems like a substantial regression as far as
debuggability is concerned. Even if we do decide that we want the
change in behavior, I see no compelling reason to back-patch it.
Stable releases are supposed to be stable, not change behavior because
we thought of something we like better than what we originally
released.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-19 06:31:10
Message-ID:	CAHGQGwFqEvHEZjgbefNWrxs9WCVKP9OE8x8L+==PKKT-Xab7MA@mail.gmail.com

On Wed, Oct 19, 2011 at 11:28 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Convince me. :-)

Yeah, I'll try.

> My reading of the situation is that you're talking about a problem
> that will only occur if, while the master is in the process of
> shutting down, a network error occurs.

No. This happens even if a network error doesn't occur. I can
reproduce the issue by doing the following:

1. Set up a streaming replication master and standby with archiving
enabled.
2. Run pgbench -i
3. Shut down the master in fast mode.

Then I can see that the latest WAL file in the master's pg_xlog
doesn't exist in the standby's. The WAL record that was
lost was the shutdown checkpoint record.

When smart or fast shutdown is requested, the master tries to
write and send the WAL switch (if archiving is enabled) and
shutdown checkpoint record. Because of the problem I described,
the WAL switch record arrives at the standby but the shutdown
checkpoint does not.
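
The failure mode described above can be shown with a toy model. This is only an illustrative Python sketch under stated assumptions: `ToyConnection`, `ClosedPeer`, and `replay` are hypothetical names, the two records are assumed to be already sitting in the standby's receive buffer, and every reply is assumed to fail because the master has closed the connection. It is not walreceiver code.

```python
class ClosedPeer(Exception):
    """Raised by the toy connection when the peer has gone away."""


class ToyConnection:
    def __init__(self, buffered_records):
        # Records already received from the peer but not yet read.
        self.buffered = list(buffered_records)

    def read_record(self):
        return self.buffered.pop(0) if self.buffered else None

    def send_reply(self):
        # The peer has shut down, so every reply attempt fails.
        raise ClosedPeer("server closed the connection")


def replay(conn, defer_send_errors):
    """Read records and reply after each; return (applied, error)."""
    applied, pending = [], None
    while True:
        record = conn.read_record()
        if record is None:
            break
        applied.append(record)
        try:
            conn.send_reply()
        except ClosedPeer as exc:
            if not defer_send_errors:
                return applied, exc  # eager: stop with data still unread
            if pending is None:
                pending = exc  # deferred: remember it and keep reading
    return applied, pending


records = ["WAL switch", "shutdown checkpoint"]
eager, _ = replay(ToyConnection(records), defer_send_errors=False)
deferred, err = replay(ToyConnection(records), defer_send_errors=True)
print(eager)     # ['WAL switch']
print(deferred)  # ['WAL switch', 'shutdown checkpoint']
```

With eager error handling the loop stops after the first failed reply, so only the WAL switch record is applied. Deferring the error lets the loop reach the shutdown checkpoint before the failure is reported.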

> I am not sure it's a good idea
> to convolute the code to handle that case, because (1) there are going
> to be many similar situations where nothing within our power is
> sufficient to prevent WAL from failing to make it to the standby and

Shutting down the master is not a rare case. So I think it's worth
doing something.

> (2) for this marginal improvement, you're giving up including
> PQerrorMessage(streamConn) in the error message that ultimately gets
> emitted, which seems like a substantial regression as far as
> debuggability is concerned.

I think that it's possible to include PQerrorMessage() in the error
message. Will change the patch.

> Even if we do decide that we want the
> change in behavior, I see no compelling reason to back-patch it.
> Stable releases are supposed to be stable, not change behavior because
> we thought of something we like better than what we originally
> released.

The original behavior, in 9.0, is that all outstanding WAL are
replicated to the standby when the master shuts down normally.
But ISTM the behavior was changed unexpectedly in 9.1. So
I think that it should be back-patched to 9.1 to revert the behavior
to the original.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-19 12:01:14
Message-ID:	CAHGQGwHncA_qFyugEWqRBS7KA8Cm-+n65nRMqrdBcZ4U4fkp+A@mail.gmail.com

On Wed, Oct 19, 2011 at 3:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> (2) for this marginal improvement, you're giving up including
>> PQerrorMessage(streamConn) in the error message that ultimately gets
>> emitted, which seems like a substantial regression as far as
>> debuggability is concerned.
>
> I think that it's possible to include PQerrorMessage() in the error
> message. Will change the patch.

Attached is the updated version of the patch. When walreceiver fails to
send data to WAL stream, it emits WARNING with the message including
PQerrorMessage(), and also it emits the following DETAIL message:

Walreceiver process will be terminated after all available data
have been received from WAL stream.
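
As a rough illustration of reporting the failure immediately while still keeping the connection's own error text, here is a small Python sketch. It is not the patch: the function name `warn_deferred_send_failure`, the logger name, and the sample error string are hypothetical, and the message prefix "could not send data to WAL stream" is an assumption about the wording. Only the DETAIL text is taken from the message above.

```python
import logging

DETAIL = (
    "Walreceiver process will be terminated after all available data "
    "have been received from WAL stream."
)


def warn_deferred_send_failure(logger, connection_error):
    """Log the send failure now, keeping the connection's error text."""
    message = "could not send data to WAL stream: %s" % connection_error
    logger.warning("%s\nDETAIL:  %s", message, DETAIL)
    return message


logging.basicConfig(format="%(levelname)s:  %(message)s")
log = logging.getLogger("walreceiver_sketch")
text = warn_deferred_send_failure(
    log, "server closed the connection unexpectedly"
)
```

The point of the sketch is only that deferring termination does not require dropping the underlying error text from the log.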

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
walrcv_avoid_data_loss_v2.patch	text/x-diff	5.7 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-19 12:44:37
Message-ID:	CA+TgmoZ3_-assSrO9jLfnDZfepq3755kUbNe=_b-+y8THLL3oQ@mail.gmail.com

On Wed, Oct 19, 2011 at 2:31 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> My reading of the situation is that you're talking about a problem
>> that will only occur if, while the master is in the process of
>> shutting down, a network error occurs.
>
> No. This happens even if a network error doesn't occur. I can
> reproduce the issue by doing the following:
>
> 1. Set up a streaming replication master and standby with archiving
> enabled.
> 2. Run pgbench -i
> 3. Shut down the master in fast mode.
>
> Then I can see that the latest WAL file in the master's pg_xlog
> doesn't exist in the standby's. The WAL record that was
> lost was the shutdown checkpoint record.
>
> When smart or fast shutdown is requested, the master tries to
> write and send the WAL switch (if archiving is enabled) and
> shutdown checkpoint record. Because of the problem I described,
> the WAL switch record arrives at the standby but the shutdown
> checkpoint does not.

Oh, that's not good.

> The original behavior, in 9.0, is that all outstanding WAL are
> replicated to the standby when the master shuts down normally.
> But ISTM the behavior was changed unexpectedly in 9.1. So
> I think that it should be back-patched to 9.1 to revert the behavior
> to the original.

Which commit broke this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-19 14:41:32
Message-ID:	CAHGQGwF70UQ2p3Sx-_ARt-aR5y1JmHycnRWBeoGw5xoURaOJhw@mail.gmail.com

On Wed, Oct 19, 2011 at 9:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> The original behavior, in 9.0, is that all outstanding WAL are
>> replicated to the standby when the master shuts down normally.
>> But ISTM the behavior was changed unexpectedly in 9.1. So
>> I think that it should be back-patched to 9.1 to revert the behavior
>> to the original.
>
> Which commit broke this?

d3d414696f39e2b57072fab3dd4fa11e465be4ed
b186523fd97ce02ffbb7e21d5385a047deeef4f6

The former introduced problematic libpqrcv_send() (which was my mistake...),
and the latter is the first user of it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-19 16:05:52
Message-ID:	CA+TgmobT7HTjTstaB7tHXBHaA+nBNTYHAqQOp=dYpKKUyYwTig@mail.gmail.com

On Wed, Oct 19, 2011 at 10:41 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Oct 19, 2011 at 9:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> The original behavior, in 9.0, is that all outstanding WAL are
>>> replicated to the standby when the master shuts down normally.
>>> But ISTM the behavior was changed unexpectedly in 9.1. So
>>> I think that it should be back-patched to 9.1 to revert the behavior
>>> to the original.
>>
>> Which commit broke this?
>
> d3d414696f39e2b57072fab3dd4fa11e465be4ed
> b186523fd97ce02ffbb7e21d5385a047deeef4f6
>
> The former introduced problematic libpqrcv_send() (which was my mistake...),
> and the latter is the first user of it.

OK, so this is an artifact of the changes to make libpq communication
bidirectional. But I'm still confused about where the error is coming
from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver
detects an error while sending data to WAL stream, it always emits
ERROR even if there are data available in the receive buffer." So
that implied to me that this is only going to trigger if you have a
shutdown together with an awkwardly-timed error. But your scenario
for reproducing this problem doesn't seem to involve an error.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-21 01:51:01
Message-ID:	CAHGQGwGqLHMF_k1hbp7s5+C4yhbyuxKNuYXVdmtoRnjiQJBZdg@mail.gmail.com

On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> OK, so this is an artifact of the changes to make libpq communication
> bidirectional. But I'm still confused about where the error is coming
> from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver
> detects an error while sending data to WAL stream, it always emits
> ERROR even if there are data available in the receive buffer." So
> that implied to me that this is only going to trigger if you have a
> shutdown together with an awkwardly-timed error. But your scenario
> for reproducing this problem doesn't seem to involve an error.

Yes, my scenario doesn't cause any real error. My original description was
misleading. The following would be closer to the truth:

"In 9.2dev and 9.1, when walreceiver detects the termination of replication
connection while sending data to WAL stream, it always emits ERROR
even if there are data available in the receive buffer."

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-21 03:01:52
Message-ID:	CA+Tgmobp=c7vwW3FdbsnTZeb9GVe88KXbYjy=7ED-CzMKKSoyw@mail.gmail.com

On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> OK, so this is an artifact of the changes to make libpq communication
>> bidirectional. But I'm still confused about where the error is coming
>> from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver
>> detects an error while sending data to WAL stream, it always emits
>> ERROR even if there are data available in the receive buffer." So
>> that implied to me that this is only going to trigger if you have a
>> shutdown together with an awkwardly-timed error. But your scenario
>> for reproducing this problem doesn't seem to involve an error.
>
> Yes, my scenario doesn't cause any real error. My original description was
> misleading. The following would be closer to the truth:
>
> "In 9.2dev and 9.1, when walreceiver detects the termination of replication
> connection while sending data to WAL stream, it always emits ERROR
> even if there are data available in the receive buffer."

Ah, OK. I think I now agree that this is a bug and that we should fix
and back-patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-10-24 12:40:49
Message-ID:	CAHGQGwFFm9KaErzX0v+czAaRorAUvB46LBRnEfLOqO4f7g_7BA@mail.gmail.com

On Fri, Oct 21, 2011 at 12:01 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> OK, so this is an artifact of the changes to make libpq communication
>>> bidirectional. But I'm still confused about where the error is coming
>>> from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver
>>> detects an error while sending data to WAL stream, it always emits
>>> ERROR even if there are data available in the receive buffer." So
>>> that implied to me that this is only going to trigger if you have a
>>> shutdown together with an awkwardly-timed error. But your scenario
>>> for reproducing this problem doesn't seem to involve an error.
>>
>> Yes, my scenario doesn't cause any real error. My original description was
>> misleading. The following would be closer to the truth:
>>
>> "In 9.2dev and 9.1, when walreceiver detects the termination of replication
>> connection while sending data to WAL stream, it always emits ERROR
>> even if there are data available in the receive buffer."
>
> Ah, OK. I think I now agree that this is a bug and that we should fix
> and back-patch.

Is the patch that I posted before well-formed enough to be adopted?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: loss of transactions in streaming replication
Date:	2011-11-30 20:37:24
Message-ID:	CA+TgmobcopKvRbkdKh_qXEwMbQYa0Sg3xEu5cd1XcaOLzN_mqA@mail.gmail.com

On Mon, Oct 24, 2011 at 8:40 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Oct 21, 2011 at 12:01 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> OK, so this is an artifact of the changes to make libpq communication
>>>> bidirectional. But I'm still confused about where the error is coming
>>>> from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver
>>>> detects an error while sending data to WAL stream, it always emits
>>>> ERROR even if there are data available in the receive buffer." So
>>>> that implied to me that this is only going to trigger if you have a
>>>> shutdown together with an awkwardly-timed error. But your scenario
>>>> for reproducing this problem doesn't seem to involve an error.
>>>
>>> Yes, my scenario doesn't cause any real error. My original description was
>>> misleading. The following would be closer to the truth:
>>>
>>> "In 9.2dev and 9.1, when walreceiver detects the termination of replication
>>> connection while sending data to WAL stream, it always emits ERROR
>>> even if there are data available in the receive buffer."
>>
>> Ah, OK. I think I now agree that this is a bug and that we should fix
>> and back-patch.
>
> Is the patch that I posted before well-formed enough to be adopted?

Does this still need to be worked on?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company