Re: pg_basebackup caused FailedAssertion

Lists: pgsql-hackers
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_basebackup caused FailedAssertion
Date: 2013-02-26 17:28:39
Message-ID: CAHGQGwHzKmKUF+Ux-rfPrnBXOxhX2fj5kpcNcKHjoyBJDbQOjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

In HEAD, when I ran "pg_basebackup -D hoge -X stream",
I got the following FailedAssertion error:

TRAP: FailedAssertion("!((wakeEvents & ((1 << 1) | (1 << 2))) != (1 <<
2))", File: "pg_latch.c", Line: 234)

This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.

This assertion error happens when WL_SOCKET_WRITEABLE without
WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
condition is met when walsender has received CopyDone from the client,
but the output buffer is not empty. If reaching such condition is legitimate,
I think that we should get rid of the Assertion check which caused the above
FailedAssertion error. Thought?

Regards,

--
Fujii Masao


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup caused FailedAssertion
Date: 2013-02-26 17:42:57
Message-ID: 4982.1361900577@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
> I got the following FailedAssertion error:

> TRAP: FailedAssertion("!((wakeEvents & ((1 << 1) | (1 << 2))) != (1 <<
> 2))", File: "pg_latch.c", Line: 234)

> This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.

> This assertion error happens when WL_SOCKET_WRITEABLE without
> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
> condition is met when walsender has received CopyDone from the client,
> but the output buffer is not empty. If reaching such condition is legitimate,
> I think that we should get rid of the Assertion check which caused the above
> FailedAssertion error. Thought?

The reason for the assertion is that that case doesn't actually work.
The code that is passing that combination of flags needs to be changed.
Or else you can try to implement the ability to support READABLE only.
But just removing the Assert is 100% wrong.

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup caused FailedAssertion
Date: 2013-02-27 17:29:31
Message-ID: 512E427B.9090308@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.02.2013 19:42, Tom Lane wrote:
> Fujii Masao<masao(dot)fujii(at)gmail(dot)com> writes:
>> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
>> I got the following FailedAssertion error:
>
>> TRAP: FailedAssertion("!((wakeEvents& ((1<< 1) | (1<< 2))) != (1<<
>> 2))", File: "pg_latch.c", Line: 234)
>
>> This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.
>
>> This assertion error happens when WL_SOCKET_WRITEABLE without
>> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
>> condition is met when walsender has received CopyDone from the client,
>> but the output buffer is not empty. If reaching such condition is legitimate,
>> I think that we should get rid of the Assertion check which caused the above
>> FailedAssertion error. Thought?
>
> The reason for the assertion is that that case doesn't actually work.
> The code that is passing that combination of flags needs to be changed.
> Or else you can try to implement the ability to support READABLE only.

Right. I fixed that by adding WL_SOCKET_READABLE, and handling any
messages that might arrive after the frontend already sent CopyEnd. The
frontend shouldn't send any messages after CopyEnd, until it receives a
CopyEnd from the backend.

In theory, the frontend could already send the next query before
receiving the CopyEnd, but libpq doesn't currently allow that. Until
someone writes a client that actually tries to do that, I'm not going to
try to support that in the backend. It would be a lot more work, and
likely be broken anyway, without any way to test it.

Thanks for the report!

- Heikki


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup caused FailedAssertion
Date: 2013-02-28 12:50:05
Message-ID: CAHGQGwEg9xFmhL6YRPesrUsjEBCz7wsZ8khhWOAWGYQ8HUGOiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 28, 2013 at 2:29 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 26.02.2013 19:42, Tom Lane wrote:
>>
>> Fujii Masao<masao(dot)fujii(at)gmail(dot)com> writes:
>>>
>>> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
>>> I got the following FailedAssertion error:
>>
>>
>>> TRAP: FailedAssertion("!((wakeEvents& ((1<< 1) | (1<< 2))) != (1<<
>>>
>>> 2))", File: "pg_latch.c", Line: 234)
>>
>>
>>> This error happens after the commit
>>> 0b6329130e8e4576e97ff763f0e773347e1a88af.
>>
>>
>>> This assertion error happens when WL_SOCKET_WRITEABLE without
>>> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
>>> condition is met when walsender has received CopyDone from the client,
>>> but the output buffer is not empty. If reaching such condition is
>>> legitimate,
>>> I think that we should get rid of the Assertion check which caused the
>>> above
>>> FailedAssertion error. Thought?
>>
>>
>> The reason for the assertion is that that case doesn't actually work.
>> The code that is passing that combination of flags needs to be changed.
>> Or else you can try to implement the ability to support READABLE only.

Yeah, right.

> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any messages
> that might arrive after the frontend already sent CopyEnd. The frontend
> shouldn't send any messages after CopyEnd, until it receives a CopyEnd from
> the backend.
>
> In theory, the frontend could already send the next query before receiving
> the CopyEnd, but libpq doesn't currently allow that. Until someone writes a
> client that actually tries to do that, I'm not going to try to support that
> in the backend. It would be a lot more work, and likely be broken anyway,
> without any way to test it.
>
> Thanks for the report!

Thanks!

Regards,

--
Fujii Masao


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg_basebackup caused FailedAssertion
Date: 2020-12-11 22:47:35
Message-ID: 8a5eacf81749a8b8b1a3944a0e8617ab66cba604.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Old thread:
https://www.postgresql.org/message-id/512E427B.9090308%40vmware.com
about commit 3a9e64aa.

On Wed, 2013-02-27 at 19:29 +0200, Heikki Linnakangas wrote:
> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any
> messages that might arrive after the frontend already sent CopyEnd.
> The
> frontend shouldn't send any messages after CopyEnd, until it receives
> a
> CopyEnd from the backend.

It looks like 4bad60e3 may have fixed the problem, is it possible to
just revert 3a9e64aa and allow the case?

Also, the comment added by 3a9e64aa is misleading, because waiting for
a CopyDone from the server is not enough. It's possible that the client
receives the CopyDone from the server and the client sends a new query
before the server breaks from the loop. The client needs to wait until
at least the first CommandComplete.

> In theory, the frontend could already send the next query before
> receiving the CopyEnd, but libpq doesn't currently allow that. Until
> someone writes a client that actually tries to do that, I'm not going
> to
> try to support that in the backend. It would be a lot more work, and
> likely be broken anyway, without any way to test it.

I tried to add streaming replication support (still a work in progress)
to the rust client[1], and I ran into this problem.

The core of the rust client is fully pipelined and async, so it's a bit
annoying to work around this problem.

Regards,
Jeff Davis

[1] https://github.com/sfackler/rust-postgres/


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg_basebackup caused FailedAssertion
Date: 2020-12-14 08:51:45
Message-ID: 3d57bc29-4459-578b-79cb-7641baf53c57@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/12/2020 00:47, Jeff Davis wrote:
> On Wed, 2013-02-27 at 19:29 +0200, Heikki Linnakangas wrote:
>> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any
>> messages that might arrive after the frontend already sent CopyEnd.
>> The
>> frontend shouldn't send any messages after CopyEnd, until it receives
>> a
>> CopyEnd from the backend.
>
> It looks like 4bad60e3 may have fixed the problem, is it possible to
> just revert 3a9e64aa and allow the case?

Yes, I think you're right.

> Also, the comment added by 3a9e64aa is misleading, because waiting for
> a CopyDone from the server is not enough. It's possible that the client
> receives the CopyDone from the server and the client sends a new query
> before the server breaks from the loop. The client needs to wait until
> at least the first CommandComplete.

Good point. I think that's a bug in the implementation rather than the
comment, though. ProcessRepliesIfAny() should exit the loop immediately
if (streamingDoneReceiving && streamingDoneSending). But that's moot if
we revert 3a9e64aa altogether. I think we could backpatch the revert,
because it's not quite right as it is, and we have 3a9e64aa in all the
supported versions.

>> In theory, the frontend could already send the next query before
>> receiving the CopyEnd, but libpq doesn't currently allow that. Until
>> someone writes a client that actually tries to do that, I'm not going
>> to
>> try to support that in the backend. It would be a lot more work, and
>> likely be broken anyway, without any way to test it.
>
> I tried to add streaming replication support (still a work in progress)
> to the rust client[1], and I ran into this problem.
>
> The core of the rust client is fully pipelined and async, so it's a bit
> annoying to work around this problem.

Since you have the means to test this, would you like to do the honors
and revert 3a9e64aa?

- Heikki