Re: [bug fix] psql \copy doesn't end if backend is killed

Lists: pgsql-hackers
From: "MauMau" <maumau307(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: [bug fix] psql \copy doesn't end if backend is killed
Date: 2013-12-20 14:13:03
Message-ID: 0FC2D1C4CBA64A649C8C31E9419313E4@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

I've encountered a bug on PG 9.2 and fixed it for 9.4. Please find attached
the patch. I'd like it to be backported to at least 9.2.

[Problem]
If the backend is terminated with SIGKILL while psql is running "\copy
table_name from file_name", the \copy didn't end forever. I expected \copy
to be cancelled because the corresponding server process vanished.

[Cause]
psql could not get out of the loop below in handleCopyIn():

while (res = PQgetResult(conn), PQresultStatus(res) == PGRES_COPY_IN)
{
OK = false;
PQclear(res);

PQputCopyEnd(pset.db, _("trying to exit copy mode"));
}

This situation is reached as follows:

1. handleCopyIn() calls PQputCopyData().
2. PQputCopyData() calls pqPutMsgEnd().
3. pqPutMsgEnd() calls pqFlush(), which calls pqSendSome().
4. pqSendSome() calls pqReadData().
5. At this moment, the backend is killed with SIGKILL.
6. pqReadData() fails to read the socket, receiving ECONNRESET. It closes
the socket.
7. As a result, PQputCopyData() fails in 2.
8. handleCopyIn() then calls PQputCopyEnd().
9. PQputCopyEnd() calls pqPutMsgENd(), which calls pqFlush(), which in turn
calls pqSendSome().
10. pqSendSome() fails because the socket is not open.
11. As a result, PQputCopyENd() returns an error, leaving conn->asyncStatus
PGASYNC_COPY_IN.
12. Because conn->asyncStatus remains PGASYNC_COPY_IN, PQgetResult()
continues to return pgresult whose status is PGRES_COPY_IN.

[Fix]
If the message transmission fails in PQputCopyEnd(), switch
conn->asyncStatus back to PGASYNC_BUSY. That causes PQgetResult() to try to
read data with pqReadData(). pqReadData() fails and PQgetResult() returns
NULL. As a consequence, the loop in question terminates.

Regards
MauMau

Attachment Content-Type Size
failed_copy_loops.patch application/octet-stream 2.5 KB

From: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [bug fix] psql \copy doesn't end if backend is killed
Date: 2014-01-31 09:17:02
Message-ID: 4205E661176A124FAF891E0A6BA91352659435A2@SZXEML507-MBS.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 20 December 2013 19:43, MauMau Wrote

> [Problem]
> If the backend is terminated with SIGKILL while psql is running "\copy
> table_name from file_name", the \copy didn't end forever. I expected
> \copy
> to be cancelled because the corresponding server process vanished.
>
>
> [Cause]
> psql could not get out of the loop below in handleCopyIn():
>
> while (res = PQgetResult(conn), PQresultStatus(res) == PGRES_COPY_IN)
> {
> OK = false;
> PQclear(res);
>
> PQputCopyEnd(pset.db, _("trying to exit copy mode"));
> }

1. Patch applied to git head successfully
2. Problem can occur in some scenario and fix looks fine to me.
3. For testing, I think it's not possible to directly generate such scenario, so I have verified by debugging as the steps explained.

1. Make pqsecure_write to return less byte(by updating the result while debugging in gdb in pqSendSome.
(also make sure that remaining byte is >= 8192 i.e conn->outCount-sent > 8192 , so that in next step pqPutMsgEnd called from PQputCopyEnd go for flushing the data)

2. Then Kill the backend process before it calls pqReadData.

Scenario reproduced without patch and after applying the patch issue resolves.

Is there any direct scenario by which it can be reproduce ?

Regards,
Dilip




From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Dilip kumar" <dilip(dot)kumar(at)huawei(dot)com>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [bug fix] psql \copy doesn't end if backend is killed
Date: 2014-01-31 15:12:54
Message-ID: 353106910D624EF5B8C88189259021B9@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Dilip kumar" <dilip(dot)kumar(at)huawei(dot)com>
Is there any direct scenario by which it can be reproduce ?

Thank you for reviewing and testing the patch. There is no other direct
scenario.
I reproduced the failure exactly like you suggested, because it was very
difficult to reproduce the problem without using the debugger.

Regards
MauMau


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "MauMau" <maumau307(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [bug fix] psql \copy doesn't end if backend is killed
Date: 2014-02-14 00:55:31
Message-ID: 32508.1392339331@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"MauMau" <maumau307(at)gmail(dot)com> writes:
> If the backend is terminated with SIGKILL while psql is running "\copy
> table_name from file_name", the \copy didn't end forever. I expected \copy
> to be cancelled because the corresponding server process vanished.

I just noticed this CF entry pertaining to the same problem that Stephen
Frost reported a couple days ago:
http://www.postgresql.org/message-id/20140211205336.GU2921@tamriel.snowman.net

I believe it's been adequately fixed as of commits fa4440f516 and
b8f00a46bc, but if you'd test that those handle your problem cases,
I'd appreciate it.

> [Fix]
> If the message transmission fails in PQputCopyEnd(), switch
> conn->asyncStatus back to PGASYNC_BUSY.

This patch seems inappropriate to me, because it will allow libpq to exit
the COPY IN state whether or not it still has a live connection. If it
does, the backend will be in an inconsistent state and we'll have a mess.

regards, tom lane


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [bug fix] psql \copy doesn't end if backend is killed
Date: 2014-02-17 11:40:13
Message-ID: 932840941BEA4555B653D519124C0B28@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> I just noticed this CF entry pertaining to the same problem that Stephen
> Frost reported a couple days ago:
> http://www.postgresql.org/message-id/20140211205336.GU2921@tamriel.snowman.net
>
> I believe it's been adequately fixed as of commits fa4440f516 and
> b8f00a46bc, but if you'd test that those handle your problem cases,
> I'd appreciate it.

I confirmed that the problem disappeared. I'll delete my CommitFest entry
in several days.

Regards
MauMau