\copy: unexpected response (4)

Lists: pgsql-generalpgsql-hackers
From: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: \copy: unexpected response (4)
Date: 2009-08-07 14:37:00
Message-ID: 24866027.post@talk.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


I have set up a batch job that pipes a couple hundred of \copy commands to
psql. It starts out well enough, sprinkled with a few error messages that I
intend to circle back and clean up after batching all of the ALTER commands
that I need, but then:

\copy gm1 from export/month1-88/13m/GM1.dat csv
psql:copy.sql:6328161: ERROR: invalid input syntax for integer: "743.5"
CONTEXT: COPY gm1, line 415306, column gm1c: "743.5"
\copy gm2a from export/month1-88/13m/GM2A.dat csv
\copy gm2e from export/month1-88/13m/GM2E.dat csv
\copy gm3 from export/month1-88/13m/GM3.dat csv
\copy gm4 from export/month1-88/13m/GM4.dat csv
psql:copy.sql:8059525: \copy: unexpected response (4)
psql:copy.sql:8059525: \copy: unexpected response (4)
psql:copy.sql:8059525: \copy: unexpected response (4)
psql:copy.sql:8059525: \copy: unexpected response (4)
psql:copy.sql:8059525: \copy: unexpected response (4)
psql:copy.sql:8059525: \copy: unexpected response (4)
psql:copy.sql:8059525: \copy: unexpected response (4)
psql:copy.sql:8059525: \copy: unexpected response (4)
. . .

(I am echoing everything and interleaving stderr into a log file.)

That last message is repeated literally ad inifinitum -- I mean I found 300G
of them after leaving my job running overnight!

I tried a \copy on that last file by hand and simply got another 'invalid
input syntax' message.

Can anyone speculate on what might lead to this condition? At first I
thought it was a networking problem but doing what I could to minimize that
possibility had no effect. Maybe I need to use separate connections?

The only reference I could find to that error message was
http://doxygen.postgresql.org/bin_2psql_2copy_8h.html but I know absolutely
nil about the source code.

Thanks.

Neil

--
View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24866027.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: \copy: unexpected response (4)
Date: 2009-08-07 15:49:53
Message-ID: 22987.1249660193@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Neil Best <nbest(at)ci(dot)uchicago(dot)edu> writes:
> psql:copy.sql:8059525: \copy: unexpected response (4)
> psql:copy.sql:8059525: \copy: unexpected response (4)
> psql:copy.sql:8059525: \copy: unexpected response (4)
> psql:copy.sql:8059525: \copy: unexpected response (4)
> psql:copy.sql:8059525: \copy: unexpected response (4)
> psql:copy.sql:8059525: \copy: unexpected response (4)
> psql:copy.sql:8059525: \copy: unexpected response (4)
> psql:copy.sql:8059525: \copy: unexpected response (4)

Hmm. It looks like psql could get into an infinite loop if the server
failed to exit COPY IN mode for some reason, but it's not at all clear
how that could happen (or what to do about it). What server version
and what psql version is this? What does the server's log show?
Can you provide a reproducible test case?

regards, tom lane


From: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: \copy: unexpected response (4)
Date: 2009-08-07 16:08:07
Message-ID: 24867582.post@talk.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane-2 wrote:
>
> Hmm. It looks like psql could get into an infinite loop if the server
> failed to exit COPY IN mode for some reason, but it's not at all clear
> how that could happen (or what to do about it). What server version
> and what psql version is this? What does the server's log show?
> Can you provide a reproducible test case?
>

Maybe it is a networking issue after all. I also thought to check the
server logs and found this:

2009-08-06 16:52:16 CDTCONTEXT: COPY gm1, line 415306, column gm1c: "743.5"
2009-08-06 16:52:16 CDTSTATEMENT: COPY gm1 FROM STDIN CSV
2009-08-06 16:52:51 CDTLOG: checkpoints are occurring too frequently (7
seconds apart)
2009-08-06 16:52:51 CDTHINT: Consider increasing the configuration
parameter "checkpoint_segments".
2009-08-06 16:53:02 CDTLOG: checkpoints are occurring too frequently (11
seconds apart)
2009-08-06 16:53:02 CDTHINT: Consider increasing the configuration
parameter "checkpoint_segments".
2009-08-06 16:53:20 CDTERROR: invalid input syntax for integer: "68.84"
2009-08-06 16:53:20 CDTCONTEXT: COPY gm4, line 411272, column gm4c: "68.84"
2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV
2009-08-06 16:53:20 CDTLOG: SSL renegotiation failure
2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV
2009-08-06 16:53:20 CDTLOG: SSL error: unexpected record
2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV
2009-08-06 16:53:20 CDTLOG: could not send data to client: Connection reset
by peer
2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV
2009-08-06 16:53:20 CDTLOG: SSL error: unexpected record
2009-08-06 16:53:20 CDTLOG: could not receive data from client: Connection
reset by peer
2009-08-06 16:53:20 CDTLOG: incomplete message from client
2009-08-06 17:38:21 CDTLOG: could not receive data from client: Operation
timed out
2009-08-06 17:38:21 CDTLOG: unexpected EOF on client connection
(END)

As for the versions:

monthly=> select version();
version
--------------------------------------------------------------
PostgreSQL 8.3.7 on i386-apple-darwin9.6.0, compiled by GCC

$ psql --version
psql (PostgreSQL) 8.1.11

Could this mismatch cause the problem? I will find out if it's feasible to
copy my data to the server to rule out these client and network vagaries.
Should I be concerned abou the checkpoint messages? I haven't looked up
what those are all about yet.

--
View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24867582.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: \copy: unexpected response (4)
Date: 2009-08-07 16:35:40
Message-ID: 23794.1249662940@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Neil Best <nbest(at)ci(dot)uchicago(dot)edu> writes:
> Tom Lane-2 wrote:
>> Hmm. It looks like psql could get into an infinite loop if the server
>> failed to exit COPY IN mode for some reason, but it's not at all clear
>> how that could happen (or what to do about it). What server version
>> and what psql version is this? What does the server's log show?
>> Can you provide a reproducible test case?

> Maybe it is a networking issue after all. I also thought to check the
> server logs and found this:

> 2009-08-06 16:53:20 CDTERROR: invalid input syntax for integer: "68.84"
> 2009-08-06 16:53:20 CDTCONTEXT: COPY gm4, line 411272, column gm4c: "68.84"
> 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV
> 2009-08-06 16:53:20 CDTLOG: SSL renegotiation failure
> 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV
> 2009-08-06 16:53:20 CDTLOG: SSL error: unexpected record
> 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV
> 2009-08-06 16:53:20 CDTLOG: could not send data to client: Connection reset
> by peer

Hmm, so it looks like the connection dropped and libpq failed to
recognize that, or maybe libpq was okay but psql needs to check a bit
more carefully here. I'll take a look.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: \copy: unexpected response (4)
Date: 2009-08-07 17:33:59
Message-ID: 24444.1249666439@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

I wrote:
> Hmm, so it looks like the connection dropped and libpq failed to
> recognize that, or maybe libpq was okay but psql needs to check a bit
> more carefully here. I'll take a look.

I could not reproduce this problem in testing, but after eyeballing
the code awhile I have a theory. It looks like it is possible for
PQputCopyEnd to fail and leave the PGconn in COPY_IN state, but this
only happens (1) if the output buffer contained at least 8K already,
causing pqSendSome to be invoked from pqPutMsgEnd, and (2) pqSendSome
returned failure. In that situation the loop in copy.c becomes
infinite, since there's no mechanism for getting out of COPY_IN state.
This case would be relatively difficult to trigger, but it seems to fit
all the facts, if we assume that the connection had failed for some
reason just at that point. BTW, the "SSL renegotiation failure" bit
suggests that it could have been an OpenSSL bug not a real network
lossage, so you might want to see how up-to-date your openssl libraries
are.

Anyway, it seems to me that the most appropriate fix is to add some
code to that loop, along the lines of

/*
* Make sure we have pumped libpq dry of results; else it may still be in
* ASYNC_BUSY state, leading to false readings in, eg, get_prompt().
*/
while ((result = PQgetResult(pset.db)) != NULL)
{
success = false;
psql_error("\\copy: unexpected response (%d)\n",
PQresultStatus(result));
+ /* if still in COPY IN state, try to get out of it */
+ if (PQresultStatus(result) == PGRES_COPY_IN)
+ PQputCopyEnd(conn, _("trying to exit copy mode"));
PQclear(result);
}

This would cover this particular case and perhaps others as well.

regards, tom lane


From: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: \copy: unexpected response (4)
Date: 2009-08-07 21:21:27
Message-ID: 663c3ffc0908071421i57480766tbf8041cd73eb80e9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Aug 7, 2009 at 12:33 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> BTW, the "SSL renegotiation failure" bit
> suggests that it could have been an OpenSSL bug not a real network
> lossage, so you might want to see how up-to-date your openssl libraries
> are.

Thanks for your comments, Tom. The operation seems more reliable if I
move the data to the server and do it across a local connection, which
I presume does not involve SSL, so that may be the weak link as you
surmise. Would you expect the SSL library problem more likely to be
on the server or the client, or is it just hard to say? Does either
of them have a facility that exposes the SSL version information or do
I have to go to the OS for that? Incidentally, I have not experienced
any sort of instability in my interactive sessions over an SSL
connection, so is it related to the \copy operation itself, the higher
volume of data across the connection, or the fact that I am asking it
to do multiple \copies in rapid succession, would you say? I expect
it's hard to say definitively, but maybe you or someone else can say
something about likelihoods. I appreciate the information.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] \copy: unexpected response (4)
Date: 2009-08-07 21:45:00
Message-ID: 449.1249681500@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Neil Best <nbest(at)ci(dot)uchicago(dot)edu> writes:
> On Fri, Aug 7, 2009 at 12:33 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, the "SSL renegotiation failure" bit
>> suggests that it could have been an OpenSSL bug not a real network
>> lossage, so you might want to see how up-to-date your openssl libraries
>> are.

> Thanks for your comments, Tom. The operation seems more reliable if I
> move the data to the server and do it across a local connection, which
> I presume does not involve SSL, so that may be the weak link as you
> surmise. Would you expect the SSL library problem more likely to be
> on the server or the client, or is it just hard to say?

You're talking like you've found this to be repeatable. Is it?

regards, tom lane


From: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [HACKERS] \copy: unexpected response (4)
Date: 2009-08-14 15:35:26
Message-ID: 24973703.post@talk.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane-2 wrote:
>
> Neil Best <nbest(at)ci(dot)uchicago(dot)edu> writes:
>> On Fri, Aug 7, 2009 at 12:33 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> BTW, the "SSL renegotiation failure" bit
>>> suggests that it could have been an OpenSSL bug not a real network
>>> lossage, so you might want to see how up-to-date your openssl libraries
>>> are.
>
>> Thanks for your comments, Tom. The operation seems more reliable if I
>> move the data to the server and do it across a local connection, which
>> I presume does not involve SSL, so that may be the weak link as you
>> surmise. Would you expect the SSL library problem more likely to be
>> on the server or the client, or is it just hard to say?
>
> You're talking like you've found this to be repeatable. Is it?
>
>

--
View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24973703.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


From: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [HACKERS] \copy: unexpected response (4)
Date: 2009-08-14 15:39:31
Message-ID: 24973785.post@talk.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane-2 wrote:
>
> Neil Best <nbest(at)ci(dot)uchicago(dot)edu> writes:
>> On Fri, Aug 7, 2009 at 12:33 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> BTW, the "SSL renegotiation failure" bit
>>> suggests that it could have been an OpenSSL bug not a real network
>>> lossage, so you might want to see how up-to-date your openssl libraries
>>> are.
>
>> Thanks for your comments, Tom. The operation seems more reliable if I
>> move the data to the server and do it across a local connection, which
>> I presume does not involve SSL, so that may be the weak link as you
>> surmise. Would you expect the SSL library problem more likely to be
>> on the server or the client, or is it just hard to say?
>
> You're talking like you've found this to be repeatable. Is it?
>
>

It is indeed repeatable, Tom. I was able to perform my \copy batch job
multiple times without error by working over the local connection. I had to
run it many times to iteratively catch all of the type mismatches that had
to be fixed, so I truncated and \copied over 200 tables about 7 times in
this manner. I did not attempt any software upgrades. Any theories?

--
View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24973785.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [HACKERS] \copy: unexpected response (4)
Date: 2009-08-14 18:04:07
Message-ID: 11660.1250273047@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Neil Best <nbest(at)ci(dot)uchicago(dot)edu> writes:
> Tom Lane-2 wrote:
>> You're talking like you've found this to be repeatable. Is it?

> It is indeed repeatable, Tom. I was able to perform my \copy batch job
> multiple times without error by working over the local connection. I had to
> run it many times to iteratively catch all of the type mismatches that had
> to be fixed, so I truncated and \copied over 200 tables about 7 times in
> this manner. I did not attempt any software upgrades. Any theories?

Sorry, I meant to ask whether the *failure* was repeatable.

regards, tom lane


From: Neil Best <nbest(at)ci(dot)uchicago(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [HACKERS] \copy: unexpected response (4)
Date: 2009-08-14 18:31:44
Message-ID: 24976506.post@talk.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane-2 wrote:
>
> Sorry, I meant to ask whether the *failure* was repeatable.
>

Oh, I see. Well, to the extent that i had the same problem in two different
remote clients before finding that the local connection work-around made it
go away, I would say that it was repeatable. In short, it happened more
than once and was not intermittent, as far as I could tell. Does that help?
--
View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24976506.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


From: Charlie <charles(dot)s(dot)johnson(at)hp(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: \copy: unexpected response (4)
Date: 2012-02-07 00:01:56
Message-ID: 1328572916874-5461710.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

I have encountered the symmetric error to this -> PGRES_COPY_OUT.

We are using a foreign data wrapper into a large (and unnamed) database
system which generates a substantial pipeline of rows for copyout to bring
back through libPQ to respond to the psql request.

If the back end is blown away the request hangs in the perl command line
script containing the psql command. That command is then terminated with an
alarm(60) call, which was unfortunately a too short timeout.

Nevertheless, the gigabyte of error messages (one per row I assume) that are
generated is probably too much error notification. One (or two) error
message(s) would do. And the fix should possibly look something like:

/*
* Make sure we have pumped libpq dry of results; else it may still
be in
* ASYNC_BUSY state, leading to false readings in, eg, get_prompt().
*/
while ((result = PQgetResult(pset.db)) != NULL)
{
success = false;
psql_error("\\copy: unexpected response (%d)\n",
PQresultStatus(result));
/* if still in COPY IN or COPY OUT state, try to get out of it */
if (PQresultStatus(result) == PGRES_COPY_IN || PQresultStatus(result) ==
PGRES_COPY_OUT)
PQputCopyEnd(conn, _("trying to exit copy mode"));
PQclear(result);
}

I think this becomes an issue only when foreign data wrappers are used,
because both ends of the conversation can die with a pipeline of data being
shoveled through a shared library call under the foreign data wrapper, while
postgres bravely and pointlessly carries on.

Charlie ...

--
View this message in context: http://postgresql.1045698.n5.nabble.com/copy-unexpected-response-4-tp1922077p5461710.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.