Timely reporting of COPY errors

Lists: pgsql-hackers
From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Postgresql <pgsql-hackers(at)postgresql(dot)org>
Subject: Timely reporting of COPY errors
Date: 2008-04-16 20:29:07
Message-ID: 20080416202907.GA26340@svana.org

Hi,

I notice that, while doing bulk loads, any errors detected by the
backend aren't noticed by libpq until right at the end. Is this
intentional? Looking at the code, we have this comment in PQputCopyData:

/*
 * Process any NOTICE or NOTIFY messages that might be pending in the
 * input buffer. Since the server might generate many notices during the
 * COPY, we want to clean those out reasonably promptly to prevent
 * indefinite expansion of the input buffer. (Note: the actual read of
 * input data into the input buffer happens down inside pqSendSome, but
 * it's not authorized to get rid of the data again.)
 */

Except that pqSendSome won't try reading anything until it has a
problem writing. Since the backend will keep consuming (and discarding)
COPY data indefinitely, the error message sits in the kernel buffers
until the end.
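The symptom can be reproduced without libpq. The C sketch below is a model under stated assumptions, not libpq code: a POSIX socketpair stands in for the client/server connection, a single 'E' byte (the protocol's ErrorResponse message type) stands in for the server's error report, and the hypothetical function sent_before_reply_noticed() plays a client that, like the pqSendSome behaviour described above, only reads its input when a write fails with EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical model (not libpq code): the client writes COPY data on a
 * non-blocking socket and only looks at its input side when a write fails
 * with EAGAIN.  The "server" end reports an error up front and, unlike a
 * real backend, never reads, so the send buffer eventually fills.
 * Returns the number of bytes sent before the reply was noticed (-1 on
 * failure) and stores the first reply byte in *reply.
 */
static long sent_before_reply_noticed(char *reply)
{
    int sv[2];                  /* sv[0] = client, sv[1] = "server" */
    char chunk[4096];
    long total = 0;

    assert(reply != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    /* The error is already sitting in the client's receive buffer. */
    if (write(sv[1], "E", 1) != 1)
        total = -1;

    memset(chunk, 'x', sizeof(chunk));
    while (total >= 0)
    {
        ssize_t n = write(sv[0], chunk, sizeof(chunk));

        if (n > 0)
            total += n;         /* write succeeded: nobody looks at input */
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        {
            /* Only a "problem writing" makes us read the pending input. */
            if (read(sv[0], reply, 1) != 1)
                total = -1;
            break;
        }
        else
            total = -1;
    }
    close(sv[0]);
    close(sv[1]);
    return total;
}
```

Calling sent_before_reply_noticed(&reply) returns a positive byte count and leaves 'E' in reply: the error was readable from the start but is only seen once the send buffer fills. The model's "server" never reads; a real backend keeps reading COPY data after an error, so in practice that point may never come before the copy is ended.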

Is there anything that can be done? I've tried putting in
PQconsumeInput in places but it doesn't appear to help.

Any ideas?
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Postgresql <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timely reporting of COPY errors
Date: 2008-04-16 20:49:03
Message-ID: 20080416204903.GU4999@tamriel.snowman.net

Martijn,

* Martijn van Oosterhout (kleptog(at)svana(dot)org) wrote:
> Is there anything that can be done? I've tried putting in
> PQconsumeInput in places but it doesn't appear to help.

I certainly hope something can be done; I've noticed this exact same
issue myself and it's very annoying. I've resorted to watching 'top' on
the server and hitting Ctrl-C when the backend goes 'idle' but my psql
still hasn't returned.

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Postgresql <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timely reporting of COPY errors
Date: 2008-04-16 21:22:17
Message-ID: 23259.1208380937@sss.pgh.pa.us

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> I notice that, while doing bulk loads, any errors detected by the
> backend aren't noticed by libpq until right at the end. Is this
> intentional?

I dunno about "intentional", but the API exposed by libpq for COPY
doesn't really permit any other behavior: you push all the data and
then look to see if it worked or not.

Even if we had some way of letting the application notice that the copy
had already failed, I don't see that psql could do very much with it,
at least not for COPY FROM STDIN. It's got to read through the source
data anyway or it'll be out of sync with the script file.

We could possibly fix libpq to start dropping the data on the floor
if it sees an error reply already pending, but that's only going
to be an incremental change.
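The idea of having libpq drop data once an error reply is pending could look roughly like this. It is a hypothetical sketch, not a libpq patch: struct copy_sender, error_reply_pending() and copy_sender_put() are invented names, and the check only peeks at the first unread byte, so a NoticeResponse queued ahead of the ErrorResponse would hide it; real code would have to parse complete protocol messages.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sender state; not a libpq structure. */
struct copy_sender
{
    int    fd;              /* connection to the server */
    int    error_pending;   /* set once an ErrorResponse has been seen */
    size_t sent;            /* bytes actually written */
    size_t dropped;         /* bytes discarded after the error */
};

/*
 * Non-blocking check: is the next unread byte from the server an 'E'
 * (ErrorResponse message type)?  MSG_PEEK leaves it in the socket so the
 * full error can still be read and reported later.
 */
static int error_reply_pending(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char type;

    if (poll(&pfd, 1, 0) <= 0 || !(pfd.revents & POLLIN))
        return 0;
    return recv(fd, &type, 1, MSG_PEEK) == 1 && type == 'E';
}

/*
 * Send one chunk of COPY data, or drop it if the server already failed.
 * Returns 1 if sent, 0 if dropped, -1 on a write error.
 */
static int copy_sender_put(struct copy_sender *s, const char *buf, size_t len)
{
    assert(s != NULL && (buf != NULL || len == 0));
    if (!s->error_pending && error_reply_pending(s->fd))
        s->error_pending = 1;
    if (s->error_pending)
    {
        s->dropped += len;      /* "drop the data on the floor" */
        return 0;
    }
    while (len > 0)
    {
        ssize_t n = write(s->fd, buf, len);

        if (n < 0)
            return -1;
        s->sent += (size_t) n;
        buf += n;
        len -= (size_t) n;
    }
    return 1;
}
```

The zero-timeout poll() means the check never blocks the sender, and peeking rather than reading keeps the error available for whoever finishes the copy.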

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Postgresql <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timely reporting of COPY errors
Date: 2008-04-16 21:33:18
Message-ID: 20080416213318.GE7942@alvh.no-ip.org

Tom Lane wrote:

> We could possibly fix libpq to start dropping the data on the floor
> if it sees an error reply already pending, but that's only going
> to be an incremental change.

I think this incremental change makes a lot of sense. What point is
there in transmitting the data over the network if the backend is
already in an error state?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgresql <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timely reporting of COPY errors
Date: 2008-04-16 22:20:01
Message-ID: 20080416222001.GC26340@svana.org

On Wed, Apr 16, 2008 at 05:22:17PM -0400, Tom Lane wrote:
> I dunno about "intentional", but the API exposed by libpq for COPY
> doesn't really permit any other behavior: you push all the data and
> then look to see if it worked or not.

Oh? I expected that PQputCopyData would return -1, as stated in the
documentation, at which point I would call PQendCopy and retrieve the
result set.

> Even if we had some way of letting the application notice that the copy
> had already failed, I don't see that psql could do very much with it,
> at least not for COPY FROM STDIN. It's got to read through the source
> data anyway or it'll be out of sync with the script file.

psql could ignore the result of PQputCopyData if it wanted; no big deal
there.

> We could possibly fix libpq to start dropping the data on the floor
> if it sees an error reply already pending, but that's only going
> to be an incremental change.

At the very least the documentation needs to be improved. For example,
no NOTICEs will be processed either *unless* there are enough to cause
the backend to block, at which point they will all be processed at
once. But the first step would be to get libpq to even notice the
error. I'm confused as to why PQconsumeInput doesn't work.

Have a nice day,


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgresql <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timely reporting of COPY errors
Date: 2008-04-17 13:28:31
Message-ID: 20080417132831.GV4999@tamriel.snowman.net

* Martijn van Oosterhout (kleptog(at)svana(dot)org) wrote:
> On Wed, Apr 16, 2008 at 05:22:17PM -0400, Tom Lane wrote:
> > Even if we had some way of letting the application notice that the copy
> > had already failed, I don't see that psql could do very much with it,
> > at least not for COPY FROM STDIN. It's got to read through the source
> > data anyway or it'll be out of sync with the script file.
>
> psql could ignore the result of PQputCopyData if it wanted; no big deal
> there.

Erm, maybe I'm missing something here, but psql could certainly stop
reading the file it was given on a \copy line when an error on the
backend happens. I agree that a user doing a COPY FROM STDIN wouldn't
be able to have it stop, though it'd go a lot faster if psql were just
throwing away the data rather than sending it across the network. Also,
in my view, we should consider adding a 'stdin' option to \copy to let
psql know that it's OK to error out if it starts getting errors from the
backend.

Admittedly, I do use 'zcat | psql -c "copy ... from stdin"' quite a bit,
but I'd be extremely happy to change that, in pretty much any way
necessary, to make it so that psql just exits when the backend starts
reporting errors. Actually, ideally psql would just say "oh, this is a
-c command, not a script anyway, so I can error out if that command
starts to fail for whatever reason".
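The trade-off between Tom's point (a script must be read through to the end of the COPY data to stay in sync) and Stephen's (a \copy from a file or a -c command could simply stop) can be sketched as a small policy function. Everything here is hypothetical and is not how psql is implemented: run_copy() and struct copy_outcome are invented names, input is modelled as an array of lines ending in the "\." terminator, and the error is assumed to be noticed after error_after lines have been sent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of what happened to the COPY input. */
struct copy_outcome
{
    size_t sent;        /* data lines sent before the error was noticed */
    size_t discarded;   /* lines read but thrown away locally */
    size_t next;        /* index of the next unread input line */
};

/*
 * After the server reports an error (modelled as: once error_after lines
 * have been sent), either stop reading at once (stop_on_error, e.g. a
 * \copy from a file or a -c command), or keep reading and discarding up
 * to the "\." terminator so that a script stays in sync.  Nothing after
 * the error is sent over the network.
 */
static struct copy_outcome run_copy(const char *const lines[], size_t n,
                                    size_t error_after, int stop_on_error)
{
    struct copy_outcome out = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (strcmp(lines[i], "\\.") == 0)
        {
            out.next = i + 1;   /* terminator consumed: back in sync */
            return out;
        }
        if (out.sent < error_after)
            out.sent++;         /* normal case: line goes to the server */
        else if (stop_on_error)
        {
            out.next = i;       /* leave the rest of the input unread */
            return out;
        }
        else
            out.discarded++;    /* drop it locally instead of sending */
    }
    out.next = n;
    return out;
}
```

For the input 1, 2, 3, 4, \., SELECT 1; with the error noticed after two lines, the script policy discards lines 3 and 4 locally and resumes at SELECT 1;, while the stop-on-error policy leaves everything from line 3 onward unread.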

Thanks,

Stephen


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Postgresql <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timely reporting of COPY errors
Date: 2008-06-23 22:42:38
Message-ID: 200806232242.m5NMgck18425@momjian.us


Added to TODO:

o Allow COPY to report errors sooner

http://archives.postgresql.org/pgsql-hackers/2008-04/msg01169.php

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +