Re: Allow COPY from STDIN to absorb all input before throwing an error

Lists: pgsql-hackers
From: Decibel! <decibel(at)decibel(dot)org>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Allow COPY from STDIN to absorb all input before throwing an error
Date: 2008-04-08 20:26:24
Message-ID: 17145E90-F359-42FD-A5A8-A777237F57A5@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

When restoring from pg_dump(all), if a problem occurs in a COPY
command you're going to get a whole slew of errors, because as soon
as COPY detects a problem it will throw an error and psql will
immediately switch to trying to process the remaining data that was
meant for COPY as if it was psql commands. This is confusing and
annoying at best; it could conceivably trash data at worst (picture
dumping a table that had SQL commands in it).

My idea to avoid this situation is to add an option to COPY that
tells it not to throw an error until it runs out of input data. Of
course once it finds a problem it would just throw all the input data
away, but when used in the context of a dump file this would remove
all the bogus errors that either psql or the backend will generate
when trying to process table data as if it was commands.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828


From: Neil Conway <neilc(at)samurai(dot)com>
To: Decibel! <decibel(at)decibel(dot)org>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow COPY from STDIN to absorb all input before throwing an error
Date: 2008-04-08 20:50:20
Message-ID: 1207687820.6942.13.camel@goldbach
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2008-04-08 at 15:26 -0500, Decibel! wrote:
> My idea to avoid this situation is to add an option to COPY that
> tells it not to throw an error until it runs out of input data.

An alternative would be to have the client continue reading (and
discarding) COPY input until the end-of-COPY-input sequence is reached,
and then switch back into normal input processing mode.

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Decibel! <decibel(at)decibel(dot)org>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow COPY from STDIN to absorb all input before throwing an error
Date: 2008-04-08 21:39:48
Message-ID: 8606.1207690788@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Decibel! <decibel(at)decibel(dot)org> writes:
> When restoring from pg_dump(all), if a problem occurs in a COPY
> command you're going to get a whole slew of errors, because as soon
> as COPY detects a problem it will throw an error and psql will
> immediately switch to trying to process the remaining data that was
> meant for COPY as if it was psql commands. This is confusing and
> annoying at best; it could conceivably trash data at worst (picture
> dumping a table that had SQL commands in it).

This is nonsense; it hasn't worked that way since we went to v3
protocol.

What is true is that if the COPY command itself is thoroughly borked,
the backend never tells psql to switch into COPY mode in the first
place.

> My idea to avoid this situation is to add an option to COPY that
> tells it not to throw an error until it runs out of input data.

This will not solve the problem, since again it only works if the COPY
command gets to execution.

Perhaps we could improve matters by having pg_dump issue \copy instead
of COPY and tweaking psql \copy (when non-interactive) to switch to
COPY-mode even if the backend rejects the command. I seem to recall
though that there was some reason for sticking to the COPY command form.

In the meantime, pg_restore direct to DB is reasonably proof against the
problem anyway ...

regards, tom lane


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Decibel!" <decibel(at)decibel(dot)org>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow COPY from STDIN to absorb all input before throwing an error
Date: 2008-04-08 22:10:29
Message-ID: F0238EBA67824444BC1CB4700960CB4805110300@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote
> Decibel! <decibel(at)decibel(dot)org> writes:
> > When restoring from pg_dump(all), if a problem occurs in a COPY
> > command you're going to get a whole slew of errors, because
> as soon
> > as COPY detects a problem it will throw an error and psql will
> > immediately switch to trying to process the remaining data
> that was
> > meant for COPY as if it was psql commands. This is confusing and
> > annoying at best; it could conceivably trash data at worst
> (picture
> > dumping a table that had SQL commands in it).
>
> This is nonsense; it hasn't worked that way since we went to v3
> protocol.
>
> What is true is that if the COPY command itself is thoroughly borked,
> the backend never tells psql to switch into COPY mode in the first
> place.

I had an annoying experience with COPY within psql yesterday.
I had a dump of just three tables, which I wanted to investigate. I tried loading them into an empty database, using psql's \i command.
The table creation failed as dependent tables/sequences where absent.
The copy command failed as the tables did not exist.
The data intended as the input to the copy statement resulted in a large number of error messages.

> > My idea to avoid this situation is to add an option to COPY that
> > tells it not to throw an error until it runs out of input data.
>
> This will not solve the problem, since again it only works if the COPY
> command gets to execution.

It is only now that I've found the \set ON_ERROR_STOP command, which I presume would have solved my problem.

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.
__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
__________________________________________________________________


From: Decibel! <decibel(at)decibel(dot)org>
To: Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow COPY from STDIN to absorb all input before throwing an error
Date: 2008-04-09 19:29:34
Message-ID: 06B7EB87-8D7F-4BC1-8B06-11B76B6AE33B@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Apr 8, 2008, at 5:10 PM, Stephen Denne wrote:
> I had an annoying experience with COPY within psql yesterday.
> I had a dump of just three tables, which I wanted to investigate. I
> tried loading them into an empty database, using psql's \i command.
> The table creation failed as dependent tables/sequences where absent.
> The copy command failed as the tables did not exist.
> The data intended as the input to the copy statement resulted in a
> large number of error messages.
>
>>> My idea to avoid this situation is to add an option to COPY that
>>> tells it not to throw an error until it runs out of input data.
>>
>> This will not solve the problem, since again it only works if the
>> COPY
>> command gets to execution

I brought this up because of a very similar problem a coworker ran
into. He did a pg_dumpall and tried to restore it into an existing
cluster. One of the tables already existed and didn't have the same
columns, so the copy command ran and then failed. And then all hell
broke lose. :) This was on 8.1, which AFAIK is using the v3 protocol,
so it's still an issue.

I can see that there would be a problem if you wrapped the dump into
a transaction and something up-stream of the copy failed... I'm not
sure on a good way to handle that, perhaps other than switching to
\COPY.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828