Re: COPY enhancements

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-10-08 23:30:50
Message-ID: 1255044650.6335.15.camel@ebony
Lists: pgsql-hackers

On Thu, 2009-10-08 at 18:23 -0400, Bruce Momjian wrote:
> Dimitri Fontaine wrote:
> > Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > > It would be best to be able to record a specific rejection reason
> > > for each rejected row. That way we will be able to tell the difference
> > > between uniqueness violation errors, invalid date format on col7, value
> > > fails check constraint on col22, etc.
> >
> > In case that helps, what pgloader does is log into two files, named
> > after the table (not scalable to a server-side solution):
> > table.rej --- lines it could not load, straight from source file
> > table.rej.log --- errors as given by the server, plus pgloader comment
> >
> > The pgloader comment is necessary to associate each log line with its
> > source file line: because pgloader operates by dichotomy, the server
> > always reports the error on line 1.
> >
> > The idea of having two error files could be kept, though: the aim is to
> > be able to fix the setup and then COPY the table.rej file again when it
> > turns out the errors are not in the file content. Or to load it into
> > another table, with all columns as text or bytea, and then clean the data
> > from a procedure.
>
> What would be _cool_ would be to add the ability to have comments in the
> COPY files, like \#, and then the copy data lines and errors could be
> adjacent. (Because of the way we control COPY escaping, adding \# would
> not be a problem. We have \N for null, for example.)

That was my idea also until I heard Dimitri's two file approach.

Having a pristine data file and a matching error file means you can
potentially just resubmit the error file again. Often you need to do
things like trap referential integrity (RI) errors and then resubmit them
at a later time once the master rows have entered the system.
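
A minimal Python sketch of the dichotomy approach Dimitri describes, to
show how the two files line up. It is not pgloader's actual code. It
assumes a hypothetical load_batch callable that loads a list of raw lines
all-or-nothing (as a single COPY does) and raises the hypothetical
LoadError on any bad row. Failing batches are split in half until single
bad lines remain; each is kept with its source line number, since the
server itself only ever reports "line 1" at that point.

```python
import os
import tempfile
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, str]  # (source line number, raw line text or message)


class LoadError(Exception):
    """Hypothetical error raised when a batch contains a bad row."""


def load_by_dichotomy(
    rows: Sequence[Row], load_batch: Callable[[List[str]], None]
) -> Tuple[List[Row], List[Row]]:
    """Load rows, halving failed batches until each bad row is isolated.

    load_batch is hypothetical: it must load all lines or none (like one
    COPY) and raise LoadError otherwise. Returns (rejected rows, log),
    where each log entry pairs the source line number with the error text.
    """
    rejected: List[Row] = []
    log: List[Row] = []

    def attempt(batch: List[Row]) -> None:
        if not batch:
            return
        try:
            load_batch([text for _, text in batch])
        except LoadError as exc:
            if len(batch) == 1:
                # A one-line batch always fails on "line 1", so record the
                # real source line number next to the server's message.
                lineno, text = batch[0]
                rejected.append((lineno, text))
                log.append((lineno, str(exc)))
                return
            mid = len(batch) // 2
            attempt(batch[:mid])
            attempt(batch[mid:])

    attempt(list(rows))
    return rejected, log


def write_reject_files(directory: str, table: str,
                       rejected: List[Row], log: List[Row]) -> None:
    """Write <table>.rej (pristine source lines) and <table>.rej.log."""
    with open(os.path.join(directory, f"{table}.rej"), "w") as rej:
        for _, text in rejected:
            rej.write(text + "\n")
    with open(os.path.join(directory, f"{table}.rej.log"), "w") as rej_log:
        for lineno, message in log:
            rej_log.write(f"source line {lineno}: {message}\n")


# Usage with a fake loader that rejects any line containing "bad".
loaded: List[str] = []


def fake_copy(batch: List[str]) -> None:
    for i, text in enumerate(batch, start=1):
        if "bad" in text:
            raise LoadError(f"invalid input on line {i}: {text}")
    loaded.extend(batch)  # only reached when every line in the batch is good


rows = list(enumerate(["a", "bad1", "b", "c", "bad2"], start=1))
rejected, log = load_by_dichotomy(rows, fake_copy)
with tempfile.TemporaryDirectory() as tmp:
    write_reject_files(tmp, "mytable", rejected, log)
    with open(os.path.join(tmp, "mytable.rej")) as f:
        rej_text = f.read()
```

Because mytable.rej holds the rejected lines exactly as they appeared in
the source, it can be fed back to COPY later, for example once the
referenced master rows exist, while mytable.rej.log explains each one.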

--
Simon Riggs www.2ndQuadrant.com
