From: | Hannu Krosing <hannu(at)2ndQuadrant(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Ragged CSV import |
Date: | 2009-09-09 20:56:01 |
Message-ID: | 1252529761.4080.23.camel@hvost1700 |
Lists: | pgsql-hackers |
On Wed, 2009-09-09 at 16:34 -0400, Alvaro Herrera wrote:
> Tom Lane wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > >> I have received a requirement for the ability to import ragged CSV
> > >> files, i.e. files that contain variable numbers of columns per row.
> >
> > BTW, one other thought about this: I think the historical reason for
> > COPY being strict about the number of incoming columns was that it
> > provided a useful cross-check that the parsing hadn't gone off into
> > the weeds. We have certainly seen enough examples where the reported
> > manifestation of, say, an escaping mistake was that COPY saw the row
> > as having too many or too few columns. So being permissive about it
> > would lose some error detection capability. I am not clear about
> > whether CSV format is sufficiently more robust than the traditional
> > COPY format to render this an acceptable loss. Comments?
>
> I think accepting less columns and filling with nulls should be
> protected enough for this not to be a problem; if the parser goes nuts,
> it will die eventually. Silently dropping excessive trailing columns
> does not seem acceptable though; you could lose entire rows and not
> notice.
Maybe we could put a catch-all "text" or even "text[]" column as the
last one of the table and gather all the extra columns there?
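
To make the idea concrete, here is a rough sketch in Python of the behaviour being discussed. It is not COPY code and not a proposed patch. The function names (split_ragged_row, read_ragged_csv) and the n_fixed parameter are made up for illustration. Short rows are padded with None (standing in for NULL), and any fields past the fixed columns are gathered into a trailing list, which plays the role of the catch-all "text[]" column.

```python
import csv
import io


def split_ragged_row(fields, n_fixed):
    """Map one parsed CSV row onto n_fixed columns plus a catch-all list.

    Hypothetical helper: missing trailing columns become None,
    surplus columns are collected instead of being dropped.
    """
    fixed = list(fields[:n_fixed])
    # Pad short rows; the multiplier is <= 0 for full or long rows,
    # which yields an empty list and leaves the row unchanged.
    fixed += [None] * (n_fixed - len(fixed))
    extras = list(fields[n_fixed:])
    return fixed + [extras]


def read_ragged_csv(text, n_fixed):
    """Parse CSV text and normalise every row with split_ragged_row."""
    reader = csv.reader(io.StringIO(text))
    return [split_ragged_row(row, n_fixed) for row in reader]


sample = "a,b,c\nd\ne,f,g,h,i\n"
rows = read_ragged_csv(sample, 3)
for row in rows:
    print(row)
```

Because nothing is discarded, a row that went wrong in parsing still shows up as an unexpectedly long catch-all list, so some of the cross-check Tom mentions is kept.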
> --
> Alvaro Herrera http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training