Re: COPY enhancements

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: COPY enhancements
Date: 2009-09-12 15:13:39
Message-ID: 4AABBAA3.30604@dunslane.net
Lists: pgsql-hackers

Greg Smith wrote:
>> After some thought, I think that Andrew's feature *is* generally
>> applicable, if done as IGNORE COLUMN COUNT (or, more likely,
>> column_count=ignore). I can think of a lot of data sets where column
>> count is jagged and you want to do ELT instead of ETL.
>
> Exactly, the ELT approach gives you so many more options for cleaning
> up the data that I think it would be used more if it weren't so hard
> to do in Postgres right now.

+1. That's exactly what my client wants to do. They know perfectly well
that they get junk data. They want to get it into the database with a
minimum of fuss where they will have the right tools for checking and
cleaning it. If they have to spend effort whacking it into shape just to
get it into the database, then their cleanup effort essentially has to
be done in two pieces, part inside and part outside the database.

>
> While complicated, COPY is a pretty walled-off command of around 3500
> lines of code, and the hackery required here is pretty small. For
> example, it turns out we do already have the code to get it to ignore
> column overruns here, and it's all of 50 new lines--much of which is
> shared with code that does other error ignoring bits too. It's easy to
> make a case for a grand future extensibility cleanup here, but it's
> really not necessary to provide a significant benefit here for the
> cases I mentioned. And I would guess the maintenance burden of a more
> general solution has to be higher than a simple implementation of the
> feature list I gave in my last message.
>
> In short: there's a presumption that adding any error-ignoring code
> would require significant contortions. I don't think that's really
> true though, and would like to keep open the possibility of accepting
> some simple but useful ad-hoc features in this area, even if they
> don't solve every possible problem in this space just yet.

Right. What I proposed would not have been terribly invasive or
difficult, certainly less so than what seems to be our direction by an
order of magnitude at least. I don't for a moment accept the assertion
that we can get a general solution for the same effort.

cheers

andrew
