Re: COPY enhancements

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: COPY enhancements
Date: 2009-09-12 08:22:01
Message-ID: alpine.GSO.2.01.0909120324010.9961@westnet.com
Lists: pgsql-hackers

On Fri, 11 Sep 2009, Josh Berkus wrote:

> I've been thinking about it, and can't come up with a really strong case
> for wanting a user-defined table if we settle the issue of having a
> strong key for pg_copy_errors. Do you have one?

No, but I'd think that if the user table was only allowed to be the exact
same format as the system one it wouldn't be that hard to implement--once
the COPY syntax is expanded at least. I'm reminded of how Oracle EXPLAIN
PLANs get logged into the PLAN_TABLE by default, but you can specify "INTO
table" to put them somewhere else. You'd basically be doing the same thing
but with a different destination relation.
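
To make the "same format, different destination" idea concrete, here is a minimal Python sketch. It is not code from any COPY patch: the function copy_rows, the errors_into parameter, and the (line number, raw text, reason) error layout are all assumptions made for illustration, since the actual layout of pg_copy_errors is not spelled out in this thread.

```python
# Stands in for the default system-wide error table (hypothetical).
DEFAULT_ERRORS = []

def copy_rows(lines, expected, errors_into=None):
    """Load comma-separated lines; divert malformed ones to an error list.

    errors_into plays the role of an "INTO table" clause: any list that
    accepts the same (lineno, raw, reason) tuples as the default one.
    """
    errors = DEFAULT_ERRORS if errors_into is None else errors_into
    loaded = []
    for lineno, line in enumerate(lines, start=1):
        raw = line.rstrip("\n")
        fields = raw.split(",")
        if len(fields) == expected:
            loaded.append(fields)
        else:
            # Record the rejected line instead of aborting the whole load.
            reason = f"expected {expected} columns, got {len(fields)}"
            errors.append((lineno, raw, reason))
    return loaded

# Usage: send errors to a caller-supplied destination instead of the default.
my_errors = []
good = copy_rows(["1,a\n", "2\n"], 2, errors_into=my_errors)
```

Because both destinations hold identically shaped records, the loading loop does not care which one it writes to, which is the reason the restricted form looks cheap to implement.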

> After some thought, I think that Andrew's feature *is* generally
> applicable, if done as IGNORE COLUMN COUNT (or, more likely,
> column_count=ignore). I can think of a lot of data sets where column
> count is jagged and you want to do ELT instead of ETL.

Exactly, the ELT approach gives you so many more options for cleaning up
the data that I think it would be used more if it weren't so hard to
do in Postgres right now.
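
As a rough illustration of the jagged column count case, here is a minimal Python sketch of what "ignore the column count, load everything, clean up afterward" could look like outside the database. The helper names normalize_row and load_jagged are hypothetical, and padding short rows with None while dropping extra fields is just one possible policy, not necessarily what a column_count=ignore option would do.

```python
import csv
import io

def normalize_row(fields, expected):
    """Pad short rows with None and drop fields beyond `expected`."""
    if len(fields) < expected:
        return fields + [None] * (expected - len(fields))
    return fields[:expected]

def load_jagged(text, expected):
    """Parse CSV text and coerce every row to `expected` columns."""
    reader = csv.reader(io.StringIO(text))
    return [normalize_row(row, expected) for row in reader]

# Three rows with 3, 1, and 2 fields, loaded into a 2-column shape.
sample = "1,alice,extra\n2\n3,carol\n"
rows = load_jagged(sample, 2)
```

Every input row survives the load, so the decisions about what to do with truncated or padded rows can be made later with ordinary queries, which is the point of doing ELT rather than ETL.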

> As opposed to Tom, Peter and Heikki vetoing things because the feature
> gain doesn't justify the maintenance burden? That's your real choice.
> Adding a framework for manageable syntax extensions means that we can be
> more liberal about what we justify as an extension.

I think you're not addressing the distinction I was trying to make. The
work to make the *syntax* for COPY easier to extend is an unfortunate
requirement for all these new bits; no arguments from me that using GUCs
for everything is just too painful.

What I was suggesting is that the first set of useful features required
for what you're calling the ELT load path is both small and well
understood. An implementation of the stuff I see a constant need for
could get banged out so fast that trying to completely generalize it on
the first pass has a questionable return.

While complicated, COPY is a pretty walled off command of around 3500
lines of code, and the hackery required here is pretty small. For
example, it turns out we already have the code to get it to ignore
column overruns, and it's all of 50 new lines--much of which is
shared with code that does other error ignoring bits too. It's easy to
make a case for a grand future extensibility cleanup, but it's really
not necessary in order to provide a significant benefit for the cases I
mentioned. And I would guess the maintenance burden of a more general
solution has to be higher than a simple implementation of the feature list
I gave in my last message.

In short: there's a presumption that adding any error-ignoring code would
require significant contortions. I don't think that's really true though,
and would like to keep open the possibility of accepting some simple but
useful ad-hoc features in this area, even if they don't solve every
possible problem in this space just yet.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
