Re: COPY enhancements

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-09-11 21:27:19
Message-ID: alpine.GSO.2.01.0909111651530.7278@westnet.com
Lists: pgsql-hackers

On Fri, 11 Sep 2009, Tom Lane wrote:

> If you believe that somebody might think of a new per-column COPY
> behavior in the future, then the same issue is going to come up again.

While Andrew may have given up on a quick hack to work around his recent
request, I don't have that luxury. We've already had to add two new
behaviors here to COPY in our version and I expect more in the future.
The performance of every path to get data into the database besides COPY
is too miserable for us to use anything else, and the current
inflexibility makes it useless for anything but the cleanest input data.

The full set of new behaviors I'd like to see would allow adjusting:

-Accept or reject rows with extra columns?
-Accept or reject rows that are missing columns at the end?
--Fill them with the default for the column (if available) or NULL?
-Save rejected rows?
--To a single system table?
--To a user-defined table?
--To the database logs?

The user-defined table for rejects is obviously exclusive of the system
one; either of those would be fine from my perspective.
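
To make the list above concrete, here is a rough sketch of the row-level policy as a client-side pre-filter in Python. This is not an existing COPY feature or anybody's patch; the function name triage_rows and the parameters allow_extra, fill_missing and defaults are made up for illustration, and a real implementation would live inside COPY rather than in front of it.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple

Reject = Tuple[int, List[Any], str]  # (input line number, raw row, reason)


def triage_rows(
    rows: Iterable[Sequence[Any]],
    n_columns: int,
    allow_extra: bool = False,
    fill_missing: bool = False,
    defaults: Optional[Dict[int, Any]] = None,
) -> Tuple[List[List[Any]], List[Reject]]:
    """Split input rows into (accepted, rejected) under a hypothetical policy.

    allow_extra  -- accept rows with extra columns by dropping the surplus
    fill_missing -- accept rows missing trailing columns by padding them
    defaults     -- column index -> default value; columns without an
                    entry are padded with None (standing in for NULL)
    """
    accepted: List[List[Any]] = []
    rejected: List[Reject] = []
    for line_no, raw in enumerate(rows, start=1):
        row = list(raw)
        if len(row) > n_columns:
            if not allow_extra:
                rejected.append((line_no, row, "extra columns"))
                continue
            row = row[:n_columns]
        elif len(row) < n_columns:
            if not fill_missing:
                rejected.append((line_no, row, "missing columns"))
                continue
            for i in range(len(row), n_columns):
                row.append(defaults.get(i) if defaults else None)
        accepted.append(row)
    return accepted, rejected
```

The rejected list is what would go to a reject table or to the logs; the accepted rows are what would be handed on to COPY. Which of those destinations to use is exactly the kind of knob the list is asking for.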

I wasn't really pleased with the "if it's not the most general solution
possible we're not interested" tone of Andrew's other COPY-change thread
this week. I don't think there are *that* many common requests here that
they can't all be handled by specific implementations, and the scope creep
of launching into a general framework for adding them is just going to
lead to nothing useful getting committed. If you want something really
complicated, drop into a PL-based solution. The stuff I list above I see
regular requests for at *every* PG installation I've ever been involved
in, and it would be fantastic if they were available out of the box.

But I think it's quite reasonable to say the COPY syntax needs to be
overhauled to handle all these. The two changes we've made at Truviso
both use GUCs to control their behavior, and I'm guessing Aster did that
too for the same reasons we did: it's easier to do and makes for cleaner
upstream merges. That approach doesn't really scale well though to many
options, and when considered for core the merge concerns obviously go
away. (The main reason I haven't pushed for us to submit our
customizations here is that I know perfectly well the GUC-based UI isn't
acceptable, but I haven't been able to get a better one done yet.)

This auto-partitioning stuff is interesting if the INSERT performance of it
can be made reasonable. I think Emmanuel is too new to the community
process here to realize that there's little hope of those getting
committed or even reviewed together. If I were reviewing this I'd just
kick it back as "separate these cleanly into separate patches where the
partitioning one depends on the logging one" before even starting to look
at the code; it's too much stuff to consume properly in one gulp.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
