From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jd(at)commandprompt(dot)com
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-10-08 16:37:11
Message-ID: 25829.1255019831@sss.pgh.pa.us
Lists: pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> Couldn't you just commit each range of subtransactions based on some
> threshold?

> COPY foo from '/tmp/bar/' COMMIT_THRESHOLD 1000000;

> It counts to 1 million, commits, and starts a new transaction. Yes,
> there would be 1 million subtransactions, but once it gets through
> those cleanly, it commits.

Hmm, if we were willing to break COPY into multiple *top level*
transactions, that would avoid my concern about XID wraparound.
The issue here is that if the COPY does eventually fail (and there
will always be failure conditions, e.g. running out of disk space),
then some of the previously entered rows would still be there, but
possibly not all of them, depending on how we batch rows. The latter
property actually bothers me more than the former, because it would
expose an implementation detail to the user. Thoughts?
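
For concreteness, a sketch of the semantics in question, using Joshua's
hypothetical COMMIT_THRESHOLD syntax (not an existing COPY option; the
row counts are made up):

COPY foo from '/tmp/bar/' COMMIT_THRESHOLD 1000000;
-- Suppose the input file holds 2,500,000 rows and we run out of disk
-- space at row 2,300,000.  With an internal top-level commit every
-- 1,000,000 rows, the first 2,000,000 rows remain committed after the
-- error:
SELECT count(*) FROM foo;  -- 2000000, not the 0 that COPY users expect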

Also, this does not work if you want the COPY to be part of a bigger
transaction, viz.
BEGIN;
do something;
COPY ...;
do something else;
COMMIT;
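
To spell out the conflict (made-up table names, same hypothetical
COMMIT_THRESHOLD option):

BEGIN;
INSERT INTO load_audit VALUES ('bulk load started');  -- do something
COPY foo from '/tmp/bar/' COMMIT_THRESHOLD 1000000;
-- Any commit issued internally by COPY would also commit the
-- load_audit row above and terminate the user's transaction block,
-- so the "do something else" step would no longer be atomic with
-- the rest of the work.
UPDATE load_status SET loaded = true;  -- do something else
COMMIT;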

regards, tom lane
