Re: FYI: porting Copy API to 8.x

From: "Kalle Hallivuori" <kato(at)iki(dot)fi>
To: "Kris Jurka" <books(at)ejurka(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: FYI: porting Copy API to 8.x
Date: 2007-06-13 13:26:38
Message-ID: c637d8bb0706130626v40ce1976g1ac45a581759cc76@mail.gmail.com
Lists: pgsql-jdbc

Hi Kris!

Happy to receive feedback.

2007/6/13, Kris Jurka <books(at)ejurka(dot)com>:
> This patch doesn't apply cleanly because there's a mix of unix and windows
> EOL characters and it seems to have some other application problems.

Sorry for that. I'll pay closer attention to the format in the future.

> The API is not thread safe. If two threads are using the same connection
> the synchronization in QueryExecutorImpl prevents them from stepping on
> each others toes and writing garbage to the backend. So the call to
> QueryExecutor must be an atomic operation and implies that the
> QueryExecutorImpl will be the controller and demand/push data from some
> kind of copy client instead of the other way around.

Yes. (I thought that was all right, since I had spotted a statement
somewhere that synchronization was not supported, but that turned out
to apply at a lower level, to pgStream.)

I'll look into the current synchronization and hopefully I'll be able
(as in: have the time) to do that.

> Also I'm not especially fond of the row based API that you've come up
> with. It seems like you should either go to an element or stream based
> API. Who has a premade row? ...

It doesn't matter what kinds of chunks you feed it (I think that was
ingenious of the COPY specification writers). With a CSV file you can
read the whole thing into a single byte[] and write that at once, or
you can allocate a fixed-size byte buffer and pass it repeatedly to
write().
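That chunk-agnostic feeding could be sketched like this. Note the
CopyIn interface and its write() signature below are stand-ins I made
up for illustration, not the actual names from the patch, and the
in-memory sink takes the place of a real connection:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedCopy {

    // Hypothetical stand-in for the patch's copy-in object; the real
    // class name and method signature may differ.
    interface CopyIn {
        void write(byte[] buf, int off, int len) throws IOException;
    }

    // Feed an input stream to COPY in fixed-size chunks. The server
    // does not care where the chunk boundaries fall, so any buffer
    // size works.
    static long feed(InputStream csv, CopyIn copy, int bufSize)
            throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = csv.read(buf)) > 0) {
            copy.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "1,foo\n2,bar\n".getBytes("US-ASCII");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Sink collects the bytes just as a backend would receive them.
        CopyIn copy = (buf, off, len) -> sink.write(buf, off, len);
        // Deliberately tiny buffer: chunk boundaries split rows, which
        // is fine for COPY.
        long sent = feed(new ByteArrayInputStream(data), copy, 5);
        System.out.println(sent + " bytes, intact: "
            + java.util.Arrays.equals(data, sink.toByteArray()));
    }
}
```

The chunks here even split rows mid-line, which the COPY wire format
tolerates because it does not require row-aligned writes.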

In my application I now successfully collect a bunch of field values
as separate byte[]s, mixed with references to shared static delimiter
byte[]s, into a large fixed-size byte[][]; I write it out when it
fills up and repeat until out of data. That cut the time spent in
import by half. (The other half of the import time is spent
post-processing the data :))
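A minimal sketch of that batching scheme, with made-up names (the
CopyBatcher class and its flush-to-buffer stand-in are mine, not from
the patch or my application):

```java
import java.util.Arrays;
import java.io.ByteArrayOutputStream;

// Field values and shared delimiter arrays are collected into a
// fixed-size byte[][]; when it fills up, the pieces are written out
// in order and the slot counter resets. Illustrative only.
public class CopyBatcher {
    static final byte[] TAB = {'\t'};      // shared delimiter instances,
    static final byte[] NEWLINE = {'\n'};  // never copied per row

    final byte[][] parts;
    int count = 0;
    // Stands in for the copy write() call in this sketch.
    final ByteArrayOutputStream out = new ByteArrayOutputStream();

    CopyBatcher(int slots) { parts = new byte[slots][]; }

    void add(byte[] piece) {
        parts[count++] = piece;
        if (count == parts.length) flush();  // buffer full: write it out
    }

    void flush() {
        for (int i = 0; i < count; i++)
            out.write(parts[i], 0, parts[i].length);
        count = 0;
    }

    public static void main(String[] args) {
        CopyBatcher b = new CopyBatcher(8);
        // Two rows of two fields each, interleaved with the shared
        // delimiter instances.
        b.add("1".getBytes()); b.add(TAB); b.add("foo".getBytes()); b.add(NEWLINE);
        b.add("2".getBytes()); b.add(TAB); b.add("bar".getBytes()); b.add(NEWLINE);
        b.flush();  // drain any partial batch
        byte[] result = b.out.toByteArray();
        System.out.println(result.length + " bytes, rows ok: "
            + Arrays.equals(result, "1\tfoo\n2\tbar\n".getBytes()));
    }
}
```

The point of the shared delimiter arrays is that each row only costs
the field byte[]s themselves plus a few array-reference stores; the
delimiters are never re-encoded or copied until the batch is flushed.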

However, if that isn't intuitive, it either has to be explained
clearly in all the documentation or a more intuitive API has to be
offered.

I think you're right on all counts. I'll try to find some time to
make it synchronized and self-evident. Shouldn't be too much of an
effort now.

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
