Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search archives
  Advanced Search

Re: Poor addBatch performance. Why dosn't it use copy ?


  • From: Kris Jurka <books(at)ejurka(dot)com>
  • To: femski <hypertree(at)yahoo(dot)com>
  • Cc: pgsql-jdbc(at)postgresql(dot)org
  • Subject: Re: Poor addBatch performance. Why dosn't it use copy ?
  • Date: Fri, 20 Apr 2007 11:05:23 -0400 (EDT)
  • Message-id: <Pine.BSO.4.64.0704201053200.26355@leary.csoft.net> <text/plain>



On Fri, 20 Apr 2007, femski wrote:

I took this discussion in Postgres performance  list and came out with
conclusion that its a
client side JDBC issue - so I am psting it here.

I have a batch application that writes approx. 4 million rows into a narrow
(2 column) table. I am using JDBC addBatch/ExecuteBatch with auto commit
turned off. Batch size is 1000. So far I am seeing Postgres take
roughly five times (280 sec) the time it takes to do this in the Oracle
(60). This is on a Linux
server with Xeon woodcrest 5310 process. Plenty of memory. I have played
with many parameters on
the server side and they seem to have little effect - I am sure Postgres is
a very capable server and its
not a database server issue. Someone mentioned:

"I actually went and looked at the JDBC api and realized 'addBatch' means to
run multiple stmts at once, not batch
inserting.  femski, your best bet is to lobby the JDBC folks to build
support for 'copy' into the driver for faster bulk loads (or help out in
that regard). "

This comment is inaccurate. addBatch is indeed useful for batch inserts. Consider the attached test case which inserts a million rows into a single column table using prepared statements with and without batches. I get:

No batch: 148.92
With batch: 48.008

So using batch execution is about three times faster.

Additionaly gains are possible by using multiple threads and multiple connections in your client to parallelize the work. This should provide linear scaling to the number of cores/io bandwidth you have available. Perhaps oracle is able to do this behind the scenes on the server, but postgresql does not.

Why doesn't the Postgres JDBC driver use "copy" for faster bulk insert ?
What is the best way to speedup
do bulk insert at this time or in near future (I was to use standard JDBC
API) ?


Automatically converting from insert to copy is possible in only limited circumstances that can be difficult to detect. For example copy cannot handle function calls in data, so you couldn't convert this to copy: INSERT INTO t VALUES (now(), ?). What if t is actually a view with an ON INSERT rule? Copy doesn't work with views. So some complicated parsing and analysis of the query is required.

Sure it's possible, but it's non-trivial and no one has done the work.

Kris Jurka

Attachment: Batch.java
Description: Text document



Home | Main Index | Thread Index

Privacy Policy | About PostgreSQL
Copyright © 1996 – 2012 PostgreSQL Global Development Group