Re: setBlob loop performance?

From: Barry Lind <barry(at)xythos(dot)com>
To: David Wall <d(dot)wall(at)computer(dot)org>
Cc: pgsql-jdbc <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: setBlob loop performance?
Date: 2002-08-28 08:20:48
Message-ID: 3D6C87E0.6000502@xythos.com
Lists: pgsql-jdbc

David,

Could you resend this in diff -c format?  Since I don't know the exact
version you have, it is difficult to pick out the changes.  A diff would
make that much easier.

thanks,
--Barry

David Wall wrote:

>>Both of these issues look like bugs to me. Please submit a patch. I
>>agree that using a 4 or 8k buffer for reading/writing between the
>>streams is much better than how the code is currently implemented.
>>Also, setBlob should be calling close on the input stream when it is
>>done.
>
>Oddly enough, I don't have the capability to test this just yet, but I
>thought I'd share the code as I think it would work better for
>org.postgresql.jdbc2.PreparedStatement.  Of course, if anybody sees
>anything screwy, please let me know.  The main changes are the closing of
>the two streams in the finally block, so that even if an exception is
>thrown the streams are closed, and the primary loop reads/writes 4k
>blocks.  I hope to be able to test shortly, but we're in the process of
>converting the database to 7.2.2 and the sources from getBytes/setBytes
>to getBlob/setBlob (we thought this would be easier than converting
>existing OIDs to bytea in the db).
>
>David
>
>
> public void setBlob(int i, Blob x) throws SQLException
> {
>     InputStream l_inStream = x.getBinaryStream();
>     LargeObjectManager lom = connection.getLargeObjectAPI();
>     int oid = lom.create();
>     LargeObject lob = lom.open(oid);
>     OutputStream los = lob.getOutputStream();
>     byte[] buf = new byte[4096];
>     try
>     {
>         // could be buffered, but the OutputStream returned by LargeObject
>         // is buffered internally anyhow, so there would be no performance
>         // boost gained; if anything it would be worse!
>         int bytesRemaining = (int) x.length();
>         int numRead = l_inStream.read(buf, 0, Math.min(buf.length, bytesRemaining));
>         while (numRead != -1 && bytesRemaining > 0)
>         {
>             bytesRemaining -= numRead;
>             los.write(buf, 0, numRead);
>             numRead = l_inStream.read(buf, 0, Math.min(buf.length, bytesRemaining));
>         }
>     }
>     catch (IOException se)
>     {
>         throw new PSQLException("postgresql.unusual", se);
>     }
>     finally
>     {
>         try
>         {
>             los.close();
>             l_inStream.close();
>         }
>         catch (Exception e) {}
>     }
>     // lob is closed by the stream, so don't call lob.close()
>     setInt(i, oid);
> }
>
>>Is there anything else that can be done to improve this. I have always
>>found the jdbc spec for Blobs to be limited in the blob creation area.
>> Is there anything you would like to see in the driver to make this
>>easier for you?
>>
>>
>
>I don't think so, but it's interesting to see the API and wonder if
>anybody actually coded around it.  For example, assume you have a nice
>InputStream to a blob on disk.  Getting that into a Blob field would be a
>pain, since you'd have to wrap the InputStream in your own object that
>implements Blob and then have that interface drive the blob insertion.
>If you already have a byte array, you still have to wrap it to insert it
>as a blob.  I suppose a really smart JDBC driver could use
>setBytes()/getBytes() and simply handle the conversions internally if the
>data field was actually a blob and not a longvarbinary.
>
>It's also interesting that the interface returns all numbers as a long,
>including the length, but the read requires an int, meaning you can't
>necessarily suck in an entire blob in a single call to getBytes().  Oh
>well...
>
>David
>
>>David Wall wrote:
>>
>>
>>
>>>In the 7.2.2 codeset, PreparedStatement.setBlob() shows a loop as it
>>>reads a byte from the input stream (the blob) and writes it to the
>>>output stream (PG's LO routines).
>>>
>>>This seems highly inefficient since most large objects are, well,
>>>large...  So if I want to insert a 1MB image, this will loop a million
>>>times.  Is there a reason it's not read in chunks (even a 4096-sized
>>>array would reduce such a loop down to 250 iterations)?
>>>
>>>This is much worse than the 7.1 code, which simply took my byte array
>>>and wrote it all to the LargeObject stream in one call.
>>>
>>>+++
>>>public void setBlob(int i, Blob x) throws SQLException
>>>{
>>>    InputStream l_inStream = x.getBinaryStream();
>>>    int l_length = (int) x.length();
>>>    LargeObjectManager lom = connection.getLargeObjectAPI();
>>>    int oid = lom.create();
>>>    LargeObject lob = lom.open(oid);
>>>    OutputStream los = lob.getOutputStream();
>>>    try
>>>    {
>>>        // could be buffered, but the OutputStream returned by
>>>        // LargeObject is buffered internally anyhow, so there would be
>>>        // no performance boost gained; if anything it would be worse!
>>>        int c = l_inStream.read();
>>>        int p = 0;
>>>        while (c > -1 && p < l_length)
>>>        {
>>>            los.write(c);
>>>            c = l_inStream.read();
>>>            p++;
>>>        }
>>>        los.close();
>>>    }
>>>    catch (IOException se)
>>>    {
>>>        throw new PSQLException("postgresql.unusual", se);
>>>    }
>>>    // lob is closed by the stream, so don't call lob.close()
>>>    setInt(i, oid);
>>>}
>>>
>>>+++
>>>
>>>Since getBinaryStream() returns an InputStream, should this routine
>>>close that InputStream once it's done, or does the Blob itself have to
>>>somehow know that a stream it creates can be closed and discarded (and
>>>if so, how?)?
>>>

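For anyone following along, the chunked-copy pattern David is proposing can be sketched in plain Java, independent of the driver.  This is a minimal sketch, not driver code; the class and method names (BufferedCopy, copy) are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopy {
    // Copy everything from in to out in 4k chunks, closing both
    // streams even if an exception is thrown mid-copy.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } finally {
            try { out.close(); } catch (IOException ignore) {}
            try { in.close(); } catch (IOException ignore) {}
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // A stand-in for a "1MB image": copying it takes ~250 reads
        // of 4096 bytes instead of a million one-byte reads.
        byte[] src = new byte[1000000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(src), sink);
        System.out.println(copied);      // prints 1000000
        System.out.println(sink.size()); // prints 1000000
    }
}
```

The finally block mirrors the fix in David's patch: both streams are closed on the error path too, so a failed insert doesn't leak the large-object descriptor.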
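On the "wrap a byte array" complaint above: such a wrapper is mostly boilerplate, since setBlob only ever needs length() and getBinaryStream().  A hedged sketch follows; it targets a modern JDK (java.sql.Blob grew extra methods after JDBC 2.0, which is why there are so many stubs), and ByteArrayBlob is a hypothetical name, not something the driver provides:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.SQLException;
import java.util.Arrays;

// Read-only Blob view over an in-memory byte array.  Write and
// search methods are stubbed out because insertion via setBlob
// only calls length() and getBinaryStream().
public class ByteArrayBlob implements Blob {
    private final byte[] data;

    public ByteArrayBlob(byte[] data) { this.data = data; }

    public long length() { return data.length; }

    public InputStream getBinaryStream() { return new ByteArrayInputStream(data); }

    public byte[] getBytes(long pos, int length) {
        int from = (int) (pos - 1);                      // JDBC positions are 1-based
        int to = Math.min(from + length, data.length);   // clamp to the array end
        return Arrays.copyOfRange(data, from, to);
    }

    public InputStream getBinaryStream(long pos, long length) {
        return new ByteArrayInputStream(getBytes(pos, (int) length));
    }

    public void free() { /* nothing to release */ }

    // Everything below is unneeded for a read-only wrapper.
    public long position(byte[] pattern, long start) throws SQLException { throw new SQLException("read-only"); }
    public long position(Blob pattern, long start) throws SQLException { throw new SQLException("read-only"); }
    public int setBytes(long pos, byte[] bytes) throws SQLException { throw new SQLException("read-only"); }
    public int setBytes(long pos, byte[] bytes, int offset, int len) throws SQLException { throw new SQLException("read-only"); }
    public OutputStream setBinaryStream(long pos) throws SQLException { throw new SQLException("read-only"); }
    public void truncate(long len) throws SQLException { throw new SQLException("read-only"); }

    public static void main(String[] args) {
        ByteArrayBlob blob = new ByteArrayBlob(new byte[]{10, 20, 30, 40});
        System.out.println(blob.length()); // prints 4
    }
}
```

With a wrapper like this, an existing byte array can be handed to PreparedStatement.setBlob() directly, e.g. stmt.setBlob(1, new ByteArrayBlob(bytes)).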