BLOB performance test FYI

Lists: pgsql-jdbc
From: "Joe Shevland" <jshevland(at)j-elite(dot)com>
To: <pgsql-jdbc(at)postgresql(dot)org>
Subject: BLOB performance test FYI
Date: 2002-04-17 03:17:58
Message-ID: HEECIHEEJDBMCCGMGIOBCEBOCHAA.jshevland@j-elite.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Hi,

I'm adding a page on using binary data to a primer doc I'm writing, and have been performing some tests with PostgreSQL 7.2 and the latest JDBC code from CVS (14/4).

The following output shows some results of the test on a FreeBSD 4.5-S machine, P3 1GHz CPU, 1G RAM, PostgreSQL 7.2, JDK1.3.1p6 (OpenJIT); here the test was storing and retrieving increasingly large amounts of random binary data. The bytea type seems to cause real memory problems around the 4MB mark, presumably from trying to build an insert statement/StringBuffer big enough for the escaped data.

I was trying to see what data size would be best for which type. Around even 500k there seems to be a big overhead for storing bytea, again I guess converting the mostly unprintable binary data into the PGbytea string, though I'm not sure if it s a fair comparison test. I don't suppose there's any easy way of streaming the bytea values like the LO manager? I could be missing something fundamental about the bytea type, but given its limits, should the JDBC driver act by default on bytea?

Cheers,
Joe

---
www:~user# java -Xmx128m -classpath .:postgresql.jar BLOBTest postgres username password
[bytea] storing 0k of data -> 24ms
[bytea] retrieving 0k of data -> 7ms
[LO/oid] storing 0k of data -> 19ms
[LO/oid] retrieving 0k of data -> 4ms
[bytea] storing 1k of data -> 8ms
[bytea] retrieving 1k of data -> 1ms
[LO/oid] storing 1k of data -> 3ms
[LO/oid] retrieving 1k of data -> 3ms
[bytea] storing 500k of data -> 4943ms
[bytea] retrieving 500k of data -> 136ms
[LO/oid] storing 500k of data -> 142ms
[LO/oid] retrieving 500k of data -> 28ms
[bytea] storing 1000k of data -> 12423ms
[bytea] retrieving 1000k of data -> 270ms
[LO/oid] storing 1000k of data -> 285ms
[LO/oid] retrieving 1000k of data -> 55ms
Exception in thread "main" java.lang.OutOfMemoryError
at java.lang.StringBuffer.expandCapacity(StringBuffer.java, Compiled Code)
at java.lang.StringBuffer.append(StringBuffer.java, Compiled Code)
at org.postgresql.jdbc2.PreparedStatement.setString(Unknown Source)
at org.postgresql.jdbc2.PreparedStatement.setBytes(Unknown Source)
at org.postgresql.jdbc2.PreparedStatement.setBinaryStream(Unknown Source)
at BLOBTest.storeBlobAsBytea(BLOBTest.java, Compiled Code)
at BLOBTest.testData(BLOBTest.java, Compiled Code)
at BLOBTest.main(BLOBTest.java, Compiled Code)

The LO/oid output is using the LargeObject manager, and performance-wise seems to scale well until about 65M in my case (the results below can be avoided if I bump -Xmx up to 256M). This output is from commenting out the bytea tests:

---
www:~user# java -Xmx128m -classpath .:postgresql.jar BLOBTest postgres username password
[LO/oid] storing 0k of data -> 38ms
[LO/oid] retrieving 0k of data -> 5ms
[LO/oid] storing 1k of data -> 3ms
[LO/oid] retrieving 1k of data -> 3ms
[LO/oid] storing 500k of data -> 139ms
[LO/oid] retrieving 500k of data -> 30ms
[LO/oid] storing 1000k of data -> 268ms
[LO/oid] retrieving 1000k of data -> 55ms
[LO/oid] storing 4000k of data -> 1162ms
[LO/oid] retrieving 4000k of data -> 227ms
[LO/oid] storing 16000k of data -> 4817ms
[LO/oid] retrieving 16000k of data -> 868ms
[LO/oid] storing 32000k of data -> 9711ms
[LO/oid] retrieving 32000k of data -> 1704ms
[LO/oid] storing 63999k of data -> 19748ms
Unknown Response Type
Unknown Response Type
at org.postgresql.core.QueryExecutor.execute(Unknown Source)
at org.postgresql.Connection.ExecSQL(Unknown Source)
at org.postgresql.Connection.ExecSQL(Unknown Source)
at org.postgresql.Connection.setAutoCommit(Unknown Source)
at BLOBTest.retrieveBlobAsLO(BLOBTest.java, Compiled Code)
at BLOBTest.testData(BLOBTest.java, Compiled Code)
at BLOBTest.main(BLOBTest.java, Compiled Code)

--- BLOBTest.java

import java.io.*;
import java.sql.*;
import org.postgresql.largeobject.*;

public class BLOBTest {

public BLOBTest()
throws Exception {

Class.forName("org.postgresql.Driver");

createDummyDataFile("bin0",256);
createDummyDataFile("bin1",1024);
createDummyDataFile("bin512",512000);
createDummyDataFile("bin1024",1024000);
createDummyDataFile("bin4096",4096000);
createDummyDataFile("bin16384",16384000);
createDummyDataFile("bin32768",32768000);
createDummyDataFile("bin65535",65535000);
}

public static void main( String args[] ) {
if ( args.length != 3 ) {
System.err.println("usage: java BLOBTest <database_spec> <username> <password>");
System.exit(1);
}

try {

BLOBTest test = new BLOBTest();
long time = 0;

Connection conn = DriverManager.getConnection(
"jdbc:postgresql:"+args[0],args[1],args[2]);

test.testData(conn,"bytea","bin0",256);
test.testData(conn,"LO/oid","bin0",256);

test.testData(conn,"bytea","bin1",1024);
test.testData(conn,"LO/oid","bin1",1024);

test.testData(conn,"bytea","bin512",512000);
test.testData(conn,"LO/oid","bin512",512000);

test.testData(conn,"bytea","bin1024",1024000);
test.testData(conn,"LO/oid","bin1024",1024000);

test.testData(conn,"bytea","bin4096",4096000);
test.testData(conn,"LO/oid","bin4096",4096000);

test.testData(conn,"bytea","bin16384",16384000);
test.testData(conn,"LO/oid","bin16384",16384000);

test.testData(conn,"bytea","bin32768",32768000);
test.testData(conn,"LO/oid","bin32768",32768000);

test.testData(conn,"bytea","bin65535",65535000);
test.testData(conn,"LO/oid","bin65535",65535000);

} catch ( Exception e ) {
System.err.println(e);
e.printStackTrace();
}
}

private void createDummyDataFile( String fileName, int size )
throws Exception {

byte data[] = new byte[size];
FileOutputStream fos = new FileOutputStream(fileName);
fos.write(data);
fos.close();
}

private void testData( Connection conn, String method, String fileName, int size )
throws Exception {

long time;

// Garbage collect to clean up any garbage objects
System.gc();

if ( method.equals("bytea") ) {
time = storeBlobAsBytea(conn, fileName);
System.err.println("["+method+"] storing "+(size/1024)+"k of data -> "+time+"ms");
time = retrieveBlobAsBytea(conn, fileName);
System.err.println("["+method+"] retrieving "+(size/1024)+"k of data -> "+time+"ms");
} else {
time = storeBlobAsLO(conn, fileName);
System.err.println("["+method+"] storing "+(size/1024)+"k of data -> "+time+"ms");
time = retrieveBlobAsLO(conn, fileName);
System.err.println("["+method+"] retrieving "+(size/1024)+"k of data -> "+time+"ms");
}
}

private long storeBlobAsBytea( Connection conn, String fileName )
throws Exception {

long t1, t2;

t1 = System.currentTimeMillis();
File file = new File(fileName);
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
PreparedStatement ps = conn.prepareStatement("INSERT INTO blob_bytea VALUES (?, ?)");
ps.setString(1, file.getName());
ps.setBinaryStream(2, bis, (int)file.length());
ps.executeUpdate();
ps.close();
bis.close();
t2 = System.currentTimeMillis();
return (t2 - t1);
}

private long retrieveBlobAsBytea( Connection conn, String fileName )
throws Exception {

long t1, t2;

t1 = System.currentTimeMillis();
PreparedStatement ps = conn.prepareStatement(
"SELECT blob_data FROM blob_bytea WHERE blob_name=?");
ps.setString(1, fileName);
ResultSet rs = ps.executeQuery();
if ( rs.next() ) {
byte[] imgBytes = rs.getBytes(1);
// use the stream in some way here
}
rs.close();
ps.close();
t2 = System.currentTimeMillis();
return (t2 - t1);
}

private long storeBlobAsLO( Connection conn, String fileName )
throws Exception {

long t1, t2;

boolean oldState = conn.getAutoCommit();
try {

t1 = System.currentTimeMillis();

// All LargeObject API calls must be within a transaction
conn.setAutoCommit(false);

// Get the Large Object Manager to perform operations with
LargeObjectManager lobj = ((org.postgresql.Connection)conn).getLargeObjectAPI();

// create a new large object
int oid = lobj.create(LargeObjectManager.READ | LargeObjectManager.WRITE);

// open the large object for write
LargeObject obj = lobj.open(oid, LargeObjectManager.WRITE);

// Now open the file
File file = new File(fileName);
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));

// copy the data from the file to the large object
byte buf[] = new byte[2048];
int s, tl = 0;
while ((s = bis.read(buf, 0, 2048)) > 0) {
obj.write(buf, 0, s);
tl += s;
}
// Close the large object
obj.close();
// Now insert the row
PreparedStatement ps = conn.prepareStatement("INSERT INTO blob_lo VALUES (?, ?)");
ps.setString(1, file.getName());
ps.setInt(2, oid);
ps.executeUpdate();
ps.close();
bis.close();

t2 = System.currentTimeMillis();
return (t2 - t1);

} finally {
conn.setAutoCommit(oldState);
}
}

private long retrieveBlobAsLO( Connection conn, String fileName )
throws Exception {

long t1, t2;

boolean oldState = conn.getAutoCommit();

try {

t1 = System.currentTimeMillis();

// All LargeObject API calls must be within a transaction
conn.setAutoCommit(false);

// Get the Large Object Manager to perform operations with
LargeObjectManager lobj = ((org.postgresql.Connection)conn).getLargeObjectAPI();
PreparedStatement ps = conn.prepareStatement("SELECT blob_data FROM blob_lo WHERE blob_name=?");
ps.setString(1, fileName);
ResultSet rs = ps.executeQuery();

while(rs.next()) {
// open the large object for reading
int oid = rs.getInt(1);
LargeObject obj = lobj.open(oid, LargeObjectManager.READ);
// read the data
byte buf[] = new byte[obj.size()];
obj.read(buf, 0, obj.size());
// do something with the data read here
// Close the object
obj.close();
}
rs.close();
ps.close();

t2 = System.currentTimeMillis();
return (t2 - t1);
} finally {
conn.setAutoCommit(oldState);
}
}
}

---
Finally, to build or run:

javac -classpath .:postgresql.jar BLOBTest.java

java -Xmx128m -classpath .:postgresql.jar BLOBTest //somehost/db user password


From: Stuart Robinson <stuart(at)zapata(dot)org>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: callable statements
Date: 2002-04-17 04:42:02
Message-ID: Pine.LNX.4.44.0204162136440.3304-100000@www.dreamingamerica.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

What's the latest word on the status of CallableStatement for PostgreSQL
JDBC? I believe it's not currently supported (as of 7.2).
--
Stuart Robinson [stuart(at)zapata(dot)org]


From: Barry Lind <barry(at)xythos(dot)com>
To: Stuart Robinson <stuart(at)zapata(dot)org>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: callable statements
Date: 2002-04-17 05:44:58
Message-ID: 3CBD0BDA.5030908@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Stuart,

What functionality are you looking for? Given the fact that stored
functions in postgres can only return a single value, even when callable
statements are supported, there will be limited support. Is there
something in particular you are looking to have supported?

The current work around of using a preparedStatement to issue the
'select function()' call seems easy and fairly straight forward.

thanks,
--Barry

Stuart Robinson wrote:
> What's the latest word on the status of CallableStatement for PostgreSQL
> JDBC? I believe it's not currently supported (as of 7.2).
> --
> Stuart Robinson [stuart(at)zapata(dot)org]
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>


From: Dave Cramer <Dave(at)micro-automation(dot)net>
To: Joe Shevland <jshevland(at)j-elite(dot)com>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 11:28:31
Message-ID: 1019042912.2087.13.camel@inspiron.cramers
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Joe,

Yes, Barry has commented on the reasons for this recently. The short
version is that when you are using bytea, the backend parses the input.

Dave
On Tue, 2002-04-16 at 23:17, Joe Shevland wrote:
> Hi,
>
> I'm adding a page on using binary data to a primer doc I'm writing, and have been performing some tests with PostgreSQL 7.2 and the latest JDBC code from CVS (14/4).
>
> The following output shows some results of the test on a FreeBSD 4.5-S machine, P3 1GHz CPU, 1G RAM, PostgreSQL 7.2, JDK1.3.1p6 (OpenJIT); here the test was storing and retrieving increasingly large amounts of random binary data. The bytea type seems to cause real memory problems around the 4MB mark, presumably from trying to build an insert statement/StringBuffer big enough for the escaped data.
>
> I was trying to see what data size would be best for which type. Around even 500k there seems to be a big overhead for storing bytea, again I guess converting the mostly unprintable binary data into the PGbytea string, though I'm not sure if it s a fair comparison test. I don't suppose there's any easy way of streaming the bytea values like the LO manager? I could be missing something fundamental about the bytea type, but given its limits, should the JDBC driver act by default on bytea?
>
> Cheers,
> Joe
>
> ---
> www:~user# java -Xmx128m -classpath .:postgresql.jar BLOBTest postgres username password
> [bytea] storing 0k of data -> 24ms
> [bytea] retrieving 0k of data -> 7ms
> [LO/oid] storing 0k of data -> 19ms
> [LO/oid] retrieving 0k of data -> 4ms
> [bytea] storing 1k of data -> 8ms
> [bytea] retrieving 1k of data -> 1ms
> [LO/oid] storing 1k of data -> 3ms
> [LO/oid] retrieving 1k of data -> 3ms
> [bytea] storing 500k of data -> 4943ms
> [bytea] retrieving 500k of data -> 136ms
> [LO/oid] storing 500k of data -> 142ms
> [LO/oid] retrieving 500k of data -> 28ms
> [bytea] storing 1000k of data -> 12423ms
> [bytea] retrieving 1000k of data -> 270ms
> [LO/oid] storing 1000k of data -> 285ms
> [LO/oid] retrieving 1000k of data -> 55ms
> Exception in thread "main" java.lang.OutOfMemoryError
> at java.lang.StringBuffer.expandCapacity(StringBuffer.java, Compiled Code)
> at java.lang.StringBuffer.append(StringBuffer.java, Compiled Code)
> at org.postgresql.jdbc2.PreparedStatement.setString(Unknown Source)
> at org.postgresql.jdbc2.PreparedStatement.setBytes(Unknown Source)
> at org.postgresql.jdbc2.PreparedStatement.setBinaryStream(Unknown Source)
> at BLOBTest.storeBlobAsBytea(BLOBTest.java, Compiled Code)
> at BLOBTest.testData(BLOBTest.java, Compiled Code)
> at BLOBTest.main(BLOBTest.java, Compiled Code)
>
> The LO/oid output is using the LargeObject manager, and performance-wise seems to scale well until about 65M in my case (the results below can be avoided if I bump -Xmx up to 256M). This output is from commenting out the bytea tests:
>
> ---
> www:~user# java -Xmx128m -classpath .:postgresql.jar BLOBTest postgres username password
> [LO/oid] storing 0k of data -> 38ms
> [LO/oid] retrieving 0k of data -> 5ms
> [LO/oid] storing 1k of data -> 3ms
> [LO/oid] retrieving 1k of data -> 3ms
> [LO/oid] storing 500k of data -> 139ms
> [LO/oid] retrieving 500k of data -> 30ms
> [LO/oid] storing 1000k of data -> 268ms
> [LO/oid] retrieving 1000k of data -> 55ms
> [LO/oid] storing 4000k of data -> 1162ms
> [LO/oid] retrieving 4000k of data -> 227ms
> [LO/oid] storing 16000k of data -> 4817ms
> [LO/oid] retrieving 16000k of data -> 868ms
> [LO/oid] storing 32000k of data -> 9711ms
> [LO/oid] retrieving 32000k of data -> 1704ms
> [LO/oid] storing 63999k of data -> 19748ms
> Unknown Response Type
> Unknown Response Type
> at org.postgresql.core.QueryExecutor.execute(Unknown Source)
> at org.postgresql.Connection.ExecSQL(Unknown Source)
> at org.postgresql.Connection.ExecSQL(Unknown Source)
> at org.postgresql.Connection.setAutoCommit(Unknown Source)
> at BLOBTest.retrieveBlobAsLO(BLOBTest.java, Compiled Code)
> at BLOBTest.testData(BLOBTest.java, Compiled Code)
> at BLOBTest.main(BLOBTest.java, Compiled Code)
>
> --- BLOBTest.java
>
> import java.io.*;
> import java.sql.*;
> import org.postgresql.largeobject.*;
>
> public class BLOBTest {
>
> public BLOBTest()
> throws Exception {
>
> Class.forName("org.postgresql.Driver");
>
> createDummyDataFile("bin0",256);
> createDummyDataFile("bin1",1024);
> createDummyDataFile("bin512",512000);
> createDummyDataFile("bin1024",1024000);
> createDummyDataFile("bin4096",4096000);
> createDummyDataFile("bin16384",16384000);
> createDummyDataFile("bin32768",32768000);
> createDummyDataFile("bin65535",65535000);
> }
>
> public static void main( String args[] ) {
> if ( args.length != 3 ) {
> System.err.println("usage: java BLOBTest <database_spec> <username> <password>");
> System.exit(1);
> }
>
> try {
>
> BLOBTest test = new BLOBTest();
> long time = 0;
>
> Connection conn = DriverManager.getConnection(
> "jdbc:postgresql:"+args[0],args[1],args[2]);
>
> test.testData(conn,"bytea","bin0",256);
> test.testData(conn,"LO/oid","bin0",256);
>
> test.testData(conn,"bytea","bin1",1024);
> test.testData(conn,"LO/oid","bin1",1024);
>
> test.testData(conn,"bytea","bin512",512000);
> test.testData(conn,"LO/oid","bin512",512000);
>
> test.testData(conn,"bytea","bin1024",1024000);
> test.testData(conn,"LO/oid","bin1024",1024000);
>
> test.testData(conn,"bytea","bin4096",4096000);
> test.testData(conn,"LO/oid","bin4096",4096000);
>
> test.testData(conn,"bytea","bin16384",16384000);
> test.testData(conn,"LO/oid","bin16384",16384000);
>
> test.testData(conn,"bytea","bin32768",32768000);
> test.testData(conn,"LO/oid","bin32768",32768000);
>
> test.testData(conn,"bytea","bin65535",65535000);
> test.testData(conn,"LO/oid","bin65535",65535000);
>
> } catch ( Exception e ) {
> System.err.println(e);
> e.printStackTrace();
> }
> }
>
> private void createDummyDataFile( String fileName, int size )
> throws Exception {
>
> byte data[] = new byte[size];
> FileOutputStream fos = new FileOutputStream(fileName);
> fos.write(data);
> fos.close();
> }
>
> private void testData( Connection conn, String method, String fileName, int size )
> throws Exception {
>
> long time;
>
> // Garbage collect to clean up any garbage objects
> System.gc();
>
> if ( method.equals("bytea") ) {
> time = storeBlobAsBytea(conn, fileName);
> System.err.println("["+method+"] storing "+(size/1024)+"k of data -> "+time+"ms");
> time = retrieveBlobAsBytea(conn, fileName);
> System.err.println("["+method+"] retrieving "+(size/1024)+"k of data -> "+time+"ms");
> } else {
> time = storeBlobAsLO(conn, fileName);
> System.err.println("["+method+"] storing "+(size/1024)+"k of data -> "+time+"ms");
> time = retrieveBlobAsLO(conn, fileName);
> System.err.println("["+method+"] retrieving "+(size/1024)+"k of data -> "+time+"ms");
> }
> }
>
> private long storeBlobAsBytea( Connection conn, String fileName )
> throws Exception {
>
> long t1, t2;
>
> t1 = System.currentTimeMillis();
> File file = new File(fileName);
> BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
> PreparedStatement ps = conn.prepareStatement("INSERT INTO blob_bytea VALUES (?, ?)");
> ps.setString(1, file.getName());
> ps.setBinaryStream(2, bis, (int)file.length());
> ps.executeUpdate();
> ps.close();
> bis.close();
> t2 = System.currentTimeMillis();
> return (t2 - t1);
> }
>
> private long retrieveBlobAsBytea( Connection conn, String fileName )
> throws Exception {
>
> long t1, t2;
>
> t1 = System.currentTimeMillis();
> PreparedStatement ps = conn.prepareStatement(
> "SELECT blob_data FROM blob_bytea WHERE blob_name=?");
> ps.setString(1, fileName);
> ResultSet rs = ps.executeQuery();
> if ( rs.next() ) {
> byte[] imgBytes = rs.getBytes(1);
> // use the stream in some way here
> }
> rs.close();
> ps.close();
> t2 = System.currentTimeMillis();
> return (t2 - t1);
> }
>
> private long storeBlobAsLO( Connection conn, String fileName )
> throws Exception {
>
> long t1, t2;
>
> boolean oldState = conn.getAutoCommit();
> try {
>
> t1 = System.currentTimeMillis();
>
> // All LargeObject API calls must be within a transaction
> conn.setAutoCommit(false);
>
> // Get the Large Object Manager to perform operations with
> LargeObjectManager lobj = ((org.postgresql.Connection)conn).getLargeObjectAPI();
>
> // create a new large object
> int oid = lobj.create(LargeObjectManager.READ | LargeObjectManager.WRITE);
>
> // open the large object for write
> LargeObject obj = lobj.open(oid, LargeObjectManager.WRITE);
>
> // Now open the file
> File file = new File(fileName);
> BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
>
> // copy the data from the file to the large object
> byte buf[] = new byte[2048];
> int s, tl = 0;
> while ((s = bis.read(buf, 0, 2048)) > 0) {
> obj.write(buf, 0, s);
> tl += s;
> }
> // Close the large object
> obj.close();
> // Now insert the row
> PreparedStatement ps = conn.prepareStatement("INSERT INTO blob_lo VALUES (?, ?)");
> ps.setString(1, file.getName());
> ps.setInt(2, oid);
> ps.executeUpdate();
> ps.close();
> bis.close();
>
> t2 = System.currentTimeMillis();
> return (t2 - t1);
>
> } finally {
> conn.setAutoCommit(oldState);
> }
> }
>
> private long retrieveBlobAsLO( Connection conn, String fileName )
> throws Exception {
>
> long t1, t2;
>
> boolean oldState = conn.getAutoCommit();
>
> try {
>
> t1 = System.currentTimeMillis();
>
> // All LargeObject API calls must be within a transaction
> conn.setAutoCommit(false);
>
> // Get the Large Object Manager to perform operations with
> LargeObjectManager lobj = ((org.postgresql.Connection)conn).getLargeObjectAPI();
> PreparedStatement ps = conn.prepareStatement("SELECT blob_data FROM blob_lo WHERE blob_name=?");
> ps.setString(1, fileName);
> ResultSet rs = ps.executeQuery();
>
> while(rs.next()) {
> // open the large object for reading
> int oid = rs.getInt(1);
> LargeObject obj = lobj.open(oid, LargeObjectManager.READ);
> // read the data
> byte buf[] = new byte[obj.size()];
> obj.read(buf, 0, obj.size());
> // do something with the data read here
> // Close the object
> obj.close();
> }
> rs.close();
> ps.close();
>
> t2 = System.currentTimeMillis();
> return (t2 - t1);
> } finally {
> conn.setAutoCommit(oldState);
> }
> }
> }
>
> ---
> Finally, to build or run:
>
> javac -classpath .:postgresql.jar BLOBTest.java
>
> java -Xmx128m -classpath .:postgresql.jar BLOBTest //somehost/db user password
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>


From: "Joe Shevland" <jshevland(at)j-elite(dot)com>
To: "Dave Cramer" <Dave(at)micro-automation(dot)net>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 21:35:57
Message-ID: HEECIHEEJDBMCCGMGIOBKECFCHAA.jshevland@j-elite.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Well, isn't it a bit crazy that bytea is the default JDBC BLOB type given the performance issues shown?

The general public that start to pump binary data into a PostgreSQL database are going to be confused if performance is that bad... and that was a grunty server too with bulkloads of RAM, I shudder to think what'd happen to a client.

I mean, generally if you're storing binary data it is for things such as image or document storage (of course there's many, many other purposes), so the binary data wil be large.

Joe

> -----Original Message-----
> From: Dave Cramer [mailto:Dave(at)micro-automation(dot)net]
> Sent: Wednesday, 17 April 2002 21:29
> To: Joe Shevland
> Cc: pgsql-jdbc(at)postgresql(dot)org
> Subject: Re: [JDBC] BLOB performance test FYI
>
>
> Joe,
>
> Yes, Barry has commented on the reasons for this recently. The short
> version is that when you are using bytea, the backend parses the input.
>
> Dave


From: Dave Cramer <Dave(at)micro-automation(dot)net>
To: Joe Shevland <jshevland(at)j-elite(dot)com>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 21:53:44
Message-ID: 1019080424.2085.22.camel@inspiron.cramers
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Joe,

Yes, I tend to agree with your assessment, and would like to see the
default to be lo instead, or we need to some how warn people of this

Dave
On Wed, 2002-04-17 at 17:35, Joe Shevland wrote:
> Well, isn't it a bit crazy that bytea is the default JDBC BLOB type given the performance issues shown?
>
> The general public that start to pump binary data into a PostgreSQL database are going to be confused if performance is that bad... and that was a grunty server too with bulkloads of RAM, I shudder to think what'd happen to a client.
>
> I mean, generally if you're storing binary data it is for things such as image or document storage (of course there's many, many other purposes), so the binary data wil be large.
>
> Joe
>
> > -----Original Message-----
> > From: Dave Cramer [mailto:Dave(at)micro-automation(dot)net]
> > Sent: Wednesday, 17 April 2002 21:29
> > To: Joe Shevland
> > Cc: pgsql-jdbc(at)postgresql(dot)org
> > Subject: Re: [JDBC] BLOB performance test FYI
> >
> >
> > Joe,
> >
> > Yes, Barry has commented on the reasons for this recently. The short
> > version is that when you are using bytea, the backend parses the input.
> >
> > Dave
>
>


From: Barry Lind <barry(at)xythos(dot)com>
To: Joe Shevland <jshevland(at)j-elite(dot)com>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 22:08:56
Message-ID: 3CBDF278.3040304@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Joe,

No I don't think it is crazy when you look at the other side of the
coin. LargeObject's have their own set of problems and you need to
compare the pros and cons of each solution. The reason I believe that
bytea wins over LargeObjects is because of the symantics of delete.
When you delete a row containing a bytea column the bytea data is gone.
However when you delete a row containing an oid that references a
LargeObject you only delete the reference. The LargeObject does not get
deleted. If you want to use LargeObjects you need to code your own
deletes so that the LargeObjects get cleaned up, or you need to create
triggers on your tables so that it happens automatically. Also
LargeObjects have essentially no security over them. If you have access
to the database you can read all LargeObjects in that database
regardless of whether or not you have access to the table that contains
the oid for the LargeObject. The casual user of binary data (which can
range from a few bytes to hundreds of megs) should have something out of
the box that works symantically as they would expect IMHO. I find that
many users of bytea are storing gifs and html pages for web sites which
generally are small enough such that bytea works fine. However there
certainly are circumstances where bytea is not the correct choice of
datatype for storing the data.

This is all explained to some degree in the documentation for 7.2. It
is left as a choice to the user to use the datatype best suited for
their circumstances given the pros and cons of the different alternatives.

thanks,
--Barry

Joe Shevland wrote:
> Well, isn't it a bit crazy that bytea is the default JDBC BLOB type given the performance issues shown?
>
> The general public that start to pump binary data into a PostgreSQL database are going to be confused if performance is that bad... and that was a grunty server too with bulkloads of RAM, I shudder to think what'd happen to a client.
>
> I mean, generally if you're storing binary data it is for things such as image or document storage (of course there's many, many other purposes), so the binary data wil be large.
>
> Joe
>
>
>>-----Original Message-----
>>From: Dave Cramer [mailto:Dave(at)micro-automation(dot)net]
>>Sent: Wednesday, 17 April 2002 21:29
>>To: Joe Shevland
>>Cc: pgsql-jdbc(at)postgresql(dot)org
>>Subject: Re: [JDBC] BLOB performance test FYI
>>
>>
>>Joe,
>>
>>Yes, Barry has commented on the reasons for this recently. The short
>>version is that when you are using bytea, the backend parses the input.
>>
>>Dave
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>


From: "Joe Shevland" <jshevland(at)j-elite(dot)com>
To: "Barry Lind" <barry(at)xythos(dot)com>
Cc: "Dave Cramer" <Dave(at)micro-automation(dot)net>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 22:20:10
Message-ID: HEECIHEEJDBMCCGMGIOBIECICHAA.jshevland@j-elite.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

> No I don't think it is crazy when you look at the other side of the
> coin. LargeObject's have their own set of problems and you need to
> compare the pros and cons of each solution. The reason I believe that
> bytea wins over LargeObjects is because of the symantics of delete.

I do agree here, LO's have their own insurmountable issues.

[snip]
> the oid for the LargeObject. The casual user of binary data (which can
> range from a few bytes to hundreds of megs) should have something out of
> the box that works symantically as they would expect IMHO.

I totally agree; however as shown the bytea type *will not work* for data
this size (esp. not hundreds of megs). Both storage and retrieval from the
client. !!

> I find that
> many users of bytea are storing gifs and html pages for web sites which
> generally are small enough such that bytea works fine.

Well, I disagree... even a few kilobytes has shown to be substandard in
storing.

> This is all explained to some degree in the documentation for 7.2. It
> is left as a choice to the user to use the datatype best suited for
> their circumstances given the pros and cons of the different
alternatives.

Again its not really well explained at all in the doco (I'm writing a
primer that fixes code bugs in the examples (setBlob() isn't a method ;)
and clarifies these issues). It is left as a choice I agree, but
defaulting to the bytea type IMHO is a shocking thing to do... if I hadn't
done this investigation and gone ahead with my application using 'bytea',
it would be brought to its knees, thats if it makes it to its knees after
the memory issues.

Joe


From: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
To:
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 22:31:49
Message-ID: 001f01c1e65f$aa3bc550$9b0e32d4@sirma.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc


----- Original Message -----
From: "Barry Lind"

>
> This is all explained to some degree in the documentation for 7.2. It
> is left as a choice to the user to use the datatype best suited for
> their circumstances given the pros and cons of the different alternatives.
>

the docs do not mention that storing even a 500KB of binary data is 40 times
slower with byte when compared to LO

besides, 500KB is all but the "very large value" that is mentioned in the
docs

provided that people have outlined the problems that lead to the poor byte
performance (the backend parses the input) and that these problems seem to
be unresolvable, then wouldn't it be a better mid-term solution to switch to
LO and hope for improvements related to LOs in next releases rather than
using byte at all?

Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "


From: Barry Lind <barry(at)xythos(dot)com>
To: Joe Shevland <jshevland(at)j-elite(dot)com>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 22:32:33
Message-ID: 3CBDF801.4080801@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Joe,

The real problem is that postgres does not have a good datatype for
storing large binary (or large character) objects. There are currently
two options each less than ideal. The jdbc driver can only map into one
of the available types and until something better gets put into the
backend there is no idea choice.

Patches to the documentation that help improve everyones understanding
of the issues involved are certainly welcome.

thanks,
--Barry

Joe Shevland wrote:
>>No I don't think it is crazy when you look at the other side of the
>>coin. LargeObject's have their own set of problems and you need to
>>compare the pros and cons of each solution. The reason I believe that
>>bytea wins over LargeObjects is because of the symantics of delete.
>
>
> I do agree here, LO's have their own insurmountable issues.
>
> [snip]
>
>>the oid for the LargeObject. The casual user of binary data (which can
>>range from a few bytes to hundreds of megs) should have something out of
>>the box that works symantically as they would expect IMHO.
>
>
> I totally agree; however as shown the bytea type *will not work* for data
> this size (esp. not hundreds of megs). Both storage and retrieval from the
> client. !!
>
>
>>I find that
>>many users of bytea are storing gifs and html pages for web sites which
>>generally are small enough such that bytea works fine.
>
>
> Well, I disagree... even a few kilobytes has shown to be substandard in
> storing.
>
>
>>This is all explained to some degree in the documentation for 7.2. It
>>is left as a choice to the user to use the datatype best suited for
>>their circumstances given the pros and cons of the different
>
> alternatives.
>
> Again its not really well explained at all in the doco (I'm writing a
> primer that fixes code bugs in the examples (setBlob() isn't a method ;)
> and clarifies these issues). It is left as a choice I agree, but
> defaulting to the bytea type IMHO is a shocking thing to do... if I hadn't
> done this investigation and gone ahead with my application using 'bytea',
> it would be brought to its knees, thats if it makes it to its knees after
> the memory issues.
>
> Joe


From: "Joe Shevland" <jshevland(at)j-elite(dot)com>
To: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 22:40:16
Message-ID: HEECIHEEJDBMCCGMGIOBEECJCHAA.jshevland@j-elite.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Well said.

I agree with Barry too that the best path to take is to try and improve
the binary support, and that we're dealing with non-ideal options both
ways. If someone is seriously interested in a patch to the current HTML
doco, I'll gladly do it.

Wouldn't it make sense to separate the functionality of the OID data
type - this is an ambigious use of types for different purposes.

What are the chances and issues with streaming binary data to the backend?
LO manager does it, but I'm not sure what else it introduces (separate
transaction etc)

Regards,
Joe

> -----Original Message-----
> From: pgsql-jdbc-owner(at)postgresql(dot)org
> [mailto:pgsql-jdbc-owner(at)postgresql(dot)org]On Behalf Of Marin Dimitrov
> Sent: Thursday, 18 April 2002 8:32
> Cc: pgsql-jdbc(at)postgresql(dot)org
> Subject: Re: [JDBC] BLOB performance test FYI
>
>
>
> ----- Original Message -----
> From: "Barry Lind"
>
> >
> > This is all explained to some degree in the documentation for 7.2. It
> > is left as a choice to the user to use the datatype best suited for
> > their circumstances given the pros and cons of the different
> alternatives.
> >
>
> the docs do not mention that storing even a 500KB of binary data
> is 40 times
> slower with byte when compared to LO
>
> besides, 500KB is all but the "very large value" that is mentioned in
the
> docs
>
> provided that people have outlined the problems that lead to the poor
byte
> performance (the backend parses the input) and that these problems seem
to
> be unresolvable, then wouldn't it be a better mid-term solution
> to switch to
> LO and hope for improvements related to LOs in next releases rather than
> using byte at all?
>
>
> Marin
>
> ----
> "...what you brought from your past, is of no use in your present. When
> you must choose a new path, do not bring old experiences with you.
> Those who strike out afresh, but who attempt to retain a little of the
> old life, end up torn apart by their own memories. "
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>


From: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
To: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-17 23:07:12
Message-ID: 1019084834.1393.22.camel@spinoza
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

On 2002-04-18 Joe Shevland wrote:

> > the oid for the LargeObject. The casual user of binary data (which can
> > range from a few bytes to hundreds of megs) should have something out of
> > the box that works symantically as they would expect IMHO.
>
> I totally agree; however as shown the bytea type *will not work* for data
> this size (esp. not hundreds of megs). Both storage and retrieval from the
> client. !!
>
> > I find that
> > many users of bytea are storing gifs and html pages for web sites which
> > generally are small enough such that bytea works fine.
>
> Well, I disagree... even a few kilobytes has shown to be substandard in
> storing.

Some of these problems aren't really about the datatypes themselves, but
how the driver handles them. Instead of streaming large data into the
backend, the client reads them into memory before sending them.
It seems the LO parts of the driver has a different solution, that
streams the data directly.
It would take some major reworking of the JDBC driver to fix this. I've
looked at it, and concluded that it is possible, but that it was too big
for me to fix.

/Anders
--

A n d e r s B e n g t s s o n | ndrsbngtssn(at)yahoo(dot)se
Stockholm, Sweden |

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From: "Joe Shevland" <jshevland(at)j-elite(dot)com>
To: "Anders Bengtsson" <ndrsbngtssn(at)yahoo(dot)se>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 00:37:37
Message-ID: HEECIHEEJDBMCCGMGIOBAECLCHAA.jshevland@j-elite.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

> > > I find that
> > > many users of bytea are storing gifs and html pages for web
> sites which
> > > generally are small enough such that bytea works fine.
> >
> > Well, I disagree... even a few kilobytes has shown to be substandard in
> > storing.
>
> Some of these problems aren't really about the datatypes themselves, but
> how the driver handles them.

Thats a good point; my issues have purely been through the JDBC driver interaction.

> Instead of streaming large data into the
> backend, the client reads them into memory before sending them.
> It seems the LO parts of the driver has a different solution, that
> streams the data directly.
> It would take some major reworking of the JDBC driver to fix this. I've
> looked at it, and concluded that it is possible, but that it was too big
> for me to fix.

I've got time (read: resigned a few weeks ago and am sick of hop-flavoured water ;) and wouldn't mind doing it; I just needed an indication that its possible. Would you mind giving me some hints about what you looked at offline?

I figured that the differences in the types and how the backend handles them precludes actually trying to do anything to optimise the behaviour in the client interface. For example, can I use the socket connection when performing an update operation to stream binary data to the backend, or does it expect the whole statement to be character data sent at the one time?

What about other client interfaces and bytea types, how does the TCP/IP stuff go? C, Perl et al would still have problems loading binary data into memory too I would have thought unless they stream the data.

I'll add some notes re: this discussion to http://www.j-elite.com/pgprimer sometime today or tomorrow in the BLOB section, then put aside some time to put some real content into the rest of the site.

Cheers,
Joe


From: Dave Cramer <Dave(at)micro-automation(dot)net>
To: Joe Shevland <jshevland(at)j-elite(dot)com>
Cc: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 01:10:22
Message-ID: 1019092223.2084.24.camel@inspiron.cramers
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Joe, Anders,

Please keep this discussion online. The more people we have looking at a
possible solution, the better.

Dave
On Wed, 2002-04-17 at 20:37, Joe Shevland wrote:
> > > > I find that
> > > > many users of bytea are storing gifs and html pages for web
> > sites which
> > > > generally are small enough such that bytea works fine.
> > >
> > > Well, I disagree... even a few kilobytes has shown to be substandard in
> > > storing.
> >
> > Some of these problems aren't really about the datatypes themselves, but
> > how the driver handles them.
>
> Thats a good point; my issues have purely been through the JDBC driver interaction.
>
> > Instead of streaming large data into the
> > backend, the client reads them into memory before sending them.
> > It seems the LO parts of the driver has a different solution, that
> > streams the data directly.
> > It would take some major reworking of the JDBC driver to fix this. I've
> > looked at it, and concluded that it is possible, but that it was too big
> > for me to fix.
>
> I've got time (read: resigned a few weeks ago and am sick of hop-flavoured water ;) and wouldn't mind doing it; I just needed an indication that its possible. Would you mind giving me some hints about what you looked at offline?
>
> I figured that the differences in the types and how the backend handles them precludes actually trying to do anything to optimise the behaviour in the client interface. For example, can I use the socket connection when performing an update operation to stream binary data to the backend, or does it expect the whole statement to be character data sent at the one time?
>
> What about other client interfaces and bytea types, how does the TCP/IP stuff go? C, Perl et al would still have problems loading binary data into memory too I would have thought unless they stream the data.
>
> I'll add some notes re: this discussion to http://www.j-elite.com/pgprimer sometime today or tomorrow in the BLOB section, then put aside some time to put some real content into the rest of the site.
>
> Cheers,
> Joe
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>
>


From: Barry Lind <barry(at)xythos(dot)com>
To: Joe Shevland <jshevland(at)j-elite(dot)com>
Cc: Marin Dimitrov <marin(dot)dimitrov(at)sirma(dot)bg>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 01:32:15
Message-ID: 3CBE221F.70508@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Joe,

> If someone is seriously interested in a patch to the current HTML
> doco, I'll gladly do it.

I would gladly welcome any updates/changes you want to make to the doc.
Also if there are things you would want changed on the
jdbc.postgresql.org website send them in as well.

> Wouldn't it make sense to separate the functionality of the OID data
> type - this is an ambigious use of types for different purposes.

Yes I agree. One of the problems the jdbc driver has with LO is that in
supporting them it has to assume that any column of type oid is a LO.
Thus if you call getObject on an oid column what should the driver do?
Well it depends on if you are using the oid for LOs or not. If you are
using the oid column to store oids other than LOs then the correct thing
to do would be to return an Interger/Long object, else a LO.

> What are the chances and issues with streaming binary data to the
backend?
> LO manager does it, but I'm not sure what else it introduces (separate
> transaction etc)

The backend doesn't have a way to stream data to it. However there are
two different ways to send data to the backend: 1) use the standard sql
mechanism of the FE/BE protocol in which the entire sql including all
data values are converted into a string and sent to the backend; 2) is
to use the FastPath protocol which allows you to call functions directly
on the backend. The FastPath protocol allows you to send binary data
without converting it to a string. The LO functionality in the driver
uses the FastPath protocol to call the underlying LO_xxx functions. So
while you can't stream data, you can break it into chunks and make
repeated calls to a function to send large amounts of data.

thanks,
--Barry

Joe Shevland wrote:
> Well said.
>
> I agree with Barry too that the best path to take is to try and improve
> the binary support, and that we're dealing with non-ideal options both
> ways. If someone is seriously interested in a patch to the current HTML
> doco, I'll gladly do it.
>
> Wouldn't it make sense to separate the functionality of the OID data
> type - this is an ambigious use of types for different purposes.
>
> What are the chances and issues with streaming binary data to the backend?
> LO manager does it, but I'm not sure what else it introduces (separate
> transaction etc)
>
> Regards,
> Joe
>
>
>>-----Original Message-----
>>From: pgsql-jdbc-owner(at)postgresql(dot)org
>>[mailto:pgsql-jdbc-owner(at)postgresql(dot)org]On Behalf Of Marin Dimitrov
>>Sent: Thursday, 18 April 2002 8:32
>>Cc: pgsql-jdbc(at)postgresql(dot)org
>>Subject: Re: [JDBC] BLOB performance test FYI
>>
>>
>>
>>----- Original Message -----
>>From: "Barry Lind"
>>
>>
>>>This is all explained to some degree in the documentation for 7.2. It
>>>is left as a choice to the user to use the datatype best suited for
>>>their circumstances given the pros and cons of the different
>>
>>alternatives.
>>
>>the docs do not mention that storing even a 500KB of binary data
>>is 40 times
>>slower with byte when compared to LO
>>
>>besides, 500KB is all but the "very large value" that is mentioned in
>
> the
>
>>docs
>>
>>provided that people have outlined the problems that lead to the poor
>
> byte
>
>>performance (the backend parses the input) and that these problems seem
>
> to
>
>>be unresolvable, then wouldn't it be a better mid-term solution
>>to switch to
>>LO and hope for improvements related to LOs in next releases rather than
>>using byte at all?
>>
>>
>> Marin
>>
>>----
>>"...what you brought from your past, is of no use in your present. When
>>you must choose a new path, do not bring old experiences with you.
>>Those who strike out afresh, but who attempt to retain a little of the
>>old life, end up torn apart by their own memories. "
>>
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 3: if posting/reading through Usenet, please send an appropriate
>>subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>>message can get through to the mailing list cleanly
>>
>


From: Barry Lind <barry(at)xythos(dot)com>
To: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 01:43:48
Message-ID: 3CBE24D4.40704@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Anders Bengtsson wrote:
>
> Some of these problems aren't really about the datatypes themselves, but
> how the driver handles them. Instead of streaming large data into the
> backend, the client reads them into memory before sending them.
> It seems the LO parts of the driver has a different solution, that
> streams the data directly.
> It would take some major reworking of the JDBC driver to fix this. I've
> looked at it, and concluded that it is possible, but that it was too big
> for me to fix.
>
> /Anders

I would classify the problems more as limitations of the protocol that
exists between the server and all clients (not just jdbc, but odbc and
the rest as well). One of the things I have been advocating for a while
is the ability to use bind variables such that you can send the sql
statement separatly from the data values. This should allow passing
values that are typed such that they don't all need to be converted to
strings as happens today. This requires changing the FE/BE protocol or
uses the FastPath functionality to implement this on top of the existing
FE/BE protocol.

thanks,
--Barry

PS. Below is a mail note I sent last you to one of the kernel hackers
who was working on improving TOAST (thus bytea) functionality in an
effort to get a datatype better suited for jdbc blobs.

I don't know how much work you are planning on putting into this, but I
think what you are doing is great and I hope you continue. Is your goal
here to provide better BLOB/CLOB functionality than currently exists in
either existing TOAST or LargeObjects?

If so I would like to share with you my thoughts on what I think is
needed to provide good BLOB/CLOB support in postgres. I am approching
this from what I need in the server to provide decent Blob support
through the JDBC API.

The JDBC API basically assumes that when you select a blob you only
fetch a pointer/object reference (not any actual data). Then later you
use that pointer in function calls to fetch part/all of the data (or
append/update in the update case).

This is basically how the old LargeObjects work, but they have many
problems in their implementation that make them unsuitable for use in
the JDBC driver.

Thus what I would eventually like to see in the backend is the following:

Two new datatypes (blob and clob). These would basically be 'wrappers'
around the bytea and text datatypes. They would differ in that the text
out method on each would return an object reference instead of returning
the value of the column.

New functions could be created that take these object references and
allow you to get, update, or append data to/from the blob.

So you could do something like:

create table foo (foo_id int, foo_blob blob);

insert into foo values (1, '');

select foo_blob from foo where foo_id = 1;

(would return something like: foo:foo_blob:rowoid -- something that
can be used to identify the toasted value)

select blob_append('foo:foo_blob:rowoid', 'the stuff to append');
// there should also be a blob_update() that allows you to update some
// some range of bytes/characters

select blob_read('foo:foo_blob:rowoid', start_position, length);

(would return the substring of the blob requested)

I just wanted to share my ideas with you as I knew you were working in
this area of the code.

thanks,
--Barry


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 04:19:01
Message-ID: 6209.1019103541@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

"Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg> writes:
> the docs do not mention that storing even a 500KB of binary data is 40 times
> slower with byte when compared to LO

Probably because that hasn't been the experience of other people ;-)

Could we see your test case?

regards, tom lane


From: "Joe Shevland" <jshevland(at)j-elite(dot)com>
To: "Dave Cramer" <Dave(at)micro-automation(dot)net>
Cc: "Anders Bengtsson" <ndrsbngtssn(at)yahoo(dot)se>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 04:49:14
Message-ID: HEECIHEEJDBMCCGMGIOBIECMCHAA.jshevland@j-elite.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Its an interesting topic... when I use BLOB's its generally for storage of large things, documents, images, data files etc. Thats one view though; other uses like small binary data or encrypted stuff would be common too, so 'bytea' is an important type, but used in moderation (with the current JDBC driver) like anything I guess ;)

I care about all the things that PostgreSQL does, including access control, referential integrity and whatnot, but I think the important thing is I'm not particularly concerned about speed of access (when dealing with whole binary data, certainly don't need it indexed): what I mean by that is the binary file could happily reside on the local file system and be streamed to the client... maybe a 'softlink' to the data (aka the Large Object manager with oids almost), but I guess this is old ground, I'm thinking a new datatype that still has ACL's but is only a string pointing to the file on the local filesystem. I should probably be thinking of all the complexity this might add to the parser, optimisations and stuff. I'm not - as a result of aforementioned hop water ;)

So a LO/link datatype that obeys ACL's, can be integrated into the BE/FE protocol, would be rw-only by the postmaster user and stored in the db data directory perhaps, that fits in with pg_dump|restore, that could be streamed and referenced in result sets, that would be deleted on deletion of the link, and that doesn't break any existing functionality, perhaps with extensions to the SQL standard. That's all I'm after :) :)

No, I've no idea what all the issues are yet, I'm sure there's heaps - I think I need to burn some rosemary and sacrifice a goat to the Tom Lane alter ;) Maybe a new datatype 'bytefs', rejig the parser and all the rest. Or, is large binary data really the province of an RDBMS?

Cheers,
Joe

PS. Someone raised the PG-Java procedural language the other day. When that idea comes to fruition it will be an Incredibly Great Thing(TM).... but then I thought about about the separation of data and logic, and thought that it's overkill for the database to do that. Thoughts? I was thinking maybe a pseudo

PPS. We need an RFC or a good MIME type that stops digital signatures interfering with automatic appended footers. Outhouse is causing me grief with this list.

> Joe, Anders,
>
> Please keep this discussion online. The more people we have looking at a
> possible solution, the better.
>
> Dave
[snip]
> > > Instead of streaming large data into the
> > > backend, the client reads them into memory before sending them.
> > > It seems the LO parts of the driver has a different solution, that
> > > streams the data directly.
> > > It would take some major reworking of the JDBC driver to fix
> this. I've
> > > looked at it, and concluded that it is possible, but that it
> was too big
> > > for me to fix.


From: "Joe Shevland" <jshevland(at)j-elite(dot)com>
To: "Dave Cramer" <Dave(at)micro-automation(dot)net>
Cc: "Anders Bengtsson" <ndrsbngtssn(at)yahoo(dot)se>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 05:49:29
Message-ID: HEECIHEEJDBMCCGMGIOBOECOCHAA.jshevland@j-elite.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

> PS. Someone raised the PG-Java procedural language the other day.
> When that idea comes to fruition it will be an Incredibly Great
> Thing(TM).... but then I thought about about the separation of
> data and logic, and thought that it's overkill for the database
> to do that. Thoughts? I was thinking maybe a pseudo

... was going to rant about a pseudo-Java language but came to a decision its a pretty tough issue ;)

What about (and I haven't had a good look into the sources referenced the other day on Sourceforge yet, again prolly treading old ground):

* Using tools.jar to parse and compile the function source code
* One JVM per PostgreSQL server (configurable via postgresql.conf)
* One classloader per database to keep things clean and separate
* JSP-like compilation - functions compiled when there is a change in the code, or first created. Otherwise remains memory resident as a class instance, and/or is created on server startup.
* Communication would be via JNI (Java<->C): the JVM would be assumed to be resident on the same server... unless we'd want to look at load balancing, again I keep thinking of the boundaries of what the db should do.
* Mapping between Java and PG types handled by a central thing

Wading further and further into deep water here I think,

Cheers,
Joe


From: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 09:46:28
Message-ID: 009c01c1e6bd$e9a7a4d0$9b0e32d4@sirma.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc


----- Original Message -----
From: "Tom Lane"

> "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg> writes:
> > the docs do not mention that storing even a 500KB of binary data is 40
times
> > slower with byte when compared to LO
>
> Probably because that hasn't been the experience of other people ;-)
>
> Could we see your test case?
>

it wasn't my test case - it was published in a mail from Joe Shevland
yesterday (the beginning of this thread)

The test I've personally tried was comparing Postgres BYTEA performance and
Oracle BLOB performance (both servers reside on the same machine) and the
results showed that postgres is 7 times slower for 128KB file, 17 times
slower for 500KB and 22 times slower for 1MB , i.e. the performance degrades
with the growth of the file size

also my test showed results about memory consumption similar to the ones
reported by Joe - the client JDBC application used 10 times more memory for
sending the binary data to Postgres when compared to Oracle

Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "


From: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
To: Joe Shevland <jshevland(at)j-elite(dot)com>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 10:54:01
Message-ID: 1019127243.1814.23.camel@spinoza
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

On 2002-04-18 Joe Shevland wrote:

> I figured that the differences in the types and how the backend
> handles them precludes actually trying to do anything to optimise the
> behaviour in the client interface. For example, can I use the socket
> connection when performing an update operation to stream binary data to
> the backend, or does it expect the whole statement to be character data
> sent at the one time?

Hi Joe!

It's only possible to send the whole statement, data and query, as
character strings.
Today when you add a stream parameter to a PreparedStatement it
immediately reads the stream into a String. Then it concatenates the
query and the parameters into an even larger String, which is scanned
for escape codes and such (creating yet another huge String!). Finally
it's all written to the backend.

A better way would be to keep the stream parameter stored as a
InputStream until the query is executed, and then stream it directly to
the backend. This requires some changes in how the PreparedStatement
class deals with parameters and how the PreparedStatement is fed to the
backend. The escape scanning will have to be rewritten as a filtering
stream, etc.

It would imply a change in semantics of a PreparedStatement, that it
could only be executed one time once the parameters are set. But I
wouldn't be surprised if that is what the JDBC spec. says anyway.

(Today's internal interface in Connection assumes a String to be
executed. Perhaps a new interface "Query" with a "StringQuery" class to
encapsulate simple string queries and something for prepared queries
could be used. But that's only my thoughs on how to implement it).

/Anders

--

A n d e r s B e n g t s s o n | ndrsbngtssn(at)yahoo(dot)se
Stockholm, Sweden |

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
To: "Anders Bengtsson" <ndrsbngtssn(at)yahoo(dot)se>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 11:25:12
Message-ID: 000701c1e6cb$b4a81b30$9b0e32d4@sirma.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc


----- Original Message -----
From: "Anders Bengtsson"

>
> It would imply a change in semantics of a PreparedStatement, that it
> could only be executed one time once the parameters are set. But I
> wouldn't be surprised if that is what the JDBC spec. says anyway.
>

doesn't seem so , it says:

"The values set for the parameter markers of a PreparedStatement object are
not
reset when it is executed. The method clearParameters can be called to
explictly
clear the values that have been set. Setting a parameter with a different
value will
replace the previous value with the new one."

...which IMHO implies multiple executions with the same parameters should be
possible

Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "


From: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
To: Marin Dimitrov <marin(dot)dimitrov(at)sirma(dot)bg>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 12:59:55
Message-ID: 1019134798.1814.33.camel@spinoza
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

On 2002-04-18 Marin Dimitrov wrote:

> > It would imply a change in semantics of a PreparedStatement, that it
> > could only be executed one time once the parameters are set. But I
> > wouldn't be surprised if that is what the JDBC spec. says anyway.
> >
>
> doesn't seem so , it says:
>
> "The values set for the parameter markers of a PreparedStatement object are
> not
> reset when it is executed. The method clearParameters can be called to
> explictly
> clear the values that have been set. Setting a parameter with a different
> value will
> replace the previous value with the new one."
>
> ...which IMHO implies multiple executions with the same parameters should be
> possible

This is really strange. I haven't managed to find anything about this
issue anywhere when searching for JDBC.
I now took a quick look at some other JDBC drivers. It seems MySQL's
driver stores streams, but converts them to byte arrays at execution
time. But it would fail on the second attempt to use a stream parameter.
FireBird seems to have a nicer protocol for large objects, but it
doesn't seem to support re-using a stream either.

The JDBC 2.0 Specification has this clarification:

"When an application passes a stream as an input value via a setXXX() or
updateXXX() method, the application is responsible for maintaining the
stream in a readable state until one of the following methods is called:
PreparedStatement.execute(), executeQuery(), executeUpdate(), or
executeBatch(), and ResultSet.insertRow() or updateRow(). A JDBC driver
is not required to wait until one of these methods is called to read the
stream value."

This sort of implies streaming directly, but for a clarification it
doesn't make things much clearer. :)

/Anders
--

A n d e r s B e n g t s s o n | ndrsbngtssn(at)yahoo(dot)se
Stockholm, Sweden |

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From: "David Wall" <d(dot)wall(at)computer(dot)org>
To: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 15:36:40
Message-ID: 014f01c1e6ee$d5e45340$5a2b7ad8@expertrade.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

> ...which IMHO implies multiple executions with the same parameters should
be
> possible

And is preferable. This is one of the benefits of using a prepared
statement -- doing loop constructs and such that don't force rebuilding it
all each time.

Clearly, JDBC has APIs built to support the concepts of small and large
binary objects. Most databases support them. I've been doing so with PG7.1
with success using OIDs, though I do need to run vacuumlo daily because my
JDBC code doesn't want to "know to unlink OIDs" since I'm trying to stay as
db agnostic as possible in my mainline code.

My impression was that blob support through JDBC in PG7.2 would be handled
by OID, and small binaries by bytea. Is that not the case? I know that I
have a "problem" in that I use setBytes/getBytes today for my blobs as that
worked in 7.1. I've not used the JDBC blob apis yet (setBlob()). I know
this changed in the PG JDBC code for 7.2, but it's still not clear to me how
this is all really working.

Thanks,
David


From: Peter V Cooper <pvcooper(at)adelphia(dot)net>
To: "Joe Shevland" <jshevland(at)j-elite(dot)com>, "Dave Cramer" <Dave(at)micro-automation(dot)net>
Cc: "Anders Bengtsson" <ndrsbngtssn(at)yahoo(dot)se>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 16:15:32
Message-ID: 5.1.0.14.0.20020418082850.00afe408@mail.dc3.adelphia.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

This sounds reasonable and like something I might use as I am doing this myself
manually already in Postgres & Oracle. I only have one install of postgres
and many w/ Oracle. We manage Objects like large images manually. It would
be nice to have this in postgres. I would use it now as I have to use the
os filesystem
and synchronize using a sequence already. Security and especially
non-cleanup on
delete are real problems. Just one user opinion.

BTW, I have never used a DBMS for this in the past because I was
comfortable with
having a sequence in the DB/hashed set of dirs and binary files in those
dirs. The
reason for hashed directories is twofold one to keep the number of directory
entries lower when working with millions of rows/files and two when you overrun
the filesystem then it is possible to symlink some of the dirs to another
file system.
This could create problems if you lost some of the files to a drive failure
but it
would be temporary as the system admin would replace/repair and recover. My
next task is to give my FE Java class multiple filesystem places to
automatically
manage - this would be cool as well.

PS, I have done development since 1979 at OS(UNIX) DB (Informally Helped Oracle
in 1983) and Application levels. I have never added code/my help to an Open
Source
project. Where do I find info on this? I might be inclined to work on something
like this if anyone would want me to. I have a lot of source which does much of
what I have spoken of. Any suggestions or should I 'butt out' so to speak.

At 02:49 PM 4/18/2002 +1000, Joe Shevland wrote:
>Its an interesting topic... when I use BLOB's its generally for storage of
>large things, documents, images, data files etc. Thats one view though;
>other uses like small binary data or encrypted stuff would be common too,
>so 'bytea' is an important type, but used in moderation (with the current
>JDBC driver) like anything I guess ;)
>
>I care about all the things that PostgreSQL does, including access
>control, referential integrity and whatnot, but I think the important
>thing is I'm not particularly concerned about speed of access (when
>dealing with whole binary data, certainly don't need it indexed): what I
>mean by that is the binary file could happily reside on the local file
>system and be streamed to the client... maybe a 'softlink' to the data
>(aka the Large Object manager with oids almost), but I guess this is old
>ground, I'm thinking a new datatype that still has ACL's but is only a
>string pointing to the file on the local filesystem. I should probably be
>thinking of all the complexity this might add to the parser, optimisations
>and stuff. I'm not - as a result of aforementioned hop water ;)
>
>So a LO/link datatype that obeys ACL's, can be integrated into the BE/FE
>protocol, would be rw-only by the postmaster user and stored in the db
>data directory perhaps, that fits in with pg_dump|restore, that could be
>streamed and referenced in result sets, that would be deleted on deletion
>of the link, and that doesn't break any existing functionality, perhaps
>with extensions to the SQL standard. That's all I'm after :) :)
>
>No, I've no idea what all the issues are yet, I'm sure there's heaps - I
>think I need to burn some rosemary and sacrifice a goat to the Tom Lane
>alter ;) Maybe a new datatype 'bytefs', rejig the parser and all the rest.
>Or, is large binary data really the province of an RDBMS?
>
>Cheers,
>Joe
>
>PS. Someone raised the PG-Java procedural language the other day. When
>that idea comes to fruition it will be an Incredibly Great Thing(TM)....
>but then I thought about about the separation of data and logic, and
>thought that it's overkill for the database to do that. Thoughts? I was
>thinking maybe a pseudo
>
>PPS. We need an RFC or a good MIME type that stops digital signatures
>interfering with automatic appended footers. Outhouse is causing me grief
>with this list.
>
> > Joe, Anders,
> >
> > Please keep this discussion online. The more people we have looking at a
> > possible solution, the better.
> >
> > Dave
>[snip]
> > > > Instead of streaming large data into the
> > > > backend, the client reads them into memory before sending them.
> > > > It seems the LO parts of the driver has a different solution, that
> > > > streams the data directly.
> > > > It would take some major reworking of the JDBC driver to fix
> > this. I've
> > > > looked at it, and concluded that it is possible, but that it
> > was too big
> > > > for me to fix.
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html


From: Justin Clift <justin(at)postgresql(dot)org>
To: Peter V Cooper <pvcooper(at)adelphia(dot)net>
Cc: Joe Shevland <jshevland(at)j-elite(dot)com>, Dave Cramer <Dave(at)micro-automation(dot)net>, Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 16:40:49
Message-ID: 3CBEF711.A8D3A58A@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Hi Peter,

Peter V Cooper wrote:
<snip>
> PS, I have done development since 1979 at OS(UNIX) DB (Informally Helped Oracle
> in 1983) and Application levels. I have never added code/my help to an Open
> Source
> project. Where do I find info on this? I might be inclined to work on something
> like this if anyone would want me to. I have a lot of source which does much of
> what I have spoken of. Any suggestions or should I 'butt out' so to speak.

Your expertise and assistance is very much welcomed. :-)

First question is "what kind of stuff do you *like* doing?", as if
there's a match of what you enjoy doing with what needs to be done, then
that probably would be a good place to start looking.

:-)

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


From: Dave Cramer <Dave(at)micro-automation(dot)net>
To: Peter V Cooper <pvcooper(at)adelphia(dot)net>
Cc: Joe Shevland <jshevland(at)j-elite(dot)com>, Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 16:52:36
Message-ID: 1019148756.1425.35.camel@inspiron.cramers
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Peter,

Your help is greatly appreciated. As far as information goes. The
postgres site has a developers section. I presume you already know how
to get the code, if not, let me know and I can direct you appropriately

Please, feel free to contribute

Dave
On Thu, 2002-04-18 at 12:15, Peter V Cooper wrote:
> This sounds reasonable and like something I might use as I am doing this myself
> manually already in Postgres & Oracle. I only have one install of postgres
> and many w/ Oracle. We manage Objects like large images manually. It would
> be nice to have this in postgres. I would use it now as I have to use the
> os filesystem
> and synchronize using a sequence already. Security and especially
> non-cleanup on
> delete are real problems. Just one user opinion.
>
> BTW, I have never used a DBMS for this in the past because I was
> comfortable with
> having a sequence in the DB/hashed set of dirs and binary files in those
> dirs. The
> reason for hashed directories is twofold one to keep the number of directory
> entries lower when working with millions of rows/files and two when you overrun
> the filesystem then it is possible to symlink some of the dirs to another
> file system.
> This could create problems if you lost some of the files to a drive failure
> but it
> would be temporary as the system admin would replace/repair and recover. My
> next task is to give my FE Java class multiple filesystem places to
> automatically
> manage - this would be cool as well.
>
> PS, I have done development since 1979 at OS(UNIX) DB (Informally Helped Oracle
> in 1983) and Application levels. I have never added code/my help to an Open
> Source
> project. Where do I find info on this? I might be inclined to work on something
> like this if anyone would want me to. I have a lot of source which does much of
> what I have spoken of. Any suggestions or should I 'butt out' so to speak.
>
> At 02:49 PM 4/18/2002 +1000, Joe Shevland wrote:
> >Its an interesting topic... when I use BLOB's its generally for storage of
> >large things, documents, images, data files etc. Thats one view though;
> >other uses like small binary data or encrypted stuff would be common too,
> >so 'bytea' is an important type, but used in moderation (with the current
> >JDBC driver) like anything I guess ;)
> >
> >I care about all the things that PostgreSQL does, including access
> >control, referential integrity and whatnot, but I think the important
> >thing is I'm not particularly concerned about speed of access (when
> >dealing with whole binary data, certainly don't need it indexed): what I
> >mean by that is the binary file could happily reside on the local file
> >system and be streamed to the client... maybe a 'softlink' to the data
> >(aka the Large Object manager with oids almost), but I guess this is old
> >ground, I'm thinking a new datatype that still has ACL's but is only a
> >string pointing to the file on the local filesystem. I should probably be
> >thinking of all the complexity this might add to the parser, optimisations
> >and stuff. I'm not - as a result of aforementioned hop water ;)
> >
> >So a LO/link datatype that obeys ACL's, can be integrated into the BE/FE
> >protocol, would be rw-only by the postmaster user and stored in the db
> >data directory perhaps, that fits in with pg_dump|restore, that could be
> >streamed and referenced in result sets, that would be deleted on deletion
> >of the link, and that doesn't break any existing functionality, perhaps
> >with extensions to the SQL standard. That's all I'm after :) :)
> >
> >No, I've no idea what all the issues are yet, I'm sure there's heaps - I
> >think I need to burn some rosemary and sacrifice a goat to the Tom Lane
> >alter ;) Maybe a new datatype 'bytefs', rejig the parser and all the rest.
> >Or, is large binary data really the province of an RDBMS?
> >
> >Cheers,
> >Joe
> >
> >PS. Someone raised the PG-Java procedural language the other day. When
> >that idea comes to fruition it will be an Incredibly Great Thing(TM)....
> >but then I thought about about the separation of data and logic, and
> >thought that it's overkill for the database to do that. Thoughts? I was
> >thinking maybe a pseudo
> >
> >PPS. We need an RFC or a good MIME type that stops digital signatures
> >interfering with automatic appended footers. Outhouse is causing me grief
> >with this list.
> >
> > > Joe, Anders,
> > >
> > > Please keep this discussion online. The more people we have looking at a
> > > possible solution, the better.
> > >
> > > Dave
> >[snip]
> > > > > Instead of streaming large data into the
> > > > > backend, the client reads them into memory before sending them.
> > > > > It seems the LO parts of the driver has a different solution, that
> > > > > streams the data directly.
> > > > > It would take some major reworking of the JDBC driver to fix
> > > this. I've
> > > > > looked at it, and concluded that it is possible, but that it
> > > was too big
> > > > > for me to fix.
> >
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 5: Have you checked our extensive FAQ?
> >
> >http://www.postgresql.org/users-lounge/docs/faq.html
>
>


From: Barry Lind <barry(at)xythos(dot)com>
To: Joe Shevland <jshevland(at)j-elite(dot)com>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 17:02:01
Message-ID: 3CBEFC09.1000203@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Joe,

I think at least initially things can be a lot simpler:

Joe Shevland wrote:
>
> * Using tools.jar to parse and compile the function source code

I would suggest that we just follow the lead of C functions. The
database doesn't compile your C source when you register a C function,
you just tell the database where the precompiled code is and what the
entry point is. To start with, I think the java version should do the
same. Tell the server the statuc class.method to run, but leave the
compiling and loading the classes to the user.

> * One JVM per PostgreSQL server (configurable via postgresql.conf)

Yes.

> * One classloader per database to keep things clean and separate

Sure.

> * JSP-like compilation - functions compiled when there is a change in the code, or first created. Otherwise remains memory resident as a class instance, and/or is created on server startup.

Overkill (at least initially) see my comments above.

> * Communication would be via JNI (Java<->C): the JVM would be assumed to be resident on the same server... unless we'd want to look at load balancing, again I keep thinking of the boundaries of what the db should do.

Should reuse the FE/BE protocol, and then much of the work is done in
the existing jdbc driver code

> * Mapping between Java and PG types handled by a central thing

Already handled by the existing FE/BE code in both jdbc and the server.

>
> Wading further and further into deep water here I think,
>
> Cheers,
> Joe
>

thanks,
--Barry


From: Barry Lind <barry(at)xythos(dot)com>
To: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
Cc: Joe Shevland <jshevland(at)j-elite(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 18:11:32
Message-ID: 3CBF0C54.9030903@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

Anders,

This is a great idea. This would certainly solve the memory problem for
inserts. Selects on the other hand wouldn't be helped since you need to
read the entire value first before you can get to the rest of the values.

thanks,
--Barry

Anders Bengtsson wrote:
> On 2002-04-18 Joe Shevland wrote:
>
>
>>I figured that the differences in the types and how the backend
>>handles them precludes actually trying to do anything to optimise the
>>behaviour in the client interface. For example, can I use the socket
>>connection when performing an update operation to stream binary data to
>>the backend, or does it expect the whole statement to be character data
>>sent at the one time?
>
>
> Hi Joe!
>
> It's only possible to send the whole statement, data and query, as
> character strings.
> Today when you add a stream parameter to a PreparedStatement it
> immediately reads the stream into a String. Then it concatenates the
> query and the parameters into an even larger String, which is scanned
> for escape codes and such (creating yet another huge String!). Finally
> it's all written to the backend.
>
> A better way would be to keep the stream parameter stored as a
> InputStream until the query is executed, and then stream it directly to
> the backend. This requires some changes in how the PreparedStatement
> class deals with parameters and how the PreparedStatement is fed to the
> backend. The escape scanning will have to be rewritten as a filtering
> stream, etc.
>
> It would imply a change in semantics of a PreparedStatement, that it
> could only be executed one time once the parameters are set. But I
> wouldn't be surprised if that is what the JDBC spec. says anyway.
>
> (Today's internal interface in Connection assumes a String to be
> executed. Perhaps a new interface "Query" with a "StringQuery" class to
> encapsulate simple string queries and something for prepared queries
> could be used. But that's only my thoughs on how to implement it).
>
> /Anders
>


From: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
To: Barry Lind <barry(at)xythos(dot)com>
Cc: Joe Shevland <jshevland(at)j-elite(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-18 18:31:53
Message-ID: 1019154717.1814.44.camel@spinoza
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

On 2002-04-18 Barry Lind wrote:

> This is a great idea. This would certainly solve the memory problem for
> inserts. Selects on the other hand wouldn't be helped since you need to
> read the entire value first before you can get to the rest of the values.

There probably is a need for a complete rework of how values in general,
both parameters and results, are handled in the FE/BE protocol.
One thing that has bothered me is how the character encoding of text
data affects the encoding of the query. I don't think it has ever caused
any problems, but it feels strange that it does.

/Anders
--

A n d e r s B e n g t s s o n | ndrsbngtssn(at)yahoo(dot)se
Stockholm, Sweden |

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
To: Barry Lind <barry(at)xythos(dot)com>
Cc: Joe Shevland <jshevland(at)j-elite(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-19 12:39:03
Message-ID: 1019219950.1822.5.camel@spinoza
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

On 2002-04-18 Barry Lind wrote:

> This is a great idea. This would certainly solve the memory problem for
> inserts. Selects on the other hand wouldn't be helped since you need to
> read the entire value first before you can get to the rest of the values.

(my second reply to the same message)

I just found this in the JDBC specification:

"For maximum portability, columns within a row should be read in
left-to-right order, and each column should only be read once. This
reflects implementation limitations in some underlying database
protocols."

This means that it would be possible to stream responses too, even with
the current protocol. But it's very impractical for the user, so it's
not a very good solution.

--

A n d e r s B e n g t s s o n | ndrsbngtssn(at)yahoo(dot)se
Stockholm, Sweden |

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
To: "Anders Bengtsson" <ndrsbngtssn(at)yahoo(dot)se>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-19 13:08:28
Message-ID: 00d701c1e7a3$4c690100$9b0e32d4@sirma.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc


----- Original Message -----
From: "Anders Bengtsson"

>
> This means that it would be possible to stream responses too, even with
> the current protocol. But it's very impractical for the user, so it's
> not a very good solution.
>

why is that?

the Oracle driver demands that columns be read in left-to-right order too,
without skipping a column

I'm also not sure that many people need executing a statement more than once
with the *same* parameter values

thanx,

Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "


From: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
To: Marin Dimitrov <marin(dot)dimitrov(at)sirma(dot)bg>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-19 16:49:05
Message-ID: 1019234948.2038.6.camel@spinoza
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

On 2002-04-19 Marin Dimitrov wrote:

> > This means that it would be possible to stream responses too, even with
> > the current protocol. But it's very impractical for the user, so it's
> > not a very good solution.
> >
>
> why is that?
>
> the Oracle driver demands that columns be read in left-to-right order too,
> without skipping a column

I guess I under-estimated the pain treshold for impractical features,
then. ;-)

> I'm also not sure that many people need executing a statement more than once
> with the *same* parameter values

Yes, the only thing to worry about here is standards compliance. It's
certainly a reasonable demand for the user to set up the parameters
again if they for some reason want to repeat a query.

/Anders
--

A n d e r s B e n g t s s o n | ndrsbngtssn(at)yahoo(dot)se
Stockholm, Sweden |

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From: Barry Lind <barry(at)xythos(dot)com>
To: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>
Cc: Joe Shevland <jshevland(at)j-elite(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-19 18:19:43
Message-ID: 3CC05FBF.2050907@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

It may be possible to stream the responces but certainly is difficult.
Consider the following code:

PreparedStatement l_stat = l_conn.PrepareStatement("select a,b from foo");
ResultSet l_rset = l_stat.executeQuery();
while (l_rset.next()) {
PreparedStatement l_stat2 = l_conn.prepareStatement("select y, z from
bar where y = ?");
l_stat2.setInt(1, l_rset.getInt(1));
ResultSet l_rset2 = l_stat2.executeQuery();
while l_rset2.next() {
//do something useful
}
}

If the first first query doesn't fetch all of the data the second nested
query can't use the socket to get its data. So to do this you would
need to code the driver to pull all of the data into memory when the
socket was needed by some other query.

--Barry

Anders Bengtsson wrote:
> On 2002-04-18 Barry Lind wrote:
>
>
>>This is a great idea. This would certainly solve the memory problem for
>>inserts. Selects on the other hand wouldn't be helped since you need to
>>read the entire value first before you can get to the rest of the values.
>
>
> (my second reply to the same message)
>
> I just found this in the JDBC specification:
>
> "For maximum portability, columns within a row should be read in
> left-to-right order, and each column should only be read once. This
> reflects implementation limitations in some underlying database
> protocols."
>
> This means that it would be possible to stream responses too, even with
> the current protocol. But it's very impractical for the user, so it's
> not a very good solution.
>


From: Barry Lind <barry(at)xythos(dot)com>
To: Marin Dimitrov <marin(dot)dimitrov(at)sirma(dot)bg>
Cc: Anders Bengtsson <ndrsbngtssn(at)yahoo(dot)se>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: BLOB performance test FYI
Date: 2002-04-19 18:21:45
Message-ID: 3CC06039.9020306@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc

> the Oracle driver demands that columns be read in left-to-right order
too,
> without skipping a column

This isn't true. My code does not always read data in column order and
it runs fine on Oracle.

--Barry


From: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
To: "Barry Lind" <barry(at)xythos(dot)com>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: BLOB performance test FYI
Date: 2002-04-19 18:40:58
Message-ID: 000b01c1e7d1$bf5b3010$9b0e32d4@sirma.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-jdbc


----- Original Message -----
From: "Barry Lind"

> > the Oracle driver demands that columns be read in left-to-right order
> too,
> > without skipping a column
>
> This isn't true. My code does not always read data in column order and
> it runs fine on Oracle.
>

does your code access LONG columns?

the docs say:

"If your query selects multiple columns and one of the columns contains a
data
stream, then the contents of the columns following the stream column are not
available until the stream has been read, and the stream column is no longer
available once any following column is read. Any attempt to read a column
beyond
a streaming column closes the streaming column."

and (there is an example code skipped):

" - Call the stream column in SELECT-list order.

If your query selects multiple columns, the database sends each row as a set
of
bytes representing the columns in the SELECT order. If one of the columns
contains stream data, the database sends the entire data stream before
proceeding to the next column.
If you do not use the SELECT-list order to access data, then you can lose
the
stream data. That is, if you bypass the stream data column and access data
in a
column that follows it, the stream data will be lost. For example, if you
try to
access the data for the NUMBER column before reading the data from the
stream
data column, the JDBC driver first reads then discards the streaming data
automatically. This can be very inefficient if the LONG column contains a
large
amount of data."

Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "