Re: JDBC and processing large numbers of rows

From: Carlos Barroso <miguel(dot)barroso(at)mail(dot)pt>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Can't insert more than 80 registers!?
Date: 2004-05-11 10:35:21
Message-ID: 20040511103520.1ADC6D1B1EE@svr1.postgresql.org
Lists: pgsql-jdbc

Hi! My first post.
I have a problem that's driving me crazy. I'm going to itemize what I used
to give a better understanding.

Tools used:
-----------
Postgres 7.4.2
JDBC

Drivers tested (all downloaded from Postgres site):
---------------------------------------------------
pg74.1jdbc2.jar
pg74.213.jdbc2.jar
pg74.213.jdbc3.jar

Table (DDL):
------------
CREATE TABLE PLANO_ENSAIO (
id INT8 NOT NULL
, ensaio_fk INT8 NOT NULL
, op_fk INT8 NOT NULL
, data_hora TIMESTAMP(10) NOT NULL
, estado CHAR(1) NOT NULL
, user_id CHAR(10) NOT NULL
, dt_hr TIMESTAMP(10) NOT NULL
, PRIMARY KEY (id)
, CONSTRAINT plano_ensaio_fk FOREIGN KEY (ensaio_fk) REFERENCES ENSAIO
(id)
, CONSTRAINT plano_ensaio_op_fk FOREIGN KEY (op_fk) REFERENCES OP (id)
);

NOTE:
I compiled Postgres server without any tweaking.

Here's the testing code:

Class.forName("org.postgresql.Driver");
Connection conn =
    DriverManager.getConnection("jdbc:postgresql://localhost:6543/dbdev",
        "developer", "xpto");

Statement st = null;
for (int i = 1; i <= 90; i++) {
    st = conn.createStatement();
    st.executeUpdate("INSERT INTO plano_ensaio(id,ensaio_fk,op_fk,data_hora,estado,user_id,dt_hr)"
        + " VALUES (" + i + ",1,1,'2004-04-04 10:11:11','A','mike','2004-05-05 05:55:55')");
}

st.close();
conn.close();

Results:
The code above doesn't give any error. The BIG problem is that it's only
inserting 80 registers, NOT 90!?
I've tried everything I know and I can't get it working.
I thought it could be because of some WAL configuration, so I increased the
"wal_buffers" parameter, but had no success.

Can someone please help me? I don't know what to do.


From: Paul Thomas <paul(at)tmsl(dot)demon(dot)co(dot)uk>
To: Carlos Barroso <miguel(dot)barroso(at)mail(dot)pt>
Cc: "pgsql-jdbc (at) postgresql (dot) org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Can't insert more than 80 registers!?
Date: 2004-05-11 11:40:44
Message-ID: 20040511124044.C25409@bacon
Lists: pgsql-jdbc


On 11/05/2004 11:35 Carlos Barroso wrote:
> Here's the testing code:
>
> Class.forName("org.postgresql.Driver");
> Connection conn =
> DriverManager.getConnection("jdbc:postgresql://localhost:6543/dbdev",
> "developer", "xpto");
>
> Statement st = null;
> for (int i = 1; i <= 90; i++) {
>     st = conn.createStatement();
>     st.executeUpdate("INSERT INTO plano_ensaio(id,ensaio_fk,op_fk,data_hora,estado,user_id,dt_hr)"
>         + " VALUES (" + i + ",1,1,'2004-04-04 10:11:11','A','mike','2004-05-05 05:55:55')");
> }
>
> st.close();
> conn.close();
>
> Results:
> The code above doesn't give any error. The BIG problem is that it's only
> inserting 80 registers, NOT 90!?
> I've tried everything I know and I can't get it working.
> I thought it could be because of some WAL configuration, so I increased the
> "wal_buffers" parameter, but had no success.
>
> Can someone please help me? I don't know what to do.

I don't know what the exact problem is, but you've got a resource leak:
your st.close() should really be inside your for() loop.
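
A minimal sketch of that fix, closing each Statement before the next one is
created (the try/finally wrapper here is illustrative, not from the original
post):

Statement st = null;
for (int i = 1; i <= 90; i++) {
    st = conn.createStatement();
    try {
        st.executeUpdate("INSERT INTO plano_ensaio(id,ensaio_fk,op_fk,data_hora,estado,user_id,dt_hr)"
            + " VALUES (" + i + ",1,1,'2004-04-04 10:11:11','A','mike','2004-05-05 05:55:55')");
    } finally {
        st.close(); // no leaked statements, even if an insert throws
    }
}
conn.close();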

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for Business             |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Carlos Barroso <miguel(dot)barroso(at)mail(dot)pt>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Can't insert more than 80 registers!?
Date: 2004-05-11 12:47:17
Message-ID: 40A0CB55.5060400@opencloud.com
Lists: pgsql-jdbc

Carlos Barroso wrote:

> Results:
> The code above doesn't give any error. The BIG problem is that it's only
> inserting 80 registers, NOT 90!?
> I've tried everything I know and I can't get it working.

I just tested this schema & code against 7.4.2 and pg74.213.jdbc3.jar.
It works as expected, inserting 90 rows. The only difference in schema
was that I dropped the foreign key constraints as you didn't provide DDL
or data for the referenced tables.

Here's what I did:

> oliver(at)flood:~$ sandbox-7.4.2/bin/psql -p 5742 test
> [... startup banner ...]
> test=> CREATE TABLE PLANO_ENSAIO (
> test(> id INT8 NOT NULL
> test(> , ensaio_fk INT8 NOT NULL
> test(> , op_fk INT8 NOT NULL
> test(> , data_hora TIMESTAMP(10) NOT NULL
> test(> , estado CHAR(1) NOT NULL
> test(> , user_id CHAR(10) NOT NULL
> test(> , dt_hr TIMESTAMP(10) NOT NULL
> test(> , PRIMARY KEY (id) );
> WARNING: TIMESTAMP(10) precision reduced to maximum allowed, 6
> WARNING: TIMESTAMP(10) precision reduced to maximum allowed, 6
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "plano_ensaio_pkey" for table "plano_ensaio"
> CREATE TABLE
> test=> \q
>
> oliver(at)flood:~$ cat Test.java
> import java.sql.*;
>
> public class Test {
> public static void main(String[] args) throws Exception {
> Class.forName("org.postgresql.Driver");
> Connection conn =
> DriverManager.getConnection("jdbc:postgresql://localhost:5742/test",
> "oliver", "oliver");
>
> Statement st = null;
> for(int i = 1; i <= 90; i++) {
> st = conn.createStatement();
> st.executeUpdate("INSERT INTO plano_ensaio(id,ensaio_fk,op_fk,data_hora,estado,user_id,dt_hr) VALUES (" + i + ",1,1,'2004-04-04 10:11:11','A','mike','2004-05-05 05:55:55')");
> }
>
> st.close();
> conn.close();
> }
> }
>
> oliver(at)flood:~$ javac -classpath pg74.213.jdbc3.jar Test.java
> oliver(at)flood:~$ java -classpath pg74.213.jdbc3.jar:. Test
> oliver(at)flood:~$ sandbox-7.4.2/bin/psql -p 5742 test
> [... startup banner ...]
> test=> select count(*) from plano_ensaio;
> count
> -------
> 90
> (1 row)
>

Can you provide something similar showing exactly what you're doing?
Obviously there's something different between our setups.

I'd also check the server logs for any errors; perhaps something is
masking a failure to insert.

Finally, if you do the same inserts by hand via psql, what happens?

-O


From: "David Wall" <d(dot)wall(at)computer(dot)org>
To: <pgsql-jdbc(at)postgresql(dot)org>
Subject: JDBC and processing large numbers of rows
Date: 2004-05-11 15:37:14
Message-ID: 024101c4376d$d567d050$3201a8c0@rasta
Lists: pgsql-jdbc

This is more towards JDBC than PG in particular, but since we support PG,
Oracle and eventually other DBMSs via JDBC, I thought I'd see if anybody has
any good ideas on how to do the following in a somewhat portable way via JDBC
(by large, I'm talking about thousands to millions of rows -- significant
amounts of data).

1) Often people need to "page through" a large result set. Some DBs support
the LIMIT/OFFSET construct of SELECT, and while not super efficient, at
least it only returns the window of the result set that is of interest. How
do most people handle such paging through of data? I've heard of people
caching the entire result set, but that's not practical because of the
memory overhead and, with the web, not even knowing when to get rid of the
resultset. Do people use work tables or other schemes to "pre-paginate" the
results and store them in a temporary table? (A sketch of the LIMIT/OFFSET
approach follows below.)
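
A minimal sketch of this LIMIT/OFFSET style of paging in plain JDBC; the
"items" table, its columns, and the page variables are hypothetical:

int pageSize = 50;
int pageNumber = 0; // zero-based page index
// Fetch one page of rows; ORDER BY is needed for stable page boundaries.
// This is PostgreSQL syntax -- Oracle would need a ROWNUM-based query.
PreparedStatement ps = conn.prepareStatement(
    "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?");
ps.setInt(1, pageSize);
ps.setInt(2, pageSize * pageNumber);
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    // render one row of the current page
}
rs.close();
ps.close();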

2) How do people process many records in a large resultset? For example,
scanning through a list of rows and then doing some processing based on
those rows, and then perhaps updating the rows to indicate that they've been
processed.

3) How do people use cursors in JDBC? Being able to FETCH seems like a nice
way to handle question #2 above in a batch program, since only a subset of
rows needs to be retrieved from the db at a time. Cursors probably don't
work for question #1 above since keeping a transaction alive across page
views is generally frowned upon and even hard to accomplish since it means
locking up a connection to the db for each paging user.

Any pointers would be appreciated.

Thanks,
David


From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: David Wall <d(dot)wall(at)computer(dot)org>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-11 19:56:34
Message-ID: 1084305394.1534.121.camel@localhost.localdomain
Lists: pgsql-jdbc

David,
Use cursors to page through really large result sets

Dave

On Tue, 2004-05-11 at 11:37, David Wall wrote:
> This is more towards JDBC than PG in particular, but since we support PG,
> Oracle and eventually other DBMSs via JDBC, I thought I'd see if anybody has
> any good ideas on how to do the following in a somewhat portable way via JDBC
> (by large, I'm talking about thousands to millions of rows -- significant
> amounts of data).
>
> 1) Often people need to "page through" a large result set. Some DBs support
> the LIMIT/OFFSET construct of SELECT, and while not super efficient, at
> least it only returns the window of the result set that is of interest. How
> do most people handle such paging through of data? I've heard of people
> caching the entire result set, but that's not practical because of the
> memory overhead and, with the web, not even knowing when to get rid of the
> resultset. Do people use work tables or other schemes to "pre-paginate" the
> results and store them in a temporary table?
>
>
> 2) How do people process many records in a large resultset? For example,
> scanning through a list of rows and then doing some processing based on
> those rows, and then perhaps updating the rows to indicate that they've been
> processed.
>
>
> 3) How do people use cursors in JDBC? Being able to FETCH seems like a nice
> way to handle question #2 above in a batch program, since only a subset of
> rows needs to be retrieved from the db at a time. Cursors probably don't
> work for question #1 above since keeping a transaction alive across page
> views is generally frowned upon and even hard to accomplish since it means
> locking up a connection to the db for each paging user.
>
>
> Any pointers would be appreciated.
>
> Thanks,
> David
--
Dave Cramer
519 939 0336
ICQ # 14675561


From: "David Wall" <d(dot)wall(at)computer(dot)org>
To: <pg(at)fastcrypt(dot)com>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-11 20:32:29
Message-ID: 036601c43797$4312ee40$3201a8c0@rasta
Lists: pgsql-jdbc

> Use cursors to page through really large result sets

Well, I've thought about that, but that just led me to my 3rd question in my
previous inquiry:

> > 3) How do people use cursors in JDBC? Being able to FETCH seems like a
> > nice way to handle question #2 above in a batch program, since only a
> > subset of rows needs to be retrieved from the db at a time. Cursors
> > probably don't work for question #1 above since keeping a transaction
> > alive across page views is generally frowned upon and even hard to
> > accomplish since it means locking up a connection to the db for each
> > paging user.

The question for me is how do you portably use cursors so that you can work
with Oracle and PG seamlessly? I presume there might be some (hopefully)
slight variations, like there are with BLOBs, but it would be nice if using
cursors was standardized enough to make it work using standard JDBC.

It seems that the issues are with defining a cursor, executing it, fetching
against it, then releasing it when done. Is there a standard way to do this?
Any examples?

Thanks,
David


From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: David Wall <d(dot)wall(at)computer(dot)org>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-11 22:51:08
Message-ID: 1084315868.1536.132.camel@localhost.localdomain
Lists: pgsql-jdbc

Well, if all else fails you may have to write a wrapper around them to
deal with the discrepancies between Oracle and Postgres.

One thing, though: be warned that holdable cursors in Postgres have to be
materialized, so you may end up running out of server memory. This means
that you need to be inside a transaction to get a non-holdable cursor.
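
A minimal sketch of that non-holdable pattern, assuming a hypothetical
"mytable" (since the cursor only exists inside the enclosing transaction,
autocommit must be off):

conn.setAutoCommit(false);
Statement st = conn.createStatement();
st.execute("DECLARE mycursor CURSOR FOR SELECT a, b FROM mytable");
boolean more = true;
while (more) {
    // pull 100 rows per round trip until the cursor is exhausted
    ResultSet rs = st.executeQuery("FETCH FORWARD 100 FROM mycursor");
    more = false;
    while (rs.next()) {
        more = true;
        // process one row here
    }
    rs.close();
}
st.execute("CLOSE mycursor");
conn.commit(); // the cursor would vanish at commit anyway
st.close();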

--dc--

On Tue, 2004-05-11 at 16:32, David Wall wrote:
> > Use cursors to page through really large result sets
>
> Well, I've thought about that, but that just led me to my 3rd question in my
> previous inquiry:
>
> > > 3) How do people use cursors in JDBC? Being able to FETCH seems like a
> > > nice way to handle question #2 above in a batch program, since only a
> > > subset of rows needs to be retrieved from the db at a time. Cursors
> > > probably don't work for question #1 above since keeping a transaction
> > > alive across page views is generally frowned upon and even hard to
> > > accomplish since it means locking up a connection to the db for each
> > > paging user.
>
> The question for me is how do you portably use cursors so that you can work
> with Oracle and PG seamlessly? I presume there might be some (hopefully)
> slight variations, like there are with BLOBs, but it would be nice if using
> cursors was standardized enough to make it work using standard JDBC.
>
> It seems that the issues are with defining a cursor, executing it, fetching
> against it, then releasing it when done. Is there a standard way to do this?
> Any examples?
>
> Thanks,
> David
--
Dave Cramer
519 939 0336
ICQ # 14675561


From: "David Wall" <d(dot)wall(at)computer(dot)org>
To: <pg(at)fastcrypt(dot)com>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 00:14:29
Message-ID: 03f401c437b6$1bb615b0$3201a8c0@rasta
Lists: pgsql-jdbc

Thanks, Dave. Does anybody have any simple examples of the series of JDBC
calls used to declare, open, fetch and close a cursor in PG? In Oracle? I
know this is a PG list, so if no Oracle examples, can anybody at least
confirm that using cursors with Oracle and standard JDBC is possible?

There's nothing like having to write custom code to implement what Java
purports to be write once, run anywhere! It seems that the JDBC spec would
have to be severely lacking if you can't do something as simple (and old) as
use cursors in a standard way.

From what little I can gather, it seems that in PG, I'd do something like:

ps = connection.prepareStatement("DECLARE mycursor CURSOR FOR SELECT a,b
FROM mytable;");
ps.execute();
ps = connection.prepareStatement("FETCH 100 FROM mycursor;");
ResultSet rs = ps.executeQuery();
...process the resultset....possibly doing more FETCHes and getting more
resultsets...
ps = connect.prepareStatement("CLOSE mycursor;"); // is that needed, or
will it close on commit?
connection.commit();

Is that even close?

In Oracle, this seems even more questionable because the FETCH semantics
appear to want to use host variables, so I'm not even sure what the FETCH
statement would look like to get the data back in a ResultSet.

ps = connection.prepareStatement("DECLARE CURSOR mycursor FOR SELECT a,b
FROM mytable; END;");
ps.execute();
ps = connection.prepareStatement("FOR 100 FETCH mycursor [INTO????];");
ResultSet rs = ps.executeQuery();
...process the resultset....possibly doing more FETCHes and getting more
resultsets...
ps = connect.prepareStatement("CLOSE mycursor;"); // is that needed, or
will it close on commit?
connection.commit();

Does anybody out there have real experience doing any of this?

Thanks,
David

----- Original Message -----
From: "Dave Cramer" <pg(at)fastcrypt(dot)com>
To: "David Wall" <d(dot)wall(at)computer(dot)org>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Sent: Tuesday, May 11, 2004 3:51 PM
Subject: Re: [JDBC] JDBC and processing large numbers of rows

> Well, if all else fails you may have to write a wrapper around them to
> deal with the discrepancies between Oracle and Postgres.
>
> One thing, though: be warned that holdable cursors in Postgres have to be
> materialized, so you may end up running out of server memory. This means
> that you need to be inside a transaction to get a non-holdable cursor.
>
> --dc--
>
> On Tue, 2004-05-11 at 16:32, David Wall wrote:
> > > Use cursors to page through really large result sets
> >
> > Well, I've thought about that, but that just led me to my 3rd question
> > in my previous inquiry:
> >
> > > > 3) How do people use cursors in JDBC? Being able to FETCH seems like a
> > > > nice way to handle question #2 above in a batch program, since only a
> > > > subset of rows needs to be retrieved from the db at a time. Cursors
> > > > probably don't work for question #1 above since keeping a transaction
> > > > alive across page views is generally frowned upon and even hard to
> > > > accomplish since it means locking up a connection to the db for each
> > > > paging user.
> >
> > The question for me is how do you portably use cursors so that you can
> > work with Oracle and PG seamlessly? I presume there might be some
> > (hopefully) slight variations, like there are with BLOBs, but it would be
> > nice if using cursors was standardized enough to make it work using
> > standard JDBC.
> >
> > It seems that the issues are with defining a cursor, executing it,
> > fetching against it, then releasing it when done. Is there a standard way
> > to do this? Any examples?
> >
> > Thanks,
> > David
> --
> Dave Cramer
> 519 939 0336
> ICQ # 14675561


From: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
To: David Wall <d(dot)wall(at)computer(dot)org>
Cc: pg(at)fastcrypt(dot)com, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 00:55:44
Message-ID: 40A17610.7040600@earthlink.net
Lists: pgsql-jdbc

David,

If you are simply trying to limit the amount of data that comes over the
wire so as not to blow out the Java process, do the following:

Connection myConnection = ...; // e.g. obtained from your connection pool
myConnection.setAutoCommit(false); // <---- Make sure you do this.

PreparedStatement ps = myConnection.prepareStatement("SELECT a, b FROM mytable");
ps.setFetchSize(5000);

ResultSet rs = ps.executeQuery();

This will bring back the result set in 5000-row chunks.

Make sure you do not end your SQL with a ; (semicolon) as that will
cause the fetching part not to work. Don't know why, but it does. :-)

--sean

David Wall wrote:

>Thanks, Dave. Does anybody have any simple examples of the series of JDBC
>calls used to declare, open, fetch and close a cursor in PG? In Oracle? I
>know this is a PG list, so if no Oracle examples, can anybody at least
>confirm that using cursors with Oracle and standard JDBC is possible?
>
>There's nothing like having to write custom code to implement what Java
>purports to be write once, run anywhere! It seems that the JDBC spec would
>have to be severely lacking if you can't do something as simple (and old) as
>use cursors in a standard way.
>
>From what little I can gather, it seems that in PG, I'd do something like:
>
>ps = connection.prepareStatement("DECLARE mycursor CURSOR FOR SELECT a,b FROM mytable;");
>ps.execute();
>ps = connection.prepareStatement("FETCH 100 FROM mycursor;");
>ResultSet rs = ps.executeQuery();
>...process the resultset....possibly doing more FETCHes and getting more resultsets...
>ps = connection.prepareStatement("CLOSE mycursor;"); // is that needed, or will it close on commit?
>connection.commit();
>
>Is that even close?
>
>In Oracle, this seems even more questionable because the FETCH semantics
>appear to want to use host variables, so I'm not even sure what the FETCH
>statement would look like to get the data back in a ResultSet.
>
>ps = connection.prepareStatement("DECLARE CURSOR mycursor FOR SELECT a,b FROM mytable; END;");
>ps.execute();
>ps = connection.prepareStatement("FOR 100 FETCH mycursor [INTO????];");
>ResultSet rs = ps.executeQuery();
>...process the resultset....possibly doing more FETCHes and getting more resultsets...
>ps = connection.prepareStatement("CLOSE mycursor;"); // is that needed, or will it close on commit?
>connection.commit();
>
>
>Does anybody out there have real experience doing any of this?
>
>Thanks,
>David


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: David Wall <d(dot)wall(at)computer(dot)org>
Cc: pg(at)fastcrypt(dot)com, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 01:07:21
Message-ID: 40A178C9.5040903@opencloud.com
Lists: pgsql-jdbc

David Wall wrote:
> Thanks, Dave. Does anybody have any simple examples of the series of JDBC
> calls used to declare, open, fetch and close a cursor in PG? In Oracle? I
> know this is a PG list, so if no Oracle examples, can anybody at least
> confirm that using cursors with Oracle and standard JDBC is possible?
>
> There's nothing like having to write custom code to implement what Java
> purports to be write once, run anywhere! It seems that the JDBC spec would
> have to be severely lacking if you can't do something as simple (and old) as
> use cursors in a standard way.

I don't know -- that seems more a criticism of SQL than of JDBC. I don't
think DECLARE as a query statement exists in the SQL specs at all? The
DECLARE docs say:

> The SQL standard only makes provisions for cursors in embedded SQL. The
> PostgreSQL server does not implement an OPEN statement for cursors; a
> cursor is considered to be open when it is declared. However, ECPG, the
> embedded SQL preprocessor for PostgreSQL, supports the standard SQL
> cursor conventions, including those involving DECLARE and OPEN
> statements.

Anyway, back to your original question ..

JDBC *does* provide a standard way of using cursors to page through
result data. It's called ResultSet. Take a look at the ResultSet row
movement primitives -- they look suspiciously like cursor movement
primitives, don't they?

I'd suggest using an appropriate resultset type (SCROLLABLE_* or
FORWARD_ONLY, depending on your access patterns) and use setFetchSize()
and setFetchDirection() to hint to the driver about what you're doing.
Then just run your unmodified, cursor-less query. With a good driver
implementation you should get paging of the resultset transparently.
This is why all the different resultset types and fetch hints are there
in the first place..

The current postgresql driver will page results from the backend if you
use FORWARD_ONLY, a non-zero fetchsize, and autocommit off. It isn't
quite there yet for scrollable resultsets, but it's close (see the
recent patches from Andy Zeneski). Note that using a scrollable
resultset can be more expensive as the backend can only provide
scrollable cursors for free in some cases -- in other cases, it has to
materialize the whole resultset. This is a cost you'll end up paying
regardless of whether you're using cursors via ResultSet, cursor
manipulation directly, or LIMIT/OFFSET tricks (at least assuming you
actually want to (eventually) process all the data from a query and not
just a subset).
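
A minimal sketch of that combination against the current driver, assuming a
hypothetical "mytable" and an already-open Connection conn:

conn.setAutoCommit(false); // required for the driver to use a cursor
Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                    ResultSet.CONCUR_READ_ONLY);
st.setFetchSize(1000); // non-zero: rows fetched per round trip
ResultSet rs = st.executeQuery("SELECT a, b FROM mytable");
while (rs.next()) {
    // process one row; the driver refills its buffer transparently
}
rs.close();
st.close();
conn.commit();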

Then we just need holdable resultset support (for the autocommit case --
there's a parameter in JDBC3 for controlling this as holdable cursors
aren't free, especially if you have very large result sets) and it'd be
all transparent.

I don't know how Oracle handles all this, but hiding the cursor work
inside the ResultSet seems like the Right Way to do it.

-O


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
Cc: David Wall <d(dot)wall(at)computer(dot)org>, pg(at)fastcrypt(dot)com, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 01:15:18
Message-ID: 40A17AA6.5070209@opencloud.com
Lists: pgsql-jdbc

Sean Shanny wrote:

> Make sure you do not end your SQL with a ; (semicolon) as that will
> cause the fetching part not to work. Don't know why, but it does. :-)

Technically, "SELECT foo;" is two statements, "SELECT foo" and "", as
JDBC doesn't require a statement terminator. The (simple-minded) JDBC
query parser doesn't do anything special to ignore trailing empty
statements. And you can't DECLARE a cursor that runs two statements.. so
the driver says "two statements, can't use a cursor!".

-O


From: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 01:29:06
Message-ID: 40A17DE2.4030902@earthlink.net
Lists: pgsql-jdbc

Oliver,

Thank you very much for clearing that up for me. You learn something
every day on the postgres lists...

--sean

Oliver Jowett wrote:

> Sean Shanny wrote:
>
>> Make sure you do not end your SQL with a ; (semicolon) as that will
>> cause the fetching part not to work. Don't know why, but it does. :-)
>
>
> Technically, "SELECT foo;" is two statements, "SELECT foo" and "", as
> JDBC doesn't require a statement terminator. The (simple-minded) JDBC
> query parser doesn't do anything special to ignore trailing empty
> statements. And you can't DECLARE a cursor that runs two statements..
> so the driver says "two statements, can't use a cursor!".
>
> -O
>


From: "David Wall" <d(dot)wall(at)computer(dot)org>
To: "Oliver Jowett" <oliver(at)opencloud(dot)com>
Cc: <pg(at)fastcrypt(dot)com>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 03:07:12
Message-ID: 042801c437ce$422f4000$3201a8c0@rasta
Lists: pgsql-jdbc

Thanks Oliver and Sean. I was just trying to do it the hard way using plain
SQL. It's great that the ResultSet can handle this for me. I'll give that
a try.

David

----- Original Message -----
From: "Oliver Jowett" <oliver(at)opencloud(dot)com>
To: "David Wall" <d(dot)wall(at)computer(dot)org>
Cc: <pg(at)fastcrypt(dot)com>; <pgsql-jdbc(at)postgresql(dot)org>
Sent: Tuesday, May 11, 2004 6:07 PM
Subject: Re: [JDBC] JDBC and processing large numbers of rows

> David Wall wrote:
> > Thanks, Dave. Does anybody have any simple examples of the series of
> > JDBC calls used to declare, open, fetch and close a cursor in PG? In
> > Oracle? I know this is a PG list, so if no Oracle examples, can anybody
> > at least confirm that using cursors with Oracle and standard JDBC is
> > possible?
> >
> > There's nothing like having to write custom code to implement what Java
> > purports to be write once, run anywhere! It seems that the JDBC spec
> > would have to be severely lacking if you can't do something as simple
> > (and old) as use cursors in a standard way.
>
> I don't know -- that seems more a criticism of SQL than of JDBC. I don't
> think DECLARE as a query statement exists in the SQL specs at all? The
> DECLARE docs say:
>
> > The SQL standard only makes provisions for cursors in embedded SQL. The
> > PostgreSQL server does not implement an OPEN statement for cursors; a
> > cursor is considered to be open when it is declared. However, ECPG, the
> > embedded SQL preprocessor for PostgreSQL, supports the standard SQL
> > cursor conventions, including those involving DECLARE and OPEN
> > statements.
>
> Anyway, back to your original question ..
>
> JDBC *does* provide a standard way of using cursors to page through
> result data. It's called ResultSet. Take a look at the ResultSet row
> movement primitives -- they look suspiciously like cursor movement
> primitives, don't they?
>
> I'd suggest using an appropriate resultset type (SCROLLABLE_* or
> FORWARD_ONLY, depending on your access patterns) and use setFetchSize()
> and setFetchDirection() to hint to the driver about what you're doing.
> Then just run your unmodified, cursor-less query. With a good driver
> implementation you should get paging of the resultset transparently.
> This is why all the different resultset types and fetch hints are there
> in the first place..
>
> The current postgresql driver will page results from the backend if you
> use FORWARD_ONLY, a non-zero fetchsize, and autocommit off. It isn't
> quite there yet for scrollable resultsets, but it's close (see the
> recent patches from Andy Zeneski). Note that using a scrollable
> resultset can be more expensive as the backend can only provide
> scrollable cursors for free in some cases -- in other cases, it has to
> materialize the whole resultset. This is a cost you'll end up paying
> regardless of whether you're using cursors via ResultSet, cursor
> manipulation directly, or LIMIT/OFFSET tricks (at least assuming you
> actually want to (eventually) process all the data from a query and not
> just a subset).
>
> Then we just need holdable resultset support (for the autocommit case --
> there's a parameter in JDBC3 for controlling this as holdable cursors
> aren't free, especially if you have very large result sets) and it'd be
> all transparent.
>
> I don't know how Oracle handles all this, but hiding the cursor work
> inside the ResultSet seems like the Right Way to do it.
>
> -O


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: Sean Shanny <shannyconsulting(at)earthlink(dot)net>, David Wall <d(dot)wall(at)computer(dot)org>, pg(at)fastcrypt(dot)com, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 03:28:36
Message-ID: 19456.1084332516@sss.pgh.pa.us
Lists: pgsql-jdbc

Oliver Jowett <oliver(at)opencloud(dot)com> writes:
> Technically, "SELECT foo;" is two statements, "SELECT foo" and "", as
> JDBC doesn't require a statement terminator. The (simple-minded) JDBC
> query parser doesn't do anything special to ignore trailing empty
> statements.

FWIW, the backend's parser does go out of its way to discard empty
statements (a/k/a useless semicolons). I do not know how hard it would
be to teach JDBC's parser to do likewise, but it might be worth the
trouble if not too painful.

regards, tom lane


From: Guido Fiala <guido(dot)fiala(at)dka-gmbh(dot)de>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 06:37:42
Message-ID: 200405120837.42865.guido.fiala@dka-gmbh.de
Lists: pgsql-jdbc

Reading all this, I'd like to know whether all this isn't just a trade-off over
_where_ the memory is consumed?

If your JDBC client holds everything in memory - it gets an OutOfMem exception.

If your backend uses cursors - it caches the whole resultset and probably
starts swapping and gets slow (needs the memory of all users).

If you use LIMIT and OFFSET, the database has to do more to find the
data snippet, and in the worst case (last few records) it still needs the
whole resultset temporarily? (not sure here)

Is that just a "choose your poison"? At least in the first case the memory of
the client _gets_ used too, and not all the load goes to the backend; on the
other hand, most of the time the user does not really read all the data, so it
puts unnecessary load on all the hardware.

I'd really like to know what the best way to go is then...

Guido


From: Kris Jurka <books(at)ejurka(dot)com>
To: Guido Fiala <guido(dot)fiala(at)dka-gmbh(dot)de>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 10:00:49
Message-ID: Pine.BSO.4.56.0405120453560.5292@leary.csoft.net
Lists: pgsql-jdbc

On Wed, 12 May 2004, Guido Fiala wrote:

> Reading all this, I'd like to know whether all this isn't just a trade-off over
> _where_ the memory is consumed?
>
> If your backend uses cursors - it caches the whole resultset and probably
> starts swapping and gets slow (needs the memory of all users).
>

The backend spools to a file when a materialized cursor uses more than
sort_mem amount of memory. This is not quite the same as swapping as it
will consume disk bandwidth, but it won't hog memory from other
applications.

Kris Jurka


From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Guido Fiala <guido(dot)fiala(at)dka-gmbh(dot)de>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 10:56:54
Message-ID: 1084359414.1536.149.camel@localhost.localdomain
Lists: pgsql-jdbc

Guido,

No, this isn't the case: if you use cursors inside a transaction, then
you will be able to have an arbitrarily large cursor open (of any size,
AFAIK).

--dc--
On Wed, 2004-05-12 at 02:37, Guido Fiala wrote:
> Reading all this, I'd like to know whether all this isn't just a trade-off over
> _where_ the memory is consumed?
>
> If your JDBC client holds everything in memory - it gets an OutOfMem exception.
>
> If your backend uses cursors - it caches the whole resultset and probably
> starts swapping and gets slow (needs the memory of all users).
>
> If you use LIMIT and OFFSET, the database has to do more to find the
> data snippet, and in the worst case (last few records) it still needs the
> whole resultset temporarily? (not sure here)
>
> Is that just a "choose your poison"? At least in the first case the memory of
> the client _gets_ used too, and not all the load goes to the backend; on the
> other hand, most of the time the user does not really read all the data, so it
> puts unnecessary load on all the hardware.
>
> I'd really like to know what the best way to go is then...
>
> Guido
--
Dave Cramer
519 939 0336
ICQ # 14675561


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Guido Fiala <guido(dot)fiala(at)dka-gmbh(dot)de>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 11:36:41
Message-ID: 40A20C49.6090802@opencloud.com
Lists: pgsql-jdbc

Guido Fiala wrote:
> Reading all this, I'd like to know whether all this isn't just a trade-off over
> _where_ the memory is consumed?
>
> If your JDBC client holds everything in memory - it gets an OutOfMem exception.

Yes. The current driver attempts to hold the entire resultset in heap if
not using cursors. In theory, the driver could try to spill large
resultsets to disk (but see below).

> If your backend uses cursors - it caches the whole resultset and probably
> starts swapping and gets slow (needs the memory of all users).

As I understand it (break out the salt!), the backend's cursor behaviour
depends on both the query and the cursor type. For a NO SCROLL cursor,
the backend is doing no more real work than for normal query retrieval
(it's just changing *when* the rows are retrieved). For a SCROLL cursor,
the backend may or may not need additional storage depending on the
complexity of the query. When the backend does need to store the
resultset, it will spill large resultsets to disk rather than keep them
in-memory (and possibly this storage is more efficient than just storing
the raw tuples, as it could just reference the original tuples in the
main table itself -- not sure if this is how it actually works though).
For a WITH HOLD cursor, in addition to the SCROLL/NOSCROLL behaviour the
backend will preserve a copy of the resultset when it can no longer
safely derive the results directly from the database (i.e. when the
enclosing transaction commits).

The DECLARE documentation discusses some of this.
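
A small illustration of the WITH HOLD case, again assuming a hypothetical
"mytable" (the backend preserves the cursor's contents when the transaction
commits):

conn.setAutoCommit(false);
Statement st = conn.createStatement();
st.execute("DECLARE c CURSOR WITH HOLD FOR SELECT a FROM mytable");
conn.commit(); // cursor survives; its remaining rows are materialized
ResultSet rs = st.executeQuery("FETCH FORWARD 10 FROM c"); // still valid
while (rs.next()) {
    // rows remain fetchable after the commit
}
rs.close();
st.execute("CLOSE c"); // holdable cursors persist until explicitly closed
st.close();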

> If you use LIMIT and OFFSET, the database has to do more to find the
> data snippet, and in the worst case (last few records) it still needs the
> whole resultset temporarily? (not sure here)

I'd guess that in the non-zero OFFSET case the backend skips and
discards the initial rows (why would it need to store them?), i.e. it's
similar in cost to doing a DECLARE/MOVE FORWARD/FETCH FORWARD sequence.
But you end up rerunning the start of the query every time you want some
more data..

> Is that just a "choose your poison"? At least in the first case the memory of
> the client _gets_ used too, and not all the load goes to the backend; on the
> other hand, most of the time the user does not really read all the data, so it
> puts unnecessary load on all the hardware.

I lean towards using cursors. The backend can handle some cases more
efficiently than a client can, simply because it knows more about how to
generate the data and where it might already be stored. Also it seems a
bit redundant to do the same work to deal with large datasets (spill to
disk, etc) on both the client and the server.

-O


From: Guido Fiala <guido(dot)fiala(at)dka-gmbh(dot)de>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 12:31:08
Message-ID: 200405121431.08734.guido.fiala@dka-gmbh.de
Lists: pgsql-jdbc

On Wednesday, 12 May 2004 12:00, Kris Jurka wrote:
> The backend spools to a file when a materialized cursor uses more than
> sort_mem amount of memory. This is not quite the same as swapping as it
> will consume disk bandwidth, but it won't hog memory from other
> applications.

Well, that's good on one side, but from the user's side it's worse:

He will see a large drop in performance (a factor of 1000) as soon as the
database starts using disk for such things. OK - once the database is too
large to be held in memory, it is disk-bandwidth-limited anyway...


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Guido Fiala <guido(dot)fiala(at)dka-gmbh(dot)de>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 13:23:16
Message-ID: 40A22544.8000206@opencloud.com
Lists: pgsql-jdbc

Guido Fiala wrote:
> On Wednesday, 12 May 2004 12:00, Kris Jurka wrote:
>
>>The backend spools to a file when a materialized cursor uses more than
>>sort_mem amount of memory. This is not quite the same as swapping as it
>>will consume disk bandwidth, but it won't hog memory from other
>>applications.
>
>
> Well, that's good on one side, but from the user's side it's worse:
>
> He will see a large drop in performance (a factor of 1000) as soon as the
> database starts using disk for such things. OK - once the database is too
> large to be held in memory, it is disk-bandwidth-limited anyway...

What about the kernel cache? I doubt you'll see a *sudden* drop in
performance .. it'll just degrade gradually towards disk speeds as your
resultset gets larger.

-O