From: Scott Lamb <slamb(at)slamb(dot)org>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Incremental results from libpq
Date: 2005-11-09 19:36:29
Message-ID: 4D426B54-E08F-4DE9-9776-D46610CA4968@slamb.org

I'm using libpq's asynchronous interface. I'd like to do the
equivalent of setting fetchSize on the JDBC driver: get a chunk of
results, handle them, free the memory they use, and then get another
chunk, until I've handled the entire query.

I can do this at the SQL layer using "declare cursor ..." and "fetch
forward n ...", but it seems like the lower level should be able to do
this for me. It would also allow a more natural interface that (A)
doesn't make the caller take a PostgreSQL-specific declare/fetch path
for each query, and (B) can still use the JDBC-style "execute" that
doesn't care whether it's dealing with a row-returning statement.
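
For concreteness, that declare/fetch path looks roughly like this (a
sketch only: synchronous PQexec, placeholder cursor and query names,
an already-open PGconn *conn, and no real error handling):

PQclear(PQexec(conn, "BEGIN"));
PQclear(PQexec(conn, "DECLARE mycursor CURSOR FOR SELECT * FROM tab"));
for (;;)
{
    PGresult *res = PQexec(conn, "FETCH FORWARD 100 FROM mycursor");
    int n = (PQresultStatus(res) == PGRES_TUPLES_OK) ? PQntuples(res) : 0;

    /* ... handle the n rows in res ... */
    PQclear(res);             /* free this chunk before fetching the next */
    if (n == 0)
        break;                /* cursor exhausted (or an error occurred) */
}
PQclear(PQexec(conn, "CLOSE mycursor"));
PQclear(PQexec(conn, "COMMIT"));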

I see that the JDBC driver (at least in protocol version 3; I don't
care about 2) does this by passing a maximum number of rows when
sending Execute, then handling PortalSuspended and sending Execute
again. I also see that libpq never sends a maximum number of rows or
handles PortalSuspended.
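
As far as I can tell, the exchange looks roughly like this:

Client sends:  Parse, Bind, Describe, Execute (portal, max N rows)
Server sends:  ParseComplete, BindComplete, RowDescription,
               DataRow * N, PortalSuspended    <- row limit reached
Client sends:  Execute (same portal, max N rows)
Server sends:  DataRow * remaining, CommandComplete
Client sends:  Sync
Server sends:  ReadyForQuery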

Still, I think libpq should be able to do what I want. The results are
sent from the database in order. The row limit on Execute would be
necessary to keep the database from sending more than N rows, but
nothing new is needed for the client to hand over N rows as soon as it
has them.

I had been retrieving results from a query in this fashion:

while True:
    readfds = [PQsocket(conn)]
    writefds = []
    if PQflush(conn):                # output still pending; wait for writability too
        writefds = readfds
    if not PQconsumeInput(conn):     # PQconsumeInput returns 0 on failure
        error
    if not PQisBusy(conn):           # a complete result has arrived
        break
    poll(readfds, writefds)
return PQgetResult(conn)

which the documentation recommends. But PQisBusy(conn) doesn't return
false until the _entire_ resultset has been retrieved from the
server. And if I skip PQisBusy(conn) and go straight for the
PQgetResult(), it blocks internally until it can complete.

I looked inside libpq, and came up with this instead:

while True:
    readfds = [PQsocket(conn)]
    writefds = []
    if PQflush(conn):                # output still pending; wait for writability too
        writefds = readfds
    if not PQconsumeInput(conn):     # PQconsumeInput returns 0 on failure
        error
    if not PQisBusy(conn):           # query complete; fall through for the final result
        break
    if conn->result != NULL and PQntuples(conn->result) > retrieved:
        return conn->result          # new rows have accumulated; hand back a partial result
    poll(readfds, writefds)
last = True
return PQgetResult(conn)

where "retrieved" is the number of rows I've examined so far, and
"last" indicates that I shouldn't call again.

This approach is one-third right:

- It does return results incrementally; good.
- It pokes inside libpq; ugh.
- It doesn't free any memory until the whole query's done. I suppose
I could do that by changing conn->result myself, but...ugh. Is there
a better way?

--
Scott Lamb <http://www.slamb.org/>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Lamb <slamb(at)slamb(dot)org>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Incremental results from libpq
Date: 2005-11-09 21:22:08
Message-ID: 19391.1131571328@sss.pgh.pa.us

Scott Lamb <slamb(at)slamb(dot)org> writes:
> Is there a better way?

Not at the moment. It's been requested before though, so if you want to
develop a patch for libpq, have at it.

The main reason why libpq does what it does is that this way we do not
have to expose in the API the notion of a command that fails part way
through. If you support partial result fetching then you'll have to
deal with the idea that a SELECT could fail after you've already
returned some rows to the client. I am not sure that this is a huge
deal, but it definitely uglifies the API a bit. It would be a good
idea to think through exactly what clients will need to do to cope with
that fact before you start designing the API extension.

regards, tom lane


From: Scott Lamb <slamb(at)slamb(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Incremental results from libpq
Date: 2005-11-09 22:09:09
Message-ID: 84C7635D-EAF2-40CF-A71C-976077D5465A@slamb.org

On Nov 9, 2005, at 1:22 PM, Tom Lane wrote:
> Scott Lamb <slamb(at)slamb(dot)org> writes:
>> Is there a better way?
>
> Not at the moment. It's been requested before though, so if you
> want to
> develop a patch for libpq, have at it.
>
> The main reason why libpq does what it does is that this way we do not
> have to expose in the API the notion of a command that fails part way
> through. If you support partial result fetching then you'll have to
> deal with the idea that a SELECT could fail after you've already
> returned some rows to the client. I am not sure that this is a huge
> deal, but it definitely uglifies the API a bit. It would be a good
> idea to think through exactly what clients will need to do to cope
> with
> that fact before you start designing the API extension.

Cool. I think I'll get my own interface hashed out in my kludgey way,
then look at the broader need if it's a success.

My first idea, though, is to add a callback interface - "got the
RowDescription", "got a DataRow" - and make the storage of tuples
in PGresult optional. (Maybe pqAddTuple would just be the default
callback.)
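
Purely as a sketch, with invented names (nothing like this exists in
libpq today), it might look like:

#include <stdio.h>
#include <libpq-fe.h>

/* invoked once when the RowDescription arrives */
typedef void (*PQrowDescHandler)(void *arg, const PGresult *desc);

/* invoked once per DataRow; return 0 to keep the row out of the
   PGresult, nonzero to store it as usual */
typedef int (*PQdataRowHandler)(void *arg, int nfields,
                                char **values, int *lengths);

/* register handlers for the next query; with none registered, libpq
   would keep its current behavior (the internal pqAddTuple collects
   every row into the PGresult) */
void PQsetRowHandlers(PGconn *conn, PQrowDescHandler ondesc,
                      PQdataRowHandler onrow, void *arg);

/* example handler: print each row's first column instead of storing it */
static int print_first_column(void *arg, int nfields,
                              char **values, int *lengths)
{
    (void) arg; (void) lengths;
    if (nfields > 0 && values[0] != NULL)
        printf("%s\n", values[0]);
    return 0;
}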

Regards,
Scott

--
Scott Lamb <http://www.slamb.org/>


From: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Incremental results from libpq
Date: 2005-11-10 09:11:45
Message-ID: 200511101011.45702.ftm.van.vugt@foxi.nl

> >> Is there a better way?
> >
> > Not at the moment. It's been requested before though, so if you
> > want to develop a patch for libpq, have at it.

> Cool. I think I'll get my own interface hashed out in my kludgey way,
> then look at the broader need if it's a success.

Just FYI, I'm interested in this extension as well.

At the moment, the PostgreSQL driver in Trolltech's Qt fetches the
complete result of any select query, as opposed to, for example, the
Oracle driver, which fetches just the rows that are desired. As a
result, database-backed tables in the UI don't do their initial paint
as 'snappy' as they could: although they show only part of the result
set, the underlying driver is still picking up the complete set of
records from the backend.

--
Best,

Frank.


From: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
To: pgsql-interfaces(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Incremental results from libpq
Date: 2005-11-10 16:51:54
Message-ID: 200511101751.55377.ftm.van.vugt@foxi.nl

> > The main reason why libpq does what it does is that this way we do not
> > have to expose in the API the notion of a command that fails part way
> > through. If you support partial result fetching then you'll have to
> > deal with the idea that a SELECT could fail after you've already
> > returned some rows to the client.

I'm wondering, what kind of failure do you have in mind here? If I'm
informed correctly, Oracle and others generate the complete static
result set on the server side, where it stays cached until all
rows/chunks are fetched. The one failure that comes to mind in this
scenario is that the connection breaks down, but since informing the
client would then be a bit difficult, you're certainly referring to
something else ;)

If PostgreSQL were to build the complete result-set before handing over the
first fetched rows/chunks, then I understand. Is that the case? Or something
else even...?

--
Best,

Frank.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Incremental results from libpq
Date: 2005-11-10 17:03:12
Message-ID: 9994.1131642192@sss.pgh.pa.us

Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl> writes:
> The main reason why libpq does what it does is that this way we do not
> have to expose in the API the notion of a command that fails part way
> through. If you support partial result fetching then you'll have to
> deal with the idea that a SELECT could fail after you've already
> returned some rows to the client.

> I'm wondering, what kind of failure do you have in mind, here?

There are lots of possibilities, but one fairly obvious example is

SELECT 1/x FROM tab;

where the 10000'th row of tab contains x = 0. The server will detect
the zero-divide error only after it's already shipped 9999 rows to the
client. Currently, when libpq gets the error it throws away the 9999
rows and just returns an "error" PQresult to the application. If you
let libpq return some rows before it's got the whole result set, then
you need to decide what the API behavior ought to be for a failure after
a partial result set has been returned. This in turn requires a little
thought about how client apps are likely to want to cope with this
scenario, so that you don't come up with a painful-to-use API.
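
For instance, with a hypothetical partial-fetch call (the function
name is invented here, as are the application helpers), every client
loop would need roughly this shape:

PGresult *res;

while ((res = PQgetPartialResult(conn)) != NULL)
{
    if (PQresultStatus(res) == PGRES_FATAL_ERROR)
    {
        /* rows already handed to the application may now have to be
           discarded or undone by the caller */
        report_error(PQresultErrorMessage(res));
        PQclear(res);
        break;
    }
    handle_rows(res);   /* must tolerate a failure arriving later */
    PQclear(res);
}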

regards, tom lane


From: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Incremental results from libpq
Date: 2005-11-10 17:09:54
Message-ID: 200511101809.55139.ftm.van.vugt@foxi.nl

> There are lots of possibilities, but one fairly obvious example is

Got it, thanks.

So if the Oracle part is true, I guess this is a winner for PostgreSQL ;)

--
Best,

Frank.


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
Cc: pgsql-interfaces(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Incremental results from libpq
Date: 2005-11-10 19:27:18
Message-ID: 877jbgb8x5.fsf@stark.xeocode.com


Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl> writes:

> I'm wondering, what kind of failure do you have in mind here? If I'm
> informed correctly, Oracle and others generate the complete static
> result set on the server side, where it stays cached until all
> rows/chunks are fetched.

That's obviously not true. Try doing "select * from huge_table" on Oracle and
you'll see records start appearing immediately. There are applications where
huge_table could occupy hundreds of gigabytes (or petabytes) and requiring all
queries to create copies of all their result sets before proceeding would make
lots of important applications entirely infeasible.

--
greg


From: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Incremental results from libpq
Date: 2005-11-10 20:33:20
Message-ID: 200511102133.20904.ftm.van.vugt@foxi.nl

> > If I'm informed correctly, Oracle and others generate the complete
> > static result set on the server side, where it stays cached until
> > all rows/chunks are fetched.

> That's obviously not true. Try doing "select * from huge_table" on Oracle
> and you'll see records start appearing immediately.

Then it seems I haven't (been informed correctly) :)

Now, given that OCI offers some kind of asynchronous interface as
well, how are they handling an error occurring after a partial result
set has been retrieved by the client?

--
Best,

Frank.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-interfaces(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Scott Lamb <slamb(at)slamb(dot)org>
Subject: Re: Incremental results from libpq
Date: 2005-11-16 09:34:58
Message-ID: 200511161034.58561.peter_e@gmx.net

On Wednesday, 9 November 2005 22:22, Tom Lane wrote:
> The main reason why libpq does what it does is that this way we do not
> have to expose in the API the notion of a command that fails part way
> through.

I'm at LinuxWorld Frankfurt and one of the Trolltech guys came over to talk to
me about this. He opined that it would be beneficial for their purpose (in
certain cases) if the server would first compute the entire result set and
keep it in the server memory (thus eliminating potential errors of the 1/x
kind) and then ship it to the client in a way that the client would be able
to fetch it piecewise. Then, the client application could build the display
incrementally while the rest of the result set travels over the (slow) link.
Does that make sense?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-interfaces(at)postgresql(dot)org, Scott Lamb <slamb(at)slamb(dot)org>
Subject: Re: Incremental results from libpq
Date: 2005-11-16 14:24:24
Message-ID: 5329.1132151064@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On Wednesday, 9 November 2005 22:22, Tom Lane wrote:
>> The main reason why libpq does what it does is that this way we do not
>> have to expose in the API the notion of a command that fails part way
>> through.

> I'm at LinuxWorld Frankfurt and one of the Trolltech guys came over to talk to
> me about this. He opined that it would be beneficial for their purpose (in
> certain cases) if the server would first compute the entire result set and
> keep it in the server memory (thus eliminating potential errors of the 1/x
> kind) and then ship it to the client in a way that the client would be able
> to fetch it piecewise. Then, the client application could build the display
> incrementally while the rest of the result set travels over the (slow) link.
> Does that make sense?

Ick. That seems pretty horrid compared to the straight
incremental-compute-and-fetch approach. Yes, it preserves the illusion
that a SELECT is all-or-nothing, but at a very high cost, both in terms
of absolute runtime and in terms of needing a new concept in the
frontend protocol. It also doesn't solve the problem for people who
need incremental fetch because they have a result set so large they
don't want it materialized on either end of the wire. Furthermore, ISTM
that any client app that's engaging in incremental fetches really has to
deal with the failure-after-part-of-the-query-is-done problem anyway,
because there's always a risk of failures on the client side or in the
network connection. So I don't see any real gain in conceptual
simplicity from adding this feature anyway.

Note that if Trolltech really want this behavior, they can have it today
--- it's called CREATE TEMP TABLE AS SELECT. It doesn't seem attractive
enough to me to justify any further feature than that.
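
That is, roughly (placeholder names; DECLARE must be inside a
transaction block):

PQclear(PQexec(conn, "BEGIN"));
/* any 1/x-style error happens here, before the client sees any rows */
PQclear(PQexec(conn, "CREATE TEMP TABLE snap AS SELECT 1/x AS q FROM tab"));
PQclear(PQexec(conn, "DECLARE c CURSOR FOR SELECT q FROM snap"));
/* ... FETCH FORWARD n FROM c in chunks, then COMMIT ... */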

regards, tom lane


From: Scott Lamb <slamb(at)slamb(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Incremental results from libpq
Date: 2005-11-17 22:04:26
Message-ID: 6D998DAD-8132-4360-9B8D-800918577F7C@slamb.org

On Nov 9, 2005, at 2:09 PM, Scott Lamb wrote:
> Cool. I think I'll get my own interface hashed out in my kludgey
> way, then look at the broader need if it's a success.

I've done some of this. I've got the start of a new Python<->PostgreSQL
interface at <http://www.slamb.org/svn/repos/trunk/projects/apypg/>.

Currently, it just "peeks" at connection->result after polling, grabs
a bunch of results and handles them, then empties the resultset
kludgily. (If it has a different idea of PGRESULT_DATA_BLOCKSIZE than
libpq, it'll probably crash.)

Aside from the nastiness of playing with libpq's internal data
structures, it doesn't handle the case Tom mentioned particularly
well. It will handle an undetermined fraction of the rows before the
error, then error out. I think it should handle _all_ the rows it
gets before the error consistently.

> My first idea, though, is to add a callback interface - "got the
> RowDescription", "got a DataRow" - and make the storage of tuples
> in PGresult optional. (Maybe pqAddTuple would just be the default
> callback.)

Actually, I'm not sure about this anymore. There's a complication -
if I want to do something fancy while handling a row - a Python
exception, C++ exception, thread cancellation - I don't think there's
any good way to do that in a callback structure.

Another complication is the limitation of one active result set. libpq
needs to be extended to support portals other than the unnamed one
('') in order to handle other statements while it iterates over an
incremental result set.

--
Scott Lamb <http://www.slamb.org/>