Re: C libpq frontend library fetchsize

Lists: pgsql-hackers
From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: C libpq frontend library fetchsize
Date: 2010-02-26 20:28:01
Message-ID: 4B882ED1.9060603@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello list,

I'm wondering if there would be community support for adding using the
execute message with a rownum > 0 in the c libpq client library, as it
is used by the jdbc driver with setFetchSize.

kind regards,
Yeb Havinga


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: C libpq frontend library fetchsize
Date: 2010-02-28 01:30:17
Message-ID: 603c8f071002271730j784f173eu84eef212a3060b5f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 3:28 PM, Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:
> I'm wondering if there would be community support for adding using the
> execute message with a rownum > 0 in the c libpq client library, as it is
> used by the jdbc driver with setFetchSize.

Not sure I follow what you're asking... what would the new/changed
function signature be?

...Robert


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: C libpq frontend library fetchsize
Date: 2010-03-01 08:47:40
Message-ID: 20100301174740.9975.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:

> I'm wondering if there would be community support for adding using the
> execute message with a rownum > 0 in the c libpq client library, as it
> is used by the jdbc driver with setFetchSize.

The setFetchSize for libpq is difficult because of the interface
mismatch -- libpq uses array-based APIs (PGresult) and JDBC uses a
cursor-like API (ResultSet). Instead, you can use CURSOR and FETCH
commands to retrieve rows in separated PGresult objects.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, Willem Dijkstra <wpd(at)xs4all(dot)nl>
Subject: Re: C libpq frontend library fetchsize
Date: 2010-03-01 09:55:46
Message-ID: 4B8B8F22.8040100@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Takahiro Itagaki wrote:
> Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote
>> I'm wondering if there would be community support for adding using the
>> execute message with a rownum > 0 in the c libpq client library, as it
>> is used by the jdbc driver with setFetchSize.
>>
>
> The setFetchSize for libpq is difficult because of the interface
> mismatch -- libpq uses array-based APIs (PGresult) and JDBC uses a
> cursor-like API (ResultSet). Instead, you can use CURSOR and FETCH
> commands to retrieve rows in separated PGresult objects.
>
Hello Takahiro,

Thank you for your reply. Yes there is a large overlap with SQL's
declare cursor together with fetch, however intuitively it feels strange
that the application needs to be changed for something that could be
kept internal to the communication protocol. The application is forced
to start an application explicitly, and also I tried to declare a cursor
with parameters from sql but did not succeed, nor could I declare a
cursors in combination with a prepared statement. Building fetchsize
into libpq seems like a general solution that could work for all
pgresult fetching.

Together with a college of mine I worked on this subject some time ago,
and we thought it might be interesting enough for others as well. In
short this is what we did:
- add a boolean 'complete' to the PGresult object - to indicate if the
portalrun was complete
- add PQresultComplete api call that returns the boolean above.
- add PQsendSync
- in the query guts routines, do not send sync messages
- call pqsendsync when a complete pqresult is received, or if result
fetching is stopped before fetching the last result.
- the fetchsize was a defined constant in our application, but would be
trivial to replace with a PQsetFetchsize.

There are some intricacies with reading with libpq with an asynchronous
event handler. One was that parseinput does not 'eat' the whole message
buffer, i.e. there can be another complete message in the buffer but a
new read event might not be triggered because no more data arrives into
the buffer. For this purpose we also added
- PQmsgAvail - returns true if the event handler may fire again
immediately (to process more results)

The biggest challenge was when to send sync messages. We have that
covered currently but it is done by our application outside of libpq
(hence the pqsendsync visible in the api as well as pqresultcomplete).
It would be better if that all could be kept inside libpq itself. In the
end we believe this could provide useful functions for callers of the
libpq api such as PHP.

regards,
Yeb Havinga


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: C libpq frontend library fetchsize
Date: 2010-03-18 16:54:25
Message-ID: 4BA25AC1.4080406@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Fri, Feb 26, 2010 at 3:28 PM, Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:
>
>> I'm wondering if there would be community support for adding using the
>> execute message with a rownum > 0 in the c libpq client library, as it is
>> used by the jdbc driver with setFetchSize.
>>
>
> Not sure I follow what you're asking... what would the new/changed
> function signature be?
>
Hello Robert, list

I'm sorry I did not catch your reply until I searched in the archives on
libpq, I hope you are not offended. However I think the question is
answered somewhat in a reply I sent to Takahiro Itagaki, viz:
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00015.php

The recent posting in PERFORM where someone compares mysql vs postgresql
speed is caused by libpq / whole pgresult as one time.
(http://archives.postgresql.org/pgsql-performance/2010-03/msg00228.php)

ISTM that using cursors and then fetch is not an adequate solution,
because 1) someone must realise that the pgresult object is
gathered/transfered under the hood of libpq completely before the first
row can be used by the application 2) the structure of the application
layer is altered to make use of partial results.

What if the default operation of e.g. php using libpq would be as
follows: set some default fetchsize (e.g. 1000 rows), then just issue
getrow. In the php pg handling, a function like getnextrow would wait
for the first pgresult with 1000 rows. Then if the pgresult is depleted
or almost depleted, request the next pgresult automatically. I see a lot
of benefits like less memory requirements in libpq, less new users with
why is my query so slow before the first row, and almost no concerns. A
small overhead of row description messages perhaps. Maybe the biggest
benefit of a pgsetfetchsize api call would be to raise awareness that of
the fact that pgresults are transfered completely (or partially if there
is animo for me/collegue of mine working on a patch for this).

Besides that, another approach to get data to clients faster could be by
perhaps using lzo, much in the same way that google uses zippy (see e.g.
http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/)
to speed up data transfer and delivery. LZO has been mentioned before on
mailing lists for pg_dump compression, but I think that with a
--enable-lzo also libpq could benefit too.
(http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php)

regards,
Yeb Havinga


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: C libpq frontend library fetchsize
Date: 2010-03-18 17:00:01
Message-ID: 13404.1268931601@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Yeb Havinga <yebhavinga(at)gmail(dot)com> writes:
> What if the default operation of e.g. php using libpq would be as
> follows: set some default fetchsize (e.g. 1000 rows), then just issue
> getrow. In the php pg handling, a function like getnextrow would wait
> for the first pgresult with 1000 rows. Then if the pgresult is depleted
> or almost depleted, request the next pgresult automatically. I see a lot
> of benefits like less memory requirements in libpq, less new users with
> why is my query so slow before the first row, and almost no concerns.

You are blithely ignoring the reasons why libpq doesn't do this. The
main one being that it's impossible to cope sanely with queries that
fail partway through execution. The described implementation would not
cope tremendously well with nonsequential access to the resultset, either.

regards, tom lane


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: C libpq frontend library fetchsize
Date: 2010-03-18 17:21:15
Message-ID: 4BA2610B.30408@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Yeb Havinga <yebhavinga(at)gmail(dot)com> writes:
>
>> What if the default operation of e.g. php using libpq would be as
>> follows: set some default fetchsize (e.g. 1000 rows), then just issue
>> getrow. In the php pg handling, a function like getnextrow would wait
>> for the first pgresult with 1000 rows. Then if the pgresult is depleted
>> or almost depleted, request the next pgresult automatically. I see a lot
>> of benefits like less memory requirements in libpq, less new users with
>> why is my query so slow before the first row, and almost no concerns.
>>
>
> You are blithely ignoring the reasons why libpq doesn't do this. The
> main one being that it's impossible to cope sanely with queries that
> fail partway through execution.
I'm sorry I forgot to add a reference to your post of
http://archives.postgresql.org/pgsql-general/2010-02/msg00956.php which
is the only reference to queries failing partway that I know of. But
blithely is not a good description of me ignoring it. I though about how
queries could fail, but can't think of anything else than e.g. memory
exhaustion, and that is just one of the things that is improved this
way. Maybe a user defined type with an error on certain data values, but
then the same arguing could be: why support UDT? And if a query fails
during execution, does that mean that the rows returned until that point
are wrong?
> The described implementation would not
> cope tremendously well with nonsequential access to the resultset, either.
>
That's why I'm not proposing to replace the current way pgresults are
made complete, but just an extra option to enable developers using the
libpq libary making the choice themselves.

regards,
Yeb Havinga


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: C libpq frontend library fetchsize
Date: 2010-03-19 23:12:40
Message-ID: 603c8f071003191612n5a2fd8c6rae6d9857baad8364@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 18, 2010 at 1:21 PM, Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:
> Tom Lane wrote:
>>
>> Yeb Havinga <yebhavinga(at)gmail(dot)com> writes:
>>
>>>
>>> What if the default operation of e.g. php using libpq would be as
>>> follows: set some default fetchsize (e.g. 1000 rows), then just issue
>>> getrow. In the php pg handling, a function like getnextrow would wait for
>>> the first pgresult with 1000 rows. Then if the pgresult is depleted or
>>> almost depleted, request the next pgresult automatically. I see a lot of
>>> benefits like less memory requirements in libpq, less new users with why is
>>> my query so slow before the first row, and almost no concerns.
>>>
>>
>> You are blithely ignoring the reasons why libpq doesn't do this.  The
>> main one being that it's impossible to cope sanely with queries that
>> fail partway through execution.
>
> I'm sorry I forgot to add a reference to your post of
> http://archives.postgresql.org/pgsql-general/2010-02/msg00956.php which is
> the only reference to queries failing partway that I know of. But blithely
> is not a good description of me ignoring it. I though about how queries
> could fail, but can't think of anything else than e.g. memory exhaustion,
> and that is just one of the things that is improved this way. Maybe a user
> defined type with an error on certain data values, but then the same arguing
> could be: why support UDT? And if a query fails during execution, does that
> mean that the rows returned until that point are wrong?
>>
>>  The described implementation would not
>> cope tremendously well with nonsequential access to the resultset, either.
>>
>
> That's why I'm not proposing to replace the current way pgresults are made
> complete, but just an extra option to enable developers using the libpq
> libary making the choice themselves.

This seems pretty reasonable to me, especially considering that JDBC
is apparently already doing it. I suppose there will always be
projects that want to reimplement the backend protocol so that they
can be "pure" some-language, but chipping away at the list of other
reasons why someone might not want to use libpq still seems like a
good idea.

...Robert


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: C libpq frontend library fetchsize
Date: 2010-03-22 08:21:51
Message-ID: 4BA7289F.5030702@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> This seems pretty reasonable to me, especially considering that JDBC
> is apparently already doing it. I suppose there will always be
> projects that want to reimplement the backend protocol so that they
> can be "pure" some-language, but chipping away at the list of other
> reasons why someone might not want to use libpq still seems like a
> good idea.
>
> ...Robert
>
Hello Robert,

Thank you for your support, but am unsure if it is wise to put time in
it due to the other reactions.

regards,
Yeb Havinga