Re: Protocol 3, Execute, maxrows to return, impact?

From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-09 18:50:39
Message-ID: 20080709185039.GA8190@cuci.nl
Lists: pgsql-hackers

I was wondering: is there any real advantage to specifying, say, 64 for
the maxrows parameter to the Execute message in the PostgreSQL network
protocol?

I.e.:
- Is it easier on the server, because it somehow uses less total memory when
batching the Executes with 64 rows at a time?
- Is it better for latency when expecting notices/notifies in between?
- Is it simply slowing down the protocol because every 64 rows I
introduce a round-trip delay for the next PortalSuspended/Execute combo?

Please note that we're only running the query once; I'm just batching
the results with the Execute maxrows parameter.
--
Sincerely,
Stephen R. van den Berg.

Limited offer: Pacemakers, with a unique life-time warranty


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 03:31:10
Message-ID: 21589.1215660670@sss.pgh.pa.us
Lists: pgsql-hackers

"Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:
> I was wondering, if there is any real advantage to actually specify say
> 64 for the maxrows parameter to the Execute message in the PostgreSQL
> network protocol?

There's no benefit in it from the server's perspective, if that's what
you meant. The point of the parameter is to allow the client to avoid
running out of memory to store all of a huge query result --- it can
pull it in sections, instead. (Think of it as a built-in cursor
FETCH facility.)
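
To make that concrete, the row limit travels in the Execute message
itself; below is a minimal Python sketch of the v3 message layout (the
helper name is only illustrative, it is not part of any library):

import struct

def execute_message(portal="", max_rows=0):
    # v3 Execute: 'E', Int32 length (counts itself but not the type
    # byte), NUL-terminated portal name, Int32 row limit (0 = no limit).
    body = portal.encode("ascii") + b"\x00" + struct.pack("!i", max_rows)
    return b"E" + struct.pack("!i", 4 + len(body)) + body

# Ask for at most 64 rows from the unnamed portal.  If more rows remain,
# the backend ends the reply with PortalSuspended ('s') instead of
# CommandComplete ('C'), and a further Execute resumes the same portal.
msg = execute_message("", 64)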

> - Is it simply slowing down the protocol because every 64 rows I
> introduce a round-trip delay for the next PortalSuspended/Execute combo?

Yup; there's no free lunch ...

regards, tom lane


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 05:54:17
Message-ID: 64b4daae0807092254m38f76553s62c7bf235a49f8a8@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 10, 2008 at 05:31, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:
>> I was wondering, if there is any real advantage to actually specify say
>> 64 for the maxrows parameter to the Execute message in the PostgreSQL
>> network protocol?

> There's no benefit in it from the server's perspective, if that's what
> you meant. The point of the parameter is to allow the client to avoid
> running out of memory to store all of a huge query result --- it can
> pull it in sections, instead. (Think of it as a built-in cursor
> FETCH facility.)

Then, from a client perspective, there is no use at all, because the
client can actually pause reading the results at any time it wants,
when it wants to avoid storing all of the result rows. The network
will perform the cursor/fetch facility for it.
--
Sincerely,
Stephen R. van den Berg.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 14:22:28
Message-ID: 1116.1215699748@sss.pgh.pa.us
Lists: pgsql-hackers

"Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:
> Then, from a client perspective, there is no use at all, because the
> client can actually pause reading the results at any time it wants,
> when it wants to avoid storing all of the result rows. The network
> will perform the cursor/fetch facility for it.

[ shrug... ] In principle you could write a client library that would
act that way, but I think you'll find that none of the extant ones
will hand back an incomplete query result to the application.

A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 14:30:42
Message-ID: 48761D12.9000202@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
> "Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:
>
>> Then, from a client perspective, there is no use at all, because the
>> client can actually pause reading the results at any time it wants,
>> when it wants to avoid storing all of the result rows. The network
>> will perform the cursor/fetch facility for it.
>>
>
> [ shrug... ] In principle you could write a client library that would
> act that way, but I think you'll find that none of the extant ones
> will hand back an incomplete query result to the application.
>
> A possibly more convincing argument is that with that approach, the
> connection is completely tied up --- you cannot issue additional
> database commands based on what you just read, nor pull rows from
> multiple portals in an interleaved fashion.
>
>

I really think we need to get something like this into libpq. It's on my
TODO list after notification payloads and libpq support for arrays and
composites. We'll need to come up with an API before we do much else.

cheers

andrew


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 16:40:03
Message-ID: 20080710164003.GA25117@cuci.nl
Lists: pgsql-hackers

Tom Lane wrote:
>"Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:
>> Then, from a client perspective, there is no use at all, because the
>> client can actually pause reading the results at any time it wants,
>> when it wants to avoid storing all of the result rows. The network
>> will perform the cursor/fetch facility for it.

>[ shrug... ] In principle you could write a client library that would
>act that way, but I think you'll find that none of the extant ones
>will hand back an incomplete query result to the application.

True. But I have written one just now. The language is called Pike;
it's a C/C++/Java lookalike. I start returning rows as they arrive,
and pause reading from the network when the application wants to pause.

>A possibly more convincing argument is that with that approach, the
>connection is completely tied up --- you cannot issue additional
>database commands based on what you just read, nor pull rows from
>multiple portals in an interleaved fashion.

Interleaved retrieval using multiple portals is not what most libraries
support, I'd guess.
It can be supported at the application layer using multiple cursors, but
that works with my approach as well.

In practice, most applications that need that open multiple
connections to the same database (I'd think).

The only thing I could imagine is that *if*, at the server end, the
notifications that arrive during the retrieval of one long-running
Execute are queued *after* all the data instead of being inserted into
the datastream, then it might be worth doing it differently.

Incidentally, the nice thing about my library is that it automatically
uses binary for argument types which are easily processed in binary
(TEXT/BYTEA and all those others I mentioned earlier).
It does so for *both* arguments and row results; i.e. in one row I can
have both text and binary columns, without the application needing to
specify which is which.
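
For reference, the per-column choice is something the protocol itself
provides for: the Bind message carries one result-format code per
column (0 = text, 1 = binary).  Here is a minimal Python sketch of that
part of the message, binding no parameters (the helper name is only
illustrative):

import struct

def bind_message(portal, statement, result_formats):
    # v3 Bind: 'B', Int32 length, portal name, statement name,
    # parameter-format codes, parameter values, and finally one Int16
    # format code per result column (0 = text, 1 = binary).
    body = portal.encode("ascii") + b"\x00"
    body += statement.encode("ascii") + b"\x00"
    body += struct.pack("!h", 0)                  # no parameter-format codes
    body += struct.pack("!h", 0)                  # no parameter values
    body += struct.pack("!h", len(result_formats))
    for code in result_formats:
        body += struct.pack("!h", code)
    return b"B" + struct.pack("!i", 4 + len(body)) + body

# First result column as text, second as binary, within the same row:
msg = bind_message("", "", [0, 1])
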
--
Sincerely,
Stephen R. van den Berg.

"If you can't explain it to an 8-year-old, you don't understand it."


From: Abhijit Menon-Sen <ams(at)oryx(dot)com>
To: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 17:05:46
Message-ID: 20080710170546.GA22775@toroid.org
Lists: pgsql-hackers

(I don't really have much to add to the discussion here; I'm just
posting for the record on the question of client behaviour, since
I also wrote and maintain a client library in C++.)

At 2008-07-10 18:40:03 +0200, srb(at)cuci(dot)nl wrote:
>
> I start returning rows as they arrive, and pause reading from the
> network when the application wants to pause.

My library also starts returning rows as they arrive, and in fact my
application makes heavy use of that feature. The data rows are read
from a non-blocking socket and the caller either does something for
each one, or waits until they've all arrived before proceeding.
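
To illustrate the idea in code, the flow control really is just "stop
reading the socket until the caller wants more"; the following is a
rough Python sketch of an incremental reader for the v3 message framing
(blocking socket, no authentication or error handling, purely
illustrative):

import struct

def backend_messages(sock):
    # Every v3 backend message is a 1-byte type followed by an Int32
    # length that counts itself but not the type byte; DataRow messages
    # carry the type byte 'D'.  Because data is only read when the
    # caller asks for the next message, simply not iterating pauses the
    # stream, and TCP flow control eventually blocks the server.
    buf = b""
    while True:
        while len(buf) < 5:                      # need the 5-byte header
            chunk = sock.recv(65536)
            if not chunk:
                return
            buf += chunk
        msg_type = buf[0:1]
        (length,) = struct.unpack("!i", buf[1:5])
        total = 1 + length                       # type byte + rest
        while len(buf) < total:
            chunk = sock.recv(65536)
            if not chunk:
                return
            buf += chunk
        yield msg_type, buf[5:total]
        buf = buf[total:]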

> Interleaved retrieval using multiple portals is not what most
> libraries support, I'd guess.

My code did support that mode of operation in theory, but in practice
in the few situations where I have needed to use something like it, I
found it more convenient to open explicit cursors and FETCH from them
(but I usually needed this inside a transaction, and so did not open
multiple connections).

Thus my code always sets maxrows to 0 at the moment, and so...

> The only thing I could imagine is that *if* at the server end, the
> notifications that arrive during the retrieval of one long running
> Execute, are queued *after* all the data, instead of inserted into
> the datastream, then it might be worth doing it differently.

...I can't comment on this interesting observation.

> i.e. in one row I can have both text and binary columns, without the
> application needing to specify which is which.

Yes, that's nice. My first attempt to define an API for bind variables
set the data format to text by default and allowed it to be overridden,
but that was much too troublesome. Now the code decides by itself what
format is best to use for a given query.

(Again, though my library certainly supports mixing text and binary
format columns, my application has not needed to use this feature.)

-- ams


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 17:09:53
Message-ID: 87ej61d5su.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:

>>A possibly more convincing argument is that with that approach, the
>>connection is completely tied up --- you cannot issue additional
>>database commands based on what you just read, nor pull rows from
>>multiple portals in an interleaved fashion.
>
> Interleaved retrieval using multiple portals is not what most libraries
> support, I'd guess. It can be supported at the application layer using
> multiple cursors, but that works with my approach as well.
>
> In practice, most applications that need that, open multiple
> connections to the same database (I'd think).

Er? There's nothing particularly unusual about application logic like:

$sth->execute('huge select');
while ($sth->fetch('foreign_key')) {
    ... do some processing which is hard to do in server-side language ...
    $sth->execute('insert resulting data');
}

Most drivers do support this kind of interface, but they may be reading the
entire result set for "huge select" in advance. However, if it's large enough
then this is only really going to work if you can start a new portal while the
outer portal is actually running on the backend. If the driver tries to cache
the whole result set the programmer will be sad.
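
As a rough sketch of that pattern on a single connection (and in a
single transaction), using a server-side named cursor in Python's
psycopg2; the DSN, table and column names are invented for the example:

import psycopg2

conn = psycopg2.connect("dbname=test")      # hypothetical DSN
big = conn.cursor(name="big_select")        # server-side cursor
big.itersize = 64                           # rows fetched per round trip
big.execute("SELECT id, payload FROM huge_table")

ins = conn.cursor()                         # plain cursor, same connection
for row_id, payload in big:
    processed = payload.upper()             # stand-in for client-side work
    ins.execute("INSERT INTO results (id, data) VALUES (%s, %s)",
                (row_id, processed))

conn.commit()                               # one transaction for everything
conn.close()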

Back when I was doing PHP programming and discovered that PHP's Postgres
driver didn't support this, I thought it was an outrageous bug. (It didn't
help that the behaviour was to misbehave randomly rather than throw a
meaningful error.)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 17:24:39
Message-ID: 10953.1215710679@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:
>> In practice, most applications that need that, open multiple
>> connections to the same database (I'd think).

> Er? There's nothing particularly unusual about application logic like:

> $sth->execute('huge select');
> while ($sth->fetch('foreign_key')) {
> ... do some processing which is hard to do in server-side language ...
> $sth->execute('insert resulting data');
> }

Moreover, there are often good reasons to do it all within one
transaction, which is impossible if you rely on a separate connection
to issue the inserts on.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Abhijit Menon-Sen" <ams(at)oryx(dot)com>
Cc: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-10 18:01:04
Message-ID: 87vdzdbov3.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Abhijit Menon-Sen" <ams(at)oryx(dot)com> writes:

>> Interleaved retrieval using multiple portals is not what most
>> libraries support, I'd guess.
>
> My code did support that mode of operation in theory, but in practice
> in the few situations where I have needed to use something like it, I
> found it more convenient to open explicit cursors and FETCH from them

Note that using FETCH for each record means a round trip to the server for
each record. If you're dealing with a lot of records that could be a lot
slower than streaming them to the client as quickly as it can consume them.

Now I'm not sure anyone's actually done any experiments to optimize libpq or
other drivers to stream data efficiently, so I'm not sure how much you would
really lose in practice today.
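
A common middle ground is to batch the FETCH so each round trip covers
many rows rather than one; below is a rough sketch using explicit SQL
cursors from Python/psycopg2 (the table name is invented):

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
# psycopg2 opens a transaction implicitly, which the cursor needs.
cur.execute("DECLARE big_cur NO SCROLL CURSOR FOR SELECT * FROM huge_table")
while True:
    cur.execute("FETCH 500 FROM big_cur")   # one round trip per 500 rows
    rows = cur.fetchall()
    if not rows:
        break
    for row in rows:
        pass                                # process each row here
cur.execute("CLOSE big_cur")
conn.commit()
conn.close()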

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Abhijit Menon-Sen <ams(at)oryx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-13 16:22:18
Message-ID: 20080713162218.GA2067@cuci.nl
Lists: pgsql-hackers

Gregory Stark wrote:
>"Abhijit Menon-Sen" <ams(at)oryx(dot)com> writes:
>>> Interleaved retrieval using multiple portals is not what most
>>> libraries support, I'd guess.

>> My code did support that mode of operation in theory, but in practice
>> in the few situations where I have needed to use something like it, I
>> found it more convenient to open explicit cursors and FETCH from them

>Note that using FETCH for each record means a round trip to the server for
>each record. If you're dealing with a lot of records that could be a lot
>slower than streaming them to the client as quickly as it can consume them.

>Now I'm not sure anyone's actually done any experiments to optimize libpq or
>other drivers to stream data efficiently, so I'm not sure how much you would
>really lose in practice today.

My Pike drivers now support multiple simultaneous portals and
automatic streaming by presending overlapping Execute statements with
a dynamically adapted fetchlimit calculated per select as the query
progresses.

The only support still lacking is COPY.
--
Sincerely,
Stephen R. van den Berg.

In this signature, the concluding three words `were left out'.


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Abhijit Menon-Sen <ams(at)oryx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-27 19:00:04
Message-ID: 20080727190004.GA1449@cuci.nl
Lists: pgsql-hackers

Stephen R. van den Berg wrote:
>My Pike drivers now support multiple simultaneous portals and
>automatic streaming by presending overlapping Execute statements with
>a dynamically adapted fetchlimit calculated per select as the query
>progresses.

They also support COPY now.

The driver beats libpq in speed by about 62%.
The memory consumption is on demand, by row, and not the whole result set.
Transport to and from the query is in binary and dynamically determined
per datatype, no quoting necessary.

Anyone interested in taking a peek at the (GPL copyright) driver, I
temporarily put up a small package which contains the working driver
in Pike at:

http://admin.cuci.nl/psgsql.pike.tar.gz

Pike is a C/C++/Java-like interpreted language.
The production driver uses a PGsql assist class written in C to
accelerate an (amazingly) small number of core functions (not included,
but the driver works fully without the PGsql assist class).
--
Sincerely,
Stephen R. van den Berg.
"There are 10 types of people in the world.
Those who understand binary and those who do not."


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 08:35:28
Message-ID: 20080728083528.GA12856@cuci.nl
Lists: pgsql-hackers

I replied to this post yesterday, yet I don't see my reply appear;
could it have been caught in a spam filter or something?
--
Sincerely,
Stephen R. van den Berg.

"Even if man could understand women, he still wouldn't believe it."


From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Stephen R(dot) van den Berg <srb(at)cuci(dot)nl>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Abhijit Menon-Sen <ams(at)oryx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 11:53:21
Message-ID: C5BF64E6-D6BD-4DA9-9A8F-952A4C7B6C22@fastcrypt.com
Lists: pgsql-hackers


On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:

> Stephen R. van den Berg wrote:
>> My Pike drivers now support multiple simultaneous portals and
>> automatic streaming by presending overlapping Execute statements with
>> a dynamically adapted fetchlimit calculated per select as the query
>> progresses.
>
> They also support COPY now.
>
> The driver beats libpq in speed by about 62%.
> The memory consumption is on demand, by row, and not the whole
> result set.
> Transport to and from the query is in binary and dynamically
> determined
> per datatype, no quoting necessary.
>
> Anyone interested in taking a peek at the (GPL copyright) driver, I
> temporarily put up a small package which contains the working driver
> in Pike at:
>
> http://admin.cuci.nl/psgsql.pike.tar.gz
>

This is very exciting news, I'd love to look at it. Is there any way
it could be re-licensed so that it can be incorporated into, say, the
JDBC driver?

Dave


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: Dave Cramer <pg(at)fastcrypt(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Abhijit Menon-Sen <ams(at)oryx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 16:45:41
Message-ID: 20080728164541.GV12856@cuci.nl
Lists: pgsql-hackers

Dave Cramer wrote:
>On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
>>Stephen R. van den Berg wrote:
>>The driver beats libpq in speed by about 62%.

>>Anyone interested in taking a peek at the (GPL copyright) driver, I
>>temporarily put up a small package which contains the working driver
>>in Pike at:

>> http://admin.cuci.nl/psgsql.pike.tar.gz

>This is very exciting news, I'd love to look at it, is there any way
>it could be re-licensed so that it can be incorporated into say the
>jdbc driver ?

Since I wrote it, I can relicense it any which way I want.
What kind of license would you like to have?
--
Sincerely,
Stephen R. van den Berg.

"Even if man could understand women, he still wouldn't believe it."


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
Cc: Dave Cramer <pg(at)fastcrypt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Abhijit Menon-Sen <ams(at)oryx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 17:00:09
Message-ID: 1217264409.11220.45.camel@jd-laptop
Lists: pgsql-hackers

On Mon, 2008-07-28 at 18:45 +0200, Stephen R. van den Berg wrote:
> Dave Cramer wrote:
> >On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
> >>Stephen R. van den Berg wrote:
> >>The driver beats libpq in speed by about 62%.
>
> >>Anyone interested in taking a peek at the (GPL copyright) driver, I
> >>temporarily put up a small package which contains the working driver
> >>in Pike at:
>
> >> http://admin.cuci.nl/psgsql.pike.tar.gz
>
> >This is very exciting news, I'd love to look at it, is there any way
> >it could be re-licensed so that it can be incorporated into say the
> >jdbc driver ?
>
> Since I wrote it, I can relicense it any which way I want.
> What kind of license would you like to have?

The JDBC driver and PostgreSQL itself are BSD licensed:

http://jdbc.postgresql.org/license.html

http://www.postgresql.org/about/licence

Sincerely,

Joshua D. Drake

> --
> Sincerely,
> Stephen R. van den Berg.
>
> "Even if man could understand women, he still wouldn't believe it."
>
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Stephen R(dot) van den Berg <srb(at)cuci(dot)nl>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Abhijit Menon-Sen <ams(at)oryx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 17:54:36
Message-ID: 1A988340-453E-4FE0-A58C-C8C1A3C16D02@fastcrypt.com
Lists: pgsql-hackers


On 28-Jul-08, at 12:45 PM, Stephen R. van den Berg wrote:

> Dave Cramer wrote:
>> On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
>>> Stephen R. van den Berg wrote:
>>> The driver beats libpq in speed by about 62%.
>
>>> Anyone interested in taking a peek at the (GPL copyright) driver, I
>>> temporarily put up a small package which contains the working driver
>>> in Pike at:
>
>>> http://admin.cuci.nl/psgsql.pike.tar.gz
>
>> This is very exciting news, I'd love to look at it, is there any way
>> it could be re-licensed so that it can be incorporated into say the
>> jdbc driver ?
>
> Since I wrote it, I can relicense it any which way I want.
> What kind of license would you like to have?

As Joshua mentioned, BSD is the preferred PostgreSQL license. As I
understand it, I can't even look at your code and subsequently use
anything in the JDBC driver.

Dave
>
> --
> Sincerely,
> Stephen R. van den Berg.
>
> "Even if man could understand women, he still wouldn't believe it."
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 18:24:22
Message-ID: F1F91BF6-BA0D-4EE3-9971-87AC22ED5511@themactionfaction.com
Lists: pgsql-hackers


On Jul 28, 2008, at 1:54 PM, Dave Cramer wrote:

>
> On 28-Jul-08, at 12:45 PM, Stephen R. van den Berg wrote:
>
>> Dave Cramer wrote:
>>> On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
>>>> Stephen R. van den Berg wrote:
>>>> The driver beats libpq in speed by about 62%.
>>
>>>> Anyone interested in taking a peek at the (GPL copyright) driver, I
>>>> temporarily put up a small package which contains the working
>>>> driver
>>>> in Pike at:
>>
>>>> http://admin.cuci.nl/psgsql.pike.tar.gz
>>
>>> This is very exciting news, I'd love to look at it, is there any way
>>> it could be re-licensed so that it can be incorporated into say the
>>> jdbc driver ?
>>
>> Since I wrote it, I can relicense it any which way I want.
>> What kind of license would you like to have?
>
> As Joshua mentioned BSD is the preferred postgresql license. As I
> understand it I can't even look at your code and subsequently use
> anything in the JDBC driver

The GPL does not cover implementation ideas, not to mention that the
author just described the implementation. Furthermore, one could not
"take" anything from the Pike driver for the JDBC driver, because it is
written in a completely different language. It seems like you confused
the GPL with an NDA.

Anyway, what does "The driver beats libpq in speed by about 62%" mean?

Cheers,
M


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: Dave Cramer <pg(at)fastcrypt(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Abhijit Menon-Sen <ams(at)oryx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 18:29:42
Message-ID: 20080728182942.GA31945@cuci.nl
Lists: pgsql-hackers

Dave Cramer wrote:
>>Since I wrote it, I can relicense it any which way I want.
>>What kind of license would you like to have?

>As Joshua mentioned BSD is the preferred postgresql license. As I
>understand it I can't even look at your code and subsequently use
>anything in the JDBC driver

I'll relicense it under a BSD license, so you can use it.
Watch this space.
--
Sincerely,
Stephen R. van den Berg.

"Even if man could understand women, he still wouldn't believe it."


From: "Andrew Garner" <andrew(dot)b(dot)garner(at)gmail(dot)com>
To: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 18:36:13
Message-ID: f7b5dd6b0807281136q5a5a7eadwaad969c9ace97e92@mail.gmail.com
Lists: pgsql-hackers

On 7/27/08, Stephen R. van den Berg <srb(at)cuci(dot)nl> wrote:
>
> Stephen R. van den Berg wrote:
> >My Pike drivers now support multiple simultaneous portals and
> >automatic streaming by presending overlapping Execute statements with
> >a dynamically adapted fetchlimit calculated per select as the query
> >progresses.
>
>
> They also support COPY now.
>
> The driver beats libpq in speed by about 62%.
> The memory consumption is on demand, by row, and not the whole result set.
> Transport to and from the query is in binary and dynamically determined
> per datatype, no quoting necessary.
>
> Anyone interested in taking a peek at the (GPL copyright) driver, I
> temporarily put up a small package which contains the working driver
> in Pike at:
>
> http://admin.cuci.nl/psgsql.pike.tar.gz

I'd love to take a look at the code, but the link seems to be 404'd at the
moment.


From: "Francisco Figueiredo Jr(dot)" <francisco(at)npgsql(dot)org>
To: "Andrew Garner" <andrew(dot)b(dot)garner(at)gmail(dot)com>
Cc: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 20:22:03
Message-ID: 438d02260807281322q79af9ba9l311db4222aeeb40c@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 28, 2008 at 3:36 PM, Andrew Garner
<andrew(dot)b(dot)garner(at)gmail(dot)com> wrote:
>
> On 7/27/08, Stephen R. van den Berg <srb(at)cuci(dot)nl> wrote:
>>
>> Stephen R. van den Berg wrote:
>> >My Pike drivers now support multiple simultaneous portals and
>> >automatic streaming by presending overlapping Execute statements with
>> >a dynamically adapted fetchlimit calculated per select as the query
>> >progresses.
>>
>>
>> They also support COPY now.
>>
>> The driver beats libpq in speed by about 62%.
>> The memory consumption is on demand, by row, and not the whole result set.
>> Transport to and from the query is in binary and dynamically determined
>> per datatype, no quoting necessary.
>>
>> Anyone interested in taking a peek at the (GPL copyright) driver, I
>> temporarily put up a small package which contains the working driver
>> in Pike at:
>>
>> http://admin.cuci.nl/psgsql.pike.tar.gz
>
> I'd love to take a look at the code, but the link seems to be 404'd at the
> moment.
>

Hi, I'd love to take a look at the code too. I'm the lead developer of
Npgsql, a .NET data provider for PostgreSQL which implements protocol
versions 2 and 3. I have been following this thread to get more info
about protocol implementation internals that we could use in our .NET
implementation. Currently we don't support multiple resultsets yet.

Also, I'd like to get some ideas about how to optimize throughput. You
said you got a huge improvement compared to libpq. I think you may have
implemented some nice ideas which we could use to optimize Npgsql.

I'm also getting a 404 error when trying to download the code.

--
Regards,

Francisco Figueiredo Jr.
http://fxjr.blogspot.com
http://www.npgsql.org


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-28 21:04:21
Message-ID: 20080728210421.GD31945@cuci.nl
Lists: pgsql-hackers

A.M. wrote:
>Anyway, what does "The driver beats libpq in speed by about 62%" mean?

It means that when I open up a connection, my lib takes only half the time
libpq needs to open up the connection, and it also means that
when I run the following query 10 times in a row:

SELECT * FROM pg_catalog.pg_type as a,pg_catalog.pg_type as b LIMIT 1000

and the result set is subsequently split, parsed and copied *per* row and
column into memory, and then discarded, my driver takes 483 units of time
and the libpq driver takes 762 units of time.

I.e. the speed ratio is 762/483 = 1.58 -> about 58% faster.

The test environment is:

unloaded Linux 2.6.26 client using libpq5 8.3.3, and an unloaded Linux 2.6.26
server running PostgreSQL 8.3.3, 100Mb/s ethernet in between.

The test is I/O bound, and therefore tests the client-lib efficiency.
--
Sincerely,
Stephen R. van den Berg.

"Even if man could understand women, he still wouldn't believe it."


From: tomas(at)tuxteam(dot)de
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-29 06:05:29
Message-ID: 20080729060529.GD8362@tomas
Lists: pgsql-hackers

On Mon, Jul 28, 2008 at 02:24:22PM -0400, A.M. wrote:
>
> On Jul 28, 2008, at 1:54 PM, Dave Cramer wrote:
>>
>> As Joshua mentioned BSD is the preferred postgresql license. As I
>> understand it I can't even look at your code and subsequently use anything
>> in the JDBC driver
>
> The GPL does not cover implementation ideas [...]

Exactly. What you can't do is copy code verbatim or link to the code
without being bound by the GPL. Taking ideas is free.

(Disclaimer: I'm clearly biased towards the GPL, but not so much as to
not understand that BSD would make more sense in the context of a BSD
project).

Regards
-- tomás


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Relicensed and downloadable (Re: Protocol 3, Execute, maxrows to return, impact?)
Date: 2008-07-29 10:55:04
Message-ID: 20080729105504.GA18800@cuci.nl
Lists: pgsql-hackers

Limited offer, get it while it's hot:

http://admin.cuci.nl/pgsql.pike.driver.tar.gz.bin

I relicensed it under BSD now, put up proper copyright notices, and
included the accelerator class for amusement (the module works without
the accelerator class, but it is then about 10 times slower than libpq).

For anyone trying to understand Pike, the important points probably are
that:
- ({ 1, 2, 3, 4 }) is an array.
- m = (["key1":13, "key2":14, "key3":15]) is a mapping, i.e. m->key1
or m["key1"] both evaluate to 13.

The rest is rather similar to C++/Java.

See http://pike.roxen.com/ for more information, try
git://git.cuci.nl/pike as the source repository, and
http://pike.ida.liu.se/docs/ for documentation and tutorials.

The fetch_row() function actually returns an array for each row, where
the individual columns are already parsed and converted into the native
Pike formats for handling.
--
Sincerely,
Stephen R. van den Berg.

"People who think they know everything are annoying to those of us who do."


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Relicensed and downloadable (Re: Protocol 3, Execute, maxrows to return, impact?)
Date: 2008-07-29 14:03:52
Message-ID: 20080729140352.GA22186@cuci.nl
Lists: pgsql-hackers

Stephen R. van den Berg wrote:
>The fetch_row() function actually returns an array for each row, where
>the individual columns are already parsed and converted into the native
>Pike formats for handling.

A typical (low level) usage sequence in Pike looks like this:

object db;
object q;
array row;

// Open the connection.
db = Sql.pgsql("host","database","user","password");

// Start the first query; rows are fetched on demand.
q = db->query("SELECT * FROM a");

while(row = q->fetch_row())
    write("Processing row %O\n",row);

// Reuse the connection for the next query.
q = db->query("SELECT * FROM b");

etc.
--
Sincerely,
Stephen R. van den Berg.

"People who think they know everything are annoying to those of us who do."


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Protocol 3, Execute, maxrows to return, impact?
Date: 2008-07-30 10:05:47
Message-ID: 20080730100547.GA14714@cuci.nl
Lists: pgsql-hackers

Stephen R. van den Berg wrote:
>A.M. wrote:
>>Anyway, what does "The driver beats libpq in speed by about 62%" mean?

>I.e. the speed ratio is 762/483 = 1.58 -> about 58% faster.

After some tuning on the libpq side, I can say that:

If libpq is called using a cursor and fetches in 64-row chunks, the speed
difference with my driver above is correct.
If libpq is called *not* using a cursor and fetches all (1000) rows in one
go, the new comparison becomes:

583/483 = 1.21, which still makes my driver 21% faster than libpq native.

Please note that my driver automatically chunks up the query using
multiple chained Execute(dynamically calculated limit) and allows
fetching the rows on demand and interleaving portals.
My driver uses a 32KByte buffer per active portal.
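
One plausible way to derive such a dynamic limit from a fixed per-portal
buffer (not necessarily the driver's actual formula) is to divide the
buffer size by the average row size observed so far; a rough Python
sketch, purely illustrative:

def next_fetch_limit(bytes_so_far, rows_so_far,
                     buffer_size=32 * 1024, minimum=64):
    # Estimate how many rows fit in one buffer, based on the average
    # size of the rows already received for this portal.
    if rows_so_far == 0:
        return minimum                      # nothing observed yet
    avg_row = max(1, bytes_so_far // rows_so_far)
    return max(minimum, buffer_size // avg_row)
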
--
Sincerely,
Stephen R. van den Berg.

How many weeks are there in a lightyear?