Re: Input and Output data traffic

Lists: psycopg
From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Input and Output data traffic
Date: 2011-05-11 00:26:56
Message-ID: BANLkTinFOvkd9=NwgpDy65ejgZxyyov1YA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Hi,

I'm doing some benchmarks for a paper in my university, and with the help of
a proxy I observed something strange.

The volume of sent data (application -> database) and the response data
(database -> application) are basically the same.

For simple inserts/updates/deletes i got a difference of about 15% between
sent and received data.

I used a ruby driver and got the same results for the same instruction.

Finally I used a Java JDBC driver (for postgres too), and the difference was
huge, it was 70% (the received data was much smaller).

Do you guys have any idea about what could be? Is it possible to the JDBC
driver is using some kind of data compression for the traffic?

Thanks in advance,

Israel


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 08:35:19
Message-ID: 20110511083519.GE2297@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Tue, May 10, 2011 at 09:26:56PM -0300, Israel Ben Guilherme Fonseca wrote:

> I used a ruby driver [...]

...

> Finally I used a Java JDBC driver [...]

You are asking this on a list for a *Python* driver ?

I'm sure both Ruby and JDBC drivers each have a vibrant
community asking which would probably increase the chances
of eliciting useful feedback.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 11:19:27
Message-ID: BANLkTinYjiya3bxs8_P6UG9MDTiGPmSX6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

LOL, sorry I wasn't explicit in my text. But i did the same test with
psycopg2 too. :P

"For simple inserts/updates/deletes i got a difference of about 15% between
sent and received data."

Thats for psycopg2 and data_objects.

The other result is for JDBC-Postgres driver.

If someone could say: "well, sent/received data should be significantly
different and there is something wrong there", or "no it's normal". That
would be helpful.

Thanks again,

2011/5/11 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>

> On Tue, May 10, 2011 at 09:26:56PM -0300, Israel Ben Guilherme Fonseca
> wrote:
>
> > I used a ruby driver [...]
>
> ...
>
> > Finally I used a Java JDBC driver [...]
>
> You are asking this on a list for a *Python* driver ?
>
> I'm sure both Ruby and JDBC drivers each have a vibrant
> community asking which would probably increase the chances
> of eliciting useful feedback.
>
> Karsten
> --
> GPG key ID E4071346 @ gpg-keyserver.de
> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>
> --
> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/psycopg
>


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 11:43:23
Message-ID: 20110511114322.GI6613@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 08:18:40AM -0300, Israel Ben Guilherme Fonseca wrote:

> LOL, sorry I wasn't explicit in my text. But i did the same test with
> psycopg2 too. :P

:-) That makes better sense now.

> "For simple inserts/updates/deletes i got a difference of about 15% between
> sent and received data."

...

> If someone could say: "well, sent/received data should be significantly
> different and there is something wrong there", or "no it's normal". That
> would be helpful.

It very much depends on the exact queries used:

1) INSERT INTO my_table (a, b) VALUES (1, 2);

2) INSERT INTO my_table (a, b) VALUES (1, 2) RETURNING a, b;

I would expect a lot more data to be returned with 2) than
with 1).

(That difference in testing setup does not explain the
differences between drivers, though, if one assumes you've
been using the very same queries with each.)

The other thing is that one driver may be using server-side
cursors while another driver will use client-side cursors.
The latter will transfer a lot more data.

I think you'd have to be more specific about the exact
testing methodology to get useful advice.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 12:01:35
Message-ID: BANLkTi=hDAHCbReyZ+GXKJ_Q4=Z5t2YzhA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

I'm not with java code right now, but the python one is this:

def inserir(self, curso):
cursor = self.conn.cursor()
cursor.execute("""insert into Curso(area, carga_horaria, ementa,
nome)
values(%s, %s, %s, %s)""",
(curso.area, curso.carga_horaria, curso.ementa,
curso.nome))
cursor.close()
self.conn.commit()

- Simple insert, without returning.
- With a proxy between application and database i measured the quantity of
transfered data.
- Executed 1000 times.

Note: I tried to reuse the same cursor (instead of creating a new one every
time) and didn't get any significant change.

About server side cursors, I thought that it would only matter for select
statements. Wouldn't? Is it possible to implement this differently?

2011/5/11 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>

> On Wed, May 11, 2011 at 08:18:40AM -0300, Israel Ben Guilherme Fonseca
> wrote:
>
> > LOL, sorry I wasn't explicit in my text. But i did the same test with
> > psycopg2 too. :P
>
> :-) That makes better sense now.
>
> > "For simple inserts/updates/deletes i got a difference of about 15%
> between
> > sent and received data."
>
> ...
>
> > If someone could say: "well, sent/received data should be significantly
> > different and there is something wrong there", or "no it's normal". That
> > would be helpful.
>
> It very much depends on the exact queries used:
>
> 1) INSERT INTO my_table (a, b) VALUES (1, 2);
>
> 2) INSERT INTO my_table (a, b) VALUES (1, 2) RETURNING a, b;
>
> I would expect a lot more data to be returned with 2) than
> with 1).
>
> (That difference in testing setup does not explain the
> differences between drivers, though, if one assumes you've
> been using the very same queries with each.)
>
> The other thing is that one driver may be using server-side
> cursors while another driver will use client-side cursors.
> The latter will transfer a lot more data.
>
> I think you'd have to be more specific about the exact
> testing methodology to get useful advice.
>
> Karsten
> --
> GPG key ID E4071346 @ gpg-keyserver.de
> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>
> --
> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/psycopg
>


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 12:04:21
Message-ID: BANLkTikoTGCa6myFxoSv371okr09VaYjkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 1:01 PM, Israel Ben Guilherme Fonseca
<israel(dot)bgf(at)gmail(dot)com> wrote:

> Note: I tried to reuse the same cursor (instead of creating a new one every
> time) and didn't get any significant change.

There's no difference in the query issued if you use one or many cursors.

-- Daniele


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 12:17:55
Message-ID: 20110511121754.GK6613@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 09:01:35AM -0300, Israel Ben Guilherme Fonseca wrote:

> About server side cursors, I thought that it would only matter for select
> statements.

Indeed. But it might have mattered had you been using RETURNING.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 12:21:31
Message-ID: 20110511122131.GL6613@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 09:01:35AM -0300, Israel Ben Guilherme Fonseca wrote:

> I'm not with java code right now, but the python one is this:
>
> def inserir(self, curso):
> cursor = self.conn.cursor()
> cursor.execute("""insert into Curso(area, carga_horaria, ementa,
> nome)
> values(%s, %s, %s, %s)""",
> (curso.area, curso.carga_horaria, curso.ementa,
> curso.nome))
> cursor.close()
> self.conn.commit()
>
> - Simple insert, without returning.
> - With a proxy between application and database i measured the quantity of
> transfered data.
> - Executed 1000 times.

Is the connection torn down and set up for each of those
runs ? That would account for some back-traffic due to
connection setup queries being run by psycopg2.

1) What's your objective ? Are you trying to solve a
real-world problem ? Or is this premature optimization ?

2) If the claim that JDBC needs a lot less back traffic for
getting the same amount of work done we'd surely like to
know why and how (for which we'd need details) in order to
decide whether there's something to be gained from that
insight.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 12:36:04
Message-ID: BANLkTikrxWBD_+BBOw9NY4reiN03UqV1ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

- Same connection for all runs, the setup traffic was cleared before the
execution of the test itself.

1) I would say that it is a premature optimization, actually thats a Paper
for my university, I'm doing some ORMxDirect Access comparisons between
different languages and frameworks. After some tests, i got these "strange"
results, that's why I called you guys. :)

2) That would be very nice if we could help to improve psycopg driver. If I
did a smaller documented-test-suite and sent it to you would it be helpful?

2011/5/11 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>

> On Wed, May 11, 2011 at 09:01:35AM -0300, Israel Ben Guilherme Fonseca
> wrote:
>
> > I'm not with java code right now, but the python one is this:
> >
> > def inserir(self, curso):
> > cursor = self.conn.cursor()
> > cursor.execute("""insert into Curso(area, carga_horaria, ementa,
> > nome)
> > values(%s, %s, %s, %s)""",
> > (curso.area, curso.carga_horaria, curso.ementa,
> > curso.nome))
> > cursor.close()
> > self.conn.commit()
> >
> > - Simple insert, without returning.
> > - With a proxy between application and database i measured the quantity
> of
> > transfered data.
> > - Executed 1000 times.
>
> Is the connection torn down and set up for each of those
> runs ? That would account for some back-traffic due to
> connection setup queries being run by psycopg2.
>
> 1) What's your objective ? Are you trying to solve a
> real-world problem ? Or is this premature optimization ?
>
> 2) If the claim that JDBC needs a lot less back traffic for
> getting the same amount of work done we'd surely like to
> know why and how (for which we'd need details) in order to
> decide whether there's something to be gained from that
> insight.
>
> Karsten
> --
> GPG key ID E4071346 @ gpg-keyserver.de
> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>
> --
> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/psycopg
>


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 12:41:42
Message-ID: BANLkTi=KxDXhpm8jhe0yzTvbPOajf5LrHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Just to conclude, the stranger thing is:

The ruby driver got almost the same results of psycopg2, That's what I
expected at all. Only the Java version that get those odd values, SO maybe
it's possible that I'm still doing something wrong so I'll give a look at
the code again.

Anyway, thanks for the responses,

2011/5/11 Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>

> - Same connection for all runs, the setup traffic was cleared before the
> execution of the test itself.
>
> 1) I would say that it is a premature optimization, actually thats a Paper
> for my university, I'm doing some ORMxDirect Access comparisons between
> different languages and frameworks. After some tests, i got these "strange"
> results, that's why I called you guys. :)
>
> 2) That would be very nice if we could help to improve psycopg driver. If I
> did a smaller documented-test-suite and sent it to you would it be helpful?
>
>
> 2011/5/11 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
>
>> On Wed, May 11, 2011 at 09:01:35AM -0300, Israel Ben Guilherme Fonseca
>> wrote:
>>
>> > I'm not with java code right now, but the python one is this:
>> >
>> > def inserir(self, curso):
>> > cursor = self.conn.cursor()
>> > cursor.execute("""insert into Curso(area, carga_horaria, ementa,
>> > nome)
>> > values(%s, %s, %s, %s)""",
>> > (curso.area, curso.carga_horaria,
>> curso.ementa,
>> > curso.nome))
>> > cursor.close()
>> > self.conn.commit()
>> >
>> > - Simple insert, without returning.
>> > - With a proxy between application and database i measured the quantity
>> of
>> > transfered data.
>> > - Executed 1000 times.
>>
>> Is the connection torn down and set up for each of those
>> runs ? That would account for some back-traffic due to
>> connection setup queries being run by psycopg2.
>>
>> 1) What's your objective ? Are you trying to solve a
>> real-world problem ? Or is this premature optimization ?
>>
>> 2) If the claim that JDBC needs a lot less back traffic for
>> getting the same amount of work done we'd surely like to
>> know why and how (for which we'd need details) in order to
>> decide whether there's something to be gained from that
>> insight.
>>
>> Karsten
>> --
>> GPG key ID E4071346 @ gpg-keyserver.de
>> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>>
>> --
>> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/psycopg
>>
>
>


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 12:43:15
Message-ID: 20110511124315.GM6613@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 09:36:04AM -0300, Israel Ben Guilherme Fonseca wrote:

> 1) I would say that it is a premature optimization, actually thats a Paper
> for my university, I'm doing some ORMxDirect Access comparisons between
> different languages and frameworks. After some tests, i got these "strange"
> results, that's why I called you guys. :)
>
> 2) That would be very nice if we could help to improve psycopg driver. If I
> did a smaller documented-test-suite and sent it to you would it be helpful?

I would think that if you show some objective differences in
traffic and post how to reproduce them people on this list
who are much smarter than me are likely to be interested, yes.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 12:51:06
Message-ID: BANLkTikEZ_O_fSUXf38wYOgvjjcrPkqbCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 1:36 PM, Israel Ben Guilherme Fonseca
<israel(dot)bgf(at)gmail(dot)com> wrote:

> 1) I would say that it is a premature optimization, actually thats a Paper
> for my university, I'm doing some ORMxDirect Access comparisons between
> different languages and frameworks. After some tests, i got these "strange"
> results, that's why I called you guys. :)

Appreciate, thanks :)

> 2) That would be very nice if we could help to improve psycopg driver. If I
> did a smaller documented-test-suite and sent it to you would it be helpful?

Yes, any result that would improve the driver would be of course well accepted.

Before going for a well packaged test, have you seen what is the
difference in the communication? There may be some obvious (after you
know) design difference, such as the jdbc driver caching the plan, or
the results, or using the binary protocol.

Cheers,

-- Daniele


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 13:04:14
Message-ID: BANLkTi=7DqAdx1fq8RMBMoMN+i-_5p4CxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

No I didn't. I'm using the "noob-approach", and using all default, no
special optimization, because normally that's what the people around here
do. :)

For ruby, python and java, I just got the drivers and used it in the
simplest way.

The tests are all built on "synthetic" scenario, I cratead the base, tables,
tests. So it's unlike to have some unexpected behavior happening, but I'll
give a look anyway.

About caching plan / binary protocol (and I know very-little-almost-nothing
about these), it can't be used with psycopg? Let's say that the jdbc driver
use these techniques, shouldn't psycopg do the same? (let-me repeat that I
may not know about what i'm talking here. :) )

2011/5/11 Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>

> On Wed, May 11, 2011 at 1:36 PM, Israel Ben Guilherme Fonseca
> <israel(dot)bgf(at)gmail(dot)com> wrote:
>
> > 1) I would say that it is a premature optimization, actually thats a
> Paper
> > for my university, I'm doing some ORMxDirect Access comparisons between
> > different languages and frameworks. After some tests, i got these
> "strange"
> > results, that's why I called you guys. :)
>
> Appreciate, thanks :)
>
> > 2) That would be very nice if we could help to improve psycopg driver. If
> I
> > did a smaller documented-test-suite and sent it to you would it be
> helpful?
>
> Yes, any result that would improve the driver would be of course well
> accepted.
>
> Before going for a well packaged test, have you seen what is the
> difference in the communication? There may be some obvious (after you
> know) design difference, such as the jdbc driver caching the plan, or
> the results, or using the binary protocol.
>
> Cheers,
>
> -- Daniele
>


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 13:14:04
Message-ID: BANLkTikctiPtvj4UzWLj=QEqbwaJ21Jjew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 2:04 PM, Israel Ben Guilherme Fonseca
<israel(dot)bgf(at)gmail(dot)com> wrote:

> About caching plan / binary protocol (and I know very-little-almost-nothing
> about these), it can't be used with psycopg? Let's say that the jdbc driver
> use these techniques, shouldn't psycopg do the same? (let-me repeat that I
> may not know about what i'm talking here. :) )

We still don't know if the difference comes from anything of the
above. And even if id did, everything has a tradeoff: we should see if
the change would be a net improvement, as the bandwidth required is
only one of the possible parameters to take in consideration.

-- Daniele


From: Oswaldo <listas(at)soft-com(dot)es>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 14:21:15
Message-ID: 4DCA9B5B.8050701@soft-com.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

El 11/05/2011 15:04, Israel Ben Guilherme Fonseca escribió:
> No I didn't. I'm using the "noob-approach", and using all default, no
> special optimization, because normally that's what the people around
> here do. :)
>
> For ruby, python and java, I just got the drivers and used it in the
> simplest way.
>
> The tests are all built on "synthetic" scenario, I cratead the base,
> tables, tests. So it's unlike to have some unexpected behavior
> happening, but I'll give a look anyway.
>
> About caching plan / binary protocol (and I know
> very-little-almost-nothing about these), it can't be used with psycopg?
> Let's say that the jdbc driver use these techniques, shouldn't psycopg
> do the same? (let-me repeat that I may not know about what i'm talking
> here. :) )
>

Perhaps the jdbc driver are caching the querys, note that you are using
only inserts whitout any return, then these calls are equivalent but the
first is faster and uses less network traffic:

# 1 single call with 1000 inserts
cursor.execute("insert into foo (a,b,c) values(1,2,3);" * 1000)

# 1000 calls, each time an insert
for n in range(1000):
cursor.execute("insert into foo (a,b,c) values(1,2,3)")

I'm testing these technique to use it over low bandwith lines.

I suggest:

- Enable log_connections and log_ddl in the server an see what is
receiving the server in each test and the time used in the server side

- Check if the drivers are using ssl connections with compressed data.

- Modify your test app and do selects between inserts.

Regards

--
Oswaldo Hernández


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 14:38:40
Message-ID: BANLkTimq6gv3Sb=GgZoiiEeuYvg96sXMag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Thanks for the insight Oswaldo, i'll give a look. Do you know the exactly
file path to enable the log? Does the pgAdmin3 have some support to show
this data?

About the cache theory:

I'm explicitly committing the insert instruction at every iteration, so I
don't think that's possible to cache all the inserts in one statement like
you did.

But I'll give a look anyway,

Thanks again,

2011/5/11 Oswaldo <listas(at)soft-com(dot)es>

> El 11/05/2011 15:04, Israel Ben Guilherme Fonseca escribió:
>
> No I didn't. I'm using the "noob-approach", and using all default, no
>> special optimization, because normally that's what the people around
>> here do. :)
>>
>> For ruby, python and java, I just got the drivers and used it in the
>> simplest way.
>>
>> The tests are all built on "synthetic" scenario, I cratead the base,
>> tables, tests. So it's unlike to have some unexpected behavior
>> happening, but I'll give a look anyway.
>>
>> About caching plan / binary protocol (and I know
>> very-little-almost-nothing about these), it can't be used with psycopg?
>> Let's say that the jdbc driver use these techniques, shouldn't psycopg
>> do the same? (let-me repeat that I may not know about what i'm talking
>> here. :) )
>>
>>
> Perhaps the jdbc driver are caching the querys, note that you are using
> only inserts whitout any return, then these calls are equivalent but the
> first is faster and uses less network traffic:
>
> # 1 single call with 1000 inserts
> cursor.execute("insert into foo (a,b,c) values(1,2,3);" * 1000)
>
> # 1000 calls, each time an insert
> for n in range(1000):
> cursor.execute("insert into foo (a,b,c) values(1,2,3)")
>
> I'm testing these technique to use it over low bandwith lines.
>
> I suggest:
>
> - Enable log_connections and log_ddl in the server an see what is receiving
> the server in each test and the time used in the server side
>
> - Check if the drivers are using ssl connections with compressed data.
>
> - Modify your test app and do selects between inserts.
>
>
> Regards
>
> --
> Oswaldo Hernández
>
> --
> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/psycopg
>


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 14:56:56
Message-ID: BANLkTimXAbLBuHfpSGv0yFCJ5XGReW9wFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 3:38 PM, Israel Ben Guilherme Fonseca
<israel(dot)bgf(at)gmail(dot)com> wrote:

> I'm explicitly committing the insert instruction at every iteration, so I
> don't think that's possible to cache all the inserts in one statement like
> you did.

In this case, it may be due to the "set isolation level" issued
together with the "begin" by psycopg. Because you send a very short
statement between BEGIN/COMMIT pair the code for the set could be a
relevant part of the communication. The overhead would be less
relevant with bigger commands per transaction of course.

This is a point I just wanted to talk about on the ML: will try to
write this evening/in the next few days.

-- Daniele


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 16:56:08
Message-ID: BANLkTin-nfzw2dF3x3oeWefkg=dtAQ1kVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

With the log tips of Oswaldo, I saw that

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

on every instruction as you said Danielle. JDBC dont have those and it use
parameters for the statements, psycopg do not (its embedded in the SQL). I
dont know if this could have some impact.

I used the recommended way to avoid SQL Injection by the way.

execute("insert into table(col) values(%s)", (10,))

--

Anyway, I wil try to get more careful look on this later(maybe not today,
but soon), and tell you guys what happened.

Thanks very much for all the support.

2011/5/11 Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>

> On Wed, May 11, 2011 at 3:38 PM, Israel Ben Guilherme Fonseca
> <israel(dot)bgf(at)gmail(dot)com> wrote:
>
> > I'm explicitly committing the insert instruction at every iteration, so I
> > don't think that's possible to cache all the inserts in one statement
> like
> > you did.
>
> In this case, it may be due to the "set isolation level" issued
> together with the "begin" by psycopg. Because you send a very short
> statement between BEGIN/COMMIT pair the code for the set could be a
> relevant part of the communication. The overhead would be less
> relevant with bigger commands per transaction of course.
>
> This is a point I just wanted to talk about on the ML: will try to
> write this evening/in the next few days.
>
> -- Daniele
>


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 17:29:41
Message-ID: 20110511172941.GS6613@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 01:56:08PM -0300, Israel Ben Guilherme Fonseca wrote:

> With the log tips of Oswaldo, I saw that
>
> SET TRANSACTION ISOLATION LEVEL READ COMMITTED
>
> on every instruction as you said Danielle. JDBC dont have those and it use
> parameters for the statements, psycopg do not (its embedded in the SQL). I
> dont know if this could have some impact.
>
> I used the recommended way to avoid SQL Injection by the way.
>
> execute("insert into table(col) values(%s)", (10,))

I wonder how any of this is supposed to have any impact on
the expected up-/down-traffic ratio which was the only thing
the OP reported to be different between psycopg2 and JDBC
(if I correctly understood the original post).

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 17:37:28
Message-ID: BANLkTik8x_r7AsARSCsd+Fxz5Ld=PU0nYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

With that extra clause on every commit, that could increase the size of the
sent data. JDBC doesn't do it, so maybe it could be the cause, at last for
the sent data not the received data.

But it's too soon to say anything for now, I'll do some tests latter to see
what happens.

2011/5/11 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>

> On Wed, May 11, 2011 at 01:56:08PM -0300, Israel Ben Guilherme Fonseca
> wrote:
>
> > With the log tips of Oswaldo, I saw that
> >
> > SET TRANSACTION ISOLATION LEVEL READ COMMITTED
> >
> > on every instruction as you said Danielle. JDBC dont have those and it
> use
> > parameters for the statements, psycopg do not (its embedded in the SQL).
> I
> > dont know if this could have some impact.
> >
> > I used the recommended way to avoid SQL Injection by the way.
> >
> > execute("insert into table(col) values(%s)", (10,))
>
> I wonder how any of this is supposed to have any impact on
> the expected up-/down-traffic ratio which was the only thing
> the OP reported to be different between psycopg2 and JDBC
> (if I correctly understood the original post).
>
> Karsten
> --
> GPG key ID E4071346 @ gpg-keyserver.de
> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>
> --
> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/psycopg
>


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 18:40:23
Message-ID: 20110511184022.GT6613@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Wed, May 11, 2011 at 02:37:28PM -0300, Israel Ben Guilherme Fonseca wrote:

> With that extra clause on every commit, that could increase the size of the
> sent data. JDBC doesn't do it, so maybe it could be the cause, at last for
> the sent data not the received data.

This is what you originally wrote:

>> The volume of sent data (application -> database) and the response data
>> (database -> application) are basically the same.
>>
>> For simple inserts/updates/deletes i got a difference of about 15% between
>> sent and received data.

(later you said you were testing INSERTs only ...)

>> I used a ruby driver and got the same results for the same instruction.
>>
>> Finally I used a Java JDBC driver (for postgres too), and the difference was
>> huge, it was 70% (the received data was much smaller).

What you've been wondering here is this:

- psycopg2 sends, say, 100 bytes (+some) and receives 85 bytes (+some)
- JDBC sends, say, 100 bytes and receives 30 bytes

Now, since psycopg2 supposedly *sends more* data (namely
setting the transaction isolation level) how is that
supposed to make the difference between sent and received
*smaller* ? This would only make sense if the "more" data
would provoke *a lot more* data to be received. Which
setting the transaction isolation certainly shouldn't.

But maybe I'm royally screwing up my thinking :-)))

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-11 19:10:26
Message-ID: BANLkTik4UAcVGy1Pny3mPj25w6awP4xGyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Man now YOU really screwed my mind. AHUAHua

Actually i'm doing tests for all operations (insert, select, delete,
update), but here in mailist to keep things simple I'm trying to focus on
the insert operation.

Let's keep simple. The objective is (example):

Java
sent: 100 bytes
received 30 bytes

Python
sent: 100 bytes
received: 85 bytes

I want to see the python driver getting the same result (forget about %
differences maybe that turned everything more confuse). Did you get it?

But anyway, I'll do more tests, and maybe even put some charts here, I'll
recheck my implementations too. We are talking about this since the morning
and I didn't have time to recheck the code (now with postgres logs like
sugested). Maybe everything is an false alarm. :)

2011/5/11 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>

> On Wed, May 11, 2011 at 02:37:28PM -0300, Israel Ben Guilherme Fonseca
> wrote:
>
> > With that extra clause on every commit, that could increase the size of
> the
> > sent data. JDBC doesn't do it, so maybe it could be the cause, at last
> for
> > the sent data not the received data.
>
> This is what you originally wrote:
>
> >> The volume of sent data (application -> database) and the response data
> >> (database -> application) are basically the same.
> >>
> >> For simple inserts/updates/deletes i got a difference of about 15%
> between
> >> sent and received data.
>
> (later you said you were testing INSERTs only ...)
>
> >> I used a ruby driver and got the same results for the same instruction.
> >>
> >> Finally I used a Java JDBC driver (for postgres too), and the difference
> was
> >> huge, it was 70% (the received data was much smaller).
>
> What you've been wondering here is this:
>
> - psycopg2 sends, say, 100 bytes (+some) and receives 85 bytes (+some)
> - JDBC sends, say, 100 bytes and receives 30 bytes
>
> Now, since psycopg2 supposedly *sends more* data (namely
> setting the transaction isolation level) how is that
> supposed to make the difference between sent and received
> *smaller* ? This would only make sense if the "more" data
> would provoke *a lot more* data to be received. Which
> setting the transaction isolation certainly shouldn't.
>
> But maybe I'm royally screwing up my thinking :-)))
>
> Karsten
> --
> GPG key ID E4071346 @ gpg-keyserver.de
> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>
> --
> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/psycopg
>


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-12 01:16:08
Message-ID: BANLkTimpHE8x4RWRcmo5sLgsWMi+ufDCMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Man.. what a twist of events. My java code was indeed implemented wrong,
while python/ruby were all normalized with same input values 20 character
strings for the columns, the java was with 2! LOL

Now the bizarre thing... In the and the JDBC driver is the one with a hight
traffic (almost the double), maybe I'm wrong AGAIN, but now i'm migrating to
the jdbc forum to ask about it.

Anyway, I've already killed the responsible of this outrage.

Thanks guys for the help and the tremendous amount of knowledge, sorry for
the trouble.

Just for curiosity:

# Python

conn = psycopg2.connect("dbname=tcc host=localhost port=%s user=postgres
password=postgres" % 4444)

for i in range(100):
c = conn.cursor()
c.execute("select * from pessoa")
c.fetchall()
c.close()

# Sent 8.3 Receive 9.6 (KB)

// Java

Properties p = new Properties();
p.setProperty("URL", "jdbc:postgresql://localhost:4444/tcc");
p.setProperty("user", "postgres");
p.setProperty("password", "postgres");

Connection con = DriverManager.getConnection(p.getProperty("URL"),
p);
con.setAutoCommit(false);

for(int i = 0; i < 100; i++){
con.createStatement().executeQuery("select * from
pessoa").next();
}

// Sent 6.1 Receive 31.1 (KB)

The postgre log difference (repeats 100 times):

Python LOG: comando: select * from pessoa
Java LOG: executar <unnamed>: select * from pessoa

comando -> command, executar -> execute

Should this mean something?

2011/5/11 Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>

> Man now YOU really screwed my mind. AHUAHua
>
> Actually i'm doing tests for all operations (insert, select, delete,
> update), but here in mailist to keep things simple I'm trying to focus on
> the insert operation.
>
> Let's keep simple. The objective is (example):
>
> Java
> sent: 100 bytes
> received 30 bytes
>
> Python
> sent: 100 bytes
> received: 85 bytes
>
> I want to see the python driver getting the same result (forget about %
> differences maybe that turned everything more confuse). Did you get it?
>
> But anyway, I'll do more tests, and maybe even put some charts here, I'll
> recheck my implementations too. We are talking about this since the morning
> and I didn't have time to recheck the code (now with postgres logs like
> sugested). Maybe everything is an false alarm. :)
>
>
> 2011/5/11 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
>
>> On Wed, May 11, 2011 at 02:37:28PM -0300, Israel Ben Guilherme Fonseca
>> wrote:
>>
>> > With that extra clause on every commit, that could increase the size of
>> the
>> > sent data. JDBC doesn't do it, so maybe it could be the cause, at last
>> for
>> > the sent data not the received data.
>>
>> This is what you originally wrote:
>>
>> >> The volume of sent data (application -> database) and the response data
>> >> (database -> application) are basically the same.
>> >>
>> >> For simple inserts/updates/deletes i got a difference of about 15%
>> between
>> >> sent and received data.
>>
>> (later you said you were testing INSERTs only ...)
>>
>> >> I used a ruby driver and got the same results for the same instruction.
>> >>
>> >> Finally I used a Java JDBC driver (for postgres too), and the
>> difference was
>> >> huge, it was 70% (the received data was much smaller).
>>
>> What you've been wondering here is this:
>>
>> - psycopg2 sends, say, 100 bytes (+some) and receives 85 bytes (+some)
>> - JDBC sends, say, 100 bytes and receives 30 bytes
>>
>> Now, since psycopg2 supposedly *sends more* data (namely
>> setting the transaction isolation level) how is that
>> supposed to make the difference between sent and received
>> *smaller* ? This would only make sense if the "more" data
>> would provoke *a lot more* data to be received. Which
>> setting the transaction isolation certainly shouldn't.
>>
>> But maybe I'm royally screwing up my thinking :-)))
>>
>> Karsten
>> --
>> GPG key ID E4071346 @ gpg-keyserver.de
>> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>>
>> --
>> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/psycopg
>>
>
>


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-13 04:48:56
Message-ID: BANLkTim=do_7FX4-YHivovKhhUO5t-dahQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Hi again,

I'm still doing my performance tests, but now psycopg2 is really kicking the
bucket with an extremely low traffic usage in comparison to Java. My brand
new test, shows a difference of 8 times, and again I think that it could be
wrong, so that's why I'm here... again.

A guy in the jdbc-mailist give me a tip of a program to measure traffic
usage, it's called 'wireshark', very good indeed and I got this following
results:

1 'Person' table, 2 columns (id, name), 7000++ registers.

The traffic difference was:

Java 220861 Bytes
Python 29014 Bytes

A difference of 8x. Now, I did the same test with my old proxy
implementation and got similar results. So.. that's strange. Very strange.
My question is, is there any specific implementation detail on psycopg?
Maybe some voodoo magic? :)

I think that's much probably that I'm doing something wrong, but I did and
redid all the tests many times until now, that I'm starting to accept this.

My ONLY track of explanation is this postgres log (debug5)

JAVA

simpletests DEPURAÇÃO: análise de <unnamed>: select * from "Person"
simpletests DEPURAÇÃO: StartTransactionCommand
simpletests DEPURAÇÃO: StartTransaction
simpletests DEPURAÇÃO: name: unnamed; blockState: DEFAULT; state:
INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
simpletests DEPURAÇÃO: ligação de <unnamed> para <unnamed>
simpletests LOG: executar <unnamed>: select * from "Person"
simpletests DEPURAÇÃO: CommitTransactionCommand
simpletests DEPURAÇÃO: CommitTransaction
simpletests DEPURAÇÃO: name: unnamed; blockState: STARTED; state:
INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
simpletests DEPURAÇÃO: shmem_exit(0): 6 callbacks to make
simpletests DEPURAÇÃO: proc_exit(0): 4 callbacks to make
simpletests LOG: desconexão: tempo da sessão: 0:00:12.172 usuário=postgres
banco de dados=simpletests máquina=localhost port=56401
simpletests DEPURAÇÃO: exit(0)
simpletests DEPURAÇÃO: shmem_exit(-1): 0 callbacks to make
simpletests DEPURAÇÃO: proc_exit(-1): 0 callbacks to make

PYTHON

simpletests DEPURAÇÃO: StartTransactionCommand
simpletests DEPURAÇÃO: StartTransaction
simpletests DEPURAÇÃO: name: unnamed; blockState: DEFAULT; state:
INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
simpletests LOG: comando: BEGIN; SET TRANSACTION ISOLATION LEVEL READ
COMMITTED
simpletests DEPURAÇÃO: ProcessUtility
simpletests DEPURAÇÃO: CommitTransactionCommand
simpletests DEPURAÇÃO: StartTransactionCommand
simpletests DEPURAÇÃO: ProcessUtility
simpletests DEPURAÇÃO: CommitTransactionCommand
simpletests DEPURAÇÃO: StartTransactionCommand
simpletests LOG: comando: select * from "Person"
simpletests DEPURAÇÃO: CommitTransactionCommand

The python log looks somewhat cleaner, and Java got this:

simpletests DEPURAÇÃO: shmem_exit(0): 6 callbacks to make
simpletests DEPURAÇÃO: proc_exit(0): 4 callbacks to make

10 extra callbacks? I dont have any idea about what is about, but maybe it
could mean something.

I hosted all the files and source code for this test-case on google code. If
you are using ubuntu it probabbly take just 10 minutes to execute everthing
and see this with your own eyes (and hopefully someone would say "your
idiot, you did THAT <code> wrong"). Just read the readme to get the tests
running.

http://orm-native-comparative.googlecode.com/files/tests.zip

If you got time to test it, thanks. If you don't, thanks anyway to at least
read this huge email.

Israel


From: Nicolas Grilly <nicolas(at)gardentechno(dot)com>
To: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-13 09:05:31
Message-ID: BANLkTi=mWKGXcMxSp7_Jq2v7eKGzqoLn9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Interesting. Just two random thoughts:
- Connections to a PostgreSQL server can use SSL or not. Have you checked
your Python and Java test cases are both in the same mode?
- Query results can be returned in text mode or binary mode. Have you
checked that too?

These two settings can explain the big difference you noticed.

Best,

Nicolas

On Fri, May 13, 2011 at 06:48, Israel Ben Guilherme Fonseca <
israel(dot)bgf(at)gmail(dot)com> wrote:

> Hi again,
>
> I'm still doing my performance tests, but now psycopg2 is really kicking
> the bucket with an extremely low traffic usage in comparison to Java. My
> brand new test, shows a difference of 8 times, and again I think that it
> could be wrong, so that's why I'm here... again.
>
> A guy in the jdbc-mailist give me a tip of a program to measure traffic
> usage, it's called 'wireshark', very good indeed and I got this following
> results:
>
> 1 'Person' table, 2 columns (id, name), 7000++ registers.
>
> The traffic difference was:
>
> Java 220861 Bytes
> Python 29014 Bytes
>
> A difference of 8x. Now, I did the same test with my old proxy
> implementation and got similar results. So.. that's strange. Very strange.
> My question is, is there any specific implementation detail on psycopg?
> Maybe some voodoo magic? :)
>
> I think that's much probably that I'm doing something wrong, but I did and
> redid all the tests many times until now, that I'm starting to accept this.
>
> My ONLY track of explanation is this postgres log (debug5)
>
> JAVA
>
> simpletests DEPURAÇÃO: análise de <unnamed>: select * from "Person"
> simpletests DEPURAÇÃO: StartTransactionCommand
> simpletests DEPURAÇÃO: StartTransaction
> simpletests DEPURAÇÃO: name: unnamed; blockState: DEFAULT; state:
> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
> simpletests DEPURAÇÃO: ligação de <unnamed> para <unnamed>
> simpletests LOG: executar <unnamed>: select * from "Person"
> simpletests DEPURAÇÃO: CommitTransactionCommand
> simpletests DEPURAÇÃO: CommitTransaction
> simpletests DEPURAÇÃO: name: unnamed; blockState: STARTED; state:
> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
> simpletests DEPURAÇÃO: shmem_exit(0): 6 callbacks to make
> simpletests DEPURAÇÃO: proc_exit(0): 4 callbacks to make
> simpletests LOG: desconexão: tempo da sessão: 0:00:12.172 usuário=postgres
> banco de dados=simpletests máquina=localhost port=56401
> simpletests DEPURAÇÃO: exit(0)
> simpletests DEPURAÇÃO: shmem_exit(-1): 0 callbacks to make
> simpletests DEPURAÇÃO: proc_exit(-1): 0 callbacks to make
>
> PYTHON
>
> simpletests DEPURAÇÃO: StartTransactionCommand
> simpletests DEPURAÇÃO: StartTransaction
> simpletests DEPURAÇÃO: name: unnamed; blockState: DEFAULT; state:
> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
> simpletests LOG: comando: BEGIN; SET TRANSACTION ISOLATION LEVEL READ
> COMMITTED
> simpletests DEPURAÇÃO: ProcessUtility
> simpletests DEPURAÇÃO: CommitTransactionCommand
> simpletests DEPURAÇÃO: StartTransactionCommand
> simpletests DEPURAÇÃO: ProcessUtility
> simpletests DEPURAÇÃO: CommitTransactionCommand
> simpletests DEPURAÇÃO: StartTransactionCommand
> simpletests LOG: comando: select * from "Person"
> simpletests DEPURAÇÃO: CommitTransactionCommand
>
> The python log looks somewhat cleaner, and Java got this:
>
> simpletests DEPURAÇÃO: shmem_exit(0): 6 callbacks to make
> simpletests DEPURAÇÃO: proc_exit(0): 4 callbacks to make
>
> 10 extra callbacks? I dont have any idea about what is about, but maybe it
> could mean something.
>
> I hosted all the files and source code for this test-case on google code.
> If you are using ubuntu it probabbly take just 10 minutes to execute
> everthing and see this with your own eyes (and hopefully someone would say
> "your idiot, you did THAT <code> wrong"). Just read the readme to get the
> tests running.
>
> http://orm-native-comparative.googlecode.com/files/tests.zip
>
> If you got time to test it, thanks. If you don't, thanks anyway to at least
> read this huge email.
>
> Israel
>


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-13 11:23:25
Message-ID: BANLkTimCBDgM1BjPb6+eJV2_1yrd2N2Y-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Could you give some insights about how to check for the special parameters?

I mean, the server was installed with the default options (apt-get install),
if it's a server side configurations it's the same for both.

From my readings:

1 - SSL need all those certified stuff, I didn't do any of this.
2 - Is binary mode only for the COPY statement? Well, I didn't do any of
special configurations for this too, only if it's default on psycopg2.

2011/5/13 Nicolas Grilly <nicolas(at)gardentechno(dot)com>

> Interesting. Just two random thoughts:
> - Connections to a PostgreSQL server can use SSL or not. Have you checked
> your Python and Java test cases are both in the same mode?
> - Query results can be returned in text mode or binary mode. Have you
> checked that too?
>
> These two settings can explain the big difference you noticed.
>
> Best,
>
> Nicolas
>
> On Fri, May 13, 2011 at 06:48, Israel Ben Guilherme Fonseca <
> israel(dot)bgf(at)gmail(dot)com> wrote:
>
>> Hi again,
>>
>> I'm still doing my performance tests, but now psycopg2 is really kicking
>> the bucket with an extremely low traffic usage in comparison to Java. My
>> brand new test, shows a difference of 8 times, and again I think that it
>> could be wrong, so that's why I'm here... again.
>>
>> A guy in the jdbc-mailist give me a tip of a program to measure traffic
>> usage, it's called 'wireshark', very good indeed and I got this following
>> results:
>>
>> 1 'Person' table, 2 columns (id, name), 7000++ registers.
>>
>> The traffic difference was:
>>
>> Java 220861 Bytes
>> Python 29014 Bytes
>>
>> A difference of 8x. Now, I did the same test with my old proxy
>> implementation and got similar results. So.. that's strange. Very strange.
>> My question is, is there any specific implementation detail on psycopg?
>> Maybe some voodoo magic? :)
>>
>> I think that's much probably that I'm doing something wrong, but I did and
>> redid all the tests many times until now, that I'm starting to accept this.
>>
>> My ONLY track of explanation is this postgres log (debug5)
>>
>> JAVA
>>
>> simpletests DEPURAÇÃO: análise de <unnamed>: select * from "Person"
>> simpletests DEPURAÇÃO: StartTransactionCommand
>> simpletests DEPURAÇÃO: StartTransaction
>> simpletests DEPURAÇÃO: name: unnamed; blockState: DEFAULT; state:
>> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
>> simpletests DEPURAÇÃO: ligação de <unnamed> para <unnamed>
>> simpletests LOG: executar <unnamed>: select * from "Person"
>> simpletests DEPURAÇÃO: CommitTransactionCommand
>> simpletests DEPURAÇÃO: CommitTransaction
>> simpletests DEPURAÇÃO: name: unnamed; blockState: STARTED; state:
>> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
>> simpletests DEPURAÇÃO: shmem_exit(0): 6 callbacks to make
>> simpletests DEPURAÇÃO: proc_exit(0): 4 callbacks to make
>> simpletests LOG: desconexão: tempo da sessão: 0:00:12.172
>> usuário=postgres banco de dados=simpletests máquina=localhost port=56401
>> simpletests DEPURAÇÃO: exit(0)
>> simpletests DEPURAÇÃO: shmem_exit(-1): 0 callbacks to make
>> simpletests DEPURAÇÃO: proc_exit(-1): 0 callbacks to make
>>
>> PYTHON
>>
>> simpletests DEPURAÇÃO: StartTransactionCommand
>> simpletests DEPURAÇÃO: StartTransaction
>> simpletests DEPURAÇÃO: name: unnamed; blockState: DEFAULT; state:
>> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
>> simpletests LOG: comando: BEGIN; SET TRANSACTION ISOLATION LEVEL READ
>> COMMITTED
>> simpletests DEPURAÇÃO: ProcessUtility
>> simpletests DEPURAÇÃO: CommitTransactionCommand
>> simpletests DEPURAÇÃO: StartTransactionCommand
>> simpletests DEPURAÇÃO: ProcessUtility
>> simpletests DEPURAÇÃO: CommitTransactionCommand
>> simpletests DEPURAÇÃO: StartTransactionCommand
>> simpletests LOG: comando: select * from "Person"
>> simpletests DEPURAÇÃO: CommitTransactionCommand
>>
>> The python log looks somewhat cleaner, and Java got this:
>>
>> simpletests DEPURAÇÃO: shmem_exit(0): 6 callbacks to make
>> simpletests DEPURAÇÃO: proc_exit(0): 4 callbacks to make
>>
>> 10 extra callbacks? I dont have any idea about what is about, but maybe it
>> could mean something.
>>
>> I hosted all the files and source code for this test-case on google code.
>> If you are using ubuntu it probabbly take just 10 minutes to execute
>> everthing and see this with your own eyes (and hopefully someone would say
>> "your idiot, you did THAT <code> wrong"). Just read the readme to get the
>> tests running.
>>
>> http://orm-native-comparative.googlecode.com/files/tests.zip
>>
>> If you got time to test it, thanks. If you don't, thanks anyway to at
>> least read this huge email.
>>
>> Israel
>>
>
>


From: Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-13 16:33:40
Message-ID: BANLkTiktxisXXFZ2OO1nuE_40Xd+o9arUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Well I finally figure it out (or somewhat).

The "problem" is indeed the 'sslmode' parameter of the psycopg2 connection.

If omitted, I think that's the default value is 'prefer' (enable ssl if the
server support). With that I got the very low value of 29014KB for traffic.

When i explicitly set it to 'disable', I got almost the same result of java
221222KB.

So here's the question, does the SSL also compress the traffic data? I
enabled the self-signed certificate mode in the Java test and got the same
result.

Note: I checked the package contents of the traffic with Wireshark to check
if it's indeed encrypted.

2011/5/13 Israel Ben Guilherme Fonseca <israel(dot)bgf(at)gmail(dot)com>

> Could you give some insights about how to check for the special parameters?
>
> I mean, the server was installed with the default options (apt-get
> install), if it's a server side configurations it's the same for both.
>
> From my readings:
>
> 1 - SSL need all those certified stuff, I didn't do any of this.
> 2 - Is binary mode only for the COPY statement? Well, I didn't do any of
> special configurations for this too, only if it's default on psycopg2.
>
>
> 2011/5/13 Nicolas Grilly <nicolas(at)gardentechno(dot)com>
>
>> Interesting. Just two random thoughts:
>> - Connections to a PostgreSQL server can use SSL or not. Have you checked
>> your Python and Java test cases are both in the same mode?
>> - Query results can be returned in text mode or binary mode. Have you
>> checked that too?
>>
>> These two settings can explain the big difference you noticed.
>>
>> Best,
>>
>> Nicolas
>>
>> On Fri, May 13, 2011 at 06:48, Israel Ben Guilherme Fonseca <
>> israel(dot)bgf(at)gmail(dot)com> wrote:
>>
>>> Hi again,
>>>
>>> I'm still doing my performance tests, but now psycopg2 is really kicking
>>> the bucket with an extremely low traffic usage in comparison to Java. My
>>> brand new test, shows a difference of 8 times, and again I think that it
>>> could be wrong, so that's why I'm here... again.
>>>
>>> A guy in the jdbc-mailist give me a tip of a program to measure traffic
>>> usage, it's called 'wireshark', very good indeed and I got this following
>>> results:
>>>
>>> 1 'Person' table, 2 columns (id, name), 7000++ registers.
>>>
>>> The traffic difference was:
>>>
>>> Java 220861 Bytes
>>> Python 29014 Bytes
>>>
>>> A difference of 8x. Now, I did the same test with my old proxy
>>> implementation and got similar results. So.. that's strange. Very strange.
>>> My question is, is there any specific implementation detail on psycopg?
>>> Maybe some voodoo magic? :)
>>>
>>> I think that's much probably that I'm doing something wrong, but I did
>>> and redid all the tests many times until now, that I'm starting to accept
>>> this.
>>>
>>> My ONLY track of explanation is this postgres log (debug5)
>>>
>>> JAVA
>>>
>>> simpletests DEPURAÇÃO: análise de <unnamed>: select * from "Person"
>>> simpletests DEPURAÇÃO: StartTransactionCommand
>>> simpletests DEPURAÇÃO: StartTransaction
>>> simpletests DEPURAÇÃO: name: unnamed; blockState: DEFAULT; state:
>>> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
>>> simpletests DEPURAÇÃO: ligação de <unnamed> para <unnamed>
>>> simpletests LOG: executar <unnamed>: select * from "Person"
>>> simpletests DEPURAÇÃO: CommitTransactionCommand
>>> simpletests DEPURAÇÃO: CommitTransaction
>>> simpletests DEPURAÇÃO: name: unnamed; blockState: STARTED; state:
>>> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
>>> simpletests DEPURAÇÃO: shmem_exit(0): 6 callbacks to make
>>> simpletests DEPURAÇÃO: proc_exit(0): 4 callbacks to make
>>> simpletests LOG: desconexão: tempo da sessão: 0:00:12.172
>>> usuário=postgres banco de dados=simpletests máquina=localhost port=56401
>>> simpletests DEPURAÇÃO: exit(0)
>>> simpletests DEPURAÇÃO: shmem_exit(-1): 0 callbacks to make
>>> simpletests DEPURAÇÃO: proc_exit(-1): 0 callbacks to make
>>>
>>> PYTHON
>>>
>>> simpletests DEPURAÇÃO: StartTransactionCommand
>>> simpletests DEPURAÇÃO: StartTransaction
>>> simpletests DEPURAÇÃO: name: unnamed; blockState: DEFAULT; state:
>>> INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
>>> simpletests LOG: comando: BEGIN; SET TRANSACTION ISOLATION LEVEL READ
>>> COMMITTED
>>> simpletests DEPURAÇÃO: ProcessUtility
>>> simpletests DEPURAÇÃO: CommitTransactionCommand
>>> simpletests DEPURAÇÃO: StartTransactionCommand
>>> simpletests DEPURAÇÃO: ProcessUtility
>>> simpletests DEPURAÇÃO: CommitTransactionCommand
>>> simpletests DEPURAÇÃO: StartTransactionCommand
>>> simpletests LOG: comando: select * from "Person"
>>> simpletests DEPURAÇÃO: CommitTransactionCommand
>>>
>>> The python log looks somewhat cleaner, and Java got this:
>>>
>>> simpletests DEPURAÇÃO: shmem_exit(0): 6 callbacks to make
>>> simpletests DEPURAÇÃO: proc_exit(0): 4 callbacks to make
>>>
>>> 10 extra callbacks? I dont have any idea about what is about, but maybe
>>> it could mean something.
>>>
>>> I hosted all the files and source code for this test-case on google code.
>>> If you are using ubuntu it probabbly take just 10 minutes to execute
>>> everthing and see this with your own eyes (and hopefully someone would say
>>> "your idiot, you did THAT <code> wrong"). Just read the readme to get the
>>> tests running.
>>>
>>> http://orm-native-comparative.googlecode.com/files/tests.zip
>>>
>>> If you got time to test it, thanks. If you don't, thanks anyway to at
>>> least read this huge email.
>>>
>>> Israel
>>>
>>
>>
>


From: Federico Di Gregorio <federico(dot)digregorio(at)dndg(dot)it>
To: psycopg(at)postgresql(dot)org
Subject: Re: Input and Output data traffic
Date: 2011-05-15 17:25:50
Message-ID: 4DD00C9E.8090405@dndg.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On 13/05/11 18:33, Israel Ben Guilherme Fonseca wrote:
> Well I finally figure it out (or somewhat).
>
> The "problem" is indeed the 'sslmode' parameter of the psycopg2 connection.
>
> If omitted, I think that's the default value is 'prefer' (enable ssl if
> the server support). With that I got the very low value of 29014KB for
> traffic.
>
> When i explicitly set it to 'disable', I got almost the same result of
> java 221222KB.
>
> So here's the question, does the SSL also compress the traffic data? I
> enabled the self-signed certificate mode in the Java test and got the
> same result.

Absolutely yes. Compression is considered a standard part of every
modern encription algorithm.

federico

--
Federico Di Gregorio federico(dot)digregorio(at)dndg(dot)it
Studio Associato Di Nunzio e Di Gregorio http://dndg.it
If a process is potentially good, but 90%+ of the time smart and
well-intentioned people screw it up, then it's a bad process.
-- Steve Yegge