Quick Links

strange performance problem

Lists:	pgsql-general

From:	Linos <info(at)linos(dot)es>
To:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	strange performance problem
Date:	2009-02-27 13:04:25
Message-ID:	49A7E4D9.2010303@linos.es
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Hello i have the same table with the same data in my development machine and in
a small server in production. The table is this:

the column "foto" has a toast table, aside from the fact that it have in the
server three triggers they are exactly the same, with the same data too, my
development machine has version 8.3.6 (linux kernel 2.6.28) and production
server has version 8.3.3 (linux kernel 2.6.26), the settings in postgresql.conf
are nearly the same except for work_men (24 server, 36 development machine) and
effective_cache_size (1024 server, 1536 development machine), they have the same
sysctl settings and limits too, and the same mount options for the ext3
filesystem that have the data, i have a single sata disk (wd velociraptor) in my
development machine and the server it is using a linux software raid10 with 4
sata disks.

I have detected that a simple query from the application i am developing in QT
it is really fast in my machine and takes too much time in production server, i
am logging the queries that spend more than 500ms so i have this in the log.

2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms sentencia: SELECT
"nombre", "subfamilia_id", "id_familia", "hasta", "foto", "id_seccion",
"id_categoria" FROM "modelo_subfamilia"

so i have been testing in my machine and in the server the same query to see the
difference.

EXPLAIN ANALYZE:
-development:
Seq Scan on modelo_subfamilia (cost=0.00..11.68 rows=368 width=73) (actual
time=0.010..0.092 rows=368 loops=1)
Total runtime: 0.174 ms

-server:
Seq Scan on modelo_subfamilia (cost=0.00..6.10 rows=368 width=69) (actual
time=0.008..0.158 rows=368 loops=1)
Total runtime: 0.289 ms

PSQL with \timing:
-development: Time: 72,441 ms
-server: Time: 78,762 ms

but if i load it from QT or from pgadmin i get more than 4 seconds in server and
~100ms in develoment machime, if i try the query without the "foto" column i get
2ms in development and 30ms in server so the difference its there anyway but not
in psql commandline it seems to be only when accessing from a graphical front
end, and with the complete query with foto column included i get the postgresql
process to eat 90% of the cpu for the complete 4 seconds that it gets to send me
the result so it not seems to be a problem with the cpu usage from the graphic
libs (no QT or WxWindows), how could i debug this problem?, where should i begin
to search? Thanks.

Regards,
Miguel Angel.

From:	Richard Huxton <dev(at)archonet(dot)com>
To:	Linos <info(at)linos(dot)es>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: strange performance problem
Date:	2009-02-27 13:33:12
Message-ID:	49A7EB98.4020501@archonet.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Linos wrote:
> 2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms sentencia:
> SELECT "nombre", "subfamilia_id", "id_familia", "hasta", "foto",
> "id_seccion", "id_categoria" FROM "modelo_subfamilia"

> PSQL with \timing:
> -development: Time: 72,441 ms
> -server: Time: 78,762 ms

> but if i load it from QT or from pgadmin i get more than 4 seconds in
> server and ~100ms in develoment machime, if i try the query without the
> "foto" column i get 2ms in development and 30ms in server

OK, so:
1. No "foto" - both quick
2. psql + "foto" - both slow
3. QT + "foto" - slow only on server

The bit that puzzles me is why both are slow in #2 and not in #3.

First things first: run "VACUUM FULL VERBOSE modela_subfamilia" on both
systems and see how many pages were being taken up. I'm guessing it will
be more on the server, but is it a *lot* more?

Then run "REINDEX TABLE modela_subfamilia" to clean up your indexes.

If it's still a problem that suggests the two systems are doing
something different with the bytea encoding/decoding. Check:

1. Connection settings - is one tcp/ip and the other unix sockets?
2. Binary/text mode - are you using a binary format for query results on
the development machine?
3. Are you fetching the entire result-set on the server and only the
first row(s) on your dev. machine?
4. Encoding/locale differences - can't see how this would matter for
bytea, but worth ruling out.

--
Richard Huxton
Archonet Ltd

From:	Linos <info(at)linos(dot)es>
To:	Richard Huxton <dev(at)archonet(dot)com>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: strange performance problem
Date:	2009-02-27 13:54:36
Message-ID:	49A7F09C.9040003@linos.es
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Richard Huxton escribió:
> Linos wrote:
>> 2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms sentencia:
>> SELECT "nombre", "subfamilia_id", "id_familia", "hasta", "foto",
>> "id_seccion", "id_categoria" FROM "modelo_subfamilia"
>
>> PSQL with \timing:
>> -development: Time: 72,441 ms
>> -server: Time: 78,762 ms
>
>> but if i load it from QT or from pgadmin i get more than 4 seconds in
>> server and ~100ms in develoment machime, if i try the query without the
>> "foto" column i get 2ms in development and 30ms in server
>
> OK, so:
> 1. No "foto" - both quick
> 2. psql + "foto" - both slow
> 3. QT + "foto" - slow only on server

1.No "foto" -both quick but still a noticeable difference between them 2ms
develoment - 30ms server
2. psql + "foto" -both quick really, they are about 70ms, not bad giving that
foto are bytea with small png images.
3. QT or WXWindows + "foto" -slow only one server yes.

>
> The bit that puzzles me is why both are slow in #2 and not in #3.
>
>
> First things first: run "VACUUM FULL VERBOSE modela_subfamilia" on both
> systems and see how many pages were being taken up. I'm guessing it will
> be more on the server, but is it a *lot* more?
>
> Then run "REINDEX TABLE modela_subfamilia" to clean up your indexes.
>
> If it's still a problem that suggests the two systems are doing
> something different with the bytea encoding/decoding. Check:
>
> 1. Connection settings - is one tcp/ip and the other unix sockets?
> 2. Binary/text mode - are you using a binary format for query results on
> the development machine?
> 3. Are you fetching the entire result-set on the server and only the
> first row(s) on your dev. machine?
> 4. Encoding/locale differences - can't see how this would matter for
> bytea, but worth ruling out.
>

After the vacuum full verbose and reindex still the same problem (i had tried
the vacuum before).

1- The same in the two machines, tcp/ip with localhost.
2- I am exactly the same code in the two machines and the same pgadmin3 version too.
3- Ever the entire result set.
4- I am using es_ES.UTF8 in the two machines

What can be using wxwindows and QT to access postgresql that psql it is not
using, libpq?

Regards,
Miguel Angel.

From:	Richard Huxton <dev(at)archonet(dot)com>
To:	Linos <info(at)linos(dot)es>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: strange performance problem
Date:	2009-02-27 18:54:05
Message-ID:	49A836CD.7000608@archonet.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Linos wrote:
> Richard Huxton escribió:
>> Linos wrote:
>>> 2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms sentencia:
>>> SELECT "nombre", "subfamilia_id", "id_familia", "hasta", "foto",
>>> "id_seccion", "id_categoria" FROM "modelo_subfamilia"
>>
>>> PSQL with \timing:
>>> -development: Time: 72,441 ms
>>> -server: Time: 78,762 ms
>>
>>> but if i load it from QT or from pgadmin i get more than 4 seconds in
>>> server and ~100ms in develoment machime, if i try the query without the
>>> "foto" column i get 2ms in development and 30ms in server
>>
>> OK, so:
>> 1. No "foto" - both quick
>> 2. psql + "foto" - both slow
>> 3. QT + "foto" - slow only on server
>
> 1.No "foto" -both quick but still a noticeable
> difference between them 2ms develoment - 30ms server
> 2. psql + "foto" -both quick really, they are about 70ms,
> not bad giving that foto are bytea with small png images.

Ah, sorry - I read the psql timings as 72 thousand ms (72 seconds) - of
course you're using European decimal marks.

> 3. QT or WXWindows + "foto" -slow only one server yes.
>
>> The bit that puzzles me is why both are slow in #2 and not in #3.

OK - well, the fact that both psql are fast means there's nothing too
wrong with your setup. It must be something to do with the application
libraries.

> After the vacuum full verbose and reindex still the same problem (i had
> tried the vacuum before).

OK. Worth ruling it out.

> 1- The same in the two machines, tcp/ip with localhost.

Hmm...

> 2- I am exactly the same code in the two machines and the same pgadmin3
> version too.

Good. We can rule that out.

> 3- Ever the entire result set.

Good.

> 4- I am using es_ES.UTF8 in the two machines

Good.

> What can be using wxwindows and QT to access postgresql that psql it is
> not using, libpq?

Well, I'm pretty sure that pgadmin will be using libpq at some level,
even if there is other code above it.

Either:

1. One machine (the fast one) is actually using unix sockets and not
tcp/ip+localhost like you think.
2. The networking setup is different on each.
3. Something your code is doing with the bytea data is slower on one
machine than another. I seem to remember that pgadmin used to be quite
slow at displaying large amounts of data. They did some work on that,
but it might be that your use-case still suffers from it.

For #1 try the psql test again, but with "-h localhost" and "-h /tmp"
(or whatever directory your unix socket is in - might be
/var/run/postgresql or similar too).

For #2, you can always try timing "psql -h localhost ... > /dev/null" on
both machines. If you capture port 5432 with something like "tcpdump -w
ip.dump host localhost and port 5432" you can then use wireshark to see
exactly why it's slow.

For #3, I guess you'd need to reduce your code to just fetching the data
and time that. You may have already done this of course.

HTH

--
Richard Huxton
Archonet Ltd

From:	Linos <info(at)linos(dot)es>
To:	Richard Huxton <dev(at)archonet(dot)com>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: strange performance problem
Date:	2009-02-27 19:30:32
Message-ID:	49A83F58.2070304@linos.es
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Richard Huxton escribió:
> Linos wrote:
>> Richard Huxton escribió:
>>> Linos wrote:
>>>> 2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms sentencia:
>>>> SELECT "nombre", "subfamilia_id", "id_familia", "hasta", "foto",
>>>> "id_seccion", "id_categoria" FROM "modelo_subfamilia"
>>>> PSQL with \timing:
>>>> -development: Time: 72,441 ms
>>>> -server: Time: 78,762 ms
>>>> but if i load it from QT or from pgadmin i get more than 4 seconds in
>>>> server and ~100ms in develoment machime, if i try the query without the
>>>> "foto" column i get 2ms in development and 30ms in server
>>> OK, so:
>>> 1. No "foto" - both quick
>>> 2. psql + "foto" - both slow
>>> 3. QT + "foto" - slow only on server
>> 1.No "foto" -both quick but still a noticeable
>> difference between them 2ms develoment - 30ms server
>> 2. psql + "foto" -both quick really, they are about 70ms,
>> not bad giving that foto are bytea with small png images.
>
> Ah, sorry - I read the psql timings as 72 thousand ms (72 seconds) - of
> course you're using European decimal marks.
>
>> 3. QT or WXWindows + "foto" -slow only one server yes.
>>
>>> The bit that puzzles me is why both are slow in #2 and not in #3.
>
> OK - well, the fact that both psql are fast means there's nothing too
> wrong with your setup. It must be something to do with the application
> libraries.
>
>> After the vacuum full verbose and reindex still the same problem (i had
>> tried the vacuum before).
>
> OK. Worth ruling it out.
>
>> 1- The same in the two machines, tcp/ip with localhost.
>
> Hmm...
>
>> 2- I am exactly the same code in the two machines and the same pgadmin3
>> version too.
>
> Good. We can rule that out.
>
>> 3- Ever the entire result set.
>
> Good.
>
>> 4- I am using es_ES.UTF8 in the two machines
>
> Good.
>
>> What can be using wxwindows and QT to access postgresql that psql it is
>> not using, libpq?
>
> Well, I'm pretty sure that pgadmin will be using libpq at some level,
> even if there is other code above it.
>
> Either:
>
> 1. One machine (the fast one) is actually using unix sockets and not
> tcp/ip+localhost like you think.
> 2. The networking setup is different on each.
> 3. Something your code is doing with the bytea data is slower on one
> machine than another. I seem to remember that pgadmin used to be quite
> slow at displaying large amounts of data. They did some work on that,
> but it might be that your use-case still suffers from it.
>
> For #1 try the psql test again, but with "-h localhost" and "-h /tmp"
> (or whatever directory your unix socket is in - might be
> /var/run/postgresql or similar too).

Ok, thanks for the trick now i know where to search, after trying with -h
localhost psql it is slow too in the server from 80,361 with \timing to 4259,453
using -h localhost. Any ideas what can be the problem here? i am going to make
what you suggest and capture analyze the traffic, after find the hole i have
tried in other debian server with the same kernel 2.6.26 and i have the same
problem (my development machine it is Arch Linux with 2.6.28).

Regards,
Miguel Angel.

> For #2, you can always try timing "psql -h localhost ... > /dev/null" on
> both machines. If you capture port 5432 with something like "tcpdump -w
> ip.dump host localhost and port 5432" you can then use wireshark to see
> exactly why it's slow.
>
> For #3, I guess you'd need to reduce your code to just fetching the data
> and time that. You may have already done this of course.
>
> HTH
>

From:	Linos <info(at)linos(dot)es>
To:	Richard Huxton <dev(at)archonet(dot)com>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: strange performance problem
Date:	2009-02-28 19:57:12
Message-ID:	49A99718.4060305@linos.es
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Linos escribió:
> Richard Huxton escribió:
>> Linos wrote:
>>> Richard Huxton escribió:
>>>> Linos wrote:
>>>>> 2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms
>>>>> sentencia:
>>>>> SELECT "nombre", "subfamilia_id", "id_familia", "hasta", "foto",
>>>>> "id_seccion", "id_categoria" FROM "modelo_subfamilia"
>>>>> PSQL with \timing:
>>>>> -development: Time: 72,441 ms
>>>>> -server: Time: 78,762 ms
>>>>> but if i load it from QT or from pgadmin i get more than 4 seconds in
>>>>> server and ~100ms in develoment machime, if i try the query without
>>>>> the
>>>>> "foto" column i get 2ms in development and 30ms in server
>>>> OK, so:
>>>> 1. No "foto" - both quick
>>>> 2. psql + "foto" - both slow
>>>> 3. QT + "foto" - slow only on server
>>> 1.No "foto" -both quick but still a noticeable
>>> difference between them 2ms develoment - 30ms server
>>> 2. psql + "foto" -both quick really, they are about 70ms,
>>> not bad giving that foto are bytea with small png images.
>>
>> Ah, sorry - I read the psql timings as 72 thousand ms (72 seconds) - of
>> course you're using European decimal marks.
>>
>>> 3. QT or WXWindows + "foto" -slow only one server yes.
>>>
>>>> The bit that puzzles me is why both are slow in #2 and not in #3.
>>
>> OK - well, the fact that both psql are fast means there's nothing too
>> wrong with your setup. It must be something to do with the application
>> libraries.
>>
>>> After the vacuum full verbose and reindex still the same problem (i had
>>> tried the vacuum before).
>>
>> OK. Worth ruling it out.
>>
>>> 1- The same in the two machines, tcp/ip with localhost.
>>
>> Hmm...
>>
>>> 2- I am exactly the same code in the two machines and the same pgadmin3
>>> version too.
>>
>> Good. We can rule that out.
>>
>>> 3- Ever the entire result set.
>>
>> Good.
>>
>>> 4- I am using es_ES.UTF8 in the two machines
>>
>> Good.
>>
>>> What can be using wxwindows and QT to access postgresql that psql it is
>>> not using, libpq?
>>
>> Well, I'm pretty sure that pgadmin will be using libpq at some level,
>> even if there is other code above it.
>>
>> Either:
>>
>> 1. One machine (the fast one) is actually using unix sockets and not
>> tcp/ip+localhost like you think.
>> 2. The networking setup is different on each.
>> 3. Something your code is doing with the bytea data is slower on one
>> machine than another. I seem to remember that pgadmin used to be quite
>> slow at displaying large amounts of data. They did some work on that,
>> but it might be that your use-case still suffers from it.
>>
>> For #1 try the psql test again, but with "-h localhost" and "-h /tmp"
>> (or whatever directory your unix socket is in - might be
>> /var/run/postgresql or similar too).
>
> Ok, thanks for the trick now i know where to search, after trying with
> -h localhost psql it is slow too in the server from 80,361 with \timing
> to 4259,453 using -h localhost. Any ideas what can be the problem here?
> i am going to make what you suggest and capture analyze the traffic,
> after find the hole i have tried in other debian server with the same
> kernel 2.6.26 and i have the same problem (my development machine it is
> Arch Linux with 2.6.28).
>
> Regards,
> Miguel Angel.
>
>> For #2, you can always try timing "psql -h localhost ... > /dev/null" on
>> both machines. If you capture port 5432 with something like "tcpdump -w
>> ip.dump host localhost and port 5432" you can then use wireshark to see
>> exactly why it's slow.
>>
>> For #3, I guess you'd need to reduce your code to just fetching the data
>> and time that. You may have already done this of course.
>>
>> HTH
>>

I have been testing with tcpdump but i dont see the problem in the traffic
(aside from the fact that it gives big jumps in ms between packets of data, but
i dont know why), i have tested on other debian machines with the same result, i
have upgraded kernel to 2.6.28 and postgresql to 8.3.6 (equal versions of my
Arch Linux Development machine), but still have the same problem:

-query with \timing with "psql -d database" ~110ms
-query with \timing with "psql -d database -h localhost" ~4400ms

Using tcp the cpu of postgresql spike to the max it can borrow within the query.
I have attached the tcpdump logs of a debian machine and the Arch too (maybe
anyone can see anything in them that i can not). How i can test pure speed in
the loopback interface? i have tried iperf but seems to be cpu bound so maybe
the results are misleading.

From:	Linos <info(at)linos(dot)es>
To:
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: strange performance problem
Date:	2009-03-01 00:50:59
Message-ID:	49A9DBF3.2000309@linos.es
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

>
> I have been testing with tcpdump but i dont see the problem in the
> traffic (aside from the fact that it gives big jumps in ms between
> packets of data, but i dont know why), i have tested on other debian
> machines with the same result, i have upgraded kernel to 2.6.28 and
> postgresql to 8.3.6 (equal versions of my Arch Linux Development
> machine), but still have the same problem:
>
> -query with \timing with "psql -d database" ~110ms
> -query with \timing with "psql -d database -h localhost" ~4400ms
>
> Using tcp the cpu of postgresql spike to the max it can borrow within
> the query. I have attached the tcpdump logs of a debian machine and the
> Arch too (maybe anyone can see anything in them that i can not). How i
> can test pure speed in the loopback interface? i have tried iperf but
> seems to be cpu bound so maybe the results are misleading.
>

The pcap files are here for download, the mail with the attachment seems to not
get into the list so here it is a link to download them.
http://www.linos.es/pcap_files.tar.bz2

Regards,
Miguel Angel.

From:	Linos <info(at)linos(dot)es>
To:	Richard Huxton <dev(at)archonet(dot)com>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: strange performance problem (SOLVED)
Date:	2009-03-01 17:04:25
Message-ID:	49AAC019.8050307@linos.es
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Linos escribió:
> Linos escribió:
>> Richard Huxton escribió:
>>> Linos wrote:
>>>> Richard Huxton escribió:
>>>>> Linos wrote:
>>>>>> 2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms
>>>>>> sentencia:
>>>>>> SELECT "nombre", "subfamilia_id", "id_familia", "hasta", "foto",
>>>>>> "id_seccion", "id_categoria" FROM "modelo_subfamilia"
>>>>>> PSQL with \timing:
>>>>>> -development: Time: 72,441 ms
>>>>>> -server: Time: 78,762 ms
>>>>>> but if i load it from QT or from pgadmin i get more than 4 seconds in
>>>>>> server and ~100ms in develoment machime, if i try the query
>>>>>> without the
>>>>>> "foto" column i get 2ms in development and 30ms in server
>>>>> OK, so:
>>>>> 1. No "foto" - both quick
>>>>> 2. psql + "foto" - both slow
>>>>> 3. QT + "foto" - slow only on server
>>>> 1.No "foto" -both quick but still a noticeable
>>>> difference between them 2ms develoment - 30ms server
>>>> 2. psql + "foto" -both quick really, they are about
>>>> 70ms,
>>>> not bad giving that foto are bytea with small png images.
>>>
>>> Ah, sorry - I read the psql timings as 72 thousand ms (72 seconds) - of
>>> course you're using European decimal marks.
>>>
>>>> 3. QT or WXWindows + "foto" -slow only one server yes.
>>>>
>>>>> The bit that puzzles me is why both are slow in #2 and not in #3.
>>>
>>> OK - well, the fact that both psql are fast means there's nothing too
>>> wrong with your setup. It must be something to do with the application
>>> libraries.
>>>
>>>> After the vacuum full verbose and reindex still the same problem (i had
>>>> tried the vacuum before).
>>>
>>> OK. Worth ruling it out.
>>>
>>>> 1- The same in the two machines, tcp/ip with localhost.
>>>
>>> Hmm...
>>>
>>>> 2- I am exactly the same code in the two machines and the same pgadmin3
>>>> version too.
>>>
>>> Good. We can rule that out.
>>>
>>>> 3- Ever the entire result set.
>>>
>>> Good.
>>>
>>>> 4- I am using es_ES.UTF8 in the two machines
>>>
>>> Good.
>>>
>>>> What can be using wxwindows and QT to access postgresql that psql it is
>>>> not using, libpq?
>>>
>>> Well, I'm pretty sure that pgadmin will be using libpq at some level,
>>> even if there is other code above it.
>>>
>>> Either:
>>>
>>> 1. One machine (the fast one) is actually using unix sockets and not
>>> tcp/ip+localhost like you think.
>>> 2. The networking setup is different on each.
>>> 3. Something your code is doing with the bytea data is slower on one
>>> machine than another. I seem to remember that pgadmin used to be quite
>>> slow at displaying large amounts of data. They did some work on that,
>>> but it might be that your use-case still suffers from it.
>>>
>>> For #1 try the psql test again, but with "-h localhost" and "-h /tmp"
>>> (or whatever directory your unix socket is in - might be
>>> /var/run/postgresql or similar too).
>>
>> Ok, thanks for the trick now i know where to search, after trying with
>> -h localhost psql it is slow too in the server from 80,361 with
>> \timing to 4259,453 using -h localhost. Any ideas what can be the
>> problem here? i am going to make what you suggest and capture analyze
>> the traffic, after find the hole i have tried in other debian server
>> with the same kernel 2.6.26 and i have the same problem (my
>> development machine it is Arch Linux with 2.6.28).
>>
>> Regards,
>> Miguel Angel.
>>
>>> For #2, you can always try timing "psql -h localhost ... > /dev/null" on
>>> both machines. If you capture port 5432 with something like "tcpdump -w
>>> ip.dump host localhost and port 5432" you can then use wireshark to see
>>> exactly why it's slow.
>>>
>>> For #3, I guess you'd need to reduce your code to just fetching the data
>>> and time that. You may have already done this of course.
>>>
>>> HTH
>>>
>
> I have been testing with tcpdump but i dont see the problem in the
> traffic (aside from the fact that it gives big jumps in ms between
> packets of data, but i dont know why), i have tested on other debian
> machines with the same result, i have upgraded kernel to 2.6.28 and
> postgresql to 8.3.6 (equal versions of my Arch Linux Development
> machine), but still have the same problem:
>
> -query with \timing with "psql -d database" ~110ms
> -query with \timing with "psql -d database -h localhost" ~4400ms
>
> Using tcp the cpu of postgresql spike to the max it can borrow within
> the query. I have attached the tcpdump logs of a debian machine and the
> Arch too (maybe anyone can see anything in them that i can not). How i
> can test pure speed in the loopback interface? i have tried iperf but
> seems to be cpu bound so maybe the results are misleading.
>

Okay, i have found the problem, in postgresql.conf the parameter "ssl = true"
seems to slow the clear tcp connections (not ssl enabled) very very much, but
this does not affect my arch Linux machine, only debian ones so i will contact
debian package maintainer so they can investigate it, thanks for the help.

Regards,
Miguel Angel.

From:	Linos <info(at)linos(dot)es>
To:	Richard Huxton <dev(at)archonet(dot)com>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: strange performance problem (SOLVED)
Date:	2009-03-01 19:02:10
Message-ID:	49AADBB2.4060500@linos.es
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Linos escribió:
> Linos escribió:
>> Linos escribió:
>>> Richard Huxton escribió:
>>>> Linos wrote:
>>>>> Richard Huxton escribió:
>>>>>> Linos wrote:
>>>>>>> 2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms
>>>>>>> sentencia:
>>>>>>> SELECT "nombre", "subfamilia_id", "id_familia", "hasta", "foto",
>>>>>>> "id_seccion", "id_categoria" FROM "modelo_subfamilia"
>>>>>>> PSQL with \timing:
>>>>>>> -development: Time: 72,441 ms
>>>>>>> -server: Time: 78,762 ms
>>>>>>> but if i load it from QT or from pgadmin i get more than 4
>>>>>>> seconds in
>>>>>>> server and ~100ms in develoment machime, if i try the query
>>>>>>> without the
>>>>>>> "foto" column i get 2ms in development and 30ms in server
>>>>>> OK, so:
>>>>>> 1. No "foto" - both quick
>>>>>> 2. psql + "foto" - both slow
>>>>>> 3. QT + "foto" - slow only on server
>>>>> 1.No "foto" -both quick but still a noticeable
>>>>> difference between them 2ms develoment - 30ms server
>>>>> 2. psql + "foto" -both quick really, they are about
>>>>> 70ms,
>>>>> not bad giving that foto are bytea with small png images.
>>>>
>>>> Ah, sorry - I read the psql timings as 72 thousand ms (72 seconds) - of
>>>> course you're using European decimal marks.
>>>>
>>>>> 3. QT or WXWindows + "foto" -slow only one server yes.
>>>>>
>>>>>> The bit that puzzles me is why both are slow in #2 and not in #3.
>>>>
>>>> OK - well, the fact that both psql are fast means there's nothing too
>>>> wrong with your setup. It must be something to do with the application
>>>> libraries.
>>>>
>>>>> After the vacuum full verbose and reindex still the same problem (i
>>>>> had
>>>>> tried the vacuum before).
>>>>
>>>> OK. Worth ruling it out.
>>>>
>>>>> 1- The same in the two machines, tcp/ip with localhost.
>>>>
>>>> Hmm...
>>>>
>>>>> 2- I am exactly the same code in the two machines and the same
>>>>> pgadmin3
>>>>> version too.
>>>>
>>>> Good. We can rule that out.
>>>>
>>>>> 3- Ever the entire result set.
>>>>
>>>> Good.
>>>>
>>>>> 4- I am using es_ES.UTF8 in the two machines
>>>>
>>>> Good.
>>>>
>>>>> What can be using wxwindows and QT to access postgresql that psql
>>>>> it is
>>>>> not using, libpq?
>>>>
>>>> Well, I'm pretty sure that pgadmin will be using libpq at some level,
>>>> even if there is other code above it.
>>>>
>>>> Either:
>>>>
>>>> 1. One machine (the fast one) is actually using unix sockets and not
>>>> tcp/ip+localhost like you think.
>>>> 2. The networking setup is different on each.
>>>> 3. Something your code is doing with the bytea data is slower on one
>>>> machine than another. I seem to remember that pgadmin used to be quite
>>>> slow at displaying large amounts of data. They did some work on that,
>>>> but it might be that your use-case still suffers from it.
>>>>
>>>> For #1 try the psql test again, but with "-h localhost" and "-h /tmp"
>>>> (or whatever directory your unix socket is in - might be
>>>> /var/run/postgresql or similar too).
>>>
>>> Ok, thanks for the trick now i know where to search, after trying
>>> with -h localhost psql it is slow too in the server from 80,361 with
>>> \timing to 4259,453 using -h localhost. Any ideas what can be the
>>> problem here? i am going to make what you suggest and capture analyze
>>> the traffic, after find the hole i have tried in other debian server
>>> with the same kernel 2.6.26 and i have the same problem (my
>>> development machine it is Arch Linux with 2.6.28).
>>>
>>> Regards,
>>> Miguel Angel.
>>>
>>>> For #2, you can always try timing "psql -h localhost ... >
>>>> /dev/null" on
>>>> both machines. If you capture port 5432 with something like "tcpdump -w
>>>> ip.dump host localhost and port 5432" you can then use wireshark to see
>>>> exactly why it's slow.
>>>>
>>>> For #3, I guess you'd need to reduce your code to just fetching the
>>>> data
>>>> and time that. You may have already done this of course.
>>>>
>>>> HTH
>>>>
>>
>> I have been testing with tcpdump but i dont see the problem in the
>> traffic (aside from the fact that it gives big jumps in ms between
>> packets of data, but i dont know why), i have tested on other debian
>> machines with the same result, i have upgraded kernel to 2.6.28 and
>> postgresql to 8.3.6 (equal versions of my Arch Linux Development
>> machine), but still have the same problem:
>>
>> -query with \timing with "psql -d database" ~110ms
>> -query with \timing with "psql -d database -h localhost" ~4400ms
>>
>> Using tcp the cpu of postgresql spike to the max it can borrow within
>> the query. I have attached the tcpdump logs of a debian machine and
>> the Arch too (maybe anyone can see anything in them that i can not).
>> How i can test pure speed in the loopback interface? i have tried
>> iperf but seems to be cpu bound so maybe the results are misleading.
>>
>
> Okay, i have found the problem, in postgresql.conf the parameter "ssl =
> true" seems to slow the clear tcp connections (not ssl enabled) very
> very much, but this does not affect my arch Linux machine, only debian
> ones so i will contact debian package maintainer so they can investigate
> it, thanks for the help.
>
> Regards,
> Miguel Angel.
>

The difference here between Arch and Debian it is that debian ships with a ssl
cert (in the pgsql-performance mailing list Magnus have pointed me this info) so
the local psql connections are ever encrypted by default if you not change
PGSSLMODE environment variable to "disable", which gives the difference in the
query, after disable ssl all it is normal other time :).

Regards,
Miguel Angel.