SELECT very slow

Lists: pgsql-sql
From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: SELECT very slow
Date: 2005-06-08 21:46:41
Message-ID: d87ood$8vq$1@sea.gmane.org

Hello,

I have a table with roughly 100,000 rows (four varchar(100) columns). This
is basically test data I generated for something else. I'm using JDBC to
access PG (but the behaviour is the same with psql).

The problem is that a SELECT * FROM foobar; takes ages (roughly 3 minutes)
to return the first row. I first played around with fetchSize() in the Java
program to disable result set caching (before I tried psql), but that did
not change anything.

It seems that PG is actually building up the full result set in the
background before delivering the first row. But I cannot see any of the
processes (neither my Java process nor the PG processes) using a lot of
memory - which I would expect if a result set of that size is created.

I need to test a program which should process large result sets (without
loading them into memory), and waiting nearly three minutes before it actually
starts working is a bit annoying :)
A SELECT count(*) FROM foobar is quite fast (about 2 seconds).

I hate to say this, but the same SELECT returns the first row more or less
instantly with Firebird, SQL Server (MSDE) and HSQLDB.

Is there anything I can do, to convince PG to return the first row more
quickly?

I tried a VACUUM FULL, no change. I increased the shared_buffers to 1000,
no improvement either.

The execution plan is not really surprising:

Seq Scan on foobar (cost=0.00..2510.04 rows=117504 width=63)

I'm using PG 8.0.3 on Win2K.

Thanks in advance
Thomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Kellerer <spam_eater(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-09 00:06:46
Message-ID: 27961.1118275606@sss.pgh.pa.us

Thomas Kellerer <spam_eater(at)gmx(dot)net> writes:
> Is there anything I can do, to convince PG to return the first row more
> quickly?

libpq's API for PQresult is such that it really doesn't have any choice
but to collect the full result set before it hands you back the
PQresult. I don't know JDBC very well but I think it has similar
semantic constraints.

The solution is to use a cursor and FETCH a reasonably small number of
rows at a time.
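A minimal sketch of that approach in psql (table name and batch size here are just examples):

```sql
BEGIN;
-- a cursor only lives inside a transaction block
DECLARE foo_cur CURSOR FOR SELECT * FROM foobar;
-- the first batch comes back without waiting for the full result set
FETCH 100 FROM foo_cur;
-- repeat until FETCH returns no more rows
FETCH 100 FROM foo_cur;
CLOSE foo_cur;
COMMIT;
```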

regards, tom lane


From: Alain <alainm(at)pobox(dot)com>
To: Sql-Postgre <pgsql-sql(at)postgresql(dot)org>
Subject: Re: SELECT very slow
Date: 2005-06-09 01:13:25
Message-ID: 42A797B5.3060300@pobox.com

Tom Lane escreveu:
> Thomas Kellerer <spam_eater(at)gmx(dot)net> writes:
>
>>Is there anything I can do, to convince PG to return the first row more
>>quickly?

Are you perhaps looking for LIMIT?

SELECT * FROM table LIMIT 1;

and when you want the rest of it:

SELECT * FROM table OFFSET 1;

Alain


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-09 06:56:54
Message-ID: d88ovu$mb8$1@sea.gmane.org

On 09.06.2005 03:13 Alain wrote:

>
>
> Tom Lane escreveu:
>
>> Thomas Kellerer <spam_eater(at)gmx(dot)net> writes:
>>
>>> Is there anything I can do, to convince PG to return the first row
>>> more quickly?
>
>
> Are you now looking for the LIMIT ?
>
> SELECT * FROM table LIMIT 1;
>
> and when you want the rest of it:
>
> SELECT * FROM table OFFSET 1;
>

No I want the whole result.

Thomas


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-09 06:58:51
Message-ID: d88p3j$mb8$2@sea.gmane.org

On 09.06.2005 02:06 Tom Lane wrote:

> Thomas Kellerer <spam_eater(at)gmx(dot)net> writes:
>
>>Is there anything I can do, to convince PG to return the first row more
>>quickly?
>
>
> libpq's API for PQresult is such that it really doesn't have any choice
> but to collect the full result set before it hands you back the
> PQresult. I don't know JDBC very well but I think it has similar
> semantic constraints.
>
> The solution is to use a cursor and FETCH a reasonably small number of
> rows at a time.
>

My understanding was that setting the fetchSize() to 1 in JDBC would force the
use of a cursor.

I'll have to go through the JDBC docs again to check how I can enforce this.

Thanks for the answer though
Thomas


From: Volkan YAZICI <volkan(dot)yazici(at)gmail(dot)com>
To: Thomas Kellerer <spam_eater(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-09 07:31:50
Message-ID: 7104a7370506090031c8617d3@mail.gmail.com

Hi,

On 6/9/05, Thomas Kellerer <spam_eater(at)gmx(dot)net> wrote:
> No I want the whole result.

As Tom underlined:

On 6/9/05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The solution is to use a cursor and FETCH a reasonably
> small number of rows at a time.

AFAIC, query results are stored as arrays in PGresult structures.
Thus, storing huge result sets in a single struct is not very
feasible; you could even run into theoretical limits like MAX_INT
in the long run. Moreover, it's rare to see any practical use for
retrieving thousands of rows at once. If you're getting really huge
sets of data, you should try grouping them with suitable statements.

IMHO, you should use cursors to fetch a suitable number of rows from
the related table and step through them batch by batch. (Furthermore,
I think this is one of the design goals of the FETCH mechanism.)

Also, as far as I can see from most API implementations (like C++,
Perl, PHP, Python etc.), they use libpq as the layer between the API
and the server. Therefore, you'll probably run into every limitation
of libpq from programming languages other than C too.

Regards.


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-09 11:30:08
Message-ID: d89905$8c0$1@sea.gmane.org

On 09.06.2005 02:06 Tom Lane wrote:

> Thomas Kellerer <spam_eater(at)gmx(dot)net> writes:
>
>>Is there anything I can do, to convince PG to return the first row more
>>quickly?
>
> The solution is to use a cursor and FETCH a reasonably small number of
> rows at a time.

Thanks for all your answers.

It turned out that I did not read the JDBC documentation closely enough (blush)

Setting the fetchSize to 1 (or anything > 0) only uses a cursor if autocommit is
disabled (due to the annoying "transaction is aborted" error when running DDL scripts
that contain errors, I usually turn autocommit on).

With autocommit off, the rows are returned "immediately" (so the driver is
using a cursor to fetch the data)

Regards
Thomas


From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Thomas Kellerer" <spam_eater(at)gmx(dot)net>, pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-14 12:26:21
Message-ID: op.ssc0t7wrth1vuj@localhost

> The problem is, that a SELECT * FROM foobar; takes ages (roughly 3
> minutes) to return the first row. I played around with the fetchSize()
> to disable the result set caching in the Java program first (before I
> tried psql) but that did not change anything.

Hello,

Your case seemed strange, so I tried this: I created a table with 128K rows,
with four TEXT columns containing about 70 chars each...

---------------------------------------------
\d bigtest;
 Colonne |  Type   |                      Modificateurs
---------+---------+---------------------------------------------------------
 id      | integer | not null default nextval('public.bigtest_id_seq'::text)
 data1   | text    |
 data2   | text    |
 data3   | text    |
 data4   | text    |
Index :
    «bigtest_pkey» PRIMARY KEY, btree (id)

---------------------------------------------

SELECT count(*) from bigtest;
count
--------
131072

---------------------------------------------

explain analyze select * from bigtest;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on bigtest  (cost=0.00..7001.72 rows=131072 width=308) (actual time=0.035..484.249 rows=131072 loops=1)
 Total runtime: 875.095 ms

So grabbing the data takes 0.875 seconds.

---------------------------------------------

SELECT avg(length(data1)), avg(length(data2)), avg(length(data3)),
avg(length(data3)) from bigtest;
         avg         |         avg         |         avg         |         avg
---------------------+---------------------+---------------------+---------------------
 72.1629180908203125 | 72.2342376708984375 | 72.3680572509765625 | 72.3680572509765625

Here you see the average data sizes.

---------------------------------------------

Now I fire up python, do a SELECT * from the table and retrieve all the
data as native objects... Hm, it takes about 1.3 seconds... on my
Pentium-M 1600 laptop...

I was about to suggest you use a less slow and bloated language than
Java, but then on my machine psql takes about 5 seconds to display the
results, so it looks like it ain't Java. psql is slow because it has to
format the result and compute the column widths.

Don't you have a problem somewhere ? Are you sure it's not swapping ? did
you check memory ? Are you transferring all this data over the network ?
Might an obscure cabling problem have reverted your connection to 10 Mbps ?

I'm using pg 8.0.something on Linux.

Ouch. I saw you're on Windows so I tried it on the windows machine there
which has a postgres installed, over a 100Mbps network, querying from my
linux laptop. The windows machine is a piece of crap, Pentium-II 300 and
256 MB Ram, it takes 7 seconds to retrieve the whole table in a python
native object.

So...


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-15 18:05:28
Message-ID: d8pqbi$a5q$1@sea.gmane.org

PFC wrote on 14.06.2005 14:26:
[...]
> Now I fire up python, do a SELECT * from the table and retrieve all
> the data as native objects... Hm, it takes about 1.3 seconds... on my
> Pentium-M 1600 laptop...
>
> Don't you have a problem somewhere ? Are you sure it's not swapping
> ? did you check memory ? Are you transferring all this data over the
> network ? Might an obscure cabling problem have reverted your
> connection to 10 Mbps ?

I'm sure. Everything is running on the same machine, about 350 MB free memory.

> Ouch. I saw you're on Windows so I tried it on the windows machine
> there which has a postgres installed, over a 100Mbps network, querying
> from my linux laptop. The windows machine is a piece of crap,
> Pentium-II 300 and 256 MB Ram, it takes 7 seconds to retrieve the whole
> table in a python native object.
>
It's not the program or Java. The same program takes about 20 seconds with
Firebird and exactly the same data.

Thomas


From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Thomas Kellerer" <spam_eater(at)gmx(dot)net>, pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-15 20:04:08
Message-ID: op.ssfgo6xath1vuj@localhost


> It's not the program or Java. The same program takes about 20 seconds
> with Firebird and exactly the same data.

Hm, that's still very slow (it should do it in a couple seconds like my
PC does... maybe the problem is common to postgres and firebird ?)

Try eliminating disk IO by writing a set returning function which returns
1000000 rows, something simple like just a sequence number and a text
value... if this is slow too... I don't know... do you have an antivirus
or zonealarm or something ?
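One quick way to run that test without writing a function is generate_series(),
which exists in 8.0 (the column values here are just placeholders):

```sql
-- a million generated rows, no table pages read from disk
SELECT s.i, 'just some constant text' AS val
FROM generate_series(1, 1000000) AS s(i);
```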

Have you tried connecting from another machine ?

>
> Thomas


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-15 22:08:18
Message-ID: d8q8iq$pfg$1@sea.gmane.org

PFC wrote on 15.06.2005 22:04:

>
>> It's not the program or Java. The same program takes about 20 seconds
>> with Firebird and the exactly same data.
>
>
> Hm, that's still very slow (it should do it in a couple seconds like
> my PC does... maybe the problem is common to postgres and firebird ?)
>
> Try eliminating disk IO by writing a set returning function which
> returns 1000000 rows, something simple like just a sequence number and
> a text value... if this is slow too... i don't know...

> do you have an antivirus or zonealarm or something ?
Wouldn't that affect all DB access not only PG? And as I said, all other

The 20 seconds are ok. This includes processing of the data in the
application. If I simply loop over the result set and get each column's
value without further processing it takes 4 seconds with Firebird.

Basically I'm doing the following:

rs = stmt.executeQuery("select * from foo");
while (rs.next())
{
    for (int i = 0; i < 4; i++)
    {
        Object o = rs.getObject(i + 1);
    }
}

As I said in my other post, the behaviour/performance in PG is dependent on
the autocommit setting for the connection.

With autocommit set to false the above code takes about 3 seconds in PG,
but with autocommit set to true, PG takes 3 minutes! It also seems to be
very dependent on the fetchSize (apparently the number of rows that are
cached by the driver). Anything above 100 seems to slow down the overall
process.

Regards
Thomas


From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Thomas Kellerer <spam_eater(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-16 14:00:39
Message-ID: 1118930439.3382.91.camel@state.g2switchworks.com

On Wed, 2005-06-15 at 17:08, Thomas Kellerer wrote:
> PFC wrote on 15.06.2005 22:04:
>
> >
> >> It's not the program or Java. The same program takes about 20 seconds
> >> with Firebird and the exactly same data.
> >
> >
> > Hm, that's still very slow (it should do it in a couple seconds like
> > my PC does... maybe the problem is common to postgres and firebird ?)
> >
> > Try eliminating disk IO by writing a set returning function which
> > returns 1000000 rows, something simple like just a sequence number and
> > a text value... if this is slow too... i don't know...
>
> > do you have an antivirus or zonealarm or something ?
> Wouldn't that affect all DB access not only PG? And as I said, all other
>
> The 20 seconds are ok. This includes processing of the data in the
> application. If I simply loop over the result set and get each column's
> value without further processing it takes 4 seconds with Firebird.
>
> Basically I'm doing the following:
>
> rs = stmt.executeQuery("select * from foo");
> while (rs.next())
> {
>     for (int i = 0; i < 4; i++)
>     {
>         Object o = rs.getObject(i + 1);
>     }
> }
>
> As I said in my other post, the behaviour/performance in PG is dependent on
> the autocommit setting for the connection.
>
> With autocommit set to false the above code takes about 3 seconds in PG
> but with autocommit set to true, PG takes 3 minutes! It seems that it also
> is very dependent on the fetchsize (apparently the number of rows that are
> cached by the driver). Anything above 100 seems to slow down the overall
> process.

There's got to be more happening than what this is showing us. A
select, and looping through it, should involve no writes, and therefore
no real performance difference from autocommit versus not. Is there
some underlying trigger on the view or something like that? Some kind
of auditing function?


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-16 14:45:35
Message-ID: d8s30h$r12$1@sea.gmane.org

On 16.06.2005 16:00 Scott Marlowe wrote:

> There's got to be more happening than what this is showing us. A
> select, and looping through it, should involve no writes, and therefore
> no real performance difference from autocommit versus not. Is there
> some underlying trigger on the view or something like that? Some kind
> of auditing function?

That's exactly the code that produced the mentioned timings. This is, according
to the JDBC driver's documentation, the expected behaviour. The driver can be
set to use cursor-based fetching, but *only* if autocommit is false.

If autocommit is on (or the fetch size is zero), then the driver will build the
whole result set before returning to the caller.

http://jdbc.postgresql.org/documentation/80/query.html#query-with-cursor
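In code, the cursor mode described on that page looks roughly like this
(connection URL, credentials and table name are placeholders, and this
needs a live server to actually run):

```java
import java.sql.*;

public class CursorFetch {
    public static void main(String[] args) throws SQLException {
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        con.setAutoCommit(false);   // required, or the driver reads everything up front
        Statement stmt = con.createStatement();
        stmt.setFetchSize(50);      // > 0 makes the driver fetch in batches of 50
        ResultSet rs = stmt.executeQuery("SELECT * FROM foobar");
        while (rs.next()) {
            // rows arrive batch by batch; memory use stays bounded
        }
        rs.close();
        stmt.close();
        con.commit();
        con.close();
    }
}
```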

Thomas


From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Thomas Kellerer" <spam_eater(at)gmx(dot)net>, pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-16 15:29:32
Message-ID: op.ssgyniecth1vuj@localhost

> If autocommit is on (or fetch size is zero) then the driver will build
> the whole result set before returning to the caller.

Sure, but that is not your problem : even building the whole result set
should not take longer than a few seconds (I gave you test timings in a
previous message).
So... what ?
What does the taskman say ? CPU at 100% ? how much kernel time ?

>
> http://jdbc.postgresql.org/documentation/80/query.html#query-with-cursor
>
> Thomas


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT very slow
Date: 2005-06-16 15:33:19
Message-ID: d8s5q1$5ud$1@sea.gmane.org

On 16.06.2005 17:29 PFC wrote:

>> If autocommit is on (or fetch size is zero) then the driver will
>> build the whole result set before returning to the caller.
>
>
> Sure, but that is not your problem : even building the whole result
> set should not take longer than a few seconds (I gave you test timings
> in a previous message).
> So... what ?
> What does the taskman say ? CPU at 100% ? how much kernel time ?
>

Taskmanager (I'm on Windows) does not show any high CPU usage. And (as I wrote
in the initial post) I cannot see any memory increase in the PG process either
(which I would expect with a result set of that size built up in memory).

Thomas