Re: plpgsql function is so slow

Lists: pgsql-hackers
From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: plpgsql function is so slow
Date: 2009-09-24 20:51:09
Message-ID: 4ABBDBBD.4050007@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I recently saw a complaint that a simple PL/PgSQL code is slower than PL/SQL.
I did some benchmark and confirmed it is. I coded the same function
(function2) in C just to compare with something. According to OP [1], the
PL/SQL seems to run more than 15x faster than PL/PgSQL code.

euler=# select function1();
function1
-----------
100000000
(1 row)

Time: 62107,607 ms
euler=# select function2();
function2
-----------
100000000
(1 row)

Time: 419,673 ms

The PL/PgSQL function is:

CREATE OR REPLACE FUNCTION function1()
RETURNS INTEGER AS
$BODY$
DECLARE
i INTEGER;
s INTEGER := 0;
BEGIN
FOR i IN 1 .. power(10, 8) LOOP
s := s + 1;
END LOOP;
RETURN s;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

The C function is:

#include "postgres.h"
#include <math.h>
#include "fmgr.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(function2);

Datum function2(PG_FUNCTION_ARGS)
{
int i;
int s = 0;

for (i = 1; i <= (int) pow(10, 8); i++)
s += 1;

PG_RETURN_INT32(s);
}

PL/PgSQL oprofile is:

samples % symbol name
2263 25.6024 AllocSetReset
1071 12.1168 ExecMakeFunctionResultNoSets
725 8.2023 AllocSetAlloc
664 7.5122 RevalidateCachedPlan
586 6.6297 ExecEvalParam
521 5.8943 AcquireExecutorLocks
463 5.2381 ResourceOwnerForgetPlanCacheRef
359 4.0615 AllocSetFreeIndex
329 3.7221 int4pl
262 2.9641 ExecEvalConst
248 2.8057 check_stack_depth
244 2.7605 MemoryContextReset
234 2.6474 list_head
143 1.6178 ReleaseCachedPlan
130 1.4708 MemoryContextAlloc
121 1.3689 pgstat_end_function_usage
111 1.2558 pgstat_init_function_usage
98 1.1087 list_head
94 1.0635 ResourceOwnerEnlargePlanCacheRefs
90 1.0182 ResourceOwnerRememberPlanCacheRef
44 0.4978 SPI_push
39 0.4412 SPI_pop

Any ideas?

[1]
http://listas.postgresql.org.br/pipermail/pgbr-geral/2009-September/017427.html

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plpgsql function is so slow
Date: 2009-09-24 21:54:59
Message-ID: 29706.1253829299@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> I recently saw a complaint that a simple PL/PgSQL code is slower than PL/SQL.
> I did some benchmark and confirmed it is. I coded the same function
> (function2) in C just to compare with something. According to OP [1], the
> PL/SQL seems to run more than 15x faster than PL/PgSQL code.

Does incrementing an integer in a tight loop represent the complainant's
typical plpgsql workload? If so, maybe he should go use another PL.
plpgsql is intended for work that involves a lot of database access, and
so the overhead in operations like this isn't significant. We offer
other PLs that make different tradeoffs.

FWIW, the high showing of AllocSetReset in your profile suggests to me
that you're timing an assert-enabled build, which wouldn't exactly be
a fair comparison to an Oracle production build anyhow.

regards, tom lane


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plpgsql function is so slow
Date: 2009-09-25 03:49:46
Message-ID: 4ABC3DDA.8030709@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane escreveu:
> FWIW, the high showing of AllocSetReset in your profile suggests to me
> that you're timing an assert-enabled build, which wouldn't exactly be
> a fair comparison to an Oracle production build anyhow.
>
Ops... forgot to remove it from other test. It seems much better but far from
the ideal. :( I've never taken a look at the pl/pgsql code but it could be
nice if there would be two path codes: access-data and non-access-data paths.
I have no idea if it will be possible (is path type too complex to detect?)
but it will certainly improve the non-access-data functions.

euler=# select function1(); -- PL/PgSQL
function1
-----------
100000000
(1 row)

Time: 34455,263 ms
euler=# select function3(); -- PL/Perl
function3
-----------
100000000
(1 row)

Time: 24986,016 ms

pl/pgsql oprofile is:

samples % symbol name
620 14.4961 ExecMakeFunctionResultNoSets
537 12.5555 AllocSetAlloc
484 11.3163 AllocSetReset
323 7.5520 RevalidateCachedPlan
292 6.8272 ExecEvalParam
222 5.1906 pgstat_init_function_usage
218 5.0970 int4pl
199 4.6528 ResourceOwnerForgetPlanCacheRef
196 4.5827 AcquireExecutorLocks
175 4.0917 ResourceOwnerRememberPlanCacheRef
166 3.8812 ReleaseCachedPlan
155 3.6240 pgstat_end_function_usage
143 3.3435 check_stack_depth
136 3.1798 ExecEvalConst
123 2.8758 ResourceOwnerEnlargePlanCacheRefs
100 2.3381 MemoryContextReset
99 2.3147 MemoryContextAlloc
48 1.1223 SPI_pop
41 0.9586 SPI_push

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plpgsql function is so slow
Date: 2009-09-25 04:56:23
Message-ID: 162867790909242156n1cbcf533g1d85b2650f1cb4fd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/9/24 Euler Taveira de Oliveira <euler(at)timbira(dot)com>:
> Hi,
>
> I recently saw a complaint that a simple PL/PgSQL code is slower than PL/SQL.
> I did some benchmark and confirmed it is. I coded the same function
> (function2) in C just to compare with something. According to OP [1], the
> PL/SQL seems to run more than 15x faster than PL/PgSQL code.

sure - PL/SQL is translated to native code, so PL/pgSQL is only
interpret. What more - PL/SQL or C use native integer arithmetic, but
PL/pgSQL use PostgreSQL integer arithmetic.

so if you have to use fast code, use C, plperl, but not use PL/pgSQL -
it is just glue for SQL statements.

regards
Pavel Stehule

>
>
> euler=# select function1();
>  function1
> -----------
>  100000000
> (1 row)
>
> Time: 62107,607 ms
> euler=# select function2();
>  function2
> -----------
>  100000000
> (1 row)
>
> Time: 419,673 ms
>
> The PL/PgSQL function is:
>
> CREATE OR REPLACE FUNCTION function1()
> RETURNS INTEGER AS
> $BODY$
> DECLARE
>  i INTEGER;
>  s INTEGER := 0;
> BEGIN
>  FOR i IN 1 .. power(10, 8) LOOP
>     s := s + 1;
>  END LOOP;
>  RETURN s;
> END;
> $BODY$
> LANGUAGE 'plpgsql' IMMUTABLE;
>
> The C function is:
>
> #include "postgres.h"
> #include <math.h>
> #include "fmgr.h"
>
> #ifdef PG_MODULE_MAGIC
> PG_MODULE_MAGIC;
> #endif
>
> PG_FUNCTION_INFO_V1(function2);
>
> Datum function2(PG_FUNCTION_ARGS)
> {
>        int     i;
>        int s = 0;
>
>        for (i = 1; i <= (int) pow(10, 8); i++)
>                s += 1;
>
>        PG_RETURN_INT32(s);
> }
>
> PL/PgSQL oprofile is:
>
> samples  %        symbol name
> 2263     25.6024  AllocSetReset
> 1071     12.1168  ExecMakeFunctionResultNoSets
> 725       8.2023  AllocSetAlloc
> 664       7.5122  RevalidateCachedPlan
> 586       6.6297  ExecEvalParam
> 521       5.8943  AcquireExecutorLocks
> 463       5.2381  ResourceOwnerForgetPlanCacheRef
> 359       4.0615  AllocSetFreeIndex
> 329       3.7221  int4pl
> 262       2.9641  ExecEvalConst
> 248       2.8057  check_stack_depth
> 244       2.7605  MemoryContextReset
> 234       2.6474  list_head
> 143       1.6178  ReleaseCachedPlan
> 130       1.4708  MemoryContextAlloc
> 121       1.3689  pgstat_end_function_usage
> 111       1.2558  pgstat_init_function_usage
> 98        1.1087  list_head
> 94        1.0635  ResourceOwnerEnlargePlanCacheRefs
> 90        1.0182  ResourceOwnerRememberPlanCacheRef
> 44        0.4978  SPI_push
> 39        0.4412  SPI_pop
>
> Any ideas?
>
> [1]
> http://listas.postgresql.org.br/pipermail/pgbr-geral/2009-September/017427.html
>
>
> --
>  Euler Taveira de Oliveira
>  http://www.timbira.com/
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: euler(at)timbira(dot)com (Euler Taveira de Oliveira), Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plpgsql function is so slow
Date: 2009-09-25 05:05:43
Message-ID: 87k4znwr4o.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Euler" == Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:

Euler> Ops... forgot to remove it from other test. It seems much
Euler> better but far from the ideal. :( I've never taken a look at
Euler> the pl/pgsql code but it could be nice if there would be two
Euler> path codes: access-data and non-access-data paths. I have no
Euler> idea if it will be possible (is path type too complex to
Euler> detect?) but it will certainly improve the non-access-data
Euler> functions.

Like Tom said, this benchmark is silly. Some comparisons (note that in
all these cases I've replaced the power(10,8) with a constant, because
you weren't comparing like with like there):

plpgsql 13.3 sec
tcl85 29.9 sec
perl5.8 7.7 sec
python2.6 11.5 sec
C 0.242 sec

What this suggests to me is that plpgsql isn't so far off the norm for
interpreted scripting languages; sure it's slower than perl, but then
most things are; comparing it with C code is just silly.

There is, though, one genuine case that's come up a few times in IRC
regarding slowness of procedural code in pg, and that's any time
someone tries to implement some array-based algorithm in plpgsql. The
fact that a[i] is O(i) not O(1) (unless the array type is fixed length)
comes as a nasty shock since iterating over an array becomes O(n^2).

This is obviously a consequence of the array storage format; is there
any potential for changing that to some format which has, say, an array
of element offsets at the start, rather than relying on stepping over
length fields?

--
Andrew (irc:RhodiumToad)


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plpgsql function is so slow
Date: 2009-09-25 13:48:51
Message-ID: b42b73150909250648k1d4a1984ga365992b56426bf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 25, 2009 at 1:05 AM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>>>>>> "Euler" == Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
>
>  Euler> Ops... forgot to remove it from other test. It seems much
>  Euler> better but far from the ideal. :( I've never taken a look at
>  Euler> the pl/pgsql code but it could be nice if there would be two
>  Euler> path codes: access-data and non-access-data paths.  I have no
>  Euler> idea if it will be possible (is path type too complex to
>  Euler> detect?)  but it will certainly improve the non-access-data
>  Euler> functions.
>
> Like Tom said, this benchmark is silly. Some comparisons (note that in
> all these cases I've replaced the power(10,8) with a constant, because
> you weren't comparing like with like there):
>
> plpgsql     13.3 sec
> tcl85       29.9 sec
> perl5.8      7.7 sec
> python2.6   11.5 sec
> C            0.242 sec
>
> What this suggests to me is that plpgsql isn't so far off the norm for
> interpreted scripting languages; sure it's slower than perl, but then
> most things are; comparing it with C code is just silly.
>
> There is, though, one genuine case that's come up a few times in IRC
> regarding slowness of procedural code in pg, and that's any time
> someone tries to implement some array-based algorithm in plpgsql. The
> fact that a[i] is O(i) not O(1) (unless the array type is fixed length)
> comes as a nasty shock since iterating over an array becomes O(n^2).
>
> This is obviously a consequence of the array storage format; is there
> any potential for changing that to some format which has, say, an array
> of element offsets at the start, rather than relying on stepping over
> length fields?

Couple points:
*) Surely, it's better to encourage use of 'unnest' style approaches
for array iteration
*) If an array has fixed length elements and doesn't have null
elements (a fairly common case), maybe it's worthwhile not
generating/storing the lengths vector?
*) Wouldn't it be possible to store offsets always, not lengths, since
you can calculate the length from the next offset?

merlin