Re: simple functions, huge overhead, no cache

From: Josip Rodin <joy(at)entuzijast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: simple functions, huge overhead, no cache
Date: 2010-07-09 14:26:15
Message-ID: 20100709142615.GA2326@orion.carnet.hr
Lists: pgsql-general

Hi,

I went looking at why some of our queries using some custom functions were
a bit sluggish.

usercandoonobject(integer, integer, character, integer) does this:

IF (isSuperuser(p_user_id)) THEN
RETURN true;
END IF;
RETURN userCanDoOnObjectCheckGod($1, $2, $3, $4);

issuperuser(integer) does:

RETURN (SELECT userInGroup($1, 1000));

useringroup(integer, integer) does:

IF ($2 = 1) THEN
RETURN true;
ELSE
RETURN EXISTS(
SELECT groups_users.users_id
FROM groups_users
WHERE groups_users.users_id = $1
AND groups_users.groups_id = $2
);
END IF;

This is someone else's code, but it seems simple and clear enough to me...
But at runtime, I get this:

db=# explain analyze select issuperuser(id) from users;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on users (cost=0.00..6624.00 rows=23000 width=4) (actual time=0.186..644.488 rows=23000 loops=1)
Total runtime: 664.486 ms
(2 rows)

db=# explain analyze select userInGroup(id, 1000) from users;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on users (cost=0.00..6624.00 rows=23000 width=4) (actual time=0.125..417.948 rows=23000 loops=1)
Total runtime: 437.594 ms
(2 rows)

db=# explain analyze select case when users.id = 1 then true else exists(SELECT groups_users.users_id FROM groups_users WHERE groups_users.users_id = users.id AND groups_users.groups_id = '1000') end as issuperuser from users;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on users (cost=0.00..191157.14 rows=23000 width=4) (actual time=0.053..94.756 rows=23000 loops=1)
SubPlan
-> Index Scan using groups_users_pkey on groups_users (cost=0.00..8.27 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=22999)
Index Cond: ((groups_id = 1000) AND (users_id = $0))
Total runtime: 112.154 ms
(5 rows)

Why are the function calls four to six times slower than their direct
equivalents?

I gather from the documentation that there is some function cache, which
sounds good, and I could understand this overhead if writing to that
cache was its purpose. But even if I immediately repeat the same query, I
get exactly the same slow result (verified by running the actual queries
with \timing in psql, not by repeating 'explain analyze').
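
(For the record, the repeat check was along these lines, with the timing
output elided here:

db=# \timing
Timing is on.
db=# select issuperuser(id) from users;
db=# select issuperuser(id) from users;

and the second run came out just as slow as the first.)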

What am I missing?

I tried to find an explanation in the documentation, to no avail.
This is with PostgreSQL 8.3.11.

(Please Cc: replies, I'm not subscribed. TIA.)

--
2. That which causes joy or happiness.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Josip Rodin <joy(at)entuzijast(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-10 07:07:22
Message-ID: 4C381C2A.4000301@postnewspapers.com.au
Lists: pgsql-general

On 09/07/10 22:26, Josip Rodin wrote:

> db=# explain analyze select issuperuser(id) from users;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
> Seq Scan on users (cost=0.00..6624.00 rows=23000 width=4) (actual time=0.186..644.488 rows=23000 loops=1)
> Total runtime: 664.486 ms
> (2 rows)
>
> db=# explain analyze select userInGroup(id, 1000) from users;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
> Seq Scan on users (cost=0.00..6624.00 rows=23000 width=4) (actual time=0.125..417.948 rows=23000 loops=1)
> Total runtime: 437.594 ms
> (2 rows)

That's within expected bounds for PL/PgSQL function overhead.

regress=> CREATE OR REPLACE FUNCTION noop(int) returns int as $$
begin
return $1;
end;
$$ language 'plpgsql';

regress=> explain analyze select x from generate_series(0,23000) as x;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series x (cost=0.00..12.50 rows=1000 width=4) (actual time=9.990..44.339 rows=23001 loops=1)
Total runtime: 78.061 ms
(2 rows)

regress=> explain analyze select noop(x) from generate_series(0,23000) as x;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series x (cost=0.00..262.50 rows=1000 width=4) (actual time=10.846..104.445 rows=23001 loops=1)
Total runtime: 139.622 ms
(2 rows)

The comparison to unnesting two layers of PL/PgSQL function calls is
harsher, but then there *are* two layers of expensive calls around very
simple expressions.

Your problem boils down to the fact that PL/PgSQL function calls are
expensive. PL/PgSQL is good for complex work, but bad for wrapping up
simple expressions because setting up / tearing down the function call
context is so expensive.

For such simple expressions, you should use 'SQL' functions. These can
often be inlined to allow the query planner to avoid call overheads
entirely, and are WAY cheaper even if they can't be inlined. They're
less flexible, but much faster.

So you might write:

create or replace function
usercandoonobject(integer,integer,character,integer) returns boolean as $$
select case when isSuperuser($1) then true
else userCanDoOnObjectCheckGod($1, $2, $3, $4) end;
$$ language 'sql' stable;

create or replace function
issuperuser(integer) returns boolean as $$
SELECT userInGroup($1, 1000)
$$ language 'sql' stable;

... and so on.
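
useringroup() could get the same treatment, folding the IF into a single
expression. A sketch (untested, and note the EXISTS sub-select may well
prevent inlining here, so verify with EXPLAIN):

create or replace function
useringroup(integer, integer) returns boolean as $$
select $2 = 1 or exists(
SELECT groups_users.users_id
FROM groups_users
WHERE groups_users.users_id = $1
AND groups_users.groups_id = $2
)
$$ language 'sql' stable;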

--
Craig Ringer


From: Scott Ribe <scott_ribe(at)killerbytes(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-10 14:52:27
Message-ID: 86470CF1-A56C-4752-9E91-1A878215FF24@killerbytes.com
Lists: pgsql-general

This may need to be made more obvious in the documentation...

Having both SQL and PL/PgSQL languages is one of those things that is different enough from other database systems that many people will not expect it, and may not notice it unless it is pointed out explicitly and prominently.

On Jul 10, 2010, at 1:07 AM, Craig Ringer wrote:

> For such simple expressions, you should use 'SQL' functions. These can
> often be inlined to allow the query planner to avoid call overheads
> entirely, and are WAY cheaper even if they can't be inlined. They're
> less flexible, but much faster.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Josip Rodin <joy(at)entuzijast(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-10 15:48:29
Message-ID: 16265.1278776909@sss.pgh.pa.us
Lists: pgsql-general

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> Your problem boils down to the fact that PL/PgSQL function calls are
> expensive. PL/PgSQL is good for complex work, but bad for wrapping up
> simple expressions because setting up / tearing down the function call
> context is so expensive.

Yes. What's actually the most expensive part is invoking the main
executor to process SQL expressions --- that's approximately the same
cost as starting up/shutting down an entire SQL query. plpgsql has some
optimizations to avoid some of that cost when evaluating a "simple"
expression, but both of the examples here defeat that optimization:
one because it's an EXISTS() subquery, and the other because of the
entirely-unnecessary use of a sub-SELECT.
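
(For the latter, the fix is simply to drop the sub-SELECT so that the
RETURN expression stays "simple", i.e. in issuperuser:

RETURN userInGroup($1, 1000);

instead of RETURN (SELECT userInGroup($1, 1000)); there is no equally
trivial rewrite for the EXISTS case, since EXISTS requires a real query.)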

To illustrate the point I did a bit of experimentation using CVS HEAD.
This is an assert-enabled build so it will have more overhead than a
production build ... you might care to repeat the tests on your own
installation.

Base case (to measure the overhead of the test query):

regression=# explain analyze verbose select (x::float8) from generate_series(1,100000) x;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Function Scan on pg_catalog.generate_series x (cost=0.00..12.50 rows=1000 width=4) (actual time=252.595..980.359 rows=100000 loops=1)
Output: (x)::double precision
Total runtime: 1495.410 ms
(3 rows)

Add the built-in cos() function to that:

regression=# explain analyze verbose select cos(x::float8) from generate_series(1,100000) x;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Function Scan on pg_catalog.generate_series x (cost=0.00..15.00 rows=1000 width=4) (actual time=242.764..1140.608 rows=100000 loops=1)
Output: cos((x)::double precision)
Total runtime: 1652.968 ms
(3 rows)

So cos() added about 150 ms to the test, or 1.5 microsec per execution.
Now let's wrap the cos() call in a plpgsql function:

regression=# create function plpgsql_cosine(float8) returns float8 as '
begin return cos($1); end' language plpgsql;
CREATE FUNCTION
regression=# explain analyze verbose select plpgsql_cosine(x::float8) from generate_series(1,100000) x;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Function Scan on pg_catalog.generate_series x (cost=0.00..262.50 rows=1000 width=4) (actual time=254.050..5719.287 rows=100000 loops=1)
Output: plpgsql_cosine((x)::double precision)
Total runtime: 6296.207 ms
(3 rows)

The wrapper added about 4650 ms. If we're incautious enough to make
the RETURN's expression look not-simple:

regression=# create function plpgsql_cosine2(float8) returns float8 as '
begin return (select cos($1)); end' language plpgsql;
CREATE FUNCTION
regression=# explain analyze verbose select plpgsql_cosine2(x::float8) from generate_series(1,100000) x;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Function Scan on pg_catalog.generate_series x (cost=0.00..262.50 rows=1000 width=4) (actual time=244.735..15557.516 rows=100000 loops=1)
Output: plpgsql_cosine2((x)::double precision)
Total runtime: 16092.261 ms
(3 rows)

... the wrapper's overhead balloons to 14440 ms.

> For such simple expressions, you should use 'SQL' functions. These can
> often be inlined to allow the query planner to avoid call overheads
> entirely, and are WAY cheaper even if they can't be inlined. They're
> less flexible, but much faster.

I think the performance benefit of SQL functions is real only if they
can be inlined. Continuing the example:

regression=# create function sql_cosine(float8) returns float8 as '
regression'# select cos($1)' language sql;
CREATE FUNCTION
regression=# explain analyze verbose select sql_cosine(x::float8) from generate_series(1,100000) x;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Function Scan on pg_catalog.generate_series x (cost=0.00..15.00 rows=1000 width=4) (actual time=243.677..1140.872 rows=100000 loops=1)
Output: cos((x)::double precision)
Total runtime: 1653.346 ms
(3 rows)

The runtime is indistinguishable from the bare cos() call, which is not
surprising since you can see from the EXPLAIN output that what was
getting executed was just the bare cos() call --- the SQL function was
inlined. If I do something to prevent inlining, it's in the same
ballpark as plpgsql:

regression=# create function sql_cosine2(float8) returns float8 as '
select cos($1) limit 1' language sql;
CREATE FUNCTION
regression=# explain analyze verbose select sql_cosine2(x::float8) from generate_series(1,100000) x;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Function Scan on pg_catalog.generate_series x (cost=0.00..262.50 rows=1000 width=4) (actual time=242.401..8647.405 rows=100000 loops=1)
Output: sql_cosine2((x)::double precision)
Total runtime: 9166.177 ms
(3 rows)

The SQL-function executor is fairly stupid and hasn't had nearly the
level of optimization effort put into it that plpgsql has over the
years. This test case doesn't really show the effects of that, but
for example plpgsql can remember plans across queries whereas a SQL
function will not. So if you're focused on performance, use SQL
functions when they can be inlined (and use EXPLAIN to *verify*
that they're getting inlined) ... but otherwise you're probably as
well or better off with plpgsql.

Oh, and one more thing: there is no "function cache".

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Josip Rodin <joy(at)entuzijast(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-10 21:46:43
Message-ID: AANLkTinPhoZe4Iq1Y9fUdwpajHceBK5EgjQoJRwX-Be0@mail.gmail.com
Lists: pgsql-general

Hello

2010/7/10 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
>> Your problem boils down to the fact that PL/PgSQL function calls are
>> expensive. PL/PgSQL is good for complex work, but bad for wrapping up
>> simple expressions because setting up / tearing down the function call
>> context is so expensive.
>
> Yes.  What's actually the most expensive part is invoking the main
> executor to process SQL expressions --- that's approximately the same
> cost as starting up/shutting down an entire SQL query.  plpgsql has some
> optimizations to avoid some of that cost when evaluating a "simple"
> expression, but both of the examples here defeat that optimization:
> one because it's an EXISTS() subquery, and the other because of the
> entirely-unnecessary use of a sub-SELECT.

Any use of a non-simple expression is very slow - so there could be some
warning when people use one. Sometimes people (me too) don't know when
they are using an expensive expression.

For example

rowvar := (10,20)

isn't simple - though I am not sure whether that is still true. Maybe we
could have two GUCs for plpgsql, like the current plpgsql parser GUC
variables:

plpgsql.validate_expressions
plpgsql.show_performance_warnings

The second GUC could show a warning when:
a) an expression isn't simple
b) an expression's result isn't binary compatible with its variable
c) a plpgsql function could simply be replaced by a SQL function

(c) is maybe controversial - SQL/PSM encourages users to write simple
one-statement functions.

Can we inline simple PL/pgSQL functions?

CREATE OR REPLACE FUNCTION foo(a int, b int)
RETURNS int AS $$
BEGIN
RETURN a + b;
END;
$$ LANGUAGE plpgsql;

A first idea is to add a column "inlined_code" - the validator could
return this value (for this function, just SELECT $1 + $2;) and the
executor would not run the expensive plpgsql code, just the inlined_code.
Maybe this idea is too simple. But this is a real problem - I see
similar code every month.

Regards

Pavel Stehule



From: Josip Rodin <joy(at)entuzijast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-11 13:05:43
Message-ID: 20100711130543.GA8164@orion.carnet.hr
Lists: pgsql-general


Thank you both for the helpful explanations.

On Sat, Jul 10, 2010 at 11:48:29AM -0400, Tom Lane wrote:
> Oh, and one more thing: there is no "function cache".

Reading
http://www.postgresql.org/docs/8.3/static/plpgsql-implementation.html
section 38.10.2. "Plan Caching" had put that idea in my head :)

--
2. That which causes joy or happiness.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josip Rodin <joy(at)entuzijast(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-12 06:06:43
Message-ID: 4C3AB0F3.9060205@postnewspapers.com.au
Lists: pgsql-general

On 11/07/2010 5:46 AM, Pavel Stehule wrote:

> Any use of a non-simple expression is very slow - so there could be some
> warning when people use one. Sometimes people (me too) don't know when
> they are using an expensive expression.
>
> For example
>
> rowvar := (10,20)
>
> isn't simple - though I am not sure whether that is still true.

Rather than warning whenever the SPI is invoked from PL/PgSQL, perhaps
this would be a task better suited for inclusion in a profiler feature
for the PL/PgSQL debugger?

I'm not particularly interested in the notion myself, but I don't think
warnings about "non-simple" statements would be very helpful. You'd be
drowned in warnings for statements that were a necessary part of the
operation of your functions, things for which there was no other way to
do it.

It seems like a profiler, which is designed to filter and organize the
collected data, and which can be attached only to specific functions
that you want to know about, might be a better fit for the job. As
there's already a PL/PgSQL debugger, some of the infrastructure required
is already present.

Meh, personally I'll stick to the good old profiling methods "is it fast
enough", "\timing", and "explain analyze".

--
Craig Ringer


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josip Rodin <joy(at)entuzijast(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-12 07:12:41
Message-ID: AANLkTik4E1PrBMhuxT_3LLVPAl6xuCtKAg3u1WrwMdzO@mail.gmail.com
Lists: pgsql-general

2010/7/12 Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>:
> On 11/07/2010 5:46 AM, Pavel Stehule wrote:
>
>> Any use of a non-simple expression is very slow - so there could be some
>> warning when people use one. Sometimes people (me too) don't know when
>> they are using an expensive expression.
>>
>> For example
>>
>> rowvar := (10,20)
>>
>> isn't simple - though I am not sure whether that is still true.
>
> Rather than warning whenever the SPI is invoked from PL/PgSQL, perhaps this
> would be a task better suited for inclusion in a profiler feature for the
> PL/PgSQL debugger?
>
> I'm not particularly interested in the notion myself, but I don't think
> warnings about "non-simple" statements would be very helpful. You'd be
> drowned in warnings for statements that were a necessary part of the
> operation of your functions, things for which there was no other way to do
> it.

I think warnings have more educational value, because they are
interactive. They simply say "don't do this". Moreover, this can be done
at validation time, so it has no impact on real execution speed (more or
less, I know). There are a few basic, repeatedly made plpgsql mistakes,
and I think some of these can be caught via warnings.

see http://okbob.blogspot.com/2010/04/frequent-mistakes-in-plpgsql-design.html

I am thinking that plpgsql's functionality is complete - there are not
too many issues left - so the next direction of development could be
smarter validation and more verbose diagnostics (the current diagnostics
for syntax and runtime errors are pretty good). There is an issue with
assignment statements and the bin-text/text-bin transformations they hide -

declare a int;
begin
a := 4;

because "4" is a numeric and "a" is an integer, at runtime this does
numeric-to-string and string-to-integer transformations. And it is a
relatively innocent-looking statement. Things like this are traps for
beginners.

>
> It seems like a profiler, which is designed to filter and organize the
> collected data, and which can be attached only to specific functions
> that you want to know about, might be a better fit for the job. As
> there's already a PL/PgSQL debugger, some of the infrastructure required
> is already present.

There is a pl/pgsql profiler too, but a profiler isn't "interactive".
Moreover, the profiler and debugger are third-party software, so they
aren't available everywhere.

>
> Meh, personally I'll stick to the good old profiling methods "is it fast
> enough", "\timing", and "explain analyze".

Both tools make sense - we have warnings in gcc and we have profilers too.

Regards

Pavel Stehule



From: Dave Page <dpage(at)pgadmin(dot)org>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josip Rodin <joy(at)entuzijast(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-12 07:38:49
Message-ID: AANLkTinRRVR20BEoA5qqjdWtJImW6I82Gkj1xdQ7_eNt@mail.gmail.com
Lists: pgsql-general

On Mon, Jul 12, 2010 at 7:06 AM, Craig Ringer
<craig(at)postnewspapers(dot)com(dot)au> wrote:

> It seems like a profiler, which is designed to filter and organize the
> collected data, and which can be attached only to specific functions
> that you want to know about, might be a better fit for the job. As
> there's already a PL/PgSQL debugger, some of the infrastructure required
> is already present.

There's already a profiler in the same source tree. It just needs to
be given a little love.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Josip Rodin <joy(at)entuzijast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-12 07:41:18
Message-ID: 20100712074118.GA30625@orion.carnet.hr
Lists: pgsql-general

On Mon, Jul 12, 2010 at 02:06:43PM +0800, Craig Ringer wrote:
> Meh, personally I'll stick to the good old profiling methods "is it fast
> enough", "\timing", and "explain analyze".

I agree. Some hint could be included in 'explain analyze' output, maybe just
to separate the timings for things that are well covered by the query plan
optimizer from those that aren't. I found this in a line like this:

Filter: (approved AND (NOT archived) AND ((time_to > now()) OR (time_to IS NULL)) AND ((time_from < now()) OR (time_from IS NULL)) AND usercandoonobject(1, 1, 'news'::bpchar, news_id))

The other columns referenced in the filter were all insignificant
(time-wise) compared to the single function call, but I had to find
that out with a manual search.
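
(The manual search amounting to timing the filter's pieces separately,
something like this - a sketch, with the table name assumed:

db=# explain analyze select count(*) from news where approved and not archived;
db=# explain analyze select count(*) from news where usercandoonobject(1, 1, 'news'::bpchar, news_id);
)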

--
2. That which causes joy or happiness.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Josip Rodin <joy(at)entuzijast(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-12 14:38:48
Message-ID: AANLkTilNdH69Ck9rQcWkKi2TBezThEhWYr7QobrxnPP-@mail.gmail.com
Lists: pgsql-general

2010/7/12 Josip Rodin <joy(at)entuzijast(dot)net>:
> On Mon, Jul 12, 2010 at 02:06:43PM +0800, Craig Ringer wrote:
>> Meh, personally I'll stick to the good old profiling methods "is it fast
>> enough", "\timing", and "explain analyze".
>
> I agree. Some hint could be included in 'explain analyze' output, maybe just
> to separate the timings for things that are well covered by the query plan
> optimizer from those that aren't. I found this in a line like this:

It is useless for functions - explain doesn't show the lines of executed
functions. Can you show an example of a more complex query?

Pavel



From: Josip Rodin <joy(at)entuzijast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-12 17:15:08
Message-ID: 20100712171508.GA701@orion.carnet.hr
Lists: pgsql-general

On Mon, Jul 12, 2010 at 04:38:48PM +0200, Pavel Stehule wrote:
> 2010/7/12 Josip Rodin <joy(at)entuzijast(dot)net>:
> > On Mon, Jul 12, 2010 at 02:06:43PM +0800, Craig Ringer wrote:
> >> Meh, personally I'll stick to the good old profiling methods "is it fast
> >> enough", "\timing", and "explain analyze".
> >
> > I agree. Some hint could be included in 'explain analyze' output, maybe just
> > to separate the timings for things that are well covered by the query plan
> > optimizer from those that aren't. I found this in a line like this:
>
>> It is useless for functions - explain doesn't show the lines of executed
>> functions. Can you show an example of a more complex query?

It doesn't have to show me any lines, but it could tell me which part of
the query is actually being optimized, and OTOH which part is simply being
executed N times unconditionally because it's a function that is marked as
volatile. That alone would be a reasonable improvement.

--
2. That which causes joy or happiness.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Josip Rodin <joy(at)entuzijast(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-12 17:46:30
Message-ID: AANLkTikSCsAMdkPE-uti4K7s204SXhOd2QxIe1XndEeA@mail.gmail.com
Lists: pgsql-general

2010/7/12 Josip Rodin <joy(at)entuzijast(dot)net>:
> On Mon, Jul 12, 2010 at 04:38:48PM +0200, Pavel Stehule wrote:
>> 2010/7/12 Josip Rodin <joy(at)entuzijast(dot)net>:
>> > On Mon, Jul 12, 2010 at 02:06:43PM +0800, Craig Ringer wrote:
>> >> Meh, personally I'll stick to the good old profiling methods "is it fast
>> >> enough", "\timing", and "explain analyze".
>> >
>> > I agree. Some hint could be included in 'explain analyze' output, maybe just
>> > to separate the timings for things that are well covered by the query plan
>> > optimizer from those that aren't. I found this in a line like this:
>>
>> It is useless for functions - explain doesn't show the lines of executed
>> functions. Can you show an example of a more complex query?
>
> It doesn't have to show me any lines, but it could tell me which part of
> the query is actually being optimized, and OTOH which part is simply being
> executed N times unconditionally because it's a function that is marked as
> volatile. That alone would be a reasonable improvement.

These are different kinds of problems. You can have a very slow
immutable function or a very fast volatile function. And with wrong
function design your functions can be 10 times slower. Yeah - the
effects of good or bad design multiply with those of a good or bad
stability flag.
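
(The stability flag itself is cheap to fix once noticed - a sketch,
assuming group membership can be treated as fixed within one query:

ALTER FUNCTION issuperuser(integer) STABLE;
ALTER FUNCTION useringroup(integer, integer) STABLE;
)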

Regards

Pavel Stehule



From: Josip Rodin <joy(at)entuzijast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: simple functions, huge overhead, no cache
Date: 2010-07-13 07:55:53
Message-ID: 20100713075553.GA3661@orion.carnet.hr
Lists: pgsql-general

On Mon, Jul 12, 2010 at 07:46:30PM +0200, Pavel Stehule wrote:
> 2010/7/12 Josip Rodin <joy(at)entuzijast(dot)net>:
> > On Mon, Jul 12, 2010 at 04:38:48PM +0200, Pavel Stehule wrote:
> >> 2010/7/12 Josip Rodin <joy(at)entuzijast(dot)net>:
> >> > On Mon, Jul 12, 2010 at 02:06:43PM +0800, Craig Ringer wrote:
> >> >> Meh, personally I'll stick to the good old profiling methods "is it fast
> >> >> enough", "\timing", and "explain analyze".
> >> >
> >> > I agree. Some hint could be included in 'explain analyze' output, maybe just
> >> > to separate the timings for things that are well covered by the query plan
> >> > optimizer from those that aren't. I found this in a line like this:
> >>
> >> It is useless for functions - explain doesn't show the lines of executed
> >> functions. Can you show an example of a more complex query?
> >
> > It doesn't have to show me any lines, but it could tell me which part of
> > the query is actually being optimized, and OTOH which part is simply being
> > executed N times unconditionally because it's a function that is marked as
> > volatile. That alone would be a reasonable improvement.
>
> These are different kinds of problems. You can have a very slow
> immutable function or a very fast volatile function. And with wrong
> function design your functions can be 10 times slower. Yeah - the
> effects of good or bad design multiply with those of a good or bad
> stability flag.

Well, it was demonstrated previously that the domain of very fast
volatile plpgsql functions is inherently limited by their startup
overhead, which makes them inherently slow on small data sets (and/or
when nested). Sure, this can become relatively insignificant on very
large data sets, but as long as there is a reasonable chance of this
slowing down a query by an order of magnitude, IMHO it would be better
to note it than to ignore it.

--
2. That which causes joy or happiness.