Re: plpgsql vs. SQL performance

Lists: pgsql-general
From: google(at)newtopia(dot)com (Michael Pohl)
To: pgsql-general(at)postgresql(dot)org
Subject: plpgsql vs. SQL performance
Date: 2003-05-19 02:31:53
Message-ID: da4ea47.0305181831.2ff55bc8@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I am occasionally seeing plpgsql functions significantly underperform
their straight SQL equivalents. A simple example I ran into this
evening:

create or replace function pl_get_user_item_count(int, int)
returns int as '
declare
input_user_id alias for $1;
input_status_id alias for $2;
item_count int;
begin
select
into item_count count(id)
from
"item"
where
user_id = input_user_id and
status_id = input_status_id;
return item_count;
end;
' language 'plpgsql';

This function is taking around 2.11 seconds to execute, vs. 0.09 for
the identical SQL:

[michael(at)server1 ~]$ time psql -c "select pl_get_user_item_count(1,2)"
swap_dev pl_get_user_item_count
------------------------
9
(1 row)

0.000u 0.010s 0:02.11 0.4% 0+0k 0+0io 229pf+0w

[michael(at)server1 ~]$ time psql -c "select count(id) from item where
user_id = 1 and status_id = 2" swap_dev
count
-------
9
(1 row)

0.000u 0.000s 0:00.09 0.0% 0+0k 0+0io 229pf+0w

I can provide table schema and 'explain' output if that would help.
My general question is: Should I expect certain SQL to execute
significantly more slowly when wrapped in a plpgsql function? My db
experience is mainly with Sybase, and I'm used to performance boosts
with Transact-SQL stored procedures. It seems strange to see any
penalty at all for using a stored procedure, much less a harsh one as
in the example above.

Input appreciated.

thanks,

michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: google(at)newtopia(dot)com (Michael Pohl)
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: plpgsql vs. SQL performance
Date: 2003-05-25 03:23:10
Message-ID: 2955.1053832990@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

google(at)newtopia(dot)com (Michael Pohl) writes:
> I am occasionally seeing plpgsql functions significantly underperform
> their straight SQL equivalents.

Almost certainly, a different query plan is getting chosen in the
plpgsql case.

One common cause of this problem is sloppiness about datatypes. You
have declared $1 and $2 of the plpgsql function to be integer; are
the columns they're being compared to also integer? If not, that's
likely preventing indexscans from being used.

Another common cause of this sort of thing is that the planner makes
conservative choices because it doesn't have exact runtime values for
the constants in the query. What you are really comparing here is

plpgsql:
select ... where user_id = $1 and status_id = $2

SQL:
select ... where user_id = 1 and status_id = 2

In the latter case the planner can consult pg_statistic to get a pretty
good idea about how many rows will be selected, whereas in the former
case its guess is much more approximate. (I'd still expect an indexscan
to get picked though, unless you have *very* skewed data statistics for
these columns. Usually it's inequalities that push the planner to use
a seqscan in these cases.)

regards, tom lane