Re: Query very slow when in plpgsql function

Lists: pgsql-general
From: Chris McDonald <chrisjonmcdonald(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Query very slow when in plpgsql function
Date: 2009-12-31 16:05:10
Message-ID: c6636737-09fc-44ea-901b-a03cfc4e81e5@b2g2000yqi.googlegroups.com

Hi everyone.

I am using PostgreSQL 8.3.7 on Fedora Core 10. I have one table called
evaluation, which contains about 1 million records, and another called
evaluationentry, which contains about 9 million records.
evaluationentry.veto and evaluation.relevancedt both have indexes on
them. I have run ANALYZE against both tables to ensure the statistics
are current. I execute this query:

SELECT T1.evaluationid, MIN(T2.evalresult)
FROM sharemgr.evaluation T1 INNER JOIN
sharemgr.evaluationentry T2 ON (T1.evaluationid = T2.evaluationid)
WHERE T1.relevancedt BETWEEN CAST('2009-06-15 00:00:00' AS TIMESTAMP)
AND CAST('2009-06-15 23:59:59' AS TIMESTAMP)
AND T2.veto = 'Y'
GROUP BY T1.evaluationid
HAVING MIN(T2.evalresult) = 100

and it returns about 10 results (correctly) in about 4 seconds - which
I am more than happy with given the underlying hardware and
virtualization layer.

QUERY PLAN:
===
HashAggregate  (cost=197446.95..197454.58 rows=436 width=9) (actual time=386.877..387.193 rows=10 loops=1)
  Filter: (min(t2.evalresult) = 100::numeric)
  ->  Nested Loop  (cost=0.00..197423.83 rows=3082 width=9) (actual time=0.319..302.310 rows=4438 loops=1)
        ->  Index Scan using evaluation_i3 on evaluation t1  (cost=0.00..249.97 rows=436 width=4) (actual time=0.130..12.633 rows=634 loops=1)
              Index Cond: ((relevancedt >= '2009-12-14 00:00:00'::timestamp without time zone) AND (relevancedt <= '2009-12-14 23:59:59'::timestamp without time zone))
        ->  Index Scan using evaluationentry_i1 on evaluationentry t2  (cost=0.00..440.57 rows=933 width=9) (actual time=0.031..0.172 rows=7 loops=634)
              Index Cond: (t2.evaluationid = t1.evaluationid)
              Filter: (t2.veto = 'Y'::bpchar)
Total runtime: 387.669 ms
===

I then wrapped this query in a function so I can pass in the two
timestamps and return T1.evaluationid as a SETOF INT by doing

FOR matchRecord IN
same query as above
LOOP
RETURN NEXT matchRecord.evaluationid;
END LOOP;
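Spelled out, the whole function looks roughly like this (a sketch: only
the query itself is from above; the function name, parameter names and
loop variable are illustrative):

```sql
-- Sketch of the function described above. Only the query body comes
-- from the original post; the surrounding names are made up.
CREATE OR REPLACE FUNCTION sharemgr.find_evaluations(
    p_from TIMESTAMP,
    p_to   TIMESTAMP)
RETURNS SETOF INT AS $$
DECLARE
    matchRecord RECORD;
BEGIN
    FOR matchRecord IN
        SELECT T1.evaluationid, MIN(T2.evalresult) AS minresult
        FROM sharemgr.evaluation T1
        INNER JOIN sharemgr.evaluationentry T2
            ON (T1.evaluationid = T2.evaluationid)
        WHERE T1.relevancedt BETWEEN p_from AND p_to
          AND T2.veto = 'Y'
        GROUP BY T1.evaluationid
        HAVING MIN(T2.evalresult) = 100
    LOOP
        -- Return only the id from each matching row
        RETURN NEXT matchRecord.evaluationid;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
```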

And when I execute the function with the same parameters it takes well
over 5 minutes to execute.

It seems as though inside a function, the optimizer wants to tablescan
my 8M row table. Is there a way that I can see the query plans that my
functions are using?

Any suggestions welcome...

chris


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Chris McDonald <chrisjonmcdonald(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query very slow when in plpgsql function
Date: 2009-12-31 18:02:39
Message-ID: 4B3CE73F.4050807@postnewspapers.com.au
Lists: pgsql-general

On 1/01/2010 12:05 AM, Chris McDonald wrote:

> FOR matchRecord IN
> same query as above
> LOOP
> RETURN NEXT matchRecord.evaluationid;
> END LOOP;
>
> And when I execute the function with the same parameters it takes well
> over 5 minutes to execute.

It's as if you PREPAREd the query once, and each time you run the
function it gets EXECUTEd. The query plan is cached. Unfortunately, when
PostgreSQL builds a prepared statement (or a query in a function) it
doesn't know the exact parameter values, which limits its use of
statistics for query optimisation.

Currently there is no way to ask PostgreSQL to re-plan such queries at
each execution. You have to force re-planning by using a query that
cannot be cached. In PL/pgSQL the usual method is EXECUTE ... USING,
which provides the query as text that is parsed and planned each time
the function is invoked.
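Applied to the query upthread, that looks roughly like this (a sketch;
note that the USING clause of EXECUTE was only added in 8.4, so on
8.3 you would instead splice the values into the string with
quote_literal()):

```sql
-- Re-planned on every execution because the statement is built as
-- text. p_from/p_to stand for the function's timestamp parameters.
FOR matchRecord IN EXECUTE
    'SELECT T1.evaluationid
     FROM sharemgr.evaluation T1
     INNER JOIN sharemgr.evaluationentry T2
         ON (T1.evaluationid = T2.evaluationid)
     WHERE T1.relevancedt BETWEEN $1 AND $2
       AND T2.veto = ''Y''
     GROUP BY T1.evaluationid
     HAVING MIN(T2.evalresult) = 100'
    USING p_from, p_to
LOOP
    RETURN NEXT matchRecord.evaluationid;
END LOOP;
```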

> It seems as though inside a function, the optimizer wants to tablescan
> my 8M row table. Is there a way that I can see the query plans that my
> functions are using?

Not directly. However, if you PREPARE your query, then
EXPLAIN ANALYZE EXECUTE
it with the parameters you use, you'll see the same effects.
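For the query in this thread that would be along these lines (a
sketch, run from psql):

```sql
-- PREPARE builds the same kind of generic plan the function caches,
-- so EXPLAIN ANALYZE EXECUTE shows what the function actually runs.
PREPARE q(timestamp, timestamp) AS
    SELECT T1.evaluationid, MIN(T2.evalresult)
    FROM sharemgr.evaluation T1
    INNER JOIN sharemgr.evaluationentry T2
        ON (T1.evaluationid = T2.evaluationid)
    WHERE T1.relevancedt BETWEEN $1 AND $2
      AND T2.veto = 'Y'
    GROUP BY T1.evaluationid
    HAVING MIN(T2.evalresult) = 100;

EXPLAIN ANALYZE EXECUTE q('2009-06-15 00:00:00', '2009-06-15 23:59:59');
DEALLOCATE q;
```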

(Hmm, this needs to be a FAQ)

--
Craig Ringer


From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Query very slow when in plpgsql function
Date: 2009-12-31 22:53:13
Message-ID: 488efa144a2d43fe5f2cca7f78e9d642@biglumber.com
Lists: pgsql-general


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

> And when I execute the function with the same parameters it takes well
> over 5 minutes to execute.
>
> It seems as though inside a function, the optimizer wants to tablescan
> my 8M row table. Is there a way that I can see the query plans that my
> functions are using?

See my blog post on this:

http://blog.endpoint.com/2008/12/why-is-my-function-slow.html

It's also in the first page of hits when you google the subject
line of this thread. :)

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 200912311751
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAks9Kz8ACgkQvJuQZxSWSsj72QCgvrcUOTF8cXyVkIZ29ky3YOmp
p+IAoOmgyTICVXBmndHADua3ypNC0ctK
=dZtE
-----END PGP SIGNATURE-----


From: Chris McDonald <chrisjonmcdonald(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Query very slow when in plpgsql function
Date: 2010-01-01 16:04:36
Message-ID: 47df9c9a-ce04-44ca-9c25-7d0c34343fbe@m26g2000yqb.googlegroups.com
Lists: pgsql-general

Thanks for your assistance guys, and special thanks to Greg for his
blog page.
This was exactly my problem, and it is resolved by using dynamic SQL.
I can now see why the function was behaving badly, and I have been
able to fix the query.

Once again, thanks guys - and Happy New Year from a snow-covered part
of the UK.

chris