Expensive function and the optimizer

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Expensive function and the optimizer
Date: 2005-11-08 01:58:03
Message-ID: 4370062B.4050202@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a function, call it "myfunc()", that is REALLY expensive computationally. Think of it like, "If you call this function, it's going to telephone the Microsoft Help line and wait in their support queue to get the answer." Ok, it's not that bad, but it's so bad that the optimizer should ALWAYS consider it last, no matter what. (Realistically, the function takes 1-2 msec average, so applying it to 40K rows takes 40-80 seconds. It's a graph-theory algorithm, known to be NP-complete.)

Is there some way to explain this cost to the optimizer in a permanent way, like when the function is installed? Here's what I get with one critical query (somewhat paraphrased for simplicity):

explain analyze
select A.ID
from A join B ON (A.ID = B.ID)
where A.row_num >= 0 and A.row_num <= 43477
and B.ID = 52
and myfunc(A.FOO, 'FooBar') order by row_num;

QUERY PLAN
----------------------------------------------------------------------------------
Nested Loop (cost=0.00..72590.13 rows=122 width=8)
-> Index Scan using i_a_row_num on a (cost=0.00..10691.35 rows=12222 width=8)
Index Cond: ((row_num >= 0) AND (row_num <= 43477))
Filter: myfunc((foo)::text, 'FooBar'::text)
-> Index Scan using i_b_id on b (cost=0.00..5.05 rows=1 width=4)
Index Cond: ("outer".id = b.id)
Filter: (id = 52)
Total runtime: 62592.631 ms
(8 rows)

Notice the "Filter: myfunc(...)" that comes in the first loop. This means it's applying myfunc() to 43477 rows in this example. The second index scan would cut this number down from 43477 rows to about 20 rows, making the query time drop from 62 seconds down to a fraction of a second.

Is there any way to give Postgres this information?

The only way I've thought of is something like this:

select X.id from
(select A.id, A.foo, A.row_num
from A join B ON (A.id = B.id)
where A.row_num >= 0 and A.row_num <= 43477
and B.id = 52) as X
where myfunc(X.foo, 'FooBar') order by X.row_num;

I can do this, but it means carefully hand-crafting each query rather than writing a more natural query.

Thanks,
Craig

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2005-11-08 02:10:38 Re: Temporary Table
Previous Message Ralph Mason 2005-11-07 21:40:01 Re: Temporary Table