Slow joins against set-returning functions

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow joins against set-returning functions
Date: 2004-08-15 16:00:14
Message-ID: 20040815160014.GA85211@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

PostgreSQL versions: 7.4.3, 8.0.0beta1

Joins against set-returning functions can be slow. Here's a simple
example (in 8.0.0beta1, the gen_series function can be replaced
with generate_series):

CREATE FUNCTION gen_series(INTEGER, INTEGER) RETURNS SETOF INTEGER AS '
DECLARE
xstart ALIAS FOR $1;
xend ALIAS FOR $2;
x INTEGER;
BEGIN
FOR x IN xstart .. xend LOOP
RETURN NEXT x;
END LOOP;

RETURN;
END;
' LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE TABLE stuff (
id INTEGER NOT NULL PRIMARY KEY,
item VARCHAR(32) NOT NULL
);

INSERT INTO stuff (id, item)
SELECT id, 'Item ' || id FROM gen_series(1, 100000) AS f(id);

ANALYZE stuff;

Here are two queries; notice how the second query, which uses higher
numbers for the join key, is much slower than the first, apparently
due to an inefficient index scan:

EXPLAIN ANALYZE
SELECT stuff.* FROM stuff JOIN gen_series(1, 10) AS f(id) USING (id);

Merge Join (cost=62.33..2544.33 rows=1001 width=17) (actual time=1.398..1.950 rows=10 loops=1)
Merge Cond: ("outer".id = "inner".id)
-> Index Scan using stuff_pkey on stuff (cost=0.00..2217.00 rows=100000 width=17) (actual time=0.667..0.860 rows=11 loops=1)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=0.646..0.670 rows=10 loops=1)
Sort Key: f.id
-> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.403..0.478 rows=10 loops=1)
Total runtime: 2.529 ms
(7 rows)

EXPLAIN ANALYZE
SELECT stuff.* FROM stuff JOIN gen_series(99991, 100000) AS f(id) USING (id);

Merge Join (cost=62.33..2544.33 rows=1001 width=17) (actual time=2907.078..2907.618 rows=10 loops=1)
Merge Cond: ("outer".id = "inner".id)
-> Index Scan using stuff_pkey on stuff (cost=0.00..2217.00 rows=100000 width=17) (actual time=0.107..2270.722 rows=100000 loops=1)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=0.630..0.654 rows=10 loops=1)
Sort Key: f.id
-> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.392..0.469 rows=10 loops=1)
Total runtime: 2908.205 ms
(7 rows)

If I turn off enable_mergejoin then both queries are fast:

SET enable_mergejoin TO off;

EXPLAIN ANALYZE
SELECT stuff.* FROM stuff JOIN gen_series(1, 10) AS f(id) USING (id);

Nested Loop (cost=0.00..3038.50 rows=1001 width=17) (actual time=0.600..1.912 rows=10 loops=1)
-> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.395..0.482 rows=10 loops=1)
-> Index Scan using stuff_pkey on stuff (cost=0.00..3.01 rows=1 width=17) (actual time=0.091..0.107 rows=1 loops=10)
Index Cond: (stuff.id = "outer".id)
Total runtime: 2.401 ms
(5 rows)

EXPLAIN ANALYZE
SELECT stuff.* FROM stuff JOIN gen_series(99991, 100000) AS f(id) USING (id);

Nested Loop (cost=0.00..3038.50 rows=1001 width=17) (actual time=0.586..1.891 rows=10 loops=1)
-> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.394..0.479 rows=10 loops=1)
-> Index Scan using stuff_pkey on stuff (cost=0.00..3.01 rows=1 width=17) (actual time=0.089..0.105 rows=1 loops=10)
Index Cond: (stuff.id = "outer".id)
Total runtime: 2.374 ms
(5 rows)

Is the planner doing something wrong here?

Thanks.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Martin Foster 2004-08-15 16:01:26 Re: Faster with a sub-query then without
Previous Message Tom Lane 2004-08-15 03:55:03 Re: Faster with a sub-query then without