Suppressing unused subquery output columns

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Suppressing unused subquery output columns
Date: 2014-06-06 02:27:24
Message-ID: 596.1402021644@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

A question in pgsql-general made me reflect about how the planner isn't
smart about unreferenced output columns of subqueries that it's not able
to flatten into the parent query. Here's an example:

regression=# create table t1 (f1 int);
CREATE TABLE
regression=# create table t2 (f2 int primary key, f3 int);
CREATE TABLE
regression=# explain select f1 from (select * from t1 left join t2 on f1=f2) ss;
QUERY PLAN
------------------------------------------------------
Seq Scan on t1 (cost=0.00..34.00 rows=2400 width=4)
Planning time: 0.691 ms
(2 rows)

So far so good; we were able to apply join removal to t2, because the
query doesn't require outputting f2 or f3. But if the subquery can't be
flattened, it stops working:

regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 limit 1) ss;
QUERY PLAN
------------------------------------------------------------------------------------
Subquery Scan on ss (cost=0.15..0.38 rows=1 width=4)
-> Limit (cost=0.15..0.37 rows=1 width=12)
-> Nested Loop Left Join (cost=0.15..516.00 rows=2400 width=12)
-> Seq Scan on t1 (cost=0.00..34.00 rows=2400 width=4)
-> Index Scan using t2_pkey on t2 (cost=0.15..0.19 rows=1 width=8)
Index Cond: (t1.f1 = f2)
(6 rows)

This is because while planning the separate subquery, the planner sees
"select *" and doesn't realize that f2 and f3 aren't really needed.

The attached draft patch fixes this by deleting unused output expressions
from unflattened subqueries, so that we get:

regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 limit 1) ss;
QUERY PLAN
------------------------------------------------------------------
Subquery Scan on ss (cost=0.00..0.02 rows=1 width=4)
-> Limit (cost=0.00..0.01 rows=1 width=4)
-> Seq Scan on t1 (cost=0.00..34.00 rows=2400 width=4)
Planning time: 0.193 ms
(4 rows)

I'm not entirely convinced that it's worth the extra planning cycles,
though. Given the small number of complaints to date, it might not
be worth doing this. Thoughts?

regards, tom lane

PS: to be clear, I'm not thinking of applying this till 9.5 opens,
in any case.

Attachment Content-Type Size
suppress-unused-subquery-outputs.patch text/x-diff 6.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rod Taylor 2014-06-06 02:52:55 Re: Suppressing unused subquery output columns
Previous Message Noah Misch 2014-06-06 01:57:44 Re: tests for client programs