Re: refactoring allpaths.c (was Re: Suppressing unused subquery output columns)

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Suppressing unused subquery output columns
Date: 2014-06-06 02:27:24
Message-ID: 596.1402021644@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

A question in pgsql-general made me reflect about how the planner isn't
smart about unreferenced output columns of subqueries that it's not able
to flatten into the parent query. Here's an example:

regression=# create table t1 (f1 int);
CREATE TABLE
regression=# create table t2 (f2 int primary key, f3 int);
CREATE TABLE
regression=# explain select f1 from (select * from t1 left join t2 on f1=f2) ss;
QUERY PLAN
------------------------------------------------------
Seq Scan on t1 (cost=0.00..34.00 rows=2400 width=4)
Planning time: 0.691 ms
(2 rows)

So far so good; we were able to apply join removal to t2, because the
query doesn't require outputting f2 or f3. But if the subquery can't be
flattened, it stops working:

regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 limit 1) ss;
QUERY PLAN
------------------------------------------------------------------------------------
Subquery Scan on ss (cost=0.15..0.38 rows=1 width=4)
-> Limit (cost=0.15..0.37 rows=1 width=12)
-> Nested Loop Left Join (cost=0.15..516.00 rows=2400 width=12)
-> Seq Scan on t1 (cost=0.00..34.00 rows=2400 width=4)
-> Index Scan using t2_pkey on t2 (cost=0.15..0.19 rows=1 width=8)
Index Cond: (t1.f1 = f2)
(6 rows)

This is because while planning the separate subquery, the planner sees
"select *" and doesn't realize that f2 and f3 aren't really needed.

The attached draft patch fixes this by deleting unused output expressions
from unflattened subqueries, so that we get:

regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 limit 1) ss;
QUERY PLAN
------------------------------------------------------------------
Subquery Scan on ss (cost=0.00..0.02 rows=1 width=4)
-> Limit (cost=0.00..0.01 rows=1 width=4)
-> Seq Scan on t1 (cost=0.00..34.00 rows=2400 width=4)
Planning time: 0.193 ms
(4 rows)

I'm not entirely convinced that it's worth the extra planning cycles,
though. Given the small number of complaints to date, it might not
be worth doing this. Thoughts?

regards, tom lane

PS: to be clear, I'm not thinking of applying this till 9.5 opens,
in any case.

Attachment Content-Type Size
suppress-unused-subquery-outputs.patch text/x-diff 6.6 KB

From: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suppressing unused subquery output columns
Date: 2014-06-06 02:52:55
Message-ID: CAKddOFAKSL-FU8yC-AXOxH8XNF-y1C-LAoJv8_RK3-kY=0j8Eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I'm not entirely convinced that it's worth the extra planning cycles,
> though. Given the small number of complaints to date, it might not
> be worth doing this. Thoughts?
>

Would this avoid execution of expensive functions in views when their
output is discarded?

-- On 9.3
CREATE TABLE data (col1 serial primary key);
INSERT INTO data DEFAULT VALUES;
INSERT INTO data DEFAULT VALUES;

CREATE OR REPLACE VIEW v AS select *, (pg_sleep(1))::text FROM data;

t=# explain analyze select col1 from v;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------
Subquery Scan on v (cost=0.00..76.00 rows=2400 width=4) (actual
time=1001.086..2002.217 rows=2 loops=1)
-> Seq Scan on data (cost=0.00..52.00 rows=2400 width=4) (actual
time=1001.083..2002.210 rows=2 loops=1)
Total runtime: 2002.268 ms
(3 rows)

regards,

Rod


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suppressing unused subquery output columns
Date: 2014-06-06 02:54:16
Message-ID: 1210.1402023256@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Rod Taylor <rod(dot)taylor(at)gmail(dot)com> writes:
> On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm not entirely convinced that it's worth the extra planning cycles,
>> though. Given the small number of complaints to date, it might not
>> be worth doing this. Thoughts?

> Would this avoid execution of expensive functions in views when their
> output is discarded?

Yes, as long as they're not marked volatile and don't return sets.

regards, tom lane


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suppressing unused subquery output columns
Date: 2014-06-06 21:37:25
Message-ID: 53923495.7010308@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6/5/14, 9:54 PM, Tom Lane wrote:
> Rod Taylor <rod(dot)taylor(at)gmail(dot)com> writes:
>> On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm not entirely convinced that it's worth the extra planning cycles,
>>> though. Given the small number of complaints to date, it might not
>>> be worth doing this. Thoughts?
>
>> Would this avoid execution of expensive functions in views when their
>> output is discarded?
>
> Yes, as long as they're not marked volatile and don't return sets.

That would certainly make it useful for us.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suppressing unused subquery output columns
Date: 2014-06-08 04:41:01
Message-ID: CAApHDvonS8RcCcBNei9fjvOnVhqnP1vvvuvp_wVU-Lt3MRVr2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 6, 2014 at 2:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I'm not entirely convinced that it's worth the extra planning cycles,
> though. Given the small number of complaints to date, it might not
> be worth doing this. Thoughts?
>
>
That's a difficult question for sure. Obviously it's going to depend on the
query that we're planning. If it's a very simple query and we don't reduce
the target list of the subquery any, then we'll get a small performance
impact... But if we do manage to prune down the targetlist and later allow
a left join on a 1 billion row table to be removed, then the cost of the
extra few cycles performed by this patch would be totally unmeasurable and
completely insignificant.

At work we use <insert name of popular commercial relational database
product here>, and on a daily basis I come across poorly written queries by
other developers.

The things I can think of off hand are:

1. All primary key fields of a table are in a DISTINCT clause
2. Someone had written a query like: SELECT some,1 as type FROM table UNION
SELECT thing,2 as type from othertable

The database in question manage to remove the DISTINCT on 1 because it's
just not needed as each group could only ever have 1 row. On 2 it managed
to see that because column 2 of the UNION query was a constant and it was a
different constant on each side of the UNION, then the query would not
produce duplicates and it changed this to UNION ALL.

At home I checked how PostgreSQL handled both of these cases, and saw that
it failed to see through the non-sense in the poorly written queries on
both accounts. This is fine, as anyone who ever posted on the performance
list saying, "why is this slow?" We'd tell them to write their query
another way. But what about all the companies who consider porting their
application over to PostgreSQL and get to the stage of importing all the
data over onto a test server and trying a few of their (poorly written)
queries. And they see a 10 times slowdown and immediately think PostgreSQL
is just not for them. It's a shame that they'd turn us down so early, but
if we went ahead and added code to detect both of these situations then
we'd end up increasing planning time for everyone else who writes their
queries properly.

I don't really have an answer for this, but I do think it needs more
discussion. The only things I've thought of so far as a planner_strength
GNU that can try to optimise these things when it gets to a certain level,
but that just introduces surprise factor and a whole bunch of threads on
performance saying... Why is this query on pg10.4 slower than pg10.2? and
our reply goes, is planner_strength set to the same on both? It does not
sound pretty, or another option to replan a query when the cost is over a
certain threshold with the strength level turned to maximum, but that seems
like it could go along the same lines as far as surprise factor is
concerned.

It's pretty hard to get the best of both worlds here. I know my post does
not have many answers, but I posted it anyway just in case someone else
comes up with a good idea that perhaps we could all work towards.

Regards

David Rowley


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suppressing unused subquery output columns
Date: 2014-06-12 15:14:47
Message-ID: CA+TgmoaY+PLWNM-TQnP6uzJUJhhBL8vGeeiqftbo=pHgTXf3ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The attached draft patch fixes this by deleting unused output expressions
> from unflattened subqueries, so that we get:
>
> regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 limit 1) ss;
> QUERY PLAN
> ------------------------------------------------------------------
> Subquery Scan on ss (cost=0.00..0.02 rows=1 width=4)
> -> Limit (cost=0.00..0.01 rows=1 width=4)
> -> Seq Scan on t1 (cost=0.00..34.00 rows=2400 width=4)
> Planning time: 0.193 ms
> (4 rows)
>
> I'm not entirely convinced that it's worth the extra planning cycles,
> though. Given the small number of complaints to date, it might not
> be worth doing this. Thoughts?

I've encountered this issue before and think it would be great to fix
it. I'm not sure precisely how many cycles it's worth paying, but
definitely more than zero.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suppressing unused subquery output columns
Date: 2014-06-12 16:03:07
Message-ID: 26238.1402588987@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The attached draft patch fixes this by deleting unused output expressions
>> from unflattened subqueries, so that we get:
>> ...
>> I'm not entirely convinced that it's worth the extra planning cycles,
>> though. Given the small number of complaints to date, it might not
>> be worth doing this. Thoughts?

> I've encountered this issue before and think it would be great to fix
> it. I'm not sure precisely how many cycles it's worth paying, but
> definitely more than zero.

We have a couple votes for this patch and no one has spoken against it,
so I'll go ahead and push it into HEAD.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: refactoring allpaths.c (was Re: Suppressing unused subquery output columns)
Date: 2014-06-12 16:37:48
Message-ID: 28337.1402591068@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> We have a couple votes for this patch and no one has spoken against it,
> so I'll go ahead and push it into HEAD.

BTW, I forgot to mention that while working on this patch I was thinking
it's past time to separate out the subquery support in allpaths.c into
its own file. With this patch, allpaths.c is 2363 lines, about 690 of
which are set_subquery_pathlist and subsidiary functions. While that
might not be quite tail-wagging-dog level, I think it's enough to justify
splitting that code out into a new file, say optimizer/path/subquerypath.c.

There are also about 630 lines devoted to appendrel path generation,
which might be thought enough to deserve a separate file of its own.
I'm a bit less excited about that though, mainly because the appendrel
logic has some not-quite-arms-length interactions with set_rel_size();
but there's certainly some case to be made for doing it.

Thoughts?

regards, tom lane


From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: refactoring allpaths.c (was Re: Suppressing unused subquery output columns)
Date: 2014-07-25 09:30:30
Message-ID: 53D223B6.7020300@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2014/06/13 1:37), Tom Lane wrote:
> I wrote:
>> We have a couple votes for this patch and no one has spoken against it,
>> so I'll go ahead and push it into HEAD.
>
> BTW, I forgot to mention that while working on this patch I was thinking
> it's past time to separate out the subquery support in allpaths.c into
> its own file. With this patch, allpaths.c is 2363 lines, about 690 of
> which are set_subquery_pathlist and subsidiary functions. While that
> might not be quite tail-wagging-dog level, I think it's enough to justify
> splitting that code out into a new file, say optimizer/path/subquerypath.c.

+1

Sorry for the late reply.

Best regards,
Etsuro Fujita