Quick Links

Re: LEFT JOINs not optimized away when not needed

Lists:	pgsql-bugspgsql-hackers

From:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
To:	pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	LEFT JOINs not optimized away when not needed
Date:	2014-07-08 14:04:21
Message-ID:	CAJ4CxLkrWctbEhoshWvxnEDi9=PvoDbWzdf5OJqvfv840Hv0_Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

Dear PostgreSQL Gurus,

We currently have php code which, for the purposes of speed optimization,
dynamically constructs queries based on the output columns and filter
conditions requested by the application. This code is very complicated, so
we are trying to simplify things by using views instead. We have in mind to
define the view entirely in terms of left joins, so that when particular
tables in the view are not requested on the output or used in filtering,
Postgres will not join the table, and the query will still be fast. This
would drastically simplify our middleware code.

However, it turns out that Postgres is not optimizing away the left joins
as I would expect. See the following query:

SELECT r.reset
FROM tb_reset r
LEFT JOIN tb_project pj ON pj.project = r.project
LEFT JOIN tb_location l ON l.location = r.location
LEFT JOIN tb_program pg ON pg.program = r.program
LEFT JOIN tb_reset_survey rsv ON rsv.reset = r.reset
LEFT JOIN tb_program_location pl ON pl.program = r.program AND
pl.location = r.location
LEFT JOIN tb_program_type pgt ON pgt.program_type = pg.program_type
LEFT JOIN tb_project_department pd ON pd.project = pj.project
LEFT JOIN tb_department d ON d.department = pd.department
LEFT JOIN tb_fiscal_calendar fc ON fc.day = r.execution_date
LEFT JOIN tb_fiscal_month fm ON fm.fiscal_month = fc.month
LEFT JOIN tb_rollout_schedule rs ON rs.fiscal_week = fc.week AND
rs.fiscal_year = fc.year AND rs.program = r.program
LEFT JOIN tb_reset_team rt ON rt.reset_team = r.reset_team
LEFT JOIN tb_vendor v ON v.vendor = rt.vendor
LEFT JOIN tb_reset_status rst ON rst.reset_status = r.reset_status
LEFT JOIN tb_order_location ol ON ol.location = r.location
LEFT JOIN tb_entity_reset er ON er.reset = r.reset
LEFT JOIN tb_market m ON m.market = l.market
LEFT JOIN tb_district dist ON dist.district = l.district
LEFT JOIN tb_mregion mr ON mr.mregion = l.mregion
LEFT JOIN tb_region rg ON rg.region = l.region
LEFT JOIN tb_ogrp ogrp ON ogrp.ogrp = l.ogrp
LEFT JOIN tb_buying_office byo ON byo.buying_office = l.buying_office
LEFT JOIN tb_project_participant pp ON pp.project = r.project
LEFT JOIN tb_project_status ps ON ps.project_status = pj.project_status
WHERE r.in_scope IS TRUE
GROUP BY r.reset
ORDER BY r.reset
LIMIT 100 OFFSET 0;

Seeing that there is only one output column, and that the results are
grouped by this output column, it seems to me that the optimizer should not
even look at the rest of the tables. However, in reality the following
query plan is produced:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.43..162.06 rows=100 width=4) (actual time=0.110..3.977
rows=100 loops=1)
-> Group (cost=1.43..3250.84 rows=2023 width=4) (actual
time=0.109..3.941 rows=100 loops=1)
-> Nested Loop Left Join (cost=1.43..3239.44 rows=4560
width=4) (actual time=0.106..3.652 rows=1209 loops=1)
Join Filter: (pp.project = r.project)
Rows Removed by Join Filter: 1626
-> Nested Loop Left Join (cost=1.43..2964.79
rows=4560 width=8) (actual time=0.089..2.172 rows=629 loops=1)
-> Merge Left Join (cost=1.15..2232.83
rows=2023 width=12) (actual time=0.075..1.094 rows=100 loops=1)
Merge Cond: (r.reset = er.reset)
-> Nested Loop Left Join
(cost=1.00..2165.74 rows=2023 width=12) (actual time=0.072..1.051
rows=100 loops=1)
Join Filter: ((rs.fiscal_week =
fc.week) AND (rs.fiscal_year = fc.year))
-> Nested Loop Left Join
(cost=0.85..1318.06 rows=2023 width=24) (actual time=0.068..0.874
rows=100 loops=1)
-> Nested Loop Left Join
(cost=0.56..523.87 rows=2023 width=20) (actual time=0.065..0.744
rows=100 loops=1)
Join Filter: (pd.project
= pj.project)
Rows Removed by Join Filter: 200
-> Merge Left Join
(cost=0.56..431.80 rows=2023 width=24) (actual time=0.043..0.526
rows=100 loops=1)
Merge Cond:
(r.reset = rsv.reset)
-> Nested Loop
Left Join (cost=0.42..409.82 rows=2023 width=24) (actual
time=0.037..0.474 rows=100 loops=1)
-> Index
Scan using tb_reset_pkey on tb_reset r (cost=0.28..72.42 rows=2023
width=28) (actual time=0.023..0.111 rows=100 loops=1)
Filter:
(in_scope IS TRUE)
-> Index
Scan using tb_project_pkey on tb_project pj (cost=0.14..0.16 rows=1
width=8) (actual time=0.002..0.003 rows=1 loops=100)
Index
Cond: (project = r.project)
-> Index Only Scan
using ix_reset_survey on tb_reset_survey rsv (cost=0.14..15.29
rows=130 width=4) (actual time=0.004..0.004 rows=0 loops=1)
Heap Fetches: 0
-> Materialize
(cost=0.00..1.04 rows=3 width=8) (actual time=0.000..0.001 rows=3
loops=100)
-> Seq Scan on
tb_project_department pd (cost=0.00..1.03 rows=3 width=8) (actual
time=0.006..0.007 rows=3 loops=1)
-> Index Scan using
tb_fiscal_calendar_day_key on tb_fiscal_calendar fc (cost=0.29..0.38
rows=1 width=16) (actual time=0.000..0.000 rows=0 loops=100)
Index Cond: (day =
r.execution_date)
-> Index Scan using
tb_rollout_schedule_program_key on tb_rollout_schedule rs
(cost=0.15..0.30 rows=8 width=12) (actual time=0.001..0.001 rows=0
loops=100)
Index Cond: (program = r.program)
-> Index Only Scan using
tb_entity_reset_reset_role_key on tb_entity_reset er
(cost=0.15..39.90 rows=1770 width=4) (actual time=0.001..0.001 rows=0
loops=1)
Heap Fetches: 0
-> Index Only Scan using
tb_order_location_location_key on tb_order_location ol
(cost=0.28..0.34 rows=2 width=4) (actual time=0.004..0.008 rows=6
loops=100)
Index Cond: (location = r.location)
Heap Fetches: 629
-> Materialize (cost=0.00..1.06 rows=4 width=4)
(actual time=0.000..0.001 rows=4 loops=629)
-> Seq Scan on tb_project_participant pp
(cost=0.00..1.04 rows=4 width=4) (actual time=0.002..0.005 rows=4
loops=1)
Total runtime: 4.421 ms
(37 rows)

When I last reported this issue, I was asked to produce a self-contained
example that could be used to reproduce this behavior. This was not easy
because of the presence of sensitive customer data, a complex schema, and
the need to have data in the tables for this problem to manifest. However,
we have created the requested example, and the sql dump of the schema is
attached.

Please let me know why Postgres is behaving this way. We are very eager to
implement this new solution, but we are stalled out right now.

Thank you.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com>
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

“Quality is not an act, it is a habit.” — Aristotle

Attachment	Content-Type	Size
schema.zip	application/zip	221.9 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
Cc:	pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-08 15:19:31
Message-ID:	8401.1404832771@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

Moshe Jacobson <moshe(at)neadwerx(dot)com> writes:
> However, it turns out that Postgres is not optimizing away the left joins
> as I would expect. See the following query:

> SELECT r.reset
> FROM tb_reset r
> LEFT JOIN tb_project pj ON pj.project = r.project
> LEFT JOIN tb_location l ON l.location = r.location
> LEFT JOIN tb_program pg ON pg.program = r.program
> LEFT JOIN tb_reset_survey rsv ON rsv.reset = r.reset
> LEFT JOIN tb_program_location pl ON pl.program = r.program AND
> pl.location = r.location
> LEFT JOIN tb_program_type pgt ON pgt.program_type = pg.program_type
> LEFT JOIN tb_project_department pd ON pd.project = pj.project
> LEFT JOIN tb_department d ON d.department = pd.department
> LEFT JOIN tb_fiscal_calendar fc ON fc.day = r.execution_date
> LEFT JOIN tb_fiscal_month fm ON fm.fiscal_month = fc.month
> LEFT JOIN tb_rollout_schedule rs ON rs.fiscal_week = fc.week AND
> rs.fiscal_year = fc.year AND rs.program = r.program
> LEFT JOIN tb_reset_team rt ON rt.reset_team = r.reset_team
> LEFT JOIN tb_vendor v ON v.vendor = rt.vendor
> LEFT JOIN tb_reset_status rst ON rst.reset_status = r.reset_status
> LEFT JOIN tb_order_location ol ON ol.location = r.location
> LEFT JOIN tb_entity_reset er ON er.reset = r.reset
> LEFT JOIN tb_market m ON m.market = l.market
> LEFT JOIN tb_district dist ON dist.district = l.district
> LEFT JOIN tb_mregion mr ON mr.mregion = l.mregion
> LEFT JOIN tb_region rg ON rg.region = l.region
> LEFT JOIN tb_ogrp ogrp ON ogrp.ogrp = l.ogrp
> LEFT JOIN tb_buying_office byo ON byo.buying_office = l.buying_office
> LEFT JOIN tb_project_participant pp ON pp.project = r.project
> LEFT JOIN tb_project_status ps ON ps.project_status = pj.project_status
> WHERE r.in_scope IS TRUE
> GROUP BY r.reset
> ORDER BY r.reset
> LIMIT 100 OFFSET 0;

Ugh. What's your setting of join_collapse_limit, and if it's less than
27, does raising it fix this query?

> Seeing that there is only one output column, and that the results are
> grouped by this output column, it seems to me that the optimizer should not
> even look at the rest of the tables.

The GROUP BY has nothing to do with it, but if all the other tables' join
keys are primary keys (or at least unique), I'd expect the planner to get
rid of the joins. However, I'm not sure whether it works completely when
there are more than join_collapse_limit relations to worry about.

regards, tom lane

From:	David Fetter <david(at)fetter(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Moshe Jacobson <moshe(at)neadwerx(dot)com>, pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: [BUGS] LEFT JOINs not optimized away when not needed
Date:	2014-07-08 15:45:02
Message-ID:	20140708154502.GA6759@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Tue, Jul 08, 2014 at 11:19:31AM -0400, Tom Lane wrote:
> Moshe Jacobson <moshe(at)neadwerx(dot)com> writes:
>
> > Seeing that there is only one output column, and that the results are
> > grouped by this output column, it seems to me that the optimizer should not
> > even look at the rest of the tables.
>
> The GROUP BY has nothing to do with it, but if all the other tables' join
> keys are primary keys (or at least unique), I'd expect the planner to get
> rid of the joins. However, I'm not sure whether it works completely when
> there are more than join_collapse_limit relations to worry about.

Eliminating JOINs seems orthogonal, at least in theory, to
join_collapse_limit. What have I missed here, and how might they have
dependencies?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-08 18:33:14
Message-ID:	CAJ4CxLnRUjVAsj+KuWzDtRotsuR7tjRqsAmtsEaL1vwSqQRkog@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

Tom,

Thanks for the reply.

On Tue, Jul 8, 2014 at 11:19 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> What's your setting of join_collapse_limit, and if it's less than
> 27, does raising it fix this query?
>

We raised it to 30 and restarted postgres but the query plan still touched
all of the tables. I'm not sure

> The GROUP BY has nothing to do with it, but if all the other tables' join
> keys are primary keys (or at least unique), I'd expect the planner to get
> rid of the joins. However, I'm not sure whether it works completely when
> there are more than join_collapse_limit relations to worry about.
>

Not all of the join columns are unique, but most of them are unique, and
all of them are indexed on one side or the other... FWIW.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com>
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
Cc:	pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-08 19:47:13
Message-ID:	30516.1404848833@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

Moshe Jacobson <moshe(at)neadwerx(dot)com> writes:
> On Tue, Jul 8, 2014 at 11:19 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What's your setting of join_collapse_limit, and if it's less than
>> 27, does raising it fix this query?

> We raised it to 30 and restarted postgres but the query plan still touched
> all of the tables.

All of the tables? Your upthread EXPLAIN result (which I duplicated here)
shows it touching only nine of the 27 tables. AFAICS, the eight
un-removed joins are either joining to non-unique keys, or can't be
removed because their tables are needed to provide joining columns for
un-removable joins.

My thought that join_collapse_limit might matter was incorrect --
I was thinking that remove_useless_joins() was invoked on subproblems,
but actually it considers the whole jointree. This is borne out by
experimentation; changing it doesn't change the set of removed joins
in this example. (Which is a good thing because you really didn't want
to run with a collapse_limit as high as 30.)

So it looks to me like it's operating as intended.

regards, tom lane

From:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-08 19:57:21
Message-ID:	CAJ4CxLnPdAJ9aHTe65TLBN1X2Xdb3g1Q0-hKt2g7+1VNadGWDw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Tue, Jul 8, 2014 at 3:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> AFAICS, the eight
> un-removed joins are either joining to non-unique keys, or can't be
> removed because their tables are needed to provide joining columns for
> un-removable joins.
>

My point is that even if the column to which we are joining is a non-unique
key, there should be no need to look at that table if the output does not
include any values from that table, and there are no filters on that
table's values, and all output columns are in the GROUP BY. Is the planner
not smart enough to notice this?

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com>
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
Cc:	pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-08 20:40:28
Message-ID:	709.1404852028@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

Moshe Jacobson <moshe(at)neadwerx(dot)com> writes:
> On Tue, Jul 8, 2014 at 3:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> AFAICS, the eight
>> un-removed joins are either joining to non-unique keys, or can't be
>> removed because their tables are needed to provide joining columns for
>> un-removable joins.

> My point is that even if the column to which we are joining is a non-unique
> key, there should be no need to look at that table if the output does not
> include any values from that table, and there are no filters on that
> table's values, and all output columns are in the GROUP BY. Is the planner
> not smart enough to notice this?

No. There is nothing about GROUP BY in the join removal logic.

regards, tom lane

From:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-08 20:44:39
Message-ID:	CAJ4CxLkYMm6wV-5yhGtRVXOHsP-9CkXafhsT=+2D6-9dkMueiA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Tue, Jul 8, 2014 at 4:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> No. There is nothing about GROUP BY in the join removal logic.
>

OK. Thank you. Is this something that would make sense to add?

More importantly, do you have any suggestions on how we can improve the
speed of our views when only a couple of columns are selected from them?

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com>
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
Cc:	pgsql-bugs(at)postgresql(dot)org, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-08 21:40:00
Message-ID:	2517.1404855600@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

Moshe Jacobson <moshe(at)neadwerx(dot)com> writes:
> On Tue, Jul 8, 2014 at 4:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> No. There is nothing about GROUP BY in the join removal logic.

> OK. Thank you. Is this something that would make sense to add?

Dunno. I don't recall any previous requests for such a thing, so I'd not
be inclined to add it unless it can be done very cheaply (in terms of both
code and runtime). However, it seems like that might possibly be the
case, if someone wanted to pursue making a patch. The existing logic
to check for "no references above the join" would serve fine to exclude
grouping by the inner relation, so it might only be necessary to check
that the query has GROUP BY/DISTINCT at all (and, I guess, no aggregates
or window functions). Or possibly I'm missing something.

> More importantly, do you have any suggestions on how we can improve the
> speed of our views when only a couple of columns are selected from them?

Well, I'd stay away from twenty-seven-way joins if I were you :-(.
That's a recipe for pain. Perhaps denormalizing your schema a bit would
help there.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Moshe Jacobson <moshe(at)neadwerx(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-09 16:07:35
Message-ID:	CA+TgmoZ8HGpVjmnGv=UHd59b3N8KCpo-_yo1D7-GRxT8aaFsfg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Tue, Jul 8, 2014 at 5:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Moshe Jacobson <moshe(at)neadwerx(dot)com> writes:
>> On Tue, Jul 8, 2014 at 4:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> No. There is nothing about GROUP BY in the join removal logic.
>
>> OK. Thank you. Is this something that would make sense to add?
>
> Dunno. I don't recall any previous requests for such a thing, so I'd not
> be inclined to add it unless it can be done very cheaply (in terms of both
> code and runtime).

I thought this was exactly what was being discussed on the "Allowing
join removals for more join types" thread.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Moshe Jacobson <moshe(at)neadwerx(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-09 16:54:05
Message-ID:	16398.1404924845@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jul 8, 2014 at 5:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Moshe Jacobson <moshe(at)neadwerx(dot)com> writes:
>>> On Tue, Jul 8, 2014 at 4:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> No. There is nothing about GROUP BY in the join removal logic.

>>> OK. Thank you. Is this something that would make sense to add?

>> Dunno. I don't recall any previous requests for such a thing, so I'd not
>> be inclined to add it unless it can be done very cheaply (in terms of both
>> code and runtime).

> I thought this was exactly what was being discussed on the "Allowing
> join removals for more join types" thread.

No, this is a completely different thing. The idea is that if the query
is grouping on outer-relation columns, you don't need to care if the inner
relation is unique or not, because it doesn't matter if there are multiple
matches.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Moshe Jacobson <moshe(at)neadwerx(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-09 17:30:39
Message-ID:	CA+TgmoZSO7t+y3A4HF_tJ6cycQ2f0KUfOpEoMUfJM5DoHqO_3w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Wed, Jul 9, 2014 at 12:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jul 8, 2014 at 5:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Moshe Jacobson <moshe(at)neadwerx(dot)com> writes:
>>>> On Tue, Jul 8, 2014 at 4:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> No. There is nothing about GROUP BY in the join removal logic.
>
>>>> OK. Thank you. Is this something that would make sense to add?
>
>>> Dunno. I don't recall any previous requests for such a thing, so I'd not
>>> be inclined to add it unless it can be done very cheaply (in terms of both
>>> code and runtime).
>
>> I thought this was exactly what was being discussed on the "Allowing
>> join removals for more join types" thread.
>
> No, this is a completely different thing. The idea is that if the query
> is grouping on outer-relation columns, you don't need to care if the inner
> relation is unique or not, because it doesn't matter if there are multiple
> matches.

Oh... but that would only work if it were grouping without
aggregation, right? Seems awfully narrow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Moshe Jacobson <moshe(at)neadwerx(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-09 17:33:02
Message-ID:	18920.1404927182@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jul 9, 2014 at 12:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> No, this is a completely different thing. The idea is that if the query
>> is grouping on outer-relation columns, you don't need to care if the inner
>> relation is unique or not, because it doesn't matter if there are multiple
>> matches.

> Oh... but that would only work if it were grouping without
> aggregation, right? Seems awfully narrow.

Right. Yeah, I was not that excited about its usefulness either.
However, the OP seems to think that it's a common optimization
(I wonder if he can cite chapter and verse on which other DBMSes
do it).

regards, tom lane

From:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-09 17:37:25
Message-ID:	CAJ4CxLn7QHOHVYeHWntEn-J4FS_gCT-m39jNzokbdYAZbkVoOg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Wed, Jul 9, 2014 at 1:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Right. Yeah, I was not that excited about its usefulness either.
> However, the OP seems to think that it's a common optimization
> (I wonder if he can cite chapter and verse on which other DBMSes
> do it).
>

I make no claim about its commonness, but I think this type of optimization
is very useful when selecting a limited subset of the columns from a
complex view, particularly where the output rows are expected to be
distinct on the exact set of columns that are selected.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com>
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-10 15:06:49
Message-ID:	CA+Tgmoay+3rZeSeKaqdWeD_3h+V2rBVaaS73rd36+CKq-axgfg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Wed, Jul 9, 2014 at 1:37 PM, Moshe Jacobson <moshe(at)neadwerx(dot)com> wrote:
> On Wed, Jul 9, 2014 at 1:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Right. Yeah, I was not that excited about its usefulness either.
>> However, the OP seems to think that it's a common optimization
>> (I wonder if he can cite chapter and verse on which other DBMSes
>> do it).
>
> I make no claim about its commonness, but I think this type of optimization
> is very useful when selecting a limited subset of the columns from a complex
> view, particularly where the output rows are expected to be distinct on the
> exact set of columns that are selected.

Hmm. So maybe something like this?

CREATE VIEW person_with_cars AS
SELECT p.id, p.full_name, p.something_else, array_agg(c.plate_number)
AS plate_numbers
FROM person p LEFT JOIN cars c ON p.id = c.person_id
GROUP BY p.id, p.full_name, p.something_else;

It's reasonable to hope that if the aggregated column isn't selected,
the join will get removed, but cars (person_id) is not unique.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-10 16:12:03
Message-ID:	CAJ4CxL=vTAQ1dVYtAQoeS636qniO1mOy8E83Q4w7YzWuYyqtdg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Thu, Jul 10, 2014 at 11:06 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> CREATE VIEW person_with_cars AS
> SELECT p.id, p.full_name, p.something_else, array_agg(c.plate_number)
> AS plate_numbers
> FROM person p LEFT JOIN cars c ON p.id = c.person_id
> GROUP BY p.id, p.full_name, p.something_else;
>
> It's reasonable to hope that if the aggregated column isn't selected,
> the join will get removed, but cars (person_id) is not unique.
>

Exactly. But the same should be true even if c.plate_number is not
aggregated in the view def.
As long as it's not selected, the output should list one row per person_id
and the cars table should be removed from the join.
What I've just described is our most common use case.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com>
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

From:	Moshe Jacobson <moshe(at)neadwerx(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>, Robert Mowlavi <robert(at)neadwerx(dot)com>, Chris Autry <chris(at)neadwerx(dot)com>
Subject:	Re: LEFT JOINs not optimized away when not needed
Date:	2014-07-10 20:18:27
Message-ID:	CAJ4CxLkWyRASRNOZLcsgKP=aZ+T7xsaZ-MnWBfD9Zzh_YThs4w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs pgsql-hackers

On Thu, Jul 10, 2014 at 12:12 PM, Moshe Jacobson <moshe(at)neadwerx(dot)com> wrote:

> It's reasonable to hope that if the aggregated column isn't selected,
>> the join will get removed, but cars (person_id) is not unique.
>>
>
> Exactly. But the same should be true even if c.plate_number is not
> aggregated in the view def.
> As long as it's not selected, the output should list one row per person_id
> and the cars table should be removed from the join.
> What I've just described is our most common use case.
>

Oops, what I said may not have been clear.
The view def wouldn't have a GROUP BY in most of our use cases.
The GROUP BY would actually be used to group all of the columns selected
from the view.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com>
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle