Re: LATERAL quals revisited

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: LATERAL quals revisited
Date: 2013-06-25 20:00:10
Message-ID: 15523.1372190410@sss.pgh.pa.us

I've been studying the bug reported at
http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
that the planner can do the wrong thing with queries like

SELECT * FROM
i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;

I think the fundamental problem is that, because the "i.n = j.n" clause
appears syntactically in WHERE, the planner is treating it as if it were
an inner-join clause; but really it ought to be considered a clause of
the upper LEFT JOIN. That is, semantically this query ought to be
equivalent to

SELECT * FROM
i LEFT JOIN LATERAL (SELECT * FROM j) j ON i.n = j.n;

However, because distribute_qual_to_rels doesn't see the clause as being
attached to the outer join, it's not marked with the correct properties
and ends up getting evaluated in the wrong place (as a "filter" clause
not a "join filter" clause). The bug is masked in the test cases we've
used so far because those cases are designed to let the clause get
pushed down into the scan of the inner relation --- but if it doesn't
get pushed down, it's evaluated the wrong way.
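A toy model may make the "filter" versus "join filter" distinction concrete. The C sketch below is not planner code. It hard-codes one-column rows standing in for i(n) and j(n) and runs a naive nested-loop LEFT JOIN twice: once with i.n = j.n as the join's own qual, and once joining ON true and applying i.n = j.n afterwards as a plain filter. All type and function names are hypothetical, and NULL is modeled only by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-column rows standing in for i(n) and j(n). */
typedef struct { int n; } Row;

/* One output row of "i LEFT JOIN j"; j_null marks a null-extended row. */
typedef struct { int i_n; int j_n; bool j_null; } OutRow;

/*
 * Correct placement: i.n = j.n is the LEFT JOIN's join qual.
 * Every i row survives; an i row with no match is null-extended.
 */
static size_t left_join_with_joinqual(const Row *i, size_t ni,
                                      const Row *j, size_t nj, OutRow *out)
{
    size_t nout = 0;

    for (size_t a = 0; a < ni; a++)
    {
        bool matched = false;

        for (size_t b = 0; b < nj; b++)
        {
            if (i[a].n == j[b].n)
            {
                out[nout++] = (OutRow) {i[a].n, j[b].n, false};
                matched = true;
            }
        }
        if (!matched)
            out[nout++] = (OutRow) {i[a].n, 0, true};
    }
    return nout;
}

/*
 * Wrong placement: LEFT JOIN ... ON true, then i.n = j.n as a plain
 * filter on the join output.  A null-extended row can never pass the
 * filter (NULL = x is not true), so unmatched i rows vanish and the
 * result degenerates to an inner join.
 */
static size_t left_join_then_filter(const Row *i, size_t ni,
                                    const Row *j, size_t nj, OutRow *out)
{
    size_t nout = 0;

    for (size_t a = 0; a < ni; a++)
    {
        /* ON true: every j row joins to this i row */
        for (size_t b = 0; b < nj; b++)
        {
            if (i[a].n == j[b].n)   /* filter evaluated above the join */
                out[nout++] = (OutRow) {i[a].n, j[b].n, false};
        }
        /* if nj == 0, the null-extended row would fail the filter anyway */
    }
    return nout;
}
```

With i = {1, 2, 3} and j = {1, 3}, the first function returns three rows, including a null-extended row for i.n = 2; the second returns only two.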

After some contemplation, I think that the most practical way to fix
this is for deconstruct_recurse and distribute_qual_to_rels to
effectively move such a qual to the place where it logically belongs;
that is, rather than processing it when we look at the lower WHERE
clause, set it aside for a moment and then add it back when looking at
the ON clause of the appropriate outer join. This should be reasonably
easy to do by keeping a list of "postponed lateral clauses" while we're
scanning the join tree.
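The bookkeeping described above can be sketched with a toy join tree. In the C code below, which is a simplified illustration and not the real deconstruct_recurse or distribute_qual_to_rels, relation sets are plain bitmasks, each node carries the quals that appear syntactically at it, and a qual referencing relations outside the current subtree (a lateral reference) goes onto a shared postponed list until the walk returns to a join whose relations cover it. As an assumption of the sketch, a postponed qual is simply attached to the lowest covering join; the outer-join restriction discussed next is not modeled. A final count of leftover quals stands in for the sanity-check error. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int relset;        /* hypothetical: bit k = base relation k */

typedef struct Qual
{
    relset refs;                    /* relations the qual references */
    int placed_at;                  /* id of node it ends up at, -1 if none */
} Qual;

typedef struct Node
{
    int id;
    relset leaf_rel;                /* nonzero only for a leaf */
    struct Node *left, *right;      /* both NULL for a leaf */
    Qual **quals;                   /* quals syntactically attached here */
    size_t nquals;
} Node;

#define MAX_POSTPONED 32

typedef struct
{
    Qual *items[MAX_POSTPONED];
    size_t n;
} QualList;

/* Place q at node if node covers all its references, else postpone it. */
static void place_or_postpone(Qual *q, const Node *node, relset here,
                              QualList *postponed)
{
    if ((q->refs & ~here) == 0)
        q->placed_at = node->id;
    else
    {
        assert(postponed->n < MAX_POSTPONED);
        postponed->items[postponed->n++] = q;
    }
}

/* Walk the tree bottom-up; returns the set of relations below node. */
static relset walk(Node *node, QualList *postponed)
{
    relset here;

    if (node->left == NULL)
        here = node->leaf_rel;
    else
    {
        size_t keep = 0;

        here = walk(node->left, postponed);
        here |= walk(node->right, postponed);

        /* Postponed quals that this join now covers are placed here. */
        for (size_t k = 0; k < postponed->n; k++)
        {
            Qual *q = postponed->items[k];

            if ((q->refs & ~here) == 0)
                q->placed_at = node->id;
            else
                postponed->items[keep++] = q;
        }
        postponed->n = keep;
    }

    /* Then this node's own quals, postponing any with outside references. */
    for (size_t k = 0; k < node->nquals; k++)
        place_or_postpone(node->quals[k], node, here, postponed);
    return here;
}

/* Returns the number of quals never placed; anything but 0 is a bug. */
static size_t distribute_all(Node *root, QualList *postponed)
{
    postponed->n = 0;
    walk(root, postponed);
    return postponed->n;
}
```

For "i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true", modeled as a join node over leaves i and j with the qual hanging off leaf j, the qual is postponed at j and lands on the join node.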

For there to *be* a unique "appropriate outer join", we need to require
that a LATERAL-using qual clause that's under an outer join contain
lateral references only to the outer side of the nearest enclosing outer
join. There's no such restriction in the spec of course, but we can
make it so by refusing to flatten a sub-select if pulling it up would
result in having a clause in the outer query that violates this rule.
There's already some code in prepjointree.c (around line 1300) that
attempts to enforce this, though now that I look at it again I'm not
sure it's covering all the bases. We may need to extend that check.
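Expressed over bitmask relation sets (a hypothetical representation, not PostgreSQL's Relids API), the proposed restriction reduces to a subset test. The sketch below shows only that test; it makes no claim about what the existing check in prepjointree.c looks like.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int relset;   /* hypothetical: bit k = base relation k */

/*
 * Hypothetical pull-up test for a sub-select sitting under an outer join.
 * lateral_refs: outer-query relations referenced by the sub-select's quals.
 * outer_side:   relations on the outer (non-nullable) side of the nearest
 *               enclosing outer join.
 * Flattening is allowed only if every lateral reference lands on that
 * outer side, so each postponed qual has exactly one ON clause to join.
 */
static bool ok_to_flatten_lateral(relset lateral_refs, relset outer_side)
{
    return (lateral_refs & ~outer_side) == 0;
}
```

For a hypothetical "a LEFT JOIN (b LEFT JOIN LATERAL (SELECT ... WHERE ... = a.x) ss ON true) ON ...", the lateral reference {a} is not on the outer side {b} of the nearest outer join, so the sub-select would stay unflattened; a reference to b alone would pass.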

I'm inclined to process all LATERAL-using qual clauses this way, ie
postpone them till we recurse back up to a place where they can
logically be evaluated. That won't make any real difference when no
outer joins are present, but it will eliminate the ugliness that right
now distribute_qual_to_rels is prevented from sanity-checking the scope
of the references in a qual when LATERAL is present. If we do it like
this, we can resurrect full enforcement of that sanity check, and then
throw an error if any "postponed" quals are left over when we're done
recursing.

Thoughts, better ideas?

regards, tom lane


From: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-06-25 21:50:02
Message-ID: 51CA108A.9000902@gmail.com

(Please excuse me if my proposal sounds silly, I'm still not too
advanced in this area...)

On 06/25/2013 10:00 PM, Tom Lane wrote:
> After some contemplation, I think that the most practical way to fix
> this is for deconstruct_recurse and distribute_qual_to_rels to
> effectively move such a qual to the place where it logically belongs;
> that is, rather than processing it when we look at the lower WHERE
> clause, set it aside for a moment and then add it back when looking at
> the ON clause of the appropriate outer join. This should be reasonably
> easy to do by keeping a list of "postponed lateral clauses" while we're
> scanning the join tree.
>

Instead of setting it aside, can we (mis)use a placeholder var (PHV) to
ensure that the WHERE clause is evaluated below the OJ, instead of
combining it with the ON clause? That would be a special PHV in that
it's not actually referenced from outside the subquery.

Whether I'm right or not, I seem to have found another problem while
trying to enforce such a PHV:

postgres=# SELECT i.*, j.* FROM i LEFT JOIN LATERAL (SELECT COALESCE(i)
FROM j WHERE (i.n = j.n)) j ON true;
The connection to the server was lost. Attempting reset: Failed.

TRAP: FailedAssertion("!(!bms_overlap(min_lefthand, min_righthand))",
File: "initsplan.c", Line: 1043)
LOG: server process (PID 24938) was terminated by signal 6: Aborted
DETAIL: Failed process was running: SELECT i.*, j.* FROM i LEFT JOIN
LATERAL (SELECT COALESCE(i) FROM j WHERE (i.n = j.n)) j ON true;
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
FATAL: the database system is in recovery mode

I'm not able to judge right now whether the Assert() statement itself is
the problem or something else is. You'll probably know better.

(4f14c86d7434376b95477aeeb07fcc7272f4c47d is the last commit in my
environment)

Regards,
Antonin Houska (Tony)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-06-25 22:52:16
Message-ID: 18770.1372200736@sss.pgh.pa.us

Antonin Houska <antonin(dot)houska(at)gmail(dot)com> writes:
> On 06/25/2013 10:00 PM, Tom Lane wrote:
>> After some contemplation, I think that the most practical way to fix
>> this is for deconstruct_recurse and distribute_qual_to_rels to
>> effectively move such a qual to the place where it logically belongs;
>> that is, rather than processing it when we look at the lower WHERE
>> clause, set it aside for a moment and then add it back when looking at
>> the ON clause of the appropriate outer join.

> Instead of setting it aside, can we (mis)use placeholder var (PHV), to
> ensure that the WHERE clause is evaluated below the OJ; instead of
> combining it with the ON clause?

No, that doesn't help; it has to be part of the joinquals at the join
node, or you don't get the right execution semantics. Plus you could
lose some optimization opportunities, for example if we fail to see
that there's a strict join clause associated with the outer join
(cf lhs_strict). Worse, I think wrapping a PHV around an otherwise
indexable clause would prevent using it for an indexscan.

> Whether I'm right or not, I seem to have found another problem while
> trying to enforce such a PHV:

> postgres=# SELECT i.*, j.* FROM i LEFT JOIN LATERAL (SELECT COALESCE(i)
> FROM j WHERE (i.n = j.n)) j ON true;
> The connection to the server was lost. Attempting reset: Failed.

[ pokes at that ... ] Hm, right offhand this seems like an independent
issue --- the ph_eval_at for the PHV is wrong AFAICS. Thanks for
reporting it!

regards, tom lane


From: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-06-26 14:40:09
Message-ID: 51CAFD49.5080400@gmail.com

On 06/26/2013 12:52 AM, Tom Lane wrote:
>> Instead of setting it aside, can we (mis)use placeholder var (PHV), to
>> ensure that the WHERE clause is evaluated below the OJ; instead of
>> combining it with the ON clause?
> No, that doesn't help; it has to be part of the joinquals at the join
> node, or you don't get the right execution semantics.
When I wrote 'below the OJ' I meant retaining something of the original
semantics (just as the subquery applies the WHERE clause below the OJ).
However, that's probably too restrictive, and your next arguments
> Plus you could
> lose some optimization opportunities, for example if we fail to see
> that there's a strict join clause associated with the outer join
> (cf lhs_strict). Worse, I think wrapping a PHV around an otherwise
> indexable clause would prevent using it for an indexscan.
>
also confirm the restrictiveness. So I can forget that idea.

One more concern anyway: doesn't your proposal make subquery pull-up a
little bit risky in terms of the cost of the resulting plan?

IMO the subquery in the original query may filter out many rows and thus
decrease the number of pairs to be evaluated by the join that the ON
clause belongs to. If the WHERE clause moves up, the resulting plan
might be less efficient than the one we'd get if the subquery hadn't
been pulled up at all.

However, at the time of cost evaluation there's no way to go back (or
even to notice the higher cost), because the original subquery has
already disappeared at an earlier stage of planning.

Regards,
Antonin Houska (Tony)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-06-26 14:54:16
Message-ID: 6846.1372258456@sss.pgh.pa.us

Antonin Houska <antonin(dot)houska(at)gmail(dot)com> writes:
> If the WHERE clause moves up, then the resulting plan might be less
> efficient than the one we'd get if the subquery hadn't been pulled-up at
> all.

No, because we can push the qual back down again (using a parameterized
path) if that's appropriate. The problem at this stage is to understand
the semantics of the outer join correctly, not to make a choice of what
the plan will be.

In fact, the reason we'd not noticed this bug before is exactly that
all the test cases in the regression tests do end up pushing the qual
back down.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-07-02 16:41:21
Message-ID: 11729.1372783281@sss.pgh.pa.us

Antonin Houska <antonin(dot)houska(at)gmail(dot)com> writes:
> Whether I'm right or not, I seem to have found another problem while
> trying to enforce such a PHV:

> postgres=# SELECT i.*, j.* FROM i LEFT JOIN LATERAL (SELECT COALESCE(i)
> FROM j WHERE (i.n = j.n)) j ON true;
> The connection to the server was lost. Attempting reset: Failed.

I've been poking at this problem, and have found out that there are
several other multi-legged creatures underneath this rock. LATERAL
references turn out to have many more interactions with PlaceHolderVars
than I'd previously thought. I think the existing code was okay in
the initial cut at LATERAL, when we never tried to flatten any LATERAL
subqueries into the parent query --- but now that we allow such flattening
to happen, it's possible that a PlaceHolderVar that's been wrapped
around a pulled-up subquery output expression will contain a lateral
reference. There was a previous report of problems with that sort of
thing, which I tried to fix in a quick-hack way in commit
4da6439bd8553059766011e2a42c6e39df08717f, but that was totally wrong and
in fact caused the Assert you show above. The right way to think about
it is that a PlaceHolderVar should be evaluated at its syntactic
location, but if it contains a lateral reference then that creates an
outer-reference requirement for the scan or join level at which it gets
evaluated.
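A minimal sketch of that rule, again with hypothetical bitmask relation sets: the PHV is evaluated at its syntactic level, and whatever its expression references from outside that level becomes an outer-reference requirement for the path that computes it. The struct and functions are illustrative names only, not PostgreSQL's PlaceHolderInfo fields or functions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int relset;    /* hypothetical: bit k = base relation k */

typedef struct
{
    relset eval_level;          /* syntactic location: where it's evaluated */
    relset expr_refs;           /* all relations its expression references */
} PhvSketch;

/* Relations that must be supplied from outside at eval_level. */
static relset phv_lateral_outer(const PhvSketch *phv)
{
    return phv->expr_refs & ~phv->eval_level;
}

/*
 * The path computing exactly eval_level is usable for this PHV only if
 * it is parameterized by (at least) the PHV's lateral relations.
 */
static bool eval_path_ok(const PhvSketch *phv, relset path_outer)
{
    return (phv_lateral_outer(phv) & ~path_outer) == 0;
}
```

For the earlier crash case, COALESCE(i) pulled up from a subquery over j (with i as bit 0 and j as bit 1), the PHV sits at level {j} but references i, so an unparameterized scan of j cannot compute it while a scan of j parameterized by i can.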

So attached is a draft patch for this. It's not complete yet because
there are various comments that are now wrong and need to be updated;
but I think the code is functioning correctly. Also the
lateral_vars/lateral_relids stuff seems a bit crude and Rube Goldbergish
now, because it considers *only* lateral references occurring at relation
scan level, which I now see is just part of the problem. I'm not sure
if there's a good way to generalize that or if it's best left alone.

Note that the original join-qual-misplacement problem reported by Jeremy
Evans is not fixed yet; this is just addressing PlaceHolderVar issues.

Comments?

regards, tom lane

Attachment Content-Type Size
placeholdervar-fixes-1.patch text/x-patch 88.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-07-03 18:32:39
Message-ID: 22570.1372876359@sss.pgh.pa.us

I wrote:
> So attached is a draft patch for this. It's not complete yet because
> there are various comments that are now wrong and need to be updated;
> but I think the code is functioning correctly.

Hm, spoke too soon :-(. This query causes an assertion failure, with or
without my draft patch:

select c.*,a.*,ss1.q1,ss2.q1,ss3.* from
  int8_tbl c left join (
    int8_tbl a left join
      (select q1, coalesce(q2,f1) as x from int8_tbl b, int4_tbl b2) ss1
      on a.q2 = ss1.q1
    cross join
    lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
  ) on c.q2 = ss2.q1,
  lateral (select * from int4_tbl i where ss2.y > f1) ss3;

TRAP: FailedAssertion("!(bms_is_subset(phinfo->ph_needed, phinfo->ph_may_need))", File: "initsplan.c", Line: 213)

What's happening is that distribute_qual_to_rels concludes (correctly)
that the "ss2.y > f1" clause must be postponed until after the nest of
left joins, since those could null ss2.y. So the PlaceHolderVar for
ss2.y is marked as being needed at the topmost join level. However,
find_placeholders_in_jointree had only marked the PHV as being "maybe
needed" to scan the "i" relation, since that's what the syntactic
location of the reference implies. Since we depend on the assumption
that ph_needed is always a subset of ph_may_need, there's an assertion
that fires if that stops being true, and that's what's crashing.
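Modeling the two sets as bitmasks, with a hypothetical numbering of the query's relations (c, a, b, b2, d, i), the failure reproduces in miniature: the estimate covers only the scan of i, but postponing "ss2.y > f1" to the top of the outer-join nest pushes ph_needed out to the whole join. This is an illustration of the invariant, not the initsplan.c code.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int relset;   /* hypothetical: bit k = base relation k */

/* Hypothetical relation numbering for the query above. */
enum { REL_C, REL_A, REL_B, REL_B2, REL_D, REL_I };
#define REL(x) (1u << (x))

typedef struct
{
    relset may_need;  /* estimate from the syntactic location of uses */
    relset needed;    /* levels where quals actually ended up using it */
} PhvNeedSketch;

/*
 * Record that a qual evaluated at 'level' uses the PHV.  Returns whether
 * "needed is a subset of may_need" still holds, i.e. the condition the
 * assertion checks.
 */
static bool phv_mark_needed(PhvNeedSketch *phv, relset level)
{
    phv->needed |= level;
    return (phv->needed & ~phv->may_need) == 0;
}
```

Marking a use at the scan of i keeps the invariant; marking the topmost join level breaks it.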

After some thought about this, I'm coming to the conclusion that lateral
references destroy the ph_may_need optimization altogether: we
cannot derive an accurate estimate of where a placeholder will end up in
the final qual distribution, short of essentially doing all the work in
deconstruct_jointree over again. I guess in principle we could repeat
deconstruct_jointree until we had stable estimates of the ph_needed
locations, but that would be expensive and probably would induce a lot
of new planner bugs (since the data structure changes performed during
deconstruct_jointree aren't designed to be backed out easily).

The only place where ph_may_need is actually used is in this bit in
make_outerjoininfo():

    /*
     * Examine PlaceHolderVars. If a PHV is supposed to be evaluated within
     * this join's nullable side, and it may get used above this join, then
     * ensure that min_righthand contains the full eval_at set of the PHV.
     * This ensures that the PHV actually can be evaluated within the RHS.
     * Note that this works only because we should already have determined the
     * final eval_at level for any PHV syntactically within this join.
     */
    foreach(l, root->placeholder_list)
    {
        PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
        Relids      ph_syn_level = phinfo->ph_var->phrels;

        /* Ignore placeholder if it didn't syntactically come from RHS */
        if (!bms_is_subset(ph_syn_level, right_rels))
            continue;

        /* We can also ignore it if it's certainly not used above this join */
        /* XXX this test is probably overly conservative */
        if (bms_is_subset(phinfo->ph_may_need, min_righthand))
            continue;

        /* Else, prevent join from being formed before we eval the PHV */
        min_righthand = bms_add_members(min_righthand, phinfo->ph_eval_at);
    }

Looking at it again, it's not really clear that skipping placeholders in
this way results in very much optimization --- sometimes we can avoid
constraining join order, but how often? I tried diking out the check
on ph_may_need from this loop, and saw no changes in the regression test
results (not that that proves a whole lot about optimization of complex
queries). So I'm pretty tempted to just remove ph_may_need, along with
the machinery that computes it.
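To see what dropping the ph_may_need test changes, the loop above can be modeled with bitmask relation sets. In this hypothetical sketch the only difference between the two modes is whether the skip is taken; without it, min_righthand always absorbs the eval_at set of every PHV that came from the RHS. It mirrors the loop's logic but is not PostgreSQL code, and the bit values in the usage are arbitrary, chosen only to exercise both branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int relset;   /* hypothetical: bit k = base relation k */

typedef struct
{
    relset phrels;     /* syntactic level (ph_var->phrels above) */
    relset may_need;   /* ph_may_need */
    relset eval_at;    /* ph_eval_at */
} PhInfoSketch;

static relset widen_min_righthand(relset min_righthand, relset right_rels,
                                  const PhInfoSketch *phs, size_t nph,
                                  bool use_may_need)
{
    for (size_t k = 0; k < nph; k++)
    {
        const PhInfoSketch *ph = &phs[k];

        /* Ignore placeholder if it didn't syntactically come from RHS */
        if ((ph->phrels & ~right_rels) != 0)
            continue;

        /* Optional skip: certainly not used above this join */
        if (use_may_need && (ph->may_need & ~min_righthand) == 0)
            continue;

        /* Else, prevent join from being formed before we eval the PHV */
        min_righthand |= ph->eval_at;
    }
    return min_righthand;
}
```

With the skip enabled, a PHV whose may_need set already lies inside min_righthand leaves the join's constraint untouched; with it disabled, the same PHV widens min_righthand and so restricts join order.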

Another possibility would be to keep the optimization, but disable it in
queries that use LATERAL. I don't much care for that though --- seems
too Rube Goldbergish, and in any case I have a lot less faith in the
whole concept now than I had before I started digging into this issue.

Thoughts?

regards, tom lane


From: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-07-04 16:11:47
Message-ID: 51D59EC3.2020902@gmail.com

On 07/03/2013 08:32 PM, Tom Lane wrote:
> Another possibility would be to keep the optimization, but disable it in
> queries that use LATERAL. I don't much care for that though --- seems
> too Rube Goldbergish, and in any case I have a lot less faith in the
> whole concept now than I had before I started digging into this issue.
>
> Thoughts?
>
I noticed that some regression tests use EXPLAIN. So if they all pass
after removal of this optimization, it might indicate that it was really
insignificant. But it may also just reflect a lack of focus on this
feature in the test queries. Digging for (non-LATERAL) queries, or rather
patterns, where the ph_may_need optimization clearly appears to be
important sounds to me like a good SQL exercise, but I'm afraid I won't
have time for it in the next few days.

//Antonin Houska (Tony)


From: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-07-12 17:07:08
Message-ID: 51E037BC.6030908@gmail.com

On 07/04/2013 06:11 PM, Antonin Houska wrote:
> On 07/03/2013 08:32 PM, Tom Lane wrote:
>> Another possibility would be to keep the optimization, but disable it in
>> queries that use LATERAL. I don't much care for that though --- seems
>> too Rube Goldbergish, and in any case I have a lot less faith in the
>> whole concept now than I had before I started digging into this issue.
>>
>> Thoughts?
>>
> I noticed EXPLAIN in some regression tests. So if they all pass after
> removal of this optimization, it might indicate that it was really
> insignificant. But alternatively it may just be a lack of focus on
> this feature in the test queries. Digging for (non-LATERAL) queries or
> rather patterns where the ph_may_need optimization clearly appears to
> be important sounds to me like a good SQL exercise, but I'm afraid I
> won't have time for it in the next few days.
>

I constructed a query that triggers the optimization; see the attachment
with comments. (Note that the relid sets are derived from my current
understanding of the logic. I haven't figured out how to check them
easily in a gdb session.)

The intention was that the top-level OJ references the LHS of the join
below rather than the RHS. That should increase the likelihood that the
PHV becomes the only obstacle to join commuting, and therefore the
ph_may_need optimization should unblock some combinations that would
otherwise be impossible.

However, I could not see the condition

    if (bms_is_subset(phinfo->ph_may_need, min_righthand))
        continue;

met for the top-level join, even though the supposed ph_may_need did not
contain tab1. Then it struck me that min_righthand could be the problem.
So I changed the join clause to reference the RHS of j1, hoping that this
would make min_righthand bigger. And that really did trigger the condition.

EXPLAIN shows the same plan with or without the ph_may_need
optimization, but that might be a data problem (my tables are empty).

More important is the fact that I could only avoid adding the PHV's
eval_at to min_righthand at the cost of adding the whole j1 join (i.e.
more than just eval_at).

Although the idea behind ph_may_need is clever, I can now imagine that
other planner techniques can substitute for it. There might be examples
showing the opposite, but they are beyond my imagination.

// Antonin Houska (Tony)

Attachment Content-Type Size
ph_optimization.sql text/x-sql 1.2 KB
tables.ddl text/plain 130 bytes

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LATERAL quals revisited
Date: 2013-07-19 10:12:48
Message-ID: CAFjFpRenMeu4V_xGxyWD2PsQj+Tk2d0qGAkqXJJkT7qZac1Mmw@mail.gmail.com

I have a couple of questions.

On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I've been studying the bug reported at
> http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
> that the planner can do the wrong thing with queries like
>
> SELECT * FROM
> i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;
>
> I think the fundamental problem is that, because the "i.n = j.n" clause
> appears syntactically in WHERE, the planner is treating it as if it were
> an inner-join clause; but really it ought to be considered a clause of
> the upper LEFT JOIN. That is, semantically this query ought to be
> equivalent to
>
> SELECT * FROM
> i LEFT JOIN LATERAL (SELECT * FROM j) j ON i.n = j.n;
>
> However, because distribute_qual_to_rels doesn't see the clause as being
> attached to the outer join, it's not marked with the correct properties
> and ends up getting evaluated in the wrong place (as a "filter" clause
> not a "join filter" clause). The bug is masked in the test cases we've
> used so far because those cases are designed to let the clause get
> pushed down into the scan of the inner relation --- but if it doesn't
> get pushed down, it's evaluated the wrong way.
>
> After some contemplation, I think that the most practical way to fix
> this is for deconstruct_recurse and distribute_qual_to_rels to
> effectively move such a qual to the place where it logically belongs;
> that is, rather than processing it when we look at the lower WHERE
> clause, set it aside for a moment and then add it back when looking at
> the ON clause of the appropriate outer join. This should be reasonably
> easy to do by keeping a list of "postponed lateral clauses" while we're
> scanning the join tree.
>
> For there to *be* a unique "appropriate outer join", we need to require
> that a LATERAL-using qual clause that's under an outer join contain
> lateral references only to the outer side of the nearest enclosing outer
> join. There's no such restriction in the spec of course, but we can
> make it so by refusing to flatten a sub-select if pulling it up would
> result in having a clause in the outer query that violates this rule.
> There's already some code in prepjointree.c (around line 1300) that
> attempts to enforce this, though now that I look at it again I'm not
> sure it's covering all the bases. We may need to extend that check.
>
>
Why do we need this restriction? Wouldn't a place in the join tree where
all the participating relations are present (specifically, a join qual at
such a place) serve as a place where the clause can be applied? E.g. in
the query

select * from tab1 left join tab2 t2 using (val) left join lateral (select
val from tab2 where val2 = tab1.val * t2.val) t3 using (val);

Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a
place where we are computing join between tab1, t2 and t3?

> I'm inclined to process all LATERAL-using qual clauses this way, ie
> postpone them till we recurse back up to a place where they can
> logically be evaluated. That won't make any real difference when no
> outer joins are present, but it will eliminate the ugliness that right
> now distribute_qual_to_rels is prevented from sanity-checking the scope
> of the references in a qual when LATERAL is present. If we do it like
> this, we can resurrect full enforcement of that sanity check, and then
> throw an error if any "postponed" quals are left over when we're done
> recursing.
>
>
A parameterized nested loop join would always be able to evaluate a
LATERAL query. Instead of throwing an error, why can't we choose that as
the default strategy whenever we fail to flatten the subquery?

Can we put the clause with lateral references in its appropriate place
while flattening the subquery? IMO, that would be cleaner and less work
than first pulling the clause out and then putting it back again. Right
now, we do not have that capability in pull_up_subqueries(), but given
its recursive structure, it might be easier to do it there.

> Thoughts, better ideas?
>
> regards, tom lane

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LATERAL quals revisited
Date: 2013-07-19 14:57:19
Message-ID: 14545.1374245839@sss.pgh.pa.us

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> For there to *be* a unique "appropriate outer join", we need to require
>> that a LATERAL-using qual clause that's under an outer join contain
>> lateral references only to the outer side of the nearest enclosing outer
>> join. There's no such restriction in the spec of course, but we can
>> make it so by refusing to flatten a sub-select if pulling it up would
>> result in having a clause in the outer query that violates this rule.
>> There's already some code in prepjointree.c (around line 1300) that
>> attempts to enforce this, though now that I look at it again I'm not
>> sure it's covering all the bases. We may need to extend that check.

> Why do we need this restriction? Wouldn't a place (specifically join qual
> at such a place) in join tree where all the participating relations are
> present, serve as a place where the clause can be applied.

No. If you hoist a qual that appears below an outer join to above the
outer join, you get wrong results in general: you might eliminate rows
from the outer side of the join, which a qual from within the inner side
should never be able to do.

> select * from tab1 left join tab2 t2 using (val) left join lateral (select
> val from tab2 where val2 = tab1.val * t2.val) t3 using (val);
> Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a
> place where we are computing join between tab1, t2 and t3?

This particular example doesn't violate the rule I gave above, since
both tab1 and t2 are on the left side of the join to the lateral
subquery, and the qual doesn't have to get hoisted *past* an outer join,
only to the outer join of {tab1,t2} with {t3}.

>> I'm inclined to process all LATERAL-using qual clauses this way, ie
>> postpone them till we recurse back up to a place where they can
>> logically be evaluated. That won't make any real difference when no
>> outer joins are present, but it will eliminate the ugliness that right
>> now distribute_qual_to_rels is prevented from sanity-checking the scope
>> of the references in a qual when LATERAL is present. If we do it like
>> this, we can resurrect full enforcement of that sanity check, and then
>> throw an error if any "postponed" quals are left over when we're done
>> recursing.

> Parameterized nested loop join would always be able to evaluate a LATERAL
> query. Instead of throwing error, why can't we choose that as the default
> strategy whenever we fail to flatten subquery?

I think you misunderstood. That error would only be a sanity check that
we'd accounted for all qual clauses, it's not something a user should
ever see.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-08-14 20:04:51
Message-ID: 13860.1376510691@sss.pgh.pa.us

Antonin Houska <antonin(dot)houska(at)gmail(dot)com> writes:
> On 07/04/2013 06:11 PM, Antonin Houska wrote:
>> On 07/03/2013 08:32 PM, Tom Lane wrote:
>>> Another possibility would be to keep the optimization, but disable it in
>>> queries that use LATERAL. I don't much care for that though --- seems
>>> too Rube Goldbergish, and in any case I have a lot less faith in the
>>> whole concept now than I had before I started digging into this issue.

> I constructed a query that triggers the optimization - see attachment
> with comments.

Thanks for poking at this.

> EXPLAIN shows the same plan with or without the ph_may_need
> optimization, but that might be data problem (my tables are empty).

Yeah, I didn't have much luck getting a different plan even with data in
the tables. What you'd need for this to be important would be for a join
order that's precluded without the ph_may_need logic to be significantly
better than the join orders that are still allowed. While that's
certainly within the realm of possibility, the difficulty of triggering
the case at all reinforces my feeling that this optimization isn't worth
bothering with. For the moment I'm just going to take it out.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: LATERAL quals revisited
Date: 2013-08-19 15:06:21
Message-ID: 13247.1376924781@sss.pgh.pa.us

Some time ago, I wrote:
> I've been studying the bug reported at
> http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
> ...
> After some contemplation, I think that the most practical way to fix
> this is for deconstruct_recurse and distribute_qual_to_rels to
> effectively move such a qual to the place where it logically belongs;
> that is, rather than processing it when we look at the lower WHERE
> clause, set it aside for a moment and then add it back when looking at
> the ON clause of the appropriate outer join. This should be reasonably
> easy to do by keeping a list of "postponed lateral clauses" while we're
> scanning the join tree.

Here's a draft patch for this. The comments need a bit more work
probably, but barring objection I want to push this in before this
afternoon's 9.3rc1 wrap.

regards, tom lane

Attachment Content-Type Size
postpone-lateral-quals.patch text/x-diff 28.3 KB