foreign-key inference & join removal

Lists: pgsql-hackers
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: foreign-key inference & join removal
Date: 2009-10-19 01:44:28
Message-ID: 603c8f070910181844k21195e8fv6bb4cee2c5cb1d74@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 17, 2009 at 5:14 PM, I wrote:
> I think that the next project in this area should
> be to try to support removal of INNER joins.  (Removal of SEMI, ANTI,
> and FULL joins seems unlikely ever to be interesting.)  That will
> require teaching the planner about foreign key constraints, which
> interestingly opens up some other interesting optimization
> opportunities.  An INNER join that can never match zero rows can
> alternatively be implemented as a LEFT join, which can be reordered in
> some situations where an inner join can't.  e.g. A LJ (B IJ C ON Pbc)
> ON Pab can be implemented as (A LJ B ON Pab) LJ/IJ C ON Pbc if it is
> the case that for every row in B, there will be at least one row in C
> such that Pbc holds.

A few more thoughts on this topic.

Suppose we define a new join type called "inner_or_left_join". This
means that we've proven that every outer row has at least one join
partner, so that we'll get the same results whichever way we implement
it. We can prove this for either inner joins or left joins, or for
right joins with the sides reversed, by checking that:

(1) The inner rel is a baserel with no restriction clauses.
(2) All the join clauses are merge-joinable.
(3) There is a table on the outer side of the join with a foreign key
constraint referencing the inner table, such that the columns of the
foreign key constraint and the chosen equality operators exactly match
up with the join clauses (no extra columns, no extra join clauses).
(4) All the relevant columns of the outer table are NOT NULL.

I am not quite sure how to check for #3 and #4, nor am I sure where
the best place to put this logic is.

Once this infrastructure is in place, it opens up a number of
interesting optimization opportunities.

1. Inner joins that are relabelled as inner_or_left joins can be
removed using the same logic that we already use for left joins.

2. Inner_or_left joins can be reordered more flexibly than either
inner joins or left joins. Consider:

A LJ (B IJ C ON Pbc) ON Pab

If B IJ C is inner_or_left and Pbc is strict, the joins commute.
Further, I believe that if the LJ is inner_or_left and Pab is strict,
we can commute the joins provided that, in the process, we change the
join between A and B to be an inner join (no longer inner_or_left).
But if neither join is inner_or_left then no reordering is possible.

3. Some path types are implemented only for inner joins but not for
left joins; these can also be used for inner_or_left joins. For
example, consider A LJ B ON Pab. If the join can be proven
inner_or_left, we have the option of treating it as an inner join and
using A rather than B as the inner rel.

Thoughts?

...Robert


From: Alex Brasetvik <alex(at)brasetvik(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign-key inference & join removal
Date: 2009-10-19 12:54:38
Message-ID: 5ECCC9E5-BBB8-4543-AA89-E770B0C88CB1@brasetvik.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Oct 19, 2009, at 03:44 , Robert Haas wrote:

> Suppose we define a new join type called "inner_or_left_join". This
> means that we've proven that every outer row has at least one join
> partner, so that we'll get the same results whichever way we implement
> it. We can prove this for either inner joins or left joins, or for
> right joins with the sides reversed, by checking that:
>
> (1) The inner rel is a baserel with no restriction clauses.
> (2) All the join clauses are merge-joinable.
> (3) There is a table on the outer side of the join with a foreign key
> constraint referencing the inner table, such that the columns of the
> foreign key constraint and the chosen equality operators exactly match
> up with the join clauses (no extra columns, no extra join clauses).
> (4) All the relevant columns of the outer table are NOT NULL.

While considering this, have you given any thought to the points in http://archives.postgresql.org/pgsql-hackers/2009-07/msg01555.php
?

(In short, there are other properties --- e.g. that there is *exactly*
one row in B for each in A, uniqueness is kept, etc --- you can deduce
from foreign key relationships, which is useful for more than join
ordering. The example I gave involved removing Distinct and pushing
Limit through a join)

--
Alex Brasetvik


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alex Brasetvik <alex(at)brasetvik(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign-key inference & join removal
Date: 2009-10-19 13:57:13
Message-ID: 603c8f070910190657h3aa145a5n366dda212f928841@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 19, 2009 at 8:54 AM, Alex Brasetvik <alex(at)brasetvik(dot)com> wrote:
>
> On Oct 19, 2009, at 03:44 , Robert Haas wrote:
>
>> Suppose we define a new join type called "inner_or_left_join".  This
>> means that we've proven that every outer row has at least one join
>> partner, so that we'll get the same results whichever way we implement
>> it.  We can prove this for either inner joins or left joins, or for
>> right joins with the sides reversed, by checking that:
>>
>> (1) The inner rel is a baserel with no restriction clauses.
>> (2) All the join clauses are merge-joinable.
>> (3) There is a table on the outer side of the join with a foreign key
>> constraint referencing the inner table, such that the columns of the
>> foreign key constraint and the chosen equality operators exactly match
>> up with the join clauses (no extra columns, no extra join clauses).
>> (4) All the relevant columns of the outer table are NOT NULL.
>
> While considering this, have you given any thought to the points in
> http://archives.postgresql.org/pgsql-hackers/2009-07/msg01555.php ?
>
> (In short, there are other properties --- e.g. that there is *exactly* one
> row in B for each in A, uniqueness is kept, etc --- you can deduce from
> foreign key relationships, which is useful for more than join ordering. The
> example I gave involved removing Distinct and pushing Limit through a join)

It's in the back of my mind, but I think join removal & join
reordering are the biggest wins here. Pushing a LIMIT through a join
doesn't really help by itself because, under the pull model PostgreSQL
uses, the lower nodes will only be evaluated to the extent necessary
to satisfy the LIMIT. Getting rid of DISTINCT ON could be very
useful, but I think it's probably something of a corner case, since
normally you won't bother to include DISTINCT ON in the first place if
it's not doing anything.

...Robert