Re: Removing INNER JOINs

From: Mart Kelder <mart(at)kelder31(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Removing INNER JOINs
Date: 2014-11-30 10:19:28
Message-ID: m5eqvh$j92$1@ger.gmane.org
Lists: pgsql-hackers

Hi David (and others),

David Rowley wrote:
> Hi,
>
> Starting a new thread which continues on from
> http://www.postgresql.org/message-id/CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com
>
> To give a brief summary for any new readers:
>
> The attached patch allows for INNER JOINed relations to be removed from
> the plan, providing none of the columns are used for anything, and a
> foreign key exists which proves that a record must exist in the table
> being removed which matches the join condition:
>
> I'm looking for a bit of feedback around the method I'm using to prune the
> redundant plan nodes out of the plan tree at executor startup.
> Particularly around not stripping the Sort nodes out from below a merge
> join, even if the sort order is no longer required due to the merge join
> node being removed. This potentially could leave the plan suboptimal when
> compared to a plan that the planner could generate when the removed
> relation was never asked for in the first place.

I have read this patch (and the previous patch about removing SEMI-joins)
with great interest. I don't know the code well enough to say much about the
patch itself, but I hope to have some useful ideas about the overall
process.

I think performance can be greatly improved if the planner is able to use
information based on the current data. These patches are just two examples
of where assumptions made during planning are useful, and there are more
opportunities for this kind of assumption (for example, unique constraints
or empty tables).

> There are some more details around the reasons behind doing this weird
> executor startup plan pruning around here:
>
> http://www.postgresql.org/message-id/20141006145957.GA20577@awork2.anarazel.de

The problem here is that assumptions made during planning might not hold
during execution. That is why you placed the final decision about removing a
join in the executor.

When a plan is made, you know which assumptions the final plan relies on.
In this case, the assumption is that a foreign key is still valid. In
general there are many more assumptions, such as an index or a column still
existing. There are also soft assumptions, such as the statistics that were
used still being reasonable.

My suggestion is to check the assumptions at executor startup. If they all
still hold, you can simply execute the plan as it is.

If one or more assumptions don't hold, there are a couple of things you
might do:
* Make a new plan. The new plan is certain to match all conditions, because
by that time a snapshot has already been taken.
* Check the assumption. This can be a costly operation with no guarantee of
success.
* Change the existing plan so that it no longer relies on the failed
assumption.
* Use an alternate plan that was already stored (generated during the
initial planning).
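To make the flow concrete, here is a minimal Python sketch, not PostgreSQL code. The names Assumption, Plan and choose_plan_at_executor_start are hypothetical, and each assumption is modelled as a callable that re-checks it at executor startup. It only covers two of the options above: falling back to an alternate plan stored at planning time, and otherwise going back to the planner.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Assumption:
    """A condition the planner relied on (hypothetical model)."""
    description: str
    holds: Callable[[], bool]  # re-evaluated at executor startup


@dataclass
class Plan:
    name: str
    assumptions: List[Assumption] = field(default_factory=list)

    def valid(self) -> bool:
        # A plan with no assumptions is always valid (all([]) is True).
        return all(a.holds() for a in self.assumptions)


def choose_plan_at_executor_start(
    plan: Plan,
    alternatives: List[Plan],
    replan: Callable[[], Plan],
) -> Plan:
    # Fast path: every assumption still holds, so run the plan unchanged.
    if plan.valid():
        return plan
    # Otherwise use an alternate plan stored during the initial planning.
    for alt in alternatives:
        if alt.valid():
            return alt
    # Last resort: go back to the planner.
    return replan()


# Hypothetical example: a join-removed plan that relies on a foreign key.
catalog = {"orders_customer_fk_valid": False}
fk = Assumption("FK orders.customer_id -> customers.id is valid",
                lambda: catalog["orders_customer_fk_valid"])
join_removed = Plan("Seq Scan on orders", [fk])
keeps_join = Plan("Hash Join orders/customers")

chosen = choose_plan_at_executor_start(
    join_removed, [keeps_join], lambda: Plan("fresh plan"))
print(chosen.name)  # Hash Join orders/customers
```

Storing the join-keeping alternative makes the fallback cheap, at the price of planning it up front even when it is rarely needed.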

You currently change the plan in executor code. I suggest going back to the
planner if the assumption doesn't hold. The planner can then decide how to
change the plan, or conclude that a full replan is warranted.

If the planner knows that it will need to replan when an assumption does
not hold during execution, then the cost of replanning multiplied by the
probability that the assumption fails during execution should be part of the
decision to deliver a plan that relies on that assumption in the first place.
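One way to write that decision down, as a sketch under my own simplifying assumptions (the function names and parameters are hypothetical, not planner code): let p_fail be the probability that the assumption fails, cost_assumed the cost of the plan relying on it, cost_safe the cost of a plan that does not, and cost_replan the cost of replanning. If the assumption fails, we pay for replanning and then roughly cost_safe for the plan we end up running.

```python
def expected_cost_with_assumption(cost_assumed: float, cost_safe: float,
                                  cost_replan: float, p_fail: float) -> float:
    # With probability (1 - p_fail) the assumption holds and the cheaper plan runs.
    # With probability p_fail we replan and then run a plan that does not rely
    # on the assumption (approximated here by cost_safe).
    return (1 - p_fail) * cost_assumed + p_fail * (cost_replan + cost_safe)


def prefer_assumption_plan(cost_assumed: float, cost_safe: float,
                           cost_replan: float, p_fail: float) -> bool:
    # Deliver the plan with the assumption only if its expected cost,
    # including the replanning penalty, beats the plan without it.
    return expected_cost_with_assumption(
        cost_assumed, cost_safe, cost_replan, p_fail) < cost_safe


# Made-up numbers: a rarely violated FK makes the join-removed plan worthwhile.
print(prefer_assumption_plan(100, 500, 50, 0.01))  # True  (expected 104.5)
print(prefer_assumption_plan(100, 500, 50, 0.9))   # False (expected ~505)
```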

> There are also other cases such as MergeJoins performing btree index scans
> in order to obtain ordered results for a MergeJoin that would be better
> executed as a SeqScan when the MergeJoin can be removed.
>
> Perhaps some costs could be adjusted at planning time when there's a
> possibility that joins could be removed at execution time, although I'm
> not quite sure about this as it risks generating a poor plan in the case
> when the joins cannot be removed.

Maybe this is a case where you are better off replanning when the assumption
doesn't hold, rather than changing the generated execution plan. In that case
you can remove the join before the path is made.

> Comments are most welcome
>
> Regards
>
> David Rowley

Regards,

Mart
