Re: Patch to support SEMI and ANTI join removal

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-09-30 12:03:35
Message-ID: CAApHDvo6XmnevLOA2BymDrRKwYsAnrZXWn_ERZ-nB_kFViGMQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 1, 2014 at 12:01 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:

> On 2014-09-30 23:25:45 +1300, David Rowley wrote:
> >
> > I've not quite gotten my head around how we might stop the unneeded
> > relation from being the primary path to join the other inner relations,
> > i.e. what would stop the planner making a plan that hashed all the other
> > relations and planned to perform a sequence scan on the relation that we
> > have no need to scan (because the foreign key tells us the join is
> > pointless). If we were not use use that relation then we'd just have a
> > bunch of hash tables with no path to join them up. If we did anything to
> > force the planner into creating a plan that would suit skipping
> relations,
> > then we could possibly be throwing away the optimal plan..... Right?
>
> I'm not 100% sure I understand your problem description, but let me
> describe how I think this would work. During planning, you'd emit the
> exactly same plan as you'd today, with two exceptions:
> a) When costing a node where one side of a join is very likely to be
> removable, you'd cost it nearly as if there wasn't a join.
>

Ok given the tables:
create table t1 (x int primary key);
create table t2 (y int primary key);

suppose the planner came up with something like:

test=# explain (costs off) select t2.* from t1 inner join t2 on t1.x=t2.y;
QUERY PLAN
----------------------------
Hash Join
Hash Cond: (t1.x = t2.y)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2

If we had a foreign key...

alter table t2 add constraint t2_y_fkey foreign key (y) references t1 (x);

...the join to t1 could possibly be "ignored" by the executor... but
there's a problem as the plan states we're going to seqscan then hash that
relation, then seqscan t1 with a hash lookup on each of t1's rows. In this
case how would the executor skip the scan on t1? I can see how this might
work if it was t2 that we were removing, as we'd just skip the hash lookup
part in the hash join node.

> b) The planner would attach some sort of 'one time join qual' to the
> 'likely removable' join nodes. If, during executor init, that qual
> returns false, simply don't perform the join. Just check the inner
> relation, but entirely skip the outer relation.
>
>
I think in the example that I've given above that the relation we'd want to
keep is the outer relation, we'd want to throw away the inner one, but I
can't quite see how that would work.

Hopefully that makes sense.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-09-30 12:34:57 Re: Patch to support SEMI and ANTI join removal
Previous Message Heikki Linnakangas 2014-09-30 11:56:24 Re: open items for 9.4