Re: Patch to support SEMI and ANTI join removal

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-11-15 21:09:18
Message-ID: CA+U5nM+S5uwN_xG5U4e049J3JB6ZhEhSm143f1SLreEnvunmAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15 October 2014 11:03, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> The explain analyze from the above query looks like:
> test=# explain (analyze, costs off, timing off) select count(*) from t1
> inner join t2 on t1.t2_id=t2.id;
> QUERY PLAN
> ------------------------------------------------------------------
> Aggregate (actual rows=1 loops=1)
> -> Nested Loop (actual rows=1000000 loops=1)
> -> Seq Scan on t1 (actual rows=1000000 loops=1)
> -> Index Only Scan using t2_pkey on t2 (never executed)
> Index Cond: (id = t1.t2_id)
> Heap Fetches: 0
> Execution time: 124.990 ms
> (7 rows)
>
> As you can see the scan on t2 never occurred.

Very good, happy to see this happening (yay FKs!) and with
PostgreSQL-style rigour.

I've reviewed the patch from cold and I have a few comments.

The plan you end up with here works quite differently from left outer
join removal, where the join is simply absent. That inconsistency
causes most of the other problems I see.

I propose that we keep track of whether there are any potentially
skippable joins at the top of the plan. When we begin execution we do
a single if test to see if there is run-time work to do. If we pass
the run-time tests we then descend the tree and prune the plan to
completely remove unnecessary nodes. We end with an EXPLAIN and
EXPLAIN ANALYZE that looks like this

> QUERY PLAN
> ------------------------------------------------------------------
> Aggregate (actual rows=1 loops=1)
> -> Seq Scan on t1 (actual rows=1000000 loops=1)

Doing that removes all the overheads and complexity; it also matches
how join removal currently works.

The alternative is accepting some pretty horrible additional code in
most join types, plus a small regression on nested loop joins which I
would have to call out as regrettably unacceptable. (Horrible in this
sense that we don't want that code, not that David's code is poor).

The tests on the patch are pretty poor. If we should use EXPLAINs to
show a join removal that works and a join removal that fails. With a
few of the main permutations.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-11-15 21:36:06 Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Previous Message Fabrízio de Royes Mello 2014-11-15 19:11:16 Re: [GSoC2014] Patch ALTER TABLE ... SET LOGGED