Re: Allowing join removals for more join types

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
Subject: Re: Allowing join removals for more join types
Date: 2014-06-26 08:46:30
Message-ID: CAApHDvp8ibrMjo-f6E3PycrV6-SS8f6FMWJ+vwKW1WX0O+Y8Fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 22, 2014 at 11:51 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 17 June 2014 11:04, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > On Wed, Jun 4, 2014 at 12:50 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >>
> >> As a point of procedure, I recommend separating the semijoin support
> into
> >> its
> >> own patch. Your patch is already not small; delaying non-essential
> parts
> >> will
> >> make the essential parts more accessible to reviewers.
> >>
> >
> > In the attached patch I've removed all the SEMI and ANTI join removal
> code
> > and left only support for LEFT JOIN removal of sub-queries that can be
> > proved to be unique on the join condition by looking at the GROUP BY and
> > DISTINCT clause.
>
> Good advice, we can come back for the others later.
>
>
> > Example:
> >
> > SELECT t1.* FROM t1 LEFT OUTER JOIN (SELECT value,COUNT(*) FROM t2 GROUP
> BY
> > value) t2 ON t1.id = t2.value;
>
> Looks good on initial look.
>
> This gets optimized...
>
> EXPLAIN (COSTS OFF)
> SELECT a.id FROM a
> LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
> GROUP BY b.id) b ON a.id = b.id AND b.dummy = 1;
>
> does it work with transitive closure like this..
>
> EXPLAIN (COSTS OFF)
> SELECT a.id FROM a
> LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
> GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1;
>
> i.e. c.id is not in the join, but we know from subselect that c.id =
> b.id and b.id is in the join
>
>
Well, there's no code that looks at equivalence of the columns in the
query, but I'm not quite sure if there would have to be or not as I can't
quite think of a way to write that query in a valid way that would cause it
not to remove the join.

The example query will fail with: ERROR: column "b.id" must appear in the
GROUP BY clause or be used in an aggregate function

And if we rewrite it to use c.id in the target list

EXPLAIN (COSTS OFF)
SELECT a.id FROM a
LEFT JOIN (SELECT c.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1;

With this one c.id becomes b.id, since we've given the subquery the alias
'b', so I don't think there's case here to optimise anything else.

Regards

David Rowley

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2014-06-26 08:49:39 Re: pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic
Previous Message Anastasia Lubennikova 2014-06-26 08:37:08 Changes in amcanreturn() interface to support multicolumn indexes