Re: Allowing join removals for more join types

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
Subject: Re: Allowing join removals for more join types
Date: 2014-05-23 21:09:26
Message-ID: CAApHDvoP=gHp8ekgAtdPeU7i0H7-XMdq7gX-Rd7mZdn-w3WvjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 24, 2014 at 3:13 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > I've just had a bit of a look at implementing checks allowing subqueries
> > with unique indexes on the join cols being removed,
>
> I'm a bit confused by this statement of the problem. I thought the idea
> was to recognize that subqueries with DISTINCT or GROUP BY clauses produce
> known-unique output column(s), which permits join removal in the same way
> that unique indexes on a base table allow us to deduce that certain
> columns are known-unique and hence can offer no more than one match for
> a join. That makes it primarily a syntactic check, which you can perform
> despite the fact that the subquery hasn't been planned yet (since the
> parser has done sufficient analysis to determine the semantics of
> DISTINCT/GROUP BY).
>
>
Up thread a little Dilip was talking about in addition to checking that if
the sub query could be proved to be unique on the join condition using
DISTINCT/GROUP BY, we might also check unique indexes in the subquery to
see if they could prove the query is unique on the join condition.

For example a query such as:

SELECT a.* FROM a LEFT JOIN (SELECT b.* FROM b LIMIT 1) b ON a.column =
b.colwithuniqueidx

The presence of the LIMIT would be enough to stop the subquery being pulled
up, but there'd be no reason to why the join couldn't be removed.

I think the use case for this is likely a bit more narrow than the GROUP
BY/DISTINCT case, so I'm planning on using the time on looking into more
common cases such as INNER JOINs where we can prove the existence of the
row using a foreign key.

> Drilling down into the subquery is a whole different matter. For one
> thing, there's no point in targeting cases in which the subquery would be
> eligible to be flattened into the parent query, and your proposed list of
> restrictions seems to eliminate most cases in which it couldn't be
> flattened. For another, you don't have access to any planning results for
> the subquery yet, which is the immediate problem you're complaining of.
> Duplicating the work of looking up a relation's indexes seems like a
> pretty high price to pay for whatever improvement you might get here.
>
>
I agree that there are not many cases left to remove the join that remain
after is_simple_subquery() has decided not to pullup the subquery. Some of
the perhaps more common cases would be having windowing functions in the
subquery as this is what you need to do if you want to include the results
of a windowing function from within the where clause. Another case, though
I can't imagine it would be common, is ORDER BY in the subquery... But for
that one I can't quite understand why is_simple_subquery() stops that being
flattened in the first place.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2014-05-24 03:53:56 SQL access to database attributes
Previous Message Robert Haas 2014-05-23 20:18:54 Re: SKIP LOCKED DATA (work in progress)