Re: Allowing join removals for more join types

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing join removals for more join types
Date: 2014-05-19 06:45:20
Message-ID: CAApHDvp6j4Umgw7eywKbBwcR+1s=r+df6EoukSPQ63NhFJzBkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 19, 2014 at 5:47 PM, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com> wrote:

> On 18 May 2014 16:38 David Rowley Wrote
>
>
>
> Sound like a good idea to me..
>
>
>
> I have one doubt regarding the implementation, consider the below query
>
>
>
> Create table t1 (a int, b int);
>
> Create table t2 (a int, b int);
>
>
>
> Create unique index on t2(b);
>
>
>
> select x.a from *t1 x* left join (select *distinct t2.a a1*, *t2.b b1*from t2) as y on x.a=y.b1; (*because
> of distinct clause subquery will not be pulled up*)
>
>
>
> In this case, Distinct clause is used on *t2.a, *but* t2.b *is used for
> left Join (t2.b have unique index so this left join can be removed).
>
>
>
> So I think now when you are considering this join removal for subqueries
> then this can consider other case also like unique index inside subquery,
>
> because in attached patch unique index is considered only if its
> RTE_RELATION
>
>
>
> + if (innerrel->rtekind == RTE_RELATION &&
>
> + relation_has_unique_index_for(root, innerrel,
> clause_list, NIL, NIL))
>
> return true;
>
>
>
>
>
> Correct me if I am missing something..
>
>
I think you are right here, it would be correct to remove that join, but I
also think that the query in question could be quite easily be written as:

select t1.a from t1 left join t2 on t1.a=t2.b;

Where the join WILL be removed. The distinct clause here technically is a
no-op due to all the columns of a unique index being present in the clause.
Can you think of a use case for this where the sub query couldn't have been
written out as a direct join to the relation?

What would be the reason to make it a sub query with the distinct? or have
I gotten something wrong here?

I'm also thinking here that if we made the join removal code remove these
joins, then the join removal code would end up smarter than the rest of the
code as the current code seems not to remove the distinct clause for single
table queries where a subset of the columns of a distinct clause match all
the columns of a unique index.

create table pktest (id int primary key);
explain (costs off) select distinct id from pktest;
QUERY PLAN
--------------------------
HashAggregate
Group Key: id
-> Seq Scan on pktest

This could have been rewritten to become: select id from pktest

I feel if we did that sort of optimisation to the join removals, then I'd
guess we'd better put it into other parts of the code too, perhaps
something that could do this should be in the re-writer then once the join
removal code gets to it, the join could be removed.

Can you think of a similar example where the subquery could not have been
written as a direct join to the relation?

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2014-05-19 08:12:32 Re: Re: [GENERAL] Is it typo in pg_stat_replication column name in PG 9.4 ?
Previous Message Dilip kumar 2014-05-19 05:47:10 Re: Allowing join removals for more join types