Re: Performance improvement for joins where outer side is unique

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2015-02-27 11:30:00
Message-ID: CAApHDvrWrtQWt_dD9EuyGqH7VK7Nio22Ez_Uw4aPTne-UV8eaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26 February 2015 at 08:39, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> I've been looking at this patch, mostly because it seems like a great
> starting point for improving estimation for joins on multi-column FKs.
>
> Currently we do this:
>
> CREATE TABLE parent (a INT, b INT, PRIMARY KEY (a,b));
> CREATE TABLE child (a INT, b INT, FOREIGN KEY (a,b)
> REFERENCES parent (a,b));
>
> INSERT INTO parent SELECT i, i FROM generate_series(1,1000000) s(i);
> INSERT INTO child SELECT i, i FROM generate_series(1,1000000) s(i);
>
> ANALYZE;
>
> EXPLAIN SELECT * FROM parent JOIN child USING (a,b);
>
> QUERY PLAN
> ---------------------------------------------------------------------
> Hash Join (cost=33332.00..66978.01 rows=1 width=8)
> Hash Cond: ((parent.a = child.a) AND (parent.b = child.b))
> -> Seq Scan on parent (cost=0.00..14425.00 rows=1000000 width=8)
> -> Hash (cost=14425.00..14425.00 rows=1000000 width=8)
> -> Seq Scan on child (cost=0.00..14425.00 rows=1000000 width=8)
> (5 rows)
>
> Which is of course non-sense, because we know it's a join on FK, so the
> join will produce 1M rows (just like the child table).
>
> This seems like a rather natural extension of what you're doing in this
> patch, except that it only affects the optimizer and not the executor.
> Do you have any plans in this direction? If not, I'll pick this up as I
> do have that on my TODO.
>
>
Hi Tomas,

I guess similar analysis could be done on FKs as I'm doing on unique
indexes. Perhaps my patch for inner join removal can help you more with
that. You may notice that in this patch I have ended up changing the left
join removal code so that it just checks if has_unique_join is set for the
special join. Likely something similar could be done with your idea and the
inner join removals, just by adding some sort of flag on RelOptInfo to say
"join_row_exists" or some better name. Quite likely if there's any pending
foreign key triggers, then it won't matter at all for the sake of row
estimates.

Regards

David Rowley

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergey Shchukin 2015-02-27 11:42:39 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Previous Message Marc Cousin 2015-02-27 11:29:44 star schema and the optimizer