Re: Patch to support SEMI and ANTI join removal

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-10-15 10:03:01
Message-ID: CAApHDvqAigcNnp4D5LbBaZgH_se=6jXWQXxYrQx8bXEukr3G6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 9, 2014 at 12:40 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:

> On 2014-10-09 00:21:44 +1300, David Rowley wrote:
> > Ok, so I've been hacking away at this for a couple of evenings and I
> think
> > I have a working prototype finally!
>
> Cool!
>
>
Patch attached.

> > So it seems it's not quite as efficient as join removal at planning time,
> > but still a big win when it's possible to perform the join skipping.
>
> Have you checked where the overhead is? Is it really just the additional
> node that the tuples are passed through?
>
>
I've not checked this yet, but I'd assume that it has to be from the extra
node. I'll run some profiles soon.

> Have you measured the overhead of the plan/execution time checks over
> master?
>

I did a bit of benchmarking last night, but this was mostly for testing
that I've not added too much overhead on the nest loop code.
For the merge and hashjoin code I've managed to keep the special skipping
code in the EXEC_MJ_INITIALIZE_OUTER and HJ_BUILD_HASHTABLE part of the
main switch statement, so the extra checks should only be performed on the
first call of the node when skips are not possible. For nested loop I can't
see any other way but to pay the small price of setting the skipflags and
checking if there are any skip flags on every call to the node.

I tested the overhead of this on my laptop by creating 2 tables with 1
million rows each, joining them on an INT column, where each value of the
int column was unique. I seem to have added about a 2% overhead to this. :(
Which I was quite surprised at, giving it's just 1000001 million extra
settings if the skipflags and checking that the skip flags are not empty,
but perhaps the extra local variable is causing something else to not make
it into a register. At the moment I can't quite see another way to do it,
but I guess it may not be the end of the world as the chances of having to
perform a nest loop join on 2 tables of that size is probably not that
high.

Test case:
create table t3 (id int primary key);
create table t2 (id int primary key, t3_id int not null references t3);
create table t1 (id int primary key, t2_id int not null references t2);
insert into t3 select x.x from generate_series(1,1000000) x(x);
insert into t2 select x.x,x.x from generate_series(1,1000000) x(x);
insert into t1 select x.x,x.x from generate_series(1,1000000) x(x);
vacuum;
set enable_hashjoin = off;
set enable_mergejoin = off;

select count(*) from t1 inner join t2 on t1.id=t2.id;

Unpatched:
duration: 120 s
number of transactions actually processed: 45
latency average: 2666.667 ms
tps = 0.371901 (including connections establishing)
tps = 0.371965 (excluding connections establishing)

Patched:
Master
duration: 120 s
number of transactions actually processed: 44
latency average: 2727.273 ms
tps = 0.363933 (including connections establishing)
tps = 0.363987 (excluding connections establishing)

102.19%

Of course if we do the join on the column that has the foreign key, then
this is much faster.

test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id;
count
---------
1000000
(1 row)

Time: 105.206 ms

The explain analyze from the above query looks like:
test=# explain (analyze, costs off, timing off) select count(*) from t1
inner join t2 on t1.t2_id=t2.id;
QUERY PLAN
------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
-> Nested Loop (actual rows=1000000 loops=1)
-> Seq Scan on t1 (actual rows=1000000 loops=1)
-> Index Only Scan using t2_pkey on t2 (never executed)
Index Cond: (id = t1.t2_id)
Heap Fetches: 0
Execution time: 124.990 ms
(7 rows)

As you can see the scan on t2 never occurred.

I've so far only managed to come up with 1 useful regression test for this
new code. It's not possible to tell if the removal has taken place at plan
time, as the plan looks the same as if it didn't get removed. The only way
to tell us from the explain analyze, but the problem is that the execution
time is shown and can't be removed. Perhaps I should modify the explain to
tag join nodes that the planner has marked to say skipping may be possible?
But this is not really ideal as it's only the join nodes that know about
skipping and they just don't bother executing the child nodes, so it's
really up to the executor to decide which child nodes don't get called, so
to add something to explain might require making it more smart than it
needs to be.

Right now I'm not quite sure if I should modify any costs. Quite possibly
hash joins could have the costing reduced a little to try to encourage
hashjoins over merge joins, as with merge joins we can't skip sort
operations, but with hash joins we can skip the hash table build.

Regards

David Rowley

Attachment Content-Type Size
inner_join_removals_2014-10-15_0f3f1ea.patch application/octet-stream 52.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-10-15 11:00:10 Locking for Rename To new_name works differently for different objects
Previous Message Ali Akbar 2014-10-15 09:57:01 Re: proposal: plpgsql - Assert statement