Re: Allowing NOT IN to use ANTI joins

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing NOT IN to use ANTI joins
Date: 2014-07-15 11:58:30
Message-ID: CAApHDvqRB-iFBy68=dCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 14, 2014 at 8:55 PM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

>
> Since the patch has not been marked as rejected I was thinking that I'd
> take a bash at fixing it up, but if you think this is a waste of time,
> please let me know.
>
>
>
I've made some changes to the patch so that it only allows the conversion
to ANTI JOIN to take place if both the outer query's expressions AND the
subquery's target list can be proved not to have NULLs.

I've attached a delta, which is the changes I've made on top of Tom's
cleaned up version of my patch, and also a full patch.

I've also performed some benchmarks to try to determine how much time it
takes to execute this null checking code. I ended up hacking the code a
little for the benchmarks and just put the null checking function in a
tight loop that performed 100000 iterations.

Like:
if (under_not)
{
int x;
bool result;
for (x = 0; x < 100000; x++)
{
result = is_NOTANY_compatible_with_antijoin(parse, sublink);
}
if (!result)
return NULL;
}

I then ran 6 queries, 3 times each through the planner and grabbed the
"Planning Time" from the explain analyze result.
I then removed the extra looping code (seen above) and compiled the code as
it is with the attached patch.
I then ran each of the 6 queries again 3 times each and noted down the
"Planning Time from the explain analyze result.

In my results I assumed that the first set of times divided by 100000
would be the time taken to perform the NULL checks... This is not quite
accurate, but all the other planning work was quite well drowned out by the
100k loop.

I found that the call to is_NOTANY_compatible_with_antijoin adds about 0.2%
and 2.3% to total planning time. Though the 2.3% was quite an extreme case,
and the 0.2% was the most simple case I could think of.

I've attached the complete results in html format. I've also attached the
schema that I used and all 6 queries tested.

Here's 2 points which I think are important to note about the planning time
overhead of this patch:
1. There is no additional overhead if the query has no NOT IN clause.
2. The test queries 3 and 6 were to benchmark the overhead of when the NOT
NULL test fails. The slowest of these was test 3 which added just under
0.5% to the planning time. The query that added a 2.3% overhead performed
an ANTI JOIN, so likely the reduction in execution time more than made up
for the extra planning time.

Regards

David Rowley

Attachment Content-Type Size
not_in_benchmark_schema.sql text/plain 1.7 KB
NOTIN_Planner_Benchmark.htm text/html 10.5 KB
not_in_anti_join_v0.9.delta.patch application/octet-stream 14.5 KB
not_in_anti_join_v0.9.patch application/octet-stream 37.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-07-15 12:14:27 Re: Removing dependency to wsock32.lib when compiling code on WIndows
Previous Message Magnus Hagander 2014-07-15 11:46:27 Re: Removing dependency to wsock32.lib when compiling code on WIndows