Re: Allowing NOT IN to use ANTI joins

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing NOT IN to use ANTI joins
Date: 2014-06-09 21:28:51
Message-ID: CAMkU=1zPVbez_HWao781L8PzFk+d1J8VaJuhyjUHaRifk6OcUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 8, 2014 at 5:36 AM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT
> EXISTS queries and leaves NOT IN alone. The reason for this is because the
> values returned by a subquery in the IN clause could have NULLs.
>
> A simple example of this (without a subquery) is:
>
> select 1 where 3 not in (1, 2, null); returns 0 rows because 3 <> NULL is
> unknown.
>
> The attached patch allows an ANTI-join plan to be generated in cases like:
>
> CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL);
> CREATE TABLE b (id INT NOT NULL);
>
> SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b);
>
> To generate a plan like:
> QUERY PLAN
> -----------------------------------------------------------------
> Hash Anti Join (cost=64.00..137.13 rows=1070 width=8)
> Hash Cond: (a.b_id = b.id)
> -> Seq Scan on a (cost=0.00..31.40 rows=2140 width=8)
> -> Hash (cost=34.00..34.00 rows=2400 width=4)
> -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4)
>

I think this will be great, I've run into this problem often from
applications I have no control over. I thought a more complete, but
probably much harder, solution would be to add some metadata to the hash
anti-join infrastructure that tells it "If you find any nulls in the outer
scan, stop running without returning any rows". I think that should work
because the outer rel already has to run completely before any rows can be
returned.

But what I can't figure out is, would that change obviate the need for your
change? Once we can correctly deal with nulls in a NOT IN list through a
hash anti join, is there a cost estimation advantage to being able to prove
that the that null can't occur? (And of course if you have code that
works, while I have vague notions of what might be, then my notion probably
does not block your code.)

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-06-09 22:50:39 Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Previous Message Jeff Janes 2014-06-09 21:15:24 Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby