Allowing NOT IN to use ANTI joins

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Allowing NOT IN to use ANTI joins
Date: 2014-06-08 12:36:30
Message-ID: CAApHDvpDu4FpLAex8LknivMsyiN0btxetcD6s2O4btAwqpRaeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Currently pull_up_sublinks_qual_recurse only changes the plan for NOT
EXISTS queries and leaves NOT IN alone. The reason for this is because the
values returned by a subquery in the IN clause could have NULLs.

A simple example of this (without a subquery) is:

select 1 where 3 not in (1, 2, null); returns 0 rows because 3 <> NULL is
unknown.

The attached patch allows an ANTI-join plan to be generated in cases like:

CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL);
CREATE TABLE b (id INT NOT NULL);

SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b);

To generate a plan like:
QUERY PLAN
-----------------------------------------------------------------
Hash Anti Join (cost=64.00..137.13 rows=1070 width=8)
Hash Cond: (a.b_id = b.id)
-> Seq Scan on a (cost=0.00..31.40 rows=2140 width=8)
-> Hash (cost=34.00..34.00 rows=2400 width=4)
-> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4)

But if we then do:
ALTER TABLE b ALTER COLUMN id DROP NOT NULL;

The plan will go back to the current behaviour of:

QUERY PLAN
-------------------------------------------------------------
Seq Scan on a (cost=40.00..76.75 rows=1070 width=8)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4)

Comments are welcome

Regards

David Rowley

Attachment Content-Type Size
not_in_anti_join_v0.4.patch application/octet-stream 12.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2014-06-08 13:51:45 Re: Scaling shared buffer eviction
Previous Message Amit Kapila 2014-06-08 07:24:28 Re: Proposing pg_hibernate