Re: Allowing NOT IN to use ANTI joins

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing NOT IN to use ANTI joins
Date: 2014-06-11 08:53:03
Message-ID: CAApHDvo2Li6ZxEvL5H-MgzJMp9wN71jK3x-133xgALYA8g-4cA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 9, 2014 at 11:20 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:

> On Sun, Jun 8, 2014 at 3:36 PM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > Currently pull_up_sublinks_qual_recurse only changes the plan for NOT
> EXISTS
> > queries and leaves NOT IN alone. The reason for this is because the
> values
> > returned by a subquery in the IN clause could have NULLs.
>
> I believe the reason why this hasn't been done yet, is that the plan
> becomes invalid when another backend modifies the nullability of the
> column. To get it to replan, you'd have to introduce a dependency on
> the "NOT NULL" constraint, but it's impossible for now because there's
> no pg_constraint entry for NOT NULLs.
>
> The only way to consistently guarantee nullability is through primary
> key constraints. Fortunately that addresses most of the use cases of
> NOT IN(), in my experience.
>
>
I tried to break this by putting a break point
in convert_ANY_sublink_to_join in session 1. Not that it really had to be
in that function, I just wanted it to stop during planning and before the
plan is executed.

-- session 1
select * from n1 where id not in(select id from n1); -- hits breakpoint in
convert_ANY_sublink_to_join

-- session 2
alter table n2 alter column id drop not null;

insert into n2 values(null);

I see that session 2 blocks in the alter table until session 1 completes.

I've not really checked out the code in detail around when the snapshot is
taken and the transaction ID is generated, but as long as the transaction
id is taken before we start planning in session 1 then it should not matter
if another session drops the constraint and inserts a NULL value as we
won't see that NULL value in our transaction... I'd assume that the
transaction has to start before it grabs the table defs that are required
for planning. Or have I got something wrong?

> See the comment in check_functional_grouping:
>
> * Currently we only check to see if the rel has a primary key that is a
> * subset of the grouping_columns. We could also use plain unique
> constraints
> * if all their columns are known not null, but there's a problem: we need
> * to be able to represent the not-null-ness as part of the constraints
> added
> * to *constraintDeps. FIXME whenever not-null constraints get represented
> * in pg_constraint.
>
>
I saw that, but I have to say I've not fully got my head around why that's
needed just yet.

> The behavior you want seems somewhat similar to
> check_functional_grouping; maybe you could unify it with your
> targetListIsGuaranteedNotToHaveNulls at some level. (PS: that's one
> ugly function name :)
>
>
Agreed :) Originally I had put the code that does that
in convert_ANY_sublink_to_join, but at the last minute before posting the
patch I decided that it might be useful and reusable so moved it out to
that function. I'll try and think of something better, but I'm open to
ideas.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2014-06-11 09:03:21 Re: Allowing NOT IN to use ANTI joins
Previous Message Amit Kapila 2014-06-11 05:52:31 Re: [bug fix] Memory leak in dblink