Re: Allowing NOT IN to use ANTI joins

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing NOT IN to use ANTI joins
Date: 2014-07-05 11:33:46
Message-ID: CAApHDvoLkDtDS+AD28UzXDpYu3+_QRdcKyMj1mtjNRWk6BaLrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 2, 2014 at 9:25 PM, Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

>
> On Sun, Jun 29, 2014 at 4:18 PM, David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
>
>> I think I'm finally ready for a review again, so I'll update the
>> commitfest app.
>>
>>
> I have reviewed this on code level.
>
> 1. Patch gets applied cleanly.
> 2. make/make install/make check all are fine
>
> No issues found till now.
>
> However some cosmetic points:
>
> 1.
> * The API of this function is identical to convert_ANY_sublink_to_join's,
> * except that we also support the case where the caller has found NOT
> EXISTS,
> * so we need an additional input parameter "under_not".
>
> Since now, we do support NOT IN handling in convert_ANY_sublink_to_join()
> we
> do have "under_not" parameter there too. So above comments near
> convert_EXISTS_sublink_to_join() function is NO more valid.
>
>
Nice catch. I've removed the last 2 lines from that comment now.

> 2. Following is the unnecessary change. Can be removed:
>
> @@ -506,6 +560,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node
> *node,
> return NULL;
> }
> }
> +
> }
> /* Else return it unmodified */
> return node;
>
>
Removed

>
>
3. Typos:
>
> a.
> + * queryTargetListListCanHaveNulls
> ...
> +queryTargetListCanHaveNulls(Query *query)
>
> Function name is queryTargetListCanHaveNulls() not
> queryTargetListListCanHaveNulls()
>
>
Fixed.

> b.
> * For example the presense of; col IS NOT NULL, or col = 42 would
> allow
>
> presense => presence
>
>
>
Fixed.

> 4. get_attnotnull() throws an error for invalid relid/attnum.
> But I see other get_att* functions which do not throw an error rather
> returning some invalid value, eg. NULL in case of attname, InvalidOid in
> case of atttype and InvalidAttrNumber in case of attnum. I have observed
> that we cannot return such invalid value for type boolean and I guess
> that's
> the reason you are throwing an error. But somehow looks unusual. You had
> put
> a note there which is good. But yes, caller of this function should be
> careful enough and should not assume its working like other get_att*()
> functions.
> However for this patch, I would simply accept this solution. But I wonder
> if
> someone thinks other way round.
>
>
hmmm, I remember thinking about that at the time. It was a choice between
returning false or throwing an error. I decided that it probably should be
an error, as that's what get_relid_attribute_name() was doing. Just search
lsyscache.c for "cache lookup failed for attribute %d of relation %u".

>
> Testing more on SQL level.
>
>
Thank you for reviewing this. I think in the attached I've managed to nail
down the logic in find_inner_rels(). It was actually more simple than I
thought. On working on this today I also noticed that RIGHT joins can still
exist at this stage of planning and also that full joins are not
transformed. e.g: a FULL JOIN b ON a.id=b.id WHERE a.is IS NOT NULL would
later become a LEFT JOIN, but at this stage in planning, it's still a FULL
JOIN. I've added some new regression tests which test some of these join
types.

With further testing I noticed that the patch was not allowing ANTI joins
in cases like this:

explain select * from a where id not in(select x from b natural join c);

even if b.x and b.c were NOT NULL columns. This is because the TargetEntry
for x has a varno which belongs to neither b or c, it's actually the varno
for the join itself. I've added some code to detect this in
find_inner_rels(), but I'm not quite convinced yet that it's in the correct
place... I'm wondering if a future use for find_inner_rels() would need to
only see relations rather than join varnos. The code I added does get the
above query using ANTI joins, but it does have a bit of a weird quirk, in
that for it to perform an ANTI JOIN, b.x must be NOT NULL. If c.x is NOT
NULL and b.x is nullable, then there will be no ANTI JOIN. There must be
some code somewhere that chooses which of the 2 vars that should go into
the target list in for naturals joins.

The above got me thinking that the join conditions can also be used to
prove not nullness too. Take the query above as an example, x can never
actually be NULL, the condition b.x = c.x ensures that. I've only gone as
far as adding some comments which explain that this is a possible future
optimisation. I've not had time to look at this yet and I'd rather see the
patch go in without it than risk delaying this until the next commitfest...
Unless of course someone thinks it's just too weird a quirk to have in the
code.

Attached is the updated version of the patch.

Regards

David Rowley

However, assigning it to author to think on above cosmetic issues.
>
> Thanks
>
> --
> Jeevan B Chalke
> Principal Software Engineer, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
>

Attachment Content-Type Size
not_in_anti_join_5257082_2014-07-05.patch application/octet-stream 28.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message MauMau 2014-07-05 12:17:34 Re: Proposing pg_hibernate
Previous Message Amit Kapila 2014-07-05 05:52:49 Re: postgresql.auto.conf and reload