Re: IS NOT NULL and LEFT JOIN

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Laurent Martelli <laurent(dot)martelli(at)enercoop(dot)org>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: IS NOT NULL and LEFT JOIN
Date: 2014-10-19 08:41:57
Message-ID: CAApHDvoqMm-hdoktQ+9ErGZ3rkF6=vgeHTJ-M6OeQepr1XLDoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Oct 19, 2014 at 5:10 PM, Laurent Martelli <
laurent(dot)martelli(at)enercoop(dot)org> wrote:

> Hello there,
>
> I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.
>
> I grant you that the query can be written without the JOIN on
> user_user_info,
> but it is generated like this by hibernate. Just changing the IS NOT NULL
> condition
> to the other side of useless JOIN makes a big difference in the query plan
> :
>
> -- THE BAD ONE : given the selectivity on c.name and c.email, barely more
> than one row will ever be returned
>

But it looks like you're ignoring the fact that the OR condition would
force the query to match not only the user and the email, but also any row
that finds a match in the user_user_info table, which going by the
planner's estimates, that's every row in the contract_contract table. This
is why the planner chooses a seqscan on the contract_contract table instead
of using the index on lower(name).

Is it really your intention to get all rows that find a this martelli
contract that has this email, and along with that, get every contract that
has a not null user_info record?

I see that you have a foreign key on c.user_info to reference the user, so
this should be matching everything with a non null user_info record.

explain analyze select c.*
> from contact_contact c
> left outer join user_user_info u on c.user_info=u.id
> left outer join contact_address a on c.address=a.id
> where lower(c.name)='martelli'
> and c.email='dsfze(at)ezrfz(dot)com' or u.id is not null;
>
QUERY
> PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------
> Hash Left Join (cost=1.83..2246.76 rows=59412 width=4012) (actual
> time=53.645..53.645 rows=0 loops=1)
> Hash Cond: (c.user_info = u.id)
> Filter: (((lower((c.name)::text) = 'martelli'::text) AND
> ((c.email)::text = 'dsfze(at)ezrfz(dot)com'::text)) OR (u.id IS NOT NULL))
> Rows Removed by Filter: 58247
> -> Seq Scan on contact_contact c (cost=0.00..2022.12 rows=59412
> width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
> -> Hash (cost=1.37..1.37 rows=37 width=8) (actual time=0.029..0.029
> rows=37 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 2kB
> -> Seq Scan on user_user_info u (cost=0.00..1.37 rows=37
> width=8) (actual time=0.004..0.015 rows=37 loops=1)
> Planning time: 0.790 ms
> Execution time: 53.712 ms
>
> -- THE GOOD ONE (test IS NOT NULL on contact0_.user_info instead of
> userinfo1_.id)
> explain analyze select c.*
> from contact_contact c
> left outer join user_user_info u on c.user_info=u.id
> left outer join contact_address a on c.address=a.id
> where lower(c.name)='martelli'
> and c.email='dsfze(at)ezrfz(dot)com' or c.user_info is not null;
> QUERY
> PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on contact_contact c (cost=8.60..16.41 rows=1
> width=4012) (actual time=0.037..0.037 rows=0 loops=1)
> Recheck Cond: (((email)::text = 'dsfze(at)ezrfz(dot)com'::text) OR (user_info
> IS NOT NULL))
> Filter: (((lower((name)::text) = 'martelli'::text) AND ((email)::text =
> 'dsfze(at)ezrfz(dot)com'::text)) OR (user_info IS NOT NULL))
> -> BitmapOr (cost=8.60..8.60 rows=2 width=0) (actual
> time=0.034..0.034 rows=0 loops=1)
> -> Bitmap Index Scan on idx_contact_email (cost=0.00..4.30
> rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
> Index Cond: ((email)::text = 'dsfze(at)ezrfz(dot)com'::text)
> -> Bitmap Index Scan on contact_contact_user_info_idx
> (cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
> Index Cond: (user_info IS NOT NULL)
> Planning time: 0.602 ms
> Execution time: 0.118 ms
>
>
>
If you look closely at the 2nd query plan, you'll see that no joins are
performed, and it's only the contract_contract table that's looked at. This
is because PostgresSQL sees that none of the columns from the 2 tables
which are being left joined to are used, and also that the columns that
you're joining to on these tables are unique, therefore joining to them
cannot duplicate any rows, and since these are left joined, if there was no
matching row, then it wouldn't filter out rows from the contract_contract
table, as it would with INNER JOINs. The planner sees that these left joins
are pointless, so just removes them from the plan.

Regards

David Rowley

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Stark 2014-10-19 22:24:51 Re: Yet another abort-early plan disaster on 9.3
Previous Message Laurent Martelli 2014-10-19 04:10:50 IS NOT NULL and LEFT JOIN