pgsql: Rejigger mergejoin logic so that a tuple with a null in the first

Lists: pgsql-committers
From: tgl(at)postgresql(dot)org (Tom Lane)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Rejigger mergejoin logic so that a tuple with a null in the first
Date: 2010-05-28 01:14:03
Message-ID: 20100528011403.D371B7541D2@cvs.postgresql.org

Log Message:
-----------
Rejigger mergejoin logic so that a tuple with a null in the first merge column
is treated like end-of-input, if nulls sort last in that column and we are not
doing outer-join filling for that input. In such a case, the tuple cannot
join to anything from the other input (because we assume mergejoinable
operators are strict), and neither can any tuple following it in the sort
order. If we're not interested in doing outer-join filling we can just
pretend the tuple and its successors aren't there at all. This can save a
great deal of time in situations where there are many nulls in the join
column, as in a recent example from Scott Marlowe. Also, since the planner
tends to not count nulls in its mergejoin scan selectivity estimates, this
is an important fix to make the runtime behavior more like the estimate.
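
The effect can be illustrated with a small toy model. This is only a sketch, not the code in nodeMergejoin.c: the names scan, merge_join and null_is_eof are hypothetical, None stands in for SQL NULL, and it assumes an inner join on a single integer column with unique non-null keys, sorted ascending with nulls last.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Key = Optional[int]  # None stands in for SQL NULL (an assumption of this sketch)


def scan(keys: List[Key], null_is_eof: bool, stats: Dict[str, int]) -> Iterator[int]:
    """Yield non-null keys from a sorted (NULLS LAST) input, counting fetches."""
    for key in keys:
        stats["fetched"] += 1
        if key is None:
            if null_is_eof:
                # Nulls sort last and the join operator is strict, so neither
                # this tuple nor any successor can match: pretend end-of-input.
                return
            # Modelled pre-patch behavior: discard each null tuple one by one.
            continue
        yield key


def merge_join(outer_keys: List[Key], inner_keys: List[Key],
               null_is_eof: bool) -> Tuple[List[int], int]:
    """Toy inner merge join; returns (matched keys, number of tuples fetched)."""
    stats = {"fetched": 0}
    outer = scan(outer_keys, null_is_eof, stats)
    inner = scan(inner_keys, null_is_eof, stats)
    matches: List[int] = []
    o = next(outer, None)
    i = next(inner, None)
    # Stop as soon as either input is exhausted (or is treated as exhausted).
    while o is not None and i is not None:
        if o == i:
            matches.append(o)
            o = next(outer, None)
            i = next(inner, None)
        elif o < i:
            o = next(outer, None)
        else:
            i = next(inner, None)
    return matches, stats["fetched"]


outer_input: List[Key] = [1, 2, 3] + [None] * 1000  # many nulls in the join column
inner_input: List[Key] = [2, 3, 4]

fast_matches, fast_fetched = merge_join(outer_input, inner_input, null_is_eof=True)
slow_matches, slow_fetched = merge_join(outer_input, inner_input, null_is_eof=False)
```

Both calls return the same matches, but with null_is_eof=True the scan stops at the first null instead of reading every trailing null tuple. The shortcut would not be valid when outer-join filling is needed for that input, since the null tuples would then have to be emitted with null-extended partners.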

I regard this as an omission in the patch I wrote years ago to teach mergejoin
that tuples containing nulls aren't joinable, so I'm back-patching it. But
only to 8.3 --- in older versions, we didn't have a solid notion of whether
nulls sort high or low, so attempting to apply this optimization could break
things.

Modified Files:
--------------
    pgsql/src/backend/executor:
        nodeMergejoin.c (r1.101 -> r1.102)
        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeMergejoin.c?r1=1.101&r2=1.102)