Re: WIP: RangeTypes

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: RangeTypes
Date: 2011-01-17 21:09:26
Message-ID: 1295298566.11513.27.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

When defining generic range functions, there is quite a bit of extra
complexity needed to handle special cases.

The special cases are due to:
* empty ranges
* ranges with infinite boundaries
* ranges with NULL boundaries
* ranges with exclusive bounds (e.g. "(" or ")").

Infinite bounds, and exclusive bounds can both be handled somewhat
reasonably, and the complexity can be somewhat hidden. Empty ranges are
a special case, but can be handled at the top of the generic function in
a straightforward way.

NULL bounds, however, have been causing me a little frustration. A
reasonable interpretation of boolean operators that operate on ranges
might be: "true or false if we can prove it from only the inputs; else
NULL". This gets a little interesting because a NULL value as a range
boundary isn't 100% unknown: it's known to be on one side of the other
bound (assuming that the other side is known). This is similar to how
AND and OR behave for NULL. For instance, take the simple definition of
"contains":

r1.a <= r2.a AND r1.b >= r2.b

(where "a" is the lower bound and "b" is the upper)

Consider r1: [NULL, 10], r2: [20, NULL]. Contains should return "false"
according to our rule above, because no matter what the values of r1.a
and r2.b, the ranges can't possibly overlap.

So, now, more complexity needs to be added. We can be more redundant and
do:

r1.a <= r2.a AND r1.b <= r2 AND r1.a <= r2.b AND r1.b >= r2.a

That seems a little error-prone and harder to understand.

Then, when we have functions that operate on ranges and return ranges,
we're not dealing with 3VL exactly, but some other intuition about what
NULL should do. The semantics get a lot more complicated and hard to
reason about. For instance, what about:
(NULL, 5) INTERSECT (3, NULL)
Should that evaluate to NULL, (NULL, NULL), or throw an error? What
about:
(NULL, 5) MINUS (NULL, 7)
(NULL, 5) MINUS (3, NULL)

I feel like I'm making this too complicated. Should I just scope out
NULL range boundaries for the first cut, and leave room in the
representation so that it can be added when there is a more thorough
proposal for NULL range boundaries?

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2011-01-17 21:13:12 Re: texteq/byteaeq: avoid detoast [REVIEW]
Previous Message Joel Jacobson 2011-01-17 21:08:28 Re: Bug in pg_describe_object, patch v2