RFC: A brief guide to nulls

From: dev(at)archonet(dot)com
To: pgsql-sql(at)postgresql(dot)org
Subject: RFC: A brief guide to nulls
Date: 2003-01-15 17:23:09
Message-ID: 1522.192.168.1.16.1042651389.squirrel@mainbox.archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

There have been a few posts recently where people have had problems with
nulls. Anyone got comments on the below before I submit it to techdocs?

TIA

- Richard Huxton

A Brief Guide to NULLs
======================

What is a null?
===============
A null is *not* an empty string.
A null is *not* a value.
A null is *not* a "special" value.
A null is the absence of a value.

What do nulls mean?
===================
Well, they *should* mean one of two things:
1. There is no applicable value
2. There is a value but it is unknown

Example 1: Imagine you have a customer table with name and sex fields.
If you get a new customer "ACME Widgets Ltd", the sex field is meaningless
since your customer is a company (case 1).
If you get a new customer "Jackie Smith" they might be male or female, but
you might not know (case 2).

Example 2: You have an address table with (street,city,county,postalcode)
fields.
You might insert an address ("10 Downing Street","London",Null,"WC1 1AA")
since you don't have a valid county.
You might also insert an address ("1 Any Street","Maidstone","Kent",Null)
where there *must be* a valid postalcode, but you don't know what it is.

It might be useful to be able to distinguish between these two cases - not
applicable and unknown, but there is only one option "Null" available to
us, so we can't.

How do nulls work?
==================
There is one very important rule when dealing with nulls. The result of
any operation or comparison, when applied to a null is null. The only
exception is testing if a value is null.

Example: with the customer table above you could run the following queries:
SELECT * FROM customer WHERE sex='M';
SELECT * FROM customer WHERE sex<>'M';
Now you might think this returns all customers, but it will miss those
where sex is null. You've asked for all rows where the value of sex is 'M'
and all those with values not equal to 'M' but not rows with *no value at
all*

It might help to think of a database as a set of statements you *know* to
be true. A null indicates that you *cannot say anything at all* about that
field. You can't say what it is, you can't say what it isn't, you can only
say there is some information missing.

So, to see all the customers with unknown or inapplicable sex you would need:
SELECT * FROM customer WHERE sex IS NULL;

There are actually three possible results for a test in SQL - True (the
test passed), False (the test failed) and Null (you tested against a
null). A result of null usually gets treated as False, so testing against
nulls always fails.

If you try to perform an operation on nulls, again the result is always
null. So the results of all of the following are null:
SELECT 'abc' || null;
SELECT 1 + null;
SELECT sqrt(null::numeric);
The first case can be especially confusing. Concatenating a null string to
a string value will return null, not the original value.

Uniqueness and nulls
====================
If you define a unique index on a column it prevents you inserting two
values that are the same. It does not prevent you inserting as many nulls
as you like. How could it, you don't have a value so it can't be the same
as any other.

Example: We create a table "ta" with a unique constraint on column "b"
CREATE TABLE ta (
a int4,
b varchar(3),
PRIMARY KEY (a)
);
CREATE UNIQUE INDEX ta_b_idx ON ta (b);
INSERT INTO ta VALUES (1,'aaa'); -- succeeds
INSERT INTO ta VALUES (2,'bbb'); -- succeeds
INSERT INTO ta VALUES (3,null); -- succeeds
INSERT INTO ta VALUES (4,'bbb'); -- fails
INSERT INTO ta VALUES (5,null); -- succeeds!

Given the definition of what a null is, you only have two choices: allow
multiple nulls or allow no nulls. If you want no nulls, define the column
as NOT NULL when creating the table.

Keys and nulls
==============
No column that is part of a primary key can be null. When you define a
PRIMARY KEY, none of the columns mentioned can take a null value.
Postgresql makes sure of this by defining the columns as NOT NULL for you.

Example: With table "ta" we just created, \d ta will show column a as
being not null. Otherwise, we could insert rows with a set to null and
have no way to tell them apart.

Subqueries and nulls
====================
Since tests always fail when testing against nulls you can have unexpected
results with sub-queries.

Example: Assume we have a companies table and a diary table. Diary entries
are usually related to a particular company but not always.
SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM
diary);
If any row in diary contains a null dy_company then you will get *no
results*.
We can expand the query like so:
WHERE co_id NOT IN (SELECT dy_company FROM diary)
WHERE co_id NOT IN (1, 2, null, 3...)
WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
WHERE NOT (... OR null OR ...)
WHERE NOT (null)
WHERE null

You either need to explicitly check for null values, or define the column
in question as NOT NULL.

Further Information
===================
Any good relational database book (try something written by Date)
Bruce's book (link LHS at http://techdocs.postgresql.org)
My Postgresql Notes (link at http://techdocs.postgresql.org)

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Andreas Joseph Krogh 2003-01-15 17:38:00 Re: index on to_char(created, 'YYYY') doesn't work
Previous Message Steve Crawford 2003-01-15 17:19:43 Re: index on to_char(created, 'YYYY') doesn't work