The Database Design Resource Center



NULL values in a database: A programmer's nightmare

Many argue that it should be allowed to have NULL values in a database. The reality is; all database systems I know of, allow NULL values for columns. But is it right; or even better; are NULL values at all allowed at all according to the relational model?

Before we proceed, I should make you aware of the fact that the term "NULL values" is actually incorrect: NULL means "unknown", and as such, it can not be considered a "value" at all. However, I will let it stand in this article, since the use is so widespread, if it makes it more convenient for you.

Actually, the original paper(s)(1969/1970) by E.F. Codd on the relational model, stated that relations have attributes which contain values within a domain. (I'll be back on that one in another article).

Thus, the relational model states that every attribute has a value for a given occurence (row/tuple). (Remember, NULL is not a value, it is something unknown).

But we need not go all the way to the relational model: This is actually common sense, when you think about it. Let us talk about the concept of NULL values in a database for a little: NULL means something is unknown. It does NOT mean null (the digit 0). Null (or zero, the digit) is a known value: You can include 0 in calculations, whilst NULL stands for "value is unknown". There is a world of difference between the two.

Let us look at an example: I have a table ACCOUNTS: By the way, since it allows a column to have a NULL value, it is definetely not a relational table; ref the above definition. More on that later:

NULL values in a database

Notice the column AVAILABLE: for account_no 30 it holds a NULL value (although the term "value" is not correct when talking about NULLs, as mentioned earlier). How could that be?

Maybe they didn't know if it should be available or not at the time of registration? Maybe the operator just forgot to enter it? By the way; notice Oracle's missing ability to create Boolean (TRUE/FALSE) columns: In this example, I have used Y and N; yes you guessed it: For Yes respective No :-)

Let us continue: We have three rows in our table; two of them have a clear meaning of whether it is available or not; for the third row (not that ordering them as row one, two and three has any meaning in the relational model) no-one knows: It is simply unknown.

Let us explore some consequenses:

NULL values in a database

NULL values in a database

NULL values in a database

Study the three different statements carefully. You will notice that the first two do a correct count of the occurrenses (rows), while the last one returns TWO. But there are THREE! It is only that, that the last occurence is unknown, so the database cannot return a correct value. The value is unknown, so Oracle cannot give us an answer!

Stay with me, as it is just getting better (or worse, if you prefer).

Let us say we want to see all accounts that are either available or not available. Let us do a UNION (you probably wouldn't do it like this, but this is only done for illustrational purposes):

NULL values in a database


The first SELECT turns nicely up with all accounts where available=Y.

The second SELECT turns up all accounts where available IS NOT Y. But actually, as you can see, it doesn't.

What we wanted to ask the database was "give me all acounts with available Y combined with all acounts where available IS NOT Y". But it fails.The row with available=N satisfies the WHERE-clause, but with regards to the row with an unknown value, Oracle simply says: "I don't know, therefore I cannot tell".

In my opinion, instead of returning an answer, Oracle should have told me: "I do not know the right answer to your question".

Chris Date has published an article called "Not is not Not", and I hereby credit him for giving me inspiration for this article, including pointing out some of the flaws in SQL, through his eminent seminar "Relational Remodeled".

Now, most database vendors, including Oracle, who has implemented the NVL function (the SQL Server equivalent is the ISNULL function), know that this is not a situation we can live with. So take a look at how we can 'fix' our little problem:

NULL values in a database

Now, everything is right(?). We have an NVL function which says, 'if AVAILABLE contains a NULL value, replace it with X'. X being different from Y, the UNION works.

But who decided upon that X value?

I did, actually. So, what did I do? I ASSUMED that if the value was unknown, we could use an X. I just guessed.

The whole point of NULL, is that it is unknown. No-one knows, including me. I just made an assumption.

Actually, any value chosen would have been just as good, or bad, as X, with an exception for Y: Y UNION Y yields Y.

Let us take it further: What if all rows in the table had a NULL value for AVAILABLE?

NULL values in a database

There are still three rows in the table, but a COUNT over NULL values makes it look as if there are no rows. Now, here comes a shock for you: The above is from an Oracle database. How about SQL Server?

(Programmer's manual stuff, saying): "In SQL Server, when using the COUNT function against a column containing null values, the null values will be eliminated from the calculation. However, if the COUNT function uses an asterisk, it will calculate all rows regardless of null values being present."

The same thing will happen. Back to Oracle:

Actually, if we delete the three rows, we will get exactly the same answer:

NULL values in a database

This last example is even worse: The SELECT * returns 'No rows selected', but the SELECT COUNT(*) returns ONE row with the value 0!

In other words; if you had a WHERE clause like this:

WHERE EXISTS (SELECT COUNT(AVAILABLE)...

it should return TRUE. Totally wrong...

Well, actually, this is an error in the COUNT function: Any column would give the same COUNT result...

Actually, until proven otherwise, with the COUNT function, you could 'prove' that TRUE=FALSE, since it returns ONE row with the value zero (0).

I am very reluctant to trust ANYTHING coming out of a database like this. Would you?

If you are not convinced, let us do another little exercise: I have added the column BALANCE to the ACCOUNTS table, and I have two rows in it, where one of the rows contain a NULL value for BALANCE:

NULL values in a database

Now, let us calculate the sum of balances:

NULL values in a database

Fair enough? As far as we know, the sum is 1000. That is: As far as we know.

But: Are we sure? A balance is really either 0 or some amount of cash, positive or negative... The best thing the database could have answered, is 'Gee, I'm not really sure here. Can't really give you an answer'.

Why?: 1000 + something unknown = something unknown: That's elementary logic...

POINT: NULLs are giving us severe problems in determining the true values. It serves no informational purpose at all, and can only lead to incorrect answers and distrust in the database itself. Of how much worth is an untrusted database?

CLUE: In both of the examples above, we can trace them back to bad database design: In the example of AVAILABLE, the column should be mandatory. In the example of BALANCE, it is an example of denormalization, often performed in order to 'help' the database to perform better without taking the consequences into account. Check out my eBook on Database normalization, where this case is solved.

Now, another thought: If you define a column as numeric, how could you allow it to contain anything but numbers? Numbers are well-known: they consist of figures from 0-9. NULL is not a number... I think ;-).

Actually, I do not know what the NULL "value" is: That is the concept of NULL...

CLUE:Whenever I see that someone has put a NULL column into the database design, I think to myself: 'Is there a missing entity here?'

Most often, the answer is yes.

A few last words on NULL: NULL = Unknown.

What kind of unknown?

-Don't know
-Haven't been told
-Not applicable yet
-Forgot to tell

etc, etc.

POINT: How many types of unknown are there? For each type, you have to add logic. That is what we call n-Valued logic.

In computers, we should only deal with TRUE or FALSE (2-Valued logic). Everything else is guesswork. Computers are extremely bad at at guesswork: They want facts, and can make desicions based on facts. Without facts, they are of little or no use...

This leads us into a short conclusion:

CONCLUSION: Allowing NULL values into your columns introduces a whole new degree of uncertainty into your database. (Qualified?) guesses must be made by the SQL programmer to counter for erroneous results of NULL values in a database.

Even better, if two (or more) programmers work on the same table(s) with NULL values in a database allowed, you could end up with as many different results as you have programmers. Or more.

To quote Chris Date: "I hate NULLs!". I couldn't agree more.

If you want more documentation on null values in a database, do a Google search: Search for null values. There are (depending on your country) more than 51 million sites discussing this problem.

Yes, the concept of NULL is a problem; just do the search and read on. (Don't be surprised if this page shows up as no. 1 of those millions :-)

Return to Database Theory and Practice


Exclusive interviews with:
Steven Feuerstein, PLSQL expert
Donald Burleson, Top IT consultant


Free eBook

Subscribe to my newsletter and get my ebook on Entity Relationship Modeling Principles as a free gift:


What visitors say...

"I just stumbled accross your site looking for some normalization theory and I have to say it is fantastic.

I have been in the database field for 10+ years and I have never before come across such a useful site. Thank you for taking the time to put this site together."

Mike, USA

Read more Testimonials


Database Normalization eBook:


Database Normalization eBook




Copyright © www.databasedesign-resource.com / Alf A. Pedersen
All rights reserved.
All information contained on this website is for informational purposes only.
Disclaimer: www.databasedesign-resource.com does not warrant any company, product, service or any content contained herein.

Return to top

Copyright acknowledgement note:

The name Oracle is a trademark of Oracle Corporation.
The names MS Access/MS SQL Server are trademarks of Microsoft Corporation.
Any other names used on this website may be trademarks of their respective owners, which I fully respect.