![]() |
||||
Candidate keys:
|
||||
Your role as an analyst/designer is to validate such properties, and to fit them into a normalized database model that reflects the needs of the business.
During this, you will discover many candidate keys and issues concerning how candidate keys could be defined. With wrong candidate key decisions, chances are high that you will have a worse job in normalizing the database model, as well as achieve a high performing database in the end. (Read the eBook on database normalization). ).
There are three fundamental demands on the candidate keys that we should never deviate from, if it is to become the subject for a primary key:
In 2, we say that a candidate key always, without exception, must hold a value. Without it, we break post 1.
In 3, we say that no matter what happens, inside our business or outside of it, we are not allowed to change the value of the candidate key. Adding an attribute (expanding) the candidate key, would pose a change. Some say that you should be able to change the primary key, and do a 'cascade update' to all underlying tables, ref above.
Actually, 3 it not a formal rule as such, but it is my personal experience through the years that make me claim this as a rule for myself, due to the limitations of today's database products. Cascading updates are very critical operations, not well supported by the different database system vendors.
We will deal with that in a later chapter.
If the candidate key can withstand all these three demands, we have most likely identified our primary key. As we proceed in this eBook, we will keep these three demands in our minds.
You may also find interest in reading 'A Relational Model of Data for Large Shared Data Banks' by E.F. Codd (the founder/co-founder of the relational theory): A Relational Model of Data for Large Shared Data Banks
Throughout this eBook, I will be using conceptual models; Entity Relationship diagrams, as well as logical models; what is called Server diagrams in Oracle Designer. I will also examplify the different constructions with SELECT statements where appropriate. First, a few words about notation in the conceptual model:
Fig. 2:

In this figure, a # in front of an attribute means the attribute is (part of) the primary key. Likewise, the horizontal bar over the crow-foot in the drawing of the relationship between departments and accounts is a sign that the primary key from departments is also a part of the primary key in the accounts entity. How smart that is, is a totally different issue: (ACCOUNTS is breaking 2nd Normal Form: Name is dependent on ACCOUNT_NO alone, not DEPT_ID).
A * in front of an attribute signs that this attribute is mandatory (always must have a value filled in), while an o in front of an attribute tells that it is optional to give the attribute a value. Likewise with the logical model:
Fig. 3:

Notice the difference between the two graphics: in the logical model in Fig.3, the relationship in
Fig. 2 is incorporated in the accounts table. The relationship in Fig. 3 now indicates that there exists
an unconditional constraint between the two tables: You are not allowed to enter a value into DEPT_ID
in accounts if it doesn't exist in the departments table. The horizontal line above the crow-bar does
not necessarily indicate participation in the primary key of that table: only the # tells you if a column
is a part of the primary key.
By the way, Fig. numbers do not start at Fig. 1, because this is just
an excerpt from the eBook. you can check it out here:
Primary and Foreign Keys.
I have also used the plural form on entity names as well as table names. The normal is singular form for entities, and plural for tables. I have done this exception for reading purposes only.
Details on uniqueness
A primary key is, as stated earlier in post 1, a column, or a collection of columns, that together give us the ability to uniquely read one single row from a table, no matter how many rows that exists in that table. In the logical world this is just theory, but in the physical, we get into a lot of trouble if we are unable to do that.
Consider a simple example:
You want to give a 10% pay raise to your top DBA, John Wilson.
You do that by writing:
UPDATE EMPLOYEES
SET SALARY = SALARY * 1.1
WHERE NAME ='John Wilson';
If you have two employees with that name, they both get a pay raise. Not exactly what you intended?
Names are very unlikely candidates for primary keys.
You need something unique, and names are not unique. Never.
Details on null values
In post 2, about no null values in a primary key: How can you identify something you don't know? This point should go without saying, but I bring it up, because, believe or not, I have read articles claiming that there exists the possibility of a weak primary key; meaning null values are allowed. Absolutely nonsense!
All meaning of the term primary key vanishes into thin air. Most serious database systems will actually reject your attempts and deny you to go further on that path, and with a very good reason.
If you ever come across this situation, just defy it. It is a path of failure. Actually, it is a non-sense discussion.
Details on no change
Think about Africa. After World War II, and the end of colonialism, how many countries have changed their names? A more recent example: There is no longer a country called The Union of Soviet Socialist Republics; Today we call it Russia.
We even have standards for naming countries: the former Russia was known all over the world as USSR, and Russia is abbreviated as RUS today (or is it?). Look at these two fantastic tables:
Fig. 4: Country definitions from the United Nations
Fig:5: Another table of country definitions from the ISO standardization organization:
Isn't that great?
Western Samoa is abbreviated WSM in the United Nations, and it is abbreviated WS is ISO! It has even 3 identifiers in ISO; WS, WSM and 882!
Take a look at USSR (Sorry, RUS): In United Nations they call it RUS, while ISO use RUS, RU, 643, and on Internet it's .ru. But according to FIPS 10-4 it should be abbreviated RS! How can we identify a country by using standards from these helpful people?
If you do a search on Google for the term 'standard country codes' (without the quotes), you will find the sites for these two tables among the first ten hits. Do not go to the ISO page; they actually want you to walk around with a shopping cart and pay for it.
In fig.4, every column is a candidate key: No two countries have the same name or any other shortcuts of the name, but they differ all the same, if you put on a broader perspective: In ISO, the name is Russia and the (latest determined standard) abbreviation is RU; but in the United Nations, the name for the same country is Russian Federation, and their abbreviation is RUS. Furthermore, in FIPS 10-4, the abbreviation is RS. Do they never meet to talk with each other?
Looking at it isolated, all columns in Fig. 5 are also candidate keys. But not if you want to match entries in Fig. 4 and Fig. 5.
Actually, I can find only one common denominator here: Both ISO and UN give Russia a common number: 643. Here is something to investigate. Are all the other abbreviations and names meant only for each organization, while they share the 643 number for identification? Have they been talking to each other, after all?
It should be clear by now that we cannot rely on any of the character columns in either organization: While each of them are candidate keys in their own world, we would soon get into trouble if we tried to use them to talk to all the world (that is, outside the scope of our little application).
As you can see, defining (or choosing) the right primary keys is not neccessarily a simple, straight-forward task.
How about another case:
You are a manufacturer, and one of your customers places an order on behalf of his customer, asking you to use his customer's customer number as identification: How long will it take before another of your customers does the same, and by sheer coincidence, they supply you with the same id (let us say cust_no 1000), but they are totally different companies?
You will have to organize your system in such a way that you can uniquely identify all information going into the system, regardless of external systems, and regardless of external changes. That's the bottom line: You have to make the primary key foolproof.

Interview with
Steven Feuerstein
I have been in the database field for 10+ years and I have never before come across such a useful site. Thank you for taking the time to put this site together."
Mike, USA


The name Oracle is a trademark of Oracle Corporation.
The names MS Access/MS SQL Server are trademarks of Microsoft Corporation.
Any other names used on this website may be trademarks of their respective owners, which I fully respect.