MySQL Database Design: Setting up Your DB Schema
A basic tutorial on MySQL Database Design.
This article will take a look at several areas of MySQL Database Design to consider when creating an application with MySQL.
We'll look first at some guidelines for how data should be
grouped into tables.
Notice: This article on MySQL database design contains incomplete examples that should be further normalized and refined. For a more extended understanding of general database design, read the sections on Database Theory and Prqactice and the Analysis Phase.
Setting up your TablesBasic work in MySQL Database Design: Before you create the tables for your application you should group your data in sets. A set of data should a be group which is entered or updated with the same or similar frequency. Let's say we have an application which is going to track the productivity of workers at the Acme Manufacturing Company. Our tables will contain the following data:
It would certainly be possible to put all of this data into one table. However this would be a bad MySQL Database Design decision for several reasons.
Some of the data in this list needs to be tracked each working day because it may change each day, and some of it will change only infrequently, if at all. The number of units a worker completes each day, the hours worked, the shift worked, supervisor and date need to tracked daily. However we would not expect a worker's name, department, address, or gender to vary as frequently (although we try to keep an open mind at Acme).
If we enter data that does not change or changes only infrequently every time we insert a performance record, this is a waste of table space and bandwidth.
We could enter this more static information only the first time we enter a record for an employee, but empty fields may waste almost as much space as filled ones. In addition, if we need to retrieve an employee's address, for example, in a table with this configuration, MySQL has to search an ever-growing number of empty rows before it finds the address field that contains the address.
The solution is simply to remove the more static values to a table of their own. We can then join them on a common identifier (emp_id) that will correlate the static information in one table with the daily entries in the other table.
So, a sensible design might consist of a table named workers:
And the the table performance:
One thing to consider when you build your tables is that you are able to set size limits on most fields. Restricting the limit for each column to the minimum size needed makes your tables more efficient. For example, our integer columns have been set to tinyint unsigned. The range that the tinyint data type allows is from -128 to +128. The unsigned designation means that in this column values cannot be negative, so we have further reduced the range to 0 – 128.
This is a good idea as long as we know that our application will never have to handle more than 128 employees. In the workers table the gender column is set to char(1).
In general, fixed field types are more efficient than variable column types and should be used when possible. In this case gender will be a one-letter designation – either 'F' or 'M'. The name column cannot be fixed length for obvious reasons, but we can set a limit on the length of the column. In this case we have chosen 25 characters, far fewer than the default max for this column, 250 characters.
The emp_id column has been designated as a primary index in the workers table. There are two reasons for this. This ensures that no two employees will mistakenly be assigned the same id number, and it also provides an index in case we decide to use a foreign key to maintain referential integrity between the two tables.
The indexes on the name column in the workers table and the emp_id and name columns in the performance table were created for this reason as well.
The rate column in the workers table will hold a summary value (denormalized (!)) which will be the average rate of units completed per hour for each worker. It will allow two digits to the left of the decimal point and two digits to the right.
The innodb storage engine was chosen to provide the possibility stored procedures as well as foreign keys in our application. Stored procedures, triggers, and constraints, including foreign keys in MySQL are each subjects of other articles in this series.
Let's put a little bit of data in our tables so that they look like this:
Before we go on, let's take a look at the output of the procedure analyse function . This is a rather useful function that provides information about your tables and even suggests optimal settings for your columns.
Here is the output for our performance table:
The output is abbreviated here in order to fit it on the page. The focus right now is to take a look at the list of suggestions analyse presents us. Let's compare it to the actual structure of the table:
Let's see if any of the suggestions make sense for our table.
One suggestion is to set the default of each of our rows to NOT NULL. This is a sound point. Also, the presence of a NULL column is a clear indication on lacking MySQL Database Design. Also, if columns are designated as NOT NULL, MySQL does not have to check the column values to see if they are null when it is processing a query. Another suggestion is to reduce the number of place values allowed in our tinyint columns.
We would not want to reduce the allowed digits from 4 to 1 for the emplyee id as suggested. We know that we are going to need 3 digits for this column, so we could reduce it to 3.
We have to remember that analyse is working with a very limited set of data at this point.
Likewise, the units and hours could be reduced to tinyint(2), and shift could be reduced to tinyint(1), since there are only 3. Don't expect these changes to make a pronounced differences in performance, but these things do have a cumulative effect.
Using ENUM for the date and name fields is not a viable option; we will have too many unique values in these fields to make this practical. ENUM could be an option for the names of the supervisors, but I would probably not do this because the pool of supervisors is surely subject to change and this would have to be manually updated in the table structure. This seems like the kind of thing that would likely be forgotten or neglected.
The changes above can be quickly made with the ALTER TABLE statement. To alter a column type:
To change both the data type and null value of the units column, use the change clause of the alter table command:
You need to specify the name of the column twice, because the change clause gives you the option of changing the name of the column. Here are the changes we have made to the column so far:
MySQL Database Design: In optimizing tables, the accumulation of little things can add up to a lot. Look for the obvious and large issues first, but a series of small tweaks can be very effective as well.
Free eBookSubscribe to my newsletter and get my ebook on Entity Relationship Modeling Principles as a free gift:
What visitors say...
"I just stumbled accross your site looking for some normalization theory and I have to say it is fantastic.Read more Testimonials
I have been in the database field for 10+ years and I have never before come across such a useful site. Thank you for taking the time to put this site together."
Theory & Practice
SQL Server DBA
Install SQL Server
Database Normalization eBook:
Copyright © www.databasedesign-resource.com / Alf A. Pedersen
All rights reserved.
All information contained on this website is for informational purposes only.
Disclaimer: www.databasedesign-resource.com does not warrant any company, product, service or any content contained herein.
Return to top
The name Oracle is a trademark of Oracle Corporation.