The formal process of normalization centers around the concepts of some defined normal forms that the database designer can use to achieve a good relational structure.
The normal forms define the logical steps that are used to normalize database tables. In the following sections, the concepts defined by each normal form are covered along with a good example of the normalization process.
The concepts are defined and taught in the context of using a real-world example. The example is of no commercial value; its purpose is for the teaching of a conceptual topic.
Here is the scenario. We have a database table that stores data about orders made by customers. This table is illustrated in Figure B-1.
The application software that uses this table for read and write purposes has been experiencing performance troubles lately. This table requires normalization. We will use it as an example to illustrate the process of normalization.
The First Normal Form is the first step to normalizing a table. When a table has undergone this step, it is referred to as being in the First Normal Form, or 1NF. From looking at the layout of the table in Figure B-1, three things are easily noticeable: repeating groups, object inconsistency, and storage space.
The table is repeating many of the same values. For every order detail, the user must enter the information for the order and the customer. For an order with 20 or more items, this is a daunting task.
Furthermore, there is the risk of a user making a mistake in typing the product name into the ProductDescription field. If you retrieve a report on that particular product, you will not be presenting the right information because your filter for the query hinges on the ProductDescription text field. If the query filters retrieved data for Linux as the ProductDescription, records with Linx are filtered out of the retrieved rowset.
As a solution for repeating groups, you may decide to remove the order-centric data from every record and put in into the first record that contains orderItem-centric data. See Figure B-2 for an illustration of this scenario.
The fact is that the table in Figure B-2 does not fit into the relational database design concepts. It is impossible to express a query that retrieves all order items for one particular order with this data entry model.
The repeating values force the physical database file to grow at a faster rate. The performance of a database table hinges on the number of rows it contains. If the Sales Department is entering around 50 orders per day into this table, write performance decreases drastically because for every order item entered, customer information and order information has to be entered.
The First Normal Form simply attempts to:
Eliminate repeating groups in individual tables
Create a separate table for each entity type
Identify each row with a primary key
A good look at the table in Figure B-1 reveals that the order details form a repeating group. Our solution to this problem is to split this table into two separate tables, one storing order details and the other storing the orders, as illustrated in Figure B-3.
The two are related by a many-to-one foreign key from the OrderDetail table to the Order table, as shown in Figure B-4.
Each order detail has a composite primary key of its order and product numbers. A single entity is easily retrieved from these tables, as repeating groups, inconsistency, and storage space problems are eliminated.
These tables are in 1NF, and they are not at all perfect. The Second Normal Form identifies and resolves further issues with these tables.
The Second Normal Form is the second step to normalizing a database table. After undergoing this step of the normalization process, the table is said to be in 2NF.
In this section, we continue to normalize the tables in our 1NF example.
The functional dependency of a table column defines whether the column is dependent on another column of the same table or some other table.
The Second Normal Form attempts to remove columns that are functionally dependent on any part of a composite key.
After taking a close look at the OrderDetail table, notice that each of the products has one description and one price. Price and description are therefore functionally dependent on the product. To enter a new product, you have to enter an order detail. This is inefficient in the business world. The table has to be in 2NF, and since the product is part of a composite key, this occurrence violates the core rule of the Second Normal Form.
Notice that the products can be grouped into one entity type altogether. This gives your users more flexibility in entering new products as they come in while entering order details when orders are made.
The solution that puts our set of tables into 2NF is splitting the OrderDetail table into two: OrderDetail and Product, as illustrated in Figure B-5.
The Third Normal Form is the third step in the normalization process. It highly resembles the Second Normal Form in its attempt to remove functional dependencies. However, the Third Normal Form requires you to remove dependencies between non-key columns.
The Order table is in 2NF, but it is still vulnerable to attack from data duplication. Every time an order is made, a new customer name and account number must also be entered. Consider the situation where the Sales Department is entering 12 orders for a particular customer every week. Not surprisingly, the salespeople end up entering the customer name and account number 48 times per month. After a few years, this unnecessary duplication of data deals a severe blow to the performance of the Order table because it will be big and slow, and you will be guilty of violating the design concepts of relational database design.
Customers and their accounts can be classified into a new group—Account. In fact, when developing in the real world, you would not want a customer to possess more than one account. For our example, the division illustrated in Figure B-6 will do.
The database tables are now in 3NF and very optimized compared to the single Order table that we started with.
There are other forms of normalization that exist beyond the Third Normal Form. These are not used extensively in the database engineering industry anymore, so they are not covered in this appendix.
If you carefully follow the rules of the first three normal forms, you will have a good relational database design.