DekGenius.com
[ Team LiB ] Previous Section Next Section

2.4 Denormalization

Denormalization is the process of consciously removing entities created through the normalization process. An unnormalized database is not a denormalized database. A database can be denormalized only after it has been sufficiently normalized, and solid justifications need to exist to support every act of denormalization.

Nevertheless, fully normalized databases can require complex programming and generally require more joins than their unnormalized or denormalized counterparts. Joins are resource-intensive operations; thus, the more joins, the more time a query will take.

To deal with queries that take too long or are too complex to be maintainable, a database architect denormalizes the database. As we have seen from the process of normalization, each lower normal form introduces database anomalies that can compromise the integrity, maintainability, and extensibility of the database. Denormalization is thus a reasoned trade-off between query complexity/performance and system integrity, maintenance, and extensibility.

The Perils of Denormalization

Denormalization must be approached with caution. In general, a table should have proven it requires denormalization in testing or even in production before you actually denormalize it. Data architects very commonly denormalize based on hunches about performance or experience with similar applications in the past—a practice that leads down the path to a poorly designed database.

Denormalization can in some circumstances incur performance penalties. More important, however, most of the time you do not see the kinds of performance improvements from denormalization that actually make a difference. When you denormalize without concrete performance benchmarks backing the denormalization, you end up:

  • Denormalizing tables without appreciable performance improvement

  • Denormalizing again later, after you have done performance testing

The result is a database that looks more unnormalized than denormalized. The best rule of thumb is to prove the database needs denormalization and document that need for the people who will be maintaining the database. Subsequently, you should prove that your denormalization actually improves performance and back out the changes if they fail to address the performance concerns.


In most cases, you can deal with complexity simply by creating views that hide the complexity. Performance is thus the general driver of denormalization. To determine whether denormalization makes sense, I recommend Craig Mullins's simple guidelines posted in an online article for The Data Administration Newsletter in an article called "Denormalization Guidelines" (http://www.tdan.com/i001fe02.htm):

  • Can you achieve performance goals without denormalization?

  • Will the system still fail to achieve performance goals with denormalization?

  • Will the system be less reliable as a result of denormalization?

If you answer "yes" to any of these questions, you should not use denormalization as your performance tuning tool.

BEST PRACTICE: Denormalize only when you have concrete proof that denormalization will boost performance.

The most common temptation to denormalize comes from queries that require joins to retrieve a single value. Any query pulling a film's suitability for children along with the film from the database would fall into this category. For example:

SELECT Film.title, Film.language, Film.year, Rating.forChildren
FROM Film, Rating
WHERE Film.filmID = 2
AND Film.rating = Rating.code;

Denormalization would move the rating code and suitability for children back into the Film table. Wouldn't the query perform much better without that join? Actually, it probably would not perform noticeably better—the join is done using a unique index (Rating.code). Denormalization, however, would incur all of the anomalies that led us to normalize the table in the first place.

A better candidate for normalization might be pulling a state name into an Address table along with the state code used in the join. If most queries actually want the state name and the query would definitely benefit from avoiding the join to the State table, it can make sense to add an extra column to Address for stateName. You do not, however, remove the State table. This denormalization works—assuming real performance benefits are achieved for the application—because the state name is a candidate key for the State table. Though stateName would technically be a transitive depencency in the Address table (and thus violate 3NF), its status as a candidate key for State makes it almost a functional depencency and consequently almost acceptable to put into the Address table.

A common situation in which performance does truly become a problem is reporting. For reporting, database normalization is just one of many factors that lead to performance degradation. Because complex reports generally eat server resources regardless of normalization issues, it is generally a bad idea to empower users to execute complex reports against live tables. Instead, you can denormalize by replicating the data into special tables designed to support reporting needs. To create a table for reporting on westerns, we might create a WesternReport table that looks like the table in Table 2-9.

Table 2-9. A table for reporting on westerns

Attribute

Domain

Notes

NULL?

filmID

BIGINT

PRIMARY KEY

No

title

VARCHAR(100)

 

No

rating

CHAR(5)

 

Yes

forChildren

CHAR(1)

DEFAULT 'N'

No

otherGenres

VARCHAR(255)

 

Yes

directors

VARCHAR(255)

 

Yes

actors

VARCHAR(255)

 

Yes

ranking

INT

 

No

year

INT

 

No

Reporting on all of the westerns from 1992 would look like this:

SELECT * FROM WesternReport
WHERE year = 1992;

The alternative is to have users constantly executing the following query against the tables that actually maintain your data:

SELECT Film.filmID, Film.title, Film.rating,
IFNULL(Rating.forChildren, 'N'), Film.ranking, Film.year
FROM Film, FilmGenre
LEFT OUTER JOIN Rating ON Film.rating = Rating.code

WHERE FilmGenre.code = 'WES'
AND year = 1992
AND Film.filmID = FilmGenre.filmID;

Use follow-up queries to get other genres, directors, and actors associated with the film.

    [ Team LiB ] Previous Section Next Section