[ Team LiB ] |
2.4 DenormalizationDenormalization is the process of consciously removing entities created through the normalization process. An unnormalized database is not a denormalized database. A database can be denormalized only after it has been sufficiently normalized, and solid justifications need to exist to support every act of denormalization. Nevertheless, fully normalized databases can require complex programming and generally require more joins than their unnormalized or denormalized counterparts. Joins are resource-intensive operations; thus, the more joins, the more time a query will take. To deal with queries that take too long or are too complex to be maintainable, a database architect denormalizes the database. As we have seen from the process of normalization, each lower normal form introduces database anomalies that can compromise the integrity, maintainability, and extensibility of the database. Denormalization is thus a reasoned trade-off between query complexity/performance and system integrity, maintenance, and extensibility.
In most cases, you can deal with complexity simply by creating views that hide the complexity. Performance is thus the general driver of denormalization. To determine whether denormalization makes sense, I recommend Craig Mullins's simple guidelines posted in an online article for The Data Administration Newsletter in an article called "Denormalization Guidelines" (http://www.tdan.com/i001fe02.htm):
If you answer "yes" to any of these questions, you should not use denormalization as your performance tuning tool.
The most common temptation to denormalize comes from queries that require joins to retrieve a single value. Any query pulling a film's suitability for children along with the film from the database would fall into this category. For example: SELECT Film.title, Film.language, Film.year, Rating.forChildren FROM Film, Rating WHERE Film.filmID = 2 AND Film.rating = Rating.code; Denormalization would move the rating code and suitability for children back into the Film table. Wouldn't the query perform much better without that join? Actually, it probably would not perform noticeably better—the join is done using a unique index (Rating.code). Denormalization, however, would incur all of the anomalies that led us to normalize the table in the first place. A better candidate for normalization might be pulling a state name into an Address table along with the state code used in the join. If most queries actually want the state name and the query would definitely benefit from avoiding the join to the State table, it can make sense to add an extra column to Address for stateName. You do not, however, remove the State table. This denormalization works—assuming real performance benefits are achieved for the application—because the state name is a candidate key for the State table. Though stateName would technically be a transitive depencency in the Address table (and thus violate 3NF), its status as a candidate key for State makes it almost a functional depencency and consequently almost acceptable to put into the Address table. A common situation in which performance does truly become a problem is reporting. For reporting, database normalization is just one of many factors that lead to performance degradation. Because complex reports generally eat server resources regardless of normalization issues, it is generally a bad idea to empower users to execute complex reports against live tables. Instead, you can denormalize by replicating the data into special tables designed to support reporting needs. To create a table for reporting on westerns, we might create a WesternReport table that looks like the table in Table 2-9.
Reporting on all of the westerns from 1992 would look like this: SELECT * FROM WesternReport WHERE year = 1992; The alternative is to have users constantly executing the following query against the tables that actually maintain your data: SELECT Film.filmID, Film.title, Film.rating, IFNULL(Rating.forChildren, 'N'), Film.ranking, Film.year FROM Film, FilmGenre LEFT OUTER JOIN Rating ON Film.rating = Rating.code WHERE FilmGenre.code = 'WES' AND year = 1992 AND Film.filmID = FilmGenre.filmID; Use follow-up queries to get other genres, directors, and actors associated with the film. |
[ Team LiB ] |