Chapter 4. Sequence Similarity
The human genome is often referred to as the Book of Life, an apt
description because nucleic acids and proteins are often represented
(and manipulated) as text files. Chapter 3 described common algorithms
for aligning sequences of letters, and the score is the metric
used to determine the best alignment. This chapter shows what scores
really are. Some of the introduced terms come from information
theory, so the chapter begins with a brief introduction to this
branch of mathematics. It then explores the typical ways to measure
sequence similarity. You'll see that this approach
fits well with the sequence-alignment algorithms described in Chapter 3. The last part of the chapter focuses on the
statistical significance of sequence similarity in a database search.
The theories discussed in this chapter apply only to local alignment.
There is currently no comparable theory for global alignment.