Chapter 3. Sequence Alignment
BLAST finds statistically significant similarities between sequences
by evaluating alignments, but how are sequences aligned? In
principle, there are many ways to align two sequences, but in
practice, one method is used more often than any other. This chapter
explains this technique with the biologist in mind, without using the
mathematical notation and jargon that are usually employed to describe
such algorithms. Divested of unfamiliar language and notation, these
algorithms are quite simple.
Finding the optimal alignment between two sequences can be a computationally
complex task. Fortunately, a technique called dynamic
programming (DP) makes sequence alignment tractable as
long as you follow a few rules. Rather than have you struggle with a
confusing definition of DP, this chapter demonstrates how the
technique works for sequence alignment and then gets back to the
generalities. There are fundamentally two kinds of alignment: global
and local. In global alignment, both sequences
are aligned along their entire lengths and the best alignment is
found. In local alignment, only the best-matching pair of subsequences
is aligned. For example, if you want to find the two most
similar sentences between two books, you use local alignment. If you
want to compare the sentences end to end, use global alignment. This
chapter describes global alignment, then local alignment. The example
uses English words instead of biological sequences because the algorithms
are quite general.
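To make the distinction between global and local alignment concrete before the detailed walkthrough, here is a minimal Python sketch that computes only the best alignment score for each mode using dynamic programming. The function name alignment_score, the two example words, and the scoring values (+1 for a match, -1 for a mismatch, -1 for a gap) are assumptions chosen for illustration; they are not prescribed by this chapter.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Return the best alignment score between strings a and b.

    Illustrative sketch: the scoring values are assumed, not canonical.
    local=False scores the sequences end to end (global alignment).
    local=True scores the best-matching pair of subsequences.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # Local alignment: never let a cell drop below zero,
                # and remember the best cell seen anywhere.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]


print(alignment_score("COELACANTH", "PELICAN"))              # 0
print(alignment_score("COELACANTH", "PELICAN", local=True))  # 4
```

Under these assumed scores, the global score is low because the unmatched ends of both words are penalized, while the local score reflects only the best-matching region (ELACAN against ELICAN).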