8.6 When Troubleshooting, Read the Footer First
Novices usually focus on the one-line summaries, regular users
concentrate on the alignments and their statistics, and professionals
first read the footer. When it comes to solving the two most common
problems, no hits and too many hits, the one-line summaries aren't
much help. Regular users can often look at alignments and diagnose
compositional biases and unidentified repeats, but determining the
cause of no hits isn't easy. The best way to find out what happened is
to examine the footer and see what the search was actually looking
for. Always answer the following questions first:
What are the values for the seeding parameters W, T, and two-hit
distance? If the seeding parameters are too stringent, divergent
alignments may not be seeded. In NCBI-BLAST, W is unfortunately not
displayed in the footer. The values for T and the two-hit distance are
given as T: and A:, respectively.
What is the scoring scheme expecting to
find (i.e., target frequency)? If the scoring matrix expects nearly
identical sequences, highly divergent sequences may be missed.
What is the alignment threshold? If the alignment threshold is too
high, low scoring alignments will be thrown away. The gapped and
ungapped values are given after S1: and S2: in NCBI-BLAST. In
WU-BLAST, they are on the rows beneath S2.
What are B and V set to? If they are set too low, the number of
one-line summaries and database hits may be truncated.
What are the score and expected length of a significant alignment? Use
the Karlin-Altschul equation to solve for the normalized score and
then divide by H, the relative entropy of the scoring scheme, to
calculate the length.
Was complexity filtering employed, and if
so, was it hard or soft? Complexity filtering is generally a good
idea, but may prevent some sequences from generating significant
alignments. NCBI-BLAST doesn't currently report
which filters were employed.
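
The score-and-length question above is easy to answer with a few lines of code. The following is a minimal sketch, not part of any BLAST distribution: it assumes the Karlin-Altschul equation in the form E = K m n e^(-lambda S), takes lambda, K, and H from the footer, and treats H as being in nats per aligned position. The function name and the query and database sizes are hypothetical and chosen only for illustration.

```python
import math

def significant_alignment(k, lam, h, m, n, e_value):
    """Estimate the score and length of an alignment at a given E-value.

    Assumes E = K * m * n * exp(-lambda * S), with H in nats per
    aligned position. m and n are the (effective) query and database
    lengths.
    """
    # Solve the Karlin-Altschul equation for the normalized score
    # lambda * S, expressed in nats.
    normalized = math.log(k * m * n / e_value)
    # Convert the normalized score back to a raw score.
    raw = normalized / lam
    # Each aligned position contributes about H nats on average, so the
    # expected length is the normalized score divided by H.
    length = normalized / h
    return normalized, raw, length

if __name__ == "__main__":
    # lambda, K, and H are the commonly reported ungapped values for
    # BLOSUM62; m and n are hypothetical search-space dimensions.
    nats, raw, length = significant_alignment(
        k=0.134, lam=0.318, h=0.401, m=300, n=100_000_000, e_value=10.0
    )
    print(f"normalized score: {nats:.1f} nats")
    print(f"raw score: {raw:.0f}")
    print(f"expected length: {length:.0f} positions")
```

If the expected length comes out longer than the region you hope to find, or the required score is higher than a divergent match could reach, the search as configured is unlikely to report it.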