8.20 How to Lie with BLAST Statistics
Several
techniques can help you massage BLAST statistics to either hide
significant alignments or make meaningless alignments appear highly
significant. Why would you want to do this? If you have to ask,
you're not the intended audience. Dishonest
evildoers, read on.
The easiest method to adjust the
significance of all scores is to set the effective size of the search
space either higher or lower. Both NCBI-BLAST (-Y) and
WU-BLAST (Y and Z) offer command-line
parameters for this. You can also alter the
scoring scheme by editing the scoring matrices. A more involved
approach is to hack the source code and set your own values for
λ, K, and H.
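The command-line route is a one-liner in either flavor. A minimal
sketch, assuming the classic blastall syntax and WU-BLAST's
name=value options; the query and database names are invented:

    # NCBI-BLAST: -Y asserts the effective search-space size.
    # A tiny value makes every E-value look more significant.
    blastall -p blastp -d nr -i query.fa -Y 1000

    # WU-BLAST: Y and Z assert the effective query and database
    # sizes used in the statistics.
    blastp nr query.fa Y=100 Z=10000

Every E-value downstream is then computed from the sizes you
asserted rather than the real ones.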
WU-BLAST makes it all too easy because you can alter scores or set
Karlin-Altschul parameters on the command line. Whatever approach you
take, you will, of course, want to edit the footer to cover your
tracks. The easiest way to do this is to run the search twice and
diff the footers to determine what needs fixing.
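For instance, with WU-BLAST you can assert the gapped values of
λ, K, and H directly and then diff against an honest run. A sketch,
assuming WU-BLAST's gapL, gapK, and gapH parameters; the file names
are hypothetical:

    # One honest run, one run with asserted Karlin-Altschul values.
    blastp nr query.fa > honest.out
    blastp nr query.fa gapL=0.27 gapK=0.047 gapH=0.23 > dishonest.out

    # Whatever differs in the footers is what you need to "fix."
    diff honest.out dishonest.out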
With low gap penalties, you can
make alignments between just about anything. For BLASTN, NCBI-BLAST
always uses ungapped statistics, so you don't have
to do much work to lie. Just hope that nobody notices all the gaps.
This works best if you have a supervisor who is either too busy to
look at alignments or wouldn't know a decent
alignment if it bit him. NCBI-BLAST is very restrictive about what
gap penalties you can employ for the protein-based BLAST programs.
Your only choice here is to hack and recompile. WU-BLAST makes it
easy: set your gap costs low and include warnings
on the command line to suppress the messages about ungapped statistics.
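Both routes fit on one line apiece. A sketch, assuming blastall's
-G (gap open) and -E (gap extend) costs and WU-BLAST's Q and R
equivalents; the warnings flag is the one just mentioned, and the
query and database names are invented:

    # NCBI-BLAST: BLASTN statistics stay ungapped regardless,
    # so cheap gaps cost you nothing statistically.
    blastall -p blastn -d genome -i query.fa -G 1 -E 1

    # WU-BLAST: Q opens a gap, R extends one. With costs this low,
    # almost any two sequences will produce an alignment.
    blastp nr query.fa Q=1 R=1 warnings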
Another way to trick the
unobservant is to remove complexity filters. This works especially
well when claiming that some anonymous low-complexity region or
transcript is a cool gene. You can almost always find a small ORF
that has a poor match to something with an interesting definition
line. A poor match is only poor if you don't know
how to fix the statistics. This approach can even fool
scientific journals. (It really does. We've seen it
happen.)
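The filter-free search itself is a one-liner. A sketch, assuming the
classic blastall syntax; orf.fa stands in for the anonymous
transcript:

    # -F F turns off low-complexity filtering (SEG for proteins),
    # leaving junk regions free to hit something with an exciting
    # definition line.
    blastall -p blastp -d nr -i orf.fa -F F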