10.4 Editing Scoring Matrices

The amino acid scoring matrix files distributed with NCBI-BLAST and WU-BLAST assign a score of +1 to paired stop codons. This doesn't make much biological sense, and it reduces the ability of TBLASTX to discriminate between coding and noncoding similarities. You should therefore edit the scoring matrices to give stop codon pairs a highly negative score. Be sure to edit the original matrices. The NCBI-BLAST scoring matrices are in the data directory; for WU-BLAST, they are in the matrix/aa directory. The final line of each scoring matrix file looks like this:

    * -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1

Just change the final number to -999:

    * -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -999

You have to do this only once, provided you remember to keep your edited matrices when updating your BLAST installation.

For any of the translating BLAST programs, you can also change all stop scores to highly negative values. When used in conjunction with ungapped extension, doing so prevents a lot of noncoding sequences from appearing in significant alignments. The following Perl script modifies the standard matrices:

    #!/usr/bin/perl
    # Reads a scoring matrix from the named file(s) or standard input and
    # writes a copy with every stop (*) score set to -999 to standard output.
    use strict;
    use warnings;

    while (<>) {
        if (/^#|^\s/) {
            # Comment lines and the column-header line pass through unchanged.
            print;
        }
        elsif (/^\*/) {
            # The stop row: all 24 columns become -999.
            print '*', ' -999' x 24, "\n";
        }
        else {
            # Any other row: replace the last field (the stop column) with -999.
            s/\S+\s*$/-999\n/;
            print;
        }
    }

Both NCBI-BLAST and WU-BLAST require matrices to have specific names. Unrecognized names cause NCBI-BLAST to terminate the search. WU-BLAST continues searching, but it employs ungapped values for λ, k, and H (it issues a warning to this effect). Try to keep the original names of the matrices, but store them in a location with an obvious name such as stop-999-matrices. Both versions of BLAST look for scoring matrices in the local directory, and on Unix systems, they recognize the BLASTMAT environment variable.
Therefore, prior to the search, you can either create a symbolic link (alias) to the scoring matrix of choice or set the BLASTMAT environment variable to point to the location of the specialized matrices. In the following examples, the derivative matrices are located in /my_computer/stop-999-matrices. The first example uses a symbolic link:

    ln -s /my_computer/stop-999-matrices/BLOSUM62 .
    blastall -p tblastx -d db -i query
    rm BLOSUM62

The second sets BLASTMAT for the duration of the search (setenv and unsetenv are csh commands):

    setenv BLASTMAT /my_computer/stop-999-matrices
    blastall -p tblastx -d db -i query
    unsetenv BLASTMAT

WU-BLAST users can use the altscore parameter to change the scores of any pair of letters rather than edit the matrix files. See Chapter 14 for more information on the altscore parameter.
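The one-number edit described at the start of this section, which changes only the stop-stop score on the final line of a matrix file, can also be scripted rather than done by hand. The following is a minimal sketch in POSIX sh (not the csh used in the examples above). It assumes the standard layout, in which the stop row begins with * and the stop column is the last field on that row. The function name penalize_stop_stop is hypothetical and is not part of either BLAST distribution.

```shell
#!/bin/sh
# Sketch only: penalize_stop_stop is a hypothetical helper, not part of BLAST.
# It reads a scoring matrix on standard input and writes it to standard
# output with the last field of the '*' row (the stop-stop score) replaced.
# Assumption: the stop column is the final field on that row.
penalize_stop_stop() {
    # $1 is the replacement score; it defaults to -999.
    score="${1:--999}"
    # Only lines beginning with '*' are edited; the last non-blank field and
    # any trailing whitespace are replaced by the new score.
    sed "/^\*/ s/[^[:space:]][^[:space:]]*[[:space:]]*\$/${score}/"
}
```

With the function defined in the current shell, a command such as penalize_stop_stop < BLOSUM62 > /my_computer/stop-999-matrices/BLOSUM62 would write an edited copy under the original matrix name, which is the name BLAST expects to find. Comment lines, the column-header line, and all other rows pass through unchanged, so the result differs from the input only in the single stop-stop score.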