10.4 Editing Scoring Matrices

The amino acid scoring matrix files distributed with NCBI-BLAST and WU-BLAST assign a score of +1 to paired stop codons. This makes little biological sense and reduces the ability of TBLASTX to discriminate between coding and noncoding similarities. You should therefore edit the scoring matrices so that stop codon pairs receive a highly negative score. Be sure to edit the original matrices. The NCBI-BLAST scoring matrices are in the data directory; for WU-BLAST, they are in the matrix/aa directory. The final line of each scoring matrix file looks like this:

* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1 

Just change the final number to -999:

* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -999

You have to do this only once if you remember to keep your edited matrices when updating your BLAST installation.
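
The same one-number change can also be scripted. The following is a minimal Bourne shell sketch, assuming the stop row begins with * and ends with a score of 1. It works on a toy three-letter matrix (hypothetical data, not a real BLOSUM file) so that it runs anywhere, and the .orig backup is only an illustrative convention, not something BLAST requires:

```shell
# Toy stand-in for a matrix file; a real one has 24 score columns.
tmpdir=$(mktemp -d)
matrix="$tmpdir/BLOSUM62"
printf '%s\n' \
  '# toy matrix (hypothetical, 3 letters only)' \
  '   A  R  *' \
  'A  4 -1 -4' \
  'R -1  5 -4' \
  '* -4 -4  1' > "$matrix"

# Keep a backup, then rewrite only the final score of the row starting with '*'.
cp "$matrix" "$matrix.orig"
sed 's/^\(\*.*[[:space:]]\)1[[:space:]]*$/\1-999/' "$matrix.orig" > "$matrix"

# Show the edited stop row.
tail -n 1 "$matrix"
```

Because the pattern is anchored to a leading * and a trailing 1, all other rows pass through untouched, and running it again on an already edited file changes nothing.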

For any of the translating BLAST programs, you can also change all stop scores to highly negative values. If used in conjunction with ungapped extension, doing so prevents a lot of noncoding sequences from appearing in significant alignments. The following Perl script modifies the standard matrices:

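#!/usr/bin/perl
# Set every stop (*) score in an amino acid scoring matrix to -999.
# Reads the matrix from the named file(s) or STDIN; writes to STDOUT.
use strict;
use warnings;

while (<>) {
    if (/^#|^\s/) {
        # Comment lines and the column-header line pass through unchanged.
        print;
    }
    elsif (/^\*/) {
        # Stop row: one -999 for each of the 24 columns in the standard matrices.
        print '*', ' -999' x 24, "\n";
    }
    else {
        # Any other row: replace the last score (the * column) with -999.
        s/\S+\s*$/-999\n/;
        print;
    }
}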
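
After generating a modified matrix, it may be worth confirming that no stop score was missed. The following is a rough sketch of such a check, assuming * is the last column as in the standard matrix files. The check_stops function name and the toy three-letter matrix are hypothetical and exist only for illustration:

```shell
# Toy 3-letter matrix standing in for the script's output (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' \
  '# toy matrix' \
  '   A  R  *' \
  'A  4 -1 -999' \
  'R -1  5 -999' \
  '* -999 -999 -999' > "$tmpdir/BLOSUM62"

# Report any stop-row or stop-column score that is not -999.
# Exit status is 0 when the matrix is clean and 1 otherwise.
check_stops() {
  awk '
    /^#/ || /^[[:space:]]/ || NF == 0 { next }   # comments, header, blank lines
    $1 == "*" {
      # Stop row: every score must be -999.
      for (i = 2; i <= NF; i++)
        if ($i != -999) { print "row *: found " $i; bad = 1 }
      next
    }
    # Other rows: only the last column (the * column) is checked.
    $NF != -999 { print "row " $1 ": found " $NF; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

check_stops "$tmpdir/BLOSUM62" && echo "all stop scores are -999"
```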

Both NCBI-BLAST and WU-BLAST require matrices to have specific names. An unrecognized name causes NCBI-BLAST to terminate the search. WU-BLAST continues searching, but it employs ungapped values for lambda, K, and H (and issues a warning to this effect). Keep the original matrix names, but store the modified files in a directory with an obvious name, such as stop-999-matrices. Both versions of BLAST look for scoring matrices in the local directory, and on Unix systems they recognize the BLASTMAT environment variable. Therefore, prior to the search, you can either create a symbolic link (alias) to the scoring matrix of choice or set the BLASTMAT environment variable to point to the location of the specialized matrices. In the following examples, the derivative matrices are located in /my_computer/stop-999-matrices. The first example uses a symbolic link, and the second sets BLASTMAT using csh syntax:

ln -s /my_computer/stop-999-matrices/BLOSUM62 .
blastall -p tblastx -d db -i query
rm BLOSUM62

setenv BLASTMAT /my_computer/stop-999-matrices
blastall -p tblastx -d db -i query
unsetenv BLASTMAT
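
The setenv and unsetenv commands are specific to csh-style shells. In a Bourne-style shell (sh, bash), one possible equivalent is to set BLASTMAT for a single command only, so nothing needs to be unset afterward. The run_with_matrices wrapper below is a hypothetical helper, sketched under the assumption that the matrices live in the same directory as in the examples above:

```shell
# Directory holding the modified matrices (path taken from the examples above).
matrix_dir=/my_computer/stop-999-matrices

# Hypothetical helper: run one command with BLASTMAT set only for that command.
# Intended for external programs such as blastall, not shell functions.
run_with_matrices() {
  BLASTMAT="$matrix_dir" "$@"
}

# Example invocation (requires blastall on the PATH):
# run_with_matrices blastall -p tblastx -d db -i query
```

Scoping the variable to one command avoids accidentally leaving the modified matrices in effect for later searches in the same session.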

WU-BLAST users can use the altscore parameter to change the scores of any pair of letters rather than edit the matrix files. See Chapter 14 for more information on the altscore parameter.
