8.16 Parse BLAST Reports with Bioperl
The traditional BLAST output format is meant to be human readable, but
when your BLAST report is 1,000 pages long, it isn't
much fun to read. Sometimes all you want is the names of all
sequences that have alignments above 90 percent identity. Such tasks
require a BLAST parser that lets you select only the information you
want. Many freely available BLAST parsers can be downloaded from the
Internet, but the ones in most common use come from the Bioperl
project. Bioperl is an open-source community of bioinformatics
professionals that develops and maintains code libraries and
applications written in the Perl programming language. If your daily
routine finds you running BLAST or other sequence analysis
applications, learning to use the Bioperl system can save you many
hours of work and frustration.
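Let's see how Bioperl can help solve the problem
posed earlier: to report the names of all sequences that are 90
percent or more identical to your query. The script takes the name of
a BLAST report on the command line, steps through every hit and every
HSP (high-scoring pair) within each hit, and records the hit's name
whenever one of its HSPs reaches the threshold. Storing the names as
hash keys means each name is printed only once.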
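#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

# Open the BLAST report named on the command line
my $blast = Bio::SearchIO->new(
    -format => 'blast',
    -file   => $ARGV[0]);

my %Name;                            # hit names that pass the threshold
my $result = $blast->next_result;    # first result (query) in the report
while (my $sbjct = $result->next_hit) {
    while (my $hsp = $sbjct->next_hsp) {
        # frac_identical is a fraction between 0 and 1
        $Name{$sbjct->name} = 1 if $hsp->frac_identical >= 0.9;
    }
}

# Print each matching name once, in sorted order
print join("\n", sort keys %Name), "\n";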
Pretty simple, huh? With BLAST and Bioperl, it's
possible to create all kinds of useful applications.
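As one example of such an application, here is a sketch that extends the script to a report containing more than one query. The original script calls next_result only once, so it sees only the first query in the report. The version below loops over every result and prints the query name next to each hit name. The subroutine hits_by_query and its arguments are hypothetical names introduced for illustration and are not part of Bioperl; the sketch also assumes that each result reports its query through query_name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): collect the names of hits that
# have at least one HSP at or above $min_identity, grouped by query name.
# $searchio is any object that behaves like a Bio::SearchIO parser.
sub hits_by_query {
    my ($searchio, $min_identity) = @_;
    my %names;    # query name => { hit name => 1 }
    while (my $result = $searchio->next_result) {
        my $query = $result->query_name;
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                next unless $hsp->frac_identical >= $min_identity;
                $names{$query}{ $hit->name } = 1;
                last;    # one qualifying HSP is enough for this hit
            }
        }
    }
    return \%names;
}

# Run only when a report file is given on the command line
if (@ARGV) {
    require Bio::SearchIO;
    my $blast = Bio::SearchIO->new(
        -format => 'blast',
        -file   => $ARGV[0]);
    my $names = hits_by_query($blast, 0.9);
    for my $query (sort keys %$names) {
        print "$query\t$_\n" for sort keys %{ $names->{$query} };
    }
}
```

Queries with no qualifying hits produce no output lines. Passing the threshold as an argument rather than hard-coding 0.9 makes it easy to reuse the same loop with a different cutoff.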