13.3 blastall Parameters
blastall is controlled by several parameters. Many of the parameters have
default settings and don't need to be explicitly
assigned. Consider this simple command:
blastall -p blastp
Behind the scenes, this command is converted to:
blastall -p blastp -d nr -i stdin -e 10 -m 0 -o stdout -F T -G 11 -E 2 -X 15 -v 500
-b 250 -f 11 -g T -a 1 -M BLOSUM62 -W 3 -z 0 -K 0 -Y 0 -T F -U F -y 0.0 -Z 0 -A 40
You can see that many parameters are set without your express
knowledge. These parameters affect the results of your experiment
and, as reinforced many times throughout the book, you should try to
understand these parameters and set them to fit each experiment.
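To make the hidden defaults concrete, the following Python sketch rebuilds the expanded command from a table of defaults and lets you override individual values. The default values are copied from the expanded blastp command shown above; the helper name explicit_command is hypothetical, and the sketch only prints a command line rather than running blastall.

```python
import shlex

# Defaults copied from the expanded blastp command shown in the text.
BLASTP_DEFAULTS = {
    "-d": "nr", "-i": "stdin", "-e": "10", "-m": "0", "-o": "stdout",
    "-F": "T", "-G": "11", "-E": "2", "-X": "15", "-v": "500",
    "-b": "250", "-f": "11", "-g": "T", "-a": "1", "-M": "BLOSUM62",
    "-W": "3", "-z": "0", "-K": "0", "-Y": "0", "-T": "F", "-U": "F",
    "-y": "0.0", "-Z": "0", "-A": "40",
}

def explicit_command(overrides):
    """Return a blastp command line with every default written out."""
    options = {**BLASTP_DEFAULTS, **overrides}   # overrides win over defaults
    args = ["blastall", "-p", "blastp"]
    for flag, value in options.items():
        args += [flag, value]
    return shlex.join(args)

# Override only the parameters this experiment cares about.
print(explicit_command({"-e": "1e-5", "-b": "1000"}))
```

Saving the fully spelled-out command alongside your results is one way to record exactly which settings an experiment used.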
The following reference section explains
all the parameters available for blastall and
lists the default values that are used when a parameter isn't set
explicitly. The table was compiled according to the default values for
the five basic programs. Although megablast can be run from
within blastall (-n T), you should use the standalone program. The
parameters for it are presented later in the chapter.
Sets the number of processors to use.
If you have multiple queries, you will get better
throughput by executing multiple BLAST searches. For insensitive
searches such as default BLASTN, setting -a to a
higher value may not appreciably improve speed if disk I/O is the
bottleneck.
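One way to follow the advice about running multiple searches is to split a multi-sequence query file into several smaller files and start one blastall process per file. The Python sketch below does only the splitting; the function name and the round-robin strategy are illustrative choices, not part of BLAST.

```python
def split_fasta(text, n_chunks):
    """Distribute FASTA records round-robin into n_chunks lists of records."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])        # a definition line starts a new record
        elif records:
            records[-1].append(line)      # sequence line of the current record

    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append("\n".join(record))
    return chunks

# Hypothetical three-sequence query file split for two parallel searches.
sample = ">a\nACGT\n>b\nGG\nCC\n>c\nTT\n"
for number, chunk in enumerate(split_fasta(sample, 2)):
    print(f"chunk {number}: {len(chunk)} sequence(s)")
```

Each chunk can then be written to its own file and passed to a separate search with -i.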
Default: blastn 0, others 40 | Programs: All |
Sets the multiple-hit window size.
When BLAST is set to two-hit mode, this option requires two word hits
on the same diagonal to be within [integer]
letters of each other in order to extend from either one. The larger
the [integer], the more sensitive BLAST will be.
Setting [integer] to 0 sets the default behavior
of 40, except for blastn, whose default is
single word hit. To specify one-hit behavior, set
-P 1.
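The two-hit rule can be sketched in a few lines of Python. This is a simplified model, not the BLAST implementation: it treats each word hit as a (query position, subject position) pair, defines the diagonal as the difference between the two positions, and assumes "within [integer] letters" means a distance no greater than the window. The function name and the sample hits are hypothetical.

```python
from collections import defaultdict

def two_hit_seeds(word_hits, window=40):
    """Return (diagonal, first, second) for neighboring hits close enough to extend."""
    by_diagonal = defaultdict(list)
    for query_pos, subject_pos in word_hits:
        by_diagonal[subject_pos - query_pos].append(query_pos)

    pairs = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        # Compare each hit with the next hit on the same diagonal.
        for first, second in zip(positions, positions[1:]):
            if second - first <= window:
                pairs.append((diagonal, first, second))
    return pairs

hits = [(10, 110), (30, 130), (200, 300), (270, 370), (50, 90)]
print(two_hit_seeds(hits, window=40))   # only the closest pair qualifies
print(two_hit_seeds(hits, window=80))   # a wider window admits more pairs
```

Widening the window lets more hit pairs trigger an extension, which is why larger values make the search more sensitive.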
Default: 250 | Programs: All |
Truncates the report to [integer] alignments.
There is no warning when you exceed this limit, so
it's generally a good idea to set
[integer] very high unless you're
interested only in the top hits.
Default: Optional | Programs: blastn, tblastn |
Sets the number of queries to
concatenate in a single search. Concatenating queries accelerates the
search because the database is scanned just one time. This is the
principle underlying megablast, but the
implementation is different in blastall.
This option is new in Version 2.2.6 and still experimental. The
specified [integer] must be the number of
sequences in the query file. If it's less, only the
first set of [integer] sequences is used. Also,
the output is very different from what you would expect. All the query
names are listed, and then all the one-line summaries are given,
followed by the alignments, and finally, one footer is produced for
the whole report. Given this format, it's very
difficult to discern which alignments belong to which query. This
option should not be used in its current implementation.
Identifies the database to search.
[database] must already be formatted by
formatdb. BLAST looks for
[database] in the following order: the local
directory, the BLASTDB environment variable (Unix only), and finally,
the location specified in the .ncbirc file.
You can merge multiple databases into a
single virtual database by listing the individual databases together
inside quotes. For example, to merge the nt and
est databases, use: -d "nt est". You
can't mix nucleotide and amino acid databases. The
statistics reported are based on the sizes of the combined databases.
Virtual databases may exceed file size limits imposed by the
operating system.
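The quotes matter because the database names must reach blastall as a single argument. The Python sketch below builds such an argument list; the helper name blastall_args is hypothetical, and nothing is executed.

```python
import shlex

def blastall_args(program, databases, query):
    """Build a blastall argument list; several databases become ONE -d argument."""
    return ["blastall", "-p", program, "-d", " ".join(databases), "-i", query]

args = blastall_args("blastn", ["nt", "est"], "query")
print(args)               # "nt est" is a single list element
print(shlex.join(args))   # shell form, with the database list quoted
```

If you launch the search from a script, passing the list directly to subprocess.run keeps "nt est" together without any shell quoting.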
Default: 1 | Programs: tblastn, tblastx |
Options
- 1: Standard Nuclear Genetic Code
- 2: Vertebrate Mitochondrial
- 3: Yeast Mitochondrial
- 4: Mold, Protozoan, and Coelenterate Mitochondrial
- 5: Invertebrate Mitochondrial
- 6: Ciliate Nuclear
- 9: Echinoderm Mitochondrial
- 10: Euplotid Nuclear
- 11: Bacterial and Plant Plastid
- 12: Alternative Yeast Nuclear
- 13: Ascidian Mitochondrial
- 14: Flatworm Mitochondrial
- 15: Blepharisma Nuclear
- 16: Chlorophycean Mitochondrial
- 21: Trematode Mitochondrial
- 22: Scenedesmus obliquus Mitochondrial
- 23: Thraustochytrium Mitochondrial
Sets the
threshold expectation value for keeping alignments. This is the
E from the Karlin-Altschul equation that
describes how often an alignment with a given score is expected to
occur at random.
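The Karlin-Altschul equation is commonly written E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths, S is the raw alignment score, and K and lambda depend on the scoring system. The Python sketch below computes E for a few scores and applies a threshold in the way the expectation cutoff does. The values of K and lambda, the lengths, and the assumption that alignments at or below the threshold are kept are placeholders for illustration, not the numbers blastall would use for your search.

```python
import math

def expect_value(score, query_len, db_len, k, lam):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)

def keep_alignments(scores, threshold, **stats):
    """Keep the scores whose E-value does not exceed the threshold."""
    return [s for s in scores if expect_value(s, **stats) <= threshold]

# Placeholder statistics, for illustration only.
stats = dict(query_len=300, db_len=1_000_000, k=0.1, lam=0.3)
for score in (40, 60, 80):
    print(score, expect_value(score, **stats))

print(keep_alignments([40, 60, 80], threshold=10, **stats))
```

Higher scores give exponentially smaller E-values, so a modest change in the threshold can admit or exclude many low-scoring alignments.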
Default: blastn 2, others 1 | Programs: All |
The penalty for each gap character.
The -G parameter controls the initial cost of
opening a gap. Note that -E 0
is synonymous with the default behavior, and it's
impossible to set -E to zero unless
-g F is set, which turns
gapping off. The default gap cost, for programs other than
blastn, depends on the scoring matrix. The value
shown here is for the default BLOSUM62 matrix. See Appendix C for a complete list of default and legal gap
penalties.
Defaults: blastp 11, blastx 12, tblastn 13, tblastx 13 | Programs: blastp, blastx, tblastn, tblastx |
Neighborhood word threshold score. Only
those words scoring equal to or greater than
[integer] will seed alignments.
Default: T, but see below | Programs: All |
Filters the query sequence for
low-complexity subsequences. The default setting is
T. Complexity filtering is generally a good idea,
but it may break long HSPs into several smaller HSPs due to
low-complexity segments. This can cause some alignments to fall below
the significance threshold and be lost. To prevent this, either turn
off filtering (not recommended) or use soft masking, in which the
filter is used only in the word seeding phase, but not the extension
phase.
The [string] form of the argument
follows a nonintuitive syntax. If the string begins with an
m, soft masking is turned on. Filtering programs
are specified by a single capital letter: D for
DUST, R for human repeats,
V for vector sequences, S for
SEG, and C for
coiled-coil. D,
R, and V are used only for
blastn searches, and S and
C are used for all other programs. More than one
filter may be specified, and additional parameters may be passed to
the programs. See the following tables and the -U
parameter used for filtering lowercase letters in the query sequence.
To use R or V, the correct
database files must be downloaded and installed in the BLASTDB
directory. For human repeats, three databases are needed:
humlines.lib, humsines.lib,
and retrovir.lib. For vector filtering, use the
UniVec_Core database (ftp://ftp.ncbi.nih.gov/pub/UniVec/).
String options for blastn
No complexity filter | -F ""
Default (DUST) | -F "D"
Soft masking | -F "m D"
Lowercase soft masking | -F "m" -U
Soft masking of DUST and lowercase letters | -F "m D" -U
Mask human repeats | -F "R"
Mask vector sequences | -F "V"
Soft masking of human repeats and vector | -F "m R;V"

String options for blastp, blastx, tblastn, and tblastx
No complexity filter | -F ""
Default (SEG) | -F "S"
Soft masking | -F "m S"
Lowercase soft masking | -F "m" -U
Coiled-coil | -F "C"
SEG plus coiled-coil | -F "S;C"
SEG with settings for windowsize, locut, and hicut | -F "S 10 1.0 1.5"
As above, plus coiled-coil and soft masking (including lowercase) | -F "m S 10 1.0 1.5; C" -U
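Because the filter string syntax is easy to get wrong, it can help to assemble it from parts. The Python sketch below follows the rules described above: an optional leading m for soft masking, one capital letter per filter, and semicolons between filters. The function name is hypothetical, and the sketch joins filters with a bare semicolon, as in "S;C" and "m R;V" from the tables.

```python
NUCLEOTIDE_FILTERS = {"D", "R", "V"}   # used only for blastn searches
PROTEIN_FILTERS = {"S", "C"}           # used for all other programs

def filter_string(program, filters, soft_mask=False):
    """Compose a -F argument such as "m S 10 1.0 1.5;C" from its parts."""
    allowed = NUCLEOTIDE_FILTERS if program == "blastn" else PROTEIN_FILTERS
    for spec in filters:
        if spec[0] not in allowed:       # the first letter names the filter
            raise ValueError(f"filter {spec[0]!r} is not used by {program}")
    body = ";".join(filters)
    return f"m {body}".rstrip() if soft_mask else body

print(filter_string("blastn", ["D"], soft_mask=True))
print(filter_string("blastp", ["S 10 1.0 1.5", "C"], soft_mask=True))
print(filter_string("blastn", ["R", "V"], soft_mask=True))
```

The check against the program name catches a common mistake, such as requesting DUST for a protein search.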
Default: T | Programs: blastn, blastp, blastx, tblastn |
Performs gapped alignment.
Setting this to F invokes the older, ungapped
style of alignment. You can't perform gapped
alignments with tblastx, regardless of this
setting.
Defaults: blastn 5, others 11 | Programs: All |
Initial
penalty for opening a gap of length 0. The penalty for extending the
gap is controlled by the -E parameter.
-G 0 invokes the default
behavior, and setting -G to zero is impossible
unless -g F is set, which turns
gapping off. The default gap costs for programs other than
blastn depend on the scoring matrix; the value
here is for the default BLOSUM62 matrix. See Appendix C for a complete list of default and legal gap
penalties.
Default: stdin | Programs: All |
If -i isn't included on the
command line, BLAST expects input from stdin
(i.e., it will wait indefinitely for you to type in a FASTA file from
the keyboard). The following commands are therefore equivalent:
blastall -p blastn -d nt -i query
blastall -p blastn -d nt < query
cat query | blastall -p blastn -d nt
cat query | blastall -p blastn -d nt -i stdin
If the input file contains multiple sequences, BLAST will be run on
each sequence in order, and the resulting output will contain
concatenated BLAST reports.
Shows GenInfo Identifier (GI) numbers in
definition lines. A GI is a unique numeric identifier assigned for a
sequence in GenBank. A GI corresponds to an accession.version pair.
Believe the query defline.
Default: 0 - Off | Programs: All |
The number of best hits from a region to keep. This option is useful
when you want to limit the number of alignments that might pile up in
one section of the query. This is most useful if the settings of
-b or -v are low, and the
abundant alignments push lower scoring alignments off the end of the
report. If set, a value of 100 is recommended.
Default: Optional | Programs: All |
Restricts database search to a list of GIs found in
[file]. The database sequences must have
NCBI-compliant identifiers, including GI numbers, and the database
must be indexed (by running formatdb with the
-o option). The [file] must be
in the same directory as the database or in the directory from which
blastall is called. [file]
may be in text format with one GI per line or in binary format (see
the -B parameter for
formatdb).
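A text-format GI list is simple to generate. The Python sketch below writes one GI per line and drops duplicates; the function name, the file name, and the GI numbers are hypothetical, and the temporary directory is used only so the example is self-contained.

```python
import os
import tempfile

def write_gi_list(gis, path):
    """Write one GI per line, dropping duplicates while keeping order."""
    seen = set()
    with open(path, "w") as handle:
        for gi in gis:
            gi = int(gi)                 # GIs are plain integers
            if gi not in seen:
                seen.add(gi)
                handle.write(f"{gi}\n")

# Hypothetical GI numbers, written to a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "subset.gi")
write_gi_list([12345, "67890", 12345], path)
with open(path) as handle:
    print(handle.read())
```

For a real search, write the file into the database directory or the directory from which blastall is called, as described above.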
Default: Optional | Programs: All |
Specifies a location on the query sequence. This lets you limit the search to a
subsequence of the query sequence. For example, to search just the
letters from 21 to 50, add the following parameter:
-L "21,50"
The alignments won't extend outside the specified
region. In older versions of BLAST, -L set the
size of the region under control of the -K
parameter.
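The Python sketch below formats the range argument and shows which letters it selects. It assumes positions are counted from 1 and that both endpoints are included, which matches the "letters from 21 to 50" wording above; the function names and the sample query are hypothetical.

```python
def location_argument(start, end):
    """Format a query range for -L, e.g. (21, 50) -> "21,50"."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return f"{start},{end}"

def searched_region(sequence, start, end):
    """Return letters start..end, counting from 1 and including both ends."""
    return sequence[start - 1:end]

query = "ACGT" * 20                       # hypothetical 80-letter query
print(location_argument(21, 50))
print(len(searched_region(query, 21, 50)))
```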
Sets the alignment viewing options. Appendix C
gives examples of these display options.
Options
- 0: Pairwise
- 1: Query-anchored, showing identities, no gaps in query (gaps are shown as a tree-like structure in subjects), identities shown as ".", positives uppercase, negatives lowercase
- 2: Query-anchored, no identities, no gaps in query, negatives lowercase
- 3: Flat query-anchored, show identities, padding through all sequences
- 4: Flat query-anchored, no identities, padding through all sequences
- 5: Query-anchored, no identities and blunt ends (dashes [-] are used to blunt the ends)
- 6: Flat query-anchored, no identities and blunt ends ([-] to ends)
- 7: XML output
- 8: Tabular
- 9: Tabular with comment lines
- 10: ASN.1 in text format (-J must be set for this option to work)
- 11: ASN.1 in binary format (-J must be set for this option to work)
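The tabular formats (options 8 and 9) are the easiest to process in scripts. The Python sketch below parses them, assuming the usual twelve tab-separated columns of NCBI tabular output (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value, bit score) and assuming comment lines begin with #. Neither detail is stated in this section, and the sample record is made up.

```python
from collections import namedtuple

FIELDS = ("query_id subject_id percent_identity alignment_length mismatches "
          "gap_openings q_start q_end s_start s_end e_value bit_score")
Hit = namedtuple("Hit", FIELDS)

def parse_tabular(lines):
    """Parse tabular BLAST output, skipping blank lines and # comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]),
                        *map(int, f[3:10]),          # seven integer columns
                        float(f[10]), float(f[11])))
    return hits

# Made-up record for illustration.
sample = [
    "# Fields: Query id, Subject id, ...",
    "query1\tsubjectA\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-100\t370",
]
for hit in parse_tabular(sample):
    print(hit.subject_id, hit.percent_identity, hit.e_value)
```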
Default: BLOSUM62 | Programs: All except blastn |
Designates a protein similarity matrix. This is used in all BLAST
programs except blastn. Matrices are sought in
the following order: in the local directory, in the location
specified in the .ncbirc file, in a local data
directory, and finally, in the BLASTMAT environment variable (only on
Unix systems). Other matrices included in the standard distribution
include BLOSUM45, BLOSUM80, PAM30, and PAM70.
You can use custom matrix files, but
doing so requires modifying the source code, defining the new matrix
with all of its associated statistics for different affine gap
combinations, and recompiling the binary. Using these custom files
isn't recommended because it requires the arduous
task of calculating gapped values for lambda and maintaining a
derivative branch of the source code.
Default: F | Programs: megablast |
Sets the blastn
program to the megablast mode, which is
optimized to find near identities very quickly. The following lines
are equivalent:
blastall -p blastn -n T -d est -i my_file
megablast -d est -i my_file -D 2
More program options are available if you run the
megablast executable (see Section 13.6).
Default: Optional | Programs: All |
Designates an output file for the search results. If not used, output
is printed to stdout. The following commands are
equivalent:
blastall -p blastn -d nr -i query -o output
blastall -p blastn -d nr -i query > output
Default: None, required parameter | Choices: blastn, blastp, blastx, tblastn, tblastx, psitblastn |
When
choosing psitblastn, the -R
[checkpoint file] must also be
specified. This special use of blastall uses the
output PSSM checkpoint file of PSI-BLAST (see
blastpgp -C option), combined
with the protein query sequence, to implement a
tblastn search against a nucleotide database.
Default: blastn 1, others 0 | Programs: All |
Specifies the two-hit or single-hit
algorithm. The two-hit option requires two word hits on the same
diagonal to extend from either one. When set to two-hit mode, the
-A parameter specifies how close the two hits have
to be to trigger extension.
Options
- 0: Two hit
- 1: Single hit
Default: -3 | Programs: blastn only |
Sets the penalty for a nucleotide mismatch. Also see
-r. The values chosen for
-q and -r are very important
because they determine your target frequencies. The default values,
-r 1 -q -3,
are most effective for aligning sequences that are 99 percent
identical. See Appendix B for more information on
nucleotide scoring schemes.
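The link between match/mismatch scores and target frequencies can be checked numerically. The Python sketch below assumes all four bases are equally frequent (0.25 each), solves the ungapped scaling condition sum(p_i * p_j * exp(lambda * s_ij)) = 1 for lambda by bisection, and then sums the target frequencies of the identical pairs. The function name and the uniform-composition assumption are illustrative simplifications.

```python
import math

def target_identity(match, mismatch, base_freq=0.25):
    """Return (lambda, target fraction of identities) for a match/mismatch scheme."""
    p_match = 4 * base_freq ** 2          # chance that two random bases agree
    p_mismatch = 1 - p_match

    def f(lam):
        return (p_match * math.exp(lam * match)
                + p_mismatch * math.exp(lam * mismatch) - 1)

    # f is negative just above zero and positive for large lambda,
    # so bisection finds the single positive root.
    lo, hi = 1e-6, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return lam, p_match * math.exp(lam * match)

for match, mismatch in [(1, -3), (1, -1)]:
    lam, identity = target_identity(match, mismatch)
    print(f"+{match}/{mismatch}: lambda={lam:.3f}, target identity={identity:.3f}")
```

Under these assumptions the default +1/-3 scheme gives a target identity close to 0.99, consistent with the statement above, while +1/-1 targets alignments that are about 75 percent identical.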
Default: 1 | Programs: blastx, tblastx |
Genetic code to use for translation of the query nucleotide sequence.
See the -D parameter for list of genetic codes.
Default: 1 | Programs: blastn only |
Sets the score of a nucleotide match. See the -q
parameter and Appendix B.
Default: Optional | Programs: psitblastn |
Designates the PSI-BLAST checkpoint
file to be used in the psitblastn search.
-p must be set to psitblastn.
The input must be a protein sequence and be the same one used with
blastpgp -C to generate the
[checkpoint file].
Default: 3 | Programs: blastn, blastx, tblastx |
Chooses which strand of DNA-based queries is searched.
Options
- 1: Top strand
- 2: Bottom strand
- 3: Both strands
For example, the following command searches only the
query's top strand:
blastall -p blastn -d nr -i query -S 1
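To see what each strand option means for the query, the Python sketch below treats the top strand as the sequence as given and the bottom strand as its reverse complement. The function names are hypothetical, and ambiguity codes are left untouched for simplicity.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(sequence):
    """Return the bottom strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]

def strands_searched(sequence, strand_option):
    """Map the strand option (1, 2, or 3) to the sequences that get searched."""
    top, bottom = sequence, reverse_complement(sequence)
    return {1: [top], 2: [bottom], 3: [top, bottom]}[strand_option]

print(strands_searched("ATGCCA", 1))
print(strands_searched("ATGCCA", 3))
```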
Length of the largest intron allowed
in tblastn for linking HSPs. A default of 0
means that linking is turned off.
Produces HTML output with <anchor> links from the summary at
the top of the report to the alignments farther below. This option
should be used only with the standard report format
(-m 0).
Default: 500 | Programs: All |
Sets the number of database
sequences for which to show the one-line summary descriptions at the
top of a BLAST report. You won't be warned if you
exceed [integer]. Also see the
-b parameter.
Default: 0 | Programs: blastx only |
Sets the frame shift penalty for the
Out Of Frame (OOF) algorithm of blastx. When
-w is set, it invokes the OOF mode of
BLAST, which lets alignments proceed across
reading frames. The expect values calculated from OOF
blastx are only approximate, and BLAST issues
the following warning when OOF is invoked:
[NULL_Caption] WARNING: test500: Out-of-frame option
selected, Expect values are only approximate and
calculated not assuming out-of-frame alignments
The out-of-frame alignments are signified by slashes that indicate
the +1 (/), +2 (//),
-1 (\), and -2 (\\) frameshifts.
The following is a sample OOF alignment:
Query: 23 PLIRNSL/YCINC\\A//QSIIRAHVKGPYLTRWVVNC/E\TCSKGYAKTPGASTDLLLL 160
PLIRNSL YCINC QSIIRAHVKGPYLTRWVVNC TCSKGYAKTPGASTDLLLL
Sbjct: 1 PLIRNSL YCINC X QSIIRAHVKGPYLTRWVVNC X TCSKGYAKTPGASTDLLLL 53
Query: 161 YKTRNSLTSASSLSPVRSQRMI/N\SFPRFQGHLVVSG/S\SAHNR/FS\FNRDSPRGSG 322
YKTRNSLTSASSLSPVRSQRMI SFPRFQGHLVVSG SAHNR F FNRDSPRGSG
Sbjct: 54 YKTRNSLTSASSLSPVRSQRMI X SFPRFQGHLVVSG X SAHNR FX FNRDSPRGSG 107
Query: 323 SYCSREPMGQIKIRRTHTDDKLFR/ND\SRHTRAGDGLNI//TLA\\RDPSFLSRVYNAN 484
SYCSREPMGQIKIRRTHTDDKLFR SRHTRAGDGLNI L RDPSFLSRVYNAN
Sbjct: 108 SYCSREPMGQIKIRRTHTDDKLFR XX SRHTRAGDGLNI XLX RDPSFLSRVYNAN 161
Query: 485 SYLHI 499
SYLHI
Sbjct: 162 SYLHI 166
Defaults: blastn 11, others 3 | Programs: All |
Sets the word size for the initial
word search. The minimum word size for blastn is
7. Word sizes for blastp, blastx, tblastn, and
tblastx are 2 or 3.
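As a rough illustration of what the word size controls, the Python sketch below lists the overlapping words of a given size in a query. The function name and the sample sequence are hypothetical; the real word search also involves the database and, for protein searches, the neighborhood threshold.

```python
def query_words(sequence, word_size):
    """Return every overlapping word of length word_size in the sequence."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]

query = "ACGTACGTACGTA"                  # hypothetical 13-letter query
for word_size in (11, 7):
    words = query_words(query, word_size)
    print(word_size, len(words), words[0])
```

A query of length L contains L - W + 1 words of size W, so a query shorter than W cannot seed any alignment.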
Default: blastn 30, others 15 | Programs: All, except tblastx |
Sets the X2 dropoff value for gapped
alignments. The value is measured in bits. Smaller values of X2
result in earlier termination of extensions. Adjusting this parameter
is generally unnecessary.
Default: blastn 20, others 7 | Programs: All |
Sets the X1 dropoff value (in bits)
for extensions. The lower X1 is set, the shorter the extension will
be. It's rarely necessary to adjust this parameter.
The effective length of the search
space. This is the size of the database multiplied by the size of the
query, or MN from the Karlin-Altschul equation.
If -Y is unset or set to 0, the actual size of the
database and query is used.
The effective length of the database. This option is useful for
maintaining consistent statistics over time as databases grow.
If -z is unset or set to 0, the actual effective
length of the database is used.
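The effect of pinning the database length can be seen with a small calculation. The Python sketch below uses the Karlin-Altschul form E = K * m * n * exp(-lambda * S) with placeholder values for K and lambda and hypothetical database sizes; it ignores the edge-effect corrections that go into real effective lengths.

```python
import math

def expect_value(score, query_len, db_len, k=0.1, lam=0.3):
    """E = K * m * n * exp(-lambda * S), with placeholder K and lambda."""
    return k * query_len * db_len * math.exp(-lam * score)

score, query_len = 60, 300
db_last_year, db_today = 1_000_000, 2_000_000    # hypothetical sizes

# Using the actual size: the same alignment looks less significant as the database grows.
actual = [expect_value(score, query_len, n) for n in (db_last_year, db_today)]

# Pinning the length to last year's value: the E-value stays put.
pinned = [expect_value(score, query_len, db_last_year) for _ in range(2)]

print("actual:", actual)
print("pinned:", pinned)
```

Because E is proportional to the database length, doubling the database doubles the E-value of an unchanged alignment unless the length is held fixed.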
Sets the X3
dropoff value (in bits) for extensions; it is bounded by the value
for X2. It's generally not
necessary to adjust this parameter.