14.3 WU-BLAST Parameters
WU-BLAST
has many control parameters, some of which are esoteric and rarely
useful. The most important parameters are listed here.
Defines an alternate scoring system
for any pair of letters. For example, altscore="M
M -3" changes the score of M-M
pairs to -3, and altscore="A C
4" gives a score of 4 if the query is A and the
subject is C. Letters may be designated as any to
change an entire row or column. The score can be given as
min or max for the minimum
and maximum scores in the matrix or na to make
the score infinitely low. To set the score of all rows and columns
containing stop codons to negative infinity, set
altscore="* any
na" and altscore="any
* na". If you change the
scoring parameters, you may also want to adjust
gapL, gapH, and
gapK.
See also
nogap,
gapL,
gapH, gapK
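One way to picture the altscore rules described above is as edits to a scoring table. The following Python sketch is only a toy model, not WU-BLAST code: the three-letter matrix, its scores, and the apply_altscore helper are all hypothetical, and negative infinity stands in for na.

```python
import math

def apply_altscore(matrix, spec):
    """Apply one altscore-style rule 'QUERY SUBJECT SCORE' to a toy matrix.

    matrix maps (query_letter, subject_letter) -> score and is edited in place.
    """
    query, subject, score = spec.split()
    finite = [v for v in matrix.values() if v != -math.inf]
    if score == "min":
        new_score = min(finite)
    elif score == "max":
        new_score = max(finite)
    elif score == "na":
        new_score = -math.inf      # stands in for "infinitely low"
    else:
        new_score = int(score)
    for (q, s) in matrix:
        # "any" matches every row (query letter) or column (subject letter)
        if query in ("any", q) and subject in ("any", s):
            matrix[(q, s)] = new_score
    return matrix

# Hypothetical toy matrix: +4 for identical letters, -2 otherwise.
letters = "AC*"
toy = {(q, s): (4 if q == s else -2) for q in letters for s in letters}

apply_altscore(toy, "A C 4")      # query A against subject C now scores 4
apply_altscore(toy, "* any na")   # every cell in the stop-codon row
apply_altscore(toy, "any * na")   # every cell in the stop-codon column
print(toy[("A", "C")], toy[("C", "A")], toy[("*", "A")])
```

Note that the rule is directional in this model: changing the A (query) versus C (subject) cell leaves the C versus A cell untouched.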
Sets the number of database hits to
report. A warning is issued if this number is exceeded. It is typical
to set this parameter to a very high value, such as
B=100000, to ensure that no alignments are missed.
Default: Off | Programs: blastn, tblastx, blastx |
Searches only the bottom strand of the
query.
See also
top
Default: 4 for blastn; all for blastp, blastx, tblastn, and tblastx | |
Sets the number of processors to use.
If not set, all processors on the system may be used, except by
blastn, which limits itself to 4. See Chapter 10 for information on the
/etc/sysblast file used for setting systemwide
resource limitations.
Default: Last database record | |
Last database record number to
search.
See also
dbrecmin, qrecmin,
qrecmax
First database record number to
search. For example, by setting dbrecmin=1
dbrecmax=10, only the first 10 database sequences
are searched.
See also
dbrecmax, qrecmin,
qrecmax
This is the
E from the Karlin-Altschul equation. Database
hits whose E-value is greater than this threshold will not be
reported. If both E and S are
set, the more restrictive parameter is used.
See also
S
Default: Variable; calculated from scoring parameters | |
Sets the alignment threshold for ungapped alignments. When
E2 and S2 are set, the more
restrictive parameter is used.
See also
S2, gapE2,
gapS2
Prints out the query sequence after
all filtering is performed. This is useful for troubleshooting when
there are no database hits, and you suspect the filtering is too
aggressive.
See also
filter,
wordmask,
maskextra
Suppresses nonfatal error messages. It is generally a good idea to pay
attention to the error messages, but at times it is useful to block
them.
See also
nonnegok, novalidctxok
Processes the query sequence with the specified filtering method.
Letters are replaced with X and N for proteins and nucleotides,
respectively.
- seg: Identifies low-complexity regions in both nucleotide and amino acid
  sequences.
- dust: The standard low-complexity filter for nucleotide sequences.
  Generally less sensitive than seg.
- xnu: Finds short repeats in protein sequences.
- seg+xnu: Combines both seg and xnu.
- ccp: Coiled-coil filter for proteins.
Multiple filtering methods may be specified on the same command line;
for example:
blastp nr query filter=seg filter=ccp filter=xnu
See also
echofilter, maskextra,
wordmask
Default: Variable; calculated from scoring parameters | |
Expectation threshold for saving individual gapped alignments. When
gapE2 and gapS2 are set, the
more restrictive parameter is used.
See also
gapS2,
E2, S2
Default: Variable; depends on scoring parameters | |
Sets the value of
H (information per aligned letter) for gapped
alignments. If a particular combination of scoring matrix (or
match/mismatch scores) and gap values doesn't
already have precomputed values for gapH,
gapK, and gapL, WU-BLAST uses
ungapped statistics. In this case, the resulting E-values may be much
too low. A warning is issued when this is the case. Computing proper
values for gapped Karlin-Altschul parameters requires simulations
with random sequences that determine what ungapped scoring scheme is
most similar to the gapped scoring scheme.
See also
H,
K, gapK,
L, gapL,
warnings
Default: Variable; depends on scoring parameters | |
Sets the value of the Karlin-Altschul K
parameter for gapped alignments. See the description for
gapH.
See also
H, gapH,
K, L, gapL
Default: Variable; depends on scoring parameters | |
Sets the value of the
Karlin-Altschul parameter lambda (information per unit score) used
for gapped alignments. See the description for
gapH.
See also
H, gapH,
K, gapK, L
Default: Variable; calculated from scoring parameters | |
Score threshold for saving individual gapped alignments. Alignments
below the threshold aren't reported.
See also
gapE2
Maximum separation allowed between
gapped alignments along the query.
See also
gapsepsmax,
hspsepqmax,
hspsepsmax
Maximum separation allowed between gapped alignments along the
subject.
See also
gapsepqmax,
hspsepqmax, hspsepsmax
Default: Variable; depends on scoring parameters | |
Sets the alignment extension cutoff for gapped alignment.
See also
X
Displays the GenInfo identifiers of
database hits, if present.
Maximum fractional length overlap for gapped alignment consistency.
See the description for olf.
Maximum absolute length of overlap for gapped alignment consistency.
See the description for olf.
Sets the maximum number of gapped alignments per subject sequence.
gspmax is bounded by hspmax. A
value of 0 implies no limit.
See also
hspmax
Default: Variable; depends on scoring parameters | |
Sets the value of the Karlin-Altschul parameter
H.
See also
gapH, K,
gapK, L, gapL
Sets the maximum number of ungapped alignments per subject sequence.
A warning is issued if this limit is exceeded. A value of 0 implies
no limit.
See also
gspmax
Maximum distance between word hits
for the two-hit seeding algorithm. WU-BLAST uses one-hit seeding by
default.
Maximum separation allowed between alignments along the query.
Maximum separation allowed between alignments along the subject.
Default: Variable; depends on scoring parameters | |
Sets the value for K from the Karlin-Altschul
equation.
See also
gapK, H,
gapH, L, gapL
Assesses individual alignment scores with Karlin-Altschul statistics
rather than using sum statistics on groups of alignments.
Default: Variable; depends on scoring parameters | |
Sets lambda (nats per unit score)
from the Karlin-Altschul equation.
See also
gapL, H,
gapH, K,
gapK
Filters lowercase letters in the
query sequence. The lowercase letters are treated as if they had been
filtered out by one of the filtering programs.
See also
echofilter, filter,
wordmask, lcmask
Masks lowercase letters in the query sequence for seeding only.
Lowercase letters in the query sequence aren't used
in the initial word search but are available for alignment during the
extension stage; this is known as soft masking.
See also
echofilter, filter,
wordmask,
lcfilter
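The difference between filtering lowercase letters and masking them for seeding only can be sketched in a few lines of Python. This is an illustration, not WU-BLAST's implementation: both helper names are hypothetical, and the rule that a word is skipped when any of its letters is lowercase is an assumption made for the sketch.

```python
def hard_filter_lowercase(seq, is_protein=False):
    """Model of lcfilter: lowercase letters are replaced as if filtered out."""
    mask = "X" if is_protein else "N"
    return "".join(mask if ch.islower() else ch for ch in seq)

def soft_mask_seed_starts(seq, W):
    """Model of lcmask: list the word start positions usable for seeding.

    The sequence itself is left unchanged, so the lowercase letters stay
    available when an alignment is extended.
    Assumption: a word is skipped if any of its W letters is lowercase.
    """
    return [i for i in range(len(seq) - W + 1)
            if not any(ch.islower() for ch in seq[i:i + W])]

query = "ACGTacgtACGT"
print(hard_filter_lowercase(query))     # ACGTNNNNACGT
print(soft_mask_seed_starts(query, 3))  # [0, 1, 8, 9]
```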
Displays group information. Parentheses indicate the placement of the
alignment in the group. The following example shows an alignment that
belongs to a group of three. Its score is 159; it is the second
reported alignment and the last one in the chain.
Score = 159 (61.0 bits), Sum P(3) = 2.7e-38
Identities = 26/39 (66%), Positives = 32/39 (82%)
Links = 1-3-(2)
See also
topcomboN
Sets the match score. This parameter
is usually used for blastn only but may be used
for other programs.
See also
N
Extends masking an extra distance of [integer] letters.
See also
echofilter, filter,
wordmask, lcfilter,
lcmask
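As a rough illustration of extending a mask, the Python sketch below grows every masked run by a fixed number of letters. The extend_mask helper is hypothetical, and growing the mask on both sides of each run is an assumption made for the example.

```python
def extend_mask(masked, extra, mask_char="N"):
    """Grow every run of mask_char in `masked` by `extra` letters.

    Assumption for this sketch: the mask grows on both sides of each run.
    """
    out = list(masked)
    for i, ch in enumerate(masked):
        if ch == mask_char:
            lo = max(0, i - extra)
            hi = min(len(masked), i + extra + 1)
            for j in range(lo, hi):
                out[j] = mask_char
    return "".join(out)

print(extend_mask("ACGTNNNNACGT", 2))  # ACNNNNNNNNGT
```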
Default: BLOSUM62 | Programs: blastp, blastx, tblastn, tblastx |
Specifies
a scoring matrix file. The default is BLOSUM62. A large number of
scoring matrices are distributed with WU-BLAST in the
matrix/aa directory. Nucleotide matrices for use
with blastn are in
matrix/nt.
Sets the mismatch score. This parameter is usually used for
blastn only but may be used for other programs.
See also
M
Turns off gapped alignment. This
parameter is useful in conjunction with altscore
to keep stop codons out of alignments.
See also
altscore
Under Karlin-Altschul statistics, the expected score must be
negative. WU-BLAST normally exits with a fatal error if this
isn't the case. Sometimes scoring schemes with
positive expected scores are useful, and setting
nonnegok silences the error condition.
See also
novalidctxok,
errors
WU-BLAST doesn't allow alignments to cross hyphen
characters that act as query segment boundaries (e.g., for draft
sequence). nosegs effectively converts hyphens to
Ns.
Suppresses informational messages. For example, if you are
intentionally searching for a low-complexity sequence, you may wish
to disable the message that suggests that a low-complexity filter
would help remove meaningless alignments.
See also
errors,
warnings
If a sequence can't generate any significant HSPs,
WU-BLAST normally exits with an error that says there are no valid
contexts. You may encounter such an error when searching a
collection of sequencing reads, some of which are mostly (or
completely) Ns. Setting novalidctxok allows you to
continue without error.
See also
nonnegok, errors
Sets the length of the region for seeding.
See also
nwstart
Sets the starting position for seeding alignments.
nwstart and nwlen indicate that
a specific region of the query should be seeded. Alignments may
extend outside of this region. For example, nwstart=500
nwlen=200 seeds positions 500 to 700 of the query sequence.
See also
nwlen
Writes results to this file instead of to stdout
(the screen).
Maximum fractional length of overlap for alignment consistency.
Consistent alignments must be ordered and have minimal overlap (see
Chapter 5). The amount of permitted overlap is expressed as both a
relative fraction and an absolute number. The default setting, 0.1,
prevents alignments whose overlap length is more than 10 percent of
the length of either alignment from being in the same group. The
golf parameter plays the same role for gapped
alignments. The olmax and
golmax parameters control the absolute length of
the overlap.
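The fractional overlap rule can be written out as a short check. The Python sketch below is an illustration under stated assumptions: alignments are reduced to closed coordinate intervals on one sequence, the function names are hypothetical, and the absolute limit (olmax) is not modeled.

```python
def overlap_length(a, b):
    """Number of positions shared by two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def overlap_allowed(a, b, olf=0.1):
    """True if the overlap is no more than `olf` of the length of either interval."""
    ov = overlap_length(a, b)
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return ov <= olf * len_a and ov <= olf * len_b

print(overlap_allowed((1, 100), (96, 200)))  # True: 5 shared letters
print(overlap_allowed((1, 100), (80, 200)))  # False: 21 shared letters
print(overlap_allowed((1, 100), (95, 110)))  # False: 6 of the 16 letters in the short interval
```

The third case shows why the test is applied to both alignments: an overlap that is small relative to the long alignment can still be large relative to the short one.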
Maximum absolute length of overlap for alignment consistency. See the
description for olf.
Default: Off | Programs: blastp |
Performs Smith-Waterman alignment after initial BLAST alignment to
return the single maximum-scoring pair rather than several
high-scoring pairs.
Default: 10 for blastn; 9 for blastp, blastx, tblastn, and tblastx | |
Sets the cost for the first gap character.
See also
R
Adjusts the query numbering by this amount. For example, suppose you
search with a sequence known to have vector sequence in
the first 25 bases. By setting this parameter to 25, you make the
numbering follow the insert sequence.
Last query sequence to search. See the description for
qrecmin.
By default, WU-BLAST produces one BLAST report for each query
sequence in a FASTA file with multiple sequences. Setting
qrecmin and qrecmax allows you
to select a subset of query sequences in much the same way as
dbrecmin and dbrecmax.
See also
qrecmax, dbrecmin,
dbrecmax
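Record ranges such as qrecmin and qrecmax behave like a 1-based, inclusive slice over the records of a file. The Python sketch below mimics that selection on a small FASTA string. The helper names and sequences are hypothetical, and treating an unset upper bound as "through the last record" is an assumption borrowed from the dbrecmax default.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif line and records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]

def select_records(records, recmin=1, recmax=None):
    """Keep records recmin..recmax, counting from 1 and including both ends."""
    if recmax is None:          # assumption: unset means "through the last record"
        recmax = len(records)
    return records[recmin - 1:recmax]

example = """>q1
ACGT
>q2
GGCC
TTAA
>q3
ATAT
>q4
CCCC
"""
picked = select_records(read_fasta(example), recmin=2, recmax=3)
print([name for name, _ in picked])  # ['q2', 'q3']
```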
Default: 10 for blastn; 2 for blastp, blastx, tblastn, and tblastx | |
Sets the cost for the second and remaining gap characters.
See also
Q
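Taken together, Q and R define the cost of a gap: Q is charged for the first gap character and R for each one after it. The short Python sketch below works through the arithmetic with the default values listed above; the gap_cost function is a hypothetical helper written for this example.

```python
def gap_cost(length, Q, R):
    """Cost of one gap: Q for the first gap character, R for each additional one."""
    if length < 1:
        raise ValueError("a gap contains at least one character")
    return Q + (length - 1) * R

# Defaults from the parameter descriptions: (Q, R) per program.
DEFAULTS = {"blastn": (10, 10), "blastp": (9, 2)}
for program, (Q, R) in DEFAULTS.items():
    print(program, [gap_cost(n, Q, R) for n in (1, 2, 3)])
```

With these defaults, a three-letter gap costs 30 in blastn but only 13 in blastp, so long gaps are penalized far less severely in protein-level searches.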
blastp and blastx
statistical tests are based on the number of residues (letters) in
the database. If Z is set in conjunction with
restest, blastn,
tblastn, and tblastx will
also be based on the number of letters.
See also
seqtest, Z
Default: Variable; calculated from E | |
Sets the final score threshold. Since S and
E are interconvertible through the
Karlin-Altschul equation, setting S effectively
sets E, and vice versa. When both are set, the
more restrictive one is used.
See also
E
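The relationship between S and E follows from the Karlin-Altschul equation, E = K m n e^(-lambda S), where m and n are the query and database sizes. The Python sketch below converts in both directions. The values of K, lambda, m, and n are illustrative assumptions rather than WU-BLAST defaults, the function names are hypothetical, and edge-effect corrections to the lengths are ignored.

```python
import math

def evalue_from_score(S, K, lam, m, n):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lam * S)."""
    return K * m * n * math.exp(-lam * S)

def score_from_evalue(E, K, lam, m, n):
    """Invert the equation: S = ln(K * m * n / E) / lam."""
    return math.log(K * m * n / E) / lam

def effective_score_cutoff(S, E, K, lam, m, n):
    """When both S and E are given, keep whichever demands the higher score."""
    return max(S, score_from_evalue(E, K, lam, m, n))

# Illustrative values only (assumptions, not program defaults).
K, lam = 0.13, 0.32
m, n = 300, 1_000_000

S = score_from_evalue(10.0, K, lam, m, n)
print(round(S, 1), evalue_from_score(S, K, lam, m, n))
```

Because the conversion depends on K, lambda, and the search space, the same E corresponds to a different S whenever the scoring scheme or database size changes.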
Default: Variable; depends on scoring parameters | |
Score threshold for individual ungapped alignments. If both
S2 and E2 are set, the more
restrictive one is used.
See also
E2, gapS2,
gapE2
blastn, tblastn, and
tblastx statistical tests are based on the
number of sequences in the database. If Z is set
in conjunction with seqtest,
blastp and blastx will also
be based on the number of sequences.
See also
restest, Z
WU-BLAST normally discards HSPs that
are contained completely within a larger, higher-scoring HSP. This
behavior is called span2. If
span1 is set, alignments are thrown out if they
are subsets of the query or subject (unlike span2,
both conditions aren't required). This is useful if
the sequences contain many repeats. To prevent discarded alignments,
set span. The output may become very large.
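The three span settings differ only in the containment test they apply. The Python sketch below models that test under stated assumptions: an HSP is reduced to its query and subject coordinates plus a score, the names are hypothetical, and the sketch assumes span1 still requires the other HSP to score higher.

```python
from collections import namedtuple

HSP = namedtuple("HSP", "q_start q_end s_start s_end score")

def within(inner, outer):
    """True if the closed interval `inner` lies inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def is_discarded(hsp, other, mode="span2"):
    """Would `hsp` be dropped because of the higher-scoring `other`?"""
    if mode == "span" or other.score <= hsp.score:
        return False
    in_query = within((hsp.q_start, hsp.q_end), (other.q_start, other.q_end))
    in_subject = within((hsp.s_start, hsp.s_end), (other.s_start, other.s_end))
    if mode == "span2":
        return in_query and in_subject   # contained on both sequences
    if mode == "span1":
        return in_query or in_subject    # contained on either sequence
    raise ValueError("mode must be span, span1, or span2")

big = HSP(1, 100, 1, 100, 200)
nested = HSP(10, 20, 10, 20, 50)      # inside `big` on query and subject
repeat = HSP(10, 20, 500, 510, 50)    # inside `big` on the query only

for mode in ("span2", "span1", "span"):
    print(mode, is_discarded(nested, big, mode), is_discarded(repeat, big, mode))
```

In this model the repeat copy survives under span2 but is dropped under span1, which is the behavior that matters for repeat-rich sequences.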
Default: 11 for blastp; 12 for blastx; 13 for tblastn and tblastx | |
Sets the neighborhood word threshold
score. Setting this value extremely high removes neighborhood words
and makes seeding require matching words. T,
W, and hitdist are the most
effective parameters for controlling the sensitivity and speed of
BLAST searches.
See also
W,
hitdist
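To see how the threshold T shapes the neighborhood of a word, consider the toy Python sketch below. It enumerates every word of the same length and keeps those scoring at least T against the query word. The two-letter alphabet and the +4/-1 scoring function are hypothetical stand-ins for a real matrix such as BLOSUM62, and the brute-force enumeration is for illustration only.

```python
from itertools import product

def neighborhood(word, T, score, alphabet):
    """All words of len(word) whose pairwise score against `word` is at least T."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(a, b) for a, b in zip(word, candidate))
        if total >= T:
            hits.append("".join(candidate))
    return hits

def toy_score(a, b):
    """Hypothetical scoring: +4 for a match, -1 for a mismatch."""
    return 4 if a == b else -1

print(neighborhood("AB", 3, toy_score, "AB"))  # ['AA', 'AB', 'BB']
print(neighborhood("AB", 8, toy_score, "AB"))  # ['AB']
```

Raising T from 3 to 8, the self-score of the word in this toy scheme, shrinks the neighborhood to the exact word alone, which means fewer seeds and a faster but less sensitive search.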
Default: Off | Programs: blastn, tblastx, blastx |
Searches only the top strand of the query.
See also
bottom
Reports the number of consistent, or collinear, HSP combinations.
Controls the number of one-line summaries.
See also
B
WU-BLAST reports various warning conditions. This parameter turns
them off.
See also
notes, errors
Words are created by sliding a window of width W
by wink letters at a time. If W
equals wink, words don't overlap.
See also
W, T, hitdist
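The interaction of W and wink is easy to see with a sliding window. The Python sketch below lists the words generated from a short sequence; the words function is a hypothetical helper, and the step of 1 used as its default is an assumption for the example rather than a documented WU-BLAST default.

```python
def words(seq, W, wink=1):
    """Words of width W taken every `wink` letters along seq."""
    return [seq[i:i + W] for i in range(0, len(seq) - W + 1, wink)]

seq = "ACGTACGT"
print(words(seq, 4, 1))  # ['ACGT', 'CGTA', 'GTAC', 'TACG', 'ACGT']
print(words(seq, 4, 2))  # ['ACGT', 'GTAC', 'ACGT']
print(words(seq, 4, 4))  # ['ACGT', 'ACGT'], no overlap because wink equals W
```

Larger steps produce fewer words to look up, which speeds the search at the cost of possibly missing seeds that start between the sampled positions.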
Filters the query sequence for seeding only. Low-complexity regions in
the query sequence aren't used in the initial word
search but are available for alignment during the extension stage;
this is called soft masking.
See also
filter, lcfilter,
lcmask, echofilter,
maskextra
Sets the
word size for seeding alignments.
See also
T, hitdist,
wink
Default: Variable; depends on scoring parameters | |
Controls the alignment extension cutoff for ungapped alignments.
See also
gapX
Default: Variable; depends on scoring parameters | |
Sets the size of the query sequence.
See also
Z
Default: Variable; depends on scoring parameters | |
Sets the size of the database in letters (restest
is assumed), but Z may also be used to mean the
number of sequences if seqtest is set.
See also
Y, seqtest,
restest