A-G
- aa
-
The abbreviation for an amino acid that is often used when describing
the length of a protein (e.g., the average protein is about 300 aa
long).
- allele
-
A form of a gene. Typically, the most common form is called
wild-type, and each allele is given a specific
(and often obscure) name.
- amino acid
-
The basic building block for all proteins. There are 20 common amino
acids.
- Arabidopsis thaliana
-
Known by its common name, thale cress, this mustard weed is a
favorite organism for plant genetics and molecular biology. It was
the first plant with a complete genomic sequence. For more
information, see http://www.arabidopsis.org.
- bit
-
The contraction for binary digit. The base-2 logarithm of a number is
in units of bits.
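As a small illustration of the second sentence: Perl's built-in log is the natural logarithm, so a base-2 logarithm can be obtained by dividing by log(2). This is only a sketch; log2 is a hypothetical helper name, not a Perl built-in.

```perl
use strict;
use warnings;

# Perl's log() is the natural logarithm, so divide by log(2)
# to express a value in bits. log2 is a hypothetical helper name.
sub log2 {
    my ($n) = @_;
    return log($n) / log(2);
}

# A choice among 4 equally likely nucleotides carries 2 bits;
# a choice among 20 equally likely amino acids carries about 4.32 bits.
printf "%.2f bits\n", log2(4);
printf "%.2f bits\n", log2(20);
```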
- BLOSUM
-
The abbreviation for a blocks substitution matrix. Matrix names are
followed by a number (e.g., BLOSUM62) that indicates the percent
identity at or above which aligned sequences were clustered together
when the matrix was built.
- bp
-
The abbreviation for base pair. The length of DNA is usually given in
bp or nt. Common measures include Kb, Mb, and Gb for thousands,
millions, and billions of bp, respectively.
- C-terminus
-
The carboxyl end of a protein, which is the last part to be
synthesized. In text form, the C-terminus of the protein is by
convention at the right.
- Caenorhabditis elegans
-
A nematode (also called a roundworm) that is about 1 mm long and has
about 1,000 cells as an adult. C. elegans was
the first animal to have its complete genome sequenced. See
http://www.wormbase.org.
- CDS
-
The abbreviation for a coding sequence. CDS isn't
synonymous with exon, since exons may contain noncoding sequence.
- codon
-
Three contiguous letters of DNA or RNA. Each of the 64 codons
specifies either an amino acid or a translation stop.
- complement
-
The complement of a DNA sequence is the sequence on the other strand.
For example, the complement of ACCCGT is TGGGCA. To complement a
sequence in Perl, use either of the following:
# 4-letter alphabet
$dna =~ tr/ACGT/TGCA/;
# 15-letter alphabet (W, S, and N are their own complements)
$dna =~ tr[ACGTRYKMBDHVWSN]
[TGCAYRMKVHDBWSN];
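Because the two strands run in opposite directions, the other strand is usually written reversed as well, giving the reverse complement. A minimal sketch, assuming the 4-letter alphabet in upper case; revcomp is a hypothetical helper name.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (4-letter, upper-case alphabet assumed).
# revcomp is a hypothetical helper name.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;   # scalar context: reverses the string
    $rc =~ tr/ACGT/TGCA/;    # then complement each base
    return $rc;
}

print revcomp("ACCCGT"), "\n";   # prints ACGGGT
```

The input string is left unchanged because the function works on a copy.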
- Drosophila melanogaster
-
The common fruit fly. This is one of the most famous organisms for
genetic research and was one of the first animals whose complete
genomic sequence was determined. See
http://www.fruitfly.org.
- dynamic programming
-
A common technique that reduces the computational complexity of a
problem by solving each smaller subproblem once, storing its optimal
solution, and extending those partial solutions to the full problem.
- E. coli
-
Escherichia coli. A common bacterium normally
found in your gut and a favorite organism for molecular biology
research. Some variants cause food poisoning.
- effective length
-
Karlin-Altschul statistics assume sequences of infinite length. To
adjust for edge effects in real sequences, the search space is
reduced by adjusting the true lengths of the sequences to effective
lengths.
- entropy
-
Randomness; disorder; unpredictability.
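In sequence analysis, entropy is commonly quantified with Shannon's formula, H = -sum(p * log2(p)), measured in bits. The sketch below estimates it from the letter frequencies of a string. The formula is the standard information-theoretic definition rather than something this glossary specifies, and shannon_entropy is a hypothetical helper name.

```perl
use strict;
use warnings;

# Shannon entropy (in bits per letter) of the letter composition of a string.
# shannon_entropy is a hypothetical helper name.
sub shannon_entropy {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, $seq;      # tally each letter
    my $total = length $seq;
    my $H = 0;
    for my $n (values %count) {
        my $p = $n / $total;              # observed frequency of this letter
        $H -= $p * log($p) / log(2);      # log base 2 gives units of bits
    }
    return $H;
}

printf "%.2f\n", shannon_entropy("ACGTACGT");  # 2.00: four equally likely letters
printf "%.2f\n", shannon_entropy("AAAAAAAA");  # 0.00: completely predictable
```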
- eukaryote
-
Organisms with intracellular membranous organelles such as the
nucleus and mitochondria are called eukaryotes.
- frame-shift mutation
-
An insertion or deletion of a number of nucleotides that
isn't a multiple of three, and that therefore causes the
reading frame to change.
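The effect is easy to see by splitting a sequence into codons before and after a single-letter insertion. This is only an illustration: the two sequences are made up, and codons is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split a sequence into codons, reading from the first letter.
# Any trailing one or two letters are dropped.
# codons is a hypothetical helper name; call it in list context.
sub codons {
    my ($seq) = @_;
    return $seq =~ /(...)/g;
}

my $wild   = "ATGGCCAAATGA";
my $mutant = "ATGGGCCAAATGA";   # one extra G inserted after the first codon

print join(" ", codons($wild)),   "\n";  # ATG GCC AAA TGA
print join(" ", codons($mutant)), "\n";  # ATG GGC CAA ATG
```

Every codon after the insertion differs, and the original stop codon (TGA) is no longer read in frame.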
- gene
-
A functional unit of the genome. When not specifically stated,
"gene" usually means a "protein-coding" gene, but many genes
don't contain the instructions for proteins (e.g., various RNA
genes).
- genetic code
-
The mapping of codons to amino acids. See Table 2-3.
- genetic drift
-
The tendency of sequences to change over time by accumulating random
mutations.
- genome
-
The complete genetic material for an organism. For eukaryotes, the
genome usually refers to the nuclear genome and doesn't
include the organelle genomes.
- global alignment
-
An alignment algorithm that requires every letter of each sequence to
appear in the alignment. Globally aligning sequences of different
lengths may lead to very strange alignments.
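A global alignment score is typically computed by dynamic programming. The sketch below follows the familiar Needleman-Wunsch recurrence with a simple linear gap cost. The scoring values (+1 match, -1 mismatch, -1 per gap letter) are illustrative assumptions, and global_score is a hypothetical helper name.

```perl
use strict;
use warnings;

# Best global alignment score of two strings under a linear gap cost.
# global_score is a hypothetical helper name; scores are caller-supplied.
sub global_score {
    my ($s, $t, $match, $mismatch, $gap) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($n, $m) = (scalar @s, scalar @t);

    # $S[$i][$j] = best score aligning the first $i letters of $s
    # with the first $j letters of $t.
    my @S;
    $S[0][0] = 0;
    $S[$_][0] = $_ * $gap for 1 .. $n;   # leading gaps in $t
    $S[0][$_] = $_ * $gap for 1 .. $m;   # leading gaps in $s

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $pair = $s[$i-1] eq $t[$j-1] ? $match : $mismatch;
            my $best = $S[$i-1][$j-1] + $pair;       # align two letters
            my $up   = $S[$i-1][$j] + $gap;          # gap in $t
            my $left = $S[$i][$j-1] + $gap;          # gap in $s
            $best = $up   if $up   > $best;
            $best = $left if $left > $best;
            $S[$i][$j] = $best;
        }
    }
    return $S[$n][$m];   # every letter of both sequences is accounted for
}

print global_score("ACGT", "ACGT", 1, -1, -1), "\n";  # 4
print global_score("ACGT", "AGT",  1, -1, -1), "\n";  # 2
```

Because the score is read from the last cell of the table, every letter of both sequences contributes to it, which is what distinguishes a global alignment from a local one.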