H-U
- homologous
-
In sequence analysis, homologous means derived from a common
ancestor. Sequences are either homologous or they
aren't. It is incorrect to say that sequences are 80
percent homologous unless you mean that there is an 80 percent chance
of common ancestry. Use percent identity to describe the similarity
of alignments.
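As a minimal sketch of that advice, percent identity can be computed directly from a pair of aligned sequences. The function name `percent_identity` and the choice to divide by the number of alignment columns (gaps included) are assumptions made for illustration; other denominators are also in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same letter."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A gap character ("-") never counts as a match.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


print(percent_identity("GATTACA", "GACTACA"))
```

The result describes the alignment only; it says nothing by itself about whether the sequences share an ancestor.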
- hydrophilic
-
Literally, "likes water." Water is
a polar molecule that mixes well with other polar molecules. The
charged amino acids K, R, D, and E are examples of hydrophilic amino
acids.
- hydrophobic
-
Literally, "fears water." Nonpolar
molecules (like those in oils) don't mix well with
water. The amino acids L, I, V, and F are particularly hydrophobic.
- Karlin-Altschul
-
The standard local alignment theory is often called Karlin-Altschul
statistics after its founding authors.
- lambda, λ
-
The Karlin-Altschul statistical parameter that converts a raw score
to a normalized score.
- local alignment
-
An alignment algorithm that finds the optimal subsequence alignment.
The alignment may include all letters of each sequence, but it
isn't required to do so.
- low-complexity sequence
-
Regions of sequences that are highly predictable—for example, a
region that is 90 percent A or T.
- methionine
-
One of the 20 common amino acids. Methionine is abbreviated as M or
Met, and is especially important because translation of nearly all
proteins begins with a methionine. There is only one codon for this
amino acid: ATG.
- mutation
-
Any change in the sequence of a DNA molecule.
- N-terminus
-
The start of a protein. In text form, a protein's
N-terminus is always at the left.
- nat
-
Contraction for natural log digits. The base e logarithm of a number
is in units of nats.
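Because nats and bits differ only in the base of the logarithm, converting between them is a single multiplication or division by ln 2. The helper names below are assumptions chosen for this sketch.

```python
import math


def nats_to_bits(nats: float) -> float:
    """Convert a quantity measured in nats (base e) to bits (base 2)."""
    return nats / math.log(2)


def bits_to_nats(bits: float) -> float:
    """Convert a quantity measured in bits (base 2) to nats (base e)."""
    return bits * math.log(2)


# ln(8) nats is the same amount of information as log2(8) bits.
print(nats_to_bits(math.log(8)))
```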
- natural selection
-
A theory founded by Charles Darwin that explains how organisms change
over time to better fit their environment. It is based on the
principles of variation, heritability, and differential reproduction.
- ncRNA
-
The abbreviation for noncoding RNA. Some RNAs, like tRNAs or rRNAs,
don't contain information for protein sequences.
- Needleman-Wunsch
-
Global alignment is often called Needleman-Wunsch after the authors
who first described the algorithm.
- nucleotide
-
The basic building block of nucleic acid sequences (DNA and RNA). DNA
is made from A, C, G, and T, while RNA contains A, C, G, and U.
- nt
-
The abbreviation for nucleotide.
- O(n)
-
The computational complexity of an algorithm is often described by
its asymptotic behavior. The cost of an O(n) algorithm grows linearly
with the size of the input. O(log₂ n) algorithms grow much more slowly,
and O(n²) algorithms grow much more quickly.
- ORF
-
Abbreviation for open reading frame. Each strand of DNA has three
frames. Any subsequence that doesn't contain stop
codons in a particular frame is an open reading frame.
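The definition can be sketched as a scan of each of the three frames on one strand, cutting at every stop codon. The function name `open_reading_frames`, the `min_codons` parameter, and the 0-based, end-exclusive coordinates are assumptions for this illustration; it reports maximal stop-free runs and does not require a start codon.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def open_reading_frames(dna: str, min_codons: int = 1):
    """Yield (frame, start, end) for maximal stop-free codon runs on one strand."""
    dna = dna.upper()
    for frame in range(3):
        run_start = frame  # first base of the current stop-free run
        last = frame       # position just past the last complete codon seen
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    yield frame, run_start, pos
                run_start = pos + 3
            last = pos + 3
        # Report the run that reaches the end of the sequence, if any.
        if (last - run_start) // 3 >= min_codons:
            yield frame, run_start, last


for frame, start, end in open_reading_frames("ATGAAATAGCCC"):
    print(frame, start, end)
```

To cover the other strand, the same scan would be run on the reverse complement of the sequence.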
- ortholog
-
Genes that are separated by speciation (i.e., the same gene in
different species). This is often approximated as the best reciprocal
match between two complete genomes or proteomes.
- palindrome
-
A palindrome in DNA is a sequence that is read the same on the plus
and minus strands. For example, the sequence GAATTC is a palindrome.
Palindromes and near-palindromes are often sites for DNA-protein
interaction. Proteins scanning along DNA
"see" a palindrome as the same
sequence regardless of which direction they are moving.
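Reading the same on both strands means the sequence equals its own reverse complement, which is easy to check. The function names in this sketch are assumptions, and it handles only the unambiguous letters A, C, G, and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(dna: str) -> bool:
    """True if the sequence reads the same on the plus and minus strands."""
    return dna.upper() == reverse_complement(dna)


print(is_dna_palindrome("GAATTC"))
print(is_dna_palindrome("GATTACA"))
```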
- PAM
-
An acronym for Percent or Point Accepted Mutation. PAM scoring matrix
names are usually followed by a number (e.g., PAM200), which
indicates how many iterations of multiplication were used starting
with the PAM1 matrix. The higher number indicates a more distant
similarity.
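The repeated multiplication can be illustrated with a toy example. The two-letter matrix below is hypothetical and is not the real 20-by-20 PAM1 data; it only shows how multiplying a mutation probability matrix by itself n times spreads probability away from the diagonal. The scoring matrices themselves are log-odds values derived from such probabilities, a step this sketch omits.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Multiply m by itself so that the result is m raised to the n-th power (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step matrix: each letter has a 1 percent chance of changing.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam200 = mat_power(TOY_PAM1, 200)
print(toy_pam200[0])
```

After 200 steps, the probability of a letter remaining unchanged in this toy model has fallen from 0.99 to just above one half, which is why higher PAM numbers suit more distant similarities.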
- paralogs
-
Genes that are duplicated within a single genome. Duplication
sometimes allows one of the genes to take on a specialized function.
- phylogenetics
-
The study of evolutionary relationships among organisms.
- prokaryotes
-
Organisms whose cells lack a nucleus and other membrane-bound
organelles. All bacteria are prokaryotes.
- proteome
-
The complete set of all proteins produced by a particular organism.
Many proteins undergo post-translational modifications that add or
subtract features from a protein. Therefore, a particular mRNA might
have many different protein isoforms.
- pseudogene
-
A sequence that looks like a gene but isn't. Most
pseudogenes are derived from mRNAs that have been reverse-transcribed
back to DNA and inserted into the genome. They have the hallmarks of
RNA processing—notably a poly-A tail and no introns.
- relative entropy
-
The average number of bits (or nats) per aligned letter for a given
scoring scheme.
- repeat
-
Any class of sequence that appears multiple times in a genome.
Usually, gene families aren't called repeats and the
term is used for junk DNA. Some of the most common repeats in the
human genome include the Alu and
LINE families.
- reverse transcriptase
-
A protein that creates DNA from an RNA template.
- RNA
-
Ribonucleic acid. RNA is chemically similar to DNA but not used
strictly for storage. Many RNA molecules have important functions in
the cell and may even have enzymatic properties. Some of the most
common functional RNA molecules include rRNAs and tRNAs.
- RNA polymerase
-
A protein or multiprotein complex that creates RNA from a DNA
template.
- ribosome
-
A complex macromolecule made up of proteins and rRNAs. Ribosomes are
responsible for translating mRNAs into proteins.
- rRNA
-
Ribosomal RNA. The ribosome is composed of many specific RNA
molecules, and these components are called rRNAs. rRNAs are some of
the most abundant RNAs in a cell.
- Smith-Waterman
-
Local alignment is often referred to as Smith-Waterman, after the
authors who first described the algorithm.
- start codon
-
ATG. Codes for the amino acid methionine. Many proteins have
N-terminal post-translational modifications, and the first amino acid
of the mature protein may therefore not be methionine.
- stop codon
-
TAA, TGA, and TAG are the three codons that terminate translation.
- sum statistics
-
A method that determines the aggregate statistical significance of
multiple local alignments.
- target frequency
-
The expected frequencies of individual letter pairings. For
nucleotide scoring matrices, the target frequency is often summarized
by the expected percent identity in sequences with unbiased
composition.
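For a simple nucleotide scoring scheme, lambda, the target frequencies, and the relative entropy can all be derived numerically. The sketch below assumes ungapped Karlin-Altschul statistics, a +1/-3 match/mismatch scheme, and equal base frequencies; the function names and the bisection bounds are choices made for this illustration.

```python
import math


def solve_lambda(freqs, score, lo=1e-6, hi=10.0, tol=1e-12):
    """Bisect for the positive lambda with sum(p_a * p_b * exp(lambda * s_ab)) == 1.

    Assumes the expected score is negative and at least one score is positive,
    so the sum dips below 1 just above zero and exceeds 1 at hi.
    """
    def excess(lam):
        return sum(pa * pb * math.exp(lam * score(a, b))
                   for a, pa in freqs.items()
                   for b, pb in freqs.items()) - 1.0

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def match_mismatch(a, b):
    """Assumed scoring scheme: +1 for a match, -3 for a mismatch."""
    return 1 if a == b else -3


freqs = {base: 0.25 for base in "ACGT"}  # unbiased composition
lam = solve_lambda(freqs, match_mismatch)

# Target frequency of each letter pair: q_ab = p_a * p_b * exp(lambda * s_ab)
target = {(a, b): freqs[a] * freqs[b] * math.exp(lam * match_mismatch(a, b))
          for a in freqs for b in freqs}
identity = sum(q for (a, b), q in target.items() if a == b)

# Relative entropy in nats per aligned pair: H = sum(q_ab * lambda * s_ab)
entropy_nats = sum(q * lam * match_mismatch(a, b) for (a, b), q in target.items())

print(f"lambda = {lam:.3f} nats, identity = {identity:.1%}, H = {entropy_nats:.2f} nats")
```

Under these assumptions lambda comes out near 1.37 nats and the target frequencies correspond to roughly 99 percent identity, so +1/-3 is a scheme tuned for finding nearly identical sequences.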
- transcriptome
-
The complete set of transcripts for a particular genome. This term is
often used to mean the mRNAs of protein coding genes and their
alternatively spliced variants.
- tRNA
-
The abbreviation for transfer RNA. tRNAs transfer individual amino
acids to the ribosome. Each tRNA molecule has an anticodon that
is the reverse complement of a codon for the amino acid it carries.
- UTR
-
The abbreviation for an untranslated region. The 5' and
3' ends of an mRNA have untranslated regions. These
regions sometimes play regulatory roles that change the
mRNA's stability, translatability, or localization.