ALIGNMENT
Edi
Department of Information System
STMIK TIME
Jalan Merbabu No. 32 AA-BB, Medan, North Sumatra 20212, Indonesia
Telp.: +62-81973173517, e-mail: edi_foe@yahoo.com
Keyword: profile Hidden Markov Model, Multiple Sequence Alignment, computational biology

1. INTRODUCTION

A Hidden Markov Model (HMM) is a statistical model that has been used for speech recognition since the early 1970s [1]. HMMs have been successfully applied in computational biology to problems such as multiple sequence alignment, genetic mapping, protein secondary structure prediction, gene finding, signal peptide prediction, transmembrane protein prediction, epitope prediction, RNA secondary structure prediction, and phylogenetic analysis [2]. T. Nguyen et al. [3] used HMMs for cancer classification based on gene expression profiles. The authors compared HMMs with six other classifiers: k-nearest neighbors (kNN), probabilistic neural network (PNN), support vector machine (SVM), multilayer perceptron (MLP), fuzzy ARTMAP (FARTMAP), and the ensemble learning method AdaBoost. They found that HMMs were the most robust of the seven classifiers. They also found that combining a modified analytic hierarchy process (AHP) with HMMs improves cancer classification performance by separately increasing the efficiency of gene selection (using the modified AHP) and of the classifier (the HMM).

The most well-known use of HMMs in molecular biology is the profile HMM, a ‘probabilistic profile’ of a DNA/protein family [1]. A profile HMM of DNA/protein family can be

Literature survey is used as the research method in this paper. In section 2, we introduce multiple sequence alignment, profile HMMs, and the implementation of profile HMMs in multiple sequence alignment.

2.1 Multiple Sequence Alignment

Sequence alignment is a way of arranging sequences of DNA, RNA, or protein to identify similar regions that could result from functional, structural, or evolutionary relationships between the sequences [b]. There are two types of sequence alignment, namely pair-wise sequence alignment and multiple sequence alignment. A pair-wise sequence alignment compares two sequences of DNA, RNA, or protein, whereas a multiple sequence alignment (MSA) compares more than two sequences. An MSA is obtained by inserting gaps (-) into the sequences so that all sequences have the same length N and can be arranged into K rows and N columns, in which each column represents a homologous position [b]. There are several main applications of MSA [b]:
a. Extrapolation
A good MSA can help to show convincingly that an uncharacterized sequence is really a member of a protein family.
b. Phylogenetic analysis
An MSA can be used to infer an evolutionary tree based on amino acid or nucleic acid sequences.
c. Pattern identification
By discovering highly conserved positions, we can identify a region that is characteristic of a function.
d. Domain identification
It is possible to turn an MSA into a profile that describes a protein family or a protein domain. This profile can be used to scan databases to find new members of the family.
e. DNA regulatory elements
We can turn a DNA MSA of a binding site into a weight matrix and scan other DNA sequences for potential similar binding sites.
f. Structure prediction
A good MSA can give an almost perfect prediction of secondary structure for proteins or RNA. It can also help in building a 3D model.

An MSA is generally a global multiple sequence alignment [c]. Several tools are available for MSA, such as MUSCLE, T-Coffee, MAFFT, and CLUSTALW [c].

Figure 1. Global alignment in MSA [c]

Figure 2 shows an MSA of sequences that are assumed to be homologous (i.e. having a common ancestor) [d]. Homologous nucleotides are placed in the same column of the MSA. The phylogenetic tree is shown on the left side.

Figure 2. Phylogenetic tree in MSA [d]

2.2 Profile Hidden Markov Models

A profile HMM is a type of HMM that allows position-dependent gap penalties [1]. Figure 3 shows the structure of a profile HMM.

Figure 3. The structure of profile HMM [1]

In Figure 3, the bottom row of squares contains the match states. They represent the columns of the multiple alignment. The probability distribution in these states is the frequency of the amino acids or nucleotides of the protein or DNA [1]. The second row, of diamonds, contains the insertion states. They are used to represent highly variable regions in the alignment [1]. The top row, of circles, contains the delete states. Delete states do not match any residue; their function is to skip over a column of the alignment.

Besides the three types of states in the profile HMM architecture, there are two sets of parameters: transition probabilities and emission probabilities [4]. Emission probabilities give the probability that a match state generates a particular residue (a nucleotide or an amino acid). Transition probabilities give the probability of moving from one state to the next, for example from one match state to the match state on its right, going from left to right in the model.

Table 1 lists several HMM software packages along with their internet sources. One important difference among these packages is their model architecture.

Table 1. Profile HMM software
Software    URL
SAM         https://compbio.soe.ucsc.edu/sam2src/
HMMER       http://hmmer.org/
PFTOOLS     http://web.expasy.org/pftools/
GENEWISE    http://www.ebi.ac.uk/Tools/psa/genewise/
META-MEME   http://meme-suite.org/
Pfam        http://pfam.xfam.org/
PSI-BLAST   https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE=Proteins&PROGRAM=blastp&RUN_PSIBLAST=on

Figure 4 shows the model architectures used in profile HMM software packages. In the figure [5], from top to bottom: the multiple motif model of META-MEME; the original profile HMM of Krogh [1]; and the ‘Plan 7’ architecture of HMMER2, which is representative of the new generation of profile HMM software in SAM, HMMER, and PFTOOLS.
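The match-state emission probabilities described in section 2.2 can be estimated from the columns of an alignment. The following is a minimal sketch under stated assumptions, not the method of any package in Table 1. The toy DNA alignment, the function names `match_columns` and `emission_probabilities`, the rule that a column becomes a match state when fewer than half of its symbols are gaps, and the add-one pseudocount are all illustrative choices that do not come from this paper.

```python
from collections import Counter

GAP = "-"
DNA = "ACGT"

def match_columns(msa, max_gap_fraction=0.5):
    """Return the indices of the columns treated as match states.

    Assumption: a column is a match column when fewer than
    max_gap_fraction of its symbols are gaps.
    """
    n_rows = len(msa)
    n_cols = len(msa[0])
    if any(len(row) != n_cols for row in msa):
        raise ValueError("all rows of an MSA must have the same length N")
    cols = []
    for j in range(n_cols):
        gaps = sum(1 for row in msa if row[j] == GAP)
        if gaps / n_rows < max_gap_fraction:
            cols.append(j)
    return cols

def emission_probabilities(msa, alphabet=DNA, pseudocount=1.0):
    """Estimate one emission distribution per match column.

    Residue counts in each match column are smoothed with a
    pseudocount (an assumption) and normalised to sum to 1.
    """
    result = []
    for j in match_columns(msa):
        counts = Counter(row[j] for row in msa if row[j] != GAP)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        result.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return result

# Hypothetical toy alignment: K = 4 rows, N = 5 columns.
toy_msa = [
    "ACG-T",
    "ACGAT",
    "AC--T",
    "TCG-T",
]

for k, dist in enumerate(emission_probabilities(toy_msa), start=1):
    print(f"M{k}", {a: round(p, 3) for a, p in dist.items()})
```

In this toy alignment the fourth column is mostly gaps, so it is not assigned a match state; a residue in that column would be handled by an insertion state instead.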
Figure 6. Profile HMM derived from figure 5 [1]
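Deriving a profile HMM from an alignment involves assigning each aligned sequence a path through match, insert, and delete states and then counting the transitions between them. The following is a minimal sketch of that idea, using a hypothetical toy alignment rather than the alignment of figure 5. The function names, the state labels (`B`, `M1`, `I1`, `D1`, `E`), the choice of match columns, and the use of plain relative frequencies without pseudocounts are assumptions made for illustration.

```python
from collections import Counter

GAP = "-"

def state_path(row, is_match):
    """Map one aligned sequence to a path of profile HMM states.

    is_match[j] says whether alignment column j is a match column.
    """
    path = ["B"]  # begin state
    k = 0         # number of match columns passed so far
    for symbol, match in zip(row, is_match):
        if match:
            k += 1
            # residue in a match column -> match state; gap -> delete state
            path.append(f"M{k}" if symbol != GAP else f"D{k}")
        elif symbol != GAP:
            # residue in a non-match column -> insert state after position k
            path.append(f"I{k}")
        # a gap in a non-match column uses no state
    path.append("E")  # end state
    return path

def transition_probabilities(msa, is_match):
    """Estimate transition probabilities as relative frequencies."""
    counts = Counter()
    for row in msa:
        path = state_path(row, is_match)
        counts.update(zip(path, path[1:]))
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}

# Hypothetical toy alignment; column 4 is treated as an insert column.
toy_msa = ["ACG-T", "ACGAT", "AC--T", "TCG-T"]
toy_is_match = [True, True, True, False, True]

print(state_path(toy_msa[1], toy_is_match))
for (src, dst), p in sorted(transition_probabilities(toy_msa, toy_is_match).items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

Without pseudocounts, any transition that is never observed in the alignment gets probability zero, so a practical model would usually smooth these counts.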
In future work, we will introduce sequence alignment search based on the structure of the profile HMM that has been built.
5. REFERENCES
Websites:
[a] N. Mimouni, G. Lunter, C. Deane, Hidden Markov Models for Protein Sequence Alignment, https://www.stats.ox.ac.uk/__data/assets/file/0012/3360/naila_report.pdf.
[b] Anonymous, Multiple Sequence Alignment, www.srmuniv.ac.in/sites/default/files/files/1(7).pdf.
[c] Anonymous, Difference between Pairwise and Multiple Sequence Alignment, http://www.majordifferences.com/2016/05/difference-between-pairwise-and-multiple-sequence-alignment.html#.V9ZpTaJUXCs.
[d] Clemens Gropl et al., Multiple Sequence Alignment, www.mi.fu-berlin.de/.../MultipleAlignmentWS12/multal.pdf, 2012.