ORIGINAL ARTICLE
Comparative Bioinformatics
Faculdade de Informática (FACIN), PUCRS, Av. Ipiranga, 6681, Prédio 32, Sala 602, 90619-900, Porto Alegre, RS, Brasil.
Received on November 12, 2014; revised on November 14, 2014; accepted on November 17, 2014
ABSTRACT
Motivation: Substitution matrices are used in biological sequence
alignment to model the frequencies of amino acid substitutions during
evolution. The choice of the most suitable matrix depends strongly
on the sequence length. This study investigates how different substitution matrices affect the alignment of short fragments, which are often used in
template-based protein structure prediction.
Results: Because the statistics of local alignments with gaps are not fully
understood, not all substitution matrices work well for short fragments.
The PAM (Percent Accepted Mutation) matrices give better results
when aligning short sequences. Moreover, the choice of gap costs
has an important impact on the alignment.
Supplementary information: NCBI-BLAST has an option to optimize the alignment for short sequences. It uses substitution matrices
and gap penalties that fit most use cases. A better understanding of these parameters, however, is useful when fine-tuning
for a given domain.
SUMMARY
Motivation: Substitution matrices are used in the alignment of
biological sequences to model the frequencies of amino acid
substitutions during evolution. The choice of the most suitable
matrix is tied to the sequence length. This study investigates the
impact of different substitution matrices on the alignment of
short fragments.
Results: Because the statistics of local alignments with gaps are
not well understood, not all substitution matrices are suitable
for small fragments. The PAM (Percent Accepted Mutation) matrices
give better results for the alignment of short sequences. In
addition, the gap cost has an important impact on the quality of
the alignment.
Supplementary information: NCBI-BLAST has an option to optimize
the alignment of short sequences. It selects substitution matrices
and gap costs that work well in the general case. However, an
understanding of these parameters is useful when fine-tuning for a
specific domain.
INTRODUCTION
2 THEORETICAL BACKGROUND
2.1 Central dogma of molecular biology
2.2 BLAST
COELACANTH
-PELICAN--
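The alignment shown above can be scored column by column. The following is a minimal sketch, assuming a simple match/mismatch scheme in place of a real substitution matrix and an affine gap model in which a gap of length k costs an opening penalty plus k times an extension penalty. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Score two equal-length aligned strings; '-' marks a gap.

    Assumed affine model: a gap run of length k contributes
    gap_open + k * gap_extend to the score.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    prev_gap_row = None  # row holding the gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != prev_gap_row:
                score += gap_open  # a new gap run starts here
            score += gap_extend    # every gap column pays the extension cost
            prev_gap_row = row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


affine = score_alignment("COELACANTH", "-PELICAN--")
linear = score_alignment("COELACANTH", "-PELICAN--", gap_open=0)
print(affine, linear)
```

The alignment has five matches, two mismatches, one gap of length 1 and one gap of length 2. With the illustrative defaults it scores -4, while dropping the opening penalty raises the score to 0, so the gap costs alone decide whether this short alignment looks acceptable.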
2.3 Substitution matrices
2.4 Gap costs
2.5
[Equations (1) to (3) could not be recovered; only the terms ln and ln 2 survive.]
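The terms ln and ln 2 that survive among the equations above are consistent with the standard Karlin-Altschul normalization of a raw alignment score into a bit score, although the original equations cannot be confirmed. The sketch below shows that standard statistic, not necessarily the formulation used in this article. The function names are hypothetical, and the values of lambda and K are illustrative placeholders that would normally be looked up for a specific matrix and gap-cost combination.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits, query_len, db_len):
    """Expected number of chance alignments, E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative parameters only (assumed, not taken from this article).
LAMBDA, K = 0.267, 0.041

bits = bit_score(50, LAMBDA, K)
print(f"bit score: {bits:.2f}")
print(f"E-value for m=10, n=1000: {expected_hits(bits, 10, 1000):.3g}")
```

The two functions are algebraically equivalent to E = K m n e^(-lambda S). Since a short fragment can only accumulate a limited raw score, the number of bits gained per aligned position becomes the deciding factor for whether its alignments can reach significance.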
MATERIALS AND METHODS
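[Table: gap extension costs (2; 1; 3; 1 and 2; 3) and entropy values (2.57; 1.60; 0.354; 0.3795; 0.4808; 0.6979; 0.9868; 1.1806). The remaining columns and the row alignment could not be recovered.]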
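The entropy values listed above suggest why some matrices suit short fragments better than others. As a rough sketch, assuming the values are relative entropies in bits per aligned position and that about 30 bits are needed for an alignment to stand out from chance (both are assumptions, not statements from this article), the minimum useful alignment length can be estimated by a simple division. The function name is hypothetical.

```python
import math

# Entropy values as listed in the table (assumed unit: bits per aligned position).
ENTROPIES = [2.57, 1.60, 0.354, 0.3795, 0.4808, 0.6979, 0.9868, 1.1806]


def min_alignment_length(entropy_bits, required_bits=30.0):
    """Smallest number of aligned positions whose expected score reaches required_bits."""
    if entropy_bits <= 0:
        raise ValueError("entropy must be positive")
    return math.ceil(required_bits / entropy_bits)


for h in ENTROPIES:
    print(f"H = {h:.4f} bits/position -> about {min_alignment_length(h)} aligned positions")
```

Under these assumptions, a matrix with an entropy of 2.57 needs roughly 12 aligned positions, whereas one with an entropy of 0.354 needs roughly 85, which is longer than many short fragments.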
Experiments were run on the proteins with PDB codes 1ZDD and 1L2Y
(Fig. 3). All combinations of fragment sizes, substitution matrices,
and gap costs were applied. The results are described in the next
section.
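The full combination of parameters described above can be enumerated as a Cartesian product. The following is a minimal sketch: the two PDB codes come from the text, while the fragment sizes, matrix names, and gap costs are hypothetical placeholders, since the values actually tested are not recoverable here.

```python
from itertools import product

PDB_CODES = ["1ZDD", "1L2Y"]       # from the text
FRAGMENT_SIZES = [5, 7, 9]         # hypothetical fragment lengths
MATRICES = ["PAM30", "BLOSUM62"]   # hypothetical selection of matrices
GAP_COSTS = [(9, 1), (11, 1)]      # hypothetical (open, extension) pairs


def experiment_grid(pdb_codes, fragment_sizes, matrices, gap_costs):
    """Return every (protein, fragment size, matrix, gap cost) combination."""
    return list(product(pdb_codes, fragment_sizes, matrices, gap_costs))


grid = experiment_grid(PDB_CODES, FRAGMENT_SIZES, MATRICES, GAP_COSTS)
print(f"{len(grid)} runs")
for pdb_code, size, matrix, (gap_open, gap_extend) in grid[:3]:
    print(pdb_code, size, matrix, gap_open, gap_extend)
```

Listing the grid explicitly makes it easy to check that no combination is skipped and to record the parameters alongside each alignment result.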
RESULTS