
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 11, NOVEMBER 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING WWW.JOURNALOFCOMPUTING.ORG


S.pombe Protein-Protein Interaction Prediction by using Topological Ontology of S.cerevisiae Protein Network
Praveen Kumar Singh and Madhvi Shakya
Abstract: Traditionally, protein interactions have been studied individually by genetic, biochemical and biophysical techniques. The availability of genomic information and protein interaction maps has created an opportunity to improve computational methods. The proposed method is a novel graph-comparison approach used to find the protein interaction network of Schizosaccharomyces pombe from that of Saccharomyces cerevisiae. The topological properties of a protein network can be described through a topological ontology whose ontological entities are nodes (proteins) and edges (interactions). Unknown protein interactions of S. pombe are predicted by a topological similarity search in two steps. The first step identifies similar nodes (orthologous proteins) through BLAST. The second step searches for similarity between ontological instances, measuring functional similarity based on binary interactions and a groupwise similarity search. The proposed method predicts potential protein interactions by topological similarity, which addresses the problem of understanding the biological behavior of protein interaction networks across different organisms.
Index Terms: Binary interaction, Functional conservation, Orthologous groups, Topological similarity

1 INTRODUCTION

Cellular networks can be decomposed into simple, recurrent patterns (modules) that accomplish discrete biological functions in isolation from other modules in the network [1]. Stable protein complexes are assemblies of proteins that form many interactions with each other and are therefore cohesive and strongly connected within the larger protein interaction network. The availability of biological data for more than 200 organisms has exposed the prevalent modular nature of cellular systems at the level of protein-protein interactions [2]. Huge amounts of biological data have accumulated over the last decade from genome sequencing as well as from transcriptomes, proteomes and interactomes. These data provide an opportunity to predict protein networks, which in turn supply specialized information about the biological roles of proteins. The conservation of functional modules during evolution sheds new light on protein-protein interactions. Protein interaction networks represent the fundamental picture of biological activity within cells and provide opportunities for understanding biological problems. Here, a mathematical approach is used to predict the protein interactions of S. pombe from the S. cerevisiae protein interaction network.

Network topologies help to predict the interaction network from the pattern of functional conservation. A map of protein-protein interactions provides valuable insight into the cellular function and machinery of a proteome. Protein-protein interaction maps give a compact view of cellular processes, which helps to predict the dynamic behavior of genes and their products as part of functional genomics. Since there is strong evidence that conserved interaction pathways exist across organisms [9], efficient methods to analyze common pathways in these networks will contribute greatly to understanding them. Conserved protein interaction networks can help in the functional annotation of proteins. In a protein-protein interaction network, proteins are represented as nodes and the interactions between proteins as edges. Identifying protein-protein interactions is usually expensive and time consuming, but computational methods help to accelerate interaction prediction. Functional conservation during evolution is reflected at the level of the protein-protein interaction network. Most methods for identifying functional similarity use measures of amino acid sequence conservation in homologous sequences [10, 11, 12, 13] and are based on the assumption that functional properties are relatively conserved during evolution. Protein structural information has also been used to help identify functional conservation [14, 15]. These conserved networks are small and help to assign functional properties in new organisms.

Many important cellular functions are actually carried out by protein complexes that act as molecular machines [16]. The present study focuses on a novel approach to identifying the S. pombe protein interaction network from that of S. cerevisiae. The approach presumes that a large number of physically interacting proteins in one organism have coevolved so that their respective orthologs in other organisms interact as well. This notion of conserved interactions, or interologs [3], is substantiated by the observation that many interactions of cellular processes or molecular complexes are conserved between different species. The yeast species S. pombe and S. cerevisiae are both well studied; they diverged approximately 300 to 600 million years ago [17] and are significant tools in the study of biological networks.

Praveen Kumar Singh is with the Department of Bioinformatics, PhD Program in Bioinformatics, Maulana Azad National Institute of Technology, Bhopal, India.
Madhvi Shakya is with the Department of Mathematics, National Institute of Technology, Bhopal, India.

2 METHODS
2.1 CALCULATION OF TOPOLOGICAL SIMILARITY BETWEEN ONTOLOGICAL CONCEPTS
Protein-protein interactions are typically visualized as a graph in which the nodes represent proteins and the edges represent protein-protein interactions. A huge amount of protein-protein interaction data is listed in various databases [5, 20]. Direct comparison of interaction data is difficult because the data are often derived under different conditions, come in different formats, and need to be benchmarked against a trusted reference set. For this study, binary interactions were chosen as the common unit of analysis. Yeast protein interaction networks were downloaded from the STRING database [5], which records functional associations, i.e. specific and biologically meaningful functional connections between two proteins. STRING provides uniquely comprehensive coverage of, and easy access to, both experimental and predicted interaction information. Importantly, all associations in STRING carry a probabilistic confidence score, which is derived by separately benchmarking groups of associations against the manually curated functional classification scheme of the KEGG database [6]. The STRING database contains interactions for 2923 of the 4940 proteins of S. pombe [7], i.e. about 59%. The interaction network of the remaining 41% of proteins is therefore unknown. For S. cerevisiae, by contrast, the interaction network of the total proteome is available in the STRING database, comprising the interactions of 5811 proteins. These data create an opportunity to find interactions in S. pombe from S. cerevisiae.

Nodes (proteins) are used as the ontological concept for finding topological similarity. Protein function is used as a property of the node to find the similarity of networks. Interologs can help to predict such similarity in binary interactions. A node (protein) of S. pombe is considered similar to a node of S. cerevisiae through BLASTP if both satisfy the following criteria:
1. The candidate has a significant BLASTP E-value (10^-10).
2. 80% of the residues in both sequences are included in the BLASTP alignment.
3. One candidate is the best-matching homolog of the other candidate in the corresponding organism.
Conditions 1, 2 and 3 must hold reciprocally.

Fig. 1. Venn diagram representing the common proteins between S. pombe and S. cerevisiae.

Fig. 1 represents the common proteins between S. cerevisiae and S. pombe predicted by the orthology criteria listed above. The sequence similarity search predicts 3046 such proteins in these species. In total, 138920 binary protein interactions of S. pombe and 189086 of S. cerevisiae were collected from the STRING database. A binary interaction of S. pombe is assumed to be similar to one of S. cerevisiae if both ontological entities of the respective interactions are similar: the edge (type of interaction) and the nodes (orthologous proteins). The binary interaction data were first classified into 6 broad categories (Table 1), which helped to predict the topological similarity between binary interactions of S. pombe and S. cerevisiae, using the ontological concept edge (type of interaction) as the data source.

TABLE 1
PROTEIN INTERACTION TYPES

S. No.  Type of Interaction  No. of Interaction Pairs  No. of Interaction Pairs
                             in S. pombe               in S. cerevisiae
1       Activation           68                        200
2       Binding              59353                     113209
3       Catalysis            35358                     32408
4       Expression           126                       446
5       Ptmod                112                       264
6       Reaction             43903                     42559
        Total                138920                    189086

Binary protein interactions can be classified into 6 broad categories.
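The three orthology criteria of this section can be sketched as a reciprocal best-hit filter. This is a minimal illustration, not the authors' program: the `Hit` record, the function names and the protein identifiers are hypothetical, BLASTP itself is not invoked, and the thresholds assume that "significant" means an E-value of at most 10^-10 and that "80% residues" means at least 80% coverage of both sequences. The best match is taken to be the highest bit score, which is also an assumption.

```python
from dataclasses import dataclass

E_VALUE_MAX = 1e-10   # assumed reading of criterion 1
MIN_COVERAGE = 0.80   # assumed reading of criterion 2


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment result (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    bitscore: float
    query_cov: float    # fraction of query residues inside the alignment
    subject_cov: float  # fraction of subject residues inside the alignment


def best_hits(hits):
    """Map each query to its best-scoring subject among hits passing criteria 1 and 2."""
    best = {}
    for h in hits:
        if h.evalue > E_VALUE_MAX:
            continue  # criterion 1 fails
        if min(h.query_cov, h.subject_cov) < MIN_COVERAGE:
            continue  # criterion 2 fails
        current = best.get(h.query)
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return {query: h.subject for query, h in best.items()}


def reciprocal_orthologs(pombe_vs_cerevisiae, cerevisiae_vs_pombe):
    """Criterion 3 applied reciprocally: keep pairs that are each other's best hit."""
    forward = best_hits(pombe_vs_cerevisiae)
    reverse = best_hits(cerevisiae_vs_pombe)
    return {(p, c) for p, c in forward.items() if reverse.get(c) == p}
```

Each search direction is filtered separately before the reciprocity check, so a pair survives only if the E-value and coverage criteria hold in both directions.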

An intersection graph $G$ is an undirected graph formed from a family of sets $S_i$ ($i = 0, 1, 2, \ldots$) by creating one vertex $v_i$ for each set $S_i$ and connecting two vertices $v_i$ and $v_j$ by an edge whenever the corresponding two sets have a nonempty intersection, i.e.

\[
E(G) = \{\, \{v_i, v_j\} \mid i \neq j,\ S_i \cap S_j \neq \emptyset \,\}
\]
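As a small illustration of this definition, the following sketch builds the edge set of an intersection graph from a family of sets. The function name and the vertex labels are hypothetical; the paper does not specify how its sets are represented.

```python
from itertools import combinations


def intersection_graph(family):
    """Return the edge set E(G) for a family of sets.

    `family` maps a vertex label v_i to its set S_i. Two distinct
    vertices are joined when their sets share at least one element.
    """
    edges = set()
    for (v_i, s_i), (v_j, s_j) in combinations(family.items(), 2):
        if set(s_i) & set(s_j):          # nonempty intersection
            edges.add(frozenset((v_i, v_j)))
    return edges
```

Because `combinations` only yields pairs of distinct entries, the condition i ≠ j holds by construction, and storing each edge as a `frozenset` keeps the graph undirected.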

The intersection of the two graphs gives the similarity of these networks. Although the whole genome of S. pombe encodes 4940 proteins [7], a network of only 2923 proteins is available in the STRING database. Of these 2923 proteins, 2228 proteins (nodes) have orthologs in S. cerevisiae. Through the node (sequence) similarity procedure described above, 3046 nodes (proteins) were found to be common to S. pombe and S. cerevisiae. After applying the edge criterion, i.e. similar interaction type, and taking the intersection of both graphs (the intersection network of S. pombe and S. cerevisiae), only 2223 nodes (proteins) of S. pombe were similar to those of S. cerevisiae. Thus it can be concluded that the conserved proteins follow the interaction topology at the protein interaction level. The work is summarized diagrammatically in Fig. 2. Green nodes are proteins that are similar in both species on the basis of sequence similarity, and red nodes have no sequence similarity between the species. Yellow nodes indicate orthologous groups in S. cerevisiae that are not present in the existing protein network of S. pombe, which leads to the conclusion that the corresponding orthologs of S. pombe should be part of the existing network.
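The node coloring and the edge criterion described here can be sketched as follows. This is one possible reading, not the authors' Java implementation: all names and identifiers are hypothetical, the ortholog map is assumed to be one-to-one (S. pombe protein to S. cerevisiae protein), and an interaction is stored as an unordered protein pair plus its interaction type.

```python
def edge(a, b, kind):
    """Undirected, typed binary interaction as a hashable key."""
    return (frozenset((a, b)), kind)


def nodes_of(edges):
    """All proteins that take part in at least one interaction."""
    return {protein for pair, _ in edges for protein in pair}


def color_nodes(pombe_edges, cerevisiae_edges, ortholog):
    """Green: in the S. pombe network with an ortholog. Red: in it, no ortholog.
    Yellow: ortholog is in the S. cerevisiae network, protein absent from S. pombe's."""
    pombe_nodes = nodes_of(pombe_edges)
    cerevisiae_nodes = nodes_of(cerevisiae_edges)
    colors = {p: ("green" if p in ortholog else "red") for p in pombe_nodes}
    for p, c in ortholog.items():
        if p not in pombe_nodes and c in cerevisiae_nodes:
            colors[p] = "yellow"
    return colors


def transfer(cerevisiae_edges, ortholog):
    """Rewrite S. cerevisiae interactions in S. pombe IDs where both ends have orthologs."""
    to_pombe = {c: p for p, c in ortholog.items()}
    mapped = set()
    for pair, kind in cerevisiae_edges:
        if all(c in to_pombe for c in pair):
            mapped.add((frozenset(to_pombe[c] for c in pair), kind))
    return mapped


def conserved_and_predicted(pombe_edges, cerevisiae_edges, ortholog):
    """Conserved: same orthologous nodes and same type in both networks.
    Predicted: transferred interactions missing from the S. pombe network."""
    mapped = transfer(cerevisiae_edges, ortholog)
    return pombe_edges & mapped, mapped - pombe_edges
```

In this sketch the predicted set contains every transferred interaction that is absent from the S. pombe network; restricting it to interactions that touch a yellow node would be a further filter on top of `color_nodes`.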

Fig. 2. Green nodes represent orthologous groups; yellow nodes represent orthologous proteins that are identified in S. cerevisiae but absent from the known network of S. pombe; red nodes have no similarity at all.

These yellow nodes are predicted as the unknown protein interaction network. The topological similarity of the ontological concepts (nodes and edges) was predicted by analyzing the large data set of binary interactions and orthologous groups through Java programming.

2.2 UNKNOWN PROTEIN INTERACTION PREDICTION
Walhout et al. [3] proposed these orthologous groups as interologs. Interologs reflect the degree of conservation between organisms. Here, binary interactions are assigned to S. pombe from the S. cerevisiae interaction network. The study facilitates the understanding of functional similarity at the network level. The yellow nodes, which are orthologous but not present in the S. pombe network, are included in the predicted network of S. pombe. The present study provides an opportunity to search for unknown protein interactions and a fast method for the functional assignment of proteins. Fig. 3 shows new interactions of 255 S. pombe proteins predicted by the proposed approach. Network-based analysis is helpful for predicting unknown protein interactions. The predicted interactions are not available in the STRING database. These interactions were verified against the BIOGRID database of protein interactions. Most of the interactions belong to nuclear proteins.

Fig. 3. Predicted protein interaction network of S. pombe.

4 CONCLUSION

Protein interaction prediction is time-consuming and laborious work that can be simplified by a computational approach. Comparing protein interactions between species can help to predict unknown protein interactions. This work developed a new similarity measure that uses a linear update to generate both node and edge similarity scores and has desirable convergence properties. Insofar as interesting features of these processes are reflected in the topology of the protein-protein interaction network of each species, comparing the structure of these networks might reveal patterns with biological significance. This cross-species comparison of interactomes corresponds to a specific function, as the identified protein interactions are not available in the databases, while the proteins show a high degree of functional similarity. These binary interactions need to be verified by wet-lab experiments. This can help to investigate intracellular signaling pathways and to gain insight into biochemical processes.

ACKNOWLEDGMENT
The authors are thankful to MHRD, New Delhi, for providing facilities in the institute.

REFERENCES
[1] B.J. Monahan, J. Villen, S. Marguerat, J. Bahler, S.P. Gygi, et al., "Fission yeast SWI/SNF and RSC complexes show compositional and functional differences from budding yeast," Nat Struct Mol Biol, vol. 15, pp. 873-880, 2008.
[2] H.M. Bourbon, "Comparative genomics supports a deep evolutionary origin for the large, four-module transcriptional mediator complex," Nucleic Acids Res, vol. 36, pp. 3993-4008, 2008.
[3] A.J.M. Walhout, R. Sordella, X. Lu, J.L. Hartley, G.F. Temple, M.A. Brasch, N. Thierry-Mieg, M. Vidal, "Protein interaction mapping in C. elegans using proteins involved in vulval development," Science, vol. 287, pp. 116-122, 2008.
[4] W. Du, M. Vidal, J.E. Xie, N. Dyson, "RBF, a novel RB-related gene that regulates E2F activity and interacts with cyclin E in Drosophila," Genes & Dev., vol. 10, pp. 1206-1218, 1996.
[5] D. Szklarczyk, A. Franceschini, M. Kuhn, M. Simonovic, A. Roth, P. Minguez, T. Doerks, M. Stark, J. Muller, P. Bork, L.J. Jensen, C. von Mering, "The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored," Nucleic Acids Res., vol. 39, pp. D561-D568, 2011.
[6] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, M. Hirakawa, "KEGG for representation and analysis of molecular networks involving diseases and drugs," Nucleic Acids Res., vol. 38, pp. D355-D360, 2010.
[7] V. Wood, R. Gwilliam, M.A. Rajandream, M. Lyne, et al., "The genome sequence of Schizosaccharomyces pombe," Nature, vol. 415, pp. 871-880, 2002.
[8] H. Poor, "A Hypertext History of Multiuser Dimensions," MUD History, http://www.ccs.neu.edu/home/pb/mudhistory.html, 1986.
[9] B.P. Kelley, R. Sharan, R.M. Karp, et al., "Conserved pathways within bacteria and yeast as revealed by global protein network alignment," Proc. Natl. Acad. Sci. USA, vol. 100, pp. 11394-11399, 2003.
[10] W.R. Taylor, "The classification of amino acid conservation," J. Theor. Biol., vol. 119, pp. 205-218, 1986.
[11] T.D. Schneider, "Information content of individual genetic sequences," J. Theor. Biol., vol. 189, pp. 427-441, 1997.
[12] O. Lichtarge, H.R. Bourne, F.E. Cohen, "An evolutionary trace method defines binding surfaces common to protein families," J. Mol. Biol., vol. 257, no. 2, pp. 342-358, 1996.
[13] W.S. Valdar, "Scoring residue conservation," Proteins, vol. 48, pp. 227-241, 2002.
[14] M.J. Ondrechen, J.G. Clifton, D. Ringe, "THEMATICS: a simple computational predictor of enzyme function from structure," Proc. Natl Acad. Sci. USA, vol. 98, pp. 12473-12478, 2001.
[15] A. Armon, D. Graur, N. Ben-Tal, "ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information," J. Mol. Biol., vol. 307, pp. 447-463, 2001.
[16] A. Abbott, "The society of proteins," Nature, vol. 417, pp. 894-896, 2002.
[17] M. Sipiczki, "Where does fission yeast sit on the tree of life?" Genome Biol., vol. 2, pp. 1011.1-1011.4, 2000.
[18] T. Hubbard, D. Barker, E. Birney, G. Cameron, Y. Chen, L. Clark, T. Cox, J. Cuff, V. Curwen, T. Down, R. Durbin, et al., "The Ensembl genome database project," Nucleic Acids Res., vol. 30, pp. 38-41, 2002.
[19] D.A. Benson, et al., "GenBank," Nucleic Acids Res., vol. 30, pp. 17-20, 2000.
[20] I. Xenarios, L. Salwinski, X.J. Duan, P. Higney, S.M. Kim, D. Eisenberg, "DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions," Nucleic Acids Res., vol. 30, no. 1, pp. 303-305, 2002.

Praveen Kumar Singh received the M.Sc. in Biotechnology from Dr BR Ambedkar University, Agra, India, in 2004 and the M.Tech. in Bioinformatics from Maulana Azad National Institute of Technology, Bhopal, India, in 2008. Since 2008 he has been working toward the PhD degree in Bioinformatics in the Department of Bioinformatics, Maulana Azad National Institute of Technology, Bhopal, India.

Dr Madhvi Shakya received the M.Sc. in Mathematics in 1989 from Jiwaji University, Gwalior, M.P., India, and the PhD in Biomathematics in 2006 from Rajiv Gandhi Technical University, Bhopal, M.P., India. She is currently Associate Professor of Mathematics at Maulana Azad National Institute of Technology, Bhopal, India. Her research interest is in Mathematics & Computational Biology.
