Systems Biology Analysis of Protein-Drug Interactions

DOI 10.1002/prca.201100077
Proteomics Clin. Appl. 2012, 6, 102–116
REVIEW
Systems biology analysis of protein–drug interactions

Jacques Colinge, Uwe Rix, Keiryn L. Bennett and Giulio Superti-Furga
Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Jacques Colinge, Vienna, Austria
Drugs induce global perturbations at the molecular machinery level because their cognate targets are involved in multiple biological functions or because of off-target effects. The analysis or the prediction of such systems-level consequences of drug treatment therefore requires the application of systems biology concepts and methods. In this review, we first summarize the methods of chemical proteomics that can measure unbiased and proteome-wide drug protein target spectra, which is an obvious necessity to perform a global analysis. We then focus on the introduction of computational methods and tools to relate such target spectra to global models such as pathways and networks of protein–protein interactions, and to integrate them with existing protein functional annotations. In particular, we discuss how drug treatment can be mapped onto likely affected biological functions, how this can help identify drug mechanisms of action, and how such mappings can be exploited to predict potential side effects and to suggest new indications for existing compounds.
Keywords: Bioinformatics / Chemical proteomics / Drugs / Personalized medicine / Statistics
Received: September 1, 2011 Revised: September 26, 2011 Accepted: September 27, 2011
1 Introduction
Our knowledge of drug protein target profiles is often limited by practical difficulties in obtaining such information. Consequently, numerous compounds in clinical use are orphan ligands or have only one identified target. Continuous and substantial progress in proteomic technologies [1] has made it possible to develop chemical proteomic or chemoproteomic approaches, where the protein targets of a drug are affinity purified and identified by MS [2, 3]. This methodology empowers researchers to measure compound–protein interactions in a biological context as opposed to in vitro binding assays. That is, drug–protein interactions can be determined not only proteome wide, but
Correspondence: Dr. Jacques Colinge, Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Jacques Colinge, AKH BT 25.3, Lazarettgasse 14, A-1090 Vienna, Austria E-mail: jcolinge@cemm.oeaw.ac.at Fax: +43-1-40160-970030
Abbreviations: ABPP, affinity-based protein profiling; CML, chronic myeloid leukemia; FDR, false discovery rate; GO, gene ontology; PPI, protein–protein physical interaction; TCM, traditional Chinese medicine
also in a tissue- or cell type-dependent manner. The strength of this approach is that all the proteins are expressed at true physiological concentrations and bear correct posttranslational modifications. Understanding the mechanism-of-action (MoA) of compounds and elucidating the origin of observed side effects are of great importance in drug discovery. Access to accurate and sensitive target spectra offers promising perspectives for the derivation of more efficient and safer compounds. Detrimental leads can be halted at an earlier stage of development, thus significantly reducing costs [4]. Furthermore, the knowledge of multiple potent targets creates many opportunities for drug repurposing. In general, this can potentially provide access to new targets. It could also reveal unexpected synergistic effects that explain the success of certain compounds [5]. Fulfilling the promises of chemical proteomics is not a straightforward task since the compounds used in patient therapy induce more global changes in the molecular machinery of cells than simply regulating a selected protein [6]. Obviously, a targeted protein is involved in biochemical reactions that take place within one or several biological pathways. These in turn can interact with other pathways
Colour Online: See the article online to view Figs. 2, 4 and 5 in colour.
© 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.clinical.proteomics-journal.com
regulation and the data obtained represent the global integrated impact of drugs. In recent years, chemical proteomic applications have aided in the elucidation of several important drug–protein interactions [8, 13, 18–25]. The main steps of compound immobilization-based and ABPP methods are introduced below.
[7]. Therefore, modification of the activity of a single agent, which is part of a complex network, can have far-reaching consequences on multiple biological functions. Moreover, compounds often have more than one target and hence can exhibit broad-ranging impact on cell biology. For instance, the tyrosine kinase inhibitor imatinib, which is a hallmark of targeted therapy against the chronic myeloid leukemia (CML)-causing fusion protein BCR-ABL, was found to have at least five additional potent targets, including the nonkinase NQO2 [8]. To fully exploit the potential of chemical proteomics, combination with computational methods is required. In this way, it becomes feasible to extend beyond the ideal case where the protein target spectrum reveals valuable information directly, and to consider complex and indirect implications. This combination of unbiased proteome-wide target determination and computation naturally occurs within the paradigm of systems biology, where units of function are all regarded as interconnected [9, 10]. Computations can integrate available information on biological pathways, protein and gene interaction networks, and protein functions to predict the biological processes that are impacted by drug treatment. This assists in understanding the mechanism of action of compounds and naturally expands in the direction of identifying potential side effects and new indications for existing molecules. In this review, we briefly recall the general principles of chemical proteomics before presenting the current methods of data analysis and available tools and algorithms. We also discuss how the presented computational methods can be applied to the analysis of target spectra that are not necessarily derived from chemical proteomics, e.g. current efforts in traditional Chinese medicine (TCM) research. We conclude by presenting attractive future perspectives toward personalized medicine and digital patient models.
2.1 Compound immobilization approaches
Profiling the protein targets of a chosen compound requires the immobilization of the compound on a matrix. This operation is achieved through a functional group, e.g. sulfhydryl, amino, hydroxyl, or carboxyl, that binds to an activated resin, e.g. sepharose or agarose beads. Small molecules that do not contain an appropriate group must be chemically modified. The potency of the modified, linked compound should be assayed to ensure that it is preserved [26]. A cell or tissue extract is then incubated with the matrix and washed extensively before elution. Finally, proteomic methods are applied to identify the proteins that bound the linked molecule. Current strategies are to either further reduce sample complexity via one- or two-dimensional SDS-PAGE [8], or use gel-free methods [27]. Both approaches are then followed by MS. The process is shown in Fig. 1 and commonly named drug pulldown [3, 27].
2.2 Noise in the signal?
Similar to any affinity purification method, compound immobilization approaches suffer from the presence of nonspecific interactions that consistently appear in the list of proteins identified by MS. The causes of nonspecific interactions are multiple, and appropriate solutions exist. In order to analyze the true target profile of a compound, it is crucial to eliminate nonspecific binders without the loss of any important target. This step constitutes the first stage of bioinformatic analysis. Some proteins might bind to the chemical linker between the matrix support and the compound or even to the matrix itself. This category of nonspecific binders can be readily identified in negative control experiments performed with empty blocked beads. Abundant proteins that have a low affinity for either the immobilized compound or the true interactors of the compound cannot be eliminated completely at the washing step. These proteins contribute largely to the identified nonspecific binders. They are not identified in negative control experiments with blocked beads. Nonetheless, different approaches can be implemented to identify them. It is possible to perform a parallel experiment with an unrelated compound that, with some confidence, does not share any target with the compound of interest. To improve the potential of identifying nonspecific binders, a chemically
2 Methods of chemical proteomics
Different approaches to discover drug targets exist that can be subdivided into two classes [2, 3]. The first is classical compound affinity purification [8, 11, 12], which requires the immobilization of the compound of interest. The second is affinity-based protein profiling (ABPP) [13, 14]. Via a generic chemical probe, it profiles a whole class of compounds, though in a biased manner. Additionally, either expression proteomic experiments or the global mapping of chosen posttranslational modifications, e.g. acetylations for histone deacetylase (HDAC) inhibitors, can be employed to compare drug-treated versus untreated samples. Thereby, information regarding drug-induced changes is obtained [15–17]. Such broader methods do not measure compound targets directly and are hence not, properly speaking, chemical proteomic methods. Their potential to partially reveal target spectra is evident, although direct drug influence is usually combined with downstream
J. Colinge et al.
Figure 1. Overview of the entire process of measuring drug–protein interactions through chemical proteomic experiments and of analyzing the generated data. A drug is coupled to magnetic beads through a chemical linker and incubated with the lysate of a biological sample. Different proteins bind the drug with strong affinity (violet and orange) and some others might have a low affinity for the compound (aqua). Additionally, some proteins bind to the strong binders of the drug (green). After washing and elution, the strong binders are purified along with the indirect binders and some abundant low-affinity proteins. MS detects the purified proteins and provides the input data for the bioinformatic analysis, which may exploit a wide range of additional data sources to compute its results.
related compound, which is biologically inactive, is an even better tool, although the risk of overlapping target profiles is increased. One last variation of the same approach is to compare a new drug pulldown with previous experiments and to exclude frequently found proteins. Determining the frequency threshold might require statistical analysis or empirical validation. We have obtained satisfying results with geometric tests (unpublished data). In
general, comparisons with previous experiments increase the risk of discarding a correct target that was found to be nonspecific in a different context. Thus, lists of assumed nonspecific binders, though useful as initial approximations, should be considered with care. It is also possible to compare the proteins identified from a drug pulldown with the proteome of the cells on which the experiment was performed. Due to the nature of MS, an analysis at the whole proteome level will primarily detect the abundant proteins, the so-called core proteome. Conversely, the drug pulldown is an enrichment process resulting in the detection of lower abundance proteins. As a first approximation, proteins that are detected in both data sets are considered suspicious and should be removed from the list of true interactors. The downside of this simplistic subtraction is that abundant true targets, as recently exemplified by Hsp90 [23], are excluded. Furthermore, such an approach has become questionable since the emergence of highly sensitive MS instrumentation that can now routinely detect medium-abundance and even some low-abundance proteins. To circumvent this difficulty, semi-quantitative MS indicators such as spectral counts [28] can be exploited to detect significant enrichments from a chemical proteomic experiment. The minimum increase of spectral count required in the pulldown versus the core proteome can be determined empirically, e.g. following known targets, or through proper statistical modeling [29, 30]. A final possibility is to perform the desired pulldown again with a chosen concentration of the free compound added to the cell lysate prior to performing the drug pulldown. In this way, true targets are bound by the free compound and are no longer available in the lysate to interact with the immobilized drug (Fig. 2).
Requiring the complete disappearance of the targets or a significant reduction of the spectral counts in the MS data, when the two data sets are compared, identifies the correct proteins. In our hands, this conceptually simple method gives reproducible, sensitive, and reliable results [31]. Drug pulldowns retrieve complete or partial protein complexes in certain cases and the elimination of nonspecific binders leaves a list of proteins mixing direct and indirect interactors (Fig. 3A). Depending on the compound, it is possible to recognize the direct binders with good confidence simply using the knowledge of the binding mechanism. For instance, kinases are very likely to be direct interactors of a kinase inhibitor. More generally, protein–protein binary interaction data such as measured by yeast two-hybrid experiments [32], which are available from public databases, might shed light on a mixture of direct/indirect drug interactions (Fig. 3B). Similar help can be obtained through the knowledge of protein interaction domains and predicted physical interactions [33]. Ultimately, complementary protein–protein interaction experiments could be planned to discover the structure of those interactions, and clarify the position of the drug interactions in this network [29]. Table 1 provides a summary of noise elimination methods.
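The competition logic described above can be illustrated with a short Python sketch that flags proteins whose spectral counts drop markedly once the free compound is added. The protein names, the counts, and the two thresholds (min_reduction and min_count) are hypothetical values chosen for this example only; in practice the cutoff would be set empirically or replaced by a proper statistical model.

```python
def competed_proteins(pulldown, competition, min_reduction=0.5, min_count=2):
    """Return {protein: fractional reduction} for proteins that are competed away.

    pulldown, competition: dicts mapping protein name -> spectral count.
    min_reduction, min_count: illustrative thresholds (assumptions, not from the text).
    """
    competed = {}
    for protein, count in pulldown.items():
        if count < min_count:
            continue  # too few spectra to judge a reduction
        comp_count = competition.get(protein, 0)  # absent means fully competed
        reduction = 1.0 - comp_count / count
        if reduction >= min_reduction:
            competed[protein] = reduction
    return competed


# Hypothetical spectral counts: two competed targets, two abundant background proteins
pulldown = {"target_A": 40, "target_B": 12, "background_X": 55, "background_Y": 30}
competition = {"target_A": 2, "background_X": 52, "background_Y": 31}

result = competed_proteins(pulldown, competition)
for name, reduction in sorted(result.items()):
    print(f"{name}: spectral count reduced by {reduction:.0%}")
```

Proteins with essentially constant spectral counts across the two experiments are left out of the result, which mirrors the behavior expected from abundant low-affinity binders.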
Figure 2. Competition with a free compound in a second experiment sheds light on the nonspecific binders. In comparison with the original pulldown (left), proteins have the opportunity to bind to the free compound in the competition pulldown (right). As a result, direct high-affinity binders and their interactors are found in much reduced abundance in the purified sample. A comparison of spectral counts or truly quantitative measurements identifies such reductions readily. Abundant low-affinity proteins binding to the immobilized compound do not find sufficient free compound copies to significantly reduce their presence in the purified sample. These are identified by essentially constant spectral counts.
Figure 3. Drug pulldowns retrieve protein complexes. (A) A drug can bind to isolated proteins (a–c) but it frequently binds to a protein (f) that is part of a protein complex (d–g). (B) The pulldown experiment will identify the direct drug–protein interactions as well as the indirect protein interactions through the complexes (d, e, g). Without a priori knowledge on the binding mechanism, it is essentially impossible to distinguish direct from indirect interactions from such data. When available, information on direct binary interactions between proteins might delineate complexes but not indicate which complex member binds to the drug.
2.3 Miniaturization toward individual patient profiling
Until recently, relatively large quantities of protein material were necessary to perform chemical proteomic experiments successfully. This constraint might have limited the widespread use of this powerful methodology, particularly when analyzing clinical samples. Owing to progress in MS instrumentation and the development of new experimental protocols involving more sensitive chromatography, it is currently possible to perform drug pulldowns using as few as 10^6 cells or even fewer [27]. These improvements make individual patient sample analyses accessible. Thus, new opportunities both in research and, ultimately, in personalized medicine are created that are complementary to next-generation DNA-sequencing technologies.
2.4 Subproteome-focused chemical proteomics
To study an entire class of compounds that inhibit proteins through a common binding mechanism, e.g. inhibitors that interact with the ATP pocket of kinases, it is possible to develop generic matrices that bind a large range of the targets with slightly reduced specificity. Through competition experiments, where the same matrix is employed alone or in the presence of the compounds to profile, protein targets can be identified reliably through the same logic discussed above for immobilized compounds. Such generic matrices have been developed for kinases (Kinobeads) [13] and histone deacetylases [34] and offer powerful assay platforms that do not need to be adapted to existing or future compounds. On the other hand, only a limited range of all the possible targets bind to the generic matrices and true targets not present in this subset cannot be detected. A similar approach, termed affinity-based protein profiling (ABPP), uses chemically reactive enzyme-specific probe molecules to capture e.g. kinases [14] or proteases [35] and purify them via a biotin tag [36]. While the ABPP methodology follows the same downstream concepts as the generic compound matrices, it offers the potential advantage of differentiating between active and inactive enzymes. Taken together, these methods are broad-range, highly multiplexed assays, but they are subject to a strong bias as they focus on a predefined subproteome.
2.5 The application of quantitative proteomics
Stable isotope-labeling techniques, e.g. iTRAQ [37], TMT [38], or SILAC [39], find a natural application in chemical proteomics to render the competition experiment we mentioned previously more precise. For instance, one drug
pulldown can be performed in biological duplicates in two iTRAQ 4-plex channels with the corresponding two competition pulldowns occupying the other two channels. With such an experimental design, the less accurate spectral counts are replaced by relative quantitative measures. The selection of proteins that directly interact with the compounds is then rather straightforward, particularly when combined with appropriate statistical models [40, 41]. In principle, a similar multiplexing approach could improve the comparison of a drug pulldown with the corresponding cell line core proteome as discussed above. In this situation, however, the highly complex core proteome would mask a significant portion of the pulled down proteins and this option should be disregarded. In subproteome-focused applications, it is advantageous to perform competition pulldowns with increasing amounts of the free compound, and to combine the pulldowns in a single iTRAQ or TMT experiment [13, 34] to obtain dose–response curves. Via an innovative protocol, Sharma et al. were able to determine the dissociation constant and the IC50 of gefitinib, an EGFR kinase inhibitor in clinical use for lung cancer [42]. Namely, comparing a first experiment with a subsequent pulldown performed on the supernatant of the first one, they determined the immobilized gefitinib dissociation constant. In parallel, they performed competition experiments with different concentrations of the free compound to obtain the gefitinib IC50, which combined with the immobilized gefitinib dissociation constant gave the free compound dissociation constant. Experiments were performed using SILAC 3-plex.
3 Computational methods
We introduce several computational techniques that are useful in analyzing drug target lists, starting with rather simple methods that ignore important aspects of chemical proteomic data, and expanding with more sophisticated algorithms that integrate additional domain-specific knowledge. For the sake of concision, we often refer to the identification of relevant biological pathways as a model problem but, unless otherwise specified, the methods apply to other reference biological data sets as well. Moreover, we exemplify several methods with the target profile analysis of the tyrosine kinase inhibitors imatinib, dasatinib, bosutinib, and bafetinib that are in clinical use or in development as therapeutic agents, e.g. against CML and other malignant diseases. They provide convenient illustrations of common difficulties.
protein lists with existing knowledge is to first map those proteins onto descriptions of biological functions and then to look for significant associations by means of statistical tests (Fig. 4A). A standard example is to search for biological pathways that are likely to be modulated by a compound. There exist databases that describe each pathway with a graphical representation and a list of involved proteins, e.g. KEGG [43] or NCI-PID [44]. The analysis is performed ignoring the graphical structure by comparing the number of protein targets present in a pathway with the total number of proteins in this pathway and in the human genome. A proportion of targets found in a pathway that is larger than what is expected by chance indicates potential pathway regulation (Fig. 4A). All the significant hits obtained in a pathway database search are reported with an indication of statistical significance, e.g. a p-value. This procedure is called enrichment analysis. Depending on the research project, several databases can be considered for enrichment analysis (Table 2). While pathway databases are frequently employed in drug target analysis, they can be complemented by gene ontology (GO) biological process (BP) descriptions [45], which document a hierarchy of biological functions from metabolism to signaling with some disease-related processes included as well. As GO is not only a catalogue of sets of proteins associated with a biological process but also comes with a hierarchy, i.e. it is an ontology, enrichment analyses can be performed at various levels of detail. For instance, the GO consortium has proposed slimmed ontologies that retain rather general functions and certain tools propose their own definitions of GO levels, e.g. DAVID [46]. If the interest is to discover an unknown mechanism of binding, it is possible to perform enrichment analyses on the target protein domains or to use the GO molecular function ontology (GO MF).
More generally, any database containing sets of proteins that share a specific characteristic can be used to perform enrichment analyses [47]. Table 2 contains a list of databases and tools that can be applied for this purpose.
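The enrichment test outlined in this section can be sketched in a few lines of Python. The function names are illustrative and are not taken from any of the cited tools; the input numbers are the toy values given in the Fig. 4 legend (35 proteins in the database, 5 mapped targets, and a pathway of 9 proteins containing 3 targets).

```python
from math import comb


def hypergeom_pmf(r, n, R, N):
    """Probability of exactly r targets in a pathway of n proteins,
    given R targets among the N proteins of the database."""
    return comb(R, r) * comb(N - R, n - r) / comb(N, n)


def enrichment_pvalue(r, n, R, N):
    """One-sided p-value: probability of r or more targets by random chance."""
    upper = min(n, R)  # a pathway cannot hold more targets than exist or fit
    return sum(hypergeom_pmf(k, n, R, N) for k in range(r, upper + 1))


# Toy numbers from the Fig. 4 legend: N = 35, R = 5, n = 9, r = 3
pmf = hypergeom_pmf(3, 9, 5, 35)
pval = enrichment_pvalue(3, 9, 5, 35)
print(f"P(r=3) = {pmf:.4f}, p-value = {pval:.4f}")  # P(r=3) = 0.0841, p-value = 0.0946
```

With a cutoff of 1%, this pathway would not be reported as significantly enriched, in agreement with the figure legend.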
3.2 The use of target affinities and secondary interactors
Chemical proteomics usually delivers lists of drug targets with a notion of weight, which can be a direct measure of the target affinity [13, 42], or a rough indicator provided by the spectral count or a related quantity (protein sequence coverage, log-transformations, etc.) [26]. Such weights inform on the importance of the targets and they obviously have the potential to improve enrichment analyses performed otherwise with all targets considered equal. In terms of effect strength, high-affinity targets are the best candidates to be the mediators of biological response regulation. Medium-affinity targets that are abundant might also play an important role, provided the drug is present at
3.1 Classical mapping and enrichment methods
Lists of drug protein targets can be difficult to interpret directly, especially if they comprise more than a few familiar entities. The classical bioinformatic solution to relate
Figure 4. General principle of enrichment analysis and its extensions. (A) Drug protein targets are mapped to sets found in a database, such as pathways or GO terms. Each pathway (P1–P6) is composed of a certain number of proteins (small circles). Determining whether a given pathway is significantly hit by the target list requires a statistical null-model, i.e. a model that allows us to compute the probability to find a given number of targets in a pathway by random chance. Conceptually, the situation is an experiment where from an urn containing N balls in total, R of which are red, n balls are drawn randomly and we want to compute the probability that they contain r red balls. The urn is the set of all the proteins found in all the pathways of the database (N = 35 in the figure). The R red balls are the targets (five in the figure; unmapped targets are ignored). The n drawn balls are the proteins found in a given pathway (9 for P3) and r is the number of targets in this pathway (3). The probability to find r targets in a pathway of size n by chance is given by the hypergeometric probability P(r|n,R,N), and summing over all the possible values k ≥ r gives the pathway p-value. When this p-value is below a chosen cutoff, say 1%, the pathway is considered significantly enriched in the protein targets. This method is equivalent to Fisher's exact test, and with the numbers above we find P(r|n,R,N) = 0.0841 and p < 0.0946, which is not significant. (B) With weights w_i associated with drug targets, the score s of a pathway P is the sum of the weights of the proteins found in the pathway (in case a multiplicative score is preferred, logarithms can be summed). To estimate the distribution of scores observed by random chance, a large number of subsets P′ of the database proteins with the same size as P are generated. For each, the weights of the targets in P′ are summed, and a histogram of the null-distribution is obtained. The histogram can be used as such or fitted with a theoretical distribution, e.g. Gamma, and the p-value of s is estimated.
(C) Individual pathway p-value thresholds are adapted to control the FDR among the set of pathways selected as significant. (D) In TopGO analysis [50], terms of a GO that are found significant (lower red node) are excluded from subsequent calculations, whereas nodes containing targets but not significant (indicated with a -) contribute to their parents (upwards arrows). (E) Double filtering pathway selection by requiring a target to be present in the pathway and coherent downstream regulation measured in a complementary experiment such as expression proteomics. Here, inhibiting the red proteins with a drug should increase the expression of the green protein. (F) Principle of local enrichment, as implemented in the functional cloud method [58]. A group of adjacent proteins in the interactome are found that share a common functional annotation a. Such subnetworks can be scored and those which obtain a score higher than what is expected by chance represent biological functions that are likely to be modulated by the drug.
sufficient concentration. Spectral counts reflect a mixture of target affinity and abundance and, although they are less precise than affinity estimates, they therefore represent a convenient ad hoc indication of target importance. In the case of available affinity estimates, more precise target weights can be determined. An obvious choice is the affinity estimate itself, but it is also worth investigating the product of the affinity and a protein abundance estimate. As indicated above, drug pulldowns retrieve protein complexes completely or partially and target lists can contain proteins that do not interact with the drug directly. One can argue that the important units of molecular function are in fact the protein complexes, and that having several members of a complex in the target list should not perturb the bioinformatic analysis excessively. Indeed, secondary interactors might even facilitate the analysis since knowledge present in databases is incomplete, and it is possible that association with a pathway or with any relevant biological concept exists for a secondary interactor, but not for the corresponding drug target. Working with kinase inhibitors, where nonkinases are likely secondary interactors after nonspecific binder filtering, we reduced nonkinase weights in the analysis by a factor of 0.25 [26, 48]. Enrichment analysis on the basis of a weighted protein target list, possibly containing secondary interactor weights, cannot be performed with standard tools because the hypergeometric test would no longer be valid. It must be substituted with another test that models random association-weighted scores and, in general, there is no theoretical
Table 1. Major sources of nonspecific binders and their solutions
Source | Solution method | Limitations | Refs.a)
Binds to the beads or to the linker | Negative control with blocked beads | - | [83]
Binds to the beads or to the linker | Frequent hitter in previous pulldowns | Might eliminate correct targets | -
Binds to the beads or to the linker | Better linker (with hydrophilic spacers) | - | [84, 85]
Low-affinity abundant proteins | Frequent hitter in previous pulldowns | Might eliminate correct targets | -
Low-affinity abundant proteins | Negative control with another compound | Must ascertain there is no shared target; works better with a close chemical structure, which is not always available with no shared target | -
Low-affinity abundant proteins | Subtract core proteome | Will not remove medium-abundant nonspecific binders completely; no chance to find abundant targets | [23, 86, 87]
Low-affinity abundant proteins | Pulldown enrichment versus core proteome | Will not remove medium-abundant nonspecific binders completely | [8]
Low-affinity abundant proteins | Competition with free compound | - | [13, 31]
Low-affinity abundant proteins | Public database of frequent hitters | Might eliminate correct targets | [88]
Indirect interactions | Protein family, e.g. kinases only for kinase inhibitors | Exclude the possibility of unexpected targets | [8, 89]
Indirect interactions | Binary protein interaction from databases | Reduces possibilities but does not indicate the direct binders clearly | [53]
Indirect interactions | Binding domains, predicted protein interactions | Reduces possibilities but does not indicate the direct binders clearly; potential higher rate of false-positive binary interactions compared with measured binary interactions | [33]
a) Relevant references to either illustrate the successful application of the solution method or some of its limitations.
statistical distribution to do this. However, nonparametric methods such as permutation tests provide a convenient solution to model the null-distribution and they can work with virtually any weighted scoring scheme (Fig. 4B).
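A permutation test of this kind can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: the protein identifiers, the weights, the number of random subsets (n_perm), and the add-one correction of the empirical p-value are all choices made for this example.

```python
import random


def pathway_score(proteins, weights):
    """Sum of target weights over the proteins of a set (non-targets count as 0)."""
    return sum(weights.get(p, 0.0) for p in proteins)


def permutation_pvalue(pathway, weights, universe, n_perm=10000, seed=0):
    """Empirical p-value of a weighted pathway score.

    Random subsets of the database proteins (universe, a list) with the same
    size as the pathway are scored to build the null-distribution.
    """
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    observed = pathway_score(pathway, weights)
    hits = 0
    for _ in range(n_perm):
        subset = rng.sample(universe, len(pathway))
        if pathway_score(subset, weights) >= observed:
            hits += 1
    # Add-one correction avoids reporting an empirical p-value of exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical database of 35 proteins with 5 weighted targets
universe = [f"prot{i}" for i in range(35)]
weights = {"prot0": 10.0, "prot1": 8.0, "prot2": 6.0, "prot3": 1.0, "prot4": 1.0}

hit_pathway = universe[0:3] + universe[10:16]  # 9 proteins, 3 heavy targets
miss_pathway = universe[20:29]                 # 9 proteins, no target

score_hit, p_hit = permutation_pvalue(hit_pathway, weights, universe)
score_miss, p_miss = permutation_pvalue(miss_pathway, weights, universe)
print(f"hit pathway: score = {score_hit}, p = {p_hit:.4f}")
print(f"miss pathway: score = {score_miss}, p = {p_miss:.4f}")
```

Because only the scoring function refers to the weights, the same loop works with multiplicative scores (summing logarithms) or with reduced weights for secondary interactors.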
3.3 Multiple testing
As a drug target list is searched against a database of pathways, or any other option mentioned above, the comparison of the target list with each individual pathway yields a p-value. Imposing a maximum false-positive rate (FPR) α as a threshold on individual p-values does not provide a convenient way of controlling the false-positive rate in the final selection of significant pathways. The p-values are obtained from a null-model that ignores the multiple selections and, in particular, the number of true positives present in the database. The most stringent solution is to require individual p-values smaller than α/N, where N is the number of pathways described in the database. This solution is named the Bonferroni correction and it ensures that the probability to have one or more false positives among the selected pathways is not more than α. The problem with this approach (and related less strict ones such as the Sidak correction) is that sensitivity is clearly reduced, and it ignores the number of selected pathways completely, considering the total database size only. A more appropriate and widely used solution is to control the false discovery rate (FDR), which is defined as the rate of
false positives among the selection of significant pathways [49]. FDR is a much more natural concept that is related to the selection size, which is readily understandable for the user of a tool. There exist methods to automatically adjust p-value thresholds on individual pathways such that a prescribed FDR threshold is met (Fig. 4C) and many common tools such as DAVID [46] offer this option.
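The difference between the two corrections can be made concrete with a small Python sketch. The FDR part uses the Benjamini-Hochberg step-up procedure, a commonly used FDR-controlling method; the text does not state which procedure is meant, so this choice is an assumption, and the list of p-values is invented for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Select p-values smaller than alpha / N, N being the number of tests."""
    n_tests = len(pvalues)
    return [p < alpha / n_tests for p in pvalues]


def benjamini_hochberg(pvalues, fdr=0.05):
    """Benjamini-Hochberg step-up selection at the requested FDR level."""
    n_tests = len(pvalues)
    order = sorted(range(n_tests), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= k * fdr / N
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / n_tests:
            k_max = rank
    selected = [False] * n_tests
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            selected[idx] = True
    return selected


# Hypothetical p-values obtained for ten pathways
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print("Bonferroni selects", sum(bonf), "pathway(s)")
print("Benjamini-Hochberg selects", sum(bh), "pathway(s)")
```

On these numbers the FDR-based selection retains one pathway more than the Bonferroni correction, which illustrates the gain in sensitivity discussed above.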
3.4 The use of structures in the enrichment
Different improvements over the enrichment analysis methods have been proposed to increase specificity. In most cases, these methods try to make better use of the structure of the reference data described in the database. A first example is provided by the GO, which can be compared with drug targets at different levels of detail. Full detail analyses might return very specific and not so relevant annotations as significant hits. High-level analyses, with all the detailed GO terms mapped onto a few generic ones, might hide interesting differences at higher levels of detail. Several authors have proposed methods to combine both options. An interesting example is TopGO [50], where detailed GO terms not found to be significant participate in the analysis of more generic terms, whereas detailed significant terms are excluded from further analysis (Fig. 4D). DAVID offers only an alternative, which is to use the most detailed GO terms and to group them a posteriori to reduce the output complexity.
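The elimination idea can be illustrated with a simplified Python sketch. It follows the principle described here (significant detailed terms withdraw their proteins from more generic terms) but it is not the TopGO implementation: the three-term ontology, the gene identifiers, the 1% cutoff, and the choice of keeping the urn size fixed after elimination are all assumptions made for this example.

```python
from math import comb


def hypergeom_tail(r, n, R, N):
    """Probability of r or more targets in a set of n proteins (urn of N, R targets)."""
    return sum(comb(R, k) * comb(N - R, n - k) for k in range(r, min(n, R) + 1)) / comb(N, n)


def ancestors(term, parents):
    """All terms reachable from term by following child -> parent links."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def elim_enrichment(annotations, parents, targets, universe, alpha=0.01):
    """Test terms from most detailed to most generic; proteins of a significant
    term are removed from all its ancestors before these are tested."""
    genes = {term: set(members) for term, members in annotations.items()}
    # A child always has more ancestors than its parents, so it is tested first
    order = sorted(genes, key=lambda t: len(ancestors(t, parents)), reverse=True)
    n_universe, n_targets = len(universe), len(targets & universe)
    pvalues = {}
    for term in order:
        n = len(genes[term])
        r = len(genes[term] & targets)
        pvalues[term] = hypergeom_tail(r, n, n_targets, n_universe) if n else 1.0
        if pvalues[term] < alpha:
            for anc in ancestors(term, parents):
                genes[anc] -= genes[term]
    return pvalues


# Hypothetical ontology: leaf -> mid -> root, over 40 genes with 5 targets
universe = {f"g{i}" for i in range(40)}
annotations = {
    "root": set(universe),
    "mid": {f"g{i}" for i in range(20)},
    "leaf": {f"g{i}" for i in range(6)},
}
parents = {"leaf": ["mid"], "mid": ["root"]}
targets = {f"g{i}" for i in range(5)}

res = elim_enrichment(annotations, parents, targets, universe)
for term in ("leaf", "mid", "root"):
    print(term, f"{res[term]:.3g}")
```

In this toy case all targets sit in the detailed term, so once that term is found significant the more generic terms are tested without them and no longer appear enriched.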
Table 2. Selection of useful resources

Name | Type a) | Description and references
Classical enrichment analysis | |
DAVID | W | Generic, simple, and rich tool (pathways, GO, domains, etc.) [46]
GO | W | The European Bioinformatics Institute website provides the GOs [45] with tools
Pathway databases | |
Biological pathways | D, W | KEGG [43], NCI-PID [44], BioCarta (www.biocarta.com), NetPath [90], and WikiPathways [91] are commonly used databases whose websites provide mapping tools
PPI databases | D, W | MINT [54], IntAct [53], DIP [92], HPRD [57], BioGRID [55], and InnateDB [56] are common repositories of PPIs
STRING, STITCH | D, W | STRING is an interaction database that complements PPI data with interactions inferred from text mining, coevolution, and simultaneous presence in pathways [93]; STITCH is a related project that compiles drug–protein interactions [94]
Diseases | |
OMIM | D, W | Database of gene–disease associations [95]
Tumors | D, W | COSMIC [96] and Oncomine (www.oncomine.org) compile genetic defects found in tumors
Drug databases | |
Compounds | D, W | DrugBank integrates information on drugs and their targets [97], SIDER compiles drug side effects [98], SMPDB provides drug metabolic pathways [99], MMsINC describes a vast collection of compounds [100], and ChEBI [101] lists active compounds
TCM database | D, W | A comprehensive database of active molecules in TCM [74]
Tools | |
Network display and analysis tools | T | Cytoscape is the most widely used tool with many plug-ins to perform network analyses and extensions [102]; BiologicalNetworks is an alternative system also offering a rich set of functions [103]
R | T | R (www.r-project.org) is a data analysis environment and programming language that provides a comprehensive set of packages to analyze networks, perform enrichment analysis, etc., via its bioinformatics extension (www.bioconductor.org)
MeV | T | A generic tool to perform a multitude of gene expression profile analyses, also applicable to proteomic data [104]
SwissDock | W | Ligand–small molecule binding tool to identify true interactors [105]

a) D, database; W, web site with query tools; T, stand-alone tool.
As a second example of exploiting structures in the reference data, it is possible to filter pathway hits by integrating expression proteomics or gene microarray data, where drug-treated cells are profiled. Pathways truly modulated by a drug should contain upstream targets and exhibit downstream regulation (Fig. 4E). There also exist algorithms developed for gene microarray data that score the global coherence of multiple hits on pathways, taking their topology into account [51]. Weighted target lists can be submitted to such programs.
3.5 Integrating protein interactions: A first systems biology method
To a certain extent, definitions of pathways and biological processes are arbitrary and, for sure, many relationships between proteins and genes are not known [52]. Large amounts of human protein–protein physical interactions (PPI) have been collected and stored in public repositories such as IntAct, MINT, BioGRID, HPRD, and InnateDB [53–57]. The interactome, i.e. the network of all the PPIs, constitutes a valuable complementary approach to consider drug targets in a broader context. It has been shown that proteins sharing physical interactions often share a function and, consequently, proximity in the interactome helps in improving the specificity of enrichment analysis. This can be explained by the frequent participation of proteins in several functions or pathways depending on associations with other proteins. Therefore, if a protein A can associate with B for a certain function or with C for another one, finding A and B in a drug pulldown indicates that the first function is modulated and not the second one. This further underlines the potential interest of secondary
Figure 5. Target profile expansion. (A) Adjacent proteins (1) of a target (red) are likely to share a function. By adding one additional layer of neighbors (2), functional specificity is often lost because of highly connected nodes. (B) To limit an explosion of the expanded target profile, only proteins that are linked to at least two targets (1) are added. In a relaxed version, it is possible to consider proteins with one link to a target only, provided they are directly linked to another such protein (2). (C) Diffusion over a small example network. The two rectangle nodes indicated by gray arrows represent targets with identical weights. We selected the top 10 scores on the network after diffusion and we observe synergistic effects that do not select nodes on the basis of their distance to the targets only. (D) We used the bosutinib target profile [26] and selected its 5% expanded profile [48] to illustrate the application of diffusion methods. Triangles are targets, diffusion scores are indicated by a color scale (red, strong), immune system-related proteins are shown with large nodes, and their names are followed in brackets by the numbers of interactions with other immune system proteins found within the subnetwork. This subnetwork suggests a high risk of immunosuppressive side effects.
binders to increase precision with regard to the context of drug action. A first approach to improve enrichment analyses consists in finding interactome subnetworks that contain the drug targets and are enriched for an annotated biological function [58] (Fig. 4F). We found this method very useful in analyzing the bafetinib target profile, which featured 33 kinases that yielded a clear CML-relevant association with apoptosis. As a comparison, direct GO enrichment analysis only yielded three significant (1%) biological processes, none of which was directly cancer related. KEGG pathway enrichment did not find any significant hit at the 1% significance level.
3.6 More global analysis of the impact on the protein interaction network
So far, we have presented methods where the target profile was analyzed through its mapping onto existing biological concepts to predict the impact of drug treatment. We already exploited the interactome as a means of embedding target profiles in a broader and more neutral context. Here, we go one step further by trying to expand the target list with
functionally associated proteins. In this procedure, only the topology of the interactome matters and existing functional annotations are ignored. In principle, adjacency within the interactome often implies a related function due to the common participation in a complex or a pathway. Directly expanding the target list with immediate network neighbors is a first option (Fig. 5A). We occasionally obtained satisfying results, but this approach usually does not perform well, either because more distant relevant neighbors are missed or because too many additional proteins are added, resulting in dilution of the fingerprint of relevant biological functions (Fig. 5B). As an example of this difficulty, the bafetinib profile expanded with direct interactions increased from 33 initial proteins to 831. GO enrichment analysis yielded 676 1%-significant biological processes [58], a massive dilution of information. In general, PPI networks have a small-world or scale-free topology [59], which means that their organization resembles air traffic routes: every airport is at a distance of one or two flights from a large hub that connects it to another hub, which is close to the final destination. Therefore, adding more than one layer of neighbors frequently encounters highly connected proteins and too many proteins are included (Fig. 5A). Obviously, the solution to this problem is
3.7 Toward drug efficacy predictions
The next step in the interpretation of chemical proteomic target profiles consists in relating target profiles to diseases and patient genetic backgrounds. The main tool to compute this relationship is an intuitive notion of similarity over the interactome. Through diffusion methods, we can compute the influence of a drug profile, i.e. we compute a drug treatment model. We can do the same starting with the genes causing a disease and thus obtain a disease model. Comparing the two models, we can estimate drug treatment efficacy for a certain disease [48] (Fig. 6A). This method has its origin in the work of researchers who tried to relate phenotypes (diseases, patient records, etc.) with genetic information (genes causing a disease, genetic defects, etc.) through the exploitation of PPI networks. Such studies can predict unknown new important players in certain pathologies [60, 67, 68], suggest new drug targets, or even, closely related to our topic, suggest candidate molecules to treat a disease [69]. Direct KEGG pathway enrichment analysis of the imatinib target profile yielded only two 5%-significant hits, neither of which was CML relevant. Expanding the imatinib target list through direct PPIs generated a list of 295 proteins. When submitted to KEGG enrichment analysis, this extended protein list returned 30 5%-significant hits with apoptosis at rank 4. For comparison, the network diffusion methods combined with drug efficacy scores returned CML as the top hit [48]. Furthermore, we were able to show that reasonable estimates of treatment efficacy can be obtained through drug efficacy scores when comparing four BCR-ABL kinase inhibitors.
We also showed that, after modifying the disease model to introduce the constitutive activation of the LYN kinase observed in certain imatinib-resistant patients, the computation determined an increased score for the second-generation compounds dasatinib, bosutinib, and nilotinib, especially designed to target LYN in addition to BCR-ABL. The imatinib score remained essentially constant, thereby illustrating the potential to segregate patients with these methods. We compared the target profiles of all four inhibitors with a list of diseases and proposed plausible additional indications for each compound (Fig. 6B). For instance, lung cancer was highly ranked for dasatinib and actual efficacy was shown in another study [18]. Noonan syndrome was ranked first for bosutinib, which makes sense since several cancer-associated genes (KRAS, PTPN11, SOS1, RAF1) are involved [70] and kinase inhibitor treatments are currently considered for this syndrome. These examples illustrate how target profiles can be analyzed to predict new drug indications. There is a lot of evidence that distinct pathologies might share molecular mechanisms [71–73] and, looking forward, this further increases the opportunities for drug repurposing: drugs can be related to diseases and, through disease associations, proposed as therapeutic agents for new disea
to constrain the expansion procedure such that proteins are added only when sufficient evidence of a potential association with the targets exists. Proteins with direct PPIs with at least two targets constitute safe expansions (Fig. 5B) and they usually increase signal to noise in the functional analysis [16], i.e. submitting the expanded list to enrichment analysis yields more relevant hits without generating additional false positives, or even reduces false positives through FDR control. There are cases where small target lists cannot be augmented sufficiently this way and it is necessary to consider a second layer of PPI. Usually, the list of added proteins explodes with a second layer and all the target spectrum specificity is lost. A controlled version imposes that such proteins have at least one PPI with a target and one PPI with another such protein (Fig. 5B), but this is not a complete secondary layer of PPI and in many cases it is not even sufficient to control the explosion of the expanded target list. One elegant solution to the above limitations involves the notion of diffusion over a network. The concept is rather natural: starting from a set of seed proteins, in our case the drug targets, their influence diffuses over the network to give a score to all the other proteins (Fig. 5C). The interest of this method is that the global network topology is exploited and synergies between close protein targets confer increased scores to linked proteins. Distantly related proteins that are connected to drug targets through specific paths, which do not contain highly connected proteins, can be scored relatively high (Fig. 5C). In a neutral context, where it is not known which protein interaction is crucial for a treatment or a disease, diffusion methods provide efficient methods that capture a notion of functional proximity that follows the biological intuition.
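The constrained expansion described above, in which a protein is added only if it has direct PPIs with at least two targets, can be sketched in a few lines of Python. The edge list, the protein names, and the function name are hypothetical; a real analysis would load the interactome from one of the PPI repositories listed in Table 2.

```python
def expand_targets(ppi_edges, targets, min_links=2):
    """Return non-target proteins that have direct PPIs with at least
    `min_links` distinct targets."""
    targets = set(targets)
    # Build an undirected adjacency map from the list of interactions.
    neighbors = {}
    for a, b in ppi_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    added = set()
    for protein, partners in neighbors.items():
        if protein not in targets and len(partners & targets) >= min_links:
            added.add(protein)
    return added


# Hypothetical toy interactome: T1 and T2 are the drug targets.
ppi_edges = [("T1", "A"), ("T2", "A"), ("T1", "B"),
             ("T2", "HUB"), ("HUB", "C"), ("HUB", "D")]
targets = {"T1", "T2"}

safe_expansion = expand_targets(ppi_edges, targets, min_links=2)
naive_expansion = expand_targets(ppi_edges, targets, min_links=1)
print(sorted(safe_expansion), sorted(naive_expansion))
```

With `min_links=2` only protein A is added, whereas the naive first-layer expansion (`min_links=1`) also pulls in B and the highly connected HUB node, which is the source of the dilution discussed in this section.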
The actual computation of the diffusion and the weights can be implemented as the asymptotic distribution of a random walk or first passage times [48, 60, 61], or more precisely controlled through diffusion kernels [60]. In every case, affinity weights can be used to adjust the importance of the seed proteins. Interestingly, World Wide Web HTML document hyperlinks constitute a network that also has a small-world topology, and random walk methods are at the heart of widely used web search engines. On the basis of the weights determined by the diffusion method, it is possible to select either the top K proteins or, through many repeated diffusions using random target spectra, to determine a score threshold [48] and thus obtain a subnetwork. Figure 5D shows the 5%-significant bosutinib subnetwork, which revealed a strong association with immune system pathways [48]. This immunosuppressive risk illustrates a plausible application of this method to the prediction of side effects, since dasatinib, a related kinase inhibitor in clinical use, has been documented previously to cause such effects [62] and several proteins in the bosutinib subnetwork, such as LYN [62], BTK [63], TBK1 [64], and SYK [65, 66], are known to cause immunosuppression upon inactivation.
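To make the diffusion idea concrete, the sketch below computes the asymptotic distribution of a random walk that restarts at the seed proteins, which is one common variant of the random walk formulations mentioned above. It is an illustration under stated assumptions and not the implementation used in [48]: the restart probability of 0.3, the toy network, and all function names are hypothetical, and affinity weights are simply passed as seed weights.

```python
def random_walk_with_restart(edges, seed_weights, restart=0.3,
                             tol=1e-12, max_iter=1000):
    """Stationary scores of a random walk on an undirected network that
    jumps back to the seeds with probability `restart` at each step."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    for seed in seed_weights:
        neighbors.setdefault(seed, set())
    total = sum(seed_weights.values())
    start = {n: seed_weights.get(n, 0.0) / total for n in neighbors}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in neighbors}
        for node, partners in neighbors.items():
            if partners:
                # Spread the remaining mass equally over interaction partners.
                share = (1.0 - restart) * scores[node] / len(partners)
                for partner in partners:
                    updated[partner] += share
            else:
                # An isolated node keeps its own mass.
                updated[node] += (1.0 - restart) * scores[node]
        change = sum(abs(updated[n] - scores[n]) for n in neighbors)
        scores = updated
        if change < tol:
            break
    return scores


def top_k(scores, seeds, k):
    """The k highest-scoring proteins that are not seeds themselves."""
    ranked = sorted((n for n in scores if n not in seeds),
                    key=lambda n: scores[n], reverse=True)
    return ranked[:k]


# Hypothetical network: S1 and S2 are targets with identical weights.
edges = [("S1", "A"), ("S2", "A"), ("S2", "B"), ("B", "C")]
seeds = {"S1": 1.0, "S2": 1.0}

scores = random_walk_with_restart(edges, seeds)
print(top_k(scores, seeds, 1))
```

Protein A, which is linked to both targets, receives a higher score than protein B, which is linked to one target only, reflecting the synergy between close targets. A score threshold could be derived by repeating the computation with randomly drawn seed sets, as described in the text; this step is omitted here.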
to human health deal with similar situations. For instance, TCM research has accumulated a lot of information on many active molecules present in a TCM drug and their target proteins [74]. A TCM drug action can be analyzed with the concepts presented here [75–77]. In fact, these were pioneered by TCM researchers [78–80]. A similar consideration can be made regarding nutritional science, where through the ingestion of food multiple molecular changes can be induced in human gourmets, suggesting that nutrigenomics could benefit from such systems-wide impact analyses as it is collecting evidence on the impact of specific diets on human proteins [81, 82].
Perspective and concluding remarks
Figure 6. Scoring drug treatment efficacy. (A) On the basis of a diffusion method and drug targets with their affinities, a drug treatment model is built. Similarly, genes or proteins causing a disease can be used to build a disease model, possibly integrating patient-specific information such as genetic abnormalities. At the intersection of the two models, a drug efficacy score can be computed, e.g. by multiplying the two diffusion scores given by each model to individual proteins and summing over the proteins in the intersection. (B) Two obvious applications of treatment efficacy scores. Drugs can be compared for a chosen disease or patient to predict an adequate treatment, i.e. to implement personalized medicine. Conversely, a chosen compound can be compared with disease models to find new indications for this compound (drug repurposing). (C) Drugs can not only be compared with diseases, but relationships between drugs can also be exploited to transfer knowledge available for some compounds to a new compound and, identically, shared molecular bases of distinct pathologies can be exploited to predict side effects and repurpose drugs.
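The example score given in the caption of Fig. 6A (product of the two diffusion scores, summed over the proteins present in both models) can be written down directly. The sketch below assumes that each model is available as a mapping from protein to diffusion score; the protein labels, the numerical values, and the function names are hypothetical and serve only to show the two uses of Fig. 6B.

```python
def efficacy_score(drug_model, disease_model):
    """Sum, over proteins present in both models, of the product of the
    drug and disease diffusion scores."""
    shared = drug_model.keys() & disease_model.keys()
    return sum(drug_model[p] * disease_model[p] for p in shared)


def rank_by_score(models, reference_model):
    """Names of the models sorted by decreasing efficacy score against
    a fixed reference model (a disease, or conversely a drug)."""
    return sorted(models,
                  key=lambda name: efficacy_score(models[name], reference_model),
                  reverse=True)


# Hypothetical diffusion scores for two drugs and one disease model.
drug_models = {
    "drug_a": {"P1": 0.5, "P2": 0.3, "P3": 0.2},
    "drug_b": {"P3": 0.6, "P4": 0.4},
}
disease_model = {"P1": 0.7, "P2": 0.2, "P5": 0.1}

score_a = efficacy_score(drug_models["drug_a"], disease_model)
score_b = efficacy_score(drug_models["drug_b"], disease_model)
print(score_a, score_b, rank_by_score(drug_models, disease_model))
```

Ranking several drug models against one disease model corresponds to treatment selection, whereas passing a set of disease models and a single drug model as the reference corresponds to the search for new indications.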
ses (Fig. 6C). Drug treatment models can also be applied to compare drugs and potentially assign documented side effects or areas of applications to new compounds (Fig. 6C).
3.8 Data sets not from chemical proteomics
One important aspect discussed is the difficulty of interpreting complex and potentially large drug profiles in a systems biology perspective. Other fields of research related
We have presented a brief overview of chemical proteomic methods and introduced various bioinformatic methods to analyze drug target profiles. The simplest methods performed enrichment analyses, where sets of proteins, e.g. biological pathways, are compared with drug profiles to detect over-representation of the targets in those sets. Then, modeling the specifics of chemical proteomics better, we introduced methods that take into account estimations of drug–protein affinities. Finally, expanding further, we presented methods that implement systems-wide analyses by embedding the drug profiles in global models of cell biology such as an interactome. The high degree of interconnectivity among the entities involved in biological processes is a natural and strong argument to investigate the positive and negative consequences of drug treatment from the point of view of systems biology. We have provided examples of kinase inhibitors that have a broad spectrum of targets, e.g. bafetinib with 33 kinases, and whose target profiles cannot be analyzed successfully without the contribution of the human interactome data. Furthermore, we have shown how information on the molecular causes of diseases can be correlated with the consequences of drug treatment over the interactome to obtain reasonable predictions of side effects, additional indications, and matches with individual patient genetic backgrounds (Fig. 6A and B). More generally, the measurement of target profiles provides an elegant way to compare drugs with each other on the basis of their action on pathways or the interactome (Fig. 6C) and, in combination with similar comparisons realized with diseases, it makes it possible to build a complex set of relationships that allow transferring information from one well-characterized compound to a new molecule or postulating side effects. It is well known that patients must be segregated to improve treatment efficacy, limit costs, and reduce new compound attrition rates.
The spectacular improvements achieved in DNA-sequencing technologies open avenues for obtaining very detailed information on patient genomes. We believe that, at the other end of the spectrum, chemical proteomics and the systems biology analysis of its data have a
fundamental role to play in order to link patient-specific digital models to accurate models of drug action.
The authors thank Professor Shao Li and Professor Jing Zhao for their help on TCM applications.
The authors have declared no conflict of interest.
References
[1] Domon, B., Aebersold, R., Mass spectrometry and protein analysis. Science 2006, 312, 212–217.
[2] Bantscheff, M., Scholten, A., Heck, A. J., Revealing promiscuous drug-target interactions by chemical proteomics. Drug Discov. Today 2009, 14, 1021–1029.
[3] Rix, U., Superti-Furga, G., Target profiling of small molecules by chemical proteomics. Nat. Chem. Biol. 2009, 5, 616–624.
[4] Booth, B., Zemmel, R., Prospects for productivity. Nat. Rev. Drug Discov. 2004, 3, 451–456.
[5] Csermely, P., Agoston, V., Pongor, S., The efficiency of multi-target drugs: the network approach might help drug design. Trends Pharmacol. Sci. 2005, 26, 178–182.
[6] Araujo, R. P., Liotta, L. A., Petricoin, E. F., Proteins, drug targets and the mechanisms they control: the simple truth about complex networks. Nat. Rev. Drug Discov. 2007, 6, 871–880.
[7] Keith, C. T., Borisy, A. A., Stockwell, B. R., Multicomponent therapeutics for networked systems. Nat. Rev. Drug Discov. 2005, 4, 71–78.
[8] Rix, U., Hantschel, O., Durnberger, G., Remsing Rix, L. L. et al., Chemical proteomic profiles of the BCR-ABL inhibitors imatinib, nilotinib, and dasatinib reveal novel kinase and nonkinase targets. Blood 2007, 110, 4055–4063.
[9] Hood, L., Heath, J. R., Phelps, M. E., Lin, B., Systems biology and new technologies enable predictive and preventative medicine. Science 2004, 306, 640–643.
[10] Gavin, A.-C., Aloy, P., Grandi, P., Krause, R. et al., Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440, 631–636.
[11] Harding, M. W., Galat, A., Uehling, D. E., Schreiber, S. L., A receptor for the immunosuppressant FK506 is a cis-trans peptidyl-prolyl isomerase. Nature 1989, 341, 758–760.
[12] Cuatrecasas, P., Wilchek, M., Anfinsen, C. B., Selective enzyme purification by affinity chromatography. Proc. Natl. Acad. Sci. USA 1968, 61, 636–643.
[13] Bantscheff, M., Eberhard, D., Abraham, Y., Bastuck, S. et al., Quantitative chemical proteomics reveals mechanisms of action of clinical ABL kinase inhibitors. Nat. Biotechnol. 2007, 25, 1035–1044.
[14] Patricelli, M. P., Nomanbhoy, T. K., Wu, J., Brown, H. et al., In situ kinase profiling reveals functionally relevant properties of native kinases. Chem. Biol. 2011, 18, 699–710.
[15] Olsen, J. V., Blagoev, B., Gnad, F., Macek, B. et al., Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127, 635–648.
[16] Hantschel, O., Gstoettenbauer, A., Colinge, J., Kaupe, I. et al., The chemokine interleukin-8 and the surface activation protein CD69 are markers for Bcr-Abl activity in chronic myeloid leukemia. Mol. Oncol. 2008, 2, 272–281.
[17] Pan, C., Olsen, J. V., Daub, H., Mann, M., Global effects of kinase inhibitors on signaling networks revealed by quantitative phosphoproteomics. Mol. Cell. Proteomics 2009, 8, 2796–2808.
[18] Li, J., Rix, U., Fang, B., Bai, Y. et al., A chemical and phosphoproteomic characterization of dasatinib action in lung cancer. Nat. Chem. Biol. 2010, 6, 291–299.
[19] Winter, G. E., Rix, U., Lissat, A., Stukalov, A. et al., An integrated chemical biology approach identifies specific vulnerability of Ewing's sarcoma to combined inhibition of Aurora kinases A and B. Mol. Cancer Ther. 2011.
[20] Ramsden, N., Perrin, J., Ren, Z., Lee, B. D. et al., Chemoproteomics-based design of potent LRRK2-selective lead compounds that attenuate Parkinson's disease-related toxicity in human neurons. ACS Chem. Biol. 2011.
[21] Duncan, J. S., Gyenis, L., Lenehan, J., Bretner, M. et al., An unbiased evaluation of CK2 inhibitors by chemoproteomics: characterization of inhibitor effects on CK2 and identification of novel inhibitor targets. Mol. Cell. Proteomics 2008, 7, 1077–1088.
[22] Mercer, L., Bowling, T., Perales, J., Freeman, J. et al., 2,4-Diaminopyrimidines as potent inhibitors of Trypanosoma brucei and identification of molecular targets by a chemical proteomics approach. PLoS Negl. Trop. Dis. 2011, 5, e956.
[23] Fadden, P., Huang, K. H., Veal, J. M., Steed, P. M. et al., Application of chemoproteomics to drug discovery: identification of a clinical candidate targeting hsp90. Chem. Biol. 2010, 17, 686–694.
[24] Kang, H. J., Yoon, T. S., Jeong, D. G., Kim, Y. et al., Identification of proteins binding to decursinol by chemical proteomics. J. Microbiol. Biotechnol. 2008, 18, 1427–1430.
[25] Fleischer, T. C., Murphy, B. R., Flick, J. S., Terry-Lorenzo, R. T. et al., Chemical proteomics identifies Nampt as the target of CB30865, an orphan cytotoxic compound. Chem. Biol. 2010, 17, 659–664.
[26] Remsing Rix, L. L., Rix, U., Colinge, J., Hantschel, O. et al., Global target profile of the kinase inhibitor bosutinib in primary chronic myeloid leukemia cells. Leukemia 2009, 23, 477–485.
[27] Fernbach, N. V., Planyavsky, M., Muller, A., Breitwieser, F. P. et al., Acid elution and one-dimensional shotgun analysis on an Orbitrap mass spectrometer: an application to drug affinity chromatography. J. Proteome Res. 2009, 8, 4753–4765.
[28] Lundgren, D. H., Hwang, S. I., Wu, L., Han, D. K., Role of spectral counting in quantitative proteomics. Expert Rev. Proteomics 2010, 7, 39–53.
[29] Brehme, M., Hantschel, O., Colinge, J., Kaupe, I. et al., Charting the molecular network of the drug target Bcr-Abl. Proc. Natl. Acad. Sci. USA 2009, 106, 7414–7419.
[30] Choi, H., Fermin, D., Nesvizhskii, A. I., Significance analysis of spectral count data in label-free shotgun proteomics. Mol. Cell. Proteomics 2008, 7, 2373–2385.
[31] Rix, U., Remsing Rix, L. L., Terker, A. S., Fernbach, N. V. et al., A comprehensive target selectivity survey of the BCR-ABL kinase inhibitor INNO-406 by kinase profiling and chemical proteomics in chronic myeloid leukemia cells. Leukemia 2009, 24, 44–50.
[32] Venkatesan, K., Rual, J. F., Vazquez, A., Stelzl, U. et al., An empirical framework for binary interactome mapping. Nat. Methods 2009, 6, 83–90.
[33] Rhodes, D. R., Tomlins, S. A., Varambally, S., Mahavisno, V. et al., Probabilistic model of the human protein-protein interaction network. Nat. Biotechnol. 2005, 23, 951–959.
[34] Bantscheff, M., Hopf, C., Savitski, M. M., Dittmann, A. et al., Chemoproteomics profiling of HDAC inhibitors reveals selective targeting of HDAC complexes. Nat. Biotechnol. 2011, 29, 255–265.
[35] Liu, Y., Patricelli, M. P., Cravatt, B. F., Activity-based protein profiling: the serine hydrolases. Proc. Natl. Acad. Sci. USA 1999, 96, 14694–14699.
[36] Adam, G. C., Sorensen, E. J., Cravatt, B. F., Chemical strategies for functional proteomics. Mol. Cell. Proteomics 2002, 1, 781–790.
[37] Ross, P. L., Huang, Y. N., Marchese, J. N., Williamson, B. et al., Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 2004, 3, 1154–1169.
[38] Thompson, A., Schafer, J., Kuhn, K., Kienle, S. et al., Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal. Chem. 2003, 75, 1895–1904.
[39] Ong, S. E., Blagoev, B., Kratchmarova, I., Kristensen, D. B. et al., Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 2002, 1, 376–386.
[40] Breitwieser, F. P., Muller, A., Dayon, L., Kocher, T. et al., General statistical modeling of data from protein relative expression isobaric tags. J. Proteome Res. 2011, 10, 2758–2766.
[41] Cox, J., Mann, M., MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26, 1367–1372.
[42] Sharma, K., Weber, C., Bairlein, M., Greff, Z. et al., Proteomics strategy for quantitative protein interaction profiling in cell extracts. Nat. Methods 2009, 6, 741–744.
[43] Kanehisa, M., Araki, M., Goto, S., Hattori, M. et al., KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008, 36, D480–D484.
[44] Schaefer, C. F., Anthony, K., Krupa, S., Buchoff, J. et al., PID: the Pathway Interaction Database. Nucleic Acids Res. 2009, 37, D674–D679.
[45] Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D. et al., Gene ontology: tool for the unification of biology. Gene Ontol. Consortium. Nat. Genet. 2000, 25, 25–29.
[46] Huang da, W., Sherman, B. T., Lempicki, R. A., Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4, 44–57.
[47] Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S. et al., Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 2005, 102, 15545–15550.
[48] Colinge, J., Rix, U., Superti-Furga, G., in: Chen, L., Zhang, X., Shen, B., Wu, L., Wang, Y. (Eds.), 4th International Conference on Computational Systems Biology, World Publishing Company, Suzhou, China 2010, pp. 305–313.
[49] Benjamini, Y., Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 1995, 57, 289–300.
[50] Alexa, A., Rahnenfuhrer, J., Lengauer, T., Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 2006, 22, 1600–1607.
[51] Tarca, A. L., Draghici, S., Khatri, P., Hassan, S. S. et al., A novel signaling pathway impact analysis. Bioinformatics 2009, 25, 75–82.
[52] Glaab, E., Baudot, A., Krasnogor, N., Valencia, A., Extending pathways and processes using molecular interaction networks to analyse cancer genome data. BMC Bioinformatics 2010, 11, 597.
[53] Kerrien, S., Alam-Faruque, Y., Aranda, B., Bancarz, I. et al., IntAct: open source resource for molecular interaction data. Nucleic Acids Res. 2007, 35, D561–D565.
[54] Cesareni, G., Chatr-aryamontri, A., Licata, L., Ceol, A., Searching the MINT database for protein interaction information. Curr. Protoc. Bioinformatics 2008, Chapter 8, Unit 8.5.
[55] Breitkreutz, B. J., Stark, C., Reguly, T., Boucher, L. et al., The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008, 36, D637–D640.
[56] Lynn, D. J., Winsor, G. L., Chan, C., Richard, N. et al., InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Mol. Syst. Biol. 2008, 4, 218.
[57] Prasad, T. S., Kandasamy, K., Pandey, A., Human Protein Reference Database and Human Proteinpedia as discovery tools for systems biology. Methods Mol. Biol. 2009, 577, 67–79.
[58] Burkard, T. R., Rix, U., Breitwieser, F. P., Superti-Furga, G., Colinge, J., A computational approach to analyze the mechanism of action of the kinase inhibitor bafetinib. PLoS Comput. Biol. 2010, 6, e1001001.
[59] Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., Barabasi, A. L., The large-scale organization of metabolic networks. Nature 2000, 407, 651–654.
[60] Kohler, S., Bauer, S., Horn, D., Robinson, P. N., Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet. 2008, 82, 949–958.
[61] Berger, S. I., Iyengar, R., Network analyses in systems pharmacology. Bioinformatics 2009, 25, 2466–2472.
[62] Sillaber, C., Herrmann, H., Bennett, K., Rix, U. et al., Immunosuppression and atypical infections in CML patients treated with dasatinib at 140 mg daily. Eur. J. Clin. Invest. 2009, 39, 1098–1109.
[63] Hantschel, O., Rix, U., Schmidt, U., Burckstummer, T. et al., The Btk tyrosine kinase is a major target of the Bcr-Abl inhibitor dasatinib. Proc. Natl. Acad. Sci. USA 2007, 104, 13283–13288.
[64] Ishii, K. J., Kawagoe, T., Koyama, S., Matsui, K. et al., TANK-binding kinase-1 delineates innate and adaptive immune responses to DNA vaccines. Nature 2008, 451, 725–729.
[65] Hussain, S. F., Kong, L. Y., Jordan, J., Conrad, C. et al., A novel small molecule inhibitor of signal transducers and activators of transcription 3 reverses immune tolerance in malignant glioma patients. Cancer Res. 2007, 67, 9630–9636.
[66] Deuse, T., Velotta, J. B., Hoyt, G., Govaert, J. A. et al., Novel immunosuppression: R348, a JAK3- and Syk-inhibitor attenuates acute cardiac allograft rejection. Transplantation 2008, 85, 885–892.
[67] Berger, S. I., Ma'ayan, A., Iyengar, R., Systems pharmacology of arrhythmias. Sci. Signal. 2010, 3, ra30.
[68] Lage, K., Karlberg, E. O., Storling, Z. M., Olason, P. I. et al., A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat. Biotechnol. 2007, 25, 309–316.
[69] Li, S., Zhang, N., Zhang, B., in: Chen, L., Zhang, X., Shen, B., Wu, L., Wang, Y. (Eds.), 4th International Conference on Computational Systems Biology, World Publishing Company, Suzhou, China 2010, pp. 51–58.
[70] Gelb, B. D., Tartaglia, M., Noonan syndrome and related disorders: dysregulated RAS-mitogen activated protein kinase signal transduction. Hum. Mol. Genet. 2006, 15, R220–R226.
[71] Goh, K. I., Cusick, M. E., Valle, D., Childs, B. et al., The human disease network. Proc. Natl. Acad. Sci. USA 2007, 104, 8685–8690.
[72] Braun, P., Rietman, E., Vidal, M., Networking metabolites and diseases. Proc. Natl. Acad. Sci. USA 2008, 105, 9849–9850.
[73] Yildirim, M. A., Goh, K. I., Cusick, M. E., Barabasi, A. L., Vidal, M., Drug-target network. Nat. Biotechnol. 2007, 25, 1119–1126.
[74] Chen, C. Y., TCM Database@Taiwan: the world's largest traditional Chinese medicine database for drug screening in silico. PLoS One 2011, 6, e15939.
[75] Zhao, J., Jiang, P., Zhang, W., Molecular networks for the study of TCM pharmacology. Brief. Bioinform. 2010, 11, 417–430.
[76] Li, S., Zhang, B., Zhang, N., Network target for screening synergistic drug combinations with application to traditional Chinese medicine. BMC Syst. Biol. 2011, 5, S10.
115
[77] Li, S., Zhang, B., Jiang, D., Wei, Y., Zhang, N., Herb network construction and co-module analysis for uncovering the combination rule of traditional Chinese herbal formulae. Biomed. Chromatogr. Bioinformatics 2010, 11, S6. [78] Zeng, H., Dou, S., Zhao, J., Fan, S. et al., The inhibitory activities of the components of Huang-Lian-Jie-Du-Tang (HLJDT) on eicosanoid generation via lipoxygenase pathway. J. Ethnopharmacol. 2011, 135, 561568. [79] Wang, L., Zhou, G. B., Liu, P., Song, J. H. et al., Dissection of mechanisms of Chinese medicinal formula Realgar-Indigo naturalis as an effective treatment for promyelocytic leukemia. Proc. Natl. Acad. Sci. USA 2008, 105, 48264831. [80] Ung, C. Y., Li, H., Cao, Z. W., Li, Y. X., Chen, Y. Z., Are herbpairs of traditional Chinese medicine distinguishable from others? Pattern analysis and articial intelligence classication study of traditionally dened herbal properties. J. Ethnopharmacol. 2007, 111, 371377. [81] Kussmann, M., Role of proteomics in nutrigenomics and nutrigenetics. Expert Rev. Proteomics 2009, 6, 453456. [82] Kussmann, M., Affolter, M., Proteomics at the center of nutrigenomics: comprehensive molecular understanding of dietary health effects. Nutrition 2009, 25, 10851093. [83] Shiyama, T., Furuya, M., Yamazaki, A., Terada, T., Tanaka, A., Design and synthesis of novel hydrophilic spacers for the reduction of nonspecic binding proteins on afnity resins. Bioorg. Med. Chem. 2004, 12, 28312841. [84] Oda, Y., Owa, T., Sato, T., Boucher, B. et al., Quantitative chemical proteomics for identifying candidate drug targets. Anal. Chem. 2003, 75, 21592165. [85] Wang, G., Shang, L., Burgett, A. W., Harran, P. G., Wang, X., Diazonamide toxins reveal an unexpected function for ornithine delta-amino transferase in mitotic cell division. Proc. Natl. Acad. Sci. USA 2007, 104, 20682073. [86] Schirle, M., Heurtier, M. 
A., Kuster, B., Proling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 2003, 2, 12971305. [87] Burkard, T. R., Planyavsky, M., Kaupe, I., Breitwieser, F. P. et al., Initial characterization of the human central proteome. Biomed. Chromatogr. Syst. Biol. 2011, 5, 17. [88] Trinkle-Mulcahy, L., Boulon, S., Lam, Y. W., Urcia, R. et al., Identifying specic protein interaction partners using quantitative mass spectrometry and bead proteomes. J. Cell. Biol. 2008, 183, 223239. [89] Winger, J. A., Hantschel, O., Superti-Furga, G., Kuriyan, J., The structure of the leukemia drug imatinib bound to human quinone reductase 2 (NQO2). Biomed. Chromatogr. Struct. Biol. 2009, 9, 7. [90] Kandasamy, K., Mohan, S. S., Raju, R., Keerthikumar, S. et al., NetPath: a public resource of curated signal transduction pathways. Genome Biol. 2010, 11, R3. [91] Pico, A. R., Kelder, T., van Iersel, M. P., Hanspers, K. et al., WikiPathways: pathway editing for the people. PLoS Biol. 2008, 6, e184. [92] Xenarios, I., Salwinski, L., Duan, X. J., Higney, P. et al., DIP, the Database of Interacting Proteins: a research tool for
116
J. Colinge et al. studying cellular networks of protein interactions. Nucleic Acids Res. 2002, 30, 303305.
Proteomics Clin. Appl. 2012, 6, 102116 [99] Frolkis, A., Knox, C., Lim, E., Jewison, T. et al., SMPDB: The small molecule pathway database. Nucleic Acids Res. 2010, 38, D480D487. [100] Masciocchi, J., Frau, G., Fanton, M., Sturlese, M. et al., MMsINC: a large-scale chemoinformatics database. Nucleic Acids Res. 2009, 37, D284D290. [101] de Matos, P., Alcantara, R., Dekker, A., Ennis, M. et al., Chemical entities of biological interest: an update. Nucleic Acids Res. 2010, 38, D249D254. [102] Smoot, M. E., Ono, K., Ruscheinski, J., Wang, P. L., Ideker, T., Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 2011, 27, 431432. [103] Kozhenkov, S., Dubinina, Y., Sedova, M., Gupta, A. et al., BiologicalNetworks 2.0 an integrative view of genome biology data. Biomed. Chromatogr. Bioinformatics 2010, 11, 610. [104] Saeed, A. I., Sharov, V., White, J., Li, J. et al., TM4: A free, open-source system for microarray data management and analysis. Biotechniques 2003, 34, 374378. [105] Grosdidier, A., Zoete, V., Michielin, O., SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res. 2011, 39, W270W277.
[93] Szklarczyk, D., Franceschini, A., Kuhn, M., Simonovic, M. et al., The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011, 39, D561D568. [94] Kuhn, M., Szklarczyk, D., Franceschini, A., Campillos, M. et al., STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res. 2010, 38, D552D556. [95] Wheeler, D. L., Barrett, T., Benson, D. A., Bryant, S. H. et al., Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008, 36, D13D21. [96] Forbes, S. A., Tang, G., Bindal, N., Bamford, S. et al., COSMIC (the Catalogue of Somatic Mutations in Cancer): A resource to investigate acquired mutations in human cancer. Nucleic Acids Res. 2010, 38, D652D657. [97] Wishart, D. S., Knox, C., Guo, A. C., Cheng, D. et al., DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008, 36, D901D906. [98] Kuhn, M., Campillos, M., Letunic, I., Jensen, L. J., Bork, P., A side effect resource to capture phenotypic effects of drugs. Mol. Syst. Biol. 2010, 6, 343.
