
CYTOSCAPE WORKSHEET

Adapted from http://www.cbs.dtu.dk/phdcourse/cookbooks/Exercises_Systems_Biology.pdf

Please answer all 6 questions in the exercises.

EXERCISE 1: VISUALIZING INTERACTION NETWORKS

Overview: This exercise introduces you to Cytoscape, a software package for visualizing networks (www.cytoscape.org). You will first familiarize yourself with Cytoscape and its visualization abilities. Then you will work with a few plugins to do a statistical analysis of an interaction network and use the MCODE algorithm to find subnetworks/complexes. You'll be using an older version of Cytoscape (v2.1), so resist the temptation to go for the newer releases.

Part 0. Getting started:

STEP 1: Make sure you have the Java Runtime Environment 5 or newer on your machine (if not, go to http://www.java.com/en/download/manual.jsp). Once that's squared away, download Cytoscape 2.1 from http://www.cytoscape.org/download.php?file=cyto2_1. You'll need to fill out the license agreement as a CU student. This gives you a .zip distribution file called cytoscapev2.1.zip. Unzip this into a directory on your machine, and don't move it after this point. You can start the program by clicking on the cytoscape.bat file on a Windows system, or the bash script on a Linux system.

STEP 2: Go to http://cbio.mskcc.org/~bader/software/mcode/ and download MCODE v1.1, with the recommended bug fix. Unzip this file, called mcode_v1_1.zip, and copy the .jar file into the cytoscape-v2.1/plugins directory.

STEP 3: Go to http://med.bioinf.mpi-inf.mpg.de/netanalyzer/download.php and download NetworkAnalyzer version 1.0 (not the later version). Fill out the license information, wait for an email with a download link, and download the file. Rename it to NetworkAnalyzer.jar. Copy this file into the Cytoscape plugins directory.

STEP 4: In this exercise we will use a subset of the human interaction dataset by Rual et al. (Nature. 2005 Oct 20; 437(7062):1173-8). The subset can be downloaded at http://www.cbs.dtu.dk/phdcourse/cookbooks/Cytoscape/RUAL.subset.sif. Save this file to your computer, outside of the Cytoscape directory.

Part I. Running Cytoscape:

STEP 1: Launch Cytoscape by clicking on the cytoscape.bat file (or the bash script, depending on your OS). After a bit of text scrolls by, you should see the main Cytoscape window.

STEP 2: Load the network RUAL.subset.sif into Cytoscape by selecting Load under the File menu, then selecting Network, and then specifying the location of the file. This network consists of 1089 interactions observed between 419 human proteins, and is a small subset of a large human interaction dataset. The subset consists of proteins that interact with the transcription factor protein TP53. (Please note that Cytoscape will only create an automatic view if there are fewer than 500 nodes.)

Part II. Network layout & Selecting nodes:

STEP 3: Try some of the different layouts (circular, organic, hierarchical and random) by selecting the appropriate layout in the yFiles submenu under the Layout menu. Different layouts can be helpful for visualizing data in a way that makes sense to a human viewer.
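As a cross-check on the node and interaction counts quoted in STEP 2, a SIF file can also be read outside Cytoscape. The sketch below assumes the standard SIF layout (source node, interaction type, then one or more target nodes, separated by whitespace). The function name `read_sif` and the example lines are made up for illustration and are not taken from RUAL.subset.sif.

```python
from io import StringIO

def read_sif(handle):
    """Parse SIF lines ('source type target [target ...]') into nodes and edges."""
    nodes, edges = set(), set()
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        source = fields[0]
        nodes.add(source)  # a line with a single field is an isolated node
        # fields[1] is the interaction type; all remaining fields are targets
        for target in fields[2:]:
            nodes.add(target)
            edges.add((source, fields[1], target))
    return nodes, edges

# Made-up example lines (hypothetical IDs apart from 7157, the TP53 ID used in this worksheet)
example = StringIO("7157 pp 4193\n7157 pp 1026 672\n9999\n")
nodes, edges = read_sif(example)
print(len(nodes), "nodes,", len(edges), "edges")  # prints: 5 nodes, 3 edges
```

To run it on the real file, replace the `StringIO` object with `open("RUAL.subset.sif")` and compare the printed counts with the numbers given in STEP 2.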

By default, Cytoscape generates a grid layout, which is not very useful. One of the most useful layouts for network biology is the spring layout (similar to the organic layout). Try the spring embedded layout: under Layout, select apply spring embedded layout for all nodes.

STEP 4: In the Cytoscape canvas (the blue window with the network view) you can select nodes by clicking on them with the mouse, or by dragging with the left mouse button. Select a few nodes, and move them around the screen. The nodes in this network are labeled by numeric Entrez IDs, which are the IDs employed by NCBI (www.ncbi.nlm.nih.gov). The node representing TP53 is numbered 7157.

STEP 5: Under the Select menu, select Nodes, and By Name. A popup window should appear. Enter the node ID (7157) and click Search, which should highlight TP53 (it will now appear yellow in the network). You can clear any selection by clicking on the canvas. You can also select nodes that interact with a specific node (e.g. TP53).

STEP 6: Under the Select menu, select Nodes and First neighbors of selected nodes. You should see a network with several yellow nodes in the center. This can be useful when looking for genes that have a common regulator or a common regulatory target.

Q1: How many proteins interact with TP53?

Part III. Network statistics:

In this part of the exercise, you will work with the NetworkAnalyzer plugin. One of Cytoscape's strengths is that users can write plugins that run inside Cytoscape, and a large community of developers contributes such plugins.

STEP 7: First, deselect all nodes in the network. Then apply the NetworkAnalyzer plugin to the network by selecting Network Analyzer from the Plugins menu (ignore the Cytoscape warning that the network contains both directed and undirected edges). This should open the NetworkAnalyzer results window. The plugin calculates various network parameters that may be useful for describing the graphs you see. Browse through the various network statistics/parameters and try to answer the following questions:

Q2: What is the average degree (connectivity) of the network? _________________________

Q3: What is the most likely degree of a randomly selected node in the network? _____________ _________________________________
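The two quantities asked for in Q2 and Q3 can be illustrated on a toy network. This is a minimal sketch, assuming an undirected network in which self-loops and duplicate edges are ignored; NetworkAnalyzer may treat those cases differently. The function name `degrees` and the four-edge network are hypothetical.

```python
from collections import Counter

def degrees(edges):
    """Return {node: number of distinct neighbors} for an undirected edge list."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # assumption: self-loops do not count toward the degree
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}

# Hypothetical toy network
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
deg = degrees(edges)

# Average degree: mean number of neighbors per node
average_degree = sum(deg.values()) / len(deg)

# Degree distribution: how many nodes have each degree; its peak is the
# most likely degree of a randomly selected node
degree_counts = Counter(deg.values())
most_likely_degree = degree_counts.most_common(1)[0][0]

print(average_degree, most_likely_degree)  # prints: 2.0 2
```

The degree of a single node, such as the TP53 node in Q1, is simply `deg[node]` for that node's ID.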

Q4: Use the node degree distribution and the distribution of average cluster coefficient (C(k)) to determine whether the network structure is random, scale free or hierarchical (hint: look at box 2 in the Barabasi paper)? ______________________________________________________________________________ ______________________________________________________________________________ ______________________________________________________________________________
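For reference, the quantity C(k) used in Q4 is the clustering coefficient averaged over all nodes with degree k. The sketch below shows one way to compute it, assuming an undirected network without self-loops; the helper names and the toy edge list are hypothetical, and the plugin's exact conventions may differ.

```python
from collections import defaultdict
from itertools import combinations

def build_adjacency(edges):
    """Build {node: set of neighbors} from an undirected edge list."""
    adj = {}
    for a, b in edges:
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def clustering(adj, node):
    """Fraction of the node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering_by_degree(adj):
    """Return {k: mean clustering coefficient of all nodes with degree k}."""
    by_k = defaultdict(list)
    for node in adj:
        by_k[len(adj[node])].append(clustering(adj, node))
    return {k: sum(vals) / len(vals) for k, vals in sorted(by_k.items())}

# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to A
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")])
ck = average_clustering_by_degree(adj)
print(ck)
```

Plotting `ck` against k on logarithmic axes gives the same kind of curve that NetworkAnalyzer shows for the average clustering coefficient distribution.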

Part IV. Identification of complexes:

As the average cluster coefficient is relatively high, we can expect some clusters (complexes) to exist in the network. Next we will try to identify these using the MCODE algorithm.

STEP 8: Select MCODE under the Plugins menu and run MCODE on the current network. This will identify several complexes.

STEP 9: Try clicking on a complex in the MCODE Results summary. This will highlight the complex (yellow nodes) in the large network. Browse through all complexes with a score above 1. How many of these complexes would you have found by manual inspection of the large network?

Part V. Extra:

STEP 10: Have a look at the shortest path length distribution for the entire network using the NetworkAnalyzer plugin.

Q5: What is the highest number of edges that you need to connect any two nodes in the network? ______________________________________________________________________________

This phenomenon is known as a small-world network and can be found in many real-life networks, e.g. the six degrees of Kevin Bacon game that connects actors who have appeared in the same movie.
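The shortest path length distribution from STEP 10 can be reproduced in principle with a breadth-first search from every node. The following is a sketch on a made-up four-node network, assuming unweighted, undirected edges; the function names are hypothetical, and pairs of nodes in different connected components are simply not counted.

```python
from collections import Counter, deque

def bfs_distances(adj, start):
    """Return {node: number of edges on a shortest path from start}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def path_length_distribution(adj):
    """Count unordered node pairs by their shortest path length."""
    counts = Counter()
    for start in adj:
        for target, d in bfs_distances(adj, start).items():
            if start < target:  # count each unordered pair once
                counts[d] += 1
    return counts

# Hypothetical toy network
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}
counts = path_length_distribution(adj)
longest = max(counts)  # largest shortest-path length (the network diameter)
print(dict(counts), longest)
```

The largest key in `counts` corresponds to the quantity asked for in Q5, here computed for the toy network only.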

EXERCISE 2: DATA INTEGRATION

Overview: In this exercise we will integrate gene expression data from gene deletion studies with a protein-protein interaction network. In the study by Ideker et al. in Science 2001, the yeast transcription factors Gal1, Gal4, and Gal80 were analyzed for their importance in galactose utilization pathways.

Part 0. Set default species in preferences

Start Cytoscape. Under Edit, Preferences, make sure your default species, defaultSpeciesName, is set to Saccharomyces cerevisiae. Set it if needed. Close and restart Cytoscape if you had to set the defaultSpeciesName.

Part I. Loading network and expression data

STEP 0: Go to http://www.cbs.dtu.dk/~workman/IntroSysBio/. Scroll down to Section 2 and download the files galFiltered.sif and galExpData.pvals; save them to your computer.

STEP 1: Start Cytoscape and load the network galFiltered.sif. Your network will contain a combination of protein-protein (pp) and protein-DNA (pd) interactions.

STEP 2: Under the File menu, select Load, and Expression Matrix File, and load the galExpData.pvals file. This file contains gene expression measurements for three perturbation experiments. In each experiment, the level of one key protein was perturbed artificially. After a brief load, a status window will appear, indicating how many experimental conditions were found (three) and what type of significance values were included.

STEP 3: Now we will use the Node Attribute Browser to custom-browse through the expression data, as follows.
i. Select any node in the Cytoscape canvas.
ii. In the Node Attribute Browser, click the Select Attributes button, and select the attributes gal1RGexp, gal4RGexp, and gal80Rexp.
iii. Under the Node Attribute Browser, you should see your nodes listed with their expression values.

Part II. Coloring nodes

It is common to use expression data in Cytoscape to set the visual attributes of the nodes in a network. This visualization can be used to portray functional relation and experimental response at the same time. First, we will look at how to indicate increased or decreased expression of a gene. The steps for doing this are as follows:

STEP 4: Go to the Set Visual Properties menu under Visualization.

STEP 5: Click on the Duplicate button to copy the default style, and name the new visual style Gal80 or the like. Click on the Define button to define your style.

STEP 6: The default tab defines the Node Color of this visual style. Set the Node Color as follows:
i. Under Mapping, click on the pull-down menu labeled None and select RedGreen.
ii. In the pull-down menu labeled MapAttribute, select the attribute gal80Rexp. This specifies that each node will be colored on a color continuum according to Gal80 expression, as follows:
Large negative values (indicating high repression) are colored red
Small negative values (indicating slight repression) are colored pink
Values close to zero are colored white
Small positive values (indicating slight induction) are colored light green
Large positive values (indicating high induction) are colored bright green
Extreme values (negative values less than -2.5 and positive values greater than 2.1) are colored blue and black respectively

iii. Note that the default node color of pink falls within this spectrum. A useful trick is to choose a color outside this spectrum, to distinguish nodes with no expression value defined from those with slight repression. Under Default, click on Change Default, and select a default color of grey. iv. Finally, click on Apply to Network. You should see most nodes colored pink, green, or white, with a few grey nodes.
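The color continuum described in STEP 6 amounts to a piecewise mapping from an expression value to a color. The sketch below is an assumption-laden illustration, not Cytoscape's actual implementation: it assumes linear interpolation in RGB space between red at -2.5, white at 0, and green at 2.1, and the function names and constants are hypothetical.

```python
RED, WHITE, GREEN = (255, 0, 0), (255, 255, 255), (0, 255, 0)
BLUE, BLACK, GREY = (0, 0, 255), (0, 0, 0), (128, 128, 128)

def lerp(c0, c1, t):
    """Linearly interpolate between two RGB colors, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def expression_to_rgb(value, low=-2.5, high=2.1):
    """Map an expression value to a color following the rules in STEP 6."""
    if value is None:
        return GREY   # default color for nodes with no expression value
    if value < low:
        return BLUE   # extreme repression
    if value > high:
        return BLACK  # extreme induction
    if value < 0:
        return lerp(WHITE, RED, value / low)     # white -> pink -> red
    return lerp(WHITE, GREEN, value / high)      # white -> light green -> green

# Hypothetical expression values
for v in (None, -3.0, -1.0, 0.0, 1.0, 3.0):
    print(v, expression_to_rgb(v))
```

Choosing grey for missing values mirrors step iii above: the default color lies outside the red-white-green continuum, so unmeasured nodes cannot be mistaken for slightly repressed ones.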

Part III. Using p-values

Here, we will explore an example of using expression values and p-values together in setting visual properties.

STEP 7: Select some nodes at random, and look at their expression values and p-values under the Node Attribute Browser. Notice that the expression values range from about -3 to +3 in these cases, while the p-values range from 0 to 1, as expected.

STEP 8: Now, we will explore setting node sizes according to p-values.
i. Go to Set Visual Style under Visualization.
ii. Go to the Node Size tab.
iii. In the pull-down menu under Mapping, select BasicContinuous.
iv. In the Map Attribute pull-down menu, select gal80Rsig.
v. By default, you will see a button labeled Below, a button labeled Above, and three break points: three input fields, each with a Del button on the left and an Equal button on the right.
vi. Click on the bottom-most Del button to delete this break point. Make sure to follow these instructions carefully!
vii. Click on Below and set the size to 60; click on Above and set the size to 20.
viii. In the line above Above, set the number in the input field to 0.01, click on the Equal button and set the size to 30.
ix. In the next line above, set the break point to 0.001 and Equal to 40.

x. Click on Apply to Network. On your Cytoscape canvas, your node sizes should change. This will have the effect of depicting nodes with significant p-values as large circles. Changes in larger nodes are more likely to be significant than changes in smaller ones.
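The size mapping configured in STEP 8 can be written out as a small function to see which p-values produce which node sizes. This sketch assumes that sizes are interpolated linearly between the two break points; that is an assumption about the BasicContinuous mapping, and the function name is hypothetical.

```python
def pvalue_to_size(p, low_bp=0.001, high_bp=0.01,
                   below=60, at_low=40, at_high=30, above=20):
    """Map a p-value to a node size using the break points set in STEP 8."""
    if p < low_bp:
        return float(below)   # highly significant: largest nodes
    if p > high_bp:
        return float(above)   # not significant: smallest nodes
    # Assumed linear interpolation between the two break points
    t = (p - low_bp) / (high_bp - low_bp)
    return at_low + (at_high - at_low) * t

# Hypothetical p-values
for p in (0.0001, 0.001, 0.005, 0.01, 0.5):
    print(p, pvalue_to_size(p))
```

Smaller p-values therefore map to larger nodes, which is why large circles mark the significant expression changes.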

Part IV. Biological analysis scenario

This section presents one scenario on how expression data can be combined with network data to tell a biological story.

STEP 9: Select the neighborhood of GAL4.
i. Select -> Nodes -> By Name and enter YPL248C (this is the GAL4 gene)
ii. Select -> Nodes -> First neighbors of selected node (this gets the immediate neighbors of GAL4)
iii. Select -> To new network -> Selected nodes, all edges
iv. In the new sub-network, apply a graph layout algorithm using the yFiles Hierarchic layout.

Notice that all three black (highly induced) nodes are in the same region of the graph. With a little exploration in the Node Attribute Browser, you should see the following:
i. The two nodes that interact with all three black nodes are GAL11 (a general transcription cofactor with many interactions) and GAL4.
ii. Both nodes show fairly small changes in expression, and neither change is statistically significant. These slight changes in expression suggest that the critical change affecting the black nodes might be somewhere else in the network.
iii. GAL4 interacts with GAL80, which shows a significant level of repression.

iv. Note that while GAL80 shows evidence of significant repression, most nodes interacting with GAL4 show significant levels of induction.

STEP 10: Go to the NCBI website (http://www.ncbi.nlm.nih.gov/), and search the Gene database for YPL248C (another name for the GAL4 gene). The items returned should include Gal4. Click on the link for Gal4 to get more information.

Q6: Is Gal80 activating or inhibiting the activity of Gal4? Does this make sense given your subgraph? Explain your reasoning in a sentence or two. ______________________________________________________________________________ ______________________________________________________________________________ ______________________________________________________________________________
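The neighborhood selection in STEP 9 (select a node, add its first neighbors, then copy the selected nodes and all edges among them to a new network) can be sketched as two small functions. The function names, the placeholder node labels n1 to n4, and the toy edges are hypothetical; only YPL248C comes from the worksheet.

```python
def first_neighbors(edges, selected):
    """Return the selected nodes plus every node directly connected to one of them."""
    result = set(selected)
    for a, b in edges:
        if a in selected:
            result.add(b)
        if b in selected:
            result.add(a)
    return result

def induced_subnetwork(edges, nodes):
    """Keep only the edges whose two endpoints are both in the node set."""
    return [(a, b) for a, b in edges if a in nodes and b in nodes]

# Hypothetical toy network around YPL248C (GAL4)
edges = [
    ("YPL248C", "n1"),
    ("YPL248C", "n2"),
    ("n1", "n2"),
    ("n2", "n3"),
    ("n3", "n4"),
]
selected = first_neighbors(edges, {"YPL248C"})
sub = induced_subnetwork(edges, selected)
print(sorted(selected), sub)
```

Note that the edge between n1 and n2 is kept even though neither endpoint is GAL4, which corresponds to the "Selected nodes, all edges" option.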
