
Data Visualization with GEPHI

Luv Walia 10BM60043 Prabhjot Singh Bhatia 10BM60060


Gephi has been dubbed the Photoshop of data analytics. It is open source software for visualizing and manipulating complex network data in an intuitive manner. This user guide presents a walkthrough for the new user.

Class of 2012 Vinod Gupta School of Management IIT Kharagpur

Contents
Introduction
   What this tutorial is about and what it is not about
   Who This Tutorial Is For
   Prerequisites
   About Gephi
   Features
   Uses for Gephi in Business
Fundamentals
   Installing
   Opening a file
Graph Visualization
Layout Algorithms
Installing plugins

The cover page image was created with Gephi Version 0.8.1 Beta using Force Atlas 2 layout algorithm

Introduction
Data visualization is the representation of processed data by graphical means, so that information can be communicated clearly and effectively. There is a trade-off to be made between aesthetics and functionality, and Gephi helps strike that balance.

What this tutorial is about and what it is not about


This tutorial is highly practice-oriented. It shows how to go about data visualization, but limits itself to Gephi. It covers only the basic tools and techniques available in Gephi and does not attempt to discuss all of them. The tutorial uses an example dataset to demonstrate the techniques, and screenshots have been included for the same. It is based on the latest version available at the time of writing, 0.8.1 beta. Future versions may not contain the features listed here, or may implement them differently.

In addition, this tutorial does not specifically discuss the following topics:
o The concepts of data visualization
o The algorithms followed by the various plugins
o The internal working of the software

Who This Tutorial Is For


This tutorial is aimed at the budding business professional, who is new to the software and wishes to get started with data visualization.

Prerequisites
A basic understanding of data analysis techniques is necessary. Additionally, one must know how the results of these analyses are to be interpreted to solve a real-life problem. However, no prior data visualization experience is necessary. To try out some of the advanced techniques for live data capture and visualization, one must be comfortable with programming and with setting up a server connected to the internet.

About Gephi
[Pronounced: G-fai] Gephi, an open source network visualization platform, has a rich set of built-in functionality and an intuitive user interface. The software provides a powerful, interactive visualization and exploration tool for all kinds of networks and complex systems, with a gentle learning curve. As software for Exploratory Data Analysis, Gephi provides a robust toolkit to explore, understand and manipulate graph structures and to reveal hidden insights. An analyst can form hypotheses, discover patterns and identify faults in data collection, all through a visual interface that gives an overall perspective. Gephi is a complementary tool to statistics, since visual thinking is now recognized as an aid to analysis. Additionally, Gephi has built-in tools for Social Network Analysis.

Features
o Realtime Visualization: Gephi has a fast graph visualization engine, which helps an analyst create and analyze a variety of scenarios and so reach decisions faster.
o SNA Metrics: Gephi incorporates all the major metrics currently used to perform a social network analysis (SNA), such as:
   o Betweenness: An indicator of influence
   o Diameter: An indicator of reach, i.e. the longest of the shortest paths between any two nodes of the network
   o Closeness: An indicator of how fast an individual can reach its entire network
   o Clustering Coefficient: An indicator of how closely knit a particular group of nodes is
   o Average shortest path: An indicator of how many nodes must be crossed to reach a particular node
   o PageRank: The importance of a page
   o HITS: Social value of the links and content on a page
o Clustering and Hierarchical graphs: Gephi helps us create clusters and sub-clusters out of the given network graphs.
o Support for Large datasets: What differentiates Gephi from other similar software is its ability to work with very large datasets, up to 50,000 nodes.
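To make a few of the metrics listed above concrete, here is a minimal sketch in plain Python, outside Gephi. The toy network and the function names are hypothetical, and the definitions used (closeness as the average distance to the other nodes, clustering coefficient as the share of connected neighbor pairs) are common textbook choices that may differ in detail from what Gephi reports.

```python
from collections import deque

# Hypothetical toy network: an undirected graph stored as an adjacency mapping.
GRAPH = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}


def degree(graph, node):
    """Number of neighbors of a node."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def distances_from(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def closeness(graph, node):
    """Average distance from node to the other reachable nodes (lower is closer)."""
    dist = distances_from(graph, node)
    others = [d for n, d in dist.items() if n != node]
    return sum(others) / len(others) if others else 0.0
```

In this toy network, node A has the highest degree and the lowest average distance to the others, which is the kind of node the metrics above are meant to surface.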

Uses for Gephi in Business


Gephi can help visualize any kind of network data. From a business viewpoint specifically, Gephi can help in a number of ways:

Marketing
o Segmentation: Gephi provides an inbuilt clustering tool to segment customers from a product/service targeting perspective
o Targeting: Whom to target and, more importantly, whom NOT to target. Gephi helps us find the users with the most influence, and hence identify them as potential targets for marketing communication.

Customer Relationship Management
o Identify the worth of a customer, based on his or her network
o Decide whether or not to go the extra mile to retain that customer

Organizational Development
o Just as we employ social network analysis for customers, a large organization could apply the same concepts to its own employees and generate meaningful insights that help run the organization more effectively.

Mergers & Acquisitions
o How successful is the merger? Gephi can help answer this question by analyzing the past and present scenarios

Team Building
o What set of employees could bond well?
o Where can conflicts arise?
o Who are the unsung heroes/leaders?
o Where do the barriers to internal communication lie?

Human Resources
o Gephi can help us identify the candidates best suited for a particular position. It could also help us target a particular geography in which to look for potential candidates

Gephi can help us answer all the above questions, given the right set of data.

Fundamentals
Installing
Get Gephi from this link: https://gephi.org/users/download/. Being Java based, Gephi is available for Windows, Linux and Macintosh. The installation is a simple process. NOTE: One needs to have Java installed and configured on the system before attempting to install Gephi. To get Java, visit this link: http://www.oracle.com/technetwork/java/javase/downloads/index.html . To just run Gephi, the Java Runtime Environment is sufficient. However, to build plugins for Gephi, one must have the Java Development Kit installed.

Opening a file
Gephi cannot work on raw data. The data must first be processed into a graph format (for example, .gexf). To accomplish this, we can take the help of other enterprise-grade FOSS software such as R. However, for the purpose of demonstration, we shall work with the sample datasets provided for Gephi. More specifically, we shall use the social network datasets available here: http://wiki.gephi.org/index.php/Datasets
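As an illustration of what "processing raw data into a graph format" can look like, the sketch below turns a list of (source, target) pairs into a minimal GEXF 1.2 document using only the Python standard library. The function name and the sample names are hypothetical, and the output covers only the bare skeleton of the format (nodes and edges, no attributes or weights); it is not a substitute for a full exporter.

```python
import xml.etree.ElementTree as ET


def edges_to_gexf(edges, directed=False):
    """Build a minimal GEXF 1.2 document (as a string) from (source, target) pairs."""
    root = ET.Element(
        "gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"}
    )
    graph = ET.SubElement(
        root,
        "graph",
        {
            "mode": "static",
            "defaultedgetype": "directed" if directed else "undirected",
        },
    )
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")

    # Collect node names in first-seen order so the output is deterministic.
    names = []
    for source, target in edges:
        for name in (source, target):
            if name not in names:
                names.append(name)

    for name in names:
        ET.SubElement(nodes_el, "node", {"id": name, "label": name})
    for index, (source, target) in enumerate(edges):
        ET.SubElement(
            edges_el, "edge", {"id": str(index), "source": source, "target": target}
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical raw data: who talks to whom.
RAW_EDGES = [("Asha", "Bala"), ("Bala", "Chitra"), ("Asha", "Chitra")]
gexf_text = edges_to_gexf(RAW_EDGES)
```

Writing gexf_text to a file with a .gexf extension should give a file that can be opened through File>Open, although this has only been checked against the format's basic structure, not against every Gephi version.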

o Open a graph file (File>Open)
o Import Report: when the file is opened, a report is created that sums up the data and lists any issues:
   o Number of nodes
   o Number of edges
   o Type of graph
o Click on OK to validate and see the graph
o Use the mouse to move and scale the visualization
   o Zoom: Mouse Wheel
   o Pan: Right Mouse Drag
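The three figures in the import report described above (number of nodes, number of edges, type of graph) can also be checked outside Gephi before opening a file. The following is a rough sketch using the Python standard library; the function name, the sample document and the single "issue" it looks for (an edge pointing at an undeclared node) are assumptions made for illustration, and Gephi's own report may check other things.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "{http://www.gexf.net/1.2draft}"

# Hypothetical two-node sample in the GEXF 1.2 layout.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="directed">
    <nodes>
      <node id="0" label="Hello"/>
      <node id="1" label="World"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
    </edges>
  </graph>
</gexf>"""


def import_report(gexf_text):
    """Summarize a GEXF document: node count, edge count, graph type, issues."""
    root = ET.fromstring(gexf_text)
    graph = root.find(GEXF_NS + "graph")
    node_ids = {node.get("id") for node in graph.iter(GEXF_NS + "node")}
    edges = list(graph.iter(GEXF_NS + "edge"))
    # Illustrative check only: flag edges whose endpoints were never declared.
    issues = [
        "edge %s refers to an undeclared node" % edge.get("id")
        for edge in edges
        if edge.get("source") not in node_ids or edge.get("target") not in node_ids
    ]
    return {
        "nodes": len(node_ids),
        "edges": len(edges),
        # Assumption: a missing defaultedgetype is treated as undirected.
        "graph_type": graph.get("defaultedgetype", "undirected"),
        "issues": issues,
    }


report = import_report(SAMPLE_GEXF)
```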

Graph Visualization
o While Drag mode is enabled, you can drag nodes by keeping the left mouse button pressed and moving the mouse.
o Click on the area where Dragging is written
o Configure the Diameter with the slider
o You can change the edge thickness with the edge-weight slider
o If you lose your graph, reset the position using the Center On Graph button

Autoselect neighbors
This option is essential for enhancing the readability of the network. The neighbors of a selected node are automatically selected as well, which makes it easy to see who is connected to whom.
o Expand the visualization settings (bottom right corner of the graph)
o Check the Autoselect neighbors option

Edge color
By default, edges have the same color as their source node. This can be changed so that a single color is used instead.
o Expand the visualization settings and go to the Edges tab
o Uncheck Source node color and configure the Edge default color

Node shape and 3-D
Although Gephi uses a 3-D rendering engine, networks are usually drawn in 2-D, and this is the default mode.
o Expand the visualization settings and go to the Nodes tab
o Select Sphere 3d instead of Disk 2d

Display attributes
Besides a label, nodes and edges have attributes, such as gender, age or relationship type in a social network. It is easy to display them instead of, or along with, the label.
o Click on the Attributes button in the visualization settings. A dialog appears and lists all attributes, separately for nodes and edges.
o Check all the attributes you want to display, for instance Code.
o Click on OK to confirm

Transform text color and size
The Ranking module is used for this.
o Find the label color transformer and select which attribute to use for ranking. Here, Degree is chosen.
o Configure the ranking colors and click on APPLY. The text should now be colored. Also try Betweenness Centrality instead of Degree.
o Now select the label size transformer
o Select sizes between 0 and 1, as this value is multiplied by the default element size
o Click on APPLY to see how the text size changes

Antialiasing option
Antialiasing is a visualization option that makes edges look smoother. It is set to 4x by default and can be set up to 16x.
o Go to the Gephi options in the Tools menu
o Select the Visualization tab and then the OpenGL tab. Here you can change the antialiasing option.
o Restart Gephi for the changes to take effect.

Layout the graph
o Layout algorithms set the graph shape; applying one is the most essential action.
o Locate the Layout module on the left panel.
o Choose Force Atlas 2 (it handles large networks while keeping very good quality).
o RUN the layout, applying the following settings step by step:
   o LinLog mode = checked (linear attraction and logarithmic repulsion, instead of the default lin-lin; makes clusters tighter)
   o Scaling = 100 (increase to make the graph sparser)
   o Edge weight influence = 0 (from 0, no influence, to 1, normal; set to 0 to calculate forces without edge weight)
o Now STOP the algorithm.
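To give a feel for what the three settings above do, the sketch below computes the pull between two connected nodes and the push between any two nodes. It is an illustration only: it assumes the force formulas from the published description of ForceAtlas2 (attraction proportional to distance, or to log(1 + distance) in LinLog mode, multiplied by the edge weight raised to the edge weight influence; repulsion proportional to the product of the node degrees plus one, divided by distance). The function and parameter names are hypothetical, and this is not Gephi's code.

```python
import math


def attraction(distance, weight=1.0, lin_log=False, edge_weight_influence=1.0):
    """Pull between two connected nodes at the given distance."""
    base = math.log(1.0 + distance) if lin_log else distance
    # An influence of 0 turns every weight into 1, so weights are ignored.
    return (weight ** edge_weight_influence) * base


def repulsion(distance, degree_a, degree_b, scaling=1.0):
    """Push between any two nodes; a larger scaling spreads the graph out."""
    return scaling * (degree_a + 1) * (degree_b + 1) / distance


# With LinLog checked, the pull grows much more slowly with distance.
default_pull = attraction(10.0)
lin_log_pull = attraction(10.0, lin_log=True)

# With edge weight influence at 0, a heavy edge pulls like any other edge.
unweighted_pull = attraction(10.0, weight=5.0, edge_weight_influence=0.0)

# Scaling = 100 multiplies every push by 100, hence a sparser picture.
strong_push = repulsion(2.0, degree_a=1, degree_b=3, scaling=100.0)
```

Under these assumptions, raising Scaling strengthens repulsion everywhere, while LinLog weakens long-range attraction relative to repulsion, which is consistent with the notes in the settings list.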

Layout Algorithms
o The purpose of Layout Properties is to let you control the algorithm in order to produce an aesthetically pleasing representation.

There are several layout options available to the user, namely OpenOrd, ForceAtlas, Yifan Hu, Fruchterman-Reingold, Circular, Radial Axis and GeoLayout, each suited to a specific purpose:

LAYOUT                                        EMPHASIS
OpenOrd                                       Divisions/Clustering
ForceAtlas, Yifan Hu, Fruchterman-Reingold    Complementarities
Circular, Radial Axis                         Ranking
GeoLayout                                     Geographic Repartition

Ranking (color)
o The Ranking module lets you configure node color and size.
o Locate the Ranking module, in the top left.
o Choose Degree as the rank parameter. You should obtain the configuration panel for colors.
o Move your mouse over the gradient component
o Double-click on the triangles to configure the colors
o Click on APPLY to see the result
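Ranking by color amounts to mapping each node's rank value onto a position along the gradient. The sketch below shows one plausible way to do this with a two-color gradient and linear interpolation; the function names, the white-to-red gradient and the sample degrees are all hypothetical, and Gephi's own gradient may use more color stops or a different interpolation.

```python
def interpolate_color(low, high, fraction):
    """Blend two RGB colors; fraction 0 gives low, fraction 1 gives high."""
    return tuple(round(a + (b - a) * fraction) for a, b in zip(low, high))


def rank_colors(values, low=(255, 255, 255), high=(255, 0, 0)):
    """Map each node's rank value (here, its degree) to a color on the gradient."""
    smallest = min(values.values())
    largest = max(values.values())
    span = largest - smallest
    return {
        node: interpolate_color(
            low, high, (value - smallest) / span if span else 0.0
        )
        for node, value in values.items()
    }


# Hypothetical degrees for three nodes.
DEGREES = {"n1": 1, "n2": 3, "n3": 5}
colors = rank_colors(DEGREES)
```

The node with the lowest degree gets the first gradient color, the node with the highest degree gets the last one, and everything else falls in between.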

Ranking result table
o You can see rank values by enabling the result table.
o Enable the table result view at the bottom toolbar
o Click again on APPLY
o In the example dataset, ACARVIN has 252 links and is the most connected node in the network

Metrics
o Calculate the average path length for the network. It computes the path length for all possible pairs of nodes and gives information about how close nodes are to each other.
o Click on RUN near Average Path Length. The settings panel appears immediately.
o Select Directed and click on OK to compute the metric
o When finished, the metric displays its result in a report
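For readers who want to see what the average path length measures, here is a small sketch for a directed graph: a breadth-first search from every node, averaged over all ordered pairs where the target can actually be reached. The toy graph and function names are hypothetical, and how unreachable pairs are treated is an assumption that may not match Gephi's report.

```python
from collections import deque

# Hypothetical directed toy graph: node -> list of nodes it points to.
DIRECTED = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
}


def bfs_distances(graph, source):
    """Hop count from source to every node reachable along edge directions."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def average_path_length(graph):
    """Mean hop count over all ordered pairs (u, v) where v is reachable from u."""
    total = 0
    pairs = 0
    for source in graph:
        for target, hops in bfs_distances(graph, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0


def diameter(graph):
    """Longest shortest path found between any reachable pair."""
    return max(max(bfs_distances(graph, source).values()) for source in graph)
```

The same breadth-first distances are the raw material for closeness and eccentricity, which is why a single run can produce several per-node values.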

Ranking (size)
o Metrics generate general reports, but also results for each node. Thus three new values have been created by the Average Path Length algorithm we ran:
   o Betweenness Centrality
   o Closeness Centrality
   o Eccentricity
o Go back to Ranking
o Select Betweenness Centrality in the list. This metric indicates influential nodes: the higher the value, the more influential the node.
o The node size will be set now. Color remains the Degree indicator.
o Select the diamond icon in the toolbar for size.
o Set a min size of 40 and a max size of 200
o Click on APPLY to see the result. Color: Degree. Size: Betweenness Centrality metric.
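The two ideas in this step, a betweenness score per node and a size between 40 and 200 derived from it, can be sketched as follows. The betweenness function uses Brandes' algorithm for unweighted graphs; the toy network, the function names and the linear size mapping are assumptions for illustration, and the scores count each pair of endpoints in both directions, so they are twice the usual undirected values.

```python
from collections import deque

# Hypothetical undirected toy network, stored with both directions of each edge.
NETWORK = {
    "hub": ["x", "y", "z"],
    "x": ["hub"],
    "y": ["hub"],
    "z": ["hub", "w"],
    "w": ["z"],
}


def betweenness(graph):
    """Brandes' algorithm: how many shortest paths pass through each node."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []  # nodes in the order the search reaches them
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}  # number of shortest paths
        paths[source] = 1
        dist = {node: -1 for node in graph}
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in graph}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    return score


def rank_sizes(values, min_size=40.0, max_size=200.0):
    """Linearly map rank values onto the [min_size, max_size] interval."""
    smallest = min(values.values())
    largest = max(values.values())
    span = largest - smallest
    return {
        node: min_size + (max_size - min_size) * ((value - smallest) / span if span else 0.0)
        for node, value in values.items()
    }


scores = betweenness(NETWORK)
sizes = rank_sizes(scores)
```

Here the hub, which sits on the most shortest paths, gets the maximum size, the leaves get the minimum, and the bridge node z lands in between.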

Show labels
o Display node labels
o Set label size proportional to node size
o Set label size with the scale slider

Set label color
o Locate the color chooser in the visualization settings
o Press the left mouse button to display the palette and pick a color. This sets the node label color.
o To configure the edge label color, expand the settings bar

Label Adjust
o Go to the Layout panel
o Choose the Label Adjust layout in the list
o Click on RUN to proceed

Community detection
o The ability to detect and study communities is central to network analysis. We would like to colorize the clusters in our example.
o Gephi implements the Louvain method, available from the Statistics panel
o Click on RUN near the Modularity line
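The quantity that community detection tries to maximize is modularity: how many more edges fall inside communities than would be expected by chance. The sketch below does not implement the Louvain method itself; it only scores a given partition of an undirected, unweighted graph, using the standard definition Q = sum over communities of (internal edges / m) - (total degree / 2m)^2. The edge list, the partition and the function name are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges is a list of (u, v) pairs; community maps each node to a label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # edges with both ends in the same community
    degree_sum = {}  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(label, 0) / m - (degree_sum[label] / (2 * m)) ** 2
        for label in degree_sum
    )


# Hypothetical example: two triangles joined by a single edge.
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
TWO_GROUPS = {1: "left", 2: "left", 3: "left", 4: "right", 5: "right", 6: "right"}
ONE_GROUP = {node: "all" for node in TWO_GROUPS}

q_split = modularity(EDGES, TWO_GROUPS)
q_merged = modularity(EDGES, ONE_GROUP)
```

Splitting the two triangles into separate communities scores higher than lumping everything together, which is the kind of comparison a modularity-based method makes many times over.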

Partition
o The community detection algorithm created a Modularity Class value for each node. The Partition module can use this new data to colorize communities.
o Locate the Partition module on the left panel.
o Immediately click on the Refresh button to populate the partition list.
o Select Modularity Class in the partition list.
o You can see that many communities were found, sorted in decreasing order by percentage; your result could be different. A random color has been set for each community identifier.
o Click on APPLY to colorize nodes

Filter
o The last manipulation step is filtering. You create filters that can hide nodes and edges on the network. We will create a filter to remove sparsely connected nodes, i.e. nodes with fewer than nine edges.
o Locate the Filters module on the right panel.
o Select Degree Range in the Topology category.
o Drag it to the Queries and drop it on Drag filter here.
o Click on Degree Range to activate the filter.
o The parameters panel appears. It shows a range slider and a chart that represents the data, the degree distribution.
o Move the slider to set its lower bound to 9.
o Enable filtering by pushing the button. Nodes with a degree lower than 9 are now hidden.
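The effect of the degree range filter can be pictured with a few lines of plain Python. This is a sketch under assumptions: the function name and toy network are hypothetical, degrees are taken from the full graph before filtering, and edges to hidden nodes are dropped along with those nodes.

```python
def degree_range_filter(graph, lower_bound):
    """Keep nodes whose degree in the full graph is at least lower_bound."""
    kept = {
        node for node, neighbors in graph.items() if len(neighbors) >= lower_bound
    }
    # Edges that led to hidden nodes disappear with them.
    return {node: {n for n in graph[node] if n in kept} for node in kept}


# Hypothetical toy network: a triangle with one node hanging off it.
TOY = {
    "a": {"b", "c", "leaf"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "leaf": {"a"},
}

visible = degree_range_filter(TOY, lower_bound=2)
```

With a lower bound of 2, only the node with a single edge is hidden; with the lower bound of 9 used in the walkthrough, every node with fewer than nine edges would be.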

Preview
o Before exporting your graph as an SVG or PDF file, go to the Preview to see exactly how the graph will look and to add the finishing touches.
o Select the Preview tab in the banner. Click on Refresh to see the preview.
o In the Node properties, find Show Labels and enable the option. Click on REFRESH.

Export as SVG
o From Preview, click on SVG near Export. SVG files are vector graphics, like PDF: images scale smoothly to different sizes and can therefore be printed or integrated into high-resolution presentations. SVG files can be transformed and manipulated in Inkscape or Adobe Illustrator.
o Save your project.

Installing plugins
Being open source and feature-rich, Gephi has attracted a lot of attention from developers and researchers all around the world. As a result, a plethora of plugins is available to extend its functionality. These plugins can be found at https://gephi.org/plugins/ . A majority of them are developed by the community, and quite a few are under active development. A few prominent ones are:
o Retweet Monitor: Used for monitoring live retweets. More details at https://gephi.org/plugins/retweet-monitor/
o Graphviz Layout: Used to make layouts suitable for the specialized Graphviz software. More details at https://gephi.org/plugins/graphviz-layout/
o Parallel Force Atlas: Used to speed up ForceAtlas, using multiple threads. More details at https://gephi.org/plugins/parallel-force-atlas/
o Social Network Analysis: This plugin allows computation of various metrics used in social network analysis and influencer analysis. More details at https://gephi.org/plugins/social-network-analysis/
o Layered Layout: This is a specialized layout with nodes in different orbits, used especially in Social Network Analysis. More details at https://gephi.org/plugins/layeredlayout/
o HTTP Graph: Generates data based on the web browsing activity on the machine. Details at: https://gephi.org/plugins/http-graph/
o Circular Layout, OpenOrd Layout, GeoLayout: These are layout algorithms, as described previously under Layout Algorithms

To install a plugin:
1. Download the .zip file from the plugin's webpage.
2. Extract the file to a folder of your choice, to get a .nbm file.
3. Open Gephi.
4. Go to Tools>Plugins.
5. Click on the Downloaded tab.
6. Click Add Plugins.
7. Browse to the path where the file was extracted and select the .nbm file.
8. Click OK and then Install.
9. Follow the onscreen instructions.
