
Multilevel Network Visualization

Emmanuel Oppong
Computer Science and Engineering
The Pennsylvania State University

SROP 2014 Report
August 4, 2014

Abstract
In this research project, we investigate the problem of visualizing large networks. Networks, or graphs,
are used to describe relationships between different objects. Graphs are widely used in social networks,
roadway systems, and in general, to describe a system that has interactions among multiple entities.
Visualizing relationships through graph drawings is important so that information can be easily
comprehended and navigated. Some networks, for instance social networks, can become very large
when they represent a large number of entities. In this project, we develop a new multilevel method for
visualizing graphs, using existing tools and algorithms for graph drawing. We tested this method on
real-world networks from several online repositories, such as the Koblenz Network Collection and the
Stanford Large Network Dataset Collection. We evaluated the method and compared it to alternatives. This research tool
will allow users to generate multilevel network visualizations for describing systems such as social
connections, microorganism relationships, highway systems, large populations, and map topologies.
Introduction
A network is a system of interconnected objects. Graph theory is the mathematical language used to
describe networks. It is an old branch of mathematics that dates back to 1736, when Leonhard Euler
studied the problem of the seven bridges of Königsberg. He showed that there is no route that
crosses each bridge exactly once [1]. Since then, graph theory has
evolved. Currently, researchers study how networks arise in real-world scenarios and analyze their
properties. Graphs are used to model relations in physical, social, biological, and information systems.
They are a unifying information abstraction to capture various types of data. Graphs are currently widely
used on the internet to make sense of large datasets. In 2012, Google announced the Knowledge Graph
feature as an addition to its search engine [3]. The idea was to build a massive graph of real-world
objects and the connections between them. The Knowledge Graph uses links between documents on
the web to understand their semantic context. The graph contains millions of objects and billions of
facts connecting them, which it uses to understand the meaning of the keywords entered for a search.
Facebook also utilizes a graph-based search engine. It combines data from its more than a billion users
with external data into one search engine that provides user-specific search results.
The amount of data on the internet continues to grow each day. Graphs are used to model the
connections within this data, making it easier to understand both the information coming in and the
information already on the web. With the growth of data, especially on the internet, graphs have become
very large. They can contain millions of vertices and billions of edges.
Visual representation of networks is an important way of describing the data they represent.
Visualization of graphs is done with graph drawing techniques. A graph drawing is a visual representation
of the vertices and edges the graph contains. A typical drawing of a graph consists of shaded circles depicting
the vertices and line segments depicting the edges, which connect related vertices. Graph drawing
makes the information in the graph legible and navigable. The data within a network can be explored
by displaying the vertices and edges in various layouts and assigning them colors, sizes, and other
properties. The display highlights patterns, shows connections, and provides visual information about each
vertex. These cues are used to draw conclusions about a dataset in order to solve complex
problems.
There are many graph drawing techniques that use mathematical algorithms to position the vertices
and edges. The arc diagram method (see Figure 1) lays out all the vertices evenly on a
single line and draws the edges as semicircles above or below the line to connect the
vertices. The layered drawing method (also shown in Figure 1) places the vertices of a directed
graph in horizontal rows, with the edges directed downwards. These methods are well suited to
displaying networks with few vertices and edges. However, they are not ideal for drawing
larger graphs.


Figure 1: Arc diagram (left) [5] and layered method (right) [4].
The force-directed method (see the example in Figure 2) is a physics-based method that calculates the
attractive and repulsive forces between vertices and moves each vertex along the direction of the net force
[7]. The process is repeated until the edges are close to equal in length and there are as
few edge crossings as possible. This method is better suited for displaying clustered graphs. The larger
the graph, however, the longer it takes for the vertices to be repositioned. The spring-electrical model
(Figure 3) is a type of force-directed algorithm in which the system is modeled as electrically charged
vertices connected by springs [7]. Springs are imagined to be placed between vertices that share an edge.
The springs pull connected vertices together, while a repulsive electrical force acts between all pairs of
vertices. This process is also repeated until the system reaches equilibrium [7].
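The spring-electrical model described above can be illustrated with a minimal Python sketch. It is not the algorithm used by any particular library. The force laws (repulsion k^2/d between all pairs, attraction d^2/k along edges) follow a common formulation of the model, and the function name, the natural length k, the step size, and the cooling factor are all assumptions chosen for illustration.

```python
import math
import random


def spring_electrical_layout(vertices, edges, iterations=200, k=1.0,
                             step=0.1, cooling=0.95, seed=0):
    """Return {vertex: (x, y)} after a fixed number of force-directed iterations.

    vertices is a list of hashable ids, edges a list of (u, v) pairs.
    k (assumed natural spring length), step and cooling are illustrative values.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in vertices}

    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in vertices}

        # Repulsive "electrical" force between every pair of vertices: k^2 / d.
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[u][0] += push * dx / dist
                force[u][1] += push * dy / dist
                force[v][0] -= push * dx / dist
                force[v][1] -= push * dy / dist

        # Attractive "spring" force along each edge: d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[u][0] -= pull * dx / dist
            force[u][1] -= pull * dy / dist
            force[v][0] += pull * dx / dist
            force[v][1] += pull * dy / dist

        # Move each vertex a distance `step` along the direction of its net force.
        for v in vertices:
            fx, fy = force[v]
            magnitude = math.hypot(fx, fy)
            if magnitude > 0.0:
                pos[v][0] += step * fx / magnitude
                pos[v][1] += step * fy / magnitude

        # Shrink the step so the layout settles instead of oscillating.
        step *= cooling

    return {v: (x, y) for v, (x, y) in pos.items()}


layout = spring_electrical_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

The pairwise repulsion loop is quadratic in the number of vertices, which is one reason repositioning slows down as the graph grows.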


Figure 2: Force-directed graph drawing technique [6].

Figure 3: Spring-Electrical Models [7].
The multilevel approach to graph drawing aims to scale very large graphs down to smaller ones. This is done by
grouping vertices according to the edges between them and separating the groups into layers. Figure 4 shows
a demonstration of the multilevel approach. First the original graph is broken down into parts, and then
new vertices are created, each encapsulating the part it represents. The new vertices are used to
construct a smaller graph, which is then displayed. The smaller graph allows easy visualization
of the entire graph.

Figure 4: Multilevel graph visualization approach.
Visualization of large networks, i.e., graphs with millions of entities or more, is very challenging. This is
due to the constraints of screen displays and the limitations of current graph drawing algorithms. To
solve this problem, we implement a multilevel approach, where the network is partitioned into smaller
graphs that hold different parts of the larger graph, as illustrated in Figure 4.



A network can be partitioned in many different ways. It can be partitioned by labeled categories in the
dataset, by weights associated with the vertices, or by a user-defined parameter present in the
data. For example, if a dataset consists of a list of interactions between different animals, the data can be
partitioned by grouping together animals that belong to the same species. This way, we can visualize a
higher-level view in which each species is represented by a new vertex in a
smaller graph. We can then navigate to a specific species to view an animal that belongs to that
category.
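As a minimal sketch of partitioning by a labeled category, the Python example below collapses each category into one higher-level vertex and keeps an edge between two categories whenever at least one interaction crosses between them. The function name and the animal data are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def coarsen_by_category(edges, category_of):
    """Collapse every category into a single higher-level vertex.

    edges: list of (u, v) pairs between low-level vertices.
    category_of: dict mapping each low-level vertex to its category label.
    Returns (members, coarse_edges), where members maps a category to its
    vertices and coarse_edges lists the pairs of categories that interact.
    """
    members = defaultdict(list)
    for vertex, category in category_of.items():
        members[category].append(vertex)

    coarse_edges = set()
    for u, v in edges:
        cu, cv = category_of[u], category_of[v]
        if cu != cv:
            # Store each undirected pair once, regardless of direction.
            coarse_edges.add(tuple(sorted((cu, cv))))

    return dict(members), sorted(coarse_edges)


# Hypothetical interaction data: individual animals labeled by species.
category_of = {
    "lion_1": "lion",
    "lion_2": "lion",
    "zebra_1": "zebra",
    "hyena_1": "hyena",
}
edges = [
    ("lion_1", "lion_2"),
    ("lion_1", "zebra_1"),
    ("lion_2", "zebra_1"),
    ("hyena_1", "zebra_1"),
]

members, coarse_edges = coarsen_by_category(edges, category_of)
print(coarse_edges)  # [('hyena', 'zebra'), ('lion', 'zebra')]
```

The `members` mapping is what allows navigation from a species vertex back down to the individual animals it represents.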
There are many software tools currently used to visualize small graphs. Gephi [2] is a desktop
application for interactive visualization and exploration of networks and complex systems. It can
be used for social network analysis, exploratory data analysis, and biological network analysis. It
provides tools for people to explore and understand graphs through graphical visualization. Sigma Js [9],
D3 Js, and Processing Js are browser-based JavaScript libraries that can be used for graph drawing.
JavaScript is a dynamic programming language used to develop browser-based applications.
These libraries simplify network visualization in a browser and allow
application developers to integrate network exploration. We chose the Sigma Js library because it is the
most lightweight of the three libraries and allows more user interaction with the
display. We are creating a web application with a user interface where users can upload a formatted large
graph with multiple connections. Sigma Js takes a specific input format that lists the vertices and
edges with labels and properties such as color and size. We are developing a PHP script for preprocessing that
reformats the user's input into the format that Sigma Js recognizes. The end goal of this project is to enable
users to upload their generated networks consisting of millions of vertices and billions of edges and
visualize them in a multilevel manner.
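The preprocessing step can be sketched as follows. The project uses a PHP script for this. The Python version below is only an illustration, and it assumes a whitespace-separated edge list with "#" comment lines as input and a JSON layout with "nodes" and "edges" arrays as output. The field names (id, label, x, y, size, color, source, target), the default color and size, and the random initial positions are assumptions and should be checked against the Sigma Js version in use.

```python
import json
import random


def edge_list_to_sigma_json(lines, color="#666", size=1, seed=0):
    """Convert 'source target' lines into a dict with 'nodes' and 'edges' lists."""
    rng = random.Random(seed)
    nodes = {}
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        source, target = line.split()[:2]
        for vertex in (source, target):
            if vertex not in nodes:
                nodes[vertex] = {
                    "id": vertex,
                    "label": vertex,
                    # Random starting position; a layout algorithm refines it later.
                    "x": rng.random(),
                    "y": rng.random(),
                    "size": size,
                    "color": color,
                }
        edges.append({"id": f"e{len(edges)}", "source": source, "target": target})
    return {"nodes": list(nodes.values()), "edges": edges}


sample = ["# FromNodeId ToNodeId", "0 1", "0 2", "1 2"]
graph = edge_list_to_sigma_json(sample)
print(json.dumps(graph, indent=2))
```

Vertex ids are kept as strings so that the same identifiers can be reused unchanged as edge endpoints.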
Methodology
The process begins with a formatted graph that consists of multiple vertices and edges. The graph is split
into smaller graphs according to their connections. This creates multiple layers that hold the different parts of the
graph. The formatted description of each vertex in a smaller graph holds the identifier of the
lower-level network it represents. When the user wants to navigate to a certain part of the graph, we use
the identifier to locate that part of the graph and zoom the display onto it. The vertex zoom
functionality will be created using JavaScript. A mouse click functionality will also be implemented. The
user can use the mouse to navigate through the network by zooming onto specific layers of the graph or
directly onto a vertex. The Sigma Js library provides a force-directed method for drawing. The specific
plug-in of the library that implements this method is called force atlas. When the user's network
is ready for display, the force atlas plug-in is called to calculate the positions of the vertices.
The algorithm aims to position the vertices so that the edges are of roughly equal length and
edge crossings are reduced as much as possible.
During the first four weeks of the eight-week research term, we worked on creating the user interface
and building example networks to display. The goal of the application is to allow users to better visualize
and interact with their large networks. The user interface is designed to allow the user to move vertices
around the screen, zoom in and out of specific items, and display textual information about a
vertex. We also added functionality to change the color of the vertices. Most importantly, the user
interface comes with a search bar where the user can search for particular items. The user interface was
designed using HTML, a hypertext markup language used to structure the content of a web page.
The user interface consists of input boxes and buttons with which the user can interact using a
mouse and a keyboard. Using JavaScript, we connected the user's actions to specific aspects of the
network display, thereby making it interactive. We tested networks of different sizes
(small, large, and very large) to analyze the visualization, interactivity, and performance of the displays. We
found that Sigma Js can process networks with up to 1000 vertices at a satisfactory performance level;
when the vertex count exceeds that amount, performance begins to degrade. This limit is
acceptable for the multilevel approach we use to solve our problem. If a network with a million
vertices is chosen for visualization, it can be scaled down to a network with 1000 vertices, where each
vertex holds another network with 1000 vertices.
The last four weeks of the research term were dedicated to partitioning large graphs into
smaller, scaled-down representations. To test the multilevel approach, we chose a network with 1000 vertices
and partitioned it into 10 parts. We partitioned it by vertex number: 0 to 99, 100 to 199, and so
on. We first wrote C++ code to break up the larger graph, following
the format of the datasets downloaded from the large network databases. The partitions were
written to new files, and another file was created with vertices linked to the partitions. The files are in the JSON
format that Sigma Js recognizes for creating the display of the vertices and edges.
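The numeric partitioning described above can be sketched in Python (the project's own code was written in C++). The sketch assumes integer vertex ids and places vertex v in part v // 100. An edge is stored with each part it touches, and a top-level edge is recorded whenever two parts share at least one connection. The function name and the sample edges are hypothetical, and writing the parts to JSON files is left out.

```python
def partition_by_range(vertex_ids, edges, part_size=100):
    """Split vertices into consecutive id ranges of part_size vertices each.

    Returns (parts, top_edges). parts maps a part index to its vertices and
    to every edge touching that part; top_edges lists pairs of part indices
    connected by at least one edge.
    """
    parts = {}
    for v in vertex_ids:
        part = parts.setdefault(v // part_size, {"nodes": [], "edges": []})
        part["nodes"].append(v)

    top_edges = set()
    for u, v in edges:
        pu, pv = u // part_size, v // part_size
        parts[pu]["edges"].append((u, v))
        if pu != pv:
            # A cross-part edge is shown in both parts and links them at the top level.
            parts[pv]["edges"].append((u, v))
            top_edges.add((min(pu, pv), max(pu, pv)))

    return parts, sorted(top_edges)


# Hypothetical example: 1000 vertices and a handful of edges.
vertex_ids = range(1000)
edges = [(0, 1), (5, 150), (99, 100), (250, 999)]
parts, top_edges = partition_by_range(vertex_ids, edges)

print(len(parts))   # 10
print(top_edges)    # [(0, 1), (2, 9)]
```

With this scheme the part that holds a given vertex can be computed directly from its id, which keeps lookups during navigation simple.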
Findings
We tested many different networks from two main sources, KONECT - The Koblenz Network Collection
[10] and the Stanford Large Network Dataset Collection [11]. We also tested many randomly generated
graphs with arbitrary sizes and positions. Here are some of the results from displaying the networks using
Sigma Js. Figure 5(a) shows a display of a randomly generated graph using Sigma Js. Figure 5(b) shows
the same graph with the force-directed plug-in from Sigma Js applied to it. As mentioned before,
the vertices of the network are moved so that the edges are close to equal in length when the
force-directed algorithm is applied.

Figure 5: a) Randomly generated graph with Sigma Js.


Figure 5: b) Force-directed plug-in applied.

Figures 6(a), 6(b), and 6(c) show examples of networks visualized using Sigma Js. These networks were
downloaded from the Stanford Large Network Dataset Collection. The format of the datasets was defined by
their creators and therefore had to be converted to the format required by Sigma Js. After converting
the Stanford graph data format to the Sigma Js JSON format, we displayed each graph along with its
properties. We also tested the effects of the user interface dialog box on these networks. We found that
the vertices responded to the mouse and keyboard actions designed in the program. The vertices move
accordingly and change color when the option to change a vertex color is selected through the user
interface. Figure 6(a) displays a network with 1000 vertices, Figure 6(b) a network with 5000
vertices, and Figure 6(c) a network with 10,000 vertices. As the displays show, the network
becomes cluttered with vertex points as its size grows. It becomes very hard to visualize, and it is not easy to
interpret the information being conveyed by the graph. It also takes a very long time to navigate
through the graph to find a specific item.

Figure 6: a) 1000 vertices.

Figure 6: b) 5000 vertices.

Figure 6: c) 10,000 vertices.
The initial screen of the user interface (Figure 7) allows the user to interact directly with the network
displayed. Interactivity plays an important role in the visualization of networks, especially when
implementing the multilevel approach. The user interface makes the information within the network
easily accessible through a navigable display. The display can be repositioned to view
specific parts of the network or to move onto certain vertices. The user interface
allows the user to search for specific items within the dataset, change the color of the vertices and
edges, and change how the edges are drawn. The user can also fit the network to the screen if they
have navigated too far into the display. We fixed the number of iterations of the force-directed
algorithm that run when the network is first loaded in the web browser. The user interface has an
option for the user to continue iterating the algorithm to get a better display of the network.

Figure 7: User interface dialog box.
To test the multilevel approach to visualizing large networks, we chose the network from Figure 6(a)
to partition. Our goal was to partition it into 10 parts and create a new display that links the partitioned
parts to the files they are stored in. When loaded with Sigma Js, the new display is drawn onto the screen
and supports the same interactivity through the user interface. The color, position, and style of the
vertices in those displays can be changed. We can now navigate to the specific parts of the network we want to display.
We added two animations that display either the part of the graph the user wants to navigate
to or a specific item. If the user searches for an item in the search box, the first animation zooms in to
the part of the graph that item belongs to. Then that part of the graph is loaded onto the screen. The
second animation zooms onto the searched item and displays it along with its attributes. Figure 8 shows
the display of the scaled-down version of the test network (Figure 6(a)). The vertices are color coded to
match the colors of the parts of the larger network they are linked to. The graph is displayed with the
force-directed algorithm applied to it. Figure 9 shows the different partitions that were created. The display of
each part consists of the items that belong to it and the items from other parts that are linked to them. They
follow the same color coding. If there is at least one connection between two parts of the graph, an edge is
drawn in Figure 8 to connect those two parts.

Figure 8: Scaled display result from partitioning network in Figure 6(a).








Figure 9: The different parts of the larger graph the vertices in Figure 8 are linked to.

Discussion
The results of the tests run on network displays using Sigma Js confirmed our assumptions about the
multilevel approach. When the network is scaled down to a smaller
representation, we are able to analyze the larger network much more easily. For our tests, we chose to use a
network consisting of 1000 vertices. Given the results from these tests, we believe that partitioning and
visualizing networks with over a million vertices will follow the same process and produce similar results.
We have set goals to test our partition algorithm on these much larger networks. The next step is to
create multiple stages when partitioning the networks. For example, if a network has a million vertices,
we can partition it into 1000 parts, each consisting of 1000 vertices from the larger network.
The resulting display, which consists of 1000 vertices linked to the 1000 parts of the larger graph,
can then be partitioned further by splitting it into 10 parts.
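The arithmetic of this multi-stage example can be checked with a short Python sketch. It assumes every part at a given stage holds the same number of vertices. The function name is hypothetical.

```python
def level_sizes(vertex_count, part_sizes):
    """Return the number of vertices shown at each level of the hierarchy.

    part_sizes[i] is the assumed number of vertices grouped into one
    higher-level vertex at stage i.
    """
    sizes = [vertex_count]
    for part_size in part_sizes:
        # Integer ceiling division: a partly filled last part still needs a vertex.
        sizes.append((sizes[-1] + part_size - 1) // part_size)
    return sizes


# One million vertices, grouped 1000 at a time, then 100 at a time.
print(level_sizes(1_000_000, [1000, 100]))  # [1000000, 1000, 10]
```

Under these assumptions the top-level display has 10 vertices, each linked to a 100-vertex display, whose vertices are in turn linked to 1000-vertex parts of the original network.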
We encountered several challenges while conducting this research. The primary concern when designing
the application was to perform as much processing as possible on the client side and as little as possible
on the server side. In web applications, server-side processes run on the computer of
the host, and client-side processes run on the computer of the user accessing the application.
We aim to partition the graph on the user's end. However, in the current approach, this
is done on the server. We faced another problem with the force-directed plug-in provided by
Sigma Js. In some displays, the algorithm ran continuously without stopping, and some
vertices kept moving, sometimes back and forth around the same position. We resolved this by
running the plug-in for a fixed number of iterations and then halting it for the first display.
As mentioned earlier, we provided an option in the user interface for the user to continue iterating
the plug-in if they want a better display than the one provided.
We are now developing this work further to possibly include an improved user interface dialog box and
parallel partitioning of larger networks. We will test different partitioning algorithms on networks
consisting of millions of vertices and billions of edges. The goal is to minimize the time it takes to
partition the items in the larger datasets and to display the results. If the partitioning time can be made
short enough, we will add a function in the user interface to allow users to partition the network in real
time. They will be able to define how they want the data to be separated, in accordance with the format
provided, and see the results on the display screen. We will also improve how the
force-directed plug-in is iterated so that the first display of the network is satisfactory.
We believe that the findings of this research project will greatly benefit those interested in analyzing and
interpreting their large datasets through visualization. The end result of the research will be a website
with user access to the application. The website will allow anyone to upload their datasets and easily
visualize and interact with the information they convey. The website will also support
multiple dataset formats and will provide a guideline for users to follow so that they upload a
correctly formatted document.
References
1. Rhishikesh S. Fansalkar, "Graph Theory Origin and Seven Bridges of Königsberg", New York
University, 2007.
2. The Gephi team, "Gephi", http://gephi.github.io/, last accessed August 2014.
3. The Google Team, "Inside Search",
http://www.google.com/insidesearch/features/search/knowledge.html, last accessed August
2014.
4. "Graph layout", http://goblin2.sourceforge.net/refman/pageGraphLayout.html, last accessed
August 2014.
5. Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, "A Tour Through the Visualization Zoo",
http://homes.cs.washington.edu/~jheer/files/zoo/, last accessed August 2014.
6. John Howse, Peter Rodgers, and Gem Stapleton, "VL/HCC Tutorial 2009: Automated Diagram
Drawing", http://www.eulerdiagrams.com/tutorial/AutomatedDiagramDrawing.html, last
accessed August 2014.
7. Yifan Hu, "Current and Future Challenges in the Visualization of Large Networks", Encyclopedia
of Social Network Analysis and Mining, 2013.
8. Yifan Hu, "Efficient, High-Quality Force-Directed Graph Drawing", The Mathematica Journal
10(1), 2006.
9. Alexis Jacomy, "Sigma js library", http://sigmajs.org/, last accessed August 2014.
10. Jérôme Kunegis, "KONECT - The Koblenz Network Collection",
http://konect.uni-koblenz.de/networks/, last accessed August 2014.
11. Jure Leskovec, "Stanford Large Network Dataset Collection",
http://snap.stanford.edu/data/index.html, last accessed August 2014.
