
A Survey on the Algorithms of Dynamic Graphs

By
Sayantani Dutta

A Graduate Paper
Submitted to the Faculty of
Mississippi State University
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
in Computer Science
in the Department of Computer Science and Engineering
Mississippi State, Mississippi
December 2014

I. INTRODUCTION
A dynamic graph is a graph that undergoes a sequence of updates, typically insertions or deletions of edges and vertices. An efficient dynamic graph algorithm aims to update the solution of a problem after every such operation, so that the solution does not have to be recomputed from scratch. Dynamic graphs can be directed or undirected and are classified by the types of updates they allow. A fully dynamic graph permits unrestricted insertions and deletions of edges and vertices, whereas a partially dynamic graph allows only one of the two: an incremental partially dynamic graph allows only insertions, and a decremental partially dynamic graph allows only deletions. [2]
Dynamic graphs are used in communication networks, VLSI design, graphics, assembly
planning, financial transaction networks, disease transmission networks, ecological food
networks, sensor networks, gene regulatory networks, citation networks, protein-interaction
networks, ground transportation networks, power distribution networks, computational
phylogeny, web crawlers, and various others. [1, 2, 7]
In this survey paper, I discuss the algorithms used on directed and undirected dynamic graphs. I also discuss the challenges faced by the algorithms used in web search engines and on large dynamic graphs in general.

II. ALGORITHMS USED FOR UNDIRECTED GRAPHS


Most of the algorithms used for undirected graphs involve decomposing or partitioning the graph. The three main tools used for undirected dynamic graphs are clustering, sparsification, and randomization. [2, 7]

Clustering: Clustering decomposes the graph into a suitable collection of connected subgraphs (clusters) in such a way that each update involves only a handful of clusters. The decomposition is generally applied recursively, and information about the subgraphs is combined using topology trees, which maintain properties of the dynamically changing structure. Ambivalent data structures are also used in clustering: an edge may belong to multiple groups, only one of which is actually selected, depending on the topology of the spanning tree. Clustering partitions the vertex set into subtrees connected in the designated spanning tree, so that each subtree is adjacent to only a few other subtrees. The recursive partitioning of the spanning tree is represented using two-dimensional topology trees, which maintain information about the edges of the spanning tree.
Fully dynamic algorithms based on a single level of clustering achieve O(m^(2/3)) time per update; if the partition is applied recursively using two-dimensional topology trees, the time improves to O(m^(1/2)), where m is the number of edges in the graph. According to Frederickson's theorem, the minimum spanning forest of an undirected graph can be maintained in time O(m^(1/2)) per update, where m is the current number of edges in the graph. The same bound holds for fully dynamic connectivity and 2-edge connectivity. The main drawback of this type of clustering is that it is very problem-dependent and therefore difficult to use as a black box.

Figure 1: Clustering of nodes in a graph
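As a rough illustration of the single-level clustering idea (not Frederickson's actual topology-tree machinery), the following Python sketch partitions a spanning tree, given as an adjacency dict, into connected clusters of bounded size; the function name and representation are my own, introduced only for this example.

```python
def cluster_spanning_tree(adj, root, max_size):
    """Partition the vertices of a spanning tree into connected clusters of
    at most max_size vertices, sealing off a subtree's cluster whenever
    merging it into the current one would overflow the bound."""
    clusters = []

    def dfs(u, parent):
        bag = [u]  # open cluster for the subtree rooted at u
        for v in adj[u]:
            if v == parent:
                continue
            child_bag = dfs(v, u)
            if len(bag) + len(child_bag) > max_size:
                clusters.append(child_bag)  # seal the child's cluster
            else:
                bag.extend(child_bag)
        return bag

    clusters.append(dfs(root, None))
    return clusters
```

Each cluster is a connected piece of the tree, so an edge update touches only the clusters containing its endpoints, which is the property the topology-tree approach exploits.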

Sparsification: Sparsification is a divide-and-conquer method that can be used as a black box to design and dynamize graph algorithms. It reduces the dependence on the number of edges in the graph, which helps match the time bounds for maintaining some property of the graph to the time needed to compute that property on a sparse graph. A time bound T(n, m) for a graph with n vertices and m edges speeds up to T(n, O(n)), the time needed for a sparse graph. This requires the notion of a certificate.
For a graph property P and a graph G, a certificate for G is a graph G' such that G has property P if and only if G' has the property. The edges of a graph G with m edges and n vertices are partitioned into O(m/n) subgraphs, each with n vertices and O(n) edges. A sparse certificate stores the information relevant for each subgraph. Larger subgraphs are produced by merging the certificates in pairs, and these are made sparse again by computing their certificates. The result is a balanced binary tree in which each node is represented by
a sparse certificate. Each update involves O(log(m/n)) graphs with O(n) edges each, instead of one graph with m edges.
Sparsification comes in two variants. The first is used when no dynamic algorithm is available: a static algorithm recomputes a sparse certificate in each tree node affected by an edge update. If certificates can be found in time O(m + n), this variant gives time bounds of O(n) per update.
The second variant is used when a dynamic algorithm is available: certificates are maintained using a dynamic data structure. A stability property of certificates is needed to ensure that a small change in the input graph does not lead to a large change in the certificates. This variant transforms time bounds of the form O(m^p) into O(n^p).
A time bound T(n) is said to be well-behaved if, for some c < 1, T(n/2) < cT(n), and the bound does not fluctuate wildly with changes in n.
According to the first sparsification theorem, let P be a property for which we can find sparse certificates in time f(n, m) for some well-behaved f, and for which we can construct a data structure for testing property P in time g(n, m) that answers queries in time q(n, m). Then there is a fully dynamic data structure for testing whether a graph has property P, for which edge insertions and deletions can be performed in time O(f(n, O(n))) + g(n, O(n)), and for which the query time is q(n, O(n)).
According to a second theorem, let P be a property for which stable sparse certificates can be maintained in time f(n, m) per update, where f is well-behaved, and for which there is a data structure for property P with update time g(n, m) and query time q(n, m). Then P can be maintained in time O(f(n, O(n))) + g(n, O(n)) per update, with query time q(n, O(n)).
Sparsification applies to minimum spanning forests and to edge and vertex connectivity, and it can be used orthogonally on top of various other data structures. Combining clustering and sparsification yields efficient dynamic graph algorithms.
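The certificate tree above can be sketched concretely for the connectivity property, where a spanning forest serves as a sparse certificate. The following is a minimal, static Python illustration under my own naming, assuming an edge-list representation; it shows the partition-certify-merge structure, not the dynamic data structure itself.

```python
def spanning_forest(n, edges):
    """A spanning forest is a sparse certificate for connectivity:
    the input edges and the forest connect exactly the same vertex pairs."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    cert = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            cert.append((u, v))  # forest edge: at most n - 1 in total
    return cert

def sparsify(n, edges):
    """Partition the edges into O(m/n) groups of at most n edges, certify
    each group, then merge certificates pairwise up a balanced binary tree."""
    if not edges:
        return []
    level = [spanning_forest(n, edges[i:i + n])
             for i in range(0, len(edges), n)]
    while len(level) > 1:
        level = [spanning_forest(n, level[i] + (level[i + 1] if i + 1 < len(level) else []))
                 for i in range(0, len(level), 2)]
    return level[0]
```

The final certificate has O(n) edges but preserves connectivity exactly, so any connectivity query on the original graph can be answered on the certificate instead; the tree has depth O(log(m/n)), matching the update cost quoted above.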

Randomization: Randomization helps achieve faster update times. The algorithm presented by Henzinger and King rests on three ingredients: maintaining spanning forests, random sampling, and graph decomposition. Spanning trees are maintained using the Euler tour data structure, which supports logarithmic-time updates and queries within the forest. In random sampling, when a tree edge is deleted, non-tree edges are selected at random and examined as candidates to replace the deleted edge. Graph decomposition is combined with randomization: the graph G is decomposed into O(log n) edge-disjoint subgraphs, which are hierarchically ordered. The higher levels contain tightly connected portions (dense edge cuts) and the lower levels contain loosely connected portions (sparse edge cuts). For each level, a spanning forest of the graph defined by all the edges at that level or below is also maintained.
The goal is to obtain a time bound of O(log^3 n) per update. After an edge is deleted, O(log^2 n) sampled edges are searched for a replacement. However, if the candidate set of edge e is only a small fraction of all non-tree edges adjacent to the tree, a replacement edge for e is unlikely to be found among this small sample. If no candidate is found among the sampled edges, all the non-tree edges adjacent to the tree must be checked explicitly.
After random sampling has failed to produce a replacement edge, this explicit check must be performed; otherwise correct answers to the queries could not be guaranteed. Since there might be many edges adjacent to T, the explicit check can be time-consuming, so it should be a low-probability event for the randomized algorithm. Pathological update sequences can nevertheless occur: deleting all edges in a relatively small candidate set, reinserting them, deleting them again, and so on will almost surely produce many of these unfortunate events. The graph decomposition is used to prevent this undesirable behavior.
According to the Henzinger and King theorem, let G be a graph with m_0 edges and n vertices subject to edge deletions only. A spanning forest of G can be maintained in O(log^3 n) expected amortized time per deletion, if there are at least Ω(m_0) deletions. The time per query is O(log n).

III. ALGORITHMS FOR DIRECTED DYNAMIC GRAPHS


The tools used for directed dynamic graphs are Kleene closures, locality, matrices and long
paths.

Kleene closures: Path problems such as transitive closure and shortest paths are tightly related to matrix sum and matrix multiplication over a closed semiring. The transitive closure of a directed graph can be obtained from the adjacency matrix of the graph via operations on the semiring of Boolean matrices. Similarly, the shortest path distances in a directed graph with real-valued edge weights can be obtained from the weight matrix of the graph via operations on the semiring of real-valued matrices. The distance matrix of the graph is in fact the Kleene closure of its weight matrix. This Kleene closure can be computed by either recursive decomposition or logarithmic decomposition.
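Logarithmic decomposition can be sketched for the Boolean case: adding self-loops and repeatedly squaring the adjacency matrix yields the transitive closure after O(log n) multiplications over the Boolean semiring. This is a minimal Python sketch with names of my own choosing, using list-of-list matrices rather than an optimized representation.

```python
def transitive_closure(adj):
    """Transitive closure over the Boolean semiring via logarithmic
    decomposition: add self-loops, then square the matrix O(log n) times."""
    n = len(adj)
    # Reflexive closure (I + A): every vertex reaches itself.
    m = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]

    def bool_mult(a, b):
        # Matrix product with (or, and) in place of (+, *).
        return [[any(a[i][k] and b[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    length = 1
    while length < n:  # (I + A)^n covers paths of every length up to n
        m = bool_mult(m, m)
        length *= 2
    return m
```

Replacing (or, and) with (min, +) and the weight matrix turns the same squaring scheme into an all-pairs shortest-distance computation, which is the real-valued semiring mentioned above.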
Locality: According to Demetrescu and Italiano, dynamic path problems can be solved by maintaining classes of paths characterized by local properties. A path π in a graph is locally shortest if and only if every proper subpath of π is a shortest path. A historical shortest path is a path that has been shortest at least once since it was last updated. A path π in a graph is locally historical if and only if every proper subpath of π is historical. If the updates to the graph are not fully dynamic, the following theorem holds: let G be a graph subject to a sequence of increase-only or decrease-only edge weight updates; then the amortized number of paths that start or stop being locally shortest at each update is O(n^2). According to Demetrescu and Italiano's theorem, let G be a graph subject to a sequence of update operations; if at any time throughout the sequence of updates there are at most O(h) historical paths in the graph, then the amortized number of paths that become locally historical at each update is O(h).
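The locally-shortest test reduces to checking the two maximal proper subpaths (drop the first vertex, drop the last vertex): since every subpath of a shortest path is itself shortest, if both are shortest then so is every proper subpath. A minimal Python sketch, assuming hypothetical maps `w` (edge weights) and `dist` (true shortest distances) that I introduce only for this example:

```python
def path_length(path, w):
    """Total weight of a path given as a vertex sequence."""
    return sum(w[(u, v)] for u, v in zip(path, path[1:]))

def is_locally_shortest(path, w, dist):
    """True iff both maximal proper subpaths are shortest paths,
    which implies every proper subpath is."""
    if len(path) <= 2:
        return True  # single vertices/edges are trivially locally shortest
    left, right = path[:-1], path[1:]
    return (path_length(left, w) == dist[(left[0], left[-1])]
            and path_length(right, w) == dist[(right[0], right[-1])])
```

Note that a single edge is locally shortest even when it is far from shortest, which is exactly why locally shortest paths form a strictly larger, easier-to-maintain class than shortest paths.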


Matrices: A matrix subject to dynamic changes is a useful data structure for keeping information about the paths in a dynamic directed graph. Since Kleene closures can be constructed by evaluating polynomials over matrices, it is natural to consider data structures for maintaining polynomials of matrices subject to updates of entries.


Long paths: There is an intuitive combinatorial property of long paths in a graph: long paths can be found using short searches. According to the theorem of Ullman and Yannakakis, let S, a subset of V, be a set of vertices chosen uniformly at random. Then the probability that a given simple path has a sequence of more than (cn log n)/|S| vertices, none of which are from S, is, for any c > 0 and sufficiently large n, bounded by 2^(-αc) for some positive α.
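The quantity the theorem bounds is the longest run of consecutive path vertices that avoids the random sample S. A small Python helper (my own, purely illustrative) makes the statement concrete:

```python
def longest_gap(path, sample):
    """Length of the longest run of consecutive path vertices that avoids
    the sampled set S; by the theorem, runs much longer than
    (c * n * log n) / |S| occur with exponentially small probability."""
    best = run = 0
    for v in path:
        run = 0 if v in sample else run + 1
        best = max(best, run)
    return best
```

This is why a search from each sampled vertex only needs to explore paths of roughly (n log n)/|S| hops: with high probability every long path is "hit" by S at that frequency.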

IV. CHALLENGES IN LARGE DYNAMIC GRAPHS AND WEB CRAWLERS


Managing large dynamic graphs and operating web crawlers pose many challenges. [1, 6, 7, 8]

Minimum communication overhead is desired in a distributed system; this depends on two factors: query latencies and replica maintenance. In real-time operations, read/write operations are latency-critical, and failing to keep latencies within acceptable limits may lead to the demise of the application. Moreover, in a dynamically evolving graph, the cost of keeping replicas up to date may exceed the benefits of replication.


Load balancing across sites is required to prevent over-utilization or under-utilization of resources. Skewed replication decisions may lead to load imbalance.


Flash traffic, a burst of unexpected read/write requests issued to the system within a small period of time, is another important problem.


All queries should be executed with very low latency, which requires minimizing the number of pulls needed to gather the information that answers a query. This property is called the fairness criterion; its value is at most 1, and for a real-time system it is always less than 1.


There are few efficient distributed-memory parallel implementations of even the simplest algorithms for sparse, arbitrary graphs.


Dynamic graphs can be enormous, with very limited (potentially abysmal) locality at all levels of the memory hierarchy; they are highly unstructured and hard to partition. The edges and vertices of these graphs may have types, and the access pattern may be data-dependent.

Web crawlers face the problem of sampling web pages. A technique for uniform sampling of web pages could be used to determine how many pages are on the web, how many of them are indexed by a search engine, the average length of a page, what percentage of web pages are home pages, and how these properties change over time. Unfortunately, no such technique is known.


No random graph model currently captures the behavior of the web at both the page level and the host level.


Web search engines must also contend with duplicate or near-duplicate pages. Even though duplicate host detection is easier than mirror detection, there are millions of different hosts, and comparing all pairs is simply infeasible.


Changes in query logs give rise to data stream problems, in which two sequences, one increasing and the other decreasing, must be compared by reading each sequence only once.

The web contains many densely connected directed bipartite subgraphs; such structures should contain at least a constant fraction of the corresponding complete bipartite subgraphs.

V. CONCLUSION
In this paper, I have surveyed the tools required to analyze directed and undirected dynamic graphs. The tools for directed graphs are Kleene closures, long paths, matrices, and locality, whereas the tools for undirected graphs are randomization, clustering, and sparsification. I have stated the time bounds for these algorithms, which are generally close to the optimal time complexities. I have also described the various challenges faced in analyzing dynamic graphs and large graphs, and by the algorithms used in web crawlers within a web search engine.

VI. REFERENCES
[1] David A. Bader, Petascale Computing for Large-Scale Graph Problems, Georgia Tech
College of Computing
[2] Carlos Castillo, Mauricio Marin, Andrea Rodriguez, Ricardo Baeza-Yates, Scheduling
Algorithms for Web Crawling, Center for Web Research
[3] Camil Demetrescu, Irene Finocchi, Giuseppe F. Italiano, Dynamic Graphs, Chapter 1,
CRC Press, 2001, pp. 1-20
[4] David Ediger, Karl Jiang, E. Jason Riedy, David A. Bader, GraphCT: Multithreaded
Algorithms for Massive Graph Analysis, IEEE Transactions on Parallel and Distributed
Systems, IEEE, 2012, pp. 1-11
[5] Oded Green, High Performance Computing for Irregular Algorithms and Applications
with an Emphasis on Big Data Analysis, Georgia Institute of Technology, May 2014, pp.
1-280
[6] Monika R. Henzinger, Algorithmic Challenges in Web Search Engines, Internet
Mathematics, Vol. 1, No. 1, 2003, pp. 115-126
[7] Jayanta Mondal, Amol Deshpande, Managing Large Dynamic Graphs Efficiently,
Special Interest Group on Management of Data, Scottsdale, Arizona, ACM, May, 2012,
pp. 1-12
[8] Pak Chung Wong, Chaomei Chen, Carsten Gorg, Ben Shneiderman, John Stasko, Jim
Thomas, Graph Analytics: Lessons Learned and Challenges Ahead, IEEE Computer
Society, 2011, pp. 18-29
[9] Clustering image from:
http://i11www.iti.unikarlsruhe.de/_media/members/robert_goerke/clustering_titlelogo_on
ethird.jpg

