with focus on
Social, Technological and Engineered networks
1 Usha Nandini Raghavan, 1 Soundar Kumara and 2 Réka Albert
1 Department of Industrial Engineering, The Pennsylvania State University
2 Department of Physics, The Pennsylvania State University,
University Park, Pennsylvania, 16802, USA
Prologue
We at the Laboratory for Intelligent Systems and Quality (LISQ) in the Department of
Industrial Engineering at Penn State have been involved in studying complexity since 1989. In
the early stages of work at LISQ we focused on analyzing sensor signals and extracting
features from them for estimating the state of machines [1, 2]. This fundamental work
evolved into characterizing and analyzing the observed data. These studies established for
the first time the existence of chaos in machining [3, 4, 5, 6, 7, 8]. This work on
complexity, in particular nonlinear dynamics, was conducted in different realms, namely sensor
networks, infrastructure monitoring and supply chains. Subsequently, the logical question
addressed was “How do we deal with complexity when the number of participating entities
(nodes) increases?”. This took us in the direction of graph theory, random graphs and large-scale
networks. In this monograph we summarize our work with the hope that it will help
the engineering community pursue research in this new and exciting area of complex
networks.
This monograph is the result of sustained work over the last six years. Several of
our students helped us shape this work. Hari Prasad Thadakamalla, who started this work,
was instrumental in exploring supply chains as complex networks and search on weighted
graphs. We started collaborating with Dr. Réka Albert in the early stages of Hari’s PhD
thesis. Christopher Carrino explored dynamic community formation in social networks with
applications to terrorist networks. Usha Nandini Raghavan and Amit Surana explored
adaptivity in general. Nandini, in particular, addressed algorithms for community detection
in large-scale networks.
I. INTRODUCTION
Why do some innovations capture the imagination of a society while others do not?
How do people form opinions, and how does consensus emerge in an organization? How do
we capture the opinions and votes of people during election years?
What are the fundamentals of nature and how do cells and organisms evolve and
survive? What makes a cell’s functions robust and adaptable to its environment?
How can we make resource sharing through the Internet secure? In this information
age, how do we as users quickly find relevant information from the World Wide Web?
How do we guard technological infrastructures that form the backbone of our day to day
business, from malicious attacks?
How can we sense and prevent forest fires at an early stage, and how do we put
sensor devices to use in detecting them? How can we use autonomous sensor nodes to monitor
dangerous terrains and large chemical plants?
These are only a few of the questions whose answers will significantly affect the lives
of people and the society we live in. Science and engineering, in their overall effort to
address these issues, have created many different avenues of research, Network Science
being one among them. Network science is the study of systems mainly through their
network structure or topology. The nodes (vertices) of such networks are the entities
(people, bio-molecules, webpages, sensor devices) and the links (edges) are the interactions
(friendships, chemical reactions, hyperlinks, communications) between entities.
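Concretely, such a network can be stored as an adjacency structure. The short sketch below (with purely illustrative names) builds a small friendship network and reads off each node's degree, i.e., the number of links incident on it:

```python
# A network stored as an adjacency list: each node maps to the set of
# nodes it is linked to. The people here are purely illustrative.
friendships = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "carol", "dave"},
    "carol": {"alice", "bob"},
    "dave":  {"bob"},
}

# The degree of a node is the number of links incident on it.
degree = {person: len(friends) for person, friends in friendships.items()}
print(degree)  # {'alice': 2, 'bob': 3, 'carol': 2, 'dave': 1}
```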
People have opinions of their own, but they also shape opinions by interacting and ex-
changing views with their friends and neighbors. Sociologists have long understood that an
individual’s behavior is significantly affected by their social interactions [9, 10]. It is now
widely believed that biological functions of cells and the robustness of cellular processes arise
due to the interactions that exist between the components of various cells [11]. Webpages
with contents and information relate to other webpages by means of hyperlinks creating a
complex web like structure; the WWW. Miniaturized wireless sensor nodes, which individu-
ally have limited capabilities, achieve an overall sensing task by communicating and sharing
information with other nodes [12, 13].
A vast amount of research in recent years has shown that the organization of links (who is
connected to whom) in a network and its topological properties carry significant information
about the behavior of the system it represents [10, 14]. Furthermore, the topological
properties have a huge impact on the performance of processes such as information diffusion,
opinion formation, search, navigation and others.
Organization of links in large-scale natural networks was originally considered to be ran-
dom [10, 14, 15]. But empirical observations in the recent past have revealed topological
properties in a wide range of social, biological and technological networks that deviate from
randomness [10, 14, 16, 17]. That is, networks that arise in nature and whose
evolution is largely uncontrolled (self-organized) have specific organizing principles leading
to various properties or orders in their topology. This observation has sparked an interest
in the scientific study of networks and network modeling, including the desire to engineer
man-made systems to mimic the behaviors of nature.
II. NETWORKS
As explained above, complex systems are modeled as networks to understand and
optimize processes such as the formation of opinions, resource sharing, information retrieval
and robustness to perturbations. The following are some examples of systems and their
network representations.
A. Natural networks
1. Movie actor collaborations: This network consists of movie actors as nodes and edges
represent the appearance of pairs of actors in the same movie. It is a growing network
that had 225,226 nodes and 13,738,786 edges in 1998 [18]. Interests in this
network include the study of successful collaborations (what kind of casting makes a
movie successful?) [19] and the famous Bacon number experiment to study how other
actors are linked to Kevin Bacon through their casting roles [20].
2. Scientific co-authorship: In this network, the nodes are scientists or researchers and
an edge exists between two scientists if they have collaborated in writing a paper.
Newman [21, 22, 23] studied scientific co-authorship networks from four different areas
of research. The information was obtained in an automated way from four databases,
MEDLINE, the Physics E-print archive, SPIRES and NCSTRL, which collect papers
and their authors in biomedicine, physics, high-energy physics and computer science
respectively. One of these networks, formed from the MEDLINE database for the
period from 1961 to 2001, had 1,520,251 nodes and 2,163,923
edges. Developing metrics to quantify the scientific productivity or cumulative impact
of a scientist given his/her collaborations is one problem of interest in co-authorship
networks [24, 25]. The Erdős Number project, which motivated the Bacon number, is
a popular experiment used in the study of the co-authorship structures of
successful scientists [26].
3. The Internet: The Internet is a network of computers and devices connected by wired
or wireless links. The study of the Internet is carried out at two different levels, namely
the router level and the autonomous system level [14, 27]. At the router level, each
router is represented as a node and the physical connections between them as the edges
in the network. At the autonomous system level, every domain (typically an Internet
service provider) is represented as a node and the inter-domain connections are represented by
the edges. The numbers of nodes at the router and domain levels were about 150,000 in 2000
[27] and 4000 in 1999 [28] respectively. The problem of identifying and sharing files
efficiently over peer-to-peer networks (such as Gnutella [29]) that are built over the
Internet has received significant attention in recent years [30, 31].
4. World Wide Web (WWW): The WWW is a network of webpages where the hyperlinks
between the webpages are represented by the edges in the network. It is a growing
network that had about one billion nodes in 1999 [32] with a recent study estimating
the size to be about 11.5 billion pages in January 2005 [33]. Information retrieval from
the WWW is a problem of immense interest. Algorithms such as PageRank [34] or the
ones proposed by Kleinberg in [35], use the network structure to extract webpages in
the order of relevance to user requests.
5. Neural networks: Here the nodes are neurons and an edge connects two neurons if there
is a chemical or electrical synapse between them. Watts and Strogatz [14, 18] studied
the topological properties of the neural network of the nematode worm C. elegans, consisting
of 282 neurons, with pairs of neurons connected by the presence of either a synapse
or a gap junction. The study of neural networks is important for understanding how the
brain stores and processes information [17]. While we can observe that this is done in
an optimal and robust way in neural networks, we are still at a loss to quantify this
mechanism [17].
6. Cellular networks: Here the substrates or molecules that constitute a cell are repre-
sented as nodes and the presence of bio-chemical interactions between the molecules are
represented as edges [14]. Among others, the interactions between protein molecules
are important for many biological functions [11, 36]. Jeong et al. [11] have studied
the topology of the protein-protein interaction map of the yeast S. cerevisiae, which consists
of 1,870 proteins as nodes connected by 2,240 identified interactions. Using the
network structure to predict possible (previously unidentified) interactions between
protein molecules has received widespread attention from researchers [37, 38].
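The Bacon and Erdős number experiments mentioned above both reduce to computing shortest-path distances in a collaboration network. A minimal sketch, on a hypothetical toy cast list, uses breadth-first search:

```python
from collections import deque

def collaboration_distance(graph, source, target):
    """Breadth-first search: the number of collaboration links separating
    two actors (the 'Bacon number' when source is Kevin Bacon)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None  # the two actors are not connected

# Toy collaboration graph with hypothetical actors
graph = {
    "bacon": {"a", "b"},
    "a": {"bacon", "c"},
    "b": {"bacon"},
    "c": {"a", "d"},
    "d": {"c"},
}
print(collaboration_distance(graph, "bacon", "d"))  # 3
```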
B. Engineered networks
Engineered networks are those in which the nodes of the network follow a pre-specified
set of protocols by which the links are formed. Whether the control is centralized or de-
centralized, the organization is engineered to achieve desired topological properties. Some
examples follow.
1. Agent-based supply chain networks: Here software agents that are responsible for the
functions of a supplier, manufacturer, distributor and retailer are the nodes and the
direct flow of information/tasks/commodities between entities are represented by the
edges in the network. Thadakamalla et al. [39] studied the topological properties of a
military supply chain (with 10,000 nodes [40]) and proposed mechanisms by which the
nodes can re-organize under functional constraints to provide better performance.
2. Wireless Sensor Networks (WSN): Here the nodes represent miniaturized wireless
sensor devices that consist of a short-range radio transceiver and limited computational
capabilities [12, 13]. Though individual sensors have limited capacities, the true value
of the system is achieved by sharing responsibilities and information through a com-
munication infrastructure [13]. Thus an edge in a WSN represents the presence of
communication between two nodes. The number of nodes in a WSN can vary from
a few hundred or thousand to millions depending on the application scenario. The
sensor nodes, when deployed in a sensing region, self-organize
to establish a communication topology. There is considerable interest in developing
topology control protocols that will guide this organization process to support the
global sensing tasks [12, 41, 42].
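As a rough illustration of the topology-control problem, the sketch below (a generic geometric model, not any specific protocol from [12, 41, 42]) links sensor nodes that fall within a common radio range and checks whether the resulting topology is connected; a larger radio range improves connectivity at the cost of energy:

```python
import random

def geometric_topology(positions, radio_range):
    """Form an edge between every pair of sensor nodes within radio range."""
    edges = set()
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            (x1, y1), (x2, y2) = positions[i], positions[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radio_range ** 2:
                edges.add((i, j))
    return edges

def is_connected(n, edges):
    """Depth-first search: can every node reach every other node?"""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == n

random.seed(7)
nodes = [(random.random(), random.random()) for _ in range(50)]
# Increasing the radio range trades energy use for connectivity.
for r in (0.1, 0.2, 0.3):
    print(r, is_connected(50, geometric_topology(nodes, r)))
```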
Interest in the study of natural complex networks can be broadly divided into two
categories, namely scientific and engineering. The scientific interest lies in understanding the
structure, evolution, and properties of networks, with an eventual goal of engineering more
efficient processes on these networks. The engineering interest, on the other hand, lies in
developing more efficient algorithms and finding optimal parameters to better control the
processes taking place on such networks [10, 14, 17].
With an increasing understanding of the structural organization leading to emergent
properties, a rich literature of complex network models that can mimic such properties
has developed [10, 14, 17, 18]. These network models then form the basis on which
processes such as disease propagation, information diffusion, search, navigation and others
are studied and analyzed. Some of the interesting questions that can be answered using a
combination of both aspects of this research include 1) how to control the spread of diseases
in a large class of people interconnected by physical contacts, 2) how to study, maintain,
and control the diffusion of information in WWW, and 3) how to better identify targets
for drug discovery in metabolic networks? In parallel, there is also considerable interest
in engineering networks such as supply chains and miniaturized wireless sensor networks,
where desired behaviors are achieved by controlling the interactions between entities [12, 39, 43, 44].
Other links
• Center for the study of complex systems at University of Michigan, Ann Arbor
• Tracing information flow - Project jointly developed at Cornell University and Carleton
College
• Google Research
• Web Search & Mining and Web search and Data mining groups at Microsoft research
• orgnet software
• NetworkX - Python package for creation, manipulation and study of complex networks
In a social network the nodes represent actors (such as individuals) who are interconnected
by relationships (such as friendship or acquaintance). Social network analysis (SNA) deals
with the study of such networks and how the structural measures and properties relate to
individuals and the processes taking place on these networks.
SNA emphasizes the prominent role relationships play in characterizing an individual entity
(or actor). Some of the properties used today in complex networks research, such as
degree, betweenness centrality and closeness centrality, have their origins in sociometry.
Such concepts were defined to quantify the prominent or central role played by an actor in
a given network. Under the framework of complex network theory and SNA, there have been
many research efforts characterizing the social interactions or the relative importance
of nodes in movie actor collaborations [16, 20], co-authorship networks [24] and others.
There has also been work that characterizes the roles of actors and predicts future
collaborations in terrorist networks [19, 45]. In [45], using an extended network of the
September 11th hijackers and their associates, it was shown that many ties in the
network were concentrated around the pilots or persons with unique skills. Hence,
targeting and removing those with the necessary skills (or high-degree nodes) for a project can
inflict maximum damage to the project’s mission (network connectivity).
There has, however, been constant debate on the validity of the data points collected
to form networks of people and their relationships. For example, to study relationships
among school children, a network is formed by asking individual children in a specific
school to identify their friends. It is possible that some children call everyone in
their class a friend. Especially when the number of data points collected is small, it is
often difficult to attach confidence to statistical analyses/observations and their
consequences. Scientific collaboration, by contrast, is a setting where abundant, accurate
information is available on scientists and their collaborations. As a result, such networks
are very popular in the research community in the study of their structures and in understanding the
social implications of their structural properties. In this network, the nodes are scientists
or researchers and an edge exists between two scientists if they have collaborated in
writing a paper. The network can also be weighted based on some index of the number
of collaborations between scientists. In [21, 22, 23], scientific collaboration networks from
various fields (biomedical, theoretical physics, high-energy physics and computer science)
were considered for structural analysis. One important consequence of understanding
the underlying structure of such social networks is the ability to test new theories on
models of these networks [10, 14, 17].
Citation networks have been studied extensively to identify the historical and social
impact of papers/research/scientists. Since the introduction of the Science Citation Index
(SCI) by the Institute for Scientific Information, researchers have been able to construct
and study the structure of large volumes of citation interconnections between papers. The
SCI provides a list of all papers from selected journals and, under each paper, a list of
the papers that cite it. A citation network thus consists of papers as nodes, with an edge
between two papers directed towards the cited paper. Price
[46], based on his empirical study, was the first to observe that in many papers one half of
the references were to a research front of recent papers, while the other half of the references
were uniformly randomly scattered through the literature. This suggests that there is a
tendency among researchers to build a research front based on recent work. Currently, there
are many databases with information on papers and their references/citations that are freely
available to the community. A few such databases include the Stanford Public Information
Retrieval System (SPIRES), which consists of papers in the field of high-energy physics;
CiteSeer, an open access digital library with a comprehensive list of scientific and
academic papers; Citebase, which indexes papers self-archived by authors in the fields
of physics, mathematics and computer science; and BioMed Central and PubMed Central,
which index published papers in the field of biomedicine. The availability of large volumes of
accurate data has revived interest among researchers in the field of citation analysis.
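Price's research-front observation lends itself to a toy simulation. The sketch below is an illustrative model rather than Price's exact formulation: each new paper devotes half of its references to the most recent papers (the research front) and scatters the other half uniformly over the literature:

```python
import random

def price_style_citations(n_papers=2000, refs_per_paper=10, front=20, seed=1):
    """Toy citation model: each new paper sends half of its references to
    a 'research front' of the most recent papers and scatters the other
    half uniformly over the whole earlier literature."""
    rng = random.Random(seed)
    citations = [0] * n_papers
    for p in range(front, n_papers):
        recent = range(p - front, p)   # the research front
        older = range(p)               # the entire earlier literature
        for _ in range(refs_per_paper // 2):
            citations[rng.choice(recent)] += 1
        for _ in range(refs_per_paper // 2):
            citations[rng.choice(older)] += 1
    return citations

cites = price_style_citations()
print(len(cites), sum(cites))  # 2000 19800: 1,980 papers issue 10 references each
```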
Hirsch [24] developed a structural measure called the h-index which, unlike previous
measures, can quantify the cumulative impact and relevance of an individual’s scientific
research output. Specifically, the h-index of a scientist is h if h of his/her papers have at
least h citations each and the other papers have fewer than h citations. If this index, Hirsch
argues, differs between two scientists who have the same number of publications and the
same number of overall citations, then the scientist with the higher h value is likely to be
the more accomplished of the two.
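The definition of the h-index translates directly into code; a minimal sketch:

```python
def h_index(citations):
    """h-index: the largest h such that h papers have at least h citations."""
    cites = sorted(citations, reverse=True)
    h = 0
    # After sorting, paper h+1 (0-indexed position h) must have >= h+1 citations.
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with at least 4 citations
print(h_index([25, 8, 5, 3, 3]))  # 3
print(h_index([0, 0]))            # 0
```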
The telecom industry has provided us with some of the most naturally available social
network structures for statistical analysis. Aiello et al. [47, 48] analyzed the graph of long-
distance calls made between different phone numbers. They constructed a random graph
model that best emulates the properties of the phone call networks. A similar study was
also done by Nanavati et al. [49] on a call graph constructed between the cell phone users of
a certain telecom provider. Here a directed edge points from the person making the call
to the person receiving it. Their analysis
showed that some of the properties (namely degree distribution) were different from similar
networks such as the WWW and e-mail graphs. They further proposed a Treasure-Hunt
model that can capture these degree distributions effectively.
In addition to studying the structures of social networks, there have also been works that
combine network analysis with other methods to make inferences about the characteristics
of individual entities or groups. In [50], the network of committees and subcommittees of
the U.S. House of Representatives between the 101st and 108th Congresses was analyzed.
Here an edge exists between committees if they have common membership. In addition to
network theory, a Singular Value Decomposition (SVD) analysis of the roll call votes by the
members was used to identify correlations between members’ committee assignments and
their political positions (such as Republican or Democrat). Hogg and Adamic [51] have argued
that ranking methods such as PageRank [34] or NodeRank [52], used to assign reputations
to nodes in a social network, can be made more effective by making it more difficult to alter
ratings via duplication or collusion. In particular, they argue that structural measures of
social networks can be used to make ranking systems more reliable.
While traditional models for disease propagation that assume a fully mixed population
work well for small populations, they fail to agree with observed trends in heterogeneous
and large populations [10, 53, 54, 55, 56, 57]. In such cases, simulation has emerged
as a powerful tool that can capture both the topological properties and changes along with
the disease dynamics to provide a better understanding of the disease propagation in social
networks [58]. There have also been several studies related to opinion formation [59] and
finding community or group structures in social networks [35, 60, 61, 62, 63]. It has been
observed that the interconnections between nodes in real-world networks are not random,
but display a structure wherein nodes show preferences in being connected to other nodes
within a tightly knit group. Finding such tightly interconnected groups of nodes (termed
communities) can offer micro-level information about the structure of a network, both
within individual communities and as a whole [61, 63, 64]. In social networks in particular,
communities can shed light on opinion formation and on the common characteristics and
beliefs that distinguish one group of people from another.
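One simple community detection scheme in this spirit is label propagation [63], in which every node repeatedly adopts the label most common among its neighbors until labels stabilize. A minimal sketch (ties are broken at random, so this is illustrative rather than a reference implementation):

```python
import random

def label_propagation(adj, max_iters=100, seed=0):
    """Each node repeatedly adopts the label most common among its
    neighbors; densely interconnected groups converge to a shared label."""
    rng = random.Random(seed)
    labels = {node: node for node in adj}  # start with unique labels
    nodes = list(adj)
    for _ in range(max_iters):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            counts = {}
            for nbr in adj[node]:
                counts[labels[nbr]] = counts.get(labels[nbr], 0) + 1
            if not counts:
                continue
            best = max(counts.values())
            choice = rng.choice([l for l, c in counts.items() if c == best])
            if choice != labels[node]:
                labels[node] = choice
                changed = True
        if not changed:
            break
    return labels

# Two triangles joined by a single bridge edge (2-3): two tight groups.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(label_propagation(adj))
```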
Information sharing and retrieval drives the day to day business across the world. This
need has propelled the research interests on technological networks such as the Internet and
the WWW. The goal of such research is to develop efficient protocols for communication
on the Internet and for information retrieval on the WWW. Achieving this goal requires
a two-pronged approach: one branch of research focuses on understanding the organization
of technological networks; the second uses the resulting network models to optimize
information sharing and retrieval.
The map of the Internet is considered at two different scales. At the level of Autonomous
Systems (AS), the network of the Internet consists of nodes as ASs with edges representing
the physical communications. An AS here is an organizational unit of a particular domain
those nodes. Since it is a directed network, unlike the Internet, its in- and out-degree
distributions are analyzed differently. Albert et al. observed the presence of a power-law degree
distribution in the WWW map at the *.nd.edu domain [67]. While the power-law exponent
for the in-degree distribution was 2.1, the exponent for the out-degree distribution was 2.45.
Pencock et al. [68] analyzed the WWW by dividing it into subject categories, such
as computer science, universities, companies and newspapers. Within these categories, the
in-degree distributions of the networks showed considerable variability in the power-law
exponent ν, varying between 2.1 and 2.6. This implies that the structure of the WWW shows
different dynamics based on the way the information of the webpages and their connections
are identified and mapped [64].
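The power-law exponents quoted above are estimated from empirical degree distributions. A rough sketch of one common (if crude) approach, a least-squares fit on log-log scales, follows; maximum-likelihood estimators are preferred in practice:

```python
import math

def fit_powerlaw_exponent(degrees):
    """Estimate gamma in p(k) ~ k^-gamma by a least-squares fit of
    log(count) against log(degree). Crude but illustrative; maximum-
    likelihood estimators are preferred for real data."""
    counts = {}
    for k in degrees:
        if k > 0:
            counts[k] = counts.get(k, 0) + 1
    xs = [math.log(k) for k in counts]
    ys = [math.log(counts[k]) for k in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # gamma

# Synthetic degree sequence whose counts follow p(k) ~ k^-2 exactly
degrees = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
print(round(fit_powerlaw_exponent(degrees), 2))  # 2.0
```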
It has also been shown that the way in which nodes and interconnections are identified
in networks (sampling methods) can affect our estimation of the structural properties of the
original network [69, 70, 71, 72, 73, 74, 75]. For example, in [76] Lakhina et al. show that
with traceroute-like sampling methods [75] it is possible to conclude from the sample
that a network has the scale-free property when in fact the original network is a random graph
[15].
File sharing peer-to-peer networks, such as Gnutella, are another kind of communication
network that has emerged on top of the basic Internet structure. Specifically, Adamic et al.
and Thadakamalla et al. have analyzed how decentralized search processes on networks such
as Gnutella are affected by the heterogeneities in the degree and edge weight distributions
[30, 31, 77]. In [77] the authors studied decentralized search processes in spatial scale-free
networks. In particular, they showed that two factors, namely the direction and degree of
nodes, are sufficient to guide the search process to find the shortest paths from the origin to
destination. This result adds further evidence to the conjecture that many natural networks
are inherently searchable [78, 79].
Information retrieval is an important issue on the WWW. Search engines are useful tools
that help in information retrieval. Algorithms such as PageRank are used to retrieve the
webpages in the order that is expected to be of relevance to user requests. This algorithm uses
both the individual webpage’s value and the value attached to the webpage by its neighbors
as an indicator of the overall value of a given webpage [34]. Kleinberg et al. [35] propose
similar link-based mechanisms for retrieving webpages with relevant information, but do so
using two different sets of measures. They associate values with each webpage that determine
whether it is a good authority and/or a good hub. A good hub is a webpage that has hyperlinks to
many good authorities, and a good authority is a webpage that is referenced by many
good hubs. The best set of hubs and authorities then contains information that is of most
relevance to the user. Such an approach, according to Kleinberg, was motivated by a large
presence of bi-partite sub-structures observed in the WWW network [35].
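Kleinberg's mutually reinforcing definition of hubs and authorities can be computed by simple iteration; a minimal sketch on a hypothetical mini-web:

```python
def hits(out_links, iters=50):
    """Kleinberg-style hub/authority iteration: a page's authority score
    is the sum of the hub scores of pages linking to it, and its hub
    score is the sum of the authority scores of pages it links to."""
    nodes = set(out_links) | {v for vs in out_links.values() for v in vs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        auth = {n: sum(hub[m] for m in nodes if n in out_links.get(m, ()))
                for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        hub = {n: sum(auth[v] for v in out_links.get(n, ())) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

# Hypothetical mini-web: two hub pages both point at two content pages.
web = {"h1": {"a1", "a2"}, "h2": {"a1", "a2"}, "a1": set(), "a2": set()}
hub, auth = hits(web)
print(max(auth, key=auth.get) in ("a1", "a2"))  # True
```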
Large supply chains are among the networks that have complex topologies [39, 80, 81].
Analysis of the topological properties of real-world supply chains is difficult. This is because
supply chains are composed of various individual and independent entities such as suppliers,
manufacturers, distributors and retailers. Hence, it is difficult to compose information from
various sources to form an accurate picture of any large-scale supply chain. It is however well
known that their topologies tend to be hierarchical so as to enable product flow downstream
from suppliers to customers and information flow upstream from customers back to the
suppliers. One of the well-studied dynamics on supply chains is the Bullwhip effect [82],
where small variabilities or uncertainties created at the lowermost layer grow as they
move upstream towards the manufacturers and suppliers. This cascading effect
is due to the coupling of complexities arising from human judgement with that of the supply
chain structure. Cascading effects are also studied in the context of power distribution in
power grids [83, 84, 85, 86, 87]. The North American power grid is one of the most complex
technological networks. It consists of substations of three types: generation substations,
responsible for producing electric power; transmission substations, which transfer power
along high-voltage lines; and distribution substations, which distribute power to small, local
grids. Kinney et al. [87] study the effect of cascading failures on the exact topology of the
North American power grid with plausible assumptions about the load and overload of substations.
If a substation fails to work, then the generated power, since it cannot be destroyed, is re-
routed via other nodes in the network. As a result, the load on other nodes increases and
may result in cascading effects. Under single-node removal, Kinney et al. showed that the
failure of 40 percent of transmission substations leads to cascading failures in the North
American power grid.
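The load-redistribution mechanism described above can be caricatured in a few lines. The sketch below is a toy model, not the Kinney et al. model: a failed node's load is split equally among its working neighbors, and any neighbor pushed past capacity fails in turn:

```python
def cascade(capacity, load, adjacency, failed):
    """Toy cascading-failure model: a failed node's load is redistributed
    equally among its working neighbors; any neighbor pushed past its
    capacity fails in turn."""
    failed = set(failed)
    frontier = list(failed)
    while frontier:
        node = frontier.pop()
        alive = [n for n in adjacency[node] if n not in failed]
        if not alive:
            continue  # nowhere to re-route this node's load
        share = load[node] / len(alive)
        for n in alive:
            load[n] += share
            if load[n] > capacity[n]:
                failed.add(n)
                frontier.append(n)
    return failed

# Hypothetical 5-substation line; every node starts near its capacity.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
load = {n: 1.0 for n in adjacency}
capacity = {n: 1.4 for n in adjacency}
print(sorted(cascade(capacity, load, adjacency, failed={2})))  # [0, 1, 2, 3, 4]
```

With generous capacities the failure stays local; with tight capacities a single failure takes down the whole line, which is the essence of a cascade.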
V. CONTENTS TO FOLLOW
We have collected a series of papers that we have published in the last few years and
added them as the remaining contents of this document. They are
• U.N. Raghavan and S. Kumara, “Decentralized topology control algorithms for con-
nectivity of distributed wireless sensor networks”, International Journal of Sensor
Networks.
• U.N. Raghavan, R. Albert and S. Kumara, “Near linear time algorithm to detect
community structures in large scale networks”, Physical Review E, Vol. 76 (036106)
(2007).
In this paper we study the presence of clusters/communities in various real-world complex
networks such as the movie actor collaboration network, protein-protein interaction
maps, scientific co-authorship networks and the WWW.
• H.P. Thadakamalla, S. Kumara and R. Albert, “Complexity and large scale networks”,
Chapter 11 in Operations Research and Management Science Handbook edited by A.
R. Ravindran, CRC press (2007).
The engineering community, and in particular the Industrial Engineering community,
focuses on Operations Research (OR). We have thoroughly investigated the relationship
between OR and complex networks in this book chapter.
[27] M. Faloutsos, P. Faloutsos, and C. Faloutsos, in SIGCOMM ’99: Proceedings of the conference
on Applications, technologies, architectures, and protocols for computer communication (ACM,
1999), pp. 251–262.
[28] R. Govindan and H. Tangmunarunkit, in IEEE INFOCOM 2000 (Tel Aviv, Israel, 2000), pp.
1371–1380.
[29] G. Kan, Peer-to-Peer Harnessing the Power of Disruptive Technologies (O’Reilly, Beijing,
2001), chap. Gnutella.
[30] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, Physical Review E 64,
046135 (2001).
[31] H. P. Thadakamalla, R. Albert, and S. R. T. Kumara, Physical Review E 72, 066128 (2005).
[32] S. Lawrence and C. L. Giles, Nature 400, 107 (1999).
[33] A. Gulli and A. Signorini, in WWW ’05: Special interest tracks and posters of the 14th
international conference on World Wide Web (ACM Press, New York, USA, 2005), pp. 902–
903.
[34] L. Page, S. Brin, R. Motwani, and T. Winograd, Tech. Rep., Stanford Digital Library Tech-
nologies Project (1998), URL citeseer.ist.psu.edu/page98pagerank.html.
[35] J. M. Kleinberg, Journal of the ACM 46, 604 (1999).
[36] H. Jeong, S. Mason, A.-L. Barabási, and Z. Oltvai, Nature 411, 41 (2001).
[37] I. Albert and R. Albert, Bioinformatics 20 (2004).
[38] R. Albert, The Plant Cell 19, 3327 (2007).
[39] H. P. Thadakamalla, U. N. Raghavan, S. Kumara, and R. Albert, IEEE Intelligent Systems
19, 24 (2004).
[40] S. Kumara, Tech. Rep., The Pennsylvania State University (2005).
[41] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, Computer Networks 38, 393
(2002).
[42] D. Culler (2001).
[43] D. M. Blough, M. Leoncini, G. Resta, and P. Santi, IEEE Transactions on Mobile Computing
(to appear) (2006).
[44] J. M. Ottino, Nature 427 (2004).
[45] V. Krebs, First Monday 7 (2002).
[46] D. Price, Science 149, 510 (1965).
[47] W. Aiello, F. Chung, and L. Lu, Proceedings of the thirty-second annual ACM symposium on
Theory of computing pp. 171–180 (2000).
[48] W. Aiello, F. Chung, and L. Lu, Experimental Mathematics 10, 53 (2001).
[49] A. Nanavati, S. Gurumurthy, G. Das, D. Chakraborty, K. Dasgupta, S. Mukherjea, and
A. Joshi, in CIKM ’06: Proceedings of the 15th ACM international conference on Information
and knowledge management (ACM, New York, NY, USA, 2006), pp. 435–444.
[50] M. Porter, P. Mucha, M. Newman, and A. Friend, Physica A 386, 414 (2007).
[51] T. Hogg and L. Adamic, in EC ’04: Proceedings of the 5th ACM conference on Electronic
commerce (ACM, New York, NY, USA, 2004), pp. 236–237.
[52] K. Chitrapura and S. Kashyap, in CIKM ’04: Proceedings of the thirteenth ACM international
conference on Information and knowledge management (ACM, New York, NY, USA, 2004),
pp. 597–606.
[53] R. Pastor-Satorras and A. Vespignani, Physical Review E 63, 066117 (2001).
[54] R. Pastor-Satorras and A. Vespignani, Physical Review Letters 86, 3200 (2001).
[55] R. Pastor-Satorras and A. Vespignani, Physical Review E 65, 035108 (2002).
[56] R. Pastor-Satorras and A. Vespignani, Physical Review E 65, 036104 (2002).
[57] R. Pastor-Satorras and A. Vespignani, Handbook of Graphs and Networks (Wiley-VCH, Berlin,
2003), chap. Epidemics and immunization in scale-free networks.
[58] C. Christensen, I. Albert, B. Grenfell, and R. Albert (2008), working paper.
[59] F. Wu and B. Huberman, Computational Economics 0407002, EconWPA (2004), available at
http://ideas.repec.org/p/wpa/wuwpco/0407002.html.
[60] M. E. J. Newman and M. Girvan, Physical Review E 69, 026113 (2004).
[61] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Nature 435, 814 (2005).
[62] J. Duch and A. Arenas, Physical Review E 72, 027104 (2005).
[63] U. Raghavan, R. Albert, and S. Kumara, Physical Review E 76, 036106 (2007).
[64] G. Flake and D. Pencock, The Colours of Infinity: Self-organization, Self-regulation, and
Self-similarity on the Fractal Web (2004).
[65] A. Vazquez, R. Pastor-Satorras, and A. Vespignani, Internet topology at the router and
autonomous system level (2002), URL http://www.citebase.org/abstract?id=oai:arXiv.org:
cond-mat/0206084.
[66] J. Doyle, D. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W. Willinger,
21
1. Introduction
of its structure and function. This sheer complexity of supply-chain networks, with
inevitable lack of predictability, makes them difficult to manage and control. Furthermore, changing organizational and market trends mean that supply chains should be highly dynamic, scalable, reconfigurable, agile and adaptive: the network should sense and respond effectively and efficiently to satisfy customer demand.
Supply-chain management requires the decisions made by business entities to take increasingly global factors into account. The successful integration of the entire supply-chain process now depends heavily on the availability of accurate and timely information that can be shared by all members of the supply chain.
Information technology, with its capability of setting up dynamic informa-
tion exchange networks, has been a key enabling factor in shaping supply chains
to meet such requirements. However, a major obstacle remains in the deployment of
coordination and decision technologies to achieve complex, adaptive, and flexible
collective behaviour in the network. This is due to the lack of our understanding
of organizational, functional and evolutionary aspects in supply chains. A key
realization to tackle this problem is that supply-chain networks should be treated
not just as a ‘system’ but as a ‘Complex Adaptive System’ (CAS). The study of
CAS augments the systems theory and provides a rich set of tools and techniques
to model and analyse the complexity arising in systems encompassing science and
technology. In this paper, we take this perspective in dealing with supply chains and
show how various advances in the realm of CAS provide novel and effective ways
to characterize, understand and manage their emergent dynamics.
A similar viewpoint has been emphasized by Choi et al. (2001), who aimed to
demonstrate how supply networks should be managed if we recognize them as CAS.
The concept of CAS allows one to understand how supply networks as living systems
co-evolve with the rugged and dynamic environment in which they exist and identify
patterns that arise in such an evolution. The authors conjecture various propositions
stating how the patterns of behaviour of individual agents in a supply network relate
to the emergent dynamics of the network. One of the important deductions made is
that when managing supply networks, managers must appropriately balance how
much to control, and how much to let emerge. However, no concrete framework has
been suggested under which such conjectures can be verified and generalized. It is
the goal of this paper to show how the theoretical advances made in the realm of
CAS can be used to study such issues systematically and formally in the context of
supply-chain networks.
We posit that supply chains are complex adaptive systems. However, we do not
provide conclusive proofs for such a claim. We survey the emerging literature, faith-
fully report on the state of the art in CAS and try to establish connections, as much
as possible, between CAS tools and supply-chain analysis. Through our effort, we
would like to pave research directions in supply chains from a CAS point of view.
This paper is divided into eight sections. In section 2, we give a brief introduction
to complex adaptive systems in which we discuss the architecture and characteristics
of complex systems in diverse areas encompassing biology, social systems, ecology
and technology. In section 3, we discuss characteristics of supply-chain networks
and argue that they should be understood in terms of a CAS. We also present
some emerging trends in supply chains and the increasingly critical role of information
technology in supply-chain management in the light of these trends. In section 4,
we give a brief overview of the main techniques used for modelling and analysis
Supply-chain networks: a complex adaptive systems perspective 4237
Downloaded By: [Pennsylvania State University] At: 23:20 21 April 2008
of supply chains and then discuss how the science of complexity provides a genuine
extension and reformulation of these approaches. As with any CAS, the study of
supply chains should involve a proper balance of simulation and theory. System
dynamics-based and recently agent-based simulation models (inspired by complexity
theory) are used extensively to make theoretical investigations of supply chains
feasible and to support decision-making in real-world supply chains. A system
dynamics approach often leads to models of supply chains described in the form
of a dynamical system. Dynamical systems theory provides a powerful framework
for rigorous analysis of such models and thus can be used to supplement the system
dynamics simulation approach. We illustrate this in section 5, using some nonlinear
models, which consider the effect of priority, heterogeneity, feedback, delays and
resource sharing on the performance of supply chains. Furthermore, the large
volumes of data generated from simulations can be used to understand and com-
prehend the emergent dynamics of supply chains. Even though an exact understand-
ing of the dynamics is difficult in complex systems, archetypal behaviour patterns
can often be recognized, using techniques from complexity theory like Nonlinear
Time Series Analysis and Computational Mechanics, which are discussed in section 6.
The benefits of integrated supply chain concepts are widely recognized, but the
analytical tools that can exploit those benefits are scarce. In order to study supply
chains as a whole, it is critical to understand the interplay of organizational structure
and functioning of supply chains. Network dynamics, an extension of nonlinear
dynamics to networks, provides a systematic framework to deal with such issues
and is discussed in section 7. We conclude in section 8, with the recommendations
for future research.
Many natural systems, and increasingly many artificial (man-made) systems as well,
are characterized by apparently complex behaviours that arise as the result of non-
linear spatio-temporal interactions among a large number of components or sub-
systems. We use the terms agent and node interchangeably to refer to the components or subsystems. Examples of such natural systems include immune systems, nervous
systems, multi-cellular organisms, ecologies, insect societies and social organizations.
However, such systems are not confined to biology and society. Engineering theories
of controls, communications and computing have matured in recent decades, facil-
itating the creation of large-scale systems which have turned out to possess bewilder-
ing complexity, almost equivalent to that of biological systems. Systems sharing this
property include parallel and distributed computing systems, communication net-
works, artificial neural networks, evolutionary algorithms, large-scale software sys-
tems, and economies. Such systems have been commonly referred to as Complex
Systems (Baranger 2005, Bar-Yam 1997, Adami 1998, Flake 1998). However, at the
present time, the notion of a complex system is not precisely delineated.
The most remarkable phenomenon exhibited by complex systems is the emergence of highly structured collective behaviour over time from the interaction
of simple subsystems without any centralized control. Their typical character-
istics include: dynamics involving interrelated spatial and temporal effects, cor-
relations over long length and timescales, strongly coupled degrees of freedom,
4238 A. Surana et al.
has been introduced recently to focus on the ‘robust, yet fragile’ nature of complex-
ity. It is also becoming increasingly clear that robustness and complexity in biology,
ecology, technology, and social systems are so intertwined that they must be treated
in a unified way. Given the diversity of systems falling into this broad class, the
discovery of any commonalities or ‘universal’ laws underlying such systems requires
a very general theoretical framework.
The scientific study of CAS has been attempting to find common characteristics
and/or formal distinctions among complex systems that might lead to a better under-
standing of how complexity develops, whether it follows any general scientific laws
of nature, and how it might be related to simplicity. The attractiveness of the
methods developed in this research effort for general-purpose modelling, design
and analysis lies in their ability to produce complex emergent phenomena out of
a small set of relatively simple rules, constraints and relationships couched in
either quantitative or qualitative terms. We believe that the tools and techniques
developed in the study of CAS offer a rich potential for design, modelling and
analysis of large-scale systems in general and supply chains in particular.
Demand forecasting is used to estimate the demand at each stage, and the inventory held between stages protects the network against fluctuations in supply and demand. Owing to the decentralized control properties of the SCN, controlling the ripple effect requires coordination between entities in performing their tasks. As the number of participants in the supply chain increases, the problem of coordination takes on another dimension.
Two important organizational and market trends now under way are the atomization of markets and of organizational entities (Balakrishnan et al. 1999). In such a scenario, the product realization process has continuous
customer involvement in all phases—from design to delivery. Customization is not
only limited to selecting from pre-determined model variants; rather, product design,
process plans, and even the supply chain configuration have to be tailored for
each customer. The product-realization organization has to form on the fly, as a
consortium of widely dispersed organizations to cater to the needs of a single customer. Thus, organizations consist of a series of opportunistic alliances among several
focused organizational entities to address particular market opportunities. For
manufacturing organizations to operate effectively in this environment of dynamic,
virtual alliances, products must have modular architectures, processes must be well
characterized and standardized, documentation must be widely accessible, and systems must be interoperable. Automation and intelligent information processing are vital for diagnosing problems during product realization and usage, coordinating design and production schedules, and searching for relevant information in multimedia databases. These trends exacerbate the challenges of coordination and collaboration as the number of product-realization networks increases, and so does the number of partners in each network.
Building a larger inventory can be used as a general means of dealing with rapidly changing market demand and short-life-cycle products. However, augmenting inventory building with information may be a useful approach. Information about
the material lead time from different suppliers can be used for planning the material
arrival, instead of simply building up an inventory. The demand information can
be transmitted to the manufacturers on a timely basis, so that orders can be fulfilled at lower inventory cost. In fact, it is widely realized that the successful
integration of the entire supply-chain process depends heavily on the availability of
accurate and timely information that can be shared by all members of the supply
chain. Supply-chain management now increasingly relies on information technology,
as discussed below.
As pointed out, the key challenge in designing supply-chain networks or, for that matter, any large-scale system is the difficulty of reverse engineering, i.e. determining what individual agent strategies lead to the desired collective behaviour.
Because of this difficulty in understanding the effect of individual characteristics
on the collective behaviour of the system, simulation has been the primary tool for
designing and optimizing such systems. Simulation makes investigations possible
and useful when, in the real-world situation, experimentation would be too costly
or, for ethical reasons, not feasible, or where the decisions and their consequences
are well separated in space and time. It seems at present that large-scale simulations
of future complex processes may be the most logical and important vehicle for studying them objectively (Ghosh 2002).
Simulation in general helps one, first, to detect design errors prior to developing a prototype, in a cost-effective manner. Second, simulation of system operations may
identify potential problems that might occur during actual operation. Third,
extensive simulation may potentially detect problems that are rare and otherwise
elusive. Fourth, hypothetical concepts that do not exist in nature, even those that
defy natural laws, may be studied. The increased speed and precision of today’s
computers promise the development of high-fidelity models of physical and natural
processes, models that yield reasonably accurate results, quickly. This in turn would
permit system architects to study the performance impact of a wide variation of key
parameters quickly and, in some cases, even in real time. Thus, a quali-
tative improvement in system design may be achieved. In many cases, unexpected
variations in external stress can be simulated quickly to yield appropriate system parameter values, which are then adopted into the system to enable it to successfully counteract the external stress.
Mathematical analysis, on the other hand, has to play a critical role because it alone can enable us to formulate rigorous generalizations or principles. Neither
physical experiments nor computer-based experiments on their own can support
such generalizations. Physical experiments usually are limited to supplying inputs
and constraints for rigorous models, because experiments themselves are rarely
described in a language that permits deductive exploration. Computer-based
experiments or simulations have rigorous descriptions, but they deal only in specifics.
A well-designed mathematical model, on the other hand, generalizes the particulars
revealed by the physical experiments, computer-based models and any
interdisciplinary comparisons. Using mathematical analysis, we can study the
dynamics, predict long-term behaviour, and gain insights into system design: e.g.
what parameters determine group behaviour, how individual agent characteristics
affect the system, and whether the proposed agent strategy leads to the desired group
behaviour. In addition, mathematical analysis may be used to select parameters that
optimize a system’s collective behaviour, prevent instabilities, etc.
It seems that successful modelling efforts of large-scale systems like supply-chain
networks, large-scale software systems, communication networks, biological
ecosystems, food webs, social organizations, etc. would require a solid empirical
base. Pure abstract mathematical contemplation would be unlikely to lead to
useful models. The discipline of physics provides an appropriate parallel; advances
in theoretical physics are more often than not inspired by experimental findings. The
study of supply-chain networks should therefore involve an amalgam of both
simulation and analytical techniques.
Considering the broad spectrum of a supply chain, no single model can capture all aspects of supply-chain processes. The modelling proceeds at three levels:
1. competitive strategic analysis, which includes location-allocation decisions,
demand planning, distribution channel planning, strategic alliances,
new product development, outsourcing, IT selection, pricing and network
structuring;
2. tactical problems like inventory control, production/distribution coordina-
tion, material handling and layout design;
Queuing theory has primarily been used to address the steady-state operation of a
typical network. On the other hand, techniques from mathematical programming
have been used to solve the problem of resource allocation in networks. This is
meaningful when dynamic transients can be disregarded. However, present-day
supply-chain networks are highly dynamic, reconfigurable, intrinsically nonlinear
and non-stationary. New tools and techniques are required for their analysis such
that the structure, function and growth of networks can be considered simulta-
neously. In this regard, we discuss ‘network dynamics’ in section 7, which deals
with such issues and can be used to study the structure of a supply chain and its implications for functionality. Understanding the behaviour of large complex
networks is the next logical step for the field of nonlinear dynamics, because they
are so pervasive in the real world. We begin with a brief introduction to dynamical
systems theory, in particular nonlinear dynamics, in the next section.
5.1.1 Pre-emptive queuing model with delays. Priority and heterogeneity are funda-
mental to any logistic planning and scheduling. Tasks have to be prioritized in order
to do the most important things first. This comes naturally as we try to optimize an
objective and assign the tasks their ‘importance’. Priorities may also arise from the non-homogeneity of the system, where the ‘knowledge’ level of one agent differs from that of another. In addition, in all logistics systems, resources are limited in both time and space. Temporal dependence (interdependency) plays an important role in logistic planning. Sometimes, such dependencies also arise from physical facts, when different stages of processing have certain temporal constraints.
The considerations regarding the generality of assumptions and the clear one-
to-one correspondence between the physical logistics tasks and the model param-
eters described in (Erramilli and Forys 1991) made us apply their queuing model in
the context of supply chains (Kumara et al. 2003). The queuing system considered
here has two queues (A and B) and a single server with the following characteristics:
- once served, the class A customer returns as a class B customer after a constant interval of time;
- class B has non-pre-emptive priority over class A, i.e. the class A queue is not served until the class B queue is emptied;
- the schedules are organized every T units of time, i.e. if the low-priority queue is emptied within time T, the server remains idle for the remainder of the interval;
- finally, the higher-priority class B has a lower service rate than the low-priority class A.
Suppose the system is sampled at the end of every schedule cycle, and the following quantities are observed at the beginning of the kth interval: A_k: queue length of the low-priority queue; B_k: queue length of the high-priority queue; C_k: outflow from the low-priority queue in the kth interval; D_k: outflow from the high-priority queue in the kth interval; λ_k: inflow to the low-priority queue from the outside in the kth interval. The system is characterized by the following parameters: a: rate per unit of the schedule cycle at which the low-priority queue can be served; b: rate per unit of the schedule cycle at which the high-priority queue can be served; l: the feedback interval in units of the schedule cycle.
The following four equations then completely describe the evolution of the system:

A_{k+1} = A_k + λ_k − C_k    (2)

C_k = min(A_k + λ_k, a(1 − D_k/b))    (3)
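The recursion can be sketched numerically. Equations (2)–(3) follow the text; the class-B update below is a plausible completion inferred only from the verbal description (served class-A customers return as class B after l cycles, and class B is served first at rate b), so it is an assumption, not the authors' exact remaining equations. Parameter values are illustrative.

```python
# Sketch of the two-queue schedule-cycle recursion of section 5.1.1.
# Eqs. (2)-(3) are from the text; the class-B balance is an assumed completion.

def simulate(a=0.8, b=0.6, l=2, lam=0.3, steps=50):
    A = B = 0.0                # low- and high-priority queue lengths
    in_transit = [0.0] * l     # class-A outflow returning as class B after l cycles
    history = []
    for _ in range(steps):
        D = min(B, b)                          # high-priority outflow (assumed form)
        C = min(A + lam, a * (1.0 - D / b))    # eq. (3): A gets the leftover capacity
        A = A + lam - C                        # eq. (2)
        B = B + in_transit.pop(0) - D          # assumed class-B balance with delay l
        in_transit.append(C)
        history.append((A, B))
    return history

hist = simulate()
```

The coupling through D_k makes the mechanism visible: the busier the high-priority queue, the less capacity a(1 − D_k/b) remains for the low-priority one.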
lines (X and Y) representing some activity (Feichtinger et al. 1994). The input
rates of both queues are constant, and their sum equals the server capacity. In
each time period, the server has to decide how much time to spend on each of the
two activities.
The following quantities can be defined: α: constant input rate for activity X; β: constant input rate for activity Y; τ_X: time spent on activity X; τ_Y: time spent on activity Y; x_k: queue length of X; y_k: queue length of Y. The amounts of time τ_X and τ_Y that will be spent on activities X and Y in period k + 1 are determined by an adaptive feedback rule θ depending on the difference of the queue lengths x_k and y_k. The decision rule or policy function
states that longer queues are served with a higher priority. Two possibilities
considered are:
1. All-or-nothing decision: the server decides to spend all its time on the activity corresponding to the longer queue. Hence, θ is a Heaviside function given by

θ(x − y) = 1 if x ≥ y,
         = 0 if x < y.    (6)
2. Mixed solutions: the server decides to spend most of its time on the activity corresponding to the longer queue. For this decision function, an S-shaped logistic function is used, given by

θ(x − y) = 1 / (1 + e^{−k(x−y)}).    (7)
x_{k+1} = x_k + α − θ(x_k − y_k)
y_{k+1} = y_k + β − [1 − θ(x_k − y_k)]    (8)
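A short iteration of the policy map under both decision rules of equations (6)–(7). The symbol names (θ for the policy function, α and β for the input rates, whose sum is the normalized server capacity), the clipping of queue lengths at zero, and all parameter values are assumptions made for illustration.

```python
import math

def heaviside(d):                  # eq. (6): all-or-nothing rule
    return 1.0 if d >= 0 else 0.0

def logistic(d, k=2.0):            # eq. (7): smooth mixed rule (k illustrative)
    return 1.0 / (1.0 + math.exp(-k * d))

def iterate(theta, alpha=0.5, beta=0.5, x0=0.3, y0=0.0, steps=200):
    # alpha + beta = 1, the normalized server capacity
    x, y = x0, y0
    for _ in range(steps):
        t = theta(x - y)                 # fraction of the period spent on X
        x = max(0.0, x + alpha - t)      # queue lengths clipped at zero (assumption)
        y = max(0.0, y + beta - (1.0 - t))
    return x, y

x_hard, y_hard = iterate(heaviside)  # hard rule: server flips between the queues
x_soft, y_soft = iterate(logistic)   # mixed rule: queues settle to equal lengths
```

With these parameters the all-or-nothing policy ends in a period-2 oscillation (one queue always empty), while the mixed policy balances the two queues, illustrating how the choice of θ changes the qualitative dynamics.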
the nature of agents can lead to more stability in the system compared with a homogeneous case, but the system loses its ability to cope with unexpected changes
in the system such as new task requirements. On the other hand, a poor performance
can be traced to the fact that the non-predictive agents do not take into account
the information delay.
If the agents are able to make accurate predictions of their current state, the
information delay could be overcome, and the system would perform well. This
results in a ‘co-evolutionary’ system in which all of the individuals are simulta-
neously trying to adapt to one another. In such a situation, agents can act like
Technical Analysts and System Analysts (Kephart et al. 1990). Agents as technical
analysts (like those in market behaviour) use either linear extrapolation or cyclic
trend analysis to estimate the current state of the system. On the other hand, agents
as system analysts have knowledge about both the individual characteristics of the
other agents in the system and how those characteristics are related to the overall
system dynamics. Technical analysts are responsive to the behaviour of the system
but suffer from an inability to take into account the strategies of other agents.
Moreover, a good predictive strategy for a single agent may be disastrous if applied
on a global scale. System analysts perform extremely well when they have very
accurate information about other agents in the system but can perform very
poorly when their information is even slightly inaccurate. They take into account
the strategies of other agents but pay no heed to the actual behaviour of the system.
This suggests combining the strengths of both methods to form a hybrid-adaptive
system analyst, which modifies its assumptions about other agents in response to
feedback about success of its own predictions. The resultant hybrid is able to per-
form well.
In order to avoid chaos while maintaining a high performance and adaptability
to unforeseen changes, more sophisticated techniques are required. One such way is
by a reward mechanism (Hogg and Huberman 1991), whereby the relative number of
computational agents following effective strategies is increased at the expense of the
others. This procedure, which generates the right population diversity out of an essentially homogeneous one, is able to control chaos by a series of bifurcations into a stable fixed point.
In the above description, each agent chooses among different resources accord-
ing to its perceived payoff, which depends on the number of agents already using it.
Even the agent with predictive ability is myopic, as it considers only its current
estimate of the system state, without regard to the future. Expectations come into
play if agents use past and present global behaviour in estimating the expected future
payoff for each resource. A dynamical model of collective action that includes
expectations can be found in Glance (1993).
One of the central problems in a supply chain, closely related to modelling, is that
of demand forecasting: given the past, how can we predict the future demand?
The classic approach to forecasting is to build an explanatory model from first
principles and measure the initial conditions. Unfortunately, in systems like supply chains this has not been possible, for two reasons. First, we still lack the general
‘first principles’ for demand variation in supply chains, which are necessary to make
good models. Second, due to the distributed nature of the supply chains, the initial
data or the conditions are often difficult to obtain.
Because of these factors, the modern theory of forecasting that has been used in
supply chains views a time series x(t) as a realization of a random process. This is
appropriate when effective randomness arises from complicated motion involving
many independent, irreducible degrees of freedom. An alternative cause of random-
ness is chaos, which can occur even in very simple deterministic systems, as we
discussed in the earlier sections. While chaos places a fundamental limit on long-
term prediction, it suggests possibilities for short-term prediction. Random-looking
data may contain only a few irreducible degrees of freedom. Time traces of the state variable of such chaotic systems display behaviour that is intermediate between regular periodic or quasiperiodic motion and unpredictable, truly stochastic behaviour. Such behaviour has long been seen as a form of ‘noise’ because the tools for its analysis
were couched in language tuned to a linear process. The main such tool is Fourier
analysis, which is precisely designed to extract the composition of sines and cosines
found in an observation x(t). Similarly, the standard linear modelling and pre-
diction techniques, such as autoregressive moving average (ARMA) models, are
not suitable for nonlinear systems.
With the advances in IT and the science of complexity, both of these challenges for forecasting can be revisited. Large-scale simulation and micro-autonomy (section 2)
enable tracking of the detailed interaction between different entities in a supply
chain. The large volumes of data thus generated can be used to understand
demand patterns in particular and comprehend the emergence of other character-
istics in general. Even though an exact prediction of future behaviour is difficult,
often archetypal behaviour patterns can be recognized using these data. Techniques
from the complexity theory like Nonlinear Time Series Analysis and Computational
Mechanics are appropriate for this purpose.
- Phase-space reconstruction (finding the space): using the method of delays, one can construct a series of vectors that is diffeomorphically equivalent to the attractor of the original dynamical system and, at the same time, distinguish it from a stochastic process. The basis for this is Takens’ embedding theorem (Takens 1981). Time-lagged variables are used to construct vectors for a phase space of dimension d_E:

y(n) = [x(n), x(n + T), ..., x(n + (d_E − 1)T)].    (11)

The time lag T can be determined using mutual information (Fraser and Swinney 1983) and d_E using a false nearest-neighbours test (Kennel et al. 1992).
- Classification of the signal: system identification in nonlinear chaotic systems means establishing a set of invariants for each system of interest and then comparing observations with that library of invariants. The invariants are properties of the attractor and are independent of any particular trajectory on the attractor. Invariants can be divided into two classes: fractal dimensions
(Farmer et al. 1983) and Lyapunov exponents (Sano and Sawada 1985).
Fractal dimensions characterize the geometrical complexity of dynamics,
i.e. how the sample of points along a system orbit are distributed spatially.
Lyapunov exponents, on the other hand, describe the dynamical complexity,
i.e. ‘stretching and folding’ in the dynamical process.
- Making models and prediction: this step involves determination of the parameters a_j of the assumed model of the dynamics

y(n) → y(n + 1),
y(n + 1) = F(y(n), a_1, a_2, ..., a_p),    (12)
which is consistent with invariant classifiers (Lyapunov exponents, dimensions).
The functional forms F(·) often used include polynomials, radial basis functions,
etc. The Local False Nearest Neighbor (Abarbanel and Kennel 1993) test is used to
determine how many dimensions are locally required to describe the dynamics gen-
erating the time series, without knowing the equations of motion, and hence gives
the dimension for the assumed model. The methods for building nonlinear models
are classified as Global and Local (Farmer and Sidorowich 1987, Casdagli 1989).
By definition, Local methods vary from point to point in the phase space, while
Global Models are constructed once and for all in the whole phase space. Models
based on Machine Learning techniques such as radial basis functions or Neural
Networks (Powell 1987) and Support Vector Machines (Mukherjee et al. 1997)
carry features of both. They are usually used as global functional forms, but they
clearly demonstrate localized behaviour, too.
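The reconstruction step above is easy to sketch directly from equation (11). The scalar signal below is the fully chaotic logistic map, used purely as a stand-in; in practice T and d_E would come from the mutual-information and false-nearest-neighbours tests cited above.

```python
def delay_embed(x, d_E, T):
    # y(n) = [x(n), x(n+T), ..., x(n+(d_E-1)T)], as in eq. (11)
    m = len(x) - (d_E - 1) * T
    return [[x[n + j * T] for j in range(d_E)] for n in range(m)]

# Illustrative chaotic scalar signal: the logistic map x -> 4x(1-x)
x, series = 0.4, []
for _ in range(1000):
    x = 4.0 * x * (1.0 - x)
    series.append(x)

Y = delay_embed(series, d_E=3, T=2)   # 3-dimensional delay vectors
```

Each vector in Y is a point in the reconstructed phase space; nearest-neighbour statistics on these points are what the classification and modelling steps operate on.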
The techniques from nonlinear time series analysis are well suited for
modelling the nonlinearities in the supply chains. For an application of nonlinear
time series analysis in supply chains, the reader is referred to Lee et al. (2002).
Using such analysis, one can deduce whether the time series is deterministic, in which case it should be possible in principle to build predictive models. The invariants can be used to
effectively characterize the complex behaviour. For example, the largest
Lyapunov exponent gives an indication of how far into the future reliable predic-
tions can be made, while the fractal dimensions give an indication of how complex a
model should be chosen to represent the data. These models then provide the
basis for systematically developing the control strategies. It should be noted
that the functional forms used for modelling in the fourth step above are continuous
in their argument. This approach builds models viewing a dynamical system as
obeying laws of physics. From another perspective, a dynamical system can be
considered as processing information. So, an alternative class of discrete ‘computational’ models inspired by the theory of automata and formal languages can also
be used for modelling the dynamics (Lind and Marcus 1996). ‘Computational
Mechanics’ considers this viewpoint and describes the system behaviour in
terms of its intrinsic computational architecture, i.e. how it stores and processes
information.
models. The inductive jump to a higher computational level occurs by taking those
regularities as the new representation.
ε-machines reflect a balanced utilization of deterministic and random information processing, and this is discovered automatically during ε-machine reconstruction. These machines are unique and optimal in the sense that they have maximal
predictive power and minimum model size (hence satisfying Occam's Razor, i.e.
causes should not be multiplied beyond necessity). ε-machines provide a minimal
description of the pattern or regularities in a system in the sense that the pattern
is the algebraic structure determined by the causal states and their transitions.
ε-machines are also minimally stochastic. Hence, computational mechanics acts as
a method for automatic pattern discovery.
An ε-machine is the organization of the process, or at least of the part of it
which is relevant to our measurements. The ε-machine that models the observed
time series from a system can be used to define and calculate macroscopic or global
properties that reflect the characteristic average information-processing capabilities
of the system. Some of these include the entropy rate, the excess entropy and the
statistical complexity (Feldman and Crutchfield 1998, Crutchfield and Feldman 2001).
The entropy rate indicates how predictable the system is. The excess entropy, on
the other hand, provides a measure of the apparent memory stored in a spatial
configuration and represents how hard the system is to predict. ε-machine reconstruction leads to a
natural measure of the statistical complexity of a process, namely the amount of
information needed to specify the state of the ε-machine, i.e. its Shannon entropy.
Statistical complexity is distinct from, and dual to, information-theoretic entropies and
dimension (Crutchfield and Young 1989). The existence of chaos shows that there is
a rich variety of unpredictability spanning two extremes: periodic and random
behaviour. Behaviour between these two extremes, while of intermediate information
content, is more complex in that its most concise description (model) is an
amalgam of regular and stochastic processes. An information-theoretic description
of this spectrum in terms of dynamical entropies measures the raw diversity of temporal
patterns. The dynamical entropies, however, do not directly measure the computational effort required to model the complex behaviour, which is what
statistical complexity captures.
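The entropy rate mentioned above can be estimated directly from a symbol sequence via block entropies, without building the full ε-machine. The sketch below is illustrative (the block length L and the test sequences are assumptions, not from the paper): it uses the finite-L estimate h ≈ H(L) − H(L − 1), which is near 0 bits per symbol for a periodic sequence and near 1 for a fair coin.

```python
import math
import random
from collections import Counter

def block_entropy(seq, L):
    """Shannon entropy (bits) of the empirical distribution of length-L blocks."""
    blocks = [tuple(seq[i:i + L]) for i in range(len(seq) - L + 1)]
    total = len(blocks)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(blocks).values())

def entropy_rate(seq, L=8):
    """Finite-L estimate of the entropy rate: h ~ H(L) - H(L - 1)."""
    return block_entropy(seq, L) - block_entropy(seq, L - 1)

random.seed(0)
periodic = [0, 1] * 5000                              # perfectly predictable
coin = [random.randint(0, 1) for _ in range(10000)]   # maximally unpredictable
print(entropy_rate(periodic))   # close to 0 bits per symbol
print(entropy_rate(coin))       # close to 1 bit per symbol
```

The excess entropy and statistical complexity require, respectively, the convergence behaviour of H(L) and the causal-state partition itself, so they are harder to estimate; this sketch only covers the entropy-rate part.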
Computational mechanics sets limits on how well processes can be predicted
and shows how, at least in principle, those limits can be attained. ε-machines are
what any prediction method would build, if only it could. Similar to ε-machine
reconstruction, techniques exist for discovering the causal architecture of
memoryless transducers, transducers with memory and spatially extended systems
(Shalizi and Crutchfield 2001). Computational mechanics can be used for modelling
and prediction in supply chains in the following way:
• In systems like supply chains, it is difficult to define analogues of thermodynamic quantities such as energy, temperature and pressure as we do
for physical systems. Each component in the network has cognition, which is
absent in physical systems such as the molecules of a gas. Because of such
difficulties, statistical mechanics cannot be applied directly to build prediction models for supply chains. As discussed previously, because it does not require a
Hamiltonian (an energy-like function), computational mechanics is still
applicable in the case of supply chains.
7. Network dynamics
The ubiquity of networks in the social, biological and physical sciences and in
technology leads naturally to an important set of common problems, which
are currently being studied under the rubric of 'Network Dynamics' (Strogatz
2001). Structure always affects function, and it is important to consider dynamical
and structural complexity together in the study of networks. For instance, the
topology of social networks affects the spread of information and disease, and
the topology of the power grid affects the robustness and stability of power
transmission. The different problem areas in network dynamics are discussed
below.
One area of research in this field has been primarily concerned with the dynam-
ical complexity in regular networks without regard to other network topologies.
While the collective behaviour depends on the details of the network, some general-
izations can still be drawn (Strogatz 2001). For instance, if the dynamical system
at each node has stable fixed points and no other attractor, the network tends to
lock into a static fixed pattern. If the nodes have competing interactions, the network
may display an enormous number of locally stable equilibria. In the intermediate
case where each node has a stable limit cycle, synchronization and patterns like
travelling waves can be observed. For non-identical oscillators, the temporal ana-
logue of phase transition can be seen with the control parameter as the coupling
coefficient. At the opposite extreme, if each node has an identical chaotic attractor,
the nodes can synchronize their erratic fluctuations. For a wide range of network
topologies, synchronized chaos requires that the coupling be neither too weak nor
too strong; otherwise, spatial instabilities are triggered. Related lines of research that
address networks of identical chaotic maps are coupled map lattices (Kaneko and
Tsuda 2000) and cellular automata (Wolfram 1994). However, these systems have
4260 A. Surana et al.
been used mainly as testbeds for exploring spatio-temporal chaos and pattern
formation in the simplest mathematical settings, rather than as models of real
systems.
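The coupling-strength dependence of synchronized chaos described above can be demonstrated in a few lines. The sketch below is an illustrative assumption on my part: it uses mean-field (all-to-all) coupling of identical chaotic logistic maps rather than the ring-lattice coupled maps of Kaneko and Tsuda, because the synchronization threshold is simplest in the mean-field case; all parameter values are hypothetical.

```python
import numpy as np

def coupled_chaotic_maps(n=50, eps=0.7, steps=1000, seed=0):
    """n identical logistic maps f(x) = 4x(1-x) coupled through the mean field:
    x_i <- (1 - eps) * f(x_i) + eps * mean(f(x)).
    Returns the final spread max(x) - min(x); a spread near zero means the
    chaotic units have synchronized while each still follows the chaotic map."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.2, 0.8, n)
    for _ in range(steps):
        f = 4 * x * (1 - x)
        x = (1 - eps) * f + eps * f.mean()
    return float(x.max() - x.min())

print(coupled_chaotic_maps(eps=0.05))  # weak coupling: no synchronization
print(coupled_chaotic_maps(eps=0.7))   # strong coupling: synchronized chaos
```

On a ring lattice the picture is richer: long-wavelength transverse modes are only weakly damped, which is why synchronization there also fails when the system is too large or the coupling poorly tuned.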
The second area in network dynamics is concerned with characterizing the net-
work structure. The network structure or topologies in general can vary from com-
pletely regular, like chains, grids, lattices and fully connected, to completely random.
Moreover, the graphs can be directed or undirected and cyclic or acyclic. In order to
characterize topological properties of the graphs, various statistical quantities have
been defined. Most important of them include average path length, clustering coeffi-
cient, degree distributions, size of giant component and various spectral properties.
A review of the main models and analytical tools, covering regular graphs, random
graphs, generalized random graphs, small-world and scale-free networks, as well as
the interplay between topology and the network’s robustness against failures
and attacks can be found in Albert (2000b), Albert and Barabasi (2002), Albert
et al. (2002), Callaway et al. (2000) and Dorogovtsev and Mendes (2002).
Classic random graphs were introduced by Erdős and Rényi (Bollobas 1985)
and have been the most thoroughly studied models of networks. Such graphs have
a Poisson degree distribution and statistically uncorrelated vertices. At large N
(total number of nodes in the graph) and large enough p (the probability that two
arbitrary vertices are connected), a giant connected component appears in the net-
work, a process known as percolation. Random graphs exhibit a low average path
length and a low clustering coefficient. Regular networks, on the other hand, show a
high clustering coefficient and also a greater average path length compared with the
random graphs of similar size. The networks found in the real world, however, are
neither completely regular nor completely random. Instead, we see ‘small world’ and
‘scale free’ characteristics for many real networks like social networks, Internet,
WWW, power grids, collaboration networks, ecological and metabolic networks,
to name a few.
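The percolation transition in classic random graphs is easy to reproduce numerically. A minimal sketch, with illustrative sizes: sample G(n, p) below and above the critical mean degree <k> = 1 and measure the largest connected component.

```python
import random

def largest_component_size(n, p, seed=0):
    """Sample an Erdos-Renyi G(n, p) graph and return the size of its
    largest connected component (found by depth-first search)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    best = 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack, size = [s], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        best = max(best, size)
    return best

n = 2000
print(largest_component_size(n, 0.2 / n))  # <k> = 0.2: only tiny components
print(largest_component_size(n, 2.0 / n))  # <k> = 2.0: a giant component
```

Below the threshold the largest component stays of order log n; above it, a finite fraction of all nodes joins a single giant component.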
In order to describe the transition from a regular network to a random network,
Watts and Strogatz introduced the so-called small-world graphs as models of social
networks (Watts and Strogatz 1998, Newman 2000). This model exhibits a high
degree of clustering, as in the regular network, and a small average distance between
vertices, as in the classic random graphs. A common feature of this model with a
random graph model is that the connectivity distribution of the network peaks at an
average value and decays exponentially. Such an exponential network is homoge-
neous in nature: each node has roughly the same number of connections. Because of
the high degree of clustering, the models of dynamical systems with small-world
coupling display an enhanced signal-propagation speed, rapid disease propagation,
and synchronizability (Watts and Strogatz 1998, Newman 2002).
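The small-world effect itself is straightforward to reproduce. The sketch below implements a simplified Watts-Strogatz construction (a ring lattice with random rewiring; the sizes and the rewiring probability are illustrative assumptions) and measures the clustering coefficient and average path length.

```python
import random
from collections import deque

def watts_strogatz(n, k, p, seed=0):
    """Ring lattice (each node linked to k/2 neighbours on each side),
    then each edge is rewired to a random endpoint with probability p."""
    rng = random.Random(seed)
    edges = {(i, (i + d) % n) for i in range(n) for d in range(1, k // 2 + 1)}
    adj = [set() for _ in range(n)]
    for (u, v) in edges:
        if rng.random() < p:                 # rewire this edge's far endpoint
            w = rng.randrange(n)
            while w == u or w in adj[u]:
                w = rng.randrange(n)
            v = w
        adj[u].add(v)
        adj[v].add(u)
    return adj

def avg_path_length(adj):
    """Mean shortest-path length over reachable pairs (BFS from every node)."""
    total, pairs = 0, 0
    for s in range(len(adj)):
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """Average fraction of a node's neighbour pairs that are themselves linked."""
    cs = []
    for u in range(len(adj)):
        nb = list(adj[u])
        if len(nb) < 2:
            continue
        links = sum(1 for i in range(len(nb)) for j in range(i + 1, len(nb))
                    if nb[j] in adj[nb[i]])
        cs.append(2 * links / (len(nb) * (len(nb) - 1)))
    return sum(cs) / len(cs)

lattice = watts_strogatz(200, 4, 0.0)
small_world = watts_strogatz(200, 4, 0.1)
print(clustering(lattice), avg_path_length(lattice))          # high C, long paths
print(clustering(small_world), avg_path_length(small_world))  # clustered, short paths
```

Even a few long-range shortcuts collapse the path length while the local clustering survives, which is the defining small-world signature.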
Another significant recent discovery in the field of complex networks is that the
connectivity distributions of a number of large-scale and complex networks, including the WWW, Internet, and metabolic networks, satisfy the power law P(k) ∼ k^−γ,
where P(k) is the probability that a node in the network is connected to k other
nodes, and γ is a positive real number (Albert et al. 2000a, Barabasi et al. 2000,
Barabasi 2001). Since power laws are free of a characteristic scale, networks that
satisfy these laws are called 'scale-free'. A scale-free network is inhomogeneous in
nature: most nodes have a few connections, and a small but statistically significant
number of nodes have many connections. The average path length is smaller in the
The idea of managing the whole supply chain and transforming it into a highly
autonomous, dynamic, agile, adaptive and reconfigurable network certainly provides
an appealing vision for managers. The infrastructure provided by information technology has made this vision partially realizable. But the inherent complexity
of supply chains makes the efficient utilization of information technology an
elusive endeavour. Tackling this complexity has been beyond existing tools
and techniques, which therefore require revival and extension.
As a result, we emphasized in this paper that in order to effectively understand
a supply-chain network, it should be treated as a CAS. We laid down some initial
ideas for the extension of modelling and analysis of supply chains using the con-
cepts, tools and techniques arising in the study of CAS. As future work, we need
to verify the feasibility and usefulness of the proposed techniques in the context
of large-scale supply chains.
References
Abarbanel, H.D.I., The Analysis of Observed Chaotic Data, 1996 (Springer: New York).
Abarbanel, H.D.I. and Kennel, M.B., Local false nearest neighbors and dynamical dimensions
from observed chaotic data. Phys. Rev. E, 1993, 47, 3057–3068.
Adami, C., Introduction to Artificial Life, 1998 (Springer: New York).
Albert, R. and Barabási, A.L., Statistical mechanics of complex networks. Rev. Mod. Phys.,
2002, 74, 47–97.
Albert, R., Barabási, A.L., Jeong, H. and Bianconi, G., Power-law distribution of the World
Wide Web. Science, 2000, 287, 2115.
Albert, R., Jeong, H. and Barabási, A.L., Error and attack tolerance of complex networks.
Nature, 2000, 406, 378–382.
Balakrishnan, A., Kumara, S. and Sundaresan, S., Exploiting information technologies
for product realization. Inform. Syst. Front. J. Res. Innov., 1999, 1(1), 25–50.
Barabási, A.L., The physics of the Web. Phys. World, July 2001.
Barabási, A.L., Albert, R. and Jeong, H., Scale-free characteristics of random networks:
The topology of the World Wide Web. Physica A, 2000, 281, 69–77.
Baranger, M., Chaos, complexity, and entropy: a physics talk for non-physicists. Available
online at: http://necsi.org/projects/baranger/cce.pdf (accessed May 2005).
Bar-Yam, Y., Dynamics of Complex Systems, 1997 (Addison-Wesley: Reading, MA).
Bollobas, B., Random Graphs, 1985 (Academic Press: London).
Callaway, D.S., Newman, M.E.J., Strogatz, S.H. and Watts, D.J., Network robustness and
fragility: Percolation on random graphs. Phys. Rev. Lett., 2000, 85, 5468–5471.
Carlson, J.M., Doyle, J., Highly optimised tolerance: a mechanism for power laws in designed
systems. Phys. Rev. E, 1999, 60(2), 1412–1427.
Casdagli, M., Nonlinear prediction of chaotic time series. Physica D, 1989, 35, 335–356.
Choi, T.Y., Dooley, K.J., Ruangtusanathan, M., Supply networks and complex adaptive
systems: control versus emergence. J. Operat. Manage., 2001, 19(3), 351–366.
Cooper, M.C., Lambert, D.M. and Pagh, J.D., Supply chain management: more than a new
name for logistics. Int. J. Logist. Manage., 1997, 8(1), 1–13.
Crutchfield, J.P., Knowledge and meaning . . . chaos and complexity. In Modeling Complex
Systems, edited by L. Lam and H.C. Morris, 1992 (Springer: Berlin), pp. 66–101.
Crutchfield, J.P., The calculi of emergence: computation, dynamics and induction. Physica D,
1994, 75, 11–54.
Crutchfield, J.P. and Young, K., Inferring statistical complexity. Phys. Rev. Lett., 1989,
63, 105–108.
Crutchfield, J.P. and Feldman, D.P., Synchronizing to the environment: information theoretic
constraints on agent learning. Adv. Complex Syst., 2001, 4, 251–264.
Crutchfield, J.P. and Feldman, D.P., Regularities unseen, randomness observed: levels of
entropy convergence. Chaos, 2003, 13, 25–54.
Csete, M.E. and Doyle, J., Reverse engineering of biological complexity. Science, 2002,
295, 1664.
Dorogovtsev, S.N. and Mendes, J.F.F., Evolution of networks. Adv. Phys., 2002, 51,
1079–1187.
Erramilli, A. and Forys, L.J., Oscillations and chaos in a flow model of a switching system.
IEEE J. Select. Areas Commun., 1991, 9(2), 171–178.
Farmer, J.D., Ott, E. and Yorke, J.A., The dimension of chaotic attractors. Physica D, 1983,
7, 153–180.
Farmer, J.D. and Sidorowich, J.J., Predicting chaotic time-series. Phys. Rev. Lett., 1987,
59(8), 845–848.
Feichtinger, G., Hommes, C.H. and Herold, W., Chaos in a simple deterministic queuing
system. ZOR- Math. Meth. Oper. Res., 1994, 40, 109–119.
Feldman, D.P. and Crutchfield, J.P., Discovering non-critical organization: statistical mech-
anical, information theoretic and computational views of patterns in one-dimensional
spin systems. Santa Fe Institute Working Paper 98–04–026, 1998.
Flake, G.W., The Computational Beauty of Nature, 1998 (MIT Press: Cambridge, MA).
Forrester, J.W., Industrial Dynamics, 1961 (MIT Press: Cambridge, MA).
Fraser, A.M. and Swinney, H.L., Independent coordinates for strange attractors from mutual
information. Phys. Rev. A, 1986, 33(2), 1134–1140.
Ghosh, S., The role of modeling and asynchronous distributed simulation in analyzing com-
plex systems of the future. Inform. Syst. Front. J. Res. Innov., 2002, 4(2), 166–171.
Glance, N.S., Dynamics with expectations. PhD thesis, Physics Department, Stanford
University, 1993.
Hogg, T. and Huberman, B.A., The behavior of computational ecologies. In The Ecology
of Computation, edited by B.A. Huberman, pp. 77–116, 1988 (Elsevier Science:
Amsterdam).
Hogg, T. and Huberman, B.A., Controlling chaos in distributed systems. IEEE Trans. on
Systems, Man and Cybernetics, 1991, 21, 1325–1332.
Kaneko, K. and Tsuda, I., Complex Systems: Chaos and Beyond—A constructive approach
with applications in life sciences, 2000 (Springer: Berlin).
Kennel, M., Brown, R. and Abarbanel, H.D.I., Determining embedding dimension for
phase-space reconstruction using a geometrical construction. Phys. Rev. A, 1992,
45(6), 3403–3411.
Kephart, J.O., Hogg, T. and Huberman, B.A., Dynamics of computational ecosystems.
Phys. Rev. A, 1989, 40(1), 404–421.
Kephart, J.O., Hogg, T. and Huberman, B.A., Collective behavior of predictive agents.
Physica D, 1990, 42, 48–65.
Kumara, S., Ranjan, P., Surana, A. and Narayanan, V., Decision making in logistics:
A chaos theory based analysis. Ann. Int. Inst. Prod. Eng. Res. (Ann. CIRP), 2003, 1,
381–384.
Lee, S., Gautam, N., Kumara, S., Hong, Y., Gupta, H., Surana, A., Narayanan, V.,
Thadakamalla, H., Brinn, M. and Greaves, M., Situation identification using dynamic
parameters in complex agent-based planning systems. Intell. Eng. Syst. Artif. Neural
Networks, 2002, 12, 555–560.
Lind, D. and Marcus, B., An introduction to symbolic dynamics and coding, 1995 (Cambridge
University Press: New York).
Lloyd, S. and Slotine, J.J.E., Information theoretic tools for stable adaptation and learning.
Int. J. Adapt. Control Signal Process., 1996, 10, 499–530.
Maxion, R.A., Toward diagnosis as an emergent behavior in a network ecosystem. Physica D,
1990, 42, 66–84.
Min, H. and Zhou, G., Supply chain modeling: past, present and future. Comput. Ind. Eng.,
2002, 43, 231–249.
Mukherjee, S., Osuna, E. and Girosi, F., Nonlinear prediction of chaotic time series using
support vector machines. In IEEE Workshop on Neural Networks for Signal Processing
VII, 1997, pp. 511–519.
Newman, M.E.J., Models of the small world. J. Stat. Phys., 2000, 101, 819–841.
Newman, M.E.J., The spread of epidemic disease on networks. Phys. Rev. E, 2002, 66.
Newman, M.E.J., Random graphs as models of networks. In Handbook of Graphs and
Networks, edited by S. Bornholdt and H.G. Schuster, 2003 (Wiley-VCH, Berlin).
Newman, M.E.J., Strogatz, S.H. and Watts, D.J., Random graphs with arbitrary degree
distribution and their applications. Phys. Rev. E, 2001, 64.
Ott, E., Chaos in Dynamical Systems, 1996 (Cambridge University Press: Cambridge).
Powell, M.J.D., Radial basis function approximation to polynomials. Preprint University of
Cambridge, 1987.
Rasmussen, D.R. and Mosekilde, E., Bifurcations and chaos in a generic management model.
Eur. J. Oper. Res., 1988, 35, 80–88.
Ravasz, E. and Barabási, A.L., Hierarchical organization in complex networks. Phys. Rev. E,
2003, 67.
Sano, M. and Sawada, Y., Measurement of the Lyapunov spectrum from a chaotic time
series. Phys. Rev. Lett., 1985, 55, 1082–1084.
Sawhill, B.K., Self-organised criticality and complexity theory. In 1993 Lectures in Complex
Systems, edited by L. Nadel and D.L. Stein, pp. 143–170, 1995 (Addison-Wesley:
Reading, MA).
Schieritz, N. and Grobler, A., Emergent structures in supply chains—A study integrating
agent-based and system dynamics modeling, in 36th Annual Hawaii International
Conference on System Sciences, Big Island, HI, 2003.
Shalizi, C.R. and Crutchfield, J.P., Computational mechanics: pattern and prediction,
structure and simplicity. J. Stat. Phys., 2001, 104, 817–879.
Shalizi, C.R., Causal architecture, complexity and self-organization in time series and
cellular automata. Available online at: http://www.santafe.edu/shalizi/thesis, 2005
(accessed May 2005).
Simon, H.A., The Sciences of the Artificial, 3rd ed., 1997 (The MIT Press, Cambridge, MA).
Strogatz, S.H., Nonlinear Dynamics and Chaos, 1994 (Addison-Wesley: Reading, MA).
Strogatz, S.H., Exploring complex networks. Nature, 2001, 410, 268–276.
Takens, F., Detecting strange attractors in turbulence. In Dynamical Systems and
Turbulence, edited by D.A. Rand and L.S. Young, Lecture Notes in Mathematics, 1981,
898, 366–381 (Springer: New York).
Thadakamalla, H.P., Raghavan, U.N., Kumara, S. and Albert, R., Survivability of
multiagent-based supply networks: a topological perspective. IEEE Intell. Syst., 2004,
19(5), 24–31.
Wang, X.F. and Chen, G., Synchronization in scale-free dynamical networks: robustness
and fragility. IEEE Trans. Circuits and Systems I Fundam. Theory Applic., 2002,
49(1), 54–62.
Watts, D.J. and Strogatz, S.H., Collective dynamics of ‘small-world’ networks. Nature, 1998,
393, 440–442.
Wolfram, S., Cellular Automata and Complexity: Collected Papers, 1994 (Addison-Wesley:
Reading, MA).
Dependable Agent Systems
Survivability of Multiagent-Based Supply Networks: A Topological Perspective

Hari Prasad Thadakamalla, Usha Nandini Raghavan, Soundar Kumara, and Réka Albert, Pennsylvania State University
Although fairly simple business processes govern these individual entities, real-time capabilities and global Internet connectivity make today's supply chains complex. Fluctuating demand patterns, increasing customer expectations, and competitive markets also add to their complexity.

Supply networks are usually modeled as multiagent systems (MASs).1 Because supply chain management must effectively coordinate among many different entities, a multiagent modeling framework based on explicit communication between these entities is a natural choice.1 Furthermore, we can represent these multiagent systems as a complex network with entities as nodes and the interactions between them as edges. Here we explore the survivability (and hence dependability) of these MASs from the view of these complex supply networks.

Today's supply networks aren't dependable—or survivable—in chaotic environments. For example, Figure 1 shows how mediocre a typical supply network's reaction to a node or edge failure is compared to a network with built-in redundancy.

Survivability is a critical factor in supply network design. Specifically, supply networks in dynamic environments, such as military supply chains during wartime, must be designed more for survivability than for cost effectiveness. The more survivable a network is, the more dependable it will be.

We present a methodology for building survivable large-scale supply network topologies that can extend to other large-scale MASs. Building survivable topologies alone doesn't, however, make an MAS dependable. To create survivable—and hence dependable—multiagent systems, we must also consider the interplay between network topology and node functionalities.

A topological perspective

To date, the survivability literature has emphasized network functionalities rather than topology. To be survivable, a supply network must adapt to a dynamic environment, withstand failures, and be flexible and highly responsive. These characteristics depend on not only node functionality but also the topology in which nodes operate.

The components of survivability

From a topological perspective, the following properties encompass survivability, and we denote them as survivability components.

The first is robustness. A robust network can sustain the loss of some of its structure or functionalities and maintain connectedness under node failures, whether the failure is random or is a targeted attack. We measure robustness as the size of the network's largest
• Growth: Start with a small number of nodes—say, m0—and assume that every time a node enters the system, m edges are pointing from it, where m < m0.
• Preferential attachment: Every time a new node enters the system, each edge of the newly connected node preferentially attaches to higher-degree nodes.

7. R. Albert and A.-L. Barabási, "Statistical Mechanics of Complex Networks," Reviews of Modern Physics, Jan. 2002, pp. 47–97.
8. R. Albert, H. Jeong, and A.-L. Barabási, "Error and Attack Tolerance of Complex Networks," Nature, July 2000, pp. 378–382.

while the "rich get richer" phenomenon used in the Barabási-Albert model explains the scale-free distribution.2

Figure 2. Comparing the survivability components of random, small-world, and scale-free networks:
• Characteristic path length: random networks scale as log(N); small-world networks scale linearly with N for small p, and for higher p scale as log(N); scale-free networks scale as log(N)/log(log(N)).
• Clustering coefficient: for random networks it equals p (the connection probability); small-world networks have high clustering, but as p → 1 behave like a random graph; for scale-free networks it is ((m − 1)/2)·(log(N)/N), where m is the number of edges with which a node enters.
• Robustness to failures: random networks respond similarly to both random and targeted attacks; small-world networks respond similarly to random networks, because their degree distribution is similar; scale-free networks are highly resilient to random failures while being very sensitive to targeted attacks.

Similarly, we seek to design supply networks with inherent survivability components (see Figure 3), obtaining these components by coining appropriate growth mechanisms. Of course, having all the aforementioned properties in a network might not be practically feasible—we'd likely have to negotiate trade-offs depending on the domain. Also, domain specificities might make it inefficient to incorporate all properties. For instance, in a supply network, we might not be able to rewire the edges as easily as we can in an information network, so we would concentrate more on obtaining other properties such as low characteristic path length, robustness to failures and attacks, and high clustering coefficients. So, the construction of these networks is domain specific.
Establishing edges between network nodes is also domain specific. For instance, in a supply network, a retailer would likely prefer to have contact with other geographically convenient nodes (distributors, warehouses, and other retailers). At the same time, nodes in a file-sharing network would prefer to attach to other nodes known to locate or hold many shared files (that is, nodes of high degree).

Figure 3. The transition from supply chain to a survivable supply network.

Obtaining the survivability components

While evolving the network on the basis of domain constraints, we need to incorporate four traits into the growth model for obtaining good survivability components.

The first is low characteristic path length. During network construction, establish a few long-range connections between nodes that require many steps to reach one from another.

The second is good clustering. When two nodes A and B are connected, new edges from A should prefer to attach to neighbors of B, and vice versa.

The third is robustness to random and targeted failure. Preferential attachment—where new nodes entering the network don't connect uniformly to existing nodes but attach preferentially to higher-degree nodes (see the sidebar for more details)—leads to scale-free networks with very few critical and many not-so-critical nodes. Here we measure a node's criticality in terms of the number of edges incident on it. So, these networks are robust to random failures (the probability that a critical node fails is very small) but not to targeted attacks (attacking the very few critical nodes would devastate the network). Also, it's not practically feasible to have all nodes play an equal role in the system—that is, be equally critical. Thus, the network should have a good balance of critical, not-so-critical, and noncritical nodes.

The fourth is efficient rewiring. Rewiring edges in a network might or might not be feasible, depending on the domain. But where it is feasible, it should preserve the other three traits.

Although complete graphs come equipped with good survivability components, they clearly aren't cost effective. Allowing every agent in an agent system to communicate with every other agent uses system bandwidth inefficiently and could completely bog down the system. So the amount of redundancy results from a trade-off between cost and survivability.

An illustration

Suppose we want to build a topology for a military supply chain that must be survivable in wartime. First, we broadly classify the network nodes into three types:

• Battalions prefer to attach to a highly connected node so that the supplies from different parts of the network will be transported to them in fewer steps. Battalions also require quick responses, so they prefer the subsequent links to attach to nodes at convenient shorter distances (in our model we considered a fixed distance of two).
• A forward support battalion (FSB) prefers to attach to highly connected nodes so that its supplies proliferate faster in the network. The supply range from an FSB goes up to a particular distance (at most three in our model).
• A main support battalion (MSB) also prefers to attach to a highly connected node to enable its supplies to proliferate faster in the network. We assume an unrestricted supply reach from an MSB, thus facilitating some long-range connections.

In a conventional logistics network, the MSBs supply commodities (such as ammunitions, food, and fuel) to the FSBs, who in turn forward them to the battalions. Our approach doesn't restrict node functionalities as such—for example, we assume that even a battalion can supply commodities to other battalions if necessary.

Figure 4. Snapshots of the modeled networks during their growth, where the nodes number 70. MSBs are green, FSBs are red, and battalions are blue.
Figure 5. How our proposed network performed: (a) the log-log plot of the degree distribution for all three networks (Models 1, 2, and 3); (b) the characteristic path length of the proposed network against the log of the number of nodes.
Figure 6. Responses of the three networks to random attacks, plotted as (a) the size of the largest connected component, (b) characteristic path length, and (c) maximum distance in the largest connected component against the percentage of nodes removed from each network.
Figure 7. The three networks' responses to targeted attacks, plotted as (a) the size of the largest connected component, (b) characteristic path length, and (c) maximum distance in the largest connected component against the percentage of nodes removed from each network.
dom failures). Also, the decrease in the largest connected component's size is linear with respect to the number of nodes removed, which corresponds to the slowest possible decrease. So, we can safely conclude that these networks are robust to random failures—most of the nodes in the network have a degree less than four, and removing smaller-degree nodes impacts the networks much less than removing high-degree nodes (called hubs).

These networks' responses to targeted attacks are inferior compared to their resilience to random attacks (see Figure 7). The size of the largest component decreases much faster for the proposed network than for the other two networks, but the proposed network performs better on the other two robustness measures. That is, the distances in the connected component are considerably smaller when more than 10 percent of nodes are removed.

We can improve robustness to targeted attacks by introducing constraints in the attachment rules. Here we assume that node type constrains its degree—that is, MSBs, FSBs, and battalions can't have more than m1, m2, and m3 edges, respectively, incident on them. This is a reasonable assumption because in military logistics (or any organization

The Authors

Hari Prasad Thadakamalla is a PhD student in the Department of Industrial and Manufacturing Engineering at Pennsylvania State University, University Park. His research interests include supply networks, search in complex networks, stochastic systems, and control of multiagent systems. He obtained his MS in industrial engineering from Penn State. Contact him at hpt102@psu.edu.

Usha Nandini Raghavan is a PhD student in industrial and manufacturing engineering at Pennsylvania State University, University Park. Her research interests include supply chain management, graph theory, complex adaptive systems, and complex networks. She obtained her MSc in mathematics from the Indian Institute of Technology, Madras. Contact her at uxr102@psu.edu.

Soundar Kumara is a Distinguished Professor of industrial and manufacturing engineering. He holds joint appointments with the Department of Computer Science and Engineering and the School of Information Sciences and Technology at Pennsylvania State University. His research interests include complexity in logistics and manufacturing, software agents, neural networks, and chaos theory as applied to manufacturing process monitoring and diagnosis. He's an elected active member of the International Institute of Production Research. Contact him at skumara@psu.edu.

Réka Albert is an assistant professor of physics at Pennsylvania State University and is affiliated with the Huck Institutes of the Life Sciences. Her main research interest is modeling the organization and dynamics of complex networks. She received her PhD in physics from the University of Notre Dame. She is a member of the American Physical Society and the Society for Mathematical Biology. Contact her at ralbert@phys.psu.edu.

Figure 8. The proposed network's responses to targeted attacks for different values of m1, m2, and m3 (curves shown for m1 = 4, m2 = 10, m3 = 25; m1 = 4, m2 = 8, m3 = 12; and m1 = 3, m2 = 6, m3 = 10; vertical axis: size of the largest connected component).
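The targeted-attack experiment described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the graph model, network size, and removal fraction are arbitrary choices.

```python
# Hypothetical sketch of a targeted attack: repeatedly remove the current
# highest-degree node (hub) and record the size of the largest connected
# component after each removal.
import networkx as nx

def targeted_attack(G, fraction=0.3):
    """Remove the highest-degree nodes one at a time; return the size of
    the largest connected component after each removal."""
    G = G.copy()
    n_remove = int(fraction * G.number_of_nodes())
    sizes = []
    for _ in range(n_remove):
        hub = max(G.degree, key=lambda kv: kv[1])[0]  # current highest-degree node
        G.remove_node(hub)
        sizes.append(len(max(nx.connected_components(G), key=len)))
    return sizes

G = nx.barabasi_albert_graph(1000, 2, seed=1)  # illustrative scale-free graph
sizes = targeted_attack(G, fraction=0.3)
print(sizes[0], sizes[-1])  # the largest component shrinks as hubs are removed
```

The same loop with a uniformly random node in place of `hub` gives the random-failure curves of Figure 6 for comparison.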
We study trade-offs presented by local search algorithms in complex networks which are heterogeneous in edge weights and node degree. We show that search based on a network measure, local betweenness centrality (LBC), utilizes the heterogeneity of both node degrees and edge weights to perform the best in scale-free weighted networks. The search based on LBC is universal and performs well in a large class of complex networks.
weighted networks. Finally, we give conclusions in Sec. VIII.

II. PROBLEM DESCRIPTION AND LITERATURE

The problem of decentralized search goes back to the famous experiment by Milgram [25] illustrating the short distances in social networks. One of the striking observations of this study, as pointed out by Kleinberg [21], was the ability of the nodes in the network to find short paths by using only local information. Currently, Watts et al. [26] are conducting an Internet-based study to verify this phenomenon. Kleinberg demonstrated that the emergence of such a phenomenon requires special topological features [21]. Considering a family of network models that generalizes the Watts-Strogatz model [6], he showed that only one particular model among this infinite family can support efficient decentralized algorithms. Unfortunately, the model given by Kleinberg is too constrained and represents only a very small subset of complex networks. Watts et al. presented another model to explain the phenomena observed by Milgram, based upon plausible hierarchical social structures [22]. However, in many real-world networks it may not be possible to divide the nodes into sets of groups in a hierarchy depending on the properties of the nodes, as the Watts et al. model requires.

Recently, Adamic et al. [9] showed that in networks with a power-law degree distribution (scale-free networks) high-degree-seeking search is more efficient than random-walk search. In random-walk search, the node that has the message passes it to a randomly chosen neighbor; this process continues until the message reaches the target node. In high-degree search, the node passes the message to the neighbor that has the highest degree among all nodes in the neighborhood, assuming that a more connected neighbor has a higher probability of reaching the target node. High-degree search was found to outperform random-walk search consistently in networks with power-law degree distributions for exponents varying from 2.0 to 3.0. Using the generating-function formalism given by Newman [27], Adamic et al. showed that for random-walk search the number of steps s until approximately the whole graph is revealed scales as s ~ N^{3(1-2/γ)}, where γ is the power-law exponent, while high-degree search leads to a much more favorable scaling, s ~ N^{2-4/γ}.

The assumption of equal edge weights (meaning the cost, bandwidth, distance, or power consumption associated with the process described by the edge) usually does not hold in real-world networks. As pointed out by many researchers [11–17], it is incomplete to assume that all the links are equivalent while studying the dynamics of large-scale networks. The total path length p in a weighted network for the path 1-2-3-…-n is given by p = Σ_{i=1}^{n-1} w_{i,i+1}, where w_{i,i+1} is the weight on the edge from node i to node i+1. Even though high-degree search results in a path with a smaller number of hops, the total path length may be high if the weights on these edges are high. Thus, to be more realistic and closer to real-world networks we need to explicitly incorporate weights in any proposed search algorithm. In this paper, we are interested in designing decentralized search strategies for networks that have the following properties:

(1) The node degree distribution follows a power law with exponent varying from 2.0 to 3.0. Although we discuss search strategies for networks with a Poisson degree distribution (ER random graphs), we concentrate more on scale-free networks, since most real-world networks are found to exhibit this behavior [1–3].

(2) The network has nonuniformly distributed weights on the edges. Here the weights signify the cost or time taken to pass the message or query; hence, smaller weights correspond to shorter and/or better paths. We consider different distributions such as beta, uniform, exponential, and power law.

(3) The network is unstructured and decentralized. That is, each node has information only about its first and second neighbors, and no global information about the target is available. Also, the nodes can communicate only with their immediate neighbors.

(4) The topology is dynamic (ad hoc) while still maintaining its statistical properties. These particular types of networks are becoming more prevalent due to advances made in different areas of engineering, especially in sensor networks [18], peer-to-peer networks [19], and dynamic supply chains [20]. In this paper we analyze the problem of finding decentralized algorithms in such weighted complex networks, which we believe has not been explored to date.

Among the search strategies employed in this paper is a strategy based on the local betweenness centrality (LBC) of nodes. Betweenness centrality (also called load), first developed in the context of social networks [28], has recently been adapted to optimal transport in weighted complex networks by Goh et al. [17]. These authors have shown that in the strong disorder limit (that is, when the total path length is dominated by the maximum edge weight over the path), the load distribution follows a power law for both ER random graphs and scale-free networks. To determine a node's betweenness as defined by Goh et al. one would need knowledge of the entire network. Here we define a local parameter called local betweenness centrality (LBC), which uses only information on the first and second neighbors of a node, and we develop a search strategy based on this local parameter.

III. LOCAL BETWEENNESS CENTRALITY

We assume that each node in the network has information about its first and second neighbors. For calculating the local betweenness centrality of the neighbors of a given node we consider the local network formed by that node (which we will call the root node) and its first and second neighbors. Then the betweenness centrality, defined as the fraction of shortest paths going through a node [3], is calculated for the first neighbors in this local network. Let L(i) be the LBC of a neighbor node i in the local network. Then L(i) is given by

L(i) = Σ_{s ≠ i ≠ t; s,t ∈ local network} σ_st(i) / σ_st,

where σ_st is the total number of shortest paths between s and t (where shortest path means the path over which the sum of weights is minimal) and σ_st(i) is the number of those paths that pass through node i.
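As an illustration of the definition above, the following sketch computes L(i) for the first neighbours of a root node using networkx. This is our own illustration, not the authors' code; note that networkx counts each unordered pair {s, t} once, whereas the paper counts s-t and t-s separately, so the paper's values are exactly twice these. The ranking of neighbours, which is all the search strategy needs, is identical.

```python
# Illustrative LBC computation: build the "local network" of a root node
# (root plus first and second neighbours), then take the weighted
# betweenness of the first neighbours within that subgraph.
import networkx as nx

def local_betweenness(G, root):
    """Return {first neighbour: LBC} inside root's local network."""
    first = set(G[root])
    nodes = {root} | first
    for u in first:
        nodes |= set(G[u])             # add second neighbours
    local = G.subgraph(nodes)
    # unnormalized betweenness over unordered pairs; the paper's ordered
    # s-t / t-s counting gives exactly twice these values
    bc = nx.betweenness_centrality(local, weight="weight", normalized=False)
    return {u: bc[u] for u in first}

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 2.0), (0, 2, 1.0), (1, 3, 1.0),
                           (1, 4, 5.0), (2, 5, 1.0), (2, 6, 3.0)])
lbc = local_betweenness(G, 0)   # LBCs of root 0's first neighbours 1 and 2
best = max(lbc, key=lbc.get)    # the neighbour a high-LBC search would pick
```

In this small tree-shaped example both first neighbours have degree 3 and unique shortest paths, so their LBCs tie and either is a valid choice for the search.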
SEARCH IN WEIGHTED COMPLEX NETWORKS, PHYSICAL REVIEW E 72, 066128 (2005)
TABLE I. Comparison of search strategies in a Poisson random network. The edge weights were generated randomly from an exponential distribution with mean 5 and variance 25. The values in the table are the average path distances obtained for each search strategy in these networks. The strategy which passes the message to the neighbor with the least edge weight performs the best.

Search strategy | 500 nodes | 1000 nodes | 1500 nodes | 2000 nodes
the message, sends the message to one of its neighbors depending on the search strategy. The search continues until the message reaches the node whose neighbor is the target node. In order to avoid passing the message to a neighbor that has already received it, a list l_i of all the neighbors that received the message is maintained at each node i. During the search process, if node i passes the message to its neighbor j, which does not have any more neighbors that are not in the list l_j, then the message is routed back to the node i. This particular neighbor j is marked to note that this node cannot pass the message any further. The average path distance was calculated for each search strategy from the paths obtained for these K pairs. We repeated this simulation for 10 to 50 instances of the Poisson and power-law networks, depending on the size of the network.

VI. ANALYSIS

First, we study and compare different search strategies on ER random graphs. The weights on the edges were generated from an exponential distribution with mean 5 and variance 25. Table I compares the performance of each strategy for networks of size 500, 1000, 1500, and 2000 nodes. We took the connection probability to be p = 0.004, and hence a giant connected component always exists [5]. From Table I, it is evident that the strategy which passes the message to the neighbor with the least edge weight is better than all the other strategies in homogeneous networks. Remarkably, a search strategy that needs less information than other strategies (3, 4, and 5) performed best, while high-degree search and LBC did not perform well, since the network is highly homogeneous in node degree.

However, if we decrease the heterogeneity in edge weights (use a distribution with lesser variance), we observe that high-LBC search performs best (see Table II). In conclusion, when the heterogeneity of edge weights is high compared to the relative homogeneity of node degrees, the search strategies which are purely based on edge weights perform better. However, as the heterogeneity of the edge weights decreases, the importance of edge weights decreases, and strategies which consider both edge weights and node degree perform better.

Next we investigated how the search strategies perform on scale-free networks. Figure 2 shows the scaling of different search strategies for scale-free networks with exponent 2.1. As conjectured, the search strategy that utilizes the heterogeneities of both the edge weights and node degrees (the high-LBC search) performed better than the other strategies. A similar phenomenon was observed for different exponents of the scale-free network (see Table III). Except for the power-law exponent 2.9, the high-LBC search was consistently better than the others. We observe that as the heterogeneity in the node degree decreases (i.e., as the power-law exponent increases), the difference between the high-LBC search and the other strategies decreases. When the exponent is 2.9, the performance of the LBC, minimum edge weight, and high-degree searches was almost the same. Note that when the network becomes homogeneous in node degree the minimum edge weight search performs better than high-LBC search
TABLE II. Comparison of search strategies in a Poisson random network with 2000 nodes. The table gives results for different edge weight distributions. The mean for all the distributions is 5 and the variance is 2. The values in the table are the average path lengths obtained for each search strategy in these networks. When the weight heterogeneity is high, the minimum edge weight search strategy was the best. However, when the heterogeneity of edge weights is low, LBC performs better.
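The message-passing procedure described above, with per-node visited lists and routing back from dead ends, can be sketched for the minimum-edge-weight strategy as follows. This is a hedged reconstruction, not the authors' implementation; in particular, the paper does not state whether a backtracking hop adds its edge weight to the path length, which we assume here.

```python
# Sketch of decentralized minimum-edge-weight search with dead-end
# backtracking, on an undirected networkx graph with 'weight' attributes.
import networkx as nx

def min_weight_search(G, source, target, max_steps=10_000):
    """Pass the message to the untried neighbour with the smallest edge
    weight; return the total weighted path length, or None on failure."""
    seen = {source}              # nodes that have already received the message
    stack, length = [source], 0.0
    for _ in range(max_steps):
        node = stack[-1]
        if target in G[node]:    # stop once a neighbour of the holder is the target
            return length + G[node][target]["weight"]
        fresh = [u for u in G[node] if u not in seen]
        if not fresh:            # dead end: route the message back
            stack.pop()          # (assumption: the return hop also adds weight)
            if not stack:
                return None
            length += G[stack[-1]][node]["weight"]
            continue
        nxt = min(fresh, key=lambda u: G[node][u]["weight"])
        seen.add(nxt)
        stack.append(nxt)
        length += G[node][nxt]["weight"]
    return None

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (1, 4, 5.0)])
print(min_weight_search(G, 0, 3))  # 0 -> 1 -> 2, then 3 is 2's neighbour: 3.0
```

Swapping the `min(...)` line for a rule that maximizes degree or LBC among `fresh` gives the other strategies compared in Tables I–IV.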
TABLE III. Comparison of search strategies in power-law networks of 2000 nodes with different power-law exponents. The edge weights are generated from an exponential distribution with mean 5 and variance 25. The values in the table are the average path lengths obtained for each search strategy in these networks. LBC search, which reflects both the heterogeneities in edge weights and node degree, performed the best for all power-law exponents. The systematic increase in all path lengths with the increase of the power-law exponent is due to the fact that the average degree of the network decreases with γ.

Search strategy | Power-law exponent γ = 2.1 | 2.3 | 2.5 | 2.7 | 2.9
TABLE IV. Comparison of search strategies in power-law networks with exponent 2.1 and 2000 nodes with different edge weight distributions. The mean for all the edge weight distributions is 5 and the variance is 2. The values in the table are the average distances obtained for each search strategy in these networks. The values in brackets show the relative difference between the average distance for each strategy and the average distance obtained by the LBC strategy. LBC search, which reflects both the heterogeneities in edge weights and node degree, performed the best for all edge weight distributions.
(Tables I–IV). This implies that LBC search uses the information correctly.

VII. LBC ON UNWEIGHTED NETWORKS

In this section, we show that the neighbor with the highest LBC is usually the same as the neighbor with the highest degree in unweighted networks. Hence, high-LBC search would give identical results as high-degree search in unweighted networks. As mentioned earlier, in unweighted scale-free networks there is a scaling relation between the (global) BC of a node and its degree, BC ~ k^η [30]. However, this does not imply that in an unweighted local network the neighbor with the highest LBC is always the same as the neighbor with the highest degree. Here, we show that in most cases the highest-degree and the highest-LBC neighbors coincide. First, let us consider a tree-like local network without any loops, similar to the network configuration shown in Fig. 4(a). In a local network there are three types of nodes, namely, the root node, first neighbors, and second neighbors. Let the degree of the root node be d and the degrees of the neighbors be k_1, k_2, k_3, ..., k_d. The number of nodes n in the local network is n = 1 + Σ_{j=1}^{d} k_j [one root node, d first neighbors, and Σ_{j=1}^{d} (k_j − 1) second neighbors]. In a tree network there is a single shortest path between any pair of nodes s and t, thus σ_st(i) is either zero or one. Then the LBC of a first neighbor i is given by L(i) = (k_i − 1)(n − 2) + (k_i − 1)(n − k_i), where k_i is the degree of the neighbor. The first term is due to the shortest paths from the k_i − 1 neighbors of node i to the n − 2 remaining nodes (other than node i and the neighbor j) in the network. The second term is due to the shortest paths from the n − k_i nodes (other than the k_i − 1 neighbors and node i) to the k_i − 1 neighbors of node i. Note that we choose not to explicitly take into account the symmetry of distance in undirected networks and count the s-t and t-s paths separately. L(i) is an increasing function if k_i < n − 1/2, a condition that is always satisfied since n = 1 + Σ_{j=1}^{d} k_j. This implies that in a local network with treelike structure, the neighbor with the highest degree has the highest LBC. We extend the above result to other configurations of the local network by considering different possible cases.

The possible edges other than the edges present in a tree-like local network are an edge between two first neighbors, an edge between a first neighbor and a second neighbor, and an edge between two second neighbors. As shown in Fig. 4(b), an edge between two first neighbors changes the LBC of the root node but not that of the neighbors. Figure 4(c) shows a configuration of a local network with an edge added between a first and a second neighbor. Now, there is a small change in the LBCs of the neighbors (nodes 2 and 3) which are connected to a common second neighbor (node 9).

FIG. 4. (a) A configuration of a local network with a tree-like structure. In such local networks, the neighbor with the highest degree has the highest LBC. (b) A local network with an edge between two first neighbors. Here again the neighbor with the highest degree has the highest LBC. (c) A local network with an edge between a first neighbor and a second neighbor. Although there is a change in the LBCs of the neighbors, the order remains the same.
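The tree-network formula L(i) = (k_i − 1)(n − 2) + (k_i − 1)(n − k_i) can be checked numerically on a small example. This is an illustrative check of our own, not from the paper; the factor of 2 converts networkx's unordered-pair betweenness to the paper's ordered s-t / t-s count.

```python
# Numerical check of the tree-network LBC formula on a hand-built
# local network: root 0 with first neighbours 1, 2, 3 of degrees 3, 2, 4,
# every remaining edge leading to a distinct second neighbour (a tree).
import networkx as nx

G = nx.Graph([(0, 1), (0, 2), (0, 3),
              (1, 4), (1, 5),           # k_1 = 3
              (2, 6),                   # k_2 = 2
              (3, 7), (3, 8), (3, 9)])  # k_3 = 4
n = G.number_of_nodes()  # n = 1 + sum(k_j) = 10

bc = nx.betweenness_centrality(G, normalized=False)
for i, k in [(1, 3), (2, 2), (3, 4)]:
    formula = (k - 1) * (n - 2) + (k - 1) * (n - k)
    # the paper counts s-t and t-s separately, hence the factor of 2
    assert 2 * bc[i] == formula
```

The assertions pass for all three first neighbours, and the highest-degree neighbour (node 3) indeed has the largest LBC, as the derivation above predicts.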
Since node 9 is now shared by neighbors 2 and 3, the LBC contributed by node 9 is divided between these two neighbors. The LBC of such a neighbor i is L(i) = (k_i − 2)(n − 2) + (k_i − 2)(n − k_i) + (n − k_j − 1), where k_i is the degree of the neighbor i and k_j is the degree of the neighbor with which node i has a common second neighbor. The decrease in the LBC of neighbor i is (n − k_i + k_j − 1). If there are two neighbors with the same degree (one with a common second neighbor and another without any), then the neighbor without any common second neighbors will have the higher LBC. Another possible change of order with respect to LBC would be with a neighbor l of degree k_l = k_i − 1 (if it exists). However, L(i) − L(l) = (n − k_i − k_j + 1) is always greater than 0, since n = Σ_{j=1}^{d} k_j in this local network. Thus the only scenario under which the order of neighbors with respect to LBC differs from their order with respect to degree when adding an edge between first and second neighbors is if that edge creates two first neighbors with the same degree. A similar argument leads to an identical conclusion in the case of adding an edge between two second neighbors.

The above discussion suggests that the highest-degree neighbor is always the same as the highest-LBC neighbor. This is not true in a few peculiar instances of local networks. For example, consider the network shown in Fig. 5, which has several edges between the first and second neighbors. We see that the highest-degree neighbor is not the same as the highest-LBC neighbor. In this local network, the highest-degree first neighbor (node 2) participates in several four-node circuits that include the root node. Thus, there are multiple shortest paths starting from second-neighbor nodes on these cycles (nodes 6, 7, 9, 10), and the contributions to node 2's LBC from the paths that pass through it are smaller than unity; consequently the LBC of node 2 will be relatively small. This may be one of the reasons why the highest-degree neighbor, node 2, is not the highest-LBC neighbor. We feel that this happens only in some special instances of local networks. From about 50 000 simulations we found that in 99.63% of cases the highest-degree neighbor is the same as the highest-LBC neighbor. Hence, we can conclude that in unweighted networks the neighbor with the highest LBC is usually identical to the neighbor with the highest degree.

FIG. 5. An instance of a local network where the order of neighbors with respect to LBC is not the same as the order with respect to node degree.

VIII. CONCLUSION

In this paper we have given a new direction for local search in complex networks with heterogeneous edge weights. We proposed a local search algorithm based on a new local measure called local betweenness centrality. We studied the complex tradeoffs presented by efficient local search in weighted complex networks and showed that heterogeneity in edge weights has a huge impact on search. Moreover, the impact of edge weights on search strategies increases as the heterogeneity of the edge weights increases. We also demonstrated that the search strategy based on LBC utilizes the heterogeneity in both the node degree and edge weight to perform the best in power-law weighted networks. Furthermore, we have shown that in unweighted power-law networks the neighbor with the highest degree is usually the same as the neighbor with the highest LBC. Hence, our proposed search strategy based on LBC is more universal and is efficient in a larger class of complex networks.

ACKNOWLEDGMENTS

The authors would like to acknowledge the National Science Foundation (Grant No. SST 0427840) and a Sloan Research Fellowship to one of the authors (R.A.) for making this work feasible. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF). In addition, the first author (H.P.T.) would like to thank Usha Nandini Raghavan for interesting discussions on issues related to this work.
[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[2] S. N. Dorogovtsev and J. F. F. Mendes, Adv. Phys. 51, 1079 (2002).
[3] M. E. J. Newman, SIAM Rev. 45, 167 (2003).
[4] P. Erdős and A. Rényi, Publ. Math. (Debrecen) 6, 290 (1959).
[5] B. Bollobás, Random Graphs (Academic, London, 1985).
[6] D. J. Watts and S. H. Strogatz, Nature (London) 393, 440 (1998).
[7] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, Science 297, 1551 (2002).
[8] R. Albert, A.-L. Barabási, and H. Jeong, Nature (London) 406, 378 (2000); R. Albert, I. Albert, and G. L. Nakarado, Phys. Rev. E 69, 025103 (2004).
[9] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, Phys. Rev. E 64, 046135 (2001).
[10] R. Pastor-Satorras and A. Vespignani, Phys. Rev. E 63, 066117 (2001); Phys. Rev. Lett. 86, 3200 (2001); Phys. Rev. E 65, 035108(R) (2002); 65, 036104 (2002); in Handbook of Graphs and Networks, edited by S. Bornholdt and H. G. Schuster (Wiley-VCH, Berlin, 2003).
[11] M. Granovetter, Am. J. Sociol. 78, 1360 (1973); M. E. J. Newman, Phys. Rev. E 64, 016132 (2001).
[12] S. H. Yook, H. Jeong, A.-L. Barabási, and Y. Tu, Phys. Rev. Lett. 86, 5835 (2001); J. D. Noh and H. Rieger, Phys. Rev. E 66, 066127 (2002); L. A. Braunstein, S. V. Buldyrev, R. Cohen, S. Havlin, and H. E. Stanley, Phys. Rev. Lett. 91, 168701 (2003); A. Barrat, M. Barthélemy, and A. Vespignani, Phys. Rev. E 70, 066149 (2004).
[13] S. L. Pimm, Food Webs, 2nd ed. (The University of Chicago Press, Chicago, IL, 2002).
[14] A. E. Krause, K. A. Frank, D. M. Mason, R. E. Ulanowicz, and W. W. Taylor, Nature (London) 426, 282 (2003); E. Almaas, B. Kovács, T. Vicsek, Z. N. Oltvai, and A.-L. Barabási, ibid. 427, 839 (2004).
[15] A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani, Proc. Natl. Acad. Sci. U.S.A. 101, 3747 (2004); R. Guimerà, S. Mossa, A. Turtschi, and L. A. N. Amaral, ibid. 102, 7794 (2005).
[16] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet: A Statistical Physics Approach (Cambridge University Press, Cambridge, 2004).
[17] K. I. Goh, J. D. Noh, B. Kahng, and D. Kim, cond-mat/0410317 (unpublished).
[18] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar, Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking, 1999, pp. 263–270; U. N. Raghavan, H. P. Thadakamalla, and S. R. T. Kumara, Proceedings of the Thirteenth International Conference on Advanced Computing and Communications (ADCOM), 2005.
[19] G. Kan, in Peer-to-Peer: Harnessing the Power of Disruptive Technologies, edited by A. Oram (O'Reilly, Beijing, 2001); T. Hong, in Peer-to-Peer: Harnessing the Power of Disruptive Technologies, edited by A. Oram (O'Reilly, Beijing, 2001).
[20] H. P. Thadakamalla, U. N. Raghavan, S. R. T. Kumara, and R. Albert, IEEE Intell. Syst. 19, 24 (2004).
[21] J. Kleinberg, Nature (London) 406, 845 (2000); Proceedings of the 32nd ACM Symposium on Theory of Computing, 2000, pp. 163–170; Adv. Neural Inf. Process. Syst. 14, 431 (2001).
[22] D. J. Watts, P. S. Dodds, and M. E. J. Newman, Science 296, 1302 (2002).
[23] L. A. Adamic and E. Adar, cond-mat/0310120 (unpublished).
[24] A. Arenas, A. Cabrales, A. Díaz-Guilera, R. Guimerà, and F. Vega, in Statistical Mechanics of Complex Networks, edited by R. Pastor-Satorras, M. Rubí, and A. Díaz-Guilera (Springer-Verlag, Berlin, 2003).
[25] S. Milgram, Psychol. Today 1, 61 (1967).
[26] D. J. Watts, P. S. Dodds, and R. Muhamad, http://smallworld.columbia.edu/index.html
[27] M. E. J. Newman, in Handbook of Graphs and Networks, edited by S. Bornholdt and H. G. Schuster (Wiley-VCH, Berlin, 2003).
[28] S. Wasserman and K. Faust, Social Network Analysis (Cambridge University Press, Cambridge, UK, 1994).
[29] W. Aiello, F. Chung, and L. Lu, Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, 2000, pp. 171–180.
[30] K. I. Goh, B. Kahng, and D. Kim, Phys. Rev. Lett. 87, 278701 (2001).
Search in spatial scale-free networks

H P Thadakamalla 1,3, R Albert 2 and S R T Kumara 1
1 Department of Industrial Engineering, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
2 Department of Physics, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
E-mail: hpt102@psu.edu, ralbert@phys.psu.edu and skumara@psu.edu
New Journal of Physics 9 (2007) 190
Received 12 March 2007
Published 28 June 2007
Online at http://www.njp.org/
doi:10.1088/1367-2630/9/6/190
3 Author to whom any correspondence should be addressed.
Contents
1. Introduction 2
2. Literature and problem description 3
3. Decentralized search algorithms 4
4. Spatial network model and search analysis 7
   4.1. Simulation and analysis 7
5. Search in the US airline network 11
   5.1. Properties of the US airline network 11
   5.2. Search results and analysis 12
6. Conclusions and discussion 15
Acknowledgments 16
References 16
1. Introduction
Recently, many large-scale distributed systems in communications, sociology, and biology have
been represented as networks and their macroscopic properties have been extensively studied
[1]–[4]. One of the major findings is the presence of heterogeneity in network properties. For
example, the distribution of node degree (i.e. the number of edges incident on a node) for many
real-world networks including the Internet, the World Wide Web, phone call networks, scientific
collaboration networks and metabolic networks is found to be highly heterogeneous and to
follow a power law, p(k) ∼ k^−γ, where p(k) is the fraction of nodes with degree k. The clustering
coefficients, quantifying local order and cohesiveness [5], are also found to be heterogeneous,
i.e. C(k) ∼ k^−1 [6]. Further, in many networks the node betweenness centrality, which quantifies
heterogeneities have a demonstrably large impact on the network’s resilience [8, 9] as well as
navigation, local search [10, 11], and spreading processes [12].
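The degree heterogeneity described above is easy to observe numerically. The following sketch (model choices and sizes are our own, arbitrary illustrations) contrasts a scale-free graph with an Erdős–Rényi graph of the same size and density:

```python
# Degree heterogeneity check: in a scale-free graph the maximum degree
# dwarfs the mean, while in an ER graph it stays close to the mean.
import networkx as nx

sf = nx.barabasi_albert_graph(5000, 3, seed=42)            # power-law degrees
er = nx.gnm_random_graph(5000, sf.number_of_edges(), seed=42)  # same density

def max_over_mean(G):
    degs = [d for _, d in G.degree]
    return max(degs) / (sum(degs) / len(degs))

# the scale-free hub dominates; the ER maximum remains modest
print(max_over_mean(sf), max_over_mean(er))
```

The ratio for the scale-free graph comes out an order of magnitude larger than for the ER graph, which is the heterogeneity that the search algorithms below are designed to exploit.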
Another interesting property exhibited by these networks is the ‘small-world phenomenon’
whereby almost every node is connected to every other node by a path with a small number
of edges. This phenomenon was first demonstrated by Milgram's famous experiment in the 1960s
[13]. Milgram randomly selected individuals from Wichita, Kansas and Omaha, Nebraska and
requested them to direct letters to a target person in Boston, Massachusetts. The participants,
and consecutively each person receiving the letter, were asked to send it to an acquaintance
whom they judged to be closer to the target. Surprisingly, the average length of these paths (i.e.
the number of edges in the path) was approximately 6, illustrating the small-world property of
social networks. An even more striking observation, which was later pointed out by Kleinberg
[14]–[16], is that the nodes (participants) were able to find short paths by using only local
information. Currently, Dodds et al are carrying out an Internet-based study to verify this
phenomenon, and initial findings are published in [17].
The observation by Kleinberg raises two fundamental questions: (i) Why should social
networks be structured in a way that local search is efficient? (ii) What is the structure of
networks that exhibit this phenomenon? Kleinberg [14] and later Watts et al [18] argued that
the emergence of such a phenomenon requires special topological features. They termed the
networks in which short paths can be found using only local information as searchable networks.
These studies along with a few others [10, 19] stimulated research on decentralized searching in
complex networks [11], [20]–[26], a problem with many practical applications. In many networks,
information such as data files and sensor data is stored at the nodes of a distributed network. In
addition, the nodes have only limited or local information about the network. Hence, to access this
information quickly, one should have efficient algorithms that can find the target node using the
available local information. Examples include routing of sensor data in wireless sensor networks
[27, 28], locating data files in peer-to-peer networks [26, 29], and finding information in
distributed databases [30]. For the search process to be efficient, it is important that these networks
are designed to be searchable. The importance of search efficiency becomes even more apparent
in the case of ad hoc networks, where the networks are decentralized and distributed, and
real-time searching is required to find the target node.
In this paper, we study the decentralized search problem in a family of parameterized spatial
network models that are heterogeneous in node degree. We propose several decentralized search
algorithms and examine their performance by simulating them on the spatial network model for
various parameters. As pointed out in [25], our analysis reveals that the optimal search algorithm
should effectively incorporate the direction of travel and the degree of the neighbour. We illustrate
that some of these algorithms exploit the heterogeneities present in the network to find paths as
short as the paths found by using global information; thus we demonstrate that the spatial network
model considered defines a class of searchable networks. Further, we test these algorithms on
the US airline network which belongs to this class of networks and show that searchability is a
generic property of the US airline network.
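One plausible way to "incorporate the direction of travel and the degree of the neighbour" is a greedy rule that scores each neighbour by its remaining distance to the target divided by its degree. The sketch below is our own illustration of this idea, not necessarily the exact algorithm proposed in the paper; the scoring function and tie-breaking are assumptions.

```python
# Illustrative greedy decentralized search on a spatial network: each hop
# goes to the neighbour minimizing distance-to-target penalized by degree.
import math
import networkx as nx

def spatial_degree_search(G, pos, source, target, max_hops=1000):
    """Greedy decentralized walk; pos maps node -> (x, y) coordinates."""
    def dist(u, v):
        (x1, y1), (x2, y2) = pos[u], pos[v]
        return math.hypot(x1 - x2, y1 - y2)

    path, node, visited = [source], source, {source}
    for _ in range(max_hops):
        if node == target:
            return path
        # prefer unvisited neighbours; fall back to any neighbour if stuck
        candidates = [u for u in G[node] if u not in visited] or list(G[node])
        # assumed score: geographic progress boosted by the neighbour's degree
        node = min(candidates, key=lambda u: dist(u, target) / max(G.degree(u), 1))
        visited.add(node)
        path.append(node)
    return None

grid = nx.grid_2d_graph(5, 5)      # toy spatial network for demonstration
coords = {v: v for v in grid}      # node labels double as coordinates
route = spatial_degree_search(grid, coords, (0, 0), (4, 4))
```

Setting the degree term to a constant recovers pure geographic greedy search, and dropping the distance term recovers high-degree search, so the two ingredients discussed above can be compared within one routine.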
to the target node based on the grid distance, is able to give short paths. He further extended this
model to hierarchical networks [16], where, again, the network was proven to be searchable only
for a specific parameter value. Unfortunately, the model given by Kleinberg represents only a very
small subset of complex networks. Independently, Watts et al presented another model based upon
plausible hierarchical social structures [18], to explain the phenomena observed in Milgram’s
experiment. The networks were shown to be searchable by a greedy search algorithm for a wide
range of parameter space. Other works on decentralized searching include [20]–[26]. Simsek and
Jensen [25] use homophily between nodes and degree disparity in the network to design a better
algorithm for finding the target node. However, finding an optimal way to combine location and
degree information is yet to be investigated (see [21] for a review). Another interesting problem
studied by Clauset and Moore [31], and by Sandberg [24], is the question of how real-world
networks evolve to become searchable. They propose a simple feedback mechanism where the
nodes continuously conduct decentralized searches, and in the process partially rewire the edges
to form a searchable network.
In this paper, we consider search in a family of parameterized spatial network models that
are heterogeneous in node degree. In this model, nodes are placed in an n-dimensional space and
are connected, based on preferential attachment and geographical constraints, to form spatial
scale-free networks. Preferential attachment to high-degree nodes is believed to be responsible
for the emergence of the power-law degree distribution observed in many real-world networks
[32], and geographical constraints account for the fact that nodes tend to connect to nodes that are
nearby. Many real-world networks such as the Internet [33] and the worldwide airline network
[34], can be described by this family of spatial network models. Our objective is to design
decentralized search algorithms for this type of network model and demonstrate that this simple
model defines a class of searchable networks. The decentralized search algorithm attempts to
send a message from a starting node s to the target node t along the edges of the network using
local information. Each node has information about the position of the target node, the position
of its neighbours, and the degree of its neighbours. Using this information, the start node, and
consecutively each node receiving the message, passes the message to one of its neighbours based
on the search algorithm until it reaches the target node. We evaluate each algorithm based on the
number of hops taken for the message to reach the target node; the lower the number, the better
the performance of the algorithm. Another potentially relevant measure is the physical distance
travelled by each search algorithm. However, the number of hops is the most pertinent distance
measure in many networks, including social networks, the Internet and even airline networks,
as the delays associated with switching between edges are comparable to the delays associated
with traversing an edge.
As observed in previous studies [10, 11], we expect that the heterogeneity present in spatial
scale-free networks influences the search process. In the following section, we discuss why the
degree of a node’s neighbour is important and propose different ways of composing the direction
of travel and the degree of the neighbour.
A simple search algorithm in spatial networks is greedy search, where each node passes the
message to the neighbour closest to the target node. Let di be the distance to the target
node from each neighbour i (see figure 1(a)) and let ki be the degree of the neighbour i.
Figure 1. (a) Illustration of a spatial network. di is the distance to the target node
from each neighbour i and ki is the degree of the neighbour i. (b) Illustration
demonstrating that it is sometimes better to choose a neighbour with higher
degree, i.e. node 2 over node 1, even if we are going away from the target, since
this gives a higher probability of taking a longer step in the next iteration.
Greedy search chooses the neighbour with the smallest di . This will ensure that the message
is always going to the neighbour closest to the target node. However, greedy search may not be
optimal in spatial scale-free networks that have high heterogeneity in node degree. Adamic et al
[10] and Thadakamalla et al [11] have shown that search algorithms that utilize the heterogeneities
present in the network perform substantially better than those that do not. Indeed, choosing a
neighbour with higher degree, even by going away from the target node, gives a higher probability
of taking a longer step in the next iteration. For instance, in figure 1(b), it is better to choose
node 2 instead of node 1 since node 2 can take a longer step towards the target node in the next
iteration. In the following paragraph, we show that the expected distance a neighbour can take
in the next iteration is a strictly increasing function of its degree.
We define the length of an edge as the Euclidian distance between the two nodes
connected by the edge. Let P(X) be the probability distribution of edge lengths. Let Yk =
Max{X1 , X2 , X3 , . . . , Xk }, where X1 , X2 , X3 , . . . , Xk are independent and identically distributed
(i.i.d.) random variables with distribution function P(X). The cumulative distribution function
of Yk is
P[Yk ≤ y] = ∏_{i=1}^{k} P[Xi ≤ y] = [P(X1 ≤ y)]^k.
This implies
E(Yk) = ∫_0^∞ (1 − [P(X1 ≤ y)]^k) dy.
Since P(X1 ≤ y) ≤ 1 for all y, we have [P(X1 ≤ y)]^{k1} ≥ [P(X1 ≤ y)]^{k2} whenever k1 ≤ k2,
implying that
E(Yk1) ≤ E(Yk2) if k1 ≤ k2.
Similarly, we can show that if P(X) is not a delta function then E(Yk1) < E(Yk2) if k1 < k2.
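As a quick sanity check on this derivation, the monotonic growth of E(Yk) with k can be verified numerically (a sketch; the uniform edge-length distribution and sample size here are illustrative assumptions, not the paper's setup):

```python
import random

def expected_max_edge_length(k, trials=20000, seed=42):
    """Monte Carlo estimate of E(Y_k), the expected maximum of k i.i.d.
    edge lengths, here drawn from Uniform(0, 1) for illustration."""
    rng = random.Random(seed)
    return sum(max(rng.random() for _ in range(k)) for _ in range(trials)) / trials

# For Uniform(0, 1), E(Y_k) = k / (k + 1), so the estimates should grow with k.
estimates = [expected_max_edge_length(k) for k in (1, 2, 5, 10)]
assert all(a < b for a, b in zip(estimates, estimates[1:]))
```

For any non-degenerate edge-length distribution the same monotonicity holds, which is the point of the derivation above.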
Now consider two neighbours n1 and n2 with degrees k1 and k2. The expected distance the
neighbours n1 and n2 can take in the next iteration, irrespective of the direction, is given by
E[Yk1−1] and E[Yk2−1] respectively. This implies that E[Yk1−1] > E[Yk2−1] if k1 > k2. Here, we
approximate X1, X2, X3, . . . , Xk as independent, which is valid when the number of edges
is large. Hence, if we choose a neighbour with higher degree then there is a greater probability of
taking a longer step in the next iteration. Thus one expects that in spatial scale-free networks the
efficient algorithm should combine the direction of travel, quantified by di , and the degree of the
neighbour, ki , into one measure. Since the units of di and ki are different, there is no trivial way of
composition that is optimal. The aim of the measure is to choose a neighbour with smaller di and
larger ki with an intuition that a higher degree node should effectively decrease the distance from
the target—a goal which can be achieved in many different ways. One could give an incentive
g(ki ), and then subtract it from the distance di ; one could also divide di either by ki or by any
increasing function of ki . We investigated the following search algorithms, which cover a broad
spectrum of possibilities.
1. Random walk: the node attempts to reach the target by passing the message to a randomly
selected neighbour.
2. High-degree search: the node passes the message to the neighbour with the highest degree.
The idea here is that by choosing a neighbour that is well-connected, there is a higher
probability of reaching the target node. Note that this algorithm requires the fewest number
of hops to reach the target in unstructured networks [10].
3. Greedy search: the node passes the message to the neighbour i with the smallest di .
This will ensure that the message is always going to the neighbour closest to the target
node.
4. Algorithm 4: the node passes the message to the neighbour i with the smallest measure
di − g(ki ). The function g(ki ) is an incentive for choosing a neighbour of higher degree.
Ideally, g(ki ) should be the expected maximum length of an edge from a node with
degree ki .
5. Algorithm 5: the node passes the message to the neighbour i that has the smallest measure
(di /dm)^ki, where dm is the Euclidian distance between the two most spatially distant nodes in the
network, and is used for normalizing di. We assume that dm is known to all the nodes in the
network. Note that the algorithm prefers the neighbour that has lower di and higher ki.
6. Algorithm 6: the node passes the message to the neighbour i that has the smallest measure
di /ki. Here, again, the algorithm prefers the neighbour that has lower di and higher ki.
7. Algorithm 7: the node passes the message to the neighbour i that has the smallest measure
(di /dm)^(ln ki + 1). This is a conservative version of algorithm 5 with respect to ki.
8. Algorithm 8: the node passes the message to the neighbour i that has the smallest measure
di /(ln ki + 1). This algorithm is a weaker version of algorithm 6 with respect to ki.
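To make the role of these measures concrete, here is a minimal sketch of a decentralized search that plugs in greedy search and algorithms 6 and 8 (the toy graph, coordinates and tie-breaking are illustrative assumptions, not the authors' simulation code):

```python
import math

def decentralized_search(adj, pos, s, t, measure, max_hops=100):
    """Pass a message from s to t, at each step choosing the unvisited
    neighbour i minimising measure(d_i, k_i), where d_i is the Euclidean
    distance from i to the target and k_i is the degree of i."""
    def dist(a, b):
        return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

    visited, current, hops = {s}, s, 0
    while current != t and hops < max_hops:
        # prefer unvisited neighbours; fall back to all neighbours if stuck
        candidates = [i for i in adj[current] if i not in visited] or list(adj[current])
        current = min(candidates, key=lambda i: measure(dist(i, t), len(adj[i])))
        visited.add(current)
        hops += 1
    return hops if current == t else None

greedy = lambda d, k: d                      # greedy search
alg6   = lambda d, k: d / k                  # algorithm 6
alg8   = lambda d, k: d / (math.log(k) + 1)  # algorithm 8

# Toy network: a hub (node 0) connected to everything, plus a chain 1-2-3-4.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [0, 3]}
pos = {0: (0.5, 1.0), 1: (0.0, 0.0), 2: (0.3, 0.0), 3: (0.6, 0.0), 4: (1.0, 0.0)}
for m in (greedy, alg6, alg8):
    assert decentralized_search(adj, pos, 1, 4, m) is not None
```

The `measure` argument is exactly the composition of di and ki discussed above; swapping it changes the routing policy without touching the search loop.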
Algorithms 4 to 8 aim to capture both the direction of travel and the neighbours' degree.
Thus, we expect these algorithms to give smaller path lengths than other algorithms. In the case of
algorithm 4, it would be extremely difficult to define a function independent of the parameters of
the network. Hence, it may not be realistic to use this form of composition for direction of travel
and degree of neighbour. Even greedy search has a slight preference for high-degree nodes, since
the probability of reaching a node with degree k is ∼ kpk [35], where pk is the fraction of nodes
New Journal of Physics 9 (2007) 190 (http://www.njp.org/)
with degree k. Hence, the proposed algorithms have to be extremely competitive to perform better
than greedy search. The algorithms described above are mainly based on intuition. However, as
we discuss later in the paper, the successful strategies are not restricted to these functional forms.
The spatial network model we consider incorporates both preferential attachment and
geographical constraints. At each step during the evolution of the spatial network model one
of the following occurs [36]:
1. with probability p, a new edge is created between two existing nodes in the network;
2. with probability 1 − p, a new node is added and connected to m existing nodes in the
network, with the constraint that multiple edges are not formed.
In both cases, the degrees of the nodes and the distances between them are considered when
forming a new edge. In the first case, two nodes i and j are selected with probability
Πij ∝ ki kj / F(dij),
where ki is the degree of node i, dij is the Euclidian distance between nodes i and j and F(dij) is
an increasing function of dij. A new node i is uniformly and randomly placed in an n-dimensional
space and is connected to a pre-existing node j with probability
Πj ∝ kj / F(dij).
The above process is simulated until the number of nodes in the network is N. Let the network
generated be G(N, p, m, F, n). Here, the preferential attachment mechanism leads to a power-
law degree distribution where the exponent can be tuned by changing the value of p [36] (see
figure 2(a)). F(d) controls the truncation of the power-law decay, and if F(d) increases rapidly,
then the power-law decay regime can disappear altogether [37]. Two widely-used functions for
F(d) are d r [33] and exp(d/dchar ) [37].
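A simplified sketch of this growth process, assuming F(d) = d^r, two dimensions and m = 1 (illustrative only, not the authors' implementation):

```python
import math
import random

def spatial_scale_free(N, p=0.5, r=1.0, seed=0):
    """Grow a 2D spatial network: with probability p a new edge is added
    between two existing nodes, otherwise a new node is placed uniformly at
    random and attached to one existing node. Attachment is proportional to
    degree and inversely proportional to F(d) = d**r."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(2)]
    edges = {(0, 1)}
    deg = [1, 1]

    def f(i, j):  # F(d_ij), with a small floor to avoid division by zero
        return max(math.dist(pos[i], pos[j]), 1e-6) ** r

    while len(pos) < N:
        if rng.random() < p and len(pos) > 2:
            # new edge between existing i, j with probability ~ k_i k_j / F(d_ij)
            pairs = [(i, j) for i in range(len(pos)) for j in range(i + 1, len(pos))
                     if (i, j) not in edges]
            if not pairs:
                continue
            w = [deg[i] * deg[j] / f(i, j) for i, j in pairs]
            i, j = rng.choices(pairs, weights=w)[0]
            edges.add((i, j))
            deg[i] += 1
            deg[j] += 1
        else:
            # new node attached to existing j with probability ~ k_j / F(d_ij)
            pos.append((rng.random(), rng.random()))
            new = len(pos) - 1
            w = [deg[j] / f(new, j) for j in range(new)]
            j = rng.choices(range(new), weights=w)[0]
            edges.add((j, new))
            deg.append(1)
            deg[j] += 1
    return pos, edges, deg

pos, edges, deg = spatial_scale_free(100, p=0.5, r=1.0)
assert len(pos) == 100 and sum(deg) == 2 * len(edges)
```

Tuning p shifts the balance between densification and growth, which is what tunes the power-law exponent in the model; raising r suppresses long edges and truncates the power-law regime.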
considered space. We assume that it is sufficient if the message reaches a small neighbourhood
of the target node defined by a circle with radius D. This is a realistic assumption in many real-
world networks, e.g. it is sufficient if we reach one of the airports in the close neighbourhood of
a destination city (especially when the city has multiple airports). The search process continues
until the message reaches a neighbour of the target node or a node within a circle of radius
D = 50 centred around the target node. In order to avoid passing the message to a neighbour
that has already received the message, a list L is maintained. During the search process, if the
message reaches a node i whose neighbours are all in the list L, then the message is passed to one
of the neighbours using the same algorithm. In the case of random walk or high degree search,
the message is routed back to the previous node and this particular neighbour i is marked to note
that it cannot pass the message any further. If the number of hops exceeds N/2, then the search
process stops, noting that the path was not found. For each search algorithm, the average path
length, l, measured as the number of edges in the path, the average physical distance travelled
along the path, dpath , and the percentage of times the search algorithm is unable to find a path, c,
are computed from the search results obtained for K pairs in 10 instances of the network model.
The lower the value of l, dpath and c, the better the performance of the search algorithm. We use
the shortest average path length and average physical distance obtained by global breadth-first-
search (BFS) algorithm and Dijkstra’s algorithm [38] respectively, as a benchmark for comparing
the performance of the search algorithms.
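The hop-count benchmark can be sketched with a standard breadth-first search (an illustration; Dijkstra's algorithm would play the analogous role for physical distance):

```python
from collections import deque

def bfs_hops(adj, s, t):
    """Fewest-hop path length from s to t via breadth-first search,
    or None if t is unreachable."""
    seen = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return seen[u]
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return None

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
assert bfs_hops(adj, 0, 4) == 3
```

Averaging `bfs_hops` over the same source-target pairs used for the decentralized algorithms gives the global-information baseline l against which the local algorithms are compared.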
Table 1 compares the performance of different search algorithms for the spatial network,
G(1000, 0.72, 1, d r , 2) with r = 1, 2 and 3. We find that the decentralized search algorithms 5,
                     l      c      l      c      l      c      l      c      l      c      l      c
Greedy search        6.55   7.93   2.90   0.09   4.09   0.24   3.10   0.44   3.64   0.18   3.92   0.1
Algorithm 5          3.41   0.02   2.35   0      2.83   0      2.40   0      2.46   0.03   2.55   0
Algorithm 6          3.38   0.04   2.38   0      2.81   0      2.38   0      2.49   0      2.59   0
Algorithm 7          3.59   0.19   2.40   0      2.95   0      2.43   0.01   2.66   0.02   2.78   0
Algorithm 8          4.12   0.73   2.49   <0.01  3.16   <0.01  2.54   0      2.79   0.04   3.01   0.01
Shortest path length 2.91   NA     2.16   NA     2.30   NA     2.26   NA     2.23   NA     2.23   NA
6, 7 and 8 perform as well as the shortest path obtained using global information of the network.
Specifically, the difference between the shortest path and the path obtained by algorithms 6 and
7 is less than a hop. These results are surprising because the latter algorithms only use the local
information in the network, yet they perform as well as the BFS algorithm. This behaviour is
mainly due to the power-law nature of the spatial network: the few high-degree nodes allow
the algorithms to make big jumps during the search process (see table 1). This conclusion
is corroborated by the fact that an increase in r, meaning a decrease in the power-law regime in the
degree distribution [37], induces an increase in the path length. Greedy search which uses only
the direction of travel is able to find short paths (compare l’s in table 1) but for a few node pairs
it is unable to find a path (compare c’s in table 1). Greedy search does not consider the degree
of the nodes and sometimes the algorithm gets stuck in a loop in sparsely connected regions
of the network. In the case of algorithm 4, the composition was not very effective. It is likely
that the values of the coefficients, which are difficult to compute, were not optimal. Moreover,
the optimal values are highly dependent on the parameters and the configuration of the spatial
network. Hence, it would be difficult to generalize the algorithm for all networks and we will
not consider it further in our analysis. Random-walk and high-degree search do not consider the
direction of travel and hence take an exorbitantly large number of hops. Further, we found that
the search algorithms’ performance with respect to the path length l and physical distance metric
dpath was similar. Hence, in the rest of our analysis, we do not discuss these two algorithms and
the physical distance metric since the results do not add significant new information.
Similar results are obtained for a wide range of parameters for the spatial network model.
Table 2 summarizes the results for some of these parameter values. This parameter space covers
a broad range of power-law networks with different properties. For example, as the value of p
changes from 0.3 to 0.8, the power-law exponent of the degree distribution changes from 2.4
to 1.7 (see figure 2(a)), which is the usual range of many real-world networks [1]–[4]. Hence
we can affirm that the spatial network model belongs to a general class of searchable networks.
Figure 2. (a), (b) Degree distribution p(k) versus degree k. (c), (d) Normalized betweenness centrality versus scaled degree k/⟨k⟩.
Although we have restricted our results to a discussion of two-dimensional spatial networks, it
is easy to verify that these results will be valid for higher dimensions. Further, a large number
of decentralized search algorithms are efficient. For instance, in algorithm 6 we divide di by
ki , whereas in algorithm 8 we divide di by ln ki + 1 which scales logarithmically with ki . Both
algorithms are found to be efficient. This implies that a wide range of functions f(x) that scale
between x and ln x can be used for decentralized search. Hence, we find that the dependence of
the search algorithms on the functional forms is weak and the searchability of these networks lies
in their heterogeneous structure rather than the functional forms used in the search algorithm.
Let us consider the US airline network, where nodes are the airports and two nodes are connected
by an edge if there is a direct flight from one airport to another. In this network, navigating along
an edge from one node to another represents flying from one airport to another. Suppose our
objective is to travel from one place to another using the US airline network. In real life, one can
obtain a choice of itineraries from the closest airport to the departure location (departure airport)
to the closest airport to the destination location (destination airport) using various sources such
as travel agents, airline offices or the World Wide Web. These sources have global information
about the network and one can choose the itinerary based on different criteria, such as travel fare,
number of stopovers, or total time of travel. Now consider a different scenario—one in which
we do not have access to the global information of the network, and each airport has only local
information. In other words, each airport has information about the location of the airports it can
fly to and how well these neighbouring airports are connected (their degree). We do know the
location of the departure airport and the destination airport. The objective is to find a path with
the fewest stopovers from the departure airport to the destination. From the departure airport,
and consecutively from each intermediate airport, we choose to fly to one of its neighbours based
on the degree of the neighbouring airport, its location and the location of the destination airport.
This process continues until we reach the destination airport or any other airport within a small
neighbourhood of the destination airport. In real life, it is sufficient if we reach one of the airports
near the destination airport. For example, it is sufficient to reach LaGuardia Airport (LGA),
New York City if the objective is to reach John F Kennedy International Airport (JFK),
New York City. In our study, as a first-order approximation we do not consider the type of
airline or travel fare as important parameters. Even though this method of travel is unrealistic, it
provides insights on the performance of decentralized search algorithms on real-world networks.
network (WWN) [7]. The average path length for the airline network, which is the average
minimum number of flights one has to take to go from one airport to any other, is 3.6. The
clustering coefficient, which quantifies local order of the network measured in terms of the
number of triangles (3-cliques) present, is 0.41. Hence, the US airline network is also a small-
world network [5]. The degree distribution of the network follows a power-law p(k) ∼ k−γ with
exponent γ = 1.9 ± 0.1 (see figure 2(b)), which is close to the exponent of the WWN, 2.0 ± 0.1
[7]. Further, as observed in the WWN, we find that the most connected airports are not necessarily
the most central airports. Figure 2(c) plots the normalized betweenness centrality (BC) of a node
i, (bi /b), where b is the average BC of the network, versus its scaled degree ki /k, where
k is the average degree of the network. The geopolitical considerations used to explain this
phenomenon in the WWN [34] do not apply to the US airline network, as it belongs to a single
country. In fact, this behaviour is due to Alaska, which contains a significant percentage of the
airports (231 of 690, close to 34%), yet only a few (around 6) are connected to airports outside
of Alaska. For instance, the BC of Anchorage, Alaska is significantly higher than its degree
(see figure 2(c)). If we remove the Alaska airports from the network, then we observe better
correlation between the degree of a node and its BC (see figure 2(d)).
If an area is separated from the US mainland (such as Alaska and Hawaii), then very few
airports connect it to the mainland and it may be difficult for search algorithms to capture these
connections between the mainland and the other areas. To investigate the effects of this property
on the search process, we simulate the algorithms on three different networks, namely, the US
airline network, the US airline network without Alaska and the US mainland airline network
without Alaska, Hawaii, Puerto Rico, the US Virgin Islands and the US Pacific Trust Territories
and Possessions (US mainland network). The latter two networks have statistical properties
similar to those of the US airline network. The US airline network without Alaska has 459 nodes
and 2857 edges with 455 nodes and 2856 edges in the LCC; the US mainland network has 431
nodes and 2729 edges with 427 nodes and 2728 edges in the LCC.
                     US airline network     Without Alaska        US mainland network
                     l     c                l     c               l     c
Greedy search        3.93  16806 (3.54%)    2.83  4015 (1.94%)    2.74  3729 (2.05%)
Algorithm 5          5.53  13870 (2.92%)    3.75  456 (0.22%)     2.85  425 (0.23%)
Algorithm 6          4.01  752 (0.16%)      3.17  454 (0.22%)     2.68  425 (0.23%)
Algorithm 7          3.37  688 (0.14%)      2.68  453 (0.22%)     2.93  1 (<0.01%)
Algorithm 8          3.37  41 (<0.01%)      2.76  38 (0.02%)      2.75  39 (0.02%)
Shortest path length 3.02  NA               2.39  NA              2.32  NA
When we looked at the search results in more detail we found a few more interesting
behaviours. The greedy search and algorithm 5 were unable to find paths for approximately the
same number of pairs in the US airline network (3.54% in the case of the former and 2.92%
for the latter). However, there is a difference in the type of paths these search algorithms could
not find. The paths not found by greedy search were distributed uniformly for all departure and
destination nodes; the paths not found by algorithm 5 were due predominantly to the 18 airports
in Alaska, which were unreachable, almost regardless of the starting point. It was interesting
to see that even if we start from Anchorage International Airport (ANC), the most connected
airport in Alaska, these airports were not reachable. This is mainly due to the high affinity of
algorithm 5 for high-degree nodes. The degree of neighbours of ANC which are in Alaska is
small compared to the degree of neighbours on the US mainland. Hence, when we start from
an airport, the algorithm was able to reach Anchorage but afterward selected one of the highly-
connected airports on the US mainland. From that point on, it is difficult to return to Alaska,
since the search algorithm is self-avoiding and since the only other airport that flies to Alaska,
excluding ANC, is Seattle-Tacoma International Airport (SEA). The US airline network without
Alaska and the US mainland network do not have these constraints, and hence algorithm 5 was
able to perform better.
Among the 475 410 pairs of source and destination nodes searched, algorithms 6 and 7
could not reach the destination node 752 and 688 times, respectively. Again, it turns out that
the failure to reach the destination was mainly due to a particular airport, namely, Havre City-
County Airport (HVR) in Montana. Similar behaviour was observed for these algorithms in the
US airline network without Alaska and the US mainland network. HVR is a single-degree node
that is connected to Lewistown Airport (LWT), Montana and the only other airport to which LWT
is connected is Billings Logan International Airport (BIL), Montana which is a well-connected
airport. Hence, the only way to reach HVR would be to reach BIL first and then to fly to LWT.
Unfortunately, none of the algorithms, other than the greedy search, can choose LWT from BIL
when the destination is HVR. Here again, even though the algorithms 5, 6, 7 and 8 are able
to reach BIL, they do not choose LWT as the first choice. Moreover, once they fly out of BIL,
they take many hops to reach BIL again due to the self-avoiding nature of the algorithms. For
instance, when the destination is HVR, algorithms 7 and 8 take, on average, only 2.5 and 3.44
hops respectively to reach BIL. However, to reach HVR they take around 170 and 102 hops,
respectively. The reason why this behaviour is not observed for other single-degree nodes in the
US mainland network is that single-degree nodes are usually connected to high-degree nodes.
The average degree of the neighbours of the single-degree nodes was found to be 82.86, which is
significantly higher than the average degree in the network (12.78). In addition, the only airport
(LWT) that flies to HVR (or to a neighbourhood of HVR) is not chosen by the only other airport
(BIL) that can fly to LWT.
Table 4 gives the percentage of times the path length found by the search algorithms is the
same as the shortest path length. In approximately 90% of the pairs, the path length found by
algorithms 6, 7 and 8 was the same as the shortest path length. Further, in 97% of the pairs,
the path length found was more than the shortest path by a maximum of two hops. Given that
the search algorithms use only local information, these results on the airline networks are quite
striking. Note that this behaviour is due mainly to the inherent structure of the US airline
network, which can be considered a 'searchable network'.
searching, and even slight blending of direction with degree is sufficient to drastically improve
the efficiency of search algorithms. In other words, a search algorithm which traverses based
on direction and that cautiously avoids low-degree nodes should give short paths. However, as
observed with algorithm 5, sometimes high preference for degree may lead the algorithm to
the nodes far away from the destination node. Further, we can conclude that searchability is a
property of the network rather than of the functional forms used for the search algorithm.
The difference between the results obtained on the US airline network and the US mainland
network is not significant (especially for algorithms 7 and 8). This implies that the results can
probably be extended to the WWN [7] which has a very similar structure to the US airline network.
In the US airline network, we have separated areas which are connected to the mainland by only
a few airports. Algorithms 7 and 8 are able to capture these connections in order to travel from
one separated area to another. The WWN will have many more of these separated areas which are
well-connected locally but are sparsely inter-connected. We feel that algorithms 7 and 8 would
be able to find short paths in the WWN; verification would be subject to the availability of data
on the WWN.
The results obtained for the US airline network are perhaps intuitive. For instance, in real life,
if one is asked to travel using only local information, one can usually find a short path, if not always
the shortest path. The significance of the results lies in capturing this phenomenon/intuition in
an algorithm. Clearly, the structure of the network facilitates its searchability. As conjectured
by others, the results presented in this paper support the hypothesis [10, 21] that many real-
world networks evolve to inherently facilitate decentralized search. Furthermore, these results
provide insights for designing the structure of decentralized networks that need effective search
algorithms.
Acknowledgments
The authors would like to acknowledge the National Science Foundation (grants DMI 0537992
and CCF 0643529) for making this work feasible. Any opinions, findings and conclusions or
recommendations expressed in this material are those of the author(s) and do not necessarily
reflect the views of the National Science Foundation.
References
[15] Kleinberg J 2000 Proc. 32nd ACM Symp. Theor. Comput. pp 163–70
[16] Kleinberg J 2001 Adv. Neural Inform. Process. Syst. 14 431
[17] Dodds P, Muhamad R and Watts D J 2003 Science 301 827
[18] Watts D J, Dodds P S and Newman M E J 2002 Science 296 1302
[19] Kim B J, Yoon C N, Han S K and Jeong H 2002 Phys. Rev. E 65 027103
[20] Arenas A, Cabrales A, Diaz-Guilera A, Guimera R and Vega F 2003 Statistical mechanics of complex networks
(Berlin: Springer) chapter ‘Search and Congestion in Complex Networks’ pp 175–94
[21] Kleinberg J 2006 Proc. Int. Cong. Math. 3 1019
[22] Liben-Nowell D, Novak J, Kumar R, Raghavan P and Tomkins A 2005 Proc. Natl Acad. Sci. 102 11623
[23] Menczer F 2002 Proc. Natl Acad. Sci. 99 14014
[24] Sandberg O 2006 Proc. 8th Workshop on Algorithm engineering and experiments (ALENEX) pp 144–55
[25] Simsek O and Jensen D 2005 Proc. 19th Int. Joint Conf. Artificial Intell. pp 304–10
[26] Zhang H, Goel A and Govindan R 2004 Comput. Netw. 46 555
[27] Akyildiz I F, Su W, Sankarasubramaniam Y and Cayirci E 2002 Comput. Netw. 38 393
[28] Raghavan U N and Kumara S R T 2007 Int. J. Sensor Netw. 2 201
[29] Kan G 2001 Peer-to-Peer: Harnessing the Power of Disruptive Technologies (Beijing: O'Reilly) chapter 'Gnutella'
[30] Chakrabarti S, van den Berg M and Dom B 1999 Comput. Netw. 31 1623
[31] Clauset A and Moore C 2003 Preprint cond-mat/0309415
[32] Barabási A L and Albert R 1999 Science 286 509
[33] Yook S H, Jeong H and Barabási A L 2002 Proc. Natl Acad. Sci. 99 13382
[34] Guimera R and Amaral L A N 2004 Eur. Phys. J. B 38 381
[35] Newman M E J, Strogatz S H and Watts D J 2001 Phys. Rev. E 64 026118
[36] Dorogovtsev S and Mendes J F F 2000 Europhys. Lett. 52 33
[37] Barthélemy M 2003 Europhys. Lett. 63 915
[38] Cormen T H, Leiserson C E, Rivest R L and Stein C 2001 Introduction to Algorithms 2nd edn (Cambridge:
MIT Press)
[39] The Bureau of Transportation Statistics online at http://www.transtats.bts.gov/ (date accessed: 20 July 2006)
[40] Kalapala V, Sanwalani V, Clauset A and Moore C 2006 Phys. Rev. E 73 026130
successfully used in dividing networks into two or more communities.

In this paper, we propose a localized community detection algorithm based on label propagation. Each node is initialized with a unique label and at every iteration of the algorithm, each node adopts a label that a maximum number of its neighbors have, with ties broken uniformly randomly. As the labels propagate through the network in this manner, densely connected groups of nodes form a consensus on their labels. At the end of the algorithm, nodes having the same labels are grouped together as communities. As we will show, the advantage of this algorithm over the other methods is its simplicity and time efficiency. The algorithm uses the network structure to guide its progress and does not optimize any specific chosen measure of community strengths. Furthermore, the number of communities and their sizes are not known a priori and are determined at the end of the algorithm. We will show that the community structures obtained by applying the algorithm on previously considered networks, such as Zachary's karate club friendship network and the U.S. college football network, are in agreement with the actual communities present in these networks.

II. DEFINITIONS AND PREVIOUS WORK

As mentioned earlier, there is no unique definition of a community. One of the simplest definitions of a community is a clique, that is, a group of nodes where there is an edge between every pair of nodes. Cliques capture the intuitive notion of a community [6] where every node is related to every other node and hence have strong similarities with each other. An extension of this definition was used by Palla et al. in [14], who define a community as a chain of adjacent cliques. They define two k cliques (cliques on k nodes) to be adjacent if they share k − 1 nodes. These definitions are strict in the sense that the absence of even one edge implies that a clique (and hence the community) no longer exists. k clans and k clubs are more relaxed definitions while still maintaining a high density of edges within communities [14]. A group of nodes is said to form a k clan if the shortest path length between any pair of nodes, or the diameter of the group, is at most k. Here the shortest path only uses the nodes within the group. A k club is defined similarly, except that the subnetwork induced by the group of nodes is a maximal subgraph of diameter k in the network.

Definitions based on degrees (number of edges) of nodes within the group relative to their degrees outside the group were given by Radicchi et al. [15]. If d_i^in and d_i^out are the degrees of node i within and outside of its group U, then U is said to form a strong community if d_i^in > d_i^out for all i ∈ U. If Σ_{i∈U} d_i^in > Σ_{i∈U} d_i^out, then U is a community in the weak sense. Other definitions based on degrees of nodes can be found in [6].

There can exist many different partitions of nodes in the network that satisfy a given definition of community. In most cases [4,22,26–28], the groups of nodes found by a community detection algorithm are assumed to be communities irrespective of whether they satisfy a specific definition or not. To find the best community structures among them we need a measure that can quantify the strength of a community obtained. One of the ways to measure the strength of a community is by comparing the density of edges observed within the community with the density of edges in the network as a whole [6]. If the number of edges observed within a community U is e_U, then under the assumption that the edges in the network are uniformly distributed among pairs of nodes, we can calculate the probability P that the expected number of edges within U is larger than e_U. If P is small, then the observed density in the community is greater than the expected value. A similar definition was recently adopted by Newman [13], where the comparison is between the observed density of edges within communities and the expected density of edges within the same communities in randomized networks that nevertheless maintain every node's degree. This was termed the modularity measure Q, where Q = Σ_i (e_ii − a_i^2). Here e_ii is the observed fraction of edges within group i and a_i^2 is the expected fraction of edges within the same group i. Note that if e_ij is the fraction of edges in the network that run between group i and group j, then a_i = Σ_j e_ij. Q = 0 implies that the density of edges within groups in a given partition is no more than what would be expected by random chance. Q closer to 1 indicates stronger community structures.

Given a network with n nodes and m edges N(n, m), any community detection algorithm finds subgroups of nodes. Let C_1, C_2, ..., C_p be the communities found. In most algorithms, the communities found satisfy the following constraints: (i) C_i ∩ C_j = ∅ for i ≠ j and (ii) ∪_i C_i spans the node set in N. A notable exception is Palla et al. [14] who define communities as a chain of adjacent k cliques and allow community overlaps. It takes exponential time to find all such communities in the network. They use these sets to study the overlapping structure of communities in social and biological networks. By forming another network where a community is represented by a node and an edge between nodes indicates the presence of overlap, they show that such networks are also heterogeneous (fat-tailed) in their node degree distributions. Furthermore, if a community has overlapping regions with two other communities, then the neighboring communities are also highly likely to overlap.

The number of different partitions of a network N(n, m) into just two disjoint subsets is 2^n and increases exponentially with n. Hence we need a quick way to find only relevant partitions. Girvan and Newman [5] proposed a divisive algorithm based on the concept of edge betweenness centrality, that is, the number of shortest paths among all pairs of nodes in the network passing through that edge. The main idea here is that edges that run between communities have higher betweenness values than those that lie within communities. By successively recalculating and removing edges with highest betweenness values, the network breaks down into disjoint connected components. The algorithm continues until all edges are removed from the network. Each step of the algorithm takes O(mn) time and since there are m edges to be removed, the worst-case running time is O(m^2 n). As the algorithm proceeds one can construct a dendrogram (see Fig. 1) depicting the breaking down of the network into disjoint connected components.
NEAR LINEAR TIME ALGORITHM TO DETECT ... PHYSICAL REVIEW E 76, 036106 (2007)
(ii) Set t = 1.
(iii) Arrange the nodes in the network in a random order and set it to X.
(iv) For each x ∈ X chosen in that specific order, let C_x(t) = f(C_{x_i1}(t), ..., C_{x_im}(t), C_{x_i(m+1)}(t − 1), ..., C_{x_ik}(t − 1)). Here f returns the label occurring with the highest frequency among neighbors, and ties are broken uniformly randomly.
(v) If every node has a label that the maximum number of its neighbors have, then stop the algorithm. Else, set t = t + 1 and go to (iii).
Since we begin the algorithm with each node carrying a unique label, the first few iterations result in various small pockets (dense regions) of nodes forming a consensus (acquiring the same label). These consensus groups then gain momentum and try to acquire more nodes to strengthen the group. However, when a consensus group reaches the border of another consensus group, they start to compete for members. The within-group interactions of the nodes can counteract the pressures from outside if there are fewer between-group edges than within-group edges. The algorithm converges, and the final communities are identified, when a global consensus among groups is reached. Note that even though the network as one single community satisfies the stop criterion, this process of group formation and competition discourages all nodes from acquiring the same label in the case of heterogeneous networks with an underlying community structure. In the case of homogeneous networks such as Erdős-Rényi random graphs [31] that do not have community structures, the label propagation algorithm identifies the giant connected component of these graphs as a single community.

Our stop criterion is only a condition and not a measure that is being maximized or minimized. Consequently there is no unique solution, and more than one distinct partition of a network into groups satisfies the stop criterion (see Figs. 4 and 5). Since the algorithm breaks ties uniformly randomly, early on in the iterative process, when possibilities of ties are high, a node may vote in favor of a randomly chosen community. As a result, multiple community structures are reachable from the same initial condition.

If we know the set of nodes in the network that are likely to act as centers of attraction for their respective communities, then it would be sufficient to initialize such nodes with unique labels, leaving the remaining nodes unlabeled. In this case, when we apply the proposed algorithm, the unlabeled nodes will have a tendency to acquire labels from their closest attractor and join that community. Also, restricting the set of nodes initialized with labels will reduce the range of possible solutions that the algorithm can produce. Since it is generally difficult to identify nodes that are central to a community before identifying the community itself, here we give all nodes equal importance at the beginning of the algorithm and provide them each with unique labels.

FIG. 4. (a)–(c) are three different community structures identified by the algorithm on Zachary's karate club network. The communities can be identified by their shades of gray.

We apply our algorithm to the following networks. The first one is Zachary's karate club network, which is a network of friendship among 34 members of a karate club [32]. Over a period of time the club split into two factions due to leadership issues, and each member joined one of the two factions. The second network that we consider is the U.S. college football network, which consists of 115 college teams represented as nodes and has edges between teams that played each other during the regular season in the year 2000 [5]. The teams are divided into conferences (communities) and each team plays more games within its own conference than interconference games. Next is the coauthorship network of 16 726 scientists who have posted preprints on the condensed matter archive at www.arxiv.org; the edges connect scientists who coauthored a paper [33]. It has been
FIG. 5. The grouping of U.S. college football teams into conferences is shown in (a) and (b). Each solution [(a) and (b)] is an aggregate of five different solutions obtained by applying the algorithm on the college football network.
shown that communities in coauthorship networks are made up of researchers working in the same field or are research groups [22]. Along similar lines, one can expect an actor collaboration network to have communities containing actors of a similar genre. Here we consider an actor collaboration network of 374 511 nodes, with edges running between actors who have acted in at least one movie together [3]. We also consider a protein-protein interaction network [34] consisting of 2115 nodes. The communities are likely to reflect functional groupings of this network. And finally we consider a subset of the WWW consisting of 325 729 web pages within the nd.edu domain and hyperlinks interconnecting them [2]. Communities here are expected to be groups of pages on similar topics.

A. Multiple community structures

Figure 4 shows three different solutions obtained for Zachary's karate club network and Fig. 5 shows two different solutions obtained for the U.S. college football network. We will show that even though we obtain different solutions (community structures), they are similar to each other. To find the percentage of nodes classified in the same group in two different solutions, we form a matrix M, where M_ij is the number of nodes common to community i in one solution and community j in the other solution. Then we calculate f_same = (1/2)(Σ_i max_j{M_ij} + Σ_j max_i{M_ij}) × 100/n. Given a network whose communities are already known, a community detection algorithm is commonly evaluated based on the percentage (or number) of nodes that are grouped into the correct communities [22,26]. f_same is similar, whereby fixing one solution we evaluate how close the other solution is to the fixed one, and vice versa. While f_same can identify how close one solution is to another, it is, however, not sensitive to the seriousness of errors. For example, when a few nodes from several different communities in one solution are fused together as a single community in another solution, the value of f_same does not change much. Hence we also use Jaccard's index, which has been shown to be more sensitive to such differences between solutions [35]. If a stands for the pairs of nodes that are classified in the same community in both solutions, b for pairs of nodes that are in the same community in the first solution and different ones in the second, and c vice versa, then Jaccard's index is defined as a/(a + b + c). It takes values between 0 and 1, with higher values indicating stronger similarity between the two solutions. Figure 6 shows the similarities between solutions obtained from applying the algorithm five different times on the same network. For a
FIG. 6. Similarities between five different solutions obtained for each network are tabulated. An entry in the ith row and jth column in the lower triangle of each of the tables is the Jaccard similarity index for solutions i and j of the corresponding network. Entries in the ith row and jth column in the upper triangle of the tables are the values of the measure f_same for solutions i and j in the respective networks. The range of modularity values Q obtained for the five different solutions is also given for each network.
given network, the ijth entry in the lower triangle of the table is the Jaccard index for solutions i and j, while the ijth entry in the upper triangle is the measure f_same for solutions i and j. We can see that the solutions obtained from the five different runs are similar, implying that the proposed label propagation algorithm can effectively identify the community structure of any given network. Moreover, the tight range and high values of the modularity measure Q obtained for the five solutions (Fig. 6) suggest that the partitions denote significant community structures.

B. Aggregate

It is difficult to pick one solution as the best among several different ones. Furthermore, one solution may be able to identify a community that was not discovered in the other, and vice versa. Hence an aggregate of all the different solutions can provide a community structure containing the most useful information. In our case a solution is a set of labels on the nodes in the network, and all nodes having the same label form a community. Given two different solutions, we combine them as follows: let C1 denote the labels on the nodes in solution 1 and C2 denote the labels on the nodes in solution 2. Then, for a given node x, we define a new label as C_x = (C1_x, C2_x) (see Fig. 7). Starting with a network initialized with labels C, we perform the iterative process of label propagation until every node in the network is in a community to which the maximum number of its neighbors belong. As and when new solutions are available, they are combined one by one with the aggregate solution to form a new aggregate solution. Note that when we aggregate two solutions, if a community T in one solution is broken into two (or more) different communities S1 and S2 in the other, then by defining the new labels as described above we are showing preference to the smaller communities S1 and S2 over T. This is only one of the many ways in which different solutions can be aggregated. For other methods of aggregation used in community detection, refer to [26,36,37].

Figure 8 shows the similarities between aggregate solutions. The algorithm was applied on each network 30 times and the solutions were recorded. An ijth entry is the Jaccard index for the aggregate of the first 5i solutions with the ag-
gregate of the first 5j solutions. We observe that the aggregate solutions are very similar in nature, and hence a small set of solutions (5 in this case) can offer as much insight about the community structure of a network as can a larger solution set. In particular, the WWW network, which had low similarities between individual solutions (Jaccard index range 0.4883–0.5931), shows considerably improved similarities (Jaccard index range 0.6604–0.7196) between aggregate solutions.

FIG. 7. An example of aggregating two community structure solutions. t1, t2, t3, and t4 are labels on the nodes in a network obtained from solution 1 and denoted as C1. The network is partitioned into groups of nodes having the same labels. s1, s2, and s3 are labels on the nodes in the same network obtained from solution 2 and denoted as C2. All nodes that had label t1 in solution 1 are split into two groups, with the groups having labels s1 and s2, respectively, while all nodes with labels t2, t3, or t4 in solution 1 have label s3 in solution 2. C represents the new labels defined from C1 and C2.

IV. VALIDATION OF THE COMMUNITY DETECTION ALGORITHM

Since we know the communities present in Zachary's karate club and the U.S. college football network, we explicitly verify the accuracy of the algorithm by applying it on these networks. We find that the algorithm can effectively unearth the underlying community structures in the respective networks. The community structures obtained by using our algorithm on Zachary's karate club network are shown in Fig. 4. While all three solutions are outcomes of the algorithm applied to the network, Fig. 4(b) reflects the true solution [32].

Figure 5 gives two solutions for the U.S. college football network. The algorithm was applied to this network ten different times and the two solutions are the aggregates of the first five and the remaining five solutions. In both Figs. 5(a) and 5(b), we can see that the algorithm can effectively identify all the conferences with the exception of the Sunbelt. The reason for the discrepancy is the following: among the seven teams
FIG. 8. Similarities between aggregate solutions obtained for each network. An entry in the ith row and jth column in the tables is the Jaccard similarity index between the aggregate of the first 5i and the first 5j solutions. While similarities between solutions for the karate club friendship network and the protein-protein interaction network are represented in the lower triangles of the first two tables, the entries in the upper triangles of these two tables are for the U.S. college football network and the coauthorship network, respectively. The similarities between aggregate solutions for the WWW are given in the lower triangle of the third table.
in the Sunbelt conference, four teams (Sunbelt4 = {North-Texas, Arkansas State, Idaho, New Mexico State}) have all played each other, and three teams (Sunbelt3 = {Louisiana-Monroe, Middle-Tennessee State, Louisiana-Lafayette}) have again played one another. There is only one game connecting Sunbelt4 and Sunbelt3, namely, the game between North-Texas and Louisiana-Lafayette. However, four teams from the Sunbelt conference (two each from Sunbelt4 and Sunbelt3) have together played with seven different teams in the Southeastern conference. Hence we have the Sunbelt conference grouped together with the Southeastern conference in Fig. 5(a). In Fig. 5(b), the Sunbelt conference breaks into two, with Sunbelt3 grouped together with Southeastern and Sunbelt4 grouped with an independent team (Utah State), a team from Western Atlantic (Boise State), and the Mountain West conference. The latter grouping is due to the fact that every member of Sunbelt4 has played with Utah State and with Boise State, who have together played five games with four different teams in Mountain West. There are also five independent teams which do not belong to any specific conference and are hence assigned by the algorithm to a conference where they have played the maximum number of their games.

V. TIME COMPLEXITY

It takes near-linear time for the algorithm to run to completion. Initializing every node with a unique label requires O(n) time. Each iteration of the label propagation algorithm takes time linear in the number of edges [O(m)]. At each node x, we first group the neighbors according to their labels [O(d_x)]. We then pick the group of maximum size and assign its label to x, requiring a worst-case time of O(d_x). This process is repeated at all nodes and hence the overall time is O(m) for each iteration.

As the number of iterations increases, the number of nodes that are classified correctly increases. Here we assume that a node is classified correctly if it has a label that the maximum number of its neighbors have. From our experiments, we found that irrespective of n, 95% of the nodes or more are classified correctly by the end of iteration 5. Even in the case of Erdős-Rényi random graphs [31] with n between 100 and 10 000 and average degree 4, which do not have community structures, by iteration 5, 95% of the nodes or more are classified correctly. In this case, the algorithm identified all nodes in the giant connected component as belonging to one community.

When the algorithm terminates, it is possible that two or more disconnected groups of nodes have the same label (the groups are connected in the network via other nodes of different labels). This happens when two or more neighbors of a node receive its label and pass the labels in different directions, which ultimately leads to different communities adopting the same label. In such cases, after the algorithm terminates one can run a simple breadth-first search on the subnetworks of each individual group to separate the disconnected communities. This requires an overall time of O(m + n). When aggregating solutions, however, we rarely find disconnected groups within communities.

VI. DISCUSSION AND CONCLUSIONS

The proposed label propagation process uses only the network structure to guide its progress and requires no external parameter settings. Each node makes its own decision regarding the community to which it belongs based on the communities of its immediate neighbors. These localized decisions lead to the emergence of community structures in a given network. We verified the accuracy of community structures found by the algorithm using Zachary's karate club and the U.S. college football networks. Furthermore, the modularity measure Q was significant for all the solutions obtained, indicating the effectiveness of the algorithm. Each iteration takes a linear time O(m), and although one can observe the algorithm beginning to converge significantly after about five iterations, the mathematical convergence is hard to prove. Other algorithms that run on a similar time scale include the algorithm of Wu and Huberman [26] [with time complexity O(m + n)] and that of Clauset et al. [30], which has a running time of O(n log^2 n).

The algorithm of Wu and Huberman is used to break a given network into only two communities. In this iterative process two chosen nodes are initialized with scalar values 1 and 0, and every node updates its value as the average of the values of its neighbors. At convergence, if a maximum number of a node's neighbors have values above a given threshold, then so will the node. Hence a node tends to be classified into a community to which the maximum number of its neighbors belong. Similarly, if in our algorithm we choose the same two nodes and provide them with two distinct labels (leaving the others unlabeled), the label propagation process will yield communities similar to those of the Wu and Huberman algorithm. However, to find more than two communities in the network, the Wu and Huberman algorithm needs to know a priori how many communities there are in the network. Furthermore, if one knows that there are c communities in the network, the algorithm proposed by Wu and Huberman can only find communities that are approximately of the same size, that is, n/c, and it is not possible to find communities with heterogeneous sizes. The main advantage of our proposed label propagation algorithm over the Wu and Huberman algorithm is that we do not need a priori information on the number and sizes of the communities in a given network; indeed such information usually is not available for real-world networks. Also, our algorithm does not place restrictions on the community sizes. It determines such information about the communities by using the network structure alone.

In our test networks, the label propagation algorithm found communities whose sizes follow approximately a power-law distribution P(S > s) ~ s^(−α), with the exponent α ranging between 0.5 and 2 (Fig. 9). This implies that there is no characteristic community size in the networks, and it is consistent with previous observations [22,30,38]. While the community size distributions for the WWW and coauthorship networks approximately follow power laws with a cutoff, with exponents 1.15 and 1.98, respectively, there is a clear crossover from one scaling relation to another for the
FIG. 9. The cumulative probability distributions of community sizes (s) are shown for the WWW, coauthorship, and actor collaboration networks. They approximately follow power laws, with the exponents as shown.
actor collaboration network. The community size distribution for the actor collaboration network has a power-law exponent of 2 for sizes up to 164 nodes and 0.5 between 164 and 7425 nodes (see Fig. 9).

In the hierarchical agglomerative algorithm of Clauset et al. [30], the partition that corresponds to the maximum Q is taken to be the most indicative of the community structure in the network. Other partitions with high Q values will have a structure similar to that of the maximum Q partition, as these solutions are obtained by progressively aggregating two groups at a time. Our proposed label propagation algorithm, on the other hand, finds multiple significantly modular solutions that have some amount of dissimilarity. For the WWW network in particular, the similarity between five different solutions is low, with the Jaccard index ranging between 0.4883 and 0.5921, yet all five are significantly modular, with Q between 0.857 and 0.864. This implies that the proposed algorithm can find not just one but multiple significant community structures, supporting the existence of overlapping communities in many real-world networks [14].

ACKNOWLEDGMENTS

The authors would like to acknowledge the National Science Foundation (Grants No. SST 0427840, No. DMI 0537992, and No. CCF 0643529). One of the authors (R.A.) acknowledges support from the Sloan Foundation.
[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[2] R. Albert, H. Jeong, and A.-L. Barabási, Nature (London) 401, 130 (1999).
[3] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
[4] M. Newman, SIAM (Soc. Ind. Appl. Math.) Rev. 45, 167 (2003).
[5] M. Girvan and M. Newman, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
[6] S. Wasserman and K. Faust, Social Network Analysis (Cambridge University Press, Cambridge, England, 1994).
[7] L. Danon, A. Díaz-Guilera, and A. Arenas, J. Stat. Mech.: Theor. Exp. P11010 (2006).
[8] J. Eckmann and E. Moses, Proc. Natl. Acad. Sci. U.S.A. 99, 5825 (2002).
[9] G. Flake, S. Lawrence, and C. Giles, Proceedings of the 6th ACM SIGKDD, 2000, pp. 150–160.
[10] R. Guimerà and L. Amaral, Nature (London) 433, 895 (2005).
[11] M. Gustafsson, M. Hornquist, and A. Lombardi, Physica A 367, 559 (2006).
[12] M. B. Hastings, Phys. Rev. E 74, 035102(R) (2006).
[13] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
[14] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Nature (London) 435, 814 (2005).
[15] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, Proc. Natl. Acad. Sci. U.S.A. 101, 2658 (2004).
[16] D. Karger, J. ACM 47, 46 (2000).
[17] B. Kernighan and S. Lin, Bell Syst. Tech. J. 29, 291 (1970).
[18] C. Fiduccia and R. Mattheyses, Proceedings of the 19th Annual ACM IEEE Design Automation Conference, 1982, pp. 175–181.
[19] B. Hendrickson and R. Leland, SIAM (Soc. Ind. Appl. Math.) J. Sci. Comput. 16, 452 (1995).
[20] M. Stoer and F. Wagner, J. ACM 44, 585 (1997).
[21] C. Thompson, Proceedings of the 11th Annual ACM Symposium on Theory of Computing, 1979, pp. 81–88.
[22] M. E. J. Newman, Phys. Rev. E 69, 066133 (2004).
[23] P. Pons and M. Latapy, e-print arXiv:physics/0512106.
[24] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[25] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[26] F. Wu and B. Huberman, Eur. Phys. J. B 38, 331 (2004).
[27] J. P. Bagrow and E. Bollt, Phys. Rev. E 72, 046108 (2005).
[28] L. Costa, e-print arXiv:cond-mat/0405022.
[29] M. E. J. Newman, Eur. Phys. J. B 38, 321 (2004).
[30] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[31] B. Bollobás, Random Graphs (Academic Press, Orlando, FL, 1985).
[32] W. Zachary, J. Anthropol. Res. 33, 452 (1977).
[33] M. Newman, Proc. Natl. Acad. Sci. U.S.A. 98, 404 (2001).
[34] H. Jeong, S. Mason, A.-L. Barabási, and Z. Oltvai, Nature (London) 411, 41 (2001).
[35] G. Milligan and D. Schilling, Multivariate Behav. Res. 20, 97 (1985).
[36] D. Gfeller, J. C. Chappelier, and P. De Los Rios, Phys. Rev. E 72, 056135 (2005).
[37] D. Wilkinson and B. Huberman, Proc. Natl. Acad. Sci. U.S.A. 101, 5241 (2004).
[38] A. Arenas, L. Danon, A. Díaz-Guilera, P. Gleiser, and R. Guimerà, Eur. Phys. J. B 38, 373 (2004).
Int. J. Sensor Networks, Vol. 2, Nos. 3/4, 2007 201
Abstract: In this paper, we study the problem of maintaining the connectivity of a Wireless Sensor Network (WSN) using decentralised topology control protocols. Previous algorithms for topology control require knowledge of the density of nodes (λ) in the sensing region. However, if λ varies continuously over time, updating this information at all nodes is impractical. Therefore, in addition to efficient maintenance of connectivity, we also wish to reduce the control overhead of the topology control algorithm. In the absence of information regarding λ, we study the connectivity properties of WSNs by means of giant components. We show that maintaining an out-degree of five at each node gives rise to a giant connected component in the network. We also show that this is the smallest value that can maintain a giant connected component irrespective of how often or by how much λ changes.
Keywords: wireless sensor networks; topology control; percolation; giant connected component.
Reference to this paper should be made as follows: Raghavan, U.N. and Kumara, S.R.T.
(2007) ‘Decentralised topology control algorithms for connectivity of distributed wireless sensor
networks’, Int. J. Sensor Networks, Vol. 2, Nos. 3/4, pp.201–210.
Biographical notes: Usha Nandini Raghavan is a PhD student in the Department of Industrial
Engineering at the Pennsylvania State University. Her main research interest is in the
self-organisation of complex networks and localised algorithms as applied to wireless networks.
Other research interests include graph theory and supply chain management. She obtained her
Master’s in Mathematics from the Indian Institute of Technology, Madras and Master’s in Industrial
Engineering and Operations Research from the Pennsylvania State University.
with which a given node should communicate (Estrin et al., 1999; Goldsmith and Wicker, 2002; Santi, 2005). There exist many works that concentrate on distributed topology control (Bettstetter, 2002a,b; Blough et al., 2003; Cerpa and Estrin, 2002; Glauche et al., 2003; Li et al., 2001, 2003; Rodoplu and Meng, 1999). Such topology control protocols are required because, very often, the wrong topology can considerably reduce the performance of the system. For example, a sparse network can increase the end-to-end packet delay and threaten the connectivity of the network. On the other hand, a dense network promises a connected network with higher probability, but also leads to higher interference in the network, resulting in limited spatial reuse (Ramanathan and Rosales-Hain, 2000).

In this paper, we are particularly interested in developing topology control algorithms for WSNs that are large-scale and lack a centralised authority. Advantages of such distributed WSNs include the ability to rapidly deploy the nodes in a sensing region (which may be unmanned or inhospitable), the distributed nature which allows for robust network performance, and the lack of single points of failure. In addition, it is also possible to tailor the network design for intended applications (Goldsmith and Wicker, 2002). We further assume that the density of active nodes can vary with time. This variation in densities arises because nodes die due to loss of power (see Figure 1), or, when they have energy harvesting capabilities, the nodes may go through an on/off cycle. In the 'off' period, the nodes harvest energy and do not participate in the sensing and networking tasks. Once they acquire sufficient energy they switch 'on' to join the network.

It has been shown that there exist optimal values for either the transmission radius or the number of neighbours at each node that lead to the connectivity of the network (Gupta and Kumar, 1998; Krishnamachari et al., 2002; Meester and Roy, 1996; Xue and Kumar, 2004). However, in all these cases, the critical values depend on the density of nodes (λ) or, equivalently, the number of nodes (N) in the sensing region. This would imply that in order to maintain connectivity, the topology control protocols will require an update on the current density at all nodes. Such global updates (required at desired time intervals over the entire lifetime of the network) lead to prohibitive control overhead (especially so in our scenario) and we would like to avoid them. Note that maintaining an optimal transmission radius or node degree is important because energy conservation and low interference are among the primary objectives of WSNs (Estrin et al., 1999; Goldsmith and Wicker, 2002; Santi, 2005).

Our focus therefore is to develop decentralised topology control algorithms which can maintain connectivity using only the localised information available at each node. We do not assume any global information (e.g. density) to be available at the nodes. In such cases the best one can hope to do is to pool in as many nodes as possible to form a connected network at any point in time (Santi, 2005). That is, our measure of connectivity is based on the presence of giant components in the network. We will in fact show that this relaxation helps us find density-independent values for the number of neighbours needed to maintain connectivity. Though simulation-based results exist (Santi, 2005), to our knowledge there has been no analytical treatment of this problem. In this paper we attempt to provide one.
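The nearest k-neighbours construction studied in this paper can be illustrated with a short sketch (our illustration, not the authors' code; the function name and point layout are hypothetical): each node forms a directed edge towards each of its k closest nodes in the plane.

```python
import math

def nearest_k_neighbours(points, k):
    """Directed edge list of the nearest k-neighbours model:
    node i points to its k nearest nodes (Euclidean distance)."""
    edges = []
    for i, p in enumerate(points):
        # Sort all other nodes by distance from node i (ties broken by index).
        by_distance = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in by_distance[:k])
    return edges

# Four nodes on a line; with k = 1 each node points to its closest neighbour.
print(nearest_k_neighbours([(0, 0), (1, 0), (2, 0), (5, 0)], k=1))
# → [(0, 1), (1, 0), (2, 1), (3, 2)]
```

Every node has out-degree exactly k by construction, which is the parameter whose critical value the paper derives.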
In this section we present the necessary preliminaries (Section 3.1), two classes of wireless network models (Sections 3.2.1 and 3.2.2) and a new class of models called the nearest k-neighbours model (Section 3.2.3).

3.1 Preliminaries

• Network: an undirected network G(V, E) consists of a set of nodes V, a set of edges E and a function w : E → V × V. That is, every element of the set E is mapped to an ordered pair of points from the set V × V. On the other hand, a directed network again consists of the sets V and E, but has two functions s, t : E → V, where s(e) represents the source and t(e) represents the target of the edge e.

• Degree: the degree of a node v ∈ V is the number of edges incident on that node. In the case of a directed graph we have two different kinds of degrees on a node, namely in-degree and out-degree. While the in-degree of v is the number of edges with their target at v, the out-degree of v is the number of edges whose sources are at v. In undirected graphs the degree of a node is just the number of edges incident on it.

• Path: a (directed) path in a directed network is an alternating sequence of nodes and edges denoted as v0, e1, v1, e2, ..., vi, ei+1, ..., en, vn. Here v0 is the origin and vn is the terminus of the path, and ei+1 is the edge that has its source at vi and target at vi+1. In undirected networks, a path is again an alternating sequence of nodes and edges, where ei+1 is the edge incident on both vi and vi+1.

• Connected network: in undirected networks, if a subset V1 ⊆ V is such that there exists a path between any two nodes x, y ∈ V1, then the network H1(V1, EV1) is called a connected component of the network G(V, E). Here EV1 ⊆ E contains only those edges that have both of their endpoints in V1. Further, H1(V1, EV1) is maximally connected if there exists no V2 ⊆ V such that V1 ⊆ V2 and H2(V2, EV2) is connected. In general it is possible to partition V into disjoint sets V1, V2, ..., Vi (note that V1 ∪ V2 ∪ ... ∪ Vi = V) such that Hj(Vj, EVj) is maximally connected in G, ∀j = 1, 2, ..., i. We call a network connected if and only if in such a partition i = 1 (Definition 1). Also, if i > 1 in such a partition, but the size of the largest component is O(N), then the network is said to have a giant component (Definition 2). If the nodes in the system are assumed to be spread across the entire R² space with some density λ, then the giant component is also called the unbounded connected component. That is, the number of nodes in the largest component is unbounded. However, unless otherwise mentioned, we assume the former definition (Definition 1) for a connected network.

• Poisson point process: given a compact set K (in R^d), a point process X is a measurable mapping from a probability space to the configurations of points of K. The total number of points in a point process is then a random variable. Further, a point process X in R^d is a Poisson point process of intensity λ (where λ = E(X[0, 1]^d)) if (1) for mutually disjoint Borel sets A1, ..., Ak, the random variables X(A1), ..., X(Ak) are mutually independent, and (2) for any bounded Borel set A we have, for every k ≥ 0, P(X(A) = k) = e^{−λℓ(A)} (λℓ(A))^k / k!, where ℓ(·) denotes Lebesgue measure in R^d (Meester and Roy, 1996). In this paper we only consider the case d = 2. Also, we can simulate a Poisson point process of intensity λ in a finite region of area A as follows.

– First generate the number of points in the region of area A from a Poisson distribution of mean λA.

– Place these points in the region uniformly at random.

In most cases for simulation (and as in this paper), it is sufficient to assume that the number of points is λA, instead of generating this number from a Poisson distribution of mean λA. In this case it is called a uniform Poisson point process.

• Percolation: percolation theory studies the flow of fluid across a random medium, in particular on a regular d-dimensional lattice where the edges are either present or absent with probabilities p and 1 − p, respectively (Albert and Barabasi, 2002; Bollobas, 1985; Meester and Roy, 1996). It is obvious that for small p only a few edges are present and hence percolation of a fluid across this medium is not possible. But one of the interesting phenomena is the presence of a percolation threshold pc, at which a percolating cluster of nodes connected by edges begins to appear rather suddenly. That is, for p < pc a percolating cluster does not exist almost surely, while for p > pc it exists almost surely. In simple terms, for small values of p only few edges are present in the network and hence percolation is not possible. As p increases gradually, so does the number of edges in the network, and hence one would expect the possibility of percolation to also increase gradually. On the contrary, with the gradual increase in p, the appearance of a percolating cluster arises rather suddenly. Suppose we consider a network in which any given pair of nodes is connected with a probability p (also known as Erdos-Renyi random graphs (Bollobas, 1985)); we can see from Figure 2 that as p increases gradually from 0, the giant component in the network appears suddenly.

Note that a percolating cluster of nodes in a network is the same as a connected network. In most cases in the literature, percolation properties (connectivity) of large-scale networks (systems) are measured in terms of giant components (Albert and Barabasi, 2002; Bollobas, 1985; Meester and Roy, 1996; Penrose, 2003). Thus critical thresholds are determined for the appearance of a giant component in the networks. This is precisely the kind of approach we will be using in this paper. However, instead of an edge connection probability p, we consider a different parameter k, the number of neighbours of a node in the network, and find critical thresholds for connectivity with respect to k. It is important to note that the critical threshold gives the point where connectivity can be obtained with as few edges as possible.
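The two-step simulation of a Poisson point process described above can be sketched as follows (an illustrative implementation of ours, not code from the paper; Knuth's multiplication method is one standard way to draw the Poisson count and is adequate for moderate λA):

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw a Poisson(mean) variate by Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1

def poisson_point_process(lam, side, rng):
    """Poisson point process of intensity lam on a side x side square."""
    # Step 1: the number of points is Poisson with mean lam * area.
    n = poisson_sample(lam * side * side, rng)
    # Step 2: given n, the points are placed uniformly at random in the region.
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]

rng = random.Random(42)
points = poisson_point_process(lam=50, side=1.0, rng=rng)
```

For the uniform Poisson point process mentioned above, step 1 is skipped and the number of points is simply fixed at λA.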
Figure 2 The graph shows the size of the largest connected component in a network of 500 nodes and a connection probability p. Note that the giant component arises suddenly as p increases gradually.

The Boolean model is usually denoted as (X, ρ), where ρ is the random variable for the radii on the points of X (Meester and Roy, 1996). In this case, the points of X can be thought of as sensor nodes and the radii of the balls represent the transmission radii of the sensors. For the case when ρ = r a.s. (almost surely), numerical values and bounds on the critical threshold of parameters such as λ and r for the appearance of unique giant connected components are available (Dall and Christensen, 2002; Meester and Roy, 1996). Here (and elsewhere) the uniqueness of the giant component is interesting: it implies that there is exactly one giant component, and that there do not exist two disjoint giant components in the network. Also note that the cases where ρ = r a.s. are also called fixed radius models (Krishnamachari et al., 2002).
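The sudden appearance of the giant component shown in Figure 2 is easy to reproduce. The following sketch (our illustration, using a union-find structure; not code from the paper) measures the largest-component fraction of an Erdos-Renyi graph for a given p:

```python
import random
from collections import Counter

def largest_component_fraction(n, p, rng):
    """Fraction of nodes in the largest component of an Erdos-Renyi G(n, p) graph."""
    parent = list(range(n))

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each of the n(n-1)/2 possible edges is present independently with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                parent[find(i)] = find(j)

    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values()) / n

rng = random.Random(7)
# Below the threshold p = 1/n the largest component is tiny;
# well above it, the giant component suddenly spans almost all nodes.
print(largest_component_fraction(500, 0.0005, rng))
print(largest_component_fraction(500, 0.01, rng))
```

Running this for a sweep of p values between 0 and 0.01 reproduces the abrupt transition plotted in Figure 2.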
Note that h is a version of g in which the probabilities of connection between nodes are reduced by a factor p and which is stretched to maintain the same effective area as g. This implies that the presence of even a few long-range connections can help to reach percolation at a lower density of points. They also introduced another connection function f, which is a shift-squeezed version of g. That is, the function g is shifted by a distance s (thus two nodes that are at most a distance s apart will not be connected) and squeezed so that it still has the same effective area as g. It turns out (by means of simulation) that long-range edges are more helpful in the percolation process than short-range edges for a given density of points. This shows the criticality of long-range edges to the connectivity of a network. This is usually referred to as the small-world concept (Watts and Strogatz, 1998).

5 Background on topology control protocols

Topology control can be achieved in various ways: location based mechanisms, direction based mechanisms or neighbour based mechanisms, to name a few (Santi, 2005). Most of the protocols based on such methods try to set the transmission power or radius at the nodes appropriately so as to maintain the connectivity of the network. Note that the transmission power of a node is a measure of how fast its energy depletes.

Location based protocols use information about the positions of the nodes. It is assumed that each node can somehow determine its location accurately (e.g. using GPS). Examples of location based protocols include the R&M protocol (Rodoplu and Meng, 1999) and the Local Minimal Spanning Tree (LMST) protocol (Li et al., 2003). The R&M protocol tries to obtain an optimal topology, in which every node sends messages (in a multihop fashion) to the only master node in the network. To do so, this protocol requires global information to be exchanged between nodes, which will lead to message overhead, especially when the network is highly dynamic. In the LMST protocol, each node builds a minimal spanning tree based on the information available about other nodes up to a predefined distance. The transmission radii of all the nodes are then adjusted to have sufficient power to communicate with the neighbours in their respective LMSTs.

Direction based protocols assume that each node has the capability to somehow determine the direction of all its neighbours. Cone Based Topology Control (CBTC) (Li et al., 2001) is one such protocol, where nodes adjust their transmission radii so as to communicate with the closest nodes in all directions. A parameter ρ is used as a step length to discretise the possible directions in [0, 2π). Bounds on ρ have been determined in Li et al. (2001) to generate a connected network topology.

In neighbour based topology control the nodes, given their transmission radii, are required to have knowledge of their neighbours. The k-Neigh protocol proposed by Blough et al. (2003) is one such protocol that controls the topology by keeping track of the number of neighbours. Here it is assumed that when a node x receives a message from another node y, it can estimate its distance from y. This protocol is simple and uses only few information exchanges between nodes to maintain the topology.

In all the protocols mentioned above, if the nodes are mobile and their densities vary over time, then a large number of information exchanges between nodes will be required to maintain the topology (Santi, 2005). Even though global updates such as the number of active nodes or the geographical locations of nodes can be propagated in the network, this information may become stale in the event of on/off nodes. Using stale information to readjust the transmission radius at the nodes might result in undesirable topologies.

Local Information No Topology (LINT) (Ramanathan and Rosales-Hain, 2000) is a neighbour based protocol that specifically takes into account the mobility of the nodes. When the nodes are mobile, the number of nodes within a given node's transmission radius varies with time. LINT therefore uses only the locally available information about a node's current transmission radius (r_current) and current degree c to maintain connectivity. If the desired degree for connectivity is d, then under the assumption of a uniform random distribution of nodes, the required radius (r_reqd) is calculated using the formula r_reqd = r_current − 5α log(c/d) (Glauche et al., 2003; Ramanathan and Rosales-Hain, 2000). The propagation loss function is assumed to vary as some power α of the distance, and in practice 2 < α < 5. An advantage of this protocol is that it does not assume any information such as the locations of nodes or the directions of neighbours to be present at the nodes. Further, this formula can be used to both increase and decrease the transmission radius according to d. However, as discussed in Section 4.2, the critical value for d depends on N (see Figure 1). Therefore, varying densities over time cannot be handled well using this protocol.

In this paper, we aim to achieve topology control in a distributed manner, assuming no global updates to be available at the nodes. Then, to maintain connectivity, we try to pool in as many nodes as possible into one connected component and wish to maintain a giant component in the network throughout its lifetime. Specifically, for the nearest k-neighbours model, we will show that the critical out-degree kc required for a giant component in the network is 5 and is independent of N or λ. In Blough et al. (to appear) the authors have shown by means of extensive simulation (10,000 instances of networks of sizes between 50 and 500) that for nodes distributed uniformly randomly in a unit square, taking k = 6 will always result in 95% of the nodes being in the largest component. We, on the other hand, assume that O(N) nodes in the largest or giant component is equivalent to 'as good as possible' connectivity and show that 5 is the magic number. Note that ours is an average case analysis. Due to the centrality of measures in such networks (Farago, 2002), all except a very small percentage of instances of the nearest k-neighbours network will have the same statistical properties as the average case. In other words, this means that when k = 5 or above, the network will have a giant component with high probability. This is what we show in the next section.

Our interest in this paper is not in developing protocols for topology control. Instead we assume that there exist efficient protocols that can maintain the connectivity of the network, in a distributed and localised fashion, without the
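The kind of degree-based radius adjustment that neighbour based protocols such as LINT perform can be illustrated by a simple scaling argument (our sketch, not the protocol's exact dB-domain formula): under a uniform node distribution the expected degree grows with the covered area πr², so a node observing degree c at radius r_current can estimate the radius needed for degree d as r_current · (d/c)^(1/2).

```python
def required_radius(r_current, current_degree, desired_degree):
    """Estimate the radius needed to reach desired_degree neighbours,
    assuming expected degree is proportional to covered area (pi * r**2)."""
    if current_degree <= 0:
        raise ValueError("need at least one observed neighbour to estimate density")
    return r_current * (desired_degree / current_degree) ** 0.5

# A node with radius 10 m currently seeing 20 neighbours, but wanting only 5,
# would shrink its radius to 10 * (5/20)**0.5 = 5.0 m.
print(required_radius(10.0, 20, 5))  # → 5.0
```

Like the formula quoted in the text, this works in both directions: it enlarges the radius when the observed degree is below the target and shrinks it otherwise, using only locally observable quantities.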
requirements of any global information (such as density). LINT is one such example. However, it does not specify what the desired number of neighbours d should be. 'Based on the study of such topology control protocols we extract the necessary conditions and constraints to determine a desirable threshold for the number of neighbours: in this case, a density independent threshold for the presence of giant components in WSNs.'

Let kc be the smallest k such that there exists an unbounded connected component almost surely in the nearest k-neighbours model. Then, if kc exists, it must be ≤ kα.

Let rc(λ) be the critical radius for connectivity of a fixed radius network whose nodes are distributed according to a Poisson point process X of intensity λ in R² (Dall and Christensen, 2002; Raghavan et al., 2005). Suppose each node adjusts its transmission radius to accommodate a desired number of neighbours k; then a directed edge from a node towards each of its k neighbours is formed. Let

kα = inf{k | inf_{i∈X} {r_i | out-degree at all nodes in the network = k} ≥ rc(λ)}.

kα is then the smallest k such that the smallest transmission radius required at any node to have k outgoing neighbours is at least rc(λ). Note that even though k is independent of λ, kα might not be. If each node adjusts its transmission radius to form directed edges with at least kα neighbours, there will be edges in both directions between nodes that are no more than a distance rc(λ) apart. Hence 'the fixed radius network with radius rc(λ) becomes a subgraph of the nearest kα-neighbours network'. This is because in the nearest kα-neighbours network each node has a transmission radius of at least rc(λ). Also, since the fixed radius network has an unbounded connected component (by the definition of rc(λ)), so does the nearest kα-neighbours network. Therefore, for a given X and ∀ k ≥ kα, the nearest k-neighbours model has an unbounded connected component.

To determine the value of kα, we know from Cressie (1991) that if Wk is the random variable for the distance of the kth nearest neighbour (k ≥ 1) from a point in X, then the probability density function of Wk is given by

f(w_k) = 2(πλ)^k w_k^{2k−1} e^{−πλw_k²} / (k − 1)!

It immediately follows that E(W_k) = k(2k)! / ((2^k k!)² λ^{1/2}). In order to obtain kα, we consider

P(W_k ≥ rc(λ)) = e^{−4.51} Σ_{y=0}^{k−1} 4.51^y / y!   (3)

and this probability is independent of λ. Thus kα, which is now the smallest k such that the probability in Equation (3) is 1, is in fact independent of λ. However, the above probability tends to 1 only as k → ∞. But we see that even for k around 10 this probability is more than 0.98, and for k about 15 it is arbitrarily close to 1. Hence we can safely assume that k = 15 yields a network in which each node has a transmission radius of at least rc(λ). Thus, by the definition of rc(λ), this network will have an unbounded connected component. Requiring every node to have a transmission radius of at least rc(λ) gives a pessimistic estimate of kc. If on the other hand we find the smallest value of k such that the expected transmission radius of the nodes in the network is at least rc(λ), then we need

E(W_k) = k(2k)! / ((2^k k!)² λ^{1/2}) ≥ √(4.51 / (πλ)) = rc(λ)   (4)

and this implies

k(2k)! / (2^k k!)² ≥ √(4.51 / π)   (5)

and we see that k = 5 is the smallest value for which the above inequality is satisfied (see Figure 4). This also implies that k = 4 is the largest value for which the above inequality
is not satisfied. Hence for values of k up to 4, the nearest k-neighbours model does not have an unbounded connected component (by the definition of rc(λ)). Further, this is true irrespective of the density of nodes in the network. Simulation results also agree that for k = 5 and above the nearest k-neighbours model of any density λ has an unbounded connected component, while for k < 5 there exist no giant components (see Figure 4).

Figure 4 For each fixed k, the graphs show how the size of the largest connected component grows as the number of nodes N increases. Note that while there exists no giant component for k = 3, 4, it appears suddenly for k = 5.

We now show that, compared with a fixed radius model of the same connectivity, the nearest k-neighbours model consumes approximately the same amount of energy. To show this, we need the following. Suppose we restrict the Poisson point process X of intensity λ in R² to a finite region, say a unit square. Then, let the number of nodes in this finite region be N. The length of a graph is the sum of the lengths of its edges. Hence for the kth nearest neighbour graph, in which each node is adjacent to its kth closest neighbour, the length L_{k,N} is given by Avram and Bertsimas (1993),

lim_{N→∞} E(L_{k,N} / N^{1/2}) = (1 / (2π^{1/2})) Σ_{j=1}^{k} Γ(j − 1/2) / (j − 1)!   (6)

Therefore the expected sum of the transmission radii of the sensor nodes in the nearest k-neighbours model is the same as the expected length of the kth nearest neighbour graph. Also, the sum of the transmission radii (L_{r,N}) in a fixed radius model is √(dN/π), where d is the desired connectivity. Taking k = 5 and N sufficiently large in Equation (6), we get E(L_{k,N}) ≈ 2.1809(N/π)^{1/2}. Also, for the same connectivity, that is taking d = 5, we have E(L_{r,N}) ≈ 2.2361(N/π)^{1/2}. On comparison we see that for a fixed N and k, L_{r,N} ≈ L_{k,N}.
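The constants in this derivation are easy to check numerically. The sketch below (our illustration, not the authors' code) evaluates the probability in Equation (3), finds the smallest k satisfying inequality (5), and reproduces the length coefficients compared at the end:

```python
import math

CRITICAL_MEAN_DEGREE = 4.51  # pi * lambda * rc(lambda)**2

def prob_wk_exceeds_rc(k):
    """Equation (3): P(W_k >= rc) = e^{-4.51} * sum_{y=0}^{k-1} 4.51^y / y!."""
    return math.exp(-CRITICAL_MEAN_DEGREE) * sum(
        CRITICAL_MEAN_DEGREE ** y / math.factorial(y) for y in range(k)
    )

def lhs_eq5(k):
    """Left side of inequality (5): k * (2k)! / (2^k * k!)**2."""
    return k * math.factorial(2 * k) / (2 ** k * math.factorial(k)) ** 2

# Smallest k satisfying inequality (5), i.e. the critical out-degree.
rhs = math.sqrt(CRITICAL_MEAN_DEGREE / math.pi)
k_c = next(k for k in range(1, 20) if lhs_eq5(k) >= rhs)

def length_coefficient(k):
    """Coefficient of N^{1/2} in Equation (6) for the kth nearest neighbour graph."""
    return sum(math.gamma(j - 0.5) / math.factorial(j - 1)
               for j in range(1, k + 1)) / (2 * math.sqrt(math.pi))

print(k_c)                                                   # → 5
print(round(length_coefficient(5) * math.sqrt(math.pi), 4))  # → 2.1809
print(round(math.sqrt(5.0), 4))                              # → 2.2361
```

The sum in Equation (6) telescopes to the left side of inequality (5), which is why the two length coefficients agree so closely.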
so that the number of neighbours (in the directed sense, or simply the out-degree) is 5. Due to the distributed and localised nature of this algorithm, it is scalable to large numbers of nodes in the sensing region. This nature also helps the network to adapt to constantly changing densities and mobility of the nodes. Further, the degrees of the nodes are bounded by k, keeping interference low.

The critical values (derived in the previous section) are in some sense the optimal node degrees for the worst case scenario. This is because for any node degree less than 5 the network does not have a giant component of bidirectional links, and for any node degree less than 4 the network does not have a giant strongly connected component. Therefore any decentralised neighbour-based topology control protocol can employ higher values of k than derived above. However, the aim should be to ensure that in the worst case k does not drop below 5 (or 4).

8 Conclusion

In this paper, we have considered the efficient maintenance of connectivity of a wireless sensor network. In addition to energy efficient values for the critical degree of the nodes, we considered the constraints and requirements from the view of topology control protocols when the network is highly dynamic. In particular, in the presence of mobile on/off nodes, it is desirable for topology control protocols to use only localised information in a distributed manner. We therefore assume that no global update, such as the current density λ, is available at the nodes. We use neighbour based topology control because it does not require any information such as the locations of nodes or the directions of neighbours, which is desirable in the presence of mobile nodes (Santi, 2005). In such a case, we have shown that when nodes adjust their transmission radius to maintain a fixed out-degree k, then 5 is the critical threshold beyond which a giant component exists almost surely in the network. Further, this is true irrespective of any change in λ as time varies. To our knowledge we are among the first to provide an analytical treatment of this problem. Such density independent thresholds are especially helpful in the efficient maintenance of the topology in the presence of mobile on/off nodes.

Acknowledgements

This work has been supported by the National Science Foundation, USA, under the grant NSF-SST 0427840. Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Albert, R. and Barabasi, A.L. (2002) 'Statistical mechanics of complex networks', Reviews of Modern Physics, Vol. 74, No. 1, pp.47–97.

Appel, M.J.B. and Russo, R. (2002) 'The connectivity of a graph on uniform points on [0, 1]^d', Statistics and Probability Letters, Vol. 60, pp.351–357.

Avram, F. and Bertsimas, D. (1993) 'On central limit theorems in geometrical probability', The Annals of Applied Probability, Vol. 3, No. 4, pp.1033–1046.

Bettstetter, C. (2002a) 'On the connectivity of wireless multihop networks with homogeneous and inhomogeneous range assignment', Proceedings of the IEEE Vehicular Technology Conference, Vol. 3, pp.1706–1710.

Bettstetter, C. (2002b) 'On the minimum node degree and connectivity of a multihop wireless network', Proceedings of the ACM MobiHoc, pp.80–91.

Bharathidasan, A. and Ponduru, V.A.S. (2005) 'Sensor networks: an overview', Available at: http://www.cs.binghamton.edu/∼kliu/survey.pdf.

Blough, D., Leoncini, M., Resta, G. and Santi, P. (2003) 'The k-neigh protocol for symmetric topology control in ad hoc networks', Proceedings of the IEEE MobiHoc, pp.141–152.

Blough, D.M., Leoncini, M., Resta, G. and Santi, P. (to appear) 'The k-neighbors approach to interference bounded and symmetric topology control in ad hoc networks', IEEE Transactions on Mobile Computing.

Bollobas, B. (1985) Random Graphs, Orlando, FL: Academic Press.

Booth, L., Bruck, J., Franceschetti, M. and Meester, R. (2003) 'Covering algorithms, continuum percolation and the geometry of wireless networks', The Annals of Applied Probability, Vol. 13, No. 2, pp.722–741.

Cerpa, A. and Estrin, D. (2002) 'ASCENT: adaptive self-configuring sensor networks topologies', Proceedings of the IEEE INFOCOM, Vol. 3, pp.1278–1287.

Cressie, N.A.C. (1991) Statistics for Spatial Data, Wiley Series in Probability and Mathematical Statistics, USA: John Wiley and Sons.

Dall, J. and Christensen, M. (2002) 'Random geometric graphs', Physical Review E, Vol. 66, 016121.

Estrin, D., Govindan, R., Heidmann, J. and Kumar, S. (1999) 'Next century challenges: scalable coordination in sensor networks', Proceedings of the ACM MobiCom, pp.263–270.

Farago, A. (2002) 'Scalable analysis and design of ad hoc networks via random graph theory', Proceedings of Dial-M, pp.43–50.

Farago, A. (2004) 'On the fundamental limits of topology control', Proceedings of the Joint Workshop on Foundations of Mobile Computing DIALM-POMC, pp.1–7.

Franceschetti, M., Booth, L., Cook, M., Meester, R. and Bruck, J. (2003) 'Percolation in multi-hop wireless networks', IEEE Transactions on Information Theory, Available at: http://www.paradise.caltech.edu/papers/etr055.pdf.

Glauche, I., Krause, W., Sollacher, R. and Greiner, M. (2003) 'Continuum percolation of wireless ad hoc communication networks', Physica A, Vol. 325, pp.577–600.

Goldsmith, A.J. and Wicker, S.B. (2002) 'Design challenges for energy-constrained ad hoc wireless networks', IEEE Wireless Communications, Vol. 9, No. 4, pp.8–27.

Gupta, P. and Kumar, P.R. (1998) 'Critical power for asymptotic connectivity in wireless networks', in Stochastic Analysis, Control, Optimization and Applications: A Volume in Honor of W.H. Fleming, Boston: Birkhauser, pp.547–566.

Hac, A. (2003) Wireless Sensor Network Designs, England: John Wiley and Sons.

Hogg, R.V., McKean, J.W. and Craig, A.T. (2005) Introduction to Mathematical Statistics, USA: Pearson Prentice Hall.
Hou, T. and Li, V.O.K. (1986) 'Transmission range control in multihop packet radio networks', IEEE Transactions on Communications, Vol. 34, No. 1, pp.38–44.

Kleinrock, L. and Silvester, J. (1978) 'Optimum transmission radii for packet radio networks or why six is a magic number', Proceedings of the IEEE National Telecommunications Conference, pp.4.3.1–4.3.5.

Krishnamachari, B., Wicker, S.B., Bejar, R. and Pearlman, M. (2002) 'Critical density thresholds in distributed wireless networks', in Communications, Information and Network Security, Kluwer Publishers.

Li, L., Halpern, J.Y., Bahl, P., Wang, Y.M. and Wattenhofer, R. (2001) 'Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks', Proceedings of the ACM Symposium on Principles of Distributed Computing, pp.264–273.

Li, N., Hou, J.C. and Sha, L. (2003) 'Design and analysis of an MST-based topology control algorithm', Proceedings of the IEEE INFOCOM, Vol. 3, pp.1702–1712.

Meester, R. and Roy, R. (1996) Continuum Percolation, Cambridge, UK: Cambridge University Press.

Penrose, M.D. (1999) 'On k-connectivity for a geometric random graph', Random Structures and Algorithms, Vol. 15, No. 2, pp.145–164.

Penrose, M.D. (2003) Random Geometric Graphs, Oxford Studies in Probability, Oxford: Oxford University Press.

Philips, T.K., Panwar, S.S. and Tantawi, A.N. (1989) 'Connectivity properties of a packet radio network model', IEEE Transactions on Information Theory, Vol. 35, No. 5, pp.1044–1047.

Quintanilla, J., Torquato, S. and Ziff, R.M. (2000) 'Efficient measurement of the percolation threshold for fully penetrable discs', Journal of Physics A: Mathematical and General, Vol. 33, pp.L399–L407.

Quintanilla, J. (2001) 'Measurement of the percolation threshold for fully penetrable discs of different radii', Physical Review E, Vol. 63, 061108.

Raghavan, U.N., Thadakamalla, H.P. and Kumara, S.R.T. (2005) 'Phase transitions and connectivity of distributed wireless sensor networks', Advanced Computing and Communications 2005.

Ramanathan, R. and Rosales-Hain, R. (2000) 'Topology control of multihop wireless networks using transmit power adjustment', Proceedings of the IEEE INFOCOM, pp.404–413.

Rodoplu, V. and Meng, T.H. (1999) 'Minimum energy mobile wireless networks', IEEE Journal on Selected Areas in Communications, Vol. 17, No. 8, pp.1333–1344.

Santi, P. (2005) Topology Control in Wireless Ad Hoc and Sensor Networks, Chichester, UK: John Wiley and Sons.

Takagi, H. and Kleinrock, L. (1984) 'Optimal transmission ranges for randomly distributed packet radio terminals', IEEE Transactions on Communications, Vol. 32, No. 3, pp.246–257.

Watts, D.J. and Strogatz, S.H. (1998) 'Collective dynamics of small world networks', Nature, Vol. 393, pp.440–442.

Xue, F. and Kumar, P.R. (2004) 'The number of neighbors needed for connectivity of wireless networks', Wireless Networks, Vol. 10, pp.169–181.

Ye, W. and Heidmann, J. (2003) 'Medium access control in wireless sensor networks', USC/ISI Technical Report, ISI-TR-580.
OPERATIONS RESEARCH AND MANAGEMENT
SCIENCE HANDBOOK
December 1, 2006
Chapter 11

Complexity and Large-Scale Networks
11.1 Introduction
In the past few decades, graph theory has been a powerful analytical tool for understanding
and solving various problems in operations research (OR). Study on graphs (or networks)
traces back to the solution of the Königsberg bridge problem by Euler in 1735. In Königsberg,
the river Pregel flows through the town dividing it into four land areas A, B, C and D as
shown in figure 11.1 (a). These land areas are connected by seven (1 - 7) different bridges.
The Königsberg bridge problem is to find whether it is possible to traverse through the
city on a route that crosses each bridge exactly once, and return to the starting point.
Euler formulated the problem using a graph theoretical representation and proved that the
traversal is not possible. He represented each land area as a vertex (or node) and each bridge
as an edge between two nodes (land areas) as shown in figure 11.1 (b). Then, he posed the
question as whether there exists a path that passes through every edge exactly once and ends
at the start node. This path was later termed an Eulerian Circuit. Euler proved that for a
graph to have an Eulerian Circuit, all the nodes in the graph need to have an even degree.
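Euler's even-degree condition is straightforward to check programmatically. A small sketch (our illustration), applied to the Königsberg multigraph, where the condition is necessary (and, for connected graphs, also sufficient):

```python
from collections import Counter

def has_even_degrees(edges):
    """Euler's necessary condition for an Eulerian Circuit:
    every node touched by an edge must have even degree."""
    degree = Counter()
    for u, v in edges:  # a multigraph: repeated pairs are parallel bridges
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())

# The seven Königsberg bridges between land areas A, B, C and D.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(has_even_degrees(konigsberg))  # → False (degrees 5, 3, 3, 3 are all odd)
print(has_even_degrees([("A", "B"), ("B", "C"), ("C", "A")]))  # → True
```

The Königsberg graph fails the test, which is exactly Euler's impossibility argument.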
Euler’s great insight lay in representing the Königsberg bridge problem as a graph problem
with a set of vertices and edges. Later, in the twentieth century, graph theory developed
into a substantial area of study, applied to solve various problems in engineering and
several other disciplines [7]. For example, consider the problem of finding the shortest route
between two geographical points. The problem can be modeled as a shortest path problem
on a network, where different geographical points are represented as nodes and they are
connected by an edge if there exists a direct path between the two nodes. The weights on
the edges represent the distance between the two nodes (see figure 11.2). Let the network
be G(V, E), where V is the set of all nodes, E is the set of edges (i, j) connecting the nodes,
and w is a function such that w_ij is the weight of the edge (i, j). The shortest path problem
from node s to node t can be formulated as follows:

    minimize   Σ_{(i,j)∈E} w_ij x_ij

    subject to   Σ_{j|(i,j)∈E} x_ij − Σ_{j|(j,i)∈E} x_ji = 1 if i = s; −1 if i = t; 0 otherwise,

                 x_ij ≥ 0, ∀(i, j) ∈ E,

where x_ij = 1 or 0 depending on whether or not the edge from node i to node j belongs to
the optimal path. Many algorithms have been proposed to solve the shortest path problem
[7]. Using one such popular algorithm (Dijkstra's algorithm [7]), we find the shortest path
from node 10 to node 30 to be 10 - 1 - 3 - 12 - 30 (see figure 11.2). Note that this problem,
like other problems considered in traditional graph theory, requires finding the exact optimal
solution.
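For illustration, Dijkstra's algorithm can be sketched as follows. The edge weights below are hypothetical (the exact distances of figure 11.2 are not reproduced in the text), chosen so that the optimal route matches the path 10 - 1 - 3 - 12 - 30 discussed above:

```python
import heapq

def dijkstra(adj, s, t):
    """Return (distance, path) from s to t; adj maps node -> list of (neighbor, weight)."""
    dist, prev = {s: 0}, {}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, u = [t], t
    while u != s:               # walk predecessors back to the source
        u = prev[u]
        path.append(u)
    return dist[t], path[::-1]

# Hypothetical distances; figure 11.2's actual edge weights are not given in the text.
adj = {10: [(1, 2), (5, 9)], 1: [(3, 1), (5, 4)], 5: [(12, 7)],
       3: [(12, 3)], 12: [(30, 2)], 30: []}
print(dijkstra(adj, 10, 30))  # → (8, [10, 1, 3, 12, 30])
```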
In the last few years there has been an intense amount of activity in understanding and
characterizing large-scale networks, which has led to the development of a new branch of
science called “Network Science” [108].

Figure 11.1: Königsberg bridge problem. (a) The river flowing through the town divides it
into four land areas A, B, C, and D. The land areas are connected by seven bridges numbered
from 1 to 7. (b) Graph theoretical representation of the Königsberg bridge problem. Each
node represents a land area and each edge represents a bridge connecting two land areas.

Figure 11.2: Illustration of a typical optimization problem in OR. The objective is to find
the shortest path from node 10 to node 30. The values on the edges represent the distances
between nodes. Here we use the exact distances between different nodes to calculate the
shortest path 10 - 1 - 3 - 12 - 30.

The scale of these networks is substantially different from the networks considered in
traditional graph theory, and the problems posed on them are also very different. These
large-scale networks are referred to as complex networks; we will discuss the reasons why
they are termed “complex” networks later, in section 11.4. The following are examples of
complex networks:
networks:
• World Wide Web: It can be viewed as a network where web pages are the nodes and
hyperlinks connecting one webpage to another are the directed edges. The World Wide
Web is currently the largest network for which topological information is available. It
had approximately one billion nodes at the end of 1999 [89] and is continuously growing
at an exponential rate. A recent study [66] estimated the size to be 11.5 billion nodes
as of January 2005.
• Phone call network : The phone numbers are the nodes and every completed phone
call is an edge directed from the caller to the receiver. Abello et al. [4] constructed
a phone call network from the long distance telephone calls made during a single day
which had 53,767,087 nodes and over 170 million edges.
• Power grid network : Generators, transformers, and substations are the nodes and
high-voltage transmission lines are the edges. The power grid network of the western
United States had 4941 nodes in 1998 [143]. The North American power grid consisted
of 14,099 nodes and 19,657 edges [16] in 2005.
• Airline network : Nodes are the airports and an edge between two airports represents the
presence of a direct flight connection [29, 65]. Barthelemy et al. [29] have analyzed the
International Air Transportation Association database to form the world-wide airport
11.1. INTRODUCTION 5
network. The resulting network consisted of 3880 nodes and 18810 edges in 2002.
• Market graph: Recently, Boginski et al. [32, 33] represented the stock market data
as a network where the stocks are nodes and two nodes are connected by an edge if
their correlation coefficient calculated over a period of time exceeds a certain threshold
value. The network had 6556 nodes and 27,885 edges for the U.S. stock data during
the period 2000-2002 [33].
• Scientific collaboration networks: Scientists are represented as nodes and two nodes
are connected if the two scientists have written an article together. Newman [99,
100] studied networks constructed from four different databases spanning biomedical
research, high-energy physics, computer science and physics. One of these networks,
formed from the Medline database for the period from 1961 to 2001, had 1,520,251 nodes
and 2,163,923 edges.
• Movie actor collaboration network : Another well studied network is the movie actor
collaboration network, formed from the Internet Movie Database [1], which contains all
the movies and their casts since the 1890s. Here again, the actors are represented as nodes
and two nodes are connected by an edge if the two actors have performed together in
a movie. This is a continuously growing network, with 225,226 nodes and 13,738,786
edges in 1998 [143].
The above are only a few examples of complex networks pervasive in the real world
[13, 31, 49, 101]. The tools and techniques of traditional graph theory were developed for
networks of tens, hundreds, or in extreme cases thousands of nodes. The substantial growth
in the size of many such networks [see figure 11.3] necessitates a different approach for
analysis and design. The new methodology applied to analyzing complex networks is similar
to the statistical physics approach to complex phenomena.
The study of large-scale complex systems has always been an active research area in
various branches of science, especially in the physical sciences. Some examples are: fer-
romagnetic properties of materials, statistical description of gases, diffusion, formation of
crystals, etc. For instance, let us consider a box containing one mole (6.022 × 10^23) of gas
atoms as our system of analysis [see figure 11.4 (a)]. If we represent the system with the
Figure 11.3: Pictorial description of the change in scale in the size of the networks found
in many engineering systems. This change in size necessitates a change in the analytical
approach.
microscopic properties of the individual particles such as their position and velocity, then it
would be next to impossible to analyze the system. Rather, physicists use statistical me-
chanics to represent the system and calculate macroscopic properties such as temperature,
pressure etc. Similarly, in networks such as the Internet and WWW, where the number
of nodes is extremely large, we have to represent the network using macroscopic properties
(such as degree distribution, edge-weight distribution etc), rather than the properties of in-
dividual entities in the network (such as the neighbors of a given node, the weights on the
edges connecting this node to its neighbors etc) [see figure 11.4 (b)]. Now let us consider
the shortest path problem in such networks (for instance, WWW). We rarely require specific
shortest path solutions such as from node A to node B (from webpage A to webpage B).
Rather, it is useful to know the average distance (number of hops) from any node to any
other node (any webpage to any other webpage) in order to understand dynamical processes
(such as search in the WWW). This new approach to understanding networked systems provides
new techniques as well as challenges for solving conceptual and practical problems in this
field. Furthermore, this approach has become feasible and received a considerable boost by
the availability of computers and communication networks which have made the gathering
and analysis of large-scale data sets possible.
The objective of this chapter is to introduce this new direction of inter-disciplinary re-
search (Network Science) and discuss the new challenges for the OR community. During
the last few years there has been a tremendous amount of research activity dedicated to the
Figure 11.4: Illustration of the analogy between a box of gas atoms and complex networks.
(a) A mole of gas atoms (6.022 × 10^23 atoms) in a box. (b) An example of a large-scale
network. For analysis, we need to represent both systems using statistical properties.
study of these large-scale networks. This activity was mainly triggered by significant find-
ings in real-world networks which we will elaborate later in the chapter. There was a revival
of network modeling, which gave rise to many path-breaking results [13, 31, 49, 101] and
provoked vivid interest across different disciplines of the scientific community. Until now,
a major part of this research has been contributed by physicists, mathematicians, sociologists
and biologists. However, the ultimate goal of modeling these networks is to understand and
optimize the dynamical processes taking place in the network. In this chapter, we address
the urgent need and opportunity for the OR community to contribute to the fast-growing
inter-disciplinary research on Network Science. The methodologies and techniques developed
till now will definitely aid the OR community in furthering this research.
The following is the outline of the chapter. In section 11.2, we introduce different sta-
tistical properties that are prominently used for characterizing complex networks. We also
present the empirical results obtained for many real complex networks that initiated a revival
of network modeling. In section 11.3, we summarize different evolutionary models proposed
to explain the properties of real networks. In particular, we discuss Erdős-Rényi random
graphs, small-world networks, and scale-free networks. In section 11.4, we discuss briefly
why these networks are called “complex” networks, rather than large-scale networks. We
summarize typical behaviors of complex systems and demonstrate how real networks exhibit
these behaviors. In section 11.5, we discuss optimization in complex networks by
concentrating on two specific processes, robustness and local search, which are most relevant
to engineering networks. We discuss the effects of statistical properties on these processes
and demonstrate how they can be optimized. Further, we briefly summarize a few more
important topics and give references for further reading. Finally, in section 11.6, we conclude
and discuss future research directions.
11.2 Statistical Properties of Complex Networks

In this section, we explain some of the statistical properties that are prominently used in the
literature. These statistical properties help in classifying different kinds of complex networks.
We discuss their definitions and present the empirical findings for many real networks.
Let G(V, E) be a network where V is the collection of entities (or nodes) and E is the set
of arcs (or edges) connecting them. A path between two nodes u and v in the network G
is a sequence [u = u_1, u_2, ..., u_n = v], where the u_i are nodes in G and there exists an
edge from u_{i−1} to u_i in G for all i. The path length is defined as the sum of the weights
on the edges along the path. If all the edges in the network are equivalent, then the path
length is equal to the number of edges (or hops) along the path. The average path length (l)
of a connected network is the average of the shortest path lengths from each node to every
other node in the network. It is given by
l ≡ ⟨d(u, w)⟩ = (1/(N(N − 1))) Σ_{u∈V} Σ_{w≠u, w∈V} d(u, w),

where N is the number of nodes in the network and d(u, w) is the length of the shortest
path between u and w. Table 11.1 shows the values of l for many different networks. We
observe that despite the large size of the networks (w.r.t. the number of nodes), the average
path length is small. This implies that any node can reach any other node in the network
in a relatively small
Table 11.1: Average path length of many real networks. Note that despite the large size of
the network (w.r.t. the number of nodes), the average path length is very small.
number of steps. This characteristic phenomenon, that most pairs of nodes are connected
by a short path through the network, is called the small-world effect.
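The average-path-length formula above can be computed directly by breadth-first search for unweighted networks. A minimal sketch (the toy graph is our own example; the real networks of table 11.1 are far larger):

```python
from collections import deque

def average_path_length(adj):
    """Average shortest-path length over all ordered node pairs of a connected,
    unweighted network: l = (1/(N(N-1))) * sum of d(u, w)."""
    nodes = list(adj)
    total = 0
    for s in nodes:
        dist = {s: 0}
        q = deque([s])
        while q:                      # breadth-first search from s
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
    n = len(nodes)
    return total / (n * (n - 1))

# Toy example: a 4-node path a-b-c-d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(average_path_length(adj))
```

For the 4-node path the ordered-pair distances sum to 20, giving l = 20/12 ≈ 1.67.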
The existence of the small-world effect was first demonstrated by the famous experiment
conducted by Stanley Milgram in the 1960s [92] which led to the popular concept of six
degrees of separation. In this experiment, Milgram randomly selected individuals from Wi-
chita, Kansas and Omaha, Nebraska to pass on a letter to one of their acquaintances by mail.
These letters had to finally reach a specific person in Boston, Massachusetts; the name and
profession of the target was given to the participants. The participants were asked to send
the letter to one of their acquaintances whom they judged to be closer (than themselves) to
the target. Anyone who received the letter subsequently would be given the same information
and asked to do the same until it reached the target person. Over many trials, the average
length of these acquaintance chains for the letters that reached the targeted node was found
to be approximately 6. That is, there is an acquaintance path of average length 6 in the
social network of people in the United States. We will discuss another interesting and even
more surprising observation from this experiment in section 11.5.2. Currently, Watts et al.
[145] are conducting an Internet-based study to verify this phenomenon. The small-world
effect also has consequences for spreading processes: the phenomenon implies that, within
a few steps, information (or a disease) could spread to a large fraction of most of the real
networks.
The clustering coefficient characterizes the local transitivity and order in the neighborhood
of a node. It is measured in terms of the number of triangles (3-cliques) present in the network.
Consider a node i which is connected to k_i other nodes. The number of possible edges
between these k_i neighbors (each of which would form a triangle with node i) is
k_i(k_i − 1)/2. The clustering coefficient of node i is the ratio of the number of edges E_i
that actually exist between these k_i nodes to the total number k_i(k_i − 1)/2 possible, i.e.,

C_i = 2E_i / (k_i(k_i − 1)).

The clustering coefficient of the whole network (C) is then the average of the C_i's over all
the nodes in the network, i.e., C = (1/N) Σ_i C_i (see figure 11.5). The clustering coefficient is high
for many real networks [13, 101]. In other words, in many networks if node A is connected
to node B and node C, then there is a high probability that node B and node C are also
connected. With respect to social networks, it means that it is highly likely that two friends
of a person are also friends, a feature analyzed in detail in the so-called theory of balance
[43].
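The definitions of C_i and C above translate directly into code. A sketch on a toy graph of our own (a triangle plus a pendant node):

```python
def clustering_coefficients(adj):
    """Per-node clustering C_i = 2*E_i / (k_i*(k_i-1)) and the network average C."""
    C = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            C[i] = 0.0          # convention: nodes with degree < 2 get C_i = 0
            continue
        # E_i: edges that actually exist among the k neighbors of i
        E = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        C[i] = 2.0 * E / (k * (k - 1))
    return C, sum(C.values()) / len(C)

# A triangle {1,2,3} plus a pendant node 4 attached to node 3.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
C, avg = clustering_coefficients(adj)
print(C[3], avg)  # C_3 = 2*1/(3*2) = 1/3
```

Node 3 has three neighbors with only one edge among them, so C_3 = 1/3, while the two other triangle nodes have C_i = 1.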
The degree of a node is the number of edges incident on it. In a directed network, a node has
both an in-degree (number of incoming edges) and an out-degree (number of outgoing edges).
The degree distribution of the network is the function pk , where pk is the probability that a
randomly selected node has degree k. Here again, a directed graph has both in-degree and
out-degree distributions. It was found that most of the real networks including the World
Wide Web [5, 14, 88], the Internet [55], metabolic networks [77], phone call networks [4, 8],
Figure 11.5: Calculating the clustering coefficient of a node and of the network. For example,
node 1 has degree 5 and the number of edges between its neighbors is 3. Hence, the clustering
coefficient of node 1 is 3/10. The clustering coefficient of the entire network is the average
of the clustering coefficients of the individual nodes (109/180).
scientific collaboration networks [26, 99], and movie actor collaboration networks [12, 19, 25]
follow a power-law degree distribution (p(k) ∼ k^−γ), indicating that the topology of the
network is very heterogeneous, with a high fraction of small-degree nodes and a few
large-degree nodes. Networks with a power-law degree distribution are popularly known as
scale-free networks, so named because of the lack of a characteristic degree and the broad
tail of the degree distribution. Figure 11.6 shows the empirical results for the Internet at
the router level and the co-authorship network of high-energy physicists. The expected
value and variance of the node degree in scale-free networks are

E[k] = finite if γ > 2, infinite otherwise;    V[k] = finite if γ > 3, infinite otherwise,

where γ is the power-law exponent. Note that the variance of the node degree is infinite
when γ < 3 and the mean is infinite when γ < 2. The power-law exponent γ of most real
networks lies between 2.1 and 3.0, which implies that there is high heterogeneity with respect
to node degree. This is critical because heterogeneity has been shown to have a huge impact
on network properties and processes such as network resilience [15, 16], network navigation
and local search [6], and epidemiological processes [111, 112, 113, 114, 115]. Later in this
chapter, we will discuss the impact of this heterogeneity in detail.
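An empirical degree distribution p_k is obtained by counting, for each k, the fraction of nodes with degree k. A minimal sketch on a star network of our own construction, a miniature caricature of the hub-dominated heterogeneity seen in scale-free networks:

```python
from collections import Counter

def degree_distribution(edges):
    """Empirical p_k: fraction of nodes with degree k, from an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    hist = Counter(deg.values())
    return {k: hist[k] / n for k in sorted(hist)}

# Hub-and-spoke example: one node of degree 4, four nodes of degree 1.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_distribution(edges))  # → {1: 0.8, 4: 0.2}
```

For a real network one would plot this distribution on log-log axes, as in figure 11.6, and look for a straight-line (power-law) tail.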
Figure 11.6: The degree distribution of real networks. (a) Internet at the router level. Data
courtesy of Ramesh Govindan [61]. (b) Co-authorship network of high-energy physicists,
after Newman [99].
Betweenness centrality (BC) counts the fraction of shortest paths going through a given
node. The BC of a node i is given by

BC(i) = Σ_{s≠i≠t} σ_st(i) / σ_st,

where σ_st is the total number of shortest paths from node s to node t and σ_st(i) is the
number of these shortest paths passing through node i. If the BC of a node is high, it
implies that this
node is central and many shortest paths pass through this node. BC was first introduced
in the context of social networks [139], and has been recently adopted by Goh et al. [59]
as a proxy for the load (li ) at a node i with respect to transport dynamics in a network.
For example, consider the transportation of data packets in the Internet along the shortest
paths. If many shortest paths pass through a node then the load on that node would be high.
Goh et al. have shown numerically that the load (or BC) distribution follows a power law,
P_L(l) ∼ l^−δ, with exponent δ ≈ 2.2, and is insensitive to the details of the scale-free network
as long as the degree exponent (γ) lies between 2.1 and 3.0. They further showed that
Figure 11.7: Illustration of a network with community structure. Communities are defined
as groups of nodes in the network that have a higher density of edges within the group than
between groups. In the above network, each group of nodes enclosed within a dotted loop is
a community.
there exists a scaling relation l ∼ k^{(γ−1)/(δ−1)} between the load and the degree of a node
when 2 < γ ≤ 3. Later in this chapter, we discuss how this property can be utilized for
local search in complex networks. Many other centrality measures exist in the literature; a
detailed review of these measures can be found in [86].
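The BC formula above can be evaluated by brute force on small unweighted, connected graphs: run a breadth-first search from every node to count shortest paths, then check, for each triple (s, i, t), whether i lies on a shortest s-t path. This all-pairs sketch is our own illustration (faster algorithms, e.g. Brandes's, exist):

```python
from collections import deque

def bfs_sigma(adj, s):
    """Distances and shortest-path counts from s in an unweighted graph."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                q.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """BC(i) = sum over s != i != t of sigma_st(i) / sigma_st (connected graphs)."""
    info = {s: bfs_sigma(adj, s) for s in adj}
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        ds, ss = info[s]
        for t in adj:
            if t == s:
                continue
            dt, st = info[t]
            for i in adj:
                if i in (s, t):
                    continue
                # i lies on a shortest s-t path iff the distances through i add up
                if ds[i] + dt[i] == ds[t]:
                    bc[i] += ss[i] * st[i] / ss[t]
    return bc

# Path graph 0-1-2: node 1 sits on both ordered shortest paths 0->2 and 2->0.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(adj))  # → {0: 0.0, 1: 2.0, 2: 0.0}
```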
Many real networks are found to exhibit a community structure (also called modular struc-
ture). That is, groups of nodes in the network have high density of edges within the group
and lower density between the groups (see figure 11.7). This property was first studied in
social networks [139], where people may divide into groups based on interests, age, profession,
etc. Similar community structures are observed in many networks, reflecting the division of
nodes into groups based on node properties [101]. For example, in the WWW
it reflects the subject matter or themes of the pages, in citation networks it reflects the area
of research, in cellular and metabolic networks it may reflect functional groups [72, 121].
A closely related and well-studied problem is the graph partitioning problem (GPP), in
which the network is divided into a given number of groups of specified sizes, such that the
number of edges between these sets is minimized. This problem is NP-complete [58], and
several heuristic methods [69, 81, 119] have been proposed to decrease the computation
time. GPP arises in many important engineering problems
which include mapping of parallel computations, laying out of circuits (VLSI design) and
the ordering of sparse matrix computations [69]. Here, the number of partitions to be
made is specified and the size of each partition is restricted. For example, in mapping of
parallel computations, the tasks have to be divided between a specified number of processors
such that the communication between the processors is minimized and the loads on the
processors are balanced. However, in real networks, we do not have a priori knowledge of
the number of communities into which the network should be divided, or of the sizes of the
communities. The goal is to find the naturally existing communities in real networks
rather than dividing the network into a pre-specified number of groups. Since we do not
know the exact partitions of the network, it is difficult to evaluate the goodness of a given
partition. Moreover, there is no unique definition of a community due to the ambiguity
of how dense a group should be to form a community. Many possible definitions exist in
literature [56, 103, 109, 120, 139]. A simple definition given in [56, 120] considers a subgraph
as a community if each node in the subgraph has more connections within the community
than with the rest of the graph. Newman and Girvan [103] have proposed another measure
which calculates the fraction of links within the community minus the expected value of the
same quantity in a randomized counterpart of the network. The higher this difference, the
stronger is the community structure. It is important to note that in spite of this ambiguity,
the presence of community structures is a common phenomenon across many real networks.
Algorithms for detecting these communities are briefly discussed in section 11.5.3.
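Newman and Girvan's measure (modularity) can be computed directly for a candidate partition: for each community, take the fraction of edges inside it minus the square of the fraction of edge ends attached to it. The graph and partition below are our own toy example:

```python
def modularity(adj, communities):
    """Newman-Girvan modularity: fraction of edges inside communities minus the
    value expected in a degree-preserving randomized network,
    Q = sum over communities c of (e_c - a_c^2)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # number of edges
    label = {v: c for c, group in enumerate(communities) for v in group}
    e = [0.0] * len(communities)  # fraction of edges within each community
    a = [0.0] * len(communities)  # fraction of edge ends attached to each community
    for u, nbrs in adj.items():
        for v in nbrs:            # each undirected edge is seen twice (2m edge ends)
            a[label[u]] += 1 / (2 * m)
            if label[u] == label[v]:
                e[label[u]] += 1 / (2 * m)
    return sum(ec - ac ** 2 for ec, ac in zip(e, a))

# Two triangles joined by a single bridge edge: a natural two-community split.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))
```

The natural split of the two triangles gives Q = 5/14 ≈ 0.357; a poor partition (e.g. mixing the triangles) would score lower, which is how the measure ranks candidate community structures.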
Figure 11.8: Effects of removing a node or an edge in the network. Observe that as we
remove more nodes and edges the network disintegrates into small components/clusters.
Network resilience measures how well a network withstands the failure of its elements.
Figure 11.8 illustrates the effects of the removal of nodes/edges on a network. Observe that
as we remove more nodes and edges, the network disintegrates into many components.
There are different ways of removing nodes and
edges to test the robustness of a network. For example, one can remove nodes at random with
uniform probability, or by selectively targeting certain classes of nodes, such as nodes with
high degree. Usually, the removal of nodes at random is termed random failures and the
removal of nodes with higher degree is termed targeted attacks; other removal strategies
are discussed in detail in [71]. Similarly, there are several ways of measuring the degradation of
the network performance after the removal. One simple way to measure it is to calculate the
decrease in size of the largest connected component in the network. A connected component
is a part of the network in which a path exists between any two nodes in that component
and the largest connected component is the largest among the connected components. The
lesser the decrease in the size of the largest connected component, the better the robustness
of the network. In figure 11.8, the size of the largest connected component decreases from
13 to 9 and then to 5. Another way to measure robustness is to calculate the increase of
the average path length in the largest connected component. Malfunctioning of nodes/edges
eliminates some existing paths and generally increases the distance between the remaining
nodes. Again, the lesser the increase, the better the robustness of the network. We discuss
more about network resilience and robustness with respect to optimization in section 11.5.1.
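The largest-connected-component measure of robustness described above can be sketched as follows (the star network and function names are our own illustration):

```python
import random
from collections import deque

def largest_component(adj, removed):
    """Size of the largest connected component after removing a set of nodes."""
    seen, best = set(removed), 0
    for s in adj:
        if s in seen:
            continue
        size, q = 0, deque([s])
        seen.add(s)
        while q:                       # BFS over one component
            u = q.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        best = max(best, size)
    return best

def attack(adj, k, targeted):
    """Remove k nodes at random (failures) or by highest degree (targeted attack)."""
    if targeted:
        victims = sorted(adj, key=lambda u: -len(adj[u]))[:k]
    else:
        victims = random.sample(list(adj), k)
    return largest_component(adj, victims)

# A star network: removing the hub shatters it, while random failures rarely do.
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
print(attack(adj, 1, targeted=True))  # → 1 (only isolated leaves remain)
```

On heterogeneous (scale-free) networks this asymmetry is dramatic: random failures barely reduce the giant component, while targeting the hubs fragments it quickly.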
11.3 Modeling of Complex Networks

In this section, we give a brief summary of different models for complex networks. Most of
the modeling efforts have focused on understanding the underlying processes involved in
network evolution and on capturing the above-mentioned properties of real networks.
Specifically, we concentrate on three prominent models, namely, the Erdős-Rényi random
graph model, the Watts-Strogatz small-world network model, and the Barabási-Albert
scale-free network model.
One of the earliest theoretical models for complex networks was given by Erdős and Rényi
[52, 53, 54] in the 1950s and 1960s. They proposed uniform random graphs for modeling
complex networks with no obvious pattern or structure. The evolutionary model given by
Erdős and Rényi is as follows: start with N isolated nodes, and connect each pair of nodes
with a fixed probability p.
Figure 11.9 illustrates two realizations for Erdős-Rényi random graph model (ER random
graphs) for two connection probabilities. Erdős and Rényi have shown that at pc ≃ 1/N,
the ER random graph abruptly changes its topology from a loose collection of small clusters
to one that has a giant connected component. Figure 11.10 shows the change in size of the
largest connected component in the network as the value of p increases, for N = 1000. We
observe that there exists a threshold pc = 0.001 such that when p < pc , the network is com-
posed of small isolated clusters and when p > pc a giant component suddenly appears. This
phenomenon is similar to the percolation transition, a topic well-studied both in mathematics
and statistical mechanics [13].
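The ER model and its percolation transition are easy to reproduce numerically. The sketch below (parameters and function names are our own) generates ER graphs around p_c = 1/N and measures the fraction of nodes in the largest connected component:

```python
import random
from collections import deque

def er_graph(n, p, seed=0):
    """Erdős-Rényi random graph: connect each pair of nodes with probability p."""
    rng = random.Random(seed)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def giant_fraction(adj):
    """Fraction of nodes in the largest connected component."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        comp, q = 0, deque([s])
        seen.add(s)
        while q:                       # BFS over one component
            u = q.popleft()
            comp += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        best = max(best, comp)
    return best / len(adj)

# Around p_c = 1/N the largest component jumps from a sliver to a giant component.
n = 1000
for p in (0.0005, 0.001, 0.005):
    print(p, giant_fraction(er_graph(n, p)))
```

Below the threshold the largest component covers only a few percent of the nodes; well above it, nearly the whole network, mirroring the jump in figure 11.10.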
Figure 11.9: An Erdős-Rényi random graph that starts with N = 20 isolated nodes and
connects any two nodes with a probability p. As the value of p increases, the number of
edges in the network increases.
Figure 11.10: Illustration of percolation transition for the size of the largest connected
component in Erdős-Rényi random graph model. Note that there exists pc = 0.001 such
that when p < pc , the network is composed of small isolated clusters and when p > pc a
giant component suddenly appears.
In an ER random graph, the number of nodes at a distance l from a given node grows
approximately as ⟨k⟩^l. To cover all the nodes in the network, the distance l should be such
that ⟨k⟩^l ∼ N. Thus, the average path length is given by l = log N / log⟨k⟩, which scales
logarithmically with the number of nodes N. This is only an approximate argument for
illustration; a rigorous proof can be found in [34]. Hence, ER random graphs are
small-world. The clustering coefficient of ER random graphs is found to be low. If we
consider a node and its neighbors in an ER random graph, then the probability that two of
these neighbors are connected is equal to p (the probability that any two randomly chosen
nodes are connected). Hence, the clustering coefficient of an ER random graph is
C = p = ⟨k⟩/N, which is small for large sparse networks. Now, let us calculate the degree
distribution of ER random graphs. The total number of edges in the network is a random
variable with an expected value of pN(N − 1)/2, and the number of edges incident on a node
(the node degree) follows a binomial distribution with parameters N − 1 and p,

p(k_i = k) = C(N − 1, k) p^k (1 − p)^(N−1−k).

This implies that in the limit of large N, the probability that a given node has degree k
approaches a Poisson distribution, p(k) = ⟨k⟩^k e^(−⟨k⟩) / k!. Hence, ER random graphs are
statistically homogeneous in node degree, as the majority of the nodes have a degree close to
the average, and significantly small and large node degrees are exponentially rare.
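The binomial-to-Poisson approximation can be checked numerically. The sketch below (parameters are our own choice) generates one ER graph and compares its empirical degree fractions with the Poisson prediction p(k) = ⟨k⟩^k e^(−⟨k⟩)/k!:

```python
import math
import random

# One ER realization with <k> = p*(n-1) ≈ 4.
n, p = 2000, 0.002
rng = random.Random(1)
deg = [0] * n
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p:
            deg[i] += 1
            deg[j] += 1

mean_k = sum(deg) / n
poisson = lambda k, lam: lam ** k * math.exp(-lam) / math.factorial(k)
empirical = lambda k: deg.count(k) / n
for k in range(8):
    print(k, round(empirical(k), 3), round(poisson(k, mean_k), 3))
```

The two columns agree closely, and the fractions fall off rapidly away from ⟨k⟩, illustrating the statistical homogeneity of ER graphs.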
ER random graphs were used to model complex networks for a long time [34]. The model
was intuitive and analytically tractable; moreover, the average path length of real networks
is close to the average path length of an ER random graph of the same size [13]. However,
recent studies on the topologies of diverse large-scale networks found in nature indicated
that they have significantly different properties from ER random graphs [13, 31, 49, 101]. It
has been found [143] that the average clustering coefficient of real networks is significantly
larger than the average clustering coefficient of ER random graphs with the same number
of nodes and edges, indicating a far more ordered structure in real networks. Moreover, the
degree distributions of many large-scale networks are found to follow a power law p(k) ∼ k^−γ.
Figure 11.11 compares two networks with Poisson and power-law degree distributions. We
observe that there is a remarkable difference between these networks. The network with the
Poisson degree distribution is more homogeneous in node degree, whereas the network with
the power-law distribution is highly heterogeneous. These discoveries, along with others related
Figure 11.11: Comparison of networks with Poisson and power-law degree distributions of
the same size. Note that the network with the Poisson distribution is homogeneous in node
degree: most nodes have a degree close to the average degree of the network. The network
with the power-law degree distribution is highly heterogeneous in node degree: there are a
few nodes with large degree and many nodes with small degree.
to the mixing patterns of complex networks [13, 31, 49, 101] initiated a revival of network
modeling in the past few years.
Non-uniform random graphs have also been studied [8, 9, 41, 93, 102, 104] to mimic the
properties of real-world networks, in particular the power-law degree distribution. Typically,
these models specify either a degree sequence, which is a set of N values of the degrees k_i
of nodes i = 1, 2, ..., N, or a degree distribution p(k). If a degree distribution is specified,
then the sequence is formed by generating N random values from this distribution. This can
be thought of as giving each node i in the network k_i “stubs” sticking out of it; pairs of
these stubs are then connected randomly to form complete edges [104]. Molloy and Reed [93]
have proved that for a random graph with a degree distribution p(k), a giant connected
component emerges almost surely when Σ_{k≥1} k(k − 2)p(k) > 0, provided that the maximum
degree is less than N^{1/4}. Later, Aiello et al. [8, 9] introduced a two-parameter random
graph model P(α, γ)
for power-law graphs with exponent γ, described as follows: let n_k be the number of nodes
with degree k, such that n_k and k satisfy log n_k = α − γ log k. The total number of nodes in
the network can be computed, noting that the maximum degree of a node in the network is
e^{α/γ}. Using the results of Molloy and Reed [93], they showed that there is almost surely
a unique giant connected component if γ < γ_0 = 3.47875..., whereas there is almost surely
no giant connected component when γ > γ_0.
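The stub-pairing construction and the Molloy-Reed criterion can both be sketched in a few lines (function names and the tiny degree sequence are our own; for simplicity the sketch does not reject self-loops or multi-edges, which the bare construction permits):

```python
import random
from collections import defaultdict

def configuration_model(degrees, seed=0):
    """Random graph with a given degree sequence: give node i degrees[i] 'stubs'
    and pair the stubs uniformly at random (self-loops/multi-edges may occur)."""
    rng = random.Random(seed)
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    adj = defaultdict(list)
    for a, b in zip(stubs[::2], stubs[1::2]):   # pair consecutive stubs
        adj[a].append(b)
        adj[b].append(a)
    return adj

def molloy_reed(pk):
    """Giant component exists almost surely iff sum_k k(k-2) p(k) > 0."""
    return sum(k * (k - 2) * q for k, q in pk.items()) > 0

degrees = [1, 1, 2, 2, 3, 3]                     # must sum to an even number
adj = configuration_model(degrees)
print(sum(len(v) for v in adj.values()) // 2)    # → 6 edges, half the stub count
print(molloy_reed({1: 0.5, 3: 0.5}))             # -0.5 + 1.5 > 0 → True
```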
Newman et al. [104] developed a general approach to random graphs using a
generating function formalism [146]. The generating function for the degree distribution p_k
is given by G0(x) = ∑_{k=0}^{∞} p_k x^k. This function captures all the information present in the
original distribution, since p_k = (1/k!) d^k G0/dx^k |_{x=0}. The average degree of a randomly
chosen node is ⟨k⟩ = ∑_k k p_k = G0′(1). Further, this formulation helps in calculating other
properties of the network [104]. For instance, we can approximately calculate the
average path length of the network. Consider the degree of the node reached by
following a randomly chosen edge. If the degree of this node is k, then we are k times more
likely to reach it than a node of degree 1. Thus the degree distribution of the node
arrived at by a randomly chosen edge is proportional to k p_k, not p_k. The distribution q_k of the
number of remaining edges from this node (one less than its degree) is

q_k = (k + 1) p_{k+1} / ∑_k k p_k = (k + 1) p_{k+1} / ⟨k⟩.

Thus, the generating function for q_k is G1(x) = ∑_{k=0}^{∞} q_k x^k = G0′(x)/G0′(1). Note
that the distribution of the number of first neighbors of a randomly chosen node (its
degree) is generated by G0(x). Hence, the distribution of the number of second neighbors of the same
randomly chosen node is generated by G0(G1(x)) = ∑_k p_k [G1(x)]^k. Here, the probability that any
of the second neighbors is connected to a first neighbor or to one another scales as N^{−1} and
can be neglected in the limit of large N. This implies that the average number of second
neighbors is given by [∂/∂x G0(G1(x))]_{x=1} = G0′(1) G1′(1). Extending this method of calculating
the average number of nearest neighbors, we find that the average number of mth neighbors,
z_m, is [G1′(1)]^{m−1} G0′(1) = [z2/z1]^{m−1} z1. Now, let us start from a node and find the number of
first, second, third, ..., mth neighbors. Assuming that all the nodes in the network
can be reached within l steps, we have 1 + ∑_{m=1}^{l} z_m = N. Since for most graphs N ≫ z1 and
z2 ≫ z1, we obtain the average path length of the network, l = ln(N/z1)/ln(z2/z1) + 1. The generating
function formalism can further be extended to include other features such as directed graphs,
bipartite graphs and degree correlations [101].
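As a rough illustration of how the formalism is used, the sketch below (the helper names are ours) computes z1 = G0′(1) = ⟨k⟩, z2 = G0′(1)G1′(1) = ⟨k²⟩ − ⟨k⟩, and the resulting path-length estimate directly from a degree distribution given as a dictionary:

```python
from math import log

def neighbor_counts(pk):
    """z1 = G0'(1) = <k>; z2 = G0'(1) G1'(1) = <k^2> - <k>."""
    z1 = sum(k * p for k, p in pk.items())
    k2 = sum(k * k * p for k, p in pk.items())
    return z1, k2 - z1

def path_length_estimate(pk, n):
    """l = ln(n/z1) / ln(z2/z1) + 1, valid when n >> z1 and z2 >> z1."""
    z1, z2 = neighbor_counts(pk)
    return log(n / z1) / log(z2 / z1) + 1

pk = {1: 0.5, 3: 0.5}          # half the nodes degree 1, half degree 3
print(neighbor_counts(pk))      # (2.0, 3.0)
print(path_length_estimate(pk, 10**4))
```

As expected from the formula, the estimate grows only logarithmically with the number of nodes n.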
11.3. MODELING OF COMPLEX NETWORKS 21
Another class of random graphs, especially popular in modeling social networks, is the class
of Exponential Random Graph Models (ERGMs), or p* models [20, 57, 70, 129, 140]. An
ERGM consists of a family of possible networks on N nodes in which each network G appears
with probability P(G) = (1/Z) exp(−∑_i θ_i ε_i), where Z = ∑_G exp(−∑_i θ_i ε_i).
This is analogous to the Boltzmann ensemble of statistical mechanics, with Z as the partition
function [101]. Here, {ε_i} is the set of observables or measurable properties of the network,
such as the number of nodes with a certain degree, the number of triangles, etc., and {θ_i} is an
adjustable set of parameters for the model. The ensemble average of a property ε_i is given by
⟨ε_i⟩ = ∑_G ε_i(G) P(G) = (1/Z) ∑_G ε_i exp(−∑_i θ_i ε_i) = ∂f/∂θ_i, where f = −ln Z is the
free energy. The major advantage of these models is that
they can represent any kind of structural tendency, such as dyad and triangle formation.
A detailed review of parameter estimation techniques can be found in [20, 127]. Once the
parameters {θ_i} are specified, networks can be generated using Gibbs or Metropolis-
Hastings sampling methods [127].
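A minimal sketch of Metropolis sampling for the simplest conceivable ERGM, whose only observable is the edge count E(G), so that P(G) ∝ exp(−θE(G)). This toy stands in for the richer statistics (degree counts, triangles) used in practice:

```python
import random
from itertools import combinations
from math import exp

def sample_ergm(n, theta, steps=20000, seed=0):
    """Metropolis sampling from P(G) ~ exp(-theta * E(G)), where the only
    observable is the number of edges E(G).  Each step proposes toggling
    one node pair and accepts with the Metropolis rule."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    edges = set()
    for _ in range(steps):
        e = rng.choice(pairs)
        delta = -1 if e in edges else 1          # change in E(G) if toggled
        if -theta * delta >= 0 or rng.random() < exp(-theta * delta):
            edges.symmetric_difference_update({e})
    return edges

# Positive theta penalizes edges (sparse graphs); negative theta rewards them.
sparse = sample_ergm(20, theta=2.0)
dense = sample_ergm(20, theta=-2.0)
print(len(sparse), len(dense))
```

With a single edge-count statistic this ensemble reduces to an Erdős-Rényi graph with edge probability exp(−θ)/(1 + exp(−θ)); richer observables break that equivalence.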
Watts and Strogatz [143] presented a small-world network model to explain the simultaneous
existence of high clustering and small average path length in many real networks,
especially social networks. They argued that most real networks are neither completely
regular nor completely random, but lie somewhere between these two extremes. The Watts-
Strogatz model starts with a regular lattice on N nodes and rewires each edge with a certain
probability p. The algorithm for the model is as follows:
• Start with a regular ring lattice on N nodes where each node is connected to its first
k neighbors.
• Randomly rewire each edge with a probability p such that one end remains the same
and the other end is chosen uniformly at random. The other end is chosen without
allowing multiple edges (more than one edge joining a pair of nodes) and loops (edges
joining a node to itself).
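The two steps above can be sketched as follows. This is an illustrative implementation; tie-breaking details, such as which end of an edge is held fixed during rewiring, vary across descriptions of the model:

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Ring lattice on n nodes, each linked to its k nearest neighbors
    (k even); every edge is then rewired with probability p, keeping one
    end fixed and avoiding self-loops and multiple edges."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):                           # regular ring lattice
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + j) % n)))
    for edge in list(edges):                     # rewire with probability p
        if rng.random() < p:
            u, _ = tuple(edge)                   # hold one end fixed
            w = rng.randrange(n)
            new = frozenset((u, w))
            if w != u and new not in edges:      # no loops or multi-edges
                edges.remove(edge)
                edges.add(new)
    return edges

print(len(watts_strogatz(20, 4, p=0.0)))   # 40 edges, as in figure 11.12
```

At p = 0 this reproduces the regular ring of figure 11.12 (N = 20, E = 40); rewiring preserves the edge count because each move deletes exactly one edge and adds one.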
Figure 11.12: Illustration of the random rewiring process for the Watts-Strogatz model. This
model interpolates between a regular ring lattice and a random network, without changing
the number of vertices (N = 20) or edges (E = 40) in the graph. When p = 0 the graph is
regular (each node has 4 edges); as p increases, the graph becomes increasingly disordered,
until at p = 1 all edges are rewired randomly. After Watts and Strogatz, 1998 [143].
The resulting network is a regular network when p = 0 and a random graph when p = 1,
since all the edges are then rewired (see figure 11.12). The model is inspired by social
networks, in which people are friends with their immediate neighbors, such as neighbors on the
street or colleagues at work (the connections in the regular lattice), while each person also has
a few friends who are a long way away (the long-range connections attained by random rewiring).
Later, Newman [98] proposed a similar model in which, instead of rewiring, new edges are
introduced with probability p. The clustering coefficients of the Watts-Strogatz model and
the Newman model are, respectively,

C_WS = [3(k − 1)/(2(2k − 1))] (1 − p)^3,   C_N = 3(k − 1)/[2(2k − 1) + 4kp(p + 2)].

This class of networks displays a high clustering coefficient for small
values of p, since we start from a regular lattice. Also, for small values of p the average path
length falls rapidly due to the few long-range connections. This coexistence of a high clustering
coefficient and a small average path length is in excellent agreement with the characteristics
of many real networks [98, 143]. The degree distribution of both models depends on the
parameter p, evolving from a single-valued peak at the initial degree k to a
somewhat broader but still peaked distribution. Thus, small-world models are even more
homogeneous than random graphs, which is not the case for real networks.
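Plugging numbers into the two clustering expressions shows the common lattice limit and the decay with p; here k denotes the lattice parameter appearing in the formulas above:

```python
def c_ws(k, p):
    """Clustering coefficient of the Watts-Strogatz (rewiring) model."""
    return 3 * (k - 1) / (2 * (2 * k - 1)) * (1 - p) ** 3

def c_newman(k, p):
    """Clustering coefficient of Newman's edge-addition variant."""
    return 3 * (k - 1) / (2 * (2 * k - 1) + 4 * k * p * (p + 2))

# At p = 0 both reduce to the lattice value 3(k-1)/(2(2k-1)).
print(round(c_ws(4, 0.0), 4), round(c_newman(4, 0.0), 4))   # 0.6429 0.6429
print(round(c_ws(4, 0.2), 4), round(c_newman(4, 0.2), 4))
```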
As mentioned earlier, many real networks, including the World Wide Web [5, 14, 88], the
Internet [55], peer-to-peer networks [122], metabolic networks [77], phone call networks [4,
8] and movie actor collaboration networks [12, 19, 25], are scale-free, that is, their degree
distribution follows a power-law, p(k) ∼ k^{−γ}. Barabási and Albert [25] addressed the origin
of this power-law degree distribution in many real networks. They argued that a static
random graph or Watts-Strogatz model fails to capture two important features of large-scale
networks: their constant growth and the inherent selectivity in edge creation. Complex
networks like the World Wide Web, collaboration networks and even biological networks
grow continuously through the creation of new web pages, the arrival of new researchers,
and gene duplication and evolution. Moreover, unlike random networks where each node
has the same probability of acquiring a new edge, new nodes entering the network do not
connect uniformly to existing nodes, but attach preferentially to nodes of higher degree. This
reasoning led them to define the following mechanism:
• Growth: Start with a small number of connected nodes, say m0, and assume that every
time a node enters the system, m edges are pointing from it, where m < m0.
• Preferential Attachment: Every time a new node enters the system, each edge of the
newly entered node preferentially attaches to an already existing node i of degree ki
with probability

Πi = ki / ∑_j kj.
It was shown that such a mechanism leads to a network with a power-law degree distribution
p(k) ∼ k^{−γ} with exponent γ = 3. These networks were called scale-free networks because
of the lack of a characteristic degree and the broad tail of the degree distribution. The average
path length of this network scales as log N / log log N, and thus it displays the small-world
property. The clustering coefficient of a scale-free network is approximately C ∼ (log N)^2 / N,
which is a slower decay than the C = ⟨k⟩ N^{−1} observed in random graphs [35]. In the years
following the proposal of the first scale-free model, a large number of more refined models have
been introduced, leading to a well-developed theory of evolving networks [13, 31, 49, 101].
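The growth/preferential-attachment mechanism can be sketched as follows. A common implementation trick (ours, not necessarily that of [25]) is to keep a list with one entry per edge endpoint, so that a uniform choice from the list is automatically degree-proportional:

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=0):
    """Start from m connected nodes; each new node attaches m edges to
    existing nodes with probability Pi_i = k_i / sum_j k_j."""
    rng = random.Random(seed)
    targets = list(range(m))      # nodes the next newcomer attaches to
    endpoints = []                # one entry per edge endpoint = degree weights
    edges = []
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        targets = []
        while len(targets) < m:   # m distinct, degree-proportional picks
            t = rng.choice(endpoints)
            if t not in targets:
                targets.append(t)
    return edges

g = barabasi_albert(1000, 2, seed=1)
degrees = Counter(u for e in g for u in e)
print(len(g))                     # (1000 - 2) * 2 = 1996 edges
print(max(degrees.values()))      # a hub far above the mean degree emerges
```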
In this section, we discuss why these large-scale networks are termed “complex” networks.
The reason is not merely the large size of the network, though size does contribute to the
complexity. One must also distinguish “complex systems” from “complicated systems” [136].
Consider an airplane as an example: even though it is a complicated system, we know its
components and the rules governing its functioning. This is not the case with complex
systems. Complex systems are characterized by diverse behaviors that emerge as a result of
non-linear spatio-temporal interactions among a large number of components [73]. These
emergent behaviors cannot be fully explained just by understanding the properties of the
individual components/constituents. Examples of such complex systems include ecosystems,
economies, various organizations/societies, the nervous system, the human brain, ant hills
... the list goes on. Some of the behaviors exhibited by complex systems are discussed below:
• Self-organization: A process by which the internal organization of a system arises
spontaneously from the interactions between the constituents and without any external
influence. Self-organization typically leads
to an emergent behavior. Emergent behavior is a phenomenon in which a global property
of the system is not evident from the properties of its individual parts: a completely new
property arises from the interactions between the different constituents of the system.
For example, consider an ant colony. Although a single ant (a constituent of an ant
colony) can perform only a very limited number of tasks in its lifetime, the interactions
of a large number of ants in the colony lead to more complex emergent behaviors.
Now let us consider real large-scale networks such as the Internet, the WWW and the
other networks mentioned in section 11.1. Most of these networks have a power-law degree
distribution, which does not have any specific scale [25]. This implies that the networks do not
have any characteristic degree and that the average behavior of the system is not typical (see
figure 11.11(b)). For this reason they are called scale-free networks. This heavy-tailed
degree distribution induces a high level of heterogeneity in the degrees of the vertices. The
heterogeneity makes the network highly sensitive to external disturbances. For example,
consider the network shown in figure 11.14(a). This network is highly sensitive to the removal
of just two nodes: it completely disintegrates into small components.
On the other hand, the network shown in figure 11.14(b), having the same number of
nodes and edges, is not very sensitive. Most real networks are found to have a structure similar
to the network shown in figure 11.14(a), with a huge heterogeneity in node degree. Also,
studies [111, 112, 113, 114, 115] have shown that the presence of heterogeneity has a huge
impact on epidemiological processes such as disease spreading. They have shown that in
networks which do not have a heavy-tailed degree distribution, if the disease transmission
Figure 11.14: Illustration of the high-sensitivity phenomenon in complex networks. (a) Observe
that when we remove the two highest-degree nodes from the network, it disintegrates into
small parts: the network is highly sensitive to node removals. (b) Example of a network
with the same number of nodes and edges which is not sensitive. This network is not affected
much when we remove the three highest-degree nodes. The network in (a) is highly sensitive
due to the high heterogeneity in node degree.
rate is less than a certain threshold, the disease will not cause an epidemic or a major outbreak.
However, if the network has a power-law (scale-free) degree distribution, it becomes highly
sensitive to disease propagation: no matter what the transmission rate is, there exists a finite
probability that the infection will cause a major outbreak. Hence, we clearly see that these
real large-scale networks are highly sensitive, or infinitely susceptible.
Further, all these networks have evolved over time, with new nodes joining the network (and
some leaving) according to self-organizing or evolutionary rules; no external influence
controlled the evolution process or the structure of the network. Nevertheless,
these networks have evolved in such a manner that they exhibit complex behaviors such as
power-law degree distributions and many others. Hence, they are called “complex” networks
[135].
The models discussed in section 11.3 are focused on explaining the evolution and growth
process of many large real networks. They mainly concentrate on statistical properties of
real networks and network modeling. But the ultimate goal in studying and modeling the
structure of complex networks is to understand and optimize the processes taking place on
these networks. For example, one would like to understand how the structure of the Internet
affects its survivability against random failures or intentional attacks, how the structure of
the WWW helps in efficient surfing or search on the web, how the structure of social networks
affects the spread of viruses or diseases, etc. In other words, to design rules for optimiza-
tion, one has to understand the interactions between the structure of the network and the
processes taking place on the network. These principles will certainly help in redesigning
or restructuring the existing networks and perhaps even help in designing a network from
scratch. In the past few years, there has been tremendous amount of effort by the research
communities of different disciplines to understand the processes taking place on networks
[13, 31, 49, 101]. In this chapter, we concentrate on two processes, namely node failures and
local search, because of their high relevance to engineering systems and discuss few other
topics briefly.
All real networks are regularly subject to node/edge failures either due to normal mal-
functions (random failures) or intentional attacks (targeted attacks) [15, 16]. Hence, it is
extremely important for the network to be robust against such failures for proper function-
ing. Albert et al. [15] demonstrated that the topological structure of the network plays a
major role in its response to node/edge removal. They showed that most real networks are
extremely resilient to random failures but, on the other hand, very sensitive to targeted
attacks. They attribute this to the fact that most of these networks are scale-free and
therefore highly heterogeneous in node degree. Since a large fraction of the nodes have
small degree, random failures hardly affect the structure of the network. On
Figure 11.15: The size of the largest connected component as a function of the percentage p
of nodes removed from the network due to random failures (⋄) and targeted attacks (△).
(a) ER graph with N = 10,000 nodes and mean degree ⟨k⟩ = 4; (b) scale-free network
generated by the Barabási-Albert model with N = 10,000 and ⟨k⟩ = 4. For random graphs,
the behavior with respect to random failures and targeted attacks is similar.
Scale-free networks are highly sensitive to targeted attacks and robust to random failures.
the other hand, the removal of a few highly connected nodes that maintain the connectivity
of the network drastically changes its topology. For example, consider
the Internet: despite frequent router problems in the network, we rarely experience global
effects. However, if a few critical nodes in the Internet are removed then it would lead to
a devastating effect. Figure 11.15 shows the decrease in the size of the largest connected
component for both scale-free networks and ER graphs, due to random failures and targeted
attacks. ER graphs are homogenous in node degree, that is all the nodes in the network
have approximately the same degree. Hence, they behave almost similarly for both random
failures and targeted attacks (see figure 11.15(a)). In contrast, for scale-free networks, the
size of the largest connected component decreases slowly for random failures and drastically
for targeted attacks (see figure 11.15(b)).
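The contrast in figure 11.15 can be reproduced in miniature with a toy experiment, hub-and-spoke versus ring. The helper names are ours, and the hub graph is only a crude stand-in for a heterogeneous scale-free topology:

```python
from collections import defaultdict, deque

def largest_component(nodes, edges):
    """Size of the largest connected component, via BFS."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        size, queue = 0, deque([s])
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

def after_removal(nodes, edges, removed):
    """Largest component once `removed` nodes (and their edges) are gone."""
    keep = [x for x in nodes if x not in removed]
    kept = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return largest_component(keep, kept)

N = 100
nodes = list(range(N))
hub = [(0, i) for i in range(1, N)]           # heterogeneous: one huge hub
ring = [(i, (i + 1) % N) for i in range(N)]   # homogeneous: a simple ring
print(after_removal(nodes, hub, {0}))    # 1  : removing the hub shatters it
print(after_removal(nodes, ring, {0}))   # 99 : the ring barely notices
```

Targeting the single high-degree node destroys the heterogeneous graph, while removing any node of the homogeneous ring leaves almost everything connected.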
These observations lead to the following optimization problem: “What is the optimal degree
distribution of a network of N nodes that maximizes the robustness of the network to both
random failures and targeted attacks, with the constraint that the number of edges remains
the same?”
Note that we can always improve the robustness by increasing the number of edges in
the network (for instance, a completely connected network will be the most robust network
for both random failures and targeted attacks). Hence the problem has a constraint on the
number of edges. In [133], Valente et al. showed that the optimal network configuration
is very different from both scale-free networks and random graphs. They showed that the
optimal networks that maximize robustness for both random failures and targeted attacks
have at most three distinct node degrees and hence the degree distribution is three-peaked.
Similar results were demonstrated by Paul et al. [117], who showed that the
optimal network design is one in which all the nodes except one have the same degree k1
(close to the average degree), while the remaining node has a very large degree,
k2 ∼ N^{2/3}, where N is the number of nodes. However, these optimal networks may not be
practically feasible, because they require each node to take one of only a few
prescribed degrees.
Many different evolutionary algorithms have also been proposed to design an optimal
network configuration that is robust to both random failures and targeted attacks [44, 74,
125, 130, 134]. In particular, Thadakamalla et al. [130] consider two additional measures,
responsiveness and flexibility, along with robustness to random failures and targeted attacks,
specifically for supply-chain networks. They define responsiveness as the ability of the network
to provide timely services through effective navigation, and measure it in terms of the average
path length of the network: the lower the average path length, the better the responsiveness
of the network. Flexibility is the ability of the network to provide alternate paths for dynamic
rerouting. Good clustering properties ensure the presence of alternate paths, so the
flexibility of a network is measured in terms of the clustering coefficient. They designed a
parameterized evolutionary algorithm for supply-chain networks and analyzed its performance
with respect to these three measures. Through simulation they showed that trade-offs
exist between these measures, and proposed different ways to improve these properties.
However, it is still unclear what the optimal configuration of such survivable
networks would be. The research question is: “What is the optimal configuration of a network
of N nodes that maximizes robustness to random failures and targeted attacks, as well as
flexibility and responsiveness, with the constraint that the number of edges remains the same?”
Until now, we have focused on the effects of node removal on the static properties of
a network. However, in many real networks the removal of nodes also has dynamic
effects, as it can lead to avalanches of breakdowns called cascading failures.
For instance, in a power transmission grid, the removal of nodes (power stations) changes
the balance of flows and leads to a global redistribution of loads over the whole network. In
some cases this may not be tolerated and might trigger a cascade of overload failures [82], as
happened on August 10th, 1996 in 11 US states and two Canadian provinces [124]. Models
of cascades of irreversible [97] or reversible [45] overload failures have demonstrated that
removal of even a small fraction of highly loaded nodes can trigger global cascades if the
load distribution of the nodes is heterogeneous. Hence, cascade-based attacks can be much
more destructive than any other strategies considered in [15, 71]. Later, in [96], Motter
showed that a defence strategy based on a selective further removal of nodes and edges,
right after the initial attack or failure, can drastically reduce the size of the cascade. Other
studies on cascading failures include [39, 94, 95, 138, 141].
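A deliberately simplified, one-dimensional caricature of such an overload cascade already shows the threshold behavior. This is not the model of [45, 96, 97]; load is simply shifted along a chain of nodes:

```python
def cascade(loads, capacity, start):
    """A failed node's load is shifted onto the next node in the chain;
    any node pushed above `capacity` fails in turn.  Returns the list of
    failed nodes."""
    failed = [start]
    carry = loads[start]
    j = start + 1
    while j < len(loads) and loads[j] + carry > capacity:
        carry += loads[j]
        failed.append(j)
        j += 1
    return failed

# Lightly loaded chain: the initial failure stays local.
print(cascade([1.0] * 10, capacity=5.0, start=0))        # [0]
# Heavily loaded chain: a single failure propagates through every node.
print(len(cascade([4.5] * 10, capacity=5.0, start=0)))   # 10
```

The qualitative point carries over to networks: when nodes operate close to capacity, one removal can redistribute enough load to take down the whole system.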
One of the important research problems with many applications in engineering systems
is search in complex networks. Local search is the process by which a node tries to find a
network path to a target node using only local information, meaning that each node knows
only its first, or perhaps second, neighbors and is not aware of nodes at a larger distance
or of how they are connected in the network. This is an intriguing and relatively little-studied
problem with many practical applications. Suppose some required information, such as
computer files or sensor data, is stored at the nodes of a distributed network or database.
Then, in order to quickly determine the location of particular information, one needs
efficient local (decentralized) search strategies.
Note that this is different from neighborhood search strategies used for solving combinatorial
11.5. OPTIMIZATION IN COMPLEX NETWORKS 31
optimization problems [2]. For example, consider the networks shown in figure 11.16(a) and
11.16(b). The objective is for node 1 to send a message to node 30 along the shortest possible
path. In the network shown in figure 11.16(a), each node has global connectivity information
about the network (that is, how each and every node is connected in the network). In such
a case, node 1 can calculate the optimal path using traditional algorithms [7] and send the
message through this path (1 - 3 - 12 - 30, depicted by the dotted line). Next, consider
the network shown in figure 11.16 (b), in which each node knows only about its immediate
neighbors. Node 1, based on some search algorithm, chooses to send the message to one of
its neighbors: in this case, node 4. Similarly, node 4 also has only local information, and
uses the same search algorithm to send the message to node 13. This process continues until
the message reaches the target node. We can clearly see that the search path obtained (1
- 4 - 13 - 28 - 23 - 30) is not optimal. The problem, then, is to design search algorithms
that find the best possible paths in complex networks using only the locally available information. The
algorithms discussed in this section may look similar to “distributed routing algorithms” that
are abundant in wireless ad hoc and sensor networks [10, 11]. However, the main difference
is that the former try to exploit the statistical properties of the network topology whereas
the latter do not. Most of the algorithms in wireless sensor networks literature find a path
to the target node either by broadcasting or random walk and then concentrate on efficient
routing of the data from the start node to the end node [10, 76]. As we will see in this section,
the statistical properties of the network have a significant effect on the search process. Hence,
the algorithms in the wireless sensor networks literature could be integrated with these results
for better performance.
We discuss this problem for two types of networks. In the first type, the global position
of the target node can be quantified, and each node has this information to guide the search
towards the target. For example, in the network considered in Milgram’s experiment, each
person had geographical and professional information about the target, and all the
intermediary people (or nodes) used this information as a guide for passing the messages.
In the second type of network, by contrast, we cannot quantify the global position of the
target node. In this case, during the search process we cannot tell whether a given step takes
us towards the target node or away from it, which makes local search even more difficult. One
Figure 11.16: Illustration of different ways of sending a message from node 1 to node 30. (a)
Each node has global connectivity information about the whole network; node 1 calculates
the optimal path and sends the message along it. (b) Each node has information only about
its neighbors (as shown by the dotted curve); using this local information, node 1 tries to
send the message to node 30. The path obtained is longer than the optimal path.
such network is the peer-to-peer network Gnutella [79], whose structure is such that one
may know very little about the location of the target node. Here, when a user searches for
a file, he/she does not know the global position of the node that has the file; further, when
the user sends a request to one of its neighbors, it is difficult to tell whether this step is
towards the target node or away from it. For lack of a more suitable name, we call networks
of the first type spatial networks and networks of the second type non-spatial networks. In
this chapter, we focus mainly on search in non-spatial networks.
The problem of local search goes back to the famous experiment by Stanley Milgram [92]
(discussed in section 11.2) illustrating the short distances in social networks. Another im-
portant observation of the experiment, which is even more surprising, is the ability of these
nodes to find these short paths using just the local information. As pointed out by Kleinberg
[83, 84, 85], this is not a trivial statement, because most of the time people have only local
information about the network: they know their immediate friends, and perhaps their friends’
friends, but not the acquaintances of all the people in the network. Even in Milgram’s
experiment, the people to whom he gave the letters had only local information about the
entire social network. Still, from the results
of the experiment, we can see that arbitrary pairs of strangers are able to find short chains
of acquaintances between them by using only local information. Many models have been
proposed to explain the existence of such short paths [13, 31, 49, 98, 101, 143]. However,
these models are not sufficient to explain the second phenomenon. The observations from
Milgram’s experiment suggest that there is something more embedded in the underlying
social network that guides the message implicitly from the source to the target. Such net-
works which are inherently easy to search are called searchable networks. Mathematically,
a network is searchable if the length of the search path obtained scales logarithmically with
the number of nodes N (∼ log N) or slower. Kleinberg demonstrated that the emergence of
such a phenomenon requires special topological features [83, 84, 85]. Considering a family
of network models on an n-dimensional lattice that generalizes the Watts-Strogatz model,
he showed that only one particular model among this infinite family can support efficient
decentralized algorithms. Unfortunately, the model given by Kleinberg is highly constrained
and represents a very small subset of complex networks. Watts et al. [144] presented another
model which is based upon plausible hierarchical social structures and contentions regarding
social networks. This model defines a class of searchable networks and offers an explanation
for the searchability of social networks.
The traditional search methods in non-spatial networks are broadcasting or random walk.
In broadcasting, each node sends the message to all its neighbors. The neighbors in turn
broadcast the message to all their neighbors, and the process continues. Effectively, every
node in the network receives the message at least once, and possibly many times. This can
have devastating effects on the performance of the network. A hint of the potential damage
of broadcasting can be seen in the Taylorsville, NC elementary school project [142]. Sixth-grade
students and their teacher sent out a sweet email to all the people they knew, requesting
the recipients to forward the email to everyone they knew and to notify the students by email
so that they could plot their locations on a map. A few weeks later, the project had to be
canceled because they had received about 450,000 responses from all over the world [142].
A good way to avoid such a huge
exchange of messages is by doing a walk. In a walk, each node sends the message to one
of its neighbors until it reaches the target node. The neighbor can be chosen in different
ways depending on the algorithm. If the neighbor is chosen randomly with equal probability
then it is called random search, while in a high degree search the highest degree neighbor is
chosen. Adamic et al. [6] demonstrated that high-degree search is more efficient than
random search in networks with a power-law degree distribution (scale-free networks). High-degree
search sends the message to a more connected neighbor, which has a higher probability
of reaching the target node, and thus exploits the heterogeneity in node degree
to perform better. They showed that the number of steps s required until the whole graph
is revealed scales as s ∼ N^{3(1−2/γ)} for random search and as s ∼ N^{2−4/γ} for high-degree
search. Clearly, for γ > 2.0 the number of steps taken by high-degree search
scales with a smaller exponent than random-walk search. Since most real networks have a
power-law degree distribution with exponent γ between 2.1 and 3.0, high-degree search
is more effective in these networks.
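The advantage of high-degree search can be seen even in a toy network with a single hub; the graph and helper names below are illustrative, not those of [6]:

```python
import random
from collections import defaultdict

def make_hub_graph(n):
    """One hub (node 0) adjacent to every node, plus a ring so that
    low-degree nodes still have a few links."""
    adj = defaultdict(set)
    for i in range(1, n):
        adj[0].add(i)
        adj[i].add(0)
    for i in range(n):
        j = (i + 1) % n
        adj[i].add(j)
        adj[j].add(i)
    return adj

def local_search(adj, start, target, pick, seed=0, max_steps=10**5):
    """Forward a message until the target appears among the current
    node's neighbors; `pick` chooses the next hop."""
    rng = random.Random(seed)
    node, steps = start, 0
    while target not in adj[node] and steps < max_steps:
        node = pick(adj, sorted(adj[node]), rng)
        steps += 1
    return steps + 1

random_pick = lambda adj, nbrs, rng: rng.choice(nbrs)
degree_pick = lambda adj, nbrs, rng: max(nbrs, key=lambda v: len(adj[v]))

adj = make_hub_graph(100)
hd = local_search(adj, 5, 60, degree_pick)   # one hop to the hub: 2 steps
rw = local_search(adj, 5, 60, random_pick)
print(hd, rw)
```

High-degree search immediately climbs to the hub, which is adjacent to every target, while the random walk wanders; this mirrors, in miniature, why the high-degree exponent above is smaller.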
All the algorithms discussed so far [6, 83, 84, 85, 144] have assumed that the edges in
the network are equivalent. But the assumption of equal edge weights (which may represent
the cost, bandwidth, distance, or power consumption associated with the process described
by the edge) usually does not hold in real networks. Many researchers [17, 27, 28, 36, 60,
62, 65, 87, 100, 106, 116, 118, 148] have pointed out that it is incomplete to assume that all
the edges are equivalent. Recently, Thadakamalla et al. [131] proposed a new search
algorithm based on a network measure called local betweenness centrality (LBC) that utilizes
the heterogeneities in both node degrees and edge weights. The LBC of a neighbor node i, L(i),
is given by

L(i) = ∑_{s ≠ i ≠ t; s,t ∈ local network} σ_st(i) / σ_st,

where σ_st is the total number of shortest paths from node s to node t (a shortest path being
one along which the sum of edge weights is minimal) and σ_st(i) is the number of these shortest
paths passing through i. If the LBC of a node is high, it implies that this node is critical in the local
Table 11.2: Comparison of different search strategies in power-law networks with exponent
2.1 and 2,000 nodes, for different edge-weight distributions. The mean of each edge-weight
distribution is 5 and the variance is σ². The values in the table are the average distances
obtained by each search strategy in these networks; the values in parentheses show the
relative difference between the average distance of each strategy and that obtained by the
LBC strategy. LBC search, which reflects the heterogeneities in both edge weights and node
degree, performed best for all edge-weight distributions.
network. Thadakamalla et al. assume that each node in the network has information about
its first and second neighbors and using this information, the node calculates the LBC of each
neighbor and passes the message to the neighbor with the highest LBC. They demonstrated
that this search algorithm utilizes the heterogeneities in node degree and edge-weights to
perform well in power-law networks with exponent between 2.0 and 2.9 for a variety of edge-
weight distributions. Table 11.2 compares the performance of different search algorithms for
scale-free networks with different edge weight distributions. The values in the parentheses
give the relative difference between the average distance for each algorithm and that
obtained by the LBC algorithm. Specifically, they observed that as the heterogeneity in the
edge weights increases, the difference between high-degree search and LBC search grows.
This implies that it is critical to account for edge weights in local search algorithms.
Moreover, given that many real networks have heterogeneous edge weights, an LBC-based
search becomes preferable to the high-degree search of Adamic et al. [6].
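As an illustration, the LBC computation and forwarding rule described above can be sketched in a few lines of Python. The toy graph, numerical tolerance, and helper names below are our own assumptions, not the authors' implementation.

```python
# Illustrative sketch of LBC-based forwarding on a weighted graph.
# The graph, tolerance, and helper names are assumptions, not the
# authors' original implementation.
import heapq
from collections import defaultdict

def dijkstra(adj, src):
    """Weighted shortest-path distances and shortest-path counts."""
    dist, sigma = {src: 0.0}, defaultdict(float)
    sigma[src] = 1.0
    pq, done = [(0.0, src)], set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v] - 1e-12:
                dist[v], sigma[v] = nd, sigma[u]
                heapq.heappush(pq, (nd, v))
            elif abs(nd - dist[v]) <= 1e-12:
                sigma[v] += sigma[u]   # another shortest path to v
    return dist, sigma

def lbc(adj, nodes, i):
    """Sum over ordered pairs s, t in the local network (both != i)
    of the fraction of weighted shortest s-t paths through i."""
    dist_i, sigma_i = dijkstra(adj, i)
    total = 0.0
    for s in nodes:
        if s == i:
            continue
        dist_s, sigma_s = dijkstra(adj, s)
        for t in nodes:
            if t in (s, i) or t not in dist_s:
                continue
            # i lies on some s-t shortest path iff distances add up.
            if (i in dist_s and t in dist_i and
                    abs(dist_s[i] + dist_i[t] - dist_s[t]) <= 1e-12):
                total += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return total

def forward(adj, current):
    """Forward to the first neighbor with the highest LBC, computed
    on the subgraph of current's first and second neighbors."""
    first = {v for v, _ in adj[current]}
    local = {current} | first
    for u in first:
        local |= {v for v, _ in adj[u]}
    sub = {u: [(v, w) for v, w in adj[u] if v in local] for u in local}
    return max(sorted(first), key=lambda u: lbc(sub, local, u))

# Toy weighted network: node 2 carries the cheap routes out of node 0.
edges = [(0, 1, 5.0), (0, 2, 1.0), (2, 3, 1.0), (1, 3, 5.0), (3, 4, 1.0)]
adj = defaultdict(list)
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))
print(forward(adj, 0))   # -> 2
```

Here the message at node 0 is passed to node 2, its neighbor with the highest local betweenness, even though node 1 has the same degree.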
Many other applications to real networks involve issues related to both the structure of the
networks and their dynamics. In this subsection, we briefly summarize some of these
applications and give references for further study.
As mentioned earlier, community structures are found in many real networks, and identifying
these communities is extremely helpful in understanding the structure and function of the
network. Sometimes the statistical properties of a community alone may be very different
from those of the whole network, making the community critical for understanding the
dynamics of the system. The following are some examples:
• The World Wide Web: Identification of communities in the web is helpful for im-
plementation of search engines, content filtering, automatic classification, automatic
realization of ontologies, and focused crawlers [18, 56].
• Social networks: Community structures are a typical feature of social networks. The
behavior of an individual is highly influenced by the community to which he/she belongs,
and communities often have their own norms and subcultures, which are an important source
of a person's identity [103, 139].
• Biological networks: Community structures are found in cellular [72, 123], metabolic
[121] and genetic networks [147]. Identifying them helps in finding the functional
modules which correspond to specific biological functions.
In agglomerative methods, edges are added between pairs of nodes in decreasing order of a
similarity measure (for example, the number of common neighbors), starting with the edge
between the pair with the highest similarity. This procedure can be stopped at any step,
and the distinct components of the network are taken to be the communities. In divisive
methods, on the other hand, edges are removed from the network based on a certain measure
(for example, removing the edge with the highest betweenness centrality [103]). As this
process continues, the network disintegrates into different communities. Recently, many
such algorithms have been proposed and applied to complex networks [31, 46]. A
comprehensive list of algorithms for identifying community structures in complex networks
can be found in [46], where Danon et al. compare them in terms of sensitivity and
computational cost.
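The agglomerative procedure described above can be sketched as follows, using the number of common neighbors as the similarity measure. The toy graph and function names are our own assumptions, not any published implementation.

```python
# Sketch of the agglomerative approach: merge components along edges
# in decreasing order of endpoint similarity (here, the number of
# common neighbors); stopping early yields the communities.
def communities(adj, n_communities):
    nodes = sorted(adj)
    parent = {v: v for v in nodes}

    def find(v):                      # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Similarity of an edge = number of common neighbors of its
    # endpoints; a stable sort keeps the order deterministic.
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    edges.sort(key=lambda e: len(adj[e[0]] & adj[e[1]]), reverse=True)

    n = len(nodes)
    for u, v in edges:
        if n == n_communities:        # stop early to read off communities
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            n -= 1
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return sorted(sorted(g) for g in groups.values())

# Two triangles joined by a single bridge edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(communities(adj, 2))   # -> [[0, 1, 2], [3, 4, 5]]
```

Because the bridge edge (2, 3) has no common neighbors, it is merged last, and stopping at two components recovers the two triangles.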
Spreading processes
Congestion
Transport of packets or materials, ranging from packet transfer on the Internet to mass
transfer in chemical reactions in the cell, is one of the fundamental processes occurring
on many real networks. Due to limitations in resources (bandwidth), an increase in the
number of packets (the packet generation rate) may lead to overload at nodes and unusually
long delivery times, in other words, to congestion in the network. Using a basic model,
Ohira and Sawatari [107] have shown that there exists a phase transition from a free-flow
phase to a congested phase as a function of the packet generation rate. This critical rate
is commonly called the congestion threshold; the higher the threshold, the better the
network performs with respect to congestion.
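The free-flow/congested transition can be illustrated with a deliberately simplified model. The star topology, the capacity of one packet per step, and the rates below are our own assumptions, not Ohira and Sawatari's exact model.

```python
# A toy congestion model: a star network whose hub forwards at most
# one packet per time step. When the total generation rate exceeds
# the hub capacity, the queue grows without bound (congested phase);
# below it, the queue stays bounded (free flow). Topology, capacity,
# and rates are illustrative assumptions.
import random

def hub_queue_growth(n_leaves, p, steps=2000, seed=0):
    """Average queue growth per step at the hub when each leaf
    generates a packet with probability p per step."""
    rng = random.Random(seed)
    queue = 0
    for _ in range(steps):
        queue += sum(rng.random() < p for _ in range(n_leaves))
        if queue:
            queue -= 1               # hub serves one packet per step
    return queue / steps

# Below the threshold (10 * 0.05 = 0.5 packets/step < capacity 1) the
# queue stays bounded; above it (10 * 0.15 = 1.5 > 1) it grows
# roughly linearly, at about 0.5 packets per step.
print(hub_queue_growth(10, 0.05))
print(hub_queue_growth(10, 0.15))
```

The generation rate at which the queue first grows linearly plays the role of the congestion threshold in this miniature setting.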
Many studies have shown that an important role is played by the topology and routing
algorithms in the congestion of networks [40, 47, 50, 51, 63, 64, 126, 128, 132]. Toroczkai et
al. [132] have shown that on large networks on which flows are influenced by gradients of a
scalar distributed on the nodes, scale-free topologies are less prone to congestion than random
graphs. Routing algorithms also influence congestion at nodes. For example, in scale-free
networks, if packets are routed along shortest paths, then most of them pass through the
hubs, causing high loads there [59]. Singh and Gupte
[126] discuss strategies to manipulate hub capacity and hub connections to relieve congestion
in the network. Similarly, many congestion-aware routing algorithms [40, 50, 51, 128] have
been proposed to improve performance. Sreenivasan et al. [128] introduced a novel static
routing protocol that is superior to shortest-path routing under intense packet generation
rates: packets are routed along hub-avoidance paths unless the hubs are required to
establish the route. Sometimes, when global information is
not available, routing is done using local search algorithms. Congestion due to such local
search algorithms and optimal network configurations are studied in [22].
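One simple way to realize hub avoidance, sketched here under our own assumptions (it is not Sreenivasan et al.'s protocol), is to run Dijkstra's algorithm with each edge penalized by the degrees of its endpoints:

```python
# Illustrative hub-avoiding routing: Dijkstra with each edge (u, v)
# costed by deg(u) + deg(v), so paths through high-degree nodes
# become expensive. Graph and helper names are assumptions.
import heapq

def hub_avoiding_path(adj, src, dst):
    deg = {u: len(adj[u]) for u in adj}
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                  # stale queue entry
        for v in adj[u]:
            nd = d + deg[u] + deg[v]  # penalize high-degree endpoints
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# Hub 0 connects everyone; nodes 1 and 2 are also linked through the
# low-degree node 5. Both routes take two hops, but the degree-costed
# search picks the one that avoids the hub.
adj = {0: {1, 2, 3, 4}, 1: {0, 5}, 2: {0, 5}, 3: {0}, 4: {0}, 5: {1, 2}}
print(hub_avoiding_path(adj, 1, 2))   # -> [1, 5, 2]
```

Routes found this way spare the hubs at the price of slightly longer paths, relieving congestion at high generation rates.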
11.6 Conclusions
Complex networks abound in today’s world and are continuously evolving. The sheer size
and complexity of these networks pose unique challenges in their design and analysis. Such
complex networks are so pervasive that there is an immediate need to develop new analytical
approaches. In this chapter, we presented significant findings and developments in recent
years that led to a new field of inter-disciplinary research, Network Science. We discussed
how approaches to optimization problems in network science differ from traditional OR
algorithms, and we addressed the need and opportunity for the OR community to contribute to
this fast-growing research field. The fundamental difference is that large-scale networks
are characterized by macroscopic properties, such as the degree distribution and clustering
coefficient, rather than by the individual properties of the nodes and edges. Importantly,
these macroscopic or statistical properties have a huge influence on the dynamic
processes taking place on the network. Therefore, to optimize a process on a given config-
Acknowledgments
The authors would like to acknowledge the National Science Foundation (Grant # DMI
0537992) and a Sloan Research Fellowship to one of the authors (R. A.) for making this work
feasible. In addition, the authors would like to thank the anonymous reviewer for helpful
comments and suggestions. Any opinions, findings and conclusions or recommendations
expressed in this material are those of the author(s) and do not necessarily reflect the views
of the National Science Foundation (NSF).
Bibliography
[1] The Internet Movie Database can be found on the WWW at http://www.imdb.com/.
[3] J. Abello and J. Vitter, editors. External Memory Algorithms: DIMACS series in
discrete mathematics and theoretical computer science, volume 50. American Mathe-
matical Society, Boston, MA, USA, 1999.
[5] L. A. Adamic and B. A. Huberman. Growth dynamics of the world-wide web. Nature,
401(6749):131, 1999.
[8] W. Aiello, F. Chung, and L. Lu. A random graph model for massive graphs. Proceedings
of the thirty-second annual ACM symposium on Theory of computing, pages 171–180,
2000.
[9] W. Aiello, F. Chung, and L. Lu. A random graph model for power law graphs. Exper-
imental Mathematics, 10(1):53–66, 2001.
[12] R. Albert and A. L. Barabási. Topology of evolving networks: Local events and
universality. Phys. Rev. Lett., 85(24):5234–5237, 2000.
[14] R. Albert, H. Jeong, and A. L. Barabási. Diameter of the world wide web. Nature,
401(6749):130–131, 1999.
[15] R. Albert, H. Jeong, and A. L. Barabási. Error and attack tolerance of complex
networks. Nature, 406:378–382, 2000.
[16] R. Albert, I. Albert, and G. L. Nakarado. Structural vulnerability of the North
American power grid. Phys. Rev. E, 69(2):025103, 2004.
[20] C. Anderson, S. Wasserman, and B. Crouch. A p∗ primer: Logit models for social
networks. Social Networks, 21(1):37–66, 1999.
[23] R. Badii and A. Politi. Complexity: Hierarchical structures and scaling in physics.
Cambridge University Press, 1997.
[25] A. L. Barabási and R. Albert. Emergence of scaling in random networks. Science,
286(5439):509–512, 1999.
[30] C. H. Bennett. From Complexity to Life, chapter How to Define Complexity in Physics,
and Why, pages 34–43. Oxford University Press, 2003.
[33] V. Boginski, S. Butenko, and P. Pardalos. Mining market data: a network approach.
[35] B. Bollobas and O. Riordan. Handbook of Graphs and Networks, chapter Mathematical
results on scale-free graphs. Wiley-VCH, Berlin, 2003.
and J. Wiener. Graph structure in the web. Computer networks, 33:309–320, 2000.
[40] Z. Y. Chen and X. F. Wang. Effects of network structure and routing strategy on
network capacity. Phys. Rev. E, 73(3):036107, 2006.
[41] F. Chung and L. Lu. Connected components in random graphs with given degree
sequences. Annals of combinatorics, 6:125–145, 2002.
[42] V. Colizza, A. Barrat, M. Barthélemy, and A. Vespignani. The role of the airline
transportation network in the prediction and predictability of global epidemics. PNAS,
103(7):2015–2020, 2006.
[44] L. F. Costa. Reinforcing the resilience of complex networks. Phys. Rev. E, 69(6):
066127, 2004.
[45] P. Crucitti, V. Latora, and M. Marchiori. Model for cascading failures in complex
networks. Phys. Rev. E, 69(4):045104, 2004.
[47] M. Argollo de Menezes and A.-L. Barabási. Fluctuations in network dynamics. Phys.
Rev. Lett., 92(2):028701, 2004.
[53] P. Erdős and A. Rényi. On the evolution of random graphs. Magyar Tud. Mat. Kutató
Int. Közl., 5:17–61, 1960.
[54] P. Erdős and A. Rényi. On the strength of connectedness of a random graph. Acta
Math. Acad. Sci. Hungar., 12:261–267, 1961.
[56] G. Flake, S. Lawrence, and C. Lee Giles. Efficient identification of web communities.
In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, pages 150–160, 2000.
[57] O. Frank and D. Strauss. Markov graphs. J. American Statistical Association, 81:
832–842, 1986.
[58] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory
of NP-Completeness. W. H. Freeman, New York, 1979.
[59] K. I. Goh, B. Kahng, and D. Kim. Universal behavior of load distribution in scale-free
networks. Phys. Rev. Lett., 87(27):278701, 2001.
[60] K. I. Goh, J. D. Noh, B. Kahng, and D. Kim. Load distribution in weighted complex
networks. Phys. Rev. E, 72(1):017102, 2005.
[61] R. Govindan and H. Tangmunarunkit. Heuristics for internet map discovery. IEEE
INFOCOM, 3:1371–1380, 2000.
[62] M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):
1360–1380, 1973.
[65] R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral. The worldwide air trans-
portation network: Anomalous centrality, community structure, and cities’ global roles.
Proc. Nat. Acad. Sci., 102:7794–7799, 2005.
[66] A. Gulli and A. Signorini. The indexable web is more than 11.5 billion pages. In
WWW ’05: Special interest tracks and posters of the 14th international conference on
World Wide Web, pages 902–903. ACM Press, New York, USA, 2005.
[67] P. Hansen and B. Jaumard. Cluster analysis and mathematical programming. Math-
ematical programming, 79:191–215, 1997.
[68] J. Hasselberg, P. M. Pardalos, and G. Vairaktarakis. Test case generators and compu-
tational results for the maximum clique problem. Journal of Global Optimization, 3:
463–482, 1993.
[71] P. Holme and B. J. Kim. Attack vulnerability of complex networks. Phys. Rev. E, 65
(5), 2002.
[74] R. Ferrer i Cancho and R. V. Solé. Statistical Mechanics of Complex Networks, chapter
Optimization in complex networks. Springer, Berlin, 2003.
[75] R. Ferrer i Cancho, C. Janssen, and R. V. Solé. Topology of technology graphs: Small
world patterns in electronic circuits. Phys. Rev. E, 64(4):046119, 2001.
[77] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale
organization of metabolic networks. Nature, 407:651–654, 2000.
[78] D. J. Johnson and M. A. Trick, editors. Cliques, Coloring, and Satisfiability: Sec-
ond DIMACS Implementation Challenge, Workshop, October 11-13, 1993. American
Mathematical Society, 1996.
[81] B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs.
The Bell System Technical Journal, 49:291–307, 1970.
[82] R. Kinney, P. Crucitti, R. Albert, and V. Latora. Modeling cascading failures in the
north american power grid. The European Physical Journal B, 46:101–107, 2005.
[89] S. Lawrence and C. L. Giles. Accessibility of information on the web. Nature, 400:
107–109, 1999.
[91] A. L. Lloyd and R. M. May. How viruses spread among computers and people. Science,
292:1316–1317, 2001.
[92] S. Milgram. The small world problem. Psychology Today, 2:60–67, 1967.
[93] M. Molloy and B. Reed. A critical point for random graphs with a given degree
sequence. Random Structures Algorithms, 6:161–179, 1995.
[96] A. E. Motter. Cascade control and defense in complex networks. Phys. Rev. Lett., 93
(9):098701, 2004.
[97] A. E. Motter and Y. Lai. Cascade-based attacks on complex networks. Phys. Rev. E,
66(6):065102, 2002.
[101] M. E. J. Newman. The structure and function of complex networks. SIAM Review,
45:167–256, 2003.
models of networks.
[105] M. E. J. Newman, S. Forrest, and J. Balthrop. Email networks and the spread of
computer viruses. Phys. Rev. E, 66(3):035101, 2002.
[106] J. D. Noh and H. Rieger. Stability of shortest paths in complex networks with random
edge weights. Phys. Rev. E, 66(6):066127, 2002.
[107] T. Ohira and R. Sawatari. Phase transition in a computer network traffic model. Phys.
Rev. E, 58(1):193–195, 1998.
[108] Committee on Network Science for Future Army Applications. Network Science. The
National Academies Press, 2005.
[109] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping community
structure of complex networks in nature and society. Nature, 435:814–818, 2005.
[110] P. M. Pardalos and J. Xue. The maximum clique problem. Journal of Global Opti-
mization, 4:301–328, 1994.
[118] S. L. Pimm. Food Webs. The University of Chicago Press, 2nd edition, 2002.
[119] A. Pothen, H. Simon, and K. Liou. Partitioning sparse matrices with eigenvectors of
graphs. SIAM J. Matrix Anal., 11(3):430–452, 1990.
[120] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identi-
fying communities in networks. Proc. Natl. Acad. Sci., 101:2658–2663, 2004.
[122] M. Ripeanu, I. Foster, and A. Iamnitchi. Mapping the gnutella network: Properties
of large-scale peer-to-peer systems and implications for system design. IEEE Internet
Computing Journal, 6:50–57, 2002.
[127] T. A. B. Snijders. Markov chain monte carlo estimation of exponential random graph
models. J. Social Structure, 3(2):1–40, 2002.
[129] D. Strauss. On a general class of models for interaction. SIAM Review, 28:513–527,
1986.
[137] W. Vogels, R. van Renesse, and K. Birman. The power of epidemics: robust commu-
nication for large-scale distributed systems. SIGCOMM Comput. Commun. Rev., 33
(1):131–135, 2003.
[138] X. F. Wang and J. Xu. Cascading failures in coupled map lattices. Phys. Rev. E, 70
(5):056113, 2004.
[139] S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press,
1994.
[140] S. Wasserman and P. Pattison. Logit models and logistic regressions for social networks
1: An introduction to markov random graphs and p∗ . Psychometrika, 61:401–426, 1996.
[141] D. J. Watts. A simple model of global cascades on random networks. Proc. Natl. Acad.
Sci., 99(9):5766–5771, 2002.
[142] D. J. Watts. Six degrees: The science of a connected age. W. W. Norton & Company,
2003.
[144] D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and search in social networks.
Science, 296:1302–1305, 2002.
[148] S. H. Yook, H. Jeong, A. L. Barabási, and Y. Tu. Weighted evolving networks. Phys.
Rev. Lett., 2001:5835–5838, 86.