
Studies in the structure and function of complex networks

with focus on
Social, Technological and Engineered networks

Usha Nandini Raghavan, 1 Soundar Kumara and 2 Réka Albert
1 Department of Industrial Engineering, The Pennsylvania State University
2 Department of Physics, The Pennsylvania State University,
University Park, Pennsylvania, 16802, USA

Prologue

We at the Laboratory for Intelligent Systems and Quality (LISQ) in the Department of Industrial Engineering at Penn State have been studying complexity since 1989. In the early stages of work at LISQ we focused on analyzing sensor signals and extracting features from them for estimating the state of machines [1, 2]. This fundamental work evolved into characterizing and analyzing the observed data. These studies established for the first time the existence of chaos in machining [3, 4, 5, 6, 7, 8]. This work on complexity, specifically nonlinear dynamics, was conducted in different realms, namely sensor networks, infrastructure monitoring and supply chains. Subsequently, the logical question we addressed was “How do we deal with complexity when the number of participating entities (nodes) increases?”. This took us in the direction of graph theory, random graphs and large-scale networks. In this monograph we summarize our work with the hope that it will help the engineering community pursue research in this new and exciting area of complex networks.
This monograph is the result of sustained work over the last six years. Several of our students helped us shape this work. Hari Prasad Thadakamalla, who started this work, was instrumental in exploring supply chains as complex networks and search on weighted graphs. We started collaborating with Dr. Réka Albert from the early stages of Hari’s PhD thesis. Christopher Carrino explored dynamic community formation in social networks with applications to terrorist networks. Usha Nandini Raghavan and Amit Surana explored adaptivity in general. Nandini, in particular, addressed algorithms for community detection in large social networks.


We have structured this monograph as an evolving document, with an introduction to complex networks and a general introduction to various problems in social, technological and engineered networks. We follow this with a series of papers that we have published in the area of complex networks in the last few years. This research has resulted in three PhD dissertations (Hari Thadakamalla, Christopher Carrino and Usha Nandini Raghavan at Penn State, co-advised with Dr. Réka Albert).
We look forward to your feedback and comments.

Soundar Kumara April 2008


skumara@psu.edu Penn State

I. INTRODUCTION

Why do some innovations capture the imagination of a society while others do not?
How do people form opinions, and how does consensus emerge in an organization? How
can we capture the opinions and votes of people during election years?

What are the fundamentals of nature and how do cells and organisms evolve and
survive? What makes a cell’s functions robust and adaptable to its environment?

How can we make resource sharing through the Internet secure? In this information
age, how do we as users quickly find relevant information on the World Wide Web?
How do we guard the technological infrastructures that form the backbone of our
day-to-day business from malicious attacks?

How can we sense and prevent forest fires at an early stage? How do we put to use
sensor devices to detect forest fires? How can we use autonomous sensor nodes to monitor
dangerous terrains and large chemical plants?

These are only a few of the questions whose answers will significantly affect the lives of people and the society we live in. Science and engineering, in their overall effort to address these issues, have created many different avenues of research, Network Science being one among them. Network science is the study of systems mainly through their network structure or topology. The nodes (vertices) of such networks are the entities (people, bio-molecules, webpages, sensor devices) and the links (edges) are the interactions between those entities (friendships, chemical reactions, hyperlinks, communications).
People have opinions of their own, but they also shape opinions by interacting and exchanging views with their friends and neighbors. Sociologists have long understood that an individual’s behavior is significantly affected by their social interactions [9, 10]. It is now widely believed that the biological functions of cells and the robustness of cellular processes arise from the interactions between the components of cells [11]. Webpages with content and information relate to other webpages by means of hyperlinks, creating a complex web-like structure: the WWW. Miniaturized wireless sensor nodes, which individually have limited capabilities, achieve an overall sensing task by communicating and sharing information with other nodes [12, 13].
A vast amount of research in recent years has shown that the organization of links (who is connected to whom) in a network, and its topological properties, carry significant information about the behavior of the system it represents [10, 14]. Furthermore, the topological properties have a huge impact on the performance of processes such as information diffusion, opinion formation, search, navigation and others.
The organization of links in large-scale natural networks was originally considered to be random [10, 14, 15]. But empirical observations in the recent past have revealed topological properties in a wide range of social, biological and technological networks that deviate from randomness [10, 14, 16, 17]. That is, networks whose evolution is largely uncontrolled (self-organized) have specific organizing principles leading to various properties or orders in their topology. This observation has sparked an interest in the scientific study of networks and network modeling, including the desire to engineer man-made systems that mimic the behaviors of nature.

II. NETWORKS

As explained above, complex systems are modeled as networks in order to understand and optimize processes such as the formation of opinions, resource sharing, information retrieval, robustness to perturbations, etc. The following are some examples of systems and their network representations.
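To make the node/edge representation concrete, here is a minimal sketch (with invented names, not data from any of the studies below) of a small friendship network stored as an adjacency list, from which a basic quantity such as node degree follows directly:

```python
# Toy friendship network stored as an adjacency list:
# each node maps to the set of its neighbors.
network = {
    "Alice": {"Bob", "Carol"},
    "Bob":   {"Alice"},
    "Carol": {"Alice", "Dave"},
    "Dave":  {"Carol"},
}

# The degree of a node is simply the number of links attached to it.
degrees = {node: len(neighbors) for node, neighbors in network.items()}
print(degrees)  # -> {'Alice': 2, 'Bob': 1, 'Carol': 2, 'Dave': 1}
```

Most structural measures discussed in this document (degree distributions, path lengths, clustering) are computed from exactly this kind of representation; packages such as NetworkX, listed among the software links later in this document, provide these operations at scale.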

A. Natural networks

A natural network is a representation of a system that is present in nature or has evolved over a period of time without any centralized control. Examples include:

1. Movie actor collaborations: This network consists of movie actors as nodes; edges represent the appearance of pairs of actors in the same movie. It is a growing network that had about 225,226 nodes and 13,738,786 edges in 1998 [18]. Interests in this network include the study of successful collaborations (what kind of casting makes a movie successful?) [19] and the famous Bacon number experiment, which studies how other actors are linked to Kevin Bacon through their casting roles [20].

2. Scientific co-authorship: In this network, the nodes are scientists or researchers and an edge exists between two scientists if they have collaborated in writing a paper. Newman [21, 22, 23] studied scientific co-authorship networks from four different areas of research. The information was obtained in an automated way from four databases, MEDLINE, the Physics E-print archive, SPIRES and NCSTRL, which hold collections of papers and their authors in biomedicine, physics, high-energy physics and computer science respectively. One of these networks, formed from the MEDLINE database for the period from 1961 to 2001, had 1,520,251 nodes and 2,163,923 edges. Developing metrics to quantify the scientific productivity or cumulative impact of a scientist given his/her collaborations is one problem of interest in co-authorship networks [24, 25]. The Erdős Number project, which motivated the Bacon number, is a popular experiment used in the study of the co-authorship structures of successful scientists [26].

3. The Internet: The Internet is a network of computers and devices connected by wired or wireless links. The study of the Internet is carried out at two different levels, namely the router level and the level of autonomous systems [14, 27]. At the router level, each router is represented as a node and the physical connections between routers as the edges of the network. At the autonomous systems level, every domain (e.g., an Internet service provider) is represented as a node and the inter-domain connections are represented by the edges. The number of nodes was 150,000 at the router level in 2000 [27] and 4000 at the domain level in 1999 [28]. The problem of identifying and sharing files efficiently over peer-to-peer networks (such as Gnutella [29]) that are built over the Internet has received significant attention in recent years [30, 31].

4. World Wide Web (WWW): The WWW is a network of webpages, with hyperlinks between webpages represented by the edges of the network. It is a growing network that had about one billion nodes in 1999 [32], with a recent study estimating its size to be about 11.5 billion in January 2005 [33]. Information retrieval from the WWW is a problem of immense interest. Algorithms such as PageRank [34], or the ones proposed by Kleinberg in [35], use the network structure to extract webpages in order of relevance to user requests.

5. Neural networks: Here the nodes are neurons and an edge connects two neurons if there is a chemical or electrical synapse between them. Watts and Strogatz [14, 18] studied the topological properties of the neural network of the nematode worm C. elegans, consisting of 282 neurons with pairs of neurons connected by the presence of either a synapse or a gap junction. The study of neural networks is important for understanding how the brain stores and processes information [17]. While we can observe that neural networks do this in an optimal and robust way, we are still at a loss to quantify the mechanism [17].

6. Cellular networks: Here the substrates or molecules that constitute a cell are represented as nodes and the bio-chemical interactions between molecules are represented as edges [14]. Among others, the interactions between protein molecules are important for many biological functions [11, 36]. Jeong et al. [11] studied the topology of the protein-protein interaction map of the yeast S. cerevisiae, which consists of 1870 proteins as nodes connected by 2240 identified interactions. Using the network structure to predict possible (previously unidentified) interactions between protein molecules has received widespread attention from researchers [37, 38].

B. Engineered networks

Engineered networks are those in which the nodes of the network follow a pre-specified set of protocols by which links are formed. Whether the control is centralized or decentralized, the organization is engineered to achieve desired topological properties. Some examples follow.

1. Agent-based supply chain networks: Here software agents responsible for the functions of a supplier, manufacturer, distributor or retailer are the nodes, and the direct flows of information/tasks/commodities between entities are represented by the edges of the network. Thadakamalla et al. [39] studied the topological properties of a military supply chain (with 10,000 nodes [40]) and proposed mechanisms by which the nodes can re-organize under functional constraints to provide better performance.

2. Wireless Sensor Networks (WSN): Here the nodes represent miniaturized wireless sensor devices that consist of a short-ranged radio transceiver and limited computational capabilities [12, 13]. Though individual sensors have limited capacities, the true value of the system is achieved by sharing responsibilities and information through a communication infrastructure [13]. Thus an edge in a WSN represents the presence of communication between two nodes. The number of nodes in a WSN can vary from a few hundred or thousand to even millions depending on the application scenario. The sensor nodes, when deployed in a sensing region, self-organize to establish a communication topology. There is considerable interest in developing topology control protocols that guide this organization process to support the global sensing tasks [12, 41, 42].
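The self-organization of a communication topology can be caricatured with a random geometric graph: nodes scattered in a region, with a link wherever two nodes fall within radio range of each other. This is a simplified sketch (the node count and range are arbitrary assumptions), not any specific topology control protocol from [12, 41, 42]:

```python
import math
import random

random.seed(42)

# Scatter hypothetical sensor nodes uniformly in a unit square.
n_nodes, radio_range = 50, 0.25
positions = [(random.random(), random.random()) for _ in range(n_nodes)]

# A communication link exists wherever two nodes are within radio range.
edges = [
    (i, j)
    for i in range(n_nodes)
    for j in range(i + 1, n_nodes)
    if math.dist(positions[i], positions[j]) <= radio_range
]
print(f"{len(edges)} communication links among {n_nodes} sensors")
```

Real topology control protocols then prune or adjust such links (e.g., by reducing transmit power) to balance connectivity against energy use.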

C. Scientific and Engineering interests

Interest in the study of natural complex networks can be broadly classified into two
classes, namely scientific and engineering. The scientific interest lies in understanding the
structure, evolution, and properties of networks, with an eventual goal of engineering more
efficient processes on these networks. The engineering interest, on the other hand, lies in
developing more efficient algorithms and finding optimal parameters to better control the
processes taking place on such networks [10, 14, 17].
With an increasing understanding of the structural organization that leads to emergent properties, a rich literature of complex network models that can mimic such properties has developed [10, 14, 17, 18]. These network models then form the basis on which processes such as disease propagation, information diffusion, search and navigation are studied and analyzed. Some of the interesting questions that can be answered using a combination of both aspects of this research include 1) how to control the spread of diseases in a large population interconnected by physical contacts, 2) how to study, maintain, and control the diffusion of information on the WWW, and 3) how to better identify targets for drug discovery in metabolic networks. In parallel, there is also considerable interest in engineered networks such as supply chains and miniaturized wireless sensor networks, where desired behaviors are achieved by controlling the interactions between entities [12, 39, 43, 44].

Useful links within Penn State

• Laboratory for Intelligent Systems and Quality

• Biological physics and network modeling

• The Huck Institute of Life Sciences

• Center for supply chain research

Other links

• Center for Complex Network Research at Notre Dame

• Center for the study of complex systems at University of Michigan, Ann Arbor

• Social computing lab at Hewlett-Packard Labs

• Complex Systems group at the Los Alamos National Labs

• The Santa Fe Institute

• The Biocomplexity Institute at the Indiana University

• New England Complex Systems Institute

• Amaral Research group at the Northwestern University

• cFinder - Clusters & Communities - overlapping dense groups in networks

• International Network for Social Network Analysis

• Center for Computational Analysis of Social and Organizational Systems

• Small world project

• Tracing information flow - Project jointly developed at Cornell University and Carleton
College

• Program on Networked Governance

• HOT-Highly Optimized Tolerance at UCSB and Caltech



• Berkeley WEBS (Wireless Embedded Systems)

• Center for embedded network sensing at UCLA

• Embedded Networks Laboratory at USC

• Microeconomic and Social Systems at Yahoo Research

• Google Research

• Web Search & Mining and Web search and Data mining groups at Microsoft research

Links to complex network software

• orgnet software

• Graphviz - Graph Visualization Software

• NetworkX - Python package for creation, manipulation and study of complex networks

• Pajek - Program for large network analysis

III. SOCIAL NETWORKS

In a social network the nodes represent actors (such as individuals) who are interconnected
by relationships (such as friendship or acquaintance). Social network analysis (SNA) deals
with the study of such networks and how the structural measures and properties relate to
individuals and the processes taking place on these networks.
SNA emphasizes the prominent role relationships play in characterizing an individual entity (or actor). Some of the properties used today in complex networks research, such as degree, betweenness centrality and closeness centrality, have their origins in sociometry. Such concepts were defined to quantify the prominent or central role played by an actor in a given network. Under the framework of complex network theory and SNA, there have been many research efforts characterizing social interactions or the relative importance of nodes in movie actor collaborations [16, 20], co-authorship networks [24] and others. There has also been much work that has, to some extent, characterized the roles of actors and predicted future collaborations in terrorist networks [19, 45]. In [45], in an extended network of September 11th hijackers and their associates, it was shown that many ties in the network were concentrated around the pilots or persons with unique skills. Hence, targeting and removing those with necessary skills (or high-degree nodes) for a project can inflict maximum damage on the project’s mission (network connectivity).
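This connectivity argument can be illustrated on a toy graph (invented here for illustration, not the network from [45]): remove the highest-degree node and measure how the largest connected component shrinks.

```python
from collections import deque

# A small toy network with one well-connected "hub" (node 0).
graph = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0, 4},
    4: {0, 3},
}

def largest_component(adj):
    """Size of the largest connected component, via breadth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

# Remove the highest-degree node and observe the loss of connectivity.
hub = max(graph, key=lambda n: len(graph[n]))
reduced = {n: nbrs - {hub} for n, nbrs in graph.items() if n != hub}
print(largest_component(graph), "->", largest_component(reduced))  # 5 -> 2
```

Removing the hub fragments the network into small pieces, which is exactly why high-degree nodes are attractive targets.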
There has, however, been a constant debate on the validity of data points that are collected to form networks involving people and their relationships. For example, if one wants to study relationships among school children, the network is formed by asking individual children in a specific school to identify their friends. It is possible that some children will name everyone in their class as friends. Especially when the number of data points collected is small, it is often difficult to attach statistical confidence to analyses, observations and their consequences. Scientific collaboration is a network for which an abundance of accurate information is available on scientists and their collaborations. As a result, such networks are very popular in the research community for the study of their structures and for understanding the social implications of their structural properties. In this network, the nodes are scientists or researchers and an edge exists between two scientists if they have collaborated in writing a paper. The network can also be weighted based on some index of the number of collaborations between scientists. In [21, 22, 23], scientific collaboration networks from various fields (biomedicine, theoretical physics, high-energy physics and computer science) were considered for structural analysis. One of the important consequences of understanding the underlying structures of such social networks is the ability to test new theories on models of these networks [10, 14, 17].
Citation networks have been studied extensively to identify the historical and social impact of papers, research and scientists. Since the introduction of the Science Citation Index (SCI) by the Institute for Scientific Information, researchers have been able to construct and study the structure of large volumes of citation interconnections between papers. The SCI provides a list of all papers from selected journals, and under each of these papers another list of the papers that reference it. In particular, a citation network consists of papers as nodes, and an edge exists between papers, directed towards the cited paper. Price [46], based on his empirical study, was the first to observe that in many papers one half of the references were to a research front of recent papers, while the other half were uniformly randomly scattered through the literature. This suggests that there is a tendency among researchers to build a research front based on recent work. Currently, there are many databases with information on papers and their references/citations that are freely available to the community. A few such databases include the Stanford Public Information Retrieval system (SPIRES), which consists of papers in the field of high-energy physics; CiteSeer, an open-access digital library that holds a comprehensive list of scientific and academic papers; Citebase, which indexes papers self-archived by authors in the fields of physics, mathematics and computer science; and BioMed Central and PubMed Central, which index published papers in the field of biomedicine. The availability of large volumes of accurate data has revived interest among researchers in the field of citation analysis.
Hirsch [24] developed a structural measure called the h-index which, unlike previous measures, can quantify the cumulative impact and relevance of an individual’s scientific research output. Specifically, the h-index of a scientist is h if h of his/her papers have at least h citations each and the remaining papers have fewer than h citations each. If this index is different for two scientists who have the same number of publications and the same number of overall citations, Hirsch argues, then the scientist with the higher h value is likely to be the more accomplished of the two.
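The definition translates directly into a few lines of code; the citation counts below are invented for illustration:

```python
def h_index(citations):
    """h is the largest value such that h papers have at least h citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    while h < len(counts) and counts[h] >= h + 1:
        h += 1
    return h

# A hypothetical scientist with seven papers:
print(h_index([10, 8, 5, 4, 3, 0, 0]))  # -> 4
```

Here four papers have at least four citations each, while the fifth-most-cited paper has only three, so h = 4.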
The telecom industry has provided some of the most naturally available social network structures for statistical analysis. Aiello et al. [47, 48] analyzed the graph of long-distance calls made between different phone numbers. They constructed a random graph model that best emulates the properties of the phone call network. A similar study was done by Nanavati et al. [49] on a call graph constructed between the cell phone users of a certain telecom provider. Here a directed edge runs from the person making a call to the person receiving it. Their analysis showed that some of the properties (namely the degree distribution) differ from those of similar networks such as the WWW and e-mail graphs. They further proposed a Treasure-Hunt model that can capture these degree distributions effectively.
In addition to studying the structures of social networks, there have also been works that combine network analysis with other methods to make inferences about the characteristics of individual entities or groups. In [50], the network of committees and subcommittees of the U.S. House of Representatives between the 101st and 108th Congresses was analyzed. Here an edge exists between committees if they have common membership. In addition to network theory, a Singular Value Decomposition (SVD) analysis of the roll call votes by the members was used to identify correlations between members’ committee assignments and their political positions (such as Republican or Democrat). Hogg and Adamic [51] have argued that ranking methods such as PageRank [34] or NodeRank [52], used to assign reputations to nodes in a social network, can be made more effective by making it more difficult to alter ratings via duplication or collusion. In particular, they argue that the structural measures of social networks can be used to make ranking systems more reliable.
While traditional models of disease propagation that assume a fully mixed population work well for small populations, they fail to agree with observed trends in heterogeneous, large populations [10, 53, 54, 55, 56, 57]. In such cases, simulation has emerged as a powerful tool that can capture both the topological properties and their changes along with the disease dynamics, providing a better understanding of disease propagation in social networks [58]. There have also been several studies related to opinion formation [59] and to finding community or group structures in social networks [35, 60, 61, 62, 63]. It has been observed that the interconnections between nodes in real-world networks are not random, but display a structure wherein nodes show preferences for being connected to other nodes within a tightly knit group. Finding such tightly interconnected groups of nodes (termed communities) can offer micro-level information about the structure of a network, both within individual communities and as a whole [61, 63, 64]. In social networks in particular, communities can shed light on opinion formation and on the common characteristics and beliefs among groups of people that distinguish them from other communities.
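One simple family of community detection heuristics is label propagation: each node repeatedly adopts the label most common among its neighbors, so that tightly knit groups converge onto a shared label. The sketch below runs on an invented six-node graph; ties are broken deterministically here for reproducibility, whereas in practice they are usually broken at random.

```python
from collections import Counter

# Two tightly knit toy groups joined by a single bridge edge (2-3).
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

# Every node starts in its own community.
labels = {node: node for node in graph}

# Each node repeatedly adopts the label most common among its neighbors
# (ties broken by choosing the largest label, for determinism).
for _ in range(5):
    for node in graph:
        counts = Counter(labels[nb] for nb in graph[node])
        top = max(counts.values())
        labels[node] = max(lab for lab, c in counts.items() if c == top)

communities = {}
for node, lab in labels.items():
    communities.setdefault(lab, set()).add(node)
print(sorted(communities.values(), key=min))  # -> [{0, 1, 2}, {3, 4, 5}]
```

The two triangles homogenize quickly, while the single bridge edge is never enough to pull either group onto the other's label.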

IV. TECHNOLOGICAL NETWORKS

Information sharing and retrieval drives day-to-day business across the world. This need has propelled research interest in technological networks such as the Internet and the WWW. The goal of such research is to develop efficient protocols for communication on the Internet and information retrieval from the WWW. Achieving this goal requires a two-pronged approach. One branch of research focuses on understanding the organization of technological networks. The second, using this understanding, relies on the network models developed to optimize information sharing and retrieval.
The map of the Internet is considered at two different scales. At the level of Autonomous Systems (AS), the Internet consists of ASs as nodes, with edges representing physical communications. An AS here is an organizational unit of a particular domain or service provider. Edges representing physical communications connect sub-networks or devices across these ASs. The exchange of information between devices or sub-networks is done using routers, devices responsible for receiving and forwarding data packets. The map of the Internet at this finer scale consists of routers as nodes and their communications within and across ASs as edges.
The structure of the Internet has been analyzed extensively at both these scales [27, 65]. Faloutsos et al. [27] analyzed the Internet AS network using data collected between 1997 and 1999. During these years, while the numbers of nodes and edges increased from 3112 and 5450 to 5287 and 10,100 respectively, the average degree remained essentially constant. This was also the case with the average path length, which was found to be approximately 3.8 for all three years. Furthermore, the path length distribution is peaked around the average value, and its shape remained essentially unchanged over the three years. On the other hand, one property of the network that did change over the years was the clustering coefficient: it increased from 0.18 in 1997 to 0.24 in 1999. This is due to the modular structure of the Internet, where many small ASs within countries are interconnected, forming clusters, while there are only a few connections to global areas. The clustering coefficient is a power-law decaying function of the degree of the nodes, C_k ≈ k^(−ν), with ν = 0.75 ± 0.03. All properties except the degree dependence of the clustering coefficient are similar for the Internet at the router level; at the router level the clustering coefficient is independent of the degree of the nodes.
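For reference, the clustering coefficient of a node is the fraction of its neighbor pairs that are themselves connected. A short sketch on a made-up graph:

```python
from itertools import combinations

# Toy undirected graph as an adjacency dict.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # clustering is undefined/zero for degree < 2
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

print({n: round(clustering(graph, n), 3) for n in graph})
```

Node "a" has three neighbors of which only one pair ("b", "c") is connected, giving a coefficient of 1/3; plotting this quantity against degree, averaged over all nodes of each degree, yields the C_k curve discussed above.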
Doyle et al. [66] take a different view of the steps required to understand the organized complexity present in the Internet topology. In the Internet, the physical connections between subsystems and routers form the lower layer of the protocol stack. The protocols, which are responsible for the routing and forwarding of data packets, try to effectively carry the overall traffic demand generated by end-users. Doyle et al. stress that it is this functional requirement that drives the organization of the Internet topology. By optimizing the network throughput (flow of traffic) given the user demands at all the end vertices, they show that one can create a network with properties and performance very similar to those of the real Internet.
Enabled by the growing infrastructure of the Internet, the WWW is another technological network that has grown manifold over the years. The WWW network consists of webpages as nodes, with a hyperlink from one webpage to another forming a directed edge between those nodes. Since it is a directed network, unlike the Internet, its in- and out-degree distributions are analyzed separately. Albert et al. observed the presence of a power-law degree distribution in the WWW map of the *.nd.edu domain [67]. While the power-law exponent for the in-degree distribution was 2.1, the exponent for the out-degree distribution was 2.45. Pennock et al. [68] analyzed the WWW by dividing it along subject categories, such as computer science, universities, companies and newspapers. Within these categories, the in-degree distributions of the networks showed considerable variability in the power-law exponent ν, varying between 2.1 and 2.6. This implies that the structure of the WWW shows different dynamics based on the way the information in the webpages and their connections is identified and mapped [64].
It has also been shown that the way in which nodes and interconnections are identified in networks (the sampling method) can affect our estimation of the structural properties of the original network [69, 70, 71, 72, 73, 74, 75]. For example, in [76] Lakhina et al. show that by using traceroute-like sampling methods [75] it is possible to conclude from the sample that a network has the scale-free property when in fact the original network is a random graph [15].
File-sharing peer-to-peer networks, such as Gnutella, are another kind of communication network that has emerged on top of the basic Internet structure. In particular, Adamic et al. and Thadakamalla et al. have analyzed how decentralized search processes on networks such as Gnutella are affected by heterogeneities in the degree and edge weight distributions [30, 31, 77]. In [77] the authors studied decentralized search processes in spatial scale-free networks. In particular, they showed that two factors, namely direction and the degree of nodes, are sufficient to guide a search process to find the shortest paths from origin to destination. This result adds further evidence to the conjecture that many natural networks are inherently searchable [78, 79].
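A degree-guided decentralized search can be sketched as a greedy walk that forwards a query to the highest-degree unvisited neighbor, checking at each step whether the target is adjacent. The graph below is invented, and this is a deliberate simplification of the strategies analyzed in [30, 31, 77]:

```python
# Toy network; node 0 searches for node 6 using only local information.
graph = {
    0: [1, 2], 1: [0, 3], 2: [0, 3],
    3: [1, 2, 4, 5], 4: [3, 5], 5: [3, 4, 6], 6: [5],
}

def degree_guided_search(adj, start, target, max_hops=20):
    """Greedy walk: forward to the highest-degree unvisited neighbor."""
    path, visited = [start], {start}
    node = start
    while len(path) <= max_hops:
        if target in adj[node]:      # a neighbor holds the item: done
            path.append(target)
            return path
        candidates = [nb for nb in adj[node] if nb not in visited]
        if not candidates:
            return path              # dead end: query dropped in this sketch
        node = max(candidates, key=lambda nb: len(adj[nb]))
        visited.add(node)
        path.append(node)
    return path

print(degree_guided_search(graph, 0, 6))  # -> [0, 1, 3, 5, 6]
```

The walk is drawn toward the hubs (nodes 3 and 5), which is why degree heterogeneity makes such networks searchable with purely local information.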
Information retrieval is an important issue on the WWW, and search engines are useful tools that help in information retrieval. Algorithms such as PageRank are used to retrieve webpages in the order expected to be of relevance to user requests. This algorithm uses both an individual webpage’s value and the value attached to the webpage by its neighbors as an indicator of the overall value of a given webpage [34]. Kleinberg et al. [35] propose similar link-based mechanisms for retrieving webpages with relevant information, but do so using two different sets of measures. They associate values with each webpage that determine whether it is a good authority and/or a good hub. A good hub is a webpage that has hyperlinks to many good authorities, and a good authority is a webpage that is referenced by many good hubs. The best set of hubs and authorities then contains the information that is of most relevance to the user. Such an approach, according to Kleinberg, was motivated by the large number of bipartite sub-structures observed in the WWW network [35].
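The core idea behind PageRank, that a page's value is fed by the values of the pages linking to it, can be sketched as a power iteration on a made-up four-page web. The damping factor 0.85 is the commonly cited choice; this is a conceptual sketch, not the production algorithm of [34]:

```python
# Hypothetical four-page web: page -> pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

damping, n = 0.85, len(links)
rank = {page: 1.0 / n for page in links}

# Power iteration: repeatedly pass each page's rank along its out-links.
for _ in range(50):
    new_rank = {page: (1 - damping) / n for page in links}
    for page, outs in links.items():
        share = damping * rank[page] / len(outs)
        for out in outs:
            new_rank[out] += share
    rank = new_rank

print(max(rank, key=rank.get))  # -> C (it collects the most in-links)
```

Page "C" ends up with the highest rank because three of the four pages link to it; Kleinberg's hubs-and-authorities scheme is computed by a structurally similar iteration over two score vectors instead of one.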
Large supply chains are among the networks that have complex topologies [39, 80, 81].
Analysis of the topological properties of real-world supply chains is difficult. This is because
supply chains are composed of various individual and independent entities such as suppliers,
manufacturers, distributors and retailers. Hence, it is difficult to compose information from
various sources to form an accurate picture of any large-scale supply chain. It is however well
known that their topologies tend to be hierarchical so as to enable product flow downstream
from suppliers to customers and information flow upstream from customers back to the
suppliers. One of the well-studied dynamical processes on supply chains is the bullwhip
effect [82], in which small variabilities or uncertainties created at the lowermost layer grow
as they move upstream towards the manufacturers and suppliers. This cascading effect
is due to the coupling of complexities arising from human judgement with those of the supply-
chain structure. Cascading effects are also studied in the context of power distribution in
power grids [83, 84, 85, 86, 87]. The North American power grid is one of the most complex
technological networks. It consists of substations of three types: generation substations
responsible for producing electric power, transmission substations that transfer power along
high-voltage lines, and distribution substations that distribute power to small, local grids.
Kinney et al [87] study cascading failures on the exact topology of the North
American power grid, under plausible assumptions about the load and overload of substations.
If a substation fails, the power it carries, since it cannot be destroyed, is re-routed via
other nodes in the network. As a result, the load on these nodes increases and may trigger
further failures. Under single-node removal, Kinney et al showed that the failure of 40
percent of transmission substations leads to cascading failures in the North American power
grid.
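The load-redistribution mechanism behind such cascades can be illustrated with a toy model. This is a deliberate simplification: Kinney et al use efficiency-based loads on the real grid topology, whereas the sketch below assumes uniform initial loads, a capacity of (1 + alpha) times the initial load (alpha is the tolerance parameter, as in the Motter-Lai family of models [85]), and an equal-split redistribution rule.

```python
def cascade(neighbors, load, alpha, start):
    """Toy load-redistribution cascade: node i tolerates (1 + alpha) * load[i];
    when a node fails, its load is split equally among its surviving neighbors,
    possibly overloading them in turn. Returns the set of failed nodes.
    """
    load = dict(load)                           # working copy
    capacity = {v: (1 + alpha) * load[v] for v in load}
    failed, queue = set(), [start]
    while queue:
        v = queue.pop()
        if v in failed:
            continue
        failed.add(v)
        alive = [u for u in neighbors[v] if u not in failed]
        if not alive:
            continue
        share = load[v] / len(alive)            # equal-split redistribution rule
        for u in alive:
            load[u] += share
            if load[u] > capacity[u]:           # tolerance exceeded -> secondary failure
                queue.append(u)
    return failed

# Chain 0-1-2-3 with unit loads: with 40% tolerance the failure of node 1
# brings down the whole chain; with 60% tolerance the failure stays contained.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
loads = {v: 1.0 for v in chain}
```

Even this toy version reproduces the qualitative point of the text: whether a single failure stays local or sweeps the network depends on how much spare capacity the surviving nodes have.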
V. CONTENTS TO FOLLOW

We have collected a series of papers that we have published in the last few years and
added them as the remaining contents of this document. They are

• A. Surana, S. Kumara, M. Greaves and U.N. Raghavan, “Supply-chain networks:
a complex adaptive systems perspective”, International Journal of Production Re-
search, Vol. 43, No. 20, pp. 4235 - 4265 (2005).
With the use of sophisticated technologies and with supply chains becoming increas-
ingly global, they have acquired a complexity almost equivalent to that of biological
systems. In this paper, we investigate supply-chain complexity from a complex adap-
tive systems perspective. Specifically, we use tools and techniques from the fields of
nonlinear dynamics, statistical physics and information theory to characterize and
model supply-chain networks.

• H.P. Thadakamalla, U.N. Raghavan, S. Kumara and R. Albert, “Survivability of
Multiagent-Based Supply Networks: A Topological Perspective”, IEEE Intelligent
Systems, Vol. 19, No. 5, pp. 24 - 31 (2004).
Our main focus in this paper is the survivability of supply networks, which we examine
from a topological perspective. We define several components that encompass topo-
logical survivability and propose methods by which one can build topologically
survivable supply networks.

• H.P. Thadakamalla, R. Albert and S. Kumara, “Search in weighted complex networks”,
Physical Review E, Vol. 72(066128) (2005) and
H.P. Thadakamalla, R. Albert and S. Kumara, “Search in spatial scale-free networks”,
New Journal of Physics, Vol. 9(190) (2007).
Search in networks is one of the important dynamical processes, with a wide range of
applications including information retrieval from the WWW, searching for files in peer-
to-peer networks and identifying specific nodes in ad-hoc and wireless sensor networks.
In these papers we develop and investigate decentralized search algorithms in various
classes of complex networks.
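As an illustration of what “decentralized” means here, the sketch below implements the degree-biased strategy analysed by Adamic et al [30] for power-law peer-to-peer networks: the current message holder sees only its own neighbors (and their degrees) and forwards to the highest-degree node it has not yet visited. This is a simplified baseline; the papers above extend such strategies to weighted and spatial networks.

```python
def degree_biased_search(neighbors, source, target, max_steps=1000):
    """Forward a message to the highest-degree unvisited neighbor until a
    node adjacent to the target is reached. Only local information is used:
    the current node's neighbor list and those neighbors' degrees.
    Returns the number of hops taken, or None if the walk stalls.
    """
    visited = {source}
    current = source
    for step in range(1, max_steps + 1):
        if target in neighbors[current]:
            return step                         # final hop delivers the message
        unvisited = [u for u in neighbors[current] if u not in visited]
        if not unvisited:
            return None                         # walk is stuck
        current = max(unvisited, key=lambda u: len(neighbors[u]))
        visited.add(current)
    return None

# Hub-and-spoke example: the search from leaf 1 climbs to the hub (node 0),
# then follows the higher-degree neighbor 5 towards the target 6.
g = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0, 6}, 6: {5}}
```

The reason this works well on scale-free topologies is that hubs see a large fraction of the network, so a message climbing the degree sequence quickly gains a wide view without any global routing table.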

• U.N. Raghavan and S. Kumara, “Decentralized topology control algorithms for con-
nectivity of distributed wireless sensor networks”, International Journal of Sensor
Networks, Vol. 2, No. 3/4, pp. 201 - 210 (2007) and
U.N. Raghavan, H.P. Thadakamalla and S. Kumara, “Phase transitions and connec-
tivity in distributed wireless sensor networks”, in the proceedings of ADCOM’05, pp.
10 - 15, Coimbatore, India (2005).
In these papers we investigate topological requirements in wireless sensor networks,
focusing in particular on connectivity. With power being one of the scarce resources
in wireless sensor networks, we optimize power expenditure subject to network
connectivity.
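One way to make the power/connectivity trade-off concrete: if all sensors use a common transmission radius, the smallest radius that keeps the network connected equals the longest edge of the Euclidean minimum spanning tree of the node positions. The sketch below computes it with a naive Prim's algorithm; it illustrates the underlying geometry, not the distributed algorithms developed in the papers above.

```python
import math

def critical_radius(points):
    """Smallest common transmission radius keeping all sensors connected.

    Equal to the longest edge of the Euclidean minimum spanning tree,
    found here with a naive O(n^2) Prim's algorithm.
    """
    n = len(points)
    if n < 2:
        return 0.0
    # best[u] = shortest distance from u to the partially built tree
    best = {i: math.dist(points[0], points[i]) for i in range(1, n)}
    longest = 0.0
    while best:
        v = min(best, key=best.get)             # next node to attach to the tree
        longest = max(longest, best.pop(v))
        for u in best:
            best[u] = min(best[u], math.dist(points[v], points[u]))
    return longest

# Three collinear sensors at x = 0, 1, 3: the network needs radius 2.
r = critical_radius([(0, 0), (1, 0), (3, 0)])
```

Since radio power typically grows at least quadratically with range, shaving this critical radius is exactly where the power savings discussed above come from.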

• U.N. Raghavan, R. Albert and S. Kumara, “Near linear time algorithm to detect
community structures in large scale networks”, Physical Review E, Vol. 76 (036106)
(2007).
In this paper we propose a near linear time algorithm to detect clusters/communities
in various real-world complex networks such as the movie actor collaboration network,
protein-protein interaction maps, scientific co-authorship networks and the WWW.
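The core idea of the algorithm can be sketched as follows; this is a compact illustration, and the paper should be consulted for the exact update schedule and stopping criterion.

```python
import random
from collections import Counter

def label_propagation(neighbors, max_iters=100, seed=0):
    """Every node starts with a unique label and repeatedly adopts the label
    held by the largest number of its neighbors (ties broken at random);
    densely connected groups converge to a common label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in neighbors}
    order = list(neighbors)
    for _ in range(max_iters):
        rng.shuffle(order)                      # asynchronous, random update order
        changed = False
        for v in order:
            if not neighbors[v]:
                continue
            counts = Counter(labels[u] for u in neighbors[v])
            top = max(counts.values())
            new = rng.choice(sorted(l for l, c in counts.items() if c == top))
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:                         # no node wants to switch: stop
            break
    return labels

# Two triangles joined by one edge: label propagation typically settles
# on one label per triangle, i.e. two communities.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = label_propagation(g)
```

Each sweep costs time linear in the number of edges, which is what makes the approach viable on the large networks listed above.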

• H.P. Thadakamalla, S. Kumara and R. Albert, “Complexity and large scale networks”,
Chapter 11 in Operations Research and Management Science Handbook edited by A.
R. Ravindran, CRC press (2007).
The engineering community, and in particular the Industrial Engineering community,
focuses largely on OR. In this book chapter we thoroughly investigate the relationship
between OR and complex networks.

[1] S. Kamarthi, S. Kumara, and P. Cohen, Wavelet Representation of Acoustic Emission in
Turning Process (????).
[2] S. Kamarthi, S. Kumara, and P. Cohen, Journal of Manufacturing Science and Engineering
122, 12 (2000).
[3] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, IIE Transactions 27, 519 (1995).
[4] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, Physical Review E 52, 2375 (1995).
[5] S. Bukkapatnam, A. Lakhtakia, and S. Kumara, Speculations in Science and Technology 19,
137 (1996).
[6] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, ASME Transactions Journal of Manufactur-
ing Science and Engineering 121, 568 (1999).
[7] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, IMA Journal of Applied Mathematics 63,
149 (1999).
[8] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, CIRP Journal of Manufacturing Systems 29,
321 (1999).
[9] S. Wasserman and K. Faust, Social network analysis: Methods and Applications (Cambridge
University Press, 1994).
[10] M. E. J. Newman, SIAM Review 45, 167 (2003).
[11] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabasi, Nature 407, 651
(2000).
[12] P. Santi, Topology Control in Wireless Ad Hoc and Sensor Networks (John Wiley and Sons,
Chichester, UK, 2005).
[13] D. Estrin, R. Govindan, J. Heidmann, and S. Kumar, in the Proceedings of ACM MobiCom
pp. 263–270 (1999).
[14] R. Albert and A.-L. Barabási, Reviews of Modern Physics 74, 47 (2002).
[15] B. Bollobás, Random Graphs (Academic Press, Orlando, FL, 1985).
[16] R. Albert, H. Jeong, and A.-L. Barabási, Nature 401, 130 (1999).
[17] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Physics Reports 424, 175
(2006).
[18] D. Watts and S. Strogatz, Nature 393, 440 (1998).
[19] C. Carrino, Ph.D. thesis, The Pennsylvania State University (2006).
[20] B. Tjaden and G. Wasson, The oracle of bacon, http://www.cs.virginia.edu/oracle/ (last
accessed April 2008).
[21] M. E. J. Newman, Proceedings of National Academy of Sciences 98, 404 (2001).
[22] M. E. J. Newman, Physical Review E 64, 016131 (2001).
[23] M. E. J. Newman, Physical Review E 64, 016132 (2001).
[24] J. E. Hirsch, Proceedings of the National Academy of Sciences 102, 16569 (2005).
[25] L. Egghe, Scientometrics 69, 131 (2006).
[26] J. Grossman, P. Ion, and R. Castro, Erdös number project, http://www.oakland.edu/enp/
(last accessed April 2008).
[27] M. Faloutsos, P. Faloutsos, and C. Faloutsos, in SIGCOMM ’99: Proceedings of the conference
on Applications, technologies, architectures, and protocols for computer communication (ACM,
1999), pp. 251–262.
[28] R. Govindan and H. Tangmunarunkit, in IEEE INFOCOM 2000 (Tel Aviv, Israel, 2000), pp.
1371–1380.
[29] G. Kan, Peer-to-Peer Harnessing the Power of Disruptive Technologies (O’Reilly, Beijing,
2001), chap. Gnutella.
[30] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, Physical Review E 64,
046135 (2001).
[31] H. P. Thadakamalla, R. Albert, and S. R. T. Kumara, Physical Review E 72, 066128 (2005).
[32] S. Lawrence and C. L. Giles, Nature 400, 107 (1999).
[33] A. Gulli and A. Signorini, in WWW ’05: Special interest tracks and posters of the 14th
international conference on World Wide Web (ACM Press, New York, USA, 2005), pp. 902–
903.
[34] L. Page, S. Brin, R. Motwani, and T. Winograd, Tech. Rep., Stanford Digital Library Tech-
nologies Project (1998), URL citeseer.ist.psu.edu/page98pagerank.html.
[35] J. M. Kleinberg, Journal of the ACM 46, 604 (1999).
[36] H. Jeong, S. Mason, A.-L. Barabási, and Z. Oltvai, Nature 411, 41 (2001).
[37] I. Albert and R. Albert, Bioinformatics 20 (2004).
[38] R. Albert, The Plant Cell 19, 3327 (2007).
[39] H. P. Thadakamalla, U. N. Raghavan, S. Kumara, and R. Albert, IEEE Intelligent Systems
19, 24 (2004).
[40] S. Kumara, Tech. Rep., The Pennsylvania State University (2005).
[41] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, Computer Networks 38, 393
(2002).
[42] D. Culler (2001).
[43] D. M. Blough, M. Leoncini, G. Resta, and P. Santi, IEEE Transactions on Mobile Computing
(to appear) (2006).
[44] J. M. Ottino, Nature 427 (2004).
[45] V. Kerbs, First Monday 7 (2002).
[46] D. Price, Science 149, 510 (1965).
[47] W. Aiello, F. Chung, and L. Lu, Proceedings of the thirty-second annual ACM symposium on
Theory of computing pp. 171–180 (2000).
[48] W. Aiello, F. Chung, and L. Lu, Experimental Mathematics 10, 53 (2001).
[49] A. Nanavati, S. Gurumurthy, G. Das, D. Chakraborty, K. Dasgupta, S. Mukherjea, and
A. Joshi, in CIKM ’06: Proceedings of the 15th ACM international conference on Information
and knowledge management (ACM, New York, NY, USA, 2006), pp. 435–444.
[50] M. Porter, P. Mucha, M. Newman, and A. Friend, Physica A 386, 414 (2007).
[51] T. Hogg and L. Adamic, in EC ’04: Proceedings of the 5th ACM conference on Electronic
commerce (ACM, New York, NY, USA, 2004), pp. 236–237.
[52] K. Chitrapura and S. Kashyap, in CIKM ’04: Proceedings of the thirteenth ACM international
conference on Information and knowledge management (ACM, New York, NY, USA, 2004),
pp. 597–606.
[53] R. Pastor-Satorras and A. Vespignani, Physical Review E 63, 066117 (2001).
[54] R. Pastor-Satorras and A. Vespignani, Physical Review Letters 86, 3200 (2001).
[55] R. Pastor-Satorras and A. Vespignani, Physical Review E 65, 035108 (2002).
[56] R. Pastor-Satorras and A. Vespignani, Physical Review E 65, 036104 (2002).
[57] R. Pastor-Satorras and A. Vespignani, Handbook of Graphs and Networks (Wiley-VCH, Berlin,
2003), chap. Epidemics and immunization in scale-free networks.
[58] C. Christensen, I. Albert, B. Grenfell, and R. Albert (2008), working paper.
[59] F. Wu and B. Huberman, Computational Economics 0407002, EconWPA (2004), available at
http://ideas.repec.org/p/wpa/wuwpco/0407002.html.
[60] M. E. J. Newman and M. Girvan, Physical Review E 69, 026113 (2004).
[61] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Nature 435, 814 (2005).
[62] J. Duch and A. Arenas, Physical Review E 72, 027104 (2005).
[63] U. Raghavan, R. Albert, and S. Kumara, Physical Review E 76, 036106 (2007).
[64] G. Flake and D. Pennock, The Colours of Infinity: Self-organization, Self-regulation, and
Self-similarity on the Fractal Web (2004).
[65] A. Vazquez, R. Pastor-Satorras, and A. Vespignani, Internet topology at the router and
autonomous system level (2002), URL http://www.citebase.org/abstract?id=oai:arXiv.org:
cond-mat/0206084.
[66] J. Doyle, D. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W. Willinger,
Proceedings of the National Academy of Sciences 102, 14497 (2005).
[67] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
[68] D. Pennock, G. Flake, S. Lawrence, E. Glover, and C. Giles, Proceedings of the National
Academy of Sciences 99, 5207 (2002).
[69] J. Leskovec and C. Faloutsos, in KDD ’06: Proceedings of the 12th ACM SIGKDD inter-
national conference on Knowledge discovery and data mining (ACM, New York, NY, USA,
2006), pp. 631–636.
[70] A. Ghani, C. Donnelly, and G. Garnett, Statistics in Medicine 17, 2079 (1998).
[71] R. Rothenberg, Connections 18, 105 (1995).
[72] P. Biernacki and D. Waldorf, Sociological Methods & Research 10, 141 (1981).
[73] A. Awan, R. Ferreira, S. Jagannathan, and A. Grama, in HICSS ’06: Proceedings of the
39th Annual Hawaii International Conference on System Sciences (IEEE Computer Society,
Washington, DC, USA, 2006), p. 223.3.
[74] D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger, INFOCOM 2006. 25th IEEE
International Conference on Computer Communications. Proceedings pp. 1–6 (April 2006).
[75] D. Achlioptas, A. Clauset, D. Kempe, and C. Moore, in STOC ’05: Proceedings of the thirty-
seventh annual ACM symposium on Theory of computing (ACM, New York, NY, USA, 2005),
pp. 694–703.
[76] A. Lakhina, J. Byers, M. Crovella, and P. Xie, INFOCOM 2003. Twenty-Second Annual Joint
Conference of the IEEE Computer and Communications Societies. IEEE 1, 332 (2003).
[77] H. P. Thadakamalla, R. Albert, and S. R. T. Kumara, New Journal of Physics 9, 190 (2007).
[78] J. Kleinberg, Nature 406, 845 (2000).
[79] J. Kleinberg, Proceedings of the International Congress of Mathematicians 3, 1019 (2006).
[80] A. Surana, S. Kumara, M. Greaves, and U. Raghavan, International Journal of Production
Research 43, 4235 (2005).
[81] D. Pathak, J. Day, A. Nair, W. Sawaya, and M. Kristal, Decision Sciences 38, 547 (November
2007).
[82] H. Lee, V. Padmanabhan, and S. Whang, Sloan Management Review 38, 93 (1997).
[83] Y. Moreno, J. B. Gomez, and A. F. Pacheco, Europhys. Lett. 58, 630 (2002).
[84] Y. Moreno, R. Pastor-Satorras, A. Vazquez, and A. Vespignani, Europhys. Lett. 62, 292
(2003).
[85] A. E. Motter and Y. Lai, Physical Review E 66, 065102 (2002).


[86] A. E. Motter, Phys. Rev. Lett. 93, 098701 (2004).
[87] R. Kinney, P. Crucitti, R. Albert, and V. Latora, The European Physical Journal B 46, 101
(2005).
International Journal of Production Research,
Vol. 43, No. 20, 15 October 2005, 4235–4265
Downloaded By: [Pennsylvania State University] At: 23:20 21 April 2008

Supply-chain networks: a complex adaptive systems perspective

AMIT SURANA†, SOUNDAR KUMARA*‡, MARK GREAVES§ and USHA NANDINI RAGHAVAN‡

†Department of Mechanical Engineering,
The Massachusetts Institute of Technology, Cambridge, MA 02139, USA
‡310 Leonhard Building,
The Harold and Inge Marcus Department of Industrial & Manufacturing Engineering,
The Pennsylvania State University, University Park, PA 16802, USA
§IXO, DARPA, 3701 North Fairfax Drive, Arlington, VA 22203-1714, USA

(Revision received May 2005)

In this era, information technology is revolutionizing almost every domain of
technology and society, whereas the ‘complexity revolution’ is occurring in
science at a silent pace. In this paper, we look at the impact of the two, in the
context of supply-chain networks. With the advent of information technology,
supply chains have acquired a complexity almost equivalent to that of biological
systems. However, one of the major challenges that we are facing in supply-chain
management is the deployment of coordination strategies that lead to adaptive,
flexible and coherent collective behaviour in supply chains. The main hurdle has
been the lack of the principles that govern how supply chains with complex
organizational structure and function arise and develop, and what organizations
and functionality are attainable, given specific kinds of lower-level constituent
entities. The study of Complex Adaptive Systems (CAS), has been a research
effort attempting to find common characteristics and/or formal distinctions
among complex systems arising in diverse domains (like biology, social systems,
ecology and technology) that might lead to a better understanding of how com-
plexity occurs, whether it follows any general scientific laws of nature, and how it
might be related to simplicity. In this paper, we argue that supply chains should
be treated as a CAS. With this recognition, we propose how various concepts,
tools and techniques used in the study of CAS can be exploited to characterize
and model supply-chain networks. These tools and techniques are based on the
fields of nonlinear dynamics, statistical physics and information theory.

Keywords: Supply chain; Complexity; Complex adaptive systems; Nonlinear
dynamics; Networks

1. Introduction

A supply chain is a complex network with an overwhelming number of interactions
and inter-dependencies among different entities, processes and resources. The network
is highly nonlinear, shows complex multi-scale behaviour, has a structure
spanning several scales, and evolves and self-organizes through a complex interplay

*Corresponding author. Email: skumara@psu.edu

International Journal of Production Research
ISSN 0020–7543 print/ISSN 1366–588X online © 2005 Taylor & Francis
http://www.tandf.co.uk/journals
DOI: 10.1080/00207540500142274
of its structure and function. This sheer complexity of supply-chain networks, with
inevitable lack of prediction, makes it difficult to manage and control them. Further-
more, the changing organizational and market trends mean that the supply chains
should be highly dynamic, scalable, reconfigurable, agile and adaptive: the network
should sense and respond effectively and efficiently to satisfy customer demand.
Supply-chain management necessitates the decisions made by business entities to
consider more factors that are global. The successful integration of the entire
supply-chain process now depends heavily on the availability of accurate
and timely information that can be shared by all members of the supply chain.
Information technology, with its capability of setting up dynamic informa-
tion exchange networks, has been a key enabling factor in shaping supply chains
to meet such requirements. However, a major obstacle remains in the deployment of
coordination and decision technologies to achieve complex, adaptive, and flexible
collective behaviour in the network. This is due to the lack of our understanding
of organizational, functional and evolutionary aspects in supply chains. A key
realization to tackle this problem is that supply-chain networks should be treated
not just as a ‘system’ but as a ‘Complex Adaptive System’ (CAS). The study of
CAS augments the systems theory and provides a rich set of tools and techniques
to model and analyse the complexity arising in systems encompassing science and
technology. In this paper, we take this perspective in dealing with supply chains and
show how various advances in the realm of CAS provide novel and effective ways
to characterize, understand and manage their emergent dynamics.
A similar viewpoint has been emphasized by Choi et al. (2001), who aimed to
demonstrate how supply networks should be managed if we recognize them as CAS.
The concept of CAS allows one to understand how supply networks as living systems
co-evolve with the rugged and dynamic environment in which they exist and identify
patterns that arise in such an evolution. The authors conjecture various propositions
stating how the patterns of behaviour of individual agents in a supply network relate
to the emergent dynamics of the network. One of the important deductions made is
that when managing supply networks, managers must appropriately balance how
much to control, and how much to let emerge. However, no concrete framework has
been suggested under which such conjectures can be verified and generalized. It is
the goal of this paper to show how the theoretical advances made in the realm of
CAS can be used to study such issues systematically and formally in the context of
supply-chain networks.
We posit that supply chains are complex adaptive systems. However, we do not
provide conclusive proofs for such a claim. We survey the emerging literature, faith-
fully report on the state of the art in CAS and try to establish connections, as much
as possible, between CAS tools and supply-chain analysis. Through our effort, we
would like to pave research directions in supply chains from a CAS point of view.
This paper is divided into eight sections. In section 2, we give a brief introduction
to complex adaptive systems in which we discuss the architecture and characteristics
of complex systems in diverse areas encompassing biology, social systems, ecology
and technology. In section 3, we discuss characteristics of supply chain-networks
and argue that they should be understood in terms of a CAS. We also present
some emerging trends in supply chains and the increasing critical role of information
technology in supply-chain management in the light of these trends. In section 4,
we give a brief overview of the main techniques used for modelling and analysis
of supply chains and then discuss how the science of complexity provides a genuine
extension and reformulation of these approaches. Like any CAS, the study of
supply chains should involve a proper balance of simulation and theory. System
dynamics-based and recently agent-based simulation models (inspired by complexity
theory) are used extensively to make theoretical investigations of supply chains
feasible and to support decision-making in real-world supply chains. A system
dynamics approach often leads to models of supply chains described in the form
of a dynamical system. Dynamical systems theory provides a powerful framework
for rigorous analysis of such models and thus can be used to supplement the system
dynamics simulation approach. We illustrate this in section 5, using some nonlinear
models, which consider the effect of priority, heterogeneity, feedback, delays and
resource sharing on the performance of supply chains. Furthermore, the large
volumes of data generated from simulations can be used to understand and com-
prehend the emergent dynamics of supply chains. Even though an exact understand-
ing of the dynamics is difficult in complex systems, archetypal behaviour patterns
can often be recognized, using techniques from complexity theory like Nonlinear
Time Series Analysis and Computational Mechanics, which are discussed in section 6.
The benefits of integrated supply chain concepts are widely recognized, but the
analytical tools that can exploit those benefits are scarce. In order to study supply
chains as a whole, it is critical to understand the interplay of organizational structure
and functioning of supply chains. Network dynamics, an extension of nonlinear
dynamics to networks, provides a systematic framework to deal with such issues
and is discussed in section 7. We conclude in section 8, with the recommendations
for future research.

2. Complex adaptive systems

Many natural systems, and increasingly many artificial (man-made) systems as well,
are characterized by apparently complex behaviours that arise as the result of non-
linear spatio-temporal interactions among a large number of components or sub-
systems. We use the term agent and node interchangeably to refer to the component
or subsystems. Examples of such natural systems include immune systems, nervous
systems, multi-cellular organisms, ecologies, insect societies and social organizations.
However, such systems are not confined to biology and society. Engineering theories
of controls, communications and computing have matured in recent decades, facil-
itating the creation of large-scale systems which have turned out to possess bewilder-
ing complexity, almost equivalent to that of biological systems. Systems sharing this
property include parallel and distributed computing systems, communication net-
works, artificial neural networks, evolutionary algorithms, large-scale software sys-
tems, and economies. Such systems have been commonly referred to as Complex
Systems (Baranger 2005, Bar-Yam 1997, Adami 1998, Flake 1998). However, at the
present time, the notion of a complex system is not precisely delineated.
The most remarkable phenomenon exhibited by complex systems is the emergence
of highly structured collective behaviour over time from the interaction
of simple subsystems without any centralized control. Their typical character-
istics include: dynamics involving interrelated spatial and temporal effects, cor-
relations over long length and timescales, strongly coupled degrees of freedom,
non-interchangeable system elements. They exist in quasi-equilibrium and show a
combination of regularity and randomness (i.e. interplay of chaos and non-chaos).
Such systems have structures spanning several scales and show emergent behaviour.
Emergence is generally understood to be a process that leads to the appearance
of structure not directly described by the defining constraints and instantaneous
forces that control a system. The combination of structure and emergence leads to
self-organization, which is what happens when an emerging behaviour has an effect
of changing the structure or creating a new structure. CAS is a special category of
complex systems to accommodate living beings. As the name suggests, they are
capable of changing themselves to adapt to changing environment. In this regard,
many artificial systems like those stated earlier can be considered as CAS, due to
their capability of evolving. Coexistence of competition and cooperation is another
dichotomy exhibited by CAS.
A CAS can be considered as a network of dynamical elements where the states
of both the nodes and the edges can change, and the topology of the network itself
often evolves in time in a nonlinear and heterogeneous fashion. A dynamical system
can be considered as simply ‘obeying the laws of physics’. From another perspective,
it can be viewed as processing information: how systems obtain information, how
they incorporate that information in the models of their surroundings, and how they
make decisions on the basis of these models determine how they behave (Lloyd
and Slotine 1996). This leads to one of the more heuristic definitions of a complex
system: one that ‘stores, processes and transmits information’ (Sawhill 1995). From
a thermodynamic viewpoint, such systems have the total energy (or its analogy)
unknown, yet something is known about the internal state structure. In these large
open systems (that do not possess well-defined boundaries), energy enters at low
entropy and is dissipated. Open systems organize largely due to the reduction in the
number of active degrees of freedom caused by dissipation. Not all behaviours or
spatial configurations can be supported. The result is a limitation of the collective
modes, cooperative behaviours, and coherent structures that an open system can
express. A central goal of the sciences of complex systems is to understand the laws
and mechanisms by which complicated, coherent global behaviour can emerge from
the collective activities of relatively simple, locally interacting components.
Complexity arises in natural systems through evolution, while design plays
an analogous role for the complex engineering systems. Convergent evolution/
design leads to remarkable similarities at a higher level of organization, though at
the molecular or device level, natural and man-made systems differ significantly.
Complexity in both cases is driven far more by the need for robustness to uncer-
tainty in the environment and component parts than by basic functionality.
Through design/evolution, such systems develop highly structured, elaborate inter-
nal configurations, with layers of feedback and signalling. Protocols organize
highly structured and complex modular hierarchies to achieve robustness, but also
create fragilities stemming from rare or ignored perturbations. The evolution of
protocols can lead to a robustness/complexity/fragility spiral where complexity
added for robustness also adds new fragilities, which in turn leads to new and
thus spiralling complexities (Csete and Doyle 2002). However, all this complexity
remains largely hidden in normal operation and only becomes conspicuous
when contributing to rare cascading failures or through chronic fragility/complexity
evolutionary spirals. Highly Optimized Tolerance (HOT) (Carlson and Doyle 1999)
has been introduced recently to focus on the ‘robust, yet fragile’ nature of complex-
ity. It is also becoming increasingly clear that robustness and complexity in biology,
ecology, technology, and social systems are so intertwined that they must be treated
in a unified way. Given the diversity of systems falling into this broad class, the
discovery of any commonalities or ‘universal’ laws underlying such systems requires
a very general theoretical framework.
The scientific study of CAS has been attempting to find common characteristics
and/or formal distinctions among complex systems that might lead to a better under-
standing of how complexity develops, whether it follows any general scientific laws
of nature, and how it might be related to simplicity. The attractiveness of the
methods developed in this research effort for general-purpose modelling, design
and analysis lies in their ability to produce complex emergent phenomena out of
a small set of relatively simple rules, constraints and the relationships couched in
either quantitative or qualitative terms. We believe that the tools and techniques
developed in the study of CAS offer a rich potential for design, modelling and
analysis of large-scale systems in general and supply chains in particular.

3. Supply-chain networks as complex adaptive systems

A supply-chain network transfers information, products and finances between various
suppliers, manufacturers, distributors, retailers and customers. A supply chain
is characterized by a forward flow of goods and a backward flow of information.
Typically, a supply chain comprises two main business processes: material manage-
ment and physical distribution (Min and Zhou 2002). The material management
supports the complete cycle of material flow from the purchase and internal control
of production material to the planning and control of work-in-process, to the ware-
housing, shipping, and distribution of finished products. On the other hand, physical
distribution encompasses all the outbound logistics activities related to providing
customer services. Combining the activities of material management and physical
distribution, a supply chain represents not only a linear chain of one-on-one business
relationships but a web of multiple business networks and relationships.
Supply-chain networks contain an emergent phenomenon. From the view of each
individual entity, the supply chain is self-organizing. Although the totality may be
unknown, individual entities partake in the grand establishment of the network by
engaging in their localized decision-making, i.e. in doing their best to select capable
suppliers and ensure on-time delivery of products to their buyers. The network is
characterized by nonlinear interactions and strong interdependencies between the
entities. In most circumstances, order and control in the network are emergent, as
opposed to predetermined. Control is generated through nonlinear though simple
behavioural rules that operate based on local information. We argue that a supply-
chain network forms a complex adaptive system:
. Structures spanning several scales: The supply-chain network is a bi-level
hierarchical and heterogeneous network where, at the higher level, each
node represents an individual supplier, manufacturer, distributor, retailer
or customer. However, at the lower level, the nodes represent the physical
entities that exist inside each node in the upper level. The heterogeneity of
most networks is a function of various technologies being provided by whatever
vendor could supply them at the time their need was recognized.
. Strongly coupled degrees of freedom and correlations over long length and
timescales: Different entities in a supply chain typically operate autono-
mously with different objectives and subject to different set of constraints.
However, when it comes to improving due date performance, increasing
quality or reducing costs, they become highly inter-dependent. It is the
flow of material, resources, information and finances that provides the bind-
ing force. The welfare of any entity in the system directly depends on the
performance of the others and their willingness and ability to coordinate.
This leads to correlations between entities over long length and timescales.
. Coexistence of competition and cooperation: The entities in a supply chain
often have conflicting objectives. Competition abounds in the form of
sharing of and contention for resources. Global control over nodes is an excep-
tion rather than a rule; more likely is a localized cooperation out of which
a global order emerges, which is itself unpredictable.
. Nonlinear dynamics involving interrelated spatial and temporal effects: Supply
chains have a wide geographic distribution. Customers can initiate transac-
tions at any time with little or no regard for existing load, thus contributing
to a dynamic and noisy network character. The characteristics of a network
tend to drift as workloads and configuration change, producing a non-
stationary behaviour. The coordination protocols attempt to arbitrate
among entities with resource conflicts. Arbitration is not perfect, however;
hence, over- and under-corrections contribute to the nonlinear character of
the network.
. Quasi-equilibrium and combination of regularity and randomness (i.e. interplay
of chaos and non-chaos): The general tendency of a supply chain is to main-
tain a stable and prevalent configuration in response to external disturbances.
However, it can undergo a radical structural change when stretched far
from equilibrium. At such a point, a small event can trigger a
cascade of changes that eventually can lead to system-wide reconfiguration.
In some situations, unstable phenomena can arise, due to feedback structure,
inherent adjustment delays and nonlinear decision-making processes that go on
in the nodes. One of the causes of unstable phenomena is that the information
feedback in the system is slow relative to the rate of changes that occur
in the system. The first mode of unstable behaviour to arise in nonlinear
systems is usually the simple one-cycle self-sustained oscillations. If the
instability drives the system further into the nonlinear regime, more compli-
cated temporal behaviour may be generated. The route to chaos through
subsequent period-doubling bifurcations, as certain parameters of the system
are varied, is generic to a large class of systems in physics, chemistry, biology,
economics and other fields. Functioning in a chaotic regime precludes
long-term prediction of the system's behaviour, although short-term
predictions may sometimes be possible. As a result, control and
stabilization of such a system become very difficult.
. Emergent behaviour and self-organization: With the individual entities
obeying a deterministic selection process, the organization of the overall
supply chain emerges through a natural process of order and spontaneity.
This emergence of highly structured collective behaviour over time from
the interaction of the simple entities leads to fulfilment of customer orders.
Demand amplification and inventory swing are two of the undesirable emer-
gent phenomena that can also arise. For instance, the decisions and delays
downstream in a supply chain often lead to an amplified, undesirable effect
upstream, a phenomenon commonly known as the ‘bullwhip’ effect.
. Adaptation and evolution: A supply chain reacts to the environment and
thereby creates its environment. Operationally, the environment depends
on the chosen scale of analysis, e.g. it can be taken as the customer market.
Typically, significant dynamism exists in the environment, which necessitates
a constant adaptation of the supply network. However, the environment
is highly rugged, making the co-evolution difficult. The individual entities
constantly observe what emerges from a supply network and adjust their
organizational goals and supporting infrastructure. Another common adap-
tation is through altering boundaries of the network. The boundaries can
change as a result of including or excluding a particular entity and by adding
or eliminating connections among entities, thereby changing the underlying
pattern of interaction. As we discuss next, supply-chain management plays a
critical role in making the network evolve in a coherent manner.
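The demand amplification noted under emergent behaviour can be reproduced with a minimal sketch. This is an illustrative toy, not a model from this work: the echelon count, the base-stock level of 20, the moving-average forecasting window and the demand statistics are all assumptions made here for the example.

```python
import random
import statistics

def simulate_bullwhip(n_stages=4, n_periods=500, window=5, seed=7):
    """Pass a noisy customer-demand signal up a serial supply chain.

    Each stage forecasts the demand it sees with a moving average over
    `window` periods and orders the forecast plus a correction toward a
    base-stock level of 20 (a deliberately simple ordering heuristic).
    One stage's orders become the next stage's demand.  Returns the
    variance of the stream seen at each stage, customer end first.
    """
    rng = random.Random(seed)
    demand = [10.0 + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
    streams = [demand]
    for _ in range(n_stages):
        seen = streams[-1]
        inventory, orders = 20.0, []
        for t, d in enumerate(seen):
            inventory -= d
            history = seen[max(0, t - window + 1): t + 1]
            forecast = sum(history) / len(history)
            order = max(0.0, forecast + (20.0 - inventory))
            inventory += order   # instantaneous replenishment (simplification)
            orders.append(order)
        streams.append(orders)
    # skip the first `window` periods to drop the start-up transient
    return [statistics.pvariance(s[window:]) for s in streams]
```

Because each stage reacts to both observed demand and its own inventory error, the ordering rule here works out to roughly order_t ≈ 1.2 d_t − 0.2 d_{t−5}, so order variance grows at every echelon even though customer demand is stationary.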

3.1 Supply-chain management


Supply-chain management is the integration of key business processes from end-
users through original suppliers that provide products, services, and information
and add value for customers and other stakeholders (Cooper et al. 1997). It involves
balancing reliable customer delivery with manufacturing and inventory costs. It
evolves around a customer-focused corporate vision, which drives changes through-
out a firm’s internal and external linkages and then captures the synergy of inter-
functional, inter-organizational integration and coordination. Owing to the inherent
complexity, it is a challenge to coordinate the actions of entities across organiza-
tional boundaries so that they perform in a coherent manner.
An important element in managing supply-chain networks is to control the
ripple effect of lead time so that the variability in the supply chain can be minimized.

Figure 1. Supply-chain network.


Demand forecasting is used to estimate demand for each stage, and the inventory
between stages for the network is used for protecting against fluctuations in supply
and demand across the network. Owing to the decentralized control properties of the
SCN, control of the ripple effect requires coordination between entities in performing
their tasks. With the increase in the number of participants in the supply chain,
the problem of coordination has taken on another dimension.
Two important organizational and market trends now under way are the
atomization of markets and of organizational entities (Balakrishnan
et al. 1999). In such a scenario, the product realization process has a continuous
customer involvement in all phases—from design to delivery. Customization is not
only limited to selecting from pre-determined model variants; rather, product design,
process plans, and even the supply chain configuration have to be tailored for
each customer. The product-realization organization has to form on the fly, as a
consortium of widely dispersed organizations to cater to the needs of a single cus-
tomer. Thus, organizations consist of a series of opportunistic alliances among several
focused organizational entities to address particular market opportunities. For
manufacturing organizations to operate effectively in this environment of dynamic,
virtual alliances, products must have modular architectures, processes must be well
characterized and standardized, documentation must be widely accessible, and sys-
tems must be interoperable. Automation and intelligent information processing are
vital for diagnosing problems during product realization and usage, coordinating
design and production schedules, and searching for relevant information in multimedia
databases. These trends exacerbate the challenges of coordination and collaboration
as the number of product realization networks increases, and so does the number
of partners in each network.
Building a larger inventory can be used as a general means for dealing with rapidly
changing market demand and short-life-cycle products. However, augmenting inven-
tory building with information may be a useful approach. Information about
the material lead time from different suppliers can be used for planning the material
arrival, instead of simply building up an inventory. The demand information can
be transmitted to the manufacturers on a timely basis, so that the orders can be
fulfilled with lower inventory costs. In fact, it is widely realized that the successful
integration of the entire supply-chain process depends heavily on the availability of
accurate and timely information that can be shared by all members of the supply
chain. Supply-chain management now increasingly relies on information technology,
as discussed below.

3.2 Information technology in supply-chain management


Information technology, with its capability of providing global reach and wide
range of connectivity, enterprise integration, micro-autonomy and intelligence,
object- and network-oriented computing paradigms and rich media support, has
been the key enabler for the management of modern manufacturing enterprises
(Balakrishnan et al. 1999). It is vital for eliminating collaboration and coordi-
nation costs, and permits the rapid setup of dynamic information exchange net-
works. Connectivity permits involvement of customers and other stakeholders in
all aspects of manufacturing. Enterprise integration facilitates seamless interaction
among global partners. Micro-autonomy and intelligence permit atomic tracking
and remote control. New software paradigms enable distributed, intelligent and
autonomous operations. Distributed computing facilitates quick localized decisions
without losing the vast data-gathering potential and powerful computing capabil-
ities. Rich media support, which includes capabilities like digitization, visualization
tools and virtual reality, facilitates collaboration and immersion.
Many improvements have occurred in supply-chain management because IT
enables dynamic changes in inventory management and production, and it assists
the managers in coping with uncertainty and lead time through improved collection
and sharing of information between supply-chain nodes. The success of an enterprise
is now largely dependent on how its information resources are designed, operated
and managed, especially with information technology emerging as a critical input
to be leveraged for significant organizational productivity. However, it is difficult to
design an information system that can handle the information needs of supply-chain
nodes to allow efficient, flexible and decentralized supply-chain management. The
main hurdle in efficiently using information technology is our lack of under-
standing of the organizational, functional and evolutionary principles of supply
chains.
Recognizing supply chains as CAS can, however, lead to novel and effective ways
to understand their emergent dynamics. It has been found that many of the diverse
looking CAS share similar characteristics and problems, and thus can be tackled
through similar approaches. While, at present, networks are largely controlled by
humans, the complexity, diversity and geographic distribution of the networks make
it necessary for networks to maintain themselves in a sort of evolutionary sense, just
as biological organisms do (Maxion 1990). Similarly, the problem of coordination,
which is a challenge in supply chains, has been routinely solved by biological systems
for literally billions of years. We believe that the complexity, flexibility and adapt-
ability in the collective behaviour of the supply chains can be accomplished only
by importing the mechanisms that govern these features in nature. Along with these
robust design principles, we require equally sound techniques for modelling and
analysis of supply chains. This forms the focus of this paper. We first give a brief
overview of the main techniques that have been used for modelling and analysis of
supply chains, and then discuss how the science of complexity provides a genuine
extension and reformulation of these approaches.

4. Modelling and analysis of supply-chain networks

As pointed out, the key challenge in designing supply-chain networks or, for
that matter, any large-scale system, is the difficulty of reverse engineering, i.e. deter-
mining what individual agent strategies lead to the desired collective behaviour.
Because of this difficulty in understanding the effect of individual characteristics
on the collective behaviour of the system, simulation has been the primary tool for
designing and optimizing such systems. Simulation makes investigations possible
and useful when, in the real-world situation, experimentation would be too costly
or, for ethical reasons, not feasible, or where the decisions and their consequences
are well separated in space and time. It seems at present that large-scale simulations
of future complex processes may be the most logical, and perhaps an important
vehicle to study them objectively (Ghosh 2002).
Simulation in general helps, first, to detect design errors prior to developing a
prototype in a cost-effective manner. Second, simulation of system operations may
identify potential problems that might occur during actual operation. Third,
extensive simulation may potentially detect problems that are rare and otherwise
elusive. Fourth, hypothetical concepts that do not exist in nature, even those that
defy natural laws, may be studied. The increased speed and precision of today’s
computers promise the development of high-fidelity models of physical and natural
processes, models that yield reasonably accurate results, quickly. This in turn would
permit system architects to study the performance impact of a wide variation of key
parameters quickly and, in some cases, even in real time. Thus, a quali-
tative improvement in system design may be achieved. In many cases, unexpected
variations in external stress can be simulated quickly to yield appropriate system
parameter values, which are then adopted into the system to enable it to success-
fully counteract the external stress.
Mathematical analysis, on the other hand, has to play a critical role because
it alone can enable us to formulate rigorous generalizations or principles. Neither
physical experiments nor computer-based experiments on their own can support
such generalizations. Physical experiments usually are limited to supplying inputs
and constraints for rigorous models, because experiments themselves are rarely
described in a language that permits deductive exploration. Computer-based
experiments or simulations have rigorous descriptions, but they deal only in specifics.
A well-designed mathematical model, on the other hand, generalizes the particulars
revealed by the physical experiments, computer-based models and any
interdisciplinary comparisons. Using mathematical analysis, we can study the
dynamics, predict long-term behaviour, and gain insights into system design: e.g.
what parameters determine group behaviour, how individual agent characteristics
affect the system, and whether the proposed agent strategy leads to the desired group
behaviour. In addition, mathematical analysis may be used to select parameters that
optimize a system’s collective behaviour, prevent instabilities, etc.
It seems that successful modelling efforts of large-scale systems like supply-chain
networks, large-scale software systems, communication networks, biological
ecosystems, food webs, social organizations, etc. would require a solid empirical
base. Pure abstract mathematical contemplation would be unlikely to lead to
useful models. The discipline of physics provides an appropriate parallel; advances
in theoretical physics are more often than not inspired by experimental findings. The
study of supply-chain networks should therefore involve an amalgam of both
simulation and analytical techniques.
Considering the broad spectrum of a supply chain, no model can capture all
the aspects of supply-chain processes. The modelling proceeds at three levels:
1. competitive strategic analysis, which includes location-allocation decisions,
demand planning, distribution channel planning, strategic alliances,
new product development, outsourcing, IT selection, pricing and network
structuring;
2. tactical problems like inventory control, production/distribution coordina-
tion, material handling and layout design;
3. operational level problems, which include routing/scheduling, workforce
scheduling and packaging.
The individual models in supply chains can be categorized into four classes
(Min and Zhou 2002):
1. deterministic: single objective and multiple objective models;
2. stochastic: optimal control theoretic and dynamic programming models;
3. hybrid: with elements of both deterministic and stochastic models and
includes inventory theoretic and simulations models;
4. IT-driven: models that aim to integrate and coordinate various phases
of supply-chain planning on a real-time basis using application software,
like ERP.
Mathematical programming techniques and simulation have been two
approaches for the analysis and study of the supply-chain models. Mathematical
programming mainly takes into consideration static aspects of the supply chain.
Simulation, on the other hand, studies dynamics in supply chains and generally
proceeds based on ‘system dynamics’ and ‘agent-based’ methodologies. System
dynamics is a continuous simulation methodology that uses concepts from engineer-
ing feedback control to model and analyse dynamic socio-economic systems
(Forrester 1961). The mathematical description is realized with ordinary differential
equations. An important advantage of system dynamics is the possibility to deduce
the occurrence of a specific behaviour mode because the structure that leads to the
system dynamics is made transparent. We present some nonlinear models in section 5
which are useful for understanding the complex interdependencies, effects of
priority, nonlinearities, delays, uncertainties and competition/cooperation for
resource-sharing in supply chains. The drawback of system dynamics models is
that the structure has to be determined before starting the simulation. Agent-
based modelling (a technique from complexity theory), on the other hand, is
a ‘bottom-up’ approach that simulates the underlying processes believed respon-
sible for the global pattern, and allows us to evaluate what mechanisms are most
influential in producing that emergent pattern. In Schieritz and Grobler (2003),
a hybrid modelling approach has been presented that intends to make the system
dynamics approach more flexible by combining it with the discrete agent-based
modelling approach. Such large-scale simulations with their many degrees of free-
dom raise serious technical problems about the design of experiments and the
sequence in which they should be carried out in order to obtain the maximum
relevant information. Furthermore, in order to analyse data from such large-scale
simulations, we require systematic analytical and statistical methods. In section 6,
we describe two such techniques: nonlinear time series analyses and computational
mechanics.
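As a concrete illustration of the system-dynamics style of model discussed above, the sketch below Euler-integrates a two-state inventory-adjustment loop. The structure (inventory filled by production, production adjusted toward a desired rate with a delay) is a textbook stock-management loop, not a model taken from this work, and all parameter values are illustrative assumptions.

```python
def simulate_inventory_loop(tau=2.0, t_adjust=4.0, dt=0.05, horizon=60.0):
    """Forward-Euler integration of a minimal system-dynamics loop:

        dI/dt = P - S              inventory fills from production, drains by sales
        dP/dt = (Pd - P) / tau     production adjusts toward the desired rate
        Pd    = S + (I_target - I) / t_adjust

    Returns the (inventory, production) trajectory.
    """
    S, I_target = 10.0, 50.0        # constant sales rate, desired inventory
    I, P = 30.0, 10.0               # start with an inventory shortfall
    trajectory = []
    for _ in range(int(horizon / dt)):
        Pd = S + (I_target - I) / t_adjust
        I, P = I + dt * (P - S), P + dt * (Pd - P) / tau
        trajectory.append((I, P))
    return trajectory
```

With these parameters the loop is underdamped: inventory overshoots the target before settling, the oscillatory tendency that the feedback-control view of supply chains makes transparent.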
A useful paradigm for modelling a supply chain, taking into consideration
the detailed pattern of interaction, is to view it as a network. A network is essen-
tially anything that can be represented by a graph: a set of points (also generically
called nodes or vertices), connected by links (edges, ties) representing some relation-
ship. Networks are inherently difficult to understand due to their structural com-
plexity, evolving structure, connection diversity, dynamical complexity of nodes,
node diversity and meta-complication where all these factors influence each other.
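As a minimal illustration of the network view, a supply chain can be stored as a directed edge list with material flowing along each edge; the entities below are invented for the example.

```python
# Directed edges point in the direction of material flow (illustrative data).
SUPPLY_EDGES = [
    ("supplier_1", "plant_A"), ("supplier_2", "plant_A"),
    ("supplier_2", "plant_B"), ("plant_A", "distributor"),
    ("plant_B", "distributor"), ("distributor", "retailer_1"),
    ("distributor", "retailer_2"),
]

def degree_counts(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    deg = {}
    for u, v in edges:
        deg.setdefault(u, [0, 0])[1] += 1   # u gains an outgoing link
        deg.setdefault(v, [0, 0])[0] += 1   # v gains an incoming link
    return {node: tuple(d) for node, d in deg.items()}
```

Even this crude summary exposes structural roles: in the toy data the distributor is the one node that every downstream flow must traverse, so its removal disconnects the retailers from the plants.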
Queuing theory has primarily been used to address the steady-state operation of a
typical network. On the other hand, techniques from mathematical programming
have been used to solve the problem of resource allocation in networks. This is
meaningful when dynamic transients can be disregarded. However, present-day
supply-chain networks are highly dynamic, reconfigurable, intrinsically nonlinear
and non-stationary. New tools and techniques are required for their analysis such
that the structure, function and growth of networks can be considered simulta-
neously. In this regard, we discuss ‘network dynamics’ in section 7, which deals
with such issues and can be used to study the structure of a supply chain and its
implication for its functionality. Understanding the behaviour of large complex
networks is the next logical step for the field of nonlinear dynamics, because they
are so pervasive in the real world. We begin with a brief introduction to dynamical
systems theory, in particular nonlinear dynamics, in the next section.

5. Dynamical systems theory

Many physical systems that produce continuous-time response can be modelled by
a set of differential equations of the form:

    dy/dt = f(y, a),                                                  (1)

where y = (y1(t), y2(t), ..., yn(t)) represents the state of the system and may be
thought of as a point in a suitably defined space S, which is known as the phase
space, and a = (a1(t), a2(t), ..., am(t)) is a parameter vector. The dimensionality of
S is the number of a priori degrees of freedom in the system. The vector field f(y, a)
is in general a nonlinear operator acting on points in S. If f(y, a) is locally Lipschitz,
the above equation defines an initial value problem in the sense that a unique solu-
tion curve passes through each point y in the phase space. Formally, we may write
the solution at time t given an initial value y0 as y(t) = φt(y0), where φt represents a
one-parameter family of maps of the phase space into itself. We can perceive the
solutions to all possible initial value problems for the system by writing them
collectively as φt(S). This may be thought of as a flow of points in the phase space.
Initially, the dimension of the set φt(S) will be that of S itself. As the system
evolves, however, it is generally the case for a dissipative system that the flow
contracts onto a set of lower dimension known as an attractor. The attractors can
vary from simple stationary points, limit cycles and quasi-periodic motions to
complicated chaotic attractors (Strogatz 1994, Ott 1996). The nature of the
attractor changes as the parameters (a) are varied, a phenomenon studied in
bifurcation analysis. Typically, a nonlinear system is chaotic for
some range of parameters. Chaotic attractors have a structure that is not simple;
they are often not smooth manifolds and frequently have a highly fractured
structure, popularly referred to as fractals (self-similar geometrical objects
having structure at every scale). On this attractor, stretching and folding
characterize the dynamics. The stretching phenomenon causes the divergence of
nearby trajectories, and the folding phenomenon constrains the dynamics to a finite region of
the state space. This accounts for the fractal structure of attractors and the extreme
sensitivity to changes in initial conditions, which is a hallmark of chaotic behaviour.
A system under chaos is unstable everywhere and never settles down, producing
irregular and aperiodic behaviour, which leads to a continuous broadband spectrum.
While this feature can be used to distinguish chaotic behaviour from stationary, limit
cycle, quasi-periodic motions using standard Fourier analysis, it makes it difficult to
separate it from noise which also has a broadband spectrum. It is this ‘deterministic
randomness’ of chaotic behaviour which makes standard linear modelling and
prediction techniques unsuitable for analysis.
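This sensitivity to initial conditions is easy to exhibit numerically. The sketch below uses the Lorenz equations, a standard chaotic benchmark rather than a supply-chain model, integrated with a crude forward-Euler step; two trajectories started 1e-8 apart end up macroscopically separated.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz system (accuracy is illustrative)."""
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def final_separation(n_steps=3000, delta=1e-8):
    """Distance between two trajectories launched `delta` apart in x."""
    a, b = (1.0, 1.0, 1.0), (1.0 + delta, 1.0, 1.0)
    for _ in range(n_steps):
        a, b = lorenz_step(a), lorenz_step(b)
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
```

The separation grows roughly exponentially until it saturates at the diameter of the attractor, which is exactly why long-term prediction fails while short-term prediction may still be possible.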

5.1 Nonlinear models for the supply chain


Understanding the complex interdependencies, effects of priority, nonlinearities,
delays, uncertainties and competition/cooperation for resource sharing are funda-
mental for prediction and control of supply chains. A system dynamics approach
often leads to models of supply chains, which can be described in the form of
equation (1). Dynamical systems theory provides a powerful framework for rigorous
analysis of such models and thus can be used to supplement the system dynamics
approach. We next describe some nonlinear models and their detailed analysis. These
models can be used either to represent entities in a supply chain or as macroscopic
models, which capture collective behaviour. The models reiterate the fact that simple
rules can lead to complex behaviour, which in general are difficult to predict and
control.

5.1.1 Pre-emptive queuing model with delays. Priority and heterogeneity are funda-
mental to any logistic planning and scheduling. Tasks have to be prioritized in order
to do the most important things first. This comes naturally as we try to optimize an
objective and assign the tasks their ‘importance’. Priorities may also arise due to the
non-homogeneity of the system where the ‘knowledge’ level of one agent is different
from that of another. In addition, in all logistics systems, resources are limited, in both

Figure 2. Pre-emptive queuing model.


time and space. Temporal dependence plays an important role in logistic planning
(interdependency). Sometimes, such dependencies arise from physical facts, when
different stages of processing have certain temporal constraints.
The considerations regarding the generality of assumptions and the clear one-
to-one correspondence between the physical logistics tasks and the model param-
eters described in Erramilli and Forys (1991) made us apply their queuing model in
the context of supply chains (Kumara et al. 2003). The queuing system considered
here has two queues (A and B) and a single server with the following characteristics:
. once served, the class A customer returns as a class B customer after a
constant interval of time;
. class B has non-pre-emptive priority over class A, i.e. the class A queue is
not served until the class B queue is emptied;
. the schedules are organized every T units of time, i.e. if the low priority queue
is emptied within time T, the server remains idle for the remainder of the
interval;
. finally, the higher-priority class B has a lower service rate than the low-
priority class A.
Suppose the system is sampled at the end of every schedule cycle, and the follow-
ing quantities are observed at the beginning of the kth interval: Ak: queue length
of low-priority queue; Bk: queue length of high-priority queue; Ck: outflow from
low-priority queue in the kth interval; Dk: outflow from high-priority queue in the
kth interval; λk: inflow to low-priority queue from the outside in the kth interval.
The system is characterized by the following parameters: a: rate per unit of
the schedule cycle at which the low-priority queue can be served; b: rate per unit
of the schedule cycle at which the high-priority queue can be served; l: the feedback
interval in units of the schedule cycle.
The following four equations then completely describe the evolution of the
system:
    A_{k+1} = A_k + λ_k − C_k                                        (2)

    C_k = min( A_k + λ_k , a (1 − D_k / b) )                         (3)

    B_{k+1} = B_k + C_{k−l} − D_k                                    (4)

    D_k = min( B_k + C_{k−l} , b ).                                  (5)

Equations (2) and (4) are merely conservation rules, while equations (3) and (5)
model the constraints on the outflows and the interaction between the queues.
This model, while conceptually simple, exhibits surprisingly complex behaviours.
The analytic approach to solve for the flow model under constant arrivals
(i.e. λk = λ for all k) shows several classes of solutions. The system batches its
workload even for perfectly smooth arrival patterns. The characteristics of the behaviour
of the system are as follows:
1. Above a threshold arrival rate (λ > b/2), a momentary overload can send the
system into a number of stable modes of oscillations.
2. Each mode of oscillations is characterized by distinct average queuing delays.
3. The extreme sensitivity to parameters, and the existence of chaos, implies
that the system at a given time may be any one of a number of distinct
steady-state modes.
The batching of the workload can cause significant queuing delays, even at moderate
occupancies. Also, such oscillatory behaviour significantly lowers the real-time
capacity of the system. For details of the application of this model in a supply-chain
context, refer to Kumara et al. (2003).
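Equations (2)-(5) can be iterated directly. The sketch below does so with illustrative parameter values (none are fixed by the discussion above), chosen so that the higher-priority class B has the lower service rate, as specified, and with an initial class-A backlog standing in for a momentary overload.

```python
def simulate_queues(lam=0.3, a=0.9, b=0.5, l=2, n=300, A0=5.0):
    """Iterate equations (2)-(5) with constant inflow lam (lambda_k = lam).

    A, B are the low-/high-priority queue lengths; served class-A work
    returns to queue B after the feedback interval of l schedule cycles.
    Returns the (A, B) trajectory.
    """
    A, B = A0, 0.0
    c_hist = [0.0] * l                       # C_{k-l} is zero before the start
    history = []
    for _ in range(n):
        c_fb = c_hist[-l]                    # C_{k-l}
        D = min(B + c_fb, b)                 # eq. (5)
        C = min(A + lam, a * (1.0 - D / b))  # eq. (3)
        A = A + lam - C                      # eq. (2)
        B = B + c_fb - D                     # eq. (4)
        c_hist.append(C)
        history.append((A, B))
    return history
```

Since D_k never exceeds b and C_k never exceeds A_k + λ, both queue lengths stay non-negative; plotting the trajectory for inflow rates above and below b/2 lets one look for the batching and oscillatory modes described above.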

5.1.2 Managerial systems. Decision-making is another typical characteristic in
which the entities in a supply chain are continuously engaged. Entities make deci-
sions to optimize their self-interests, often based on local, delayed and imperfect
information.
To illustrate the effects of decisions on the dynamics of supply chain as a whole,
we consider a managerial system which allocates resources to its production and
marketing departments in accordance with shifts in inventory and/or backlog
(Rasmussen and Mosekilde 1988). It has four level variables: resources in production,
resources in sales, inventory of finished products and number of customers. In order to
represent the time required to adjust production, a third-order delay is introduced
between production rate and inventory. The sum of the two resource variables is
kept constant. The rate of production is determined from resources in production
through a nonlinear function, which expresses a decreasing productivity of addi-
tional resources as the company approaches maximum capacity. The sales rate, on
the other hand, is determined by the number of customers and by the average sales
per customer-year. Customers are mainly recruited through visits of the company
salesman. The rate of recruitment depends upon the resources allocated to marketing
and sales, and again it is assumed that there is a diminishing return to increasing
sales activity. Once recruited, customers are assumed to remain with the company
for an average period AT, the association time.
A difference between production and sales causes the inventory to change.
The company is assumed to respond to such changes by adjusting its resource
allocation. When the inventory is lower than desired, on the other hand, resources
are redirected from sales to production. A certain minimum of resources is always
maintained in both production and sales. In the model, this is secured by means of
two limiting factors, which reduce the transfer rate when a resource floor is
approached. Finally, the model assumes that there is a feedback from inventory to
customer defection rate. If the inventory of finished products becomes very low, the
delivery time is assumed to become unacceptable to many customers. As a conse-
quence, the defection rate is enhanced by a factor 1 + H.
The managerial system described is controlled by two interacting negative feed-
back loops. Combined with the delays involved in adjusting production and sales,
these loops create the potential for oscillatory behaviour. If the transfer of resources
is fast enough, this behaviour is destabilized, and the system starts to perform self-
sustained oscillations. The amplitude of these oscillations is finally limited by the
various nonlinear restrictions in the model, particularly by the reduction in resource
transfer rate as lower limits to resources in production or resources in sales are
approached.
Figure 3. Managerial system.

A series of abrupt changes in the system behaviour is observed as the competition
between the basic growth tendency and the nonlinear limiting factors is shifted.
The simple one-cycle attractor corresponding to H = 10 becomes unstable for
H = 13, and a new stable attractor with twice the original period arises. If H is
increased to 28, the stable attractor attains a period of 4. As H is further increased,
the period-doubling bifurcations continue until H = 30, the threshold to chaos, is
exceeded. The system then behaves in an aperiodic and apparently random
manner. Hence, the system reaches chaotic behaviour through a series of period-
doubling bifurcations.

5.1.3 Deterministic queuing model. In this section, we consider an alternate
discrete-time deterministic queuing model for studying decision-making at an
entity level in supply chains. The model consists of one server and two queuing
lines (X and Y) representing some activity (Feichtinger et al. 1994). The input
rates of both queues are constant, and their sum equals the server capacity. In
each time period, the server has to decide how much time to spend on each of the
two activities.
The following quantities can be defined: α: constant input rate for activity X;
β: constant input rate for activity Y; τX: time spent on activity X; τY: time spent
on activity Y; xk: queue length of X; yk: queue length of Y.
The amounts of time τX and τY that will be spent on activities X and Y in
period k + 1 are determined by an adaptive feedback rule depending on the
difference of the queue lengths xk and yk. The decision rule or policy function
states that longer queues are served with a higher priority. Two possibilities
considered are:
1. All-or-nothing decision: The server decides to spend all its time on the
activity corresponding to the longer queue. Hence, φ is a Heaviside function
given by

φ(x − y) = 1 if x ≥ y,
φ(x − y) = 0 if x < y.   (6)

2. Mixed solutions: The server decides to spend most of its time on the activity
corresponding to the longer queue. For this decision function, an S-shaped
logistic function is used, given by

φ(x − y) = 1 / (1 + e^(−k(x−y))).   (7)

The parameter k tunes the ‘steepness’ of the S-shape.


With these decision functions, the new queue lengths xk+1 and yk+1 are given
by the equations

xk+1 = xk + α − φ(xk − yk),
yk+1 = yk + β − (1 − φ(xk − yk)).   (8)

Figure 4. Deterministic queuing model.


Using the constraints α + β = 1 and τX + τY = 1, the total queue length xk + yk
is conserved (here normalized to 2), so it is sufficient to consider the dynamics of
the one-dimensional map

f(x) = x + α − φ(2x − 2)   (9)

in order to study the behaviour of the system.
For 0 < k < 4 and for all 0 < α < 1, the map f has a globally stable equilibrium.
Simulation shows that when the parameter k is not too large, the bifurcation dia-
grams with respect to α are simple. For larger values of k (e.g. k = 7.3), chaotic
behaviour arises after infinitely many period-doubling bifurcations as α is increased
from 0.0 to 0.3. However, when α is further increased from 0.3 to 0.5, chaos
disappears after many period-halving bifurcations. For 0.5 < α < 1, the bifurcation
scenario is qualitatively the same as for 0 < α < 0.5, since the system is symmetric
w.r.t. α = 0.5 and x = 1. Physically, when α is close to 0, there is a stable equilibrium,
meaning that in the long run, in each time period, the server spends a fixed pro-
portion of time on each of the two activities, and it spends most of the time on
activity Y, which has the higher input rate. For α close to 1, we have the same
behaviour, with the activities X and Y interchanged. For α close to 0.5, i.e. when the
input rates of the two activities are almost equal, the equilibrium is unstable, and
there is a stable period-2 orbit. This means that in one period, most of the time is
spent on activity X, then in the next period most of the time is spent on activity Y,
then again on activity X, and so on. Chaotic behaviour arises when α lies between
0 and 0.5 or between 0.5 and 1, around α = 1/3 and α = 2/3. Hence, a steep decision
function together with a situation where the input rate of one activity is around
twice the input rate of the other leads to irregular queue lengths.
As k → ∞, the decision function φ converges to the Heaviside function.
The dynamical behaviour of the queuing model in that case is equivalent to rigid
rotation on a circle. For rational values α = p/q of the input rate, every point x is
periodic with period q. In that case, out of every q time periods, p time periods are
completely spent on the first activity, while the remaining q − p time periods are
spent on the other activity. On the other hand, when α is irrational, the dynamical
behaviour is quasi-periodic, and every point x is aperiodic.
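The reduced map (9) is easy to explore numerically. The sketch below is a minimal plain-Python illustration; the initial condition, transient length and parameter values are our own illustrative choices (k = 7.3 is the steep case mentioned above), not prescriptions from the model.

```python
import math

def phi(z, k):
    """S-shaped logistic decision function of equation (7)."""
    return 1.0 / (1.0 + math.exp(-k * z))

def f(x, alpha, k):
    """Reduced queue map of equation (9): y has been eliminated using
    the conserved total queue length x + y = 2."""
    return x + alpha - phi(2.0 * x - 2.0, k)

def attractor(alpha, k, x0=0.9, transient=2000, length=4):
    """Iterate past the transient and return a few points on the attractor."""
    x = x0
    for _ in range(transient):
        x = f(x, alpha, k)
    points = []
    for _ in range(length):
        points.append(x)
        x = f(x, alpha, k)
    return points
```

For α = 0.1 the orbit settles on a fixed point, while for α = 0.5 it alternates between two values: the stable period-2 orbit in which the server favours X and Y in turn.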

5.2 Dynamical models of resource allocation: Computational ecosystems


Because of limited resources, resource sharing and allocation is a fundamental prob-
lem in any supply chain. The manner in which the resources are shared and utilized
has a significant impact on the performance of a supply chain. It also dictates how
cooperation/competition arises and is sustained in a supply chain. Resources can be
of various types: physical resources, manpower, information and monetary. With the
IT architectures being developed to realize supply chains, sharing of computational
resources (like CPU, memory, bandwidth, databases, etc.) is also becoming a critical
issue. It is through resource sharing that interdependencies arise between different
entities. This leads to a complex web of interactions in supply chains just like in a
food web or ecology. As a result, such systems can be referred to as ‘Computational
Ecosystems’ (Hogg and Huberman 1988) in analogy with biological ecosystems.
‘Computational Ecosystems’ is a generic model of the dynamics of resource
allocation among agents trying to solve a problem collectively. The model captures
the following features: distributed control, asynchrony in execution, resource con-
tention and cooperation among agents and concomitant problem of incomplete
knowledge and delayed information. The behaviour of each agent is modelled
using a payoff function whose nature determines whether an agent is cooperative
or competitive. The agent here can be any entity in a supply chain like a distributor,
retailer, etc. or a software agent in an e-commerce scenario. The state of the system is
represented as an average number of entities using different resources and follows a
delay differential equation under a mean field approximation. The resources can be
physical or computational as discussed before. For example, in case of two resources
with n identical agents, the rate of change of occupation of a resource is given by:


d⟨n1(t)⟩/dt = α( n⟨ρ⟩ − ⟨n1(t)⟩ ),   (10)

where ⟨n1(t)⟩ is the expected number of agents using resource 1 at a given instant
of time t; α is the expected number of choices made by an agent per unit time; ρ is
a random variable indicating whether resource 1 will be perceived to have a higher
payoff than resource 2; and ⟨ρ⟩ gives its expected value.
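Equation (10) can be explored with a direct Euler integration. The sketch below works with the fraction u = ⟨n1⟩/n of agents on resource 1 and assumes a concrete perception probability ρ(u), a steep logistic that makes resource 1 look more attractive when its delayed utilization is low; the form of ρ, the steepness β and the delay values τ are illustrative assumptions of ours, not taken from Hogg and Huberman.

```python
import math

def rho(u, beta=8.0):
    """Assumed perception probability: resource 1 is judged better when
    its (delayed) utilization u is low; beta sets the steepness."""
    return 1.0 / (1.0 + math.exp(beta * (u - 0.5)))

def simulate(tau, alpha=1.0, u0=0.6, dt=0.01, t_end=300.0):
    """Euler integration of du/dt = alpha * (rho(u(t - tau)) - u(t))."""
    lag = max(1, int(round(tau / dt)))
    hist = [u0] * lag            # constant history for t < 0
    u = u0
    traj = []
    for i in range(int(t_end / dt)):
        delayed = hist[i % lag]  # value stored 'lag' steps ago
        u = u + dt * alpha * (rho(delayed) - u)
        hist[i % lag] = u        # this slot is read again 'lag' steps later
        traj.append(u)
    return traj
```

For a short delay the utilization settles at the balanced point u = 0.5; past a critical delay the equilibrium loses stability and the sustained oscillations described below appear.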
The global performance of the ecosystem can be obtained from the above equa-
tion. Under different conditions of delay, uncertainty, and cooperation/competition,
the system shows a rich panoply of behaviours ranging from stable, sustained oscil-
lations to intermittent chaos and finally to fully developed chaos. Furthermore,
the following generic deductions can be made from this model (Kephart et al.
1989): while information delay has an adverse impact on the system performance,
uncertainty has a profound effect on the stability of the system. One can deliberately
increase uncertainty in agents’ evaluation of the merits of choices to make the
system stable, but at the expense of performance degradation. A second possibility is a very slow
re-evaluation rate of the agents, which makes them non-adaptive. Heterogeneity in

Figure 5. Computational ecosystems (τ: time delay; σ: standard deviation of ρ).


the nature of agents can lead to more stability in the system compared with a
homogeneous case, but the system loses its ability to cope with unexpected changes
in the system such as new task requirements. On the other hand, a poor performance
can be traced to the fact that the non-predictive agents do not take into account
the information delay.
If the agents are able to make accurate predictions of the system’s current state,
the information delay could be overcome, and the system would perform well. This
results in a ‘co-evolutionary’ system in which all of the individuals are simulta-
neously trying to adapt to one another. In such a situation, agents can act like
Technical Analysts and System Analysts (Kephart et al. 1990). Agents as technical
analysts (like those in market behaviour) use either linear extrapolation or cyclic
trend analysis to estimate the current state of the system. On the other hand, agents
as system analysts have knowledge about both the individual characteristics of the
other agents in the system and how those characteristics are related to the overall
system dynamics. Technical analysts are responsive to the behaviour of the system
but suffer from an inability to take into account the strategies of other agents.
Moreover, a good predictive strategy for a single agent may be disastrous if applied
on a global scale. System analysts perform extremely well when they have very
accurate information about other agents in the system but can perform very
poorly when their information is even slightly inaccurate. They take into account
the strategies of other agents but pay no heed to the actual behaviour of the system.
This suggests combining the strengths of both methods to form a hybrid-adaptive
system analyst, which modifies its assumptions about other agents in response to
feedback about success of its own predictions. The resultant hybrid is able to per-
form well.
In order to avoid chaos while maintaining a high performance and adaptability
to unforeseen changes, more sophisticated techniques are required. One such way is
by a reward mechanism (Hogg and Huberman 1991), whereby the relative number of
computational agents following effective strategies is increased at the expense of the
others. This procedure, which generates the right population diversity out of an
essentially homogeneous one, is able to control chaos through a series of
bifurcations into a stable fixed point.
In the above description, each agent chooses among different resources accord-
ing to its perceived payoff, which depends on the number of agents already using it.
Even the agent with predictive ability is myopic, as it considers only its current
estimate of the system state, without regard to the future. Expectations come into
play if agents use past and present global behaviour in estimating the expected future
payoff for each resource. A dynamical model of collective action that includes
expectations can be found in Glance (1993).

6. Models from observed data

One of the central problems in a supply chain, closely related to modelling, is that
of demand forecasting: given the past, how can we predict the future demand?
The classic approach to forecasting is to build an explanatory model from first
principles and measure the initial conditions. Unfortunately, this has not been pos-
sible for two reasons in systems like supply chains. First, we still lack the general
‘first principles’ for demand variation in supply chains, which are necessary to make
good models. Second, due to the distributed nature of the supply chains, the initial
data or the conditions are often difficult to obtain.
Because of these factors, the modern theory of forecasting that has been used in
supply chains views a time series x(t) as a realization of a random process. This is
appropriate when effective randomness arises from complicated motion involving
many independent, irreducible degrees of freedom. An alternative cause of random-
ness is chaos, which can occur even in very simple deterministic systems, as we
discussed in the earlier sections. While chaos places a fundamental limit on long-
term prediction, it suggests possibilities for short-term prediction. Random-looking
data may contain only a few irreducible degrees of freedom. Time traces of the state
variables of such chaotic systems display behaviour that is intermediate between
regular periodic or quasiperiodic motion and unpredictable, truly stochastic behav-
iour. Such behaviour has long been dismissed as a form of ‘noise’ because the tools
for its analysis were couched in language tuned to linear processes. The main such tool is Fourier
analysis, which is precisely designed to extract the composition of sines and cosines
found in an observation x(t). Similarly, the standard linear modelling and pre-
diction techniques, such as autoregressive moving average (ARMA) models, are
not suitable for nonlinear systems.
With the advances in IT and science of complexity, both the challenges for
forecasting can be revisited. Large-scale simulation and micro-autonomy (section 2)
enable tracking of the detailed interaction between different entities in a supply
chain. The large volumes of data thus generated can be used to understand
demand patterns in particular and comprehend the emergence of other character-
istics in general. Even though an exact prediction of future behaviour is difficult,
often archetypal behaviour patterns can be recognized using these data. Techniques
from the complexity theory like Nonlinear Time Series Analysis and Computational
Mechanics are appropriate for this purpose.

6.1 Nonlinear time-series analysis


The need to extract interesting physical information about the dynamics of observed
systems when they are operating in a chaotic regime has led to the development of
nonlinear time series analysis techniques. Systematically, the study of potentially
chaotic systems may be divided into three areas: identification of chaotic behaviour;
modelling and prediction; and control. The first area shows how chaotic systems may
be separated from stochastic systems and, at the same time, provides estimates of the
degrees of freedom and the complexity of the underlying chaotic system. Based on
such results, identification of a state-space representation allowing for subsequent
predictions may be carried out. The last stage, if desirable, involves control of a
chaotic system.
Given the observed behaviour of a dynamical system as a one-dimensional time
series x(n), we want to build models for prediction. The most important task in this
process is phase space reconstruction, which involves building a topologically and
geometrically equivalent attractor. In general, the steps in nonlinear time series analysis
can be summarized as (Abarbanel 1996):
. Signal separation (finding the signal): Separation of a broadband signal from
broadband ‘noise’ using the deterministic nature of the signal.
. Phase space reconstruction (finding the space): Using the method of delays,
one can construct a series of vectors that is diffeomorphically equivalent to
the attractor of the original dynamical system and, at the same time, distin-
guish it from a stochastic process. The basis for this is Takens’ embedding
theorem (Takens 1981). Time-lagged variables are used to construct vectors
for a phase space in dE dimensions:

y(n) = [x(n), x(n + T), . . . , x(n + (dE − 1)T)].   (11)

The time lag T can be determined using mutual information (Fraser and
Swinney 1983) and dE using a false nearest-neighbours test (Kennel et al.
1992).
. Classification of the signal: System identification in nonlinear chaotic systems
means establishing a set of invariants for each system of interest and then
comparing observations with that library of invariants. The invariants are
properties of the attractor and are independent of any particular trajectory on the
attractor. Invariants can be divided into two classes: fractal dimensions
(Farmer et al. 1983) and Lyapunov exponents (Sano and Sawada 1985).
Fractal dimensions characterize the geometrical complexity of dynamics,
i.e. how the sample of points along a system orbit is distributed spatially.
Lyapunov exponents, on the other hand, describe the dynamical complexity,
i.e. ‘stretching and folding’ in the dynamical process.
. Making models and prediction: This step involves determination of the
parameters aj of the assumed model of the dynamics

y(n) → y(n + 1),
y(n + 1) = F(y(n), a1, a2, . . . , ap),   (12)
which is consistent with invariant classifiers (Lyapunov exponents, dimensions).
The functional form F () often used includes polynomials, radial basis functions,
etc. The Local False Nearest Neighbor (Abarbanel and Kennel 1993) test is used to
determine how many dimensions are locally required to describe the dynamics gen-
erating the time series, without knowing the equations of motion, and hence gives
the dimension for the assumed model. The methods for building nonlinear models
are classified as Global and Local (Farmer and Sidorowich 1987, Casdagli 1989).
By definition, Local methods vary from point to point in the phase space, while
Global Models are constructed once and for all in the whole phase space. Models
based on Machine Learning techniques such as radial basis functions or Neural
Networks (Powell 1987) and Support Vector Machines (Mukherjee et al. 1997)
carry features of both. They are usually used as global functional forms, but they
clearly demonstrate localized behaviour, too.
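The reconstruction and prediction steps above can be sketched in a few lines of plain Python. The embedding follows equation (11); the predictor is a zeroth-order local method (return the successor of the nearest embedded neighbour), which is our own simplified stand-in for the local modelling methods cited above.

```python
def delay_embed(x, dim, lag):
    """Delay vectors y(n) = [x(n), x(n+T), ..., x(n+(dE-1)T)], equation (11)."""
    n_vec = len(x) - (dim - 1) * lag
    return [[x[i + j * lag] for j in range(dim)] for i in range(n_vec)]

def predict_next(x, dim=3, lag=1):
    """Zeroth-order local prediction: find the past delay vector closest to
    the current one and return the observation that followed it."""
    vecs = delay_embed(x, dim, lag)
    target = vecs[-1]                   # the current (most recent) state
    best, best_d = None, float("inf")
    for i, v in enumerate(vecs[:-1]):
        succ = i + (dim - 1) * lag + 1  # index of the observation after vector i
        if succ >= len(x):
            continue
        d = sum((a - b) ** 2 for a, b in zip(v, target))
        if d < best_d:
            best_d, best = d, x[succ]
    return best
```

On data from a chaotic logistic map, such a nearest-neighbour forecast tracks the true next value closely, illustrating the short-term predictability of deterministic chaos discussed above.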
The techniques from nonlinear time series analysis are well suited for
modelling the nonlinearities in the supply chains. For an application of nonlinear
time series analysis in supply chains, the reader is referred to Lee et al. (2002).
Using such analysis, one can determine whether a time series is deterministic, in
which case it should be possible in principle to build predictive models. The invariants can be used to
effectively characterize the complex behaviour. For example, the largest
Lyapunov exponent gives an indication of how far into the future reliable predic-
tions can be made, while the fractal dimensions give an indication of how complex a
model should be chosen to represent the data. These models then provide the
basis for systematically developing the control strategies. It should be noted
that the functional forms used for modelling in the fourth step above are continuous
in their argument. This approach builds models viewing a dynamical system as
obeying laws of physics. From another perspective, a dynamical system can be
considered as processing information. So, an alternative class of discrete ‘computa-
tional’ models inspired from the theory of automata and formal languages can also
be used for modelling the dynamics (Lind and Marcus 1996). ‘Computational
Mechanics’ considers this viewpoint and describes the system behaviour in
terms of its intrinsic computational architecture, i.e. how it stores and processes
information.

6.2 Computational mechanics


Computational mechanics is a method for inferring the causal structure of stochastic
processes from empirical data or arbitrary probabilistic representations. It combines
ideas and techniques from nonlinear dynamics, information theory and auto-
mata theory, and is, as it were, an ‘inverse’ to statistical mechanics. Instead of
starting with a microscopic description of particles and their interactions, and deriv-
ing macroscopic phenomena, it starts with observed macroscopic data and infers the
simplest causal structure: the ‘ε-machine’ capable of generating the observations.
The ε-machine in turn describes the system’s intrinsic computation, i.e. how it
stores and processes information. This is developed using the statistical mechanics
of orbit ensembles, rather than focusing on the computational complexity of indi-
vidual orbits. By not requiring a Hamiltonian, computational mechanics can be
applied in a wide range of contexts, including those where an energy function for
the system may not be available, as is the case for supply chains. Notions of complexity, emer-
gence and self-organization have also been formalized and quantified in terms of
various information measures (Shalizi 2005).
Given a time series, the (unknowable) exact states of an observed system are
translated into a sequence of symbols via a measurement channel (Crutchfield 1992).
Two histories (i.e. two series of past data) carry equivalent information if they lead
to the same (conditional) probability distribution in the future (i.e. if it makes no
difference whether one or the other data series is observed). Under these circum-
stances, i.e. the effects of the two series being indistinguishable, they can be lumped
together. This procedure identifies causal states and also identifies the structure
of connections or succession in causal states and creates what is known as an
‘epsilon-machine’. The "-machines form a special class of Deterministic Finite
State Automata (DFSA) with transitions labelled with conditional probabilities
and hence can also be viewed as Markov chains. However, finite-memory machines
like "-machines may fail to admit a finite size model, implying that the number of
casual states could turn out to be infinite. In this case, a more powerful model than
DFSA needs to be used. One proceeds by trying to use the next most powerful model
in the hierarchy of machines known as the casual hierarchy (Crutchfield 1994), in
analogy with the Chomsky hierarchy of formal languages. While ‘"-machine recon-
struction’ refers to the process of constructing the machine given an assumed model
class, ‘hierarchical machine reconstruction’ describes a process of innovation to
create a new model class. It detects regularities in a series of increasingly accurate
models. The inductive jump to a higher computational level occurs by taking those
regularities as the new representation.
ε-machines reflect a balanced utilization of deterministic and random informa-
tion processing, and this balance is discovered automatically during ε-machine
reconstruction. These machines are unique and optimal in the sense that they have
maximal predictive power and minimum model size (hence satisfying Occam’s
Razor: causes should not be multiplied beyond necessity). ε-machines provide a
minimal description of the pattern or regularities in a system in the sense that the
pattern is the algebraic structure determined by the causal states and their transitions.
ε-machines are also minimally stochastic. Hence, computational mechanics acts as
a method for automatic pattern discovery.
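The lumping of predictively equivalent histories can be illustrated on a toy process. The sketch below is not a full reconstruction algorithm (such as CSSR); it simply estimates P(next symbol | history) for short histories of the Golden Mean process (a 1 is never followed by another 1) and merges histories whose predictive distributions agree, recovering that process’s two causal states. The history length and tolerance are arbitrary illustrative choices.

```python
import random
from collections import defaultdict

def golden_mean_sequence(n, seed=0):
    """Sample the Golden Mean process: after a 1 emit 0; after a 0 flip a fair coin."""
    rng = random.Random(seed)
    seq, last = [], 0
    for _ in range(n):
        b = 0 if last == 1 else rng.randint(0, 1)
        seq.append(b)
        last = b
    return seq

def causal_state_partition(seq, hist_len=2, tol=0.1):
    """Group length-hist_len histories with approximately equal conditional
    next-symbol distributions into candidate causal states."""
    counts = defaultdict(lambda: [0, 0])
    for i in range(hist_len, len(seq)):
        h = tuple(seq[i - hist_len:i])
        counts[h][seq[i]] += 1
    probs = {h: c[1] / (c[0] + c[1]) for h, c in counts.items()}
    states = []  # each state: representative probability plus its histories
    for h, p in sorted(probs.items()):
        for st in states:
            if abs(st["p"] - p) < tol:
                st["histories"].append(h)
                break
        else:
            states.append({"p": p, "histories": [h]})
    return states
```

With enough data, the three observable histories (0,0), (0,1) and (1,0) collapse into two states: ‘last symbol was 0’ (the next symbol is a fair coin) and ‘last symbol was 1’ (the next symbol is certainly 0).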
An ε-machine is the organization of the process, or at least of the part of it
that is relevant to our measurements. The ε-machine that models the observed
time series from a system can be used to define and calculate macroscopic or global
properties that reflect the characteristic average information-processing capabilities
of the system. These include the entropy rate, the excess entropy and the statistical
complexity (Feldman and Crutchfield 1998, Crutchfield and Feldman 2001).
The entropy rate indicates how predictable the system is. The excess entropy, on the
other hand, provides a measure of the apparent memory stored in a spatial config-
uration and represents how hard the process is to predict. ε-machine reconstruction leads to a
natural measure of the statistical complexity of a process, namely the amount of
information needed to specify the state of the ε-machine, i.e. its Shannon entropy.
Statistical complexity is distinct from, and dual to, information-theoretic entropies
and dimensions (Crutchfield and Young 1989). The existence of chaos shows that there is
a rich variety of unpredictability spanning the two extremes of periodic and random
behaviour. Behaviour between these two extremes, while of intermediate information
content, is more complex in that its most concise description (model) is an
amalgam of regular and stochastic processes. An information-theoretic description
of this spectrum in terms of dynamical entropies measures the raw diversity of temporal
patterns. The dynamical entropies, however, do not directly measure the com-
putational effort required in modelling the complex behaviour, which is what
statistical complexity captures.
Computational mechanics sets limits on how well processes can be predicted
and shows how, at least in principle, those limits can be attained. "-machines are
what any prediction method would build, if only it could. Similar to ε-machine
reconstruction, techniques exist that can be used to discover causal architecture in
memoryless transducers, transducers with memory and spatially extended systems
(Shalizi and Crutchfield 2001). Computational mechanics can be used for modelling
and prediction in supply chains in the following way:
. In systems like supply chains, it is difficult to define analogues of various
thermodynamic quantities like energy, temperature, pressure, etc. as we do
for physical systems. Each component in the network has cognition, which is
absent in physical systems such as a molecule of a gas. Because of such
difficulties, statistical mechanics cannot be applied directly to build predic-
tion models for supply chains. As discussed previously, by not requiring a
Hamiltonian (the energy-like function), computational mechanics is still
applicable in the case of supply chains.
. ε-machines can be built to discover patterns in the behaviour of various
quantities in supply chains, like inventory levels, demand fluctuations, etc.
. ε-machines can be used for prediction through a process known as ‘synchro-
nization’ (Crutchfield and Feldman 2003).
. ε-machines can be used to calculate various global properties like entropy
rate, excess entropy and statistical complexity that reflect how the system
stores and processes information. The significance of these quantities has
been discussed earlier.
. We can also quantify notions of Complexity, Emergence and Self-
Organization in terms of various information measures derived from
ε-machines. By evaluating such quantities, we can compare the complexity
of different supply chains and quantify the extent to which the network
is showing emergence. We can also infer when a supply chain is under-
going self-organization and to what extent. Such quantification can help
us to compare precisely what policies or cognitive capabilities possessed
by individual agents can lead to different degrees of emergence and self-
organization. Hence, we can decide to what extent we desire to enforce
control and to what extent we want to let the network emerge.

7. Network dynamics

The ubiquity of networks in the social, biological and physical sciences and in
technology leads naturally to an important set of common problems, which
are being currently studied under the rubric of ‘Network Dynamics’ (Strogatz
2001). Structure always affects function, and it is important to consider dynamical
and structural complexity together in the study of networks. For instance, the
topology of social networks affects the spread of information and disease, and
the topology of the power grid affects the robustness and stability of power
transmission. The different problem areas in network dynamics are discussed
below.
One area of research in this field has been primarily concerned with the dynam-
ical complexity in regular networks without regard to other network topologies.
While the collective behaviour depends on the details of the network, some general-
izations can still be drawn (Strogatz 2001). For instance, if the dynamical system
at each node has stable fixed points and no other attractor, the network tends to
lock into a static fixed pattern. If the nodes have competing interactions, the network
may display an enormous number of locally stable equilibria. In the intermediate
case where each node has a stable limit cycle, synchronization and patterns like
travelling waves can be observed. For non-identical oscillators, the temporal ana-
logue of a phase transition can be seen, with the coupling coefficient as the control
parameter. At the opposite extreme, if each node has an identical chaotic attractor,
the nodes can synchronize their erratic fluctuations. For a wide range of network
topologies, synchronized chaos requires that the coupling be neither too weak nor
too strong; otherwise, spatial instabilities are triggered. Related lines of research that
address networks of identical chaotic maps are coupled map lattices (Kaneko and
Tsuda 2000) and cellular automata (Wolfram 1994). However, these systems have
been used mainly as testbeds for exploring spatio-temporal chaos and pattern
formation in the simplest mathematical settings, rather than as models of real
systems.
The second area in network dynamics is concerned with characterizing the net-
work structure. The network structure or topologies in general can vary from com-
pletely regular, like chains, grids, lattices and fully connected, to completely random.
Moreover, the graphs can be directed or undirected and cyclic or acyclic. In order to
characterize topological properties of the graphs, various statistical quantities have
been defined. The most important of these include the average path length, clustering
coefficient, degree distribution, size of the giant component and various spectral properties.
A review of the main models and analytical tools, covering regular graphs, random
graphs, generalized random graphs, small-world and scale-free networks, as well as
the interplay between topology and the network’s robustness against failures
and attacks can be found in Albert (2000b), Albert and Barabasi (2002), Albert
et al. (2002), Callaway et al. (2000) and Dorogovtsev and Mendes (2002).
Classic random graphs were introduced by Erdos and Renyi (Bollobas 1985)
and have been the most thoroughly studied models of networks. Such graphs have
a Poisson degree distribution and statistically uncorrelated vertices. At large N
(total number of nodes in the graph) and large enough p (the probability that two
arbitrary vertices are connected), a giant connected component appears in the net-
work, a process known as percolation. Random graphs exhibit a low average path
length and a low clustering coefficient. Regular networks, on the other hand, show a
high clustering coefficient and also a greater average path length compared with the
random graphs of similar size. The networks found in the real world, however, are
neither completely regular nor completely random. Instead, we see ‘small world’ and
‘scale free’ characteristics for many real networks like social networks, Internet,
WWW, power grids, collaboration networks, ecological and metabolic networks,
to name a few.
In order to describe the transition from a regular network to a random network,
Watts and Strogatz introduced the so-called small-world graphs as models of social
networks (Watts and Strogatz 1998, Newman 2000). This model exhibits a high
degree of clustering, as in the regular network, and a small average distance between
vertices, as in the classic random graphs. A common feature of this model with a
random graph model is that the connectivity distribution of the network peaks at an
average value and decays exponentially. Such an exponential network is homoge-
neous in nature: each node has roughly the same number of connections. Because of
the high degree of clustering, the models of dynamical systems with small-world
coupling display an enhanced signal-propagation speed, rapid disease propagation,
and synchronizability (Watts and Strogatz 1998, Newman 2002).
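The Watts-Strogatz construction described above can be sketched in a few lines. The parameters below are illustrative (a serious analysis would average over many random realizations):

```python
import random
from collections import deque

def watts_strogatz(n, k, p, seed=3):
    """Watts-Strogatz model: a ring lattice with k neighbours per side,
    in which each edge is rewired with probability p (no loops or duplicates)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):                       # start from the regular ring lattice
        for d in range(1, k + 1):
            adj[i].add((i + d) % n)
            adj[(i + d) % n].add(i)
    for i in range(n):                       # rewire one end of each edge with prob. p
        for d in range(1, k + 1):
            j = (i + d) % n
            if j in adj[i] and rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j); adj[j].discard(i)
                    adj[i].add(new); adj[new].add(i)
    return adj

def avg_clustering(adj):
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def avg_path_length(adj):
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

ring = watts_strogatz(200, 2, 0.0)  # p = 0: the regular lattice itself
sw = watts_strogatz(200, 2, 0.1)    # a few long-range shortcuts
print(avg_clustering(ring), avg_path_length(ring))  # clustered, but long paths
print(avg_clustering(sw), avg_path_length(sw))      # still clustered, much shorter paths
```

Even at p = 0.1 the clustering stays close to the lattice value while the characteristic path length collapses, which is the small-world effect the text describes.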
Another significant recent discovery in the field of complex networks is that the
connectivity distributions of a number of large-scale and complex networks, includ-
ing the WWW, the Internet, and metabolic networks, satisfy the power law P(k) ~ k^(-γ), where P(k) is the probability that a node in the network is connected to k other nodes, and γ is a positive real number (Albert et al. 2000a, Barabasi et al. 2000, Barabasi 2001). Since power laws are free of a characteristic scale, networks that
satisfy these laws are called ‘scale-free’. A scale-free network is inhomogeneous in
nature: most nodes have a few connections, and a small but statistically significant
number of nodes have many connections.

Supply-chain networks: a complex adaptive systems perspective 4261

The average path length is smaller in the scale-free network than in a random graph, indicating that the heterogeneous scale-free topology is more efficient in bringing the nodes closer than the homogeneous topology of the random graphs. The clustering coefficient of the scale-free network is
about five times higher than that of the random graph, and this factor slowly
increases with the number of nodes. It has been shown that it is practically impos-
sible to achieve synchronization in a nearest-neighbour coupled network (regular
connectivity) if the network is sufficiently large. However, it is quite easy to achieve
synchronization in a scale-free dynamical network no matter how large the network
is (Wang and Chen 2002). Moreover, the synchronizability of a scale-free dynamical
network is robust against random removal of nodes but is fragile to specific removal
of the most highly connected nodes.
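This robustness/fragility asymmetry can be reproduced numerically. The sketch below grows a scale-free graph by preferential attachment and compares the giant component after random failures versus a targeted attack on the hubs; sizes and seeds are illustrative choices of ours:

```python
import random
from collections import deque

def barabasi_albert(n, m, seed=7):
    """Scale-free graph via growth and preferential attachment: each new node
    links to m existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):                       # small fully connected core
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    targets = [i for i in adj for _ in adj[i]]   # each node listed once per edge end
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:                   # sampling here realizes k_i / sum_j k_j
            chosen.add(rng.choice(targets))
        adj[new] = set()
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets.extend((new, t))
    return adj

def giant_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set()
    best = 0
    for s in adj:
        if s in removed or s in seen:
            continue
        comp = {s}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        best = max(best, len(comp))
    return best

g = barabasi_albert(1000, 2)
hubs = frozenset(sorted(g, key=lambda v: len(g[v]), reverse=True)[:50])  # targeted attack
randoms = frozenset(random.Random(0).sample(sorted(g), 50))              # random failures
print(giant_component(g, randoms))  # barely shrinks
print(giant_component(g, hubs))     # fragments far more severely
```

Removing the same number of nodes at random leaves the network essentially intact, while removing the most connected nodes breaks off a large fraction of it.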
The scale-free property and a high degree of clustering (the small-world effect) coexist in a large number of real networks. Yet, most models proposed to describe the topology of complex networks have difficulty in capturing these two features simultaneously. It has been shown in Ravasz and Barabasi (2003) that
these two features are the consequence of a hierarchical organization present in the
networks. This argument also agrees with that proposed by Herbert Simon (Simon
1997), who argues:
. . . we could expect complex systems to be hierarchies in a world in which complexity has
to evolve from simplicity. In their dynamics, hierarchies have a property, near decom-
posability, that greatly simplifies their behaviour. Near decomposability also simplifies the
description of complex systems and makes it easier to understand how the information
needed for the development of the system can be stored in reasonable compass.
Indeed, many networks are fundamentally modular: one can easily identify groups of
nodes that are highly interconnected with each other but have few or no links to
nodes outside the group to which they belong. This clearly identifiable modular organization is at the origin of the high clustering coefficient. On the other
hand, these modules can be organized in a hierarchical fashion into increasingly
large groups, giving rise to ‘hierarchical networks’, while still maintaining the
scale-free topology. Thus, modularity, scale-free character and a high degree of
clustering can be achieved under a common roof. Moreover, in hierarchical net-
works, the degree of clustering characterizing the different groups follows a strict
scaling law, which can be used to identify the presence of hierarchical structure
in real networks.
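The usual way to test for this scaling law is to compute the average clustering coefficient per degree class, C(k); in hierarchical networks C(k) decays roughly as 1/k. A minimal sketch on a toy modular graph (the graph is our own invention, chosen so C(k) is easy to verify by hand):

```python
def clustering_by_degree(adj):
    """Average clustering coefficient C(k) for each degree class k; an
    approximately 1/k decay of C(k) signals hierarchical organization."""
    sums = {}
    counts = {}
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        c = 2.0 * links / (k * (k - 1))
        sums[k] = sums.get(k, 0.0) + c
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Toy modular graph: two triangles (modules) sharing a central hub, node 0.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)]
adj = {i: set() for i in range(5)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Degree-2 module members sit in a closed triangle (C = 1), while the
# degree-4 hub bridging the modules has C = 1/3: C(k) falls as k grows.
print(clustering_by_degree(adj))
```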
The mathematical theory of graphs with arbitrary degree distributions known as
‘generalized random graphs’ can be found in Newman et al. (2001) and Newman
(2003). Using the ‘generating function formulation’, the authors have been able to
solve the percolation problem (i.e. have found conditions for predicting the appear-
ance of a giant component) and have obtained formulae for calculating the clustering
coefficient and average path length for generalized random graphs. The authors have
proposed and studied models of propagation of diseases, failures, fads and synchro-
nization on such graphs and have extended their results for bipartite and directed
graphs.
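The percolation condition referred to here is, in its simplest form, the Molloy-Reed criterion: a generalized random graph with degree distribution p_k develops a giant component when the sum over k of k(k - 2) p_k is positive (equivalently, when <k^2> - 2<k> > 0). A small numeric check for Poisson degree distributions, recovering the classic <k> = 1 threshold:

```python
import math

def molloy_reed(pk):
    """Molloy-Reed percolation criterion for a generalized random graph:
    a giant component emerges when sum_k k(k - 2) p_k > 0."""
    return sum(k * (k - 2) * p for k, p in pk.items())

def poisson_pk(mean, kmax=80):
    """Truncated Poisson degree distribution with the given mean degree."""
    return {k: math.exp(-mean) * mean**k / math.factorial(k) for k in range(kmax + 1)}

# For a Poisson distribution the criterion reduces to <k>^2 - <k> > 0,
# i.e. the Erdos-Renyi giant component appears exactly at <k> = 1.
print(molloy_reed(poisson_pk(0.5)))  # negative: only small components
print(molloy_reed(poisson_pk(2.0)))  # positive: a giant component exists
```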
Network dynamics, though in its infancy, promises a formal framework to char-
acterize the organizational and functional aspects in supply chains (Thadakamalla
et al. 2004). With the changing trends in supply chains, many new issues
have become critical: organizational resistance to change, inter-functional or inter-organizational conflicts, relationship management, and consumer and market behaviour.

4262 A. Surana et al.

Such problems are ill structured, behavioural, and cannot be easily
addressed by analytical tools such as mathematical programming. Successful
supply-chain integration depends on the supply-chain partners’ ability to synchro-
nize and share real-time information. The establishment of collaborative relation-
ship among supply-chain partners is a pre-requisite to information sharing. As
a result, successful supply-chain management relies on systematically studying
questions like
1. What are the robust architectures for collaboration, and what are the coor-
dination strategies that lead to such architectures?
2. If different entities make decisions on whether or not to cooperate on the
basis of imperfect information about the group activity, and incorporate
expectations on how their decision will affect other entities, can overall
cooperation be sustained for long periods of time?
3. How do the expectations, group size, and diversity affect coordination and
cooperation?
4. Which kinds of organizations are most able to sustain ongoing collective
action, and how might such organizations evolve over time?
Network dynamics addresses many such questions and should be explored in the
context of supply chains.

8. Conclusions and future work

The idea of managing the whole supply chain and transforming it into a highly
autonomous, dynamic, agile, adaptive and reconfigurable network certainly provides
an appealing vision for managers. The infrastructure provided by information tech-
nology has made this vision partially realizable. But the inherent complexity
of supply chains makes the efficient utilization of information technology an
elusive endeavour. Tackling this complexity has been beyond existing tools and techniques, which require revival and extension.
As a result, we emphasized in this paper that in order to effectively understand
a supply-chain network, it should be treated as a CAS. We laid down some initial
ideas for the extension of modelling and analysis of supply chains using the con-
cepts, tools and techniques arising in the study of CAS. As future work, we need
to verify the feasibility and usefulness of the proposed techniques in the context
of large-scale supply chains.

Acknowledgements

The authors wish to acknowledge DARPA (Grant No. MDA972-1-1-0038 under the UltraLog Programme) for their generous support for this research.
In addition, the partial support provided by NSF (Grant No. DMII-0075584)
to Professor Kumara is greatly appreciated. The authors wish to thank the
anonymous reviewers for their comments and valuable suggestions.

References

Abarbanel, H.D.I., The Analysis of Observed Chaotic Data, 1996 (Springer: New York).
Abarbanel, H.D.I. and Kennel, M.B., Local false nearest neighbors and dynamical dimensions
from observed chaotic data. Phys. Rev. E, 1993, 47, 3057–3068.
Adami, C., Introduction to Artificial Life, 1998 (Springer: New York).
Albert, R. and Barabasi, A.L., Statistical mechanics of complex networks. Rev. Mod. Phys.,
2002, 74, 47.
Albert, R., Barabási, A.L., Jeong, H. and Bianconi, G., Power-law distribution of the World
Wide Web. Science, 2000, 287, 2115.
Albert R., Jeong, H., Barabasi, A.L., Error and attack tolerance of complex networks. Nature,
2000, 406, 378–382.
Balakrishnan, A., Kumara, S. and Sundaresan, S., Exploiting information technologies
for product realization. Inform. Syst. Front. J. Res. Innov., 1999, 1(1), 25–50.
Barabasi, A.L., The physics of web. Phys. World, July 2001.
Barabasi, A.L., Albert, R. and Jeong, H., Scale-free characteristics of random networks:
The topology of the World Wide Web. Physica A, 2000, 281, 69–77.
Baranger, M., Chaos, complexity, and entropy: a physics talk for non-physicists. Available
online at: http://necsi.org/projects/baranger/cce.pdf (accessed May 2005).
Bar-Yam, Y., Dynamics of Complex Systems, 1997 (Addison-Wesley: Reading, MA).
Bollobas, B., Random Graphs, 1985 (Academic Press: London).
Callaway, D.S., Newman, M.E.J., Strogatz, S.H. and Watts, D.J., Network robustness and
fragility: Percolation on random graphs. Phys. Rev. Lett., 2000, 85, 5468–5471.
Carlson, J.M., Doyle, J., Highly optimised tolerance: a mechanism for power laws in designed
systems. Phys. Rev. E, 1999, 60(2), 1412–1427.
Casdagli, M., Nonlinear prediction of chaotic time series. Physica D, 1989, 35, 335–356.
Choi, T.Y., Dooley, K.J., Ruangtusanathan, M., Supply networks and complex adaptive
systems: control versus emergence. J. Operat. Manage., 2001, 19(3), 351–366.
Cooper, M.C., Lambert, D.M. and Pagh, J.D., Supply chain management: more than a new
name for logistics. Int. J. Logist. Manage., 1997, 8(1), 1–13.
Crutchfield, J.P., Knowledge and meaning . . . chaos and complexity. In Modeling Complex
Systems, edited by L. Lam and H.C. Morris, 1992 (Springer: Berlin), pp. 66–101.
Crutchfield, J.P., The calculi of emergence: computation, dynamics and induction. Physica D,
1994, 75, 11–54.
Crutchfield, J.P. and Young, K., Inferring statistical complexity. Phys. Rev. Lett., 1989,
63, 105–108.
Crutchfield, J.P. and Feldman, D.P., Synchronizing to the environment: information theoretic
constraints on agent learning. Adv. Complex Syst., 2001, 4, 251–264.
Crutchfield, J.P. and Feldman, D.P., Regularities unseen, randomness observed: levels of
entropy convergence. Chaos, 2003, 13, 25–54.
Csete, M.E. and Doyle, J., Reverse engineering of biological complexity. Science, 2002,
295, 1664.
Dorogovtsev, S.N. and Mendes, J.F.F., Evolution of networks. Adv. Phys., 2002, 51,
1079–1187.
Erramilli, A. and Forys, L.J., Oscillations and chaos in a flow model of a switching system.
IEEE J. Select. Areas Commun., 1991, 9(2), 171–178.
Farmer, J.D., Ott, E. and Yorke, J.A., The dimension of chaotic attractors. Physica D, 1983,
7, 153–180.
Farmer, J.D. and Sidorowich, J.J., Predicting chaotic time-series. Phys. Rev. Lett., 1987,
59(8), 845–848.
Feichtinger, G., Hommes, C.H. and Herold, W., Chaos in a simple deterministic queuing
system. ZOR- Math. Meth. Oper. Res., 1994, 40, 109–119.
Feldman, D.P. and Crutchfield, J.P., Discovering non-critical organization: statistical mech-
anical, information theoretic and computational views of patterns in one-dimensional
spin systems. Santa Fe Institute Working Paper 98–04–026, 1998.
Flake, G.W., The Computational Beauty of Nature, 1998 (MIT Press: Cambridge, MA).
Forrester, J.W., Industrial Dynamics, 1961 (MIT Press: Cambridge, MA).

Fraser, A.M. and Swinney, H.L., Independent coordinates for strange attractors from mutual
information. Phys. Rev. A, 1983, 33(2), 1134–1140.
Ghosh, S., The role of modeling and asynchronous distributed simulation in analyzing com-
plex systems of the future. Inform. Syst. Front. J. Res. Innov., 2002, 4(2), 166–171.
Glance, N.S., Dynamics with expectations. PhD thesis, Physics Department, Stanford
University, 1993.
Hogg, T. and Huberman, B.A., The behavior of computational ecologies. In The Ecology
of Computation, edited by B.A. Huberman, pp. 77–116, 1988 (Elsevier Science:
Amsterdam).
Hogg, T. and Huberman, B.A., Controlling chaos in distributed systems. IEEE Trans. on
Systems, Man and Cybernetics, 1991, 21, 1325–1332.
Kaneko, K. and Tsuda, I., Complex Systems: Chaos and Beyond—A constructive approach
with applications in life sciences, 2000 (Springer: Berlin).
Kennel, M., Brown, R. and Abarbanel, H.D.I., Determining embedding dimension for
phase-space reconstruction using a geometrical construction. Phys. Rev. A, 1992,
45(6), 3403–3068.
Kephart, J.O., Hogg, T. and Huberman, B.A., Dynamics of computational ecosystems.
Phys. Rev. A, 1989, 40(1), 404–421.
Kephart, J.O., Hogg, T. and Huberman, B.A., Collective behavior of predictive agents.
Physica D, 1990, 42, 48–65.
Kumara, S., Ranjan, P., Surana, A. and Narayanan, V., Decision making in logistics:
A chaos theory based analysis. Ann. Int. Inst. Prod. Eng. Res. (Ann. CIRP), 2003, 1,
381–384.
Lee, S., Gautam, N., Kumara, S., Hong, Y., Gupta, H., Surana, A., Narayanan, V.,
Thadakamalla, H., Brinn, M. and Greaves, M., Situation identification using dynamic
parameters in complex agent-based planning systems. Intell. Eng. Syst. Artif. Neural
Networks, 2002, 12, 555–560.
Lind, D. and Marcus, B., An introduction to symbolic dynamics and coding, 1995 (Cambridge
University Press: New York).
Lloyd, S. and Slotine, J.J.E., Information theoretic tools for stable adaptation and learning.
Int. J. Adapt. Control Signal Process., 1996, 10, 499–530.
Maxion, R.A., Toward diagnosis as an emergent behavior in a network ecosystem. Physica D,
1990, 42, 66–84.
Min, H. and Zhou, G., Supply chain modeling: past, present and future. Comput. Ind. Eng.,
2002, 43, 231–249.
Mukherjee, S., Osuna, E. and Girosi, F., Nonlinear prediction of chaotic time series using
support vector machines. In IEEE Workshop on Neural Networks for Signal Processing
VII, 1997, pp. 511–519.
Newman, M.E.J., Models of the small world. J. Stat. Phys., 2000, 101, 819–841.
Newman, M.E.J., The spread of epidemic disease on networks. Phys. Rev. E, 2002, 66.
Newman, M.E.J., Random graphs as models of networks. In Handbook of Graphs and
Networks, edited by S. Bornholdt and H.G. Schuster, 2003 (Wiley-VCH, Berlin).
Newman, M.E.J., Strogatz, S.H. and Watts, D.J., Random graphs with arbitrary degree
distribution and their applications. Phys. Rev. E, 2001, 64.
Ott, E., Chaos in Dynamical Systems, 1996 (Cambridge University Press: Cambridge).
Powell, M.J.D., Radial basis function approximation to polynomials. Preprint University of
Cambridge, 1987.
Rasmussen, D.R. and Mosekilde, M., Bifurcations and chaos in a generic management model.
Eur. J. Oper. Res., 1988, 35, 80–88.
Ravasz, E. and Barabasi, A.L., Hierarchical organization in complex networks. Phys. Rev. E,
2003, 67.
Sano, M. and Sawada, Y., Measurement of the Lyapunov spectrum from a chaotic time
series. Phys. Rev. Lett., 1985, 55, 1082–1084.
Sawhill, B.K., Self-organised criticality and complexity theory. In 1993 Lectures in Complex
Systems, edited by L. Nadel and D.L. Stein, pp. 143–170, 1995 (Addison-Wesley:
Reading, MA).

Schieritz, N. and Grobler, A., Emergent structures in supply chains—A study integrating
agent-based and system dynamics modeling, in 36th Annual Hawaii International
Conference on System Sciences, Big Island, HI, 2003.
Shalizi, C.R. and Crutchfield, J.P., Computational mechanics: pattern and prediction,
structure and simplicity. J. Stat. Phys., 2001, 104, 816–879.
Shalizi, C.R., Causal architecture, complexity and self-organization in time series and
cellular automata. Available online at: http://www.santafe.edu/shalizi/thesis, 2005
(accessed May 2005).
Simon, H.A., The Sciences of the Artificial, 3rd ed., 1997 (The MIT Press, Cambridge, MA).
Strogatz, S.H., Nonlinear Dynamics and Chaos, 1994 (Addison-Wesley: Reading, MA).
Strogatz, S.H., Exploring complex networks. Nature, 2001, 410, 268–276.
Takens, F., Detecting strange attractors in turbulence. In L.S. Young, Editor, Dynamical
Systems and Turbulence, Lecture Notes in Mathematics, 1981, 898, 366–381,
(Springer, New York).
Thadakamalla, H.P., Raghavan, U.N., Kumara, S. and Albert, R., Survivability of
multiagent-based supply networks: a topological perspective. IEEE Intell. Syst., 2004,
19(5), 24–31.
Wang, X.F. and Chen, G., Synchronization in scale-free dynamical networks: robustness
and fragility. IEEE Trans. Circuits and Systems I Fundam. Theory Applic., 2002,
49(1), 54–62.
Watts, D.J. and Strogatz, S.H., Collective dynamics of ‘small-world’ networks. Nature, 1998,
393, 440–442.
Wolfram, S., Cellular Automata and Complexity: Collected Papers, 1994 (Addison-Wesley:
Reading, MA).
Dependable Agent Systems

Survivability of Multiagent-Based Supply Networks: A Topological Perspective

Hari Prasad Thadakamalla, Usha Nandini Raghavan, Soundar Kumara, and
Réka Albert, Pennsylvania State University

You can improve a multiagent-based supply network's survivability by concentrating on the topology and its interplay with functionalities.

Supply chains involve complex webs of interactions among suppliers, manufacturers, distributors, third-party logistics providers, retailers, and customers. Although fairly simple business processes govern these individual entities, real-time capabilities and global Internet connectivity make today's supply chains complex. Fluctuating demand patterns, increasing customer expectations, and competitive markets also add to their complexity.

Supply networks are usually modeled as multiagent systems (MASs).1 Because supply chain management must effectively coordinate among many different entities, a multiagent modeling framework based on explicit communication between these entities is a natural choice.1 Furthermore, we can represent these multiagent systems as a complex network with entities as nodes and the interactions between them as edges. Here we explore the survivability (and hence dependability) of these MASs from the view of these complex supply networks.

Today's supply networks aren't dependable—or survivable—in chaotic environments. For example, Figure 1 shows how mediocre a typical supply network's reaction to a node or edge failure is compared to a network with built-in redundancy.

Survivability is a critical factor in supply network design. Specifically, supply networks in dynamic environments, such as military supply chains during wartime, must be designed more for survivability than for cost effectiveness. The more survivable a network is, the more dependable it will be.

We present a methodology for building survivable large-scale supply network topologies that can extend to other large-scale MASs. Building survivable topologies alone doesn't, however, make an MAS dependable. To create survivable—and hence dependable—multiagent systems, we must also consider the interplay between network topology and node functionalities.

A topological perspective
To date, the survivability literature has emphasized network functionalities rather than topology. To be survivable, a supply network must adapt to a dynamic environment, withstand failures, and be flexible and highly responsive. These characteristics depend on not only node functionality but also the topology in which nodes operate.

The components of survivability
From a topological perspective, the following properties encompass survivability, and we denote them as survivability components.
The first is robustness. A robust network can sustain the loss of some of its structure or functionalities and maintain connectedness under node failures, whether the failure is random or is a targeted attack. We measure robustness as the size of the network's largest connected component, in which a path exists between any pair of nodes in that component.
The second is responsiveness. A responsive network provides timely services and effective navigation. Low characteristic path length (the average of the shortest path lengths from each node to every other node) leads to better responsiveness, which determines how quickly commodities or information proliferate throughout the network.
The third is flexibility. This property depends on the presence of alternate paths. Good clustering properties ensure alternate paths to facilitate dynamic rerouting. The clustering coefficient, defined as the ratio between the number of edges among a node's first neighbors and the total possible number of edges between them, characterizes the local order in a node's neighborhood.
The fourth is adaptivity. An adaptive network can rewire itself efficiently—that is, restructure or reorganize its topology on the basis of environmental shifts—to continue providing efficient performance. For example, if a supplier can't reliably meet a customer's demands, the customer should be able to choose another supplier.
A typical supply chain with a tree-like or hierarchical structure lacks these four properties—the clustering coefficient is nearly zero, and the characteristic path length scales linearly with the number of nodes (or agents) N. In designing complex agent networks with built-in survivability, conventional optimization tools won't work because of the problem's extremely large scale. When networks were smaller, we could understand their overall behavior by concentrating on the individual components' properties. But as networks expand, this becomes impossible, so we shift focus to the statistical properties of the collective behavior.

24 1541-1672/04/$20.00 © 2004 IEEE IEEE INTELLIGENT SYSTEMS
Published by the IEEE Computer Society
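Two of the survivability components named above (robustness as the largest connected component, flexibility as the clustering coefficient) can be measured on a toy version of the Figure 1 scenario. The tiny MSB/FSB/battalion networks below are hypothetical miniatures of ours, not the actual UltraLog agent networks:

```python
from collections import deque

def largest_component(adj, removed=frozenset()):
    """Robustness measure: size of the largest connected component
    once `removed` nodes are deleted."""
    seen = set()
    best = 0
    for s in adj:
        if s in removed or s in seen:
            continue
        comp = {s}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        best = max(best, len(comp))
    return best

def avg_clustering(adj):
    """Flexibility measure: average clustering coefficient."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def build(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

battalions = ("B1", "B2", "B3", "B4", "B5", "B6")
# Pure hierarchy: one MSB, two FSBs, each FSB serving three battalions.
tree = build([("MSB", "FSB1"), ("MSB", "FSB2"),
              ("FSB1", "B1"), ("FSB1", "B2"), ("FSB1", "B3"),
              ("FSB2", "B4"), ("FSB2", "B5"), ("FSB2", "B6")])
# Redundant variant: FSBs linked, every battalion served by both FSBs.
redundant = build([("MSB", "FSB1"), ("MSB", "FSB2"), ("FSB1", "FSB2")] +
                  [("FSB1", b) for b in battalions] +
                  [("FSB2", b) for b in battalions])

print(avg_clustering(tree), avg_clustering(redundant))  # 0.0 vs positive
print(largest_component(tree, {"FSB1"}),                # hierarchy shatters
      largest_component(redundant, {"FSB1"}))           # redundancy survives
```

After losing FSB1, the pure hierarchy strands its three battalions, while the redundant network keeps all remaining agents connected, which is exactly the behavior Figure 1 illustrates.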
Figure 1. How redundancy affects survivability. (a) A part of the multiagent system for military logistics modeled using the UltraLog (www.ultralog.net) program. This example models each entity, such as main support battalion, forward support battalion, and battalion, as a software agent. (We've changed the agents' names for security reasons.) In the current scenario, MSBs send the supplies to the FSBs, who in turn forward these to battalions. (b) A modified military supply chain with some redundancy built into it. This network performs much better in the event of node failures and hence is more dependable than the first network.

Using topologies
Studying complex networks such as protein interaction networks, regulatory networks, social networks of acquaintances, and information networks such as the Web is illuminating the principles that make these networks extremely resilient to their respective chaotic environments. The core principles extracted from this exploration will prove valuable in building robust models for survivable complex agent networks.
Complex-network theory currently offers random-graph, small-world, and scale-free network topologies as likely candidates for survivable networks (see the sidebar "Complex Networks" for more on this topic). Evaluating these for survivability (see Figure 2), we find that no one topology consistently outperforms the others. For example, while small-world networks have better clustering properties, scale-free networks are significantly more robust to random attacks. So, we can't directly use these topologies to build supply networks. We can, however, use their evolution principles to build supply chain networks that perform well in all respects of the survivability components.
Researchers have studied complex networks in part to find ways to design evolutionary algorithms for modeling networks

SEPTEMBER/OCTOBER 2004 www.computer.org/intelligent 25


Complex Networks

Social scientists, among the first to study complex networks extensively, focused on acquaintance networks, where nodes represent people and edges represent the acquaintances between them. Social psychologist Stanley Milgram posited the "six degrees of separation" theory that in the US, a person's social network has an average acquaintance path length of six.1 This turns out to be a particular instance of the small-world property found in many real-world networks, which, despite their large size, have a relatively short path between any two nodes.

An early effort to model complex networks introduced random graphs for modeling networks with no obvious pattern or structure.2 A random graph consists of N nodes, and two nodes are connected with a connection probability p. Random graphs are statistically homogeneous because most nodes have a degree (that is, the number of edges incident on the node) close to the graph's average degree, and significantly small and large node degrees are exponentially rare.

However, studying the topologies of diverse large-scale networks found in nature reveals a more complex and unpredictable dynamic structure. Two measures quantifying network topology found to differ significantly in real networks are the degree distribution (the fraction of nodes with degree k) and the clustering coefficient. Later modeling efforts focused on trying to reproduce these properties.3,4 Duncan Watts and Steven Strogatz introduced the concept of small-world networks to explain the high degree of transitivity (order) in complex networks.5 The Watts-Strogatz model starts from a regular 1D ring lattice on L nodes, where each node is joined to its first K neighbors. Then, with probability p, each edge is rewired with one end remaining the same and the other end chosen uniformly at random, without allowing multiple edges (more than one edge joining a pair of vertices) or loops (edges joining a node to itself). The resulting network is a regular lattice when p = 0 and a random graph when p = 1, because all edges are rewired. This network class displays a high clustering coefficient for most values of p, but as p → 1, it behaves like a random graph.

Albert-László Barabási and Réka Albert later proposed an evolutionary model based on growth and preferential attachment leading to a network class, scale-free networks, with power law distribution.6 Many real-world networks' degree distribution follows a power law, fundamentally different from the peaked distribution observed in random graphs and small-world networks. Barabási and Albert argued that a static random graph of the Watts-Strogatz model fails to capture two important features of large-scale networks: their constant growth and the inherent selectivity in edge creation. Complex networks such as the Web, collaboration networks, or even biological networks are growing continuously with the creation of new Web pages, the birth of new individuals, and gene duplication and evolution. Moreover, unlike random networks where each node has the same chance of acquiring a new edge, new nodes entering the scale-free network don't connect uniformly to existing nodes but attach preferentially to higher-degree nodes. This reasoning led Barabási and Albert to define two mechanisms:

• Growth: Start with a small number of nodes—say, m0—and assume that every time a node enters the system, m edges are pointing from it, where m < m0.
• Preferential attachment: Every time a new node enters the system, each edge of the newly connected node preferentially attaches to a node i with degree ki with the probability

Πi = ki / Σj kj

Research has shown that the second mechanism leads to a network with power-law degree distribution P(k) ~ k^(-γ) with exponent γ = 3. Barabási and Albert dubbed these networks "scale free" because they lack a characteristic degree and have a broad tail of degree distribution. Following the proposal of the first scale-free model, researchers have introduced many more refined models, leading to a well-developed theory of evolving networks.7

Protein-to-protein interactions in metabolic and regulatory networks and other biological networks also show a striking ability to survive under extreme conditions. Most of these networks' underlying properties resemble the three most familiar networks found in the literature (see Figure 1 in the article).

Complex networks are also vulnerable to node or edge losses, which disrupt the paths between nodes or increase their length and make communication between them harder. In severe cases, an initially connected network breaks down into isolated components that can no longer communicate. Numerical and analytical studies of complex networks indicate that a network's structure plays a major role in its response to node removal. For example, scale-free networks are more robust than random or small-world networks with respect to random node loss.8 Large scale-free networks will tolerate the loss of many nodes yet maintain communication between those remaining. However, they're sensitive to removal of the most-connected nodes (by a targeted attack on critical nodes, for example), breaking down into isolated pieces after losing just a small percentage of these nodes.

References
1. S. Milgram, "The Small World Problem," Psychology Today, vol. 2, May 1967, pp. 60–67.
2. P. Erdős and A. Rényi, "On Random Graphs I," Publicationes Mathematicae, vol. 6, 1959, pp. 290–297.
3. S.N. Dorogovtsev and J.F.F. Mendes, "Evolution of Networks," Advances in Physics, vol. 51, no. 4, 2002, pp. 1079–1187.
4. M.E.J. Newman, "The Structure and Function of Complex Networks," SIAM Rev., vol. 45, no. 2, 2003, pp. 167–256.
5. D.J. Watts and S.H. Strogatz, "Collective Dynamics of 'Small-World' Networks," Nature, vol. 393, June 1998, pp. 440–442.
6. A.-L. Barabási and R. Albert, "Emergence of Scaling in Random Networks," Science, vol. 286, Oct. 1999, pp. 509–512.
7. R. Albert and A.-L. Barabási, "Statistical Mechanics of Complex Networks," Reviews of Modern Physics, Jan. 2002, pp. 47–97.
8. R. Albert, H. Jeong, and A.-L. Barabási, "Error and Attack Tolerance of Complex Networks," Nature, July 2000, pp. 378–382.
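The attachment rule Πi = ki / Σj kj can be verified empirically with roulette-wheel selection. The node names and degree values below are invented for illustration:

```python
import random

# Degrees k_i of a small hypothetical existing network (total degree = 10).
degrees = {"a": 1, "b": 2, "c": 3, "d": 4}

def preferential_pick(rng):
    """Choose node i with probability k_i / sum_j k_j (roulette-wheel selection)."""
    total = sum(degrees.values())
    r = rng.random() * total
    for node, k in degrees.items():
        r -= k
        if r < 0:
            return node
    return node  # numerical edge case: fall back to the last node

rng = random.Random(0)
trials = 100_000
counts = {node: 0 for node in degrees}
for _ in range(trials):
    counts[preferential_pick(rng)] += 1
for node, k in degrees.items():
    print(node, counts[node] / trials)  # approaches k_i / 10: 0.1, 0.2, 0.3, 0.4
```

Over many trials the empirical attachment frequencies converge to ki / Σj kj, so high-degree nodes keep winning new links, the "rich get richer" effect behind the power-law tail.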



with distinct properties found in nature. A network's evolutionary mechanism is designed such that the network's inherent properties emerge owing to the mechanism. For example, small-world networks were designed to explain the high clustering coefficient found in many real-world networks, while the "rich get richer" phenomenon used in the Barabási-Albert model explains the scale-free distribution.2

Similarly, we seek to design supply networks with inherent survivability components (see Figure 3), obtaining these components by coining appropriate growth mechanisms. Of course, having all the aforementioned properties in a network might not be practically feasible—we'd likely have to negotiate trade-offs depending on the domain. Also, domain specificities might make it inefficient to incorporate all properties. For instance, in a supply network, we might not be able to rewire the edges as easily as we can in an information network, so we would concentrate more on obtaining other properties such as low characteristic path length, robustness to failures and attacks, and high clustering coefficients.

Figure 2. Comparing the survivability components of random, small-world, and scale-free networks.

Degree distribution. Random: Poisson. Small-world: peaked. Scale-free: power law.
Characteristic path length. Random: scales as log(N). Small-world: scales linearly with N for small p, and for higher p scales as log(N). Scale-free: scales as log(N)/log(log(N)).
Clustering coefficient. Random: p (the connection probability). Small-world: high, but as p → 1 behaves like a random graph. Scale-free: ((m − 1)/2)(log(N)/N), where m is the number of edges with which a node enters.
Robustness to failures. Random: similar responses to both random and targeted attacks. Small-world: similar response as random networks, because it has a degree distribution similar to random networks. Scale-free: highly resilient to random failures while being very sensitive to targeted attacks.
So, the construction of these networks is domain specific.

Establishing edges between network nodes is also domain specific. For instance, in a supply network, a retailer would likely prefer to have contact with other geographically convenient nodes (distributors, warehouses, and other retailers). At the same time, nodes in a file-sharing network would prefer to attach to other nodes known to locate or hold many shared files (that is, nodes of high degree).

Figure 3. The transition from supply chain to a survivable supply network. (The figure shows manufacturers, warehouses, and retailers, with a failed node, failed edges, and an alternate path.)

Obtaining the survivability components

While evolving the network on the basis of domain constraints, we need to incorporate four traits into the growth model for obtaining good survivability components.

The first is low characteristic path length. During network construction, establish a few long-range connections between nodes that require many steps to reach one from another.

The second is good clustering. When two nodes A and B are connected, new edges from A should prefer to attach to neighbors of B, and vice versa.

The third is robustness to random and targeted failure. Preferential attachment—where new nodes entering the network don't connect uniformly to existing nodes but attach preferentially to higher-degree nodes (see the sidebar for more details)—leads to scale-free networks with very few critical and many not-so-critical nodes. Here we measure a node's criticality in terms of the number of edges incident on it. So, these networks are robust to random failures (the probability that a critical node fails is very small) but not to targeted attacks (attacking the very few critical nodes would devastate the network). Also, it's not practically feasible to have all nodes play an equal role in the system—that is, be equally critical. Thus, the network should have a good balance of critical, not-so-critical, and noncritical nodes.

The fourth is efficient rewiring. Rewiring edges in a network might or might not be feasible, depending on the domain. But where it is feasible, it should preserve the other three traits.

Although complete graphs come equipped with good survivability components, they clearly aren't cost effective. Allowing every agent in an agent system to communicate with every other agent uses system bandwidth inefficiently and could completely bog down the system. So the amount of redundancy results from a trade-off between cost and survivability.

SEPTEMBER/OCTOBER 2004 www.computer.org/intelligent 27

Dependable Agent Systems

Figure 4. Snapshots of the modeled networks during their growth, where the nodes number 70 (panels, left to right: preferential attachment, random attachment, and the proposed attachment rules). MSBs are green, FSBs are red, and battalions are blue.

An illustration

Suppose we want to build a topology for a military supply chain that must be survivable in wartime. First, we broadly classify the network nodes into three types:

• Battalions prefer to attach to a highly connected node so that the supplies from different parts of the network will be transported to them in fewer steps. Battalions also require quick responses, so they prefer the subsequent links to attach to nodes at convenient shorter distances (in our model we considered a fixed distance of two).
• A forward support battalion (FSB) prefers to attach to highly connected nodes so that its supplies proliferate faster in the network. The supply range from an FSB goes up to a particular distance (at most three in our model).
• A main support battalion (MSB) also prefers to attach to a highly connected node to enable its supplies to proliferate faster in the network. We assume an unrestricted supply reach from an MSB, thus facilitating some long-range connections.

In a conventional logistics network, the MSBs supply commodities (such as ammunition, food, and fuel) to the FSBs, who in turn forward them to the battalions. Our approach doesn't restrict node functionalities as such—for example, we assume that even a battalion can supply commodities to other battalions if necessary.

Figure 5. How our proposed network performed: (a) the log-log plot of the degree distribution for all three networks (Models 1, 2, and 3); (b) the characteristic path length of the proposed network against the log of the number of nodes.
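The characteristic path length plotted in Figure 5b is the average shortest-path length, in hops, over all connected node pairs. A minimal breadth-first sketch in Python (our illustration, unweighted edges assumed):

```python
from collections import deque

def characteristic_path_length(adj):
    """Average shortest-path length (in hops) over all connected pairs.
    adj maps each node to the set of its neighbors."""
    total, pairs = 0, 0
    for s in adj:
        # breadth-first search from s gives hop distances to all reachable nodes
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1   # reachable targets, excluding s itself
    return total / pairs if pairs else 0.0
```

Measuring this quantity while growing the network for increasing N is how a log(N) trend, and hence small-world behavior, can be checked.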



Table 1. Simulation results.
Clustering coefficient. Model 1 (random): 0.0038–0.0039. Model 2 (preferential): 0.013–0.019. Model 3 (proposed): 0.35–0.39.
Characteristic path length. Model 1: 5.26–5.36. Model 2: 4.09–4.25. Model 3: 4.69–4.79.

Growth mechanisms

Start with a small number of nodes—say, m0—and assume that every time a node enters the system, m edges are pointing from it, where m < m0. Battalions, FSBs, and MSBs enter the system in a certain ratio l:m:n where l > m > n:

• A battalion has one edge pointing from it and a second edge added with a probability p.
• An FSB has three edges pointing from it.
• An MSB has five edges pointing from it.

The attachment rules applied depend on which node type enters the system:

• For a battalion, the first edge attaches to a node i of degree ki with the probability Πi = ki / Σj kj. The second edge, which exists with a probability p, attaches to a randomly chosen node at a distance of two.
• For an FSB, the first edge attaches to a node i of degree ki with the probability Πi = ki / Σj kj. The subsequent edges attach to a randomly chosen node at a distance of at most three.
• For an MSB, each edge attaches preferentially to a node i with degree ki with the probability Πi = ki / Σj kj.

Simulation and analysis

Using this method, we built a network of 1,000 nodes with l, m, and n being 25, 4, and 1 (we obtained these values from the current configuration of the military logistics network used in the UltraLog program) and p = 1/2. We compared this network's survivability with that of two other networks built using similar mechanisms except that one used purely preferential attachment rules (similar to scale-free networks) and the other used purely random attachment rules (similar to random networks) (see Figure 4). All three networks had an equal number of edges and nodes to ensure fair comparison.

We refer to the networks built from random, preferential, and proposed attachment rules as Models 1, 2, and 3, respectively. As we noted earlier, a typical military supply chain (see Figure 1a) with a tree-like or hierarchical structure has deficient survivability components, making it vulnerable to both random and targeted attacks. Models 1, 2, and 3 outperform the typical supply network in all survivability components.

Figure 5a shows the three models' degree distribution. As expected, the preferential-attachment network has a heavier tail than the other two networks. We measured survivability components for all three networks. The clustering coefficient for Model 3 was the highest (see Table 1). The Model 3 attachment rules, especially those for battalions and FSBs, contribute implicitly to the clustering coefficient, unlike the attachment rules in the other models.

The proposed network model's characteristic path length measured between 4.69 and 4.79 despite the network's large size (1,000 nodes). This value puts it between the preferential and random attachment models. Also, as Figure 5b shows, the characteristic path length increases in the order of log(N) as N increases. Model 3 clearly displays small-world behavior.

To measure network robustness, we removed a set of nodes from the network and evaluated its resilience to disruptions. We considered two attack types: random and targeted. To simulate random attacks, we removed a set of randomly chosen nodes; for targeted attacks, we removed a set of nodes selected strictly in order of decreasing node degree. To determine robustness, we measured how the size of each network's largest connected component, characteristic path length, and maximum distance within the largest connected component changed as a function of the number of nodes removed. We expect that in a robust network the size of the largest connected component is a considerable fraction of N (usually O(N)), and the distances between nodes in the largest connected component don't increase considerably.

For random failures, Figure 6 shows that Model 3's robustness nearly matches that of the preferential-attachment network (note that scale-free networks are highly resilient to random failures).
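The growth mechanism and attachment rules above can be sketched as follows. This is our illustrative Python reading of the rules, not the UltraLog implementation; the seed clique, the helper names, and the deterministic seed are assumptions:

```python
import random

rng = random.Random(1)

def preferential_target(degree, exclude):
    """Pick node i with probability k_i / sum_j k_j, skipping `exclude`."""
    pool = [v for v, k in degree.items() if v not in exclude for _ in range(k)]
    return rng.choice(pool)

def within_distance(adj, source, d):
    """All nodes reachable from `source` in at most d hops (source excluded)."""
    frontier, seen = {source}, {source}
    for _ in range(d):
        frontier = {w for v in frontier for w in adj[v]} - seen
        seen |= frontier
    return seen - {source}

def add_node(adj, degree, ntype, p=0.5):
    """One growth step: a battalion, FSB, or MSB joins and wires its edges."""
    new = len(adj)
    adj[new], degree[new] = set(), 0
    def link(a, b):
        adj[a].add(b); adj[b].add(a)
        degree[a] += 1; degree[b] += 1
    link(new, preferential_target(degree, exclude={new}))  # first edge: preferential
    if ntype == "battalion" and rng.random() < p:
        near = within_distance(adj, new, 2) - adj[new]     # second edge, prob. p
        if near:
            link(new, rng.choice(sorted(near)))
    elif ntype == "FSB":
        for _ in range(2):                                 # three edges in total
            near = within_distance(adj, new, 3) - adj[new]
            if near:
                link(new, rng.choice(sorted(near)))
    elif ntype == "MSB":
        for _ in range(4):                                 # five edges in total
            link(new, preferential_target(degree, exclude=adj[new] | {new}))
```

Driving `add_node` with node types drawn in the ratio l:m:n = 25:4:1 reproduces the construction described in the text, up to the simplifications noted above.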
Figure 6. Responses of the three networks to random attacks, plotted as (a) the size of the largest connected component, (b) characteristic path length, and (c) maximum distance in the largest connected component against the percentage of nodes removed from each network.
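The robustness measurements behind Figures 6 and 7 can be reproduced with a short breadth-first search. This sketch (ours, for illustration) computes the largest connected component after random or degree-targeted removals:

```python
import random
from collections import deque

def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    alive = set(adj) - removed
    best, seen = 0, set()
    for s in alive:
        if s in seen:
            continue
        size, queue = 0, deque([s])
        seen.add(s)
        while queue:                       # breadth-first sweep of one component
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u in alive and u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best

def targeted_attack(adj, fraction):
    """Remove the top `fraction` of nodes in strictly decreasing degree order."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return set(order[:int(fraction * len(adj))])

def random_attack(adj, fraction, seed=0):
    """Remove a uniformly random `fraction` of the nodes."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(adj), int(fraction * len(adj))))
```

Sweeping `fraction` from 0 upward and plotting `largest_component` for each attack type yields curves of the kind shown in the figures.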



Figure 7. The three networks' responses to targeted attacks, plotted as (a) the size of the largest connected component, (b) characteristic path length, and (c) maximum distance in the largest connected component against the percentage of nodes removed from each network.

Also, the decrease in the largest connected component's size is linear with respect to the number of nodes removed, which corresponds to the slowest possible decrease. So, we can safely conclude that these networks are robust to random failures—most of the nodes in the network have a degree less than four, and removing smaller-degree nodes impacts the networks much less than removing high-degree nodes (called hubs).

These networks' responses to targeted attacks are inferior compared to their resilience to random attacks (see Figure 7). The size of the largest component decreases much faster for the proposed network than for the other two networks, but the proposed network performs better on the other two robustness measures. That is, the distances in the connected component are considerably smaller when more than 10 percent of nodes are removed.

We can improve robustness to targeted attacks by introducing constraints in the attachment rules. Here we assume that node type constrains its degree—that is, network MSBs, FSBs, and battalions can't have more than m1, m2, and m3 edges, respectively, incident on them. This is a reasonable assumption because in military logistics (or any organization's logistics management, for that matter), the suppliers might not be able to cater to more than a certain number of battalions or other suppliers. Initial experiments (see Figure 8) show that a network with these constraints displayed improved robustness to targeted attacks while not deviating much from the clustering coefficient. However, as we restrict how many links a node can receive, the network's characteristic path length increases (see Table 2). Clearly a trade-off exists between robustness to targeted attacks and the average characteristic path length.

Figure 8. The proposed network's responses to targeted attacks for different values of m1, m2, and m3 (size of the largest connected component against the percentage of nodes removed, for m1 = 4, m2 = 10, m3 = 25; m1 = 4, m2 = 8, m3 = 12; and m1 = 3, m2 = 6, m3 = 10).

Table 2. The proposed network's characteristic path length for different m1, m2, and m3 values.
m1 = ∞, m2 = ∞, m3 = ∞: 4.4
m1 = 4, m2 = 10, m3 = 25: 6.2
m1 = 4, m2 = 8, m3 = 12: 7.1
m1 = 3, m2 = 6, m3 = 10: 8.0

The fourth measure of survivability, network adaptivity, relates more to node functionality than to topology. Node functionality should facilitate the ability to rewire. For example, if a supplier can't fulfill a customer's demands, the customer seeks an alternate supplier—that is, the edge connected to the supplier is rewired to be incident on another supplier. Our model rewires according to its attachment rules. We conjecture that in such a case, other survivability components (clustering coefficient, characteristic path length, and robustness) will be intact. But to make a stronger argument we need more analysis in this direction.

The growth mechanism we describe is more like an illustration because real-world data aren't available, but we can always modify it to incorporate domain constraints. For example, we've assumed that a new node can attach preferentially to any node in the network, which might not be a realistic assumption. If specific geographical constraints are known, we can modify our mechanism to make the new node entering the system attach preferentially only within a set of nodes that satisfy the constraints.

Acknowledgments

We thank the anonymous reviewers for their helpful comments. We acknowledge DARPA for funding this work under grant MDA972-01-1-0038 as part of the UltraLog program.

References

1. J.M. Swaminathan, S.F. Smith, and N.M. Sadeh, "Modeling Supply Chain Dynamics: A Multiagent Approach," Decision Sciences, vol. 29, no. 3, 1998, pp. 607–632.
2. A.-L. Barabási and R. Albert, "Emergence of Scaling in Random Networks," Science, vol. 286, Oct. 1999, pp. 509–512.

The Authors

Hari Prasad Thadakamalla is a PhD student in the Department of Industrial and Manufacturing Engineering at Pennsylvania State University, University Park. His research interests include supply networks, search in complex networks, stochastic systems, and control of multiagent systems. He obtained his MS in industrial engineering from Penn State. Contact him at hpt102@psu.edu.

Usha Nandini Raghavan is a PhD student in industrial and manufacturing engineering at Pennsylvania State University, University Park. Her research interests include supply chain management, graph theory, complex adaptive systems, and complex networks. She obtained her MSc in mathematics from the Indian Institute of Technology, Madras. Contact her at uxr102@psu.edu.

Soundar Kumara is a Distinguished Professor of industrial and manufacturing engineering. He holds joint appointments with the Department of Computer Science and Engineering and School of Information Sciences and Technology at Pennsylvania State University. His research interests include complexity in logistics and manufacturing, software agents, neural networks, and chaos theory as applied to manufacturing process monitoring and diagnosis. He's an elected active member of the International Institute of Production Research. Contact him at skumara@psu.edu.

Réka Albert is an assistant professor of physics at Pennsylvania State University and is affiliated with the Huck Institutes of the Life Sciences. Her main research interest is modeling the organization and dynamics of complex networks. She received her PhD in physics from the University of Notre Dame. She is a member of the American Physical Society and the Society for Mathematical Biology. Contact her at ralbert@phys.psu.edu.



PHYSICAL REVIEW E 72, 066128 (2005)

Search in weighted complex networks

Hari P. Thadakamalla,1 R. Albert,2 and S. R. T. Kumara1


1 Department of Industrial Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
2 Department of Physics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
(Received 5 August 2005; published 30 December 2005)

We study trade-offs presented by local search algorithms in complex networks which are heterogeneous in edge weights and node degree. We show that search based on a network measure, local betweenness centrality (LBC), utilizes the heterogeneity of both node degrees and edge weights to perform the best in scale-free weighted networks. The search based on LBC is universal and performs well in a large class of complex networks.

DOI: 10.1103/PhysRevE.72.066128    PACS number(s): 89.75.Fb, 89.75.Hc, 02.10.Ox, 89.70.+c

1539-3755/2005/72(6)/066128(8)/$23.00    ©2005 The American Physical Society

I. INTRODUCTION

Many large-scale distributed systems found in communications, biology or sociology can be represented by complex networks. The macroscopic properties of these networks have been studied intensively by the scientific community, which has led to many significant results [1–3]. Graph properties such as the degree distribution and clustering coefficient were found to be significantly different from random graphs [4,5] which are traditionally used to model these networks. One of the major findings is the presence of heterogeneity in various properties of the elements in the network. For instance, a large number of the real-world networks including the World Wide Web, the Internet, metabolic networks, phone call graphs, and movie actor collaboration networks are found to be highly heterogeneous in node degree (i.e., the number of edges per node) [1–3]. The clustering coefficients, quantifying local order and cohesiveness [6], were also found to be heterogeneous, i.e., C(k) ~ k^(−1) [7]. These discoveries, along with others related to the mixing patterns of complex networks, initiated a revival of network modeling in the past few years [1–3]. Focus has been on understanding the mechanisms which lead to heterogeneity in node degree and its implications on the network properties. It was also shown that this heterogeneity has a huge impact on the network properties and processes such as network resilience [8], network navigation, local search [9], and epidemiological processes [10].

Recently, there have been many studies [11–17] that tried to analyze and characterize weighted complex networks where edges are characterized by capacities or strengths instead of a binary state (present or absent). These studies have shown that heterogeneity is prevalent in the capacity and strength of the interconnections in the network as well. Many researchers [11,13–16] have pointed out that the diversity of the interaction strengths is critical in most real-world networks. For instance, sociologists have shown that the weak links that people have outside their close circle of friends play a key role in keeping the social system together [11]. The Internet traffic [16] or the number of passengers in the airline network [15] are critical dynamical quantities that can be represented by using weighted edges. Similarly, the diversity of the predator-prey interactions and of metabolic reactions is considered as a crucial component of ecosystems [13] and metabolic networks [14], respectively. Thus it is incomplete to represent real-world systems with equal interaction strengths between different pairs of nodes.

In this paper, we concentrate on finding efficient decentralized search strategies on networks which have heterogeneity in edge weights. This is an intriguing and relatively little studied problem that has many practical applications. Suppose some required information such as computer files or sensor data is stored at the nodes of a distributed network. Then to quickly determine the location of particular information, one should have efficient decentralized search strategies. This problem has become more important and relevant due to the advances in technology that led to many distributed systems such as sensor networks [18], peer-to-peer networks [19] and dynamic supply chains [20]. Previous research on local search algorithms [9,21–24] has assumed that all the edges in the network are equivalent. In this paper we study the complex trade-offs presented by efficient local search in weighted complex networks. We simulate and analyze different search strategies on Erdős-Rényi (ER) random graphs and scale-free networks. We define a new local parameter called local betweenness centrality (LBC) and propose a search strategy based on this parameter. We show that irrespective of the edge weight distribution this search strategy performs the best in networks with a power-law degree distribution (i.e., scale-free networks). Finally, we show that the search strategy based on LBC is usually equivalent to high-degree search (discussed by Adamic et al. [9]) in unweighted (binary) networks. This implies that the search based on LBC is more universal and is optimal in a larger class of complex networks.

The rest of the paper is organized as follows. In Sec. II, we describe the problem in detail and briefly discuss the literature related to search in complex networks. In Sec. III, we define the local betweenness centrality (LBC) of a node's neighbor and show that it depends on the weight of the edge connecting the node and neighbor and on the degree of the neighbor. Section IV explains our methodology and the different search strategies considered. Section V gives the details of the simulations conducted for comparing these strategies. In Sec. VI, we discuss the findings from simulations on ER random and scale-free networks. In Sec. VII, we prove that the LBC and degree-based search are equivalent in unweighted networks. Finally, we give conclusions in Sec. VIII.

II. PROBLEM DESCRIPTION AND LITERATURE

The problem of decentralized search goes back to the famous experiment by Milgram [25] illustrating the short distances in social networks. One of the striking observations of this study, as pointed out by Kleinberg [21], was the ability of the nodes in the network to find short paths by using only local information. Currently, Watts et al. [26] are doing an Internet-based study to verify this phenomenon. Kleinberg demonstrated that the emergence of such a phenomenon requires special topological features [21]. Considering a family of network models that generalizes the Watts-Strogatz model [6], he showed that only one particular model among this infinite family can support efficient decentralized algorithms. Unfortunately, the model given by Kleinberg is too constrained and represents only a very small subset of complex networks. Watts et al. presented another model to explain the phenomena observed by Milgram which is based upon plausible hierarchical social structures [22]. However, in many real-world networks, it may not be possible to divide the nodes into sets of groups in a hierarchy depending on the properties of the nodes as in the Watts et al. model.

Recently, Adamic et al. [9] showed that in networks with a power-law degree distribution (scale-free networks) high-degree-seeking search is more efficient than random walk search. In random walk search, the node that has the message passes it to a randomly chosen neighbor. This process continues until it reaches the target node. In high degree search, the node passes the message to the neighbor that has the highest degree among all nodes in the neighborhood, assuming that a more connected neighbor has a higher probability of reaching the target node. The high degree search was found to outperform the random walk search consistently in networks having a power-law degree distribution for different exponents varying from 2.0 to 3.0. Using the generating function formalism given by Newman [27], Adamic et al. showed that for random walk search the number of steps s until approximately the whole graph is revealed is given by s ~ N^(3(1−2/τ)), where τ is the power-law exponent, while high degree search leads to a much more favorable scaling s ~ N^(2−4/τ).

The assumption of equal edge weights (meaning the cost, bandwidth, distance, or power consumption associated with the process described by the edge) usually does not hold in real-world networks. As pointed out by many researchers [11–17], it is incomplete to assume that all the links are equivalent while studying the dynamics of large-scale networks. The total path length (p) in a weighted network for the path 1-2-3-⋯-n is given by p = Σ_{i=1}^{n−1} w_{i,i+1}, where w_{i,i+1} is the weight on the edge from node i to node i + 1. Even though high-degree search results in a path with a smaller number of hops, the total path length may be high if the weights on these edges are high. Thus, to be more realistic and closer to real-world networks we need to explicitly incorporate weights in any proposed search algorithm. In this paper, we are interested in designing decentralized search strategies for networks that have the following properties:

(1) Its node degree distribution follows a power law with exponent varying from 2.0 to 3.0. Although we discuss the search strategies for networks with a Poisson degree distribution (ER random graphs), we concentrate more on scale-free networks since most of the real-world networks are found to exhibit this behavior [1–3].

(2) It has nonuniformly distributed weights on the edges. Here the weights signify the cost or time taken to pass the message or query. Hence, smaller weights correspond to shorter and/or better paths. We consider different distributions such as Beta, uniform, exponential, and power law.

(3) It is unstructured and decentralized. That is, each node has information only about its first and second neighbors, and no global information about the target is available. Also, the nodes can communicate only with their immediate neighbors.

(4) Its topology is dynamic (ad hoc) while still maintaining its statistical properties. These particular types of networks are becoming more prevalent due to advances made in different areas of engineering, especially in sensor networks [18], peer-to-peer networks [19] and dynamic supply chains [20]. Here, in this paper we analyze the problem of finding decentralized algorithms in such weighted complex networks, which we believe has not been explored to date.

Among the search strategies employed in this paper is a strategy based on the local betweenness centrality (LBC) of nodes. Betweenness centrality (also called load), first developed in the context of social networks [28], has been recently adapted to optimal transport in weighted complex networks by Goh et al. [17]. These authors have shown that in the strong disorder limit (that is, when the total path length is dominated by the maximum edge weight over the path), the load distribution follows a power law for both ER random graphs and scale-free networks. To determine a node's betweenness as defined by Goh et al. one would need to have knowledge of the entire network. Here we define a local parameter called local betweenness centrality (LBC) which only uses information on the first and second neighbors of a node, and we develop a search strategy based on this local parameter.

III. LOCAL BETWEENNESS CENTRALITY

We assume that each node in the network has information about its first and second neighbors. For calculating the local betweenness centrality of the neighbors of a given node we consider the local network formed by that node (which we will call the root node) and its first and second neighbors. Then, the betweenness centrality, defined as the fraction of shortest paths going through a node [3], is calculated for the first neighbors in this local network. Let L(i) be the LBC of a neighbor node i in the local network. Then L(i) is given by

    L(i) = Σ_{s ≠ i ≠ t; s,t ∈ local network} σ_st(i) / σ_st,

where σ_st is the total number of shortest paths (where shortest path means the path over which the sum of weights is

SEARCH IN WEIGHTED COMPLEX NETWORKS PHYSICAL REVIEW E 72, 066128 共2005兲

minimal) from node s to t. σ_st(i) is the number of these shortest paths passing through i. If the LBC of a node is high, it implies that this node is critical in the local network. Intuitively, we can see that the LBC of a neighbor depends on both its degree and the weight of the edge connecting it to the root node. For example, let us consider the networks in Figs. 1(a) and 1(b). Suppose that these are the local networks of node 1. In the network in Fig. 1(a), node 2 has the highest degree among the neighbors of node 1 (i.e., nodes 2, 3, 4, and 5). All the shortest paths from the neighbors of node 2 (6, 7, 8, and 9) to other nodes must pass through node 2. Hence, we see that a higher degree definitely helps a node obtain a higher LBC.

Now consider a similar local network but with a higher weight on the edge from 2 to 1, as shown in Fig. 1(b). In this network all the shortest paths through node 2 will also pass through node 3 (2-3-1) instead of going directly from node 2 to node 1. Hence, the LBC of neighbor node 3 will be higher than that of neighbor 2. Thus we clearly see that the LBCs of the neighbors of node 1 depend on both the neighbors' degrees and the weights of the edges connecting them. Note that a neighbor having the highest degree, or the smallest weight on the edge connecting it to the root node, does not necessarily have the highest LBC.

FIG. 1. (a) In this configuration, neighbor node 2 has a higher LBC than the other neighbors 3, 4, and 5. This depicts why a higher degree helps a node obtain a higher LBC. (b) However, in this configuration the LBC of neighbor node 3 is higher than that of neighbors 2, 4, and 5. This is due to the fact that the edge connecting 1 and 2 has a larger weight. These two configurations show that the LBC of a neighbor depends on both the edge weight and the node degree. In both cases, edge weights other than those shown in the figure are assumed to be 1.

IV. METHODOLOGY

In unweighted scale-free networks, Adamic et al. [9] have shown that high degree search, which utilizes the heterogeneity in node degree, is efficient. Thus one expects that in weighted power-law networks an efficient search strategy should consider both the edge weights and the node degree. We investigated the following set of search strategies, listed in increasing order of the amount of information required.

(1) Choose a neighbor randomly: The node tries to reach the target by passing the message/query to a randomly selected neighbor.

(2) Choose the neighbor with the smallest edge weight: The node passes the message along the edge with the minimum weight. The idea behind this strategy is that by choosing the neighbor with the minimum edge weight, the expected distance traveled will be smaller.

(3) Choose the best-connected neighbor: The node passes the message to the neighbor which has the highest degree. The idea here is that by choosing a neighbor which is well connected, there is a higher probability of reaching the target node. Note that this strategy takes the least number of hops to reach the target [9].

(4) Choose the neighbor with the smallest average weight: The node passes the message to the neighbor which has the smallest average weight. The average weight of a node is the average weight of all the edges incident on that node. The idea here is similar to that of the second strategy: instead of passing the message greedily along the least weighted edge, the algorithm passes it to the node that has the minimum average weight.

(5) Choose the neighbor with the highest LBC: The node passes the message to the neighbor which has the highest LBC. A neighbor with the highest LBC is one through which many shortest paths in the local network pass, and which is therefore critical in the local network. Thus, by passing the message to this neighbor, the probability of reaching the target node quickly is higher.

Note that the strategy based on LBC uses slightly more information than strategy 4, namely the edge weights between second neighbors, but this information is considerably more informative: it reflects the heterogeneities in both edge weights and node degree. Thus we expect that this search will perform better than the others, that is, that it will give smaller path lengths.

V. SIMULATIONS

For comparing the search strategies we used simulations on random networks with Poisson and power-law degree distributions. For homogeneous networks we used the Poisson random network model given by Erdős and Rényi [4]: we considered a network of N nodes in which each pair of nodes is connected with probability p. For scale-free networks, we considered different values of the degree exponent τ ranging from 2.0 to 3.0 and a degree range 2 < k < m ∼ N^(1/τ), and generated the network using the method given by Newman [27]. Once the network was generated, we extracted the largest connected component, shown to always exist for 2 < τ < 3.48 [29] and, in ER networks, for p > 1/N [5]. We did our analysis on this largest connected component, which contains the majority of the nodes, after verifying that its degree distribution is nearly the same as that of the original graph. The weights on the edges were generated from different distributions, namely Beta, uniform, exponential, and power law. We considered these distributions in increasing order of their variances to understand how the heterogeneity in edge weights affects the different search strategies.

Further, we randomly choose K pairs of nodes (source and target).
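As an illustration, the first four strategies of Sec. IV can be written as next-hop selectors over a weighted graph stored as an adjacency dict. This is a minimal sketch: the function names and graph encoding are ours, not the paper's, and strategy 5 would additionally require computing each candidate neighbor's LBC on its two-hop local network.

```python
import random

# Weighted undirected graph encoded as graph[u][v] = weight of edge (u, v).
# Each selector picks the next hop among `cands`, the neighbors of `node`
# that have not yet received the message.

def pick_random(graph, node, cands):            # strategy 1
    return random.choice(cands)

def pick_min_edge(graph, node, cands):          # strategy 2
    return min(cands, key=lambda v: graph[node][v])

def pick_max_degree(graph, node, cands):        # strategy 3
    return max(cands, key=lambda v: len(graph[v]))

def pick_min_avg_weight(graph, node, cands):    # strategy 4
    return min(cands,
               key=lambda v: sum(graph[v].values()) / len(graph[v]))
```

Each selector uses only information available at the current node (its incident edge weights) or at its immediate neighbors (their degrees and incident weights), which is what makes the search decentralized.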
TABLE I. Comparison of search strategies in a Poisson random network. The edge weights were generated randomly from an exponential distribution with mean 5 and variance 25. The values in the table are the average path distances obtained for each search strategy in these networks. The strategy which passes the message to the neighbor with the least edge weight performs the best.

Search strategy                500 nodes   1000 nodes   1500 nodes   2000 nodes
Random walk                      1256.3      2507.4       3814.9       5069.5
Minimum edge weight               597.6      1155.7       1815.5       2411.2
Highest degree                    979.7      1923.0       2989.2       3996.2
Minimum average node weight       832.1      1652.7       2540.5       3368.6
Highest LBC                       864.7      1800.7       2825.3       3820.9
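The network-generation step of Sec. V can be sketched as follows for the ER case, using only the standard library. The parameter names are ours, and the scale-free construction via Newman's method [27] is not reproduced here.

```python
import random
from collections import deque

def er_weighted(n, p, mean_w=5.0, seed=None):
    """ER random graph G(n, p) with exponentially distributed edge
    weights (mean 5 gives variance 25, the setting of Table I)."""
    rng = random.Random(seed)
    g = {u: {} for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                w = rng.expovariate(1.0 / mean_w)
                g[u][v] = g[v][u] = w
    return g

def largest_component(g):
    """Largest connected component via BFS; the analysis in the text is
    restricted to this subgraph."""
    seen, best = set(), set()
    for s in g:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        while q:
            u = q.popleft()
            for v in g[u]:
                if v not in comp:
                    comp.add(v)
                    q.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {u: {v: w for v, w in g[u].items() if v in best}
            for u in best}
```

For example, `largest_component(er_weighted(2000, 0.004, seed=1))` mirrors the 2000-node setting of Table I, where p = 0.004 guarantees a giant connected component.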

The source, and consecutively each node receiving the message, sends the message to one of its neighbors depending on the search strategy. The search continues until the message reaches a node whose neighbor is the target node. In order to avoid passing the message to a neighbor that has already received it, a list l_i of all the neighbors that have received the message is maintained at each node i. During the search process, if node i passes the message to its neighbor j, and j does not have any neighbors that are not already in its list l_j, then the message is routed back to node i. This particular neighbor j is marked to note that it cannot pass the message any further. The average path distance was calculated for each search strategy from the paths obtained for these K pairs. We repeated this simulation for 10 to 50 instances of the Poisson and power-law networks, depending on the size of the network.

VI. ANALYSIS

First, we study and compare the different search strategies on ER random graphs. The weights on the edges were generated from an exponential distribution with mean 5 and variance 25. Table I compares the performance of each strategy for networks of 500, 1000, 1500, and 2000 nodes. We took the connection probability to be p = 0.004, and hence a giant connected component always exists [5]. From Table I, it is evident that the strategy which passes the message to the neighbor with the least edge weight is better than all the other strategies in homogeneous networks. Remarkably, a search strategy that needs less information than the other strategies (3, 4, and 5) performed best, while high degree search and LBC search did not perform well, since the network is highly homogeneous in node degree.

However, if we decrease the heterogeneity in the edge weights (i.e., use a distribution with a smaller variance), we observe that high LBC search performs best (see Table II). In conclusion, when the heterogeneity of the edge weights is high compared to the relative homogeneity of the node degrees, the search strategies that are purely based on edge weights perform better. However, as the heterogeneity of the edge weights decreases, the importance of the edge weights decreases, and strategies that consider both edge weights and node degree perform better.

Next we investigated how the search strategies perform on scale-free networks. Figure 2 shows the scaling of the different search strategies for scale-free networks with exponent 2.1. As conjectured, the search strategy that utilizes the heterogeneities of both the edge weights and the node degrees (the high LBC search) performed better than the other strategies. A similar phenomenon was observed for different exponents of the scale-free network (see Table III). Except for the power-law exponent 2.9, the high LBC search was consistently better than the others. We observe that as the heterogeneity in the node degree decreases (i.e., as the power-law exponent increases), the difference between the high LBC search and the other strategies decreases. When the exponent is 2.9, the performances of the LBC, minimum edge weight, and high degree searches were almost the same. Note that when the network becomes homogeneous in node degree, the minimum edge weight search performs better than the high LBC search (Table I).
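The message-forwarding procedure of Sec. V (greedy next hop with memory of who has already received the message, and routing back at dead ends) might be sketched as follows. For brevity this sketch pools the per-node lists l_i into a single shared set, and it returns the cumulative edge weight traveled, which is our reading of "path distance"; both simplifications are ours.

```python
def search(graph, source, target, pick):
    """Decentralized search: follow `pick(graph, node, cands)` until a
    node adjacent to the target is reached, backtracking whenever the
    current node has no fresh neighbors left."""
    received = {source}      # nodes that have already seen the message
    stack = [source]         # current forwarding chain, for routing back
    dist = 0.0               # cumulative weight of edges traveled
    node = source
    while node != target:
        if target in graph[node]:          # a neighbor is the target: deliver
            dist += graph[node][target]
            break
        cands = [v for v in graph[node] if v not in received]
        if cands:
            nxt = pick(graph, node, cands)
            received.add(nxt)
            stack.append(nxt)
        else:                              # dead end: route the message back
            stack.pop()
            if not stack:
                return None                # search failed
            nxt = stack[-1]
        dist += graph[node][nxt]
        node = nxt
    return dist
```

Averaging the returned distances over K random source-target pairs, with `pick` set to one of the strategy selectors, reproduces the kind of comparison reported in Tables I-IV.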

TABLE II. Comparison of search strategies in a Poisson random network with 2000 nodes. The table gives results for different edge weight distributions. The mean for all the distributions is 5 and the variance is σ². The values in the table are the average path lengths obtained for each search strategy in these networks. When the weight heterogeneity is high, the minimum edge weight search strategy was the best. However, when the heterogeneity of the edge weights is low, LBC performs better.

                               Beta        Uniform     Exp.        Power law
Search strategy                σ² = 2.3    σ² = 8.3    σ² = 25     σ² = 4653.8
Random walk                    1271.91     1284.9      1253.68     1479.32
Minimum edge weight            1017.74      767.405     577.83      562.39
Highest degree                  994.64     1014.05      961.5      1182.18
Minimum average node weight    1124.48      954.295     826.325     732.93
Highest LBC                     980.65      968.775     900.365     908.48

FIG. 2. Scaling of the search strategies in power-law networks with exponent 2.1. The edge weights are generated from an exponential distribution with mean 10 and variance 100. The symbols represent random walk (circles) and search algorithms based on minimum edge weight (squares), high degree (diamonds), minimum average node weight (triangles), and high LBC (asterisks).

FIG. 3. Pictorial comparison of the behavior of high degree and high LBC search as the heterogeneity of the edge weights increases in power-law networks. Note that the average distances are normalized with respect to the high LBC search.

This implies that, similarly to high degree search [9], the effectiveness of high LBC search also depends on the heterogeneity in node degree.

Table IV shows the performance of all the strategies on a scale-free network (exponent 2.1) with different edge weight distributions. The percentage values in the brackets show by how much the average distance for that search is higher than the average distance obtained by the high LBC search. As in random graphs, we observe that the impact of the edge weights on the search strategies increases as the heterogeneity of the edge weights increases. For instance, when the variance (heterogeneity) of the edge weights is small, high degree search is better than the minimum edge weight search. On the other hand, when the variance (heterogeneity) of the edge weights is high, the minimum edge weight strategy is better than high degree search. In each case, the high LBC search, which reflects both edge weights and node degree, always outperformed the other strategies. Thus, it is clear that in power-law networks, irrespective of the edge weight distribution and the power-law exponent, high LBC search always performs better than the other strategies (Tables III and IV).

Figure 3 gives a pictorial comparison of the behavior of high degree and high LBC search as the heterogeneity of the edge weights increases (based on the results shown in Table IV). Since many studies [11-17] have shown that there is large heterogeneity in the capacities and strengths of the interconnections in real networks, it is important that local search be based on LBC rather than on the high degree search of Adamic et al. [9].

Note that LBC has been adapted from the definition of betweenness centrality (BC), which requires global knowledge of the network. BC is defined as the fraction of shortest paths among all nodes in the network that pass through a given node, and it measures how critical the node is for optimal transport in complex networks. In unweighted scale-free networks there exists a scaling relation between node betweenness centrality and degree, BC ∼ k^η [30]. This implies that the higher the degree, the higher the BC of the node. This may be the reason why high degree search is optimal in unweighted scale-free networks (as shown by Adamic et al. [9]). However, Goh et al. [17] have shown that no such scaling relation exists between node degree and betweenness centrality in weighted complex networks. It will be interesting to examine the relationship between local and global betweenness centrality in our future work. Also, note that the minimum average node weight strategy (strategy 4) uses only slightly less information than LBC search. However, LBC search consistently and significantly outperforms it (see Tables I-IV).
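Our reading of the LBC computation (betweenness restricted to the weighted local network, with shortest paths found by Dijkstra's algorithm) can be sketched as follows. The normalization by the total number of shortest paths σ_st follows the standard betweenness definition; in the tree case discussed below, where σ_st is 0 or 1, it coincides with raw path counts. The node labels in the example are ours.

```python
import heapq

def dijkstra_counts(g, s):
    """Weighted shortest-path distances and path counts from s."""
    dist, sigma, done = {s: 0.0}, {s: 1}, set()
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        for v, w in g[u].items():
            if v in done:
                continue
            nd = d + w
            if v not in dist or nd < dist[v] - 1e-12:
                dist[v], sigma[v] = nd, sigma[u]
                heapq.heappush(pq, (nd, v))
            elif abs(nd - dist[v]) <= 1e-12:
                sigma[v] += sigma[u]
    return dist, sigma

def lbc_scores(local, root):
    """LBC of each neighbor of `root`: summed fractions of shortest
    paths between ordered pairs (s, t) in the local network that pass
    through that neighbor as an interior node."""
    sp = {u: dijkstra_counts(local, u) for u in local}
    scores = {}
    for i in local[root]:
        di, si = sp[i]
        total = 0.0
        for s in local:
            if s == i:
                continue
            ds, ss = sp[s]
            for t in local:
                if t in (s, i):
                    continue
                # i lies on a shortest s-t path iff the distances add up
                if abs(ds[i] + di[t] - ds[t]) <= 1e-12:
                    total += ss[i] * si[t] / ss[t]
        scores[i] = total
    return scores
```

Running `lbc_scores` on a small weighted ego network and picking the arg-max neighbor is exactly strategy 5; the brute force over all pairs is affordable because the local network only extends two hops from the root.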

TABLE III. Comparison of search strategies in power-law networks on 2000 nodes with different power-law exponents. The edge weights are generated from an exponential distribution with mean 5 and variance 25. The values in the table are the average path lengths obtained for each search strategy in these networks. LBC search, which reflects the heterogeneities in both edge weights and node degree, performed the best for all power-law exponents. The systematic increase of all path lengths with the power-law exponent τ is due to the fact that the average degree of the network decreases with τ.

                               Power-law exponent
Search strategy                2.1       2.3       2.5       2.7       2.9
Random walk                    1108.70   1760.58   2713.11   3894.91   4769.75
Minimum edge weight             318.95    745.41   1539.23   2732.01   3789.56
Highest degree                  375.83    761.45   1519.74   2693.62   3739.61
Minimum average node weight     605.41   1065.34   1870.43   3042.27   3936.03
Highest LBC                     298.06    707.25   1490.48   2667.74   3751.53

TABLE IV. Comparison of search strategies in power-law networks with exponent 2.1 and 2000 nodes with different edge weight distributions. The mean for all the edge weight distributions is 5 and the variance is σ². The values in the table are the average distances obtained for each search strategy in these networks. The values in brackets show the relative difference between the average distance for each strategy and the average distance obtained by the LBC strategy. LBC search, which reflects the heterogeneities in both edge weights and node degree, performed the best for all edge weight distributions.

                               Beta            Uniform         Exp.            Power law
Search strategy                σ² = 2.3        σ² = 8.3        σ² = 25         σ² = 4653.8
Random walk                    1107.71 (202%)  1097.72 (241%)  1108.70 (272%)  1011.21 (344%)
Minimum edge weight             704.47 (92%)    414.71 (29%)    318.95 (7%)     358.54 (44%)
Highest degree                  379.98 (4%)     368.43 (14%)    375.83 (26%)    394.99 (59%)
Minimum average node weight    1228.68 (235%)   788.15 (145%)   605.41 (103%)   466.18 (88%)
Highest LBC                     366.26          322.30          298.06          247.77

This implies that LBC search uses the information correctly.

VII. LBC ON UNWEIGHTED NETWORKS

In this section, we show that the neighbor with the highest LBC is usually the same as the neighbor with the highest degree in unweighted networks. Hence, high LBC search gives results identical to high degree search in unweighted networks. As mentioned earlier, in unweighted scale-free networks there is a scaling relation between the (global) BC of a node and its degree, BC ∼ k^η [30]. However, this does not imply that in an unweighted local network the neighbor with the highest LBC is always the same as the neighbor with the highest degree. Here, we show that in most cases the highest degree and the highest LBC neighbors coincide. First, let us consider a tree-like local network without any loops, similar to the configuration shown in Fig. 4(a). In a local network there are three types of nodes, namely, the root node, first neighbors, and second neighbors. Let the degree of the root node be d and the degrees of the neighbors be k_1, k_2, k_3, ..., k_d. The number of nodes (n) in the local network is n = 1 + Σ_{j=1}^{d} k_j [one root node, d first neighbors, and Σ_{j=1}^{d} (k_j − 1) second neighbors]. In a tree network there is a single shortest path between any pair of nodes s and t, thus σ_st(i) is either zero or one. Then the LBC of a first neighbor i is given by L(i) = (k_i − 1)(n − 2) + (k_i − 1)(n − k_i), where k_i is the degree of the neighbor. The first term is due to the shortest paths from the k_i − 1 second neighbors of node i to the n − 2 remaining nodes (other than node i and the second neighbor itself); the second term is due to the shortest paths from the n − k_i nodes (other than the k_i − 1 second neighbors and node i) to the k_i − 1 second neighbors of node i. Note that we choose not to explicitly take into account the symmetry of distance in undirected networks, and count the s-t and t-s paths separately. L(i) is an increasing function of k_i if k_i < n − 1/2, a condition that is always satisfied since n = 1 + Σ_{j=1}^{d} k_j. This implies that in a local network with a tree-like structure, the neighbor with the highest degree has the highest LBC. We extend this result to other configurations of the local network by considering the different possible cases.

The possible edges other than those present in a tree-like local network are: an edge between two first neighbors, an edge between a first neighbor and a second neighbor, and an edge between two second neighbors. As shown in Fig. 4(b), an edge between two first neighbors changes the LBC of the root node but not that of the neighbors. Figure 4(c) shows a configuration of a local network with an edge added between a first and a second neighbor. Now, there is a small change in the LBCs of the neighbors (nodes 2 and 3) which are connected to a common second neighbor (node 9).

FIG. 4. (a) A configuration of a local network with a tree-like structure. In such local networks, the neighbor with the highest degree has the highest LBC. (b) A local network with an edge between two first neighbors. Here again the neighbor with the highest degree has the highest LBC. (c) A local network with an edge between a first neighbor and a second neighbor. Although there is a change in the LBCs of the neighbors, the order remains the same.
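As a consistency check, the closed-form expression L(i) = (k_i − 1)(n − 2) + (k_i − 1)(n − k_i) for tree-like local networks can be verified by brute force. This is a sketch: the example tree, its labels, and the BFS helper are ours, and s-t and t-s paths are counted separately, as in the text.

```python
from collections import deque
from itertools import permutations

def tree_path(adj, s, t):
    """The unique s-t path in a tree, recovered via BFS parent pointers."""
    parent, q = {s: None}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    path, v = [], t
    while v is not None:
        path.append(v)
        v = parent[v]
    return path

def lbc_tree(adj, i):
    """LBC of node i: ordered pairs (s, t), s, t != i, whose unique
    shortest path crosses i as an interior node."""
    others = [v for v in adj if v != i]
    return sum(1 for s, t in permutations(others, 2)
               if i in tree_path(adj, s, t)[1:-1])

# Root 0 with d = 3 first neighbors (1, 2, 3) of degrees k = 4, 3, 2;
# their k - 1 second neighbors hang off them, so n = 1 + 4 + 3 + 2 = 10.
adj = {0: [1, 2, 3], 1: [0, 4, 5, 6], 2: [0, 7, 8], 3: [0, 9],
       4: [1], 5: [1], 6: [1], 7: [2], 8: [2], 9: [3]}
n = len(adj)
for i, k in [(1, 4), (2, 3), (3, 2)]:
    assert lbc_tree(adj, i) == (k - 1) * (n - 2) + (k - 1) * (n - k)
```

On this example the brute-force counts (42, 30, and 16 for the three first neighbors) match the formula, and the highest-degree neighbor indeed has the highest LBC.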

Since node 9 is now shared by neighbors 2 and 3, the LBC contributed by node 9 is divided between these two neighbors. The LBC of such a neighbor i is L(i) = (k_i − 2)(n − 2) + (k_i − 2)(n − k_i) + (n − k_j − 1), where k_i is the degree of neighbor i and k_j is the degree of the neighbor with which node i has a common second neighbor. The decrease in the LBC of neighbor i is (n − k_i + k_j − 1). If there are two neighbors with the same degree (one with a common second neighbor and another without any), then the neighbor without any common second neighbors will have the higher LBC. Another possible change of order with respect to LBC would be with a neighbor l of degree k_l = k_i − 1 (if it exists). However, L(i) − L(l) = (n − k_i − k_j + 1) is always greater than 0, since n = Σ_{j=1}^{d} k_j in this local network (the shared second neighbor is counted only once). Thus the only scenario in which the order of the neighbors with respect to LBC differs from their order with respect to degree, when adding an edge between first and second neighbors, is if that edge creates two first neighbors with the same degree. A similar argument leads to an identical conclusion in the case of adding an edge between two second neighbors.

The above discussion suggests that the highest degree neighbor is always the same as the highest LBC neighbor. This is not true in a few peculiar instances of local networks. For example, consider the network shown in Fig. 5, which has several edges between the first and second neighbors. We see that the highest degree neighbor is not the same as the highest LBC neighbor. In this local network, the highest degree first neighbor (node 2) participates in several four-node circuits that include the root node. Thus, there are multiple shortest paths starting from the second-neighbor nodes on these cycles (nodes 6, 7, 9, 10), and the contributions to node 2's LBC from the paths that pass through it are smaller than unity; consequently, the LBC of node 2 will be relatively small. This may be one of the reasons why the highest-degree neighbor, node 2, is not the highest LBC neighbor. We believe that this happens only in some special instances of local networks. From about 50 000 simulations we found that in 99.63% of the cases the highest degree neighbor is the same as the highest LBC neighbor. Hence, we can conclude that in unweighted networks the neighbor with the highest LBC is usually identical to the neighbor with the highest degree.

FIG. 5. An instance of a local network where the order of the neighbors with respect to LBC is not the same as their order with respect to node degree.

VIII. CONCLUSION

In this paper we have given a new direction for local search in complex networks with heterogeneous edge weights. We proposed a local search algorithm based on a new local measure, local betweenness centrality. We studied the complex tradeoffs presented by efficient local search in weighted complex networks and showed that heterogeneity in edge weights has a large impact on search. Moreover, the impact of edge weights on search strategies increases as the heterogeneity of the edge weights increases. We also demonstrated that the search strategy based on LBC utilizes the heterogeneity in both node degree and edge weight to perform the best in power-law weighted networks. Furthermore, we have shown that in unweighted power-law networks the neighbor with the highest degree is usually the same as the neighbor with the highest LBC. Hence, our proposed search strategy based on LBC is more universal and is efficient in a larger class of complex networks.

ACKNOWLEDGMENTS

The authors would like to acknowledge the National Science Foundation (Grant No. SST 0427840) and a Sloan Research Fellowship to one of the authors (R.A.) for making this work feasible. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF). In addition, the first author (H.P.T.) would like to thank Usha Nandini Raghavan for interesting discussions on issues related to this work.

[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 1 (2002).
[2] S. N. Dorogovtsev and J. F. F. Mendes, Adv. Phys. 51, 1079 (2002).
[3] M. E. J. Newman, SIAM Rev. 45, 167 (2003).
[4] P. Erdős and A. Rényi, Publ. Math. (Debrecen) 6, 290 (1959).
[5] B. Bollobás, Random Graphs (Academic, London, 1985).
[6] D. J. Watts and S. H. Strogatz, Nature (London) 393, 440 (1998).
[7] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, Science 297, 1551 (2002).
[8] R. Albert, A.-L. Barabási, and H. Jeong, Nature (London) 406, 378 (2000); R. Albert, I. Albert, and G. L. Nakarado, Phys. Rev. E 69, 025103 (2004).
[9] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, Phys. Rev. E 64, 046135 (2001).
[10] R. Pastor-Satorras and A. Vespignani, Phys. Rev. E 63, 066117 (2001); Phys. Rev. Lett. 86, 3200 (2001); Phys. Rev. E 65, 035108(R) (2002); 65, 036104 (2002); in Handbook of Graphs and Networks, edited by S. Bornholdt and H. G. Schuster (Wiley-VCH, Berlin, 2003).
[11] M. Granovetter, Am. J. Sociol. 78, 1360 (1973); M. E. J. Newman, Phys. Rev. E 64, 016132 (2001).
[12] S. H. Yook, H. Jeong, A.-L. Barabási, and Y. Tu, Phys. Rev. Lett. 86, 5835 (2001); J. D. Noh and H. Rieger, Phys. Rev. E 66, 066127 (2002); L. A. Braunstein, S. V. Buldyrev, R. Cohen, S. Havlin, and H. E. Stanley, Phys. Rev. Lett. 91, 168701 (2003); A. Barrat, M. Barthelemy, and A. Vespignani, Phys. Rev. E 70, 066149 (2004).
[13] S. L. Pimm, Food Webs, 2nd ed. (The University of Chicago Press, Chicago, IL, 2002).
[14] A. E. Krause, K. A. Frank, D. M. Mason, R. E. Ulanowicz, and W. W. Taylor, Nature (London) 426, 282 (2003); E. Almaas, B. Kovacs, T. Vicsek, Z. N. Oltvai, and A.-L. Barabási, ibid. 427, 839 (2004).
[15] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, Proc. Natl. Acad. Sci. U.S.A. 101, 3747 (2004); R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral, ibid. 102, 7794 (2005).
[16] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet: A Statistical Physics Approach (Cambridge University Press, Cambridge, 2004).
[17] K. I. Goh, J. D. Noh, B. Kahng, and D. Kim, cond-mat/0410317 (unpublished).
[18] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar, Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking, 1999, pp. 263–270; U. N. Raghavan, H. P. Thadakamalla, and S. R. T. Kumara, Proceedings of the Thirteenth International Conference on Advanced Computing and Communications (ADCOM), 2005.
[19] G. Kan, in Peer-to-Peer: Harnessing the Power of Disruptive Technologies, edited by A. Oram (O'Reilly, Beijing, 2001); T. Hong, ibid.
[20] H. P. Thadakamalla, U. N. Raghavan, S. R. T. Kumara, and R. Albert, IEEE Intell. Syst. 19, 24 (2004).
[21] J. Kleinberg, Nature (London) 406, 845 (2000); Proceedings of the 32nd ACM Symposium on Theory of Computing, 2000, pp. 163–170; Adv. Neural Inf. Process. Syst. 14, 431 (2001).
[22] D. J. Watts, P. S. Dodds, and M. E. J. Newman, Science 296, 1302 (2002).
[23] L. A. Adamic and E. Adar, cond-mat/0310120 (unpublished).
[24] A. Arenas, A. Cabrales, A. Diaz-Guilera, R. Guimera, and F. Vega, in Statistical Mechanics of Complex Networks, edited by R. Pastor-Satorras, M. Rubi, and A. Diaz-Guilera (Springer-Verlag, Berlin, 2003).
[25] S. Milgram, Psychol. Today 1, 61 (1967).
[26] D. J. Watts, P. S. Dodds, and R. Muhamad, http://smallworld.columbia.edu/index.html
[27] M. E. J. Newman, in Handbook of Graphs and Networks, edited by S. Bornholdt and H. G. Schuster (Wiley-VCH, Berlin, 2003).
[28] S. Wasserman and K. Faust, Social Network Analysis (Cambridge University Press, Cambridge, UK, 1994).
[29] W. Aiello, F. Chung, and L. Lu, Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, 2000, pp. 171–180.
[30] K. I. Goh, B. Kahng, and D. Kim, Phys. Rev. Lett. 87, 278701 (2001).
Search in spatial scale-free networks

H P Thadakamalla 1,3, R Albert 2 and S R T Kumara 1

1 Department of Industrial Engineering, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
2 Department of Physics, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA

E-mail: hpt102@psu.edu, ralbert@phys.psu.edu and skumara@psu.edu
New Journal of Physics 9 (2007) 190
Received 12 March 2007
Published 28 June 2007
Online at http://www.njp.org/
doi:10.1088/1367-2630/9/6/190

Abstract. We study the decentralized search problem in a family of parameterized spatial network models that are heterogeneous in node degree. We investigate several algorithms and illustrate that some of these algorithms exploit the heterogeneity in the network to find short paths by using only local information. In addition, we demonstrate that the spatial network model belongs to a class of searchable networks for a wide range of the parameter space. Further, we test these algorithms on the US airline network, which belongs to this class of networks, and demonstrate that searchability is a generic property of the US airline network. These results provide insights on designing the structure of distributed networks that need effective decentralized search algorithms.

3 Author to whom any correspondence should be addressed.

New Journal of Physics 9 (2007) 190 PII: S1367-2630(07)45866-9


1367-2630/07/010190+17$30.00 © IOP Publishing Ltd and Deutsche Physikalische Gesellschaft

Contents

1. Introduction
2. Literature and problem description
3. Decentralized search algorithms
4. Spatial network model and search analysis
   4.1. Simulation and analysis
5. Search in the US airline network
   5.1. Properties of the US airline network
   5.2. Search results and analysis
6. Conclusions and discussion
Acknowledgments
References

1. Introduction

Recently, many large-scale distributed systems in communications, sociology, and biology have
been represented as networks and their macroscopic properties have been extensively studied
[1]–[4]. One of the major findings is the presence of heterogeneity in network properties. For
example, the distribution of node degree (i.e. the number of edges incident on a node) for many
real-world networks including the Internet, the World Wide Web, phone call networks, scientific
collaboration networks and metabolic networks is found to be highly heterogeneous and to
follow a power law, p(k) ∼ k^−γ, where p(k) is the fraction of nodes with degree k. The clustering
coefficients, quantifying local order and cohesiveness [5], are also found to be heterogeneous,
i.e. C(k) ∼ k^−1 [6]. Further, in many networks the node betweenness centrality, which quantifies
the number of shortest paths that pass through a node, is found to be heterogeneous [7]. These
heterogeneities have a demonstrably large impact on the network’s resilience [8, 9] as well as
navigation, local search [10, 11], and spreading processes [12].
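For instance, the heavy tail of p(k) ∼ k^−γ can be illustrated by sampling degrees from a continuous Pareto approximation. This is a sketch of our own, not the degree-sequence generator used in the studies cited above.

```python
import random

def powerlaw_degrees(n, gamma, kmin=2, seed=0):
    """Sample n degrees from p(k) ~ k^-gamma (k >= kmin) by inverse
    transform on the continuous Pareto distribution, truncated to
    integers. Illustrative only."""
    rng = random.Random(seed)
    return [int(kmin * (1 - rng.random()) ** (-1.0 / (gamma - 1)))
            for _ in range(n)]

degs = powerlaw_degrees(10000, 2.5)
# The maximum degree sits far above the mean: a few hubs dominate.
print(max(degs), sum(degs) / len(degs))
```

For γ between 2 and 3 the mean degree is finite but the variance diverges with sample size, which is the regime in which the hub-seeking search strategies discussed below are effective.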
Another interesting property exhibited by these networks is the ‘small-world phenomenon’
whereby almost every node is connected to every other node by a path with a small number
of edges. This phenomenon was first demonstrated by Milgram’s famous experiment in the 1960s
[13]. Milgram randomly selected individuals from Wichita, Kansas and Omaha, Nebraska and
requested them to direct letters to a target person in Boston, Massachusetts. The participants,
and consecutively each person receiving the letter, were asked to send it to an acquaintance
whom they judged to be closer to the target. Surprisingly, the average length of these paths (i.e.
the number of edges in the path) was approximately 6, illustrating the small-world property of
social networks. An even more striking observation, which was later pointed out by Kleinberg
[14]–[16], is that the nodes (participants) were able to find short paths by using only local
information. Currently, Dodds et al are carrying out an Internet-based study to verify this
phenomenon, and initial findings are published in [17].
The observation by Kleinberg raises two fundamental questions: (i) Why should social
networks be structured in a way that local search is efficient? (ii) What is the structure of
networks that exhibit this phenomenon? Kleinberg [14] and later Watts et al [18] argued that
the emergence of such a phenomenon requires special topological features. They termed the


networks in which short paths can be found using only local information as searchable networks.
These studies along with a few others [10, 19] stimulated research on decentralized searching in
complex networks [11], [20]–[26], a problem with many practical applications. In many networks,
information such as data files and sensor data is stored at the nodes of a distributed network. In
addition, the nodes have only limited or local information about the network. Hence, to access this
information quickly, one should have efficient algorithms that can find the target node using the
available local information. Examples include routing of sensor data in wireless sensor networks
[27, 28], locating data files in peer-to-peer networks [26, 29], and finding information in
distributed databases [30]. For the search process to be efficient, it is important that these networks
are designed to be searchable. The importance of search efficiency becomes even more apparent
in the case of ad hoc networks, which are decentralized and distributed, and in which real-time
searching is required to find the target node.
In this paper, we study the decentralized search problem in a family of parameterized spatial
network models that are heterogeneous in node degree. We propose several decentralized search
algorithms and examine their performance by simulating them on the spatial network model for
various parameters. As pointed out in [25], our analysis reveals that the optimal search algorithm
should effectively incorporate the direction of travel and the degree of the neighbour. We illustrate
that some of these algorithms exploit the heterogeneities present in the network to find paths as
short as the paths found by using global information; thus we demonstrate that the spatial network
model considered defines a class of searchable networks. Further, we test these algorithms on
the US airline network which belongs to this class of networks and show that searchability is a
generic property of the US airline network.

2. Literature and problem description

Decentralized searching in networks can be broadly classified into searching in unstructured
networks (as in peer-to-peer networks such as Gnutella [29]) and searching in structured/spatial
networks (as in wireless sensor networks). In unstructured networks, the global position of a node cannot
(as in wireless sensor networks). In unstructured networks, the global position of a node cannot
be quantified and it is difficult to know whether a step in the search process is towards the
target node or away from the target node. Hence, it is difficult to obtain short paths using local
information. In unstructured networks with power-law degree distributions, Adamic et al [10]
showed that a high-degree seeking search is better than a random-walk search. In a random-walk
search, the node that has the message passes it to a randomly chosen neighbour, and the process
continues until it reaches the target node. In a high-degree search, by contrast, the node that has the
message passes it to the neighbour with the highest degree. Thadakamalla et al [11] proposed a more
general algorithm based on a local measure, local betweenness centrality (LBC), for networks
which are heterogeneous both in edge weights and in node degree. They demonstrated that the
search based on LBC utilizes the heterogeneities in edge weights and node degree to perform
the best in power-law (scale-free) weighted networks.
In structured networks the nodes are embedded in a metric space and they are connected
based on the metric distance. Here, the global position of the target node in the space can guide
the search process to reach the target node more quickly. In [14, 15], Kleinberg studied search in
a family of grid-based models that generalize the Watts–Strogatz [5] model. He proved that only
one particular model among this infinite family can support efficient decentralized algorithms. In
this model, a simple greedy search, where the node passes the message to the neighbour closest
to the target node based on the grid distance, is able to give short paths. He further extended this
model to hierarchical networks [16], where, again, the network was proven to be searchable only
for a specific parameter value. Unfortunately, the model given by Kleinberg represents only a very
small subset of complex networks. Independently, Watts et al presented another model based upon
plausible hierarchical social structures [18], to explain the phenomena observed in Milgram’s
experiment. The networks were shown to be searchable by a greedy search algorithm for a wide
range of parameter space. Other works on decentralized searching include [20]–[26]. Simsek and
Jensen [25] use homophily between nodes and degree disparity in the network to design a better
algorithm for finding the target node. However, finding an optimal way to combine location and
degree information is yet to be investigated (see [21] for a review). Another interesting problem
studied by Clauset and Moore [31], and by Sandberg [24], is the question of how real-world
networks evolve to become searchable. They propose a simple feedback mechanism where the
nodes continuously conduct decentralized searches, and in the process partially rewire the edges
to form a searchable network.
In this paper, we consider search in a family of parameterized spatial network models that
are heterogeneous in node degree. In this model, nodes are placed in an n-dimensional space and
are connected, based on preferential attachment and geographical constraints, to form spatial
scale-free networks. Preferential attachment to high-degree nodes is believed to be responsible
for the emergence of the power-law degree distribution observed in many real-world networks
[32], and geographical constraints account for the fact that nodes tend to connect to nodes that are
nearby. Many real-world networks such as the Internet [33] and the worldwide airline network
[34], can be described by this family of spatial network models. Our objective is to design
decentralized search algorithms for this type of network model and demonstrate that this simple
model defines a class of searchable networks. The decentralized search algorithm attempts to
send a message from a starting node s to the target node t along the edges of the network using
local information. Each node has information about the position of the target node, the position
of its neighbours, and the degree of its neighbours. Using this information, the start node, and
consecutively each node receiving the message, passes the message to one of its neighbours based
on the search algorithm until it reaches the target node. We evaluate each algorithm based on the
number of hops taken for the message to reach the target node; the lower the number, the better
the performance of the algorithm. Another potentially relevant measure is the physical distance
travelled by each search algorithm. However, the number of hops is the most pertinent distance
measure in many networks, including social networks, the Internet and even airline networks,
as the delays associated with switching between edges are comparable to the delays associated
with traversing an edge.
As observed in previous studies [10, 11], we expect that the heterogeneity present in spatial
scale-free networks influences the search process. In the following section, we discuss why the
degree of a node’s neighbour is important and propose different ways of composing the direction
of travel and the degree of the neighbour.

3. Decentralized search algorithms

A simple search algorithm in spatial networks is greedy search, where each node passes the
message to the neighbour closest to the target node. Let di be the distance to the target
node from each neighbour i (see figure 1(a)) and let ki be the degree of the neighbour i.

Figure 1. (a) Illustration of a spatial network. di is the distance to the target node
from each neighbour i and ki is the degree of the neighbour i. (b) Illustration
showing that it is sometimes better to choose a neighbour with higher degree (node 2 over
node 1), even if it takes the message away from the target: the higher degree gives a higher
probability of taking a longer step in the next iteration.

Greedy search chooses the neighbour with the smallest di . This will ensure that the message
is always going to the neighbour closest to the target node. However, greedy search may not be
optimal in spatial scale-free networks that have high heterogeneity in node degree. Adamic et al
[10] and Thadakamalla et al [11] have shown that search algorithms that utilize the heterogeneities
present in the network perform substantially better than those that do not. Indeed, choosing a
neighbour with higher degree, even by going away from the target node, gives a higher probability
of taking a longer step in the next iteration. For instance, in figure 1(b), it is better to choose
node 2 instead of node 1 since node 2 can take a longer step towards the target node in the next
iteration. In the following paragraph, we show that the expected distance a neighbour can take
in the next iteration is a strictly increasing function of its degree.
We define the length of an edge as the Euclidean distance between the two nodes
connected by the edge. Let P(X) be the probability distribution of edge lengths. Let Yk =
Max{X1 , X2 , X3 , . . . , Xk }, where X1 , X2 , X3 , . . . , Xk are independent and identically distributed
(i.i.d.) random variables with distribution function P(X). The cumulative distribution function
of Yk is

    P[Y_k \le y] = \prod_{i=1}^{k} P[X_i \le y] = [P(X_1 \le y)]^k .

This implies

    E(Y_k) = \int_0^{\infty} \big( 1 - [P(X_1 \le y)]^k \big) \, dy .

Since P(X_1 \le y) \le 1 for all y,

    [P(X_1 \le y)]^{k_1} \ge [P(X_1 \le y)]^{k_2} \quad \text{if } k_1 \le k_2 ,

implying that

    E(Y_{k_1}) \le E(Y_{k_2}) \quad \text{if } k_1 \le k_2 .

Similarly, we can show that if P(X) is not a delta function, then

    E(Y_{k_1}) < E(Y_{k_2}) \quad \text{if } k_1 < k_2 .
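This monotonicity is easy to verify numerically. The sketch below estimates E(Yk) by Monte Carlo, assuming (purely for illustration) edge lengths uniform on [0, 1]; the ordering E(Y1) < E(Y2) < ... holds for any non-degenerate P(X):

```python
import random

def expected_max_edge_length(k, samples=20000, seed=42):
    """Monte Carlo estimate of E(Y_k), the expected maximum of k i.i.d.
    edge lengths. Uniform lengths on [0, 1] are assumed here purely for
    illustration; the monotonicity argument in the text holds for any
    non-degenerate length distribution P(X)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += max(rng.random() for _ in range(k))
    return total / samples

# E(Y_k) should increase with k: a higher-degree neighbour is expected
# to offer a longer step on the next hop.
estimates = [expected_max_edge_length(k) for k in (1, 2, 5, 10)]
assert all(a < b for a, b in zip(estimates, estimates[1:]))
```

For the uniform case the exact value is E(Yk) = k/(k + 1), so the estimates approach 1 as k grows, mirroring the advantage of high-degree neighbours.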

Now consider two neighbours n1 and n2 with degrees k1 and k2 , respectively. The expected
distance the neighbours n1 and n2 can cover in the next iteration, irrespective of the direction, is
E[Yk1 −1 ] and E[Yk2 −1 ] respectively (one edge is already used to reach the neighbour). This
implies that E[Yk1 −1 ] > E[Yk2 −1 ] if k1 > k2 . Here we approximate X1 , X2 , X3 , . . . , Xk as
independent, which is valid when the number of edges is large. Hence, if we choose a neighbour
with higher degree, then there is a greater probability of taking a longer step in the next iteration.
Thus one expects that in spatial scale-free networks an efficient algorithm should combine the
direction of travel, quantified by di , and the degree of the neighbour, ki , into one measure. Since
the units of di and ki are different, there is no trivial composition that is optimal. The aim of the
measure is to choose a neighbour with smaller di and larger ki , with the intuition that a high-degree
node should effectively decrease the distance from the target, a goal which can be achieved in
many different ways. One could give an incentive g(ki ) and then subtract it from the distance di ;
one could also divide di either by ki or by any increasing function of ki . We investigated the
following search algorithms, which cover a broad spectrum of possibilities.

1. Random walk: the node attempts to reach the target by passing the message to a randomly
selected neighbour.
2. High-degree search: the node passes the message to the neighbour with the highest degree.
The idea here is that by choosing a neighbour that is well-connected, there is a higher
probability of reaching the target node. Note that this algorithm requires the fewest number
of hops to reach the target in unstructured networks [10].
3. Greedy search: the node passes the message to the neighbour i with the smallest di .
This will ensure that the message is always going to the neighbour closest to the target
node.
4. Algorithm 4: the node passes the message to the neighbour i with the smallest measure
di − g(ki ). The function g(ki ) is an incentive for choosing a neighbour of higher degree.
Ideally, g(ki ) should be the expected maximum length of an edge from a node with
degree ki .
5. Algorithm 5: the node passes the message to the neighbour i that has the smallest measure
(di /dm )^ki , where dm is the Euclidean distance between the most spatially distant nodes in the
network, and is used for normalizing di . We assume that dm is known to all the nodes in the
network. Note that, since di /dm ≤ 1, the algorithm prefers the neighbour that has lower di and
higher ki .
6. Algorithm 6: the node passes the message to the neighbour i that has the smallest measure
di /ki . Here, again, the algorithm prefers the neighbour that has lower di and higher ki .
7. Algorithm 7: the node passes the message to the neighbour i that has the smallest measure
(di /dm )^(ln ki +1) . This is a conservative version of algorithm 5 with respect to ki .
8. Algorithm 8: the node passes the message to the neighbour i that has the smallest measure
di /(ln ki + 1). This algorithm is a weaker version of algorithm 6 with respect to ki .
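For concreteness, the measures above can be written as score functions of a neighbour's distance to the target di, its degree ki and the normalization dm; the neighbour with the smallest score receives the message. The following sketch (function and variable names are ours) also reproduces the situation of figure 1(b), where the degree-aware measures prefer the farther hub:

```python
import math

# Score functions for the decentralized algorithms described above.
# Each takes the neighbour's distance to the target d_i, its degree k_i
# and the spatial diameter d_m; the neighbour with the SMALLEST score
# receives the message.

def greedy(d_i, k_i, d_m):          # algorithm 3: distance only
    return d_i

def algorithm5(d_i, k_i, d_m):      # (d_i / d_m) ** k_i
    return (d_i / d_m) ** k_i

def algorithm6(d_i, k_i, d_m):      # d_i / k_i
    return d_i / k_i

def algorithm7(d_i, k_i, d_m):      # (d_i / d_m) ** (ln k_i + 1)
    return (d_i / d_m) ** (math.log(k_i) + 1)

def algorithm8(d_i, k_i, d_m):      # d_i / (ln k_i + 1)
    return d_i / (math.log(k_i) + 1)

def choose_neighbour(neighbours, score, d_m):
    """neighbours is a list of (node, d_i, k_i) tuples; returns the node
    with the smallest score."""
    return min(neighbours, key=lambda n: score(n[1], n[2], d_m))[0]

# A closer low-degree neighbour versus a farther hub (cf. figure 1(b)):
# greedy picks the closer node, a degree-aware measure picks the hub.
neighbours = [("node1", 100.0, 2), ("node2", 150.0, 10)]
assert choose_neighbour(neighbours, greedy, 1118.0) == "node1"
assert choose_neighbour(neighbours, algorithm6, 1118.0) == "node2"
```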

Algorithms 4 to 8 aim to capture both the direction of travel and the neighbours’ degree.
Thus, we expect these algorithms to give smaller path lengths than other algorithms. In the case of
algorithm 4, it would be extremely difficult to define a function independent of the parameters of
the network. Hence, it may not be realistic to use this form of composition for direction of travel
and degree of neighbour. Even greedy search has a slight preference for high-degree nodes, since
the probability of reaching a node with degree k is ∼ kpk [35], where pk is the fraction of nodes
with degree k. Hence, the proposed algorithms have to be extremely competitive to perform better
than greedy search. The algorithms described above are mainly based on intuition. However, as
we discuss later in the paper, the successful strategies are not restricted to these functional forms.

4. Spatial network model and search analysis

The spatial network model we consider incorporates both preferential attachment and
geographical constraints. At each step during the evolution of the spatial network model one
of the following occurs [36]:

1. with probability p, a new edge is created between two existing nodes in the network;
2. with probability 1 − p, a new node is added and connected to m existing nodes in the
network, with the constraint that multiple edges are not formed.

In both cases, the degrees of the nodes and the distances between them are considered when
forming a new edge. In the first case, two nodes i and j are selected according to

    \Pi_{ij} \propto \frac{k_i k_j}{F(d_{ij})} ,

where ki is the degree of node i, dij is the Euclidean distance between nodes i and j, and F(dij ) is
an increasing function of dij . In the second case, a new node i is uniformly and randomly placed
in an n-dimensional space and is connected to a pre-existing node j with probability

    \Pi_j \propto \frac{k_j}{F(d_{ij})} .

The above process is simulated until the number of nodes in the network is N. Let the network
generated be G(N, p, m, F, n). Here, the preferential attachment mechanism leads to a power-
law degree distribution where the exponent can be tuned by changing the value of p [36] (see
figure 2(a)). F(d) controls the truncation of the power-law decay, and if F(d) increases rapidly,
then the power-law decay regime can disappear altogether [37]. Two widely used functions for
F(d) are d^r [33] and exp(d/dchar ) [37].
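A minimal generator for this model might look as follows. This is a sketch under our own simplifications: F(d) = d^r, and the joint propensity k_i k_j /F(d_ij) is factored into a degree-weighted choice of i followed by a degree- and distance-weighted choice of j; all names and defaults are ours:

```python
import math
import random

def spatial_scale_free(N, p, m=1, r=1.0, a=1000.0, b=500.0, seed=0):
    """Grow a spatial scale-free network on an a x b plane, a sketch of
    G(N, p, m, F, 2) with F(d) = d^r. Simplifying assumption (ours): the
    joint propensity k_i*k_j/F(d_ij) is factored into a degree-weighted
    choice of i, then a degree- and distance-weighted choice of j."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, a), rng.uniform(0, b)) for _ in range(2)]
    edges = {(0, 1)}            # start from a single connected pair
    deg = [1, 1]

    def dist(i, j):
        return math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]) or 1e-9

    def pick(weights):          # roulette-wheel selection
        x, acc = rng.random() * sum(weights), 0.0
        for idx, w in enumerate(weights):
            acc += w
            if x <= acc:
                return idx
        return len(weights) - 1

    while len(pos) < N:
        if rng.random() < p and len(pos) > 2:
            # new edge between two existing nodes; no multiple edges
            i = pick(deg)
            w = [0.0 if (j == i or (min(i, j), max(i, j)) in edges)
                 else deg[j] / dist(i, j) ** r for j in range(len(pos))]
            if sum(w) == 0.0:
                continue
            j = pick(w)
            edges.add((min(i, j), max(i, j)))
            deg[i] += 1
            deg[j] += 1
        else:
            # new node at a uniformly random position, attached to m nodes
            pos.append((rng.uniform(0, a), rng.uniform(0, b)))
            i = len(pos) - 1
            deg.append(0)
            w = [deg[j] / dist(i, j) ** r for j in range(i)]
            for _ in range(m):
                j = pick(w)
                edges.add((j, i))
                deg[i] += 1
                deg[j] += 1
                w[j] = 0.0      # prevent a second edge to the same node
    return pos, edges, deg
```

With larger p more edges are placed between existing (preferentially high-degree) nodes, which steepens the degree heterogeneity, in line with the tunable exponent reported for the model.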

4.1. Simulation and analysis


We investigate the search algorithms by simulating them on the networks generated by the above
spatial network model. We generate the network on a two-dimensional grid with length a = 1000,
breadth b = 500, and m = 1 for different values of N, p, and different functions F . Once the
network is formed, we randomly choose K pairs (source and target) of nodes and simulate the
search algorithms. The source, and consecutively each node receiving the message, passes the
message to one of its neighbours, according to the search algorithm. For algorithm 4, we assume
the incentive function g(ki ) to be the expected maximum distance a node with degree ki can take
for the next hop, that is, the expected maximum length of an edge from a node with degree ki .
Empirically we found that this function follows the form c1 ln ki + c2 for all the spatial networks.
For algorithms 5 and 7, we let dm be √(a^2 + b^2 ), the largest distance between two points in the

Table 1. Comparison of search algorithms on a spatial scale-free network
of 1000 nodes in a two-dimensional space with length and breadth equal
to 1000 and 500, respectively. l is the average path length for the paths
found by the search algorithm, dpath is the average physical distance for
the paths found by each search algorithm and c is the percentage of times
a path was not found. The table summarizes the averages of l, dpath and c
obtained from 10 simulations of the network with parameters p = 0.72 and
the indicated r, for 2000 pairs. Note that the decentralized algorithms 5,
6, 7 and 8 perform as well as the shortest paths found by using global
information. Even though the greedy search performs well for the paths found
(l and dpath ), it is sometimes unable to find a path (c).

                            r = 1                   r = 2                   r = 3
                       l     dpath  c(%)       l     dpath  c(%)       l      dpath  c(%)

Random walk          41.68   10957  0        70.47   9414   0        138.07   9024   0
High-degree search   28.35    8032  0        54.85   8805   0        120.15   9848   0
Greedy search         3.37     787  0.17      3.59    600   0.83       4.53    537   2.11
Algorithm 4          10.22    2303  0.12     14.07   1987   0.46      20.08   1806   1.87
Algorithm 5           2.47     646  0         2.97    594   0          4.51    677   0.02
Algorithm 6           2.45     636  0         2.85    565   0          3.73    573   0.02
Algorithm 7           2.54     631  0         2.80    539   0          3.52    527   0.02
Algorithm 8           2.66     646  0         2.87    537   <0.01      3.54    514   0.07
Shortest path length  2.27     531  NA        2.55    435   NA         3.05    403   NA

considered space. We assume that it is sufficient if the message reaches a small neighbourhood
of the target node defined by a circle with radius D. This is a realistic assumption in many real-
world networks, e.g. it is sufficient if we reach one of the airports in the close neighbourhood of
a destination city (especially when the city has multiple airports). The search process continues
until the message reaches a neighbour of the target node or a node within a circle of radius
D = 50 centred around the target node. In order to avoid passing the message to a neighbour
that has already received the message, a list L is maintained. During the search process, if the
message reaches a node i whose neighbours are all in the list L, then the message is passed to one
of the neighbours using the same algorithm. In the case of random walk or high degree search,
the message is routed back to the previous node and this particular neighbour i is marked to note
that it cannot pass the message any further. If the number of hops exceeds N/2, then the search
process stops, noting that the path was not found. For each search algorithm, the average path
length, l, measured as the number of edges in the path, the average physical distance travelled
along the path, dpath , and the percentage of times the search algorithm is unable to find a path, c,
are computed from the search results obtained for K pairs in 10 instances of the network model.
The lower the value of l, dpath and c, the better the performance of the search algorithm. We use
the shortest average path length and average physical distance obtained by global breadth-first-
search (BFS) algorithm and Dijkstra’s algorithm [38] respectively, as a benchmark for comparing
the performance of the search algorithms.
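The simulation procedure above (the visited list L, the radius-D target neighbourhood, and the N/2 hop budget) can be sketched as follows; data-structure choices and names are ours:

```python
import math

def decentralized_search(adj, pos, s, t, score, D=50.0, max_hops=None):
    """Simulate one decentralized search, a sketch of the procedure in
    the text. adj: dict node -> list of neighbours; pos: node -> (x, y);
    score(d_i, k_i) ranks a candidate neighbour by its distance to the
    target and its degree (smaller is better). Returns the path of nodes
    the message visits, or None if the hop budget N/2 is exhausted."""
    if max_hops is None:
        max_hops = len(pos) // 2
    visited = {s}                       # the list L of nodes already seen
    path, current = [s], s
    for _ in range(max_hops):
        # stop on reaching a neighbour of t or the radius-D neighbourhood
        if t in adj[current] or math.dist(pos[current], pos[t]) <= D:
            return path
        fresh = [n for n in adj[current] if n not in visited]
        candidates = fresh or adj[current]   # all seen: pass back anyway
        current = min(candidates,
                      key=lambda n: score(math.dist(pos[n], pos[t]),
                                          len(adj[n])))
        visited.add(current)
        path.append(current)
    return None
```

On a toy chain of five nodes spaced 100 units apart, a purely greedy score (score = di) walks node by node until it reaches a neighbour of the target, illustrating the hop count used to evaluate each algorithm.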
Table 1 compares the performance of different search algorithms for the spatial network,
G(1000, 0.72, 1, d r , 2) with r = 1, 2 and 3. We find that the decentralized search algorithms 5,

Table 2. Comparison of search algorithms on spatial scale-free networks with
different parameters. l is the average path length for the paths found by each
search algorithm and c is the percentage of times a path was not found. The
table summarizes the averages of l and c obtained from 10 simulations of the
network with parameters N, p, r and dchar . Note that the decentralized
algorithms 5, 6, 7 and 8 perform as well as the shortest path found by using
global information. Even though the greedy search performs well for the paths
found (l), it is sometimes unable to find a path (c).

                       N = 1000, r = 1           p = 0.72, r = 1            N = 1000, p = 0.72
                      p = 0.30    p = 0.80      N = 500     N = 1500      dchar = 0.5  dchar = 2.0

                      l    c(%)   l    c(%)    l    c(%)    l    c(%)    l    c(%)    l    c(%)

Greedy search        6.55  7.93  2.90  0.09   4.09  0.24   3.10  0.44   3.64  0.18   3.92  0.1
Algorithm 5          3.41  0.02  2.35  0      2.83  0      2.40  0      2.46  0.03   2.55  0
Algorithm 6          3.38  0.04  2.38  0      2.81  0      2.38  0      2.49  0      2.59  0
Algorithm 7          3.59  0.19  2.40  0      2.95  0      2.43  0.01   2.66  0.02   2.78  0
Algorithm 8          4.12  0.73  2.49  <0.01  3.16  <0.01  2.54  0      2.79  0.04   3.01  0.01
Shortest path length 2.91  NA    2.16  NA     2.30  NA     2.26  NA     2.23  NA     2.23  NA

6, 7 and 8 perform as well as the shortest path obtained using global information of the network.
Specifically, the difference between the shortest path and the path obtained by algorithms 6 and
7 is less than a hop. These results are surprising because the latter algorithms use only local
information about the network, yet they perform as well as the BFS algorithm. This behaviour is
mainly due to the power-law nature of the spatial network: the few high-degree nodes allow
the algorithms to make big jumps during the search process (see table 1). This conclusion
is corroborated by the fact that an increase in r, which shrinks the power-law regime of the
degree distribution [37], induces an increase in the path length. Greedy search, which uses only
the direction of travel, is able to find short paths (compare the values of l in table 1), but for a few
node pairs it is unable to find a path (compare the values of c in table 1). Greedy search does not consider the degree
of the nodes and sometimes the algorithm gets stuck in a loop in sparsely connected regions
of the network. In the case of algorithm 4, the composition was not very effective. It is likely
that the values of the coefficients, which are difficult to compute, were not optimal. Moreover,
the optimal values are highly dependent on the parameters and the configuration of the spatial
network. Hence, it would be difficult to generalize the algorithm for all networks and we will
not consider it further in our analysis. Random-walk and high-degree search do not consider the
direction of travel and hence take an exorbitantly large number of hops. Further, we found that
the search algorithms’ performance with respect to the path length l and physical distance metric
dpath was similar. Hence, in the rest of our analysis, we do not discuss these two algorithms and
the physical distance metric since the results do not add significant new information.
Similar results are obtained for a wide range of parameters for the spatial network model.
Table 2 summarizes the results for some of these parameter values. This parameter space covers
a broad range of power-law networks with different properties. For example, as the value of p
changes from 0.3 to 0.8, the power-law exponent of the degree distribution changes from 2.4
to 1.7 (see figure 2(a)), which is the usual range of many real-world networks [1]–[4]. Hence

Figure 2. (a) Cumulative degree distribution of the networks generated by


the spatial network model for different values of p. The symbols represent
p = 0.3(•), 0.4(), 0.6(), and 0.8(). The power-law exponent of the network
can be tuned by changing the value of p. (b) Cumulative degree distribution of
the US airline network. (c) Scaling of normalized BC of a node i with its scaled
degree for the US airline network. Note that unlike random graphs, there exists
no scaling between BC and degree of the node. (d) Scaling of normalized BC of a
node i with its scaled degree for the US airline network without Alaska. Note that
there is better correlation between BC and degree of the node when compared
with the US airline network.

we can affirm that the spatial network model belongs to a general class of searchable networks.
Although we have restricted our results to a discussion of two-dimensional spatial networks, it
is easy to verify that these results will be valid for higher dimensions. Further, a large number
of decentralized search algorithms are efficient. For instance, in algorithm 6 we divide di by
ki , whereas in algorithm 8 we divide di by ln ki + 1 which scales logarithmically with ki . Both
algorithms are found to be efficient. This implies that a wide range of functions f(x) that scale
between x and ln x can be used for decentralized search. Hence, the dependence of the search
algorithms on the particular functional form is weak: the searchability of these networks lies in
their heterogeneous structure rather than in the functional form used in the search algorithm.

5. Search in the US airline network

Let us consider the US airline network, where nodes are the airports and two nodes are connected
by an edge if there is a direct flight from one airport to another. In this network, navigating along
an edge from one node to another represents flying from one airport to another. Suppose our
objective is to travel from one place to another using the US airline network. In real life, one can
obtain a choice of itineraries from the closest airport to the departure location (departure airport)
to the closest airport to the destination location (destination airport) using various sources such
as travel agents, airline offices or the World Wide Web. These sources have global information
about the network and one can choose the itinerary based on different criteria, such as travel fare,
number of stopovers, or total time of travel. Now consider a different scenario—one in which
we do not have access to the global information of the network, and each airport has only local
information. In other words, each airport has information about the location of the airports it can
fly to and how well these neighbouring airports are connected (their degree). We do know the
location of the departure airport and the destination airport. The objective is to find a path with
the fewest stopovers from the departure airport to the destination. From the departure airport,
and consecutively from each intermediate airport, we choose to fly to one of its neighbours based
on the degree of the neighbouring airport, its location and the location of the destination airport.
This process continues until we reach the destination airport or any other airport within a small
neighbourhood of the destination airport. In real life, it is sufficient if we reach one of the airports
near the destination airport. For example, it is sufficient to reach LaGuardia Airport (LGA),
New York City if the objective is to reach John F Kennedy International Airport (JFK),
New York City. In our study, as a first-order approximation, we do not consider the type of
airline or the travel fare as parameters. Even though this method of travel is unrealistic, it
provides insight into the performance of decentralized search algorithms on real-world networks.

5.1. Properties of the US airline network


The Bureau of Transportation Statistics [39] has a well-documented database on the departure
schedule, number of passengers, flight type, etc, for all the flights in the USA. We considered
the data collected for the service class F (scheduled passenger service) flights during the month
of January 2006 to form the US airline network. Each airport is represented as a node and a
direct flight connection from one airport to another is depicted as a directed edge. We filtered the
data to remove the anomalous edges formed due to redirected flights caused by environmental
disturbances or random failures. Further, one would expect to have a flight from airport A to
airport B if there is one from B to A; but for a small number of instances this was not true. To
simplify the analysis, we added edges to make the network undirected.
After filtering the data, the airline network had 710 nodes and 3414 edges. The number of
nodes and edges in the largest connected component (LCC) were 690 and 3412 respectively. The
rest of the analysis in the paper considers only the LCC of the network. Not surprisingly, the
properties of the US airline network are very similar to the properties of the world wide airline
network (WWN) [7]. The average path length for the airline network, which is the average
minimum number of flights one has to take to go from one airport to any other, is 3.6. The
clustering coefficient, which quantifies local order of the network measured in terms of the
number of triangles (3-cliques) present, is 0.41. Hence, the US airline network is also a small-
world network [5]. The degree distribution of the network follows a power law p(k) ∼ k^−γ with
exponent γ = 1.9 ± 0.1 (see figure 2(b)), which is close to the exponent of the WWN, 2.0 ± 0.1
[7]. Further, as observed in the WWN, we find that the most connected airports are not necessarily
the most central airports. Figure 2(c) plots the normalized betweenness centrality (BC) of a node
i, bi /⟨b⟩, where ⟨b⟩ is the average BC of the network, versus its scaled degree ki /⟨k⟩, where
⟨k⟩ is the average degree of the network. The geopolitical considerations used to explain this
phenomenon in the WWN [34] do not apply to the US airline network, as it belongs to a single
country. In fact, this behaviour is due to Alaska, which contains a significant percentage of the
airports (255 of 690, close to 37%), yet only a few (around 6) are connected to airports outside
of Alaska. For instance, the BC of Anchorage, Alaska is significantly higher than its degree
(see figure 2(c)). If we remove the Alaska airports from the network, then we observe better
correlation between the degree of a node and its BC (see figure 2(d)).
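The BC-versus-degree comparison of figure 2(c) can be reproduced with Brandes' algorithm for betweenness centrality in unweighted graphs. The sketch below is our own minimal, unnormalized implementation; the toy "gateway" graph illustrates how, as with Anchorage, a low-degree node can carry a disproportionate share of shortest paths:

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality via Brandes' algorithm for an
    unweighted graph given as dict node -> list of neighbours. Each
    ordered source-target pair is counted, so undirected BC values are
    doubled; this does not affect relative comparisons."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:                                # BFS from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                   # dependency accumulation
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Toy 'gateway' graph: hub h serves five spokes, centre c serves two,
# and gateway g (degree 2) is the only link between the two regions.
adj = {'c': ['l1', 'l2', 'g'], 'l1': ['c'], 'l2': ['c'],
       'g': ['c', 'h'],
       'h': ['g', 'm1', 'm2', 'm3', 'm4', 'm5'],
       'm1': ['h'], 'm2': ['h'], 'm3': ['h'], 'm4': ['h'], 'm5': ['h']}
bc = betweenness(adj)
# Like Anchorage, g has a lower degree than c but a higher BC.
assert len(adj['g']) < len(adj['c']) and bc['g'] > bc['c']
```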
If an area is separated from the US mainland (such as Alaska and Hawaii), then very few
airports connect it to the mainland and it may be difficult for search algorithms to capture these
connections between the mainland and the other areas. To investigate the effects of this property
on the search process, we simulate the algorithms on three different networks, namely, the US
airline network, the US airline network without Alaska and the US mainland airline network
without Alaska, Hawaii, Puerto Rico, the US Virgin Islands and the US Pacific Trust Territories
and Possessions (US mainland network). The latter two networks have statistical properties
similar to those of the US airline network. The US airline network without Alaska has 459 nodes
and 2857 edges with 455 nodes and 2856 edges in the LCC; the US mainland network has 431
nodes and 2729 edges with 427 nodes and 2728 edges in the LCC.

5.2. Search results and analysis


We simulated the search algorithms for all N(N − 1) ordered pairs in each network, where N is the
number of nodes. The US airline network, the US airline network without Alaska, and the US
mainland network had 475 410, 206 570, and 181 902 pairs respectively. We chose dm to be the
largest distance between two airports in the network and the neighbourhood distance D to be
100 miles. Table 3 summarizes the results obtained by each search algorithm. l is the average
path length obtained for the paths found by the search algorithm, and c is the number of times
the search algorithm was unable to find a path. The results are similar to the results obtained
for the spatial scale-free network model. Algorithms 6, 7 and 8 are able to find paths as short as
the paths obtained by the BFS algorithm. Again, greedy search is able to give short paths when
it is able to find paths, but there were instances in which it was unable to find any path. In the
case of the US airline network without Alaska and the US mainland network, the performance
of the search algorithms is even better, especially for algorithm 5, which did not perform well on
the complete US airline network. Figure 3 visualizes the paths obtained in a characteristic case
in which greedy search takes a higher number of hops. Often, greedy search reaches nodes that
are near the destination but are poorly connected, and consequently travels many hops within
that region before reaching the destination. The proposed search algorithms avoid these poorly
connected nodes and reach the destination in fewer hops.

Table 3. Comparison of search algorithms on the US airline network, the US
network without Alaska, and the US mainland network. l is the average path
length for the paths found by the search algorithms and c is the number of
times a path was not found. The table summarizes the averages of l and c
obtained for all the possible pairs in the network. In the US airline network,
algorithms 6, 7 and 8 give paths close to the shortest path length. In the
other two networks, algorithms 5, 6, 7 and 8 give short paths. Here again,
the greedy search performs well for the paths found (l) but it is sometimes
unable to find a path (c).

                      US airline network        US network without Alaska    US mainland network
                      (N = 690, 475 410 pairs)  (N = 455, 206 570 pairs)     (N = 427, 181 902 pairs)

                       l     c                   l     c                      l     c

Greedy search         3.93   16806 (3.54%)      2.83   4015 (1.94%)          2.74   3729 (2.05%)
Algorithm 5           5.53   13870 (2.92%)      3.75    456 (0.22%)          2.85    425 (0.23%)
Algorithm 6           4.01     752 (0.16%)      3.17    454 (0.22%)          2.68    425 (0.23%)
Algorithm 7           3.37     688 (0.14%)      2.68    453 (0.22%)          2.93      1 (<0.01%)
Algorithm 8           3.37      41 (<0.01%)     2.76     38 (0.02%)          2.75     39 (0.02%)
Shortest path length  3.02     NA               2.39     NA                  2.32     NA

When we looked at the search results in more detail we found a few more interesting
behaviours. The greedy search and algorithm 5 were unable to find paths for approximately the
same number of pairs in the US airline network (3.54% in the case of the former and 2.92%
for the latter). However, there is a difference in the type of paths these search algorithms could
not find. The paths not found by greedy search were distributed uniformly for all departure and
destination nodes; the paths not found by algorithm 5 were due predominantly to the 18 airports
in Alaska, which were unreachable, almost regardless of the starting point. It was interesting
to see that even if we start from Anchorage International Airport (ANC), the most connected
airport in Alaska, these airports were not reachable. This is mainly due to the high affinity of
algorithm 5 for high-degree nodes. The degree of neighbours of ANC which are in Alaska is
small compared to the degree of neighbours on the US mainland. Hence, when we start from
an airport, the algorithm was able to reach Anchorage but afterward selected one of the highly-
connected airports on the US mainland. From that point on, it is difficult to return to Alaska,
since the search algorithm is self-avoiding and since the only other airport that flies to Alaska,
excluding ANC, is Seattle-Tacoma International Airport (SEA). The US airline network without
Alaska and the US mainland network do not have these constraints, and hence algorithm 5 was
able to perform better.
Among the 475 410 pairs of source and destination nodes searched, algorithms 6 and 7
could not reach the destination node 752 and 688 times, respectively. Again, it turns out that
the failure to reach the destination was mainly due to a particular airport, namely, Havre City-
County Airport (HVR) in Montana. Similar behaviour was observed for these algorithms in the
US airline network without Alaska and the US mainland network. HVR is a single-degree node
that is connected to Lewistown Airport (LWT), Montana and the only other airport to which LWT
is connected is Billings Logan International Airport (BIL), Montana which is a well-connected
airport. Hence, the only way to reach HVR would be to reach BIL first and then to fly to LWT.
Unfortunately, none of the algorithms, other than the greedy search, can choose LWT from BIL


Figure 3. Visualization of the paths obtained in a characteristic case when greedy search takes a higher number of hops. In this case, the departure airport is State College, PA (node 1) and the destination airport is Laredo, Texas (node 9). The airline codes and degrees corresponding to the nodes are: 1, SCE, degree 5; 2, CVG, degree 118; 3, SAT, degree 29; 4, HRL, degree 6; 5, CRP, degree 5; 6, HOU, degree 31; 7, AUS, degree 34; 8, IAH, degree 118; 9, LRD, degree 2. The path obtained by the greedy search is 1→2→3→4→5→6→7→8→9 and by algorithms 5, 6 and 7 it is 1→2→8→9. Algorithm 8, not shown on the map, takes 4 hops (1→2→3→8→9). Often the greedy search reaches nodes which are near the destination node but are not well-connected; hence, it ends up travelling many hops within that region before it reaches the destination. In contrast, the proposed search algorithms avoid the low-connected nodes and reach the destination node in fewer hops.

when the destination is HVR. Here again, even though the algorithms 5, 6, 7 and 8 are able
to reach BIL, they do not choose LWT as the first choice. Moreover, once they fly out of BIL,
they take many hops to reach BIL again due to the self-avoiding nature of the algorithms. For
instance, when the destination is HVR, algorithms 7 and 8 take, on average, only 2.5 and 3.44
hops respectively to reach BIL. However, to reach HVR they take around 170 and 102 hops,
respectively. The reason why this behaviour is not observed for other single-degree nodes in the
US mainland network is that single-degree nodes are usually connected to high-degree nodes.
The average degree of the neighbours of the single-degree nodes was found to be 82.86, which is
significantly higher than the average degree in the network (12.78). In addition, the only airport
(LWT) that flies to HVR (or to a neighbourhood of HVR) is not chosen by the only other airport
(BIL) that can fly to LWT.
Table 4 gives the percentage of times the path length found by the search algorithms is the
same as the shortest path length. In approximately 90% of the pairs, the path length found by
algorithms 6, 7 and 8 was the same as the shortest path length. Further, in 97% of the pairs,
the path length found was more than the shortest path by a maximum of two hops. Given that


Table 4. Comparison of search algorithms on the US airline network, the US network without Alaska, and the US mainland network. ‘Diff = 0’ is the percentage of pairs for which the path length found by the search algorithms is the same as the shortest path length. Algorithms 6, 7 and 8 are able to find the shortest paths in about 90% of the pairs. ‘Diff ≤ 2’ is the percentage of pairs for which the path length found was more than the shortest path by a maximum of two hops. Given that the search algorithms use only local information, these results on the US airline network are quite fascinating.

                US airline network           US network without Alaska    US mainland network

                Diff = 0 (%)  Diff ≤ 2 (%)   Diff = 0 (%)  Diff ≤ 2 (%)   Diff = 0 (%)  Diff ≤ 2 (%)

Greedy search   66.3          85.8           75.3          92.3           75.8          92.7
Algorithm 5     66.9          72.1           88.2          93.7           90.8          96.0
Algorithm 6     88.8          96.6           90.8          95.6           92.2          96.8
Algorithm 7     91.3          98.0           92.0          97.6           92.4          98.1
Algorithm 8     88.4          97.5           89.5          97.8           89.0          97.6

the search algorithms use only local information, these results on the airline networks are quite
fascinating. Note that this behaviour is due mainly to the inherent structure of the US airline
network, which can be considered a ‘searchable network’.

6. Conclusions and discussion

In this paper, we studied decentralized search in spatial scale-free networks. We proposed different search algorithms that combine the direction of travel and the degree of the neighbour, and illustrated that some of these algorithms can find short paths using local information alone. We demonstrated that a parameterized family of spatial network models belongs to a class of searchable networks for a wide range of the parameter space. Further, we tested these algorithms on the US airline network. Surprisingly, we found that one can travel from one place to another in fewer than four hops, on average, while using only local information. This implies that searchability is a generic property of the US airline network, as is also the case for social networks.
In addition, the spatial network model and the airline network are searchable for a wide
range of search algorithms. For example, algorithms 6 and 8 are both able to find short paths in
these networks. Hence, any search algorithm with a function f(x) that scales between x and ln x
should give short paths. Moreover, the algorithms can be extended to other power-law networks if
we can embed the network in an n-dimensional metric space in which nodes are connected based
on the metric distance. The algorithms are relevant to other networks such as the Internet and road
networks. As demonstrated in [33], the Internet can be described by the family of spatial network
models considered in this paper and hence we expect that these search algorithms can find short
paths in the Internet. However, road networks do not follow a power-law degree distribution.
Investigating the algorithms on the dual form of the road networks, which do exhibit scale-free
properties [40], is a topic of future work.
We notice that algorithm 8, the most conservative with respect to degree, performs the best
in the US airline network. This implies that direction plays the most important role in efficient


searching, and even a slight blending of direction with degree is sufficient to drastically improve the efficiency of search algorithms. In other words, a search algorithm that traverses based on direction and cautiously avoids low-degree nodes should give short paths. However, as observed with algorithm 5, a strong preference for degree may sometimes lead the algorithm to nodes far away from the destination node. Further, we can conclude that searchability is a property of the network rather than of the functional forms used for the search algorithm.
The difference between the results obtained on the US airline network and the US mainland network is not significant (especially for algorithms 7 and 8). This implies that the results can probably be extended to the WWN [7], which has a very similar structure to the US airline network. The US airline network contains separated areas that are connected to the mainland by only a few airports. Algorithms 7 and 8 are able to capture these connections in order to travel from one separated area to another. The WWN will have many more of these separated areas, which are well-connected locally but sparsely inter-connected. We feel that algorithms 7 and 8 would be able to find short paths in the WWN; verification would be subject to the availability of data on the WWN.
The results obtained for the US airline network are perhaps intuitive: in real life, a traveller with only local information can usually find a short path, if not always the shortest one. The significance of the results lies in capturing this phenomenon/intuition in an algorithm. Clearly, the structure of the network facilitates its searchability. The results presented in this paper support the hypothesis, conjectured by others [10, 21], that many real-world networks evolve to inherently facilitate decentralized search. Furthermore, these results provide insights for designing the structure of decentralized networks that need effective search algorithms.

Acknowledgments

The authors would like to acknowledge the National Science Foundation (grants DMI 0537992
and CCF 0643529) for making this work feasible. Any opinions, findings and conclusions or
recommendations expressed in this material are those of the author(s) and do not necessarily
reflect the views of the National Science Foundation.

References

[1] Albert R and Barabási A L 2002 Rev. Mod. Phys. 74 47


[2] Boccaletti S, Latora V, Moreno Y, Chavez M and Hwang D U 2006 Phys. Rep. 424 175
[3] Dorogovtsev S N and Mendes J F F 2002 Adv. Phys. 51 1079
[4] Newman M E J 2003 SIAM Rev. 45 167
[5] Watts D J and Strogatz S H 1998 Nature 393 440
[6] Ravasz E, Somera A L, Mongru D A, Oltvai Z N and Barabási A L 2002 Science 297 1551
[7] Guimera R, Mossa S, Turtschi A and Amaral L A N 2005 Proc. Natl Acad. Sci. 102 7794
[8] Albert R, Jeong H and Barabási A L 2000 Nature 406 378
[9] Thadakamalla H P, Raghavan U N, Kumara S R T and Albert R 2004 IEEE Intell. Syst. 19 24
[10] Adamic L A, Lukose R M, Puniyani A R and Huberman B A 2001 Phys. Rev. E 64 046135
[11] Thadakamalla H P, Albert R and Kumara S R T 2005 Phys. Rev. E 72 066128
[12] Pastor-Satorras R and Vespignani A 2001 Phys. Rev. Lett. 86 3200
[13] Milgram S 1967 Psychol. Today 2 60
[14] Kleinberg J 2000 Nature 406 845


[15] Kleinberg J 2000 Proc. 32nd ACM Symp. Theor. Comput. pp 163–70
[16] Kleinberg J 2001 Adv. Neural Inform. Process. Syst. 14 431
[17] Dodds P, Muhamad R and Watts D J 2003 Science 301 827
[18] Watts D J, Dodds P S and Newman M E J 2002 Science 296 1302
[19] Kim B J, Yoon C N, Han S K and Jeong H 2002 Phys. Rev. E 65 027103
[20] Arenas A, Cabrales A, Diaz-Guilera A, Guimera R and Vega F 2003 Statistical mechanics of complex networks
(Berlin: Springer) chapter ‘Search and Congestion in Complex Networks’ pp 175–94
[21] Kleinberg J 2006 Proc. Int. Cong. Math. 3 1019
[22] Liben-Nowell D, Novak J, Kumar R, Raghavan P and Tomkins A 2005 Proc. Natl Acad. Sci. 102 11623
[23] Menczer F 2002 Proc. Natl Acad. Sci. 99 14014
[24] Sandberg O 2006 Proc. 8th Workshop on Algorithm engineering and experiments (ALENEX) pp 144–55
[25] Simsek O and Jensen D 2005 Proc. 19th Int. Joint Conf. Artificial Intell. pp 304–10
[26] Zhang H, Goel A and Govindan R 2004 Comput. Netw. 46 555
[27] Akyildiz I F, Su W, Sankarasubramaniam Y and Cayirci E 2002 Comput. Netw. 38 393
[28] Raghavan U N and Kumara S R T 2007 Int. J. Sensor Netw. 2 201
[29] Kan G 2001 Peer-to-Peer Harnessing the Power of Disruptive Technologies (Beijing: O’Reilly) chapter
‘Gnutella’
[30] Chakrabarti S, van den Berg M and Dom B 1999 Comput. Netw. 31 1623
[31] Clauset A and Moore C 2003 Preprint cond-mat/0309415
[32] Barabási A L and Albert R 1999 Science 286 509
[33] Yook S H, Jeong H and Barabási A L 2002 Proc. Natl Acad. Sci. 99 13382
[34] Guimera R and Amaral L A N 2004 Eur. Phys. J. B 38 381
[35] Newman M E J, Strogatz S H and Watts D J 2001 Phys. Rev. E 64 026118
[36] Dorogovtsev S and Mendes J F F 2000 Europhys. Lett. 52 33
[37] Barthélemy M 2003 Europhys. Lett. 63 915
[38] Cormen T H, Leiserson C E, Rivest R L and Stein C 2001 Introduction to Algorithms 2nd edn (Cambridge:
MIT Press)
[39] The Bureau of Transportation Statistics online at http://www.transtats.bts.gov/ (date accessed: 20 July 2006)
[40] Kalapala V, Sanwalani V, Clauset A and Moore C 2006 Phys. Rev. E 73 026130



PHYSICAL REVIEW E 76, 036106 (2007)

Near linear time algorithm to detect community structures in large-scale networks

Usha Nandini Raghavan,1 Réka Albert,2 and Soundar Kumara1
1 Department of Industrial Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
2 Department of Physics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
(Received 9 April 2007; published 11 September 2007)

Community detection and analysis is an important methodology for understanding the organization of various real-world networks and has applications in problems as diverse as consensus formation in social communities or the identification of functional modules in biochemical networks. Currently used algorithms that identify the community structures in large-scale real-world networks require a priori information such as the number and sizes of communities or are computationally expensive. In this paper we investigate a simple label propagation algorithm that uses the network structure alone as its guide and requires neither optimization of a predefined objective function nor prior information about the communities. In our algorithm every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. In this iterative process densely connected groups of nodes form a consensus on a unique label to form communities. We validate the algorithm by applying it to networks whose community structures are known. We also demonstrate that the algorithm takes an almost linear time and hence it is computationally less expensive than what was possible so far.

DOI: 10.1103/PhysRevE.76.036106    PACS number(s): 89.75.Fb, 89.75.Hc, 87.23.Ge, 02.10.Ox
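The update rule summarized in this abstract can be sketched directly. The snippet below is a minimal illustration, not the authors' reference implementation: labels are updated asynchronously in random order, ties are broken uniformly at random, and the process stops once every node's label is among the most frequent labels in its neighbourhood. The two-triangle test graph is invented for illustration.

```python
import random
from collections import Counter

def label_propagation(adj, seed=0):
    """Group nodes by iterative majority-label adoption.

    Every node starts with its own unique label; nodes repeatedly adopt
    a label held by the maximum number of their neighbours until no node
    needs to change. Returns the final node -> label mapping.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}            # unique initial labels
    nodes = list(adj)
    changed = True
    while changed:
        changed = False
        rng.shuffle(nodes)                  # asynchronous, random order
        for v in nodes:
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            top = [lab for lab, c in counts.items() if c == best]
            if labels[v] not in top:        # adopt a current majority label
                labels[v] = rng.choice(top)
                changed = True
    return labels

# Two triangles joined by a single bridge edge (2-3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(adj)
groups = {}
for v, lab in labels.items():
    groups.setdefault(lab, set()).add(v)
print(sorted(sorted(g) for g in groups.values()))
```

Each label change strictly increases the number of edges whose endpoints agree, so the loop terminates; at convergence each densely connected triangle necessarily carries a single label, although on some runs the whole graph may settle on one label.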

I. INTRODUCTION

A wide variety of complex systems can be represented as networks. For example, the World Wide Web (WWW) is a network of web pages interconnected by hyperlinks; social networks are represented by people as nodes and their relationships by edges; and biological networks are usually represented by biochemical molecules as nodes and the reactions between them by edges. Most of the research in the recent past focused on understanding the evolution and organization of such networks and the effect of network topology on the dynamics and behaviors of the system [1–4]. Finding community structures in networks is another step toward understanding the complex systems they represent.

A community in a network is a group of nodes that are similar to each other and dissimilar from the rest of the network. It is usually thought of as a group where nodes are densely interconnected and sparsely connected to other parts of the network [4–6]. There is no universally accepted definition for a community, but it is well known that most real-world networks display community structures. There has been a lot of effort recently in defining, detecting, and identifying communities in real-world networks [5,7–15]. The goal of a community detection algorithm is to find groups of nodes of interest in a given network. For example, a community in the WWW network indicates a similarity among nodes in the group. Hence if we know the information provided by a small number of web pages, then it can be extrapolated to other web pages in the same community. Communities in social networks can provide insights about common characteristics or beliefs among people that make them different from other communities. In biomolecular interaction networks, segregating nodes into functional modules can help identify the roles or functions of individual molecules [10]. Further, in many large-scale real-world networks, communities can have distinct properties which are lost in their combined analysis [1].

Community detection is similar to the well studied network partitioning problems [16–18]. The network partitioning problem is in general defined as the partitioning of a network into c (a fixed constant) groups of approximately equal sizes, minimizing the number of edges between groups. This problem is NP-hard and efficient heuristic methods have been developed over the years to solve it [16–20]. Much of this work is motivated by engineering applications, including very large scale integrated (VLSI) circuit layout design and the mapping of parallel computations. Thompson [21] showed that one of the important factors affecting the minimum layout area of a given circuit in a chip is its bisection width. Also, to enhance the performance of a computational algorithm, where nodes represent computations and edges represent communications, the nodes are divided equally among the processors so that the communications between them are minimized.

The goal of a network partitioning algorithm is to divide any given network into approximately equal size groups irrespective of node similarities. Community detection, on the other hand, finds groups that either have an inherent or an externally specified notion of similarity among nodes within groups. Furthermore, the number of communities in a network and their sizes are not known beforehand and they are established by the community detection algorithm.

Many algorithms have been proposed to find community structures in networks. Hierarchical methods divide networks into communities, successively, based on a dissimilarity measure, leading to a series of partitions from the entire network to singleton communities [5,15]. Similarly one can also successively group together smaller communities based on a similarity measure, leading again to a series of partitions [22,23]. Due to the wide range of partitions, structural indices that measure the strength of community structures are used in determining the most relevant ones. Simulation based methods are also often used to find partitions with a strong community structure [10,24]. Spectral [17,25] and flow maximization (cut minimization) methods [9,26] have been

1539-3755/2007/76(3)/036106(11)    036106-1    © 2007 The American Physical Society


RAGHAVAN, ALBERT, AND KUMARA PHYSICAL REVIEW E 76, 036106 共2007兲

successfully used in dividing networks into two or more a measure that can quantify the strength of a community
communities. obtained. One of the ways to measure the strength of a com-
In this paper, we propose a localized community detection munity is by comparing the density of edges observed within
algorithm based on label propagation. Each node is initial- the community with the density of edges in the network as a
ized with a unique label and at every iteration of the algo- whole 关6兴. If the number of edges observed within a commu-
rithm, each node adopts a label that a maximum number of nity U is eU, then under the assumption that the edges in the
its neighbors have, with ties broken uniformly randomly. As network are uniformly distributed among pairs of nodes, we
the labels propagate through the network in this manner, can calculate the probability P that the expected number of
densely connected groups of nodes form a consensus on their edges within U is larger than eU. If P is small, then the
labels. At the end of the algorithm, nodes having the same observed density in the community is greater than the ex-
labels are grouped together as communities. As we will pected value. A similar definition was recently adopted by
show, the advantage of this algorithm over the other methods Newman 关13兴, where the comparison is between the ob-
is its simplicity and time efficiency. The algorithm uses the served density of edges within communities and the expected
network structure to guide its progress and does not optimize density of edges within the same communities in randomized
any specific chosen measure of community strengths. Fur- networks that nevertheless maintain every node’s degree.
thermore, the number of communities and their sizes are not This was termed the modularity measure Q, where Q
known a priori and are determined at the end of the algo- = 兺i共eii − a2i 兲 , ∀ i. eii is the observed fraction of edges
rithm. We will show that the community structures obtained within group i and a2i is the expected fraction of edges within
by applying the algorithm on previously considered net- the same group i. Note that if eij is the fraction of edges in
works, such as Zachary’s karate club friendship network and the network that run between group i and group j, then ai
the U.S. college football network, are in agreement with the = 兺 jeij. Q = 0 implies that the density of edges within groups
actual communities present in these networks. in a given partition is no more than what would be expected
by a random chance. Q closer to 1 indicates stronger com-
munity structures.
II. DEFINITIONS AND PREVIOUS WORK
Given a network with n nodes and m edges N共n , m兲, any
As mentioned earlier, there is no unique definition of a community detection algorithm finds subgroups of nodes.
community. One of the simplest definitions of a community Let C1 , C2 , . . . , C p be the communities found. In most algo-
is a clique, that is, a group of nodes where there is an edge rithms, the communities found satisfy the following con-
between every pair of nodes. Cliques capture the intuitive straints: 共i兲 Ci 艚 C j = 쏗 for i ⫽ j and 共ii兲 艛iCi spans the node
notion of a community 关6兴 where every node is related to set in N.
every other node and hence have strong similarities with A notable exception is Palla et al. 关14兴 who define com-
each other. An extension of this definition was used by Palla munities as a chain of adjacent k cliques and allow commu-
et al. in 关14兴, who define a community as a chain of adjacent nity overlaps. It takes exponential time to find all such com-
cliques. They define two k cliques 共cliques on k nodes兲 to be munities in the network. They use these sets to study the
adjacent if they share k − 1 nodes. These definitions are strict overlapping structure of communities in social and biological
in the sense that the absence of even one edge implies that a networks. By forming another network where a community
clique 共and hence the community兲 no longer exists. k clans is represented by a node and edges between nodes indicates
and k clubs are more relaxed definitions while still maintain- the presence of overlap, they show that such networks are
ing a high density of edges within communities 关14兴. A group also heterogeneous 共fat-tailed兲 in their node degree distribu-
of nodes is said to form a k clan if the shortest path length tions. Furthermore, if a community has overlapping regions
between any pair of nodes, or the diameter of the group, is at with two other communities, then the neighboring communi-
most k. Here the shortest path only uses the nodes within the ties are also highly likely to overlap.
group. A k club is defined similarly, except that the subnet- The number of different partitions of a network N共n , m兲
work induced by the group of nodes is a maximal subgraph into just two disjoint subsets is 2n and increases exponen-
of diameter k in the network. tially with n. Hence we need a quick way to find only rel-
Definitions based on degrees 共number of edges兲 of nodes evant partitions. Girvan and Newman 关5兴 proposed a divisive
within the group relative to their degrees outside the group algorithm based on the concept of edge betweenness central-
were given by Radicchi et al. 关15兴. If din i and di
out
are the ity, that is, the number of shortest paths among all pairs of
degrees of node i within and outside of its group U, then U is nodes in the network passing through that edge. The main
said to form a strong community if din i ⬎ di , ∀ i 僆 U. If
out
idea here is that edges that run between communities have
兺i僆Udi ⬎ 兺i僆Udi , then U is a community in the weak
in out higher betweenness values than those that lie within commu-
sense. Other definitions based on degrees of nodes can be nities. By successively recalculating and removing edges
found in 关6兴. with highest betweenness values, the network breaks down
There can exist many different partitions of nodes in the into disjoint connected components. The algorithm continues
network that satisfy a given definition of community. In most until all edges are removed from the network. Each step of
cases 关4,22,26–28兴, the groups of nodes found by a commu- the algorithm takes O共mn兲 time and since there are m edges
nity detection algorithm are assumed to be communities ir- to be removed, the worst case running time is O共m2n兲. As the
respective of whether they satisfy a specific definition or not. algorithm proceeds one can construct a dendrogram 共see
To find the best community structures among them we need Fig. 1兲 depicting the breaking down of the network into dis-

036106-2
NEAR LINEAR TIME ALGORITHM TO DETECT ... PHYSICAL REVIEW E 76, 036106 共2007兲

Label flooding algorithms have also been used in detect-


ing communities in networks 关27,28兴. In 关27兴, the authors
propose a local community detection method where a node is
initialized with a label which then propagates step by step
via the neighbors until it reaches the end of the community,
where the number of edges proceeding outward from the
community drops below a threshold value. After finding the
local communities at all nodes in the network, an n ⫻ n ma-
trix is formed, where the ijth entry is 1 if node j belongs to
the community started from i and 0 otherwise. The rows of
the matrix are then rearranged such that the similar ones are
closer to each other. Then, starting from the first row they
successively include all the rows into a community until the
distance between two successive rows is large and above a
FIG. 1. An illustration of a dendrogram which is a tree repre- threshold value. After this a new community is formed and
sentation of the order in which nodes are segregated into different the process is continued. Forming the rows of the matrix and
groups or communities. rearranging them requires O共n3兲 time and hence the algo-
rithm is time-consuming.
joint connected components. Hence for any given h such that
1 ⱕ h ⱕ n, at most one partition of the network into h disjoint Wu and Huberman 关26兴 propose a linear time 关O共m + n兲兴
subgroups is found. All such partitions in the dendrogram are algorithm that can divide a given network into two commu-
depicted, irrespective of whether or not the subgroups in nities. Suppose that one can find two nodes 共x and y兲 that
each partition represent a community. Radicchi et al. 关15兴 belong to two different communities, then they are initialized
propose another divisive algorithm where the dendrograms with values 1 and 0, respectively. All other nodes are initial-
are modified to reflect only those groups that satisfy a spe- ized with value 0. Then at each step of the algorithm, all
cific definition of a community. Further, instead of edge be- nodes 共except x and y兲 update their values as follows. If
tweenness centrality, they use a local measure called edge z1 , z2 , . . . , zk are neighbors of a node z, then the value Vz is
Vz +Vz +¯+Vz
clustering coefficient as a criterion for removing edges. The updated as 1 2k k
. This process continues until conver-
edge clustering coefficient is defined as the fraction of num- gence. The authors show that the iterative procedure con-
ber of triangles a given edge participates in, to the total number of possible such triangles. The clustering coefficient of an edge is expected to be the least for those running between communities and hence the algorithm proceeds by removing edges with low clustering coefficients. The total running time of this divisive algorithm is O(mn²).

Similarly one can also define a topological similarity between nodes and perform an agglomerative hierarchical clustering [23,29]. In this case, we begin with nodes in n different communities and group together communities that are the most similar. Newman [22] proposed an amalgamation method (similar to agglomerative methods) using the modularity measure Q, where at each step those two communities are grouped together that give rise to the maximum increase or smallest decrease in Q. This process can also be represented as a dendrogram and one can cut across the dendrogram to find the partition corresponding to the maximum value of Q (see Fig. 1). At each step of the algorithm one compares at most m pairs of groups and requires at most O(n) time to update the Q value. The algorithm continues until all the n nodes are in one group and hence the worst case running time of the algorithm is O[n(m + n)]. The algorithm of Clauset et al. [30] is an adaptation of this agglomerative hierarchical method, but uses a clever data structure to store and retrieve information required to update Q. In effect, they reduce the time complexity of the algorithm to O(md log n), where d is the depth of the dendrogram obtained. In networks that have a hierarchical structure with communities at many scales, d ~ log n. There have also been other heuristic and simulation based methods that find partitions of a given network maximizing the modularity measure Q [10,24].

verges to a unique value, and the convergence of the algorithm does not depend on the size n of the network. Once the required convergence is obtained, the values are sorted between 0 and 1. Going through the spectrum of values in descending order, there will be a sudden drop at the border of two communities. This gap is used in identifying the two communities in the network. A similar approach was used by Flake et al. [9] to find the communities in the WWW network. Here, given a small set of nodes (source nodes), they form a network of web pages that are within a bounded distance from the sources. Then by designating (or artificially introducing) sink nodes, they solve for the maximum flow from the sources to the sinks. In doing so one can then find the minimum cut corresponding to the maximum flow. The connected component of the network containing the source nodes after the removal of the cut set is then the required community.

Spectral bisection methods [25] have been used extensively to divide a network into two groups so that the number of edges between groups is minimized. Eigenvectors of the Laplacian matrix (L) of a given network are used in the bisection process. It can be shown that L has only real non-negative eigenvalues (0 ≤ λ1 ≤ λ2 ≤ ⋯ ≤ λn) and minimizing the number of edges between groups is the same as minimizing the positive linear combination M = Σ_i s_i² λ_i, where s_i = u_i^T z and u_i is the eigenvector of L corresponding to λ_i. z is the decision vector whose ith entry can be either 1 or −1 denoting to which of the two groups node i belongs. To minimize M, z is chosen as parallel as possible to the eigenvector corresponding to the second smallest eigenvalue. (The smallest eigenvalue is 0 and choosing z parallel to the corre-

036106-3
RAGHAVAN, ALBERT, AND KUMARA PHYSICAL REVIEW E 76, 036106 共2007兲

sponding eigenvector gives a trivial solution.) This bisection method has been extended to finding communities in networks that maximize the modularity measure Q [25]. Q can be written as a positive linear combination of eigenvalues of the matrix B, where B is defined as the difference of the two matrices A and P. A_ij is the observed number of edges between nodes i and j and P_ij is the expected number of edges between i and j if the edges fall randomly between nodes, while maintaining the degree of each node. Since Q has to be maximized, z is chosen as parallel as possible to the eigenvector corresponding to the largest eigenvalue.

FIG. 2. Nodes are updated one by one as we move from left to right. Due to a high density of edges (highest possible in this case), all nodes acquire the same label.

FIG. 3. An example of a bipartite network in which the label sets of the two parts are disjoint. In this case, due to the choices made by the nodes at step t, the labels on the nodes oscillate between a and b.

Since many real-world complex networks are large in size, time efficiency of the community detection algorithm is an important consideration. When no a priori information is available about the likely communities in a given network, finding partitions that optimize a chosen measure of community strength is normally used. Our goal in this paper is to develop a simple time-efficient algorithm that requires no prior information (such as number, sizes, or central nodes of the communities) and uses only the network structure to guide the community detection. The proposed mechanism for such an algorithm, which does not optimize any specific measure or function, is detailed in the following section.

III. COMMUNITY DETECTION USING LABEL PROPAGATION

The main idea behind our label propagation algorithm is the following. Suppose that a node x has neighbors x1, x2, ..., xk and that each neighbor carries a label denoting the community to which they belong. Then x determines its community based on the labels of its neighbors. We assume that each node in the network chooses to join the community to which the maximum number of its neighbors belong, with ties broken uniformly randomly. We initialize every node with unique labels and let the labels propagate through the network. As the labels propagate, densely connected groups of nodes quickly reach a consensus on a unique label (see Fig. 2). When many such dense (consensus) groups are created throughout the network, they continue to expand outwards until it is possible to do so. At the end of the propagation process, nodes having the same labels are grouped together as one community.

We perform this process iteratively, where at every step, each node updates its label based on the labels of its neighbors. The updating process can either be synchronous or asynchronous. In synchronous updating, node x at the tth iteration updates its label based on the labels of its neighbors at iteration t − 1. Hence Cx(t) = f(Cx1(t − 1), ..., Cxk(t − 1)), where Cx(t) is the label of node x at time t. The problem, however, is that subgraphs in the network that are bipartite or nearly bipartite in structure lead to oscillations of labels (see Fig. 3). This is especially true in cases where communities take the form of a star graph. Hence we use asynchronous updating where Cx(t) = f(Cxi1(t), ..., Cxim(t), Cxi(m+1)(t − 1), ..., Cxik(t − 1)) and xi1, ..., xim are neighbors of x that have already been updated in the current iteration while xi(m+1), ..., xik are neighbors that are not yet updated in the current iteration. The order in which all the n nodes in the network are updated at each iteration is chosen randomly. Note that while we have n different labels at the beginning of the algorithm, the number of labels reduces over iterations, resulting in only as many unique labels as there are communities.

Ideally the iterative process should continue until no node in the network changes its label. However, there could be nodes in the network that have an equal maximum number of neighbors in two or more communities. Since we break ties randomly among the possible candidates, the labels on such nodes could change over iterations even if the labels of their neighbors remain constant. Hence we perform the iterative process until every node in the network has a label to which the maximum number of its neighbors belongs. By doing so we obtain a partition of the network into disjoint communities, where every node has at least as many neighbors within its community as it has with any other community. If C1, ..., Cp are the labels that are currently active in the network and d_i^{Cj} is the number of neighbors node i has with nodes of label Cj, then the algorithm is stopped when, for every node i,

if i has label Cm then d_i^{Cm} ≥ d_i^{Cj} for all j.

At the end of the iterative process nodes with the same label are grouped together as communities. Our stop criterion characterizing the obtained communities is similar (but not identical) to the definition of strong communities proposed by Radicchi et al. [15]. While strong communities require each node to have strictly more neighbors within its community than outside, the communities obtained by the label propagation process require each node to have at least as many neighbors within its community as it has with each of the other communities. We can describe our proposed label propagation algorithm in the following steps.

(i) Initialize the labels at all nodes in the network. For a given node x, Cx(0) = x.


(ii) Set t = 1.
(iii) Arrange the nodes in the network in a random order and set it to X.
(iv) For each x ∈ X chosen in that specific order, let Cx(t) = f(Cxi1(t), ..., Cxim(t), Cxi(m+1)(t − 1), ..., Cxik(t − 1)). Here f returns the label occurring with the highest frequency among neighbors, and ties are broken uniformly randomly.
(v) If every node has a label that the maximum number of its neighbors have, then stop the algorithm. Else, set t = t + 1 and go to (iii).
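The five steps above can be sketched in Python. The adjacency-dict representation and function names are our own, and we make one simplification not spelled out in the steps: a node keeps its current label whenever that label is already among the most frequent in its neighborhood, which directly enforces the stop criterion.

```python
import random
from collections import Counter, defaultdict

def label_propagation(adj, seed=None):
    """Asynchronous label propagation following steps (i)-(v).

    adj maps each node to a list of its neighbors.  Returns a dict
    mapping each node to its final community label."""
    rng = random.Random(seed)
    labels = {x: x for x in adj}            # (i)  C_x(0) = x
    while True:                             # (ii) iterate until stable
        order = list(adj)                   # (iii) fresh random order
        rng.shuffle(order)
        stable = True
        for x in order:                     # (iv) adopt a most frequent
            counts = Counter(labels[y] for y in adj[x])   # neighbor label
            if not counts:
                continue                    # isolated node keeps its label
            top = max(counts.values())
            candidates = [c for c, k in counts.items() if k == top]
            if labels[x] in candidates:
                continue                    # stop criterion already met at x
            stable = False
            labels[x] = rng.choice(candidates)  # ties broken uniformly
        if stable:                          # (v) every node maximal: stop
            return labels

def communities(labels):
    """Group nodes carrying the same label into communities."""
    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return sorted(groups.values(), key=min)
```

On two disjoint triangles, for example, each triangle reaches consensus on a single label and is returned as one community.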
Since we begin the algorithm with each node carrying a
unique label, the first few iterations result in various small
pockets (dense regions) of nodes forming a consensus (acquiring the same label). These consensus groups then gain
momentum and try to acquire more nodes to strengthen the
group. However, when a consensus group reaches the border
of another consensus group, they start to compete for mem-
bers. The within-group interactions of the nodes can counteract the pressures from outside if there are fewer between-group
edges than within-group edges. The algorithm converges,
and the final communities are identified, when a global con-
sensus among groups is reached. Note that even though the
network as one single community satisfies the stop criterion,
this process of group formation and competition discourages
all nodes from acquiring the same label in the case of het-
erogeneous networks with an underlying community struc-
ture. In the case of homogeneous networks such as Erdős-
Rényi random graphs [31] that do not have community
structures, the label propagation algorithm identifies the gi-
ant connected component of these graphs as a single com-
munity.
Our stop criterion is only a condition and not a measure
that is being maximized or minimized. Consequently there is
no unique solution and more than one distinct partition of a
network into groups satisfies the stop criterion (see Figs. 4 and 5). Since the algorithm breaks ties uniformly randomly,
early on in the iterative process when possibilities of ties are
high, a node may vote in favor of a randomly chosen com-
munity. As a result, multiple community structures are reach-
able from the same initial condition.
FIG. 4. (a)–(c) are three different community structures identified by the algorithm on Zachary's karate club network. The communities can be identified by their shades of gray colors.

If we know the set of nodes in the network that are likely to act as centers of attraction for their respective communities, then it would be sufficient to initialize such nodes with unique labels, leaving the remaining nodes unlabeled. In this case when we apply the proposed algorithm the unlabeled nodes will have a tendency to acquire labels from their closest attractor and join that community. Also, restricting the set of nodes initialized with labels will reduce the range of possible solutions that the algorithm can produce. Since it is generally difficult to identify nodes that are central to a community before identifying the community itself, here we give all nodes equal importance at the beginning of the algorithm and provide them each with unique labels.

We apply our algorithm to the following networks. The first one is Zachary's karate club network, which is a network of friendship among 34 members of a karate club [32]. Over a period of time the club split into two factions due to leadership issues and each member joined one of the two factions. The second network that we consider is the U.S. college football network that consists of 115 college teams represented as nodes and has edges between teams that played each other during the regular season in the year 2000 [5]. The teams are divided into conferences (communities) and each team plays more games within its own conference than interconference games. Next is the coauthorship network of 16 726 scientists who have posted preprints on the condensed matter archive at www.arxiv.org; the edges connect scientists who coauthored a paper [33]. It has been


FIG. 5. The grouping of U.S. college football teams into conferences is shown in (a) and (b). Each solution [(a) and (b)] is an aggregate of five different solutions obtained by applying the algorithm on the college football network.

shown that communities in coauthorship networks are made up by researchers working in the same field or are research groups [22]. Along similar lines one can expect an actor collaboration network to have communities containing actors of a similar genre. Here we consider an actor collaboration network of 374 511 nodes and edges running between actors who have acted in at least one movie together [3]. We also consider a protein-protein interaction network [34] consisting of 2115 nodes. The communities are likely to reflect functional groupings of this network. And finally we consider a subset of the WWW consisting of 325 729 web pages within the nd.edu domain and hyperlinks interconnecting them [2]. Communities here are expected to be groups of pages on similar topics.

A. Multiple community structures

Figure 4 shows three different solutions obtained for Zachary's karate club network and Fig. 5 shows two different solutions obtained for the U.S. college football network. We will show that even though we obtain different solutions (community structures), they are similar to each other. To find the percentage of nodes classified in the same group in two different solutions, we form a matrix M, where M_ij is the number of nodes common to community i in one solution and community j in the other solution. Then we calculate f_same = (1/2)(Σ_i max_j {M_ij} + Σ_j max_i {M_ij}) × 100/n. Given a network whose communities are already known, a community detection algorithm is commonly evaluated based on the percentage (or number) of nodes that are grouped into the correct communities [22,26]. f_same is similar, whereby fixing one solution we evaluate how close the other solution is to the fixed one and vice versa. While f_same can identify how close one solution is to another, it is, however, not sensitive to the seriousness of errors. For example, when few nodes from several different communities in one solution are fused together as a single community in another solution, the value of f_same does not change much. Hence we also use Jaccard's index, which has been shown to be more sensitive to such differences between solutions [35]. If a stands for the pairs of nodes that are classified in the same community in both solutions, b for pairs of nodes that are in the same community in the first solution and different in the second, and c vice versa, then Jaccard's index is defined as a/(a + b + c). It takes values between 0 and 1, with higher values indicating stronger similarity between the two solutions. Figure 6 shows the similarities between solutions obtained from applying the algorithm five different times on the same network. For a


FIG. 6. Similarities between five different solutions obtained for each network are tabulated. An entry in the ith row and jth column in the
lower triangle of each of the tables is the Jaccard’s similarity index for solutions i and j of the corresponding network. Entries in the ith row
and jth column in the upper triangle of the tables are the values of the measure f same for solutions i and j in the respective networks. The
range of modularity values Q obtained for the five different solutions is also given for each network.
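Both similarity measures can be computed directly from two label assignments (dicts mapping node to community label). The sketch below is ours; the function names are not from the paper.

```python
from itertools import combinations

def f_same(sol1, sol2):
    """f_same = (1/2)(sum_i max_j M_ij + sum_j max_i M_ij) * 100/n,
    where M_ij counts nodes shared by community i of sol1 and
    community j of sol2."""
    n = len(sol1)
    # Build the (sparse) overlap matrix M as a nested dict.
    M = {}
    for node in sol1:
        i, j = sol1[node], sol2[node]
        M.setdefault(i, {}).setdefault(j, 0)
        M[i][j] += 1
    row_max = sum(max(row.values()) for row in M.values())
    # Column maxima: transpose the sparse matrix first.
    cols = {}
    for i, row in M.items():
        for j, v in row.items():
            cols.setdefault(j, []).append(v)
    col_max = sum(max(v) for v in cols.values())
    return 0.5 * (row_max + col_max) * 100.0 / n

def jaccard(sol1, sol2):
    """a/(a+b+c) over node pairs: a = together in both solutions,
    b = together only in the first, c = together only in the second."""
    a = b = c = 0
    for x, y in combinations(sol1, 2):
        s1 = sol1[x] == sol1[y]
        s2 = sol2[x] == sol2[y]
        if s1 and s2:
            a += 1
        elif s1:
            b += 1
        elif s2:
            c += 1
    return a / (a + b + c) if (a + b + c) else 1.0
```

Two identical solutions give f_same = 100 and a Jaccard index of 1; fusing communities lowers the Jaccard index much more than f_same, which is the sensitivity difference noted above.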

given network, the ijth entry in the lower triangle of the table is the Jaccard index for solutions i and j, while the ijth entry in the upper triangle is the measure f_same for solutions i and j. We can see that the solutions obtained from the five different runs are similar, implying that the proposed label propagation algorithm can effectively identify the community structure of any given network. Moreover, the tight range and high values of the modularity measure Q obtained for the five solutions (Fig. 6) suggest that the partitions denote significant community structures.

B. Aggregate

It is difficult to pick one solution as the best among several different ones. Furthermore, one solution may be able to identify a community that was not discovered in the other and vice versa. Hence an aggregate of all the different solutions can provide a community structure containing the most useful information. In our case a solution is a set of labels on the nodes in the network and all nodes having the same label form a community. Given two different solutions, we combine them as follows; let C1 denote the labels on the nodes in solution 1 and C2 denote the labels on the nodes in solution 2. Then, for a given node x, we define a new label as Cx = (C1_x, C2_x) (see Fig. 7). Starting with a network initialized with labels C we perform the iterative process of label propagation until every node in the network is in a community to which the maximum number of its neighbors belongs. As and when new solutions are available they are combined one by one with the aggregate solution to form a new aggregate solution. Note that when we aggregate two solutions, if a community T in one solution is broken into two (or more) different communities S1 and S2 in the other, then by defining the new labels as described above we are showing preferences to the smaller communities S1 and S2 over T. This is only one of the many ways in which different solutions can be aggregated. For other methods of aggregation used in community detection refer to [26,36,37].

Figure 8 shows the similarities between aggregate solutions. The algorithm was applied on each network 30 times and the solutions were recorded. An ijth entry is the Jaccard index for the aggregate of the first 5i solutions with the ag-


FIG. 7. An example of aggregating two community structure solutions. t1, t2, t3, and t4 are labels on the nodes in a network obtained from solution 1 and denoted as C1. The network is partitioned into groups of nodes having the same labels. s1, s2, and s3 are labels on the nodes in the same network obtained from solution 2 and denoted as C2. All nodes that had label t1 in solution 1 are split into two groups, with each group having labels s1 and s2, respectively, while all nodes with labels t2, t3, or t4 in solution 1 have label s3 in solution 2. C represents the new labels defined from C1 and C2.
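The label-pairing rule illustrated in the caption takes one line per node. The sketch below shows only the combination step (in the algorithm the combined labels then seed a further round of label propagation); the function names are ours.

```python
from functools import reduce

def aggregate(sol1, sol2):
    """New label of x is the pair (C1_x, C2_x): two nodes stay together
    in the aggregate only if they were together in both solutions."""
    return {x: (sol1[x], sol2[x]) for x in sol1}

def aggregate_all(solutions):
    """Fold new solutions into a running aggregate, one at a time."""
    return reduce(aggregate, solutions)
```

Pairing labels this way automatically prefers the finer split: if community T of one solution is broken into S1 and S2 in the other, nodes of S1 and S2 receive distinct pair labels.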

gregate of the first 5j solutions. We observe that the aggregate solutions are very similar in nature and hence a small set of solutions (5 in this case) can offer as much insight about the community structure of a network as can a larger solution set. In particular, the WWW network, which had low similarities between individual solutions (Jaccard index range 0.4883–0.5931), shows considerably improved similarities (Jaccard index range 0.6604–0.7196) between aggregate solutions.

IV. VALIDATION OF THE COMMUNITY DETECTION ALGORITHM

Since we know the communities present in Zachary's karate club and the U.S. football network, we explicitly verify the accuracy of the algorithm by applying it on these networks. We find that the algorithm can effectively unearth the underlying community structures in the respective networks. The community structures obtained by using our algorithm on Zachary's karate club network are shown in Fig. 4. While all three solutions are outcomes of the algorithm applied to the network, Fig. 4(b) reflects the true solution [32].

Figure 5 gives two solutions for the U.S. college football network. The algorithm was applied to this network ten different times and the two solutions are the aggregate of the first five and remaining five solutions. In both Figs. 5(a) and 5(b), we can see that the algorithm can effectively identify all the conferences with the exception of Sunbelt. The reason for the discrepancy is the following: among the seven teams

FIG. 8. Similarities between aggregate solutions obtained for each network. An entry in the ith row and jth column in the tables is
Jaccard’s similarity index between the aggregate of the first 5i and the first 5j solutions. While similarities between solutions for the karate
club friendship network and the protein-protein interaction network are represented in the lower triangles of the first two tables, the entries
in the upper triangle of these two tables are for the U.S. college football network and the coauthorship network, respectively. The similarities between aggregate solutions for the WWW are given in the lower triangle of the third table.


in the Sunbelt conference, four teams (Sunbelt4 = {North-Texas, Arkansas State, Idaho, New Mexico State}) have all played each other and three teams (Sunbelt3 = {Louisiana-Monroe, Middle-Tennessee State, Louisiana-Lafayette}) have again played one another. There is only one game connecting Sunbelt4 and Sunbelt3, namely, the game between North-Texas and Louisiana-Lafayette. However, four teams from the Sunbelt conference (two each from Sunbelt4 and Sunbelt3) have together played with seven different teams in the Southeastern conference. Hence we have the Sunbelt conference grouped together with the Southeastern conference in Fig. 5(a). In Fig. 5(b), the Sunbelt conference breaks into two, with Sunbelt3 grouped together with Southeastern and Sunbelt4 grouped with an independent team (Utah State), a team from Western Atlantic (Boise State), and the Mountain West conference. The latter grouping is due to the fact that every member of Sunbelt4 has played with Utah State and with Boise State, who have together played five games with four different teams in Mountain West. There are also five independent teams which do not belong to any specific conference and are hence assigned by the algorithm to a conference where they have played the maximum number of their games.

V. TIME COMPLEXITY

It takes a near-linear time for the algorithm to run to its completion. Initializing every node with unique labels requires O(n) time. Each iteration of the label propagation algorithm takes linear time in the number of edges [O(m)]. At each node x, we first group the neighbors according to their labels [O(d_x)]. We then pick the group of maximum size and assign its label to x, requiring a worst-case time of O(d_x). This process is repeated at all nodes and hence the overall time is O(m) for each iteration.

As the number of iterations increases, the number of nodes that are classified correctly increases. Here we assume that a node is classified correctly if it has a label that the maximum number of its neighbors have. From our experiments, we found that irrespective of n, 95% of the nodes or more are classified correctly by the end of iteration 5. Even in the case of Erdős-Rényi random graphs [31] with n between 100 and 10 000 and average degree 4, which do not have community structures, by iteration 5, 95% of the nodes or more are classified correctly. In this case, the algorithm identified all nodes in the giant connected component as belonging to one community.

When the algorithm terminates it is possible that two or more disconnected groups of nodes have the same label (the groups are connected in the network via other nodes of different labels). This happens when two or more neighbors of a node receive its label and pass the labels in different directions, which ultimately leads to different communities adopting the same label. In such cases, after the algorithm terminates one can run a simple breadth-first search on the subnetworks of each individual group to separate the disconnected communities. This requires an overall time of O(m + n). When aggregating solutions, however, we rarely find disconnected groups within communities.

VI. DISCUSSION AND CONCLUSIONS

The proposed label propagation process uses only the network structure to guide its progress and requires no external parameter settings. Each node makes its own decision regarding the community to which it belongs based on the communities of its immediate neighbors. These localized decisions lead to the emergence of community structures in a given network. We verified the accuracy of community structures found by the algorithm using Zachary's karate club and the U.S. college football networks. Furthermore, the modularity measure Q was significant for all the solutions obtained, indicating the effectiveness of the algorithm. Each iteration takes a linear time O(m), and although one can observe the algorithm beginning to converge significantly after about five iterations, the mathematical convergence is hard to prove. Other algorithms that run in a similar time scale include the algorithm of Wu and Huberman [26] [with time complexity O(m + n)] and that of Clauset et al. [30], which has a running time of O(n log² n).

The algorithm of Wu and Huberman is used to break a given network into only two communities. In this iterative process two chosen nodes are initialized with scalar values 1 and 0 and every node updates its value as the average of the values of its neighbors. At convergence, if a maximum number of a node's neighbors have values above a given threshold then so will the node. Hence a node tends to be classified into a community to which the maximum number of its neighbors belong. Similarly, if in our algorithm we choose the same two nodes and provide them with two distinct labels (leaving the others unlabeled), the label propagation process will yield similar communities as the Wu and Huberman algorithm. However, to find more than two communities in the network, the Wu and Huberman algorithm needs to know a priori how many communities there are in the network. Furthermore, if one knows that there are c communities in the network, the algorithm proposed by Wu and Huberman can only find communities that are approximately of the same size, that is, n/c, and it is not possible to find communities with heterogeneous sizes. The main advantage of our proposed label propagation algorithm over the Wu and Huberman algorithm is that we do not need a priori information on the number and sizes of the communities in a given network; indeed, such information usually is not available for real-world networks. Also, our algorithm does not make restrictions on the community sizes. It determines such information about the communities by using the network structure alone.

In our test networks, the label propagation algorithm found communities whose sizes follow approximately a power-law distribution P(S > s) ~ s^(−ν), with the exponent ν ranging between 0.5 and 2 (Fig. 9). This implies that there is no characteristic community size in the networks, and it is consistent with previous observations [22,30,38]. While the community size distributions for the WWW and coauthorship networks approximately follow power laws with a cutoff, with exponents 1.15 and 1.98, respectively, there is a clear crossover from one scaling relation to another for the


FIG. 9. The cumulative probability distributions of community sizes (s) are shown for the WWW, coauthorship and actor collaboration networks. They approximately follow power laws with the exponents as shown.

actor collaboration network. The community size distribution for the actor collaboration network has a power-law exponent of 2 for sizes up to 164 nodes and 0.5 between 164 and 7425 nodes (see Fig. 9).

In the hierarchical agglomerative algorithm of Clauset et al. [30], the partition that corresponds to the maximum Q is taken to be the most indicative of the community structure in the network. Other partitions with high Q values will have a structure similar to that of the maximum Q partition, as these solutions are obtained by progressively aggregating two groups at a time. Our proposed label propagation algorithm, on the other hand, finds multiple significantly modular solutions that have some amount of dissimilarity. For the WWW network in particular, the similarity between five different solutions is low, with the Jaccard index ranging between 0.4883 and 0.5921, yet all five are significantly modular with Q between 0.857 and 0.864. This implies that the proposed algorithm can find not just one but multiple significant community structures, supporting the existence of overlapping communities in many real-world networks [14].

ACKNOWLEDGMENTS

The authors would like to acknowledge the National Science Foundation (Grants No. SST 0427840, No. DMI 0537992, and No. CCF 0643529). One of the authors (R.A.) acknowledges support from the Sloan Foundation.

[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[2] R. Albert, H. Jeong, and A.-L. Barabási, Nature (London) 401, 130 (1999).
[3] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
[4] M. Newman, SIAM Rev. 45, 167 (2003).
[5] M. Girvan and M. Newman, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
[6] S. Wasserman and K. Faust, Social Network Analysis (Cambridge University Press, Cambridge, England, 1994).
[7] L. Danon, A. Díaz-Guilera, and A. Arenas, J. Stat. Mech.: Theor. Exp. 2006, P11010 (2006).
[8] J. Eckmann and E. Moses, Proc. Natl. Acad. Sci. U.S.A. 99, 5825 (2002).
[9] G. Flake, S. Lawrence, and C. Giles, Proceedings of the 6th ACM SIGKDD, 2000, pp. 150–160.
[10] R. Guimerà and L. Amaral, Nature (London) 433, 895 (2005).
[11] M. Gustafsson, M. Hornquist, and A. Lombardi, Physica A 367, 559 (2006).
[12] M. B. Hastings, Phys. Rev. E 74, 035102(R) (2006).
[13] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
[14] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Nature (London) 435, 814 (2005).
[15] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, Proc. Natl. Acad. Sci. U.S.A. 101, 2658 (2004).
[16] D. Karger, J. ACM 47, 46 (2000).
[17] B. Kernighan and S. Lin, Bell Syst. Tech. J. 29, 291 (1970).
[18] C. Fiduccia and R. Mattheyses, Proceedings of the 19th Annual ACM IEEE Design Automation Conference, 1982, pp. 175–181.
[19] B. Hendrickson and R. Leland, SIAM J. Sci. Comput. 16, 452 (1995).
[20] M. Stoer and F. Wagner, J. ACM 44, 585 (1997).
[21] C. Thompson, Proceedings of the 11th Annual ACM Symposium on Theory of Computing, 1979, pp. 81–88.
[22] M. E. J. Newman, Phys. Rev. E 69, 066133 (2004).
[23] P. Pons and M. Latapy, e-print arXiv:physics/0512106.
[24] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[25] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[26] F. Wu and B. Huberman, Eur. Phys. J. B 38, 331 (2004).
[27] J. P. Bagrow and E. Bollt, Phys. Rev. E 72, 046108 (2005).
[28] L. Costa, e-print arXiv:cond-mat/0405022.
[29] M. E. J. Newman, Eur. Phys. J. B 38, 321 (2004).
[30] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[31] B. Bollobás, Random Graphs (Academic Press, Orlando, FL, 1985).
[32] W. Zachary, J. Anthropol. Res. 33, 452 (1977).
[33] M. Newman, Proc. Natl. Acad. Sci. U.S.A. 98, 404 (2001).
[34] H. Jeong, S. Mason, A.-L. Barabási, and Z. Oltvai, Nature (London) 411, 41 (2001).
[35] G. Milligan and D. Schilling, Multivariate Behav. Res. 20, 97 (1985).
[36] D. Gfeller, J. C. Chappelier, and P. De Los Rios, Phys. Rev. E 72, 056135 (2005).
[37] D. Wilkinson and B. Huberman, Proc. Natl. Acad. Sci. U.S.A. 101, 5241 (2004).
[38] A. Arenas, L. Danon, A. Díaz-Guilera, P. Gleiser, and R. Guimerà, Eur. Phys. J. B 38, 373 (2004).

Int. J. Sensor Networks, Vol. 2, Nos. 3/4, 2007 201

Decentralised topology control algorithms for connectivity of distributed wireless sensor networks

Usha Nandini Raghavan* and Soundar R.T. Kumara


Department of Industrial Engineering,
The Pennsylvania State University,
University Park, PA, USA
E-mail: uxr102@psu.edu
E-mail: skumara@psu.edu
*Corresponding author

Abstract: In this paper, we study the problem of maintaining the connectivity of a Wireless Sensor
Network (WSN) using decentralised topology control protocols. Previous algorithms on topology
control require the knowledge of the density of nodes (λ) in the sensing region. However, if λ
varies continuously over time, updating this information at all nodes is impractical. Therefore, in
addition to efficient maintenance of connectivity, we also wish to reduce the control overhead of the
topology control algorithm. In the absence of information regarding λ we study the connectivity
properties of WSNs by means of giant components. We show that by maintaining the out-degree
at each node as five will give rise to a giant connected component in the network. We also show
that this is the smallest value that can maintain a giant connected component irrespective of how
often or by how much λ changes.

Keywords: wireless sensor networks; topology control; percolation; giant connected component.

Reference to this paper should be made as follows: Raghavan, U.N. and Kumara, S.R.T.
(2007) ‘Decentralised topology control algorithms for connectivity of distributed wireless sensor
networks’, Int. J. Sensor Networks, Vol. 2, Nos. 3/4, pp.201–210.

Biographical notes: Usha Nandini Raghavan is a PhD student in the Department of Industrial
Engineering at the Pennsylvania State University. Her main research interest is in the
self-organisation of complex networks and localised algorithms as applied to wireless networks.
Other research interests include graph theory and supply chain management. She obtained her
Master’s in Mathematics from the Indian Institute of Technology, Madras and Master’s in Industrial
Engineering and Operations Research from the Pennsylvania State University.

Soundar R.T. Kumara is a Distinguished Professor of Industrial Engineering at the Pennsylvania
State University. He received joint appointments with the Department of Computer Science
and Engineering and School of Information Sciences and Technology. His research interests
include complexity in sensor networks, logistics and manufacturing, software agents and
neural networks. He is an elected active member of the International Institute of Production
Research.

1 Introduction

In this paper, we discuss two kinds of connectivity problems on large-scale
self-organising Wireless Sensor Networks (WSNs). In the first case the goal
is to obtain an entirely connected network, while in the second case, the
goal is to obtain only a giant connected component. In networks such as a
computer network or the internet, it is critical to ensure that every node
can communicate with every other node via the communication links (maybe
multihop). However, in a WSN where a large number of sensors coordinate to
achieve a global sensing task it may be needless to spend extreme amounts of
energy to ensure that the very last node is connected. Instead it may be
optimal to settle for just a giant connected component, that is, a connected
component which contains a large fraction of the nodes. We specifically
concentrate on the second type of connectivity problem, namely giant
connected components in WSNs.

The topology of a WSN consists of a set of sensor nodes that perform the
sensing tasks and the communication links between these nodes that drive the
networking exercises in the system (Bharathidasan and Ponduru, 2005; Estrin
et al., 1999; Goldsmith and Wicker, 2002; Hac, 2003; Ramanathan and
Rosales-Hain, 2000; Santi, 2005). The nodes are usually battery powered and
the presence or absence of communication links between the nodes is
influenced by the distance between them. That is, due to severe energy
constraints the nodes communicate only with nodes that are within a small
neighbourhood (Bharathidasan and Ponduru, 2005; Estrin et al., 1999).

Distributed topology control algorithms (or protocols) are usually employed
to maintain how far or how many nodes

Copyright © 2007 Inderscience Enterprises Ltd.



with which a given node should communicate (Estrin et al., 1999; Goldsmith
and Wicker, 2002; Santi, 2005). There exist many works that concentrate on
distributed topology control (Bettstetter, 2002a,b; Blough et al., 2003;
Cerpa and Estrin, 2002; Glauche et al., 2003; Li et al., 2001, 2003; Rodoplu
and Meng, 1999). Such topology control protocols are required because very
often the wrong topology can considerably reduce the performance of the
system. For example, a sparse network can increase the end-to-end packet
delay and threaten the connectivity of the network. On the other hand, a
dense network can promise a connected network with higher probability but
also leads to higher interference in the network, resulting in limited
spatial reuse (Ramanathan and Rosales-Hain, 2000).

In this paper, we are particularly interested in developing topology control
algorithms for WSNs that are large-scale and lack a centralised authority.
Advantages of such distributed WSNs include the ability to rapidly deploy
the nodes in a sensing region (which may be unmanned or inhospitable), the
distributed nature which allows for robust network performance, and the lack
of single points of failure. In addition it is also possible to tailor the
network design for intended applications (Goldsmith and Wicker, 2002). We
further assume that the density of active nodes can vary with time. This
variation in densities arises because of nodes dying due to loss of power
(see Figure 1) or, in the case when they have energy harvesting
capabilities, because the nodes may go through an on/off cycle. In the 'off'
period, the nodes harvest energy and do not participate in the sensing and
networking tasks. Once they acquire sufficient energy they switch 'on' to
join the network.

Figure 1  Density of nodes decreases as time increases when the nodes are
battery powered. Note that if the transmission radii are independent of the
density (constant in this case), then the neighbours of a node decrease as
density decreases. Thus the probability of connectivity of the network also
decreases

Connectivity of WSNs is well researched and various results exist in the
literature (Bettstetter, 2002a,b; Booth et al., 2003; Farago, 2002, 2004;
Franceschetti et al., 2003; Glauche et al., 2003; Gupta and Kumar, 1998;
Krishnamachari et al., 2002; Meester and Roy, 1996; Penrose, 2003; Xue and
Kumar, 2004; Ye and Heidmann, 2003). In all the sensor network models
considered so far, it has been established that there exist optimal values
for either the transmission radius or the number of neighbours at each node
that lead to the connectivity of the network (Gupta and Kumar, 1998;
Krishnamachari et al., 2002; Meester and Roy, 1996; Xue and Kumar, 2004).
However, in all these cases, the critical values depend on the density of
nodes (λ) or, equivalently, the number of nodes (N) in the sensing region.
This would imply that in order to maintain connectivity, the topology
control protocols will require an update on the current density at all
nodes. Such global updates (required at desired time intervals over the
entire lifetime of the network) lead to prohibitive control overhead
(especially so in our scenario) and we would like to avoid the same. Note
that maintaining an optimal transmission radius or node degree is important
because energy conservation and low interference are some of the primary
objectives of WSNs (Estrin et al., 1999; Goldsmith and Wicker, 2002; Santi,
2005).

Our focus therefore is to develop decentralised topology control algorithms
which can maintain connectivity using only the localised information
available at each node. We do not assume any global information (e.g.
density) to be available at the nodes. In such cases the best one can hope
to do is to pool in as many nodes as possible to form a connected network at
any point in time (Santi, 2005). That is, our measure of connectivity is
based on the presence of giant components in the network. We will in fact
show that this relaxation helps us in finding density-independent values for
the number of neighbours needed to maintain connectivity. Though simulation
based results exist (Santi, 2005), to our knowledge we have not seen an
analytical treatment of this problem. In this paper we attempt to do the
same.

2 Problem statement

We consider a network of nodes that are distributed according to a Poisson
point process of intensity λ in R². Each node has directed edges pointing
towards its nearest k neighbours. Our goal is to find the smallest value for
k that gives rise to an unbounded connected component, or a giant component,
in the bi-directional subnetwork.

The rest of this paper is organised as follows. We begin Section 3 with some
preliminaries on graph terminology and an introduction to percolation. We
further review some WSN models and a class of models called the nearest
k-neighbours network. In Section 4 we review the results from the literature
on connectivity of WSNs, and on topology control in Section 5. In Section 6
we show that for the nearest k-neighbours model, the critical value for the
appearance of a giant component in terms of k is 5. This is followed by
verification of the analysis using simulation and an estimate of the energy
consumed in maintaining a degree k in the nearest k-neighbours network. In
Section 7, the proposed topology control algorithm and critical thresholds
for the appearance of giant strongly connected components with respect to k
are discussed. We finally conclude in Section 8.

3 Wireless sensor network models

In this section, we review some definitions and concepts from graph theory
and percolation. We further look at different

classes of wireless network models (Sections 3.2.1 and 3.2.2) and a new
class of models called the nearest k-neighbours model (Section 3.2.3).

3.1 Preliminaries

• Network: an undirected network G(V, E) consists of a set of nodes V, a
  set of edges E and a function w : E → V × V. That is, every element of
  the set E is mapped to an ordered pair of points from the set V × V. On
  the other hand, a directed network again consists of the sets V and E,
  but has two functions s, t : E → V, where s(e) represents the source and
  t(e) represents the target of the edge e.

• Degree: the degree of a node v ∈ V is the number of edges incident on
  that node. In the case of a directed graph we have two different kinds of
  degrees on a node, namely in-degree and out-degree. While the in-degree
  of v is the number of edges with their target on v, the out-degree of v
  is the number of edges whose sources are at v. In undirected graphs the
  degree of a node is just the number of edges incident on it.

• Path: a (directed) path in a directed network is an alternating sequence
  of nodes and edges denoted as v₀, e₁, v₁, e₂, ..., vᵢ, eᵢ₊₁, ..., eₙ, vₙ.
  Here v₀ is the origin and vₙ is the terminus of the path. eᵢ₊₁ is the
  edge that has its source at vᵢ and target at vᵢ₊₁. In undirected
  networks, a path is again an alternating sequence of nodes and edges, and
  eᵢ₊₁ is the edge incident on both vᵢ and vᵢ₊₁.

• Connected network: in undirected networks, if a subset V₁ ⊆ V is such
  that there exists a path between any two nodes x, y ∈ V₁, then the
  network H₁(V₁, E_V₁) is called a connected component of the network
  G(V, E). Here E_V₁ ⊆ E contains only those edges that have both the nodes
  they are incident on in V₁. Further, H₁(V₁, E_V₁) is maximally connected
  if there exists no V₂ ⊆ V such that V₁ ⊆ V₂ and H₂(V₂, E_V₂) is
  connected. In general it is possible to partition V into disjoint sets
  V₁, V₂, ..., Vᵢ (note that V₁ ∪ V₂ ∪ ... ∪ Vᵢ = V) such that
  Hⱼ(Vⱼ, E_Vⱼ) is maximally connected in G, ∀j = 1, 2, ..., i. We call a
  network connected if and only if in such a partition i = 1
  (Definition 1). Also, if i > 1 in such a partition, but the size of the
  largest component is O(N), then the network is said to have a giant
  component (Definition 2). If the nodes in the system are assumed to be
  spread across the entire R² space with some density λ, then the giant
  component is also called the unbounded connected component. That is, the
  number of nodes in the largest component is unbounded. However, unless
  otherwise mentioned we assume the former definition (Definition 1) for a
  connected network.

• Poisson point process: given a compact set K (in R^d), a point process X
  is a measurable mapping from a probability space to the configurations of
  points of K. The total number of points in a point process is then a
  random variable. Further, a point process X in R^d is a Poisson point
  process of intensity λ (where λ = E(X[0, 1]^d)) if (1) for mutually
  disjoint Borel sets A₁, ..., Aₖ, the random variables X(A₁), ..., X(Aₖ)
  are mutually independent, and (2) for any bounded Borel set A we have,
  for every k ≥ 0, P(X(A) = k) = e^(−λℓ(A)) (λℓ(A))^k / k!, where ℓ(·)
  denotes Lebesgue measure in R^d (Meester and Roy, 1996). In this paper we
  only consider the case d = 2. We can simulate a Poisson point process of
  intensity λ in a finite region of area A as follows.

  – First generate the number of points in the region of area A from a
    Poisson distribution of mean λA.

  – Place these points in the region uniformly at random.

  In most cases for simulation (and as in this paper), it is sufficient to
  assume that the number of points is λA, instead of generating this number
  from a Poisson distribution of mean λA. In this case it is called a
  uniform Poisson point process.

• Percolation: percolation theory studies the flow of fluid across a random
  medium, in particular on a regular d-dimensional lattice where the edges
  are either present or absent with probabilities p and 1 − p, respectively
  (Albert and Barabasi, 2002; Bollobas, 1985; Meester and Roy, 1996). It is
  obvious that for small p only a few edges are present and hence
  percolation of a fluid across this medium is not possible. But one of the
  interesting phenomena is the presence of a percolation threshold p_c, at
  which a percolating cluster of nodes connected by edges begins to appear
  rather suddenly. That is, for p < p_c a percolating cluster does not
  exist almost surely, while for p > p_c it exists almost surely. To put it
  in simple terms, for small values of p, only a few edges are present in
  the network and hence percolation is not possible. However, as p
  increases gradually, so does the number of edges in the network, and
  hence one would expect the possibility of percolation to also increase
  gradually. On the contrary, with the gradual increase in p, the
  appearance of a percolating cluster arises rather suddenly. Suppose we
  consider a network in which any given pair of nodes is connected with a
  probability p (also known as Erdos-Renyi random graphs (Bollobas, 1985));
  we can see from Figure 2 that as p increases gradually from 0, the giant
  component in the network appears suddenly.

Note that a percolating cluster of nodes in a network is the same as a
connected network. In most cases in the literature, percolation properties
(connectivity) of large-scale networks (systems) are measured in terms of
giant components (Albert and Barabasi, 2002; Bollobas, 1985; Meester and
Roy, 1996; Penrose, 2003). Thus critical thresholds are determined for the
appearance of a giant component in the networks. This is precisely the kind
of approach we will be using in this paper. However, instead of an edge
connection probability p, we consider a different parameter k, which is the
number of neighbours for a node in the network, and find critical thresholds
for connectivity with respect to k. It is important to note that the
critical threshold gives the point where connectivity can be obtained with
as few edges as possible.
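The two-step simulation recipe above maps directly to code. The following Python sketch (the function names are ours, not from the paper) draws the point count by inverse-transform sampling of the Poisson law, which is adequate for moderate means, and then scatters the points uniformly; the "uniform" variant simply fixes the count at λA:

```python
import math
import random

def poisson_point_process(lam, width, height, rng):
    """Step 1: draw the number of points from a Poisson law of mean
    lam * area; step 2: place them uniformly at random in the region."""
    mean = lam * width * height
    # Inverse-transform sampling of the Poisson count.
    n = 0
    term = math.exp(-mean)
    cumulative = term
    target = rng.random()
    while cumulative < target:
        n += 1
        term *= mean / n
        cumulative += term
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def uniform_poisson_point_process(lam, width, height, rng):
    """The 'uniform' variant used in the paper: the number of points is
    fixed at lam * area instead of being Poisson distributed."""
    n = round(lam * width * height)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

rng = random.Random(0)
points = poisson_point_process(lam=50.0, width=2.0, height=2.0, rng=rng)
```

With λ = 50 on a 2 × 2 region the expected count is 200, and the uniform variant returns exactly 200 points.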
Figure 2  The graph shows the size of the largest connected component in a
network of 500 nodes with a connection probability p. Note that the giant
component arises suddenly as p increases gradually

3.2 Network models

Large numbers of tiny sensors are usually placed randomly in a sensing
region. Hence the node distribution is assumed to follow a Poisson point
process in the sensing region with a tunable parameter for the density
(denoted as λ). The edges (directed or undirected) of the network are
formed either by setting the transmission radius at each node or by
choosing neighbours based on a connection function (see Figure 3). In this
section we review different classes of network models that are commonly
used as a representation for WSNs.

Figure 3  An example of a sensor network in which nodes are uniformly
randomly distributed in a sensing region. The edges of the network are
formed based on the transmission radii at the nodes

3.2.1 Poisson Boolean model

Here the nodes are assumed to follow a Poisson point process X of intensity
λ in the sensing region (which can be either the entire R² space or a
two-dimensional unit cube). Each point of X is the centre of a ball of a
random radius. These radii are independent of X. The Poisson Boolean model
is usually denoted as (X, ρ), where ρ is the random variable for the radii
on the points of X (Meester and Roy, 1996).

In this case, the points of X can be thought of as sensor nodes and the
radii of the balls represent the transmission radii of the sensors. For the
case when ρ = r a.s. (almost surely), numerical values and bounds on the
critical thresholds of parameters such as λ and r for the appearance of
unique giant connected components are available (Dall and Christensen,
2002; Meester and Roy, 1996). Here (and elsewhere) the uniqueness of the
giant component is interesting. This implies that there is at most and at
least one giant component, and that there do not exist two disjoint giant
components in the network. Also note that the cases where ρ = r a.s. are
also called fixed radius models (Krishnamachari et al., 2002).

3.2.2 Poisson random connection model

Similar to the Poisson Boolean model, the random connection model is also
driven by a Poisson point process X. However, unlike the Boolean case, here
the existence of edges between nodes is determined by a non-increasing
function g from the set of positive reals to [0, 1]. Thus, given any pair
of points x and y, the existence of an (undirected) edge is determined with
probability g(‖x − y‖), independent of other pairs. ‖·‖ here stands for the
Euclidean metric. A Poisson random connection model is usually denoted as
(X, g). It has been shown that there exist finite critical thresholds
(λ_c(g)) that depend on g for the appearance of unique giant connected
components in the networks (Meester and Roy, 1996).

3.2.3 Nearest k-neighbours model

The third kind of model that we consider here is a neighbour-based model.
Unlike the above models, here the nodes form directed edges towards their
closest k neighbours in the plane. k here is a parameter and is a fixed
number for all nodes in the network. This is usually referred to as the
nearest k-neighbours model (Blough et al., 2003; Raghavan et al., 2005). In
this paper, however, we take the nearest k-neighbours model to be the
bidirectional subnetwork obtained from such a directed network. This model
cannot be captured by either the Poisson Boolean model or the random
connection model. In the random connection model the connection function g
is universal across all nodes in the network. Suppose that for two
different nodes, say x and y, the distances of their kth closest neighbours
are d_x and d_y; then it is possible, without loss of generality, that
d_x > d_y. Hence for node x, g(d_x) = 1 and g(d_y) = 1, whereas for node y,
g(d_x) = 0 and g(d_y) = 1. Thus, it is not possible to find a connection
function that is universal across all nodes in the network. Also, in the
Poisson Boolean model, ρ is independent of X, which is not the case here.

Even though k is independent of λ (or N), it is possible that the critical
value k_c is dependent on λ. However, we will establish that k_c for
connectivity (in terms of a giant component in the network) is independent
of λ and thus can be used in the design of decentralised topology control
algorithms.
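The construction just described is easy to simulate. The sketch below (our own illustrative code, not from the paper, using a naive O(N²) neighbour search) builds the directed nearest-k graph on a uniform random point set, keeps only the bidirectional edges, and measures the fraction of nodes in the largest connected component; increasing k from 1 upwards shows the component growing:

```python
import random

def nearest_k_graph(points, k):
    """Directed edges from each node to its k nearest neighbours
    (naive O(N^2) search, fine for a small illustration)."""
    out = {}
    for i, (xi, yi) in enumerate(points):
        dists = sorted(((xi - xj) ** 2 + (yi - yj) ** 2, j)
                       for j, (xj, yj) in enumerate(points) if j != i)
        out[i] = {j for _, j in dists[:k]}
    return out

def bidirectional_subnetwork(out):
    """Keep an undirected edge {i, j} only when both i -> j and j -> i."""
    return {i: {j for j in nbrs if i in out[j]} for i, nbrs in out.items()}

def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component (DFS)."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        size, stack = 0, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(adj)

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(300)]
adj = bidirectional_subnetwork(nearest_k_graph(pts, k=5))
frac = largest_component_fraction(adj)  # fraction of nodes in the largest component
```

This only illustrates the model; it is not the paper's analytical argument for the critical value of k.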
4 Background on the connectivity problem in WSNs

Fixed radius models are the most widely used graphs to represent WSNs.
Here, there is an (undirected) edge between two nodes if and only if they
are no more than a distance r apart. In most cases the distance metric is
the ℓ₂ norm, and sometimes other norms such as ℓ_p, 1 ≤ p ≤ ∞, are also
considered (Penrose, 2003).

4.1 Critical transmission radius

The problem of connectivity on such networks is well studied and one of the
earliest results was proved by Philips et al. (1989). They assumed the
sensing region to be a square of area A with a constant density λ. Hence as
A → ∞, the number of nodes also increases. They showed that for any given
ε > 0, if r ≤ √((1 − ε) ln A/(πλ)), then under the assumption of a constant
density the graph is almost surely disconnected as A → ∞. This implies that
for any given r and λ, we can always find an A large enough such that the
graph is almost surely disconnected. A similar analysis was done by Gupta
and Kumar (1998) on a unit disk and they found that for the network to be
connected with probability 1, r = √((ln N + c(N))/(πN)), where c(N) → ∞ as
N → ∞. N here stands for the number of nodes in the sensing region.

Penrose (1999) studied in general the problem of k-connectivity of fixed
radius networks in d-dimensional unit cubes (d ≥ 2). He showed that the
graph becomes k-connected almost surely whenever all nodes have degree
greater than or equal to k. That is, as N → ∞, P{smallest r at which the
network is k-connected = smallest r at which the minimum degree ≥ k} → 1.
These results hold in general for any ℓ_p distance metric such that
1 < p < ∞. The case of ℓ_∞ was discussed by Appel and Russo (2002).

All these critical values however depend on N, which we do not desire.

4.2 Critical number of neighbours

While all the above work looked at the properties of the transmission radii
for the connectivity of the network, there are also other problems that
focused on the desired number of neighbours. It was first studied in the
context of throughput capacity in packet radio networks by many researchers
(Hou and Li, 1986; Kleinrock and Silvester, 1978; Takagi and Kleinrock,
1984). Kleinrock and Silvester (1978) studied the capacity of packet radio
networks that are randomly distributed in a region and use slotted ALOHA as
their access scheme. They assume that each packet radio unit uses a
predetermined fixed radius for transmission and try to maximise the one-hop
progress of a packet in the desired direction. The passage of the message
from the source to the target was formulated as a stochastic process and
they developed an objective (throughput), which they optimised based on the
average number of neighbours. In this case they showed that 6 is the magic
number (independent of the system size) that maximises the throughput.
Takagi and Kleinrock later revised this number to 8 (Takagi and Kleinrock,
1984). There are also works that similarly suggest other magic numbers (Hou
and Li, 1986). They do not however address the problem of connectivity of
the network.

While simulation suggests that in most cases the fixed radius models are
connected by assuming an average number of neighbours of 6 or 8, this is
not always the case. Xue and Kumar (2004) studied the problem of
connectivity based on the node degree required and showed that the number
of neighbours required in fact grows as Θ(log N) and that there exist no
such magic numbers. They assume that each node forms undirected edges (two
way communication) incident on its ϕN nearest neighbours, where N is the
number of nodes placed uniformly randomly in a unit square. This implies
that to maintain the connectivity of a network the number of neighbours
cannot be bounded as the number of nodes increases.

4.3 Continuum percolation and connectivity

The problem of connectivity has also been well researched in the area of
continuum percolation (Avram and Bertsimas, 1993; Booth et al., 2003;
Franceschetti et al., 2003; Glauche et al., 2003; Meester and Roy, 1996;
Penrose, 2003; Quantanilla, 2001; Quantanilla et al., 2000; Raghavan et
al., 2005). The presence of a giant component is usually considered as
sufficient in order to study the networks' percolation properties. This is
unlike all the works described above that require every node to be present
in the giant component, that is, an entirely connected network. The works
by Meester and Roy (1996), Penrose (2003), Quantanilla et al. (2000),
Quantanilla (2001) and others are concerned with the kind of percolation
that occurs by keeping the radius r fixed and varying the density of nodes
λ. They show that as the density of the nodes increases, for a given r,
there exists a finite percolation threshold λ_c beyond which a unique giant
component always exists. Similarly, in a d-dimensional unit cube, given the
number of nodes N, one can find a similar threshold N_c for N. It is easy
to translate this threshold in terms of the critical number of neighbours
using the relation α = Np (Dall and Christensen, 2002). Here α is the
expected number of neighbours and p is the probability that the given node
is incident on any other node in the network. For example, p = πr² in a
network of nodes in a two-dimensional unit square. Thus, at the threshold,
α_c = N_c p.

The results so far suggest that α_c ≈ 4.51 when d = 2 and α_c ≈ 2.7 when
d = 3 (Dall and Christensen, 2002; Quantanilla et al., 2000) in the fixed
radius models. Also, given N, the critical transmission radius is
r_c(N) = (1/√π) {α_c Γ((d + 2)/2)/N}^(1/d) for the appearance of a unique
giant component. When d = 2, r_c(N) is simply √(α_c/(πN)) (Dall and
Christensen, 2002; Raghavan et al., 2005). Note, this implies that if the
density of nodes is λ in the entire R² space, then r_c(λ) = √(α_c/(πλ)).

It was also shown that there exist similar critical density thresholds
(λ_c(g)) for the random connection models with connection function g
(Meester and Roy, 1996). Further, Franceschetti et al. (2003) showed that
squish-squashing a connection function g into another connection function
h(x) = p g(√p x) for some 0 < p < 1 will lead to smaller density thresholds
for a percolating cluster of nodes. That is, λ_c(g) ≥ λ_c(h).
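The threshold relations quoted above can be checked numerically: with p the volume of an r-ball (p = πr² when d = 2) and the critical radius r_c(N) = (1/√π){α_c Γ((d + 2)/2)/N}^(1/d), the expected degree α = Np recovers α_c exactly at r = r_c(N). A short sketch with our own helper names, using the α_c values cited from Dall and Christensen (2002):

```python
import math

def critical_radius(n, alpha_c, d=2):
    """r_c(N) = (1/sqrt(pi)) * (alpha_c * Gamma((d + 2)/2) / N)^(1/d):
    the transmission radius at which the expected number of neighbours of
    a node among N nodes in a d-dimensional unit cube reaches alpha_c
    (boundary effects ignored)."""
    return (alpha_c * math.gamma((d + 2) / 2) / n) ** (1.0 / d) / math.sqrt(math.pi)

def ball_volume(r, d):
    """Volume of a d-dimensional ball of radius r:
    pi^(d/2) r^d / Gamma(d/2 + 1); this is the connection probability p
    for nodes in a unit cube."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

def expected_degree(n, r, d=2):
    """alpha = N p, with p the r-ball volume (p = pi r^2 when d = 2)."""
    return n * ball_volume(r, d)

# At the critical radius the expected degree recovers alpha_c.
r2 = critical_radius(1000, 4.51, d=2)   # equals sqrt(4.51 / (pi * 1000))
```

The same identity holds in d = 3 with α_c ≈ 2.7, since Γ((d + 2)/2) is exactly the normalisation of the d-ball volume.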

Note that h is a version of g in which the probabilities of connection
between nodes are reduced by a factor p and which is stretched to maintain
the same effective area as g. This implies that the presence of even a few
long-range connections can help to reach percolation at a lower density of
points. They also introduced another connection function f, which is a
shift-squeezed version of g. That is, the function g is shifted by a
distance s (thus two nodes that are at most a distance s apart will not be
connected) and squeezed so that it still has the same effective area as g.
It turns out (by means of simulation) that long-range edges are more
helpful in the percolation process than short-range edges for a given
density of points. This shows the criticality of long-range edges to the
connectivity of a network. This is usually referred to as the small-world
concept (Watts and Strogatz, 1998).

5 Background on topology control protocols

Topology control can be achieved in various ways: through location based
mechanisms, direction based mechanisms or neighbour based mechanisms, to
name a few (Santi, 2005). Most of the protocols based on such methods try
to set the transmission power or radius at the nodes appropriately so as to
maintain the connectivity of the network. Note that the transmission power
of a node is a measure of how fast its energy depletes.

Location based protocols use information about the position of the nodes.
It is assumed that each node can somehow determine its location accurately
(e.g. using GPS). Examples of protocols that are location based include the
R&M protocol (Rodoplu and Meng, 1999) and the Local Minimal Spanning Tree
(LMST) protocol (Li et al., 2003). The R&M protocol tries to obtain an
optimal topology where every node sends messages (in a multihop fashion) to
the only master node in the network. To do so this protocol requires global
information to be exchanged between nodes, which will lead to message
overhead, especially when the network is highly dynamic. In the LMST
protocol, each node builds a minimal spanning tree based on the information
available about other nodes up to a predefined distance. The transmission
radii of all the nodes are then adjusted to have sufficient power to
communicate with the neighbours of their respective LMSTs.

Direction based protocols assume that each node has the capability to
somehow determine the direction of all its neighbours. Cone Based Topology
Control (CBTC) (Li et al., 2001) is one such protocol, where nodes adjust
their transmission radii so as to communicate with the closest nodes in all
directions. A parameter ρ is used as a step length to discretise the
possible directions in [0, 2π). Bounds on ρ have been determined in Li et
al. (2001) to generate a connected network topology.

In neighbour based topology control the nodes, given their transmission
radii, are required to have a knowledge of their neighbours. The k-Neigh
protocol proposed by Blough et al. (2003) is one such protocol that
controls the topology by keeping track of the number of neighbours. Here it
is assumed that when a node x receives a message from another node y, it
can estimate its distance from y. This protocol is simple and uses only a
few information exchanges between nodes to maintain the topology.

In all the protocols mentioned above, if the nodes are mobile and their
densities vary over time, then a large number of information exchanges
between nodes will be required to maintain the topology (Santi, 2005). Even
though global updates such as the number of active nodes or the
geographical location of nodes can be propagated in the network, this
information may become stale in the event of on/off nodes. Using stale
information to readjust the transmission radius at the nodes might result
in non-desirable topologies.

Local Information No Topology (LINT) (Ramanathan and Rosales-Hain, 2000) is
a neighbour based protocol that specifically takes into account the
mobility of the nodes. When the nodes are mobile, the number of nodes
within a given node's transmission radius varies with time. LINT therefore
uses only the locally available information about a node's current
transmission radius (r_current) and current degree c to maintain
connectivity. If the desired degree for connectivity is d, then under the
assumption of a uniform random distribution of nodes, the required radius
(r_reqd) is calculated using the formula r_reqd = r_current − 5 log(d/c)
(Glauche et al., 2003; Ramanathan and Rosales-Hain, 2000). The propagation
loss function is assumed to vary as some power of distance, in practice
between 2 and 5. An advantage of this protocol is that it does not assume
any information such as the location of nodes or the direction of
neighbours to be present at the nodes. Further, this formula can be used to
both increase or decrease the transmission radius according to d. However,
as discussed in Section 4.2 the critical value for d depends on N (see
Figure 1). Therefore, varying densities over time cannot be handled well
using this protocol.

In this paper, we aim to achieve topology control in a distributed manner,
assuming no global updates to be available at the nodes. Then, to maintain
connectivity, we try to pool in as many nodes as possible into one
connected component and wish to maintain a giant component in the network
throughout its lifetime. Specifically, for the nearest k-neighbours model,
we will show that the critical out-degree k_c required for a giant
component in the network is 5 and is independent of N or λ. In Blough et
al. (to appear) the authors have shown by means of extensive simulation
(10,000 instances of the network of sizes between 50 and 500) that for
nodes distributed uniformly randomly in a unit square, taking k = 6 will
always result in 95% of the nodes being in the largest component. We on the
other hand assume that O(N) nodes in the largest or giant component is
equivalent to 'as good as possible' connectivity and show that 5 is the
magic number. Note that ours is an average case analysis. Due to the
centrality of measures in such networks (Farago, 2002), all except a very
small percentage of instances of the nearest k-neighbours network will have
the same statistical properties as the average case. In other words this
means that when k = 5 or above, the network will have a giant component
with high probability. This is what we show in the next section.

Our interest in this paper is not in developing the protocols for topology
control. Instead we assume that there exist efficient protocols that can
maintain the connectivity of the network, in a distributed and localised
fashion, without the
Decentralised topology control algorithms 207
LINT is one such example. However, it does not specify what the desired number of neighbours d should be. Based on the study of such topology control protocols we extract the necessary conditions and constraints to determine a desirable threshold for the number of neighbours: in this case, a density independent threshold for the presence of giant components in WSNs.

6 Connectivity of WSNs

Here we consider the nearest k-neighbours model where nodes are distributed according to a Poisson point process X of intensity λ in R². Note that, as mentioned in Section 3.2.3, in the nearest k-neighbours model each node forms a directed edge towards its closest k neighbours. We then extract the subnetwork that consists of only the bidirectional edges and study its connectivity properties.

In this section we will obtain an expression for the critical number of neighbours kc (or critical out-degree), such that, in the nearest k-neighbours model, for k < kc there exists no unbounded connected component almost surely, and for k ≥ kc there exists an unbounded connected component almost surely. In particular, we will show that irrespective of the density λ, kc = 5 satisfies the above requirements.

6.1 Critical number of neighbours

When k = 0 it is obvious that the network has no edges and hence no unbounded connected component. We will first show that there exists a kα such that for all k > kα there exists an unbounded connected component almost surely in the nearest k-neighbours model. Then if kc exists it must be ≤ kα.

Let rc(λ) be the critical radius for connectivity of a fixed radius network whose nodes are distributed according to a Poisson point process X of intensity λ in R² (Dall and Christensen, 2002; Raghavan et al., 2005). Suppose each node adjusts its transmission radius to accommodate a desired number of neighbours k. Then a directed edge from a node towards its k neighbours is formed. Let

    kα = inf{k | inf_{i∈X} {r_i | out-degree at all nodes in the network = k} ≥ rc(λ)}

kα is then the smallest k such that the smallest transmission radius required at any node to have k outgoing neighbours is at least rc(λ). Note that even though k is independent of λ, kα might not be. If each node adjusts its transmission radius to form directed edges with at least kα neighbours, there will be edges in both directions between nodes that are no more than a distance rc(λ) apart. Hence the fixed radius network with radius rc(λ) becomes a subgraph of the nearest kα-neighbours network. This is because in the nearest kα-neighbours network each node has a transmission radius of at least rc(λ). Also, since the fixed radius network has an unbounded connected component (by the definition of rc(λ)), so does the nearest kα-neighbours network. Therefore, for a given X and ∀ k ≥ kα, the nearest k-neighbours model has an unbounded connected component.

To determine the value of kα, we know from Cressie (1991) that if Wk is the random variable for the distance of the kth nearest neighbour (k ≥ 1) from a point in X, then the probability density function of Wk is given by

    f(w_k) = 2(πλ)^k w_k^(2k−1) e^(−πλw_k²)/(k − 1)!

Integrating w_k f(w_k), it immediately follows that E(Wk) = k(2k)!/((2^k k!)² λ^(1/2)). In order to obtain kα, let us first calculate P(Wk ≥ rc(λ)).

    P(Wk ≥ rc(λ)) = ∫_{rc(λ)}^{∞} [2/((k − 1)! w_k)] (πλw_k²)^k e^(−πλw_k²) dw_k    (1)

By changing the variable as x = πλw_k², we get

    P(Wk ≥ rc(λ)) = ∫_{πλrc²(λ)}^{∞} [x^(k−1) e^(−x)/(k − 1)!] dx
                  = Σ_{y=0}^{k−1} (λπrc²(λ))^y e^(−λπrc²(λ))/y!    (2)

Refer to Hogg et al. (2005) for the right hand side equation. But we also know that the critical average degree for a fixed radius model to percolate is approximately 4.51 (Dall and Christensen, 2002; Quantanilla et al., 2000). This means that in order to obtain an average degree of 4.51 we must set rc(λ) = √(4.51/(πλ)). Note that if we fix r = √(d/(πλ)) in the fixed radius network model, then the expected degree of a node in the network is d. Substituting for rc(λ) in Equation (2), we get

    P(Wk ≥ rc(λ)) = Σ_{y=0}^{k−1} (4.51)^y e^(−4.51)/y!    (3)

and this probability is independent of λ. Thus kα, which is now the smallest k such that the probability in Equation (3) is 1, is in fact independent of λ. However, the above probability tends to 1 only as k → ∞. But we see that even for k around 10 this probability is more than 0.98, and for k about 15 it is arbitrarily close to 1. Hence we can safely assume that k = 15 yields a network in which each node has a transmission radius of at least rc(λ). Thus by the definition of rc(λ) this network will have an unbounded connected component. Requiring every node to have a transmission radius of at least rc(λ) gives a pessimistic estimate of kc. If on the other hand we find the smallest value of k such that the expected transmission radius of the nodes in the network is at least rc(λ), then we need

    E(Wk) = k(2k)!/((2^k k!)² λ^(1/2)) ≥ √(4.51/(πλ)) = rc(λ)    (4)

and this implies

    k(2k)!/(2^k k!)² ≥ √(4.51/π)    (5)

and we see that k = 5 is the smallest value for which the above inequality is satisfied (see Figure 4). This also implies that k = 4 is the largest value for which the above inequality is not satisfied.
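The numbers quoted above are easy to verify directly. The short Python sketch below (our check, not part of the original derivation) evaluates the sum in Equation (3) and searches for the smallest k satisfying inequality (5), using the percolation threshold 4.51 quoted above:

```python
import math

# Eq. (3): P(W_k >= r_c) = sum_{y=0}^{k-1} 4.51^y e^{-4.51} / y!
def p_tail(k, a=4.51):
    return sum(a**y * math.exp(-a) / math.factorial(y) for y in range(k))

# Left-hand side of inequality (5): k (2k)! / (2^k k!)^2
def lhs(k):
    return k * math.factorial(2 * k) / (2**k * math.factorial(k))**2

rhs = math.sqrt(4.51 / math.pi)  # right-hand side of (5)

print(round(p_tail(10), 3), round(p_tail(15), 3))     # -> 0.983 1.0
print(min(k for k in range(1, 20) if lhs(k) >= rhs))  # -> 5
```

The output confirms that the tail probability is already above 0.98 near k = 10, essentially 1 by k = 15, and that k = 5 is the smallest value satisfying inequality (5).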
208 U.N. Raghavan and S.R.T. Kumara
Hence for values of k up to 4, the nearest k-neighbours model does not have an unbounded connected component (by the definition of rc(λ)). Further, this is true irrespective of the density of nodes in the network. Simulation results also agree that for k = 5 and above the nearest k-neighbours model of any density λ has an unbounded connected component, while for k < 5 there exists no giant component (see Figure 4).

Figure 4 For each fixed k, the graphs show how the size of the largest connected component grows as the number of nodes N increases. Note that while there exists no giant component for k = 3, 4, it appears suddenly for k = 5.

6.2 Simulation results

To verify the above analysis using simulation, we fixed the size of the sensing region to be a unit square. The nodes are placed according to a uniform Poisson point process in the sensing region (refer to Section 3.1). We fixed the size of the network to N and the number of neighbours to k in each simulation. We varied N from 50 to 5000 and k from 3 to 6. We ran 10 experiments each for a fixed N and k. The results from these experiments are plotted as graphs in Figure 4.

Note that for k = 5 approximately 95% of the nodes are in the giant component. For k = 4 the size of the largest component scales sublinearly with N. This implies that for N large enough, no matter what fraction of N (less than or equal to 95%) we want in the largest component, the corresponding critical value for k will be 5. That is, irrespective of whether we need only 95% or 90% or 75% of the nodes in the largest component, the optimal value for k does not change. While the node degree is bounded up to this point, it becomes unbounded and the optimal k increases rapidly as the fraction of nodes in the largest component increases from 95% to 100%. In fact, for full connectivity k should be at least 5.1774 log(N) (Xue and Kumar, 2004).

6.3 Energy consumption in the nearest neighbours model

The sum of the transmission radii at all the nodes in a WSN is a measure of the energy consumed in the network. In this section we will show that for the same number of neighbours, both the fixed radius model and the nearest neighbours model consume approximately the same amount of energy. To show this, we need the following.

Suppose we restrict the Poisson point process X of intensity λ in R² to a finite region, say a unit square. Then, let the number of nodes in this finite region be N. The length of a graph is the sum of the lengths of its edges. Hence for the kth nearest neighbour graph, in which each node is adjacent to its kth closest neighbour, the length L_{k,N} is given by Avram and Bertsimas (1993),

    lim_{N→∞} E(L_{k,N}/N^(1/2)) = [1/(2π^(1/2))] Σ_{j=1}^{k} Γ(j − 1/2)/(j − 1)!    (6)

Therefore the expected sum of the transmission radii of the sensor nodes in the nearest k-neighbours model is the same as the expected length of the kth nearest neighbour graph. Also, the sum of the transmission radii (L_{r,N}) in a fixed radius model is √(dN/π), where d is the desired connectivity. Taking k = 5 and N sufficiently large in Equation (6), we get E(L_{k,N}) ≈ 2.1809(N/π)^(1/2). Also, for the same connectivity, that is taking d = 5, we have E(L_{r,N}) ≈ 2.2361(N/π)^(1/2). On comparison we see that for a fixed N and k = d, L_{r,N} ≈ L_{k,N}.

6.4 Strongly connected components

In the nearest k-neighbours model we only considered the bidirectional edges and ignored the presence of unidirectional ones. Hence if we study the network for the appearance of giant strongly connected components, then the critical value for k is 4 (see Figure 5). By a strongly connected network we mean that for any pair of nodes x and y, there exists a directed path both from x to y and from y to x.

Figure 5 For each fixed k, the graphs show how the size of the largest strongly connected component grows as the number of nodes N increases. Note that while there exists no giant component for k up to 3, it suddenly appears for k = 4.

7 Topology control algorithm to maintain connectivity in a distributed WSN

A simple algorithm for topology control to obtain/maintain connectivity is to adjust the transmission radius on the nodes so that the number of neighbours (in the directed sense, or simply the out-degree) is 5.
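A minimal sketch of such a neighbour-based adjustment rule is given below, assuming each node can estimate distances to the nodes it hears (as in the k-Neigh protocol). The helper is illustrative only and is not the paper's protocol specification:

```python
# Sketch of the neighbour-based rule: each node sets its transmission
# radius just large enough to reach its k = 5 nearest neighbours.
# `neighbour_distances` would come from received beacon messages.

K = 5  # critical out-degree derived in Section 6


def choose_radius(neighbour_distances, k=K):
    """Return the smallest radius covering the k nearest neighbours."""
    if len(neighbour_distances) < k:
        # Too few nodes heard: fall back to maximum power.
        return float("inf")
    return sorted(neighbour_distances)[k - 1]


# Example: distances (in metres) estimated from six heard neighbours.
print(choose_radius([12.0, 3.5, 7.2, 9.9, 4.1, 15.3]))  # -> 12.0
```

Each node runs this locally and periodically, so the rule needs no global information such as the current density λ.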
Due to the distributed and localised nature of this algorithm, it is scalable to a large number of nodes in the sensing region. This nature also helps the network to adapt to constantly changing densities and to the mobility of the nodes. Further, the node degrees are bounded by k, keeping interference low.

The critical values derived in the previous section are, in some sense, the optimal node degrees for the worst case scenario. This is because for any node degree less than 5 the network does not have a giant component of bidirectional links, and for any node degree less than 4 the network does not have a giant strongly connected component. Therefore any decentralised neighbour-based topology control protocol can employ higher values for k than derived above. However, the aim should be to ensure that in the worst case k does not drop below 5 (or 4).

8 Conclusion

In this paper, we have considered the efficient maintenance of connectivity of a wireless sensor network. In addition to energy efficient values for the critical degree of the nodes, we considered the constraints and requirements from the view of topology control protocols when the network is highly dynamic. Specifically, in the presence of mobile on/off nodes, it is desirable for topology control protocols to use only localised information in a distributed manner. We therefore assume that no global update, such as the current density λ, is available at the nodes. We use a neighbour based topology control because it does not require any information such as the location of nodes or the direction of neighbours, which is desirable in the presence of mobile nodes (Santi, 2005). In such a case, we have shown that when nodes adjust their transmission radius to maintain a fixed out-degree k, then 5 is the critical threshold beyond which a giant component exists almost surely in the network. Further, this is true irrespective of any change in λ as time varies. To our knowledge we are among the first to provide an analytical treatment of this problem. Such density independent thresholds are especially helpful in the efficient maintenance of the topology in the presence of mobile on/off nodes.

Acknowledgements

This work has been supported by the National Science Foundation, USA, under the grant NSF-SST 0427840. Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Albert, R. and Barabasi, A.L. (2002) ‘Statistical mechanics of complex networks’, Reviews of Modern Physics, Vol. 74, No. 1, pp.47–97.

Appel, M.J.B. and Russo, R. (2002) ‘The connectivity of a graph on uniform points on [0, 1]^d’, Statistics and Probability Letters, Vol. 60, pp.351–357.

Avram, F. and Bertsimas, D. (1993) ‘On central limit theorems in geometrical probability’, The Annals of Applied Probability, Vol. 3, No. 4, pp.1033–1046.

Bettstetter, C. (2002a) ‘On the connectivity of wireless multihop networks with homogeneous and inhomogeneous range assignment’, Proceedings of the IEEE Vehicular Technology Conference, Vol. 3, pp.1706–1710.

Bettstetter, C. (2002b) ‘On the minimum node degree and connectivity of a multihop wireless network’, Proceedings of the ACM MobiHoc, pp.80–91.

Bharathidasan, A. and Ponduru, V.A.S. (2005) ‘Sensor networks: an overview’, Available at: http://www.cs.binghamton.edu/~kliu/survey.pdf.

Blough, D., Leoncini, M., Resta, G. and Santi, P. (2003) ‘The k-neigh protocol for symmetric topology control in ad hoc networks’, Proceedings of the IEEE MobiHoc, pp.141–152.

Blough, D.M., Leoncini, M., Resta, G. and Santi, P. (to appear) ‘The k-neighbors approach to interference bounded and symmetric topology control in ad hoc networks’, IEEE Transactions on Mobile Computing.

Bollobas, B. (1985) Random Graphs, Orlando, FL: Academic Press.

Booth, L., Bruck, J., Franceschetti, M. and Meester, R. (2003) ‘Covering algorithms, continuum percolation and the geometry of wireless networks’, The Annals of Applied Probability, Vol. 13, No. 2, pp.722–741.

Cerpa, A. and Estrin, D. (2002) ‘Ascent: adaptive self-configuring sensor networks topologies’, Proceedings of the IEEE INFOCOM, Vol. 3, pp.1278–1287.

Cressie, N.A.C. (1991) Statistics for Spatial Data, Wiley Series in Probability and Mathematical Statistics, USA: John Wiley and Sons.

Dall, J. and Christensen, M. (2002) ‘Random geometric graphs’, Physical Review E, Vol. 66, 016121.

Estrin, D., Govindan, R., Heidemann, J. and Kumar, S. (1999) ‘Next century challenges: scalable coordination in sensor networks’, Proceedings of the ACM MobiCom, pp.263–270.

Farago, A. (2002) ‘Scalable analysis and design of ad hoc networks via random graph theory’, Proceedings of Dial-M, pp.43–50.

Farago, A. (2004) ‘On the fundamental limits of topology control’, Proceedings of the Joint Workshop on Foundations of Mobile Computing DIALM-POMC, pp.1–7.

Franceschetti, M., Booth, L., Cook, M., Meester, R. and Bruck, J. (2003) ‘Percolation in multi-hop wireless networks’, IEEE Transactions on Information Theory, Available at: http://www.paradise.caltech.edu/papers/etr055.pdf.

Glauche, I., Krause, W., Sollacher, R. and Greiner, M. (2003) ‘Continuum percolation of wireless ad hoc communication networks’, Physica A, Vol. 325, pp.577–600.

Goldsmith, A.J. and Wicker, S.B. (2002) ‘Design challenges for energy-constrained ad hoc wireless networks’, IEEE Wireless Communications, Vol. 9, No. 4, pp.8–27.

Gupta, P. and Kumar, P.R. (1998) ‘Critical power for asymptotic connectivity in wireless networks’, in Stochastic Analysis, Control, Optimization and Applications: A Volume in Honor of W.H. Fleming, Boston: Birkhauser, pp.547–566.

Hac, A. (2003) Wireless Sensor Network Designs, England: John Wiley and Sons.

Hogg, R.V., McKean, J.W. and Craig, A.T. (2005) Introduction to Mathematical Statistics, USA: Pearson Prentice Hall.
Hou, T. and Li, V.O.K. (1986) ‘Transmission range control in multihop packet radio networks’, IEEE Transactions on Communications, Vol. 34, No. 1, pp.38–44.

Kleinrock, L. and Silvester, J. (1978) ‘Optimum transmission radii for packet radio networks or why six is a magic number’, Proceedings of the IEEE National Telecommunications Conference, pp.4.3.1–4.3.5.

Krishnamachari, B., Wicker, S.B., Bejar, R. and Pearlman, M. (2002) ‘Critical density thresholds in distributed wireless networks’, in Communications, Information and Network Security, Kluwer Publishers.

Li, L., Halpern, J.Y., Bahl, P., Wang, Y.M. and Wattenhofer, R. (2001) ‘Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks’, Proceedings of the ACM Symposium on Principles of Distributed Computing, pp.264–273.

Li, N., Hou, J.C. and Sha, L. (2003) ‘Design and analysis of an MST-based topology control algorithm’, Proceedings of the IEEE INFOCOM, Vol. 3, pp.1702–1712.

Meester, R. and Roy, R. (1996) Continuum Percolation, Cambridge, UK: Cambridge University Press.

Penrose, M.D. (1999) ‘On k-connectivity for a geometric random graph’, Random Structures and Algorithms, Vol. 15, No. 2, pp.145–164.

Penrose, M.D. (2003) Random Geometric Graphs, Oxford Studies in Probability, Oxford: Oxford University Press.

Philips, T.K., Panwar, S.S. and Tantawi, A.N. (1989) ‘Connectivity properties of a packet radio network model’, IEEE Transactions on Information Theory, Vol. 35, No. 5, pp.1044–1047.

Quantanilla, J., Torquato, S. and Ziff, R.M. (2000) ‘Efficient measurement of the percolation threshold for fully penetrable discs’, Journal of Physics A: Mathematical and General, Vol. 33, pp.L399–L407.

Quantanilla, J. (2001) ‘Measurement of the percolation threshold for fully penetrable discs of different radii’, Physical Review E, Vol. 63, 061108.

Raghavan, U.N., Thadakamalla, H.P. and Kumara, S.R.T. (2005) ‘Phase transitions and connectivity of distributed wireless sensor networks’, Advanced Computing and Communications 2005.

Ramanathan, R. and Rosales-Hain, R. (2000) ‘Topology control of multihop wireless networks using transmit power adjustment’, Proceedings of the IEEE INFOCOM, pp.404–413.

Rodoplu, V. and Meng, T.H. (1999) ‘Minimum energy mobile wireless networks’, IEEE Journal on Selected Areas in Communications, Vol. 17, No. 8, pp.1333–1344.

Santi, P. (2005) Topology Control in Wireless Ad Hoc and Sensor Networks, Chichester, UK: John Wiley and Sons.

Takagi, H. and Kleinrock, L. (1984) ‘Optimal transmission ranges for randomly distributed packet radio terminals’, IEEE Transactions on Communications, Vol. 32, No. 3, pp.246–257.

Watts, D.J. and Strogatz, S.H. (1998) ‘Collective dynamics of small-world networks’, Nature, Vol. 393, pp.440–442.

Xue, F. and Kumar, P.R. (2004) ‘The number of neighbors needed for connectivity of wireless networks’, Wireless Networks, Vol. 10, pp.169–181.

Ye, W. and Heidemann, J. (2003) ‘Medium access control in wireless sensor networks’, USC/ISI Technical Report, ISI-TR-580.
OPERATIONS RESEARCH AND MANAGEMENT SCIENCE HANDBOOK

Editor: A. Ravi Ravindran
The Pennsylvania State University

December 1, 2006

Contents

11 Complexity and Large-scale Networks  1
   11.1 Introduction  1
   11.2 Statistical properties of complex networks  8
        11.2.1 Average path length and the small-world effect  8
        11.2.2 Clustering coefficient  10
        11.2.3 Degree distribution  10
        11.2.4 Betweenness centrality  12
        11.2.5 Modularity and community structures  13
        11.2.6 Network resilience  14
   11.3 Modeling of complex networks  16
        11.3.1 Random graphs  16
        11.3.2 Small-world networks  21
        11.3.3 Scale-free networks  23
   11.4 Why “Complex” Networks  24
   11.5 Optimization in complex networks  27
        11.5.1 Network resilience to node failures  27
        11.5.2 Local search  30
        11.5.3 Other topics  36
   11.6 Conclusions  39
Chapter 11

Complexity and Large-scale Networks

Hari P. Thadakamalla¹, Soundar R. T. Kumara¹ and Réka Albert²

¹Dept. of Industrial & Manufacturing Engineering, The Pennsylvania State University
²Dept. of Physics, The Pennsylvania State University

11.1 Introduction

In the past few decades, graph theory has been a powerful analytical tool for understanding
and solving various problems in operations research (OR). The study of graphs (or networks) traces back to the solution of the Königsberg bridge problem by Euler in 1735. In Königsberg, the river Pregel flows through the town, dividing it into four land areas A, B, C and D as
shown in figure 11.1 (a). These land areas are connected by seven (1 - 7) different bridges.
The Königsberg bridge problem is to find whether it is possible to traverse through the
city on a route that crosses each bridge exactly once, and return to the starting point.
Euler formulated the problem using a graph theoretical representation and proved that the
traversal is not possible. He represented each land area as a vertex (or node) and each bridge
as an edge between two nodes (land areas) as shown in figure 11.1 (b). Then, he posed the question of whether there exists a path that passes through every edge exactly once and ends
at the start node. This path was later termed an Eulerian Circuit. Euler proved that for a
graph to have an Eulerian Circuit, all the nodes in the graph need to have an even degree.
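Euler's parity condition is easy to check in code. The sketch below encodes the seven bridges as an edge list (a standard textbook labelling of the incidences in figure 11.1; the exact assignment of bridges to land areas is an assumption here) and tests whether every land area has even degree:

```python
from collections import Counter

# Seven bridges of Königsberg as (land area, land area) pairs,
# a common textbook encoding of the graph in figure 11.1(b).
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

print(dict(degree))  # -> {'A': 5, 'B': 3, 'C': 3, 'D': 3}
# An Eulerian circuit requires every degree to be even.
print(all(d % 2 == 0 for d in degree.values()))  # -> False
```

All four land areas have odd degree, so no Eulerian circuit exists, which is exactly Euler's conclusion.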

Euler’s great insight lay in representing the Königsberg bridge problem as a graph problem
with a set of vertices and edges. Later, in the twentieth century, graph theory developed into a substantial area of study, which is applied to solve various problems in engineering and
several other disciplines [7]. For example, consider the problem of finding the shortest route
between two geographical points. The problem can be modeled as a shortest path problem
on a network, where different geographical points are represented as nodes and they are
connected by an edge if there exists a direct path between the two nodes. The weights on
the edges represent the distance between the two nodes (see figure 11.2). Let the network
be G(V, E) where V is the set of all nodes, E is the set of edges (i, j) connecting the nodes
and w is a function such that wij is the weight of the edge (i, j). The shortest path problem
from node s to node t can be formulated as follows.

    minimize    Σ_{(i,j)∈E} w_ij x_ij

    subject to  Σ_{j:(i,j)∈E} x_ij − Σ_{j:(j,i)∈E} x_ji =  { 1 if i = s;  −1 if i = t;  0 otherwise }

                x_ij ≥ 0,  ∀ (i, j) ∈ E.

where x_ij = 1 or 0 depending on whether the edge from node i to node j belongs to the optimal path or not, respectively. Many algorithms have been proposed to solve the shortest path problem [7]. Using one such popular algorithm (Dijkstra's algorithm [7]), we find the shortest path from node 10 to node 30 as 10 - 1 - 3 - 12 - 30 (see figure 11.2). Note that this problem, like the other problems considered in traditional graph theory, requires finding the exact optimal path.
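For comparison with the formulation above, a compact implementation of Dijkstra's algorithm is sketched below. The graph and its edge weights are illustrative stand-ins, not the exact values of figure 11.2:

```python
import heapq

def dijkstra(graph, s, t):
    """Shortest path from s to t; graph maps node -> {neighbour: weight}."""
    dist, prev = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1], dist[t]

# Hypothetical weights on a few of the node labels from figure 11.2.
g = {10: {1: 2, 2: 6}, 1: {3: 3}, 2: {3: 4}, 3: {12: 1}, 12: {30: 5}, 30: {}}
print(dijkstra(g, 10, 30))  # -> ([10, 1, 3, 12, 30], 11)
```

The algorithm returns a single exact optimal path, which is precisely the kind of answer traditional graph theory seeks.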

In the last few years there has been an intense amount of activity in understanding and characterizing large-scale networks, which has led to the development of a new branch of science called “Network science” [108].

Figure 11.1: Königsberg bridge problem. (a) Shows the river flowing through the town, dividing it into four land areas A, B, C, and D. The land areas are connected by seven bridges numbered from 1 to 7. (b) Graph theoretical representation of the Königsberg bridge problem. Each node represents a land area and the edges between them represent the bridges connecting the land areas.

Figure 11.2: Illustration of a typical optimization problem in OR. The objective is to find the shortest path from node 10 to node 30. The values on the edges represent the distance between two nodes. Here we use the exact distances between different nodes to calculate the shortest path 10 - 1 - 3 - 12 - 30.

The scale of these networks is substantially different from that of the networks considered in traditional graph theory, and so are the problems posed on them. These large-scale networks are referred to as complex networks; we will discuss the reasons why they are termed “complex” networks later, in section 11.4. The following are examples of complex networks:

• World Wide Web: It can be viewed as a network where web pages are the nodes and
hyperlinks connecting one webpage to another are the directed edges. The World Wide
Web is currently the largest network for which topological information is available. It
had approximately one billion nodes at the end of 1999 [89] and is continuously growing
at an exponential rate. A recent study [66] estimated the size to be 11.5 billion nodes
as of January 2005.

• Internet: The Internet is a network of computers and telecommunication devices connected by wired or wireless links. The topology of the Internet is studied at two
different levels [55]. At the router level, each router is represented as a node and
physical connections between them as edges. At the domain level, each domain (autonomous system, or Internet Service Provider) is represented as a node and inter-domain connections by edges. There were approximately 150,000 nodes at the router level in 2000 [61] and about 4000 nodes at the domain level in 1999 [55].

• Phone call network: The phone numbers are the nodes and every completed phone call is an edge directed from the caller to the receiver. Abello et al. [4] constructed a phone call network from the long distance telephone calls made during a single day, which had 53,767,087 nodes and over 170 million edges.

• Power grid network : Generators, transformers, and substations are the nodes and
high-voltage transmission lines are the edges. The power grid network of the western
United States had 4941 nodes in 1998 [143]. The North American power grid consisted
of 14, 099 nodes and 19, 657 edges [16] in 2005.

• Airline network : Nodes are the airports and an edge between two airports represent the
presence of a direct flight connection [29, 65]. Barthelemy et al. [29] have analyzed the
International Air Transportation Association database to form the world-wide airport
network. The resulting network consisted of 3880 nodes and 18810 edges in 2002.

• Market graph: Recently, Boginski et al. [32, 33] represented the stock market data
as a network where the stocks are nodes and two nodes are connected by an edge if
their correlation coefficient calculated over a period of time exceeds a certain threshold value. The network had 6556 nodes and 27,885 edges for the U.S. stock data during
the period 2000-2002 [33].

• Scientific collaboration networks: Scientists are represented as nodes and two nodes
are connected if the two scientists have written an article together. Newman [99,
100] studied networks constructed from four different databases spanning biomedical
research, high-energy physics, computer science and physics. One of these networks, formed from the Medline database for the period from 1961 to 2001, had 1,520,251 nodes and 2,163,923 edges.

• Movie actor collaboration network : Another well studied network is the movie actor
collaboration network, formed from the Internet Movie Database [1], which contains all the movies and their casts since the 1890s. Here again, the actors are represented as nodes
and two nodes are connected by an edge if the two actors have performed together in
a movie. This is a continuously growing network with 225, 226 nodes and 13, 738, 786
edges in 1998 [143].
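Of the examples above, the market graph has the most explicit construction rule, and it can be sketched in a few lines. The toy return series and the threshold 0.5 below are purely illustrative:

```python
import itertools
import statistics

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy daily return series for four hypothetical stocks.
returns = {
    "S1": [0.01, 0.02, -0.01, 0.03],
    "S2": [0.011, 0.019, -0.012, 0.028],  # moves closely with S1
    "S3": [-0.02, 0.01, 0.00, -0.01],
    "S4": [0.02, -0.01, 0.01, 0.00],
}

threshold = 0.5  # illustrative; the cited studies tune this value
edges = [(a, b) for a, b in itertools.combinations(returns, 2)
         if correlation(returns[a], returns[b]) > threshold]
print(edges)  # -> [('S1', 'S2')]
```

Only the strongly correlated pair is connected, mirroring how Boginski et al. obtain a sparse network from dense correlation data.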

The above are only a few examples of complex networks pervasive in the real world
[13, 31, 49, 101]. Tools and techniques developed in the field of traditional graph theory were designed for networks of tens, hundreds, or in extreme cases thousands of nodes. The substantial growth in the size of many such networks [see figure 11.3] necessitates
a different approach for analysis and design. The new methodology applied for analyzing
complex networks is similar to the statistical physics approach to complex phenomena.

The study of large-scale complex systems has always been an active research area in
various branches of science, especially in the physical sciences. Some examples are: ferromagnetic properties of materials, statistical description of gases, diffusion, formation of crystals, etc. For instance, let us consider a box containing one mole (6.022 × 10^23) of gas atoms as our system of analysis [see figure 11.4(a)].

Figure 11.3: Pictorial description of the change in scale in the size of the networks found in many engineering systems. This change in size necessitates a change in the analytical approach.

If we represent the system with the microscopic properties of the individual particles such as their position and velocity, then it
would be next to impossible to analyze the system. Rather, physicists use statistical me-
chanics to represent the system and calculate macroscopic properties such as temperature,
pressure etc. Similarly, in networks such as the Internet and WWW, where the number
of nodes is extremely large, we have to represent the network using macroscopic properties
(such as degree distribution, edge-weight distribution etc), rather than the properties of in-
dividual entities in the network (such as the neighbors of a given node, the weights on the
edges connecting this node to its neighbors, etc.) [see figure 11.4(b)]. Now let us consider the shortest path problem in such networks (for instance, the WWW). We rarely require specific shortest path solutions, such as the path from node A to node B (from webpage A to webpage B). Rather, it is useful to know the average distance (number of hops) from any node to any other node (from any webpage to any other webpage) in order to understand dynamical processes (such as search in the WWW). This new approach for understanding networked systems provides
new techniques as well as challenges for solving conceptual and practical problems in this
field. Furthermore, this approach has become feasible and received a considerable boost by
the availability of computers and communication networks which have made the gathering
and analysis of large-scale data sets possible.
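To make this concrete, the macroscopic quantities mentioned here (degree distribution and average inter-node distance) can be estimated with a short script. Below, a randomly generated graph stands in for a real network; the size and edge probability are arbitrary illustrative choices:

```python
import random
from collections import Counter, deque

random.seed(1)            # reproducible toy network
n, p = 200, 0.04          # arbitrary size and edge probability
adj = {i: set() for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if random.random() < p:
            adj[i].add(j)
            adj[j].add(i)

# Macroscopic view 1: the degree distribution.
deg_dist = Counter(len(adj[i]) for i in range(n))
print(deg_dist.most_common(3))

# Macroscopic view 2: average distance (hops) over reachable pairs,
# computed by breadth-first search from every node.
def bfs(s):
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

total = pairs = 0
for s in range(n):
    d = bfs(s)
    total += sum(d.values())
    pairs += len(d) - 1
print(round(total / pairs, 2))  # a small average distance (small-world)
```

No individual node or edge matters here; the two printed summaries are exactly the kind of statistical description the text argues for.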

The objective of this chapter is to introduce this new direction of inter-disciplinary research (Network Science) and discuss the new challenges it poses for the OR community.

Figure 11.4: Illustration of the analogy between a box of gas atoms and complex networks. (a) A mole of gas atoms (6.022 × 10^23 atoms) in a box. (b) An example of a large-scale network. For analysis, we need to represent both systems using statistical properties.

During the last few years there has been a tremendous amount of research activity dedicated to the study of these large-scale networks. This activity was mainly triggered by significant findings in real-world networks, which we will elaborate on later in the chapter. There was a revival of network modeling which gave rise to many path-breaking results [13, 31, 49, 101] and
provoked vivid interest across different disciplines of the scientific community. Until now,
a major part of this research was contributed by physicists, mathematicians, sociologists
and biologists. However, the ultimate goal of modeling these networks is to understand and
optimize the dynamical processes taking place in the network. In this chapter, we address
the urgent need and opportunity for the OR community to contribute to the fast-growing
inter-disciplinary research on Network Science. The methodologies and techniques developed thus far will definitely aid the OR community in furthering this research.

The following is the outline of the chapter. In section 11.2, we introduce different sta-
tistical properties that are prominently used for characterizing complex networks. We also
present the empirical results obtained for many real complex networks that initiated a revival
of network modeling. In section 11.3, we summarize different evolutionary models proposed
to explain the properties of real networks. In particular, we discuss Erdős-Rényi random
graphs, small-world networks, and scale-free networks. In section 11.4, we discuss briefly
why these networks are called “complex” networks, rather than large-scale networks. We
summarize typical behaviors of complex systems and demonstrate how the real networks
have these behaviors. In section 11.5, we discuss optimization in complex networks by concentrating on two specific processes, robustness and local search, which are most relevant
to engineering networks. We discuss the effects of statistical properties on these processes
and demonstrate how they can be optimized. Further, we briefly summarize a few more important topics and give references for further reading. Finally, in section 11.6, we conclude
and discuss future research directions.

11.2 Statistical properties of complex networks

In this section, we explain some of the statistical properties which are prominently used in the
literature. These statistical properties help in classifying different kinds of complex networks.
We discuss the definitions and present the empirical findings for many real networks.

11.2.1 Average path length and the small-world effect

Let G(V, E) be a network where V is the collection of entities (or nodes) and E is the set of arcs (or edges) connecting them. A path between two nodes u and v in the network G is a sequence [u = u_1, u_2, ..., u_n = v], where the u_i are nodes in G and there exists an edge from u_{i−1} to u_i in G for all i. The path length is defined as the sum of the weights on the edges along the path. If all the edges in the network are equivalent, then the path length is equal to the number of edges (or hops) along the path. The average path length (l) of a connected network is the average of the shortest path lengths from each node to every other node in the network. It is given by

l ≡ ⟨d(u, w)⟩ = (1/(N(N − 1))) Σ_{u∈V} Σ_{w∈V, w≠u} d(u, w),

where N is the number of nodes in the network and d(u, w) is the shortest path distance between u and w. Table 11.1 shows the values of l for many different networks. We observe that despite the large size of the networks (w.r.t. the number of nodes), the average path length is small. This implies that any node can reach any other node in the network in a relatively small

Table 11.1: Average path length of many real networks. Note that despite the large size of
the network (w.r.t. the number of nodes), the average path length is very small.

Network                          Size (number of nodes)    Average path length
WWW [37]                         2 × 10^8                  16
Internet, router level [61]      150,000                   11
Internet, domain level [55]      4,000                     4
Movie actors [143]               212,250                   4.54
Electronic circuits [75]         24,097                    11.05
Peer-to-peer network [122]       880                       4.28

number of steps. This characteristic phenomenon, that most pairs of nodes are connected
by a short path through the network, is called the small-world effect.
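For an unweighted network, l can be computed directly from the definition above by running a breadth-first search from every node. The following is a minimal Python sketch, not taken from the chapter; the 4-node path graph at the end is a made-up example:

```python
from collections import deque

def average_path_length(adj):
    """Average shortest-path length l = sum of d(u, w) over ordered pairs,
    divided by N(N - 1), computed by a breadth-first search (BFS) from
    every node. `adj` maps each node to a list of its neighbors; the
    network is assumed to be connected and unweighted."""
    nodes = list(adj)
    total = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # d(source, source) = 0 adds nothing
    n = len(nodes)
    return total / (n * (n - 1))

# A 4-node path graph 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(average_path_length(path))  # 5/3, about 1.667
```

Each BFS costs O(N + E), so the whole computation is O(N(N + E)); for networks the size of the WWW this is exactly why one works with statistical estimates of l rather than exact enumeration.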

The existence of the small-world effect was first demonstrated by the famous experiment
conducted by Stanley Milgram in the 1960s [92], which led to the popular concept of six degrees of separation. In this experiment, Milgram randomly selected individuals from Wichita, Kansas, and Omaha, Nebraska, to pass on a letter to one of their acquaintances by mail.
These letters had to finally reach a specific person in Boston, Massachusetts; the name and
profession of the target was given to the participants. The participants were asked to send
the letter to one of their acquaintances whom they judged to be closer (than themselves) to
the target. Anyone who received the letter subsequently would be given the same information
and asked to do the same until it reached the target person. Over many trials, the average
length of these acquaintance chains for the letters that reached the targeted node was found
to be approximately 6. That is, there is an acquaintance path of average length 6 in the
social network of people in the United States. We will discuss another interesting and even
more surprising observation from this experiment in section 11.5.2. Currently, Watts et al.
[145] are doing an Internet-based study to verify this phenomenon.

Mathematically, a network is considered to be small-world if the average path length scales logarithmically or slower with the number of nodes N (l ∼ log N). For example, if the number of nodes in the network, N, increases from 10^3 to 10^6, then the average path length will increase approximately from 3 to 6. This phenomenon has critical implications for the
dynamic processes taking place in the network. For example, if we consider the spread
of information, computer viruses, or contagious diseases across a network, the small-world

phenomenon implies that within a few steps it could spread to a large fraction of the nodes in most real networks.

11.2.2 Clustering coefficient

The clustering coefficient characterizes the local transitivity and order in the neighborhood
of a node. It is measured in terms of the number of triangles (3-cliques) present in the network.
Consider a node i which is connected to k_i other nodes. The maximum possible number of edges among these k_i neighbors, each of which would close a triangle with node i, is k_i(k_i − 1)/2. The clustering coefficient of node i is the ratio of the number of edges E_i that actually exist between these k_i nodes to the total number k_i(k_i − 1)/2 possible, i.e.

C_i = 2E_i / (k_i(k_i − 1))

The clustering coefficient of the whole network (C) is then the average of the C_i over all the nodes in the network, i.e. C = (1/N) Σ_i C_i (see figure 11.5). The clustering coefficient is high for many real networks [13, 101]. In other words, in many networks, if node A is connected to node B and node C, then there is a high probability that node B and node C are also connected. With respect to social networks, this means that it is highly likely that two friends of a person are also friends of each other, a feature analyzed in detail in the so-called theory of balance [43].
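The two formulas above translate directly into code. The following is a small Python sketch (the example graph, a triangle with a pendant node, is hypothetical):

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)), where E_i is the number of edges
    that actually exist among the k_i neighbors of node i."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: nodes with degree < 2 contribute 0
    e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * e / (k * (k - 1))

def network_clustering(adj):
    """C = (1/N) * sum of C_i over all nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)

# Triangle 0-1-2 with a pendant node 3 attached to node 0.
g = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(clustering_coefficient(g, 0))  # 1 edge among 3 neighbors -> 1/3
print(network_clustering(g))         # (1/3 + 1 + 1 + 0) / 4 = 7/12
```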

11.2.3 Degree distribution

The degree of a node is the number of edges incident on it. In a directed network, a node has
both an in-degree (number of incoming edges) and an out-degree (number of outgoing edges).
The degree distribution of the network is the function p_k, where p_k is the probability that a randomly selected node has degree k. Here again, a directed graph has both in-degree and
out-degree distributions. It was found that most of the real networks including the World
Wide Web [5, 14, 88], the Internet [55], metabolic networks [77], phone call networks [4, 8],
11.2. STATISTICAL PROPERTIES OF COMPLEX NETWORKS 11

Figure 11.5: Calculating the clustering coefficient of a node and the network. For example,
node 1 has degree 5 and the number of edges between the neighbors is 3. Hence, the clustering
coefficient for node 1 is 3/10. The clustering coefficient of the entire network is the average
of the clustering coefficients at each individual nodes (109/180).

scientific collaboration networks [26, 99], and movie actor collaboration networks [12, 19, 25]
follow a power-law degree distribution (p(k) ∼ k −γ ), indicating that the topology of the
network is very heterogeneous, with a high fraction of small-degree nodes and few large
degree nodes. These networks having power-law degree distributions are popularly known
as scale-free networks. These networks were called as scale-free networks because of the lack
of a characteristic degree and the broad tail of the degree distribution. Figure 11.6 shows
the empirical results for the Internet at the router level and co-authorship network of high-
energy physicists. The following are the expected values and variances of the node degree in
scale-free networks,
 
E[k] = finite if γ > 2, and ∞ otherwise;    V[k] = finite if γ > 3, and ∞ otherwise,

where γ is the power-law exponent. Note that the variance of the node degree is infinite when γ < 3, and the mean is infinite when γ < 2. The power-law exponent (γ) of most of these networks lies between 2.1 and 3.0, which implies that there is high heterogeneity with respect to node degree. This phenomenon in real networks is critical because it has been shown that this heterogeneity has a huge impact on network properties and processes such as network resilience [15, 16], network navigation, local search [6], and epidemiological processes [111, 112, 113, 114, 115]. Later in this chapter, we will discuss the impact of this heterogeneity in detail.


Figure 11.6: The degree distribution of real networks. (a) Internet at the router level. Data
courtesy of Ramesh Govindan [61]. (b) Co-authorship network of high-energy physicists,
after Newman [99].

11.2.4 Betweenness centrality

Betweenness centrality (BC) measures the fraction of shortest paths in the network that pass through a given node. The BC of a node i is given by

BC(i) = Σ_{s≠i≠t} σ_st(i) / σ_st,

where σ_st is the total number of shortest paths from node s to node t, and σ_st(i) is the number of these shortest paths passing through node i. If the BC of a node is high, it implies that this
node is central and many shortest paths pass through this node. BC was first introduced
in the context of social networks [139], and has been recently adopted by Goh et al. [59]
as a proxy for the load (l_i) at a node i with respect to transport dynamics in a network. For example, consider the transportation of data packets in the Internet along shortest paths. If many shortest paths pass through a node, then the load on that node is high. Goh et al. have shown numerically that the load (or BC) distribution follows a power-law, P_L(l) ∼ l^{−δ}, with exponent δ ≈ 2.2, and is insensitive to the details of the scale-free network as long as the degree exponent (γ) lies between 2.1 and 3.0. They further showed that

Figure 11.7: Illustration of a network with community structure. Communities are defined as groups of nodes in the network that have a higher density of edges within the group than between groups. In the above network, each group of nodes enclosed within a dotted loop is a community.

there exists a scaling relation l ∼ k^{(γ−1)/(δ−1)} between the load and the degree of a node when 2 < γ ≤ 3. Later in this chapter, we discuss how this property can be utilized for local search in complex networks. Many other centrality measures exist in the literature, and a detailed review of these measures can be found in [86].
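The sum over pairs in the BC formula can be computed efficiently by counting shortest paths with one BFS per source and then back-propagating the path counts (this is the idea behind Brandes' algorithm; the sketch below and its star-graph example are illustrative, not from the chapter):

```python
from collections import deque

def betweenness(adj):
    """BC(i) = sum over ordered pairs s != t (both != i) of
    sigma_st(i)/sigma_st. Divide by 2 for unordered pairs."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: distances and numbers of shortest paths sigma.
        dist, sigma = {s: 0}, {s: 1}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
        # Back-propagate dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in dist}
        for u in reversed(order):
            for v in adj[u]:
                if dist.get(v) == dist[u] + 1:
                    delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if u != s:
                bc[u] += delta[u]
    return bc

# Star graph: every shortest path between leaves passes through the hub.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(betweenness(star)[0])  # 6.0 (3 * 2 ordered leaf pairs)
```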

11.2.5 Modularity and community structures

Many real networks are found to exhibit a community structure (also called a modular structure). That is, groups of nodes in the network have a high density of edges within each group and a lower density of edges between groups (see figure 11.7). This property was first observed in social networks [139], where people may divide into groups based on interests, age, profession, etc. Similar community structures are observed in many other networks, reflecting a division of nodes into groups based on node properties [101]. For example, in the WWW it reflects the subject matter or themes of the pages, in citation networks it reflects the areas of research, and in cellular and metabolic networks it may reflect functional groups [72, 121].

In many ways, community detection is similar to a traditional graph partitioning problem


(GPP). In GPP, the objective is to divide the nodes of the network into k disjoint sets of specified sizes, such that the number of edges between these sets is minimized. This
problem is NP-complete [58] and several heuristic methods [69, 81, 119] have been proposed
to decrease the computation time. GPP arises in many important engineering problems
which include mapping of parallel computations, laying out of circuits (VLSI design) and
the ordering of sparse matrix computations [69]. Here, the number of partitions to be
made is specified and the size of each partition is restricted. For example, in mapping of
parallel computations, the tasks have to be divided between a specified number of processors
such that the communication between the processors is minimized and the loads on the
processors are balanced. However, in real networks, we do not have any a priori knowledge about the number of communities into which the network should be divided, or about the sizes of the communities. The goal is to find the naturally existing communities in real networks rather than dividing the network into a pre-specified number of groups. Since we do not know the exact partitions of the network, it is difficult to evaluate the goodness of a given partition. Moreover, there is no unique definition of a community, due to the ambiguity
of how dense a group should be to form a community. Many possible definitions exist in
literature [56, 103, 109, 120, 139]. A simple definition given in [56, 120] considers a subgraph
as a community if each node in the subgraph has more connections within the community
than with the rest of the graph. Newman and Girvan [103] have proposed another measure, which calculates the fraction of links within a community minus the expected value of the same quantity in a randomized counterpart of the network. The higher this difference, the
stronger is the community structure. It is important to note that in spite of this ambiguity,
the presence of community structures is a common phenomenon across many real networks.
Algorithms for detecting these communities are briefly discussed in section 11.5.3.
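The Newman-Girvan measure just described is usually written as the modularity Q = Σ_c [L_c/m − (d_c/2m)^2], where m is the total number of edges, L_c the number of edges inside community c, and d_c the total degree of the nodes in c; the second term is the expected internal fraction in a degree-preserving random counterpart. A small Python sketch of this formula (the two-triangle example graph is hypothetical):

```python
def modularity(adj, communities):
    """Newman-Girvan modularity Q = sum_c [ L_c/m - (d_c/(2m))^2 ]:
    fraction of edges inside each community minus the expected
    fraction in a degree-preserving randomized counterpart."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # total edges
    q = 0.0
    for comm in communities:
        members = set(comm)
        # Each internal edge is seen from both endpoints, hence /2.
        inside = sum(1 for u in members for v in adj[u] if v in members) / 2
        degree = sum(len(adj[u]) for u in members)
        q += inside / m - (degree / (2 * m)) ** 2
    return q

# Two triangles (0-1-2 and 3-4-5) joined by a single bridge edge 2-3.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
     3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(modularity(g, [[0, 1, 2], [3, 4, 5]]))  # 5/14, about 0.357
```

The natural two-community split scores well above zero, while putting all nodes in one community gives Q = 0, matching the intuition that a stronger community structure yields a higher Q.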

11.2.6 Network resilience

The ability of a network to withstand the removal of nodes or edges is called network resilience or robustness. In general, the removal of nodes and edges disrupts the paths between nodes, can increase the distances, and thus makes communication between nodes harder. In more severe cases, an initially connected network can break down into isolated components that can no longer communicate. Figure 11.8 shows the effect of

Figure 11.8: Effects of removing a node or an edge in the network. Observe that as we
remove more nodes and edges the network disintegrates into small components/clusters.

removal of nodes/edges on a network. Observe that as we remove more nodes and edges, the
network disintegrates into many components. There are different ways of removing nodes and
edges to test the robustness of a network. For example, one can remove nodes at random with
uniform probability or by selectively targeting certain classes of nodes, such as nodes with
high degree. Usually, the removal of nodes at random is termed random failure, and the removal of the highest-degree nodes is termed targeted attack; other removal strategies
are discussed in detail in [71]. Similarly there are several ways of measuring the degradation of
the network performance after the removal. One simple way to measure it is to calculate the
decrease in the size of the largest connected component in the network. A connected component is a part of the network in which a path exists between any two of its nodes, and the largest connected component is the one containing the most nodes. The smaller the decrease in the size of the largest connected component, the better the robustness of the network. In figure 11.8, the size of the largest connected component decreases from
13 to 9 and then to 5. Another way to measure robustness is to calculate the increase of
the average path length in the largest connected component. Malfunctioning of nodes/edges
eliminates some existing paths and generally increases the distance between the remaining
nodes. Again, the smaller the increase, the better the robustness of the network. We discuss
more about network resilience and robustness with respect to optimization in section 11.5.1.
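A robustness experiment of the kind described above is straightforward to simulate: remove a set of nodes (at random, or the highest-degree ones) and measure the size of the largest connected component that remains. A minimal sketch, with a hypothetical star graph as the example:

```python
import random

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`,
    found by depth-first search over the surviving nodes."""
    alive = set(adj) - set(removed)
    best, seen = 0, set()
    for s in alive:
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def attack(adj, n, targeted=False):
    """Remove n nodes: the highest-degree ones (targeted attack)
    or uniformly at random (random failure)."""
    if targeted:
        victims = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:n]
    else:
        victims = random.sample(list(adj), n)
    return largest_component(adj, victims)

# Star graph: a targeted attack on the single hub shatters the network,
# while a random failure usually removes a harmless leaf.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(attack(star, 1, targeted=True))  # 1: only isolated nodes remain
```

The star graph is a caricature of the degree heterogeneity of scale-free networks, which is exactly why such networks are robust to random failures but fragile under targeted attacks, as discussed in section 11.5.1.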

11.3 Modeling of complex networks

In this section, we give a brief summary of different models for complex networks. Most modeling efforts have focused on understanding the underlying processes involved in network evolution and on capturing the above-mentioned properties of real networks. In particular, we concentrate on three prominent models, namely, the Erdős-Rényi random graph model,
the Watts-Strogatz small-world network model, and the Barabási-Albert scale-free network
model.

11.3.1 Random graphs

One of the earliest theoretical models for complex networks was given by Erdős and Rényi
[52, 53, 54] in the 1950s and 1960s. They proposed uniform random graphs for modeling
complex networks with no obvious pattern or structure. The following is the evolutionary
model given by Erdős and Rényi:

• Start with a set of N isolated nodes

• Connect each pair of nodes with a connection probability p

Figure 11.9 illustrates two realizations of the Erdős-Rényi random graph model (ER random graphs) for two connection probabilities. Erdős and Rényi have shown that at p_c ≃ 1/N, the ER random graph abruptly changes its topology from a loose collection of small clusters to one with a giant connected component. Figure 11.10 shows the change in the size of the largest connected component in the network as the value of p increases, for N = 1000. We observe that there exists a threshold p_c = 0.001 such that when p < p_c the network is composed of small isolated clusters, and when p > p_c a giant component suddenly appears. This phenomenon is similar to the percolation transition, a topic well studied both in mathematics and statistical mechanics [13].
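The two-step evolutionary model above, and the percolation transition it produces, can be reproduced in a few lines of Python (a sketch for illustration; the seeds and probabilities are arbitrary choices):

```python
import random

def er_graph(n, p, seed=None):
    """Erdos-Renyi G(n, p): start with n isolated nodes and connect
    each pair independently with probability p."""
    rng = random.Random(seed)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def largest_component_size(adj):
    """Size of the largest connected component (depth-first search)."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

# Below p_c = 1/N only small clusters; above it, a giant component.
n = 1000
for p in (0.0005, 0.001, 0.005):
    print(p, largest_component_size(er_graph(n, p, seed=1)))
```

Running this shows the largest component jumping from a small fraction of the nodes to nearly all of them as p crosses 1/N, mirroring figure 11.10.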

In an ER random graph, the mean number of neighbors at a distance (number of hops) d from a node is approximately ⟨k⟩^d, where ⟨k⟩ is the average degree of the network.

Figure 11.9: An Erdős-Rényi random graph that starts with N = 20 isolated nodes and connects any two nodes with probability p. As the value of p increases, the number of edges in the network increases.


Figure 11.10: Illustration of the percolation transition in the size of the largest connected component in the Erdős-Rényi random graph model (N = 1000). Note that there exists a threshold p_c = 0.001 such that when p < p_c the network is composed of small isolated clusters, and when p > p_c a giant component suddenly appears.

To cover all the nodes in the network, the distance l should be such that ⟨k⟩^l ∼ N. Thus, the average path length is given by l = log N / log⟨k⟩, which scales logarithmically with the number of nodes N. This is only an approximate argument for illustration; a rigorous proof can be found in [34]. Hence, ER random graphs are small-world. The clustering coefficient of ER random graphs is found to be low. If we consider a node and its neighbors in an ER random graph, then the probability that two of these neighbors are connected is equal to p (the probability that any two randomly chosen nodes are connected). Hence, the clustering coefficient of an ER random graph is C = p = ⟨k⟩/N, which is small for large sparse networks. Now,
let us calculate the degree distribution of the ER random graphs. The total number of edges
in the network is a random variable with an expected value of pN(N − 1)/2 and the number
of edges incident on a node (the node degree) follows a binomial distribution with parameters
N − 1 and p,
p(k_i = k) = C(N−1, k) p^k (1 − p)^{N−1−k},

where C(N−1, k) is the binomial coefficient. This implies that in the limit of large N, the probability that a given node has degree k approaches a Poisson distribution, p(k) = ⟨k⟩^k e^{−⟨k⟩}/k!. Hence, ER random graphs are statistically homogeneous in node degree, as the majority of the nodes have a degree close to the average, and significantly small and large node degrees are exponentially rare.

ER random graphs were used to model complex networks for a long time [34]. The model was intuitive and analytically tractable; moreover, the average path length of real networks is close to the average path length of an ER random graph of the same size [13]. However, recent studies on the topologies of diverse large-scale networks found in nature indicated that they have significantly different properties from ER random graphs [13, 31, 49, 101]. It has been found [143] that the average clustering coefficient of real networks is significantly larger than the average clustering coefficient of ER random graphs with the same number of nodes and edges, indicating a far more ordered structure in real networks. Moreover, the degree distributions of many large-scale networks are found to follow a power-law p(k) ∼ k^{−γ}. Figure 11.11 compares two networks with Poisson and power-law degree distributions. We observe that there is a remarkable difference between these networks. The network with the Poisson degree distribution is more homogeneous in node degree, whereas the network with the power-law distribution is highly heterogeneous. These discoveries, along with others related


Figure 11.11: Comparison of networks with Poisson and power-law degree distributions of the same size. Note that the network with the Poisson distribution is homogeneous in node degree: most of the nodes have a degree close to the average degree of the network. However, the network with the power-law degree distribution is highly heterogeneous in node degree: there are a few nodes with large degree and many nodes with small degree.

to the mixing patterns of complex networks [13, 31, 49, 101] initiated a revival of network
modeling in the past few years.

Non-uniform random graphs have also been studied [8, 9, 41, 93, 102, 104] to mimic the properties of real-world networks, in particular, the power-law degree distribution. Typically, these models specify either a degree sequence, which is a set of N values of the degrees k_i of nodes i = 1, 2, ..., N, or a degree distribution p(k). If a degree distribution is specified, then the sequence is formed by generating N random values from this distribution. This can be thought of as giving each node i in the network k_i “stubs” sticking out of it; pairs of these stubs are then connected randomly to form complete edges [104]. Molloy and Reed [93] have proved that for a random graph with a degree distribution p(k), a giant connected component emerges almost surely when Σ_{k≥1} k(k − 2) p(k) > 0, provided that the maximum degree is less than N^{1/4}. Later, Aiello et al. [8, 9] introduced a two-parameter random graph model P(α, γ)

for power-law graphs with exponent γ, described as follows: let n_k be the number of nodes with degree k, such that n_k and k satisfy log n_k = α − γ log k. The total number of nodes in the network can be computed by noting that the maximum degree of a node in the network is e^{α/γ}. Using the results of Molloy and Reed [93], they showed that there is almost surely a unique giant connected component if γ < γ_0 = 3.47875..., whereas there is almost surely no giant connected component when γ > γ_0.
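The stub-pairing construction and the Molloy-Reed criterion above can both be sketched in a few lines of Python (an illustrative implementation; for simplicity it keeps the occasional self-loop or multi-edge that stub pairing can produce, which are rare for large N):

```python
import random

def configuration_model(degrees, seed=None):
    """Random graph with a given degree sequence: give node i
    degrees[i] 'stubs' and pair all stubs uniformly at random."""
    rng = random.Random(seed)
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2:
        raise ValueError("the degree sequence must have an even sum")
    rng.shuffle(stubs)
    # Consecutive shuffled stubs form the edges.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

def has_giant_component(pk):
    """Molloy-Reed criterion: a giant component emerges almost surely
    when sum_k k(k-2) p(k) > 0 (maximum degree below N^(1/4))."""
    return sum(k * (k - 2) * p for k, p in pk.items()) > 0

edges = configuration_model([3, 3, 2, 2, 1, 1], seed=4)
print(len(edges))  # 6 edges: half the total number of stubs
print(has_giant_component({1: 0.5, 3: 0.5}))  # -0.5 + 1.5 = 1 > 0 -> True
```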

Newman et al. [104] have developed a general approach to random graphs by using a generating function formalism [146]. The generating function for the degree distribution p_k is given by G_0(x) = Σ_{k=0}^{∞} p_k x^k. This function captures all the information present in the original distribution, since p_k = (1/k!) (d^k G_0/dx^k)|_{x=0}. The average degree of a randomly chosen node is ⟨k⟩ = Σ_k k p_k = G_0′(1). Further, this formulation helps in calculating other properties of the network [104]. For instance, we can approximately calculate the average path length of the network. Let us consider the degree of the node reached by following a randomly chosen edge. If the degree of this node is k, then we are k times more likely to reach this node than a node of degree 1. Thus, the degree distribution of the node arrived at by following a randomly chosen edge is proportional to k p_k, and not p_k. In addition, the distribution of the number of edges leaving this node (one less than its degree), q_k, is

q_k = (k + 1) p_{k+1} / Σ_k k p_k = (k + 1) p_{k+1} / ⟨k⟩.

Thus, the generating function for q_k is given by G_1(x) = Σ_{k=0}^{∞} q_k x^k = G_0′(x)/G_0′(1). Note that the distribution of the number of first neighbors of a randomly chosen node (the degree of the node) is generated by G_0(x). Hence, the distribution of the number of second neighbors of the same randomly chosen node is generated by G_0(G_1(x)) = Σ_k p_k [G_1(x)]^k. Here, the probability that any of the second neighbors is connected to the first neighbors or to one another scales as N^{−1} and can be neglected in the limit of large N. This implies that the average number of second neighbors is given by [∂G_0(G_1(x))/∂x]_{x=1} = G_0′(1) G_1′(1). Extending this method of calculating the average number of nearest neighbors, we find that the average number of mth neighbors, z_m, is [G_1′(1)]^{m−1} G_0′(1) = [z_2/z_1]^{m−1} z_1. Now, let us start from a node and find the number of first, second, third, ..., mth neighbors. Assuming that all the nodes in the network can be reached within l steps, we have 1 + Σ_{m=1}^{l} z_m = N. Since for most graphs N ≫ z_1 and z_2 ≫ z_1, we obtain the average path length of the network,

l = log(N/z_1) / log(z_2/z_1) + 1.

The generating function formalism can further be extended to include other features such as directed graphs, bipartite graphs and degree correlations [101].
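The path-length estimate above only needs the first two neighbor counts, z_1 = G_0′(1) = Σ_k k p_k and z_2 = G_0′(1) G_1′(1) = Σ_k k(k − 1) p_k, so it is easy to evaluate numerically for any degree distribution. A sketch (the Poisson example with mean 4 and the truncation at k = 60 are arbitrary choices):

```python
import math

def path_length_estimate(pk, n):
    """l = log(N/z1)/log(z2/z1) + 1, where z1 = sum_k k p(k) is the mean
    number of first neighbors and z2 = sum_k k(k-1) p(k) the mean number
    of second neighbors of a randomly chosen node."""
    z1 = sum(k * p for k, p in pk.items())
    z2 = sum(k * (k - 1) * p for k, p in pk.items())
    return math.log(n / z1) / math.log(z2 / z1) + 1

# Poisson degree distribution with mean 4 (the ER case). Here
# z2 = z1^2, so the formula reduces to l = log N / log <k>.
mean = 4.0
pk = {k: math.exp(-mean) * mean**k / math.factorial(k) for k in range(60)}
print(path_length_estimate(pk, 10**6))  # close to log(10^6)/log 4, ~10
```

Reassuringly, for the Poisson distribution this reproduces the l = log N / log⟨k⟩ result derived for ER random graphs earlier in this section.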

Another class of random graphs, especially popular in modeling social networks, is the class of Exponential Random Graph Models (ERGMs) or p* models [20, 57, 70, 129, 140]. An ERGM consists of a family of possible networks of N nodes in which each network G appears with probability P(G) = (1/Z) exp(−Σ_i θ_i ε_i), where Z = Σ_G exp(−Σ_i θ_i ε_i). This is similar to the Boltzmann ensemble of statistical mechanics, with Z as the partition function [101]. Here, {ε_i} is the set of observables, or measurable properties of the network, such as the number of nodes with a certain degree, the number of triangles, etc., and {θ_i} is an adjustable set of parameters for the model. The ensemble average of a property ε_i is given by ⟨ε_i⟩ = Σ_G ε_i(G) P(G) = (1/Z) Σ_G ε_i(G) exp(−Σ_i θ_i ε_i) = ∂f/∂θ_i, where f = −log Z is the free energy. The major advantage of these models is that they can represent any kind of structural tendency, such as dyad and triangle formation. A detailed review of the parameter estimation techniques can be found in [20, 127]. Once the parameters {θ_i} are specified, networks can be generated by using Gibbs or Metropolis-Hastings sampling methods [127].

11.3.2 Small-world networks

Watts and Strogatz [143] presented a small-world network model to explain the existence
of high clustering and small average path length simultaneously in many real networks,
especially, social networks. They argued that most of the real networks are neither completely
regular nor completely random, but lie somewhere between these two extremes. The Watts-
Strogatz model starts with a regular lattice on N nodes, and each edge is rewired with a certain probability p. The following is the algorithm for the model:

• Start with a regular ring lattice on N nodes where each node is connected to its first
k neighbors.

• Randomly rewire each edge with a probability p such that one end remains the same
and the other end is chosen uniformly at random. The other end is chosen without
allowing multiple edges (more than one edge joining a pair of nodes) and loops (edges
joining a node to itself).
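The two steps above can be sketched directly in Python; this is an illustrative implementation (the parameter values in the example are arbitrary), not the authors' code:

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Watts-Strogatz model: a ring lattice where each node is joined
    to its k nearest neighbors (k even), after which each lattice edge
    is rewired with probability p, keeping one end fixed and avoiding
    self-loops and multiple edges."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):                      # regular ring lattice
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    for i in range(n):                      # rewire each lattice edge once
        for j in range(1, k // 2 + 1):
            old = (i + j) % n
            if rng.random() < p:
                choices = [w for w in range(n)
                           if w != i and w not in adj[i]]
                if not choices:
                    continue                # node already joined to all others
                new = rng.choice(choices)
                adj[i].discard(old)
                adj[old].discard(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj

g = watts_strogatz(20, 4, 0.1, seed=2)
print(sum(len(s) for s in g.values()) // 2)  # still N*k/2 = 40 edges
```

Since rewiring replaces one edge with another, the number of edges stays fixed at Nk/2 for any p, which is what makes the model a clean interpolation between a regular lattice (p = 0) and a random graph (p = 1).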

Figure 11.12: Illustration of the random rewiring process for the Watts-Strogatz model. This
model interpolates between a regular ring lattice and a random network, without changing
the number of vertices (N = 20) or edges (E = 40) in the graph. When p = 0 the graph is
regular (each node has 4 edges), as p increases, the graph becomes increasingly disordered
until p = 1, all the edges are rewired randomly. After Watts and Strogatz, 1998 [143].

The resulting network is a regular network when p = 0 and a random graph when p = 1,
since all the edges are rewired (see figure 11.12). The above model is inspired by social networks, where people are friends with their immediate neighbors, such as neighbors on the same street or colleagues at work (the connections in the regular lattice), while each person also has a few friends who are a long way away (the long-range connections attained by random rewiring).
Later, Newman [98] proposed a similar model where instead of edge rewiring, new edges are
introduced with probability p. The clustering coefficient of the Watts-Strogatz model and
the Newman model are

C_WS = [3(k − 1)/(2(2k − 1))] (1 − p)^3,    C_N = 3(k − 1)/(2(2k − 1) + 4kp(p + 2)),

respectively. This class of networks displays a high clustering coefficient for small values of p, since we start from a regular lattice. Also, for small values of p, the average path length falls rapidly due to the few long-range connections. This co-existence of a high clustering coefficient and a small average path length is in excellent agreement with the characteristics of many real networks [98, 143]. The degree distribution of both models depends on the parameter p, evolving from a single-valued peak at the initial degree k to a somewhat broader, but still peaked, distribution. Thus, small-world models are even more homogeneous in node degree than random graphs, which is not the case with real networks.

11.3.3 Scale-free networks

As mentioned earlier, many real networks including the World Wide Web [5, 14, 88], the
Internet [55], peer-to-peer networks [122], metabolic networks [77], phone call networks [4,
8] and movie actor collaboration networks [12, 19, 25] are scale-free, that is, their degree distribution follows a power-law, p(k) ∼ k^{−γ}. Barabási and Albert [25] addressed the origin
of this power-law degree distribution in many real networks. They argued that static random
graphs and the Watts-Strogatz model fail to capture two important features of large-scale
networks: their constant growth and the inherent selectivity in edge creation. Complex
networks like the World Wide Web, collaboration networks and even biological networks
grow continuously through the creation of new web pages, the entry of new researchers,
and gene duplication and evolution. Moreover, unlike in random networks, where each node
has the same probability of acquiring a new edge, new nodes entering the network do not
connect uniformly to existing nodes, but attach preferentially to nodes of higher degree. This
reasoning led them to define the following mechanism:

• Growth: Start with a small number of connected nodes, say m0, and assume that every
time a new node enters the system, it brings m edges, where m ≤ m0.

• Preferential Attachment: Every time a new node enters the system, each edge of the
newly entered node attaches to an already existing node i of degree ki with probability

    Πi = ki / Σj kj .

It was shown that such a mechanism leads to a network with a power-law degree distribution
p(k) ∼ k^−γ with exponent γ = 3. These networks were called scale-free networks because
of the lack of a characteristic degree and the broad tail of the degree distribution. The
average path length of such a network scales as log(N)/log(log(N)), and thus it displays the
small-world property. The clustering coefficient of a scale-free network is approximately
C ∼ (log N)^2/N, a slower decay than the C = ⟨k⟩N^−1 observed in random graphs [35].
In the years following the proposal of the first scale-free model a large number of more
refined models have been introduced, leading to a well-developed theory of evolving networks
[13, 31, 49, 101].
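The growth and preferential-attachment mechanism can be sketched in a few lines. The code below is our own illustration, not the authors' implementation; it uses the common trick of sampling from a list in which each node appears once per unit of degree, which realizes Πi = ki/Σj kj, and then exposes the heavy-tailed degree sequence: a few hubs acquire degrees far above the mean of roughly 2m.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Growth plus preferential attachment: each new node attaches m
    edges to existing nodes chosen with probability ~ their degree."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(m + 1)}
    repeated = []  # each node appears here once per unit of degree
    for u in range(m + 1):          # seed clique of m0 = m + 1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v); adj[v].add(u)
            repeated += [u, v]
    for new in range(m + 1, n):
        adj[new] = set()
        chosen = set()
        while len(chosen) < m:      # degree-proportional sampling
            chosen.add(rng.choice(repeated))
        for t in chosen:
            adj[new].add(t); adj[t].add(new)
            repeated += [new, t]
    return adj

g = barabasi_albert(2000, 2)
degrees = sorted((len(nb) for nb in g.values()), reverse=True)
mean_deg = sum(degrees) / len(degrees)
# heavy tail: the largest degree is far above the mean (~2m)
```

The sorted degree sequence makes the contrast with a Poissonian random graph immediate: the top entries dwarf the mean, while most nodes sit at degree m.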

11.4 Why “Complex” Networks

In this section, we discuss why these large-scale networks are termed "complex" networks.
The reason is not merely the large size of the network, though complexity does partly arise
from size. One must also distinguish "complex systems" from "complicated systems" [136].
Consider an airplane as an example. Even though it is a complicated system, we know its
components and the rules governing its functioning. This is not the case with complex
systems. Complex systems are characterized by diverse behaviors that emerge as a result of
non-linear spatio-temporal interactions among a large number of components [73]. These
emergent behaviors cannot be fully explained by understanding the properties of the
individual components alone. Examples of such
complex systems include ecosystems, economies, various organizations/societies, the nervous
system, the human brain, ant hills ... the list goes on. Some of the behaviors exhibited by
complex systems are discussed below:

• Scale invariance or self-similarity: A system is scale invariant if the structure of the
system is similar regardless of the scale. Typical examples of scale invariant systems
are fractals. For example, consider the Sierpinski triangle in figure 11.13. If we look
at a small part of the triangle at a different scale, it still looks similar to the original
triangle. At whichever scale we look at the triangle, it is self-similar and hence scale
invariant.

• Infinite susceptibility/response: Most complex systems are highly sensitive, or
susceptible, to changes in certain conditions. A small change in the system conditions
or parameters may lead to a huge change in the global behavior. This is similar to
the percolation threshold, where a small change in the connection probability induces
the emergence of a giant connected cluster. Another good example of such a system is
a sand pile. As we add sand particles to a sand pile, they keep accumulating. But
beyond a certain point, the addition of a single particle may trigger an avalanche,
demonstrating that the sand pile is highly sensitive.

• Self-organization and Emergence: Self-organization is the characteristic of a system
by which it evolves into a particular structure based on interactions between

Figure 11.13: Illustration of self-similarity in the Sierpinski triangle. When we look at a
small part of the triangle at a different scale, it looks similar to the original triangle.
Moreover, at each scale at which we look at the triangle, it is self-similar. This is the
typical behavior of a scale invariant system.

the constituents and without any external influence. Self-organization typically leads
to emergent behavior. Emergent behavior is a phenomenon in which a global property
of the system is not evident from the properties of its individual parts: a completely
new property arises from the interactions between the different constituents of the
system. For example, consider an ant colony. Although a single ant (a constituent of
an ant colony) can perform only a very limited number of tasks in its lifetime, the
interactions of a large number of ants in a colony lead to more complex emergent
behaviors.

Now let us consider real large-scale networks such as the Internet, the WWW and the
other networks mentioned in section 11.1. Most of these networks have a power-law degree
distribution, which does not have any specific scale [25]. This implies that the networks do
not have a characteristic degree and that an average behavior of the system is not typical
(see figure 11.11 (b)). For this reason they are called scale-free networks. The heavy-tailed
degree distribution induces a high level of heterogeneity in the degrees of the vertices. This
heterogeneity makes the network highly sensitive to external disturbances. For example,
consider the network shown in figure 11.14(a). This network is highly sensitive: when we
remove just two nodes, it completely disintegrates into small components. On the other
hand, the network shown in figure 11.14(b), having the same number of nodes and edges,
is not very sensitive. Most real networks are found to have a structure similar to the
network shown in figure 11.14(a), with a huge heterogeneity in node degree. Also, studies
[111, 112, 113, 114, 115] have shown that the presence of heterogeneity has a huge
impact on epidemiological processes such as disease spreading. They have shown that in
networks which do not have a heavy-tailed degree distribution, if the disease transmission

Figure 11.14: Illustration of the high-sensitivity phenomenon in complex networks. (a) When
we remove the two highest-degree nodes from the network, it disintegrates into small parts;
the network is highly sensitive to node removals. (b) Example of a network with the same
number of nodes and edges which is not sensitive: this network is not affected much when
we remove the three highest-degree nodes. The network in (a) is highly sensitive due to the
high heterogeneity in node degree.

rate is less than a certain threshold, the disease will not cause an epidemic or a major
outbreak. However, if the network has a power-law (scale-free) degree distribution, it
becomes highly sensitive to disease propagation: no matter what the transmission rate is,
there exists a finite probability that the infection will cause a major outbreak. Hence, we
clearly see that these real large-scale networks are highly sensitive, or infinitely susceptible.
Further, all these networks have evolved over time with new nodes joining the network (and
some leaving) according to self-organizing or evolutionary rules; no external influence
controlled the evolution process or the structure of the network. Nevertheless, these networks
have evolved in such a manner that they exhibit complex behaviors such as power-law degree
distributions and many others. Hence, they are called "complex" networks [135].

The above discussion of complexity is an intuitive explanation rather than a technical
treatment. More rigorous mathematical definitions of complexity can be found in [23, 30].

11.5 Optimization in complex networks

The models discussed in section 11.3 focus on explaining the evolution and growth process
of many large real networks. They mainly concentrate on the statistical properties of
real networks and on network modeling. But the ultimate goal in studying and modeling the
structure of complex networks is to understand and optimize the processes taking place on
these networks. For example, one would like to understand how the structure of the Internet
affects its survivability against random failures or intentional attacks, how the structure of
the WWW helps in efficient surfing or search on the web, how the structure of social networks
affects the spread of viruses or diseases, etc. In other words, to design rules for optimiza-
tion, one has to understand the interactions between the structure of the network and the
processes taking place on the network. These principles will certainly help in redesigning
or restructuring existing networks and perhaps even in designing a network from scratch.
In the past few years, there has been a tremendous amount of effort by research communities
in different disciplines to understand the processes taking place on networks [13, 31, 49, 101].
In this chapter, we concentrate on two processes, namely node failures and local search,
because of their high relevance to engineering systems, and discuss a few other topics briefly.

11.5.1 Network resilience to node failures

All real networks are regularly subject to node/edge failures, either due to normal mal-
functions (random failures) or intentional attacks (targeted attacks) [15, 16]. Hence, it is
extremely important for the network to be robust against such failures for proper function-
ing. Albert et al. [15] demonstrated that the topological structure of a network plays a
major role in its response to node/edge removal. They showed that most real networks are
extremely resilient to random failures but very sensitive to targeted attacks, and attributed
this to the fact that most of these networks are scale-free and hence highly heterogeneous
in node degree. Since a large fraction of the nodes have small degree, random failures do
not significantly affect the structure of the network. On

[Figure 11.15 appears here: two panels, (a) Random graphs and (b) Scale-free networks, plotting the size of the largest connected component (0 to 10,000) against the percentage of nodes removed, p (0 to 70).]

Figure 11.15: The size of the largest connected component as a function of the percentage
of nodes (p) removed from the network due to random failures (⋄) and targeted attacks (△).
(a) ER graph with number of nodes N = 10,000 and mean degree ⟨k⟩ = 4; (b) scale-free
network generated by the Barabási-Albert model with N = 10,000 and ⟨k⟩ = 4. The behavior
with respect to random failures and targeted attacks is similar for random graphs. Scale-free
networks are highly sensitive to targeted attacks and robust to random failures.

the other hand, the removal of a few highly connected nodes that maintain the connectivity
of the network drastically changes its topology. For example, consider the Internet: despite
frequent router problems in the network, we rarely experience global effects. However, if a
few critical nodes in the Internet were removed, it would have a devastating effect. Figure
11.15 shows the decrease in the size of the largest connected component for both scale-free
networks and ER graphs due to random failures and targeted attacks. ER graphs are
homogeneous in node degree, that is, all the nodes in the network have approximately the
same degree. Hence, they behave almost identically under random failures and targeted
attacks (see figure 11.15(a)). In contrast, for scale-free networks, the size of the largest
connected component decreases slowly under random failures and drastically under targeted
attacks (see figure 11.15(b)).
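The experiment in figure 11.15 is straightforward to reproduce in miniature. The sketch below is illustrative code with assumed parameters, not the original experiment at N = 10,000: it grows a preferential-attachment network, then compares the largest connected component after removing the same number of nodes either uniformly at random or in decreasing order of degree.

```python
import random
from collections import deque

def pref_attach_edges(n, m, seed=1):
    """Preferential-attachment edge list (sketch; seed nodes form a clique)."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    repeated = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(repeated))
        for t in targets:
            edges.append((new, t))
            repeated += [new, t]
    return edges

def largest_component(n, edges, removed):
    """Size of the largest connected component after deleting `removed`."""
    adj = {v: [] for v in range(n) if v not in removed}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].append(v); adj[v].append(u)
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        q, size = deque([s]), 0
        while q:                      # breadth-first traversal
            x = q.popleft(); size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y); q.append(y)
        best = max(best, size)
    return best

n, k = 2000, 100                      # remove 5% of the nodes
edges = pref_attach_edges(n, 2)
deg = {v: 0 for v in range(n)}
for u, v in edges:
    deg[u] += 1; deg[v] += 1
attack = set(sorted(deg, key=deg.get, reverse=True)[:k])   # hubs first
failure = set(random.Random(7).sample(range(n), k))        # random nodes
s_attack = largest_component(n, edges, attack)
s_failure = largest_component(n, edges, failure)
```

Random failures leave the giant component nearly intact, while the attack on the hubs fragments it far more severely, in line with the qualitative picture of figure 11.15(b).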

Ideally, we would like to have a network that is as resilient as scale-free networks to
random failures and as resilient as random graphs to targeted attacks. To determine the
feasibility of such a network, Valente et al. [133] and Paul et al. [117] have studied

the following optimization problem: “What is the optimal degree distribution of a network
of N nodes that maximizes the robustness of the network to both random failures and
targeted attacks, with the constraint that the number of edges remains the same?”

Note that we can always improve robustness by increasing the number of edges in the
network (for instance, a completely connected network is the most robust for both random
failures and targeted attacks); hence the problem carries a constraint on the number of
edges. In [133], Valente et al. showed that the optimal network configuration is very
different from both scale-free networks and random graphs: the optimal networks that
maximize robustness to both random failures and targeted attacks have at most three
distinct node degrees, and hence a three-peaked degree distribution. Similar results were
demonstrated by Paul et al. in [117], who showed that the optimal design is one in which
all nodes except one have the same degree k1 (close to the average degree), while one node
has a very large degree, k2 ∼ N^{2/3}, where N is the number of nodes. However, these
optimal networks may not be practically feasible, since they require the nodes to take on
only a limited repertoire of degrees.

Many evolutionary algorithms have also been proposed to design network configurations
that are robust to both random failures and targeted attacks [44, 74, 125, 130, 134]. In
particular, Thadakamalla et al. [130] consider two further measures, responsiveness and
flexibility, along with robustness to random failures and targeted attacks, specifically for
supply-chain networks. They define responsiveness as the ability of the network to provide
timely services through effective navigation, and measure it in terms of the average path
length of the network: the lower the average path length, the better the responsiveness.
Flexibility is the ability of the network to provide alternate paths for dynamic rerouting.
Good clustering properties ensure the presence of alternate paths, so the flexibility of a
network is measured in terms of the clustering coefficient. They designed a parameterized
evolutionary algorithm for supply-chain networks and analyzed its performance with respect
to these three measures. Through simulation they showed that there exist trade-offs between
these measures and proposed different ways to improve these properties.
However, it is still unclear what would be the optimal configuration of such survivable

networks. The research question is: “What is the optimal configuration of a network of
N nodes that maximizes robustness to random failures and targeted attacks, flexibility, and
responsiveness, with the constraint that the number of edges remains the same?”

Until now, we have focused on the effects of node removal on the static properties of
a network. However, in many real networks, the removal of nodes also has dynamic
effects, as it can lead to avalanches of breakdowns, also called cascading failures.
For instance, in a power transmission grid, the removal of nodes (power stations) changes
the balance of flows and leads to a global redistribution of loads over the entire network. In
some cases, this may not be tolerated and might trigger a cascade of overload failures [82], as
happened on August 10th, 1996 in 11 US states and two Canadian provinces [124]. Models
of cascades of irreversible [97] or reversible [45] overload failures have demonstrated that
the removal of even a small fraction of highly loaded nodes can trigger global cascades if the
load distribution of the nodes is heterogeneous. Hence, cascade-based attacks can be much
more destructive than the other strategies considered in [15, 71]. Later, in [96], Motter
showed that a defense strategy based on the selective further removal of nodes and edges,
right after the initial attack or failure, can drastically reduce the size of the cascade. Other
studies on cascading failures include [39, 94, 95, 138, 141].

11.5.2 Local search

One of the important research problems with many applications in engineering systems
is search in complex networks. Local search is the process in which a node tries to find a
network path to a target node using only local information, meaning that each node has
information only about its first, or perhaps second, neighbors, and is not aware of nodes at
a larger distance or of how they are connected in the network. This is an intriguing and
relatively little studied problem with many practical applications. Suppose some required
information, such as computer files or sensor data, is stored at the nodes of a distributed
network or database. Then, in order to quickly determine the location of particular
information, one needs efficient local (decentralized) search strategies. Note that this is
different from the neighborhood search strategies used for solving combinatorial

optimization problems [2]. For example, consider the networks shown in figures 11.16(a)
and 11.16(b). The objective is for node 1 to send a message to node 30 along the shortest
possible path. In the network shown in figure 11.16(a), each node has global connectivity
information about the network (that is, how each and every node is connected). In such
a case, node 1 can calculate the optimal path using traditional algorithms [7] and send the
message through this path (1 - 3 - 12 - 30, depicted by the dotted line). Next, consider
the network shown in figure 11.16(b), in which each node knows only its immediate
neighbors. Node 1, based on some search algorithm, chooses to send the message to one of
its neighbors: in this case, node 4. Similarly, node 4 also has only local information, and
uses the same search algorithm to send the message to node 13. This process continues until
the message reaches the target node. We can clearly see that the search path obtained (1
- 4 - 13 - 28 - 23 - 30) is not optimal. However, given that only local information is
available, the problem is to design optimal search algorithms for complex networks. The
algorithms discussed in this section may look similar to the “distributed routing algorithms”
that are abundant in wireless ad hoc and sensor networks [10, 11]. However, the main
difference is that the former try to exploit the statistical properties of the network topology
whereas the latter do not. Most of the algorithms in the wireless sensor networks literature
find a path to the target node either by broadcasting or by a random walk, and then
concentrate on the efficient routing of data from the start node to the end node [10, 76]. As
we will see in this section, the statistical properties of the networks have a significant effect
on the search process. Hence, the algorithms in wireless sensor networks could be integrated
with these results for better performance.

We discuss this problem for two types of networks. In the first type, the global position
of the target node can be quantified and each node has this information, which guides the
search process towards the target node. For example, in the network considered in Milgram’s
experiment, each person had geographical and professional information about the target
node, and all the intermediary people (or nodes) used this information as a guide for passing
the messages. In the second type of network, the global position of the target node cannot
be quantified. In this case, during the search process, we do not know whether a given step
takes us towards the target node or away from it, which makes local search even more
difficult. One

Figure 11.16: Illustration of different ways of sending a message from node 1 to node 30.
(a) Each node has global connectivity information about the whole network; node 1
calculates the optimal path and sends the message along it. (b) Each node has information
only about its neighbors (as shown by the dotted curve); using this local information, node 1
tries to send the message to node 30. The path obtained is longer than the optimal path.

such network is the peer-to-peer network Gnutella [79], whose structure is such that one
may know very little about the location of the target node. Here, when a user searches for
a file, he/she does not know the global position of the node that has the file. Further, when
the user sends a request to one of its neighbors, it is difficult to tell whether this step leads
towards the target node or away from it. For lack of a more suitable name, we call networks
of the first type spatial networks and networks of the second type non-spatial networks. In
this chapter, we focus more on search in non-spatial networks.

Search in spatial networks

The problem of local search goes back to the famous experiment by Stanley Milgram [92]
(discussed in section 11.2) illustrating the short distances in social networks. Another
important, and even more surprising, observation of the experiment is the ability of the
nodes to find these short paths using just local information. As pointed out by Kleinberg
[83, 84, 85], this is not a trivial statement, because most of the time people have only local
information about the network: information about their immediate friends or perhaps

their friends’ friends. They do not have the global information about the acquaintances of
all people in the network. Even in Milgram’s experiment, the people to whom he gave the
letters have only local information about the entire social network. Still, from the results
of the experiment, we can see that arbitrary pairs of strangers are able to find short chains
of acquaintances between them by using only local information. Many models have been
proposed to explain the existence of such short paths [13, 31, 49, 98, 101, 143]. However,
these models are not sufficient to explain the second phenomenon. The observations from
Milgram’s experiment suggest that there is something more embedded in the underlying
social network that guides the message implicitly from the source to the target. Such net-
works which are inherently easy to search are called searchable networks. Mathematically,
a network is searchable if the length of the search path obtained scales logarithmically with
the number of nodes N (∼ logN) or lesser. Kleinberg demonstrated that the emergence of
such a phenomenon requires special topological features [83, 84, 85]. Considering a family
of network models on a n-dimensional lattice that generalizes the Watts-Strogatz model,
he showed that only one particular model among this infinite family can support efficient
decentralized algorithms. Unfortunately, the model given by Kleinberg is highly constrained
and represents a very small subset of complex networks. Watts et al. [144] presented another
model which is based upon plausible hierarchical social structures and contentions regarding
social networks. This model defines a class of searchable networks and offers an explanation
for the searchability of social networks.

Search in non-spatial networks

The traditional search methods in non-spatial networks are broadcasting and the random
walk. In broadcasting, each node sends the message to all its neighbors, the neighbors in
turn broadcast the message to all their neighbors, and the process continues. Effectively,
every node in the network receives the message at least once, and possibly many times.
This can have devastating effects on the performance of the network. A hint of the
potential damage of broadcasting can be seen in the Taylorsville, NC elementary school
project [142]. Sixth-grade students and their teacher sent out a sweet email to all the
people they knew, requesting the recipients to forward the email to

everyone they knew and to notify the students by email so that they could plot the senders'
locations on a map. A few weeks later, the project had to be canceled because they had
received about 450,000 responses from all over the world [142]. A good way to avoid such a
huge exchange of messages is to perform a walk. In a walk, each node sends the message to
one of its neighbors until it reaches the target node. The neighbor can be chosen in different
ways depending on the algorithm: if the neighbor is chosen randomly with equal probability,
the walk is called a random search, while in a high-degree search the highest-degree neighbor
is chosen. Adamic et al. [6] demonstrated that high-degree search is more efficient than
random search in networks with a power-law degree distribution (scale-free networks).
High-degree search sends the message to a more connected neighbor, which has a higher
probability of reaching the target node, thus exploiting the heterogeneity in node degree.
They showed that the number of steps s required until the whole graph is revealed scales as
s ∼ N^{3(1−2/γ)} for the random search and s ∼ N^{2−4/γ} for the high-degree search.
Clearly, for γ > 2.0, the number of steps taken by the high-degree search scales with a
smaller exponent than that of the random walk search. Since most real networks have a
power-law degree distribution with exponent γ between 2.1 and 3.0, high-degree search is
more effective in these networks.
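The two walk strategies can be compared directly. The sketch below is our own simplified version: to keep the walks from cycling, already-visited nodes are avoided when possible, a practical detail the asymptotic analysis of [6] does not need. It runs a random search and a high-degree search on a preferential-attachment network and records the number of steps each needs before the target is adjacent to the current node.

```python
import random

def make_scale_free(n, m=2, seed=3):
    """Preferential-attachment graph (adjacency sets), a stand-in for the
    power-law networks in which these strategies were compared."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    repeated = []
    for u in range(m + 1):              # connected seed clique
        for v in range(u + 1, m + 1):
            adj[u].add(v); adj[v].add(u)
            repeated += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:         # degree-proportional sampling
            targets.add(rng.choice(repeated))
        for t in targets:
            adj[new].add(t); adj[t].add(new)
            repeated += [new, t]
    return adj

def walk_search(adj, start, target, strategy, cap=50000, seed=5):
    """Pass the message neighbor-to-neighbor until the target is adjacent.
    Unvisited neighbors are preferred; `strategy` makes the choice."""
    rng, cur, seen = random.Random(seed), start, {start}
    for step in range(1, cap + 1):
        if target in adj[cur]:
            return step                 # the next hop delivers the message
        fresh = [v for v in adj[cur] if v not in seen]
        if fresh:
            cur = strategy(rng, fresh)
        else:                           # dead end: jump to a random neighbor
            cur = rng.choice(sorted(adj[cur]))
        seen.add(cur)
    return None                         # gave up after `cap` steps

adj = make_scale_free(1000)
random_pick = lambda rng, nbrs: rng.choice(nbrs)
degree_pick = lambda rng, nbrs: max(nbrs, key=lambda v: len(adj[v]))
rw = walk_search(adj, 0, 999, random_pick)
hd = walk_search(adj, 0, 999, degree_pick)
```

On typical runs the high-degree walk climbs to the hubs within a few steps and tends to reach the target in far fewer hops than the random walk, mirroring the scaling comparison above.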

All the algorithms discussed so far [6, 83, 84, 85, 144] have assumed that the edges in
the network are equivalent. But the assumption of equal edge weights (which may represent
the cost, bandwidth, distance, or power consumption associated with the process described
by the edge) usually does not hold in real networks. Many researchers [17, 27, 28, 36, 60,
62, 65, 87, 100, 106, 116, 118, 148] have pointed out that it is incomplete to assume that all
the edges are equivalent. Recently, Thadakamalla et al. [131] proposed a new search
algorithm based on a network measure called local betweenness centrality (LBC) that utilizes
the heterogeneities in node degrees and edge weights. The LBC of a neighbor node i, L(i),
is given by

    L(i) = Σ_{s ≠ i ≠ t; s,t ∈ local network} σst(i)/σst ,

where σst is the total number of shortest paths from node s to node t (a shortest path being
a path over which the sum of edge weights is minimal), and σst(i) is the number of these
shortest paths passing through i. If the LBC of a node is high, it implies that this node is
critical in the local

Table 11.2: Comparison of different search strategies in power-law networks with exponent
2.1 and 2000 nodes, for different edge-weight distributions. The mean of every edge-weight
distribution is 5 and the variance is σ². The values in the table are the average distances
obtained by each search strategy in these networks; the values in parentheses give the
relative difference between the average distance of each strategy and that of the LBC
strategy. LBC search, which reflects the heterogeneities in both edge weights and node
degree, performed best for all edge-weight distributions.

                                   Beta         Uniform      Exp.         Power-law
  Search strategy                  σ² = 2.3     σ² = 8.3     σ² = 25      σ² = 4653.8

  Random walk                      1107.71      1097.72      1108.70      1011.21
                                   (202%)       (241%)       (272%)       (344%)
  Minimum edge weight               704.47       414.71       318.95       358.54
                                   (92%)        (29%)        (7%)         (44%)
  Highest degree                    379.98       368.43       375.83       394.99
                                   (4%)         (14%)        (26%)        (59%)
  Minimum average node weight      1228.68       788.15       605.41       466.18
                                   (235%)       (145%)       (103%)       (88%)
  Highest LBC                       366.26       322.30       298.06       247.77

network. Thadakamalla et al. assume that each node in the network has information about
its first and second neighbors; using this information, the node calculates the LBC of each
neighbor and passes the message to the neighbor with the highest LBC. They demonstrated
that this search algorithm utilizes the heterogeneities in node degree and edge weights to
perform well in power-law networks with exponent between 2.0 and 2.9 for a variety of edge-
weight distributions. Table 11.2 compares the performance of different search algorithms on
scale-free networks with different edge-weight distributions; the values in parentheses give
the relative difference between the average distance of each algorithm and that of the LBC
algorithm. In particular, they observed that as the heterogeneity in the edge weights
increases, the difference between the high-degree search and the LBC search increases. This
implies that it is critical to consider edge weights in local search algorithms. Moreover,
given that many real networks are heterogeneous in edge weights, it becomes important to
use an LBC-based search rather than the high-degree search of Adamic et al. [6].
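Computing an LBC amounts to counting shortest paths inside the local network. The sketch below assumes unit edge weights for clarity (for weighted graphs the breadth-first search would be replaced by Dijkstra's algorithm, and the first-and-second-neighbor restriction is applied by passing the ego network as the node set). It uses the standard Brandes path-counting recursion on a toy local network in which one node bridges two halves and therefore receives the highest LBC.

```python
from collections import deque

def betweenness(adj, nodes):
    """Sum over pairs of the fraction of shortest paths through each node,
    restricted to `nodes` (on an undirected graph each unordered pair is
    counted once per direction)."""
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        # single-source shortest paths with path counts (Brandes)
        sigma = {v: 0 for v in nodes}; sigma[s] = 1
        dist = {v: -1 for v in nodes}; dist[s] = 0
        order, preds = [], {v: [] for v in nodes}
        q = deque([s])
        while q:
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if w not in bc:          # stay inside the local network
                    continue
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):        # back-propagate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# toy local network: node 2 bridges two halves, so its LBC is highest
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
lbc = betweenness(adj, set(adj))
best = max(lbc, key=lbc.get)   # the message would be passed to `best`
```

Every shortest path between the {0, 1} side and the {3, 4} side passes through node 2, so the search hands the message to the bridging node, exactly the behavior the LBC strategy is designed to produce.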

11.5.3 Other topics

There are various other applications to real networks, involving both the structure of the
networks and their dynamics. In this subsection, we briefly summarize these applications
and give some references for further study.

Detecting community structures

As mentioned earlier, community structures are typically found in many real networks.
Finding these communities is extremely helpful in understanding the structure and function
of the network. Sometimes the statistical properties of a community alone may be very
different from those of the whole network, and hence may be critical for understanding the
dynamics within the community. The following are some examples:

• The World Wide Web: Identification of communities in the web is helpful for the im-
plementation of search engines, content filtering, automatic classification, automatic
realization of ontologies and focused crawlers [18, 56].

• Social networks: Community structures are a typical feature of social networks. The
behavior of an individual is highly influenced by the community he/she belongs to.
Communities often have their own norms and subcultures, which are an important
source of a person’s identity [103, 139].

• Biological networks: Community structures are found in cellular [72, 123], metabolic
[121] and genetic networks [147]. Identifying them helps in finding the functional
modules which correspond to specific biological functions.

Algorithmically, the community detection problem is the same as the cluster analysis
problem studied extensively by the OR community, computer scientists, statisticians, and
mathematicians [67]. One of the major classes of clustering algorithms is hierarchical
algorithms, which fall into two broad types, agglomerative and divisive. In an agglomerative
method, an empty network (n nodes with no edges) is considered, and edges are added
based on some similarity measure between nodes (for example, similarity based on the number of

common neighbors) starting with the edge between the pairs with highest similarity. This
procedure can be stopped at any step and the distinct components of the network are taken
to be the communities. On the other hand, in divisive methods edges are removed from
the network based on certain measure (for example, the edge with the highest betweenness
centrality [103]). As this process continues the network disintegrates into different communi-
ties. Recently, many such algorithms are proposed and applied to complex networks [31, 46].
A comprehensive list of algorithms to identify community structures in complex networks
can be found in [46] where Danon et al. have compared them in terms of sensitivity and
computational cost.
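
As a concrete sketch of the divisive procedure just described, the following stdlib-only Python fragment repeatedly removes the edge with the highest betweenness (in the spirit of the Girvan-Newman method [103]) until a desired number of components appears. The brute-force betweenness routine and the toy two-triangle graph are our own illustrative choices, not code from any of the cited works.

```python
from collections import defaultdict, deque
from itertools import combinations

def all_shortest_paths(adj, s, t):
    """Enumerate every shortest s-t path via a BFS predecessor DAG."""
    dist, preds = {s: 0}, defaultdict(list)
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                preds[v].append(u)
    if t not in dist:
        return []
    def build(v):
        return [[s]] if v == s else [p + [v] for u in preds[v] for p in build(u)]
    return build(t)

def edge_betweenness(adj):
    """Credit each edge with the fraction of shortest paths crossing it."""
    score = defaultdict(float)
    for s, t in combinations(adj, 2):
        paths = all_shortest_paths(adj, s, t)
        for path in paths:
            for u, v in zip(path, path[1:]):
                score[frozenset((u, v))] += 1.0 / len(paths)
    return score

def components(adj):
    """Connected components, each returned as a sorted node list."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u])
        seen |= comp
        comps.append(sorted(comp))
    return comps

def divisive_communities(adj, k):
    """Remove the highest-betweenness edge until k components appear."""
    adj = {u: set(vs) for u, vs in adj.items()}
    while len(components(adj)) < k and any(adj.values()):
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Two triangles joined by a bridge: the bridge carries the most
# shortest paths and is removed first, exposing the two communities.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(divisive_communities(adj, 2))  # [[0, 1, 2], [3, 4, 5]]
```

On the example graph, the bridge edge {2, 3} carries all nine cross-community shortest paths, so it is removed first and the two triangles emerge as communities. Recomputing betweenness after every removal, as done here, is the step that makes the exact method expensive on large networks.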

Another interesting problem in community detection is to find a clique of maximum
cardinality in the network. A clique is a complete subgraph of the network. In the network
G(V, E), let G(S) denote the subgraph induced by a subset S ⊆ V . A network G(V, E) is
complete if each node in the network is connected to every other node, i.e. ∀i, j ∈ V, {i, j} ∈
E. A clique C is a subset of V such that the induced graph G(C) is complete. The maximum
clique problem has many practical applications, such as project selection, coding theory,
computer vision, economics and the integration of genome mapping data [38, 68, 110]. For
instance, in [33], Boginski et al. solve this problem to find a maximal independent set in the
market graph, which can form the basis of a diversified portfolio. The maximum clique
problem is known to be NP-hard [58], and details on various algorithms and heuristics can be
found in [78, 110]. Further, if the network is large, its data may not fit completely inside the
computer’s internal memory; we then need external memory algorithms and data structures
[3] to solve optimization problems on such networks. These algorithms use slower external
memory (such as disks), and the resulting communication between internal and external
memory can be a major performance bottleneck. In [4], using external memory algorithms,
Abello et al. propose decomposition schemes that make large sparse graphs suitable for
processing by graph optimization algorithms.
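
To make the clique terminology concrete, the short sketch below enumerates all maximal cliques with the classical Bron-Kerbosch recursion and returns a largest one; its worst-case exponential running time is consistent with the NP-hardness of the problem [58]. This is a textbook in-memory algorithm, not the external-memory approach of [4], and the function names and toy graph are our own.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with the Bron-Kerbosch recursion."""
    cliques = []
    def expand(r, p, x):
        # r: current clique, p: candidate extensions, x: already-processed nodes
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    expand(set(), set(adj), set())
    return cliques

def maximum_clique(adj):
    """A clique of maximum cardinality (exponential worst case)."""
    return max(maximal_cliques(adj), key=len)

# A 4-clique {0, 1, 2, 3} with a pendant node 4 attached to node 3.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
print(maximum_clique(adj))  # [0, 1, 2, 3]
```

The recursion maintains the invariant that every node in p and x is adjacent to all of r, so each completed r is a clique; for graphs that do not fit in memory, this is exactly where the external-memory decomposition schemes of [4] come in.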

Spreading processes

The diffusion of an infectious disease, a computer virus, or information on a network
constitutes a spreading process. In particular, the spread of infectious diseases in a pop-
ulation is called epidemic spreading. Epidemiological modeling has been an active research
area for a long time and is heavily used in planning and implementing various prevention
and control programs [48]. Recently, there has been a burst of activity on understanding
the effects of network properties on the rate and dynamics of disease propagation [13, 31,
49, 101]. Most of the earlier methods used the homogeneous mixing hypothesis [21], which
implies that the individuals in contact with susceptible individuals are uniformly distributed
throughout the entire population. However, recent findings (Section 11.2), such as
heterogeneities in node degree, the presence of high clustering coefficients, and community
structures, indicate that this assumption is far from reality. Many models have since been
proposed [13, 31, 42, 49, 101, 112, 115] that take these properties of the network into
account. In particular, many researchers have shown that incorporating these properties in
the model radically changes the results previously established for random graphs. Other
spreading processes of interest include the spread of computer viruses [24, 91, 105],
data dissemination on the Internet [80, 137], and strategies for marketing campaigns [90].
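
As a minimal illustration of epidemic spreading on a network, the sketch below runs a discrete-time SIR (susceptible-infected-recovered) process: at each step every infected node transmits to each susceptible neighbor independently with probability beta and then recovers. This is a generic textbook-style model, not the specific formulation of any reference above; the star-shaped contact network and parameter values are illustrative assumptions.

```python
import random

def sir_outbreak(adj, beta, seed, rng):
    """Discrete-time SIR: each infected node infects each susceptible
    neighbor independently with probability beta, then recovers."""
    susceptible = set(adj) - {seed}
    infected, recovered = {seed}, set()
    while infected:
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v in susceptible and rng.random() < beta:
                    newly.add(v)
        susceptible -= newly
        recovered |= infected
        infected = newly
    return len(recovered)  # final outbreak size

# A hub-and-spoke ("star") contact network: the hub 0 touches everyone.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
print(sir_outbreak(star, 1.0, 0, random.Random(42)))  # 6: every contact transmits
print(sir_outbreak(star, 0.0, 0, random.Random(42)))  # 1: the seed recovers alone
```

On heterogeneous networks such as this star, whether the hub gets infected dominates the outbreak size, which is one intuition behind the degree-heterogeneity results cited above.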

Congestion

Transport of packets or materials, ranging from packet transfer on the Internet to mass
transfer in chemical reactions in the cell, is one of the fundamental processes occurring on
many real networks. Due to limitations in resources (bandwidth), an increase in the number
of packets (the packet generation rate) may lead to overloaded nodes and unusually long
delivery times, in other words, to congestion in the network. Considering a basic model,
Ohira and Sawatari [107] have shown that there exists a phase transition from a free-flow
phase to a congested phase as a function of the packet generation rate. This critical rate
is commonly called the “congestion threshold”; the higher the threshold, the better the
network performance with respect to congestion.

Many studies have shown that network topology and routing algorithms play an important
role in congestion [40, 47, 50, 51, 63, 64, 126, 128, 132]. Toroczkai et al. [132] have shown
that on large networks in which flows are influenced by gradients of a scalar distributed on
the nodes, scale-free topologies are less prone to congestion than random graphs. Routing
algorithms also influence congestion at nodes. For example, in scale-free networks, if packets
are routed through the shortest paths, then most of them pass through the hubs, causing
higher loads there [59]. Singh and Gupte [126] discuss strategies for manipulating hub
capacity and hub connections to relieve congestion in the network. Similarly, many
congestion-aware routing algorithms [40, 50, 51, 128] have been proposed to improve
performance. Sreenivasan et al. [128] introduced a static routing protocol that is superior to
shortest-path routing under intense packet generation rates: packets are routed through
hub-avoiding paths unless the hubs are required to establish the route. Sometimes, when
global information is not available, routing is done using local search algorithms. Congestion
due to such local search algorithms, and optimal network configurations for them, are
studied in [22].

11.6 Conclusions

Complex networks abound in today’s world and are continuously evolving. The sheer size
and complexity of these networks pose unique challenges in their design and analysis. Such
complex networks are so pervasive that there is an immediate need to develop new analytical
approaches. In this chapter, we presented significant findings and developments in recent
years that led to a new field of inter-disciplinary research, Network Science. We discussed
how network approaches and optimization problems are different in network science than
traditional OR algorithms and addressed the need and opportunity for the OR community
to contribute to this fast-growing research field. The fundamental difference is that large-
scale networks are characterized based on macroscopic properties such as degree distribution
and clustering coefficient rather than the individual properties of the nodes and edges. Im-
portantly, these macroscopic or statistical properties have a huge influence on the dynamic
processes taking place on the network. Therefore, to optimize a process on a given
configuration, it is important to understand the interactions between the macroscopic
properties and the process. This will further help in the design of optimal network configurations for
various processes. Due to the growing scale of many engineered systems, a macroscopic
network approach is necessary for the design and analysis of such systems. Moreover, the
macroscopic properties and structure of networks across different disciplines are found to be
similar. Hence the results of this research can easily be migrated to applications as diverse
as social networks and telecommunication networks.

Acknowledgments

The authors would like to acknowledge the National Science Foundation (Grant # DMI
0537992) and a Sloan Research Fellowship to one of the authors (R. A.) for making this work
feasible. In addition, the authors would like to thank the anonymous reviewer for helpful
comments and suggestions. Any opinions, findings and conclusions or recommendations
expressed in this material are those of the author(s) and do not necessarily reflect the views
of the National Science Foundation (NSF).
Bibliography

[1] The Internet Movie Database can be found on the WWW at http://www.imdb.com/.

[2] E. Aarts and J. K. Lenstra, editors. Local Search in Combinatorial Optimization. J. Wiley & Sons, Chichester, UK, 1997.

[3] J. Abello and J. Vitter, editors. External Memory Algorithms: DIMACS series in discrete mathematics and theoretical computer science, volume 50. American Mathematical Society, Boston, MA, USA, 1999.

[4] J. Abello, P. M. Pardalos, and M. G. C. Resende. External Memory Algorithms: DIMACS series in discrete mathematics and theoretical computer science, volume 50, chapter On maximum clique problems in very large graphs, pages 119–130. American Mathematical Society, 1999.

[5] L. A. Adamic and B. A. Huberman. Growth dynamics of the world-wide web. Nature, 401(6749):131, 1999.

[6] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman. Search in power-law networks. Phys. Rev. E, 64(4):046135, 2001.

[7] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, NJ, 1993.

[8] W. Aiello, F. Chung, and L. Lu. A random graph model for massive graphs. Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 171–180, 2000.

[9] W. Aiello, F. Chung, and L. Lu. A random graph model for power law graphs. Experimental Mathematics, 10(1):53–66, 2001.

[10] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor networks: A survey. Computer Networks, 38(4):393–422, 2002.

[11] J. N. Al-Karaki and A. E. Kamal. Routing techniques in wireless sensor networks: a survey. IEEE Wireless Communications, 11(6):6–28, 2004.

[12] R. Albert and A. L. Barabási. Topology of evolving networks: Local events and universality. Phys. Rev. Lett., 85(24):5234–5237, 2000.

[13] R. Albert and A. L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47–97, 2002.

[14] R. Albert, H. Jeong, and A. L. Barabási. Diameter of the world wide web. Nature, 401(6749):130–131, 1999.

[15] R. Albert, H. Jeong, and A. L. Barabási. Attack and error tolerance of complex networks. Nature, 406(6794):378–382, 2000.

[16] R. Albert, I. Albert, and G. L. Nakarado. Structural vulnerability of the North American power grid. Phys. Rev. E, 69(2):025103, 2004.

[17] E. Almaas, B. Kovacs, T. Vicsek, Z. N. Oltvai, and A. L. Barabási. Global organization of metabolic fluxes in the bacterium Escherichia coli. Nature, 427(6977):839–843, 2004.

[18] R. B. Almeida and V. A. F. Almeida. A community-aware search engine. In Proceedings of the 13th International Conference on World Wide Web, ACM Press, 2004.

[19] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley. Classes of small-world networks. Proc. Natl. Acad. Sci., 97(21):11149–11152, 2000.

[20] C. Anderson, S. Wasserman, and B. Crouch. A p∗ primer: Logit models for social networks. Social Networks, 21(1):37–66, 1999.

[21] R. M. Anderson and R. M. May. Infectious Diseases in Humans. Oxford University Press, Oxford, 1992.

[22] A. Arenas, A. Cabrales, A. Diaz-Guilera, R. Guimera, and F. Vega. Statistical mechanics of complex networks, chapter Search and Congestion in Complex Networks, pages 175–194. Springer-Verlag, Berlin, Germany, 2003.

[23] R. Badii and A. Politi. Complexity: Hierarchical structures and scaling in physics. Cambridge University Press, 1997.

[24] J. Balthrop, S. Forrest, M. E. J. Newman, and M. M. Williamson. Technological networks and the spread of computer viruses. Science, 304(5670):527–529, 2004.

[25] A. L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

[26] A. L. Barabási, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A, 311:590–614, 2002.

[27] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani. The architecture of complex weighted networks. Proc. Natl. Acad. Sci., 101(11):3747, 2004.

[28] A. Barrat, M. Barthelemy, and A. Vespignani. Modeling the evolution of weighted networks. Phys. Rev. E, 70(6):066149, 2004.

[29] M. Barthelemy, A. Barrat, R. Pastor-Satorras, and A. Vespignani. Characterization and modeling of weighted networks. Physica A, 346:34–43, 2005.

[30] C. H. Bennett. From Complexity to Life, chapter How to Define Complexity in Physics, and Why, pages 34–43. Oxford University Press, 2003.

[31] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424:175–308, 2006.

[32] V. Boginski, S. Butenko, and P. Pardalos. Statistical analysis of financial networks. Computational Statistics & Data Analysis, 48:431–443, 2005.

[33] V. Boginski, S. Butenko, and P. Pardalos. Mining market data: a network approach. Computers & Operations Research, 33:3171–3184, 2006.

[34] B. Bollobas. Random graphs. Academic, London, 1985.

[35] B. Bollobas and O. Riordan. Handbook of Graphs and Networks, chapter Mathematical results on scale-free graphs. Wiley-VCH, Berlin, 2003.

[36] L. A. Braunstein, S. V. Buldyrev, R. Cohen, S. Havlin, and H. E. Stanley. Optimal paths in disordered complex networks. Phys. Rev. Lett., 91(16):168701, 2003.

[37] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Computer Networks, 33:309–320, 2000.

[38] S. Butenko and W. E. Wilhelm. Clique-detection models in computational biochemistry and genomics. European Journal of Operational Research, 173:1–17, 2006.

[39] B. A. Carreras, V. E. Lynch, I. Dobson, and D. E. Newman. Critical points and transitions in an electric power transmission model for cascading failure blackouts. Chaos, 12(4):985–994, 2002.

[40] Z. Y. Chen and X. F. Wang. Effects of network structure and routing strategy on network capacity. Phys. Rev. E, 73(3):036107, 2006.

[41] F. Chung and L. Lu. Connected components in random graphs with given degree
sequences. Annals of combinatorics, 6:125–145, 2002.

[42] V. Colizza, A. Barrat, M. Barthelemy, and A. Vespignani. The role of the airline
transportation network in the prediction and predictability of global epidemics. Proc. Natl.
Acad. Sci., 103(7):2015–2020, 2006.

[43] N. Contractor, S. Wasserman, and K. Faust. Testing multi-theoretical multilevel hy-


potheses about organizational networks: An analytic framework and empirical exam-

ple. Academy of Management Review, 31(3):681–703, 2006.

[44] L. F. Costa. Reinforcing the resilience of complex networks. Phys. Rev. E, 69(6):
066127, 2004.

[45] P. Crucitti, V. Latora, and M. Marchiori. Model for cascading failures in complex
networks. Phys. Rev. E, 69(4):045104, 2004.

[46] L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas. Comparing community structure


identification. Journal of Statistical Mechanics, page P09008, 2005.

[47] M. Argollo de Menezes and A.-L. Barabási. Fluctuations in network dynamics. Phys.
Rev. Lett., 92(2):028701, 2004.

[48] O. Diekmann and J. Heesterbeek. Mathematical Epidemiology of Infectious Diseases:


Model Building, Analysis and Interpretation. Wiley, New York, 2000.

[49] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of networks. Adv. Phys., 51:


1079–1187, 2002.

[50] P. Echenique, J. Gomez-Gardenes, and Y. Moreno. Improved routing strategies for


internet traffic delivery. Phys. Rev. E, 70(5):056105, 2004.

[51] P. Echenique, J. Gomez-Gardenes, and Y. Moreno. Dynamics of jamming transitions


in complex networks. Europhys. Lett., 71(2):325–331, 2005.

[52] P. Erdos and A. Renyi. On random graphs. Publicationes Mathematicae, 6:290–297, 1959.

[53] P. Erdos and A. Renyi. On the evolution of random graphs. Magyar Tud. Mat. Kutato Int. Kozl., 5:17–61, 1960.

[54] P. Erdos and A. Renyi. On the strength of connectedness of a random graph. Acta Math. Acad. Sci. Hungar., 12:261–267, 1961.

[55] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet


topology. Computer Communications Review, 29:251–262, 1999.

[56] G. Flake, S. Lawrence, and C. Lee Giles. Efficient identification of web communities.

In Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data


Mining, pages 150–160, 2000.

[57] O. Frank and D. Strauss. Markov graphs. J. American Statistical Association, 81:
832–842, 1986.

[58] M. R. Garey and D. S. Johnson. Computers and Intractability, A Guide to the Theory

of NP-Completeness. W. H. Freeman, 1979.

[59] K. I. Goh, B. Kahng, and D. Kim. Universal behavior of load distribution in scale-free
networks. Phys. Rev. Lett., 87(27):278701, 2001.

[60] K. I. Goh, J. D. Noh, B. Kahng, and D. Kim. Load distribution in weighted complex

networks. Phys. Rev. E, 72(1):017102, 2005.

[61] R. Govindan and H. Tangmunarunkit. Heuristics for internet map discovery. IEEE
INFOCOM, 3:1371–1380, 2000.

[62] M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):
1360–1380, 1973.

[63] R. Guimera, A. Arenas, A. Dı́az-Guilera, and F. Giralt. Dynamical properties of model


communication networks. Phys. Rev. E, 66(2):026704, 2002.

[64] R. Guimera, A. Dı́az-Guilera, F. Vega-Redondo, A. Cabrales, and A. Arenas. Optimal


network topologies for local search with congestion. Phys. Rev. Lett., 89(24):248701,
2002.

[65] R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral. The worldwide air trans-
portation network: Anomalous centrality, community structure, and cities’ global roles.
Proc. Nat. Acad. Sci., 102:7794–7799, 2005.

[66] A. Gulli and A. Signorini. The indexable web is more than 11.5 billion pages. In
WWW ’05: Special interest tracks and posters of the 14th international conference on
World Wide Web, pages 902–903. ACM Press, New York, USA, 2005.

[67] P. Hansen and B. Jaumard. Cluster analysis and mathematical programming. Math-
ematical programming, 79:191–215, 1997.

[68] J. Hasselberg, P. M. Pardalos, and G. Vairaktarakis. Test case generators and compu-

tational results for the maximum clique problem. Journal of Global Optimization, 3:
463–482, 1993.

[69] B. Hendrickson and R. W. Leland. A multilevel algorithm for partitioning graphs. In

Supercomputing ’95: Proceedings of the 1995 ACM/IEEE conference on Supercomput-


ing, page 28. ACM Press, New York, USA, 1995.

[70] P. W. Holland and S. Leinhardt. An exponential family of probability distributions

for directed graphs. J. American Statistical Association, 76:33–65, 1981.

[71] P. Holme and B. J. Kim. Attack vulnerability of complex networks. Phys. Rev. E, 65
(5), 2002.

[72] P. Holme, M. Huss, and H. Jeong. Subnetwork hierarchies of biochemical pathways.


Bioinformatics, 19:532–538, 2003.

[73] V. Honavar. Complex Adaptive Systems Group at Iowa State University,


http://www.cs.iastate.edu/∼honavar/cas.html, date accessed: March 22, 2006.

[74] R. Ferrer i Cancho and R. V. Solé. Statistical mechanics of complex networks, chapter

Optimization in complex networks, pages 114–126. Springer-Verlag, Berlin, 2003.

[75] R. Ferrer i Cancho, C. Janssen, and R. V. Solé. Topology of technology graphs: Small

world patterns in electronic circuits. Phys. Rev. E, 64(4):046119, 2001.

[76] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: a scalable and


robust communication paradigm for sensor networks. Proceedings of ACM MobiCom

’00, Boston, MA, pages 174–185, 2000.

[77] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale
organization of metabolic networks. Nature, 407:651–654, 2000.

[78] D. J. Johnson and M. A. Trick, editors. Cliques, Coloring, and Satisfiability: Sec-
ond DIMACS Implementation Challenge, Workshop, October 11-13, 1993. American

Mathematical Society, Boston, USA, 1996.

[79] G. Kan. Peer-to-Peer Harnessing the Power of Disruptive Technologies, chapter


Gnutella. O’Reilly, Beijing, 2001.

[80] A.-M. Kermarrec, L. Massoulie, and A. J. Ganesh. Probabilistic reliable dissemination


in large-scale systems. IEEE Trans. on Parallel and Distributed Sys, 14(3):248–258,

2003.

[81] B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs.
The Bell System Technical Journal, 49:291–307, 1970.

[82] R. Kinney, P. Crucitti, R. Albert, and V. Latora. Modeling cascading failures in the
north american power grid. The European Physical Journal B, 46:101–107, 2005.

[83] J. Kleinberg. Navigation in a small world. Nature, 406:845, 2000.

[84] J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. 32nd


ACM Symposium on Theory of Computing, pages 163–170, 2000.

[85] J. Kleinberg. Small-world phenomena and the dynamics of information. Advances in


Neural Information Processing Systems, 14:431–438, 2001.

[86] D. Koschützki, K. A. Lehmann, L. Peeters, S. Richter, D. Tenfelde-Podehl, and O. Zlo-


towski. Network Analysis, chapter Centrality Indices, pages 16–61. Springer-Verlag,
Berlin, 2005.

[87] A. E. Krause, K. A. Frank, D. M. Mason, R. E. Ulanowicz, and W. W. Taylor. Com-


partments revealed in food-web structure. Nature, 426:282–285, 2003.

[88] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal.
The web as a graph. Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART
symposium on Principles of database systems, pages 1–10, 2000.

[89] S. Lawrence and C. L. Giles. Accessibility of information on the web. Nature, 400:
107–109, 1999.

[90] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing,


2005. e-print physics/0509039, http://lanl.arxiv.org/abs/physics?papernum=0509039.

[91] A. L. Lloyd and R. M. May. How viruses spread among computers and people. Science,
292:1316–1317, 2001.

[92] S. Milgram. The small world problem. Psychology Today, 2:60–67, 1967.

[93] M. Molloy and B. Reed. A critical point for random graphs with a given degree
sequence. Random Structures Algorithms, 6:161–179, 1995.

[94] Y. Moreno, J. B. Gomez, and A. F. Pacheco. Instability of scale-free networks under


node-breaking avalanches. Europhys. Lett., 58(4):630–636, 2002.

[95] Y. Moreno, R. Pastor-Satorras, A. Vazquez, and A. Vespignani. Critical load and

congestion instabilities in scale-free networks. Europhys. Lett., 62(2):292–298, 2003.

[96] A. E. Motter. Cascade control and defense in complex networks. Phys. Rev. Lett., 93

(9):098701, 2004.

[97] A. E. Motter and Y. Lai. Cascade-based attacks on complex networks. Phys. Rev. E,
66(6):065102, 2002.

[98] M. E. J. Newman. Models of small world. Journal Statistical Physics, 101:819–841,


2000.

[99] M. E. J. Newman. Scientific collaboration networks: I. network construction and


fundamental results. Phys. Rev. E, 64(1):016131, 2001.

[100] M. E. J. Newman. Scientific collaboration networks: Ii. shortest paths, weighted


networks, and centrality. Phys. Rev. E, 64(1):016132, 2001.

[101] M. E. J. Newman. The structure and function of complex networks. SIAM Review,

45:167–256, 2003.

[102] M. E. J. Newman. Handbook of Graphs and Networks, chapter Random graphs as
models of networks. Wiley-VCH, Berlin, 2003.

[103] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in


networks. Phys. Rev. E, 69(2):026113, 2004.

[104] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary


degree distributions and their applications. Phys. Rev. E, 64(2):026118, 2001.

[105] M. E. J. Newman, S. Forrest, and J. Balthrop. Email networks and the spread of
computer viruses. Phys. Rev. E, 66(3):035101, 2002.

[106] J. D. Noh and H. Rieger. Stability of shortest paths in complex networks with random

edge weights. Phys. Rev. E, 66(6):066127, 2002.

[107] T. Ohira and R. Sawatari. Phase transition in a computer network traffic model. Phys.

Rev. E, 58(1):193–195, 1998.

[108] Committee on network science for future army applications. Network Science. The
National Academies Press, 2005.

[109] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping community
structure of complex networks in nature and society. Nature, 435:814–818, 2005.

[110] P. M. Pardalos and J. Xue. The maximum clique problem. Journal of Global Opti-
mization, 4:301–328, 1994.

[111] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics and endemic states in


complex networks. Phys. Rev. E, 63(6):066117, 2001.

[112] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks.

Phys. Rev. Lett., 86:3200–3203, 2001.

[113] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics in finite size scale-free

networks. Phys. Rev. E, 65(3):035108, 2002.

[114] R. Pastor-Satorras and A. Vespignani. Immunization of complex networks. Phys. Rev.


E, 65(3):036104, 2002.

[115] R. Pastor-Satorras and A. Vespignani. Handbook of Graphs and Networks, chapter


Epidemics and immunization in scale-free networks. Wiley-VCH, Berlin, 2003.

[116] R. Pastor-Satorras and A. Vespignani. Evolution and structure of the Internet: A


statistical physics approach. Cambridge University Press, 2004.

[117] G. Paul, T. Tanizawa, S. Havlin, and H. E. Stanley. Optimization of robustness of

complex networks. Eur. Phys. Journal B, 38:187–191, 2004.

[118] S.L. Pimm. Food Webs. The University of Chicago Press, 2 edition, 2002.

[119] A. Pothen, H. Simon, and K. Liou. Partitioning sparse matrices with eigenvectors of
graphs. SIAM J. Matrix Anal., 11(3):430–452, 1990.

[120] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identi-
fying communities in networks. Proc. Natl. Acad. Sci., 101:2658–2663, 2004.

[121] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási. Hierarchical


organization of modularity in metabolic networks. Science, 297:1551–1555, 2002.

[122] M. Ripeanu, I. Foster, and A. Iamnitchi. Mapping the gnutella network: Properties

of large-scale peer-to-peer systems and implications for system design. IEEE Internet
Computing Journal, 6:50–57, 2002.

[123] A. W. Rives and T. Galitski. Modular organization of cellular networks. Proc.
Natl. Acad. Sci., 100(3):1128–1133, 2003.

[124] M. L. Sachtjen, B. A. Carreras, and V. E. Lynch. Disturbances in a power transmission

system. Phys. Rev. E, 61(5):4877–4882, 2000.

[125] B. Shargel, H. Sayama, I. R. Epstein, and Y. Bar-Yam. Optimization of robustness


and connectivity in complex networks. Phys. Rev. Lett., 90(6):068701, 2003.

[126] B. K. Singh and N. Gupte. Congestion and decongestion in a communication network.


Phys. Rev. E, 71(5):055103, 2005.

[127] T. A. B. Snijders. Markov chain monte carlo estimation of exponential random graph
models. J. Social Structure, 3(2):1–40, 2002.

[128] S. Sreenivasan, R. Cohen, E. Lopez, Z. Toroczkai, and H. E. Stanley. Communication
bottlenecks in scale-free networks, 2006. e-print cs.NI/0604023,
http://xxx.lanl.gov/abs/cs?papernum=0604023.

[129] D. Strauss. On a general class of models for interaction. SIAM Review, 28:513–527,
1986.

[130] H. P. Thadakamalla, U. N. Raghavan, S. R. T. Kumara, and R. Albert. Survivability

of multi-agent based supply networks: A topological perspective. IEEE Intelligent


Systems, 19:24–31, 2004.

[131] H. P. Thadakamalla, R. Albert, and S. R. T. Kumara. Search in weighted complex

networks. Phys. Rev. E, 72(6):066128, 2005.

[132] Z. Toroczkai and K. E. Bassler. Network dynamics: Jamming is limited in scale-free

systems. Nature, 428:716, 2004.

[133] A. X. C. N. Valente, A. Sarkar, and H. A. Stone. Two-peak and three-peak optimal


complex networks. Phys. Rev. Lett., 92(11):118702, 2004.

[134] V. Venkatasubramanian, S. Katare, P. R. Patkar, and F. Mu. Spontaneous emergence


of complex optimal networks through evolutionary adaptation. Computers & Chemical

Engineering, 28(9):1789–1798, 2004.

[135] A. Vespignani. Epidemic modeling: Dealing with complexity.


http://vw.indiana.edu/talks-fall04/, 2004. Date accessed: July 6, 2006.

[136] A. Vespignani. Frontiers of Engineering: Reports on Leading-Edge Engineering from


the 2005 Symposium, chapter Complex Networks: Ubiquity, Importance, and Implica-
tions, pages 75–81. The National Academies Press, 2006.

[137] W. Vogels, R. van Renesse, and K. Birman. The power of epidemics: robust commu-
nication for large-scale distributed systems. SIGCOMM Comput. Commun. Rev., 33
(1):131–135, 2003.

[138] X. F. Wang and J. Xu. Cascading failures in coupled map lattices. Phys. Rev. E, 70
(5):056113, 2004.

[139] S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press,
1994.

[140] S. Wasserman and P. Pattison. Logit models and logistic regressions for social networks
1: An introduction to markov random graphs and p∗ . Psychometrika, 61:401–426, 1996.

[141] D. J. Watts. A simple model of global cascades on random networks. Proc. Natl. Acad.
Sci., 99(9):5766–5771, 2002.

[142] D. J. Watts. Six degrees: The science of a connected age. W. W. Norton & Company,
2003.

[143] D. J. Watts and S. H. Strogatz. Collective dynamics of “small-world” networks. Nature,


393:440–442, 1998.

[144] D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and search in social networks.
Science, 296:1302–1305, 2002.

[145] D. J. Watts, P. S. Dodds, and R. Muhamad.

http://smallworld.columbia.edu/index.html, date accessed: March 22, 2006.

[146] H. S. Wilf. Generating Functionology. Academic, Boston, 1990.



[147] D. M. Wilkinson and B. A. Huberman. A method for finding communities of related


genes. Proc. Natl. Acad. Sci., 101:5241–5248, 2004.

[148] S. H. Yook, H. Jeong, A. L. Barabási, and Y. Tu. Weighted evolving networks. Phys.
Rev. Lett., 86:5835–5838, 2001.
