
Lectures on Network Systems

Francesco Bullo
With scientific contributions from:
Jorge Cortés, Florian Dörfler, and Sonia Martínez
Version v0.81 (4 Jan 2016). This document is intended for personal use: you are allowed to
print and photocopy it. All other rights are reserved, e.g., this document (in whole or in part)
may not be posted online or shared in any way without express consent. © 2012-16.

Preface
Topics These lecture notes are intended primarily for graduate students interested in network systems,
distributed algorithms, and cooperative control. The objective is to answer basic questions such as:
What are fundamental dynamical models of interconnected systems? What are the essential dynamical
properties of these models and how are they related to network properties? What are basic estimation,
control, and optimization problems for these dynamical models?
The book is organized in two parts: Linear and Nonlinear Systems. The Linear Systems part includes
(i) basic concepts and results in matrix theory and graph theory (with an emphasis on Perron
Frobenius theory and algebraic graph theory),
(ii) averaging algorithms in discrete and continuous time, described by static, time-varying and
stochastic matrices,
whereas the Nonlinear Systems part includes
(iii) robotic coordination problems for relative sensing networks,
(iv) networks of phase oscillator systems with an emphasis on the Kuramoto model, and
(v) virus propagation models, including lumped and network models as well as stochastic and deterministic models.
Both parts include motivating examples of network systems and distributed algorithms from sensor,
social, robotic and power networks.
Books which try to digest, coordinate, get rid of the duplication, get rid of the less fruitful methods and
present the underlying ideas clearly of what we know now, will be the things the future generations
will value. Richard Hamming (1915-1998), Mathematician
The intended audience The intended audience is 1st year graduate students in Engineering, Sciences
and Applied Mathematics programs. For the first part on Linear Systems, the required background
includes competency in linear algebra and only very basic notions of dynamical systems. For the second
part on Nonlinear Systems (including coupled oscillators and virus propagation), the required background
includes a calculus course. The treatment is self-contained and does not require a nonlinear systems
course.
These lecture notes are meant to be taught over a quarter-long course with a total of 35 to 40 hours of
contact time. On average, each chapter should require approximately 2 hours of lecture time.
For the benefit of instructors, these lecture notes are supplemented by two documents. First, a
complete Answer Key is available on demand by an instructor. Second, these lecture notes are also
available in a slides format especially suited for classroom teaching.
Acknowledgments I wish to thank Sonia Martínez and Jorge Cortés for their fundamental contribution to my understanding and our joint work on distributed algorithms and robotic networks. Their
scientific contribution is most obviously present in Chapters 2, 3, and 4. I am grateful to Noah Friedkin
for instructive discussions about social influence networks that influenced Chapter 5, and to Florian
Dörfler for his extensive contributions to Chapters 13, 14, and 15 and to a large number of exercises.
I am grateful to Alessandro Giua for his detailed comments and suggestions. I wish to thank Sandro
Zampieri and Wenjun Mei for their contribution to Chapters 16 and 17 and to Stacy Patterson for
adopting an early version of these notes and providing me with detailed feedback. I wish to thank Jason
Marden and Lucy Pao for their invitation to visit the University of Colorado at Boulder and deliver some of
these lecture notes.
I also acknowledge the generous support of the Army Research Office through grant W911NF-11-1-0092 and the National Science Foundation through grants CPS-1035917 and CPS-1135819.
Finally, a special thank you goes to all students who took this course and all scientists who read
these notes. Particular thanks go to Deepti Kannapan, Peng Jia, Fabio Pasqualetti, Sepehr Seifi, John W.
Simpson-Porco, Ashish Cherukuri, Alex Olshevsky, and Vaibhav Srivastava for their contributions to
these lecture notes and homework solutions.

Santa Barbara, California, USA


29 Mar 2012 – 4 Jan 2016

Francesco Bullo

Contents

I  Linear Systems

1  Motivating Problems and Systems
   1.1  Social influence networks: opinion dynamics
   1.2  Wireless sensor networks: averaging algorithms
   1.3  Compartmental networks: dynamical flows among compartments
   1.4  Appendix: Robotic networks in cyclic pursuit and balancing
   1.5  Appendix: Design problems in wireless sensor networks
        1.5.1  Wireless sensor networks: distributed parameter estimation
        1.5.2  Wireless sensor networks: distributed hypothesis testing
   1.6  Exercises

2  Elements of Matrix Theory
   2.1  Linear systems and the Jordan normal form
        2.1.1  Discrete-time linear systems
        2.1.2  The Jordan normal form
        2.1.3  Semi-convergence and convergence for discrete-time linear systems
   2.2  Row-stochastic matrices and their spectral radius
        2.2.1  The spectral radius for row-stochastic matrices
   2.3  Perron–Frobenius theory
        2.3.1  Classification of nonnegative matrices
        2.3.2  Main results
        2.3.3  Applications to dynamical systems
        2.3.4  Selected proofs
   2.4  Exercises

3  Elements of Graph Theory
   3.1  Graphs and digraphs
   3.2  Paths and connectivity in undirected graphs
   3.3  Paths and connectivity in digraphs
        3.3.1  Connectivity properties of digraphs
        3.3.2  Periodicity of strongly-connected digraphs
        3.3.3  Condensation digraphs
   3.4  Weighted digraphs
   3.5  Database collections and software libraries
   3.6  Exercises

4  The Adjacency Matrix
   4.1  The adjacency matrix
   4.2  Algebraic graph theory: basic and prototypical results
   4.3  Powers of the adjacency matrix, paths and connectivity
   4.4  Graph theoretical properties of primitive matrices
   4.5  Exercises

5  Discrete-time Averaging Systems
   5.1  Averaging with primitive row-stochastic matrices
   5.2  Averaging with reducible matrices
   5.3  Averaging with reducible matrices and multiple sinks
   5.4  Design of weights for undirected graphs: the equal-neighbor model
   5.5  Design of weights for undirected graphs: the Metropolis–Hastings model
   5.6  Centrality measures
   5.7  Exercises

6  The Laplacian Matrix
   6.1  The Laplacian matrix
   6.2  The Laplacian in mechanical networks of springs
   6.3  The Laplacian in electrical networks of resistors
   6.4  Properties of the Laplacian matrix
   6.5  Graph connectivity and the rank of the Laplacian
   6.6  The algebraic connectivity, its eigenvector, and graph partitioning
   6.7  Exercises

7  Continuous-time Averaging Systems
   7.1  Example #1: Flocking behavior for a group of animals
   7.2  Example #2: A simple RC circuit
   7.3  Continuous-time linear systems and their convergence properties
   7.4  The Laplacian flow
   7.5  Design of weight-balanced digraphs from strongly-connected digraphs
   7.6  Distributed optimization using the Laplacian flow
   7.7  Exercises

8  The Incidence Matrix and Relative Measurements
   8.1  The incidence matrix
   8.2  Properties of the incidence matrix
   8.3  Distributed estimation from relative measurements
        8.3.1  Problem statement
        8.3.2  Optimal estimation via centralized computation
        8.3.3  Optimal estimation via decentralized computation
   8.4  Cycle and cutset spaces
   8.5  Exercises

9  Compartmental and Positive Systems
   9.1  Introduction and example systems
   9.2  Compartmental systems
   9.3  Positive systems
   9.4  Table of asymptotic behaviors for averaging and positive systems
   9.5  Exercises

10  Convergence Rates, Scalability and Optimization
    10.1  Some preliminary calculations and observations
    10.2  Convergence factors for row-stochastic matrices
    10.3  Cumulative quadratic index for symmetric matrices
    10.4  Circulant network examples and scalability analysis
    10.5  Design of fastest distributed averaging
    10.6  Exercises

11  Time-varying Averaging Algorithms
    11.1  Examples and models of time-varying discrete-time algorithms
          11.1.1  Shared Communication Channel
          11.1.2  Asynchronous Execution
          11.1.3  Models of time-varying averaging algorithms
    11.2  Convergence over time-varying connected graphs
    11.3  Convergence over digraphs connected over time
          11.3.1  Shared communication channel with round robin scheduling
          11.3.2  Convergence theorems for symmetric time-varying algorithms
          11.3.3  Uniform connectivity is required for non-symmetric matrices
    11.4  Analysis methods and proofs
          11.4.1  Bounded solutions and non-increasing max-min function
          11.4.2  Proof of Theorem 11.2: the max-min function is exponentially decreasing
    11.5  Time-varying algorithms in continuous-time
          11.5.1  Undirected graphs
          11.5.2  Directed graphs
    11.6  Exercises

12  Randomized Averaging Algorithms
    12.1  Examples of randomized averaging algorithms
    12.2  A brief review of probability theory
    12.3  Randomized averaging algorithms
          12.3.1  Additional results on uniform symmetric gossip algorithms
          12.3.2  Additional results on the mean-square convergence factor
    12.4  Table of asymptotic behaviors for averaging systems

II  Nonlinear Systems

13  Nonlinear Systems and Robotic Coordination
    13.1  Coordination in relative sensing networks
    13.2  Stability theory for dynamical systems
          13.2.1  Main convergence tool: the LaSalle Invariance Principle
          13.2.2  Application #1: Linear and linearized systems
          13.2.3  Application #2: Negative gradient systems
    13.3  A nonlinear rendezvous problem
    13.4  Flocking and Formation Control
    13.5  Rigidity and stability of the target formation
    13.6  Exercises

14  Coupled Oscillators: Basic Models
    14.1  History
    14.2  Examples
          14.2.1  Example #1: A spring network on a ring
          14.2.2  Example #2: The structure-preserving power network model
          14.2.3  Example #3: Flocking, schooling, and vehicle coordination
    14.3  Coupled phase oscillator networks
          14.3.1  The geometry of the circle and the torus
          14.3.2  Synchronization notions
          14.3.3  Preliminary results
          14.3.4  The order parameter and the mean field model
    14.4  Exercises

15  Networks of Coupled Oscillators
    15.1  Synchronization of identical oscillators
          15.1.1  An averaging-based approach
          15.1.2  The potential landscape, convergence and phase synchronization
          15.1.3  Phase balancing
    15.2  Synchronization of heterogeneous oscillators
          15.2.1  Synchronization of heterogeneous oscillators over complete homogeneous graphs
          15.2.2  Synchronization of heterogeneous oscillators over weighted undirected graphs
          15.2.3  Appendix: alternative theorem
    15.3  Exercises

16  Virus Propagation: Basic Models
    16.1  The SI model
    16.2  The SIR model
    16.3  The SIS model
    16.4  Exercises

17  Virus Propagation in Contact Networks
    17.1  The stochastic network SI model
    17.2  The network SI model
    17.3  The network SIS model
    17.4  The network SIR model
    17.5  Exercises

Bibliography


Part I

Linear Systems

Chapter 1

Motivating Problems and Systems


In this introductory chapter, we introduce some example problems and systems from multiple disciplines.
The objective is to motivate our interest in distributed systems, algorithms and control. We look at the
following examples:
(i) In the context of social influence networks, we discuss a classic reference on how opinions evolve
and possibly reach a consensus in groups of individuals. Here, consensus means that the opinions
of the individuals are identical.
(ii) In the context of wireless sensor networks, we discuss distributed simple averaging algorithms
and, in the appendix, two advanced design problems in the context of parameter estimation and
hypothesis testing.
(iii) In the context of compartmental networks, we discuss dynamical flows among compartments,
such as arising in ecosystems.
(iv) Finally, in the context of robotic networks, we discuss simple robotic behaviors for cyclic pursuit
and balancing.
In all cases we are interested in presenting the basic models and motivating interest in understanding
their dynamic behaviors, such as the existence and attractivity of equilibria.
We present additional linear examples in later chapters and nonlinear examples in the second part. For a
similarly valuable list of related and instructive examples, we refer to (Hendrickx 2008, Chapter 9) and (Garin
and Schenato 2010, Section 3.3). Other examples of multi-agent systems and applications can be found
in (Bullo et al. 2009; Fuhrmann and Helmke 2015; Mesbahi and Egerstedt 2010).


1.1 Social influence networks: opinion dynamics

This example is an illustration of the rich literature on opinion dynamics, starting with the early works
by French (1956), Harary (1959), and DeGroot (1974). Specifically, we adopt the setup quite literally
from (DeGroot 1974).
We consider a group of n individuals who must act together as a team. Each individual has his own subjective probability distribution $F_i$ for the unknown value of some parameter (or, more simply, an estimate of the parameter). We assume now that individual i is appraised of the distribution $F_j$ of each other member $j \ne i$ of the group. Then the DeGroot model predicts that the individual will revise its distribution to be
$$F_i^+ = \sum_{j=1}^{n} a_{ij} F_j,$$
where $a_{ij}$ denotes the weight that individual i assigns to the distribution of individual j when he carries out this revision. More precisely, the coefficient $a_{ii}$ describes the attachment of individual i to its own opinion and $a_{ij}$, $j \ne i$, is an interpersonal influence weight that individual i accords to individual j.

[Figure 1.1: Interactions in a social influence network]
In the DeGroot model, the coefficients $a_{ij}$ satisfy the following constraints: they are nonnegative, that is, $a_{ij} \ge 0$, and, for each individual, the sum of self-weight and accorded weights equals 1, that is, $\sum_{j=1}^{n} a_{ij} = 1$ for all i. In mathematical terms, the matrix
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$$
has nonnegative entries and each of its rows has unit sum. Such matrices are said to be row-stochastic.
Questions of interest are:
(i) Is this model of human opinion dynamics believable at all?
(ii) How does one measure the coefficients aij ?
(iii) Under what conditions do the distributions converge to consensus? What is this value?
(iv) What are more realistic, empirically-motivated models, possibly including stubborn individuals or
antagonistic interactions?
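To make these questions concrete, here is a minimal simulation sketch in Python; the 3 × 3 row-stochastic matrix and the initial opinions are invented for illustration (they are not taken from (DeGroot 1974)), and opinions are taken to be scalar estimates rather than full distributions.

import numpy as np

# A hypothetical row-stochastic influence matrix for n = 3 individuals:
# each row is nonnegative and sums to 1 (self-weight plus accorded weights).
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

x = np.array([1.0, 0.0, -1.0])   # initial scalar opinions
for _ in range(100):             # repeated revision x^+ = A x
    x = A @ x

print(x)   # all entries (numerically) identical: the group reaches consensus

For this particular matrix the opinions do converge to a common value; under what conditions this happens is precisely question (iii), which is answered in later chapters.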

1.2 Wireless sensor networks: averaging algorithms

[Figure 1.2: A wireless sensor network composed of a collection of spatially-distributed sensors (sensor nodes) in a field and a gateway node to carry information to an operator. The nodes are meant to measure environmental variables, such as temperature, sound, pressure, and cooperatively filter and transmit the information to an operator.]

A wireless sensor network is a collection of spatially-distributed devices capable of measuring physical and
environmental variables (e.g., temperature, vibrations, sound, light, etc), performing local computations,
and transmitting information throughout the network (including, possibly, an external operator).
Suppose that each node in a wireless sensor network has measured a scalar environmental quantity,
say xi . Consider the following simplest distributed algorithm, based on the concepts of linear averaging:
each node repeatedly executes

$$x_i^+ := \operatorname{average}\big(x_i, \{x_j, \text{ for all neighbor nodes } j\}\big), \qquad (1.1)$$
where $x_i^+$ denotes the new value of $x_i$. For example, for the graph in Figure 1.3, one can easily write $x_1^+ := (x_1 + x_2)/2$, $x_2^+ := (x_1 + x_2 + x_3 + x_4)/4$, and so forth. In summary, the algorithm's behavior is described by
$$x^+ = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 1/3 & 1/3 \end{bmatrix} x = A_{\text{wsn}}\, x,$$
where the matrix $A_{\text{wsn}}$ is again row-stochastic.

[Figure 1.3: Example graph with nodes 1, 2, 3, 4]
Questions of interest are:
(i) Does each node converge to a value? Is this value the same for all nodes?
(ii) Is this value equal to the average of the initial conditions?
(iii) What properties do the graph and the corresponding matrix need to have in order for the algorithm
to converge?
(iv) How quick is the convergence?
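A direct way to explore these questions is to iterate equation (1.1) numerically; the following sketch assumes the graph of Figure 1.3 and applies $A_{\text{wsn}}$ to a random initial measurement vector.

import numpy as np

A_wsn = np.array([[1/2, 1/2, 0,   0  ],
                  [1/4, 1/4, 1/4, 1/4],
                  [0,   1/3, 1/3, 1/3],
                  [0,   1/3, 1/3, 1/3]])

rng = np.random.default_rng(1)
x0 = rng.standard_normal(4)      # initial sensor measurements
x = x0.copy()
for _ in range(50):
    x = A_wsn @ x                # one round of local averaging

print(x)          # the four entries agree on a common consensus value
print(x0.mean())  # generally different: A_wsn is row- but not doubly-stochastic

One observes that the nodes agree on a common value that is, in general, not the average of the initial conditions; the weight-design sections of Chapter 5 explain why and how to remedy this.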

1.3 Compartmental networks: dynamical flows among compartments

Compartmental systems model dynamical processes characterized by conservation laws (e.g., mass, fluid,
energy) and by the flow of material between units known as compartments. The flow of energy and
nutrients (water, nitrates, phosphates, etc) in ecosystems is typically studied using compartmental modelling.
For example, Figure 1.4 illustrates a widely-cited water flow model for a desert ecosystem (Noy-Meir
1973).
[Figure 1.4: Water flow model for a desert ecosystem. Precipitation is an inflow into the soil compartment; uptake and drinking carry water from the soil to the plants and animals, respectively, and herbivory from the plants to the animals; evaporation, drainage, and runoff (from soil), transpiration (from plants), and evaporation (from animals) are outflows. The blue line denotes an inflow from the outside environment. The red lines denote outflows into the outside environment.]

If we let $q_i$ denote the amount of material in compartment i, the mass balance equation for the ith compartment is written as
$$\dot q_i = \sum_{j \ne i} (F_{ji} - F_{ij}) - F_{i0} + u_i,$$
where $u_i$ is the inflow from the environment and $F_{i0}$ is the outflow into the environment. We now assume linear flows, that is, we assume that the flow $F_{ij}$ from node i to node j (as well as to the environment) is proportional to the mass quantity at i, that is, $F_{ij} = f_{ij} q_i$ for a positive flow rate constant $f_{ij}$. Therefore we can write
$$\dot q_i = \sum_{j \ne i} (f_{ji} q_j - f_{ij} q_i) - f_{i0} q_i + u_i$$
and so, in vector notation, there exists an appropriate C matrix such that
$$\dot q = C q + u.$$

For example, let us write down the compartmental matrix C for the water flow model in figure. We let $q_1, q_2, q_3$ denote the water mass in soil, plants and animals, respectively. Moreover, as in figure, we let $f_{\text{e-d-r}}, f_{\text{trnsp}}, f_{\text{evap}}, f_{\text{drnk}}, f_{\text{uptk}}, f_{\text{herb}}$ denote respectively the evaporation-drainage-runoff, transpiration, evaporation, drinking, uptake, and herbivory rates. With these notations, we can write
$$C = \begin{bmatrix} -f_{\text{e-d-r}} - f_{\text{uptk}} - f_{\text{drnk}} & 0 & 0 \\ f_{\text{uptk}} & -f_{\text{trnsp}} - f_{\text{herb}} & 0 \\ f_{\text{drnk}} & f_{\text{herb}} & -f_{\text{evap}} \end{bmatrix}.$$
Questions of interest are:


(i) for constant inflows u, does the total mass in the system remain bounded?
(ii) is there an asymptotic equilibrium? do all evolutions converge to it?
(iii) which compartments become empty asymptotically?
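To explore these questions numerically, here is a minimal sketch that integrates $\dot q = Cq + u$ with a forward-Euler step; the flow-rate values and the precipitation inflow are invented for illustration and are not taken from (Noy-Meir 1973).

import numpy as np

# Hypothetical flow rates and a constant precipitation inflow into the soil.
f_edr, f_uptk, f_drnk = 0.3, 0.2, 0.05   # soil: evap-drainage-runoff, uptake, drinking
f_trnsp, f_herb = 0.4, 0.1               # plants: transpiration, herbivory
f_evap = 0.5                             # animals: evaporation

C = np.array([[-(f_edr + f_uptk + f_drnk), 0.0, 0.0],
              [f_uptk, -(f_trnsp + f_herb), 0.0],
              [f_drnk, f_herb, -f_evap]])
u = np.array([1.0, 0.0, 0.0])

q, dt = np.zeros(3), 0.01
for _ in range(100_000):        # forward-Euler integration of dq/dt = C q + u
    q = q + dt * (C @ q + u)

print(q)                        # converges to the equilibrium q* = -C^{-1} u
print(-np.linalg.solve(C, u))

For this choice of rates all compartments reach a positive equilibrium; whether and when this is always the case is the subject of Chapter 9.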

1.4 Appendix: Robotic networks in cyclic pursuit and balancing

In this section we consider two simple examples of coordinated motion in robotic networks. The standing assumption is that n robots, amicably referred to as bugs, are placed and restricted to move on a circle of unit radius. Because of this bio-inspiration and because this language is common in the literature (Bruckstein et al. 1991; Klamkin and Newman 1971; Marshall et al. 2004), we refer to the following two problems as n-bugs problems.
On this unit circle the bugs' positions are angles measured counterclockwise from the positive horizontal axis. We let angles take value in $[0, 2\pi)$, that is, an arbitrary position $\theta$ satisfies $0 \le \theta < 2\pi$. The bugs are numbered counterclockwise with identities $i \in \{1,\ldots,n\}$ and are at positions $\theta_1, \ldots, \theta_n$. It is convenient to identify $\theta_{n+1}$ with $\theta_1$. We assume the bugs move in discrete times k in a counterclockwise direction by a controllable amount $u_i$ (i.e., a control signal), that is:
$$\theta_i(k+1) = \operatorname{mod}(\theta_i(k) + u_i(k), 2\pi),$$
where $\operatorname{mod}(\theta, 2\pi)$ is the remainder of the division of $\theta$ by $2\pi$ and its introduction is required to ensure that $\theta_i(k+1)$ remains inside $[0, 2\pi)$.
The n-bugs problem is related to the study of pursuit curves and inquires about what the paths of n bugs are, not aligned initially, when they chase one another. We refer to (Bruckstein et al. 1991; Marshall et al. 2004; Smith et al. 2005; Watton and Kydon 1969) for surveys and recent results.

Objective: optimal patrolling of a perimeter. Approach: Cyclic pursuit

We now suppose that each bug feels an attraction and moves towards the closest counterclockwise neighbor, as illustrated in Figure 1.5. Recall that the counterclockwise distance from $\theta_i$ to $\theta_{i+1}$ is the length of the counterclockwise arc from $\theta_i$ to $\theta_{i+1}$ and satisfies
$$\mathrm{dist}_{cc}(\theta_i, \theta_{i+1}) = \operatorname{mod}(\theta_{i+1} - \theta_i, 2\pi).$$
In short, given a control gain $\kappa \in [0, 1]$, we assume that the ith bug sets its control signal to
$$u_{\text{pursuit},i}(k) = \kappa \,\mathrm{dist}_{cc}(\theta_i(k), \theta_{i+1}(k)).$$

[Figure 1.5: Cyclic pursuit and balancing, the prototypical n-bugs problems. In cyclic pursuit (left), bug i responds to the counterclockwise distance $\mathrm{dist}_{cc}(\theta_i, \theta_{i+1})$; in cyclic balancing (right), bug i responds to both $\mathrm{dist}_{cc}(\theta_i, \theta_{i+1})$ and the clockwise distance $\mathrm{dist}_c(\theta_i, \theta_{i-1})$.]





Questions of interest are:
(i) Does this system have any equilibrium?
(ii) Is a rotating equally-spaced configuration a solution? Here an equally-spaced configuration is one
for which mod(i+1 i , 2) = mod(i i1 , 2) for all i {1, . . . , n}.

(iii) For which values of do the bugs converge to an equally-spaced configuration and with what
pairwise distance?

Objective: optimal sensor placement. Approach: Cyclic balancing

Next, we suppose that each bug feels an attraction towards both the closest counterclockwise and the closest clockwise neighbor, as illustrated in Figure 1.5. Given a control gain $\kappa \in [0, 1/2]$ and the natural notion of clockwise distance, the ith bug sets its control signal to
$$u_{\text{balancing},i}(k) = \kappa \,\mathrm{dist}_{cc}(\theta_i(k), \theta_{i+1}(k)) - \kappa \,\mathrm{dist}_{c}(\theta_i(k), \theta_{i-1}(k)),$$
where $\mathrm{dist}_{c}(\theta_i(k), \theta_{i-1}(k)) = \mathrm{dist}_{cc}(\theta_{i-1}(k), \theta_i(k))$.
Questions of interest are:
(i) Is a static equally-spaced configuration a solution?
(ii) For which values of $\kappa$ do the bugs converge to a static equally-spaced configuration?
(iii) Is it true that the bugs will approach an equally-spaced configuration and that each of them will converge to a stationary position on the circle?

A preliminary analysis

It is unrealistic (among other aspects of this setup) to assume that the bugs know the absolute position of themselves and of their neighbors. Therefore, it is interesting to rewrite the dynamical system in terms of pairwise distances between nearby bugs.
For $i \in \{1,\ldots,n\}$, we define the relative angular distances (the lengths of the counterclockwise arcs) $d_i = \mathrm{dist}_{cc}(\theta_i, \theta_{i+1}) \ge 0$. (We also adopt the usual convention that $d_{n+1} = d_1$ and that $d_0 = d_n$.) The change of coordinates from $(\theta_1, \ldots, \theta_n)$ to $(d_1, \ldots, d_n)$ leads us to rewrite the cyclic pursuit and the cyclic balancing laws as:
$$u_{\text{pursuit},i}(k) = \kappa d_i, \qquad u_{\text{balancing},i}(k) = \kappa d_i - \kappa d_{i-1}.$$
In this new set of coordinates, one can show that the cyclic pursuit and cyclic balancing systems are, respectively,
$$d_i(k+1) = (1-\kappa)\, d_i(k) + \kappa\, d_{i+1}(k), \qquad (1.2)$$
$$d_i(k+1) = \kappa\, d_{i+1}(k) + (1-2\kappa)\, d_i(k) + \kappa\, d_{i-1}(k). \qquad (1.3)$$


These are two linear time-invariant dynamical systems with state $d = (d_1, \ldots, d_n)$ and governing equation described by the two n × n matrices:
$$A_{\text{pursuit}} = \begin{bmatrix} 1-\kappa & \kappa & 0 & \cdots & 0 \\ 0 & 1-\kappa & \kappa & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1-\kappa & \kappa \\ \kappa & 0 & \cdots & 0 & 1-\kappa \end{bmatrix}, \qquad A_{\text{balancing}} = \begin{bmatrix} 1-2\kappa & \kappa & 0 & \cdots & \kappa \\ \kappa & 1-2\kappa & \kappa & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & \kappa & 1-2\kappa & \kappa \\ \kappa & 0 & \cdots & \kappa & 1-2\kappa \end{bmatrix}.$$
We conclude with the following remarks.
(i) Equations (1.2) and (1.3) are correct if the counterclockwise order of the bugs is never violated. One can show that this is true for $\kappa < 1$ in the pursuit case and $\kappa < 1/2$ in the balancing case; we leave this proof to the reader in Exercise E1.2.
(ii) The matrices $A_{\text{pursuit}}$ and $A_{\text{balancing}}$, for varying n and $\kappa$, are Toeplitz and circulant. Moreover, they have nonnegative entries for the stated ranges of $\kappa$ and are row-stochastic.
(iii) If one defines the agreement space, i.e., $\{(\alpha, \alpha, \ldots, \alpha) \in \mathbb{R}^n \mid \alpha \in \mathbb{R}\}$, then each point in this set is an equilibrium for both systems.
(iv) It must be true for all times that $(d_1, \ldots, d_n) \in \{x \in \mathbb{R}^n \mid x_i \ge 0, \; \sum_{i=1}^{n} x_i = 2\pi\}$. This property is indeed the consequence of the nonnegative matrices $A_{\text{pursuit}}$ and $A_{\text{balancing}}$ being doubly-stochastic, i.e., each row-sum and each column-sum is equal to 1.
(v) We will later study for which values of $\kappa$ the system converges to the agreement space.
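As a numerical companion to these remarks, the following sketch iterates the distance dynamics (1.2) and (1.3) for the hypothetical values n = 5 and κ = 0.2; in both cases the distances approach the equally-spaced value 2π/n.

import numpy as np

n, kappa = 5, 0.2
rng = np.random.default_rng(0)
d = rng.dirichlet(np.ones(n)) * 2*np.pi    # random arc lengths summing to 2*pi

for _ in range(1000):
    d = (1 - kappa)*d + kappa*np.roll(d, -1)                          # eq. (1.2)
print(d, 2*np.pi/n)    # the pursuit distances approach 2*pi/n

d = rng.dirichlet(np.ones(n)) * 2*np.pi
for _ in range(1000):
    d = kappa*np.roll(d, -1) + (1 - 2*kappa)*d + kappa*np.roll(d, 1)  # eq. (1.3)
print(d, 2*np.pi/n)    # the balancing distances also approach 2*pi/n

Note that np.roll(d, -1) implements the index shift i → i+1 with the circulant conventions d_{n+1} = d_1 and d_0 = d_n.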

1.5 Appendix: Design problems in wireless sensor networks

In this appendix we show how averaging algorithms can be used to tackle realistic wireless sensor
network problems.

1.5.1 Wireless sensor networks: distributed parameter estimation

The next two examples are also drawn from the field of wireless sensor networks, but they feature a more advanced setup and require a basic background in estimation and detection theory, respectively. The key lesson to be learnt from these examples is that it is useful to have algorithms that compute the average of distributed quantities.
Following ideas from (Garin and Schenato 2010; Xiao et al. 2005), we aim to estimate an unknown parameter $\theta \in \mathbb{R}^m$ via the measurements taken by a sensor network. Each node $i \in \{1,\ldots,n\}$ measures
$$y_i = B_i \theta + v_i,$$
where $y_i \in \mathbb{R}^{m_i}$, $B_i$ is a known matrix and $v_i$ is random measurement noise. We assume that
(A1) the noise vectors $v_1, \ldots, v_n$ are independent jointly-Gaussian variables with zero-mean $\mathbb{E}[v_i] = 0_{m_i}$ and positive-definite covariance $\mathbb{E}[v_i v_i^\top] = \Sigma_i = \Sigma_i^\top$, for $i \in \{1,\ldots,n\}$; and

(A2) the measurement parameters satisfy the following two properties: $\sum_i m_i \ge m$ and the stacked matrix $\begin{bmatrix} B_1^\top & \cdots & B_n^\top \end{bmatrix}^\top$ is full rank.
Given the measurements $y_1, \ldots, y_n$, it is of interest to compute a least-square estimate of $\theta$, that is, an estimate of $\theta$ that minimizes a least-square error. Specifically, we aim to minimize the following weighted least-square error:
$$\min_{\widehat\theta} \; \sum_{i=1}^{n} \big\| y_i - B_i \widehat\theta \big\|_{\Sigma_i^{-1}}^2 = \sum_{i=1}^{n} \big( y_i - B_i \widehat\theta \big)^{\!\top} \Sigma_i^{-1} \big( y_i - B_i \widehat\theta \big).$$

In this weighted least-square error, individual errors are weighted by their corresponding inverse
covariance matrices so that an accurate (respectively, inaccurate) measurement corresponds to a high
(respectively, low) error weight. With this particular choice of weights, the least-square estimate coincides
with the so-called maximum-likelihood estimate; see (Poor 1994) for more details. Under assumptions
(A1) and (A2), the optimal solution is
$$\widehat\theta = \Big( \sum_{i=1}^{n} B_i^\top \Sigma_i^{-1} B_i \Big)^{-1} \sum_{i=1}^{n} B_i^\top \Sigma_i^{-1} y_i.$$

This formula is easy to implement by a single processor with all the information about the problem, i.e., the parameters and the measurements.
To compute $\widehat\theta$ in the sensor (and processor) network, we perform two steps:
Step 1: we run two distributed algorithms in parallel to compute the average of the quantities $B_i^\top \Sigma_i^{-1} B_i$ and $B_i^\top \Sigma_i^{-1} y_i$.
Step 2: we compute the optimal estimate via
$$\widehat\theta = \operatorname{average}\big( B_1^\top \Sigma_1^{-1} B_1, \ldots, B_n^\top \Sigma_n^{-1} B_n \big)^{-1} \operatorname{average}\big( B_1^\top \Sigma_1^{-1} y_1, \ldots, B_n^\top \Sigma_n^{-1} y_n \big).$$
Questions of interest are:

(i) How do we design algorithms to compute the average of distributed quantities?


(ii) What properties does the graph need to have in order for such an algorithm to exist?
(iii) How do we design an algorithm with fastest convergence?
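The two-step computation above only requires averages, which is what makes it implementable over a network. As a sanity check, the following centralized sketch (with invented scalar measurements, so that each B_i is a 1 × m matrix) verifies that the average-based formula reproduces the weighted least-square estimate.

import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 3
theta = rng.standard_normal(m)                  # unknown parameter to be estimated
B = rng.standard_normal((n, 1, m))              # one scalar measurement matrix per node
Sigma = rng.uniform(0.1, 1.0, size=(n, 1, 1))   # scalar noise covariances
y = np.array([B[i] @ theta + np.sqrt(Sigma[i, 0, 0]) * rng.standard_normal(1)
              for i in range(n)])

# Step 1: the two averages (each computable by a distributed averaging algorithm).
avg_BSB = np.mean([B[i].T @ np.linalg.inv(Sigma[i]) @ B[i] for i in range(n)], axis=0)
avg_BSy = np.mean([B[i].T @ np.linalg.inv(Sigma[i]) @ y[i] for i in range(n)], axis=0)

# Step 2: the two factors of 1/n cancel, so the average-based formula equals
# the sum-based optimal estimate.
theta_hat = np.linalg.solve(avg_BSB, avg_BSy)
print(theta)
print(theta_hat)    # close to theta; exactly the weighted least-square estimate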

1.5.2 Wireless sensor networks: distributed hypothesis testing

We consider a distributed hypothesis testing problem; these ideas appeared in (Olfati-Saber et al. 2006; Rao and Durrant-Whyte 1993). Let $h_\gamma$, for $\gamma$ in a finite set $\Gamma$, be a set of two or more hypotheses about an uncertain event. For example, given a certain area of interest, we could have $h_0$ = "no target is present", $h_1$ = "one target is present" and $h_2$ = "two or more targets are present".

Suppose that we know the a priori probabilities $p(h_\gamma)$ of the hypotheses and that the n nodes of a sensor network take measurements $y_i$, for $i \in \{1,\ldots,n\}$, related to the event. Independently of the type of measurements, assume you can compute
$$p(y_i \mid h_\gamma) = \text{probability of measuring } y_i \text{ given that } h_\gamma \text{ is the true hypothesis}.$$
Also, assume that each observation is conditionally independent of all other observations, given any hypothesis.
(i) We wish to compute the maximum a posteriori estimate, that is, we want to identify which one is the most likely hypothesis, given the measurements. Note that, under the independence assumption, Bayes' Theorem implies that the a posteriori probabilities satisfy
$$p(h_\gamma \mid y_1, \ldots, y_n) = \frac{p(h_\gamma)}{p(y_1, \ldots, y_n)} \prod_{i=1}^{n} p(y_i \mid h_\gamma).$$
(ii) Observe that $p(h_\gamma)$ is known, and $p(y_1, \ldots, y_n)$ is a constant normalization factor scaling all posteriori probabilities equally. Therefore, for each hypothesis $\gamma$, we need to compute
$$\prod_{i=1}^{n} p(y_i \mid h_\gamma),$$
or equivalently, we aim to exchange data among the sensors in order to compute:
$$\exp\Big( \sum_{i=1}^{n} \log p(y_i \mid h_\gamma) \Big) = \exp\Big( n \operatorname{average}\big( \log p(y_1 \mid h_\gamma), \ldots, \log p(y_n \mid h_\gamma) \big) \Big).$$
(iii) In summary, even in this hypothesis testing problem, we need algorithms to compute the average of the n numbers $\log p(y_1 \mid h_\gamma), \ldots, \log p(y_n \mid h_\gamma)$, for each hypothesis $\gamma$.
Questions of interest here are the same as in the previous section.
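As a toy numerical illustration (with invented unit-variance Gaussian likelihoods, not taken from the cited references), the sketch below checks that exponentiating n times the average of the log-likelihoods recovers the product of likelihoods needed in the posterior update.

import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(1.0, 1.0, size=8)   # one scalar measurement per node; a target is present

def logpdf(y, mean):               # log-density of a unit-variance Gaussian
    return -0.5*(y - mean)**2 - 0.5*np.log(2*np.pi)

# Hypothetical hypotheses: h0 = "no target" (mean 0), h1 = "one target" (mean 1).
for name, mean in [("h0", 0.0), ("h1", 1.0)]:
    avg = logpdf(y, mean).mean()   # computable by a distributed averaging algorithm
    print(name, np.exp(len(y)*avg), np.prod(np.exp(logpdf(y, mean))))  # equal numbers

For data generated under h1, the h1 product is the larger one, so the maximum a posteriori estimate (with equal priors) selects the correct hypothesis.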


1.6 Exercises

E1.1 Simulating the averaging dynamics. Simulate in your favorite programming language and software package the linear averaging algorithm in equation (1.1). Set n = 5, select the initial state equal to (1, −1, 1, −1, 1), and use the following undirected unweighted graphs, depicted in Figure E1.1:
(i) the complete graph,
(ii) the ring graph, and
(iii) the star graph with node 1 as center.

Which value do all nodes converge to? Is it equal to the average of the initial values? Turn in your code, a
few printouts (as few as possible), and your written responses.

Figure E1.1: Complete graph, ring graph and star graph with 5 nodes
E1.2 Computing the bugs' dynamics. Consider the cyclic pursuit and balancing dynamics described in Section 1.4. Verify that
(i) the cyclic pursuit closed-loop equation (1.2) holds,
(ii) the cyclic balancing closed-loop equation (1.3) holds, and
(iii) the counterclockwise order of the bugs is never violated.
Hint: Recall the distributive property of modular addition: $\operatorname{mod}(a \pm b, n) = \operatorname{mod}(\operatorname{mod}(a, n) \pm \operatorname{mod}(b, n), n)$.


Chapter 2

Elements of Matrix Theory


We review here basic concepts from matrix theory. These concepts will be useful when analyzing graphs
and averaging algorithms defined over graphs.
In particular we are interested in understanding the convergence of the linear dynamical systems
discussed in Chapter 1. Some of those systems are described by matrices that have nonnegative entries
and have row-sums equal to 1.

Notation
It is useful to start with some basic notations from matrix theory and linear algebra. We let $f : X \to Y$ denote a function from set X to set Y. We let $\mathbb{R}$, $\mathbb{N}$ and $\mathbb{Z}$ denote respectively the set of real, natural and integer numbers; also $\mathbb{R}_{\ge 0}$ and $\mathbb{Z}_{\ge 0}$ are the set of nonnegative real numbers and nonnegative integer numbers. For real numbers $a < b$, we let
$$[a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}, \qquad ]a, b] = \{x \in \mathbb{R} \mid a < x \le b\},$$
$$[a, b[ \; = \{x \in \mathbb{R} \mid a \le x < b\}, \qquad ]a, b[ \; = \{x \in \mathbb{R} \mid a < x < b\}.$$
Given a complex number $z \in \mathbb{C}$, its norm (sometimes referred to as complex modulus) is denoted by $|z|$, its real part by $\Re(z)$ and its imaginary part by $\Im(z)$.
We let $1_n \in \mathbb{R}^n$ (respectively $0_n \in \mathbb{R}^n$) be the column vector with all entries equal to +1 (respectively 0). Let $e_1, \ldots, e_n$ be the standard basis vectors of $\mathbb{R}^n$, that is, $e_i$ has all entries equal to zero except for the ith entry equal to 1.
We let $I_n$ denote the n-dimensional identity matrix and $A \in \mathbb{R}^{n \times n}$ denote a square n × n matrix with real entries $\{a_{ij}\}$, $i, j \in \{1,\ldots,n\}$. The matrix A is symmetric if $A^\top = A$. A symmetric matrix is positive definite (resp. positive semidefinite) if all its eigenvalues are positive (resp. nonnegative). The kernel of A is the subspace $\operatorname{kernel}(A) = \{x \in \mathbb{R}^n \mid Ax = 0_n\}$, the image of A is $\operatorname{image}(A) = \{y \in \mathbb{R}^n \mid Ax = y \text{ for some } x \in \mathbb{R}^n\}$, and the rank of A is the dimension of its image. Given vectors $v_1, \ldots, v_j \in \mathbb{R}^n$, their span is $\operatorname{span}(v_1, \ldots, v_j) = \{a_1 v_1 + \cdots + a_j v_j \mid a_1, \ldots, a_j \in \mathbb{R}\} \subseteq \mathbb{R}^n$.

2.1 Linear systems and the Jordan normal form

In this section we introduce a prototypical model for dynamical systems and study its stability properties via the so-called Jordan normal form, which is a key tool from matrix theory.

2.1.1 Discrete-time linear systems

We start with a basic definition.


Definition 2.1 (Discrete-time linear system). A square matrix A defines a discrete-time linear system by
$$x(k+1) = A x(k), \qquad x(0) = x_0, \qquad (2.1)$$
or, equivalently, by $x(k) = A^k x_0$, where the sequence $\{x(k)\}_{k \in \mathbb{Z}_{\ge 0}}$ is called the solution, trajectory or evolution of the system.
We are interested in understanding when a solution from an arbitrary initial condition has an
asymptotic limit as time diverges and to what value the solution converges. We formally define this
property as follows.
Definition 2.2 (Semi-convergent and convergent matrices). A matrix $A \in \mathbb{R}^{n \times n}$ is
(i) semi-convergent if $\lim_{k \to +\infty} A^k$ exists, and
(ii) convergent if it is semi-convergent and $\lim_{k \to +\infty} A^k = 0_{n \times n}$.


It is immediate to see that, if A is semi-convergent with limiting matrix $A_\infty = \lim_{k \to +\infty} A^k$, then
$$\lim_{k \to +\infty} x(k) = A_\infty x_0.$$

In what follows we characterize the sets of semi-convergent and convergent matrices.


Remark 2.3 (Modal decomposition for symmetric matrices). Before treating the general analysis method, we present the self-contained and instructive case of symmetric matrices. Recall that a symmetric matrix A has real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and corresponding orthonormal (i.e., orthogonal and unit-length) eigenvectors $v_1, \ldots, v_n$. Because the eigenvectors are an orthonormal basis for $\mathbb{R}^n$, we can write the modal decomposition $x(k) = y_1(k) v_1 + \cdots + y_n(k) v_n$, where the ith normal mode is defined by $y_i(k) = v_i^\top x(k)$. We then left-multiply the two equalities (2.1) by $v_i^\top$ and exploit $A v_i = \lambda_i v_i$ to obtain
$$y_i(k+1) = \lambda_i y_i(k), \qquad y_i(0) = v_i^\top x_0 \qquad \Longrightarrow \qquad y_i(k) = \lambda_i^k (v_i^\top x_0).$$
In short, the evolution of the linear system (2.1) is
$$x(k) = \lambda_1^k (v_1^\top x_0) v_1 + \cdots + \lambda_n^k (v_n^\top x_0) v_n.$$
Therefore, each evolution starting from an arbitrary initial condition satisfies
(i) $\lim_{k \to \infty} x(k) = 0_n$ if and only if $|\lambda_i| < 1$ for all $i \in \{1,\ldots,n\}$, and
(ii) $\lim_{k \to \infty} x(k) = (v_1^\top x_0) v_1 + \cdots + (v_m^\top x_0) v_m$ if and only if $\lambda_1 = \cdots = \lambda_m = 1$ and $|\lambda_i| < 1$ for all $i \in \{m+1,\ldots,n\}$.
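The modal decomposition is easy to verify numerically; a minimal sketch for an arbitrary symmetric matrix:

import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 8                     # an arbitrary symmetric matrix
lam, V = np.linalg.eigh(A)            # real eigenvalues, orthonormal eigenvectors

x0, k = rng.standard_normal(4), 20
modal = sum(lam[i]**k * (V[:, i] @ x0) * V[:, i] for i in range(4))
print(np.allclose(np.linalg.matrix_power(A, k) @ x0, modal))   # True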


2.1.2 The Jordan normal form

In this section we review a very useful canonical decomposition of a square matrix. Recall that two n × n matrices A and B are similar if $B = T A T^{-1}$ for some invertible matrix T. Here and in what follows, matrices are allowed to have complex entries.
Theorem 2.4 (Jordan normal form). Each n × n matrix A is similar to a block diagonal matrix J, called the Jordan normal form of A, given by
$$J = \begin{bmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & J_m \end{bmatrix} \in \mathbb{C}^{n \times n},$$
where each block $J_i$, called a Jordan block, is a square matrix of size $j_i$ and of the form
$$J_i = \begin{bmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{bmatrix} \in \mathbb{C}^{j_i \times j_i}. \qquad (2.2)$$
Clearly, $m \le n$ and $j_1 + \cdots + j_m = n$.
We refer to (Horn and Johnson 1985) for a standard proof of this theorem. In other words, Theorem 2.4 implies there exists an invertible matrix T such that
$$A = T J T^{-1}, \qquad (2.3)$$
$$A T = T J, \qquad (2.4)$$
$$T^{-1} A = J T^{-1}. \qquad (2.5)$$

The matrix J is unique, modulo a re-ordering of the Jordan blocks. The eigenvalues of J are the (not necessarily distinct) numbers $\lambda_1, \ldots, \lambda_m$; these numbers are also the eigenvalues of A (since a similarity transform does not change the eigenvalues of a matrix). Given an eigenvalue $\lambda$,
(i) the algebraic multiplicity of $\lambda$ is the sum of the sizes of all Jordan blocks with eigenvalue $\lambda$ (or, equivalently, the multiplicity of $\lambda$ as a root of the characteristic polynomial of A), and
(ii) the geometric multiplicity of $\lambda$ is the number of Jordan blocks with eigenvalue $\lambda$ (or, equivalently, the number of linearly-independent eigenvectors associated to $\lambda$).
An eigenvalue is simple if it has algebraic and geometric multiplicity equal precisely to 1, that is, a single Jordan block of size 1. An eigenvalue is semisimple if all its Jordan blocks have size 1, so that its algebraic and geometric multiplicities are equal.

Let $t_1, \ldots, t_n$ and $r_1, \ldots, r_n$ denote the columns and rows of T and $T^{-1}$ respectively. If all eigenvalues of A are semisimple, then the equations (2.4) and (2.5) imply, for all $i \in \{1,\ldots,n\}$,
$$A t_i = \lambda_i t_i \qquad \text{and} \qquad r_i A = \lambda_i r_i.$$
In other words, the ith column of T is the right eigenvector (or simply eigenvector) of A corresponding to the eigenvalue $\lambda_i$, and the ith row of $T^{-1}$ is the corresponding left eigenvector of A. Matrices with only semisimple eigenvalues are called diagonalizable (because J is diagonal).
Finally, it is possible to have eigenvalues with larger algebraic than geometric multiplicity; in this case, the columns of the matrix T are the generalized right eigenvectors of A and the rows of $T^{-1}$ are the generalized left eigenvectors of A. For more details we refer the reader to (Horn and Johnson 1985).
Example 2.5 (Revisiting the wireless sensor network example). Next, as a numerical example, let us reconsider the wireless sensor network discussed in Section 1.2 and the 4-dimensional row-stochastic matrix $A_{\text{wsn}}$, which we report here for convenience:
$$A_{\text{wsn}} = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 1/3 & 1/3 \end{bmatrix}.$$
With the aid of a symbolic mathematics program, we compute $A_{\text{wsn}} = T J T^{-1}$, where
$$J = \operatorname{diag}\Big( 1, \; 0, \; \tfrac{1}{24}\big(5 - \sqrt{73}\big), \; \tfrac{1}{24}\big(5 + \sqrt{73}\big) \Big), \qquad T = \begin{bmatrix} 1 & 0 & -2 + 2\sqrt{73} & -2 - 2\sqrt{73} \\ 1 & 0 & -11 - \sqrt{73} & -11 + \sqrt{73} \\ 1 & 1 & 8 & 8 \\ 1 & -1 & 8 & 8 \end{bmatrix},$$
and
$$T^{-1} = \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{4} \\ 0 & 0 & \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{96} + \tfrac{19}{96\sqrt{73}} & -\tfrac{1}{48} - \tfrac{5}{48\sqrt{73}} & \tfrac{1}{64} - \tfrac{3}{64\sqrt{73}} & \tfrac{1}{64} - \tfrac{3}{64\sqrt{73}} \\ -\tfrac{1}{96} - \tfrac{19}{96\sqrt{73}} & -\tfrac{1}{48} + \tfrac{5}{48\sqrt{73}} & \tfrac{1}{64} + \tfrac{3}{64\sqrt{73}} & \tfrac{1}{64} + \tfrac{3}{64\sqrt{73}} \end{bmatrix}.$$
Therefore, the eigenvalues of $A_{\text{wsn}}$ are $1$, $0$, $\tfrac{1}{24}(5 - \sqrt{73})$, and $\tfrac{1}{24}(5 + \sqrt{73})$. Corresponding to the eigenvalue 1, the right and left eigenvector equations are:
$$A_{\text{wsn}} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 1/6 \\ 1/3 \\ 1/4 \\ 1/4 \end{bmatrix}^{\!\top} \! A_{\text{wsn}} = \begin{bmatrix} 1/6 \\ 1/3 \\ 1/4 \\ 1/4 \end{bmatrix}^{\!\top}.$$
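The decomposition in Example 2.5 can be reproduced with a symbolic mathematics package; here is a short check in Python with sympy (the ordering of the Jordan blocks returned by the solver may differ from the ordering displayed above):

import sympy as sp

half, quarter, third = sp.Rational(1, 2), sp.Rational(1, 4), sp.Rational(1, 3)
A = sp.Matrix([[half, half, 0, 0],
               [quarter, quarter, quarter, quarter],
               [0, third, third, third],
               [0, third, third, third]])

T, J = A.jordan_form()          # returns (T, J) with A = T J T^{-1}
print(J)                        # diagonal: 1, 0, (5 - sqrt(73))/24, (5 + sqrt(73))/24
print(sp.simplify(T*J - A*T))   # zero matrix, confirming the decomposition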


2.1.3 Semi-convergence and convergence for discrete-time linear systems

We can now use the Jordan normal form to study the powers of the matrix A:
$$A^k = \underbrace{(T J T^{-1})(T J T^{-1}) \cdots (T J T^{-1})}_{k \text{ times}} = T J^k T^{-1} = T \begin{bmatrix} J_1^k & 0 & \cdots & 0 \\ 0 & J_2^k & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & J_m^k \end{bmatrix} T^{-1},$$
where the kth power of the generic Jordan block $J_i$, as a function of block size $1, 2, 3, \ldots, j_i$, is respectively:
$$J_i^k = \lambda_i^k, \qquad \begin{bmatrix} \lambda_i^k & k \lambda_i^{k-1} \\ 0 & \lambda_i^k \end{bmatrix}, \qquad \begin{bmatrix} \lambda_i^k & k \lambda_i^{k-1} & \frac{k!\, \lambda_i^{k-2}}{(k-2)!\, 2!} \\ 0 & \lambda_i^k & k \lambda_i^{k-1} \\ 0 & 0 & \lambda_i^k \end{bmatrix}, \qquad \ldots, \qquad \begin{bmatrix} \lambda_i^k & k \lambda_i^{k-1} & \cdots & \frac{k!\, \lambda_i^{k-j_i+1}}{(k-j_i+1)!\, (j_i-1)!} \\ 0 & \lambda_i^k & \ddots & \vdots \\ \vdots & & \ddots & k \lambda_i^{k-1} \\ 0 & \cdots & 0 & \lambda_i^k \end{bmatrix}.$$
We can now derive necessary and sufficient conditions for semi-convergence and convergence of an arbitrary square matrix. The proof of the following result is an immediate consequence of the Jordan normal form and of the following equality:
$$\lim_{k \to \infty} k^j \lambda^k = \begin{cases} 0, & \text{if } |\lambda| < 1, \\ 1, & \text{if } j = 0 \text{ and } \lambda = 1, \\ \text{non-existent or unbounded}, & \text{otherwise}, \end{cases} \qquad (2.6)$$
for any nonnegative integer j; see also Exercise E2.3.

Theorem 2.6 (Semi-convergent and convergent matrices). For a square matrix A with Jordan normal form J and Jordan blocks $J_i$, $i \in \{1,\ldots,m\}$, the following statements are equivalent:
(i) A is semi-convergent (resp. convergent),
(ii) J is semi-convergent (resp. convergent), and
(iii) each block $J_i$ is semi-convergent (resp. convergent).
Moreover, the following statements hold for each block $J_i$ with eigenvalue $\lambda_i$:
(i) for $J_i$ of size 1, $J_i$ is convergent if and only if $|\lambda_i| < 1$,
(ii) for $J_i$ of size 1, $J_i$ is semi-convergent and not convergent if and only if $\lambda_i = 1$, and
(iii) for $J_i$ of size larger than 1, $J_i$ is semi-convergent and convergent if and only if $|\lambda_i| < 1$.
We complete this discussion with two useful definitions and an equivalent reformulation of Theorem 2.6.

[Figure 2.1: Eigenvalues and convergence properties of discrete-time linear systems: (a) the spectrum of a convergent matrix; (b) the spectrum of a semi-convergent matrix, provided the eigenvalue 1 is semisimple; (c) the spectrum of a matrix that is not semi-convergent.]

Definition 2.7 (Spectrum and spectral radius of a matrix). Given a square matrix A,
(i) the spectrum of A, denoted spec(A), is the set of eigenvalues of A; and
(ii) the spectral radius of A is the maximum norm of the eigenvalues of A, that is,
$$\rho(A) = \max\{ |\lambda| \mid \lambda \in \operatorname{spec}(A) \},$$
or, equivalently, the radius of the smallest disk in $\mathbb{C}$ centered at the origin and containing the spectrum of A.
Theorem 2.8 (Convergence and spectral radius). For a square matrix A, the following statements hold:
(i) A is convergent if and only if $\rho(A) < 1$,
(ii) A is semi-convergent if and only if $\rho(A) \le 1$, no eigenvalue has unit norm other than possibly the number 1, and, if 1 is an eigenvalue, then it is semisimple.
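Theorem 2.8(i) translates directly into a numerical test; a minimal sketch:

import numpy as np

def spectral_radius(A):
    return max(abs(np.linalg.eigvals(A)))

A = np.array([[0.5, 0.4],
              [0.3, 0.2]])
print(spectral_radius(A) < 1)                                     # True: A is convergent
print(np.allclose(np.linalg.matrix_power(A, 200), 0, atol=1e-8))  # True: A^k -> 0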

2.2 Row-stochastic matrices and their spectral radius

Motivated by the example systems in Chapter 1, we are now interested in discrete-time linear systems
defined by matrices with special properties. Specifically, we are interested in matrices with nonnegative
entries and whose row-sums are all equal to 1.
The square matrix $A \in \mathbb{R}^{n \times n}$ is
(i) nonnegative (respectively positive) if $a_{ij} \ge 0$ (respectively $a_{ij} > 0$) for all i and j in $\{1,\ldots,n\}$;
(ii) row-stochastic if nonnegative and $A 1_n = 1_n$;
(iii) column-stochastic if nonnegative and $A^\top 1_n = 1_n$; and
(iv) doubly-stochastic if it is row- and column-stochastic.
In the following, we write $A > 0$ and $v > 0$ (respectively $A \ge 0$ and $v \ge 0$) for a positive (respectively nonnegative) matrix A and vector v.

Given a finite number of points $p_1, p_2, \ldots, p_n$ in $\mathbb{R}^n$, a convex combination of $p_1, p_2, \ldots, p_n$ is a point of the form
$$\lambda_1 p_1 + \lambda_2 p_2 + \cdots + \lambda_n p_n,$$
where the real numbers $\lambda_1, \ldots, \lambda_n$ satisfy $\lambda_1 + \cdots + \lambda_n = 1$ and $\lambda_i \ge 0$ for all $i \in \{1,\ldots,n\}$. For example, on the plane $\mathbb{R}^2$, the set of convex combinations of two distinct points is the segment connecting them and the set of convex combinations of three distinct points is the triangle (including its interior) defined by them; see Figure 2.2. The numbers $\lambda_1, \ldots, \lambda_n$ are called convex combination coefficients and each row of a row-stochastic matrix consists of convex combination coefficients.

[Figure 2.2: Convex combination: q is inside the triangle if and only if q is a convex combination of $p_1$, $p_2$, $p_3$.]

2.2.1 The spectral radius for row-stochastic matrices

To characterize the spectral radius of a row-stochastic matrix, we introduce a useful general method to
localize the spectrum of a matrix.
Theorem 2.9 (Geršgorin Disks Theorem). For any square matrix $A \in \mathbb{R}^{n \times n}$,
$$\operatorname{spec}(A) \subseteq \bigcup_{i \in \{1,\ldots,n\}} \Big\{ z \in \mathbb{C} \;\Big|\; |z - a_{ii}| \le \textstyle\sum_{j=1, j \ne i}^{n} |a_{ij}| \Big\},$$
where the ith set in the union is the disk in the complex plane centered at $a_{ii}$ with radius $\sum_{j=1, j \ne i}^{n} |a_{ij}|$.

Proof. Consider the eigenvalue equation $Ax = \lambda x$ for the eigenpair $(\lambda, x)$, where $\lambda$ and $x \ne 0_n$ are complex. Choose the index $i \in \{1,\ldots,n\}$ so that $|x_i| = \max_{j \in \{1,\ldots,n\}} |x_j| > 0$. The ith component of the eigenvalue equation can be rewritten as $\lambda - a_{ii} = \sum_{j=1, j \ne i}^{n} a_{ij} x_j / x_i$. Now, take the complex magnitude of this equality and upper-bound its right-hand side:
$$|\lambda - a_{ii}| = \Big| \sum_{j=1, j \ne i}^{n} a_{ij} \frac{x_j}{x_i} \Big| \le \sum_{j=1, j \ne i}^{n} |a_{ij}| \frac{|x_j|}{|x_i|} \le \sum_{j=1, j \ne i}^{n} |a_{ij}|.$$

The theorem statement follows by interpreting this inequality as a bound on the possible location of
each arbitrary eigenvalue of A.

Each disk in the theorem statement is referred to as a Geršgorin disk or, more accurately, as a Geršgorin row disk; an analogous disk theorem can be stated for Geršgorin column disks. Exercise E2.15 showcases an instructive application to distributed computing of numerous topics covered so far, including convergence notions and the Geršgorin Disks Theorem.
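The disks themselves are straightforward to compute; the sketch below evaluates the Geršgorin row disks of a given matrix and confirms that every eigenvalue lies in their union.

import numpy as np

A = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 1.0, 0.0]])                  # an example row-stochastic matrix

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)  # off-diagonal absolute row sums

for lam in np.linalg.eigvals(A):
    in_some_disk = any(abs(lam - c) <= r + 1e-12 for c, r in zip(centers, radii))
    print(lam, in_some_disk)                     # every eigenvalue lies in some disk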
Lemma 2.10 (Spectral properties of a row-stochastic matrix). For a row-stochastic matrix A,
(i) 1 is an eigenvalue, and
(ii) spec(A) is a subset of the unit disk and $\rho(A) = 1$.

Proof. First, recall that A being row-stochastic is equivalent to two facts: $a_{ij} \ge 0$ for all $i, j \in \{1,\ldots,n\}$, and $A 1_n = 1_n$. The second fact implies that $1_n$ is an eigenvector with eigenvalue 1. Therefore, by definition of spectral radius, $\rho(A) \ge 1$. Next, we prove that $\rho(A) \le 1$ by invoking the Geršgorin Disks Theorem 2.9 to show that spec(A) is contained in the unit disk centered at the origin. The Geršgorin disks of a row-stochastic matrix are illustrated in Figure 2.3.

[Figure 2.3: All Geršgorin disks of a row-stochastic matrix are contained in the unit disk; the ith disk is centered at $a_{ii}$ and has radius $\sum_{j \ne i} a_{ij}$.]

Note that A being row-stochastic implies $a_{ii} \in [0, 1]$ and $a_{ii} + \sum_{j \ne i} a_{ij} = 1$. Hence, the center of the ith Geršgorin disk belongs to the positive real axis between 0 and 1, and the right-most point in the disk is at 1.

Note: because 1 is an eigenvalue of each row-stochastic matrix A, clearly A is not convergent. But it
is possible for A to be semi-convergent.


2.3 Perron–Frobenius theory

We have seen that row-stochastic matrices are not convergent; we now focus on characterizing those that are semi-convergent. To establish whether a row-stochastic matrix is semi-convergent, we introduce the widely-established Perron–Frobenius theory for nonnegative matrices.

2.3.1 Classification of nonnegative matrices

In the previous section we already defined nonnegative and positive matrices. Here we study two sets of
nonnegative matrices with certain characteristic properties.
Definition 2.11 (Irreducible and primitive matrices). For $n \ge 2$, an n × n nonnegative matrix A is
(i) irreducible if, for all partitions $\{I, J\}$ of the index set $\{1,\ldots,n\}$, there exist $i \in I$ and $j \in J$ such that $a_{ij} \ne 0$,
(ii) primitive if there exists $k \in \mathbb{N}$ such that $A^k$ is a positive matrix.
Here $\{I, J\}$ is a partition of $\{1,\ldots,n\}$ if $I \cup J = \{1,\ldots,n\}$ and $I \cap J = \emptyset$. A matrix that is not irreducible is said to be reducible.
Note: a positive matrix is clearly primitive. Also note that, if there is $k \in \mathbb{N}$ such that $A^k$ is positive, then (one can show that) all subsequent powers $A^{k+1}, A^{k+2}, \ldots$ are necessarily positive as well; see Exercise E2.6.
We postpone the proof of the following result until Section 2.3.4.
Lemma 2.12. If a square nonnegative matrix is primitive, then it is irreducible.
As a consequence of this lemma we can draw the set diagram in Figure 2.4 describing the set of nonnegative square matrices and its subsets of irreducible, primitive and positive matrices. Note that the inclusions in the diagram are strict in the sense that:
(i) the matrix $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ is nonnegative but not irreducible;
(ii) the matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is irreducible but not primitive;
(iii) the matrix $\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$ is primitive but not positive.

[Figure 2.4: The set of nonnegative square matrices ($A \ge 0$) and its nested subsets of irreducible matrices (no permutation brings A into block upper triangular form), primitive matrices (there exists k such that $A^k > 0$), and positive matrices ($A > 0$).]

Permutation characterization of irreducibility. It is useful to elaborate on Definition 2.11. First, we note that the notion of irreducibility is applicable to matrices that are not necessarily nonnegative. Second, we provide a formal version of the following intuition: a matrix is irreducible if it is not similar via a permutation to a block upper triangular matrix. We start with a useful definition. A square matrix P is binary if all its entries are equal to 0 or 1; accordingly, we write $P \in \{0,1\}^{n \times n}$. A permutation matrix is a square binary matrix with precisely one entry equal to 1 in every row and every column. (In other words, the columns of a permutation matrix are $e_1, \ldots, e_n$, modulo reordering.) A permutation matrix acts on a vector by permuting its entries.
Lemma 2.13. For $n \geq 2$, the $n \times n$ matrix $A \in \mathbb{R}^{n\times n}$ is reducible if and only if there exist a permutation matrix $P \in \{0,1\}^{n\times n}$ and a number $r \in \{1, \dots, n-1\}$ such that
$$P^\top A P = \begin{bmatrix} B_{r\times r} & C_{r\times(n-r)} \\ 0_{(n-r)\times r} & D_{(n-r)\times(n-r)} \end{bmatrix},$$
where $B$, $C$ and $D$ are arbitrary.
We leave the proof of this lemma to the reader as Exercise E2.9.
Note that $P^\top A P$ is the similarity transformation of $A$ defined by $P$ because the permutation matrix $P$ satisfies $P^{-1} = P^\top$; see Exercise E2.13. Moreover, note that $P^\top A P$ is simply a reordering of rows and columns. For example, consider
$$P = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
and compute
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \quad\Longrightarrow\quad P^\top A P = \begin{bmatrix} a_{22} & a_{23} & a_{21} \\ a_{32} & a_{33} & a_{31} \\ a_{12} & a_{13} & a_{11} \end{bmatrix},$$
so that the entries of the 1st, 2nd and 3rd rows of $A$ are mapped respectively to the 3rd, 1st and 2nd rows of $P^\top A P$ and, at the same time, the entries of the 1st, 2nd and 3rd columns of $A$ are mapped respectively to the 3rd, 1st and 2nd columns of $P^\top A P$.
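The reordering can be verified numerically; here is a minimal NumPy check (the entries 11 through 19 below are arbitrary placeholders for $a_{11}, \dots, a_{33}$):

```python
import numpy as np

P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])              # the permutation matrix from the example

A = np.arange(11, 20).reshape(3, 3)    # stand-ins for the entries a_11, ..., a_33

print(np.allclose(np.linalg.inv(P), P.T))  # True: P^{-1} = P^T
print(P.T @ A @ P)                         # rows and columns reordered as described
```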

2.3.2 Main results

We are now ready to state the main results of Perron–Frobenius theory and characterize the properties of the spectral radius of a nonnegative matrix as a function of the matrix properties. We state the results in three related theorems.
Theorem 2.14 (Perron–Frobenius for nonnegative matrices). If $A$ is a nonnegative matrix, then

(i) there exists a real eigenvalue $\lambda \geq 0$ such that $\lambda \geq |\mu|$ for all other eigenvalues $\mu$,

(ii) the right and left eigenvectors $v$ and $w$ of $\lambda$ can be selected nonnegative.

Theorem 2.15 (Perron–Frobenius for irreducible matrices). If $A$ is nonnegative and irreducible, then

(i) there exists a real simple eigenvalue $\lambda > 0$ such that $\lambda \geq |\mu|$ for all other eigenvalues $\mu$,
(ii) the right and left eigenvectors $v$ and $w$ of $\lambda$ can be selected positive and unique, up to rescaling.

Theorem 2.16 (Perron–Frobenius for primitive matrices). If $A$ is nonnegative, irreducible and primitive, then

(i) there exists a real simple eigenvalue $\lambda > 0$ such that $\lambda > |\mu|$ for all other eigenvalues $\mu$,

(ii) the right and left eigenvectors $v$ and $w$ of $\lambda$ can be selected positive and unique, up to rescaling.

We refer to Theorem 5.2 in Section 5.2 for a version of the Perron–Frobenius Theorem for reducible matrices.
Some remarks and some additional statements are in order.

Remark 2.17 (Dominant eigenvalue and eigenvectors). In all three cases, the real positive eigenvalue $\lambda$ is the spectral radius $\rho(A)$ of $A$. We refer to $\lambda$ as the dominant eigenvalue; it is sometimes also referred to as the Perron root. The dominant eigenvalue is equivalently defined by
$$\rho(A) = \inf\{\lambda \in \mathbb{R} \mid Au \leq \lambda u \text{ for all } u > 0\},$$
and it satisfies the following bounds (see Exercise E2.8):
$$\min(A\mathbb{1}_n) \leq \rho(A) \leq \max(A\mathbb{1}_n).$$
Associated with the dominant eigenvalue, the right and left eigenvectors $v$ and $w$ (unique up to rescaling) are called the right and left dominant eigenvectors. The right dominant eigenvector together with its positive multiples are the only positive right eigenvectors of a primitive matrix $A$ (a similar statement holds for the left dominant eigenvector).
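For instance, here is a minimal NumPy check of the row-sum bounds on the dominant eigenvalue (the positive matrix below is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # positive, hence primitive

rho = max(abs(np.linalg.eigvals(A)))  # dominant eigenvalue = spectral radius
row_sums = A @ np.ones(2)             # the vector A 1_n
print(row_sums.min() <= rho <= row_sums.max())  # True: min(A 1_n) <= rho(A) <= max(A 1_n)
```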
Remark 2.18 (Counterexamples). The characterizations in the three theorems are sharp in the following sense:

(i) the matrix $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ is nonnegative and reducible and, indeed, its dominant eigenvalue is 0;

(ii) the matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is irreducible but not primitive and, indeed, its dominant eigenvalue $+1$ is not strictly larger in magnitude than the other eigenvalue $-1$.

2.3.3 Applications to dynamical systems

The Perron–Frobenius Theorem for a primitive matrix $A$ has immediate consequences for the behavior of $A^k$ as $k \to \infty$ and, therefore, for the asymptotic behavior of the dynamical system $x(k+1) = Ax(k)$.

Proposition 2.19 (Powers of primitive matrices). For a primitive matrix $A$ with dominant eigenvalue $\lambda$ and with dominant right and left eigenvectors $v$ and $w$ normalized so that $v^\top w = 1$, we have
$$\lim_{k\to\infty} A^k / \lambda^k = v w^\top.$$


We now apply this result to row-stochastic matrices. Recall that $A \geq 0$ is row-stochastic if $A\mathbb{1}_n = \mathbb{1}_n$. Therefore, the right eigenvector of the eigenvalue 1 can be selected as $\mathbb{1}_n$.

Corollary 2.20 (Consensus for primitive row-stochastic matrices). For a primitive row-stochastic matrix $A$,

(i) the simple eigenvalue $\rho(A) = 1$ is strictly larger than the magnitude of all other eigenvalues, hence $A$ is semi-convergent;

(ii) $\lim_{k\to\infty} A^k = \mathbb{1}_n w^\top$, where $w$ is the left positive eigenvector of $A$ with eigenvalue 1 satisfying $w_1 + \dots + w_n = 1$;

(iii) the solution to $x(k+1) = Ax(k)$ satisfies
$$\lim_{k\to\infty} x(k) = \big(w^\top x(0)\big)\, \mathbb{1}_n;$$

(iv) if additionally $A$ is doubly-stochastic, then $w = \frac{1}{n}\mathbb{1}_n$ (because $A^\top \mathbb{1}_n = \mathbb{1}_n$ and $\frac{1}{n}\mathbb{1}_n^\top \mathbb{1}_n = 1$) so that
$$\lim_{k\to\infty} x(k) = \frac{\mathbb{1}_n^\top x(0)}{n}\, \mathbb{1}_n = \operatorname{average}\big(x(0)\big)\, \mathbb{1}_n.$$
In this case we say that the dynamical system achieves average consensus.

Note:
$$\mathbb{1}_n w^\top = \begin{bmatrix} w_1 & w_2 & \cdots & w_n \\ \vdots & \vdots & & \vdots \\ w_1 & w_2 & \cdots & w_n \end{bmatrix}, \qquad (\mathbb{1}_n w^\top)\, x(0) = \big(w^\top x(0)\big)\, \mathbb{1}_n = \begin{bmatrix} w^\top x(0) \\ \vdots \\ w^\top x(0) \end{bmatrix}.$$
Note: the limiting vector is therefore a weighted average of the initial conditions. The relative weights of the initial conditions are the convex combination coefficients $w_1, \dots, w_n$. In a social influence network, the coefficient $w_i$ is regarded as the social influence of agent $i$. An early reference to average consensus is (Harary 1959).
Example 2.21 (Revisiting the wireless sensor network example). Finally, as a numerical example, let us reconsider the wireless sensor network discussed in Section 1.2 and the 4-dimensional row-stochastic matrix $A_{\text{wsn}}$. First, note that $A_{\text{wsn}}$ is primitive because $A_{\text{wsn}}^2$ is positive:
$$A_{\text{wsn}} = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 1/3 & 1/3 \end{bmatrix}, \qquad A_{\text{wsn}}^2 = \begin{bmatrix} 3/8 & 3/8 & 1/8 & 1/8 \\ 3/16 & 17/48 & 11/48 & 11/48 \\ 1/12 & 11/36 & 11/36 & 11/36 \\ 1/12 & 11/36 & 11/36 & 11/36 \end{bmatrix}.$$
Therefore, the Perron–Frobenius Theorem 2.16 for primitive matrices applies to $A_{\text{wsn}}$. The four pairs of eigenvalues and right eigenvectors of $A_{\text{wsn}}$ (as computed in Example 2.5) are:
$$\Big(1,\; \mathbb{1}_4\Big), \qquad \left(\frac{5+\sqrt{73}}{24},\; \begin{bmatrix} (-2-2\sqrt{73})/8 \\ (-11+\sqrt{73})/8 \\ 1 \\ 1 \end{bmatrix}\right), \qquad \left(\frac{5-\sqrt{73}}{24},\; \begin{bmatrix} (-2+2\sqrt{73})/8 \\ (-11-\sqrt{73})/8 \\ 1 \\ 1 \end{bmatrix}\right), \qquad \left(0,\; \begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}\right).$$

Moreover, we know that $A_{\text{wsn}}$ is semi-convergent. To apply the convergence results in Corollary 2.20, we numerically compute its left dominant eigenvector, normalized to have unit sum, to be $w = [1/6, 1/3, 1/4, 1/4]^\top$, so that we have:
$$\lim_{k\to\infty} A_{\text{wsn}}^k = \mathbb{1}_4 w^\top = \begin{bmatrix} 1/6 & 1/3 & 1/4 & 1/4 \\ 1/6 & 1/3 & 1/4 & 1/4 \\ 1/6 & 1/3 & 1/4 & 1/4 \\ 1/6 & 1/3 & 1/4 & 1/4 \end{bmatrix}.$$
Therefore, each solution to the averaging system $x(k+1) = A_{\text{wsn}} x(k)$ converges to a consensus vector $\big(w^\top x(0)\big)\mathbb{1}_4$, that is, the value at each node of the wireless sensor network converges to $w^\top x(0) = (1/6)x_1(0) + (1/3)x_2(0) + (1/4)x_3(0) + (1/4)x_4(0)$. Note that $A_{\text{wsn}}$ is not doubly-stochastic and, therefore, the averaging algorithm does not achieve average consensus, and that node 2 has more influence than the other nodes.
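These numbers are easy to reproduce; the following minimal NumPy sketch recomputes the left dominant eigenvector and the limit of the matrix powers:

```python
import numpy as np

A = np.array([[1/2, 1/2, 0,   0  ],
              [1/4, 1/4, 1/4, 1/4],
              [0,   1/3, 1/3, 1/3],
              [0,   1/3, 1/3, 1/3]])

# Left dominant eigenvector: eigenvector of A^T for eigenvalue 1, scaled to unit sum.
vals, vecs = np.linalg.eig(A.T)
w = np.real(vecs[:, np.argmin(abs(vals - 1))])
w = w / w.sum()
print(w)                               # [1/6, 1/3, 1/4, 1/4]

# High powers of A converge to the rank-one matrix 1_4 w^T.
print(np.linalg.matrix_power(A, 50))   # each row is approximately w
```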

Note: If $A$ is reducible, then clearly it is not primitive. Yet, it is possible for an averaging algorithm described by a reducible matrix to converge to consensus. In other words, Corollary 2.20 provides only a sufficient condition for consensus. Here is a simple example of an averaging algorithm described by a reducible matrix that converges to consensus:
$$x_1(k+1) = x_1(k), \qquad x_2(k+1) = x_1(k).$$
To fully understand which phenomena are possible and what properties of $A$ are necessary and sufficient for convergence to consensus, we will study graph theory in the next two chapters.

2.3.4 Selected proofs

We conclude this section with the proofs of some selected statements.


Proof of Lemma 2.12 We aim to show that a primitive matrix A Rnn is irreducible. By contradiction, we assume that A is reducible. In other words, we assume that, after appropriately permuting
rows and columns according to a permutation P , the matrix A is block upper triangular:


> ? ?
A=P
P.
0 ?
A simple calculation shows that A2 has the same sparsity pattern:






2
> ? ?
> ? ?
> ? ?
A =P
P P
P =P
P.
0 ?
0 ?
0 ?
Thus, also Ak for any k {1, 2, . . . } has the sparsity pattern


k
> ? ?
A =P
P,
0 ?
that is, Ak is never positive for any k {1, 2, . . . }. Equivalently, A is not primitive. This contradiction
concludes the proof of Lemma 2.12.

Proof of Theorem 2.16. We start by establishing that a primitive matrix $A$ satisfies $\rho(A) > 0$. By contradiction, if $\mathrm{spec}(A) = \{0\}$, then the Jordan normal form $J$ of $A$ is nilpotent, that is, there is a $k^\star \in \mathbb{N}$ so that $A^k = 0$ for all $k \geq k^\star$. But this is a contradiction because $A$ being primitive implies that there is $k' \in \mathbb{N}$ so that $A^k > 0$ for all $k \geq k'$.

Next, we prove that $\rho(A)$ is a real positive eigenvalue with a positive right eigenvector $v > 0$. We first focus on the case that $A$ is a positive matrix, and later show how to generalize the proof to the case of primitive matrices. Without loss of generality, assume $\rho(A) = 1$. If $(\lambda, x)$ is an eigenpair for $A$ such that $|\lambda| = \rho(A) = 1$, then
$$|x| = |\lambda|\,|x| = |\lambda x| = |Ax| \leq |A|\,|x| = A|x| \quad\Longrightarrow\quad |x| \leq A|x|. \tag{2.7}$$
Here, we use the notation $|x| = (|x_i|)_{i\in\{1,\dots,n\}}$, $|A| = (|a_{ij}|)_{i,j\in\{1,\dots,n\}}$, and vector inequalities are understood component-wise. In what follows, we show $|x| = A|x|$. With the shorthands $z = A|x|$ and $y = z - |x|$, equation (2.7) reads $y \geq 0$ and we aim to show $y = 0$. By contradiction, assume $y$ has a non-zero component. Therefore, $Ay > 0$. Independently, we also know $z = A|x| > 0$. Thus, there must exist $\varepsilon > 0$ such that $Ay > \varepsilon z$. Eliminating the variable $y$ in the latter inequality, we obtain $A_\varepsilon z > z$, where we define $A_\varepsilon = A/(1+\varepsilon)$. The inequality $A_\varepsilon z > z$ implies $A_\varepsilon^k z > z$ for all $k > 0$. Now, observe that $\rho(A_\varepsilon) < 1$ so that $\lim_{k\to\infty} A_\varepsilon^k = 0_{n\times n}$ and therefore $0 \geq z$. Since we also knew $z > 0$, we now have a contradiction. Therefore, we know $y = 0$.

So far, we have established that $|x| = A|x|$, so that $(1, |x|)$ is an eigenpair for $A$. Also note that $A > 0$ and $x \neq 0$ together imply $A|x| > 0$. Therefore we have established that 1 is an eigenvalue of $A$ with eigenvector $|x| > 0$. Next, observe that the above reasoning is correct also for primitive matrices if one replaces the first equality in (2.7) by $|x| = |\lambda^k|\,|x|$ and carries the exponent $k$ throughout the proof.

In summary, we have established that there exists a real eigenvalue $\lambda > 0$ such that $\lambda \geq |\mu|$ for all other eigenvalues $\mu$, and that each right (and therefore also left) eigenvector of $\lambda$ can be selected positive, up to rescaling. It remains to prove that $\lambda$ is simple and is strictly greater than the magnitude of all other eigenvalues. For these two proofs, we refer to (Meyer 2001, Chapter 8).



Proof of Proposition 2.19. Without loss of generality, we rescale $A$ by its dominant eigenvalue and assume $\lambda = 1$. We then write the Jordan normal form of $A$ as
$$A = T \begin{bmatrix} 1 & 0 \\ 0 & B \end{bmatrix} T^{-1}, \quad\text{with}\quad T = \begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_n \end{bmatrix}, \quad T^{-1} = \begin{bmatrix} w_1^\top \\ w_2^\top \\ w_3^\top \\ \vdots \\ w_n^\top \end{bmatrix},$$
where $v_1, \dots, v_n$ (respectively, $w_1, \dots, w_n$) are the columns of $T$ (respectively, the rows of $T^{-1}$). Equivalently, we have
$$A \underbrace{\begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_n \end{bmatrix}}_{=T} = \underbrace{\begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_n \end{bmatrix}}_{=T} \begin{bmatrix} 1 & 0 \\ 0 & B \end{bmatrix}.$$
The first column of the above matrix equation is $A v_1 = v_1$, that is, $v_1$ is the dominant right eigenvector of $A$. By analogous arguments, we find that $w_1$ is the dominant left eigenvector of $A$. Next we recall that
$$A^k = T \begin{bmatrix} 1 & 0 \\ 0 & B \end{bmatrix}^k T^{-1}, \quad\text{so that}\quad \lim_{k\to+\infty} A^k = T \left( \lim_{k\to+\infty} \begin{bmatrix} 1^k & 0 \\ 0 & B^k \end{bmatrix} \right) T^{-1} = T \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} T^{-1},$$
since Theorem 2.16 implies $\rho(B) < 1$, which in turn implies $\lim_{k\to+\infty} B^k = 0_{(n-1)\times(n-1)}$ by Theorem 2.8. Moreover,
$$\lim_{k\to+\infty} A^k = \begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_n \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} w_1^\top \\ w_2^\top \\ w_3^\top \\ \vdots \\ w_n^\top \end{bmatrix} = v_1 w_1^\top.$$
Finally, the $(1,1)$ entry of the matrix equality $T^{-1} T = I_n$ gives precisely the normalization $w_1^\top v_1 = 1$. This concludes the proof of Proposition 2.19.


2.4 Exercises

E2.1 Simple properties of stochastic matrices. Let $A_1, A_2, \dots, A_k$ be $n \times n$ matrices, let $A_1 A_2 \cdots A_k$ be their product and let $\lambda_1 A_1 + \dots + \lambda_k A_k$ be their convex combination with arbitrary convex combination coefficients. Show that
(i) if $A_1, A_2, \dots, A_k$ are nonnegative, then their product and all their convex combinations are nonnegative,
(ii) if $A_1, A_2, \dots, A_k$ are row-stochastic, then their product and all their convex combinations are row-stochastic, and
(iii) if $A_1, A_2, \dots, A_k$ are doubly-stochastic, then their product and all their convex combinations are doubly-stochastic.

E2.2 Semi-convergence and Jordan block decomposition. Consider a square matrix $A$ with $\rho(A) = 1$. Show that the following statements are equivalent:
(i) $A$ is semi-convergent,
(ii) there exists a nonsingular matrix $T$ and a number $m \in \{1, \dots, n\}$ such that
$$A = T \begin{bmatrix} I_m & 0_{m\times(n-m)} \\ 0_{(n-m)\times m} & B \end{bmatrix} T^{-1},$$
where $B \in \mathbb{R}^{(n-m)\times(n-m)}$ is convergent, that is, $\rho(B) < 1$.

E2.3 Semi-convergent and convergent matrices. Prove equation (2.6) and Theorem 2.6.

E2.4 Row-stochastic matrices after pairwise-difference similarity transform. Let $A \in \mathbb{R}^{n\times n}$ be row-stochastic. Define $T \in \mathbb{R}^{n\times n}$ by
$$T = \begin{bmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \\ 1/n & 1/n & \cdots & 1/n & 1/n \end{bmatrix}.$$
Perform the following tasks:
(i) for $x = [x_1, \dots, x_n]^\top$, write $Tx$ in components and show $T$ is invertible,
(ii) show $T A T^{-1} = \begin{bmatrix} A_{\text{stable}} & 0_{(n-1)\times 1} \\ c & 1 \end{bmatrix}$ for some $A_{\text{stable}} \in \mathbb{R}^{(n-1)\times(n-1)}$ and $c \in \mathbb{R}^{1\times(n-1)}$,
(iii) show that $A$ primitive implies $\rho(A_{\text{stable}}) < 1$, and
(iv) compute $T A T^{-1}$ for $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

E2.5 Substochastic matrices. A (row) substochastic matrix is a nonnegative matrix with all row-sums at most 1 and at least one row-sum strictly less than 1. Given a substochastic matrix $A$, show that
(i) if the $j$th row-sum of $A$ is strictly less than 1, then the $j$th row-sum of $A^2$ is strictly less than 1; and
(ii) if the $j$th row-sum of $A$ is strictly less than 1 and $a_{ij} > 0$, then the $i$th row-sum of $A^2$ is strictly less than 1.

E2.6 Powers of primitive matrices. Let $A \in \mathbb{R}^{n\times n}$ be nonnegative. Show that $A^k > 0$, for some $k \in \mathbb{N}$, implies $A^m > 0$ for all $m \geq k$.

E2.7 Symmetric doubly-stochastic matrix. Let $A \in \mathbb{R}^{n\times n}$ be doubly-stochastic. Show that:
(i) the matrix $A^\top A$ is doubly-stochastic and symmetric,


(ii) $\mathrm{spec}(A^\top A) \subset [0, 1]$,
(iii) the eigenvalue 1 of $A^\top A$ is not necessarily simple even if $A$ is irreducible.
E2.8 Bounds on spectral radius of primitive matrices. Consider a primitive matrix $A \in \mathbb{R}^{n\times n}$ and show the following upper and lower bounds:
$$\min(A\mathbb{1}_n) \leq \rho(A) \leq \max(A\mathbb{1}_n).$$

E2.9 Equivalent definitions of irreducibility. Prove Lemma 2.13.

E2.10 Discrete-time affine systems. Given $A \in \mathbb{R}^{n\times n}$ and $b \in \mathbb{R}^n$, consider the discrete-time affine system
$$x(k+1) = A x(k) + b.$$
Assume $A$ is convergent and show that
(i) the matrix $(I_n - A)$ is invertible,
(ii) the only equilibrium point of the system is $(I_n - A)^{-1} b$, and
(iii) $\lim_{k\to\infty} x(k) = (I_n - A)^{-1} b$ for all initial conditions $x(0) \in \mathbb{R}^n$.

E2.11 An affine averaging system. Given a primitive doubly-stochastic matrix $A$ and a vector $b$ satisfying $\mathbb{1}_n^\top b = 0$, consider the dynamical system
$$x(k+1) = A x(k) + b.$$
Show that
(i) the quantity $k \mapsto \mathbb{1}_n^\top x(k)$ is constant,
(ii) for each $\alpha \in \mathbb{R}$, there exists a unique equilibrium point $x^*_\alpha$ satisfying $\mathbb{1}_n^\top x^*_\alpha = \alpha$, and
(iii) all solutions with initial condition $x(0)$ satisfying $\mathbb{1}_n^\top x(0) = \alpha$ converge to $x^*_\alpha$.
Hint: Use Exercises E2.2 and E2.10.
E2.12 The Neumann series. For $A \in \mathbb{C}^{n\times n}$, the following statements are equivalent:
(i) $\rho(A) < 1$,
(ii) $\lim_{k\to\infty} A^k = 0_{n\times n}$, and
(iii) the Neumann series $\sum_{k=0}^\infty A^k$ converges.
If any and hence all of these conditions hold, then the matrix $(I - A)$ is invertible and $\sum_{k=0}^\infty A^k = (I - A)^{-1}$.
Hint: This statement, written in the style of (Meyer 2001, Section 7.10), is an extension of Theorem 2.8 and a generalization of the classic geometric series $\frac{1}{1-x} = \sum_{k=0}^\infty x^k$, convergent for all $|x| < 1$. For the proof, the hint is to use the Jordan normal form.

E2.13 Orthogonal and permutation matrices. A set $G$ with a binary operation mapping two elements of $G$ into another element of $G$, denoted by $(a, b) \mapsto a \star b$, is a group if:
• $a \star (b \star c) = (a \star b) \star c$ for all $a, b, c \in G$ (associativity property);
• there exists $e \in G$ such that $a \star e = e \star a = a$ for all $a \in G$ (existence of an identity element); and
• for each $a \in G$ there exists $a^{-1} \in G$ such that $a \star a^{-1} = a^{-1} \star a = e$ (existence of inverse elements).
Recall that: an orthogonal matrix $R$ is a square matrix whose columns and rows are orthonormal vectors, i.e., $R R^\top = I_n$; an orthogonal matrix acts on a vector like a rotation and/or reflection; let $O(n)$ denote the set of orthogonal matrices. Similarly, recall that: a permutation matrix is a square binary (i.e., entries equal to 0 and 1) matrix with precisely one entry equal to 1 in every row and every column; a permutation matrix acts on a vector by permuting its entries; let $\mathcal{P}_n$ denote the set of permutation matrices. Prove that


(i) the set of orthogonal matrices O(n) with the operation of matrix multiplication is a group;
(ii) the set of permutation matrices Pn with the operation of matrix multiplication is a group; and
(iii) each permutation matrix is orthogonal.

E2.14 On doubly-stochastic and permutation matrices. The following result is known as the Birkhoff–von Neumann Theorem. For a matrix $A \in \mathbb{R}^{n\times n}$, the following statements are equivalent:
(i) $A$ is doubly-stochastic; and
(ii) $A$ is a convex combination of permutation matrices.
Do the following:
• show that the set of doubly-stochastic matrices is convex (i.e., given any two doubly-stochastic matrices $A_1$ and $A_2$, any matrix of the form $\lambda A_1 + (1-\lambda) A_2$, for $\lambda \in [0, 1]$, is again doubly-stochastic);
• show that (ii) $\implies$ (i);
• find in the literature a proof of (i) $\implies$ (ii) and sketch it in one or two paragraphs.
E2.15 The Jacobi relaxation in parallel computation. Consider $n$ distributed processors that aim to collectively solve the linear equation $Ax = b$, where $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$ is invertible and its diagonal elements $a_{ii}$ are nonzero. Each processor stores a variable $x_i(k)$ as the discrete-time variable $k$ evolves and applies the following iterative strategy, termed Jacobi relaxation. At time step $k \in \mathbb{N}$ each processor performs the local computation
$$x_i(k+1) = \frac{1}{a_{ii}} \Big( b_i - \sum_{j=1, j\neq i}^n a_{ij} x_j(k) \Big), \qquad i \in \{1, \dots, n\}.$$
Next, each processor $i \in \{1, \dots, n\}$ sends its value $x_i(k+1)$ to all other processors $j \in \{1, \dots, n\}$ with $a_{ji} \neq 0$, and they iteratively repeat the previous computation. The initial values of the processors are arbitrary.
(i) Assume the Jacobi relaxation converges, i.e., assume $\lim_{k\to\infty} x(k) = x^*$. Show that $Ax^* = b$.
(ii) Give a necessary and sufficient condition for the Jacobi relaxation to converge.
(iii) Use the Geršgorin Disks Theorem 2.9 to show that the Jacobi relaxation converges if $A$ is strictly row diagonally dominant, that is, if $|a_{ii}| > \sum_{j=1, j\neq i}^n |a_{ij}|$ for all $i \in \{1, \dots, n\}$.
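For readers who want to experiment, here is one possible implementation of the Jacobi relaxation (a sketch, with an arbitrary strictly row diagonally dominant test matrix):

```python
import numpy as np

def jacobi(A, b, x0, iters=100):
    """Jacobi relaxation: x_i(k+1) = (b_i - sum_{j != i} a_ij x_j(k)) / a_ii."""
    d = np.diag(A)                  # diagonal entries a_ii
    R = A - np.diag(d)              # off-diagonal part of A
    x = x0.astype(float)
    for _ in range(iters):
        x = (b - R @ x) / d
    return x

A = np.array([[4.0, 1.0, 1.0],      # strictly row diagonally dominant,
              [1.0, 5.0, 2.0],      # so the iteration converges (task (iii))
              [0.0, 1.0, 3.0]])
b = np.array([6.0, 8.0, 4.0])
print(jacobi(A, b, np.zeros(3)))    # agrees with the exact solution below
print(np.linalg.solve(A, b))
```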

E2.16 The Jacobi over-relaxation in parallel computation. We now consider a more sophisticated version of the Jacobi relaxation presented in Exercise E2.15. Consider again $n$ distributed processors that aim to collectively solve the linear equation $Ax = b$, where $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$ is invertible and its diagonal elements $a_{ii}$ are nonzero. Each processor stores a variable $x_i(k)$ as the discrete-time variable $k$ evolves and applies the following iterative strategy, termed Jacobi over-relaxation. At time step $k \in \mathbb{N}$ each processor performs the local computation
$$x_i(k+1) = (1-\omega)\, x_i(k) + \frac{\omega}{a_{ii}} \Big( b_i - \sum_{j=1, j\neq i}^n a_{ij} x_j(k) \Big), \qquad i \in \{1, \dots, n\},$$
where $\omega \in \mathbb{R}$ is an adjustable parameter. Next, each processor $i \in \{1, \dots, n\}$ sends its value $x_i(k+1)$ to all other processors $j \neq i$ with $a_{ji} \neq 0$, and they iteratively repeat the previous computation. The initial values of the processors are arbitrary.
(i) Assume the Jacobi over-relaxation converges to $x^\star$ and show that $Ax^\star = b$ if $\omega \neq 0$.
(ii) Find the expression governing the dynamics of the error variable $e(k) := x(k) - x^\star$.
(iii) Suppose that $A$ is strictly row diagonally dominant, that is, $|a_{ii}| > \sum_{j\neq i} |a_{ij}|$. Use the Geršgorin Disks Theorem 2.9 to discuss the convergence properties of the algorithm for all possible values of $\omega \in \mathbb{R}$.
Hint: Consider different thresholds for $\omega$.
E2.17 Solutions of partial differential equations. This exercise is taken from (Luenberger 1979, Chapter 6). A partial differential equation (PDE) is a differential equation that contains unknown functions and their partial derivatives; PDEs are very common models for physical phenomena in fluids, electromagnetic fields, temperature distributions and other spatially-distributed quantities. For example, the electric potential $V$ within a two-dimensional rectangular enclosure is governed by Laplace's equation:
$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = 0, \tag{E2.1}$$
combined with the value of $V$ along the boundary of the enclosure; see the left image in Figure E2.1.

Figure E2.1: Laplace's equation over a continuous and a discrete grid, with interior potentials $V_1, \dots, V_8$ and boundary values $b_1, \dots, b_{12}$. For illustration's sake, the grid is low-dimensional.

For arbitrary enclosures and boundary conditions, it is not possible to solve Laplace's equation in closed form. An approximate solution is computed as follows. A finite regular Cartesian grid of points is placed inside the enclosure, see the right image in Figure E2.1, and the second-order derivatives are approximated by second-order finite differences. Specifically, at node 2 of the grid, we have along the $x$ direction
$$\frac{\partial^2 V}{\partial x^2}(V_2) \approx (V_3 - V_2) - (V_2 - V_1) = V_1 + V_3 - 2V_2,$$
so that
$$0 = \frac{\partial^2 V}{\partial x^2}(V_2) + \frac{\partial^2 V}{\partial y^2}(V_2) \approx V_1 + V_3 + V_6 + b_2 - 4V_2 \quad\Longleftrightarrow\quad 4V_2 = V_1 + V_3 + V_6 + b_2.$$
Thus, Laplace's equation is equivalent to requiring that the electric potential at each grid node be equal to the average of its neighboring nodes. In summary, this specification translates into the matrix equation:
$$4V = A_{\text{grid}} V + C_{\text{grid-boundary}}\, b, \tag{E2.2}$$
where $V \in \mathbb{R}^n$ is the vector of unknown potentials, $b \in \mathbb{R}^m$ is the vector of boundary conditions, $A_{\text{grid}} \in \{0,1\}^{n\times n}$ is the adjacency matrix of the interior grid (that is, $(A_{\text{grid}})_{ij} = 1$ if and only if the interior nodes $i$ and $j$ are connected), and $C_{\text{grid-boundary}} \in \{0,1\}^{n\times m}$ is the connection matrix between interior and boundary nodes (that is, $(C_{\text{grid-boundary}})_{i\ell} = 1$ if and only if grid interior node $i$ is connected with boundary node $\ell$). Show that
(i) $\rho(A_{\text{grid}}) < 4$ and $A_{\text{grid}}$ is primitive,


(ii) there exists a unique solution $V^\star$ to equation (E2.2) and, if $b \geq 0_m$, then $V^\star \geq 0_n$, and
(iii) each solution to the following iteration converges to $V^\star$:
$$4V(k+1) = A_{\text{grid}} V(k) + C_{\text{grid-boundary}}\, b,$$
whereby, at each step, the value of $V$ at each node is updated to be equal to the average of its neighboring nodes.
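As an illustration of item (iii), the following sketch runs the averaging iteration on an assumed 3 x 3 interior grid with all boundary values equal to 1 (this layout is our own small test case, not the exact grid of Figure E2.1); the iteration converges to the constant potential $V \equiv 1$, as expected for constant boundary data:

```python
import numpy as np

m = 3                     # interior grid is m x m; boundary values all set to 1
N = m * m
A_grid = np.zeros((N, N))
c = np.zeros(N)           # c stores C_grid-boundary @ b, accumulated on the fly
for i in range(m):
    for j in range(m):
        k = i * m + j
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ni, nj = i + di, j + dj
            if 0 <= ni < m and 0 <= nj < m:
                A_grid[k, ni * m + nj] = 1   # neighbor is an interior node
            else:
                c[k] += 1.0                  # neighbor is a boundary node, b = 1

V = np.zeros(N)
for _ in range(500):
    V = (A_grid @ V + c) / 4   # the iteration 4 V(k+1) = A_grid V(k) + C b
print(V.reshape(m, m))         # converges to the all-ones potential
```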

E2.18 Robotic coordination and geometric optimization on the real line. Consider $n \geq 3$ robots with dynamics $\dot p_i = u_i$, where $i \in \{1, \dots, n\}$ is an index labeling each robot, $p_i \in \mathbb{R}$ is the position of robot $i$, and $u_i \in \mathbb{R}$ is a steering control input. For simplicity, assume that the robots are indexed according to their initial position: $p_1(0) \leq p_2(0) \leq p_3(0) \leq \dots \leq p_n(0)$. We consider the following distributed control laws to achieve some geometric configuration:

(i) Move towards the centroid of your neighbors: The robots $i \in \{2, \dots, n-1\}$ (each having two neighbors) move to the centroid of the local subset $\{p_{i-1}, p_i, p_{i+1}\}$:
$$\dot p_i = \frac{1}{3}(p_{i-1} + p_i + p_{i+1}) - p_i, \qquad i \in \{2, \dots, n-1\}.$$
The robots $\{1, n\}$ (each having one neighbor) move to the centroid of the local subsets $\{p_1, p_2\}$ and $\{p_{n-1}, p_n\}$, respectively:
$$\dot p_1 = \frac{1}{2}(p_1 + p_2) - p_1 \qquad\text{and}\qquad \dot p_n = \frac{1}{2}(p_{n-1} + p_n) - p_n.$$
By using these coordination laws, the robots asymptotically rendezvous.

(ii) Move towards the centroid of your neighbors or walls: Consider two walls at the positions $p_0 \leq p_1$ and $p_{n+1} \geq p_n$ so that all robots are contained between the walls. The walls are stationary, that is, $\dot p_0 = 0$ and $\dot p_{n+1} = 0$. Again, the robots $i \in \{2, \dots, n-1\}$ (each having two neighbors) move to the centroid of the local subset $\{p_{i-1}, p_i, p_{i+1}\}$. The robots $\{1, n\}$ (each having one robotic neighbor and one neighboring wall) move to the centroid of the local subsets $\{p_0, p_1, p_2\}$ and $\{p_{n-1}, p_n, p_{n+1}\}$, respectively. Hence, the closed-loop robot dynamics are
$$\dot p_i = \frac{1}{3}(p_{i-1} + p_i + p_{i+1}) - p_i, \qquad i \in \{1, \dots, n\}.$$
By using these coordination laws, the robots become uniformly spaced on the interval $[p_0, p_{n+1}]$.

(iii) Move away from the centroid of your neighbors or walls: Again consider two stationary walls at $p_0 \leq p_1$ and $p_{n+1} \geq p_n$ containing the positions of all robots. We partition the interval $[p_0, p_{n+1}]$ into areas of interest, where each robot gets a territory assigned that is closer to itself than to other robots. Hence, robot $i \in \{2, \dots, n-1\}$ (having two neighbors) obtains the partition $V_i = [(p_i + p_{i-1})/2, (p_{i+1} + p_i)/2]$, robot 1 obtains the partition $V_1 = [p_0, (p_1 + p_2)/2]$, and robot $n$ obtains the partition $V_n = [(p_{n-1} + p_n)/2, p_{n+1}]$. We want to design a distributed algorithm such that the robots have equally sized partitions. We consider a simple coordination law, where each robot $i$ heads for the midpoint $c_i(V_i(p))$ of its partition $V_i$:
$$\dot p_i = c_i(V_i(p)) - p_i.$$
By using these coordination laws, the robots' partitions asymptotically become equally large.

(iv) Discrete-time update rules: If the robots move in discrete time according to $p_i^+ = u_i$, then the above coordination laws are easily modified via an Euler discretization as follows: replace $\dot p_i = f(p)$ by $p_i^+ - p_i = \varepsilon f(p)$ in each coordination law, where $\varepsilon > 0$ is sufficiently small so that the matrices involved in the discrete iterations are nonnegative.


Consider n = 3 robots, take your favorite problem from above, and show that both the continuous-time
and discrete-time dynamics asymptotically lead to the desired geometric configurations.
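For example, here is a minimal discrete-time simulation of problem (i) for $n = 3$ (a sketch with step size $\varepsilon = 1/2$; any $\varepsilon \in (0, 1]$ keeps the iteration matrix nonnegative for this law):

```python
import numpy as np

# Centroid law (i) for n = 3 written as p_dot = (A - I) p.
A = np.array([[1/2, 1/2, 0  ],
              [1/3, 1/3, 1/3],
              [0,   1/2, 1/2]])

eps = 0.5
F = np.eye(3) + eps * (A - np.eye(3))   # Euler step: p^+ = p + eps (A - I) p
p = np.array([0.0, 1.0, 5.0])           # arbitrary ordered initial positions
for _ in range(200):
    p = F @ p
print(p)   # all three entries agree: the robots have rendezvoused
```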
E2.19 Continuous-time cyclic pursuit. Consider four mobile robotic vehicles, indexed by $i \in \{1, 2, 3, 4\}$. We model each robot as a fully-actuated kinematic point mass, that is, we write $\dot p_i = u_i$, where $p_i \in \mathbb{C}$ is the position of robot $i$ in the plane and $u_i \in \mathbb{C}$ is its velocity command. The robots are equipped with onboard cameras as sensors. The task of the robots is to rendezvous at a common point (while using only onboard sensors). A simple strategy to achieve rendezvous is cyclic pursuit: each robot $i$ picks another robot, say $i+1$, and pursues it. This gives rise to the control $u_i = p_{i+1} - p_i$ and the closed-loop system
$$\begin{bmatrix} \dot p_1 \\ \dot p_2 \\ \dot p_3 \\ \dot p_4 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{bmatrix}.$$
A simulation of the cyclic-pursuit dynamics is shown in Figure E2.2.
Figure E2.2: Four robots that perform a cyclic pursuit from their initial positions and rendezvous at a common point.
Your tasks are as follows.
(i) Prove that the center of mass
$$\operatorname{average}(p(t)) = \frac{1}{4} \sum_{i=1}^4 p_i(t)$$
is constant for all $t \geq 0$. Notice that this is equivalent to saying $\frac{d}{dt} \operatorname{average}(p(t)) = 0$.
(ii) Prove that the robots asymptotically rendezvous at the initial center of mass, that is,
$$\lim_{t\to\infty} p_i(t) = \operatorname{average}(p(0)) \qquad \text{for } i \in \{1, \dots, 4\}.$$
(iii) Prove that if the robots are initially arranged in a square formation, they remain in a square formation under cyclic pursuit.
Hint: Recall that for a matrix $A$ with semisimple eigenvalues, the solution to the equation $\dot x = Ax$ is given by the modal expansion $x(t) = \sum_i e^{\lambda_i t} v_i w_i^\top x(0)$, where $\lambda_i$ is an eigenvalue, and $v_i$ and $w_i$ are the associated right and left eigenvectors, pairwise normalized to $w_i^\top v_i = 1$.
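A short Euler simulation (a sketch with an assumed step size and arbitrary initial positions) illustrates tasks (i) and (ii):

```python
import numpy as np

A = np.array([[-1,  1,  0,  0],
              [ 0, -1,  1,  0],
              [ 0,  0, -1,  1],
              [ 1,  0,  0, -1]], dtype=float)

p = np.array([1 + 0j, 0 + 1j, -1 + 0j, 0 - 1.5j])   # positions in the complex plane
center = p.mean()
dt = 0.01
for _ in range(5000):                               # integrate up to t = 50
    p = p + dt * (A @ p)
print(abs(p - center).max())    # near 0: rendezvous at the initial center of mass
print(abs(p.mean() - center))   # near 0: the center of mass is conserved
```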

E2.20 Simulation (cont'd). This is a follow-up to Exercise E1.1. Consider the linear averaging algorithm in equation (1.1): set $n = 5$, select the initial state equal to $(1, -1, 1, -1, 1)$, and use (a) the complete graph, (b) a ring graph, and (c) a star graph with node 1 as center.
(i) To which value do all the nodes converge?
(ii) Compute the dominant left eigenvector of the averaging matrix associated to each of the three graphs and verify that the result in Corollary 2.20(iii) is correct.
E2.21 Continuous- and discrete-time control of mobile robots. Consider $n$ robots moving on the line with positions $z_1, z_2, \dots, z_n \in \mathbb{R}$. In order to gather at a common location (i.e., reach rendezvous), each robot heads for the centroid of its neighbors, that is,
$$\dot z_i = \Big( \frac{1}{n-1} \sum_{j=1, j\neq i}^n z_j \Big) - z_i.$$
(i) Will the robots asymptotically rendezvous at a common location?
(ii) Consider the Euler discretization of the above closed-loop dynamics with sampling rate $T > 0$:
$$z_i(k+1) = z_i(k) + T \left( \Big( \frac{1}{n-1} \sum_{j=1, j\neq i}^n z_j(k) \Big) - z_i(k) \right).$$
For which values of the sampling period $T$ will the robots rendezvous?
Hint: Use the modal decomposition in Remark 2.3.

Chapter 3

Elements of Graph Theory


In this chapter we review some basic concepts from graph theory as exposed in standard books, e.g., see (Bollobás 1998; Diestel 2000). Graph theory provides key concepts to model, analyze and design network systems and distributed algorithms; the language of graphs pervades modern science and technology and is therefore essential.

3.1 Graphs and digraphs

[Graphs] An undirected graph (in short, graph) consists of a vertex set $V$ and of a set $E$ of unordered pairs of vertices. For $u, v \in V$ and $u \neq v$, the set $\{u, v\}$ denotes an unordered edge. We define and visualize some basic example graphs in Figure 3.1.

Figure 3.1: Example graphs. First row: the ring graph with 6 nodes, a star graph with 7 nodes, a tree (see definition
below), the complete graph with 6 nodes (usually denoted by K(6)). Second row: the complete bipartite graph with
3 + 3 nodes (usually denoted by K(3, 3)), a grid graph, and the Petersen graph. The ring, the complete bipartite
K(3, 3) and the Petersen graph are 3-regular graphs.


[Neighbors and degrees in graphs] In a graph $G$, the vertices $u$ and $v$ are neighbors if $\{u, v\}$ is an undirected edge. Given a graph $G$, we let $N_G(v)$ denote the set of neighbors of $v$. The degree of $v$ is the cardinality of $N_G(v)$. A graph is regular if all the nodes have the same degree.

[Digraphs and self-loops] A directed graph (in short, digraph) of order $n$ is a pair $G = (V, E)$, where $V$ is a set with $n$ elements called vertices (or nodes) and $E$ is a set of ordered pairs of vertices called edges. In other words, $E \subseteq V \times V$. We call $V$ and $E$ the vertex set and edge set, respectively. For $u, v \in V$, the ordered pair $(u, v)$ denotes an edge from $u$ to $v$. A digraph is undirected if $(v, u) \in E$ anytime $(u, v) \in E$. In a digraph, a self-loop is an edge from a node to itself; as customary, self-loops are not allowed in graphs. We define and visualize some basic example digraphs in Figure 3.2.

Figure 3.2: Example digraphs: the ring digraph with 6 nodes, the complete graph with 6 nodes, and a directed acyclic
graph, i.e., a digraph with no directed cycles.

[Subgraphs] A digraph $(V', E')$ is a subgraph of a digraph $(V, E)$ if $V' \subseteq V$ and $E' \subseteq E$. A digraph $(V', E')$ is a spanning subgraph if it is a subgraph and $V' = V$. The subgraph of $(V, E)$ induced by $V' \subseteq V$ is the digraph $(V', E')$, where $E'$ contains all edges in $E$ between two vertices in $V'$.

[In- and out-neighbors] In a digraph $G$ with an edge $(u, v) \in E$, $u$ is called an in-neighbor of $v$, and $v$ is called an out-neighbor of $u$. We let $N^{\mathrm{in}}(v)$ (resp., $N^{\mathrm{out}}(v)$) denote the set of in-neighbors (resp., the set of out-neighbors) of $v$. Given a digraph $G = (V, E)$, an in-neighbor of a nonempty set of nodes $U$ is a node $v \in V \setminus U$ for which there exists an edge $(v, u) \in E$ for some $u \in U$.

[In- and out-degree] The in-degree $d^{\mathrm{in}}(v)$ and out-degree $d^{\mathrm{out}}(v)$ of $v$ are the number of in-neighbors and out-neighbors of $v$, respectively. Note that a self-loop at a node $v$ makes $v$ both an in-neighbor as well as an out-neighbor of itself. A digraph is topologically balanced if each vertex has the same in- and out-degree (even if distinct vertices have distinct degrees).

3.2 Paths and connectivity in undirected graphs

[Paths] A path in a graph is an ordered sequence of vertices such that any pair of consecutive vertices in
the sequence is an edge of the graph. A path is simple if no vertex appears more than once in it, except
possibly for the initial and final vertex.

[Connectivity and connected components] A graph is connected if there exists a path between any two vertices.
If a graph is not connected, then it is composed of multiple connected components, that is, multiple connected
subgraphs.
[Cycles] A cycle is a simple path that starts and ends at the same vertex and has at least three distinct
vertices. A graph is acyclic if it contains no cycles. A connected acyclic graph is a tree.
Lemma 3.1 (Tree properties). For a graph $G = (V, E)$ without self-loops, the following statements are equivalent:
(i) $G = (V, E)$ is a tree;
(ii) $G$ is connected and $|E| = |V| - 1$; and
(iii) $G$ is acyclic and $|E| = |V| - 1$.

Figure 3.3: This graph has two connected components. The leftmost connected component is a tree, while the
rightmost connected component is a cycle.

3.3 Paths and connectivity in digraphs

[Directed paths] A directed path in a digraph is an ordered sequence of vertices such that any pair of
consecutive vertices in the sequence is a directed edge of the digraph. A directed path is simple if no
vertex appears more than once in it, except possibly for the initial and final vertex.
[Cycles in digraphs] A cycle in a digraph is a simple directed path that starts and ends at the same vertex. It
is customary to accept as feasible cycles in digraphs also cycles of length 1 (i.e., a self-loop) and cycles
of length 2 (i.e., composed of just 2 nodes). The set of cycles of a directed graph is finite. A digraph
is acyclic if it contains no cycles. In a digraph, every vertex of in-degree 0 is named a source, and every
vertex of out-degree 0 is named a sink. Every acyclic digraph has at least one source and at least one sink;
see Exercise E3.1.

Figure 3.4: Acyclic digraph with one sink and two sources.

Figure 3.5: Directed cycle.


[Directed trees] A directed tree (sometimes called a rooted tree) is an acyclic digraph with the following
property: there exists a vertex, called the root, such that any other vertex of the digraph can be reached
by one and only one directed path starting at the root. A directed spanning tree of a digraph is a spanning
subgraph that is a directed tree.

3.3.1 Connectivity properties of digraphs

Next, we present four useful connectivity notions for a digraph $G$:


(i) G is strongly connected if there exists a directed path from any node to any other node;
(ii) G is weakly connected if the undirected version of the digraph is connected;
(iii) G possesses a globally reachable node if one of its nodes can be reached from any other node by
traversing a directed path; and
(iv) G possesses a directed spanning tree if one of its nodes is the root of directed paths to every other
node.
An example of a strongly connected graph is shown in Figure 3.6, and a weakly connected graph
with a globally reachable node is illustrated in Figure 3.7.
Figure 3.6: A strongly connected digraph. Figure 3.7: A weakly connected digraph with a globally reachable node, node #2.

For a digraph $G = (V, E)$, the reverse digraph $G^{\mathrm{rev}}$ has vertex set $V$ and edge set $E^{\mathrm{rev}}$ composed of all edges in $E$ with reversed direction. Clearly, a digraph contains a directed spanning tree if and only if the reverse digraph contains a globally reachable node.

3.3.2 Periodicity of strongly-connected digraphs

[Periodic and aperiodic digraphs] A strongly-connected directed graph is periodic if there exists a k > 1,
called the period, that divides the length of every cycle of the graph. In other words, a digraph is periodic
if the greatest common divisor of the lengths of all its cycles is larger than one. A digraph is aperiodic if it
is not periodic.
Note: the definition of periodic digraph is well-posed because a digraph has only a finite number of cycles (because of the assumption that nodes are not repeated in simple paths). The notions of periodicity and aperiodicity apply only to digraphs and not to undirected graphs (where the notion of a cycle is defined differently). Any strongly-connected digraph with a self-loop is aperiodic.

Figure 3.8: (a) A periodic digraph with period 2. (b) An aperiodic digraph with cycles of length 1 and 2. (c) An aperiodic digraph with cycles of length 2 and 3.

3.3.3 Condensation digraphs

[Strongly connected components] A subgraph $H$ is a strongly connected component of $G$ if $H$ is strongly connected and any other subgraph of $G$ strictly containing $H$ is not strongly connected.
[Condensation digraph] The condensation digraph of a digraph G, denoted by C(G), is defined as follows:
the nodes of C(G) are the strongly connected components of G, and there exists a directed edge in C(G)
from node H1 to node H2 if and only if there exists a directed edge in G from a node of H1 to a node of
H2 .

Figure 3.9: An example digraph, its strongly connected components and its condensation.
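Condensations are easy to compute in software; for instance, the NetworkX library mentioned later in Section 3.5 provides them directly. Here is a minimal sketch (the digraph below is an arbitrary example):

```python
import networkx as nx

# A digraph with two strongly connected components, {1, 2, 3} and {4, 5}.
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)])

C = nx.condensation(G)                    # the condensation digraph C(G)
print(list(C.nodes(data="members")))      # nodes are strongly connected components
print(list(C.edges()))                    # one edge between the two components
print(nx.is_directed_acyclic_graph(C))    # True, consistent with Lemma 3.2(i)
```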

Lemma 3.2 (Properties of the condensation digraph). For a digraph G and its condensation C(G),
(i) C(G) is acyclic;
(ii) G contains a globally reachable node if and only if C(G) contains a globally reachable node;
(iii) G contains a directed spanning tree if and only if C(G) contains a directed spanning tree; and
(iv) G is weakly connected if and only if C(G) is weakly connected.

Proof. Regarding statement (i): by contradiction, if there exists a cycle $(H_1, H_2, \dots, H_m, H_1)$ in $C(G)$, then the union of the vertex sets of $H_1, \dots, H_m$ would be strongly connected in $G$. However, each $H_i$ is a strongly connected component of $G$ and, by the definition of condensation digraph, every subgraph of $G$ strictly containing it is not strongly connected; this is a contradiction.

Regarding the implication ($\Rightarrow$) in statement (ii): suppose $G$ contains a globally reachable node $v$. Let $H_v$ denote the node in $C(G)$ containing $v$. Since $v$ is globally reachable, for all $u \in V(G)$ there exists a path from $u$ to $v$ and, due to the strong connectivity of $H_v$ and of $H_u$ (the node in $C(G)$ containing $u$), there is a path from all nodes of $H_u$ to all nodes of $H_v$, which shows that $H_v$ is a globally reachable node in $C(G)$.

Regarding the implication ($\Leftarrow$) in statement (ii): suppose $C(G)$ contains a globally reachable node $H_v$, that is, $H_v$ can be reached from every node $H_u \in V(C(G))$. Again, by the strong connectivity of $H_v$ and $H_u$, for all $v \in H_v$ and $u \in H_u$ there exists a path from $u$ to $v$. In other words, every node of $H_v$ is a globally reachable node in the digraph $G$.

Regarding statement (iii): a digraph contains a directed spanning tree if and only if the reverse digraph contains a globally reachable node. Thus, the proof of statement (iii) is analogous to that for statement (ii). Both implications in statement (iv) are simple to prove via induction; we leave this task to the reader.


3.4 Weighted digraphs

A weighted digraph is a triplet $G = (V, E, \{a_e\}_{e\in E})$, where the pair $(V, E)$ is a digraph with nodes $V = \{v_1, \dots, v_n\}$, and where $\{a_e\}_{e\in E}$ is a collection of strictly positive weights for the edges $E$. Note: for simplicity we let $V = \{1, \dots, n\}$. It is therefore equivalent to write $\{a_e\}_{e\in E}$ or $\{a_{ij}\}_{(i,j)\in E}$.

[Figure: an example weighted digraph with five nodes.] The collection of weights for this weighted digraph is
$$a_{12} = 3.7, \quad a_{13} = 3.7, \quad a_{21} = 8.9, \quad a_{24} = 1.2, \quad a_{34} = 3.7, \quad a_{35} = 2.3, \quad a_{51} = 4.4, \quad a_{54} = 2.3, \quad a_{55} = 4.4.$$

A digraph $G = (V = \{v_1, \dots, v_n\}, E)$ can be regarded as a weighted digraph by defining its set of weights to be all equal to 1, that is, setting $a_e = 1$ for all $e \in E$. A weighted digraph is undirected if $a_{ij} = a_{ji}$ for all $i, j \in \{1, \dots, n\}$.

The notions of connectivity and the definitions of in- and out-neighbors, introduced for digraphs, remain equally valid for weighted digraphs. The notions of in- and out-degree are generalized to weighted digraphs as follows. In a weighted digraph with $V = \{v_1, \dots, v_n\}$, the weighted out-degree and the weighted in-degree of vertex $v_i$ are defined by, respectively,
$$d^{\mathrm{out}}(v_i) = \sum_{j=1}^n a_{ij} \qquad \text{(i.e., $d^{\mathrm{out}}(v_i)$ is the sum of the weights of all the out-edges of $v_i$)},$$
$$d^{\mathrm{in}}(v_i) = \sum_{j=1}^n a_{ji} \qquad \text{(i.e., $d^{\mathrm{in}}(v_i)$ is the sum of the weights of all the in-edges of $v_i$)}.$$
The weighted digraph $G$ is weight-balanced if $d^{\mathrm{out}}(v_i) = d^{\mathrm{in}}(v_i)$ for all $v_i \in V$.

3.5 Database collections and software libraries

Useful collections of example networks are freely available online; here are some examples:
(i) The Koblenz Network Collection, available at http://konect.uni-koblenz.de and described in (Kunegis 2013), contains model graphs in easily accessible MATLAB format (as well as a MATLAB toolbox for network analysis and a compact overview of the various computed statistics and plots for the networks in the collection).
(ii) A broad range of example networks is available online at the Stanford Large Network Dataset
Collection, see http://snap.stanford.edu/data.
(iii) The University of Florida Sparse Matrix Collection, available at http://www.cise.ufl.edu/research/sparse/matrices and described in (Davis and Hu 2011), contains a large and growing set of sparse matrices and complex graphs arising in a broad range of applications; e.g., see Figure 3.10.
(iv) The UCI Network Data Repository, available at http://networkdata.ics.uci.edu, is an effort
to facilitate the scientific study of networks; see also (DuBois 2008).
Useful software libraries for network analysis and visualization are freely available online; here are some
examples:
(i) Gephi, available at https://gephi.org, is an interactive visualization and exploration platform
for all kinds of networks and complex systems, dynamic and hierarchical graphs. Datasets are
available at https://wiki.gephi.org/index.php?title=Datasets.
(ii) NetworkX, available at http://networkx.github.io, is a Python library for network analysis. For example, one feature is the ability to compute condensation digraphs. A second interesting feature is the ability to generate numerous well-known model graphs; see http://networkx.lanl.gov/reference/generators.html.
(iii) Cytoscape, available at http://www.cytoscape.org, is an open-source software platform for
visualizing complex networks and integrating them with attribute data.
(iv) Mathematica provides functionality for modeling, analyzing, synthesizing, and visualizing graphs and networks beside the ability to simulate dynamical systems; see the description at http://reference.wolfram.com/mathematica/guide/GraphsAndNetworks.html.

(a) IEEE 118 bus system

(b) Klavzar bibliography

(c) Pajek network GD99c

Figure 3.10: Example networks from distinct domains: Figure 3.10(a) shows the standard IEEE 118 power grid testbed (118 nodes); Figure 3.10(b) shows the Klavzar bibliography network (86 nodes); Figure 3.10(c) shows the GD99c Pajek network (105 nodes). Network parameters are available at http://www.cise.ufl.edu/research/sparse/matrices, and their layout is obtained via the graph drawing algorithm proposed by Hu (2005).

(v) Graphviz, available at http://www.graphviz.org/, is an open-source graph visualization software which is also compatible with MATLAB: http://www.mathworks.com/matlabcentral/fileexchange/4518-matlab-graphviz-interface.

3.6 Exercises

E3.1 Acyclic digraphs. Let $G$ be an acyclic digraph with $n$ nodes. Show that:
(i) $G$ contains at least one sink, i.e., a vertex without out-neighbors, and at least one source, i.e., a vertex without in-neighbors;
(ii) the vertices of $G$ can be given labels in the set $\{1, \dots, n\}$ in such a way that if $(u, v)$ is an edge, then $\mathrm{label}(u) > \mathrm{label}(v)$. This labeling is called a topological sort of $G$. Provide an algorithm to define this labeling; and
(iii) after topologically sorting its vertices, the adjacency matrix of the digraph is lower-triangular, i.e., all its entries above the main diagonal are equal to zero.

E3.2 Condensation digraphs. Draw the condensation for each of the following digraphs.

E3.3 A simple proof. Prove Lemma 3.1 on the properties of a tree in a graph.

E3.4 Connectivity in topologically balanced digraphs. Prove the following statement: If a digraph $G$ is topologically balanced and contains either a globally reachable vertex or a directed spanning tree, then $G$ is strongly connected.

E3.5 Globally reachable nodes and disjoint closed subsets (Lin et al. 2005; Moreau 2005). Consider a digraph $G = (V, E)$ with at least two nodes. Prove that the following statements are equivalent:
(i) $G$ has a globally reachable node, and
(ii) for every pair $S_1, S_2$ of non-empty disjoint subsets of $V$, there exists a node that is an out-neighbor of $S_1$ or $S_2$.

E3.6 Swiss railroads. Consider the fictitious railroad map of Switzerland given in Figure E3.1.
(i) Can a passenger go from any station to any other?
(ii) Is the graph acyclic? Is it aperiodic? If not, what is its period?

Figure E3.1: Fictitious railroad map connections in Switzerland among the stations Basel, St. Gallen, Zürich, Bern, Lausanne, Zermatt, Interlaken, Chur and Lugano.


Version v0.81 (4 Jan 2016). Draft not for circulation. Copyright 2012-16.

Chapter 4

The Adjacency Matrix


We review here basic concepts from algebraic graph theory. Standard books on algebraic graph theory
are (Biggs 1994; Godsil and Royle 2001). One objective is to relate matrix properties with graph
theoretical properties. A second objective is to understand when is a row-stochastic matrix primitive.

4.1 The adjacency matrix

Given a weighted digraph $G = (V, E, \{a_e\}_{e\in E})$, with $V = \{1, \dots, n\}$, the weighted adjacency matrix of $G$ is the $n \times n$ nonnegative matrix $A$ defined as follows: for each edge $(i, j) \in E$, the entry $(i, j)$ of $A$ is equal to the weight $a_{(i,j)}$ of the edge $(i, j)$, and all other entries of $A$ are equal to zero. In other words, $a_{ij} > 0$ if and only if $(i, j)$ is an edge of $G$, and $a_{ij} = 0$ otherwise.

[Figure: the same five-node weighted digraph as in Section 3.4.] The adjacency matrix of this weighted directed graph is
$$A = \begin{bmatrix} 0 & 3.7 & 3.7 & 0 & 0 \\ 8.9 & 0 & 0 & 1.2 & 0 \\ 0 & 0 & 0 & 3.7 & 2.3 \\ 0 & 0 & 0 & 0 & 0 \\ 4.4 & 0 & 0 & 2.3 & 4.4 \end{bmatrix}.$$

The binary adjacency matrix $A \in \{0,1\}^{n\times n}$ of a digraph $G = (V = \{1, \dots, n\}, E)$ or of a weighted digraph is defined by
$$a_{ij} = \begin{cases} 1, & \text{if } (i, j) \in E, \\ 0, & \text{otherwise.} \end{cases} \tag{4.1}$$
Finally, in a weighted digraph, the weighted out-degree matrix $D_{\mathrm{out}}$ and the weighted in-degree matrix $D_{\mathrm{in}}$ are the diagonal matrices defined by
$$D_{\mathrm{out}} = \operatorname{diag}(A\mathbb{1}_n) = \begin{bmatrix} d^{\mathrm{out}}(1) & & 0 \\ & \ddots & \\ 0 & & d^{\mathrm{out}}(n) \end{bmatrix}, \qquad\text{and}\qquad D_{\mathrm{in}} = \operatorname{diag}(A^\top \mathbb{1}_n),$$
where $\operatorname{diag}(z_1, \dots, z_n)$ is the diagonal matrix with diagonal entries equal to $z_1, \dots, z_n$.
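As a small worked example, here is a NumPy sketch computing both degree matrices for the five-node weighted digraph above:

```python
import numpy as np

A = np.array([[0,   3.7, 3.7, 0,   0  ],
              [8.9, 0,   0,   1.2, 0  ],
              [0,   0,   0,   3.7, 2.3],
              [0,   0,   0,   0,   0  ],
              [4.4, 0,   0,   2.3, 4.4]])

ones = np.ones(5)
D_out = np.diag(A @ ones)      # weighted out-degrees on the diagonal
D_in = np.diag(A.T @ ones)     # weighted in-degrees on the diagonal
print(np.diag(D_out))          # [ 7.4 10.1  6.   0.  11.1]; node 4 is a sink
print(np.diag(D_in))           # [13.3  3.7  3.7  7.2  6.7]
```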

4.2 Algebraic graph theory: basic and prototypical results

In this section we review some basic and prototypical results that involve correspondences between graphs and adjacency matrices. We start with some straightforward statements. For a weighted digraph $G$ with adjacency matrix $A$, the following statements hold:
(i) the graph $G$ is undirected if and only if $A$ is symmetric and its diagonal entries are equal to 0;
(ii) the digraph $G$ is weight-balanced if and only if $A\mathbb{1}_n = A^\top \mathbb{1}_n$;
(iii) the node $i$ is a sink if and only if the $i$th row-sum of $A$ is zero; and
(iv) the node $i$ is a source if and only if the $i$th column-sum of $A$ is zero.
Lemma 4.1 (Digraph associated to a nonnegative matrix). Given a nonnegative $n \times n$ matrix $A$, its associated weighted digraph is the weighted digraph with nodes $\{1, \dots, n\}$ and weighted adjacency matrix $A$. The weighted adjacency matrix $A$ is
(i) row-stochastic if and only if each node of its associated digraph has weighted out-degree equal to 1 (so that $I_n$ is the weighted out-degree matrix); and
(ii) doubly-stochastic if and only if each node of its associated weighted digraph has weighted out-degree and weighted in-degree equal to 1 (so that $G$ is weight-balanced and, additionally, both in-degree and out-degree matrices are equal to $I_n$).

4.3 Powers of the adjacency matrix, paths and connectivity

Lemma 4.2 (Directed paths and powers of the adjacency matrix). Let $G$ be a weighted digraph with $n$ nodes, with weighted adjacency matrix $A$, with unweighted adjacency matrix $A_{0,1} \in \{0,1\}^{n\times n}$, and possibly with self-loops. For all $i, j \in \{1, \dots, n\}$ and $k \in \mathbb{N}$,
(i) the $(i, j)$ entry of $A_{0,1}^k$ equals the number of directed paths of length $k$ (including paths with self-loops) from node $i$ to node $j$; and
(ii) the $(i, j)$ entry of $A^k$ is positive if and only if there exists a directed path of length $k$ (including paths with self-loops) from node $i$ to node $j$.
Proof. The first statement is proved by induction; for simplicity of notation, let $A$ be binary. The statement is clearly true for $k = 1$. Next, we assume the statement is true for $k \geq 1$ and we prove it for $k + 1$. By assumption, the entry $(A^k)_{ij}$ equals the number of directed paths from $i$ to $j$ of length $k$. Now, each path from $i$ to $j$ of length $k + 1$ identifies (1) a unique node $h$ such that $(i, h)$ is an edge of $G$ and (2) a unique path from $h$ to $j$ of length $k$. We write $A^{k+1} = A A^k$ in components as
$$(A^{k+1})_{ij} = \sum_{h=1}^n A_{ih} (A^k)_{hj} = \sum_{h \in N^{\mathrm{out}}(i)} (A^k)_{hj},$$
where $N^{\mathrm{out}}(i)$ is the set of out-neighbors of $i$. Therefore, the entry $(A^{k+1})_{ij}$ equals the number of directed paths from $i$ to $j$ of length $k + 1$. This concludes the induction argument. The second statement, for the case where $A$ is not binary, is a direct consequence of the first.

Proposition 4.3 (Connectivity properties of the digraph and positive powers of the adjacency matrix). Let $G$ be a weighted digraph with $n$ nodes and weighted adjacency matrix $A$. The following statements are equivalent:
(i) $G$ is strongly connected;
(ii) $A$ is irreducible; and
(iii) $\sum_{k=0}^{n-1} A^k$ is positive.
For any $i, j \in \{1, \dots, n\}$, the following equivalences hold:
(iv) the $j$th node of $G$ is globally reachable if and only if the $j$th column of $\sum_{k=0}^{n-1} A^k$ is positive; and
(v) the $i$th node of $G$ is the root of a directed spanning tree if and only if the $i$th row of $\sum_{k=0}^{n-1} A^k$ is positive.

[Figure: a three-node digraph.] The adjacency matrix of this unweighted directed graph is
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}.$$
Even though vertices 2 and 3 are globally reachable, the digraph is not strongly connected because vertex 1 has no in-neighbor other than itself. Therefore, as is easy to observe, the associated adjacency matrix is reducible.
Proof of Proposition 4.3. (ii) $\implies$ (i): We assume $A$ is irreducible and aim to show that there exist directed paths from any node to any other node. Fix $i \in \{1, \dots, n\}$ and let $R_i \subseteq \{1, \dots, n\}$ be the set of nodes that belong to directed paths originating from node $i$. Denote the unreachable nodes by $U_i = \{1, \dots, n\} \setminus R_i$. By contradiction, assume $U_i$ is not empty. Then $\{R_i, U_i\}$ is a nontrivial partition of the index set $\{1, \dots, n\}$ and irreducibility implies the existence of a non-zero entry $a_{jh}$ with $j \in R_i$ and $h \in U_i$. But then the node $h$ is reachable from $i$. Therefore, $U_i = \emptyset$, and all nodes are reachable from $i$. The converse statement (i) $\implies$ (ii) is proved similarly.

(i) $\implies$ (iii): If $G$ is strongly connected, then, for all $i$ and $j$, there exists a directed path of some length $k \leq n - 1$ connecting node $i$ to node $j$. Hence, by Lemma 4.2(ii), the entry $(A^k)_{ij}$ is strictly positive. This implies (iii).

(iii) $\implies$ (i): If $\sum_{k=0}^{n-1} A^k$ is positive, then for all $i$ and $j$ there must exist $h$ such that $(A^h)_{ij} > 0$. This implies the existence of a path of length $h$ from $i$ to $j$.

Notice that if node j is reachable from node i via a path of length k and at least one node along that
path has a self-loop, then node j is reachable from node i via paths of length k, k + 1, k + 2, and so on.
This observation and statement (iv) in Proposition 4.3 lead to the following corollary.

Corollary 4.4 (Connectivity properties of the digraph and positive powers of the adjacency matrix: cont'd). Let $G$ be a weighted digraph with $n$ nodes, weighted adjacency matrix $A$ and a self-loop at each node. The following statements are equivalent:
(i) $G$ is strongly connected; and
(ii) $A^{n-1}$ is positive, so that $A$ is primitive.
For any $j \in \{1, \dots, n\}$, the following two statements are equivalent:
(i) the $j$th node of $G$ is globally reachable; and
(ii) the $j$th column of $A^{n-1}$ has positive entries.

4.4 Graph theoretical properties of primitive matrices

In this section we present the main result of this chapter, an immediate corollary and its proof.

Proposition 4.5 (Strongly connected and aperiodic digraph and primitive adjacency matrix). Let $G$ be a weighted digraph with weighted adjacency matrix $A$. The following two statements are equivalent:
(i) $G$ is strongly connected and aperiodic; and
(ii) $A$ is primitive, that is, there exists $k \in \mathbb{N}$ such that $A^k$ is positive.

Corollary 4.6 (Strongly connected digraph with self-loops and primitive adjacency matrix). Let $G$ be a weighted digraph with weighted adjacency matrix $A$. If $G$ is strongly connected and has at least one self-loop, then $A$ is primitive.
Before proving Proposition 4.5, we introduce a useful fact from number theory, whose proof we leave as Exercise E4.6. Loosely speaking, the following lemma states that coprime numbers (i.e., numbers whose greatest common divisor is 1) generate, via linear combinations with nonnegative integer coefficients, all numbers larger than a given threshold.

Lemma 4.7 (Frobenius number). Given a finite set $A = \{a_1, a_2, \dots, a_n\}$ of positive integers, an integer $M$ is said to be representable by $A$ if there exist nonnegative integers $\{\alpha_1, \alpha_2, \dots, \alpha_n\}$ such that $M = \alpha_1 a_1 + \dots + \alpha_n a_n$. The following statements are equivalent:
(i) there exists a finite largest unrepresentable integer, called the Frobenius number of $A$; and
(ii) the greatest common divisor of $A$ is 1.
Finally, we provide a proof for Proposition 4.5 taken from (Bullo et al. 2009).
Proof of Proposition 4.5. (i) $\Rightarrow$ (ii): Pick any ordered pair $(i,j)$. We claim that there exists a number $k(i,j)$ with the property that, for all $m \ge k(i,j)$, we have $(A^m)_{ij} > 0$, that is, there exists a directed path from $i$ to $j$ of length $m$ for all $m \ge k(i,j)$. If this claim is correct, then statement (ii) is proved with $k = \max\{k(i,j) \mid i,j \in \{1, \dots, n\}\}$. To show this claim, let $\{c_1, \dots, c_N\}$ be the set of the cycles of $G$ and let $\{k_1, \dots, k_N\}$ be their lengths. Because $G$ is aperiodic, the lengths $\{k_1, \dots, k_N\}$ are coprime
and Lemma 4.7 implies the existence of a number $h(k_1, \dots, k_N)$ such that any number larger than $h(k_1, \dots, k_N)$ is a linear combination of $k_1, \dots, k_N$ with nonnegative integers as coefficients. Because $G$ is strongly connected, there exists a path of some length $\ell(i,j)$ that starts at $i$, contains a vertex of each of the cycles $c_1, \dots, c_N$, and terminates at $j$. Now, we claim that $k(i,j) = \ell(i,j) + h(k_1, \dots, k_N)$ has the desired property. Indeed, pick any number $m \ge k(i,j)$ and write it as $m = \ell(i,j) + \beta_1 k_1 + \cdots + \beta_N k_N$ for appropriate numbers $\beta_1, \dots, \beta_N \in \mathbb{N}$. A directed path from $i$ to $j$ of length $m$ is constructed by attaching to the path the following cycles: $\beta_1$ times the cycle $c_1$, $\beta_2$ times the cycle $c_2$, ..., $\beta_N$ times the cycle $c_N$.
(ii) $\Rightarrow$ (i): From Lemma 4.2 we know that $A^k > 0$ means that there are paths of length $k$ from every node to every other node. Hence, the digraph $G$ is strongly connected. Next, we prove aperiodicity. Because $G$ is strongly connected, each node of $G$ has at least one outgoing edge, that is, for all $i$, there exists at least one index $j$ such that $a_{ij} > 0$. This fact implies that the matrix $A^{k+1} = A A^k$ is positive via the following simple calculation: $(A^{k+1})_{il} = \sum_{h=1}^{n} a_{ih} (A^k)_{hl} \ge a_{ij} (A^k)_{jl} > 0$. In summary, if $A^k$ is positive for some $k$, then $A^m$ is positive for all subsequent $m > k$ (see also Exercise E2.6). Therefore, there are closed paths in $G$ of any sufficiently large length. This fact implies that $G$ is aperiodic; indeed, by contradiction, if the cycle lengths were not coprime, then $G$ would not possess such closed paths of arbitrary sufficiently large length. ∎
4.5 Exercises

E4.1 Edges and triangles in an undirected graph. Let $A$ be the binary adjacency matrix for an undirected graph $G$ without self-loops. Recall that the trace of $A$ is $\operatorname{trace}(A) = \sum_{i=1}^{n} a_{ii}$. Show that
(i) $\operatorname{trace}(A) = 0$,
(ii) $\operatorname{trace}(A^2) = 2|E|$, where $|E|$ is the number of edges of $G$, and
(iii) $\operatorname{trace}(A^3) = 6|T|$, where $|T|$ is the number of triangles of $G$. (A triangle is a complete subgraph with three vertices.)
(iv) Give the formula relating $\operatorname{trace}(A^n)$ to the number of closed walks on the graph of length $n$.
(v) Verify results (i)-(iii) on the matrix $A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$.

E4.2 A sufficient condition for primitivity. Assume the square matrix $A$ is nonnegative and irreducible. Show that
(i) if $A$ has a positive diagonal element, then $A$ is primitive,
(ii) if $A$ is primitive, then it is false that $A$ must have a positive diagonal element.

E4.3 Example row-stochastic matrices and associated digraph. Consider the row-stochastic matrices
$$A_1 = \frac{1}{2}\begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \end{bmatrix}, \quad A_2 = \frac{1}{2}\begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}, \quad \text{and} \quad A_3 = \frac{1}{2}\begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}.$$
Draw the digraphs G1 , G2 and G3 associated with these three matrices. Using only the original definitions
and without relying on the characterizations in Propositions 4.3 and 4.5, show that:
(i) the matrices A1 , A2 and A3 are irreducible and primitive,
(ii) the digraphs G1 , G2 and G3 are strongly connected and aperiodic, and
(iii) the averaging algorithm defined by A2 converges in a finite number of steps.
E4.4 Convergent substochastic matrices. Let $A$ be a nonnegative matrix with associated digraph $G$. Let $d_{\text{out}}(i)$ denote the out-degree of node $i$. Show that
(i) $A$ is substochastic (as defined in Exercise E2.5) if and only if $d_{\text{out}}(i) \le 1$ for all $i$ and $d_{\text{out}}(j) < 1$ for at least one $j$.
Next, suppose that for each node $i$ with $d_{\text{out}}(i) = 1$ there exists a directed path from $i$ to a node $j(i)$ with $d_{\text{out}}(j(i)) < 1$. Show that
(ii) there exists $k$ such that $A^k \mathbb{1}_n < \mathbb{1}_n$; and
(iii) $\rho(A) < 1$, that is, $A$ is convergent.

E4.5 Normalization of nonnegative irreducible matrices. Consider a strongly connected weighted digraph $G$ with $n$ nodes and with an irreducible adjacency matrix $A \in \mathbb{R}^{n \times n}$. The matrix $A$ is not necessarily row-stochastic. Find a positive vector $v \in \mathbb{R}^n$ so that the normalized matrix
$$A_{\text{normalized}} = \frac{1}{\rho(A)} \operatorname{diag}(v)^{-1} A \operatorname{diag}(v)$$
is nonnegative, irreducible, and row-stochastic.


E4.6 The Frobenius number. Prove Lemma 4.7.


Hint: Read up on the Frobenius number in (Owens 2003).

E4.7 Leslie population model. The Leslie model is used in population ecology to model the changes in a population of organisms over a period of time; see the original reference (Leslie 1945) and a comprehensive text (Caswell 2001). In this model, the population is divided into $n$ groups based on age classes; the indices $i$ are ordered increasingly with the age, so that $i = 1$ is the class of the newborns. The variable $x_i(k)$, $i \in \{1, \dots, n\}$, denotes the number of individuals in the age class $i$ at time $k$; at every time step $k$ the $x_i(k)$ individuals produce a number $\alpha_i x_i(k)$ of offspring (i.e., individuals belonging to the first age class), where $\alpha_i \ge 0$ is a fecundity rate, and progress to the next age class with a survival rate $\sigma_i \in [0,1]$.
If $x(k)$ denotes the vector of individuals at time $k$, the Leslie population model reads
$$x(k+1) = A x(k) = \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_{n-1} & \alpha_n \\ \sigma_1 & 0 & \cdots & 0 & 0 \\ 0 & \sigma_2 & \ddots & \vdots & \vdots \\ \vdots & & \ddots & 0 & 0 \\ 0 & 0 & \cdots & \sigma_{n-1} & 0 \end{bmatrix} x(k), \tag{E4.1}$$
where $A$ is referred to as the Leslie matrix. Consider the following two independent sets of questions. First, assume $\alpha_i > 0$ for all $i \in \{1, \dots, n\}$ and $0 < \sigma_i \le 1$ for all $i \in \{1, \dots, n-1\}$.
(i) Prove that the matrix $A$ is primitive.
(ii) Let $p_i(k) = x_i(k) / \sum_{i=1}^{n} x_i(k)$ denote the percentage of the total population in class $i$ at time $k$; accordingly, let $p(k)$ be the population distribution at time $k$. Compute the asymptotic population distribution when $k \to +\infty$, expressing it in terms of the spectral radius $\rho(A)$ and the parameters $(\alpha_i, \sigma_i)$, $i \in \{1, \dots, n\}$.
Hint: The quantity $\lim_{k\to\infty} p(k)$ is independent of $x(0)$ and of the left dominant eigenvector of $A$.
(iii) Assume $\alpha_i = \alpha > 0$ and $\sigma_i = \sigma$ for $i \in \{1, \dots, n-1\}$. What percentage of the total population belongs to the eldest class $n$ asymptotically?
(iv) Find a sufficient condition on the parameters $(\alpha_i, \sigma_i)$, $i \in \{1, \dots, n\}$, so that the population will eventually become extinct.
Second, assume $\alpha_i \ge 0$ for $i \in \{1, \dots, n\}$ and $0 \le \sigma_i \le 1$ for all $i \in \{1, \dots, n-1\}$.

(v) Find a necessary and sufficient condition on the parameters $(\alpha_i, \sigma_i)$, $i \in \{1, \dots, n\}$, so that the Leslie matrix $A$ is irreducible.
(vi) For an irreducible Leslie matrix (as in the previous point (v)), find a sufficient condition on the parameters $(\alpha_i, \sigma_i)$, $i \in \{1, \dots, n\}$, that ensures that the population will not go extinct.

E4.8 Swiss railroads: continued. From Exercise E3.6, consider the fictitious railroad map of Switzerland given in Figure E3.1. Write the unweighted adjacency matrix $A$ of this transportation network and, relying upon $A$ and its powers, answer the following questions:
(i) what is the number of links of the shortest path connecting St. Gallen to Zermatt?
(ii) is it possible to go from Bern to Chur using 4 links? And 5?
(iii) how many different routes, with strictly less than 9 links and possibly visiting the same station more than once, start from Zürich and end in Lausanne?


Chapter 5

Discrete-time Averaging Systems


After our discussions about matrix and graph theory, we are finally ready to go back to the examples discussed in Chapter 1. Recall from Chapter 1:
(i) Averaging Algorithm = given an undirected graph, associate a variable to each node and iteratively compute local averages;
(ii) Distributed Hypothesis Testing and Distributed Parameter Estimation = design an algorithm to compute the average of $n$ numbers;
(iii) Reaching a Consensus in a Social Influence Network = given an arbitrary row-stochastic matrix, what does it converge to?
(iv) Cyclic pursuit and balancing = given a specific matrix (cyclic = sparse, doubly-stochastic), does it converge?

Figure 5.1: Interactions in a social influence network

This chapter discusses two topics. First, we present some analysis results and, specifically, some convergence results for averaging algorithms defined by row-stochastic matrices; we discuss primitive matrices and reducible matrices with a single or multiple sinks. Our treatment is related to the discussion in (Jackson 2010, Chapter 8) and (DeMarzo et al. 2003, Appendix C and, specifically, Theorem 10). Second, we show some design results and, specifically, how to design optimal matrices; we discuss the equal-neighbor model and the Metropolis-Hastings model. The computation of optimal averaging algorithms (doubly-stochastic matrices) is discussed in Boyd et al. (2004).

5.1 Averaging with primitive row-stochastic matrices

From Chapter 2 on matrix theory, we can now re-state the main convergence result in Corollary 2.20 in a more explicit way using the main graph-theory result in Proposition 4.5.

Corollary 5.1 (Consensus for row-stochastic matrices with strongly connected and aperiodic graph). If a row-stochastic matrix $A$ has an associated digraph that is strongly connected and aperiodic (hence $A$ is primitive), then
(i) $\lim_{k\to\infty} A^k = \mathbb{1}_n w^\top$, where $w > 0$ is the left eigenvector of $A$ with eigenvalue 1 satisfying $w_1 + \cdots + w_n = 1$;
(ii) the solution to $x(k+1) = A x(k)$ satisfies
$$\lim_{k\to\infty} x(k) = \big( w^\top x(0) \big) \mathbb{1}_n;$$
(iii) if additionally $A$ is doubly-stochastic, then $w = \frac{1}{n}\mathbb{1}_n$ (because $A^\top \mathbb{1}_n = \mathbb{1}_n$ and $\frac{1}{n}\mathbb{1}_n^\top \mathbb{1}_n = 1$) so that
$$\lim_{k\to\infty} x(k) = \frac{\mathbb{1}_n^\top x(0)}{n} \mathbb{1}_n = \operatorname{average}\big(x(0)\big) \mathbb{1}_n.$$
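A minimal Matlab sketch illustrating the corollary; the primitive row-stochastic matrix and the initial condition below are hypothetical examples, not taken from the text.

% consensus for a primitive row-stochastic matrix (Corollary 5.1)
A = [0.5 0.5 0; 0.25 0.5 0.25; 0 0.5 0.5];   % strongly connected and aperiodic digraph
x0 = [1; 2; 3];
x = x0;
for k = 1:200, x = A*x; end                  % iterate x(k+1) = A x(k)
[V, D] = eig(A');                            % left eigenvectors of A
[~, i] = max(real(diag(D)));                 % locate the eigenvalue 1
w = V(:,i)/sum(V(:,i));                      % normalize so that sum(w) = 1
disp([x, (w'*x0)*ones(3,1)])                 % both columns equal the consensus value 2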

5.2 Averaging with reducible matrices

Next, consider a reducible row-stochastic matrix $A$, i.e., a row-stochastic matrix whose associated digraph $G$ is not strongly connected. We wish to give sufficient conditions for semi-convergence of $A$.
We first recall a useful property from Lemma 3.2: $G$ has a globally reachable node if and only if its condensation digraph has a globally reachable node (that is, a single sink). Along these same lines one can show that the set of globally reachable nodes induces a strongly connected component of $G$. A digraph with a globally reachable node and its condensation digraph is illustrated in Figure 5.2.

Figure 5.2: First panel: An example digraph with a set of globally reachable nodes. Second panel: its strongly
connected components (in red and blue). Third panel: its condensation digraph with a sink. For this digraph, the
subgraph induced by the globally reachable nodes is aperiodic.

We are now ready to establish the semi-convergence of adjacency matrices of digraphs with globally reachable nodes.

Theorem 5.2 (Consensus for row-stochastic matrices with a globally-reachable aperiodic strongly-connected component). Let $A$ be a row-stochastic matrix and let $G$ be its associated digraph. Assume that $G$ has a globally reachable node and the subgraph induced by the set of globally reachable nodes is aperiodic. Then
(i) the simple eigenvalue $\rho(A) = 1$ is strictly larger than the magnitude of all other eigenvalues, hence $A$ is semi-convergent;
(ii) $\lim_{k\to\infty} A^k = \mathbb{1}_n w^\top$, where $w \ge 0$ is the left eigenvector of $A$ with eigenvalue 1 satisfying $w_1 + \cdots + w_n = 1$;
(iii) the eigenvector $w \ge 0$ has positive entries corresponding to each globally reachable node and has zero entries for all other nodes;
(iv) the solution to $x(k+1) = A x(k)$ satisfies
$$\lim_{k\to\infty} x(k) = \big( w^\top x(0) \big) \mathbb{1}_n.$$

Note that, for all nodes $j$ which are not globally reachable, the initial values $x_j(0)$ have no effect on the final convergence value.
Note: as we discussed in Section 2.3, the limiting vector is a weighted average of the initial conditions. The relative weights of the initial conditions are the convex combination coefficients $w_1, \dots, w_n$. In a social influence network, the coefficient $w_i$ is regarded as the social influence of agent $i$. We illustrate this concept by computing the social influence coefficients for the famous Krackhardt's advice network (Krackhardt 1987); see Figure 5.3.
Note: adjacency matrices of digraphs with globally reachable nodes are sometimes called indecomposable; see (Wolfowitz 1963).

Figure 5.3: Krackhardt's advice network with 21 nodes. The social influence of each node is illustrated by its gray level.

Proof of Theorem 5.2. By assumption the condensation digraph of $A$ contains a sink that is globally reachable, hence the sink is unique. Therefore, after a permutation of rows and columns (see Exercise E3.1),
$$A = \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix} \quad \text{(block lower-triangular matrix)}. \tag{5.1}$$
The state vector $x$ is correspondingly partitioned into $x_1 \in \mathbb{R}^{n_1}$ and $x_2 \in \mathbb{R}^{n_2}$ so that
$$x_1(k+1) = A_{11} x_1(k), \tag{5.2}$$
$$x_2(k+1) = A_{21} x_1(k) + A_{22} x_2(k). \tag{5.3}$$
Here $x_1$ and $A_{11}$ are the variables and the matrix corresponding to the sink. Because the sink, as a subgraph of $G$, is strongly connected and aperiodic, $A_{11}$ is primitive and row-stochastic and, by Corollary 5.1,
$$\lim_{k\to\infty} A_{11}^k = \mathbb{1}_{n_1} w_1^\top,$$

where $w_1 > 0$ is the left eigenvector with eigenvalue 1 for $A_{11}$ normalized so that $\mathbb{1}_{n_1}^\top w_1 = 1$.
The matrix $A_{22}$ is analyzed as follows. Recall from Exercise E2.5 the notion of substochastic matrix and note, from Exercise E4.4, that an irreducible substochastic matrix has spectral radius less than 1. Now, because $A_{21}$ cannot be zero (otherwise the sink would not be globally reachable), the matrix $A_{22}$ is substochastic. Moreover (after appropriately permuting rows and columns of $A_{22}$), it can be observed that $A_{22}$ is a block lower-triangular matrix such that each diagonal block is row-substochastic and irreducible (corresponding to each node in the condensation digraph). Therefore, we know $\rho(A_{22}) < 1$ and, in turn, $I_{n_2} - A_{22}$ is invertible. Because $A_{11}$ is primitive and $\rho(A_{22}) < 1$, $A$ is semi-convergent and $\lim_{k\to\infty} x_2(k)$ exists. Taking the limit as $k \to \infty$ in equation (5.3), some straightforward algebra shows that
$$\lim_{k\to\infty} x_2(k) = (I_{n_2} - A_{22})^{-1} A_{21} \lim_{k\to\infty} x_1(k) = (I_{n_2} - A_{22})^{-1} A_{21} \big( \mathbb{1}_{n_1} w_1^\top \big) x_1(0).$$

From the row-stochasticity of $A$, we know $A_{21} \mathbb{1}_{n_1} + A_{22} \mathbb{1}_{n_2} = \mathbb{1}_{n_2}$ and hence $(I_{n_2} - A_{22})^{-1} A_{21} \mathbb{1}_{n_1} = \mathbb{1}_{n_2}$. Collecting these results, we write
$$\lim_{k\to\infty} \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix}^k = \begin{bmatrix} \mathbb{1}_{n_1} w_1^\top & 0 \\ \mathbb{1}_{n_2} w_1^\top & 0 \end{bmatrix} = \mathbb{1}_n \begin{bmatrix} w_1^\top & 0 \end{bmatrix}. \qquad ∎$$
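A minimal Matlab sketch illustrating the theorem on a hypothetical 4-node example whose condensation digraph has a single sink formed by nodes $\{1, 2\}$; the matrix is illustrative, not from the text.

% averaging with a reducible matrix whose condensation digraph has one sink (Theorem 5.2)
A = [0.5  0.5  0    0;
     0.5  0.5  0    0;
     0.25 0.25 0.5  0;
     0    0    0.5  0.5];   % nodes 1 and 2 form the globally reachable sink
x0 = [1; 2; 3; 4];
Ak = A^100;                 % approximates lim A^k = 1n * w'
w = Ak(1,:)'                % every row of the limit is w'; here w = [0.5 0.5 0 0]'
disp(Ak*x0)                 % all entries approach w'*x0 = 1.5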


5.3 Averaging with reducible matrices and multiple sinks

In this section we consider the general case of digraphs that do not contain globally reachable nodes, that is, digraphs whose condensation digraph has multiple sinks. In the following statement, we say that a node is connected with a sink of a digraph if there exists a directed path from the node to a node in the sink.
Theorem 5.3 (Convergence for row-stochastic matrices with multiple aperiodic sinks). Let $A$ be a row-stochastic matrix and let $G$ be its associated digraph. Assume the condensation digraph $C(G)$ contains $M \ge 2$ sinks and assume all of them are aperiodic. Then
(i) the semi-simple eigenvalue $\rho(A) = 1$ has multiplicity equal to $M$ and is strictly larger than the magnitude of all other eigenvalues, hence $A$ is semi-convergent;
(ii) there exist $M$ left eigenvectors of $A$, denoted by $w^m \in \mathbb{R}^n$, for $m \in \{1, \dots, M\}$, with the properties that: $w^m \ge 0$, $w^m_1 + \cdots + w^m_n = 1$, and $w^m_i$ is positive if and only if node $i$ belongs to the $m$-th sink;
(iii) the solution to $x(k+1) = A x(k)$ with initial condition $x(0)$ satisfies
$$\lim_{k\to\infty} x_i(k) = \begin{cases} (w^m)^\top x(0), & \text{if node } i \text{ belongs to the } m\text{-th sink}, \\ (w^m)^\top x(0), & \text{if node } i \text{ is connected with the } m\text{-th sink and no other sink}, \\ \displaystyle\sum_{m=1}^{M} z_{i,m} \big( (w^m)^\top x(0) \big), & \text{if node } i \text{ is connected to more than one sink}, \end{cases}$$
where, for each node $i$ connected to more than one sink, the coefficients $z_{i,m}$, $m \in \{1, \dots, M\}$, are convex combination coefficients and are strictly positive if and only if there exists a directed path from node $i$ to the sink $m$.
Proof. Rather than treating the general case with heavy notation, we work out an example and refer the reader to (DeMarzo et al. 2003, Theorem 10) for the general proof. Assume the condensation digraph of $A$ is composed of three nodes, two of which are sinks, as in the side figure (nodes $x_1$ and $x_2$ are the sinks and node $x_3$ is connected to both).

Therefore, after a permutation of rows and columns (see Exercise E3.1), $A$ can be written as
$$A = \begin{bmatrix} A_{11} & 0 & 0 \\ 0 & A_{22} & 0 \\ A_{31} & A_{32} & A_{33} \end{bmatrix}$$
and the state vector $x$ is correspondingly partitioned into the vectors $x_1$, $x_2$ and $x_3$. The state equations are:
$$x_1(k+1) = A_{11} x_1(k), \tag{5.4}$$
$$x_2(k+1) = A_{22} x_2(k), \tag{5.5}$$
$$x_3(k+1) = A_{31} x_1(k) + A_{32} x_2(k) + A_{33} x_3(k). \tag{5.6}$$

By the properties of the condensation digraph and the assumption of aperiodicity of the sinks, the digraphs associated to the row-stochastic matrices $A_{11}$ and $A_{22}$ are strongly connected and aperiodic. Therefore we immediately conclude that
$$\lim_{k\to\infty} x_1(k) = \big( w_1^\top x_1(0) \big) \mathbb{1}_{n_1} \quad \text{and} \quad \lim_{k\to\infty} x_2(k) = \big( w_2^\top x_2(0) \big) \mathbb{1}_{n_2},$$
where $w_1$ (resp. $w_2$) is the left eigenvector of the eigenvalue 1 for the matrix $A_{11}$ (resp. $A_{22}$) with the usual normalization $\mathbb{1}_{n_1}^\top w_1 = \mathbb{1}_{n_2}^\top w_2 = 1$.
Regarding the matrix $A_{33}$, the same discussion as in the previous proof leads to $\rho(A_{33}) < 1$ and, in turn, to the statement that $I_{n_3} - A_{33}$ is nonsingular. By taking the limit as $k \to \infty$ in equation (5.6), some straightforward algebra shows that
$$\lim_{k\to\infty} x_3(k) = (I_{n_3} - A_{33})^{-1} \Big( A_{31} \lim_{k\to\infty} x_1(k) + A_{32} \lim_{k\to\infty} x_2(k) \Big) = \big( w_1^\top x_1(0) \big) (I_{n_3} - A_{33})^{-1} A_{31} \mathbb{1}_{n_1} + \big( w_2^\top x_2(0) \big) (I_{n_3} - A_{33})^{-1} A_{32} \mathbb{1}_{n_2}.$$
Moreover, because $A$ is row-stochastic, we know
$$A_{31} \mathbb{1}_{n_1} + A_{32} \mathbb{1}_{n_2} + A_{33} \mathbb{1}_{n_3} = \mathbb{1}_{n_3},$$
and, using again the fact that $I_{n_3} - A_{33}$ is nonsingular,
$$\mathbb{1}_{n_3} = (I_{n_3} - A_{33})^{-1} A_{31} \mathbb{1}_{n_1} + (I_{n_3} - A_{33})^{-1} A_{32} \mathbb{1}_{n_2}.$$
This concludes our proof of Theorem 5.3 for the simplified case of $C(G)$ having three nodes and two sinks. ∎

Note that convergence does not occur to consensus (not all components of the state are equal) and the final value of all nodes is independent of the initial values at nodes which are not in the sinks of the condensation digraph.
Figure 5.4: The equal-neighbor model

5.4 Design of weights for undirected graphs: the equal-neighbor model

From Section 1.2, let us consider an undirected graph as in Figure 5.4 and the following simplest distributed algorithm, based on the concepts of linear averaging. Each node contains a value $x_i$ and repeatedly executes:
$$x_i^+ := \operatorname{average}\big( x_i, \{ x_j, \text{ for all neighbor nodes } j \} \big). \tag{5.7}$$
Let us make a few simple observations. The algorithm (5.7) can be written in matrix format as:
$$x(k+1) = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 1/3 & 1/3 \end{bmatrix} x(k) =: A_{\text{wsn}} x(k).$$

The binary symmetric adjacency matrix and the degree matrix of the undirected graph are
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix},$$
and so one can verify that
$$A_{\text{wsn}} = (D + I_4)^{-1} (A + I_4), \qquad x_i(k+1) = \frac{1}{1 + d(i)} \Big( x_i(k) + \sum_{j \in N(i)} x_j(k) \Big).$$
Recall that $A + I_4$ is the adjacency matrix of a graph that is equal to the graph in figure with the addition of a self-loop at each node; this new graph has degree matrix $D + I_4$.
Now, it is also quite easy to verify (see also Exercise E5.1) that
$$A_{\text{wsn}} \mathbb{1}_4 = \mathbb{1}_4, \qquad \text{but unfortunately} \qquad \mathbb{1}_4^\top A_{\text{wsn}} \ne \mathbb{1}_4^\top.$$
We summarize this discussion and state a more general result, in arbitrary dimensions and for arbitrary graphs.
Lemma 5.4 (The equal-neighbor row-stochastic matrix). Let $G$ be a weighted digraph with $n$ nodes, weighted adjacency matrix $A$ and weighted out-degree matrix $D_{\text{out}}$. Define
$$A_{\text{equal-neighbor}} = (I_n + D_{\text{out}})^{-1} (I_n + A).$$
Note that the weighted digraph associated to $A + I_n$ is $G$ with the addition of a self-loop at each node with unit weight. Then
(i) $A_{\text{equal-neighbor}}$ is row-stochastic;
(ii) $A_{\text{equal-neighbor}}$ is primitive if and only if $G$ is strongly connected; and
(iii) $A_{\text{equal-neighbor}}$ is doubly-stochastic if $G$ is weight-balanced and the weighted degree is constant for all nodes (i.e., $D_{\text{out}} = D_{\text{in}} = d I_n$ for some $d \in \mathbb{R}_{>0}$).
Proof. First, for any $v \in \mathbb{R}^n$ with non-zero entries, it is easy to see $\operatorname{diag}(v)^{-1} v = \mathbb{1}_n$. Recalling the definition $D_{\text{out}} + I_n = \operatorname{diag}\big( (A + I_n) \mathbb{1}_n \big)$,
$$\big( (D_{\text{out}} + I_n)^{-1} (A + I_n) \big) \mathbb{1}_n = \operatorname{diag}\big( (A + I_n) \mathbb{1}_n \big)^{-1} (A + I_n) \mathbb{1}_n = \mathbb{1}_n,$$
which proves statement (i). To prove statement (ii), note that, aside from self-loops, $G$ and the weighted digraph associated with $A_{\text{equal-neighbor}}$ have the same edges. Also note that the weighted digraph associated with $A_{\text{equal-neighbor}}$ is aperiodic by design. Finally, if $D_{\text{out}} = D_{\text{in}} = d I_n$ for some $d \in \mathbb{R}_{>0}$, then statement (iii) follows from
$$\big( (D_{\text{out}} + I_n)^{-1} (A + I_n) \big)^\top \mathbb{1}_n = \frac{1}{d+1} (A + I_n)^\top \mathbb{1}_n = (D_{\text{in}} + I_n)^{-1} (A + I_n)^\top \mathbb{1}_n = \operatorname{diag}\big( (A + I_n)^\top \mathbb{1}_n \big)^{-1} (A + I_n)^\top \mathbb{1}_n = \mathbb{1}_n. \qquad ∎$$
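As a minimal Matlab sketch, the equal-neighbor matrix of the 4-node graph of Figure 5.4 can be constructed and checked as follows.

% equal-neighbor weights for the 4-node graph of Figure 5.4 (Lemma 5.4)
A = [0 1 0 0; 1 0 1 1; 0 1 0 1; 0 1 1 0];   % binary adjacency matrix
D = diag(sum(A,2));                          % degree matrix
Aen = (eye(4) + D) \ (eye(4) + A);           % equal-neighbor matrix (I+D)^{-1}(I+A)
disp(Aen*ones(4,1))      % all ones: Aen is row-stochastic
disp(ones(1,4)*Aen)      % not all ones: Aen is not doubly-stochastic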

5.5 Design of weights for undirected graphs: the Metropolis-Hastings model

Next, we suggest a second way of assigning weights to a graph for the purpose of designing an averaging algorithm. Given an undirected unweighted graph $G$ with $n$ nodes, edge set $E$ and degrees $d(1), \dots, d(n)$, define the weighted adjacency matrix $A_{\text{Metropolis-Hastings}}$ by
$$(A_{\text{Metropolis-Hastings}})_{ij} = \begin{cases} \dfrac{1}{1 + \max\{d(i), d(j)\}}, & \text{if } \{i,j\} \in E \text{ and } i \ne j, \\[1ex] 1 - \sum_{\{i,h\} \in E} (A_{\text{Metropolis-Hastings}})_{ih}, & \text{if } i = j, \\[1ex] 0, & \text{otherwise}. \end{cases}$$

Figure 5.5: The Metropolis-Hastings model

In our example,
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}, \quad A_{\text{Metropolis-Hastings}} = \begin{bmatrix} 3/4 & 1/4 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/4 & 5/12 & 1/3 \\ 0 & 1/4 & 1/3 & 5/12 \end{bmatrix}.$$

One can verify that the Metropolis-Hastings weights have the following properties:
(i) $(A_{\text{Metropolis-Hastings}})_{ij} > 0$ if $\{i,j\} \in E$, $(A_{\text{Metropolis-Hastings}})_{ii} > 0$ for all $i \in \{1, \dots, n\}$, and $(A_{\text{Metropolis-Hastings}})_{ij} = 0$ otherwise;
(ii) $A_{\text{Metropolis-Hastings}}$ is symmetric and doubly-stochastic; and
(iii) $A_{\text{Metropolis-Hastings}}$ is primitive if and only if $G$ is connected.
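A minimal Matlab sketch constructing the Metropolis-Hastings weights for the 4-node graph of Figure 5.5; the entrywise loops directly implement the definition (vectorized constructions are of course possible).

% Metropolis-Hastings weights for an unweighted undirected graph
A01 = [0 1 0 0; 1 0 1 1; 0 1 0 1; 0 1 1 0];   % graph of Figure 5.5
n = size(A01,1); d = sum(A01,2);
AMH = zeros(n);
for i = 1:n
    for j = 1:n
        if A01(i,j) == 1 && i ~= j
            AMH(i,j) = 1/(1 + max(d(i), d(j)));
        end
    end
end
AMH = AMH + diag(1 - sum(AMH,2));   % diagonal entries make each row sum to 1
disp(AMH)                           % symmetric and doubly-stochastic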

5.6 Centrality measures

In network science it is of interest to determine the relative importance of a node in a network. There are many ways to do so and they are referred to as centrality measures or centrality scores. Part of the treatment in this section is inspired by (Newman 2010). We refer to (Brandes and Erlebach 2005) for a comprehensive review of network analysis metrics and related computational algorithms and to (Gleich 2015) for a comprehensive review of pagerank and its multiple extensions and applications.
We start by presenting four centrality notions based on the adjacency matrix. We treat the general case of a weighted digraph $G$ with weighted adjacency matrix $A$ (warning: many articles in the literature deal with undirected graphs only). The matrix $A$ is nonnegative, but not necessarily row-stochastic. From the Perron-Frobenius theory, recall the following facts:
(i) if $G$ is strongly connected, then the spectral radius $\rho(A)$ is an eigenvalue of maximum magnitude and its corresponding left eigenvector can be selected to be strictly positive and with unit sum (see Theorem 2.15); and
(ii) if $G$ contains a globally reachable node, then the spectral radius $\rho(A)$ is an eigenvalue of maximum magnitude and its corresponding left eigenvector is nonnegative and has positive entries corresponding to each globally reachable node (see Theorem 5.2).
Degree centrality
For an arbitrary weighted digraph $G$, the degree centrality $c_{\text{degree}}(i)$ of node $i$ is its in-degree:
$$c_{\text{degree}}(i) = d_{\text{in}}(i) = \sum_{j=1}^{n} a_{ji}, \tag{5.8}$$
that is, the number of in-neighbors (if $G$ is unweighted) or the sum of the weights of the incoming edges. Degree centrality is relevant, for example, in (typically unweighted) citation networks whereby articles are ranked on the basis of their citation records. (Warning: the notion that a high citation count is an indicator of quality is clearly a fallacy.)
Eigenvector centrality
One problem with degree centrality is that each in-edge has unit count, even if the in-neighbor has negligible importance. To remedy this potential drawback, one could define the importance of a node to be proportional to the weighted sum of the importance of its in-neighbors (see (Bonacich 1972) for an early reference). This line of reasoning leads to the following definition. For a weighted digraph $G$ with globally reachable nodes (or for an undirected graph that is connected), define the eigenvector centrality vector, denoted by $c_{\text{ev}}$, to be the left dominant eigenvector of the adjacency matrix $A$ associated with the dominant eigenvalue and normalized to satisfy $\mathbb{1}_n^\top c_{\text{ev}} = 1$.
Note that the eigenvector centrality satisfies
$$A^\top c_{\text{ev}} = \frac{1}{\alpha} c_{\text{ev}} \quad \Longleftrightarrow \quad c_{\text{ev}}(i) = \alpha \sum_{j=1}^{n} a_{ji} c_{\text{ev}}(j), \tag{5.9}$$
where $\alpha = 1/\rho(A)$ is the only possible choice of scalar coefficient in equation (5.9) ensuring that there exists a unique solution and that the solution, denoted $c_{\text{ev}}$, is strictly positive in a strongly connected digraph and nonnegative in a digraph with globally reachable nodes. Note that this connectivity property may be restrictive in some cases.

Figure 5.6: Comparing degree centrality versus eigenvector centrality: the node with maximum in-degree has
zero eigenvector centrality in this graph

Katz centrality
For a weighted digraph $G$, pick an attenuation factor $\alpha < 1/\rho(A)$ and define the Katz centrality vector (see (Katz 1953)), denoted by $c_{\text{K}}$, by the following equivalent formulations:
$$c_{\text{K}}(i) = \alpha \sum_{j=1}^{n} a_{ji} \big( c_{\text{K}}(j) + 1 \big), \tag{5.10}$$
or
$$c_{\text{K}}(i) = \sum_{k=1}^{\infty} \sum_{j=1}^{n} \alpha^k (A^k)_{ji}. \tag{5.11}$$

Katz centrality has therefore two interpretations:
(i) the importance of a node is an attenuated sum of the importance and of the number of the in-neighbors: note indeed how equation (5.10) is a combination of equations (5.8) and (5.9); and
(ii) the importance of a node is $\alpha$ times the number of length-1 paths into $i$ (i.e., the in-degree) plus $\alpha^2$ times the number of length-2 paths into $i$, etc. (From Lemma 4.2, recall that, for an unweighted digraph, $(A^k)_{ji}$ is equal to the number of directed paths of length $k$ from $j$ to $i$.)
Note how, for $\alpha < 1/\rho(A)$, equation (5.10) is well-posed and equivalent to
$$c_{\text{K}} = \alpha A^\top (c_{\text{K}} + \mathbb{1}_n) \iff c_{\text{K}} + \mathbb{1}_n = \alpha A^\top (c_{\text{K}} + \mathbb{1}_n) + \mathbb{1}_n \iff (I_n - \alpha A^\top)(c_{\text{K}} + \mathbb{1}_n) = \mathbb{1}_n \iff c_{\text{K}} = (I_n - \alpha A^\top)^{-1} \mathbb{1}_n - \mathbb{1}_n \iff c_{\text{K}} = \sum_{k=1}^{\infty} \alpha^k (A^\top)^k \mathbb{1}_n, \tag{5.12}$$
where we used the identity $(I_n - A)^{-1} = \sum_{k=0}^{\infty} A^k$, valid for any matrix $A$ with $\rho(A) < 1$; see Exercise E2.12.
There are two simple ways to compute the Katz centrality. According to equation (5.12), for limited-size problems, one can invert the matrix $I_n - \alpha A^\top$. Alternatively, one can show that the following iteration converges to the correct value: $c_{\text{K}}^+ := \alpha A^\top (c_{\text{K}} + \mathbb{1}_n)$.
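A minimal Matlab sketch comparing the two computations on a hypothetical undirected example; the choice $\alpha = 1/(2\rho(A))$ and the iteration count are assumptions made for illustration.

% Katz centrality: direct inversion (5.12) versus the fixed-point iteration
A = [0 1 0 0; 1 0 1 1; 0 1 0 1; 0 1 1 0];           % hypothetical undirected graph
n = size(A,1);
alpha = 1/(2*max(abs(eig(A))));                     % attenuation factor, alpha < 1/rho(A)
cK1 = (eye(n) - alpha*A') \ ones(n,1) - ones(n,1);  % direct formula from (5.12)
cK2 = zeros(n,1);
for k = 1:100
    cK2 = alpha*A'*(cK2 + ones(n,1));               % iteration cK+ = alpha A'(cK + 1n)
end
disp([cK1, cK2])                                    % the two columns agree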
Figure 5.7: Image taken without permission from (Ishii and Tempo 2014). The pattern in figure displays the so-called hyperlink matrix, i.e., the transpose of the adjacency matrix, for a collection of websites at the Lincoln University in New Zealand from the year 2006. Each empty column corresponds to a webpage without any outgoing link, that is, to a so-called dangling node. This Web has 3756 nodes with 31,718 links. A fairly large portion of the nodes are dangling nodes: in this example, there are 3255 dangling nodes, which is over 85% of the total.

Pagerank centrality
For a weighted digraph $G$ with row-stochastic adjacency matrix (i.e., unit out-degree for each node), pick a convex combination coefficient $\alpha \in \,]0,1[$ and define the pagerank centrality vector, denoted by $c_{\text{pr}}$, as the unique positive solution to
$$c_{\text{pr}}(i) = \alpha \sum_{j=1}^{n} a_{ji} c_{\text{pr}}(j) + \frac{1-\alpha}{n}, \tag{5.13}$$
or, equivalently, to
$$c_{\text{pr}} = M c_{\text{pr}}, \quad \mathbb{1}_n^\top c_{\text{pr}} = 1, \qquad \text{where } M = \alpha A^\top + \frac{1-\alpha}{n} \mathbb{1}_n \mathbb{1}_n^\top. \tag{5.14}$$
(To establish the equivalence between these two definitions, the only non-trivial step is to notice that if $c_{\text{pr}}$ solves equation (5.13), then it must satisfy $\mathbb{1}_n^\top c_{\text{pr}} = 1$.)
Note that, for arbitrary unweighted digraphs with binary adjacency matrices $A_{0,1}$, it is natural to compute the pagerank vector with $A = D_{\text{out}}^{-1} A_{0,1}$. We refer to (Brin and Page 1998; Ishii and Tempo 2014; Page 2001) for the important interpretation of the pagerank score as the stationary distribution of the so-called random surfer of a hyperlinked document network; it is under this disguise that the pagerank score was conceived by the Google co-founders and a corresponding algorithm led to the establishment of the Google search engine. In the Google problem it is customary to set $\alpha \approx 0.85$.
Closeness and betweenness centrality (based on shortest paths)
Degree, eigenvector, Katz and pagerank centrality are presented using the adjacency matrix. Next we present two centrality measures based on the notions of shortest path and geodesic distance; these two notions belong to the class of radial and medial centrality measures (Borgatti and Everett 2006).
We start by introducing some additional graph theory. For a weighted digraph with $n$ nodes, the length of a directed path is the sum of the weights of the edges in the directed path. For $i, j \in \{1, \dots, n\}$, a shortest path from a node $i$ to a node $j$ is a directed path of smallest length. Note: it is easy to construct examples with multiple shortest paths, so that the shortest path is not unique. The geodesic distance $d_{ij}$ from node $i$ to node $j$ is the length of a shortest path from node $i$ to node $j$; we also stipulate that the geodesic distance $d_{ij}$ takes the value zero if $i = j$ and is infinite if there is no path from $i$ to $j$. Note: in general $d_{ij} \ne d_{ji}$. Finally, for $i, j, k \in \{1, \dots, n\}$, we let $g_{ikj}$ denote the number of shortest paths from a node $i$ to a node $j$ that pass through node $k$.
For a strongly-connected weighted digraph, the closeness of node $i \in \{1, \dots, n\}$ is the inverse of the sum of the geodesic distances $d_{ij}$ from node $i$ to all other nodes $j \in \{1, \dots, n\}$, that is:
$$c_{\text{closeness}}(i) = \frac{1}{\sum_{j=1}^{n} d_{ij}}. \tag{5.15}$$
For a strongly-connected weighted digraph, the betweenness of node $i \in \{1, \dots, n\}$ is the fraction of all shortest paths $g_{kij}$ from any node $k$ to any other node $j$ passing through node $i$, that is:
$$c_{\text{betweenness}}(i) = \frac{\sum_{j,k=1}^{n} g_{kij}}{\sum_{h=1}^{n} \sum_{j,k=1}^{n} g_{khj}}. \tag{5.16}$$
Summary
To conclude this section, in Table 5.1 we summarize the various centrality definitions for a weighted directed graph.

Measure | Definition | Assumptions
degree centrality | $c_{\text{degree}} = A^\top \mathbb{1}_n$ | (none)
eigenvector centrality | $c_{\text{ev}} = \alpha A^\top c_{\text{ev}}$ | $\alpha = 1/\rho(A)$, $G$ has a globally reachable node
pagerank centrality | $c_{\text{pr}} = \alpha A^\top c_{\text{pr}} + \frac{1-\alpha}{n} \mathbb{1}_n$ | $\alpha < 1$, $A \mathbb{1}_n = \mathbb{1}_n$
Katz centrality | $c_{\text{K}} = \alpha A^\top (c_{\text{K}} + \mathbb{1}_n)$ | $\alpha < 1/\rho(A)$
closeness centrality | $c_{\text{closeness}}(i) = 1 / \sum_{j=1}^n d_{ij}$ | $G$ strongly connected
betweenness centrality | $c_{\text{betweenness}}(i) = \frac{\sum_{j,k=1}^n g_{kij}}{\sum_{h=1}^n \sum_{j,k=1}^n g_{khj}}$ | $G$ strongly connected

Table 5.1: Definitions of centrality measures for a weighted digraph $G$ with adjacency matrix $A$

Figure 5.8 illustrates some centrality notions on a small instructive example due to Brandes (2006).
Note that a different node is the most central one in each metric; this variability is naturally expected
and highlights the need to select a centrality notion relevant to the specific application of interest.

(a) degree centrality

(b) eigenvector centrality

(c) closeness centrality

(d) betweenness centrality

Figure 5.8: Degree, eigenvector, closeness, and betweenness centrality for an undirected unweighted graph. The
dark node is the most central node in the respective metric; a different node is the most central one in each metric.

5.7 Exercises

E5.1 Left eigenvector for equal-neighbor row-stochastic matrices. Let $A_{01}$ be the binary (i.e., each entry is either 0 or 1) adjacency matrix for an unweighted undirected graph. Assume the associated graph is connected. Let $D = \operatorname{diag}(d_1, \dots, d_n)$ be the degree matrix, let $|E|$ be the number of edges of the graph, and define $A = D^{-1} A_{01}$. Show that
(i) the definition of $A$ is well-posed and $A$ is row-stochastic, and
(ii) the left eigenvector of $A$ associated to the eigenvalue 1 and normalized so that $\mathbb{1}_n^\top w = 1$ is
$$w = \frac{1}{2|E|} \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}.$$
Next, consider the equal-neighbor averaging algorithm in equation (5.7) with associated row-stochastic matrix $A_{\text{equal-neighbor}} = (D + I_n)^{-1} (A_{01} + I_n)$.
(iii) Show that
$$\lim_{k\to\infty} x(k) = \frac{1}{2|E| + n} \Big( \sum_{i=1}^{n} (1 + d_i) x_i(0) \Big) \mathbb{1}_n.$$
(iv) Verify that the left dominant eigenvector of the matrix $A_{\text{wsn}} = A_{\text{equal-neighbor}}$ defined in Section 1.2 is $[1/6, 1/3, 1/4, 1/4]^\top$, as seen in Example 2.5.
E5.2 A stubborn agent. Pick $\alpha \in \,]0,1[$ and consider the discrete-time consensus algorithm
$$x_1(k+1) = x_1(k), \qquad x_2(k+1) = \alpha x_1(k) + (1-\alpha) x_2(k).$$
Perform the following tasks:
(i) compute the matrix $A$ representing this algorithm and verify it is row-stochastic,
(ii) compute the eigenvalues and eigenvectors of $A$,
(iii) draw the directed graph $G$ representing this algorithm and discuss its connectivity properties,
(iv) compute the condensation digraph of $G$,
(v) compute the final value of this algorithm as a function of the initial values in two alternate ways: invoking and without invoking Theorem 5.2.

E5.3 Agents with self-confidence levels. Consider 2 agents, labeled $+1$ and $-1$, described by the self-confidence levels $s_{+1}$ and $s_{-1}$. Assume $s_{+1} \ge 0$, $s_{-1} \ge 0$, and $s_{+1} + s_{-1} = 1$. For $i \in \{+1, -1\}$, define
$$x_i^+ := s_i x_i + (1 - s_i) x_{-i}.$$
Perform the following tasks:
(i) compute the matrix $A$ representing this algorithm and verify it is row-stochastic,
(ii) compute $A^2$,
(iii) compute the eigenvalues, the right eigenvectors, and the left eigenvectors of $A$,
(iv) compute the final value of this algorithm as a function of the initial values and of the self-confidence levels. Is it true that an agent with higher self-confidence makes a larger contribution to the final value?
E5.4 Persistent disagreement and the Friedkin-Johnsen model of opinion dynamics (Friedkin and Johnsen 1999). Let $W$ be a row-stochastic matrix describing a network of interpersonal influences; assume $W$ is irreducible. Let $\lambda_i \in [0,1]$, $i \in \{1, \dots, n\}$, be a parameter describing how open an individual is to changing her initial opinion about a subject; set $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Consider the Friedkin-Johnsen model of opinion dynamics
$$x(k+1) = \Lambda W x(k) + (I_n - \Lambda) x(0).$$
Assume at least one individual is not completely open to changing her opinion, that is, assume $\lambda_i < 1$ for some $i$. Perform the following tasks:
(i) show that the matrix $\Lambda W$ is convergent,
(ii) show that the matrix $V = (I_n - \Lambda W)^{-1} (I_n - \Lambda)$ is well-defined and row-stochastic,
Hint: Review Exercises E2.10 and E2.12.
(iii) show that the limiting opinions are $\lim_{k\to+\infty} x(k) = V x(0)$,
(iv) compute the matrix $V$ and state whether the two agents will achieve consensus or maintain persistent disagreement for the following pairs of matrices:
$$W_1 = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}, \ \Lambda_1 = \operatorname{diag}(1/2, 1), \qquad \text{and} \qquad W_2 = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}, \ \Lambda_2 = \operatorname{diag}(1/4, 3/4).$$
(Note: Friedkin and Johnsen (1999) make the additional assumption that $\Lambda + \operatorname{diag}(W) = I_n$; this assumption is not needed here. This model is sometimes referred to as the opinion dynamics model with stubborn agents.)
E5.5 Necessary and sufficient conditions for consensus. Let $A$ be a row-stochastic matrix. Prove that the following statements are equivalent:
(i) the eigenvalue 1 is simple and all other eigenvalues have magnitude strictly smaller than 1,
(ii) $\lim_{k\to\infty} A^k = \mathbb{1}_n w^\top$, for some $w \in \mathbb{R}^n$, $w \ge 0$, and $\mathbb{1}_n^\top w = 1$,
(iii) the digraph associated to $A$ contains a globally reachable node and the subgraph of globally reachable nodes is aperiodic.
Hint: Use the Jordan normal form to show that (i) $\Rightarrow$ (ii).

E5.6 Computing centrality. Write in your favorite programming language algorithms to compute degree, eigenvector, Katz and pagerank centralities. Compute these four centralities for the following undirected unweighted graphs (without self-loops):
(i) the ring graph with 5 nodes;
(ii) the star graph with 5 nodes;
(iii) the line graph with 5 nodes; and
(iv) the Zachary karate club network dataset. This dataset can be downloaded for example from: http://konect.uni-koblenz.de/networks/ucidata-zachary
To compute the Katz centrality of a matrix $A$, select $\alpha = 1/(2\rho(A))$. For pagerank, use $\alpha = 1/2$.
Hint: Recall that pagerank centrality is well-defined for a row-stochastic matrix.
E5.7 Iterative computation of Katz centrality. Given a graph with adjacency matrix $A$, show that the solution to the iteration $x(k+1) := \alpha A^\top (x(k) + \mathbb{1}_n)$ with $\alpha < 1/\rho(A)$ converges to the Katz centrality vector $c_{\text{K}}$, for all initial conditions $x(0)$.
E5.8 A sample DeGroot panel. A conversation between 5 panelists is modeled according to the DeGroot model by an averaging algorithm $x^+ = A_{\text{panel}} x$, where
$$A_{\text{panel}} = \begin{bmatrix} 0.15 & 0.15 & 0.1 & 0.2 & 0.4 \\ 0 & 0.55 & 0 & 0 & 0.45 \\ 0.3 & 0.05 & 0.05 & 0 & 0.6 \\ 0 & 0.4 & 0.1 & 0.5 & 0 \\ 0 & 0.3 & 0 & 0 & 0.7 \end{bmatrix}.$$

Assuming that the panel has had sufficiently long deliberations, answer the following:
(i) Based on the associated digraph, do the panelists finally agree on a common decision?
(ii) In the event of agreement, does the initial opinion of any panelist get rejected? If so, which ones?
(iii) If the panelists' initial opinions are their self-appraisals (i.e., the self-weights $a_{ii}$, $i \in \{1, \dots, 5\}$), what is the final opinion?
E5.9 Three DeGroot panels. Recall the DeGroot model introduced in Chapter 1. Denote by $x_i(0)$ the initial opinion of each individual, and by $x_i(k)$ its updated opinion after $k$ communications with its neighbors. Then the vector of opinions evolves over time according to $x(k+1) = A x(k)$, where the coefficient $a_{ij} \in [0,1]$ is the influence of the opinion of individual $j$ on the update of the opinion of agent $i$, subject to the constraint $\sum_j a_{ij} = 1$. Consider the following three scenarios:
(i) Everybody gives the same weight to the opinion of everybody else.
(ii) There is a distinct agent (suppose the agent with index $i = 1$) that weights equally the opinion of all the others, and the remaining agents compute the mean between their opinion and the one of the first agent.
(iii) All the agents compute the mean between their opinion and the one of the first agent. Agent 1 does not change her opinion.
In each case, derive the averaging matrix $A$, show that the opinions converge asymptotically to a final opinion vector, and characterize this final opinion vector.
E5.10 Move away from your nearest neighbor and reducible averaging. Consider $n \ge 3$ robots with positions $p_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$, and dynamics $p_i(t+1) = u_i(t)$, where $u_i \in \mathbb{R}$ is a steering control input. For simplicity, assume that the robots are indexed according to their initial position: $p_1(0) \le p_2(0) \le p_3(0) \le \cdots \le p_n(0)$. Consider two walls at the positions $p_0 \le p_1(0)$ and $p_{n+1} \ge p_n(0)$ so that all robots are contained between the walls. The walls are stationary, that is, $p_0(t+1) = p_0(t) = p_0$ and $p_{n+1}(t+1) = p_{n+1}(t) = p_{n+1}$.
Consider the following coordination law: robots $i \in \{2, \dots, n-1\}$ (each having two neighbors) move to the centroid of the local subset $\{p_{i-1}, p_i, p_{i+1}\}$. The robots $\{1, n\}$ (each having one robotic neighbor and one neighboring wall) move to the centroid of the local subsets $\{p_0, p_1, p_2\}$ and $\{p_{n-1}, p_n, p_{n+1}\}$, respectively. Hence, the closed-loop robot dynamics are
$$p_i(t+1) = \frac{1}{3} \big( p_{i-1}(t) + p_i(t) + p_{i+1}(t) \big), \quad i \in \{1, \dots, n\}.$$
Show that the robots become uniformly spaced on the interval $[p_0, p_{n+1}]$ using Theorem 5.3.
(Note: This exercise is a discrete-time version of E2.18(ii) based on averaging with multiple sinks.)
E5.11 Central nodes in example graph. For the unweighted undirected graph in Figure 5.8, verify (possibly
with the aid of a computational package) that the dark nodes have indeed the largest degree, eigenvector,
closeness and betweenness centrality as stated in the figure caption.


Chapter 6

The Laplacian Matrix


So far, we have studied adjacency matrices. In this chapter, we study a second relevant matrix associated
to a digraph, called the Laplacian matrix. More information on adjacency and Laplacian matrices can be
found in standard books on algebraic graph theory such as (Biggs 1994) and (Godsil and Royle 2001).
Two surveys about Laplacian matrices are (Merris 1994; Mohar 1991).

6.1 The Laplacian matrix

The Laplacian matrix of the weighted digraph $G$ is
$$L = D_{\text{out}} - A.$$
In components, $L = (\ell_{ij})_{i,j \in \{1,\dots,n\}}$ with
$$\ell_{ij} = \begin{cases} -a_{ij}, & \text{if } i \ne j, \\ \sum_{h=1, h\ne i}^{n} a_{ih}, & \text{if } i = j, \end{cases}$$
or, for an unweighted undirected graph,
$$\ell_{ij} = \begin{cases} -1, & \text{if } \{i,j\} \text{ is an edge, not a self-loop}, \\ d(i), & \text{if } i = j, \\ 0, & \text{otherwise}. \end{cases}$$
Note:
(i) the sign pattern of $L$ is important: diagonal elements are nonnegative and off-diagonal elements are nonpositive (zero or negative);
(ii) the matrix $L$ does not depend upon the existence and values of self-loops (or lack thereof); and
(iii) the graph $G$ is undirected (i.e., has a symmetric adjacency matrix) if and only if $L$ is symmetric. In this case, $D_{\text{out}} = D_{\text{in}} = D$ and $A = A^\top$.
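A minimal Matlab sketch constructing the Laplacian of a hypothetical weighted digraph; by construction, each row of $L$ sums to zero.

% the Laplacian matrix of a hypothetical weighted digraph
A = [0 2 0; 1 0 1; 0 3 0];   % weighted adjacency matrix
Dout = diag(sum(A,2));       % weighted out-degree matrix
L = Dout - A;
disp(L*ones(3,1))            % zero vector: every row of L sums to zero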

We now present some useful equalities. By the way, obviously
(Ax)i =

n
X

(6.1)

aij xj

j=1

First, for x Rn ,
(Lx)i =
=

n
X

`ij xj = `ii xi +

j=1
n
X

n
X

`ij xj =

j=1,j6=i

j=1,j6=i

aij (xi xj ) =

n
 X

j=1,j6=i

jN out (i)

n

X
aij xi +
(aij )xj
j=1,j6=i

(6.2)

aij (xi xj )


dout (i) xi average({xj , for all out-neighbors j}) .

for unit weights

Second, assume $L = L^\top$ (i.e., $a_{ij} = a_{ji}$) and compute:
$$x^\top L x = \sum_{i=1}^{n} x_i (Lx)_i = \sum_{i=1}^{n} x_i \Big( \sum_{j=1, j\ne i}^{n} a_{ij} (x_i - x_j) \Big) = \sum_{i,j=1}^{n} a_{ij} x_i (x_i - x_j) = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} x_i^2 + \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} x_j^2 - \sum_{i,j=1}^{n} a_{ij} x_i x_j \quad \text{(by symmetry)},$$
so that
$$x^\top L x = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (x_i - x_j)^2 \tag{6.3}$$
$$= \sum_{\{i,j\} \in E} a_{ij} (x_i - x_j)^2. \tag{6.4}$$

These equalities are useful because it is common to encounter the array of differences $Lx$ and the quadratic error or disagreement function $x^\top L x$. They provide the correct intuition for the definition of the Laplacian matrix. In the following, we will refer to $x \mapsto x^\top L x$ as the Laplacian potential function; this name is justified based on the energy and power interpretation we present in the next two examples.

6.2 The Laplacian in mechanical networks of springs

Let $x_i \in \mathbb{R}$ denote the displacement of the $i$th rigid body. Assume that each spring is ideal linear-elastic and let $a_{ij}$ be the spring constant for the spring connecting the $i$th and $j$th bodies.
Define a graph as follows: the nodes are the rigid bodies $\{1, \dots, n\}$ with locations $x_1, \dots, x_n$, and the edges are the springs with weights $a_{ij}$. Each node $i$ is subject to a force
$$F_i = \sum_{j \ne i} a_{ij} (x_j - x_i) = -(Lx)_i,$$
where $L$ is the Laplacian for the network of springs (modeled as an undirected weighted graph). Moreover, recalling that the spring $\{i,j\}$ stores the quadratic energy $\frac{1}{2} a_{ij} (x_i - x_j)^2$, the total elastic energy is
$$E_{\text{elastic}} = \frac{1}{2} \sum_{\{i,j\} \in E} a_{ij} (x_i - x_j)^2 = \frac{1}{2} x^\top L x.$$

In this role, the Laplacian matrix is referred to as the stiffness matrix. Stiffness matrices can be defined
for spring networks in arbitrary dimensions (not only on the line) and with arbitrary topology (not only
a chain graph, or line graph, as in figure). More complex spring networks can be found, for example, in
finite-element discretization of flexible bodies and finite-difference discretization of diffusive media.

6.3 The Laplacian in electrical networks of resistors

Suppose the graph is an electrical network with only pure resistors and ideal voltage sources: (i) each graph vertex $i \in \{1, \dots, n\}$ is possibly connected to an ideal voltage source, and (ii) each edge is a resistor, say with resistance $r_{ij}$ between nodes $i$ and $j$. (This is an undirected weighted graph.)
Ohm's law along each edge $\{i,j\}$ gives the current flowing from $i$ to $j$ as
$$c_{ij} = (v_i - v_j)/r_{ij} = a_{ij} (v_i - v_j),$$
where $a_{ij}$ is the inverse resistance, called the conductance. We set $a_{ij} = 0$ whenever two nodes are not connected by a resistor. Kirchhoff's current law says that, at each node $i$,
$$c_{\text{injected at } i} = \sum_{j=1, j\ne i}^{n} c_{ij} = \sum_{j=1, j\ne i}^{n} a_{ij} (v_i - v_j).$$

Hence, the vector of injected currents $c_{\text{injected}}$ and the vector of voltages at the nodes $v$ satisfy
$$c_{\text{injected}} = L v.$$
Moreover, the power dissipated on the resistor $\{i,j\}$ is $c_{ij}(v_i - v_j)$, so that the total dissipated power is
$$P_{\text{dissipated}} = \sum_{\{i,j\} \in E} a_{ij} (v_i - v_j)^2 = v^\top L v.$$
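A minimal Matlab sketch verifying these relations on a hypothetical 3-node resistor network; the conductances and node voltages below are illustrative.

% injected currents and dissipated power in a hypothetical resistor network
A = [0 1 0.5; 1 0 2; 0.5 2 0];   % symmetric conductances a_ij = 1/r_ij
L = diag(sum(A,2)) - A;          % Laplacian of the resistor network
v = [1; 0.5; 0];                 % node voltages
c = L*v;                         % injected currents: c_injected = L v
P = v'*L*v;                      % total dissipated power
disp([sum(c), P])                % currents balance: sum(c) = 0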

Historical Note: Kirchhoff (1847) is a founder of graph theory in that he was an early adopter of
graph models to analyze electrical circuits.

6.4 Properties of the Laplacian matrix

Lemma 6.1 (Zero row-sums). Let $G$ be a weighted digraph with Laplacian $L$ and $n$ nodes. Then
$$L \mathbb{1}_n = \mathbb{0}_n.$$
In equivalent words, 0 is an eigenvalue of $L$ with eigenvector $\mathbb{1}_n$.
Proof. For all rows $i$, the $i$th row-sum is zero:
$$\sum_{j=1}^{n} \ell_{ij} = \ell_{ii} + \sum_{j=1, j\ne i}^{n} \ell_{ij} = \Big( \sum_{j=1, j\ne i}^{n} a_{ij} \Big) + \Big( \sum_{j=1, j\ne i}^{n} (-a_{ij}) \Big) = 0.$$
Equivalently, in vector format (remembering that the weighted out-degree matrix $D_{\text{out}}$ is diagonal and contains the row-sums of $A$):
$$L \mathbb{1}_n = D_{\text{out}} \mathbb{1}_n - A \mathbb{1}_n = \begin{bmatrix} d_{\text{out}}(1) \\ \vdots \\ d_{\text{out}}(n) \end{bmatrix} - \begin{bmatrix} d_{\text{out}}(1) \\ \vdots \\ d_{\text{out}}(n) \end{bmatrix} = \mathbb{0}_n. \qquad ∎$$

Note: Each graph has a Laplacian matrix. Vice versa, a square matrix is called a Laplacian if (i) its row-sums are zero, (ii) its diagonal entries are nonnegative, and (iii) its off-diagonal entries are nonpositive. Such a matrix uniquely induces a weighted digraph, with the exception of the self-loops.
Lemma 6.2 (Zero column-sums). Let $G$ be a weighted digraph with Laplacian $L$ and $n$ nodes. The following statements are equivalent:
(i) $G$ is weight-balanced; and
(ii) $\mathbb{1}_n^\top L = \mathbb{0}_n^\top$.

Proof. Pick $j \in \{1, \dots, n\}$ and compute
$$(\mathbb{1}_n^\top L)_j = (L^\top \mathbb{1}_n)_j = \sum_{i=1}^{n} \ell_{ij} = \ell_{jj} + \sum_{i=1, i\ne j}^{n} \ell_{ij} = d_{\text{out}}(j) - d_{\text{in}}(j),$$
where the last equality follows from
$$\ell_{jj} = d_{\text{out}}(j) - a_{jj} \quad \text{and} \quad \sum_{i=1, i\ne j}^{n} \ell_{ij} = -\big( d_{\text{in}}(j) - a_{jj} \big).$$
In summary, we know that $\mathbb{1}_n^\top L = \mathbb{0}_n^\top$ if and only if $D_{\text{out}} = D_{\text{in}}$. ∎

Lemma 6.3 (Spectrum of the Laplacian matrix). Given a weighted digraph $G$ with Laplacian $L$, the eigenvalues of $L$ different from 0 have strictly-positive real part.
Proof. Recall that $\ell_{ii} = \sum_{j=1, j\ne i}^{n} a_{ij} \ge 0$ and $\ell_{ij} = -a_{ij} \le 0$ for $i \ne j$. By the Geršgorin Disks Theorem 2.9, we know that each eigenvalue of $L$ belongs to at least one of the disks
$$\Big\{ z \in \mathbb{C} \;\Big|\; |z - \ell_{ii}| \le \sum_{j=1, j\ne i}^{n} |\ell_{ij}| \Big\} = \big\{ z \in \mathbb{C} \;\big|\; |z - \ell_{ii}| \le \ell_{ii} \big\}.$$
These disks, with radius equal to the center, contain the origin and complex numbers with positive real part. ∎

For an undirected graph with symmetric adjacency matrix $A = A^\top$, therefore, $L$ is symmetric and positive semidefinite, that is, all eigenvalues of $L$ are real and nonnegative. By convention we write these eigenvalues as $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. The second smallest eigenvalue $\lambda_2$ is called the Fiedler eigenvalue or the algebraic connectivity (Fiedler 1973). Note that the theorem proof also implies $\lambda_n \le 2 \max\{ d_{\text{out}}(1), \dots, d_{\text{out}}(n) \}$.

6.5 Graph connectivity and the rank of the Laplacian

Theorem 6.4 (Rank of the Laplacian). Let $L$ be the Laplacian matrix of a weighted digraph $G$ with $n$ nodes. Let $d$ be the number of sinks in the condensation digraph of $G$. Then
$$\operatorname{rank}(L) = n - d.$$
Early references for this theorem include (Agaev and Chebotarev 2000; Foster and Jacquez 1975), even though the proof here is independent. This theorem has the following immediate consequences:
(i) a digraph $G$ contains a globally reachable vertex if and only if $\operatorname{rank}(L) = n - 1$ (also recall the properties of $C(G)$ from Lemma 3.2);
(ii) for the case of undirected graphs, we have the following two results: the rank of $L$ is equal to $n$ minus the number of connected components of $G$, and an undirected graph $G$ is connected if and only if $\lambda_2 > 0$.
Proof. We start by simplifying the problem. Define a new weighted digraph $G'$ by modifying $G$ as follows: at each node, add a self-loop with unit weight if no self-loop is present, or increase the weight of the self-loop by 1 if a self-loop is present. Also, define another weighted digraph $G''$ by modifying $G'$ as follows: for each node, divide the weights of its out-going edges by its out-degree, so that the out-degree of each node is 1. In other words, define $A' = A + I_n$ and $L' = L$, and define $A'' = (D'_{\text{out}})^{-1} A'$ and $L'' = (D'_{\text{out}})^{-1} L' = I_n - A''$. Clearly, the rank of $L''$ is equal to the rank of $L$. Therefore, without loss of generality, we consider in what follows only digraphs with row-stochastic adjacency matrices.
Because the condensation digraph $C(G)$ has $d$ sinks, after a renumbering of the nodes, that is, a permutation of rows and columns (see Exercise E3.1), the adjacency matrix $A$ can be written in block lower-triangular form as
$$A = \begin{bmatrix} A_{11} & 0 & \cdots & 0 & 0 \\ 0 & A_{22} & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & \vdots \\ 0 & 0 & \cdots & A_{dd} & 0 \\ A_{1o} & A_{2o} & \cdots & A_{do} & A_{\text{others}} \end{bmatrix} \in \mathbb{R}^{n \times n},$$
where the state vector $x$ is correspondingly partitioned into the vectors $x_1, \dots, x_d$ and $x_{\text{others}}$ of dimensions $n_1, \dots, n_d$ and $n - (n_1 + \cdots + n_d)$, respectively, corresponding to the $d$ sinks and all other nodes.
Each sink of $C(G)$ is a strongly connected and aperiodic digraph. Therefore, the square matrices $A_{11}, \dots, A_{dd}$ are nonnegative, irreducible, and primitive. By the Perron-Frobenius Theorem for primitive matrices (Theorem 2.16), we know that the number 1 is a simple eigenvalue for each of them.
The square matrix $A_{\text{others}}$ is nonnegative and it can itself be written as a block lower-triangular matrix, whose diagonal blocks, say $(A_{\text{others}})_1, \dots, (A_{\text{others}})_N$, are nonnegative and irreducible. Moreover, each of these diagonal blocks must be row-substochastic because (1) each row-sum of each of these matrices is at most 1, and (2) at least one of the row-sums of each of these matrices must be smaller than 1, otherwise that matrix would correspond to a sink of $C(G)$. In summary, because the matrices $(A_{\text{others}})_1, \dots, (A_{\text{others}})_N$ are irreducible and row-substochastic, the matrix $A_{\text{others}}$ has spectral radius $\rho(A_{\text{others}}) < 1$.
We now write the Laplacian matrix $L = I_n - A$ with the same block lower-triangular structure:
$$L = \begin{bmatrix} L_{11} & 0 & \cdots & 0 & 0 \\ 0 & L_{22} & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & \vdots \\ 0 & 0 & \cdots & L_{dd} & 0 \\ -A_{1o} & -A_{2o} & \cdots & -A_{do} & L_{\text{others}} \end{bmatrix}, \tag{6.5}$$

where, for example, $L_{11} = I_{n_1} - A_{11}$. Because the number 1 is a simple eigenvalue of $A_{11}$, the number 0 is a simple eigenvalue of $L_{11}$. Therefore, $\operatorname{rank}(L_{11}) = n_1 - 1$. This same argument establishes that the rank of $L$ is at most $n - d$ because each one of the matrices $L_{11}, \dots, L_{dd}$ is of rank $n_1 - 1, \dots, n_d - 1$, respectively. Finally, we note that the rank of $L_{\text{others}}$ is maximal, because $L_{\text{others}} = I - A_{\text{others}}$ and $\rho(A_{\text{others}}) < 1$ together imply that 0 is not an eigenvalue of $L_{\text{others}}$. ∎
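A minimal Matlab sketch verifying the theorem on a hypothetical undirected graph with two connected components, so that the condensation digraph has $d = 2$ sinks.

% rank of the Laplacian versus the number of condensation sinks (Theorem 6.4)
A = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0];   % two disconnected pairs: d = 2 sinks
L = diag(sum(A,2)) - A;
disp(rank(L))                                % returns n - d = 4 - 2 = 2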


6.6 The algebraic connectivity, its eigenvector, and graph partitioning

As shown before, the algebraic connectivity $\lambda_2$ of an undirected and weighted graph $G$ is positive if and only if $G$ is connected. We build on this insight and show that the algebraic connectivity does not only provide a binary connectivity measure, but it also quantifies the bottleneck of the graph. To develop this intuition, we study the problem of community detection in a large-scale undirected graph. This problem arises, for example, when identifying groups of friends in a social network by means of the interaction graph.
We consider the specific problem of partitioning the vertices $V$ of an undirected connected graph $G$ into two sets $V_1$ and $V_2$ so that
$$V_1 \cup V_2 = V, \quad V_1 \cap V_2 = \emptyset, \quad \text{and} \quad V_1, V_2 \ne \emptyset.$$
Of course, there are many such partitions. We measure the quality of a partition by the sum of the weights of all edges that need to be cut to separate the vertices $V_1$ and $V_2$ into two disconnected components. Formally, the size of the cut separating $V_1$ and $V_2$ is
$$J = \sum_{i \in V_1, j \in V_2} a_{ij}.$$

We are interested in finding the cut with minimal size that identifies the two groups of nodes that are most loosely connected. The problem of minimizing the cut size $J$ is combinatorial and computationally hard, since we need to consider all possible partitions of the vertex set $V$. We present here a tractable approach based on a relaxation step. First, define a vector $x \in \{-1, +1\}^n$ with entries $x_i = +1$ for $i \in V_1$ and $x_i = -1$ for $i \in V_2$. Then the cut size $J$ can be rewritten via the Laplacian potential as
$$J = \frac{1}{8} \sum_{i,j=1}^{n} a_{ij} (x_i - x_j)^2 = \frac{1}{4} x^\top L x,$$

and the minimum cut size problem is:
$$\underset{x \in \{-1,1\}^n \setminus \{-\mathbb{1}_n, \mathbb{1}_n\}}{\text{minimize}} \;\; x^\top L x.$$
(Here we exclude the cases $x = \pm\mathbb{1}_n$ because they correspond to one of the two groups being empty.)
Second, since this problem is still computationally hard, we relax the problem from binary decision variables $x_i \in \{-1, +1\}$ to continuous decision variables $x_i \in [-1, 1]$ (or $\|x\|_\infty \le 1$), where we exclude $x \in \operatorname{span}(\mathbb{1}_n)$ (corresponding to one of the two groups being empty). Then the minimization problem becomes
$$\underset{y \in \mathbb{R}^n,\; y \perp \mathbb{1}_n,\; \|y\|_\infty = 1}{\text{minimize}} \;\; y^\top L y.$$
As a third and final step, we consider a 2-norm constraint $\|y\|_2 = 1$ instead of an $\infty$-norm constraint $\|y\|_\infty = 1$ (recall that $\|y\|_\infty \le \|y\|_2 \le \sqrt{n}\,\|y\|_\infty$) to obtain the following heuristic:
$$\underset{y \in \mathbb{R}^n,\; y \perp \mathbb{1}_n,\; \|y\|_2 = 1}{\text{minimize}} \;\; y^\top L y.$$
Notice that $y^\top L y \ge \lambda_2 \|y\|_2^2$ for all $y \perp \mathbb{1}_n$, and this inequality becomes an equality for $y = v_2$, the normalized eigenvector associated with $\lambda_2$. Thus, the minimum of the relaxed optimization problem is $\lambda_2$ and a minimizer is $y = v_2$. We can then use as a heuristic $x = \operatorname{sign}(v_2)$ to find the desired partition $\{V_1, V_2\}$. Hence, the algebraic connectivity $\lambda_2$ is an estimate for the size of the minimum cut, and the signs of the entries of $v_2$ identify the associated partition in the graph. For these reasons, $\lambda_2$ and $v_2$ can be interpreted as the size and the location of a bottleneck in a graph.
To illustrate the above concepts, we construct a randomly generated graph as follows. First, we partition $n = 1000$ nodes in two groups $V_1$ and $V_2$ of sizes 450 and 550 nodes, respectively. Second, we connect any pair of nodes in the set $V_1$ (respectively $V_2$) with probability 0.3 (respectively 0.2). Third and finally, any two nodes in distinct groups, $i \in V_1$ and $j \in V_2$, are connected with probability 0.1. The sparsity pattern of the associated adjacency matrix is shown in the left panel of Figure 6.1. No obvious partition is visible at first glance since the indices are not necessarily sorted, that is, $V_1$ is not necessarily $\{1, \dots, 450\}$. The second panel displays the entries of the eigenvector $v_2$ sorted according to their magnitude, showing a sharp transition between positive and negative entries. Finally, the third panel displays the correspondingly sorted adjacency matrix $\tilde A$, clearly indicating the partition $V = V_1 \cup V_2$.
The Matlab code to generate Figure 6.1 can be found below.
Figure 6.1: The first panel shows the sparsity pattern of a randomly-generated adjacency matrix $A$ for a graph with 1000 nodes. The second panel displays the entries of the eigenvector $v_2$ sorted by value (denoted $\tilde v_2$), and the third panel displays the correspondingly sorted adjacency matrix $\tilde A$.

% choose a graph size
n = 1000;

% randomly assign the nodes to two groups
x = randperm(n);
group_size = 450;
group1 = x(1:group_size);
group2 = x(group_size+1:end);

% assign probabilities of connecting nodes
p_group1 = 0.3;
p_group2 = 0.2;
p_between_groups = 0.1;

% construct adjacency matrix
A(group1, group1) = rand(group_size, group_size) < p_group1;
A(group2, group2) = rand(n-group_size, n-group_size) < p_group2;
A(group1, group2) = rand(group_size, n-group_size) < p_between_groups;
A = triu(A,1); A = A + A';

% can you see the groups?
subplot(1,3,1); spy(A);
xlabel('$A$', 'Interpreter','latex','FontSize',28);

% construct Laplacian and its spectrum
L = diag(sum(A)) - A;
[V, D] = eigs(L, 2, 'SA');

% plot the entries of the algebraic connectivity eigenvector, sorted by value
subplot(1,3,2); plot(sort(V(:,2)), '.');
xlabel('$\tilde v_2$', 'Interpreter','latex','FontSize',28);

% sort the adjacency matrix accordingly and spot the communities
[ignore, p] = sort(V(:,2));
subplot(1,3,3); spy(A(p,p));
xlabel('$\tilde A$', 'Interpreter','latex','FontSize',28);


6.7 Exercises

E6.1 The adjacency and Laplacian matrices for the complete graph. For any number $n \in \mathbb{N}$, the complete graph with $n$ nodes, denoted by $K(n)$, is the undirected and unweighted graph in which any two distinct nodes are connected. For example, see $K(6)$ in the figure. Compute, for arbitrary $n$,
(i) the adjacency matrix of $K(n)$ and its eigenvalues; and
(ii) the Laplacian matrix of $K(n)$ and its eigenvalues.

E6.2 The adjacency and Laplacian matrices for the complete bipartite graph. A bipartite graph is a graph whose vertices can be divided into two disjoint sets $U$ and $V$ with the property that every edge connects a vertex in $U$ to one in $V$. A complete bipartite graph is a bipartite graph in which every vertex of $U$ is connected with every vertex of $V$. If $U$ has $n$ vertices and $V$ has $m$ vertices, for arbitrary $n, m \in \mathbb{N}$, the resulting complete bipartite graph is denoted by $K(n, m)$. For example, see $K(1,6)$ and $K(3,3)$ in the figure. Compute, for arbitrary $n$ and $m$,
(i) the adjacency matrix of $K(n, m)$ and its eigenvalues; and
(ii) the Laplacian matrix of $K(n, m)$ and its eigenvalues.

E6.3 The Laplacian matrix of an undirected graph is positive semidefinite. Give an alternative proof, without relying on the Geršgorin Disks Theorem 2.9, that the Laplacian matrix $L$ of an undirected weighted graph is symmetric positive semidefinite. (Note that the proof of Lemma 6.3 relies on the Geršgorin Disks Theorem 2.9.)

E6.4 The Laplacian matrix of a weight-balanced digraph. Prove that the following statements are equivalent:
(i) the digraph $G$ is weight-balanced;
(ii) $L + L^\top$ is positive semidefinite.
Hint: Recall the proof of Lemma 6.3.

E6.5 A property of weight-balanced Laplacian matrices. Let $L$ be the Laplacian matrix of a strongly-connected weight-balanced digraph $G$. Show that, for $x \in \mathbb{R}^n$,
$$\frac{\lambda_2(L + L^\top)}{2} \Big\| x - \frac{1}{n} (1_n^\top x) 1_n \Big\|_2^2 \;\le\; \frac{1}{2}\, x^\top (L + L^\top)\, x,$$
where $\lambda_2(L + L^\top)$ is the smallest non-zero eigenvalue of $L + L^\top$.


E6.6 The disagreement function in a directed graph (Gao et al. 2008). Recall that the quadratic form associated with a symmetric matrix $B \in \mathbb{R}^{n \times n}$ is the function $x \mapsto x^\top B x$. Let $G$ be a weighted digraph with $n$ nodes and define the quadratic disagreement function $\Phi_G : \mathbb{R}^n \to \mathbb{R}$ by
$$\Phi_G(x) = \frac{1}{2} \sum_{i,j=1}^n a_{ij} (x_j - x_i)^2.$$
Show that:
(i) $\Phi_G$ is the quadratic form associated with the symmetric positive-semidefinite matrix
$$P = \frac{1}{2}\big(D_{\text{out}} + D_{\text{in}} - A - A^\top\big),$$

(ii) $P = \frac{1}{2}\big(L + L^{(\text{rev})}\big)$, where the Laplacian of the reverse digraph is $L^{(\text{rev})} = D_{\text{in}} - A^\top$.

E6.7 The pseudoinverse Laplacian matrix. The Moore-Penrose pseudoinverse of an $n \times m$ matrix $M$ is the unique $m \times n$ matrix $M^\dagger$ with the following properties:
(i) $M M^\dagger M = M$,
(ii) $M^\dagger M M^\dagger = M^\dagger$, and
(iii) $M M^\dagger$ and $M^\dagger M$ are both symmetric.

Assume $L$ is the Laplacian matrix of a weighted connected undirected graph with $n$ nodes. Let $U \in \mathbb{R}^{n \times n}$ be an orthonormal matrix of eigenvectors of $L$ such that
$$L = U \operatorname{diag}(0, \lambda_2, \dots, \lambda_n)\, U^\top.$$
Show that
(i) $L^\dagger = U \operatorname{diag}(0, 1/\lambda_2, \dots, 1/\lambda_n)\, U^\top$,
(ii) $L L^\dagger = L^\dagger L = I_n - \frac{1}{n} 1_n 1_n^\top$, and
(iii) $L^\dagger 1_n = 0_n$.
E6.8 The Green matrix of a Laplacian matrix. Assume $L$ is the Laplacian matrix of a weighted connected undirected graph with $n$ nodes. Show that
(i) the matrix $L + \frac{1}{n} 1_n 1_n^\top$ is positive definite,
(ii) the so-called Green matrix
$$X = \Big(L + \frac{1}{n} 1_n 1_n^\top\Big)^{-1} - \frac{1}{n} 1_n 1_n^\top \tag{E6.1}$$
is the unique solution to the system of equations
$$L X = I_n - \frac{1}{n} 1_n 1_n^\top, \qquad 1_n^\top X = 0_n^\top,$$
(iii) $X = L^\dagger$, where $L^\dagger$ is defined in Exercise E6.7. In other words, the Green matrix formula (E6.1) is an alternative definition of the pseudoinverse Laplacian matrix.
E6.9 Monotonicity of Laplacian eigenvalues. Consider a symmetric Laplacian matrix $L \in \mathbb{R}^{n \times n}$ associated with a weighted and undirected graph $G = (V, E, A)$. Assume $G$ is connected and let $\lambda_2(G) > 0$ be its algebraic connectivity, i.e., the second-smallest eigenvalue of $L$. Show that
(i) $\lambda_2(G)$ is a monotonically non-decreasing function of each weight $a_{ij}$, $\{i,j\} \in E$; and
(ii) $\lambda_2(G)$ is a monotonically non-decreasing function of the edge set in the following sense: $\lambda_2(G) \le \lambda_2(G')$ for any graph $G' = (V, E', A')$ with $E \subseteq E'$ and $a_{ij} = a'_{ij}$ for all $\{i,j\} \in E$.
Hint: Use the disagreement function.


E6.10 Invertibility of principal minors of the Laplacian matrix. Consider a connected and undirected graph and an arbitrary partition of the node set $V = V_1 \cup V_2$. The associated symmetric and irreducible Laplacian matrix $L \in \mathbb{R}^{n \times n}$ is partitioned accordingly as
$$L = \begin{bmatrix} L_{11} & L_{12} \\ L_{12}^\top & L_{22} \end{bmatrix}.$$
Show that the submatrices $L_{11} \in \mathbb{R}^{|V_1| \times |V_1|}$ and $L_{22} \in \mathbb{R}^{|V_2| \times |V_2|}$ are nonsingular.
Hint: Look up the concept of irreducible diagonally-dominant matrices.
E6.11 Gaussian elimination and Laplacian matrices. Consider an undirected and connected graph and its associated Laplacian matrix $L \in \mathbb{R}^{n \times n}$. Consider the associated linear Laplacian equation $y = Lx$, where $x \in \mathbb{R}^n$ is unknown and $y \in \mathbb{R}^n$ is a given vector. Verify that eliminating $x_n$ via the last row of this equation yields the following reduced set of equations:
$$\begin{bmatrix} y_1 \\ \vdots \\ y_{n-1} \end{bmatrix} + \underbrace{\begin{bmatrix} -L_{1n}/L_{nn} \\ \vdots \\ -L_{n-1,n}/L_{nn} \end{bmatrix}}_{=A} y_n = \underbrace{\Big[\, L_{ij} - \frac{L_{in} L_{jn}}{L_{nn}} \,\Big]_{i,j \in \{1,\dots,n-1\}}}_{=L_{\text{red}}} \begin{bmatrix} x_1 \\ \vdots \\ x_{n-1} \end{bmatrix},$$
where the $(i,j)$-element of $L_{\text{red}}$ is given by $L_{ij} - L_{in} L_{jn}/L_{nn}$. Show that the matrices $A \in \mathbb{R}^{(n-1) \times 1}$ and $L_{\text{red}} \in \mathbb{R}^{(n-1) \times (n-1)}$ obtained after Gaussian elimination have the following properties:
(i) $A$ is a nonnegative column-stochastic matrix with at least one strictly positive element; and
(ii) $L_{\text{red}}$ is a symmetric and irreducible Laplacian matrix.
Hint: To show the irreducibility of $L_{\text{red}}$, verify the following property regarding the fill-in of the matrix $L_{\text{red}}$: the graph associated with the Laplacian $L_{\text{red}}$ has an edge between nodes $i$ and $j$ if and only if (i) either $\{i,j\}$ was an edge in the original graph associated with $L$, (ii) or $\{i,n\}$ and $\{j,n\}$ were edges in the original graph associated with $L$.

E6.12 The spectra of Laplacian and row-stochastic adjacency matrices. Consider a row-stochastic matrix $A \in \mathbb{R}^{n \times n}$. Let $L$ be the Laplacian matrix of the digraph associated with $A$. Compute the spectrum of $L$ as a function of the spectrum $\operatorname{spec}(A)$ of $A$.
E6.13 Thomson's principle and energy routing. Consider a connected and undirected resistive electrical network with $n$ nodes, with external nodal current injections $c \in \mathbb{R}^n$ satisfying the balance condition $1_n^\top c = 0$, and with resistances $R_{ij} > 0$ for every undirected edge $\{i,j\} \in E$. For simplicity, we set $R_{ij} = \infty$ if there is no edge connecting $i$ and $j$. As shown earlier in this chapter, Kirchhoff's and Ohm's laws lead to the network equations
$$c_{\text{injected at } i} = \sum_{j \in N(i)} c_{ji} = \sum_{j \in N(i)} \frac{1}{R_{ij}} (v_i - v_j),$$
where $v_i$ is the potential at node $i$ and $c_{ji} = (v_i - v_j)/R_{ij}$ is the current flow from node $i$ to node $j$. Consider now a more general set of current flows $f_{ij}$ (for all $i, j \in \{1, \dots, n\}$) routing energy through the network and compatible with the following basic assumptions:
(i) Skew-symmetry: $f_{ij} = -f_{ji}$ for all $i, j \in \{1, \dots, n\}$;
(ii) Consistency: $f_{ij} = 0$ if $\{i,j\} \notin E$;
(iii) Conservation: $c_{\text{injected at } i} = \sum_{j \in N(i)} f_{ji}$ for all $i \in \{1, \dots, n\}$.


Show that, among all possible current flows $f_{ij}$, the physical current flow $f_{ij} = c_{ij} = (v_j - v_i)/R_{ij}$ uniquely minimizes the energy dissipation:
$$\begin{aligned} \underset{f_{ij},\; i,j \in \{1,\dots,n\}}{\operatorname{minimize}} \quad & J = \frac{1}{2} \sum_{i,j=1}^n R_{ij} f_{ij}^2 \\ \text{subject to} \quad & f_{ij} = -f_{ji} \quad \text{for all } i, j \in \{1, \dots, n\}, \\ & f_{ij} = 0 \quad \text{if } \{i,j\} \notin E, \\ & c_{\text{injected at } i} = \sum_{j \in N(i)} f_{ji} \quad \text{for all } i \in \{1, \dots, n\}. \end{aligned}$$
Hint: The solution requires knowledge of the Karush-Kuhn-Tucker (KKT) conditions for optimality; this is a classic topic in nonlinear constrained optimization discussed in numerous textbooks, e.g., in (Luenberger 1984).
E6.14 Linear spring networks with loads. Consider the two (connected) spring networks with $n$ moving masses in the figure. For the right network, assume one of the masses is connected by a spring to a single stationary object. Refer to the left spring network as free and to the right network as grounded. Let $F_{\text{load}}$ be a load force applied to the $n$ moving masses.
For the left network, let $L_{\text{free},n}$ be the $n \times n$ Laplacian matrix describing the free spring network among the $n$ moving masses, as defined in Section 6.2. For the right network, let $L_{\text{free},n+1}$ be the $(n+1) \times (n+1)$ Laplacian matrix for the spring network among the $n$ masses and the stationary object. Let $L_{\text{grounded}}$ be the $n \times n$ grounded Laplacian of the $n$ masses, constructed by removing the row and column of $L_{\text{free},n+1}$ corresponding to the stationary object.
For the free spring network subject to $F_{\text{load}}$,
(i) do equilibrium displacements exist for arbitrary loads?
(ii) if the load force $F_{\text{load}}$ is balanced in the sense that $1_n^\top F_{\text{load}} = 0$, is the resulting equilibrium displacement unique?
(iii) compute the equilibrium displacement if unique, or the set of equilibrium displacements otherwise, assuming a balanced force profile is applied.
For the grounded spring network,
(iv) derive an expression relating $L_{\text{grounded}}$ to $L_{\text{free},n}$,
(v) show that $L_{\text{grounded}}$ is invertible, and
(vi) compute the displacement for the grounded spring network for arbitrary load forces.


Chapter 7

Continuous-time Averaging Systems


In this chapter we consider averaging algorithms in which the variables evolve in continuous time,
instead of discrete time. Therefore we look at some interesting differential equations. We borrow ideas
from (Mesbahi and Egerstedt 2010; Ren et al. 2007).

7.1 Example #1: Flocking behavior for a group of animals

We are interested in a continuous-time agreement phenomenon based on the simple alignment rule
for each agent to steer towards the average heading of its neighbors; see Figure 7.1.

Figure 7.1: Alignment rule: the left fish rotates clockwise to align itself with the average heading of its neighbors.

This alignment rule amounts to a spring-like force, described as follows:
$$\dot\theta_i = \begin{cases} (\theta_j - \theta_i), & \text{if the $i$th agent has one neighbor $j$,} \\[0.5ex] \tfrac{1}{2}(\theta_{j_1} - \theta_i) + \tfrac{1}{2}(\theta_{j_2} - \theta_i), & \text{if the $i$th agent has two neighbors $j_1, j_2$,} \\[0.5ex] \tfrac{1}{m}(\theta_{j_1} - \theta_i) + \dots + \tfrac{1}{m}(\theta_{j_m} - \theta_i), & \text{if the $i$th agent has $m$ neighbors $j_1, \dots, j_m$,} \end{cases}$$
that is, $\dot\theta_i = \operatorname{average}\{\theta_j,\ \text{for all neighbors } j\} - \theta_i$.

This interaction law can be written as
$$\dot\theta = -L\theta,$$
where $L$ is the Laplacian of an appropriate weighted digraph $G$: each bird is a node and each directed edge $(i,j)$ has weight $1/d_{\text{out}}(i)$. Here it is useful to recall the interpretation of $-(Lx)_i$ as a force perceived by node $i$ in a network of springs.
Note: it is weird (i.e., mathematically ill-posed) to compute averages on a circle, but let us not worry about it for now.
Note: this incomplete model does not concern itself with positions in any way. Hence, (1) there is no discussion of collision avoidance or formation/cohesion maintenance. Moreover, (2) the graph $G$ should really be state-dependent. For example, we may assume that two birds see each other, and interact, if and only if their pairwise Euclidean distance is below a certain threshold.

Figure 7.2: Many animal species exhibit flocking behaviors that arise from decentralized interactions. On the left:
pacific threadfins (Polydactylus sexfilis); public domain image from the U.S. National Oceanic and Atmospheric
Administration. On the right: flock of snow geese (Chen caerulescens); public domain image from the U.S. Fish and
Wildlife Service.
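For intuition, here is a minimal Matlab sketch of the interaction law $\dot\theta = -L\theta$ on an arbitrarily chosen ring digraph (each bird sees its two ring neighbors); it treats headings as real numbers, in line with the first note above.

% simulate the alignment rule on an example ring graph (illustrative choice)
n = 10;
A = diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);
A(1,n) = 1; A(n,1) = 1;                       % ring: two neighbors per bird
L = diag(1./sum(A,2)) * (diag(sum(A,2)) - A); % edge weights 1/dout(i)
theta0 = 2*pi*rand(n,1) - pi;                 % random initial headings
[t, theta] = ode45(@(t,th) -L*th, [0 10], theta0);
plot(t, theta);                               % all headings reach agreement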

7.2 Example #2: A simple RC circuit

Consider an electrical network with only pure resistors and with pure capacitors connecting each
node to ground; this example is taken from (Mesbahi and Egerstedt 2010; Ren et al. 2007).
From the previous chapter, we know the vector of injected currents cinjected and the vector of voltages
at the nodes v satisfy
cinjected = L v,
where L is the Laplacian for the graph with coefficients aij = 1/rij . Additionally, assuming Ci is the
capacitance at node i, and keeping proper track of the current into each capacitor, we have
$$C_i \frac{d}{dt} v_i = -c_{\text{injected at } i},$$


so that, with the shorthand C = diag(C1 , . . . , Cn ),


$$\frac{d}{dt} v = -C^{-1} L\, v.$$
Note: $C^{-1} L$ is again a Laplacian matrix (for a directed weighted graph).
Note: it is physically intuitive that after some transient all nodes will have the same potential. This
intuition will be proved later in the chapter.
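As a quick numeric sanity check, the following Matlab sketch, with arbitrarily chosen conductances and capacitances, verifies that $C^{-1}L$ has zero row-sums and hence the defining structure of a Laplacian matrix.

% verify that C^{-1} L is again a Laplacian matrix on example data
A = [0 1 2; 1 0 1; 2 1 0];        % example conductances a_ij = 1/r_ij
L = diag(sum(A,2)) - A;
C = diag([1 2 3]);                 % example nodal capacitances
Lc = C \ L;                        % the matrix C^{-1} L
sum(Lc,2)                          % zero row-sums, as for any Laplacian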

7.3 Continuous-time linear systems and their convergence properties

In Section 2.1 we presented discrete-time linear systems and their convergence properties; here we
present their continuous-time analog.
A continuous-time linear system is
$$\dot x(t) = A x(t). \tag{7.1}$$
Its solution $t \mapsto x(t)$, $t \in \mathbb{R}_{\ge 0}$, from an initial condition $x(0)$ satisfies $x(t) = e^{At} x(0)$, where the matrix exponential of a square matrix $A$ is defined by
$$e^A = \sum_{k=0}^{\infty} \frac{1}{k!} A^k.$$

The matrix exponential is a remarkable operation with numerous properties; we ask the reader to review
a few basic ones in Exercise E7.1. A matrix $A \in \mathbb{R}^{n \times n}$ is
(i) continuous-time semi-convergent if $\lim_{t \to +\infty} e^{At}$ exists, and
(ii) continuous-time convergent (Hurwitz) if it is continuous-time semi-convergent and $\lim_{t \to +\infty} e^{At} = 0_{n \times n}$.
The spectral abscissa of a square matrix $A$ is the maximum of the real parts of the eigenvalues of $A$, that is,
$$\mu(A) = \max\{\Re(\lambda) \mid \lambda \in \operatorname{spec}(A)\}.$$
Theorem 7.1 (Convergence and spectral abscissa). For a square matrix A, the following statements hold:
(i) $A$ is continuous-time convergent (Hurwitz) if and only if $\mu(A) < 0$;
(ii) $A$ is semi-convergent if and only if $\mu(A) \le 0$, no eigenvalue has zero real part other than possibly the number $0$, and, if $0$ is an eigenvalue, then it is semisimple.
We leave the proof of this theorem to the reader and mention that most required steps are similar to the discussion in Section 2.1 and are discussed later in this chapter.
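As a numeric illustration of Theorem 7.1, the following Matlab sketch, on an arbitrarily chosen directed ring, checks that $-L$ has spectral abscissa $0$ with a semisimple zero eigenvalue, so that $e^{-Lt}$ converges to a rank-one matrix.

% semi-convergence of -L for an example strongly connected digraph
Adj = [0 1 0; 0 0 1; 1 0 0];       % directed ring (example choice)
L = diag(sum(Adj,2)) - Adj;
max(real(eig(-L)))                  % spectral abscissa mu(-L) = 0
expm(-L*100)                        % approaches the rank-one matrix 1n*w'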
7.4 The Laplacian flow

Let $G$ be a weighted directed graph with $n$ nodes and Laplacian matrix $L$. The Laplacian flow on $\mathbb{R}^n$ is the dynamics
$$\dot x = -Lx, \tag{7.2}$$
or, equivalently in components,
$$\dot x_i = \sum_{j=1}^n a_{ij} (x_j - x_i) = \sum_{j \in N^{\text{out}}(i)} a_{ij} (x_j - x_i).$$

Lemma 7.2 (Equilibrium points). If $G$ contains a globally reachable node, then the only equilibrium points of the Laplacian flow (7.2) are $x^* = \alpha 1_n$, for some $\alpha \in \mathbb{R}$.
Proof. A point $x^*$ is an equilibrium of the Laplacian flow if $L x^* = 0_n$. Hence, any point in the kernel of the matrix $L$ is an equilibrium. From Theorem 6.4, if $G$ contains a globally reachable node, then $\operatorname{rank}(L) = n - 1$. Hence, the kernel of $L$ is one-dimensional. The lemma follows by recalling that $L 1_n = 0_n$.

In what follows, we are interested in characterizing the evolution of the Laplacian flow (7.2). To
build some intuition, let us first consider an undirected graph G and write the modal decomposition of
the solution as we did in Section 2.1 for a discrete-time linear system. We proceed in two steps. First,
because $G$ is undirected, the matrix $L$ is symmetric and has real eigenvalues $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ with corresponding orthonormal (i.e., orthogonal and unit-length) eigenvectors $v_1, \dots, v_n$. Define $\tilde x_i(t) = v_i^\top x(t)$ and left-multiply $\dot x = -Lx$ by $v_i^\top$:
$$\frac{d}{dt} \tilde x_i(t) = -\lambda_i \tilde x_i(t), \qquad \tilde x_i(0) = v_i^\top x(0).$$
These $n$ decoupled ordinary differential equations are immediately solved to give
$$x(t) = \tilde x_1(t) v_1 + \tilde x_2(t) v_2 + \dots + \tilde x_n(t) v_n = e^{-\lambda_1 t} (v_1^\top x(0)) v_1 + e^{-\lambda_2 t} (v_2^\top x(0)) v_2 + \dots + e^{-\lambda_n t} (v_n^\top x(0)) v_n.$$
Second, recall that $\lambda_1 = 0$ and $v_1 = 1_n/\sqrt{n}$ because $L$ is a Laplacian matrix ($L 1_n = 0_n$). Therefore, we compute $(v_1^\top x(0)) v_1 = \operatorname{average}(x(0))\, 1_n$ and substitute:
$$x(t) = \operatorname{average}(x(0))\, 1_n + e^{-\lambda_2 t} (v_2^\top x(0)) v_2 + \dots + e^{-\lambda_n t} (v_n^\top x(0)) v_n.$$
Now, let us assume that $G$ is connected, so that its second smallest eigenvalue $\lambda_2$ is strictly positive. In this case, we can infer that
$$\lim_{t \to \infty} x(t) = \operatorname{average}(x(0))\, 1_n,$$
or, defining a disagreement vector $\delta(t) = x(t) - \operatorname{average}(x(0))\, 1_n$, we infer
$$\delta(t) = e^{-\lambda_2 t} (v_2^\top x(0)) v_2 + \dots + e^{-\lambda_n t} (v_n^\top x(0)) v_n.$$
In summary, we discovered that, for a connected undirected graph, the disagreement vector converges to zero with exponential rate $\lambda_2$. In what follows, we state a more general convergence-to-consensus result for the continuous-time Laplacian flow. This result is parallel to Theorem 5.2.
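Before stating the general result, the following Matlab sketch, on an arbitrarily chosen path graph, verifies the modal decomposition above by comparing $e^{-Lt}x(0)$ with the sum of modes.

% verify the modal decomposition on an example path graph with 5 nodes
A = diag(ones(4,1),1); A = A + A';         % undirected path (example choice)
L = diag(sum(A,2)) - A;
[V, D] = eig(L);                            % orthonormal eigenvectors of L
x0 = randn(5,1); t = 2;
x_modes = V*diag(exp(-diag(D)*t))*V'*x0;    % sum_i e^{-lambda_i t}(v_i'*x0) v_i
norm(expm(-L*t)*x0 - x_modes)               % = 0 up to numerical precision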
Theorem 7.3 (Consensus for Laplacian matrices with a globally reachable node). If a Laplacian matrix $L$ has an associated digraph $G$ with a globally reachable node, then
(i) the eigenvalue $0$ of $-L$ is simple and all other eigenvalues of $-L$ have negative real part,
(ii) $\lim_{t \to \infty} e^{-Lt} = 1_n w^\top$, where $w \ge 0$ is the left eigenvector of $L$ with eigenvalue $0$ satisfying $w_1 + \dots + w_n = 1$,
(iii) $w_i > 0$ if and only if node $i$ is globally reachable; accordingly, $w_i = 0$ if and only if node $i$ is not globally reachable,
(iv) the solution to $\frac{d}{dt} x(t) = -L x(t)$ satisfies
$$\lim_{t \to \infty} x(t) = \big(w^\top x(0)\big)\, 1_n,$$
(v) if additionally $G$ is weight-balanced, then $G$ is strongly connected, $1_n^\top L = 0_n^\top$, and $w = \frac{1}{n} 1_n$ (because $\frac{1}{n} 1_n^\top 1_n = 1$), so that
$$\lim_{t \to \infty} x(t) = \frac{1_n^\top x(0)}{n}\, 1_n = \operatorname{average}\big(x(0)\big)\, 1_n.$$
Note: as a corollary to statement (iii), the left eigenvector $w \in \mathbb{R}^n$ associated with the eigenvalue $0$ has strictly positive entries if and only if $G$ is strongly connected.
Proof. Because the associated digraph has a globally reachable node, from the previous chapter we know a few properties of $L$: $L 1_n = 0_n$, $L$ has rank $n-1$, and all eigenvalues of $L$ have nonnegative real part. Therefore, we immediately conclude that $0$ is a simple eigenvalue of $-L$ with right eigenvector $1_n$ and that all other eigenvalues of $-L$ have negative real part. This concludes the proof of (i). In what follows we let $w$ denote the left eigenvector associated with the eigenvalue $0$, that is, $w^\top L = 0_n^\top$, normalized so that $1_n^\top w = 1$.
To prove statement (ii), we proceed in three steps. First, we write the Laplacian matrix in its Jordan
normal form:

$$L = P J P^{-1} = P \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_m \end{bmatrix} P^{-1}, \tag{7.3}$$

where $m \le n$ is the number of Jordan blocks, the first block is the scalar $0$ (being the only eigenvalue we know), the other Jordan blocks $J_2, \dots, J_m$ (unique up to re-ordering) are associated with eigenvalues

with strictly positive real part, and where the columns of P are the generalized eigenvectors of L (unique
up to rescaling).
Second, using some properties from Exercise E7.1, we compute the limit as $t \to \infty$ of $e^{-Lt} = P e^{-Jt} P^{-1}$ as
$$\lim_{t \to \infty} e^{-Lt} = P \Big(\lim_{t \to \infty} e^{-Jt}\Big) P^{-1} = P \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} P^{-1} = (P e_1)(e_1^\top P^{-1}) = c_1 r_1,$$

where $c_1$ is the first column of $P$ and $r_1$ is the first row of $P^{-1}$. The contributions of the Jordan blocks $J_2, \dots, J_m$ vanish because their eigenvalues have strictly positive real part, so that $e^{-J_k t} \to 0$; for more details see, e.g., (Hespanha 2009).
Third and finally, we characterize $c_1$ and $r_1$. By definition, the first column of $P$ (unique up to rescaling) is a right eigenvector of the eigenvalue $0$ for the matrix $L$, that is, $c_1 = \alpha 1_n$ for some scalar $\alpha$, since we know $L 1_n = 0_n$. Of course, it is convenient to define $c_1 = 1_n$. Next, equation (7.3) can be rewritten as $P^{-1} L = J P^{-1}$, whose first row is $r_1 L = 0_n^\top$. This equality implies $r_1 = \beta w^\top$ for some scalar $\beta$. Finally, we note that $P^{-1} P = I_n$ implies $r_1 c_1 = 1$, that is, $\beta w^\top 1_n = 1$. Since we know $w^\top 1_n = 1$, we infer that $\beta = 1$ and that $r_1 = w^\top$. This concludes the proof of statement (ii).
Next, we prove statement (iii). Pick a positive constant $\varepsilon < 1/d_{\max}$, where the maximum out-degree is $d_{\max} = \max\{d_{\text{out}}(1), \dots, d_{\text{out}}(n)\}$. Define $B = I_n - \varepsilon L$. It is easy to show that $B$ is nonnegative, row-stochastic, and has strictly positive diagonal elements. Moreover, $w^\top L = 0_n^\top$ implies $w^\top B = w^\top$, so that $w$ is the left eigenvector with unit eigenvalue for $B$.
that w is the left eigenvector with unit eigenvalue for B. Now, note that the digraph G(L) associated to
L (without self-loops) is identical to the digraph G(B) associated to B, except for the fact that B has
self-loops at each node. By assumption G(L) has a globally reachable node and therefore so does G(B),
where the subgraph induced by the set of globally reachable nodes is aperiodic (due to the self-loops).
Therefore, statement (iii) is now an immediate transcription of the same statement for row-stochastic
matrices established in Theorem 5.2 (statement (iii)).
Statements (iv) and (v) are straightforward.


7.5 Design of weight-balanced digraphs from strongly-connected digraphs

Problem: Given a directed graph $G$ that is strongly connected, but not weight-balanced, how do we choose the weights in order to obtain a weight-balanced digraph and a Laplacian satisfying $1_n^\top L = 0_n^\top$?
(Note that an undirected graph is automatically weight-balanced.)
Answer: As usual, let $w > 0$ be the left eigenvector of $L$ with eigenvalue $0$ satisfying $w_1 + \dots + w_n = 1$. In other words, $w$ is a vector of convex combination coefficients, and the Laplacian $L$ satisfies
$$L 1_n = 0_n \quad \text{and} \quad w^\top L = 0_n^\top.$$
Following (Ren et al. 2007), define a new matrix
$$L_{\text{rescaled}} = \operatorname{diag}(w)\, L.$$

It is immediate to see that
$$L_{\text{rescaled}}\, 1_n = \operatorname{diag}(w) L 1_n = 0_n, \qquad 1_n^\top L_{\text{rescaled}} = 1_n^\top \operatorname{diag}(w) L = w^\top L = 0_n^\top.$$
Note that:
- $L_{\text{rescaled}}$ is again a Laplacian matrix because (i) its row-sums are zero, (ii) its diagonal entries are positive, and (iii) its off-diagonal entries are nonpositive;
- $L_{\text{rescaled}}$ is the Laplacian matrix for a new digraph $G_{\text{rescaled}}$ with the same nodes and directed edges as $G$, but whose weights are rescaled as follows: $a_{ij} \mapsto w_i a_{ij}$. In other words, the weight of each out-edge of node $i$ is rescaled by $w_i$.
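The following Matlab sketch carries out this rescaling on an arbitrarily chosen strongly connected digraph and verifies that $L_{\text{rescaled}}$ has zero row-sums and zero column-sums.

% rescale an example strongly connected digraph to be weight-balanced
Adj = [0 1 0; 0 0 2; 3 0 0];        % example weighted directed cycle
L = diag(sum(Adj,2)) - Adj;
[W, D] = eig(L');                    % left eigenvectors of L
[ignore, k] = min(abs(diag(D)));     % locate the eigenvalue 0
w = W(:,k) / sum(W(:,k));            % normalize so that w > 0 and sum(w) = 1
Lresc = diag(w) * L;
[sum(Lresc,2), sum(Lresc,1)']        % both row-sums and column-sums vanish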

7.6 Distributed optimization using the Laplacian flow

In the following, we present a computational application of the Laplacian flow in distributed optimization. The material in this section is inspired by (Cherukuri and Cortés 2015; Droge et al. 2013; Gharesifard and Cortés 2014; Wang and Elia 2010), and we present it here in a self-contained way. As the only preliminary notions, we introduce the following two definitions: a function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be convex if $f(\alpha x + \beta y) \le \alpha f(x) + \beta f(y)$ for all $x$ and $y$ in $\mathbb{R}^n$ and for all convex combination coefficients $\alpha$ and $\beta$, i.e., coefficients satisfying $\alpha, \beta \ge 0$ and $\alpha + \beta = 1$. A function is said to be strictly convex if the previous inequality holds strictly for $x \ne y$ and $\alpha, \beta > 0$.
Consider a network of $n$ processors that can perform local computation and communicate with one another. The communication architecture is modeled by an undirected, connected, and weighted graph with $n$ nodes and symmetric Laplacian $L = L^\top \in \mathbb{R}^{n \times n}$. The objective of the processor network is to solve the optimization problem
$$\operatorname{minimize}_{x \in \mathbb{R}} \quad f(x) = \sum_{i=1}^n f_i(x), \tag{7.4}$$

where $f_i : \mathbb{R} \to \mathbb{R}$ is a strictly convex and twice continuously differentiable cost function known only to processor $i \in \{1, \dots, n\}$. In a centralized setup, the decision variable $x$ is globally available and the minimizers $x^* \in \mathbb{R}$ of the optimization problem (7.4) can be found by solving for the critical points of $f(x)$:
$$0 = \frac{\partial}{\partial x} f(x) = \sum_{i=1}^n \frac{\partial}{\partial x} f_i(x).$$
A centralized continuous-time algorithm converging to the set of critical points is the negative gradient flow
$$\dot x = -\frac{\partial}{\partial x} f(x).$$
To find a distributed approach to solving the optimization problem (7.4), we associate a local estimate $y_i \in \mathbb{R}$ of the global variable $x \in \mathbb{R}$ with every processor and solve the equivalent problem
$$\operatorname{minimize}_{y \in \mathbb{R}^n} \quad \tilde f(y) = \sum_{i=1}^n f_i(y_i) + \frac{1}{2} y^\top L y \qquad \text{subject to} \quad L y = 0_n, \tag{7.5}$$


where the consistency constraint $Ly = 0_n$ ensures that $y_i = y_j$ for all $i, j \in \{1, \dots, n\}$, that is, the local estimates of all processors coincide. We also augmented the cost function with the term $\frac{1}{2} y^\top L y$, which clearly has no effect on the minimizers of (7.5) (due to the consistency constraint), but provides supplementary damping and favorable convergence properties for our algorithm. The minimizers of the optimization problems (7.4) and (7.5) are then related by $y^* = x^* 1_n$.
Without any further motivation, consider the function $\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by
$$\mathcal{L}(y, z) = \sum_{i=1}^n f_i(y_i) + \frac{1}{2} y^\top L y + z^\top L y.$$
In the literature on convex optimization, this function is known as the (augmented) Lagrangian function, and $z \in \mathbb{R}^n$ is referred to as the Lagrange multiplier. What is important for us is that the Lagrangian function is strictly convex in $y$ and linear (and hence concave) in $z$. Hence, the augmented Lagrangian function admits a set of saddle points $(y^*, z^*) \in \mathbb{R}^n \times \mathbb{R}^n$ satisfying $\mathcal{L}(y^*, z) \le \mathcal{L}(y^*, z^*) \le \mathcal{L}(y, z^*)$. Since $\mathcal{L}(y, z)$ is differentiable in $y$ and $z$, the saddle points can be obtained as solutions to the equations


$$0_n = \frac{\partial}{\partial y} \mathcal{L}(y, z) = \frac{\partial f}{\partial y}(y) + L y + L z, \qquad 0_n = \frac{\partial}{\partial z} \mathcal{L}(y, z) = L y,$$
where $\frac{\partial f}{\partial y}(y) \in \mathbb{R}^n$ denotes the vector with entries $\frac{\partial f_i}{\partial y_i}(y_i)$.

Our motivation for introducing the Lagrangian is the following lemma.
Lemma 7.4 (Properties of saddle points). Let $L = L^\top \in \mathbb{R}^{n \times n}$ be a symmetric Laplacian associated with an undirected, connected, and weighted graph, and consider the Lagrangian function $\mathcal{L}$, where each $f_i$ is strictly convex and twice continuously differentiable for all $i \in \{1, \dots, n\}$. Then
(i) if $(y^*, z^*) \in \mathbb{R}^n \times \mathbb{R}^n$ is a saddle point of $\mathcal{L}$, then so is $(y^*, z^* + \alpha 1_n)$ for any $\alpha \in \mathbb{R}$;
(ii) if $(y^*, z^*) \in \mathbb{R}^n \times \mathbb{R}^n$ is a saddle point of $\mathcal{L}$, then $y^* = x^* 1_n$, where $x^* \in \mathbb{R}$ is a solution of the original optimization problem (7.4); and
(iii) if $x^* \in \mathbb{R}$ is a solution of the original optimization problem (7.4), then there are $z^* \in \mathbb{R}^n$ and $y^* = x^* 1_n$ satisfying $L z^* + \frac{\partial f}{\partial y}(y^*) = 0_n$, so that $(y^*, z^*)$ is a saddle point of $\mathcal{L}$.
We leave the proof to the reader in Exercise E7.10. Since the Lagrangian function is convex in $y$ and concave in $z$, we can compute its saddle points by following the so-called saddle-point dynamics, consisting of a negative and a positive gradient:
$$\dot y = -\frac{\partial}{\partial y} \mathcal{L}(y, z) = -\frac{\partial f}{\partial y}(y) - L y - L z, \tag{7.6a}$$
$$\dot z = +\frac{\partial}{\partial z} \mathcal{L}(y, z) = L y. \tag{7.6b}$$

For processor $i \in \{1, \dots, n\}$, the saddle-point dynamics (7.6) read component-wise as
$$\dot y_i = -\frac{\partial f_i}{\partial y_i}(y_i) - \sum_{j=1}^n a_{ij} (y_i - y_j) - \sum_{j=1}^n a_{ij} (z_i - z_j),$$
$$\dot z_i = \sum_{j=1}^n a_{ij} (y_i - y_j).$$

Hence, the saddle-point dynamics can be implemented in a distributed processor network using only local knowledge of $f_i(y_i)$, local computation, and nearest-neighbor communication, after, of course, discretizing the continuous-time dynamics. As shown in (Cherukuri and Cortés 2015; Droge et al. 2013; Gharesifard and Cortés 2014; Wang and Elia 2010), this distributed optimization setup is very versatile and robust, and it extends to directed graphs and non-differentiable convex objective functions. We will later use a powerful tool, termed the LaSalle Invariance Principle, to show that the saddle-point dynamics (7.6) always converge to the set of saddle points; see Exercise E13.2.
For now, we restrict our analysis to the case of quadratic cost functions $f_i(x) = (x - \bar x_i)^\top P_i (x - \bar x_i)$, where $P_i > 0$ and $\bar x_i \in \mathbb{R}$. In this case, the saddle-point dynamics (7.6) are a linear system
$$\begin{bmatrix} \dot{\tilde y} \\ \dot z \end{bmatrix} = \underbrace{\begin{bmatrix} -P - L & -L \\ L & 0 \end{bmatrix}}_{=A} \begin{bmatrix} \tilde y \\ z \end{bmatrix}, \tag{7.7}$$
where $\tilde y = y - x^* 1_n$ and $P = \operatorname{diag}(\{P_i\}_{i \in \{1,\dots,n\}})$. The matrix $A$ is a so-called saddle matrix (Benzi et al. 2005). In the following, we establish the convergence of the dynamics (7.7) to the set of saddle points. First, observe that $0$ is an eigenvalue of $A$ with multiplicity $1$ and that the corresponding eigenvector, given by $\big[0_n^\top \;\; 1_n^\top\big]^\top$, corresponds to the set of saddle points:
$$\begin{bmatrix} 0_n \\ 0_n \end{bmatrix} = \begin{bmatrix} -P - L & -L \\ L & 0 \end{bmatrix} \begin{bmatrix} \tilde y \\ z \end{bmatrix} \implies (P + L)\tilde y + L z = 0_n \text{ and } L \tilde y = 0_n \implies \tilde y \in \operatorname{span}(1_n),$$
and, by multiplying $(P + L)\tilde y + L z = 0_n$ by $\tilde y^\top$, we obtain $\tilde y^\top P \tilde y = 0$, which implies $\tilde y = 0_n$ and $z = \alpha 1_n$.

Next, note that $\frac{1}{2}(A + A^\top) = \begin{bmatrix} -P - L & 0 \\ 0 & 0 \end{bmatrix}$ is negative semidefinite. It follows from a Lyapunov argument or a standard linear algebra result (Bernstein 2009, Fact 5.10.28) that all eigenvalues of $A$ have real part less than or equal to zero. Since there is a unique zero eigenvalue, associated with the set of saddle points, it remains to show that the matrix $A$ has no purely imaginary eigenvalues. This is established in the following lemma, whose proof is left to the reader in Exercise E7.11:
Lemma 7.5 (Absence of sustained oscillations in saddle matrices). Consider a negative semidefinite matrix $B \in \mathbb{R}^{n \times n}$ and a not necessarily square matrix $C \in \mathbb{R}^{n \times m}$. If $\operatorname{kernel}(B) \cap \operatorname{image}(C) = \{0_n\}$, then the composite block-matrix
$$A = \begin{bmatrix} B & C \\ -C^\top & 0 \end{bmatrix}$$
has no eigenvalues on the imaginary axis except for $0$.



It follows that the saddle-point dynamics (7.7) converge to the set of saddle points $\big[\tilde y^\top \;\; z^\top\big]^\top \in \operatorname{span}\big(\big[0_n^\top \;\; 1_n^\top\big]^\top\big)$. Since $1_n^\top \dot z = 1_n^\top L \tilde y = 0$, it follows that $\operatorname{average}(z(t)) = \operatorname{average}(z(0))$, and we can further conclude that the dynamics converge to a unique saddle point satisfying $\lim_{t \to \infty} y(t) = x^* 1_n$ and $\lim_{t \to \infty} z(t) = z_0 1_n$.
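The following Matlab sketch simulates the saddle-point dynamics (7.6) for arbitrarily chosen quadratic costs on a ring graph; each local estimate $y_i(t)$ converges to the minimizer $x^* = \sum_i P_i \bar x_i / \sum_i P_i$ of the original problem (7.4).

% saddle-point dynamics for quadratic costs on an example ring graph
n = 6;
A = diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);
A(1,n) = 1; A(n,1) = 1;                        % undirected ring
L = diag(sum(A,2)) - A;
P = 1 + rand(n,1); xbar = randn(n,1);          % example local cost data
dyn = @(t,s) [-2*P.*(s(1:n)-xbar) - L*s(1:n) - L*s(n+1:2*n); L*s(1:n)];
[t, s] = ode45(dyn, [0 50], zeros(2*n,1));
[s(end,1:n)', sum(P.*xbar)/sum(P)*ones(n,1)]   % estimates y_i versus x*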


7.7 Exercises

E7.1 Properties of the matrix exponential. Recall the definition $e^A = \sum_{k=0}^\infty \frac{1}{k!} A^k$ for any square matrix $A$. Complete the following tasks:
(i) show that $\sum_{k=0}^\infty \frac{1}{k!} A^k$ converges absolutely for all square matrices $A$,
Hint: Recall that a matrix series $\sum_{k=1}^\infty A_k$ is said to converge absolutely if $\sum_{k=1}^\infty \|A_k\|$ converges, where $\|\cdot\|$ is a matrix norm. Introduce a sub-multiplicative matrix norm $\|\cdot\|$ and show $\|e^A\| \le e^{\|A\|}$.
(ii) show that, if $A = \operatorname{diag}(a_1, \dots, a_n)$, then $e^A = \operatorname{diag}(e^{a_1}, \dots, e^{a_n})$,
(iii) show that $e^{T A T^{-1}} = T e^A T^{-1}$ for any invertible $T$,
(iv) give an example of matrices $A$ and $B$ such that $e^{A+B} \ne e^A e^B$, and
(v) compute the matrix exponential $e^{tJ}$, where $J$ is a Jordan block of arbitrary size and $t \in \mathbb{R}$.

E7.2 Continuous-time affine systems. Given $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, consider the continuous-time affine system
$$\dot x(t) = A x(t) + b.$$
Assume $A$ is Hurwitz and, similarly to Exercise E2.10, show that
(i) the matrix $A$ is invertible,
(ii) the only equilibrium point of the system is $-A^{-1} b$, and
(iii) $\lim_{t \to \infty} x(t) = -A^{-1} b$ for all initial conditions $x(0) \in \mathbb{R}^n$.

E7.3 The matrix exponential of a Laplacian matrix. Let $L$ be a Laplacian matrix. Show that, for all $t > 0$:
(i) the matrix $\exp(-Lt)$ has unit row-sums;
(ii) the matrix $\exp(-Lt)$ is nonnegative and has strictly-positive diagonal entries;
Hint: Recall that $\exp(A + B) = \exp(A)\exp(B)$ if $AB = BA$ and that $\exp(a I_n) = e^a I_n$.
(iii) each solution to the Laplacian flow $\dot x = -Lx$ is bounded; and
(iv) in the weight-balanced case, when $L$ has zero column-sums, the matrix $\exp(-Lt)$ has unit column-sums.

E7.4 Euler discretization of the Laplacian. Consider a weighted digraph $G$ with Laplacian matrix $L$ and maximum out-degree $d_{\max} = \max\{d_{\text{out}}(1), \dots, d_{\text{out}}(n)\}$. Show that:
(i) if $\varepsilon < 1/d_{\max}$, then the matrix $I_n - \varepsilon L$ is row-stochastic,
(ii) if $\varepsilon < 1/d_{\max}$ and $G$ is weight-balanced, then the matrix $I_n - \varepsilon L$ is doubly-stochastic, and
(iii) if $\varepsilon < 1/d_{\max}$ and $G$ is strongly connected, then $I_n - \varepsilon L$ is primitive.
Given these results, note that (no additional assignment in what follows):
- $I_n - \varepsilon L$ is the one-step Euler discretization of the continuous-time Laplacian flow and is a discrete-time consensus algorithm; and
- $I_n - \varepsilon L$ is a possible choice of weights for an undirected unweighted graph (which is therefore also weight-balanced) in the design of a doubly-stochastic matrix (as we did in the discussion about Metropolis-Hastings).
E7.5 Doubly-stochastic matrices on strongly-connected digraphs. Given a strongly-connected unweighted digraph $G$, design weights along the edges of $G$ (and possibly add self-loops) so that the weighted adjacency matrix is doubly-stochastic.

E7.6 Constants of motion. In the study of mechanics, energy and momentum are two constants of motion, that is, these quantities are constant along each evolution of the mechanical system. Show that:
(i) if $A$ is a row-stochastic matrix with $w^\top A = w^\top$, then $w^\top x(k) = w^\top x(0)$ for all times $k \in \mathbb{Z}_{\ge 0}$, where $x(k+1) = A x(k)$;
(ii) if $L$ is a Laplacian matrix with $w^\top L = 0_n^\top$, then $w^\top x(t) = w^\top x(0)$ for all times $t \in \mathbb{R}_{\ge 0}$, where $\dot x(t) = -L x(t)$.

E7.7 Weight-balanced digraphs with a globally reachable node. Given a weighted directed graph $G$, show that, if $G$ is weight-balanced and has a globally reachable node, then $G$ is strongly connected.

E7.8 The Lyapunov equation for the Laplacian matrix of a strongly-connected digraph. Let $L$ be the Laplacian matrix of a strongly-connected weighted digraph. Find a positive-definite matrix $P$ such that
(i) $P L + L^\top P$ is positive semidefinite, and
(ii) $(P L + L^\top P)\, 1_n = 0_n$.

E7.9 $H_2$ performance of balanced averaging in continuous time. Consider the continuous-time averaging dynamics with disturbance
$$\dot x(t) = -L x(t) + w(t),$$
where $L = L^\top$ is the Laplacian matrix of an undirected and connected graph and $w(t)$ is an exogenous disturbance input signal. Pick a matrix $Q \in \mathbb{R}^{p \times n}$ satisfying $Q 1_n = 0_p$ and define the output signal $y(t) = Q x(t) \in \mathbb{R}^p$ as the solution from zero initial conditions $x(0) = 0_n$. Define the system $H_2$ norm from $w$ to $y$ by
$$\|H\|_2^2 = \int_0^\infty y(t)^\top y(t)\, dt = \int_0^\infty x(t)^\top Q^\top Q\, x(t)\, dt = \operatorname{trace}\Big(\int_0^\infty H(t)^\top H(t)\, dt\Big),$$
where $H(t) = Q e^{-Lt}$ is the so-called impulse response matrix.
(i) Show $\|H\|_2 = \sqrt{\operatorname{trace}(P)}$, where $P$ is the solution to the Lyapunov equation
$$L P + P L = Q^\top Q. \tag{E7.1}$$
(ii) Show $\|H\|_2 = \sqrt{\operatorname{trace}(L^\dagger Q^\top Q)/2}$, where $L^\dagger$ is the pseudoinverse of $L$.
(iii) Define short-range and long-range output matrices $Q_{\text{sr}}$ and $Q_{\text{lr}}$ by $Q_{\text{sr}}^\top Q_{\text{sr}} = L$ and $Q_{\text{lr}}^\top Q_{\text{lr}} = I_n - \frac{1}{n} 1_n 1_n^\top$, respectively. Show:
$$\|H\|_2^2 = \begin{cases} \dfrac{n-1}{2}, & \text{for } Q = Q_{\text{sr}}, \\[1ex] \displaystyle\sum_{i=2}^n \frac{1}{2 \lambda_i(L)}, & \text{for } Q = Q_{\text{lr}}. \end{cases}$$
Hint: The $H_2$ norm has several interesting interpretations, including the total output signal energy in response to a unit impulse input, or the root mean square of the output signal in response to a white noise input with identity covariance. You may find Theorem 7.3 and Exercise E6.7 useful.
E7.10 Properties of saddle points. Prove Lemma 7.4.
E7.11 Absence of sustained oscillations in saddle matrices. Prove Lemma 7.5.


Chapter 8

The Incidence Matrix and Relative Measurements
After studying adjacency and Laplacian matrices, in this chapter we introduce one final matrix associated
with a graph: the incidence matrix. We study the properties of incidence matrices and their application
to a class of estimation problems with relative measurements. For simplicity we restrict our attention to
undirected graphs. We borrow ideas from (Barooah 2007; Barooah and Hespanha 2007; Bolognani et al.
2010; Piovan et al. 2013) and refer to Biggs (1994); Foulds (1995); Godsil and Royle (2001) for more
information.

8.1 The incidence matrix

Let $G$ be an undirected unweighted graph with $n$ nodes and $m$ edges. Number the edges of $G$ with unique labels $e \in \{1, \dots, m\}$ and assign an arbitrary direction to each edge. The (oriented) incidence matrix $B \in \mathbb{R}^{n \times m}$ of the graph $G$ is defined component-wise by
$$B_{ie} = \begin{cases} +1, & \text{if node $i$ is the source node of edge $e$,} \\ -1, & \text{if node $i$ is the sink node of edge $e$,} \\ 0, & \text{otherwise.} \end{cases} \tag{8.1}$$
Note: $1_n^\top B = 0_m^\top$, since each column of $B$ contains precisely one element equal to $+1$, one element equal to $-1$, and all other elements equal to zero.
Note: if the edge $e \in \{1, \dots, m\}$ is oriented from $i$ to $j$, then, for any $x \in \mathbb{R}^n$,
$$(B^\top x)_e = x_i - x_j.$$

Example 8.1. Consider the four-node graph depicted in the figure, with edges $e_1, \dots, e_4$. We add an orientation to all edges, order them, and label them as follows: $e_1 = (1,2)$, $e_2 = (2,3)$, $e_3 = (4,2)$, and $e_4 = (3,4)$. Accordingly, the incidence matrix is
$$B = \begin{bmatrix} +1 & 0 & 0 & 0 \\ -1 & +1 & -1 & 0 \\ 0 & -1 & 0 & +1 \\ 0 & 0 & +1 & -1 \end{bmatrix}.$$
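In code, the incidence matrix is easily assembled from an oriented edge list; the following Matlab sketch reproduces the matrix of Example 8.1.

% build the incidence matrix of Example 8.1 from its oriented edge list
edges = [1 2; 2 3; 4 2; 3 4];     % edges e1,...,e4 as (source, sink) pairs
n = 4; m = size(edges,1);
B = zeros(n,m);
for e = 1:m
  B(edges(e,1),e) = +1;           % +1 at the source node of edge e
  B(edges(e,2),e) = -1;           % -1 at the sink node of edge e
end
B                                  % matches the matrix displayed above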

8.2 Properties of the incidence matrix

Given an undirected weighted graph $G$ with edge set $E$ and adjacency matrix $A$, recall that
$$L = D - A,$$
where $D$ is the degree matrix.

Lemma 8.2 (From the incidence to the Laplacian matrix). If $\operatorname{diag}(\{a_e\}_{e \in E})$ is the diagonal matrix of edge weights, then
$$L = B \operatorname{diag}(\{a_e\}_{e \in E})\, B^\top.$$
Proof. Recall that, for matrices $O$, $P$ and $Q$ of appropriate dimensions, we have $(OPQ)_{ij} = \sum_{k,h} O_{ik} P_{kh} Q_{hj}$. Moreover, if the matrix $P$ is diagonal, then $(OPQ)_{ij} = \sum_k O_{ik} P_{kk} Q_{kj}$.
For $i \ne j$, we compute
$$\big(B \operatorname{diag}(\{a_e\}_{e \in E}) B^\top\big)_{ij} = \sum_{e=1}^m B_{ie}\, a_e\, (B^\top)_{ej} = \sum_{e=1}^m B_{ie} B_{je}\, a_e = (+1)(-1)\, a_{ij} = \ell_{ij},$$
where the $e$-th term vanishes unless $e$ is the edge oriented between $\{i,j\}$, and where $L = \{\ell_{ij}\}_{i,j \in \{1,\dots,n\}}$. Along the diagonal we compute
$$\big(B \operatorname{diag}(\{a_e\}_{e \in E}) B^\top\big)_{ii} = \sum_{e=1}^m B_{ie}^2\, a_e = \sum_{e=1,\; e=(i,\cdot) \text{ or } e=(\cdot,i)}^m a_e = \sum_{j=1}^n a_{ij}.$$
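As a numeric check of Lemma 8.2, the following Matlab sketch verifies the factorization on the graph of Example 8.1 with unit edge weights.

% verify L = B*diag(a)*B' on the graph of Example 8.1 with unit weights
B = [1 0 0 0; -1 1 -1 0; 0 -1 0 1; 0 0 1 -1];
a = ones(4,1);                              % unit edge weights
A = [0 1 0 0; 1 0 1 1; 0 1 0 1; 0 1 1 0];   % adjacency of the same graph
norm(B*diag(a)*B' - (diag(sum(A,2)) - A))    % = 0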


Lemma 8.3 (Rank of the incidence matrix). Let $B$ be the incidence matrix of an undirected graph $G$ with $n$ nodes. Let $d$ be the number of connected components of $G$. Then
$$\operatorname{rank}(B) = n - d.$$
Proof. We prove this result for a connected graph with $d = 1$, but the proof strategy easily extends to $d > 1$. Recall that the rank of the Laplacian matrix $L$ equals $n - d = n - 1$. Since the Laplacian matrix can be factorized as $L = B \operatorname{diag}(\{a_e\}_{e \in E}) B^\top$, where $\operatorname{diag}(\{a_e\}_{e \in E})$ has full rank $m$ (and $m \ge n-1$ due to connectivity), we have that necessarily $\operatorname{rank}(B) \ge n-1$. On the other hand, $\operatorname{rank}(B) \le n-1$ since $B^\top 1_n = 0_m$. It follows that $B$ has rank $n-1$.


The factorization of the Laplacian matrix as $L = B \operatorname{diag}(\{a_e\}_{e \in E}) B^\top$ plays an important role in relative sensing networks. For example, we can decompose the Laplacian flow $\dot x = -Lx$ into
open-loop plant: $\dot x_i = u_i$, for $i \in \{1, \dots, n\}$, or $\dot x = u$;
measurements: $y_{ij} = x_i - x_j$, for $\{i,j\} \in E$, or $y = B^\top x$;
control gains: $z_{ij} = a_{ij} y_{ij}$, for $\{i,j\} \in E$, or $z = \operatorname{diag}(\{a_e\}_{e \in E})\, y$;
control inputs: $u_i = -\sum_{\{i,j\} \in E} z_{ij}$, for $i \in \{1, \dots, n\}$, or $u = -B z$.

Indeed, this control structure, illustrated as a block diagram in Figure 8.1, is required to implement flocking-type behavior as in Section 7.1. The control structure in Figure 8.1 has emerged as a canonical control structure in many relative sensing and flow network problems, also for more complicated open-loop dynamics and possibly nonlinear control gains (Bai et al. 2011).

Figure 8.1: Illustration of the canonical control structure for a relative sensing network.

8.3 Distributed estimation from relative measurements

In Chapter 1 we considered estimation problems for wireless sensor networks in which each node measures a scalar absolute quantity (expressing some environmental variable such as temperature, vibrations, etc.). In this section, we consider a second class of examples in which measurements are relative, i.e., pairs of nodes measure the difference between their corresponding variables. Estimation problems involving relative measurements are numerous. For example, imagine a group of robots (or sensors) where no robot can sense its position in an absolute reference frame, but each robot can measure other robots' relative positions by means of on-board sensors. Similar problems arise in the study of clock synchronization in networks of processors.

8.3.1 Problem statement

The problem of optimal estimation based on relative measurements is stated as follows. As illustrated in Figure 8.2, we are given an undirected graph $G = (\{1, \dots, n\}, E)$ with the following properties. First, each node $i \in \{1, \dots, n\}$ of the network is associated with an unknown scalar quantity $x_i$ (the $x$-coordinate of node $i$ in the figure).

Figure 8.2: A wireless sensor network in which sensors can measure each other's relative distance and bearing. We assume that, for each link between node $i$ and node $j$, the relative distance along the $x$-axis, $x_i - x_j$, is available, where $x_i$ is the $x$-coordinate of node $i$.

Second, the $m$ undirected edges are given an orientation and, for each edge $e = (i,j)$, $e \in E$, the following scalar measurements are available:
$$y_{(i,j)} = x_i - x_j + v_{(i,j)} = (B^\top x)_e + v_{(i,j)},$$
where $B$ is the graph incidence matrix and the measurement noises $v_{(i,j)}$, $(i,j) \in E$, are independent jointly-Gaussian variables with zero mean, $\mathbb{E}[v_{(i,j)}] = 0$, and variance $\mathbb{E}[v_{(i,j)}^2] = \sigma_{(i,j)}^2 > 0$. The joint covariance matrix is the diagonal matrix $\Sigma = \operatorname{diag}(\{\sigma_{(i,j)}^2\}_{(i,j) \in E}) \in \mathbb{R}^{m \times m}$.

The optimal estimate $\hat x$ of the unknown vector $x \in \mathbb{R}^n$ via the relative measurements $y \in \mathbb{R}^m$ is the solution to
$$\min_{\hat x}\; \|B^\top \hat x - y\|_{\Sigma^{-1}}^2.$$
Since no absolute information is available about $x$, we add the additional constraint that the optimal estimate should have zero mean, and we summarize this discussion as follows.

Definition 8.4 (Optimal estimation based on relative measurements). Given an incidence matrix $B$ and a set of relative measurements $y$ with covariance $\Sigma$, find $\hat x$ satisfying
$$\min_{\hat x \perp 1_n}\; \|B^\top \hat x - y\|_{\Sigma^{-1}}^2. \tag{8.2}$$

8.3.2 Optimal estimation via centralized computation

From the theory of least-squares estimation, the optimal solution to problem (8.2) is obtained by differentiating the quadratic cost function with respect to the unknown variable $\hat x$ and setting the derivative to zero. Specifically:
$$0 = \frac{\partial}{\partial \hat x} \|B^\top \hat x - y\|_{\Sigma^{-1}}^2 = 2 B \Sigma^{-1} B^\top \hat x - 2 B \Sigma^{-1} y.$$


The optimal solution is therefore obtained as the unique vector $\hat x \in \mathbb{R}^n$ satisfying
$$L \hat x = B \Sigma^{-1} y, \qquad 1_n^\top \hat x = 0, \tag{8.3}$$
where the Laplacian matrix $L$ is defined by $L = B \Sigma^{-1} B^\top$. This matrix is the Laplacian for the weighted graph whose weights are the inverse noise covariances $1/\sigma_{(i,j)}^2$ associated with each relative measurement edge.
Before proceeding, we review the definition and properties of the pseudoinverse Laplacian matrix given in Exercise E6.7. Recall that the Moore-Penrose pseudoinverse of an $n \times m$ matrix $M$ is the unique $m \times n$ matrix $M^\dagger$ with the following properties:
(i) $M M^\dagger M = M$,
(ii) $M^\dagger M M^\dagger = M^\dagger$, and
(iii) $M M^\dagger$ and $M^\dagger M$ are both symmetric.
For our Laplacian matrix $L$, let $U \in \mathbb{R}^{n \times n}$ be an orthonormal matrix of eigenvectors of $L$. It is known that
$$L = U \operatorname{diag}(0, \lambda_2, \dots, \lambda_n)\, U^\top \quad \text{and} \quad L^\dagger = U \operatorname{diag}(0, 1/\lambda_2, \dots, 1/\lambda_n)\, U^\top.$$
Moreover, it is known that $L L^\dagger = L^\dagger L = I_n - \frac{1}{n} 1_n 1_n^\top$ and $L^\dagger 1_n = 0_n$.

Lemma 8.5 (Unique optimal estimate). If the undirected graph $G$ is connected, then
(i) there exists a unique solution to equations (8.3), solving the optimization problem in equation (8.2); and
(ii) this unique solution is given by
$$\hat x = L^\dagger B \Sigma^{-1} y.$$
Proof. We claim there exists a unique solution to equation (8.3) and prove it as follows. Since $G$ is connected, the rank of $L$ is $n-1$. Moreover, since $L$ is symmetric and since $L 1_n = 0_n$, the image of $L$ is the $(n-1)$-dimensional vector subspace orthogonal to the subspace spanned by the vector $1_n$. The vector $B \Sigma^{-1} y$ belongs to the image of $L$ because the column-sums of $B$ are zero, that is, $1_n^\top B = 0_m^\top$, so that $1_n^\top B \Sigma^{-1} y = 0$. Finally, the requirement that $1_n^\top \hat x = 0$ ensures that $\hat x$ is perpendicular to the kernel of $L$.
The expression $\hat x = L^\dagger B \Sigma^{-1} y$ follows from left-multiplying both sides of equation (8.3) by the pseudoinverse Laplacian matrix $L^\dagger$ and using the property $L^\dagger L = I_n - \frac{1}{n} 1_n 1_n^\top$. One can also verify that $1_n^\top L^\dagger B \Sigma^{-1} y = 0$, because $L^\dagger 1_n = 0_n$.
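The following Matlab sketch computes the centralized optimal estimate on an arbitrarily chosen oriented four-node ring, with unit noise variances and a hypothetical ground truth $x$.

% centralized optimal estimate on an example oriented 4-node ring
B = [1 0 0 -1; -1 1 0 0; 0 -1 1 0; 0 0 -1 1];  % edges (1,2),(2,3),(3,4),(4,1)
x = [0; 1; 3; 2]; x = x - mean(x);              % hypothetical truth, zero mean
Sigma = eye(4);                                  % unit noise variances
y = B'*x + 0.1*randn(4,1);                       % noisy relative measurements
L = B*(Sigma\B');
xhat = pinv(L)*(B*(Sigma\y))                     % close to x, with sum(xhat) = 0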

8.3.3 Optimal estimation via decentralized computation

To compute $\hat x$ in a distributed way, we propose the following distributed algorithm; see (Bolognani et al. 2010, Theorem 4). Pick a small $\varepsilon > 0$ and let each node implement the affine averaging algorithm:
$$\hat x_i(k+1) = \hat x_i(k) - \varepsilon \sum_{j \in N(i)} \frac{1}{\sigma_{(i,j)}^2} \big(\hat x_i(k) - \hat x_j(k) - y_{(i,j)}\big), \qquad \hat x_i(0) = 0. \tag{8.4}$$

There are two interpretations of this algorithm. First, note that the estimate at node $i$ is adjusted at each iteration as a function of edge errors: each edge error (the difference between the estimated and the measured edge difference) contributes a small weighted correction to the node value. Second, note that the affine Laplacian flow
$$\dot{\bar x} = -L \bar x + B \Sigma^{-1} y \tag{8.5}$$
results in a steady state satisfying $L \bar x = B \Sigma^{-1} y$, which readily delivers the optimal estimate $\hat x = L^\dagger B \Sigma^{-1} y$ for appropriately chosen initial conditions. The algorithm (8.4) results from an Euler discretization of the affine Laplacian flow (8.5) with step size $\varepsilon$.
Lemma 8.6. Given a graph $G$ describing a relative measurement problem for the unknown variables $x \in \mathbb{R}^n$, with measurements $y \in \mathbb{R}^m$ and measurement covariance matrix $\Sigma = \operatorname{diag}(\{\sigma_{(i,j)}^2\}_{(i,j) \in E}) \in \mathbb{R}^{m \times m}$, the following statements hold:
(i) the affine averaging algorithm can be written as
$$\hat x(k+1) = (I_n - \varepsilon L)\, \hat x(k) + \varepsilon B \Sigma^{-1} y, \qquad \hat x(0) = 0_n. \tag{8.6}$$
(ii) if $G$ is connected and if $\varepsilon < 1/d_{\max}$, where $d_{\max}$ is the maximum weighted degree of $G$, then the solution $k \mapsto \hat x(k)$ of the affine averaging algorithm (8.4) converges to the unique solution $\hat x$ of the optimization problem (8.2).

Proof. To show fact (i), note that the algorithm can be written in vector form as
$$\hat x(k+1) = \hat x(k) - \varepsilon B \Sigma^{-1} \big(B^\top \hat x(k) - y\big),$$
and, using $L = B \Sigma^{-1} B^\top$, as equation (8.6).
To show fact (ii), define the error signal $\eta(k) = \hat x - \hat x(k)$. Note that $\eta(0) = \hat x$ and that $\operatorname{average}(\eta(0)) = 0$ because $1_n^\top \hat x = 0$. Compute
$$\eta(k+1) = (I_n - \varepsilon L + \varepsilon L)\hat x - (I_n - \varepsilon L)\hat x(k) - \varepsilon B \Sigma^{-1} y = (I_n - \varepsilon L)\eta(k) + \varepsilon \big(L \hat x - B \Sigma^{-1} y\big) = (I_n - \varepsilon L)\eta(k).$$
Now, according to Exercise E7.4, $\varepsilon$ is sufficiently small so that $I_n - \varepsilon L$ is nonnegative. Moreover, since $(I_n - \varepsilon L)$ is doubly-stochastic and symmetric, and its corresponding undirected graph is connected and aperiodic, Corollary 5.1 implies that $\eta(k) \to \operatorname{average}(\eta(0))\, 1_n = 0_n$.
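The following Matlab sketch runs the affine averaging iteration (8.6) on the same example four-node ring, with a step size satisfying $\varepsilon < 1/d_{\max}$; it recovers the centralized estimate $L^\dagger B \Sigma^{-1} y$.

% affine averaging iteration (8.6) on the example 4-node ring
B = [1 0 0 -1; -1 1 0 0; 0 -1 1 0; 0 0 -1 1];
Sigma = eye(4);
y = B'*[-1.5; -0.5; 1.5; 0.5] + 0.1*randn(4,1);  % measurements, zero-mean truth
L = B*(Sigma\B');
eps = 0.1; xk = zeros(4,1);                       % eps < 1/dmax = 1/2
for k = 1:500
  xk = (eye(4) - eps*L)*xk + eps*B*(Sigma\y);
end
[xk, pinv(L)*(B*(Sigma\y))]                       % matches the optimal estimate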

8.4 Cycle and cutset spaces

As stated in the factorization in Lemma 8.2, we know that the incidence matrix contains at least as much
information as the Laplacian matrix. Indeed, we argue via the following example that the incidence
matrix contains additional information and subtleties.
Recall the distributed estimation problem of Section 8.3, here defined over an undirected ring graph. Introduce next an arbitrary orientation for the edges and, for simplicity, assume all edges are oriented counterclockwise. Then, in the absence of noise, a summation of all measurements $y_{(i,j)}$ in this ring graph yields
$$\sum_{(i,j) \in E} y_{(i,j)} = \sum_{(i,j) \in E} (x_i - x_j) = 0 \quad \text{or} \quad 1_n^\top y = 1_n^\top B^\top x = 0,$$
that is, all relative measurements around the ring cancel out. Equivalently, $1_n \in \operatorname{kernel}(B)$. This consistency check can be used as additional information to process corrupted measurements.
These insights generalize to arbitrary graphs, and the nullspace of B and its orthogonal complement,
the image of B > , can be related to cycles and cutsets in the graph. In what follows, we present some
of these generalizations; the presentation in this section is inspired by (Biggs 1994; Zelazo 2009). As a
running example in this section we use the graph and the incidence matrix illustrated in Figure 8.3.

Figure 8.3: An undirected graph with arbitrary edge orientation and its associated incidence matrix $B \in \mathbb{R}^{6 \times 7}$.

Definition 8.7 (Signed path vector). Given an undirected graph $G$, consider an arbitrary orientation of its $m$ edges and a simple path $\gamma$. The signed path vector $v \in \mathbb{R}^m$ corresponding to the path $\gamma$ is defined by
$$v_i = \begin{cases} +1, & \text{if edge $i$ is traversed positively by $\gamma$,} \\ -1, & \text{if edge $i$ is traversed negatively by $\gamma$,} \\ 0, & \text{otherwise.} \end{cases}$$

Proposition 8.8 (Cycle space). Given an undirected graph $G$, consider an arbitrary orientation of its edges and the incidence matrix $B \in \mathbb{R}^{n \times m}$. The null space of $B$, called the cycle space, is spanned by the signed path vectors corresponding to all the cycles in $G$.
The proposition follows from the following lemma.
Lemma 8.9. Given an undirected graph $G$, consider an arbitrary orientation of its edges, its incidence matrix $B \in \mathbb{R}^{n \times m}$, and a simple path $\gamma$ with distinct initial and final nodes, described by a signed path vector $v \in \mathbb{R}^m$.

The vector $y = Bv$ has components
$$y_i = \begin{cases} +1, & \text{if node $i$ is the initial node of $\gamma$,} \\ -1, & \text{if node $i$ is the final node of $\gamma$,} \\ 0, & \text{otherwise.} \end{cases}$$
Proof. We write $y = B \operatorname{diag}(v)\, 1_m$. The $(i,e)$ element of the matrix $B \operatorname{diag}(v)$ takes the value $-1$ (respectively $+1$) if edge $e$ is used by the path $\gamma$ to enter (respectively leave) node $i$. Now, if node $i$ is not the initial or final node of the path $\gamma$, then the $i$th row sum of $B \operatorname{diag}(v)$, namely $(B \operatorname{diag}(v) 1_m)_i$, is zero. For the initial node, $(B \operatorname{diag}(v) 1_m)_i = 1$, and for the final node, $(B \operatorname{diag}(v) 1_m)_i = -1$.

For the example in Figure 8.3, two cycles and their signed path vectors are illustrated in Figure 8.4. Observe that $v_1, v_2 \in \operatorname{kernel}(B)$ and that the cycle traversing the edges $(1, 3, 7, 5, 2)$ in counter-clockwise orientation has a signed path vector given by the linear combination $v_1 + v_2$.

Figure 8.4: Two cycles and their respective signed path vectors in $\operatorname{kernel}(B)$.
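As a minimal numeric illustration, on a triangle graph rather than the graph of Figure 8.3, the kernel of $B$ is spanned by the signed path vector of the single cycle:

% cycle space of a triangle graph with edges (1,2), (2,3), (3,1)
B = [1 0 -1; -1 1 0; 0 -1 1];
null(B)'                        % proportional to [1 1 1]: the cycle vector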

Definition 8.10 (Cutset orientation vector). Given an undirected graph $G$, consider an arbitrary orientation of its edges and a partition of its vertices $V$ into two non-empty and disjoint sets $V_1$ and $V_2$. The cutset orientation vector $v \in \mathbb{R}^m$ corresponding to the partition $V = V_1 \cup V_2$ has components
$$v_e = \begin{cases} +1, & \text{if edge $e$ has its source node in $V_1$ and its sink node in $V_2$,} \\ -1, & \text{if edge $e$ has its sink node in $V_1$ and its source node in $V_2$,} \\ 0, & \text{otherwise.} \end{cases}$$

Proposition 8.11 (Cutset space). Given an undirected graph $G$, consider an arbitrary orientation of its edges and its incidence matrix $B \in \mathbb{R}^{n \times m}$. The image of $B^\top$, called the cutset space, is spanned by all cutset orientation vectors corresponding to all partitions of the graph.
Proof. For a cutset orientation vector $v \in \mathbb{R}^m$ associated with the partition $V = V_1 \cup V_2$, we have
$$v^\top = \frac{1}{2}\Big( \sum_{i \in V_1} b_i^\top - \sum_{i \in V_2} b_i^\top \Big),$$
where $b_i^\top$ is the $i$th row of the incidence matrix. If $Bx = 0_n$ for some $x \in \mathbb{R}^m$, then $b_i^\top x = 0$ for all $i \in \{1, \dots, n\}$. It follows that $v^\top x = 0$, or equivalently, $v$ belongs to the orthogonal complement of


$\operatorname{kernel}(B)$, which is the image of $B^\top$. Finally, notice that the image of $B^\top$ can be constructed this way: the $k$th column of $B^\top$ is obtained by choosing the partition $V_1 = \{k\}$ and $V_2 = V \setminus \{k\}$. Thus, the cutset orientation vectors span the image of $B^\top$.

Since $\operatorname{rank}(B) = n - 1$, any $n-1$ columns of the matrix $B^\top$ form a basis for the cutset space. For instance, the $i$th column corresponds to the cut isolating node $i$, as $V = \{i\} \cup V \setminus \{i\}$. For the example in Figure 8.3, five cuts and their cutset orientation vectors are illustrated in Figure 8.5. Observe that $v_i \in \operatorname{image}(B^\top)$, for $i \in \{1, \dots, 5\}$, and that the cut isolating node 6 has a cutset orientation vector given by the linear combination $-(v_1 + v_2 + v_3 + v_4 + v_5)$. Likewise, the cut separating nodes $\{1, 2, 3\}$ from $\{4, 5, 6\}$ has the cutset vector $v_1 + v_2 + v_3$, corresponding to the sum of the first three columns of $B^\top$.

Figure 8.5: Five cuts and their cutset orientation vectors in $\operatorname{image}(B^\top)$.

Example 8.12 (Nonlinear network flow problem). Consider a network flow problem where a commodity (e.g., power or water) is transported through a network (e.g., a power grid or a piping system). We model this scenario with an undirected and connected graph with $n$ nodes. With each node we associate an external supply/demand variable $y_i$ (positive for a source and negative for a sink) and assume that the overall network is balanced: $\sum_{i=1}^n y_i = 0$. We also associate a potential variable $x_i$ with every node (e.g., voltage or pressure), and assume the flow of commodity between two connected nodes $i$ and $j$ depends on the potential difference as $f_{ij}(x_i - x_j)$, where $f_{ij}$ is a strictly increasing function satisfying $f_{ij}(0) = 0$. For example, for piping systems and power grids these functions $f_{ij}$ are given by the rational Hazen-Williams flow and the trigonometric power flow, respectively, which are both monotone in the region of interest. By balancing the flow at each node (akin to Kirchhoff's current law), we obtain at node $i$
$$y_i = \sum_{j=1}^n a_{ij} f_{ij}(x_i - x_j), \quad i \in \{1, \dots, n\},$$
where $a_{ij} \in \{0,1\}$ is the $(i,j)$ element of the network adjacency matrix. In vector notation, the flow balance is
$$y = B f\big(B^\top x\big),$$
where $f$ is the vector-valued function with components $f_{ij}$. Consider also the associated linearized problem $y = B B^\top x = L x$, where $L$ is the network Laplacian matrix and where we implicitly assumed $f'_{ij}(0) = 1$. The flows in the linear problem are obtained as $B^\top x^\star = B^\top L^\dagger y$, where $L^\dagger$ is the Moore-Penrose inverse of $L$; see

Exercises E6.7 and E6.8. In the following we restrict ourselves to an acyclic network and show that the nonlinear solution can be obtained from the solution of the linear problem.
We formally replace the flow f(B^T x) by a new variable v := f(B^T x) and arrive at

    y = B v,    (8.7a)
    v = f(B^T x).    (8.7b)

In the acyclic case, kernel(B) = {0_m} and necessarily v ∈ image(B^T), or v = B^T w for some w ∈ R^n. Thus, equation (8.7a) reads as y = B v = B B^T w = L w and its solution is w = L† y. Equation (8.7b) then reads as f(B^T x) = v = B^T w = B^T L† y, and its unique solution (due to monotonicity) is B^T x* = f^{−1}(B^T L† y).
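To make this construction concrete, here is a minimal numerical sketch (not from the original text): a hypothetical acyclic graph (a path on four nodes), hypothetical balanced injections y, and the strictly increasing flow function f_ij = sinh (so that f(0) = 0 and f is invertible), verifying that the recovered potentials balance the nonlinear flows.

```python
import numpy as np

# Acyclic network: path graph 1-2-3-4 with oriented edges (1,2), (2,3), (3,4).
B = np.array([[ 1.,  0.,  0.],
              [-1.,  1.,  0.],
              [ 0., -1.,  1.],
              [ 0.,  0., -1.]])       # incidence matrix, n = 4 nodes, m = 3 edges

y = np.array([2.0, 1.0, -0.5, -2.5])  # balanced supply/demand: y.sum() == 0

f, f_inv = np.sinh, np.arcsinh        # strictly increasing with f(0) = 0

L = B @ B.T                           # Laplacian of the linearized problem
w = np.linalg.pinv(L) @ y             # w = L^+ y solves the linear problem
Btx = f_inv(B.T @ w)                  # B^T x* = f^{-1}(B^T L^+ y)

assert np.allclose(B @ f(Btx), y)     # nonlinear flow balance y = B f(B^T x*)
print(Btx)                            # potential differences across the edges
```

The assertion holds precisely because, on an acyclic graph, the edge flows of the linear and nonlinear problems coincide.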


8.5 Exercises

E8.1 Continuous distributed estimation from relative measurements. Consider the continuous distributed estimation algorithm given by the affine Laplacian flow (8.5). Show that, for an undirected and connected graph G and the appropriate initial condition x̂(0) = 0_n, the affine Laplacian flow (8.5) converges to the unique solution x̂* of the estimation problem given in Lemma 8.5.

E8.2 The edge Laplacian matrix (Zelazo and Mesbahi 2011). For an unweighted undirected graph with n nodes and m edges, introduce an arbitrary orientation for the edges and recall the notions of incidence matrix B ∈ R^{n×m} and Laplacian matrix L = B B^T ∈ R^{n×n}. Next, define the edge Laplacian matrix L_edge = B^T B ∈ R^{m×m} and show that
(i) kernel(L_edge) = kernel(B);
(ii) for an acyclic graph, L_edge is nonsingular;
(iii) the non-zero eigenvalues of L_edge are equal to the non-zero eigenvalues of L; and
(iv) rank(L) = rank(L_edge).

E8.3 Evolution of the local disagreement error. Consider the Laplacian flow ẋ = −Lx, x(0) = x_0, defined over an undirected and connected graph with n nodes and m edges. Besides the absolute disagreement error δ = x − average(x_0) 1_n ∈ R^n considered thus far, we can also analyze the relative disagreement errors e_ij = x_i − x_j, for {i, j} ∈ E.
(i) Write a differential equation for e ∈ R^m.
(ii) Based on Exercise E8.2, show that the relative disagreement errors converge to zero with exponential convergence rate given by the algebraic connectivity λ_2(L).

E8.4 Averaging with distributed integral control. Consider a Laplacian flow implemented as a relative sensing network over a connected and undirected graph with incidence matrix B ∈ R^{n×|E|} and weights a_ij > 0 for {i, j} ∈ E, and subject to a constant disturbance term δ ∈ R^{|E|}, as shown in Figure E8.1.

Figure E8.1: A relative sensing network with a constant disturbance input δ ∈ R^{|E|}.

(i) Derive the dynamic closed-loop equations describing the model in Figure E8.1.
(ii) Show that asymptotically all states x(t) converge to some constant vector x_∞ ∈ R^n depending on the value of the disturbance δ, i.e., x_∞ is not necessarily a consensus state.
Consider now the system in Figure E8.1 with a distributed integral controller forcing convergence to consensus, as shown in Figure E8.2. Recall that 1/s is the Laplace symbol for the integrator.

Figure E8.2: Relative sensing network with a disturbance δ ∈ R^{|E|} and distributed integral action.
(iii) Derive the dynamic closed-loop equations describing the model in Figure E8.2.
(iv) Show that the distributed integral controller in Figure E8.2 asymptotically stabilizes the set of steady states (x_∞, p_∞), where x_∞ ∈ span(1_n) corresponds to consensus.
Hint: To show stability, use Lemma 7.5.


Chapter 9

Compartmental and Positive Systems


This chapter is inspired by the excellent text (Walter and Contreras 1999) and the tutorial treatment in (Jacquez and Simon 1993); see also the texts (Farina and Rinaldi 2000; Haddad et al. 2010; Luenberger 1979).
Before discussing compartmental systems, it is convenient to introduce a key definition. A square matrix A is said to be Metzler if all its off-diagonal elements are nonnegative. In other words, A is Metzler if and only if there exists a scalar a > 0 such that A + a I_n is nonnegative. For example, if G is a weighted digraph and L is its Laplacian matrix, then −L is a Metzler matrix with zero row-sums.

9.1 Introduction and example systems

Compartmental systems model dynamical processes characterized by conservation laws (e.g., mass, fluid, energy) and by the flow of material between units known as compartments. Example compartmental systems are transportation networks, queueing networks, communication networks, and epidemic propagation models in social contact networks, as well as ecological and biological networks. We review some examples in what follows.

Ecological and environmental systems   The flow of energy and nutrients (water, nitrates, phosphates, etc.) in ecosystems is typically studied using compartmental modelling. For example, Figure 9.1 illustrates a widely-cited water flow model for a desert ecosystem (Noy-Meir 1973). Other classic ecological network systems include models for dissolved oxygen in streams, nutrient flow in forest growth, and biomass flow in fisheries (Walter and Contreras 1999).

Epidemiology of infectious diseases   To study the propagation of infectious diseases, the population at risk is typically divided into compartments consisting of individuals who are susceptible (S), infected (I), and, possibly, recovered and no longer susceptible (R). As illustrated in Figure 9.2, the three basic epidemiological models (Hethcote 2000) are called SI, SIS, and SIR, depending upon how the disease spreads. A detailed discussion is postponed until Chapter 16.

Figure 9.1: Water flow model for a desert ecosystem, with compartments soil, plants, and animals, inflow precipitation, internal flows uptake, drinking, and herbivory, and outflows evaporation, drainage, runoff, and transpiration. The blue line denotes an inflow from the outside environment. The red lines denote outflows into the outside environment.
Figure 9.2: The three basic models SI, SIS, and SIR for the propagation of an infectious disease.

Drug and chemical kinetics in biomedical systems   Compartmental models are also widely adopted to characterize the kinetics of drugs and chemicals in biomedical systems. Here is a classic example (Charkes et al. 1978) from nuclear medicine: bone scintigraphy (also called a bone scan) is a medical test in which the patient is injected with a small amount of radioactive material and then scanned with an appropriate radiation camera.
Figure 9.3: The kinetics of a radioactive isotope through the human body, with compartments blood, kidneys, urine, bone ECF, bone, and the rest of the body (ECF = extra-cellular fluid).

9.2 Compartmental systems

A compartmental system is a system in which material is stored at individual locations and is transferred along the edges of a directed graph, called the compartmental digraph; see Figure 9.4(b). The storage nodes are referred to as compartments; each compartment contains a time-varying quantity q_i(t).

Figure 9.4: A compartment and a compartmental system.

Each directed arc (i, j) represents a mass flow (or flux), denoted F_{i→j}, from compartment i to compartment j. The compartmental system also interacts with its surrounding environment via appropriate input and output arcs, denoted in figures by blue and red arcs respectively: the inflow from the environment into compartment i is denoted by u_i and the outflow from compartment i into the environment is denoted by F_{i→0}.
The dynamic equations describing the system evolution are precisely obtained by the instantaneous flow balance equations at each compartment, that is, the rate of accumulation at each compartment equals the net inflow rate:

    q̇_i(t) = ∑_{j=1, j≠i}^n (F_{j→i} − F_{i→j}) − F_{i→0} + u_i.    (9.1)

In general, the flow along (i, j) is a function of the entire system state q = (q_1, . . . , q_n) and of time t, so that F_{i→j} = F_{i→j}(q, t).
Note that the mass in each of the compartments as well as the mass flowing along each of the edges must be nonnegative at all times. Specifically, we require the mass flow functions to satisfy

    F_{i→j}(q, t) ≥ 0 for all (q, t),   and   F_{i→j}(q, t) = 0 for all (q, t) such that q_i = 0.    (9.2)

Under these conditions, if at some time t_0 one of the compartments has no mass, that is, q_i(t_0) = 0 and q(t_0) ∈ R^n_{≥0}, it follows that q̇_i(t_0) = ∑_{j=1, j≠i}^n F_{j→i}(q(t_0), t_0) + u_i ≥ 0, so that q_i does not become negative. In summary, the compartmental system (9.1) is called positive in the sense that q(t) ∈ R^n_{≥0} for all t, provided q(0) ∈ R^n_{≥0}.
If M(q) = ∑_{i=1}^n q_i = 1_n^T q denotes the total mass in the system, then along the solutions of (9.1)

    d/dt M(q(t)) = 1_n^T q̇(t) = − ∑_{i=1}^n F_{i→0}(q(t), t) + ∑_{i=1}^n u_i,    (9.3)

where the first term is the outflow into the environment and the second is the inflow from the environment. This equality implies that the total mass t ↦ M(q(t)) is constant in systems without inflows and outflows.
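Both properties — positivity and the mass balance (9.3) — are easy to observe numerically. The following forward-Euler sketch uses a hypothetical closed two-compartment system (no inflows or outflows) with donor-controlled flows satisfying (9.2); the rates are arbitrary illustration values.

```python
import numpy as np

# Two compartments exchanging mass: F_{1->2}(q) = 0.3 q_1, F_{2->1}(q) = 0.1 q_2.
# Both flows satisfy (9.2): nonnegative, and zero whenever the donor is empty.
def qdot(q):
    F12, F21 = 0.3 * q[0], 0.1 * q[1]
    return np.array([F21 - F12, F12 - F21])

q, dt = np.array([5.0, 0.0]), 0.01
for _ in range(2000):                 # forward Euler on t in [0, 20]
    q = q + dt * qdot(q)
    assert (q >= 0).all()             # positivity is preserved

print(q, q.sum())                     # total mass M(q) stays equal to 5.0
```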

Linear compartmental systems

Intuitively speaking, a compartmental system is linear if it has constant nonnegative inflow from the environment, and nonnegative flows (between compartments and from compartments into the environment) that depend linearly upon the mass in the originating compartment.

Definition 9.1 (Linear compartmental system and digraph). A linear compartmental system with n compartments is a triplet (F, f_0, u) consisting of
(i) a nonnegative n × n matrix F = (f_ij)_{i,j∈{1,...,n}} with zero diagonal, called the flow rate matrix,
(ii) a vector f_0 ≥ 0_n, called the outflow rates vector, and
(iii) a vector u ≥ 0_n, called the inflow vector.

The weighted digraph associated to F, called the compartmental digraph, encodes the following information: the nodes are the compartments {1, . . . , n}, there is an edge (i, j) if there is a flow from compartment i to compartment j, and the weight f_ij of the (i, j) edge is the corresponding flow rate constant. The compartmental digraph has no self-loops. In a linear compartmental system,

    F_{i→j}(q, t) = f_ij q_i, for j ∈ {1, . . . , n},   F_{i→0}(q, t) = f_{0i} q_i,   and   u_i(q, t) = u_i.

Indeed, this model is also referred to as donor-controlled flow. Note that this model satisfies the physically-meaningful relationships (9.2). The affine dynamics describing the linear compartmental system are

    q̇_i(t) = −( f_{0i} + ∑_{j=1, j≠i}^n f_ij ) q_i(t) + ∑_{j=1, j≠i}^n f_ji q_j(t) + u_i.    (9.4)

Definition 9.2 (Compartmental matrix). The compartmental matrix C = (c_ij)_{i,j∈{1,...,n}} of a compartmental system (F, f_0, u) is defined by

    c_ij = f_ji, if i ≠ j,   and   c_ii = −f_{0i} − ∑_{h=1, h≠i}^n f_hi.

Equivalently, if L_F = diag(F 1_n) − F is the Laplacian matrix of the compartmental digraph,

    C = −L_F^T − diag(f_0) = F^T − diag(F 1_n + f_0).    (9.5)

In what follows it is convenient to call compartmental any matrix C with the following properties:
(i) C is Metzler, i.e., its off-diagonal elements are nonnegative: c_ij ≥ 0 for i ≠ j,
(ii) C has nonpositive diagonal entries: c_ii ≤ 0, and
(iii) C is column diagonally dominant in the sense that |c_ii| ≥ ∑_{j=1, j≠i}^n c_ji.


With the notion of compartmental matrix, the dynamics of the linear compartmental system (9.4) can be written as

    q̇(t) = C q(t) + u.    (9.6)

Moreover, since L_F 1_n = 0_n, we know 1_n^T C = −f_0^T and, consistently with equation (9.3), d/dt M(q(t)) = −f_0^T q(t) + 1_n^T u.
Algebraic and graphical properties of linear compartmental systems

In this section we present useful properties of compartmental matrices that are parallel to those enjoyed by Laplacian matrices.

Lemma 9.3 (Spectral properties of compartmental matrices). For a compartmental system (F, f_0, u) with compartmental matrix C,
(i) if λ ∈ spec(C), then either λ = 0 or Re(λ) < 0, and
(ii) C is invertible if and only if C is Hurwitz (i.e., Re(λ) < 0 for all λ ∈ spec(C)).

Proof. We here sketch the proof and invite the reader to fill out the details in Exercise E9.1. Statement (i) is akin to the result in Lemma 6.3 and can be proved by an application of the Geršgorin Disks Theorem 2.9. Statement (i) immediately implies statement (ii).

Next, we introduce some useful graph-theoretical notions, illustrated in Figure 9.5. In the compartmental digraph, a set of compartments S is
(i) outflow-connected if there exists a directed path from every compartment in S to the environment, that is, to a compartment j with a positive outflow rate constant f_{0j} > 0,
(ii) a trap if there is no directed path from any of the compartments in S to the environment or to any compartment outside S, and
(iii) a simple trap if it is a trap that has no traps inside it.
It is immediate to realize the following equivalence: the system is outflow-connected (i.e., all compartments are outflow-connected) if and only if the system contains no trap. (An outflow-connected compartmental matrix is referred to as weakly chained diagonally dominant in (Shivakumar et al. 1996) and related references.)
Theorem 9.4 (Algebraic graph theory of compartmental systems). Consider the linear compartmental system (F, f_0, u) with dynamics (9.6), compartmental matrix C, and compartmental digraph G_F. The following statements are equivalent:
(i) the system is outflow-connected,
(ii) each sink of the condensation of G_F is outflow-connected, and
(iii) the compartmental matrix C is invertible.
Moreover, the sinks of the condensation of G_F that are not outflow-connected are precisely the simple traps of the system, and their number equals the multiplicity of 0 as a semisimple eigenvalue of C.

(a) An example compartmental system and its strongly connected components: this system is outflow-connected because its two sinks in the condensation digraph are outflow-connected. (b) This compartmental system is not outflow-connected because one of its sink strongly-connected components is a trap.

Figure 9.5: Outflow-connectivity and traps in compartmental systems.

The proof of the equivalence between (i) and (iii) is similar to the proof in Theorem 6.4.

Proof. The equivalence between statements (i) and (ii) is immediate.
To establish the equivalence between (ii) and (iii), we first consider the case in which G_F is strongly connected and at least one compartment has a strictly positive outflow rate. Therefore, the Laplacian matrix L_F of G_F and the compartmental matrix C = −L_F^T − diag(f_0) are irreducible. Pick 0 < ε < 1/max_i |c_ii|, and define A = I_n + ε C^T. Because of the definition of ε, the matrix A is nonnegative and irreducible. We compute its row sums as follows:

    A 1_n = 1_n − ε (L_F + diag(f_0)) 1_n = 1_n − ε f_0.

Therefore, A is row-substochastic as defined in Exercise E2.5, that is, all its row-sums are at most 1 and one row-sum is strictly less than 1. Moreover, because A is irreducible, the results in Exercise E4.4 imply that ρ(A) < 1. Now, let η_1, . . . , η_n denote the eigenvalues of A. Because A = I_n + ε C^T, we know that the eigenvalues λ_1, . . . , λ_n of C satisfy η_i = 1 + ε λ_i, so that max_i Re(η_i) = 1 + ε max_i Re(λ_i). Finally, we note that ρ(A) < 1 implies max_i Re(η_i) < 1 so that

    max_i Re(λ_i) = (1/ε)(max_i Re(η_i) − 1) < 0.

This concludes the proof that, if G_F is strongly connected, then C has eigenvalues with strictly negative real part. The converse is easy to prove by contradiction: if f_0 = 0_n, then the matrix A = I_n + ε C^T is row-stochastic, so that 0 ∈ spec(C), which contradicts the assumption that C is invertible.
Next, to prove the equivalence between (ii) and (iii) for a graph G_F whose condensation digraph has an arbitrary number of sinks, we proceed as in the proof of Theorem 6.4: we reorder the compartments as described in Exercise E3.1 so that the Laplacian matrix L_F is block lower-triangular as in equation (6.5). We then define an appropriately small ε and the matrix A = I_n + ε C^T as above. We leave the remaining details to the reader.
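The graphical test in Theorem 9.4 is straightforward to implement. The sketch below (a hypothetical digraph, using the networkx package) computes the condensation of G_F and reports the sinks with no outflow, i.e., the simple traps; the system is outflow-connected, and C invertible, exactly when this list is empty.

```python
import networkx as nx

# Hypothetical compartmental digraph on compartments {0, 1, 2, 3};
# only compartment 3 has an outflow into the environment.
G  = nx.DiGraph([(0, 1), (1, 2), (2, 1), (1, 3)])
f0 = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.5}

cond  = nx.condensation(G)           # DAG of strongly connected components
sinks = [c for c in cond.nodes if cond.out_degree(c) == 0]
traps = [set(cond.nodes[c]["members"]) for c in sinks
         if all(f0[i] == 0.0 for i in cond.nodes[c]["members"])]

print("simple traps:", traps)        # empty list <=> C is invertible
```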


Dynamic properties of linear compartmental systems

We now state our main result about the asymptotic behavior of linear compartmental systems.

Theorem 9.5 (Asymptotic behavior of compartmental systems). The linear compartmental system (F, f_0, u) with compartmental matrix C and compartmental digraph G_F has the following possible asymptotic behaviors:
(i) if the system is outflow-connected, then the compartmental matrix C is invertible, every solution tends exponentially to the unique equilibrium q* = −C^{−1} u ≥ 0_n, and in the ith compartment q*_i > 0 if and only if the ith compartment is inflow-connected to a strictly positive inflow;
(ii) if the system contains one or more simple traps, then:
(a) define the reduced compartmental system (F_rd, f_{0,rd}, u_rd) as follows: remove all traps from G_F and regard the edges into the trapping compartments as outflow edges into the environment. The reduced compartmental system (F_rd, f_{0,rd}, u_rd) is outflow-connected and all its solutions converge exponentially fast to the unique nonnegative equilibrium −C_rd^{−1} u_rd, for C_rd = F_rd^T − diag(F_rd 1_n + f_{0,rd});
(b) any simple trap H contains non-decreasing mass along time. If H is inflow-connected to a positive inflow, then the mass inside H goes to infinity. Otherwise, the mass inside H converges to a scalar multiple of the right eigenvector corresponding to the eigenvalue 0 of the compartmental submatrix for H.

Proof. Regarding statement (i), note that the system q̇ = C q + u is an affine continuous-time system with a Hurwitz matrix C and, by Exercise E7.2, the system has a unique equilibrium point q* = −C^{−1} u that is globally exponentially stable. The fact that −C^{−1} u ≥ 0_n follows from a property of Hurwitz Metzler matrices, which we study in Theorem 9.8 in the next section.
We leave the proof of statement (ii) to the reader.


9.3 Positive systems

In this section we generalize the class of compartmental systems and study more general network systems called positive systems.

Definition 9.6 (Positive systems). A dynamical system ẋ(t) = f(x(t), t), x ∈ R^n, is positive if x(0) ≥ 0_n implies x(t) ≥ 0_n for all t ≥ 0.
We are especially interested in linear and affine systems, described by

    ẋ(t) = A x(t),   and   ẋ(t) = A x(t) + b.

Note that the set of affine systems includes the set of linear systems (each linear system is affine with b = 0_n). The following result classifies which affine systems are positive.

Theorem 9.7 (Positive affine systems and Metzler matrices). For the affine system ẋ(t) = A x(t) + b, the following statements are equivalent:

(i) the system is positive, that is, x(t) ≥ 0_n for all t ≥ 0 and all x(0) ≥ 0_n,
(ii) A is Metzler and b ≥ 0_n.

Proof. We start by showing that statement (i) implies statement (ii). If x(0) = 0_n, then ẋ(0) cannot have any negative components, hence b ≥ 0_n. If any off-diagonal entry (i, j), i ≠ j, of A is strictly negative, then consider an initial condition x(0) with all zero entries except for x_j(0) > b_i/|a_ij|. It is easy to see that ẋ_i(0) < 0, which is a contradiction.
Finally, to show that statement (ii) implies statement (i), it suffices to note that, anytime there exists i such that x_i(t) = 0, the conditions x(t) ≥ 0_n, A Metzler, and b ≥ 0_n together imply ẋ_i(t) = ∑_{j≠i} a_ij x_j(t) + b_i ≥ 0.

Note: as expected, compartmental systems are positive affine systems. Specifically, compartmental systems are positive affine systems with additional properties (the compartmental matrix has nonpositive diagonal entries and it is column diagonally dominant).

Theorem 9.8 (Properties of Hurwitz Metzler matrices). For a Metzler matrix A, the following statements are equivalent:
(i) A is Hurwitz,
(ii) A is invertible and A1 0, and

(iii) for all b 0n , there exists x 0n solving Ax + b = 0n .

Moreover, if A is Metzler, Hurwitz and irreducible, then A1 > 0.


Proof. We start by showing that (i) implies (ii). Clearly, if A is Hurwitz, then it is also invertible. So it suffices to show that −A^{−1} is nonnegative. Pick ε > 0 and define A_ε = I_n + ε A, that is, ε(−A) = I_n − A_ε. Because A is Metzler, ε can be selected small enough so that A_ε ≥ 0. Moreover, because the spectrum of A is strictly in the left half plane, one can verify that, for ε small enough, spec(εA) is inside the disk of unit radius centered at the point −1. In turn, this last property implies that spec(I_n + εA) is strictly inside the disk of unit radius centered at the origin, that is, ρ(A_ε) < 1. We now adopt the Neumann series as defined in Exercise E2.12: because ρ(A_ε) < 1, we know that I_n − A_ε = ε(−A) is invertible and that

    (ε(−A))^{−1} = (I_n − A_ε)^{−1} = ∑_{k=0}^∞ A_ε^k.    (9.7)

Note now that the right-hand side is nonnegative because it is the sum of nonnegative matrices. In summary, we have shown that A is invertible and that −A^{−1} ≥ 0. This proves that (i) implies (ii).
Next we show that (ii) implies (i). We know A is Metzler, invertible, and satisfies −A^{−1} ≥ 0. By the Perron-Frobenius Theorem for Metzler matrices in Exercise E9.4, we know there exists v ≥ 0_n satisfying A v = λ_Metzler v, where λ_Metzler = max{Re(λ) | λ ∈ spec(A)}. Clearly, A invertible implies λ_Metzler ≠ 0 and, moreover, v = λ_Metzler A^{−1} v. Now, we know v is nonnegative and A^{−1} v is nonpositive. Hence, λ_Metzler must be negative and, in turn, A is Hurwitz. This establishes that (ii) implies (i).
Finally, regarding the equivalence between statement (ii) and statement (iii), note that, if −A^{−1} ≥ 0 and b ≥ 0_n, then clearly x* = −A^{−1} b ≥ 0_n solves A x* + b = 0_n. This proves that (ii) implies (iii). Vice versa, if statement (iii) holds, then let x_i be the nonnegative solution of A x_i = −e_i and let X be the nonnegative matrix with columns x_1, . . . , x_n. Therefore, we know A X = −I_n so that A is invertible, −X is its inverse, and −A^{−1} = X is nonnegative. This proves that (iii) implies (ii).
Finally, the statement that −A^{−1} > 0 for each Metzler, Hurwitz and irreducible matrix A is proved as follows. Because A is irreducible, the matrix A_ε = I_n + εA is nonnegative (for ε sufficiently small) and primitive. Therefore, the right-hand side of equation (9.7) is strictly positive.

This theorem about Metzler matrices immediately leads to some useful properties of positive affine systems.

Corollary 9.9 (Existence, positivity and stability of equilibria for positive affine systems). Consider a continuous-time positive affine system ẋ = A x + b, where A is Metzler and b is nonnegative. If the matrix A is Hurwitz, then
(i) the system has a unique equilibrium point x* ∈ R^n, that is, a unique solution to A x* + b = 0_n,
(ii) the equilibrium point x* = −A^{−1} b is nonnegative, and
(iii) all trajectories converge asymptotically to x*.

Several other properties of positive affine systems and Metzler matrices are reviewed in (Berman and Plemmons 1994), albeit with a slightly different language.
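A numerical spot check of Theorem 9.8 and Corollary 9.9, with an arbitrary Metzler matrix chosen here purely for illustration:

```python
import numpy as np

A = np.array([[-2.0, 1.0],
              [ 0.5, -1.0]])          # Metzler (nonnegative off-diagonal), irreducible
b = np.array([1.0, 2.0])

assert (np.linalg.eigvals(A).real < 0).all()   # A is Hurwitz
assert (-np.linalg.inv(A) > 0).all()           # -A^{-1} > 0 (irreducible case)

x_star = -np.linalg.inv(A) @ b                 # unique equilibrium of x' = Ax + b
assert (x_star >= 0).all()                     # nonnegative, as in Corollary 9.9
print(x_star)
```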

9.4 Table of asymptotic behaviors for averaging and positive systems


Table 9.1: Discrete-time systems

- Averaging system: x(k + 1) = A x(k), with A row-stochastic.
  Assumptions and asymptotic behavior: if the associated digraph has a globally reachable node, then lim_{k→∞} x(k) = (w^T x(0)) 1_n, where w ≥ 0 is the left eigenvector of A with eigenvalue 1 satisfying 1_n^T w = 1.
  References: Theorem 5.2; averaging example systems in Chapter 1.

- Affine system: x(k + 1) = A x(k) + b.
  Assumptions and asymptotic behavior: if A is convergent (i.e., its spectral radius is less than 1), then lim_{k→∞} x(k) = (I_n − A)^{−1} b.
  References: Exercise E2.10; Friedkin-Johnsen system in Exercise E5.4.

- Positive affine system: x(k + 1) = A x(k) + b, with A ≥ 0 and b ≥ 0_n.
  Assumptions and asymptotic behavior: x(0) ≥ 0_n implies x(k) ≥ 0_n for all k; and if A is convergent (i.e., |λ| < 1 for all λ ∈ spec(A)), then lim_{k→∞} x(k) = (I_n − A)^{−1} b ≥ 0_n.
  References: Exercise E2.12.

Table 9.2: Continuous-time systems

- Averaging system: ẋ(t) = −L x(t), with L a Laplacian matrix.
  Assumptions and asymptotic behavior: if the associated digraph has a globally reachable node, then lim_{t→∞} x(t) = (w^T x(0)) 1_n, where w ≥ 0 is the left eigenvector of L with eigenvalue 0 satisfying 1_n^T w = 1.
  References: Theorem 7.3; flocking example system in Section 7.1.

- Affine system: ẋ(t) = A x(t) + b.
  Assumptions and asymptotic behavior: if A is Hurwitz (i.e., its spectral abscissa is negative), then lim_{t→∞} x(t) = −A^{−1} b.
  References: Exercise E7.2.

- Positive affine system: ẋ(t) = A x(t) + b, with A Metzler and b ≥ 0_n.
  Assumptions and asymptotic behavior: x(0) ≥ 0_n implies x(t) ≥ 0_n for all t; and if A is Hurwitz (i.e., Re(λ) < 0 for all λ ∈ spec(A)), then lim_{t→∞} x(t) = −A^{−1} b ≥ 0_n.
  References: Theorem 9.7 and Corollary 9.9; compartmental systems in Section 9.1.

9.5 Exercises

E9.1 A simple proof. Write in detail the proof of Lemma 9.3.

E9.2 Simple traps and strong connectivity. Show that a compartmental system that has no outflows and that is a simple trap is strongly connected.

E9.3 The matrix exponential of a Metzler matrix. Recall Exercise E7.3 about Laplacian matrices. For M ∈ R^{n×n}, show that
(i) exp(M t) ≥ 0 for all t ≥ 0 if and only if M is Metzler, and
(ii) if M is Metzler, then for all t ≥ 0 there exists p > 0 such that exp(M t) ≥ p I_n.

E9.4 Perron-Frobenius Theorem for Metzler matrices. Let M be a Metzler matrix and spec(M) be its spectrum. Show that:
(i) λ_Metzler(M) = max{Re(λ) | λ ∈ spec(M)} is an eigenvalue of M, and
(ii) there exists v ≥ 0_n satisfying M v = λ_Metzler(M) v.
Hint: Recall the Perron-Frobenius Theorem 2.14 for nonnegative matrices.

E9.5 Perron-Frobenius Theorem for irreducible Metzler matrices. Let M be an irreducible Metzler matrix and spec(M) be its spectrum. Show that:
(i) λ_Metzler(M) = max{Re(λ) | λ ∈ spec(M)} is a simple eigenvalue of M, and
(ii) if v ∈ R^n satisfies M v = λ_Metzler(M) v, then v is unique (up to scalar multiple) and v > 0.
Hint: Recall the Perron-Frobenius Theorem 2.15 for irreducible matrices.


E9.6 On Metzler matrices and compartmental systems with growth and decay. Let M be an n × n symmetric Metzler matrix. Recall Lemma 9.3 and define v ∈ R^n by M = −L + diag(v), where L is a symmetric Laplacian matrix. Show that:
(i) if M is Hurwitz, then 1_n^T v < 0.
Next, assume n = 2 and assume v has both nonnegative and nonpositive entries. (If v is nonnegative, lack of stability can be established from statement (i); if v is nonpositive, stability can be established via Theorem 9.4.) Show that
(ii) there exist nonnegative numbers f, d and g such that, modulo a permutation, M can be written in the form

    M = f [ −1  1 ;  1  −1 ] + [ g  0 ;  0  −d ] = [ −(f − g)  f ;  f  −(d + f) ],

(iii) M is Hurwitz if and only if

    d > g   and   f > g d / (d − g).

Note: The inequality d > g (for n = 2) is equivalent to the inequality 1_n^T v < 0 in statement (i). In the interpretation of compartmental systems with growth and decay rates, f is a flow rate, d is a decay rate and g is a growth rate; statement (iii) is then interpreted as follows: M is Hurwitz if and only if the decay rate is larger than the growth rate and the flow rate is sufficiently large.
E9.7 Nonnegative inverses for nonnegative matrices. Let A be a nonnegative square matrix and let λ be a scalar. Show that the following statements are equivalent:
(i) λ > ρ(A), and
(ii) the matrix (λ I_n − A) is invertible and its inverse (λ I_n − A)^{−1} is nonnegative.
Moreover, show that
(iii) if A is irreducible and λ > ρ(A), then (λ I_n − A)^{−1} is positive.
(Given a square matrix A, the map λ ↦ (λ I_n − A)^{−1} is sometimes referred to as the resolvent of A.)

E9.8 Equilibrium points for positive systems. Consider two continuous-time positive affine systems

    ẋ = A x + b,   and   ẋ = Â x + b̂.

Assume that A and Â are Hurwitz and, by Corollary 9.9, let x* and x̂* denote the equilibrium points of the two systems. Show that the inequalities A ≤ Â and b ≤ b̂ imply x* ≤ x̂*.


Chapter 10

Convergence Rates, Scalability and Optimization

In this chapter we discuss the convergence rate of averaging algorithms. We borrow ideas from (Xiao and Boyd 2004). We focus on discrete-time systems and their convergence factors. The study of continuous-time systems is analogous.
Before proceeding, we recall a few basic facts. Given a square matrix A,
(i) the spectral radius of A is ρ(A) = max{|λ| | λ ∈ spec(A)},
(ii) the p-induced norm of A, for p ∈ N ∪ {∞}, is ‖A‖_p = max{‖Ax‖_p | x ∈ R^n and ‖x‖_p = 1},
(iii) the induced 2-norm of A is ‖A‖_2 = max{√λ | λ ∈ spec(A^T A)},
(iv) if A = A^T, then ‖A‖_2 = ρ(A), and
(v) in general, ρ(A) ≤ ‖A‖_p and, moreover, ρ(A) ≤ ‖A^ℓ‖_p^{1/ℓ} for any p ∈ N ∪ {∞} and ℓ ∈ N.

Definition 10.1 (Essential spectral radius of a row-stochastic matrix). The essential spectral radius of a row-stochastic matrix A is

    ρ_ess(A) = 0, if spec(A) = {1, . . . , 1},   and   ρ_ess(A) = max{|λ| | λ ∈ spec(A) \ {1}}, otherwise.
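In code, Definition 10.1 amounts to discarding the eigenvalues equal to 1 and taking the largest remaining magnitude; a minimal sketch:

```python
import numpy as np

def ess_spectral_radius(A, tol=1e-9):
    """Essential spectral radius of a row-stochastic matrix A."""
    lam  = np.linalg.eigvals(A)
    rest = np.abs(lam[np.abs(lam - 1.0) > tol])   # spec(A) \ {1}
    return rest.max() if rest.size else 0.0

A = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
print(ess_spectral_radius(A))   # 0.5, strictly less than 1
```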

10.1 Some preliminary calculations and observations

The convergence factor for symmetric row-stochastic matrices   To build some intuition about the general case, we start with a weighted undirected graph G with adjacency matrix A that is row-stochastic and primitive (i.e., the graph G, viewed as a digraph, is strongly connected and aperiodic). We consider the corresponding discrete-time averaging algorithm

    x(k + 1) = A x(k).

Note that G undirected implies that A is symmetric. Therefore, A has real eigenvalues λ_1 ≥ λ_2 ≥ ··· ≥ λ_n and corresponding orthonormal eigenvectors v_1, . . . , v_n. Because A is row-stochastic, λ_1 = 1 and v_1 = 1_n/√n. Next, along the same lines of the modal decomposition given in Section 2.1, we know that the solution can be decoupled into n independent evolution equations as

    x(k) = average(x(0)) 1_n + λ_2^k (v_2^T x(0)) v_2 + ··· + λ_n^k (v_n^T x(0)) v_n.

Moreover, A being primitive implies that max{|λ_2|, . . . , |λ_n|} < 1. Specifically, for a symmetric and primitive A, we have ρ_ess(A) = max{|λ_2|, |λ_n|} < 1. Therefore

    lim_{k→∞} x(k) = 1_n 1_n^T x(0)/n = average(x(0)) 1_n.
To upper bound the error, since the vectors v_1, . . . , v_n are orthonormal, we compute

    ‖x(k) − average(x(0)) 1_n‖_2 = ‖ ∑_{j=2}^n λ_j^k (v_j^T x(0)) v_j ‖_2
        = ( ∑_{j=2}^n |λ_j|^{2k} ‖(v_j^T x(0)) v_j‖_2^2 )^{1/2}
        ≤ ρ_ess(A)^k ( ∑_{j=2}^n ‖(v_j^T x(0)) v_j‖_2^2 )^{1/2}
        = ρ_ess(A)^k ‖x(0) − average(x(0)) 1_n‖_2,    (10.1)

where the second and last equalities follow from Pythagoras' Theorem.
In summary, we have learned that, for symmetric matrices, the essential spectral radius ρ_ess(A) < 1 is the convergence factor to average consensus. (The wording "convergence factor" is used for discrete-time systems, whereas the wording "convergence rate" is used for continuous-time systems.)
A note on convergence factors for asymmetric matrices   Consider now the asymmetric matrix

    A_large-gain = [ 0.1  10^10 ;  0  0.1 ].

Clearly, the two eigenvalues are 0.1 and so is the spectral radius. This is therefore a convergent matrix. It is however false that the evolution of the system

    x(k + 1) = A_large-gain x(k),

with an initial condition with non-zero second entry, satisfies a bound of the form in equation (10.1). It is still true, of course, that the solution does eventually converge to zero exponentially fast.
The problem is that the eigenvalues (alone) of a non-symmetric matrix do not fully describe the state amplification that may take place during a transient period of time. (Note that the 2-norm of A_large-gain is of order 10^10.)
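This transient amplification is easy to observe numerically:

```python
import numpy as np

A = np.array([[0.1, 1e10],
              [0.0, 0.1]])           # spectral radius 0.1, but 2-norm of order 1e10

x = np.array([0.0, 1.0])             # unit-norm initial condition, non-zero second entry
for k in range(1, 6):
    x = A @ x
    print(k, np.linalg.norm(x))      # enormous transient before the eventual decay
```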

10.2 Convergence factors for row-stochastic matrices

Consider a discrete-time averaging algorithm (distributed linear averaging)

    x(k + 1) = A x(k),

where A is doubly-stochastic and not necessarily symmetric. If A is primitive (i.e., the associated digraph is aperiodic and strongly connected), we know

    lim_{k→∞} x(k) = average(x(0)) 1_n = (1_n 1_n^T/n) x(0).

We now define two possible notions of convergence factors. With the shorthand x_final = average(x(0)) 1_n, the per-step convergence factor is

    r_step(A) = sup_{x(k) ≠ x_final} ‖x(k + 1) − x_final‖_2 / ‖x(k) − x_final‖_2,

and the asymptotic convergence factor is

    r_asym(A) = sup_{x(0) ≠ x_final} lim_{k→∞} ( ‖x(k) − x_final‖_2 / ‖x(0) − x_final‖_2 )^{1/k}.

Given these definitions and the preliminary calculations in the previous Section 10.1, we can now state our main results.
Theorem 10.2 (Convergence factor and solution bounds). Let A be doubly-stochastic and primitive.
(i) The convergence factors of A satisfy

    r_step(A) = ‖A − 1_n 1_n^T/n‖_2,   r_asym(A) = ρ_ess(A) = ρ(A − 1_n 1_n^T/n) < 1.    (10.2)

Moreover, r_asym(A) ≤ r_step(A), and r_step(A) = r_asym(A) if A is symmetric.
(ii) For any initial condition x(0) with corresponding x_final = average(x(0)) 1_n,

    ‖x(k) − x_final‖_2 ≤ r_step(A)^k ‖x(0) − x_final‖_2,    (10.3)
    ‖x(k) − x_final‖_2 ≤ C (r_asym(A) + ε)^k ‖x(0) − x_final‖_2,    (10.4)

where ε > 0 is an arbitrarily small constant and C is a sufficiently large constant independent of x(0).

Note: A sufficient condition for r_step(A) < 1 is given in Exercise E10.1.
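Before turning to the proof, note that both factors in equation (10.2) are straightforward to evaluate numerically; a short sketch with a hypothetical doubly-stochastic matrix:

```python
import numpy as np

A = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])              # doubly-stochastic (and symmetric)
n = A.shape[0]
E = A - np.ones((n, n)) / n                  # A - 1_n 1_n^T / n

r_step = np.linalg.norm(E, 2)                # induced 2-norm
r_asym = np.abs(np.linalg.eigvals(E)).max()  # spectral radius

print(r_step, r_asym)                        # both 0.5: equal since A is symmetric
```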
Before proving Theorem 10.2, we introduce an interesting intermediate result. For x_final = average(x(0)) 1_n, the disagreement vector is the error signal

    δ(k) = x(k) − x_final.    (10.5)

Lemma 10.3 (Disagreement or error dynamics). Given a doubly-stochastic matrix A, the disagreement vector δ(k) satisfies
(i) δ(k) ⊥ 1_n for all k,
(ii) δ(k + 1) = (A − 1_n 1_n^T/n) δ(k),


(iii) the following properties are equivalent:
(a) lim_{k→∞} A^k = 1_n 1_n^T/n (that is, the averaging algorithm achieves average consensus),
(b) A is primitive (that is, the digraph is aperiodic and strongly connected),
(c) ρ(A − 1_n 1_n^T/n) < 1 (that is, the error dynamics is convergent).

Proof. To study the error dynamics, note that 1_n^T x(k + 1) = 1_n^T A x(k) and, in turn, that 1_n^T x(k) = 1_n^T x(0); see also Exercise E7.6. Therefore, average(x(k)) = average(x(0)) and δ(k) ⊥ 1_n for all k. This completes the proof of statement (i). To prove statement (ii), we compute

    δ(k + 1) = A x(k) − x_final = A x(k) − (1_n 1_n^T/n) x(k) = (A − 1_n 1_n^T/n) x(k),

and the equation in statement (ii) follows from (A − 1_n 1_n^T/n) 1_n = 0_n.
Next, let us prove the equivalence among the three properties. From the Perron–Frobenius Theorem 2.16 for primitive matrices in Chapter 2 and from Corollary 2.20, we know that A primitive (statement (iii)b) implies average consensus (statement (iii)a). The converse is true because 1_n 1_n^T/n is a positive matrix and, by the definition of limit, there must exist k such that each entry of A^k becomes positive.
Finally, we prove the equivalence between statements (iii)a and (iii)c. First, note that P = I_n − 1_n 1_n^T/n is a projection matrix, that is, P² = P. This can be easily verified by expanding the matrix power P². Second, let us prove a useful identity:

    A^k − 1_n 1_n^T/n = A^k (I_n − 1_n 1_n^T/n)   (because A is row-stochastic)
        = A^k (I_n − 1_n 1_n^T/n)^k   (because I_n − 1_n 1_n^T/n is a projection)
        = (A (I_n − 1_n 1_n^T/n))^k   (because A, being doubly-stochastic, commutes with the projection)
        = (A − 1_n 1_n^T/n)^k.

The statement follows from taking the limit as k → ∞ in this identity and by recalling that a matrix is convergent if and only if its spectral radius is less than one.

We are now ready to prove the main theorem in this section.

Proof of Theorem 10.2. Regarding the equalities (10.2), the formula for r_step is an immediate consequence of the definition of induced 2-norm:

    r_step(A) = sup_{δ(k) ≠ 0_n} ‖δ(k + 1)‖_2 / ‖δ(k)‖_2 = sup_{δ(k) ≠ 0_n} ‖(A − 1_n 1_n^T/n) δ(k)‖_2 / ‖δ(k)‖_2.

The equality r_asym(A) = ρ(A − 1_n 1_n^T/n) is a consequence of the error dynamics in Lemma 10.3, statement (ii). Next, note that λ = 1 is a simple eigenvalue of A and A is semiconvergent. Hence, by Exercise E2.2 on the Jordan normal form of A, there exists a nonsingular T such that

    A = T [ 1  0_{n−1}^T ;  0_{n−1}  B ] T^{−1},

where B ∈ R^{(n−1)×(n−1)} is convergent, that is, ρ(B) < 1. Moreover we know ρ_ess(A) = ρ(B). Usual properties of similarity transformations imply

    A^k = T [ 1  0_{n−1}^T ;  0_{n−1}  B^k ] T^{−1},   lim_{k→∞} A^k = T [ 1  0_{n−1}^T ;  0_{n−1}  0_{(n−1)×(n−1)} ] T^{−1}.

Because A is doubly-stochastic and primitive, we know lim_{k→∞} A^k = 1_n 1_n^T/n so that A can be decomposed as

    A = 1_n 1_n^T/n + T [ 0  0_{n−1}^T ;  0_{n−1}  B ] T^{−1},

and we conclude with ρ_ess(A) = ρ(B) = ρ(A − 1_n 1_n^T/n). This concludes the proof of the equalities (10.2).
The bound (10.3) is an immediate consequence of the definition of induced norm.
Finally, we leave to the reader the proof of the bound (10.4), which, once again, relies upon the Jordan block decomposition. Note that the arbitrarily-small positive parameter ε is required because the eigenvalue corresponding to the essential spectral radius may have an algebraic multiplicity strictly larger than its geometric multiplicity.


10.3 Cumulative quadratic index for symmetric matrices

The previous convergence metrics (per-step convergence factor and asymptotic convergence factor) are worst-case convergence metrics (both are defined with a supremum operation) that are achieved only for particular initial conditions; e.g., the performance predicted by the asymptotic metric r_asym(A) is achieved when x(0) − x_final is aligned with the eigenvector associated to ρ_ess(A) = ρ(A − 1_n 1_n^T/n). However, the average and transient performance may be much better.
To study an appropriate average performance, we follow the treatment in (Carli et al. 2009). We consider an averaging algorithm

    x(k + 1) = A x(k),

defined by a row-stochastic matrix A and subject to random initial conditions x_0 satisfying

    E[x_0] = 0_n,   and   E[x_0 x_0^T] = I_n.

Recall the disagreement vector δ(k) defined in (10.5) and the associated disagreement dynamics

    δ(k + 1) = (A − 1_n 1_n^T/n) δ(k),

and observe that the initial conditions of the disagreement vector δ(0) satisfy

    E[δ(0)] = 0_n   and   E[δ(0) δ(0)^T] = I_n − 1_n 1_n^T/n.

To define an average transient and asymptotic performance of this averaging algorithm, we define the cumulative quadratic index of the matrix A by

    J_cum(A) = lim_{K→∞} (1/n) ∑_{k=0}^K E[ ‖δ(k)‖_2^2 ].    (10.6)

Theorem 10.4 (Cumulative quadratic index for symmetric matrices). The cumulative quadratic index (10.6) of a row-stochastic, primitive, and symmetric matrix A satisfies

    J_cum(A) = (1/n) ∑_{λ ∈ spec(A)\{1}} 1/(1 − λ²).

Proof. Pick a terminal time K ∈ N and define J_K(A) = (1/n) ∑_{k=0}^K E[ ‖δ(k)‖_2^2 ]. From the definition (10.6) and the disagreement dynamics, we compute

    J_K(A) = (1/n) ∑_{k=0}^K trace( E[δ(k) δ(k)^T] )
        = (1/n) ∑_{k=0}^K trace( (A − 1_n 1_n^T/n)^k E[δ(0) δ(0)^T] ((A − 1_n 1_n^T/n)^k)^T )
        = (1/n) ∑_{k=0}^K trace( (A − 1_n 1_n^T/n)^k ((A − 1_n 1_n^T/n)^k)^T ).

Because A is symmetric, also the matrix A − 1_n 1_n^T/n is symmetric and can be diagonalized as A − 1_n 1_n^T/n = Q Λ Q^T, where Q is orthonormal and Λ is a diagonal matrix whose diagonal entries are the elements of spec(A − 1_n 1_n^T/n) = {0} ∪ (spec(A) \ {1}). It follows that

    J_K(A) = (1/n) ∑_{k=0}^K trace( Q Λ^k Q^T Q Λ^k Q^T )
        = (1/n) ∑_{k=0}^K trace( Λ^k Λ^k )   (because trace(AB) = trace(BA))
        = (1/n) ∑_{k=0}^K ∑_{λ ∈ spec(A)\{1}} λ^{2k}
        = (1/n) ∑_{λ ∈ spec(A)\{1}} (1 − λ^{2(K+1)}) / (1 − λ²)   (because of the geometric series).

The formula for J_cum follows from taking the limit as K → ∞ and recalling that A primitive implies ρ_ess(A) < 1.

Note: All eigenvalues of A appear in the computation of the cumulative quadratic index (10.6), not only the dominant eigenvalue as in the asymptotic convergence factor. Similar results can be obtained for normal matrices, as opposed to symmetric ones, as illustrated in (Carli et al. 2009); it is not known how to compute the cumulative quadratic index for arbitrary doubly-stochastic primitive matrices.
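The closed-form expression in Theorem 10.4 can be cross-checked against a Monte Carlo estimate of the definition (10.6); a minimal sketch with a hypothetical symmetric matrix:

```python
import numpy as np

A = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])     # row-stochastic, primitive, symmetric
n = A.shape[0]

lam = np.sort(np.linalg.eigvalsh(A))[:-1]        # spec(A) \ {1}
J_formula = np.sum(1.0 / (1.0 - lam**2)) / n

rng = np.random.default_rng(0)
K, trials, J_mc = 200, 10000, 0.0
for _ in range(trials):
    d = rng.standard_normal(n)       # x(0) with E[x0] = 0, E[x0 x0^T] = I_n
    d -= d.mean()                    # disagreement delta(0)
    for _ in range(K + 1):
        J_mc += d @ d
        d = A @ d                    # since d is orthogonal to 1_n, A d = (A - J) d
J_mc /= trials * n

print(J_formula, J_mc)               # the two values agree closely
```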
10.4 Circulant network examples and scalability analysis

In general it is difficult to compute explicitly the second largest eigenvalue magnitude for an arbitrary matrix. There are some graphs with constant essential spectral radius, independent of the network size n. For example, a complete graph with identical weights and doubly-stochastic adjacency matrix A = 1_n 1_n^T/n has ρ_ess(A) = 0. In this case, the associated averaging algorithm converges in a single step.
Next, we present an interesting family of examples where all eigenvalues are known. Recall the cyclic balancing problem from Section 1.4, where each bug feels an attraction towards the closest counterclockwise and clockwise neighbors. Given the angular distances between bugs d_i = θ_{i+1} − θ_i, for i ∈ {1, . . . , n} (with the usual convention that d_{n+1} = d_1 and d_0 = d_n), the closed-loop system is

    d(k + 1) = A_{n,ε} d(k),

where ε ∈ [0, 1/2[ and A_{n,ε} is the n × n circulant matrix whose first row is (1 − 2ε, ε, 0, . . . , 0, ε). This matrix is circulant, that is, each row-vector is equal to the preceding row-vector rotated one element to the right. Circulant matrices have remarkable properties (Davis 1994). For example, from Exercise E10.2, the eigenvalues of A_{n,ε} can be computed to be (not ordered in magnitude)

    λ_i = 2ε cos(2π(i − 1)/n) + (1 − 2ε),   for i ∈ {1, . . . , n}.    (10.7)

An illustration is given in Figure 10.1.

Figure 10.1: The eigenvalues of A_{n,ε} as given in equation (10.7), that is, λ_i = f_ε((i − 1)/n) for f_ε(x) = 2ε cos(2πx) + (1 − 2ε), here plotted for ε ∈ {.1, .2, .3, .4} and n = 5. The left figure illustrates also the case ε = .5, even if that value is strictly outside the allowed range [0, .5[.

For n even (similar results hold for n odd), plotting the eigenvalues

on the segment [−1, 1] shows that

    ρ_ess(A_{n,ε}) = max{|λ_2|, |λ_{n/2+1}|},   where λ_2 = 2ε cos(2π/n) + (1 − 2ε) and λ_{n/2+1} = 1 − 4ε.


If we fix ε ∈ ]0, 1/2[ and consider sufficiently large values of n, then |λ_2| > |λ_{n/2+1}|. In the limit of large graphs n → ∞, the Taylor expansion cos(x) = 1 − x²/2 + O(x⁴) leads to

    ρ_ess(A_{n,ε}) = 1 − 4π²ε (1/n²) + O(1/n⁴).

Note that ρ_ess(A_{n,ε}) < 1 for any n, but the separation from ρ_ess(A_{n,ε}) to 1, called the spectral gap, shrinks with 1/n².
In summary, this discussion leads to the broad statement that certain large-scale graphs have slow convergence factors. For more results along these lines (specifically, the elegant study of Cayley graphs), we refer to (Carli et al. 2008). These results can also be easily mapped to the eigenvalues of the associated Laplacian matrices; e.g., see Exercise E6.12.
We conclude this section by computing the cumulative quadratic cost introduced in Section 10.3. For the circulant network example, one can compute (Carli et al. 2009)

    C_1 n ≤ J_cum(A_{n,ε}) ≤ C_2 n,

where C_1 and C_2 are positive constants. It is instructive to compare this result with the worst-case asymptotic or per-step convergence factors that scale as ρ_ess(A_{n,ε}) = 1 − 4π²ε/n².
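The 1/n² scaling of the spectral gap can be reproduced directly from equation (10.7); a short sketch:

```python
import numpy as np

eps = 0.25                                   # parameter in [0, 1/2)
for n in [10, 100, 1000]:
    i   = np.arange(n)
    lam = 2 * eps * np.cos(2 * np.pi * i / n) + (1 - 2 * eps)   # eq. (10.7)
    gap = 1 - np.abs(np.delete(lam, 0)).max()                   # drop lambda_1 = 1
    print(n, gap, gap * n**2)                # gap * n^2 approaches 4 pi^2 eps
```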

10.5 Design of fastest distributed averaging

We are interested in optimization problems of the form:

    minimize r_asym(A) or r_step(A)
    subject to A compatible with a digraph G, doubly-stochastic and primitive,

where A is compatible with G if its only non-zero entries correspond to the edges E of the graph. In other words, if E_ij = e_i e_j^T is the matrix with entry (i, j) equal to one and all other entries equal to zero, then A = ∑_{(i,j)∈E} a_ij E_ij for arbitrary weights a_ij ∈ R. We refer to such problems as fastest distributed averaging (FDA) problems.
Note: In what follows, we remove the constraint A ≥ 0 to widen the set of matrices of interest. Accordingly, we remove the constraint of A being primitive. Convergence to average consensus is guaranteed by (1) achieving convergence factors less than 1, (2) subject to row-sums and column-sums equal to 1.

Problem 1: Asymmetric FDA with asymptotic convergence factor

    minimize ρ(A − 1_n 1_n^T/n)
    subject to A = ∑_{(i,j)∈E} a_ij E_ij,   A 1_n = 1_n,   1_n^T A = 1_n^T

The asymmetric FDA is a hard optimization problem. Even though the constraints are linear, the objective function, i.e., the spectral radius of a matrix, is not convex (and, additionally, not even Lipschitz continuous).

Problem 2: Asymmetric FDA with per-step convergence factor

    minimize ‖A − 1_n 1_n^T/n‖_2
    subject to A = ∑_{(i,j)∈E} a_ij E_ij,   A 1_n = 1_n,   1_n^T A = 1_n^T

Problem 3: Symmetric FDA problem (recall that A = A^T implies ρ(A) = ‖A‖_2):

    minimize ‖A − 1_n 1_n^T/n‖_2
    subject to A = ∑_{(i,j)∈E} a_ij E_ij,   A = A^T,   A 1_n = 1_n

Both Problems 2 and 3 are convex and can be rewritten as so-called semidefinite programs (SDPs); see (Xiao and Boyd 2004). An SDP is an optimization problem where (1) the variable is a positive semidefinite matrix, (2) the objective function is linear, and (3) the constraints are affine equations. SDPs can be efficiently solved by software tools such as CVX; see (Grant and Boyd 2014).
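A sketch of the symmetric FDA problem in CVXPY follows; the graph, with self-loops allowed, is a hypothetical 4-cycle, and the formulation simply minimizes the spectral norm of A − 1_n 1_n^T/n subject to the linear constraints of Problem 3 (the solver handles the SDP reformulation internally).

```python
import cvxpy as cp
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]         # hypothetical ring graph

allowed = set(edges) | {(j, i) for (i, j) in edges} | {(i, i) for i in range(n)}

A = cp.Variable((n, n), symmetric=True)
J = np.ones((n, n)) / n

constraints = [A @ np.ones(n) == np.ones(n)]     # row sums (= column sums here)
constraints += [A[i, j] == 0 for i in range(n) for j in range(n)
                if (i, j) not in allowed]        # sparsity pattern of the graph

prob = cp.Problem(cp.Minimize(cp.norm(A - J, 2)), constraints)
prob.solve()
print(prob.value)                                # optimal r_step = r_asym
print(np.round(A.value, 3))
```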


10.6 Exercises

E10.1 Induced norm. Assume A is doubly-stochastic, primitive, and has a strictly-positive diagonal. Show that r_step(A) = ‖A − 1_n 1_n^T/n‖_2 < 1.

E10.2 Eigenpairs for circulant matrices. Let C ∈ C^{n×n} be circulant, that is, assume there exists a vector (c_0, . . . , c_{n−1}) such that

    C = [ c_0  c_1  c_2  ···  c_{n−1} ;  c_{n−1}  c_0  c_1  ···  c_{n−2} ;  ··· ;  c_1  c_2  ···  c_{n−1}  c_0 ].

Show that
(i) the complex eigenvectors and eigenvalues of C are, for j ∈ {0, . . . , n − 1},

    v_j = (1, ω_j, ω_j², . . . , ω_j^{n−1})^T,   λ_j = c_0 + c_1 ω_j + c_2 ω_j² + ··· + c_{n−1} ω_j^{n−1},

where ω_j = exp(2π j √−1 / n), j ∈ {0, . . . , n − 1}, are the nth roots of unity;
(ii) for n even and (c_0, c_1, . . . , c_{n−1}) = (1 − 2ε, ε, 0, . . . , 0, ε), the eigenvalues are λ_i = 2ε cos(2π(i − 1)/n) + (1 − 2ε) for i ∈ {1, . . . , n}.

E10.3 Spectral gap of regular ring graphs. A k-regular ring graph is an undirected ring graph with n nodes, each connected to itself and to its 2k nearest neighbors, with a uniform weight equal to 1/(2k + 1). The associated doubly-stochastic adjacency matrix A_{n,k} is a circulant matrix with first row given by

    A_{n,k}(1, :) = ( 1/(2k+1), . . . , 1/(2k+1), 0, . . . , 0, 1/(2k+1), . . . , 1/(2k+1) ),

with k + 1 leading and k trailing entries equal to 1/(2k + 1). Using the results in Exercise E10.2, compute
(i) the eigenvalues of A_{n,k} as a function of n and k;
(ii) the limit of the spectral gap for fixed k as n → ∞; and
(iii) the limit of the spectral gap for 2k = n − 1 as n → ∞.


Chapter 11

Time-varying Averaging Algorithms

In this chapter we discuss time-varying consensus algorithms. We borrow ideas from (Bullo et al. 2009; Hendrickx 2008).

11.1 Examples and models of time-varying discrete-time algorithms

In time-varying (or switching) averaging algorithms the row-stochastic matrix is not constant throughout time, but instead changes values and, possibly, switches among a finite number of values. Here are examples of discrete-time averaging algorithms with switching matrices.

11.1.1 Shared Communication Channel

Given a communication digraph G_shared-comm, at each communication round, only one node can transmit to all its out-neighbors over a common bus, and every receiving node will implement a single averaging step. For example, if agent j receives the message from agent i, then agent j will implement:

    x_j^+ := (x_i + x_j)/2.    (11.1)

Each node is allocated a communication slot in a periodic deterministic fashion, e.g., in a round-robin scheduling, where the n agents are numbered and, for each i, agent i talks only at times i, n + i, 2n + i, . . . , kn + i for k ∈ Z≥0. For example, in Figure 11.1 we illustrate the communication digraph and in Figure 11.2 the resulting round-robin communication protocol.

Figure 11.1: Example communication digraph G_shared-comm.

Figure 11.2: Round-robin communication protocol over the four nodes: node 1 transmits at times 1, 5, 9, . . . , node 2 at times 2, 6, 10, . . . , node 3 at times 3, 7, 11, . . . , and node 4 at times 4, 8, 12, . . . .

Formally, let A_i denote the averaging matrix corresponding to the transmission by agent i to its out-neighbors. With round-robin scheduling, we have

    x(n + 1) = A_n A_{n−1} ··· A_1 x(1).
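A short simulation sketch of this protocol (with a hypothetical strongly connected digraph) follows; each broadcast matrix A_i applies the update (11.1) at every out-neighbor of the transmitting node i.

```python
import numpy as np

out_nbrs = {0: [1], 1: [2, 3], 2: [0], 3: [0]}  # hypothetical communication digraph
n = 4

def broadcast_matrix(i):
    """Averaging matrix A_i: node i transmits to all its out-neighbors."""
    A = np.eye(n)
    for j in out_nbrs[i]:
        A[j, i], A[j, j] = 0.5, 0.5             # x_j^+ = (x_i + x_j)/2, eq. (11.1)
    return A

x = np.array([1.0, 2.0, 3.0, 4.0])
for k in range(200):                             # round robin: node (k mod n) talks
    x = broadcast_matrix(k % n) @ x
print(x)                                         # consensus, but not on average(x(0))
```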

11.1.2 Asynchronous Execution

Imagine each node has a different clock, so that there is no common time schedule. Suppose that messages are safely delivered even if transmitting and receiving agents are not synchronized. Each time an agent wakes up, the available information from its neighbors varies. At an iteration instant for agent i, assuming agent i has new messages/information from agents i_1, . . . , i_m, agent i will implement:

    x_i^+ := (1/(m+1)) x_i + (1/(m+1)) (x_{i_1} + ··· + x_{i_m}).

Given arbitrary clocks, one can consider the set of times at which one of the n agents performs an iteration. Then the system is a discrete-time averaging algorithm. It is possible to carefully characterize all possible sequences of events (who transmitted to agent i when it wakes up).

11.1.3 Models of time-varying averaging algorithms

Consider a sequence of row-stochastic matrices {A(k)}_{k∈Z≥0}, or equivalently a time-varying row-stochastic matrix k ↦ A(k). The associated time-varying averaging algorithm is the discrete-time dynamical system

    x(k + 1) = A(k) x(k),   k ∈ Z≥0.    (11.2)

We let {G(k)}_{k∈Z≥0} be the sequence of weighted digraphs associated to the matrices {A(k)}_{k∈Z≥0}. Note that (1, 1_n) is an eigenpair for each matrix A(k). Hence, all points in the consensus set {α 1_n | α ∈ R} are equilibria for the algorithm. We aim to provide conditions under which each solution converges to consensus.
We start with a useful definition: for two digraphs G = (V, E) and G' = (V', E'), the union of G and G' is defined by

    G ∪ G' = (V ∪ V', E ∪ E').

(In what follows, we will need to compute only the union of digraphs with the same set of vertices; in that case, the graph union is essentially defined by the union of the edge sets.) Some useful properties of the product of multiple row-stochastic matrices and of the unions of multiple digraphs are presented in Exercise E11.1.
11.2 Convergence over time-varying connected graphs

Let us first consider the case when A(k) induces an undirected, connected, and aperiodic graph G(k) at each time k.

Theorem 11.1 (Convergence under point-wise connectivity). Let {A(k)}_{k∈Z≥0} be a sequence of symmetric and doubly-stochastic matrices with associated graphs {G(k)}_{k∈Z≥0} so that
(A1) each non-zero edge weight a_ij(k), including the self-loop weights a_ii(k), is larger than a constant ε > 0; and
(A2) each graph G(k) is connected and aperiodic point-wise in time.
Then the solution to x(k + 1) = A(k) x(k) converges exponentially fast to average(x(0)) 1_n.

The first assumption in Theorem 11.1 prevents the weights from becoming arbitrarily close to zero as k → ∞ and ensures that ρ_ess(A(k)) is upper bounded by a number strictly lower than 1 at every time k ∈ Z≥0. To gain some intuition into this non-degeneracy assumption, consider a sequence of symmetric and doubly-stochastic averaging matrices {A(k)}_{k∈Z≥0} with entries given by

    A(k) = [ 1 − exp(−1/(k+1)^β)   exp(−1/(k+1)^β) ;  exp(−1/(k+1)^β)   1 − exp(−1/(k+1)^β) ]

for k ∈ Z≥0 and exponent β ≥ 1. Clearly, for k → ∞ and for any β ≥ 1 this matrix converges to A_∞ = [ 0 1 ; 1 0 ] with spectrum spec(A_∞) = {−1, +1} and essential spectral radius ρ_ess(A_∞) = 1. One can show that, for β = 1, the convergence of A(k) to A_∞ is sufficiently slow so that {x(k)}_{k∈Z≥0} converges to average(x(0)) 1_n, whereas this property is not satisfied for the faster convergence rates β ≥ 2, and the iteration oscillates indefinitely.¹
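This degeneracy is easy to reproduce numerically; a minimal sketch of the 2 × 2 sequence above (the loop length is an arbitrary illustration choice):

```python
import numpy as np

def A(k, beta):
    e = np.exp(-1.0 / (k + 1) ** beta)
    return np.array([[1 - e, e],
                     [e, 1 - e]])

for beta in (1, 2):
    x = np.array([0.0, 1.0])
    for k in range(50000):
        x = A(k, beta) @ x
    print(beta, x)   # beta = 1: x -> (0.5, 0.5); beta = 2: disagreement persists
```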
Proof of Theorem 11.1. Under the assumptions of the theorem, there exists c ∈ [0, 1[ so that ρ_ess(A(k)) ≤ c < 1 for all k ∈ Z≥0. Recall the notion of the disagreement vector δ(k) = x(k) − average(x(0)) 1_n and define V(δ) = ‖δ‖_2². It is immediate to compute

    V(δ(k + 1)) = V(A(k) δ(k)) = ‖A(k) δ(k)‖_2² ≤ ρ_ess(A(k))² ‖δ(k)‖_2² ≤ c² V(δ(k)).

It follows that V(δ(k)) ≤ c^{2k} V(δ(0)), or ‖δ(k)‖_2 ≤ c^k ‖δ(0)‖_2, that is, δ(k) converges to zero exponentially fast. Equivalently, as k → ∞, x(k) converges exponentially fast to average(x(0)) 1_n.

The proof idea of Theorem 11.1 is based on the disagreement vector and a so-called common Lyapunov function, that is, a positive function that decreases along the system's evolutions (we postpone the general definition of Lyapunov function to Chapter 13). The quadratic function V proposed above is useful also for sequences of irreducible and primitive row-stochastic matrices {A(k)}_{k∈Z≥0} with a common positive left eigenvector associated to the eigenvalue λ(A(k)) = 1, see Exercise E11.5. If the matrices {A(k)}_{k∈Z≥0} do not share a common left eigenvector associated to the eigenvalue λ(A(k)) = 1, then

¹ To understand the essence of this example, consider the scalar iteration x(k + 1) = exp(−1/(k + 1)^β) x(k). In logarithmic coordinates the solution is given by log(x(k)) = −∑_{j=0}^{k−1} 1/(j + 1)^β + log(x_0). For β = 1, log(x(k → ∞)) diverges to −∞, and x(k → ∞) converges to zero. Likewise, for β > 1, lim log(x(k → ∞)) exists and is finite, and thus x(k → ∞) does not converge to zero.

there exists generally no common quadratic Lyapunov function of the form V(δ) = δ^T P δ with P a positive-definite matrix; e.g., see (Olshevsky and Tsitsiklis 2008). Likewise, if a sequence of symmetric matrices {A(k)}_{k∈Z≥0} does not induce a connected and aperiodic graph point-wise in time, then the above analysis fails, and we need to search for non-quadratic common Lyapunov functions.

11.3 Convergence over digraphs connected over time

We are now ready to state the main result in this chapter, originally due to Moreau (2005).

Theorem 11.2 (Consensus for time-varying algorithms). Let {A(k)}_{k∈Z≥0} be a sequence of row-stochastic matrices with associated digraphs {G(k)}_{k∈Z≥0}. Assume that
(A1) each digraph G(k) has a self-loop at each node;
(A2) each non-zero edge weight a_ij(k), including the self-loop weights a_ii(k), is larger than a constant ε > 0; and
(A3) there exists a duration δ ∈ N such that, for all times k ∈ Z≥0, the digraph G(k) ∪ G(k + 1) ∪ ··· ∪ G(k + δ) contains a globally reachable node.
Then
(i) there exists a nonnegative w ∈ R^n normalized to w_1 + ··· + w_n = 1 such that lim_{k→∞} A(k) A(k − 1) ··· A(0) = 1_n w^T;
(ii) the solution to x(k + 1) = A(k) x(k) converges exponentially fast to (w^T x(0)) 1_n;
(iii) if additionally each matrix in the sequence is doubly-stochastic, then w = (1/n) 1_n so that

    lim_{k→∞} x(k) = average(x(0)) 1_n.

Note: In a sequence with property (A2), edges can appear and disappear, but the weight of each edge (that appears an infinite number of times) does not go to zero as k → ∞.
Note: This result is analogous to the time-invariant result that we saw in Chapter 5. The existence of a globally reachable node is the connectivity requirement in both cases.
Note: Assumption (A3) is a uniform connectivity requirement, that is, any interval of length δ must have the connectivity property. In equivalent words, the connectivity property holds for any contiguous interval of duration δ.
Note: The theorem provides only a sufficient condition. For results on necessary and sufficient conditions we refer the reader to the recent works (Blondel and Olshevsky 2014; Xia and Cao 2014) and references therein.

11.3.1

Shared communication channel with round robin scheduling

Consider the shared communication channel model with round-robin scheduling. Assume the algorithm
is implemented over a communication graph Gshared-comm that is strongly connected.
Consider now the assumptions in Theorem 11.2. Assumption (A1) is satisfied because in equation (11.1) the self-loop weight is equal to 1/2. Similarly, Assumption (A2) is satisfied because each edge weight is equal to 1/2. Finally, Assumption (A3) is satisfied with the duration δ equal to n, because after n rounds each node has transmitted precisely once and so all edges of the communication graph G_shared-comm are present in the union graph. Therefore, the algorithm converges to consensus. However, the algorithm does not converge to average consensus, since the averaging matrices are not doubly-stochastic.

Note: Round robin is not the only scheduling protocol with convergence guarantees. Indeed, consensus is achieved so long as each node is guaranteed a transmission slot once every bounded period of time.
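A minimal simulation sketch of round-robin scheduling (an illustration under the assumption, consistent with the 1/2 weights discussed above, that each receiving out-neighbor j of the transmitting node i performs x_j ← (x_j + x_i)/2; the ring digraph and horizon are arbitrary choices):

```python
import numpy as np

n = 5
out_neighbors = {i: [(i + 1) % n] for i in range(n)}   # directed ring
x = np.random.default_rng(1).uniform(0, 10, n)
x0_avg = x.mean()

for k in range(200):
    i = k % n                        # round robin: node i transmits
    for j in out_neighbors[i]:       # receivers average with the sender
        x[j] = 0.5 * (x[j] + x[i])

print("spread:", round(x.max() - x.min(), 6))          # ~ 0: consensus
print("consensus value:", x[0], " initial average:", x0_avg)  # differ in general
```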

11.3.2 Convergence theorems for symmetric time-varying algorithms

Theorem 11.3 (Consensus for symmetric time-varying algorithms). Let {A(k)}_{k∈Z≥0} be a sequence of symmetric row-stochastic matrices with associated undirected graphs {G(k)}_{k∈Z≥0}. Let the matrix sequence {A(k)}_{k∈Z≥0} satisfy Assumptions (A1), (A2) in Theorem 11.2 as well as

(A4) for all k ∈ Z≥0, the graph ∪_{τ≥k} G(τ) is connected.

Then

(i) lim_{k→∞} A(k)A(k−1) ⋯ A(0) = (1/n) 1_n 1_n^⊤;

(ii) each solution to x(k+1) = A(k)x(k) converges exponentially fast to average(x(0)) 1_n.

Note: This result is analogous to the time-invariant result that we saw in Chapter 5. For symmetric row-stochastic matrices and undirected graphs, the connectivity of an appropriate graph is the requirement in both cases.

Note: Assumption (A3) in Theorem 11.2 requires the existence of a finite time-interval of duration δ so that the union graph ∪_{τ=k}^{k+δ−1} G(τ) contains a globally reachable node for all times k ≥ 0. This assumption is weakened in the symmetric case in Theorem 11.3 to Assumption (A4), requiring that the union graph ∪_{τ≥k} G(τ) is connected for all times k ≥ 0.

11.3.3 Uniform connectivity is required for non-symmetric matrices

We have learned that for asymmetric matrices a uniform connectivity property (A3) is required, whereas for symmetric matrices, uniform connectivity is not required (see (A4)). Here is a counterexample from (Hendrickx 2008) showing that Assumption (A3) cannot be relaxed for asymmetric graphs. Initialize a group of n = 3 agents to

x₁ < −1,  x₂ < −1,  x₃ > +1.

Step 1: Perform x₁⁺ := (x₁ + x₃)/2, x₂⁺ := x₂, x₃⁺ := x₃ a number of times τ₁ until

x₁ > +1,  x₂ < −1,  x₃ > +1.

Step 2: Perform x₁⁺ := x₁, x₂⁺ := x₂, x₃⁺ := (x₂ + x₃)/2 a number of times τ₂ until

x₁ > +1,  x₂ < −1,  x₃ < −1.

Step 3: Perform x₁⁺ := x₁, x₂⁺ := (x₁ + x₂)/2, x₃⁺ := x₃ a number of times τ₃ until

x₁ > +1,  x₂ > +1,  x₃ < −1.

And repeat this process.

[Figure: the interaction digraphs during Step 1, Step 2, and Step 3, and their union.]

Observe that on steps 1, 7, 15, …, the variable x₁ is made to become larger than +1 by computing averages with x₃ > +1. Note that every time this happens the variable x₃ > +1 is increasingly smaller and closer to +1. Hence, τ₁ < τ₇ < τ₁₅ < ⋯, that is, it takes more and more iterations for x₁ to become larger than +1. Indeed, one can formally show the following (a simulation sketch is given after this list):

(i) The agents do not converge to consensus.

(ii) Hence, one of the assumptions of Theorem 11.2 must be violated.

(iii) It is easy to see that (A1) and (A2) are satisfied.

(iv) Regarding connectivity, note that, for all k ∈ Z≥0, the digraph ∪_{τ≥k} G(τ) contains a globally reachable node. However, this property is not quite (A3).

(v) Assumption (A3) in Theorem 11.2 must be violated: there does not exist a duration δ ∈ N such that, for all k ∈ Z≥0, the digraph G(k) ∪ G(k+1) ∪ ⋯ ∪ G(k+δ−1) contains a globally reachable node.

(vi) Indeed, one can show that lim_{k→∞} τ_k = ∞ so that, as we keep iterating Steps 1+2+3, their duration grows unbounded.
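This counterexample is easy to reproduce numerically; the sketch below (illustrative only; the initial values and the helper `average_until` are our own choices, not from the text) implements Steps 1–3 with the roles of ±1 swapped at every repetition and prints the step durations, which grow over time while the spread max x − min x remains bounded away from zero.

```python
import numpy as np

x = np.array([-2.0, -2.0, +2.0])     # x1 < -1, x2 < -1, x3 > +1
taus = []

def average_until(x, i, j, done):
    # Repeat x_i := (x_i + x_j)/2 until done(x) holds; return #steps.
    count = 0
    while not done(x):
        x[i] = 0.5 * (x[i] + x[j])
        count += 1
    return count

sign = +1.0
for cycle in range(6):
    # Steps 1-3 (and their sign-flipped repetitions):
    taus.append(average_until(x, 0, 2, lambda x: sign * x[0] > 1))
    taus.append(average_until(x, 2, 1, lambda x: sign * x[2] < -1))
    taus.append(average_until(x, 1, 0, lambda x: sign * x[1] > 1))
    sign = -sign                      # the roles of +1 and -1 swap

print("step durations:", taus)        # growing durations
print("spread:", x.max() - x.min())   # bounded away from zero
```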

11.4 Analysis methods and proofs

It is well known that, for time-varying systems, the analysis of eigenvalues is not sufficient anymore! In the following example, two matrices with spectral radius equal to 1/2 are multiplied to obtain a spectral radius larger than 1:

[ 1/2  1 ] [ 1/2  0 ]   [ 5/4  0 ]
[  0   0 ] [  1   0 ] = [  0   0 ].

Hence, it is not possible to predict the convergence of arbitrary products of matrices just based on their spectral radii, and we need to work harder and with sharper tools.
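A quick numerical check of this example (a throwaway sketch, assuming numpy):

```python
import numpy as np

A1 = np.array([[0.5, 1.0], [0.0, 0.0]])
A2 = np.array([[0.5, 0.0], [1.0, 0.0]])
rho = lambda M: max(abs(np.linalg.eigvals(M)))
print(rho(A1), rho(A2), rho(A1 @ A2))   # 0.5  0.5  1.25
```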
11.4.1 Bounded solutions and non-increasing max-min function

In what follows, we propose a so-called contraction analysis based on a common Lyapunov function (which is not quadratic). We start by defining the max-min function V_max-min : R^n → R≥0 by

V_max-min(x) = max(x₁, …, x_n) − min(x₁, …, x_n) = max_{i∈{1,…,n}} x_i − min_{i∈{1,…,n}} x_i.

Note that:

(i) V_max-min(x) ≥ 0, and

(ii) V_max-min(x) = 0 if and only if x = α 1_n for some α ∈ R.


Lemma 11.4 (Monotonicity and bounded evolutions). If A is row-stochastic, then for all x ∈ R^n,

V_max-min(Ax) ≤ V_max-min(x).

For any sequence of row-stochastic matrices, the solution x(k) of the corresponding time-varying averaging algorithm satisfies, from any initial condition x(0) and at any time k,

V_max-min(x(k)) ≤ V_max-min(x(0)), and

min x(0) ≤ min x(k) ≤ min x(k+1) ≤ max x(k+1) ≤ max x(k) ≤ max x(0).
Proof. For the maximum, let us compute

max_i (Ax)_i = max_i Σ_{j=1}^n a_ij x_j ≤ max_i Σ_{j=1}^n a_ij (max_j x_j) = ( max_i Σ_{j=1}^n a_ij ) max_j x_j = 1 · max_j x_j.

Similarly, for the minimum,

min_i (Ax)_i = min_i Σ_{j=1}^n a_ij x_j ≥ min_i Σ_{j=1}^n a_ij (min_j x_j) = ( min_i Σ_{j=1}^n a_ij ) min_j x_j = 1 · min_j x_j.

These two bounds, applied at each time step, imply all three statements of the lemma. ∎

Connectivity over time


Lemma 11.5 (Global reachability over time). Given a sequence of digraphs {G(k)}_{k∈Z≥0} such that each digraph G(k) has a self-loop at each node, the following two properties are equivalent:

(i) there exists a duration δ ∈ N such that, for all times k ∈ Z≥0, the digraph G(k) ∪ G(k+1) ∪ ⋯ ∪ G(k+δ−1) contains a directed spanning tree;

(ii) there exists a duration Δ ∈ N such that, for all times k ∈ Z≥0, there exists a node j = j(k) that reaches all nodes i ∈ {1, …, n} over the interval {k, …, k+Δ−1} in the following sense: there exists a sequence of nodes {j, h₁, …, h_{Δ−1}, i} such that (j, h₁) is an edge at time k, (h₁, h₂) is an edge at time k+1, …, (h_{Δ−2}, h_{Δ−1}) is an edge at time k+Δ−2, and (h_{Δ−1}, i) is an edge at time k+Δ−1;

or, equivalently, for the reverse digraph,

(iii) there exists a duration δ ∈ N such that, for all times k ∈ Z≥0, the digraph G(k) ∪ G(k+1) ∪ ⋯ ∪ G(k+δ−1) contains a globally reachable node;

(iv) there exists a duration Δ ∈ N such that, for all times k ∈ Z≥0, there exists a node j reachable from all nodes i ∈ {1, …, n} over the interval {k, …, k+Δ−1} in the following sense: there exists a sequence of nodes {j, h₁, …, h_{Δ−1}, i} such that (h₁, j) is an edge at time k, (h₂, h₁) is an edge at time k+1, …, (h_{Δ−1}, h_{Δ−2}) is an edge at time k+Δ−2, and (i, h_{Δ−1}) is an edge at time k+Δ−1.

Note: It is sometimes easy to see whether a sequence of digraphs satisfies properties (i) and (iii). Property (iv) is directly useful in the analysis later in the chapter. Regarding the proof of the lemma, it is easy to check that (ii) implies (i) and that (iv) implies (iii) with δ = Δ. The converse is left as Exercise E11.3.
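For concreteness, here is a small Python sketch (the helper names are our own, not from the text) that tests property (iii) over a finite window: it forms the union digraph of the window and checks whether some node is reachable from all nodes.

```python
def union_digraph(graphs):
    # Union of edge sets; each digraph is a set of directed edges (i, j).
    return set().union(*graphs)

def reachable_from(edges, src):
    # Nodes reachable from src along directed paths (simple search).
    seen, frontier = {src}, [src]
    while frontier:
        i = frontier.pop()
        for (a, b) in edges:
            if a == i and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

def has_globally_reachable_node(edges, n):
    # Node j is globally reachable if every node i reaches j.
    return any(all(j in reachable_from(edges, i) for i in range(n))
               for j in range(n))

# Example: three sparse digraphs whose union contains the path 0 -> 1 -> 2.
window = [{(0, 1)}, {(1, 2)}, set()]
print(has_globally_reachable_node(union_digraph(window), n=3))  # True
```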

11.4.2 Proof of Theorem 11.2: the max-min function is exponentially decreasing

This proof is inspired by the presentation in (Hendrickx 2008, Theorem 9.2). We start by noting that Assumptions (A1) and (A3) imply property Lemma 11.5(iv) about the existence of a duration Δ with certain properties. Next, without loss of generality, we assume that at some time Δh, for some h ∈ N, the solution x(Δh) is not equal to a multiple of 1_n and, therefore, satisfies V_max-min(x(Δh)) > 0. Clearly,

x(Δ(h+1)) = A(Δ(h+1)−1) ⋯ A(Δh+1) A(Δh) x(Δh).

By Assumption (A3), we know that there exists a node j reachable from all nodes i over the interval {Δh, …, Δ(h+1)−1} in the following sense: there exists a sequence of nodes {j, h₁, …, h_{Δ−1}, i} such that all of the following edges exist in the sequence of digraphs: (h₁, j) at time Δh, (h₂, h₁) at time Δh+1, …, (i, h_{Δ−1}) at time Δ(h+1)−1. Therefore, Assumption (A2) implies

a_{h₁ j}(Δh) ≥ ε,  a_{h₂ h₁}(Δh+1) ≥ ε,  …,  a_{i h_{Δ−1}}(Δ(h+1)−1) ≥ ε,

and therefore their product satisfies

a_{i h_{Δ−1}}(Δ(h+1)−1) · a_{h_{Δ−1} h_{Δ−2}}(Δ(h+1)−2) ⋯ a_{h₂ h₁}(Δh+1) · a_{h₁ j}(Δh) ≥ ε^Δ.

Remarkably, this product is one term in the (i, j) entry of the row-stochastic matrix Ā := A(Δ(h+1)−1) ⋯ A(Δh). In other words, Assumption (A3) implies Ā_ij ≥ ε^Δ.

Hence, for all nodes i, given the globally reachable node j during the interval {Δh, …, Δ(h+1)}, we compute

x_i(Δ(h+1)) = Ā_ij x_j(Δh) + Σ_{p=1, p≠j}^n Ā_ip x_p(Δh)    (by definition)
 ≤ Ā_ij x_j(Δh) + (1 − Ā_ij) max x(Δh)    (because x_p(Δh) ≤ max x(Δh))
 = max x(Δh) + Ā_ij ( x_j(Δh) − max x(Δh) )
 ≤ ε^Δ x_j(Δh) + (1 − ε^Δ) max x(Δh).    (because x_j(Δh) ≤ max x(Δh) and Ā_ij ≥ ε^Δ)

A similar argument leads to

x_i(Δ(h+1)) ≥ ε^Δ x_j(Δh) + (1 − ε^Δ) min x(Δh),

so that

V_max-min(x(Δ(h+1))) = max_i x_i(Δ(h+1)) − min_i x_i(Δ(h+1)) ≤ (1 − ε^Δ) V_max-min(x(Δh)).

This final inequality, together with Lemma 11.4, proves exponential convergence of the cost function k ↦ V_max-min(x(k)) to zero and convergence of x(k) to a multiple of 1_n. We leave the other statements in Theorem 11.2 to the reader and refer to (Hendrickx 2008; Moreau 2005) for further details. ∎

11.5 Time-varying algorithms in continuous-time

We now consider the continuous-time linear time-varying system

ẋ(t) = −L(t) x(t).

We associate a time-varying graph G(t) (without self-loops) to the time-varying Laplacian L(t) in the usual manner.

For example, in Chapter 7, we discussed how the heading in some flocking models is described by the continuous-time Laplacian flow:

θ̇ = −L θ,

where each θ_i is the heading of a bird, and where L is the Laplacian of an appropriate weighted digraph G: each bird is a node and each directed edge (i, j) has weight 1/d_out(i). We discussed also the need to consider time-varying graphs: birds average their heading only with other birds within sensing range, but this sensing relationship may change with time.

Recall that the solution to a continuous-time time-varying system can be given in terms of the state transition matrix:

x(t) = Φ(t, 0) x(0).

We refer to (Hespanha 2009) for the proper definition and study of the state transition matrix.

11.5.1 Undirected graphs

We first consider the case when L(t) induces an undirected and connected graph G(t) for all t ∈ R≥0.

Theorem 11.6 (Convergence under point-wise connectivity). Let t ↦ L(t) = L(t)^⊤ be a time-varying Laplacian matrix with associated time-varying graph t ↦ G(t), t ∈ R≥0. Assume

(A1) each non-zero edge weight a_ij(t) is larger than a constant ε > 0,

(A2) for all t ∈ R≥0, the graph associated to the symmetric Laplacian matrix L(t) is undirected and connected.

Then

(i) the state transition matrix Φ(t, 0) associated to −L(t) satisfies lim_{t→∞} Φ(t, 0) = 1_n 1_n^⊤/n,

(ii) the solution to ẋ(t) = −L(t)x(t) converges exponentially fast to

lim_{t→∞} x(t) = average(x(0)) 1_n.

The first assumption in Theorem 11.6 prevents the weights from becoming arbitrarily close to zero as t → ∞, and it assures that λ₂(L(t)) is strictly positive for all t ∈ R≥0. To see the necessity of this non-degeneracy assumption, consider the time-varying Laplacian

L(t) = a(t) L̄,   (11.3)

where a : R≥0 → R≥0 is piece-wise continuous and L̄ = L̄^⊤ is a symmetric time-invariant Laplacian matrix. It can be verified that the solution to ẋ(t) = −L(t)x(t) is given by

x(t) = exp( −L̄ ∫₀^t a(τ) dτ ) x₀.

If a(t) is integrable on [0, ∞[, then x(t) converges to exp( −L̄ ∫₀^∞ a(τ) dτ ) x₀, which is in general not a consensus configuration.

In analogy to Theorem 11.1, Theorem 11.6 can be proved by considering the norm of the disagreement vector V(δ) = ‖δ‖₂² as a common Lyapunov function. As in the discrete-time case, this quadratic Lyapunov function has some fundamental limitations, pointed out by Moreau (2004). We review these limitations in the following theorem, an extension of Lemma 6.2.

Theorem 11.7 (Limitations of quadratic Lyapunov functions). Let L be a Laplacian matrix associated with a weighted digraph G. The following statements are equivalent:

(i) L + L^⊤ is positive semi-definite;

(ii) L has zero column sums, that is, G is weight-balanced;

(iii) the sum-of-squares function V(δ) = ‖δ‖₂² is non-increasing along trajectories of the Laplacian flow ẋ = −Lx; and

(iv) every convex function V(x) invariant under coordinate permutations is non-increasing along the trajectories of ẋ = −Lx.

Proof sketch. The equivalence of statements (i) and (ii) has been shown in Lemma 6.2. The equivalence of (i) and (iii) can be proved with a Lyapunov argument similar to the discrete-time case; see Theorem 11.1. The implication (iv) ⟹ (iii) is trivial. To complete the proof, we show that (ii) ⟹ (iv). Recall that the matrix exponential exp(−Lt) of a weight-balanced Laplacian matrix is a nonnegative doubly-stochastic matrix (see Exercise E7.3) that can be decomposed into a convex combination of finitely many permutation matrices by the Birkhoff–von Neumann theorem (see Exercise E2.14). In particular, exp(−Lt) = Σ_i λ_i(t) P_i, where the P_i are permutation matrices and the λ_i(t) are convex coefficients for every t ≥ 0. By convexity of V(x) and invariance under coordinate permutations, we have, for any initial condition x₀ ∈ R^n and for any t ≥ 0,

V(exp(−Lt) x₀) = V( Σ_i λ_i(t) P_i x₀ ) ≤ Σ_i λ_i(t) V(P_i x₀) = Σ_i λ_i(t) V(x₀) = V(x₀). ∎

It follows that V(δ) = ‖δ‖₂² serves as a common Lyapunov function for the time-varying Laplacian flow ẋ(t) = −L(t)x(t) only if L(t) is weight-balanced and connected point-wise in time. To partially


remedy these strong assumptions, consider now the case when L(t) induces an undirected graph at any point in time t ≥ 0 and an integral connectivity condition holds, similar to the discrete-time case. To motivate the general case, recall the example in (11.3) with a single time-varying parameter a(t). In this simple example, a necessary and sufficient condition for convergence to consensus was that the integral ∫₀^∞ a(τ) dτ is divergent. The following result from (Hendrickx and Tsitsiklis 2013) generalizes this case.

Theorem 11.8 (Convergence under integral connectivity). Let t ↦ A(t) = A(t)^⊤ be a time-varying symmetric adjacency matrix. Consider an associated undirected graph G = (V, E), t ∈ R≥0, that has an edge (i, j) ∈ E if ∫₀^∞ a_ij(τ) dτ is divergent. Assume

(A1) each non-zero edge weight a_ij(t) is larger than a constant ε > 0,

(A2) the graph G is connected.

Then

(i) the state transition matrix Φ(t, 0) associated to −L(t) satisfies lim_{t→∞} Φ(t, 0) = 1_n 1_n^⊤/n,

(ii) the solution to ẋ(t) = −L(t)x(t) converges exponentially fast to

lim_{t→∞} x(t) = average(x(0)) 1_n.

Theorem 11.8 is the continuous-time analog of Theorem 11.3. We remark that the original statement in (Hendrickx and Tsitsiklis 2013) does not require Assumption (A1), thus allowing for weights such as a_ij(t) = 1/t which lead to non-uniform convergence, i.e., the convergence rate depends on the time t₀ when the system is initialized. The proof method of Theorem 11.8 is based on the fact that the minimal (respectively maximal) element of x(t), the sum of the two smallest (respectively two largest) elements, the sum of the three smallest (respectively three largest) elements, etc., are all bounded and non-decreasing (respectively non-increasing). A continuity argument can then be used to show average consensus.
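The role of the integral condition is visible in a quick simulation (a minimal sketch, not from the text; the path graph and the two gains are arbitrary choices) of ẋ = −a(t) L̄ x with a(t) = 1/(1+t), whose integral diverges, versus a(t) = 1/(1+t)², whose integral is finite:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Path graph on 3 nodes: time-invariant symmetric Laplacian.
L = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]], dtype=float)
x0 = np.array([0.0, 5.0, 10.0])

for a in (lambda t: 1 / (1 + t),        # divergent integral -> consensus
          lambda t: 1 / (1 + t)**2):    # integrable gain -> no consensus
    sol = solve_ivp(lambda t, x: -a(t) * (L @ x), (0, 1e4), x0, rtol=1e-8)
    xT = sol.y[:, -1]
    print("final spread:", round(xT.max() - xT.min(), 4))
```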

11.5.2 Directed graphs

The proof method of Theorem 11.8 does not extend to general non-symmetric Laplacian matrices. If we use the max-min function V_max-min(x) = max_{i∈{1,…,n}} x_i − min_{i∈{1,…,n}} x_i as a common Lyapunov function candidate, then we arrive at the following general result (Lin et al. 2007; Moreau 2004).

Theorem 11.9 (Consensus for time-varying algorithms in continuous time). Let t ↦ A(t) be a time-varying adjacency matrix with associated time-varying digraph t ↦ G(t), t ∈ R≥0. Assume

(A1) each non-zero edge weight a_ij(t) is larger than a constant ε > 0,

(A2) there exists a duration T > 0 such that, for all t ∈ R≥0, the digraph associated to the adjacency matrix

∫_t^{t+T} A(τ) dτ

contains a globally reachable node.


Then

(i) there exists a nonnegative w ∈ R^n normalized to w₁ + ⋯ + w_n = 1 such that the state transition matrix Φ(t, 0) associated to −L(t) satisfies lim_{t→∞} Φ(t, 0) = 1_n w^⊤,

(ii) the solution to ẋ(t) = −L(t)x(t) converges exponentially fast to (w^⊤ x(0)) 1_n,

(iii) if additionally 1_n^⊤ L(t) = 0_n^⊤ for almost all times t (that is, the digraph is weight-balanced at all times, except for a set of measure zero), then w = (1/n) 1_n so that

lim_{t→∞} x(t) = average(x(0)) 1_n.

11.6 Exercises

E11.1 On the product of stochastic matrices (Jadbabaie et al. 2003). Let k ≥ 2 and A₁, A₂, …, A_k be nonnegative n×n matrices with positive diagonal entries. Let a_min (resp. a_max) be the smallest (resp. largest) diagonal entry of A₁, A₂, …, A_k and let G₁, …, G_k be the digraphs associated with A₁, …, A_k. Show that

(i) A₁A₂ ⋯ A_k ≥ ( a_min² / (2 a_max) )^{k−1} (A₁ + A₂ + ⋯ + A_k), and

(ii) if the digraph G₁ ∪ … ∪ G_k is strongly connected, then the matrix A₁ ⋯ A_k is irreducible.

Hint: Set A_i = a_min I_n + B_i for a nonnegative B_i, and show statement (i) by induction on k.
E11.2 Products of primitive matrices with positive diagonal. Let A₁, A₂, …, A_{n−1} be primitive n×n matrices with positive diagonal entries. Show that A₁A₂ ⋯ A_{n−1} > 0.

E11.3 A simple proof. Prove Lemma 11.5.
Hint: You will want to use Exercise E3.5.

E11.4 Alternative sufficient condition. As in Theorem 11.2, let {A(k)}_{k∈Z≥0} be a sequence of row-stochastic matrices with associated digraphs {G(k)}_{k∈Z≥0}. Prove that the same asymptotic properties in Theorem 11.2 hold true under the following Assumption (A5), instead of Assumptions (A1), (A2) and (A3):

(A5) there exists a node j such that, for all times k ∈ Z≥0, each edge weight a_ij(k), i ∈ {1, …, n}, is larger than a constant ε > 0.

In other words, Assumption (A5) requires that all digraphs G(k) contain all edges (i, j), i ∈ {1, …, n}, and that all these edges have weights larger than a strictly positive constant.
Hint: Modify the proof of Theorem 11.2.

E11.5 Convergence for strongly-connected graphs point-wise in time: discrete time. Consider a sequence {A(k)}_{k∈Z≥0} of row-stochastic matrices with associated graphs {G(k)}_{k∈Z≥0} so that

(A1) each non-zero edge weight a_ij(k), including the self-loop weights a_ii(k), is larger than a constant ε > 0;

(A2) each graph G(k) is strongly connected and aperiodic point-wise in time; and

(A3) there is a positive vector w ∈ R^n satisfying w^⊤ 1_n = 1 and w^⊤ A(k) = w^⊤ for all k ∈ Z≥0.

Without relying on Theorem 11.2, show that the solution to x(k+1) = A(k)x(k) converges to lim_{k→∞} x(k) = (w^⊤ x(0)) 1_n.
Hint: Search for a common quadratic Lyapunov function.
E11.6 Convergence for strongly-connected graphs point-wise in time: continuous time. Let t ↦ L(t) be a time-varying Laplacian matrix with associated time-varying digraph t ↦ G(t), t ∈ R≥0, so that

(A1) each non-zero edge weight a_ij(t) is larger than a constant ε > 0;

(A2) each graph G(t) is strongly connected point-wise in time; and

(A3) there is a positive vector w ∈ R^n satisfying 1_n^⊤ w = 1 and w^⊤ L(t) = 0_n^⊤ for all t ∈ R≥0.

Without relying on Theorem 11.9, show that the solution to ẋ(t) = −L(t)x(t) satisfies lim_{t→∞} x(t) = (w^⊤ x(0)) 1_n.
Hint: Search for a common quadratic Lyapunov function.

Chapter 12

Randomized Averaging Algorithms


In this chapter we discuss averaging algorithms defined by sequences of random stochastic matrices.
In other words, we imagine that at each discrete instant, the averaging matrix is selected randomly
according to some stochastic model. We refer to such algorithms as randomized averaging algorithms.
Randomized averaging algorithms are well behaved and easy to study in the sense that much
information can be learned simply from the expectation of the averaging matrix. Also, as compared with
time-varying algorithms, it is possible to study convergence rates for randomized algorithms. In this
chapter we present results from (Fagnani and Zampieri 2008; Frasca 2012; Garin and Schenato 2010;
Tahbaz-Salehi and Jadbabaie 2008).

12.1 Examples of randomized averaging algorithms

Consider the following models of randomized averaging algorithms.


Uniform Symmetric Gossip. Given an undirected graph G, at each iteration, select one of the graph edges uniformly at random, say agents i and j talk, and they both perform the (1/2, 1/2) averaging, that is:

x_i(k+1) = x_j(k+1) := (x_i(k) + x_j(k))/2.

A detailed analysis of this model is given by Boyd et al. (2006).

Packet Loss in Communication Network. Given a strongly connected and aperiodic digraph, at each communication round, packets travel over directed edges and, with some likelihood, each edge may drop the packet. (If information is not received, then the receiving node can either do no update whatsoever, or adjust its averaging weights to compensate for the packet loss.)

Broadcast Wireless Communication. Given a digraph, at each communication round, a randomly-selected node transmits to all its out-neighbors. (Here we imagine that simultaneous transmissions are prohibited by wireless interference.)
Opinion Dynamics with Stochastic Interactions and Prominent Agents. (Somewhat similar to uniform gossip.) Given an undirected graph and a probability 0 < p < 1, at each iteration, select one of the graph edges uniformly at random and perform: with probability p both agents perform the (1/2, 1/2) update, and with probability (1 − p) only one agent performs the update and the prominent agent does not. A detailed analysis of this model is given by (Acemoglu and Ozdaglar 2011).

Note that, in the second, third and fourth example models, the row-stochastic matrices at each
iteration are not symmetric in general, even if the original digraph was undirected.

12.2 A brief review of probability theory

We briefly review a few basic concepts from probability theory and refer the reader for example to (Breiman 1992).

Loosely speaking, a random variable X : Ω → E is a measurable function from the set Ω of possible outcomes to some set E, which is typically a subset of R.

The probability of an event (i.e., a subset of possible outcomes) is the measure of the likelihood that the event will occur. An event occurs almost surely if it occurs with probability equal to 1.

The random variable X is called discrete if its image is finite or countably infinite. In this case, X is described by a probability mass function assigning a probability to each value in the image of X. Specifically, if X takes values in {x₁, …, x_M} ⊂ R, then the probability mass function p_X : {x₁, …, x_M} → [0, 1] satisfies p_X(x_i) ≥ 0 and Σ_{i=1}^M p_X(x_i) = 1, and determines the probability of X being equal to x_i by P[X = x_i] = p_X(x_i).

The random variable X is called continuous if its image is uncountably infinite. If X is absolutely continuous, X is described by a probability density function assigning a probability to intervals in the image of X. Specifically, if X takes values in R, then the probability density function f_X : R → R≥0 satisfies f_X(x) ≥ 0 and ∫_R f_X(x) dx = 1, and determines the probability of X taking values in the interval [a, b] by P[a ≤ X ≤ b] = ∫_a^b f_X(x) dx.

The expected value of a discrete variable is E[X] = Σ_{i=1}^M x_i p_X(x_i). The expected value of a continuous variable is E[X] = ∫_R x f_X(x) dx.

A (finite or infinite) sequence of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability mass/density function as the others and all are mutually independent.

12.3 Randomized averaging algorithms

In this section we consider random sequences of row-stochastic matrices. Accordingly, let A(k) be the row-stochastic averaging matrix occurring randomly at time k and let G(k) be its associated graph. We then consider the stochastic linear system

x(k+1) = A(k) x(k).
We now present the main result of this chapter; for its proof we refer to (Tahbaz-Salehi and Jadbabaie 2008), see also (Fagnani and Zampieri 2008).

Theorem 12.1 (Consensus for randomized algorithms). Let {A(k)}_{k∈Z≥0} be a sequence of random row-stochastic matrices with associated digraphs {G(k)}_{k∈Z≥0}. Assume

(A1) the sequence of variables {A(k)}_{k∈Z≥0} is i.i.d.,

(A2) at each time k, the random matrix A(k) has strictly positive diagonal so that each digraph in the sequence {G(k)}_{k∈Z≥0} has a self-loop at each node almost surely, and

(A3) the digraph associated to the expected matrix E[A(k)], for any k, has a globally reachable node.

Then the following statements hold:

(i) there exists a random nonnegative vector w ∈ R^n with w₁ + ⋯ + w_n = 1 such that

lim_{k→∞} A(k)A(k−1) ⋯ A(0) = 1_n w^⊤  almost surely,

(ii) as k → ∞, each solution x(k) of x(k+1) = A(k)x(k) satisfies

lim_{k→∞} x(k) = (w^⊤ x(0)) 1_n  almost surely,

(iii) if additionally each random matrix is doubly-stochastic, then w = (1/n) 1_n so that

lim_{k→∞} x(k) = average(x(0)) 1_n  almost surely.

Note: if each random matrix is doubly-stochastic, then E[A(k)] is doubly-stochastic. The converse is easily seen to be false.

12.3.1 Additional results on uniform symmetric gossip algorithms

Recall: given an undirected graph G with edge set E, at each iteration, select one of the graph edges uniformly at random, say agents i and j talk, and they both perform the (1/2, 1/2) averaging, that is:

x_i(k+1) = x_j(k+1) := (x_i(k) + x_j(k))/2.

Corollary 12.2 (Convergence for uniform symmetric gossip). If the graph G is connected, then each solution to the uniform symmetric gossip converges to average consensus with probability 1.

Proof based on Theorem 12.1. The corollary can be established by verifying that Assumptions (A1)–(A3) in Theorem 12.1 are satisfied. Regarding (A3), note that the graph associated to the expected averaging matrix is G. ∎

We here provide a simple and interesting alternative proof due to (Frasca 2012).
Proof based on Theorem 11.3. For any time k₀ ≥ 0 and any edge {i, j} ∈ E, consider the event that the edge {i, j} is not selected for update at any time larger than k₀. Since the probability that {i, j} is not selected at any given time is 1 − 1/|E|, the probability that {i, j} is not selected at any of the times between k₀ and k satisfies

lim_{k→∞} (1 − 1/|E|)^{k−k₀} = 0.

With this fact one can verify that all assumptions in Theorem 11.3 are satisfied by the random sequence of matrices almost surely. Hence, almost sure convergence follows. Finally, since each matrix is doubly stochastic, average(x(k)) is preserved, and the solution converges to average(x(0)) 1_n. ∎
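The following sketch (an illustration with arbitrary graph and horizon, not from the text) simulates uniform symmetric gossip on a ring and confirms that the state approaches average(x(0)) while the average is preserved at every step:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
edges = [(i, (i + 1) % n) for i in range(n)]   # ring graph
x = rng.uniform(0, 10, n)
x_avg = x.mean()                               # preserved by every update

for _ in range(5000):
    i, j = edges[rng.integers(len(edges))]     # uniformly random edge
    x[i] = x[j] = 0.5 * (x[i] + x[j])          # (1/2, 1/2) averaging

print("average preserved:", np.isclose(x.mean(), x_avg))
print("distance to consensus:", np.abs(x - x_avg).max())   # ~ 0
```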


12.3.2 Additional results on the mean-square convergence factor

Given a sequence of stochastic averaging matrices {A(k)}_{k∈Z≥0} and corresponding solutions x(k) to x(k+1) = A(k)x(k), we define the mean-square convergence factor by

r_mean-square({A(k)}_{k∈Z≥0}) = sup_{x(0)≠x_final} lim sup_{k→∞} ( E[ ‖x(k) − average(x(k)) 1_n‖₂² ] )^{1/k}.

Theorem 12.3 (Upper and lower bounds on the mean-square convergence factor). Under the same assumptions as in Theorem 12.1, the mean-square convergence factor satisfies

ρ_ess(E[A(k)])² ≤ r_mean-square ≤ ρ( E[ A(k)^⊤ (I_n − 1_n 1_n^⊤/n) A(k) ] ),

where ρ_ess denotes the essential spectral radius.

Proof. For a comprehensive proof we refer to (Fagnani and Zampieri 2008, Proposition 4.4). Here we prove the upper bound for symmetric matrices. Consider the disagreement vector δ(k) = x(k) − average(x(0))1_n = x(k) − average(x(k))1_n obeying the dynamics

δ(k+1) = (A(k) − 1_n 1_n^⊤/n) δ(k).

We have that

r_mean-square({A(k)}_{k∈Z≥0})
 = sup_{δ(0)≠0_n} lim sup_{k→∞} ( E[ ‖δ(k)‖₂² ] )^{1/k}
 = sup_{δ(0)≠0_n} lim sup_{k→∞} ( E[ ‖(A(k) − 1_n1_n^⊤/n) ⋯ (A(1) − 1_n1_n^⊤/n) δ(0)‖₂² ] )^{1/k}
 ≤ lim sup_{k→∞} ( E[ ‖A(k) − 1_n1_n^⊤/n‖₂² ] ⋯ E[ ‖A(1) − 1_n1_n^⊤/n‖₂² ] )^{1/k}
 = E[ ‖A(k) − 1_n1_n^⊤/n‖₂² ]  for any k,

where we used the sub-multiplicativity of the matrix norm and the fact that the matrices A(k) are i.i.d. The upper bound follows from

‖A(k) − 1_n1_n^⊤/n‖₂² = ρ( (A(k) − 1_n1_n^⊤/n)^⊤ (A(k) − 1_n1_n^⊤/n) ) = ρ( A(k)^⊤ (I_n − 1_n1_n^⊤/n) A(k) ),

where we used the properties (see Lemma 10.3 and its proof) of the projector matrix I_n − 1_n1_n^⊤/n. ∎

12.4 Table of asymptotic behaviors for averaging systems


Dynamics: discrete-time averaging, x(k+1) = A x(k), with A the row-stochastic adjacency matrix of a digraph G.
Assumptions: G has a globally reachable node.
Asymptotic behavior: lim_{k→∞} x(k) = (w^⊤ x(0)) 1_n, where w ≥ 0, w^⊤ A = w^⊤, and 1_n^⊤ w = 1.
Reference: Thm 5.2.

Dynamics: continuous-time averaging, ẋ(t) = −L x(t), with L the Laplacian matrix of a digraph G.
Assumptions: G has a globally reachable node.
Asymptotic behavior: lim_{t→∞} x(t) = (w^⊤ x(0)) 1_n, where w ≥ 0, w^⊤ L = 0_n^⊤, and 1_n^⊤ w = 1.
Reference: Thm 7.3.

Dynamics: time-varying discrete-time averaging, x(k+1) = A(k) x(k), with A(k) the row-stochastic adjacency matrix of a digraph G(k), k ∈ Z≥0.
Assumptions: (i) at each time k, G(k) has a self-loop at each node; (ii) each non-zero weight a_ij(k) is larger than ε > 0; (iii) there exists a duration δ such that, for all times k, G(k) ∪ ⋯ ∪ G(k+δ−1) has a globally reachable node.
Asymptotic behavior: lim_{k→∞} x(k) = (w^⊤ x(0)) 1_n, where w ≥ 0 and 1_n^⊤ w = 1.
Reference: Thm 11.2.

Dynamics: time-varying symmetric discrete-time averaging, x(k+1) = A(k) x(k), with A(k) a symmetric stochastic adjacency matrix of a graph G(k), k ∈ Z≥0.
Assumptions: (i) at each time k, G(k) has a self-loop at each node; (ii) each non-zero weight a_ij(k) is larger than ε > 0; (iii) for all times k, ∪_{τ≥k} G(τ) is connected.
Asymptotic behavior: lim_{k→∞} x(k) = average(x(0)) 1_n.
Reference: Thm 11.3.

Dynamics: time-varying continuous-time averaging, ẋ(t) = −L(t) x(t), with L(t) the Laplacian matrix of a digraph G(t), t ∈ R≥0.
Assumptions: (i) each non-zero weight a_ij(t) is larger than ε > 0; (ii) there exists a duration T such that, for all times t, the digraph associated to ∫_t^{t+T} A(τ) dτ has a globally reachable node.
Asymptotic behavior: lim_{t→∞} x(t) = (w^⊤ x(0)) 1_n, where w ≥ 0 and 1_n^⊤ w = 1.
Reference: Thm 11.9.

Dynamics: randomized discrete-time averaging, x(k+1) = A(k) x(k), with A(k) a random row-stochastic adjacency matrix of a digraph G(k), k ∈ Z≥0.
Assumptions: (i) {A(k)}_{k∈Z≥0} is i.i.d.; (ii) each matrix has strictly positive diagonal; (iii) the digraph associated to E[A(k)] has a globally reachable node.
Asymptotic behavior: lim_{k→∞} x(k) = (w^⊤ x(0)) 1_n almost surely, where w ≥ 0 is a random vector with 1_n^⊤ w = 1.
Reference: Thm 12.1.

Table 12.1: Averaging systems: definitions, assumptions, asymptotic behavior, and references.

Part II

Nonlinear Systems


Chapter 13

Nonlinear Systems and Robotic Coordination
Coordination in relative sensing networks: rendezvous, flocking, and formations The material in this section is self-contained. Further information on flocking can be found in (Olfati-Saber 2006; Tanner et al. 2007), and further material on formation control and graph rigidity can be found in (Anderson et al. 2008; Dörfler and Francis 2010; Krick et al. 2009; Oh et al. 2015).

13.1 Coordination in relative sensing networks

We consider the following setup for the coordination of n autonomous mobile robots (referred to as agents) in a planar environment:

(i) Agent dynamics: We consider a simple and fully actuated agent model: ṗ_i = u_i, where p_i ∈ R² and u_i ∈ R² are the position and steering control input of agent i.

(ii) Relative sensing model: We consider the following sensing model.
• Each agent is equipped with onboard sensors only and has no communication devices.
• The sensing topology is encoded by an undirected and connected graph G = (V, E).
• Each agent i can measure the relative position p_i − p_j of neighboring agents: {i, j} ∈ E.

To formalize the relative sensing model, we introduce an arbitrary orientation and a labeling k ∈ {1, …, |E|} for each undirected edge {i, j} ∈ E. Recall the incidence matrix B ∈ R^{n×|E|} of the associated oriented graph and define the 2n × 2|E| matrix B̄ = B ⊗ I₂ via the Kronecker product. The Kronecker product A ⊗ B is the element-wise matrix product so that each scalar entry A_ij of A is replaced by the block-entry A_ij B in the matrix A ⊗ B. For example, if B is given by

B = [ +1   0   0   0 ]
    [ −1  +1  −1   0 ]
    [  0  −1   0  +1 ]
    [  0   0  +1  −1 ],

then B̄ is given by

B̄ = B ⊗ I₂ = [ +I₂   0    0    0  ]
              [ −I₂  +I₂  −I₂   0  ]
              [  0   −I₂   0   +I₂ ]
              [  0    0   +I₂  −I₂ ].
Figure 13.1: A ring graph with three agents. The first panel shows the agents embedded in the plane R² with positions p_i and relative positions e_i. The second panel shows the artificial potentials as springs connecting the robots, and the third panel shows the resulting forces.

With this notation the vector of relative positions is given by e = B̄^⊤ p; a numerical sketch of this construction is given after the list.

(iii) Geometric objective: The objective is to achieve a desired geometric configuration which can be expressed as a function of the relative distances ‖p_i − p_j‖ for each {i, j} ∈ E. Examples include rendezvous (‖p_i − p_j‖ = 0), collision avoidance (‖p_i − p_j‖ > 0), and desired relative spacings (‖p_i − p_j‖ = d_ij > 0).

(iv) Potential-based control: We specify the geometric objective for each edge {i, j} ∈ E as the minimum of an artificial potential function V_ij : D_ij ⊂ R → R≥0. We require the potential functions to be twice continuously differentiable on their domain D_ij.
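The incidence-matrix construction is easy to verify numerically; the following sketch (illustrative only; the positions are arbitrary choices) builds B̄ = B ⊗ I₂ for the 4-node example above and computes the stacked vector of relative positions e = B̄^⊤ p:

```python
import numpy as np

B = np.array([[+1,  0,  0,  0],
              [-1, +1, -1,  0],
              [ 0, -1,  0, +1],
              [ 0,  0, +1, -1]], dtype=float)  # incidence: 4 nodes, 4 edges

B_bar = np.kron(B, np.eye(2))                  # the 8 x 8 matrix B (x) I_2

p = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float).reshape(-1)
e = B_bar.T @ p                                # stacked relative positions

print(e.reshape(-1, 2))  # row k = relative position along the k-th edge
```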

It is instructive to think of V_ij(‖p_i − p_j‖) as a spring coupling neighboring agents {i, j} ∈ E. The resulting spring forces acting on agents i and j are

f_ij(p_i − p_j) = −(∂/∂p_i) V_ij(‖p_i − p_j‖)  and  f_ji(p_i − p_j) = −f_ij(p_i − p_j) = −(∂/∂p_j) V_ij(‖p_i − p_j‖);

see Figure 13.1 for an illustration. The overall network potential function is then

V(p) = Σ_{{i,j}∈E} V_ij(‖p_i − p_j‖).

We design the associated gradient descent control law as

ṗ_i = u_i = −(∂/∂p_i) V(p) = −Σ_{{i,j}∈E} (∂/∂p_i) V_ij(‖p_i − p_j‖) = Σ_{{i,j}∈E} f_ij(p_i − p_j),  i ∈ {1, …, n}.

In vector form the control reads as the gradient flow

ṗ = u = −(∂V(p)/∂p)^⊤ = B̄ diag({f_ij}_{{i,j}∈E}) (B̄^⊤ p).   (13.1)

The closed-loop relative sensing network (13.1) is illustrated in Figure 13.2.


Figure 13.2: Closed-loop diagram of the relative sensing network (13.1).

Controllers based on artificial potential functions induce a lot of structure in the closed-loop system. Recall the set of 2-dimensional orthogonal matrices O(2) = {R ∈ R^{2×2} | R R^⊤ = I₂}, introduced in Exercise E2.13, as the set of 2-dimensional rotations and reflections.

Lemma 13.1 (Symmetries of relative sensing networks). Consider the closed-loop relative sensing network (13.1) with an undirected and connected graph G = (V, E). For every initial condition p₀ ∈ R^{2n}, we have that

(i) the center of mass is stationary: average(p(t)) = average(p₀) for all t ≥ 0; and

(ii) the closed loop ṗ = −(∂V(p)/∂p)^⊤ is invariant under rigid body transformations: if p̄_i = R p_i + q, where R ∈ O(2) and q ∈ R² is a translation vector, then (d/dt)p̄ = −(∂V(p̄)/∂p̄)^⊤.

Proof. Regarding statement (i), since Σ_{i=1}^n ṗ_i = 0 (the spring forces are anti-symmetric), it follows that Σ_{i=1}^n p_i(t) = Σ_{i=1}^n p_i(0).

Regarding statement (ii), first notice that the potential function is invariant under translations, since V(p) = V(p + 1_n ⊗ q) for any translation q ∈ R². Second, notice that the potential function is invariant under rotations and reflections, since V_ij(‖R(p_i − p_j)‖) = V_ij(‖p_i − p_j‖) and thus V(R̄p) = V(p), where R̄ = I_n ⊗ R. From the chain rule we obtain (∂V/∂p)(R̄p) R̄ = (∂V/∂p)(p), or (∂V/∂p)(R̄p) = (∂V/∂p)(p) R̄^⊤. By combining these insights when changing coordinates via p̄_i = R p_i + q (or p̄ = R̄ p + 1_n ⊗ q), we find that

(d/dt)p̄ = R̄ ṗ = −R̄ ((∂V/∂p)(p))^⊤ = −R̄ ((∂V/∂p̄)(p̄) R̄)^⊤ = −R̄ R̄^⊤ ((∂V/∂p̄)(p̄))^⊤ = −((∂V/∂p̄)(p̄))^⊤. ∎


Example 13.2 (The linear-quadratic rendezvous problem). An undirected consensus system is a relative sensing network coordination problem where the objective is rendezvous: p_i = p_j for all {i, j} ∈ E. For each edge {i, j} ∈ E consider an artificial potential V_ij : R² → R≥0 which has a minimum at the desired objective. For example, for the quadratic potential function

V_ij(p_i − p_j) = (a_ij/2) ‖p_i − p_j‖₂²,

the overall potential function is obtained as the Laplacian potential V(p) = (1/2) p^⊤ L̄ p, where L̄ = L ⊗ I₂. The resulting gradient descent control law gives rise to the linear Laplacian flow

ṗ_i = u_i = −(∂/∂p_i) V(p) = −Σ_{{i,j}∈E} a_ij (p_i − p_j).   (13.2)

So far, we analyzed the consensus problem (13.2) using matrix theory and exploiting the linearity of the problem.
In the following, we introduce numerous tools that will allow us to analyze nonlinear consensus-type interactions
and more general nonlinear dynamical systems.
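As a quick illustration of the planar Laplacian flow (13.2) (a minimal sketch, not from the text; the complete graph on three agents and the initial positions are arbitrary choices), one can integrate ṗ = −(L ⊗ I₂) p and watch all agents meet at their centroid:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Complete graph on 3 agents with unit weights: Laplacian L.
L = np.array([[ 2, -1, -1],
              [-1,  2, -1],
              [-1, -1,  2]], dtype=float)
L_bar = np.kron(L, np.eye(2))

p0 = np.array([[0, 0], [4, 0], [1, 3]], float).reshape(-1)
sol = solve_ivp(lambda t, p: -L_bar @ p, (0, 10), p0, rtol=1e-9)

pT = sol.y[:, -1].reshape(-1, 2)
print("final positions:\n", pT.round(4))       # all (approximately) equal
print("centroid preserved:",
      np.allclose(pT.mean(axis=0), p0.reshape(-1, 2).mean(axis=0)))
```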

13.2 Stability theory for dynamical systems

Dynamical systems and equilibrium points A (continuous-time) dynamical system is a pair (X, f), where X, called the state space, is a subset of R^n, and f, called the vector field, is a map from X to R^n. Given an initial state x₀ ∈ X, the solution (also called trajectory or evolution) of the dynamical system is a curve t ↦ x(t) satisfying the differential equation

ẋ(t) = f(x(t)),  x(0) = x₀.

A dynamical system (X, f) is linear if x ↦ f(x) = Ax for some square matrix A. Typically, the map f is assumed to have some continuity properties so that the solution exists and is unique for at least small times; we do not discuss this topic here and refer, for example, to (Khalil 2002).

Examples of continuous-time dynamical systems include the (linear) Laplacian flow ẋ = −Lx (see equation (7.2) in Section 7.4) and the (nonlinear) Kuramoto coupled-oscillator model θ̇_i = ω_i − (K/n) Σ_{j=1}^n sin(θ_i − θ_j) (which we discuss in Chapter 14).

An equilibrium point for the dynamical system (X, f) is a point x* ∈ X such that f(x*) = 0_n. If the initial state is x(0) = x*, then the solution exists, is unique for all time, and is constant: x(t) = x* for all t ∈ R≥0.
Convergence and invariant sets A curve t ↦ x(t) approaches a set S ⊂ R^n as t → +∞ if the distance from x(t) to the set S converges to 0 as t → +∞. If the set S consists of a single point s, then x(t) converges to s in the usual sense: lim_{t→+∞} x(t) = s.

Given a dynamical system (X, f), a set W ⊂ X is invariant if each solution starting in W remains in W, that is, if x(0) ∈ W implies x(t) ∈ W for all t ≥ 0. We also need the following general properties: a set W ⊂ R^n is

(i) bounded if there exists a constant K such that each w ∈ W satisfies ‖w‖ ≤ K,

(ii) closed if it contains its boundary (or, equivalently, if it contains all its limit points), and

(iii) compact if it is bounded and closed.
Stability An equilibrium point x* for the system (X, f) is said to be

(i) stable (or Lyapunov stable) if, for each ε > 0, there exists δ = δ(ε) > 0 so that if ‖x(0) − x*‖ < δ, then ‖x(t) − x*‖ < ε for all t ≥ 0,

(ii) unstable if it is not stable,

(iii) locally asymptotically stable if it is stable and if there exists δ > 0 such that lim_{t→∞} x(t) = x* for all trajectories satisfying ‖x(0) − x*‖ < δ.

Moreover, given a locally asymptotically stable equilibrium point x*,

(i) the set of initial conditions x₀ ∈ X whose corresponding solution x(t) converges to x* is termed the region of attraction of x*,

(ii) x* is said to be globally asymptotically stable if its region of attraction is the whole space X,

(iii) x* is said to be globally (respectively, locally) exponentially stable if it is globally (respectively, locally) asymptotically stable and all trajectories starting in the region of attraction satisfy

‖x(t) − x*‖ ≤ c₁ ‖x(0) − x*‖ e^{−c₂ t},

for some positive constants c₁, c₂ > 0.

Some of these concepts are illustrated in Figure 13.3.

"

Figure 13.3: Illustrations of a stable, an unstable and an asymptotically stable equilibrium.

Energy functions: non-increasing functions, sublevel sets and critical points In order to establish the stability and convergence properties of a dynamical system, we will use the concept of an energy function that is non-increasing along the system's solutions.

The Lie derivative of a function V : R^n → R with respect to a vector field f : R^n → R^n is the function L_f V : R^n → R defined by

L_f V(x) = (∂V(x)/∂x) f(x).   (13.3)

A differentiable function V : R^n → R is said to be non-increasing along every trajectory of the system if each solution x : R≥0 → X satisfies

(d/dt) V(x(t)) = L_f V(x(t)) ≤ 0,

or, equivalently, if each point x ∈ X satisfies L_f V(x) ≤ 0.

A critical point for a differentiable function V : R^n → R is a point x̄ ∈ X satisfying

(∂V/∂x)(x̄) = 0_n.

Every critical point of a differentiable function is either a local minimum, a local maximum, or a saddle point. Given a function V : R^n → R and a constant k ∈ R, the k-level set of V is {y ∈ R^n | V(y) = k}, and the k-sublevel set of V is {y ∈ R^n | V(y) ≤ k}. These concepts are illustrated in Figure 13.4.
Figure 13.4: A differentiable function, its sublevel sets and its critical points. The sublevel set {x | V(x) ≤ k₁} is unbounded. The sublevel set {x | V(x) ≤ k₂} = [x₁, x₅] is compact and contains three critical points (x₂ and x₄ are local minima and x₃ is a local maximum). Finally, the sublevel set {x | V(x) ≤ k₃} is compact and contains a single critical point x₄.

13.2.1 Main convergence tool: the LaSalle Invariance Principle

We now present a powerful analysis tool for the convergence analysis of nonlinear systems, namely
the LaSalle Invariance Principle. We refer to (Khalil 2002, Theorem 4.4) for a complete proof, many
examples and much related material. Also, we refer to (Bullo et al. 2009; Mesbahi and Egerstedt 2010)
for various extensions and applications to robotic coordination.
Theorem 13.3 (LaSalle Invariance Principle). Consider a dynamical system (X, f) with differentiable f. Assume

(i) there exists a compact set W ⊂ X that is invariant for (X, f),

(ii) there exists a continuously-differentiable function V : X → R satisfying L_f V(x) ≤ 0 for all x ∈ X.

Then each solution t ↦ x(t) starting in W, that is, x(0) ∈ W, converges to the largest invariant set contained in

{x ∈ W | L_f V(x) = 0}.

Note: If the set S is composed of multiple disconnected components and t 7 x(t) approaches S,
then it must approach one of its disconnected components. Specifically, if the set S is composed of a
finite number of points, then t 7 x(t) must converge to one of the points.

13.2.2 Application #1: Linear and linearized systems

It is interesting to study the convergence properties of a linear system. Recall that a symmetric matrix is positive definite if all its eigenvalues are strictly positive.

Theorem 13.4 (Convergence of linear systems). For a matrix A ∈ R^{n×n}, the following properties are equivalent:

(i) each solution to the differential equation ẋ = Ax satisfies lim_{t→+∞} x(t) = 0_n,

(ii) all the eigenvalues of A have strictly-negative real parts, and

(iii) for every positive-definite matrix Q, there exists a unique positive-definite matrix P solving the so-called Lyapunov equation:

A^⊤ P + P A = −Q.

One can show that statement (iii) implies statement (i) using the LaSalle Invariance Principle with the function V(x) = x^⊤ P x, whose derivative along the system's solutions is V̇ = x^⊤(A^⊤P + PA)x = −x^⊤ Q x ≤ 0.

The linearization at the equilibrium point x* of the dynamical system (X, f) is the linear dynamical system defined by the differential equation ẋ = Ax, where

A = (∂f/∂x)(x*).

Theorem 13.5 (Convergence of nonlinear systems via linearization). Consider a dynamical system (X, f) with an equilibrium point x*, with twice differentiable vector field f, and with linearization A at x*. The following statements hold:

(i) the equilibrium point x* is locally exponentially stable if and only if all the eigenvalues of A have strictly-negative real parts; and

(ii) the equilibrium point x* is unstable if at least one eigenvalue of A has strictly-positive real part.

Theorem 13.5 can often be invoked to analyze the local stability of a nonlinear system. For example, for ω ∈ R, consider the dynamical system

θ̇ = f(θ) = ω − sin(θ),

which we will study extensively in Chapters 14 and 15. If ω ∈ [0, 1[, then two equilibrium points are θ₁ = arcsin(ω) ∈ [0, π/2[ and θ₂ = π − arcsin(ω) ∈ ]π/2, +π]. Moreover, the 2π-periodic sets of equilibria are given by {θ₁ + 2kπ | k ∈ Z} and {θ₂ + 2kπ | k ∈ Z}. The linearization matrix A(θ_i) = (∂f/∂θ)(θ_i) = −cos(θ_i) for i ∈ {1, 2} shows that θ₁ is locally stable and θ₂ is unstable.

On the other hand, pick a scalar c and, for x ∈ R, consider the dynamical system

ẋ = f(x) = c x³.

The linearization at the equilibrium x* = 0 is indefinite: A(x*) = 0. Thus, Theorem 13.5 offers no conclusions other than that the equilibrium cannot be exponentially stable. On the other hand, the LaSalle Invariance Principle shows that for c < 0 every trajectory converges to x* = 0. Here, a non-increasing and differentiable function is given by V(x) = x² with Lie derivative L_f V(x) = 2c x⁴ ≤ 0. Since V(x(t)) is non-increasing along the solution to the dynamical system, a compact invariant set is then readily given by any sublevel set {x | V(x) ≤ k} for k ≥ 0.
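A numerical sanity check of the first example (a throwaway sketch; ω = 0.5 is an arbitrary choice):

```python
import numpy as np

omega = 0.5
theta1 = np.arcsin(omega)            # candidate stable equilibrium
theta2 = np.pi - np.arcsin(omega)    # candidate unstable equilibrium

for th in (theta1, theta2):
    A = -np.cos(th)                  # linearization d f / d theta at th
    verdict = "locally exponentially stable" if A < 0 else "unstable"
    print(f"theta = {th:.3f}, A = {A:+.3f} -> {verdict}")
```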
13.2.3 Application #2: Negative gradient systems

Given a twice differentiable function U : R^n → R, the negative gradient flow defined by U is the dynamical system

ẋ(t) = −(∂U/∂x)(x(t)).   (13.4)

For x ∈ R^n, the Hessian matrix Hess U(x) is the symmetric matrix of second-order partial derivatives: (Hess U)_ij(x) = ∂²U/∂x_i ∂x_j.
Theorem 13.6 (Convergence of negative gradient flows). Let U : R^n → R be twice differentiable and assume its sublevel set {x | U(x) ≤ k} is compact for some k ∈ R. Then the negative gradient flow (13.4) has the following properties:

(i) the sublevel set {x | U(x) ≤ k} is invariant,

(ii) each solution t ↦ x(t) with U(x(0)) ≤ k satisfies lim_{t→+∞} U(x(t)) = c for some c ≤ k and approaches the set of critical points of U:

{x ∈ R^n | (∂U/∂x)(x) = 0_n},

(iii) each local minimum point x* is locally asymptotically stable, and it is locally exponentially stable if and only if Hess U(x*) is positive definite,

(iv) a critical point x* is unstable if at least one eigenvalue of Hess U(x*) is strictly negative.

Proof. To show statements (i) and (ii), we verify that the assumptions of the LaSalle Invariance Principle are satisfied as follows. First, as set W we adopt the sublevel set {x | U(x) ≤ k}, which is compact by assumption and is invariant because, as we show next, the value of t ↦ U(x(t)) is non-increasing. Second, the derivative of the function U along its negative gradient flow is

U̇(x) = −‖(∂U/∂x)(x)‖₂² ≤ 0.

The first two facts are now an immediate consequence of the LaSalle Invariance Principle. The statements (iii) and (iv) follow from observing that the linearization of the negative gradient system at the equilibrium x* is −Hess U(x*), the negative Hessian matrix evaluated at x*, and from applying Theorem 13.5. ∎

Note: If the function U has isolated critical points, then the negative gradient flow evolving in a compact set must converge to a single critical point. In such circumstances, it is also true that from almost all initial conditions the solution will converge to a local minimum rather than a local maximum or a saddle point.

Note: given a critical point x*, a positive definite Hessian matrix Hess U(x*) is a sufficient but not a necessary condition for x* to be a local minimum. As a counterexample, consider the function U(x) = x⁴ and the critical point x* = 0.

Note: If the function U is radially unbounded, that is, lim_{‖x‖→∞} U(x) = ∞ (where the limit is taken along any path resulting in ‖x‖ → ∞), then all its sublevel sets are compact.

Note from (Łojasiewicz 1984): if the function U is analytic, then every solution starting in a compact sublevel set has finite length and converges to a single equilibrium point.
Example 13.7 (Dissipative mechanical system). Consider a dissipative mechanical system of the form

ṗ = v,
m v̇ = −d v − (∂U/∂p)(p),

where (p, v) ∈ R² are the position and velocity coordinates, m and d are the positive inertia and damping coefficients, and U : R → R is a twice differentiable potential energy function. We assume that U is strictly convex with a unique global minimum at p*. Consider the mechanical energy E : R × R → R≥0 given by the sum of kinetic and potential energy:

E(p, v) = (1/2) m v² + U(p).

We compute its derivative along trajectories of the mechanical system as follows:

Ė(p, v) = m v v̇ + (∂U/∂p)(p) ṗ = −d v² ≤ 0.

Notice that the assumptions of the LaSalle Invariance Principle in Theorem 13.3 are satisfied: the function E and the vector field (the right-hand side of the mechanical system) are continuously differentiable; the derivative Ė is nonpositive; and for any initial condition (p₀, v₀) ∈ R² the sublevel set {(p, v) ∈ R² | E(p, v) ≤ E(p₀, v₀)} is compact due to the strict convexity of U. It follows that (p(t), v(t)) converges to the largest invariant set contained in {(p, v) ∈ R² | E(p, v) ≤ E(p₀, v₀), v = 0}, that is, {(p, v) ∈ R² | E(p, v) ≤ E(p₀, v₀), v = 0, (∂U/∂p)(p) = 0}. Because U is strictly convex and twice differentiable, (∂U/∂p)(p) = 0 if and only if p = p*. Therefore, we conclude

lim_{t→+∞} (p(t), v(t)) = (p*, 0).
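This example is easy to reproduce numerically; the sketch below (arbitrary choices: m = 1, d = 0.5, and U(p) = (p − 1)², so that p* = 1) integrates the dissipative dynamics and observes convergence to (p*, 0):

```python
import numpy as np
from scipy.integrate import solve_ivp

m, d = 1.0, 0.5
dUdp = lambda p: 2.0 * (p - 1.0)        # U(p) = (p - 1)^2, minimum at p* = 1

def field(t, s):
    p, v = s
    return [v, (-d * v - dUdp(p)) / m]  # p' = v,  m v' = -d v - U'(p)

sol = solve_ivp(field, (0, 60), [4.0, 0.0], rtol=1e-9)
print("final (p, v):", sol.y[:, -1].round(4))   # approximately (1, 0)
```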

13.3 A nonlinear rendezvous problem

Consider the nonlinear rendezvous system

ṗ_i = f_i(p) = −Σ_{{i,j}∈E} g_ij(p_i − p_j),   (13.5)

where (for each {i, j} ∈ E) g_ij = g_ji is a continuously differentiable and anti-symmetric function satisfying g_ij(e) = 0 if and only if e = 0. Notice that the linearization of the system around the consensus subspace may be zero and thus not very informative, for example, when g_ij(e) = ‖e‖² e. The nonlinear rendezvous system (13.5) can be written as a gradient flow:

ṗ_i = −(∂/∂p_i) V(p) = −Σ_{j=1}^n (∂/∂p_i) V_ij(p_i − p_j),

with the associated edge potential function V_ij(‖p_i − p_j‖) = ∫₀^{‖p_i − p_j‖} g_ij(η) dη.

Theorem 13.8 (Nonlinear rendezvous). Consider the nonlinear rendezvous system (13.5) with an undirected and connected graph G = (V, E). Assume that the associated edge potential functions V_ij(‖p_i − p_j‖) = ∫₀^{‖p_i − p_j‖} g_ij(η) dη are radially unbounded. For every initial condition p₀ ∈ R^{2n}, we have that

(i) the center of mass is stationary: average(p(t)) = average(p₀) for all t ≥ 0; and

(ii) lim_{t→∞} p(t) = 1_n ⊗ average(p₀).

Proof. Note that the nonlinear rendezvous system (13.5) is the negative gradient system defined by the network potential function

V(p) = Σ_{{i,j}∈E} V_ij(‖p_i − p_j‖).

Recall from Lemma 13.1 that the center of mass is stationary, and observe that the function V(p) is radially unbounded with the exception of the translation directions 1_n ⊗ q, q ∈ R², associated with a translation of the stationary center of mass. Thus, for every initial condition p₀ ∈ R^{2n}, the set of points (with fixed center of mass)

{p ∈ R^{2n} | average(p) = average(p₀), V(p) ≤ V(p₀)}

is compact. By the LaSalle Invariance Principle in Theorem 13.3, each solution converges to the largest invariant set contained in

{p ∈ R^{2n} | average(p) = average(p₀), V(p) ≤ V(p₀), (∂V/∂p)(p) = 0_{2n}^⊤}.

It follows that the only positive limit set is the set of equilibria with the given center of mass, that is, lim_{t→∞} p(t) = 1_n ⊗ average(p₀). ∎

13.4 Flocking and Formation Control

In flocking control, the objective is that the robots should mimic the behavior of fish schools and bird flocks and attain a prescribed formation defined by a set of distance constraints. Given an undirected graph G = (V, E) and a distance constraint d_ij for every edge {i, j} ∈ E, a formation is defined by the set

F = {p ∈ R^{2n} | ‖p_i − p_j‖₂ = d_ij for all {i, j} ∈ E}.

We embed the graph G into the plane R² by assigning to each node i a location p_i ∈ R². We refer to the pair (G, p) as a framework, and we denote the set of frameworks (G, F) as the target formation. A target formation is a realization of F in the configuration space R^{2n}. A triangular example is shown in Figure 13.5.

We make the following three observations on the geometry of the target formation:

• To be non-empty, the formation F has to be realizable in the plane. For example, for the triangular formation in Figure 13.5 the distance constraints d_ij need to satisfy the triangle inequalities:

d₁₂ ≤ d₁₃ + d₂₃,  d₂₃ ≤ d₁₂ + d₁₃,  d₁₃ ≤ d₁₂ + d₂₃.

• A framework (G, p) with p ∈ F is invariant under rigid body transformations, that is, rotation or translation, as seen in Figure 13.5. Hence, the formation F is a set of dimension at least 3.
Figure 13.5: A triangular formation specified by the distance constraints d₁₂, d₁₃, and d₂₃. The left subfigure shows one possible target formation, the middle subfigure shows a rotation of this target formation, and the right subfigure shows a flip of the left target formation. All of these triangles satisfy the specified distance constraints and are elements of F.

• The formation F may consist of multiple disconnected components. For instance, for the triangular example in Figure 13.5 there is no continuous deformation from the left framework to the right flipped framework, even though both are target formations. In the state space R⁶, this absence of a continuous deformation corresponds to two disconnected components of the set F.

To steer the agents towards the target formation, consider an artificial potential function for each edge {i, j} ∈ E which mimics the Hookean potential of a spring with rest length d_ij:

V_ij(‖p_i − p_j‖) = (1/2) ( ‖p_i − p_j‖₂ − d_ij )².

Since this potential function is not differentiable, we choose the modified potential function

V_ij(‖p_i − p_j‖) = (1/4) ( ‖p_i − p_j‖₂² − d_ij² )².   (13.6)

The resulting closed loop under the gradient control law u = −(∂V(p)/∂p)^⊤ is given by

ṗ_i = u_i = −(∂/∂p_i) V(p) = −Σ_{{i,j}∈E} ( ‖p_i − p_j‖₂² − d_ij² ) (p_i − p_j).   (13.7)

Theorem 13.9 (Flocking). Consider the nonlinear flocking system (13.7) with an undirected and connected graph G = (V, E) and a realizable formation F. For every initial condition p₀ ∈ R^{2n}, we have that

(i) the center of mass is stationary: average(p(t)) = average(p₀) for all t ≥ 0; and

(ii) the agents asymptotically converge to the set of critical points of the potential function.

Proof. As in the proof of Theorem 13.8, the center of mass is stationary and the potential is non-increasing:

V̇(p) = −‖(∂V(p)/∂p)^⊤‖₂² ≤ 0.
Observe further that, for a fixed initial center of mass, the sublevel sets of V(p) form compact sets. By the LaSalle Invariance Principle in Theorem 13.3, p(t) converges to the largest invariant set contained in

{p ∈ R^{2n} | average(p) = average(p₀), V(p) ≤ V(p₀), (∂V/∂p)(p) = 0_{2n}^⊤}.
It follows that the positive limit set is the set of critical points of the potential function.

The above result also holds true for non-smooth potential functions V_ij : ]0, ∞[ → R that satisfy

(P1) regularity: V_ij(ℓ) is defined and twice continuously differentiable on ]0, ∞[;
(P2) distance specification: f_ij(ℓ) := ∂V_ij(ℓ)/∂ℓ = 0 if and only if ℓ = d_ij;
(P3) mutual attractivity: f_ij(ℓ) = ∂V_ij(ℓ)/∂ℓ is strictly monotonically increasing; and
(P4) collision avoidance: lim_{ℓ→0⁺} V_ij(ℓ) = ∞.
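As a concrete example (our choice, not prescribed by the text), the logarithmic barrier potential

    V_ij(ℓ) = ℓ² − d_ij² ln(ℓ²) ,   ℓ > 0 ,

satisfies (P1)-(P4): it is smooth on ]0, ∞[, its derivative f_ij(ℓ) = 2ℓ − 2d_ij²/ℓ vanishes exactly at ℓ = d_ij and is strictly increasing (since f_ij′(ℓ) = 2 + 2d_ij²/ℓ² > 0), and V_ij(ℓ) → ∞ as ℓ → 0⁺. This is one possible realization of the barrier curve sketched in Figure 13.6.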

An illustration of possible potential functions can be found in Figure 13.6.

Figure 13.6: Illustration of the quadratic potential function (13.6) (blue solid plot) and a logarithmic barrier potential function (red dashed plot) that approaches ∞ as two neighboring agents become collocated. Subfigure (a) shows the artificial potential functions V_ij as functions of ‖p_i − p_j‖; subfigure (b) shows the induced artificial spring forces ‖f_ij‖.

Theorem 13.10 (Flocking with collision avoidance). Consider the gradient flow (13.1) with an undirected and connected graph G = (V, E), a realizable formation F, and artificial potential functions satisfying (P1) through (P4). For every initial condition p₀ ∈ R^{2n} satisfying p_i(0) ≠ p_j(0) for all {i, j} ∈ E, we have that

(i) the solution to the non-smooth dynamical system exists for all times t ≥ 0;
(ii) the center of mass is stationary: average(p(t)) = average(p(0)) for all t ≥ 0;
(iii) neighboring robots never collide, that is, p_i(t) ≠ p_j(t) for all {i, j} ∈ E and for all t ≥ 0; and
(iv) the agents asymptotically converge to the set of critical points of the potential function.

Proof. The proof of Theorem 13.10 is identical to that of Theorem 13.9 after realizing that, for initial conditions satisfying p_i(0) ≠ p_j(0) for all {i, j} ∈ E, the dynamics are confined to the compact and forward invariant set

    { p ∈ R^{2n} | average(p) = average(p₀) , V(p) ≤ V(p₀) } .

Within this set, the dynamics (13.7) are twice continuously differentiable and collisions are avoided.

At this point we should ask ourselves the following three questions:

(i) Do the agents actually stop, that is, does there exist a p∞ ∈ R^{2n} so that lim_{t→∞} p(t) = p∞?
(ii) The formation F is a subset of the set of critical points of the potential function. How can we render this particular subset stable (amongst possibly other critical points)? What are the other critical points?
(iii) Does our specification of the target formation make sense? For example, in Figure 13.7 the target formation can be continuously deformed such that the resulting geometric configurations are not congruent.

Figure 13.7: A rectangular target formation among four robots, which is specified by four distance constraints. The initial geometric configuration (solid circles) can be continuously deformed such that the resulting geometric configuration is no longer congruent. All of the displayed configurations are part of the target formation set and satisfy the distance constraints, including the case in which the agents are collinear.

The answers to all of these questions are tied to a graph-theoretic concept called rigidity.

13.5 Rigidity and stability of the target formation

To introduce the notion of graph rigidity, we view the undirected graph G = (V, E) as a framework (G, p) embedded in the plane R². Given a framework (G, p), we define the rigidity function r_G(p) as

    r_G : R^{2n} → R^{|E|} ,   r_G(p) := ( … , ½ ‖p_i − p_j‖₂² , … )^⊤ ,

where each component of r_G(p) corresponds to (one half) the squared length of the relative position p_i − p_j for an edge {i, j} ∈ E.
Definition 13.11 (Rigidity). Given an undirected graph G = (V, E) and p ∈ R^{2n}, the framework (G, p) is said to be rigid if there is an open neighbourhood U of p such that if q ∈ U and r_G(p) = r_G(q), then (G, p) is congruent to (G, q).

Figure 13.8: (a) A flexible framework; (b) a rigid framework. The framework in Figure 13.8(a) is not rigid since a slight perturbation of the upper two points of the framework results in a framework that is not congruent to the original one, although their rigidity functions coincide. If an additional cross link is added to the framework as in Figure 13.8(b), small perturbations that do not change the rigidity function result in a congruent framework. Thus, the framework in Figure 13.8(b) is rigid.

An example of a rigid and non-rigid framework is shown in Figure 13.8.


Although rigidity is a very intuitive concept, its definition does not provide an easily verifiable condition, especially if one is interested in finding the exact neighbourhood U where the framework is rigid. The following linearized rigidity concept offers an easily checkable algebraic condition. The idea is to allow an infinitesimally small perturbation δp of the framework (G, p) while keeping the rigidity function constant up to first order. The first-order Taylor approximation of the rigidity function r_G about p is

    r_G(p + δp) = r_G(p) + (∂r_G(p)/∂p) δp + O(‖δp‖²) .

The rigidity function then remains constant up to first order if δp ∈ kernel(∂r_G(p)/∂p). The matrix ∂r_G(p)/∂p ∈ R^{|E|×2n} is called the rigidity matrix of the graph G. If the perturbation δp is a rigid body motion, that is, a translation or rotation of the framework, then the rigidity function remains constant and, by Definition 13.11, the framework remains rigid. Thus, the dimension of the kernel of the rigidity matrix is at least 3. The idea that rigidity is preserved under infinitesimal perturbations motivates the following definition of infinitesimal rigidity.
Definition 13.12 (Infinitesimal rigidity). Given an undirected graph G = (V, E) and p ∈ R^{2n}, the framework (G, p) is said to be infinitesimally rigid if dim kernel(∂r_G(p)/∂p) = 3 or, equivalently, if rank(∂r_G(p)/∂p) = 2n − 3.

If a framework is infinitesimally rigid, then it is also rigid, but the converse is not necessarily true (Asimow and Roth 1979). Also note that an infinitesimally rigid framework must have at least 2n − 3 edges. If it has exactly 2n − 3 edges, then we call it a minimally rigid framework. Finally, if (G, p) is infinitesimally rigid at p, so is (G, p′) for all p′ in an open neighborhood of p. Thus, infinitesimal rigidity is a generic property that depends almost only on the graph G and not on the specific point p ∈ R^{2n}. Throughout the literature, (infinitesimally, minimally) rigid frameworks are often referred to as (infinitesimally, minimally) rigid graphs.
Example 13.13 (Rigidity and infinitesimal rigidity of triangular formation). Consider the triangular
framework in Figure 13.9(a) and the collapsed triangular framework in Figure 13.9(b) which are both embeddings

of the same triangular graph. The rigidity function for both frameworks is given by

    r_G(p) = ½ ( ‖p₂ − p₁‖² , ‖p₃ − p₂‖² , ‖p₁ − p₃‖² )^⊤ .
Both frameworks are rigid but only the left framework is infinitesimally rigid. To see this, consider the rigidity matrix

    ∂r_G(p)/∂p = [ (p₁ − p₂)^⊤   (p₂ − p₁)^⊤   0          ;
                   0             (p₂ − p₃)^⊤   (p₃ − p₂)^⊤ ;
                   (p₁ − p₃)^⊤   0             (p₃ − p₁)^⊤ ] .

The rank of the rigidity matrix at a collinear point is 2 < 2n − 3 = 3. Hence, the collapsed triangle in Figure 13.9(b) is not infinitesimally rigid. All non-collinear realizations are infinitesimally and minimally rigid. Hence, the triangular framework in Figure 13.9(a) is generically minimally rigid (for almost every p ∈ R⁶).
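The rank condition of Definition 13.12 is easy to test numerically. The following Python sketch (an illustration with a hypothetical helper name, not taken from the text) assembles the rigidity matrix of a framework and checks whether its rank equals 2n − 3, reproducing the generic and collinear cases of Example 13.13:

# Numerical test of infinitesimal rigidity via the rank of the rigidity matrix.
import numpy as np

def rigidity_matrix(p, edges):
    """Row for edge {i,j} carries (p_i - p_j)^T in the columns of node i
    and (p_j - p_i)^T in the columns of node j, as in Example 13.13."""
    n = p.shape[0]
    R = np.zeros((len(edges), 2 * n))
    for k, (i, j) in enumerate(edges):
        R[k, 2*i:2*i+2] = p[i] - p[j]
        R[k, 2*j:2*j+2] = p[j] - p[i]
    return R

edges = [(0, 1), (1, 2), (0, 2)]
p_generic   = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])
p_collinear = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
for p in (p_generic, p_collinear):
    n = p.shape[0]
    rank = np.linalg.matrix_rank(rigidity_matrix(p, edges))
    print(rank, "infinitesimally rigid" if rank == 2*n - 3 else "not infinitesimally rigid")

The generic triangle returns rank 3 = 2n − 3; the collinear triangle returns rank 2.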

Minimally rigid graphs can be constructed by adding a new node with two undirected edges to an existing minimally rigid graph; see Figure 13.10 and the sketch below. This construction is known as a Henneberg sequence.
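As a minimal sketch of this construction (the node placement and anchor choices below are arbitrary, for illustration only), the following lines grow an edge list by repeatedly attaching a new node to two distinct existing nodes:

# Henneberg-type vertex addition: start from a single edge and repeatedly
# attach a new node to two distinct existing nodes; the edge count stays 2n - 3.
import random

edges, n = [(0, 1)], 2
for _ in range(4):                       # add four nodes
    a, b = random.sample(range(n), 2)    # two distinct anchor nodes
    edges += [(a, n), (b, n)]
    n += 1
print(n, len(edges), len(edges) == 2 * n - 3)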
The flocking result in Theorem 13.9 identifies the critical points of the potential function as the positive limit set. For minimally rigid graphs, we can perform a more insightful stability analysis. To do so, we first reformulate the formation control problem in the coordinates of the relative positions e = B̂^⊤ p, where B̂ := B ⊗ I₂ and B is the incidence matrix of G. The rigidity function can be conveniently rewritten in terms of the relative positions e_ij = p_i − p_j for every edge {i, j} ∈ E:

    r_G : B̂^⊤ R^{2n} → R^{|E|} ,   r_G(e) = ( … , ½ ‖e_ij‖₂² , … )^⊤ .

The rigidity matrix is then obtained in terms of the relative positions as

    R(e) := ∂r_G(e)/∂p = (∂r_G(e)/∂e)(∂e/∂p) = diag( {e_ij^⊤}_{{i,j}∈E} ) B̂^⊤ .

Figure 13.9: Infinitesimal rigidity properties of a framework with three points: (a) a rigid and infinitesimally rigid framework (the triangle inequalities are strict); (b) a rigid but not infinitesimally rigid framework (the triangle inequalities hold with equality).



Figure 13.10: Construction of a minimally rigid graph by means of a Henneberg sequence


Consider the shorthand v(e) − d := ( … , ‖p_i − p_j‖₂² − d_ij² , … )^⊤. Then the closed-loop formation control equations (13.7) can be reformulated in terms of relative positions as

    ė = B̂^⊤ ṗ = B̂^⊤ u = −B̂^⊤ B̂ diag(e)(v(e) − d) = −B̂^⊤ R(e)^⊤ (v(e) − d) .   (13.8)

The associated initial condition e₀ = B̂^⊤ p₀ is a vector in image(B̂^⊤).
Theorem 13.14 (Stability of minimally rigid formations). Consider the nonlinear flocking system (13.7) with an undirected and connected graph G = (V, E) and a realizable and minimally rigid formation F. For every initial condition p₀ ∈ R^{2n}, we have that

(i) the center of mass is stationary: average(p(t)) = average(p₀) for all t ≥ 0;
(ii) the agents asymptotically converge to the set

    W_{p₀} = { p ∈ R^{2n} | average(p) = average(p₀) , V(p) ≤ V(p₀) , R(e)^⊤ [v(e) − d] = 0_{2n} } .

In particular, the limit set W_{p₀} is a union of realizations of the target formation (G, p) with p ∈ W_{p₀} ∩ F and of the set of points p ∈ W_{p₀} where the framework (G, p) is not infinitesimally rigid; and

(iii) for every p₀ ∈ R^{2n} such that the framework (G, p) is minimally rigid for all p in the set

    { p ∈ R^{2n} | average(p) = average(p₀) , V(p) ≤ V(p₀) } ,

the agents converge exponentially fast to a stationary target formation (G, p∞) with p∞ ∈ W_{p₀} ∩ F.
Proof. Consider the potential function V, which reads in e-coordinates as

    V(e) = ¼ ‖v(e) − d‖₂² .   (13.9)

In the space of relative positions, the target formation set B̂^⊤ F is compact since the translational invariance is removed. Also the sublevel sets of V(e) are compact, and the derivative along the trajectories of (13.8) is

    V̇(e) = −[v(e) − d]^⊤ diag(e^⊤) B̂^⊤ R(e)^⊤ [v(e) − d] = −[v(e) − d]^⊤ R(e) R(e)^⊤ [v(e) − d] ≤ 0 .

Notice that V(e(t)) is non-increasing and that, for every c ≥ 0, the sublevel set

    Ω(c) := { e ∈ image(B̂^⊤) | V(e) ≤ c }

is forward invariant. By the LaSalle Invariance Principle, for every initial condition e₀ ∈ image(B̂^⊤) the associated solution of (13.8) converges to the largest invariant set in

    W_{e₀} = { e ∈ image(B̂^⊤) | V(e) ≤ V(e₀) , R(e)^⊤ [v(e) − d] = 0_{2n} } .

In particular, the limit set W_{e₀} includes the realizations of the target formation (G, p) with p ∈ W_{p₀} ∩ F, e = B̂^⊤ p, and v(e) − d = 0_{|E|}, as well as the set of points e ∈ W_{e₀} where the rigidity matrix R(e)^⊤ ∈ R^{2n×|E|} loses rank, corresponding to points p ∈ W_{p₀} where the framework (G, p) is not infinitesimally rigid.
Due to the minimal rigidity of the target formation, the matrix R(e)^⊤ ∈ R^{2n×|E|} has full rank |E| = 2n − 3 for all e ∈ B̂^⊤ F or, said differently, R(e)R(e)^⊤ has no zero eigenvalues for all e ∈ B̂^⊤ F. The minimal eigenvalue of R(e)R(e)^⊤ is thus positive for all e ∈ B̂^⊤ F and (due to the continuity of eigenvalues with respect to the matrix elements) also in an open neighborhood of B̂^⊤ F. In particular, for any strictly positive ε > 0, we can find c̄ = c̄(ε) so that everywhere in the sublevel set Ω(c̄) the matrix R(e)R(e)^⊤ is positive definite with eigenvalues lower-bounded by ε. Formally, c̄ is obtained as

    c̄(ε) = argmax { c ≥ 0 | min eig( R(e)R(e)^⊤ ) ≥ ε for all e ∈ Ω(c) } .

Then, for all e ∈ Ω(c̄), we can upper-bound the derivative of V(e) along trajectories as

    V̇(e) ≤ −ε ‖v(e) − d‖₂² = −4ε V(e) .   (13.10)

By the Grönwall-Bellman Comparison Lemma in Exercise E13.1, we have that V(e(t)) ≤ V(e₀) e^{−4εt} for every e₀ ∈ Ω(c̄). It follows that the target formation set (parameterized in terms of relative positions) B̂^⊤ F is exponentially stable with Ω(c̄) as guaranteed region of attraction.
Although the e-dynamics (13.8) and the p-dynamics (13.7) both have the formation F as a limit set, convergence of the e-dynamics does not automatically imply convergence to a stationary target formation (but only convergence of the point-to-set distance to F). To establish stationarity, we rewrite the p-dynamics (13.7) as

    p(t) = p₀ + ∫₀ᵗ f(τ) dτ ,   (13.11)

where f(t) = −B̂ diag(e(t)) [v(e(t)) − d]. Due to the exponential convergence rate of the e-dynamics in W_{e₀}, the function f(t) is exponentially decaying in time and thus an integrable (L¹) function. It follows that the integral on the right-hand side of (13.11) exists even in the limit as t → ∞, and thus a solution of the p-dynamics converges to a finite point in F; that is, the agents converge to a stationary target formation. In conclusion, for every p₀ ∈ R^{2n} so that e₀ = B̂^⊤ p₀ ∈ Ω(c̄(ε)), the agents converge exponentially fast to a stationary target formation.

Theorem 13.14 formulated for minimally rigid formations can also be extended to more redundant
infinitesimally rigid formations; see (Oh et al. 2012).

13.6 Exercises

E13.1 Grönwall-Bellman Comparison Lemma. Given a continuous function of time t ↦ a(t) ∈ R, suppose the signal t ↦ x(t) satisfies

    ẋ(t) ≤ a(t) x(t) .

Define a new signal t ↦ y(t) satisfying ẏ(t) = a(t) y(t) with y(0) = x(0). Show that

(i) y(t) = y(0) exp( ∫₀ᵗ a(τ) dτ ), and
(ii) x(t) ≤ y(t).

E13.2 Distributed optimization using the Laplacian flow. Consider the saddle point dynamics (7.6) that solve the optimization problem (7.5) in a fully distributed fashion. Assume that the objective functions are strictly convex and twice differentiable and that the underlying communication graph among the distributed processors is connected and undirected. By using the LaSalle Invariance Principle, show that all solutions of the saddle point dynamics converge to the set of saddle points.

Hint: Use the following global under-estimator property of a strictly convex function: f(x′) − f(x) > (∂f/∂x)(x) (x′ − x) for all distinct x and x′ in the domain of f.

E13.3 The Lotka-Volterra predator/prey dynamics. In mathematical ecology (Takeuchi 1996), the Lotka-Volterra equations are frequently used to describe the dynamics of biological systems in which two animal species interact, a predator and a prey. In a simplified model with unit parameters (in the general model, four strictly positive parameters characterize the interaction between the two species), the animal populations change through time according to

    ẋ(t) = x(t) − x(t)y(t) ,   ẏ(t) = −y(t) + x(t)y(t) ,   (E13.1)

where x ≥ 0 is the number of prey and y ≥ 0 is the number of predator individuals.

(i) Compute the unique non-zero equilibrium point (x*, y*) of the system.
(ii) Determine, if possible, the stability properties of the equilibrium point (x*, y*) via linearization (Theorem 13.5).
(iii) Define the function V(x, y) = −x − y + ln(x) + ln(y) and note its level sets as illustrated in Figure E13.1.
 (a) Compute the Lie derivative of V(x, y) with respect to the Lotka-Volterra vector field.
 (b) What can you say about the stability properties of (x*, y*)?
 (c) Sketch the trajectories of the system for some initial conditions in the x-y positive orthant.
E13.4 On the gradient flow of a strictly convex function. Let f : Rⁿ → R be a strictly convex and twice differentiable function. Show convergence of the associated negative gradient flow, ẋ = −(∂f/∂x)(x)^⊤, to the global minimizer x* of f using the Lyapunov function V(x) = (x − x*)^⊤ (x − x*) and the LaSalle Invariance Principle in Theorem 13.3.

Hint: Use the global underestimate property of a strictly convex function stated as follows: f(x′) − f(x) > (∂f/∂x)(x) (x′ − x) for all distinct x and x′ in the domain of f.

E13.5 Consensus with input constraints. Consider a set of n agents, each with first-order dynamics ẋᵢ = uᵢ.

(i) Design a consensus protocol that respects the input constraints uᵢ(t) ∈ [−1, 1] for all t ≥ 0, and prove that your protocol achieves consensus.
Hint: Adopt the hyperbolic tangent function (or the arctangent function) and Theorem 13.8.

Figure E13.1: Level sets of the function V(x, y) for unit parameter values; the equilibrium (x*, y*) is marked.
(ii) Extend the protocol and the proof to the case of second-order dynamics ẍᵢ = uᵢ, to achieve consensus of the position states and convergence of the velocity states to zero.
Hint: Recall Example 13.7.
E13.6 Nonlinear distributed optimization using the Laplacian flow. Consider the saddle point dynamics (7.6) that solve the optimization problem (7.5) in a fully distributed fashion. Assume that the objective functions are strictly convex and twice continuously differentiable and that the underlying communication graph among the distributed processors is connected and undirected. Show via the LaSalle Invariance Principle that all solutions of the saddle point dynamics converge to the set of saddle points.

Hint: Use the following global underestimate property of a strictly convex function: f(x′) − f(x) > (∂f/∂x)(x)(x′ − x) for all distinct x and x′ in the domain of f; and the following global overestimate property of a concave function: g(x′) − g(x) ≤ (∂g/∂x)(x)(x′ − x) for all distinct x and x′ in the domain of g. Finally, note that the overestimate property holds with equality, g(x′) − g(x) = (∂g/∂x)(x)(x′ − x), if g is affine.


Chapter 14

Coupled Oscillators: Basic Models

In this chapter we discuss networks of coupled oscillators. We borrow ideas from (Dörfler and Bullo 2011, 2014). This chapter focuses on phase-coupled oscillators and does not discuss models of impulse-coupled oscillators. Further information on coupled oscillator models can be found in Acebrón et al. (2005); Arenas et al. (2008); Mauroy et al. (2012); Strogatz (2000).

14.1 History

The scientific interest in synchronization of coupled oscillators can be traced back to the work by Christiaan Huygens on an "odd kind of sympathy" between coupled pendulum clocks (Huygens 1673). The model of coupled oscillators which we study was originally proposed by Arthur Winfree (Winfree 1967). For complete interaction graphs, this model is nowadays known as the Kuramoto model due to the work by Yoshiki Kuramoto (Kuramoto 1975, 1984). Stephen Strogatz provides an excellent historical account in (Strogatz 2000).

The Kuramoto model and its variations appear in the study of biological synchronization phenomena such as pacemaker cells in the heart (Michaels et al. 1987), circadian rhythms (Liu et al. 1997), neuroscience (Brown et al. 2003; Crook et al. 1997; Varela et al. 2001), metabolic synchrony in yeast cell populations (Ghosh et al. 1971), flashing fireflies (Buck 1988), chirping crickets (Walker 1969), and rhythmic applause (Néda et al. 2000), among others. The Kuramoto model also appears in physics and chemistry in the modeling and analysis of spin glass models (Daido 1992; Jongen et al. 2001), flavor evolutions of neutrinos (Pantaleone 1998), and in the analysis of chemical oscillations (Kiss et al. 2002). Some technological applications include deep brain stimulation (Tass 2003), vehicle coordination (Klein et al. 2008; Paley et al. 2007; Sepulchre et al. 2007), semiconductor lasers (Hoppensteadt and Izhikevich 2000; Kozyreff et al. 2000), microwave oscillators (York and Compton 2002), clock synchronization in wireless networks (Simeone et al. 2008), and droop-controlled inverters in microgrids (Simpson-Porco et al. 2012).

Figure 14.1: Mechanical analog of a coupled oscillator network: particles on a ring coupled by springs with stiffnesses k₁₂, k₂₃, k₂₄, k₃₄.

14.2 Examples

14.2.1 Example #1: A spring network on a ring

This coupled-oscillator network consists of particles rotating around a unit-radius circle, assumed to possibly overlap without colliding. Each particle is subject to (1) a non-conservative torque τᵢ, (2) a linear damping torque, and (3) a total elastic torque.

Pairs of interacting particles i and j are coupled through elastic springs with stiffness kᵢⱼ > 0. The elastic energy stored by the spring between particles at angles θᵢ and θⱼ is

    Eᵢⱼ(θᵢ, θⱼ) = (kᵢⱼ/2) distance² = (kᵢⱼ/2) [ (cos θᵢ − cos θⱼ)² + (sin θᵢ − sin θⱼ)² ]
                = kᵢⱼ ( 1 − cos θᵢ cos θⱼ − sin θᵢ sin θⱼ ) = kᵢⱼ ( 1 − cos(θᵢ − θⱼ) ) ,

so that the elastic torque on particle i is

    −∂Eᵢⱼ(θᵢ, θⱼ)/∂θᵢ = −kᵢⱼ sin(θᵢ − θⱼ) .

In summary, Newton's law applied to this rotating system implies that the network of spring-interconnected particles obeys the dynamics

    Mᵢ θ̈ᵢ + Dᵢ θ̇ᵢ = τᵢ − Σ_{j=1}^n kᵢⱼ sin(θᵢ − θⱼ) ,

where Mᵢ and Dᵢ are inertia and damping coefficients. In the limit of small masses Mᵢ and uniformly-high viscous damping D = Dᵢ, that is, Mᵢ/D ≈ 0, the model simplifies to

    θ̇ᵢ = ωᵢ − Σ_{j=1}^n aᵢⱼ sin(θᵢ − θⱼ) ,   i ∈ {1, . . . , n} ,

with natural rotation frequencies ωᵢ = τᵢ/D and coupling strengths aᵢⱼ = kᵢⱼ/D.
Figure 14.2: Line diagram and graph representation for a simplified model of the New England Power Grid, a test system with 10 synchronous generators and 39 buses; generators are represented by circled nodes and load buses by solid dots.

14.2.2 Example #2: The structure-preserving power network model

We consider an AC power network, visualized in Figure 14.2, with n buses including generators and load buses. We present two simplified models for this network, a static power-balance model and a dynamic continuous-time model.

The transmission network is described by an admittance matrix Y ∈ C^{n×n} that is symmetric and sparse, with line impedances Zᵢⱼ = Zⱼᵢ for each branch {i, j} ∈ E. The network admittance matrix is a sparse matrix with nonzero off-diagonal entries Yᵢⱼ = −1/Zᵢⱼ for each branch {i, j} ∈ E; the diagonal elements Yᵢᵢ = −Σ_{j=1, j≠i}^n Yᵢⱼ assure zero row sums.

The static model is described by the following concepts. Firstly, according to Kirchhoff's current law, the current injection at node i is balanced by the current flows from the adjacent nodes:

    Iᵢ = Σ_{j=1}^n (1/Zᵢⱼ)(Vᵢ − Vⱼ) = Σ_{j=1}^n Yᵢⱼ Vⱼ .

Here, Iᵢ and Vᵢ are the phasor representations of the nodal current injections and nodal voltages, e.g., Vᵢ = |Vᵢ| e^{iθᵢ} corresponds to the signal |Vᵢ| cos(ω₀ t + θᵢ). The complex power injection Sᵢ = Vᵢ Īᵢ (where z̄ denotes the complex conjugate of z ∈ C) then satisfies the power balance equation

    Sᵢ = Vᵢ Σ_{j=1}^n Ȳᵢⱼ V̄ⱼ = Σ_{j=1}^n Ȳᵢⱼ |Vᵢ| |Vⱼ| e^{i(θᵢ − θⱼ)} .

Secondly, for a lossless network, the real part of the power balance equation at each node equates the active power injection at node i with the sum of the active power flows from the neighboring nodes j to node i:

    Pᵢ = Σ_{j=1}^n aᵢⱼ sin(θᵢ − θⱼ) ,   i ∈ {1, . . . , n} ,   (14.1)

where aᵢⱼ = |Vᵢ||Vⱼ||Yᵢⱼ| denotes the maximum power transfer over the transmission line {i, j}, and Pᵢ = ℜ(Sᵢ) is the active power injection into the network at node i, which is positive for generators and negative for loads. The equations (14.1) are the so-called (balanced) active power flow equations.
Next, we discuss a simplified dynamic model. Many appropriate dynamic models have been proposed for each network node: zeroth-order models (for so-called constant power loads), first-order models (for so-called frequency-dependent loads and inverter-based generators), and second- and higher-order models for generators; see (Bergen and Hill 1981). For extreme simplicity, here we assume that every node is described by a first-order integrator with the following intuition: node i speeds up (i.e., θ̇ᵢ increases) when the power balance at node i is positive, and slows down (i.e., θ̇ᵢ decreases) when the power balance at node i is negative. In other words, we assume

    θ̇ᵢ = Pᵢ − Σ_{j=1}^n aᵢⱼ sin(θᵢ − θⱼ) .   (14.2)

The equations (14.2) are a first-order simplified version of the so-called coupled swing equations.

Note that, when every node is connected to every other node with identical connections of strength K > 0, our simplified model of a power network is identical to the so-called Kuramoto oscillator model:

    θ̇ᵢ = ωᵢ − (K/n) Σ_{j=1}^n sin(θᵢ − θⱼ) .   (14.3)

(Here ωᵢ = Pᵢ and aᵢⱼ = K/n for all i, j.)


Let us remark that a more realistic model of a power network would necessarily include higher-order dynamics for the generators, uncertain load models, mixed resistive-inductive lines, and the modeling of reactive power.

14.2.3 Example #3: Flocking, schooling, and vehicle coordination

Consider a set of n particles in the plane R², which we identify with the complex plane C. Each particle i ∈ {1, . . . , n} is characterized by its position rᵢ ∈ C, its heading angle θᵢ ∈ S¹, and a steering control law uᵢ(r, θ) depending on the positions and headings of itself and the other vehicles; see Figure 14.3(a). For simplicity, we assume that all particles have unit speed. The particle kinematics are then given by

    ṙᵢ = e^{iθᵢ} ,   θ̇ᵢ = uᵢ(r, θ) ,   (14.4)

for i ∈ {1, . . . , n}, where i = √−1 denotes the imaginary unit. If no control is applied, then particle i travels in a straight line with orientation θᵢ(0); if uᵢ = ω₀ ∈ R is a nonzero constant, then particle i traverses a circle with radius 1/|ω₀|.
The interaction among the particles is modeled by an interaction graph G = ({1, . . . , n}, E, A) determined by communication and sensing patterns. As shown by Vicsek et al. (1995), interesting motion patterns emerge if the controllers use only relative phase information between neighboring particles. As discussed in the previous chapter, we may adopt potential-function-based gradient control strategies (i.e., negative gradient flows) to coordinate the relative heading angles θᵢ(t) − θⱼ(t). As shown in Example #1, an intuitive extension of the quadratic Hookean spring potential to the circle is the function Uᵢⱼ : S¹ → R defined by

    Uᵢⱼ(θᵢ, θⱼ) = aᵢⱼ ( 1 − cos(θᵢ − θⱼ) ) ,

for each edge {i, j} ∈ E. Notice that the potential Uᵢⱼ(θᵢ, θⱼ) achieves its unique minimum if the heading angles θᵢ and θⱼ are synchronized, and it achieves its maximum when θᵢ and θⱼ are out of phase by an angle π. These considerations motivate the gradient-based control strategy

    θ̇ᵢ = ω₀ − K Σ_{{i,j}∈E} ∂Uᵢⱼ(θᵢ, θⱼ)/∂θᵢ = ω₀ − K Σ_{j=1}^n aᵢⱼ sin(θᵢ − θⱼ) ,   i ∈ {1, . . . , n} ,   (14.5)

to synchronize the heading angles of the particles for K > 0 (gradient descent), respectively to disperse the heading angles for K < 0 (gradient ascent). The term ω₀ can induce additional rotations (for ω₀ ≠ 0) or translations (for ω₀ = 0). A few representative trajectories are illustrated in Figure 14.3.

The controlled phase dynamics (14.5) give rise to elegant and useful coordination patterns that mimic animal flocking behavior and fish schools. Inspired by these biological phenomena, scientists have studied the controlled phase dynamics (14.5) and their variations in the context of tracking and formation controllers in swarms of autonomous vehicles (Paley et al. 2007).
Figure 14.3: Panel (a) illustrates the particle kinematics (14.4). Panels (b)-(e) illustrate the controlled dynamics (14.4)-(14.5) with n = 6 particles, a complete interaction graph, and identical and constant natural frequencies: ω₀(t) = 0 in panels (b) and (c) and ω₀(t) = 1 in panels (d) and (e). The values of K are K = +1 in panels (b) and (d) and K = −1 in panels (c) and (e). The arrows depict the orientation, the dashed curves show the long-term position dynamics, and the solid curves show the initial transient position dynamics. As illustrated, the resulting motion displays synchronized or dispersed heading angles for K = ±1, and translational motion for ω₀ = 0, respectively circular motion for ω₀ = 1.

14.3 Coupled phase oscillator networks

Given a connected, weighted, and undirected graph G = ({1, . . . , n}, E, A), consider the coupled oscillator model

    θ̇ᵢ = ωᵢ − Σ_{j=1}^n aᵢⱼ sin(θᵢ − θⱼ) ,   i ∈ {1, . . . , n} .   (14.6)
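Before proceeding, the following Python sketch simulates the model (14.6) by forward Euler integration; the ring graph, frequency spread, step size, and horizon below are our illustrative choices, not part of the model.

# Forward-Euler simulation of the coupled oscillator model (14.6).
import numpy as np

n = 10
A = np.zeros((n, n))
for i in range(n):                          # ring graph with unit weights
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
omega = np.linspace(-0.1, 0.1, n)           # heterogeneous natural frequencies
theta = 2 * np.pi * np.random.rand(n)       # random initial phases
dt = 0.01

for _ in range(50000):
    coupling = (A * np.sin(theta[:, None] - theta[None, :])).sum(axis=1)
    theta = theta + dt * (omega - coupling)

print(np.round(omega - coupling, 4))        # approximate instantaneous frequencies

For this small frequency spread the printed frequencies are nearly identical, illustrating the frequency synchronization phenomena studied below.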


A special case of the coupled oscillator model (14.6) is the so-called Kuramoto model (Kuramoto 1975) with a complete homogeneous network (i.e., with identical edge weights aᵢⱼ = K/n):

    θ̇ᵢ = ωᵢ − (K/n) Σ_{j=1}^n sin(θᵢ − θⱼ) ,   i ∈ {1, . . . , n} .   (14.7)

14.3.1 The geometry of the circle and the torus

Parametrization. The unit circle is S¹. The torus Tⁿ is the set consisting of n copies of the circle. We parametrize the circle S¹ by assuming that (i) angles are measured counterclockwise, (ii) the 0 angle is the intersection of the unit circle with the positive horizontal axis, and (iii) angles take value in [−π, π[.
Geodesic distance. The clockwise arc-length from θᵢ to θⱼ is the length of the clockwise arc from θᵢ to θⱼ; the counterclockwise arc-length is defined analogously. The geodesic distance between θᵢ and θⱼ is the minimum of the clockwise and counterclockwise arc-lengths and is denoted by |θᵢ − θⱼ|. In the parametrization:

    dist_cc(θ₁, θ₂) = mod(θ₂ − θ₁, 2π) ,   dist_c(θ₁, θ₂) = mod(θ₁ − θ₂, 2π) ,
    |θ₁ − θ₂| = min{ dist_c(θ₁, θ₂), dist_cc(θ₁, θ₂) } .
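In code, these definitions read as follows (a direct transcription of the formulas above, with angles assumed in [−π, π[):

# Geodesic distance on the circle via the parametrization above.
import math

def dist_cc(t1, t2):                 # counterclockwise arc-length from t1 to t2
    return (t2 - t1) % (2 * math.pi)

def dist_c(t1, t2):                  # clockwise arc-length from t1 to t2
    return (t1 - t2) % (2 * math.pi)

def geodesic(t1, t2):                # geodesic distance |t1 - t2|
    return min(dist_c(t1, t2), dist_cc(t1, t2))

print(geodesic(-3.0, 3.0))           # approx 0.283: the short way across -pi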

Rotations. Given an angle s ∈ [−π, π[, the rotation of the n-tuple θ = (θ₁, . . . , θₙ) ∈ Tⁿ by s, denoted by rot_s(θ), is the counterclockwise rotation of each entry of (θ₁, . . . , θₙ) by s. For θ ∈ Tⁿ, we also define its rotation set to be

    [θ] = { rot_s(θ) ∈ Tⁿ | s ∈ [−π, π[ } .

The coupled oscillator model (14.6) is invariant under rotations, that is, given a solution θ : R≥0 → Tⁿ of the coupled oscillator model, the rotation rot_s(θ(t)) by any angle s is again a solution.
Arc subsets of the n-torus. Given a length γ ∈ [0, 2π[, the arc subset arc(γ) ⊂ Tⁿ is the set of n-tuples (θ₁, . . . , θₙ) such that there exists an arc of length γ containing all θ₁, . . . , θₙ. The set arc°(γ) is the interior of arc(γ). For example, θ ∈ arc(π) implies that all angles θ₁, . . . , θₙ belong to a closed half circle. Note:

(i) If (θ₁, . . . , θₙ) ∈ arc(γ), then |θᵢ − θⱼ| ≤ γ for all i and j. The converse is not true in general: for example, { θ ∈ Tⁿ | |θᵢ − θⱼ| ≤ π for all i, j } is equal to the entire Tⁿ. However, the converse statement is true in the following form (see also Exercise E14.2): if |θᵢ − θⱼ| ≤ γ for all i and j and γ < 2π/3, then (θ₁, . . . , θₙ) ∈ arc(γ).

(ii) If θ = (θ₁, . . . , θₙ) ∈ arc(γ) for some γ < π, then average(θ) is well posed. (The average of n angles is ill-posed in general; for example, there is no reasonable definition of the average of two diametrically-opposed points.)
14.3.2 Synchronization notions

Consider the following notions of synchronization for a solution θ : R≥0 → Tⁿ:

Frequency synchrony: A solution θ : R≥0 → Tⁿ is frequency synchronized if θ̇ᵢ(t) = θ̇ⱼ(t) for all times t and for all i and j.

Phase synchrony: A solution θ : R≥0 → Tⁿ is phase synchronized if θᵢ(t) = θⱼ(t) for all times t and for all i and j.

Phase cohesiveness: A solution θ : R≥0 → Tⁿ is phase cohesive with respect to γ > 0 if one of the following conditions holds for all times t:
(i) θ(t) ∈ arc(γ);
(ii) |θᵢ(t) − θⱼ(t)| ≤ γ for all edges {i, j} of a graph of interest; or
(iii) √( Σ_{i,j=1}^n |θᵢ(t) − θⱼ(t)|² ) / 2 < γ.

Asymptotic notions: We will also talk about solutions that asymptotically achieve certain synchronization properties. For example, a solution θ : R≥0 → Tⁿ achieves phase synchronization if lim_{t→∞} |θᵢ(t) − θⱼ(t)| = 0. Analogous definitions can be given for asymptotic frequency synchronization and asymptotic phase cohesiveness.

Finally, notice that phase synchrony is the extreme case of all phase cohesiveness notions with γ = 0.

14.3.3 Preliminary results

We have the following result on the synchronization frequency.

Lemma 14.1 (Synchronization frequency). If a solution of the coupled oscillator model (14.6) achieves frequency synchronization, then it does so with a constant synchronization frequency equal to

    ω_sync := (1/n) Σ_{i=1}^n ωᵢ = average(ω) .

Proof. This fact is obtained by summing all equations (14.6) for i ∈ {1, . . . , n}.

Lemma 14.1 implies that, by expressing each angle with respect to a rotating frame with frequency ω_sync and by replacing ωᵢ by ωᵢ − ω_sync, we may assume ω_sync = 0 or, equivalently, ω ⊥ 1ₙ. In this rotating frame a frequency-synchronized solution is an equilibrium. Due to the rotational invariance of the coupled oscillator model (14.6), it follows that if θ* ∈ Tⁿ is an equilibrium point, then every point in the rotation set

    [θ*] = { rot_s(θ*) ∈ Tⁿ | s ∈ [−π, π[ }

is also an equilibrium. Notice that the set [θ*] is a connected circle in Tⁿ, and we refer to it as an equilibrium set; see Figure 14.4 for the two-dimensional case.
We have the following important result on local stability properties of equilibria.

Figure 14.4: Illustration of the state space T², the equilibrium set [θ*] associated to a phase-synchronized equilibrium (dotted blue line), the (meshed red) phase cohesive set |θ₂ − θ₁| < π/2, and the tangent space at θ* with translation vector 1₂ arising from the rotational symmetry.

Lemma 14.2 (Linearization). Assume the frequencies satisfy ω ⊥ 1ₙ and that G is connected with incidence matrix B. The following statements hold:

(i) Jacobian: the Jacobian of the coupled oscillator model (14.6) at θ ∈ Tⁿ is

    J(θ) = −B diag( {aᵢⱼ cos(θᵢ − θⱼ)}_{{i,j}∈E} ) B^⊤ ;

(ii) Local stability: if there exists an equilibrium θ* such that |θᵢ* − θⱼ*| < π/2 for all {i, j} ∈ E, then
 (a) −J(θ*) is a Laplacian matrix; and
 (b) the equilibrium set [θ*] is locally exponentially stable.

Proof. We start with statements (i) and (ii)a. Given θ ∈ Tⁿ, we define the undirected graph G_cosine(θ) with the same nodes and edges as G and with edge weights aᵢⱼ cos(θᵢ − θⱼ). Next, we compute

    ∂/∂θᵢ ( ωᵢ − Σ_{j=1}^n aᵢⱼ sin(θᵢ − θⱼ) ) = −Σ_{j=1}^n aᵢⱼ cos(θᵢ − θⱼ) ,
    ∂/∂θⱼ ( ωᵢ − Σ_{k=1}^n aᵢₖ sin(θᵢ − θₖ) ) = aᵢⱼ cos(θᵢ − θⱼ) .

Therefore, the Jacobian is equal to minus the Laplacian matrix of the (possibly negatively weighted) graph G_cosine(θ), and statement (i) follows from Lemma 8.2. Regarding statement (ii)a, if |θᵢ* − θⱼ*| < π/2 for all {i, j} ∈ E, then cos(θᵢ* − θⱼ*) > 0 for all {i, j} ∈ E, so that G_cosine(θ*) has nonnegative weights and all the usual properties of Laplacian matrices hold.

To prove statement (ii)b, notice that J(θ*) is negative semidefinite with the nullspace span(1ₙ) arising from the rotational symmetry; see Figure 14.4. All other eigenvectors are orthogonal to 1ₙ and correspond to negative eigenvalues. We now restrict our analysis to the orthogonal complement of 1ₙ: we define a coordinate transformation matrix Q ∈ R^{(n−1)×n} with orthonormal rows orthogonal to 1ₙ,

    Q 1ₙ = 0_{n−1}   and   Q Q^⊤ = I_{n−1} ,


and we note that Q J(θ*) Q^⊤ has negative eigenvalues. Therefore, in the original coordinates, all directions transversal to the zero eigenspace span(1ₙ) are exponentially stable. By Theorem 13.5, the corresponding equilibrium set [θ*] is locally exponentially stable.
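The Jacobian formula in statement (i) is easy to verify numerically. The following sketch (our illustration; graph, weights, and phases are arbitrary choices with |θᵢ − θⱼ| < π/2 on all edges) builds J(θ) = −B diag(aᵢⱼ cos(θᵢ − θⱼ)) B^⊤ and checks that J(θ) 1ₙ = 0 and that all eigenvalues are nonpositive:

# The Jacobian of (14.6) as minus a weighted Laplacian (Lemma 14.2).
import numpy as np

edges = [(0, 1), (1, 2), (0, 2)]
a = {e: 1.0 for e in edges}
theta = np.array([0.1, 0.4, -0.2])          # |theta_i - theta_j| < pi/2

n = len(theta)
B = np.zeros((n, len(edges)))               # incidence matrix
for k, (i, j) in enumerate(edges):
    B[i, k], B[j, k] = 1.0, -1.0

w = np.array([a[(i, j)] * np.cos(theta[i] - theta[j]) for (i, j) in edges])
J = -B @ np.diag(w) @ B.T
print(np.allclose(J @ np.ones(n), 0))       # rotational symmetry: J 1_n = 0
print(np.linalg.eigvalsh(J).round(4))       # one zero eigenvalue, rest negative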

Corollary 14.3 (Frequency synchronization). If a solution θ of the coupled oscillator model (14.6) satisfies the phase cohesiveness property |θᵢ(t) − θⱼ(t)| ≤ γ for all {i, j} ∈ E, for some γ ∈ [0, π/2[, and for all t ≥ 0, then the coupled oscillator model (14.6) achieves exponential frequency synchronization.

Proof. Let xᵢ(t) = θ̇ᵢ(t) be the frequency. Then ẋ(t) = J(θ(t)) x(t) is a time-varying averaging system. The associated undirected graph has time-varying yet strictly positive weights aᵢⱼ cos(θᵢ(t) − θⱼ(t)) ≥ aᵢⱼ cos(γ) > 0 for each {i, j} ∈ E. Hence, the weighted graph is connected for each t ≥ 0. From the analysis of time-varying averaging systems in Theorem 11.6, the exponential convergence of x(t) to average(x(0)) 1ₙ follows. Equivalently, the frequencies synchronize exponentially.


14.3.4 The order parameter and the mean field model

An alternative synchronization measure (besides phase cohesiveness) is the magnitude of the order parameter

    r e^{iψ} = (1/n) Σ_{j=1}^n e^{iθⱼ} .   (14.8)

The order parameter (14.8) is the centroid of all oscillators represented as points on the unit circle in C¹. The magnitude r of the order parameter is a synchronization measure:

(i) if the oscillators are phase-synchronized, then r = 1;
(ii) if the oscillators are spaced equally on the unit circle, then r = 0; and
(iii) for r ∈ ]0, 1[ and oscillators contained in a semi-circle, the associated configuration of oscillators satisfies a certain level of phase cohesiveness; see Exercise E14.3.

By means of the order parameter r e^{iψ}, the all-to-all Kuramoto model (14.7) can be rewritten in the insightful form

    θ̇ᵢ = ωᵢ − K r sin(θᵢ − ψ) ,   i ∈ {1, . . . , n} .   (14.9)

(We ask the reader to establish this identity in Exercise E14.4.) Equation (14.9) gives the intuition that the oscillators synchronize because of their coupling to a mean field represented by the order parameter r e^{iψ}, which itself is a function of θ(t). Intuitively, for small coupling strength K each oscillator rotates with its distinct natural frequency ωᵢ, whereas for large coupling strength K all angles θᵢ(t) will entrain to the mean field r e^{iψ}, and the oscillators synchronize. The transition from incoherence to synchrony occurs at a critical threshold value of the coupling strength, denoted by K_critical.
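The order parameter is one line of code; the following sketch (our illustration) evaluates r for a phase-synchronized and for an equally spaced ("splay") configuration, matching the two extreme cases listed above:

# Magnitude of the order parameter (14.8) for two configurations.
import numpy as np

def order_parameter(theta):
    return abs(np.exp(1j * theta).mean())   # r = |(1/n) sum_j e^{i theta_j}|

n = 10
print(order_parameter(np.zeros(n)))                   # synchronized: r = 1
print(order_parameter(2 * np.pi * np.arange(n) / n))  # splay state: r = 0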

14.4 Exercises

E14.1 Simulating coupled oscillators. Simulate in your favorite programming language and software package the coupled Kuramoto oscillators in equation (14.3). Set n = 10 and define a vector ω ∈ R¹⁰ with entries deterministically uniformly-spaced between −1 and 1. Select random initial phases.

(i) Simulate the resulting differential equations for K = 10 and K = 0.1.


(ii) Find the approximate value of K at which the qualitative behavior of the system changes from
asynchrony to synchrony.

Turn in your code, a few printouts (as few as possible), and your written responses.
E14.2 Phase cohesiveness and arc length. Pick γ < 2π/3 and n ≥ 3. Show the following statement: if θ ∈ Tⁿ satisfies |θᵢ − θⱼ| ≤ γ for all i, j ∈ {1, . . . , n}, then there exists an arc of length γ containing all angles, that is, θ ∈ arc(γ).

E14.3 Order parameter and arc length. Given n ≥ 2 and θ ∈ Tⁿ, the shortest arc length γ(θ) is the length of the shortest arc containing all angles, i.e., the smallest γ(θ) such that θ ∈ arc(γ(θ)). Given θ ∈ Tⁿ, the order parameter is the centroid of (θ₁, . . . , θₙ) understood as points on the unit circle in the complex plane C:

    r(θ) e^{√−1 ψ(θ)} := (1/n) Σ_{j=1}^n e^{√−1 θⱼ} .

Prove the following statements:

(i) if γ(θ) ∈ [0, π], then r(θ) ∈ [cos(γ(θ)/2), 1]; and

(ii) if θ ∈ arc(π), then γ(θ) ∈ [2 arccos(r(θ)), π].

The order parameter magnitude r is known to measure synchronization. Show the following statements:
(iii) if all oscillators are phase-synchronized, then r = 1, and
(iv) if all oscillators are spaced equally on the unit circle (the so-called splay state), then r = 0.
E14.4

Order parameter and mean-field dynamics. Show that the Kuramoto model (14.7) is equivalent to
the so-called mean-field model (14.9) with the order parameter r defined in (14.8).

E14.5 Uniqueness of Kuramoto equilibria. A common misconception in the literature is that the Kuramoto model has a unique equilibrium set in the phase cohesive set { θ ∈ Tⁿ | |θᵢ − θⱼ| < π/2 for all {i, j} ∈ E }. Consider now the example of a Kuramoto oscillator network defined over a symmetric ring graph with identical unit weights and zero natural frequencies. The equilibria are determined by

    0 = sin(θᵢ − θᵢ₋₁) + sin(θᵢ − θᵢ₊₁) ,

where i ∈ {1, . . . , n} and all indices are evaluated modulo n. Show that for n > 4 there are at least two disjoint equilibrium sets in the phase cohesive set { θ ∈ Tⁿ | |θᵢ − θⱼ| < π/2 for all {i, j} ∈ E }.


Chapter 15

Networks of Coupled Oscillators

15.1 Synchronization of identical oscillators

We start our discussion with the following insightful lemma.

Lemma 15.1. Consider the coupled oscillator model (14.6). If ωᵢ ≠ ωⱼ for some distinct i, j ∈ {1, . . . , n}, then the oscillators cannot achieve phase synchronization.

Proof. We prove the lemma by contraposition. Assume that all oscillators are in phase synchrony, θᵢ(t) = θⱼ(t) for all t ≥ 0 and all i, j ∈ {1, . . . , n}. Then, by equating the dynamics θ̇ᵢ(t) = θ̇ⱼ(t), it follows necessarily that ωᵢ = ωⱼ.

Motivated by Lemma 15.1, we consider oscillators with identical natural frequencies, ωᵢ = ω ∈ R for all i ∈ {1, . . . , n}. By working in a rotating frame with frequency ω, we may assume ω = 0. Thus, we consider the model

    θ̇ᵢ = −Σ_{j=1}^n aᵢⱼ sin(θᵢ − θⱼ) ,   i ∈ {1, . . . , n} .   (15.1)

Notice that phase synchronization is an equilibrium of this model. Conversely, phase synchronization cannot be an equilibrium of the original coupled oscillator model (14.6) if ωᵢ ≠ ωⱼ.

15.1.1 An averaging-based approach

Let us first analyze the coupled oscillator model (15.1) with initial conditions restricted to an open semi-circle, θ(0) ∈ arc(γ) for some γ ∈ [0, π[. In this case, the oscillators remain in a semi-circle at least for small times t > 0 and the two coordinate transformations

    xᵢ(t) = tan(θᵢ(t)) (with xᵢ ∈ R) ,   and   yᵢ(t) = θᵢ(t) (with yᵢ ∈ R)

are well-defined and bijective (at least for small times).

In the xᵢ-coordinates, the coupled oscillator model reads as the time-varying continuous-time averaging system

    ẋᵢ(t) = −Σ_{j=1}^n bᵢⱼ(t) ( xᵢ(t) − xⱼ(t) ) ,   (15.2)

where bᵢⱼ(t) = aᵢⱼ √( (1 + xᵢ(t)²) / (1 + xⱼ(t)²) ) and bᵢⱼ(t) ≥ aᵢⱼ cos(γ/2); see Exercise E15.3 for a derivation. Similarly, in the yᵢ-coordinates, the coupled oscillator model reads as

    ẏᵢ(t) = −Σ_{j=1}^n cᵢⱼ(t) ( yᵢ(t) − yⱼ(t) ) ,   (15.3)

where cᵢⱼ(t) = aᵢⱼ sinc(yᵢ(t) − yⱼ(t)) and cᵢⱼ(t) ≥ aᵢⱼ sinc(γ). Notice that both averaging formulations (15.2) and (15.3) are well-defined as long as the oscillators remain in a semi-circle arc(γ) for some γ ∈ [0, π[.
Theorem 15.2 (Phase cohesiveness and synchronization in open semicircle). Consider the coupled oscillator model (15.1) with a connected, undirected, and weighted graph G = ({1, . . . , n}, E, A). The following statements hold:

(i) phase cohesiveness: for each γ ∈ [0, π[, each solution originating in arc(γ) remains in arc(γ) for all times;
(ii) asymptotic phase synchronization: each trajectory originating in arc(γ) for γ ∈ [0, π[ achieves exponential phase synchronization, that is,

    ‖θ(t) − average(θ(0)) 1ₙ‖₂ ≤ ‖θ(0) − average(θ(0)) 1ₙ‖₂ e^{−λ_ps t} ,   (15.4)

where λ_ps = λ₂(L) cos(γ/2).


Proof. Consider the averaging formulations (15.2) and (15.3) with initial conditions θ(0) ∈ arc(γ) for some γ ∈ [0, π[. By continuity, for small positive times t > 0, the oscillators remain in a semi-circle, the time-varying weights bᵢⱼ(t) ≥ aᵢⱼ cos(γ/2) and cᵢⱼ(t) ≥ aᵢⱼ sinc(γ) are strictly positive for each {i, j} ∈ E, and the associated time-dependent graph is connected. As one establishes in the proof of Theorem 11.9, the max-min functions

    V_max-min(x) = max_{i∈{1,…,n}} xᵢ − min_{i∈{1,…,n}} xᵢ ,   V_max-min(y) = max_{i∈{1,…,n}} yᵢ − min_{i∈{1,…,n}} yᵢ

are strictly decreasing along the time-varying consensus systems (15.2) and (15.3) until consensus is reached. Thus, the oscillators remain in arc(γ) and achieve phase synchronization exponentially fast. Since the graph is undirected, we can also conclude convergence to the average phase. Finally, the explicit convergence estimate (15.4) follows, for example, by analyzing (15.2) with the disagreement Lyapunov function and using bᵢⱼ(t) ≥ aᵢⱼ cos(γ/2).


15.1.2 The potential landscape, convergence and phase synchronization

The consensus analysis in Theorem 15.2 leads to a powerful result but is inherently restricted to a semi-circle. To overcome this limitation, we use potential functions as an analysis tool. Inspired by Examples #1 and #3, define the potential function U : Tⁿ → R by

    U(θ) = Σ_{{i,j}∈E} aᵢⱼ ( 1 − cos(θᵢ − θⱼ) ) .   (15.5)


Then the coupled oscillator model (14.6) (with all ωᵢ = 0) can be formulated as the gradient flow

    θ̇ = −( ∂U(θ)/∂θ )^⊤ .   (15.6)

Among the many critical points of the potential function (15.5), the set of phase-synchronized angles is the global minimum of the potential function (15.5). This can easily be seen since each summand in (15.5) is bounded in [0, 2aᵢⱼ] and the lower bound is attained only if neighboring oscillators are phase-synchronized. This global minimum is locally exponentially stable.
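The gradient structure (15.6) is easy to verify numerically. The following sketch (our illustration; the ring graph and random phases are arbitrary) compares the right-hand side of (15.1) with a central finite-difference gradient of the potential (15.5):

# Check that (15.1) is the negative gradient flow of the potential (15.5).
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
a = {e: 1.0 for e in edges}

def U(theta):
    return sum(a[(i, j)] * (1 - np.cos(theta[i] - theta[j]))
               for (i, j) in edges)

def rhs(theta):                              # right-hand side of (15.1)
    f = np.zeros_like(theta)
    for (i, j) in edges:
        s = a[(i, j)] * np.sin(theta[i] - theta[j])
        f[i] -= s                            # term on oscillator i
        f[j] += s                            # opposite term on oscillator j
    return f

theta = np.random.rand(4)
eps = 1e-6
grad = np.array([(U(theta + eps * np.eye(4)[k]) - U(theta - eps * np.eye(4)[k]))
                 / (2 * eps) for k in range(4)])
print(np.allclose(rhs(theta), -grad, atol=1e-5))   # True: gradient flow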
Theorem 15.3 (Phase synchronization). Consider the coupled oscillator model (15.1) with a connected, undirected, and weighted graph G = ({1, . . . , n}, E, A). Then

(i) Global convergence: for all initial conditions θ(0) ∈ Tⁿ, the phases θᵢ(t) converge to the set of critical points { θ ∈ Tⁿ | ∂U(θ)/∂θ = 0ₙ^⊤ }; and
(ii) Local stability: phase synchronization is a locally exponentially stable equilibrium set.

Proof. The derivative of the potential function U(θ) along the trajectories of (15.6) is

    U̇(θ) = −‖ ∂U(θ)/∂θ ‖₂² .

Since the potential function and its derivative are smooth and the dynamics evolve on the compact and forward invariant set Tⁿ, we can apply the LaSalle Invariance Principle in Theorem 13.3 to arrive at statement (i). Statement (ii) follows from the Jacobian result in Lemma 14.2 and Theorem 13.5.

Theorem 15.3 together with Theorem 15.2 gives a fairly complete picture of the convergence and
phase synchronization properties of the coupled oscillator model (15.1).

15.1.3 Phase balancing

Applications in neuroscience, vehicle coordination, and central pattern generators for robotic locomotion motivate the study of coherent behaviors with synchronized frequencies where the phases are not synchronized, but rather dispersed in appropriate patterns. While the phase-synchronized state can be characterized by the order parameter r achieving its maximal (unit) magnitude, we say that a solution θ : R≥0 → Tⁿ of the coupled oscillator model (14.6) achieves phase balancing if all phases θᵢ asymptotically converge to the set

    { θ ∈ Tⁿ | r(θ) = | (1/n) Σ_{j=1}^n e^{iθⱼ} | = 0 } ,

that is, asymptotically the oscillators are distributed over the unit circle S¹ in such a way that their centroid converges to the origin.

For a complete homogeneous graph with coupling strength aᵢⱼ = K/n, i.e., for the Kuramoto model (14.7), we have a remarkable identity between the magnitude r of the order parameter and the potential function U(θ):

    U(θ) = (Kn/2) ( 1 − r² ) .   (15.7)

(We ask the reader to establish this identity in Exercise E15.1.) For the complete graph, the correspondence (15.7) shows that the global minimum of the potential function U () = 0 (for r = 1) corresponds
to phase-synchronization and the global maximum U () = Kn/2 (for r = 0) corresponds to phase
balancing. This motivates the following gradient ascent dynamics to reach phase balancing:
U ()
= +

>

n
X
j=1

aij sin(i j ) .

(15.8)
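In simulation, flipping the sign of the coupling relative to (15.1) implements the ascent dynamics (15.8); for a complete graph the order parameter magnitude r then decays towards zero. A minimal illustration (all numerical choices are ours):

# Gradient ascent (15.8) on a complete graph drives r towards 0.
import numpy as np

n, K, dt = 10, 1.0, 0.01
theta = 0.1 * np.random.rand(n)              # nearly synchronized start
for _ in range(20000):
    coupling = (K / n) * np.sin(theta[:, None] - theta[None, :]).sum(axis=1)
    theta = theta + dt * coupling            # note the + sign: ascent
print(abs(np.exp(1j * theta).mean()))        # r should be close to 0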

Theorem 15.4 (Phase balancing). Consider the coupled oscillator model (15.8) with a connected, undirected, and weighted graph G = ({1, . . . , n}, E, A). Then

(i) Global convergence: for all initial conditions θ(0) ∈ Tⁿ, the phases θᵢ(t) converge to the set of critical points { θ ∈ Tⁿ | ∂U(θ)/∂θ = 0ₙ^⊤ }; and
(ii) Local stability: for a complete graph with uniform weights aᵢⱼ = K/n, phase balancing is the global maximizer of the potential function (15.7) and is a locally asymptotically stable equilibrium set.

Proof. The proof of statement (i) is analogous to the proof of statement (i) in Theorem 15.3.

To prove statement (ii), notice that, for a complete graph, the phase balanced set characterized by r = 0 achieves the global maximum of the potential U(θ) = (Kn/2)(1 − r²). By Theorem 13.6, local maxima of the potential are locally asymptotically stable for the gradient ascent dynamics (15.8).


15.2 Synchronization of heterogeneous oscillators

In this section we analyze non-identical oscillators with ωᵢ ≠ ωⱼ. As shown in Lemma 15.1, these oscillator networks cannot achieve phase synchronization. On the other hand, frequency synchronization with a certain degree of phase cohesiveness can be achieved, provided that the natural frequencies satisfy certain bounds relative to the network coupling. We start off with the following necessary conditions.
Lemma 15.5 (Necessary synchronization condition). Consider the coupled oscillator model (14.6) with graph G = ({1, . . . , n}, E, A), frequencies ω ⊥ 1ₙ, and nodal degree degᵢ = Σ_{j=1}^n aᵢⱼ for each node i ∈ {1, . . . , n}. If there exists a frequency-synchronized solution satisfying the phase cohesiveness |θᵢ − θⱼ| ≤ γ for all {i, j} ∈ E and for some γ ∈ [0, π/2], then the following conditions hold:

(i) Absolute bound: for each node i ∈ {1, . . . , n},

    degᵢ sin(γ) ≥ |ωᵢ| .   (15.9)

(ii) Incremental bound: for all distinct i, j ∈ {1, . . . , n},

    (degᵢ + degⱼ) sin(γ) ≥ |ωᵢ − ωⱼ| .   (15.10)

Proof. Statement (i) follows directly from the fact that synchronized solutions must satisfy the equilibrium equation θ̇ᵢ = 0. Since the sinusoidal interaction terms in equation (14.6) are upper bounded by the nodal degree degᵢ = Σ_{j=1}^n aᵢⱼ, condition (15.9) is necessary for the existence of an equilibrium.

Statement (ii) follows from the fact that frequency-synchronized solutions must satisfy θ̇ᵢ − θ̇ⱼ = 0. By analogous arguments, we arrive at the necessary condition (15.10).

15.2.1 Synchronization of heterogeneous oscillators over complete homogeneous graphs

Consider the Kuramoto model over a complete homogeneous graph:

    θ̇ᵢ = ωᵢ − (K/n) Σ_{j=1}^n sin(θᵢ − θⱼ) ,   i ∈ {1, . . . , n} .   (15.11)

As discussed in Subsection 14.3.4, the Kuramoto model synchronizes provided that the coupling gain K is larger than some critical value K_critical. The necessary condition (15.10) delivers a lower bound for K_critical given by

    K ≥ ( n / (2(n−1)) ) ( max_i ωᵢ − min_i ωᵢ ) .

Here we evaluated the left-hand side of (15.10) for aᵢⱼ = K/n, for the maximum γ = π/2, and for all distinct i, j ∈ {1, . . . , n}. Perhaps surprisingly, this necessary lower bound is only a factor 1/2 away from the upper sufficient bound (15.12) below.

Theorem 15.6 (Synchronization test for all-to-all Kuramoto model). Consider the Kuramoto model (15.11) with natural frequencies ω₁, ..., ω_n and coupling strength K. Assume

$$K > K_{\text{critical}} := \max_i\omega_i - \min_i\omega_i, \tag{15.12}$$

and define the arc lengths γ_min ∈ [0, π/2[ and γ_max ∈ ]π/2, π] as the unique solutions to sin(γ_min) = sin(γ_max) = K_critical/K.

[Figure: the two arc lengths γ_min and γ_max determined by the level K_critical/K of the function γ ↦ sin(γ).]

The following statements hold:


(i) phase cohesiveness: each solution starting in arc (), for [min , max ], remains in arc () for all
times;
(ii) asymptotic phase cohesiveness: each solution starting in arc (max ) asymptotically reaches the set
arc (min ); and
(iii) asymptotic frequency synchronization: each solution starting in arc (max ) achieves frequency
synchronization.
Moreover, the following converse statement is true: Given an interval [min , max ], the coupling strength K
satisfies K > max min if, for all frequencies supported on [min , max ] and for the arc length max computed
as above, the set arc (max ) is positively invariant.

Proof. We start with statement (i). Define the function W : arc(γ) → [0, γ] by

$$W(\theta) = \max\{|\theta_i - \theta_j| \,\mid\, i, j \in \{1,\dots,n\}\}.$$

The arc containing all angles has two boundary points: a counterclockwise maximum and a counterclockwise minimum. If U_max(θ) (resp. U_min(θ)) denotes the set of indices of the angles θ₁, ..., θ_n that are equal to the counterclockwise maximum (resp. the counterclockwise minimum), then

$$W(\theta) = |\theta_{m'} - \theta_{k'}|, \quad\text{for all } m' \in U_{\max}(\theta) \text{ and } k' \in U_{\min}(\theta).$$

We now assume θ(0) ∈ arc(γ), for γ ∈ [γ_min, γ_max], and aim to show that θ(t) ∈ arc(γ) for all times t > 0. By continuity, arc(γ) is positively invariant if and only if W(θ(t)) does not increase at any time t such that W(θ(t)) = γ.
In the next equation we compute the maximum possible amount of infinitesimal increase of W(θ(t)) along system (15.11). We do this in a loose way here and refer to (Lin et al. 2007, Lemma 2.2) for a rigorous treatment. The statement is:

$$D^{+}W(\theta(t)) := \limsup_{\Delta t \to 0^{+}} \frac{W(\theta(t+\Delta t)) - W(\theta(t))}{\Delta t} = \dot\theta_m(t) - \dot\theta_k(t),$$

where m ∈ U_max(θ(t)) and k ∈ U_min(θ(t)) have the property that θ̇_m(t) = max{θ̇_{m'}(t) | m' ∈ U_max(θ(t))} and θ̇_k(t) = min{θ̇_{k'}(t) | k' ∈ U_min(θ(t))}. In components,

$$D^{+}W(\theta(t)) = \omega_m - \omega_k - \frac{K}{n}\sum_{j=1}^{n}\big(\sin(\theta_m(t) - \theta_j(t)) + \sin(\theta_j(t) - \theta_k(t))\big).$$

The trigonometric identity $\sin(x) + \sin(y) = 2\sin\big(\tfrac{x+y}{2}\big)\cos\big(\tfrac{x-y}{2}\big)$ leads to

$$D^{+}W(\theta(t)) = \omega_m - \omega_k - \frac{K}{n}\sum_{i=1}^{n} 2\sin\Big(\frac{\theta_m(t)-\theta_k(t)}{2}\Big)\cos\Big(\frac{\theta_m(t)-\theta_i(t)}{2} - \frac{\theta_i(t)-\theta_k(t)}{2}\Big).$$

Measuring angles counterclockwise and modulo 2π, the equality W(θ(t)) = γ implies θ_m(t) − θ_k(t) = γ, θ_m(t) − θ_i(t) ∈ [0, γ], and θ_i(t) − θ_k(t) ∈ [0, γ]. Moreover,

$$\min\cos\Big(\frac{\theta_m - \theta_i}{2} - \frac{\theta_i - \theta_k}{2}\Big) = \cos\Big(\max\Big|\frac{\theta_m - \theta_i}{2} - \frac{\theta_i - \theta_k}{2}\Big|\Big) = \cos(\gamma/2),$$

so that

$$D^{+}W(\theta(t)) \le \omega_m - \omega_k - \frac{K}{n}\sum_{i=1}^{n} 2\sin\Big(\frac{\gamma}{2}\Big)\cos\Big(\frac{\gamma}{2}\Big).$$

Applying the reverse identity 2 sin(x)cos(y) = sin(x − y) + sin(x + y), we obtain

$$D^{+}W(\theta(t)) \le \omega_m - \omega_k - \frac{K}{n}\sum_{i=1}^{n}\sin(\gamma) \le \Big(\max_i\omega_i - \min_i\omega_i\Big) - K\sin(\gamma).$$


Hence, W(θ(t)) does not increase at all times t such that W(θ(t)) = γ if K sin(γ) ≥ K_critical = max_i ω_i − min_i ω_i.

Given the structure of the level sets of γ ↦ K sin(γ), there exists an open interval of arc lengths γ ∈ [0, π] satisfying K sin(γ) ≥ max_i ω_i − min_i ω_i if and only if equation (15.12) holds with strict inequality at γ = π/2, that is, if K > K_critical. Additionally, if K > K_critical, there exist a unique γ_min ∈ [0, π/2[ and a unique γ_max ∈ ]π/2, π] that satisfy K sin(γ) = K_critical. In summary, for every γ ∈ [γ_min, γ_max], if W(θ(t)) = γ, then the arc length W(θ(t)) is non-increasing. This concludes the proof of statement (i).
Moreover, pick ε ∈ ]0, γ_max − γ_min]. For all γ ∈ [γ_min + ε, γ_max], there exists a positive δ(ε) with the property that, if W(θ(t)) = γ, then D⁺W(θ(t)) ≤ −δ(ε). Hence, each solution θ : R≥0 → T^n starting in arc(γ_max) must satisfy W(θ(t)) ≤ γ_min + ε after time at most (γ_max − γ_min)/δ(ε). This proves statement (ii).
Regarding statement (iii), we just proved that for every θ(0) ∈ arc(γ_max) and for every γ ∈ ]γ_min, π/2[ there exists a finite time T ≥ 0 such that θ(t) ∈ arc(γ) for all t ≥ T. It follows that |θ_i(t) − θ_j(t)| < π/2 for all {i, j} ∈ E and for all t ≥ T. We now invoke Corollary 14.3 to conclude the proof of statement (iii).

The converse statement can be established by noticing that all of the above inequalities and estimates are exact for a bipolar distribution of natural frequencies ω_i ∈ {ω_min, ω_max} for all i ∈ {1, ..., n}. The full proof is in (Dörfler and Bullo 2011). ∎
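Before moving on, here is a minimal simulation sketch of Theorem 15.6 (the frequencies, coupling, and integration parameters are illustrative choices): for K > K_critical and an initial condition inside a small arc, the solution stays cohesive and reaches frequency synchronization.

```python
# Minimal sketch: the all-to-all Kuramoto model (15.11) above the critical
# coupling (15.12); all numerical values are illustrative.
import numpy as np

n, dt, steps = 10, 0.005, 40000
rng = np.random.default_rng(2)
omega = rng.uniform(-1, 1, n)
omega -= omega.mean()                        # zero-mean natural frequencies
K = 1.5 * (omega.max() - omega.min())        # K > K_critical, see (15.12)

theta = rng.uniform(-0.5, 0.5, n)            # initial phases inside a small arc
for _ in range(steps):
    dtheta = omega - (K / n) * np.sin(theta[:, None] - theta[None, :]).sum(axis=1)
    theta += dt * dtheta

print(np.ptp(dtheta))   # frequency spread ~ 0: frequency synchronization
print(np.ptp(theta))    # phase spread below pi/2: phase cohesiveness
```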


15.2.2 Synchronization of heterogeneous oscillators over weighted undirected graphs

Consider the coupled oscillator model over a weighted undirected graph:

$$\dot\theta_i = \omega_i - \sum_{j=1}^{n} a_{ij}\sin(\theta_i - \theta_j), \qquad i \in \{1,\dots,n\}. \tag{15.13}$$

Adopt the following shorthands:

$$\|\theta\|_{2,\text{pairs}} = \sqrt{\frac{1}{2}\sum_{i,j=1}^{n}(\theta_i - \theta_j)^2}, \quad\text{and}\quad \|\omega\|_{2,\text{pairs}} = \sqrt{\frac{1}{2}\sum_{i,j=1}^{n}|\omega_i - \omega_j|^2}.$$

Theorem 15.7 (Synchronization test I). Consider the coupled oscillator model (15.13) with frequencies ω₁, ..., ω_n defined over a weighted undirected graph with Laplacian matrix L. Assume

$$\lambda_2(L) > \lambda_{\text{critical}} := \|\omega\|_{2,\text{pairs}}, \tag{15.14}$$

and define γ_max ∈ ]π/2, π] and γ_min ∈ [0, π/2[ as the solutions to (π/2) sinc(γ_max) = sin(γ_min) = λ_critical/λ₂(L). The following statements hold:

(i) phase cohesiveness: each solution starting in {θ | ‖θ‖_{2,pairs} ≤ γ}, for γ ∈ [γ_min, γ_max], remains in {θ | ‖θ‖_{2,pairs} ≤ γ} for all times;
(ii) asymptotic phase cohesiveness: each solution starting in {θ | ‖θ‖_{2,pairs} < γ_max} asymptotically reaches the set {θ | ‖θ‖_{2,pairs} ≤ γ_min}; and
(iii) asymptotic frequency synchronization: each solution starting in {θ | ‖θ‖_{2,pairs} < γ_max} achieves frequency synchronization.

The proof of Theorem 15.7 follows the reasoning of the proof of Theorem 15.6 using the quadratic Lyapunov function ‖θ‖²_{2,pairs}. The full proof is in (Dörfler and Bullo 2012, Appendix B).

15.2.3 Appendix: alternative theorem

Notice that the parametric condition (15.14) of the above theorem is very conservative, since the left-hand side is at most n (for a complete graph) and the right-hand side is a sum of n² terms. In the following we partially improve upon this conservativeness. Adopt the following shorthands:

$$\|\theta\|_{2,\text{edges}} = \sqrt{\sum_{\{i,j\}\in E}(\theta_i - \theta_j)^2}, \quad\text{and}\quad \|\omega\|_{2,\text{edges}} = \sqrt{\sum_{\{i,j\}\in E}|\omega_i - \omega_j|^2}.$$

Theorem 15.8 (Synchronization test II). Consider the coupled oscillator model (15.13) with frequencies ω₁, ..., ω_n defined over a weighted undirected graph with Laplacian matrix L. Assume

$$\lambda_2(L) > \lambda_{\text{critical}} := \|\omega\|_{2,\text{edges}}, \tag{15.15}$$

and define γ_min ∈ [0, π/2[ as the solution to sin(γ_min) = λ_critical/λ₂(L). Then there exists a locally exponentially stable equilibrium set [θ*] satisfying |θ*_i − θ*_j| ≤ γ_min for all {i, j} ∈ E.
Proof. Lemma 14.2 guarantees local exponential stability of an equilibrium set [θ*] satisfying |θ*_i − θ*_j| ≤ γ for all {i, j} ∈ E and for some γ ∈ [0, π/2[. In the following we establish conditions for the existence of equilibria in the particular set Δ(γ) = {θ ∈ T^n | |θ_i − θ_j| ≤ γ, for all {i, j} ∈ E}. The equilibrium equations can be written as

$$\omega = L(B^{\top}\theta)\,\theta, \tag{15.16}$$

where $L(B^{\top}\theta) = B \operatorname{diag}\big(\{a_{ij}\operatorname{sinc}(\theta_i - \theta_j)\}_{\{i,j\}\in E}\big) B^{\top}$ is the Laplacian matrix associated with the graph G = ({1, ..., n}, E, Ã) with nonnegative edge weights ã_ij = a_ij sinc(θ_i − θ_j) ≥ a_ij sinc(γ) > 0 for {i, j} ∈ E and θ ∈ Δ(γ). Since for any weighted Laplacian matrix L we have L†L = LL† = I_n − (1/n)1_n 1_n^⊤, a multiplication of equation (15.16) from the left by B^⊤L(B^⊤θ)† yields

$$B^{\top}L(B^{\top}\theta)^{\dagger}\,\omega = B^{\top}\theta. \tag{15.17}$$

Note that the left-hand side of equation (15.17) is a continuous¹ function for θ ∈ Δ(γ). Consider the formal substitution x = B^⊤θ, the compact and convex set S_∞(γ) = {x ∈ Img(B^⊤) | ‖x‖_∞ ≤ γ} (corresponding to Δ(γ)), and the continuous map f : S_∞(γ) → R^{|E|} given by f(x) = B^⊤L(x)†ω. Then equation (15.17) is equivalent to the fixed-point equation

$$f(x) = x.$$

We invoke Brouwer's Fixed Point Theorem, which states that every continuous map from a compact and convex set to itself has a fixed point; see for instance (Spanier 1994, Section 7, Corollary 8).

¹The continuity can be established by rewriting equations (15.16) and (15.17) in the quotient space 1_n^⊥, where L(B^⊤θ) is nonsingular, and by using the fact that the inverse of a nonsingular matrix is a continuous function of its entries. See also (Rakočević 1997, Theorem 4.2) for necessary and sufficient conditions for continuity of the Moore–Penrose inverse, requiring that L(B^⊤θ) has constant rank for θ ∈ Δ(γ).


Since the analysis of the map f in the ∞-norm is very hard in the general case, we resort to a 2-norm analysis and restrict ourselves to the set S₂(γ) = {x ∈ Img(B^⊤) | ‖x‖₂ ≤ γ} ⊆ S_∞(γ). The set S₂(γ) corresponds to the set {θ ∈ T^n | ‖θ‖_{2,edges} ≤ γ} in θ-coordinates. By Brouwer's Fixed Point Theorem, there exists a solution x ∈ S₂(γ) to the equation x = f(x) if ‖f(x)‖₂ ≤ γ for all x ∈ S₂(γ), or equivalently, if

$$\max_{x \in S_2(\gamma)} \big\|B^{\top}L(x)^{\dagger}\omega\big\|_2 \le \gamma. \tag{15.18}$$

After some bounding (see (Dörfler and Bullo 2012, Appendix C) for details), we arrive at

$$\max_{x \in S_2(\gamma)} \big\|B^{\top}L(x)^{\dagger}\omega\big\|_2 \le \|\omega\|_{2,\text{edges}} \big/ \big(\lambda_2(L)\operatorname{sinc}(\gamma)\big).$$

The term on the right-hand side of the above inequality has to be less than or equal to γ. In summary, we conclude that there is a locally exponentially stable synchronization set [θ*] ⊆ {θ ∈ T^n | ‖θ‖_{2,edges} ≤ γ} ∩ Δ(γ) if

$$\lambda_2(L)\sin(\gamma) \ge \|\omega\|_{2,\text{edges}}. \tag{15.19}$$

Since the left-hand side of (15.19) is a concave function of γ ∈ [0, π/2[, there exists an open set of γ ∈ [0, π/2[ satisfying equation (15.19) if and only if equation (15.19) holds with strict inequality at γ = π/2, which corresponds to condition (15.15). Additionally, if these two equivalent statements are true, then there exists a unique γ_min ∈ [0, π/2[ that satisfies equation (15.19) with the equality sign, namely sin(γ_min) = ‖ω‖_{2,edges}/λ₂(L). This concludes the proof. ∎
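The quantities entering the synchronization tests (15.14) and (15.15) are straightforward to evaluate numerically. The following sketch, for an illustrative 4-cycle and a frequency vector of our choosing, computes λ₂(L), ‖ω‖_{2,pairs}, and ‖ω‖_{2,edges}; note that ‖ω‖_{2,edges} ≤ ‖ω‖_{2,pairs}, consistent with test II being less conservative than test I.

```python
# Minimal sketch: evaluating the test quantities in (15.14) and (15.15);
# the graph and the frequency vector are illustrative.
import itertools
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)      # unweighted 4-cycle
n = A.shape[0]
L = np.diag(A.sum(axis=1)) - A
lambda2 = np.sort(np.linalg.eigvalsh(L))[1]     # algebraic connectivity

omega = np.array([0.3, -0.1, -0.3, 0.1])
pairs = list(itertools.combinations(range(n), 2))
norm_pairs = np.sqrt(sum((omega[i] - omega[j]) ** 2 for i, j in pairs))
norm_edges = np.sqrt(sum((omega[i] - omega[j]) ** 2 for i, j in pairs if A[i, j] > 0))

print(lambda2 > norm_pairs)    # condition (15.14), test I
print(lambda2 > norm_edges)    # condition (15.15), test II
```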

15.3 Exercises

E15.1 Potential and order parameter. Recall $U(\theta) = \sum_{\{i,j\}\in E} a_{ij}\big(1 - \cos(\theta_i - \theta_j)\big)$. Prove that $U(\theta) = \frac{Kn}{2}(1 - r^2)$ for a complete homogeneous graph with coupling strength a_ij = K/n.

E15.2 Analysis of the two-node case. Present a complete analysis of a system of two coupled oscillators:

$$\dot\theta_1 = \omega_1 - a_{12}\sin(\theta_1 - \theta_2), \qquad \dot\theta_2 = \omega_2 - a_{21}\sin(\theta_2 - \theta_1),$$

where a₁₂ = a₂₁ and ω₁ + ω₂ = 0. When do equilibria exist? What are their stability properties and their basins of attraction?
E15.3 Averaging analysis of coupled oscillators in a semi-circle. Consider the coupled oscillator model (15.1) with θ ∈ arc(γ) for some γ < π. Show that the coordinate transformation x_i = tan(θ_i) (with x_i ∈ R) gives the averaging system (15.2) with b_ij ≥ a_ij cos(γ/2).

E15.4 Phase synchronization in a spring network. Consider the spring network from Example #1 with identical oscillators and a connected, undirected, and weighted graph:

$$M_i\ddot\theta_i + D_i\dot\theta_i + \sum_{j=1}^{n} a_{ij}\sin(\theta_i - \theta_j) = 0, \qquad i \in \{1,\dots,n\}.$$

Prove the phase synchronization result (in Theorem 15.3) for this spring network.

E15.5 Synchronization on acyclic graphs. Consider the coupled oscillator model $\dot\theta_i = \omega_i - \sum_{j=1}^{n} a_{ij}\sin(\theta_i - \theta_j)$ with $\sum_{i=1}^{n}\omega_i = 0$ and defined over an acyclic interaction graph, that is, the adjacency matrix A with elements a_ij = a_ji ∈ {0, 1} induces an undirected, connected, and acyclic graph. Show that in this case the following exact synchronization condition holds: there exists a locally stable frequency-synchronized solution in the set {θ ∈ T^n | |θ_i − θ_j| < π/2 for all {i, j} ∈ E} if and only if ‖B^⊤L†ω‖_∞ < 1, where B and L are the network incidence and Laplacian matrices.
Hint: Follow the derivation in Example 8.12.

Chapter 16

Virus Propagation: Basic Models


In this chapter and the next we present simple models for the diffusion and propagation of infectious
diseases. The proposed models may be relevant also in the context of propagation of information/signals
in a communication network and diffusion of innovations in competitive economic networks. Other
interesting propagation phenomena include failures in power networks and wildfires in forests.
In this chapter and the next, we are interested in (1) models (lumped vs network, deterministic vs
stochastic), (2) asymptotic behaviors (vanishing infection, steady-state epidemic, full contagion), and (3)
the transient propagation of epidemics starting from small initial fractions of infected nodes (possible
epidemic outbreak as opposed to monotonically vanishing infection). In the interest of clarity, we
begin with lumped variables, i.e., variables which represent an entire well-mixed population of
nodes. The next chapter will discuss distributed variable models, i.e., network models. We study
three low-dimensional deterministic models in which nodes may be in one of two or three states; see
Figure 16.1.
[Figure 16.1: The three basic models SI, SIS, and SIR for the propagation of an infectious disease.]

We say that an epidemic outbreak takes place if a small initial fraction of infected individuals leads to the contagion of a significant fraction of the population. We say the system displays an epidemic threshold if epidemic outbreaks occur when some combined value of the parameters and initial conditions is above a critical value.

16.1 The SI model

Given a population, let x(t) denote the fraction of infected individuals at time t ∈ R≥0. Similarly, let s(t) denote the fraction of susceptible individuals. Clearly, x(t) + s(t) = 1 at all times. We model propagation via the following first-order differential equation, called the susceptible–infected (SI) model:

$$\dot{x}(t) = \beta s(t)x(t) = \beta(1 - x(t))x(t), \tag{16.1}$$

where β > 0 is the infection rate. We will see distributed and stochastic versions of this model in the next chapter. A simple qualitative analysis of this equation can be performed by plotting ẋ versus x; see Figure 16.2.
[Figure 16.2: Phase portrait of the (lumped deterministic) SI model (β = 1).]

Remark 16.1 (Heuristic modeling assumptions and derivation). Over the interval (t, t+Δt), pairwise meetings between individuals in the population take place in the following fashion: assume the population has n individuals, pick a meeting rate β_m > 0, and assume that nβ_mΔt individuals will meet other nβ_mΔt individuals. Assuming meetings involve uniformly-selected individuals, over the interval (t, t+Δt), there are s(t)²nβ_mΔt meetings between a susceptible and another susceptible individual; these meetings, as well as meetings between infected individuals, result in no epidemic propagation. However, there will also be s(t)x(t)nβ_mΔt + x(t)s(t)nβ_mΔt meetings between a susceptible and an infected individual. We assume a fraction β_i ∈ [0, 1], called the transmission rate, of these meetings results in the successful transmission of the infection:

$$\beta_i\big(s(t)x(t)n\beta_m\Delta t + x(t)s(t)n\beta_m\Delta t\big) = 2\beta_i\beta_m\, x(t)s(t)\,n\Delta t.$$

In summary, the fraction of infected individuals satisfies

$$x(t+\Delta t) = x(t) + 2\beta_i\beta_m\, x(t)s(t)\Delta t,$$

and the SI model (16.1) is obtained in the limit Δt → 0⁺, where the infection parameter β = 2β_iβ_m is twice the product of the meeting rate β_m and the infection transmission fraction β_i.
Lemma 16.2 (Dynamical behavior of the SI model). Consider the SI model (16.1). The solution from initial condition x(0) = x₀ ∈ [0, 1] is

$$x(t) = \frac{x_0\, e^{\beta t}}{1 - x_0 + x_0\, e^{\beta t}}. \tag{16.2}$$

From all initial conditions 0 < x₀ < 1, the solution x(t) is monotonically increasing and converges to the equilibrium 1 as t → ∞.
It is easy to see that the SI model (16.1) results in an evolution akin to a logistic curve; see Figure 16.3.
[Figure 16.3: Evolution of the (lumped deterministic) SI model (β = 1) from a small initial fraction of infected individuals.]
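As a quick numerical check of Lemma 16.2, the following sketch (with illustrative parameter values) integrates the SI model (16.1) by forward Euler and compares the result at the final time with the closed-form solution (16.2).

```python
# Minimal sketch: forward-Euler integration of the SI model (16.1) versus the
# closed-form solution (16.2); parameter values are illustrative.
import numpy as np

beta, x0, dt, T = 1.0, 0.01, 0.001, 20.0

x = x0
for _ in range(int(T / dt)):
    x += dt * beta * (1 - x) * x               # one Euler step of (16.1)

x_exact = x0 * np.exp(beta * T) / (1 - x0 + x0 * np.exp(beta * T))   # (16.2)
print(x, x_exact)                              # nearly equal, both close to 1
```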

16.2 The SIR model

Next, we study a model in which individuals recover from the infection and are not susceptible to the epidemic after one round of infection. In other words, we assume the population is divided into three distinct groups: s(t) denotes the fraction of susceptible individuals, x(t) denotes the fraction of infected individuals, and r(t) denotes the fraction of recovered individuals. Clearly, s(t) + x(t) + r(t) = 1. We model the recovery process via a constant recovery rate γ and write our susceptible–infected–recovered (SIR) model as

$$\begin{aligned}
\dot{s}(t) &= -\beta s(t)x(t),\\
\dot{x}(t) &= \beta s(t)x(t) - \gamma x(t),\\
\dot{r}(t) &= \gamma x(t).
\end{aligned} \tag{16.3}$$

Heuristic modeling assumptions and derivation. One can show that the constant recovery rate assumption corresponds to assuming a so-called Poisson recovery process in the stochastic version of the SIR model. This is arguably not a very realistic assumption.
Lemma 16.3 (Dynamical behavior of the SIR model). Consider the SIR model (16.3). From each initial condition with x(0) > 0 and s(0) > 0, the resulting trajectory t ↦ (s(t), x(t), r(t)) has the following properties:

(i) if s(0), x(0), r(0) ∈ [0, 1], then s(t), x(t), r(t) ∈ [0, 1] for all t ≥ 0;
(ii) t ↦ s(t) is monotonically decreasing and t ↦ r(t) is monotonically increasing;
(iii) if βs(0)/γ > 1, then t ↦ x(t) first monotonically increases to a maximum value and then decreases to zero as t → ∞ (we describe this case as an epidemic outbreak, that is, an exponential growth of t ↦ x(t) for small times);
(iv) if βs(0)/γ < 1, then t ↦ x(t) monotonically and exponentially decreases to zero as t → ∞;
(v) lim_{t→∞}(s(t), x(t), r(t)) = (s∞, 0, r∞), where r∞ is the unique solution to the equality

$$1 - r_\infty = s(0)\exp\Big(-\frac{\beta}{\gamma}\big(r_\infty - r(0)\big)\Big). \tag{16.4}$$

[Figure 16.4: Left: evolution of the (lumped deterministic) SIR model from a small initial fraction of infected individuals (and zero recovered); parameters β = 2, γ = 1/4 (case (iii) in Lemma 16.3). Right: intersection between the two curves in equation (16.4) with s(0) = 0.95, r(0) = 0, and β/γ ∈ {1/4, 4}.]
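The final-size equality (16.4) is a fixed-point equation for r∞ and is easily solved numerically. A minimal sketch, using for concreteness the values s(0) = 0.95, r(0) = 0, and β/γ = 8 matching the left panel of Figure 16.4:

```python
# Minimal sketch: solving the final-size equation (16.4) for r_inf by
# fixed-point iteration; parameter values are illustrative.
import math

beta, gamma, s0, r0 = 2.0, 0.25, 0.95, 0.0

r = 0.5                                                  # initial guess
for _ in range(100):
    r = 1.0 - s0 * math.exp(-(beta / gamma) * (r - r0))  # iterate (16.4)
print(r)    # r_inf: final fraction of recovered individuals (close to 1 here)
```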

Proof. We first prove statement (i). For any fixed time t ≥ 0 and for any values s(t), x(t), r(t) ∈ [0, 1], the derivative ṙ(t) = γx(t) is nonnegative, and r(t) is bounded above by 1 due to the conserved quantity 1 = s(t) + x(t) + r(t). Hence, irrespective of s(t), x(t) ∈ [0, 1], it follows that r(t) ∈ [0, 1] for all t ≥ 0. We next eliminate s(t) = 1 − x(t) − r(t) and rewrite the SIR model (16.3) as

$$\dot{x}(t) = \beta\big(1 - x(t) - r(t)\big)x(t) - \gamma x(t), \tag{16.5a}$$
$$\dot{r}(t) = \gamma x(t). \tag{16.5b}$$

For any fixed time t ≥ 0 and for any values x(t), r(t) ∈ [0, 1], we investigate the right-hand side of (16.5a). The vector field (16.5a) has two roots, x₁ = 0 and x₂ = x₂(r(t)) = 1 − r(t) − γ/β. For sufficiently large γ we have x₂ ≤ x₁ = 0. In this case, ẋ(t) < 0 for x(t) ∈ ]0, 1]. Otherwise, x₂ ∈ ]0, 1[, and a further inspection of the vector field (16.5a) shows that ẋ(t) > 0 for x(t) ∈ ]0, x₂[ and ẋ(t) < 0 for x(t) ∈ ]x₂, 1]. Since the vector field (16.5a) at the boundaries is not pointing outside the interval [0, 1], it follows that x(t) necessarily remains in [0, 1]. This proves statement (i).

Statement (ii) is an immediate consequence of ṡ(t) = −βs(t)x(t) ≤ 0 and ṙ(t) = γx(t) ≥ 0.

We leave the proofs of statements (iii) and (iv) to the reader.
We next focus on statement (v). For the SIR model (16.3), the signal s(t) is monotonically non-increasing and the signal r(t) is monotonically non-decreasing. Because they are also bounded, their two limits exist: lim_{t→∞} s(t) = s∞ and lim_{t→∞} r(t) = r∞. Moreover, the equality s(t) + x(t) + r(t) = 1 implies that the third limit also must exist, that is, lim_{t→∞} x(t) = x∞. We now claim that x∞ = 0. By contradiction, assume x∞ > 0. But, then, also lim_{t→∞} ṙ(t) = γx∞ > 0 and this contradicts the fact that r(t) ≤ 1.
Next, consider the SIR model (16.3). If s(0) = 0, then clearly r∞ = 1. If instead s(0) > 0, then s(t) remains strictly positive for sufficiently small time t. Given t sufficiently small, we note a useful equality and integrate it from 0 to t:

$$\frac{\dot{s}(t)}{s(t)} = -\beta x(t) = -\frac{\beta}{\gamma}\dot{r}(t) \;\implies\; \ln\frac{s(t)}{s(0)} = -\frac{\beta}{\gamma}\big(r(t) - r(0)\big) \;\implies\; s(t) = s(0)\exp\Big(-\frac{\beta}{\gamma}\big(r(t) - r(0)\big)\Big).$$

The last equality implies that s(t) is strictly positive for all t. Equation (16.4) follows by taking the limit as t → ∞ and noting that for all time 1 = s(t) + x(t) + r(t); in particular, 1 = s∞ + r∞. The uniqueness of the solution r∞ to equation (16.4) follows from showing there exists a unique intersection between the left- and right-hand sides, as illustrated in Figure 16.4. ∎

16.3 The SIS model

As a third and final lumped deterministic model, we study the setting in which individuals recover from the infection but are susceptible to being re-infected. As in the SI model, the population is divided into two fractions with s(t) + x(t) = 1. We model infection, recovery, and possible re-infection with the SIS model:

$$\dot{x} = \beta sx - \gamma x = (\beta - \gamma - \beta x)x, \tag{16.6}$$

where β is the infection rate and γ is the recovery rate. Note that the first term is the same infection term as in the SI model and the second term is the same recovery term as in the SIR model.

A simple qualitative analysis of this equation can be performed by plotting ẋ versus x for β < γ, β = γ, and β > γ; see Figure 16.5.
[Figure 16.5: Phase portrait of the (lumped deterministic) SIS model for β = 1 < γ = 3/2 and for β = 1 > γ = 1/2.]

Lemma 16.4 (Dynamical behavior of the SIS model). For the SIS model (16.6):

(i) the closed-form solution to equation (16.6) from initial condition x(0) = x₀ ∈ [0, 1], for β ≠ γ, is

$$x(t) = \frac{(\beta - \gamma)\,x_0}{\beta x_0 + e^{-(\beta-\gamma)t}\,\big(\beta(1 - x_0) - \gamma\big)}, \tag{16.7}$$

(ii) if β ≤ γ, all trajectories converge to the unique equilibrium x* = 0 (i.e., the epidemic disappears), and

(iii) if β > γ, then, from all positive initial conditions x(0) > 0, all trajectories converge to the unique exponentially stable equilibrium x* = (β − γ)/β < 1 (epidemic outbreak and steady-state epidemic contagion).

We illustrate these results in Figure 16.6.

[Figure 16.6: Evolution of the (lumped deterministic) SIS model from a small initial fraction of infected individuals; β = 1 > γ = 0.5.]
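The two regimes of Lemma 16.4 are easy to reproduce numerically. The following minimal sketch (parameter values are illustrative) integrates the SIS model (16.6) by forward Euler below and above the threshold β = γ.

```python
# Minimal sketch: the SIS model (16.6) in the two regimes of Lemma 16.4;
# parameter values are illustrative.
def simulate_sis(beta, gamma, x0=0.05, dt=0.01, T=40.0):
    x = x0
    for _ in range(int(T / dt)):
        x += dt * (beta * (1 - x) * x - gamma * x)   # one Euler step of (16.6)
    return x

print(simulate_sis(beta=1.0, gamma=1.5))   # beta <= gamma: x(t) -> 0
print(simulate_sis(beta=1.0, gamma=0.5))   # beta > gamma: x(t) -> (beta-gamma)/beta = 0.5
```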

16.4 Exercises

E16.1 Closed-form solutions for the SI and SIS models. Verify the correctness of the closed-form solutions for the SI and SIS models given in equations (16.2) and (16.7).

E16.2 Dynamical behavior of the SIS model. Prove Lemma 16.4.


Chapter 17

Virus Propagation in Contact Networks


In this chapter we continue our discussion of the diffusion and propagation of infectious diseases. Starting from the basic lumped models discussed in Chapter 16, we now focus on network models and discuss some stochastic modeling aspects.
We borrow ideas from the lecture notes by Zampieri (2013). A detailed survey about infectious
diseases is (Hethcote 2000). A very early work on epidemic models over networks, the spectral radius
of the adjacency matrix and the epidemic threshold is Lajmanovich and Yorke (1976). Later works on
similar models include (Wang et al. 2003), Mieghem (2011), and Mieghem et al. (2009). Our stochastic
analysis is based on the approach in (Mei and Bullo 2014). Recent extensions and general proofs for the
deterministic SIS network model are given by Khanafer et al. (2015). A related book chapter is (Newman
2010, Chapter 17).

17.1 The stochastic network SI model

In this section we consider epidemic models that are richer, more general, and more complex than the lumped deterministic models considered before. We extend our treatment in two ways: we consider a stochastic model of the propagation phenomenon and we imagine the population is distributed over a network.
The stochastic model

The stochastic network SI model, illustrated in Figure 17.1, is defined as follows:

(i) We consider a group of n individuals. The state of each individual is either S for susceptible or I
for infected.
(ii) The n individuals are in pairwise contact, as specified by an undirected graph G with adjacency matrix A (without self-loops). The edge weights represent the frequency of contact between two individuals.
(iii) Each individual in susceptible status can transition to infected as follows: given an infection rate β > 0, if a susceptible individual i is in contact with an infected individual j for a time Δt, the probability of infection is βa_ij Δt. Each individual can be infected by any neighboring individual: these random events are independent.
[Figure 17.1: In the stochastic network SI model, each susceptible individual (blue) becomes infected by contact with infected individuals (red) in its neighborhood according to an infection rate β.]

An approximate deterministic model. We define the infection variable at time t for individual i by

$$Y_i(t) = \begin{cases} 1, & \text{if node } i \text{ is in state I at time } t,\\ 0, & \text{if node } i \text{ is in state S at time } t, \end{cases}$$

and the expected infection, which turns out to be equal to the probability of infection, of individual i by

$$x_i(t) = E[Y_i(t)] = 1\cdot P[Y_i(t) = 1] + 0\cdot P[Y_i(t) = 0] = P[Y_i(t) = 1].$$

In what follows it will be useful to approximate P[Y_i(t) = 0 | Y_j(t) = 1] with P[Y_i(t) = 0], that is, to require Y_i and Y_j to be independent for arbitrary i and j. We claim this approximation is acceptable over certain graphs with large numbers n of individuals. The final model, which we obtain below based on the Independence Approximation, is an upper bound on the true model because P[Y_i(t) = 0] ≥ P[Y_i(t) = 0 | Y_j(t) = 1].
Definition 17.1 (Independence Approximation). For any two individuals i and j, the infection variables
Yi and Yj are independent.
Theorem 17.2 (From the stochastic to the deterministic network SI model). Consider the stochastic network SI model with infection rate β over a contact graph with adjacency matrix A. The probabilities of infection satisfy

$$\frac{d}{dt}P[Y_i(t) = 1] = \beta\sum_{j=1}^{n} a_{ij}\, P[Y_i(t) = 0,\, Y_j(t) = 1].$$

Moreover, under the Independence Approximation 17.1, the probabilities of infection x_i(t) = P[Y_i(t) = 1], i ∈ {1, ..., n}, satisfy the (deterministic) network SI model defined by

$$\dot{x}_i(t) = \beta\,(1 - x_i(t))\sum_{j=1}^{n} a_{ij}\,x_j(t).$$

We study the deterministic network SI model in the next section.

Proof. In what follows, we define the random variables

$$Y_{-i}(t) = \big(Y_1(t), \dots, Y_{i-1}(t), Y_{i+1}(t), \dots, Y_n(t)\big),$$

and, similarly, Y_{−i−j}(t), for i, j ∈ {1, ..., n}. We are interested in the events that a susceptible individual remains susceptible or becomes infected over the interval of time [t, t+Δt], for small Δt. We start by computing the probability of non-infection over the time interval Δt, conditioned upon Y_{−i}(t):

$$P[Y_i(t+\Delta t) = 0 \mid Y_i(t) = 0,\, Y_{-i}(t)] = \prod_{j=1}^{n}\big(1 - \beta a_{ij}Y_j(t)\Delta t\big) = 1 - \beta\sum_{j=1}^{n} a_{ij}Y_j(t)\Delta t + O(\Delta t^2),$$

where O(Δt²) is a function upper bounded by a constant times Δt². The complementary probability, i.e., the probability of infection over the time interval Δt, is:

$$P[Y_i(t+\Delta t) = 1 \mid Y_i(t) = 0,\, Y_{-i}(t)] = \beta\sum_{j=1}^{n} a_{ij}Y_j(t)\Delta t + O(\Delta t^2).$$

We are now ready to study the random variable Y_i(t+Δt) − Y_i(t), given Y_{−i}(t):

$$\begin{aligned}
E[Y_i(t+\Delta t) - Y_i(t) \mid Y_{-i}(t)]
&= 1\cdot P[Y_i(t+\Delta t) = 1,\, Y_i(t) = 0 \mid Y_{-i}(t)]\\
&\quad + 0\cdot P[(Y_i(t+\Delta t) = Y_i(t) = 0)\text{ or }(Y_i(t+\Delta t) = Y_i(t) = 1) \mid Y_{-i}(t)] && \text{(by def.\ of expectation)}\\
&= P[Y_i(t+\Delta t) = 1 \mid Y_i(t) = 0,\, Y_{-i}(t)]\; P[Y_i(t) = 0 \mid Y_{-i}(t)] && \text{(by conditional probability)}\\
&= \Big(\beta\sum_{j=1}^{n} a_{ij}Y_j(t)\Delta t + O(\Delta t^2)\Big)\, P[Y_i(t) = 0 \mid Y_{-i}(t)].
\end{aligned}$$
We now remove the conditioning upon Y_{−i}(t) and study:

$$E[Y_i(t+\Delta t) - Y_i(t)] = E\big[E[Y_i(t+\Delta t) - Y_i(t) \mid Y_{-i}(t)]\big] = \beta\sum_{j=1}^{n} a_{ij}\Delta t\; E\big[Y_j(t)\,P[Y_i(t) = 0 \mid Y_{-i}(t)]\big] + O(\Delta t^2),$$

and therefore we compute (where y is an arbitrary realization of the random variable Y):

$$\begin{aligned}
E\big[Y_j(t)\,P[Y_i(t) = 0 \mid Y_{-i}(t)]\big]
&= \sum_{y_{-i}} y_j\, P[Y_i(t) = 0 \mid Y_{-i}(t) = y_{-i}]\; P[Y_{-i}(t) = y_{-i}] && \text{(by def.\ of expectation)}\\
&= \sum_{y_{-i-j}} 1\cdot P[Y_i(t) = 0 \mid Y_{-i-j}(t) = y_{-i-j},\, Y_j(t) = 1]\; P[Y_{-i-j}(t) = y_{-i-j},\, Y_j(t) = 1] && \text{(because } y_j \in \{0,1\})\\
&= \sum_{y_{-i-j}} P[Y_i(t) = 0,\, Y_{-i-j}(t) = y_{-i-j},\, Y_j(t) = 1] && \text{(by conditional probability)}\\
&= P[Y_i(t) = 0,\, Y_j(t) = 1],
\end{aligned}$$

where, for example, the first summation is taken over all possible values y_{−i} that the variable Y_{−i}(t) takes.
In summary, we know
$$E[Y_i(t+\Delta t) - Y_i(t)] = \beta\sum_{j=1}^{n} a_{ij}\Delta t\; P[Y_i(t) = 0,\, Y_j(t) = 1] + O(\Delta t^2),$$

so that, also recalling P[Y_i(t) = 1] = E[Y_i(t)],

$$\frac{d}{dt}P[Y_i(t) = 1] = \lim_{\Delta t\to 0^{+}} \frac{E[Y_i(t+\Delta t) - Y_i(t)]}{\Delta t} = \beta\sum_{j=1}^{n} a_{ij}\, P[Y_i(t) = 0,\, Y_j(t) = 1].$$

The final step is an immediate consequence of the Independence Approximation: P[Y_i(t) = 0, Y_j(t) = 1] = P[Y_i(t) = 0 | Y_j(t) = 1] P[Y_j(t) = 1] ≈ (1 − P[Y_i(t) = 1]) P[Y_j(t) = 1]. ∎
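The following minimal sketch compares a Monte Carlo simulation of the stochastic network SI model against the deterministic network SI model obtained under the Independence Approximation; the random graph, the rates, and the horizon are illustrative choices. Consistent with the discussion above, the deterministic probabilities (approximately) upper bound the Monte Carlo estimates.

```python
# Minimal sketch: stochastic network SI model (Monte Carlo) versus the
# deterministic network SI model; all numerical choices are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n, beta, dt, steps, trials = 20, 0.5, 0.02, 250, 1000
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # undirected, no self-loops

x_mc = np.zeros(n)
for _ in range(trials):
    Y = np.zeros(n)
    Y[0] = 1.0                                  # node 0 initially infected
    for _ in range(steps):
        p = beta * dt * (A @ Y)                 # infection probability over dt
        Y = np.maximum(Y, (rng.random(n) < p).astype(float))
    x_mc += Y
x_mc /= trials                                  # Monte Carlo estimate of P[Y_i = 1]

x = np.zeros(n)
x[0] = 1.0
for _ in range(steps):                          # deterministic network SI model
    x += dt * beta * (1 - x) * (A @ x)

print(np.round(x_mc[:5], 2))
print(np.round(x[:5], 2))    # entrywise >= the Monte Carlo estimates (approx.)
```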

17.2 The network SI model

In this and the following sections we consider deterministic network models for the propagation
of epidemics. Two interpretations of the provided models are possible: if node i is a population of
individuals at location i, then xi can be interpreted as the infected fraction of that population. If node
i is a single individual, then xi can be interpreted as the probability that the individual is infected:
xi (t) = P[individual i is infected at time t].
[Figure 17.2: In the (deterministic) network SI model, each node is described by a probability of infection taking value between 0 (blue) and 1 (red). The rate at which individuals become increasingly infected is parametrized by the infection rate β.]

Consider an undirected weighted graph G = (V, E) of order n with adjacency matrix A and degree matrix D = diag(A1_n). Let x_i(t) ∈ [0, 1] denote the fraction of infected individuals at node i ∈ V at time t ∈ R≥0. The network SI model is

$$\dot{x}_i(t) = \beta(1 - x_i(t))\sum_{j=1}^{n} a_{ij}x_j(t),$$

or, in equivalent vector form,

$$\dot{x} = \beta\big(I_n - \operatorname{diag}(x)\big)Ax. \tag{17.1}$$

Alternatively, in terms of the fractions of susceptible individuals s = 1_n − x, the network SI model reads

$$\dot{s} = -\beta\operatorname{diag}(s)A(1_n - s). \tag{17.2}$$

Theorem 17.3 (Dynamical behavior of the network SI model). Consider the network SI model (17.1). Assume G is connected so that A is irreducible; let D denote the degree matrix. The following statements hold:

(i) if x(0), s(0) ∈ [0, 1]^n, then x(t), s(t) ∈ [0, 1]^n for all t ≥ 0;
(ii) there are two equilibrium points: 0_n (no epidemic) and 1_n (full contagion);
(iii) the linearization of model (17.1) about the equilibrium point 0_n is ẋ = βAx and it is exponentially unstable;
(iv) the linearization of model (17.2) about the equilibrium 0_n is ṡ = −βDs and it is exponentially stable;
(v) each trajectory with initial condition x(0) ≠ 0_n converges asymptotically to 1_n, that is, the epidemic spreads to the entire network.

Proof. Statement (i) can be proved by evaluating the vector field (17.1) at the boundary of the admissible state space, that is, for x ∈ [0, 1]^n such that at least one entry i satisfies x_i ∈ {0, 1}. We leave the detailed proof of statement (i) to the reader.

We now prove statement (ii). The point x is an equilibrium point if and only if

$$\beta\big(I_n - \operatorname{diag}(x)\big)Ax = 0_n \iff Ax = \operatorname{diag}(x)Ax.$$

Clearly, 0_n and 1_n are equilibrium points. Hence we just need to show that no other points can be equilibria. First, suppose that there exists an equilibrium point x ≠ 0_n with 0_n ≤ x < 1_n. Then I_n − diag(x) has strictly positive diagonal and therefore x must satisfy Ax = 0_n. Note that Ax = 0_n implies also Σ_{k=1}^{n−1} A^k x = 0_n. Recall from Proposition 4.3 that, if A is irreducible, then Σ_{k=1}^{n−1} A^k has all off-diagonal terms strictly positive. Because x_i ∈ [0, 1[, the only possible solution to Ax = 0_n is therefore x = 0_n. This is a contradiction.

Next, suppose there exists an equilibrium point x = (x₁, x₂) with 0_{n₁} ≤ x₁ < 1_{n₁}, x₂ = 1_{n₂}, and n₁ + n₂ = n. The equality Ax = diag(x)Ax implies Ax = diag(x)^k Ax for all k ∈ N and, in turn,

$$Ax = \lim_{k\to\infty}\operatorname{diag}(x)^{k}Ax = \begin{bmatrix} 0_{n_1\times n_1} & 0_{n_1\times n_2}\\ 0_{n_2\times n_1} & I_{n_2} \end{bmatrix} Ax.$$

By partitioning A in corresponding blocks, the previous equality implies A₁₁x₁ + A₁₂x₂ = 0_{n₁}. Because x₂ = 1_{n₂}, we know that A₁₂ = 0_{n₁×n₂} and, therefore, that A is reducible. This contradiction concludes the proof of statement (ii).
Statements (iii) and (iv) are straightforward computations:

$$\dot{x} = \beta\big(I_n - \operatorname{diag}(x)\big)Ax = \beta Ax - \beta\operatorname{diag}(x)Ax \approx \beta Ax,$$
$$\dot{s} = -\beta\operatorname{diag}(s)A(1_n - s) = -\beta\operatorname{diag}(s)A1_n + \beta\operatorname{diag}(s)As = -\beta Ds + \beta\operatorname{diag}(s)As \approx -\beta Ds,$$

where we used the equality diag(y)z = diag(z)y for y, z ∈ R^n. Exponential stability of the linearization ṡ = −βDs is obvious, and the Perron–Frobenius Theorem 2.15 for irreducible matrices implies the existence of the unstable positive eigenvalue βλ(A) > 0 for the linearization ẋ = βAx.

To show statement (v), consider the function V(x) = 1_n^⊤(1_n − x); this is a smooth function defined over the compact and forward invariant set [0, 1]^n (see (i)). We compute V̇ = −β1_n^⊤(I_n − diag(x))Ax and note that V̇ ≤ 0 for all x, and that V̇(x) = 0 if and only if x ∈ {0_n, 1_n}. Because of these facts, the LaSalle Invariance Principle in Theorem 13.3 implies that all trajectories converge asymptotically to either 1_n or 0_n. Additionally, note that 0 ≤ V(x) ≤ n for all x ∈ [0, 1]^n, that V(x) = 0 if and only if x = 1_n, and that V(x) = n if and only if x = 0_n. Therefore, all trajectories with x(0) ≠ 0_n converge asymptotically to 1_n. ∎

Before proceeding, we review the notion of dominant eigenvector and introduce some notation. Let λ_max = λ(A) be the dominant eigenvalue of the adjacency matrix A and let v_max be the corresponding positive eigenvector normalized to satisfy 1_n^⊤ v_max = 1. (Recall that these definitions are well posed because of the Perron–Frobenius Theorem 2.15 for irreducible matrices.) Additionally, let v_max, v₂, ..., v_n denote an orthonormal set of eigenvectors with corresponding eigenvalues λ_max > λ₂ ≥ ⋯ ≥ λ_n for the symmetric adjacency matrix A.

Consider now the onset of an epidemic in a large population, characterized by a small initial infection x(0) = x₀ ≪ 1_n. So long as x(t) ≪ 1_n, the system evolution is approximated by ẋ = βAx. This initial-times linear evolution satisfies

$$x(t) = \big(v_{\max}^{\top}x_0\big)e^{\beta\lambda_{\max}t}v_{\max} + \sum_{i=2}^{n}\big(v_i^{\top}x_0\big)e^{\beta\lambda_i t}v_i = e^{\beta\lambda_{\max}t}\Big(\big(v_{\max}^{\top}x_0\big)v_{\max} + o(t)\Big), \tag{17.3}$$

where o(t) is a function exponentially vanishing as t → ∞. In other words, the epidemic initially experiences exponential growth with rate βλ_max and with distribution among the nodes given by the eigenvector v_max.
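The early-time behavior (17.3) is easy to observe numerically: while x(t) ≪ 1_n, the infection profile across the nodes aligns with the dominant eigenvector v_max. A minimal sketch on an illustrative 5-node graph:

```python
# Minimal sketch: early-time alignment of the network SI model with the
# dominant eigenvector, as in (17.3); graph and parameters are illustrative.
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
beta, dt, steps = 1.0, 0.001, 3000

_, V = np.linalg.eigh(A)
vmax = np.abs(V[:, -1])
vmax /= vmax.sum()                        # dominant eigenvector, sum one

x = np.full(A.shape[0], 1e-4)             # small uniform initial infection
for _ in range(steps):
    x += dt * beta * (1 - x) * (A @ x)    # network SI model (17.1)

print(np.round(x / x.sum(), 3))           # infection distribution across nodes...
print(np.round(vmax, 3))                  # ...is close to v_max while x << 1
```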

17.3 The network SIS model

As previously, consider an undirected weighted graph G = (V, E) of order n with adjacency matrix A. Let x_i(t) ∈ [0, 1] denote the fraction of infected individuals at node i ∈ V at time t ∈ R≥0. Given an infection rate β and a recovery rate γ, the network SIS model is

$$\dot{x}_i(t) = \beta(1 - x_i(t))\sum_{j=1}^{n} a_{ij}x_j(t) - \gamma x_i(t), \tag{17.4}$$

or, in equivalent vector form,

$$\dot{x} = \beta\big(I_n - \operatorname{diag}(x)\big)Ax - \gamma x. \tag{17.5}$$

We start our analysis with some useful preliminary notions. We define the monotonically-increasing functions

$$f_{+}(y) = y/(1+y), \quad\text{and}\quad f_{-}(z) = z/(1-z)$$

for y ∈ R≥0 and z ∈ [0, 1[. One can easily verify that f₊(f₋(z)) = z for all z ∈ [0, 1[. For vector variables y ∈ R^n≥0 and z ∈ [0, 1]^n, we write F₊(y) = (f₊(y₁), ..., f₊(y_n)) and F₋(z) = (f₋(z₁), ..., f₋(z_n)). Denoting Ā = βA/γ and assuming x < 1_n, the model (17.5) is rewritten as:

$$\dot{x} = F(x) := \gamma\operatorname{diag}(1_n - x)\big(\bar{A}x - F_{-}(x)\big),$$

so that

$$F(x) \ge 0 \iff \bar{A}x \ge F_{-}(x) \iff F_{+}(\bar{A}x) \ge x.$$

Moreover, x* is an equilibrium point (F(x*) = 0_n) if and only if Āx* = F₋(x*) or, equivalently, if and only if F₊(Āx*) = x*. We are now ready to present our results in two theorems.
Theorem 17.4 (Dynamical behavior of the network SIS model: below the threshold). Consider the network SIS model (17.4) over an undirected graph G with infection rate β and recovery rate γ. Assume G is connected, and let A be its adjacency matrix with dominant eigenvalue λ_max. If βλ_max/γ < 1, then

(i) there exists a unique equilibrium point 0_n,
(ii) the linearization of model (17.4) about the equilibrium 0_n is ẋ = (βA − γI_n)x and it is exponentially stable; and
(iii) from any initial condition x(0) ≠ 0_n, the weighted average t ↦ v_max^⊤ x(t) is monotonically and exponentially decreasing, so that all trajectories converge to 0_n.

Proof. Regarding statement (i), for x ∈ [0, 1]^n \ {0_n}, note F₊(Āx) ≤ Āx because f₊(z) ≤ z. Compute

$$\|F_{+}(\bar{A}x)\|_2 \le \|\bar{A}x\|_2 \le \|\bar{A}\|_2\|x\|_2 < \|x\|_2,$$

where the last inequality follows from ‖Ā‖₂ = ρ(Ā), because Ā is symmetric, and from ρ(Ā) = βλ_max/γ < 1. Therefore, no x ≠ 0_n can satisfy F₊(Āx) = x.

Regarding statement (ii), the linearization of equation (17.5) is verified by dropping the second-order terms. The eigenvalues of βA − γI_n are βλ_i − γ, where λ₁ = λ_max > λ₂ ≥ ⋯ ≥ λ_n are the eigenvalues of A. The linearized system is exponentially stable at 0_n for βλ_max − γ < 0.

Finally, regarding statement (iii), define y(t) = v_max^⊤ x(t), note (I_n − diag(z))v_max ≤ v_max for any z ∈ [0, 1]^n, and compute

$$\dot{y}(t) = \beta v_{\max}^{\top}\big(I_n - \operatorname{diag}(x(t))\big)Ax(t) - \gamma v_{\max}^{\top}x(t) \le \beta v_{\max}^{\top}Ax(t) - \gamma v_{\max}^{\top}x(t) = (\beta\lambda_{\max} - \gamma)\,y(t).$$

By Grönwall's Lemma, this inequality implies that t ↦ y(t) is monotonically decreasing and satisfies y(t) ≤ y(0)e^{(βλ_max − γ)t} from all initial conditions y(0). This concludes our proof of statement (iii), since v_max > 0. ∎
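A minimal numerical illustration of Theorem 17.4, with an illustrative triangle graph and rates chosen so that βλ_max/γ < 1:

```python
# Minimal sketch: network SIS model (17.4) below the epidemic threshold;
# graph and rates are illustrative.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)          # triangle, lambda_max = 2
lam_max = np.max(np.linalg.eigvalsh(A))
beta, gamma = 0.2, 0.8                           # beta * lam_max / gamma = 0.5 < 1

x = np.array([0.9, 0.5, 0.1])
dt = 0.01
for _ in range(5000):
    x += dt * (beta * (1 - x) * (A @ x) - gamma * x)

print(x)   # all entries near 0: the epidemic disappears
```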
Theorem 17.5 (Dynamical behavior of the network SIS model: above the threshold). Consider the network SIS model (17.4) over an undirected graph G with infection rate β and recovery rate γ. Assume G is connected, and let A be its adjacency matrix with dominant eigenpair (λ_max, v_max) and with degree vector d = A1_n. Define the shorthands δ := βλ_max/γ − 1 and d⁻¹ = (1/d₁, ..., 1/d_n)^⊤. If βλ_max/γ > 1, then

(i) 0_n is an equilibrium point, the linearization of system (17.5) at 0_n is unstable with dominant unstable eigenvalue βλ_max − γ and with dominant eigenvector v_max, i.e., there will be an epidemic outbreak;

(ii) besides the equilibrium 0_n, there exists a unique other equilibrium point x* such that
(a) x* > 0,
(b) x* = δv_max + O(δ²), as δ → 0⁺,
(c) x* = 1_n − (γ/β)d⁻¹ + O(γ²/β²), at fixed A as β/γ → ∞,
(d) x* = lim_{k→∞} y(k), where the monotonically-increasing sequence {y(k)}_{k∈Z≥0} ⊂ [0, 1]^n is defined by

$$y_i(k+1) := f_{+}\Big(\sum_{j=1}^{n}\bar{a}_{ij}\,y_j(k)\Big), \qquad y(0) := \frac{\delta}{(1+\delta)^2}\,v_{\max};$$

(iii) if x(0) ≠ 0_n, then x(t) → x* as t → ∞. Moreover, if x(0) < x* (resp. x(0) > x*), then t ↦ x(t) is monotonically increasing (resp. decreasing).

Note: statement (i) means that, near the onset of an epidemic outbreak, the exponential growth rate is βλ_max − γ and the outbreak tends to align with the dominant eigenvector v_max, as in the discussion leading up to the approximate evolution (17.3).
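Statement (ii)d doubles as an effective numerical algorithm: the monotone iteration converges to the endemic equilibrium x*. A minimal sketch, with an illustrative graph and rates:

```python
# Minimal sketch: computing the endemic equilibrium x* via the monotone
# iteration of Theorem 17.5(ii)d; graph and rates are illustrative.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
beta, gamma = 1.0, 1.2
Abar = (beta / gamma) * A

lam, V = np.linalg.eigh(A)
lam_max, vmax = lam[-1], np.abs(V[:, -1])
vmax /= vmax.sum()                           # normalize so that 1_n' vmax = 1
delta = beta * lam_max / gamma - 1.0
assert delta > 0                             # above the threshold

y = (delta / (1 + delta) ** 2) * vmax        # y(0) as in the theorem
for _ in range(1000):
    z = Abar @ y
    y = z / (1 + z)                          # y(k+1)_i = f_+((Abar y(k))_i)

print(np.round(y, 4))                        # endemic equilibrium x*
z = Abar @ y
print(np.max(np.abs(z / (1 + z) - y)))       # fixed-point residual, ~ 0
```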
Proof of selected statements in Theorem 17.5. Statement (i) follows from the same analysis of the linearized system as in the proof of Theorem 17.4(ii).

We next focus on the statements (ii). We begin by establishing two properties of the map x ↦ F₊(Āx).

First, we claim that y > z ≥ 0_n implies F₊(Āy) > F₊(Āz). Indeed, note that G being connected implies that the adjacency matrix A has at least one strictly positive entry in each row. Hence, y − z > 0_n implies Ā(y − z) > 0_n and, since f₊ is monotonically increasing, Āy > Āz implies F₊(Āy) > F₊(Āz).

Second, we claim that there exists an x ∈ [0, 1]^n satisfying F₊(Āx) > x. Indeed, let λ̄_max = λ_max(Ā) = βλ_max(A)/γ > 1 and compute, for any ε > 0,

$$F_{+}\big(\bar{A}(\varepsilon v_{\max})\big)_i = f_{+}\big(\varepsilon\bar\lambda_{\max}v_{\max,i}\big) > \varepsilon\bar\lambda_{\max}v_{\max,i}\big(1 - \varepsilon\bar\lambda_{\max}v_{\max,i}\big),$$

where we used the inequality y/(1+y) > y(1−y), for y > 0. For ε = (λ̄_max − 1)/λ̄²_max and recalling v_max,i < 1 for each i, compute

$$\bar\lambda_{\max}\big(1 - \varepsilon\bar\lambda_{\max}v_{\max,i}\big) > \bar\lambda_{\max} - \varepsilon\bar\lambda_{\max}^2 = \bar\lambda_{\max} - (\bar\lambda_{\max} - 1) = 1,$$

so that F₊(Ā(εv_max))_i > εv_max,i. This concludes our proof that F₊(Ā(εv_max)) > εv_max, for ε = (λ̄_max − 1)/λ̄²_max. Simple calculations show that ε = δ/(1+δ)², so that εv_max = y(0).

These two properties allow us to analyze the iteration defined in the theorem statement. We just proved that y(1) = F₊(Āy(0)) > y(0) = (δ/(1+δ)²)v_max. This inequality implies F₊(Āy(1)) > F₊(Āy(0)) and, by induction, F₊(Āy(k+1)) > y(k+1) = F₊(Āy(k)). Each sequence {y_i(k)}_{k∈N}, i ∈ {1, ..., n}, is monotonically increasing and upper bounded by 1. Hence, the sequence {y(k)}_{k∈N} converges, and it converges to a point x* > 0 such that F₊(Āx*) = x*. This proves the existence of an equilibrium point x* = lim_{k→∞} y(k) > 0, as claimed in statements (ii)d and (ii)a.


Regarding statement (ii)b, we claim there exists a bounded sequence {w(k, δ)}_{k∈Z≥0} ⊂ R^n such that the sequence {y(k)}_{k∈Z≥0} satisfies y(k) = δv_max + δ²w(k, δ). The statement x* = δv_max + O(δ²) is then an immediate consequence of this claim and of the limit lim_{k→∞} y(k) = x*. We prove the claim by induction. Because δ/(1+δ)² = δ − 2δ² + O(δ³), the claim is true for k = 0 with w(0, δ) = −2v_max + O(δ). We now assume the claim is true at k and show it true at k+1:

$$\begin{aligned}
y(k+1) &= F_{+}\big(\bar{A}(\delta v_{\max} + \delta^2 w(k,\delta))\big)\\
&= F_{+}\big((1+\delta)\delta v_{\max} + \delta^2\bar{A}w(k,\delta)\big)\\
&= F_{+}\big(\delta v_{\max} + \delta^2(\bar{A}w(k,\delta) + v_{\max})\big)\\
&= \delta v_{\max} + \delta^2\big(\bar{A}w(k,\delta) + v_{\max}\big) - \operatorname{diag}\big(\delta v_{\max} + O(\delta^2)\big)\big(\delta v_{\max} + O(\delta^2)\big) + O(\delta^3)\\
&= \delta v_{\max} + \delta^2\big(\bar{A}w(k,\delta) + v_{\max} - \operatorname{diag}(v_{\max})v_{\max} + O(\delta)\big),
\end{aligned}$$

where we used the Taylor expansion F₊(y) = y − diag(y)y + O(‖y‖³). Hence, the claim is true if the sequence {w(k, δ)}_{k∈Z≥0} defined by

$$w(k+1,\delta) = \bar{A}w(k,\delta) + v_{\max} - \operatorname{diag}(v_{\max})v_{\max} + O(\delta)$$

is bounded; this boundedness can be established for δ sufficiently small. This concludes the proof of statement (ii)b. The proof of statement (ii)c is analogous: it suffices to show the existence of a bounded sequence {w(k)} such that y(k) = 1_n − (γ/β)d⁻¹ + (γ/β)²w(k).
To complete the proof of statement (ii), we establish the uniqueness of the equilibrium x* ∈ [0, 1]^n \ {0_n}. First, we claim that an equilibrium point with an entry equal to 0 must be 0_n. Indeed, assume y is an equilibrium point with y_i = 0 for some i ∈ {1, ..., n}. The equality y_i = f₊(Σ_{j=1}^n ā_ij y_j) implies that also any node j with a_ij > 0 must satisfy y_j = 0. Because G is connected, all entries of y must be zero. Second, by contradiction, we assume there exists another equilibrium point y > 0 distinct from x*. Without loss of generality, assume there exists i such that y_i < x*_i. Let η ∈ ]0, 1[ satisfy y − ηx* ≥ 0_n and y_i = ηx*_i. Note:

$$\begin{aligned}
\big(F_{+}(\bar{A}y) - y\big)_i &= f_{+}\big((\bar{A}y)_i\big) - \eta x_i^* \\
&\ge f_{+}\big((\bar{A}\eta x^*)_i\big) - \eta x_i^* && (\text{because } \bar{A} \ge 0 \text{ and } y \ge \eta x^*)\\
&> \eta f_{+}\big((\bar{A}x^*)_i\big) - \eta x_i^* && (\text{because } f_{+}(\eta y) > \eta f_{+}(y) \text{ for } \eta < 1)\\
&= \eta\big(F_{+}(\bar{A}x^*) - x^*\big)_i = 0 && (\text{because } x^* \text{ is an equilibrium}).
\end{aligned}$$

Therefore (F₊(Āy) − y)_i > 0 and this is a contradiction, since y is an equilibrium.

Regarding statement (iii), we refer to (Fall et al. 2007; Khanafer et al. 2015; Lajmanovich and Yorke 1976) in the interest of brevity. ∎

17.4 The network SIR model

As previously, consider an undirected weighted graph G = (V, E) of order n with adjacency matrix A. Let s_i(t), x_i(t), r_i(t) ∈ [0, 1] denote the fractions of susceptible, infected, and recovered individuals at node i ∈ V at time t ∈ R≥0. The network SIR model is

$$\begin{aligned}
\dot{s}_i(t) &= -\beta s_i(t)\sum_{j=1}^{n} a_{ij}x_j(t),\\
\dot{x}_i(t) &= \beta s_i(t)\sum_{j=1}^{n} a_{ij}x_j(t) - \gamma x_i(t),\\
\dot{r}_i(t) &= \gamma x_i(t),
\end{aligned}$$

where β > 0 is the infection rate and γ > 0 is the recovery rate. Note that the third equation is redundant because of the constraint s_i(t) + x_i(t) + r_i(t) = 1 and that, therefore, we regard the dynamical system as described by the first two equations and write it in vector form as

$$\dot{s} = -\beta\operatorname{diag}(s)Ax, \qquad \dot{x} = \beta\operatorname{diag}(s)Ax - \gamma x. \tag{17.6}$$

Theorem 17.6 (Dynamical behavior of the network SIR model). Consider the network SIR model (17.6) over an undirected graph G with infection rate β and recovery rate γ. Assume G is connected and let A be its adjacency matrix. Let (λ_max,0, v_max,0) be the dominant eigenpair for the nonnegative matrix A diag(s(0)). The following statements hold:

(i) t ↦ s(t) is monotonically decreasing and t ↦ r(t) is monotonically increasing;
(ii) the set of equilibrium points is the set of pairs (s*, 0_n), for any s* ∈ [0, 1]^n;
(iii) if βλ_max,0/γ < 1 and x(0) ≠ 0_n, then the weighted average t ↦ v_max,0^⊤ x(t) monotonically and exponentially decreases to zero and each trajectory satisfies x(t) → 0_n as t → ∞;
(iv) if βλ_max,0/γ > 1 and x(0) ≠ 0_n, then, for small times, the weighted average t ↦ v_max,0^⊤ x(t) grows exponentially fast with rate βλ_max,0 − γ, i.e., an epidemic outbreak will develop;
(v) each trajectory with initial condition (s(0), x(0)) with x(0) ≠ 0_n converges asymptotically to an equilibrium point, that is, the epidemic eventually disappears.

Proof of selected statements in Theorem 17.6. Regarding statement (ii), a point (s*, x*) is an equilibrium if

$$0_n = -\beta\operatorname{diag}(s^*)Ax^*, \qquad 0_n = \beta\operatorname{diag}(s^*)Ax^* - \gamma x^*.$$

It is easy to see that each point of the form (s*, 0_n) is an equilibrium. On the other hand, summing the two equalities we obtain 0_n = −γx*, hence x* must be the zero vector.

Regarding statement (iii), note that ṡ_i ≤ 0 for each i implies s(t) ≤ s(0) and, in turn, diag(s(t))v ≤ diag(s(0))v for any v ≥ 0_n. As previously, define y(t) = v_max,0^⊤ x(t) and compute

$$\dot{y}(t) = \beta v_{\max,0}^{\top}\operatorname{diag}(s(t))Ax(t) - \gamma v_{\max,0}^{\top}x(t) \le \beta v_{\max,0}^{\top}\operatorname{diag}(s(0))Ax(t) - \gamma v_{\max,0}^{\top}x(t) = (\beta\lambda_{\max,0} - \gamma)\,y(t),$$

where we used the equality A diag(s(0))v_max,0 = λ_max,0 v_max,0 and the symmetry of A. ∎
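A minimal numerical illustration of Theorem 17.6, with β, γ, and s(0) chosen (illustratively) so that βλ_max,0/γ > 1: the aggregate infection first grows and then vanishes, as predicted by statements (iv) and (v).

```python
# Minimal sketch: network SIR model (17.6) above the outbreak threshold;
# all numerical values are illustrative.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
s = np.full(3, 0.9)
x = np.full(3, 0.05)
beta, gamma, dt = 1.0, 1.0, 0.01

lam_max0 = np.max(np.real(np.linalg.eigvals(A @ np.diag(s))))
print(beta * lam_max0 / gamma)        # = 1.8 > 1: an outbreak develops

peak = x.sum()
for _ in range(3000):
    infect = beta * s * (A @ x)       # new infections, per node
    s, x = s - dt * infect, x + dt * (infect - gamma * x)
    peak = max(peak, x.sum())

print(peak, x.sum())   # peak well above the initial 0.15; final value near 0
```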
17.5 Exercises

E17.1 Network SI model in digraphs. Generalize Theorem 17.3 to the setting of strongly-connected directed graphs:
(i) what are the equilibrium points?
(ii) what are their convergence properties?

E17.2 Initial evolution of the network SIS model. Consider the network SIS model with initial fraction x(0) = εx₀, where we take x₀ ≤ 1_n and ε ≪ 1. Show that in the time scale t(ε) = ln(1/ε)/(βλ_max − γ), the linearized evolution satisfies

$$\lim_{\varepsilon\to 0^{+}} x\big(t(\varepsilon)\big) = \big(v_{\max}^{\top}x_0\big)\,v_{\max}.$$


Bibliography
J. A. Acebrón, L. L. Bonilla, C. J. P. Vicente, F. Ritort, and R. Spigler. The Kuramoto model: A simple paradigm for synchronization phenomena. Reviews of Modern Physics, 77(1):137–185, 2005.

D. Acemoglu and A. Ozdaglar. Opinion dynamics and learning in social networks. Dynamic Games and Applications, 1(1):3–49, 2011.

R. P. Agaev and P. Y. Chebotarev. The matrix of maximum out forests and its applications. Automation and Remote Control, 61(9):1424–1450, 2000.

B. D. O. Anderson, C. Yu, B. Fidan, and J. M. Hendrickx. Rigid graph control architectures for autonomous formations. IEEE Control Systems Magazine, 28(6):48–63, 2008.

A. Arenas, A. Díaz-Guilera, J. Kurths, Y. Moreno, and C. Zhou. Synchronization in complex networks. Physics Reports, 469(3):93–153, 2008.

L. Asimow and B. Roth. The rigidity of graphs, II. Journal of Mathematical Analysis and Applications, 68(1):171–190, 1979.

H. Bai, M. Arcak, and J. Wen. Cooperative Control Design, volume 89. Springer, 2011.

P. Barooah. Estimation and Control with Relative Measurements: Algorithms and Scaling Laws. PhD thesis, University of California at Santa Barbara, July 2007.

P. Barooah and J. P. Hespanha. Estimation from relative measurements: Algorithms and scaling laws. IEEE Control Systems Magazine, 27(4):57–74, 2007.

M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numerica, 14:1–137, 2005.

A. R. Bergen and D. J. Hill. A structure preserving model for power system stability analysis. IEEE Transactions on Power Apparatus and Systems, 100(1):25–35, 1981.

A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM, 1994.

D. S. Bernstein. Matrix Mathematics. Princeton University Press, 2nd edition, 2009.

N. Biggs. Algebraic Graph Theory. Cambridge University Press, 2nd edition, 1994. ISBN 0521458978.

V. D. Blondel and A. Olshevsky. How to decide consensus? A combinatorial necessary and sufficient condition and a proof that consensus is decidable but NP-hard. SIAM Journal on Control and Optimization, 52(5):2707–2726, 2014.

B. Bollobás. Modern Graph Theory. Springer, 1998. ISBN 0387984887.

S. Bolognani, S. Del Favero, L. Schenato, and D. Varagnolo. Consensus-based distributed sensor calibration and least-square parameter identification in WSNs. International Journal of Robust and Nonlinear Control, 20(2):176–193, 2010.

P. Bonacich. Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology, 2(1):113–120, 1972.

S. P. Borgatti and M. G. Everett. A graph-theoretic perspective on centrality. Social Networks, 28(4):466–484, 2006.

S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing Markov chain on a graph. SIAM Review, 46(4):667–689, 2004.

S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, 2006.

U. Brandes. Centrality: concepts and methods. Slides, May 2006. The International Workshop/School and Conference on Network Science.

U. Brandes and T. Erlebach. Network Analysis: Methodological Foundations. Springer, 2005.

L. Breiman. Probability, volume 7 of Classics in Applied Mathematics. SIAM, 1992. ISBN 0-89871-296-3. Corrected reprint of the 1968 original.

S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks, 30:107–117, 1998.

E. Brown, P. Holmes, and J. Moehlis. Globally coupled oscillator networks. In E. Kaplan, J. E. Marsden, and K. R. Sreenivasan, editors, Perspectives and Problems in Nonlinear Science: A Celebratory Volume in Honor of Larry Sirovich, pages 183–215. Springer, 2003.

A. M. Bruckstein, N. Cohen, and A. Efrat. Ants, crickets, and frogs in cyclic pursuit. Technical Report CIS 9105, Center for Intelligent Systems, Technion, Haifa, Israel, July 1991. Available at http://www.cs.technion.ac.il/tech-reports.

J. Buck. Synchronous rhythmic flashing of fireflies. II. Quarterly Review of Biology, 63(3):265–289, 1988.

F. Bullo, J. Cortés, and S. Martínez. Distributed Control of Robotic Networks. Princeton University Press, 2009. ISBN 978-0-691-14195-4.

R. Carli, F. Fagnani, A. Speranzon, and S. Zampieri. Communication constraints in the average consensus problem. Automatica, 44(3):671–684, 2008.

R. Carli, F. Garin, and S. Zampieri. Quadratic indices for the analysis of consensus algorithms. In Information Theory and Applications Workshop, pages 96–104, San Diego, CA, USA, Feb. 2009.

H. Caswell. Matrix Population Models. John Wiley & Sons, 2001.

N. D. Charkes, P. T. Makler Jr, and C. Philips. Studies of skeletal tracer kinetics. I. Digital-computer solution of a five-compartment model of [18F] fluoride kinetics in humans. Journal of Nuclear Medicine, 19(12):1301–1309, 1978.

A. Cherukuri and J. Cortés. Asymptotic stability of saddle points under the saddle-point dynamics. In American Control Conference, Chicago, IL, USA, July 2015. To appear.

S. M. Crook, G. B. Ermentrout, M. C. Vanier, and J. M. Bower. The role of axonal delay in the synchronization of networks of coupled cortical oscillators. Journal of Computational Neuroscience, 4(2):161–172, 1997.

H. Daido. Quasientrainment and slow relaxation in a population of oscillators with random and frustrated interactions. Physical Review Letters, 68(7):1073–1076, 1992.

P. J. Davis. Circulant Matrices. American Mathematical Society, 2nd edition, 1994. ISBN 0828403384.

T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software, 38(1):1–25, 2011.

M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

P. M. DeMarzo, D. Vayanos, and J. Zwiebel. Persuasion bias, social influence, and unidimensional opinions. The Quarterly Journal of Economics, 118(3):909–968, 2003.

R. Diestel. Graph Theory, volume 173 of Graduate Texts in Mathematics. Springer, 2nd edition, 2000.

F. Dörfler and F. Bullo. On the critical coupling for Kuramoto oscillators. SIAM Journal on Applied Dynamical Systems, 10(3):1070–1099, 2011.

F. Dörfler and F. Bullo. Exploring synchronization in complex oscillator networks, Sept. 2012. Extended version including proofs. Available at http://arxiv.org/abs/1209.1335.

F. Dörfler and F. Bullo. Synchronization in complex networks of phase oscillators: A survey. Automatica, 50(6):1539–1564, 2014.

F. Dörfler and B. Francis. Geometric analysis of the formation problem for autonomous robots. IEEE Transactions on Automatic Control, 55(10):2379–2384, 2010.

G. Droge, H. Kawashima, and M. Egerstedt. Proportional-integral distributed optimization for networked systems. arXiv preprint arXiv:1309.6613, 2013.

C. L. DuBois. UCI Network Data Repository, 2008. URL http://networkdata.ics.uci.edu.

F. Fagnani and S. Zampieri. Randomized consensus algorithms over large scale networks. IEEE Journal
on Selected Areas in Communications, 26(4):634649, 2008. 143, 145, 146
A. Fall, A. Iggidr, G. Sallet, and J.-J. Tewa. Epidemiological models and Lyapunov functions. Mathematical
Modelling of Natural Phenomena, 2(1):6268, 2007. 207
L. Farina and S. Rinaldi. Positive Linear Systems: Theory and Applications. John Wiley & Sons, 2000. 107
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(2):298305, 1973.
73
D. M. Foster and J. A. Jacquez. Multiple zeros for eigenvalues and the multiplicity of traps of a linear
compartmental system. Mathematical Biosciences, 26(1):8997, 1975. 74
L. R. Foulds. Graph Theory Applications. Universitext. Springer, 1995. ISBN 0387975993. 95
P. Frasca. Quick convergence proof for gossip consensus. Personal communication, 2012. 143, 145
J. R. P. French. A formal theory of social power. Psychological Review, 63(3):181194, 1956. 4
N. E. Friedkin and E. C. Johnsen. Social influence networks and opinion change. In E. J. Lawler and
M. W. Macy, editors, Advances in Group Processes, volume 16, pages 129. JAI Press, 1999. 66
P. A. Fuhrmann and U. Helmke. The Mathematics of Networks of Linear Systems. Springer, 2015. ISBN
3319166468. 3
C. Gao, J. Corts, and F. Bullo. Notes on averaging over acyclic digraphs and discrete coverage control.
Automatica, 44(8):21202127, 2008. 78
F. Garin and L. Schenato. A survey on distributed estimation and control applications using linear
consensus algorithms. In A. Bemporad, M. Heemels, and M. Johansson, editors, Networked Control
Systems, LNCIS, pages 75107. Springer, 2010. 3, 9, 143
B. Gharesifard and J. Cortes. Distributed continuous-time convex optimization on weight-balanced
digraphs. IEEE Transactions on Automatic Control, 59(3):781786, 2014. 89, 91
A. K. Ghosh, B. Chance, and E. K. Pye. Metabolic coupling and synchronization of NADH oscillations in yeast cell populations. Archives of Biochemistry and Biophysics, 145(1):319–331, 1971. 171
D. F. Gleich. PageRank beyond the Web. SIAM Review, 57(3):321–363, 2015. 60
C. D. Godsil and G. F. Royle. Algebraic Graph Theory, volume 207 of Graduate Texts in Mathematics.
Springer, 2001. ISBN 0387952411. 45, 69, 95
M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1.
http://cvxr.com/cvx, Oct. 2014. 127
W. M. Haddad, V. Chellaboina, and Q. Hui. Nonnegative and Compartmental Dynamical Systems. Princeton University Press, 2010. 107
F. Harary. A criterion for unanimity in French's theory of social power. In D. Cartwright, editor, Studies in Social Power, pages 168–182. University of Michigan, 1959. 4, 24
J. M. Hendrickx. Graphs and Networks for the Analysis of Autonomous Agent Systems. PhD thesis, Université Catholique de Louvain, Belgium, Feb. 2008. 3, 129, 133, 136, 137
J. M. Hendrickx and J. N. Tsitsiklis. Convergence of type-symmetric and cut-balanced consensus seeking systems. IEEE Transactions on Automatic Control, 58(1):214–218, 2013. 139
J. P. Hespanha. Linear Systems Theory. Princeton University Press, 2009. ISBN 0691140219. 88, 137
H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42(4):599–653, 2000. 107, 199
F. C. Hoppensteadt and E. M. Izhikevich. Synchronization of laser oscillators, associative memory, and optical neurocomputing. Physical Review E, 62(3):4010–4013, 2000. 171
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985. ISBN 0521386322.
15, 16
Y. Hu. Efficient, high-quality force-directed graph drawing. Mathematica Journal, 10(1):37–71, 2005. 42
C. Huygens. Horologium Oscillatorium. Paris, France, 1673. 171
H. Ishii and R. Tempo. The PageRank problem, multiagent consensus, and web aggregation: A systems and control viewpoint. IEEE Control Systems Magazine, 34(3):34–53, 2014. 62, 63
M. O. Jackson. Social and Economic Networks. Princeton University Press, 2010. 53
J. A. Jacquez and C. P. Simon. Qualitative theory of compartmental systems. SIAM Review, 35(1):43–79, 1993. 107
A. Jadbabaie, J. Lin, and A. S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control, 48(6):988–1001, 2003. 141
G. Jongen, J. Anemüller, D. Bollé, A. C. C. Coolen, and C. Pérez-Vicente. Coupled dynamics of fast spins and slow exchange interactions in the XY spin glass. Journal of Physics A: Mathematical and General, 34(19):3957–3984, 2001. 171
L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953. 61
H. K. Khalil. Nonlinear Systems. Prentice Hall, 3 edition, 2002. ISBN 0130673897. 154, 156
A. Khanafer, T. Başar, and B. Gharesifard. Stability of epidemic models over directed graphs: A positive systems approach. Automatica, 2015. To appear. 199, 207
G. Kirchhoff. Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Verteilung galvanischer Ströme geführt wird. Annalen der Physik und Chemie, 148(12):497–508, 1847. 72
I. Z. Kiss, Y. Zhai, and J. L. Hudson. Emerging coherence in a population of chemical oscillators. Science, 296(5573):1676–1678, 2002. 171
M. S. Klamkin and D. J. Newman. Cyclic pursuit or "the three bugs problem". American Mathematical Monthly, 78(6):631–639, 1971. 7
D. J. Klein, P. Lee, K. A. Morgansen, and T. Javidi. Integration of communication and control using discrete time Kuramoto models for multivehicle coordination over broadcast networks. IEEE Journal on Selected Areas in Communications, 26(4):695–705, 2008. 171
G. Kozyreff, A. G. Vladimirov, and P. Mandel. Global coupling with time delay in an array of semiconductor lasers. Physical Review Letters, 85(18):3809–3812, 2000. 171
D. Krackhardt. Cognitive social structures. Social Networks, 9(2):109–134, 1987. 55
L. Krick, M. E. Broucke, and B. Francis. Stabilization of infinitesimally rigid formations of multi-robot networks. International Journal of Control, 82(3):423–439, 2009. 151
J. Kunegis. KONECT: the Koblenz network collection. In International Conference on World Wide Web Companion, pages 1343–1350, 2013. 41
Y. Kuramoto. Self-entrainment of a population of coupled non-linear oscillators. In H. Araki, editor, Int. Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics, pages 420–422. Springer, 1975. ISBN 978-3-540-07174-7. 171, 176
Y. Kuramoto. Chemical Oscillations, Waves, and Turbulence. Springer, 1984. ISBN 0387133224. 171
A. Lajmanovich and J. A. Yorke. A deterministic model for gonorrhea in a nonhomogeneous population. Mathematical Biosciences, 28(3):221–236, 1976. 199, 207
P. H. Leslie. On the use of matrices in certain population mathematics. Biometrika, 33(3):183–212, 1945. 51
Z. Lin, B. Francis, and M. Maggiore. Necessary and sufficient graphical conditions for formation control of unicycles. IEEE Transactions on Automatic Control, 50(1):121–127, 2005. 43
Z. Lin, B. Francis, and M. Maggiore. State agreement for continuous-time coupled nonlinear systems. SIAM Journal on Control and Optimization, 46(1):288–307, 2007. 139, 186
C. Liu, D. R. Weaver, S. H. Strogatz, and S. M. Reppert. Cellular construction of a circadian clock: period determination in the suprachiasmatic nuclei. Cell, 91(6):855–860, 1997. 171
S. Łojasiewicz. Sur les trajectoires du gradient d'une fonction analytique. Seminari di Geometria 1982-1983, pages 115–117, 1984. Istituto di Geometria, Dipartimento di Matematica, Università di Bologna, Italy. 158
D. G. Luenberger. Introduction to Dynamic Systems: Theory, Models, and Applications. John Wiley & Sons,
1979. 31, 107
D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 2 edition, 1984. 81
J. A. Marshall, M. E. Broucke, and B. A. Francis. Formations of vehicles in cyclic pursuit. IEEE Transactions on Automatic Control, 49(11):1963–1974, 2004. 7
A. Mauroy, P. Sacré, and R. J. Sepulchre. Kick synchronization versus diffusive synchronization. In IEEE Conf. on Decision and Control, pages 7171–7183, Maui, HI, USA, Dec. 2012. 171
W. Mei and F. Bullo. Modeling and analysis of competitive propagation with social conversion. In IEEE Conf. on Decision and Control, pages 6203–6208, Los Angeles, CA, USA, Dec. 2014. 199
R. Merris. Laplacian matrices of a graph: A survey. Linear Algebra and its Applications, 197:143–176, 1994. 69
M. Mesbahi and M. Egerstedt. Graph Theoretic Methods in Multiagent Networks. Princeton University
Press, 2010. 3, 83, 84, 156
C. D. Meyer. Matrix Analysis and Applied Linear Algebra. SIAM, 2001. ISBN 0898714540. 26, 29
D. C. Michaels, E. P. Matyas, and J. Jalife. Mechanisms of sinoatrial pacemaker synchronization: a new hypothesis. Circulation Research, 61(5):704–714, 1987. 171
P. Van Mieghem. The N-intertwined SIS epidemic network model. Computing, 93(2-4):147–169, 2011. 199
P. Van Mieghem, J. Omic, and R. Kooij. Virus spread in networks. IEEE/ACM Transactions on Networking, 17(1):1–14, 2009. 199
B. Mohar. The Laplacian spectrum of graphs. In Y. Alavi, G. Chartrand, O. R. Oellermann, and A. J. Schwenk, editors, Graph Theory, Combinatorics, and Applications, volume 2, pages 871–898. John Wiley & Sons, 1991. ISBN 0471532452. 69
L. Moreau. Stability of continuous-time distributed consensus algorithms. In IEEE Conf. on Decision and Control, pages 3998–4003, Nassau, Bahamas, 2004. 138, 139
L. Moreau. Stability of multiagent systems with time-dependent communication links. IEEE Transactions on Automatic Control, 50(2):169–182, 2005. 43, 132, 137
Z. Néda, E. Ravasz, T. Vicsek, Y. Brechet, and A.-L. Barabási. Physics of the rhythmic applause. Physical Review E, 61(6):6987–6992, 2000. 171
M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010. ISBN 0199206651. 60, 199
I. Noy-Meir. Desert ecosystems: environment and producers. Annual Review of Ecology and Systematics, pages 25–51, 1973. 6, 107
K.-K. Oh, M.-C. Park, and H.-S. Ahn. A survey of multi-agent formation control: Position-, displacement-, and distance-based approaches. Technical Report GIST DCASL TR 2012-02, Gwangju Institute of Science and Technology, Korea, 2012. 167
K.-K. Oh, M.-C. Park, and H.-S. Ahn. A survey of multi-agent formation control. Automatica, 53:424–440, 2015. 151
R. Olfati-Saber. Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Transactions on Automatic Control, 51(3):401–420, 2006. 151
R. Olfati-Saber, E. Franco, E. Frazzoli, and J. S. Shamma. Belief consensus and distributed hypothesis testing in sensor networks. In P. J. Antsaklis and P. Tabuada, editors, Network Embedded Sensing and Control (Proceedings of NESC'05 Workshop), Lecture Notes in Control and Information Sciences, pages 169–182. Springer, 2006. ISBN 3540327940. 10
A. Olshevsky and J. N. Tsitsiklis. On the nonexistence of quadratic Lyapunov functions for consensus algorithms. IEEE Transactions on Automatic Control, 53(11):2642–2645, 2008. 132
R. W. Owens. An algorithm to solve the Frobenius problem. Mathematics Magazine, 76(4):264–275, 2003. 51
L. Page. Method for node ranking in a linked database, Sept. 2001. US Patent 6,285,999. 63
D. A. Paley, N. E. Leonard, R. Sepulchre, D. Grunbaum, and J. K. Parrish. Oscillator models and collective motion. IEEE Control Systems Magazine, 27(4):89–105, 2007. 171, 175
J. Pantaleone. Stability of incoherence in an isotropic gas of oscillating neutrinos. Physical Review D, 58(7):073002, 1998. 171
G. Piovan, I. Shames, B. Fidan, F. Bullo, and B. D. O. Anderson. On frame and orientation localization for relative sensing networks. Automatica, 49(1):206–213, 2013. 95
H. V. Poor. An Introduction to Signal Detection and Estimation. Springer, 1994. 10
V. Rakočević. On continuity of the Moore-Penrose and Drazin inverses. Matematički Vesnik, 49(3-4):163–172, 1997. 188
B. S. Y. Rao and H. F. Durrant-Whyte. A decentralized Bayesian algorithm for identification of tracked targets. IEEE Transactions on Systems, Man & Cybernetics, 23(6):1683–1698, 1993. 10
W. Ren, R. W. Beard, and E. M. Atkins. Information consensus in multivehicle cooperative control: Collective group behavior through local interaction. IEEE Control Systems Magazine, 27(2):71–82, 2007. 83, 84, 88
R. Sepulchre, D. A. Paley, and N. E. Leonard. Stabilization of planar collective motion: All-to-all communication. IEEE Transactions on Automatic Control, 52(5):811–824, 2007. 171
P. N. Shivakumar, J. J. Williams, Q. Ye, and C. A. Marinov. On two-sided bounds related to weakly diagonally dominant M-matrices with application to digital circuit dynamics. SIAM Journal on Matrix Analysis and Applications, 17(2):298–312, 1996. 111
O. Simeone, U. Spagnolini, Y. Bar-Ness, and S. H. Strogatz. Distributed synchronization in wireless networks. IEEE Signal Processing Magazine, 25(5):81–97, 2008. 171
J. W. Simpson-Porco, F. Dörfler, and F. Bullo. Droop-controlled inverters are Kuramoto oscillators. In IFAC Workshop on Distributed Estimation and Control in Networked Systems, pages 264–269, Santa Barbara, CA, USA, Sept. 2012. 171
S. L. Smith, M. E. Broucke, and B. A. Francis. A hierarchical cyclic pursuit scheme for vehicle networks. Automatica, 41(6):1045–1053, 2005. 7
E. H. Spanier. Algebraic Topology. Springer, 1994. 188
S. H. Strogatz. From Kuramoto to Crawford: Exploring the onset of synchronization in populations of coupled oscillators. Physica D: Nonlinear Phenomena, 143(1):1–20, 2000. 171
A. Tahbaz-Salehi and A. Jadbabaie. A necessary and sufficient condition for consensus over random networks. IEEE Transactions on Automatic Control, 53(3):791–795, 2008. 143, 145
Y. Takeuchi. Global Dynamical Properties of Lotka-Volterra Systems. World Scientific Publishing, 1996.
ISBN 9810224710. 168
H. G. Tanner, A. Jadbabaie, and G. J. Pappas. Flocking in fixed and switching networks. IEEE Transactions on Automatic Control, 52(5):863–868, 2007. 151
P. A. Tass. A model of desynchronizing deep brain stimulation with a demand-controlled coordinated reset of neural subpopulations. Biological Cybernetics, 89(2):81–88, 2003. 171
F. Varela, J. P. Lachaux, E. Rodriguez, and J. Martinerie. The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience, 2(4):229–239, 2001. 171
T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, and O. Shochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75(6-7):1226–1229, 1995. 174
T. J. Walker. Acoustic synchrony: two mechanisms in the snowy tree cricket. Science, 166(3907):891–894, 1969. 171
G. G. Walter and M. Contreras. Compartmental Modeling with Networks. Birkhäuser, 1999. 107
J. Wang and N. Elia. Control approach to distributed optimization. In Allerton Conf. on Communications, Control and Computing, pages 557–561, Monticello, IL, USA, 2010. 89, 91
Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos. Epidemic spreading in real networks: An eigenvalue viewpoint. In IEEE Int. Symposium on Reliable Distributed Systems, pages 25–34, Oct. 2003. 199
A. Watton and D. W. Kydon. Analytical aspects of the N-bug problem. American Journal of Physics, 37(2):220–221, 1969. 7
A. T. Winfree. Biological rhythms and the behavior of populations of coupled oscillators. Journal of Theoretical Biology, 16(1):15–42, 1967. 171
J. Wolfowitz. Products of indecomposable, aperiodic, stochastic matrices. Proceedings of the American Mathematical Society, 14(5):733–737, 1963. 55
Lectures on Network Systems, F. Bullo
Version v0.81 (4 Jan 2016). Draft not for circulation. Copyright 2012-16.

220

Bibliography

W. Xia and M. Cao. Sarymsakov matrices and asynchronous implementation of distributed coordination algorithms. IEEE Transactions on Automatic Control, 59(8):2228–2233, 2014. 132
L. Xiao and S. Boyd. Fast linear iterations for distributed averaging. Systems & Control Letters, 53:65–78, 2004. 119, 127
L. Xiao, S. Boyd, and S. Lall. A scheme for robust distributed sensor fusion based on average consensus. In Symposium on Information Processing of Sensor Networks, pages 63–70, Los Angeles, CA, USA, Apr. 2005. 9
R. A. York and R. C. Compton. Quasi-optical power combining using mutually synchronized oscillator arrays. IEEE Transactions on Microwave Theory and Techniques, 39(6):1000–1009, 1991. 171
S. Zampieri. Lecture Notes on Dynamics over Networks. Minicourse at UC Santa Barbara, Apr. 2013.
199
D. Zelazo. Graph-Theoretic Methods for the Analysis and Synthesis of Networked Dynamic Systems. PhD
thesis, University of Washington, 2009. 101
D. Zelazo and M. Mesbahi. Edge agreement: Graph-theoretic performance bounds and passivity analysis. IEEE Transactions on Automatic Control, 56(3):544–555, 2011. 105