
Coexistence in preferential attachment networks

Tonći Antunović

Elchanan Mossel

Miklós Z. Rácz

July 11, 2013
Abstract
Competition in markets is ubiquitous: cell-phone providers, computer manufacturers, and sport gear
brands all vie for customers. Though several coexisting competitors are often observed in empirical
data, many current theoretical models of competition on small-world networks predict a single winner
taking over the majority of the network. We introduce a new model of product adoption that focuses
on word-of-mouth recommendations to provide an explanation for this coexistence of competitors. The
key property of our model is that customer choices evolve simultaneously with the network of customers.
When a new node joins the network, it chooses neighbors according to preferential attachment, and then
chooses its type based on the number of initial neighbors of each type. This can model a new cell-phone
user choosing a cell-phone provider, a new student choosing a laptop, or a new athletic team member
choosing a gear provider. We provide a detailed analysis of the new model; in particular, we determine
the possible limiting proportions of the various types. The main qualitative feature of our model is that,
unlike other current theoretical models, often several competitors will coexist, which matches empirical
observations in many current markets.
1 Introduction
A major challenge in understanding complex networks is the interplay between the evolution of the network
and the dynamical features of processes on the network. Almost all networks we know evolve dynamically:
the citation graph grows every day with new papers being published, friendships are created and broken every
minute, webpages and links between them are born and destroyed every second, and actin laments of the
cytoskeleton assemble and disassemble every millisecond to facilitate cell motion. The changes in network
structure are closely related to changes in the features or content of individual nodes, and the processes
on these nodes. For example, the content of a Facebook page is correlated with the friendship dynamics,
the changing content of webpages influences the creation and destruction of links, and the connectivity of
neurons is influenced by their utilization.
The network structure of many complex networks is well understood since the work of Barabási and
Albert [16], who showed that the network topology arising in these real-world networks is a consequence of
two generic mechanisms: growth and preferential attachment. Subsequently many studies have underlined
the universality of this network topology, confirming its relevance. However, to understand the behavior of
complex systems, it is not enough to understand the underlying network structure. To quote Barabási [15]:
"To make progress in this direction, we need to tackle the next frontier, which is to understand the dynamics
of the processes that take place on networks."
Indeed, we argue that the only way to truly understand dynamical processes on networks is to consider
them together with the network evolution dynamics. In the past decade there have been many studies
on processes on networks [17], e.g., epidemic spreading [45], evolutionary games [44], and information cascades [54].
However, all of these considered the network as fixed, and then studied the process of interest
on this static graph. This static viewpoint hides the fact that the networks and the processes on them coevolve.

University of California, Los Angeles; tantunovic@math.ucla.edu.
University of California, Berkeley and Weizmann Institute of Science; mossel@stat.berkeley.edu; supported by NSF grant DMS 1106999 and by DOD ONR grant N000141110140.
University of California, Berkeley; racz@stat.berkeley.edu; supported by a UC Berkeley Graduate Fellowship, by NSF grant DMS 1106999 and by DOD ONR grant N000141110140.
Although the study of such coevolution was initiated over a decade ago [53], only recently is it
starting to be explored in greater depth (see [29, 33] and references therein), and thus many questions still
remain. In particular, in the context of product adoption on networks, there is yet no clear explanation of
the phenomenon of coexistence of competing products.
Our main contribution is to identify a simple model which couples the growth of a network and node
feature dynamics; in particular, we focus on type adoption dynamics, where each node has a single type from
a finite set of types. When a new node joins the network, both its connections to the existing nodes and
its type are influenced by the current structure of the network. As a particular instance of such a general
model, we consider the dynamics where the new node chooses its connections according to linear preferential
attachment [16], and then chooses its type based on how many of its neighbors are of a certain type; see
Figure 1 for an illustration.
Figure 1: Illustration of our model. Each node in the initial graph has a type/color from a finite set
of types/colors. At each time step a new node is added to the graph and connected to m existing nodes
according to linear preferential attachment (here m = 5). When the new node joins the graph it also adopts a
type/color: it picks its type/color according to a probability distribution which depends on the types/colors
of its initial neighbors. See Section 1.1 for details.
Our model is of interest in many cases where preferential attachment is a good representation of the
evolution of the network structure and where competition between types is a natural process. In particular,
these include models of product adoption via word-of-mouth recommendations on social networks, such as
a new cell-phone user choosing a cell-phone provider/package/device based on her friends' decisions, a new
student choosing a laptop, or a new athletic team member choosing a gear provider.
A key feature of our model is the elegance and simplicity of its analysis. We explicitly calculate the
possible limiting ratios of the types. An interesting feature of our results is that for many settings of the
parameters of the model, none of the types dominate (see Figure 4), which matches empirical observations
in many current markets. Our results thus provide a theoretical understanding of coexistence of types in
preferential attachment networks. They should be compared to results on other models of competition on
scale-free networks where coexistence is rarely achieved, and typically the winner takes all [50, 23].
We next describe our model and our results in more detail, followed by a discussion of related work.
1.1 Model
For simplicity, we describe our model in the case of two types, which we refer to as red and blue colors.
In the following, we use the terms "type" and "color" interchangeably. Our model naturally generalizes to
any number of types; see Section 3 for a description and results. The main feature of the model is that it
incorporates and couples two processes: a network growing process and a type adoption process.
We consider the standard linear preferential attachment model [16] as the network growing process in our model. Starting from an initial graph $G_0$, at each time step an additional node $v$ is added to the graph, together with $m$ edges connecting $v$ to existing nodes in the graph. Each edge is chosen independently, and according to linear preferential attachment, i.e., the probability that a given edge connects $v$ to a given existing node $u$ is proportional to the degree of $u$.
The type adoption process on the network is as follows. All nodes in the initial graph $G_0$ start with a type, i.e., they are either red or blue. Each additional node $v$ receives a color when it is added to the graph, and this color depends on the colors of the nodes it connects to when it is added. Suppose that out of the $m$ edges connecting the new node $v$ to existing nodes exactly $k$ connect to a red node. Then, conditioned on this event, $v$ becomes red with probability $p_k$ and blue with probability $1 - p_k$. The probabilities $p_k \in [0,1]$, $0 \le k \le m$, are parameters of the model. See Figure 1 for an illustration.
The parameters $\{p_k\}_{0 \le k \le m}$ allow us to model different kinds of behavior. A natural choice is the linear model, when $p_k = k/m$ for all $k$. However, nonlinear models, when $p_k \neq k/m$ for some $k$, can capture a wide range of other types of behavior. In particular, they can capture diminishing and increasing returns, and even more complex behavior that combines these.
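To make the dynamics concrete, here is a minimal simulation sketch in Python; it is our own illustration rather than code from the paper, and the function and variable names are ours. The degree sequence is stored as a multiset of node labels, so that sampling a uniform element implements linear preferential attachment, and each of the $m$ edges is drawn independently, as in the model description.

    import random

    def simulate(m, p, colors0, degrees0, n_steps):
        # p[k] = probability that the new node becomes red, given that k of its
        # m initial neighbors (counted with edge multiplicity) are red.
        colors = list(colors0)                     # colors[v] in {'R', 'B'}
        degree_units = []                          # one entry per unit of degree
        for v, d in enumerate(degrees0):
            degree_units.extend([v] * d)
        for _ in range(n_steps):
            new = len(colors)
            neighbors = [random.choice(degree_units) for _ in range(m)]  # m independent edges
            k = sum(1 for u in neighbors if colors[u] == 'R')
            colors.append('R' if random.random() < p[k] else 'B')
            degree_units.extend(neighbors)         # each endpoint gains one unit of degree
            degree_units.extend([new] * m)         # the new node enters with degree m
        return sum(1 for c in colors if c == 'R') / len(colors)          # a_n

    # Example: the linear model with m = 5, started from one red and one blue node
    # joined by a single edge (A_0 = B_0 = 1, X_0 = Y_0 = 1).
    m = 5
    print(simulate(m, [k / m for k in range(m + 1)], ['R', 'B'], [1, 1], 10**4))

Repeated runs of the linear model in this sketch reproduce the spread of limiting fractions visible in the histograms of Figure 2.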
1.2 Results
We are interested in the fraction of nodes of each type; this corresponds to the fraction of users using a given company's product, or in other words, the company's market share. Our main results characterize the possible limiting fractions of each color in the case of two colors. These results thus provide a complete phase diagram of the asymptotic behavior of the process as the size of the network goes to infinity; see Figure 4
for an illustration.
To describe our results we introduce some notation. Let $G_n$ denote the graph when $n$ nodes have been added to the initial graph $G_0$. Let $A_n$ and $B_n$, resp., denote the number of red and blue nodes, resp., in $G_n$, and let $a_n := \frac{A_n}{A_n + B_n}$ and $b_n := \frac{B_n}{A_n + B_n}$ denote the corresponding normalized fractions. Furthermore, let $X_n$ (resp., $Y_n$) denote the sum of the degrees of red (resp., blue) nodes in $G_n$, and let $x_n := \frac{X_n}{X_n + Y_n}$ and $y_n := \frac{Y_n}{X_n + Y_n}$ denote the normalized fractions. We are primarily interested in the asymptotic proportion of red and blue nodes, i.e., in the limits $\lim_{n \to \infty} a_n$ and $\lim_{n \to \infty} b_n = 1 - \lim_{n \to \infty} a_n$.
As we shall see, a key role in the asymptotic behavior of the process is played by the polynomial
$$P(z) = \frac{1}{2} \sum_{k=0}^{m} \binom{m}{k} z^k (1-z)^{m-k} \left( p_k - \frac{k}{m} \right), \tag{1}$$
and in particular its zero set, denoted by $Z_P := \{ z \in [0,1] : P(z) = 0 \}$. This is because, as we will see, $\{a_n\}_{n \ge 0}$ behaves approximately like a stochastic version of the ODE $dz/dt = P(z)$, and thus intuitively the trajectory of $\{a_n\}_{n \ge 0}$ should approximate the trajectory $\{z(t)\}_{t \ge 0}$ of this ODE.
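As a numerical illustration (ours, not part of the paper), the zero set $Z_P$ and the stability of each zero can be located by scanning for sign changes of $P$; the parameter values below are chosen to resemble the example of Figure 3(b), with the remaining $p_k$ filled in via the no-bias condition $p_k + p_{m-k} = 1$.

    from math import comb

    def P(z, p):
        m = len(p) - 1
        return 0.5 * sum(comb(m, k) * z**k * (1 - z)**(m - k) * (p[k] - k / m)
                         for k in range(m + 1))

    def zeros_and_stability(p, grid=10**4):
        # A zero where P changes sign from + to - is stable; from - to + is unstable.
        pts = [i / grid for i in range(grid + 1)]
        found = []
        for a, b in zip(pts, pts[1:]):
            if P(a, p) * P(b, p) < 0:
                found.append((round((a + b) / 2, 4),
                              'stable' if P(a, p) > 0 else 'unstable'))
        return found

    # m = 9, p_k = 0.05 for k <= 2, 0.5 for 3 <= k <= 6, 0.95 for k >= 7.
    print(zeros_and_stability([0.05] * 3 + [0.5] * 4 + [0.95] * 3))

By Theorems 1.3 and 1.4 below, zeros where $P$ changes sign from positive to negative are possible limits of $a_n$, while the other sign-change zeros are not; touchpoints (Theorem 1.5), which a crude sign-change scan does not detect, can also be limits.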
The following two theorems confirm this intuition. There is an important distinction between the linear model and nonlinear models, which is due to the fact that in the linear model the polynomial $P$ is identically zero and thus $Z_P = [0,1]$, while in nonlinear models the zero set $Z_P$ is a finite set.
Theorem 1.1 (Linear model). Suppose that $p_k = k/m$ for all $0 \le k \le m$, and that $X_0, Y_0 > 0$. Then $a_n$ converges almost surely; furthermore, the limiting distribution has full support on $[0,1]$ and no atoms, and depends only on $X_0$, $Y_0$, and $m$.
See Figure 2 for empirical histograms in the linear model with various initial parameters and various
values of m.
Theorem 1.2 (Nonlinear models). Suppose that $p_k \neq k/m$ for some $0 \le k \le m$, and that $X_0, Y_0 > 0$. Then $a_n$ converges almost surely; furthermore, the limit is a point in the finite set $Z_P$.
In nonlinear models we thus know that the asymptotic proportion of red nodes is contained in the finite zero set $Z_P$. But which points $z \in Z_P$ arise as the limiting proportion with positive probability? This depends on the behavior of the polynomial around the zero $z \in Z_P$. Intuitively, since $\{a_n\}_{n \ge 0}$ is a stochastic system, we expect that stable trajectories of the ODE $dz/dt = P(z)$ should appear, but unstable trajectories should not. This intuition is confirmed and formalized in the following three theorems.
Theorem 1.3 (Nonlinear models, stable equilibria). Suppose that $p_k \neq k/m$ for some $0 \le k \le m$, and that $X_0, Y_0 > 0$. Suppose $z \in Z_P \cap (0,1)$ is such that there exists an $\varepsilon > 0$ such that $P > 0$ on $(z - \varepsilon, z)$ and $P < 0$ on $(z, z + \varepsilon)$. Then $\mathbb{P}(\lim_{n \to \infty} a_n = z) > 0$, i.e., $a_n$ converges to $z$ with positive probability. Similarly, if $0 \in Z_P$ and $P < 0$ on $(0, \varepsilon)$, or if $1 \in Z_P$ and $P > 0$ on $(1 - \varepsilon, 1)$, then there is a positive probability of convergence of $a_n$ to 0 or 1, respectively.
(a) $A_0 = B_0 = 1$, $X_0 = Y_0 = 1$. (b) $A_0 = B_0 = 2$, $X_0 = Y_0 = 4$. (c) $A_0 = B_0 = 3$, $X_0 = Y_0 = 9$. (d) $A_0 = 1$, $B_0 = 2$, $X_0 = 1$, $Y_0 = 3$. (e) $A_0 = 1$, $B_0 = 4$, $X_0 = 1$, $Y_0 = 11$. (f) $A_0 = 2$, $B_0 = 3$, $X_0 = 4$, $Y_0 = 8$.
Figure 2: Empirical histograms of $a_n$ in the linear model for $n = 10^5$, from $2 \times 10^5$ simulations. Each subfigure has different initial parameters (see subcaptions), and in each case empirical histograms for ten different values of $m$ are plotted. See Fig. 2e for the key to all plots.
Theorem 1.4 (Nonlinear models, unstable equilibria). Suppose that $p_k \neq k/m$ for some $0 \le k \le m$, and that $X_0, Y_0 > 0$. Suppose $z \in Z_P \cap (0,1)$ is such that there exists an $\varepsilon > 0$ such that $P < 0$ on $(z - \varepsilon, z)$ and $P > 0$ on $(z, z + \varepsilon)$. Then $\mathbb{P}(\lim_{n \to \infty} a_n = z) = 0$. Similarly, if $0 \in Z_P$ and $P > 0$ on $(0, \varepsilon)$, or if $1 \in Z_P$ and $P < 0$ on $(1 - \varepsilon, 1)$, then the probability of convergence of $a_n$ to 0 or 1, respectively, is zero.
Theorem 1.5 (Nonlinear models, touchpoints). Suppose that $p_k \neq k/m$ for some $0 \le k \le m$, and that $X_0, Y_0 > 0$. Suppose $z \in Z_P \cap (0,1)$ is such that there exists an $\varepsilon > 0$ such that $P$ is either strictly positive or strictly negative on the union of the intervals $(z - \varepsilon, z)$ and $(z, z + \varepsilon)$. Then $\mathbb{P}(\lim_{n \to \infty} a_n = z) > 0$.
See Figure 3 for an illustration of the polynomial $P$ for various values of the parameters $\{p_k\}_{0 \le k \le m}$, and what the various limiting proportions can be in each case.
The theorems above provide a complete phase diagram of the asymptotic behavior of the process in the case of two types. To illustrate this, see Figure 4, which shows phase diagrams for $m = 3$ and $m = 4$ when there is no bias towards either color, i.e., when $p_k + p_{m-k} = 1$ for all $0 \le k \le m$. This condition implies that $P(z) = -P(1-z)$ and so $1/2 \in Z_P$, but $1/2$ need not be a limit point (see Fig. 3).
Coexistence. In particular, the theorems above show that in many cases the two colors coexist in the limit. Indeed, since $P(0) = \frac{1}{2} p_0$ and $P(1) = \frac{1}{2}(p_m - 1)$, $p_0 = 0$ or $p_m = 1$ is necessary for one of the colors to asymptotically take over the network. Whenever $p_0 > 0$ and $p_m < 1$ the two colors coexist in the limit, and thus our model provides a theoretical understanding of coexistence in preferential attachment networks.
A natural extension of the model is to consider more than two colors. For clarity of presentation, we
postpone the discussion of this until later: see Section 3 for a description of the model with many colors and
the corresponding results and conjectures.
1.3 Related work
Competition and coexistence are phenomena which arise in many different scientific disciplines, such as
marketing, epidemiology, and economics. We now briefly discuss related work in these areas.
Figure 3: Examples of the polynomial $P$ and possible limiting proportions. In each case there is no bias towards either color, i.e., $p_k + p_{m-k} = 1$ for all $0 \le k \le m$. (a) Majority choice: $p_k = 1$ if $k > m/2$ and $p_k = 0$ otherwise ($m = 5$ in the figure). The possible limits are 0 and 1, i.e., the winner takes all. (b) The parameters here are: $m = 9$, $p_5 = p_6 = 0.5$, $p_7 = p_8 = p_9 = 0.95$. Such an example is plausible if the strength of the signal from the neighbors matters: if $3 \le k \le 6$, then the signal towards either color is weak, so just flip a fair coin to choose, but if $0 \le k \le 2$ or $7 \le k \le 9$ then there is a strong signal towards one of the colors, so pick that color with probability close to 1. In this example the possible limits are: $z_1 \approx 0.055$, $z_2 = 0.5$, $z_3 \approx 0.945$, and there are also two zeros of $P$ which cannot be limits. (c) This is an example where $P$ has touchpoints. The parameters are: $m = 6$, $p_4 = 1031/1710$, $p_5 = p_6 = 35/38$, and the two touchpoints are at $z_1 = 1/4$ and $z_2 = 3/4$. Both of these, as well as $z_3 = 1/2$, can be limits.
(a) $m = 3$. (b) $m = 4$.
Figure 4: Phase diagrams when there is no bias towards either color/type, i.e., when $p_k + p_{m-k} = 1$ for all $0 \le k \le m$. Let $q_k := p_k - k/m$. (a) If $q_2 < 0$, or if $q_2 + q_3 \le 0$ and $q_3 < 0$, then $\lim_{n \to \infty} a_n = 1/2$, i.e., in this case the network is split evenly among the two types in the limit. The linear model is the case of $q_2 = q_3 = 0$. Finally, if $q_2 + q_3 > 0$ and $q_2 > 0$, then let $\beta = -\frac{q_3}{q_2} \in [0,1)$; the possible limits of $a_n$ are then $\frac{1}{2} \pm \frac{1}{2}\sqrt{\frac{3 - 3\beta}{3 + \beta}}$. In particular, when $\beta = 0$ then the winner takes all, and if $\beta \in (0,1)$, then the two types coexist in the limit. (b) This is similar to (a). Here, if $2q_3 + q_4 > 0$ and $q_3 > 0$, then let $\beta = -\frac{q_4}{q_3} \in [0,2)$; the possible limits of $a_n$ are then $\frac{1}{2} \pm \frac{1}{2}\sqrt{\frac{2 - \beta}{2 + \beta}}$.
In marketing, competing companies fight for customers. In essence, our model describes word-of-mouth
recommendations, and thus it should be compared to other models which study the effect of such personal
recommendations. A related model of word-of-mouth learning was studied by Banerjee and Fudenberg [14],
where successive generations of agents make choices between two alternatives, with new agents sampling the
choices of old ones. However, they considered the limit of a continuum of agents with no network structure,
in contrast to our setup, where this is explicitly modeled. Furthermore, they assume that one of the two
alternatives is ex-ante better than the other, and focus on whether or not the agents can learn this via
word-of-mouth communication. See also [26, 27].
The power of word-of-mouth has been a widely studied topic in the past half century, with research confirming the strong influence of word-of-mouth communication on consumer behavior [25, 4, 30, 22, 28]. This research generally supports the assertion that word-of-mouth is more influential than external marketing efforts, such as advertising. In the current information age, online feedback mechanisms have changed the way customers share opinions about products and services [24], and online social networks are being exploited for viral marketing purposes [37]. Nevertheless, traditional word-of-mouth recommendation networks still have a very important effect, and companies are advised to take advantage of this through their marketing efforts, e.g., via facilitating referrals [35, 36]. Due to the ever-changing ways individuals interact, it is important to analyze models, such as the one introduced in this paper, that study the interplay between how individuals interact and the effects of word-of-mouth communication in the given setting.
In epidemiology, pathogens fight for survival, and a central topic is the spread of diseases [13, 2]. In classic models of epidemic spreading, individuals are characterized by the stage of the disease in them: they can be susceptible, infected, or recovered/removed, leading to the SIR, SIRS and SIS models. The main object of study is the epidemic threshold, i.e., under what conditions the disease dies out or takes over the population. An important finding is that the network structure underlying the population of individuals greatly affects the epidemic threshold; in particular, on scale-free networks the epidemic threshold vanishes, and diseases can spread even when infection probabilities are tiny [45, 39, 40, 42].
Another large area of epidemiology studies conditions under which multiple strains of a pathogen can
coexist (see, e.g., [38] and references therein), while the physics community has been studying the effects of
the underlying network on competing epidemics [43, 1, 34].
This research in epidemiology is relevant in a much broader context, since many dynamical processes, such as the diffusion of information and opinions, can be modeled as epidemics. Indeed, the spread of competing products has been modeled in this way as well [50, 21]. In [50] the authors study an $SI_1I_2S$ model of competing viruses with perfect mutual immunity in a mean-field setting for fixed networks, and conclude that the winner takes all, i.e., one virus will take over, while in [21] they study what level of partial immunity allows for coexistence of the two viruses. A related model of competing first passage percolation has been studied in probability theory on various network topologies, including random regular graphs [3] and scale-free networks [23]; the conclusion again is that the winner takes all. In contrast, in many current markets we observe that competing products coexist, even when they are mutually exclusive.
Perhaps closest to our paper is the work of Arthur and collaborators in economics [6, 8]. The central
viewpoint of their research is that many economic systems are constantly evolving as opposed to being in
static equilibrium [9]; our model is in line with this out-of-equilibrium viewpoint. In particular, they study
several economic systems involving positive feedback due to increasing returns, such as the evolution of
technology choice [5] and industry locations [7]. The behavior of these systems shares many features with our model: multiple possible long-run states, unpredictability due to stochasticity, lock-in, path dependence, and symmetry breaking. There are also technical commonalities: nonlinear Pólya urn processes feature in these [10, 11, 12] as well as in our model. However, there are also many different features in our model. Chief among these is that we also explicitly model the network underlying the agents. We argue that the inclusion of this extra layer is important and deserves further study in such out-of-equilibrium models.
1.4 Outline of paper
First, in Section 2 we prove the results described in Section 1.2. Then in Section 3 we study the case of three or more types, and finally we conclude with open questions and directions for future research in Section 4.
2 Proofs
This section contains the proofs of our main results described in Section 1.2, and is structured as follows. First, in Section 2.1 we show how the asymptotic behavior of $\{a_n\}_{n \ge 0}$ is the same as that of the sum-of-degrees process $\{x_n\}_{n \ge 0}$, which is more convenient to study, as it is a Markov process. Then in Section 2.2 we study the linear model and prove Theorem 1.1. Next, in Section 2.3 we recall results from the theory of stochastic approximation processes, and finally in Section 2.4 we prove our results concerning nonlinear models.
2.1 Reduction to the sum-of-degrees process
To understand the process $\{A_n\}_{n \ge 0}$ (and thus the normalized process $\{a_n\}_{n \ge 0}$), it is more convenient to study the time evolution of the sum of the degrees of each type. The process $\{A_n\}_{n \ge 0}$ is not a Markov process, and therefore we study the joint process $\{(A_n, X_n)\}_{n \ge 0}$, which is indeed Markov. It evolves as follows. Given $(A_n, X_n)$, $u_{n+1}$ is drawn from the binomial distribution with parameters $m$ and $x_n$. Subsequently, $I_{n+1}$ is drawn from the Bernoulli distribution with parameter $p_{u_{n+1}}$. We then have
$$A_{n+1} = A_n + I_{n+1}, \tag{2}$$
$$X_{n+1} = X_n + u_{n+1} + m I_{n+1}. \tag{3}$$
The following lemma tells us that in order to understand the asymptotic behavior of $\{a_n\}_{n \ge 0}$, it is enough to understand the asymptotic behavior of $\{x_n\}_{n \ge 0}$. Consequently, in the following we analyze the latter, as this is a Markov process.
Lemma 2.1. Suppose $\{x_n\}_{n \ge 0}$ converges a.s. and let $x := \lim_{n \to \infty} x_n$ denote the limit. If $P(x) = 0$ a.s., then $\{a_n\}_{n \ge 0}$ converges a.s. as well, and $\lim_{n \to \infty} a_n = x$ a.s.
Proof. Let $\mathcal{F}_n$ denote the filtration of the process until time $n$. Given $\mathcal{F}_n$, the probability that the node added at time $n+1$ is red is
$$\mathbb{P}(A_{n+1} - A_n = 1 \mid \mathcal{F}_n) = \sum_{k=0}^{m} \binom{m}{k} x_n^k (1 - x_n)^{m-k} p_k = x_n + \sum_{k=0}^{m} \binom{m}{k} x_n^k (1 - x_n)^{m-k} q_k = x_n + 2 P(x_n) =: f(x_n),$$
where $q_k = p_k - k/m$. Thus $\mathbb{E}(A_{n+1} - A_n \mid \mathcal{F}_n) = f(x_n)$. Define $M_n = A_n - A_0 - \sum_{i=0}^{n-1} f(x_i)$, with initial condition $M_0 = 0$. The previous calculation tells us that $\{M_n\}_{n \ge 0}$ is a martingale with respect to the filtration $\mathcal{F}_n$. Moreover, this martingale has bounded increments, since $M_{i+1} - M_i = A_{i+1} - A_i - f(x_i) \in [-1, 1]$, and thus $\lim_{n \to \infty} M_n / n = 0$ a.s.
Let $x := \lim_{n \to \infty} x_n$. Since $P(x) = 0$, we have $f(x) = x$. Since $f$ is continuous, we have $\lim_{n \to \infty} f(x_n) = f(x) = x$ a.s., and thus the Cesàro mean of the sequence $\{f(x_n)\}_{n \ge 0}$ also converges to the same limit: $\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(x_i) = x$ a.s. The claim then follows from the fact that $M_n / n = \frac{A_n}{n} - \frac{A_0}{n} - \frac{1}{n} \sum_{i=0}^{n-1} f(x_i)$ and $\lim_{n \to \infty} \left( a_n - \frac{A_n}{n} \right) = 0$.
2.2 Linear model
Proof of Theorem 1.1. In the linear model, when $p_k = \frac{k}{m}$ for all $k = 0, 1, \ldots, m$, we have that $P \equiv 0$, and thus $\mathbb{E}(X_{n+1} - X_n \mid \mathcal{F}_n) = 2m x_n$. Since $x_{n+1} - x_n = \frac{X_{n+1} - X_n - 2m x_n}{S_{n+1}}$, where $S_n := S_0 + 2mn$ denotes the total degree of $G_n$ and $S_0$ is the sum of the degrees in $G_0$, it follows that $\mathbb{E}(x_{n+1} - x_n \mid \mathcal{F}_n) = 0$, i.e., $\{x_n\}_{n \ge 0}$ is a martingale. Since it is also bounded, it converges almost surely. Lemma 2.1 then implies that $\{a_n\}_{n \ge 0}$ converges a.s. as well, and $\lim_{n \to \infty} a_n = \lim_{n \to \infty} x_n$ a.s.
We use a variance argument to show that the distribution of $x := \lim_{n \to \infty} x_n$ has full support on $[0,1]$. First note that $(x_{n+1} - x_n)^2 = \left( \frac{X_{n+1} - X_n - 2m x_n}{S_0 + 2m(n+1)} \right)^2 \le \frac{1}{(n+1)^2}$, and consequently for any $n_0$ we have
$$\mathbb{E}\left( (x - x_{n_0})^2 \mid \mathcal{F}_{n_0} \right) = \lim_{n \to \infty} \mathbb{E}\left( (x_n - x_{n_0})^2 \mid \mathcal{F}_{n_0} \right) = \sum_{j=n_0}^{\infty} \mathbb{E}\left( (x_{j+1} - x_j)^2 \mid \mathcal{F}_{n_0} \right) \le \sum_{j=n_0}^{\infty} \frac{1}{(j+1)^2} \le \frac{1}{n_0}. \tag{4}$$
Now let $(r, r + \varepsilon) \subset (0,1)$ be any fixed interval. Our goal is to show that $\mathbb{P}(x \in (r, r + \varepsilon)) > 0$. Let $n_0$ be an integer such that $n_0 \ge \frac{18}{\varepsilon^2}$ and $\mathbb{P}\left( x_{n_0} \in \left( r + \frac{\varepsilon}{3}, r + \frac{2\varepsilon}{3} \right) \right) > 0$ (this is possible since for large enough $n_0$ there exists a sequence of events such that $x_{n_0} \in \left( r + \frac{\varepsilon}{3}, r + \frac{2\varepsilon}{3} \right)$). Now condition on this event; (4) implies that
$$\mathbb{E}\left( (x - x_{n_0})^2 \;\Big|\; x_{n_0} \in \left( r + \tfrac{\varepsilon}{3}, r + \tfrac{2\varepsilon}{3} \right) \right) \le \frac{1}{n_0} \le \frac{\varepsilon^2}{18},$$
which in turn implies that $\mathbb{P}\left( |x - x_{n_0}| \le \frac{\varepsilon}{3} \;\Big|\; x_{n_0} \in \left( r + \tfrac{\varepsilon}{3}, r + \tfrac{2\varepsilon}{3} \right) \right) \ge \frac{1}{2}$. We can conclude that
$$\mathbb{P}(x \in (r, r + \varepsilon)) \ge \mathbb{P}\left( |x - x_{n_0}| \le \tfrac{\varepsilon}{3} \;\Big|\; x_{n_0} \in \left( r + \tfrac{\varepsilon}{3}, r + \tfrac{2\varepsilon}{3} \right) \right) \mathbb{P}\left( x_{n_0} \in \left( r + \tfrac{\varepsilon}{3}, r + \tfrac{2\varepsilon}{3} \right) \right) > 0.$$
Finally, showing that the distribution of $x$ has no atoms can be done by adapting arguments by Pemantle [46]. First, let us describe how the process $\{x_n\}_{n \ge 0}$ is related to the time-dependent Pólya urn processes that Pemantle studies in [46].
Time-dependent Pólya urn processes are generalizations of the classical Pólya urn process, where the number of balls added to the urn is allowed to vary with time. Although $\{x_n\}_{n \ge 0}$ is not a time-dependent Pólya urn process, the following slight modification of the preferential attachment process does give a time-dependent Pólya urn process. When adding a new node $v$ to the graph $G_n = (V_n, E_n)$, add its $m$ neighbors one by one, and after adding each neighbor, update the degree of the neighbor. Let $\widehat{X}_n$ denote the sum of the degrees of red nodes at time $n$ in this model. Consider also a time-dependent Pólya urn process $\{Z_n\}_{n \ge 0}$ where at times $t \not\equiv 0 \pmod{m}$ a single ball is added to the urn, and at times $t \equiv 0 \pmod{m}$ the number of balls added to the urn is $m + 1$. It can be seen that if $\widehat{X}_0 = Z_0$, then $\widehat{X}_n$ and $Z_{mn}$ have the same distribution. Thus Pemantle's results [46, Theorem 3, Theorem 4] apply directly and show that the distribution of $\lim_{n \to \infty} \widehat{X}_n / (S_0 + 2mn)$ (this limit exists a.s.) has no atoms.
Since our setting is close to Pemantle's original setting, we only sketch the proof that the distribution of $x$ has no atoms, and leave the details to the reader.
To show that the distribution of $x$ has no atoms on $(0,1)$, we can adapt the variance arguments of [46, Theorem 3]. Fix $r \in (0,1)$. Suppose on the contrary that $\mathbb{P}(x = r) > 0$. Then for every $\delta > 0$ there exist $n_0$ and some event $\mathcal{A} \in \mathcal{F}_{n_0}$ having positive probability such that $\mathbb{P}(x_n \to r \mid \mathcal{A}) \ge 1 - \delta$; in fact, $n_0$ can be as large as desired. Define $c := \frac{(r(1-r))^{m/2}}{10 \cdot 2^{m/2}}$ and let $N = \max\left\{ \frac{S_0}{m}, \frac{2}{c^2 \min\{r, 1-r\}} \right\}$. One can then show, via variance arguments, the following two inequalities. First, for every $n \ge N$,
$$\mathbb{P}\left( \sup_{k \ge n} |x_k - r| \ge \frac{c}{\sqrt{n}} \;\Big|\; \mathcal{F}_n \right) \ge \frac{1}{2}.$$
Second, defining $B = \left\{ |x_n - r| \ge \frac{c}{\sqrt{n}} \right\}$, we have that for every $n \ge N$,
$$\mathbb{P}\left( \inf_{k \ge n} |x_k - r| \ge \frac{c}{2\sqrt{n}} \;\Big|\; \mathcal{F}_n, B \right) \ge \frac{c^2}{16}.$$
Putting these together we have that for every $n \ge N$, the probability given $\mathcal{F}_n$ is at least $\frac{c^2}{32}$ that some $x_{n+k}$ will be at least $\frac{c}{\sqrt{n}}$ away from $r$ and no subsequent $x_{n+k+\ell}$ will ever return to the interval $\left( r - \frac{c}{2\sqrt{n}}, r + \frac{c}{2\sqrt{n}} \right)$. This contradicts our initial assumption and so $\mathbb{P}(x = r) = 0$.
To show that the distribution of $x$ has no atoms at 0 and 1, we can adapt the arguments of [46, Theorem 4]. The main idea is a domination argument. Let $\{v_n\}_{n \ge 0}$ be the Pólya urn process where at each time step $2m$ balls are added to the urn, and let $v_0 = x_0$. Then the distribution of $x_n$ can be dominated by the distribution of $v_n$, in the sense that $\mathbb{E}(h(x_n)) \le \mathbb{E}(h(v_n))$ for every continuous bounded convex function $h$. In other words, $x_n$ is smaller than $v_n$ in the convex order [52]. Since the limiting distribution of $\{v_n\}_{n \ge 0}$ is a beta distribution, which does not have an atom at zero, one can then take $h_\delta(x) := \max\{0, 2 - x/\delta\}$ and let $\delta \to 0$ to conclude that the distribution of $x$ cannot have an atom at zero either. We refer the reader to [46, Theorem 4] for more details. See also the proof of Theorem 1.4 for the endpoints in Section 2.4.
2.3 Stochastic approximation processes
The key observation in the analysis of the asymptotic behavior of $\{x_n\}_{n \ge 0}$ is that it is a stochastic approximation process. Stochastic approximation was introduced in 1951 by Robbins and Monro [51], whose goal was to approximate the root of an unknown function via evaluation queries that are necessarily noisy. There has been much follow-up research; see, e.g., the monograph by Nevelson and Hasminskii [41]. The setup of stochastic approximation arises naturally in the study of Pólya urn processes; see the survey [49] for details. In particular, we use results of Hill, Lane and Sudderth [31], who studied generalized (nonlinear) Pólya urn processes, and we also use subsequent refinements by Pemantle [47, 48]. We state the main theorems here and refer to the original papers for more details; see also the survey [49]. Stochastic approximation results in higher dimensions will be discussed in Section 3.
Let $\{Z_n\}_{n \ge 0}$ be a stochastic process in $\mathbb{R}$ adapted to a filtration $\mathcal{F}_n$. Suppose that it satisfies
$$Z_{n+1} - Z_n = \frac{1}{n} \left( F(Z_n) + \xi_{n+1} + R_n \right), \tag{5}$$
where $F : \mathbb{R} \to \mathbb{R}$, $\mathbb{E}(\xi_{n+1} \mid \mathcal{F}_n) = 0$, and the remainder terms $R_n \in \mathcal{F}_n$ go to zero and also satisfy $\sum_{n=1}^{\infty} n^{-1} |R_n| < \infty$ almost surely. Such a process is known as a stochastic approximation process.
Intuitively, trajectories of a stochastic approximation process $\{Z_n\}_{n \ge 0}$ should approximate the trajectories $\{Z(t)\}_{t \ge 0}$ of the corresponding ODE $dZ/dt = F(Z)$. Moreover, since $\{Z_n\}_{n \ge 0}$ is a stochastic system, we expect that stable trajectories of the ODE should appear, but unstable trajectories should not. This intuition is confirmed and formalized in the following statements (quoted from the survey [49]); for proofs and more details see the papers cited above.
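To illustrate this intuition numerically, here is a toy sketch (our own, not taken from [49] or the papers cited): a one-dimensional stochastic approximation iterate driven by a drift $F$ with a stable zero at $1/2$ and unstable zeros at 0 and 1; from most starting points the iterate settles near the stable zero.

    import random

    def stochastic_approximation(F, z0, n_steps, noise=0.1):
        # Iterate Z_{n+1} = Z_n + (F(Z_n) + xi_{n+1}) / n with centered, bounded noise.
        z = z0
        for n in range(1, n_steps + 1):
            xi = random.uniform(-noise, noise)
            z += (F(z) + xi) / n
            z = min(max(z, 0.0), 1.0)   # keep the iterate in [0, 1] (not part of (5))
        return z

    # F > 0 just below 1/2 and F < 0 just above it, so 1/2 is the stable equilibrium.
    F = lambda z: z * (1 - z) * (0.5 - z)
    print([round(stochastic_approximation(F, random.random(), 10**5), 3) for _ in range(5)])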
Theorem 2.2 (Convergence to the zero set of F). Suppose $Z_n$ is a stochastic approximation process and that $\mathbb{E}\left( \xi^2_{n+1} \mid \mathcal{F}_n \right) \le K$ for some finite $K$. If $F$ is bounded and continuous, then $Z_n$ converges almost surely to the zero set of $F$.
Theorem 2.3 (Convergence to stable equilibria). Suppose $Z_n$ is a stochastic approximation process with a bounded and continuous $F$, and that $\mathbb{E}\left( \xi^2_{n+1} \mid \mathcal{F}_n \right) \le K$ for some finite $K$. Suppose there is a point $z$ and an $\varepsilon > 0$ with $F(z) = 0$, $F > 0$ on $(z - \varepsilon, z)$ and $F < 0$ on $(z, z + \varepsilon)$. Then $\mathbb{P}(Z_n \to z) > 0$. Similarly, when $F : [0,1] \to \mathbb{R}$, if $F(0) = 0$ and $F < 0$ on $(0, \varepsilon)$, or if $F(1) = 0$ and $F > 0$ on $(1 - \varepsilon, 1)$, then there is a positive probability of convergence to 0 or 1, respectively.
Theorem 2.4 (Nonconvergence to unstable equilibria). Suppose $Z_n$ is a stochastic approximation process with a bounded and continuous $F$. Suppose there is a point $z \in (0,1)$ and an $\varepsilon > 0$ with $F(z) = 0$, $F < 0$ on $(z - \varepsilon, z)$ and $F > 0$ on $(z, z + \varepsilon)$. Suppose further that $\mathbb{E}\left( \xi^+_{n+1} \mid \mathcal{F}_n \right)$ and $\mathbb{E}\left( \xi^-_{n+1} \mid \mathcal{F}_n \right)$ are bounded above and below by positive numbers when $Z_n \in (z - \varepsilon, z + \varepsilon)$. Then $\mathbb{P}(Z_n \to z) = 0$.
Pemantle studied the case of touchpoints for generalized (nonlinear) Pólya urn processes in [48]. His
proof extends to the following result.
Theorem 2.5 (Convergence to touchpoints). Suppose $Z_n$ is a stochastic approximation process with a bounded and continuously differentiable $F$, and that $|\xi_n| \le K$ a.s. for some finite $K$. Suppose $z \in Z_P$ is a touchpoint, i.e., there exists an $\varepsilon > 0$ such that either $F > 0$ on $(z - \varepsilon, z) \cup (z, z + \varepsilon)$ or $F < 0$ on $(z - \varepsilon, z) \cup (z, z + \varepsilon)$. Then $\mathbb{P}(Z_n \to z) > 0$.
2.4 Nonlinear models
We first show that $\{x_n\}_{n \ge 0}$ is a stochastic approximation process (i.e., satisfies (5)) with the function $P$ as in (1). Subsequently we show how this implies our results in Section 1.2 using the results described in Section 2.3.
Lemma 2.6. The process $\{x_n\}_{n \ge 0}$ is a stochastic approximation process with the function $F = P$ as in (1). Furthermore, the noise term $\xi_n$ is bounded: $|\xi_n| \le 2$ for all $n \ge 1$.
Proof. From (3) we have that the conditional expectation of $X_{n+1} - X_n$ is:
$$\mathbb{E}(X_{n+1} - X_n \mid \mathcal{F}_n) = \sum_{k=0}^{m} \binom{m}{k} x_n^k (1 - x_n)^{m-k} (k + m p_k) = 2m x_n + 2m P(x_n).$$
One can check that $x_{n+1} - x_n = \frac{X_{n+1} - X_n - 2m x_n}{S_{n+1}}$ and consequently $\mathbb{E}(x_{n+1} - x_n \mid \mathcal{F}_n) = \frac{2m}{S_{n+1}} P(x_n)$, with $P$ as in (1). We can then write $\{x_n\}_{n \ge 0}$ as a stochastic approximation process as claimed in the statement of the lemma, i.e., we can write
$$x_{n+1} - x_n = \frac{1}{n} \left( P(x_n) + \xi_{n+1} + R_n \right)$$
with appropriately defined $\xi_{n+1}$ and $R_n$. Define $\xi_{n+1}$ as
$$\xi_{n+1} = n \left( x_{n+1} - x_n - \mathbb{E}(x_{n+1} - x_n \mid \mathcal{F}_n) \right). \tag{6}$$
The remainder term $R_n$ can then be written as
$$R_n = -\frac{S_0 + 2m}{S_0 + 2m(n+1)} P(x_n).$$
Clearly $R_n \in \mathcal{F}_n$. Let us now show that $\sum_{n=1}^{\infty} n^{-1} |R_n| < \infty$. A crude bound on $P$ is $|P(t)| \le \frac{1}{2} \sum_{k=0}^{m} \binom{m}{k} |p_k - k/m| \, t^k (1-t)^{m-k} \le \frac{1}{2} \sum_{k=0}^{m} \binom{m}{k} t^k (1-t)^{m-k} = \frac{1}{2}$. Therefore $|R_n| \le \frac{1}{2} \cdot \frac{S_0 + 2m}{S_0 + 2m(n+1)}$, so indeed we have $\sum_{n=1}^{\infty} n^{-1} |R_n| < \infty$.
Finally, to bound the noise term, notice that $|x_{n+1} - x_n| = \left| \frac{X_{n+1} - X_n - 2m x_n}{S_0 + 2m(n+1)} \right| \le \frac{2m}{2m(n+1)} = \frac{1}{n+1}$. Then using (6) and the triangle inequality, we get that $|\xi_n| \le 2$.
The results in Section 1.2 now follow. First, note that Lemma 2.1 implies that it is enough to show the claims in Theorems 1.2, 1.3, 1.4, and 1.5 for the process $\{x_n\}_{n \ge 0}$ (instead of for the process $\{a_n\}_{n \ge 0}$).
Proof of Theorem 1.2. This follows directly from Lemma 2.6 and Theorem 2.2.
Proof of Theorem 1.3. This follows directly from Lemma 2.6 and Theorem 2.3.
The proof of Theorem 1.4 is more involved. This is in line with related work in the literature, where conditions for nonconvergence to unstable equilibria are more difficult to find than similar results for convergence to stable equilibria (see [49] for a discussion). Recall the proof of Theorem 1.1, where we showed that the limiting distribution in the linear model has no atoms: we used a variance argument for points in $(0,1)$, and a domination argument for the endpoints 0 and 1. Our proof of Theorem 1.4 follows similar lines.
We first proceed by proving Theorem 1.4 for points $z \in (0,1) \cap Z_P$. Intuitively, the process has sufficient noise which prevents it from converging to $z$. The following lemma is key to bounding the noise of the process from below.
Lemma 2.7. Suppose the parameters $\{p_k\}_{0 \le k \le m}$ do not fall into one of the following three cases: (a) $p_k = 0$ for all $0 \le k \le m$; (b) $p_k = 1$ for all $0 \le k \le m$; (c) $m = 1$, $p_0 = 1$, $p_1 = 0$. Suppose that $z \in (0,1) \cap Z_P$. Then there exist integers $k_1$ and $k_2$ such that $k_1 < 2mz < k_2$ and, if $x_n \in (\delta, 1 - \delta)$ for some $\delta > 0$, then the probabilities $\mathbb{P}(X_{n+1} - X_n = k_1 \mid \mathcal{F}_n)$ and $\mathbb{P}(X_{n+1} - X_n = k_2 \mid \mathcal{F}_n)$ are bounded away from zero by a positive function of $\delta$ and the parameters $\{p_k\}_{0 \le k \le m}$.
Proof. In the following we always assume that $x_n \in (\delta, 1 - \delta)$. If $p_0 < 1$ then we can choose $k_1 = 0$, since $\mathbb{P}(X_{n+1} - X_n = 0 \mid \mathcal{F}_n) = (1 - x_n)^m (1 - p_0) \ge \delta^m (1 - p_0)$. Similarly, if $p_m > 0$ then we can choose $k_2 = 2m$, since $\mathbb{P}(X_{n+1} - X_n = 2m \mid \mathcal{F}_n) = x_n^m p_m \ge \delta^m p_m$. The rest of the proof deals with the cases when either $p_0 = 1$ or $p_m = 0$.
First consider the case when $p_0 > 0$ and $p_1 = p_2 = \cdots = p_m = 0$. In this case $P(s) = \frac{1}{2} \left[ p_0 (1 - s)^m - s \right]$, which is decreasing in $[0,1]$, so it has a single zero in $(0,1)$. In fact, $P(1/2) < 0$, so the single zero of $P$ in $(0,1)$ is in $(0, 1/2)$, and thus we can take $k_2 = m$. If $p_0 < 1$ then we can take $k_1 = 0$ as described above. Finally, if $p_0 = 1$ and $m \ge 2$, then we can take $k_1 = 1$. This is because the zero of $P$ in $(0,1)$ is in $\left( \frac{1}{2m}, \frac{1}{2} \right)$, which follows from the fact that $P\left( \frac{1}{2m} \right) > 0$. The case when $p_0 = p_1 = \cdots = p_{m-1} = 1$ and $p_m < 1$ follows similarly.
Now we can assume that there exist $1 \le i \le m$ and $0 \le j \le m-1$ such that $p_i > 0$ and $p_j < 1$. This implies that $\mathbb{P}(X_{n+1} - X_n = j \mid \mathcal{F}_n) \ge \delta^m (1 - p_j) > 0$ and $\mathbb{P}(X_{n+1} - X_n = m + i \mid \mathcal{F}_n) \ge \delta^m p_i > 0$. Thus if $z = 1/2$ then we can take $k_1 = j$ and $k_2 = m + i$. If $0 < z < 1/2$ then we can again take $k_2 = m + i$, and we just need to show the existence of an appropriate $k_1$. Assume by contradiction that there does not exist an appropriate $k_1$, i.e., for all $\ell < 2mz$, $p_\ell = 1$. Then we have
$$P(s) \ge \frac{1}{2} \left( \sum_{0 \le k < 2mz} \binom{m}{k} s^k (1 - s)^{m-k} - s \right) = \frac{1}{2} \left( 1 - s - \sum_{2mz \le k \le m} \binom{m}{k} s^k (1 - s)^{m-k} \right).$$
By Markov's inequality for a binomial random variable, this latter sum evaluated at $z$ is at most $1/2$, and since $z < 1/2$, we must have $P(z) > 0$, which is a contradiction. The case of $1/2 < z < 1$ is similar.
Proof of Theorem 1.4 for $z \in (0,1)$. This follows from Lemma 2.6 and Theorem 2.4. The only condition of Theorem 2.4 that needs to be checked additionally is that $\mathbb{E}\left( \xi^+_{n+1} \mid \mathcal{F}_n \right)$ and $\mathbb{E}\left( \xi^-_{n+1} \mid \mathcal{F}_n \right)$ are bounded away from zero by positive numbers when $x_n \in (z - \varepsilon, z + \varepsilon)$ for small enough $\varepsilon > 0$; this can be done using Lemma 2.7. In the special cases (a), (b), and (c) described in Lemma 2.7, the statement of Theorem 1.4 is vacuously true, since in each case the polynomial $P$ has no zeros at which it is increasing. Thus we may assume that we are not in these special cases, and we can use Lemma 2.7. Recall that
$$\xi_{n+1} = \frac{n}{S_{n+1}} \left( X_{n+1} - X_n - 2m \left( x_n + P(x_n) \right) \right).$$
Define $\gamma := \frac{1}{2} \min\{ 2mz - k_1, \, k_2 - 2mz \}$, where $k_1$ and $k_2$ are given by Lemma 2.7, and let $\varepsilon > 0$ be small enough such that whenever $x_n \in (z - \varepsilon, z + \varepsilon)$, necessarily $2m(x_n + P(x_n)) \in (2mz - \gamma, 2mz + \gamma)$. If $x_n \in (z - \varepsilon, z + \varepsilon)$ then we have $\mathbb{E}\left( \xi^+_{n+1} \mid \mathcal{F}_n \right) \ge \gamma \frac{n}{S_{n+1}} \mathbb{P}(X_{n+1} - X_n = k_2 \mid \mathcal{F}_n)$, where $\lim_{n \to \infty} \frac{n}{S_{n+1}} = \frac{1}{2m}$, and by Lemma 2.7 the probability $\mathbb{P}(X_{n+1} - X_n = k_2 \mid \mathcal{F}_n)$ is bounded from below by a positive function of $z$, $\varepsilon$, and the parameters $\{p_k\}_{0 \le k \le m}$. We can similarly bound $\mathbb{E}\left( \xi^-_{n+1} \mid \mathcal{F}_n \right)$ from below.
We next prove Theorem 1.4 for the endpoints 0 and 1. The main idea of the proof is to compare the behavior near the endpoints of our process of interest to that of a standard Pólya urn process where $2m$ balls are added at each time step. In order to formalize this, we make use of several different stochastic orders; we refer to [52] for an overview of these. We proceed by defining these stochastic orders and stating a few results on them, before proving Theorem 1.4.
Definition 1 (Stochastic orders). Let $X$ and $Y$ be random variables.
- We say that $X$ is smaller than $Y$ in the usual stochastic order (denoted by $X \le_{st} Y$) if $\mathbb{E}(\phi(X)) \le \mathbb{E}(\phi(Y))$ for all increasing continuous functions $\phi : \mathbb{R} \to \mathbb{R}$ for which these expectations exist.
- We say that $X$ is smaller than $Y$ in the convex order (denoted by $X \le_{cx} Y$) if $\mathbb{E}(\phi(X)) \le \mathbb{E}(\phi(Y))$ for all continuous convex functions $\phi : \mathbb{R} \to \mathbb{R}$ for which these expectations exist.
- We say that $X$ is smaller than $Y$ in the increasing convex order (denoted by $X \le_{icx} Y$) if $\mathbb{E}(\phi(X)) \le \mathbb{E}(\phi(Y))$ for all increasing continuous convex functions $\phi : \mathbb{R} \to \mathbb{R}$ for which these expectations exist.
Lemma 2.8. Two random variables $X$ and $Y$ satisfy $X \le_{icx} Y$ if and only if there is a random variable $Z$ such that $X \le_{st} Z \le_{cx} Y$.
Proof. See [52, Theorem 4.A.6.(a)].
Lemma 2.9. Let $X$ and $Y$ be two random variables with cumulative distribution functions $F$ and $G$, respectively, and bounded supports. Suppose that $\mathbb{E}(X) \le \mathbb{E}(Y)$, and also that if $t_1 < t_2$ and $G(t_1) < F(t_1)$, then $G(t_2) \le F(t_2)$. Then $X \le_{icx} Y$.
Proof. See [52, Theorem 4.A.22.(b)].
Lemma 2.10. Consider the standard Pólya urn process where $2m$ balls are added at each time step. Let $x^1_n$ and $x^2_n$ be the proportions of red balls at the $n$th step of two realizations of this process. If $x^1_n \le_{cx} x^2_n$, then $x^1_{n+1} \le_{cx} x^2_{n+1}$, i.e., the Pólya urn process preserves dominance in the convex order.
Proof. See Proposition 1 in [46], in particular equation (13).
Proof of Theorem 1.4 for the endpoints. We prove nonconvergence to 1 when $P(1) = 0$ and $P < 0$ on $(1 - \varepsilon, 1)$ for some $\varepsilon > 0$; the proof for the other endpoint is analogous. In the following fix $0 < \varepsilon < 1/m$.
The main idea of the proof is to compare the behavior near 1 of our process of interest to that of a standard Pólya urn process where $2m$ balls are added at each time step. To aid in this comparison we also introduce an auxiliary process which is a combination of these two. We begin by describing these processes.
Our process of interest is $\{X_n\}_{n \ge 0}$, together with its normalized process $\{x_n\}_{n \ge 0}$. Let $\{\overline{X}_n\}_{n \ge 0}$ denote the process of the number of red balls in a standard Pólya urn process where $2m$ balls are added at each time step, where the initial conditions are the same as those for the process $\{X_n\}_{n \ge 0}$, i.e., $\overline{X}_0 = X_0$. Let $\{\overline{x}_n\}_{n \ge 0}$ denote the normalized process, i.e., $\overline{x}_n = \frac{\overline{X}_n}{S_0 + 2mn}$. Let $\{\widetilde{X}_n\}_{n \ge 0}$ denote the auxiliary process, with initial condition $\widetilde{X}_0 = X_0$, and let $\{\widetilde{x}_n\}_{n \ge 0}$ denote the normalized process, i.e., $\widetilde{x}_n = \frac{\widetilde{X}_n}{S_0 + 2mn}$. We define this auxiliary process as follows. For $1 - \varepsilon < x \le 1$, given $\widetilde{x}_n = x$ let $\widetilde{X}_{n+1}$ have the same distribution as $X_{n+1}$ given $x_n = x$. For $x \le 1 - \varepsilon$, let $\mathbb{P}\left( \widetilde{X}_{n+1} = \widetilde{X}_n \mid \widetilde{x}_n = x \right) = 1 - x$ and $\mathbb{P}\left( \widetilde{X}_{n+1} = \widetilde{X}_n + 2m \mid \widetilde{x}_n = x \right) = x$. In other words, when $\widetilde{x}_n > 1 - \varepsilon$ we evolve the auxiliary process according to our process of interest, and when $\widetilde{x}_n \le 1 - \varepsilon$ we evolve it as a Pólya urn process.
We first show that it suffices to prove the claim for the auxiliary process, i.e., it suffices to show that $\mathbb{P}(\lim_{n \to \infty} \widetilde{x}_n = 1) = 0$. Define the following events:
$$A_n := \left\{ \lim_{k \to \infty} x_k = 1, \; x_k > 1 - \varepsilon \text{ for all } k \ge n \right\}, \qquad \widetilde{A}_n := \left\{ \lim_{k \to \infty} \widetilde{x}_k = 1, \; \widetilde{x}_k > 1 - \varepsilon \text{ for all } k \ge n \right\}.$$
If $\mathbb{P}(\lim_{n \to \infty} x_n = 1) > 0$, then there exists $n_0 < \infty$ such that $\mathbb{P}(A_{n_0}) > 0$. In particular, there exists $y_0 \in (1 - \varepsilon, 1)$ such that both probabilities $\mathbb{P}(x_{n_0} \ge y_0)$ and $\mathbb{P}(A_{n_0} \mid x_{n_0} = y_0)$ are positive. In fact, we claim that $\mathbb{P}(A_{n_0} \mid x_{n_0} = y)$ is positive for all $y_0 \le y < 1$.
To see this, consider two realizations of our process, $\{X^1_n\}_{n \ge 0}$ and $\{X^2_n\}_{n \ge 0}$, together with the normalized processes $\{x^1_n\}_{n \ge 0}$ and $\{x^2_n\}_{n \ge 0}$. Given $1 - \varepsilon < x^1_n < x^2_n < 1$, we can couple $X^1_{n+1}$ and $X^2_{n+1}$ such that for any $0 \le k \le 2m$, $X^1_{n+1} - X^1_n = k$ implies that either $X^2_{n+1} - X^2_n = k$ or $X^2_{n+1} - X^2_n = 2m$. This is possible due to two facts. First, since $\varepsilon < 1/m$, on the interval $(1 - \varepsilon, 1)$ the function $x \mapsto x^m$ is increasing, while for $0 \le k < m$, the functions $x \mapsto x^k (1 - x)^{m-k}$ are decreasing. Consequently $\mathbb{P}\left( \mathrm{Bin}(m, x^1_n) = m \right) < \mathbb{P}\left( \mathrm{Bin}(m, x^2_n) = m \right)$, and for $0 \le k < m$, $\mathbb{P}\left( \mathrm{Bin}(m, x^1_n) = k \right) > \mathbb{P}\left( \mathrm{Bin}(m, x^2_n) = k \right)$, where $\mathrm{Bin}(m, x)$ denotes a binomial random variable with parameters $m$ and $x$. Second, $P(1) = 0$ implies that $p_m = 1$.
Repeated application of this coupling shows that for any $1 - \varepsilon < y_1 < y_2 < 1$ we have $\mathbb{P}\left( A_n \mid x_n = y_1 \right) \le \mathbb{P}\left( A_n \mid x_n = y_2 \right)$; in particular, we have that $\mathbb{P}(A_{n_0} \mid x_{n_0} = y) \ge \mathbb{P}(A_{n_0} \mid x_{n_0} = y_0)$ for all $y \ge y_0$.
Now consider the auxiliary process. For one, we have $\mathbb{P}(\widetilde{x}_{n_0} \ge y_0) > 0$. Moreover, if $\widetilde{x}_{n_0} = x_{n_0}$, on the event $A_{n_0}$ we can couple the processes $\{x_n\}_{n \ge n_0}$ and $\{\widetilde{x}_n\}_{n \ge n_0}$ so that $\widetilde{x}_n = x_n$ for all $n \ge n_0$, which shows that $\mathbb{P}\left( \widetilde{A}_{n_0} \mid \widetilde{x}_{n_0} = y \right) \ge \mathbb{P}(A_{n_0} \mid x_{n_0} = y_0) > 0$ for all $y \ge y_0$. In particular, this shows that $\mathbb{P}(\lim_{n \to \infty} x_n = 1) > 0$ implies that $\mathbb{P}(\lim_{n \to \infty} \widetilde{x}_n = 1) > 0$. Thus it suffices to show that $\mathbb{P}(\lim_{n \to \infty} \widetilde{x}_n = 1) = 0$.
We claim that $\widetilde{x}_n \le_{icx} \overline{x}_n$ implies that $\mathbb{P}(\lim_{n \to \infty} \widetilde{x}_n = 1) = 0$. To see this, for $\delta > 0$ define the function $g_\delta : [0,1] \to [0,2]$ by $g_\delta(x) = \max\{0, 2 - 1/\delta + x/\delta\}$. This is an increasing continuous convex function, and so $\widetilde{x}_n \le_{icx} \overline{x}_n$ implies that
$$\mathbb{P}(\widetilde{x}_n > 1 - \delta) \le \mathbb{E}(g_\delta(\widetilde{x}_n)) \le \mathbb{E}(g_\delta(\overline{x}_n)) \le 2\,\mathbb{P}(\overline{x}_n > 1 - 2\delta). \tag{7}$$
We know that the limiting distribution of $\overline{x}_n$ is a beta distribution, and thus
$$\lim_{\delta \to 0} \lim_{n \to \infty} \mathbb{P}(\overline{x}_n > 1 - 2\delta) = 0.$$
By (7) this then implies that $\mathbb{P}(\lim_{n \to \infty} \widetilde{x}_n = 1) = 0$.
We prove $\widetilde{X}_n \le_{icx} \overline{X}_n$ (which is equivalent to $\widetilde{x}_n \le_{icx} \overline{x}_n$) by induction on $n$; for $n = 0$ this is immediate since the initial conditions agree. Fix now a positive integer $n$, and consider a random variable $X$ which attains integer values in the interval $[X_0, S_0 + 2mn]$, and let $x = \frac{X}{S_0 + 2mn}$. Denote by $\overline{X}$ a random variable with distribution $\mathbb{P}\left( \overline{X} = X + 2m \mid X \right) = x$ and $\mathbb{P}\left( \overline{X} = X \mid X \right) = 1 - x$. Similarly, let $\widetilde{X}$ denote a random variable with distribution the same as that of $\widetilde{X}_{n+1}$ conditioned on $\widetilde{X}_n = X$. Following Pemantle [46], the induction step follows from the following two claims: (1) $\widetilde{X} \le_{icx} \overline{X}$, and (2) $X \le_{icx} Y$ implies that $\overline{X} \le_{icx} \overline{Y}$.
First, it is enough to show that for any fixed $r$, conditioned on $x = r$ we have $\widetilde{X} \le_{icx} \overline{X}$; one can then integrate out the conditioning to get (1). We show this by checking the conditions of Lemma 2.9. First, when $r \le 1 - \varepsilon$ we have $\mathbb{E}\left( \widetilde{X} \mid x = r \right) = \mathbb{E}\left( \overline{X} \mid x = r \right)$ by the definition of the auxiliary process. If $r > 1 - \varepsilon$ then we have $\mathbb{E}\left( \overline{X} \mid x = r \right) = r(S_0 + 2mn) + 2mr$, while $\mathbb{E}\left( \widetilde{X} \mid x = r \right) = r(S_0 + 2mn) + 2m(r + P(r))$. Since $r > 1 - \varepsilon$, $P(r) < 0$, and thus $\mathbb{E}\left( \widetilde{X} \mid x = r \right) < \mathbb{E}\left( \overline{X} \mid x = r \right)$. This shows that $\mathbb{E}\left( \widetilde{X} \mid x = r \right) \le \mathbb{E}\left( \overline{X} \mid x = r \right)$. The other condition of Lemma 2.9 holds automatically due to the fact that conditioned on $X = \ell$, the distribution of $\overline{X}$ is supported on the two values $\ell$ and $\ell + 2m$, while the support of the distribution of $\widetilde{X}$ is contained in the interval $[\ell, \ell + 2m]$.
In view of Lemmas 2.8 and 2.10, to show (2) it is enough to show that $X \le_{st} Y$ implies $\overline{X} \le_{st} \overline{Y}$, i.e., that for any increasing function $\phi$ we have $\mathbb{E}\left( \phi(\overline{X}) \right) \le \mathbb{E}\left( \phi(\overline{Y}) \right)$. By conditioning on $X$ and $Y$, we have $\mathbb{E}\left( \phi(\overline{X}) \right) = \mathbb{E}\left( \psi(X) \right)$ and $\mathbb{E}\left( \phi(\overline{Y}) \right) = \mathbb{E}\left( \psi(Y) \right)$, where $\psi(t) := \phi(t)(1 - \alpha t) + \phi(t + 2m)\,\alpha t$, where $\alpha = (S_0 + 2mn)^{-1}$ and $t$ is such that $0 \le \alpha t \le 1$. Since $X \le_{st} Y$, we only need to show that $\psi$ is increasing. Indeed, if $t_1 < t_2$ then
$$\psi(t_2) - \psi(t_1) = \left( \phi(t_2 + 2m) - \phi(t_1 + 2m) \right) \alpha t_1 + \left( \phi(t_2) - \phi(t_1) \right) (1 - \alpha t_1) + \left( \phi(t_2 + 2m) - \phi(t_2) \right) \alpha (t_2 - t_1),$$
which is nonnegative, since all of the terms on the right hand side are nonnegative.
Proof of Theorem 1.5. This follows directly from Lemma 2.6 and Theorem 2.5.
3 Many colors/types
It is both natural and important to study competition between more than two colors/types. Our model naturally extends in this direction, and in this section we present our results regarding $N \ge 3$ competing types. In the following, vectors will be denoted using boldface, subscripts typically correspond to time, and superscripts correspond to the indices of types. Furthermore, denote by $\Delta_N$ the probability simplex in $\mathbb{R}^N$.
The natural extension of the model to multiple competing types is as follows. At time zero, there is a graph $G_0$, where each node is of exactly one of the $N$ types. At each time step a new node is added to the graph, and is connected to $m$ nodes of the original graph according to linear preferential attachment. The types of these $m$ neighbors induce a vector of types $\mathbf{u}$, where $u^i$ is the number of neighbors of type $i$. The type of the new node is then determined according to a random draw from the distribution $\mathbf{p}_{\mathbf{u}} = \left( p^i_{\mathbf{u}} \right)_{i \in [N]}$. The probabilities $\left\{ p^i_{\mathbf{u}} \right\}_{\mathbf{u}, i}$ are parameters of the model.
As in the case of two types, our primary interest is in the fraction of nodes of each type. Let $A^i_n$ denote the number of nodes of type $i$ at time $n$, and let $\mathbf{A}_n = \left( A^1_n, \ldots, A^N_n \right)$ denote the resulting vector of types. Let $\mathbf{a}_n$ denote the normalized vector of types, such that $\sum_{i=1}^{N} a^i_n = 1$. Furthermore, let $X^i_n$ denote the sum of the degrees of type $i$ nodes at time $n$, let $\mathbf{X}_n = \left( X^1_n, \ldots, X^N_n \right)$ denote the resulting vector of degrees, and let $\mathbf{x}_n$ be the normalized vector of degrees, such that $\sum_{i=1}^{N} x^i_n = 1$.
As in the $N = 2$ case, there is a clear distinction between the linear model, when $p^i_{\mathbf{u}} = \frac{u^i}{m}$ for all $\mathbf{u}$ and $i \in [N]$, and nonlinear models, when there exist $\mathbf{u}$ and $i \in [N]$ such that $p^i_{\mathbf{u}} \neq \frac{u^i}{m}$. In fact, the linear model for $N \ge 3$ types reduces to the linear model for two types. This is because in the linear model, if we want to study the evolution of the size of type $i$, then we can group all other types into a single mega-type, denoted by $-i$, and run the process with two types: type $i$ and mega-type $-i$. Due to linearity, the original process with $N$ types and the process with type $i$ and mega-type $-i$ can be coupled such that the evolution of type $i$ is identical in the two processes. Consequently, in the linear model all the results of the $N = 2$ case apply. In particular, we have the following theorem.
Theorem 3.1 (Linear model). Assume that $p^i_{\mathbf{u}} = \frac{u^i}{m}$ for all $\mathbf{u}$ and $i \in [N]$, and that $X^i_0 > 0$ for all $i \in [N]$. Then $\mathbf{a}_n$ converges almost surely, and the limiting distribution has full support on $\Delta_N$, and no atoms.
In nonlinear models, as we will see later, a key role in the asymptotic behavior of the process $\{\mathbf{a}_n\}_{n \ge 0}$ is played by the vector field
$$\mathbf{P}(\mathbf{y}) = \frac{1}{2} \sum_{i=1}^{N} \sum_{\mathbf{u}} \binom{m}{\mathbf{u}} (\mathbf{y})^{\mathbf{u}} \left( p^i_{\mathbf{u}} - \frac{u^i}{m} \right) \mathbf{e}_i, \tag{8}$$
where $\binom{m}{\mathbf{u}} = \frac{m!}{u^1! \cdots u^N!}$ denotes the multinomial coefficient, $(\mathbf{y})^{\mathbf{u}} = \left( y^1 \right)^{u^1} \left( y^2 \right)^{u^2} \cdots \left( y^N \right)^{u^N}$, and $\mathbf{e}_i$ is the $N$-dimensional unit vector whose $i$th coordinate is 1 and all other coordinates are 0. Let us denote the zero set of this vector field on the probability simplex by $Z_{\mathbf{P}} := \left\{ \mathbf{y} \in \Delta_N : \mathbf{P}(\mathbf{y}) = 0 \right\}$; this will be important later.
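As a small illustration (our own sketch, with hypothetical parameters, not code from the paper), the vector field in (8) can be evaluated directly by enumerating all type-count vectors $\mathbf{u}$ with $u^1 + \cdots + u^N = m$; note that the components of $\mathbf{P}(\mathbf{y})$ always sum to zero, so the field is tangent to the simplex.

    from math import factorial
    from itertools import product

    def vector_field_P(y, p):
        # p maps a tuple u = (u^1, ..., u^N) with sum m to the probability
        # vector (p^1_u, ..., p^N_u); y is a point of the probability simplex.
        N = len(y)
        m = sum(next(iter(p)))
        out = [0.0] * N
        for u, pu in p.items():                    # all u with u^1 + ... + u^N = m
            multinom = factorial(m)
            weight = 1.0
            for ui, yi in zip(u, y):
                multinom //= factorial(ui)
                weight *= yi ** ui
            for i in range(N):
                out[i] += 0.5 * multinom * weight * (pu[i] - u[i] / m)
        return out

    # Hypothetical example: N = 3, m = 2, "adopt the majority type, ties uniform".
    N, m = 3, 2
    def rule(u):
        best = max(u)
        winners = [i for i, ui in enumerate(u) if ui == best]
        return [1.0 / len(winners) if i in winners else 0.0 for i in range(N)]
    p = {u: rule(u) for u in product(range(m + 1), repeat=N) if sum(u) == m}
    print(vector_field_P([0.5, 0.3, 0.2], p))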
The behavior of the process in the general nonlinear model with multiple types is involved, and its
complete theoretical analysis is as of yet out of our reach. Nonetheless, based on partial theoretical results,
we conjecture the following asymptotic behavior, which is similar to that in the case of two types.
Conjecture 3.2. Assume that there exist $\mathbf{u}$ and $i \in [N]$ such that $p^i_{\mathbf{u}} \neq \frac{u^i}{m}$, and that $X^i_0 > 0$ for all $i \in [N]$. Then $\mathbf{a}_n$ converges almost surely and the limit is a point in the zero set $Z_{\mathbf{P}}$.
In the rest of this section we describe theoretical progress towards this conjecture. As in the case of two
competing types, the problem can be cast in a (multidimensional) stochastic approximation framework.
The process $\{\mathbf{A}_n\}_{n \ge 0}$ is not a Markov process, and therefore we study the joint process $\{(\mathbf{A}_n, \mathbf{X}_n)\}_{n \ge 0}$, which is indeed Markov. It evolves as follows. Given $(\mathbf{A}_n, \mathbf{X}_n)$, a vector $\mathbf{u}_{n+1}$ is drawn from the multinomial distribution with parameters $m$ and $\mathbf{x}_n$. Subsequently, an index $I_{n+1} \in [N]$ is chosen from the distribution $\mathbf{p}_{\mathbf{u}_{n+1}}$. We then have
$$\mathbf{A}_{n+1} = \mathbf{A}_n + \mathbf{e}_{I_{n+1}}, \tag{9}$$
$$\mathbf{X}_{n+1} = \mathbf{X}_n + \mathbf{u}_{n+1} + m \mathbf{e}_{I_{n+1}}. \tag{10}$$
Before analyzing the process $\{\mathbf{x}_n\}_{n \ge 0}$, let us show that in order to prove Conjecture 3.2 on the asymptotic behavior of $\{\mathbf{a}_n\}_{n \ge 0}$, it is sufficient to prove a similar result on the asymptotic behavior of $\{\mathbf{x}_n\}_{n \ge 0}$.
Lemma 3.3. Assume that there exist $\mathbf{u}$ and $i \in [N]$ such that $p^i_{\mathbf{u}} \neq \frac{u^i}{m}$, and that $X^i_0 > 0$ for all $i \in [N]$. Assume that $\mathbf{x}_n$ converges almost surely and the limit is a point in the zero set $Z_{\mathbf{P}}$. Then $\mathbf{a}_n$ converges almost surely and the limit is a point in the zero set $Z_{\mathbf{P}}$.
Proof. This is similar to the proof of Lemma 2.1. We have seen that
$$\mathbb{E}(\mathbf{A}_{n+1} - \mathbf{A}_n \mid \mathcal{F}_n) = \mathbb{E}\left( \mathbf{e}_{I_{n+1}} \mid \mathcal{F}_n \right) = \sum_{i=1}^{N} \sum_{\mathbf{u}} \binom{m}{\mathbf{u}} (\mathbf{x}_n)^{\mathbf{u}} p^i_{\mathbf{u}} \, \mathbf{e}_i = \mathbf{x}_n + \sum_{i=1}^{N} \sum_{\mathbf{u}} \binom{m}{\mathbf{u}} (\mathbf{x}_n)^{\mathbf{u}} \left( p^i_{\mathbf{u}} - \frac{u^i}{m} \right) \mathbf{e}_i = \mathbf{x}_n + 2\mathbf{P}(\mathbf{x}_n) =: \mathbf{f}(\mathbf{x}_n).$$
Let $\mathbf{M}_0 = 0$ and define the martingale $\mathbf{M}_n := \mathbf{A}_n - \mathbf{A}_0 - \sum_{j=0}^{n-1} \mathbf{f}(\mathbf{x}_j)$. This martingale has bounded increments, and thus $\lim_{n \to \infty} \mathbf{M}_n / n = 0$ a.s. By the definition of the martingale, this shows that a.s.
$$\lim_{n \to \infty} \left( \mathbf{a}_n - \frac{1}{n} \sum_{j=0}^{n-1} \mathbf{f}(\mathbf{x}_j) \right) = 0.$$
Now if the limit $\lim_{n \to \infty} \mathbf{x}_n$ exists a.s., and any limit point $\mathbf{x}$ satisfies $\mathbf{P}(\mathbf{x}) = 0$, then also $\mathbf{f}(\mathbf{x}) = \mathbf{x}$, and thus the Cesàro mean of the sequence $\{\mathbf{f}(\mathbf{x}_n)\}_{n \ge 0}$ also converges to the same limit point. This then implies that the limit $\lim_{n \to \infty} \mathbf{a}_n$ exists a.s. and is equal to $\lim_{n \to \infty} \mathbf{x}_n$.
The key observation in the analysis of the asymptotic behavior of $\{\mathbf{x}_n\}_{n \ge 0}$ is that it is a stochastic approximation process. In higher dimensions, a stochastic approximation process is defined as follows. Let $\mathbf{Z}_n$ be a stochastic process in the Euclidean space $\mathbb{R}^N$ adapted to a filtration $\{\mathcal{F}_n\}_{n \ge 0}$. Suppose that it satisfies
$$\mathbf{Z}_{n+1} - \mathbf{Z}_n = \frac{1}{n} \left( \mathbf{F}(\mathbf{Z}_n) + \boldsymbol{\xi}_{n+1} + \mathbf{R}_n \right),$$
where $\mathbf{F}$ is a vector field on $\mathbb{R}^N$, $\mathbb{E}(\boldsymbol{\xi}_{n+1} \mid \mathcal{F}_n) = 0$, and the remainder terms $\mathbf{R}_n \in \mathcal{F}_n$ go to zero and satisfy $\sum_{n=1}^{\infty} n^{-1} \|\mathbf{R}_n\| < \infty$ a.s. Such a process is known as a stochastic approximation process.
Lemma 3.4. The process $\{\mathbf{x}_n\}_{n \ge 0}$ is a stochastic approximation process with the vector field $\mathbf{P}$ as in (8). Furthermore, the noise term $\boldsymbol{\xi}_n$ is bounded: $\|\boldsymbol{\xi}_n\|_1 \le 2N$ for all $n \ge 1$.
Proof. From (10) we have that $\mathbb{E}(\mathbf{X}_{n+1} - \mathbf{X}_n \mid \mathcal{F}_n) = \mathbb{E}(\mathbf{u}_{n+1} \mid \mathcal{F}_n) + m \, \mathbb{E}\left( \mathbf{e}_{I_{n+1}} \mid \mathcal{F}_n \right)$. Given $\mathcal{F}_n$, $\mathbf{u}_{n+1}$ is multinomial with parameters $m$ and $\mathbf{x}_n$, and so $\mathbb{E}(\mathbf{u}_{n+1} \mid \mathcal{F}_n) = m \mathbf{x}_n$. By construction, we have that
$$\mathbb{E}\left( \mathbf{e}_{I_{n+1}} \mid \mathcal{F}_n \right) = \sum_{i=1}^{N} \sum_{\mathbf{u}} \binom{m}{\mathbf{u}} (\mathbf{x}_n)^{\mathbf{u}} p^i_{\mathbf{u}} \, \mathbf{e}_i.$$
Let $S_0$ denote the sum of the degrees in $G_0$, and let $S_n = S_0 + 2mn$. A simple calculation gives that $\mathbf{x}_{n+1} - \mathbf{x}_n = \frac{\mathbf{X}_{n+1} - \mathbf{X}_n - 2m \mathbf{x}_n}{S_{n+1}}$, and so we have
$$\mathbb{E}(\mathbf{x}_{n+1} - \mathbf{x}_n \mid \mathcal{F}_n) = \frac{m}{S_{n+1}} \sum_{i=1}^{N} \sum_{\mathbf{u}} \binom{m}{\mathbf{u}} (\mathbf{x}_n)^{\mathbf{u}} \left( p^i_{\mathbf{u}} - \frac{u^i}{m} \right) \mathbf{e}_i = \frac{2m}{S_{n+1}} \mathbf{P}(\mathbf{x}_n).$$
We can then write $\{\mathbf{x}_n\}_{n \ge 0}$ as a stochastic approximation process:
$$\mathbf{x}_{n+1} - \mathbf{x}_n = \frac{1}{n} \left[ \mathbf{P}(\mathbf{x}_n) + \boldsymbol{\xi}_{n+1} + \mathbf{R}_n \right],$$
where
$$\boldsymbol{\xi}_{n+1} = n \left( \mathbf{x}_{n+1} - \mathbf{x}_n - \mathbb{E}(\mathbf{x}_{n+1} - \mathbf{x}_n \mid \mathcal{F}_n) \right) \tag{11}$$
is the martingale term, and the remainder term is
$$\mathbf{R}_n = -\frac{S_0 + 2m}{S_0 + 2m(n+1)} \mathbf{P}(\mathbf{x}_n).$$
Clearly $\mathbf{R}_n \in \mathcal{F}_n$, and similarly as at the end of the proof of Lemma 2.6 one can show that $\|\mathbf{R}_n\| \le c/n$ for some constant $c = c(N, S_0, m)$, which implies that $\sum_{n=1}^{\infty} n^{-1} \|\mathbf{R}_n\| < \infty$ a.s.
Finally, to check that $\|\boldsymbol{\xi}_n\|_1 \le 2N$, note that
$$\left| x^i_{n+1} - x^i_n \right| = \left| \frac{X^i_{n+1} - X^i_n - 2m x^i_n}{S_{n+1}} \right| \le \frac{2m}{2m(n+1)} = \frac{1}{n+1},$$
and then use (11).
As in the one-dimensional case, intuitively, trajectories of a stochastic approximation process $\{\mathbf{Z}_n\}_{n \ge 0}$ should approximate the trajectories $\{\mathbf{Z}(t)\}_{t \ge 0}$ of the corresponding ODE $d\mathbf{Z}/dt = \mathbf{F}(\mathbf{Z})$. Moreover, since $\{\mathbf{Z}_n\}_{n \ge 0}$ is a stochastic system, we expect that stable trajectories of the ODE should appear, but unstable trajectories should not.
The main concept in formalizing this intuition is that of an asymptotic pseudotrajectory, introduced by Benaïm and Hirsch [20]. We omit the precise definition, and refer to Benaïm's lecture notes on the topic for more details [18] (see also [49, Section 2.5] for a concise summary). There are many results that give sufficient conditions for a stochastic approximation process to be an asymptotic pseudotrajectory of the corresponding ODE. In particular, [18, Proposition 4.4 and Remark 4.5] (see also [49, Theorem 2.13]), together with Lemma 3.4 and the fact that $\mathbf{P}$ is Lipschitz, imply the following.
Corollary 3.5. Let $\{\mathbf{x}(t)\}_{t \ge 0}$ linearly interpolate $\{\mathbf{x}_n\}_{n \ge 0}$ at nonintegral times. Then $\{\mathbf{x}(t)\}_{t \ge 0}$ is almost surely an asymptotic pseudotrajectory for the flow induced by the vector field $\mathbf{P}$ via the ODE $d\mathbf{y}/dt = \mathbf{P}(\mathbf{y})$.
There are further general results about asymptotic pseudotrajectories that apply to the stochastic approximation process $\{\mathbf{x}_n\}_{n \ge 0}$, e.g., about convergence to attractors and nonconvergence to linearly unstable equilibria. However, we omit these, as we prefer to emphasize the main message of Corollary 3.5. The main point is that in order to understand the stochastic approximation process $\{\mathbf{x}_n\}_{n \ge 0}$, we need to understand the vector field $\mathbf{P}$, and the corresponding ODE
$$\frac{d\mathbf{y}}{dt} = \mathbf{P}(\mathbf{y}).$$
Unfortunately, understanding the behavior of such nonlinear ODEs is a notoriously difficult subject (see, e.g., the book by Hirsch, Smale and Devaney [32]). The most successful tool in this area is Lyapunov theory (see, e.g., the recent preprint [19]), and this can indeed be applied to our problem for special values of the parameters; however, it seems difficult to apply this theory to the vector field $\mathbf{P}$ for generic values of the parameters $\left\{ p^i_{\mathbf{u}} \right\}_{\mathbf{u}, i}$.
For instance, if $\mathbf{P}$ is a gradient, i.e., $\mathbf{P} = \nabla V$ for some $V : \mathbb{R}^N \to \mathbb{R}$, then Corollary 3.5 and general results about asymptotic pseudotrajectories (see [18]) imply that $\mathbf{x}_n$ converges almost surely and the limit is a point in the zero set $Z_{\mathbf{P}}$, which, by Lemma 3.3, implies that Conjecture 3.2 holds. An example of when $\mathbf{P}$ is a gradient is when the probability of the new node adopting type $i$ depends only on the proportion of type $i$ connections, i.e., $p^i_{\mathbf{u}} = \varphi\left( u^i / m \right)$ for some function $\varphi$ which does not depend on $i$. It is not difficult to show that $\varphi$ must be of the form $\varphi(z) = \frac{\alpha}{N} + (1 - \alpha) z$ for some $0 \le \alpha \le 1$, which corresponds to a mixture of the linear model and a uniformly random choice. In this case $\mathbf{P}(\mathbf{y}) = \frac{\alpha}{2} \left( \frac{1}{N} \mathbf{1} - \mathbf{y} \right)$, where $\mathbf{1} \in \mathbb{R}^N$ is the vector with all entries equal to 1, and thus when $\alpha > 0$ then $\mathbf{a}_n$ converges a.s. to $\frac{1}{N} \mathbf{1}$.
However, for generic parameter values $\mathbf{P}$ will not be a gradient. To see this, note that $\mathbf{P}$ being a gradient implies that
$$\frac{\partial (\mathbf{P}(\mathbf{y}))^i}{\partial y^j} = \frac{\partial (\mathbf{P}(\mathbf{y}))^j}{\partial y^i} \tag{12}$$
for every $i \neq j$. Without any restrictions, there are $(N-1)\binom{m+N-1}{N-1}$ free parameters in $\mathbf{P}$. The gradient condition (12) imposes an additional $\binom{N}{2}$ constraints, which will not be satisfied for generic parameter values.
4 Open problems and future directions
Our paper leaves open several interesting problems. Two immediate open questions concerning our model
are the following.
Limiting distribution in the linear model for two types. Our Theorem 1.1 gives us information about the limiting behavior of $\{a_n\}_{n \ge 0}$, but it does not identify the distribution of $a := \lim_{n \to \infty} a_n$.
For $m = 1$ the process $\{x_n\}_{n \ge 0}$ corresponds to a Pólya urn where whenever one draws a ball, one puts back two extra balls of the same color. This is because when a new node joins the graph, its color automatically becomes the color of its initial neighbor. Thus the distribution of $x$ (and by Lemma 2.1 the distribution of $a$ as well) is the Beta distribution with parameters $\frac{X_0}{2}$ and $\frac{Y_0}{2}$.
However, for m > 1 we do not know what the limiting distribution is. Note that simulations show that
the limiting distribution can be bimodal; see, e.g., Figure 2b.
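The $m = 1$ statement above is easy to check by simulation; the following is a minimal Monte Carlo sketch (ours, not from the paper). With $X_0 = Y_0 = 1$ the limit should be Beta$(1/2, 1/2)$, the arcsine law, so the empirical distribution of the red fraction should pile up near 0 and 1.

    import random

    def polya_two_extra(x0_red, y0_blue, n_steps):
        # Polya urn for the m = 1 linear model: the drawn ball is returned together
        # with two extra balls of its color; returns the final red fraction.
        red, total = x0_red, x0_red + y0_blue
        for _ in range(n_steps):
            if random.random() < red / total:
                red += 2
            total += 2
        return red / total

    samples = [polya_two_extra(1, 1, 10**4) for _ in range(500)]
    print(sum(1 for s in samples if s < 0.1 or s > 0.9) / len(samples))

Under Beta$(1/2, 1/2)$ the printed fraction should come out close to 0.41, up to Monte Carlo error.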
Understanding the vector field P. As discussed in Section 3, in order to understand the behavior of the general nonlinear model in the case of multiple types (and in order to prove or disprove Conjecture 3.2), a good understanding of the vector field P and the corresponding ODE dy/dt = P(y) is needed. We leave this as our second open problem.
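Returning to the first question above, the m = 1 claim is easy to check by simulation. The following minimal sketch (Python; the values of X_0, Y_0, the number of draws, and the number of repetitions are arbitrary illustrative choices) runs the urn in which each drawn ball is returned together with two extra balls of its color, and compares the first two moments of the resulting fraction with those of the Beta(X_0/2, Y_0/2) distribution.

import random

def urn_limit_fraction(X0=3, Y0=5, draws=5000, rng=random):
    # One run of the urn: each draw adds two extra balls of the drawn color.
    x, y = X0, Y0
    for _ in range(draws):
        if rng.random() < x / (x + y):
            x += 2
        else:
            y += 2
    return x / (x + y)

random.seed(0)
samples = [urn_limit_fraction() for _ in range(2000)]

# Compare moments with Beta(a, b), where a = X0/2 = 3/2 and b = Y0/2 = 5/2.
a, b = 3 / 2, 5 / 2
mean_beta = a / (a + b)
var_beta = a * b / ((a + b) ** 2 * (a + b + 1))
mean_emp = sum(samples) / len(samples)
var_emp = sum((s - mean_emp) ** 2 for s in samples) / len(samples)
print(mean_emp, mean_beta)   # both close to 0.375
print(var_emp, var_beta)     # both close to 0.0469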
A key property of our model is its simplicity. However, this also means that certain aspects of real-world networks and processes influencing product adoption are simplified or not considered. It would be interesting
to understand the following possible extensions of our model, and, in particular, whether anything can be
said analytically in these extensions.
Changing preferences. In our model, once a node receives a type, that type is then fixed and cannot
change over time. A possible extension of the model is to allow the type of a node to change over time.
This can model changing preferences of individuals, e.g., somebody moving from one mobile phone
provider to another.
Allowing multiple types for a single individual. In our model a node can only have a single
type. This is reasonable in many situations (e.g., an individual typically has only one mobile phone
provider), but modeling other situations might require allowing nodes to simultaneously have multiple
types.
Other network evolution models. The preferential attachment model is a good approximation
of many real-world networks, and it has the advantageous property of being analytically tractable.
How does our model behave under other network evolution models? Can similar results be shown
analytically/experimentally? Are the results robust to small changes in the network evolution model?
Other type adoption mechanisms. Our model incorporates a fairly general type adoption mechanism, but various modifications would be interesting to explore. For instance, in real life choices are often made based on the opinions of specific friends, not just based on aggregate information about one's friends.
Marketing. In essence, our model describes word-of-mouth recommendations, and does not consider marketing efforts by the competing companies, such as advertising. How does incorporating marketing affect the results?
In conclusion, through a simple model we have coupled network evolution and type adoption, leading
to an explanation of coexistence in preferential attachment networks. Exploring various modifications and
extensions of this model, such as those mentioned above, will be crucial in determining the robustness of
this phenomenon, and will help elucidate our understanding of these processes.
Acknowledgments
We thank Erik Bodzsár, György Korniss, Géza Meszéna, Jasmine Nirody, Nathan Ross, Allan Sly, and Isabelle Stanton for helpful discussions.
References
[1] Y.-Y. Ahn, H. Jeong, N. Masuda, and J.D. Noh. Epidemic dynamics of two species of interacting particles on scale-free networks. Physical Review E, 74(6):066113, 2006.
[2] R.M. Anderson and R.M. May. Infectious diseases of humans: Dynamics and control. Oxford University Press, 1991.
[3] T. Antunovic, Y. Dekel, E. Mossel, and Y. Peres. Competing first passage percolation on random regular graphs. arXiv preprint arXiv:1109.2575, 2011.
[4] J. Arndt. Role of Product-Related Conversations in the Diffusion of a New Product. Journal of Marketing Research, 4(3):291–295, 1967.
[5] W.B. Arthur. Competing technologies, increasing returns, and lock-in by historical events. The Economic Journal, 99(394):116–131, 1989.
[6] W.B. Arthur. Positive Feedbacks in the Economy. Scientific American, 262(2):92–99, 1990.
[7] W.B. Arthur. Silicon Valley locational clusters: When do increasing returns imply monopoly? Mathematical Social Sciences, 19(3):235–251, 1990.
[8] W.B. Arthur. Increasing Returns and Path Dependence in the Economy. The University of Michigan Press, 1994.
[9] W.B. Arthur. Complexity and the Economy. Science, 284(5411):107–109, 1999.
[10] W.B. Arthur, Y.M. Ermoliev, and Y.M. Kaniovski. On generalized urn schemes of the Pólya kind. Kibernetika, 19:49–56, 1983.
[11] W.B. Arthur, Y.M. Ermoliev, and Y.M. Kaniovski. Strong laws for a class of path-dependent stochastic processes, with applications. In Proceedings of the International Conference on Stochastic Optimization, pages 287–300. Springer-Verlag, New York, 1984.
[12] W.B. Arthur, Y.M. Ermoliev, and Y.M. Kaniovski. Path-dependent processes and the emergence of macro-structure. European Journal of Operational Research, 30(3):294–303, 1987.
[13] N.T.J. Bailey. The Mathematical Theory of Infectious Diseases and Its Applications. Griffin, London, 1975.
[14] A. Banerjee and D. Fudenberg. Word-of-mouth learning. Games and Economic Behavior, 46(1):1–22, 2004.
[15] A.L. Barabási. Scale-free networks: A decade and beyond. Science, 325(5939):412–413, 2009.
[16] A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[17] A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2008.
[18] M. Benaïm. Dynamics of stochastic approximation algorithms. Séminaire de Probabilités XXXIII, pages 1–68, 1999.
[19] M. Benaïm, I. Benjamini, J. Chen, and Y. Lima. A generalized Pólya's urn with graph based interactions. arXiv preprint arXiv:1211.1247, 2013.
[20] M. Benaïm and M.W. Hirsch. Asymptotic Pseudotrajectories and Chain Recurrent Flows, with Applications. Journal of Dynamics and Differential Equations, 8(1):141–176, 1996.
[21] A. Beutel, B.A. Prakash, R. Rosenfeld, and C. Faloutsos. Interacting viruses in networks: can both survive? In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426–434. ACM, 2012.
[22] F.A. Buttle. Word of mouth: understanding and managing referral marketing. Journal of Strategic Marketing, 6(3):241–254, 1998.
[23] M. Deijfen and R. van der Hofstad. The winner takes it all. arXiv preprint arXiv:1306.6467, 2013.
[24] C. Dellarocas. The digitization of word of mouth: Promise and challenges of online feedback mechanisms. Management Science, 49(10):1407–1424, 2003.
[25] E. Dichter. How Word-of-Mouth Advertising Works. Harvard Business Review, 44(6):147–166, 1966.
[26] G. Ellison and D. Fudenberg. Rules of Thumb for Social Learning. Journal of Political Economy, 101(4):612–643, 1993.
[27] G. Ellison and D. Fudenberg. Word-of-mouth communication and social learning. The Quarterly Journal of Economics, 110(1):93–125, 1995.
[28] J. Goldenberg, B. Libai, and E. Muller. Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth. Marketing Letters, 12(3):211–223, 2001.
[29] T. Gross and B. Blasius. Adaptive coevolutionary networks: a review. Journal of the Royal Society Interface, 5(20):259–271, 2008.
[30] P.M. Herr, F.R. Kardes, and J. Kim. Effects of Word-of-Mouth and Product-Attribute Information on Persuasion: An Accessibility-Diagnosticity Perspective. Journal of Consumer Research, 17(4):454–462, 1991.
[31] B.M. Hill, D. Lane, and W. Sudderth. A strong law for some generalized urn processes. The Annals of Probability, 8(2):214–226, 1980.
[32] M.W. Hirsch, S. Smale, and R.L. Devaney. Differential Equations, Dynamical Systems, and an Introduction to Chaos. Academic Press, 2004.
[33] P. Holme and J. Saramäki. Temporal networks. Physics Reports, 519(3):97–125, 2012.
[34] B. Karrer and M.E.J. Newman. Competing epidemics on complex networks. Physical Review E, 84(3):036106, 2011.
[35] V. Kumar, J.A. Petersen, and R.P. Leone. How Valuable is Word of Mouth? Harvard Business Review, 85(10):139–144, 146, 166, 2007.
[36] V. Kumar, J.A. Petersen, and R.P. Leone. Driving Profitability by Encouraging Customer Referrals: Who, When, and How. Journal of Marketing, 74(5):1–17, 2010.
[37] J. Leskovec, L.A. Adamic, and B.A. Huberman. The Dynamics of Viral Marketing. ACM Transactions on the Web, 1(1):1–39, 2007.
[38] M. Lipsitch, C. Colijn, T. Cohen, W.P. Hanage, and C. Fraser. No coexistence for free: neutral null models for multistrain pathogens. Epidemics, 1(1):2–13, 2009.
[39] A.L. Lloyd and R.M. May. How viruses spread among computers and people. Science, 292(5520):1316–1317, 2001.
[40] R.M. May and A.L. Lloyd. Infection dynamics on scale-free networks. Physical Review E, 64(6):066112, 2001.
[41] M.B. Nevelson and R.Z. Hasminskii. Stochastic Approximation and Recursive Estimation, volume 47 of Translations of Mathematical Monographs. American Mathematical Society, 1976.
[42] M.E.J. Newman. Spread of epidemic disease on networks. Physical Review E, 66(1):016128, 2002.
[43] M.E.J. Newman. Threshold effects for two pathogens spreading on a network. Physical Review Letters, 95(10):108701, 2005.
[44] H. Ohtsuki, C. Hauert, E. Lieberman, and M.A. Nowak. A simple rule for the evolution of cooperation on graphs and social networks. Nature, 441(7092):502–505, 2006.
[45] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86(14):3200–3203, 2001.
[46] R. Pemantle. A time-dependent version of Pólya's urn. Journal of Theoretical Probability, 3(4):627–637, 1990.
[47] R. Pemantle. Nonconvergence to unstable points in urn models and stochastic approximations. The Annals of Probability, 18(2):698–712, 1990.
[48] R. Pemantle. When are touchpoints limits for generalized Pólya urns? Proceedings of the American Mathematical Society, 113(1):235–243, 1991.
[49] R. Pemantle. A survey of random processes with reinforcement. Probability Surveys, 4:1–79, 2007.
[50] B.A. Prakash, A. Beutel, R. Rosenfeld, and C. Faloutsos. Winner takes all: competing viruses or ideas on fair-play networks. In Proceedings of the 21st International Conference on World Wide Web (WWW), pages 1037–1046. ACM, 2012.
[51] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.
[52] M. Shaked and J.G. Shanthikumar. Stochastic Orders. Springer, New York, 2007.
[53] B. Skyrms and R. Pemantle. A dynamic model of social network formation. Proceedings of the National Academy of Sciences, 97(16):9340–9346, 2000.
[54] D.J. Watts. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99(9):5766–5771, 2002.