
An Introduction to Dynamic Games

A. Haurie
J. B. Krawczyk
Contents
Chapter I. Foreword 5
I.1. What are dynamic games? 5
I.2. Origins of this book 5
I.3. What is different in this presentation 6
Part 1. Foundations of Classical Game Theory 7
Chapter II. Elements of Classical Game Theory 9
II.1. Basic concepts of game theory 9
II.2. Games in extensive form 10
II.3. Additional concepts about information 15
II.4. Games in normal form 17
II.5. Exercises 20
Chapter III. Solution Concepts for Noncooperative Games 23
III.1. Introduction 23
III.2. Matrix games 24
III.3. Bimatrix games 32
III.4. Concave m-person games 38
III.5. Correlated equilibria 45
III.6. Bayesian equilibrium with incomplete information 49
III.7. Appendix on Kakutani fixed-point theorem 53
III.8. Exercises 53
Chapter IV. Cournot and Network Equilibria 57
IV.1. Cournot equilibrium 57
IV.2. Flows on networks 61
IV.3. Optimization and equilibria on networks 62
IV.4. A convergence result 69
Part 2. Repeated and Sequential Games 73
Chapter V. Repeated Games and Memory Strategies 75
V.1. Repeating a game in normal form 76
V.2. Folk theorem 79
V.3. Collusive equilibrium in a repeated Cournot game 82
V.4. Exercises 85
Chapter VI. Shapley's Zero-Sum Markov Game 87
VI.1. Process and rewards dynamics 87
VI.2. Information structure and strategies 87
VI.3. Shapley-Denardo operator formalism 89
Chapter VII. Nonzero-sum Markov and Sequential Games 93
VII.1. Sequential games with discrete state and action sets 93
VII.2. Sequential games on Borel spaces 95
VII.3. Application to a stochastic duopoly model 96
Index 101
Bibliography 103
CHAPTER I
Foreword
I.1. What are dynamic games?
Dynamic games are mathematical models of the interaction between independent agents who are controlling a dynamical system. Such situations occur in military conflicts (e.g., a duel between a bomber and a jet fighter), in economic competition (e.g., investments in R&D by computer companies), and in parlor games (chess, bridge). These examples concern dynamical systems, since the actions of the agents (also called players) influence the evolution over time of the state of a system (position and velocity of aircraft, capital of know-how for Hi-Tech firms, positions of remaining pieces on a chess board, etc.). The difficulty in deciding what the behavior of these agents should be stems from the fact that each action an agent takes at a given time will influence the reaction of the opponent(s) at later times. These notes are intended to present the basic concepts and models which have been proposed in the burgeoning literature on game theory for a representation of these dynamic interactions.
I.2. Origins of this book
These notes are based on several courses on Dynamic Games taught by the authors, in different universities and summer schools, to a variety of students in engineering, economics and management science. The notes also use some documents prepared in cooperation with other authors, in particular B. Tolwinski [63] and D. Carlson.
These notes are written for control engineers, economists and management scientists interested in the analysis of multi-agent optimization problems, with a particular emphasis on the modeling of competitive economic situations. The level of mathematics involved in the presentation will not go beyond what is expected to be known by a student specializing in control engineering, quantitative economics or management science. The notes are aimed at last-year undergraduate and first-year graduate students.
Control engineers will certainly observe that we present dynamic games as an extension of optimal control, whereas economists will also see that dynamic games are only a particular aspect of the classical theory of games, which is considered to have been launched by J. Von Neumann and O. Morgenstern in the 1940s¹. The economic models of imperfect competition that we shall repeatedly use as motivating examples have a more ancient origin, since they are all variations on the original Cournot model [10], proposed in the mid-19th century. An interesting domain of application of dynamic games, which is described in these notes, relates to environmental management. The conflict situations occurring in fisheries exploitation by multiple agents, or in policy coordination for achieving global environmental goals (e.g., in the control of a possible global warming effect), are well captured in the realm of this theory.

¹ The book [66] is an important milestone in the history of Game Theory.
The objects studied in this book will be dynamic. The term dynamic comes from the Greek δυναμικός [powerful], from δύναμις [power, strength], and means "of or pertaining to force producing motion"². In an everyday context, dynamic is an attribute of a phenomenon that undergoes a time-evolution. So, in broad terms, dynamic systems are systems that change over time. They may evolve endogenously, like economies or populations, or change their position and velocity, like a car. In the first part of these notes, the dynamic models presented are in discrete time. This means that the mathematical description of the dynamics uses difference equations in the deterministic context, and discrete-time Markov processes in the stochastic one. In the second part of these notes, the models will use a continuous-time paradigm, where the mathematical tools representing dynamics are differential equations and diffusion processes.

² Interestingly, dynasty comes from the same root. See the Oxford English Dictionary.

Therefore the first part of the notes should be accessible, and attractive, to students who have not done advanced mathematics. However, the second part involves some developments which have been written for readers with a stronger mathematical background.
I.3. What is different in this presentation
A course on Dynamic Games, accessible to both control engineering and economics or management science students, requires a specialized textbook. Since we emphasize the detailed description of the dynamics of some specific systems controlled by the players, we have to present rather sophisticated mathematical notions related to systems theory. This presentation of the dynamics must also be accompanied by an introduction to the specific mathematical concepts of game theory. The originality of our approach lies in the mixing of these two branches of applied mathematics.

There are many good books on classical game theory. A nonexhaustive list includes [47], [58], [59], [3], and more recently [22], [19] and [40]. However, they do not introduce the reader to the most general dynamic games. There is a classic book [4] that covers the dynamic game paradigms extensively; however, readers without a strong mathematical background will probably find that book difficult. This text is therefore a modest attempt to bridge the gap.
Part 1
Foundations of Classical Game Theory
CHAPTER II
Elements of Classical Game Theory
Dynamic games constitute a subclass of the mathematical models studied in what is usually called game theory. It is therefore proper to start our exposition with those basic concepts of classical game theory which provide the fundamental thread of the theory of dynamic games. For an exhaustive treatment of most of the definitions of classical game theory see, e.g., [47], [58], [22], [19] and [40].
II.1. Basic concepts of game theory
In a game we deal with many concepts that relate to the interactions between
agents. Below we provide a short and incomplete list of those concepts that will be
further discussed and explained in this chapter.
- Players. They compete in the game. A player¹ can be an individual, or a set of individuals (a team, a corporation, a political party, a nation, a pilot of an aircraft, a captain of a submarine, etc.).

- A move or a decision is a player's action. In the terminology of control theory², a move is the implementation of a player's control.

- Information. Games will be said to have an information structure or pattern depending on what the players know about the game and its history when they decide their moves. The information structure can vary considerably. In some games, the players do not remember what their own and their opponents' actions have been. In other games, the players can remember the current state of the game (a concept to be elucidated later) but not the history of moves that led to this state. In other cases, some players may not know who the competitors are, or even what the rules of the game are (imperfect and incomplete information for sure). Finally, there are games where each player has perfect and complete information, i.e., everything concerning the game and its history is known to each player.

- A player's pure strategy is a rule that associates a player's move with the information available to him at the time when he decides which move to choose.

- A player's mixed strategy is a probability measure on the space of his pure strategies. We can also view a mixed strategy as a random draw of a pure strategy.

- A player's behavioral strategy is a rule which defines a random draw of the admissible move as a function of the information available³. These strategies are intimately linked with mixed strategies, and it was proved early on [33] that the two concepts coincide for many games.

- Payoffs are real numbers measuring the desirability of the possible outcomes of the game, e.g., the amounts of money the players may win or lose. Other names for payoffs are: rewards, performance indices or criteria, utility measures, etc.

The above list refers to elements of games in relatively imprecise common-language terms. More rigorous definitions can be given for the above notions. For this we formulate a game in the realm of decision analysis, where decision trees give a representation of the dependence of outcomes on actions and uncertainties. This will be done in the next section.

¹ Political correctness promotes the usage of the gender-inclusive pronouns "they" and "their". However, in games, we will frequently have to address an individual player's action and distinguish it from a collective action taken by a set of several players. As far as we know, in English, this distinction is only possible through usage of the traditional grammar gender-exclusive pronouns: possessive "his", "her" and personal "he", "she". In this book, to avoid confusion, we will refer to a singular genderless agent as "he" and to the agent's possessions as "his".

² We refer to control theory since, as said earlier, dynamic games can be viewed as a mixture of the control and game paradigms.

³ A similar concept has been introduced in control theory under the name of relaxed controls.
II.2. Games in extensive form
A game in extensive form is defined on a graph. A graph is a set of nodes connected by arcs, as shown in Figure II.1.

[FIGURE II.1. Node and arcs]

In a game tree, the nodes indicate game positions that
correspond to precise histories of play. The arcs correspond to the possible actions of the player who has the right to move in a given position. To be meaningful, the graph representing a game must have the structure of a tree, i.e., a graph where all nodes are connected but there are no cycles. In a tree there is a single node without a parent, called the root, and a set of nodes without descendants, the leaves. There is always a single path from the root to any leaf. The tree represents a sequence of actions and random perturbations which influence the outcome of a game played by a set of players.
[FIGURE II.2. A tree]
II.2.1. Description of moves, information and randomness. A game in extensive form is described by a set of players that includes a particular player called Nature, which always plays randomly. A set of positions of the game corresponds to the nodes of the tree. There is a unique move history leading from the root node to each game position represented by a node. At each node one particular player has the right to move, i.e., he has to select a possible action from an admissible set represented by the arcs emanating from the node; see Figure II.2.

The information that each player disposes of at the nodes where he has to select an action defines the information structure of the game. In general, the player may not know exactly at which node of the tree the game is currently located. More precisely, his information is of the following form: he knows that the current position of the game is within a given subset of nodes; however, he does not know which specific node it is.
This situation will be represented in a game tree as follows:
[FIGURE II.3. Information set]
In Figure II.3 a set of nodes is linked by a dotted line. This will be used to denote
an information set. Notice that there is the same number of arcs emanating from each
node of the set. The player selects an arc, knowing that the node is in the information
set but ignoring which particular node in this set has been reached.
When a player selects a move, this corresponds to selecting an arc of the graph, which defines a transition to a new node, where another player has to select his move, etc. Among the players, Nature plays randomly, i.e., Nature's moves are selected at random.
The game has a stopping rule described by terminal nodes of the tree (the leaves).
Then the players are paid their rewards, also called payoffs.
Figure II.4 shows the extensive form of a two-player one-stage game with simultaneous moves and a random intervention of Nature. We also say that this game has the simultaneous move information structure. It corresponds to a situation where Player 2 does not know which action has been selected by Player 1 and vice versa. In this figure the node marked $D_1$ corresponds to the move of Player 1 and the nodes marked $D_2$ correspond to the moves of Player 2.

The information of the second player is represented by the dotted line linking his nodes. It says that Player 2 does not know which action has been chosen by Player 1. The nodes marked E correspond to Nature's moves. In this particular case we assume that the three possible elementary events are equiprobable. The nodes represented by dark circles are the terminal nodes, where the game stops and the payoffs are collected.
[FIGURE II.4. A game in extensive form: Player 1 chooses between actions $a_1^1$ and $a_1^2$; Player 2, in a single information set linking his two nodes, chooses between $a_2^1$ and $a_2^2$; then Nature (E) picks one of three equiprobable branches (probability 1/3 each) leading to the payoffs.]
This representation of games is inspired by parlor games like chess, poker, bridge, etc., which can, at least theoretically, be correctly described in this framework. In such a context, the randomness of Nature's play is the representation of card or dice draws realized in the course of the game.

The extensive form provides indeed a very detailed description of the game. We stress that if the player knows the node at which the game is located, he knows not only the current state of the game but also remembers the entire game history.

However, the extensive form is rather impractical for analyzing even simple games, because the size of the tree increases very fast with the number of steps. An attempt to provide a complete description of a complex game like bridge, using the extensive form, would lead to a combinatorial explosion. Nevertheless, the extensive form is useful in conceptualizing the dynamic structure of a game. The ordering of the sequence of moves, highlighted by the extensive form, is present in most games.

There is another drawback of the extensive form description. To be represented as nodes and arcs, the histories and actions have to be finite or enumerable. Yet in many models we want to deal with actions and histories that are continuous variables. For such models we need different methods of problem description. Dynamic games will provide us with such methods. Like the extensive form, the theory of dynamic games is about the sequencing of actions and reactions. In dynamic games, however, different mathematical tools are used for the representation of the game dynamics. In particular, differential and/or difference equations are utilized to represent dynamic processes with continuous state and action spaces. To a certain extent, dynamic games do not suffer from many of the extensive form's deficiencies.
II.2.2. Comparing random perspectives. Due to Nature's randomness, the players will have to compare, and choose from, different random perspectives in their decision making. The fundamental decision structure is described in Figure II.5. If the player chooses action $a_1$ he faces a random perspective of expected value 100. If he chooses action $a_2$ he faces a sure gain of 100. If the player is risk neutral he will be indifferent between the two actions. If he is risk averse he will choose action $a_2$; if he is a risk lover he will choose action $a_1$. In order to represent the attitude toward risk of a decision maker, Von Neumann and Morgenstern introduced the concept of cardinal utility [66]. If one accepts the axioms⁴ of utility theory, then a rational player should take the action which leads toward the random perspective with the highest expected utility. This will be called the principle of maximization of expected utility.
⁴ There are several classical axioms (see, e.g., [40]) formalizing the properties of a rational agent's utility function. To avoid the introduction of too many new symbols, some of the axioms are formulated in colloquial rather than mathematical form.

(1) Completeness. Between two utility measures $u_1$ and $u_2$, either $u_1 \ge u_2$ or $u_1 \le u_2$.
(2) Transitivity. Between three utility measures $u_1$, $u_2$ and $u_3$, if $u_1 \ge u_2$ and $u_2 \ge u_3$, then $u_1 \ge u_3$.
(3) Relevance. Only the possible actions are relevant to the decision maker.
(4) Monotonicity, or "a higher probability of the better outcome is always better": if $u_1 > u_2$ and $1 \ge \alpha > \beta \ge 0$, then $\alpha u_1 + (1-\alpha) u_2 > \beta u_1 + (1-\beta) u_2$.
(5) Continuity. If $u_1 > u_2$ and $u_2 > u_3$, then there exists a number $\alpha$ such that $0 \le \alpha \le 1$ and $u_2 = \alpha u_1 + (1-\alpha) u_3$.
(6) Substitution (several axioms). Suppose the decision maker has to choose between two alternative actions to be taken after one of two possible events occurs. If in each event he prefers the first action, then he must also prefer it before he learns which event has occurred.
(7) Interest. There is always something of interest that can happen.

If the above axioms are jointly satisfied, the existence of a utility function is guaranteed. In the rest of this book we will assume that this is the case and that agents are endowed with such utility functions (referred to as VNM utility functions), which they maximize. It is in this sense that the subjects treated in this book are rational agents.
[FIGURE II.5. Decision in uncertainty: at node D, action $a_1$ leads to a chance node E with outcomes 0, 100 and 200, each with probability 1/3 (expected value 100); action $a_2$ leads to a sure gain of 100.]
This solves the problem of comparing random perspectives. However, this also introduces a new way to play the game. A player can set up a random experiment in order to generate his decision. Since he uses utility functions, the principle of maximization of expected utility permits him to compare deterministic action choices with random ones.
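To make the comparison concrete, here is a minimal Python sketch for the decision of Figure II.5; the three utility functions are illustrative assumptions (square root for risk aversion, square for risk loving), not taken from the text:

import math

lottery = [(1/3, 0.0), (1/3, 100.0), (1/3, 200.0)]   # action a1 of Figure II.5
sure_gain = 100.0                                    # action a2

def expected_utility(prospect, u):
    """Expected utility of a random perspective [(probability, reward), ...]."""
    return sum(p * u(w) for p, w in prospect)

for attitude, u in [("risk neutral", lambda w: w),
                    ("risk averse", math.sqrt),
                    ("risk lover", lambda w: w ** 2)]:
    eu1 = expected_utility(lottery, u)
    eu2 = u(sure_gain)
    choice = "a1" if eu1 > eu2 else ("a2" if eu2 > eu1 else "indifferent")
    print(f"{attitude:12s}  EU(a1) = {eu1:10.2f}  EU(a2) = {eu2:10.2f}  -> {choice}")

As expected, the risk-neutral player is indifferent, the risk-averse one picks the sure gain $a_2$, and the risk lover picks the lottery $a_1$.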
As a final reminder of the foundations of utility theory, let us recall that the Von Neumann-Morgenstern utility function is defined up to an affine transformation⁵ of rewards. This says that the player's choices will not be affected if the utilities are modified through an affine transformation.

⁵ An affine transformation is of the form $y = a + bx$.
II.3. Additional concepts about information
What is known by the players who interact in a game is of paramount importance
for what can be considered a solution to the game. Here, we refer briefly to the concepts of complete and perfect information and to other types of information patterns.
II.3.1. Complete and perfect information. The information structure of a game
indicates what is known by each player at the time the game starts and at each of his
moves.
Complete vs incomplete information. Let us consider first the information available to the players when they enter a game. A player has complete information if he knows:
- who the players are;
- the set of actions available to all players;
- all players' possible outcomes.
A game with common knowledge is a game where all players have complete information and all players know that the other players have complete information. This situation is sometimes called the symmetric information case.
Perfect vs imperfect information. We consider now the information available to a player when he decides about a specific move. In a game defined in extensive form, if each information set consists of just one node, then we say that the players have perfect information. If this is not the case, the game is one of imperfect information.

EXAMPLE II.3.1. A game with simultaneous moves, as e.g. the one shown in Figure II.4, is a game of imperfect information.
II.3.2. Perfect recall. If the information structure is such that a player can always
remember all past moves he has selected, and the information he has obtained from
and about the other players, then the game is one of perfect recall. Otherwise it is one
of imperfect recall.
II.3.3. Commitment. A commitment is an action taken by a player that is binding
on him and that is known to the other players. In making a commitment a player can
persuade the other players to take actions that are favorable to him. To be effective, commitments have to be credible. A particular class of commitments is that of threats.
II.3.4. Binding agreement. Binding agreements are restrictions on the possible
actions decided by two or more players, with a binding contract that forces the imple-
mentation of the agreement. Usually, to be binding an agreement requires an outside
authority that can monitor the agreement at no cost and impose on violators sanctions
so severe that cheating is prevented.
II.4. Games in normal form

II.4.1. Playing games through strategies. Let $M = \{1, \dots, m\}$ be the set of players. A pure strategy $\gamma_j$ for Player j is a mapping which transforms the information available to Player j at a decision node, i.e., a position of the game where he is making a move, into his set of admissible actions. We call a strategy vector the m-tuple $\gamma = (\gamma_j)_{j=1,\dots,m}$. Once a strategy is selected by each player, the strategy vector is defined and the game is played as if it were controlled by an automaton⁶.

⁶ The idea of playing games through the use of automata will be discussed in more detail when we present the folk theorem for repeated games in Part 2.
An outcome, expressed in terms of utility to Player j, $j \in M$, is associated with each strategy vector $\gamma$. If we denote by $\Gamma_j$ the set of strategies of Player j, then the game can be represented by the m mappings

$V_j : \Gamma_1 \times \cdots \times \Gamma_j \times \cdots \times \Gamma_m \to \mathbb{R}, \quad j \in M,$

that associate a unique (expected utility) outcome $V_j(\gamma)$ for each player $j \in M$ with a given strategy vector $\gamma \in \Gamma_1 \times \cdots \times \Gamma_j \times \cdots \times \Gamma_m$. One then says that the game is defined in normal or strategic form.
II.4.2. From extensive form to strategic or normal form. We consider a simple two-player game, called matching pennies⁷. The rules of the game are as follows:

The game is played over two stages. At the first stage each player chooses head (H) or tail (T) without knowing the other player's choice. Then they reveal their choices to one another. If the coins do not match, Player 1 wins $5 and Player 2 wins −$5. If the coins match, Player 2 wins $5 and Player 1 wins −$5. At the second stage, the player who lost at stage 1 has the choice of either stopping the game (Q, for quit) or playing another penny-matching round with the same type of payoffs as in the first stage. So his second-stage choices are (Q, H, T).

⁷ This example is borrowed from [22].
This game is represented in extensive form in Figure II.6. A dotted line connects the different nodes forming an information set for a player. The player who has the move is indicated on top of the graph.

In Table II.1 we have identified the 12 different strategies that can be used by each of the two players in the game of matching pennies. Each player moves twice. At the first move the players have no information; at the second move they know the choices made at the first stage.

In this table, each line describes the possible actions of each player, given the information available to this player. For example, line 1 tells that Player 1, having
[FIGURE II.6. The extensive form tree of the matching pennies game]
played H in round 1 and Player 2 having played H in round 1, would play Q at round 2. If Player 1 has played H in round 1 and Player 2 has played T in round 1, then Player 1, knowing this information, would play H at round 2. Clearly this describes a possible course of action for Player 1. The 12 possible courses of action are listed in Table II.1. The situation is similar (actually, symmetrical) for Player 2.

Table II.2 represents the payoff matrix obtained by Player 1 when both players choose one of the 12 possible strategies.
        Strategies of Player 1              Strategies of Player 2
      1st move   2nd move if Player 2     1st move   2nd move if Player 1
                 has played                          has played
                 H        T                          H        T
  1      H       Q        H                  H       H        Q
  2      H       Q        T                  H       T        Q
  3      H       H        H                  H       H        H
  4      H       H        T                  H       T        H
  5      H       T        H                  H       H        T
  6      H       T        T                  H       T        T
  7      T       H        Q                  T       Q        H
  8      T       T        Q                  T       Q        T
  9      T       H        H                  T       H        H
 10      T       T        H                  T       H        T
 11      T       H        T                  T       T        H
 12      T       T        T                  T       T        T

TABLE II.1. List of strategies
        1    2    3    4    5    6    7    8    9   10   11   12
  1    -5   -5   -5   -5   -5   -5    5    5    0    0   10   10
  2    -5   -5   -5   -5   -5   -5    5    5   10   10    0    0
  3   -10    0  -10    0  -10    0    5    5    0    0   10   10
  4   -10    0  -10    0  -10    0    5    5   10   10    0    0
  5     0  -10    0  -10    0  -10    5    5    0    0   10   10
  6     0  -10    0  -10    0  -10    5    5   10   10    0    0
  7     5    5    0    0   10   10   -5   -5   -5   -5   -5   -5
  8     5    5   10   10    0    0   -5   -5   -5   -5   -5   -5
  9     5    5    0    0   10   10  -10    0  -10    0  -10    0
 10     5    5   10   10    0    0  -10    0  -10    0  -10    0
 11     5    5    0    0   10   10    0  -10    0  -10    0  -10
 12     5    5   10   10    0    0    0  -10    0  -10    0  -10

TABLE II.2. Payoff matrix for Player 1
So, the extensive form game given in Figure II.6, played with the strategies indicated above, has now been represented through a 12 × 12 payoff matrix for Player 1 (Table II.2). Since what is gained by Player 1 is lost by Player 2, we say this is a zero-sum game. Hence there is no need here to repeat the payoff matrix construction for Player 2, since it is the negative of the previous one. In a more general situation, i.e., a nonzero-sum game, a specific payoff matrix has to be constructed for each player. The two payoff matrices are of the same dimensions, the numbers of rows and columns corresponding to the numbers of strategies available to Players 1 and 2 respectively.
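The construction of Table II.2 can be checked mechanically. Below is a small Python sketch (our own encoding of the rules stated above, not the authors' code) that enumerates each player's twelve contingent plans in the order of Table II.1 and recomputes Player 1's payoff for every strategy pair:

def stage(m1, m2):
    """Player 1's payoff of one matching stage: +5 if the coins differ, -5 if they match."""
    return 5 if m1 != m2 else -5

def plans(lost_when):
    """The 12 contingent plans (first move, reply to each opponent first move).
    lost_when(first, other) says whether the contingency 'opponent played other'
    is the one in which this player lost stage 1 (and so may also reply Q)."""
    out = []
    for first in "HT":
        for loser_reply in "QHT":
            for winner_reply in "HT":
                reply = {other: loser_reply if lost_when(first, other) else winner_reply
                         for other in "HT"}
                out.append((first, reply))
    return out

P1 = plans(lambda first, other: other == first)   # Player 1 loses stage 1 when coins match
P2 = plans(lambda first, other: other != first)   # Player 2 loses when they differ

def payoff1(s1, s2):
    (f1, r1), (f2, r2) = s1, s2
    p = stage(f1, f2)
    move1, move2 = r1[f2], r2[f1]                 # stage-2 moves prescribed by the plans
    if (move1 if p < 0 else move2) == "Q":        # the stage-1 loser may quit
        return p
    return p + stage(move1, move2)

for s1 in P1:                                     # prints the 12 x 12 matrix of Table II.2
    print(" ".join(f"{payoff1(s1, s2):3d}" for s2 in P2))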
II.4.3. Mixed and behavior strategies.

II.4.3.1. Mixing strategies. Since a player evaluates outcomes according to his VNM utility function (remember the axioms in footnote 4 of Section II.2.2), he can envisage mixing his strategies, i.e., selecting one of them randomly according to a lottery that he will define. This introduces one supplementary chance move into the game description.
For example, if Player j has p pure strategies $\gamma_{jk}$, $k = 1, \dots, p$, he can select the strategy he will play through a lottery which gives probability $x_{jk}$ to the pure strategy $\gamma_{jk}$, $k = 1, \dots, p$. Now the possible choices of action by Player j are elements of the set of all probability distributions

$A_j = \left\{ x_j = (x_{jk})_{k=1,\dots,p} \;\middle|\; x_{jk} \ge 0, \ \sum_{k=1}^p x_{jk} = 1 \right\}.$

We note that the set $A_j$ is compact and convex in $\mathbb{R}^p$. This is important for proving existence of solutions to these games (see Chapter III).
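Operationally, playing a mixed strategy just means adding one chance move that draws a pure strategy according to the lottery $x_j$. A minimal Python sketch (the strategy labels are our own illustrative names, not from the text):

import random

x_j = [0.5, 0.3, 0.2]                                   # a lottery over p = 3 pure strategies
pure_strategies = ["gamma_j1", "gamma_j2", "gamma_j3"]  # hypothetical strategy labels

drawn = random.choices(pure_strategies, weights=x_j, k=1)[0]
print("Player j plays", drawn)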
II.4.3.2. Behavior strategies. A behavior strategy is defined as a mapping which associates, with the information available to Player j at a decision node where he is making a move, a probability distribution over his set of admissible actions.

The difference between mixed and behavior strategies is subtle. With a mixed strategy, the player considers the set of possible pure strategies and picks one, at random, according to a carefully designed lottery. With a behavior strategy, the player decides at each decision node according to a carefully designed lottery, contingent upon the information available at this node. In summary, we can say that a behavior strategy is a strategy that includes randomness at each decision node. A famous theorem [33], which we give without proof, establishes that these two ways of introducing randomness in the choice of actions are equivalent in a large class of games.

THEOREM II.4.1. In an extensive game of perfect recall, all mixed strategies can be represented as behavior strategies.
II.5. Exercises
II.5.1. A game is given in extensive form in Figure II.7. Present the game in
normal form.
[FIGURE II.7. A game in extensive form. Player 1 first chooses action 1 or 2; Player 2 then chooses among actions 1, 2 and 3; the six terminal payoffs, read from left to right, are (1,−1), (2,−2), (3,−3), (4,−7), (5,−5), (6,−6).]
II.5.2. Consider the payoff matrix given in Table II.2. If you know that it represents the payoff matrix of a two-player game, can you create a unique game tree and formulate a unique corresponding extensive form of the game?
CHAPTER III
Solution Concepts for Noncooperative Games
III.1. Introduction
To speak of a solution concept for a game, one needs to deal with the game described in strategic or normal form. A solution to an m-player game will thus be a set of strategy vectors that have attractive properties, expressed in terms of the payoffs received by the players.
It should be clear from Chapter II that a game-theoretic problem can admit different solutions, depending on how the game is defined and, in particular, on what information the players dispose of. In this chapter we propose and discuss different solution concepts for games described in normal form. We shall be mainly interested in noncooperative games, i.e., situations where the players select their strategies independently.
Recall that an m-person game in normal form is defined by the following data: $\{M, (\Gamma_j)_{j \in M}, (V_j)_{j \in M}\}$, where $M = \{1, 2, \dots, m\}$ is the set of players. For each player $j \in M$, $\Gamma_j$ is the set of strategies (also called the strategy space). The symbol $V_j$, $j \in M$, denotes the payoff function that assigns a real number $V_j(\gamma)$ to a strategy vector $\gamma \in \Gamma_1 \times \Gamma_2 \times \cdots \times \Gamma_m$. We shall study different classes of games in normal form.
The first category is constituted by the so-called two-player zero-sum matrix games, which describe conflict situations where there are two players and each of them has a finite choice of pure strategies. Moreover, what one player gains the other player loses, which explains why these games are called zero-sum.

The second category also consists of two-player games, again with a finite pure strategy set for each player, but where the payoffs are not zero-sum. These are the nonzero-sum matrix games, or bimatrix games.

The third category is that of concave games, where the number of players can be more than two and the assumption of finiteness of the action spaces is dropped. This category encompasses the previous classes of matrix and bimatrix games. For concave games we will be able to prove nice existence, uniqueness and stability results for a noncooperative game solution concept called equilibrium.
III.2. Matrix games
III.2.1. Security levels.
DEFINITION III.2.1. A game is zero-sum if the sum of the players' payoffs is always zero. Otherwise the game is nonzero-sum. A two-player zero-sum game is also called a duel.

DEFINITION III.2.2. A two-player zero-sum game in which each player has only a finite number of actions to choose from is called a matrix game.
Let us explore how matrix games can be solved. We number the players 1 and 2 respectively. Conventionally, Player 1 is the maximizer and has m (pure) strategies, say $i = 1, 2, \dots, m$, and Player 2 is the minimizer and has n strategies to choose from, say $j = 1, 2, \dots, n$. If Player 1 chooses strategy i while Player 2 picks strategy j, then Player 2 pays Player 1 the amount $a_{ij}$¹. The set of all possible payoffs that Player 1 can obtain is represented in the form of the m × n matrix A with entries $a_{ij}$, for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, n$. Now, the element in the i-th row and j-th column of the matrix A corresponds to the amount that Player 2 will pay Player 1 if the latter chooses strategy i and the former chooses strategy j. Thus one can say that, in the game under consideration, Player 1 (the maximizer) selects rows of A while Player 2 (the minimizer) selects columns of that matrix. As the result of the play, as said above, Player 2 pays Player 1 the amount of money specified by the element of the matrix in the selected row and column.
EXAMPLE III.2.1. Consider a game defined by the following matrix:

$A = \begin{pmatrix} 3 & 1 & 8 \\ 4 & 10 & 0 \end{pmatrix}$

What strategy should a rational player select?

A first line of reasoning is to consider the players' security levels. It is easy to see that if Player 1 chooses the first row, then, whatever Player 2 does, Player 1 will get a payoff equal to at least 1 (util²). By choosing the second row, on the other hand, Player 1 risks getting 0. Similarly, by choosing the first column Player 2 ensures that he will not have to pay more than 4, while the choice of the second or third column may cost him 10 or 8, respectively. Thus we say that Player 1's security level is 1, which is ensured by the choice of the first row, while Player 2's security level is 4, ensured by the choice of the first column. Notice that

$1 = \max_i \min_j a_{ij}$ and $4 = \min_j \max_i a_{ij}.$

¹ Negative payments are allowed. We could also have said that Player 1 receives the amount $a_{ij}$ and Player 2 receives the amount $-a_{ij}$.

² A util is the utility unit.
From this observation, the strategy which ensures that Player 1 will get at least the
payoff equal to his security level is called his maximin strategy. Symmetrically, the
strategy which ensures that Player 2 will not have to pay more than his security level
is called his minimax strategy.
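These computations are easy to mechanize. A short Python sketch (ours, not the book's) that computes both security levels for the matrix of Example III.2.1:

# Security levels of a matrix game: Player 1 (maximizer) picks rows,
# Player 2 (minimizer) picks columns.
A = [[3, 1, 8],
     [4, 10, 0]]

row_security = [min(row) for row in A]          # worst case of each row
maximin_value = max(row_security)               # Player 1's security level
maximin_row = row_security.index(maximin_value)

cols = list(zip(*A))                            # transpose: the columns of A
col_security = [max(col) for col in cols]       # worst case of each column
minimax_value = min(col_security)               # Player 2's security level
minimax_col = col_security.index(minimax_value)

print(f"Player 1: maximin value {maximin_value} with row {maximin_row + 1}")
print(f"Player 2: minimax value {minimax_value} with column {minimax_col + 1}")
# Prints 1 and 4; since 1 < 4 the game has no saddle point in pure strategies.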
LEMMA III.2.1. In any matrix game the following inequality holds:

(III.1)  $\max_i \min_j a_{ij} \le \min_j \max_i a_{ij}.$

Proof: The proof of this result is based on the remark that, since both security levels are achievable, they necessarily satisfy the inequality (III.1). More precisely, let $(i^*, j^*)$ and $(\bar{i}, \bar{j})$ be defined by

(III.2)  $a_{i^* j^*} = \max_i \min_j a_{ij}$

and

(III.3)  $a_{\bar{i} \bar{j}} = \min_j \max_i a_{ij},$

respectively. Now consider the payoff $a_{i^* \bar{j}}$. For any k and l one has

(III.4)  $\min_j a_{kj} \le a_{kl} \le \max_i a_{il}.$

Then, by construction, and applying (III.4) with $k = i^*$ and $l = \bar{j}$, we get

$\max_i \left( \min_j a_{ij} \right) = \min_j a_{i^* j} \le a_{i^* \bar{j}} \le \max_i a_{i \bar{j}} = \min_j \left( \max_i a_{ij} \right).$

QED.
An important observation is that if Player 1 has to move first and Player 2 acts having seen the move made by Player 1, then the maximin strategy is Player 1's best choice, which leads to the payoff equal to 1. If the situation is reversed and it is Player 2 who moves first, then his best choice will be the minimax strategy and he will have to pay 4. Now the question is what happens if the players move simultaneously. A careful study of the example shows that, when the players move simultaneously, the minimax and maximin strategies are not satisfactory solutions to this game. Notice that the players may try to improve their payoffs by anticipating each other's strategy. As a result we will see a process which, in some cases, does not converge to any stable solution. Such an instability occurs, for example, in the matrix game introduced in Example III.2.1.

Consider now another example.
EXAMPLE III.2.2. Let the matrix game A be given as follows:

$A = \begin{pmatrix} 10 & -15 & 20 \\ 20 & -30 & 40 \\ 30 & -45 & 60 \end{pmatrix}.$

Can we find satisfactory strategy pairs?

It is easy to see that

$\max_i \min_j a_{ij} = \max\{-15, -30, -45\} = -15$

and

$\min_j \max_i a_{ij} = \min\{30, -15, 60\} = -15,$

and that the pair of maximin and minimax strategies is given by $(i, j) = (1, 2)$. That means that Player 1 should choose the first row while Player 2 should select the second column, which will lead to the payoff equal to −15.
In the above example we can see that the players' maximin and minimax strategies solve the game, in the sense that the players will be best off if they use these strategies.
III.2.2. Saddle points. Let us explore in more depth this class of strategies that
has solved the above zero-sum matrix game.
DEFINITION III.2.3. If, in a matrix game $A = [a_{ij}]_{i=1,\dots,m;\, j=1,\dots,n}$, there exists a pair $(i^*, j^*)$ such that, for all $i = 1, \dots, m$ and $j = 1, \dots, n$,

(III.5)  $a_{i j^*} \le a_{i^* j^*} \le a_{i^* j},$

we say that the pair $(i^*, j^*)$ is a saddle point in pure strategies for the matrix game.
As an immediate consequence of that definition we obtain that, at a saddle point of a zero-sum game, the security levels of the two players are equal, i.e.,

$\max_i \min_j a_{ij} = \min_j \max_i a_{ij} = a_{i^* j^*}.$

What is less obvious is the fact that, if the security levels are equal, then there exists a saddle point.
LEMMA III.2.2. If, in a matrix game, the following holds:

$\max_i \min_j a_{ij} = \min_j \max_i a_{ij} = v,$

then the game admits a saddle point in pure strategies.

Proof: Let $i^*$ and $j^*$ be a strategy pair that yields the security level payoff v for Player 1 and Player 2 respectively. We thus have, for all $i = 1, \dots, m$ and $j = 1, \dots, n$,

(III.6)  $a_{i^* j} \ge \min_j a_{i^* j} = \max_i \min_j a_{ij}$

(III.7)  $a_{i j^*} \le \max_i a_{i j^*} = \min_j \max_i a_{ij}.$

Since

$\max_i \min_j a_{ij} = \min_j \max_i a_{ij} = v,$

taking $j = j^*$ in (III.6) and $i = i^*$ in (III.7) gives $a_{i^* j^*} \ge v$ and $a_{i^* j^*} \le v$, so that $a_{i^* j^*} = v$. By (III.6)-(III.7) we then obtain

$a_{i j^*} \le a_{i^* j^*} \le a_{i^* j},$

which is the saddle point condition. QED.
When they exist, saddle point strategies provide a solution to the matrix game problem. Indeed, in Example III.2.2, if Player 1 expects Player 2 to choose the second column, then the first row will be his optimal choice. On the other hand, if Player 2 expects Player 1 to choose the first row, then it will be optimal for him to choose the second column. In other words, neither player can gain anything by unilaterally deviating from his saddle point strategy. At a saddle point, each player's strategy constitutes the best reply the player can make to the strategy choice of his opponent. This observation leads to the following remark.

REMARK III.2.1. Let $(i^*, j^*)$ be a saddle point for a matrix game. Players 1 and 2 cannot improve their payoffs by unilaterally deviating from $i^*$ or $j^*$, respectively. We say that the strategy pair $(i^*, j^*)$ is an equilibrium.

Saddle point strategies, as shown in Example III.2.2, lead to both an equilibrium and a pair of guaranteed payoffs. Therefore such a strategy pair, if it exists, provides a solution to a matrix game which is "good" in the sense that rational players are likely to adopt it. The problem is that a saddle point does not always exist for a matrix game if the players are restricted to the discrete (finite) sets of strategies that index the rows and the columns of the game matrix A. In the next section we will see that the situation improves if one allows the players to mix their strategies.
III.2.3. Mixed strategies. We have already indicated in Chapter II that a player can mix his strategies by resorting to a lottery to decide which strategy to play. A reason to introduce mixed strategies in a matrix game is to enlarge the set of possible choices. We have noticed that, as in Example III.2.1, many matrix games do not possess saddle points in the class of pure strategies. However, Von Neumann proved that saddle point strategy pairs always exist in the class of mixed strategies.

Consider the matrix game defined by an m × n matrix A. (As before, Player 1 has m strategies and Player 2 has n strategies.) A mixed strategy for Player 1 is an m-tuple

$x = (x_1, x_2, \dots, x_m)$

where the $x_i$ are nonnegative for $i = 1, 2, \dots, m$, and $x_1 + x_2 + \cdots + x_m = 1$. Similarly, a mixed strategy for Player 2 is an n-tuple

$y = (y_1, y_2, \dots, y_n)$

where the $y_j$ are nonnegative for $j = 1, 2, \dots, n$, and $y_1 + y_2 + \cdots + y_n = 1$.
Note that a pure strategy can be considered as a particular mixed strategy with one coordinate equal to one and all others equal to zero. The set of possible mixed strategies of Player 1 constitutes a simplex³ in the space $\mathbb{R}^m$. This is illustrated in Figure III.1 for m = 3. Similarly, the set of mixed strategies of Player 2 is a simplex in $\mathbb{R}^n$.

³ A simplex is, by construction, the smallest closed convex set that contains n + 1 given points in $\mathbb{R}^n$.

[FIGURE III.1. The simplex of mixed strategies]
The interpretation of a mixed strategy, say x, is that Player 1 chooses his pure strategy i with probability $x_i$, $i = 1, 2, \dots, m$. Since the two lotteries defining the random draws are independent events, the joint probability that the strategy pair (i, j) be selected is given by $x_i y_j$. Therefore, with each pair of mixed strategies (x, y) we can associate an expected payoff given by the bilinear form in x and y (where the superscript T denotes the transposition operator on a matrix):

$\sum_{i=1}^m \sum_{j=1}^n x_i y_j a_{ij} = x^T A y.$
One of the first important results of game theory, proved in [65], is the following theorem.

THEOREM III.2.1. Any matrix game has a saddle point in the class of mixed strategies, i.e., there exist probability vectors $x^*$ and $y^*$ such that

$\max_x \min_y x^T A y = \min_y \max_x x^T A y = (x^*)^T A y^* = v^*,$

where $v^*$ is called the value of the game.
We shall not repeat the complex proof given by von Neumann. Instead we will show that the search for saddle points can be formulated as a linear programming problem (LP). A well-known duality property in LP implies the saddle point existence result.
III.2.4. Algorithms for the computation of saddle points. Saddle points in mixed strategies can be obtained as solutions of specific linear programs. It is easy to show that, for any matrix game, the following two relations hold⁴:

(III.8)  $v^* = \max_x \min_y x^T A y = \max_x \min_j \sum_{i=1}^m x_i a_{ij}$

(III.9)  $z^* = \min_y \max_x x^T A y = \min_y \max_i \sum_{j=1}^n y_j a_{ij}.$

⁴ Actually this is already a linear programming result. When the vector x is given, the expression $\min_y x^T A y$, with the simplex constraints $y \ge 0$ and $\sum_j y_j = 1$, defines a linear program. The solution of an LP can always be found at an extreme point of the admissible set, and an extreme point of the simplex corresponds to some $y_j = 1$ with the other components equal to 0. Therefore, since Player 1 selects his mixed strategy x expecting the opponent to define his best reply, he can restrict the search for the best reply to the opponent's set of pure strategies.
These two relations imply that the value of the matrix game can be obtained by solving either of the following two linear programs:

(1) Primal problem:

max v
subject to
$v \le \sum_{i=1}^m x_i a_{ij}, \quad j = 1, 2, \dots, n$
$\sum_{i=1}^m x_i = 1$
$x_i \ge 0, \quad i = 1, 2, \dots, m.$

(2) Dual problem:

min z
subject to
$z \ge \sum_{j=1}^n y_j a_{ij}, \quad i = 1, 2, \dots, m$
$\sum_{j=1}^n y_j = 1$
$y_j \ge 0, \quad j = 1, 2, \dots, n.$
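In practice, the primal program can be handed to any LP solver. A minimal sketch using scipy.optimize.linprog (an assumed dependency, not used by the text), applied to the game of Example III.2.1:

# Solve the primal LP of a matrix game. Decision vector: (x_1, ..., x_m, v);
# linprog minimizes, so we minimize -v.
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, 1.0, 8.0],
              [4.0, 10.0, 0.0]])
m, n = A.shape

c = np.zeros(m + 1)
c[-1] = -1.0                                  # maximize v  <=>  minimize -v

# One constraint per column j:  v - sum_i x_i a_ij <= 0.
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)

# Simplex constraint: sum_i x_i = 1.
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])

bounds = [(0, None)] * m + [(None, None)]     # x >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[-1]
print("maximin mixed strategy x* =", x.round(4), " value v* =", round(v, 4))
# For this matrix the value is 32/9, about 3.5556, with x* = (4/9, 5/9).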
The following theorem relates the two programs to each other.

THEOREM III.2.2 (Von Neumann [65]). Any finite two-person zero-sum matrix game A has a value.
Proof: The value $v^*$ of the zero-sum matrix game A is obtained as the common optimal value of the following pair of dual linear programming problems. The respective optimal programs define the saddle-point mixed strategies.

Primal:  max v,  subject to  $x^T A \ge v \mathbf{1}^T$,  $x^T \mathbf{1} = 1$,  $x \ge 0$.
Dual:    min z,  subject to  $A y \le z \mathbf{1}$,  $\mathbf{1}^T y = 1$,  $y \ge 0$.

Here $\mathbf{1} = (1, \dots, 1)^T$ denotes a vector of appropriate dimension with all components equal to 1. One needs to solve only one of the programs; the primal and dual solutions give a pair of saddle point strategies. QED
REMARK III.2.2. Simple n × n games can be solved more easily (see [47]). Suppose A is an n × n matrix game which does not have a saddle point in pure strategies. The players' unique saddle point mixed strategies and the game value are given by:

(III.10)  $x = \dfrac{\mathbf{1}^T A^D}{\mathbf{1}^T A^D \mathbf{1}}$

(III.11)  $y = \dfrac{A^D \mathbf{1}}{\mathbf{1}^T A^D \mathbf{1}}$

(III.12)  $v = \dfrac{\det A}{\mathbf{1}^T A^D \mathbf{1}}$

where $A^D$ is the adjoint (adjugate) matrix of A, $\det A$ the determinant of A, and $\mathbf{1}$ the vector of ones as before.
Let us illustrate the usefulness of the above formulae on the following example.

EXAMPLE III.2.3. We want to solve the matrix game

$A = \begin{pmatrix} 1 & 0 \\ -1 & 2 \end{pmatrix}.$

The game obviously has no saddle point (in pure strategies). The adjoint $A^D$ is

$A^D = \begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}$

and $\mathbf{1}^T A^D = (3 \ \ 1)$, $A^D \mathbf{1} = (2 \ \ 2)^T$, $\mathbf{1}^T A^D \mathbf{1} = 4$, $\det A = 2$. Hence the best mixed strategies for the players are

$x = \left( \tfrac{3}{4}, \tfrac{1}{4} \right), \qquad y = \left( \tfrac{1}{2}, \tfrac{1}{2} \right)$

and the value of the play is $v = \tfrac{1}{2}$.

In other words, in the long run Player 1 is supposed to win 0.5 if he uses the first row 75% of the time and the second row 25% of the time. Player 2's best strategy is to use the first and the second column 50% of the time each, which ensures him a loss of (only) 0.5; using other strategies he would be expected to lose more.
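A direct transcription of (III.10)-(III.12) in Python (a sketch assuming numpy; the adjugate of an invertible matrix is computed as det(A)·A⁻¹):

import numpy as np

A = np.array([[1.0, 0.0],
              [-1.0, 2.0]])

# Adjugate ("adjoint") matrix: A^D = det(A) * inv(A) for invertible A.
AD = np.linalg.det(A) * np.linalg.inv(A)
one = np.ones(2)

denom = one @ AD @ one          # 1^T A^D 1
x = (one @ AD) / denom          # saddle point strategy of Player 1, (III.10)
y = (AD @ one) / denom          # saddle point strategy of Player 2, (III.11)
v = np.linalg.det(A) / denom    # value of the game, (III.12)

print(x, y, v)                  # -> [0.75 0.25] [0.5 0.5] 0.5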
III.3. Bimatrix games

III.3.1. Best reply strategies. We shall now extend the theory that we developed for matrix games to the case of nonzero-sum games. A bimatrix game conveniently represents a two-person nonzero-sum game where each player has a finite set of possible pure strategies. In a bimatrix game there are two players, say Player 1 and Player 2, who have m and n pure strategies to choose from, respectively. Now, if the players select a pair of pure strategies, say (i, j), then Player 1 obtains the payoff $a_{ij}$ and Player 2 obtains $b_{ij}$, where $a_{ij}$ and $b_{ij}$ are given numbers. The payoffs for the two players corresponding to all possible combinations of pure strategies can be represented by two m × n payoff matrices A and B (hence the name), with entries $a_{ij}$ and $b_{ij}$ respectively. When $a_{ij} + b_{ij} = 0$, the game is a zero-sum matrix game. Otherwise, the game is nonzero-sum. As $a_{ij}$ and $b_{ij}$ are the players' payoff matrix entries, this conclusion agrees with Definition III.2.1.

In Remark III.2.1, in the context of (zero-sum) matrix games, we noticed that a pair of saddle point strategies constitutes an equilibrium, since no player can improve his payoff by a unilateral strategic change. Each player has therefore chosen the best reply to the strategy of the opponent. We now examine whether the concept of best reply strategies can also be used to define a solution to a bimatrix game.
EXAMPLE III.3.1. Consider the bimatrix game defined by the following matrices:

$A = \begin{pmatrix} 52 & 44 & 44 \\ 42 & 46 & 39 \end{pmatrix} \qquad B = \begin{pmatrix} 50 & 44 & 41 \\ 42 & 49 & 43 \end{pmatrix}.$

Examine whether there are strategy pairs that constitute best replies to each other.
It is often convenient to combine the data contained in the two matrices and write it in the form of one matrix whose entries are the ordered pairs $(a_{ij}, b_{ij})$. In this case one obtains

$\begin{pmatrix} (52, 50)^* & (44, 44) & (44, 41) \\ (42, 42) & (46, 49)^* & (39, 43) \end{pmatrix}.$

Consider the two cells indicated by an asterisk. Notice that they correspond to outcomes resulting from best reply strategies. Indeed, at cell (1,1), if Player 2 sticks to the first column, Player 1 can only worsen his payoff, from 52 to 42, by moving to row 2. Similarly, if Player 1 plays row 1, Player 2 can gain only 44 or 41, instead of 50, by abandoning column 1. Similar reasoning shows that the payoffs at cell (2,2) result from another pair of best reply strategies.
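The same best reply check can be mechanized. A small Python sketch (the helper name pure_equilibria is ours): a cell (i, j) qualifies when $a_{ij}$ is maximal in its column of A and $b_{ij}$ is maximal in its row of B:

def pure_equilibria(A, B):
    """Pure strategy equilibria of a bimatrix game, as 1-based (row, column) pairs."""
    m, n = len(A), len(A[0])
    eqs = []
    for i in range(m):
        for j in range(n):
            best_for_1 = all(A[i][j] >= A[k][j] for k in range(m))   # row i is a best reply
            best_for_2 = all(B[i][j] >= B[i][l] for l in range(n))   # column j is a best reply
            if best_for_1 and best_for_2:
                eqs.append((i + 1, j + 1))
    return eqs

A = [[52, 44, 44], [42, 46, 39]]
B = [[50, 44, 41], [42, 49, 43]]
print(pure_equilibria(A, B))   # -> [(1, 1), (2, 2)], the two asterisked cells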
III.3.2. Nash equilibria. We can see from Example III.3.1 that a bimatrix game can have many best reply strategy pairs. However, there are other examples where no such pairs can be found in pure strategies. So, as we did with (zero-sum) matrix games, we expand the strategy sets to include mixed strategies. The Nash equilibrium solution concept for bimatrix games is defined as follows.
DEFINITION III.3.1. A pair of mixed strategies $(x^*, y^*)$ is said to be a Nash equilibrium of the bimatrix game if

(1) $(x^*)^T A y^* \ge x^T A y^*$ for every mixed strategy x, and
(2) $(x^*)^T B y^* \ge (x^*)^T B y$ for every mixed strategy y.
Notice that this definition simply says that, at an equilibrium, no player can improve his payoff by deviating unilaterally from his equilibrium strategy.

REMARK III.3.1. The Nash equilibrium extends to nonzero-sum games the equilibrium property that was observed for the saddle point solution of a zero-sum matrix game. The big difference with the saddle point concept is that, in a nonzero-sum context, the equilibrium strategy of a player does not guarantee him that he will receive at least the equilibrium payoff. Indeed, if his opponent does not play "well", i.e., does not use the equilibrium strategy, the outcome for a player can be anything. There is no guarantee on the payoff; one can only hope that the other player is rational and, in his own interest, will play his best reply strategy.

Another important step in the development of the theory of games was the following theorem [42].

THEOREM III.3.1. Every finite bimatrix game has at least one Nash equilibrium in mixed strategies.

Proof: A general existence proof for equilibria in concave games, a class of games that includes bimatrix games, based on the Kakutani fixed point theorem, will be given in the next section. QED
III.3.3. Shortcomings of the Nash equilibrium concept.

III.3.3.1. Multiple equilibria. As noticed in Example III.3.1, a bimatrix game may have several equilibria in pure strategies. There may be additional equilibria in mixed strategies as well. The nonuniqueness of Nash equilibria in bimatrix games is a serious theoretical and practical problem. In Example III.3.1 one equilibrium strictly dominates the other, i.e., it gives both players higher payoffs. Thus it can be argued that, even without any consultations, the players will naturally pick the strategy pair corresponding to matrix entry $(i, j) = (1, 1)$. However, it is easy to define examples where the situation is not so clear.

EXAMPLE III.3.2. Consider the following bimatrix game:

$\begin{pmatrix} (2, 1)^* & (0, 0) \\ (0, 0) & (1, 2)^* \end{pmatrix}$

It is easy to see that this game⁵ has two equilibria (in pure strategies), neither of which dominates the other. Moreover, Player 1 will obviously prefer the solution (1, 1), while Player 2 would rather have (2, 2). It is difficult to decide how this game should be played if the players are to arrive at their decisions independently of one another.

⁵ The above example is classical in game theory and known as the battle of the sexes game. In an American and rather sexist context, the rows represent the woman's choices between going to the theater and the football match, while the columns are the man's choices between the same events. In fact, we can well understand what a mixed-strategy solution means for this example: the couple will be happy if they go to the theater and the match in alternate weeks.
III.3.3.2. The prisoner's dilemma. There is a famous example of a bimatrix game that is used in many contexts to argue that the Nash equilibrium solution is not always a good solution to a noncooperative game.

EXAMPLE III.3.3. Suppose that two suspects are held on the suspicion of committing a serious crime. Each of them can be convicted only if the other provides evidence against him; otherwise he will be convicted of a lesser charge. However, by agreeing to give evidence against the other suspect, a suspect can shorten his sentence by half. Of course, the prisoners are held in separate cells and cannot communicate with each other. The situation is as described in Table III.1, with the entries giving the length of the prison sentence for each suspect in every possible situation. Notice that in this case the players are assumed to minimize rather than maximize the outcome of the play.

                          Suspect II:
 Suspect I:           refuses      agrees to testify
 refuses              (2, 2)       (10, 1)
 agrees to testify    (1, 10)      (5, 5)

TABLE III.1. The prisoner's dilemma.
The unique Nash equilibrium of this game is given by the pair of pure strategies (agree-to-testify, agree-to-testify), with the outcome that both suspects will spend five years in prison. This outcome is strictly dominated by the strategy pair (refuse-to-testify, refuse-to-testify), which however is not an equilibrium and thus is not a realistic solution of the problem when the players cannot make binding agreements.

The above example shows that Nash equilibria can result in outcomes that are very far from efficient.
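With minimizing players the best reply test simply flips to a minimum. A self-contained Python sketch for Table III.1 (our own encoding of the sentences):

# Pure equilibria of the prisoner's dilemma with MINIMIZING players:
# cell (i, j) is an equilibrium when each sentence is minimal against the other's choice.
years_1 = [[2, 10], [1, 5]]    # Suspect I's sentence; rows: refuses, agrees to testify
years_2 = [[2, 1], [10, 5]]    # Suspect II's sentence; columns likewise

equilibria = [(i + 1, j + 1)
              for i in range(2) for j in range(2)
              if all(years_1[i][j] <= years_1[k][j] for k in range(2))
              and all(years_2[i][j] <= years_2[i][l] for l in range(2))]
print(equilibria)   # -> [(2, 2)]: both agree to testify and spend five years in prison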
III.3.4. Algorithms for the computation of Nash equilibria in bimatrix games. Linear programming is closely associated with the characterization and computation of saddle points in matrix games. For bimatrix games one has to rely on algorithms solving either quadratic programming or complementarity problems, which we define below. There are also a few algorithms (see [3], [47]) which permit us to find an equilibrium of simple bimatrix games. We will show one for a 2 × 2 bimatrix game and then introduce the quadratic programming [38] and complementarity problem [34] formulations.
III.3.4.1. Equilibrium computation in a 2 × 2 bimatrix game. For a simple 2 × 2 bimatrix game one can easily find a mixed strategy equilibrium, as shown in the following example.

EXAMPLE III.3.4. Consider the game with the payoff matrix given below:

$\begin{pmatrix} (1, 0) & (0, 1) \\ (\frac{1}{2}, \frac{1}{3}) & (1, 0) \end{pmatrix}$

Compute a mixed strategy equilibrium.

First, notice that this game has no pure strategy equilibrium.

Assume Player 2 chooses his equilibrium strategy y (i.e., 100y% of the time he uses the first column and 100(1−y)% of the time the second column) in such a way that Player 1 will, in equilibrium, get as much payoff using the first row as using the second row, i.e.,

$1 \cdot y + 0 \cdot (1 - y) = \frac{1}{2} y + 1 \cdot (1 - y).$

This is true for $y^* = \frac{2}{3}$.

Symmetrically, assume Player 1 uses a strategy x (i.e., 100x% of the time he uses the first row and 100(1−x)% of the time the second row) such that Player 2 gets as much payoff using the first column as using the second column, i.e.,

$0 \cdot x + \frac{1}{3}(1 - x) = 1 \cdot x + 0 \cdot (1 - x).$

This is true for $x^* = \frac{1}{4}$. The players' payoffs will be, respectively, $\frac{2}{3}$ and $\frac{1}{4}$. Then the pair of mixed strategies

$\left( (x^*, 1 - x^*), \ (y^*, 1 - y^*) \right)$

is an equilibrium in mixed strategies.
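The indifference computation generalizes to any 2 × 2 bimatrix game with a fully mixed equilibrium. A Python sketch with exact arithmetic (the closed-form expressions below are just the indifference equations above, solved for x and y):

from fractions import Fraction as F

A = [[F(1), F(0)], [F(1, 2), F(1)]]   # Player 1's payoffs
B = [[F(0), F(1)], [F(1, 3), F(0)]]   # Player 2's payoffs

# Player 2 mixes columns (y, 1-y) so that Player 1 is indifferent between rows:
#   A[0][0]*y + A[0][1]*(1-y) = A[1][0]*y + A[1][1]*(1-y)
y = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])

# Player 1 mixes rows (x, 1-x) so that Player 2 is indifferent between columns:
x = (B[1][1] - B[1][0]) / (B[0][0] - B[0][1] - B[1][0] + B[1][1])

v1 = A[0][0] * y + A[0][1] * (1 - y)   # Player 1's equilibrium payoff
v2 = B[0][0] * x + B[1][0] * (1 - x)   # Player 2's equilibrium payoff
print(f"x* = {x}, y* = {y}, payoffs = ({v1}, {v2})")   # x* = 1/4, y* = 2/3, (2/3, 1/4)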
III.3.4.2. Links between quadratic programming and Nash equilibria in bimatrix games. Mangasarian and Stone (1964) proved the following result, which links quadratic programming with the search for equilibria in bimatrix games. Consider a bimatrix game (A, B). We associate with it the quadratic program

(III.13)  $\max \ \left[ x^T A y + x^T B y - v_1 - v_2 \right]$

subject to

(III.14)  $A y \le v_1 \mathbf{1}_m$
(III.15)  $B^T x \le v_2 \mathbf{1}_n$
(III.16)  $x, y \ge 0$
(III.17)  $x^T \mathbf{1}_m = 1$
(III.18)  $y^T \mathbf{1}_n = 1$
(III.19)  $v_1, v_2 \in \mathbb{R}.$
LEMMA III.3.1. The following two assertions are equivalent:

(i) $(x, y, v_1, v_2)$ is a solution to the quadratic programming problem (III.13)-(III.19);
(ii) $(x, y)$ is an equilibrium for the bimatrix game.

Proof: From the constraints it follows that $x^T A y \le v_1$ and $x^T B y \le v_2$ for any feasible $(x, y, v_1, v_2)$. Hence the maximum of the program is at most 0. Assume that (x, y) is an equilibrium for the bimatrix game. Then the quadruple

$(x, y, v_1 = x^T A y, v_2 = x^T B y)$

is feasible, i.e., satisfies (III.14)-(III.19); moreover, it gives the value 0 to the objective function (III.13). Hence the equilibrium defines a solution to the quadratic programming problem (III.13)-(III.19).
Conversely, let (x

, y

, v
1

, v
2

) be a solution to the quadratic programming problem


(III.13)- (III.19). We know that an equilibrium exists for a bimatrix game (Nash theo-
rem). We know that this equilibrium is a solution to the quadratic programming prob-
lem (III.13)-(III.19) with optimal value 0. Hence the optimal program (x

, y

, v
1

, v
2

)
must also give a value 0 to the objective function and thus be such that
x
T

Ay

+ x
T

By

= v
1

+ v
2

. (III.20)
For any $x \ge 0$ and $y \ge 0$ such that $x^T \mathbf{1}_m = 1$ and $y^T \mathbf{1}_n = 1$ we have, by (III.14)-(III.15) together with (III.17)-(III.18),

  x^T A y^* \le v_1^*, \qquad x^{*T} B y \le v_2^*.

In particular we must have

  x^{*T} A y^* \le v_1^*, \qquad x^{*T} B y^* \le v_2^*.

These two conditions with (III.20) imply

  x^{*T} A y^* = v_1^*, \qquad x^{*T} B y^* = v_2^*.

Therefore we can conclude that, for any $x \ge 0$ and $y \ge 0$ such that $x^T \mathbf{1}_m = 1$ and $y^T \mathbf{1}_n = 1$,

  x^T A y^* \le x^{*T} A y^*, \qquad x^{*T} B y \le x^{*T} B y^*,

and hence $(x^*, y^*)$ is a Nash equilibrium for the bimatrix game. QED
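As a numerical illustration, the program (III.13)-(III.19) can be fed to a general nonlinear solver. The sketch below uses scipy's SLSQP method on the $2 \times 2$ game of Example III.3.4. Since the objective is bilinear, hence nonconvex, the solver is only guaranteed to find a local solution; an optimal value of 0 is what certifies that the $(x, y)$ part is an equilibrium:

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.0], [0.5, 1.0]])        # Player 1's payoffs
B = np.array([[0.0, 1.0], [1.0 / 3.0, 0.0]])  # Player 2's payoffs
m, n = A.shape

def neg_obj(z):
    # z = (x, y, v1, v2); we minimize the negative of (III.13).
    x, y, v1, v2 = z[:m], z[m:m + n], z[-2], z[-1]
    return -(x @ A @ y + x @ B @ y - v1 - v2)

cons = [
    {'type': 'ineq', 'fun': lambda z: z[-2] - A @ z[m:m + n]},  # (III.14)
    {'type': 'ineq', 'fun': lambda z: z[-1] - B.T @ z[:m]},     # (III.15)
    {'type': 'eq',   'fun': lambda z: z[:m].sum() - 1.0},       # (III.17)
    {'type': 'eq',   'fun': lambda z: z[m:m + n].sum() - 1.0},  # (III.18)
]
bounds = [(0, None)] * (m + n) + [(None, None)] * 2             # (III.16), (III.19)

z0 = np.array([0.5, 0.5, 0.5, 0.5, 1.0, 1.0])  # uniform starting point
res = minimize(neg_obj, z0, bounds=bounds, constraints=cons)
print(res.x[:m], res.x[m:m + n], -res.fun)     # equilibrium iff the value is 0
```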
III.3.4.3. A complementarity problem formulation. We have seen that the search for equilibria could be done through solving a quadratic programming problem. Here we show that the solution of a bimatrix game can also be obtained as the solution of a complementarity problem.

There is no loss in generality if we assume that the payoff matrices are $m \times n$ and have only positive entries ($A > 0$ and $B > 0$). This is not restrictive, since VNM utilities are defined up to an increasing affine transformation. A strategy for Player 1 is defined as a vector $x \in \mathrm{IR}^m$ that satisfies

  x \ge 0  (III.21)
  x^T \mathbf{1}_m = 1  (III.22)

and similarly for Player 2

  y \ge 0  (III.23)
  y^T \mathbf{1}_n = 1.  (III.24)

It is easily shown that the pair $(x^*, y^*)$ satisfying (III.21)-(III.24) is an equilibrium iff

  (x^{*T} A y^*) \mathbf{1}_m \ge A y^* \quad (A > 0)
  (x^{*T} B y^*) \mathbf{1}_n \ge B^T x^* \quad (B > 0)  (III.25)

i.e., if the equilibrium condition is satisfied for pure strategy alternatives only.

Consider the following set of constraints with $v_1 \in \mathrm{IR}$ and $v_2 \in \mathrm{IR}$:

  v_1 \mathbf{1}_m \ge A y^*, \quad v_2 \mathbf{1}_n \ge B^T x^*, \quad \text{and} \quad x^{*T}(A y^* - v_1 \mathbf{1}_m) = 0, \quad y^{*T}(B^T x^* - v_2 \mathbf{1}_n) = 0.  (III.26)

The relations on the right are called complementarity constraints. For mixed strategies $(x^*, y^*)$ satisfying (III.21)-(III.24), they simplify to $x^{*T} A y^* = v_1$, $x^{*T} B y^* = v_2$. This shows that the above system (III.26) of constraints is equivalent to the system (III.25).

Define $s_1 = x / v_2$, $s_2 = y / v_1$ and introduce the slack variables $u_1$ and $u_2$; the system of constraints (III.21)-(III.24) and (III.26) can be rewritten

  \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} \mathbf{1}_m \\ \mathbf{1}_n \end{pmatrix} - \begin{pmatrix} 0 & A \\ B^T & 0 \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}  (III.27)
  0 = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}^T \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}  (III.28)
  0 \le \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}  (III.29)
  0 \le \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}.  (III.30)
Introducing the obvious new variables $u$, $q$, $M$ and $s$ permits us to rewrite (III.27)-(III.30) in the generic formulation

  u = q + M s  (III.31)
  0 = u^T s  (III.32)
  u \ge 0  (III.33)
  s \ge 0  (III.34)

of a so-called complementarity problem.

A pivoting algorithm ([34], [35]) has been proposed to solve such problems. This algorithm applies also to quadratic programming, so this confirms that the solution of a bimatrix game is of the same level of difficulty as solving a quadratic programming problem.

REMARK III.3.2. Once we obtain $s_1$ and $s_2$, solution to (III.27)-(III.30), we shall have to reconstruct the strategies through the formulae

  x = s_1 / (s_1^T \mathbf{1}_m)  (III.35)
  y = s_2 / (s_2^T \mathbf{1}_n).  (III.36)
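The construction (III.27)-(III.36) is easy to check numerically. The sketch below builds the data $q$ and $M$ of the generic formulation (matching $u = q + Ms$ gives $q = \mathbf{1}$ and $M = -\begin{pmatrix} 0 & A \\ B^T & 0 \end{pmatrix}$), verifies the complementarity conditions at a known equilibrium, and recovers the strategies via (III.35)-(III.36); the game data and its equilibrium are illustrative assumptions (in practice $s$ would come out of Lemke's pivoting scheme):

```python
import numpy as np

# A hypothetical 2x2 bimatrix game with positive entries.
A = np.array([[3.0, 1.0], [2.0, 4.0]])
B = np.array([[2.0, 3.0], [4.0, 1.0]])
m, n = A.shape

# Generic LCP data: u = q + M s with q = 1 and M = -[[0, A], [B^T, 0]].
q = np.ones(m + n)
M = -np.block([[np.zeros((m, m)), A], [B.T, np.zeros((n, n))]])

def solves_lcp(s, tol=1e-9):
    """Check (III.31)-(III.34): u = q + M s, u >= 0, s >= 0, u^T s = 0."""
    u = q + M @ s
    return u.min() >= -tol and s.min() >= -tol and abs(u @ s) <= tol

# Known mixed equilibrium of this game (both players indifferent): (3/4, 1/4).
x, y = np.array([0.75, 0.25]), np.array([0.75, 0.25])
v1, v2 = x @ A @ y, x @ B @ y
s = np.concatenate([x / v2, y / v1])   # s1 = x / v2, s2 = y / v1
print(solves_lcp(s))                   # True

s1, s2 = s[:m], s[m:]
print(s1 / s1.sum(), s2 / s2.sum())    # recovers x and y, as in (III.35)-(III.36)
```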
III.4. Concave m-person games
The nonuniqueness of equilibria in bimatrix games and, a fortiori, in m-player matrix games poses a delicate problem. If there are many equilibria, in a situation where one assumes that the players cannot communicate or enter into preplay negotiations, how will a given player choose among the different strategies corresponding to the different equilibrium candidates? In single agent optimization theory we know that strict concavity of the (maximized) objective function and compactness and convexity of the constraint set lead to existence and uniqueness of the solution. The following question thus arises:

  Can we generalize the mathematical programming approach to a situation where the optimization criterion is a Nash-Cournot equilibrium? Can we then give sufficient conditions for existence and uniqueness of an equilibrium solution?

The answer has been given by Rosen in a seminal paper [54] dealing with concave m-person games.

A concave m-person game is described in terms of individual strategies represented by vectors in compact subsets of Euclidean spaces ($\mathrm{IR}^{m_j}$ for Player $j$) and by payoffs represented, for each player, by a continuous function which is concave w.r.t. his own strategic variable. This is a generalization of the concept of mixed strategies introduced in previous sections. Indeed, in a matrix or a bimatrix game the mixed strategies of a player are represented as elements of a simplex, i.e., a compact convex set, and the payoffs are bilinear or multilinear forms of the strategies; hence, for each player the payoff is concave w.r.t. his own strategic variable. This structure is thus generalized in two ways: (i) the strategies can be vectors constrained to be in a more general compact convex set and (ii) the payoffs are represented by more general continuous-concave functions.
Let us thus introduce the following game in strategic form:

Each player $j \in M = \{1, \ldots, m\}$ controls the action $u_j \in U_j$, where $U_j$ is a compact convex subset of $\mathrm{IR}^{m_j}$, with $m_j$ a given integer. Player $j$ receives a payoff $\psi_j(u_1, \ldots, u_j, \ldots, u_m)$ that depends on the actions chosen by all the players. One assumes that the reward function $\psi_j : U_1 \times \cdots \times U_j \times \cdots \times U_m \to \mathrm{IR}$ is continuous in each $u_i$ and concave in $u_j$.

A coupled constraint is defined as a proper subset $\mathcal{U}$ of $U_1 \times \cdots \times U_j \times \cdots \times U_m$. The constraint is that the joint action $u = (u_1, \ldots, u_m)$ must be in $\mathcal{U}$.

DEFINITION III.4.1. An equilibrium, under the coupled constraint $\mathcal{U}$, is defined as a decision m-tuple $(u_1^*, \ldots, u_j^*, \ldots, u_m^*) \in \mathcal{U}$ such that for each player $j \in M$

  \psi_j(u_1^*, \ldots, u_j^*, \ldots, u_m^*) \ge \psi_j(u_1^*, \ldots, u_j, \ldots, u_m^*)  (III.37)
  for all $u_j \in U_j$ s.t. $(u_1^*, \ldots, u_j, \ldots, u_m^*) \in \mathcal{U}$.  (III.38)
REMARK III.4.1. The consideration of a coupled constraint is a new feature. Now each player's strategy space may depend on the strategies of the other players. This may look awkward in the context of noncooperative games where the players cannot enter into communication or cannot coordinate their actions. However the concept is mathematically well defined. We shall see later on that it fits very well some interesting aspects of environmental management. One can think for example of a global emission constraint that is imposed on a finite set of firms that are competing on the same market. This environmental example will be further developed in forthcoming chapters.
III.4.1. Existence of coupled equilibria.

DEFINITION III.4.2. A coupled equilibrium is a vector $u^*$ such that

  \psi_j(u^*) = \max_{u_j} \{ \psi_j(u_1^*, \ldots, u_j, \ldots, u_m^*) \;|\; (u_1^*, \ldots, u_j, \ldots, u_m^*) \in \mathcal{U} \}.  (III.39)

At such a point no player can improve his payoff by a unilateral change in his strategy which keeps the combined vector in $\mathcal{U}$.

Let us first show that an equilibrium is actually defined through a fixed point condition. For that purpose we introduce, for a given weighting $r$, a so-called global reaction function $\theta : \mathcal{U} \times \mathcal{U} \to \mathrm{IR}$ defined by

  \theta(u, v, r) = \sum_{j=1}^m r_j \psi_j(u_1, \ldots, v_j, \ldots, u_m),  (III.40)

where the coefficients $r_j > 0$, $j = 1, \ldots, m$, are arbitrary given positive weights. The precise role of this weighting scheme will be explained later. For the moment we could as well take $r_j \equiv 1$. Notice that, even if $u$ and $v$ are in $\mathcal{U}$, the combined vectors $(u_1, \ldots, v_j, \ldots, u_m)$ are elements of the larger set $U_1 \times \cdots \times U_m$. This function is continuous in $u$ and concave in $v$ for every fixed $u$. We call it a reaction function since the vector $v$ can be interpreted as composed of the reactions of the different players to the given vector $u$. This function is helpful as shown in the following result.

LEMMA III.4.1. Let $u^* \in \mathcal{U}$ be such that

  \theta(u^*, u^*, r) = \max_{u \in \mathcal{U}} \theta(u^*, u, r).  (III.41)

Then $u^*$ is a coupled equilibrium.

Proof: Assume $u^*$ satisfies (III.41) but is not a coupled equilibrium, i.e., does not satisfy (III.39). Then, for one player, say $\ell$, there would exist a vector

  \bar{u} = (u_1^*, \ldots, \bar{u}_\ell, \ldots, u_m^*) \in \mathcal{U}

such that

  \psi_\ell(u_1^*, \ldots, \bar{u}_\ell, \ldots, u_m^*) > \psi_\ell(u^*).

Then we shall also have $\theta(u^*, \bar{u}, r) > \theta(u^*, u^*, r)$, which is a contradiction to (III.41). QED
This result has two important consequences.

(1) It shows that proving the existence of an equilibrium reduces to proving that a fixed point exists for an appropriately defined reaction mapping ($u^*$ is the best reply to $u^*$ in (III.41));
(2) it associates with an equilibrium an implicit maximization problem, defined in (III.41). We say that this problem is implicit since it is defined in terms of the very solution $u^*$ that it characterizes.

To make the fixed point argument more precise we introduce a coupled reaction mapping.

DEFINITION III.4.3. The point to set mapping

  \Gamma(u, r) = \{ v \;|\; \theta(u, v, r) = \max_{w \in \mathcal{U}} \theta(u, w, r) \}  (III.42)

is called the coupled reaction mapping associated with the positive weighting $r$. A fixed point of $\Gamma(\cdot, r)$ is a vector $u^*$ such that $u^* \in \Gamma(u^*, r)$.

By Lemma III.4.1 a fixed point of $\Gamma(\cdot, r)$ is a coupled equilibrium.

THEOREM III.4.1. For any positive weighting $r$ there exists a fixed point of $\Gamma(\cdot, r)$, i.e., a point $u^*$ s.t. $u^* \in \Gamma(u^*, r)$. Hence a coupled equilibrium exists.

Proof: The proof is based on the Kakutani fixed-point theorem that is given in the Appendix of section III.7. One is required to show that the point to set mapping is upper semicontinuous. This is an easy consequence of the concavity of the game and the compactness of all constraint sets $U_j$, $j = 1, \ldots, m$, and $\mathcal{U}$. QED
REMARK III.4.2. This existence theorem is very close, in spirit, to the theorem of Nash. It uses a fixed-point result which is topological and not constructive, i.e., it does not provide a computational method. However, the definition of a normalized equilibrium introduced by Rosen establishes a link between mathematical programming and concave games with coupled constraints.
III.4.2. Normalized equilibria.

III.4.2.1. Kuhn-Tucker multipliers. Suppose that the coupled constraint set $\mathcal{U}$ can be defined by a set of inequalities

  h_k(u) \ge 0, \quad k = 1, \ldots, p,  (III.43)

where $h_k : U_1 \times \cdots \times U_m \to \mathrm{IR}$, $k = 1, \ldots, p$, are given concave functions. Let us further assume that the payoff functions $\psi_j(\cdot)$ as well as the constraint functions $h_k(\cdot)$ are continuously differentiable and satisfy the constraint qualification conditions, so that Kuhn-Tucker multipliers exist for each of the implicit single agent optimization problems defined below.

Assume all players other than Player $j$ use their strategies $u_i^*$, $i \in M$, $i \ne j$. Then the equilibrium conditions (III.37)-(III.39) define a single agent optimization problem with concave objective function and convex compact admissible set. As usual, we denote $[u_j, u_{-j}^*]$ the decision vector where all players $i$ other than $j$ play $u_i^*$ while Player $j$ uses $u_j$. Under the assumed constraint qualification assumption there exists a vector of Kuhn-Tucker multipliers $\lambda_j = (\lambda_{jk})_{k=1,\ldots,p}$ such that the Lagrangean

  L_j([u_j, u_{-j}^*], \lambda_j) = \psi_j([u_j, u_{-j}^*]) + \sum_{k=1,\ldots,p} \lambda_{jk} h_k([u_j, u_{-j}^*])  (III.44)

verifies, at the optimum,

  0 = \frac{\partial}{\partial u_j} L_j([u_j^*, u_{-j}^*], \lambda_j)  (III.45)
  0 \le \lambda_j  (III.46)
  0 = \lambda_{jk} h_k([u_j^*, u_{-j}^*]), \quad k = 1, \ldots, p.  (III.47)
DEFINITION III.4.4. We say that the equilibrium is normalized if the different multipliers $\lambda_j$, $j \in M$, are colinear with a common vector $\lambda_0$, namely

  \lambda_j = \frac{1}{r_j} \lambda_0,  (III.48)

where the coefficients $r_j > 0$, $j = 1, \ldots, m$, are weights given to the players.

Actually, this common multiplier $\lambda_0$ is associated with the implicit mathematical programming problem

  \max_{u \in \mathcal{U}} \theta(u^*, u, r)  (III.49)

to which we associate the Lagrangean

  L_0(u, \lambda_0) = \sum_{j \in M} r_j \psi_j([u_j, u_{-j}^*]) + \sum_{k=1,\ldots,p} \lambda_{0k} h_k(u)  (III.50)

and the first order necessary conditions

  0 = r_j \frac{\partial}{\partial u_j} \psi_j(u^*) + \sum_{k=1,\ldots,p} \lambda_{0k} \frac{\partial}{\partial u_j} h_k(u^*), \quad j \in M  (III.51)
  0 \le \lambda_0  (III.52)
  0 = \lambda_{0k} h_k(u^*), \quad k = 1, \ldots, p.  (III.53)
III.4.2.2. An economic interpretation. The multiplier, in a mathematical programming framework, can be interpreted as a marginal cost associated with the right-hand side of the constraint. More precisely, it indicates the sensitivity of the optimal solution to marginal changes in this right-hand side. The multiplier also permits a price decentralization in the sense that, through an ad-hoc pricing mechanism, the optimizing agent is induced to satisfy the constraints. In a normalized equilibrium, the shadow cost interpretation is not so apparent; however, the price decomposition principle is still valid. Once the common multiplier has been defined, with the associated weighting $r_j > 0$, $j = 1, \ldots, m$, the coupled constraint will be satisfied by equilibrium seeking players when they use as payoffs the Lagrangeans

  L_j([u_j, u_{-j}], \lambda_j) = \psi_j([u_j, u_{-j}]) + \frac{1}{r_j} \sum_{k=1,\ldots,p} \lambda_{0k} h_k([u_j, u_{-j}]), \quad j = 1, \ldots, m.  (III.54)

The common multiplier thus permits an implicit pricing of the common constraint so that it remains compatible with the equilibrium structure. For this result to be useful, uniqueness of the normalized equilibrium associated with a given weighting $r_j > 0$, $j = 1, \ldots, m$, is needed. In a mathematical programming framework, uniqueness of an optimum results from strict concavity of the objective function to be maximized. In a game structure, uniqueness of the equilibrium will result from a more stringent strict concavity requirement, called by Rosen strict diagonal concavity.
III.4.3. Uniqueness of equilibrium. Let us consider the so-called pseudo-gradient defined as the vector

  g(u, r) = \begin{pmatrix} r_1 \frac{\partial}{\partial u_1} \psi_1(u) \\ r_2 \frac{\partial}{\partial u_2} \psi_2(u) \\ \vdots \\ r_m \frac{\partial}{\partial u_m} \psi_m(u) \end{pmatrix}.  (III.55)

We notice that this expression is composed of the partial gradients of the different payoffs with respect to the decision variables of the corresponding player. We also consider the function

  \sigma(u, r) = \sum_{j=1}^m r_j \psi_j(u).  (III.56)

DEFINITION III.4.5. The function $\sigma(u, r)$ is diagonally strictly concave on $\mathcal{U}$ if, for every $u^1$ and $u^2$ in $\mathcal{U}$, the following holds

  (u^2 - u^1)^T g(u^1, r) + (u^1 - u^2)^T g(u^2, r) > 0.  (III.57)

A sufficient condition for $\sigma(u, r)$ to be diagonally strictly concave is that the symmetric matrix $[G(u, r) + G(u, r)^T]$ be negative definite for any $u$ in $\mathcal{U}$, where $G(u, r)$ is the Jacobian of $g(u, r)$ with respect to $u$.
THEOREM III.4.2. If $\sigma(u, r)$ is diagonally strictly concave on the convex set $\mathcal{U}$, with the assumptions ensuring existence of K.T. multipliers, then for every $r > 0$ there exists a unique normalized equilibrium.

Proof. We sketch below the proof given by Rosen [54]. Assume that for some $r > 0$ we have two equilibria $u^1$ and $u^2$. Then we must have

  h(u^1) \ge 0  (III.58)
  h(u^2) \ge 0  (III.59)

and there exist multipliers $\lambda^1 \ge 0$, $\lambda^2 \ge 0$, such that

  \lambda^{1T} h(u^1) = 0  (III.60)
  \lambda^{2T} h(u^2) = 0  (III.61)

and for which the following holds true for each player $j \in M$

  r_j \nabla_{u_j} \psi_j(u^1) + \lambda^{1T} \nabla_{u_j} h(u^1) = 0  (III.62)
  r_j \nabla_{u_j} \psi_j(u^2) + \lambda^{2T} \nabla_{u_j} h(u^2) = 0.  (III.63)

We multiply (III.62) by $(u^2 - u^1)^T$ and (III.63) by $(u^1 - u^2)^T$ and we sum together to obtain an expression $\beta + \gamma = 0$, where, due to the concavity of the $h_k$ and the conditions (III.58)-(III.61),

  \beta = \sum_{j \in M} \sum_{k=1}^p \{ \lambda_k^1 (u^2 - u^1)^T \nabla_{u_j} h_k(u^1) + \lambda_k^2 (u^1 - u^2)^T \nabla_{u_j} h_k(u^2) \}
       \ge \lambda^{1T} [h(u^2) - h(u^1)] + \lambda^{2T} [h(u^1) - h(u^2)]
       = \lambda^{1T} h(u^2) + \lambda^{2T} h(u^1) \ge 0,  (III.64)

and

  \gamma = \sum_{j \in M} r_j \left[ (u^2 - u^1)^T \nabla_{u_j} \psi_j(u^1) + (u^1 - u^2)^T \nabla_{u_j} \psi_j(u^2) \right].  (III.65)

Since $\sigma(u, r)$ is diagonally strictly concave we have $\gamma > 0$, which contradicts $\beta + \gamma = 0$. QED
III.4.4. A numerical technique. The diagonal strict concavity property that yielded the uniqueness result of Theorem III.4.2 also provides an interesting extension of the gradient method for the computation of the equilibrium. The basic idea is to project, at each step $\ell$, the pseudo-gradient $g(u^\ell, r)$ onto the constraint set $\mathcal{U} = \{u : h(u) \ge 0\}$ (let us call $\bar{g}(u^\ell, r)$ this projection) and to proceed through the usual steepest ascent step

  u^{\ell+1} = u^\ell + \tau_\ell \, \bar{g}(u^\ell, r).

Rosen shows that, at each step, the step size $\tau_\ell > 0$ can be chosen small enough to obtain a reduction of the norm of the projected gradient. This yields convergence of the procedure toward the unique equilibrium.
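The following is a minimal sketch of such an iteration on a small two-player game with box constraints; all data (payoffs, weights $r_j = 1$, constant step size, the bounds) are illustrative assumptions, and the common variant below projects the iterate rather than the gradient, which is convenient when the constraint set is a simple box:

```python
import numpy as np

# Assumed concave game on U = [0, 10]^2:
#   psi_1(u) = -(u1 - 1)^2 - u1*u2,   psi_2(u) = -(u2 - 2)^2 - 0.5*u1*u2
def g(u):                                     # pseudo-gradient, r_j = 1
    u1, u2 = u
    return np.array([-2.0 * (u1 - 1.0) - u2,         # d psi_1 / d u1
                     -2.0 * (u2 - 2.0) - 0.5 * u1])  # d psi_2 / d u2

G = np.array([[-2.0, -1.0], [-0.5, -2.0]])    # Jacobian of g (constant here)
print(np.linalg.eigvalsh(G + G.T))            # all negative => diagonal strict concavity

u, tau = np.array([5.0, 5.0]), 0.1
for _ in range(500):
    u = np.clip(u + tau * g(u), 0.0, 10.0)    # ascent step, then projection on U
print(u)                                      # converges to the unique equilibrium (0, 2)
```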
III.4.5. A variational inequality formulation.

THEOREM III.4.3. Under assumptions IV.1.1 the vector $q^* = (q_1^*, \ldots, q_m^*)$ is a Nash-Cournot equilibrium if and only if it satisfies the following variational inequality

  (q - q^*)^T g(q^*) \le 0,  (III.66)

where $g(q^*)$ is the pseudo-gradient at $q^*$ with weighting 1.

Proof: Apply first order necessary and sufficient optimality conditions for each player and aggregate to obtain (III.66). QED

REMARK III.4.3. The diagonal strict concavity assumption is then equivalent to the property of strong monotonicity of the operator $-g(\cdot)$ in the parlance of variational inequality theory.
III.5. Correlated equilibria
Aumann, [2], has proposed a mechanism that permits the players to enter into
preplay arrangements so that their strategy choices could be correlated, instead of being
independently chosen, while keeping an equilibrium property. He called this type of
solution correlated equilibrium.
III.5.1. Example of a game with correlated equilibria.

EXAMPLE III.5.1. This example has been initially proposed by Aumann [2]. Consider a simple bimatrix game defined as follows:

         c_1      c_2
  r_1   (5, 1)   (0, 0)
  r_2   (4, 4)   (1, 5)

This game has two pure strategy equilibria, $(r_1, c_1)$ and $(r_2, c_2)$, and a mixed strategy equilibrium where each player puts the same probability 0.5 on each possible pure strategy. The respective outcomes are shown in Table III.2.

  (r_1, c_1) : 5, 1
  (r_2, c_2) : 1, 5
  (0.5-0.5, 0.5-0.5) : 2.5, 2.5

  TABLE III.2. Outcomes of the three equilibria

Now, if the players agree to play
by jointly observing a coin flip and playing $(r_1, c_1)$ if the result is head and $(r_2, c_2)$ if it is tail, then they expect the outcome (3, 3), which is a convex combination of the two pure equilibrium outcomes.

By deciding to let a coin flip select the equilibrium to play, a new type of equilibrium has been introduced that yields an outcome located in the convex hull of the set of Nash equilibrium outcomes of the bimatrix game. In Figure III.2 we have represented these different outcomes. The three full circles represent the three Nash equilibrium outcomes. The triangle defined by these three points is the convex hull of the Nash equilibrium outcomes. The empty circle represents the outcome obtained by agreeing to play according to the coin flip mechanism.

FIGURE III.2. The convex hull of Nash equilibria

It is easy to see that this way to play the game defines an equilibrium for the extensive game shown in Figure III.3. This is an expanded version of the initial game where, in a preliminary stage, Nature decides randomly the signal that will be observed by the players⁶. The result of the coin flip is public information, in the sense that
FIGURE III.3. Extensive game formulation with Nature playing first
it is shared by all the players. One can easily check that the correlated equilibrium is a Nash equilibrium for the expanded game.

⁶ Dotted lines represent information sets of Player 2.
         c_1    c_2
  r_1    1/3     0
  r_2    1/3    1/3

  FIGURE III.4. Probabilities of signals
We now push the example one step further by assuming that the players agree to play according to the following mechanism: a random device selects one cell in the game matrix with the probabilities shown in Figure III.4.

When a cell is selected, each player is told to play the corresponding pure strategy. The trick is that a player is told what to play but is not told what the recommendation to the other player is. The information received by each player is not public anymore. More precisely, the three possible signals are $(r_1, c_1)$, $(r_2, c_1)$, $(r_2, c_2)$. When Player 1 receives the signal "play $r_2$" he knows that with probability $\frac{1}{2}$ the other player has been told "play $c_1$" and with probability $\frac{1}{2}$ the other player has been told "play $c_2$". When Player 1 receives the signal "play $r_1$" he knows that with probability 1 the other player has been told "play $c_1$". Consider now what Player 1 can do, if he assumes that the other player plays according to the recommendation. If Player 1 has been told "play $r_2$" and if he plays so he expects

  \frac{1}{2} \cdot 4 + \frac{1}{2} \cdot 1 = 2.5;
if he plays $r_1$ instead he expects

  \frac{1}{2} \cdot 5 + \frac{1}{2} \cdot 0 = 2.5,

so he cannot improve his expected reward. If Player 1 has been told "play $r_1$" and if he plays so he expects 5, whereas if he plays $r_2$ he expects 4. So, for Player 1, obeying the recommendation is the best reply to Player 2's behavior when the latter plays according to the suggestion of the signalling scheme. Now we can repeat the verification for Player 2. If he has been told "play $c_1$" he expects

  \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 4 = 2.5;

if he plays $c_2$ instead he expects

  \frac{1}{2} \cdot 0 + \frac{1}{2} \cdot 5 = 2.5,

so he cannot improve. If he has been told "play $c_2$" he expects 5, whereas if he plays $c_1$ instead he expects 4, so he is better off with the suggested play. So we have checked that an equilibrium property holds for this way of playing the game. All in all, each player expects

  \frac{1}{3} \cdot 5 + \frac{1}{3} \cdot 1 + \frac{1}{3} \cdot 4 = 3 + \frac{1}{3}

from a game played in this way. This is illustrated in Figure III.5, where the black spade shows the expected outcome of this mode of play. Aumann called it a correlated equilibrium. Indeed we can now mix these equilibria and still keep the correlated equilibrium property, as indicated by the dotted line in Figure III.5; also we can construct an expanded game in extensive form for which the correlated equilibrium constructed as above defines a Nash equilibrium (see Exercise 3.6).

FIGURE III.5. The dominating correlated equilibrium

In the above example we have seen that, by expanding the game via the adjunction of a first stage where "Nature" plays and gives information to the players, a new class of equilibria can be reached that dominate, in the outcome space, some of the original Nash equilibria. If the random device gives information which is common to all players, then it permits a mixing of the different pure strategy Nash equilibria and the outcome is in the convex hull of the Nash equilibrium outcomes. If the random device gives information which may differ from one player to the other, then the correlated equilibrium can have an outcome which lies outside of the convex hull of Nash equilibrium outcomes.
III.5.2. A general definition of correlated equilibria. Let us give a general definition of a correlated equilibrium in an m-player normal form game. We shall actually give two definitions. The first one describes the construct of an expanded game with a random device distributing some pre-play information to the players. The second definition, which is valid for m-matrix games, is much simpler although equivalent.

III.5.2.1. Nash equilibrium in an expanded game. Assume that a game is described in normal form, with m players $j = 1, \ldots, m$, their respective strategy sets $\Sigma_j$ and payoffs $V_j(\sigma_1, \ldots, \sigma_j, \ldots, \sigma_m)$. This will be called the original normal form game.

Assume the players may enter into a phase of pre-play communication during which they design a correlation device that will provide randomly a signal called proposed mode of play. Let $E = \{1, 2, \ldots, L\}$ be the finite set of the possible modes of play. The correlation device will propose with probability $\pi(e)$ the mode of play $e$. The device will then give the different players some information about the proposed mode of play. More precisely, let $H_j$ be a class of subsets of $E$, called the information structure of Player $j$. Player $j$, when the mode of play $e$ has been selected, receives an information denoted $h_j(e) \in H_j$. Now, we associate with each player $j$ a meta strategy, denoted $\gamma_j : H_j \to \Sigma_j$, that determines a strategy for the original normal form game on the basis of the information received. All this construct is summarized by the data

  (E, \{\pi(e)\}_{e \in E}, \{h_j(e) \in H_j\}_{j \in M, e \in E}, \{\gamma_j : H_j \to \Sigma_j\}_{j \in M}),

which defines an expanded game.

DEFINITION III.5.1. The data $(E, \{\pi(e)\}_{e \in E}, \{h_j(e) \in H_j\}_{j \in M, e \in E}, \{\gamma_j^* : H_j \to \Sigma_j\}_{j \in M})$ defines a correlated equilibrium of the original normal form game if it is a Nash equilibrium for the expanded game, i.e., if no player can improve his expected payoff by changing unilaterally his meta strategy, i.e., by playing $\gamma_j(h_j(e))$ instead of $\gamma_j^*(h_j(e))$ when he receives the signal $h_j(e) \in H_j$:

  \sum_{e \in E} \pi(e) V_j([\gamma_j^*(h_j(e)), \gamma_{M-j}^*(h_{M-j}(e))]) \ge \sum_{e \in E} \pi(e) V_j([\gamma_j(h_j(e)), \gamma_{M-j}^*(h_{M-j}(e))]).  (III.67)

III.5.2.2. An equivalent definition for m-matrix games. In the case of an m-matrix game, the definition given above can be replaced with the following one, which is much simpler.

DEFINITION III.5.2. A correlated equilibrium is a probability distribution $\pi(s)$ over the set of pure strategies $S = S_1 \times S_2 \times \cdots \times S_m$ such that, for every Player $j$ and any mapping $\delta_j : S_j \to S_j$, the following holds

  \sum_{s \in S} \pi(s) V_j([s_j, s_{M-j}]) \ge \sum_{s \in S} \pi(s) V_j([\delta_j(s_j), s_{M-j}]).  (III.68)
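Definition III.5.2 is easy to check numerically. The sketch below enumerates all deviation maps $\delta_j : S_j \to S_j$ for the $2 \times 2$ game of Example III.5.1 and verifies the inequalities (III.68) for the signal distribution of Figure III.4:

```python
import numpy as np
from itertools import product

# Payoffs of Example III.5.1 and the distribution of Figure III.4.
V1 = np.array([[5.0, 0.0], [4.0, 1.0]])   # Player 1
V2 = np.array([[1.0, 0.0], [4.0, 5.0]])   # Player 2
pi = np.array([[1/3, 0.0], [1/3, 1/3]])   # pi(r_i, c_j)

def is_correlated_eq(pi, V1, V2, tol=1e-9):
    # (III.68): no player gains by applying any map delta_j : S_j -> S_j
    # to his recommended pure strategy.
    for d in product(range(2), repeat=2):          # deviation maps of Player 1
        gain = sum(pi[s1, s2] * (V1[d[s1], s2] - V1[s1, s2])
                   for s1 in range(2) for s2 in range(2))
        if gain > tol:
            return False
    for d in product(range(2), repeat=2):          # deviation maps of Player 2
        gain = sum(pi[s1, s2] * (V2[s1, d[s2]] - V2[s1, s2])
                   for s1 in range(2) for s2 in range(2))
        if gain > tol:
            return False
    return True

print(is_correlated_eq(pi, V1, V2))      # True
print((pi * V1).sum(), (pi * V2).sum())  # expected payoffs 10/3 for each player
```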
III.6. Bayesian equilibrium with incomplete information
Up to now we have considered only games where each player knows everything concerning the rules and the players' types (i.e., their payoff functions, their strategy sets), etc. We were dealing with games of complete information. In this section we look at a particular class of games with incomplete information and we explore the class of so-called Bayesian equilibria.

III.6.1. Example of a game with unknown type for a player. In a game of incomplete information, some players do not know exactly what the characteristics of the other players are. For example, in a two-player game, Player 2 may not know exactly what the payoff function of Player 1 is.
EXAMPLE III.6.1. Consider the case where Player 1 could be of two types, called $\theta_1$ and $\theta_2$ respectively. We define below two matrix games, corresponding to the two possible types respectively:

  $\theta_1$:      c_1        c_2          $\theta_2$:      c_1         c_2
       r_1    (0, -1)    (2, 0)                r_1    (1.5, -1)   (3.5, 0)
       r_2    (2, 1)     (3, 0)                r_2    (2, 1)      (3, 0)
             Game 1                                   Game 2

If Player 1 is of type $\theta_1$, then the bimatrix game 1 is played; if Player 1 is of type $\theta_2$, then the bimatrix game 2 is played; the problem is that Player 2 does not know the type of Player 1.

FIGURE III.6. Extensive game formulation of the game of incomplete information
III.6.2. Reformulation as a game with imperfect information. Harsanyi, in [25], has proposed a transformation of a game of incomplete information into a game with imperfect information. For the example III.6.1 this transformation introduces a preliminary chance move, played by Nature, which decides randomly the type $\theta_i$ of Player 1. The probability of each type, denoted $p_1$ and $p_2 = 1 - p_1$ respectively, represents the beliefs of Player 2, given here in terms of prior probabilities, about facing a player of type $\theta_1$ or $\theta_2$. One assumes that Player 1 also knows these beliefs; the prior probabilities are thus common knowledge. The information structure⁷ in the associated extensive game, shown in Figure III.6, indicates that Player 1 knows his type when deciding, whereas Player 2 observes neither the type nor, indeed, in that game of simultaneous moves, the action chosen by Player 1. Call $x_i$ (respectively $1 - x_i$) the probability of choosing $r_1$ (respectively $r_2$) by Player 1 when he implements a mixed strategy, knowing that he is of type $\theta_i$. Call $y$ (respectively $1 - y$) the probability of choosing $c_1$ (respectively $c_2$) by Player 2 when he implements a mixed strategy.

⁷ The dotted line in Figure III.6 represents the information set of Player 2.
We can define the optimal response of Player 1 to the mixed strategy $(y, 1-y)$ of Player 2 by solving⁸

  \max_{i=1,2} \; a^{\theta_1}_{i1} y + a^{\theta_1}_{i2} (1 - y) \quad \text{if the type is } \theta_1,
  \max_{i=1,2} \; a^{\theta_2}_{i1} y + a^{\theta_2}_{i2} (1 - y) \quad \text{if the type is } \theta_2.

We can define the optimal response of Player 2 to the pair of mixed strategies $(x_i, 1 - x_i)$, $i = 1, 2$, of Player 1 by solving

  \max_{j=1,2} \; p_1 \left( x_1 b^{\theta_1}_{1j} + (1 - x_1) b^{\theta_1}_{2j} \right) + p_2 \left( x_2 b^{\theta_2}_{1j} + (1 - x_2) b^{\theta_2}_{2j} \right).

Let us rewrite these conditions with the data of the game illustrated in Figure III.6. First consider the reaction function of Player 1:

  \theta_1: \; \max \{ 0 \cdot y + 2(1 - y), \; 2y + 3(1 - y) \}
  \theta_2: \; \max \{ 1.5\, y + 3.5 (1 - y), \; 2y + 3(1 - y) \}.

We draw in Figure III.7 the lines corresponding to these comparisons between two linear functions.

FIGURE III.7. Optimal reaction of Player 1 to Player 2's mixed strategy $(y, 1-y)$

⁸ We call $a^\theta_{ij}$ and $b^\theta_{ij}$ the payoffs of Player 1 and Player 2 respectively when the type is $\theta$.

We observe that, if Player 1's type is $\theta_1$, he will always choose $r_2$,
whatever $y$, whereas, if his type is $\theta_2$, he will choose $r_1$ if $y$ is small enough and switch to $r_2$ when $y > 0.5$. For $y = 0.5$, the best reply of Player 1 could be any mixed strategy $(x, 1 - x)$.
Consider now the optimal reply of Player 2. We know that Player 1, if he is of type $\theta_1$, always chooses action $r_2$, i.e., $x_1 = 0$. So the best reply conditions for Player 2 can be written as follows

  \max \{ p_1 \cdot 1 + (1 - p_1)(x_2 \cdot (-1) + (1 - x_2) \cdot 1), \; p_1 \cdot 0 + (1 - p_1)(x_2 \cdot 0 + (1 - x_2) \cdot 0) \},

which boils down to

  \max \{ 1 - 2(1 - p_1) x_2, \; 0 \}.

We conclude that Player 2 chooses action $c_1$ if $x_2 < \frac{1}{2(1-p_1)}$, action $c_2$ if $x_2 > \frac{1}{2(1-p_1)}$, and any mixed action with $y \in [0, 1]$ if $x_2 = \frac{1}{2(1-p_1)}$. We conclude easily from these observations that the equilibria of this game are characterized as follows:

  x_1 \equiv 0;
  if $p_1 \le 0.5$: \; $x_2 = 0, y = 1$, \; or \; $x_2 = 1, y = 0$, \; or \; $x_2 = \frac{1}{2(1-p_1)}, y = 0.5$;
  if $p_1 > 0.5$: \; $x_2 = 0, y = 1$.
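To illustrate, one can verify numerically that, for a prior $p_1 < 0.5$, the profile $x_1 = 0$, $x_2 = 0$, $y = 1$ is a fixed point of the best-reply correspondences (the value $p_1 = 0.4$ below is an arbitrary choice; ties are reported as indifference):

```python
import numpy as np

p1 = 0.4                                     # prior belief that Player 1 is theta_1
A1 = np.array([[0.0, 2.0], [2.0, 3.0]])      # Player 1's payoffs if theta_1
A2 = np.array([[1.5, 3.5], [2.0, 3.0]])      # Player 1's payoffs if theta_2
B  = np.array([[-1.0, 0.0], [1.0, 0.0]])     # Player 2's payoffs (same for both types)

def p1_reply(y, A):
    """Type-contingent best reply of Player 1: probability x of playing r1."""
    u = A @ np.array([y, 1.0 - y])
    if abs(u[0] - u[1]) < 1e-12:
        return None                          # indifferent: any x in [0, 1]
    return 1.0 if u[0] > u[1] else 0.0

def p2_reply(x1, x2):
    """Best reply of Player 2: probability y of playing c1."""
    u = [p1 * (np.array([x1, 1 - x1]) @ B[:, j])
         + (1 - p1) * (np.array([x2, 1 - x2]) @ B[:, j]) for j in (0, 1)]
    if abs(u[0] - u[1]) < 1e-12:
        return None
    return 1.0 if u[0] > u[1] else 0.0

# Candidate Bayesian equilibrium for p1 < 0.5: x1 = 0, x2 = 0, y = 1.
print(p1_reply(1.0, A1), p1_reply(1.0, A2), p2_reply(0.0, 0.0))  # 0.0 0.0 1.0
```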
III.6.3. A general definition of Bayesian equilibria. We can generalize the analysis performed on the previous example and introduce the following definitions. Let $M$ be a set of $m$ players. Each player $j \in M$ may be of different possible types. Let $\Theta_j$ be the finite set of types for Player $j$. Whatever his type, Player $j$ has the same set of pure strategies $S_j$. If $\theta = (\theta_1, \ldots, \theta_m) \in \Theta = \Theta_1 \times \cdots \times \Theta_m$ is a type specification for every player, then the normal form of the game is specified by the payoff functions

  u_j(\theta; \cdot, \ldots, \cdot) : S_1 \times \cdots \times S_m \to \mathrm{IR}, \quad j \in M.  (III.69)

A prior probability distribution $p(\theta_1, \ldots, \theta_m)$ on $\Theta$ is given as common knowledge. We assume that all the marginal probabilities are nonzero,

  p_j(\theta_j) > 0, \quad \forall j \in M.

When Player $j$ observes that he is of type $\theta_j \in \Theta_j$ he can construct his revised conditional probability distribution on the types $\theta_{M-j}$ of the other players through the Bayes formula

  p(\theta_{M-j} \,|\, \theta_j) = \frac{p([\theta_j, \theta_{M-j}])}{p_j(\theta_j)}.  (III.70)

We can now introduce an expanded game where Nature draws randomly, according to the prior probability distribution $p(\cdot)$, a type vector $\theta$ for all players. Player $j$ can observe only his own type $\theta_j \in \Theta_j$. Then each player $j \in M$ picks a strategy in his own strategy set $S_j$. The outcome is then defined by the payoff functions (III.69).

DEFINITION III.6.1. In a game of incomplete information with $m$ players having respective type sets $\Theta_j$ and pure strategy sets $S_j$, $j = 1, \ldots, m$, a Bayesian equilibrium is a Nash equilibrium in the expanded game in which each player's pure strategy $\gamma_j$ is a map from $\Theta_j$ to $S_j$.
The expanded game can be described in normal form as follows. Each player $j \in M$ has a strategy set $\Gamma_j$, where a strategy is defined as a mapping $\gamma_j : \Theta_j \to S_j$. Associated with a strategy profile $\gamma = (\gamma_1, \ldots, \gamma_m)$, the payoff to Player $j$ is given by

  V_j(\gamma) = \sum_{\theta_j \in \Theta_j} p_j(\theta_j) \sum_{\theta_{M-j} \in \Theta_{M-j}} p(\theta_{M-j} \,|\, \theta_j)\, u_j([\theta_j, \theta_{M-j}]; \gamma_1(\theta_1), \ldots, \gamma_j(\theta_j), \ldots, \gamma_m(\theta_m)).  (III.71)

As usual, a Nash equilibrium is a strategy profile $\gamma^* = (\gamma_1^*, \ldots, \gamma_m^*) \in \Gamma = \Gamma_1 \times \cdots \times \Gamma_m$ such that

  V_j(\gamma^*) \ge V_j(\gamma_j, \gamma_{M-j}^*) \quad \forall \gamma_j \in \Gamma_j.  (III.72)

It is easy to see that, since each $p_j(\theta_j)$ is positive, the equilibrium conditions (III.72) lead to the following conditions

  \gamma_j^*(\theta_j) = \operatorname{argmax}_{s_j \in S_j} \sum_{\theta_{M-j} \in \Theta_{M-j}} p(\theta_{M-j} \,|\, \theta_j)\, u_j([\theta_j, \theta_{M-j}]; [s_j, \gamma_{M-j}^*(\theta_{M-j})]), \quad \forall j \in M.  (III.73)

REMARK III.6.1. Since the sets $\Theta_j$ and $S_j$ are finite, the set $\Gamma_j$ of mappings from $\Theta_j$ to $S_j$ is also finite. Therefore the expanded game is an m-matrix game and, by the Nash theorem, there exists at least one mixed strategy equilibrium.
III.7. Appendix on Kakutani fixed-point theorem

DEFINITION III.7.1. Let $\Phi : \mathrm{IR}^m \to 2^{\mathrm{IR}^n}$ be a point to set mapping. We say that this mapping is upper semicontinuous if, whenever the sequence $\{x^k\}_{k=1,2,\ldots}$ converges in $\mathrm{IR}^m$ toward $x^0$, then any accumulation point $y^0$ of the sequence $\{y^k\}_{k=1,2,\ldots}$ in $\mathrm{IR}^n$, where $y^k \in \Phi(x^k)$, $k = 1, 2, \ldots$, is such that $y^0 \in \Phi(x^0)$.

THEOREM III.7.1. Let $\Phi : A \to 2^A$ be an upper semicontinuous point to set mapping such that $\Phi(x)$ is a nonempty convex set for every $x \in A$, where $A$ is a compact⁹ convex subset of $\mathrm{IR}^m$. Then there exists a fixed point for $\Phi$. That is, there is $x^* \in \Phi(x^*)$ for some $x^* \in A$.

⁹ i.e., a closed and bounded subset.
III.8. Exercises
III.8.1. Consider the matrix game

  \begin{pmatrix} 3 & 1 & 8 \\ 4 & 10 & 0 \end{pmatrix}.

Assume that the players are in the simultaneous move information structure and that they try to guess and counter-guess the optimal behavior of the opponent in order to determine their optimal strategy choice. Show that this leads to an unstable process.
III.8.2. A game was given in extensive form in Figure II.7 (page 21) and its normal form was obtained in Exercise II.5.1.

(1) What are the players' security levels?
(2) Compute for this game the Nash equilibrium point(s) in pure and/or mixed strategy(-ies) (if they exist).
(3) What are both players' Stackelberg equilibrium policies if Player I is the leader? And if Player II is the leader?
(4) Indicate the policies which lead to the Pareto optimal payoffs.
(5) Suppose you would like Player I to use his first row policy and Player II his first column policy. How would you re-design the payoff matrix so that element (1, 1) be now preferred by both players (i.e., likely to be played)? Suppose, moreover, that the incentive for the players to change their policy is coming out of your pocket, so you will want to spend as little as possible on motivating the players.
III.8.3. Find the value and the saddle point mixed strategies for the above matrix game.

III.8.4. Define the quadratic programming problem that will find a Nash equilibrium for the bimatrix game

  \begin{pmatrix} (52, 50)^\star & (44, 44) & (44, 41) \\ (42, 42) & (46, 49)^\star & (39, 43) \end{pmatrix}.

Verify that the entries marked with a $\star$ correspond to a solution of the associated quadratic programming problem.

III.8.5. Do the same as above but using now the complementarity problem formulation.
III.8.6. Consider the two player game where the strategy sets are the intervals $U_1 = [0, 100]$ and $U_2 = [0, 100]$ respectively, and the payoffs are $\psi_1(u) = 25 u_1 + 15 u_1 u_2 - 4 u_1^2$ and $\psi_2(u) = 100 u_2 - 50 u_1 - u_1 u_2 - 2 u_2^2$ respectively. Define the best reply mapping. Find an equilibrium point. Is it unique?
Exercise 3.6. Consider the two player game where the strategy sets are the intervals $U_1 = [10, 20]$ and $U_2 = [0, 15]$ respectively, and the payoffs are $\psi_1(u) = 40 u_1 + 5 u_1 u_2 - 2 u_1^2$ and $\psi_2(u) = 50 u_2 - 3 u_1 u_2 - 2 u_2^2$ respectively. Define the best reply mapping. Find an equilibrium point. Is it unique?
III.8.7. Consider an oligopoly model. Show that the assumptions IV.1.1 define a concave game in the sense of Rosen.

III.8.8. In example III.5.1 a correlated equilibrium has been constructed for the game

         c_1      c_2
  r_1   (5, 1)   (0, 0)
  r_2   (4, 4)   (1, 5)

Find the associated extensive form game for which the proposed correlated equilibrium corresponds to a Nash equilibrium.
CHAPTER IV
Cournot and Network Equilibria
IV.1. Cournot equilibrium
The model proposed in 1838 by [10] is still one of the most frequently used game models in economic theory. It represents the competition between different firms supplying a market for a divisible good.

IV.1.1. The static Cournot model. Let us first of all recall the basic Cournot oligopoly model. We consider a single market on which $m$ firms are competing. The market is characterized by its (inverse) demand law $p = D(Q)$, where $p$ is the market clearing price and $Q = \sum_{j=1,\ldots,m} q_j$ is the total supply of goods on the market. Firm $j$ faces a cost of production $C_j(q_j)$; hence, letting $q = (q_1, \ldots, q_m)$ represent the production decision vector of the $m$ firms together, the profit of firm $j$ is

  \psi_j(q) = q_j D(Q) - C_j(q_j).

The following assumptions are placed on the model.
ASSUMPTION IV.1.1. The market demand and the firms' cost functions satisfy the following properties: (i) The inverse demand function is finite valued, nonnegative and defined for all $Q \in [0, \infty)$. It is also twice differentiable, with $D'(Q) < 0$ wherever $D(Q) > 0$. In addition $D(0) > 0$. (ii) $C_j(q_j)$ is defined for all $q_j \in [0, \infty)$, nonnegative, convex, twice continuously differentiable, and $C_j'(q_j) > 0$. (iii) $Q D(Q)$ is bounded and strictly concave for all $Q$ such that $D(Q) > 0$.
If one assumes that each firm $j$ chooses a supply level $q_j \in [0, \bar{q}_j]$, this model satisfies the definition of a concave game à la Rosen (see Exercise III.8.7). Therefore an equilibrium exists. Let us consider the uniqueness issue in the duopoly case.

Consider the pseudo-gradient (there is no need of a weighting $(r_1, r_2)$ since the constraints are not coupled)

  g(q_1, q_2) = \begin{pmatrix} D(Q) + q_1 D'(Q) - C_1'(q_1) \\ D(Q) + q_2 D'(Q) - C_2'(q_2) \end{pmatrix}

and the Jacobian matrix

  G(q_1, q_2) = \begin{pmatrix} 2 D'(Q) + q_1 D''(Q) - C_1''(q_1) & D'(Q) + q_1 D''(Q) \\ D'(Q) + q_2 D''(Q) & 2 D'(Q) + q_2 D''(Q) - C_2''(q_2) \end{pmatrix}.

The negative definiteness of the symmetric matrix

  \frac{1}{2} [G(q_1, q_2) + G(q_1, q_2)^T] = \begin{pmatrix} 2 D'(Q) + q_1 D''(Q) - C_1''(q_1) & D'(Q) + \frac{1}{2} Q D''(Q) \\ D'(Q) + \frac{1}{2} Q D''(Q) & 2 D'(Q) + q_2 D''(Q) - C_2''(q_2) \end{pmatrix}

implies uniqueness of the equilibrium.
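For a concrete check of this condition, consider a linear-quadratic duopoly (the demand and cost data below are illustrative assumptions): the Jacobian of the pseudo-gradient is then constant, and the negative definiteness of the symmetric matrix can be verified by computing eigenvalues:

```python
import numpy as np

# Assumed duopoly: D(Q) = 100 - 2Q, C_j(q_j) = q_j^2, hence
# D'(Q) = -2, D''(Q) = 0, C_j''(q_j) = 2 for both firms.
Dp, Dpp, Cpp = -2.0, 0.0, 2.0

def G(q1, q2):  # Jacobian of the pseudo-gradient g(q1, q2)
    return np.array([[2 * Dp + q1 * Dpp - Cpp, Dp + q1 * Dpp],
                     [Dp + q2 * Dpp,           2 * Dp + q2 * Dpp - Cpp]])

S = G(10.0, 10.0)
print(np.linalg.eigvalsh((S + S.T) / 2))  # [-8, -4]: negative definite,
                                          # so the equilibrium is unique
```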
IV.1.2. Formulation of a Cournot equilibrium as a nonlinear complementarity problem. We have seen that Nash equilibria in m-matrix games could be characterized as the solution to a linear complementarity problem. A similar formulation holds for a Cournot equilibrium, and it will in general lead to a nonlinear complementarity problem, or NLCP, a class of problems that has recently been the object of considerable developments in the field of Mathematical Programming (see the book [13] or the article [14]). We can rewrite the equilibrium condition as an NLCP in canonical form

  q_j \ge 0  (IV.1)
  f_j(q) \ge 0  (IV.2)
  q_j f_j(q) = 0,  (IV.3)

where

  f_j(q) = -g_j(q) = -D(Q) - q_j D'(Q) + C_j'(q_j).  (IV.4)

The relations (IV.1)-(IV.3) express that, at a Cournot equilibrium, either $q_j = 0$ and the gradient $g_j(q)$ is $\le 0$, or $q_j > 0$ and $g_j(q) = 0$. There are now optimization software packages that solve these types of problems. In [55] an extension of the GAMS (see [9]) modeling software to provide an ability to solve this type of complementarity problem is described. More recently the modeling language AMPL (see [18]) has also been adapted to handle a problem of this type and to submit it to an efficient NLCP solver like PATH (see [12]).
EXAMPLE IV.1.1. This example comes from the AMPL website

  http://www.ampl.com/ampl/TRYAMPL/

Consider an oligopoly with $n = 10$ firms. The production cost function of firm $i$ is given by

  c_i(q_i) = c_i q_i + \frac{\beta_i}{1 + \beta_i} (L_i q_i)^{(1+\beta_i)/\beta_i}.  (IV.5)

The market demand function is defined by

  D(Q) = (A/Q)^{1/\gamma}.  (IV.6)

Therefore

  D'(Q) = \frac{1}{\gamma} (A/Q)^{1/\gamma - 1} \left( -\frac{A}{Q^2} \right) = -\frac{1}{\gamma}\, D(Q)/Q.  (IV.7)

The Nash-Cournot equilibrium will be the solution of the NLCP (IV.1)-(IV.3) where

  f_i(q) = -D(Q) - q_i D'(Q) + C_i'(q_i)
         = -(A/Q)^{1/\gamma} + \frac{1}{\gamma}\, q_i D(Q)/Q + c_i + (L_i q_i)^{1/\beta_i}.
IV.1.2.1. AMPL input. The following files can be submitted via the web submission tool on the NEOS server site

  http://www-neos.mcs.anl.gov/neos/solvers/CP:PATH-AMPL/solver-www.html

%%%%%%% FILE nash.mod %%%%%%%%%%
#==>nash.gms
# A non-cooperative game example: a Nash equilibrium is sought.
# References:
# F.H. Murphy, H.D. Sherali, and A.L. Soyster, "A Mathematical
#   Programming Approach for Determining Oligopolistic Market
#   Equilibrium", Mathematical Programming 24 (1986), pp. 92-106.
# P.T. Harker, "Accelerating the convergence ...", Mathematical
#   Programming 41 (1988), pp. 29-59.

set Rn := 1 .. 10;

param gamma := 1.2;
param c {i in Rn};
param beta {i in Rn};
param L {i in Rn} := 10;

var q {i in Rn} >= 0;                  # production vector
var Q = sum {i in Rn} q[i];
var divQ = (5000/Q)**(1/gamma);

s.t. feas {i in Rn}:
   q[i] >= 0 complements
   0 <= c[i] + (L[i] * q[i])**(1/beta[i]) - divQ
        - q[i] * (-1/gamma) * divQ / Q;

%%%%%%% FILE nash.dat %%%%%%%%%%
data;

param c    := 1 5    2 3   3 8    4 5    5 1     6 3   7 7    8 4     9 6     10 3 ;

param beta := 1 1.2  2 1   3 .9   4 .6   5 1.5   6 1   7 .7   8 1.1   9 .95   10 .75 ;

%%%%%%% FILE nash.run %%%%%%%%%%
model nash.mod; data nash.dat;

set initpoint := 1 .. 4;
param initval {Rn,initpoint} >= 0;

data; param initval
:     1    2    3    4 :=
 1    1   10   1.0   7
 2    1   10   1.2   4
 3    1   10   1.4   3
 4    1   10   1.6   1
 5    1   10   1.8  18
 6    1   10   2.1   4
 7    1   10   2.3   1
 8    1   10   2.5   6
 9    1   10   2.7   3
10    1   10   2.9   2
;

for {point in initpoint} {
   let {i in Rn} q[i] := initval[i,point];
   solve;
   display max {i in 1.._nccons} abs(_ccon[i]),
           min {i in 1.._ncons} _con[i].slack;
}
The result obtained for the different initial points is

q [*] :=
 1   7.44155
 2   4.09781
 3   2.59064
 4   0.935386
 5  17.949
 6   4.09781
 7   1.30473
 8   5.59008
 9   3.22218
10   1.67709 ;

It shows that firm 5 has a market advantage. It has the highest $\beta$ value ($\beta_5 = 1.5$), which corresponds to the highest productivity.
IV.1.3. Computing the solution of a classical Cournot model. There are many approaches to finding a solution to a static Cournot game. We have described above the one based on the solution of an NLCP. We list below a few other alternatives:

(a): one discretizes the decision space and uses the complementarity algorithm proposed in [34] to compute a solution to the associated bimatrix game;
(b): one exploits the fact that the players are uniquely coupled through the condition that $Q = \sum_{j=1,\ldots,m} q_j$, and a sort of primal-dual search technique is used [41];
(c): one formulates the Nash-Cournot equilibrium problem as a variational inequality. Assumption IV.1.1 implies Strict Diagonal Concavity (SDC) in the sense of Rosen, i.e., strict monotonicity of the pseudo-gradient operator, hence uniqueness of the equilibrium and convergence of various gradient-like algorithms.
IV.2. Flows on networks

In the following sections we present some applications of game theory to the analysis of flows on networks. In particular we study the concept of traffic equilibrium, introduced in [67] to represent the building of traffic loads on the street network of a city on a busy day. We show that equilibria behave very differently from optima, in the sense that more flexibility does not always provide better equilibrium outcomes. We also show that a Wardrop equilibrium can be obtained as the limit of a Nash-Cournot equilibrium for a large set of players sending a multiflow on a congested network. This result has been established in [26].

IV.2.1. Nodes, arcs, flows. A network is a graph $G = (N, A)$, where $N$ is a finite set of nodes and $A$ is a set of arcs, i.e., one-way links between pairs of nodes. On the arcs circulate (flow) some commodities and, therefore, some rules of conservation of flows must be observed at every node of the network. Water, gas or electricity distribution networks are managed by utilities. Highway, railway or street networks are also typical applications of this mathematical structure in transportation science.

FIGURE IV.1. Nodes and arcs

On the arcs of a network circulates a flow describing, depending on the particular application,
the volume of a commodity or the number of cars, trucks or trains that run on the arc during a specified unit of time. One denotes $v_a$ the flow that circulates on the arc $a \in A$. One associates a link traversal cost $S_a(V)$ with a flow vector $V = (v_a)_{a \in A}$ circulating on all the arcs. An example of such a cost function is the travel time needed by a unit of flow to go from the origin node to the end node of arc $a$.

IV.2.2. Flow conservation rules.

IV.2.2.1. Simple flow conservation. We associate with any node $n \in N$ two subsets of arcs:

  $A^+(n)$: the set of arcs with terminal node $n$;
  $A^-(n)$: the set of arcs with initial node $n$.

The simple flow conservation rule tells us that the inflow to a node must be equal to the outflow from this node, that is

  \sum_{a \in A^+(n)} v_a = \sum_{a \in A^-(n)} v_a, \quad \forall n \in N.  (IV.8)

IV.2.2.2. Supply and demand. We associate with each node $n \in N$ two numbers $s_n$ and $d_n$, where $d_n$ is the exogenous demand for the commodity at node $n$ and $s_n$ is the exogenous supply of the commodity at that node. The flow conservation rule can now be extended as follows

  \sum_{a \in A^+(n)} v_a + s_n = \sum_{a \in A^-(n)} v_a + d_n, \quad \forall n \in N.  (IV.9)

For example, a gas utility has a network where some nodes are reservoirs that supply the commodity, some other nodes are customers who consume the commodity, and finally some nodes are only transshipment nodes where different pipes connect together. The delivery of gas will have to satisfy constraints of the form (IV.9) at every node¹.
IV.3. Optimization and equilibria on networks
IV.3.1. The flow with minimal cost. We formulate the following problem

  \min_V \sum_{a \in A} v_a S_a(V)  (IV.10)
  s.t.
  d_n - s_n = \sum_{a \in A^+(n)} v_a - \sum_{a \in A^-(n)} v_a, \quad n \in N,  (IV.11)
  v_a \ge 0.  (IV.12)

This corresponds to a situation where the network manager desires to satisfy all the demands from the available supply with a minimal total cost.

¹ In that particular case one would have additional constraints due to the transmission of pressure into the pipeline system; we shall not enter into this difficulty and refer the reader to [?] for more details.
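When the traversal costs $S_a$ are constant, (IV.10)-(IV.12) reduces to a linear program. The following sketch solves such a special case on a three-node network; all data (network, costs, supplies) are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

# Nodes n1, n2, n3; arcs 0: n1->n2, 1: n2->n3, 2: n1->n3.
cost = np.array([1.0, 1.0, 3.0])         # constant S_a, per unit of flow
E = np.array([[ 1,  0,  1],              # node-arc incidence:
              [-1,  1,  0],              #  +1 if the arc leaves the node,
              [ 0, -1, -1]])             #  -1 if it enters the node
s_minus_d = np.array([2.0, 0.0, -2.0])   # supply 2 at n1, demand 2 at n3

# In this sign convention the conservation rule (IV.11) reads E v = s - d,
# and (IV.12) is v >= 0.
res = linprog(cost, A_eq=E, b_eq=s_minus_d, bounds=[(0, None)] * 3)
print(res.x)   # [2, 2, 0]: route everything along the cheaper path n1->n2->n3
```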
IV.3.2. Competition on a network. We consider a set $M = \{1, 2, \ldots, m\}$ of players. Each player $i \in M$ is associated with a unique origin-destination pair $(x_i, y_i) \in N \times N$. The node $x_i$ is the production site and $y_i$ is the market site for Player $i$. On the network circulates a multiflow. We denote $v^i = (v^i_a)_{a \in A}$ the flow vector of Player $i$ and $V = (v^i)_{i \in M} = (v^i_a)_{a \in A, i \in M}$ the multiflow vector generated by the $m$ players. We also denote $S^i_a(V)$ the link traversal cost incurred by Player $i$ on link $a$, given the multiflow vector $V$.

The market of Player $i$ is characterized by an (inverse) demand law

  p_i = f_i(q_1, q_2, \ldots, q_m)

which determines the market clearing price $p_i$ as a function of the total supply (flows) sent by the different players to their respective markets.

It will be convenient to introduce a return arc from each market site $y_i$ to the corresponding production site $x_i$ with an associated unit traversal cost

  S^i_{a_i}(v^1_{a_1}, v^2_{a_2}, \ldots, v^m_{a_m}) = -f_i(v^1_{a_1}, v^2_{a_2}, \ldots, v^m_{a_m}).

With this construct it is possible to describe the system as a network with a multiflow

FIGURE IV.2. Market return arc

circulation where, at each node, there is no exogenous supply or demand and the simple flow conservation constraints (IV.8) have to be satisfied for each Player $i$. We call $\Phi_i$ the set of vectors $(v^i, v^i_{a_i})$ describing the feasible circulation flows for Player $i$.
DEFINITION IV.3.1. A multiflow $V^*$ is a Nash-Cournot equilibrium if, for each Player $i$, $(v^{i*}, v^{i*}_{a_i}) \in \Phi_i$ and the following holds

  \sum_{a \in A} v^{i*}_a S^i_a(V^*) + v^{i*}_{a_i} S^i_{a_i}(V^*)
    = \min_{(v^i, v^i_{a_i}) \in \Phi_i} \sum_{a \in A} v^i_a S^i_a([V^{-i*}, v^i]) + v^i_{a_i} S^i_{a_i}([V^{-i*}, v^i]),  (IV.13)

where, as usual now, $[V^{-i*}, v^i]$ denotes the multiflow vector where only the component corresponding to Player $i$ has been changed to $v^i$.

According to this definition, at equilibrium each Player $i$ minimizes a function which is given by the sum of the traversal costs minus the total value of the sales of the commodity flow he controls, given the flows decided by the other players.

It will be convenient to introduce the notations

  J_i(V) = -\Big( \sum_{a \in A} v^i_a S^i_a(V) + v^i_{a_i} S^i_{a_i}(V) \Big)

and to introduce the r-pseudo-gradient, where $r = (r_1, r_2, \ldots, r_m) \in \mathrm{IR}^m$, defined as

  g(V, r) = \begin{pmatrix} g_1(V, r) \\ g_2(V, r) \\ \vdots \\ g_m(V, r) \end{pmatrix} = \begin{pmatrix} r_1 \nabla_{v^1} J_1(V) \\ r_2 \nabla_{v^2} J_2(V) \\ \vdots \\ r_m \nabla_{v^m} J_m(V) \end{pmatrix}.
THEOREM IV.3.1. Assume that each payoff function $J_i$ is $C^1$ and concave in $v^i$. Then $V^*$ is a Nash-Cournot equilibrium iff it satisfies the variational inequality

  (V - V^*)^T g(V^*, r) \le 0 \quad \forall V \in \Phi_1 \times \Phi_2 \times \cdots \times \Phi_m.  (IV.14)

Proof. This is a concave game. Apply Theorem III.4.3. QED

THEOREM IV.3.2. Assume that each payoff function $J_i$ is $C^2$ and that there exists $r > 0$ such that the symmetric matrix

  [G(V, r) + G(V, r)^T]

is negative definite, where $G(V, r)$ is the Jacobian of the pseudo-gradient $g(V, r)$. Then the equilibrium is unique.

Proof. This is a concave game à la Rosen. Apply Theorem III.4.2. QED
IV.3.3. Wardrop equilibrium on a network. There is a famous concept of traffic equilibrium introduced in [67]. In this problem one considers a network $G = (N, A)$. For each origin-destination pair $(x, y) \in N \times N$ there is a transportation demand. The important aspect of this demand is that it comes from a very large number of users, each one contributing a very small part of the demand. This corresponds approximately to the situation of commuter traffic in the peak hour of a working day. People living at node $x$ go to work at node $y$; each user has a single car which contributes marginally to the traffic flow. We denote $w = (w_a)_{a \in A}$ the flow that results from the commuters travelling from their origins to their destinations. The question is to determine the flow $w$ that should result from a locally optimizing behavior of all users. By "locally" we mean an optimality which only concerns the particular user, given the behavior of all other users.

An important concept that enters into the definition of a traffic equilibrium is the notion of a path $p$ linking the origin node $x$ to the destination node $y$. Such a path is a sequence of connected links, the first one having its initial node in $x$ and the last one its terminal node in $y$. A user who has to travel from $x$ to $y$ selects a path.

FIGURE IV.3. Paths from x to y

On each arc $a \in A$, the link traversal is supposed to be influenced by the congestion of the network, i.e., we assume its cost to be a function $S_a(w)$. The total travel time along a path $p$ is given by $\sum_{a \in p} S_a(w)$. The users, optimizing locally, tend to select the path that has the minimum total travel time to connect the origin $x$ to the destination $y$. The concept of Wardrop equilibrium is defined as follows.
DEFINITION IV.3.2. A flow vector $w^*$ is a Wardrop equilibrium if, for any origin-destination pair $(x, y)$, the following holds

  \sum_{a \in p^*} S_a(w^*) \le \sum_{a \in p} S_a(w^*) \quad \forall p^* \in P^*, \; \forall p \in P,  (IV.15)

where $P$ is the set of all possible paths from $x$ to $y$ and $P^*$ is the subset consisting of all paths that are actually used (i.e., such that a positive flow is circulating on them).

One deduces immediately from this definition that, for all paths that are actually used between an origin $x$ and a destination $y$, the total journey time is equal. We easily see that a Wardrop equilibrium corresponds to a situation where each user has no incentive to modify the path he (she) is using for his (her) journey.

LEMMA IV.3.1. A Wardrop equilibrium flow vector $w^*$ is characterized as the solution to the following variational inequality

  (w - w^*)^T S(w^*) \ge 0 \quad \forall w \in \Phi,  (IV.16)

where $\Phi$ is the set of all feasible flows.

Proof.
IV.3.4. Braess paradox. The so-called Braess paradox [7], [24] illustrates well the difference between an optimal solution and an equilibrium solution.

Consider a highway network as in Figure IV.4. There are four origin-destination pairs or links.

FIGURE IV.4. A traffic network. (Links: S-A with $F_1(x) = 10 x_1$, A-T with $F_2(x) = 25 x_2 + 2$, S-B with $F_3(x) = x_3 + 50$, B-T with $F_4(x) = 10 x_4$.)

A function $F_k(x)$, $k = 1, \ldots, K$, can be identified as a travel time through
link $k$, where $x_k$ is the total flow assigned to this link; here, $K = 4$. The functions $F_k(x)$ are increasing in $x$; clearly, the more traffic, the longer the journey.

The journey time is an obvious performance index for a commuter in this problem. Another index could be the maintenance cost incurred by the local government. We will assume that the total system cost is

  J = \sum_{k=1}^K x_k F_k(x_k).  (IV.17)

Suppose that the demand for travel from S to T is 3.6. We will discuss what distribution of traffic is likely.

If the total traffic went through the upper links 1 and 2 only, the travel time would be 128. Conversely, if everybody used links 3 and 4, the journey would last 89.6. The total costs would be 460.80 and 322.56, respectively. See the second and third columns in Table IV.1 (the subsequent columns show other solutions obtained by using different performance indices). Obviously, by sharing the upper and bottom links these times and costs could be improved.

If the customers of this highway network could see what the current traffic on each link were, they would try to minimize their travel times until an equilibrium established itself. At this equilibrium, the journey times through 1-2 and 3-4 would be equal. Hence, the following equations have to be satisfied for the equilibrium to
IV.3. OPTIMIZATION AND EQUILIBRIA ON NETWORKS 67
1-2 3-4 equil. min cost min cost(+5) equil.(+5) _equil.(+5)
x
1
SA 3.6 0 1.9 1.38 2.06 3.6 2.24
x
2
AT 3.6 0 1.9 1.38 1.17 1.32 1.63
x
3
SB 0 3.6 1.7 2.22 1.54 0 1.36
x
4
BT 0 3.6 1.7 2.22 2.43 2.28 1.97
x
5
AB n.a. n.a. n.a. n.a. .89 2.28 .61
max time 128.5 89.6 68.7 74.4 75.9 71.1 71.1
cost 460.80 322.56 247.1 234.6 227.12 255.8 235.0
TABLE IV.1. Various trafc intensities and the corresponding payoffs.
exist:

  10 x_1 + 25 x_2 + 2 = x_3 + 50 + 10 x_4  (IV.18)
  x_1 - x_2 = 0  (IV.19)
  x_3 - x_4 = 0  (IV.20)
  x_1 + x_3 = 3.6.  (IV.21)

The solution to (IV.18)-(IV.21) and the corresponding time and cost are shown in column four of Table IV.1.
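Conditions (IV.18)-(IV.21) form a square linear system, so a few lines of code reproduce column four (a sketch with the system written in matrix form):

```python
import numpy as np

# (IV.18)-(IV.21) as a linear system in (x1, x2, x3, x4).
M = np.array([[10.0, 25.0, -1.0, -10.0],   # 10x1 + 25x2 + 2 = x3 + 50 + 10x4
              [ 1.0, -1.0,  0.0,   0.0],   # x1 = x2
              [ 0.0,  0.0,  1.0,  -1.0],   # x3 = x4
              [ 1.0,  0.0,  1.0,   0.0]])  # x1 + x3 = 3.6
b = np.array([48.0, 0.0, 0.0, 3.6])
x = np.linalg.solve(M, b)
print(x)                          # approx [1.9, 1.9, 1.7, 1.7]
print(10 * x[0] + 25 * x[1] + 2)  # equalized journey time, approx 68.7
```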
Let us see now what the local government's preferred solution is. This will be obtained by solving the following minimization problem

  \min \sum_{k=1}^4 x_k F_k(x_k)  (IV.22)
  s.t. (IV.19)-(IV.21) hold.

The solution to (IV.22) is presented in column five. Clearly, the traffic through the quickly saturating (and expensive) link AT has eased. The maximum journey time given in the table is through links 3-4; the other links' time is 50.5.

There is a clear difference between the commuters' equilibrium solution, which equalizes the travel times, and the local government's cost minimizing solution. In the commuter preferred solution there is a lot of traffic through links 1-2 and less traffic through links 3-4. The local government might consider a quick link AB, as in Figure IV.5 (not to scale!), in an attempt to improve the situation by redirecting some traffic away from the congested AT toward the under-utilized BT.

Suppose that the local government has built the link AB, see Figure IV.5, in the expectation of diminishing the traffic through AT.
FIGURE IV.5. An extended traffic network. (The new link A-B has travel time $F_5(x) = x_5 + 10$.)
Let us solve the problem of cost minimization with the fifth link added to the network as follows:

  \min \sum_{k=1}^5 x_k F_k(x_k)  (IV.23)
  s.t.
  x_1 = x_2 + x_5  (IV.24)
  x_4 = x_3 + x_5  (IV.25)
  x_1 + x_3 = 3.6.  (IV.26)

The solution is presented in column six. Apparently, this solution gives the lowest government cost, so the link utilization must be more equilibrated than without the AB link. However, the maximum travel time (through 3-4) grows while the remaining links' travel times are: 51.8 (1-2) and 56.5 (1-5-4).

This solution gives longer journey times for commuters travelling through the links 3-4 than without the AB connection. The commuters will obviously seek to improve these times until an equilibrium is achieved.
Unfortunately, there is no global equilibrium which would make the travel times through 1-2, 1-5-4 and 3-4 identical. This is obvious, as the following system of equations has no positive solution:

  10(x_{11} + x_{12}) + 25 x_{11} + 2 = 10(x_{11} + x_{12}) + x_{12} + 10 + 10(x_{12} + x_3)  (IV.27)
  10(x_{11} + x_{12}) + 25 x_{11} + 2 = x_3 + 50 + 10(x_{12} + x_3)  (IV.28)
  x_{11} + x_{12} + x_3 = 3.6,  (IV.29)

where $x_{11} + x_{12} = x_1$ is the traffic through SA ($x_{11}$ continuing on AT, $x_{12}$ using AB). Nor can the travel times through 1-5-4 and 3-4 be equalized. The only equilibrium which can be established is such that
the journey times through 1-5-4 and 1-2 are the same and no traffic passes through SB. Indeed, the system

  10(x_{11} + x_{12}) + x_{12} + 10 + 10 x_{12} = 10(x_{11} + x_{12}) + 25 x_{11} + 2  (IV.30)
  10(x_{11} + x_{12}) + 25 x_{11} + 2 \le x_3 + 50 + 10(x_{12} + x_3), \quad x_3 = 0,  (IV.31)
  x_{11} + x_{12} = 3.6  (IV.32)

has a unique solution, which is written in column seven of Table IV.1. Here is the paradox: the travellers are worse off than without the link AB! Also surprisingly, it is uneconomical for a single commuter to start the journey using link SB. Indeed, $x + 50 > 11x + 10$ for $x < 4$.²
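A quick computation confirms this equilibrium. With $x_3 = 0$, (IV.30) and (IV.32) form a $2 \times 2$ linear system in the flows $x_{11}$ (through 1-2) and $x_{12}$ (through 1-5-4):

```python
import numpy as np

# (IV.30) with x3 = 0 reduces to 25*x11 + 2 = 11*x12 + 10 (equal journey
# times through 1-2 and 1-5-4); (IV.32) gives x11 + x12 = 3.6.
M = np.array([[25.0, -11.0],
              [ 1.0,   1.0]])
b = np.array([8.0, 3.6])
x11, x12 = np.linalg.solve(M, b)
print(x11, x12)                   # approx 1.32 and 2.28 (column seven)
print(10 * 3.6 + 25 * x11 + 2)    # common journey time, approx 71.1
# A lone deviator to S-B would face roughly x3 + 50 + 10*(x12 + x3), x3 ~ 0:
print(50 + 10 * x12)              # approx 72.8 > 71.1, so S-B stays unused
```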
The authors of [30] point to the fact that more options mean more strategies and
more equilibria. The Breass paradox illustrates the fact that, in a new equilibrium, all
players may be worse off. The authors of [30] also notice certain similarity between
Breass paradox and the prisoners dilemma (see Example III.3.3, page 34); the latter
exists because each prisoners strategy space contains two options. In other words,
there would be no dilemma and the payoffs (2,2) would be a unique equilibrium if the
players had just one option of no cooperating (e.g., if they spoke a different language
than the police), see Table III.1 page 34.
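The numbers behind the paradox are easy to verify. The following sketch is our own illustration, not part of the original development; it assumes Python with numpy and uses the link costs of Figure IV.5. It solves the two equilibrium systems and confirms that the common travel time increases once AB is opened:

    import numpy as np

    D = 3.6  # total traffic intensity from S to T

    # Link travel-cost functions from Figure IV.5 (x = flow on the link).
    F1 = lambda x: 10 * x        # S -> A
    F2 = lambda x: 25 * x + 2    # A -> T
    F3 = lambda x: x + 50        # S -> B
    F4 = lambda x: 10 * x        # B -> T

    # Equilibrium without AB: routes 1-2 and 3-4, equations (IV.18)-(IV.21).
    # With x2 = x1 and x4 = x3, equal route times give 35 x1 + 2 = 11 x3 + 50.
    A = np.array([[35.0, -11.0], [1.0, 1.0]])
    b = np.array([48.0, D])
    x1, x3 = np.linalg.solve(A, b)
    t_before = F1(x1) + F2(x1)
    print(f"without AB: x1 = {x1:.3f}, x3 = {x3:.3f}, time = {t_before:.2f}")

    # Equilibrium with AB: all traffic goes through SA and splits between
    # routes 1-2 (flow y12) and 1-5-4 (flow y154); cf. (IV.30), (IV.32).
    A = np.array([[25.0, -11.0], [1.0, 1.0]])
    b = np.array([8.0, D])       # 38 + 25 y12 = 46 + 11 y154, y12 + y154 = D
    y12, y154 = np.linalg.solve(A, b)
    t_after = F1(D) + F2(y12)
    print(f"with AB   : y12 = {y12:.3f}, y154 = {y154:.3f}, time = {t_after:.2f}")

    assert t_after > t_before            # Braess's paradox
    print("time via 3-4 would be", F3(0) + F4(y154))   # nobody wants SB

The computed times are about 68.7 without AB and 71.1 with it, and the unused route 3-4 would take about 72.8, so no single commuter gains by deviating to SB.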
IV.4. A convergence result
We address the following question:

    Is it possible to approximate a Wardrop equilibrium flow w* by the total flow w*(n) resulting from a Nash-Cournot equilibrium for the game Γ(n) played on the network G by nm players partitioned into m classes M_i, i ∈ M, where a class corresponds to a particular origin-destination pair (x_i, y_i)?
IV.4.1. The approximating games. We call Γ(n) the game defined on the network G in the following way. There are m origin-destination pairs (x_i, y_i), i = 1, 2, ..., m. With each i is associated a set M_i of n identical players who share the same link traversal cost functions S_a(w) and the same demand law f_i(q_1, q_2, ..., q_m), where

    q_i = Σ_{ℓ∈M_i} v^ℓ_{a_i},   i = 1, 2, ..., m.

We remind the reader that the arc a_i is the return link between the market site y_i and the production site x_i.
² Many traffic intensity distributions could weakly dominate the equilibrium; for a result of that kind see the last column in Table IV.1. The equilibrium domination is called a generalized Braess paradox [24].
We assume that the commodities produced and marketed by the nm players are homogeneous and we call

    w_a = Σ_{ℓ=1,...,nm} v^ℓ_a

the total flow circulating on the arc a. The payoff of Player ℓ ∈ M_i is therefore given by

(IV.33)    J^ℓ(v^1, v^2, ..., v^{mn}) = v^ℓ_{a_i} f_i(q_1, q_2, ..., q_m) − Σ_{a∈A} v^ℓ_a S_a(w).
If we extend the arc set A to also contain the return arcs a_i, i ∈ M, with an associated traversal cost S_{a_i}(w) = −f_i(q_1, q_2, ..., q_m), we can rewrite the payoff functions in the more compact form

(IV.34)    J^ℓ(v^1, v^2, ..., v^{mn}) = −Σ_{a∈A} v^ℓ_a S_a(w).
We shall now characterize the Nash-Cournot equilibrium through the associated variational inequality and use this formulation to prove the convergence to a limit which corresponds to a Wardrop-like equilibrium.
LEMMA IV.4.1. Assume that all traversal cost functions are C¹ and that the payoff function of each player is concave in v^ℓ. Then a Nash-Cournot equilibrium (v^{ℓ*}(n)), ℓ = 1, ..., nm, for the game Γ(n) is characterized by the variational inequality

(IV.35)    (v^{ℓ*}(n) − v^ℓ)ᵀ (S(w*(n)) + (∇S(w*(n)))ᵀ v^{ℓ*}(n)) ≤ 0
           for all admissible v^ℓ, ℓ ∈ M_i, i = 1, ..., m.
Proof. Make explicit the condition established in Theorem IV.3.1. QED
We are now ready to prove the main result.

THEOREM IV.4.1. Under the assumptions of Lemma IV.4.1 there exist a Wardrop equilibrium on the network G, with flow vector w*, and a sequence (Γ(n))_{n∈IN} of games on G admitting Nash-Cournot equilibrium multiflows V*(n), such that the resulting total flow vectors

    w*_a(n) = Σ_{ℓ=1,...,nm} v^{ℓ*}_a(n),   a ∈ A,

verify

    lim_{n→∞} w*(n) = w*.
Proof. For each n the game Γ(n) admits a Nash-Cournot equilibrium multiflow V*(n). Since all the players in M_i are supposed to be identical, we can always assume that all the players in M_i share the same strategy and, therefore, we can write

    v^{ℓ*}(n) = (1/n) ξ_i(n),   ℓ ∈ M_i,

where ξ_i(n) is a flow which is uniformly bounded. The associated total link-flow vector

    w*_a(n) = Σ_{ℓ=1,...,nm} v^{ℓ*}_a(n) = Σ_{i=1}^{m} ξ_{i,a}(n)

is also uniformly bounded. Hence, as n tends to ∞, one can extract a converging subsequence (still denoted by n for the sake of simplifying the notations) with a limit denoted w*.

Replacing v^{ℓ*}(n) by (1/n) ξ_i(n) in (IV.35) and summing over all players, there comes

    (w*(n) − w)ᵀ S(w*(n)) + Q_n ≤ 0   ∀w,

with

    Q_n = Σ_{i=1}^{m} Σ_{ℓ∈M_i} (ξ_i(n)/n − v^ℓ)ᵀ (∇S(w*(n)))ᵀ ξ_i(n)/n.

Now, since the set of feasible flows is compact, one has lim_{n→∞} Q_n = 0; therefore the limit flow vector w* satisfies

    (w* − w)ᵀ S(w*) ≤ 0   ∀w,

and this is, by Lemma IV.3.1, a Wardrop equilibrium. QED
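The theorem can be illustrated numerically on the simplest possible instance: a single origin-destination pair joined by two parallel arcs. The sketch below is our illustration; the affine arc costs and the demand are assumed, and Python is used. For each n it computes the symmetric Nash-Cournot equilibrium of Γ(n) from its interior first-order condition (marginal costs equalized across arcs) and shows the total arc flow approaching the Wardrop flow:

    d = 10.0                       # total demand for the single OD pair
    S1 = lambda w: 1.0 + w         # traversal cost, arc 1
    S2 = lambda w: 2.0 + 0.5 * w   # traversal cost, arc 2
    dS1, dS2 = 1.0, 0.5            # their (constant) derivatives

    # Wardrop equilibrium: equal costs on used arcs, S1(w1) = S2(d - w1).
    w1_wardrop = (2.0 - 1.0 + 0.5 * d) / (1.0 + 0.5)
    print("Wardrop flow on arc 1:", w1_wardrop)      # -> 4.0

    def nash_cournot_total_flow(n):
        """Symmetric Nash-Cournot equilibrium of Gamma(n): each of the n
        identical players ships d/n and equalizes marginal costs
        S_a(w_a) + v_a * S_a'(w_a) across the two arcs."""
        lo, hi = 0.0, d / n
        for _ in range(80):        # bisection on the player's arc-1 share x
            x = 0.5 * (lo + hi)
            w1, w2 = n * x, d - n * x
            gap = (S1(w1) + x * dS1) - (S2(w2) + (d / n - x) * dS2)
            lo, hi = (lo, x) if gap > 0 else (x, hi)
        return n * x               # total flow on arc 1

    for n in (1, 2, 5, 10, 100, 1000):
        print(f"n = {n:5d}  total arc-1 flow = {nash_cournot_total_flow(n):.4f}")

As n grows, the extra marginal term v_a S_a'(w_a) vanishes (each player's share is O(1/n)) and the first-order condition collapses to the Wardrop cost-equalization condition, exactly as in the proof above.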
Part 2
Repeated and sequential Games
CHAPTER V
Repeated Games and Memory Strategies
In this chapter we begin our analysis of dynamic games. To say it in an imprecise but intuitive way, the dynamic structure of a game expresses a change over time of the conditions under which the game is played. In a repeated game this structural change over time is due to the accumulation of information about the history of the game. As time unfolds, the information at the disposal of the players changes and, since strategies transform this information into actions, the players' strategic choices are affected. If a game is repeated twice, i.e., the same game is played in two successive stages and a reward is obtained at each stage, one can easily see that, at the second stage, the players can decide on the basis of the outcome of the first stage. The situation becomes more and more complex as the number of stages increases, since the players can base their decisions on histories represented by sequences of actions and outcomes observed over increasing numbers of stages. We may also consider games that are played over an infinite number of stages. These infinitely repeated games are particularly interesting since, at each stage, there is always an infinite number of remaining stages over which the game will be played and, therefore, there will not be the so-called end-of-horizon effect, where the fact that the game is ending affects the players' behavior. Indeed, the evaluation of strategies will have to be based on the comparison of infinite streams of rewards, and we shall explore different ways to do that. In summary, the important concept to understand here is that, even if it is the same game which is repeated over a number of periods, the global repeated game becomes a fully dynamic system with a much more complex strategic structure than the one-stage game.
In this chapter we shall consider repeated bimatrix games and repeated concave m-player games. We shall explore, in particular, the class of equilibria that arises when the number of periods over which the game is repeated (the horizon) becomes infinite. We will show how the class of equilibria in repeated games can be expanded considerably if one allows memory in the information structure. The class of memory strategies allows the players to incorporate threats in their strategic choices. A threat is meant to deter the opponents from entering into a detrimental behavior. In order to be effective a threat must be credible. We shall explore the credibility issue and show how the subgame perfectness property of equilibria introduces a refinement criterion that makes these threats credible.
Notice that, at a given stage, the players' actions have no direct influence on the normal form description of the games that will be played in the forthcoming periods. A more complex dynamic structure would be obtained if one assumed that the players' actions at one stage influence the type of game to be played in the next stage. This is the case for the class of Markov or sequential games and a fortiori for the class of differential games, where time flows continuously. These dynamic games will be discussed in subsequent chapters of this volume.
V.1. Repeating a game in normal form
Repeated games have also been called games in semi-extensive form in [21]. Indeed this corresponds to an explicit dynamic structure, as in the extensive form, but with a game in normal form defined at each period of time. In the repeated game, the global strategy becomes a way to define the choice of a stage strategy at each period, on the basis of the knowledge accumulated by the players about the history of the game. Indeed the repetition of the same game in normal form permits the players to adapt their strategies to the observed history of the game. In particular, they have the opportunity to implement a class of strategies that incorporate threats.
The payoff associated with a repeated game is usually defined as the sum of the payoffs obtained at each period. However, when the number of periods tends to infinity, a total payoff that implies an infinite sum may not converge to a finite value. There are several ways of dealing with the comparison of infinite streams of payoffs. We shall discover some of them in the subsequent examples.
V.1.1. Repeated bimatrix games. Figure V.1 describes a repeated bimatrix game
structure. The same matrix game is played repeatedly, over a number T of periods or
[Figure V.1 shows two copies of the same p × q payoff table, with entries (α^j_{kℓ})_{j=1,2}, followed by an arrow suggesting the passage from one stage to the next.]
FIGURE V.1. A repeated game
stages that represent the passing of time. In the one-stage game, Player 1 has p pure strategies, Player 2 has q pure strategies, and the one-stage payoff pair associated with the strategy pair (k, ℓ) is (α^j_{kℓ})_{j=1,2}. The payoffs are accumulated over time. As the players may recall the past history of the game, the extensive form of those repeated games is quite complex. The game is defined in a semi-extensive form since at each period a normal form (i.e., an already aggregated description) defines the consequences of the strategic choices in a one-stage game. Even if the same game seems to be played at each period (it is similar to the others in its normal form description), in fact, the possibility to use the history of the game as a source of information to adapt the strategic choice at each stage makes a repeated game a fully dynamic object.
We shall use this structure to prove an interesting result, called the folk theorem, which shows that we can build an infinitely repeated bimatrix game, played with the help of finite automata, where the class of equilibria permits the approximation of any individually rational outcome of the one-stage bimatrix game.
V.1.2. Repeated concave games. Consider now the class of dynamic games where one repeats a concave game à la Rosen. Let t ∈ {0, 1, ..., T−1} be the time index. At each time period t the players enter into a noncooperative game defined by

    (M; U_j, ψ_j(·), j ∈ M)

where, as indicated in Section III.4, M = {1, 2, ..., m} is the set of players, U_j is a compact convex set describing the actions available to Player j at each period, and ψ_j(u(t)) is the payoff to Player j when the action vector u(t) = (u_1(t), ..., u_m(t)) ∈ U = U_1 × U_2 × ... × U_m is chosen at period t by the m players. This function is assumed to be concave in u_j.

We shall denote ū = (u(t) : t = 0, 1, ..., T−1) the action sequence over the sequence of T periods. The total payoff of Player j over the T-horizon is then defined by

(V.1)    V^T_j(ū) = Σ_{t=0}^{T−1} ψ_j(u(t)).
V.1.2.1. Open-loop information structure. At period t, each Player j knows only the current time t and what he has played in periods τ = 0, 1, ..., t−1. He does not observe what the other players do. We implicitly assume that the players cannot use the rewards they receive at each period to infer, through e.g., an appropriate filtering, the actions of the other players. The open-loop information structure actually eliminates almost every aspect of the dynamic structure of the repeated game context.
V.1.2.2. Closed-loop information structure. At period t, each Player j knows not only the current time t and what he has played in periods τ = 0, 1, ..., t−1, but also what the other players have done at previous periods. We call history of the game at period τ the sequence

    h(τ) = (u(t) : t = 0, 1, ..., τ−1).

A closed-loop strategy for Player j is defined as a sequence of mappings

    σ_j : h(τ) ↦ U_j.

Due to the repetition over time of the same game, each Player j can adjust his choice of one-stage action u_j to the history of the game. This permits the consideration of memory strategies where some threats are included in the announced strategies. For example, a player can declare that some particular histories would trigger a retaliation from his part. The description of a trigger strategy will therefore include:
- a nominal mood of play that contributes to the expected or desired outcomes;
- a retaliation mood of play that is used as a threat;
- the set of histories that trigger a switch from the nominal mood of play to the retaliation mood of play.
Many authors have studied equilibria obtained in repeated games through the use of
trigger strategies (see for example [51], [52], [22]).
V.1.2.3. Infinite horizon games. If the game is repeated over an infinite number of periods then payoffs are represented by infinite streams of rewards ψ_j(u(t)), t = 0, 1, 2, ....

Discounted sum of rewards: We have assumed that the action sets U_j are compact; therefore, since the functions ψ_j(·) are continuous, the one-stage rewards ψ_j(u(t)) are uniformly bounded. Hence the discounted payoffs

(V.2)    V_j(ū) = Σ_{t=0}^{∞} β_j^t ψ_j(u(t)),

where β_j ∈ [0, 1) is the discount factor of Player j, are well defined. Therefore, if the players discount over time, we can compare strategies on the basis of the discounted sums of the rewards they generate for each player.
Average rewards per period: If the players do not discount over time, the comparison of strategies implies the consideration of infinite streams of rewards that sum up to infinity. A way to circumvent the difficulty is to consider the limit average reward per period

(V.3)    g_j(ū) = liminf_{T→∞} (1/T) Σ_{t=0}^{T−1} ψ_j(u(t)).

To link this evaluation of infinite streams of rewards with the previous one, based on discounted sums, one may define the equivalent discounted constant reward

(V.4)    g^{β_j}_j(ū) = (1 − β_j) Σ_{t=0}^{∞} β_j^t ψ_j(u(t)).

One expects that, for well-behaved games, the following limiting property holds true

(V.5)    lim_{β_j→1} g^{β_j}_j(ū) = g_j(ū).
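The limiting property (V.5) is easy to check numerically on a periodic reward stream. The short Python sketch below is our illustration, with assumed reward values; it compares the equivalent discounted constant reward (V.4) with the average reward (V.3) as the discount factor approaches 1:

    # For a periodic stream, the Abel mean (1 - beta) * sum beta^t psi(t)
    # approaches the Cesaro mean (1/T) * sum psi(t) as beta -> 1.
    rewards = [3.0, 0.0, 1.0]                 # one cycle of psi_j(u(t))
    cesaro = sum(rewards) / len(rewards)      # limit average reward (V.3)

    def abel_mean(beta, horizon=200_000):
        return (1 - beta) * sum(beta**t * rewards[t % len(rewards)]
                                for t in range(horizon))

    for beta in (0.9, 0.99, 0.999):
        print(beta, abel_mean(beta))          # -> tends to 1.333...
    print("Cesaro mean:", cesaro)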
V.1.2.4. Threats. In infinite horizon repeated games each player always has enough time to retaliate. He will always have the possibility to implement an announced threat if the opponents are not playing according to some nominal sequence corresponding to an agreement. We shall see that the equilibria based on the use of threats constitute a very large class. Therefore the set of strategies with the memory information structure is much more encompassing than the open-loop one.
V.1.2.5. Existence of a trivial dynamic equilibrium. We know, by Rosen's existence theorem, that a concave static game

    (M; U_j, ψ_j(·), j ∈ M)

admits an equilibrium u*. It should be clear (see Exercise 5.1) that the constant sequence ū* = (u(t) ≡ u* : t = 0, 1, ..., T−1) is also an equilibrium for the repeated game, in both the open-loop and the closed-loop (memory) information structure.

Assume the one-period game is such that a unique equilibrium exists. The open-loop repeated game will then also have a unique equilibrium, whereas the closed-loop repeated game will still have a plethora of equilibria, as we shall see shortly.
V.2. Folk theorem
In this section we present the celebrated folk theorem. This name is due to the fact that there is no well-defined author of its first version. The idea is that an infinitely repeated game should permit the players to design equilibria, supported by threats, whose outcomes are Pareto efficient. The strategies will be based on memory, each player remembering the past moves of the game. Indeed the infinite horizon could give rise to an infinite storage of information, so there is a question of designing strategies with the help of mechanisms that would exploit the history of the game with only a finite information storage capacity. Finite automata provide such systems.
V.2.1. Repeated games played by automata. A finite automaton is a logical system that, when stimulated by an input, generates an output. For example an automaton could be used by a player to determine the stage-game action he takes, given the information he has received. The automaton is finite if the length of the logical input (a stream of 0-1 bits) it can store is finite. So, in our repeated game context, a finite automaton will not permit a player to base his stage-game action upon an infinite memory of what has happened before. To describe a finite automaton one needs the following:

- A list of all possible stored input configurations; this list is finite and each element is called a state. One element of this list is the initial state.
- An output function that determines the action taken by the automaton, given its current state.
- A transition function that tells the automaton how to move from one state to another after it has received a new input element.

In our paradigm of a repeated game played by automata, the input element for the automaton a of Player 1 is the stage-game action of Player 2 in the preceding stage, and symmetrically for Player 2. In this way, each player can choose a way to process the information he receives in the course of the repeated plays, in order to decide the next action.
An important aspect of finite automata is that, when they are used to play a repeated game, they necessarily generate cycles in the successive states and hence in the actions taken in the successive stage-games.
Let G be a finite two-player game, i.e., a bimatrix game, which defines the one-stage game. U and V are the sets of pure strategies in the game G for Players 1 and 2 respectively. Denote by g_j(u, v) the payoff to Player j in the one-stage game when the pure strategy pair (u, v) ∈ U × V has been selected by the two players.
The game is repeated indefinitely. Although the set of strategies for an infinitely repeated game is enormously rich, we shall restrict the strategy choices of the players to the class of finite automata. A pure strategy for Player 1 is an automaton a ∈ A which has for input the actions v ∈ V of Player 2. Symmetrically, a pure strategy for Player 2 is an automaton b ∈ B which has for input the actions u ∈ U of Player 1.
We associate with the pair of finite automata (a, b) ∈ A × B a pair of average payoffs per stage defined as follows

(V.6)    g_j(a, b) = (1/N) Σ_{n=1}^{N} g_j(u_n, v_n),   j = 1, 2,

where N is the length of the cycle associated with the pair of automata (a, b) and (u_n, v_n) is the action pair at the n-th stage in this cycle. One notices that the expression (V.6) is also the limit average reward per period, due to the cycling behavior of the two automata.
We call 𝒢 the game defined by the strategy sets A and B and the payoff functions (V.6).
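A finite automaton in the sense above is a small data structure: a state set, an output function and a transition function. The following sketch is our illustration; the stage game and its payoff values are assumed. It lets two "grim trigger" automata play a prisoner's-dilemma stage game and evaluates (V.6) on the cycle they settle into:

    # Stage game: actions C, D; payoff to the row player (symmetric PD).
    G1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}

    class Automaton:
        def __init__(self, output, transition, state):
            self.output, self.transition, self.state = output, transition, state
        def act(self):
            return self.output[self.state]
        def update(self, opponent_action):
            self.state = self.transition[(self.state, opponent_action)]

    def grim():
        # Two states: 'coop' (play C) and 'punish' (play D forever after a D).
        output = {"coop": "C", "punish": "D"}
        transition = {("coop", "C"): "coop", ("coop", "D"): "punish",
                      ("punish", "C"): "punish", ("punish", "D"): "punish"}
        return Automaton(output, transition, "coop")

    a, b = grim(), grim()
    history = []
    for _ in range(50):                  # long enough to reach the cycle
        u, v = a.act(), b.act()
        history.append((u, v))
        a.update(v); b.update(u)

    cycle = history[-1:]                 # two grim automata cycle on (C, C)
    avg1 = sum(G1[(u, v)] for u, v in cycle) / len(cycle)
    avg2 = sum(G1[(v, u)] for u, v in cycle) / len(cycle)
    print("cycle:", cycle, " average payoffs:", avg1, avg2)   # -> 3.0, 3.0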
V.2.2. Minimax point. Assume Player 1 wants to threaten Player 2. An effective threat would be to define his action in the one-stage game through the solution of

    m_2 = min_u max_v g_2(u, v).

Similarly, if Player 2 wants to threaten Player 1, an effective threat would be to define his action in the one-stage game through the solution of

    m_1 = min_v max_u g_1(u, v).

We call minimax point the pair m = (m_1, m_2) in IR².
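For a bimatrix game stored as two payoff matrices, the minimax point is a pair of one-line computations, as the following sketch shows (our illustration; the payoff values are assumed):

    import numpy as np

    # Rows are Player 1's actions u, columns are Player 2's actions v.
    G1 = np.array([[3, 0], [4, 1]])   # payoffs g_1(u, v)
    G2 = np.array([[3, 4], [0, 1]])   # payoffs g_2(u, v)

    m2 = G2.max(axis=1).min()   # min over u of max over v of g_2(u, v)
    m1 = G1.max(axis=0).min()   # min over v of max over u of g_1(u, v)
    print("minimax point m =", (m1, m2))   # -> (1, 1) for this PD-like game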
V.2.3. Set of outcomes dominating the minimax point.
THEOREM V.2.1. Let T_m be the set of outcomes in the bimatrix game that dominate the minimax point m. Then the outcomes corresponding to Nash equilibria in pure strategies¹ of the game 𝒢 are dense in T_m.
Proof: Let g^1, g^2, ..., g^K be the pairs of payoffs that appear in the bimatrix game when it is written in vector form, that is

    g^k = (g_1(u^k, v^k), g_2(u^k, v^k)),

where (u^k, v^k) is a possible pure strategy pair. Let q_1, q_2, ..., q_K be nonnegative rational numbers that sum to 1. Then the convex combination

    g̃ = q_1 g^1 + q_2 g^2 + ... + q_K g^K

is an element of T_m if g̃_1 ≥ m_1 and g̃_2 ≥ m_2. The set of such g̃ is dense in T_m. Each q_k, being a rational number, can be written as a ratio of two integers, that is, a fraction. It is possible to write the K fractions with a common denominator, hence

    q_k = n_k / N,

with

    n_1 + n_2 + ... + n_K = N,

since the q_k's must sum to 1.
We construct two automata a* and b* such that, together, they play (u^1, v^1) during n_1 stages, (u^2, v^2) during n_2 stages, etc. They complete the sequence by playing (u^K, v^K) during n_K stages, and then the N-stage cycle begins again. The limit average per period reward of Player j in the game 𝒢 played by these automata is given by

    g_j(a*, b*) = (1/N) Σ_{k=1}^{K} n_k g_j(u^k, v^k) = Σ_{k=1}^{K} q_k g_j(u^k, v^k) = g̃_j.
Therefore these two automata achieve the payoff pair g̃. Now we refine the structure of these automata as follows:

- Let u(n) and v(n) be the pure strategies that Player 1 and Player 2, respectively, are supposed to play at stage n, according to the cycle defined above.
- The automaton a* plays u(n+1) at stage n+1 if automaton b* has played v(n) at stage n; otherwise it plays ū, the strategy that minimaxes Player 2's one-shot payoff, and this stage strategy is played forever.
- Symmetrically, the automaton b* plays v(n+1) at stage n+1 if automaton a* has played u(n) at stage n; otherwise it plays v̄, the strategy that minimaxes Player 1's one-shot payoff, and this stage strategy is played forever.
It should be clear that the two automata a* and b* defined as above constitute a Nash equilibrium for the repeated game 𝒢 whenever g̃ dominates m, i.e., g̃ ∈ T_m. Indeed,
¹ A pure strategy in the game 𝒢 is a deterministic choice of an automaton; this automaton will process information coming from the repeated use of mixed strategies in the repeated game.
if Player 1 unilaterally changes his automaton to a, then the automaton b* will select, for all periods except a finite number, the minimax action v̄ and, as a consequence, the limit average per period reward obtained by Player 1 is

    g_1(a, b*) ≤ m_1 ≤ g_1(a*, b*).

Similarly, for Player 2, a unilateral change to b ∈ B leads to a payoff

    g_2(a*, b) ≤ m_2 ≤ g_2(a*, b*).

Q.E.D.
REMARK V.2.1. This result is known as the folk theorem because it has always been part of the folklore of game theory without it being known to whom it should be attributed. The class of equilibria is more restricted if one imposes the condition that the threats used by the players in their announced strategies be credible. This is the issue discussed in one of the following sections under the name of subgame perfectness.
V.3. Collusive equilibrium in a repeated Cournot game
V.3.1. Infinite horizon and Cournot equilibrium threats. Consider a Cournot oligopoly game repeated over an infinite sequence of periods. Let t = 1, ..., ∞ be the sequence of time periods. Let q(t) ∈ IR^m be the production vector chosen by the m firms at period t and ψ_j(q(t)) the payoff to Player j at period t. We denote

    q̃ = (q(1), ..., q(t), ...)

the infinite sequence of production decisions of the m firms. Over the infinite time horizon the rewards of the players are defined as

    Ṽ_j(q̃) = Σ_{t=0}^{∞} β_j^t ψ_j(q(t)),

where β_j ∈ [0, 1) is the discount factor used by Player j.

We assume that, at each period t, all players have the same information h(t), which is the history of past productions decided by all the firms in competition, that is

    h(t) = (q(0), q(1), ..., q(t−1)).
We consider, for the one-stage game, two possible outcomes:

(i): the Cournot equilibrium, supposed to be uniquely defined by q^c = (q^c_j)_{j=1,...,m}, and

(ii): a Pareto outcome resulting from the production levels q* = (q*_j)_{j=1,...,m} and such that ψ_j(q*) > ψ_j(q^c), j = 1, ..., m. We say that this Pareto outcome is a collusive outcome dominating the Cournot equilibrium.
A Cournot equilibrium for the repeated game is defined by the strategy consisting, for each Player j, of choosing repeatedly the production level q_j(t) ≡ q^c_j, j = 1, ..., m. However, there are many other equilibria that can be obtained through the use of memory strategies.
Let us define

(V.7)    Ṽ*_j = Σ_{t=0}^{∞} β_j^t ψ_j(q*)

(V.8)    ψ̃_j(q*) = max_{q_j} ψ_j([q*_{(j)}, q_j]),

where [q*_{(j)}, q_j] denotes the production vector q* in which Player j's component has been replaced by q_j.
The following has been shown in [21].

LEMMA V.3.1. There exists an equilibrium for the repeated Cournot game that yields the payoffs Ṽ*_j if the following inequality holds

(V.9)    β_j ≥ (ψ̃_j(q*) − ψ_j(q*)) / (ψ̃_j(q*) − ψ_j(q^c)).
This equilibrium is reached through the use of a so-called trigger strategy defined as follows:

(V.10)    if (q(0), q(1), ..., q(t−1)) = (q*, q*, ..., q*) then q_j(t) = q*_j; otherwise q_j(t) = q^c_j.
_
Proof: If the players play according to (V.10) the payoff they obtain is given by
2

j
=

t=0

t
j

j
(q

) =
1
1
j

j
(q

).
Assume Player j decides to deviate unilaterally at period 0. He knows that his deviation
will be detected at period 1 and that, thereafter the Cournot equilibrium production
level q
c
will be played. So, the best he can expect from this deviation is the following
payoff which combines the maximal reward he can get in period 0 when the other rms
play q

k
with the discounted value of an innite stream of Cournot outcomes
3
.

V
j
( q) =
j
(q

) +

j
1
j

j
(q
c
).
This unilateral deviation is not profitable if

    Ṽ_j(q̃) ≤ Ṽ*_j = Σ_{t=0}^{∞} β_j^t ψ_j(q*) = (1/(1−β_j)) ψ_j(q*),

that is, if the following inequality holds:

    ψ̃_j(q*) − ψ_j(q*) − (β_j/(1−β_j)) (ψ_j(q*) − ψ_j(q^c)) ≤ 0,
² We use here the fact that Σ_{k=0}^{∞} β^k = 1/(1−β) when β < 1.
³ We use here the fact that Σ_{k=1}^{∞} β^k = β/(1−β) when β < 1.
which can be rewritten as

    (1 − β_j)(ψ̃_j(q*) − ψ_j(q*)) − β_j (ψ_j(q*) − ψ_j(q^c)) ≤ 0,

from which we get

    ψ̃_j(q*) − ψ_j(q*) − β_j (ψ̃_j(q*) − ψ_j(q^c)) ≤ 0

and, finally, the condition (V.9)

    β_j ≥ (ψ̃_j(q*) − ψ_j(q*)) / (ψ̃_j(q*) − ψ_j(q^c)).
The same reasoning holds if the deviation occurs at any other period t. Q.E.D.
REMARK V.3.1. According to this strategy, the q* production levels correspond to a cooperative mood of play while the q^c levels correspond to a punitive mood of play. The punitive mood is a threat. As the punitive mood of play consists of playing the Cournot solution, which is indeed an equilibrium, it is a credible threat. Therefore the players will always choose to play in accordance with the Pareto solution q*, and thus the equilibrium defines a nondominated outcome.
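Condition (V.9) is easy to evaluate in the classical linear-demand duopoly. The sketch below is our illustration; the inverse demand p = a − b(q_1 + q_2) and unit cost c are assumed parameters, not taken from the text. It computes the one-stage Cournot profit, the collusive profit, the best one-shot deviation payoff ψ̃_j(q*), and the resulting threshold on the discount factor:

    a, b, c = 10.0, 1.0, 1.0
    profit = lambda q1, q2: (a - b * (q1 + q2) - c) * q1   # psi_1([q1, q2])

    q_c = (a - c) / (3 * b)      # one-stage Cournot equilibrium output
    q_p = (a - c) / (4 * b)      # collusive output, half the monopoly total

    # Best one-shot deviation against a rival producing q_p:
    # maximize (a - b(q1 + q_p) - c) q1  =>  q1 = (a - c - b q_p) / (2 b).
    q_dev = (a - c - b * q_p) / (2 * b)

    psi_star = profit(q_p, q_p)      # psi_j(q*)
    psi_tilde = profit(q_dev, q_p)   # deviation payoff, tilde psi_j(q*)
    psi_cournot = profit(q_c, q_c)   # psi_j(q^c)

    beta_min = (psi_tilde - psi_star) / (psi_tilde - psi_cournot)  # (V.9)
    print(f"collusion is sustainable for beta_j >= {beta_min:.4f}")

For these parameter values the threshold is 9/17 ≈ 0.53: the more patient the firms, the easier it is to sustain collusion by the trigger strategy (V.10).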
V.3.2. Subgame perfectness. It is important to notice the difference between the threats used in the folk theorem and the ones used in this repeated Cournot game. In the folk theorem the threats are the most effective ones, in the sense that a threat aims at maximizing the damage done to the opponent. However, as already mentioned, these threats lack credibility since, for the player who is using the threat, this action is not the best course of action given the opponent's behavior.

In the infinitely repeated Cournot game, the threats also constitute an equilibrium for the dynamic game. Therefore the threats are credible since, once the threats are applied by every player, everyone is reacting optimally to the actions of the other players. We say that the memory (trigger strategy) equilibrium is subgame perfect in the sense of Selten [56].
V.3.3. Finite vs infinite horizon. There is an important difference between finitely and infinitely repeated games. If the same game is repeated only a finite number of times then, in a subgame perfect equilibrium, a single game equilibrium will be repeated at each period. This is due to the impossibility of retaliating at the last period. Then this situation is translated backward in time until the initial period. When the game is repeated over an infinite time horizon there is always enough time to retaliate and even, possibly, to resume cooperation. For example, let us consider a repeated Cournot game without discounting, where the players' payoffs are determined by the long-term average of the one-stage rewards. If the players have perfect information with delay, they can observe without error the moves selected by their opponents in previous stages. There might be a delay of several stages before this observation is made. In that case it is easy to show that to any cooperative solution dominating a static Nash equilibrium corresponds a subgame perfect sequential equilibrium for the repeated game, based on the use of trigger strategies. These strategies use as a threat
a switch to the noncooperative solution for all remaining periods, as soon as a player has been observed not to play according to the cooperative solution. Since the long-term average only depends on the infinitely repeated one-stage rewards, it is clear that the threat gives rise to the Cournot equilibrium payoff in the long run. This is the subgame perfect version of the folk theorem [22].
V.3.4. A repeated stochastic Cournot game with discounting and imperfect information. We now give an example of a stochastic repeated game for which the construction of equilibria based on the use of threats is not trivial. This is the stochastic repeated game based on the Cournot model with imperfect information (see [23], [50]).

Let X(t) = Σ_{j=1}^{m} x_j(t) be the total supply of the m firms at time t and let (ε(t)) be a sequence of i.i.d.⁴ random variables affecting the price. Each firm j chooses its supply x_j(t) ≥ 0, in a desire to maximize its total expected discounted profit

    V_j(x_1(·), ..., x_j(·), ..., x_m(·)) = E Σ_{t=0}^{∞} β^t [D(X(t), ε(t)) x_j(t) − c_j(x_j(t))],   j = 1, ..., m,

where β < 1 is the discount factor. The assumed information structure does not allow each player to observe the opponents' actions used in the past. Only the sequence of past prices, corrupted by the random noise, is available. Therefore the players cannot monitor without error the past actions of their opponent(s). This impossibility to detect without error a breach of cooperation increases considerably the difficulty of building equilibria based on the use of credible threats. In [23], [50] it is shown that there exist subgame perfect equilibria, based on the use of trigger strategies, which dominate the repeated Cournot-Nash equilibrium. This construct uses the concept of Markov game and will be further discussed in Part 3 of these Notes. It will be shown that these equilibria are in fact closely related to the so-called correlated and/or communication equilibria discussed at the end of Part 2.
V.4. Exercises
Exercise 5.1. Prove that the repeated one-stage equilibrium is an equilibrium for the repeated game with additive payoffs and a finite number of periods. Extend the result to infinitely repeated games.
Exercise 5.2. Show that condition (V.9) holds if β_j = 1. Adapt Lemma V.3.1 to the case where the game is evaluated through the long-term average reward criterion.

⁴ Independent and identically distributed.
CHAPTER VI
Shapley's Zero-Sum Markov Game
VI.1. Process and rewards dynamics
The concept of a Markov game has been introduced, in a zero-sum game framework, in [57]. The structure of this game can also be described as a controlled Markov chain with two competing agents.

Let S = {1, 2, ..., n} be the set of possible states of a discrete-time stochastic process {x(t) : t = 0, 1, ...}. Let U_j = {1, 2, ..., μ_j}, j = 1, 2, be the finite action sets of the two players. The process dynamics is described by the transition probabilities

    p_{s,s'}(u) = P[x(t+1) = s' | x(t) = s, u],   s, s' ∈ S, u ∈ U_1 × U_2,

which satisfy, for all u ∈ U_1 × U_2,

    p_{s,s'}(u) ≥ 0,   Σ_{s'∈S} p_{s,s'}(u) = 1,   s ∈ S.

As the transition probabilities depend on the players' actions, we speak of a controlled Markov chain. A transition reward function

    r(s, u),   s ∈ S, u ∈ U_1 × U_2,

defines the gain of Player 1 when the process is in state s and the two players take the action pair u. Player 2's reward is given by −r(s, u), since the game is assumed to be zero-sum.
VI.2. Information structure and strategies
VI.2.1. The extensive form of the game. The game defined above corresponds to a game in extensive form with an infinite number of moves. We assume an information structure where the players choose their actions sequentially and simultaneously, with perfect recall. This is illustrated in Figure VI.1.
VI.2.2. Strategies. A strategy is a device which transforms the information into action. Since the players may recall the past observations, the general form of a strategy can be quite complex.
[Figure VI.1 sketches one stage of the game in extensive form: at state s, Player 1 (P1) chooses between u^1_1 and u^2_1 while Player 2 (P2), not informed of that choice, picks u^1_2 or u^2_2; the process then moves to a new state s' according to the transition probabilities p_{s,s'}(u).]
FIGURE VI.1. A Markov game in extensive form


Markov strategies: These strategies are also called feedback strategies. The as-
sumed information structure allows the use of strategies dened as mappings

j
: S P(U
j
), j = 1, 2,
where P(U
j
) is the class of probability distributions over the action set U
j
. Since
the strategies are based only on the information conveyed by the current state of the
x-process, they are called Markov strategies.
When the two players have chosen their respective strategies, the state process becomes a Markov chain with transition probabilities

    P^{γ_1,γ_2}_{s,s'} = Σ_{k=1}^{μ_1} Σ_{ℓ=1}^{μ_2} γ_1(s)_k γ_2(s)_ℓ p_{s,s'}(u^k_1, u^ℓ_2),
where we have denoted γ_1(s)_k, k = 1, ..., μ_1, and γ_2(s)_ℓ, ℓ = 1, ..., μ_2, the probability distributions induced on U_1 and U_2 by γ_1(s) and γ_2(s) respectively. In order to formulate the normal form of the game, it remains to define the payoffs associated with the admissible strategies of the two players. Let β ∈ [0, 1) be a given discount factor. Player 1's payoff, when the game starts in state s^0, is defined as the discounted sum over the infinite time horizon of the expected transition rewards, i.e.,

    V(s^0; γ_1, γ_2) = E_{γ_1,γ_2} [ Σ_{t=0}^{∞} β^t Σ_{k=1}^{μ_1} Σ_{ℓ=1}^{μ_2} γ_1(x(t))_k γ_2(x(t))_ℓ r(x(t), u^k_1, u^ℓ_2) | x(0) = s^0 ].

Player 2's payoff is equal to −V(s^0; γ_1, γ_2).
DEFINITION VI.2.1. A pair of strategies (γ*_1, γ*_2) is a saddle point if, for all strategies γ_1 and γ_2 of Players 1 and 2 respectively, and for all s ∈ S,

(VI.1)    V(s; γ_1, γ*_2) ≤ V(s; γ*_1, γ*_2) ≤ V(s; γ*_1, γ_2).

The number

    v*(s) = V(s; γ*_1, γ*_2)

is called the value of the game at state s.
Memory strategies: Since the game is of perfect recall, the information on which a player can base his/her decision at period t is the whole state-history

    h(t) = (x(0), x(1), ..., x(t)).

Notice that we do not assume that the players have direct access to the actions used in the past by their opponent; they can only observe the state history. However, as in the case of single-player Markov decision processes, it can be shown that optimality does not require more than the use of Markov strategies.
VI.3. Shapley-Denardo operator formalism

We introduce in this section the powerful formalism of dynamic programming operators, which was formally introduced in [11] but was already implicit in Shapley's work. The solution of the dynamic programming equations for the stochastic game is obtained as a fixed point of an operator acting on the space of value functions.
VI.3.1. Dynamic programming operators. In a seminal paper [57] the existence of optimal stationary Markov strategies has been established via a fixed-point argument involving a contracting operator. We give here a brief account of the method, taking our inspiration from [11] and [15]. Let v(·) = (v(s) : s ∈ S) be an arbitrary real-valued function defined over S. Since we assume that S is finite, this is also a vector. Introduce, for any s ∈ S, the so-called local reward functions

    h(v(·), s, u_1, u_2) = r(s, u_1, u_2) + β Σ_{s'∈S} p_{s,s'}(u_1, u_2) v(s'),   (u_1, u_2) ∈ U_1 × U_2.
Define now, for each s ∈ S, the zero-sum matrix game with pure strategy sets U_1 and U_2 and with payoff matrix

    H(v(·), s) = [h(v(·), s, u^k_1, u^ℓ_2)]_{k=1,...,μ_1; ℓ=1,...,μ_2}.

Denote the value of each of these games by

    T(v(·), s) := val[H(v(·), s)]

and let T(v(·)) = (T(v(·), s) : s ∈ S). This defines a mapping T : IR^n → IR^n. This mapping is also called the dynamic programming operator of the Markov game.
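Since T is contracting (Theorem VI.3.1 below), the value function can be approximated by iterating v ← T(v), where at each state one computes the value of the local matrix game H(v(·), s). The following sketch is our illustration: the game data are randomly generated, and the matrix-game value is obtained by linear programming, so numpy and scipy are assumed to be available.

    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(A):
        """Value of the zero-sum matrix game with payoff matrix A (row player
        maximizes): max v s.t. x^T A >= v componentwise, x a probability vector."""
        p, q = A.shape
        c = np.zeros(p + 1); c[-1] = -1.0          # variables (x_1..x_p, v)
        A_ub = np.hstack([-A.T, np.ones((q, 1))])  # v - (x^T A)_l <= 0, all l
        b_ub = np.zeros(q)
        A_eq = np.hstack([np.ones((1, p)), np.zeros((1, 1))])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * p + [(None, None)], method="highs")
        return res.x[-1]

    n, m1, m2, beta = 2, 2, 2, 0.9
    rng = np.random.default_rng(0)
    r = rng.uniform(-1, 1, (n, m1, m2))            # rewards r(s, u1, u2)
    P = rng.dirichlet(np.ones(n), (n, m1, m2))     # transitions p_{s,s'}(u)

    v = np.zeros(n)
    for _ in range(500):                           # iterate v <- T(v)
        H = r + beta * np.einsum("sklt,t->skl", P, v)   # local rewards
        v_new = np.array([matrix_game_value(H[s]) for s in range(n)])
        if np.max(np.abs(v_new - v)) < 1e-10:
            break
        v = v_new
    print("game values v*(s):", v)

The contraction factor is β, so the iteration converges geometrically from any starting vector.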
VI.3.2. Existence of sequential saddle points.

LEMMA VI.3.1. If A and B are two matrices of the same dimensions, then

(VI.2)    |val[A] − val[B]| ≤ max_{k,ℓ} |a_{k,ℓ} − b_{k,ℓ}|.
The proof of this lemma is left as exercise 6.1.
LEMMA VI.3.2. If v(·), γ_1 and γ_2 are such that, for all s ∈ S,

(VI.3)    v(s) ≥ (respectively ≤, respectively =) h(v(·), s, γ_1(s), γ_2(s))
                = r(s, γ_1(s), γ_2(s)) + β Σ_{s'∈S} p_{s,s'}(γ_1(s), γ_2(s)) v(s'),

then

(VI.4)    v(s) ≥ (respectively ≤, respectively =) V(s; γ_1, γ_2).
Proof: The proof is relatively straightforward and consists in iterating the inequality (VI.3). QED
We can now establish the following result.

THEOREM VI.3.1. The mapping T is contracting in the max-norm

    ‖v(·)‖ = max_{s∈S} |v(s)|

and admits the value function introduced in Definition VI.2.1 as its unique fixed point,

    v*(·) = T(v*(·)).

Furthermore the optimal (saddle-point) strategies are defined as the mixed strategies yielding this value, i.e.,

(VI.5)    h(v*(·), s, γ*_1(s), γ*_2(s)) = val[H(v*(·), s)],   s ∈ S.
Proof: We first establish the contraction property. We use Lemma VI.3.1 and the transition probability properties to establish the following inequalities:

    ‖T(v(·)) − T(w(·))‖
      ≤ max_{s∈S} max_{u_1∈U_1, u_2∈U_2} | r(s, u_1, u_2) + β Σ_{s'∈S} p_{s,s'}(u_1, u_2) v(s')
            − r(s, u_1, u_2) − β Σ_{s'∈S} p_{s,s'}(u_1, u_2) w(s') |
      = max_{s∈S; u_1∈U_1, u_2∈U_2} β | Σ_{s'∈S} p_{s,s'}(u_1, u_2) (v(s') − w(s')) |
      ≤ max_{s∈S; u_1∈U_1, u_2∈U_2} β Σ_{s'∈S} p_{s,s'}(u_1, u_2) |v(s') − w(s')|
      ≤ max_{s∈S; u_1∈U_1, u_2∈U_2} β Σ_{s'∈S} p_{s,s'}(u_1, u_2) ‖v(·) − w(·)‖
      = β ‖v(·) − w(·)‖.

Hence T is a contraction, since 0 ≤ β < 1. By the Banach contraction theorem this implies that there exists a unique fixed point v*(·) of the operator T.
We now show that there exist stationary Markov strategies γ*_1, γ*_2 for which the saddle-point condition holds:

(VI.6)    V(s; γ_1, γ*_2) ≤ V(s; γ*_1, γ*_2) ≤ V(s; γ*_1, γ_2).
Let γ*_1(s), γ*_2(s) be the saddle-point strategies for the local matrix game with payoffs

(VI.7)    H(v*(·), s) = [h(v*(·), s, u_1, u_2)]_{u_1∈U_1, u_2∈U_2}.

Consider any strategy γ_2 for Player 2. Then, by definition,

(VI.8)    h(v*(·), s, γ*_1(s), γ_2(s)) ≥ v*(s),   s ∈ S.
By Lemma VI.3.2 the inequality (VI.8) implies that, for all s ∈ S,

(VI.9)    V(s; γ*_1, γ_2) ≥ v*(s).

Similarly we would obtain that, for any strategy γ_1 and for all s ∈ S,

(VI.10)    V(s; γ_1, γ*_2) ≤ v*(s),

and

(VI.11)    V(s; γ*_1, γ*_2) = v*(s).

This establishes the saddle-point property. QED
In the single-player case (MDP, or Markov Decision Process) the contraction and monotonicity of the dynamic programming operator readily yield a converging numerical algorithm for the computation of the optimal strategies [31]. In the two-player zero-sum case, even if the existence theorem can also be translated into a converging algorithm, some difficulties arise (see e.g., [16] for a recently proposed converging algorithm).
CHAPTER VII
Nonzero-sum Markov and Sequential Games
In this chapter we consider the possible extension of the Shapley Markov game formalism to nonzero-sum games. We shall consider two steps in these extensions: (i) use the same finite state and finite action formalism as in Shapley's work and introduce different reward functions for the different players; (ii) introduce a more general framework where the states and actions can be continuous variables.
VII.1. Sequential games with discrete state and action sets
We consider a stochastic game played non-cooperatively by players that are not in a purely antagonistic situation. Shapley's formalism extends without difficulty to a nonzero-sum setting.
VII.1.1. Markov game dynamics. Introduce S = {1, 2, ..., n}, the set of possible states; U_j(s) = {1, 2, ..., μ_j(s)}, j = 1, ..., m, the finite action sets at state s of the m players; and the transition probabilities

    p_{s,s'}(u) = P[x(t+1) = s' | x(t) = s, u],   s, s' ∈ S, u ∈ U_1(s) × ... × U_m(s).

Define the transition reward of Player j, when the process is in state s and the players take the action vector u, by

    r_j(s, u),   s ∈ S, u ∈ U_1(s) × ... × U_m(s).
REMARK VII.1.1. Notice that we permit the action sets of the different players to depend on the current state s of the game. Shapley's theory remains valid in that case too.
VII.1.2. Markov strategies. Markov strategies are defined as in the zero-sum case. We denote γ_j(x(t))_k the probability given to action u^k_j by Player j when he uses strategy γ_j and the current state is x(t).
VII.1.3. Feedback-Nash equilibrium. Define Markov strategies as above, i.e., as mappings from S into the players' mixed actions. Player j's payoff is thus defined as

    V_j(s^0; γ_1, ..., γ_m) = E_{γ_1,...,γ_m} [ Σ_{t=0}^{∞} β^t Σ_{k_1,...,k_m} γ_1(x(t))_{k_1} ··· γ_m(x(t))_{k_m} r_j(x(t), u^{k_1}_1, ..., u^{k_m}_m) | x(0) = s^0 ].
DEFINITION VII.1.1. An m-tuple of Markov strategies (γ*_1, ..., γ*_m) is a feedback-Nash equilibrium if, for all s ∈ S,

    V_j(s; γ*_1, ..., γ_j, ..., γ*_m) ≤ V_j(s; γ*_1, ..., γ*_j, ..., γ*_m)

for all strategies γ_j of Player j, j ∈ M. The number

    v*_j(s) = V_j(s; γ*_1, ..., γ*_j, ..., γ*_m)

will be called the equilibrium value of the game at state s for Player j.
VII.1.4. Sobel-Whitt operator formalism. The first author to extend Shapley's work to a nonzero-sum framework was Sobel [?]. A more recent treatment of nonzero-sum sequential games can be found in [68].

We introduce the so-called local reward functions

(VII.1)    h_j(s, v_j(·), u) = r_j(s, u) + β Σ_{s'∈S} p_{s,s'}(u) v_j(s'),   j ∈ M,

where the functions v_j(·) : S → IR are given reward-to-go functionals (in this case they are vectors of dimension n = card(S)) defined for each Player j. The local reward (VII.1) is the sum of the transition reward for Player j and the discounted expected reward-to-go from the new state reached after the transition. For a given s and a given set of reward-to-go functionals v_j(·), j ∈ M, the local rewards (VII.1) define a matrix game over the pure strategy sets U_j(s), j ∈ M.
We now define, for any given Markov policy vector γ = (γ_j)_{j∈M}, an operator H_γ, acting on the space of reward-to-go functionals (i.e., n-dimensional vectors), defined by

(VII.2)    (H_γ v(·))(s) = ( E_{γ(s)} [ h_j(s, v_j(·), u) ] )_{j∈M}.

We also introduce the operator F_γ, defined by

(VII.3)    (F_γ v(·))(s) = ( sup_{γ_j} E_{γ^{(j)}(s)} [ h_j(s, v_j(·), u) ] )_{j∈M},

where u is the random action vector and γ^{(j)} is the Markov policy obtained when only Player j adjusts his/her policy, while the other players keep their γ-policies fixed. In other words, Eq. (VII.3) defines the optimal reply of each Player j to the Markov strategies chosen by the other players.
VII.1.5. Existence of Nash equilibria. The dynamic programming formalism introduced above leads to the following powerful results (see [68] for a recent proof).

THEOREM VII.1.1. Consider the sequential game defined above. Then

(1) The expected payoff vector associated with a stationary Markov policy γ is given by the unique fixed point v_γ(·) of the contracting operator H_γ.
(2) The operator F_γ is also contracting and thus admits a unique fixed point f_γ(·).
(3) The stationary Markov policy γ* is an equilibrium strategy iff

(VII.4)    f_{γ*}(s) = v_{γ*}(s),   s ∈ S.

(4) There exists an equilibrium defined by a stationary Markov policy.

REMARK VII.1.2. In the nonzero-sum case the existence theorem is not based on a contraction property of the "equilibrium" dynamic programming operator. As a consequence, the existence result does not translate easily into a converging algorithm for the numerical solution of these games (see [?]).
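Despite this caveat, a heuristic that is often tried in practice is best-response iteration: fix the opponents' stationary policies, solve the Markov decision process this induces for one player, and rotate. The sketch below is our illustration; the data are randomly generated, the policies are restricted to pure stationary ones for simplicity, and convergence is not guaranteed, precisely because no contraction argument applies.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m1, m2, beta = 3, 2, 2, 0.9
    r = rng.uniform(0, 1, (2, n, m1, m2))          # r[j, s, u1, u2]
    P = rng.dirichlet(np.ones(n), (n, m1, m2))     # P[s, u1, u2, s']

    def best_response(j, pi_other):
        """Value-iterate the MDP Player j faces when the other player uses
        the pure stationary policy pi_other (actions indexed by state)."""
        v = np.zeros(n)
        for _ in range(1000):
            if j == 0:
                q = np.array([[r[0, s, a, pi_other[s]]
                               + beta * P[s, a, pi_other[s]] @ v
                               for a in range(m1)] for s in range(n)])
            else:
                q = np.array([[r[1, s, pi_other[s], a]
                               + beta * P[s, pi_other[s], a] @ v
                               for a in range(m2)] for s in range(n)])
            v_new = q.max(axis=1)
            if np.max(np.abs(v_new - v)) < 1e-10:
                break
            v = v_new
        return q.argmax(axis=1)

    pi1, pi2 = np.zeros(n, dtype=int), np.zeros(n, dtype=int)
    for it in range(50):                           # best-response dynamics
        new1 = best_response(0, pi2)
        new2 = best_response(1, new1)
        if np.array_equal(new1, pi1) and np.array_equal(new2, pi2):
            print("fixed point (candidate feedback-Nash equilibrium) at", it)
            break
        pi1, pi2 = new1, new2
    print("pi1 =", pi1, " pi2 =", pi2)

If the iteration stops at a fixed point, the resulting pair is a feedback-Nash equilibrium within the class of pure stationary policies; it may instead cycle, which is consistent with Remark VII.1.2.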
VII.2. Sequential games on Borel spaces

The theory of noncooperative Markov games has been extended by several authors to the case where the states and the actions are in continuous sets. Since we are dealing with stochastic processes, the apparatus of measure theory is now essential.
VII.2.1. Description of the game. An m-player sequential game is defined by the following objects:

    (S, Σ), U_j, Γ_j(·), r_j(·,·), Q(·|·), β,

where:

(1) (S, Σ) is a measurable state space with a countably generated σ-algebra Σ of subsets of S.
(2) U_j is a compact metric space of actions for Player j.
(3) Γ_j(·) is a lower measurable map from S into nonempty compact subsets of U_j. For each s, Γ_j(s) represents the set of admissible actions for Player j.
(4) r_j(·,·) : S × U → IR is a bounded measurable transition reward function for Player j. These functions are assumed to be continuous on U, for every s ∈ S.
(5) Q(·|·) is a product measurable transition probability from S × U to S. It is assumed that Q(·|·) satisfies some regularity conditions which are too technical to be given here. We refer the reader to [46] for a more precise statement.
(6) β ∈ (0, 1) is the discount factor.

A stationary Markov strategy for Player j is a measurable map γ_j(·) from S into the set P(U_j) of probability measures on U_j such that γ_j(s) ∈ P(Γ_j(s)) for every s ∈ S.
VII.2.2. Dynamic programming formalism. The definition of the local reward functions given in (VII.1) for the discrete state case has to be adapted to the continuous state format; it becomes

(VII.5)    h_j(s, v_j(·), u) = r_j(s, u) + β ∫_S v_j(t) Q(dt|s, u),   j ∈ M.
The operators H_γ and F_γ are defined as above. The existence of equilibria is difficult to establish for this general class of sequential games. In [68], the existence of ε-equilibria is proved, using an approximation theory in dynamic programming. The existence of equilibria has been obtained only for special cases.¹
VII.3. Application to a stochastic duopoly model
VII.3.1. A stochastic repeated duopoly. Consider the stochastic duopoly model defined by the following linear demand equation

    x(t+1) = α − [u_1(t) + u_2(t)] + ε(t),

which determines the price x(t+1) of a good at period t+1, given the total supply u_1(t) + u_2(t) decided at the end of period t by Players 1 and 2. Assume a unit production cost equal to γ. The profits at the end of period t of both players (firms) are then determined as

    π_j(t) = (x(t+1) − γ) u_j(t).

Assume that the two firms have the same discount rate β; then, over an infinite time horizon, the payoff to Player j is given by

    V_j = Σ_{t=0}^{∞} β^t π_j(t).
This game is repeated; therefore an obvious equilibrium solution consists of playing repeatedly the (static) Cournot solution

(VII.6)    u^c_j(t) ≡ θ/3,   j = 1, 2,

which generates the payoffs

(VII.7)    V^c_j = θ² / (9(1−β)),   j = 1, 2.

A symmetric Pareto (nondominated) solution is given by the repeated actions

    u^P_j(t) ≡ θ/4,   j = 1, 2,

with the associated payoffs

    V^P_j = θ² / (8(1−β)),   j = 1, 2,

where θ = α − γ.

¹ See [43], [44], [45], [48].
The Pareto outcome dominates the Cournot equilibrium but it does not represent an equilibrium. The question is the following:

    Is it possible to construct a pair of memory strategies which would define an equilibrium with an outcome dominating the repeated Cournot strategy outcome and which would be as close as possible to the Pareto nondominated solution?
VII.3.2. A class of trigger strategies based on a monitoring device. The random perturbations affecting the price mechanism do not permit a direct extension of the approach described in the deterministic context. Since it is assumed that the actions of the players are not directly observable, there is a need to proceed to some filtering of the sequence of observed states in order to monitor possible breaches of the agreement.

In [23] a dominating memory strategy equilibrium is constructed, based on a one-step memory scheme. We propose below another scheme, using a multistep memory, that yields an outcome which lies closer to the Pareto solution.

The basic idea consists of extending the state space by introducing a new state variable, denoted v, which is used to monitor a cooperative policy that all players have agreed to play and which is defined as φ : v ↦ u_j = φ(v). The state equation governing the evolution of this state variable is designed as follows

(VII.8)    v(t+1) = max{K, v(t) + x^e(t+1) − x(t+1)},

where x^e is the expected outcome if both players use the cooperative policy, i.e.,

    x^e(t+1) = α − 2 φ(v(t)).
It should be clear that the new state variable v provides a cumulative measure of the positive discrepancies between the expected prices x^e and the realized ones x(t). The parameter K defines a lower bound for v. It is introduced to prevent a compensation of positive discrepancies by negative ones. A positive discrepancy could be an indication of oversupply, i.e., an indication that at least one player is not respecting the agreement and is maybe trying to take advantage of the other player.

If these discrepancies accumulate too fast, the evidence of cheating is mounting and thus some retaliation should be expected. To model the retaliation process we introduce another state variable, denoted y, which is a binary variable, i.e., y ∈ {0, 1}. This new state variable will be an indicator of the prevailing mood of play. If y = 1 the game is played cooperatively; if y = 0, the game is played in a noncooperative manner, interpreted as a punitive or retaliatory mood of play.

This state variable is assumed to evolve according to the following state equation

(VII.9)    y(t+1) = 1 if y(t) = 1 and v(t+1) < κ(v(t)); 0 otherwise,
where the positive-valued function κ : v ↦ κ(v) is a design parameter of this monitoring scheme.

According to this state equation, the cooperative mood of play will be maintained provided the cumulative positive discrepancies do not increase too fast from one period to the next. Also, this state equation tells us that, once y(t) = 0, then y(t') ≡ 0 for all periods t' > t, i.e., a punitive mood of play lasts forever. In the models discussed later on we shall relax this assumption of everlasting punishment.
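The behavior of the monitoring device (VII.8)-(VII.9) is easy to simulate. In the sketch below (our illustration; the demand parameters, the constant agreed policy φ(v) ≡ θ/4 and the threshold function κ(v) = v + 1 are all assumed for the purpose of the example) both players respect the agreement, yet the accumulated discrepancies occasionally trigger the punishment, which is the unavoidable cost of monitoring under imperfect information:

    import random

    alpha, gamma_cost = 10.0, 1.0
    theta = alpha - gamma_cost
    K = 0.0                                  # lower bound for v
    phi = lambda v: theta / 4.0              # agreed cooperative supply
    kappa = lambda v: v + 1.0                # tolerated one-period increase

    random.seed(0)
    v, y, punished_at = K, 1, None           # punished_at stays None if never
    for t in range(200):
        eps = random.gauss(0.0, 0.5)         # price noise
        u1 = u2 = phi(v)                     # both players cooperate
        x_next = alpha - (u1 + u2) + eps     # realized price
        x_exp = alpha - 2.0 * phi(v)         # expected price under cooperation
        v_next = max(K, v + x_exp - x_next)  # state equation (VII.8)
        y = 1 if (y == 1 and v_next < kappa(v)) else 0   # equation (VII.9)
        v = v_next
        if y == 0:
            punished_at = t
            break
    print("punishment triggered at period:", punished_at)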
When the mood of play is noncooperative, i.e., when y = 0, both players use as a punishment (or retaliation) the static Cournot solution forever. This generates the expected payoffs V^c_j, j = 1, 2, defined in Eq. (VII.7). Since the two players are identical, we shall drop the subscript j from now on.

When the mood of play is cooperative, i.e., when y = 1, both players use an agreed-upon policy which determines their respective supplies as a function of the state variable v. This agreement policy is defined by the function φ(v). The expected payoff is then a function W(v) of this state variable v.

For this agreement to be stable, i.e., not to provide a temptation to cheat to any player, one imposes that it be an equilibrium. Note that the game is now a sequential Markov game with a continuous state space. The dynamic programming equation characterizing an equilibrium is given below:
(VII.10)    W(v) = max_u { [α − γ − (φ(v) + u)] u
                         + β P[v' ≥ κ(v)] V^c
                         + β P[v' < κ(v)] E[W(v') | v' < κ(v)] },

where we have denoted

    v' = max{K, v + (u − φ(v)) − ε}

the random value of the state variable v after the transition generated by the supplies (u, φ(v)).
In Eq. (VII.10) we recognize the immediate expected reward [α − γ − (φ(v) + u)] u of Player 1 when he plays u while the opponent sticks to φ(v). This is added to the conditional expected payoffs after the transition to either the punishment mood of play, corresponding to the value y = 0, or the cooperative mood of play, corresponding to y = 1.
A solution of these DP equations can be found by solving an associated fixed-point problem, as indicated in [29]. To summarize the approach, we introduce the operator

(VII.11)    (T_φ W)(v, u) = [α − γ − (u + φ(v))] u + β { (θ²/(9(1−β))) F(s − κ(v))
                          + W(K) [1 − F(s − K)]
                          + ∫_K^{κ(v)} W(ξ) f(s − ξ) dξ },

where F(·) and f(·) are the cumulative distribution function and the probability density function, respectively, of the random disturbance ε. We have also used the notation

    s = v + (u − φ(v)).
An equilibrium solution is a pair of functions (W(·), φ(·)) such that

(VII.12)    W(v) = max_u (T_φ W)(v, u)

(VII.13)    W(v) = (T_φ W)(v, φ(v)).
In [29] it is shown how an adaptation of the Howard policy improvement algorithm [31] permits the computation of the solution of this sort of fixed-point problem. The case treated in [29] corresponds to the use of a quadratic function κ(·) and a Gaussian distribution law for ε. The numerical experiments reported in [29] show that one can define, using this approach, a subgame perfect equilibrium which dominates the repeated Cournot solution.
In [50] this problem has been studied in the case where the (inverse) demand law is subject to a multiplicative noise. One then obtains an existence proof for a dominating equilibrium based on a simple one-step memory scheme where the variable v satisfies the following equation

    v(t+1) = (x^e − x(t+1)) / x(t).
This is the case where one does not monitor the cooperative policy through the use of a cumulated discrepancy function but rather on the basis of repeated identical tests. Also, in Porter's approach the punishment period is finite.

In [29] it is shown that the approach can be extended to a full-fledged Markov game, i.e., a sequential game rather than a repeated game. A simple model of fisheries management was used in that work to illustrate this type of sequential game cooperative equilibrium.
VII.3.3. Interpretation as a communication device. In our approach, by extending the state space description (i.e., introducing the new variables v and y), we retained a Markov game formalism for an extended game, and this has permitted us to use dynamic programming for the characterization of subgame perfect equilibria. This is of course reminiscent of the concept of communication device considered in [17] for repeated games and discussed in Part 1. An easy extension of the approach described above would lead to random transitions between the two moods of play, with transition probabilities depending on the monitoring statistic v. A punishment of random duration is also possible in this model. In the next section we illustrate these features when we propose a differential game model with random moods of play.

The monitoring scheme is a communication device which receives as input the observation of the state of the system and sends as an output a public signal suggesting to play according to two different moods of play.
Index
Bayesian equilibrium, 52
bimatrix game, 32–35
Braess paradox, 65
control, 5, 6, 98
correlated equilibrium, 48, 55
Cournot equilibrium, 58, 85
equilibrium, 23, 27, 33–37, 39–49, 53–55, 57, 58, 61, 64–71, 79, 82–85, 94–99, 101
equilibrium on network, 66
equilibrium versus optimum, 65
extensive form of a game, 10, 12–14, 16–18, 20, 45, 48, 50, 55, 76, 87, 88
feedback Nash equilibrium, 93
folk theorem, 77
information pattern, 9–12, 15–17, 20, 23, 46–50, 52, 54, 75–79, 81, 82, 84, 85, 87–89
Nash equilibrium, 33–37, 45, 46, 48, 49, 52–55, 81, 84
Nash-Cournot equilibrium, 39, 44, 59, 61, 63, 64, 69, 70, 85
Nature's move, 11–14, 46, 48, 50
network, 65
normal form of a game, 17, 23, 48, 52, 53, 76, 89
normalized equilibrium, 41
prisoner's dilemma, 34, 69
rational agent, 14, 24, 27
saddle point, 26–29, 31, 33, 35, 54, 89, 91
state, 5, 7–9, 87–89, 93–95, 97–100
symmetric information, 16
utility function, 10, 14, 15, 17, 20, 24
utility function axioms, 14
Wardrop equilibrium, 70
Bibliography
[1] A. ALJ AND A. HAURIE, Dynamic Equilibria in Multigeneration Stochastic Games, IEEE Trans.
Autom. Control, Vol. AC-28, 1983, pp 193-203.
[2] R. J. AUMANN Subjectivity and correlation in randomized strategies, J. Economic Theory,
Vol. 1, pp. 67-96, 1974.
[3] R. J. AUMANN, Lectures on Game Theory, Westview Press, Boulder etc., 1989.
[4] T. BAŞAR AND G. K. OLSDER, Dynamic Noncooperative Game Theory, Academic Press, New York, 1982.
[5] T. BAŞAR, Time consistency and robustness of equilibria in noncooperative dynamic games, in F. VAN DER PLOEG AND A. J. DE ZEEUW EDS., Dynamic Policy Games in Economics, North Holland, Amsterdam, 1989.
[6] T. BAŞAR, A Dynamic Games Approach to Controller Design: Disturbance Rejection in Discrete Time, in Proceedings of the 28th Conference on Decision and Control, IEEE, Tampa, Florida, 1989.
[7] D. BRAESS, Über ein Paradoxon aus der Verkehrsplanung, Unternehmensforschung, Vol. 12, 1968, pp. 258-268.
[8] D. P. BERTSEKAS Dynamic Programming: Deterministic and Stochastic Models, Prentice
Hall, Englewood Cliffs, New Jersey, 1987.
[9] A. BROOKE, D. KENDRICK AND A. MEERAUS, GAMS. A User's Guide, Release 2.25, Scientific Press/Duxbury Press, 1992.
[10] A. COURNOT, Recherches sur les principes mathématiques de la théorie des richesses, Librairie des sciences politiques et sociales, Paris, 1838.
[11] E.V. DENARDO, Contraction Mappings in the Theory Underlying Dynamic Programming, SIAM Rev., Vol. 9, 1967, pp. 165-177.
[12] M.C. FERRIS AND T.S. MUNSON, Interfaces to PATH 3.0, to appear in Computational Optimization and Applications.
[13] M.C. FERRIS AND J.-S. PANG, Complementarity and Variational Problems: State of the Art,
SIAM, 1997.
[14] M.C. FERRIS AND J.-S. PANG, Engineering and economic applications of complementarity problems, SIAM Review, Vol. 39, 1997, pp. 669-713.
[15] J.A. FILAR AND K. VRIEZE, Competitive Markov Decision Processes, Springer-Verlag, New York, 1997.
[16] J.A. FILAR AND B. TOLWINSKI, On the Algorithm of Pollatschek and Avi-Itzhak, in T.E.S. Raghavan et al. (eds), Stochastic Games and Related Topics, Kluwer Academic Publishers, 1991.
[17] F. FORGES, 1986, An Approach to Communication Equilibria, Econometrica, Vol. 54, pp. 1375-
1385.
[18] R. FOURER, D.M. GAY AND B.W. KERNIGHAN, AMPL: A Modeling Languagefor Mathe-
matical Programming, Scientic Press/Duxbury Press, 1993.
[19] D. FUDENBERG AND J. TIROLE Game Theory, The MIT Press 1991, Cambridge, Massachusetts,
London, England.
[20] A. HAURIE AND B. TOLWINSKI, Cooperative Equilibria in Discounted Stochastic Sequential Games, Journal of Optimization Theory and Applications, Vol. 64, No. 3, March 1990.
[21] J.W. FRIEDMAN, Oligopoly and the Theory of Games, Amsterdam: North-Holland, 1977.
[22] J.W. FRIEDMAN, Game Theory with Economic Applications, Oxford: Oxford University Press,
1986.
[23] E.J. GREEN AND R.H. PORTER, Noncooperative collusion under imperfect price information, Econometrica, Vol. 52, 1984, pp. 87-100.
[24] J.N. HAGSTROM AND R.A. ABRAMS, Characterizing Braess's Paradox for Traffic Networks, Working Paper.
[25] J. HARSANYI, Games with incomplete information played by Bayesian players, Management Science, Vol. 14, 1967-68, pp. 159-182, 320-334, 486-502.
[26] A. HAURIE AND P. MARCOTTE, On the relationship between Nash-Cournot and Wardrop equi-
libria, Networks, Vol. 15, 1985, pp. 295-308.
[27] A. HAURIE AND B. TOLWINSKI, Acceptable Equilibria in Dynamic Bargaining Games, Large Scale Systems, Vol. 6, 1984, pp. 73-89.
[28] A. HAURIE AND B. TOLWINSKI, Definition and Properties of Cooperative Equilibria in a Two-Player Game of Infinite Duration, Journal of Optimization Theory and Applications, Vol. 46, No. 4, 1985, pp. 525-534.
[29] A. HAURIE AND B. TOLWINSKI, Cooperative Equilibria in Discounted Stochastic Sequential
Games, Journal of Optimization Theory and Applications, Vol. 64, 1990, No.3, pp. 511-535.
[30] HASSIN AND M. HAVIV, Equilibrium Customer Behavior in Queueing, Kluwer, to appear.
[31] R. HOWARD, Dynamic Programming and Markov Processes, MIT Press, Cambridge, Mass., 1960.
[32] H. MOULIN AND J.-P. VIAL, Strategically Zero-sum Games: The Class of Games whose Completely Mixed Equilibria Cannot be Improved Upon, International Journal of Game Theory, Vol. 7, 1978, pp. 201-221.
[33] H.W. KUHN, Extensive games and the problem of information, in H.W. Kuhn and A.W. Tucker (eds.), Contributions to the Theory of Games, Vol. 2, Annals of Mathematics Studies No. 28, Princeton University Press, Princeton, New Jersey, 1953, pp. 193-216.
[34] C.E. LEMKE AND J.T. HOWSON, Equilibrium points of bimatrix games, J. Soc. Indust. Appl.
Math., 12, 1964, pp. 413-423.
[35] C.E. LEMKE, Bimatrix equilibrium points and mathematical programming, Management Sci.,
11, 1965, pp. 681-689.
[36] D. G. LUENBERGER, Optimization by Vector Space Methods, J. Wiley & Sons, New York,
1969.
[37] D. G. LUENBERGER, Introduction to Dynamic Systems: Theory, Models & Applications, J.
Wiley & Sons, New York, 1979.
[38] O.L. MANGASARIAN AND H. STONE, Two-Person Nonzero-Sum Games and Quadratic Programming, J. Math. Anal. Applic., Vol. 9, 1964, pp. 348-355.
[39] A.W. MERZ, The Game of Identical Cars, in: Multicriteria Decision Making and Differential Games, G. Leitmann (ed.), Plenum Press, New York and London, 1976.
[40] R.B. MYERSON, Game Theory: Analysis of Conflict, Harvard University Press, Cambridge, Mass., 1997.
[41] F.H. MURPHY, H.D. SHERALI AND A.L. SOYSTER, A mathematical programming approach for determining oligopolistic market equilibrium, Mathematical Programming, Vol. 24, 1982, pp. 92-106.
[42] J.F. NASH, Non-cooperative games, Annals of Mathematics, 54, 1951, pp. 286-295.
[43] A.S. NOWAK, Existence of Equilibrium Stationary Strategies in Discounted Noncooperative Stochastic Games with Uncountable State Space, J. Optim. Theory Appl., Vol. 45, 1985, pp. 591-602.
[44] A.S. NOWAK, Nonrandomized Strategy Equilibria in Noncooperative Stochastic Games with Ad-
ditive Transition and Reward Structures, J. Optim. Theory Appl., Vol. 52, 1987, pp. 429-441.
[45] A.S. NOWAK, Stationary Equilibria for Nonzero-Sum Average Payoff Ergodic Stochastic Games with General State Space, Annals of the International Society of Dynamic Games, Birkhäuser, 1993.
[46] A.S. NOWAK AND T.E.S. RAGHAVAN, Existence of Stationary Correlated Equilibria with Symmetric Information for Discounted Stochastic Games, Mathematics of Operations Research, to appear.
[47] G. OWEN, Game Theory, Academic Press, New York, 1982.
[48] T. PARTHASARATHY AND S. SINHA, Existence of Stationary Equilibrium Strategies in Nonzero-Sum Discounted Stochastic Games with Uncountable State Space and State Independent Transitions, International Journal of Game Theory, Vol. 18, 1989, pp. 189-194.
[49] M.L. PETIT, Control Theory and Dynamic Games in Economic Policy Analysis, Cambridge University Press, Cambridge, 1990.
[50] R.H. PORTER, Optimal Cartel Trigger Price Strategies, Journal of Economic Theory, Vol. 29, 1983, pp. 313-338.
[51] R. RADNER, Collusive Behavior in Noncooperative ε-Equilibria of Oligopolies with Long but Finite Lives, Journal of Economic Theory, Vol. 22, 1980, pp. 136-154.
[52] R. RADNER, Monitoring Cooperative Agreements in a Repeated Principal-Agent Relationship, Econometrica, Vol. 49, 1981, pp. 1127-1148.
[53] R. RADNER, Repeated Principal-Agent Games with Discounting, Econometrica, Vol. 53, 1985, pp. 1173-1198.
[54] J.B. ROSEN, Existence and Uniqueness of Equilibrium Points for Concave n-Person Games, Econometrica, Vol. 33, 1965, pp. 520-534.
[55] T.F. RUTHERFORD, Extensions of GAMS for complementarity problems arising in applied economic analysis, Journal of Economic Dynamics and Control, Vol. 19, 1995, pp. 1299-1324.
[56] R. SELTEN, Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games, International Journal of Game Theory, Vol. 4, 1975, pp. 25-55.
[57] L. SHAPLEY, Stochastic Games, Proceedings of the National Academy of Sciences, Vol. 39, 1953, pp. 1095-1100.
[58] M. SHUBIK, Games for Society, Business and War, Elsevier, New York, 1975.
[59] M. SHUBIK, The Uses and Methods of Gaming, Elsevier, New York, 1975.
[60] M. SIMAAN AND J.B. CRUZ, On the Stackelberg Strategy in Nonzero-Sum Games, in: G. LEITMANN (ed.), Multicriteria Decision Making and Differential Games, Plenum Press, New York and London, 1976.
[61] M. SIMAAN AND J.B. CRUZ, Additional Aspects of the Stackelberg Strategy in Nonzero-Sum Games, in: G. LEITMANN (ed.), Multicriteria Decision Making and Differential Games, Plenum Press, New York and London, 1976.
[62] M.J. SOBEL, Noncooperative Stochastic Games, Annals of Mathematical Statistics, Vol. 42, 1971, pp. 1930-1935.
[63] B. TOLWINSKI, Introduction to Sequential Games with Applications, Discussion Paper No.
46, Victoria University of Wellington, Wellington, 1988.
[64] B. TOLWINSKI, A. HAURIE AND G. LEITMANN, Cooperative Equilibria in Differential Games, Journal of Mathematical Analysis and Applications, Vol. 119, 1986, pp. 182-202.
[65] J. VON NEUMANN, Zur Theorie der Gesellschaftsspiele, Math. Annalen, Vol. 100, 1928, pp. 295-320.
[66] J. VON NEUMANN AND O. MORGENSTERN, Theory of Games and Economic Behavior,
Princeton: Princeton University Press, 1944.
[67] J.G. WARDROP, Some theoretical aspects of road traffic research, Proc. Inst. Civil Eng., Vol. II, No. 1, 1952, pp. 325-378.
[68] W. WHITT, Representation and Approximation of Noncooperative Sequential Games, SIAM J.
Control, Vol. 18, pp. 33-48, 1980.
[69] P. WHITTLE, Optimization Over Time: Dynamic Programming and Stochastic Control, Vol.
1, J. Wiley, Chichester, 1982.