Simone Cerreia-Vioglio
Department of Decision Sciences and IGIER, Università Bocconi
Massimo Marinacci
AXA-Bocconi Chair, Department of Decision Sciences and IGIER, Università Bocconi
Elena Vigna
Dipartimento Esomas, Università di Torino and Collegio Carlo Alberto
August 2017
This manuscript is a very preliminary version of a textbook that will be published by Springer
International Publishing (ISBN 978-3-319-44713-1). It is for the personal use of Bocconi students who
are attending first year mathematics courses. We thank Gabriella Chiomio and Claudio Mattalia,
who thoroughly translated a first version of the manuscript, as well as Alexandra Fotiou, Giacomo
Lanzani and Kelly Gail Strada for excellent research assistance, and Margherita Cigola, Guido Osimo,
and Lorenzo Peccati for some very useful comments that helped us to improve the manuscript. We
are especially indebted to Pierpaolo Battigalli, Erio Castagnoli (with whom this project started),
Itzhak Gilboa, Fabio Maccheroni, Luigi Montrucchio, and David Schmeidler for the discussions that
over the years shaped our views on economics and mathematics.
Contents
I Structures 1
3 Linear structure 59
3.1 Vector subspaces of Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Linear independence and dependence . . . . . . . . . . . . . . . . . . . . . . . 62
3.3 Linear combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4 Generated subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.5 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6 Bases of subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.7 Post scriptum: some high school algebra . . . . . . . . . . . . . . . . . . . . . 73
4 Euclidean structure 75
4.1 Absolute value and norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.1 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.2 Absolute value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.3 Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5 Topological structure 85
5.1 Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2 Neighborhoods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.3 Taxonomy of the points of Rn with respect to a set . . . . . . . . . . . . . . . 90
5.3.1 Interior, exterior and boundary points . . . . . . . . . . . . . . . . . . 90
5.3.2 Limit points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.4 Open and closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.5 Set stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.6 Compact sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.7 Closure and convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6 Functions 105
6.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2.1 Static choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2.2 Intertemporal choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3 General properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.1 Preimages and level curves . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.2 Algebra of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.3.3 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.4 Classes of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.4.1 Injective, surjective, and bijective functions . . . . . . . . . . . . . . . 126
6.4.2 Inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.4.3 Bounded functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.4.4 Monotonic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.4.5 Concave and convex functions: a preview . . . . . . . . . . . . . . . . 139
6.4.6 Separable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.5 Elementary functions on R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.5.1 Polynomial functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.5.2 Exponential and logarithmic functions . . . . . . . . . . . . . . . . . . 143
7 Cardinality 163
7.1 Actual infinite and potential infinite . . . . . . . . . . . . . . . . . . . . . . 163
7.2 Bijective functions and cardinality . . . . . . . . . . . . . . . . . . . . . . . . 164
7.3 A Pandora’s box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8 Sequences 179
8.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.2 The space of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.3 Application: intertemporal choices . . . . . . . . . . . . . . . . . . . . . . . . 187
8.4 Application: prices and expectations . . . . . . . . . . . . . . . . . . . . . . . 187
8.4.1 A market for a good . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
8.4.2 Delays in production . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
8.4.3 Expectation formation . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.5 Images and classes of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.6 Eventually: a key adverb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.7 Limits: introductory examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.8 Limits and asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.8.1 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.8.2 Limits from above and from below . . . . . . . . . . . . . . . . . . . . 197
8.8.3 Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.8.4 Topology of R and a general definition of limit . . . . . . . . . . . . . 199
8.9 Properties of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.9.1 Monotonicity and convergence . . . . . . . . . . . . . . . . . . . . . . 203
8.9.2 Bolzano-Weierstrass' Theorem . . . . . . . . . . . . . . . . . . . . . . 204
8.10 Algebra of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.10.1 The (many) certainties . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.10.2 Some common limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8.10.3 Indeterminate forms for the limits . . . . . . . . . . . . . . . . . . . . 212
8.10.4 Summary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
8.10.5 How many indeterminate forms are there? . . . . . . . . . . . . . . . . 215
8.11 Convergence criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.11.1 Comparison criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.11.2 Ratio criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.11.3 Root criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
8.12 The Cauchy condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
9 Series 243
9.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
9.1.1 Three classic series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
9.1.2 Sub specie aeternitatis: infinite horizon . . . . . . . . . . . . . . . . . 247
9.2 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
9.3 Series with positive terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
9.3.1 Comparison criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
9.3.2 Ratio criterion: prelude . . . . . . . . . . . . . . . . . . . . . . . . . . 255
9.3.3 Ratio criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
9.3.4 A first series expansion . . . . . . . . . . . . . . . . . . . . . . . . . . 258
9.4 Series with terms of any sign . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
9.4.1 Absolute convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
9.4.2 Hic sunt leones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
V Optima 519
20 Derivatives 609
20.1 Marginal analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
20.2 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
20.3 Geometric interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
20.4 Derivative function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
20.5 One-sided derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
20.6 Derivability and continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
20.7 Derivatives of elementary functions . . . . . . . . . . . . . . . . . . . . . . . . 622
20.8 Algebra of derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
20.9 The chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627
20.10 Derivative of inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . 629
20.11 Formulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
20.12 Differentiability and linearity . . . . . . . . . . . . . . . . . . . . . . . . . . 633
20.12.1 Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
20.12.2 Differentiability and derivability . . . . . . . . . . . . . . . . . . . . . 635
20.12.3 Differentiability and continuity . . . . . . . . . . . . . . . . . . . . . . 637
20.12.4 A terminological turning point . . . . . . . . . . . . . . . . . . . . . . 637
20.13 Derivatives of higher order . . . . . . . . . . . . . . . . . . . . . . . . . . . . 638
20.14 Discrete limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
23 Approximation 713
23.1 Taylor’s polynomial approximation . . . . . . . . . . . . . . . . . . . . . . . . 713
23.1.1 Polynomial expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
23.1.2 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
23.1.3 Taylor expansion and limits . . . . . . . . . . . . . . . . . . . . . . . . 720
23.2 Omnibus proposition for local extremal points . . . . . . . . . . . . . . . . . . 721
23.3 Omnibus procedure of search of local extremal points . . . . . . . . . . . . . . 724
23.3.1 Twice differentiable functions . . . . . . . . . . . . . . . . . . . . . . . 724
23.3.2 Infinitely differentiable functions . . . . . . . . . . . . . . . . . . . . . 724
23.4 Taylor expansion: functions of several variables . . . . . . . . . . . . . . . . . 725
23.4.1 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
23.4.2 Taylor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
23.4.3 Second-order conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 731
23.4.4 Multivariable unconstrained optima . . . . . . . . . . . . . . . . . . . 735
23.5 Coda: asymptotic expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
23.5.1 Asymptotic scales and expansions . . . . . . . . . . . . . . . . . . . . 736
23.5.2 Asymptotic expansions and analytic functions . . . . . . . . . . . . . . 740
23.5.3 Hille’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
23.5.4 Borel’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
37 Stieltjes' integral 1073
37.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1074
37.2 Integrability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1074
37.3 Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076
37.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1078
37.5 Step integrators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1079
37.6 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1081
37.7 Change of variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1082
37.8 Modelling assets' gains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1083
38 Moments 1085
38.1 Densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1085
38.2 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1086
38.3 The problem of moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1087
38.4 Moment generating function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1088
IX Appendices 1091
B Permutations 1099
B.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1099
B.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100
B.3 Anagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1101
B.4 Newton’s binomial formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1102
Part I

Structures
Chapter 1

Sets and numbers: an intuitive introduction
1.1 Sets
A set is a collection of distinguishable objects. There are two ways to describe a set: by
listing its elements directly, or by specifying a property that its elements have in common.
The second way is more common: for instance, the set

A = {11, 13, 17, 19, 23, 29} (1.1)

can be described as the set of the prime numbers between 10 and 30. The chairs of your
kitchen form a set of objects, the chairs, that have in common the property of being part
of your kitchen. The chairs of your bedroom form another set, as the letters of the Latin
alphabet form a set, distinct from the set of the letters of the Greek alphabet (and from the
set of chairs or from the set of numbers considered above).
Sets are usually denoted by capital letters: A, B, C, and so on; their elements are denoted
by small letters: a, b, c, and so on. To denote that an element a belongs to the set A we
write

a ∈ A

where ∈ is the symbol of belonging. Instead, to denote that an element a does not belong
to the set A we write a ∉ A.
Off the record remark (O.R.). The concept of set, apparently introduced in 1847 by
Bernhard Bolzano, is for us a primitive concept, not defined through other notions. It is like
in Euclidean geometry, in which points and lines are primitive concepts (with an intuitive
geometric meaning that readers may give them). H
1.1.1 Subsets
The chairs of your bedroom are a subset of the chairs of your home: a chair that belongs to
your bedroom also belongs to your home. In general, a set A is a subset of a set B when all
the elements of A are also elements of B. In this case we write A ⊆ B. Formally, A ⊆ B if
x ∈ A implies x ∈ B.
For example, let A be the set (1.1) of the prime numbers between 10 and 30, and let

B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29} (1.2)

be the set of the odd numbers between 10 and 30. We have A ⊆ B.
[Venn diagram: the set A drawn inside the set B]

The inclusion can be illustrated by using the so-called Venn diagrams to represent graphically
the sets A and B: it is an ingenuous, yet effective, way to visualize sets.
When we have both A ⊆ B and B ⊆ A – that is, x ∈ A if and only if x ∈ B – the two
sets A and B are said to be equal; in symbols, A = B. For example, let A be the set of
the solutions of the quadratic equation x² − 3x + 2 = 0 and let B be the set formed by the
numbers 1 and 2. It is easy to see that A = B.

When A ⊆ B and A ≠ B, we write A ⊂ B and say that A is a proper subset of B.

The sets A = {a} that consist of a unique element are called singletons. They are a
peculiar, but altogether legitimate, class of sets.¹
Nota Bene (N.B.) Though the two symbols ∈ and ⊆ are conceptually well distinct and
must not be confused, there exists an interesting relation between them. Indeed, consider
the set formed by a unique element a, that is, the singleton {a}. Through such a singleton,
we can establish the relation

a ∈ A if and only if {a} ⊆ A

between ∈ and ⊆. O
1
Note that a and {a} are not the same thing: a is an element and {a} is a set, even if it is formed by only
one element. For instance, the set A of the Nations of the Earth with a flag of only one colour had (until
2011) only one element, Libya, but A is not Libya itself: Tripoli is not the capital of A.
1.1.2 Operations
There are three basic operations among sets: union, intersection, and difference. As we will
see, they take any two sets and, starting from them, form a new set.

The first operation that we consider is the intersection of two sets A and B. As the
term “intersection” suggests, with this operation we select all the elements that belong
simultaneously to the sets A and B.

Definition 2 Given two sets A and B, their intersection A ∩ B is the set of all the elements
that belong both to A and B, that is, x ∈ A ∩ B if x ∈ A and x ∈ B.
For example, let A be the set of the left-handed and B the set of the right-handed citizens
of a country. The intersection A ∩ B is the set of the ambidextrous citizens. If, instead, A is
the set of the gasoline cars and B the set of the methane cars, the intersection A ∩ B is the
set of the bi-fuel cars that run on both gasoline and methane.
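At a computational level, the intersection can be tried out with Python's built-in set type; a small sketch of the handedness example (the names and populations below are ours, purely illustrative):

```python
# Illustrative sketch: intersection with Python's built-in set type.
# The two populations are hypothetical stand-ins for the book's example.
left_handed = {"Ada", "Bruno", "Carla"}
right_handed = {"Bruno", "Carla", "Dario", "Elena"}

# A ∩ B: the citizens that are both left- and right-handed (the ambidextrous)
ambidextrous = left_handed & right_handed
print(sorted(ambidextrous))  # ['Bruno', 'Carla']
```

The `&` operator is Python's set intersection; `A.intersection(B)` is an equivalent spelling.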
It can happen that two sets have no elements in common. For example, let

C = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30} (1.3)

be the set of the even numbers between 10 and 30. It has no elements in common with the
set B in (1.2). In this case we talk of disjoint sets. Such a notion gives us the opportunity
to introduce a fundamental set.

Definition 3 The empty set, denoted by ∅, is the set without elements.

As a first use of the notion, note that two sets A and B are disjoint when they have
empty intersection, that is, A ∩ B = ∅. For example, for the sets B and C in (1.2) and (1.3),
we have B ∩ C = ∅.

We write A ≠ ∅ when the set A is not empty, that is, when it contains at least one element.
Conventionally, we consider the empty set as a subset of any set, that is, ∅ ⊆ A for every set
A.
Proposition 4 Given two sets A and B, we have A ∩ B = A if and only if A ⊆ B.

Proof “If”. Let A ⊆ B. We want to prove that A ∩ B = A. To show that two sets are equal,
we always need to prove separately the two opposite inclusions: in this case, A ∩ B ⊆ A and
A ⊆ A ∩ B.

The inclusion A ∩ B ⊆ A is easily proven to be true. Indeed, let x ∈ A ∩ B.² Then, by
definition, x belongs both to A and to B. In particular, x ∈ A and this is enough to conclude
that A ∩ B ⊆ A.

Let us prove the inclusion A ⊆ A ∩ B. Let x ∈ A. Since, by hypothesis, A ⊆ B, each
element of A also belongs to B; it follows that x ∈ B. Hence, x belongs both to A and to
B, i.e., x ∈ A ∩ B. This proves that A ⊆ A ∩ B.

We have shown that both the inclusions A ∩ B ⊆ A and A ⊆ A ∩ B hold; we can therefore
conclude that A ∩ B = A, which completes the proof of the “if” part.
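Equivalences of this kind lend themselves to mechanical spot-checks. A small Python sketch testing "A ⊆ B exactly when A ∩ B = A" on a few sample pairs of our choosing (a sanity check, not a proof):

```python
# Spot-check of the equivalence: A ⊆ B if and only if A ∩ B = A.
samples = [
    ({11, 13}, {11, 13, 15}),  # A is a subset of B
    ({1, 2}, {2, 3}),          # A is not a subset of B
    (set(), {1}),              # the empty set is a subset of every set
]
for A, B in samples:
    # A <= B tests the inclusion A ⊆ B; A & B is the intersection
    assert (A <= B) == ((A & B) == A)
print("equivalence holds on", len(samples), "sample pairs")
```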
The next operation we consider is the union. Here again the term “union” already
suggests how in this operation all the elements of both sets are collected together.

Definition 5 Given two sets A and B, their union A ∪ B is the set of all the elements that
belong to A or to B, that is, x ∈ A ∪ B if x ∈ A or x ∈ B.³
Note that an element can belong to both sets (unless they are disjoint). For example, if
A is again the set of the left-handed and B is the set of the right-handed citizens, the union
set contains all citizens with at least one hand, and there are individuals (the ambidextrous)
who belong to both sets.⁴

It is immediate to show that A ⊆ A ∪ B and that B ⊆ A ∪ B. It then follows that

A ∩ B ⊆ A ∪ B
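These inclusions can likewise be verified computationally; a sketch with hypothetical sets standing in for the handedness example:

```python
A = {"Ada", "Bruno", "Carla"}    # hypothetical left-handed citizens
B = {"Bruno", "Carla", "Dario"}  # hypothetical right-handed citizens

# A and B are both contained in the union A ∪ B ...
assert A <= (A | B) and B <= (A | B)
# ... and hence so is the intersection: A ∩ B ⊆ A ∪ B
assert (A & B) <= (A | B)
print(sorted(A | B))  # ['Ada', 'Bruno', 'Carla', 'Dario']
```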
2
In proving an inclusion between sets, say C ⊆ D, throughout the book we will tacitly assume that C ≠ ∅
because the inclusion is trivially true when C = ∅. For this reason our inclusion proofs will show that x ∈ C
(i.e., C ≠ ∅) implies x ∈ D.
3
The conjunction “or” has the inclusive sense of the Latin “vel” (x belongs to A or to B or to both) and
not the exclusive sense of “aut” (x belongs either to A or to B, but not to both). Indeed, Giuseppe Peano
gave the symbol ∪ the meaning “vel” when he first introduced it, along with the intersection symbol ∩ and
the membership symbol ε, which he interpreted as the Latin “et” and “est”, respectively (see the “signorum
tabula” in his 1889 Arithmetices principia, a seminal work on the foundations of mathematics).
4
The clause “with at least one hand”, though needed, may seem pedantic, even tactless. The distinction
between being precise and pedantic is subtle and, ultimately, subjective. Experience may help to balance
rigor and readability. In any case, in mathematics loose ends have to be handled with care and, definitely,
are not for beginners.
[Venn diagram: the union A ∪ B of two sets A and B]
Definition 6 Given two sets A and B, their difference A − B is the set of all the elements
that belong to A, but not to B, that is, x ∈ A − B if both x ∈ A and x ∉ B.

The set A − B is, therefore, obtained by eliminating from A all the elements that belong
(also) to B.⁵ Graphically:
[Venn diagram: the difference A − B, i.e., the part of A outside B]
For example, let us go back to the sets A and B specified in (1.1) and (1.2). Then,

B − A = {15, 21, 25, 27}

that is, B − A is the set of the non-prime odd numbers between 10 and 30.

Note that: (i) when A and B are disjoint, we have A − B = A and B − A = B; (ii) A ⊆ B
is equivalent to A − B = ∅ since, by removing from A all the elements that belong also to
B, the set A is deprived of all its elements, that is, we remain with the empty set.
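A quick computational check of this example and of point (ii), using the prime and odd numbers between 10 and 30 and Python's set difference:

```python
A = {11, 13, 17, 19, 23, 29}                       # primes between 10 and 30
B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}       # odd numbers between 10 and 30

print(sorted(B - A))     # the non-prime odd numbers between 10 and 30
assert B - A == {15, 21, 25, 27}
assert A - B == set()    # A ⊆ B is equivalent to A − B = ∅
```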
In many applications there is a general set of reference, an all inclusive set, of which
various subsets are considered. For example, for demographers this set can be the entire
5
The difference A − B is often denoted by A \ B.
population of a country, of which they can consider various subsets according to the demographic
properties that are of interest (for instance, age is a standard demographic variable
through which the population can be subdivided into subsets).
The general set of reference is called universal set or, more commonly, space. There
is no standard notation for this set (which is often clear from the context). We denote it
temporarily by S. Given any of its subsets A, the difference S − A is denoted by Aᶜ and
is called the complement set, or simply the complement, of A. The difference operation is
called complementation when it involves the universal set.
Example 7 If S is the set of all citizens of a country and A is the set of all citizens that are
at least 65 years old, the complement Aᶜ consists of all citizens that are (strictly) less than
65 years old. N
Proposition 8 (Aᶜ)ᶜ = A.
Proof Since we have to verify an equality between sets (as in the proof of Proposition 4),
we have to consider separately the two inclusions (Aᶜ)ᶜ ⊆ A and A ⊆ (Aᶜ)ᶜ.

If a ∈ (Aᶜ)ᶜ, then a ∉ Aᶜ and therefore a ∈ A. It follows that (Aᶜ)ᶜ ⊆ A.

Vice versa, if a ∈ A, then a ∉ Aᶜ and therefore a ∈ (Aᶜ)ᶜ. Hence, A ⊆ (Aᶜ)ᶜ.
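Complementation and the double-complement rule can be illustrated with a small universal set; the age-census universe below is our own hypothetical choice:

```python
# Complementation relative to a hypothetical universal set S of ages 0–99.
S = set(range(0, 100))
A = {s for s in S if s >= 65}   # citizens at least 65 years old
A_c = S - A                     # the complement Aᶜ = S − A

assert A_c == {s for s in S if s < 65}
assert S - A_c == A             # Proposition 8: (Aᶜ)ᶜ = A
```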
The operations of intersection and union are:

(i) commutative, that is, for any two sets A and B, we have A ∩ B = B ∩ A and A ∪ B =
B ∪ A;

(ii) associative, that is, for any three sets A, B, and C, we have A ∪ (B ∪ C) = (A ∪ B) ∪ C
and A ∩ (B ∩ C) = (A ∩ B) ∩ C.
We leave to the reader the simple proof. Property (ii) permits us to write A ∪ B ∪ C
and A ∩ B ∩ C and, therefore, to extend without ambiguity the operations of union and
intersection to an arbitrary (finite) number of sets:

∪_{i=1}^n A_i  and  ∩_{i=1}^n A_i
It is possible to extend such operations also to infinitely many sets. If A₁, A₂, …, Aₙ, … is an
infinite collection of sets, the union

∪_{n=1}^∞ A_n

is the set of the elements that belong to at least one of the A_n, that is,

∪_{n=1}^∞ A_n = {a : a ∈ A_n for at least one index n}

The intersection

∩_{n=1}^∞ A_n

is the set of the elements that belong to every A_n, that is,

∩_{n=1}^∞ A_n = {a : a ∈ A_n for every index n}
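For finitely many sets, these arbitrary unions and intersections amount to folding the binary operations over a family; a Python sketch with a three-set family of our choosing:

```python
from functools import reduce

# A finite family A₁, A₂, A₃ (sets of our choosing, for illustration only).
family = [{1, 2, 3}, {2, 3, 4}, {2, 5}]

big_union = reduce(set.union, family)                # in at least one Aᵢ
big_intersection = reduce(set.intersection, family)  # in every Aᵢ

assert big_union == {1, 2, 3, 4, 5}
assert big_intersection == {2}
```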
We turn to the relations between the operations of intersection and union. Note the
symmetry between properties (1.4) and (1.5), in which ∩ and ∪ are exchanged.
Proposition 11 The operations of union and intersection are distributive, that is, given
any three sets A, B, and C, we have

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (1.4)

and

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) (1.5)
Proof We prove only (1.4). We have to consider separately the two inclusions A ∩ (B ∪ C) ⊆
(A ∩ B) ∪ (A ∩ C) and (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).

If x ∈ A ∩ (B ∪ C), then x ∈ A and x ∈ B ∪ C, that is, (i) x ∈ A and (ii) x ∈ B or
x ∈ C. It follows that x ∈ A ∩ B or x ∈ A ∩ C, i.e., x ∈ (A ∩ B) ∪ (A ∩ C), and therefore
A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C).

Vice versa, if x ∈ (A ∩ B) ∪ (A ∩ C), then x ∈ A ∩ B or x ∈ A ∩ C, that is, x belongs
to A and to at least one of B and C, and therefore x ∈ A ∩ (B ∪ C). It follows that
(A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).
Example 13 Let A be the set of all citizens of a country. Its subsets A₁, A₂, and A₃
formed, respectively, by the citizens of school or pre-school age (from 0 to 17 years old), by
the citizens of working age (from 18 to 64 years old), and by the elders (from 65 years old
on) form a partition of the set A. Relatedly, age cohorts, formed by citizens who have the
same age, form a partition of A. N
We conclude with the so-called De Morgan's laws for complementation: they illustrate
the relationship between the operations of intersection, union, and complementation.

Proposition 14 (De Morgan's laws) Given any two sets A and B, we have

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
Proof We prove only the first law, leaving the second one to the reader. As usual, to prove
an equality between sets we have to consider separately the two inclusions that compose it.

(i) (A ∪ B)ᶜ ⊆ Aᶜ ∩ Bᶜ. If x ∈ (A ∪ B)ᶜ, then x ∉ A ∪ B, that is, x belongs neither
to A nor to B. It follows that x belongs simultaneously to Aᶜ and to Bᶜ and, therefore, to
their intersection. (ii) Aᶜ ∩ Bᶜ ⊆ (A ∪ B)ᶜ. If x ∈ Aᶜ ∩ Bᶜ, then x ∉ A and x ∉ B; therefore,
x does not belong to their union.
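Both laws can be brute-force checked over all pairs of subsets of a small universe, with complementation taken relative to it; a Python sanity check of our own:

```python
# De Morgan's laws checked on all 16 × 16 pairs of subsets of U = {0, 1, 2, 3}.
U = set(range(4))
subsets = [{x for x in U if (mask >> x) & 1} for mask in range(16)]

for A in subsets:
    for B in subsets:
        assert U - (A | B) == (U - A) & (U - B)   # (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
        assert U - (A & B) == (U - A) | (U - B)   # (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
print("De Morgan's laws hold on all", len(subsets) ** 2, "pairs")
```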
De Morgan's laws show that, when considering complements, the operations ∪ and ∩
are, essentially, interchangeable. Often these laws are written in the equivalent form

A ∩ B = (Aᶜ ∪ Bᶜ)ᶜ and A ∪ B = (Aᶜ ∩ Bᶜ)ᶜ
1.2 Numbers
To quantify the variables of interest in economic applications (for example, the prices and
quantities of goods traded in some market) we need an adequate set of numbers. This is the
topic of the present section.
The natural numbers

0, 1, 2, 3, …

do not need any introduction; their set will be denoted by the symbol N.
The set N of natural numbers is closed with respect to the fundamental operations of
addition and multiplication:

(i) m + n ∈ N when m, n ∈ N;

(ii) m · n ∈ N when m, n ∈ N.

On the contrary, N is not closed with respect to the fundamental operations of subtraction
and division: for example, neither 5 − 6 nor 5/6 is a natural number. It is, therefore, clear
that N is inadequate as a set of numbers for economic applications: the budget of a company
is an obvious example in which closure with respect to subtraction is crucial – otherwise,
how can we quantify losses?⁶
The integer numbers

…, −3, −2, −1, 0, 1, 2, 3, …

form a first extension, denoted by the symbol Z, of the set N. This leads to a set that is closed
with respect to addition and multiplication, as well as to subtraction. Indeed, by setting
m − n = m + (−n),⁷ we have

(i) m − n ∈ Z when m, n ∈ Z;

(ii) m · n ∈ Z when m, n ∈ Z.

Note that

Z = {m − n : m, n ∈ N}
Proposition 15 N ⊂ Z.
We are left with a fundamental operation with respect to which Z is not closed: division.
For example, 1/3 is not an integer. To remedy this important shortcoming of the integers
(if we want to divide 1 cake among 3 guests, how can we quantify their portions if only Z
is available?), we need a further enlargement to the set of the rational numbers, denoted by
the symbol Q and given by

Q = {m/n : m, n ∈ Z with n ≠ 0}
6
Historically, negative numbers have often been viewed with suspicion. Indeed, it is in economics that
they have a most natural interpretation, in terms of losses.
7
The difference m − n is simply the sum of m with the negative −n of n (recall the notion of algebraic
sum).
In words, the set of the rational numbers consists of all the fractions with an integer
numerator and a non-zero integer denominator.
Proposition 16 Z ⊂ Q.
The set of rational numbers is closed with respect to all four fundamental operations:⁸

(i) m + n ∈ Q and m − n ∈ Q when m, n ∈ Q;

(ii) m · n ∈ Q and, provided n ≠ 0, m/n ∈ Q when m, n ∈ Q.
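This closure is mirrored by exact rational arithmetic in software; a sketch using Python's Fraction type with operands of our choosing:

```python
from fractions import Fraction

# Q is closed under the four fundamental operations: combining two
# Fractions with +, −, ·, / always yields another exact Fraction.
p, q = Fraction(1, 3), Fraction(5, 6)

for result in (p + q, p - q, p * q, p / q):
    assert isinstance(result, Fraction)

assert p + q == Fraction(7, 6)
assert p / q == Fraction(2, 5)
```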
O.R. Each rational number that is not periodic, that is, that has a finite number of decimals,
has two decimal representations. For example, 1 = 0.999… because

0.999… = 3 × 0.333… = 3 × (1/3) = 1

In an analogous way, 2.5 = 2.4999…, 51.2 = 51.1999…, and so on. On the contrary, periodic rational
numbers and irrational numbers have a unique decimal representation (which is infinite).
This is not a simple curiosity: if 0.999… were not equal to 1, we could state that 0.999… is the
number that immediately precedes 1 (without any other number in between), which would
violate a notable property that we will discuss shortly. H
The set of rational numbers seems, therefore, to have all that we need. Some simple
observations on multiplication, however, will bring us some surprising findings. If q is a
rational number, the notation qⁿ, with n ≥ 1, means

q · q · … · q (n times)

with q⁰ = 1 for every q ≠ 0. The notation qⁿ, called power of base q and exponent n, per se
is just shorthand notation for the repeated multiplication of the same factor. Nevertheless,
given a rational q > 0, it is natural to consider the inverse path, that is, to determine the
positive “number”, denoted by q^{1/n} – or, equivalently, by ⁿ√q – and called root of order n of
q, such that

(q^{1/n})ⁿ = q

For example,⁹ √25 = 5 because 5² = 25. To understand the importance of roots, we can
consider the following simple geometric figure:

[figure: a right triangle with two legs of length 1]
8
The names of the four fundamental operations are addition, subtraction, multiplication, and division,
while the names of their results are sum, difference, product, and quotient, respectively (the addition of 3
and 4 has 7 as sum, and so on).
9
The square root ²√q is simply denoted by √q, omitting the index 2.
By Pythagoras' Theorem, the length of the hypotenuse is √2. To quantify elementary
geometric entities, we thus need square roots. Here we have a, tragic to some, surprise.¹⁰

Theorem 17 √2 ∉ Q.
Proof Suppose, by contradiction, that √2 ∈ Q. Then there exist m, n ∈ Z such that
m/n = √2, and therefore

(m/n)² = 2 (1.6)

We can assume that m/n is already reduced to its lowest terms, i.e., that m and n have no
factors in common.¹¹ This means that m and n cannot both be even numbers (otherwise, 2
would be a common factor).

Formula (1.6) implies

m² = 2n² (1.7)

and, therefore, m² is even. As the square of an odd number is odd, m is also even (otherwise,
if m were odd, then m² would also be odd). Therefore, there exists an integer k ≠ 0 such
that

m = 2k (1.8)

From (1.7) and (1.8) it follows that

n² = 2k²

Therefore n² is even, and so n itself is even. In conclusion, both m and n are even, but this
contradicts the fact that m/n is reduced to its lowest terms. This contradiction proves that
√2 ∉ Q.
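The theorem can also be probed numerically: no fraction squares exactly to 2. The following Python sketch (a spot-check over small denominators of our choosing, not a proof) tests, for each denominator, the integer numerator closest to the target:

```python
from fractions import Fraction

# For each denominator n, take the integer numerator m closest to n·√2
# and verify that (m/n)² misses 2 — as Theorem 17 guarantees it must.
for n in range(1, 201):
    m = round(n * 2 ** 0.5)
    assert Fraction(m, n) ** 2 != 2
print("no fraction m/n with n ≤ 200 squares to 2")
```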
This magnificent result is one of the great theorems of Greek mathematics. Proved by the
Pythagorean school between the VI and the V century B.C., the unexpected outcome of the
– prima facie innocuous – distinction between even and odd numbers that the Pythagoreans
were the first to make, it represented a turning point in the history of mathematics. Leaving
aside the philosophical aspects,¹² from the mathematical point of view it shows the need for
10
For the Pythagorean philosophy, in which the proportions (that is, the rational numbers) were central,
the discovery of the non-rationality of square roots was a traumatic event. We refer the curious reader to
Fritz (1945).
11
For example, 14=10 is not reduced to its lowest terms because the numerator and the denominator have
in common the factor 2. On the contrary, 7=5 is reduced to its lowest terms. p
12
The theorem shows, inter alia, that the hypotenuse contains in…nitely many points (otherwise 2 would
be a natural number). This questions the relations between geometry and the physical world that originally
motivated its study (at least under any kind of Atomism, back then advocated by the Ionian school).
14 CHAPTER 1. SETS AND NUMBERS: AN INTUITIVE INTRODUCTION
a further enlargement of the set of numbers in order to quantify basic geometric entities (as
well as basic economic variables, as it will be clear in the sequel).
To introduce, at an intuitive level, this final enlargement,13 consider the real line. It is easy to see how on this line we can represent the rational numbers. The rational numbers do not, however, exhaust the real line. For example, roots like $\sqrt{2}$, as well as other non-rational numbers, such as $\pi$, must also find their representation on the real line.14 We denote by $\mathbb{R}$ the set of all the numbers that can be represented on the real line; they are called real numbers.
The set $\mathbb{R}$ has the following properties in terms of the fundamental operations (here $a$, $b$, and $c$ are generic real numbers):

(i) $a + b \in \mathbb{R}$ and $a \cdot b \in \mathbb{R}$;

(ii) $a + b = b + a$ and $a \cdot b = b \cdot a$;

(iii) $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$;

(iv) $a + 0 = a$ and $b \cdot 1 = b$;

(v) $a + (-a) = 0$ and $b \cdot b^{-1} = 1$ provided $b \neq 0$;

(vi) $a \cdot (b + c) = a \cdot b + a \cdot c$.
Clearly, $\mathbb{Q} \subseteq \mathbb{R}$. But $\mathbb{Q} \neq \mathbb{R}$: there are many real numbers, called irrationals, that are not rational. Many roots and the numbers $\pi$ and $e$ are examples of irrational numbers. It is actually possible to prove that most real numbers are irrational. Although a rigorous treatment of this topic would take us too far, the next simple result is already a clear indication of how rich the set of the irrational numbers is.

Proposition 18 Given any two real numbers $a < b$, there exists an irrational number $c \in \mathbb{R}$ such that $a < c < b$.
13 For a rigorous treatment we refer, for example, to the first chapter of Rudin (1976).
14 Though intuitive, it is actually a postulate (of continuity of the real line).
1.3. STRUCTURE OF THE INTEGERS 15
In conclusion, R is the set of numbers that we will consider in the rest of the book. It
turns out to be adequate for most economic applications.16
Proof Two distinct properties are stated in the proposition: the existence of the pair $(q, r)$, and its uniqueness. Let us start by proving existence. We will only consider the case in which $n \geq 0$ (one needs only to change the sign if $n < 0$). Consider the set $A = \{p \in \mathbb{N} : p \leq n/m\}$. Since $n \geq 0$, the set $A$ is non-empty because it contains at least the integer zero. Let $q$ be the largest element of $A$. By definition, $qm \leq n < (q + 1)m$. Setting $r = n - qm$, we have
$$0 \leq n - qm = r < (q + 1)m - qm = m$$
We have thus shown the existence of the desired pair $(q, r)$.

Let us now consider uniqueness. By contradiction, let $(q', r')$ and $(q'', r'')$ be two different pairs such that
$$n = q'm + r' = q''m + r'' \qquad (1.9)$$
with $0 \leq r', r'' < m$. Since $(q', r')$ and $(q'', r'')$ are different, we have either $q' \neq q''$ or $r' \neq r''$ or both. If $q' \neq q''$, without loss of generality we can suppose that $q' < q''$; that is,
$$q' + 1 \leq q'' \qquad (1.10)$$
since $q'$ and $q''$ are integers. It follows from (1.9) that $(q'' - q')m = r' - r''$. Since $(q'' - q')m \geq 0$, we have that $0 \leq r' - r'' < m$. Hence,
$$(q'' - q')m = r' - r'' < m$$
which implies that $q'' - q' < 1$, that is, $q'' < q' + 1$, which contradicts (1.10). We can conclude that, necessarily, $q' = q''$. This leaves open only the possibility that $r' \neq r''$. But, since $q' = q''$, we have that
$$0 = (q'' - q')m = r' - r'' \neq 0,$$
a contradiction. Hence, the assumption of having two different pairs $(q', r')$ and $(q'', r'')$ is false.
Given two strictly positive integers $m$ and $n$, their greatest common divisor, denoted by $\gcd(m, n)$, is the largest divisor both numbers share. The next result, which was proven by Euclid in his Elements, shows exactly what was taken for granted in elementary school, namely, that any pair of integers has a unique greatest common divisor.

Theorem 21 (Euclid) Any pair of strictly positive integers has one and only one greatest common divisor.
Proof Like Proposition 20, this is also an existence and uniqueness result. Uniqueness is obvious; let us prove existence. Let $m$ and $n$ be any two strictly positive integers. By Proposition 20, there is a unique pair $(q_1, r_1)$ such that
$$n = q_1 m + r_1 \qquad (1.11)$$
with $0 \leq r_1 < m$. If $r_1 = 0$, then $\gcd(m, n) = m$, and the proof is concluded. If $r_1 > 0$, we iterate the procedure by applying Proposition 20 to $m$. We thus have a unique pair $(q_2, r_2)$ such that
$$m = q_2 r_1 + r_2 \qquad (1.12)$$
with $0 \leq r_2 < r_1$. If $r_2 = 0$, then $m = q_2 r_1$, so $r_1 \mid m$; moreover,
$$\frac{n}{r_1} = \frac{q_1 m + r_1}{r_1} = \frac{q_1 q_2 r_1 + r_1}{r_1} = q_1 q_2 + 1$$
and so $r_1 \mid n$. Thus $r_1$ is a divisor both for $n$ and $m$. We now need to show that it is the greatest of those divisors. Suppose $p$ is a strictly positive integer such that $p \mid m$ and $p \mid n$. By definition, there are two strictly positive integers $a$ and $b$ such that $n = ap$ and $m = bp$. We have that
$$0 < \frac{r_1}{p} = \frac{n - q_1 m}{p} = a - q_1 b$$
Hence $r_1 / p$ is a strictly positive integer, which implies that $r_1 \geq p$. To sum up, $\gcd(m, n) = r_1$ if $r_2 = 0$. If this is the case, the proof is concluded.

If $r_2 > 0$, we iterate the procedure once more by applying Proposition 20 to $r_2$. We thus have a unique pair $(q_3, r_3)$ such that
$$r_1 = q_3 r_2 + r_3$$
Example 22 Let us consider the strictly positive integers 3801 and 1708. Their greatest common divisor is not apparent at first sight. Fortunately, we can calculate it by means of Euclid's Algorithm. We proceed as follows:

Step 1: $3801 = 2 \cdot 1708 + 385$
Step 2: $1708 = 4 \cdot 385 + 168$
Step 3: $385 = 2 \cdot 168 + 49$
Step 4: $168 = 3 \cdot 49 + 21$
Step 5: $49 = 2 \cdot 21 + 7$
Step 6: $21 = 3 \cdot 7$

In six steps we have found that $\gcd(3801, 1708) = 7$. N
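The steps of Example 22 are easy to automate. A minimal Python sketch of Euclid's Algorithm (an illustration, not part of the text's formal development):

```python
def gcd(n: int, m: int) -> int:
    """Euclid's Algorithm: repeatedly replace (n, m) by (m, n mod m)."""
    while m != 0:
        n, m = m, n % m
    return n

print(gcd(3801, 1708))  # -> 7, as found in the six steps of Example 22
```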
The quality of an algorithm depends on the number of steps, or iterations, that are required to reach the solution. The fewer the iterations, the more powerful the algorithm is. The following remarkable property – proven by Gabriel Lamé – holds for Euclid's Algorithm.

Theorem 23 (Lamé) Given two integers m and n, the number of iterations needed for Euclid's Algorithm is less than or equal to five times the number of digits of $\min\{m, n\}$.

For example, if we go back to the numbers 3801 and 1708, the number of relevant digits is 4. Lamé's Theorem guarantees in advance that Euclid's Algorithm would have required at most 20 iterations. It took us only 6 steps, but thanks to Lamé's Theorem we already knew, before starting, that it would not have taken too much effort (and thus it was worth giving it a shot without running the risk of getting stuck in a grueling number of iterations).
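Lamé's bound is easy to check on the numbers of Example 22. The sketch below counts the iterations and compares them with five times the number of digits of the smaller number (an illustration under the same conventions as the example):

```python
def euclid_steps(n: int, m: int) -> int:
    """Count the divisions performed by Euclid's Algorithm on (n, m)."""
    steps = 0
    while m != 0:
        n, m = m, n % m
        steps += 1
    return steps

steps = euclid_steps(3801, 1708)
bound = 5 * len(str(min(3801, 1708)))  # Lame's bound: 5 times the digits of 1708
print(steps, bound)  # -> 6 20
```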
A natural number which is not prime is called composite. Let us denote the set of prime numbers by $P$. Obviously, $P \subseteq \mathbb{N}$ and $\mathbb{N} \setminus P$ is the set of composite numbers. The reader can easily verify that the following naturals factor as:
$$12 = 2^2 \cdot 3, \qquad 60 = 2^2 \cdot 3 \cdot 5, \qquad 522 = 2 \cdot 3^2 \cdot 29$$
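The factorizations above can be recovered by trial division. A small Python sketch (illustrative; the function name is ours):

```python
def prime_factorization(n: int) -> dict:
    """Return the prime factorization of n as a {prime: exponent} dictionary."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factorization(12))   # -> {2: 2, 3: 1}
print(prime_factorization(60))   # -> {2: 2, 3: 1, 5: 1}
print(prime_factorization(522))  # -> {2: 1, 3: 2, 29: 1}
```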
What we have just seen raises two questions: whether every natural number admits a prime factorization (we have only seen a few examples up to now) and whether such factorization is unique. The next result, the Fundamental Theorem of Arithmetic, addresses both questions by showing that every integer admits one and only one prime factorization. In other words, every integer can be expressed uniquely as a product of prime numbers.

Prime numbers are thus the “atoms” of $\mathbb{N}$: they are “indivisible” – as they are divisible only by 1 and themselves – and by means of them any other natural number can be expressed uniquely. The importance of this result, which shows the centrality of prime numbers, can be seen in its name. Its first proof can be found in the famous Disquisitiones Arithmeticae, published in 1801 by Carl Friedrich Gauss, although Euclid was already aware of the result in its essence.
Proof Let us start by showing the existence of this factorization. We will proceed by contradiction. Suppose there are natural numbers that do not have a prime factorization as in (1.13). Let $n > 1$ be the smallest among them. Obviously, $n$ is a composite number. There are then two natural numbers $p$ and $q$ such that $n = pq$ with $1 < p, q < n$. Since $n$ is the smallest number that does not admit a prime factorization, the numbers $p$ and $q$ do admit such factorizations. In particular, we can write
$$p = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \quad \text{and} \quad q = q_1^{n'_1} q_2^{n'_2} \cdots q_s^{n'_s}$$
so that the product of these two factorizations is a prime factorization of $n = pq$ itself, a contradiction.

Let us now prove uniqueness, again by contradiction. Suppose there are natural numbers that admit more than one prime factorization, and let $n > 1$ be the smallest among them, say
$$n = p_1^{n_1} p_2^{n_2} \cdots p_m^{n_m} = q_1^{n'_1} q_2^{n'_2} \cdots q_s^{n'_s}$$
Since $q_1$ is a divisor of $n$, it must be a divisor of at least one of the factors $p_1 < \cdots < p_m$.18 For example, let $p_1$ be one such factor. Since both $q_1$ and $p_1$ are primes, we have that $q_1 = p_1$. Hence
$$p_1^{n_1 - 1} p_2^{n_2} \cdots p_m^{n_m} = q_1^{n'_1 - 1} q_2^{n'_2} \cdots q_s^{n'_s} < n$$
which contradicts the minimality of $n$, as the number $p_1^{n_1 - 1} p_2^{n_2} \cdots p_m^{n_m}$ also admits multiple factorizations. The contradiction proves the uniqueness of the prime factorization.
From a methodological viewpoint it must be noted that this proof of existence is carried out by contradiction and, as such, cannot be constructive. Indeed, such proofs are based on the law of excluded middle (a property is either true or false; cf. Appendix D), and the truth of a statement is established by showing its non-falseness. This often allows such proofs to be short and elegant but, although logically air-tight,19 they are almost metaphysical, as they do not provide a procedure for constructing the mathematical entities whose existence they establish. In other words, they do not provide an algorithm with which such entities can be determined.

To sum up, we invite the reader to compare this proof of existence with the constructive one provided for Theorem 21. This comparison should clarify the differences between the two fundamental types of proofs of existence, constructive/direct and non-constructive/indirect.

It is not a coincidence that the proof of existence in the Fundamental Theorem of Arithmetic is not constructive. Indeed, designing algorithms which allow us to factorize a natural number n into prime numbers – the so-called factorization tests – is exceedingly complex. After all, constructing algorithms which can assess whether n is prime or composite – the so-called primality tests – is already extremely cumbersome, and it is to this day an active research field (so much so that an important result in this field dates to 2002).20
To grasp the complexity of the problem it suffices to observe that, if $n$ is composite, there are two natural numbers $a, b > 1$ such that $n = ab$. Hence, $a \leq \sqrt{n}$ or $b \leq \sqrt{n}$ (otherwise, $ab > n$), so there is a divisor of $n$ among the natural numbers between 1 and $\sqrt{n}$. To verify whether $n$ is prime or composite, we can merely divide $n$ by all natural numbers between 1 and $\sqrt{n}$: if none of them is a divisor of $n$, we can safely conclude that $n$ is a prime number, or, if this is not the case, that $n$ is composite. This procedure requires at most $\sqrt{n}$ steps.

18 This mathematical fact, although intuitive, requires a mathematical proof. It is indeed the content of Euclid's Lemma, which we do not prove. This lemma permits to conclude that, if a prime p divides a product of strictly positive integers, then it must divide at least one of them.
19 Unless one rejects the law of excluded middle, as some eminent mathematicians have done (although this constitutes a minority view and a very subtle methodological issue, the analysis of which is surely premature).
20 One of the reasons why the study of factorization tests is an active research field is that the difficulty in factorizing natural numbers is exploited by modern cryptography to build unbreakable codes (see Section 6.4).
With this in mind, suppose we want to test whether the number $10^{100} + 1$ is prime or composite (it is a number with 101 digits, so it is big but not huge). The procedure requires at most $\sqrt{10^{100} + 1}$ operations, that is, at most $10^{50}$ operations (approximately). Suppose we have an extremely powerful computer which is able to carry out $10^{10}$ (ten billion) operations per second. Since there are 31,536,000 seconds in a year, that is, approximately $3 \cdot 10^7$ seconds, our computer would be able to carry out approximately $3 \cdot 10^7 \cdot 10^{10} = 3 \cdot 10^{17}$ operations in one year. To carry out the operations that our procedure might require, our computer would need
$$\frac{10^{50}}{3 \cdot 10^{17}} = \frac{1}{3} \cdot 10^{33}$$
years. We had better get started...
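The back-of-the-envelope estimate can be reproduced exactly with integer arithmetic (same assumptions as in the text: $10^{50}$ operations at $10^{10}$ operations per second):

```python
operations = 10 ** 50                # at most sqrt(10^100 + 1) trial divisions
per_second = 10 ** 10                # the hypothetical computer's speed
per_year = 3 * 10 ** 7 * per_second  # approximately 3 * 10^7 seconds in a year
years = operations // per_year       # roughly one third of 10^33 years
print(years)
```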
It should be noted that, if the prime factorizations of two natural numbers $n$ and $m$ are known, we can easily determine their greatest common divisor. For example, from
$$3801 = 3 \cdot 7 \cdot 181 \quad \text{and} \quad 1708 = 2^2 \cdot 7 \cdot 61$$
it easily follows that $\gcd(3801, 1708) = 7$, which confirms the result of Euclid's Algorithm. Given how difficult it is to factorize natural numbers, the observation is hardly useful from a computational standpoint. Thus, it is a good idea to hold on to Euclid's Algorithm, which thanks to Lamé's Theorem is able to produce the greatest common divisor with reasonable efficiency, without having to conduct any factorization.
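In code, the rule for reading the gcd off two factorizations – common primes, each taken with the smaller exponent – looks like this (an illustration; the factorizations of 3801 and 1708 are easily verified by multiplication):

```python
# gcd from prime factorizations: common primes, each with the smaller exponent.
f1 = {3: 1, 7: 1, 181: 1}  # 3801 = 3 * 7 * 181
f2 = {2: 2, 7: 1, 61: 1}   # 1708 = 2^2 * 7 * 61
g = 1
for p in f1.keys() & f2.keys():  # primes appearing in both factorizations
    g *= p ** min(f1[p], f2[p])
print(g)  # -> 7
```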
Proof The proof is carried out by contradiction. Suppose that there are only finitely many prime numbers and denote them by $p_1 < p_2 < \cdots < p_n$. Define
$$q = p_1 \cdot p_2 \cdots p_n$$
and set $m = q + 1$. The natural number $m$ is larger than any prime number, hence it is a composite number. By the Fundamental Theorem of Arithmetic, it is divisible by at least one of the prime numbers $p_1, p_2, \ldots, p_n$. Let us denote this divisor by $p$. Both natural numbers $m$ and $q$ are thus divisible by $p$. It follows that also their difference, that is the natural number $1 = m - q$, is divisible by $p$, which is impossible since $p > 1$. Hence, the assumption that there are finitely many prime numbers is false.
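A small numerical illustration of the construction in the proof (not part of the proof itself): with the first six primes, the number $m = q + 1$ leaves remainder 1 upon division by each of them.

```python
primes = [2, 3, 5, 7, 11, 13]
q = 1
for p in primes:
    q *= p   # q = 2 * 3 * 5 * 7 * 11 * 13 = 30030
m = q + 1    # m = 30031
# None of the listed primes divides m: each division leaves remainder 1.
print([m % p for p in primes])  # -> [1, 1, 1, 1, 1, 1]
```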
In conclusion, we have looked at some basic notions in number theory, the branch of mathematics which deals with the properties of integers. It is one of the most fascinating and complex fields of mathematics, and it bears incredibly deep results, often easy to state but hard to prove. A classic example is the famous Fermat's Last Theorem, whose statement is quite simple: if $n \geq 3$, there cannot exist three strictly positive integers $x$, $y$, and $z$ such that $x^n + y^n = z^n$. Thanks to Pythagoras' Theorem we know that for $n = 2$ such triplets of integers do exist (for example, $3^2 + 4^2 = 5^2$); Fermat's Last Theorem states that $n = 2$ is indeed the only case in which this remarkable property holds. Stated by Fermat, the theorem was first proven in 1994 by Andrew Wiles after more than three centuries of unfruitful attempts.
(i) reflexivity: $a \geq a$;

(ii) transitivity: if $a \geq b$ and $b \geq c$, then $a \geq c$;

(iii) antisymmetry: if $a \geq b$ and $b \geq a$, then $a = b$;

(iv) completeness (or totality): for every pair $a, b \in \mathbb{R}$, we have $a \geq b$ or $b \geq a$ (or both);

(v) additive independence: $a \geq b$ implies $a + c \geq b + c$;

(vi) multiplicative independence: $a \geq b$ implies
$$ac \geq bc \ \text{if } c > 0, \qquad ac = bc = 0 \ \text{if } c = 0, \qquad ac \leq bc \ \text{if } c < 0;$$

(vii) separation:21 given two sets of real numbers $A$ and $B$, if $a \geq b$ for every $a \in A$ and $b \in B$, then there exists $c \in \mathbb{R}$ such that $a \geq c \geq b$ for every $a \in A$ and $b \in B$.
The first three properties have an obvious interpretation. Completeness guarantees that any two real numbers can always be ordered. Additive independence ensures that the initial ordering between two real numbers a and b is not altered by adding to both the same real number c. Multiplicative independence considers, instead, the stability of such ordering with respect to multiplication.

Finally, separation permits to separate two sets ordered by $\geq$ – that is, such that each element of one of the two sets is greater than or equal to each element of the other one –

21 Sometimes the property of separation of real numbers is called axiom of completeness (or of continuity or also of Dedekind). We do not adopt this terminology to avoid confusion with property (iv) of completeness or totality.
1.4. ORDER STRUCTURE OF R 23
The strict form $a > b$ of the “weak” inequality $\geq$ indicates that $a$ is strictly greater than $b$. In terms of $\geq$, we have $a > b$ if and only if $b \not\geq a$; that is, the strict inequality can be defined as the negation of the weak inequality (of opposite direction). The reader can verify that transitivity and independence (both additive and multiplicative) hold also for the strict inequality $>$, while the other properties of the inequality $\geq$ do not hold for $>$.
For example, if $A = [0, 1]$, the number 3 is an upper bound and the number $-1$ is a lower bound since $-1 \leq x \leq 3$ for every $x \in [0, 1]$. In particular, the set of upper bounds of $A$ is the interval $[1, \infty)$ and the set of the lower bounds is the interval $(-\infty, 0]$.

We will denote by $A^*$ the set of upper bounds of $A$ and by $A_*$ the set of lower bounds. In the example just seen, $A^* = [1, \infty)$ and $A_* = (-\infty, 0]$.
(i) Upper bounds and lower bounds do not necessarily belong to the set $A$: the upper bound 3 and the lower bound $-1$, for the set $[0, 1]$, are an example of this.

(ii) Upper bounds and lower bounds might not exist. For example, for the set of even numbers
$$\{0, 2, 4, 6, \ldots\} \qquad (1.14)$$
there is no real number which is greater than all its elements: hence, this set does not have upper bounds. Analogously, the set
$$\{0, -2, -4, -6, \ldots\} \qquad (1.15)$$
has no lower bounds, while the set of integers $\mathbb{Z}$ is a simple example of a set without upper and lower bounds.
Through upper bounds and lower bounds we can give a first classification of sets of the real line. For example, the closed interval $[0, 1]$ is bounded because it is bounded both above and below, while the set of even numbers (1.14) is bounded below, but not above (indeed, it has no upper bounds).25 Analogously, the set (1.15) is bounded above, but not below.

Note that this classification of sets is not exhaustive: there exist sets that do not fall in any of the types (i)-(iii) of the previous definition. For example, $\mathbb{Z}$ has neither an upper bound nor a lower bound in $\mathbb{R}$, and therefore it is not of any of the types (i)-(iii). Such sets are called unbounded.
A point $\hat{x} \in A$ is the maximum of $A$ if
$$\hat{x} \geq x \quad \forall x \in A$$
and the minimum of $A$ if
$$\hat{x} \leq x \quad \forall x \in A$$
The key feature of this de…nition is the condition that the maximum and minimum belong
to the set A at hand. It is immediate to see how maxima and minima are, respectively, upper
bounds and lower bounds. Indeed, they are nothing but the upper bounds and lower bounds
that belong to the set A. For such a reason, maxima and minima can be seen as the “best”
among the upper bounds and the lower bounds. Many economic applications are, indeed,
based on the search of maxima or minima of suitable sets of alternatives.
Unfortunately, maxima and minima are fragile notions: sets often do not admit them.
Example 32 The half-closed interval $[0, 1)$ has minimum 0, but it has no maximum. Indeed, suppose by contradiction that there exists a maximum $\hat{x} \in [0, 1)$, so that $\hat{x} \geq x$ for every $x \in [0, 1)$. Set
$$\tilde{x} = \frac{1}{2}\hat{x} + \frac{1}{2}$$
Since $\hat{x} < 1$, we have $\hat{x} < \tilde{x}$. But it is obvious that $\tilde{x} \in [0, 1)$, which contradicts the fact that $\hat{x}$ is the maximum of $[0, 1)$. N
(i) the half-closed interval $(0, 1]$ has maximum 1, but it has no minimum;

(ii) the open interval $(0, 1)$ has neither minimum nor maximum.

The maximum of a set $A$ is denoted by $\max A$, and the minimum by $\min A$. For example, for $A = [0, 1]$ we have $\max A = 1$ and $\min A = 0$.
26 CHAPTER 1. SETS AND NUMBERS: AN INTUITIVE INTRODUCTION
Indeed, let $\hat{x} \in A$ be the maximum of $A$. If $h$ is an upper bound of $A$, we have $h \geq \hat{x}$, since $\hat{x} \in A$. On the other hand, $\hat{x}$ is also an upper bound, and we thus obtain (1.16).

Example 34 The set of upper bounds of $[0, 1]$ is the interval $[1, \infty)$. In this example, the equality (1.16) takes the form $\max [0, 1] = \min [1, \infty)$. N
Thus, when it exists, the maximum is the smallest upper bound. But the smallest upper bound – that is, $\min A^*$ – might exist also when the maximum does not exist. For example, consider $A = [0, 1)$: the maximum does not exist, but the smallest upper bound exists and it is 1, i.e., $\min A^* = 1$.

All of this suggests that the smallest upper bound is the surrogate for the maximum which we are looking for. Indeed, in the example just seen, the point 1 is, in the absence of a maximum, its closest approximation.

Reasoning in a similar way, the greatest lower bound, i.e., $\max A_*$, is the natural candidate to be the surrogate for the minimum when the latter does not exist. Motivated by what we have just seen, we give the following definition.
Definition 35 Given a non-empty set $A \subseteq \mathbb{R}$, the supremum of $A$ is the least upper bound of $A$, that is, $\min A^*$, while the infimum is the greatest lower bound of $A$, that is, $\max A_*$.

Thanks to Proposition 33, both the supremum and the infimum of $A$ are unique, when they exist. We denote them by $\sup A$ and $\inf A$. For example, for $A = (0, 1)$ we have $\inf A = 0$ and $\sup A = 1$.

As already remarked, when $\inf A \in A$, it is the minimum of $A$, and when $\sup A \in A$, it is the maximum of $A$.
Although suprema and infima may exist when maxima and minima do not, they do not always exist.

Example 36 Consider the set $A$ of the even numbers in (1.14). In this case $A^* = \emptyset$ and so $A$ has no supremum. More generally, if $A$ is not bounded above, we have $A^* = \emptyset$ and the supremum does not exist. In a similar way, the sets that are not bounded below have no infima.27 N

26 As already mentioned, in economics maxima play a fundamental role.
27 If $A$ does not admit a supremum, we write $\sup A = +\infty$ and, when it does not admit an infimum, $\inf A = -\infty$. Moreover, by convention, we set $\sup \emptyset = -\infty$ and $\inf \emptyset = +\infty$. This is motivated by the fact that each real number must be considered simultaneously an upper bound and a lower bound of $\emptyset$: then it is natural to conclude that $\sup \emptyset = \inf \emptyset^* = \inf \mathbb{R} = -\infty$ and $\inf \emptyset = \sup \emptyset_* = \sup \mathbb{R} = +\infty$.
To be a useful surrogate, suprema and infima must exist for a large class of sets; otherwise, if their existence were also problematic, they would be of little help as surrogates.28 Fortunately, the next important result shows that suprema and infima do indeed exist for a large class of sets (with sets of the kind seen in the last example being the only troublesome ones).

Theorem 37 (Least Upper Bound Principle) Each non-empty set $A \subseteq \mathbb{R}$ has a supremum if it is bounded above and an infimum if it is bounded below.
Proof We limit ourselves to proving the first statement. To say that $A$ is bounded above means that it admits an upper bound, i.e., that $A^* \neq \emptyset$. Since $a \leq h$ for every $a \in A$ and every $h \in A^*$, by the separation property there exists a separating element $c \in \mathbb{R}$ such that $a \leq c \leq h$ for every $a \in A$ and every $h \in A^*$. Since $c \geq a$ for every $a \in A$, we have that $c$ is an upper bound of $A$, so that $c \in A^*$. But, since $c \leq h$ for every $h \in A^*$, it follows that $c = \min A^*$, that is, $c = \sup A$. This proves the existence of the supremum of $A$.
Except for the sets that are not bounded above, all the other sets in $\mathbb{R}$ admit a supremum. Analogously, except for the sets that are not bounded below, all the other sets in $\mathbb{R}$ have an infimum. Suprema and infima are thus excellent surrogates that exist, and so help us, for a large class of subsets of $\mathbb{R}$.

Note that a simple, but useful, consequence of the previous theorem is that bounded sets have both a supremum and an infimum.
1.4.3 Density

The order structure is also useful to clarify the relations among the sets $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$. First of all, we make rigorous a natural intuition: however great a real number is, there always exists a greater natural number. This is the so-called Archimedean property of real numbers.

Proposition 38 For each real number $a \in \mathbb{R}$, there exists a natural number $n \in \mathbb{N}$ such that $n \geq a$.
Proof By contradiction, assume that there exists $a \in \mathbb{R}$ such that $a \geq n$ for all $n \in \mathbb{N}$. By the Least Upper Bound Principle, $\sup \mathbb{N}$ exists and belongs to $\mathbb{R}$. Recall that, by the definition of sup,
$$\sup \mathbb{N} \geq n \quad \forall n \in \mathbb{N} \qquad (1.17)$$
At the same time, again by the definition of sup, we have $\sup \mathbb{N} - 1 < n$ for some $n \in \mathbb{N}$ (otherwise, $\sup \mathbb{N} - 1$ would be an upper bound of $\mathbb{N}$, thus violating the fact that $\sup \mathbb{N}$ is the least of these upper bounds). We can conclude that $\sup \mathbb{N} < n + 1 \in \mathbb{N}$, which contradicts (1.17).
The next property shows a fundamental difference between the structures of $\mathbb{N}$ and $\mathbb{Z}$, on the one side, and of $\mathbb{Q}$ and $\mathbb{R}$, on the other side. If we take an integer, we can talk in a natural way of predecessor and successor: if $m \in \mathbb{Z}$, its predecessor is the integer $m - 1$, while its successor is the integer $m + 1$ (for example, the predecessor of 317 is 316 and its successor is 318). In other words, $\mathbb{Z}$ has a discrete “rhythm”.

28 The utility of a surrogate depends on how well it approximates the original, as well as on its availability.

In contrast, we cannot talk of predecessors and successors in $\mathbb{Q}$ or in $\mathbb{R}$. Consider first $\mathbb{Q}$. Given a rational number $q = m/n$, let $q' = m'/n'$ be any rational such that $q' > q$. Set
$$q'' = \frac{1}{2}q + \frac{1}{2}q'$$
The number $q''$ is rational, since
$$q'' = \frac{1}{2}\frac{m'}{n'} + \frac{1}{2}\frac{m}{n} = \frac{1}{2}\frac{m'n + mn'}{nn'}$$
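The midpoint construction can be carried out exactly with Python's Fraction type (an illustration): the midpoint of two rationals is again rational and lies strictly between them, so no rational number has an immediate successor.

```python
from fractions import Fraction

q = Fraction(1, 3)
q_prime = Fraction(1, 2)
q_second = Fraction(1, 2) * q + Fraction(1, 2) * q_prime  # the midpoint
print(q_second)                 # -> 5/12
print(q < q_second < q_prime)   # -> True
```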
Proposition 39 Given any two real numbers $a < b$, there exists a rational number $q \in \mathbb{Q}$ such that $a < q < b$.

This property can be stated by saying that $\mathbb{Q}$ is dense in $\mathbb{R}$. In the proof of this result we use the notion of integer part $[a]$ of a real number $a \in \mathbb{R}$, which is the greatest integer $n \in \mathbb{Z}$ such that $n \leq a$. For example, $[\pi] = 3$, $[5/2] = 2$, $[\sqrt{2}] = 1$, $[-\pi] = -4$, and so on. The reader can verify that
$$[a + 1] = [a] + 1 \qquad (1.19)$$
since, for each $n \in \mathbb{Z}$, we have $n \leq a$ if and only if $n + 1 \leq a + 1$. Moreover, $[a] < a$ when $a \notin \mathbb{Z}$.
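The integer part is Python's math.floor, which makes the examples and identity (1.19) easy to check (an illustration):

```python
import math

# Integer part [a]: the greatest integer n with n <= a.
print(math.floor(math.pi))       # -> 3
print(math.floor(5 / 2))         # -> 2
print(math.floor(math.sqrt(2)))  # -> 1
print(math.floor(-math.pi))      # -> -4
# The identity [a + 1] = [a] + 1:
a = 7.25
print(math.floor(a + 1) == math.floor(a) + 1)  # -> True
```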
29 In his famous argument against plurality, Zeno of Elea remarks that a “plurality” is infinite because “... there will always be other things between the things that are, and yet others between those others.” (trans. Raven). Zeno thus identifies density as the characterizing property of an infinite collection. With (twenty-five centuries of) hindsight, we can say that he is neglecting the integers. Yet, it is stunning how he was able to identify a key property of infinite sets.
1.5. POWERS AND LOGARITHMS 29
Case 2: Let $b - a > 1$, i.e., $a < a + 1 < b$. From Case 1 it follows that there exists $q \in \mathbb{Q}$ such that $a < q < a + 1 < b$.

Case 3: Let $b - a < 1$. By the Archimedean property of real numbers, there exists $0 \neq n \in \mathbb{N}$ such that
$$n \geq \frac{1}{b - a}$$
So, $nb - na = n(b - a) \geq 1$. Then, by what we have just seen in Cases 1 and 2, there exists $q \in \mathbb{Q}$ such that $na < q < nb$. Therefore $a < q/n < b$, which completes the proof because $q/n \in \mathbb{Q}$.
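The proof is constructive, and its recipe can be sketched directly (a sketch that ignores floating-point edge cases; the function name is ours):

```python
from math import ceil, floor, sqrt

def rational_between(a: float, b: float):
    """The proof's construction: pick n >= 1/(b - a) (Archimedean property),
    so that (n*a, n*b) has length >= 1 and contains the integer floor(n*a) + 1."""
    n = max(1, ceil(1 / (b - a)))
    q = floor(n * a) + 1
    return q, n  # the rational q/n satisfies a < q/n < b

q, n = rational_between(sqrt(2), 1.5)
print(q, n, sqrt(2) < q / n < 1.5)  # -> 17 12 True
```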
(i) We have defined $a^r$ only for $a > 0$ to avoid dangerous and embarrassing misunderstandings. Think, for example, of $(-5)^{3/2}$. It could be rewritten as $\sqrt{(-5)^3} = \sqrt{-125}$ or as $(\sqrt{-5})^3$, which do not exist (among the real numbers). But it could also be written as $(-5)^{6/4}$, which, in turn, can be expressed as either $\sqrt[4]{(-5)^6} = \sqrt[4]{15625}$ or $(\sqrt[4]{-5})^6$. The former exists and is approximately equal to 11.180339, but the latter does not exist.
(ii) Let us consider the root $\sqrt{a} = a^{1/2}$. From high school we know that each positive number has two algebraic roots, for example $\sqrt{9} = \pm 3$. The unique positive value of the root is called, instead, the arithmetical root. For example, 3 and $-3$ are the two algebraic roots of 9, while 3 is its unique arithmetical root. In what follows the (even order) roots will always be in the arithmetical sense (and therefore with a unique value). It is, by the way, the standard convention: for example, in the classic solution formula
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
of the quadratic equation $ax^2 + bx + c = 0$, the root is in the arithmetical sense (this is why we need to write $\pm$).
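In code the convention shows up naturally: math.sqrt returns the arithmetical root, and the $\pm$ sign produces the two solutions (a sketch; the function name is ours):

```python
from math import sqrt

def quadratic_roots(a: float, b: float, c: float):
    """Real solutions of a x^2 + b x + c = 0 via the classic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()      # no real roots
    r = sqrt(disc)     # arithmetical root: a single non-negative value
    return ((-b + r) / (2 * a), (-b - r) / (2 * a))

print(quadratic_roots(1, -5, 6))  # -> (3.0, 2.0)
```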
We now extend the notion of power to the case $a^x$, with $0 < a \in \mathbb{R}$ and $x \in \mathbb{R}$. Unfortunately, the details of this extension are tedious, so we limit ourselves to saying that, if $a > 1$, the power $a^x$ is the supremum of the set of all the values $a^q$ when the exponent $q$ varies among the rational numbers such that $q \leq x$. Formally,
$$a^x = \sup \{a^q : q \in \mathbb{Q}, \ q \leq x\} \qquad (1.21)$$
In a similar way we define $a^x$ for $0 < a < 1$. We have the following properties that, by (1.21), follow from the analogous properties that hold when the exponent is rational. In particular, if $x > y$, then
$$a^x > a^y \ \text{if } a > 1, \qquad a^x < a^y \ \text{if } a < 1, \qquad a^x = a^y = 1 \ \text{if } a = 1$$
The most important base $a$ is Napier's constant $e$, which will be introduced in Chapter 8. As we will see, the power $e^x$ has truly remarkable properties.

Finally, note that point (ii) of the lemma implies, inter alia, that
$$a^x = b^y \implies a = b^{y/x} \qquad (1.22)$$
for all $a, b > 0$ and $x, y \in \mathbb{R}$ with $x \neq 0$. Indeed, $(b^{y/x})^x = b^{(y/x)x} = b^y$. For instance, $a^2 = b^3$ implies $a = b^{3/2}$, while $a^3 = b^5$ implies $a = b^{5/3}$.
1.5.2 Logarithms

The operations of addition and multiplication are commutative: $a + b = b + a$ and $ab = ba$. Therefore, they have only one inverse operation each, respectively subtraction and division. The power operation $a^b$, with $a > 0$, is not commutative: $a^b$ might well be different from $b^a$. Therefore, it has two distinct inverse operations.

Let $a^b = c$. The first inverse operation – given $c$ and $b$, find $a$ – is called the root with index $b$ of $c$:
$$a = \sqrt[b]{c} = c^{1/b}$$
The second one – given $c$ and $a$, find $b$ – is called the logarithm with base $a$ of $c$:
$$b = \log_a c$$
Note that, together with $a > 0$ and $c > 0$, one must also have $a \neq 1$ because $1^b = c$ is impossible except when $c = 1$. By definition,
$$a^{\log_a c} = c$$
The properties of the logarithms derive easily from the properties of the powers established
in Lemma 40.
The key property of the logarithm is to transform the product of two numbers into a sum of two other numbers, that is, property (i) above. Sums are much easier to handle than products; hence the importance of logarithms, also computationally (till the age of computers, tables of logarithms were a most important aid to perform computations). To emphasize this key property of logarithms, denote a (strictly positive) scalar by a lower case letter and its logarithm by the corresponding upper case letter; e.g., $C = \log_a c$. Then, we can summarize property (i) as:
$$c \cdot d \longrightarrow C + D$$
30 For example, $\log_a x^2 = 2 \log_a x$ for $x > 0$. Note that $\log_a x^2$ exists for each $x \neq 0$, while $2 \log_a x$ exists only for $x > 0$.
$$\log_a c = \frac{\log_b c}{\log_b a}$$
Thanks to this change of base formula, it is possible to always take the same number, say 10, as the base of the logarithms, because
$$\log_a c = \frac{\log_{10} c}{\log_{10} a}$$
As for the powers $a^x$, also for the logarithms the most common base is Napier's constant $e$. In such a case we simply write $\log x$ instead of $\log_e x$. Because of its importance, $\log x$ is called the natural logarithm of $x$, which leads to the notation $\ln x$ sometimes used in place of $\log x$.
The next result shows the close connections between logarithms and powers, which can
be actually seen as inverse notions.
$$\log_a a^x = x \quad \forall x \in \mathbb{R}$$
and
$$a^{\log_a x} = x \quad \forall x > 0$$
We leave to the reader the simple proof. To check their understanding of the material of this section, the reader may want to verify that $b^{\log_a c} = c^{\log_a b}$ for all strictly positive numbers $a \neq 1$, $b$, and $c$.
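All three facts – the product-to-sum property, the change of base formula, and the closing exercise – are easy to check numerically (an illustration with arbitrarily chosen positive numbers):

```python
import math

a, b, c, d = 2.0, 3.0, 5.0, 7.0

# Product-to-sum: log_a(c d) = log_a c + log_a d
print(math.isclose(math.log(c * d, a), math.log(c, a) + math.log(d, a)))  # -> True
# Change of base: log_a c = log_b c / log_b a
print(math.isclose(math.log(c, a), math.log(c, b) / math.log(a, b)))      # -> True
# Exercise: b^(log_a c) = c^(log_a b)
print(math.isclose(b ** math.log(c, a), c ** math.log(b, a)))             # -> True
```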
For example, in this manner, 4357 means 4 thousands, 3 hundreds, 5 tens and 7 units. The natural numbers are thus expressed by powers of 10, each of which causes a digit to be added: writing 4357 is the abbreviation of
$$4 \cdot 10^3 + 3 \cdot 10^2 + 5 \cdot 10 + 7$$
To employ positional notation, it is fundamental to adopt the 0 to signal an empty slot: for example, when writing 4057 the zero signals the absence of the hundreds, that is,
$$4 \cdot 10^3 + 0 \cdot 10^2 + 5 \cdot 10 + 7$$
Decimals are represented in a completely analogous fashion through the powers of $1/10 = 10^{-1}$: for example, 0.501625 is the abbreviation of
$$5 \cdot 10^{-1} + 0 \cdot 10^{-2} + 1 \cdot 10^{-3} + 6 \cdot 10^{-4} + 2 \cdot 10^{-5} + 5 \cdot 10^{-6}$$
The choice of decimal notation is due to the mere fact that we have ten fingers, but obviously it is not the only possible one. Some Native American tribes used to count on their hands using the eight spaces between their fingers rather than the ten fingers themselves. They would have chosen only 8 digits, say
$$0, 1, 2, 3, 4, 5, 6, 7$$
and they would have articulated the integers along the powers of 8, that is 8, 64, 512, 4096, ... They would have written our decimal number 4357 as 10405, since
$$1 \cdot 8^4 + 0 \cdot 8^3 + 4 \cdot 8^2 + 0 \cdot 8 + 5 = 4096 + 256 + 5 = 4357$$
Analogously, the decimal number 0.515625 would have been written 0.41, since
$$0.515625 = 4 \cdot 0.125 + 1 \cdot 0.015625 = 4 \cdot 8^{-1} + 1 \cdot 8^{-2}$$
In general, given a base $b$ and a set of digits
$$C_b = \{c_0, c_1, \ldots, c_{b-1}\}$$
used to represent the integers between 0 and $b - 1$, every natural number $n$ is written in the base $b$ as
$$d_k d_{k-1} \cdots d_1 d_0$$
where $k$ is an appropriate natural number and
$$n = d_k b^k + d_{k-1} b^{k-1} + \cdots + d_1 b + d_0$$
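The digits $d_k \cdots d_1 d_0$ can be extracted by repeated division by the base, reversing the expansion above (a sketch; the function name is ours):

```python
def to_base(n: int, b: int) -> str:
    """Digit string of n >= 0 in base b (2 <= b <= 16), via repeated division."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = ""
    while n > 0:
        out = digits[n % b] + out  # each remainder is the next digit d_i
        n //= b
    return out

print(to_base(4357, 10))  # -> '4357'
print(to_base(4357, 8))   # -> '10405'
print(to_base(11, 2))     # -> '1011'
```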
$$0, 1, 2, 3, 4, 5, 6, 7, 8, 9, |, \bullet$$
We have used the symbols | and • for the two additional digits we need compared to the decimal notation. In binary notation, where the digit set is $C_2 = \{0, 1\}$, we have for example
$$1011 = 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0$$
while in decimal notation
$$11 = 1 \cdot 10^1 + 1 \cdot 10^0$$
The considerable reduction in the digit set C₂ made possible by the base 2 comes at the cost of the large number of bits required to represent numbers in binary notation. For example: while 16 consists of two decimal digits, the corresponding binary 10000 requires five bits; while 201 requires three digits, the corresponding binary 11001001 requires eight bits; while 2171 requires four digits, the corresponding binary 100001111011 requires twelve bits, and so on. Very quickly, binary notation requires a number of bits that only a computer is able to process.
From a purely mathematical perspective, the choice of base is merely conventional, and
going from one base to another is easy (although tedious).32 Bases 2 and 10 are nowadays
32 Operations on numbers written in a non-decimal notation are not particularly difficult either. For example, 11 + 9 = 20 can be calculated in binary as

  1011 +
  1001 =
 10100

It is sufficient to remember that the “carrying” must be done at 2 and not at 10.
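The footnote's binary sum can be double-checked with Python's built-in base-2 conversions (an illustration of ours, not part of the text):

```python
# The binary sum 1011 + 1001 = 10100 is the decimal sum 11 + 9 = 20.
assert int("1011", 2) == 11 and int("1001", 2) == 9
assert int("1011", 2) + int("1001", 2) == int("10100", 2) == 20
# bin() goes the other way, from decimal to binary notation.
assert bin(11 + 9) == "0b10100"
```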
the most important ones, but others have been used in the past, such as 20 (the number of fingers and toes, a trace of which is still found in the French language, where “quatre-vingts”, i.e., “four-twenties”, stands for eighty and “four-twenty-ten” stands for ninety), as well as 16 (the number of spaces between fingers and toes) and 60 (which is convenient because it is divisible by 2, 3, 4, 5, 6, 10, 12, 15, 20 and 30; a significant trace of this system remains in how we divide hours and minutes and in how we measure angles).
The positional notation has been used to perform manual calculations since the dawn of time (just think of computations carried out with the abacus), but it is a relatively recent conquest in terms of writing, made possible by the fundamental innovation of the zero, and has been exceptionally important in the development of mathematics and its countless applications: commercial, scientific, and technological. Born in India (apparently around the fifth century AD), the positional notation was developed during the early Middle Ages in the Arab world (especially thanks to the works of Al-Khwarizmi), from which the name “Arabic numerals” for the digits (1.23) derives. It arrived in the Western world thanks to Italian merchants between the 11th and 12th centuries. In particular, the son of one of those merchants, Leonardo da Pisa (also known as Fibonacci), was the most important medieval mathematician: for the first time in Western Europe after so many dark centuries, he conducted original research in mathematics with the overt ambition of going beyond what the great mathematicians of the classical world had established. Inter alia, Leonardo authored a famous treatise in 1202, the Liber Abaci, which was the most important among the first essays that brought the positional notation to Europe. Until then, non-positional Roman numerals were used, which made even trivial operations overly complex (try to sum up CXL and MCL, and then 140 and 1150).
Let us conclude with the incipit of the first chapter of the Liber Abaci, with the extraordinary innovation that the book brought to the Western world:

9, 8, 7, 6, 5, 4, 3, 2, 1

Cum his itaque novem figuris, et cum hoc signo, quod arabice zephirum appellatur, scribitur quilibet numerus, ut inferius demonstratur. [...] ut in sequenti cum figuris numeris super notatis ostenditur.
R ∪ {−∞, +∞}

denoted by the symbol R̄ or, sometimes, by [−∞, +∞]. The order structure of R can be naturally extended to R̄ by setting −∞ < a < +∞ for each a ∈ R.

a + ∞ = +∞ and a − ∞ = −∞ for every a ∈ R   (1.24)

with, in particular,

+∞ + ∞ = +∞ and −∞ − ∞ = −∞

(v) division:

a/(+∞) = a/(−∞) = 0 for every a ∈ R
(vi) power of a real number:

a^(+∞) = +∞ if a > 1
a^(+∞) = 0 if 0 < a < 1
a^(−∞) = 0 if a > 1
a^(−∞) = +∞ if 0 < a < 1
While the addition of infinities with the same sign is a well-defined operation (for example, the sum of two positive infinities is again a positive infinity), the addition of infinities of different sign is not defined. For example, the result of +∞ − ∞ is not defined. This is a first example of an indeterminate operation in R̄. In general, the following operations are indeterminate:

+∞ · 0 and 0 · (−∞)   (1.26)
(iii) divisions with denominator equal to zero or with numerator and denominator that are both infinities:

a/0 and ∞/∞   (1.27)

with a ∈ R;

The indeterminate operations (i)-(iv) are called forms of indetermination and will play an important role in the theory of limits. Note that, by setting a = 0, formula (1.27) takes the form

0/0
O.R. As we have observed, the most natural geometric image of R is the (real) line: to each point there corresponds a number and, vice versa, to each number there corresponds a point. If we take a closed (and obviously bounded) segment, we can “transport” all the numbers from the real line to the open interval (0, 1), as the following figure shows:34

34 We refer to the proof of Proposition 253 for the analytic expression of the bijection shown here.
[Figure: the real line mapped bijectively onto the open interval (0, 1).]
All the real numbers that found a place on the real line also find a place in the interval (0, 1): perhaps tightly packed, but they all fit. Two points are left over, the endpoints of the interval, to which it is natural to associate +∞ and −∞, respectively. The closed interval [0, 1] is, therefore, a geometric image of R̄. H
posed by Arpad Szabo,35 underlines the importance of the Eleatic philosophy, which flourished at Elea in the V century B.C. and has in Parmenides and Zeno its most famous exponents. In Parmenides' famous doctrine of the Being, a turning point in intellectual history that the reader might have encountered in some high school philosophy course, it is logic that permits the study of the Being, that is, of the world of truth (ἀλήθεια). This study is impossible for the senses, which can only guide us among the appearances that characterize the world of opinion (δόξα). In particular, only reason can command arguments by contradiction, which have no empirical substratum but are the pure result of reason. Such arguments, developed, according to Szabo, by the Eleatic school and at the center of its dialectics (which culminated in the famous paradoxes of Zeno), for example enabled the Eleatic philosopher Melissus of Samos to state that the Being “always was what it was and always will be. For if it had come into being, necessarily before it came into being there was nothing. But, if there was nothing, in no way could something come into being from nothing”.36
True knowledge is thus theoretical: only the eye of the mind can see the truth, while empirical analysis necessarily stops at appearances. The anti-empirical character of the Eleatic school could have been decisive in the birth of the deductive method, at least in creating a favorable intellectual environment. Naturally, it is not possible to exclude a causality opposite to the one proposed by Szabo: the deductive method could have been developed inside mathematics and could then have influenced philosophy, and in particular the Eleatics.37 Indeed, the irrationality of √2, established by the Pythagorean school (the other great Presocratic school of Magna Graecia), is a first decisive triumph of such a method in mathematics: only the eye of the mind could see such a property, which is devoid of any “empirical” intuition. It is the eye of the mind that explains the inescapable error incurred by every empirical measurement of the hypotenuse of a right triangle with catheti of unitary length: however accurate this measurement is, it will always be a rational approximation of the true irrational distance, √2, with a consequent approximation error (that, by the way, will probably vary from measurement to measurement).

In any case, between the VI and the V century B.C. two Presocratic schools of Magna Graecia were the cradle of an incredible intellectual revolution. In the III century B.C. another famous Magna Graecia scholar, Archimedes of Syracuse, led this revolution to its maximum splendor in the classical world (and beyond). We close with Plato's famous (probably fictional) description of two protagonists of this revolution, Parmenides and Zeno.38
35 See Szabo (1978). Elea was a town of Magna Graecia, around 140 kilometers south of Naples.

36 Barnes (1982) calls this beautiful fragment the theorem of ungenerability (trans. Allhoff, Smith, and Vaidya in “Ancient Philosophy”, Blackwell, 2008). In a less transparent way (but it was part of the first logical argument ever reported) Parmenides had written in his poem “And how might what is be then? And how might it have come into being? For if it came into being, it is not, nor if it is about to be at some time” (trans. Barnes). We refer to Calogero (1977) for a classic work on Eleatic philosophy, and to Barnes (1982), as well as to the recent Warren (2014), for general introductions to the Presocratics.

37 For instance, arguments by contradiction could have been developed within the Pythagorean school through the odd-even dichotomy for natural numbers that is central in the proof of the irrationality of √2. This is what Cardini Timpanaro (1964) argues, contra Szabo, in her comprehensive book. See also pp. 258-259 in Vlastos (1996). Interestingly, the archaic Greek enigmas were formulated in contradictory terms (their role in the birth of dialectics is emphasized by Colli, 1975).

38 In Plato's dialogue “Parmenides” (trans. Jowett, reported in Barnes ibid.). A caveat: over the centuries, actually over the millennia, the strict Eleatic anti-empirical stance (understandable, back then, in the excitement of a new approach) has inspired a great deal of metaphysical thinking. Reason without empirical motivation and discipline becomes, at best, sterile.
They came to Athens ... the former was, at the time of his visit, about 65 years
old, very white with age, but well favoured. Zeno was nearly 40 years of age,
tall and fair to look upon: in the days of his youth he was reported to have been
beloved by Parmenides.
Chapter 2

Cartesian structure and Rn
On another label one reads: 1 year of aging and 10 degrees. In this case we can write

(1, 10)

The pairs (2, 12) and (1, 10) are called ordered pairs. In them we distinguish the first element, the aging, from the second one, the alcoholic content. In an ordered pair the position is, therefore, crucial: a (2, 12) wine is very different from a (12, 2) wine (try the latter...).

Let A1 be the set of the possible years of aging and A2 the set of the possible alcoholic contents. We can then write
Definition 44 Given two sets A1 and A2, the Cartesian product A1 × A2 is the set of all the ordered pairs (a1, a2) with a1 ∈ A1 and a2 ∈ A2.

In the example, we have A1 ⊆ N and A2 ⊆ N, i.e., the elements of A1 and A2 are natural numbers. More generally, we can assume that A1 = A2 = R, so that the elements of A1 and A2 are real numbers, although with a possibly different interpretation according to their position. In this case

A1 × A2 = R × R = R2
(i) {(a1, a2) ∈ R2 : a1 = 0}, that is, the set of the ordered pairs of the form (0, a2); it is the vertical axis (or axis of the ordinates).

(ii) {(a1, a2) ∈ R2 : a2 = 0}, that is, the set of the ordered pairs of the form (a1, 0); it is the horizontal axis (or axis of the abscissae).

(iii) {(a1, a2) ∈ R2 : a1 ≥ 0 and a2 ≥ 0}, that is, the set of the ordered pairs (a1, a2) with both components positive; it is the first quadrant of the plane (also called positive orthant). In a similar way we can define the other quadrants:
[Figure: the four quadrants I, II, III and IV of the plane.]
(iv) {(a1, a2) ∈ R2 : a1² + a2² ≤ 1} and {(a1, a2) ∈ R2 : a1² + a2² < 1}, that is, the closed unit ball and the open unit ball, respectively (both centered at the origin and with radius one).1

1 The meaning of the adjectives “closed” and “open” will become clear in Chapter 5.
(v) {(a1, a2) ∈ R2 : a1² + a2² = 1}, that is, the unit circle; it is the skin of the closed unit ball:
[Figure: the unit circle in the plane.]
Earlier we classified wines according to two characteristics, aging and alcoholic content. We now consider a more complicated example, that is, portfolios of assets. Suppose that there exist four different assets that can be purchased in a financial market. A portfolio is then described by an ordered quadruple

(a1, a2, a3, a4)

where a1 is the amount of money invested in the first asset, a2 is the amount of money invested in the second asset, and so on. For example,

(1000, 1500, 1200, 600)

denotes a portfolio in which 1000 euros have been invested in the first asset, 1500 in the second one, and so on. The position is crucial: the portfolio is very different from the previous one, although the amounts of money involved are the same.
Since amounts of money are numbers that are not necessarily integers, and possibly negative (in case of sales), it is natural to assume A1 = A2 = A3 = A4 = R, where Ai is the set of the possible amounts of money that can be invested in asset i = 1, 2, 3, 4. We have

(a1, a2, a3, a4) ∈ A1 × A2 × A3 × A4 = R4

In particular,

(1000, 1500, 1200, 600) ∈ R4

In general, if we consider n sets A1, A2, ..., An we can give the following definition.
A1 × A2 × ⋯ × An

denoted by ∏_{i=1}^n Ai (or by ×_{i=1}^n Ai), is the set of all the ordered n-tuples (a1, a2, ..., an) with a1 ∈ A1, a2 ∈ A2, ..., an ∈ An.

Rn = R × R × ⋯ × R   (n times)
An element

x = (x1, x2, ..., xn) ∈ Rn

is called a vector.2 The Cartesian product Rn is called the (n-dimensional) Euclidean space. For n = 1, R is represented by the real line and, for n = 2, R2 is represented by the plane. As one learns in high school, it was Descartes who in 1637 understood that all points of the plane can be identified with pairs (a1, a2), as seen in a previous figure: a marvelous insight that made it possible to study geometry through algebra (this is why Cartesian products are named after him). The vectors (a1, a2, a3) in R3 also admit a graphic representation:
[Figure: graphic representation of a vector (a1, a2, a3) in R3.]
However, this is no longer possible in Rn when n ≥ 4. The graphic representation may help the intuition, but from a theoretical and computational viewpoint it has no importance: the vectors of Rn, with n ≥ 4, are completely well-defined entities. They actually turn out to be fundamental in economics, as we will see in Section 2.4 and as the portfolio example already showed.

2 For real numbers we use the letter x instead of a.
Notation We will denote the components of a vector by the same letter used for the vector
itself, along with ad hoc indexes: for example a3 is the third component of the vector a, y7
the seventh component of the vector y, and so on.
2.2 Operations in Rn
Let us consider two vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) in Rn. Their sum x + y is the vector

x + y = (x1 + y1, x2 + y2, ..., xn + yn)

For example, for the two vectors x = (7, 8, 9) and y = (2, 4, 7) in R3, we have x + y = (9, 12, 16).
Given a scalar α ∈ R, the product αx is the vector

αx = (αx1, αx2, ..., αxn)

Even in this case, we have αx ∈ Rn. In other words, also through the operation of scalar multiplication we have constructed a new element of Rn.3

We have introduced in Rn two operations, addition and scalar multiplication, that extend to vectors the corresponding operations for real numbers. Let us see their properties. We start with addition.
(i) x + y = y + x (commutativity),
3 A real number is often called a scalar. Throughout the book we will use the terms “scalar” and “real number” interchangeably.
(ii) (x + y) + z = x + (y + z) (associativity),
Proof We prove (i), leaving the other properties to the reader. We have

x + y = (x1 + y1, ..., xn + yn) = (y1 + x1, ..., yn + xn) = y + x

as desired.
(iv) α(βx) = (αβ)x (associativity).
Proof We only prove (ii), the other properties being left to the reader. We have:

(α + β)x = ((α + β)x1, (α + β)x2, ..., (α + β)xn)
= (αx1 + βx1, αx2 + βx2, ..., αxn + βxn)
= (αx1, αx2, ..., αxn) + (βx1, βx2, ..., βxn) = αx + βx

as claimed.
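The two operations, and the distributivity just proved, can be sketched componentwise in Python (an illustration of ours; the helper names add and scale are assumptions):

```python
def add(x, y):
    """Componentwise sum of two vectors of the same dimension."""
    assert len(x) == len(y)
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(alpha, x):
    """Multiplication of the vector x by the scalar alpha."""
    return tuple(alpha * xi for xi in x)

assert add((7, 8, 9), (2, 4, 7)) == (9, 12, 16)
assert scale(2, (7, 8, 9)) == (14, 16, 18)
# Distributivity (alpha + beta) x = alpha x + beta x, for alpha=2, beta=3:
assert add(scale(2, (1, 2)), scale(3, (1, 2))) == scale(5, (1, 2))
```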
The last operation in Rn that we consider is the inner product. Given two vectors x and y in Rn, their inner product, denoted by x · y, is the scalar defined by4

x · y = x1y1 + x2y2 + ⋯ + xnyn

Other common notations for the inner product are (x, y) and ⟨x, y⟩.
For example, for the vectors x = (1, −1, 5, −3) and y = (−2, 3, π, −1) of R4, we have

x · y = 1 · (−2) + (−1) · 3 + 5 · π + (−3) · (−1) = 5π − 2
4 Given n real numbers r_i, their sum r1 + r2 + ⋯ + rn is denoted by Σ_{i=1}^n r_i, while their product r1 r2 ⋯ rn is denoted by ∏_{i=1}^n r_i.
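As an illustration (ours, not the book's), the inner product and the example above can be checked in Python:

```python
import math

def dot(x, y):
    """Inner product: the sum of the componentwise products."""
    assert len(x) == len(y)
    return sum(xi * yi for xi, yi in zip(x, y))

# The example from the text: x · y = 5π − 2.
x = (1, -1, 5, -3)
y = (-2, 3, math.pi, -1)
assert math.isclose(dot(x, y), 5 * math.pi - 2)
# Commutativity: x · y = y · x.
assert dot(x, y) == dot(y, x)
```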
The inner product is an operation that differs from addition and scalar multiplication in a structural aspect: while the latter operations determine a new vector of Rn, the result of the inner product is a scalar. The next result gathers the main properties of the inner product (we leave the simple proof to the reader).

(i) x · y = y · x (commutativity),

(ii) (x + y) · z = (x · z) + (y · z) (distributivity),

(iii) (αx) · z = α(x · z) (distributivity).

Note that the two distributive properties can be summarized in the single property (αx + βy) · z = α(x · z) + β(y · z).
The order structure of Rn is based on the order structure of R, but with some important novelties. We begin by defining the order on Rn: given two vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) in Rn, we write

x ≥ y

if xi ≥ yi for each index i = 1, 2, ..., n.
The study of the basic properties of the inequality ≥ on Rn reveals a first important novelty: when n ≥ 2, the order ≥ does not satisfy completeness. Indeed, consider for example x = (0, 1) and y = (1, 0) in R2: neither x ≥ y nor y ≥ x. We say, therefore, that ≥ on Rn is a partial order (which becomes a complete order when n = 1).

It is easy to find vectors in Rn that are not comparable. The following figure shows the vectors of R2 that are ≤ or ≥ than the vector x = (1, 2); the darker area represents the points smaller than x, the lighter area those greater than x, and the two white areas represent the vectors that are not comparable with x:
[Figure: the vectors of R2 comparable with x = (1, 2).]
Apart from completeness, it is easy to verify that ≥ on Rn continues to enjoy the properties seen for n = 1:

(i) reflexivity: x ≥ x,

(iv) separation: given two sets A and B in Rn, if a ≥ b for every a ∈ A and b ∈ B, then there exists c ∈ Rn such that a ≥ c ≥ b for every a ∈ A and b ∈ B.
Another notion that becomes surprisingly delicate when n ≥ 2 is that of strict inequality. Indeed, given two vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) of Rn, two cases can occur.

1. All the components of x are ≥ than the corresponding components of y, with some of them strictly greater; i.e., xi ≥ yi for each index i = 1, 2, ..., n, with xi > yi for at least one index i.

2. All the components of x are > than the corresponding components of y; i.e., xi > yi for each i = 1, 2, ..., n.

In the first case we have a strict inequality, in symbols x > y; in the second case a strong inequality, in symbols x ≫ y.

x ≫ y ⟹ x > y ⟹ x ≥ y
The three notions of inequality among vectors in Rn are, therefore, increasingly stringent. Indeed, we have:

(i) a weak notion, ≥, that permits equality between the two vectors;

(ii) an intermediate notion, >, that requires at least one strict inequality among the components;

(iii) a strong notion, ≫, that requires strict inequality among all the components of the two vectors.

When n = 1, both > and ≫ reduce to the standard > on R. Moreover, the “reversed” symbols ≤, <, and ≪ are used for the converse inequalities.
An important comparison is that between a vector x and the zero vector 0. We say that the vector x is:

(i) positive if x ≥ 0, i.e., if all the components of x are positive;

(ii) strictly positive if x > 0, i.e., if all the components of x are positive and at least one of them is strictly positive;

(iii) strongly positive if x ≫ 0, i.e., if all the components of x are strictly positive.
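The three order notions can be made concrete with a small Python sketch (ours; function names are assumptions). It also reproduces the non-comparability of x = (0, 1) and y = (1, 0) noted above:

```python
def weak_ge(x, y):
    """x >= y: every component of x is >= the corresponding one of y."""
    return all(xi >= yi for xi, yi in zip(x, y))

def strict_gt(x, y):
    """x > y: x >= y with at least one strictly greater component."""
    return weak_ge(x, y) and any(xi > yi for xi, yi in zip(x, y))

def strong_gt(x, y):
    """x >> y: every component of x is strictly greater."""
    return all(xi > yi for xi, yi in zip(x, y))

# (0, 1) and (1, 0) are not comparable: neither x >= y nor y >= x.
assert not weak_ge((0, 1), (1, 0)) and not weak_ge((1, 0), (0, 1))
# (2, 1) > (1, 1) but not (2, 1) >> (1, 1); instead (2, 2) >> (1, 1).
assert strict_gt((2, 1), (1, 1)) and not strong_gt((2, 1), (1, 1))
assert strong_gt((2, 2), (1, 1))
```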
N.B. The notation and terminology that we introduced are not the only possible ones. For example, some authors use ≧, ≥, and > in place of our ≥, >, and ≫; other authors call “non-negative” the vectors that we call positive, and so on. O
Together with the lack of completeness of ≥, the presence of the two different notions of strict inequality is the main novelty, relative to what happens on the real line, that we have in Rn when n ≥ 2.

[a, b] = {x ∈ Rn : a ≤ x ≤ b} = {x ∈ Rn : ai ≤ xi ≤ bi}
2.4 Applications
2.4.1 Static choices
Consider a consumer who has to choose how many kilograms of apples and of potatoes to buy at the market. For convenience, we assume that these goods are infinitely divisible, so that the consumer can buy any positive real quantity (for example, 3 kg of apples and kg of potatoes). In this case, R+ is the set of the possible quantities of apples or potatoes that can be bought. Therefore, the collection of all the bundles of apples and potatoes that the consumer can buy is

R2+ = R+ × R+ = {(x1, x2) : x1, x2 ≥ 0}

Graphically, it is the first quadrant of the plane. In general, if a consumer chooses n goods, the set of the bundles is represented by the Cartesian product
In a similar way we can define minimals, which are also called Pareto optima.6

To understand the nature of maximals,7 say that a point x ∈ A is dominated by another point y ∈ A if x < y, that is, if xi ≤ yi for each index i, with xi < yi for at least one index i (Section 2.3). A dominated point is thus outperformed by another point available in the set. For instance, if they represent bundles of goods, a dominated bundle x is obviously no better an alternative than the dominant one y. In terms of dominance, we can say that a point a of A is maximal if it is not dominated by any other point in A. That is, a is not outperformed by any other alternative available in A. Maximality is thus the natural extension of the notion of maximum when dealing, as is often the case in applications, with alternatives that are multi-dimensional (and so represented by vectors of Rn).
The set in the next figure has a maximum, i.e., the point a. Thanks to this lemma, a is therefore also the unique maximal. Thus:

maximum ⟹ maximal
6 Optima, like angels, have no gender. Note that here “maximal” is an adjective used as a noun (as was the case for “maximum” in Definitions 30 and 51). If used as adjectives, we would have “maximal element” (as well as “maximum element”).

7 In the rest of the chapter we focus on maxima and maximals, which are the most relevant in economic applications, leaving to the reader the dual properties that hold for minima and minimals.
But the converse is false: there exist maximals that are not maxima, that is,

maximal ⇏ maximum

Example 54 In the binary set A = {(1, 2), (2, 1)} of the plane, the vector (2, 1) is a maximal that is not a maximum, while the vector (1, 2) is a minimal that is not a minimum. N
Example 55 The next figure shows a set A of R2 that has no maxima, but infinitely many maximals.
[Figure: a set A of R2 whose dark edge consists of infinitely many maximals.]
It is easy to see that any point a ∈ A on the dark edge is maximal: there is no x ∈ A such that x > a. On the other hand, a is not a maximum: we have a ≥ x only for the points x ∈ A that are comparable with a, which are represented in the shaded part of A. Nothing can be said, instead, for the points that are not comparable with a (the non-shaded part of A). The lack of maxima for this set is thus due to the fact that the order ≥ is only partial in Rn when n > 1. N
The set A of the last example illustrates another fundamental difference between maxima and maximals in Rn with n > 1: the maximum of a set, if it exists, is unique, while a maximal might well not be unique.
Summing up, because of the incompleteness of the order ≥ on Rn, maxima are much less important than maximals in Rn. That said, maximals might also fail to exist: the 45° straight line is a subset of R2 without maximals (and minimals).8

Definition 56 The set of the maximals of a set A ⊆ Rn is called the Pareto (or efficient) frontier of A.
In the last example, the dark edge is the Pareto frontier of the set A.

[Figure: the Pareto frontier (the dark edge) of the set A.]
As a first economic application, assume for example that the different vectors of a set A ⊆ Rn represent the profits that n individuals can earn. So, in x = (x1, ..., xn) ∈ A the component xi is the profit of individual i, with i = 1, ..., n. The Pareto optima represent the situations from which it is not possible to move away without reducing the profit of at least one of the individuals. In other words, the n individuals would not object to restricting A to the set of its Pareto optima (nobody loses), that is, to its Pareto frontier. A conflict of interests arises among them, instead, when a specific point on the frontier has to be selected.

Thus, the concept of Pareto optimum permits us to narrow down, with a unanimous consensus, a set A of alternatives by identifying the true “critical” subset, the Pareto frontier, which is often much smaller than the original set A.9
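For a finite set A, the Pareto frontier can be computed directly from the definition of dominance. A Python sketch (ours; function names are assumptions):

```python
def dominates(y, x):
    """True if y > x: y_i >= x_i for all i, with at least one strict."""
    return all(yi >= xi for yi, xi in zip(y, x)) and any(
        yi > xi for yi, xi in zip(y, x))

def pareto_frontier(A):
    """The maximals of A: the points dominated by no other point of A."""
    return [x for x in A if not any(dominates(y, x) for y in A)]

# The binary set {(1, 2), (2, 1)} of Example 54: both points are maximal.
assert pareto_frontier([(1, 2), (2, 1)]) == [(1, 2), (2, 1)]
# Adding (1, 1), which is dominated by (2, 1), does not enlarge the frontier.
assert pareto_frontier([(1, 2), (2, 1), (1, 1)]) == [(1, 2), (2, 1)]
```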
8 This set is the graph of the function f : R → R given by f(x) = x, as we will see in Chapter 6.

9 For Pareto optimality it is key that agents only consider their own alternatives (bundles of goods, profits, etc.), without caring about those of their peers. In other words, they should not feel envy or similar social emotions. To see why, think of a tribe of “envious” members whose chief decides to double the food rations of half of the members of the tribe, leaving unchanged those of the other members. The new allocation would provoke lively protests by the “unchanged” members even though nothing changed for them.
A magnificent illustration of this key aspect of Pareto optimality is the famous Edgeworth box.10 Consider two agents, Albert and Barbara, who have to divide between them unitary quantities of two infinitely divisible goods (for example, a kilogram of flour and a liter of wine). We want to model the problem of division (probably determined by bargaining between them) and to see if, thanks to Pareto optimality, we can say something non-trivial about it.
Each pair x = (x1, x2), with x1 ∈ [0, 1] and x2 ∈ [0, 1], represents a possible allocation of the two goods to one of the two agents. In particular, the Cartesian product [0, 1] × [0, 1] describes them all. The two agents must agree on the allocations (a1, a2) of Albert and (b1, b2) of Barbara. Clearly,

a1 + b1 = a2 + b2 = 1   (2.1)
To complete the description of the problem, we have to specify the desiderata of the two agents. To this end, we suppose that they have identical utility functions ua, ub : [0, 1] × [0, 1] → R that, for simplicity, are of the Cobb-Douglas type ua(x1, x2) = ub(x1, x2) = √(x1x2) (see Example 178). The indifference curves can be “packed” in the following way:
This is the classic Edgeworth box. By condition (2.1), we can think of a point (x1, x2) ∈ [0, 1] × [0, 1] as the allocation of Albert. We can actually identify each possible division between the two agents with the allocation (x1, x2) of Albert. Indeed, the allocation (1 − x1, 1 − x2) of Barbara is uniquely determined once that of Albert is known.

Each allocation (x1, x2) has utility ua(x1, x2) for Albert and ub(1 − x1, 1 − x2) for Barbara. Let
be the set of all the utility profiles of the two agents determined by the divisions of the two goods. We are interested in the allocations whose utility profiles belong to the Pareto frontier

10 Since we will use notions that we will introduce in Chapter 6, the reader may want to read this application after that chapter.
of A, i.e., that are Pareto optima of the set A. Indeed, these are the allocations that cannot be improved upon with a unanimous consensus.

By looking at the Edgeworth box, it is easy to see that the Pareto frontier P of A is given by the values of the allocations on the diagonal of the box, that is, by the locus of the tangency points of the indifference curves (called the contract curve). To prove this rigorously, we need the next simple result.
Since the last inequality is always true, we conclude that (2.2) holds. Moreover, these equivalences imply that

1 − √(x1x2) = √((1 − x1)(1 − x2)) ⟺ (x1 − x2)² = 0
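A numerical spot-check of the lemma's inequality (2.2), ours and not a substitute for the proof:

```python
import itertools
import math

# Check 1 - sqrt(x1*x2) >= sqrt((1 - x1)*(1 - x2)) on a grid in [0, 1]^2,
# with equality exactly when x1 == x2.
grid = [i / 10 for i in range(11)]
for x1, x2 in itertools.product(grid, grid):
    lhs = 1 - math.sqrt(x1 * x2)
    rhs = math.sqrt((1 - x1) * (1 - x2))
    assert lhs >= rhs - 1e-12              # the inequality (2.2)
    if x1 == x2:
        assert math.isclose(lhs, rhs)      # equality on the diagonal
```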
Having established this lemma, we can now prove rigorously what the last picture suggested.

Proof Let D = {(d, d) ∈ R2+ : d ∈ [0, 1]} be the diagonal of the box. We start by showing that, for any division of goods (x1, x2) ∉ D, i.e., with x1 ≠ x2, there exists (d, d) ∈ D such that

(ua(d, d), ub(1 − d, 1 − d)) > (ua(x1, x2), ub(1 − x1, 1 − x2))   (2.3)

For Albert, we have

ua(√(x1x2), √(x1x2)) = √(x1x2) = ua(x1, x2)

Therefore, (√(x1x2), √(x1x2)) is for him indifferent to (x1, x2). By Lemma 57, for Barbara we have

ub(1 − √(x1x2), 1 − √(x1x2)) = 1 − √(x1x2) > √((1 − x1)(1 − x2)) = ub(1 − x1, 1 − x2)

where the inequality is strict since x1 ≠ x2. Therefore, setting d = √(x1x2), (2.3) holds.
It follows that the divisions (x1, x2) outside of the diagonal have utility profiles that are not Pareto optima. It remains to show that the divisions on the diagonal are. Let (d, d) ∈ D and suppose, by contradiction, that there exists (x1, x2) ∈ [0, 1] × [0, 1] such that (2.4) holds, that is,

√(x1x2) > √(d · d) = d and √((1 − x1)(1 − x2)) ≥ √((1 − d)(1 − d)) = 1 − d

Therefore,

1 − √(x1x2) < 1 − d ≤ √((1 − x1)(1 − x2))

which contradicts (2.2). It follows that there is no (x1, x2) ∈ [0, 1] × [0, 1] for which (2.4) holds. This completes the proof.
In sum, if the agents maximize their Cobb-Douglas utilities, the bargaining will result in a division of the goods on the diagonal of the Edgeworth box, i.e., one in which each agent receives an equal quantity of both goods. Proposition 58 does not say anything about which of the points of the diagonal is actually determined by the bargaining, that is, about how the ensuing conflict of interest among the agents is solved. Nevertheless, through the notion of Pareto optimum we have been able to say something highly non-trivial about the problem of division.
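The argument of the proof can be spot-checked numerically. A sketch of ours, assuming the Cobb-Douglas utility √(x1x2) of the text:

```python
import math

# For an off-diagonal division (x1, x2), the diagonal point d = sqrt(x1*x2)
# leaves Albert indifferent while Barbara is strictly better off.
u = lambda x1, x2: math.sqrt(x1 * x2)

for x1, x2 in [(0.9, 0.1), (0.5, 0.2), (0.3, 0.7)]:
    d = math.sqrt(x1 * x2)
    assert math.isclose(u(d, d), u(x1, x2))        # Albert: indifferent
    assert u(1 - d, 1 - d) > u(1 - x1, 1 - x2)     # Barbara: strictly better
```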
11 A similar argument holds when ua(x1, x2) ≥ ua(d, d) and ub(1 − x1, 1 − x2) > ub(1 − d, 1 − d).
Chapter 3
Linear structure
In this chapter we study in more depth the linear structure of Rn, which was introduced in Section 2.2. The study of this fundamental structure of Rn, which we will continue in Chapter 13 on linear functions, is part of linear algebra. The theory of finance is a fundamental application of linear algebra, as we will see in Section 19.5.
(i) x + y ∈ V if x, y ∈ V;

(ii) αx ∈ V if x ∈ V and α ∈ R.
We leave to the reader the easy check that the two operations satisfy in V properties (v1)-(v8). In this regard, it is important to note that by (ii) the origin belongs to each vector subspace V, i.e., 0 ∈ V, because 0x = 0 for every vector x ∈ V.
The following characterization is useful when one needs to check whether a subset of Rn is a vector subspace.

αx + βy ∈ V   (3.1)
Proof “Only if”. Let V be a vector subspace and let x, y ∈ V. As V is closed with respect to scalar multiplication, we have αx ∈ V and βy ∈ V. It follows that αx + βy ∈ V since V is closed with respect to addition.

“If”. Putting α = β = 1 in (3.1), we get x + y ∈ V, while putting β = 0 we get αx ∈ V. Therefore, V is closed with respect to the operations of addition and scalar multiplication inherited from Rn.

Putting α = β = 0, (3.1) implies that 0 ∈ V. This confirms that each vector subspace contains the origin 0.
Example 61 There are two legitimate, yet trivial, subspaces of Rn: the singleton {0} and the space Rn itself. In particular, the reader can check that a singleton {x} is a vector subspace of Rn if and only if x = 0. N
M = {x ∈ Rn : x1 = ⋯ = xm = 0}

αx + βy = (αx1 + βy1, ..., αxn + βyn)
= (0, ..., 0, αx_{m+1} + βy_{m+1}, ..., αxn + βyn) ∈ M
In other words, M is the set of the solutions of this system of equations. It is a vector subspace: the reader can check that, given x, y ∈ M and α, β ∈ R, we have αx + βy ∈ M. Performing the computations,3 we find that the vectors

(t, 6t, (10/3)t, (2/3)t)   (3.2)

solve the system for every t ∈ R, so that

M = {(t, 6t, (10/3)t, (2/3)t) : t ∈ R}
If V1 and V2 are two vector subspaces, we can show that also their intersection V1 \ V2 is
a vector subspace. More generally:
Differently from the intersection, the union of vector subspaces is not, in general, a vector subspace, as the next simple example shows.4 Let $V_1$ and $V_2$ be the two axes of R2, i.e., $V_1 = \{x \in \mathbb{R}^2 : x_2 = 0\}$ and $V_2 = \{x \in \mathbb{R}^2 : x_1 = 0\}$. Their union
$$V_1 \cup V_2 = \{x \in \mathbb{R}^2 : x_1 = 0 \text{ or } x_2 = 0\}$$
is not a vector subspace: for instance, $e^1 = (1, 0)$ and $e^2 = (0, 1)$ belong to it, but their sum $e^1 + e^2 = (1, 1)$ does not.
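The failure of closure under addition can also be checked mechanically. The following Python sketch (ours, added for illustration; the membership test `in_union` is not part of the text) verifies that $e^1$ and $e^2$ lie in the union of the two axes while their sum does not:

```python
# V1 ∪ V2 = {x in R^2 : x1 = 0 or x2 = 0}: the union of the two axes of R^2.
def in_union(x):
    x1, x2 = x
    return x1 == 0 or x2 == 0

e1, e2 = (1, 0), (0, 1)
s = (e1[0] + e2[0], e1[1] + e2[1])  # e1 + e2 = (1, 1)

print(in_union(e1), in_union(e2), in_union(s))  # True True False
```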
3.2 Linear independence and dependence

A finite set $\{x^1, \dots, x^m\}$ of vectors of Rn is said to be linearly independent if
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$
implies $\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$. The set $\{x^1, \dots, x^m\}$ is, instead, said to be linearly dependent if it is not linearly independent, i.e.,5 if there exists a set $\{\alpha_1, \dots, \alpha_m\}$ of scalars, not all equal to zero, such that
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$
A first important example is given by the vectors
$$e^1 = (1, 0, 0, \dots, 0), \quad e^2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e^n = (0, 0, \dots, 0, 1)$$
called standard unit vectors or versors of Rn. The set $\{e^1, \dots, e^n\}$ is linearly independent. Indeed,
$$\alpha_1 e^1 + \cdots + \alpha_n e^n = (\alpha_1, \dots, \alpha_n)$$
and so $\alpha_1 e^1 + \cdots + \alpha_n e^n = 0$ implies $\alpha_1 = \cdots = \alpha_n = 0$. N
Example 68 All the sets of vectors $\{x^1, \dots, x^m\}$ of Rn that include the zero vector 0 are linearly dependent. Indeed, without loss of generality, set $x^1 = 0$. Given a set $\{\alpha_1, \dots, \alpha_m\}$ of scalars with $\alpha_1 \neq 0$ and $\alpha_i = 0$ for $i = 2, \dots, m$, we have
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$
which proves the linear dependence of the set $\{x^i\}_{i=1}^m$. N
Example 69 Two vectors $x^1$ and $x^2$ that are linearly dependent are called collinear. This happens if and only if either $x^1 = 0$ or $x^2 = 0$ or there exists $\alpha \neq 0$ such that $x^1 = \alpha x^2$. In other words, if and only if there exist two scalars $\alpha_1$ and $\alpha_2$, at least one of them different from zero, such that $\alpha_1 x^1 = \alpha_2 x^2$. N
For instance, consider in R3 the three vectors $x^1 = (1, 1, 1)$, $x^2 = (3, 1, 5)$, and $x^3 = (9, 1, 25)$. Therefore, $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 = 0$ means
$$\begin{cases} \alpha_1 + 3\alpha_2 + 9\alpha_3 = 0 \\ \alpha_1 + \alpha_2 + \alpha_3 = 0 \\ \alpha_1 + 5\alpha_2 + 25\alpha_3 = 0 \end{cases}$$
which is a system of equations whose unique solution is $(\alpha_1, \alpha_2, \alpha_3) = (0, 0, 0)$: the three vectors are linearly independent. More generally, to check whether k vectors of Rn are linearly independent one solves the analogous homogeneous linear system in the unknowns $\alpha_1, \dots, \alpha_k$. If $(\alpha_1, \dots, \alpha_k) = (0, \dots, 0)$ is the unique solution, then the vectors are linearly independent in Rn. For example, consider in R3 the two vectors $x^1 = (1, 3, 4)$ and $x^2 = (2, 5, 1)$. The system to solve is
$$\begin{cases} \alpha_1 + 2\alpha_2 = 0 \\ 3\alpha_1 + 5\alpha_2 = 0 \\ 4\alpha_1 + \alpha_2 = 0 \end{cases}$$
It has the unique solution $(\alpha_1, \alpha_2) = (0, 0)$, so the two vectors $x^1$ and $x^2$ are linearly independent. N
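Independence of three vectors in R3 can also be confirmed with a determinant, a criterion treated later in the book and used here only as a quick cross-check (the sketch below is ours): the homogeneous system has only the trivial solution exactly when the matrix whose columns are the three vectors has nonzero determinant.

```python
# Linear independence of x1=(1,1,1), x2=(3,1,5), x3=(9,1,25): the matrix
# with these vectors as columns must have nonzero determinant.
def det3(m):
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

A = [[1, 3, 9],   # first components of x1, x2, x3
     [1, 1, 1],   # second components
     [1, 5, 25]]  # third components

print(det3(A))  # -16, nonzero: the three vectors are linearly independent
```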
Consider now in R3 the four vectors $x^1 = (2, 1, 1)$, $x^2 = (-1, -1, -2)$, $x^3 = (2, -2, -2)$, and $x^4 = (2, -4, -10)$. The homogeneous system associated with $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 + \alpha_4 x^4 = 0$ has the solutions
$$(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = \left(-\frac{10}{3}t, -6t, -\frac{2}{3}t, t\right) \tag{3.3}$$
for each $t \in \mathbb{R}$. Therefore, $(0, 0, 0, 0)$ is not the unique solution of the system, and so the vectors $x^1$, $x^2$, $x^3$, and $x^4$ are linearly dependent. Indeed, by setting for example $t = 1$ in (3.3), the set of four numbers
$$(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = \left(-\frac{10}{3}, -6, -\frac{2}{3}, 1\right)$$
is a set of scalars, with at least one different from zero, such that $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 + \alpha_4 x^4 = 0$. N
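The dependence can be verified with exact arithmetic. The four vectors are hard to read in this copy of the manuscript; the Python sketch below (ours) uses $x^1 = (2,1,1)$, $x^2 = (-1,-1,-2)$, $x^3 = (2,-2,-2)$, $x^4 = (2,-4,-10)$, a choice consistent with the coefficient system solved in Section 3.7, and checks that the scalars (3.3) with $t = 1$ give the zero vector:

```python
from fractions import Fraction as F

# Vectors consistent with the coefficient system of Section 3.7 (assumed here).
vectors = [(2, 1, 1), (-1, -1, -2), (2, -2, -2), (2, -4, -10)]
alphas = [F(-10, 3), F(-6), F(-2, 3), F(1)]  # solution (3.3) with t = 1

# Component-wise linear combination a1*x1 + a2*x2 + a3*x3 + a4*x4.
combo = [sum(a * v[j] for a, v in zip(alphas, vectors)) for j in range(3)]
print(combo)  # [Fraction(0, 1), Fraction(0, 1), Fraction(0, 1)]
```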
Proposition 72 The subsets of a linearly independent set are, in turn, linearly independent.
The simple proof is left to the reader, who can also check that if we add vectors to a
linearly dependent set, the set remains linearly dependent.
3.3 Linear combinations

Given a finite set $\{x^1, \dots, x^m\}$ of vectors of Rn, a vector $x \in \mathbb{R}^n$ of the form
$$x = \alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m$$
with $\alpha_1, \dots, \alpha_m \in \mathbb{R}$ is called a linear combination of the vectors $x^1, \dots, x^m$. The scalars $\alpha_i$ are called the coefficients of the linear combination.
Theorem 75 A finite set S of vectors of Rn, with $S \neq \{0\}$, is linearly dependent if and only if at least one element of S is a linear combination of other elements of S.6
Proof “Only if”. Let $S = \{x^i\}_{i=1}^m$ be a linearly dependent set of Rn. Let $2 \le k \le m$ be the smallest natural number between 2 and m such that the set $\{x^1, \dots, x^k\}$ is linearly dependent. At worst, k is equal to m since by hypothesis $\{x^i\}_{i=1}^m$ is linearly dependent. By the definition of linear dependence, there exist k scalars $\{\alpha_i\}_{i=1}^k$, with at least one different from zero, such that
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_k x^k = 0$$
We have $\alpha_k \neq 0$, because otherwise $\{x^1, \dots, x^{k-1}\}$ would be a linearly dependent set, contradicting the fact that k is the smallest natural number between 2 and m such that $\{x^1, \dots, x^k\}$ is a linearly dependent set. Given that $\alpha_k \neq 0$, we can write
$$x^k = -\frac{\alpha_1}{\alpha_k} x^1 - \frac{\alpha_2}{\alpha_k} x^2 - \cdots - \frac{\alpha_{k-1}}{\alpha_k} x^{k-1}$$
and, therefore, $x^k$ is a linear combination of the vectors $x^1, \dots, x^{k-1}$. In other words, the vector $x^k$ of S is a linear combination of other elements of S.

6 In view of Example 61, the condition $S \neq \{0\}$ amounts to requiring that S is not the singleton $\{0\}$.
“If”. Suppose that the vector $x^k$ of a finite set $S = \{x^i\}_{i=1}^m$ is a linear combination of other elements of S. Without loss of generality, assume $k = 1$. There exists a set $\{\alpha_i\}_{i=2}^m$ of scalars such that $x^1 = \alpha_2 x^2 + \cdots + \alpha_m x^m$. Define the scalars $\{\beta_i\}_{i=1}^m$ as follows:
$$\beta_i = \begin{cases} -1 & i = 1 \\ \alpha_i & i \ge 2 \end{cases}$$
By construction, $\{\beta_i\}_{i=1}^m$ is a set of scalars, with at least one different from zero, such that $\sum_{i=1}^m \beta_i x^i = 0$. Indeed,
$$\sum_{i=1}^m \beta_i x^i = -x^1 + \alpha_2 x^2 + \alpha_3 x^3 + \cdots + \alpha_m x^m = -x^1 + x^1 = 0$$
It follows that $\{x^i\}_{i=1}^m$ is a linearly dependent set.
Example 76 (i) Consider the vectors $x^1 = (1, 3, 4)$, $x^2 = (2, 5, 1)$, and $x^3 = (0, 1, 7)$ in R3. Since $x^3 = 2x^1 - x^2$, the third vector is a linear combination of the other two. By Theorem 75, the set $\{x^1, x^2, x^3\}$ is linearly dependent (in the proof we have k = 3). It is immediate to check that each of the vectors in the set $\{x^1, x^2, x^3\}$ is also a linear combination of the other two, something that, as the next example shows, does not hold in general for sets of linearly dependent vectors.
(ii) Consider the vectors $x^1 = (1, 3, 4)$, $x^2 = (2, 6, 8)$, and $x^3 = (2, 5, 1)$ in R3. Since $x^2 = 2x^1$, the second vector is a multiple (so, a linear combination) of the first vector. By Theorem 75, the set $\{x^1, x^2, x^3\}$ is linearly dependent (in the proof we have k = 2). Note how $x^3$ is not a linear combination of $x^1$ and $x^2$, i.e., there are no $\alpha_1, \alpha_2 \in \mathbb{R}$ such that $x^3 = \alpha_1 x^1 + \alpha_2 x^2$. In conclusion, Theorem 75 ensures that, in a set of linearly dependent vectors, some of them are linear combinations of others, but this is not necessarily the case for all the vectors of the set: it happened for all the vectors in the previous example, but not in this one. N
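Both claims of Example 76 can be checked directly; here is a small Python sketch (ours, for illustration):

```python
# Example 76-(i): check that x3 = 2*x1 - x2 for the three vectors of R^3.
x1, x2, x3 = (1, 3, 4), (2, 5, 1), (0, 1, 7)
print(all(2*a - b == c for a, b, c in zip(x1, x2, x3)))  # True

# Example 76-(ii): (2,5,1) is not a linear combination of (1,3,4) and
# (2,6,8) = 2*(1,3,4), since their span is the line of multiples of (1,3,4)
# and (2,5,1) is not proportional to it: the component ratios differ.
from fractions import Fraction
ratios = {Fraction(b, a) for a, b in zip((1, 3, 4), (2, 5, 1))}
print(len(ratios))  # 3 distinct ratios, so the vectors are not proportional
```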
Corollary 77 A finite set S of Rn is linearly independent if and only if none of the vectors in S is a linear combination of other vectors in S.
3.4 Generated subspaces

Given a subset S of Rn, consider the family $\{V_i\}$ of all the vector subspaces of Rn that contain S. Their intersection $\bigcap_i V_i$ is a vector subspace; it is very important and is called the vector subspace generated (or spanned) by S, denoted by span S. In other words, span S is the smallest “enlargement” of S with the property of being a vector subspace.
The next result shows that span S has a “concrete” representation in terms of linear
combinations of S.
Theorem 78 A vector $x \in \mathbb{R}^n$ belongs to span S if and only if it is a linear combination of finitely many vectors of S.

Proof We need to prove that $x \in \mathbb{R}^n$ belongs to span S if and only if there exist a finite set $\{x^i\}_{i \in I}$ of vectors in S and a finite set $\{\alpha_i\}_{i \in I}$ of scalars such that $x = \sum_{i \in I} \alpha_i x^i$.
“If”. Let $x \in \mathbb{R}^n$ be a linear combination of a finite set $\{x^i\}_{i \in I}$ of vectors of S. For simplicity, set $\{x^i\}_{i \in I} = \{x^1, \dots, x^k\}$. There exists, therefore, a set $\{\alpha_i\}_{i=1}^k$ of real numbers such that $x = \sum_{i=1}^k \alpha_i x^i$. By the definition of a vector subspace, we have $\alpha_1 x^1 + \alpha_2 x^2 \in \operatorname{span} S$ since $x^1, x^2 \in \operatorname{span} S$. In turn, $\alpha_1 x^1 + \alpha_2 x^2 \in \operatorname{span} S$ implies $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 \in \operatorname{span} S$, and by proceeding in this way we get that $x = \sum_{i=1}^k \alpha_i x^i \in \operatorname{span} S$, as claimed.
“Only if”. Let V be the set of all vectors $x \in \mathbb{R}^n$ that can be expressed as linear combinations of vectors of S, that is, $x \in V$ if there exist finite sets $\{x^i\}_{i \in I} \subseteq S$ and $\{\alpha_i\}_{i \in I} \subseteq \mathbb{R}$ such that $x = \sum_{i \in I} \alpha_i x^i$. It is easy to see that V is a vector subspace of Rn that contains S. Since span S is the smallest such subspace, $\operatorname{span} S \subseteq V$: each element of span S is, therefore, a linear combination of vectors of S.
Before illustrating the theorem with some examples, we state a simple consequence.
In words, the vector subspace generated by a set does not change by adding to the set a
vector that is already a linear combination of its elements. The “generative” capability of a
set is not improved by adding to it vectors that are linear combinations of its elements.
3.5 Bases
By Theorem 78, the subspace generated by a subset S of Rn is formed by all the linear
combinations of the vectors in S. Suppose that S is a linearly dependent set. By Theorem
75, some vectors in S are then linear combinations of other elements of S. By Corollary
79, such vectors are, therefore, redundant for the generation of span S. Indeed, if a vector $x \in S$ is a linear combination of other vectors of S, then by Corollary 79 we have
$$\operatorname{span} S = \operatorname{span}(S - \{x\})$$
where $S - \{x\}$ is the set S without the vector x.
A linearly dependent set S thus contains some elements that are redundant for the
generation of span S. This does not happen if, on the contrary, S is a linearly independent
set: by Corollary 77, no vector of S can then be a linear combination of other elements of S.
In other words, when S is linearly independent, all its vectors are essential for the generation
of span S.
These observations lead us to the notion of basis.
A finite set S of vectors of Rn is called a basis of Rn if it is linearly independent and span S = Rn. A basis thus permits to represent each vector of Rn as a linear combination of its elements in such a way that:
(i) every vector of Rn admits such a representation;
(ii) all the vectors of S are essential for this representation, none of them is redundant.
The next result makes this precise.

Theorem 84 A finite set $S = \{x^1, \dots, x^m\}$ of vectors of Rn is a basis of Rn if and only if every $x \in \mathbb{R}^n$ can be written in a unique way as a linear combination of the vectors in S.

Proof “Only if”. Let S be a basis of Rn and suppose that $x \in \mathbb{R}^n$ admits the two representations $x = \sum_{i=1}^m \alpha_i x^i$ and $x = \sum_{i=1}^m \beta_i x^i$. Hence,
$$\sum_{i=1}^m (\alpha_i - \beta_i) x^i = 0$$
and, since the vectors in S are linearly independent, it follows that $\alpha_i - \beta_i = 0$ for every $i = 1, \dots, m$; that is, $\alpha_i = \beta_i$ for every $i = 1, \dots, m$.
“If”. Let $S = \{x^1, \dots, x^m\}$ and suppose that each $x \in \mathbb{R}^n$ can be written in a unique way as a linear combination of vectors in S. Clearly, by Theorem 78 we have $\mathbb{R}^n = \operatorname{span} S$. It remains to prove that S is a linearly independent set. Suppose that the scalars $\{\alpha_i\}_{i=1}^m$ are such that
$$\sum_{i=1}^m \alpha_i x^i = 0$$
Since we also have
$$\sum_{i=1}^m 0 x^i = 0$$
we conclude that $\alpha_i = 0$ for every $i = 1, \dots, m$ because, by hypothesis, the vector 0 can be written in only one way as a linear combination of vectors in S.
For example, with respect to the standard basis $\{e^1, \dots, e^n\}$ each vector $x \in \mathbb{R}^n$ is written as
$$x = \sum_{i=1}^n x_i e^i \tag{3.4}$$
That is, the coefficients of the linear combination are the components of the vector x. N
Example 86 The standard basis of R2 is $\{(1, 0), (0, 1)\}$. But there exist infinitely many other bases of R2: for example, $S = \{(1, 2), (0, 7)\}$ is another such basis. It is easy to prove the linear independence of S. To show that span S = R2, consider any vector $x = (x_1, x_2) \in \mathbb{R}^2$. We need to show that there exist $\alpha_1, \alpha_2 \in \mathbb{R}$ such that $x = \alpha_1 (1, 2) + \alpha_2 (0, 7)$, i.e., such that
$$\begin{cases} \alpha_1 = x_1 \\ 2\alpha_1 + 7\alpha_2 = x_2 \end{cases}$$
Since
$$\alpha_1 = x_1, \qquad \alpha_2 = \frac{x_2 - 2x_1}{7}$$
solve the system, we conclude that S is indeed a basis of R2. N
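The formulas of Example 86 can be verified with exact rational arithmetic; the following Python sketch (ours; the sample vector `x = (5, 3)` is an arbitrary choice) computes the coefficients and reconstructs the vector:

```python
from fractions import Fraction

# Coefficients of x = (x1, x2) in the basis S = {(1,2), (0,7)}:
# a1 = x1, a2 = (x2 - 2*x1)/7, as found in Example 86.
def coefficients(x1, x2):
    a1 = Fraction(x1)
    a2 = Fraction(x2 - 2 * x1, 7)
    return a1, a2

x = (5, 3)
a1, a2 = coefficients(*x)
# Reconstruct x as a1*(1,2) + a2*(0,7).
rebuilt = (a1 * 1 + a2 * 0, a1 * 2 + a2 * 7)
print(rebuilt)  # (Fraction(5, 1), Fraction(3, 1))
```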
Theorem 87 For each linearly independent set $\{x^1, \dots, x^k\}$ of Rn with $k \le n$, there exist $n - k$ vectors $x^{k+1}, \dots, x^n$ such that the overall set $\{x^i\}_{i=1}^n$ is a basis of Rn.
Because of its importance, we give two different proofs of the result. They both require the following lemma.

Lemma 88 Let $\{b^1, b^2, \dots, b^n\}$ be a basis of Rn and let $x \in \mathbb{R}^n$ be such that
$$x = c_1 b^1 + c_2 b^2 + \cdots + c_n b^n$$
with $c_1 \neq 0$. Then $\{x, b^2, \dots, b^n\}$ is also a basis of Rn.

Proof Since $c_1 \neq 0$, we can write $b^1 = (1/c_1)(x - c_2 b^2 - \cdots - c_n b^n)$, so $b^1$, and hence every vector of Rn, is a linear combination of $x, b^2, \dots, b^n$. It follows that
$$\operatorname{span}\{x, b^2, \dots, b^n\} = \operatorname{span}\{b^1, b^2, \dots, b^n\} = \mathbb{R}^n$$
It remains to show that the set $\{x, b^2, \dots, b^n\}$ is linearly independent, so that we can conclude that it is a basis of Rn. Let $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ be coefficients for which
$$\alpha_1 x + \sum_{i=2}^n \alpha_i b^i = 0 \tag{3.5}$$
If $\alpha_1 \neq 0$, we have
$$x = -\sum_{i=2}^n \frac{\alpha_i}{\alpha_1} b^i = 0 b^1 - \sum_{i=2}^n \frac{\alpha_i}{\alpha_1} b^i$$
Since x can be written in a unique way as a linear combination of the vectors of the basis $\{b^i\}_{i=1}^n$, one gets that $c_1 = 0$, which contradicts the hypothesis $c_1 \neq 0$. This means that $\alpha_1 = 0$ and (3.5) simplifies to
$$0 b^1 + \sum_{i=2}^n \alpha_i b^i = 0$$
Since $\{b^2, \dots, b^n\}$ is a linearly independent set, it follows that $\alpha_2 = \cdots = \alpha_n = 0$ as well. Hence, $\{x, b^2, \dots, b^n\}$ is linearly independent, and so it is a basis of Rn.
Proof 1 of Theorem 87 We proceed by induction on the number k of vectors.7 The statement holds for k = 1.8 Suppose now that the statement of the theorem is true for each set of $k - 1$ vectors (induction hypothesis); we want to show that it is true for each set of k vectors. Let therefore $\{x^1, \dots, x^k\}$ be a set of k linearly independent vectors. The subset $\{x^1, \dots, x^{k-1}\}$ is linearly independent and has $k - 1$ elements. By the induction hypothesis, there exist $n - (k - 1)$ vectors $\tilde{y}^k, \dots, \tilde{y}^n$ such that $\{x^1, \dots, x^{k-1}, \tilde{y}^k, \dots, \tilde{y}^n\}$ is a basis of Rn. Therefore, there exist coefficients $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ such that
$$x^k = \sum_{i=1}^{k-1} \alpha_i x^i + \sum_{i=k}^n \alpha_i \tilde{y}^i \tag{3.6}$$
As the vectors $x^1, \dots, x^{k-1}, x^k$ are linearly independent, at least one of the coefficients $\{\alpha_i\}_{i=k}^n$ is different from zero. Otherwise, $x^k = \sum_{i=1}^{k-1} \alpha_i x^i$ and so the vector $x^k$ would be a linear combination of $x^1, \dots, x^{k-1}$, contradicting their linear independence. Without loss of generality (relabeling the vectors $\tilde{y}^i$ if needed), assume $\alpha_k \neq 0$. By Lemma 88, replacing $\tilde{y}^k$ with $x^k$ in the basis $\{x^1, \dots, x^{k-1}, \tilde{y}^k, \dots, \tilde{y}^n\}$ yields again a basis of Rn. Hence, the $n - k$ vectors $\tilde{y}^{k+1}, \dots, \tilde{y}^n$ complete $\{x^1, \dots, x^k\}$ to a basis of Rn, which proves the induction step.
Proof 2 of Theorem 87 The theorem holds for k = 1 (see the previous proof). So, let $1 < k \le n$ be the smallest integer for which the property is false: there exists a linearly independent set $\{x^1, \dots, x^k\}$ such that there are no $n - k$ vectors of Rn that, added to $\{x^1, \dots, x^k\}$, yield a basis of Rn. Given that $\{x^1, \dots, x^{k-1}\}$ is, in turn, linearly independent,

7 See Appendix E for the induction principle.
8 Note that a singleton $\{x\}$ is linearly independent when $\alpha x = 0$ implies $\alpha = 0$, which is equivalent to requiring $x \neq 0$.
the minimality of k implies that there are vectors $\bar{x}^k, \dots, \bar{x}^n$ such that $\{x^1, \dots, x^{k-1}, \bar{x}^k, \dots, \bar{x}^n\}$ is a basis of Rn. But then
$$x^k = c_1 x^1 + \cdots + c_{k-1} x^{k-1} + c_k \bar{x}^k + \cdots + c_n \bar{x}^n$$
for suitable coefficients. Since $x^1, \dots, x^k$ are linearly independent, at least one of the coefficients $c_k, \dots, c_n$ is different from zero; without loss of generality, $c_k \neq 0$. By Lemma 88, replacing $\bar{x}^k$ with $x^k$ shows that $\{x^1, \dots, x^{k-1}, x^k, \bar{x}^{k+1}, \dots, \bar{x}^n\}$ is a basis of Rn, a contradiction.
Corollary 89 (i) Every set of n linearly independent vectors of Rn is a basis of Rn. (ii) Every linearly independent set of Rn has at most n elements.

Proof (i) It is enough to set k = n in Theorem 87. (ii) Let $S = \{x^1, \dots, x^k\}$ be a linearly independent set in Rn. We want to show that $k \le n$. By contradiction, suppose $k > n$. Then $\{x^1, \dots, x^n\}$ is in turn a linearly independent set and by point (i) is a basis of Rn. Hence, the vectors $x^{n+1}, \dots, x^k$ are linear combinations of the vectors $x^1, \dots, x^n$, which, by Corollary 77, contradicts the linear independence of the vectors $x^1, \dots, x^k$. Therefore, $k \le n$, which completes the proof.
Example 90 By point (i), any two linearly independent vectors form a basis of R2. Going back to Example 86, it is therefore sufficient to verify that the vectors (1, 2) and (0, 7) are linearly independent to conclude that $S = \{(1, 2), (0, 7)\}$ is a basis of R2. N
Theorem 91 All the bases of Rn have the same number n of elements.

Proof The standard basis shows that Rn has a basis of n elements. By Corollary 89-(ii), every other basis of Rn can have at most n elements. Let $\{x^1, \dots, x^k\}$ be any other basis of Rn. We show that one cannot have $k < n$, and so conclude that $k = n$. Suppose that $k < n$. By Theorem 87, there exist $n - k$ vectors $x^{k+1}, \dots, x^n$ such that the set $\{x^1, \dots, x^k, x^{k+1}, \dots, x^n\}$ is a basis of Rn. This, however, contradicts the assumption that $\{x^1, \dots, x^k\}$ is a basis of Rn, because the vectors $x^{k+1}, \dots, x^n$ are not linear combinations of the vectors $x^1, \dots, x^k$: $\{x^1, \dots, x^n\}$ is a linearly independent set. Therefore, $k = n$.
3.6 Bases of subspaces

A basis of a vector subspace V of Rn is a linearly independent set $S \subseteq V$ such that span S = V. The bases of vector subspaces thus also permit to represent, without redundancies, each vector of the subspace as a linear combination of the elements of the basis.
The results of the previous section continue to hold.9 We start with Theorem 84.
Theorem 96 All bases of a vector subspace of Rn have the same number of elements.
Although in view of Theorem 91 the result is not surprising, it remains of great elegance because it shows how, despite their diversity, the bases share a fundamental characteristic: their cardinality. This motivates the next definition, which was implicit in the discussion that followed Theorem 91.
The dimension of a vector subspace V of Rn is the number of elements of any basis of V. By Theorem 96, this number is well defined (it does not depend on the particular basis chosen) and is denoted by dim V. It is the notion of dimension that, indeed, makes this (otherwise routine) section interesting, as the next examples show.
Example 98 In the special case V = Rn we have dim Rn = n, which makes rigorous the
discussion that followed Theorem 91. N
9
We leave to the reader the proofs of the results of this section because they are similar to those of the
last section.
Example 99 (i) The horizontal axis is a vector subspace of dimension one of R2. (ii) The plane $M = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 = 0\}$ is a vector subspace of dimension two of R3, that is, dim M = 2. N
Example 100 If $V = \{0\}$, that is, if V is the trivial vector subspace formed only by the origin 0, we set dim V = 0. Indeed, V does not contain linearly independent vectors (why?) and, therefore, it has as basis the empty set $\emptyset$. N
3.7 Post scriptum: some high school algebra

The solutions (3.2) of the homogeneous linear system
$$\begin{cases} 2x_1 - x_2 + 2x_3 + 2x_4 = 0 \\ x_1 - x_2 - 2x_3 - 4x_4 = 0 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases}$$
can be found through a simple high school argument. Consider $x_4$ as a known term and solve the system in $x_1$, $x_2$, and $x_3$; clearly, we will get solutions that depend on the value of the parameter $x_4$:
$$\begin{cases} 2x_1 - x_2 + 2x_3 + 2x_4 = 0 \\ x_1 - x_2 - 2x_3 - 4x_4 = 0 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases} \implies \begin{cases} 2x_1 - x_2 = -2x_3 - 2x_4 \\ x_1 - x_2 = 2x_3 + 4x_4 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases} \implies$$
$$\begin{cases} 2(x_2 + 2x_3 + 4x_4) - x_2 = -2x_3 - 2x_4 \\ x_1 = x_2 + 2x_3 + 4x_4 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases} \implies \begin{cases} x_2 = -6x_3 - 10x_4 \\ x_1 = -4x_3 - 6x_4 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases} \implies$$
$$\begin{cases} x_2 = -6x_3 - 10x_4 \\ x_1 = -4x_3 - 6x_4 \\ (-4x_3 - 6x_4) - 2(-6x_3 - 10x_4) - 2x_3 - 10x_4 = 0 \end{cases} \implies \begin{cases} x_2 = -6x_3 - 10x_4 \\ x_1 = -4x_3 - 6x_4 \\ x_3 = -\frac{2}{3}x_4 \end{cases} \implies$$
$$\begin{cases} x_2 = -6\left(-\frac{2}{3}x_4\right) - 10x_4 \\ x_1 = -4\left(-\frac{2}{3}x_4\right) - 6x_4 \\ x_3 = -\frac{2}{3}x_4 \end{cases} \implies \begin{cases} x_2 = -6x_4 \\ x_1 = -\frac{10}{3}x_4 \\ x_3 = -\frac{2}{3}x_4 \end{cases}$$
In conclusion, setting $x_4 = t$, the vectors of R4 of the form (3.2) are the solutions of the system for every $t \in \mathbb{R}$.
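The elimination above can be double-checked with exact rational arithmetic. The Python sketch below (ours; the signs of the system are restored as minus where the print has lost them) substitutes the general solution back into the three equations:

```python
from fractions import Fraction as F

# General solution found above: (x1, x2, x3, x4) = (-10/3 t, -6 t, -2/3 t, t).
def solution(t):
    t = F(t)
    return (F(-10, 3) * t, -6 * t, F(-2, 3) * t, t)

# Left-hand sides of the three equations of the system.
def residuals(x1, x2, x3, x4):
    return (2*x1 - x2 + 2*x3 + 2*x4,
            x1 - x2 - 2*x3 - 4*x4,
            x1 - 2*x2 - 2*x3 - 10*x4)

for t in (1, 3, F(-5, 2)):
    assert residuals(*solution(t)) == (0, 0, 0)
print("all residuals vanish")
```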
Chapter 4
Euclidean structure
4.1 Absolute value and norm

4.1.1 Inner product

Recall that the inner product of two vectors $x, y \in \mathbb{R}^n$ is the scalar $x \cdot y = \sum_{i=1}^n x_i y_i$; in particular, $x \cdot x = \sum_{i=1}^n x_i^2$. The sum of the squares of the components of a vector is thus the inner product of the vector by itself. This simple observation will be central in this chapter because it will allow us to define the fundamental notion of norm using the inner product. In this regard, note that $x \cdot x = 0$ if and only if $x = 0$: a sum of squares is zero if and only if all addends are zero.

4.1.2 Absolute value

Before studying the norm we introduce the absolute value, which is the scalar version of the norm and probably already familiar to the reader. The absolute value of a scalar $x \in \mathbb{R}$ is
$$|x| = \begin{cases} x & x \ge 0 \\ -x & x < 0 \end{cases}$$
For example, $|5| = |-5| = 5$. Geometrically, the absolute value represents the distance of a scalar from the origin. It satisfies the following elementary properties that the reader can verify:
Property (iv) is called the triangle inequality. Another basic, but important, property of
the absolute value is
$$|x| < c \iff -c < x < c \qquad \forall c > 0 \tag{4.1}$$
as the reader can check.
Recall that we agreed to consider only the positive root $\sqrt{x}$ of a positive scalar x (Section 1.5). For example, $\sqrt{25} = 5$. Formally, this amounts to take
$$\sqrt{x^2} = |x| \qquad \forall x \in \mathbb{R} \tag{4.2}$$
as it is easily checked.
4.1.3 Norm
The notion of norm generalizes that of absolute value to Rn. In particular, the (Euclidean) norm of a vector $x \in \mathbb{R}^n$, denoted by $\|x\|$, is given by
$$\|x\| = (x \cdot x)^{\frac{1}{2}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
When n = 1, the norm reduces to the absolute value; indeed, by (4.2) we have
$$\|x\| = \sqrt{x^2} = |x| \qquad \forall x \in \mathbb{R}$$
For example, if $x = -4$ we have $\|x\| = \sqrt{(-4)^2} = \sqrt{16} = 4 = |-4| = |x|$.
Geometrically, the norm of a vector $x = (x_1, x_2)$ of the plane is the length of the segment that joins it with the origin, that is, it is the distance of the vector from the origin. Indeed, this length is, by Pythagoras’ Theorem, exactly $\|x\| = \sqrt{x_1^2 + x_2^2}$.
(iii) If $x = (a, 2a, -a) \in \mathbb{R}^3$, then $\|x\| = \sqrt{a^2 + (2a)^2 + (-a)^2} = |a|\sqrt{6}$.
(iv) If $x = (2, \pi, \sqrt{2}, 3) \in \mathbb{R}^4$, then
$$\|x\| = \sqrt{2^2 + \pi^2 + (\sqrt{2})^2 + 3^2} = \sqrt{4 + \pi^2 + 2 + 9} = \sqrt{15 + \pi^2}$$
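These norm computations can be confirmed numerically; a quick Python sketch (ours, with `a = -3.5` an arbitrary sample value):

```python
import math

# Norm via the inner product: ||x|| = sqrt(x . x).
def norm(x):
    return math.sqrt(sum(c * c for c in x))

# Example (iii): x = (a, 2a, -a) has norm |a|*sqrt(6).
a = -3.5
x = (a, 2 * a, -a)
print(math.isclose(norm(x), abs(a) * math.sqrt(6)))  # True

# Example (iv): x = (2, pi, sqrt(2), 3) has norm sqrt(15 + pi^2).
y = (2.0, math.pi, math.sqrt(2), 3.0)
print(math.isclose(norm(y), math.sqrt(15 + math.pi ** 2)))  # True
```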
The norm satis…es some elementary properties that extend to Rn those of the absolute
value. The next result gathers the simplest ones.
(i) $\|x\| \ge 0$;
(ii) $\|x\| = 0$ if and only if $x = 0$;
(iii) $\|\alpha x\| = |\alpha| \|x\|$ for every $\alpha \in \mathbb{R}$.
Proof We prove point (ii), leaving the other points to the reader. If $x = 0 = (0, 0, \dots, 0)$, then $\|x\| = \sqrt{0 + 0 + \cdots + 0} = 0$. Vice versa, if $\|x\| = 0$ then
$$x_1^2 + x_2^2 + \cdots + x_n^2 = 0 \tag{4.3}$$
Since $x_i^2 \ge 0$ for each $i = 1, 2, \dots, n$, from (4.3) it follows that $x_i^2 = 0$ for each such i, since a sum of squares is zero if and only if they are all zero. Hence, $x = 0$.
Property (iii) extends the property $|xy| = |x||y|$ of the absolute value. The famous Cauchy-Schwarz inequality is a different, more subtle, extension of such property.
1
Recall that two vectors are collinear if they are linearly dependent (Example 69).
It states that, for every $x, y \in \mathbb{R}^n$,
$$|x \cdot y| \le \|x\| \|y\| \tag{4.4}$$
with equality if and only if the two vectors are collinear.1

Proof For every $t \in \mathbb{R}$ we have
$$0 \le (x + ty) \cdot (x + ty) = (y \cdot y)t^2 + 2(x \cdot y)t + x \cdot x = at^2 + bt + c$$
where $a = y \cdot y$, $b = 2(x \cdot y)$ and $c = x \cdot x$. From high school algebra we know that $at^2 + bt + c \ge 0$ for every t only if the discriminant $\Delta = b^2 - 4ac$ is smaller than or equal to 0. Therefore,
$$4(x \cdot y)^2 - 4\|x\|^2\|y\|^2 \le 0 \tag{4.5}$$
Whence
$$(x \cdot y)^2 \le \|x\|^2 \|y\|^2$$
and, by taking square roots of both sides, we obtain the Cauchy-Schwarz inequality (4.4).
It remains to prove that equality holds if and only if the vectors x and y are collinear.
“Only if”. Assume that (4.4) holds as equality and that $y \neq 0$ (if $y = 0$ the two vectors are trivially collinear). Then, by (4.5), it follows that $\Delta = 0$. Thus, there exists a point $\hat{t}$ where the parabola $at^2 + bt + c$ takes the value 0, i.e.,
$$0 = (x + \hat{t}y) \cdot (x + \hat{t}y) = \|x + \hat{t}y\|^2$$
By Proposition 102, this implies that $x + \hat{t}y = 0$, i.e., $x = -\hat{t}y$.
“If”. If x and y are collinear with $y \neq 0$, then $x = -\hat{t}y$ for some $\hat{t}$. Then, $0 = 0 \cdot 0 = (x + \hat{t}y) \cdot (x + \hat{t}y)$. This implies that the parabola $at^2 + bt + c$, besides being everywhere nonnegative, takes the value 0 at the point $\hat{t}$, and thus its discriminant must be zero. By (4.5), we deduce that (4.4) holds as equality.
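Both the inequality and the equality case for collinear vectors can be checked numerically; a Python sketch (ours, on random sample vectors):

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(5)]
y = [random.uniform(-1, 1) for _ in range(5)]

# Cauchy-Schwarz: |x . y| <= ||x|| * ||y|| (tiny tolerance for rounding).
print(abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12)  # True

# Equality for collinear vectors: take z = -2x.
z = [-2 * a for a in x]
print(math.isclose(abs(dot(x, z)), norm(x) * norm(z)))  # True
```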
The Cauchy-Schwarz inequality allows us to prove the triangle inequality for the norm,
$$\|x + y\| \le \|x\| + \|y\|$$
thereby completing the extension to the norm of the properties (i)-(iv) of the absolute value. Indeed,
$$\|x + y\|^2 = (x + y) \cdot (x + y) = \|x\|^2 + 2(x \cdot y) + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2$$
In terms of components, the Cauchy-Schwarz inequality reads
$$\left|\sum_{i=1}^n x_i y_i\right| \le \left(\sum_{i=1}^n x_i^2\right)^{\frac{1}{2}} \left(\sum_{i=1}^n y_i^2\right)^{\frac{1}{2}}$$
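The triangle inequality, too, can be probed numerically on random vectors; a brief Python sketch (ours):

```python
import math
import random

def norm(x):
    return math.sqrt(sum(c * c for c in x))

random.seed(1)
for _ in range(100):
    x = [random.uniform(-10, 10) for _ in range(4)]
    y = [random.uniform(-10, 10) for _ in range(4)]
    s = [a + b for a, b in zip(x, y)]
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||.
    assert norm(s) <= norm(x) + norm(y) + 1e-9
print("triangle inequality verified on 100 random pairs")
```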
A vector of Rn with norm equal to 1 is called a unit vector. Given any vector $x \neq 0$, the vector
$$v = \frac{1}{\|x\|} x$$
is a unit vector: to “normalize” a vector it is enough to divide it by its own norm. Indeed, we have
$$\left\|\frac{x}{\|x\|}\right\| = \frac{1}{\|x\|} \|x\| = 1 \tag{4.7}$$
where, $\|x\|$ being a scalar, the first equality follows from Proposition 102-(iii).
The unit vectors
$$e^1 = (1, 0, 0, \dots, 0), \quad e^2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e^n = (0, 0, \dots, 0, 1)$$
are the versors of Rn introduced in Chapter 3. To see their special status, note that in R2 they are $e^1 = (1, 0)$ and $e^2 = (0, 1)$ and lie on the horizontal and on the vertical axes, respectively. In particular, $e^1$, $e^2$ and their opposites $-e^1$, $-e^2$ are the unit vectors lying on the axes of R2.
[Figure: the versors $e^1, e^2$ of R2 and their opposites $-e^1, -e^2$ on the unit circle]
4.2 Orthogonality
Through a simple trigonometric analysis, Appendix C.3 shows that two vectors x and y of the plane can be regarded as perpendicular when their inner product is zero, i.e., $x \cdot y = 0$.
This suggests the following de…nition.
Definition 105 Two vectors $x, y \in \mathbb{R}^n$ are said to be orthogonal (or perpendicular), written $x \perp y$, if
$$x \cdot y = 0$$
From the commutativity of the inner product it follows that x?y is equivalent to y?x.
When $x \perp y$, the Pythagorean theorem holds: $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

Proof We have
$$\|x + y\|^2 = (x + y) \cdot (x + y) = \|x\|^2 + 2(x \cdot y) + \|y\|^2 = \|x\|^2 + \|y\|^2$$
as desired.
De…nition 108 A set of vectors of Rn is said to be orthogonal if its elements are pairwise
orthogonal vectors.
The set $\{e^1, \dots, e^n\}$ of the versors is the most classic example of an orthogonal set. Indeed, $e^i \cdot e^j = 0$ for every $1 \le i \neq j \le n$.
Proposition 109 Any orthogonal set that does not contain the zero vector is linearly inde-
pendent.
Proof Let $\{x^1, \dots, x^k\}$ be an orthogonal set that does not contain the zero vector, and let $\{\alpha_j\}_{j=1}^k$ be scalars such that $\sum_{j=1}^k \alpha_j x^j = 0$. Then
$$0 = 0 \cdot 0 = \left(\sum_{j=1}^k \alpha_j x^j\right) \cdot \left(\sum_{i=1}^k \alpha_i x^i\right) = \alpha_1 x^1 \cdot \left(\sum_{i=1}^k \alpha_i x^i\right) + \cdots + \alpha_k x^k \cdot \left(\sum_{i=1}^k \alpha_i x^i\right) = \sum_{i=1}^k \alpha_i^2 \|x^i\|^2$$
because $x^i \cdot x^j = 0$ whenever $i \neq j$. Since $\|x^i\|^2 > 0$ for every i (the set does not contain the zero vector), it follows that $\alpha_i = 0$ for every $i = 1, \dots, k$. The set is, therefore, linearly independent.
An orthogonal set composed of unit vectors is called orthonormal. The set $\{e^1, \dots, e^n\}$ is, for example, orthonormal. In general, given an orthogonal set $\{x^1, \dots, x^k\}$ of non-zero vectors of Rn, the set
$$\left\{\frac{x^1}{\|x^1\|}, \dots, \frac{x^k}{\|x^k\|}\right\}$$
2 In reading this result, recall that a set of vectors containing the zero vector is necessarily linearly dependent (see Example 68).
obtained by dividing each element by its norm is orthonormal. Indeed, by (4.7) each vector $x^i / \|x^i\|$ has norm 1 (so it is a unit vector), and for every $i \neq j$ we have
$$\frac{x^i}{\|x^i\|} \cdot \frac{x^j}{\|x^j\|} = \frac{1}{\|x^i\| \|x^j\|} \left(x^i \cdot x^j\right) = 0$$
For example, consider in R3 the orthogonal vectors
$$x^1 = (1, 1, 1), \quad x^2 = (-2, 1, 1), \quad x^3 = (0, 1, -1)$$
Then
$$\|x^1\| = \sqrt{3}, \quad \|x^2\| = \sqrt{6}, \quad \|x^3\| = \sqrt{2}$$
By dividing each vector by its norm, we get the orthonormal vectors
$$\frac{x^1}{\|x^1\|} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \quad \frac{x^2}{\|x^2\|} = \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right), \quad \frac{x^3}{\|x^3\|} = \left(0, \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$$
In particular, these three vectors form an orthonormal basis of R3. N
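Orthonormality of the normalized triple can be verified numerically; a Python sketch (ours; the sign pattern of $x^3$ is the one we read in this copy):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return tuple(c / n for c in u)

# The orthogonal triple of the example, divided by the norms.
basis = [normalize(v) for v in ((1, 1, 1), (-2, 1, 1), (0, 1, -1))]

# Gram matrix of pairwise inner products: the identity for an orthonormal set.
gram = [[dot(u, v) for v in basis] for u in basis]
for i in range(3):
    for j in range(3):
        assert math.isclose(gram[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)
print("orthonormal")
```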
The orthonormal bases of Rn, in primis the standard basis $\{e^1, \dots, e^n\}$, are the most important bases of Rn because for them it is easy to determine the coefficients of the linear combinations that represent the vectors of Rn, as the next result shows: if $\{x^1, \dots, x^n\}$ is an orthonormal basis of Rn, then each vector $y \in \mathbb{R}^n$ can be written as
$$y = \sum_{i=1}^n (y \cdot x^i) x^i \tag{4.8}$$
The coefficients $y \cdot x^i$ are called the Fourier coefficients of y (with respect to the given orthonormal basis).
Proof Since $\{x^1, \dots, x^n\}$ is a basis, there exist n scalars $\alpha_1, \alpha_2, \dots, \alpha_n$ such that
$$y = \sum_{i=1}^n \alpha_i x^i$$
Since the basis is orthonormal,
$$x^i \cdot x^j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
Hence, for each $j = 1, \dots, n$,
$$y \cdot x^j = \sum_{i=1}^n \alpha_i \left(x^i \cdot x^j\right) = \alpha_j$$
which proves (4.8).
With respect to the standard basis $\{e^1, \dots, e^n\}$, each vector $y = (y_1, \dots, y_n) \in \mathbb{R}^n$ has the Fourier coefficients $y \cdot e^i = y_i$. In this case, (4.8) thus reduces to (3.4), i.e., to
$$y = \sum_{i=1}^n y_i e^i$$
This way of writing vectors, which plays a key role in many results, is a special case of
the general expression (4.8). In other words, the components of a vector y are its Fourier
coe¢ cients with respect to the standard basis.
For a change, the next example considers an orthonormal basis different from the standard basis.
Let $y = (2, 3, 4)$ and consider the orthonormal basis $\{x^1/\|x^1\|, x^2/\|x^2\|, x^3/\|x^3\|\}$ of the previous example. Then
$$y = \frac{9}{\sqrt{3}}\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right) + \frac{3}{\sqrt{6}}\left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right) - \frac{1}{\sqrt{2}}\left(0, \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$$
Thus, $9/\sqrt{3}$, $3/\sqrt{6}$, and $-1/\sqrt{2}$ are the Fourier coefficients of $y = (2, 3, 4)$ with respect to this orthonormal basis of R3. N
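The Fourier coefficients and the reconstruction of y can be checked numerically; a Python sketch (ours, using the same orthogonal triple as above):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return tuple(c / n for c in u)

basis = [normalize(v) for v in ((1, 1, 1), (-2, 1, 1), (0, 1, -1))]
y = (2, 3, 4)

# Fourier coefficients y . x^i with respect to the orthonormal basis.
coeffs = [dot(y, b) for b in basis]
expected = (9 / math.sqrt(3), 3 / math.sqrt(6), -1 / math.sqrt(2))
print([math.isclose(c, t) for c, t in zip(coeffs, expected)])  # [True, True, True]

# The coefficients reconstruct y exactly, as (4.8) asserts.
rebuilt = [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(3)]
print(all(math.isclose(r, t) for r, t in zip(rebuilt, y)))  # True
```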
The Pythagorean theorem extends to any orthogonal set $\{x^1, \dots, x^k\}$ of Rn:
$$\left\|\sum_{i=1}^k x^i\right\|^2 = \sum_{i=1}^k \|x^i\|^2$$

Proof We proceed by induction on k, the case $k = 1$ being trivial. Suppose the identity holds for $k - 1$ vectors and set $y = \sum_{i=1}^{k-1} x^i$, so that $y \cdot x^k = \sum_{i=1}^{k-1} x^i \cdot x^k = 0$, i.e., $y \perp x^k$. Hence, by the Pythagorean theorem for two vectors and the induction hypothesis,
$$\left\|\sum_{i=1}^k x^i\right\|^2 = \left\|y + x^k\right\|^2 = \|y\|^2 + \|x^k\|^2 = \sum_{i=1}^{k-1} \|x^i\|^2 + \|x^k\|^2 = \sum_{i=1}^k \|x^i\|^2$$
as desired.
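The identity can be verified on a concrete orthogonal set with integer components; a Python sketch (ours):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# An orthogonal set in R^3: pairwise inner products are zero.
xs = [(1, 1, 1), (-2, 1, 1), (0, 1, -1)]
assert all(dot(xs[i], xs[j]) == 0 for i in range(3) for j in range(3) if i != j)

s = tuple(sum(v[j] for v in xs) for j in range(3))  # x1 + x2 + x3
lhs = dot(s, s)                       # ||x1 + x2 + x3||^2
rhs = sum(dot(v, v) for v in xs)      # ||x1||^2 + ||x2||^2 + ||x3||^2
print(lhs, rhs)  # 11 11
```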
Chapter 5
Topological structure
In this chapter we introduce the fundamental notion of distance between points of Rn that,
by formalizing the notion of “proximity”, endows Rn with a topological structure.
5.1 Distances
The norm, studied in Section 4.1, allows us to define a distance in Rn. We start with n = 1,
when the norm is simply the absolute value jxj. Consider two points x and y on the real
line, with x > y:
The distance between the two points is x y, which is the length of the segment that joins
them. On the other hand, if we take any two points x and y on the real line, without knowing
their order (i.e., whether $x \ge y$ or $x \le y$), the distance becomes $|x - y|$. Indeed,
$$|x - y| = \begin{cases} x - y & \text{if } x \ge y \\ y - x & \text{if } x < y \end{cases}$$
and so the absolute value of the difference represents the distance between the two points, independently of their order. In symbols, we write
$$d(x, y) = |x - y| \qquad \forall x, y \in \mathbb{R}$$
In particular, d (0; x) = jxj and therefore the absolute value – i.e., the norm – of a point
x 2 R can be regarded as its distance from the origin.
Let us now consider n = 2. Take two vectors x = (x1 ; x2 ) and y = (y1 ; y2 ) in the plane:
The distance between x and y is given by the length of the segment that joins them (in
boldface in the figure). By Pythagoras’ Theorem, this distance is
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} \tag{5.1}$$
since it is the hypotenuse of the right triangle whose catheti are the segments that join $x_i$ and $y_i$ for $i = 1, 2$.
The distance (5.1) is nothing but the norm of the vector $x - y$ (and also of $y - x$), i.e.,
$$d(x, y) = \|x - y\|$$
The distance between two vectors in R2 is, therefore, given by the norm of their di¤erence.
It is easy to see that, by applying again Pythagoras’ Theorem, the distance between two vectors x and y in R3 is given by
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$$
Definition 114 The (Euclidean) distance $d(x, y)$ between two vectors x and y in Rn is the norm of their difference, i.e., $d(x, y) = \|x - y\|$.
In particular, $d(x, 0) = \|x\|$: the norm of a vector $x \in \mathbb{R}^n$ can be regarded as its distance from the vector 0 (i.e., the length of the segment that joins 0 and x).
The following proposition collects the basic properties of the distance (we leave the simple
proof to the reader).
(i) $d(x, y) \ge 0$;
(ii) $d(x, y) = 0$ if and only if $x = y$;
(iii) $d(x, y) = d(y, x)$;
(iv) $d(x, y) \le d(x, z) + d(z, y)$ for every $z \in \mathbb{R}^n$.
Properties (i)-(iv) are natural for a notion of distance. Property (i) says that a distance is always a nonnegative quantity, which by (ii) is zero only between vectors that are equal (so, the distance between distinct vectors is always strictly positive). Property (iii) says that
distance is a symmetric notion: in measuring a distance between two vectors, it does not
matter from which vector we take the measurement. Finally, property (iv) is the so-called
triangle inequality: for example, the distance between cities x and y cannot exceed the sum
of the distances between x and any other city z and between z and y: detours cannot reduce
the distance one needs to cover.
For example, in the real line we have
$$d\left(\frac{1}{3}, 1\right) = \left|\frac{1}{3} - 1\right| = \left|-\frac{2}{3}\right| = \frac{2}{3}$$
N
5.2 Neighborhoods
Definition 117 We call neighborhood of center $x_0 \in \mathbb{R}^n$ and radius $\varepsilon > 0$, denoted by $B_\varepsilon(x_0)$, the set
$$B_\varepsilon(x_0) = \{x \in \mathbb{R}^n : d(x, x_0) < \varepsilon\}$$
The neighborhood B" (x0 ) is, therefore, the locus of the points of Rn that lie at distance
strictly smaller than " from x0 .1
In R the neighborhoods are the open intervals $(x_0 - \varepsilon, x_0 + \varepsilon)$, i.e.,
$$B_\varepsilon(x_0) = \{x \in \mathbb{R} : |x - x_0| < \varepsilon\} = (x_0 - \varepsilon, x_0 + \varepsilon)$$
where we have used (4.1), i.e., $|x| < a \iff -a < x < a$. Hence, in R the neighborhoods are open intervals. It is easily seen that in R2 they are open disks (so, without the bounding circumference), in R3 open balls (so, without the bounding surface), and so on. Indeed, the points that lie at a distance strictly less than $\varepsilon$ from $x_0$ form an open, so “skinless”, ball of center $x_0$. Graphically, in the plane we have:
[Figure: the neighborhood $B_\varepsilon(x_0)$ of a point $x_0$ in the plane: an open ball of center $x_0$ and radius $\varepsilon$]
Next we give some examples of neighborhoods. To ease notation, we write B" (x1 ; ::; xn )
instead of B" ((x1 ; ::; xn )).
(ii) The notations $B_{-1}(0)$ and $B_0(1)$ are meaningless because we need $\varepsilon > 0$.
(iii) We have
$$B_3(0, 0) = B_3(0) = \{x \in \mathbb{R}^2 : d(x, 0) < 3\} = \left\{x \in \mathbb{R}^2 : \sqrt{x_1^2 + x_2^2} < 3\right\}$$
N.B. Each point $x_0$ of Rn has infinitely many neighborhoods $B_\varepsilon(x_0)$, one for each value of the radius $\varepsilon > 0$. O
Sometimes we will use, though only in the real line, “half neighborhoods” of a point x0 .
Speci…cally:
Definition 119 Given $\varepsilon > 0$, the interval $[x_0, x_0 + \varepsilon)$ is called the right neighborhood of $x_0 \in \mathbb{R}$ of radius $\varepsilon$, while the interval $(x_0 - \varepsilon, x_0]$ is called the left neighborhood of $x_0$ of radius $\varepsilon$.
Through them we can give a useful characterization of suprema and infima of subsets of the real line (Section 1.4.2).
Proposition 120 Let A be a nonempty subset of R bounded above. Then $a = \sup A$ if and only if:
(i) $x \le a$ for every $x \in A$;
(ii) for every $\varepsilon > 0$ there exists $x \in A$ such that $x > a - \varepsilon$.

Proof “Only if”. If $a = \sup A$, (i) is obviously satisfied. Let $\varepsilon > 0$. Since $\sup A > a - \varepsilon$, the point $a - \varepsilon$ is not an upper bound of A. Therefore, there exists $x \in A$ such that $x > a - \varepsilon$.
“If”. Suppose that $a \in \mathbb{R}$ satisfies (i) and (ii). By (i), a is an upper bound of A. By (ii), it is also the least upper bound. Indeed, each $b < a$ can be written as $b = a - \varepsilon$, by setting $\varepsilon = a - b > 0$. Given $b < a$, by (ii) there exists $x \in A$ such that $x > a - \varepsilon = b$. Therefore, b is not an upper bound of A, which implies that there is no upper bound smaller than a.
A point $x_0 \in A$ is called an interior point of A if there exists a neighborhood $B_\varepsilon(x_0)$ entirely contained in A, i.e., $B_\varepsilon(x_0) \subseteq A$; it is called an exterior point of A if it is an interior point of the complement $A^c$. The set of the interior points of A is called the interior of A and is denoted by int A. By definition, $\operatorname{int} A \subseteq A$. The set of the exterior points of A is then int $A^c$.
Example 122 Let $A = (0, 1)$. Each point of A is interior, that is, int A = A. Indeed, let $x \in (0, 1)$. Consider the smallest distance of x from the two endpoints 0 and 1 of the interval, i.e., $\min\{d(0, x), d(1, x)\}$. Let $\varepsilon > 0$ be such that $\varepsilon < \min\{d(0, x), d(1, x)\}$. Then
$$B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon) \subseteq (0, 1)$$
Therefore, x is an interior point of A. Since x was arbitrarily chosen, it follows that int A = A. Finally, the set of exterior points is int $A^c = (-\infty, 0) \cup (1, +\infty)$. N
Example 123 Let $A = [0, 1]$. We have int A = (0, 1). Indeed, by proceeding as above we see that the points in (0, 1) are all interior, that is, $(0, 1) \subseteq \operatorname{int} A$. It remains to check the endpoints 0 and 1. Consider 0. Its neighborhoods have the form $(-\varepsilon, \varepsilon)$, so they contain also points of $A^c$. It follows that $0 \notin \operatorname{int} A$. Similarly, $1 \notin \operatorname{int} A$. We conclude that int A = (0, 1). The set of the exterior points is $A^c$, i.e., int $A^c = A^c$ (as the reader can easily verify). N
A point that is neither an interior point nor an exterior point of A is called a boundary point of A. A point $x_0$ is, therefore, a boundary point of A if all its neighborhoods contain both points of A (because it is not exterior) and points of $A^c$ (because it is not interior). The set of the boundary points of a set A is called the boundary or frontier of A and is denoted by $\partial A$. Intuitively, the frontier is the “border” of a set.
Example 125 (i) Let $A = (0, 1)$. Given the residual nature of the definition of boundary points, to determine $\partial A$ we need to find the interior and exterior points. From Example 122, we know that int A = (0, 1) and int $A^c = (-\infty, 0) \cup (1, +\infty)$. It follows that
$$\partial A = \{0, 1\}$$
i.e., the boundary of (0, 1) is formed by the two endpoints 0 and 1. Note that $A \cap \partial A = \emptyset$: in this example the boundary points do not belong to the set A.
(ii) Let $A = [0, 1]$. In Example 123 we have seen that int A = (0, 1) and int $A^c = A^c$. Therefore, $\partial A = \{0, 1\}$. Here $\partial A \subseteq A$: the set A contains its own boundary points.
(iii) Let $A = (0, 1]$. The reader can verify that int A = (0, 1) and int $A^c = (-\infty, 0) \cup (1, +\infty)$. Hence, $\partial A = \{0, 1\}$. In this example, the frontier is partly outside and partly inside the set: the boundary point 1 belongs to A, while the boundary point 0 does not. N
Consider now the closed unit disk $A = \{x \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1\}$. All the points such that $x_1^2 + x_2^2 < 1$ are interior, that is,
$$\operatorname{int} A = \{x \in \mathbb{R}^2 : x_1^2 + x_2^2 < 1\}$$
while all the points such that $x_1^2 + x_2^2 > 1$ are exterior, that is,
$$\operatorname{int} A^c = \{x \in \mathbb{R}^2 : x_1^2 + x_2^2 > 1\}$$
Hence, the boundary is the unit circle: $\partial A = \{x \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$.
Example 127 Let A = Q be the set of rational numbers, so that Ac is the set of the
irrational numbers. By Propositions 18 and 39, between any two rational numbers q < q 0
there exists an irrational number a such that q < a < q 0 and between any two irrational
numbers a < b there exists a rational number q 2 Q such that a < q < b. The reader can
check that this implies int A = int $A^c = \emptyset$, and so $\partial A = \mathbb{R}$. This example shows that the
interpretation of the boundary as a “border” can be misleading in some cases. Indeed, the
mathematical notions have their own life and we must be ready to follow them also when
our intuition may fall short. N
Lemma 128 Let $A \subseteq \mathbb{R}$ be a bounded set. Then $\sup A \in \partial A$ and $\inf A \in \partial A$.

Proof We prove that $\alpha = \sup A \in \partial A$ (the proof for the infimum is similar). Consider any neighborhood $(\alpha - \varepsilon, \alpha + \varepsilon)$ of $\alpha$. We have $(\alpha, \alpha + \varepsilon) \subseteq A^c$, so $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \neq \emptyset$. Moreover, by Proposition 120 for every $\varepsilon > 0$ there exists $x_0 \in A$ such that $x_0 > \alpha - \varepsilon$, so that $(\alpha - \varepsilon, \alpha] \cap A \neq \emptyset$. Thus, $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \neq \emptyset$. We conclude that, for every $\varepsilon > 0$, we have both $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \neq \emptyset$ and $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \neq \emptyset$, that is, $\alpha \in \partial A$.
As the terminology suggests, isolated points are "separated" from the rest of the set.

Example 130 Let A = [0, 1] ∪ {2}. It consists of the closed unit interval and, in addition, of the point 2. This point is isolated. Indeed, if B_ε(2) is a neighborhood of 2 with ε < 1, then A ∩ B_ε(2) = {2}. N
As anticipated, a point x₀ ∈ Rⁿ is a limit point of A if every neighborhood of x₀ contains at least one point of A distinct from x₀ itself. Hence, x₀ is a limit point of A if, for every ε > 0, there exists some x ∈ A such that 0 < ‖x₀ − x‖ < ε. The set of limit points of A is denoted by A′ and is called the derived set of A. Note that limit points are not required to belong to the set.
The definition of limit point requires that its neighborhoods contain at least one point of A other than the point itself. As we next show, they actually contain infinitely many of them.

Proposition 136 Each neighborhood of a limit point of A contains infinitely many points of A.
Proof Let x be a limit point of A. Suppose, by contradiction, that there exists a neighborhood B_ε(x) of x containing only a finite number of points {x₁, ..., xₙ} of A distinct from x. Since the set {x₁, ..., xₙ} is finite, the minimum distance min_{i=1,...,n} d(x, xᵢ) exists and is strictly positive, i.e.,

min_{i=1,...,n} d(x, xᵢ) > 0

Let δ > 0 be such that δ < min_{i=1,...,n} d(x, xᵢ). Clearly, 0 < δ < ε since δ < min_{i=1,...,n} d(x, xᵢ) < ε. Hence, B_δ(x) ⊆ B_ε(x). It is also clear, by construction, that xᵢ ∉ B_δ(x) for each i = 1, 2, ..., n. So, if x ∈ A we have B_δ(x) ∩ A = {x}. Instead, if x ∉ A we have B_δ(x) ∩ A = ∅. Regardless of whether x belongs to A or not, we thus have B_δ(x) ∩ A ⊆ {x}. Therefore, the unique point of A that B_δ(x) may contain is x itself. But this contradicts the hypothesis that x is a limit point of A.
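The proposition can be checked numerically. The following Python sketch (our illustration, not part of the original text; all names are ours) exhibits, for the limit point 0 of A = (0, 1), as many distinct points of A inside B_ε(0) as desired:

```python
def points_of_A_in_neighborhood(eps, how_many):
    """Return `how_many` distinct points of A = (0, 1) lying in B_eps(0).

    The points 1/k with k large enough satisfy 0 < 1/k < eps, so every
    neighborhood of the limit point 0 contains arbitrarily many points of A.
    """
    start = int(1 / eps) + 2  # +2 guards against floating-point rounding of 1/eps
    return [1 / k for k in range(start, start + how_many)]

pts = points_of_A_in_neighborhood(eps=0.01, how_many=5)
# all five points are distinct and lie strictly between 0 and 0.01
```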
O.R. The concept of interior point of a set A requires the existence of a neighborhood of the point that is entirely formed by points of A. This means that it is possible to move away, at least a bit, from the point while remaining inside A, i.e., it is possible to go for a "little walk" in any direction without showing a passport. Retracing one's steps, it is then possible to approach the point from any direction while remaining inside A.
The concept of limit point of a set A does not require the point to belong to A but requires, instead, that we can get as close as we want to the point by "jumping" on points of the set (by jumping on river stones, we can get as close as we want to our target through stones that all belong to the set). This idea of approaching a point while remaining within a given set will be crucial to define limits of functions. H
Definition 137 A set A in Rⁿ is called open if all its points are interior, that is, if int A = A.

Thus, a set is open if it does not contain its borders (so it is skinless).
Example 138 The open interval (a, b) is open (whence the name). Indeed, let x ∈ (a, b) and let ε > 0 be such that

ε < min {d(x, a), d(x, b)}

We have B_ε(x) ⊆ (a, b), so x is an interior point of (a, b). Since x was arbitrarily chosen, it follows that (a, b) is open. N
5.4. OPEN AND CLOSED SETS 95
Example 139 The set {x ∈ R² : 0 < x₁² + x₂² < 1} is open. Graphically, it is the open unit ball deprived of both its skin and the origin:
[Figure: the open unit ball without the origin]
Given that the neighborhoods in R are all of the type (a, b), they are all open. The next result shows that, more generally, neighborhoods are open in Rⁿ.

Proof Let B_ε(x₀) be a neighborhood of a point x₀ ∈ Rⁿ. To show that B_ε(x₀) is open, we have to show that all its points are interior. Let x ∈ B_ε(x₀). To prove that x is interior to B_ε(x₀), let

0 < ε′ < ε − d(x, x₀)   (5.2)

Then B_{ε′}(x) ⊆ B_ε(x₀). Indeed, let y ∈ B_{ε′}(x). Then

d(y, x₀) ≤ d(y, x) + d(x, x₀) < ε′ + d(x, x₀) < ε

where the last inequality follows from (5.2). Therefore B_{ε′}(x) ⊆ B_ε(x₀), which completes the proof.
Clearly, A ⊆ Ā. The closure of A is, thus, an "enlargement" of A that includes all its boundary points, that is, the borders. Naturally, the notion of closure becomes relevant when the borders are not already part of A.

Example 142 (i) If A = [0, 1) ⊆ R, then Ā = [0, 1]. (ii) If A = {(x₁, x₂) ∈ R² : x₁² + x₂² ≤ 1} is the closed unit ball, then Ā = A. N
Definition 144 A set A in Rⁿ is called closed if it contains all its boundary points, that is, if Ā = A.

Hence, a set is closed when it includes its border (so it has a skin).

Example 145 (i) The set A = [0, 1) is not closed since Ā ≠ A, while the closed unit ball A = {(x₁, x₂) ∈ R² : x₁² + x₂² ≤ 1} is closed since Ā = A. (ii) The closed interval [a, b] is closed (whence the name). The unbounded intervals (a, +∞) and (−∞, a) are open. The unbounded intervals [a, +∞) and (−∞, a] are closed. (iii) The circumference A = {(x₁, x₂) ∈ R² : x₁² + x₂² = 1} is closed because A = ∂A = A′ = Ā. N
Open and closed sets are dual notions, as the next result shows.

Theorem 146 A set A in Rⁿ is open if and only if its complement Aᶜ is closed.
Proof "Only if". Let A be open. We show that Aᶜ is closed. Let x be a boundary point of Aᶜ, that is, x ∈ ∂Aᶜ. By definition, x is not an interior point of either A or Aᶜ. Hence, x ∉ int A. But A = int A because A is open. Therefore x ∉ A, that is, x ∈ Aᶜ. Since x was an arbitrary element of ∂Aᶜ, it follows that ∂Aᶜ ⊆ Aᶜ. Therefore, the closure of Aᶜ coincides with Aᶜ, which proves that Aᶜ is closed.
Example 147 The finite sets of Rⁿ (in particular, the singletons) are closed. To verify it, let A = {x₁, x₂, ..., xₙ} be a generic finite set. Its complement Aᶜ is open. Indeed, let x ∈ Aᶜ. If ε > 0 is such that

ε < d(x, xᵢ)   for all i = 1, ..., n

then B_ε(x) ⊆ Aᶜ. So, x is an interior point of Aᶜ. Since x was arbitrarily chosen, it follows that Aᶜ is open. As the reader can check, we also have int A = ∅ and ∂A = A. N
Open and closed sets are, therefore, two sides of the same coin: a set is closed (open) if and only if its complement is open (closed). Naturally, there are many sets that are neither open nor closed. Next we give a simple example of such a set.

Example 149 The set A = [0, 1) is neither open nor closed. Indeed, int A = (0, 1) ≠ A and Ā = [0, 1] ≠ A. N
There is a case in which the duality of open and closed sets takes a curious form.

Example 150 The empty set ∅ and the whole space Rⁿ are simultaneously open and closed. By Theorem 146, it is sufficient to show that Rⁿ is both open and closed. But this is obvious. Indeed, Rⁿ is open because, trivially, all its points are interior (all neighborhoods are included in Rⁿ), as well as closed because it trivially coincides with its own closure. It is possible to show that ∅ and Rⁿ are the only sets with such a double personality. N
Let us go back to the notion of closure Ā. The next result shows that the closure can be equivalently obtained by adding to the set A its limit points A′. In other terms, adding the borders turns out to be equivalent to adding the limit points.

Theorem 151 For every set A in Rⁿ, we have Ā = A ∪ ∂A = A ∪ A′.
Proof We need to prove that A ∪ A′ = A ∪ ∂A. We first prove that A ∪ A′ ⊆ A ∪ ∂A. Since A ⊆ A ∪ ∂A, we have to prove that A′ ⊆ A ∪ ∂A. Let x ∈ A′. In view of what we observed after the proof of Lemma 133, x is either an interior or a boundary point, so x ∈ A ∪ ∂A. We conclude that A ∪ A′ ⊆ A ∪ ∂A.
It remains to show that A ∪ ∂A ⊆ A ∪ A′. Since A ⊆ A ∪ A′, we have to prove that ∂A ⊆ A ∪ A′. Let x ∈ ∂A. If x is an isolated point, then by definition x ∈ A. Otherwise, by Lemma 133, x is a limit point of A, that is, x ∈ A′. Hence, x ∈ A ∪ A′. This proves A ∪ ∂A ⊆ A ∪ A′, and so the result.
A corollary of this result is that a set is closed exactly when it contains all its limit points. This sheds further light on the nature of closed sets.

Corollary 152 A set in Rⁿ is closed if and only if it contains all its limit points.

Example 153 The inclusion A′ ⊆ A in this corollary can be strict, in which case the set A \ A′ consists of the isolated points of A. For example, let A = [0, 1] ∪ {−1, 4}. Then A is closed and A′ = [0, 1]. Hence, A′ is strictly included in A and the set A \ A′ = {−1, 4} consists of the isolated points of A. N
The set of interior points int A is, therefore, the largest open set that approximates A "from inside", while the closure Ā is the smallest closed set that approximates A "from outside". The relation (5.4) is, therefore, the best topological sandwich, with an open lower slice and a closed upper slice, that we can have for the set A.
It is now easy to prove an interesting and intuitive property of the boundary of a set: the boundary of any set is closed.

Proof Let A be any set in Rⁿ. Since the points exterior to A are interior to its complement, we have (∂A)ᶜ = int A ∪ int Aᶜ. So, ∂A is closed because int A and int Aᶜ are open and, as we will momentarily see in Theorem 157, a union of open sets is open.
The next result, whose proof is left to the reader, shows that the difference between the closure and the interior of a set is given by its boundary points: Ā \ int A = ∂A.
This result makes rigorous the intuition that open sets are sets without borders (or skinless). Indeed, it implies that A is open if and only if ∂A ∩ A = ∅. On the other hand, by definition, a set is closed if and only if ∂A ⊆ A, that is, when it includes the borders (it has a skin).
It is, however, no longer true for intersections of infinitely many neighborhoods. For example,

⋂_{n=1}^{∞} B_{1/n}(x₀) = ⋂_{n=1}^{∞} (x₀ − 1/n, x₀ + 1/n) = {x₀}   (5.5)
i.e., this intersection reduces to the singleton {x₀}, which is closed (Example 147). Therefore, the intersection of infinitely many neighborhoods might well not be open. To check (5.5), note that a point belongs to the intersection ⋂_{n=1}^{∞} B_{1/n}(x₀) if and only if it belongs to each neighborhood B_{1/n}(x₀). This is true for x₀, so x₀ ∈ ⋂_{n=1}^{∞} B_{1/n}(x₀). This is, however, the unique point that satisfies this property. Indeed, suppose by contradiction that y ≠ x₀ is such that y ∈ ⋂_{n=1}^{∞} B_{1/n}(x₀). Since y ≠ x₀, we have d(x₀, y) > 0. If we take n sufficiently large, in particular if

n > 1 / d(x₀, y)

then its reciprocal 1/n will be sufficiently small so that

0 < 1/n < d(x₀, y)
Therefore, y ∉ B_{1/n}(x₀), which contradicts the assumption that y ∈ ⋂_{n=1}^{∞} B_{1/n}(x₀). It follows that x₀ is the only point in the intersection ⋂_{n=1}^{∞} B_{1/n}(x₀), i.e., (5.5) holds.
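The argument behind (5.5) can be mimicked numerically. In the Python sketch below (our own, with illustrative names), for any y ≠ x₀ we find an n with 1/n < d(x₀, y), so that y falls outside B_{1/n}(x₀), while x₀ itself belongs to every B_{1/n}(x₀):

```python
import math

def excluding_n(x0, y):
    """Smallest n with 1/n < d(x0, y): then y lies outside B_{1/n}(x0)."""
    d = abs(x0 - y)
    assert d > 0  # the argument applies only to y != x0
    return math.floor(1 / d) + 1

x0, y = 0.0, 0.3
n = excluding_n(x0, y)          # here 1/n < d(x0, y) = 0.3
outside = abs(x0 - y) >= 1 / n  # so y is not in B_{1/n}(x0)
```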
More generally, in the case of infinitely many neighborhoods B_{εᵢ}(x₀), if sup_i εᵢ < +∞ we set ε = sup_i εᵢ, so that

⋃_{i=1}^{∞} B_{εᵢ}(x₀) = B_ε(x₀)

For example,

⋃_{n=1}^{∞} B_{1/n}(x₀) = ⋃_{n=1}^{∞} (x₀ − 1/n, x₀ + 1/n) = B₁(x₀)

If instead sup_i εᵢ = +∞, the union is the whole space. For example,

⋃_{n=1}^{∞} B_n(x₀) = ⋃_{n=1}^{∞} (x₀ − n, x₀ + n) = R
Theorem 157 The intersection of a finite family of open sets is open, while the union of any family (finite or not) of open sets is open.
Proof Let A = ⋂_{i=1}^{n} Aᵢ with each Aᵢ open. Each point x ∈ A belongs to all the sets Aᵢ and is interior to all of them (because they are open), i.e., there exist neighborhoods B_{εᵢ}(x) of x such that B_{εᵢ}(x) ⊆ Aᵢ. Put B = ⋂_{i=1}^{n} B_{εᵢ}(x). The set B is still a neighborhood of x, with radius ε = min {ε₁, ..., εₙ}, and B ⊆ Aᵢ for each i. So, B is a neighborhood of x contained in A. Therefore, A is open.
Let A = ⋃_{i∈I} Aᵢ, where i runs over a finite or infinite index set I. Each x ∈ A belongs to at least one of the sets Aᵢ, say to A_j. Since all the sets Aᵢ are open, there exists a neighborhood of x contained in A_j, and so in A. Therefore, x is interior to A and, given the arbitrariness of x, A is open.
By Theorem 146 and by the De Morgan laws, it is easy to prove that dual properties
hold for closed sets.
5.6. COMPACT SETS 101
Corollary 158 The union of a finite family of closed sets is closed, while the intersection of any family (finite or not) of closed sets is closed.
In general, infinite unions of closed sets are not closed: for example, for the closed sets Aₙ = [−1 + 1/n, 1 − 1/n] we have ⋃_{n=1}^{∞} Aₙ = (−1, 1).
Recall that a set A of real numbers is bounded when there exists K > 0 such that

|x| < K   for every x ∈ A

The next definition is the natural extension of this idea to Rⁿ, where the absolute value is replaced by the more general notion of norm.

Definition 159 A set A in Rⁿ is called bounded if there exists K > 0 such that

‖x‖ < K   for every x ∈ A

Recalling that ‖x‖ is the distance d(x, 0) of x from the origin, it is easily seen that a set A is bounded if, for every x ∈ A, we have d(x, 0) < K, i.e., all its points are at distance smaller than K from the origin. So, a set A is bounded if it is contained in a neighborhood B_K(0) of the origin, geometrically if it can be inscribed in a large enough open ball.
Example 160 Neighborhoods B_ε(x₀) and their closures (5.3) are bounded sets: it is sufficient to take K > ‖x₀‖ + ε. In contrast, (a, +∞) is a simple example of an unbounded set (for this reason, it is called an unbounded open interval). N
Proposition 161 A set A is bounded if and only if there exists K > 0 such that, for every x = (x₁, ..., xₙ) ∈ A, we have

|xᵢ| < K   for all i = 1, ..., n

Proof We prove the "if" and leave the converse to the reader. Let x ∈ A. If |xᵢ| < K for all i = 1, ..., n, then xᵢ² < K² for all i = 1, ..., n. So, Σ_{i=1}^{n} xᵢ² < nK². In turn, this implies

‖x‖ = √(Σ_{i=1}^{n} xᵢ²) < √n · K

Since x was arbitrarily chosen in A, by setting K′ = √n · K it follows that ‖x‖ < K′ for each x ∈ A, so A is bounded.
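The inequality chain of the proof is easy to trace numerically. A Python sketch of ours: if every coordinate of x is below K in absolute value, then ‖x‖ stays below the bound K′ = √n · K of the proof.

```python
import math

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in x))

x = [0.9, -0.5, 0.7]          # every coordinate satisfies |x_i| < K = 1
K, n = 1.0, len(x)
bound = math.sqrt(n) * K      # the K' = sqrt(n) * K of the proof
# norm(x) is below the bound, as the proposition guarantees
```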
Using boundedness, we can define a class of closed sets that turns out to be very important for applications.
Definition 162 A set in Rⁿ is called compact if it is closed and bounded.

For example, all closed and bounded intervals of R are compact.⁵ More generally, the closure B̄_ε(x₀) of a neighborhood in Rⁿ is compact. For example, the set

B̄₁(0) = {(x₁, ..., xₙ) ∈ Rⁿ : x₁² + ⋯ + xₙ² ≤ 1}

is compact in Rⁿ. This classic set is called the closed unit ball and generalizes to Rⁿ the notion of closed unit ball that we presented in R² in Section 2.1 (if the inequality is strict, we have the open unit ball, which is instead an open set).
Like closedness, compactness is stable under finite unions and arbitrary intersections, as the reader can check.⁶
Example 163 Finite sets, and so singletons, are compact. Indeed, in Example 147 we showed that they are closed sets. Since they are obviously bounded, they are then compact. N
Example 164 Provided there are no free goods, budget sets are a fundamental example of
compact sets in consumer theory, as Proposition 792 will show. N
Theorem 165 A set C in Rⁿ is closed if and only if it contains the limit of every convergent sequence of its points. That is, C is closed if and only if

{xₙ} ⊆ C and xₙ → x ⟹ x ∈ C   (5.6)
Proof "Only if". Let C be closed and let {xₙ} ⊆ C be a sequence such that xₙ → x. We want to show that x ∈ C. Suppose, by contradiction, that x ∉ C. Since xₙ → x, for every ε > 0 there exists n_ε ≥ 1 such that xₙ ∈ B_ε(x) for every n ≥ n_ε. Therefore, x is a limit point of C, which contradicts x ∉ C because C is closed and so contains all its limit points.
"If". Let C be a set for which property (5.6) holds. By contradiction, suppose C is not closed. Then there exists at least one boundary point x of C that does not belong to C. Since it cannot be isolated (otherwise it would belong to C), by Lemma 133 x is a limit point of C. Each neighborhood B_{1/n}(x) then contains a point of C, call it xₙ. The sequence of such xₙ converges to x ∉ C, contradicting (5.6). Hence, C is closed.
This property is important: a set is closed if and only if "it is closed with respect to the limit operation", that is, if we never leave the set by taking limits of sequences. This is a main reason why in applications sets are often assumed to be closed: otherwise, one could get arbitrarily close to a point x without being able to reach it, a "discontinuity" that applications typically do not feature (it would be like licking the windows of a pastry shop without being able to reach the pastries, close yet unreachable).
⁵ The empty set ∅ is considered a compact set.
⁶ Note that, the empty set being compact, the intersection of two disjoint compact sets is the empty (so, compact) set.
⁷ This section can be skipped at a first reading, and read only after having studied sequences in Chapter 8.
5.7. CLOSURE AND CONVERGENCE 103
Example 166 Consider the closed interval C = [a, b]. We show that it is closed using Theorem 165. Let {xₙ} ⊆ C be such that xₙ → x ∈ R. By Theorem 165, to show that C is closed it is sufficient to show that x ∈ C. Since a ≤ xₙ ≤ b, a simple application of the comparison criterion shows that a ≤ x ≤ b, that is, x ∈ C. N
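Theorem 165 can be illustrated with a small Python sketch (ours, not from the text): the sequence xₙ = 1 − 1/n lies in [0, 1) and converges to 1, which does not belong to [0, 1); this is exactly the failure of closedness of [0, 1) seen through (5.6).

```python
def seq(n):
    """A sequence in [0, 1) converging to 1."""
    return 1 - 1 / n

terms = [seq(n) for n in range(1, 1001)]
limit = 1.0
in_half_open = all(0 <= t < 1 for t in terms)   # every term lies in [0, 1) ...
limit_in_half_open = 0 <= limit < 1             # ... but the limit does not
```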
Functions
In other words, if the shopkeeper buys 10 kg of walnuts he will pay 4 euros per kg, if he buys 20 kg he will pay 3.9 euros per kg, and so on (as is often the case, the dealer offers quantity discounts: the higher the quantity purchased, the lower the unit price).
The table is an example of a supply function that associates to each quantity the corresponding selling price, where A = {10, 20, 30, 40} is the set of quantities and B = {4, 3.9, 3.8, 3.7} is the set of their unit prices. The supply function is a rule that, to each element of the set A, associates an element of the set B.
In general, we have:
Definition 168 Given any two sets A and B, a function defined on A and with values in B, denoted by f : A → B, is a rule that associates to each element of the set A one, and only one, element of the set B.
We write

b = f(a)

to indicate that, to the element a ∈ A, the function f associates the element b ∈ B. Graphically:

[Figure: diagram omitted]
106 CHAPTER 6. FUNCTIONS
The rule can be completely arbitrary; what matters is that it associates to each element a of A only one element b of B.¹ The arbitrariness of the rule is the key feature of the notion of function. It is one of the fundamental ideas of mathematics, key for applications, and it was fully understood only relatively recently: the notion of function that we just presented was introduced in 1829 by Dirichlet after about 150 years of discussions (the first ideas on the subject go back at least to Leibniz at the end of the seventeenth century).
Note that it is perfectly legitimate for the same element of B to be associated to two (or more) different elements of A (legitimate). In contrast, it cannot happen that different elements of B are associated to the same element
¹ We have emphasized in italics the most important words: the rule must hold for each element of A and, to each of them, it must associate only one element of B.
6.1. THE CONCEPT 107
of A (illegitimate).
In terms of the supply function in the initial example, different quantities of walnuts might well have the same unit price (e.g., when there are no quantity discounts), but the same quantity cannot have different unit prices!
Before considering some examples, we introduce a bit of terminology. The two variables a
and b are called the independent variable and the dependent variable, respectively. Moreover,
the set A is called the domain of the function, while the set B is its codomain.
The codomain is the set in which the function takes its values, but it does not necessarily contain only such values: it might well be larger. In this respect, the next notion is important: given a ∈ A, the element f(a) ∈ B is called the image of a. Given any subset C of the domain A, the set
f(C) = {f(x) : x ∈ C} ⊆ B   (6.1)
of the images of the points in C is called the image of C. In particular, the set f(A) of all the images of points of the domain is called the image (or range) of the function f, denoted by Im f. Therefore, Im f is the subset of the codomain formed by the elements that are actually the image of some element of the domain:

Im f = f(A) = {f(x) : x ∈ A} ⊆ B
Note that any set containing Im f is, indeed, a possible codomain for the function: if Im f ⊆ B and Im f ⊆ C, then writing both f : A → B and f : A → C is fine. The choice of codomain is, ultimately, a matter of convenience. For example, throughout this book we will often consider functions that take real values, that is, f(x) ∈ R for each x in the domain of f. In this case, the most convenient choice of codomain is the entire real line, so we will usually write f : A → R.
Example 169 (i) Let A be the set of all countries in the world and B a set containing some colors. The function f : A → B associates to each country the color given to it on a geographic map, so Im f is the set of colors used at least once on the map.
(ii) The rule that associates to each living human being his date of birth is a function f : A → B, where A is the set of human beings and, for example, B is the set of the dates of the last 150 years (a codomain sufficiently large to contain all possible birthdates). N
Example 170 Consider the rule that associates to each positive scalar x both its square roots, that is, {−√x, √x}. For example, it associates to 4 the elements {−2, 2}. This rule does not describe a function f : [0, +∞) → R because, to each element of the domain different from 0, two different elements of the codomain are associated. N
Example 171 The cubic function f : R → R defined by f(x) = x³ is a rule that associates to each scalar its cube. Since each scalar has a unique cube, this rule defines a function.
Graphically:
[Figure: graph of f(x) = x³]
Example 172 The quadratic function f : R → R defined by f(x) = x² associates to each scalar its square. Graphically:

[Figure: the parabola y = x²]

In this case, to two different elements of the domain there may correspond the same element of the codomain: for example, f(1) = f(−1) = 1. N
The clause "is a rule that" is usually omitted, as we will do from now on.
Example 173 The square root function f : [0, +∞) → R defined by f(x) = √x associates to each positive scalar its (arithmetic) square root. The domain is the positive half-line and Im f = [0, +∞). Graphically:
[Figure: graph of f(x) = √x]
N
Example 174 The logarithmic function f : (0, +∞) → R defined by f(x) = logₐ x, with a > 0 and a ≠ 1, associates to each strictly positive scalar its logarithm. Its domain is (0, +∞), while Im f = R. Graphically:
[Figure: graph of f(x) = logₐ x]

N
Example 175 The absolute value function f : R → R defined by f(x) = |x| associates to each scalar its absolute value. This function has domain R, with Im f = [0, +∞). Graphically:
[Figure: graph of f(x) = |x|]

N
Example 176 Let f : R \ {0} → R be defined by f(x) = 1/|x| for every scalar x ≠ 0.
Graphically:
[Figure: graph of f(x) = 1/|x|]
Here the domain is A = R \ {0}, the real line without the origin. Moreover, Im f = (0, +∞). N
Example 177 (i) The function f : R² → R defined by

f(x₁, x₂) = x₁ + x₂   (6.2)

associates to each vector x = (x₁, x₂) ∈ R² the sum of its components.⁴ For every x ∈ R², such a sum is unique, so the rule defines a function with Im f = f(R²) = R.
(ii) The function f : Rⁿ → R defined by

f(x₁, x₂, ..., xₙ) = Σ_{i=1}^{n} xᵢ

generalizes (6.2) to Rⁿ. N

Example 178 (i) The function f : R²₊ → R defined by

f(x₁, x₂) = √(x₁x₂)   (6.3)

associates to each vector x = (x₁, x₂) ∈ R²₊ the square root of the product of its components. For each x ∈ R²₊, this root is unique, so the rule defines a function with Im f = R₊.
(ii) The function f : Rⁿ₊ → R defined by

f(x₁, x₂, ..., xₙ) = Π_{i=1}^{n} xᵢ^{αᵢ}
⁴ To be consistent with the notation adopted for vectors, we should write f((x₁, x₂)). But, to ease notation, we write f(x₁, x₂).
with exponents αᵢ > 0 such that Σ_{i=1}^{n} αᵢ = 1, generalizes to Rⁿ the function of two variables (6.3), which is the special case with n = 2 and α₁ = α₂ = 1/2. It is widely used in economics under the name of Cobb-Douglas function. N
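A direct Python transcription of the Cobb-Douglas function may help (our own sketch; the run-time checks on the exponents mirror the conditions αᵢ > 0 and Σαᵢ = 1):

```python
def cobb_douglas(x, alpha):
    """Cobb-Douglas f(x) = prod_i x_i**alpha_i, with alpha_i > 0 summing to 1."""
    assert all(a > 0 for a in alpha)
    assert abs(sum(alpha) - 1) < 1e-9
    value = 1.0
    for xi, ai in zip(x, alpha):
        value *= xi ** ai
    return value

y = cobb_douglas([4.0, 9.0], [0.5, 0.5])  # the n = 2 special case: sqrt(4 * 9)
```

With α₁ = α₂ = 1/2 this reproduces the square root of the product, the special case (6.3).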
Example 179 (i) Define f : R² → R² by

f(x₁, x₂) = (x₁, x₁x₂)

For example, if (x₁, x₂) = (2, 5), then f(x₁, x₂) = (2, 2 · 5) = (2, 10) ∈ R².
(ii) Define f : R³ → R² by
O.R. Consider the writing

b = f(a)

The names of the variables are altogether irrelevant: we can indifferently write a = f(b), or y = f(x), or s = f(t), or β = f(α), etc.: the names of the variables are just placeholders; what matters is only the sequence of operations (almost always numerical) that lead from a to b = f(a). Writing b = a² + 2a + 1 is exactly the same as writing y = x² + 2x + 1, or s = t² + 2t + 1, or β = α² + 2α + 1. This function is identified by the operations "square, add the double, add 1" that allow us to move from the independent variable to the dependent one. H
We close this introductory section by making rigorous the notion of the graph of a function, until now used intuitively. For the quadratic function f(x) = x², the graph is the parabola
[Figure: the parabola y = x²]
that is, the locus of the points (x, x²) of the plane, as x varies on the real line, which is the domain of the function. For example, the points (−1, 1), (0, 0), and (1, 1) belong to the parabola.
Formally, the graph of a function f : A → B is the set

Gr f = {(x, f(x)) : x ∈ A} ⊆ A × B
Graphically:
[Figure: when A ⊆ R and B ⊆ R, the graph is a curve in the plane]
(ii) When A ⊆ R² and B ⊆ R, the graph is a subset of the three-dimensional space R³, i.e., a surface (without thickness).
6.2 Applications
6.2.1 Static choices
Let us interpret the vectors in Rⁿ₊ as bundles of goods (Section 2.4.1). It is natural to assume that the consumer prefers some bundles to others. For example, it is reasonable to assume that, if x ≥ y (bundle x is "richer" than y), then x is preferred to y. In symbols, we then
6.2. APPLICATIONS 115
write x % y, where the symbol % denotes the consumer's preference (binary) relation over bundles.
In general, we assume that the preference % over the available bundles of goods can be represented by a function u : Rⁿ₊ → R, called utility function, such that

x % y ⟺ u(x) ≥ u(y)   (6.4)

That is, bundle x is preferred to y if and only if it is assigned a higher "utility". The image Im u represents all the levels of utility attainable by the consumer.
Originally, around 1870, the first marginalists, in particular Jevons, Menger, and Walras, interpreted u(x) as the level of physical satisfaction caused by the bundle x. They gave, therefore, a physiological interpretation of utility functions, which quantified the emotions that consumers felt in owning different bundles. It is the so-called cardinalist interpretation of utility functions, which goes back to Jeremy Bentham and to his "pain and pleasure calculus".⁵ Utility functions, besides representing the preference %, are then inherently interesting because they quantify an emotional state of the consumer, i.e., the degree of pleasure determined by the bundles. In addition to the comparison u(x) ≥ u(y), it is also meaningful to compare the differences

u(x) − u(y) ≥ u(z) − u(w)   (6.5)
which indicate that bundle x is more intensely preferred to bundle y than bundle z is to bundle w. Moreover, since u(x) measures the degree of pleasure that the consumer gets from the bundle x, in the cardinalist interpretation it is also legitimate to compare these measures across different consumers, i.e., to make interpersonal comparisons of utility. Such interpersonal comparisons can then be used, for example, to assess the impact of different economic policies on the welfare of economic agents. For instance, we can ask whether a given policy, though making some agents worse off, still increases overall utility across agents.
The cardinalist interpretation came into question at the end of the nineteenth century due to the impossibility of experimentally measuring the physiological states that were assumed to underlie utility functions.⁶ For this reason, with the works of Vilfredo Pareto at the beginning of the twentieth century, developed first by Eugen Slutsky in 1915 and then by John Hicks in the 1930s,⁷ the ordinalist interpretation of utility functions prevailed: more modestly, utility functions are assumed to be a mere numerical representation of the preference % of the consumer. According to this less demanding interpretation, what matters is only that the ranking u(x) ≥ u(y) represents the preference for bundle x over bundle y, that is, x % y. It is no longer of interest to know whether it also represents the, more or less intense, consumers' emotions over the bundles. In other terms, in the ordinalist approach the fundamental notion is the preference %, while the utility function becomes just a numerical representation of it. The comparisons of intensity (6.5), as well as the interpersonal comparisons of utility, no longer have meaning.
⁵ See his Introduction to the Principles of Morals and Legislation, published in 1789.
⁶ Around 1901, the famous mathematician Henri Poincaré wrote to Léon Walras: "I can say that one satisfaction is greater than another, since I prefer one to the other, but I cannot say that the first satisfaction is two or three times greater than the other." Poincaré, with great sensibility, understood a key issue.
⁷ We refer interested readers to Stigler (1950).
At the empirical level, the consumers' preferences % are revealed through their choices among bundles, which are much simpler to observe than emotions or other mental states. The ordinalist interpretation became the mainstream one because, besides the superior empirical content just mentioned, the works of Pareto showed that it is sufficient for developing a powerful consumer theory (cf. Section 18.1.4). So, Occam's razor was a further reason to abandon the earlier cardinalist interpretation. Nevertheless, economists often use cardinalist categories at an intuitive level because of their introspective plausibility.
Be that as it may, through utility functions we can address the problem of a consumer who has to choose a bundle within a given set A of Rⁿ₊. The consumer is guided in this choice by his utility function u : A ⊆ Rⁿ₊ → R; namely, u(x) ≥ u(y) indicates that the consumer prefers the bundle x to the bundle y, or is indifferent between the two.
For example,

u(x) = Σ_{i=1}^{n} xᵢ

is the utility function of a consumer who orders bundles simply according to the sum of the quantities of the different goods that they contain. The classic Cobb-Douglas utility function is

u(x) = Π_{i=1}^{n} xᵢ^{αᵢ}

with exponents αᵢ > 0 such that Σ_{i=1}^{n} αᵢ = 1 (see Example 178). When αᵢ = 1/n for each i, we have

u(x) = Π_{i=1}^{n} xᵢ^{1/n} = (Π_{i=1}^{n} xᵢ)^{1/n}

with bundles being ordered according to the n-th root of the product of the quantities of the different goods that they contain.⁸
We close by considering a producer who has to decide how much output to produce (Section 2.4.1). In this decision the production function f : A ⊆ Rⁿ₊ → R plays a crucial role, in that it describes how much output f(x) is obtained from a vector x ∈ Rⁿ₊ of inputs. For example,

f(x) = (Π_{i=1}^{n} xᵢ)^{1/n}

is the Cobb-Douglas production function, in which the output is equal to the n-th root of the product of the input components.
⁸ Because of its multiplicative form, bundles with at least one zero component xᵢ have zero utility according to the Cobb-Douglas utility function. Since it is not that plausible that the presence of a zero component has such drastic consequences, this utility function is often defined only on Rⁿ₊₊ (as we will also often do).
6.3. GENERAL PROPERTIES 117
A stream of consumption x = (x₁, ..., x_T) over T periods is often evaluated by the discounted sum

U(x) = u₁(x₁) + δu₂(x₂) + ⋯ + δ^{T−1}u_T(x_T) = Σ_{t=1}^{T} δ^{t−1} u_t(x_t)

where δ ∈ (0, 1) is a subjective discount factor that depends on how "patient" the consumer is. The more patient the consumer, that is, the more willing he is to postpone the consumption of a given quantity of the good, the higher the value of δ. In particular, the closer δ gets to 1, the closer we approach the form

U(x) = u₁(x₁) + u₂(x₂) + ⋯ + u_T(x_T) = Σ_{t=1}^{T} u_t(x_t)
in which consumption in each period is evaluated in the same way. In contrast, the closer δ gets to 0, the closer U(x) gets to u₁(x₁): the consumer becomes extremely impatient and gives no importance to future consumption.
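The discounted sum above is immediate to compute. In the Python sketch below (ours; the linear per-period utility uₜ(c) = c is only an illustrative assumption):

```python
def discounted_utility(consumption, delta, u=lambda c: c):
    """U(x) = sum over t = 1, ..., T of delta**(t-1) * u(x_t)."""
    assert 0 < delta < 1
    return sum(delta ** (t - 1) * u(c)
               for t, c in enumerate(consumption, start=1))

x = [10.0, 10.0, 10.0]
patient = discounted_utility(x, delta=0.99)    # close to 10 + 10 + 10
impatient = discounted_utility(x, delta=0.01)  # close to u1(x1) = 10
```

As δ moves toward 1 the three periods count almost equally; as δ moves toward 0 only the first period matters.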
Given an element y of the codomain B, the preimage f⁻¹(y) is the set

f⁻¹(y) = {x ∈ A : f(x) = y}

of the elements of the domain whose image is y. More generally, given any subset D of the codomain B, its preimage f⁻¹(D) is the set⁹

f⁻¹(D) = {x ∈ A : f(x) ∈ D}
Example 181 Consider the function f : A → B that to each (living) person associates his or her date of birth. If y ∈ B is a possible such date, f⁻¹(y) is the set of the persons born on y; in other words, all the persons in f⁻¹(y) have the same age (they form a cohort, in the terminology of demography). N
⁹ For the sake of brevity, we will consider as sets D only intervals and singletons, but similar considerations hold for other types of sets.
118 CHAPTER 6. FUNCTIONS
Note that f⁻¹(a, b) = f⁻¹([0, b)) when a < 0. Indeed, the elements between a and 0 have no preimage. For example, if D = (−1, 2), then f⁻¹(D) = (−√2, √2). Since
$$ f^{-1}(D) = f^{-1}([0, 2)) = f^{-1}(-1, 2) $$
the negative elements of D are irrelevant (as they do not belong to the image of the function). N
$$ f^{-1}(k) = \{ x \in A : f(x) = k \} $$
Example 184 Let f : R² → R be given by f(x₁, x₂) = x₁² + x₂². For every k ≥ 0, the level curve f⁻¹(k) is the locus in R² of equation
$$ x_1^2 + x_2^2 = k $$
¹⁰ To ease notation, we denote the preimage of an open interval (a, b) by f⁻¹(a, b) instead of f⁻¹((a, b)).
That is, it is the circumference with center at the origin and radius √k. Graphically, the level curves can be represented as:

[Figure: level curves of f(x₁, x₂) = x₁² + x₂², concentric circles in the (x₁, x₂) plane]
Two different level curves of the same function cannot have any point in common, that is,
$$ k_1 \neq k_2 \implies f^{-1}(k_1) \cap f^{-1}(k_2) = \emptyset \tag{6.7} $$
Indeed, if there were a point x ∈ A belonging to both the curves of levels k₁ and k₂, we would have f(x) = k₁ and f(x) = k₂ with k₁ ≠ k₂, but this is impossible because, by definition, a function may assume only one value at each point.
Example 185 Let f : A ⊆ R² → R be given by f(x₁, x₂) = √(7x₁² − x₂). For every k ≥ 0, the level curve f⁻¹(k) is the locus in R² of equation √(7x₁² − x₂) = k, that is, x₂ = −k² + 7x₁². It is a parabola that intersects the vertical axis at −k². Graphically:

[Figure: level curves for k = 0, 1, 2, parabolas x₂ = 7x₁² − k² in the (x₁, x₂) plane]
Example 186 The function f given by
$$ f(x_1, x_2) = \sqrt{\frac{x_1^2 + x_2^2}{x_1}} $$
is defined only for x₁ > 0. Its level curves f⁻¹(k) are the loci of equation
$$ \sqrt{\frac{x_1^2 + x_2^2}{x_1}} = k $$
that is, x₁² + x₂² − k²x₁ = 0. Therefore, they are circumferences passing through the origin. Although all such circumferences have the origin as a common point, the “true” level curves are the circumferences without the origin because at (0, 0) the function is not defined. So, they do not actually have any point in common. N
O.R. The equation f(x₁, x₂) = k of a generic level curve of a function f of two variables can be rewritten, in an apparently more complicated form, as the system
$$ \begin{cases} y = f(x_1, x_2) \\ y = k \end{cases} $$
Indeed: (i) the equation y = f(x₁, x₂) represents, in R³, the surface that is the graph of f; (ii) the equation y = k represents a horizontal plane (it contains the points (x₁, x₂, k) ∈ R³, i.e., all the points of “height” k); (iii) the brace “{” geometrically means intersection between the sets defined by the two previous equations.
The curve of level k is, therefore, viewed as the intersection between the surface that represents f and a horizontal plane.
[Figure: the graph of f cut by horizontal planes at different levels]

Hence, the different level curves are obtained by cutting the surface with horizontal planes (at different levels). They represent the edges of the “slices” obtained in this way on the plane (x₁, x₂). H
Indifference curves

We now turn to a classic economic application of level curves. Given a utility function u : A ⊆ Rⁿ₊ → R, its level curves
$$ u^{-1}(k) = \{ x \in A : u(x) = k \} $$
are called indifference curves. So, an indifference curve is formed by all the bundles x ∈ A that have the same utility k, which are therefore indifferent for the consumer. The collection {u⁻¹(k) : k ∈ R} of all the indifference curves is sometimes called indifference map.
Example 187 Consider the Cobb-Douglas utility function u : R²₊ → R given by u(x) = √(x₁x₂). For every k > 0 we have
$$ u^{-1}(k) = \left\{ x \in \mathbb{R}^2_+ : \sqrt{x_1 x_2} = k \right\} = \left\{ x \in \mathbb{R}^2_+ : x_1 x_2 = k^2 \right\} = \left\{ x \in \mathbb{R}^2_+ : x_2 = \frac{k^2}{x_1} \right\} $$
The indifference curves are thus the hyperbolas of equation
$$ x_2 = \frac{k^2}{x_1} $$

[Figure: indifference curves x₂ = k²/x₁ for k = 1, 2, 3]
Introductory economics courses emphasize that indifference curves “do not cross”, i.e., are disjoint: k₁ ≠ k₂ implies u⁻¹(k₁) ∩ u⁻¹(k₂) = ∅. Clearly, this is just a special case of the more general property (6.7) that holds for any family of level curves.
Similarly, the level curves
$$ f^{-1}(k) = \{ x \in A : f(x) = k \} $$
of a production function f : A ⊆ Rⁿ₊ → R are called isoquants. An isoquant is, thus, the set of all the input vectors x ∈ Rⁿ₊ that produce the same output k. The set {f⁻¹(k) : k ∈ R} of all the isoquants is sometimes called isoquant map.
Finally, the level curves
$$ c^{-1}(k) = \{ x \in A : c(x) = k \} $$
of a cost function c : A ⊆ R₊ → R are called isocosts. So, an isocost is the set of all the levels of output x ∈ A that have the same cost k. The set {c⁻¹(k) : k ∈ R} of all the isocosts is sometimes called isocost map.
In sum, indifference curves, isoquants and isocosts are all examples of level curves, whose general properties they inherit. For example, the fact that two distinct level curves have no points in common – property (6.7) – implies the analogous classic property of the indifference curves, as already noted, as well as the property that isoquants and isocosts never intersect.
Definition 188 Given any two functions f and g in R^A, the sum function f + g is the element of R^A such that
$$ (f+g)(x) = f(x) + g(x) \quad \forall x \in A $$
The sum function f + g : A → R is thus constructed by adding, for each element x of the domain A, the images f(x) and g(x) of x under the two functions.
Example 189 Let R^R be the set of all the functions f : R → R. Consider f(x) = x and g(x) = x². The sum function f + g is defined by (f+g)(x) = x + x². N
(i) the difference function (f − g)(x) = f(x) − g(x) for every x ∈ A;
(ii) the product function (fg)(x) = f(x)g(x) for every x ∈ A;
(iii) the quotient function (f/g)(x) = f(x)/g(x) for every x ∈ A, provided g(x) ≠ 0.
We have thus introduced four operations in the set R^A, based on the four basic operations on the real numbers. It is easy to see that these operations inherit the properties of the basic operations. For example, addition is commutative, f + g = g + f, and associative, (f + g) + h = f + (g + h).
N.B. (i) These operations require the functions to have the same domain A. For example, if f(x) = x² and g(x) = √x, the sum f + g is meaningless because, for x < 0, the function g is not defined. (ii) The domain A can be any set: numbers, chairs, or other. Instead, it is key that the codomain is R because it is among real numbers that we are able to perform the four basic operations. O
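The pointwise operations above can be sketched in code (an illustrative sketch; the helper names are ours, not the book's): two functions with a common domain are combined by combining their values at each point.

```python
def fn_sum(f, g):
    # (f + g)(x) = f(x) + g(x)
    return lambda x: f(x) + g(x)

def fn_prod(f, g):
    # (fg)(x) = f(x) g(x)
    return lambda x: f(x) * g(x)

f = lambda x: x        # f(x) = x
g = lambda x: x ** 2   # g(x) = x^2

h = fn_sum(f, g)       # h(x) = x + x^2, as in Example 189
print(h(3))            # 3 + 9 = 12
```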
6.3.3 Composition

Consider two functions f : A → B and g : C → D, with Im f ⊆ C. Take any point x ∈ A. Since Im f ⊆ C, the image f(x) belongs to the domain C of the function g. We can apply the function g to the image f(x), obtaining in such a way the element g(f(x)) of D. Indeed, the function g has as its argument the image f(x) of x. Graphically:
[Figure: composition diagram — x in A is mapped by f to f(x) in Im f ⊆ C, then by g to g(f(x)) in D]
We have, therefore, associated to each element x of the set A the element g(f(x)) of the set D. This rule, called composition, starts with the functions f and g and defines a new function from A to D, denoted by g ∘ f. Formally, the composite function g ∘ f : A → D is defined by
$$ (g \circ f)(x) = g(f(x)) \quad \forall x \in A $$
Note that the inclusion condition Im f ⊆ C is key in making the composition possible. Let us give some examples.
Example 193 If in the previous example we consider g̃ : [1, +∞) → R given by g̃(x) = x − 1, the inclusion condition is satisfied for f ∘ g̃ because Im g̃ = [0, +∞) = R₊. In particular, f ∘ g̃ : [1, +∞) → R is given by √(x − 1). As we will see soon in Section 6.7, the function g̃ is the restriction of g to [1, +∞). N
Example 194 Let A be the set of all citizens of a country, f : A → R the function that to each of them associates his income for this year, and g : R → R the function that to each possible income associates the tax that must be paid. The composite function g ∘ f : A → R establishes the correspondence between each citizen and the tax that he has to pay. For the revenue service (and also for the citizens) such composite function is of great interest. N
To different elements of the domain, an injective f thus associates different elements of the codomain. Graphically:

[Figure: an injective function — distinct elements a₁, a₂ of A mapped to distinct elements b₁, b₂, b₃ of B]
Example 196 A simple example of an injective function is the cubic f(x) = x³. Indeed, two distinct scalars always have distinct cubes, so x ≠ y implies x³ ≠ y³ for all x, y ∈ R. A classic example of a non-injective function is the quadratic f(x) = x²: for instance, to the two distinct points 2 and −2 of R there corresponds the same square, that is, f(2) = f(−2) = 4. N
Equivalently, injectivity can be stated in the contrapositive¹² form
$$ f(x) = f(y) \implies x = y $$
which requires that two elements of the domain that have the same image be equal.
Given any two sets A and B, a function f : A → B is called surjective (or onto) if
$$ \operatorname{Im} f = B $$
that is, if for each element y of B there exists at least one element x of A such that f(x) = y. In other words, a function is surjective if each element of the codomain is the image of at least one point in the domain.
¹² Given two properties p and q, we have p ⟹ q if and only if ¬q ⟹ ¬p (¬ stands for “not”). The implication ¬q ⟹ ¬p is the contrapositive of the original implication p ⟹ q. See Appendix D.
6.4. CLASSES OF FUNCTIONS 127
Example 197 The cubic function f : R → R given by f(x) = x³ is surjective because each y ∈ R is the image of y^{1/3} ∈ R, that is, f(y^{1/3}) = y. On the other hand, the quadratic function f : R → R given by f(x) = x² is not surjective, because no y < 0 is the image of a point of the domain. N
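On finite sample sets these notions can be checked by brute force. A small sketch (helper names are ours; a finite grid can refute injectivity, as with the quadratic above, but cannot prove it for all of R):

```python
def is_injective(f, domain):
    """True if f takes distinct values at distinct points of the domain."""
    images = [f(x) for x in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, codomain):
    """True if every element of the codomain is hit by some domain point."""
    return set(f(x) for x in domain) == set(codomain)

D = [-2, -1, 0, 1, 2]
cube = lambda x: x ** 3
square = lambda x: x ** 2

print(is_injective(cube, D))    # distinct points have distinct cubes
print(is_injective(square, D))  # fails: square(2) == square(-2) == 4
```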
Finally, given any two sets A and B, a function f : A → B is called bijective if it is both injective and surjective. In this case, we can go “back and forth” between the sets A and B by using f: from any x ∈ A we arrive at a unique y = f(x) ∈ B, while from any y ∈ B we go back to a unique x ∈ A such that y = f(x). Graphically:

[Figure: a bijective function — a₁ ↔ b₁, a₂ ↔ b₂, a₃ ↔ b₃]
Through bijective functions we can establish a simple, but interesting, result about finite sets. Here |A| denotes the cardinality of a finite set A, that is, the number of its elements.

Proposition 198 Let A and B be any two finite sets. There exists a bijection f : A → B if and only if |A| = |B|.
Proof “If”. Let |A| = |B| = n and write A = {a₁, a₂, …, aₙ} and B = {b₁, b₂, …, bₙ}. Then define the bijection f : A → B by f(aᵢ) = bᵢ for i = 1, 2, …, n. “Only if”. Let f : A → B be a bijection. By injectivity, we have |A| ≤ |B|. Indeed, to each x ∈ A there corresponds a distinct f(x) ∈ B. On the other hand, by surjectivity we have |B| ≤ |A|. Indeed, for each y ∈ B there exists x ∈ A such that f(x) = y. We conclude that |A| = |B|.
We have both
$$ f^{-1}(f(x)) = x \quad \forall x \in A \tag{6.9} $$
and
$$ f\left(f^{-1}(y)\right) = y \quad \forall y \in \operatorname{Im} f \tag{6.10} $$
Inverse functions go the opposite way of the original ones; they retrace their steps back to the domain: from x ∈ A we arrive at f(x) ∈ B, and we go back with f⁻¹(f(x)) = x. Graphically:
[Figure: f maps x ∈ A to y ∈ B, and f⁻¹ maps y back to x]
It makes sense to talk about the inverse function only for injective functions, which are then called invertible. Indeed, if f were not injective, there would be at least two elements of the domain x₁ ≠ x₂ with the same image y = f(x₁) = f(x₂). So, the set of the preimages of y would not be a singleton (because it would contain at least the two elements x₁ and x₂) and the relation f⁻¹ would not be a function.
We actually have f⁻¹ : B → A when the function f is also surjective, and so bijective. In such a case the domain of the inverse is the entire codomain of f.
$$ f^{-1}(y) = \begin{cases} 2y & \text{if } y < 0 \\ \dfrac{y}{3} & \text{if } y \geq 0 \end{cases} $$
Example 202 Let f : R ∖ {0} → R be defined by f(x) = 1/x. From y = 1/x, it follows that x = 1/y, so f⁻¹ : R ∖ {0} → R is given by f⁻¹(y) = 1/y. In this case f = f⁻¹. Note that R ∖ {0} is both the domain of f⁻¹ and the image of f. N
$$ f(x) = \begin{cases} x & \text{if } x \in \mathbb{Q} \\ -x & \text{if } x \notin \mathbb{Q} \end{cases} $$
It is easy to see that, when it exists, the inverse (g ∘ f)⁻¹ of the composite function g ∘ f is
$$ f^{-1} \circ g^{-1} \tag{6.11} $$
That is, it is the composition of the inverse functions, with their places exchanged. Indeed, from y = g(f(x)) we get g⁻¹(y) = f(x) and finally f⁻¹(g⁻¹(y)) = x. An everyday analogy: in dressing, we first put on the underpants, f, and then the pants, g; in undressing, we first take off the pants, g⁻¹, and then the underpants, f⁻¹.
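The identity (6.11) can be verified numerically on a concrete pair of invertible functions (our own choice of f and g, for illustration only):

```python
f = lambda x: x + 1        # f(x) = x + 1,  with inverse f_inv(y) = y - 1
g = lambda x: 3 * x        # g(x) = 3x,     with inverse g_inv(y) = y / 3
f_inv = lambda y: y - 1
g_inv = lambda y: y / 3

gf = lambda x: g(f(x))                 # g ∘ f
gf_inv = lambda y: f_inv(g_inv(y))     # f^{-1} ∘ g^{-1}, as in (6.11)

x = 5.0
y = gf(x)            # (g ∘ f)(5) = 3 * 6 = 18.0
print(gf_inv(y))     # undoes the composition, back to 5.0
```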
O.R. The graph of the inverse function f⁻¹ is the mirror image of the graph of the function f with respect to the 45-degree line. H
Inverses and cryptography The computation of the cube x³ of any scalar x is much easier than the computation of the cube root ∛x: it is much easier to compute 80³ = 512,000 (a few multiplications suffice) than ∛512,000 = 80. In other words, the computation of the cubic function f(x) = x³ is much easier than the computation of its inverse f⁻¹(x) = ∛x. This computational difference increases significantly as we take higher and higher odd powers (for example f(x) = x⁵, f(x) = x⁷ and so on).
Similarly, while the computation of eˣ is fairly easy, that of log x is much harder (before electronic calculators became available, logarithmic tables were used to aid such computations). From a computational viewpoint (in the theoretical world everything works smoothly), the inverse function f⁻¹ may be very difficult to deal with. Injective functions for which the computation of f is easy, while that of the inverse f⁻¹ is complex, are called one-way.¹³
For example, let A = {(p, q) ∈ P × P : p < q} and consider the function f : A ⊆ P × P → N defined by f(p, q) = pq, which associates to each pair of prime numbers p, q ∈ P, with p < q, their product pq. For example, f(2, 3) = 6 and f(11, 13) = 143. By the Fundamental Theorem of Arithmetic, it is an injective function.¹⁴ Given two prime numbers p and q, the computation of their product is a trivial multiplication. Instead, given any natural number n it is quite complex, and may require a long time even for a powerful computer, to determine whether it is the product of two prime numbers. In this regard, the reader may recall the discussion regarding factorization and primality tests from Section 1.3.2 (to experience the difficulty first-hand, the reader may try to check whether the number 4343 is the product of two prime numbers). This makes the computation of the inverse function f⁻¹ very complex, as opposed to the very simple computation of f. For this reason, f is a classic example of a one-way function.

¹³ The notions of “simple” and “complex”, here used qualitatively, can be made more rigorous (as the curious reader may discover in cryptography texts).
¹⁴ But not surjective: for example 4 ∉ Im f because there are no two different prime numbers whose product is 4.
(i) bounded (from) above if its image Im f is a set bounded above in R, i.e., if there exists M ∈ R such that f(x) ≤ M for every x ∈ A;
(ii) bounded (from) below if its image Im f is a set bounded below in R, i.e., if there exists m ∈ R such that f(x) ≥ m for every x ∈ A;
Lemma 204 A function f : A → R is bounded if and only if there exists k > 0 such that
$$ |f(x)| \leq k \quad \forall x \in A \tag{6.12} $$
Proof If f is bounded, there exist m, M ∈ R such that m ≤ f(x) ≤ M. Let k > 0 be such that −k ≤ m ≤ M ≤ k. Then (6.12) holds. Vice versa, suppose that (6.12) holds. By (4.1), which holds also for ≤, we have −k ≤ f(x) ≤ k, so f is bounded both above and below.
By the definition of the supremum, for a scalar M we have f(x) ≤ M for all x ∈ A if and only if sup_{x∈A} f(x) ≤ M.
Similarly, we denote by inf_{x∈A} f(x) the infimum of the image of a function f : A → R bounded below, that is,
$$ \inf_{x \in A} f(x) = \inf (\operatorname{Im} f) $$
By the definition of the infimum, for a scalar m we have f(x) ≥ m for all x ∈ A if and only if inf_{x∈A} f(x) ≥ m.
Clearly, a bounded function f : A → R has both extrema, with
$$ -\infty < \inf_{x \in A} f(x) \leq \sup_{x \in A} f(x) < +\infty $$
In particular, for two scalars m and M we have m ≤ f(x) ≤ M for all x ∈ A if and only if m ≤ inf_{x∈A} f(x) ≤ sup_{x∈A} f(x) ≤ M.
Example 205 For the function (6.13) we have sup_{x∈R} f(x) = 1 and inf_{x∈R} f(x) = −2. For the function f : R ∖ {0} → R given by f(x) = 1/|x|, which is bounded below but not above, one has inf_{x∈R∖{0}} f(x) = 0. N
Monotonic functions on R

We begin by studying scalar functions. A function f : A ⊆ R → R is said to be:

(i) increasing if
$$ x > y \implies f(x) \geq f(y) \quad \forall x, y \in A \tag{6.14} $$
strictly increasing if
$$ x > y \implies f(x) > f(y) \quad \forall x, y \in A \tag{6.15} $$
(ii) decreasing if
$$ x > y \implies f(x) \leq f(y) \quad \forall x, y \in A \tag{6.16} $$
strictly decreasing if
$$ x > y \implies f(x) < f(y) \quad \forall x, y \in A $$
(iii) constant if there is k ∈ R such that
$$ f(x) = k \quad \forall x \in A $$
Note that a function is constant if and only if it is both increasing and decreasing. In
other words, constancy is equivalent to having both monotonicity properties. This is why
we have introduced constancy among the forms of monotonicity. Soon, we will see that in
the multivariable case the relation between constancy and monotonicity is a bit more subtle.
Increasing or decreasing functions are called, generically, monotonic (or monotone). They are called strictly monotonic when they are either strictly increasing or strictly decreasing (two mutually exclusive properties: there are no functions that are both strictly increasing and strictly decreasing). The next result shows that strict monotonicity excludes the possibility that the function is constant on some region of its domain. Formally:
Proposition 207 An increasing function f : A ⊆ R → R is strictly increasing if and only if
$$ f(x) = f(y) \implies x = y \tag{6.17} $$
A similar result holds for strictly decreasing functions. Strictly monotonic functions are therefore injective, and so invertible.¹⁶
Proof “Only if”. Let f be strictly increasing and let f(x) = f(y). Suppose, by contradiction, that x ≠ y, say x > y. By (6.15), we have f(x) > f(y), which contradicts f(x) = f(y). It follows that x = y, as desired.
“If”. Suppose that (6.17) holds and let f be increasing. We prove that it is also strictly increasing. Let x > y. By increasing monotonicity, we have f(x) ≥ f(y), but we cannot have f(x) = f(y) because (6.17) would then imply x = y. Thus f(x) > f(y), as claimed.
Example 208 The functions f : R → R given by f(x) = x and f(x) = x³ are strictly increasing, while the function
$$ f(x) = \begin{cases} x & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases} $$
is increasing, but not strictly increasing, because it is constant for every x < 0. The same is true for the function
$$ f(x) = \begin{cases} x - 1 & \text{if } x \geq 1 \\ 0 & \text{if } -1 < x < 1 \\ x + 1 & \text{if } x \leq -1 \end{cases} \tag{6.18} $$
because it is constant on [−1, 1]. N
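Monotonicity on a finite grid can be checked numerically. A rough sketch (helper names are ours; a grid check can refute strict monotonicity, as for the function (6.18), but cannot prove monotonicity on all of R):

```python
def increasing_on(f, grid):
    """Check x > y implies f(x) >= f(y) on all grid pairs, as in (6.14)."""
    return all(f(x) >= f(y) for x in grid for y in grid if x > y)

def strictly_increasing_on(f, grid):
    """Check x > y implies f(x) > f(y) on all grid pairs, as in (6.15)."""
    return all(f(x) > f(y) for x in grid for y in grid if x > y)

# The function of (6.18): constant on [-1, 1]
def f(x):
    if x >= 1:
        return x - 1
    if x <= -1:
        return x + 1
    return 0

grid = [x / 2 for x in range(-6, 7)]    # -3.0, -2.5, ..., 3.0
print(increasing_on(f, grid))            # True
print(strictly_increasing_on(f, grid))   # False: f is flat on [-1, 1]
```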
Note that in (6.14) we can replace x > y by x ≥ y without any consequence because we have f(x) = f(y) if x = y. Hence, increasing monotonicity is equivalently stated as
$$ x \geq y \implies f(x) \geq f(y) \tag{6.19} $$
Consider now the converse implication
$$ f(x) \geq f(y) \implies x \geq y \tag{6.20} $$
It requires that, to larger values of the image, there correspond larger values of the argument. Clearly, f(x) = f(y) is equivalent to having both f(x) ≥ f(y) and f(y) ≥ f(x), which in turn, by (6.20), imply both x ≥ y and y ≥ x, that is, x = y. Therefore, from (6.20) it follows that
$$ f(x) = f(y) \implies x = y \tag{6.21} $$
In view of Proposition 207, we conclude that an increasing function that also satisfies (6.20) is strictly increasing. The next result shows that the converse is also true, thus establishing an important characterization of strictly increasing functions (a dual result holds for strictly decreasing functions).

Proposition 209 A function f : A ⊆ R → R is strictly increasing if and only if
$$ x \geq y \iff f(x) \geq f(y) \tag{6.22} $$
Momentarily, we will see that this result plays an important role in the ordinalist approach to utility theory.
Proof Thanks to what we have seen above, we just need to prove the “only if” part, i.e., that a strictly increasing function satisfies (6.22). Since a strictly increasing function is increasing, the implication
$$ x \geq y \implies f(x) \geq f(y) $$
is obvious. To prove (6.22) it remains to show that
$$ f(x) \geq f(y) \implies x \geq y $$
Let f(x) ≥ f(y) and suppose, by contradiction, that x < y. Strict increasing monotonicity implies f(x) < f(y), which contradicts f(x) ≥ f(y). So x ≥ y, as desired.
Monotonic functions on Rⁿ

The monotonicity notions seen in the case n = 1 generalize in a natural way to the case of arbitrary n, though some subtle issues arise because of the two peculiarities of the case n ≥ 2, that is, the incompleteness of ≥ and the presence of two notions of strict inequality, > and ≫.
Basic monotonicity is easily generalized: a function f : A ⊆ Rⁿ → R is said to be:

(i) increasing if
$$ x \geq y \implies f(x) \geq f(y) \quad \forall x, y \in A \tag{6.23} $$
(ii) decreasing if
$$ x \geq y \implies f(x) \leq f(y) \quad \forall x, y \in A $$
(iii) constant if there is k ∈ R such that
$$ f(x) = k \quad \forall x \in A $$
This notion of increasing and decreasing function has bite only for vectors x and y that can be compared; vectors x and y that cannot be compared, such as for example (1, 2) and (2, 1) in R², are ignored. As a result, while constant functions are both increasing and decreasing, the converse is no longer true when n ≥ 2, as the next example shows.
Example 210 Let A = {a, a′, b, b′} be a subset of the plane with four elements. Assume that a′ ≥ a and b′ ≥ b are the only comparisons that can be made in A. For instance, a = (−1, 0), a′ = (0, 1), b = (1, −1/2) and b′ = (2, −1/2). The function f : A ⊆ R² → R defined by f(a) = f(a′) = 0 and f(b) = f(b′) = 1 is both increasing and decreasing, but it is not constant. N
More delicate is the generalization to Rⁿ of strict monotonicity because of the two distinct concepts of strict inequality.¹⁷ We say that a function f : A ⊆ Rⁿ → R is:

(i) strictly increasing if
$$ x > y \implies f(x) > f(y) \quad \forall x, y \in A $$
(ii) strongly increasing if it is increasing and
$$ x \gg y \implies f(x) > f(y) \quad \forall x, y \in A \tag{6.24} $$
For these notions we have
$$ \text{strictly increasing} \implies \text{strongly increasing} \implies \text{increasing} \tag{6.25} $$
Proof A strongly increasing function is, by definition, increasing. It remains to prove that strictly increasing implies strongly increasing. Thus, let f be strictly increasing. We need to prove that f is increasing and satisfies (6.24). If x ≥ y, we have x = y or x > y. In the first case f(x) = f(y). In the second case f(x) > f(y), so f(x) ≥ f(y). Thus, f is increasing. Moreover, if x ≫ y then a fortiori we have x > y, and therefore f(x) > f(y). We conclude that f is strongly increasing.
The converses of the previous implications do not hold. An increasing function that, like (6.18), has constant parts is an example of an increasing, but not strongly increasing, function (so, not strictly increasing either¹⁸). Therefore,
$$ \text{increasing} \not\Longrightarrow \text{strongly increasing} $$
Moreover, the next example shows that there exist functions that are strongly but not strictly increasing, that is,
$$ \text{strongly increasing} \not\Longrightarrow \text{strictly increasing} $$
¹⁷ We focus on the increasing case, leaving the decreasing case to the reader.
¹⁸ By the contrapositive of (6.25), a function which is not strongly increasing is not strictly increasing either.
Example 212 The function f : R² → R given by f(x) = min{x₁, x₂} is strongly increasing, but not strictly increasing. For example, x = (1, 2) > y = (1, 1) but f(x) = f(y) = 1. N
N.B. For operators f : Rⁿ → Rᵐ with m > 1 the notions of monotonicity studied for the case m = 1 take on a different meaning, since also the images f(x) and f(y) might not be comparable, that is, neither f(x) ≥ f(y) nor f(y) ≥ f(x) may hold. For example, if f : R² → R² is such that f(0, 1) = (1, 2) and f(3, 4) = (2, 1), the images (1, 2) and (2, 1) are not comparable. A notion of monotonicity suitable for operators f : Rⁿ → Rᵐ when m > 1 will be studied in Section 24.2.2. O
Utility functions

Let u : A → R be a utility function defined on a set A ⊆ Rⁿ₊ of bundles of goods. A transformation f ∘ u : A → R of u, where f : Im u ⊆ R → R, defines a mathematically different but conceptually equivalent utility function provided
$$ a \geq b \iff f(a) \geq f(b) \quad \forall a, b \in \operatorname{Im} u \tag{6.26} $$
Indeed, under this condition the function f ∘ u orders the bundles in the same way as the original utility function u, that is,
$$ u(x) \geq u(y) \iff (f \circ u)(x) \geq (f \circ u)(y) \quad \forall x, y \in A $$
The utility functions u and f ∘ u are thus equivalent because they represent the same underlying preference ≿.
By Proposition 209, the function f satisfies (6.26) if and only if it is strictly increasing. Therefore, f ∘ u is an equivalent utility function if and only if f is strictly increasing. To describe such a fundamental invariance property of utility functions, we say that they are ordinal, that is, unique up to monotonic (strictly increasing) transformations. This property lies at the heart of the ordinalist approach, in which utility functions are regarded as mere numerical representations of the underlying preference ≿, which is the fundamental notion (recall the discussion in Section 6.2.1).
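Ordinal invariance is easy to see numerically: applying a strictly increasing f (here the logarithm, a standard choice) to a utility function leaves the ranking of bundles unchanged. A sketch with our own illustrative bundles:

```python
import math

u = lambda x: (x[0] * x[1]) ** 0.5    # Cobb-Douglas utility on positive bundles
v = lambda x: math.log(u(x))          # f ∘ u with f = log, strictly increasing

bundles = [(1, 4), (2, 2), (3, 1), (4, 4)]

rank_u = sorted(bundles, key=u)       # ranking induced by u
rank_v = sorted(bundles, key=v)       # ranking induced by f ∘ u
print(rank_u == rank_v)               # True: same underlying preference
```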
If u is strictly increasing, i.e.,
$$ x > y \implies u(x) > u(y) $$
it is sufficient to increase the amount of any of the goods to attain a greater utility: “the more of any good is always better”.
If, instead, we want to contemplate the possibility that some goods may actually be useless for the consumer, we only require u to be increasing:
$$ x \geq y \implies u(x) \geq u(y) $$
Indeed, if a good in the bundles is “useless” for the consumer (as wine is for a dry person, or for a drunk one who has already had too much of it), the inequality x > y might be caused by a larger amount of such a good, with all other goods unchanged; it is then reasonable that u(x) = u(y) because the consumer does not get any benefit in passing from y to x. In this case “more of any good can be better or indifferent”.
Finally, the “the more of any good is always better” motto that motivates strict monotonicity can be weakened in the sense of strong monotonicity by assuming that “the more of all the goods is always better”, that is,
$$ x \gg y \implies u(x) > u(y) $$
In this case, there is an increase in utility only when the amounts of all goods increase; it is no longer enough to increase the amount of only some good. Strong monotonicity may reflect a form of complementarity among goods, so that an increase in the amounts of only some of them can be irrelevant for the consumer if the quantities of the other goods remain unchanged. Perfect complementarity à la Leontief is the extreme case, a classic example being pairs of shoes, right and left.²⁰
Example 214 (i) The Cobb-Douglas utility function on Rⁿ₊₊ given by (6.27) is strictly increasing. By (6.25), it is also strongly increasing.
(ii) The Leontief utility function on Rⁿ₊₊ given by
$$ u(x_1, x_2, \ldots, x_n) = \min_{i=1,\ldots,n} x_i $$
¹⁹ Recall that, even if mathematically it can be defined on the entire positive orthant Rⁿ₊, from the economic viewpoint it is on Rⁿ₊₊ that the Cobb-Douglas utility function is interesting (cf. Example 214). The fact that the log-linear utility function can be defined only on Rⁿ₊₊ can be viewed as a further sign that this is, indeed, the proper economic domain of the Cobb-Douglas utility function.
²⁰ It is useless to increase the number of right shoes without increasing, in the same quantity, that of the left shoes (and vice versa).
in which the goods are perfect complements, is strongly increasing. As we saw in Example 212, it is not strictly increasing.
(iii) The reader can check which monotonicity properties hold if we consider the two previous utility functions on the entire positive orthant Rⁿ₊ rather than just on Rⁿ₊₊. N
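The claims of Example 214 about the Leontief function can be spot-checked numerically (a sketch; here “≫” means strictly greater in every component, and the sample vectors are ours):

```python
def leontief(x):
    """Leontief utility u(x) = min_i x_i: the goods are perfect complements."""
    return min(x)

# x > y (one component larger, none smaller) need not raise utility:
x, y = (1, 2), (1, 1)
print(leontief(x), leontief(y))      # 1 1 -> not strictly increasing

# x >> y (every component larger) does raise utility: strong monotonicity
x, y = (2, 3), (1, 1)
print(leontief(x) > leontief(y))     # True
```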
The class of concave and convex functions is of great importance in economics. The concept, which will be fully developed in Chapter 14, is anticipated here in the scalar case. A function f defined on an interval of R is concave if, for every x and y in the interval and every α ∈ [0, 1],
$$ f(\alpha x + (1-\alpha) y) \geq \alpha f(x) + (1-\alpha) f(y) $$
and convex if
$$ f(\alpha x + (1-\alpha) y) \leq \alpha f(x) + (1-\alpha) f(y) $$
Geometrically, a function is concave if the segment (called chord) that joins any two points (x, f(x)) and (y, f(y)) of its graph lies below the graph of the function, while it is convex if the opposite happens, that is, if such a chord lies above the graph of the function.
Note that the domain of concave and convex functions must be an interval, so that the points αx + (1−α)y belong to it and the expression f(αx + (1−α)y) is meaningful.
Example 216 The functions f, g : R → R defined by f(x) = x² and g(x) = eˣ are convex, while the function f : (0, +∞) → R defined by f(x) = log x is concave. The function f : R → R given by f(x) = x³ is neither concave nor convex. All this can be checked analytically through the last definition, but it is best seen graphically:
[Figure: graphs of x² and eˣ (convex), log x (concave), and x³ (neither concave nor convex)]
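The chord definition can also be tested numerically. A sketch (helper names are ours; checking finitely many points can refute convexity, as for x³, but cannot prove it):

```python
def convex_on(f, points, alphas):
    """Check f(ax + (1-a)y) <= a f(x) + (1-a) f(y) on sample points."""
    return all(
        f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12
        for x in points for y in points for a in alphas
    )

pts = [x / 4 for x in range(-12, 13)]     # -3.0, -2.75, ..., 3.0
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]

print(convex_on(lambda x: x * x, pts, alphas))    # True: x^2 is convex
print(convex_on(lambda x: x ** 3, pts, alphas))   # False: x^3 fails for x < 0
```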
The importance of this class of functions is due to their great tractability. The simplest example is f(x) = ∑ᵢ₌₁ⁿ xᵢ, for which the functions gᵢ are the identity, i.e., gᵢ(x) = x for each i. Let us give some more examples.
In other words, they assumed that the utility of a bundle x is decomposable into the utility
of the quantities xi of the various goods that compose it. It is a restrictive assumption that
ignores any possible interdependence, for example of complementarity or substitutability,
among the di¤erent goods of a bundle. Due to its remarkable tractability, however, (6.29)
remained for a long time the standard form of the utility functions until, at the end of the
nineteenth century, the works of Edgeworth and Pareto showed how to develop consumer
theory for utility functions that are not necessarily separable. N
Example 221 If in (6.29) we set uᵢ(xᵢ) = xᵢ for all i, we obtain the important special case
$$ u(x) = \sum_{i=1}^{n} x_i $$
where the goods are perfect substitutes. The utility of a bundle x depends only on the sum of the amounts of the different goods, regardless of the specific amounts of the individual goods. For example, think of x as a bundle of different types of oranges, which differ in origin and taste, but are identical in terms of nutritional values. In this case, if the consumer only cares about such values, then these different types of oranges are perfect substitutes. This case is opposite to that of perfect complementarity that characterizes the Leontief utility function.
More generally, if in (6.29) we set uᵢ(xᵢ) = αᵢxᵢ for all i, with αᵢ > 0, we have
$$ u(x) = \sum_{i=1}^{n} \alpha_i x_i $$
In this case, the goods in the bundle are no longer perfect substitutes; rather, their relevance depends on their weights αᵢ. Therefore, to keep utility constant each good can be replaced with another according to a linear trade-off: intuitively, one unit of good i is equivalent to αᵢ/αⱼ units of good j. The notion of marginal rate of substitution formalizes this idea (Section 25.3.2). N
Lemma 224 Both the exponential function aˣ and the logarithmic function log_a x are increasing if a > 1 and decreasing if 0 < a < 1.

Proof For the exponential function, observe that, when a > 1, also aʰ > 1 for every h > 0. Therefore aˣ⁺ʰ = aˣaʰ > aˣ for every h > 0. For the logarithmic function, after observing that log_a k > 0 if a > 1 and k > 1, for h > 0 we have
$$ \log_a (x+h) = \log_a \left( x \left( 1 + \frac{h}{x} \right) \right) = \log_a x + \log_a \left( 1 + \frac{h}{x} \right) > \log_a x $$
That said, in the sequel we will mostly use Napier's constant e as base and so we will refer to f(x) = eˣ as the exponential function, without further specification (sometimes it is denoted by f(x) = exp x). Thanks to the remarkable properties of the power eˣ (Section 1.5), the exponential function plays a fundamental role in mathematics and in its applications. Its image is (0, +∞) and its graph is:

[Figure: graph of the exponential function eˣ]
The negative exponential function f(x) = e⁻ˣ is also important. Its graph is:
6.5. ELEMENTARY FUNCTIONS ON R 145
[Figure: graph of the negative exponential function e⁻ˣ]
In a similar vein, in view of the special importance of the natural logarithm (Section 1.5), we refer to f(x) = log x as the logarithmic function, without further specification. Like the exponential function f(x) = eˣ, which is its inverse, the logarithmic function f(x) = log x is widely used in applications. Its image is R and its graph is:

[Figure: graph of the logarithmic function log x]
The functions eˣ and log x, being each the inverse of the other, have graphs that are mirror images of each other with respect to the 45-degree line.
Trigonometric functions, and more generally periodic functions, are also important in many applications.²¹

Trigonometric functions

The sine function f : R → R defined by f(x) = sin x is the first example of a trigonometric function. For each x ∈ R we have
$$ \sin (x + 2k\pi) = \sin x \quad \forall k \in \mathbb{Z} $$
²¹ We refer readers to Appendix C for some basic notions of trigonometry.
[Figure: graph of sin x]
The function f : R → R defined by f(x) = cos x is the cosine function. For each x ∈ R we have
$$ \cos (x + 2k\pi) = \cos x \quad \forall k \in \mathbb{Z} $$

[Figure: graph of cos x]
[Figure: graph of tan x]
It is immediate to see that, for x ∈ (0, π/2), we have the sandwich 0 < sin x < x < tan x.
The functions sin x, cos x and tan x are monotonic (so invertible) on, respectively, the intervals [−π/2, π/2], [0, π], and (−π/2, π/2). Their inverse functions are denoted respectively by arcsin x (or sin⁻¹ x), arccos x (or cos⁻¹ x), and arctan x (or tan⁻¹ x).
Specifically, by restricting ourselves to the interval [−π/2, π/2] of strict monotonicity of the function sin x, we have
$$ \sin x : \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right] \to [-1, 1] $$
Therefore, the inverse function of sin x is
$$ \arcsin x : [-1, 1] \to \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right] $$
with graph:

[Figure: graph of arcsin x]
Similarly, by restricting ourselves to the interval [0, π] of strict monotonicity of cos x, we have

cos x : [0, π] → [−1, 1]

Therefore, the inverse function of cos x is

arccos x : [−1, 1] → [0, π]
with graph:
[Graph of arccos x]
Finally, restricting ourselves to (−π/2, π/2), we have

tan x : (−π/2, π/2) → R

whose inverse function is

arctan x : R → (−π/2, π/2)
with graph:
[Graph of arctan x]
Note that (2/π) arctan x is a one-to-one correspondence between the real line and the open interval (−1, 1). As we will learn in the next chapter, this means that the open interval (−1, 1) has the same cardinality as the real line.22
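This one-to-one correspondence can be spot-checked numerically. The sketch below (sample points and function names are our own choices) verifies that g(x) = (2/π) arctan x lands in (−1, 1) and that h(y) = tan(πy/2), obtained by inverting y = (2/π) arctan x, undoes it:

```python
import math

# Numeric sketch: g(x) = (2/pi) * arctan(x) maps R into (-1, 1), and
# h(y) = tan(pi * y / 2) undoes it, so g is one-to-one from R onto (-1, 1).
def g(x):
    return (2 / math.pi) * math.atan(x)

def h(y):
    return math.tan(math.pi * y / 2)

for x in [-1000.0, -3.5, 0.0, 0.25, 7.0, 1e6]:
    y = g(x)
    assert -1 < y < 1                                         # lands in (-1, 1)
    assert math.isclose(h(y), x, rel_tol=1e-6, abs_tol=1e-9)  # round trip back to x
print("round trips succeeded on these samples")
```

A check at a handful of points, of course, not a proof of bijectivity.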
Periodic functions
The smallest (if it exists) among such p > 0 is called the period of f. In particular, the periodic functions sin x and cos x have period 2π, while the periodic function tan x has period π. Their graphs well illustrate the property that characterizes periodic functions, namely that of repeating themselves identically on each interval of width p.
Example 226 The functions sin² x and log tan x are periodic of period π. N
Example 227 The function f : R → R given by f(x) = x − [x] is called mantissa.23 The mantissa of x > 0 is its decimal part; for example f(2.37) = 0.37. The mantissa function is periodic with period 1. Indeed, by (1.19) we have [x + 1] = [x] + 1 for every x ∈ R. So,

f(x + 1) = x + 1 − [x + 1] = x + 1 − [x] − 1 = x − [x] = f(x)

Its graph is:
[Graph of the mantissa function]
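The period-1 property can be checked directly; the sketch below uses math.floor for the integer part [x] (the greatest integer ≤ x, as recalled in the footnote):

```python
import math

# The mantissa function f(x) = x - [x], with [x] the integer part (floor).
# Check f(2.37) ~ 0.37 and the period-1 property f(x + 1) = f(x).
def mantissa(x):
    return x - math.floor(x)

print(round(mantissa(2.37), 10))   # 0.37
for x in [2.37, 0.5, -1.25, 10.0]:
    assert math.isclose(mantissa(x + 1), mantissa(x), abs_tol=1e-12)
```

Note that for negative x the mantissa is not the "decimal part" in the naive sense: mantissa(−1.25) equals 0.75, consistently with [−1.25] = −2.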
Finally, readers can verify that periodicity is preserved by the fundamental operations among functions: if f and g are two periodic functions of the same period p, the functions f(x) + g(x), f(x) · g(x) and f(x)/g(x) are also periodic (of period at most p).
f(x̂) ≥ f(x)    ∀x ∈ A

The value f(x̂) of the function at x̂ is called the (global) maximum value of f on A.
Maximizers thus attain the highest value of the function f on its domain: they outperform all other elements of the domain. Note that the maximum value of f on A is nothing but the maximum of the set Im f, which is a subset of R. That is,

f(x̂) = max f(A) = max Im f

By Proposition 33, the maximum value is unique. We denote this unique value by

max_{x∈A} f(x)
23 Recall from Proposition 39 that the integer part [x] of a scalar x ∈ R is the greatest integer ≤ x.
[Graph of the function of the example]
The maximizer of f is 0 and the maximum value is 1. Indeed, 1 = f(0) ≥ f(x) for every x ∈ R. On the other hand, since Im f = (−∞, 1], we have 1 = max(−∞, 1]. N
Similar definitions hold for the minimum value of f on A and for the minimizers of f on A.
Example 230 Consider the quadratic function f(x) = x², whose graph is the parabola:
[Graph of the parabola y = x²]
The minimizer of f is 0 and the minimum value is 0. Indeed, 0 = f(0) ≤ f(x) for every x ∈ R. On the other hand, since Im f = [0, +∞), we have 0 = min[0, +∞). N
While the maximum (minimum) value is unique, maximizers and minimizers might well
not be unique, as the next example shows.
Example 231 Let f : R → R be the sine function f(x) = sin x. Since Im f = [−1, 1], the unique maximum of f on R is 1 and the unique minimum of f on R is −1. Nevertheless, there are both infinitely many maximizers, i.e., all the points x = π/2 + 2kπ with k ∈ Z, and infinitely many minimizers, i.e., all the points x = −π/2 + 2kπ with k ∈ Z. The next graph should clarify.
[Graph of sin x, showing its maximizers and minimizers]
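The claims of Example 231 can be spot-checked numerically for a few values of k (an illustration of the infinitely many maximizers and minimizers, not a proof):

```python
import math

# sin attains its maximum value 1 at x = pi/2 + 2k*pi and its minimum
# value -1 at x = -pi/2 + 2k*pi, for every integer k.
for k in range(-3, 4):
    assert math.isclose(math.sin(math.pi / 2 + 2 * k * math.pi), 1.0, abs_tol=1e-9)
    assert math.isclose(math.sin(-math.pi / 2 + 2 * k * math.pi), -1.0, abs_tol=1e-9)
# and no point does better: sin stays within [-1, 1]
assert all(-1.0 <= math.sin(0.01 * t) <= 1.0 for t in range(-1000, 1001))
print("max value 1 and min value -1, each attained at many sample points")
```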
The restriction f|C can, therefore, be seen as f restricted to the subset C of A. Thanks to the smaller domain, the function f|C can satisfy properties different from those of the original function f.
Example 233 (i) Let g : [0, 1] → R be defined by g(x) = x². The function g can be seen as the restriction to the interval [0, 1] of the quadratic function f : R → R given by f(x) = x²; that is, g = f|[0,1]. Thanks to its restricted domain, the function g has better properties than the function f. For example: g is strictly increasing, while f is not; g is injective (so, invertible), while f is not; g is bounded, while f is only bounded below; g has both a maximizer and a minimizer, while f does not have a maximizer.
(ii) Let g : (−∞, 0] → R be defined by g(x) = −x. The function g can be seen as the restriction to (−∞, 0] of both f : R → R given by f(x) = |x| and h : R → R given by h(x) = −x. Indeed, a function may be the restriction of several functions (rather, of infinitely many functions), and it is the specific application at hand that may suggest which is the most relevant. In any case, let us analyze the differences between g and f and those between g and h. The function g is injective, while f is not; g is monotonic decreasing, while f is not. The function g is bounded below, while h is not; g has a global minimizer, while h does not. N
Example 234 The function f(x₁, x₂) = √(x₁x₂) has as natural domain R²₊ ∪ R²₋, i.e., the first and third quadrants of the plane. Nevertheless, when we regard it as a utility function of Cobb-Douglas type, its domain is restricted to the first quadrant, R²₊, because bundles of goods always have positive components. Moreover, since f(x₁, x₂) = 0 even when just one component is zero, something not that plausible from an economic viewpoint, this utility function is often considered only on R²₊₊. Therefore, purely economic considerations determine the domain on which to study f when interpreted as a utility function. N
Example 235 (i) Let g : [0, +∞) → R be defined by g(x) = x³. The function g can be seen as the restriction to the interval [0, +∞) of the cubic function f : R → R given by f(x) = x³, that is, g = f|[0,+∞). We observe that g is convex, while f is not; g is bounded below, while f is not; g has a minimizer, while f does not.
(ii) Let g : (−∞, 0] → R be defined by g(x) = x³. The function g can be seen as the restriction to the interval (−∞, 0] of the function f : R → R given by f(x) = x³, that is, g = f|(−∞,0]. We observe that g is concave, while f is not; g is bounded above, while f is not; g has a maximizer, while f does not.
(iii) Sometimes smaller domains may actually deprive functions of some of their properties. For instance, the restriction of the sine function to the interval [0, π/2] is no longer periodic, while the restriction of the quadratic function to the open unbounded interval (0, +∞) has no minimizers. N
is called an extension of f to C.
Restriction and extension are, thus, two sides of the same coin: g is an extension of f if and only if f is a restriction of g. In particular, a function defined on its natural domain A is an extension to A of each of its restrictions. Moreover, if a function has an extension, it has infinitely many.24
is an extension of the function f(x) = 1/x, which has as natural domain R \ {0}.
(ii) The function g : R → R defined by

g(x) = x        for x ≤ 0
g(x) = log x    for x > 0

is an extension of the function f(x) = log x, which has natural domain R₊₊. N
(i) we write x ≻ y if the bundle x is strictly preferred to y, that is, if x ≽ y but not y ≽ x;
(ii) we write x ∼ y if the bundle x is indifferent to the bundle y, that is, if both x ≽ y and y ≽ x.
The relations ≻ and ∼ are, obviously, mutually exclusive: between two indifferent bundles there cannot be strict preference, and vice versa. The next simple result further clarifies the different nature of the two relations.
Lemma 238 The strict preference relation ≻ is asymmetric (i.e., x ≻ y implies not y ≻ x), while the indifference relation ∼ is symmetric (i.e., x ∼ y implies y ∼ x).
24 A function might not have restrictions or extensions. Indeed, let f : A ⊆ R → R. In the singleton case A = {x₀}, f has no restrictions. Instead, if A is the natural domain, then f has no extensions.
25 In the weak sense of "prefers or is indifferent". The preference relation ≽ is an important example of a binary relation (see Appendix A).
This first axiom reflects the "weakness" of ≽: each bundle is preferred to itself. The next axiom is more interesting.
It is a rationality axiom that requires that the preferences of the decision maker have no cycles:
x ≽ y ≽ z ≻ x
Strict preference and indifference inherit these first two properties (with the obvious exception of reflexivity for strict preference).
(i) ∼ is reflexive and transitive;
(ii) ≻ is transitive.
Proof (i) We have x ∼ x since, thanks to the reflexivity of ≽, both x ≽ x and x ≼ x hold. Hence, the relation ∼ is reflexive. To prove transitivity, suppose that x ∼ y and y ∼ z. We show that this implies x ∼ z. By definition, x ∼ y means that x ≽ y and y ≽ x, while y ∼ z means that y ≽ z and z ≽ y. Thanks to the transitivity of ≽, from x ≽ y and y ≽ z it follows that x ≽ z, while from y ≽ x and z ≽ y it follows that z ≽ x. We therefore have both x ≽ z and z ≽ x, i.e., x ∼ z.
(ii) Suppose that x ≻ y and y ≻ z. We show that this implies x ≻ z. Suppose, by contradiction, that this is not the case, i.e., z ≽ x. By definition, x ≻ y and y ≻ z imply x ≽ y and y ≽ z. Since y ≽ z and z ≽ x, the transitivity of ≽ implies y ≽ x, so x ∼ y since x ≼ y. But x ∼ y contradicts x ≻ y.
The last two lemmas together show that, if ≽ is reflexive and transitive, the indifference relation ∼ is reflexive, symmetric, and transitive (so, it is an equivalence relation; cf. Appendix A). For each bundle x ∈ A, denote by

[x] = {y ∈ A : y ∼ x}

the collection of the bundles indifferent to it. This set is the indifference class of ≽ determined by the bundle x.
x ∼ y ⟺ [x] = [y]    (6.31)

and

x ≁ y ⟺ [x] ∩ [y] = ∅    (6.32)

Relations (6.31) and (6.32) express two fundamental properties of the indifference classes. By (6.31), the indifference class [x] does not depend on the choice of the bundle x: each indifferent bundle determines the same indifference class. By (6.32), different indifference classes do not have elements in common: they do not intersect.
Proof By the previous lemmas, ∼ is reflexive, symmetric, and transitive. We first prove (6.31). Suppose that x ∼ y. We show that this implies [x] ⊆ [y]. Let z ∈ [x], that is, z ∼ x. Since ∼ is transitive, x ∼ y and z ∼ x imply that z ∼ y, that is, z ∈ [y], which shows that [x] ⊆ [y]. By symmetry, x ∼ y implies y ∼ x. Then, the previous argument shows that [y] ⊆ [x]. So, we conclude that x ∼ y implies [y] = [x]. Since the converse is obvious, (6.31) is proved.
We move now to (6.32) and suppose that x ≁ y. We show that this implies [x] ∩ [y] = ∅. Let us suppose, by contradiction, that this is not the case and there exists z ∈ [x] ∩ [y]. By definition, we have both z ∼ x and z ∼ y. By the symmetry and transitivity of ∼, we then have x ∼ y, which contradicts x ≁ y. The contradiction shows that x ≁ y implies [x] ∩ [y] = ∅. Since the converse is obvious, the proof is complete.
Let us continue the study of ≽. The next axiom does not concern the rationality of the consumer, but rather his information.
Completeness requires the consumer to be able to compare any two bundles of goods, even very different ones. Naturally, to do so the consumer must, at least, have sufficient information about the two alternatives: it is easy to think of examples where this assumption is unrealistic. So, completeness is a non-trivial assumption on preferences.
In any case, note that completeness requires, inter alia, that each bundle be comparable to itself, that is, x ≽ x. Thus, it implies reflexivity.
Given the completeness assumption, the relations ≻ and ∼ are both exclusive (as seen above) and exhaustive.

Lemma 241 Let ≽ be complete. Given any two bundles x and y, we always have either x ≻ y or y ≻ x or x ∼ y.26
26 These "or" are intended as the Latin "aut".
Since we are considering bundles of economic goods (and not of "bads"), it is natural to assume monotonicity, i.e., that "more is better". The triad ≥, >, and ≫ leads to three possible incarnations of this simple principle of rationality:
The relationships among the three notions are similar to those seen for the analogous notions of monotonicity studied (also for utility functions) in Section 6.4.4. For example, strict monotonicity means that, given a bundle, an increase in the quantity of any good of the bundle determines a strictly preferred bundle.
Similar considerations hold for the other notions. In particular, (6.25) takes the form:
Proof We have
which proves (6.34). Now consider (6.35). If x ≻ y, then u(x) > u(y). Indeed, suppose, by contradiction, that u(x) ≤ u(y). By (6.33), we then have x ≼ y, which contradicts x ≻ y. It remains to show that u(x) > u(y) implies x ≻ y. Arguing again by contradiction, suppose that x ≼ y. Again, by (6.33) we have u(x) ≤ u(y), which contradicts u(x) > u(y). This completes the proof of (6.35).
27 Here "or" is intended as the Latin "vel".
The equivalence (6.34) allows us to represent the indifference classes as indifference curves of the utility function:

[x] = {y ∈ X : u(y) = u(x)}

Thus, when a preference admits a utility representation, (6.32) reduces to the standard property that indifference curves are disjoint (Section 6.3.1).
As already observed, in the ordinalist approach the utility function is a mere representation of the preference relation, without any special psychological meaning. Indeed, we already noted that each strictly increasing function f : Im u → R defines an equivalent utility function f ∘ u, for which it still holds that

x ≽ y ⟺ (f ∘ u)(x) ≥ (f ∘ u)(y)
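This ordinal invariance is easy to illustrate: a strictly increasing transformation leaves all pairwise comparisons unchanged, while a decreasing one reverses them. A sketch with hypothetical utility values:

```python
import math
from itertools import combinations

# Compare the rankings induced by two lists of utility values, pair by pair.
bundles = [1.0, 2.5, 0.3, 7.0, 4.2]   # hypothetical values u(x) on five bundles

def same_ranking(u_vals, v_vals):
    return all((a > b) == (c > d)
               for (a, b), (c, d) in zip(combinations(u_vals, 2),
                                         combinations(v_vals, 2)))

log_vals = [math.log(v) for v in bundles]
print(same_ranking(bundles, log_vals))               # True: log is strictly increasing
print(same_ranking(bundles, [-v for v in bundles]))  # False: x -> -x reverses the order
```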
Theorem 243 Let ≽ be a preference defined on a finite set A. The following conditions are equivalent:
(i) ≽ is transitive and complete;
(ii) there exists a function u : A → R such that x ≽ y if and only if u(x) ≥ u(y).

Proof (i) implies (ii). Suppose ≽ is transitive and complete. Define u : A → R by u(x) = |{y ∈ A : y ≼ x}|. As the reader can check, we have x ≽ y if and only if u(x) ≥ u(y), as desired. (ii) implies (i). Assume that there exists u : A → R such that u(x) ≥ u(y) if and only if x ≽ y. The preference ≽ is transitive. Indeed, let x, y, z ∈ A be such that x ≽ y and y ≽ z. By hypothesis, we have that u(x) ≥ u(y) and u(y) ≥ u(z). Since the order ≥ on R is transitive, we obtain u(x) ≥ u(z), which in turn yields x ≽ z, as desired. The preference ≽ is complete. Indeed, let x, y ∈ A. Since u(x) and u(y) are scalars, we either have u(x) ≥ u(y) or u(y) ≥ u(x) or both, because the order ≥ on R is complete. Therefore, either x ≽ y or y ≽ x or both, as desired.
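The counting construction in the proof can be made concrete. In the sketch below the preference is a hypothetical example encoded by scores (the names rank and weakly_prefers are ours, not from the text); u(x) counts the alternatives weakly below x:

```python
# On a finite set A, given a complete and transitive relation (encoded here
# by scores with ties), define u(x) = |{y in A : y is weakly below x}|.
A = ["a", "b", "c", "d"]
rank = {"a": 2, "b": 1, "c": 2, "d": 0}   # a ~ c, both strictly above b, above d

def weakly_prefers(x, y):
    return rank[x] >= rank[y]

def u(x):
    return sum(1 for y in A if weakly_prefers(x, y))

print({x: u(x) for x in A})   # {'a': 4, 'b': 2, 'c': 4, 'd': 1}
# u represents the preference: x weakly preferred to y iff u(x) >= u(y)
assert all(weakly_prefers(x, y) == (u(x) >= u(y)) for x in A for y in A)
```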
Thus, if there is a finite number of alternatives, transitivity and completeness are necessary and sufficient conditions for the existence of a utility function. Matters become more complicated when A is infinite: later we will present the famous lexicographic preferences on R²₊, which do not admit any numerical representation. The next theorem solves the existence problem on the key infinite set Rⁿ₊. To this end we need a final axiom, which recalls the Archimedean property of the real numbers seen in Section 1.4.3.28
Archimedean: given any three bundles x, y, z ∈ Rⁿ₊ with x ≻ y ≻ z, there exist weights α, β ∈ (0, 1) such that

αx + (1 − α)z ≻ y ≻ βx + (1 − β)z
28 For simplicity, we will assume that the consumption set A is the entire Rⁿ₊. The axiom can be stated more generally for convex sets, an important notion that we will study in Chapter 14.
The axiom implies that there exist no infinitely preferred and no infinitely "unpreferred" bundles. Given the preferences x ≻ y and y ≻ z, for the consumer the bundle x cannot be infinitely better than y, nor can the bundle z be infinitely worse than y. Indeed, by suitably combining the bundles x and z we get both a bundle better than y, that is, αx + (1 − α)z, and a bundle worse than y, that is, βx + (1 − β)z. This would be impossible if x were infinitely better than y, or if z were infinitely worse than y.
In this respect, recall the analogous property of real numbers: if x, y, z ∈ R are three scalars with x > y > z, there exist α, β ∈ (0, 1) such that

αx + (1 − α)z > y > βx + (1 − β)z    (6.36)

The property does not hold if we consider +∞ and −∞, that is, the extended real line R̄ = [−∞, +∞]. In this case, if y ∈ R but x = +∞ and/or z = −∞, the scalar x is infinitely greater than y, and z is infinitely smaller than y, and there are no α, β ∈ (0, 1) that satisfy the inequality (6.36). Indeed, α(+∞) = +∞ and β(−∞) = −∞ for every α, β ∈ (0, 1), as seen in Section 1.7.
In conclusion, the Archimedean axiom makes the bundles of different but comparable quality: however different, they belong to the same league. Thanks to this axiom, we can now state the existence theorem (its proof, which is not simple, is omitted).
Theorem 244 Let ≽ be a preference defined on A = Rⁿ₊. The following conditions are equivalent:
(i) ≽ is complete, transitive, strictly monotone, and Archimedean;
(ii) there exists a strictly monotonic and continuous utility function u : Rⁿ₊ → R that represents ≽.
This is a remarkable result: most economic applications use utility functions and the
theorem shows which conditions on preferences justify such use.30
To appreciate the importance of Theorem 244, we close the chapter with a famous example of a preference that does not admit a utility function. Let A = R²₊ and, given two bundles x and y, write x ≽ y if either x₁ > y₁, or x₁ = y₁ and x₂ ≥ y₂. The consumer starts by considering the first coordinate: if x₁ > y₁, then x ≽ y. If, on the other hand, x₁ = y₁, then he turns his attention to the second coordinate: if x₂ ≥ y₂, then x ≽ y.
The preference is inspired by how dictionaries order words; for this reason, it is called lexicographic preference. In particular, we have x ≻ y if x₁ > y₁ or x₁ = y₁ and x₂ > y₂, while we have x ∼ y if and only if x = y. The indifference classes are therefore singletons, a first remarkable feature of this preference.
The lexicographic preference is complete, transitive and strictly monotonic, as the reader can easily verify. It is not Archimedean, however. Indeed, consider for example x = (1, 0), y = (0, 1), and z = (0, 0). We have x ≻ y ≻ z and

βx + (1 − β)z = (β, 0) ≻ y ≻ z    ∀β ∈ (0, 1)

so no weight β ∈ (0, 1) makes the mixture βx + (1 − β)z worse than y.
For this reason, Theorem 244 does not apply to the lexicographic preference, which therefore cannot be represented by a strictly monotonic and continuous utility function. Actually, this preference does not admit any utility function at all.
Proposition 245 The lexicographic preference does not admit any utility function.
Proof Suppose, by contradiction, that there exists u : R²₊ → R that represents the lexicographic preference. Let a < b be any two positive scalars. For each x ≥ 0 we have (x, a) ≺ (x, b) and, therefore, u(x, a) < u(x, b). By Proposition 39, there exists a rational number q(x) such that u(x, a) < q(x) < u(x, b). The rule x ↦ q(x) defines, therefore, a function q : R₊ → Q. It is injective. If x ≠ y, say y < x, then:

u(y, a) < q(y) < u(y, b) < u(x, a) < q(x) < u(x, b)

and so q(x) ≠ q(y). But, since R₊ has the same cardinality as R, the injectivity of the function q : R₊ → Q implies |Q| ≥ |R|, contradicting Cantor's Theorem 254. This proves that the lexicographic preference does not admit any utility function.
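The lexicographic preference and its Archimedean failure are easy to simulate; the function names below are ours, and the bundles are those of the counterexample above:

```python
# Lexicographic preference on R^2_+: x weakly preferred to y iff
# x1 > y1, or x1 = y1 and x2 >= y2.
def lex_weakly_prefers(x, y):
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])

def strictly(x, y):
    return lex_weakly_prefers(x, y) and not lex_weakly_prefers(y, x)

x, y, z = (1, 0), (0, 1), (0, 0)
assert strictly(x, y) and strictly(y, z)
# For every beta in (0,1), beta*x + (1-beta)*z = (beta, 0) is still strictly
# preferred to y: no mixture of x and z ever falls below y.
for beta in [0.001, 0.1, 0.5, 0.9, 0.999]:
    mix = (beta * x[0] + (1 - beta) * z[0], beta * x[1] + (1 - beta) * z[1])
    assert strictly(mix, y)
print("the Archimedean axiom fails for the lexicographic preference")
```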
Chapter 7
Cardinality
Example 246 The set A = {11, 13, 15, 17, 19} of the odd integers between 10 and 20 is finite, with |A| = 5. N
Thanks to Proposition 198, two finite sets have the same cardinality if and only if their elements can be put in a one-to-one correspondence. For example, if we have seven seats and seven students, we can assign one (and only one) seat to each student, say by putting a name tag on it. All this motivates the following definition.
In other words, A is finite if there exist a set {1, 2, ..., n} of natural numbers and a bijective function f : {1, 2, ..., n} → A. The set {1, 2, ..., n} can be seen as the "prototypical" set of cardinality n, a benchmark that permits us to "calibrate" all the other finite sets of the same cardinality via bijective functions.
This definition provides a functional angle on the cardinality of finite sets, based on bijective functions and on the identification of a prototypical set. For finite sets, this angle is not much more than a curiosity. However, it becomes fundamental when we want to extend the notion of cardinality to infinite sets. This was the key insight of Georg Cantor that, by finding the right angle, led to the birth of the theory of infinite sets. Indeed, the possibility of establishing a one-to-one correspondence among infinite sets allows for a classification of these sets by "size" and leads to the discovery of deep and surprising properties.
Relative to finite sets, countable sets immediately exhibit a remarkable, possibly puzzling, property: it is always possible to put a countable set into a one-to-one correspondence with an infinite proper subset of it. In other words, losing elements might not affect cardinality when dealing with countable sets.
Proof Let X be a countable set and let A ⊆ X be an infinite proper subset of X, i.e., A ≠ X. Since X is countable, its elements can be listed as a sequence of distinct elements X = {x₀, x₁, ..., xₙ, ...} = {x_i}_{i∈N}. Let us denote by n_0 the smallest integer larger than or equal to 0 such that x_{n_0} ∈ A (if, for example, x₀ ∈ A, we have n_0 = 0; if x₀ ∉ A and x₁ ∈ A, we have n_0 = 1; and so on). Analogously, let us denote by n_1 the smallest integer (strictly) larger than n_0 such that x_{n_1} ∈ A. Given n_0, n_1, ..., n_j, with j ≥ 1, let us define n_{j+1} as the smallest integer larger than n_j such that x_{n_{j+1}} ∈ A. Consider now the function f : N → A defined by f(i) = x_{n_i} for i = 0, 1, .... It is easy to check that f is a one-to-one correspondence between N and A, so A is countable.
The following example should clarify the scope of the previous theorem. The set E of even numbers is, clearly, a proper subset of N that, we may think, contains only "half" of its elements. Nevertheless, it is possible to establish a one-to-one correspondence with N by putting in correspondence each even number 2n with its half n, that is,

2n ∈ E ⟷ n ∈ N
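The correspondence 2n ⟷ n can be illustrated on a finite window of N (a sketch only, since no program can hold all of N):

```python
# The map 2n <-> n pairs the even numbers E with N itself, so "half" of N
# has the same cardinality as N. Illustrated on a finite window.
window = list(range(20))                # a finite window of N
evens = [2 * n for n in window]         # the corresponding even numbers
pairing = dict(zip(evens, window))      # 2n |-> n
assert len(pairing) == len(evens)       # no even number used twice (injective)
assert sorted(pairing.values()) == window  # every n in the window is reached
print(list(pairing.items())[:4])        # [(0, 0), (2, 1), (4, 2), (6, 3)]
```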
Therefore, |E| = |N|. Already Galileo realized this remarkable peculiarity of infinite sets, which clearly distinguishes them from finite sets, whose proper subsets always have smaller cardinality.4 In a famous passage of the Discorsi e dimostrazioni matematiche intorno a due nuove scienze,5 published in 1638, he observed that the natural numbers can be put in a one-to-one correspondence with their squares by setting n² ⟷ n. The squares, which prima facie seem to form a rather small subset of N, are thus in equal number with the natural numbers: "in an infinite number, if one could conceive of such a thing, he would be forced
4 The mathematical fact considered here is at the basis of several little stories. For example, The Paradise Hotel has countably infinite rooms, progressively numbered 1, 2, 3, .... At a certain moment, they are all occupied when a new guest checks in. At this point, the hotel manager faces a conundrum: how to find a room for the new guest? Well, after some thought, he realizes that it is easier than he imagined! It is enough to ask every guest to move to the room coming after the one they are currently occupying (1 → 2, 2 → 3, 3 → 4, etc.). In this way, room number 1 will become free. He also realizes that it is possible to improve upon this new arrangement! It is enough to ask everyone to move to the room with a number which is twice that of the room currently occupied (1 → 2, 2 → 4, 3 → 6, etc.). In this way, infinitely many rooms will become available: all the odd ones.
5 The passage is in a dialogue between Sagredo, Salviati, and Simplicio, during the first day.
to admit that there are as many squares as there are numbers all taken together". The clarity with which Galileo exposes the problem is worthy of his genius. Unfortunately, the mathematical notions available to him were completely insufficient for further developing his intuitions. For example, the notion of function, fundamental for the ideas of Cantor, emerged (in a primitive form) only at the end of the seventeenth century in the works of Leibniz.
Clearly, the union of a finite number of countable sets is also countable. Much more is actually true.

Theorem 250 The union of a countable collection of countable sets is also countable.
A1 = {a11, a12, ..., a1n, ...}, A2 = {a21, a22, ..., a2n, ...}, ..., An = {an1, an2, ..., ann, ...}, ...

We can then construct an infinite matrix A in which the elements of the set An form the n-th row:

        | a11 a12 a13 a14 a15 ... |
        | a21 a22 a23 a24 a25 ... |
A =     | a31 a32 a33 a34 a35 ... |        (7.1)
        | a41 a42 a43 a44 a45 ... |
        | a51 a52 a53 a54 a55 ... |
        | ...                     |
The matrix A contains at least as many elements as the union ⋃_{n=1}^∞ An. Indeed, it may contain more elements because some elements can be repeated more than once in the matrix, while they would only appear once in the union (net of such repetitions, the two sets have the same number of elements).
We now introduce another infinite matrix, denoted by N, which contains all the natural numbers except 0:

        | 1  3  6  10 15 ... |
        | 2  5  9  14 ...    |
N =     | 4  8  13 ...       |        (7.2)
        | 7  12 ...          |
        | 11 ...             |
        | ...                |
Observe that:
1. The first diagonal of A (moving from SW to NE) consists of one element: a11. We map this element into the natural number 1, which is the corresponding element in the first diagonal of N. Note that the sum of the indexes of a11 is 1 + 1 = 2.
2. The second diagonal of A consists of two elements: a21 and a12. We map these elements, respectively, into the natural numbers 2 and 3, which are the corresponding elements in the second diagonal of N. Note that the sum of the indexes of a21 and a12 is 3.
3. The third diagonal of A consists of three elements: a31, a22, and a13. We map these elements, respectively, into the natural numbers 4, 5, and 6, which are the corresponding elements in the third diagonal of N. Note that the sum of the indexes of a31, a22, and a13 is 4.
4. The fourth diagonal of A consists of four elements: a41, a32, a23, and a14. We map these elements, respectively, into the natural numbers 7, 8, 9, and 10, which are the corresponding elements in the fourth diagonal of N. Note that the sum of the indexes of a41, a32, a23, and a14 is 5.
[Diagram: the entries a_ij of the matrix A hit by successive SW-to-NE diagonal arrows]
At each step we have an arrow, indexed by the sum of the indexes of the entries that it hits,
minus 1. So, arrow 1 hits entry a11 , arrow 2 hits entries a21 and a12 , arrow 3 hits entries
a31 , a22 , and a13 , and arrow 4 hits entries a41 , a32 , a23 , and a14 . Each arrow hits one more
entry than the previous one.
Intuitively, by proceeding in this way we cover the entire matrix A with countably many arrows, each hitting a finite number of entries. So, the matrix A has countably many entries.
The union ⋃_{n=1}^∞ An is then a countable set.
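The diagonal walk just described can be sketched as a short program that fills the top-left corner of the matrix N of (7.2) (the function name is ours):

```python
# Walk the infinite matrix by diagonals of constant index sum i + j, from SW
# to NE, assigning consecutive natural numbers. Restricted to a finite corner,
# this reproduces the matrix N of (7.2).
def diagonal_enumeration(size):
    N = [[0] * size for _ in range(size)]
    counter = 1
    for d in range(2, 2 * size + 1):      # d = i + j, the index sum of a diagonal
        for i in range(d - 1, 0, -1):     # SW to NE: row index decreases
            j = d - i
            if i <= size and j <= size:
                N[i - 1][j - 1] = counter
            counter += 1                  # count every cell of the full diagonal
    return N

for row in diagonal_enumeration(5):
    print(row)
```

The first printed row is [1, 3, 6, 10, 15], matching the first row of (7.2).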
That said, we next give a rigorous proof.
Claim 1 N × N is countable.

Proof of Claim 1 Consider the function f₁ : N × N → N given by f₁(m, n) = 2^(n+1) 3^(m+1). Note that f₁(m, n) = f₁(m′, n′) means that 2^(n+1) 3^(m+1) = 2^(n′+1) 3^(m′+1). By the Fundamental Theorem of Arithmetic, this implies that n + 1 = n′ + 1 and m + 1 = m′ + 1, proving that (m, n) = (m′, n′). Hence f₁ is injective.
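Claim 1's injection can be brute-force checked on a finite grid (a check, not a substitute for the Fundamental Theorem of Arithmetic argument):

```python
# f1(m, n) = 2^(n+1) * 3^(m+1): distinct pairs give distinct natural numbers.
def f1(m, n):
    return 2 ** (n + 1) * 3 ** (m + 1)

seen = {}
for m in range(40):
    for n in range(40):
        v = f1(m, n)
        assert v not in seen, f"collision: {(m, n)} and {seen[v]}"
        seen[v] = (m, n)
print(len(seen))   # 1600 distinct values from a 40 x 40 grid
```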
With a similar argument it is possible to prove that the Cartesian product of a finite number of countable sets is also countable. Moreover, the previous result yields that the set of rational numbers is countable.
The reader can verify that f is bijective, thus proving that Z is countable. On the other hand, the set

Q = {m/n : m ∈ Z and 0 ≠ n ∈ N}

of rational numbers can be written as a union of infinitely many countable sets:

Q = ⋃_{n=1}^∞ An
where

An = {0/n, 1/n, −1/n, 2/n, −2/n, ..., m/n, −m/n, ...}
This corollary is quite surprising: though the rational numbers seem much more numerous than the natural numbers, there exists a way to put these two classes of numbers into a one-to-one correspondence. The cardinality of N, and so of any countable set, is usually denoted by ℵ₀,6 that is, |N| = ℵ₀. We can then write

|Q| = ℵ₀
Definition 252 A set A has the cardinality of the continuum if it can be put in a one-to-one correspondence with the set R of the real numbers. In this case, we write |A| = |R|.
The cardinality of the continuum is often denoted by c, that is, |R| = c. Also in this case there exist subsets that are, prima facie, much smaller than R but turn out to have the same cardinality. Let us see an example, which will be useful in proving that R is uncountable.
Proposition 253 The interval (0, 1) has the cardinality of the continuum.7

Proof We want to show that |(0, 1)| = |R|. To do this we have to show that the numbers of (0, 1) can be put in a one-to-one correspondence with those of R. The bijection f : R → (0, 1) defined by

f(x) = 1 − (1/2)e^x    if x < 0
f(x) = (1/2)e^(−x)     if x ≥ 0
6 ℵ (aleph) is the first letter of the Hebrew alphabet. In the next section we will formalize also for infinite sets the notions of same or greater cardinality. For the time being, we treat these notions intuitively.
7 At the end of Section 6.5.3 we noted that the trigonometric function f : R → (−1, 1) defined by (2/π) arctan x is a bijection. In view of what we have learned so far, this shows that (−1, 1) has the cardinality of the continuum.
with graph:

[Graph of the decreasing bijection f : R → (0, 1), with f(0) = 1/2]
shows that, indeed, this is the case (as the reader can also formally verify).
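Under the reconstruction of f given above, its defining properties (values inside (0, 1), strict monotonicity, continuity at 0) can be verified numerically:

```python
import math

# The piecewise bijection f : R -> (0, 1): strictly decreasing, continuous at 0
# (both branches give 1/2), with every value inside the open interval (0, 1).
def f(x):
    return 1 - 0.5 * math.exp(x) if x < 0 else 0.5 * math.exp(-x)

xs = [-30, -3, -0.5, 0, 0.5, 3, 30]
ys = [f(x) for x in xs]
assert all(0 < y < 1 for y in ys)                    # image inside (0, 1)
assert all(a > b for a, b in zip(ys, ys[1:]))        # strictly decreasing on samples
assert math.isclose(f(-1e-9), f(0.0), abs_tol=1e-6)  # continuity at the junction
print(f(0.0))   # 0.5
```

A sketch at sample points; strict monotonicity on all of R follows from the sign of the derivative on each branch.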
Proof Assume, by contradiction, that R is countable. Hence, there exists a bijective function g : N → R. By Proposition 253, there exists a bijective function f : R → (0, 1). The reader can easily prove that f ∘ g is a bijective function from N to (0, 1), yielding that (0, 1) is countable. We will next reach a contradiction, showing that (0, 1) cannot be countable. To this end, we write all the numbers in (0, 1) using their decimal representation: each x ∈ (0, 1) will be written as

x = 0.c₀c₁⋯cₙ⋯

with cᵢ ∈ {0, 1, ..., 9}, always using infinitely many digits (for example 0.354 will be written 0.354000000...). Since until now we obtained that (0, 1) is countable, there exists a way to list its elements as a sequence.
and so on. Let us then take the number x̄ = 0.d₀d₁d₂d₃⋯dₙ⋯ such that its generic decimal digit dₙ is different from cₙₙ (but without choosing 9 infinitely many times, so as to avoid a periodic 9 which, as we know, does not exist on its own). The number x̄ belongs to (0, 1), but sadly does not belong to the list written above since dₙ ≠ cₙₙ (and therefore it is different from x₀ since d₀ ≠ c₀₀, from x₁ since d₁ ≠ c₁₁, etc.). We conclude that the list written above cannot be complete and hence the numbers of (0, 1) cannot be put in a one-to-one correspondence with N. So, the interval (0, 1) is not countable, a contradiction.
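The diagonal construction can be made concrete for a finite prefix of any alleged list (the listing below is a hypothetical example):

```python
# Cantor's diagonal trick: given (a prefix of) any alleged list of decimal
# digit expansions, build digits d_n different from c_nn, never using 9, so
# the resulting number differs from every listed number.
def diagonal_escape(digit_rows):
    # (c + 1) % 9 lies in {0, ..., 8} and never equals c
    return [(row[n] + 1) % 9 for n, row in enumerate(digit_rows)]

listing = [            # hypothetical expansions 0.14159..., 0.33333..., etc.
    [1, 4, 1, 5, 9],
    [3, 3, 3, 3, 3],
    [0, 5, 0, 0, 0],
    [9, 9, 9, 0, 1],
    [2, 7, 1, 8, 2],
]
d = diagonal_escape(listing)
print(d)   # [2, 4, 1, 1, 3]
for n in range(len(listing)):
    assert d[n] != listing[n][n] and d[n] != 9
```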
The set R of real numbers is, therefore, much richer than N and Q. The rational numbers, which have, as we remarked, a "quick rhythm", are comparatively very few with respect to the real numbers: they form a kind of fine dust that overlaps with the real numbers without covering them all. At the same time, it is dust so fine that between any two real numbers, no matter how close they are, there are particles of it.
Summing up, the real line is a new prototype of infinite set.
It is possible to prove that both the union and the Cartesian product of a finite or countable collection of sets that have the cardinality of the continuum have, in turn, the cardinality of the continuum. This has the following consequence.
Theorem 255 Rⁿ has the power of the continuum for each n ≥ 1.
This is another remarkable finding, which is surprising already in the special case of the plane R², which, intuitively, may appear to contain many more points than the real line. It is in front of results of this type, so surprising for our "finitary" intuition, that Cantor wrote in a letter to Dedekind "I see it, but I do not believe it". His key intuition on the use of bijective functions to study the cardinality of infinite sets opened a new and fundamental area of mathematics, which is also rich in terms of philosophical implications (mentioned at the beginning of the chapter).
|2^A| = 1 + C(n, 1) + C(n, 2) + ⋯ + C(n, n−1) + C(n, n)
      = Σ_{k=0}^{n} C(n, k) · 1^k · 1^(n−k) = (1 + 1)^n = 2^n

where C(n, k) denotes the binomial coefficient.
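The identity |2^A| = 2^n can be illustrated by counting the subsets of each size with itertools:

```python
from itertools import combinations

# For a finite set A with n elements, the number of subsets of size k is the
# binomial coefficient C(n, k); their sum over k is 2^n, the size of 2^A.
A = {"x", "y", "z", "w"}
counts = [len(list(combinations(A, k))) for k in range(len(A) + 1)]
print(counts)        # [1, 4, 6, 4, 1] -- the binomial coefficients for n = 4
print(sum(counts))   # 16 = 2**4
assert sum(counts) == 2 ** len(A)
```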
Sets can have the same size, but also different sizes. This motivates the following definition:
(i) A has cardinality less than or equal to that of B, written |A| ≤ |B|, if there exists an injective function f : A → B;
(ii) A has cardinality strictly less than that of B, written |A| < |B|, if |A| ≤ |B| and |A| ≠ |B|.
(i) |A| ≤ |A|;
(ii) |A| ≤ |B| and |B| ≤ |C| imply that |A| ≤ |C|;
(iii) |A| ≤ |B| and |B| ≤ |A| if and only if |A| = |B|;
(iv) A ⊆ B implies that |A| ≤ |B|.
Example 260 We have jNj < jRj. Indeed, by Theorem 254 jNj =
6 jRj and, by point (iv),
N R implies jNj jRj. N
Properties (i) and (ii) say that the order ≤ is reflexive and transitive. As for property
(iii), it tells us that ≤ and = are related in a natural way. Finally, (iv) confirms the intuitive
idea that smaller sets have a smaller cardinality. Remarkably, this intuition does not carry
over to < – i.e., A ⊆ B does not imply |A| < |B| – because, as already noted, a proper
subset of an infinite set may have the same cardinality as the original set (as Galileo had
envisioned).
b = b′, that is, f(a) = f(a′). Since f is injective, we conclude that a = a′, proving that h is
injective.
(i) Let f : A → A be the identity, that is, f(a) = a for all a ∈ A. The function f is
trivially injective and the statement follows.
(ii) Since |A| ≤ |B|, there exists an injective function f : A → B. Since |B| ≤ |C|, there
exists an injective function g : B → C. Next, note that h = g ∘ f : A → C is well-defined
and, by the initial part of the proof, we also know that it is injective, thus proving that
|A| ≤ |C|.
(iii) We only prove the "if" part.⁸ By definition and since |A| = |B|, there exists a
bijection f : A → B. Since f is bijective, it follows that f⁻¹ : B → A is well-defined and
bijective. Thus, both f : A → B and f⁻¹ : B → A are injective, yielding that |A| ≤ |B| and
|B| ≤ |A|.
(iv) Define f : A → B by the rule f(a) = a. Since A ⊆ B, the function f is well-defined
and, clearly, injective, thus proving the statement.
When a set A is finite and non-empty, we clearly have |A| < |2^A|. Remarkably, the
inequality continues to hold for infinite sets.

Theorem 261 (Cantor) For each set A, finite or infinite, we have |A| < |2^A|.
Proof Consider a set A and the collection of all singletons C = {{a}}_{a∈A}. It is immediate to
see that there is a bijective mapping between A and C, that is, |A| = |C|, and C ⊆ 2^A. Since
|C| ≤ |2^A|, we conclude that |A| ≤ |2^A|. Next, by contradiction, assume that |A| = |2^A|.
Then there exists a bijection between A and 2^A which associates to each element a ∈ A an
element b = b(a) ∈ 2^A and vice versa: a ↔ b. Observe that each b(a), being an element of
2^A, is a subset of A. Consider now all the elements a ∈ A such that the corresponding subset
b(a) does not contain a. Call S the subset of these elements, that is, S = {a ∈ A : a ∉ b(a)}.
Since S is a subset of A, S ∈ 2^A. Since we have a bijection between A and 2^A, there must
exist an element c ∈ A such that b(c) = S. We have two cases:

(i) if c ∈ S, then by the definition of S, b(c) does not contain c, so c ∉ b(c) = S;

(ii) if c ∉ S, then by the definition of S, b(c) contains c, so c ∈ b(c) = S.

In both cases, we have reached a contradiction, thus proving |A| < |2^A|.
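The diagonal construction in the proof is easy to replay on a small finite example: for any attempted map f from A into its power set, the set S = {a : a ∉ f(a)} is missed by f. A sketch (the specific map below is illustrative):

```python
def diagonal_set(A, f):
    """Given f mapping each element of A to a subset of A, return
    S = {a in A : a not in f(a)}, which cannot lie in the image of f."""
    return {a for a in A if a not in f(a)}

A = {0, 1, 2}
f = {0: {1, 2}, 1: {1}, 2: set()}      # an attempted "onto" map A -> 2^A
S = diagonal_set(A, lambda a: f[a])    # here S = {0, 2}
# S differs from every f(a), so f misses at least one subset of A
assert all(S != f[a] for a in A)
```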
Cantor's Theorem offers a simple way to make a "cardinality jump" starting from a given
set A: it is sufficient to consider the power set 2^A. For example,

    |2^R| > |R| ,  |2^{2^R}| > |2^R|

and so on. We can, therefore, construct an infinite sequence of sets of higher and higher
cardinality. In this way, we enrich (7.4) that now becomes

    {1, 2, ..., n, ..., ℵ_0, c, 2^R, 2^{2^R}, ...}        (7.5)
⁸ The "only if" part is the content of the Schroeder–Bernstein Theorem, which we leave to more advanced
courses.
174 CHAPTER 7. CARDINALITY
Here is the Pandora's box mentioned above, which Theorem 261 has allowed us to uncover.
The breathtaking sequence (7.5) is only the incipit of the theory of infinite sets, whose
study (even in its introductory part) would take us too far away.
Before moving on with the book, however, we consider a final famous aspect of the
theory, the so-called continuum hypothesis (which the reader might have already heard of).
By Theorem 261, we know that |2^N| > |N|. On the other hand, by Theorem 254 we also
have |R| > |N|. The next result (we omit its proof) shows that these two inequalities are
actually not distinct.
Therefore, the power set of N has the cardinality of the continuum. The continuum
hypothesis states that there is no set A such that

    ℵ_0 < |A| < c

That is, there is no infinite set of cardinality intermediate between ℵ_0 and c. In other words,
a set that has cardinality larger than ℵ_0 must have at least the cardinality of the continuum.
The validity of the continuum hypothesis is the first among the celebrated Hilbert prob-
lems, posed by David Hilbert in 1900, and represents one of the deepest questions in math-
ematics. By adopting this hypothesis, it is possible to set

    ℵ_1 = |R|

and to consider the cardinality of the continuum as the second infinite cardinal number ℵ_1
after the first one ℵ_0 = |N|.
The continuum hypothesis can be reformulated in a suggestive way by writing

    ℵ_1 = 2^{ℵ_0}

That is, the smallest cardinal number greater than ℵ_0 is equal to the cardinality of the power
set of N or, equivalently, of any set of cardinality ℵ_0 (like, for example, the rational numbers).
The generalized continuum hypothesis states that, for each n, we have

    ℵ_{n+1} = 2^{ℵ_n}

All the jumps of cardinality in (7.5), not only the first one from ℵ_0 to ℵ_1, are thus obtained
by considering the power set. Therefore,

    ℵ_2 = |2^R| ,  ℵ_3 = |2^{2^R}|

The elements of this sequence are the cardinal numbers that represent all the different car-
dinalities (finite or infinite) that sets might have, however large they might be. According
to the generalized continuum hypothesis, the power sets in (7.5) are the prototype sets of
the infinite cardinal numbers (the first two being the two infinite cardinal numbers ℵ_0 = |N|
and ℵ_1 = c with which we started this section).
Summing up, the depth of the problems that the use of bijective functions opened is
incredible. As we have seen, this study started by Cantor is, at the same time, rigorous
and intrepid – as is typical of the best mathematics, at the basis of its beauty. It relies on
the use of bijective functions to capture the fundamental principle of similarity (in terms of
numerosity) among sets.⁹

⁹ The reader who wants to learn more about set theory can consult Halmos (1960), Suppes (1960), as well
as Lombardo Radice (1981).
Part II
Discrete analysis
Chapter 8
Sequences
where each number occupies a place of order, a position, so that each number (except the
first one) follows a number and precedes another one. The next definition formalizes this
idea. We denote by N_+ the set of the natural numbers without 0.
    n ↦ 2n        (8.2)

and so we have the sequence of even integers (that are strictly positive). The image f(n)
is usually denoted by x_n. With this notation, the sequence of even integers is x_n = 2n for
each n ≥ 1. The images x_n are called terms (or elements) of the sequence. We will denote
sequences by {x_n}_{n=1}^∞, or briefly by {x_n}.¹
There are different ways to define a specific sequence {x_n}, that is, to describe the
underlying function f : N_+ → R. A first possibility is to describe it in closed form through
a formula: for instance, this is what we did with the sequence of the even numbers using
(8.2). Other defining rules are, for example,

    n ↦ 2n − 1        (8.3)

    n ↦ n^2        (8.4)

    n ↦ 1/√(2^{n−1})        (8.5)

¹ The choice of starting the sequence from n = 1 instead of n = 0 (or of any other natural number k) is a
mere convention. When needed, it is perfectly legitimate to consider sequences {x_n}_{n=0}^∞ or, more generally,
{x_n}_{n=k}^∞.
    1, 1/√2, 1/√4, 1/√8, ...        (8.7)
To define a sequence in closed form thus amounts to specifying explicitly the underlying
function f : N_+ → R. The next example presents a couple of classic sequences defined in
closed form.

    1, 1/2, 1/3, 1/4, 1/5, ...

    a, aq, aq^2, aq^3, aq^4, ...

is called geometric (or a geometric progression) with first term a and common ratio q. For
example, if a = 1 and q = 1/2, we have {1, 1/2, 1/4, 1/8, 1/16, ...}. ▲
in which each term is the sum of the two terms that precede it, with fixed initial values 0
and 1. For example, in the fourth position we find the number 2, i.e., the sum 1 + 1 of the
two terms that precede it; in the fifth position we find the number 3, i.e., the sum 1 + 2 of
the two terms that precede it; and so on. The underlying function f : N_+ → R is, hence,

    f(1) = 0 ,  f(2) = 1
    f(n) = f(n−1) + f(n−2)  for n ≥ 3        (8.8)

We have two initial values, f(1) = 0 and f(2) = 1, and a recursive rule that allows us to
compute the term in position n once the two preceding terms are known. Unlike the
sequences defined through a closed formula, such as (8.3)-(8.5), to obtain the term x_n
we now have to first construct, using the recursive rule, all the terms that precede it. For
example, to compute the term x_100 in the sequence of the odd numbers (8.6), it is sufficient
to substitute n = 100 in formula (8.3), finding x_100 = 199. In contrast, to compute the term
² Indeed, 1/2, 1/3, 1/4, ... are the positions in which we have to put a finger on a vibrating string to obtain
the different notes.
8.1. THE CONCEPT 181
x_100 in the Fibonacci sequence we first have to construct by recurrence the first 99 terms of
the sequence. Indeed, it is true that to determine x_100 it is sufficient to know the values of
x_99 and x_98 and then to use the rule x_100 = x_99 + x_98, but to determine x_99 and x_98 we must
first know x_97 and x_96, and so on.
Therefore, the recursive definition of a sequence consists of one or more initial values and
of a recurrence rule that, starting from them, allows us to compute the various terms of the
sequence. The initial values are arbitrary. For example, if in (8.8) we choose f(1) = 2 and
f(2) = 1 we have the following Fibonacci sequence

    {2, 1, 3, 4, 7, 11, 18, 29, 47, ...}
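The recursion (8.8), with arbitrary initial values, translates directly into code. A minimal sketch (function and parameter names are ours, not the book's):

```python
def fib_sequence(n_terms, x1=0, x2=1):
    """First n_terms of x_n = x_{n-1} + x_{n-2} (n >= 3),
    with initial values x1 and x2 as in (8.8)."""
    terms = [x1, x2]
    for _ in range(n_terms - 2):
        terms.append(terms[-1] + terms[-2])   # recurrence rule
    return terms[:n_terms]

assert fib_sequence(9) == [0, 1, 1, 2, 3, 5, 8, 13, 21]
# Different initial values give a different sequence (here f(1) = 2, f(2) = 1):
assert fib_sequence(9, 2, 1) == [2, 1, 3, 4, 7, 11, 18, 29, 47]
```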
Next we define by recurrence a classic sequence.

Example 265 Given any a, b ∈ R, define f : N_+ → R by

    f(1) = a
    f(n) = f(n−1) + b  for n ≥ 2

Starting from the initial value f(1) = a, it is possible to construct the entire sequence
through the recursive formula f(n) = f(n−1) + b. This is the so-called arithmetic sequence
(or arithmetic progression) with first term a and common difference b. For example, if a = 2
and b = 4, we have {2, 6, 10, 14, 18, 22, ...}. ▲
To ease notation, the underlying function f is often omitted in recursive formulas. For
instance, the arithmetic sequence is written as

    x_1 = a
    x_n = x_{n−1} + b  for n ≥ 2        (8.9)

The next examples adopt this simplified notation.
Example 266 Let P = {3k : k ∈ N_+} be the collection of all multiples of 3, i.e., P =
{3, 6, 9, 12, 15, ...}. Define recursively a sequence {x_n} by x_1 = a ∈ R and, for each n ≥ 2,

    x_n − x_{n−1} = −1 if n ∈ P, and +1 otherwise        (8.10)

In words, at each position we go either up or down by one unit: we go down if we are
getting to a position that is a multiple of 3, and up otherwise. This sequence is an example
of a random walk: it may describe the walk of a drunk person who, at each block, goes
either North, +1, or South, −1, and who, for some (random) reason, always goes South
after having gone twice North. For instance, if the initial condition is a = 0 we get the
sequence 0, 1, 0, 1, 2, 1, 2, 3, 2, ...
More generally, given any subset P (finite or not) of N_+, the recurrence (8.10) is called a
random walk. ▲
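The rule (8.10) is easy to simulate. A sketch, assuming the initial condition a = 0 and P the multiples of 3 (names are ours):

```python
def walk(n_terms, a=0, P=lambda n: n % 3 == 0):
    """Sequence (8.10): x_1 = a and x_n - x_{n-1} = -1 if n is in P, +1 otherwise."""
    xs = [a]
    for n in range(2, n_terms + 1):
        xs.append(xs[-1] + (-1 if P(n) else +1))
    return xs

# Down at every multiple of 3, up otherwise: South after twice North
assert walk(9) == [0, 1, 0, 1, 2, 1, 2, 3, 2]
```

Passing a different predicate P gives the more general random walk mentioned above.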
Example 267 A Star Wars jedi begins his career as a padawan apprentice under a jedi
master, then becomes a knight and, once ready to train, becomes a master and takes a
padawan apprentice.
Let

    p_t = number of jedi padawans at time t
    k_t = number of jedi knights at time t
    m_t = number of jedi masters at time t

Assume that, as one (galactic) year passes, padawans become knights, knights become mas-
ters, and masters take a padawan apprentice. Formally:

    k_{t+1} = p_t
    m_{t+1} = m_t + k_t
    p_{t+1} = m_{t+1}
The total number of jedis at time t + 2, denoted by x_{t+2}, is then:

    x_{t+2} = k_{t+2} + m_{t+2} + p_{t+2} = p_{t+1} + (m_{t+1} + k_{t+1}) + (m_{t+1} + k_{t+1})
            = x_{t+1} + m_{t+1} + k_{t+1} = x_{t+1} + m_t + k_t + p_t = x_{t+1} + x_t

So, we have a Fibonacci recursion

    x_{t+2} = x_{t+1} + x_t

which says something simple but not so obvious a priori: the number of jedis at time t + 2 can
be regarded as the sum of the numbers of jedis at time t + 1 and at time t. Indeed, a jedi
is a master at t + 2 if and only if he was a jedi (of any kind) at t. So, x_t gives the number
of all masters at t + 2, who in turn increase at t + 2 the population of jedis by taking new
apprentices.
The recursion is initiated at t = 1 by a "self-taught" original padawan, who becomes a
knight at t = 2 and a master with a new padawan at t = 3. So:

    x_1 = 1 ,  x_2 = 1
    x_t = x_{t−1} + x_{t−2}  for t ≥ 3

with initial values x_1 = x_2 = 1. We can diagram the recursion as:

    p                1 = 1
    k                1 = 1
    mp               1 + 1 = 2
    mpk              1 + 2 = 3
    mpkmp            2 + 3 = 5
    mpkmpmpk         3 + 5 = 8
    mpkmpmpkmpkmp    5 + 8 = 13

Note how every string is the concatenation of the previous two. ▲
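The padawan/knight/master dynamics above can be simulated to confirm that the totals follow the Fibonacci recursion (a sketch; names are ours):

```python
def jedi_totals(T):
    """Simulate the padawan/knight/master recursion and return the totals x_t."""
    p, k, m = 1, 0, 0          # t = 1: a single self-taught padawan
    totals = [p + k + m]
    for _ in range(T - 1):
        k_new = p              # padawans become knights
        m_new = m + k          # knights become masters
        p_new = m_new          # each master takes a new padawan
        p, k, m = p_new, k_new, m_new
        totals.append(p + k + m)
    return totals

x = jedi_totals(8)
assert x == [1, 1, 2, 3, 5, 8, 13, 21]
# Fibonacci recursion x_{t+2} = x_{t+1} + x_t:
assert all(x[t] == x[t - 1] + x[t - 2] for t in range(2, len(x)))
```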
We have

    x_2 = 4 ,  x_3 = 8 ,  x_4 = 16

and so on. This suggests that the closed form is the geometric sequence

    x_n = 2^n  for every n ≥ 1        (8.12)

of both first term and common ratio 2. Let us verify that this guess is correct. We proceed
by induction. Initial step: at n = 1 we have x_1 = 2, as desired. Induction step: assume that
(8.12) holds at some n ≥ 2; then, as the reader can prove,

    x_{n+1} = 2x_n = 2 · 2^n = 2^{n+1}

so (8.12) also holds at n + 1. This recursion also motivates the "first term" and "common
ratio" terminology. ▲
    x_2 = a + b ,  x_3 = a + 2b ,  x_4 = a + 3b

and so on. This suggests that the closed form is

    x_n = a + (n−1)b  for every n ≥ 1        (8.13)

Let us verify that this guess is correct. We proceed by induction. Initial step: at n = 1 we
have x_1 = a, as desired. Induction step: assume that (8.13) holds at some n ≥ 2; then

    x_{n+1} = x_n + b = a + (n−1)b + b = a + nb

so that (8.13) also holds at n + 1.
Example 271 An investor can at each period of time invest an amount of money x, a
monetary capital, and receive at the next period the original amount invested x along with
an additional amount rx computed according to the interest rate r ≥ 0. This additional
amount is the fruit of his investment. For instance, if x = 100 and r = 0.1, then rx = 10 is
such an amount.
Assume that the investor has an initial monetary capital c that he keeps investing at all
periods. The resulting cash flow is described by the following recursion

    x_1 = c
    x_t = (1 + r) x_{t−1}  for t ≥ 2

We have

    x_2 = c(1 + r) ,  x_3 = x_2(1 + r) = c(1 + r)^2 ,  x_4 = x_3(1 + r) = c(1 + r)^3

and so on. This suggests that the closed form is

    x_t = (1 + r)^{t−1} c  for every t ≥ 1        (8.14)

Let us verify this by induction. Initial step: at t = 1 we have x_1 = c, as desired. Induction
step: assume that (8.14) holds at some t ≥ 2; then

    x_{t+1} = (1 + r) x_t = (1 + r)(1 + r)^{t−1} c = (1 + r)^t c

and so (8.14) holds at t + 1. By induction, it then holds at all t ≥ 1. Formula (8.14) is the
classic compound interest formula of financial mathematics. ▲
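A numerical check of the compound interest recursion against its closed form, under the indexing convention x_1 = c (so the closed form reads x_t = (1 + r)^{t−1} c); the sketch below is ours:

```python
def capital_recursive(c, r, t):
    """x_1 = c, x_t = (1 + r) x_{t-1}: capital at period t by recurrence."""
    x = c
    for _ in range(t - 1):
        x *= 1 + r
    return x

c, r = 100.0, 0.1
for t in range(1, 10):
    # Closed form: x_t = (1 + r)^(t - 1) * c
    assert abs(capital_recursive(c, r, t) - (1 + r) ** (t - 1) * c) < 1e-9
```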
Not all sequences can be described in closed or recursive form. In this regard, the most
famous example is the sequence {p_n} of prime numbers: it is infinite by Euclid's Theorem,
but it does not have a (known) explicit description. In particular:

(i) Given n, we do not know any formula that tells us what p_n is; in other words, the
sequence {p_n} cannot be defined in closed form.

(ii) Given p_n (or any smaller prime), we do not know any formula that tells us what p_{n+1}
is; in other words, the sequence {p_n} cannot be defined by recurrence.

(iii) Given any prime number p, we do not know of any (operational) formula that gives us
a prime number q greater than p; in other words, the knowledge of a prime number
does not give any information on the subsequent prime numbers.

Hence, we do not have a clue about how prime numbers follow one another, that is, about
the form of the function f : N_+ → R that defines this sequence. We have to consider all the
natural numbers and check, one by one, whether or not they are prime through the
primality tests (Section 1.3.2). With an eternity at our disposal, we could then construct
term by term the sequence {p_n}. More modestly, in the short time that has passed between
Euclid and us, tables of prime numbers have been compiled; they establish the terms of the
sequence {p_n} up to numbers that may seem huge to us, but that are nothing relative to the
infinity of all the prime numbers.
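The term-by-term construction described here can be sketched in code: test each natural number in turn and collect the primes (trial division is used as a naive primality test; the sketch is ours, not the book's):

```python
def primes(n_terms):
    """Build the sequence {p_n} term by term: no closed form or recurrence
    is available, so each candidate is tested for primality one by one."""
    found = []
    candidate = 2
    while len(found) < n_terms:
        # naive trial division up to the square root of the candidate
        if all(candidate % d != 0 for d in range(2, int(candidate ** 0.5) + 1)):
            found.append(candidate)
        candidate += 1
    return found

assert primes(10) == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```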
O.R. As to (iii), for centuries mathematicians have looked for a (workable) rule that, given
a prime number p, would make it possible to find a greater prime q > p, that is, a function
q = f(p). A famous example of a possible such rule is given by the so-called Mersenne
primes, which are the prime numbers that can be written in the form 2^p − 1 with p prime.
It is possible to prove that if 2^p − 1 is prime, then so is p. For centuries, it was believed (or
hoped) that the much more interesting converse was true, namely: if p is prime, so is 2^p − 1.
This conjecture was definitively disproved in 1536 when Hudalricus Regius showed that

    2^11 − 1 = 2047 = 23 × 89

thus finding the first counterexample to the conjecture. Indeed, p = 11 does not satisfy it.
In any case, Mersenne primes are among the most important prime numbers. In particular,
as of 2016, the greatest prime number known is

    2^74207281 − 1
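Regius's counterexample is immediate to verify (a sketch; is_prime is our naive trial-division test, not a primality test from the book):

```python
def is_prime(n):
    """Naive trial-division primality test."""
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

# p = 2, 3, 5, 7 do give Mersenne primes 3, 7, 31, 127 ...
assert [p for p in [2, 3, 5, 7] if is_prime(2 ** p - 1)] == [2, 3, 5, 7]
# ... but p = 11 is Regius's counterexample: 2^11 - 1 = 2047 = 23 * 89
assert 2 ** 11 - 1 == 2047 == 23 * 89
assert not is_prime(2 ** 11 - 1)
```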
We close the section by observing that, given any function f : R_+ → R, its restriction f_{|N_+}
to N_+ is a sequence. So, functions defined on (at least) the positive half-line automatically
define also a sequence.

⁴ See the Great Internet Mersenne Prime Search.
    x = {x_n} = {x_1, x_2, ..., x_n, ...}

The operations on functions studied in Section 6.3.2 have, as a special case, the operations
on sequences – that is, on elements of the space R^∞. In particular, given any two sequences
x = {x_n} and y = {y_n} in R^∞, we have:

To ease notation, we will denote the sum directly by {x_n + y_n} instead of {(x + y)_n}.
We will do the same for the other operations.⁵
(ii) x > y if x ≥ y and x ≠ y, i.e., if x ≥ y and there is at least one position n such that
x_n > y_n;

    x ≫ y  ⟹  x > y  ⟹  x ≥ y    for all x, y ∈ R^∞

That said, as in R^n, in R^∞ the order ≥ is not complete and sequences might well
not be comparable. For instance, the alternating sequence x_n = (−1)^n and the constant
sequence y_n = 0 cannot be compared. Indeed, they are {−1, 1, −1, 1, ...} and {0, 0, 0, 0, ...},
respectively.
(i) increasing if

    x ≥ y  ⟹  g(x) ≥ g(y)    for all x, y ∈ A        (8.15)

⁵ If f, g : N_+ → R are the functions underlying the sequences {x_n} and {y_n}, their sum is equivalently
written (x + y)_n = (f + g)(n) = f(n) + g(n) for every n ≥ 1. A similar remark holds for the other operations.
So, the operations on functions imply those on sequences, as claimed.
8.3. APPLICATION: INTERTEMPORAL CHOICES 187
    g(x) = k    for all x ∈ A

The decreasing counterparts of these notions are similarly defined. For brevity, we do
not dwell upon them. We just note that, as in R^n, strict monotonicity implies
the other two kinds of monotonicity and that constancy implies both increasing and decreasing
monotonicity, but not vice versa (cf. Example 210).
where β ∈ (0, 1) can be interpreted as a subjective discount factor that, as we have seen,
depends on the degree of patience of the consumer (Section 6.2.2).
The monotonicity properties of intertemporal utility functions U : R^∞_+ → R are, clearly,
those seen in points (i)-(iv) of the previous section for a generic function g defined on subsets
of R^∞.
expectations about such values. For this reason, expectations come to play a key role in
economics, and the relevance of this subjective component is a key feature of economics
as a social science that distinguishes it from, for instance, the natural sciences. Through
sequences we can give a first illustration of their importance.

Definition 272 A pair (p, q) ∈ [a, b] × R_+ of prices and quantities is called an equilibrium
of market M if

    q = D(p) = S(p)
The pair (p̄, q̄) is the equilibrium of our market of potatoes. Graphically, it corresponds
to the classic intersection of the demand curve D and the supply curve S.

[Figure: demand curve D and supply curve S intersecting at the equilibrium.]
    D(p) = α − βp        (M)
    S(p) = −γ + δp

with α > 0, γ ≥ 0 and β, δ > 0. Since producers supply positive quantities, we set
a = γ/δ ≥ 0 (because S(p) ≥ 0 if and only if p ≥ γ/δ); similarly, since consumers demand
positive quantities, we set b = α/β (because D(p) ≥ 0 if and only if p ≤ α/β). There can
be trade only at prices that belong to the interval

    [a, b] = [γ/δ, α/β]        (8.16)
8.4. APPLICATION: PRICES AND EXPECTATIONS 189
where both quantities are positive. So, we consider demand and supply functions defined
only on this interval even though, mathematically, they are straight lines defined on the
entire real line.⁶
For our linear economy, the equilibrium condition becomes

    α − βp = −γ + δp

so that

    p̄ = (α + γ)/(β + δ)        (8.17)

and

    q̄ = D(p̄) = α − βp̄ = (αδ − βγ)/(β + δ)

Note that, equivalently, we can retrieve the equilibrium quantity via the supply function:

    q̄ = S(p̄) = −γ + δp̄ = (αδ − βγ)/(β + δ)
    D(p_t) = α − βp_t        (M_t)
    S(p_t) = −γ + δp_t

    q_t = D(p_t) = S(p_t)    for all t ≥ 1

It is easy to check that the resulting sequence of equilibrium prices {p_t} is constant:

    p_t = (α + γ)/(β + δ)    for all t ≥ 1        (8.18)

We thus get back the equilibrium price (8.17) of market M. This is not surprising: because
of our assumptions, the markets M_t are independent and, at each t, we have a market identical
to M.
The hypothesis of instantaneous production upon which our analysis relies is, however,
implausible. Let us make the more plausible hypothesis that producers can adjust their
production only after one period: their production technology requires that the quantity
that they supply at t has to be decided at t − 1 (to harvest potatoes at t, we need to sow at
t − 1).
At the decision time t − 1, producers do not know the value of the future equilibrium
price p_t; they can only have a subjective expectation about it. Denote by E_{t−1}(p_t) this
expected value. In this case the market at t, denoted by MR_t, has the form

    D(p_t) = α − βp_t        (MR_t)
    S(E_{t−1}(p_t)) = −γ + δE_{t−1}(p_t)

where the expectation E_{t−1}(p_t) replaces the price p_t as an argument of the supply function.
Indeed, producers' decisions now rely upon this expectation.
Definition 274 A triple of sequences of prices {p_t} ∈ [a, b]^∞, quantities {q_t} ∈ R^∞_+, and
expectations {E_{t−1}(p_t)} ∈ [a, b]^∞ is called a uniperiodal market equilibrium of the markets MR_t
if

    q_t = D(p_t) = S(E_{t−1}(p_t))    for all t ≥ 1

    0 ≤ E_{t−1}(p_t)    for all t ≥ 1

This inequality is a necessary condition for equilibrium expectations. But, except for such simple
inequalities, there are no restrictions on equilibrium expectations: they just have to balance
with prices, nothing else.
In view of (8.19), at a uniperiodal market equilibrium, prices then evolve according to the
linear recursion

    p_t = (α + γ)/β − (δ/β) p_{t−1}    for all t ≥ 2        (8.21)

So, starting from an initial expectation, prices are determined by recurrence. Expecta-
tions no longer play an explicit role in the evolution of prices, which dramatically simplifies
the analysis. Yet, one should not forget that, though they do not appear in the recursion,
expectations are key in the underlying economic process. Specifically, once we fix a value of
E_0(p_1), from (8.22) we have the initial equilibrium price, which in turn determines both the
expectation E_1(p_2) via (8.20) and the next equilibrium price p_2 via the recursion (8.21), and
so on and so forth. So, starting from an initial expectation, this process features equilibrium
sequences {p_t} and {E_{t−1}(p_t)} of prices and expectations.
Assume, instead, that producers expect the future price to be an average of the last
two observed prices:

    E_{t−1}(p_t) = (1/2) p_{t−1} + (1/2) p_{t−2}    for all t ≥ 3        (8.23)

with arbitrary initial expectations E_0(p_1) and E_1(p_2). In view of (8.19), at a uniperiodal
market equilibrium, prices then evolve according to the following linear recursion of order 2:

    p_t = (α + γ)/β − (δ/2β) p_{t−1} − (δ/2β) p_{t−2}    for t ≥ 3

with p_1 and p_2 determined via (8.19) by the arbitrary initial expectations E_0(p_1) and
E_1(p_2). Expectations based on (possibly weighted) averages of past prices – the so-called
extrapolative expectations – make it possible to describe equilibrium prices via a linear
recurrence, a very tractable form. It is, however, a quite naive mechanism of price formation:
agents might well feature more sophisticated ways to form expectations (as readers will learn
in some economics course).
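A price path under extrapolative expectations can be simulated. The sketch below assumes the market-clearing rule p_t = (α + γ − δ·E_{t−1}(p_t))/β for a linear market D(p) = α − βp, S(E) = −γ + δE, with illustrative symbols and parameter values of our choosing:

```python
def price_path(alpha, beta, gamma, delta, e0, e1, T):
    """Prices under the order-2 recursion induced by the extrapolative
    expectation E_{t-1}(p_t) = (p_{t-1} + p_{t-2}) / 2 (symbols illustrative)."""
    clear = lambda e: (alpha + gamma - delta * e) / beta   # market clearing
    p = [clear(e0), clear(e1)]                             # initial prices
    for _ in range(T - 2):
        expectation = (p[-1] + p[-2]) / 2                  # average of last two
        p.append(clear(expectation))
    return p

path = price_path(10.0, 1.0, 2.0, 0.5, e0=3.0, e1=5.0, T=50)
# With delta/beta < 1 the path settles at the steady state (alpha+gamma)/(beta+delta)
assert abs(path[-1] - 12.0 / 1.5) < 1e-6
```

The damped oscillation toward the steady state is the classic "cobweb" behavior of such markets.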
⁸ Indeed, expectations on the initial price p_1 cannot rely on any previous price information.
    {−1, 1, −1, 1, ...}        (8.24)

    Im f = {f(n) : n ≥ 1}

of the sequence, which consists exactly of the values that the sequence takes on, disregarding
repetitions. For example, the image of the alternating sequence (8.24) is {−1, 1}, while for
the constant sequence (8.25) it is the singleton {2}. The image thus gives an important
piece of information in that it indicates which values the sequence actually takes on, net of
repetitions: as we have seen, such values may be very few and just repeat themselves over and
over again along the sequence. On the other hand, the sequence of the odd numbers (8.6) does
not contain any repetition; its image consists of all its terms, that is, Im f = {2n − 1 : n ≥ 1}.
Through the image, in Section 6.4.3 we studied some notions of boundedness for functions.
In the special case of sequences – i.e., of functions f : N_+ → R – these notions take the
following form. A sequence {x_n} is:

(i) bounded (from) above if there exists k ∈ R such that x_n ≤ k for every n ≥ 1;

(ii) bounded (from) below if there exists k ∈ R such that x_n ≥ k for every n ≥ 1;

(iii) bounded if it is bounded both above and below, i.e., if there exists k > 0 such that
|x_n| ≤ k for every n ≥ 1.

For example, the alternating sequence x_n = (−1)^n is bounded, while that of the odd
numbers (8.6) is only bounded below. Note that, as usual, this classification is not exhaustive
because there exist sequences that are unbounded both above and below: for example, the
(strongly) alternating sequence x_n = (−1)^n n.⁹ Such sequences are called unbounded.
Monotonic sequences are another important class of sequences. By applying to the un-
derlying function f : N_+ → R the notions of monotonicity introduced for functions (Section
6.4.4), we say that a sequence {x_n} is:

(i) increasing if

    x_{n+1} ≥ x_n    for every n ≥ 1

strictly increasing if

    x_{n+1} > x_n    for every n ≥ 1

⁹ By "unbounded above (below)" we mean "not bounded from above (below)".
8.6. EVENTUALLY: A KEY ADVERB 193
(ii) decreasing if

    x_{n+1} ≤ x_n    for every n ≥ 1

strictly decreasing if

    x_{n+1} < x_n    for every n ≥ 1

(iii) constant if it is both increasing and decreasing, i.e., if there exists k ∈ R such that

    x_n = k    for every n ≥ 1
Definition 275 We say that a sequence satisfies a property P eventually if, starting from
a certain position n = n_P, all the terms of the sequence satisfy P.

Example 276 (i) The sequence {2, 4, 6, 32, 57, 1, 3, 5, 7, 9, 11, ...} is eventually increas-
ing: indeed, starting from the 6th term, it is increasing.

(ii) The sequence {n} is eventually ≥ 1,000: indeed, all the terms of the sequence, starting
from the one in position 1,000, are ≥ 1,000.

is eventually constant. ▲
O.R. To satisfy a property eventually, the sequence in its "youth" can do whatever it wants;
what matters is that, when old enough (i.e., from a certain n onward), it settles down. Youthful
blunders are forgiven as long as, sooner or later, all the terms of the sequence satisfy
the property. ▼
8.8.1 Convergence
We start with convergence, that is, with case (i) above.
Therefore, a sequence {x_n} converges to L when, for each quantity ε > 0, arbitrarily small
but positive, there exists a position n_ε – which depends on ε! – starting from which the
distance between the terms x_n of the sequence and the limit L is always smaller than ε. A
sequence {x_n} that converges to a point L ∈ R is called convergent.

O.R. To show the convergence of a sequence to L, you have to pass a highly demanding test:
given any threshold ε > 0 selected by a relentless examiner, you have to be able to come up
with a position n_ε far enough along that all terms of the sequence that come after this position
are ε-close to L. A convergent sequence is able to pass any such test, however tough the
examiner may be (i.e., however small the posited ε > 0 is). ▼
We emphasized with an exclamation point that the position n_ε depends on ε, a key
feature of the previous definition. Moreover, this n_ε is not unique: if there exists a position
n_ε such that |x_n − L| < ε for every n ≥ n_ε, the same is true for any subsequent position,
which then also qualifies as n_ε. The choice of which among these positions to call n_ε is
irrelevant for the definition, which only requires the existence of at least one of them.
That said, there is always a smallest n_ε, which is a genuine threshold. As such, its
dependence on ε takes a natural monotonic form: this n_ε becomes larger and larger as ε
becomes smaller and smaller. The smallest n_ε thus best captures, because of its threshold
nature, the spirit of the definition: for each arbitrarily small ε > 0, there exists a threshold
n_ε – the larger, the smaller (so, the more demanding) ε is – beyond which the terms x_n are
ε-close to the limit L. The two examples that we will present shortly should clarify this
discussion.
So, in view of (8.27) we can rewrite the definition of convergence in the language of neigh-
borhoods. Conceptually, it is an important rewriting that deserves a separate mention.

Definition 278 A sequence {x_n} converges to a point L ∈ R if, for every neighborhood
B_ε(L) of L, there exists n_ε ≥ 1 such that

    n ≥ n_ε  ⟹  x_n ∈ B_ε(L)
Example 279 Consider the sequence x_n = 1/n. The natural candidate for its limit is 0.
Let us verify that this is the case. Let ε > 0. We have

    |1/n − 0| < ε  ⟺  1/n < ε  ⟺  n > 1/ε

Therefore, if we take as n_ε any integer greater than 1/ε, for example the smallest one n_ε =
[1/ε] + 1,¹⁰ we then have

    n ≥ n_ε  ⟹  0 < 1/n < ε

Therefore, 0 is indeed the limit of the sequence. For example, if ε = 10^{−100}, we have
n_ε = 10^{100} + 1. Note that we could have chosen as n_ε any integer greater than 10^{100} + 1,
which is indeed the smallest n_ε. ▲
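The threshold recipe of this example – pick n_ε = [1/ε] + 1 – can be checked mechanically for the sequence x_n = 1/n (a sketch in Python; the helper name n_eps is ours):

```python
from math import floor

def n_eps(eps):
    """The smallest threshold [1/eps] + 1: past it, 1/n < eps."""
    return floor(1 / eps) + 1

for eps in [0.5, 0.1, 0.007]:
    n0 = n_eps(eps)
    # every later term of x_n = 1/n is within eps of the limit 0
    assert all(abs(1 / n - 0) < eps for n in range(n0, n0 + 1000))
```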
Example 280 Consider the sequence (8.7), that is, x_n = 1/√(2^{n−1}). Also here the natural
candidate for its limit is 0. Let us verify this. Let ε > 0. We have

    |1/√(2^{n−1}) − 0| < ε  ⟺  1/2^{(n−1)/2} < ε  ⟺  2^{(n−1)/2} > 1/ε  ⟺  n > 1 + 2 log_2 (1/ε)

Therefore, by taking n_ε to be any integer greater than 1 + 2 log_2 ε^{−1}, for example the smallest
one n_ε = [2 + 2 log_2 ε^{−1}], we have

    n ≥ n_ε  ⟹  0 < 1/√(2^{n−1}) < ε

Therefore, 0 is the limit of the sequence. For example, if ε = 10^{−100} the smallest n_ε is
[2 + 2 log_2 10^{100}] = [2 + 200 log_2 10]. ▲
We saw two examples of sequences that converge to 0. Such sequences are called infin-
itesimal (or null). Thanks to the next result, they are of particular importance in the
computation of limits.
Proof "If". Suppose that lim_{n→∞} d(L, x_n) = 0. Let ε > 0. There exists n_ε ≥ 1 such that
d(L, x_n) < ε for every n ≥ n_ε. Therefore, x_n ∈ B_ε(L) for every n ≥ n_ε, as desired.
"Only if". Let lim_{n→∞} x_n = L. Consider the sequence of distances, whose generic term is
y_n = d(x_n, L). We have to prove that lim_{n→∞} y_n = 0, i.e., that for every ε > 0 there exists
n_ε ≥ 1 such that n ≥ n_ε implies |y_n| < ε. Since y_n ≥ 0, this is actually equivalent to showing
that

    n ≥ n_ε  ⟹  y_n < ε        (8.28)

¹⁰ Recall that [·] denotes the integer part (Section 1.4.3).
8.8. LIMITS AND ASYMPTOTIC BEHAVIOR 197
Since x_n → L, given ε > 0 there exists n_ε ≥ 1 such that d(x_n, L) < ε for every n ≥ n_ε.
Therefore, (8.28) holds.
We can thus reduce the study of the convergence of any sequence to the convergence
to 0 of the sequence of distances {d(x_n, L)}_{n≥1}. In other words, to check whether x_n → L, it is
sufficient to check whether d(x_n, L) → 0, that is, whether the sequence of distances is infinitesimal.
Since d(x_n, 0) = |x_n|, a simple noteworthy consequence of the last proposition is that

    x_n → 0  ⟺  |x_n| → 0        (8.29)

A sequence is, thus, infinitesimal if and only if it is "absolutely" infinitesimal, in that the
distances of its terms from the origin become smaller and smaller.
8.8.3 Divergence
We now consider divergence. We begin with positive divergence. The spirit of the definition
is similar, mutatis mutandis, to that of convergence (as will soon be clear).

Definition 285 A sequence {x_n} diverges positively, written x_n → +∞ or lim_{n→∞} x_n =
+∞, if for every K ∈ R there exists n_K ≥ 1 such that

    n ≥ n_K  ⟹  x_n > K

In other words, a sequence diverges positively when it eventually becomes greater than
every scalar K. Since the constant K can be taken arbitrarily large, this can happen only
if the sequence is not bounded above (it is easy to be > K when K is small, increasingly
difficult the larger K is).
Example 286 The sequence of even numbers x_n = 2n diverges positively. Indeed, let
K ∈ R. We have:

    2n > K  ⟺  n > K/2

and so we can choose as n_K any integer greater than K/2. For example, if K = 10^{100}, we
can put n_K = 10^{100}/2 + 1. Therefore, x_n = 2n diverges positively. ▲
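The "above the bar" recipe of this example – pick any integer threshold greater than K/2 – can also be checked mechanically (a sketch; the helper name n_K is ours):

```python
def n_K(K):
    """An integer threshold greater than K/2: past it, x_n = 2n exceeds the bar K."""
    return max(1, int(K // 2) + 1)

for K in [-5.0, 0.0, 10.0, 1e6]:
    n0 = n_K(K)
    # every later term of x_n = 2n is above the posited bar K
    assert all(2 * n > K for n in range(n0, n0 + 1000))
```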
O.R. For divergence there is a demanding "above the bar" test to pass: a relentless examiner
now sets an arbitrary bar K; to show the divergence of a sequence you have to come up with
a position n_K far enough along that all terms of the sequence that come after this position are
above the posited bar. A divergent sequence is able to pass any such test, however tough
the examiner may be (i.e., however high K is). ▼
Proposition 288 A sequence {x_n}, with eventually x_n > 0, diverges positively if and only if the sequence {1/x_n} converges to zero.
Proof "If". Let 1/x_n → 0. Let K > 0. Setting ε = 1/K > 0, by Definition 277 there exists n_{1/K} ≥ 1 such that 1/x_n < 1/K for every n ≥ n_{1/K}. Therefore, x_n > K for every n ≥ n_{1/K}, and by Definition 285 we have x_n → +∞.
"Only if". Let x_n → +∞ and let ε > 0. Setting K = 1/ε > 0, by Definition 285 there exists n_{1/ε} such that x_n > 1/ε for every n ≥ n_{1/ε}. Therefore, 0 < 1/x_n < ε for every n ≥ n_{1/ε}, and so 1/x_n → 0.
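Proposition 288 can be checked numerically on a finite range of indices (a heuristic sketch only; the sequence x_n = 2n, the bar K, and the cutoff N are our own choices):

```python
# Illustrate Proposition 288 on x_n = 2n: past some position n_K the terms
# exceed the bar K, and equivalently the reciprocals 1/x_n drop below 1/K.

x = lambda n: 2 * n      # diverges positively
K = 10.0
N = 1000

n_K = next(n for n in range(1, N + 1) if x(n) > K)   # first position above K
# Since x_n is increasing, every later term also clears the bar ...
assert all(x(n) > K for n in range(n_K, N + 1))
# ... which is the same as saying that 1/x_n eventually sits below 1/K.
assert all(1 / x(n) < 1 / K for n in range(n_K, N + 1))
```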
Adding, subtracting, or changing in any other way a finite number of terms of a sequence does not alter its asymptotic behavior: if it is regular, i.e., convergent or (properly) divergent, it remains so, with the same limit; if it is irregular (oscillating), it remains so. Clearly, this depends on the fact that the notion of limit requires that a property – either "hitting" an arbitrarily small neighborhood in the case of convergence, or being greater than an arbitrarily large number in the case of divergence – only holds eventually.
O.R. The smaller ε > 0 is, the smaller the neighborhood B_ε(x) of a point is. In contrast, the greater K > 0 is, the smaller the neighborhood (K, +∞] of +∞ is. For this reason, for a neighborhood of +∞ the value of K becomes significant when positive and arbitrarily large (while for a neighborhood of −∞ the value of K becomes significant when negative and arbitrarily large in absolute value). H
The neighborhoods (K, +∞] and [−∞, K) are open intervals in ℝ̄ for every K ∈ ℝ.14 That said, we can state a lemma that will be useful in defining limits of sequences.

Lemma 290 Let A ⊆ ℝ. Then: (i) +∞ is a limit point of A if and only if A is unbounded above; (ii) −∞ is a limit point of A if and only if A is unbounded below.
Proof We only prove (i), since the proof of (ii) is similar. "If". Let A be unbounded above, i.e., A has no upper bounds. Let (K, +∞] be a neighborhood of +∞. Since A has no upper bounds, K is not an upper bound of A. Therefore, there exists x ∈ A such that x > K, i.e., x ∈ (K, +∞] ∩ A and x ≠ +∞. It follows that +∞ is a limit point of A: each neighborhood of +∞ contains points of A different from +∞.
"Only if". Let +∞ be a limit point of A. We show that A does not have any upper bound. Suppose, by contradiction, that K ∈ ℝ is an upper bound of A. Since +∞ is a limit point of A, the neighborhood (K, +∞] of +∞ contains a point x ∈ A such that x ≠ +∞. Therefore K < x, contradicting the fact that K is an upper bound of A.
Example 291 The sets A such that (a, +∞) ⊆ A for some a ∈ ℝ are an important class of sets unbounded above. By Lemma 290, +∞ is a limit point of such sets A. Similarly, −∞ is a limit point of the sets A such that (−∞, a) ⊆ A for some a ∈ ℝ. N
Using the topology of ℝ̄ we can give a general definition of convergence that generalizes Definition 278 so as to include Definitions 285 and 287 of divergence as special cases. In the next definition, which unifies all previous definitions of limit of a sequence, we set:

    U(L) = B_ε(L)    if L ∈ ℝ
    U(L) = (K, +∞]   if L = +∞
    U(L) = [−∞, K)   if L = −∞
Definition 292 A sequence {x_n} in ℝ converges to a point L ∈ ℝ̄ if, for every neighborhood U(L) of L, there exists n_U ≥ 1 such that

    n ≥ n_U ⟹ x_n ∈ U(L)
O.R. If L ∈ ℝ, the position n_U depends on an arbitrary radius ε > 0 (in particular, as small as we want), so we can write n_U = n_ε. If, instead, L = +∞, then n_U depends on an arbitrary scalar K (in particular, positive and arbitrarily large), so we can write n_U = n_K. Finally, if L = −∞, then n_U depends on an arbitrary negative real number K (in particular, arbitrarily large in absolute value) and, without losing generality, we can again set n_U = n_K. Thus, when L is finite it is crucial that the property holds also for arbitrarily small values of ε. When L = ±∞, it is instead key that the property holds also for K arbitrarily large in absolute value. H
Theorem 293 (Uniqueness of the limit) A sequence {x_n} converges to at most one limit L ∈ ℝ̄.
Proof Suppose, by contradiction, that there exist two distinct limits L′ and L″ in ℝ̄. Without loss of generality, assume that L″ > L′. We consider the different possible cases and show that each of them leads to a contradiction; hence L′ = L″ and the limit is unique.
We begin with the case when both L′ and L″ are finite, i.e., L′, L″ ∈ ℝ. Take ε > 0 so that

    ε < (L″ − L′)/2

Then

    B_ε(L′) ∩ B_ε(L″) = ∅

as the reader can verify and the next figure illustrates:
[Figure: the disjoint neighborhoods (L′ − ε, L′ + ε) and (L″ − ε, L″ + ε) on the real line]
By Definition 278, there exists n′_ε ≥ 1 such that x_n ∈ B_ε(L′) for every n ≥ n′_ε, and there exists n″_ε ≥ 1 such that x_n ∈ B_ε(L″) for every n ≥ n″_ε. Setting n_ε = max{n′_ε, n″_ε}, we therefore have both x_n ∈ B_ε(L′) and x_n ∈ B_ε(L″) for every n ≥ n_ε, i.e., x_n ∈ B_ε(L′) ∩ B_ε(L″) for every n ≥ n_ε. But this contradicts B_ε(L′) ∩ B_ε(L″) = ∅. We conclude that L′ = L″, so the limit is unique.
Turn now to the case in which L′ is finite and L″ = +∞. For every ε > 0 and every K > 0, there exist n_ε and n_K such that

    L′ − ε < x_n < L′ + ε for every n ≥ n_ε   and   x_n > K for every n ≥ n_K

It is now sufficient to take K = L′ + ε to realize that, for n ≥ max{n_ε, n_K}, the two inequalities cannot coexist. Also in this case we reached a contradiction.
202 CHAPTER 8. SEQUENCES
The remaining cases can be treated in a similar way and are thus left to the reader.
The next result shows that, when a sequence converges to a point L ∈ ℝ, each neighborhood of L contains almost all the terms of the sequence.

Proposition 294 A sequence {x_n} converges to L ∈ ℝ if and only if, for each neighborhood B_ε(L) of L, all the terms of the sequence belong to B_ε(L), except at most finitely many of them.

In other words, the sequence eventually belongs to any neighborhood B_ε(L) of L.
Proof Let x_n → L. By Definition 278, for every ε > 0 there exists n_ε ≥ 1 such that x_n ∈ B_ε(L) for every n ≥ n_ε. Therefore, except at most the terms x_n with 1 ≤ n < n_ε, all the terms of the sequence belong to B_ε(L).
Vice versa, given any neighborhood B_ε(L) of L, suppose that all the terms of the sequence belong to it, except at most a finite number of them. Denote by {x_{n_k}}, with k = 1, 2, …, m, the set of the elements of the sequence that do not belong to B_ε(L). Setting n_ε = n_m + 1, we have that x_n ∈ B_ε(L) for every n ≥ n_ε. Since this is true for each neighborhood B_ε(L) of L, by Definition 278 we have x_n → L.
The next classic result shows that the terms of a convergent sequence eventually have the same sign as the limit point. In other words, the sign of the limit point eventually determines the sign of the terms of the sequence.

Theorem 295 (Permanence of sign) Let {x_n} be a sequence that converges to a limit L ≠ 0. Then x_n eventually has the same sign as L, that is, eventually x_n L > 0.
Proof Suppose L > 0 (a similar argument holds if L < 0). Let ε ∈ (0, L). By Definition 277, there exists n̄ ≥ 1 such that |x_n − L| < ε, i.e., L − ε < x_n < L + ε, for every n ≥ n̄. Since ε ∈ (0, L), we have L − ε > 0. Therefore, x_n > L − ε > 0 for every n ≥ n̄.
The last theorem established a property of limits with respect to the order structure of the real line. Next we give another simple result of the same kind, leaving the proof to the reader. A piece of notation: x_n → L ∈ ℝ̄ indicates that the sequence {x_n} either converges to L ∈ ℝ or diverges (positively or negatively).
Proposition 296 Let {x_n} and {y_n} be two sequences such that x_n → L ∈ ℝ̄ and y_n → H ∈ ℝ̄. If eventually x_n ≥ y_n, then L ≥ H.
The scope of this proposition is noteworthy. It allows us, for example, to check the positive or negative divergence of a sequence through a simple comparison with other divergent sequences. Indeed, if x_n ≥ y_n and x_n diverges negatively, so does y_n; if x_n ≥ y_n and y_n diverges positively, so does x_n.
8.9. PROPERTIES OF LIMITS 203
The converse of the proposition does not hold: for example, let L = H = 0, {x_n} = {−1/n} and {y_n} = {1/n}. We have L ≥ H, but x_n < y_n for every n. However, if we assume L > H, the converse then holds "strictly".
Proposition 297 Let {x_n} and {y_n} be two sequences such that x_n → L ∈ ℝ̄ and y_n → H ∈ ℝ̄. If L > H, then eventually x_n > y_n.
Proof We prove the statement for L, H ∈ ℝ, leaving the other cases to the reader. Let 0 < ε < (L − H)/2. Since H + ε < L − ε, we have (H − ε, H + ε) ∩ (L − ε, L + ε) = ∅. Moreover, there exist n′_ε, n″_ε ≥ 1 such that y_n ∈ (H − ε, H + ε) for every n ≥ n′_ε and x_n ∈ (L − ε, L + ε) for every n ≥ n″_ε. For every n ≥ max{n′_ε, n″_ε}, we then have y_n ∈ (H − ε, H + ε) and x_n ∈ (L − ε, L + ε), so x_n > L − ε > H + ε > y_n. We conclude that eventually x_n > y_n.
Proposition 298 Every convergent sequence is bounded.

Proof Suppose x_n → L. Setting ε = 1, there exists n_1 ≥ 1 such that x_n ∈ B_1(L) for every n ≥ n_1. Let M > 0 be a constant such that

    M > max{1, d(x_1, L), …, d(x_{n_1}, L)}

We then have d(x_n, L) < M for every n ≥ 1, i.e., |x_n − L| < M for every n ≥ 1. This implies that, for all n ≥ 1,

    L − M < x_n < L + M

Therefore, the sequence is bounded.
Thanks to this proposition, the convergent sequences form a subset of the bounded ones. Therefore, if a sequence is unbounded, it cannot be convergent.
In general, the converse of Proposition 298 is false: for example, the alternating sequence x_n = (−1)^n is bounded but does not converge. A partial converse will soon be established by the Bolzano-Weierstrass Theorem. A full-fledged converse, however, holds for the important class of monotonic sequences: for such sequences, boundedness is both a necessary and sufficient condition for convergence. This result is actually a corollary of the following general theorem on the asymptotic behavior of monotonic sequences.
Theorem 299 A monotonic sequence {x_n} is regular: if it is increasing, lim x_n = sup_n x_n; if it is decreasing, lim x_n = inf_n x_n.

Proof Let {x_n} be an increasing sequence (the proof for decreasing sequences is similar). It can be either bounded or unbounded above (for sure, it is bounded below because x_1 ≤ x_n for every n ≥ 1). Suppose that {x_n} is bounded. We want to prove that it is convergent. Let E be the image of the sequence. By hypothesis, it is a bounded subset of ℝ. By the Least Upper Bound Principle, sup E exists. Set L = sup E. Let us prove that x_n → L. Let ε > 0. Since L is the supremum of E, by Proposition 120 we have: (i) L ≥ x_n for every n ≥ 1; (ii) there exists an element x_{n_ε} of E such that x_{n_ε} > L − ε. Since {x_n} is an increasing sequence, it then follows that

    L ≥ x_n ≥ x_{n_ε} > L − ε   for every n ≥ n_ε

Hence, x_n ∈ B_ε(L) for every n ≥ n_ε, as desired.
Suppose now that {x_n} is unbounded above. Then, for every K > 0 there exists an element x_{n_K} such that x_{n_K} > K. Since {x_n} is increasing, we then have x_n ≥ x_{n_K} > K for every n ≥ n_K, so the sequence diverges to +∞.
Thus, monotonic sequences cannot be irregular. We are now able to state the result anticipated above on the equivalence of boundedness and convergence for monotonic sequences.

Corollary 300 A monotonic sequence is convergent if and only if it is bounded.

Needless to say, the results just discussed hold, more generally, for sequences that are eventually monotonic.
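A finite numerical sketch of this equivalence for one particular increasing bounded sequence (x_n = n/(n + 1), our own choice; the cutoff is arbitrary):

```python
# Monotone convergence, numerically: the increasing bounded sequence
# x_n = n/(n+1) climbs toward the supremum of its image, L = 1.

terms = [n / (n + 1) for n in range(1, 10001)]

assert all(a < b for a, b in zip(terms, terms[1:]))   # strictly increasing
assert all(t < 1 for t in terms)                      # bounded above by 1
assert 1 - terms[-1] < 1e-3                           # approaching sup = 1
```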
8.9.2 Bolzano-Weierstrass' Theorem
The famous Bolzano-Weierstrass Theorem is a partial converse of Proposition 298. It is the deepest result of this chapter, with far-reaching consequences. To state it, we must first introduce subsequences. Consider a sequence {x_n}. Given a strictly increasing sequence {n_k}_{k=1}^∞ that takes on only strictly positive integer values, i.e.,

    n_1 < n_2 < ⋯ < n_k < ⋯

the sequence

    {x_{n_k}}_{k=1}^∞ = {x_{n_1}, x_{n_2}, x_{n_3}, …, x_{n_k}, …}

is called a subsequence of {x_n}. In words, the subsequence {x_{n_k}} is a new sequence constructed from the original sequence {x_n} by taking only the terms of position n_k. A few examples should clarify.
Consider, for instance, the sequence x_n = 1/n, namely

    1, 1/2, 1/3, 1/4, …, 1/n, …     (8.30)

A first subsequence of (8.30) is

    1, 1/3, 1/5, 1/7, …, 1/(2k − 1), …

where {n_k}_{k≥1} is the sequence of the odd numbers {1, 3, 5, …}. Thus, this subsequence has been constructed by selecting the elements of odd position in the original sequence. Another subsequence of (8.30) is given by

    1/2, 1/4, 1/8, 1/16, …, 1/2^k, …

where now {n_k}_{k≥1} is formed by the powers of 2, that is, 2, 2^2, 2^3, …. This subsequence is constructed by selecting the elements of the original sequence whose position is a power of 2. N
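The mechanics of sampling a sequence at increasing positions can be sketched in code (an illustration of the definition only; the helper name `subsequence` is our own):

```python
# A subsequence {x_{n_k}} is obtained by sampling the original sequence at
# a strictly increasing list of positions n_1 < n_2 < ...

from fractions import Fraction

def subsequence(x, positions):
    """Terms of {x_n} at the given strictly increasing positions."""
    assert all(a < b for a, b in zip(positions, positions[1:]))
    return [x(n) for n in positions]

x = lambda n: Fraction(1, n)     # the sequence 1, 1/2, 1/3, ...

odd = subsequence(x, [2 * k - 1 for k in range(1, 5)])   # odd positions
pow2 = subsequence(x, [2 ** k for k in range(1, 5)])     # positions 2, 4, 8, 16
```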
Consider now the alternating sequence x_n = (−1)^n, which has the two constant subsequences

    1, 1, 1, …, 1, …     (8.31)

and

    −1, −1, −1, …, −1, …     (8.32)

obtained by selecting the elements of even and of odd position, respectively. By taking {n_k}_{k≥1} = {1000k}, i.e., by selecting only the elements of positions 1,000, 2,000, 3,000, …, we still get the subsequence (8.31). On the other hand, (8.31) is not a subsequence of (8.30) because the term 1 appears only at the initial position of (8.30). N
Proposition 303 A sequence is regular, with limit L ∈ ℝ̄, if and only if all its subsequences are regular with the same limit L.

Proof We prove the result for L ∈ ℝ, leaving the case L = ±∞ to the reader. "Only if". Suppose that {x_n} converges to L. Let ε > 0. There exists n_ε ≥ 1 such that |x_n − L| < ε for every n ≥ n_ε. Let {x_{n_k}}_{k=1}^∞ be a subsequence of {x_n}. Since n_k ≥ k for every k ≥ 1, a fortiori we have |x_{n_k} − L| < ε for every k ≥ n_ε, so that {x_{n_k}} converges to L.
"If". Suppose that each subsequence of {x_n} converges to L. Suppose, by contradiction, that {x_n} does not converge to L. Then, there is an ε_0 > 0 such that, for every integer k ≥ 1, there exists a position n_k ≥ k for which x_{n_k} ∉ B_{ε_0}(L), i.e., |x_{n_k} − L| ≥ ε_0. Construct the sequence of such x_{n_k}.15 It is a subsequence of {x_n} that, by construction, does not converge to L. So, we reached a contradiction. We conclude that {x_n} converges to L.
15
For the first term we take k = 1 and the integer n_1 ≥ 1 such that |x_{n_1} − L| ≥ ε_0; for the second term we take k = 2 and the integer n_2 ≥ 2 such that |x_{n_2} − L| ≥ ε_0; and so on.
Theorem 304 (Bolzano-Weierstrass) Every bounded sequence has (at least) one convergent subsequence.

In other words, from any bounded sequence {x_n}, even a highly irregular one, it is always possible to extract a convergent subsequence {x_{n_k}}, i.e., one for which there exists L ∈ ℝ with lim_{k→∞} x_{n_k} = L. So, we can always extract convergent behavior from any bounded sequence, a truly remarkable property.
Example 305 The alternating sequence x_n = (−1)^n is bounded because its image is the bounded set {−1, 1}. By the Bolzano-Weierstrass Theorem, it has at least one convergent subsequence. Indeed, such are the constant subsequences (8.31) and (8.32). N
Lemma 306 Every sequence has a monotonic subsequence.

Proof Let {x_n} be a sequence. Two cases are possible.
Case 1: for every n ≥ 1 there exists m > n such that x_m ≤ x_n. Set n_1 = 1. Let n_2 > n_1 be such that x_{n_2} ≤ x_{n_1}; then let n_3 > n_2 be such that x_{n_3} ≤ x_{n_2}, and so on. In this way we construct a decreasing monotonic subsequence {x_{n_k}}, so the lemma is proved in this case.
Case 2: there exists a position n ≥ 1 such that, for each m > n, we have x_m > x_n. Let I ⊆ ℕ be the set of all the positions with this property. If I is a finite set, then Case 1 holds for all the positions n > max I. By considering n > max I, we can therefore construct, as in Case 1, a decreasing monotonic subsequence {x_{n_k}}.
Suppose, instead, that I is not finite. Then there exist infinitely many positions n ≥ 1 such that

    m > n ⟹ x_m > x_n     (8.33)

Since they are infinitely many, we can write I = {n_1, n_2, …, n_k, …}, with n_1 < n_2 < ⋯ < n_k < ⋯. By (8.33), we have

    x_{n_1} < x_{n_2} < ⋯ < x_{n_k} < ⋯

The subsequence {x_{n_k}} is, therefore, monotonic increasing. This completes the proof of the lemma also in Case 2.
Proposition 307 Every unbounded sequence has a divergent subsequence (to +∞ if unbounded above, to −∞ if unbounded below).16

Proof Suppose that the sequence is unbounded above (the other case is similar). Then, for every K > 0 there exists at least one element of the sequence greater than K. Denote by x_{n_K} the term of smallest position in the sequence {x_n} that turns out to be > K. By taking K = 1, 2, …, the resulting sequence {x_{n_K}} is a subsequence of {x_n} (indeed, all its terms have been taken among those of {x_n}) that diverges to +∞.
Summing up: every sequence admits a subsequence x_{n_k} → L ∈ ℝ̄, that is, a subsequence that either converges or diverges (positively or negatively).17 Remarkably, from any sequence, however wild, we can always extract a regular asymptotic behavior.
O.R. The Bolzano-Weierstrass Theorem says that it is not possible to take infinitely many scalars (the elements of the sequence) in a bounded interval in such a way that they (or a part of them) are "well separated" from one another: necessarily, they crowd in the proximity of (at least) one point. More generally, the last proposition says that there is no way of taking infinitely many scalars without at least a part of them crowding somewhere (in the proximity either of a finite number, or of +∞, or of −∞, i.e., of some point of ℝ̄). H
(i) x_n + y_n → L + H, provided that L + H is not an indeterminate form (1.26), of the type

    +∞ − ∞ or −∞ + ∞
16
If it is both unbounded above and unbounded below, it has both a subsequence diverging to +∞ and a subsequence diverging to −∞.
17
Recall that x_n → L ∈ ℝ̄ indicates that the sequence {x_n} either converges to L ∈ ℝ or diverges positively or negatively.
(ii) x_n y_n → LH, provided that LH is not an indeterminate form (1.26), of the type

    ±∞ · 0 or 0 · (±∞)
(ii) Let x_n → L and y_n → H, with L, H ∈ ℝ. This means that, for every ε > 0, there exist n_1 and n_2 such that

    L − ε < x_n < L + ε for every n ≥ n_1   and   H − ε < y_n < H + ε for every n ≥ n_2

Moreover, being convergent, {y_n} is bounded (recall Proposition 298): there exists b > 0 such that |y_n| ≤ b for every n. Now, for every n ≥ n_3 = max{n_1, n_2},

    |x_n y_n − LH| = |y_n (x_n − L) + L (y_n − H)| ≤ |y_n| |x_n − L| + |L| |y_n − H| < ε (b + |L|)

By the arbitrariness of ε (b + |L|), we conclude that x_n y_n → LH.
If L > 0 and H = +∞, then in addition to having, for every ε > 0,

    L − ε < x_n < L + ε for every n ≥ n_1

we also have, for every K > 0, y_n > K for every n ≥ n_2. It follows that, for every n ≥ n_3 = max{n_1, n_2},

    x_n y_n > (L − ε) K

By the arbitrariness of (L − ε) K > 0, we conclude that x_n y_n → +∞. If L < 0 and H = +∞, we have x_n y_n < (L + ε) K and therefore x_n y_n → −∞. The other cases of infinite limits are treated in an analogous way.
The following result shows that the case a/0 of point (iii), with a ≠ 0, is actually not indeterminate for the algebra of limits, although it is so for the extended real line (as seen in Section 1.7).

Proposition 311 Let {x_n} and {y_n} be sequences such that x_n → L ≠ 0 and y_n → 0. Then {x_n/y_n} diverges, positively or negatively, if and only if {y_n} eventually has constant sign.19

This proposition does not, unfortunately, say anything about the case a = 0, that is, about the indeterminate form 0/0.
Proof Let us prove the "only if" part (we leave the rest of the proof to the reader). Let L > 0 (the case L < 0 is similar). Suppose that the sequence {y_n} does not eventually have constant sign. Then there exist two subsequences {y_{n_k}} and {y_{n′_k}} such that y_{n_k} → 0⁺ and y_{n′_k} → 0⁻. Therefore, x_{n_k}/y_{n_k} → +∞ while x_{n′_k}/y_{n′_k} → −∞. Since two subsequences of {x_n/y_n} have distinct limits, Proposition 303 shows that the sequence {x_n/y_n} has no limit.
Summing up, in view of the last two propositions we have the following indeterminate forms for the limits:

    +∞ − ∞ or −∞ + ∞     (8.34)

which is often denoted by just writing ∞ − ∞;

    ±∞ · 0 or 0 · (±∞)     (8.35)

and

    ±∞/±∞ or 0/0     (8.36)

which are often denoted by just writing ∞/∞ and 0/0. Section 8.10.3 will be devoted to these indeterminate forms.
19
That is, its terms are eventually either all positive or all negative.
Besides the basic operations, the next result shows that limits also interchange nicely with the power (and the root, which is a special case), the exponential, and the logarithm. Indeed, (12.8) of Chapter 12 will show that such nice interchange holds, more generally, for all functions that – like the power, exponential, and logarithm functions – are continuous. We thus omit the proof of the next result.
we have:20

We have, therefore, also the following indeterminate forms for the limits:

    1^∞, ∞^0, 0^0

Two elementary limits are

    lim n = +∞   and   lim 1/n = 0

the latter because 0 < 1/n < ε for every n ≥ [1/ε] + 1.
As anticipated, from these two elementary limits we can infer, via the algebra of limits, many other ones. Specifically:

(iii) we have

    lim α^n = +∞ if α > 1;  1 if α = 1;  0⁺ if 0 < α < 1

and

    lim log_α n = +∞ if α > 1;  −∞ if 0 < α < 1
Many other limits hold; for example,

    lim (5n^7 + n^2 + 1) = +∞ + ∞ + 1 = +∞

as well as

    lim (n^2 − 3n + 1) = lim n^2 (1 − 3/n + 1/n^2) = +∞ · (1 − 0 + 0) = +∞

    lim (n^2 − 5n − 7)/(2n^2 + 4n + 6) = lim [n^2 (1 − 5/n − 7/n^2)] / [n^2 (2 + 4/n + 6/n^2)] = (1 − 0 − 0)/(2 + 0 + 0) = 1/2

    lim (5 − 1/n)/n^2 = lim (1/n^2)(5 − 1/n) = 0 · (5 − 0) = 0

and

    lim [n (n + 1) (n + 2)] / [(2n − 1)(3n − 2)(5n − 4)]
        = lim [n · n(1 + 1/n) · n(1 + 2/n)] / [2n(1 − 1/(2n)) · 3n(1 − 2/(3n)) · 5n(1 − 4/(5n))]
        = lim [(1 + 1/n)(1 + 2/n)] / [30 (1 − 1/(2n))(1 − 2/(3n))(1 − 4/(5n))]
        = 1/(30 · 1 · 1 · 1) = 1/30
Indeterminate form ∞ − ∞
Consider the indeterminate form ∞ − ∞. For example, the limit of the sum x_n + y_n of the sequences x_n = n and y_n = −n^2 falls under this form of indetermination, so one cannot resort to the algebra of limits. We have, however,

    x_n + y_n = n − n^2 = n (1 − n) → +∞ · (−∞) = −∞

since the product is no longer indeterminate. We have to study the two sequences carefully and come up, each time, with a way out of the indeterminacy (as in the simple examples just discussed). The same is true for the other indeterminate forms, as will be seen next.
Indeterminate form 0 · ∞
Let, for example, x_n = 1/n and y_n = n^3. The limit of their product has the indeterminate form 0 · ∞, so we cannot use the algebra of limits. We have, however,

    lim x_n y_n = lim (1/n) n^3 = lim n^2 = +∞

If x_n = 1/n^3 and y_n = n, then

    lim x_n y_n = lim (1/n^3) n = lim 1/n^2 = 0

If x_n = n^3 and y_n = 7/n^3, then

    lim x_n y_n = lim n^3 (7/n^3) = lim 7 = 7
If x_n = 1/n and y_n = n (cos n + 2),22 then lim x_n y_n = lim (cos n + 2), which does not exist.

Indeterminate form ∞/∞
If x_n = n and y_n = n^2, the limit of the ratio x_n/y_n has the indeterminate form ∞/∞; nonetheless, lim x_n/y_n = lim n/n^2 = lim 1/n = 0. On the other hand, by exchanging x_n with y_n, the indeterminate form ∞/∞ remains but

    lim y_n/x_n = lim n^2/n = lim n = +∞

If x_n = n^2 and y_n = 1 + 2n^2, then

    lim x_n/y_n = lim n^2/(1 + 2n^2) = lim 1/(1/n^2 + 2) = 1/2
22
Using the comparison criterion, which we will study shortly (Theorem 314), it is easy to prove that y_n → +∞.
23
Since x_n/y_n = 1/(y_n/x_n), Proposition 288 relates the two limits.
The indeterminate forms ∞/∞ and 0/0 are closely connected: if the limit of the ratio of the sequences {x_n} and {y_n} falls under the indeterminate form ∞/∞, then the limit of the ratio of the sequences {1/x_n} and {1/y_n} falls under the indeterminate form 0/0, and vice versa.
For the sum we have the following table, where the cells report the value of lim (x_n + y_n), with lim x_n given by the column and lim y_n by the row:

    sum  |  +∞    L     −∞
    -----+----------------
    +∞   |  +∞    +∞    ??
    H    |  +∞    L+H   −∞
    −∞   |  ??    −∞    −∞

Finally, for the ratio we have the following table, where the cells report the value of lim (x_n/y_n):

    ratio |  +∞    L>0   0     L<0   −∞
    ------+---------------------------
    +∞    |  ??    0     0     0     ??
    H>0   |  +∞    L/H   0     L/H   −∞
    0     |  ±∞    ±∞    ??    ±∞    ±∞
    H<0   |  −∞    L/H   0     L/H   +∞
    −∞    |  ??    0     0     0     ??
In view of Proposition 311, in the third row we assumed that y_n tends to 0 from above, y_n → 0⁺, or from below, y_n → 0⁻. In turn, this determines the sign of the infinity; for example,

    lim [1/(1/n)] = lim n = +∞   and   lim [1/(−1/n)] = lim (−n) = −∞
For the ratio, we thus have five indeterminate cases out of twenty-five.
The tables make it clear that in the majority of the cases we can rely upon the algebra of limits (in particular, Propositions 309 and 313). Only relatively few cases are actually indeterminate.
O.R. The case 0^∞ is not indeterminate. Clearly, it is shorthand notation for lim x_n^{y_n}, where the base is a sequence (positive, otherwise the power is not defined) approaching 0 (more precisely, 0⁺) and the exponent is a divergent sequence. We can set 0^{+∞} = 0: if we multiply 0 by itself "infinitely many times" we still get zero (a "zerissimo", if you wish). The form 0^{−∞} is the reciprocal, so 0^{−∞} = +∞. H
(i) If x_n, y_n → ±∞, their ratio x_n/y_n appears in the form ∞/∞, but it is sufficient to write the ratio as

    x_n · (1/y_n)

to get the form ∞ · 0.

(ii) If x_n, y_n → 0, their ratio x_n/y_n appears in the form 0/0, but it is sufficient to write the ratio as

    x_n · (1/y_n)

to get the form 0 · ∞.
(iv) For the last three cases it is sufficient to take logarithms to end up, again, in the case 0 · ∞. Indeed,

    log x_n^{y_n} = y_n log x_n

is of the form 0 · (±∞) in each of them.
The reader can try to reduce all the forms of indeterminacy to either 0/0 or ∞/∞.
Theorem 314 (Comparison criterion) Let {x_n}, {y_n}, and {z_n} be three sequences. If, eventually,

    y_n ≤ x_n ≤ z_n     (8.37)

and

    lim y_n = lim z_n = L ∈ ℝ̄     (8.38)

then

    lim x_n = L
We can think of {x_n} as a convict escorted by the two policemen {y_n} and {z_n} (one on each "side"), so he is forced to go wherever they go.
Proof Suppose L ∈ ℝ (we leave the case L = ±∞ to the reader). Let ε > 0. From (8.38) it follows, by Definition 278, that there exists n_1 such that y_n ∈ B_ε(L) for every n ≥ n_1, and there exists n_2 such that z_n ∈ B_ε(L) for every n ≥ n_2. Finally, let n_3 be the position starting from which y_n ≤ x_n ≤ z_n holds. Setting n̄ = max{n_1, n_2, n_3}, we then have y_n ∈ B_ε(L), z_n ∈ B_ε(L), and y_n ≤ x_n ≤ z_n for every n ≥ n̄. So,

    L − ε < y_n ≤ x_n ≤ z_n < L + ε

for every n ≥ n̄, i.e., x_n ∈ B_ε(L) for every n ≥ n̄, as desired.
The typical use of this result is in proving the convergence of a given sequence by showing
that it can be “trapped” between two suitable convergent sequences.
Example 315 Consider the sequence x_n = n^{−2} sin^2 n. Since −1 ≤ sin n ≤ 1 for every n ≥ 1, we have 0 ≤ sin^2 n ≤ 1 for every n ≥ 1. So,

    0 ≤ sin^2 n / n^2 ≤ 1/n^2   for every n ≥ 1

Take y_n = 0 and z_n = 1/n^2. Conditions (8.37) and (8.38) hold with L = 0. By the comparison criterion, we conclude that lim x_n = 0. N
The previous example suggests that, if {x_n} is a bounded sequence, say −k ≤ x_n ≤ k for all n ≥ 1, and y_n → +∞ or y_n → −∞, then

    x_n / y_n → 0

Indeed, we have

    −k/|y_n| ≤ x_n/y_n ≤ k/|y_n|

and k/|y_n| → 0.
Theorem 317 (Ratio criterion) If there exists a scalar q < 1 such that, eventually,

    |x_{n+1}/x_n| ≤ q     (8.39)

then lim x_n = 0.

Condition (8.39) requires that the sequence of the absolute values |x_n| be eventually strictly decreasing, i.e., eventually |x_{n+1}| < |x_n|. By Corollary 300, we then have |x_n| ↓ L for some L ≥ 0. The theorem claims that, indeed, L = 0.
Proof Suppose that the inequality holds starting from n = 1 (if it held only from a certain n onwards, just recall that eliminating a finite number of terms does not alter the limit). It is enough to prove that |x_n| → 0 (recall (8.29)). From (8.39), it follows that |x_{n+1}| ≤ q |x_n|. In particular, by iterating this inequality from n = 1 we have

    |x_n| ≤ q |x_{n−1}| ≤ q^2 |x_{n−2}| ≤ ⋯ ≤ q^{n−1} |x_1|

So,

    0 ≤ |x_n| ≤ q^{n−1} |x_1|   for every n ≥ 2

Since 0 < q < 1, we have q^{n−1} → 0. So, by the comparison criterion, |x_n| → 0.
Note that the theorem does not simply require the ratio |x_{n+1}/x_n| to be < 1, that is,

    |x_{n+1}/x_n| < 1

but that it be "far" from 1, i.e., smaller than a number q which, in turn, is itself smaller than 1. The next example clarifies this observation.
Example 318 The sequence x_n = (−1)^n (1 + 1/n) does not converge: the subsequence of its terms of even position tends to +1, whereas that of its terms of odd position tends to −1. Yet

    |x_{n+1}/x_n| = (1 + 1/(n+1)) / (1 + 1/n) = (n^2 + 2n)/(n^2 + 2n + 1) < 1

for every n ≥ 1. N
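Example 318 can be replayed numerically: the ratio stays strictly below 1 yet creeps up toward 1, so no fixed bar q < 1 eventually dominates it (a finite-range sketch only):

```python
# Example 318: for x_n = (-1)^n (1 + 1/n) the ratio |x_{n+1}/x_n| is < 1
# for every n, but it tends to 1, so the ratio criterion does not apply.

def x(n):
    return (-1) ** n * (1 + 1 / n)

ratios = [abs(x(n + 1) / x(n)) for n in range(1, 1000)]
assert all(r < 1 for r in ratios)   # always below the bar 1 ...
assert ratios[-1] > 0.999           # ... but creeping up toward 1
```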
More generally, applying the criterion to the sequence {x_n − L} yields

    |(x_{n+1} − L)/(x_n − L)| ≤ q eventually ⟹ x_n → L

The ratio criterion (and also the root criterion that we will see shortly) thus applies, mutatis mutandis, to the study of any convergence x_n → L.
An important case in which condition (8.39) holds is when the ratio |x_{n+1}/x_n| has a limit, and such limit is < 1, that is,

    lim |x_{n+1}/x_n| < 1     (8.40)

Indeed, denote this limit by L and let ε > 0 be such that L + ε < 1. By the definition of limit, eventually we have

    | |x_{n+1}/x_n| − L | < ε

that is, L − ε < |x_{n+1}/x_n| < L + ε. Therefore, by setting q = L + ε it follows that eventually |x_{n+1}/x_n| < q, which is property (8.39).
The limit form (8.40) is actually the most common form in which the ratio criterion is
applied. The next common limits illustrate its use:
For every α > 1 and k > 0,

    lim n^k / α^n = 0     (8.41)

Indeed, set

    x_n = n^k / α^n

By taking the ratio of two consecutive terms (the absolute value is here irrelevant, since all terms are positive), we have

    x_{n+1}/x_n = [(n+1)^k / α^{n+1}] · [α^n / n^k] = (1/α) ((n+1)/n)^k = (1/α) (1 + 1/n)^k → 1/α < 1

so (8.40) holds and lim x_n = 0. Similarly, for every k > 0,

    lim log^k n / n = 0
O.R. What precedes indicates a hierarchy among the following classes of divergent sequences:

    α^n with α > 1;  n^k with k > 0;  log^k n with k > 0     (8.42)

The "strongest" are the exponentials, graded according to the base α; then the powers follow, graded according to the exponent k; and, finally, the logarithms, graded according to the exponent k. For example, we have

    (n^4 − 3n^3 + 6n^2 − 4) / (5n^4 + 7n^3 + 25n^2 + 342) → 1/5

because the numerator inherits the behavior of n^4 and the denominator that of 5n^4. Soon, in Section 8.14, we will make these observations on limits rigorous via rates of convergence (or divergence). H
Theorem 319 (Root criterion) If there exists a scalar q < 1 such that, eventually,

    (|x_n|)^{1/n} ≤ q     (8.43)

then lim x_n = 0.
Proof As in the previous proof, suppose that (8.43) holds starting with n = 1. From

    (|x_n|)^{1/n} ≤ q

it follows that 0 ≤ |x_n| ≤ q^n for every n ≥ 1. Since 0 < q < 1, we have q^n → 0, so the comparison criterion yields |x_n| → 0, i.e., lim x_n = 0.

For the root criterion we can make observations similar to those made for the ratio criterion. In particular, property (8.43) holds if the sequence {(|x_n|)^{1/n}} has a limit, and such limit is < 1, that is,

    lim (|x_n|)^{1/n} < 1     (8.44)
This limit form is the most common form in which the criterion is applied.
The next simple example shows that both the ratio and the root criteria are sufficient, but not necessary, conditions for convergence. However useful, they may turn out to be of no help in establishing the convergence of some sequences.
In sum, neither criterion is of any use for such a simple limit. N
Finally, note that both the sequences x_n = 1/n and x_n = (−1)^n satisfy the condition

    |x_{n+1}/x_n| → 1

although the first sequence converges to 0 and the second one does not converge at all. Therefore, this condition does not allow us to draw any conclusion about the asymptotic behavior of a sequence. The same is true for the condition

    (|x_n|)^{1/n} → 1

Indeed, it is enough to look at the sequences x_n = n and x_n = 1/n. All this confirms the key importance of the "strict" clause < 1 in (8.40) and (8.44). The next classic limit further illustrates this remark.
Proposition 322 For every k > 0, we have lim k^{1/n} = 1.
Theorem 323 (Cauchy) A sequence {x_n} is convergent if and only if it satisfies the Cauchy condition: for each ε > 0 there exists an integer n_ε ≥ 1 such that

    n, m ≥ n_ε ⟹ |x_n − x_m| < ε

Sequences that satisfy the Cauchy condition are called Cauchy sequences. The Cauchy condition is an intrinsic condition that involves only the terms of the sequence. According to the theorem, a sequence converges if and only if it is Cauchy. Thus, to determine whether a sequence converges it is enough to check whether it is Cauchy, something that does not require considering any extraneous object and relies just on the sequence itself.
But, as usual, there are no free meals: checking that a sequence is Cauchy informs us about its convergence, but it says nothing about the actual limit point. To find it, we need to go back to the usual procedure, which requires that a candidate limit be posited.
Proof "Only if". If x_n → L then, by definition, for each ε > 0 there exists n_ε ≥ 1 such that |x_n − L| < ε for every n ≥ n_ε. This implies that, for every n, m ≥ n_ε,

    |x_n − x_m| ≤ |x_n − L| + |L − x_m| < ε + ε = 2ε

and, since ε is arbitrary, the Cauchy condition holds.
"If". Suppose that {x_n} is a Cauchy sequence. Fix ε > 0, along with the corresponding n_ε, and consider the sets A = {a ∈ ℝ : eventually x_n > a} and B = {b ∈ ℝ : eventually x_n < b}. Three properties hold:

(i) A and B are not empty. Indeed, we have x_{n_ε} − ε ∈ A and x_{n_ε} + ε ∈ B.

(ii) If a ∈ A and b ∈ B, then b > a. Indeed, since a ∈ A (respectively, b ∈ B), there exists n_a ≥ 1 such that x_n > a for every n ≥ n_a (resp., there exists n_b ≥ 1 such that b > x_n for every n ≥ n_b). Define n̄ = max{n_a, n_b}. It follows that b > x_{n̄} > a.

(iii) We have sup A = inf B. Indeed, by the Least Upper Bound Principle and by the previous two points, sup A and inf B are well defined and sup A ≤ inf B. Since, by point (i), x_{n_ε} − ε ∈ A and x_{n_ε} + ε ∈ B, we have x_{n_ε} − ε ≤ sup A ≤ inf B ≤ x_{n_ε} + ε; in particular, |inf B − sup A| ≤ 2ε. Since ε can be chosen arbitrarily small, we then have |inf B − sup A| = 0, that is, inf B = sup A.
Call z the common value of sup A and inf B. We claim that x_n → z. Indeed, fix arbitrarily a number δ > 0. There exist a ∈ A and b ∈ B such that 0 ≤ b − a < δ and, therefore,

    z − δ < a < b < z + δ

because a ≤ z ≤ b, and so z − δ < a and b < z + δ. But, by the definition of A and B, the sequence is eventually strictly larger than a and strictly smaller than b. So, eventually,

    z − δ < x_n < z + δ
Example 324 (i) The harmonic sequence x_n = 1/n is Cauchy. Indeed, let ε > 0. We have to show that there exists n_ε ≥ 1 such that |x_n − x_m| < ε for every n, m ≥ n_ε. Without loss of generality, assume that n ≥ m. Note that for n ≥ m we have

    0 ≤ |x_n − x_m| = 1/m − 1/n < 1/m

Since ε > 1/m amounts to m > 1/ε, by choosing n_ε = [1/ε] + 1 we have |x_n − x_m| < ε for every n ≥ m ≥ n_ε, thus proving that x_n = 1/n is a Cauchy sequence.
(ii) The sequence x_n = log n is not Cauchy. Suppose, by contradiction, that for a fixed ε > 0 there exists n_ε ≥ 1 such that |x_n − x_m| < ε for every n, m ≥ n_ε. First, note that if n = m + k with k ∈ ℕ, we have

    |x_n − x_m| = log((m + k)/m) < ε ⟺ k < m (e^ε − 1)

Thus, by choosing k = [m (e^ε − 1)] + 1 and m ≥ n_ε, we obtain |x_n − x_m| = log((m + k)/m) ≥ ε. This contradicts |x_n − x_m| < ε for n, m ≥ n_ε. We conclude that x_n = log n is not a Cauchy sequence. N
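The contrast in Example 324 can be seen numerically by measuring how spread out a block of far-out terms is (finite blocks only, so this is a heuristic illustration; block sizes are our own choices):

```python
# Cauchy vs non-Cauchy, numerically: the spread max - min over a tail
# block of indices shrinks for x_n = 1/n but not for x_n = log n
# (log(2m) - log(m) = log 2 no matter how large m is).

import math

def tail_spread(x, start, length):
    block = [x(n) for n in range(start, start + length)]
    return max(block) - min(block)

harmonic = lambda n: 1 / n

assert tail_spread(harmonic, 10**5, 10**5) < 1e-5    # dies out
assert tail_spread(math.log, 10**5, 10**5) > 0.69    # stuck near log 2
```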
The previous theorem states a fundamental property of convergent sequences, yet its relevance is also due to the structural property of the real line that it isolates, the so-called completeness of the real line. For example, let us assume – as was the case for Pythagoras – that we only know the rational numbers, so that the space on which we operate is ℚ. Consider the sequence whose elements (all rational) are the decimal approximations of π:

    3, 3.1, 3.14, 3.141, 3.1415, …

Being a sequence of decimal approximations, it satisfies the Cauchy condition because the quantity

    |x_n − x_m| < 10^{−min{m−1, n−1}}

can be made arbitrarily small. The sequence, however, does not converge to any point of ℚ: if we knew ℝ, we could say that it converges to π. Therefore, in ℚ the Cauchy condition is necessary, but not sufficient, for convergence. The reason is that ℚ does not have "enough points" to handle convergence well, unlike ℝ. For instance, the previous sequence converges in ℝ because of the point π, which is missing in ℚ. We thus say that ℝ is complete (with respect to convergence), while ℚ is incomplete. Indeed, ℝ can be seen as a way to complete ℚ by adding all the missing limit points, like π, as readers will learn in more advanced courses.
Theorem 325 The sequence (8.46) is convergent. Its limit is denoted by $e$, i.e.,
$$e = \lim \left(1 + \frac{1}{n}\right)^{n} \qquad (8.47)$$
Since the sequence involves powers, the root criterion is a first possibility to consider to prove the result. Unfortunately,
$$\sqrt[n]{\left(1 + \frac{1}{n}\right)^{n}} = 1 + \frac{1}{n} \to 1$$
and, therefore, this criterion cannot be applied. The proof is based, instead, on the following classic inequality.
Proof The proof is done by induction. Inequality (8.48) holds for $n = 2$. Indeed, for each $a \neq 0$ we have:
$$(1+a)^{2} = 1 + 2a + a^{2} > 1 + 2a$$
Suppose now that (8.48) holds for some $n \geq 2$ (induction hypothesis), i.e.,
$$(1+a)^{n} > 1 + an$$
Then
$$(1+a)^{n+1} = (1+a)^{n}(1+a) > (1 + an)(1+a) = 1 + a(n+1) + a^{2} n > 1 + a(n+1)$$
where the first inequality, due to the induction hypothesis, holds because $a > -1$, so that $1 + a > 0$. This completes the induction step.
$$0 < b_n - a_n = \frac{b_n}{n+1} \leq \frac{b_1}{n+1} \to 0$$
By step 1, the sequence $\{b_n\}$ is decreasing and bounded below (being positive). So, $\lim b_n = \inf b_n$. By step 2, the sequence $\{a_n\}$ is increasing and, being $a_n < b_n$ for each $n$ (step 3), is bounded above. Hence, $\lim a_n = \sup a_n$. Since $b_n - a_n \to 0$ (step 3), from $b_n \geq \inf b_n \geq \sup a_n \geq a_n$ it follows that $\sup a_n = \inf b_n$, so $\lim a_n = \lim b_n$.
One obtains
$$a_1 = 2^{1} = 2 \qquad b_1 = 2^{2} = 4$$
$$a_2 = \left(\frac{3}{2}\right)^{2} = 2.25 \qquad b_2 = \left(\frac{3}{2}\right)^{3} = 3.375$$
$$a_{10} = \left(\frac{11}{10}\right)^{10} \simeq 2.59 \qquad b_{10} = \left(\frac{11}{10}\right)^{11} \simeq 2.85$$
Therefore, Napier's constant lies between 2.59 and 2.85. Indeed, it is equal to 2.71828... Later we will prove that it is an irrational number (Theorem 368). It can be proved that it is actually a transcendental number.28
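A quick computation (an illustration added here) shows how the increasing sequence $a_n = (1+1/n)^n$ and the decreasing sequence $b_n = (1+1/n)^{n+1}$ squeeze Napier's constant:

```python
import math

def a(n):  # increasing sequence a_n = (1 + 1/n)^n
    return (1 + 1 / n) ** n

def b(n):  # decreasing sequence b_n = (1 + 1/n)^(n+1)
    return (1 + 1 / n) ** (n + 1)

# the gap b_n - a_n = b_n / (n + 1) shrinks to zero, trapping e in between
gap_10 = b(10) - a(10)
gap_1000 = b(1000) - a(1000)
```

For $n = 1000$ the two bounds already pin down $e$ to within about $0.003$.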
Napier's constant is, inter alia, the most convenient base of exponential and logarithmic functions (Section 6.5.2). Later in the book we will see that it can be studied from different viewpoints.

28An irrational number is called algebraic if it is a root of some polynomial equation with integer coefficients: for example, $\sqrt{2}$ is algebraic because it is a root of the equation $x^2 - 2 = 0$. Irrational numbers that are not algebraic are called transcendental.
From the fundamental limit (8.47), we can deduce many other important limits.
For $k = 1$ the proof just requires us to consider the integer part of $x_n$. For any $k$, it is sufficient to set $k/x_n = 1/y_n$, so that
$$\left(1 + \frac{k}{x_n}\right)^{x_n} = \left(1 + \frac{1}{y_n}\right)^{k y_n} = \left[\left(1 + \frac{1}{y_n}\right)^{y_n}\right]^{k} \to e^{k}$$
$$\lim \frac{\log_b (1 + a_n)}{a_n} = \log_b e \qquad \forall\, 0 < b \neq 1$$

$$\lim \frac{c^{y_n} - 1}{y_n} = \log c$$
Indeed, setting $a_n = c^{y_n} - 1$, so that $y_n = \log_c (1 + a_n)$, we have
$$\frac{c^{y_n} - 1}{y_n} = \frac{a_n}{\log_c (1 + a_n)}$$
So, we are back to the (reciprocal of the) previous case, in which the limit is $1/\log_c e = \log_e c = \log c$.
$$\lim \frac{(1 + z_n)^{\alpha} - 1}{z_n} = \alpha$$
Since
$$\lim \frac{\log (1 + a_n)}{a_n} = \lim \frac{\log (1 + z_n)}{z_n} = 1$$
the result then follows.
as well as
$$n^{2}\left[\left(1 + \frac{1}{n^2}\right)^{3} - 1\right] = \frac{\left(1 + 1/n^2\right)^{3} - 1}{1/n^2} \to 3$$
and
$$n \log\left(1 + \frac{1}{n}\right) = \frac{\log (1 + 1/n)}{1/n} \to 1$$
and
$$\frac{2^{1/n} - 1}{1/n} \to \log 2$$
The next definition formalizes this intuition, which is important both conceptually and computationally.
Definition 327 Let $\{x_n\}$ and $\{y_n\}$ be two sequences, with the terms of the former eventually different from zero.

(i) If
$$\frac{y_n}{x_n} \to 0$$
we say that $\{y_n\}$ is negligible with respect to $\{x_n\}$, and write
$$y_n = o(x_n)$$

(ii) If
$$\frac{y_n}{x_n} \to k \neq 0 \qquad (8.49)$$
we say that $\{y_n\}$ is of the same order as (or comparable with) $\{x_n\}$, and write
$$y_n \asymp x_n$$
This classification is comparative. For example, if $\{y_n\}$ is negligible with respect to $\{x_n\}$, it does not mean that $\{y_n\}$ is negligible per se, but that it becomes so when compared to $\{x_n\}$. The sequence $y_n = n^2$ is negligible with respect to $x_n = n^5$, but it is not negligible at all per se (it tends to infinity!).
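As a small numerical illustration (added here), the ratio $y_n / x_n = n^2 / n^5 = 1/n^3$ vanishes even though $y_n$ itself diverges:

```python
# y_n = n^2 is negligible with respect to x_n = n^5: the ratio tends to 0,
# although y_n itself tends to infinity.
def ratio(n):
    return n**2 / n**5  # equals 1 / n^3

values = [ratio(n) for n in (10, 100, 1000)]
```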
Observe that, thanks to Proposition 288, we have
$$\frac{y_n}{x_n} \to \pm\infty \iff \frac{x_n}{y_n} \to 0 \iff x_n = o(y_n)$$
Therefore, we can use the previous classification also when the ratio $y_n/x_n$ diverges; no separate analysis is needed.
Lemma 328 Let $\{x_n\}$ and $\{y_n\}$ be two sequences with terms eventually different from zero.

(ii) The relation of negligibility is transitive, i.e., $z_n = o(y_n)$ and $y_n = o(x_n)$ imply $z_n = o(x_n)$.

29Comparability is, indeed, an equivalence relation (cf. Appendix A).
We now consider the more interesting cases in which both sequences are either infinitesimal or divergent. We start with two infinitesimal sequences $\{x_n\}$ and $\{y_n\}$, that is, $\lim x_n = \lim y_n = 0$. In this case, the negligible sequence tends to zero faster. Consider, for example, $x_n = 1/n$ and $y_n = 1/n^2$. Intuitively, $y_n$ goes to zero faster than $x_n$. Indeed,
$$\frac{1/n^2}{1/n} = \frac{1}{n} \to 0$$
Suppose now that the sequences $\{x_n\}$ and $\{y_n\}$ are both divergent, positively or negatively, that is, $\lim_{n\to\infty} x_n = \pm\infty$ and $\lim_{n\to\infty} y_n = \pm\infty$. In this case, negligible sequences tend to infinity more slowly (independently of the sign), that is, they take on values greater and greater in absolute value less rapidly. For example, let $x_n = n^2$ and $y_n = n$. Intuitively, $y_n$ goes to infinity more slowly than $x_n$. Indeed,
$$\frac{y_n}{x_n} = \frac{n}{n^2} = \frac{1}{n} \to 0$$
that is, $y_n = o(x_n)$. On the other hand, the same is true if $x_n = -n^2$ and $y_n = n$ because it is not the sign of the infinity that matters, but the rate of divergence.
N.B. Setting $x_n = n$ and $y_n = n + k$, with $k > 0$, the sequences $\{x_n\}$ and $\{y_n\}$ are asymptotic. Indeed, no matter how large $k$ is, the divergence to $+\infty$ of the two sequences will make the role of $k$ negligible from the asymptotic point of view. Such a fundamental viewpoint, central to the theory of sequences, should not make us forget that two asymptotic sequences are, in general, very different (to fix ideas, set for example $k = 10^{10}$, i.e., 10 billion, and consider the asymptotic, yet very different, sequences $x_n = n$ and $y_n = n + 10^{10}$). O
Proposition 329 For every pair of sequences $\{x_n\}$ and $\{y_n\}$ and for every scalar $c \neq 0$, it holds that:

(i) $o(x_n) + o(x_n) = o(x_n)$;

(ii) $o(x_n)\, o(y_n) = o(x_n y_n)$;

(iii) $c \cdot o(x_n) = o(x_n)$;

(iv) if $y_n = o(x_n)$, then $o(y_n) + o(x_n) = o(x_n)$.

The relation $o(x_n) + o(x_n) = o(x_n)$ in (i), bizarre at first sight, simply means that the sum of two little-o's of a sequence is still a little-o of that sequence, that is, it continues to be negligible with respect to that sequence. Similar re-readings hold for the other properties in the proposition. Note that (ii) has the remarkable special case
$$o(x_n)\, o(x_n) = o\!\left(x_n^2\right)$$
(iii) Let us call $x_n \varepsilon_n$ the little-o of $x_n$, with $\{\varepsilon_n\}$ an infinitesimal sequence. Then
$$\lim \frac{c\, x_n \varepsilon_n}{x_n} = c \lim \varepsilon_n = 0$$
which shows that $c \cdot o(x_n)$ is $o(x_n)$.

(iv) Let $y_n = x_n \varepsilon_n$, with $\{\varepsilon_n\}$ an infinitesimal sequence. Then, the little-o of $y_n$ can be written as $y_n \eta_n$, that is, $x_n \varepsilon_n \eta_n$, with $\{\eta_n\}$ an infinitesimal sequence. Moreover, let us call $x_n \delta_n$ the little-o of $x_n$, with $\{\delta_n\}$ an infinitesimal sequence. Then
$$\lim \frac{x_n \varepsilon_n \eta_n + x_n \delta_n}{x_n} = \lim \left(\varepsilon_n \eta_n + \delta_n\right) = 0$$
so that $o(y_n) + o(x_n) = o(x_n)$.
(i) Adding up the two sequences we obtain $y_n + z_n = 2 \log n - n$, which is still $o(n^2)$, in accordance with (i) proved above.

(ii) Multiplying the two sequences we obtain $y_n z_n = 2n \log n - 2n^2$, which is $o(n^2 \cdot n^2)$, i.e., $o(n^4)$, in accordance with (ii) proved above (in the special case $o(x_n) o(x_n)$). Note that $y_n z_n$ is not $o(n^2)$.

(iii) Take $c = 3$ and consider $c\, y_n = 3n$. It is immediate that $3n$ is still $o(n^2)$, in accordance with (iii) proved above.

(iv) Consider the sequence $w_n = \sqrt{n} - 1$. It is immediate that $w_n = o(y_n) = o(n)$. Consider now the sum $w_n + z_n$ (with $z_n$ defined above), which is the sum of an $o(y_n)$ and an $o(x_n)$, with $y_n = o(x_n)$. We have $w_n + z_n = \sqrt{n} - 1 + 2 \log n - 2n$, which is $o(x_n) = o(n^2)$, in accordance with (iv) proved above. Note that $w_n + z_n$ is not $o(y_n)$, even if $w_n$ is $o(y_n)$. N
N.B. (i) To say that a sequence is $o(1)$ simply means that it tends to 0. Indeed, $x_n = o(1)$ means that $x_n / 1 = x_n \to 0$. (ii) The fourth property in the last proposition is especially important because it highlights that, if $y_n$ is negligible with respect to $x_n$, in the sum $o(y_n) + o(x_n)$ the little-o $o(y_n)$ is subsumed in $o(x_n)$. O
$$y_n \to L \iff x_n \to L \qquad (8.51)$$
In detail:

All this suggests that it is possible to replace $x_n$ by $y_n$ (or vice versa) in the calculation of limits. Intuitively, this possibility is attractive because it might allow us to replace a complicated sequence by a simpler one that is asymptotic to it.

To make this intuition precise, we start by observing that asymptotic equivalence $\sim$ is preserved under the fundamental operations.
Let $y_n \sim x_n$ and $z_n \sim w_n$. Then:

(i) if $\dfrac{x_n}{x_n + w_n} \to k$, then $y_n + z_n \sim x_n + w_n$;

(ii) $y_n z_n \sim x_n w_n$;

(iii) $\dfrac{y_n}{z_n} \sim \dfrac{x_n}{w_n}$.

Note that for sums, differently from the case of products and ratios, the result does not hold in general, but only under a non-trivial ad hoc hypothesis. For this reason, points (ii) and (iii) are the most interesting ones. In the sequel we will thus focus on the asymptotic equivalence of products and ratios, leaving the study of sums to the reader.
Proof (i) We have
$$\frac{y_n + z_n}{x_n + w_n} = \frac{y_n}{x_n + w_n} + \frac{z_n}{x_n + w_n} = \frac{y_n}{x_n} \frac{x_n}{x_n + w_n} + \frac{z_n}{w_n} \frac{w_n}{x_n + w_n}$$
$$= \frac{y_n}{x_n} \frac{x_n}{x_n + w_n} + \frac{z_n}{w_n} \left(1 - \frac{x_n}{x_n + w_n}\right) = \left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right) \frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}$$
Since $y_n / x_n \to 1$ and $z_n / w_n \to 1$, while $x_n / (x_n + w_n) \to k$, we have
$$\left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right) \frac{x_n}{x_n + w_n} \to 0$$
so that $(y_n + z_n)/(x_n + w_n) \to 1$, as desired.
(ii) and (iii) We have
$$\frac{y_n z_n}{x_n w_n} = \frac{y_n}{x_n} \frac{z_n}{w_n} \to 1$$
and
$$\frac{y_n / z_n}{x_n / w_n} = \frac{y_n w_n}{z_n x_n} = \frac{y_n}{x_n} \frac{w_n}{z_n} \to 1$$
since $y_n / x_n \to 1$ and $z_n / w_n \to 1$.
The next simple lemma is very useful: in the calculation of a limit, one should neglect what is negligible.
$$x_n + o(x_n) \to L \iff x_n \to L$$
What is negligible with respect to the sequence $\{x_n\}$ – i.e., what is $o(x_n)$ – is asymptotically irrelevant and one can safely ignore it. Together with Lemma 331, this implies, for products and ratios, that
$$\left(x_n + o(x_n)\right)\left(y_n + o(y_n)\right) \sim x_n y_n \qquad (8.52)$$
and
$$\frac{x_n + o(x_n)}{y_n + o(y_n)} \sim \frac{x_n}{y_n} \qquad (8.53)$$
We illustrate these very useful asymptotic equivalences with some examples, which should be read with particular attention.

(i) Consider the limit
$$\lim \frac{n^4 - 3n^3 + 5n^2 - 7}{2n^5 + 12n^4 - 6n^3 + 4n + 1}$$
By (8.53), we have
$$\frac{n^4 - 3n^3 + 5n^2 - 7}{2n^5 + 12n^4 - 6n^3 + 4n + 1} = \frac{n^4 + o(n^4)}{2n^5 + o(n^5)} \sim \frac{n^4}{2n^5} = \frac{1}{2n} \to 0$$
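Numerically (an added check), the full rational expression and its asymptotic surrogate $1/(2n)$ agree closely for large $n$:

```python
# Compare the full rational expression with its asymptotic surrogate 1/(2n).
def exact(n):
    return (n**4 - 3*n**3 + 5*n**2 - 7) / (2*n**5 + 12*n**4 - 6*n**3 + 4*n + 1)

def surrogate(n):
    return 1 / (2 * n)

rel = exact(10**6) / surrogate(10**6)  # tends to 1 as n grows
```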
(ii) By (8.52),31 we have
$$\left(n^2 - 7n + 3\right)\left(2 + \frac{1}{n} - \frac{3}{n^2}\right) = \left(n^2 + o(n^2)\right)\left(2 + o(1)\right) \sim 2n^2 \to +\infty$$
(iii) Consider the limit
$$\lim \frac{n(n+1)(n+2)(n+3)}{(n-1)(n-2)(n-3)(n-4)}$$
By (8.53), we have
$$\frac{n(n+1)(n+2)(n+3)}{(n-1)(n-2)(n-3)(n-4)} = \frac{n^4 + o(n^4)}{n^4 + o(n^4)} \sim \frac{n^4}{n^4} = 1 \to 1$$
(iv) Consider the limit
$$\lim e^{-n}\left(7 + \frac{1}{n}\right)$$
By (8.52), we have
$$e^{-n}\left(7 + \frac{1}{n}\right) = e^{-n}\left(7 + o(1)\right) \sim 7 e^{-n} \to 0$$
N
By (8.50), we have
$$\frac{y_n}{z_n} \sim \frac{x_n}{w_n} \iff \frac{z_n}{y_n} \sim \frac{w_n}{x_n} \qquad (8.54)$$
provided that the ratios are (eventually) well-defined and not zero. Therefore, once we have established that the ratios $y_n / z_n$ and $x_n / w_n$ are asymptotic, we "automatically" know that their reciprocals $z_n / y_n$ and $w_n / x_n$ are asymptotic as well.
Example 334 Consider the limit
$$\lim \frac{e^{5n} - n^7 - 4n^2 + 3n}{6^n + n^8 - n^4 + 5n^3}$$
By (8.53),
$$\frac{e^{5n} - n^7 - 4n^2 + 3n}{6^n + n^8 - n^4 + 5n^3} = \frac{e^{5n} + o\left(e^{5n}\right)}{6^n + o(6^n)} \sim \frac{e^{5n}}{6^n} = \left(\frac{e^5}{6}\right)^{n} \to +\infty$$
If, instead, we consider the reciprocal limit
$$\lim \frac{6^n + n^8 - n^4 + 5n^3}{e^{5n} - n^7 - 4n^2 + 3n}$$
then, by (8.54),
$$\frac{6^n + n^8 - n^4 + 5n^3}{e^{5n} - n^7 - 4n^2 + 3n} \sim \left(\frac{6}{e^5}\right)^{n} \to 0$$
N
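A numerical look at Example 334 (an added illustration): the ratio grows like $(e^5/6)^n$, since $e^5 \approx 148.4 > 6$.

```python
import math

def num(n):  # numerator e^(5n) - n^7 - 4n^2 + 3n
    return math.exp(5 * n) - n**7 - 4 * n**2 + 3 * n

def den(n):  # denominator 6^n + n^8 - n^4 + 5n^3
    return 6**n + n**8 - n**4 + 5 * n**3

r20 = num(20) / den(20)
r40 = num(40) / den(40)
# for large n the ratio is essentially (e^5 / 6)^n
model_40 = (math.e**5 / 6) ** 40
```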
In conclusion, a clever use of (8.52)-(8.53) often allows us to simplify substantially the calculation of limits. But, beyond calculations, they are conceptually illuminating relations.

31For $0 \neq k \in \mathbb{R}$, we have $k + o(1) \sim k$. Indeed,
$$\frac{k + o(1)}{k} = 1 + \frac{o(1)}{k} \to 1$$
$$x_n \sim y_n \iff x_n = y_n + o(y_n)$$
In words, two sequences are asymptotic when they are equal up to a component that is asymptotically negligible with respect to them. This result further clarifies how the relation $\sim$ can be seen as an asymptotic equality.
$$\frac{x_n}{y_n} = \frac{y_n + o(y_n)}{y_n} = 1 + \frac{o(y_n)}{y_n} \to 1$$
$$\frac{z_n}{y_n} = \frac{x_n - y_n}{y_n} = \frac{x_n}{y_n} - 1 \to 0$$
Proposition 336 Let $\{x_n\}$ be a sequence with terms eventually non-zero. Then
$$\frac{1}{n} \log |x_n| \to k \neq 0 \qquad (8.55)$$
if and only if $|x_n| = e^{kn + o(n)}$.

"If." We have
$$\frac{1}{n} \log |x_n| = \frac{1}{n} \log e^{kn + o(n)} = \frac{kn + o(n)}{n} \to k$$
"Only if." Set $z_n = \log |x_n|$. Since $k \neq 0$, from (8.55) it follows that $z_n / kn \to 1$, i.e., $z_n \sim kn$. From the previous proposition and Proposition 329-(iii) it follows that $z_n = kn + o(kn) = kn + o(n)$, that is, $|x_n| = e^{kn + o(n)}$, as claimed.
When k < 0, the condition (8.55) characterizes the sequences that converge to zero at
exponential rate. In that case, we speak of exponential decay. When k > 0, there is instead
an explosive exponential behavior.
8.14.5 Terminology

Due to its importance, the comparison of infinitesimal sequences, as well as that of divergent sequences, has a specific terminology. In particular,

(i) if two infinitesimal sequences $\{x_n\}$ and $\{y_n\}$ are such that $y_n = o(x_n)$, we say that the sequence $\{y_n\}$ is infinitesimal of higher order with respect to $\{x_n\}$;

(ii) if two divergent sequences $\{x_n\}$ and $\{y_n\}$ are such that $y_n = o(x_n)$, we say that the sequence $\{y_n\}$ is of lower order of infinity with respect to $\{x_n\}$.

In other words, a sequence is infinitesimal of higher order if it tends to zero faster, while it is of lower order of infinity if it tends to infinity more slowly. Beyond the terminology (which is not universal), it is important to keep in mind the idea of negligibility that lies at the basis of the relation $y_n = o(x_n)$.
(ii) $n^k = o(\alpha^n)$ for every $\alpha > 1$, as already proved with the ratio criterion. We have $\alpha^n = o(n^k)$ if, instead, $0 < \alpha < 1$ and $k > 0$.

(iii) $\log^{k_2} n = o\left(\log^{k_1} n\right)$ whenever $k_1 > k_2$:
$$\frac{\log^{k_2} n}{\log^{k_1} n} = \frac{1}{\log^{k_1 - k_2} n} \to 0$$
The next lemma reports two important comparisons of infinities that show that exponentials are of lower order of infinity than factorials (we omit the proof).

Note that this implies, by Lemma 328, that $\alpha^n = o(n^n)$. Exponentials are, therefore, of lower order of infinity also compared with sequences of the type $n^n$.
The different orders of infinity and of infinitesimals are sometimes organized through scales. If we limit ourselves to infinities (similar considerations hold for infinitesimals), the most classic scale of infinities is the logarithmic-exponential one. Taking $x_n = n$ as the basis, we have the ascending scale
$$n,\ n^2,\ \ldots,\ n^k,\ \ldots,\ e^{n},\ e^{2n},\ \ldots,\ e^{kn},\ \ldots,\ e^{n^2},\ \ldots,\ e^{n^k},\ \ldots,\ e^{e^n},\ \ldots$$
They provide "benchmarks" to calibrate the asymptotic behavior of a sequence $\{x_n\}$ that tends to infinity. For example, if $x_n \sim \log n$, the sequence $\{x_n\}$ is asymptotically logarithmic; if $x_n \sim n^2$, the sequence $\{x_n\}$ is asymptotically quadratic, and so on.32
n
In applications one rarely considers orders of in…nity higher than ee and lower than
log log n. Indeed, log log n has an almost imperceptible increase, it is almost constant:
n 3 4 5 6
n
ee 5:284 9 108 5:148 4 1023 2:851 1 1064 1:610 3 10175
The asymptotic behavior of the divergent sequences that are relevant in applications usually ranges between the slowness of $\log \log n$ and the explosiveness of $e^{e^n}$. But, from a theoretical point of view, we can go well beyond them. The study of the scales of infinity is of great elegance (see Hardy, 1910).
Proof We will only show the first equality. By setting $x_n = n!/n^n$, in the proof of Lemma 337 we have seen that
$$\lim \frac{x_{n+1}}{x_n} = \frac{1}{e}$$
From (10.16), we then also have
$$\lim \sqrt[n]{x_n} = \lim \frac{\sqrt[n]{n!}}{n} = \frac{1}{e}$$
We can thus conclude that $n / \sqrt[n]{n!} = e\,(1 + o(1))$, or $n!/n^n = e^{-n}(1 + o(1))^{n}$, that is,
$$n! = n^n e^{-n} (1 + o(1))^{n}$$
Moreover,
$$\frac{n!}{n^n e^{-n} \sqrt{2\pi n}} = e^{o(1)} \to 1$$
We thus obtain the following remarkable formula
$$n! \sim n^n e^{-n} \sqrt{2\pi n}$$
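Stirling's formula is easy to check numerically (an added illustration): the ratio $n! / (n^n e^{-n}\sqrt{2\pi n})$ approaches 1 from above as $n$ grows.

```python
import math

def stirling(n):  # the approximation n^n e^(-n) sqrt(2 pi n)
    return n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)

# ratio n! / stirling(n) for increasing n: decreasing toward 1
ratios = [math.factorial(n) / stirling(n) for n in (5, 20, 100)]
```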
$$\begin{array}{c|ccccccccccccccc}
n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\
\hline
\pi(n) & 0 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 4 & 4 & 5 & 5 & 6 & 6 & 6
\end{array}$$
It is, of course, not possible to describe the sequence $\pi$ fully, as this would be equivalent to describing the sequence of prime numbers, which we have argued to be hopeless (at least operationally). Nevertheless, we can still ask ourselves whether there is a sequence $\{x_n\}$ that is described in closed form and is asymptotically equal to $\pi$. In other words, the question is whether we can find a reasonably simple sequence that approximates $\pi$ asymptotically well enough.
Around the year 1800, Gauss and Legendre independently noted that the sequence $\{n / \log n\}$ approximates $\pi$ well, as we can check by inspection of the following table.
$$\begin{array}{c|ccc}
n & \pi(n) & n / \log n & \dfrac{\pi(n)}{n / \log n} \\
\hline
10 & 4 & 4.3 & 0.921 \\
10^2 & 25 & 21.7 & 1.151 \\
10^3 & 168 & 145 & 1.161 \\
10^4 & 1{,}229 & 1{,}086 & 1.132 \\
10^5 & 9{,}592 & 8{,}686 & 1.104 \\
10^{10} & 455{,}052{,}511 & 434{,}294{,}482 & 1.048 \\
10^{15} & 29{,}844{,}570{,}422{,}669 & 28{,}952{,}965{,}460{,}217 & 1.031 \\
10^{20} & 2{,}220{,}819{,}602{,}560{,}918{,}840 & 2{,}171{,}472{,}409{,}516{,}250{,}000 & 1.023
\end{array}$$
The ratio in the last column becomes closer and closer to 1 as $n$ increases. Gauss and Legendre conjectured that this was so because $\pi$ is asymptotically equal to $\{n / \log n\}$. Their conjecture remained open for about a century, until it was independently proven to be true in 1896 by two great mathematicians, Jacques Hadamard and Charles de la Vallée Poussin. The importance of such a result is testified by its name, which is as simple as it is demanding.34
Although we are not able to describe the sequence $\pi$, thanks to the Prime Number Theorem we can say that its asymptotic behavior is similar to that of the simple sequence $\{n / \log n\}$; that is, the number of primes in any given interval of natural numbers $[m, n]$ satisfies, with increasing accuracy,
$$\pi(n) - \pi(m) \approx \frac{n}{\log n} - \frac{m}{\log m}$$
This wonderful result, which undoubtedly has a statistical "flavor", is incredibly elegant. Even more so if we consider its following remarkable consequence.
The sequence of prime numbers $\{p_n\}$ is thus asymptotically equivalent to $\{n \log n\}$. The $n$-th prime number is, approximately, $n \log n$. For example, by inspecting a prime number table one can see that for $n = 100$ one has $p_n = 541$, while its "estimate" is $n \log n = 460$ (rounding down). Inspection of larger values of $n$ confirms that the ratio between $p_n$ and its estimate $n \log n$ stays steadily around 1. Note also that
$$\frac{\log n}{\log p_n} \to 1$$
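These estimates are easy to reproduce (an added illustration; the sieve below is a standard method, not part of the text):

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

primes = primes_up_to(100_000)
pi_1e5 = len(primes)                               # pi(10^5) = 9592
ratio = pi_1e5 / (100_000 / math.log(100_000))     # close to 1.104, as in the table
p_100 = primes[99]                                 # the 100th prime, 541
estimate = 100 * math.log(100)                     # about 460.5
```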
O.R. Counting objects is one of the most basic activities common across cultures, arguably the most universal one: counting emerges as soon as similar, yet distinct, entities come up. If so, the identification of prime numbers – the atoms of numbers – can be viewed as an important step in the evolution of a civilization. Indeed, their study emerged in the Greek world, which also marked the emergence of reason (Section 1.8). The depth with which a civilization studies prime numbers is, then, a possible universal benchmark to assess its degree of evolution. On this scale, the Prime Number Theorem is one of the best pieces of evidence of its evolution that mankind can offer when going where no one has gone before (unless sure of their intentions, it is better not to meet civilizations that have found the closed form of the sequence of primes). H
8.15 Sequences in $\mathbb{R}^n$

We close the chapter by considering sequences $\{x^k\}$ of vectors in $\mathbb{R}^n$. For them we give a definition of limit that follows closely the one already given for sequences in $\mathbb{R}$. The fundamental difference is that each element of the sequence is now a vector $x^k = (x_1^k, x_2^k, \ldots, x_n^k) \in \mathbb{R}^n$ and not a scalar.
In other words, $x^k = (x_1^k, x_2^k, \ldots, x_n^k) \to L = (L_1, L_2, \ldots, L_n)$ if the scalar sequence of distances $\|x^k - L\|$ converges to zero (cf. Proposition 281). Since
$$\|x^k - L\| = \sqrt{\sum_{i=1}^{n} \left(x_i^k - L_i\right)^2}$$
we have
$$\|x^k - L\| \to 0 \iff x_i^k - L_i \to 0 \quad \forall i = 1, 2, \ldots, n \qquad (8.59)$$
That is, $x^k \to L$ if and only if the scalar sequences $\{x_i^k\}$ of the $i$-th components converge to the components $L_i$ of the vector $L$. The convergence of a sequence of vectors, therefore, reduces to the convergence of the sequences of the single components. So, it is a componentwise notion of convergence that, as such, does not present any significant novelty relative to the scalar case.
Consider the sequence
$$\left(1 + \frac{1}{k},\ \frac{1}{k^2},\ \frac{2k+3}{5k-7}\right)$$
in $\mathbb{R}^3$. Since
$$1 + \frac{1}{k} \to 1, \qquad \frac{1}{k^2} \to 0 \qquad \text{and} \qquad \frac{2k+3}{5k-7} \to \frac{2}{5}$$
the sequence converges to the vector $(1, 0, 2/5)$. N
In a similar way, we define divergence to $+\infty$ and to $-\infty$: it occurs when all the components of the vectors that form the sequence diverge to $+\infty$ or to $-\infty$, respectively. Finally, when the single components have different behaviors (some converge, others diverge or are irregular), the sequence of vectors does not have a limit (for brevity, we omit the details).
Series
To make rigorous this new operation of "addition of infinitely many summands", which is different from the ordinary addition (as we will realize), we will sum a finite number of terms, say $n$, then let $n$ tend to infinity and take the resulting limit, if it exists, as the value to assign to the series. We are, therefore, thinking of constructing a new sequence $\{s_n\}$ defined by
$$s_1 = x_1 \qquad (9.2)$$
$$s_2 = x_1 + x_2$$
$$s_3 = x_1 + x_2 + x_3$$
$$\vdots$$
$$s_n = x_1 + \cdots + x_n$$
and of taking the limit of $\{s_n\}$ as the sum of the series. Formally:
Definition 343 The series with terms given by a sequence $\{x_n\}$ of scalars, in symbols $\sum_{n=1}^{\infty} x_n$, is the sequence $\{s_n\}$ defined in (9.2). The terms $s_n$ of the sequence are called partial sums of the series.

The series $\sum_{n=1}^{\infty} x_n$ is therefore defined as the sequence $\{s_n\}$ of the partial sums (9.2). Its limit behavior determines its value; in particular, a series $\sum_{n=1}^{\infty} x_n$ is:

(i) convergent, with sum $S$, in symbols $\sum_{n=1}^{\infty} x_n = S$, if $\lim s_n = S \in \mathbb{R}$;

(ii) positively divergent, in symbols $\sum_{n=1}^{\infty} x_n = +\infty$, if $\lim s_n = +\infty$;

(iii) negatively divergent, in symbols $\sum_{n=1}^{\infty} x_n = -\infty$, if $\lim s_n = -\infty$;
This formulation can be operationally useful to construct partial sums through a guess-and-verify procedure: we first posit a candidate expression for the partial sum, which we then verify by induction. Example 347 will illustrate this procedure. However, as little birds suggesting guesses are often not around, the main interest of this recursive formulation is, ultimately, theoretical, in that it further clarifies that a series is nothing but a new sequence constructed from an existing one. Indeed, given a sequence $\{x_n\}$, the recursion (9.3) defines the sequence of partial sums $\{s_n\}$. It is this recursion that, thus, underlies the notion of series.
O.R. Sometimes it is useful to start a series at the index $n = 0$ rather than at $n = 1$. When the option exists (we will see that this is not the case for some types of series, like the harmonic series, which cannot be defined for $n = 0$), the choice to start a series from either $n = 0$ or $n = 1$ (or from another value of $n$) is a pure matter of convenience (as it was for sequences). Actually, one can start a series from any $k$ in $\mathbb{N}$. The context itself typically suggests the best choice. In any case, this choice does not alter the character of the series and, therefore, it does not affect the problem of determining whether the series converges or not. H
1We thus resorted to a limit, that is, to a notion of potential infinity. On the other hand, we cannot really sum infinitely many summands: all the paper in the world would not suffice, nor would our entire life (and, by the way, we would not know where to put the line that one traditionally writes under the summands before adding them).
2Using the terminology already employed for sequences, a series is sometimes called regular when it is not irregular, that is, when one of the cases (i)-(iii) holds.
Since
$$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}$$
one has that
$$s_n = \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \cdots + \frac{1}{n(n+1)}$$
$$= 1 - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{1}{n} - \frac{1}{n+1} = 1 - \frac{1}{n+1} \to 1$$
Therefore,
$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1$$
So, the Mengoli series converges and has sum 1. N
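The telescoping identity $s_n = 1 - 1/(n+1)$ is easy to confirm numerically (an added illustration):

```python
def mengoli_partial(n):  # s_n = sum of 1/(k(k+1)) for k = 1, ..., n
    return sum(1.0 / (k * (k + 1)) for k in range(1, n + 1))

s10 = mengoli_partial(10)         # telescopes to 1 - 1/11
s10000 = mengoli_partial(10_000)  # already very close to the sum 1
```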
Consider the partial sums with indices $n$ that are powers of 2 (i.e., $n = 2^k$):
$$s_1 = 1, \qquad s_2 = 1 + \frac{1}{2}$$
$$s_4 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} > 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} = s_2 + \frac{1}{2} = 1 + 2 \cdot \frac{1}{2}$$
$$s_8 = s_4 + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} > s_4 + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = s_4 + \frac{1}{2} > 1 + 3 \cdot \frac{1}{2}$$
By continuing in this way we see that
$$s_{2^k} > 1 + k \cdot \frac{1}{2} \qquad (9.4)$$
The sequence of partial sums is strictly increasing (since the summands are all positive) and so it admits a limit; the inequality (9.4) guarantees that it is unbounded above and therefore $\lim s_n = +\infty$. Hence,
$$\sum_{n=1}^{\infty} \frac{1}{n} = +\infty$$
Example 346 (Geometric series) The geometric series with ratio $q$ is defined by:
$$1 + q + q^2 + q^3 + \cdots + q^n + \cdots = \sum_{n=0}^{\infty} q^n$$

(i) If $q = 1$, then
$$s_n = \underbrace{1 + 1 + \cdots + 1}_{n+1 \text{ times}} = n + 1 \to +\infty$$

For $q \neq 1$, since
$$s_n - q s_n = \left(1 + q + q^2 + q^3 + \cdots + q^n\right) - q\left(1 + q + q^2 + q^3 + \cdots + q^n\right)$$
$$= \left(1 + q + q^2 + q^3 + \cdots + q^n\right) - \left(q + q^2 + q^3 + \cdots + q^{n+1}\right) = 1 - q^{n+1}$$
we have
$$(1 - q)\, s_n = 1 - q^{n+1}$$

(ii) if $|q| < 1$, then $q^{n+1} \to 0$ and
$$s_n \to \frac{1}{1 - q}$$

(iii) if $q = -1$, the partial sums of odd order are equal to zero, while those of even order are equal to 1. The sequence formed by them is hence irregular;

(iv) if $q < -1$, the sequence $\{q^{n+1}\}$ is irregular and, therefore, so is $\{s_n\}$ as well. N
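A quick numerical confirmation (an added illustration) of $s_n \to 1/(1-q)$ for $|q| < 1$, and of $s_n = n + 1$ for $q = 1$:

```python
def geom_partial(q, n):  # s_n = 1 + q + q^2 + ... + q^n
    return sum(q**k for k in range(n + 1))

limit_half = geom_partial(0.5, 60)   # close to 1/(1 - 0.5) = 2
limit_neg = geom_partial(-0.5, 60)   # close to 1/(1 + 0.5) = 2/3
count_q1 = geom_partial(1, 9)        # with q = 1 the partial sum is n + 1 = 10
```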
Example 347 We can use the recursive definition of partial sums (9.3) to guess and verify (by induction) the partial sums of the geometric series. The, highly inspired, guess is that
$$s_n = \frac{1 - q^{n+1}}{1 - q}$$
We verify the guess by induction. At $n = 0, 1$ it is trivially true. Assume it is true at $n$ (induction hypothesis). Then
Epicurus, in a letter to Herodotus, wrote: "Once one says that there are infinite parts in a body or parts of any degree of smallness, it is not possible to conceive how this should be, and indeed how could the body any longer be limited in size?" The previous examples show that, indeed, if these "parts", these particles, have a strictly positive, but different, size – for example either $1/n(n+1)$ or $q^n$, with $q \in (0, 1)$ – then the series might converge, so the size of the "body" can be defined. Nevertheless, Epicurus was right in the sense that, if we assume – as it seems he does too – that all the particles have the same size $\varepsilon$, no matter how small, then the series
$$\varepsilon + \varepsilon + \varepsilon + \cdots + \varepsilon + \cdots$$
positively diverges. That is, $\sum_{n=1}^{\infty} \varepsilon = +\infty$ for every $\varepsilon > 0$. Indeed, for the partial sums we have $s_n = n\varepsilon \to +\infty$. This simple series has an interesting philosophical meaning (properties of series have often been used, even within philosophy, to try to clarify the nature of the potential infinite).
where $\beta \in (0, 1)$ is the subjective discount factor. In view of what we have just seen, (9.5) is the series
$$\sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t) \qquad (9.6)$$
Series thus give a rigorous meaning to the fundamental discounted form (9.5) of intertemporal utility functions. Naturally, we are interested in the case in which the series (9.6) is convergent, so that the overall utility that the consumer gets from a stream is finite. Otherwise, how could we compare, hence choose among, streams if they have infinite utility?
Using the properties of the geometric series, we will show momentarily, in Example 360, that the series (9.6) converges if $\beta < 1$, provided that the utility functions $u_t$ are positive and bounded by the same constant.4 In such a case, the intertemporal utility function
$$U(x) = \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t) \qquad (9.7)$$
has as domain the entire space $\mathbb{R}^{\infty}$, that is, $U(x) \in \mathbb{R}$ for every $x \in \mathbb{R}^{\infty}$. We can thus compare all possible consumption streams.
and
$$\sum_{n=1}^{\infty} (x_n + y_n) = \sum_{n=1}^{\infty} x_n + \sum_{n=1}^{\infty} y_n$$
Proof Clearly, we have $x_n = s_n - s_{n-1}$ and, given that the series converges, $s_n \to S$ as well as $s_{n-1} \to S$. Therefore, $x_n = s_n - s_{n-1} \to S - S = 0$.

Convergence to zero of the sequence $\{x_n\}$ is, therefore, a necessary condition for the convergence of its series. This condition is only necessary: even though $1/n \to 0$, the harmonic series $\sum_{n=1}^{\infty} 1/n$ diverges.

The series with general term
$$x_n = \frac{2n^2 - 3n + 4}{17n^2 + 4n + 5}$$
is not convergent because $x_n$ is asymptotic to $2n^2 / 17n^2 = 2/17$, so it does not tend to 0. N
4Actually, (9.6) converges if and only if $\beta < 1$, as long as the instantaneous utility functions are equal across periods as well as strictly positive and bounded.
Proposition 350 Each series with positive terms is either convergent or positively divergent. In particular, it is convergent if and only if it is bounded above.6

Series with positive terms thus inherit the remarkable regularity properties of monotonic sequences. This gives them an important status among series. In particular, for them we now recast the convergence criteria presented in Section 8.11 for sequences.
Proposition 351 (Comparison criterion) Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be two series with positive terms, with $x_n \leq y_n$ eventually.

(i) If $\sum_{n=1}^{\infty} x_n$ diverges positively, then so does $\sum_{n=1}^{\infty} y_n$.

(ii) If $\sum_{n=1}^{\infty} y_n$ converges, then so does $\sum_{n=1}^{\infty} x_n$.
Proof Let $n_0 \geq 1$ be such that $x_n \leq y_n$ for all $n \geq n_0$, and set $\alpha = \sum_{n=1}^{n_0} (y_n - x_n)$. Calling $s_n$ (resp., $\sigma_n$) the partial sums of the sequence $\{x_n\}$ (resp., $\{y_n\}$), for $n > n_0$ we have
$$\sigma_n - s_n = \alpha + \sum_{k=n_0+1}^{n} (y_k - x_k) \geq \alpha$$
That is, $\sigma_n \geq s_n + \alpha$. Therefore, the result follows from Proposition 296 (which is the sequential counterpart of this statement).
Note that (i) is the contrapositive of (ii), and vice versa: indeed, thanks to Proposition 350, for a series with positive terms the negation of convergence is positive divergence.7 Because of their usefulness, we stated both; but it is the same property seen in two equivalent ways.
the convergence of $\sum_{n=1}^{\infty} 1/n^2$ is a consequence of the convergence of $\sum_{n=1}^{\infty} 1/(n+1)^2$.

If $\alpha > 2$, then
$$\frac{1}{n^{\alpha}} < \frac{1}{n^2}$$
for every $n > 1$ and therefore we still have convergence.

Finally, it is possible to see, but it is more delicate, that the generalized harmonic series converges also if $\alpha \in (1, 2)$.
Summing up, the generalized harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$$
For the generalized harmonic series, the case $\alpha = 1$ is thus the "last" case of divergence: it is sufficient to increase the exponent very slightly, from 1 to $1 + \varepsilon$ with $\varepsilon > 0$, for the series to converge. This suggests that the divergence is extremely slow, as the reader can check by calculating some of the partial sums.10 This intuition is made precise by the following beautiful result.
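Indeed (an added illustration), the partial sums $H_n = \sum_{k=1}^{n} 1/k$ grow roughly like $\log n$, so even $100{,}000$ terms do not reach 13:

```python
import math

def harmonic(n):  # partial sum H_n = 1 + 1/2 + ... + 1/n
    return sum(1.0 / k for k in range(1, n + 1))

h = {n: harmonic(n) for n in (10, 1_000, 100_000)}
# H_n is close to log n: the gap approaches the Euler-Mascheroni constant
gap = h[100_000] - math.log(100_000)
```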
Proof The proof of this result may be skipped on a first reading since it relies on integration notions that will be presented in Chapter 35. Define $\psi : [0, \infty) \to \mathbb{R}$ by
$$\psi(x) = \frac{1}{i} \qquad \forall x \in [i-1, i)$$
with $i \geq 1$. That is, $\psi(x) = 1$ if $x \in [0, 1)$, $\psi(x) = 1/2$ if $x \in [1, 2)$, and so on. It is easy to see that
$$\frac{1}{x+1} \leq \psi(x) \leq \frac{1}{x} \qquad \forall x > 0 \qquad (9.10)$$
Moreover, the restriction of $\psi$ to every closed interval is a step function. By Proposition 1423, we then have
$$\sum_{i=k}^{n} \frac{1}{i} = \sum_{i=k}^{n} \int_{i-1}^{i} \psi(x)\, dx = \int_{k-1}^{n} \psi(x)\, dx \qquad \forall k = 1, \ldots, n$$
Example 356 The last example can be generalized by showing that the series13
$$\sum_{n=2}^{\infty} \frac{1}{n^{\alpha} \log^{\beta} n}$$
converges for $\alpha > 1$ and any $\beta \in \mathbb{R}$, as well as for $\alpha = 1$ and $\beta > 1$. It diverges for $\alpha < 1$ and any $\beta \in \mathbb{R}$, as well as for $\alpha = 1$ and any $\beta \leq 1$. N
The comparison criterion has a nice and useful asymptotic version, based on the asymptotic comparison of the terms of the sequences.

Proposition 357 (Asymptotic comparison criterion) Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be two series with strictly positive terms.14 If $x_n \sim y_n$, then the two series have the same character.
13The series starts with $n = 2$ because for $n = 1$ the term is not defined.
14The hypothesis that the terms are strictly positive, so non-zero, is necessary to make the ratio $x_n / y_n$ well defined. This hypothesis will be used several times throughout the chapter.
Therefore, the character of a series is invariant with respect to the asymptotic equivalence relation.

Proof Since $x_n \sim y_n$, for every $\varepsilon > 0$ there exists $n_{\varepsilon} \geq 1$ such that
$$1 - \varepsilon \leq \frac{x_n}{y_n} \leq 1 + \varepsilon \qquad \forall n \geq n_{\varepsilon}$$
Hence
$$\sum_{k=1}^{n} x_k = \sum_{k=1}^{n_{\varepsilon}} x_k + \sum_{k=n_{\varepsilon}+1}^{n} \frac{x_k}{y_k}\, y_k \leq c + (1 + \varepsilon) \sum_{k=n_{\varepsilon}+1}^{n} y_k \qquad (9.11)$$
and
$$\sum_{k=1}^{n} x_k = \sum_{k=1}^{n_{\varepsilon}} x_k + \sum_{k=n_{\varepsilon}+1}^{n} \frac{x_k}{y_k}\, y_k \geq c + (1 - \varepsilon) \sum_{k=n_{\varepsilon}+1}^{n} y_k \qquad (9.12)$$
where $c = \sum_{k=1}^{n_{\varepsilon}} x_k$. The character of the series $\sum_{n=1}^{\infty} y_n$ is the same as that of $\sum_{k=n_{\varepsilon}+1}^{\infty} y_k$ because the value assumed by a finite number of initial terms is irrelevant for the character of a series. Therefore, if $\sum_{n=1}^{\infty} y_n$ converges, by (9.11) it follows that $\sum_{n=1}^{\infty} x_n$ converges, whereas if $\sum_{n=1}^{\infty} y_n$ diverges to $+\infty$, from (9.12) it follows that $\sum_{n=1}^{\infty} x_n$ also diverges to $+\infty$.
Consider the series with general term
$$x_n = \frac{n+1}{n^2 - 3n + 4}$$
Since $x_n \sim 1/n$, the series $\sum_{n=1}^{\infty} x_n$ diverges to $+\infty$. N
We can use the asymptotic comparison criterion to establish a celebrated result, proved in 1737 by Euler, which says that the sum of the reciprocals of the prime numbers is infinite.
$$\frac{1}{p_n} \sim \frac{1}{n \log n}$$
By the asymptotic comparison criterion, the series $\sum_{n=1}^{\infty} 1/p_n$ has the same character as $\sum_{n=2}^{\infty} 1/(n \log n)$. In view of Example 356, we have
$$\sum_{n=2}^{\infty} \frac{1}{n \log n} = +\infty$$
It follows that $\sum_{n=1}^{\infty} 1/p_n = +\infty$, as desired.
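The divergence is extraordinarily slow (an added illustration): summing $1/p$ over all primes $p \leq 10^5$ barely passes 2.7.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

ps = primes_up_to(100_000)
partial = sum(1.0 / p for p in ps)  # sum of reciprocals of primes up to 10^5
```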
Euler's Theorem, along with the comparison criterion, implies the divergence to $+\infty$ of the harmonic series. Indeed,
$$\frac{1}{p_n} \leq \frac{1}{n}$$
for every $n \geq 1$.15 Euler's Theorem is, however, a truly remarkable result with respect to the divergence of the harmonic series in that it involves only the reciprocals of the prime numbers, whereas the harmonic series considers the reciprocals of all natural numbers (be they prime or not).
Euler’s Theorem con…rms that there are in…nitely many prime numbers, and shows that
they are “dense”inPN because they tend to +1 more slowly than the powers n , with > 1,
for which we have 1 n=1 1=n < +1.
We conclude our analysis of the comparison criterion with an important economic application.

Suppose that the functions $u_t : \mathbb{R} \to \mathbb{R}$ are positive and uniformly bounded above, that is, there is a common constant $M > 0$ such that, for all $t \ge 1$,
$$0 \le u_t(x) \le M \qquad \forall x \in \mathbb{R}$$

converges. In view of Example 346, we conclude that the series (9.6) converges if and only if the discount factor is smaller than $1$.16 N

15 We have $1/p_1 = 1/2 \le 1$, $1/p_2 = 1/3 \le 1/2$, $1/p_3 = 1/5 \le 1/3$, $1/p_4 = 1/7 \le 1/4$, and so on.
16 The asymptotic behavior as the discount factor tends to $1$, that is, as patience becomes infinite, will be addressed by the Frobenius-Littlewood Theorem in Section 10.6.
(i) If
$$\lim \frac{x_{n+1}}{x_n} < 1$$
the series converges.

(ii) If
$$\lim \frac{x_{n+1}}{x_n} > 1$$
the series diverges positively.

The criterion is thus based on the study of the limit of the ratio
$$\frac{x_{n+1}}{x_n}$$
of the terms of the series. The condition that the limit $\lim x_{n+1}/x_n$ exists is rather demanding, as we will see in the next section. But, when it is satisfied, the elementary limit form of the ratio criterion is the easiest to apply.
Proof (i) Without loss of generality, assume that $x_n > 0$ and (9.13) holds for every $n$. From $x_{n+1} \le q x_n$ we deduce, as in the analogous criterion for sequences, that $0 < x_n \le q^{n-1} x_1$, and the first statement follows from the comparison criterion (Proposition 351) and from the convergence of the geometric series. (ii) If eventually $x_{n+1}/x_n \ge 1$ and $x_n > 0$, then eventually $x_{n+1} \ge x_n > 0$. In other words, the sequence $\{x_n\}$ is eventually increasing and therefore cannot tend to $0$, so the series must diverge positively.
Example 364 Let $\{x_n\}$ be a sequence such that $x_1 > 0$ and
$$x_{n+1} = \begin{cases} \dfrac{1}{2}\, x_n & \text{if } n \text{ is even} \\[1ex] \dfrac{1}{3}\, x_n & \text{if } n \text{ is odd} \end{cases}$$
For instance, if $x_1 = 1$ then $\{x_n\} = \{1, 1/3, 1/6, 1/18, \ldots\}$. Since $x_{n+1}/x_n \le 1/2$ for all $n \ge 1$, by the ratio criterion the series $\sum_{n=1}^{\infty} x_n$ converges. Note that here $\lim x_{n+1}/x_n$ does not exist. N
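A small numerical companion to Example 364 (our own sketch): the ratios take only the two values $1/2$ and $1/3$, so their limit does not exist, yet they stay below $1/2$ and the partial sums settle down.

```python
# Example 364 in numbers: x_{n+1} = x_n / 2 if n is even, x_n / 3 if n is odd
terms = [1.0]  # x_1 = 1, as in the example
for n in range(1, 60):
    terms.append(terms[-1] * (0.5 if n % 2 == 0 else 1 / 3))

ratios = [terms[i + 1] / terms[i] for i in range(len(terms) - 1)]
partial = sum(terms)
# the ratios cluster on exactly two values, so lim x_{n+1}/x_n does not exist
print(sorted(set(round(r, 6) for r in ratios)), partial)
```

Pairing consecutive terms shows the sum is the geometric series $(4/3)\sum_{k \ge 0} (1/6)^k = 8/5$, which the computed partial sum matches.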
It is possible to prove (see Section 10.4) that, if $\lim x_{n+1}/x_n$ exists, then the ratio criterion assumes exactly the tripartite form given in Proposition 361. That is:

(i) if $\lim x_{n+1}/x_n < 1$, then the series converges;

(ii) if $\lim x_{n+1}/x_n > 1$, then the series diverges positively;

(iii) if $\lim x_{n+1}/x_n = 1$, then the criterion fails and gives no indication about the character of the series.

Operationally, this tripartite form is the standard form in which the ratio criterion is applied. At a mechanical level, it might be sufficient to recall this tripartition and the illustrative examples given in the prelude. But, lest we do plumbing rather than mathematics, it is important to keep in mind the theoretical foundations provided by Proposition 363 (the last simple example, in which the tripartite form is useless, shows that they can also be useful). Let us see other tripartite examples.
Example 365 (i) By the ratio criterion, the series $\sum_{n=1}^{\infty} q^n / n^\alpha$ converges for every $\alpha \in \mathbb{R}$ and every $0 < q < 1$. Indeed,
$$\frac{n^\alpha\, q^{n+1}}{(n+1)^\alpha\, q^n} = \left( \frac{n}{n+1} \right)^\alpha q \to q < 1$$
Again by the ratio criterion, this series diverges positively when $q > 1$. Finally, if $q = 1$ we are back to the generalized harmonic series $\sum_{n=1}^{\infty} 1/n^\alpha$ of Example 354.

(ii) The series $\sum_{n=1}^{\infty} x^n / n!$ converges for every $x > 0$. Indeed,
$$\frac{x^{n+1}\, n!}{(n+1)!\, x^n} = \frac{x}{n+1} \to 0 \qquad \forall x > 0$$

(iii) The series $\sum_{n=1}^{\infty} x^n / n$ converges for every $0 < x < 1$. Indeed,
$$\frac{x^{n+1}\, n}{(n+1)\, x^n} = \frac{n}{n+1}\, x \to x$$
which obviously is $< 1$ when $0 < x < 1$. If $x > 1$, the ratio criterion implies that the series diverges positively. Finally, if $x = 1$ we are back to the harmonic series, which diverges positively. N
We stop our study of convergence criteria here. Much more can be said: in Section 10.4 we will return to this topic in greater depth.
Proof In Example 353 we showed that the series converges. Let us compute its sum. By Newton’s binomial formula (B.4), for each $n \ge 1$ we have
$$\left( 1 + \frac{1}{n} \right)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^k} = \sum_{k=0}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}$$
Since
$$\frac{n!}{(n-k)!} = \underbrace{n (n-1) \cdots (n-k+1)}_{k \text{ times}} \le \underbrace{n \cdots n}_{k \text{ times}} = n^k$$
we therefore have
$$\frac{n!}{(n-k)!} \frac{1}{n^k} \le 1$$
which implies
$$\left( 1 + \frac{1}{n} \right)^n = \sum_{k=0}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} \le \sum_{k=0}^{n} \frac{1}{k!}$$
It follows that
$$e \le \sum_{n=0}^{\infty} \frac{1}{n!} \tag{9.15}$$
The equality (9.17) holds for every number $x$ and reduces to (9.14) in the special case $x = 1$. Note the remarkable series expansion of the exponential function:
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots \tag{9.18}$$
From Example 365 we know that the series $\sum_{n=0}^{\infty} x^n / n!$ converges for $x > 0$. The case $x = 0$ is trivial. At the same time, Example 371 of the next section will show that this series converges also for $x < 0$, when it no longer has positive terms.
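The expansion can be checked numerically; the following sketch (ours, with the hypothetical function name `exp_series`) sums the series term by term and compares it with `math.exp`, including for negative $x$.

```python
import math

def exp_series(x, tol=1e-15):
    # partial sums of sum_{n>=0} x^n / n!; each new term is the previous
    # one times x/n, so the terms eventually shrink faster than any
    # geometric sequence, for every real x
    term, total, n = 1.0, 1.0, 0
    while abs(term) > tol:
        n += 1
        term *= x / n
        total += term
    return total

for x in (-5.0, -1.0, 0.5, 3.0):
    print(x, exp_series(x), math.exp(x))
```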
and
$$\frac{n!}{(n-k)!} \frac{1}{n^k} \le 1$$
Fix $m \ge 1$. For every $n > m$, we have
$$\left( 1 + \frac{x}{n} \right)^n - \sum_{k=0}^{m} \frac{x^k}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} = \sum_{k=m+1}^{n} \frac{x^k}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} \le \sum_{k=m+1}^{n} \frac{|x|^k}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} \le \sum_{k=m+1}^{n} \frac{|x|^k}{k!}$$
This proves that $\sum_{k=0}^{m} x^k / k!$ converges to $e^x$ as $m \to \infty$, hence the statement.

N.B. In the proof we used a noteworthy fact: if the series $\sum_{k=1}^{\infty} x_k$ converges, then the sequence of “forward” sums $\left\{ \sum_{k=m}^{\infty} x_k \right\}$ converges to $0$ as $m \to +\infty$. Intuitively, if from an infinite sum we first remove the first summand, then the first two summands, then the first three summands, and so on and so forth, then what is left should eventually vanish. The reader may want to make this argument rigorous. O
Later in the book we will see that (9.17) is a power series (Chapter 10). For this reason, the equality (9.18) is called the power series expansion of the exponential function. It is a result, as elegant as it is important, that allows us to “decompose” the exponential function into a sum of (infinitely many) simple functions, the powers $x^n$.

We will study series expansions in greater generality with the tools of differential calculus, of which they are one of the most remarkable applications.
Proof We have
$$0 < e - \sum_{k=0}^{n} \frac{1}{k!} = \sum_{k=n+1}^{\infty} \frac{1}{k!} = \frac{1}{n!} \left( \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots + \frac{1}{(n+1)(n+2) \cdots (n+k)} + \cdots \right)$$
$$< \frac{1}{n!} \left( \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots + \frac{1}{(n+1)^k} + \cdots \right) = \frac{1}{n!} \sum_{k=1}^{\infty} \frac{1}{(n+1)^k} = \frac{1}{n!} \frac{1}{n}$$
where the last equality holds because the geometric series that starts at $k = 1$ with ratio $1/(n+1)$ has sum $1/n$. By Theorem 366, we then have the following interesting bounds:
$$0 < e - \sum_{k=0}^{n} \frac{1}{k!} < \frac{1}{n!} \frac{1}{n}$$
9.4. SERIES WITH TERMS OF ANY SIGN 261
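The bound above is easy to probe numerically (our own check, not part of the text):

```python
import math

# check the bound 0 < e - sum_{k=0}^n 1/k! < 1/(n! n) for small n;
# the error and the bound shrink together, factorially fast
for n in range(1, 11):
    s = sum(1 / math.factorial(k) for k in range(n + 1))
    print(n, math.e - s, 1 / (math.factorial(n) * n))
```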
Suppose, by contradiction, that $e$ is rational, i.e., $e = p/q$ for some natural numbers $p$ and $q$. By multiplying all sides of the last inequality by $n!$, we then have
$$0 < n!\, \frac{p}{q} - n! \sum_{k=0}^{n} \frac{1}{k!} < \frac{1}{n} \tag{9.19}$$
If $n = q$, then
$$n!\, \frac{p}{q} - n! \sum_{k=0}^{n} \frac{1}{k!} = p\, (q-1)! - \left( q! + q! + \frac{q!}{2!} + \cdots + 1 \right)$$
is an integer, which cannot be strictly between $0$ and $1/n$ as (9.19) requires. This contradiction proves that $e$ is not rational.
The next result shows that the convergence of the series of absolute values – which can be verified with the criteria discussed in the previous sections – guarantees the convergence of the original, not necessarily positive and so possibly much wilder, series.

The condition is only sufficient, as we will soon show (Proposition 374). The class of absolutely convergent series is, therefore, contained in that of convergent series. As the next section will show, this subclass has key regularity properties: absolutely convergent series are, among the series with terms of any sign, the ones that behave well.
Example 371 Let us revisit the series in Example 365 by permitting negative terms.

(i) By Theorem 370 and by the ratio criterion, the series $\sum_{n=1}^{\infty} q^n / n^\alpha$ converges for every $\alpha \in \mathbb{R}$ and every $-1 < q < 1$. Indeed, from
$$\frac{|x_{n+1}|}{|x_n|} = \frac{n^\alpha\, |q|^{n+1}}{(n+1)^\alpha\, |q|^n} = \left( \frac{n}{n+1} \right)^\alpha |q| \to |q| < 1$$
it follows that it converges absolutely.
(ii) The series $\sum_{n=1}^{\infty} x^n / n!$ converges for every $x \in \mathbb{R}$. In fact, from
$$\frac{|x_{n+1}|}{|x_n|} = \frac{|x|^{n+1}\, n!}{(n+1)!\, |x|^n} = \frac{|x|}{n+1} \to 0 \qquad \forall x \in \mathbb{R}$$
it follows that it converges absolutely. So, the series in Theorem 367 is, indeed, convergent.

(iii) The series $\sum_{n=1}^{\infty} x^n / n$ converges for every $-1 < x < 1$. Indeed,
$$\frac{|x_{n+1}|}{|x_n|} = \frac{|x|^{n+1}\, n}{(n+1)\, |x|^n} = \frac{n}{n+1}\, |x| \to |x|$$
which obviously is $< 1$ when $-1 < x < 1$. Thus, also this series converges absolutely. N
converges. Indeed, we have $\left| (-1)^n / n^2 \right| = 1/n^2$, so this series converges absolutely. N

(ii) The series
$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}$$
converges for every $x \in \mathbb{R}$: the ratio of the absolute values of consecutive terms is
$$\frac{|x|^{2n+3}\, (2n+1)!}{(2n+3)!\, |x|^{2n+1}} = \frac{x^2}{(2n+3)(2n+2)} \to 0 \qquad \forall x \in \mathbb{R}$$
so the series converges absolutely.
Theorem 370 is a consequence of the following simple lemma, which should also further clarify its nature.

Lemma 373 Given a series $\sum_{n=1}^{\infty} x_n$, suppose there is a convergent series $\sum_{n=1}^{\infty} y_n$ with positive terms such that, for every $n \ge 1$,

(i) $x_n + y_n \ge 0$;

(ii) $x_n \le y_n$.

Then, both the series $\sum_{n=1}^{\infty} (x_n + y_n)$ and $\sum_{n=1}^{\infty} x_n$ converge, with
$$\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty} (x_n + y_n) - \sum_{n=1}^{\infty} y_n$$

Proof Setting $z_n = x_n + y_n$, denote by $s_n^x$, $s_n^y$ and $s_n^z$, respectively, the partial sums of the three series involved. Both $\lim s_n^z$ and $\lim s_n^y$ exist. Clearly, $s_n^x = s_n^z - s_n^y$ for every $n \ge 1$. By Proposition 309-(i), we then have $\lim s_n^x = \lim s_n^z - \lim s_n^y$, as desired.
The series $\sum_{n=1}^{\infty} y_n$ thus “lifts”, via addition, the series of interest $\sum_{n=1}^{\infty} x_n$ and takes it back to the familiar terrain of series with positive terms. The convergence of $\sum_{n=1}^{\infty} x_n$ can then be established by studying two auxiliary series with positive terms, for which we have at our disposal all the tools learned in the previous sections.

Theorem 370 follows from the lemma by considering $y_n = |x_n|$, because $|x_n| + x_n \ge 0$ and $x_n \le |x_n|$ for every $n \ge 1$. This clarifies the “lifting” nature of absolute convergence. In particular, it implies
$$\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty} (x_n + |x_n|) - \sum_{n=1}^{\infty} |x_n|$$
so that the sum of the series $\sum_{n=1}^{\infty} x_n$ can be expressed in terms of the sums of two series with positive terms.
Absolute convergence is only a sufficient condition for convergence. Indeed, the alternating harmonic series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots \tag{9.20}$$
converges to $\log 2$, as the next elegant result will show. However, it does not converge absolutely:
$$\sum_{n=1}^{\infty} \left| \frac{(-1)^{n+1}}{n} \right| = \sum_{n=1}^{\infty} \frac{1}{n} = +\infty$$
are decreasing and increasing, respectively. So, they converge to two scalars $L_{\mathrm{odd}}$ and $L_{\mathrm{even}}$, respectively. Since $s_{2n+1} - s_{2n} = x_{2n+1} \to 0$, we then have $L_{\mathrm{odd}} = L_{\mathrm{even}}$. If we call $L$ this common limit, we conclude that $s_n \to L$, so the alternating harmonic series converges.

It remains to show that $L = \log 2$. It is enough to consider the even partial sums $s_{2n}$ and show that $\lim s_{2n} = \log 2$. We have
$$s_{2n} = \sum_{k=1}^{2n} \frac{(-1)^{k+1}}{k} = \sum_{k=0}^{n-1} \frac{1}{2k+1} - \sum_{k=1}^{n} \frac{1}{2k} = \sum_{k=0}^{n-1} \frac{1}{2k+1} + \sum_{k=1}^{n} \frac{1}{2k} - 2 \sum_{k=1}^{n} \frac{1}{2k} = \sum_{k=1}^{2n} \frac{1}{k} - \sum_{k=1}^{n} \frac{1}{k}$$
By (9.9),
$$\sum_{k=1}^{n} \frac{1}{k} = \gamma + \log n + o(1) \qquad \text{and} \qquad \sum_{k=1}^{2n} \frac{1}{k} = \gamma + \log 2n + o(1)$$
so that
$$s_{2n} = \sum_{k=1}^{2n} \frac{1}{k} - \sum_{k=1}^{n} \frac{1}{k} = \log 2 + o(1)$$
It is easy to check that the argument just used to show the convergence of the alternating series (9.20) proves, more generally, that any alternating series $\sum_{n=1}^{\infty} (-1)^{n+1} x_n$, with $x_n \ge 0$ for every $n \ge 1$, converges provided the sequence $\{x_n\}$ is decreasing and infinitesimal, i.e., $x_n \downarrow 0$.
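A numerical look at the alternating harmonic series itself (our own illustration): even partial sums approach $\log 2$ from below and odd ones from above, with error of order $1/n$, consistent with the lack of absolute convergence.

```python
import math

def s(n):
    # n-th partial sum of the alternating harmonic series
    return sum((-1)**(k + 1) / k for k in range(1, n + 1))

# the sum log 2 always lies between consecutive partial sums
print(s(10), math.log(2), s(11))
# convergence is slow: the error is of order 1/n
print(abs(s(100_000) - math.log(2)))
```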
for any permutation18 $\sigma : \mathbb{N} \to \mathbb{N}$? In other words, are series stable under permutations of their terms?

This stability seems inherent to any proper notion of “addition”, which should not be affected by mere rearrangements of the summands. Indeed, the answer is obviously positive for finite sums because of the classic associative and commutative properties of addition. The next result shows that the answer continues to be positive for series that are absolutely convergent.

Proposition 375 Let $\sum_{n=1}^{\infty} x_n$ be a series that converges absolutely. Then, $\sum_{n=1}^{\infty} x_n$ and all its rearrangements have the same sum.

17 We refer interested readers to Chapter 3 of Rudin (1976) for a more detailed analysis, which includes the proofs of the results of this section.
18 Recall that a permutation is a bijective function (see Appendix B).
Absolutely convergent series thus exhibit the same good behavior that characterizes finite sums. Unfortunately, this is no longer the case if we drop absolute convergence. For instance, consider the alternating harmonic series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n+1}}{n} + \cdots$$
We learned that it converges, with sum $\log 2$, but that it is not absolutely convergent. Through a suitable permutation, we can construct the rearrangement
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots$$
which is still convergent, but with sum $\log 2\sqrt{2}$. So, rearrangements have, in general, different sums. The next classic result of Riemann shows that anything goes, so the answer to the previous question turns out to be dramatically negative.
Theorem 376 (Riemann) Let $\sum_{n=1}^{\infty} x_n$ be a series that converges but not absolutely (i.e., $\sum_{n=1}^{\infty} |x_n| = +\infty$). Given any $L \in \mathbb{R}$, there is a rearrangement of $\sum_{n=1}^{\infty} x_n$ that has sum $L$.

Summing up, series that are absolutely convergent behave like standard addition. But, as soon as we drop absolute convergence, anything goes.
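The mechanism behind Riemann’s theorem can be sketched in a few lines (our own greedy implementation, applied to the alternating harmonic series): take positive terms until the running sum exceeds the target $L$, then negative terms until it drops below, and repeat. Since the terms not yet used tend to $0$, the overshoot at each switch vanishes and the rearranged series converges to $L$.

```python
import math

def rearranged_sum(L, n_terms=200_000):
    # greedy rearrangement of 1 - 1/2 + 1/3 - 1/4 + ... targeting the sum L
    pos, neg = 1, 2            # next unused odd / even denominator
    total, used = 0.0, 0
    while used < n_terms:
        if total <= L:
            total += 1 / pos   # take the next positive term 1/pos
            pos += 2
        else:
            total -= 1 / neg   # take the next negative term -1/neg
            neg += 2
        used += 1
    return total

# any target works, including the log(2*sqrt(2)) of the rearrangement above
for L in (0.0, math.log(2 * math.sqrt(2)), 2.0):
    print(L, rearranged_sum(L))
```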
Chapter 10

Discrete calculus

Discrete calculus deals with problems analogous to those of differential calculus, with the difference that sequences, that is, functions $f : \mathbb{N} \setminus \{0\} \to \mathbb{R}$ with a discrete domain, are considered instead of functions on the real line. Despite the rougher domain, some highly non-trivial results hold that make discrete calculus useful in applications.1 In particular, in this chapter we will show its use in the study of series and sequences, allowing for a deeper analysis of some issues that we have already discussed.
Example 377 For the alternating sequence $x_n = (-1)^n$, we have $y_n = 1$ and $z_n = -1$ for every $n$, whereas for the sequence $x_n = 1/n$ we have $y_n = 1/n$ and $z_n = 0$ for every $n$. N

$$-M \le z_n \le x_n \le y_n \le M \qquad \forall n \ge 1 \tag{10.1}$$
so $\{y_n\}$ is decreasing and $\{z_n\}$ is increasing. Being monotone, both $\{y_n\}$ and $\{z_n\}$ converge (Theorem 299). If we denote their limits by $y$ and $z$, that is, $y_n \to y$ and $z_n \to z$, we can write
$$\lim_{n \to \infty} \sup_{k \ge n} x_k = y \qquad \text{and} \qquad \lim_{n \to \infty} \inf_{k \ge n} x_k = z$$
The limits $y$ and $z$ are called, respectively, the limit superior and the limit inferior of $\{x_n\}$, and are denoted by $\limsup x_n$ and $\liminf x_n$.

1 Some parts of this chapter require a basic knowledge of differential calculus. This chapter can be read seamlessly after reading Chapter 20.
268 CHAPTER 10. DISCRETE CALCULUS
This example shows two key properties of the limits inferior and superior: they always exist, even if the original sequence has no limit, and their equality is a necessary and sufficient condition for the convergence of the sequence $\{x_n\}$.2 Formally:

Proof Thanks to (10.1), Proposition 296 implies (10.2). The proof of the second part of the statement is left to the reader.

$$\liminf x_n = -\limsup\, (-x_n) \qquad \text{and} \qquad \limsup x_n = -\liminf\, (-x_n) \tag{10.3}$$
These are duality properties that relate the limit superior and limit inferior of a sequence $\{x_n\}$ to those of the opposite sequence $\{-x_n\}$. For instance, this simple duality allows us to easily translate properties of the limit superior into properties of the limit inferior, and vice versa (this is exactly what will happen in the next proof). Another interesting consequence of the duality is the possibility to rewrite the inequality (10.2) as $\liminf x_n \le -\liminf\, (-x_n)$.

The next result lists some basic properties of the limits superior and inferior. Thanks to the previous result, they imply the analogous properties that we established for convergent sequences.3
Lemma 380 Let $\{x_n\}$ and $\{y_n\}$ be two bounded sequences. We have:

(i) $\liminf x_n + \liminf y_n \le \liminf\, (x_n + y_n)$;

(ii) $\limsup\, (x_n + y_n) \le \limsup x_n + \limsup y_n$;

(iii) $\liminf x_n \le \liminf y_n$ and $\limsup x_n \le \limsup y_n$ if eventually $x_n \le y_n$.

2 Since it is bounded, $\{x_n\}$ converges or oscillates, but does not diverge.
3 Specifically, (i) and (ii) have as a special case Proposition 309-(i), while (iii) has as a special case Proposition 296.
10.1. PREAMBLE: LIMIT POINTS 269
Proof We start by observing that $\{x_n + y_n\}$ is bounded. (i) For every $n$ we have $\inf_{k \ge n} (x_k + y_k) \ge \inf_{k \ge n} x_k + \inf_{k \ge n} y_k$. Since the sequences $\{\inf_{k \ge n} (x_k + y_k)\}$, $\{\inf_{k \ge n} x_k\}$ and $\{\inf_{k \ge n} y_k\}$ converge, (i) follows from Proposition 296. (ii) follows from (i) and the duality formulas contained in (10.3).
If the sequence converges, there exists a unique limit point: the limit of the sequence. If the sequence does not converge, the limit points are the scalars that are approached by infinitely many elements of the sequence. Indeed, it can easily be shown that $L$ is a limit point of a sequence if and only if there exists a subsequence that converges to $L$.

Example 382 (i) The interval $[-1, 1]$ is the set of limit points of the sequence $x_n = \sin n$, whereas $\{-1, 1\}$ is the set of limit points of the alternating sequence $x_n = (-1)^n$. (ii) The singleton $\{0\}$ is the unique limit point of the convergent sequence $x_n = 1/n$. N
The next result shows that the limit points belong to the interval determined by the limit superior and the limit inferior.

Proposition 383 Let $\{x_n\}$ be a bounded sequence. If $x \in \mathbb{R}$ is a limit point of the sequence, then $x \in [\liminf x_n, \limsup x_n]$.

Proof Consider a limit point $x$. By contradiction, assume that $\liminf x_n > x$. Define $\varepsilon = \liminf x_n - x > 0$ and $z_n = \inf_{k \ge n} x_k$ for every $n$. On the one hand, in light of the previous part of the chapter, we know that $z_{n+1} \ge z_n$ for every $n$ and $z_n \to \liminf x_n$. This implies that there exists $n_\varepsilon \in \mathbb{N}$ such that
$$\liminf x_n - \frac{\varepsilon}{2} < z_n < \liminf x_n + \frac{\varepsilon}{2}$$
for every $n \ge n_\varepsilon$. On the other hand, since $x$ is a limit point, there exists $x_n$ such that $x - \frac{\varepsilon}{2} < x_n < x + \frac{\varepsilon}{2}$, where $n$ can be chosen to be strictly greater than $n_\varepsilon$ (recall that each neighborhood of $x$ must contain an infinite number of elements of the sequence). By construction, we have that $z_n = \inf_{k \ge n} x_k \le x_n$. This yields that
$$\liminf x_n - \frac{\varepsilon}{2} < z_n \le x_n < x + \frac{\varepsilon}{2}$$
thus $\liminf x_n < x + \varepsilon$. We reached a contradiction: by definition $\varepsilon = \liminf x_n - x$, which we just proved to be strictly smaller than $\varepsilon$. An analogous argument yields that $\limsup x_n \ge x$ (why?).
Intuitively, the larger the set of limit points, the more irregular the sequence; in particular, this set reduces to a singleton when the sequence converges. In light of the last result, the difference between the limits superior and inferior, that is, the length of $[\liminf x_n, \limsup x_n]$, is a (not that precise) indicator of the irregularity of a sequence.

Thanks to the inequality $\liminf x_n \le -\liminf\, (-x_n)$, the interval $[\liminf x_n, \limsup x_n]$ can be rewritten as $[\liminf x_n, -\liminf\, (-x_n)]$. For instance, if $x_n = \sin n$ or $x_n = \cos n$, we have that $[\liminf x_n, -\liminf\, (-x_n)] = [-1, 1]$.
N.B. Up to this point, we have considered only bounded sequences. Versions of the previous results, however, can be provided for generic sequences. Clearly, we need to allow the limits superior and inferior to assume infinity as a value. For instance, if we consider the sequence $x_n = n$, which diverges to $+\infty$, we have $\liminf x_n = \limsup x_n = +\infty$; for the sequence $x_n = -e^n$, which diverges to $-\infty$, we have $\limsup x_n = \liminf x_n = -\infty$; whereas for the sequence $x_n = (-1)^n n$ we have $\liminf x_n = -\infty$ and $\limsup x_n = +\infty$, so that $[\liminf x_n, \limsup x_n] = \overline{\mathbb{R}}$. We leave to the reader the extension of the previous results to generic sequences. O
The next result lists the algebraic properties of the differences, that is, their behavior with respect to the fundamental operations.5

Proposition 385 Let $\{x_n\}$ and $\{y_n\}$ be any two sequences. For every $n$, we have:

On the one hand, (i) shows that the difference operator preserves addition and subtraction; on the other hand, (ii) and (iii) show that more complex rules hold for multiplication and division. Properties (ii) and (iii) are called the product rule and the quotient rule, respectively.
Therefore, the monotonicity of the original sequence is revealed by the sign of its differences.

Example 387 (i) If $x_n = c$ for all $n \ge 1$, then $\Delta x_n = 0$ for all $n \ge 1$. In words, constant sequences (which are both increasing and decreasing) have zero differences. (ii) If $x_n = a^n$, with $a > 0$, we have that
$$\Delta x_n = a^{n+1} - a^n = (a - 1)\, a^n = (a - 1)\, x_n$$

The sequence $x_n = 2^n$ thus equals the sequence of its own finite differences, so it is the discrete counterpart of the exponential function in differential calculus.

Proof “If”. From the last example, if $a = 2$ then for the increasing sequence $\{2^n\}$ we have $\Delta x_n = x_n$ for every $n$ and $x_1 = 2$. “Only if”. Suppose that $\Delta x_n = x_n$ for all $n \ge 1$, that is, $x_{n+1} - x_n = x_n$, i.e., $x_{n+1} = 2 x_n$. A simple recurrence argument shows that $x_n = 2^{n-1} x_1$. Since $x_1 = 2$, we obtain $x_n = 2^n$ for every $n$.
This formula can be proved by induction on $k$ (a common technique in this chapter). Here, we only outline the induction step. Assume that (10.4) holds for $k$. We show it holds for $k + 1$. Fix $n$. First, observe that (why?)
$$\binom{k+1}{i} = \binom{k}{i-1} + \binom{k}{i} \qquad \forall i = 1, \ldots, k \tag{10.5}$$
This implies that
$$\Delta^{k+1} x_n = \Delta^k x_{n+1} - \Delta^k x_n = \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} x_{n+1+i} - \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} x_{n+i}$$
$$= \sum_{i=0}^{k-1} (-1)^{k-i} \binom{k}{i} x_{n+1+i} + x_{n+k+1} - (-1)^k x_n - \sum_{i=1}^{k} (-1)^{k-i} \binom{k}{i} x_{n+i}$$
$$= (-1)^{k+1} x_n + \sum_{i=1}^{k} (-1)^{k+1-i} \binom{k}{i-1} x_{n+i} + \sum_{i=1}^{k} (-1)^{k+1-i} \binom{k}{i} x_{n+i} + x_{n+k+1}$$
$$= (-1)^{k+1} \binom{k+1}{0} x_n + \sum_{i=1}^{k} (-1)^{k+1-i} \binom{k+1}{i} x_{n+i} + \binom{k+1}{k+1} x_{n+k+1}$$
$$= \sum_{i=0}^{k+1} (-1)^{k+1-i} \binom{k+1}{i} x_{n+i}$$
$$\Delta n = (n + 1) - n = 1$$
$$\Delta n^2 = (n + 1)^2 - n^2 = 2n + 1$$
$$\Delta^2 n^2 = 2 (n + 1) + 1 - (2n + 1) = 2$$
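These computations are easy to replicate with a small difference operator (our own helper names `diff` and `diff_k`); the last lines also spot-check formula (10.4) on an arbitrary test sequence.

```python
from math import comb

def diff(seq):
    # Δx_n = x_{n+1} - x_n
    return [b - a for a, b in zip(seq, seq[1:])]

def diff_k(seq, k):
    # Δ^k: the difference operator applied k times
    for _ in range(k):
        seq = diff(seq)
    return seq

ns = list(range(1, 12))
print(diff([n**2 for n in ns]))       # Δn^2 = 2n + 1
print(diff_k([n**2 for n in ns], 2))  # Δ^2 n^2 = 2, constant
print(diff_k([n**2 for n in ns], 3))  # Δ^3 n^2 = 0

# formula (10.4): Δ^k x_n = sum_i (-1)^(k-i) C(k,i) x_{n+i}
x = [n**3 + 2 * n for n in range(1, 15)]
k = 4
assert diff_k(x, k)[0] == sum((-1)**(k - i) * comb(k, i) * x[i] for i in range(k + 1))
```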
Formula (10.4) permits the following beautiful generalization of the series expansion (9.17) of the exponential function. From now on, we set $\Delta^0 x_n = x_n$ for every $n$. Note that if we also set $\binom{0}{0} = 1$, then (10.4) holds for $k = 0$ as well.
10.2. DISCRETE CALCULUS 273
Theorem 390 Let $\{y_n\}$ be any bounded sequence. Then, for each $n \ge 1$,
$$\sum_{k=0}^{\infty} \frac{x^k}{k!}\, \Delta^k y_n = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!}\, y_{n+j} \qquad \forall x \in \mathbb{R} \tag{10.6}$$
Proof Since $\{y_n\}$ is bounded, the two series in the formula converge. By (10.4), we have to show that, for each $n$,
$$\sum_{k=0}^{\infty} \frac{x^k}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} y_{n+i} = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!}\, y_{n+j} \qquad \forall x \in \mathbb{R} \tag{10.7}$$
In reality, we are going to prove a much stronger fact. Fix an integer $j \ge 0$. We show that the coefficients of $y_{n+j}$ on the two sides of (10.7) are equal, that is,
$$\sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j} = e^{-x}\, \frac{x^j}{j!} \tag{10.8}$$
Clearly, on the right-hand side of (10.7) this coefficient is $e^{-x} x^j / j!$. As to the left-hand side, note that $y_{n+j}$ appears as soon as $k \ge j$, and its coefficient is
$$\sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j}$$
Set $i = k - j$. Then,
$$\sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j} = \sum_{i=0}^{\infty} (-1)^i \frac{x^{i+j}}{(i+j)!} \binom{i+j}{j} = \sum_{i=0}^{\infty} (-1)^i \frac{x^{i+j}}{(i+j)!} \frac{(i+j)!}{i!\, j!}$$
$$= \frac{x^j}{j!} \sum_{i=0}^{\infty} \frac{(-1)^i x^i}{i!} = \frac{x^j}{j!} \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} = \frac{x^j}{j!}\, e^{-x}$$
where the last equality follows from Theorem 367, thus proving (10.8) and the statement.
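Formula (10.6) can be tested numerically (our own sketch; the bounded sequence $y_n = \sin n$ and the value $x = 0.7$ are arbitrary choices, and both series are truncated, which is harmless here since the terms decay factorially).

```python
import math

def diff_k(seq, k):
    # Δ^k: the difference operator applied k times
    for _ in range(k):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq

K = 60                                           # truncation order
n = 1
y = [math.sin(m) for m in range(n, n + K + 2)]   # y_m = sin m, a bounded sequence
x = 0.7

lhs = sum(x**k / math.factorial(k) * diff_k(y, k)[0] for k in range(K))
rhs = math.exp(-x) * sum(x**j / math.factorial(j) * y[j] for j in range(K))
print(lhs, rhs)   # the two sides of (10.6) agree
```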
The series expansion (9.17) is a special case of (10.6). Indeed, let $n = 1$ so that (10.6) becomes
$$\sum_{k=0}^{\infty} \frac{x^k}{k!}\, \Delta^k y_1 = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!}\, y_{1+j} \tag{10.9}$$

$$\Delta^m n^k = 0 \qquad \forall m > k \tag{10.10}$$
The proof relies on the following lemma of independent interest (we leave its proof to the reader).

Lemma 392 Let $\{x_n\}$ be a sequence. For every $k$ and for every $n$, we have $\Delta^{k+1} x_n = \Delta^k\, \Delta x_n = \Delta\, \Delta^k x_n$.

$$\Delta^{k+1} n^s = 0 \qquad \forall k \in \mathbb{N},\ \forall s \in \{0, 1, \ldots, k\} \tag{10.11}$$
We proceed by induction. For $k = 1$, note that $s$ can only be either $0$ or $1$, and the result holds in view of the last example. Assume now that $\Delta^{k+1} n^s = 0$ for all $s \in \{0, 1, \ldots, k\}$ (induction hypothesis on $k$); we need to show that $\Delta^{k+2} n^s = 0$ for all $s \in \{0, 1, \ldots, k+1\}$. Let $s$ belong to $\{1, \ldots, k+1\}$: either $s < k + 1$ or $s = k + 1$. In the first case, by the induction hypothesis, we have that $\Delta^{k+2} n^s = \Delta\, \Delta^{k+1} n^s = 0$. In the second case, by using Newton’s binomial, we have
$$\Delta n^{k+1} = (n+1)^{k+1} - n^{k+1} = n^{k+1} + \binom{k+1}{1} n^k + \binom{k+1}{2} n^{k-1} + \cdots + 1 - n^{k+1} = (k+1)\, n^k + \binom{k+1}{2} n^{k-1} + \cdots + 1$$
so that
$$\Delta^{k+2} n^{k+1} = \Delta^{k+1}\, \Delta n^{k+1} = (k+1)\, \Delta^{k+1} n^k + \binom{k+1}{2}\, \Delta^{k+1} n^{k-1} + \cdots + \Delta^{k+1} 1 = 0$$
where the zeroes follow from the induction hypothesis. We conclude that $\Delta^{k+2} n^{k+1} = 0$. The statement in (10.11) follows. From (10.11), it is then immediate to derive, by induction on $m$, equation (10.10) (why?).

Next we show that $\Delta^k n^k = k!$. We proceed by induction. Again, for $k = 1$ the result holds in view of the last example. Assume now that the statement
holds for $k$ (induction hypothesis). We need to show that $\Delta^{k+1} n^{k+1} = (k+1)!$. We then have
$$\Delta^{k+1} n^{k+1} = \Delta^k\, \Delta n^{k+1} = (k+1)\, \Delta^k n^k + \binom{k+1}{2}\, \Delta^k n^{k-1} + \cdots + \Delta^k 1 = (k+1)\, k! = (k+1)!$$
where the zeroes follow from (10.11). Summing up, $\Delta^k n^k = k!$, as desired.
That said, in differential calculus a key feature of the powers $x^k$ is that their derivatives are $k x^{k-1}$. In this respect, the discrete powers $n^k$ are disappointing, because their differences do not take such a form: for instance, for the sequence $x_n = n^2$ we have $\Delta n^2 = 2n + 1 \ne 2n$ (Example 389).

To restore the formula $k x^{k-1}$, we need to introduce the falling factorial $n^{(k)}$ defined by
$$n^{(k)} = \frac{n!}{(n-k)!} = n (n-1) \cdots (n-k+1)$$
Proof We have
$$\Delta n^{(k)} = (n+1)^{(k)} - n^{(k)} = \frac{(n+1)!}{(n+1-k)!} - \frac{n!}{(n-k)!} = \frac{(n+1)\, n!}{(n+1-k)\, (n-k)!} - \frac{n!}{(n-k)!}$$
$$= \left( \frac{n+1}{n+1-k} - 1 \right) \frac{n!}{(n-k)!} = \frac{k}{n+1-k}\, n^{(k)} = \frac{k}{n+1-k}\, n (n-1) \cdots (n-k+2)(n-k+1)$$
$$= k\, n (n-1) \cdots (n-k+2) = k\, n^{(k-1)}$$
as desired.
Thus, for finite differences the sequences $x_n = n^{(k)}$ are the analog of the powers in differential calculus.6 This analogy underlies the next classic difference formula, proved by Isaac Newton in 1687 in the Principia. Recall that $\Delta^0 x_n = x_n$.

6 Observe that, given $k$, the terms $x_n = n^{(k)}$ are well defined for $n \ge k$.
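A quick check of the difference rule $\Delta n^{(k)} = k\, n^{(k-1)}$ (our own helper `falling`), which is exact on integers:

```python
def falling(n, k):
    # the falling factorial n^(k) = n (n-1) ... (n-k+1)
    out = 1
    for i in range(k):
        out *= n - i
    return out

# the discrete analog of (x^k)' = k x^(k-1): Δ n^(k) = k n^(k-1)
k = 4
checks = [(falling(n + 1, k) - falling(n, k), k * falling(n, k - 1)) for n in range(k, 15)]
print(checks[:3])   # pairs of equal values
```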
Proof Before starting, note that, for every sequence $\{x_n\}$ and for $n \ge 1$ and $m \ge 1$, equality (10.12) can be rewritten as
$$x_{n+m} = \sum_{j=0}^{m} \frac{m!}{j!\, (m-j)!}\, \Delta^j x_n = \sum_{j=0}^{m} \binom{m}{j}\, \Delta^j x_n$$
We proceed by induction on $m$. For $m = 1$,
$$x_{n+1} = x_n + (x_{n+1} - x_n) = \binom{1}{0}\, \Delta^0 x_n + \binom{1}{1}\, \Delta^1 x_n = \sum_{j=0}^{1} \binom{1}{j}\, \Delta^j x_n$$
Assume now the statement is true for $m$. We need to show it holds for $m + 1$. Note that
$$x_{n+m+1} = x_{n+m} + \Delta x_{n+m} = \Delta \left( \sum_{j=0}^{m} \binom{m}{j}\, \Delta^j x_n \right) + \sum_{j=0}^{m} \binom{m}{j}\, \Delta^j x_n$$
$$= \sum_{j=0}^{m} \binom{m}{j}\, \Delta^{j+1} x_n + \sum_{j=0}^{m} \binom{m}{j}\, \Delta^j x_n$$
$$= \sum_{j=0}^{m-1} \binom{m}{j}\, \Delta^{j+1} x_n + \Delta^{m+1} x_n + \sum_{j=1}^{m} \binom{m}{j}\, \Delta^j x_n + \Delta^0 x_n$$
$$= \sum_{j=1}^{m} \binom{m}{j-1}\, \Delta^j x_n + \sum_{j=1}^{m} \binom{m}{j}\, \Delta^j x_n + \Delta^{m+1} x_n + \Delta^0 x_n$$
$$= \sum_{j=1}^{m} \binom{m+1}{j}\, \Delta^j x_n + \Delta^{m+1} x_n + \Delta^0 x_n = \sum_{j=0}^{m+1} \binom{m+1}{j}\, \Delta^j x_n$$
where the second-to-last equality follows from (10.5), proving the statement.
$$x_{n+m} - x_n = m\, \Delta x_n + \frac{m (m-1)}{2}\, \Delta^2 x_n + \cdots + \Delta^m x_n$$
So, it represents the difference between two terms of a sequence via differences of higher orders. It can be viewed as a discrete analog of the Taylor expansion.

For instance, for $x_n = n^k$ we have
$$x_{n+m} - x_n = m\, \Delta n^k + \frac{m (m-1)}{2}\, \Delta^2 n^k + \cdots + \frac{m^{(k-1)}}{(k-1)!}\, \Delta^{k-1} n^k + m^{(k)}$$
provided $m \ge k$. N
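Newton’s formula holds exactly on integer sequences, which makes it easy to verify (our own sketch; the cubic test sequence is an arbitrary choice):

```python
from math import comb

def diff_k(seq, k):
    # Δ^k: the difference operator applied k times
    for _ in range(k):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq

# check x_{n+m} = sum_{j=0}^m C(m, j) Δ^j x_n on an arbitrary integer sequence
x = [n**3 - 5 * n + 1 for n in range(1, 30)]
n0, m = 2, 7   # 0-based positions: x[n0] plays the role of x_n
reconstructed = sum(comb(m, j) * diff_k(x, j)[n0] for j in range(m + 1))
print(reconstructed, x[n0 + m])   # identical values
```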
$$\frac{x_n}{y_n} = \frac{(-1)^n}{n} \to 0$$
Therefore, even if the ratio $x_n / y_n$ does converge, the ratio $\Delta x_n / \Delta y_n$ of the differences may not. On the other hand, the next result shows that the asymptotic behavior of the ratio $\Delta x_n / \Delta y_n$ determines that of $x_n / y_n$.
Theorem 397 (Cesàro) Let $\{y_n\}$ be a strictly increasing sequence that diverges to infinity, that is, $y_n \uparrow +\infty$, and let $\{x_n\}$ be any sequence. Then,
$$\liminf \frac{\Delta x_n}{\Delta y_n} \le \liminf \frac{x_n}{y_n} \le \limsup \frac{x_n}{y_n} \le \limsup \frac{\Delta x_n}{\Delta y_n} \tag{10.13}$$

In particular, this inequality implies that, if the (finite or infinite) limit of the ratio $\Delta x_n / \Delta y_n$ exists, we have
$$\liminf \frac{\Delta x_n}{\Delta y_n} = \liminf \frac{x_n}{y_n} = \limsup \frac{x_n}{y_n} = \limsup \frac{\Delta x_n}{\Delta y_n} \tag{10.14}$$
that is, $x_n / y_n$ converges to the same limit. Therefore, as stated above, the “regularity” of the asymptotic behavior of the ratio $\Delta x_n / \Delta y_n$ implies the “regularity” of the original ratio $x_n / y_n$. At the same time, if the ratio $x_n / y_n$ presents an “irregular” asymptotic behavior, so does the difference ratio.
Proof We will only prove the special case (10.14) in which $\Delta x_n / \Delta y_n$ admits a finite limit. Therefore, let $\Delta x_n / \Delta y_n \to L \in \mathbb{R}$. It follows that, for every $\varepsilon > 0$, there exists $n_\varepsilon$ such that
$$L - \varepsilon < \frac{\Delta x_n}{\Delta y_n} < L + \varepsilon$$
for every $n \ge n_\varepsilon$. Since, by hypothesis, $y_{n+1} - y_n > 0$ for every $n$, we have
$$(L - \varepsilon)(y_{n_\varepsilon+1} - y_{n_\varepsilon}) < x_{n_\varepsilon+1} - x_{n_\varepsilon} < (L + \varepsilon)(y_{n_\varepsilon+1} - y_{n_\varepsilon})$$
$$(L - \varepsilon)(y_{n_\varepsilon+2} - y_{n_\varepsilon+1}) < x_{n_\varepsilon+2} - x_{n_\varepsilon+1} < (L + \varepsilon)(y_{n_\varepsilon+2} - y_{n_\varepsilon+1})$$
$$\vdots$$
Summing the previous inequalities, we get, for each $n > n_\varepsilon$,
has the indeterminate form $\infty/\infty$. Consider the sequences $x_n = \log (1+n)$ and $y_n = n$. The sequence (10.15) can then be written as $x_n / y_n$. We have
$$\frac{\Delta x_n}{\Delta y_n} = \frac{\log (1 + n + 1) - \log (1+n)}{1} = \log \left( 1 + \frac{1}{1+n} \right) \to 0$$
Therefore,
$$\lim \frac{\log (1+n)}{n} = 0$$
by Cesàro’s Theorem. N
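In numbers (our own check): the difference ratio goes to $0$, and the original ratio follows it, just as the theorem predicts.

```python
import math

# Δx_n/Δy_n = log(1 + 1/(1+n)) -> 0 (with Δy_n = 1), so by Cesàro's Theorem
# the original ratio log(1+n)/n inherits the limit 0, despite the ∞/∞ form
for n in (10, 1_000, 100_000):
    diff_ratio = math.log(1 + 1 / (1 + n))
    ratio = math.log(1 + n) / n
    print(n, diff_ratio, ratio)
```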
10.3. CONVERGENCE IN MEAN 279
At a conceptual level, in the next section we will see how Cesàro’s Theorem allows for a better understanding of convergence criteria for series (see Section 10.4). To this end, the following remarkable consequence of Cesàro’s Theorem will be crucial.

Corollary 399 Let $\{x_n\}$ be a sequence such that, eventually, $x_n > 0$. Then,
$$\liminf \frac{x_{n+1}}{x_n} \le \liminf \sqrt[n]{x_n} \le \limsup \sqrt[n]{x_n} \le \limsup \frac{x_{n+1}}{x_n} \tag{10.16}$$
Proof Without loss of generality, let $\{x_n\}$ be a strictly positive sequence. We have
$$\log \frac{x_{n+1}}{x_n} = \log x_{n+1} - \log x_n \qquad \text{and} \qquad \log \sqrt[n]{x_n} = \frac{1}{n} \log x_n$$
so that, by Cesàro’s Theorem applied to the sequences $\{\log x_n\}$ and $\{n\}$,
$$\liminf \log \frac{x_{n+1}}{x_n} \le \liminf \log \sqrt[n]{x_n} \le \limsup \log \sqrt[n]{x_n} \le \limsup \log \frac{x_{n+1}}{x_n}$$
from which (10.16) follows since, for every sequence $\{z_n\}$, we have
$$e^{\liminf z_n} = \liminf e^{z_n} \qquad \text{and} \qquad e^{\limsup z_n} = \limsup e^{z_n}$$
and, since by hypothesis $\liminf x_{n+1} = \limsup x_{n+1} = \lim x_n = L$, it follows that
$$\lim z_n = \lim \frac{x_1 + x_2 + \cdots + x_n}{n} = L$$
as desired.
The sequence
$$\frac{\sum_{i=1}^{n} x_i}{n}$$
of arithmetic means thus always converges to the same limit as the sequence $\{x_n\}$, whereas the converse does not hold: the sequence of means may converge while the original one does not.

Example 401 The alternating sequence $x_n = (-1)^n$ does not converge, whereas
$$\frac{\sum_{i=1}^{n} x_i}{n} \to 0$$
Indeed,
$$\frac{x_1 + x_2 + \cdots + x_n}{n} = \begin{cases} 0 & \text{if } n \text{ is even} \\[1ex] -\dfrac{1}{n} & \text{if } n \text{ is odd} \end{cases}$$
N
Therefore, the sequence of means is more “stable” than the original one. This motivates the following, more general, definition of the limit of a sequence, named after Ernesto Cesàro. It is fundamental in probability theory (and in its applications).

Definition 402 We say that a sequence $\{x_n\}$ converges in the sense of Cesàro (or in mean) to $L$, and we write $x_n \xrightarrow{C} L$, when
$$\frac{x_1 + x_2 + \cdots + x_n}{n} \to L$$

From the last result, it follows that standard convergence to a limit implies Cesàro convergence to the same limit. The converse does not hold: we may have Cesàro convergence without standard convergence.

Example 403 The alternating sequence $x_n = (-1)^n$ from the last example does not converge, but it converges in the sense of Cesàro: $(-1)^n \xrightarrow{C} 0$. N
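In code (our own sketch, with the hypothetical helper `cesaro_means`): the alternating sequence keeps oscillating, while its running means shrink toward $0$.

```python
def cesaro_means(seq):
    # running arithmetic means (x_1 + ... + x_n) / n
    means, total = [], 0.0
    for i, v in enumerate(seq, start=1):
        total += v
        means.append(total / i)
    return means

xs = [(-1)**n for n in range(1, 10_001)]
means = cesaro_means(xs)
print(means[:4], means[-1])   # means oscillate with shrinking amplitude toward 0
```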
It is useful to find conditions under which the converse holds, that is, under which the convergence of the sequence of means implies the convergence of the original sequence. Such results are called Tauberian theorems. We state one of them as an example.

Proposition 404 (Landau) Let $\{x_n\}$ be a sequence for which there exists $k < 0$ such that
$$\Delta x_n > \frac{k}{n} \qquad \forall n \ge 1$$
Then $x_n \xrightarrow{C} L \in \mathbb{R}$ if and only if $x_n \to L$.

In particular, the hypothesis is always satisfied when the sequence $\{x_n\}$ is increasing. So, an increasing sequence converges to $L$ if and only if it Cesàro converges to $L$.
Whenever a sequence does not converge in mean, we may consider the sequence of the “means of the means”, which, by the previous results, is more likely to converge than the sequence of means: this is called $(C, 2)$ convergence. This idea can be extended to the mean of the means iterated $k$ times. We will not consider such cases.7 However, the fundamental principle is that means tend to smooth the behavior of a sequence. In various fashions, often stochastic (an example is the law of large numbers previously mentioned), this principle is of central importance in applications. In medio stat virtus.
$$s_1 = 1,\quad s_2 = 0,\quad s_3 = 1,\quad s_4 = 0,\quad s_5 = 1,\ \ldots$$
Even if this is not his main scientific contribution, the name of Guido Grandi is remembered for his treatment of this series. It is curious to note that, until the mid-nineteenth century, even the greatest mathematicians believed – like Grandi – that this series summed to $1/2$. Until then, mathematics had been developing untidily: highly complex theorems were known, but the attention to well-posed definitions and rigor that we are now used to was lacking.

The monk Guido Grandi proposed the following explanation, which contains two mistakes. First of all, he identified
$$1 - 1 + 1 - 1 + 1 - 1 + \cdots$$
as a geometric series with common ratio $q = -1$ (correct) and therefore having sum
$$\frac{1}{1 - q} = \frac{1}{1 - (-1)} = \frac{1}{2}$$
(wrong: the geometric series converges only when $|q| < 1$). In an unfortunate crescendo, by pairing the addends (wrong: the associative property does not generally hold for series; cf. Section 9.4.2), Grandi then derived the equality
$$(1 - 1) + (1 - 1) + \cdots = 0 + 0 + \cdots$$

7 We refer interested readers to Hardy (1949).
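With the tools of this chapter, Grandi’s $1/2$ is no longer entirely mysterious: it is the Cesàro limit of the partial sums, as a quick computation suggests (our own illustration):

```python
# Grandi's series 1 - 1 + 1 - 1 + ... : the partial sums oscillate between 1
# and 0 and never settle, so the series has no sum in the ordinary sense.
partials, s = [], 0
for n in range(1, 2001):
    s += (-1)**(n + 1)
    partials.append(s)

# the Cesàro mean of the partial sums, however, recovers Grandi's value 1/2
mean_of_partials = sum(partials) / len(partials)
print(set(partials), mean_of_partials)
```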
Lemma 406 Let $\{x_n\}$ be a sequence with, eventually, $x_n > 0$. There exists $q < 1$ such that, eventually, $x_{n+1}/x_n \le q$ if and only if
$$\limsup \frac{x_{n+1}}{x_n} < 1 \tag{10.17}$$

Proof Without loss of generality, assume that $x_n > 0$ for every $n$. “Only if”. Suppose that there exists $q < 1$ such that eventually (9.13) holds. There exists $\bar{n}$ such that $x_{n+1}/x_n \le q$ for every $n \ge \bar{n}$. Therefore, for any such $n$ we have $\sup_{k \ge n} x_{k+1}/x_k \le q$, which implies
$$\limsup \frac{x_{n+1}}{x_n} = \lim_{n \to \infty} \sup_{k \ge n} \frac{x_{k+1}}{x_k} \le q < 1$$
“If”. Set $L = \limsup x_{n+1}/x_n < 1$ and fix $\varepsilon > 0$. Since $\sup_{k \ge n} x_{k+1}/x_k \to L$, there exists $\bar{n}$ such that
$$\left| \sup_{k \ge n} \frac{x_{k+1}}{x_k} - L \right| < \varepsilon \qquad \forall n \ge \bar{n}$$
that is,
$$L - \varepsilon < \sup_{k \ge n} \frac{x_{k+1}}{x_k} < L + \varepsilon \qquad \forall n \ge \bar{n}$$
If we choose $\varepsilon$ sufficiently small so that $L + \varepsilon < 1$, by setting $q = L + \varepsilon$ we obtain the desired condition.
The previous analysis leads to the following corollary, useful for computations, in which the ratio criterion is expressed in terms of limits.

Corollary 407 Let $\sum_{n=1}^{\infty} x_n$ be a series with, eventually, $x_n > 0$.

(i) If
$$\limsup \frac{x_{n+1}}{x_n} < 1$$
then the series converges.

(ii) If
$$\liminf \frac{x_{n+1}}{x_n} > 1$$
then the series diverges positively.

Note that, thanks to Lemma 406, point (i) is equivalent to point (i) of Proposition 363. In contrast, point (ii) is weaker than point (ii) of Proposition 363, since condition (10.18) is only sufficient, but not necessary, for $x_{n+1}/x_n \ge 1$ to hold eventually.
As shown by the following examples, this specification of the ratio criterion is particularly useful when the limit
lim x_{n+1}/x_n
exists, that is, whenever
lim x_{n+1}/x_n = lim sup x_{n+1}/x_n = lim inf x_{n+1}/x_n
In this particular case, the ratio criterion takes the useful tripartite form of Proposition 361:

(i) if lim x_{n+1}/x_n < 1, the series converges;

(ii) if lim x_{n+1}/x_n > 1, the limit of the series is +∞;

(iii) if lim x_{n+1}/x_n = 1, the criterion fails: it does not determine the behavior of the series.

As we have seen in Section 8.11, this form of the ratio criterion is the one which is usually used in applications. Examples 362 and 365 have shown cases (i) and (ii). The unfortunate case (iii) is well exemplified by Σ_{n=1}^∞ 1/n and Σ_{n=1}^∞ 1/n², the first of which diverges while the second converges.
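A quick numerical illustration of case (iii) (a sketch of our own, not from the text): for the divergent series Σ 1/n and the convergent series Σ 1/n², the ratio x_{n+1}/x_n tends to 1 in both cases, so the limit form of the criterion is silent.

```python
# Compare the ratios x_{n+1}/x_n for the terms of sum 1/n (divergent)
# and sum 1/n^2 (convergent): both ratios approach 1 as n grows.

def ratio(xs, n):
    return xs(n + 1) / xs(n)

harmonic = lambda n: 1 / n       # terms of the divergent series sum 1/n
squares = lambda n: 1 / n ** 2   # terms of the convergent series sum 1/n^2

for n in (10, 100, 1000):
    print(n, ratio(harmonic, n), ratio(squares, n))
# the criterion cannot separate the two series
```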
Let us see the limit form of this result. By an argument similar to the one contained in Lemma 406, point (i) can be equivalently stated as
lim sup ⁿ√x_n < 1
As to point (ii), it requires that ⁿ√x_n ≥ 1 for infinitely many values of n, that is, that there is a subsequence {n_k} such that ⁿᵏ√x_{n_k} ≥ 1 for every k. Such a condition holds if
lim sup ⁿ√x_n > 1     (10.20)
and only if
lim sup ⁿ√x_n ≥ 1     (10.21)
The constant sequence x_n = 1 exemplifies how condition (10.21) can hold even if (10.20) does not. The sequence x_n = (1 − 1/n)ⁿ, on the other hand, shows how even condition (ii) of Proposition 408 may not hold although (10.21) holds: here ⁿ√x_n = 1 − 1/n < 1 for every n, yet lim sup ⁿ√x_n = 1. It is, therefore, clear that (10.20) implies point (ii) of Proposition 408, which in turn implies (10.21), but that the opposite implications do not hold.

All this brings us to the following limit form, in which point (i) is equivalent to that of Proposition 408, while point (ii) is weaker than its counterpart since, as we have seen above, condition (10.20) is only a sufficient condition for ⁿ√x_n ≥ 1 to hold for infinitely many values of n.

Corollary 409 (Root criterion in limit form) Let Σ_{n=1}^∞ x_n be a series with positive terms.

(i) If lim sup ⁿ√x_n < 1, the series converges.

(ii) If lim sup ⁿ√x_n > 1, the series diverges positively.

Proof If lim sup ⁿ√x_n < 1, we have that ⁿ√x_n ≤ q for some q < 1, eventually. The desideratum follows from Proposition 408. If lim sup ⁿ√x_n > 1, then ⁿ√x_n ≥ 1 for infinitely many values of n, and the result follows from Proposition 408.
As for the limit form of the ratio criterion, also that of the root criterion is particularly useful when lim ⁿ√x_n exists. Under such circumstances the criterion takes the following tripartite form:

(i) if lim ⁿ√x_n < 1, the series converges;

(ii) if lim ⁿ√x_n > 1, the series diverges positively;

(iii) if lim ⁿ√x_n = 1, the criterion fails: it does not determine the behavior of the series.
As for the tripartite form of the ratio criterion, that of the root criterion is its most useful
form at a computational level. Nonetheless, we hope the reader will always keep in mind the
theoretical background of the criterion: “ye were not made to live like unto brutes, but for
pursuit of virtue and of knowledge”, as Dante’s Ulysses famously remarked.9
9 "fatti non foste a viver come bruti, ma per seguir virtute e canoscenza", Inferno, Canto XXVI.
(i) Let 0 ≤ q < 1. The series Σ_{n=1}^∞ qⁿ/nⁿ converges, as
ⁿ√(qⁿ/nⁿ) = q/n → 0

(ii) Let 0 ≤ q < 1. The series Σ_{n=1}^∞ nᵏ qⁿ converges for every k: indeed
ⁿ√(nᵏ qⁿ) = q nᵏ/ⁿ → q
because nᵏ/ⁿ → 1 (since log nᵏ/ⁿ = (k/n) log n → 0). N
Consider, for example, the series with terms x_n = 1/2ⁿ for n odd and x_n = 1/2ⁿ⁻² for n even,10 that is:
1/2 + 1 + 1/8 + 1/4 + 1/32 + 1/16 + 1/128 + 1/64 + ⋯
We have
x_{n+1}/x_n = (1/2ⁿ⁻¹)/(1/2ⁿ) = 2 if n is odd
x_{n+1}/x_n = (1/2ⁿ⁺¹)/(1/2ⁿ⁻²) = 1/8 if n is even

10 See Rudin (1976) p. 67.
and
ⁿ√x_n = 1/2 if n is odd
ⁿ√x_n = ⁿ√4 / 2 if n is even
so that
lim sup x_{n+1}/x_n = 2,  lim inf x_{n+1}/x_n = 1/8
and
lim sup ⁿ√x_n = 1/2
The ratio criterion thus fails, while the root criterion tells us that the series converges. N
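The computations of this example can be sketched numerically (our own illustration): the ratios oscillate between 2 and 1/8, while the n-th roots settle down to 1/2.

```python
# The series of the example: x_n = 2**(-n) for n odd, 2**(-(n-2)) for n even.

def x(n):
    return 2.0 ** (-n) if n % 2 == 1 else 2.0 ** (-(n - 2))

ratios = [x(n + 1) / x(n) for n in range(1, 50)]
roots = [x(n) ** (1.0 / n) for n in range(1, 50)]

# lim sup and lim inf of the ratios are 2 and 1/8: the ratio criterion fails
print(max(ratios), min(ratios))
# the n-th roots approach 1/2: the root criterion detects convergence
print(roots[-1])
```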
Even though the root criterion is more powerful from a theoretical standpoint, the ratio criterion can still be useful, because it is generally easier to compute the limit of ratios than that of roots.
In light of this, when using the criteria for solving problems, one should first check whether lim x_{n+1}/x_n exists and, if it does, compute it. In such a case, thanks to (10.22) we can also know the value of lim ⁿ√x_n and thus we can use the more powerful root criterion. In the unfortunate case in which lim x_{n+1}/x_n does not exist, and we can at best compute lim sup x_{n+1}/x_n and lim inf x_{n+1}/x_n, we can either use the less powerful ratio criterion (which may fail, as we have seen in the previous example), or we may try to compute lim sup ⁿ√x_n directly, hoping it exists (as in the previous example) so that the root criterion can be used in its handier limit form.
Finally, note that, however powerful it may be, the root criterion – a fortiori, the weaker ratio criterion – only gives a sufficient condition for convergence, as the following example shows.
Proposition 413 Let Σ_{n=1}^∞ x_n be a series with positive terms, with lim sup ⁿ√x_n < 1. For every q > 0 such that
lim sup ⁿ√x_n ≤ q < 1
we have that, eventually,
x_n ≤ qⁿ     (10.23)

Proof Take q > 0 such that lim sup ⁿ√x_n ≤ q < 1. There is an n_q ≥ 1 such that
ⁿ√x_n ≤ q     ∀n ≥ n_q
that is, x_n ≤ qⁿ for every n ≥ n_q.
Thanks to (10.23), we can say that those convergent series whose terms converge to zero more slowly than the geometric sequence – i.e., such that qⁿ = o(x_n) – are out of the root criterion's reach. For example, for every natural number k ≥ 2 we have that
qⁿ / n⁻ᵏ → 0
and so qⁿ = o(n⁻ᵏ). To determine whether the series Σ_{n=1}^∞ n⁻ᵏ converges, the root criterion is thus useless. This is confirmed by the fact that
lim ⁿ√(1/nᵏ) = 1
But it is thanks to Proposition 413 that we are able to understand why the root criterion fails in this instance.
f(x) = p(x)/q(x) = (b₀ + b₁x + ⋯ + b_m xᵐ)/(a₀ + a₁x + ⋯ + a_n xⁿ)
Its domain consists of all points of the real line except the real solutions of the equation a₀ + a₁x + ⋯ + a_n xⁿ = 0.
A rational function is proper if the degree of the polynomial at the numerator is lower than that of the polynomial at the denominator, i.e., m < n. Proper rational functions admit a simple representation – called partial fraction expansion – that often simplifies their analysis. We focus on the case of distinct real roots, leaving to readers the case of multiple roots.
10.5. POWER SERIES 289
Proposition 414 Let f(x) = p(x)/q(x) be a proper rational function such that q has k distinct real roots r₁, r₂, ..., r_k, so q(x) = ∏_{i=1}^k (x − r_i). Then
f(x) = c₁/(x − r₁) + c₂/(x − r₂) + ⋯ + c_k/(x − r_k)     (10.24)
where, for all i = 1, ..., k,
c_i = p(r_i)/q′(r_i)     (10.25)
Proof We first establish that there exist k coefficients c₁, c₂, ..., c_k such that (10.24) holds. For simplicity, we only consider the case
f(x) = (b₀ + b₁x)/(a₀ + a₁x + a₂x²)
leaving to readers the general case. Since the denominator is (x − r₁)(x − r₂), we look for coefficients c₁ and c₂ such that
(b₀ + b₁x)/q(x) = c₁/(x − r₁) + c₂/(x − r₂)
Since
c₁/(x − r₁) + c₂/(x − r₂) = [c₁(x − r₂) + c₂(x − r₁)]/q(x) = [(c₁ + c₂)x − (c₁r₂ + c₂r₁)]/q(x)
we have
(b₀ + b₁x)/q(x) = [(c₁ + c₂)x − (c₁r₂ + c₂r₁)]/q(x)
So, by equating coefficients we have the simple linear system
c₁ + c₂ = b₁
c₁r₂ + c₂r₁ = −b₀
Since r₁ ≠ r₂, the system is easily seen to have a unique solution (c₁, c₂) that provides the sought-after coefficients.
It remains to show that the coefficients of (10.24) satisfy (10.25). We have
lim_{x→r_i} (x − r_i) f(x) = lim_{x→r_i} (x − r_i) [c₁/(x − r₁) + c₂/(x − r₂) + ⋯ + c_k/(x − r_k)]
= lim_{x→r_i} [c₁(x − r_i)/(x − r₁) + ⋯ + c_i + ⋯ + c_k(x − r_i)/(x − r_k)] = c_i
as well as, by de l'Hospital's rule,
lim_{x→r_i} (x − r_i) f(x) = lim_{x→r_i} (x − r_i) p(x)/q(x) = p(r_i) lim_{x→r_i} (x − r_i)/q(x) = p(r_i) · 1/q′(r_i)
Putting the two limits together, we conclude that c_i = p(r_i)/q′(r_i) for all i = 1, ..., k, as desired.
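The formula c_i = p(r_i)/q′(r_i) is easy to check numerically. The sketch below uses a rational function of our own choosing (not one from the text):

```python
# Partial fraction expansion of f(x) = (3x + 1)/((x - 1)(x + 2)) via
# Proposition 414: c_i = p(r_i)/q'(r_i) for the distinct real roots of q.

def p(x):
    return 3 * x + 1

def q(x):
    return (x - 1) * (x + 2)     # = x^2 + x - 2

def q_prime(x):
    return 2 * x + 1             # derivative of x^2 + x - 2

roots = [1.0, -2.0]
c = [p(r) / q_prime(r) for r in roots]   # c_1 = 4/3, c_2 = 5/3

# check the expansion against f at a test point
x0 = 0.5
expansion = sum(ci / (x0 - ri) for ci, ri in zip(c, roots))
assert abs(expansion - p(x0) / q(x0)) < 1e-12
print(c)
```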
[c₁(x + 2) + c₂(x + 1)] / [(x + 1)(x + 2)] = [x(c₁ + c₂) + (2c₁ + c₂)] / [(x + 1)(x + 2)]     (10.27)
Expressions (10.26) and (10.27) are equal if and only if c₁ and c₂ satisfy the system:
c₁ + c₂ = 1
2c₁ + c₂ = 1
A power series is a series of the form
Σ_{n=0}^∞ a_n xⁿ
with a_n ∈ R for every n ≥ 0. The scalars a_n are called coefficients of the series. The generic term of a power series is x_n = a_n xⁿ. The scalar x parameterizes the series: to different values of x correspond different series, possibly with a different character.
Definition 416 A power series Σ_{n=0}^∞ a_n xⁿ is said to converge (diverge) at x₀ ∈ R if the series Σ_{n=0}^∞ a_n x₀ⁿ converges (diverges).

We set 0⁰ = 1. In this way, a power series always converges at 0: indeed, from 0⁰ = 1 it follows that Σ_{n=0}^∞ a_n 0ⁿ = a₀.

Proposition 417 If a power series with positive coefficients Σ_{n=0}^∞ a_n xⁿ converges at x₀ ≥ 0, then it converges at every x ∈ R such that |x| < x₀. If it diverges at x₀ ∈ R, then it diverges at every x ∈ R such that |x| > x₀.
Proof We only prove convergence, the other part being similar. Let |x| < x₀. We have Σ_{n=0}^∞ |a_n| |x|ⁿ ≤ Σ_{n=0}^∞ a_n x₀ⁿ, so the series Σ_{n=0}^∞ a_n xⁿ is absolutely convergent by the comparison criterion. By Theorem 370, the series Σ_{n=0}^∞ a_n xⁿ converges.
Inspired by this result, given a power series Σ_{n=0}^∞ a_n xⁿ we say that an r ∈ [0, +∞] is the radius of convergence of the power series if it converges at every |x| < r and diverges at every |x| > r. So, if it exists, the radius of convergence is a watershed that separates the convergent and divergent behavior of the power series (at |x| = r the character of the series is ambiguous: it may be regular or not). In particular, if r = +∞ the power series converges at all x ∈ R, while if r = 0 it converges only at the origin.
The next powerful result, a simple yet remarkable consequence of the root criterion, proves the existence of such a radius and gives a formula to compute it.

Theorem 418 (Cauchy-Hadamard) The radius of convergence of a power series Σ_{n=0}^∞ a_n xⁿ is
r = 1/λ
where
λ = lim sup ⁿ√|a_n| ∈ [0, +∞]
with r = +∞ if λ = 0 and r = 0 if λ = +∞.

Proof Assume λ ∈ (0, ∞). We already remarked that the power series converges at x = 0. So, let x ≠ 0. We have
lim sup ⁿ√|a_n xⁿ| = |x| lim sup ⁿ√|a_n| = |x| λ = |x|/r
So, by the root criterion the series converges if |x|/r < 1, namely if |x| < r, and it diverges if |x|/r > 1, namely if |x| > r. We leave the case λ ∈ {0, +∞} to the reader.
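The Cauchy-Hadamard formula lends itself to a crude numerical sketch (our own illustration): estimate λ = lim sup ⁿ√|a_n| by taking the sup over a tail of large n, and read off r = 1/λ.

```python
# Rough numerical estimate of the radius of convergence r = 1/lambda,
# with lambda = lim sup |a_n|^(1/n), approximated over a tail of indices.

def radius(a, N=400):
    lam = max(abs(a(n)) ** (1.0 / n) for n in range(N // 2, N))
    return float('inf') if lam == 0 else 1.0 / lam

geometric = lambda n: 1.0        # sum x^n:    lambda = 1, so r = 1
scaled = lambda n: 2.0 ** n      # sum (2x)^n: lambda = 2, so r = 1/2

print(radius(geometric), radius(scaled))
```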
(i) The generating function
f(x) = Σ_{n=0}^∞ xⁿ/n!
of the sequence {1/n!}, so defined via the power series (10.29), has the entire real line as its domain. By Theorem 367, it is the exponential f(x) = eˣ.

(ii) The generating function
f(x) = Σ_{n=1}^∞ xⁿ/n
of the sequence {1/n}, so defined via the power series (10.30), has domain [−1, 1).

(iii) The "geometric" function f(x) = Σ_{n=0}^∞ xⁿ, generating for the constant sequence {1, 1, ..., 1, ...}, has domain (−1, 1).

(iv) The generating function f(x) = Σ_{n=1}^∞ n! xⁿ for the factorials' sequence has a singleton domain {0}. N
Next we give an important property of generating functions, where we adopt the convention f⁽⁰⁾(0) = f(0).

Proposition 421 The generating function for a sequence {a_n} is infinitely differentiable on (−r, r), with
a_n = f⁽ⁿ⁾(0)/n!     ∀n ≥ 0     (10.32)

This result shows, inter alia, that generating functions are uniquely determined: if f is the generating function of sequences {a_n} and {b_n}, then these sequences are equal, that is, a_n = b_n for all n ≥ 0. Indeed, a_n = b_n = f⁽ⁿ⁾(0)/n! for all n ≥ 0.

Proof Let f : (−r, r) → R be the generating function for the sequence {a_n} restricted on the open interval (−r, r). We prove that it is analytic,11 so that the result follows from Proposition 1081. By definition, f(x) = Σ_{n=0}^∞ a_n xⁿ for all x ∈ (−r, r). Let x₀ ∈ (−r, r) and B_ε(x₀) ⊆ (−r, r). By the binomial formula, for each x ∈ B_ε(x₀) we have
f(x) = Σ_{n=0}^∞ a_n xⁿ = Σ_{n=0}^∞ a_n (x − x₀ + x₀)ⁿ = Σ_{n=0}^∞ a_n Σ_{m=0}^n (n choose m) x₀ⁿ⁻ᵐ (x − x₀)ᵐ
= Σ_{m=0}^∞ [ Σ_{n=m}^∞ (n choose m) a_n x₀ⁿ⁻ᵐ ] (x − x₀)ᵐ
where for the change in the order of summation in the last step we refer readers to, e.g., Rudin (1976) p. 176. By setting b_m = Σ_{n=m}^∞ (n choose m) a_n x₀ⁿ⁻ᵐ, we then have f(x) = Σ_{m=0}^∞ b_m (x − x₀)ᵐ for all x ∈ B_ε(x₀). This proves the analyticity of f.
a ⟼ f_a

This observation is important because, remarkably, it turns out that a generating function f_a may be constructed by just using a definition by recurrence of the sequence a = {a_n}. This makes it possible to solve the recurrence if one is able to retrieve (in closed form) the coefficients of the sequence a = {a_n} that generates f_a. Indeed, such a sequence is unique and so it has to be the one defined by the recurrence at hand.12 We can diagram this solution scheme as follows:

a recurrence → f_a → a closed form

The next classic example gives a flavor of this scheme.

11 Analytic functions will be introduced in Section 23.5.
Example 422 Consider the classic Fibonacci recursion, started at n = 0,
a₀ = 0, a₁ = 1
a_n = a_{n−1} + a_{n−2} for n ≥ 2     (10.33)
that is, {0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...}. We want to construct its generating function f : A ⊆ R → R. Since the sequence is positive and increasing, clearly lim sup ⁿ√|a_n| > 0. By the Cauchy-Hadamard Theorem, the domain A contains an open interval (−ε, ε) with 0 < ε < 1. For each scalar x, we have
Σ_{n=0}^N a_n xⁿ = a₀ + a₁x + Σ_{n=2}^N a_n xⁿ = a₀ + a₁x + Σ_{n=2}^N (a_{n−1} + a_{n−2}) xⁿ
Letting N → ∞, since a₀ = 0 and a₁ = 1 this yields f(x) = x + x f(x) + x² f(x), that is,
f(x) = x / (1 − x − x²)     (10.34)
Since 1 − x − x² = (1 − ((1+√5)/2)x)(1 − ((1−√5)/2)x), by the properties of the geometric series, for each x ∈ (−ε, ε) we then have
f(x) = (x/√5) [ ((1+√5)/2) Σ_{n=0}^∞ ((1+√5)/2)ⁿ xⁿ − ((1−√5)/2) Σ_{n=0}^∞ ((1−√5)/2)ⁿ xⁿ ]
= (x/√5) [ Σ_{n=0}^∞ ((1+√5)/2)ⁿ⁺¹ xⁿ − Σ_{n=0}^∞ ((1−√5)/2)ⁿ⁺¹ xⁿ ]
= (1/√5) [ Σ_{n=0}^∞ ((1+√5)/2)ⁿ⁺¹ xⁿ⁺¹ − Σ_{n=0}^∞ ((1−√5)/2)ⁿ⁺¹ xⁿ⁺¹ ]
= (1/√5) [ (Σ_{n=0}^∞ ((1+√5)/2)ⁿ xⁿ − 1) − (Σ_{n=0}^∞ ((1−√5)/2)ⁿ xⁿ − 1) ]
= (1/√5) [ Σ_{n=0}^∞ ((1+√5)/2)ⁿ xⁿ − Σ_{n=0}^∞ ((1−√5)/2)ⁿ xⁿ ]
= (1/√5) Σ_{n=0}^∞ [ ((1+√5)/2)ⁿ − ((1−√5)/2)ⁿ ] xⁿ
By equating coefficients, we conclude that f is generated by the sequence with terms
a_n = (1/√5) [ ((1+√5)/2)ⁿ − ((1−√5)/2)ⁿ ]     ∀n ≥ 0     (10.35)
We call Fibonacci numbers the terms of the sequence (10.35). There is an elegant characterization of their asymptotic behavior: a_n is asymptotic to (1/√5)((1+√5)/2)ⁿ.

Proof We have
a_n / [(1/√5)((1+√5)/2)ⁿ] = [((1+√5)/2)ⁿ − ((1−√5)/2)ⁿ] / ((1+√5)/2)ⁿ = 1 − ((1−√5)/(1+√5))ⁿ → 1
where the last step follows from (8.29) since 0 < |1−√5|/(1+√5) < 1.
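The closed form (10.35) and the asymptotics above can be sketched numerically:

```python
# Binet's closed form (10.35) reproduces the Fibonacci recursion (10.33),
# and a_n / (phi^n / sqrt 5) -> 1 with phi = (1 + sqrt 5)/2.
import math

SQRT5 = math.sqrt(5.0)
PHI = (1 + SQRT5) / 2
PSI = (1 - SQRT5) / 2

def binet(n):
    return (PHI ** n - PSI ** n) / SQRT5

# Fibonacci numbers by the recurrence
fib = [0, 1]
for _ in range(30):
    fib.append(fib[-1] + fib[-2])

assert all(round(binet(n)) == fib[n] for n in range(len(fib)))
print(binet(30) / (PHI ** 30 / SQRT5))   # close to 1
```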
In solving the Fibonacci recurrence (10.33) it was key that its generating function (10.34) is a proper rational function, which can then be studied via its partial fraction expansion. This suggests that, more generally, one can solve recurrences that have proper rational generating functions. For simplicity, we focus on the case of distinct real roots.
Proposition 424 Let f(x) = p(x)/q(x) be a proper rational function such that q has k distinct real roots r₁, r₂, ..., r_k, so q(x) = ∏_{i=1}^k (x − r_i). Then, f is a generating function for the sequence with terms
a_n = b₁ (1/r₁ⁿ) + b₂ (1/r₂ⁿ) + ⋯ + b_k (1/r_kⁿ)
where, for all i = 1, ..., k,
b_i = −p(r_i) / (r_i q′(r_i))
We give two proofs of this result: the first one is direct, while the second one relies on formula (10.32).

Proof 1 By Proposition 414,
f(x) = c₁/(x − r₁) + ⋯ + c_k/(x − r_k) = −(c₁/r₁) · 1/(1 − x/r₁) − ⋯ − (c_k/r_k) · 1/(1 − x/r_k)
with c_i = p(r_i)/q′(r_i). By the properties of the geometric series, for x small enough,
f(x) = −(c₁/r₁) Σ_{n=0}^∞ (x/r₁)ⁿ − ⋯ − (c_k/r_k) Σ_{n=0}^∞ (x/r_k)ⁿ
= Σ_{n=0}^∞ [ −(c₁/r₁)(1/r₁ⁿ) − ⋯ − (c_k/r_k)(1/r_kⁿ) ] xⁿ
= Σ_{n=0}^∞ [ b₁ (1/r₁ⁿ) + b₂ (1/r₂ⁿ) + ⋯ + b_k (1/r_kⁿ) ] xⁿ
where b_i = −c_i/r_i = −p(r_i)/(r_i q′(r_i)) for all i = 1, ..., k. We conclude that f is a generating function for the sequence with terms
a_n = b₁ (1/r₁ⁿ) + b₂ (1/r₂ⁿ) + ⋯ + b_k (1/r_kⁿ)
as desired.
Proof 2 Consider the function g(x) = 1/(x − r). It can be proved by induction that its derivative of order n is
g⁽ⁿ⁾(x) = −n!/(r − x)ⁿ⁺¹
In view of (10.36), we then have
f⁽ⁿ⁾(x) = −c₁ n!/(r₁ − x)ⁿ⁺¹ − c₂ n!/(r₂ − x)ⁿ⁺¹ − ⋯ − c_k n!/(r_k − x)ⁿ⁺¹
so that
f⁽ⁿ⁾(0)/n! = −(c₁/r₁)(1/r₁ⁿ) − (c₂/r₂)(1/r₂ⁿ) − ⋯ − (c_k/r_k)(1/r_kⁿ) = b₁ (1/r₁ⁿ) + b₂ (1/r₂ⁿ) + ⋯ + b_k (1/r_kⁿ)
By (10.32), a_n = f⁽ⁿ⁾(0)/n!, and the result follows.
As a dividend of this result, we can solve linear recurrences of order k given by (8.11), that is,13
a₀ = α₀, a₁ = α₁, ..., a_{k−1} = α_{k−1}
a_n = p₁a_{n−1} + p₂a_{n−2} + ⋯ + p_k a_{n−k} for n ≥ k     (10.37)
Some algebra, left to the reader, shows that the Fibonacci formula (10.34) here takes the general form of a proper rational function given by
f(x) = [α₀ + (α₁ − p₁α₀)x + (α₂ − p₁α₁ − p₂α₀)x² + ⋯ + (α_{k−1} − p₁α_{k−2} − ⋯ − p_{k−1}α₀)x^{k−1}] / (1 − p₁x − p₂x² − ⋯ − p_k xᵏ)
Assume that the polynomial at the denominator has k distinct real roots r₁, r₂, ..., r_k. By the last result, f is then the generating function of the sequence with terms
a_n = b₁ (1/r₁ⁿ) + b₂ (1/r₂ⁿ) + ⋯ + b_k (1/r_kⁿ)
This sequence thus solves the linear recurrence (10.37). The key equation
1 − p₁x − p₂x² − ⋯ − p_k xᵏ = 0
thus determines, through its solutions r₁, ..., r_k, the closed form of the solution.
Example 425 We can solve the Fibonacci recurrence (10.33) through this method. It is a linear recurrence of order 2 where p₁ = p₂ = 1, α₀ = 0, α₁ = 1, r₁ = (−1 + √5)/2, and r₂ = (−1 − √5)/2. So,
b₁ = −r₁/(r₁(−1 − 2r₁)) = 1/(1 + 2r₁) = 1/(1 + (−1 + √5)) = 1/√5
b₂ = −r₂/(r₂(−1 − 2r₂)) = 1/(1 + 2r₂) = 1/(1 + (−1 − √5)) = −1/√5
Hence
a_n = (1/√5)(1/r₁ⁿ) − (1/√5)(1/r₂ⁿ)
Since
1/r₁ = 2/(−1 + √5) = 2(1 + √5)/((−1 + √5)(1 + √5)) = 2(1 + √5)/4 = (1 + √5)/2
1/r₂ = 2/(−1 − √5) = 2(1 − √5)/((−1 − √5)(1 − √5)) = 2(1 − √5)/4 = (1 − √5)/2
we conclude that
a_n = (1/√5)((1 + √5)/2)ⁿ − (1/√5)((1 − √5)/2)ⁿ
in accordance with (10.35). N
Since the key equation
1 − (11/6)x + x² − (1/6)x³ = 0
has solutions r₁ = 1, r₂ = 2, and r₃ = 3, we have
b_i = −p(r_i)/(r_i q′(r_i))
So, b₁ = 1/3, b₂ = 1/20, and b₃ = 5/69. By the last proposition, the sequence with terms
a_n = 1/3 + (1/20)(1/2ⁿ) + (5/69)(1/3ⁿ)
solves this linear recurrence of order 3. N
10.6. INFINITE PATIENCE 299
so that the limit case corresponds to the sum of the utilities of all periods, all with equal unitary weight. When the horizon is infinite the problem becomes far more complex because, for the series Σ_{t=1}^∞ u_t(x_t) to converge, it must be that lim_{t→∞} u_t(x_t) = 0 (Theorem 348), which is hardly justifiable from an economic standpoint.
Let us consider, instead, the limit
lim_{δ↑1} (1 − δ) Σ_{t=1}^∞ δ^{t−1} u_t(x_t)
Such a limit may not exist. Consider, for example, the 0-1 sequence
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, ...
(two 0s, two 1s, four 0s, eight 1s, and so on), where every block of 0s and 1s has length equal to the sum of the lengths of the previous blocks. One can show that lim_{δ↑1} (1 − δ) Σ_{t=1}^∞ δ^{t−1} x_t does not exist. N
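The misbehavior of the block sequence can be sketched numerically (our own illustration, phrased in terms of the running means that the next result ties to the existence of the limit):

```python
# Build the 0/1 block sequence above and watch its running means
# (1/T) * sum_{t<=T} x_t oscillate at the block boundaries, roughly
# between 1/3 and 2/3 -- so the sequence does not converge in mean.

def block_sequence(num_blocks):
    seq, ends, bit, length = [], [], 0, 2
    for _ in range(num_blocks):
        seq.extend([bit] * length)   # append a block of 0s or 1s
        ends.append(len(seq))
        length = len(seq)            # next block as long as all previous ones
        bit = 1 - bit
    return seq, ends

xs, ends = block_sequence(12)
means = [sum(xs[:T]) / T for T in ends]
print([round(m, 3) for m in means])
# means at the ends of 0-blocks drift toward 1/3, at the ends of
# 1-blocks toward 2/3
```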
The next remarkable result, the non-simple proof of which we omit, shows that the existence of the limit is equivalent to convergence in mean.

14 For the meaning of δ ↑ 1 we refer the reader to Section 8.8.2.
V(x) = (1 − δ) U(x)     ∀x ∈ R^∞
as long as the limits exist. The infinite patience case is thus captured by the limit of the average utilities
lim_{T→∞} (1/T) Σ_{t=1}^T u_t(x_t)     (10.41)
that is, by the Cesàro limit of the sequence {u_t(x_t)}. Such a criterion can thus be seen as a limit case for δ ↑ 1 of the intertemporal utility function V.
The role that the sum Σ_{t=1}^T u_t(x_t) plays in case (10.39) with finite horizon is thus played in the infinite horizon case by the limit of the average utilities (10.41). This important economic application of the Frobenius-Littlewood Theorem allows us to elegantly conclude this chapter.
Part III
Continuity
301
Chapter 11
Limits of functions
Consider the function
f(x) = sin x / x
and analyze its behavior at points closer and closer to x₀ = 0, i.e., to the origin. Tabulating the values that the function assumes at several such points, and then at points closer and closer to the origin, we can verify that the corresponding values of f(x) get closer and closer to L = 1. In this case we say that "the limit of f(x), as x tends to x₀ = 0, is L = 1". In symbols,
lim_{x→0} f(x) = 1
Note that in this example the point x₀ = 0 where we take the limit does not belong to the domain of the function f.
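The table the text alludes to can be sketched in a few lines:

```python
# Values of f(x) = sin(x)/x at points closer and closer to the origin.
import math

for x in (0.5, 0.1, 0.01, 0.001):
    print(x, math.sin(x) / x)
# 0.5 -> 0.958851...
# 0.1 -> 0.998334...
# the values approach L = 1 from below
```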
Consider the function f : R → R given by
f(x) = x for x ≤ 1, f(x) = 1 for x > 1
How does f behave when it approaches the point x₀ = 1? By taking points closer and closer to x₀ = 1, we can verify that, as x gets closer and closer to x₀ = 1, f(x) gets closer and closer to L = 1. In this case we say that "the limit of f(x) as x tends to x₀ = 1 is L = 1", and write
lim_{x→1} f(x) = 1
Observe that the value that the function assumes at the point x₀ = 1 is f(1) = 1, so the limit L = 1 is equal to the value f(1) of the function at x₀ = 1.
Consider now the function
f(x) = x if x < 1, f(x) = 2 if x = 1, f(x) = 1 if x > 1
Compared to the previous example we have introduced a "jump" at the point x = 1, so that the function jumps to the value 2 – we have indeed f(1) = 2. The graph now shows an isolated point at height 2 above x = 1.
11.1. INTRODUCTORY EXAMPLES 305
If we study the behavior of f for values of x closer and closer to x₀ = 1, we build the same table as before (because the function, except at the point 1, is identical to the one in the previous example). Therefore, also in this case we have
lim_{x→1} f(x) = 1
This time the value that the function assumes at the point 1 is f(1) = 2, different from the value L = 1 of the limit.
Until now we have approached the point x₀ from both the right and the left, that is, bilaterally (in a two-sided manner). Sometimes this is not possible; rather, one can approach x₀ from either the right or the left, that is, unilaterally (in a one-sided manner). Consider, for example, the function f : R ∖ {2} → R given by f(x) = 1/(x − 2) and let x₀ = 2. Its graph has a vertical asymptote at x₀ = 2.
"To approach the point x₀ = 2 from the right" means to approach it by considering only values x > 2:
x:    2.0001   2.001   2.01   2.05   2.1   2.2   2.5
f(x): 10,000   1,000   100    20     10    5     2
For values closer and closer to 2 from the right, the function assumes values that are larger and larger and unbounded above. In this case we say that "the function f tends to +∞ as x tends to 2 from the right" and write
lim_{x→2⁺} f(x) = +∞
Let us see now what happens by approaching x₀ = 2 from the left, that is, by considering values x < 2. For values closer and closer to 2 from the left, the function assumes larger and larger (in absolute value) negative values. In this case we say that "the function f tends to −∞ as x tends to 2 from the left" and write
lim_{x→2⁻} f(x) = −∞
The "right-hand" and the "left-hand" limits both exist but are (dramatically) different. As we will see in Proposition 445, the fact that the one-sided limits are distinct reflects the fact that the two-sided limit of f(x), as x tends to 2, does not exist. Indeed, the equality of the one-sided limits is equivalent to the existence of the two-sided limit. For example, if we modify the previous function by considering f(x) = 1/|x − 2|, we have
lim_{x→2⁺} f(x) = lim_{x→2⁻} f(x) = lim_{x→2} f(x) = +∞
Now the two one-sided limits are equal and coincide with the two-sided one, which in this case exists (even if infinite).
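The one-sided behavior just described can be sketched numerically:

```python
# Approach x0 = 2 from the right and from the left for f(x) = 1/(x - 2)
# and for g(x) = 1/|x - 2|.

f = lambda x: 1 / (x - 2)
g = lambda x: 1 / abs(x - 2)

steps = (0.1, 0.01, 0.001)
right = [f(2 + h) for h in steps]   # grows without bound: -> +infinity
left = [f(2 - h) for h in steps]    # large negative values: -> -infinity
both = [g(2 - h) for h in steps]    # with |.|, both sides go to +infinity
print(right, left, both)
```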
Considering again the function f(x) = 1/(x − 2), what happens if, as x₀, we take +∞? In other terms, what happens if we consider increasingly larger values of x? For increasingly larger values of x, the function assumes values closer and closer to 0. In this case we say that "the function tends to 0 as x tends to +∞" and write
lim_{x→+∞} f(x) = 0
Observe that the function assumes values close to 0, but always strictly positive: f approaches 0 "from above". If we want to emphasize this aspect we write
lim_{x→+∞} f(x) = 0⁺
where 0⁺ suggests that, while converging to 0, the values of f(x) remain positive.
What happens if, instead, as x₀ we take −∞? For negative and increasingly larger (in absolute value) values of x, the function assumes values closer and closer to 0. We say that "the function tends to 0 as x tends to −∞" and write
lim_{x→−∞} f(x) = 0
If we want to emphasize that the function, in approaching 0, remains negative, we write
lim_{x→−∞} f(x) = 0⁻
Finally, after having seen various types of limits, let us consider a function that has no limit, i.e., that does not exhibit any "definite trend". Let f : R ∖ {0} → R be given by
f(x) = sin(1/x)
At the origin, i.e., at x₀ = 0, the function does not have a limit: for x closer and closer to the origin, the function keeps oscillating between −1 and 1 with a tighter and tighter sinusoidal trend.
The origin is, however, the only point where this function does not have a limit: at all other points of the domain the limit exists. A much more dramatic behavior is displayed by the Dirichlet function f : R → R defined by
f(x) = 1 for x ∈ Q, f(x) = 0 for x ∉ Q     (11.3)
This remarkable function oscillates "obsessively" between the values 0 and 1 because, by the density of the rational numbers in the real numbers, for any pair x < y of real numbers there exists a rational number q such that x < q < y. As we will see, the Dirichlet function does not have a limit at any point x₀ ∈ R.
(i) lim_{x→x₀} f(x) = L ∈ R, i.e., both the point x₀ and the limit L are finite (scalars);

(ii) lim_{x→x₀} f(x) = ±∞, i.e., the point x₀ is finite, but the limit L is infinite;

(iii) lim_{x→+∞} f(x) = L ∈ R or lim_{x→−∞} f(x) = L ∈ R, i.e., the point x₀ is infinite, but the limit L is finite;

(iv) lim_{x→+∞} f(x) = ±∞ or lim_{x→−∞} f(x) = ±∞, i.e., both the point x₀ and the limit L are infinite.

We formalize the notion of limit in these cases. We begin with case (i). First of all, let us observe that we can meaningfully talk of the limit at x₀ ∈ R of a function with domain A only when x₀ is a limit point of A. Indeed, in this case the sentence "as x ∈ A tends to x₀" is meaningful.

Definition 429 Let f : A ⊆ R → R and let x₀ ∈ R be a limit point of A. We write lim_{x→x₀} f(x) = L ∈ R if, for every ε > 0, there exists a δ_ε > 0 such that, for every x ∈ A,
0 < |x − x₀| < δ_ε ⟹ |f(x) − L| < ε     (11.4)
The value L is called the limit of the function at x₀.

Example 430 Let us show that lim_{x→2} (3x − 5) = 1. We have to verify that, for every ε > 0, there exists δ_ε > 0 such that
0 < |x − 2| < δ_ε ⟹ |(3x − 5) − 1| < ε     (11.6)
We have |(3x − 5) − 1| < ε if and only if |x − 2| < ε/3. Therefore, setting δ_ε = ε/3 yields (11.6). N
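The ε-δ bookkeeping of this example can be checked mechanically. The sketch below (our own illustration) samples points within δ of x₀ = 2 and verifies |f(x) − 1| < ε with δ = ε/3:

```python
# Epsilon-delta test for lim_{x->2} (3x - 5) = 1 with delta = epsilon/3.
f = lambda x: 3 * x - 5

def passes(eps, delta, trials=10000):
    # sample points with 0 < |x - 2| < delta and check |f(x) - 1| < eps
    for k in range(1, trials + 1):
        for x in (2 + delta * k / (trials + 1), 2 - delta * k / (trials + 1)):
            if abs(f(x) - 1) >= eps:
                return False
    return True

for eps in (1.0, 0.1, 0.001):
    assert passes(eps, eps / 3)
print("delta = eps/3 passes the test")
```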
Intuitively, the smaller (so the more demanding) the value of ε is, the smaller δ_ε is. To make this intuition more precise, note that the relationship between ε and δ_ε is similar, mutatis mutandis, to that between ε and n_ε in the definition of convergence of sequences, which we discussed at length after Definition 277. Now, to show that L is a limit of f at x₀, you have to pass the following, still highly demanding, test: given any threshold ε > 0 selected by a relentless examiner, you have to be able to come up with a small enough δ_ε so that all points that are δ_ε close to x₀ have images that are ε close to L.
Note that δ_ε depends on ε and is not unique: when we find a value of δ_ε, all smaller values also work fine. For instance, in the last example we can choose as δ_ε any (positive) value lower than ε/3. But one typically focuses on the largest such δ_ε (if it exists), which is a genuine threshold value. It is in terms of such a "threshold" δ_ε that we can, indeed, say: the smaller (so the more demanding) the value of ε is, the smaller δ_ε is.

N.B. The value of δ_ε, besides depending on ε, clearly depends also on x₀. This dependence is, however, so obvious that it can safely be omitted in the notation. O
O.R. It is hard to overestimate the importance of the previous "test" in making limit notions in mathematics rigorous. Its origin traces back to Eudoxus' method of exhaustion that underlies integration theory (Chapter 35). Perhaps the best classic description of a form of such a test is Proposition 1 in Euclid's Book X: "Two unequal magnitudes being set out, if from the greater there is subtracted a magnitude greater than its half, and from that which is left a magnitude greater than its half, and if this process is repeated continually, then there will be left some magnitude less than the lesser magnitude set out" (trans. Heath – we put in italics the words where the test emerges). Yet, it was only in the nineteenth century that, through the works of Cauchy and Weierstrass, the test took the form that we presented in Definitions 277 and 429. H
Example 431 For the Dirichlet function (11.3), lim_{x→x₀} f(x) does not exist for any x₀ ∈ R. Indeed, given x₀ ∈ R, let us suppose, by contradiction, that lim_{x→x₀} f(x) exists and is equal to L ∈ R. Let 0 < ε < 1/2. By definition, there exists δ = δ_ε such that1
x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ⟹ |f(x) − L| < ε < 1/2
In each neighborhood (x₀ − δ, x₀ + δ) there exist both rational points and irrational points distinct from x₀ (see Proposition 39), so points x′, x″ ∈ (x₀ − δ, x₀ + δ) for which f(x′) = 1 and f(x″) = 0. We thus reach the contradiction
1 = |1 − 0| = |f(x′) − f(x″)| ≤ |f(x′) − L| + |L − f(x″)| < 1/2 + 1/2 = 1
Therefore, lim_{x→x₀} f(x) does not exist for any point x₀ ∈ R. N

1 The expression "x₀ ≠ x ∈ (x₀ − δ, x₀ + δ)" means "x ∈ (x₀ − δ, x₀ + δ) and x ≠ x₀". In words, x belongs to the interval (x₀ − δ, x₀ + δ) but is distinct from x₀. To ease notation, similar expressions are used throughout the chapter.
Definition 429, in which the distances are made explicit, is of the "ε-δ" type. In view of (11.5), it is immediate to rewrite it in the language of neighborhoods. To make notation more expressive, we denote by U_δ(x₀) a neighborhood of x₀ of radius δ and by V_ε(L) a neighborhood of L of radius ε. Graphically, the former is a neighborhood on the horizontal axis, while the latter is a neighborhood on the vertical axis.

Definition 432 Let f : A ⊆ R → R and let x₀ ∈ R be a limit point of A. We write
lim_{x→x₀} f(x) = L ∈ R
if, for every neighborhood V_ε(L) of L, there exists a neighborhood U_{δ_ε}(x₀) of x₀ such that
x₀ ≠ x ∈ U_{δ_ε}(x₀) ∩ A ⟹ f(x) ∈ V_ε(L)

Definition 433 Let f : A ⊆ R → R and let x₀ ∈ R̄ be a limit point of A. We write
lim_{x→x₀} f(x) = L ∈ R̄
if, for every neighborhood V_ε(L) of L, there exists a neighborhood U_{δ_ε}(x₀) of x₀ such that
x₀ ≠ x ∈ U_{δ_ε}(x₀) ∩ A ⟹ f(x) ∈ V_ε(L)

The difference between Definitions 432 and 433 is apparently minor: in the former definition we have R, in the latter we have R̄. This simple modification allows us, however, to consider also the cases (ii), (iii) and (iv).
To exemplify, we consider explicitly a few subcases, leaving to the reader the other ones. We start with the subcase x₀ ∈ R and L = +∞ of (ii). In this case Definition 433 reduces to the following "ε-δ" form (that is, with distances made explicit):2 we write
lim_{x→x₀} f(x) = +∞
if, for every M > 0, there exists δ_M > 0 such that, for every x ∈ A, we have
0 < |x − x₀| < δ_M ⟹ f(x) > M
In other words, for each constant M, no matter how large, there exists δ_M > 0 such that all the points x₀ ≠ x ∈ A that are δ_M close to x₀ have images f(x) larger than M.

Consider, for example, the function f : R ∖ {2} → R given by f(x) = 1/|x − 2|. The point x₀ = 2 is a limit point for R ∖ {2}, so we can consider lim_{x→2} f(x). Let M > 0. Setting δ_M = 1/M, we have
0 < |x − 2| < δ_M = 1/M ⟹ 1/|x − 2| > M
and therefore
0 < |x − 2| < δ_M ⟹ f(x) > M
That is, lim_{x→2} f(x) = +∞. N

2 In a nutshell, we can say that "there exists a neighborhood" takes the place of the adverb "eventually" used for sequences.
Let us now consider case (iii) with x₀ = +∞ and L ∈ R. Here Definition 433 reduces to the following "ε-δ" one: we write
lim_{x→+∞} f(x) = L ∈ R
if, for every ε > 0, there exists M_ε > 0 such that, for every x ∈ A, we have
x > M_ε ⟹ |f(x) − L| < ε
In this case, for each choice of ε > 0, arbitrarily small, there exists a value M_ε such that the images of the points x greater than M_ε are ε close to L.
Finally, we consider case (iv) with x₀ = L = +∞. In this case Definition 433 reduces to the following one: we write
lim_{x→+∞} f(x) = +∞
if, for every M > 0, there exists N such that, for every x ∈ A, we have
x > N ⟹ f(x) > M
For example, for f : [0, +∞) → R given by f(x) = √x, setting N = M² yields
x > N ⟹ f(x) = √x > √N = M
That is, lim_{x→+∞} f(x) = +∞. N
N.B. If A = N₊, that is, f : N₊ → R is a sequence, with the last two definitions we recover the notions of convergence and of (positive) divergence for sequences. The theory of limits of functions extends, therefore, the theory of limits of sequences of Chapter 8. In this respect, note that the set N₊ has only one limit point: +∞. This is why the only limit meaningful for sequences is lim_{n→∞}. O
O.R. It may be useful to see the concept of limit "in three stages" (like a rocket):

(iii) all the values of f at x ∈ U, x ≠ x0, belong to V; i.e., all the images of f in U ∩ A (excluding at most f(x0)) belong to V: f((U ∩ A)∖{x0}) ⊆ V.
[Figure: a neighborhood U(x0) on the x-axis and the neighborhood V(l) on the y-axis.]
We are often tempted to simplify to two stages: "the values of x close to x0 have images f(x) close to L", that is,

for every neighborhood U(x0) of x0 there exists a neighborhood V(L) of L such that f((U ∩ A)∖{x0}) ⊆ V(L)

Unfortunately, this is an empty statement that is always (vacuously) true, as the figure shows:
[Figure: a small neighborhood U(x0) on the x-axis and a large neighborhood V(l) on the y-axis containing all the corresponding images.]
In the figure, for every neighborhood U(x0) of x0, however small, there always exists a neighborhood V(L) of L (possibly quite big) inside which all the values of f(x) with x ∈ U∖{x0} fall. Such a V can always be taken to be an open interval that contains f(U∖{x0}). H
It is easy to see that lim_{x→1} f(x) does not exist. In these cases one can resort to the weaker notion of one-sided (or unilateral) limit, which we already met in an intuitive way in the introductory examples of this chapter. These examples, indeed, suggest two possible cases in which the right limit exists:

Similarly, we also have two "left" cases. Note that in both (i) and (ii) the point x0 is in R, while the value of the limit is in R̄.
The next "right" definition includes both cases.

lim_{x→x0⁺} f(x) = L ∈ R̄

if, for every neighborhood V_ε(L) of L, there exists a right neighborhood U⁺_{δε}(x0) = [x0, x0 + δ_ε) of x0 such that

x0 ≠ x ∈ U⁺_{δε}(x0) ∩ A ⟹ f(x) ∈ V_ε(L)   (11.12)

The value L is called the right limit of the function at x0.
In a similar way we can define the left limits, denoted by lim_{x→x0⁻} f(x), as readers can check.
By excluding x0, the right neighborhood U⁺_{δε}(x0) reduces to (x0, x0 + δ_ε), so (11.12) can be written more simply as

lim_{x→x0⁺} f(x) = L ∈ R

if, for every ε > 0, there exists δ = δ_ε > 0 such that, for every x ∈ A,

x0 < x < x0 + δ_ε ⟹ |f(x) − L| < ε
Let us consider the subcase L = +∞ of (ii), leaving the subcase L = −∞ to the reader. For this case, Definition 440 reduces to the following "ε-δ" one.

lim_{x→x0⁺} f(x) = +∞

if, for every M > 0, there exists δ_M > 0 such that, for every x ∈ A,

x0 < x < x0 + δ_M ⟹ f(x) > M
We close this section with an example, from the introduction, in which both one-sided limits (right and left) exist, but are different.
Let M > 0 and set δ_M = 1/M. For every x > 2 we have

0 < x − x0 < δ_M ⟺ 0 < x − 2 < 1/M ⟹ 1/(x − 2) > M

Therefore

0 < x − 2 < δ_M ⟹ f(x) > M

that is, lim_{x→2⁺} f(x) = +∞. On the other hand, for every x < 2 we have

0 < x0 − x < δ_M ⟺ 0 < 2 − x < 1/M ⟹ 1/(x − 2) < −M

Therefore

0 < 2 − x < δ_M ⟹ f(x) < −M

That is, lim_{x→2⁻} f(x) = −∞. We conclude that the two one-sided limits exist but are dramatically different. This formally proves (11.1), which was intuitively discussed in the introduction. N
Note that B_ε(x0)∖{x0} is a neighborhood of x0 deprived of x0 itself, so "with a hole" in the middle. The result requires that at least one such neighborhood be contained in A. Clearly, if x0 ∈ A this amounts to requiring that x0 be an interior point of A. But the hole permits x0 to be outside A. For instance, this is the case if we consider (again) the function f(x) = 1/|x − 2| and the point x0 = 2, which is outside the domain of f. We have lim_{x→2} f(x) = +∞ and hence, by Proposition 445,

lim_{x→2⁻} f(x) = lim_{x→2⁺} f(x) = +∞

For f(x) = 1/(x − 2), instead, the two one-sided limits at 2 are different. So, by Proposition 445 the two-sided limit lim_{x→2} f(x) does not exist.
Proof We prove the proposition for L ∈ R, leaving the case L = ±∞ to the reader. Moreover, for simplicity we suppose that x0 is an interior point of A.

"If". We show that lim_{x→x0⁻} f(x) = lim_{x→x0⁺} f(x) = L implies lim_{x→x0} f(x) = L. Let ε > 0. Since lim_{x→x0⁺} f(x) = L, there exists δ′_ε > 0 such that, for every x ∈ (x0, x0 + δ′_ε) ∩ A, we have |f(x) − L| < ε. On the other hand, since lim_{x→x0⁻} f(x) = L, there exists δ″_ε > 0 such that, for every x ∈ (x0 − δ″_ε, x0) ∩ A, we have |f(x) − L| < ε. Let δ_ε = min{δ′_ε, δ″_ε}. Then

x ∈ (x0, x0 + δ_ε) ∩ A ⟹ |f(x) − L| < ε   (11.15)

and

x ∈ (x0 − δ_ε, x0) ∩ A ⟹ |f(x) − L| < ε   (11.16)

that is,

x0 ≠ x ∈ (x0 − δ_ε, x0 + δ_ε) ∩ A ⟹ |f(x) − L| < ε

Therefore, lim_{x→x0} f(x) = L.

"Only if". We show that lim_{x→x0} f(x) = L implies lim_{x→x0⁻} f(x) = lim_{x→x0⁺} f(x) = L. Let ε > 0. Since lim_{x→x0} f(x) = L, there exists δ_ε > 0 such that

x0 ≠ x ∈ (x0 − δ_ε, x0 + δ_ε) ∩ A ⟹ |f(x) − L| < ε   (11.17)

Since x0 is not a boundary point, both intersections (x0 − δ_ε, x0) ∩ A and (x0, x0 + δ_ε) ∩ A are non-empty. Therefore, (11.17) implies both (11.15) and (11.16), so lim_{x→x0⁺} f(x) = lim_{x→x0⁻} f(x) = L.
As the reader may have noted, when A is an interval the hypothesis B_ε(x0)∖{x0} ⊆ A of Proposition 445 forbids x0 from being a boundary point. Indeed, to fix ideas, assume that A is an interval of the real line with endpoints a < b.⁴ When x0 = a = inf A, it does not make sense to talk of the one-sided limit lim_{x→a⁻} f(x), while when x0 = b = sup A it does not make sense to talk of the one-sided limit lim_{x→b⁺} f(x). So, at the endpoints one of the one-sided limits becomes meaningless.

Interestingly, at the endpoints we have, instead, lim_{x→a} f(x) = lim_{x→a⁺} f(x) and lim_{x→b} f(x) = lim_{x→b⁻} f(x). Indeed, the definition of two-sided limit is perfectly satisfied: for each neighborhood V of L there exists a neighborhood (necessarily one-sided, because x0 is an endpoint) such that the images of f, except perhaps f(x0), fall in V.

A similar observation can be made, more generally, at each boundary point x0 of A. For instance, if A is a half-line [x0, +∞), the left limit at x0 is meaningless: for f(x) = √x and x0 = 0, the left limit lim_{x→0⁻} √x is meaningless.
Example 446 Let f : [0, ∞) → R be given by f(x) = √x. We just remarked that lim_{x→0⁻} f(x) is meaningless, while in Example 442 we saw that lim_{x→0⁺} f(x) = 0. By what we just noted, we can also write lim_{x→0} f(x) = 0. It is instructive to compute this two-sided limit directly, through Definition 429. Let ε > 0. As we saw in Example 442, we have

|f(x) − L| = √x < ε ⟺ x < ε²

⁴ In other words, one of the following four cases holds: (i) A = (a, b); (ii) A = [a, b); (iii) A = (a, b]; (iv) A = [a, b].
Setting δ_ε = ε², for every x ∈ A, that is, for every x ≥ 0, we have

0 < x < δ_ε ⟹ |f(x) − 0| < ε

so that lim_{x→0} f(x) = 0. N

The two-sided limit can be stated compactly in terms of neighborhoods:

lim_{x→x0} f(x) = L ∈ R̄

if, for every neighborhood V of L, there exists a neighborhood U of x0 such that

f((U ∩ A)∖{x0}) ⊆ V

It is this version of the two-sided limit that the reader will find generalized to topological spaces in more advanced courses. A similar general version holds for one-sided limits, as the reader can check.

For functions of several variables f : A ⊆ Rⁿ → R the definition reads the same way:

lim_{x→x0} f(x) = L   (11.18)

if, for every neighborhood V_ε(L) of L, there exists a neighborhood U_{δε}(x0) of x0 such that

x0 ≠ x ∈ U_{δε}(x0) ∩ A ⟹ f(x) ∈ V_ε(L)

Definition 433 is the special case with n = 1. In the "ε-δ" version we have (11.18) if, for every ε > 0, there exists δ_ε > 0 such that, for every x ∈ A,

0 < ‖x − x0‖ < δ_ε ⟹ |f(x) − L| < ε
As the reader can check, we can easily extend to functions of several variables the limits from above and from below (indeed, the limit L remains a scalar, not a vector). Moreover, the notion of limit can be easily extended to operators. But we postpone this to Chapter 12 (Definition 497), when we will study the continuity of operators, a topic that will motivate this further extension.
11.3.2 Directions
So far, so good. Too good, in a sense, because the multivariable extension of the notion of limit seems to be just a matter of upgrading the distance, from the absolute value |x − x0| between scalars to the more general norm ‖x − x0‖ between vectors. Formally, this is true, but one should not forget that, when n > 1, the condition ‖x − x0‖ < ε controls many more ways to approach a point. Indeed, on the real line there are only two ways to approach a point x0: from the left and from the right. They are identified with − and + in the next figure, respectively.

Instead, in the plane, and a fortiori in a general space Rⁿ, there are infinitely many directions along which to approach a point x0, as the figure illustrates:
Intuitively, condition (11.20) requires that, as x approaches x0 along all such directions, the function f tends to the same value L. In other words, the behavior of f is consistent across all such directions. If, therefore, there are two directions along which f does not tend to the same limit value, the function does not have a limit as x → x0. The following example should clarify the issue.
f(x1, x2) = log(1 + x1 x2)/x1²

Let us verify that lim_{(x1,x2)→(0,0)} f(x) does not exist. Consider two possible directions along which we can approach the origin: along the parabola x2 = x1², and along the straight line x2 = x1. Along the parabola,

lim_{(x1,x2)→(0,0)} f(x1, x2) = lim_{x1→0} f(x1, x1²) = lim_{x1→0} log(1 + x1³)/x1² = 0

while along the straight line,

lim_{(x1,x2)→(0,0)} f(x1, x2) = lim_{x1→0} f(x1, x1) = lim_{x1→0} log(1 + x1²)/x1² = 1
Since f tends to two different limit values along the two directions, we conclude that lim_{(x1,x2)→(0,0)} f(x) does not exist.
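The two directional computations can be replicated numerically; a small sketch (the step size 10⁻⁶ is an arbitrary choice of ours):

```python
import math

# f(x1, x2) = log(1 + x1*x2) / x1^2, evaluated along the two directions above.
def f(x1, x2):
    return math.log(1.0 + x1 * x2) / x1 ** 2

t = 1e-6
along_parabola = f(t, t ** 2)  # x2 = x1^2: values approach 0
along_line = f(t, t)           # x2 = x1: values approach 1
print(along_parabola, along_line)
```

The two values disagree, which is exactly the failure of the two-dimensional limit.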
We can prove this failure rigorously using Definition 448. Suppose, by contradiction, that the limit exists, that is,

lim_{(x1,x2)→(0,0)} f(x1, x2) = L

Set ε = 1/4. By definition of limit, there exists δ1 > 0 such that, for (0, 0) ≠ (x1, x2) ∈ B_{δ1}(0, 0), we have

d(f(x1, x2), L) < 1/4   (11.21)
From the limit along the parabola, by setting

g(x) = log(1 + x³)/x²

one gets lim_{x1→0} g(x1) = 0. Therefore, by setting again ε = 1/4, there exists δ2 > 0 such that, for 0 ≠ x1 ∈ B_{δ2}(0) ⊆ R, we have

g(x1) ∈ (−ε, ε) = (−1/4, 1/4)

Now consider the neighborhood B_{δ2}(0, 0) ⊆ R² of (0, 0). Take a point on the parabola x2 = x1² that belongs to this neighborhood, that is, a point (0, 0) ≠ (x̂1, x̂1²) ∈ B_{δ2}(0, 0). We have x̂1 ∈ B_{δ2}(0),⁶ so

f(x̂1, x̂1²) = g(x̂1) ∈ (−1/4, 1/4)   (11.22)
Similarly, from the limit along the straight line, by setting

h(x) = log(1 + x²)/x²

one gets lim_{x1→0} h(x1) = 1. Therefore, setting again ε = 1/4, there exists δ3 > 0 such that, for 0 ≠ x1 ∈ B_{δ3}(0) ⊆ R, we have

h(x1) ∈ (1 − ε, 1 + ε) = (3/4, 5/4)

Now consider the neighborhood B_{δ3}(0, 0) ⊆ R² of (0, 0) and take a point of the straight line x2 = x1 that belongs to it, that is, a point (0, 0) ≠ (x̃1, x̃1) ∈ B_{δ3}(0, 0). We have x̃1 ∈ B_{δ3}(0), so that

f(x̃1, x̃1) = h(x̃1) ∈ (3/4, 5/4)   (11.23)

⁶ Indeed, d((x̂1, x̂1²), (0, 0)) < δ2, that is, x̂1² + x̂1⁴ < δ2², implies x̂1² < δ2², whence d(x̂1, 0) < δ2.
Proof We consider L ∈ R, leaving the case L = ±∞ to the reader. "If". Suppose f(xn) → L for every sequence {xn} of points of A, with xn ≠ x0 for every n, such that xn → x0. Suppose, by contradiction, that lim_{x→x0} f(x) = L is false. Then there is ε > 0 such that, for every δ > 0, there exists x_δ ∈ A such that 0 < d(x_δ, x0) < δ and d(f(x_δ), L) ≥ ε. For every n, set δ = 1/n and let xn be the corresponding point of A just denoted by x_δ. For the sequence {xn} of points of A so constructed, we have d(x0, xn) < 1/n for every n, so lim_{n→∞} d(x0, xn) = 0. By Proposition 281, xn → x0. But, by construction, d(f(xn), L) ≥ ε for every n, so the sequence f(xn) does not converge to L. Having contradicted the hypothesis, we conclude that lim_{x→x0} f(x) = L.
"Only if". Suppose lim_{x→x0} f(x) = L ∈ R. Let {xn} be a sequence of points of A, with xn ≠ x0 for every n, such that xn → x0. Let ε > 0. There exists δ_ε > 0 such that, for every x ∈ A, 0 < d(x, x0) < δ_ε implies d(f(x), L) < ε. Since xn → x0 and xn ≠ x0, there exists n_ε ≥ 1 such that 0 < d(xn, x0) < δ_ε for every n ≥ n_ε. For every n ≥ n_ε we thus have d(f(xn), L) < ε, which implies f(xn) → L.
Example 452 Let us go back to lim_{x→2}(3x − 5) of Example 430. Since A = R, let {xn} be any sequence of scalars, with xn ≠ 2 for every n, such that xn → 2. For example, xn = 2 + 1/n or xn = 2 − 1/n². By the algebra of limits of sequences, we have

f(xn) = 3xn − 5 → 3 · 2 − 5 = 1

so lim_{x→2}(3x − 5) = 1 by Proposition 451. N
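The sequential characterization can be illustrated numerically with the two sequences of the example, truncated at an arbitrary index of our choosing:

```python
# Sequential characterization: f(x_n) -> 1 for every sequence x_n -> 2, x_n != 2.
def f(x):
    return 3 * x - 5

n = 10 ** 6
x1 = 2 + 1 / n       # the sequence x_n = 2 + 1/n
x2 = 2 - 1 / n ** 2  # the sequence x_n = 2 - 1/n^2
print(f(x1), f(x2))  # both values are close to 1
```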
Proof Let us suppose, by contradiction, that there exist two different limits L′ ≠ L″. Let {xn} be a sequence in A, with xn ≠ x0 eventually, such that xn → x0. By Proposition 451, f(xn) → L′ and f(xn) → L″, which contradicts the uniqueness of the limit for sequences. It follows that L′ = L″.
Alternative proof By contradiction, let us suppose that there exist two different limits L1 and L2, that is, L1 ≠ L2.⁷ We assume therefore that

lim_{x→x0} f(x) = L1

and

lim_{x→x0} f(x) = L2

with L1 ≠ L2. Without loss of generality, suppose that L1 > L2. There exists a number K such that L1 > K > L2. Setting 0 < ε1 < L1 − K and 0 < ε2 < K − L2, the neighborhoods B_{ε1}(L1) = (L1 − ε1, L1 + ε1) and B_{ε2}(L2) = (L2 − ε2, L2 + ε2) are disjoint.

⁷ That is, for the case x0 ∈ R̄, which indeed includes x → ±∞ as the special cases x → x0 = ±∞.
[Figure: the disjoint neighborhoods (L1 − ε1, L1 + ε1) and (L2 − ε2, L2 + ε2) on the y-axis.]
Since by hypothesis lim_{x→x0} f(x) = L1, given ε1 > 0 one can find δ1 > 0 such that

x0 ≠ x ∈ (x0 − δ1, x0 + δ1) ∩ A ⟹ f(x) ∈ (L1 − ε1, L1 + ε1)   (11.24)

Analogously, since by hypothesis lim_{x→x0} f(x) = L2, given ε2 > 0 one can find δ2 > 0 such that

x0 ≠ x ∈ (x0 − δ2, x0 + δ2) ∩ A ⟹ f(x) ∈ (L2 − ε2, L2 + ε2)   (11.25)

Taking δ = min{δ1, δ2}, the neighborhood (x0 − δ, x0 + δ) of x0 with radius δ is contained in the two previous neighborhoods, i.e., on (x0 − δ, x0 + δ) both (11.24) and (11.25) hold:

x0 ≠ x ∈ (x0 − δ, x0 + δ) ⟹ f(x) ∈ (L1 − ε1, L1 + ε1) and f(x) ∈ (L2 − ε2, L2 + ε2)

Hence, f(x) ∈ B_{ε1}(L1) ∩ B_{ε2}(L2), which is impossible because these two neighborhoods are disjoint. This contradiction proves that L1 = L2.
We continue with a version for functions of the theorem on the permanence of sign
(Theorem 295).
We leave to the reader the easy "sequential" proof based on Theorem 295 and on Proposition 451. We give, instead, a proof that directly uses the definition of limit.

Alternative proof Let L ≠ 0, say L > 0. Since lim_{x→x0} f(x) = L, by taking ε = L/2 > 0 there exists a neighborhood B_{δε}(x0) of x0 such that

x0 ≠ x ∈ B_{δε}(x0) ∩ A ⟹ f(x) ∈ (L − L/2, L + L/2) = (L/2, 3L/2)

Since L/2 > 0, we are done. For L < 0, the proof is similar.
and

lim_{x→x0} g(x) = lim_{x→x0} h(x) = L ∈ R   (11.27)

then

lim_{x→x0} f(x) = L
Again we leave to the reader the easy "sequential" proof based on Theorem 314 and on Proposition 451, and give a proof based on the definition of limit.

Alternative proof Let ε > 0. We have to show that there exists δ > 0 such that f(x) ∈ (L − ε, L + ε) for every x0 ≠ x ∈ (x0 − δ, x0 + δ) ∩ A. Since lim_{x→x0} g(x) = L, there exists δ1 > 0 such that g(x) ∈ (L − ε, L + ε) for every x0 ≠ x ∈ (x0 − δ1, x0 + δ1) ∩ A; analogously, since lim_{x→x0} h(x) = L, there exists δ2 > 0 such that h(x) ∈ (L − ε, L + ε) for every x0 ≠ x ∈ (x0 − δ2, x0 + δ2) ∩ A. Taking δ = min{δ1, δ2}, by (11.26) we get

L − ε < g(x) ≤ f(x) ≤ h(x) < L + ε   ∀x0 ≠ x ∈ (x0 − δ, x0 + δ) ∩ A

that is,

f(x) ∈ (L − ε, L + ε)   ∀x0 ≠ x ∈ (x0 − δ, x0 + δ) ∩ A

Since ε was arbitrary, we conclude that lim_{x→x0} f(x) = L.
The comparison criterion for functions has the same interpretation as the original version for sequences (Theorem 314). The next simple application of this criterion is similar, mutatis mutandis, to that seen in Example 315.
Example 457 Let f : R∖{0} → R be given by f(x) = e^{x cos²(1/x)} and let x0 = 0. Since

0 ≤ cos²(1/x) ≤ 1   ∀x ≠ 0

by the monotonicity of the exponential function we have

1 = e^{0·x} ≤ e^{x cos²(1/x)} ≤ e^{1·x} = e^x   ∀x > 0

Setting g(x) = 1 and h(x) = e^x, conditions (11.26) and (11.27) are satisfied with L = 1. Therefore, lim_{x→0} f(x) = 1. The proof for x < 0 is analogous. N
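The sandwich 1 ≤ f(x) ≤ e^x can be verified numerically on a grid of points approaching 0 from the right; a quick sketch (the grid is an arbitrary choice of ours):

```python
import math

# Squeeze check: 1 <= e^{x cos^2(1/x)} <= e^x for x > 0, so f(x) -> 1 as x -> 0+.
def f(x):
    return math.exp(x * math.cos(1.0 / x) ** 2)

xs = [10.0 ** (-k) for k in range(1, 13)]
sandwiched = all(1.0 <= f(x) <= math.exp(x) for x in xs)
print(sandwiched, f(1e-12))  # True, and a value very close to 1
```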
As was the case for sequences, more generally also for functions the last two results establish properties of limits with respect to the underlying order structure of Rⁿ. The next proposition, which extends Propositions 296 and 297 to functions, is yet another simple result of this kind.
(ii) If L > H, then there exists a neighborhood of x0 in which f (x) > g (x).
Observe that in (i) we can only say L ≥ H even when we have the strict inequality f(x) > g(x). For example, for the functions f, g : R → R given by

f(x) = 1 if x = 0;  f(x) = x² if x ≠ 0

and g(x) = 0 we have, for x → 0, L = H = 0 although f(x) > g(x) for every x ∈ R. Similarly, if f(x) = 1/x and g(x) = 0, for x → +∞ we have L = H = 0 although f(x) > g(x) for every x > 0.
As we did so far in this section, we leave the sequential proof (based on Propositions 296 and 297) to readers and give, instead, a proof based on the definition of limit.
Alternative proof (i) By contradiction, assume that L < H. Set ε = H − L, so that ε > 0. The neighborhoods (L − ε/4, L + ε/4) and (H − ε/4, H + ε/4) are disjoint since L + ε/4 < H − ε/4. Since lim_{x→x0} f(x) = L, there exists δ1 > 0 such that

x0 ≠ x ∈ (x0 − δ1, x0 + δ1) ⟹ f(x) ∈ (L − ε/4, L + ε/4)

Analogously, since lim_{x→x0} g(x) = H, there exists δ2 > 0 such that

x0 ≠ x ∈ (x0 − δ2, x0 + δ2) ⟹ g(x) ∈ (H − ε/4, H + ε/4)

By setting δ = min{δ1, δ2}, we have

x0 ≠ x ∈ (x0 − δ, x0 + δ) ⟹ L − ε/4 < f(x) < L + ε/4 < H − ε/4 < g(x) < H + ε/4

That is, f(x) < g(x) for every x0 ≠ x ∈ B_δ(x0). This contradicts the hypothesis that f(x) ≥ g(x) in a neighborhood of x0.

(ii) We prove the contrapositive. It is enough to note that, if f(x) ≤ g(x) in every neighborhood of x0, then (i) implies L ≤ H.
(i) lim_{x→x0} (f + g)(x) = L + M, provided that L + M is not an indeterminate form (1.25) of the type

+∞ − ∞ or −∞ + ∞

(ii) lim_{x→x0} (fg)(x) = LM, provided that LM is not an indeterminate form (1.26) of the type

±∞ · 0 or 0 · (±∞)

(iii) lim_{x→x0} (f/g)(x) = L/M, provided that g(x) ≠ 0 in a neighborhood of x0, with x ≠ x0, and L/M is not an indeterminate form (1.27) of the type⁹

∞/∞ or a/0
Proof We prove only (i), leaving the analogous proofs of (ii) and (iii) to the reader.⁸ Let {xn} be a sequence in A, with xn ≠ x0 for every n ≥ 1, such that xn → x0. By Proposition 451, f(xn) → L and g(xn) → M. Suppose that L + M is not an indeterminate form. By Proposition 309, (f + g)(xn) → L + M, and therefore, by Proposition 451, it follows that lim_{x→x0} (f + g)(x) = L + M.

⁸ For brevity, we focus on Proposition 309 and leave to the reader the analogous extension of Proposition 313.
⁹ As for sequences, to exclude the indeterminacy a/0 amounts to requiring M ≠ 0.
Example 460 Let f, g : R∖{0} → R be given by f(x) = sin x/x and g(x) = 1/|x|. We have lim_{x→0} sin x/x = 1 and lim_{x→0} 1/|x| = +∞. Therefore,

lim_{x→0} (sin x/x + 1/|x|) = 1 + ∞ = +∞
As for sequences, when a ≠ 0 the case a/0 of point (iii) is actually not an indeterminate form for the algebra of limits, as the following version for functions of Proposition 311 shows.

Proposition 461 Let lim_{x→x0} f(x) = L ∈ R̄, with L ≠ 0, and lim_{x→x0} g(x) = 0. The limit lim_{x→x0} (f/g)(x) exists if and only if there is a neighborhood U(x0) of x0 ∈ Rⁿ where the function g has constant sign, except at most at x0. In this case:¹⁰

lim_{x→x0} f(x)/g(x) = +∞

if f and g eventually have the same sign near x0, while

lim_{x→x0} f(x)/g(x) = −∞

if they eventually have opposite signs.
As in the previous section, we considered only limits at points x0 ∈ Rⁿ. The reader can verify that for scalar functions the results of this section extend to the case x → ±∞.

Example 463 Take f(x) = 1/x − 1 and g(x) = 1/x. As x → +∞ we have f → −1 and g → 0. Since g(x) > 0 for every x > 0, so also in any neighborhood of +∞, we have g → 0⁺. Thanks to the version for x → ±∞ of Proposition 461, we have lim_{x→+∞} (f/g)(x) = −∞. N
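Example 463 can also be observed numerically; a quick sketch (the evaluation points are arbitrary):

```python
# f(x) = 1/x - 1 -> -1 and g(x) = 1/x -> 0+ as x -> +infinity, so f/g -> -infinity.
def quotient(x):
    return (1.0 / x - 1.0) / (1.0 / x)  # algebraically equal to 1 - x

for x in (1e2, 1e4, 1e6):
    print(quotient(x))  # increasingly large negative values
```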
Indeterminate form ∞ − ∞

For example, the limit lim_{x→0} (f + g)(x) of the sum of the functions f, g : R∖{0} → R given by f(x) = 1/x² and g(x) = −1/x⁴ falls under the indeterminate form ∞ − ∞. We have

(f + g)(x) = 1/x² − 1/x⁴ = (1/x²)(1 − 1/x²)

and, therefore,

lim_{x→0} (f + g)(x) = lim_{x→0} 1/x² · lim_{x→0} (1 − 1/x²) = −∞

since (+∞)·(−∞) is not an indeterminate form. Exchanging the signs between these two functions, that is, by setting f(x) = −1/x² and g(x) = 1/x⁴, we have again the indeterminate form ∞ − ∞ at x0 = 0, but this time lim_{x→0} (f + g)(x) = +∞. Thus, also for functions the indeterminate forms can give completely different results: anything goes. So, they must be solved case by case.

Finally, note that these functions f and g give rise to an indeterminacy at x0 = 0, but not at x0 ≠ 0. Therefore, for functions it is crucial to specify the point x0 that we are considering. This is, indeed, the only novelty that the study of indeterminate forms of functions features relative to that of sequences (for which we only have the case n → +∞).
Indeterminate form 0 · ∞

For example, consider the functions f, g : R∖{3} → R given by f(x) = (x − 3)² and g(x) = 1/(x − 3)⁴. The limit lim_{x→3} (fg)(x) falls under the indeterminate form 0 · ∞. But we have

lim_{x→3} (fg)(x) = lim_{x→3} (x − 3)² · 1/(x − 3)⁴ = lim_{x→3} 1/(x − 3)² = +∞

On the other hand, by considering f(x) = 1/(x − 3)² and g(x) = (x − 3)⁴, we have

lim_{x→3} (fg)(x) = lim_{x→3} (x − 3)⁴ · 1/(x − 3)² = lim_{x→3} (x − 3)² = 0

Again, only the direct calculation of the limit can determine its value.
Indeterminate forms 0/0 and ∞/∞

For example, for f(x) = 5 − x and g(x) = x² − 25 we have the indeterminate form 0/0 at x0 = 5, and

lim_{x→5} (f/g)(x) = lim_{x→5} (5 − x)/(x² − 25) = lim_{x→5} −(x − 5)/((x − 5)(x + 5)) = lim_{x→5} −1/(x + 5) = −1/10

For f(x) = x² and g(x) = x we have the indeterminate form ∞/∞ as x → ±∞, and

lim_{x→+∞} (f/g)(x) = lim_{x→+∞} x²/x = lim_{x→+∞} x = +∞

while

lim_{x→−∞} (f/g)(x) = lim_{x→−∞} x²/x = lim_{x→−∞} x = −∞

In the two cases the limits are infinities of opposite sign: again, one cannot avoid the direct calculation of the limit.
For the functions f and g just seen, at the point x0 = 0 we have the indeterminate form 0/0, but

lim_{x→0} (f/g)(x) = lim_{x→0} x²/x = lim_{x→0} x = 0

while, setting g(x) = x⁴, we still have an indeterminate form of the type 0/0 and

lim_{x→0} (f/g)(x) = lim_{x→0} x²/x⁴ = lim_{x→0} 1/x² = +∞

On the other hand, by taking f : R₊ → R given by f(x) = x + √x − 2 and g : R∖{1} → R given by g(x) = x − 1, we have

lim_{x→1} (f/g)(x) = lim_{x→1} (x + √x − 2)/(x − 1) = lim_{x→1} (x − 1 + √x − 1)/(x − 1) = lim_{x→1} [1 + (√x − 1)/(x − 1)]

= 1 + lim_{x→1} (√x − 1)/((√x − 1)(√x + 1)) = 1 + lim_{x→1} 1/(√x + 1) = 1 + 1/2 = 3/2
We close with two observations: (i) as for sequences (Section 8.10.5), for functions the various indeterminate forms can be reduced to one another; (ii) also in the case of functions we can summarize what we have seen so far in tables similar to those in Section 8.10.4, as readers can check.
lim_{x→x0} xⁿ = x0ⁿ

(iv) Let f : R₊₊ → R be given by f(x) = loga x, with a > 0, a ≠ 1. For every x0 > 0, we have lim_{x→x0} loga x = loga x0. Moreover,

lim_{x→0⁺} loga x = −∞ if a > 1, +∞ if a < 1   and   lim_{x→+∞} loga x = +∞ if a > 1, −∞ if a < 1

(v) Let f, g : R → R be given by f(x) = sin x and g(x) = cos x. For every x0 ∈ R, we have lim_{x→x0} sin x = sin x0 and lim_{x→x0} cos x = cos x0. The limits lim_{x→±∞} sin x and lim_{x→±∞} cos x do not exist. N
Next we prove some classic limits for trigonometric functions (we already met the first one in the introduction of this chapter).

Proposition 465 Let f, g : R∖{0} → R be defined by f(x) = sin x/x and g(x) = (cos x − 1)/x. Then

lim_{x→0} sin x/x = 1   (11.30)

and

lim_{x→0} (1 − cos x)/x = 0,   lim_{x→0} (1 − cos x)/x² = 1/2   (11.31)
Proof It is easy to see graphically that 0 < sin x < x < tan x for x ∈ (0, π/2) and that tan x < x < sin x < 0 for x ∈ (−π/2, 0). Therefore, by dividing all the terms by sin x and by observing that sin x > 0 when x ∈ (0, π/2) and sin x < 0 when x ∈ (−π/2, 0), in all cases we have

1 < x/sin x < 1/cos x

The first limit then follows from the comparison criterion. For the third one, it is sufficient to observe that

(1 − cos x)/x² = (1 − cos x)/x² · (1 + cos x)/(1 + cos x) = (1 − cos²x)/(x²(1 + cos x)) = sin²x/x² · 1/(1 + cos x)
and that, as x → 0, the first factor tends to 1 while the second one tends to 1/2. Finally, the second limit follows immediately from the third one:

(1 − cos x)/x = x · (1 − cos x)/x² → 0 · 1/2 = 0
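The three limits of Proposition 465 can be sanity-checked numerically at a small argument; a brief sketch (the evaluation point 10⁻⁴ is an arbitrary choice):

```python
import math

# The three classic limits at 0:
# sin x / x -> 1, (1 - cos x)/x -> 0, (1 - cos x)/x^2 -> 1/2.
def classic(x):
    return (math.sin(x) / x,
            (1.0 - math.cos(x)) / x,
            (1.0 - math.cos(x)) / x ** 2)

a, b, c = classic(1e-4)
print(a, b, c)  # approximately 1, 0, and 0.5
```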
Finally, from the analogous propositions that we proved for sequences, we easily deduce (the proofs are essentially identical) the following limits:

(ii) Let 0 < a ≠ 1 and f(x) → 0 as x → x0. Then

lim_{x→x0} (a^{f(x)} − 1)/f(x) = log a

In particular,

lim_{x→0} (a^x − 1)/x = log a   (11.32)

which, when a = e, becomes

lim_{x→0} (e^x − 1)/x = 1

(iii) Let 0 < a ≠ 1 and f(x) → 0 as x → x0. Then

lim_{x→x0} loga(1 + f(x))/f(x) = 1/log a

In particular,

lim_{x→0} loga(1 + x)/x = 1/log a

which, when a = e, becomes

lim_{x→0} log(1 + x)/x = 1

(iv) If f(x) → 0 as x → x0, we have

lim_{x→x0} ((1 + f(x))^α − 1)/f(x) = α

In particular,

lim_{x→0} ((1 + x)^α − 1)/x = α
is the classic CRRA (constant relative risk aversion) utility function, where the scalar γ is interpreted as a coefficient of relative risk aversion (see Pratt, 1964, p. 134). In view of the limit (11.32),¹¹ we have lim_{γ→1} u_γ(x) = lim_{γ→1} (x^{1−γ} − 1)/(1 − γ) = log x. O
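The convergence of the CRRA utility to log x as γ → 1 can be observed numerically; a minimal sketch (the test point x = 2.5 and the values of γ are arbitrary):

```python
import math

# CRRA utility u_gamma(x) = (x^(1 - gamma) - 1)/(1 - gamma) approaches log x as gamma -> 1.
def u(x, gamma):
    return (x ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

x = 2.5
for gamma in (0.9, 0.99, 0.999999):
    print(u(x, gamma), math.log(x))  # first column approaches the second
```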
(i) If

lim_{x→x0} f(x)/g(x) = 0

we say that f is negligible with respect to g as x → x0; in symbols,

f = o(g) as x → x0

(ii) If

lim_{x→x0} f(x)/g(x) = k ≠ 0   (11.33)

we say that f is comparable with g as x → x0; in symbols,

f ≍ g as x → x0

(iii) In particular, if

lim_{x→x0} f(x)/g(x) = 1

we say that f and g are asymptotic (or asymptotically equivalent) to one another as x → x0, and we write

f(x) ∼ g(x) as x → x0

¹¹ Here 1 − γ plays the role of x in (11.32).
It is easy to see that also for functions the relations ≍ and ∼ continue to satisfy the properties seen in Section 8.14 for sequences, i.e.,

(i) the relations of comparability and of asymptotic equivalence are symmetric and transitive;

(ii) the relation of negligibility is transitive;

(iii) if lim_{x→x0} f(x) and lim_{x→x0} g(x) are both finite and non-zero, then f ≍ g as x → x0;

(iv) if lim_{x→x0} f(x) = 0 and 0 ≠ lim_{x→x0} g(x) ∈ R, then f = o(g) as x → x0.
We now consider the cases, which also for functions continue to be the most interesting ones, in which both functions either converge to zero or diverge to ∞. We start with convergence to zero: lim_{x→x0} f(x) = lim_{x→x0} g(x) = 0. In this case, intuitively, f is negligible with respect to g as x → x0 if it tends to zero faster. Let, for example, x0 = 1, f(x) = (x − 1)² and g(x) = x − 1. We have

lim_{x→1} (x − 1)²/(x − 1) = lim_{x→1} (x − 1) = 0

so f = o(g) as x → 1.

Consider now divergence to infinity and take, for example, f(x) = x and g(x) = x². Since lim_{x→+∞} x/x² = lim_{x→+∞} 1/x = 0, we have f = o(g) as x → +∞. So, f = o(g) also as x → −∞: in both cases x tends to infinity slower than x². Note that, as x → 0, we have instead lim_{x→0} x² = lim_{x→0} x = 0 and

lim_{x→0} x²/x = lim_{x→0} x = 0

so that g = o(f) as x → 0.
In sum, also for functions the meaning of negligibility must be specified according to whether we consider convergence to zero or divergence to infinity. Moreover, the point x0 at which we take the limit is key, as already remarked several times (repetita iuvant, hopefully).
Proposition 467 For every pair of functions f and g and for every scalar c ≠ 0, we have:

(i) o(f) + o(f) = o(f);
We omit the proof because it is similar, mutatis mutandis, to that of Proposition 329. The comments we made about that proposition also still apply, in particular about the important special case o(f)o(f) = o(f²) of point (ii).
Example 468 Let f(x) = xⁿ, with n > 2. Consider the two functions g(x) = x^{n−1} and h(x) = e^{−x} − 3x^{n−1}. It is easy to check that g = o(f) = o(xⁿ) and h = o(f) = o(xⁿ) as x → +∞.

(i) Summing the two functions we obtain g + h = e^{−x} − 2x^{n−1}, which is still o(xⁿ) as x → +∞, in accordance with Proposition 467-(i).

(iii) Set c = −3 and consider c·g = −3x^{n−1}. It is easy to check that −3x^{n−1} is still o(xⁿ) as x → +∞, in accordance with Proposition 467-(iii).

(iv) Consider the function l(x) = x + 1. It is easy to check that l = o(g) = o(x^{n−1}) as x → +∞. Consider now the sum l + h, which is a sum of an o(g) and of an o(f), with g = o(f). We have l + h = x + 1 + e^{−x} − 3x^{n−1}, which is o(xⁿ) as x → +∞, i.e., o(f), in accordance with Proposition 467-(iv). Note that l + h is not o(g), even if l = o(g). N
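The little-o claims of the example can be checked numerically for a concrete exponent; a small sketch (the choices n = 5 and x = 10⁶ are ours):

```python
import math

# Little-o checks from the example, with n = 5 (any n > 2 works):
# g, h, and g + h are all o(x^n), so their ratios to x^n vanish for large x.
n = 5
f = lambda x: x ** n
g = lambda x: x ** (n - 1)
h = lambda x: math.exp(-x) - 3 * x ** (n - 1)

x = 1e6
print(g(x) / f(x), h(x) / f(x), (g(x) + h(x)) / f(x))  # all near 0
```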
The next proposition presents some classic instances of functions with different rates of divergence.
(i) x^k = o(α^x) as x → +∞ for every α > 1 and k > 0; indeed,

lim_{x→+∞} x^k/α^x = 0

(ii) x^h = o(x^k) as x → +∞ if h < k;

(iii) loga x = o(x^k) as x → +∞ for every a > 1 and k > 0; indeed,

lim_{x→+∞} loga x/x^k = 0

By the transitivity property of the negligibility relation, from (i) and (iii) it follows that

loga x = o(α^x) as x → +∞

Proof For each of the three functions α^x, x^k, and loga x one has f(n) ≤ f(x) ≤ f(n + 1), where n = [x] is the integer part of x: these functions are indeed increasing. It is then sufficient to use the sequential characterization of the limit of a function together with the comparison criterion.
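The growth ranking loga x ≪ x^k ≪ α^x can be observed numerically; a brief sketch (the parameter values a = 2, k = 3, α = 1.5 and the evaluation points are arbitrary):

```python
import math

# Growth ranking as x -> +infinity: log_a x << x^k << alpha^x.
# Both ratios below tend to 0 as x grows.
def ratios(x, a=2.0, k=3.0, alpha=1.5):
    return (math.log(x, a) / x ** k, x ** k / alpha ** x)

r10 = ratios(10.0)
r200 = ratios(200.0)
print(r10)
print(r200)  # both components much smaller than at x = 10
```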
That is, two functions asymptotic to one another as x → x0 have the same limit as x → x0. In particular, we have the following version for functions of Lemma 331.¹³

(ii) f(x)/h(x) ∼ g(x)/l(x) as x → x0, provided that h(x) ≠ 0 and l(x) ≠ 0 at every point x ≠ x0 of a neighborhood B_ε(x0).

¹³ Relative to that lemma, for brevity here we limit ourselves to products and quotients (which are, in any case, the more interesting cases).
Therefore,

lim_{x→x0} f(x) = L ⟺ lim_{x→x0} (f(x) + o(f(x))) = L

and

(f(x) + o(f(x)))/(g(x) + o(g(x))) ∼ f(x)/g(x) as x → x0   (11.36)
and let us set f(x) = x³ and g(x) = x². As x → +∞, we have 2x^{1/2} + 5x^{−3} = o(f) and 3 + 3x = o(g), so that, by (11.36), (x³ + 2x^{1/2} + 5x^{−3})/(x² + 3 + 3x) ∼ x³/x² = x → +∞. Similarly,

(x⁴ + x⁻⁸ + 3x⁻¹⁰)/(x² + 2x⁻⁴ + e⁻ˣ) ∼ x⁴/x² = x² → +∞ as x → +∞
(iii) Consider the limit

lim_{x→0} (1 − cos x)/(sin²x + x³)
11.7.3 Terminology
Here too, for the comparison of two functions that both either converge to 0 or diverge to ∞, there is a specific terminology. In particular,

(iii) if two functions f and g are infinitesimal at x0 and such that f = o(g) as x → x0, then f is said to be infinitesimal of higher order at x0 with respect to g;

(iv) if two functions f and g are infinite at x0 and such that f = o(g) as x → x0, then f is said to be infinite of lower order with respect to g.

A function is, therefore, infinitesimal of higher order than another one if it tends to zero faster, while it is infinite of lower order if it tends to infinity slower.
(ii) x^k = o(α^x) for every α > 1 and k > 0, as already proved with the ratio criterion. If instead 0 < α < 1 and k > 0, then α^x = o(x^k).

(iii) If k1 > k2 > 0, then x^{k2} = o(x^{k1}); indeed, x^{k2}/x^{k1} = x^{k2−k1} → 0.

The previous results can be organized in scales of infinities and infinitesimals, in analogy with what we saw for sequences. For brevity we omit the details.
Chapter 12
Continuous functions
Ibis redibis, non morieris in bello (you will go, you will return, you will not die in war). So the oracle muttered to the inquiring king, who had to decide whether to go to war. Or, maybe, the oracle actually said: ibis redibis non, morieris in bello (you will go, you will not return, you will die in war). A small change in a comma, a dramatic difference in meaning.

When small changes have large effects, instability may result: a small change may, suddenly, dramatically alter matters. In contrast, stability prevails when small changes can only have small effects, so that nothing dramatic can happen because of small alterations. Continuity is the mathematical translation of this general principle of stability for the relations between dependent and independent variables that functions represent.
12.1 Generalities
Intuitively, a scalar function is continuous when the relation between the independent variable x and the dependent variable y is "regular", without breaks: the graph of a continuous function can be drawn without ever lifting the pencil.

This means that a function is continuous at a point x0 of the domain if the behavior of the function towards x0 is consistent with the value f(x0) that it actually assumes at x0, that is, if the limit lim_{x→x0} f(x) is equal to the image f(x0).
f(x) = √x for x ≥ 0;  f(x) = 1 for x = −1

Here x0 = −1 is an isolated point of the domain. Hence, we can (conveniently) say that f is continuous at every point of its domain.
[Figure: graph of f, the branch √x for x ≥ 0 together with its isolated point at x = −1.]
f(x) = x for x < 1;  f(x) = 2 for x = 1;  f(x) = 1 for x > 1   (12.2)
[Figure: graph of the function (12.2), with the value f(1) = 2 detached from the rest of the graph.]
The function f is thus not continuous at the point x0 = 1 because there is no consistency between the behavior at the limit and the value at x0. On the other hand, f is continuous at all the other points of its domain: indeed, it is immediate to verify that lim_{x→x0} f(x) = f(x0) for every x0 ≠ 1, so f does not exhibit other jumps besides the one at x0 = 1.
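The mismatch at x0 = 1 is easy to see numerically; a quick sketch (the offsets ±10⁻⁹ are arbitrary):

```python
# The function (12.2): the two-sided limit at 1 equals 1, but f(1) = 2.
def f(x):
    if x < 1:
        return x
    if x == 1:
        return 2
    return 1

print(f(1 - 1e-9), f(1), f(1 + 1e-9))  # values near 1 surround the value 2
```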
The distinction between limit points and isolated points becomes superfluous for the important case of functions f : I → R defined on an interval I of the real line. Indeed, the points of any such interval (be it bounded or unbounded; closed, open, or half-closed) are always limit points, so that f is continuous at any x0 ∈ I if lim_{x→x0} f(x) = f(x0). For example, f : (a, b) → R is continuous at x0 ∈ (a, b) if lim_{x→x0} f(x) = f(x0).
Proof The result follows immediately from Proposition 451 once we observe that, when x0 is an isolated point of A, the unique sequence contained in A that tends to x0 is the constant one, i.e., {x0, x0, ...}.²

Let us give some examples. We start by observing that the elementary functions are continuous.

² The condition xn ≠ x0 of Proposition 451 does not appear here because x0 belongs to A.
Example 476 (i) Let f : R₊₊ → R be given by f(x) = log x. Since lim_{x→x0} log x = log x0 for every x0 > 0, the function is continuous.

(ii) Let f : R → R be given by f(x) = a^x, with a > 0. Since lim_{x→x0} a^x = a^{x0} for every x0 ∈ R, the function is continuous.

(iii) Let f, g : R → R be given by f(x) = sin x and g(x) = cos x. Since lim_{x→x0} sin x = sin x0 and lim_{x→x0} cos x = cos x0, both functions are continuous. N
Example 479 The Dirichlet function is not continuous at any point of its domain: lim_{x→x0} f(x) does not exist for any x0 ∈ ℝ (Example 431). N
For which values of b is f continuous at x0 = 2 (so, on its domain)? To answer this question, it is necessary to find the value of b such that
so that f and lim become interchangeable. Such interchangeability is the essence of the concept of continuity.
O.R. Naively, we could claim that a function such as f(x) = 1/x has a (huge) discontinuity at x = 0. After all, it makes a "big jump" by passing from −∞ to +∞.
In contrast, the function g(x) = log x does not suffer from any such problem, so it seems "more continuous":
If we pay close attention to these two functions, however, we realize that 1/x commits the little sin of not being defined for x = 0 (an "original" sin), while log x commits the much more serious sin of being defined neither at x = 0 nor at any x < 0.
The truth is that, at the points at which a function is not defined, it is meaningless to wonder about its continuity,³ a property that can only be considered at points where the function is defined. At such points, the functions 1/x and log x are both continuous. H
12.2 Discontinuity

As the examples just seen indicate, for functions of a single variable there are different types of discontinuity:⁴
(i) f is not continuous at x0 because lim_{x→x0} f(x) exists and is finite, but it is different from f(x0);

(ii) f is not continuous at x0 because the one-sided limits lim_{x→x0⁻} f(x) and lim_{x→x0⁺} f(x) exist and are finite, but they are different, i.e., lim_{x→x0⁻} f(x) ≠ lim_{x→x0⁺} f(x) (so, lim_{x→x0} f(x) does not exist);

(iii) f is not continuous at x0 because at least one of the one-sided limits lim_{x→x0⁻} f(x) and lim_{x→x0⁺} f(x) is either ±∞ or does not exist.
For example, the discontinuity at x0 = 1 of the function (12.2) is of type (i) because lim_{x→1} f(x) exists, but it is different from f(1). The discontinuity at x0 = 1 of the function (12.5) is of type (ii) because

On the contrary, the discontinuity at x0 = 0 of the function (12.3) is of type (iii) because

lim_{x→0⁻} f(x) = −∞ ≠ lim_{x→0⁺} f(x) = +∞

In the same way, the discontinuity at x0 = 0 of the function (12.4) is of type (iii) because

(the two-sided limit here exists, but it is infinite). The discontinuity at each point x0 ∈ ℝ of the Dirichlet function is also of type (iii) because it is easy to see that its one-sided limits do not exist.
Non-removable discontinuity is, definitely, a more severe form of discontinuity than the removable one (as the terminology suggests). Indeed, the latter can be "fixed" by modifying the function f at x0 in the following way:

f̃(x) = { f(x)              if x ≠ x0
       { lim_{x→x0} f(x)   if x = x0        (12.7)
The function f̃ is the "fixed" version of the function f that restores the continuity at x0. For example, the fixed version of the function (12.2) is

f̃(x) = { f(x)             if x ≠ 1        { x   if x ≤ 1
       { lim_{x→1} f(x)   if x = 1   =    { 1   if x > 1
As the reader can easily verify, such fixing is no longer possible for non-removable discontinuities, which represent substantial discontinuities of a function.
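The "fixing" in (12.7) can be made concrete numerically. The sketch below (an illustration added here, not part of the text) takes the jump function of (12.2), estimates the limit at x0 = 1 by sampling on both sides, and uses it to build the fixed version f̃; the step size h and the tolerance are arbitrary choices.

```python
def f(x):
    # The function of (12.2): identity below the jump point x0 = 1,
    # value 2 at it, constant 1 above it.
    if x < 1:
        return float(x)
    elif x == 1:
        return 2.0
    else:
        return 1.0

def numeric_limit(g, x0, h=1e-8, tol=1e-6):
    """Estimate lim_{x -> x0} g(x) by sampling on both sides of x0;
    return None when the one-sided estimates disagree (no limit)."""
    left, right = g(x0 - h), g(x0 + h)
    return 0.5 * (left + right) if abs(left - right) < tol else None

def f_tilde(x, x0=1):
    # The "fixed" version (12.7): replace the value at x0 by the limit.
    return numeric_limit(f, x0) if x == x0 else f(x)
```

Here f(1) = 2 while f̃(1) ≈ 1, which is exactly the modification that restores continuity at the removable discontinuity; for a non-removable jump, `numeric_limit` would return `None` and no such fix would exist.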
for each pair of points x0 < y0 of the domain of f. Therefore, these limits cannot be infinite, which excludes discontinuities of type (iii).
Moreover, f cannot even have removable discontinuities because they would violate monotonicity. Therefore, a monotonic function can only have jump discontinuities. Indeed, the next result shows that a monotonic function can have at most countably many jump discontinuities. The proof of this useful result is based on the following lemma, which is of independent interest.
Proof Let {Ij}_{j∈J} be a set of disjoint intervals of ℝ. By the density of the rational numbers, each interval Ij contains a rational number qj. Since the intervals are disjoint, qj ≠ qj′ for j ≠ j′. Then the set of rational numbers {qj}_{j∈J} is a proper subset of ℚ and is, therefore, at most countable. In turn, this implies that the index set J is, at most, countable.
The disjointness hypothesis cannot be removed: for instance, the set of overlapping intervals {(−r, r) : r > 0} is clearly uncountable.
Proposition 483 A monotonic function can have at most countably many jump discontinuities.

Proof A jump discontinuity of the function f at the point x0 determines a bounded interval with endpoints lim_{x→x0⁻} f(x) and lim_{x→x0⁺} f(x). By the monotonicity of f, the intervals determined by the jumps are disjoint. By Lemma 482, the intervals, and therefore the jumps of f, are at most countable.
In the proof the monotonicity hypothesis is key for having countably many discontinuities: it guarantees that the intervals defined by the jumps of the function do not overlap.
Proof We prove (i), leaving the other points to the reader. Since lim_{x→x0} f(x) = f(x0) ∈ ℝ and lim_{x→x0} g(x) = g(x0) ∈ ℝ, Proposition 459-(i) yields

lim_{x→x0} (f + g)(x) = lim_{x→x0} f(x) + lim_{x→x0} g(x) = f(x0) + g(x0) = (f + g)(x0)

Therefore, f + g is continuous at x0.
Proof Let {xn} ⊆ A be such that xn → x0. By Proposition 475, f(xn) → f(x0). Since g is continuous at f(x0), another application of Proposition 475 shows that g(f(xn)) → g(f(x0)). Therefore, g ∘ f is continuous at x0.

As the next example shows, the result can also be useful in the computation of limits since, when its hypotheses hold, we can write

lim_{x→x0} (g ∘ f)(x) = g (lim_{x→x0} f(x)) = g(f(x0))        (12.8)

If a limit involves a composition of continuous functions, (12.8) makes its computation immediate.
Consider, for instance, the limit

lim_{x→π} sin (x²/(x + π))

Indeed, once we observe that it can be written in terms of g ∘ f, with f(x) = x²/(x + π) and g(y) = sin y, then by (12.8) we have

lim_{x→π} sin (x²/(x + π)) = lim_{x→π} (g ∘ f)(x) = (g ∘ f)(π) = sin (π²/(2π)) = sin (π/2) = 1
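A quick numerical sanity check (an addition, not from the text): sampling g ∘ f near π, with f(x) = x²/(x + π) and g = sin as read above, should agree with the value (g ∘ f)(π) = sin(π/2) = 1 that continuity predicts.

```python
import math

f = lambda x: x**2 / (x + math.pi)   # inner function f
g = math.sin                          # outer (continuous) function g

# Continuity of g at f(pi) lets the limit pass inside:
# lim_{x -> pi} (g o f)(x) = g(f(pi)) = sin(pi/2) = 1.
exact = g(f(math.pi))
approx = g(f(math.pi - 1e-7))         # sample just to the left of pi
```

The sampled value agrees with the exact one to high accuracy, illustrating the exchangeability of f and lim for continuous functions.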
12.4.1 Zeros

The first result, Bolzano's Theorem,⁵ is very intuitive. Yet its proof, although simple, is not trivial, showing how statements that are intuitive might be difficult to prove. Intuition is a fundamental guide in the search for new results, but it may be misleading. Sometimes, properties that appeared to be intuitively true turned out to be false.⁶ For this reason, the proof is the unique way of establishing the validity of a result; intuition, even the most refined one, must at a certain point give way to the rigor of the mathematical argument.
Note that the condition f(a) f(b) ≤ 0 is equivalent to asking that the two values do not have the same sign. The clear intuitive meaning of this theorem is revealed by the next figure:
Proof If f(a) f(b) = 0, either f(a) = 0 or f(b) = 0. In the first case, the result holds by setting c = a; in the second case, by setting c = b. If instead f(a) f(b) < 0, then we have either f(a) < 0 < f(b) or f(b) < 0 < f(a). Let us study the case f(a) < 0 < f(b) (the case f(b) < 0 < f(a) is analogous). Denote by C the set of values of x ∈ [a, b] such that f(x) < 0 and let c = sup C. By Proposition 120, recall that: (i) c ≥ x for all x ∈ C, and (ii) for each ε > 0 there exists x′ ∈ C such that x′ > c − ε.
We next prove that f(c) = 0. By contradiction, assume that f(c) ≠ 0, that is, either f(c) < 0 or f(c) > 0. If f(c) < 0, by the Theorem on the permanence of sign there exists a neighborhood (c − δ, c + δ) such that f(x) < 0 for all x ∈ (c − δ, c + δ). By the definition of C, this implies that c + δ/2 ∈ C, yielding that c cannot be the supremum, a contradiction. Conversely, if f(c) > 0, again by the Theorem on the permanence of sign there exists a neighborhood (c − δ, c + δ) of c such that f(x) > 0 for all x ∈ (c − δ, c + δ). By property (ii) of the supremum, there exists x′ ∈ C with x′ > c − δ; but then f(x′) < 0, contradicting f(x) > 0 on (c − δ, c + δ). We conclude that f(c) = 0.
⁵ The result is named after Bernard Bolzano, who gave a first proof in 1817.
⁶ Recall Guidi's crescendo in Section 10.3.2.
A simple application of the result concerns the real solutions of a polynomial equation. Let f : ℝ → ℝ be the polynomial

f(x) = α0 + α1 x + α2 x² + ··· + αn xⁿ        (12.9)

and let us study the polynomial (or algebraic) equation f(x) = 0. The equation does not always have real solutions: for example, this is the case for the equation f(x) = 0 with f(x) = x² + 1. Thanks to Bolzano's Theorem, we have the following result, which guarantees that each polynomial equation of odd degree always has at least one real solution.
Corollary 488 If the degree of the polynomial f in (12.9) is odd, there exists at least one x̂ ∈ ℝ such that f(x̂) = 0.
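The proof idea behind Bolzano's Theorem also suggests an algorithm, the bisection method: bracket a sign change and repeatedly halve the bracket. A sketch (the cubic x³ + x − 1 is our own choice of an odd-degree polynomial, not from the text):

```python
def bisect(f, lo, hi, tol=1e-12):
    """Bolzano-style bisection: f is continuous with f(lo), f(hi) of
    opposite sign; halve the bracket until it is shorter than tol."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if flo * f(mid) <= 0:
            hi = mid                # sign change in [lo, mid]
        else:
            lo, flo = mid, f(mid)   # sign change in [mid, hi]
    return (lo + hi) / 2

# An odd-degree polynomial: for large |x| the leading term dominates,
# so a sign change (and hence a real root) is guaranteed.
p = lambda x: x**3 + x - 1
root = bisect(p, -10.0, 10.0)
```

The bracket [−10, 10] is wide enough here; in general, for an odd-degree polynomial one can always enlarge the bracket until the endpoint values have opposite signs.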
O.R. In presenting Bolzano's Theorem, we remarked on the limits of intuition. A nice example in this regard is the following. Imagine you put a rope around the Earth at the equator (about 40,000 km) such that it perfectly adheres to the equator at each point. Now, imagine that you add one meter to the rope and you lift it by keeping its distance from the ground uniform. What is the measure of this uniform distance? We are all tempted to say "very, very small: one meter out of forty thousand km is nothing!" Instead, no: the distance is 16 cm. Indeed, if c denotes the equatorial Earth circumference (in meters), the Earth radius is r = c/2π; if we add one meter, the new radius is r′ = (c + 1)/2π and the difference between the two is r′ − r = 1/2π ≈ 0.1592. This proves another remarkable result: the distance of about 16 centimeters is independent of c: no matter whether it is the Earth, or the Sun, or a tennis ball, the addition of one meter to the length of the rope always causes a lift of 16 cm! As the manifesto of the Vienna Circle remarked, "Intuition ... is especially emphasized by metaphysicians as a source of knowledge.... However, rational justification has to pursue all intuitive knowledge step by step. The seeker is allowed any method; but what has been found must stand up to testing." H
12.4.2 Equilibria

The next result is a further consequence of Bolzano's Theorem, with a remarkable economic application: the existence and the uniqueness of the market equilibrium price.

Proposition 489 Let f, g : [a, b] → ℝ be continuous. If f(a) ≥ g(a) and f(b) ≤ g(b), there exists c ∈ [a, b] such that

f(c) = g(c)

If, in addition, f is strictly decreasing and g is strictly increasing, such c is unique.

Proof Let h : [a, b] → ℝ be defined by h(x) = f(x) − g(x). Then h(a) = f(a) − g(a) ≥ 0 and h(b) = f(b) − g(b) ≤ 0. Since h is continuous, by Bolzano's Theorem there exists c ∈ [a, b] such that h(c) = 0, that is, f(c) = g(c).
If f is strictly decreasing and g is strictly increasing, then h is strictly decreasing. Therefore, again by Bolzano's Theorem, c is unique.
We now apply the result to establish the existence and uniqueness of the market equilibrium price. Let D : [a, b] → ℝ and S : [a, b] → ℝ be the demand and supply functions of some good, where [a, b] ⊆ ℝ₊ is the set of the prices at which the good can be traded (see Section 8.4). A pair (p, q) ∈ [a, b] × ℝ₊ of prices and quantities is called a market equilibrium if

q = D(p) = S(p)

A fundamental problem is the existence, and the possible uniqueness, of such an equilibrium. By Proposition 489, so ultimately by Bolzano's Theorem, we can solve the problem in a very general way. Let us assume that S(a) ≤ D(a) and S(b) ≥ D(b). That is, at the smallest possible price, a, the demand for the good is greater than its supply, while the opposite is true at the highest possible price b. These hypotheses are natural. By Proposition 489, they guarantee the existence of an equilibrium price p ∈ [a, b], i.e., such that D(p) = S(p). The equilibrium quantity is q = D(p) = S(p). Therefore, the pair of prices and quantities (p, q) is a market equilibrium.
Moreover, again by Proposition 489, the market has a unique market equilibrium (p, q) if we assume that the demand function D is strictly decreasing (i.e., at greater prices, smaller quantities are demanded) and that the supply function S is strictly increasing (i.e., at greater prices, greater quantities are offered).
Proposition 490 Let D : [a, b] → ℝ and S : [a, b] → ℝ be continuous and such that D(a) ≥ S(a) and D(b) ≤ S(b). Then there exists a market equilibrium (p, q) ∈ [a, b] × ℝ₊. If, in addition, D is strictly decreasing and S is strictly increasing, such equilibrium is unique.

The next figure illustrates the result graphically; it corresponds to the classic "intersection" of demand and supply:
In equilibrium analysis, Bolzano's Theorem is often applied through the excess demand function E : [a, b] → ℝ defined by

E(p) = D(p) − S(p)

We have E(p) ≥ 0 when at the price p the demand exceeds the supply; otherwise, we have E(p) ≤ 0. Therefore, p ∈ [a, b] is an equilibrium price if and only if E(p) = 0, i.e., if and only if p equalizes demand and supply. The equilibrium price p is a zero of the excess demand function; the conditions on the functions D and S assumed in Proposition 490 guarantee the existence and uniqueness of such a zero.
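Combined with strict monotonicity, Bolzano's Theorem also yields a simple computational scheme for the equilibrium price: bisect on the sign of the excess demand. A sketch (with a hypothetical linear demand D(p) = 10 − p and supply S(p) = 2p, so the zero of E is p = 10/3; these functions are our own illustration, not from the text):

```python
def excess_demand(p):
    """E(p) = D(p) - S(p) with hypothetical linear demand
    D(p) = 10 - p and supply S(p) = 2p."""
    return (10 - p) - 2 * p

def equilibrium_price(a=0.0, b=10.0, tol=1e-10):
    """E(a) >= 0 >= E(b), with E continuous and strictly decreasing,
    so Bolzano's Theorem yields a unique zero; locate it by bisection."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess_demand(mid) >= 0:
            lo = mid    # still excess demand: the price must rise
        else:
            hi = mid    # excess supply: the price must fall
    return (lo + hi) / 2
```

The comments mirror the economic reading: excess demand pushes the candidate price up, excess supply pushes it down, and strict monotonicity of E guarantees the procedure closes in on the unique equilibrium.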
A final observation: the reader can easily verify that Proposition 489 holds as long as (i) the monotonicity of f and g are opposite: one is increasing and the other decreasing, and (ii) at least one of them is strict. In the statement we assumed f to be strictly decreasing and g to be strictly increasing both for simplicity and in view of the application to market equilibrium.
12.5 Weierstrass' Theorem: a preview

Theorem 491 A continuous function f : [a, b] → ℝ has (at least one) minimizer and (at least one) maximizer in [a, b], that is, there exist x1, x2 ∈ [a, b] such that

f(x1) ≤ f(x) ≤ f(x2)   for every x ∈ [a, b]
(ii) Let f : (0, 1) → ℝ be given by f(x) = x. Here f is continuous, but the interval (0, 1) is not compact (it is open). In this case, too, the function has neither a maximizer nor a minimizer.
(iii) Let f : [0, +∞) → ℝ be given by f(x) = x. The function f is continuous, but the interval [0, +∞) is not compact (it is closed but not bounded). The function does not have a maximizer (it has only the minimizer 0).
The function f is continuous (and bounded), but ℝ is not compact (it is closed but not bounded). The function has neither a maximizer nor a minimizer. N
12.6 Intermediate value theorem

Lemma 493 Let f : [a, b] → ℝ be continuous. If z is such that

f(a) ≤ z ≤ f(b)

then there exists a ≤ c ≤ b such that f(c) = z. If f is strictly increasing, such c is unique.
Proof If f(a) = f(b), it is sufficient to set c = a or c = b. Let f(a) < f(b) and let g : [a, b] → ℝ be defined by g(x) = f(x) − z. We have

g(a) = f(a) − z ≤ 0 ≤ f(b) − z = g(b)

Since g is continuous (because f is), by Bolzano's Theorem there exists c ∈ [a, b] such that g(c) = 0, that is, f(c) = z.
The function g is strictly monotonic if and only if f is so. Therefore, by Bolzano's Theorem such c is unique whenever f is strictly monotonic.
The function assumes, therefore, all the values between f(a) and f(b), without any "breaks". The lemma formalizes the intuition, given at the beginning of the chapter, that the graph of a continuous function can be drawn without ever lifting the pencil.
The case f(a) ≥ f(b) is analogous. We can thus say, in general, that for any z such that

min {f(a), f(b)} ≤ z ≤ max {f(a), f(b)}

there exists c ∈ [a, b] such that f(c) = z. If f is strictly monotonic, such c is unique.
Together with Weierstrass' Theorem, Lemma 493 implies the following classic result.
The continuity of f on [a, b] is crucial for Lemma 493 (and therefore for the Intermediate Value Theorem). To see this, consider, for example, the so-called signum function sgn : ℝ → ℝ defined by

sgn x = {  1   if x > 0
        {  0   if x = 0
        { −1   if x < 0

Its restriction sgn : [−1, 1] → ℝ to the interval [−1, 1] is continuous at all the points of this interval except the origin 0, at which it has a non-removable jump discontinuity. So, the continuity hypothesis of Lemma 493 does not hold. The image of sgn consists of only three points {−1, 0, 1}. Thus, for every z ∈ [−1, 1] with z ≠ −1, 0, 1, there is no x ∈ [−1, 1] such that sgn x = z.
Proof The "if" follows from Proposition 207. As for the converse, assume that f is injective. Suppose, by contradiction, that f is not strictly monotone. Then, there exist x < z < y such that either f(z) > max {f(x), f(y)} or f(z) < min {f(x), f(y)}. Suppose that f(z) > max {f(x), f(y)}, the other case being handled similarly. Let f(z) > k > max {f(x), f(y)}. By the Intermediate Value Theorem, there exist t′_k ∈ [x, z] and t″_k ∈ [z, y] such that f(t′_k) = f(t″_k) = k, thus contradicting the injectivity of f. We conclude that f is strictly monotone.
Without continuity the "only if" fails: consider the discontinuous function f : ℝ → ℝ given by

f(x) = {  x   if x ∈ ℚ
       { −x   otherwise

It is not strictly monotone: if x = 3, z = π and y = 4, we have x < z < y and f(z) < min {f(x), f(y)}. Yet, f is injective. Indeed, let x ≠ y. Clearly, f(x) ≠ f(y) if either x, y ∈ ℚ or x, y ∉ ℚ. If x ∈ ℚ and y ∉ ℚ, then f(x) = x ∈ ℚ and f(y) = −y ∉ ℚ, and so f(x) ≠ f(y). We conclude that f is injective.
12.7 Limits and continuity of operators

y1 = f1(x1, ..., xn), ..., ym = fm(x1, ..., xn)

The functions fi are the component functions of the operator f. For example, let us go back to the operators of Example 179.

Example 496 (i) If f : ℝ² → ℝ² is defined by f(x1, x2) = (x1, x1 x2), then

f1(x1, x2) = x1
f2(x1, x2) = x1 x2

(ii) If f : ℝ³ → ℝ² is defined by

f(x1, x2, x3) = (2x1² + x2 + x3, x1 x2⁴)

then

f1(x1, x2, x3) = 2x1² + x2 + x3
f2(x1, x2, x3) = x1 x2⁴

N
if, for every neighborhood V_ε(L) of L, there exists a neighborhood U_{δ_ε}(x0) of x0 such that

x0 ≠ x ∈ U_{δ_ε}(x0) ∩ A ⟹ f(x) ∈ V_ε(L)

The value L is called the limit of the operator f at x0.
For m = 1 we find again Definition 448 of the limit of functions of several variables. Note that here L is a vector of ℝᵐ.⁷
Here, too, an operator that is continuous at all the points of a subset E of the domain A is called continuous on E, while an operator that is continuous at all the points of its domain is called continuous. It is easy to see that the two operators of the last example are continuous.
The continuity of an operator is thus brought back to the continuity of its component functions, a componentwise notion of continuity.
In Section 8.15 we saw that the convergence of vectors is equivalent to that of their components. This will allow (the reader) to prove the next sequential characterization of continuity, which extends Proposition 475 to operators.
The statement is formally identical to that of Proposition 475, but here f(xn) → f(x0) indicates convergence of vectors in ℝᵐ.
Proposition 500 makes it possible to extend to operators the continuity results established for functions of several variables, except the ones that use in an essential way the order structure of their codomain ℝ (e.g., Bolzano's and Weierstrass' Theorems). We leave such extensions to the reader.
⁷ For simplicity, we do not consider possible "extended values", that is, a vector L with one or more coordinates that are ±∞.
12.8 Equations, fixed points, and market equilibria

A main issue in dealing with equations is the existence of solutions, that is, whether there exist vectors x ∈ A such that f(x) = 0. As is well known from (at least) high school, this might well not be the case: consider f : ℝ → ℝ given by f(x) = x² + 1; there is no x ∈ ℝ such that x² + 1 = 0.
Bolzano's Theorem is a powerful result for establishing the existence of solutions in the scalar case. Indeed, if f : A ⊆ ℝ → ℝ is a continuous function, then the equation

f(x) = 0        (12.13)

has a solution provided there exist x′, x″ ∈ A such that f(x′) < 0 < f(x″). For instance, in this way Corollary 488 was able to establish the existence of solutions of some polynomial equations.
Bolzano's Theorem admits a generalization to ℝⁿ that, surprisingly, turns out to be a quite difficult result, known as the Poincaré-Miranda Theorem.⁹ A piece of notation: given a vector x ∈ ℝⁿ, we write (xᵢ, x₋ᵢ) to emphasize the component i of vector x. For instance, if x = (4, 7, 11) then x₁ = 4 and x₋₁ = (7, 11), while x₃ = 11 and x₋₃ = (4, 7).
⁸ Often (12.11) is referred to as a "system of equations", each fi(x) = 0 being an equation. We will also use this terminology when dealing with systems of linear equations (Section 13.7). In view of (12.10), however, one should use this terminology cum grano salis.
⁹ It was stated in 1883 by Henri Poincaré and proved by Carlo Miranda in 1940. For a proof, we refer interested readers to Kulpa (1997).
12.8. EQUATIONS, FIXED POINTS, AND MARKET EQUILIBRIA 359
Under this condition, the Poincaré-Miranda Theorem ensures that, for a continuous operator f = (f1, f2) : [a, b] → ℝ², there exists a point x ∈ [a, b] such that f1(x) = f2(x) = 0. In general, if there exist vectors x′, x″ ∈ A such that condition (12.14) holds on the interval [x′, x″] ⊆ A, then the equation (12.10) induced by a continuous function f : A ⊆ ℝⁿ → ℝⁿ has a solution.
Example 502 Define f : ℝ² → ℝ² by f(x1, x2) = (x1⁵ + x2², e^(−x1²) + x2³). Consider the equation

{ x1⁵ + x2² = 0
{ e^(−x1²) + x2³ = 0

We have lim_{x1→±∞} f1(x1, x2) = ±∞ for each x2 ∈ ℝ, as well as lim_{x2→±∞} f2(x1, x2) = ±∞ for each x1 ∈ ℝ. So, there exists an interval [x′, x″] in the plane on which condition (12.15) is satisfied in the form

By the Poincaré-Miranda Theorem, the equation has a solution x ∈ [x′, x″], with f1(x) = f2(x) = 0. N
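The sign conditions of the Poincaré-Miranda Theorem can be checked numerically. The sketch below is our own illustration: reading f as f(x1, x2) = (x1⁵ + x2², e^(−x1²) + x2³), it takes the hypothetical box [−2, 0] × [−2, 0], verifies that f1 changes sign across the x1-faces and f2 across the x2-faces, and then brute-forces an approximate zero on a grid.

```python
import math

f1 = lambda x1, x2: x1**5 + x2**2               # first component
f2 = lambda x1, x2: math.exp(-x1**2) + x2**3    # second component

# Poincare-Miranda sign conditions on the box [-2, 0] x [-2, 0]:
# f1 <= 0 on the face x1 = -2 and f1 >= 0 on the face x1 = 0;
# f2 <= 0 on the face x2 = -2 and f2 >= 0 on the face x2 = 0.
face = [-2 + 0.1 * k for k in range(21)]
ok1 = all(f1(-2, t) <= 0 <= f1(0, t) for t in face)
ok2 = all(f2(t, -2) <= 0 <= f2(t, 0) for t in face)

# A coarse grid search for an approximate zero inside the box.
pts = [-2 + 0.01 * k for k in range(201)]
best = min(((u, v) for u in pts for v in pts),
           key=lambda q: abs(f1(*q)) + abs(f2(*q)))
residual = abs(f1(*best)) + abs(f2(*best))
```

The small residual at the best grid point is consistent with the existence of a zero that the theorem guarantees; the grid search is of course only a crude substitute for a proper solver.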
Proposition 503 Let f = (f1, ..., fn), g = (g1, ..., gn) : [a, b] → ℝⁿ be continuous operators defined on an interval of ℝⁿ. If, for each i = 1, ..., n, we have

Proof Let h : [a, b] → ℝⁿ be defined by h(x) = f(x) − g(x). Then, for each i = 1, ..., n, we have

for each x ∈ [a, b]. Since h is continuous, by the Poincaré-Miranda Theorem there exists c ∈ [a, b] such that h(c) = 0, that is, f(c) = g(c).
Through this result we can generalize the equilibrium analysis that we carried out earlier in the chapter for the market of a single good (Proposition 490). Consider now a market where bundles x ∈ ℝⁿ₊ of n goods are traded. Let D : [a, b] → ℝⁿ₊ and S : [a, b] → ℝⁿ₊ be, respectively, the aggregate demand and supply functions of such bundles, that is, at price p ∈ [a, b] ⊆ ℝⁿ₊ the market demands a quantity Di(p) ≥ 0 and offers a quantity Si(p) ≥ 0 of each good i = 1, ..., n.
A pair (p, q) ∈ [a, b] × ℝⁿ₊ of prices and quantities is a market equilibrium if

q = D(p) = S(p)

The last result makes it possible to establish the existence of such an equilibrium, thus generalizing Proposition 490 to the general case of n goods. In particular, existence requires that, for each good i, we have

Di(aᵢ, p₋ᵢ) ≥ Si(aᵢ, p₋ᵢ)   and   Di(bᵢ, p₋ᵢ) ≤ Si(bᵢ, p₋ᵢ)

That is, at its smallest possible price, aᵢ, the demand for good i is greater than its supply regardless of the prices of the other goods, while the opposite is true at its highest possible price bᵢ. To fix ideas, assume that a = 0. Then, the condition Di(0, p₋ᵢ) ≥ Si(0, p₋ᵢ) just means that demand for a free good will always exceed its supply, regardless of the prices of the other goods (a reasonable assumption). In contrast, the opposite happens at the highest price bᵢ, at which the supply of good i exceeds its demand regardless of the prices of the other goods (a reasonable assumption as long as bᵢ is "high enough").
Via the excess demand function E : [a, b] → ℝⁿ defined by

E(p) = D(p) − S(p)

the equilibrium problem reduces to the equation

E(p) = 0        (12.17)

A pair (p, q) of prices and quantities is a market equilibrium if and only if the price p solves this equation and q = D(p). There is excess demand for good i at price p if Ei(p) ≥ 0 and excess supply if Ei(p) ≤ 0. In equilibrium, there is neither excess demand nor excess supply. Next we state the general existence result in excess demand terms.

Proposition 504 Let the excess demand function E : [a, b] → ℝⁿ be continuous and such that, for each good i = 1, ..., n,

Ei(bᵢ, p₋ᵢ) ≤ 0 ≤ Ei(aᵢ, p₋ᵢ)   for all p₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]

Then, a market equilibrium (p, q) ∈ [a, b] × ℝⁿ₊ exists.
Example 505 (i) All operators f : ℝⁿ → ℝⁿ are, trivially, self-maps. (ii) The function f : [0, 1] → ℝ given by f(x) = x² is a self-map because x² ∈ [0, 1] for all x ∈ [0, 1]. In contrast, the function f : [0, 1] → ℝ given by f(x) = x + 1 is not a self-map because, for instance, f(1) = 2 ∉ [0, 1]. N

Self-maps are important here because they may admit fixed points.
For instance, for the quadratic self-map f : [0, 1] → [0, 1] given by f(x) = x², the endpoints 0 and 1 are fixed points. For the self-map f : ℝ² → ℝ² given by f(x1, x2) = (x1, x1 x2), the origin is a fixed point in that f(0) = 0.
Turn now to the key question of the existence of fixed points. In the scalar case, it is an immediate consequence of Bolzano's Theorem.

Proof The result is obviously true if either f(0) = 0 or f(1) = 1. Suppose f(0) > 0 and f(1) < 1. Define the auxiliary function g : [0, 1] → ℝ by g(x) = x − f(x). Then, g(0) < 0 and g(1) > 0. Since g is continuous, by Bolzano's Theorem there exists x ∈ (0, 1) such that g(x) = 0. Hence, f(x) = x, and so x is a fixed point.
In the general case, the existence of fixed points is ensured by the famous Brouwer Fixed Point Theorem.¹¹ In analogy with the scalar case, it can be viewed as an immediate consequence of the Poincaré-Miranda Theorem.

Proof We prove the result in the special case K = [0, 1]ⁿ. Let I : [0, 1]ⁿ → [0, 1]ⁿ be the identity function I(x) = x. We have Ii(0ᵢ, x₋ᵢ) ≤ fi(0ᵢ, x₋ᵢ) and Ii(1ᵢ, x₋ᵢ) ≥ fi(1ᵢ, x₋ᵢ) for all x ∈ [0, 1]ⁿ, where 1 = (1, ..., 1). So, we can apply the Poincaré-Miranda Theorem to the function I − f, which ensures the existence of a vector x ∈ [0, 1]ⁿ such that (I − f)(x) = 0. Hence, f(x) = x.
Brouwer's Theorem is a powerful result that only requires the self-map to be continuous. However, it is demanding on the domain, which has to be a compact and convex set, and it is a non-constructive existence result: it ensures the existence of a fixed point, but gives no information on how to find it.¹²
A.2 D(λp) = D(p) and S(λp) = S(p) for each λ > 0: nominal changes in prices do not matter;

A.3 Di(p) > Si(p) for some i with pi > 0 implies Sj(p) > Dj(p) for some j: if some goods are in excess demand at a positive price, other ones must be in excess supply.
Define f : Δⁿ⁻¹ → Δⁿ⁻¹ by

f(p) = (p + E⁺(p)) / (1 + Σ_{i=1}^n Ei⁺(p))   for all p ∈ Δⁿ⁻¹

By A.1, the function is continuous (why?). By Brouwer's Fixed Point Theorem, there is some p ∈ Δⁿ⁻¹ such that f(p) = p, that is,

(p + E⁺(p)) / (1 + Σ_{i=1}^n Ei⁺(p)) = p

Hence, E⁺(p) = (Σ_{i=1}^n Ei⁺(p)) p. That is,

Ek⁺(p) = pk Σ_{i=1}^n Ei⁺(p)   for all k = 1, ..., n        (12.20)
We want to prove that E⁺(p) = 0. Suppose, by contradiction, that there exists a good k for which Ek⁺(p) = Ek(p) > 0. By (12.20), it follows that pk > 0. Hence, by A.3 there exists a good j for which Sj(p) > Dj(p). Hence, Ej⁺(p) = 0. Moreover, A.4 implies that its price is strictly positive, i.e., pj > 0. In view of (12.20) we can write

0 = Ej⁺(p) = pj Σ_{i=1}^n Ei⁺(p)

This yields Σ_{i=1}^n Ei⁺(p) = 0, which contradicts Ek⁺(p) > 0. We conclude that E⁺(p) = 0, so p is a weak equilibrium price.
A.5 Di(p) < Si(p) for some i with pi > 0 implies Sj(p) < Dj(p) for some j: if some goods are in excess supply at a positive price, other ones must be in excess demand.

This result shares conditions A.1 and A.4 with our earlier equilibrium existence result, Proposition 504 (the latter being, essentially, the condition Ei(aᵢ, x₋ᵢ) ≥ 0). Conditions A.2, A.3 and A.5 are, instead, new and replace the highest price condition Ei(bᵢ, x₋ᵢ) ≤ 0. In particular, condition A.2 will be given a compelling foundation in Section 18.8.
In Section 18.8 we will present a simple exchange economy that provides a foundation, in terms of individual behavior, for the aggregate market analysis of this section. In that section we will see that it is natural to expect the excess demand to satisfy the following property:

W.1 p · E(p) ≤ 0 for every p

This condition is a weak version of the (aggregate) Walras' law, which is:

W.2 p · E(p) = 0 for every p

As will be seen in Section 18.8, W.1 only requires agents to buy affordable bundles, while Walras' law requires them to exhaust their budgets, a reasonable but non-trivial assumption.
In any case, W.1 implies condition A.3, so in the existence Theorem 509 we can replace A.3 with a weak Walras' law that has a compelling economic foundation. The stronger condition W.2 implies both A.3 and A.5, so in the last result Walras' law can replace these two conditions. A bit more is actually true, so next we state and prove the version of the last two existence results that takes advantage of conditions W.1 and W.2. It is a simplified version of classical results proved by Kenneth Arrow and Gérard Debreu in the early 1950s.¹⁴

Theorem 511 (Arrow-Debreu) Under conditions A.1, A.2 and W.1, a weak market equilibrium exists. If, in addition, A.4 and W.2 hold, then a market equilibrium exists.
If φ is linear and A is the real line, by Riesz's Theorem there exists a vector a = (a1, ..., an) ∈ ℝⁿ such that φ(x) = a · x, so we get back to the linear recurrence (8.11). Solutions of this important class of recurrences have been studied in Section 10.5.4.
If k = 1, the function φ : A → A is a self-map that defines a recurrence of order 1 given by

{ x0 = α0
{ xn = φ(xn−1)   for n ≥ 1        (12.21)

with initial condition α0 ∈ ℝ. If the self-map φ : A → A is linear, it reduces to the geometric recurrence

{ x0 = α0
{ xn = a xn−1   for n ≥ 1        (12.22)
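Recurrences of order 1 can be iterated directly. A small sketch (with hypothetical values a = 0.5 and initial value 3.0, chosen for illustration) confirms that the geometric recurrence reproduces the familiar closed form xn = aⁿ x0:

```python
def orbit(phi, x0, n):
    """Return the first n + 1 terms x0, x1, ..., xn of the
    recurrence x_k = phi(x_{k-1})."""
    xs = [x0]
    for _ in range(n):
        xs.append(phi(xs[-1]))
    return xs

a, x0 = 0.5, 3.0                         # hypothetical parameters
xs = orbit(lambda x: a * x, x0, 10)      # geometric recurrence
closed = [a**n * x0 for n in range(11)]  # closed form a**n * x0
```

The same `orbit` helper works for any self-map φ, which is exactly the point of writing a recurrence of order 1 in the form (12.21).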
12.9.2 Asymptotics

From now on, we focus on the recurrence (12.21). We need some notation. Given any self-map φ : A → A, its second iterate φ ∘ φ : A → A is denoted by φ². More generally, φⁿ : A → A denotes the n-th iterate φⁿ = φⁿ⁻¹ ∘ φ, i.e.,

φⁿ(x) = (φⁿ⁻¹ ∘ φ)(x) = φⁿ⁻¹(φ(x))

We adopt the convention that φ⁰ is the identity map: φ⁰(x) = x for all x ∈ A.

¹⁵ Most of the analysis of this section continues to hold if A is a subset of ℝⁿ, as readers can check.
Example 513 (i) Consider the self-map φ : (0, 1) → (0, 1) defined by φ(x) = x/(1 + x). Then,

φ²(x) = φ(φ(x)) = (x/(1 + x)) / (1 + x/(1 + x)) = x / (1 + 2x)

φ³(x) = φ²(φ(x)) = (x/(1 + x)) / (1 + 2x/(1 + x)) = x / (1 + 3x)

as desired.
(ii) Consider the self-map φ : [0, +∞) → [0, +∞) defined by φ(x) = ax². Then,

φ²(x) = φ(φ(x)) = a(ax²)² = a³x⁴
φ³(x) = φ(φ²(x)) = a(a³x⁴)² = a⁷x⁸

This suggests the guess φⁿ(x) = a^(2^n − 1) x^(2^n). Let us verify this guess by induction. Initial step: the guess clearly holds for n = 1. Induction step: assume it holds for n. Then,

φⁿ⁺¹(x) = φ(φⁿ(x)) = a (a^(2^n − 1) x^(2^n))² = a · a^(2^(n+1) − 2) x^(2^(n+1)) = a^(2^(n+1) − 1) x^(2^(n+1))

as desired. N
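The closed-form iterates read off in this example, x/(1 + nx) for (i) and a^(2^n − 1) x^(2^n) for (ii), are easy to confirm by direct computation; a sketch (the value a = 0.8 is our own hypothetical choice):

```python
def iterate(phi, x, n):
    """Compute the n-th iterate phi^n(x) by repeated application."""
    for _ in range(n):
        x = phi(x)
    return x

phi1 = lambda x: x / (1 + x)   # example (i): phi^n(x) = x / (1 + n*x)
a = 0.8                        # hypothetical value of a for (ii)
phi2 = lambda x: a * x**2      # example (ii): phi^n(x) = a**(2**n - 1) * x**(2**n)
```

Comparing `iterate` against the closed forms for a few values of n is a cheap substitute for, and a good companion to, the induction argument above.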
We can represent the sequence {xn} defined via the recurrence (12.21) using the iterates φⁿ of the self-map φ : A → A. Indeed, we have

xn = φⁿ(x0)   for every n ≥ 0        (12.25)

A sequence of iterates {φⁿ(x0)} of points in A that starts from an initial point x0 of A is called the orbit of x0 under φ. The collection {{φⁿ(x0)} : x0 ∈ A} of all the orbits determined by the possible initial conditions is called the phase portrait of φ. In view of (12.25), the orbits that form the phase portrait of φ describe how the sequence defined by the recurrence (12.21) may evolve according to how it is initialized.

Example 514 (i) For the geometric recurrence the relation (12.25) takes the familiar form

xn = φⁿ(x0) = aⁿ x0   for all n ≥ 0
Orbits solve the recurrence (12.21) when they can be described in closed form, as is the case for the recurrences of the last example. Unfortunately, often this is not possible and so the main interest of (12.25) is theoretical; operationally, however, it may suggest a qualitative analysis of the recurrence. A main issue in this regard is the asymptotic behavior of orbits: where do they end up eventually? For instance, do they converge?
The next simple, yet important, result shows that fixed points play a key role in studying the convergence of orbits.
Theorem 515 Let φ : A → A be a continuous self-map and x0 a point of A. If the orbit {φⁿ(x0)} converges to x ∈ A, then x is a fixed point of φ.

Proof Assume that xn = φⁿ(x0) → x ∈ A. Since φ is continuous, we have φ(x) = lim φ(φⁿ(x0)). So,

φ(x) = lim φ(φⁿ(x0)) = lim φⁿ⁺¹(x0) = lim xn+1 = lim xn = lim φⁿ(x0) = x

where the equality lim xn+1 = lim xn holds because, as is easily checked, if xn → x then xn+k → x for every given k ≥ 1. We conclude that x is a fixed point, as desired.
So, a necessary condition for a point to be the limit of a sequence defined by a recurrence of order 1 is that it be a fixed point of the underlying self-map. If there are no fixed points, convergence is hopeless. If they exist (e.g., by Brouwer's Theorem), we have some hope. Yet, this is only a necessary condition: as will become clear later in the section, there are fixed points of $\varphi$ that are not limit points of the recurrence (12.21).
Fixed points thus provide the candidate limit points. We have the following procedure to study the limits of sequences defined by a recurrence (12.21):
1. Find the collection $\{x \in A : \varphi(x) = x\}$ of the fixed points of the self-map $\varphi$.
2. Check whether they are limits of the orbits $\{\varphi^n(x_0)\}$, that is, whether $\varphi^n(x_0) \to x$.
This procedure is especially effective when the fixed point is unique. Indeed, in this case there is a unique candidate limit point for all possible initial conditions $x_0 \in A$, so if orbits converge – e.g., if they form a monotonic sequence, so that Theorem 299 applies – then they have to converge to the fixed point. Remarkably, in this case iterations swamp the initial condition, which asymptotically plays no role in the behavior of the recursion. Regardless of how the recursion starts, it eventually behaves the same.
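The two-step procedure translates directly into a small program: iterate the self-map and check whether orbits approach the fixed point found in step 1. A minimal sketch in Python; the map $\varphi(x) = x/2 + 1$, with unique fixed point $x = 2$, is our own illustrative choice, not one from the text:

```python
# Qualitative study of a first-order recurrence x_{n+1} = phi(x_n):
# step 1 finds the fixed points; step 2 checks whether orbits reach them.

def orbit(phi, x0, n):
    """Return the first n+1 points of the orbit {phi^k(x0)}."""
    xs = [x0]
    for _ in range(n):
        xs.append(phi(xs[-1]))
    return xs

# Illustrative self-map (our own choice): phi(x) = x/2 + 1,
# a contraction on R whose unique fixed point solves x = x/2 + 1, i.e. x = 2.
phi = lambda x: x / 2 + 1

# Step 2: orbits from several initial conditions all approach the fixed point,
# so the initial condition is asymptotically immaterial.
for x0 in (-10.0, 0.0, 7.0):
    print(x0, "->", round(orbit(phi, x0, 60)[-1], 6))
```

Every printed orbit tail is (numerically) 2, illustrating how the iterations swamp the initial condition.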
In view of this discussion, the next result is especially interesting.¹⁶

¹⁶ Contractions are introduced in Section 16.1.
368 CHAPTER 12. CONTINUOUS FUNCTIONS
Proposition 516 If the self-map $\varphi: A \to A$ is a contraction, it has at most one fixed point.

Proof Suppose that $x_1, x_2 \in A$ are fixed points. Then, for some $k \in (0,1)$,
$$0 \leq |x_1 - x_2| = |\varphi(x_1) - \varphi(x_2)| \leq k|x_1 - x_2|$$
and so $|x_1 - x_2| = 0$. This implies $x_1 = x_2$, as desired.

So, recursions defined by self-maps that are contractions have at most a single candidate limit point. It is then enough to check whether it is actually a limit point.
Example 517 A continuously differentiable function $\varphi: [a,b] \to \mathbb{R}$ is a contraction if $0 < k = \max_{x \in [a,b]} |\varphi'(x)| < 1$ (cf. Example 727). Take the contraction self-map $\varphi: [0,1] \to [0,1]$ given by $\varphi(x) = x^2/4$. The unique fixed point is the origin $x = 0$. By (12.24), we have
$$\varphi^n(x_0) = \frac{1}{4^{2^n-1}}\, x_0^{2^n} \to 0 \qquad \forall x_0 \in [0,1]$$
So, the orbits converge to the fixed point for all initial conditions $x_0 \in [0,1]$. N
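As a quick numerical check of this example (a sketch of our own), one can compare the iterated map against the closed form $\varphi^n(x_0) = x_0^{2^n}/4^{2^n-1}$:

```python
# Example 517 numerically: phi(x) = x^2/4 on [0, 1].
# The n-th iterate should match the closed form x0^(2^n) / 4^(2^n - 1)
# and converge to the unique fixed point 0.

def phi(x):
    return x * x / 4

x0 = 0.9
x, n = x0, 5
for _ in range(n):
    x = phi(x)

closed_form = x0 ** (2 ** n) / 4 ** (2 ** n - 1)
print(x, closed_form)   # the two values agree, and both are essentially 0
```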
The next example shows, inter alia, that being a contraction is a sufficient but not necessary condition for the uniqueness of fixed points.

Example 518 Consider the self-map $\varphi: [0, \infty) \to [0, \infty)$ defined by $\varphi(x) = x/(1+x)$. We have, for all $x, y > 0$,
$$|\varphi(x) - \varphi(y)| = \frac{|x - y|}{(1+x)(1+y)}$$
Since the factor $1/((1+x)(1+y))$ gets arbitrarily close to 1 as $x$ and $y$ approach 0, no constant $k < 1$ can bound it. So, $\varphi$ is not a contraction. Nevertheless, it is easy to check that it has a unique fixed point, given by the origin $x = 0$. By (12.23), we have
$$\varphi^n(x_0) = \frac{x_0}{1 + nx_0} \to 0 \qquad \forall x_0 > 0$$
So, the orbits converge to the fixed point for all initial conditions $x_0 > 0$. N
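The closed form $\varphi^n(x_0) = x_0/(1 + nx_0)$ can be verified numerically as well (our own sketch):

```python
# Example 518 numerically: phi(x) = x/(1+x).
# The n-th iterate has the closed form x0 / (1 + n*x0), which tends to 0
# as n grows -- but only slowly, reflecting the lack of a contraction constant.

def phi(x):
    return x / (1 + x)

x0 = 3.0
x, n = x0, 1000
for _ in range(n):
    x = phi(x)

closed_form = x0 / (1 + n * x0)
print(x, closed_form)   # both close to 0, and to each other
```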
In the rest of the section we illustrate our asymptotic analysis through some important
applications.
$$\varphi(x) = \frac{\alpha - \sigma x}{\beta}$$
We can thus write $\varphi: [a,b] \to [a,b]$. This self-map defines the price recurrence (12.26). The unique fixed point of $\varphi$ is easily seen to be
$$p = \frac{\alpha}{\sigma + \beta}$$
Thus, the unique candidate limit price is the equilibrium price (8.17) of the market without delays in production.
Let us check whether or not $p$ is indeed the limit point. The following formula is key.

Lemma 520 We have
$$p_t - p = (-1)^{t-1}\left(\frac{\sigma}{\beta}\right)^{t-1}(p_1 - p) \qquad \forall t \geq 2 \tag{12.27}$$

Proof We have
$$p_t - p = \frac{\alpha}{\beta} - \frac{\sigma}{\beta}p_{t-1} - \frac{\alpha}{\sigma + \beta} = \frac{\sigma}{\beta}\left(\frac{\alpha}{\sigma + \beta} - p_{t-1}\right) = -\frac{\sigma}{\beta}(p_{t-1} - p)$$
that is,
$$p_t - p = -\frac{\sigma}{\beta}(p_{t-1} - p) \qquad \forall t \geq 2 \tag{12.28}$$
By iterating this relation backward, we get
$$p_t - p = (-1)^{t-1}\left(\frac{\sigma}{\beta}\right)^{t-1}(p_1 - p)$$
as desired.
Since
$$(-1)^{t-1} = \begin{cases} 1 & \text{if } t \text{ odd} \\ -1 & \text{if } t \text{ even} \end{cases}$$
from formula (12.27) it follows that
$$|p_t - p| = \left(\frac{\sigma}{\beta}\right)^{t-1}|p_1 - p| \qquad \forall t \geq 2 \tag{12.29}$$
The value of $\lim p_t$ thus depends on the ratio $\sigma/\beta$ of the slopes of the supply and demand functions. We need to distinguish three cases according to whether this ratio is lower than, equal to, or greater than 1, that is, according to whether
$$\sigma < \beta, \qquad \sigma = \beta, \qquad \sigma > \beta$$
Case 1: $\sigma < \beta$. The supply function has a lower slope than the demand function. We have
$$\lim |p_t - p| = |p_1 - p| \lim \left(\frac{\sigma}{\beta}\right)^{t-1} = 0$$
So,
$$\lim p_t = p \tag{12.30}$$
as well as
$$\lim E_{t-1}(p_t) = p \tag{12.31}$$
When $\sigma < \beta$, the fixed point $p$ is indeed a limit point. Equilibrium prices of markets with delays and classic expectations thus converge to the equilibrium price of the market without delays in production. This holds for any possible initial expectation $E_0(p_1)$, which in the long run turns out to be immaterial.
Note that the (one-step-ahead) forecast error vanishes asymptotically:
$$e_t = p_t - E_{t-1}(p_t) \to 0$$
Classic expectations, though lazy, are nevertheless asymptotically correct provided $\sigma < \beta$.
Case 2: $\sigma = \beta$. The demand and supply functions have the same slope. Formula (12.27) implies
$$p_t - p = (-1)^{t-1}(p_1 - p) \qquad \forall t \geq 2$$
The initial price $p_1$ is equal to $p$ if and only if the initial expectation is correct:
$$E_0(p_1) = p_1 \iff p_1 = \frac{\alpha}{\beta} - p_1 \iff p_1 = p$$
So, if the initial expectation is correct, then $p_t = p$ for all $t$. Otherwise, the initial error $E_0(p_1) \neq p_1$ determines a sequence of equilibrium prices
$$p_t = p + (-1)^{t-1}(p_1 - p) = \begin{cases} p_1 & \text{if } t \text{ odd} \\ 2p - p_1 & \text{if } t \text{ even} \end{cases}$$
that keeps oscillating.
Case 3: $\sigma > \beta$. The supply function has a higher slope than the demand function. From $\sigma > \beta$ it follows that
$$\lim \left(\frac{\sigma}{\beta}\right)^{t-1} = +\infty$$
In this case, the initial forecast error propagates, causing an exploding price dynamics. When $\sigma > \beta$, the laziness of classic expectations translates into explosive price behavior.
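The three regimes are easy to see in a small simulation. A sketch of our own, written for a generic linear cobweb recurrence $p_t = A - k\,p_{t-1}$, where $k$ denotes the ratio of the supply slope to the demand slope (notation and numbers are illustrative, not from the text):

```python
# Cobweb price dynamics p_t = A - k * p_{t-1}, with k = (supply slope)/(demand slope).
# The fixed point is p* = A / (1 + k).  Three regimes:
#   k < 1 -> convergence to p*;  k = 1 -> permanent oscillation;  k > 1 -> explosion.

def simulate(A, k, p1, T):
    ps = [p1]
    for _ in range(T - 1):
        ps.append(A - k * ps[-1])
    return ps

A, p1 = 10.0, 1.0

conv = simulate(A, 0.5, p1, 200)   # k < 1: p_t -> p* = 10/1.5
osc  = simulate(A, 1.0, p1, 6)     # k = 1: prices alternate p1, A - p1, p1, ...
expl = simulate(A, 1.5, p1, 50)    # k > 1: |p_t - p*| blows up

print(round(conv[-1], 6))
print(osc)
print(abs(expl[-1] - 10.0 / 2.5))  # huge deviation from p* = 4
```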
As we already remarked, given a sequence of equilibrium prices $\{p_t\}$ and of price expectations $\{E_{t-1}(p_t)\}$, the forecast error $e_t$ at each $t$ is given by
$$e_t = p_t - E_{t-1}(p_t)$$
In view of (8.19), in the potato market with production delays the producers' forecast error $e_t$ at time $t$ is
$$e_t = p_t - E_{t-1}(p_t) = \frac{\alpha}{\beta} - \left(\frac{\sigma}{\beta} + 1\right)E_{t-1}(p_t)$$
so that
$$e_t = 0 \iff \left(\frac{\sigma}{\beta} + 1\right)E_{t-1}(p_t) = \frac{\alpha}{\beta} \iff E_{t-1}(p_t) = \frac{\alpha}{\sigma + \beta}$$
So, expectations are rational if and only if
$$E_{t-1}(p_t) = p_t = p = \frac{\alpha}{\sigma + \beta} \qquad \forall t \geq 1$$
We have thus proved the following result.
Proposition 521 A uniperiodal market equilibrium of markets $M_t^R$ features rational expectations if and only if the sequence of equilibrium prices is constant, with $p_t = E_{t-1}(p_t) = p$ for all $t \geq 1$.

The uniperiodal price equilibrium under rational expectations of markets $M_t^R$ with production delays is equal to the equilibrium price (8.17) of market $M$. Remarkably, rational expectations have neutralized, in equilibrium, any effect of differences in production technologies. In terms of equilibrium potato prices, it is immaterial whether production uses a traditional technology, with sowing in $t-1$ and harvest in $t$, or a Star Trek one with instantaneous production.
Thus, Heron’s sequence converges to the square root of a. On top of that, the rate of
convergence is quite fast, as we will see in a few examples.
that is,
$$x_{n+1}^2 = \frac{1}{4}\left(x_n + \frac{a}{x_n}\right)^2 > a$$
So, $x_{n+1} > \sqrt{a}$. This completes the proof of (12.33).
If $a > 1$, we have $x_1 = a > \sqrt{a}$. By (12.33), $x_2 > \sqrt{a}$. If, instead, $0 < a < 1$, then $x_2 = (a+1)/2 > \sqrt{a}$: indeed, squaring gives $((a+1)/2)^2 - a = ((a-1)/2)^2 > 0$. Moreover,
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) < x_n$$
By iterating the algorithm, $x_n$ and $a/x_n$ become closer and closer, until they reach their common value $\sqrt{a}$. The following figure illustrates:
[Figure: one step of Heron's iteration on the graph of $y = a/x$, showing the points $a/x_{n+1}$, $a/x_n$, $x_{n+1}$, and $x_n$ squeezing toward $\sqrt{a}$.]
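The squeezing process is easy to run in code. A minimal sketch of Heron's iteration $x_{n+1} = (x_n + a/x_n)/2$, starting from $x_1 = a$ as in the text:

```python
# Heron's algorithm for sqrt(a): x_{n+1} = (x_n + a/x_n) / 2, starting at x_1 = a.
# x_n and a/x_n squeeze the square root between them; convergence is very fast
# (the error is roughly squared at every step), so a few iterations suffice.

def heron(a, n_iter=8):
    x = a
    for _ in range(n_iter):
        x = (x + a / x) / 2
    return x

for a in (2.0, 0.5, 100.0):
    print(a, heron(a), a ** 0.5)   # Heron agrees with the built-in square root
```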
This characterization is identical to the definition of $\lim_{x \to x_0} f(x) = f(x_0)$ for a point $x_0$ that belongs to the domain of the function, except for the elimination of the condition $0 < \|x - x_0\|$ – i.e., of the requirement that $x \neq x_0$ – so as to include points $x_0$ that are isolated points of $A$.
12.10. CODA CONTINUA 375
Here the value of $\delta_\varepsilon$ thus depends only on $\varepsilon$, no longer also on a point $x_0$. Indeed, no specific points $x_0$ are mentioned in this definition, which only considers the domain per se.
Uniform continuity implies continuity, but the converse does not hold. For example, we will soon see that the quadratic function is continuous on $\mathbb{R}$, but not uniformly so. Yet, the two notions of continuity turn out to be equivalent on the fundamental class of compact sets (Section 5.6).
Proof The "if" is obvious because uniform continuity implies continuity. We prove the "only if". For simplicity, consider the scalar case $n = 1$ with $K = [a,b]$. So, let $f: [a,b] \to \mathbb{R}$ be continuous. We need to show that it is also uniformly continuous. Suppose, by contradiction, that there exist an $\varepsilon > 0$ and two sequences $\{x_n\}$ and $\{y_n\}$ in $[a,b]$ with $x_n - y_n \to 0$ and
$$|f(x_n) - f(y_n)| \geq \varepsilon \qquad \forall n \geq 1 \tag{12.36}$$
Since the sequences $\{x_n\}$ and $\{y_n\}$ are bounded, the Bolzano-Weierstrass Theorem yields two convergent subsequences $\{x_{n_k}\}$ and $\{y_{n_k}\}$, i.e., there exist $x, y \in [a,b]$ such that $x_{n_k} \to x$ and $y_{n_k} \to y$. Since $x_n - y_n \to 0$, we have $x_{n_k} - y_{n_k} \to 0$ and, therefore, $x - y = 0$ because of the uniqueness of the limit. Since $f$ is continuous, we have $f(x_{n_k}) \to f(x)$ and $f(y_{n_k}) \to f(y)$. Hence, $f(x_{n_k}) - f(y_{n_k}) \to f(x) - f(y) = 0$, which contradicts (12.36). We conclude that $f$ is uniformly continuous.
Theorem 526 does not hold without the compactness of $K$, as the next two counterexamples show. In the first counterexample we consider a closed but unbounded set – the real line – while in the second we consider a bounded set that is not closed – the open interval $(0,1)$.
Example 528 The function $f: (0,1) \to \mathbb{R}$ defined by $f(x) = 1/x$ is continuous, but not uniformly continuous, on $(0,1)$. Indeed, suppose, by contradiction, that $f$ is uniformly continuous. Setting $\varepsilon = 1$, there exists $\delta_\varepsilon > 0$ such that
$$|x - y| < \delta_\varepsilon \implies \left|\frac{1}{x} - \frac{1}{y}\right| < 1 \qquad \forall x, y \in (0,1) \tag{12.38}$$
Let $y = \min\{\delta_\varepsilon/2, 1/2\}$ and $x = y/2$. It is immediate that $0 < x < y < 1$ and $|x - y| < \delta_\varepsilon$. By (12.38), we thus have
$$\left|\frac{1}{x} - \frac{1}{y}\right| = \frac{1}{x} - \frac{1}{y} < 1 \tag{12.39}$$
On the other hand,
$$\frac{1}{x} - \frac{1}{y} = \frac{1}{y} \geq 2$$
which contradicts (12.39). We conclude that the function $1/x$ is not uniformly continuous on $(0,1)$. Nevertheless, by Theorem 526 its restriction to any compact interval $[a,b] \subseteq (0,1)$ is uniformly continuous. N
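The failure in this example can be seen numerically (our own sketch): for any candidate $\delta$, the pair $x = y/2$ with $y = \min\{\delta/2, 1/2\}$ is within $\delta$, yet the values of $1/x$ and $1/y$ stay at least 2 apart:

```python
# Example 528 numerically: f(x) = 1/x on (0,1) is not uniformly continuous.
# For any candidate delta, the pair x = y/2 with y = min(delta/2, 1/2)
# satisfies |x - y| < delta, yet |f(x) - f(y)| = 1/y >= 2 -- so no single
# delta can work for epsilon = 1 over the whole interval.

f = lambda x: 1 / x

for delta in (0.5, 0.05, 0.005):
    y = min(delta / 2, 0.5)
    x = y / 2
    print(delta, abs(x - y) < delta, abs(f(x) - f(y)))  # gap never falls below 2
```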
Part IV
Chapter 13

Linear functions and operators
Example 530 The scalar functions $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = mx$ for some $m \in \mathbb{R}$ are linear. Geometrically, they are straight lines passing through the origin. N

Example 531 Through inner products (Section 4.1.1), it is easy to define linear functions. Indeed, given a vector $\alpha \in \mathbb{R}^n$, define $f: \mathbb{R}^n \to \mathbb{R}$ by
$$f(x) = \alpha \cdot x \qquad \forall x \in \mathbb{R}^n \tag{13.2}$$
380 CHAPTER 13. LINEAR FUNCTIONS AND OPERATORS
Example 532 In production theory, production functions may be assumed to have the linear form (13.2). The vector $\alpha = (\alpha_1, \alpha_2, ..., \alpha_n) \in \mathbb{R}^n$ is interpreted as the vector of constant production coefficients. Indeed, we have
$$f(e^1) = \alpha \cdot e^1 = \alpha_1, \qquad f(e^2) = \alpha \cdot e^2 = \alpha_2, \qquad ..., \qquad f(e^n) = \alpha \cdot e^n = \alpha_n$$
which means that $\alpha_1$ is the quantity of output determined by one unit of the first input, $\alpha_2$ is the quantity of output determined by one unit of the second input, and so on. These coefficients are called constant because they do not depend on the quantity of input. This implies that the returns to scale of these production functions are constant. N
Next, we give a simple but important characterization: a function is linear if and only if it preserves the operations of addition and scalar multiplication. Linear functions are, thus, the functions that preserve the linear structure of $\mathbb{R}^n$. This clarifies their nature.
$$f(\alpha x + \beta y) = f(\alpha x) + f(\beta y) = \alpha f(x) + \beta f(y)$$
Next, we show that, more generally, linear combinations are preserved by linear functions. When $k = 2$ we are back to the definition, but the result goes well beyond that, as it holds for every $k \geq 2$:
$$f\left(\sum_{i=1}^k \alpha_i x^i\right) = \sum_{i=1}^k \alpha_i f(x^i) \tag{13.3}$$
for every set of vectors $\{x^i\}_{i=1}^k$ in $\mathbb{R}^n$ and every set of scalars $\{\alpha_i\}_{i=1}^k$.
Proof Let us show that $f(0) = 0$. Since $f$ is linear, we have $f(\alpha 0) = \alpha f(0)$ for every $\alpha \in \mathbb{R}$. So, $f(0) = \alpha f(0)$ for every $\alpha \in \mathbb{R}$, which can happen if and only if $f(0) = 0$. The proof of (13.3) is left to the reader.
A more general version of (13.3), called Jensen's inequality, will be proved in Chapter 14. Property (13.3) has an important consequence: once we know the values taken by a linear function on the elements of a basis, we can determine its value at any vector of $\mathbb{R}^n$ whatsoever. Indeed, let $S$ be a basis of $\mathbb{R}^n$. Each vector $x \in \mathbb{R}^n$ can be written as a linear combination of elements of $S$, so there exist a finite set of vectors $\{x^i\}_{i=1}^n$ in $S$ and a set of scalars $\{\alpha_i\}_{i=1}^n$ such that $x = \sum_{i=1}^n \alpha_i x^i$. By (13.3), we then have
$$f(x) = \sum_{i=1}^n \alpha_i f(x^i)$$
Linearity is a purely algebraic property that requires functions to behave consistently with respect to the operations of addition and scalar multiplication. Thus, prima facie, linearity has no topological consequences. It is, therefore, remarkable that linear functions turn out to be continuous.
This elegant result is important because continuity is, as we learned in the last chapter, a highly desirable property. We omit the proof, however, because it is a special case of a result, Theorem 669, that will be proved later in the book.
13.1.2 Representation

Definition 536 The set of all linear functions $f: \mathbb{R}^n \to \mathbb{R}$ is called the dual space of $\mathbb{R}^n$ and is denoted by $(\mathbb{R}^n)'$.

The space $(\mathbb{R}^n)'$ is, thus, the set of all linear functions defined on $\mathbb{R}^n$. On $(\mathbb{R}^n)'$ it is possible to define addition and scalar multiplication in a natural way:
$$(f + g)(x) = f(x) + g(x) \qquad \text{and} \qquad (\alpha f)(x) = \alpha f(x)$$
for every $x \in \mathbb{R}^n$.
The two operations satisfy the properties (v1)-(v8) that, in Chapter 3, we discussed for $\mathbb{R}^n$. Hence, intuitively, $(\mathbb{R}^n)'$ is an example of a vector space. In particular, the neutral element for addition is the zero function $f$ such that $f(x) = 0$ for every $x \in \mathbb{R}^n$, while the opposite element of $f \in (\mathbb{R}^n)'$ is the function $g = (-1)f = -f$ such that $g(x) = -f(x)$ for every $x \in \mathbb{R}^n$.
The next important result, an elementary version of the celebrated Riesz's Theorem, describes the dual space $(\mathbb{R}^n)'$. We saw that every vector $\alpha \in \mathbb{R}^n$ induces a linear function $f: \mathbb{R}^n \to \mathbb{R}$ defined by $f(x) = \alpha \cdot x$ (Example 531). The following result shows that the converse holds: all linear functions defined on $\mathbb{R}^n$ have this form, i.e., the dual space $(\mathbb{R}^n)'$ consists of the linear functions of the type $f(x) = \alpha \cdot x$ for some $\alpha \in \mathbb{R}^n$. In particular, the straight lines passing through the origin are the unique linear functions defined on the real line (Example 530).
Theorem 537 (Riesz) A function $f: \mathbb{R}^n \to \mathbb{R}$ is linear if and only if there exists a (unique) vector $\alpha \in \mathbb{R}^n$ such that
$$f(x) = \alpha \cdot x \qquad \forall x \in \mathbb{R}^n$$

Proof We have already seen the "if" part in Example 531. It remains to prove the "only if" part. So, let $f: \mathbb{R}^n \to \mathbb{R}$ be a linear function and consider the standard basis $e^1, ..., e^n$ of $\mathbb{R}^n$. Set
$$\alpha = \left(f(e^1), ..., f(e^n)\right) \in \mathbb{R}^n$$
We can write each vector $x \in \mathbb{R}^n$ as $x = \sum_{i=1}^n x_i e^i$. Thus, by the linearity of $f$ we have:
$$f(x) = f\left(\sum_{i=1}^n x_i e^i\right) = \sum_{i=1}^n x_i f(e^i) = \sum_{i=1}^n \alpha_i x_i = \alpha \cdot x \qquad \forall x \in \mathbb{R}^n$$
As to uniqueness, if $\alpha' \in \mathbb{R}^n$ also represents $f$, then $(\alpha - \alpha') \cdot x = 0$ for every $x \in \mathbb{R}^n$; taking $x = \alpha - \alpha'$ gives $\|\alpha - \alpha'\|^2 = 0$, and so $\alpha' = \alpha$.
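The proof is constructive: evaluating $f$ on the standard basis recovers the representing vector $\alpha$. A minimal sketch (the linear function below is our own illustrative choice):

```python
# Riesz's Theorem constructively: for a linear f on R^n, the vector
# alpha = (f(e^1), ..., f(e^n)) represents f, i.e. f(x) = alpha . x.

def representing_vector(f, n):
    """Evaluate f on the standard basis e^1, ..., e^n."""
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    return [f(e) for e in basis]

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

# An illustrative linear function on R^3 (our own choice).
f = lambda x: 2 * x[0] - x[1] + 5 * x[2]

alpha = representing_vector(f, 3)
print(alpha)                         # [2.0, -1.0, 5.0]
x = [1.0, 4.0, -2.0]
print(f(x), dot(alpha, x))           # the two agree, as the theorem predicts
```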
13.1.3 Monotonicity

Turn now to the order structure of $\mathbb{R}^n$. A function $f: \mathbb{R}^n \to \mathbb{R}$ is said to be:

(i) positive if $x \geq 0$ implies $f(x) \geq 0$;²
(ii) strictly positive if $x > 0$ implies $f(x) > 0$.

In words, a (strictly) positive function $f$ assigns (strictly) positive values $f(x)$ to (strictly) positive vectors $x$.
In general, positivity is a much weaker property than monotonicity: for example, the function $f(x) = \|x\|$ is positive but it is not increasing. Indeed, for $n = 2$, the vectors $x = (-3, 2)$ and $y = (2, 2)$ are such that $y \geq x$, while $f(x) = \sqrt{13} > f(y) = \sqrt{8}$. A remarkable feature of linear functions is that the two properties become equivalent.

² Positivity with respect to the order structure is weaker than positivity of the image of a function $f: A \subseteq \mathbb{R}^n \to \mathbb{R}$. This latter, stronger, notion requires that $f(x) \geq 0$ for all $x$ that belong to the domain $A$. In what follows, it should be clear from the context which notion of positivity we are referring to.
13.1. LINEAR FUNCTIONS 383
Proof We only prove the "if" part, since the converse is rather trivial. Let $f$ be positive. We show that it is also increasing. Let $x, y \in \mathbb{R}^n$ be such that $x \geq y$, and set $z = x - y \in \mathbb{R}^n$. Since $z \geq 0$, positivity and linearity imply
$$0 \leq f(z) = f(x - y) = f(x) - f(y)$$
yielding that $f(x) \geq f(y)$, as desired. The proof for $f$ strictly positive is similar.

Thus, to prove that a linear function is increasing, it is enough to show that it is positive, while to prove that it is strictly increasing, it suffices to show that it is strictly positive.
Positivity emerges also in the monotone version of Riesz's Theorem. This result, which will be generalized in Proposition 641, is of great importance in applications, as we will see in Section 19.5.
(Strictly) increasing linear functions are thus characterized by (strongly) positive representing vectors $\alpha$. Let us see an instance of this result.
Example 540 Consider the linear functions $f, g: \mathbb{R}^3 \to \mathbb{R}$ defined by $f(x) = x_1 + 2x_2 + 5x_3$ and $g(x) = x_1 + 3x_2$. Denote by $\alpha_f$ and $\alpha_g$ their representing vectors. By the last proposition, $f$ is strictly increasing because $\alpha_f = (1, 2, 5) \gg 0$, and $g$ is increasing because $\alpha_g = (1, 3, 0) \geq 0$. N
As the reader can check, the proof of Proposition 539 is an immediate consequence of Riesz's Theorem when combined with Proposition 538 and the following lemma.

Proof The "if" parts are trivial. As for the "only if" parts, consider $b = e^i$: it follows that $a \cdot b = a_i$ which, in turn, must be, respectively, $\geq 0$ and $> 0$ for each $i$.

Similar results can be proven by replacing "strictly" with "strongly". Moreover, as the reader can easily verify, dual results hold for decreasing and negative linear functions.
In other words, $f$ must be a weighted average with weights $\alpha_i$. So, it is linearity that underlies weighted averages as summary measures: if linearity is a meaningful property in the application at hand, then weighted averages become the way to summarize vectors through a scalar.
For instance, if all weights are positive and equal, we have $\alpha_i = 1/n$ and (13.6) becomes the classic arithmetic average
$$f(x) = \frac{1}{n}\sum_{i=1}^n x_i$$
The contemplation of this classic average makes us realize that in (13.6) weights might be negative, something unnatural (at least for our profit example). We thus need some further assumption on $f$ that makes possible the use of the monotone version of Riesz's Theorem. This is easily done by requiring $f$ to be positive:
$$x \geq 0 \implies f(x) \geq 0$$
It is, indeed, a rather intuitive property: if the vector (say, of profits) is positive, its summary measure should also be positive. If $f$ is linear and positive, by Proposition 539 we indeed have $\alpha \geq 0$, so weights are positive.
Another property that seems natural is normalization:
$$f(1, 1, ..., 1) = 1$$
If, for instance, the vector of profits is constant and equal to 1 (so all branches make a unit of profit), then the summary measure of profits is 1 as well. This property is characterized by having the weights in (13.6) add up to 1.

Proposition 542 The function $f: \mathbb{R}^n \to \mathbb{R}$ is linear, positive and normalized if and only if there exists a (unique) positive vector $\alpha \in \mathbb{R}^n_+$, with $\sum_{i=1}^n \alpha_i = 1$, such that $f(x) = \alpha \cdot x$ for all $x \in \mathbb{R}^n$.

Indeed, weights are often assumed to add up to 1, so that they can be interpreted as proportions and expressed, if needed, in percentage terms. This result shows that normalization is the property of linear and positive summary functions that underlies this natural convention.
as desired.
A further interesting property that $f$ may satisfy is symmetry. In our profit example, symmetry says that we do not care which branch realized which profit, but only about the size of the profits. For instance, if $n = 2$, $x = (1000, 4000)$ and $y = (4000, 1000)$, under symmetry $f(x) = f(y)$ because the only difference between the two vectors is which branch earned a given profit.
To state symmetry formally we need permutations, that is, bijections $\pi: N \to N$ where $N = \{1, 2, ..., n\}$ (Appendix B.2). Given $x, y \in \mathbb{R}^n$, write $x \sim y$ if there exists a permutation $\pi$ such that $x_i = y_{\pi(i)}$ for all $i = 1, 2, ..., n$. In other words, $y$ can be obtained from $x$ by permuting indexes.

Example 543 We have $x = (1000, 4000) \sim y = (4000, 1000)$. Indeed, let $\pi: \{1,2\} \to \{1,2\}$ be the permutation given by $\pi(1) = 2$ and $\pi(2) = 1$, in which indexes are interchanged. Then $(y_{\pi(1)}, y_{\pi(2)}) = (y_2, y_1) = (1000, 4000) = x$. N
$$x \sim y \implies f(x) = f(y)$$
In other words, a symmetric $f$ assigns the same value to all vectors that can be obtained from one another via a permutation.
Remarkably, this result provides a foundation for the classic arithmetic average: it is the
only summary function on Rn which is linear, positive, normalized, and symmetric. As long
as these properties are compelling in our application, we can summarize vectors via their
arithmetic averages.
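The four characterizing properties can be checked directly on the arithmetic average; a quick sketch of our own (the profit vectors reuse the numbers of the example above):

```python
# The arithmetic average f(x) = (1/n) * sum(x) is linear, positive,
# normalized, and symmetric -- the four properties that characterize it.

def avg(x):
    return sum(x) / len(x)

x = [1000.0, 4000.0]
y = [4000.0, 1000.0]          # a permutation of x

print(avg([1.0, 1.0]))                          # normalization: 1.0
print(avg(x) == avg(y))                         # symmetry: True
print(avg([2 * a + 3 * b for a, b in zip(x, y)]),
      2 * avg(x) + 3 * avg(y))                  # linearity: the two agree
```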
proving that $f$ is symmetric. "Only if". Suppose $f$ is symmetric. Note that $e^i \sim e^j$ for all indexes $i \neq j$. Indeed, it is enough to consider the permutation $\pi: N \to N$ defined by
$$\pi(k) = \begin{cases} j & \text{if } k = i \\ i & \text{if } k = j \\ k & \text{otherwise} \end{cases}$$
Summing up, Riesz's Theorem and its variations permit a principled approach to weighted averages, justified via the properties of the summary functions; the summary functions are the fundamental objects of interest, averages just being a way to represent them (however convenient they might be).
13.2 Matrices

13.2.1 Definition

Matrices play a key role in the study of linear operators. Specifically, an $m \times n$ matrix is simply a table, with $m$ rows and $n$ columns, of scalars
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn} \end{bmatrix}$$
For example,
$$\begin{bmatrix} 1 & 5 & 7 & 9 \\ 3 & 2 & 1 & 4 \\ 12 & 15 & 11 & 9 \end{bmatrix}$$
is a $3 \times 4$ matrix.

Notation The elements (or components, or entries) of a matrix are denoted by $a_{ij}$, and the matrix itself is also denoted by $(a_{ij})$. A matrix with $m$ rows and $n$ columns will often be denoted by $A_{m \times n}$.

A matrix is called square (of order $n$) when $m = n$, and rectangular when $m \neq n$. The rows of the $3 \times 4$ matrix above can be seen as vectors of $\mathbb{R}^4$:
$$(1, 5, 7, 9), \qquad (3, 2, 1, 4), \qquad (12, 15, 11, 9)$$
The $3 \times 3$ matrix
$$\begin{bmatrix} 1 & 5 & 1 \\ 3 & 4 & 2 \\ 1 & 7 & 9 \end{bmatrix}$$
is square, with rows
$$(1, 5, 1), \qquad (3, 4, 2), \qquad (1, 7, 9)$$
Example 546 (i) The square matrix of order $n$ obtained by writing, one next to the other, the versors $e^i$ of $\mathbb{R}^n$ is called the identity (or unit) matrix and is denoted by $I_n$ or, when there is no danger of confusion, simply by $I$:
$$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
(ii) The $m \times n$ matrix with all zero elements is called null and is denoted by $O_{mn}$ or, when there is no danger of confusion, simply by $O$:
$$O = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$
N
(i) given two matrices $(a_{ij})$ and $(b_{ij})$ in $M(m,n)$, the addition $(a_{ij}) + (b_{ij})$ is defined by
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix}$$
that is, $(a_{ij}) + (b_{ij}) = (a_{ij} + b_{ij})$;

(ii) given $\alpha \in \mathbb{R}$ and $(a_{ij}) \in M(m,n)$, the scalar multiplication $\alpha(a_{ij})$ is defined by
$$\alpha \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \alpha a_{11} & \cdots & \alpha a_{1n} \\ \vdots & & \vdots \\ \alpha a_{m1} & \cdots & \alpha a_{mn} \end{bmatrix}$$
that is, $\alpha(a_{ij}) = (\alpha a_{ij})$.
and
$$4\begin{bmatrix} 1 & 5 & 7 & 9 \\ 3 & 2 & 1 & 4 \\ 12 & 15 & 11 & 9 \end{bmatrix} = \begin{bmatrix} 4 & 20 & 28 & 36 \\ 12 & 8 & 4 & 16 \\ 48 & 60 & 44 & 36 \end{bmatrix}$$
N
Example 548 Given a square matrix $A = (a_{ij})$ of order $n$ and two scalars $\alpha$ and $\beta$, we have
$$\alpha A + \beta I = \begin{bmatrix} \alpha a_{11} + \beta & \alpha a_{12} & \cdots & \alpha a_{1n} \\ \alpha a_{21} & \alpha a_{22} + \beta & \cdots & \alpha a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha a_{n1} & \alpha a_{n2} & \cdots & \alpha a_{nn} + \beta \end{bmatrix}$$
It is easy to verify that the operations of addition and scalar multiplication just introduced on $M(m,n)$ satisfy the properties (v1)-(v8) that in Chapter 3 we established for $\mathbb{R}^n$, that is:

(v1) $A + B = B + A$
(v2) $(A + B) + C = A + (B + C)$
(v3) $A + O = A$
(v4) $A + (-A) = O$
(v5) $\alpha(A + B) = \alpha A + \alpha B$
(v6) $(\alpha + \beta)A = \alpha A + \beta A$
(v7) $1A = A$
(v8) $\alpha(\beta A) = (\alpha\beta)A$

Intuitively, we can say that $M(m,n)$ is another example of a vector space. Note that the neutral element for addition is the null matrix.
(i) symmetric if $a_{ij} = a_{ji}$ for every $i, j = 1, 2, ..., n$, i.e., when the two triangles separated by the main diagonal are mirror images of each other;
(ii) lower triangular if all the elements above the main diagonal are zero, that is, $a_{ij} = 0$ for $i < j$;
(iii) upper triangular if all the elements below the main diagonal are zero, that is, $a_{ij} = 0$ for $i > j$;
(iv) diagonal if it is both lower and upper triangular, that is, if all the elements outside the main diagonal are zero: $a_{ij} = 0$ for $i \neq j$.
$$b_{ij} = a_{ji}$$
as well as
$$A = \begin{bmatrix} 1 & 0 & 7 \\ 3 & 5 & 1 \end{bmatrix} \qquad \text{and} \qquad A^T = \begin{bmatrix} 1 & 3 \\ 0 & 5 \\ 7 & 1 \end{bmatrix}$$
N
Note that
$$\left(A^T\right)^T = A$$
so the "transpose of the transpose" of a matrix is the matrix itself. In particular, it is easy to see that a square matrix $A$ is symmetric if and only if $A^T = A$. In this case, transposition has no effect. Finally, in terms of operations we have
$$(A + B)^T = A^T + B^T \qquad \text{and} \qquad (\alpha A)^T = \alpha A^T$$
A vector $x \in \mathbb{R}^n$ can be regarded as a $1 \times n$ matrix, that is, as an element of $M(1,n)$; its transpose is then a column vector, that is, $x^T \in M(n,1)$. This allows us to identify $\mathbb{R}^n$ also with $M(n,1)$.
In what follows we will often identify the vectors of $\mathbb{R}^n$ with matrices. Sometimes it will be convenient to regard them as row vectors, that is, as elements of $M(1,n)$, sometimes as column vectors, that is, as elements of $M(n,1)$. In any case, one should not forget that vectors are elements of $\mathbb{R}^n$; the identifications are holograms.
It is thus evident why the dimension of the vector $x$ must be equal to the number of columns of $A$: in multiplying $A$ with $x$, the components of $Ax^T$ are the inner products between the rows of $A$ and the vector $x$. But inner products are possible only between vectors of the same dimension.

Notation To ease notation, in what follows we will just write $Ax$ instead of $Ax^T$.

However, it is not possible to take the product $xA$: the number of rows of $A$ (i.e., 3) is not equal to the number of columns of $x$ (i.e., 1). N
In a similar way, we define the product of two matrices $A$ and $B$ by suitably multiplying the rows of $A$ and the columns of $B$. The prerequisite on the dimensions of the matrices is that the number of columns of $A$ be equal to the number of rows of $B$. In other words, the product $AB$ is possible when $A \in M(m,n)$ and $B \in M(n,q)$. If we denote by $a^1, a^2, ..., a^m$ the rows of $A$ and by $b^1, b^2, ..., b^q$ the columns of $B$, we then have
$$AB = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ a^m \end{bmatrix}\left[b^1, b^2, ..., b^q\right] = \begin{bmatrix} a^1 \cdot b^1 & a^1 \cdot b^2 & \cdots & a^1 \cdot b^q \\ a^2 \cdot b^1 & a^2 \cdot b^2 & \cdots & a^2 \cdot b^q \\ \vdots & \vdots & & \vdots \\ a^m \cdot b^1 & a^m \cdot b^2 & \cdots & a^m \cdot b^q \end{bmatrix}$$
The product matrix $AB$ is of type $m \times q$: it has the same number of rows as $A$ and the same number of columns as $B$. Note that it is possible to take the product $AB$ of the matrices $A_{m \times n}$ and $B_{n \times q}$ if and only if the product $B^T A^T$ of the transpose matrices $B^T_{q \times n}$ and $A^T_{n \times m}$ is well-defined. Momentarily it will be seen that, indeed, $(AB)^T = B^T A^T$.
This definition of the product between matrices finds its justification in Proposition 569, which we discuss later in the chapter. For the moment, it is important to understand the "mechanics" of the definition. To this end, we proceed with some examples.
$$AB = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 5 & 2 & 2 \\ 0 & 1 & 3 & 2 \end{bmatrix}$$
$$= \begin{bmatrix} 1 \cdot 1 + 3 \cdot 2 + 1 \cdot 0 & 1 \cdot 2 + 3 \cdot 5 + 1 \cdot 1 & 1 \cdot 1 + 3 \cdot 2 + 1 \cdot 3 & 1 \cdot 0 + 3 \cdot 2 + 1 \cdot 2 \\ 0 \cdot 1 + 1 \cdot 2 + 4 \cdot 0 & 0 \cdot 2 + 1 \cdot 5 + 4 \cdot 1 & 0 \cdot 1 + 1 \cdot 2 + 4 \cdot 3 & 0 \cdot 0 + 1 \cdot 2 + 4 \cdot 2 \end{bmatrix}$$
$$= \begin{bmatrix} 7 & 18 & 10 & 8 \\ 2 & 9 & 14 & 10 \end{bmatrix}$$
N
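The "rows times columns" mechanics translates directly into code; a minimal sketch, using the matrices of this example:

```python
# Product of an m x n matrix A and an n x q matrix B: entry (i, j) of AB
# is the inner product of the i-th row of A with the j-th column of B.

def matmul(A, B):
    m, n, q = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

A = [[1, 3, 1],
     [0, 1, 4]]
B = [[1, 2, 1, 0],
     [2, 5, 2, 2],
     [0, 1, 3, 2]]

print(matmul(A, B))   # [[7, 18, 10, 8], [2, 9, 14, 10]], a 2 x 4 matrix
```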
The product of matrices has the following properties, as the reader can verify.

Proposition 554 Let $A$, $B$ and $C$ be any three matrices for which it is possible to take the products indicated below. Then

Among the properties of the product, commutativity is missing. Indeed, the product of matrices does not satisfy this property: even if both products $AB$ and $BA$ are well-defined, in general $AB \neq BA$. The next example illustrates this notable failure.

When $AB = BA$, we say that the two matrices commute. Since $(AB)^T = B^T A^T$, the matrices $A$ and $B$ commute if and only if their transposes commute.
Since $A$ and $B$ are square matrices, both $BA$ and $AB$ are well-defined $3 \times 3$ matrices. We have:
$$BA = \begin{bmatrix} 2 & 1 & 4 \\ 0 & 3 & 1 \\ 4 & 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 0 \\ 1 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 2 \cdot 1 + 1 \cdot 2 + 4 \cdot 1 & 2 \cdot 0 + 1 \cdot 1 + 4 \cdot 4 & 2 \cdot 3 + 1 \cdot 0 + 4 \cdot 6 \\ 0 \cdot 1 + 3 \cdot 2 + 1 \cdot 1 & 0 \cdot 0 + 3 \cdot 1 + 1 \cdot 4 & 0 \cdot 3 + 3 \cdot 0 + 1 \cdot 6 \\ 4 \cdot 1 + 2 \cdot 2 + 4 \cdot 1 & 4 \cdot 0 + 2 \cdot 1 + 4 \cdot 4 & 4 \cdot 3 + 2 \cdot 0 + 4 \cdot 6 \end{bmatrix} = \begin{bmatrix} 8 & 17 & 30 \\ 7 & 7 & 6 \\ 12 & 18 & 36 \end{bmatrix}$$
while
$$AB = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 0 \\ 1 & 4 & 6 \end{bmatrix}\begin{bmatrix} 2 & 1 & 4 \\ 0 & 3 & 1 \\ 4 & 2 & 4 \end{bmatrix} = \begin{bmatrix} 1 \cdot 2 + 0 \cdot 0 + 3 \cdot 4 & 1 \cdot 1 + 0 \cdot 3 + 3 \cdot 2 & 1 \cdot 4 + 0 \cdot 1 + 3 \cdot 4 \\ 2 \cdot 2 + 1 \cdot 0 + 0 \cdot 4 & 2 \cdot 1 + 1 \cdot 3 + 0 \cdot 2 & 2 \cdot 4 + 1 \cdot 1 + 0 \cdot 4 \\ 1 \cdot 2 + 4 \cdot 0 + 6 \cdot 4 & 1 \cdot 1 + 4 \cdot 3 + 6 \cdot 2 & 1 \cdot 4 + 4 \cdot 1 + 6 \cdot 4 \end{bmatrix} = \begin{bmatrix} 14 & 7 & 16 \\ 4 & 5 & 9 \\ 26 & 25 & 32 \end{bmatrix}$$
So $AB \neq BA$: the two matrices do not commute. N
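The failure of commutativity is easy to verify in code; a sketch using the matrices of this example (matmul is a plain triple loop):

```python
# AB and BA with the matrices of the example: both products are defined,
# yet they differ, so A and B do not commute.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 0, 3],
     [2, 1, 0],
     [1, 4, 6]]
B = [[2, 1, 4],
     [0, 3, 1],
     [4, 2, 4]]

print(matmul(A, B))   # [[14, 7, 16], [4, 5, 9], [26, 25, 32]]
print(matmul(B, A))   # [[8, 17, 30], [7, 7, 6], [12, 18, 36]]
print(matmul(A, B) != matmul(B, A))   # True
```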
The notion of linear operator generalizes that of linear function (Definition 529), which is the special case $m = 1$, that is, $\mathbb{R}^m = \mathbb{R}$.
13.3. LINEAR OPERATORS 395
Linear operators are the operators that preserve the operations of addition and scalar multiplication, thus generalizing the analogous characterization that we established for linear functions (Proposition 533). Though natural, it is a significant generalization: here $T(x)$ is a vector of $\mathbb{R}^m$, not a scalar (unless $m = 1$).
$$T(x) = Ax \qquad \forall x \in \mathbb{R}^n \tag{13.8}$$
It is easy to see that $T$ is linear. Soon, in Theorem 564, we will show that all linear operators $T: \mathbb{R}^n \to \mathbb{R}^m$ actually have this form.
Note that this operator can be written in the form $T = (T_1, ..., T_m): \mathbb{R}^n \to \mathbb{R}^m$ introduced in Section 12.7 by setting, for every $i = 1, ..., m$,
$$T_i(x) = a^i \cdot x$$
where $a^i$ is the $i$-th row of $A$. Two notable examples are the null operator and the identity operator:
$$0(x) = 0 \qquad \forall x \in \mathbb{R}^n$$
$$I(x) = x \qquad \forall x \in \mathbb{R}^n$$
Example 560 Let $A = (a_{ij})$ be an $n \times n$ square matrix. As in Example 558, define the operator $T: \mathbb{R}^n \to \mathbb{R}^n$ by
$$T(x) = Ax \qquad \forall x \in \mathbb{R}^n$$
Now, this operator has the same domain and codomain. N
We conclude this first section with some basic properties of linear operators that generalize those stated in Proposition 534 for linear functions (the easy proof is left to the reader).
The operations of addition and scalar multiplication are easily defined for operators. Specifically, given two operators $S, T: \mathbb{R}^n \to \mathbb{R}^m$, linear or not, and a scalar $\alpha \in \mathbb{R}$, define $S + T: \mathbb{R}^n \to \mathbb{R}^m$ and $\alpha T: \mathbb{R}^n \to \mathbb{R}^m$ by
$$(S + T)(x) = S(x) + T(x)$$
and
$$(\alpha T)(x) = \alpha T(x)$$
for every $x \in \mathbb{R}^n$. Denote by $L(\mathbb{R}^n, \mathbb{R}^m)$ the space of all linear operators $T: \mathbb{R}^n \to \mathbb{R}^m$. In the case of linear functions, i.e., $m = 1$ so that $\mathbb{R}^m = \mathbb{R}$, the space $L(\mathbb{R}^n, \mathbb{R})$ reduces to the dual space $(\mathbb{R}^n)'$ that we studied before. It is easy to check that the two operations just introduced satisfy the "usual" properties (v1)-(v8). Again, this means that $L(\mathbb{R}^n, \mathbb{R}^m)$ is, intuitively, another example of a vector space. In particular, in the case of linear operators $T: \mathbb{R}^n \to \mathbb{R}^n$, to ease notation we denote this space just by $L(\mathbb{R}^n)$, in place of $L(\mathbb{R}^n, \mathbb{R}^n)$.
Addition and scalar multiplication are, by now, routine. The next notion is, instead, peculiar to operators.

Definition 562 Given two linear operators $T: \mathbb{R}^n \to \mathbb{R}^m$ and $S: \mathbb{R}^m \to \mathbb{R}^q$, their product is the function $ST: \mathbb{R}^n \to \mathbb{R}^q$ defined by
$$(ST)(x) = S(T(x))$$
for every $x \in \mathbb{R}^n$.

In other words, the product operator $ST$ is the composite function $S \circ T$. If the operators $S$ and $T$ are linear, so is the product $ST$. Indeed:
$$(ST)(\alpha x + \beta y) = S(T(\alpha x + \beta y)) = S(\alpha T(x) + \beta T(y)) = \alpha S(T(x)) + \beta S(T(y)) = \alpha(ST)(x) + \beta(ST)(y)$$
for every $x, y \in \mathbb{R}^n$ and every $\alpha, \beta \in \mathbb{R}$. The product of two linear operators is, therefore, still a linear operator.

As Proposition 569 will make clear, in general the product is not commutative: when both products $ST$ and $TS$ are defined, in general we have $ST \neq TS$. Hence, when one writes $ST$ and $TS$, the order in which the two operators appear is important.
Last, but not least, we state the version for operators of the remarkable Theorem 535 on continuity.
The proof is a simple elaboration on Theorem 535, so it is left to readers (who read the proof of Theorem 669).
13.3.2 Representation

In this section we study in more detail linear operators $T: \mathbb{R}^n \to \mathbb{R}^m$. We start by establishing a representation theorem for them. In Riesz's Theorem we saw that a function $f: \mathbb{R}^n \to \mathbb{R}$ is linear if and only if there exists a vector $\alpha \in \mathbb{R}^n$ such that $f(x) = \alpha \cdot x$ for every $x \in \mathbb{R}^n$. The next result generalizes Riesz's Theorem to linear operators.

The matrix $A$ is called the matrix associated to the operator $T$ (or also the representative matrix of the operator $T$).
Matrices allow us, therefore, to represent operators in the form (13.10), which is of great
importance both theoretically and operationally. This is why matrices are so important:
though the fundamental notion is that of operator, thanks to the representation (13.10)
matrices become a most useful auxiliary notion that will accompany us in the rest of the
book.
Proof "If". This part is contained, essentially, in Example 558. "Only if". Let $T$ be a linear operator. Set
$$A = \left[T(e^1), T(e^2), ..., T(e^n)\right] \tag{13.11}$$
that is, $A$ is the $m \times n$ matrix whose $n$ columns are the column vectors $T(e^i)$ for $i = 1, ..., n$. We can write every $x \in \mathbb{R}^n$ as $x = \sum_{i=1}^n x_i e^i$. Therefore, for every $x \in \mathbb{R}^n$,
$$T(x) = T\left(\sum_{i=1}^n x_i e^i\right) = \sum_{i=1}^n x_i T(e^i) = x_1\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$
$$= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} a^1 \cdot x \\ a^2 \cdot x \\ \vdots \\ a^m \cdot x \end{bmatrix} = Ax$$
In particular, if B is another matrix that represents T, then
$$(a_{1n}, a_{2n}, \ldots, a_{mn}) = T(e^n) = B e^n = (b_{1n}, b_{2n}, \ldots, b_{mn})$$
and similarly for the other columns. Therefore, A = B. ∎
For the operator T : R^3 → R^3 given by
$$T(x) = (0, x_2, x_3) \qquad \forall x \in \mathbb{R}^3$$
we have
$$T(e^1) = (0, 0, 0), \quad T(e^2) = (0, 1, 0), \quad T(e^3) = (0, 0, 1)$$
and therefore
$$A = \left[\, T(e^1) \mid T(e^2) \mid T(e^3) \,\right] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Similarly, for the operator T : R^3 → R^2 given by
$$T(x) = (x_1 - x_3, x_1 + x_2 + x_3) \qquad \forall x \in \mathbb{R}^3$$
we have
$$T(e^1) = (1, 1), \quad T(e^2) = (0, 1), \quad T(e^3) = (-1, 1)$$
and therefore
$$A = \left[\, T(e^1) \mid T(e^2) \mid T(e^3) \,\right] = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 1 \end{bmatrix}$$
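The construction in the proof (stack the images of the standard basis vectors as columns) is easy to sketch in code. The following Python helper is illustrative only and not part of the text; the function name `matrix_of` is ours:

```python
def matrix_of(T, n):
    """Representing matrix of a linear operator T: R^n -> R^m,
    built as in (13.11): the j-th column is T(e^j), where e^j is
    the j-th standard basis vector. Returned as a list of rows."""
    cols = [T(tuple(1 if i == j else 0 for i in range(n))) for j in range(n)]
    m = len(cols[0])
    return [[cols[j][i] for j in range(n)] for i in range(m)]

# The operator T(x) = (x1 - x3, x1 + x2 + x3) of the last example:
T = lambda x: (x[0] - x[2], x[0] + x[1] + x[2])
A = matrix_of(T, 3)
```

Here `A` comes out as [[1, 0, -1], [1, 1, 1]], the matrix found above.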
Proposition 567 Let S, T : R^n → R^m be two linear operators and let λ ∈ R. If A and B are the two m × n matrices associated to S and T, respectively, then A + B is the matrix associated to the operator S + T and λA is the matrix associated to the operator λS.

For instance, if S is the operator S(x) = (0, x_2, x_3) considered above, then taking λ = 10, by Proposition 567,
$$10A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 10 \end{bmatrix}$$
is then the matrix associated to the operator 10S. N
Then, the matrix associated to the product operator ST : R^n → R^q is the q × n product matrix AB.
The product matrix AB is, therefore, the matrix representation of the product operator ST. This motivates the notion of product of matrices which, when it was introduced earlier in the chapter, might have seemed quite artificial.
Proof Let $\{e^i\}_{i=1}^n$, $\{\tilde{e}^i\}_{i=1}^q$, and $\{\bar{e}^i\}_{i=1}^m$ be, respectively, the standard bases of R^n, R^q, and R^m. We have
$$T(e^j) = B e^j = (b_{1j}, b_{2j}, \ldots, b_{mj}) = b_{1j}(1, 0, \ldots, 0) + b_{2j}(0, 1, 0, \ldots, 0) + \cdots + b_{mj}(0, 0, \ldots, 1) = \sum_{k=1}^{m} b_{kj} \bar{e}^k$$
As we saw in Section 13.2.4, the product of matrices is in general not commutative: this, indeed, reflects the lack of commutativity of the product of linear operators.
13.4 Rank
13.4.1 Linear operators
The kernel, denoted ker T, of an operator T : R^n → R^m is the set
$$\ker T = \{x \in \mathbb{R}^n : T(x) = 0\} \qquad (13.12)$$
That is, ker T = T^{-1}(0). The kernel is thus the set of the points at which the operator takes on a null value (i.e., the zero vector 0 of R^m).
Another important set is the image (or range) of T, which is defined in the usual way as
$$\operatorname{Im} T = \{y \in \mathbb{R}^m : y = T(x) \text{ for some } x \in \mathbb{R}^n\} \qquad (13.13)$$
The image is, therefore, the set of the vectors of R^m that are "reached" from R^n through the operator T.

For linear operators these two sets turn out to be vector subspaces: the kernel of the domain R^n and the image of the codomain R^m.
Lemma 570 If T ∈ L(R^n, R^m), then ker T and Im T are vector subspaces of R^n and of R^m, respectively.

Proof We show the result for ker T, leaving Im T to the reader. Let x, x′ ∈ ker T, so that T(x) = 0 and T(x′) = 0. We have to prove that αx + βx′ ∈ ker T for every α, β ∈ R. Indeed, we have
$$T(\alpha x + \beta x') = \alpha T(x) + \beta T(x') = 0 + 0 = 0$$
as desired. ∎
These vector subspaces are important when dealing with the properties of injectivity and surjectivity of linear operators. In particular, by definition the operator T is surjective when Im T = R^m, that is, when the subspace Im T coincides with the entire space R^m. As to injectivity, by exploiting the linearity of T we have the following simple characterization through a null kernel.
Proposition 571 An operator T ∈ L(R^n, R^m) is injective if and only if ker T = {0}.

Proof "If". Suppose that ker T = {0}. Let x, y ∈ R^n with x ≠ y. Since x − y ≠ 0, the hypothesis ker T = {0} implies T(x − y) ≠ 0, so T(x) ≠ T(y). "Only if". Let T : R^n → R^m be an injective linear operator and let x ∈ ker T. If x ≠ 0, then by injectivity we have the contradiction T(x) ≠ T(0) = 0. Hence, x = 0, which implies ker T = {0}. ∎
We can now state the important Rank-Nullity Theorem, which says that the dimension of R^n – that is, n – is always the sum of the dimensions of the two subspaces ker T and Im T determined by a linear operator T. To this end, we give a name to such dimensions: the dimension of Im T is called the rank of T, denoted by ρ(T), and the dimension of ker T is called the nullity of T, denoted by ν(T).

Using this terminology, we can now state and prove the result: for every T ∈ L(R^n, R^m),
$$\rho(T) + \nu(T) = n \qquad (13.14)$$
Proof Setting ρ(T) = k and ν(T) = h, let $\{y_i\}_{i=1}^k$ be a basis of the vector subspace Im T of R^m and $\{\bar{x}_i\}_{i=1}^h$ a basis of the vector subspace ker T of R^n. Since $\{y_i\}_{i=1}^k \subseteq \operatorname{Im} T$, by definition there exist k vectors $\{x_i\}_{i=1}^k$ in R^n such that T(x_i) = y_i for every i = 1, ..., k. Set
$$E = \{x_1, \ldots, x_k, \bar{x}_1, \ldots, \bar{x}_h\}$$
To prove the theorem it is sufficient to show that E is a basis of R^n. Indeed, in this case E consists of n vectors and therefore k + h = n.

First of all, we show that the set E is linearly independent. Let {α_1, ..., α_k, β_1, ..., β_h} be scalars such that
$$\sum_{i=1}^{k} \alpha_i x_i + \sum_{i=1}^{h} \beta_i \bar{x}_i = 0 \qquad (13.15)$$
Applying T to both sides of (13.15), we get
$$T\left(\sum_{i=1}^{k} \alpha_i x_i + \sum_{i=1}^{h} \beta_i \bar{x}_i\right) = T\left(\sum_{i=1}^{k} \alpha_i x_i\right) + T\left(\sum_{i=1}^{h} \beta_i \bar{x}_i\right) = 0$$
On the other hand, since $\{\bar{x}_i\}_{i=1}^h$ is a basis of ker T, we have $T\left(\sum_{i=1}^{h} \beta_i \bar{x}_i\right) = 0$. Therefore,
$$T\left(\sum_{i=1}^{k} \alpha_i x_i\right) = \sum_{i=1}^{k} \alpha_i T(x_i) = \sum_{i=1}^{k} \alpha_i y_i = 0 \qquad (13.16)$$
Being a basis, $\{y_i\}_{i=1}^k$ is a linearly independent set, so (13.16) implies α_i = 0 for every i = 1, ..., k. Therefore, (13.15) reduces to $\sum_{i=1}^{h} \beta_i \bar{x}_i = 0$, which implies β_i = 0 for every i = 1, ..., h because $\{\bar{x}_i\}_{i=1}^h$, as a basis, is a linearly independent set. Thus, we conclude that the set E is linearly independent.
It remains to show that span E = R^n. Let x ∈ R^n and consider its image T(x). By definition, T(x) ∈ Im T and therefore, since $\{y_i\}_{i=1}^k$ is a basis of Im T, there exists a set $\{\alpha_i\}_{i=1}^k \subseteq \mathbb{R}$ such that $T(x) = \sum_{i=1}^{k} \alpha_i y_i$. Since y_i = T(x_i) for every i = 1, ..., k, one obtains
$$T(x) = \sum_{i=1}^{k} \alpha_i T(x_i) = T\left(\sum_{i=1}^{k} \alpha_i x_i\right)$$
Therefore, $T\left(x - \sum_{i=1}^{k} \alpha_i x_i\right) = 0$, and so $x - \sum_{i=1}^{k} \alpha_i x_i \in \ker T$. On the other hand, $\{\bar{x}_i\}_{i=1}^h$ is a basis of ker T, and therefore there exists a set $\{\beta_i\}_{i=1}^h$ of scalars such that $x - \sum_{i=1}^{k} \alpha_i x_i = \sum_{i=1}^{h} \beta_i \bar{x}_i$. In conclusion, $x = \sum_{i=1}^{k} \alpha_i x_i + \sum_{i=1}^{h} \beta_i \bar{x}_i$, which shows that x ∈ span E, as claimed. ∎
3 In this proof we use two different zero vectors 0: the zero vector 0_{R^m} in R^m and the zero vector 0_{R^n} in R^n. For simplicity, we omit subscripts as no confusion should arise.
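Relation (13.14) is easy to check numerically. The sketch below is our own (not from the text): it computes ρ(T) as the rank of the associated matrix via plain Gaussian elimination with exact fractions, and recovers ν(T) as n − ρ(T):

```python
from fractions import Fraction

def rank(A):
    """Rank of a matrix (list of rows), computed exactly by
    Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue                      # no pivot in this column
        M[r], M[piv] = M[piv], M[r]       # bring the pivot row up
        for i in range(len(M)):
            if i != r and M[i][c] != 0:   # clear the rest of the column
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# The operator T(x) = (0, x2, x3) on R^3, whose matrix was found above:
A = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
n = 3
rho = rank(A)    # dimension of Im T
nu = n - rho     # by (13.14), the dimension of ker T
```

Here ρ(T) = 2 and ν(T) = 1: the kernel is the x₁-axis, the image the plane x₁ = 0.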
For a generic function, injectivity and surjectivity are distinct and independent properties: it is very easy to give examples of injective, but not surjective, functions and vice versa. The next important result, another remarkable consequence of the Rank-Nullity Theorem, shows that for linear "self" operators (i.e., with the same domain and codomain) the two properties turn out to be, instead, equivalent.
(i) T is bijective;

(ii) ker T = {0};

(iii) Im T = R^n.
Proof (i) trivially implies (ii). As for (ii) implies (iii), let us assume ker T = {0}. Since ν(T) = 0, (13.14) implies ρ(T) = n. Since Im T is a subspace of R^n, this implies Im T = R^n and, therefore, (ii) implies (iii).

It remains to prove that (iii) implies (i). Assume (iii), i.e., Im T = R^n. To show that T is bijective it suffices to show that it is injective. Using (13.14), from ρ(T) = n it follows that ν(T) = 0, which implies ker T = {0}. By Proposition 571, T is then injective, as desired. ∎
An equivalent way to state the second part of Corollary 575 is to say that the following conditions are equivalent:

(i) T is bijective;

(ii) ν(T) = 0;

(iii) ρ(T) = n.
Definition 576 The rank of a matrix A, denoted by ρ(A), is the maximum number of its columns that are linearly independent.
Let A be the matrix associated to a linear operator T. Since the vector subspace Im T is generated by the column vectors of A,4 we have ρ(T) ≤ ρ(A) (why?). The next result shows that, actually, equality holds: the notions of rank for operators and for matrices are consistent. In other words, the dimension of the image of a linear operator is equal to the maximum number of linearly independent columns of the matrix associated to it.
Thanks to the Rank-Nullity Theorem, the proposition has the following corollary, which shows that the linear independence of the columns is the matrix counterpart of injectivity.
So far we have considered the linear independence of the columns of A. The connection with the linear independence of the rows of A is, however, very tight, as the next important result shows. In reading it, note that the rank of the transpose matrix A^T is the maximum number of linearly independent rows of A.
Theorem 580 For every matrix A, the maximum numbers of linearly independent rows and columns coincide, i.e.,
$$\rho(A) = \rho(A^T)$$

4 Indeed, recall that the i-th column of A is T(e^i) and therefore $T(x) = T\left(\sum_{i=1}^{n} x_i e^i\right) = \sum_{i=1}^{n} x_i T(e^i)$. This shows that the image T(x) is a linear combination of the columns of A.
Proof Let A = (a_ij) ∈ M(m, n). In the proof we denote the i-th row by R_i and the j-th column by C_j. We have to prove that the subspace of R^n generated by the rows of A, called the row space of A, has the same dimension as the subspace of R^m generated by the columns of A, called the column space of A. Let r be the dimension of the row space of A, that is, r = ρ(A^T), and let {x^1, x^2, ..., x^r} ⊆ R^n be a basis of this space, where $x^j = (x_1^j, x_2^j, \ldots, x_n^j)$ for j = 1, ..., r.

Each row R_i of A can be written in a unique way as a linear combination of {x^1, x^2, ..., x^r}, that is, there exists a vector of r coefficients $(w_1^i, w_2^i, \ldots, w_r^i)$ such that
$$R_i = w_1^i x^1 + w_2^i x^2 + \cdots + w_r^i x^r \qquad (13.17)$$
Let us concentrate now on the first column of A, i.e., $C_1 = (a_{11}, a_{21}, \ldots, a_{m1})$. The first component a_{11} of C_1 is equal to the first component of R_1, the second component a_{21} of C_1 is equal to the first component of R_2, and so on until the m-th component a_{m1} of C_1, which is equal to the first component of R_m. Thanks to (13.17), we have
$$a_{i1} = w_1^i x_1^1 + w_2^i x_1^2 + \cdots + w_r^i x_1^r \qquad i = 1, \ldots, m$$
that is,
$$C_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} = x_1^1 \begin{bmatrix} w_1^1 \\ w_1^2 \\ \vdots \\ w_1^m \end{bmatrix} + x_1^2 \begin{bmatrix} w_2^1 \\ w_2^2 \\ \vdots \\ w_2^m \end{bmatrix} + \cdots + x_1^r \begin{bmatrix} w_r^1 \\ w_r^2 \\ \vdots \\ w_r^m \end{bmatrix}$$
The column C_1 of A can, therefore, be written as a linear combination of the vectors w_1, w_2, ..., w_r, where
$$w_1 = \begin{bmatrix} w_1^1 \\ w_1^2 \\ \vdots \\ w_1^m \end{bmatrix}, \quad w_2 = \begin{bmatrix} w_2^1 \\ w_2^2 \\ \vdots \\ w_2^m \end{bmatrix}, \quad \ldots, \quad w_r = \begin{bmatrix} w_r^1 \\ w_r^2 \\ \vdots \\ w_r^m \end{bmatrix}$$
In a similar way it is possible to verify that all the n columns of A can be written as linear combinations of w_1, w_2, ..., w_r. Therefore, the column space of A is generated by the r vectors w_1, w_2, ..., w_r of R^m, which implies that its dimension ρ(A) is less than or equal to r. That is,
$$\rho(A) \leq r = \rho(A^T)$$
By interchanging the rows and the columns and by repeating the same reasoning, we get
$$\rho(A^T) \leq \rho(A)$$
and therefore ρ(A) = ρ(A^T). ∎
Since the row (3, 6, 18) is obtained by multiplying the row (1, 2, 6) by 3, the set of all three rows is linearly dependent. Therefore, ρ(A^T) < 3. Instead, the two rows (3, 6, 18) and (0, 1, 3) are linearly independent, as are the rows (1, 2, 6) and (0, 1, 3). Therefore, ρ(A^T) = 2. N
Even though the maximum sets of linearly independent rows or columns can be different (in the matrix of the last example we have two different sets, both for the rows and for the columns), they have the same cardinality because ρ(A) = ρ(A^T). It is a remarkable result that, in view of Corollary 575, shows that for a linear operator T : R^n → R^n the following conditions are equivalent:
(i) T is injective;
(ii) T is surjective;
The equivalence of these conditions is one of the deepest results of linear algebra.
O.R. Sometimes one calls rank by rows the maximum number of linearly independent rows, and rank by columns what we have defined as the rank, that is, the maximum number of linearly independent columns. According to these definitions, Theorem 580 says that the rank by columns always coincides with the rank by rows. The rank is their common value. H
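Theorem 580 can be illustrated numerically. In this hedged sketch, `rank` is a small Gaussian-elimination routine of ours (not from the text), applied to a matrix built from the rows of the last example and to its transpose:

```python
from fractions import Fraction

def rank(A):
    """Maximum number of linearly independent columns,
    computed exactly by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[3, 6, 18], [0, 1, 3], [1, 2, 6]]   # rows from the example above
At = [list(col) for col in zip(*A)]      # its transpose
```

Both `rank(A)` and `rank(At)` equal 2, as Theorem 580 predicts.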
13.4.3 Properties
From Theorem 580 it follows that, if A ∈ M(m, n), we have
$$\rho(A) \leq \min\{m, n\}$$
If it happens that ρ(A) = min{m, n}, the matrix A is said to be of full (or maximum) rank. Indeed, the rank cannot assume a higher value.
Note that the rank of a matrix does not change if one permutes the places of two columns.
Without loss of generality, we can then assume that, for a matrix A of rank r, the …rst r
columns are linearly independent. This useful convention will be used several times in the
proofs below.
Point (i) shows the behavior of the rank with respect to the matrix operations of addition
and scalar multiplication. Points (ii) and (iii) are interesting properties of invariance of the
rank with respect to the product of matrices. The square matrix AT A is important in
applications and is called the Gram matrix (we will meet it in connection with the least
squares method).
Proof (i) Let r and r′ be the ranks of A and of B: there are r and r′ linearly independent columns in A and in B, respectively. If r + r′ ≥ n the result is trivial because the number of columns of A + B is n and there cannot be more than n linearly independent columns.

Let therefore r + r′ < n. We denote by a^s and by b^s, with s = 1, ..., n, the generic columns of the two matrices, so that the s-th column of A + B is a^s + b^s. We can always suppose that the r linearly independent columns of A are the first ones (i.e., a^1, ..., a^r) and that the r′ linearly independent columns of B are the last ones (i.e., b^{n−r′+1}, ..., b^n). In this way the n − (r + r′) central columns of A + B (that is, the a^s + b^s with s = r + 1, ..., n − r′) are certainly linear combinations of {a^1, ..., a^r, b^{n−r′+1}, ..., b^n} because the a^s can be written as linear combinations of {a^1, ..., a^r} and the b^s as linear combinations of {b^{n−r′+1}, ..., b^n}. It follows that the number of linearly independent columns of A + B cannot exceed r + r′. We leave to the reader the proof of the rest of the statement.
(ii) Let us prove ρ(A) = ρ(AD), leaving to the reader the proof of ρ(A) = ρ(CA) (the equality ρ(A) = ρ(CAD) then follows immediately from the other two). If A = O, the result is trivially true. Let therefore A ≠ O and let r be the rank of A; there are then r linearly independent columns, say a^1, a^2, ..., a^r, since we can always suppose that they are the first r; the others, a^{r+1}, a^{r+2}, ..., a^n, are linear combinations of the first ones. Let us now prove that the columns of AD are linear combinations of the columns of A. To this end, let A = (a_ij) and D = (d_ij); moreover, let α^i for i = 1, 2, ..., m be the rows of A, a^j for j = 1, 2, ..., n the columns of A, and d^j for j = 1, 2, ..., n the columns of D. Then
$$AD = \begin{bmatrix} \alpha^1 \\ \alpha^2 \\ \vdots \\ \alpha^m \end{bmatrix} \left[\, d^1 \mid d^2 \mid \cdots \mid d^n \,\right] = \begin{bmatrix} \alpha^1 \cdot d^1 & \alpha^1 \cdot d^2 & \cdots & \alpha^1 \cdot d^n \\ \alpha^2 \cdot d^1 & \alpha^2 \cdot d^2 & \cdots & \alpha^2 \cdot d^n \\ \vdots & & & \vdots \\ \alpha^m \cdot d^1 & \alpha^m \cdot d^2 & \cdots & \alpha^m \cdot d^n \end{bmatrix}$$
The first column of AD is, therefore, a linear combination of the columns of A. Analogously, the s-th column of AD is
$$A d^s = \sum_{j=1}^{n} d_{js} a^j \qquad (13.19)$$
again a linear combination of the columns of A, so that
$$\rho(AD) \leq \rho(A) = r \qquad (13.20)$$
Let us suppose, by contradiction, that ρ(AD) < ρ(A) = r. Then, in the linear combinations (13.19) one of the first r columns of A always has coefficient zero (otherwise, the column space of AD would have dimension at least r, since a^1, a^2, ..., a^r are linearly independent vectors of R^m). Without loss of generality, let us suppose that column a^1 is the one having coefficient zero in all the linear combinations (13.19). Then, we have d_{1s} = 0 for every s = 1, ..., n, which is a contradiction since D has full rank and cannot have a row of only zeros. Therefore, the space generated by the columns of AD has dimension at least r, that is, ρ(AD) ≥ r. Together with (13.20), this proves the result.
(iii) If A, and therefore A^T, are of full rank, the result follows from (ii). Suppose that A does not have full rank and let ρ(A) = r, with r < min{m, n}. As seen in (ii), the columns of A^T A are linear combinations of the columns of A^T, and so
$$\rho(A^T A) \leq \rho(A^T) = \rho(A) = r \qquad (13.21)$$
By assuming that the first r columns of A are linearly independent, we can write A as
$$A = \left[\, B \mid C \,\right]$$
where B is m × r and C is m × (n − r). Then
$$A^T A = \begin{bmatrix} B^T \\ C^T \end{bmatrix} \left[\, B \mid C \,\right] = \begin{bmatrix} B^T B & B^T C \\ C^T B & C^T C \end{bmatrix}$$
By property (ii), the submatrix B^T B, which is square of order r, has full rank r. Therefore, the r columns of B^T B are linearly independent vectors of R^r. Consequently, the first r columns of A^T A are linearly independent vectors of R^n (otherwise, the r columns of B^T B would not be linearly independent). The column space of A^T A thus has dimension at least r, that is, ρ(A^T A) ≥ r. Together with (13.21), this proves the result. ∎
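Property (iii), ρ(AᵀA) = ρ(A), can be checked on a small example. The following sketch uses our own helpers (exact rational arithmetic avoids rounding issues):

```python
from fractions import Fraction

def rank(A):
    """Rank via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 0, -1], [1, 1, 1]]            # a 2x3 matrix of rank 2
At = [list(col) for col in zip(*A)]
G = matmul(At, A)                      # the 3x3 Gram matrix A^T A
```

Although G is 3 × 3, its rank is 2, the same as that of A.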
The matrices in (13.22) have rank 3: in the first one the first three columns are linearly independent (they are the three versors of R^3); in the second one the first three rows are linearly independent. The matrices (13.22) are a special case of echelon matrices, which are characterized by the following properties:
properties:
(i) each row with not all elements zero has 1 as its first non-zero component, called the pivot element, or simply pivot;

(ii) the other elements of the column of a pivot are zero;

(iii) the pivots form a staircase from left to right: the pivot of a lower row is to the right of the pivot of an upper row;
(iv) the rows with all elements zero (if they exist) lie under the other rows, so in the lower
part of the matrix.
Note that a square matrix is an echelon matrix when it is diagonal, possibly with some final rows of only zeros; for example:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
Clearly, the non-zero rows (that is, the rows with at least one non-zero element) are linearly
independent. The rank of an echelon matrix is, therefore, obvious.
Lemma 583 The rank of an echelon matrix is equal to the number of non-zero rows.
410 CHAPTER 13. LINEAR FUNCTIONS AND OPERATORS
There exist some simple operations that allow us to transform any matrix A into an echelon matrix. Such operations, called elementary operations (by row),6 are:

E1. multiplying a row by a scalar λ ≠ 0;

E2. adding to a row a multiple of another row;

E3. interchanging two rows.

The three operations amount to multiplying the matrix A ∈ M(m, n) on the left by suitable m × m square matrices, called elementary. Specifically,

(i) multiplying the s-th row of A by a scalar λ amounts to multiplying, on the left, A by the elementary matrix P_s(λ) that coincides with the identity matrix I_m except that, in the place (s, s), we have λ instead of 1;

(ii) adding to the r-th row of A a multiple λ of the s-th row amounts to multiplying, on the left, A by the elementary matrix S_{rs}(λ) that coincides with the identity matrix I_m except that, in the place (r, s), we have λ instead of 0;

(iii) interchanging the r-th row and the s-th row of A amounts to multiplying, on the left, A by the elementary matrix T_{rs} that coincides with the identity matrix I_m except that the r-th row and the s-th row have been interchanged.
(i) Multiplying A by
$$P_2(\lambda) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
on the left, we get
$$P_2(\lambda) A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & 2 & 4 & 1 \\ 1 & 0 & 6 & 9 \\ 5 & 3 & 7 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 2 & 4 & 1 \\ \lambda & 0 & 6\lambda & 9\lambda \\ 5 & 3 & 7 & 4 \end{bmatrix}$$

(ii) Multiplying A by
$$S_{12}(\lambda) = \begin{bmatrix} 1 & \lambda & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
6 Though we could also define analogous elementary operations by column, we prefer not to do so and to always refer to rows, in order to avoid confusion and errors in computations. Choosing the rows over the columns does not change the results.
on the left, we get
$$S_{12}(\lambda) A = \begin{bmatrix} 3 + \lambda & 2 & 4 + 6\lambda & 1 + 9\lambda \\ 1 & 0 & 6 & 9 \\ 5 & 3 & 7 & 4 \end{bmatrix}$$
in which the second row, multiplied by λ, has been added to the first one.
(iii) Multiplying A by
$$T_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
on the left, we get
$$T_{12} A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & 2 & 4 & 1 \\ 1 & 0 & 6 & 9 \\ 5 & 3 & 7 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 6 & 9 \\ 3 & 2 & 4 & 1 \\ 5 & 3 & 7 & 4 \end{bmatrix}$$
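These examples can be replayed in code: left-multiplying A by an elementary matrix performs the corresponding row operation. A small sketch of ours (the `matmul` helper is not from the text), using λ = 2 for the operation of type S:

```python
def matmul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[3, 2, 4, 1], [1, 0, 6, 9], [5, 3, 7, 4]]

T12 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # interchange rows 1 and 2
S12 = [[1, 2, 0], [0, 1, 0], [0, 0, 1]]   # add 2 times row 2 to row 1

swapped = matmul(T12, A)   # rows 1 and 2 of A interchanged
added = matmul(S12, A)     # first row replaced by row1 + 2*row2
```

As expected, `swapped` reproduces the matrix computed in (iii) above.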
The next result, the proof of which we omit, shows the uniqueness of the echelon matrix
to which we arrive via elementary operations:
Lemma 585 Each matrix A ∈ M(m, n) is transformed, via elementary operations, into a unique echelon matrix of M(m, n).

Naturally, different matrices can be transformed into the same echelon matrix. The sequence of elementary operations that transforms a matrix A into its echelon matrix is called the Gaussian elimination procedure.
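A minimal Gaussian elimination sketch in Python (our own code, not from the text, using exact fractions) that reduces a matrix to the echelon form described by properties (i)-(iv):

```python
from fractions import Fraction

def echelon(A):
    """Echelon form of a matrix given as a list of rows:
    each pivot is normalized to 1 (E1), its column is cleared (E2),
    and rows are swapped as needed (E3)."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]          # E3: bring the pivot row up
        M[r] = [x / M[r][c] for x in M[r]]   # E1: normalize the pivot to 1
        for i in range(len(M)):
            if i != r:                        # E2: clear the pivot column
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

E = echelon([[3, 6, 18], [0, 1, 3], [1, 2, 6]])
```

For this input, `E` is [[1, 0, 0], [0, 1, 3], [0, 0, 0]]: two non-zero rows, consistent with the rank 2 found earlier and with Lemma 583.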
This lemma, whose proof is left to the reader, shows that the inverse operator T^{-1} is a linear operator too, that is, T^{-1} ∈ L(R^n). Moreover, it is easy to verify that
$$T^{-1} T = T T^{-1} = I \qquad (13.23)$$
where I is the identity operator.
The operator T is invertible, as the reader can verify, with T^{-1}(x) = Bx for every x ∈ R^2, where
$$B = \begin{bmatrix} 1 & 0 \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}$$
Finding the inverse operator is not an easy task, yet it is not just confined to guessing: later in the chapter we will discuss a procedure that allows the computation of B. N
In the last section we saw a first characterization of invertibility through the notions of rank and nullity (Corollary 575). We now give another characterization of invertibility.
Proof "Only if". Let T be invertible; (13.23) implies that (13.24) holds with S = R = T^{-1}.

"If". Assume that there exist S, R ∈ L(R^n) such that (13.24) holds. Let x, y ∈ R^n, x ≠ y. We have T(x) ≠ T(y) and, therefore, T is injective. Indeed, from T(x) = T(y) it would follow, by (13.24),
$$x = R(T(x)) = R(T(y)) = y$$
which contradicts x ≠ y. It remains to show that T is surjective. Let x ∈ R^n and set y = S(x). By (13.24), we have
$$T(y) = T(S(x)) = x$$
and, therefore, x ∈ Im T. This implies that R^n = Im T, as desired. In conclusion, T is invertible. ∎
Using (13.23) and (13.24), we have
$$S(x) = T^{-1}(T(S(x))) = T^{-1}((TS)(x)) = T^{-1}(x)$$
$$R(x) = R(T(T^{-1}(x))) = (RT)(T^{-1}(x)) = T^{-1}(x)$$
for every x ∈ R^n, and so S = R = T^{-1}.
$$A = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix} \qquad \text{and} \qquad A^{-1} = \begin{bmatrix} 1 & 0 \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}$$
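One can verify this example directly: multiplying the two matrices in either order gives the identity, as (13.23) requires. A quick sketch (the `matmul` helper is ours):

```python
from fractions import Fraction

def matmul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 0], [1, 2]]
Ainv = [[Fraction(1), Fraction(0)],
        [Fraction(-1, 2), Fraction(1, 2)]]
I2 = [[1, 0], [0, 1]]
```

Both `matmul(A, Ainv)` and `matmul(Ainv, A)` equal the identity matrix `I2`.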
Corollary 593 For a square matrix A of order n the following properties are equivalent:
(i) A is invertible;
(iv) ρ(A) = n;

(v) there exist two square matrices B and C of order n such that AB = CA = I; such matrices are unique, with B = C = A^{-1}.
Proposition 594 If the square matrices A and B of order n are invertible, then their product is invertible and
$$(AB)^{-1} = B^{-1} A^{-1}$$

Proof Let A and B be of order n and invertible. We have ρ(A) = ρ(B) = n, so that ρ(AB) = n by Proposition 582. By Corollary 593, the matrix AB is invertible. Recall from (6.11) of Section 6.4 that, for the composition of invertible functions f and g, one has (g ∘ f)^{-1} = f^{-1} ∘ g^{-1}. In particular this holds for linear operators, that is, (ST)^{-1} = T^{-1} S^{-1}, so Proposition 569 implies (AB)^{-1} = B^{-1} A^{-1}. ∎
13.6 Determinants
13.6.1 De…nition
A matrix contained in a matrix A ∈ M(m, n) is called a submatrix of A. It can be thought of as obtained from A by deleting some rows and/or columns. In particular, we denote by A_{ij} the (m − 1) × (n − 1) submatrix obtained from A by deleting row i and column j.
Definition 596 The determinant is the function det : M(n) → R defined recursively, for every A ∈ M(n), by det A = a_{11} if n = 1 and by
$$\det A = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A_{1j}$$
if n > 1.

For example, the determinant of the generic square matrix of order 2
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
is
$$\det A = (-1)^{1+1} a_{11} \det([a_{22}]) + (-1)^{1+2} a_{12} \det([a_{21}]) = a_{11} a_{22} - a_{12} a_{21}$$
For example, if
$$A = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}$$
we have det A = 2 · 3 − 4 · 1 = 2. N
Similarly, the determinant of the generic square matrix of order 3 is given by
$$\det A = (-1)^{1+1} a_{11} \det A_{11} + (-1)^{1+2} a_{12} \det A_{12} + (-1)^{1+3} a_{13} \det A_{13}$$
$$= a_{11} \det A_{11} - a_{12} \det A_{12} + a_{13} \det A_{13}$$
$$= a_{11}(a_{22} a_{33} - a_{23} a_{32}) - a_{12}(a_{21} a_{33} - a_{23} a_{31}) + a_{13}(a_{21} a_{32} - a_{22} a_{31})$$
$$= a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31}$$
N
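Definition 596 is directly executable: expansion along the first row gives a simple recursive procedure. A sketch of ours (not part of the text):

```python
def det(A):
    """Determinant by Laplace expansion along the first row:
    det A = sum_j (-1)**(1+j) * a_1j * det(A_1j)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # the submatrix A_1j: delete row 1 and column j+1
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total
```

For the 2 × 2 example above, `det([[2, 4], [1, 3]])` returns 2; for a lower triangular matrix it returns the product of the diagonal elements, as in the next example.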
Example 599 For a lower triangular matrix A we have
$$\det A = a_{11} a_{22} \cdots a_{nn}$$
that is, its determinant is simply the product of the elements of the main diagonal. Indeed, all the other products are zero because they necessarily contain a zero element of the first row.

Since det A = det A^T (Proposition 603), a similar result holds for upper triangular matrices, and so also for diagonal ones. N
Example 600 If the matrix A has all the elements of its first row zero except for the first one, which is equal to 1, then
$$\det \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \det \begin{bmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{n2} & \cdots & a_{nn} \end{bmatrix}$$
That is, the determinant coincides with the determinant of the submatrix A_{11}. Indeed, in
$$\det A = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A_{1j}$$
all the summands except for the first one are zero. More generally, for any scalar k we have
$$\det \begin{bmatrix} k & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = k \det \begin{bmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{n2} & \cdots & a_{nn} \end{bmatrix}$$
Similar properties hold also for the columns. N
The determinant of a square matrix can, therefore, be calculated through a well specified procedure – an algorithm – based on its submatrices. There exist various techniques to simplify the calculation of determinants (we will see some of them shortly) but, for our purposes, it is important to know that determinants can be calculated through algorithms.
13.6.2 Geometry
Geometrically, the determinant of a square matrix measures (with a sign!) the “space taken
up”by its column vectors. Let us try to explain this, at least in the simplest case. So, let A
be the matrix 2 2
a11 a12
A=
a21 a22
in which we assume that a11 > a12 > 0 and a22 > a21 > 0 (the other possibilities can be
similarly studied, as readers can check).
[Figure: the parallelogram OBGC spanned by the two column vectors of A, inscribed in the rectangle ODEF.]
The determinant of A is the area of the parallelogram OBGC (see the figure), i.e., twice the area of the triangle OBC that is obtained from the two column vectors of A. The area of the triangle OBC can be easily calculated by subtracting from the area of the rectangle ODEF the areas of the three triangles ODB, OCF, and BEC. Since
$$\text{area } ODEF = a_{11} a_{22}, \qquad \text{area } ODB = \frac{a_{11} a_{21}}{2}, \qquad \text{area } OCF = \frac{a_{22} a_{12}}{2}$$
$$\text{area } BEC = \frac{(a_{11} - a_{12})(a_{22} - a_{21})}{2} = \frac{a_{11} a_{22} - a_{11} a_{21} - a_{12} a_{22} + a_{12} a_{21}}{2}$$
one gets
$$\text{area } OBC = a_{11} a_{22} - \frac{a_{11} a_{21} + a_{22} a_{12} + a_{11} a_{22} - a_{11} a_{21} - a_{12} a_{22} + a_{12} a_{21}}{2} = \frac{a_{11} a_{22} - a_{12} a_{21}}{2}$$
Therefore,
$$\det A = \text{area } OBGC = a_{11} a_{22} - a_{12} a_{21}$$
The reader will immediately realize that:

(i) if we exchange the two columns, the determinant changes only its sign (because the parallelogram is covered in the opposite direction);

(ii) if the two vectors are proportional, that is, linearly dependent, the determinant is zero (because the parallelogram collapses into a segment).
13.6.3 Combinatorics
A permutation of the set of numbers N = f1; 2; :::; ng is any bijection : N ! N (Appendix
B.2). There are n! possible permutations. For example, the permutation
interchanges the …rst two elements of N and leave the others unchanged. So, it is represented
by the function : N ! N de…ned by
8
>
> 2 if k = 1
<
(k) = 1 if k = 2
>
>
:
k else
A pair k < k′ with σ(k) > σ(k′) is called an inversion of σ; a permutation is said to be even if it has an even number of inversions, and odd otherwise. The number
$$\operatorname{sgn} \sigma = \begin{cases} +1 & \text{if } \sigma \text{ is even} \\ -1 & \text{if } \sigma \text{ is odd} \end{cases}$$
is called parity. In particular, an even permutation has parity +1, while an odd permutation has parity −1.

Example 601 (i) The permutation (13.25) is odd because there is only one inversion, with k = 1 and k′ = 2. So, its parity is −1. (ii) The identity permutation σ(k) = k has, clearly, no inversions. So, it is an even permutation, with parity +1. N
Let us go back to determinants. Consider a 2 × 2 matrix A, and set N = {1, 2}. In this case there are only two permutations, σ and σ′, defined by
$$\sigma(k) = \begin{cases} 1 & \text{if } k = 1 \\ 2 & \text{if } k = 2 \end{cases} \qquad \text{and} \qquad \sigma'(k) = \begin{cases} 2 & \text{if } k = 1 \\ 1 & \text{if } k = 2 \end{cases}$$
The determinant of A is exactly the signed sum, over these two permutations, of the products $a_{1\sigma(1)} a_{2\sigma(2)}$. Indeed:
$$(\operatorname{sgn} \sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} + (\operatorname{sgn} \sigma')\, a_{1\sigma'(1)} a_{2\sigma'(2)} = a_{11} a_{22} - a_{12} a_{21}$$
The next result shows that this remarkable fact is true in general, thus providing an important
combinatorial characterization of determinants (we omit the proof).
Note that each term in the sum (13.26) contains only one element of each row and only
one element of each column. This will be crucial in the proofs of the next section.
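The combinatorial formula (13.26) can be executed verbatim with the standard library. A sketch of ours (the parity is computed by counting inversions):

```python
from itertools import permutations
from math import prod

def sgn(p):
    """Parity of a permutation p of (0, ..., n-1): (-1)**(#inversions)."""
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det_leibniz(A):
    """Determinant as the sum over all permutations sigma of
    sgn(sigma) * a_{1 sigma(1)} * ... * a_{n sigma(n)}."""
    n = len(A)
    return sum(sgn(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```

On the 2 × 2 matrix [[2, 4], [1, 3]] this gives 2, matching the Laplace expansion; it agrees with the recursive definition on any square matrix, though at cost n! it is purely illustrative.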
13.6.4 Properties
The next proposition collects the main properties of determinants, which are also useful for
their computation. In the statement “line” stands for either row or column: the properties
hold, indeed, symmetrically for both the rows and the columns of the matrix. “Parallel lines”
means two rows or two columns.
Proposition 603 Let A and B be two square matrices of the same order. Then:

(i) If a line of A is zero, then det A = 0.

(ii) If B is obtained from A by multiplying a line by a scalar k, then det B = k det A.

(iii) If B is obtained from A by interchanging two parallel lines, then det B = − det A.

(iv) If A has two equal parallel lines, then det A = 0.

(v) If a line of A is the sum of two vectors b and c, then det A is the sum of the determinants of the two matrices that are obtained by taking that line equal first to b and then to c.

(vi) If B is obtained from A by adding to a line a multiple of a parallel line, then det B = det A.

(vii) det A = det A^T.
Proof The proof relies on the combinatorial characterization of the determinant established
in Proposition 602, in particular on the observation that each term that appears in the
determinant contains exactly one element of each row and one element of each column. In
the proof we only consider rows (similar arguments hold for the columns).
(i) In all the products that constitute the determinant, there appears one element of each
row: if a row is zero, all the products are then zero. (ii) For the same reason, all the products
turn out to be multiplied by k.
(iii) By exchanging two rows, all the even permutations become odd and vice versa.
Therefore, the determinant changes sign.
(iv) Let A be the matrix that has rows i and j equal and let A^{ij} be the matrix A with such rows interchanged. By (iii), we have det A^{ij} = − det A. Nevertheless, since the two interchanged rows are equal, we have A = A^{ij}. So, det A^{ij} = det A. This is possible if and only if det A^{ij} = det A = 0.
(v) Suppose
$$A = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ a^r \\ \vdots \\ a^m \end{bmatrix} = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ b + c \\ \vdots \\ a^m \end{bmatrix}$$
and let
$$A_b = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ b \\ \vdots \\ a^m \end{bmatrix} \qquad \text{and} \qquad A_c = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ c \\ \vdots \\ a^m \end{bmatrix}$$
be the two matrices obtained by taking as r-th row b and c, respectively. Then
$$\det A = \sum_{\sigma} \operatorname{sgn} \sigma \prod_{i=1}^{n} a_{i\sigma(i)} = \sum_{\sigma} \operatorname{sgn} \sigma \left( \prod_{i \neq r} a_{i\sigma(i)} \right) (b + c)_{\sigma(r)}$$
$$= \sum_{\sigma} \operatorname{sgn} \sigma \left( \prod_{i \neq r} a_{i\sigma(i)} \right) b_{\sigma(r)} + \sum_{\sigma} \operatorname{sgn} \sigma \left( \prod_{i \neq r} a_{i\sigma(i)} \right) c_{\sigma(r)} = \det A_b + \det A_c$$
(vi) The matrix obtained from A by adding, for example, k times the first row to the second one, is
$$B = \begin{bmatrix} a^1 \\ a^2 + k a^1 \\ \vdots \\ a^m \end{bmatrix}$$
Moreover, let
$$C = \begin{bmatrix} a^1 \\ k a^1 \\ \vdots \\ a^m \end{bmatrix} \qquad \text{and} \qquad D = \begin{bmatrix} a^1 \\ a^1 \\ \vdots \\ a^m \end{bmatrix}$$
By (v), det B = det A + det C. On the other hand, by (ii) we have det C = k det D. But, since D has two equal rows, by (iv) we have det D = 0. We conclude that det B = det A.
(vii) Transposition alters neither the n! products in the sum (13.26) nor their parity. ∎
An important operational consequence of this proposition is that now we can say how the elementary operations E1-E3, which characterize the Gaussian elimination procedure, modify the determinant of A. Specifically, E1 multiplies the determinant by the scalar λ ≠ 0, E2 leaves it unchanged, and E3 changes its sign. Hence, if B is obtained from A via elementary operations, then
$$\det B = c \det A \qquad (13.27)$$
for some scalar c ≠ 0 or, equivalently, det A = 0 if and only if det B = 0. This observation leads to the following important characterization of square matrices of full rank.
Proposition 604 A square matrix A has full rank if and only if det A ≠ 0.

Proof "Only if". If A has full rank, its rows are linearly independent (Corollary 593). By Lemma 585 and Proposition 588, A can then be transformed via elementary operations into a unique echelon square matrix of full rank, that is, the identity matrix I_n. By (13.27), we conclude that det A ≠ 0.

"If". Let det A ≠ 0. Suppose, by contradiction, that A does not have full rank. Then, its rows are not linearly independent (Corollary 593), so at least one of them is a linear combination of the others. Such a row can be reduced to zero by repeatedly adding to it carefully chosen multiples of the other rows. Denote by B the matrix so transformed. By Proposition 603-(vi), det B = det A ≠ 0; but B has a zero row, so det B = 0 by Proposition 603-(i), a contradiction. ∎
Corollary 593 and the previous result jointly imply the following important result.
Corollary 605 For a square matrix A of order n the following conditions are equivalent:

(i) A is invertible;

(ii) ρ(A) = n;

(iii) det A ≠ 0.
The determinants behave well with respect to the product, as the next result shows. It
is a key property of determinants.
Theorem 606 (Binet) If A and B are two square matrices of the same order n, then
det AB = det A det B.
So, determinants commute: det AB = det BA. This is a first interesting consequence of Binet's Theorem. Since I = A^{-1} A, another interesting consequence of this result is that
$$\det A^{-1} = \frac{1}{\det A}$$
when A is invertible. Indeed, 1 = det I = det(A^{-1} A) = det A^{-1} det A.
Proof If (at least) one of the two matrices has linearly dependent rows or columns, then the statement is trivially true: since the columns of AB are linear combinations of the columns of A, and the rows of AB are linear combinations of the rows of B, in both cases AB also has linearly dependent rows or columns, so det AB = 0 = det A det B.
Suppose, therefore, that both A and B have full rank. Suppose first that the matrix A is diagonal. In this case det A = a_{11} a_{22} ⋯ a_{nn}. Moreover, we have
$$AB = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{bmatrix} = \begin{bmatrix} a_{11} b_{11} & a_{11} b_{12} & \cdots & a_{11} b_{1n} \\ a_{22} b_{21} & a_{22} b_{22} & \cdots & a_{22} b_{2n} \\ \vdots & & & \vdots \\ a_{nn} b_{n1} & a_{nn} b_{n2} & \cdots & a_{nn} b_{nn} \end{bmatrix}$$
By Proposition 603-(ii), det AB = a_{11} a_{22} ⋯ a_{nn} det B = det A det B, so the statement holds for diagonal A. In the general case, a full-rank matrix A can be reduced to diagonal form via elementary operations of the types E2 and E3, which multiply A on the left by square matrices S_{rs}(λ) and T_{rs}, respectively. Let us agree to perform first the transformations T and then the transformations S(λ). Let us suppose, moreover, that the diagonalization requires h transformations S(λ) and k transformations T. If D is the diagonal matrix obtained in this way, we then have
$$D = \underbrace{S(\lambda) S(\lambda) \cdots S(\lambda)}_{h \text{ times}} \underbrace{T T \cdots T}_{k \text{ times}} A$$
Since D is obtained from A through h elementary operations that do not modify the determinant
and k elementary operations that only change its sign, we have det D = (−1)^k det A.
Therefore, since D is diagonal, by the first part of the proof

    det DB = det D det B = (−1)^k det A det B                                (13.28)

Analogously, since the product of matrices is associative, we have

    DB = (S(λ) ⋯ S(λ) T ⋯ T A) B = (S(λ) ⋯ S(λ) T ⋯ T) (AB)

Therefore, DB is obtained from AB via h elementary operations that do not modify the
determinant and k elementary operations that only change its sign. So, as before, we have

    det DB = (−1)^k det AB                                                   (13.29)

Putting together (13.28) and (13.29), we get det AB = det A det B, as desired.
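Binet's Theorem is easy to check computationally. The sketch below is our own illustration, not part of the text: `det` implements the Laplace expansion along the first row, `matmul` is the matrix product, and the two integer matrices are arbitrary choices of ours.

```python
def det(M):
    # determinant by Laplace expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def matmul(A, B):
    # product of two square matrices of the same order
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2, 0], [3, 5, 1], [0, 1, 4]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, 2]]

# Binet: det(AB) = det(A) det(B); hence determinants commute: det(AB) = det(BA)
assert det(matmul(A, B)) == det(A) * det(B)
assert det(matmul(A, B)) == det(matmul(B, A))
```

Integer matrices make the check exact, with no floating-point tolerance needed.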
    A* = (a*ij)

with i, j = 1, 2, ..., n. The transpose (A*)ᵀ is sometimes called the (classical) adjoint matrix.
Similarly, computing the remaining algebraic complements, we conclude that

    A* = [ 16  26  27
           12   4  15
            6   2  16 ]

N
Using the notion of algebraic complement, the definition of the determinant of a square
matrix (Definition 596) can be viewed as the sum of the products of the elements of the first
row by their algebraic complements, that is,

    det A = Σ_{j=1}^{n} a1j a*1j
The next result shows that, actually, there is nothing special about the first row: the
determinant can be computed using any row or column of the matrix. The choice of which
one to use is then just a matter of analytical convenience.
Proposition 608 The determinant of a square matrix A is equal to the sum of the products
of the elements of any line (row or column) by their algebraic complements.
Proof For the first row, the result is just a rephrasing of the definition of determinant. Let
us verify it for the i-th row. By points (ii) and (v) of Proposition 603 we can rewrite det A
in the following way:

    det A = det [ a11 ⋯ a1j ⋯ a1n
                   ⋮      ⋮      ⋮
                  ai1 ⋯ aij ⋯ ain                                            (13.30)
                   ⋮      ⋮      ⋮
                  an1 ⋯ anj ⋯ ann ]

          = ai1 det [ a11 ⋯ a1j ⋯ a1n          + ⋯ + aij det [ a11 ⋯ a1j ⋯ a1n
                       ⋮      ⋮      ⋮                           ⋮      ⋮      ⋮
                       1  ⋯   0  ⋯   0                           0  ⋯   1  ⋯   0
                       ⋮      ⋮      ⋮                           ⋮      ⋮      ⋮
                      an1 ⋯ anj ⋯ ann ]                         an1 ⋯ anj ⋯ ann ]

            + ⋯ + ain det [ a11 ⋯ a1j ⋯ a1n
                             ⋮      ⋮      ⋮
                             0  ⋯   0  ⋯   1
                             ⋮      ⋮      ⋮
                            an1 ⋯ anj ⋯ ann ]
Let us calculate the determinant of the submatrix relative to the term (i, j):

    det [ a11 ⋯ a1j ⋯ a1n
           ⋮      ⋮      ⋮
           0  ⋯   1  ⋯   0                                                   (13.31)
           ⋮      ⋮      ⋮
          an1 ⋯ anj ⋯ ann ]
Note that to be able to apply the definition of the determinant and to use the notion of
algebraic complement, it is necessary to bring the i-th row to the top and the j-th column
to the left, i.e., to transform the matrix in (13.31) into a matrix that has (1, 0, ..., 0) as first
row, (1, a1j, a2j, ..., ai−1,j, ai+1,j, ..., anj) as first column, and Aij as the (n − 1) × (n − 1)
South-East submatrix:

    Ã = [    1        0     ⋯     0          0      ⋯     0
            a1j      a11    ⋯   a1,j−1     a1,j+1   ⋯    a1n
             ⋮        ⋮            ⋮          ⋮            ⋮
           ai−1,j   ai−1,1  ⋯  ai−1,j−1   ai−1,j+1  ⋯   ai−1,n
           ai+1,j   ai+1,1  ⋯  ai+1,j−1   ai+1,j+1  ⋯   ai+1,n
             ⋮        ⋮            ⋮          ⋮            ⋮
            anj      an1    ⋯   an,j−1     an,j+1   ⋯    ann  ]
426 CHAPTER 13. LINEAR FUNCTIONS AND OPERATORS
The transformation requires i − 1 exchanges of adjacent rows to bring the i-th row to the
top, and j − 1 exchanges of adjacent columns to bring the j-th column to the left (leaving
the order of the other rows and columns unchanged). Clearly, we have det Ã = det Aij,
since expanding det Ã along its first row (1, 0, ..., 0) leaves only the algebraic complement
of the entry 1, and so

    det [ a11 ⋯ a1j ⋯ a1n
           ⋮      ⋮      ⋮
           0  ⋯   1  ⋯   0    = (−1)^(i+j−2) det Ã = (−1)^(i+j) det Aij = a*ij   (13.32)
           ⋮      ⋮      ⋮
          an1 ⋯ anj ⋯ ann ]

By applying formula (13.30) and using (13.32) we complete the proof.
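As a computational counterpart of Proposition 608, the following sketch (ours, with helper names we introduce) checks that expanding along any row yields the same determinant; expanding along a column amounts to expanding along the corresponding row of the transpose.

```python
def det_first_row(M):
    # reference determinant: Laplace expansion along the first row (Definition 596)
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det_first_row([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(n))

def expand_along_row(M, i):
    # sum of the products of the elements of row i by their algebraic complements
    n = len(M)
    total = 0
    for j in range(n):
        minor = [r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i]
        total += (-1) ** (i + j) * M[i][j] * det_first_row(minor)
    return total

A = [[2, -1, 3], [0, 4, 1], [5, 2, -2]]  # an arbitrary 3x3 example of ours
reference = det_first_row(A)
assert all(expand_along_row(A, i) == reference for i in range(3))
```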
The next result completes Proposition 608 by showing what happens if we use the algebraic
complements of a different row (or column).

Proposition 610 The sum of the products of the elements of any row (column) by the
algebraic complements of a different row (column) is zero.
Laplace's Theorem is the occasion to introduce the classic Kronecker delta function δ :
N × N → {0, 1} defined by

    δij = { 1  if i = j
          { 0  if i ≠ j

Here i and j are, thus, any two natural numbers (e.g., δ11 = δ33 = 1 and δ13 = δ31 = 0).
Using this function, points (i) and (ii) of Laplace's Theorem assume the following elegant
forms:

    Σ_{j=1}^{n} aij a*qj = δiq det A

and

    Σ_{i=1}^{n} aij a*iq = δjq det A
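These two elegant forms can be verified mechanically. In the sketch below (our own illustration), `cofactor(M, i, j)` computes the algebraic complement a*ij, and we check that multiplying row i by the complements of row q gives δiq det A, combining Proposition 608 (the case i = q) with Proposition 610 (the case i ≠ q).

```python
def det(M):
    # determinant by Laplace expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(n))

def cofactor(M, i, j):
    # algebraic complement a*_ij: signed determinant of the (i, j) minor
    minor = [r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i]
    return (-1) ** (i + j) * det(minor)

A = [[1, 2, 0], [3, 5, 1], [0, 1, 4]]   # an arbitrary example of ours
n, dA = len(A), det(A)
for i in range(n):
    for q in range(n):
        s = sum(A[i][j] * cofactor(A, q, j) for j in range(n))
        assert s == (dA if i == q else 0)   # = delta_iq * det A
```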
Theorem 613 A square matrix A is invertible if and only if det A ≠ 0. In this case, we
have

    A⁻¹ = (1 / det A) (A*)ᵀ                                                  (13.33)
Proof If

    A = (aij) = [ α1                 A* = (a*ij) = [ (α*)1
                  α2                                 (α*)2
                  ⋮       and                          ⋮
                  αn ]                               (α*)n ]

denote the two matrices via their rows, we have

    A (A*)ᵀ = [ α1
                α2     [ ((α*)1)ᵀ | ((α*)2)ᵀ | ⋯ | ((α*)n)ᵀ ]
                ⋮
                αn ]

where ((α*)q)ᵀ, the q-th column of (A*)ᵀ, is the transpose of the q-th row of A*. The (i, q)
entry of A (A*)ᵀ is therefore Σ_{j=1}^{n} aij a*qj, which by Laplace's Theorem equals
δiq det A; analogously, using the column version of Laplace's Theorem, the same holds for
(A*)ᵀ A. Therefore,

    A (A*)ᵀ = (A*)ᵀ A = [ det A    0    ⋯    0
                             0   det A  ⋯    0
                             ⋮     ⋮    ⋱    ⋮
                             0     0    ⋯  det A ]  = det A · In

That is,

    A ((1/det A) (A*)ᵀ) = ((1/det A) (A*)ᵀ) A = In

which allows us to conclude that

    A⁻¹ = (1 / det A) (A*)ᵀ

as desired.
Example 614 We use formula (13.33) to calculate the inverse of the matrix

    A = [ 1  2
          3  5 ]

We have det A = a11 a22 − a12 a21 = 5 − 6 = −1 and

    (A⁻¹)11 = (−1)^(1+1) det A11 / det A =  a22 / (a11 a22 − a12 a21) =  5 / (−1) = −5
    (A⁻¹)12 = (−1)^(1+2) det A21 / det A = −a12 / (a11 a22 − a12 a21) = −2 / (−1) =  2
    (A⁻¹)21 = (−1)^(2+1) det A12 / det A = −a21 / (a11 a22 − a12 a21) = −3 / (−1) =  3
    (A⁻¹)22 = (−1)^(2+2) det A22 / det A =  a11 / (a11 a22 − a12 a21) =  1 / (−1) = −1

So,

    A⁻¹ = [  a22/det A   −a12/det A     = [ −5   2
            −a21/det A    a11/det A ]       3  −1 ]

N
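Formula (13.33) translates directly into code. The sketch below is an illustration of ours (the helper names are not from the text), using exact rational arithmetic to invert the matrix of Example 614 through its adjoint.

```python
from fractions import Fraction

def det(M):
    # determinant by Laplace expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(n))

def cofactor(M, i, j):
    minor = [r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i]
    return (-1) ** (i + j) * det(minor)

def inverse(M):
    # A^{-1} = (1/det A) (A*)^T : the (i, j) entry is a*_{ji} / det A
    n, d = len(M), det(M)
    assert d != 0, "Theorem 613: A is invertible iff det A != 0"
    return [[Fraction(cofactor(M, j, i), d) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 5]]          # the matrix of Example 614
assert inverse(A) == [[-5, 2], [3, -1]]
```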
Given a square matrix A of order n, we call:

(i) principal minors the determinants of the square submatrices that are obtained by
eliminating some rows and the columns with the same indices (places);

(ii) North-West (NW) principal minors the principal minors that are obtained by eliminating
the last k rows and the last k columns, with 0 ≤ k ≤ n − 1.
The following proposition, the proof of which we omit, is very useful to determine the
rank of a matrix.
Proposition 618 (Kronecker) The following properties are equivalent for a matrix A:

(i) ρ(A) = r;

(ii) A has a non-zero minor of order r and all the minors of order r + 1 are zero;

(iii) A has a non-zero minor of order r and all the minors of order r + 1 that contain it are
zero;

(iv) A has a non-zero minor of order r and all the minors of order > r are zero.
Kronecker's Algorithm for determining the rank of a matrix is based on this proposition
and can be illustrated as follows:

(i) We look for a non-zero minor; call ρ its order and "leader" the corresponding square
submatrix.

(ii) We "border" in all the possible ways the "leader" submatrix with one of the surviving
rows and one of the surviving columns. If all such "bordered" minors (of order ρ + 1)
are zero, the rank of A is ρ and the procedure ends here. If we run into a non-zero
minor of order ρ + 1, we start again by taking it as new "leader".
⁸ The property is easy to verify and has already been used in the proof of Proposition 582.
Hence, the rank of A is at least 2. With the last two columns and the last non-used row, we
obtain the following "bordered" minors:

    det [ 6   3   9          det [ 6   3   0
          1   4   7    = 0 ;       1   4   2    = 0
          8  10   6 ]              8  10  12 ]
13.6.8 Summing up

We conclude this section by noting how the rank of a matrix is simultaneously many things
(each one of them being a possible definition of it). Indeed, it is:

(i) the maximum number of linearly independent rows of the matrix;

(ii) the maximum number of linearly independent columns of the matrix;

(iii) the maximum order of its non-zero minors;

(iv) the dimension of the image of the linear function that the matrix determines.

The rank is a multi-faceted notion that plays a key role in linear algebra and its many
applications. Operationally, the Gaussian elimination procedure and Kronecker's Algorithm
allow us to compute it.
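These characterizations suggest two independent ways of computing the rank, which the following sketch (our own, on a small example) compares: the largest order of a non-zero minor versus the number of pivots produced by Gaussian elimination.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(n))

def rank_by_minors(A):
    # characterization (iii): largest order of a non-zero minor
    m, n = len(A), len(A[0])
    for k in range(min(m, n), 0, -1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if det([[A[i][j] for j in cols] for i in rows]) != 0:
                    return k
    return 0

def rank_by_elimination(A):
    # characterization (i): number of pivots after Gaussian elimination
    M = [[Fraction(x) for x in row] for row in A]
    m, n, r = len(M), len(M[0]), 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            M[i] = [M[i][j] - f * M[r][j] for j in range(n)]
        r += 1
    return r

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]  # second row = 2 * first row
assert rank_by_minors(A) == rank_by_elimination(A) == 2
```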
In matrix form:

    A x = b                                                                  (13.34)

where A is a square n × n matrix, while x and b are n × 1 vectors in Rn. We ask two questions
concerning the system (13.34):
13.7. SQUARE LINEAR SYSTEMS 433
Existence: which conditions ensure that the system has a solution for every vector
b 2 Rn , that is, when, for any given b 2 Rn there exists a vector x 2 Rn such that
Ax = b?
Uniqueness: which conditions ensure that such a solution is unique, that is, when, for
any given b 2 Rn there exists a unique x 2 Rn such that Ax = b?
To frame the problem within what we have studied so far, consider the linear operator T :
Rn → Rn associated to A, defined by T(x) = Ax for every x ∈ Rn. The system (13.34) can
be written in functional form as

    T(x) = b

In this language:

the system admits a solution for a given b ∈ Rn if and only if b belongs to the image
Im T; in particular, the system admits a solution for every b ∈ Rn if and only if T is
surjective;

the system admits a unique solution for a given b ∈ Rn if and only if the preimage
T⁻¹(b) is a singleton; in particular, the system admits a unique solution for every
b ∈ Rn if and only if T is injective.⁹
Since injectivity and surjectivity are, by Corollary 575, equivalent properties for linear
operators from Rn into Rn , the two problems of existence and uniqueness are equivalent:
there exists a solution for the system (13.34) for every b 2 Rn if and only if such a solution
is unique.
In particular, a necessary and sufficient condition for such a unique solution to exist for
every b ∈ Rn is that the operator T is invertible, i.e., that one of the following equivalent
conditions holds:

(i) the matrix A is invertible;

(ii) det A ≠ 0;

(iii) ρ(A) = n.

The condition required is, therefore, the invertibility of the matrix A, or one of the
equivalent properties (ii) and (iii). This is the content of Cramer's Theorem, which thus
follows easily from what we learned so far.
Theorem 620 (Cramer) Let A be a square matrix of order n. The system (13.34) has
one, and only one, solution for every b 2 Rn if and only if the matrix A is invertible. In this
case, the solution is given by
    x = A⁻¹ b

⁹ Recall that a function is injective if and only if all its preimages are singletons.
Thus, the system (13.34) admits a solution for every b if and only if the matrix A is
invertible and, even more importantly, the unique solution is expressed in terms of the inverse
matrix A⁻¹. Since we are able to calculate A⁻¹ using determinants (Theorem 613), we
have obtained a procedure for solving linear systems of n equations in n unknowns: formula
x = A⁻¹b can indeed be written as

    x = (1 / det A) (A*)ᵀ b                                                  (13.35)

Using Laplace's Theorem, it is easy to show that formula (13.35), called Cramer's rule, can
be written in detail as:

    x = [ det A1 / det A
          det A2 / det A
                ⋮                                                             (13.36)
          det An / det A ]
where Ak denotes the matrix obtained by replacing the k-th column of the matrix A with
the column vector

    b = [ b1
          b2
          ⋮
          bn ]
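Cramer's rule (13.36) can be implemented verbatim, as in the sketch below (ours). The matrix is the one of Example 614; the vector b = (1, 1) is a choice of ours.

```python
from fractions import Fraction

def det(M):
    # determinant by Laplace expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(n))

def cramer(A, b):
    # x_k = det(A_k) / det(A), where A_k has column k replaced by b
    d = det(A)
    assert d != 0, "Cramer's rule requires an invertible matrix"
    n = len(A)
    return [Fraction(det([row[:k] + [b[i]] + row[k + 1:]
                          for i, row in enumerate(A)]), d)
            for k in range(n)]

A, b = [[1, 2], [3, 5]], [1, 1]
x = cramer(A, b)
assert x == [-3, 2]   # (-5*1 + 2*1, 3*1 - 1*1), as A^{-1} b would give
assert all(sum(A[i][j] * x[j] for j in range(2)) == b[i] for i in range(2))
```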
Example 621 A special case of the system (13.34) is when b = 0. Then the system is called
homogeneous and, if A is invertible, by Cramer's Theorem the unique solution is x = 0. N
Consider the system Ax = b with

    A = [ 1  2
          3  5 ]

¹⁰ Alternatively, it is possible to prove the "if" in the following, rather mechanical, way. Set x = A⁻¹b; we
have Ax = A(A⁻¹b) = (AA⁻¹)b = Ib = b, so x = A⁻¹b solves the system. It is also the unique solution.
Indeed, if x̃ ∈ Rn is another solution, we have x̃ = Ix̃ = (A⁻¹A)x̃ = A⁻¹(Ax̃) = A⁻¹b = x, as claimed.
From Example 614 we know that A is invertible. By Cramer's Theorem, the unique solution
of the system is therefore

    x = A⁻¹ b = [ −5   2   [ b1     = [ −5b1 + 2b2
                   3  −1 ]   b2 ]       3b1 − b2  ]

Using Cramer's rule (13.36), we see that

    det A = −1,   det A1 = det [ b1  2    = 5b1 − 2b2,   det A2 = det [ 1  b1    = b2 − 3b1
                                 b2  5 ]                                3  b2 ]

Therefore,

    x1 = (5b1 − 2b2) / (−1) = −5b1 + 2b2 ;   x2 = (b2 − 3b1) / (−1) = 3b1 − b2

which coincides with the solution found above. N
Consider now a general linear system of m equations in n unknowns, where it is no longer
required that n = m, i.e., the number of equations and unknowns may differ. The system
can be written in matrix form as

    A x = b

where A ∈ M(m, n) is an m × n matrix, x ∈ Rn is an n × 1 vector, and b ∈ Rm is an m × 1
vector. The square system is the special case where n = m.
A system is called solvable when it admits at least one solution, i.e., when T⁻¹(b) is
non-empty. A solvable system is, in turn:

(ii.a) determined (or uniquely solvable) when it admits only one solution, i.e., T⁻¹(b) is a
singleton;

(ii.b) undetermined when it admits infinitely many solutions, i.e., T⁻¹(b) has infinite
cardinality.¹¹

These two cases exhaust all the possibilities: if a system admits two solutions, it certainly
has infinitely many. Indeed, if x and x′ are two different solutions, that is, Ax = Ax′ = b,
then all the linear combinations αx + (1 − α)x′ with α ∈ R are also solutions of the system
because

    A(αx + (1 − α)x′) = αAx + (1 − α)Ax′ = αb + (1 − α)b = b
Using this terminology, in the case n = m Cramer's Theorem says that a square linear
system is solvable for every vector b if and only if it is determined for every such vector. In
this section we modify the analysis of the last section in two different directions:

(i) we no longer require that the numbers m and n of equations and unknowns be equal;

(ii) we study the existence and uniqueness of solutions for a given vector b (so, for a specific
system at hand), rather than for every such vector.
¹¹ Since the set T⁻¹(b) is convex, it is a singleton or it has infinite cardinality (in particular, it has the
power of the continuum), tertium non datur. We will introduce convexity in the next chapter.
13.8. GENERAL LINEAR SYSTEMS 437
To this end, let us consider the so-called augmented (or complete) matrix of the system

    A|b

of size m × (n + 1), obtained by appending to A the vector b of the known terms. The next
famous result, the Kronecker-Capelli Theorem, gives a necessary and sufficient condition for
a linear system to have a solution: the system Ax = b is solvable if and only if

    ρ(A) = ρ(A|b)                                                            (13.37)

Proof Let T : Rn → Rm be the linear operator associated to the system, which can therefore
be written as T(x) = b. The system is solvable if and only if b ∈ Im T. Since Im T is the
vector subspace of Rm generated by the columns of A, the system is solvable if and only if b
is a linear combination of such columns. That is, if and only if the matrices A and A|b have
the same number of linearly independent columns (so, the same rank).
the third row is the difference between the second and first rows. These three rows are thus
not linearly independent: ρ(A) = ρ(A|b) = 2. So, the system is solvable. N
Example 626 A homogeneous system is always solvable because the zero vector is always
a solution of the system. This is confirmed by the Kronecker-Capelli Theorem because the
ranks of A and of A|0 are always equal. N
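The Kronecker-Capelli criterion reduces solvability to two rank computations, as in the following sketch (our illustration; the rank routine is a plain Gaussian elimination and the 2 × 2 example is ours).

```python
from fractions import Fraction

def rank(M):
    # number of pivots after Gaussian elimination, with exact arithmetic
    M = [[Fraction(x) for x in row] for row in M]
    m, n, r = len(M), len(M[0]), 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            M[i] = [M[i][j] - f * M[r][j] for j in range(n)]
        r += 1
    return r

def solvable(A, b):
    # Kronecker-Capelli: Ax = b is solvable iff rank(A) = rank(A|b)
    return rank(A) == rank([row + [bi] for row, bi in zip(A, b)])

A = [[1, 1], [2, 2]]            # rank 1: the second row doubles the first
assert solvable(A, [3, 6])      # b lies in the column space
assert not solvable(A, [3, 5])  # b does not: rank(A|b) = 2 > rank(A)
```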
Note that the Kronecker-Capelli Theorem considers a given pair (A, b), while Cramer's
Theorem considers, as given, only a square matrix A. This reflects the new direction (ii)
mentioned above and, for this reason, the two theorems are only partly comparable in the
case of square matrices A. Indeed, Cramer's Theorem considers only the case ρ(A) = n, in
which condition (13.37) is automatically satisfied for every b ∈ Rn (why?). For this case,
it is more powerful than the Kronecker-Capelli Theorem: the existence holds for every vector
b and, moreover, we also have uniqueness. But, differently from Cramer's Theorem, the
Kronecker-Capelli Theorem is able to handle also the case ρ(A) < n by giving, for a given
vector b, a necessary and sufficient condition for the system to be solvable.
13.8.2 Uniqueness

We now turn our attention to the uniqueness of the solutions of a system Ax = b, whose
existence is guaranteed by the Kronecker-Capelli Theorem. The next result shows that for
uniqueness, too, it is necessary to consider the rank of the matrix A (recall that, thanks to
condition (13.18), we have ρ(A) ≤ n).
Proposition 628 Let T : Rn → Rm be a linear operator and suppose T(x̄) = b. The vectors
x ∈ Rn for which T(x) = b are those of the form x̄ + z with z ∈ ker T, and only them. That
is,

    T⁻¹(b) = {x̄ + z : z ∈ ker T}                                             (13.38)
The "only if" part of Lemma 571 (i.e., that linear and injective operators have trivial
kernels) is a special case of this result. Indeed, suppose that the linear operator T is
injective, so that T⁻¹(0) = {0}. If b = 0, we can set x̄ = 0 and (13.38) then implies
{0} = T⁻¹(0) = {0 + z : z ∈ ker T} = ker T. So, ker T = {0}.
Corollary 629 If x̄ is a solution of the system Ax = b, then all its solutions are of the form

    x̄ + z

with z a solution of the homogeneous system Ax = 0.
Therefore, once we find a solution of the system Ax = b, all the other solutions can be
found by adding to it the solutions of the homogeneous system Ax = 0. Besides its theoretical
interest, this is relevant also operationally (especially when it is significantly simpler to solve
the homogeneous system than the original one).¹²

That said, Corollary 629 allows us to prove Proposition 627.
¹² As readers will see in more advanced courses, the representation of all solutions as the sum of a particular
solution and the solutions of the associated homogeneous system holds also for the solutions of systems of
linear differential equations, as well as of linear differential equations of order n.
Proof of Proposition 627 By hypothesis, the system has at least one solution x̄. Moreover,
since ρ(A) = ρ(T), by the Rank-Nullity Theorem ρ(A) + ν(T) = n. If ρ(A) = n, we have
ν(T) = 0, that is, ker T = {0}. From Corollary 629 it follows that x̄ is the unique solution.
If, instead, ρ(A) < n we have ν(T) > 0 and therefore ker T is a non-trivial vector subspace
of Rn, with infinitely many elements. By Corollary 629, adding such elements to the solution
x̄ we find the infinitely many solutions of the system.
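The structure T⁻¹(b) = x̄ + ker T can be seen concretely. In the sketch below (entirely our own example: one equation in three unknowns, so the kernel has dimension 2), a particular solution plus any combination of kernel vectors again solves the system.

```python
# system: x1 + x2 + x3 = 3 (one equation, three unknowns)
A = [[1, 1, 1]]
b = [3]

x_bar = [3, 0, 0]                        # a particular solution
kernel_basis = [[-1, 1, 0], [-1, 0, 1]]  # a basis of the solutions of Ax = 0

def Ax(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

assert Ax(x_bar) == b
for s, t in [(0, 0), (1, 2), (-3, 5)]:
    # z = s * v1 + t * v2 ranges over ker T
    z = [s * u + t * v for u, v in zip(*kernel_basis)]
    assert Ax(z) == [0]                                  # z is in ker T
    assert Ax([p + q for p, q in zip(x_bar, z)]) == b    # x_bar + z solves Ax = b
```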
13.8.3 Summing up

Summing up, we are now able to state a general result on the resolution of linear systems
that combines the Kronecker-Capelli Theorem and Proposition 627.

The comparison of the ranks ρ(A) and ρ(A|b) with the number n of the unknowns
allows us, therefore, to establish the existence and the possible uniqueness of the solutions of
the system. If the system is square, we have ρ(A) = n if and only if ρ(A) = ρ(A|b) = n
for every b ∈ Rm.¹³ Cramer's Theorem, which was only partly comparable with the
Kronecker-Capelli Theorem, becomes a special case of the more general Theorem 630.
It is sometimes believed that a linear system of m equations in n unknowns:

(i) has a unique solution if m = n, i.e., there are as many equations as unknowns;

(ii) is undetermined if m < n, i.e., there are fewer equations than unknowns;¹⁴

(iii) is unsolvable if m > n, i.e., there are more equations than unknowns.

The idea is wrong because it might well happen that some equations are redundant:
some of them are a multiple of another or a linear combination of others (in such cases,
they would be automatically satisfied once the others are satisfied). In view of Theorem 630,
however, the claims (i) and (ii) become true provided that by m we mean the number of
non-redundant equations, that is, the rank of A: indeed, the rank counts the equations that
cannot be expressed as linear combinations of the others. H
¹³ Why? (we have already made a similar observation).
¹⁴ Sometimes we say that there are more degrees of freedom (unknowns) than constraints (equations). The
opposite holds in (iii).
1. If k < m, there are m − k rows that can be written as linear combinations of the other
k. Given that each row of A identifies an equation of the system, there are m − k
equations that, being linear combinations of the other ones, are "fictitious": they are
satisfied when the other k are satisfied. We can simply delete them, reducing in
this way the system to one with k linearly independent equations.

2. If k < n, there are n − k columns that can be written as linear combinations of the other
k (so, they are "fictitious"). The corresponding n − k "unknowns" are not really unknowns
(they are "fictitious unknowns") but can assume completely arbitrary values: for each
choice of such values, the system reduces to one with k unknowns (and k equations)
and, therefore, there is only one solution for the k "true unknowns". We can simply
assign arbitrary values to the n − k "fictitious unknowns", reducing in this way the
system to one with k unknowns.
As usual, we can assume that the k rows and the k columns that determine the rank of
A are the first ones. Let A0 be a non-singular k × k submatrix of A,¹⁵ and write the m × n
matrix A in block form as

    A = [ A0  B
          C   D ]

where A0 is k × k, B is k × (n − k), C is (m − k) × k, and D is (m − k) × (n − k). We can
then eliminate the last m − k rows and give arbitrary values, say z ∈ Rn−k, to the last
n − k unknowns, obtaining in this way the system

    A0 x0 = b0 − Bz                                                          (13.39)

in which x0 ∈ Rk is the vector that contains the only k "true" unknowns and b0 ∈ Rk is the
vector of the first k known terms.
The square system (13.39) satisfies the hypothesis of Cramer's Theorem for every z ∈
Rn−k, so it can be solved with Cramer's rule. If we call x̂0(z) the unique solution for
each given z ∈ Rn−k, the solutions of the original system Ax = b are

    (x̂0(z), z)    ∀z ∈ Rn−k
For instance, consider the system

    x1 + 2x2 + 3x3 = 3
    6x1 + 4x2 + 2x3 = 7
    5x1 + 2x2 − x3 = 4

Here

    A0 = [ 1  2      B = [ 3      C = [ 5  2 ],   D = [ −1 ],   b0 = [ 3
           6  4 ],         2 ],                                        7 ]

so that, setting b0z = b0 − Bz, the square system (13.39) becomes A0 x0 = b0z, that is,

    x1 + 2x2 = 3 − 3z
    6x1 + 4x2 = 7 − 2z

In other words, the procedure consisted in deleting the redundant equation and in assigning
an arbitrary value z to the unknown x3.

Since det A0 = −8 ≠ 0, by Cramer's rule the infinitely many solutions are described as

    x1 = (−2 − 8z) / (−8) = 1/4 + z ;   x2 = (−11 + 16z) / (−8) = 11/8 − 2z ;   x3 = z
To double check, let us substitute these values back into the original equations:

    First equation :  (1/4 + z) + 2 (11/8 − 2z) + 3z = (1 + 11)/4 + 0 · z = 3
    Second equation : 6 (1/4 + z) + 4 (11/8 − 2z) + 2z = (6 + 22)/4 + 0 · z = 7
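The one-parameter family of solutions just found can also be checked programmatically. In this sketch (ours; the coefficients are those of the example, with minus signs restored), we substitute the family into all three equations for several values of z, using exact rational arithmetic.

```python
from fractions import Fraction as F

# the three equations of the example
A = [[1, 2, 3], [6, 4, 2], [5, 2, -1]]
b = [3, 7, 4]

def solution(z):
    # the one-parameter family obtained via Cramer's rule
    return [F(1, 4) + z, F(11, 8) - 2 * z, z]

for z in [F(0), F(1), F(-7, 3)]:
    x = solution(z)
    assert all(sum(a * xi for a, xi in zip(row, x)) == bi
               for row, bi in zip(A, b))   # every equation is satisfied
```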
Alternatively, we could have noted that the second equation is the sum of the first and third
ones and then deleted the second equation rather than the third one. In this way the system
would reduce to

    x1 + 2x2 + 3x3 = 3
    5x1 + 2x2 − x3 = 4

We can now assign an arbitrary value to the first unknown, say x1 = z̃, rather than to the
third one.¹⁶ This yields the system

    2x2 + 3x3 = 3 − z̃
    2x2 − x3 = 4 − 5z̃
with matrix

    A′′ = [ 2   3
            2  −1 ]

and vectors x = (x2, x3)ᵀ and b′′z̃ = (3 − z̃, 4 − 5z̃)ᵀ. Since det A′′ = −8 ≠ 0, Cramer's rule
expresses the infinitely many solutions as

    x1 = z̃ ;   x2 = (15 − 16z̃) / 8 ;   x3 = −1/4 + z̃        z̃ ∈ R
¹⁶ The tilde on z helps to distinguish this case from the previous one.
In the first way we got x1 = 1/4 + z, while in the second one x1 = z̃. Therefore z̃ = 1/4 + z.
With such a value, the solutions just found,

    x1 = z̃ = 1/4 + z

    x2 = (15 − 16z̃)/8 = (15 − 16(1/4 + z))/8 = (15 − 4 − 16z)/8 = 11/8 − 2z

and

    x3 = −1/4 + z̃ = −1/4 + 1/4 + z = z

become the old ones. The two sets of solutions are the same, just written using two different
parameters. We invite the reader to delete the first equation and redo the calculations. N
A similar computation, in which A0⁻¹(−Bz) is calculated for each value t ∈ R assigned to
the remaining unknown x4, yields via (13.39) the corresponding values of the other unknowns,
and the resulting vectors solve the system for every t ∈ R. This confirms what was found in
Section 3.7. N
The solution procedure for systems explained above, based on Cramer's rule, is theoretically
elegant. However, from the computational viewpoint there is a better procedure, which we
do not discuss, known as the Gauss method and based on the Gaussian elimination procedure.
13.10. CODA: HAHN-BANACH ET SIMILIA 443
A function f : V → R defined on a vector subspace V of Rn is linear if, for every x, y ∈ V
and every α, β ∈ R,

    f(αx + βy) = αf(x) + βf(y)

Since V is closed with respect to sums and multiplications by a scalar, we have that
αx + βy ∈ V, and therefore this definition is well posed and generalizes Definition 529.
For instance, the plane

    V = {(x1, x2, 0) : x1, x2 ∈ R}

is a vector subspace of R3, on which f(x) = x1 + x2 defines a linear function. The Hahn-Banach
Theorem asserts that every linear function f : V → R defined on a vector subspace V of Rn
can be extended to a linear function f̄ : Rn → R, i.e., f̄(x) = f(x) for every x ∈ V.
Proof Let dim V = k ≤ n and let x¹, ..., xᵏ be a basis for V. If k = n, there is nothing to
prove since V = Rn. Otherwise, by Theorem 87, there are n − k vectors xᵏ⁺¹, ..., xⁿ such
that the overall set x¹, ..., xⁿ is a basis for Rn. Let {r_{k+1}, ..., r_n} be an arbitrary set of
n − k real numbers. By Theorem 84, note that for each vector x in Rn there exists a unique
collection of scalars {αi}_{i=1}^{n} ⊆ R such that x = Σ_{i=1}^{n} αi xⁱ. Define f̄ : Rn → R to
be such that f̄(x) = Σ_{i=1}^{k} αi f(xⁱ) + Σ_{i=k+1}^{n} αi ri. Since for each vector x the
collection {αi}_{i=1}^{n} is unique, we have that f̄ is well defined and linear (why?). Note also
that

    f̄(xⁱ) = { f(xⁱ)  for i = 1, ..., k
            { ri      for i = k + 1, ..., n

Since x¹, ..., xᵏ is a basis for V, for every x ∈ V there are k scalars {αi}_{i=1}^{k} such that
x = Σ_{i=1}^{k} αi xⁱ. Hence,

    f̄(x) = f̄(Σ_{i=1}^{k} αi xⁱ) = Σ_{i=1}^{k} αi f̄(xⁱ) = Σ_{i=1}^{k} αi f(xⁱ) = f(Σ_{i=1}^{k} αi xⁱ) = f(x)
As one can clearly infer from the proof, such an extension is far from unique: to every
set of scalars {ri}_{i=k+1}^{n}, a different extension is associated.

Example 637 Consider the previous example, with the plane V = {(x1, x2, 0) : x1, x2 ∈ R}
of R3 and the linear function f : V → R defined by f(x) = x1 + x2. By the Hahn-Banach
Theorem, there is a linear function f̄ : R3 → R such that f̄(x) = f(x) for each x ∈ V.
For example, f̄(x) = x1 + x2 + x3, but also f̄(x) = x1 + x2 + αx3 is an extension, for each
α ∈ R. This confirms the multiplicity of the extensions. N
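The multiplicity of extensions is easy to see in code. The sketch below (our illustration) builds the extensions f̄ of Example 637 for several values of α and checks that they all agree with f on V while differing outside it.

```python
def f(x):
    # f is defined only on V = {(x1, x2, 0)}; we guard the domain explicitly
    assert x[2] == 0
    return x[0] + x[1]

def f_ext(alpha):
    # the linear extension f_bar(x) = x1 + x2 + alpha * x3
    return lambda x: x[0] + x[1] + alpha * x[2]

for alpha in [-2, 0, 1, 3.5]:
    g = f_ext(alpha)
    for v in [(1, 2, 0), (-4, 7, 0), (0, 0, 0)]:
        assert g(v) == f(v)          # every extension agrees with f on V

assert f_ext(1)((0, 0, 1)) != f_ext(2)((0, 0, 1))  # but the extensions differ off V
```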
Although it may appear to be a fairly innocuous result, the Hahn-Banach Theorem is very
powerful. Let us see one of its remarkable consequences by extending Riesz's Theorem to
linear functions defined on subspaces:¹⁷ a function f : V → R defined on a vector subspace
V of Rn is linear if and only if there exists a vector a ∈ Rn such that

    f(x) = a · x    ∀x ∈ V                                                   (13.40)

Proof We prove the "only if" since the converse is obvious. Let f : V → R be a linear
function. By the Hahn-Banach Theorem, there is a linear function f̄ : Rn → R such
that f̄(x) = f(x) for each x ∈ V. By Riesz's Theorem, there is a ∈ Rn such that
f̄(x) = a · x for each x ∈ Rn. Therefore f(x) = f̄(x) = a · x for every x ∈ V, as desired.
Conceptually, the main novelty relative to this version of Riesz's Theorem is the loss of
the uniqueness of the vector a. Indeed, the proof shows that such a vector is determined by
the extension f̄ whose existence is guaranteed by the Hahn-Banach Theorem. Yet, such
extensions are far from being unique, thus implying the non-uniqueness of the vector a.

Example 639 Going back to the previous examples, we already noted that all the linear
functions f̄ : R3 → R defined by f̄(x) = x1 + x2 + αx3, with α ∈ R, extend f to R3. By
setting a = (1, 1, α), we have that f̄(x) = a · x, so that

    f(x) = a · x    ∀x ∈ V

for every α ∈ R. Hence, in this example there are infinitely many vectors a for which the
representation (13.40) holds. N
Theorem 640 Let V be a vector subspace of Rn. Every (strictly) increasing linear function
f : V → R can be extended to Rn so as to be (strictly) increasing and linear.

¹⁷ In Section 19.5 we will see an important financial application of this result.
Proof We prove the statement in the particular, yet important, case in which V ∩ Rn++ is
non-empty and f is increasing.¹⁸ We start by introducing a piece of notation which is going
to be useful.

Let W be a vector subspace of Rn such that V ⊆ W. Consider a linear function f̂ : W →
R such that f̂(x) = f(x) for all x ∈ V. In other words, f̂ extends f to the subspace W.
Define dim f̂ = dim W. Now consider the set

    N = {k ∈ {1, ..., n} : k = dim f̃ and f̃ is a monotone increasing linear extension of f}

Note that this set is not empty since it contains dim V. For, f is an extension of itself
which is linear and monotone increasing by assumption. Consider now max N. Being N
non-empty, max N is well defined. If max N = n, then the statement is proved. Indeed, in
such a case we can conclude that there exists a linear monotone increasing extension of f
whose domain is a vector subspace of Rn with dimension n, that is, the domain is Rn itself.
By contradiction, assume instead that n̄ = max N < n. It means that, in looking for an
extension of f which preserves linearity and monotonicity, one can at most find a monotone
increasing linear extension f̃ : W → R where W is a vector subspace of dimension n̄ < n.
Let x¹, ..., x^n̄ be a basis of W. Since n̄ < n, we can find at least one vector x^(n̄+1) ∈ Rn such
that x¹, ..., x^n̄, x^(n̄+1) is still a linearly independent set. Fix a vector x̄ ∈ V ∩ Rn++. Clearly,
we have that x̄ ∈ V ⊆ W and for each z ∈ Rn there exists m ∈ N such that −mx̄ ≤ z ≤ mx̄.
Let U = {x ∈ W : x ≥ x^(n̄+1)} and L = {y ∈ W : x^(n̄+1) ≥ y}. Since −mx̄ ≤ x^(n̄+1) ≤ mx̄
for some m ∈ N and ±mx̄ ∈ W, both sets are non-empty. Consider now f̃(U) and f̃(L), which
are both subsets of the real line. Since f̃ is monotone increasing, it is immediate to see that
each element of f̃(U) is greater than or equal to each element of f̃(L). By the separation
property of the real line, there exists c ∈ R such that a ≥ c ≥ b for every a ∈ f̃(U) and
every b ∈ f̃(L). Observe also that each vector x ∈ span{x¹, ..., x^n̄, x^(n̄+1)} can be written in
a unique way as x = yx + λx x^(n̄+1), where yx ∈ W and λx ∈ R (why?).

Define now f̂ : span{x¹, ..., x^n̄, x^(n̄+1)} → R to be such that f̂(x) = f̃(yx) + λx c for
every x ∈ span{x¹, ..., x^n̄, x^(n̄+1)}. We leave it to the reader to verify that f̂ is indeed
linear and that f̂ extends f. Note instead that f̂ is positive, that is, f̂(x) ≥ 0 for all x ∈
span{x¹, ..., x^n̄, x^(n̄+1)} ∩ Rn+. Otherwise, there would exist x ∈ span{x¹, ..., x^n̄, x^(n̄+1)}
such that x ≥ 0 and f̂(x) < 0. If λx = 0, then yx = yx + λx x^(n̄+1) = x ≥ 0 and, since f̃ is
monotone increasing, 0 > f̂(x) = f̃(yx) ≥ 0, a contradiction. If λx ≠ 0 (say λx > 0; the case
λx < 0 is handled symmetrically, using U in place of L), then x^(n̄+1) ≥ −yx/λx and
c < f̃(−yx/λx). In other words, −yx/λx belongs to L, thus f̃(−yx/λx) ∈ f̃(L) and
c ≥ f̃(−yx/λx) > c, a contradiction.

Since we just showed that f̂ must be positive, by Proposition 538 f̂ is monotone increasing
as well. To sum up, we have constructed a function (namely f̂) which extends f to a vector
subspace of dimension n̄ + 1 (namely span{x¹, ..., x^n̄, x^(n̄+1)}), thus max N ≥ n̄ + 1. At the
same time, our working hypothesis was that n̄ = max N, thus reaching a contradiction.
Going back to the previous examples, the increasing linear function f(x) = x1 + x2 on the
plane V admits f̄(x) = x1 + x2 + x3 as an increasing linear extension for it on R3. Note that
there may be non-monotone linear extensions: it is enough to consider f̄(x) = x1 + x2 + αx3
with α < 0.
The last theorem and Proposition 539 lead to the following monotone version of Riesz's
Theorem: every increasing linear function f : V → R can be represented through a positive
vector a ∈ Rn+, that is,

    f(x) = a · x    ∀x ∈ V

A similar result holds for strong monotonicity. In this regard, note that the function
f(x) = x1 + x2 is strongly positive, and so is f̄(x) = x1 + x2 + αx3 with α > 0.
A nice dividend of the Hahn-Banach Theorem is the following extension result for affine
functions, which will be introduced momentarily in the next chapter (they play a key role in
applications; cf. Chapter 34).
Proof of the Claim We start by proving that the statement is true when γ = 0. Let
x, y ∈ C and α, β ∈ R be such that α + β = 1 as well as αx + βy ∈ C. We have two cases:
either α, β ≥ 0 or at least one of the two is strictly negative. In the first case, since α + β = 1,
we have that α, β ≤ 1. Since f is affine and β = 1 − α, this implies that (13.42) holds.

In the second case, without loss of generality, we can assume that β < 0. Since α + β = 1,
we have that α = 1 − β > 1. Define w = αx + (1 − α)y = αx + βy ∈ C. Define λ = 1/α and
note that λ ∈ (0, 1). Observe that x = λw + (1 − λ)y. Since f is affine, we have that

    f(x) = f(λw + (1 − λ)y) = λf(w) + (1 − λ)f(y) = (1/α) f(αx + (1 − α)y) + (1 − 1/α) f(y)

By rearranging terms, we get that (13.42) holds. We next prove that (13.41) holds. Let us
now consider the more general case, that is, x, y, z ∈ C and α, β, γ ∈ R such that α + β + γ = 1
and αx + βy + γz ∈ C. When α + β ≠ 0, set λ = α + β, so that 1 − λ = γ, and
w = (α/(α + β))x + (β/(α + β))y. On the one hand,

    f(αx + βy + γz) = f(λw + (1 − λ)z) = λf(w) + (1 − λ)f(z)
                    = (α + β) f((α/(α + β))x + (β/(α + β))y) + (1 − λ)f(z)
                    = αf(x) + βf(y) + (1 − λ)f(z)
                    = αf(x) + βf(y) + γf(z)

and, grouping the other way,

    f(αx + βy + γz) = f(γz + (1 − γ)w) = γf(z) + (1 − γ)f(w)
                    = γf(z) + (α + β) f((α/(α + β))x + (β/(α + β))y)
                    = αf(x) + βf(y) + γf(z)
We can now prove the main statement. We do so by further assuming that
0 ∈ C and f(0) = 0. We will show that f admits a linear extension to Rn. This will
prove the statement in this particular case (why?). If C = {0}, then any linear function
extends f and so any linear function is an affine extension of f. Assume C ≠ {0}. Since
{0} ≠ C ⊆ Rn, there exists a linearly independent collection {x¹, ..., xᵏ} ⊆ C with 1 ≤
k ≤ n. Let k be the maximum number of linearly independent vectors of C. Note that
C ⊆ span{x¹, ..., xᵏ}. Otherwise, there would exist a vector x̄ in C that does not belong
to span{x¹, ..., xᵏ}. Now, observe that if we consider a collection {α} ∪ {αi}_{i=1}^{k} ⊆ R
of k + 1 scalars such that αx̄ + Σ_{i=1}^{k} αi xⁱ = 0, then we have two cases: either α ≠ 0
or α = 0. In the former case, we could conclude that x̄ = Σ_{i=1}^{k} (−αi/α) xⁱ ∈
span{x¹, ..., xᵏ}, a contradiction with x̄ ∉ span{x¹, ..., xᵏ}. In the latter case, we could
conclude that Σ_{i=1}^{k} αi xⁱ = 0. Since the vectors x¹, ..., xᵏ are linearly independent, it
follows that αi = 0 for all i ∈ {1, ..., k}, proving that x¹, ..., xᵏ, x̄ are linearly independent,
a contradiction with the fact that {x¹, ..., xᵏ} contains the maximum number of linearly
independent vectors of C. Define f̄ : span{x¹, ..., xᵏ} → R by f̄(x) = Σ_{i=1}^{k} αi f(xⁱ),
where {αi}_{i=1}^{k} is the unique collection of scalars such that x = Σ_{i=1}^{k} αi xⁱ. By
construction, f̄ is linear (why?). Next, we show it extends f. Let x ∈ C. There exists a
unique collection of
scalars {αi}_{i=1}^{k} such that x = Σ_{i=1}^{k} αi xⁱ. Divide these scalars in three sets: the set
P of the indices i such that αi > 0, the set N of the indices i such that αi < 0, and the set
of the indices i such that αi = 0. Define λ = Σ_{i∈P} αi and μ = Σ_{i∈N} αi. We have four
cases. If P = N = ∅, then x = 0 and

    f̄(x) = Σ_{i=1}^{k} αi f(xⁱ) = 0 = f(0) = f(x)

If P ≠ ∅ and N = ∅, then, using the Claim,

    f̄(x) = Σ_{i=1}^{k} αi f(xⁱ) = Σ_{i∈P} λ (αi/λ) f(xⁱ) = λ Σ_{i∈P} (αi/λ) f(xⁱ)

         = λ f(Σ_{i∈P} (αi/λ) xⁱ) = λ f(Σ_{i∈P} (αi/λ) xⁱ) + (1 − λ) f(0)

         = f(λ Σ_{i∈P} (αi/λ) xⁱ + (1 − λ) 0) = f(Σ_{i∈P} αi xⁱ) = f(x)

If P = ∅ and N ≠ ∅, then, analogously,

    f̄(x) = Σ_{i=1}^{k} αi f(xⁱ) = Σ_{i∈N} μ (αi/μ) f(xⁱ) = μ Σ_{i∈N} (αi/μ) f(xⁱ)

         = μ f(Σ_{i∈N} (αi/μ) xⁱ) = μ f(Σ_{i∈N} (αi/μ) xⁱ) + (1 − μ) f(0)

         = f(μ Σ_{i∈N} (αi/μ) xⁱ + (1 − μ) 0) = f(Σ_{i∈N} αi xⁱ) = f(x)
Concave functions

Definition 643 A set C in Rn is said to be convex if, for every pair of points x, y ∈ C,

    αx + (1 − α)y ∈ C    ∀α ∈ [0, 1]

Consider the combination

    αx + (1 − α)y                                                            (14.1)

which, when α varies in [0, 1], represents geometrically the points of the segment that joins
x with y. A set C is convex if it contains the segment (14.1) that joins any two points x
and y of C.
451
Other examples of convex and non-convex sets can be given graphically (figures omitted).
Example 644 (i) On the real line the only convex sets are the intervals, bounded or unbounded. Convex sets can, therefore, be seen as the generalization to $\mathbb{R}^n$ of the notion of interval. (ii) The neighborhoods $B_\varepsilon(x) = \{y \in \mathbb{R}^n : \|x - y\| < \varepsilon\}$ of $\mathbb{R}^n$ are convex. Indeed, let $y', y'' \in B_\varepsilon(x)$ and $\lambda \in [0,1]$. By the properties of the norm (Proposition 102),
$$\|x - (\lambda y' + (1-\lambda) y'')\| = \|\lambda x + (1-\lambda) x - \lambda y' - (1-\lambda) y''\| = \|\lambda (x - y') + (1-\lambda)(x - y'')\| \leq \lambda \|x - y'\| + (1-\lambda) \|x - y''\| < \varepsilon$$
Therefore, $\lambda y' + (1-\lambda) y'' \in B_\varepsilon(x)$, which proves that the set $B_\varepsilon(x)$ is convex. N
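The convexity of the neighborhoods $B_\varepsilon(x)$ can also be checked numerically. A minimal sketch (the helper names `in_ball` and `segment_in_ball` are ours, not the book's): draw two points of a ball and verify that the sampled segment joining them stays inside.

```python
import random

def in_ball(y, x, eps):
    # membership test for the neighborhood B_eps(x) = {y : ||x - y|| < eps}
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5 < eps

def segment_in_ball(y1, y2, x, eps, steps=50):
    # sample the segment l*y1 + (1-l)*y2 for l in [0,1] and check every
    # sampled point stays inside B_eps(x)
    return all(
        in_ball([l * a + (1 - l) * b for a, b in zip(y1, y2)], x, eps)
        for l in (k / steps for k in range(steps + 1))
    )

random.seed(0)
x, eps = [0.0, 0.0], 1.0
points = []
while len(points) < 2:  # rejection-sample two points of the ball
    cand = [random.uniform(-1, 1), random.uniform(-1, 1)]
    if in_ball(cand, x, eps):
        points.append(cand)
ball_is_convex_here = segment_in_ball(points[0], points[1], x, eps)
```

Of course a finite sample cannot prove convexity; the proof above does, and the sketch merely illustrates it.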
Let us see a first topological property of convex sets (for brevity, we omit its proof).

Proposition 645 The closure and the interior of a convex set are convex sets.
The converse does not hold: a non-convex set may well have a convex interior or closure. For example, the set $[2, 5] \cup \{7\} \subseteq \mathbb{R}$ is not convex (it is not an interval), but its interior $(2, 5)$ is; the set $(0, 1) \cup (1, 5) \subseteq \mathbb{R}$ is not convex, but its closure $[0, 5]$ is. Even more interesting is to consider a square in the plane and to remove from it a point on a side that is not a vertex; the resulting set is not convex, yet both its closure and its interior are so.
Proposition 646 The intersection of any collection of convex sets is a convex set.
In contrast, a union of convex sets is not necessarily convex. For example, $(0, 1) \cup (2, 5)$ is not a convex set although both sets $(0, 1)$ and $(2, 5)$ are so.

Proof Let $\{C_i\}_{i \in I}$ be any collection of convex sets, where $i$ runs over a finite or infinite index set $I$. Let $C = \bigcap_{i \in I} C_i$. The empty set is trivially convex, so if $C = \emptyset$ the result holds. Suppose, therefore, that $C \neq \emptyset$. Let $x, y \in C$ and let $\lambda \in [0,1]$. We want to prove that $\lambda x + (1-\lambda) y \in C$. Since $x, y \in C_i$ for each $i$, we have that $\lambda x + (1-\lambda) y \in C_i$ for each $i$ because each set $C_i$ is convex. Hence, $\lambda x + (1-\lambda) y \in \bigcap_{i \in I} C_i$, as desired.
A sum $\sum_{i=1}^k \lambda_i x^i$ is called a convex (linear) combination of the vectors $\{x^i\}_{i=1}^k$ if $\lambda_i \geq 0$ for each $i$ and $\sum_{i=1}^k \lambda_i = 1$. In the case $n = 2$, $\lambda_1 + \lambda_2 = 1$ implies $\lambda_2 = 1 - \lambda_1$, hence convex combinations of two vectors have the form $\lambda x + (1-\lambda) y$ with $\lambda \in [0,1]$.
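In code, a convex combination is just a weighted average with nonnegative weights summing to one. A small sketch (the function name `convex_combination` is ours):

```python
def convex_combination(vectors, weights):
    # weights must be nonnegative and sum to 1 (up to rounding)
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-12
    n = len(vectors[0])
    # coordinate-wise weighted sum: sum_i w_i * v_i
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(n)]

# with two vectors the combination is l*x + (1-l)*y
x, y, l = [0.0, 1.0], [1.0, 0.0], 0.25
point = convex_combination([x, y], [l, 1 - l])  # -> [0.75, 0.25]
```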
Via convex combinations we can define a basic class of convex sets.
Definition 647 Given a finite collection of vectors $\{x^i\}_{i=1}^k$ of $\mathbb{R}^n$, the polytope that they generate is the set
$$\left\{ \sum_{i=1}^k \lambda_i x^i : \sum_{i=1}^k \lambda_i = 1 \text{ and } \lambda_i \geq 0 \text{ for every } i \right\}$$
Clearly, polytopes are convex sets. In particular, the polytope generated by two vectors $x$ and $y$ is the segment that joins them. On the plane, polytopes have a simple geometric interpretation that takes us back to high school. Given three vectors $x$, $y$ and $z$ of the plane (not aligned), the polytope¹
$$\{\lambda_1 x + \lambda_2 y + (1 - \lambda_1 - \lambda_2) z : \lambda_1, \lambda_2 \geq 0 \text{ and } \lambda_1 + \lambda_2 \leq 1\}$$
is the triangle generated by $x$, $y$ and $z$² (figure omitted).

¹ Note that $\{(\lambda_1, \lambda_2, \lambda_3) \in \mathbb{R}^3_+ : \lambda_1 + \lambda_2 + \lambda_3 = 1\} = \{(\lambda_1, \lambda_2, 1 - \lambda_1 - \lambda_2) : \lambda_1, \lambda_2 \geq 0 \text{ and } \lambda_1 + \lambda_2 \leq 1\}$.
² A caveat: if, for instance, $x$ lies on the segment that joins $y$ and $z$ (i.e., the vectors are linearly dependent), the triangle generated by $x$, $y$ and $z$ reduces to that segment. In this case, the vertices are only $y$ and $z$. Similar remarks apply to general polygons.
The rhombus depicted in the omitted figure
is the polytope generated by the four vectors $\{(0,1), (1,0), (-1,0), (0,-1)\}$, which are its vertices.
(ii) The five vectors $\{(0,1), (1,0), (-1,0), (0,-1), (1/2, 1/2)\}$ also generate the same rhombus
because the added vector $(1/2, 1/2)$ already belonged to the rhombus. As mentioned in the last footnote, not all vectors that generate a polygon are necessarily among its vertices. N
Proposition 649 A set is convex if and only if it is closed with respect to all convex combinations of its own elements.
In other words, a set is convex if and only if it contains all the polytopes generated by its elements (in the plane, all polygons whose vertices are elements of the set). Though they are defined in terms of segments, convex sets actually contain all polytopes. In symbols, $C$ is convex if and only if $\sum_{i=1}^k \lambda_i x^i \in C$ for every finite collection $\{x^i\}_{i=1}^k$ of vectors of $C$ and every collection $\{\lambda_i\}_{i=1}^k$ of positive scalars such that $\sum_{i=1}^k \lambda_i = 1$.
Proof The “if” is obvious because by considering the convex combinations with $n = 2$ we get Definition 643. We prove the “only if”. Let $C$ be convex and let $\{x^i\}_{i=1}^n$ be a collection of vectors of $C$ and $\{\lambda_i\}_{i=1}^n$ a collection of scalars such that $\lambda_i \geq 0$ for each $i = 1, \dots, n$ and $\sum_{i=1}^n \lambda_i = 1$. We want to prove that $\sum_{i=1}^n \lambda_i x^i \in C$. By Definition 643, this is true for $n = 2$. We proceed by induction on $n$: we assume that it is true for $n - 1$ (induction hypothesis) and show that this implies that the property holds also for $n$. If $\lambda_n = 1$ the result is trivial, so assume $\lambda_n < 1$. We have:
$$\sum_{i=1}^n \lambda_i x^i = \sum_{i=1}^{n-1} \lambda_i x^i + \lambda_n x^n = (1 - \lambda_n) \sum_{i=1}^{n-1} \frac{\lambda_i}{1 - \lambda_n} x^i + \lambda_n x^n$$
Since $\sum_{i=1}^{n-1} \lambda_i / (1 - \lambda_n) = 1$, by the induction hypothesis
$$\sum_{i=1}^{n-1} \frac{\lambda_i}{1 - \lambda_n} x^i \in C$$
and so, by the convexity of $C$,
$$(1 - \lambda_n) \sum_{i=1}^{n-1} \frac{\lambda_i}{1 - \lambda_n} x^i + \lambda_n x^n \in C$$
as desired.
The polytope generated by the versors of $\mathbb{R}^n$, that is, the set of all their convex combinations, is called simplex. For instance, the simplex of the plane
$$\Delta_1 = \{\lambda_1 e^1 + \lambda_2 e^2 : \lambda_1, \lambda_2 \geq 0 \text{ and } \lambda_1 + \lambda_2 = 1\} = \{\lambda (1,0) + (1-\lambda)(0,1) : \lambda \in [0,1]\} = \{(\lambda, 1-\lambda) : \lambda \in [0,1]\}$$
is the segment that joins the versors $e^1$ and $e^2$. The simplex of $\mathbb{R}^3$ is:
$$\begin{aligned} \Delta_2 &= \{\lambda_1 e^1 + \lambda_2 e^2 + \lambda_3 e^3 : \lambda_1, \lambda_2, \lambda_3 \geq 0 \text{ and } \lambda_1 + \lambda_2 + \lambda_3 = 1\} \\ &= \{\lambda_1 (1,0,0) + \lambda_2 (0,1,0) + (1 - \lambda_1 - \lambda_2)(0,0,1) : \lambda_1, \lambda_2 \geq 0 \text{ and } \lambda_1 + \lambda_2 \leq 1\} \\ &= \{(\lambda_1, \lambda_2, 1 - \lambda_1 - \lambda_2) : \lambda_1, \lambda_2 \geq 0 \text{ and } \lambda_1 + \lambda_2 \leq 1\} \end{aligned}$$
Graphically, $\Delta_2$ is the triangle in $\mathbb{R}^3$ with vertices $e^1$, $e^2$ and $e^3$ (figure omitted).
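Membership in the standard simplex reduces to two checks: nonnegative coordinates that sum to one. A small sketch (the name `in_simplex` and the tolerance are ours):

```python
def in_simplex(p, tol=1e-12):
    # p belongs to the standard simplex of R^n iff its coordinates are
    # nonnegative and sum to 1 (a tolerance absorbs floating-point rounding)
    return all(c >= -tol for c in p) and abs(sum(p) - 1.0) <= tol

# Delta_1: points (l, 1-l) with l in [0,1]
print(in_simplex([0.3, 0.7]))          # a point of the segment
# Delta_2: points (l1, l2, 1-l1-l2) with l1, l2 >= 0 and l1+l2 <= 1
print(in_simplex([0.2, 0.5, 0.3]))     # a point of the triangle
print(in_simplex([0.7, 0.7, -0.4]))    # sums to 1 but has a negative weight
```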
The geometric interpretation is the same as the one seen in the scalar case: a function is concave if the chord that joins any two points $(x, f(x))$ and $(y, f(y))$ of its graph lies below the graph of the function, while it is convex if the opposite happens, that is, if this chord lies above the graph of the function (figures omitted).
for every $x, y \in \mathbb{R}$ and every $\lambda \in [0,1]$. More generally, the norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is a convex function. Indeed, by the triangle inequality and the absolute homogeneity of the norm,
$$\|\lambda x + (1-\lambda) y\| \leq \|\lambda x\| + \|(1-\lambda) y\| = \lambda \|x\| + (1-\lambda) \|y\|$$
Note that a function $f$ is convex if and only if $-f$ is concave: through this simple duality, the properties of convex functions can be easily obtained from those of concave functions. Accordingly, we will consider only the properties of concave functions, leaving to the reader the simple deduction of the corresponding properties of convex functions.
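The convexity inequality for the norm can be spot-checked on random data. A minimal sketch (helper name `norm` is ours):

```python
import random

def norm(v):
    # Euclidean norm ||v|| = sqrt(sum v_i^2)
    return sum(c * c for c in v) ** 0.5

random.seed(1)
for _ in range(100):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    l = random.random()
    mix = [l * a + (1 - l) * b for a, b in zip(x, y)]
    # convexity: ||l x + (1-l) y|| <= l ||x|| + (1-l) ||y||
    assert norm(mix) <= l * norm(x) + (1 - l) * norm(y) + 1e-9
norm_is_convex_on_sample = True
```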
N.B. The domain of a concave (convex) function must be a convex set. Otherwise, in Definition 651 the combination $\lambda f(x) + (1-\lambda) f(y)$ would be defined for every $\lambda \in [0,1]$ while $f(\lambda x + (1-\lambda) y)$ would not be defined for some $\lambda \in [0,1]$. From now on we will assume, often without mentioning it, that the concave (and convex) functions that we consider are always defined on convex sets. O
An important subclass of concave functions is that of the strictly concave ones, which are the functions $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ such that
$$f(\lambda x + (1-\lambda) y) > \lambda f(x) + (1-\lambda) f(y)$$
for every $x, y \in C$, with $x \neq y$, and every $\lambda \in (0,1)$. In other words, inequality (14.3) is required here to be strict, which implies that the graph of a strictly concave function has no linear parts. In a dual way, a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is called strictly convex if
$$f(\lambda x + (1-\lambda) y) < \lambda f(x) + (1-\lambda) f(y)$$
for every $x, y \in C$, with $x \neq y$, and every $\lambda \in (0,1)$. In particular, a function is strictly convex if and only if $-f$ is strictly concave.
We give now some examples of concave and convex functions. To verify whether a function satisfies such properties using the definition is often not easy. For this reason we invite readers to resort to their geometric intuition for these examples, and wait to see later in the book some sufficient conditions based on differential calculus that greatly simplify the verification (Chapter 24).
Example 653 (i) The functions $f, g : \mathbb{R}_+ \to \mathbb{R}$ given by $f(x) = \sqrt{x}$ and $g(x) = \log x$ are strictly concave. (ii) The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ is strictly convex. (iii) The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is neither concave nor convex; however, on the interval $(-\infty, 0]$ it is strictly concave, while on $[0, \infty)$ it is strictly convex. (iv) The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x & \text{if } x \leq 1 \\ 1 & \text{if } x > 1 \end{cases}$$
Example 654 (i) The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = x_1^2 + x_2^2$ is strictly convex. (ii) Cobb-Douglas functions (Example 178) are concave (as it will be seen in Corollary 711). N
The function $f : \mathbb{R}^n \to \mathbb{R}$ given by
$$f(x) = \min_{i=1,\dots,n} x_i$$
is concave. Indeed, $\min_i (x_i + y_i) \geq \min_i x_i + \min_i y_i$ because in minimizing separately $x$ and $y$ we have more degrees of freedom than in minimizing them jointly, i.e., their sum. It then follows that, if $x, y \in \mathbb{R}^n$ and $\lambda \in [0,1]$, we have
$$f(\lambda x + (1-\lambda) y) = \min_{i=1,\dots,n} (\lambda x_i + (1-\lambda) y_i) \geq \min_{i=1,\dots,n} \lambda x_i + \min_{i=1,\dots,n} (1-\lambda) y_i = \lambda f(x) + (1-\lambda) f(y)$$
In consumer theory, $u(x) = \min_{i=1,\dots,n} x_i$ is the Leontief utility function (Example 214). N
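The concavity inequality for the Leontief function can be spot-checked numerically. A small sketch (function name ours):

```python
import random

def leontief(x):
    # u(x) = min_i x_i
    return min(x)

random.seed(2)
for _ in range(200):
    x = [random.uniform(0, 10) for _ in range(4)]
    y = [random.uniform(0, 10) for _ in range(4)]
    l = random.random()
    mix = [l * a + (1 - l) * b for a, b in zip(x, y)]
    # concavity: u(l x + (1-l) y) >= l u(x) + (1-l) u(y)
    assert leontief(mix) >= l * leontief(x) + (1 - l) * leontief(y) - 1e-9
min_is_concave_on_sample = True
```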
Since inequalities (14.3) and (14.4) are weak, it is possible that a function is at the same time concave and convex. In such a case, the function is said to be affine. In other words, a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is affine if
$$f(\lambda x + (1-\lambda) y) = \lambda f(x) + (1-\lambda) f(y)$$
for every $x, y \in C$ and every $\lambda \in [0,1]$. The notion of affine function is closely related to that of linear function.
$$f(x) = mx + q \tag{14.8}$$
with $m \in \mathbb{R}$.³ Affine functions of a single variable have, therefore, a well-known form: they are the straight lines with slope $m$ and intercept $q$. In particular, this confirms that the linear functions of a single variable are the straight lines passing through the origin, since for them $f(0) = q = 0$.
In general, expression (14.7) tells us that the value $f(x)$ of an affine function is a weighted sum, with weights $\alpha_i$, of the components $x_i$ of the argument $x$, plus a known term $q \in \mathbb{R}$. It is the simplest form that a function of several variables may assume. For example, if $\alpha = (3, 4)$ and $q = 2$, we obtain the affine function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = 3x_1 + 4x_2 + 2$.
Proof In view of Theorem 642, it is enough to prove the result for $C = \mathbb{R}^n$. “If”. Let $x, y \in \mathbb{R}^n$ and $\lambda \in [0,1]$. We have
$$f(\lambda x + (1-\lambda) y) = l(\lambda x + (1-\lambda) y) + q = \lambda l(x) + (1-\lambda) l(y) + \lambda q + (1-\lambda) q = \lambda (l(x) + q) + (1-\lambda)(l(y) + q) = \lambda f(x) + (1-\lambda) f(y)$$
“Only if”. Let $f : \mathbb{R}^n \to \mathbb{R}$ be affine and set $l(x) = f(x) - f(0)$ for every $x \in \mathbb{R}^n$. Setting $q = f(0)$, we have to show that $l$ is linear. We start by showing that
$$l(\alpha x) = \alpha l(x) \qquad \forall x \in \mathbb{R}^n, \forall \alpha \in \mathbb{R} \tag{14.9}$$
Let now $\alpha > 1$. Setting $y = \alpha x$, by what has just been proved we have
$$l(x) = l\left(\frac{y}{\alpha}\right) = \frac{1}{\alpha} l(y)$$
³ We use in the scalar case the more common letter $m$ in place of $\alpha$.
14.3 Properties
14.3.1 Concave functions and convex sets
There exists a simple characterization of concave functions $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ that uses convex sets. Namely, consider the set
$$\operatorname{hypo} f = \{(x, y) \in C \times \mathbb{R} : f(x) \geq y\} \subseteq \mathbb{R}^{n+1} \tag{14.11}$$
called the hypograph of $f$, constituted by the points $(x, y) \in \mathbb{R}^{n+1}$ that lie below the graph of the function.⁴ (Figure omitted.)
⁴ Recall that the graph is given by $\operatorname{Gr} f = \{(x, y) \in C \times \mathbb{R} : f(x) = y\} \subseteq \mathbb{R}^{n+1}$.
The next result shows that the concavity of $f$ is equivalent to the convexity of its hypograph.
Proof Let $f$ be concave, and let $(x, t), (y, z) \in \operatorname{hypo} f$ and $\lambda \in [0,1]$. By definition, $t \leq f(x)$ and $z \leq f(y)$. It follows that
$$\lambda t + (1-\lambda) z \leq \lambda f(x) + (1-\lambda) f(y) \leq f(\lambda x + (1-\lambda) y)$$
so that $\lambda (x, t) + (1-\lambda)(y, z) = (\lambda x + (1-\lambda) y, \lambda t + (1-\lambda) z) \in \operatorname{hypo} f$, and $\operatorname{hypo} f$ is convex. Conversely, let $\operatorname{hypo} f$ be convex and let $x, y \in C$ and $\lambda \in [0,1]$. Since $(x, f(x)), (y, f(y)) \in \operatorname{hypo} f$, convexity implies $(\lambda x + (1-\lambda) y, \lambda f(x) + (1-\lambda) f(y)) \in \operatorname{hypo} f$, that is,
$$\lambda f(x) + (1-\lambda) f(y) \leq f(\lambda x + (1-\lambda) y)$$
as desired.
Given $k \in \mathbb{R}$, the sets
$$\{x \in C : f(x) \geq k\}$$
are called upper contour (or superlevel) sets, denoted by $(f \geq k)$, while the sets
$$\{x \in C : f(x) \leq k\}$$
are called lower contour (or sublevel) sets, denoted by $(f \leq k)$. Clearly,
$$f^{-1}(k) = (f \geq k) \cap (f \leq k) \tag{14.12}$$
The next two figures show the upper contour sets of two scalar functions $u$. In the first figure we have a non-monotonic function with upper contour sets that are not all convex (figure omitted).
In contrast, in the second figure we have a monotonic function with upper contour sets that are convex (figure omitted).
In economics we meet upper contour sets already in the first lectures of a course in microeconomics principles. For a utility function $u : C \subseteq \mathbb{R}^n \to \mathbb{R}$, the upper contour set $(u \geq k)$ is the set of all the bundles that have utility at least equal to $k$. When $n = 2$, graphically $(u \geq k)$ is the region of the plane lying on or above the indifference curve $u^{-1}(k)$. Usually in microeconomics such regions are assumed to be convex. Indeed, it is this convexity of $(u \geq k)$ that one has in mind when one talks, improperly, of convex indifference curves.⁵
As the next result shows, this convexity holds when the utility function $u$ is concave.

Proposition 658 If $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave, then all its upper contour sets $(f \geq k)$ are convex.

⁵ This notion will be made rigorous later in the book (cf. Section 25.3).
Proof Let $x^1, x^2 \in (f \geq k)$ and $\lambda \in [0,1]$. By concavity,
$$f(\lambda x^1 + (1-\lambda) x^2) \geq \lambda f(x^1) + (1-\lambda) f(x^2) \geq \lambda k + (1-\lambda) k = k$$
so $\lambda x^1 + (1-\lambda) x^2 \in (f \geq k)$.
We have thus shown that the usual form of the indifference curves is implied by the concavity of the utility functions. That is, more rigorously, we have shown that concave functions have convex upper contour sets. The converse is not true! Think for example of any strictly increasing function $f : \mathbb{R} \to \mathbb{R}$: we have
$$(f \geq k) = [f^{-1}(k), +\infty)$$
for every $k \in \mathbb{R}$. All the upper contour sets are therefore convex, although in general such functions are not concave.⁶
The concavity of the utility functions is therefore a sufficient, but not necessary, condition for the “convexity” of the indifference curves: there exist non-concave utility functions that have indifference curves of this form. At this point it is natural to ask what is the class of functions, larger than that of the concave ones, characterized by having “convex” indifference curves. Section 14.4 will answer this question by introducing quasi-concavity.
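The cubic illustrates the one-way implication just discussed. A small sketch (on a finite grid, with names of our choosing): $f(x) = x^3$ fails the midpoint concavity test, yet every upper contour set is an interval, hence convex.

```python
def f(x):
    return x ** 3  # strictly increasing, neither concave nor convex on R

# midpoint test: f fails concavity at x = 0, y = 2
x, y = 0.0, 2.0
assert f((x + y) / 2) < (f(x) + f(y)) / 2  # 1 < 4: not concave here

# yet the upper contour set (f >= k) is the half-line [k^(1/3), +inf),
# an interval, hence convex: sample check on a grid
k = 1.0
grid = [i / 10 for i in range(-50, 51)]
upper = [t for t in grid if f(t) >= k]
# convexity of a subset of R means being an interval: no holes between
# the smallest and largest grid points in the set
assert all(f(t) >= k for t in grid if upper[0] <= t <= upper[-1])
x3_upper_contours_convex = True
```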
Proof Consider the “only if”, the converse being trivial. If $f$ is affine, it can be written as $f(x) = l(x) + q$ for every $x \in \mathbb{R}^n$ (Proposition 656). This implies that, for all $\lambda \in \mathbb{R}$ and all $x, y \in \mathbb{R}^n$,
$$f(\lambda x + (1-\lambda) y) = l(\lambda x + (1-\lambda) y) + q = \lambda l(x) + (1-\lambda) l(y) + q = \lambda f(x) + (1-\lambda) f(y)$$
as desired.
$$B(\lambda x + (1-\lambda) y) = \lambda Bx + (1-\lambda) By = \lambda b + (1-\lambda) b = b$$
So, $\lambda x + (1-\lambda) y \in A$, as desired. N
Back to our original motivation, now we can explain why we can say much more about level sets of affine functions on $\mathbb{R}^n$ than just that they are convex.
The proof of this result is just the observation, which by now should be fairly obvious, that Corollary 659 holds for $f$ defined on any affine set, not just the entire $\mathbb{R}^n$.
To fully appreciate the strength of the result, next we characterize affine sets. Vector subspaces are an important example of affine sets. Up to translations, the converse is true: any affine set is “parallel” to a vector subspace.
Proof “Only if”. Let $A = V + z$, where $V$ is a vector subspace. Let $x, y \in A$. Then, $x = x^1 + z$ and $y = x^2 + z$ for some $x^1, x^2 \in V$, and so $\lambda x + (1-\lambda) y = \lambda x^1 + (1-\lambda) x^2 + z \in V + z = A$.
“If”. Take a point $z \in A$ and set $V = A - z$. We must prove that $V$ is a vector space. Let $x \in V$, that is, $x = y - z$ for some $y \in A$. For all $\alpha \in \mathbb{R}$ we have $\alpha x = \alpha y - \alpha z = \alpha y + (1-\alpha) z - z$. As $y, z \in A$, then $\alpha y + (1-\alpha) z \in A$ and so $\alpha x \in A - z = V$. To conclude, let $x^1, x^2 \in V$, namely, $x^1 = y^1 - z$ and $x^2 = y^2 - z$. Then
$$x^1 + x^2 = y^1 + y^2 - 2z = 2\left(\frac{y^1 + y^2}{2} - z\right) \in V$$
So, $V$ is a vector space. We leave to the reader the proof of the final part of the statement.
Example 665 In the last example, $(f = 5)$ is already a vector subspace. Take $k \neq 5$, for instance $k = 0$. Take any vector $x^0$ such that $f(x^0) = 0$, say $x^0 = (-3, 1)$. It is easy to see that
$$V = (f = 0) - x^0 = \{(x_1 + 3, x_2 - 1) : f(x_1, x_2) = 0\} = \{(t + 3, 2(t + 3)) : t \in \mathbb{R}\}$$
So, affine sets correspond to the sets of solutions of linear systems. In particular, in view of Proposition 664 we can say that vector subspaces have the form $\{x \in \mathbb{R}^n : Bx = 0\}$, so they correspond to solutions of homogeneous linear systems.
Proof The “if” is contained in Example 661. We omit the proof of the converse, which relies on the last proposition.
The inequality (14.13) is known as Jensen’s inequality and is very important in applications.⁷ A dual version, with $\leq$, holds for convex functions, while for affine functions we have a “Jensen equality” $f\left(\sum_{i=1}^n \lambda_i x_i\right) = \sum_{i=1}^n \lambda_i f(x_i)$. So, affine functions preserve all affine combinations, be they with two or more elements.
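Jensen's inequality for a concave function can be spot-checked numerically; a small sketch using the logarithm (all names ours):

```python
import math
import random

random.seed(3)
f = math.log  # strictly concave on (0, inf)
xs = [random.uniform(0.5, 10) for _ in range(5)]
ws = [random.random() for _ in range(5)]
s = sum(ws)
ws = [w / s for w in ws]  # normalize the weights into a convex combination

# Jensen: f(sum_i l_i x_i) >= sum_i l_i f(x_i)
lhs = f(sum(w * x for w, x in zip(ws, xs)))
rhs = sum(w * f(x) for w, x in zip(ws, xs))
jensen_holds_on_sample = lhs >= rhs - 1e-12
```

For $f = \log$ this is the familiar fact that a weighted arithmetic mean dominates the corresponding geometric mean.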
Proof The “if” is obvious. As to the “only if” part, we proceed by induction on $n$. Let $f$ be concave. The inequality (14.13) obviously holds for $n = 2$. Suppose that it holds for $n - 1$ (induction hypothesis), i.e., $f\left(\sum_{i=1}^{n-1} \lambda_i x_i\right) \geq \sum_{i=1}^{n-1} \lambda_i f(x_i)$ for every convex combination
⁷ The inequality is named after Johan Jensen, who introduced concave functions in 1906.
as desired.
Concave functions are very well behaved; in particular, they have remarkable continuity properties.
Theorem 669 A concave function is continuous at every interior point of its domain.

Then $f$ is concave on the entire domain $[0, 1]$ and is discontinuous at $0$ and $1$, i.e., at the boundary points of the domain. In accordance with the last theorem, $f$ is continuous on $(0, 1)$, the interior of its domain $[0, 1]$. (ii) Concave functions $f : \mathbb{R}^n \to \mathbb{R}$ defined on the entire space $\mathbb{R}^n$ are continuous. N
Proof of Theorem 669 We prove the result for scalar functions. Let $f$ be a concave function defined on an interval $C$ of the real line. We will show that $f$ is continuous in every closed interval $[a, b]$ included in the interior of $C$: this will imply the continuity of $f$ on the interior of $C$.
So, let $[a, b] \subseteq \operatorname{int} C$. Let $m$ be the smaller of the two values $f(a)$ and $f(b)$; for every $x = \lambda a + (1-\lambda) b$, with $0 \leq \lambda \leq 1$, that is, for every $x \in [a, b]$, one has
$$f(x) \geq \lambda f(a) + (1-\lambda) f(b) \geq m$$
It then follows that
$$f(x) - f(y) \leq [f(x) - f(z)] \frac{|x - y|}{\varepsilon + |y - x|} \leq (M_\varepsilon - m_\varepsilon) \frac{|x - y|}{\varepsilon + |y - x|} < \frac{M_\varepsilon - m_\varepsilon}{\varepsilon} |x - y|$$
In conclusion,
$$|f(x) - f(y)| \leq k |x - y|$$
where $k = (M_\varepsilon - m_\varepsilon)/\varepsilon$. Now, if $y \to x$, that is, $|x - y| \to 0$, then $f(y) \to f(x)$. This proves the continuity of $f$ at $x$. Since $x$ is arbitrary, the statement follows.
So, concave functions on open convex sets are continuous. If we strengthen the hypothesis
on f we can weaken that on its domain, as the next interesting result shows.
When the inequality in (14.14) is strict for $\lambda \in (0,1)$ with $x \neq y$, the function $f$ is said to be strictly quasi-concave. Similarly, when the inequality in (14.15) is strict for $\lambda \in (0,1)$ with $x \neq y$, the function $f$ is said to be strictly quasi-convex.
Finally, a function $f$ is said to be quasi-affine if it is both quasi-concave and quasi-convex, that is,
$$\min \{f(x), f(y)\} \leq f(\lambda x + (1-\lambda) y) \leq \max \{f(x), f(y)\} \tag{14.16}$$
for every $x, y \in C$ and every $\lambda \in [0,1]$.
Concave functions are quasi-concave, while convex functions are quasi-convex. In particular, affine functions are quasi-affine. The converses of these implications are false, as the following example shows.
Example 673 Monotonic scalar functions (e.g., the cubic) are quasi-affine. Indeed, let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be increasing on the interval $C$ and let $x, y \in C$ and $\lambda \in [0,1]$, with $x \leq y$. Then, $x \leq \lambda x + (1-\lambda) y \leq y$ and the increasing monotonicity implies $f(x) \leq f(\lambda x + (1-\lambda) y) \leq f(y)$, that is, (14.16) holds. A similar argument applies when $f$ is decreasing. This example shows that, unlike concave functions, quasi-concave functions may be quite irregular. For instance, they might well be discontinuous at interior points of their domain (just take any discontinuous monotonic scalar function). N
Strictly concave functions are strictly quasi-concave, while strictly convex functions are strictly quasi-convex. The converses of these implications are false. In particular, note that a quasi-concave function can be strictly convex – for example, the exponential $f(x) = e^x$. The terminology must, therefore, be taken cum grano salis.
Proof Let $f$ be quasi-concave. Given a non-empty (otherwise the result is trivial) upper contour set $(f \geq k)$, let $x, y \in (f \geq k)$ and $\lambda \in [0,1]$. We have
$$f(\lambda x + (1-\lambda) y) \geq \min \{f(x), f(y)\} \geq k$$
so $\lambda x + (1-\lambda) y \in (f \geq k)$, which is therefore convex.
Quasi-concave functions are thus characterized by the convexity of their upper contour
sets. So, quasi-concavity is the weakening of the notion of concavity that answers the opening
question.
In utility theory, quasi-concave utility functions are precisely those featuring “convex” indifference curves, the usual form of indifference curves. This makes quasi-concave utility functions the most important class of utility functions. Before studying in more detail this important economic application of quasi-concavity, we close with some bad news: sums of concave functions are concave (Proposition 668), but sums of quasi-concave functions need not be quasi-concave.
$$h\left(\frac{1}{2} x + \frac{1}{2} y\right) = -\frac{1}{8} < 0 = h(x) = h(y)$$
Indifference curves For quasi-convex functions Proposition 674 holds with lower contour sets in place of the upper contour ones. As a consequence, a quasi-affine function $f$ has level curves $(f = k)$ that are convex because $(f = k) = (f \geq k) \cap (f \leq k)$. The converse is, however, false: injective functions have level curves that are singletons, so convex, but they might not be quasi-affine. For instance, take the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
This function is injective, so its level curves are, trivially, convex. But $f$ is neither quasi-concave nor quasi-convex, a fortiori not quasi-affine.
In utility theory what has just been observed shows that a sufficient, but not necessary, condition for a utility function $u$ to have convex (in a proper sense!) indifference curves is to be quasi-affine. Recall that previously we talked about convexity in an improper sense (within quotation marks) of the indifference curves, meaning by this the convexity of the upper contour sets $(u \geq k)$. Although improper, this is a common terminology. In a proper sense, the convexity of the indifference curves is the convexity of the level curves $(u = k)$. Thanks to Proposition 674, the improper convexity of the indifference curves characterizes quasi-concave utility functions, while their proper convexity is satisfied by quasi-affine utility functions (without being, however, a characterizing property of them).
(i) If $g$ is concave and $f$ is concave and increasing, then the composite function $f \circ g : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave.

Proof We show only (i), leaving (ii) to the reader. Let $x, y \in C$ and $\lambda \in [0,1]$. Thanks to the properties of the functions $f$ and $g$, we have
$$(f \circ g)(\lambda x + (1-\lambda) y) = f(g(\lambda x + (1-\lambda) y)) \geq f(\lambda g(x) + (1-\lambda) g(y)) \geq \lambda f(g(x)) + (1-\lambda) f(g(y))$$
where the first inequality uses the concavity of $g$ and the monotonicity of $f$, and the second the concavity of $f$, as desired.
Between (i) and (ii) there is an important difference: concavity is preserved by the monotonic transformation $f \circ g$ if $f$ is both increasing and concave, while, in order to preserve quasi-concavity, increasing monotonicity is sufficient. In other terms, quasi-concavity is preserved by monotone (increasing) transformations, while this is not true for concavity. For example, if $f, g : [0, \infty) \to \mathbb{R}$ are $g(x) = \sqrt{x}$ and $f(x) = x^4$, the composite function $f \circ g : [0, \infty) \to \mathbb{R}$ is the quasi-concave and strictly convex function $x^2$.⁸ So, with $f$ increasing but not concave, the concavity of $g$ only implies the quasi-concavity of $f \circ g$, nothing more.
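This example can be checked directly on a few points; a small sketch (variable names ours):

```python
def g(x):
    # concave on [0, inf)
    return x ** 0.5

def f(t):
    # strictly increasing on [0, inf), but not concave
    return t ** 4

def h(x):
    # h = f o g, i.e. x -> x**2 on [0, inf)
    return f(g(x))

x, y = 1.0, 3.0
mid = (x + y) / 2
# h is convex: the midpoint value lies below the chord ...
assert h(mid) <= (h(x) + h(y)) / 2
# ... yet h is quasi-concave on [0, inf): being increasing there,
# h(mid) >= min(h(x), h(y))
assert h(mid) >= min(h(x), h(y))
composition_checks_pass = True
```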
This difference between (i) and (ii) is important in utility theory. A property of the utility functions that is preserved under strictly increasing monotonic transformations is called ordinal, while a property that is preserved only under strictly increasing affine transformations – that is, for $f(x) = \alpha x + \beta$ with $\alpha > 0$ and $\beta \in \mathbb{R}$ – is called cardinal. Naturally, an ordinal property is also cardinal, while the converse is false. Thanks to Proposition 676, we can thus say that quasi-concavity is an ordinal property, while concavity is only cardinal.
The distinction between cardinal and ordinal properties is, conceptually, very important. Indeed, given a utility function $u : C \subseteq \mathbb{R}^n \to \mathbb{R}$ and a function $f : D \subseteq \mathbb{R} \to \mathbb{R}$, with $\operatorname{Im} u \subseteq D$, we saw in Section 6.4.4 that, when $f$ is strictly increasing, the transformation $f \circ u : C \subseteq \mathbb{R}^n \to \mathbb{R}$ of the utility function $u$ is itself a utility function equivalent to $u$. In other words, $f \circ u$ represents the same preference relation $\succsim$, which is the fundamental economic notion (Section 6.8). Indeed, what matters is how the decision maker ranks the pairs of bundles $x$ and $y$, whether $x \succsim y$ ($x$ is preferred to $y$), that is, $u(x) \geq u(y)$, or, vice versa, $y \succsim x$ ($y$ is preferred to $x$), that is, $u(y) \geq u(x)$. When $f$ is strictly increasing, the preferential ordering $\succsim$ is preserved by $f \circ u$ since
$$x \succsim y \iff u(x) \geq u(y) \iff f(u(x)) \geq f(u(y))$$
For this reason, ordinal properties – which are satisfied by $u$ and all its equivalent transformations $f \circ u$ – are characteristic of utility functions in that they are numerical representations of an underlying preference $\succsim$. In contrast, this is not true for cardinal properties, which are preserved only by positive (therefore, increasing) linear transformations $f$, so might well get lost through strictly increasing transformations that are not linear.
In light of this, the ordinal quasi-concavity, rather than the cardinal concavity, seems to be the relevant property for utility functions $u : C \subseteq \mathbb{R}^n \to \mathbb{R}$. Nevertheless, before we declare quasi-concavity to be the relevant property, in place of concavity, we have to make a last subtle observation. The monotonic transformation $f \circ u$ is quasi-concave if $u$ is concave;
⁸ Note that $x^4$ is here strictly increasing because we are considering its restriction to $[0, +\infty)$. For the same reason, $x^2$ is quasi-concave.
does the opposite also hold? That is, can any quasi-concave function be expressed in this way, as a monotonic transformation of a concave function?
If this were the case, concavity would be back in business also in an ordinalist approach:⁹ given a quasi-concave function, it would then be sufficient to consider its equivalent concave version, obtained through a suitable strictly increasing transformation.
The answer to the question is negative: there exist quasi-concave functions that are not monotonic transformations of concave functions.
$$g = f \circ h \tag{14.17}$$
$$f^{-1}(0) = f^{-1}\left(g\left(\frac{3}{4}\right)\right) = f^{-1}\left(g\left(\frac{1}{2} x + \frac{1}{2} y\right)\right) \geq \frac{1}{2} f^{-1}(g(x)) + \frac{1}{2} f^{-1}(g(y)) = \frac{1}{2} f^{-1}\left(\frac{1}{2}\right) + \frac{1}{2} f^{-1}(0)$$
that is,
$$f^{-1}(0) \geq f^{-1}\left(\frac{1}{2}\right)$$
which contradicts the fact that $f^{-1}$ is strictly increasing. This proves the claim. N
This example shows that there exist genuinely quasi-concave functions that cannot be represented as monotonic transformations of concave functions. It is the definitive proof that quasi-concavity, and not concavity, is the relevant property in an ordinalist approach. This important conclusion was reached in 1949 by Bruno de Finetti in the article in which he introduced quasi-concave functions, whose theory was then extended in 1954 by Werner Fenchel.
We have observed many times that in consumer theory we usually consider utility functions with “convex” indifference curves, that is, utility functions with convex upper contour sets.¹⁰ As observed, this is why quasi-concavity is a fundamental property of utility functions. But, what is the economic motivation for assuming convex indifference curves, that is, quasi-concave utility functions?
The answer is in the diversification principle: if two bundles of goods ensure a certain level of utility, say $k$, a convex combination of them, a mixture, $\lambda x + (1-\lambda) y$ will yield at least as much. In other words, the diversification that the compound bundle $\lambda x + (1-\lambda) y$ affords relative to the original bundles $x$ and $y$ guarantees a utility level which is not smaller than the original one, i.e., $k$. If $x = (0, 1)$ is the bundle composed of $0$ units of water and $1$ of bread, while $y = (1, 0)$ is composed of $1$ unit of water and $0$ of bread, their mixture
$$\frac{1}{2}(0, 1) + \frac{1}{2}(1, 0) = \left(\frac{1}{2}, \frac{1}{2}\right)$$
is a diversified bundle, with positive quantities of both water and bread. It is natural to think that this mixture gives a utility which is not smaller than the utility of the two original bundles.
Everything fine? Almost: we can actually sharpen what was just said. Observe that the diversification principle implies that, for every $x, y \in C$,
$$u(x) = u(y) \implies u(\lambda x + (1-\lambda) y) \geq u(x) \qquad \forall \lambda \in [0,1] \tag{PDP}$$
Indeed, by setting $k = u(x) = u(y)$, we obviously have $u(x) \geq k$ and $u(y) \geq k$, which implies $u(\lambda x + (1-\lambda) y) \geq k$ by the diversification principle. We call condition PDP the pure diversification principle. In preferential terms, the PDP takes the nice form
$$x \sim y \implies \lambda x + (1-\lambda) y \succsim x \qquad \forall \lambda \in [0,1]$$
which well expresses its nature.
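The bread-and-water illustration of the PDP can be checked directly with the Leontief utility; a small sketch (names ours):

```python
def u(x):
    # Leontief utility u(x) = min_i x_i
    return min(x)

x, y = (0.0, 1.0), (1.0, 0.0)   # the bread/water bundles from the text
assert u(x) == u(y) == 0.0      # the two bundles are indifferent

# pure diversification principle: any mixture is at least as good
for k in range(11):
    l = k / 10
    mix = [l * a + (1 - l) * b for a, b in zip(x, y)]
    assert u(mix) >= u(x)

# the half-half mixture (1/2, 1/2) is in fact strictly better
assert u([0.5, 0.5]) > u(x)
```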
The PDP is very interesting: it states that each bundle which is a mixture of indifferent bundles is preferred to the original ones. If we draw an indifference curve, we see why the weaker property PDP is often used as the property that characterizes the convexity of the indifference curves. Indeed, PDP is the purest and most intuitive form of the diversification principle: by combining two indifferent alternatives, we get a better one. Going back to the example of bread and water, it is plausible that
$$(0, 1) \sim (1, 0) \precsim \left(\frac{1}{2}, \frac{1}{2}\right)$$
¹⁰ Throughout this section the convexity of indifference curves is always to be understood in an improper sense (that, as already remarked, will be made precise in Section 25.3). For simplicity, we omit the quotation marks in the adjective “convex”.
The next result shows that, in most cases of interest for consumer theory, the two principles
turn out to be equivalent. The result uses the notion of directed set.
Definition 678 A set $C$ in $\mathbb{R}^n$ is said to be directed if, for every $x, y \in C$, there exists $z \in C$ such that $z \leq x$ and $z \leq y$.

In words, a set is directed when any pair of its elements has a common lower bound that belongs to the set. In consumer theory many sets of interest are directed. For example, all sets $C \subseteq \mathbb{R}^n_+$ that contain the origin are directed. Indeed, $0 \leq x$ for every $x \in \mathbb{R}^n_+$ and, therefore, the origin itself is the lower bound common to all the pairs of elements of $C$.
Proof Since the “only if” part is obvious, we prove the “if” part: the PDP implies the quasi-concavity of $u$. Let $x, y \in C$ and $\lambda \in [0,1]$, with $u(x) \geq u(y)$. Since $C$ is directed, there exists $z \in C$ such that $z \leq x$ and $z \leq y$. By the increasing monotonicity of $u$, we have $u(z) \leq u(x)$ and $u(z) \leq u(y)$. Let us define the auxiliary function $\varphi : [0,1] \to \mathbb{R}$ by $\varphi(t) = u(tx + (1-t) z)$ for $t \in [0,1]$. Since $C$ is convex, the function $\varphi$ is well-defined. The continuity of $u$ implies that of $\varphi$. Indeed:
$$t_n \to t \implies t_n x + (1-t_n) z \to tx + (1-t) z \implies u(t_n x + (1-t_n) z) \to u(tx + (1-t) z) \implies \varphi(t_n) \to \varphi(t)$$
Since $\varphi(0) = u(z) \leq u(y) \leq u(x) = \varphi(1)$, by the Intermediate Value Theorem the continuity of $\varphi$ implies the existence of $\bar{t} \in [0,1]$ such that $\varphi(\bar{t}) = u(y)$. By setting $w = \bar{t} x + (1 - \bar{t}) z$, we have therefore $u(w) = u(y)$. Moreover, $z \leq x$ implies $w \leq x$.
By the PDP condition, it follows that $u(\lambda w + (1-\lambda) y) \geq u(w) = u(y)$, while $w \leq x$ implies that $\lambda w + (1-\lambda) y \leq \lambda x + (1-\lambda) y$. Since $u$ is increasing, we conclude that
$$u(\lambda x + (1-\lambda) y) \geq u(\lambda w + (1-\lambda) y) \geq u(y) = \min \{u(x), u(y)\}$$
The result just proved guarantees that, under assumptions typically satisfied in consumer theory, the two possible interpretations of the convexity of the indifference curves are equivalent. We can therefore consider the pure diversification principle, which is the clearest form of the diversification principle, as the motivation for the use of quasi-concave utility functions.
What about concave functions? They satisfy the diversification principle and therefore their use does not violate the principle. Example 677 has shown, however, that there exist quasi-concave functions that are not monotonic transformations of concave functions, i.e., that do not have the form $f \circ g$ with $f$ strictly increasing and $g$ concave. In other words, quasi-concavity (so, the diversification principle) is a weaker property than concavity in ordinal utility theory.
In conclusion, the use of concave functions is consistent with the diversification principle, but it is not justified by it. Only quasi-concavity is justified by this principle, being its mathematical counterpart.¹¹
We make a last observation on the pure diversification principle that does not add much conceptually, but is useful in applications. Consider a version of condition PDP with strict inequality: for every $x \neq y$,
$$x \sim y \implies \lambda x + (1-\lambda) y \succ x \qquad \forall \lambda \in (0,1) \tag{SDP}$$
We thus obtain a strong form of the principle in which diversification is always strictly preferred by the consumer. Condition SDP is implied by the strict quasi-concavity of $u$ since
$$u(\lambda x + (1-\lambda) y) > \min \{u(x), u(y)\} = u(x)$$
Under the hypotheses of Proposition 679, it is indeed equivalent to SDP. We thus have the following version of that proposition (the proof is left to the reader).

SDP is thus the version of the diversification principle that characterizes strict quasi-concavity, a property often used in applications because it ensures the uniqueness of the solutions of optimization problems, as it will be discussed in Section 18.6.
We close by observing that, although important, the diversification principle does not have universal validity: there are cases in which it makes little sense. For example, if the bundle $(1, 0)$ consists of $1$ unit of brewer’s yeast and $0$ of cake yeast, while the bundle $(0, 1)$ consists of $1$ unit of cake yeast and $0$ of brewer’s yeast, and we judge them indifferent, their combination $(1/2, 1/2)$ might be useless for making both a pizza and a cake. In this case, the combination turns out to be rather bad.
This refinement is usually stated through the Cauchy functional equation: we ask whether or not there are functions f : R → R that satisfy the condition12

f(x + y) = f(x) + f(y)    ∀x, y ∈ R

Naturally, a function satisfies Cauchy's equation if and only if it is additive (cf. Definition 684). Much more is true:
N.B. The conclusion of Theorem 681 holds also when f is defined only on R₊: the proof is the same. O
Proof The "if" part is trivial; let us show the "only if" part in three steps. (i) Taking x = y = 0, the equation gives f(0) = f(0) + f(0) = 2f(0), that is, f(0) = 0: the graphs of all functions that satisfy the equation pass through the origin.
(ii) We claim that f is continuous at every point. Let x₀ be the point at which, by hypothesis, f is continuous, so that f(x) → f(x₀) as x → x₀. Take another (generic) point z₀. By the Cauchy equation, f(x − z₀) = f(x) − f(z₀). By the continuity of f at x₀, therefore,

f(x − z₀) → f(x₀) − f(z₀) = f(x₀ − z₀)    as x → x₀

which proves the continuity of f at x₀ − z₀. By the arbitrariness of z₀, the point x₀ − z₀ is an arbitrary point of the real line, so f is everywhere continuous.
(iii) Using Cauchy's equation n times, we can write that, for every x ∈ R and for every n ∈ N,

f(nx) = f(x + x + ··· + x) = f(x) + f(x) + ··· + f(x) = nf(x)

In a similar way one obtains

f(y/k) = (1/k) f(y)    ∀y ∈ R, ∀k ∈ Z with k ≠ 0    (14.19)

as desired.
It admits the trivial solution f(x) = 0 for every x ∈ R. Every other solution is strictly positive. Indeed, if f is such a solution, for every x ∈ R we have:

f(x) = f(x/2 + x/2) = f(x/2) f(x/2) = [f(x/2)]² ≥ 0

Moreover, if there existed y ≠ 0 with f(y) = 0, then f(x) = f((x − y) + y) = f(x − y) f(y) = 0 for every x ∈ R, which contradicts the non-triviality of f. Every non-trivial solution of (14.20) is therefore strictly positive. This allows us to take the logarithm of both sides of (14.20), so that

log f(x + y) = log f(x) + log f(y)

which is the Cauchy equation in the unknown function log f. The solution is therefore log f(x) = mx with m ∈ R, so the exponential function

f(x) = e^{mx}
The results just seen are remarkable because they establish a functional foundation for the elementary functions. For example, the exponential function can be characterized, as in Theorem 367, via the limit

e^x = lim_{n→∞} (1 + x/n)^n

but also, from a completely different angle, as the function that solves the functional equation (14.20). Both points of view are of great importance.
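The limit characterization can be checked numerically. A minimal sketch (not part of the original text; the sample values of n are arbitrary):

```python
import math

def exp_via_limit(x, n):
    """Approximate e^x by the compound expression (1 + x/n)^n."""
    return (1 + x / n) ** n

# The approximation approaches e^x as n grows.
for n in (10, 1000, 100_000):
    print(n, exp_via_limit(1.0, n))  # tends to e = 2.71828...
```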
Because of the importance of this new perspective on elementary functions, we record as a theorem what we have established.
Theorem 682 (i) The exponential function f(x) = e^{mx}, with m ∈ R, is the unique non-trivial solution of the functional equation

f(x + y) = f(x) f(y)    ∀x, y ∈ R

(ii) The logarithmic function f(x) = log xᵐ, with m ∈ R, is the unique non-trivial solution of the functional equation

f(xy) = f(x) + f(y)    ∀x, y > 0

(iii) The power function f(x) = xᵐ, with m ∈ R, is the unique non-trivial solution of the functional equation

f(xy) = f(x) f(y)    ∀x, y ≥ 0
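The three functional equations of Theorem 682 can be spot-checked numerically. An illustrative sketch (the exponent m = 1.7 and the sampling range are arbitrary choices, not from the text):

```python
import math
import random

random.seed(0)
m = 1.7  # an arbitrary coefficient for the check

exp_f = lambda x: math.exp(m * x)   # f(x + y) = f(x) f(y) on R
log_f = lambda x: m * math.log(x)   # f(xy) = f(x) + f(y) on (0, inf); log x^m = m log x
pow_f = lambda x: x ** m            # f(xy) = f(x) f(y) on [0, inf)

for _ in range(100):
    x, y = random.uniform(0.1, 5), random.uniform(0.1, 5)
    assert math.isclose(exp_f(x + y), exp_f(x) * exp_f(y), rel_tol=1e-9)
    assert math.isclose(log_f(x * y), log_f(x) + log_f(y), rel_tol=1e-9, abs_tol=1e-12)
    assert math.isclose(pow_f(x * y), pow_f(x) * pow_f(y), rel_tol=1e-9)
print("all three functional equations hold on the sample")
```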
m = m(c, t) : R × R₊ → R

Here, c < 0 is interpreted as a debt. Consider the following properties of this function:

(i) m(c₁ + c₂, t) = m(c₁, t) + m(c₂, t) for all c₁, c₂ ∈ R and all t ≥ 0;
(ii) m(c, t) is increasing in t;
(iii) m(c, 0) = c for all c ∈ R.
Condition (i) requires that the terminal value of a sum of capitals be the sum of their terminal values. Observe that it would be meaningless to suppose that m(c₁ + c₂, t) < m(c₁, t) + m(c₂, t) for some c₁, c₂ ≥ 0 because, in such a case, it would be more profitable to invest c₁ and c₂ separately than their sum c₁ + c₂. In contrast, it might be reasonable to have m(c₁ + c₂, t) ≥ m(c₁, t) + m(c₂, t), but this would lead us a bit too far away.

Condition (ii) requires that the terminal value increase with the length of the investment. This presumes that such value is measured in nominal terms. Finally, condition (iii) is obvious.
Theorem 683 Let m be continuous in, at least, some value of c. It satisfies conditions (i)-(iii) if and only if

m(c, t) = c f(t)

where f : [0, ∞) → R is an increasing function such that f(0) = 1.
Proof Define m_t : R → R by m_t(c) = m(c, t). By condition (i), m_t satisfies the Cauchy functional equation. Therefore, for each t ≥ 0 there is a scalar α_t such that m_t(c) = α_t c. Define f : [0, ∞) → R by f(t) = α_t, so that we can write m(c, t) = c f(t). To satisfy (ii), f must be increasing and, by (iii), we have f(0) = 1.
Under conditions (i)-(iii), the terminal value is therefore proportional to the amount c of the capital. In particular, we have f(t) = m(1, t), so f(t) can be interpreted as the terminal value at t of a unit capital. The terminal value of any other amount of capital can be obtained simply by multiplying it by f(t). For this reason, f(t) is called the compound factor.
The most common compound factor has the form

f(t) = e^{δt}

with δ ≥ 0. To see how the exponential factor may come up, suppose that one has to invest a capital c from today, time 0, until the date t₁ + t₂. We can think of two investment strategies:

(a) to invest from the beginning to the end, thus obtaining the terminal value c f(t₁ + t₂);
482 CHAPTER 14. CONCAVE FUNCTIONS
(b) to invest first from 0 to t₁, getting the terminal value c f(t₁), and then reinvest this amount for the remaining t₂, thus obtaining the terminal value (c f(t₁)) f(t₂).
If the two terminal values differ, that is, f(t₁ + t₂) ≠ f(t₁) f(t₂), arbitrage opportunities may open if in the financial market it is possible to lend and borrow without quantity constraints and transaction costs. Indeed, if f(t₁ + t₂) > f(t₁) f(t₂), it would be profitable to invest without interruption from 0 to t₁ + t₂ and to borrow with an interruption at t₁, earning in this way the difference f(t₁ + t₂) − f(t₁) f(t₂) > 0. Vice versa, if f(t₁ + t₂) < f(t₁) f(t₂), it would be profitable to borrow without interruption, and to invest with an interruption at t₁.
In sum, the equality f(t₁ + t₂) = f(t₁) f(t₂) must hold for every t₁, t₂ ≥ 0 in order not to open arbitrage opportunities. Remarkably, from the study of the variant (14.20) of Cauchy's equation, it follows that this equality amounts to

f(t) = e^{δt}

provided f is continuous at least at one point. The exponential compound factor is thus the outcome of a no-arbitrage argument, as is the case for many key results in finance (cf. Section 19.5).
N.B. In this section we assumed that time is continuous, so t can take any positive value and each t induces a function m_t (see the proof of the last theorem). In contrast, if time were discrete, with t ∈ N₊, we would have a sequence. In this case, the discrete compound factor f : N₊ → R that corresponds to the exponential continuous compound factor is given by f(t) = (1 + r)^t, with m_t(c) = (1 + r)^t c (cf. Example 271). O
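The two compound factors and the no-arbitrage multiplicativity f(t₁ + t₂) = f(t₁) f(t₂) can be illustrated in a few lines. The rates r = 0.03 and δ = 0.05 below are illustrative choices, not values from the text:

```python
import math

def terminal_value_discrete(c, t, r=0.03):
    """m_t(c) = (1 + r)^t c: discrete compounding at an illustrative rate r."""
    return (1 + r) ** t * c

def compound_factor(t, delta=0.05):
    """Exponential compound factor f(t) = e^{delta t} (illustrative delta)."""
    return math.exp(delta * t)

# Multiplicativity of f: investing straight through from 0 to t1 + t2
# yields the same terminal value as reinvesting at the interruption t1.
t1, t2, c = 2.0, 3.0, 100.0
straight = c * compound_factor(t1 + t2)
reinvest = (c * compound_factor(t1)) * compound_factor(t2)
print(straight, reinvest)  # the two strategies coincide: no arbitrage
```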
(iii) f is linear;
(iv) there exists a (unique) vector α ∈ Rⁿ such that f(x) = α · x for all x ∈ Rⁿ.
14.7. FIREWORKS: THE SKELETON OF CONVEXITY 483
Proof (iv) implies (iii) by Riesz's Theorem. (iii) implies (ii) by Theorem 535. (ii) trivially implies (i). Finally, to prove that (i) implies (iv) it is enough to show, along the lines of the proof of Cauchy's Theorem for scalar functions (which is easily adapted to Rⁿ, as readers can check), that (i) implies that f is homogeneous, so linear.
if and only if there exists a vector α ∈ Rⁿ such that f(x) = e^{α·x} for all x ∈ Rⁿ.
Next we show that convex envelopes are the counterpart for convex combinations of what
generated subspaces were for linear combinations (recall Section 3.4).
Proposition 687 The convex envelope of a set is the intersection of all convex sets that
contain it.
Proof Given a set A of Rⁿ, let {Cᵢ}_{i∈I} be the collection of all convex subsets containing A, where I is a (finite or infinite) index set. We want to show that co A = ⋂_{i∈I} Cᵢ. By Proposition 646, ⋂_{i∈I} Cᵢ is a convex set. Since A ⊆ Cᵢ for each i, we have co A ⊆ ⋂_{i∈I} Cᵢ since, by definition, co A is the smallest convex subset containing A. On the other hand, co A belongs to the collection {Cᵢ}_{i∈I}, being a convex subset containing A. It follows that ⋂_{i∈I} Cᵢ ⊆ co A, and we can therefore conclude that ⋂_{i∈I} Cᵢ = co A.
The next result shows that convex envelopes can be represented through convex combinations.
Proof "If." Let x ∈ Rⁿ be a convex combination of a finite set {xᵢ}_{i∈I} of vectors of A. The set co A is convex and, since {xᵢ}_{i∈I} ⊆ co A, Lemma 649 implies x ∈ co A, as desired. "Only if." Let C be the set of all the vectors that can be expressed as convex combinations of vectors of A, i.e., x ∈ C if there exist a finite set {xᵢ}_{i∈I} ⊆ A and weights {αᵢ}_{i∈I} ⊆ [0, 1], with ∑_{i∈I} αᵢ = 1, such that x = ∑_{i∈I} αᵢxᵢ. It is easy to see that C is a convex subset containing A. It follows that co A ⊆ C and hence each x ∈ co A is a convex combination of vectors of A.
Example 689 Let A = {x₁, ..., x_k} ⊆ Rⁿ. The polytope generated by the set A is its convex envelope co A. In particular, simplices are the convex envelopes of the versors. N
Thus, convex envelopes preserve compactness (we omit the proof): when K is a compact subset, co K is compact. For instance, polytopes are compact because they are the convex envelopes of a finite (so, compact) collection of vectors of Rⁿ.
In this set, besides the vertices there is also the vector (1/2, 1/2), which is useless for the representation of the polygon because it is itself a convex combination of the vertices.14 We therefore have a redundancy in the set A′, while this does not happen in the set A of the vertices, whose elements are all essential for the representation of the rhombus.
Hence, for a polygon the set of vertices is the natural candidate to be the minimal set that allows us to represent each point of the polygon as a convex combination of its elements. This motivates the notion of extreme point, which generalizes that of vertex to any convex set.
Lemma 692 A point x₀ of a convex set C is extreme if and only if the set C \ {x₀} is convex.
The next result shows that extreme points must be boundary points: no interior point of a convex set can be an extreme point.
x = (1/2)(1 − ε)x + (1/2)(1 + ε)x

and so x ∉ ext C.
Open convex sets (like, for example, open unit balls) thus do not have extreme points. We now see other examples in which we find the extreme points of some convex sets.
Example 694 Consider the polytope co A generated by a finite collection A = {x₁, ..., x_k} ⊆ Rⁿ. It is easy to see that ext co A is not empty, with ext co A ⊆ A. That is, the vertices of the polytope necessarily belong to the finite collection that generates the polytope. N
Example 695 Consider the closed unit ball B₁(0) = {x ∈ Rⁿ : ‖x‖ ≤ 1} of Rⁿ. In this case, we have:

ext B₁(0) = {x ∈ Rⁿ : ‖x‖ = 1}

That is, ext B₁(0) = ∂B₁(0): the set of extreme points is given by the "circumference" of the ball, its skin. Though a quite intuitive result (just draw a circle), it is a bit delicate to prove. Since ∂B₁(0) = {x ∈ Rⁿ : ‖x‖ = 1}, the previous proposition implies the inclusion ext B₁(0) ⊆ {x ∈ Rⁿ : ‖x‖ = 1}. As to the converse inclusion, let x₀ ∈ ∂B₁(0) and suppose x₀ = tx + (1 − t)y with x, y ∈ B₁(0) and t ∈ (0, 1). Expanding the square of the norm ‖tx + (1 − t)y‖ yields a term in cos θ, where θ is the difference of the angles determined by the two vectors (Section C.3). If x ≠ y, we have cos θ < 1, so ‖tx + (1 − t)y‖² < 1. This contradicts x₀ ∈ ∂B₁(0); therefore x = y. We conclude that x₀ ∈ ext B₁(0), as desired. N
We are now ready to address the opening question of this section. We first need a preliminary lemma showing that ext C is included in all subsets of C whose convex envelope is C itself.
The next fundamental result shows that convex and compact sets can be reconstructed from their extreme points by taking all their convex combinations. We omit the proof.

K = co (ext K)    (14.23)
In view of the previous lemma, Minkowski's Theorem answers the opening question: ext K is the minimal subset of K for which (14.23) holds. Indeed, if A ⊆ K is another set for which K = co A, then ext K ⊆ A by the lemma. Summing up:

- all the points of a compact and convex set K can be expressed as convex combinations of its extreme points;
- the set of extreme points of K is the minimal subset of K for which this is true.
Minkowski's Theorem stands out as the deepest and most beautiful result of the chapter. It shows that, in a sense, convex and compact sets in Rⁿ are generalized polytopes (cf. Example 694), with extreme points generalizing the role of vertices. In particular, polytopes are the convex and compact sets of Rⁿ that have a finite number of extreme points (which are then their vertices).
Chapter 15
Homogeneous functions
Geometrically, C is a cone if, whenever x belongs to C, the set C also includes the whole half-line starting at the origin and passing through x.
[Two figures: cones in the plane, each containing the half-lines from the origin O through its points.]
Note that the origin 0 always belongs to a cone: given any x ∈ C, by taking λ = 0 we have 0 = 0x ∈ C.
One can easily show that the closure of a cone is a cone and that the intersection of two
cones is still a cone.
x, y ∈ C ⟹ λx + μy ∈ C    ∀λ, μ ≥ 0
490 CHAPTER 15. HOMOGENEOUS FUNCTIONS
While a generic convex set is closed with respect to convex combinations, convex cones are closed with respect to all linear combinations with positive coefficients (regardless of whether or not they add up to 1). This is what distinguishes them among all convex sets.
Proof "Only if". Let C be a convex cone. Take x, y ∈ C. We want to show that λx + μy ∈ C for all λ, μ ≥ 0. Fix λ, μ ≥ 0. If λ = μ = 0, then λx + μy = 0 ∈ C. Assume that λ + μ > 0. Since C is convex, we have

(λ/(λ + μ)) x + (μ/(λ + μ)) y ∈ C

Since C is a cone, we have

λx + μy = (λ + μ) [(λ/(λ + μ)) x + (μ/(λ + μ)) y] ∈ C

as desired.

"If". Suppose that x, y ∈ C implies λx + μy ∈ C for all λ, μ ≥ 0. We want to show that C is a cone. By taking λ = μ = 0, one concludes that 0 ∈ C and, by taking y = 0, that λx ∈ C for all λ ≥ 0. Hence, C is a cone.
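The characterization just proved can be checked numerically on the standard example Rⁿ₊. A small sketch (sample sizes and ranges are arbitrary choices):

```python
import random

random.seed(1)

def in_positive_orthant(v):
    """Membership test for the convex cone R^n_+ = {x : x >= 0}."""
    return all(c >= 0 for c in v)

# x, y in the cone and any lambda, mu >= 0 give lambda*x + mu*y in the cone.
for _ in range(1000):
    x = [random.uniform(0, 10) for _ in range(3)]
    y = [random.uniform(0, 10) for _ in range(3)]
    lam, mu = random.uniform(0, 5), random.uniform(0, 5)
    z = [lam * xi + mu * yi for xi, yi in zip(x, y)]
    assert in_positive_orthant(z)
print("R^3_+ is closed under positive linear combinations on the sample")
```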
Example 700 (i) A singleton {x} ⊆ Rⁿ is always convex; it is also a cone if x = 0. (ii) The only non-trivial cones in R are the two half-lines (−∞, 0] and [0, ∞).1 (iii) The set Rⁿ₊ = {x ∈ Rⁿ : x ≥ 0} of the positive vectors is a convex cone. N
Cones can be closed, for example Rⁿ₊, or open, for example Rⁿ₊₊. Vector subspaces form an important class of closed convex cones (the non-trivial proof is omitted).
For example, this proposition implies that the graphs of straight lines passing through the origin are closed sets because they are vector subspaces of R².
Example 703 (i) Linear functions f : Rⁿ → R are positively homogeneous. (ii) The function f : R²₊ → R given by f(x) = √(x₁x₂) is positively homogeneous. Indeed,

f(λx) = √((λx₁)(λx₂)) = √(λ²x₁x₂) = λ√(x₁x₂) = λf(x)

for all λ ≥ 0. N
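The positive homogeneity of f(x) = √(x₁x₂) is easy to verify numerically. A minimal sketch (the sampling ranges are arbitrary):

```python
import math
import random

random.seed(2)

def f(x1, x2):
    """f(x) = sqrt(x1 * x2) on R^2_+."""
    return math.sqrt(x1 * x2)

# Positive homogeneity: f(lambda * x) = lambda * f(x) for every lambda >= 0.
for _ in range(100):
    x1, x2 = random.uniform(0, 10), random.uniform(0, 10)
    lam = random.uniform(0, 10)
    assert math.isclose(f(lam * x1, lam * x2), lam * f(x1, x2),
                        rel_tol=1e-9, abs_tol=1e-12)
print("positive homogeneity verified on the sample")
```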
The condition 0 ∈ C in the definition ensures that λx ∈ C for all λ ∈ [0, 1], so that (15.1) is well-defined. Whenever C is a cone, as in the previous examples, property (15.1) holds, more generally, for any positive scalar λ.
Proof Since the "if" side is trivial, we focus on the "only if". Let f be positively homogeneous and let x ∈ C. We must show that f(λx) = λf(x) for every λ > 1. Let λ > 1 and set y = λx, so that x = y/λ. From λ > 1 it follows that 1/λ < 1. Thanks to the positive homogeneity of f, we have f(x) = f(y/λ) = f(y)/λ = f(λx)/λ, that is, f(λx) = λf(x), as desired.
Linear production functions are positively homogeneous, thus exhibiting constant returns to scale (Example 532). Let us now illustrate another famous example.
Apart from being constant, returns to scale may be increasing or decreasing. This motivates the following definition.
f(λx) ≤ λf(x)

for all x ∈ C and all λ ∈ [0, 1], while it is said to be (positively) subhomogeneous if

f(λx) ≥ λf(x)

On a cone, subhomogeneity is equivalent to the pair of conditions

f(λx) ≥ λf(x)    ∀λ ∈ [0, 1]

and

f(λx) ≤ λf(x)    ∀λ ≥ 1
Proof We consider the "only if" side, the converse being trivial. Let f be subhomogeneous and x ∈ C. Our aim is to show that f(λx) ≤ λf(x) for all λ > 1. Take λ > 1 and set y = λx, so that x = y/λ. Since λ > 1, we have 1/λ < 1. By the positive subhomogeneity of f, we have f(x) = f(y/λ) ≥ f(y)/λ = f(λx)/λ, that is, f(λx) ≤ λf(x), as desired.
Thus, by doubling all inputs (λ = 2) the output is less than doubled, by tripling all inputs (λ = 3) the output is less than tripled, and so on for each λ ≥ 1. A proportional increase of all inputs brings along a less than proportional increase in output, which models decreasing returns to scale. Dual considerations hold for increasing returns to scale, which entail more than proportional increases in output as all inputs increase proportionally. Note that when λ ∈ [0, 1], so we cut inputs, opposite output patterns emerge.
f(λx) = (λx₁)ᵃ(λx₂)ᵇ = λ^{a+b} x₁ᵃx₂ᵇ = λ^{a+b} f(x)
In conclusion, the notions of homogeneity are defined for λ ∈ [0, 1], that is, for proportional cuts, on convex sets containing the origin. Nonetheless, their natural domains are cones, where they model the classic returns-to-scale hypotheses in which both cuts, λ ∈ [0, 1], and raises, λ ≥ 1, in inputs are considered.
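The degree-(a + b) homogeneity computed above for the Cobb-Douglas-type function can be checked numerically. An illustrative sketch (the exponents a = 0.4, b = 0.8 are arbitrary choices):

```python
import math
import random

random.seed(3)
a, b = 0.4, 0.8  # illustrative exponents; here a + b = 1.2, so degree > 1

def f(x1, x2):
    """Cobb-Douglas-type function f(x) = x1^a * x2^b on R^2_++."""
    return (x1 ** a) * (x2 ** b)

# Homogeneity of degree a + b: f(lambda x) = lambda^(a+b) f(x).
for _ in range(100):
    x1, x2 = random.uniform(0.1, 10), random.uniform(0.1, 10)
    lam = random.uniform(0.1, 10)
    assert math.isclose(f(lam * x1, lam * x2), lam ** (a + b) * f(x1, x2),
                        rel_tol=1e-9)
print("degree a+b homogeneity verified on the sample")
```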
f_m^y(z) = f(zy)/z

It yields the average value of f with respect to the positive multiples of y (which is arbitrarily chosen). In the n = 1 case, by choosing y = 1 one ends up with the previous definition of average function.
Proof "Only if". If f is subhomogeneous one has that, for any 0 < α ≤ β,

f(αy) = f((α/β) βy) ≥ (α/β) f(βy)

that is, f(αy)/α ≥ f(βy)/β, or f_m^y(α) ≥ f_m^y(β). Therefore, the function f_m^y is decreasing.

"If". If f_m^y is decreasing, by setting β = 1 we have f_m^y(α) ≥ f_m^y(1) for 0 < α ≤ 1, and so f(αy)/α ≥ f(y), that is, f(αy) ≥ αf(y) for each 0 < α ≤ 1. Since f(0) = 0, the function f is subhomogeneous.
Corollary 711 (i) The CES production function is concave if 0 < ρ ≤ 1. (ii) The Cobb-Douglas production function is concave as long as ∑_{i=1}^n aᵢ = 1.
Lemma 712 The product of two concave and strictly positive functions is a quasi-concave
function.
Proof Let f, g : C ⊆ Rⁿ → R be concave and strictly positive. Then, we can write log fg = log f + log g. The functions log f and log g are concave thanks to Proposition 676. Hence, log fg is concave because it is the sum of concave functions (Proposition 668). It follows that fg is quasi-concave because fg = e^{log fg} is a strictly increasing transformation of a concave function.
Proof of Corollary 711 (i) For ρ = 1 the statement is obvious. If ρ < 1, note that on R₊ the power function x^ρ is concave if ρ ∈ (0, 1). Hence, also g(x) = αx₁^ρ + (1 − α)x₂^ρ is concave. Since h(x) = x^{1/ρ} is strictly increasing on R₊ for any ρ > 0, it follows that f = h ∘ g is quasi-concave. Since f ≥ 0 and thanks to Theorem 710, we conclude that f is concave, as we have previously shown its homogeneity. (ii) Any power function xᵢ^{aᵢ} is concave and strictly positive. As the function f is their product ∏_{i=1}^n xᵢ^{aᵢ}, from the previous lemma we have that it is quasi-concave. Since f ≥ 0, Theorem 710 implies that f is concave on Rⁿ₊, as we have already seen that f is positively homogeneous whenever ∑_{i=1}^n aᵢ = 1.
15.3 Homotheticity
15.3.1 Semicones
For the sake of simplicity, until now we have considered convex sets containing the origin 0, and cones in particular. To introduce the notions of this final section such an assumption becomes too cumbersome to maintain, so we will consider the following generalization of the notion of cone.
Unlike the definition of cone, here we require that λx belong to C only for λ > 0 rather than for λ ≥ 0. A cone is thus, a fortiori, a semicone. However, the converse does not hold: the set Rⁿ₊₊ is a notable example of a semicone that is not a cone.
Therefore, semicones do not necessarily contain the origin and, when they do, they automatically become cones. In any case, the origin is always in the surroundings of a semicone:
The easy proofs of the above lemmas are left to the reader. The last lemma, in particular,
leads to the following result.
The distinction between cones and semicones thus disappears when considering closed sets. Finally, the following version of Proposition 699 holds for semicones, with coefficients that are now required to be strictly positive, as the reader can check.
x, y ∈ C ⟹ λx + μy ∈ C    ∀λ, μ > 0
Example 718 (i) The two half-lines (−∞, 0) and (0, ∞) are semicones in R (but they are not cones). (ii) The set Rⁿ₊₊ = {x ∈ Rⁿ : x ≫ 0} of the strongly positive vectors is a convex semicone (which is not a cone). N
The next result shows that this notion is consistent with what we did so far.
Proof If 0 ∈ C, then for every λ > 0 we have f(0) = f(λ0) = λf(0). Hence, f(0) = 0.
Thus, when the semicone is actually a cone, i.e., it contains the origin (Lemma 714), we get back to the notion of positive homogeneity on cones of the previous section. Everything fits together.
Example 721 Consider the function f : Rⁿ₊₊ → R given by f(x) = e^{∑_{i=1}^n aᵢ log xᵢ}, with aᵢ > 0. If ∑_{i=1}^n aᵢ = 1, the function is positively homogeneous. Indeed, for any λ > 0 we have

f(λx) = e^{∑_{i=1}^n aᵢ log(λxᵢ)} = e^{∑_{i=1}^n aᵢ(log λ + log xᵢ)} = e^{log λ} e^{∑_{i=1}^n aᵢ log xᵢ} = λf(x)

N
15.3. HOMOTHETICITY 497
Example 724 Let u : Rⁿ₊ → R be the Cobb-Douglas utility function u(x) = ∏_{i=1}^n xᵢ^{aᵢ}, with aᵢ > 0 and ∑_{i=1}^n aᵢ = 1. It follows from Example 708 that such a function is positively homogeneous. If f is strictly increasing, the transformations f ∘ u of the Cobb-Douglas utility function are homothetic. For example, if we consider the restriction of u to the semicone Rⁿ₊₊ (where it is still positively homogeneous) and the logarithmic transformation f(x) = log x, we obtain the log-linear utility function v = log u given by v(x) = ∑_{i=1}^n aᵢ log xᵢ, which is thus homothetic. N
3
Recall that the same does not hold for quasi-concavity: as previously noted, there are quasi-concave functions which are not transformations of concave functions.
Chapter 16
Lipschitz functions
A function is called Lipschitz, without further qualifications, when the inequality (16.1) holds on the entire domain of the function. When f is a scalar function, this inequality takes the simpler form

|f(x₁) − f(x₂)| ≤ k ‖x₁ − x₂‖

where on the left-hand side we have the absolute value in place of the norm.
In a Lipschitz operator, the distance ‖f(x₁) − f(x₂)‖ between the images of two vectors x₁ and x₂ is controlled, through a positive coefficient k, by the distance ‖x₁ − x₂‖ between the vectors x₁ and x₂ themselves. This "variation control" that the independent variable exerts on the dependent variable is at the heart of Lipschitzianity. The rein is especially tight when k < 1, so that variations in the independent variable cause strictly smaller variations of the dependent variable. In this case, the Lipschitz operator is called a contraction.
The control nature of Lipschitzianity translates into a strong form of continuity. To see how, first note that Lipschitz operators are continuous. Indeed, let x₀ ∈ A. If xₙ → x₀, we have:

‖f(xₙ) − f(x₀)‖ ≤ k ‖xₙ − x₀‖ → 0    (16.2)

and hence f(xₙ) → f(x₀). So, f is continuous at x₀. More is true:
500 CHAPTER 16. LIPSCHITZ FUNCTIONS
The converse is false, as Example 728 will show momentarily. Because of its control
nature, Lipschitzianity thus embodies a stronger form of continuity than the uniform one.
Proof For each ε > 0, take 0 < δ_ε < ε/k. Then, ‖f(x) − f(y)‖ ≤ k‖x − y‖ < ε for each x, y ∈ Rⁿ such that ‖x − y‖ < δ_ε.
So, setting y = 0, there is no k > 0 such that |f(x) − f(y)| ≤ k|x − y| for each x, y ≥ 0.

That said, the previous example shows that f is Lipschitz on each interval [a, b] with a > 0. So f is not Lipschitz on its entire domain, but it is on suitable subsets of it. More interestingly, by Theorem 526 the function f is uniformly continuous on each interval [0, b], with b > 0, but it is not Lipschitz on [0, b]. This also shows that the converse of the last lemma does not hold. N
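Taking the square-root function as the canonical instance of this failure (an assumption, since the statement of Example 728 is not reproduced in this excerpt), the difference quotient |f(x) − f(y)|/|x − y| blows up near the origin but stays bounded on [a, b] with a > 0:

```python
import math

f = math.sqrt  # illustrative choice for the function of the example

def ratio(x, y):
    """Lipschitz difference quotient |f(x) - f(y)| / |x - y|."""
    return abs(f(x) - f(y)) / abs(x - y)

# Near the origin the quotient explodes: ratio(0, h) = 1/sqrt(h) -> infinity.
print(ratio(0.0, 1e-2), ratio(0.0, 1e-6))  # roughly 10.0 and 1000.0
# On [a, b] with a > 0 it stays below 1/(2 sqrt(a)), the sup of |f'| there.
a = 0.25
print(1 / (2 * math.sqrt(a)))  # 1.0 bounds the quotient on [0.25, b]
```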
Lemma 730 Given a linear operator f : Rⁿ → Rᵐ, there exists a constant k > 0 such that ‖f(x)‖ ≤ k‖x‖ for every x ∈ Rⁿ.
0 ≤ ‖f(x)‖/‖x‖ ≤ k

The ratio ‖f(x)‖/‖x‖ is thus bounded above by a constant k, so it cannot explode, for all non-zero vectors x. In other words, there is no sequence {xₙ} of vectors such that ‖f(xₙ)‖/‖xₙ‖ → +∞.
16.2. LOCAL CONTROL 501
Proof Set k = ∑_{i=1}^n ‖f(eⁱ)‖. We have:

‖f(x)‖ = ‖f(∑_{i=1}^n xᵢeⁱ)‖ = ‖∑_{i=1}^n xᵢf(eⁱ)‖ ≤ ∑_{i=1}^n |xᵢ| ‖f(eⁱ)‖ ≤ ‖x‖ ∑_{i=1}^n ‖f(eⁱ)‖ = k‖x‖

since |xᵢ| ≤ ‖x‖ for each i.
Proof of Theorem 729 Let x, y ∈ Rⁿ. Since f is linear, the last lemma implies

‖f(x) − f(y)‖ = ‖f(x − y)‖ ≤ k‖x − y‖

So, f is Lipschitz.
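The constant k = ∑ᵢ ‖f(eⁱ)‖ of Lemma 730 can be computed explicitly for a concrete linear operator. A sketch for an arbitrary 2×2 matrix (the entries are illustrative):

```python
import math

# A linear operator f : R^2 -> R^2 represented by a matrix A (listed by rows).
A = [[1.0, -2.0],
     [3.0,  0.5]]

def apply(A, x):
    """Matrix-vector product: the operator f(x) = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

# The constant of Lemma 730: k = sum_i ||f(e^i)||, e^i the versors.
e = [[1.0, 0.0], [0.0, 1.0]]
k = sum(norm(apply(A, ei)) for ei in e)

# ||f(x)|| <= k ||x|| for an arbitrary x.
x = [0.7, -1.9]
assert norm(apply(A, x)) <= k * norm(x)
print(k)
```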
Note the local nature of this definition: the constant k_{x₀} depends on the point x₀ at hand, and the inequality is required only between points of a neighborhood of x₀ (not between any two points of the domain of f).
When f is locally Lipschitz at each point of a set B, we say that it is locally Lipschitz on B. If B is the entire domain, we say that the operator is locally Lipschitz, without further qualifications.
Now, the "variation control" that the independent variable exerts on the dependent variable is only local, in a neighborhood of a given point. This local control still translates into a strong form of continuity at a point (with k_{x₀} in place of k, (16.2) still holds as xₙ → x₀), but no longer across points, as was the case with global Lipschitzianity.
where 0 < ε′ < ε. Since the derivative f′ is continuous on [x₀ − ε′, x₀ + ε′], by Weierstrass' Theorem the constant k₀ is well defined. By proceeding as in Example 727, mutatis mutandis, the reader can then check that f is locally Lipschitz at x₀. N
There is, however, an important case where local and global Lipschitzianity are equivalent.
Proof Since the "only if" is obvious, we only prove the "if." Assume that f is locally Lipschitz on K. Suppose, by contradiction, that f is not Lipschitz on K. Then there exist two sequences {xₙ} and {yₙ} in K such that

‖f(xₙ) − f(yₙ)‖ / ‖xₙ − yₙ‖ → +∞    (16.4)

Since K is compact, by the Bolzano-Weierstrass Theorem there exist two subsequences {x_{n_k}} and {y_{n_k}} such that x_{n_k} → x ∈ K and y_{n_k} → y ∈ K. Since f is continuous, we have f(x_{n_k}) → f(x) and f(y_{n_k}) → f(y). We consider two cases.
(ii) Suppose x = y. By hypothesis, f is locally Lipschitz at x, so there is a neighborhood B_ε(x) on which the local Lipschitz inequality holds with constant k_x. Since x_{n_k} → x and y_{n_k} → x, there is a large enough k_ε ≥ 1 so that x_{n_k}, y_{n_k} ∈ B_ε(x) for all k ≥ k_ε. Then,

‖f(x_{n_k}) − f(y_{n_k})‖ / ‖x_{n_k} − y_{n_k}‖ ≤ k_x    ∀k ≥ k_ε

which contradicts (16.4).
The next important result shows that concave functions are locally Lipschitz, thus clarifying the continuity properties of these fundamental functions.
In view of Proposition 734, f is then Lipschitz on each compact set K ⊆ C. The theorem is a consequence of the following lemma of independent interest.
Set m_{x₀} = min_{i=1,…,n} {f(x₀), f(x₀ + eⁱ)}. We thus have f(x) ≥ m_{x₀} for all x ∈ int D. Given any neighborhood B_ε(x₀) ⊆ int D, we have f(x) ≥ m_{x₀} for all x ∈ B_ε(x₀).

So, f is locally bounded below. Next we show that it is also bounded above on B_ε(x₀). For, let y ∈ B_ε(x₀). Consider the point z = 2x₀ − y = x₀ − (y − x₀). Clearly, z ∈ B_ε(x₀) and x₀ = (z + y)/2. By concavity,

f(x₀) = f((1/2)z + (1/2)y) ≥ (1/2)f(z) + (1/2)f(y)

so that f(y) ≤ 2f(x₀) − f(z) ≤ 2f(x₀) − m_{x₀}, as desired.
Proof of Theorem 735 We want to show that f is locally Lipschitz at any x ∈ C. By the last lemma, f is locally bounded at x, i.e., there exist m_x ∈ R and a neighborhood B_{2ε}(x), without loss of generality of radius 2ε, such that |f(y)| ≤ m_x for all y ∈ B_{2ε}(x). Given y₁, y₂ ∈ B_ε(x), set

y₃ = y₂ + (ε/‖y₂ − y₁‖)(y₂ − y₁)
so that y₃ ∈ B_{2ε}(x), since

‖y₃ − x‖ = ‖y₂ − x + (ε/‖y₂ − y₁‖)(y₂ − y₁)‖ ≤ ‖y₂ − x‖ + ε ≤ 2ε
Since

y₂ = (ε/(‖y₂ − y₁‖ + ε)) y₁ + (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε)) y₃
concavity implies

f(y₂) ≥ (ε/(‖y₂ − y₁‖ + ε)) f(y₁) + (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε)) f(y₃)
so that

f(y₁) − f(y₂) ≤ (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε)) (f(y₁) − f(y₃)) ≤ (‖y₂ − y₁‖/ε) 2m_x    (16.5)
Interchanging the roles of y₁ and y₂, we get

f(y₂) − f(y₁) ≤ (‖y₁ − y₂‖/(‖y₁ − y₂‖ + ε)) (f(y₂) − f(y₃)) ≤ (‖y₁ − y₂‖/ε) 2m_x
Therefore,

|f(y₁) − f(y₂)| ≤ (2m_x/ε) ‖y₁ − y₂‖

So, f is locally Lipschitz at x.
Proof We only prove the "only if", the converse being trivial. Let f : Rⁿ → R be translation invariant. We need to prove that (16.6) holds when k < 0. Let c ≥ 0. For each x ∈ Rⁿ, we have f(x) = f(x − c + c) = f(x − c) + cf(1), so f(x − c) = f(x) − cf(1). Now, let k < 0. Since −k ≥ 0, setting c = −k, by what we just proved we have

f(x + k) = f(x − (−k)) = f(x) − (−k) f(1) = f(x) + k f(1)

as desired.
Example 739 Define f : Rⁿ → R by

f(x) = min_{i=1,…,n} lᵢ(x)

It is normalized if and only if c = 1. Later in the book, Theorem 1169 will characterize this class of translation invariant functions. N
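Translation invariance of such a minimum can be illustrated numerically. A hedged sketch: the lᵢ of Example 739 are taken here to be linear, with each coefficient vector summing to the same constant c; these are assumptions for illustration, since the example's data are not fully reproduced in this excerpt:

```python
# Illustrative sketch: each l_i(x) = a_i . x with coefficients summing to
# c = 1 (the normalized case) -- an assumption, not data from the text.
A = [[0.2, 0.8],   # a_1, coefficients sum to 1
     [0.6, 0.4]]   # a_2, coefficients sum to 1

def f(x):
    """f(x) = min_i l_i(x), the minimum of the linear functions."""
    return min(sum(a * xi for a, xi in zip(row, x)) for row in A)

# Translation invariance: f(x + k*1) = f(x) + k * c, here with c = 1.
x, k = [3.0, -1.0], 2.5
lhs = f([xi + k for xi in x])
rhs = f(x) + k * 1.0
print(lhs, rhs)  # both approximately 2.3
```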
Though translation invariance is much weaker than linearity, under monotonicity we still have Lipschitzianity. Actually, for this result it is enough that the function be Blackwell.
Proposition 740 An increasing Blackwell function is Lipschitz.
Proof First, note that since f is increasing, we have f(1) > 0. Let x, y ∈ Rⁿ. By (16.3), we have |xᵢ| ≤ ‖x‖ for each i = 1, ..., n.5 Therefore, xᵢ − yᵢ ≤ max_{i=1,…,n} |xᵢ − yᵢ| ≤ ‖x − y‖ for each i, so x ≤ y + ‖x − y‖. Since f is increasing and Blackwell, we then have

f(x) ≤ f(y + ‖x − y‖) ≤ f(y) + ‖x − y‖ f(1)

So, f(x) − f(y) ≤ f(1)‖x − y‖ for all x, y ∈ Rⁿ. By exchanging the roles of x and y, we also have f(y) − f(x) ≤ f(1)‖x − y‖ for all x, y ∈ Rⁿ. We conclude that

|f(x) − f(y)| ≤ f(1) ‖x − y‖    ∀x, y ∈ Rⁿ

as desired.
N.B. The proof shows that an increasing Blackwell function f is a contraction if and only if f(1) < 1. In applications, this is the most relevant case. O
Remarkably, as with positive homogeneity (Theorem 710), under translation invariance concavity and quasi-concavity are equivalent properties.
5
To ease matters, in this proof with an abuse of notation we write x − k and x + k in place of x − k1 and x + k1.
Proof We only prove the "if", the converse being obvious. Let f be quasi-concave. We have, for all x ∈ Rⁿ and all t ∈ R,

f(x) ≥ t ⟺ f(x) − t ≥ 0 ⟺ f(x − t/f(1)) = f(x) − (t/f(1)) f(1) = f(x) − t ≥ 0

which implies6

(f ≥ t) = (f ≥ 0) + t/f(1)    ∀t ∈ R

If t and s are any two scalars and α ∈ (0, 1), then

α(f ≥ t) + (1 − α)(f ≥ s) = α(f ≥ 0) + (1 − α)(f ≥ 0) + (αt + (1 − α)s)/f(1)    (16.7)
= (f ≥ 0) + (αt + (1 − α)s)/f(1) = (f ≥ αt + (1 − α)s)

where the equality α(f ≥ 0) + (1 − α)(f ≥ 0) = (f ≥ 0) holds because, by quasi-concavity, the upper contour set (f ≥ 0) is convex. Take any two points x, y ∈ Rⁿ and set f(x) = t and f(y) = s. Then, x ∈ (f ≥ t) and y ∈ (f ≥ s), and αx + (1 − α)y ∈ α(f ≥ t) + (1 − α)(f ≥ s). By (16.7), αx + (1 − α)y ∈ (f ≥ αt + (1 − α)s), that is,

f(αx + (1 − α)y) ≥ αt + (1 − α)s = αf(x) + (1 − α)f(y)

So, f is concave.
6
To be precise, the right-hand side is the sum of sets

(f ≥ 0) + t/f(1) = {x + (t/f(1))1 : x ∈ (f ≥ 0)}

in the sense of Section 32.3. Later in the proof we add upper contour sets.
Chapter 17
Supermodular functions
17.1 Lattices
We begin by introducing lattices, an important class of sets. Given any two vectors x, y ∈ Rⁿ, the join x ∨ y is the vector of Rⁿ whose components are (x ∨ y)ᵢ = max{xᵢ, yᵢ}, while the meet x ∧ y is the vector whose components are (x ∧ y)ᵢ = min{xᵢ, yᵢ}, for i = 1, ..., n.

In words, x ∨ y is the smallest vector that is larger than both x and y, while x ∧ y is the largest vector that is smaller than both of them. That is, for all z ∈ Rⁿ we have

z ≥ x and z ≥ y ⟹ z ≥ x ∨ y

and

z ≤ x and z ≤ y ⟹ z ≤ x ∧ y
Example 743 Let x = (0, 1) and y = (2, 0) be two vectors in the plane. We have

(x ∨ y)₁ = max{x₁, y₁} = max{0, 2} = 2 ,  (x ∨ y)₂ = max{x₂, y₂} = max{1, 0} = 1
(x ∧ y)₁ = min{x₁, y₁} = min{0, 2} = 0 ,  (x ∧ y)₂ = min{x₂, y₂} = min{1, 0} = 0

so x ∨ y = (2, 1) and x ∧ y = (0, 0). N
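The componentwise definitions of join and meet translate directly into code. A minimal sketch reproducing Example 743:

```python
def join(x, y):
    """Componentwise maximum: the join x v y."""
    return tuple(max(a, b) for a, b in zip(x, y))

def meet(x, y):
    """Componentwise minimum: the meet x ^ y."""
    return tuple(min(a, b) for a, b in zip(x, y))

x, y = (0, 1), (2, 0)
print(join(x, y), meet(x, y))  # (2, 1) (0, 0), as in Example 743
```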
Definition 744 A set L of Rⁿ is a lattice if, for any two elements x and y of L, both x ∨ y and x ∧ y belong to L.
Lattices are, thus, subsets L of Rⁿ that are closed under joins and meets, that is, both the join and the meet of any two of its elements belong to L.
508 CHAPTER 17. SUPERMODULAR FUNCTIONS
Example 745 (i) Given any x, y ∈ Rⁿ, the quadruple {x, y, x ∨ y, x ∧ y} is the simplest example of a finite lattice. (ii) Given any a, b ∈ Rⁿ, with a ≤ b, the interval

[a, b] = {x ∈ Rⁿ : a ≤ x ≤ b}

is a lattice. N
The next simple, yet key, property relates meets, joins and sums.
Proof The equality is trivially true if x and y are scalars since, in that case, x ∨ y + x ∧ y = max{x, y} + min{x, y} = x + y. If x and y are vectors of Rⁿ, we then have, for each component i,

(x ∨ y)ᵢ + (x ∧ y)ᵢ = max{xᵢ, yᵢ} + min{xᵢ, yᵢ} = xᵢ + yᵢ

as desired.
Example 748 (i) Functions of a single variable are modular. Indeed, let x, y ∈ R with, say, x ≤ y. Then, x ∧ y = x and x ∨ y = y, so modularity trivially holds. (ii) Linear functions f : Rⁿ → R are modular: by (17.1) we have

f(x ∨ y) + f(x ∧ y) = f(x ∨ y + x ∧ y) = f(x + y) = f(x) + f(y)
Interestingly, the modularity notions just introduced have no bite on functions of a single
variable, so they are of interest only in the multivariable case. That said, the next two results
show how to manufacture supermodular functions via convex transformations.
Proposition 749 Let $f : L \to \mathbb{R}$ be a monotone and modular function. If $\varphi : C \to \mathbb{R}$ is a convex function defined on a convex set $C$ of the real line, with $\operatorname{Im} f \subseteq C$, then $\varphi \circ f$ is supermodular.

Proof Let $x, y \in L$ with, say, $f(x) \le f(y)$. By modularity, we have $f(x \vee y) - f(x) = f(y) - f(x \wedge y)$. We consider two cases. (i) Suppose that $f$ is increasing. We then have $f(x \wedge y) \le f(x)$ and $f(x \vee y) - f(x) = f(y) - f(x \wedge y) \ge 0$. Since $\varphi$ has increasing increments (cf. Proposition 1090), we then have $\varphi(f(y)) - \varphi(f(x \wedge y)) \le \varphi(f(x \vee y)) - \varphi(f(x))$. So, $\varphi \circ f$ is supermodular. (ii) Suppose that $f$ is decreasing. Now, $f(x \vee y) \le f(x)$ and $f(y) - f(x \vee y) = f(x \wedge y) - f(x) \ge 0$ and, since $\varphi$ has increasing increments, we have $\varphi(f(y)) - \varphi(f(x \vee y)) \le \varphi(f(x \wedge y)) - \varphi(f(x))$. We conclude that also in this case $\varphi \circ f$ is supermodular.
Example 750 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a positive linear function. Given any convex function $\varphi : \mathbb{R} \to \mathbb{R}$, the function $\varphi \circ f$ is supermodular. N
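Example 750 lends itself to a quick numerical sanity check; a sketch with illustrative choices of the linear function and of the convex transformation (both coefficients and `phi` are our own, not from the text):

```python
import random

# Supermodularity check for phi∘f (Example 750): f positive linear, phi convex.
def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))

def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))

f = lambda x: 2 * x[0] + 3 * x[1]   # positive linear function (illustrative coefficients)
phi = lambda t: t * t               # convex on [0, +inf)

random.seed(0)
for _ in range(1000):
    x = (random.uniform(0, 10), random.uniform(0, 10))
    y = (random.uniform(0, 10), random.uniform(0, 10))
    lhs = phi(f(join(x, y))) + phi(f(meet(x, y)))
    rhs = phi(f(x)) + phi(f(y))
    assert lhs >= rhs - 1e-9  # phi(f(x∨y)) + phi(f(x∧y)) >= phi(f(x)) + phi(f(y))
print("supermodularity verified on 1000 random pairs")
```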
Proposition 751 Let $f : L \to \mathbb{R}$ be an increasing and supermodular function. If $\varphi : C \to \mathbb{R}$ is a convex and increasing function defined on a convex set $C$ of the real line, with $\operatorname{Im} f \subseteq C$, then $\varphi \circ f$ is supermodular.

Proof Let $x, y \in L$ with, say, $f(x) \le f(y)$. Since $f$ is increasing, we have $f(x \wedge y) \le f(x) \le f(y) \le f(x \vee y)$. Set $k = f(x \vee y) - f(x)$ and $h = f(y) - f(x \wedge y)$. Since $f$ is supermodular, we have $k \ge h \ge 0$. Since $\varphi$ has increasing increments, we then have
$$\varphi(f(y)) - \varphi(f(x \wedge y)) = \varphi(f(x \wedge y) + h) - \varphi(f(x \wedge y)) \le \varphi(f(x) + h) - \varphi(f(x)) \le \varphi(f(x) + k) - \varphi(f(x)) = \varphi(f(x \vee y)) - \varphi(f(x))$$
where the last inequality holds because $\varphi$ is increasing. So, $\varphi \circ f$ is supermodular.
Example 752 Define $f : \mathbb{R}^2_+ \to \mathbb{R}$ by $f(x_1, x_2) = x_1 x_2$. Given any increasing and convex function $\varphi : \mathbb{R}_+ \to \mathbb{R}$, the function $\varphi \circ f$ is supermodular. N
Example 754 Consider the function $f : [1,+\infty) \times [3,+\infty) \times [2,+\infty) \to \mathbb{R}$ defined by $f(x_1, x_2, x_3) = \sqrt{(x_1 - 1)(x_2 - 3)(x_3 - 2)}$. For a fixed $x_1 \ge 1$, the section $f^{x_1} : [3,+\infty) \times [2,+\infty) \to \mathbb{R}$ now has $x_2$ and $x_3$ as the independent variables – indeed, we have $x_{-1} = (x_2, x_3)$. For instance, if $x_1 = 5$ the section $f^5 : [3,+\infty) \times [2,+\infty) \to \mathbb{R}$ is defined by $f^5(x_2, x_3) = 2\sqrt{(x_2 - 3)(x_3 - 2)}$. In a similar way we can define the sections $f^{x_2} : [1,+\infty) \times [2,+\infty) \to \mathbb{R}$ and $f^{x_3} : [1,+\infty) \times [3,+\infty) \to \mathbb{R}$.
On the other hand, if we fix $x_{-1} = (x_2, x_3) \in [3,+\infty) \times [2,+\infty)$, we have the section $f^{x_2, x_3} : [1,+\infty) \to \mathbb{R}$ that has $x_1$ as the independent variable. For instance, if $x_2 = 6$ and $x_3 = 10$, the section $f^{6,10} : [1,+\infty) \to \mathbb{R}$ is defined by $f^{6,10}(x_1) = 2\sqrt{6}\sqrt{x_1 - 1}$. In a similar way we can define the sections $f^{x_1, x_3} : [3,+\infty) \to \mathbb{R}$ and $f^{x_1, x_2} : [2,+\infty) \to \mathbb{R}$. N
The sections $f^{x_{-i}}$ can be used to formalize ceteris paribus arguments in which all variables are kept fixed, except $x_i$. Indeed, partial derivation at a point $x \in \mathbb{R}^n$ can be expressed in terms of these sections:
$$\frac{\partial f}{\partial x_i}(x) = \frac{d f^{x_{-i}}}{d x_i}(x_i)$$
In sum, we have sections $f^{x_i}$ in which the variable $x_i$ is kept fixed and the other variables vary, as well as a section $f^{x_{-i}}$ in which the opposite holds: the variable $x_i$ is the only independent variable, the other ones being kept fixed. In a similar spirit we can have “intermediate” sections in which we block a subset of the variables.
Example 755 Consider the function $f : [1,+\infty) \times [3,+\infty) \times [2,+\infty) \times [-1,+\infty) \to \mathbb{R}$ defined by $f(x_1, x_2, x_3, x_4) = \sqrt{(x_1 - 1)(x_2 - 3)(x_3 - 2)(x_4 + 1)}$. The “intermediate” section $f^{x_2, x_3} : [1,+\infty) \times [-1,+\infty) \to \mathbb{R}$ has $x_1$ and $x_4$ as independent variables. So, if $x_2 = 6$ and $x_3 = 5$, we have $f^{6,5}(x_1, x_4) = 3\sqrt{(x_1 - 1)(x_4 + 1)}$. N
Though this notation is more handy, superscripts best emphasize the parametric role of the
blocked variables.
2
Recall the notation $x_{-i}$ from Section 12.8.1. Here $A_{-i}$ is the Cartesian product of all sets $\{A_1, \dots, A_n\}$ except $A_i$, i.e., $A_{-i} = \prod_{j \ne i} A_j$.
17.3 Functions with increasing cross differences
Definition 756 A function $f : I \subseteq \mathbb{R}^n \to \mathbb{R}$ has increasing (cross) differences if, for each $x_i \in I_i$ and $h_i \ge 0$ with $x_i + h_i \in I_i$, the difference
$$f^{x_i + h_i}(x_{-i}) - f^{x_i}(x_{-i})$$
is increasing in $x_{-i}$, while $f$ has decreasing differences if such difference is decreasing in $x_{-i}$.
Increasing and decreasing differences are dual notions, so we will focus on the former. For functions of two variables, we have a simple characterization of this property.
and
$$f^{x_2 + h_2}(x_1) - f^{x_2}(x_1) \le f^{x_2 + h_2}(x_1 + h_1) - f^{x_2}(x_1 + h_1) \tag{17.4}$$
which are both equal to (17.2).
So, symmetrically, an increase in the first input has a higher impact when also the second input increases. In sum, the marginal contribution of an input is increasing in the other input: the two inputs are complements.
Proposition 758 A function $f : I \subseteq \mathbb{R}^n \to \mathbb{R}$ has increasing differences if and only if, for each $1 \le i \ne j \le n$, the section $f^{x_{-ij}} : I_{ij} \subseteq \mathbb{R}^2 \to \mathbb{R}$ satisfies (17.2), i.e.,
$$f^{x_{-ij}}(x_i, x_j + h_j) - f^{x_{-ij}}(x_i, x_j) \le f^{x_{-ij}}(x_i + h_i, x_j + h_j) - f^{x_{-ij}}(x_i + h_i, x_j) \tag{17.5}$$
In terms of the previous interpretation, we can say that a production function has increasing differences if and only if its inputs are pairwise complementary. Increasing differences thus model this form of complementarity. In a dual way, decreasing differences model an analogous form of substitutability.
Proof Assume that $f$ has increasing differences. To fix ideas, let $i = 1$ and $j = 2$. We want to show that
$$f^{x_{-12}}(x_1, x_2 + h_2) - f^{x_{-12}}(x_1, x_2) \le f^{x_{-12}}(x_1 + h_1, x_2 + h_2) - f^{x_{-12}}(x_1 + h_1, x_2)$$
We have
$$\begin{aligned}
f^{x_{-12}}(x_1, x_2 + h_2) - f^{x_{-12}}(x_1, x_2) &= f(x_1, x_2 + h_2, x_3, \dots, x_n) - f(x_1, x_2, x_3, \dots, x_n) \\
&= f^{x_2 + h_2}(x_1, x_3, \dots, x_n) - f^{x_2}(x_1, x_3, \dots, x_n) \\
&\le f^{x_2 + h_2}(x_1 + h_1, x_3, \dots, x_n) - f^{x_2}(x_1 + h_1, x_3, \dots, x_n) \\
&= f(x_1 + h_1, x_2 + h_2, x_3, \dots, x_n) - f(x_1 + h_1, x_2, x_3, \dots, x_n) \\
&= f^{x_{-12}}(x_1 + h_1, x_2 + h_2) - f^{x_{-12}}(x_1 + h_1, x_2)
\end{aligned}$$
as desired. The general case is analogous, just notationally cumbersome. So, (17.5) holds. We omit the proof of the converse.
The complementarity nature of functions with increasing differences, in which “the marginal contribution of an input is increasing in the other input”, has mathematically a (cross) second-order flavor. The next differential characterization confirms this intuition:
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \ge 0 \tag{17.6}$$
Proof “Only if”. Suppose $f$ has increasing differences. To fix ideas, let $i = 1$ and $j = 2$. By Proposition 758, the section $f^{x_{-12}} : I_1 \times I_2 \to \mathbb{R}$ satisfies (17.2). Let $x_1 \le x_1'$. By setting $h_1 = x_1' - x_1$, we get that the difference $f^{x_{-12}}(\cdot\,, x_2 + h_2) - f^{x_{-12}}(\cdot\,, x_2)$ is increasing in the first variable; dividing by $h_2$ and letting $h_2 \to 0$, the partial derivative $\partial f / \partial x_2$ is increasing in $x_1$. In turn, this implies $\partial^2 f(x) / \partial x_2 \partial x_1 \ge 0$. A similar argument shows that $\partial^2 f(x) / \partial x_1 \partial x_2 \ge 0$.
“If”. Suppose $\partial^2 f(x) / \partial x_i \partial x_j \ge 0$ for all $1 \le i \ne j \le n$. In view of Proposition 758, it is enough to show that the sections $f^{x_{-ij}}$ have increasing differences. Again to fix ideas, let
Example 760 (i) Let $f : \mathbb{R}^2_+ \to \mathbb{R}$ be a CES production function defined by $f(x) = (\alpha x_1^{\rho} + (1-\alpha) x_2^{\rho})^{1/\rho}$ with $\alpha \in [0,1]$ and $\rho > 0$ (cf. Example 705). We have
$$\frac{\partial^2 f(x)}{\partial x_1 \partial x_2} = (1-\rho)\,\alpha(1-\alpha)\,(x_1 x_2)^{\rho - 1} \left(\alpha x_1^{\rho} + (1-\alpha) x_2^{\rho}\right)^{\frac{1}{\rho} - 2}$$
By the previous result, $f$ has decreasing differences if $\rho > 1$ and increasing differences if $0 < \rho < 1$. So, the parameter $\rho$ determines whether the inputs in the CES production function are complements or substitutes. (ii) Let $f : \mathbb{R}^2_+ \to \mathbb{R}$ be a Cobb-Douglas production function $f(x) = x_1^{\alpha_1} x_2^{\alpha_2}$, with $\alpha_1, \alpha_2 > 0$ (cf. Example 708). Since $\partial^2 f(x) / \partial x_1 \partial x_2 = \alpha_1 \alpha_2 x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1} \ge 0$, by the previous result $f$ has increasing differences (so, its inputs are complements). N
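The sign condition in part (ii) can also be checked directly by finite differences, without computing derivatives; a minimal sketch (the exponent values are illustrative, not from the text):

```python
# Increasing differences for the Cobb-Douglas f(x1, x2) = x1**a1 * x2**a2:
# the increment f(x1+h1, .) - f(x1, .) should be larger at x2+h2 than at x2.
a1, a2 = 0.4, 0.7  # illustrative exponents, both > 0
f = lambda x1, x2: x1**a1 * x2**a2

def cross_difference(x1, x2, h1, h2):
    return (f(x1 + h1, x2 + h2) - f(x1, x2 + h2)) - (f(x1 + h1, x2) - f(x1, x2))

# Positive at positive inputs, consistent with the sign of the cross partial.
print(cross_difference(1.0, 1.0, 0.5, 0.5) > 0)  # True
```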
$$\begin{aligned}
f(y) - f(x) &= f(y_1, x_2, \dots, x_n) - f(x_1, \dots, x_n) + f(y_1, y_2, x_3, \dots, x_n) - f(y_1, x_2, x_3, \dots, x_n) \\
&\quad + \cdots + f(y_1, \dots, y_n) - f(y_1, \dots, y_{n-1}, x_n) \\
&= \sum_{i=1}^{n} \left[ f(y_1, \dots, y_i, x_{i+1}, \dots, x_n) - f(y_1, \dots, y_{i-1}, x_i, \dots, x_n) \right]
\end{aligned}$$
Proof “If”. Suppose that f has increasing di¤erences. Let x; y 2 I. By (17.1), we can set
For production functions, this means that, under constant returns to scale, complement-
arity implies concavity.
Proof We only prove the result when $f$ is twice differentiable on $\mathbb{R}^n_{++}$. Let $x \in \mathbb{R}^n_{++}$ and $y \in \mathbb{R}^n$. From
$$\left( \frac{y_i}{x_i} - \frac{y_j}{x_j} \right)^2 = \frac{y_i^2}{x_i^2} + \frac{y_j^2}{x_j^2} - 2 \frac{y_i y_j}{x_i x_j}$$
it follows that
$$y_i y_j = \frac{1}{2} \frac{x_j}{x_i} y_i^2 + \frac{1}{2} \frac{x_i}{x_j} y_j^2 - \frac{1}{2} x_i x_j \left( \frac{y_i}{x_i} - \frac{y_j}{x_j} \right)^2$$
So,
$$\sum_{1 \le i, j \le n} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} y_i y_j = \sum_{i=1}^{n} \frac{y_i^2}{x_i} \left( \sum_{j=1}^{n} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} x_j \right) - \frac{1}{2} \sum_{1 \le i, j \le n} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} x_i x_j \left( \frac{y_i}{x_i} - \frac{y_j}{x_j} \right)^2$$
Since $f$ is positively homogeneous, its partial derivatives are positively homogeneous of degree $0$, so Euler's Theorem yields
$$\sum_{i=1}^{n} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} x_i = 0 \qquad \forall x \in \mathbb{R}^n_{++}$$
We conclude that, for all $x \in \mathbb{R}^n_{++}$,
$$\sum_{1 \le i, j \le n} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} y_i y_j = - \frac{1}{2} \sum_{1 \le i, j \le n} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} x_i x_j \left( \frac{y_i}{x_i} - \frac{y_j}{x_j} \right)^2 = - \frac{1}{2} \sum_{1 \le i \ne j \le n} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} x_i x_j \left( \frac{y_i}{x_i} - \frac{y_j}{x_j} \right)^2 \le 0$$
where the last inequality follows from (17.6) and Theorem 761. The Hessian matrix of $f$ is thus negative semidefinite for all $x \in \mathbb{R}^n_{++}$ and so $f$ is concave on $\mathbb{R}^n_{++}$. The reader can check that the converse holds when $n = 2$.
Example 763 Let $f : \mathbb{R}^2_+ \to \mathbb{R}$ be the positively homogeneous function defined by $f(x) = (x_1^{\alpha_1} x_2^{\alpha_2})^{\frac{1}{\alpha_1 + \alpha_2}}$, with $\alpha_1, \alpha_2 > 0$. It is supermodular if $\alpha_1 + \alpha_2 \le 1$ (why?), so it is concave by Choquet's Theorem. N
A similar result holds for translation invariant functions (we omit the proof of this note-
worthy result).
$$f(\lambda x + (1-\lambda) y) \le [f(x)]^{\lambda} [f(y)]^{1-\lambda}$$
for every $x, y \in C$ and $\lambda \in [0,1]$, and it is said to be log-concave if the inequality is reversed.
Proof We prove the convex version, the concave one being similar. “If”. Let log f be convex.
In view of Proposition 43, we have
as desired.
Example 767 (i) The function $f : \mathbb{R} \to (0, \infty)$ given by $f(x) = e^{x^2}$ is log-convex. (ii) The Gaussian function $f : \mathbb{R} \to (0, \infty)$ defined by $f(x) = e^{-x^2}$ is log-concave. (iii) The exponential function is both log-concave and log-convex. N
Log-convexity and log-concavity are far from being dual notions: log-convexity is much better behaved, as the next result and example show.
Proposition 768 (i) Log-convex functions are convex. (ii) Concave functions are log-
concave functions, which in turn are quasi-concave.
Proof (i) Let $f$ be log-convex. Since $\log f$ is convex, the result follows from the convex version of Proposition 676-(i) because we can write $f = e^{\log f}$. (ii) Obvious.
17.5. LOG-CONVEX FUNCTIONS 517
Example 769 The quadratic function $f : (0, \infty) \to (0, \infty)$ defined by $f(x) = x^2$ is, at the same time, strictly convex and log-concave. Indeed, in view of the last lemma, it is enough to note that $\log f(x) = 2 \log x$ is concave. So, the converse of point (i) of the last proposition fails (there exist convex functions that are not log-convex), while point (ii) is all we can say about log-concave functions (they can even be strictly convex). N
It is easy to check that the product of log-convex functions is log-convex, as well as that
the product of log-concave functions is log-concave. Addition, instead, does not preserve
log-concavity.
For instance, the functions $e^{x}$ and $e^{2x}$ are log-concave (their logarithms are affine), but their sum is not:
$$\frac{d^2}{dx^2} \log\left(e^{x} + e^{2x}\right) = \frac{e^{x}}{(1 + e^{x})^2} > 0$$
As a further proof of the much better behavior of log-convexity, we have the following
remarkable result that shows that addition preserves log-convexity (we omit the proof).
Example 772 Given $n$ strictly positive scalars $t_i > 0$ and a strictly positive function $\varphi : (0, \infty) \to (0, \infty)$, define $f : C \to (0, \infty)$ by
$$f(x) = \sum_{i=1}^{n} \varphi(t_i)\, t_i^{x}$$
where $C$ is any interval of the real line, bounded or not. By Artin's Theorem, $f$ is log-convex. Indeed, each function $\varphi(t_i)\, t_i^{x}$ is log-convex in $x$ because $\log (\varphi(t_i)\, t_i^{x}) = \log \varphi(t_i) + x \log t_i$ is affine in $x$.
An integral version of Artin's Theorem actually permits to conclude that if $\varphi$ is continuous, then the function $f : C \to (0, \infty)$ defined by
$$f(x) = \int_0^{\infty} \varphi(t)\, t^{x-1}\, dt$$
is log-convex (provided the improper integrals are well defined for all $x \in C$). In this regard, note that the function $\varphi(t)\, t^{x-1}$ is log-convex in $x$ since $\log (\varphi(t)\, t^{x-1}) = \log \varphi(t) + (x-1) \log t$ is affine in $x$. In the special case $\varphi(t) = e^{-t}$ and $C = (0, \infty)$, the function $f$ is the classic gamma function
$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$$
We will consider this log-convex function later in the book (Section 23.5). N
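Midpoint log-convexity of the gamma function is easy to spot-check numerically; a sketch using Python's standard `math.lgamma` (which computes $\log \Gamma$):

```python
import math

# Log-convexity of the gamma function: log Γ at a midpoint should not
# exceed the average of the log Γ values at the endpoints.
def midpoint_logconvex(x, y):
    return math.lgamma((x + y) / 2) <= (math.lgamma(x) + math.lgamma(y)) / 2

print(midpoint_logconvex(0.5, 4.0))   # True
print(midpoint_logconvex(2.0, 10.0))  # True
```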
Part V
Optima
Chapter 18
Optimization problems
This is the central chapter of the book: it justifies the study of the notions discussed so far, as well as of those that we will see in the rest of the book.
18.1 Generalities
1
It is a kind of abstraction that any scientific inquiry requires, as Vilfredo Pareto most eloquently remarked in his seminal 1900 piece, to which we refer readers. Note that purposeful individual behavior might well be boundedly rational (hence, suboptimal); thus rationality is an additional assumption relative to methodological individualism (cf. Arrow, 1994).
[Figure: graph of $f(x) = 1 - x^2$ on $\mathbb{R}$]
It is immediate to see that f attains its maximum value, equal to 1, at the point x = 0, that
is, at the origin (Example 229). On the other hand, there is no point at which f attains a
minimum value.
Suppose that, for some reason, we are interested in the behavior of $f$ only on the interval $[1, 2]$, not on the entire domain $\mathbb{R}$. Then $f$ has $0$ as maximum value, attained at the point $x = 1$, while it has $-3$ as minimum value, attained at the point $x = 2$. Graphically:
[Figure: graph of $f(x) = 1 - x^2$, with maximum value $0$ at $x = 1$ and minimum value $-3$ at $x = 2$ on the choice set $[1, 2]$]
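These values can be confirmed numerically; a minimal sketch discretizing the choice set $[1, 2]$ on a grid (the grid size is our own choice):

```python
# Maximum and minimum values of f(x) = 1 - x**2 on the choice set [1, 2],
# approximated by evaluating f on 1001 equally spaced grid points.
f = lambda x: 1 - x * x
grid = [1 + i / 1000 for i in range(1001)]
print(max(f(x) for x in grid))  # 0.0, attained at x = 1
print(min(f(x) for x in grid))  # -3.0, attained at x = 2
```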
(i) the distinction between maximum value and maximizer: a maximizer is an element of the domain at which the function reaches its maximum value; the maximum value is, instead, the element of the codomain which is the image of a maximizer;
2
As already anticipated in Section 6.6.
(ii) the importance of the subset of the domain in which we are interested in establishing
the existence of maximizers or minimizers.
These two observations lead to the next definition, in which we consider an objective function $f$ and a subset $C$ of its domain, called choice set.

Definition 773 A point $\hat{x} \in C$ is a (global) maximizer of $f$ on $C$ if
$$f(\hat{x}) \ge f(x) \qquad \forall x \in C \tag{18.1}$$
The value $f(\hat{x})$ of the function at $\hat{x}$ is called (global) maximum value of $f$ on $C$.

In the special case $C = A$ when the choice set is the entire domain, the point $\hat{x}$ is called maximizer, without further specification (in this way, we recover the definition of Section 6.6).
(i) in the first case $C$ was the entire domain, that is, $C = \mathbb{R}$, and we had $\hat{x} = 0$ and $f(\hat{x}) = \max f(\mathbb{R}) = 1$;
(ii) in the second case $C$ was the interval $[1, 2]$ and we had $\hat{x} = 1$ and $f(\hat{x}) = \max f([1, 2]) = 0$.
The maximum value of the objective function $f$ on the choice set $C$ is, thus, nothing but the maximum of the set $f(C)$, i.e.,3
$$f(\hat{x}) = \max f(C)$$
By Proposition 33, the maximum value is unique. We denote this unique value by
$$\max_{x \in C} f(x)$$
The maximizers may, instead, not be unique and their set, called solution set, is denoted by $\arg\max_{x \in C} f(x)$, that is,
$$\arg\max_{x \in C} f(x) = \{x \in C : f(x) \ge f(y) \ \forall y \in C\}$$
with graph
[Figure: graph of a function that attains its maximum value $0$ at every point of the interval $[-1, 1]$ and is strictly negative elsewhere]
we have $\max_{x \in \mathbb{R}} f(x) = 0$ and $\arg\max_{x \in \mathbb{R}} f(x) = [-1, 1]$, so the set of maximizers is the entire interval $[-1, 1]$. On the other hand, if we restrict ourselves to $[1, +\infty)$, we have $\max_{x \in [1,+\infty)} f(x) = 0$ and $\arg\max_{x \in [1,+\infty)} f(x) = \{1\}$, so $1$ is the unique maximizer of $f$ on $[1, +\infty)$. Graphically:
[Figure: graph of the same function on the restricted choice set $[1, +\infty)$]
$$f(\hat{x}) > f(x) \qquad \forall x \in C, \ x \ne \hat{x}$$
Strong maximizers are an important class of maximizers, with the following property of
uniqueness.
Proposition 775 A maximizer is strong if and only if it is unique, that is, if and only if
arg maxx2C f (x) is a singleton.
Proof “Only if”. Suppose, by contradiction, that there exist two distinct strong global maximizers $\hat{x}_1$ and $\hat{x}_2$. By definition, $f(\hat{x}_1) > f(\hat{x}_2)$ and $f(\hat{x}_2) > f(\hat{x}_1)$, which is impossible.
Until now we have talked about maximizers, but analogous considerations hold for minimizers. For example, in Definition 773 an element $\hat{x} \in C$ is a (global) minimizer of $f$ on $C$ if $f(\hat{x}) \le f(x)$ for every $x \in C$, with minimum value $f(\hat{x}) = \min f(C)$, denoted by $\min_{x \in C} f(x)$. Maximizing and minimizing are actually two sides of the same coin, as formalized by the next result. Its obvious proof is based on the observation that $f(x) \ge f(y)$ if and only if $-f(x) \le -f(y)$ for every $x, y \in A$.
Analogous notions hold for minimization problems, in which we look for the minimum
value and the minimizers of an objective function on a given choice set. Finally, optimization
problems include both maximization and minimization problems, they are “genderless”.4
A maximization problem is thus written as
$$\max_{x} f(x) \quad \text{sub} \ x \in C \tag{18.2}$$
(“sub” stands for “subject to”), and a minimization problem has min in place of max. The $x$ below max indicates the choice variable, that is, the variable which we control to maximize the objective function. When $C = A$, sometimes we omit the clause “sub $x \in C$” since $x$ must obviously belong to the domain of $f$. In the important case in which the set $C$ is open, we talk of unconstrained optimization problems;5 otherwise, we talk of constrained optimization problems.
4
Because of our maximization emphasis, however, in what follows we often use interchangeably the terms
“optimization problem” and “maximization problem”.
5
Since an open set C is still a constraint, this terminology is unsatisfactory. To make some sense of it,
note that all the points x of an open set C are interior and so have a neighborhood B" (x) included in C.
One can thus “move around” the point x while still remaining within C. In this local sense, an open choice
set allows for some freedom.
$$f(x) \le 1 \qquad \forall x \in \mathbb{R}$$
Example 778 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x) = x_1^2 - 6 x_1 x_2 + 12 x_2^2$ for every $x = (x_1, x_2) \in \mathbb{R}^2$ and consider the optimization problem
$$\min_{x} f(x) \quad \text{sub} \ x \in \mathbb{R}^2$$
Since $f(x_1, x_2) = x_1^2 - 6 x_1 x_2 + 9 x_2^2 + 3 x_2^2 = (x_1 - 3 x_2)^2 + 3 x_2^2$, that is, $f$ is the sum of two squares, we have
$$f(x_1, x_2) \ge 0 \qquad \forall (x_1, x_2) \in \mathbb{R}^2$$
Next, since $f(0, 0) = 0$ (actually, $f$ assumes value $0$ only at the origin), we conclude that the origin $(0, 0)$ is a strong minimizer of $f$ on $\mathbb{R}^2$. The minimum value of $f$ on $\mathbb{R}^2$ is the scalar $0$. Finally, $f$ is unbounded above, so it has no maximizers. N
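The completing-the-square step of this example can be verified mechanically; a sketch:

```python
import random

# f and its sum-of-squares rewriting from Example 778.
f = lambda x1, x2: x1**2 - 6*x1*x2 + 12*x2**2
g = lambda x1, x2: (x1 - 3*x2)**2 + 3*x2**2

random.seed(1)
for _ in range(1000):
    x1, x2 = random.uniform(-5, 5), random.uniform(-5, 5)
    assert abs(f(x1, x2) - g(x1, x2)) < 1e-8  # the two forms agree
    assert f(x1, x2) >= -1e-9                 # f is bounded below by 0
print("f(0, 0) =", f(0, 0))  # 0, the minimum value
```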
Example 779 Let $f : \mathbb{R}^3 \to \mathbb{R}$ be given by $f(x) = e^{-x_1^2 - x_2^2 - x_3^2}$ for every $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ and consider the optimization problem
$$\max_{x} f(x) \quad \text{sub} \ x \in \mathbb{R}^3$$
Since $0 < f(x_1, x_2, x_3) \le 1$ for every $(x_1, x_2, x_3) \in \mathbb{R}^3$ and $f(0, 0, 0) = 1$, the origin $(0, 0, 0)$ is a strong maximizer of $f$ on $\mathbb{R}^3$. The maximum value of $f$ on $\mathbb{R}^3$ is the scalar $1$. However, $f$ does not have a minimizer because it never attains the infimum of its values, that is, $0$. N
Example 780 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = \cos x$, and consider the optimization problem
$$\min_{x} f(x) \quad \text{sub} \ x \in \mathbb{R}$$
Since $-1 \le \cos x \le 1$, all the points at which $f(x) = 1$ are maximizers and all the points at which $f(x) = -1$ are minimizers. The maximizers are, therefore, $\hat{x} = 2k\pi$ with $k \in \mathbb{Z}$ and the minimizers are $\tilde{x} = (2k+1)\pi$ with $k \in \mathbb{Z}$. The maximum and minimum values are the scalars $1$ and $-1$, respectively.
These maximizers and minimizers on $\mathbb{R}$ are not strong. However, if we consider a smaller choice set, such as $C = [0, 2\pi)$, we will find that the unique strong maximizer is $\hat{x} = 0$ and the unique strong minimizer is $\tilde{x} = \pi$. N
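A quick numerical sketch of this example:

```python
import math

# Maximizers of cos on R are x = 2k*pi; minimizers are x = (2k + 1)*pi.
for k in (-2, -1, 0, 1, 2):
    assert abs(math.cos(2 * k * math.pi) - 1) < 1e-12       # maximum value 1
    assert abs(math.cos((2 * k + 1) * math.pi) + 1) < 1e-12  # minimum value -1
print("maximum value 1 at 2k*pi, minimum value -1 at (2k+1)*pi")
```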
Example 781 For a constant function, all the points of the domain are simultaneously
maximizers and minimizers. Its constant value is simultaneously the maximum and minimum
value. N
Note that Definition 773 does not require the function to satisfy any special property; in particular, neither continuity nor differentiability is invoked. For example, the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = |x|$ attains its minimum value at the point $\hat{x} = 0$, where it is not differentiable. The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x + 1 & \text{if } x \le 1 \\ x & \text{if } x > 1 \end{cases}$$
with graph
[Figure: graph of this piecewise function, with a jump at $x = 1$]
18.1.2 Properties
The optimization problems (18.2) enjoy a simple, but important, property of invariance.
and
$$\max_{x} \ (g \circ f)(x) \quad \text{sub} \ x \in C$$
Therefore, $f(\hat{x}) \ge f(x)$ for every $x \in C$ if and only if $(g \circ f)(\hat{x}) \ge (g \circ f)(x)$ for every $x \in C$.
Thus, two objective functions – here $f$ and $f' = g \circ f$ – are equivalent when one is a strictly increasing transformation of the other.6 Later in the chapter, we will comment more on this simple, yet conceptually important, result (Section 18.1.5).
Let us now consider the case, important in economic applications (as we will soon see),
in which the objective function is strongly increasing.
Proof Let $\hat{x} \in \arg\max_{x \in C} f(x)$. We want to show that $\hat{x} \in \partial C$. Suppose, by contradiction, that $\hat{x} \notin \partial C$, i.e., $\hat{x}$ is an interior point of $C$. There exists, therefore, a neighborhood $B_{\varepsilon}(\hat{x})$ of $\hat{x}$ included in $C$. It is easy to see that, then, there exists $y \in B_{\varepsilon}(\hat{x})$ such that $y \ge \hat{x}$ and $y \ne \hat{x}$. Since $f$ is strongly increasing on $C$, we obtain that $f(y) > f(\hat{x})$, which contradicts the optimality of $\hat{x}$. We conclude that $\hat{x} \in \partial C$.
The possible solutions of the optimization problem (18.2) are, thus, boundary points
when the objective function is strongly increasing (a fortiori, if it is a strictly increasing
function; cf. Proposition 211). With this kind of objective function, we can thus simplify
problem (18.2) as follows:
max f (x) sub x 2 @C
x
We will soon see a remarkable application of this observation in Walras’law.
The last proposition implies that when $\partial C \cap C = \emptyset$, which happens for example when $C$ is open, the optimization problem (18.2) does not admit any solution if $f$ is strongly increasing. A trivial example is $f(x) = x$ on $C = (0, 1)$, as the graph shows:
[Figure: graph of $f(x) = x$ on the open choice set $(0, 1)$]
Therefore, $f(\hat{x}') \ge f(\hat{x})$ for every $\hat{x} \in \arg\max_{x \in C} f(x)$.
Larger sets C always lead to higher maximum values of the objective function. In other
terms, to have more opportunities to choose from is never detrimental, whatever the form
of the objective function is. This simple principle of monotonicity is often important. The
basic economic principle that removing constraints on agents’choices can only bene…t them
is, indeed, formalized by this proposition.
Example 785 Recall the initial example in which we considered two different sets of choices, $\mathbb{R}$ and $[1, 2]$, for the function $f(x) = 1 - x^2$. We had $\max_{x \in [1,2]} f(x) = 0 < 1 = \max_{x \in \mathbb{R}} f(x)$, in accordance with the last proposition. N
max x2 sub x 0
x
Example 787 Let $f : \mathbb{R}^2_{++} \to \mathbb{R}$ be defined by $f(x) = \log x_1 + \log x_2$. Consider the optimization problem
$$\max_{x} f(x) \quad \text{sub} \ x_1 + x_2 = 1$$
The problem is symmetric in each $x_i$, so it is natural to guess a symmetric solution $\hat{x}$ with equal components $\hat{x}_1 = \hat{x}_2$. Then, $\hat{x}_1 = \hat{x}_2 = 1/2$ because of the constraint $x_1 + x_2 = 1$. Let us verify this guess. Since the logarithmic function is strictly concave, if $y \ne \hat{x}$ and $y_1 + y_2 = 1$, we have
$$f(y) - f(\hat{x}) = \log 2y_1 + \log 2y_2 = 2\left( \frac{1}{2} \log 2y_1 + \frac{1}{2} \log 2y_2 \right) < 2 \log (y_1 + y_2) = 2 \log 1 = 0$$
So, $\hat{x}$ indeed solves the problem. Here the maximum value is $f(\hat{x}) = -\log 4$. N
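The verification can be corroborated numerically by sampling the constraint set; a sketch (the sampling scheme and tolerances are our own):

```python
import math, random

f = lambda x1, x2: math.log(x1) + math.log(x2)
best = f(0.5, 0.5)  # value at the guessed solution (1/2, 1/2)

random.seed(0)
for _ in range(10_000):
    x1 = random.uniform(1e-6, 1 - 1e-6)  # feasible points: x1 + x2 = 1
    x2 = 1 - x1
    assert f(x1, x2) <= best + 1e-12
print(best)  # -log 4 ≈ -1.386
```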
The next examples are a bit more complicated, but they are important in applications and show how a little thinking can save many calculations.
Example 788 Let $f : \mathbb{R}^n_+ \to \mathbb{R}$ be a Cobb-Douglas function defined by $f(x) = \prod_{i=1}^{n} x_i^{a_i}$, with $\sum_{i=1}^{n} a_i = 1$ and $a_i > 0$ for each $i$. Given $\beta \in \mathbb{R}^n_{++}$ and $\delta > 0$, consider the optimization problem
$$\max_{x} f(x) \quad \text{sub} \ x \in C \tag{18.3}$$
with choice set $C = \{x \in \mathbb{R}^n_+ : \sum_{i=1}^{n} \beta_i x_i = \delta\}$. It is easy to see that the maximizers belong to $\mathbb{R}^n_{++}$, that is, they have strictly positive components. Indeed, if $x$ lies on some axes of $\mathbb{R}^n$ – i.e., $x_i = 0$ for some $i$ – then $f(x) = 0$. Since $f \ge 0$ on $C$, it is easy to see that such $x$ cannot solve the problem. For this reason, we can consider the equivalent optimization problem
$$\max_{x} f(x) \quad \text{sub} \ x \in C \cap \mathbb{R}^n_{++} \tag{18.4}$$
We can do better: since $f > 0$ on $\mathbb{R}^n_{++}$, we can consider the logarithmic transformation $g = \log f$ of the objective function $f$, that is, the log-linear function $g(x) = \sum_{i=1}^{n} a_i \log x_i$. The problem
$$\max_{x} g(x) \quad \text{sub} \ x \in C \cap \mathbb{R}^n_{++} \tag{18.5}$$
is equivalent to the previous one by Proposition 782. It is, however, more tractable because of the log-linear form of the objective function.
Let us ponder over problem (18.5). Suppose first that both the coefficients $a_i$ and $\beta_i$ are equal among themselves, with $a_i = 1/n$ (because $\sum_{i=1}^{n} a_i = 1$) and $\beta_i = 1$ for each $i$. The problem is then symmetric in each $x_i$, so it is natural to guess a symmetric solution $\hat{x}$, with $\hat{x}_1 = \cdots = \hat{x}_n$. Then, $\hat{x}_i = \delta a_i$ for each $i$ because of the constraint $\sum_{i=1}^{n} x_i = \delta$. If, instead, the coefficients differ, the asymmetry in the solutions should depend on the coefficients $\beta_i$ and $a_i$ peculiar to each $x_i$. An (educated) guess is that the solution is
$$\hat{x} = \left( \delta \frac{a_1}{\beta_1}, \dots, \delta \frac{a_n}{\beta_n} \right) \tag{18.6}$$
Let us verify this guess. We have $\hat{x} \in C \cap \mathbb{R}^n_{++}$ because $\hat{x} \in \mathbb{R}^n_{++}$ and
$$\sum_{i=1}^{n} \beta_i \hat{x}_i = \sum_{i=1}^{n} \beta_i \delta \frac{a_i}{\beta_i} = \delta \sum_{i=1}^{n} a_i = \delta$$
We now show that $\sum_{i=1}^{n} a_i \log y_i < \sum_{i=1}^{n} a_i \log \hat{x}_i$ for every $y \in C \cap \mathbb{R}^n_{++}$ with $y \ne \hat{x}$. Since $\log x$ is strictly concave, by Jensen's inequality (14.13) we have
$$\sum_{i=1}^{n} a_i \log y_i - \sum_{i=1}^{n} a_i \log \hat{x}_i = \sum_{i=1}^{n} a_i \log \frac{y_i}{\hat{x}_i} < \log \sum_{i=1}^{n} a_i \frac{y_i}{\hat{x}_i} = \log \sum_{i=1}^{n} \frac{\beta_i y_i}{\delta} = \log \frac{\delta}{\delta} = \log 1 = 0$$
as desired. We conclude that (18.6) is indeed the unique solution of the problem. N
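The closed-form solution (18.6) can be stress-tested against random feasible points; a sketch (the lists `a`, `b` and the scalar `delta` below are illustrative choices of the coefficients $a_i$, $\beta_i$ and of the constant $\delta$, not from the text):

```python
import math, random

# Numerical stress test of the closed-form solution (18.6).
a = [0.2, 0.3, 0.5]   # a_i > 0 with sum 1
b = [1.0, 2.0, 4.0]   # beta_i > 0
delta = 10.0

xhat = [delta * ai / bi for ai, bi in zip(a, b)]   # candidate solution (18.6)
g = lambda x: sum(ai * math.log(xi) for ai, xi in zip(a, x))

random.seed(0)
for _ in range(5000):
    # random feasible point: positive weights rescaled so that sum beta_i x_i = delta
    w = [random.uniform(0.01, 1.0) for _ in b]
    scale = delta / sum(bi * wi for bi, wi in zip(b, w))
    x = [wi * scale for wi in w]
    assert g(x) <= g(xhat) + 1e-9
print("candidate (18.6) beats 5000 random feasible points")
```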
Indeed, if $x \in C$ then
$$x = \sum_{i=1}^{n} x_i e^i = \sum_{i=1}^{n} \beta_i x_i \frac{1}{\beta_i} e^i = \sum_{i=1}^{n} \beta_i x_i \tilde{e}^i$$
where $\beta_i x_i \ge 0$ for each $i$ and $\sum_{i=1}^{n} \beta_i x_i = 1$ (because $x \in C$). It is easy to check that each $\tilde{e}^i$ belongs to $C$. We are now in a position to say something about the optimization problem (18.7). Since $f$ is convex, we have
$$f(x) = f\left( \sum_{i=1}^{n} \beta_i x_i \tilde{e}^i \right) \le \sum_{i=1}^{n} \beta_i x_i f(\tilde{e}^i) \le \max_{i=1,\dots,n} f(\tilde{e}^i)$$
Thus, to find a maximizer it is enough to check which $\tilde{e}^i$ receives the highest evaluation under $f$. Since the vectors $\tilde{e}^i$ lie on some axis of $\mathbb{R}^n$, in this way we find what in the economics jargon are called corner solutions.
That said, there might well be maximizers that this simple reasoning may neglect. In other words, we only showed that the maximum value of $f$ on $C$ coincides with that of the finite problem
$$\max_{x} f(x) \quad \text{sub} \ x \in \{\tilde{e}^1, \dots, \tilde{e}^n\} \tag{18.8}$$
To say something more about all possible maximizers, i.e., about the set $\arg\max_{x \in C} f(x)$, we need to assume more on the objective function $f$. We consider two important cases:
(i) Assume that $f$ is strictly convex. Then, the only maximizers in $C$ are among the vectors $\tilde{e}^j$, that is,
$$\arg\max_{x \in C} f(x) = \arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x)$$
Indeed, strict convexity yields a strict inequality as soon as $\beta_i x_i > 0$ for at least two indexes $i$, that is,
$$f(x) = f\left( \sum_{i=1}^{n} \beta_i x_i \tilde{e}^i \right) < \sum_{i=1}^{n} \beta_i x_i f(\tilde{e}^i)$$
For example, if $\beta_1 < \beta_2 < \beta_3$, then $\tilde{e}^1 = (1/\beta_1, 0, 0)$ is the only solution, while if $\beta_1 = \beta_2 < \beta_3$, then $\tilde{e}^1 = (1/\beta_1, 0, 0)$ and $\tilde{e}^2 = (0, 1/\beta_2, 0)$ are the only two solutions.
(ii) Assume that $f$ is affine, i.e., $f(x) = \gamma_0 + \gamma_1 x_1 + \cdots + \gamma_n x_n$. Then, the set of maximizers consists of the vectors $\tilde{e}^j$ that solve problem (18.8) and of their convex combinations (as the reader can easily check). That is,
$$\operatorname{co}\left( \arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x) \right) = \arg\max_{x \in C} f(x)$$
where the left-hand side is the convex envelope of the vectors in $\arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x)$ (a polytope; cf. Example 689). For instance, in the case $n = 3$, if $\gamma_1 / \beta_1 > \gamma_2 / \beta_2 > \gamma_3 / \beta_3$, then $\tilde{e}^1 = (1/\beta_1, 0, 0)$ is the only solution of problem (18.10), so of problem (18.9). On the other hand, if $\gamma_1 / \beta_1 = \gamma_2 / \beta_2 > \gamma_3 / \beta_3$, then $\tilde{e}^1 = (1/\beta_1, 0, 0)$ and $\tilde{e}^2 = (0, 1/\beta_2, 0)$ solve problem (18.10), so the polytope
$$\operatorname{co}\{\tilde{e}^1, \tilde{e}^2\} = \{t \tilde{e}^1 + (1-t) \tilde{e}^2 : t \in [0,1]\} = \left\{ \left( \frac{t}{\beta_1}, \frac{1-t}{\beta_2}, 0 \right) : t \in [0,1] \right\}$$
is the set of solutions.
To sum up, some simple arguments show that optimization problems featuring convex
objective functions and linear constraints have corner solutions. Section 18.6.2 will discuss
these problems, which often arise in applications. N
$$\hat{x} = \left( \frac{\delta}{\sum_{i=1}^{n} \beta_i}, \dots, \frac{\delta}{\sum_{i=1}^{n} \beta_i} \right) \tag{18.11}$$
because of the constraint. To verify this guess, let $x^* \in C$ be a solution of the problem, so that $f(x^*) \ge f(y)$ for all $y \in C$. As we will see, by Weierstrass' Theorem such a solution exists. We want to show that $x^* = \hat{x}$. It is easy to check that, if $k = (k, \dots, k) \in \mathbb{R}^n$ is a constant vector and $\lambda \ge 0$ is a positive scalar, we have
$$f(\lambda x + k) = \lambda f(x) + k \qquad \forall x \in \mathbb{R}^n \tag{18.12}$$
In particular,
$$f(x^*) \ge f\left( \frac{1}{2} x^* + \frac{1}{2} \hat{x} \right) = \frac{1}{2} f(x^*) + \frac{1}{2} \frac{\delta}{\sum_{i=1}^{n} \beta_i}$$
So, $\min_{i=1,\dots,n} x_i^* = f(x^*) \ge \delta / \sum_{i=1}^{n} \beta_i$, that is, $x_i^* \ge \delta / \sum_{i=1}^{n} \beta_i$ for each $i$. Suppose $x^* \ne \hat{x}$, that is, $x^* > \hat{x}$. Since $x^* \in C$, we reach the contradiction
$$\delta = \sum_{i=1}^{n} \beta_i x_i^* > \sum_{i=1}^{n} \beta_i \frac{\delta}{\sum_{j=1}^{n} \beta_j} = \delta$$
We conclude that $x^* = \hat{x}$. The constant vector (18.11) is thus the unique solution of the problem. N
$$B(p, w) = \{x \in A : p \cdot x \le w\}$$
We write $B(p, w)$ to highlight the dependence of the budget set on $p$ and on $w$. For example,
$$w \le w' \implies B(p, w) \subseteq B(p, w') \tag{18.13}$$
that is, to a greater income there corresponds a larger budget set. Analogously,
$$p \ge p' \implies B(p, w) \subseteq B(p', w) \tag{18.14}$$
with $\alpha \in [0, 1]$ and $\rho \in (0, 1]$. In this case the consumption set is $A = \mathbb{R}^2_+$.
(ii) Let $u : \mathbb{R}^2_{++} \to \mathbb{R}$ be the log-linear utility function
$$u(x) = a \log x_1 + (1 - a) \log x_2$$
with $a \in (0, 1)$. Here the consumption set is $A = \mathbb{R}^2_{++}$. CES and log-linear consumers have therefore different consumption sets.
(iii) Suppose that the consumer has a subsistence bundle $\bar{x} \gg 0$, so that he can consider only bundles $x \ge \bar{x}$ (in order to survive). In this case it is natural to take as consumption set the closed and convex set
$$A = \{x \in \mathbb{R}^n_+ : x \ge \bar{x}\}$$
For instance, we can consider the restrictions of the CES and log-linear utility functions on this set $A$. N
The next result shows some remarkable properties of the budget set.
Proposition 792 The budget set B (p; w) is convex if A is convex and it is compact if A is
closed and p 0.
The importance of the condition $p \gg 0$ is obvious: if some good were free (and available in unlimited quantity), the consumer could obtain any quantity of it and the budget set would then be unbounded.
In light of this proposition, we will often assume that the consumption set A is closed
and convex (but the log-linear utility function is an important example featuring an open
consumption set).
Proof Let $A$ be closed and $p \gg 0$. Let us show that $B(p, w)$ is closed. Consider a sequence of bundles $x^k \in B(p, w)$ such that $x^k \to x$. Since $A$ is closed, $x$ belongs to $A$. Since $p \cdot x^k \le w$ for every $k$, we have $p \cdot x = \lim p \cdot x^k \le w$. Therefore, $x \in B(p, w)$. By Theorem 165, $B(p, w)$ is closed.
We are left to show that $B(p, w)$ is a bounded set. By contradiction, suppose that there exists a sequence $x^k \in B(p, w)$ such that $x_i^k \to +\infty$ for some good $i$. Since $p \gg 0$ and $x^k \in \mathbb{R}^n_+$, we have $p \cdot x^k \ge p_i x_i^k$ for every $k$. We reach therefore the contradiction
$$w \ge \lim p \cdot x^k \ge \lim p_i x_i^k = +\infty$$
Finally, let $A$ be convex. Given $x, y \in B(p, w)$ and $\lambda \in [0, 1]$, we have $\lambda x + (1-\lambda) y \in A$ and
$$p \cdot (\lambda x + (1-\lambda) y) = \lambda (p \cdot x) + (1-\lambda)(p \cdot y) \le \lambda w + (1-\lambda) w = w$$
so $\lambda x + (1-\lambda) y \in B(p, w)$, which proves that $B(p, w)$ is convex.
The consumer (optimization) problem consists in maximizing the consumer utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ on the budget set $B(p, w)$, that is,
$$\max_{x} u(x) \quad \text{sub} \ x \in B(p, w) \tag{18.16}$$
Given prices and income, the budget set $B(p, w)$ is the choice set of the consumer problem. In particular, a bundle $\hat{x} \in B(p, w)$ is a maximizer, that is, a solution of the optimization problem (18.16), if
$$u(\hat{x}) \ge u(x) \qquad \forall x \in B(p, w)$$
while $\max_{x \in B(p,w)} u(x)$ is the maximum utility that can be attained by the consumer.
The maximum utility $\max_{x \in B(p,w)} u(x)$ depends on the income $w$ and on the vector of prices $p$: the function $v : \mathbb{R}^n_{++} \times \mathbb{R}_{++} \to \mathbb{R}$ defined by
$$v(p, w) = \max_{x \in B(p,w)} u(x)$$
is called the indirect utility function.7 When prices and income vary, it indicates how the maximum utility that the consumer may attain varies.
Example 794 The unique optimal bundle for the log-linear utility function $u(x) = a \log x_1 + (1 - a) \log x_2$, with $a \in (0, 1)$, is given by $\hat{x}_1 = a w / p_1$ and $\hat{x}_2 = (1 - a) w / p_2$ (Example 788). It follows that the indirect utility function associated to the log-linear utility function is
$$\begin{aligned}
v(p, w) = u(\hat{x}) &= a \log \frac{a w}{p_1} + (1 - a) \log \frac{(1 - a) w}{p_2} \\
&= a (\log a + \log w - \log p_1) + (1 - a)(\log (1 - a) + \log w - \log p_2) \\
&= \log w + a \log a + (1 - a) \log (1 - a) - (a \log p_1 + (1 - a) \log p_2)
\end{aligned}$$
N
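The closed form just derived can be checked against a direct computation at the optimal bundle; a sketch (the values of `a`, the prices and the income below are illustrative, not from the text):

```python
import math

# Consistency check of the indirect utility formula of Example 794.
a = 0.3
u = lambda x1, x2: a * math.log(x1) + (1 - a) * math.log(x2)

def v_direct(p1, p2, w):
    # utility at the optimal bundle x1 = a*w/p1, x2 = (1-a)*w/p2
    return u(a * w / p1, (1 - a) * w / p2)

def v_closed_form(p1, p2, w):
    return (math.log(w) + a * math.log(a) + (1 - a) * math.log(1 - a)
            - (a * math.log(p1) + (1 - a) * math.log(p2)))

print(abs(v_direct(2.0, 5.0, 100.0) - v_closed_form(2.0, 5.0, 100.0)) < 1e-12)  # True
```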
Thanks to (18.13) and (18.14), the property of monotonicity seen in Proposition 784
takes the following form for indirect utility functions.
7
Here, we are tacitly assuming that a maximizer exists for every pair (p; w) of prices and income. Later
in the chapter we will present results, namely Weierstrass’and Tonelli’s theorems, that guarantee this.
$$w \le w' \implies v(p, w) \le v(p, w')$$
and
$$p \ge p' \implies v(p, w) \le v(p', w)$$
In other words, consumers always bene…t both from a higher income and from lower
prices, regardless of their utility functions (provided they are continuous).
Proof Let $x \in B(p, w)$ be such that $p \cdot x < w$. It is easy to see that, being $A$ closed under majorization,8 there exists $y \ge x$, with $y \ne x$, such that $p \cdot y \le w$. Indeed, taking any $0 < \varepsilon < (w - p \cdot x) / \sum_{i=1}^{n} p_i$, it is sufficient to set $y = x + \varepsilon (1, \dots, 1)$, that is, $y_i = x_i + \varepsilon$ for every $i = 1, \dots, n$. Since $u$ is strongly increasing, we have $u(y) > u(x)$ and therefore $x$ cannot be a solution of the consumer problem.
The consumer therefore allocates all of his income to the purchase of an optimal bundle $\hat{x}$, that is, $p \cdot \hat{x} = w$.9 This property is called Walras' law and, thanks to it, in the consumer problem with strongly increasing utility functions we can replace the budget set $B(p,w)$ by its subset
$$\{x \in A : p \cdot x = w\} \subseteq \partial B(p,w)$$
defined by the equality constraint.
Producer problem Consider a producer who must decide the quantity $y$ of a given output to produce. In taking such a decision the producer must consider both the revenue $r(y)$ that he will earn by selling the quantity $y$ and the cost $c(y)$ that he will bear to produce it.
Let $r : [0, \infty) \to \mathbb{R}$ be the revenue function and $c : [0, \infty) \to \mathbb{R}$ the cost function of the producer. His profit is therefore represented by the function $\pi : [0, \infty) \to \mathbb{R}$ given by
$$\pi(y) = r(y) - c(y)$$
The producer (optimization) problem is to maximize his profit function $\pi : [0, \infty) \to \mathbb{R}$, that is,
$$\max_y \pi(y) \quad \text{sub} \quad y \ge 0 \tag{18.18}$$
8 A set $A$ is closed under majorization if $x \in A$ and $y \ge x$ imply $y \in A$. That is, if $A$ contains a vector $x$, it also contains all the vectors $y$ that are greater than $x$. For instance, $\mathbb{R}^n_+$ and $\mathbb{R}^n_{++}$ are both closed under majorization, so to fix ideas the reader can think of them in reading Walras' law.
9 Proposition 796 is sharper than Proposition 783 because there exist points of the boundary $\partial B(p,w)$ such that $p \cdot x < w$. For example, the origin $0 \in \partial B(p,w)$ (provided $0 \in A$).
18.1. GENERALITIES 539
A quantity $\hat{y} \ge 0$ is a solution of problem (18.18) if
$$\pi(\hat{y}) \ge \pi(y) \qquad \forall y \ge 0$$
while $\max_{y \in [0,\infty)} \pi(y)$ is the maximum profit that the producer can obtain. The set of the (profit) maximizing outputs is $\arg\max_{y \in [0,\infty)} \pi(y)$.
The form of the revenue function depends on the structure of the market in which the
producer sells the output, while that of the cost function depends on the structure of the
market where the producer buys the inputs necessary to produce the good. Let us consider
some classic market structures.
(i) The output market is perfectly competitive, so that the sale price $p \ge 0$ is independent of the quantity that the producer decides to produce. In such a case the revenue function $r : [0,\infty) \to \mathbb{R}$ is given by
$$r(y) = py$$
(ii) The producer is a monopolist on the output market. Let us suppose that the demand function on this market is $D : [0,\infty) \to \mathbb{R}$, where $D(y)$ denotes the unit price at which the market absorbs the quantity $y$ of output. Usually, for obvious reasons, we assume that the demand function is decreasing: the market absorbs greater and greater quantities of output as its unit price gets lower and lower. The revenue function $r : [0,\infty) \to \mathbb{R}$ is therefore given by
$$r(y) = yD(y)$$
(iii) The input market is perfectly competitive, that is, the input vectors $x = (x_1, x_2, \dots, x_n)$ can be purchased at prices $w = (w_1, w_2, \dots, w_n)$ that are independent of the quantities that the producer decides to buy ($w_i$ is the price of the $i$-th input). The cost of a vector $x$ of inputs is thus equal to $w \cdot x = \sum_{i=1}^n w_i x_i$. But how does this cost translate into a cost function $c(y)$?
To answer this question, assume that $f : \mathbb{R}^n_+ \to \mathbb{R}$ is the production function that the producer has at his disposal to transform a vector $x \in \mathbb{R}^n_+$ of inputs into the quantity $f(x)$ of output. The cost $c(y)$ of producing the quantity $y$ of output is then obtained by minimizing the cost $w \cdot x$ among all the vectors $x \in \mathbb{R}^n_+$ that belong to the isoquant
$$f^{-1}(y) = \{x \in \mathbb{R}^n_+ : f(x) = y\}$$
that is, among all the vectors that allow the producer to produce the quantity $y$ of output. Indeed, in terms of production, the inputs in $f^{-1}(y)$ are equivalent, and so the producer will opt for the cheaper ones. In other terms, the cost function $c : [0,\infty) \to \mathbb{R}$ is given by
$$c(y) = \min_{x \in f^{-1}(y)} w \cdot x$$
that is,10 it is equal to the minimum value of the minimum problem for the cost $w \cdot x$ on the isoquant $f^{-1}(y)$.
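The minimization over the isoquant can be sketched numerically. The snippet below assumes, purely for illustration, a Cobb-Douglas technology $f(x) = x_1^\alpha x_2^{1-\alpha}$ (not specified in the text) and recovers $c(y)$ by a grid search along the isoquant, comparing it with the well-known closed form for this technology:

```python
# Illustrative technology and numbers (assumptions, not from the text).
alpha, w1, w2, y = 0.4, 3.0, 2.0, 10.0

def isoquant_x2(x1):
    # solve x1^alpha * x2^(1-alpha) = y for x2, so (x1, x2) lies on f^{-1}(y)
    return (y / x1 ** alpha) ** (1 / (1 - alpha))

# c(y) = min of the input cost w·x along the isoquant (grid search).
costs = []
for k in range(1, 20000):
    x1 = k * 0.01
    costs.append(w1 * x1 + w2 * isoquant_x2(x1))
c_y = min(costs)

# Known closed form for Cobb-Douglas: c(y) = y (w1/alpha)^alpha (w2/(1-alpha))^(1-alpha).
c_exact = y * (w1 / alpha) ** alpha * (w2 / (1 - alpha)) ** (1 - alpha)
assert abs(c_y - c_exact) / c_exact < 1e-3
```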
To sum up, a producer who, for example, is a monopolist in the output market and faces perfect competition in the input markets has a profit function
$$\pi(y) = yD(y) - c(y)$$
Instead, a producer who faces perfect competition in all markets, for the output and the inputs, has a profit function
$$\pi(y) = py - c(y)$$
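For a concrete feel of the monopolist's problem, the sketch below assumes a linear demand $D(y) = 10 - y$ and a linear cost $c(y) = 2y$ (illustrative choices, not from the text) and finds the profit-maximizing output by grid search, checking it against the first-order condition:

```python
# Hypothetical monopolist: D(y) = 10 - y and c(y) = 2y, so
# profit(y) = y D(y) - c(y) = y(10 - y) - 2y.
def profit(y):
    return y * (10 - y) - 2 * y

# Grid search for the profit-maximizing output on [0, 10].
ys = [k * 0.001 for k in range(10001)]
y_star = max(ys, key=profit)

# First-order condition: marginal revenue = marginal cost, 10 - 2y = 2, y = 4.
assert abs(y_star - 4.0) < 1e-3
```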
18.1.5 Comments
Ordinality Properties of functions that are preserved under strictly increasing transformations are called ordinal, as we mentioned when discussing utility theory (Sections 6.4.4 and 14.4). In view of Proposition 782, a property may hold for all equivalent objective functions only if it is ordinal. For instance, all of them can be quasi-concave but not concave (quasi-concavity, but not concavity, is an ordinal property). So, if we are interested in a property of solutions and wonder which properties of objective functions would ensure it, ideally we should look for ordinal properties. If we come up with sufficient conditions that are not so – for instance, concavity or continuity conditions – chances are that there exist more general sufficient conditions that are ordinal. In any case, any necessary condition must be ordinal in that it has to hold for all equivalent objective functions.
To illustrate this subtle, yet important, methodological point, consider the uniqueness of solutions, a most desirable property for comparative statics exercises (as we remarked earlier in the chapter). We will soon learn that strict quasi-concavity is an ordinal property that ensures such uniqueness (Theorem 831). So does strict concavity, which, however, is not an ordinal property. Yet, conceptually, strict quasi-concavity is the best way to frame this sufficient condition – though, operationally, strict concavity might be the workable version. What about a necessary condition for uniqueness of solutions? At the end of the chapter we will digress on cuneiformity, an ordinal property that is both necessary and sufficient for uniqueness (Proposition 864). As soon as we look for necessary conditions, ordinality takes center stage.
Rationality Optimization problems are fundamental also in the natural sciences, as Leonida Tonelli well explains in a 1940 piece: "Maximum and minimum questions have always had a great importance also in the interpretation of natural phenomena because they are governed by a general principle of parsimony. Nature, in its manifestations, tends to save the most possible of what it uses; therefore, the solutions that it finds are always solutions of either minimization or maximization problems". The general principle to which Tonelli alludes, the so-called principle of least action, is a metaphysical principle (in the most basic meaning of this term). Not by chance, Tonelli continues by writing "Euler said that,
10 To be mathematically precise, the min in the previous expression should be an inf. We tacitly assume that the inf is indeed achieved.
18.2. EXISTENCE: WEIERSTRASS’THEOREM 541
since the construction of the world is the most perfect and was established by the wisest creator, nothing happens in this world without an underlying maximum or minimum principle".
In economics, instead, the centrality of optimization problems rests on a (secular) assumption of rationality of economic agents. The resulting optimal choices of the agents – for example, optimal bundles for consumers and optimal outputs for producers – are the natural benchmark against which to assess any suboptimal, boundedly rational behavior that agents may exhibit.
The optimization problem
$$\max_x f(x) \quad \text{sub} \quad x \in C$$
admits a solution whenever $f$ is continuous and $C$ is compact. This holds also for the dual optimization problem with min in place of max.
Proposition 798 If the utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ is continuous on the closed set $A$, then the consumer problem has a solution provided $p \gg 0$ (no free goods).
In words, if the utility function is continuous and the consumption set is closed, optimal bundles exist as long as there are no free goods. These conditions are fairly mild and often satisfied.11
Given the importance of Weierstrass' Theorem, we close the section with two possible proofs. First, we need an important remark on notation.
Notation In the rest of the book, to simplify notation, we also denote sequences of vectors by $\{x_n\}$. If needed, the writing $\{x_n\} \subseteq \mathbb{R}^n$ should clarify the vector nature of the sequence, even though here $n$ denotes both the dimension of the space $\mathbb{R}^n$ and a generic term $x_n$ of the sequence. It is a slight abuse of notation, as the same letter denotes two altogether different entities, but hopefully it should not cause any confusion.
Lemma 800 Let $A$ be a subset of the real line. There exists a sequence $\{a_n\} \subseteq A$ such that $a_n \to \sup A$.
Proof Set $\alpha = \sup A$. Suppose that $\alpha \in \mathbb{R}$. By Proposition 120, for every $\varepsilon > 0$ there exists $a_\varepsilon \in A$ such that $a_\varepsilon > \alpha - \varepsilon$. By taking $\varepsilon = 1/n$ for every $n \ge 1$, it is therefore possible to build a sequence $\{a_n\} \subseteq A$ such that $a_n > \alpha - 1/n$ for every $n$. It is immediate to see that $a_n \to \alpha$.
Suppose now $\alpha = +\infty$. It follows that for every $K > 0$ there exists $a_K \in A$ such that $a_K \ge K$. By taking $K = n$ for every $n \ge 1$, we can therefore build a sequence $\{a_n\}$ such that $a_n \ge n$ for every $n$. It is immediate to see that $a_n \to +\infty$.
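The construction in the first half of the proof is easy to see in action. The sketch below takes $A = (0,1)$ (a hypothetical choice, with unattained supremum $\alpha = 1$) and a concrete selection $a_n = 1 - 1/(2n)$, which satisfies $a_n > \alpha - 1/n$ and converges to $\alpha$:

```python
# Lemma 800 for A = (0, 1): sup A = 1 is not attained, but choosing
# a_n in A with a_n > 1 - 1/n yields a sequence converging to sup A.
alpha = 1.0
a = [1 - 1 / (2 * n) for n in range(1, 10001)]    # a_n = 1 - 1/(2n), in A
for n, an in enumerate(a, start=1):
    assert 0 < an < 1                 # a_n belongs to A
    assert an > alpha - 1 / n         # a_n > alpha - 1/n, as in the proof
assert abs(a[-1] - alpha) < 1e-3      # a_n -> alpha
```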
First proof of Weierstrass' Theorem Set $\alpha = \sup_{x \in C} f(x)$, that is, $\alpha = \sup f(C)$. By the previous lemma, there exists a sequence $\{a_n\} \subseteq f(C)$ such that $a_n \to \alpha$. Let $\{x_n\} \subseteq C$ be such that $a_n = f(x_n)$ for every $n \ge 1$. Since $C$ is compact, the Bolzano-Weierstrass Theorem yields a subsequence $\{x_{n_k}\} \subseteq \{x_n\}$ that converges to some $\hat{x} \in C$, that is, $x_{n_k} \to \hat{x} \in C$. Since $\{a_n\}$ converges to $\alpha$, the subsequence $\{a_{n_k}\}$ also converges to $\alpha$. Since $f$ is continuous, it follows that
$$\alpha = \lim_{k \to \infty} a_{n_k} = \lim_{k \to \infty} f(x_{n_k}) = f(\hat{x})$$
11 Free goods short-circuit the consumer problem, so constraints may actually help consumers to focus: (homo oeconomicus) e vinculis ratiocinatur.
We conclude that $\hat{x}$ is a solution and $\alpha = \max f(C)$, that is, $\hat{x} \in \arg\max_{x \in C} f(x)$ and $\alpha = \max_{x \in C} f(x)$. A similar argument shows that $\arg\min_{x \in C} f(x)$ is not empty.
Proof With the notions of topology at our disposal, we are able to prove the result only in the case $n = 1$ (the general case, however, does not present substantial differences). So, let $n = 1$. By Definition 29, to show that the set $f(K)$ is bounded in $\mathbb{R}$ it is necessary to show that it is bounded both above and below in $\mathbb{R}$. Suppose, by contradiction, that $f(K)$ is unbounded above. Then there exists a sequence $\{y_n\} \subseteq f(K)$ such that $\lim_{n \to \infty} y_n = +\infty$. Let $\{x_n\} \subseteq K$ be the corresponding sequence such that $f(x_n) = y_n$ for every $n$. The sequence $\{x_n\}$ is bounded since it is contained in the bounded set $K$. By the Bolzano-Weierstrass Theorem, there exist a subsequence $\{x_{n_k}\}$ and a point $\tilde{x} \in \mathbb{R}$ such that $\lim_{k \to \infty} x_{n_k} = \tilde{x}$. Since $K$ is closed, we have $\tilde{x} \in K$. Moreover, the continuity of $f$ implies $\lim_{k \to \infty} y_{n_k} = \lim_{k \to \infty} f(x_{n_k}) = f(\tilde{x}) \in \mathbb{R}$. This contradicts $\lim_{k \to \infty} y_{n_k} = \lim_{n \to \infty} y_n = +\infty$. It follows that the set $f(K)$ is bounded above. In a similar way, one shows that the set $f(K)$ is bounded below. Thus, $f(K)$ is bounded.
To complete the proof that $f(K)$ is compact, it remains to show that $f(K)$ is closed. Consider a sequence $\{y_n\} \subseteq f(K)$ that converges to $y \in \mathbb{R}$. By Theorem 165, we must show that $y \in f(K)$. Since $\{y_n\} \subseteq f(K)$, by definition there exists a sequence $\{x_n\} \subseteq K$ such that $f(x_n) = y_n$. As seen above, the sequence $\{x_n\}$ is bounded. The Bolzano-Weierstrass Theorem yields a subsequence $\{x_{n_k}\}$ and a point $\tilde{x} \in \mathbb{R}$ such that $\lim_{k \to \infty} x_{n_k} = \tilde{x}$. Since $K$ is closed, $\tilde{x} \in K$. Moreover, the continuity of $f$ implies that $y = \lim_{k \to \infty} y_{n_k} = \lim_{k \to \infty} f(x_{n_k}) = f(\tilde{x})$, so $y \in f(K)$, as desired.
Before proving Weierstrass' Theorem, observe that the fact that continuity preserves compactness is quite remarkable. It is another characteristic that distinguishes compact sets among closed sets, for which in general this fact does not hold, as the next example shows.
Example 802 The function $f(x) = e^{-x}$ is continuous, but the image of the closed, but not compact, set $[0, \infty)$ under $f$ is the set $(0, 1]$, which is not closed. N
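Example 802 can be probed numerically: along the integers $0, 1, 2, \dots$ the values of $e^{-x}$ start at $1$ (attained), stay strictly positive, and get arbitrarily close to the unattained infimum $0$:

```python
import math

# f(x) = e^{-x} on the closed but unbounded set [0, ∞): image is (0, 1].
f = lambda x: math.exp(-x)
values = [f(x) for x in range(0, 100)]
assert max(values) == 1.0        # f(0) = 1, so 1 belongs to the image
assert min(values) > 0           # every value stays strictly positive ...
assert min(values) < 1e-40       # ... while approaching 0, which is never attained
```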
Second proof of Weierstrass' Theorem As for the previous lemma, we prove the result for $n = 1$. By Lemma 801, $f(K)$ is compact, so it is bounded. By the Least Upper Bound Principle, $\sup f(K)$ exists. Since $\sup f(K) \in \partial f(K)$ (why?) and $f(K)$ is closed, it follows that $\sup f(K) \in f(K)$. Therefore, $\sup f(K) = \max f(K)$, that is, there exists $\hat{x} \in K$ such that $f(\hat{x}) = \max_{x \in K} f(x)$. A similar argument shows that $\arg\min_{x \in K} f(x)$ is not empty.
$$(f \ge t) \cap C = \{x \in C : f(x) \ge t\} \tag{18.19}$$
Thus, a function is coercive on $C$ when at least one upper contour set has a non-empty and compact intersection with $C$. In particular, when $A = C$ the function is simply said to be coercive, without any further specification.
[Figure: an upper contour set cut out by the horizontal line $y = t$]
12 Needless to say, the theorems of this section can be "flipped over" (just take $-f$) in order to guarantee the existence of minimizers, now without caring about maximizers.
18.3. EXISTENCE: TONELLI’S THEOREM 545
Example 805 Consider the cosine function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \cos x$, with graph:
[Figure: graph of $f(x) = \cos x$]
As the last example shows, coercivity is a joint property of the function f and of the set
C, that is, of the pair (f; C). It is also an ordinal property:
Example 807 Thanks to Example 804 and Proposition 806, the famous Gaussian function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = e^{-x^2}$ is coercive. This should be clear by inspection of its graph:
[Figure: graph of the Gaussian function $f(x) = e^{-x^2}$]
which is the well-known "bell curve" found in statistics courses (cf. Example 1258). N
All continuous functions are coercive on compact sets. This will be a simple consequence of the following important property of the upper and lower contour sets of continuous functions.
Example 809 The hypothesis that $C$ is closed is crucial. Take for example $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x$. If $C = (0,1)$, we have $(f \ge t) \cap C = [t, 1)$ for every $t \in (0,1)$, and such sets are not closed. N
Continuous functions $f$ on compact sets $C$ are, thus, a first relevant example of pairs $(f, C)$ exhibiting coercivity. Let us see a few more examples.
Example 811 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = 1 - x^2$, with graph:
[Figure: graph of $f(x) = 1 - x^2$]
Its upper contour sets $\{x \in \mathbb{R} : f(x) \ge t\}$ are non-empty and compact for every $t \le 1$. For example, for $t = 0$ we have
$$\{x \in \mathbb{R} : f(x) \ge 0\} = [-1, 1]$$
which suffices to conclude that $f$ is coercive (indeed, in Definition 803 we require the existence of at least one $t \in \mathbb{R}$ for which the set $\{x \in \mathbb{R} : f(x) \ge t\}$ is non-empty and compact). N
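A quick grid check of the $t = 0$ upper contour set of $f(x) = 1 - x^2$ confirms that it is exactly the compact interval $[-1, 1]$:

```python
# Upper contour set of f(x) = 1 - x^2 at level t = 0, sampled on a grid of [-5, 5]:
# {x : f(x) >= 0} should be the compact interval [-1, 1].
f = lambda x: 1 - x * x
grid = [k / 1000 for k in range(-5000, 5001)]
contour = [x for x in grid if f(x) >= 0]
assert min(contour) == -1.0 and max(contour) == 1.0
```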
and so
$$\{x \in \mathbb{R} : f(x) \ge t\} \cap C = \begin{cases} \emptyset & \text{if } t > 0 \\ [-1, -e^t] \cup [e^t, 1] \cup \{0\} & \text{if } t \le 0 \end{cases}$$
Thus the function is coercive on the compact set $[-1, 1]$ (although it is discontinuous at $0$, thus making Proposition 810 inapplicable). N
18.3.2 Tonelli
The fact that coercivity and continuity of a function guarantee the existence of a maximizer is rather intuitive. The upper contour set $(f \ge t)$ indeed "cuts out the low part" – i.e., the part under the value $t$ – of $\operatorname{Im} f$, leaving untouched the high part, where the maximum value lies. The following result, a version of a result of Leonida Tonelli, formalizes this intuition by establishing the existence of maximizers for coercive functions: there exists $\hat{x} \in C$ such that
$$f(\hat{x}) = \max_{x \in C} f(x)$$
Proof Since $f$ is coercive, there exists $t \in \mathbb{R}$ such that the upper contour set $\Lambda = (f \ge t) \cap C$ is non-empty and compact. By Weierstrass' Theorem, there exists $\hat{x} \in \Lambda$ such that $f(\hat{x}) \ge f(x)$ for every $x \in \Lambda$. At the same time, if $x \in C \setminus \Lambda$ we have $f(x) < t$ and so $f(\hat{x}) \ge t > f(x)$. It follows that $f(\hat{x}) \ge f(x)$ for every $x \in C$, that is, $f(\hat{x}) = \max_{x \in C} f(x)$.
Thanks to Proposition 810, the hypotheses of Tonelli's Theorem are weaker than those of Weierstrass' Theorem. On the other hand, weaker hypotheses lead to a weaker result (as always, no free meals) in which only the existence of a maximizer is guaranteed, without any mention of minimizers. Since, as we already noted, in many economic optimization problems one is interested in the existence of maximizers, Tonelli's Theorem is important because it allows us to "trim off" hypotheses of Weierstrass' Theorem that are overabundant with respect to our needs. In particular, we can use Tonelli's Theorem in optimization problems where the choice set is not compact – for example, in Chapter 28 we will use it with open choice sets.
In sum, the optimization problem $\max_x f(x)$ sub $x \in C$ has a solution if $f$ is coercive and continuous on $C$. Under such hypotheses, one cannot say anything about the dual minimization problem with min instead of max.
Example 815 The functions $f, g : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 1 - x^2$ and $g(x) = e^{-x^2}$ are both coercive (see Examples 811 and 807). Since they are continuous as well, by Tonelli's Theorem we can say that $\arg\max_{x \in \mathbb{R}} f(x) \ne \emptyset$ and $\arg\max_{x \in \mathbb{R}} g(x) \ne \emptyset$ (as is easily seen from their graphs, for both functions the origin is the global maximizer). Note that, instead, $\arg\min_{x \in \mathbb{R}} f(x) = \arg\min_{x \in \mathbb{R}} g(x) = \emptyset$. Indeed, the set $\mathbb{R}$ is not compact, thus making Weierstrass' Theorem inapplicable. N
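Example 815 can be checked on a grid: both functions peak at the origin, while $f$ is unbounded below and $g$ approaches its infimum $0$ without ever attaining it, so neither has a minimizer on $\mathbb{R}$:

```python
import math

# f(x) = 1 - x^2 and g(x) = e^{-x^2} on a grid of [-20, 20].
f = lambda x: 1 - x * x
g = lambda x: math.exp(-x * x)
grid = [k / 100 for k in range(-2000, 2001)]

assert max(grid, key=f) == 0.0 and max(grid, key=g) == 0.0  # maximizer at the origin
assert min(f(x) for x in grid) < -300                       # f is unbounded below
assert 0 < min(g(x) for x in grid) < 1e-170                 # inf g = 0, never attained
```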
N.B. The coercivity of $f$ on $C$ amounts to saying that there exists a non-empty compact set $K$ such that
$$\arg\max_{x \in C} f(x) \subseteq K \subseteq C$$
18.3.3 Supercoercivity
In light of Tonelli's Theorem, it becomes important to identify classes of coercive functions. Supercoercive functions – those for which
$$\|x_n\| \to +\infty \implies f(x_n) \to -\infty$$
– are a first relevant example.14
Proposition 817 A function $f : \mathbb{R}^n \to \mathbb{R}$ is supercoercive if and only if all its upper contour sets are bounded.
Proof Suppose that all upper contour sets are bounded, and let $\|x_n\| \to +\infty$. For each $t \in \mathbb{R}$, since $(f \ge t)$ is bounded, there exists $n_t$ large enough so that $x_n \notin (f \ge t)$ for all $n \ge n_t$, i.e., $f(x_n) < t$ for all $n \ge n_t$. In turn, this implies that $\limsup f(x_n) \le t$. Since this inequality holds for every scalar $t$, we conclude that $\limsup f(x_n) = -\infty$, which in turn implies that $\lim f(x_n) = -\infty$, as desired.
The next result shows that supercoercivity implies coercivity for functions $f$ that are continuous on a closed set $C$. As a result, Tonelli's Theorem can be applied to the pair $(f, C)$.
Proof The last result implies that, for every $t \in \mathbb{R}$, the sets $(f \ge t) \cap C$ are bounded. Since $f$ is continuous and $C$ is closed, such sets are also closed. Indeed, take $\{x_n\} \subseteq (f \ge t) \cap C$ such that $x_n \to x \in \mathbb{R}^n$. By Theorem 165, to show that $(f \ge t) \cap C$ is closed it suffices to show that $x \in (f \ge t) \cap C$. As $C$ is closed, we have $x \in C$. Since $f$ is continuous, we have $\lim f(x_n) = f(x)$. Since $f(x_n) \ge t$ for every $n \ge 1$, it follows that $f(x) \ge t$, that is, $x \in (f \ge t)$. Hence, $x \in (f \ge t) \cap C$, and the set $(f \ge t) \cap C$ is closed. Since it is bounded, it is compact.
The reader should note that, when considering a supercoercive and continuous function, all the sets $(f \ge t) \cap C$ are compact, while coercivity only requires that at least one of them be non-empty and compact. This shows, once again, that supercoercivity is a much stronger property than coercivity. However, it is simpler both to formulate and to verify, which explains its appeal.
Proof Let $\{x_n\} \subseteq \mathbb{R}^n$ be such that $\|x_n\| \to +\infty$. This implies that there exists $\bar{n} \ge 1$ such that $\|x_n\| \ge k$, and so $g(x_n) \le f(x_n)$, for every $n \ge \bar{n}$. At the same time, since $f$ is supercoercive, the sequence $\{f(x_n)\}$ is such that $f(x_n) \to -\infty$. This implies that for each $K \in \mathbb{R}$ there exists $n_K \ge 1$ such that $f(x_n) < K$ for all $n \ge n_K$. For each $K \in \mathbb{R}$, set $\bar{n}_K = \max\{\bar{n}, n_K\}$. We then have $g(x_n) \le f(x_n) < K$ for all $n \ge \bar{n}_K$, thus proving that $g(x_n) \to -\infty$ as well.
Definition 823 Given two sets $X$ and $Y$ of $\mathbb{R}^n$, we say that they are separated if there exists a hyperplane $H$ such that $X \subseteq H^+$ and $Y \subseteq H^-$. In particular, they are:
(ii) strongly separated if $a \cdot x \ge b + \varepsilon > b \ge a \cdot y$ for all $x \in X$ and $y \in Y$, and for some $\varepsilon > 0$.
Intuitively, two sets are separated when there exists a hyperplane that acts like a watershed between them, with each set included in a different half-space determined by the hyperplane.
The separation between a convex set and a single point is often important. Next we focus on such a case.
Proof We only prove (i), while we omit the non-trivial proof of (ii). Without loss of generality, assume that $x_0 = 0 \notin C$. Consider the continuous function $f : \mathbb{R}^n \to \mathbb{R}$ given by $f(x) = -\|x\|^2$. This function is supercoercive (Example 818). By Proposition 820, $f$ is coercive on the closed set $C$, so it has a maximizer $c \in C$ by Tonelli's Theorem. If $x$ is any point of $C$ and $\lambda \in (0,1)$, we have $\|c\|^2 \le \|\lambda c + (1-\lambda)x\|^2$. Hence
$$\|c\|^2 \le \lambda^2 \|c\|^2 + (1-\lambda)^2 \|x\|^2 + 2\lambda(1-\lambda)\, c \cdot x$$
$$(1 + \lambda) \|c\|^2 \le (1 - \lambda) \|x\|^2 + 2\lambda\, c \cdot x$$
Corollary 825 A compact convex set and a closed convex set are separated if they are disjoint.
Proof Let $K$ be a compact convex set and $C$ a closed convex set, with $K \cap C = \emptyset$. The set $K - C = \{x - y : x \in K, y \in C\}$ is a closed and convex set (Proposition 1344) that does not contain the origin $0$ since $K \cap C = \emptyset$. So, by (i) of the last theorem the sets $\{0\}$ and $K - C$ are strongly separated: $0 = a \cdot 0 \le b < b + \varepsilon \le a \cdot (x - y)$ for every $x \in K$ and $y \in C$. Since $b + \varepsilon > 0$, this implies $a \cdot x \ge b + \varepsilon + a \cdot y > a \cdot y$, so $K$ and $C$ are separated.
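The geometric idea behind the proof of (i) – the point of $C$ closest to the origin yields a separating hyperplane – can be sketched numerically. The closed convex set below is a hypothetical disk chosen for illustration; for the projection $c$ of $0$ onto $C$, the hyperplane $a \cdot x = b$ with $a = c$ and $b = \|c\|^2$ keeps all of $C$ on one side and the origin strictly on the other:

```python
import math
import random

# Hypothetical closed convex set C: the disk of center (3, 4) and radius 1,
# which does not contain the origin. Its closest point to 0 is c, and
# a = c, b = ||c||^2 give a separating hyperplane a·x = b.
random.seed(0)
cx, cy, r = 3.0, 4.0, 1.0
d = math.hypot(cx, cy)                      # distance of the center from the origin
c = (cx * (1 - r / d), cy * (1 - r / d))    # closest point of C to the origin
b = c[0] ** 2 + c[1] ** 2                   # ||c||^2 (= 16 for these numbers)

for _ in range(1000):
    # sample a random point of the disk
    t, s = random.uniform(0, 2 * math.pi), math.sqrt(random.random()) * r
    x = (cx + s * math.cos(t), cy + s * math.sin(t))
    assert c[0] * x[0] + c[1] * x[1] >= b - 1e-9   # a·x >= b for every x in C
assert b > 0                                       # while a·0 = 0 < b
```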
[Figure: a function with several peaks]
The highest peak is the (global) maximum value, but intuitively the other peaks, too, correspond to points that, locally, are maximizers. The next definition formalizes this simple idea.
The value $f(\hat{x})$ of the function at $\hat{x}$ is called a local maximum value of $f$ on $C$.
A global maximizer on $C$ is obviously also a local maximizer. The notion of local maximizer is, indeed, much weaker than that of global maximizer. As the next example shows, it may happen that there are (even many) local maximizers and no global maximizers.
[Figure: graph of the function in (i)]
In particular, the origin $x = 0$ is a local maximizer, but not a global one. Indeed, $\lim_{x \to +\infty} f(x) = \lim_{x \to -\infty} f(x) = +\infty$, so the function has no global maximizers.
(ii) Let $f : \mathbb{R} \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} \cos x & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases}$$
[Figure: graph of the function in (ii)]
The function has infinitely many local maximizers (namely, $x = -2k\pi$ for $k \in \mathbb{N}$), but no global ones. N
18.6. CONCAVITY AND QUASI-CONCAVITY 555
O.R. The most important part of the definition of a local maximizer is "if there exists a neighborhood". A common mistake is to replace the correct "if there exists a neighborhood" by the incorrect "if, taking a neighborhood $B_\varepsilon(\hat{x})$ of $\hat{x}$". In this way, we do not define a local maximizer but a global one. Indeed, to fix a priori a neighborhood $B_\varepsilon(\hat{x})$ amounts to considering $B_\varepsilon(\hat{x})$ rather than $C$ as the choice set, so a different optimization problem would be addressed. Relatedly, in the neighborhood $B_\varepsilon(\hat{x})$ in (18.21) the local maximizer is, clearly, a global one. Such a "choice set" is, however, chosen by the function, not posited by us. So, it is typically of little interest for the application that motivated the optimization problem. Applications discipline optimization problems, not vice versa. H
O.R. An isolated point $x_0$ of $C$ is always both a local maximizer and a local minimizer. Indeed, by definition there is a neighborhood $B_\varepsilon(x_0)$ of $x_0$ such that $B_\varepsilon(x_0) \cap C = \{x_0\}$, so the inequalities $f(x_0) \ge f(x)$ and $f(x_0) \le f(x)$ for every $x \in B_\varepsilon(x_0) \cap C$ reduce to $f(x_0) \ge f(x_0)$ and $f(x_0) \le f(x_0)$, which are trivially true.
Considering isolated points as both local maximizers and local minimizers is a bit odd. To avoid this, we could reformulate the definition of local maximizer and minimizer by requiring $\hat{x}$ to be a limit point of $C$. However, an even more unpleasant consequence would result: if an isolated point were a global extremal (e.g., recall the example at the end of Section 18.1.1), we should say that it is not so in the local sense. Thus, the remedy would be worse than the disease. H
Proof Let $\hat{x} \in C$ be a local maximizer. By definition, there exists a neighborhood $B_\varepsilon(\hat{x})$ such that
$$f(\hat{x}) \ge f(x) \qquad \forall x \in B_\varepsilon(\hat{x}) \cap C \tag{18.22}$$
Suppose, by contradiction, that $\hat{x}$ is not a global maximizer. Then, there exists $y \in C$ such that $f(y) > f(\hat{x})$. Since $f$ is concave, for every $t \in (0,1)$ we have
$$f(t\hat{x} + (1-t)y) \ge t f(\hat{x}) + (1-t) f(y) > t f(\hat{x}) + (1-t) f(\hat{x}) = f(\hat{x}) \tag{18.23}$$
Moreover, there exists $\bar{t} \in (0,1)$ such that $t\hat{x} + (1-t)y \in B_\varepsilon(\hat{x})$ for every $t \in (\bar{t}, 1)$. From (18.23) it follows that for such $t$ we have $f(t\hat{x} + (1-t)y) > f(\hat{x})$, which contradicts (18.22). We conclude that $\hat{x}$ is a global maximizer.
Graphically:
[Figure: graph of the function, in which every point $x > 1$ is a local maximizer]
This function is quasi-concave because it is monotonic. All the points $x > 1$ are local maximizers, but not global maximizers. N
When $f$ is quasi-concave, the set of maximizers $\arg\max_{x \in C} f(x)$ is convex.15 Indeed, let $y, z \in \arg\max_{x \in C} f(x)$ and let $t \in [0,1]$. By quasi-concavity, we have
$$f(ty + (1-t)z) \ge \min\{f(y), f(z)\} = \max_{x \in C} f(x)$$
and therefore
$$f(ty + (1-t)z) = \max_{x \in C} f(x)$$
(iii) $\arg\max_{x \in C} f(x)$ consists of infinitely many points: there exist infinitely many maximizers.
Example 830 (i) Let $f : \mathbb{R}_{++} \to \mathbb{R}$ be defined by $f(x) = \log x$ for every $x > 0$. The function $f$ is strictly concave. It is easy to see that it has no maximizers, that is, $\arg\max_{x > 0} f(x) = \emptyset$. (ii) Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = 1 - x^2$ for every $x \in \mathbb{R}$. Then $f$ is strictly concave and the unique maximizer is $\hat{x} = 0$, so that $\arg\max_{x \in \mathbb{R}} f(x) = \{0\}$. (iii) Let $f : \mathbb{R} \to \mathbb{R}$ be defined by
$$f(x) = \begin{cases} x & \text{if } x \le 1 \\ 1 & \text{if } x \in (1, 2) \\ 3 - x & \text{if } x \ge 2 \end{cases}$$
with graph
[Figure: graph of the piecewise function in (iii), with maximum value $1$ attained on the whole interval $[1,2]$]
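A grid evaluation of the function in (iii) confirms both claims at once: the maximum value is $1$, and the set of maximizers is the whole (convex) interval $[1, 2]$:

```python
# The piecewise function of Example 830-(iii): concave, with maximum value 1
# attained on [1, 2], hence infinitely many maximizers.
def f(x):
    if x <= 1:
        return x
    if x < 2:
        return 1.0
    return 3 - x

grid = [k / 100 for k in range(-200, 401)]        # grid on [-2, 4]
m = max(f(x) for x in grid)
argmax = [x for x in grid if f(x) == m]
assert m == 1.0
assert min(argmax) == 1.0 and max(argmax) == 2.0  # maximizers fill [1, 2]
```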
The last function of this example, with infinitely many maximizers, is concave but not strictly concave. The next result shows that, indeed, strict quasi-concavity implies that a maximizer, if it exists, is necessarily unique. In other words, for strictly quasi-concave functions, $\arg\max_{x \in C} f(x)$ is at most a singleton (so the unique maximizer, if it exists, is also a strong one).
Proof Suppose, by contradiction, that there are two distinct maximizers $\hat{x}_1, \hat{x}_2 \in C$, so that $f(\hat{x}_1) = f(\hat{x}_2) = \max_{x \in C} f(x)$. Set $x_t = t\hat{x}_1 + (1-t)\hat{x}_2$ for $t \in (0,1)$. Since $C$ is convex, $x_t \in C$. Moreover, by strict quasi-concavity,
$$f(x_t) = f(t\hat{x}_1 + (1-t)\hat{x}_2) > \min\{f(\hat{x}_1), f(\hat{x}_2)\} = \max_{x \in C} f(x)$$
a contradiction.
18.6.2 Minima
Minimization problems for concave functions also have some noteworthy properties.
Proof Suppose $\arg\min_{x \in C} f(x) \ne \emptyset$ (otherwise the result is trivially true). (i) Let $\hat{x} \in \arg\min_{x \in C} f(x)$. Since $f$ is not constant, there exists $y \in C$ such that $f(y) > f(\hat{x})$. Suppose, by contradiction, that $\hat{x}$ is an interior point of $C$. Set $z_\lambda = \lambda \hat{x} + (1-\lambda)y$ with $\lambda \in \mathbb{R}$. The points $z_\lambda$ are the points of the straight line that passes through $\hat{x}$ and $y$. Since $\hat{x}$ is an interior point of $C$, there exists $\bar{\lambda} > 1$ such that $z_{\bar{\lambda}} \in C$. On the other hand, $\hat{x} = \frac{1}{\bar{\lambda}} z_{\bar{\lambda}} + \left(1 - \frac{1}{\bar{\lambda}}\right) y$. Therefore, by concavity, we get the contradiction
$$f(\hat{x}) = f\left(\frac{1}{\bar{\lambda}} z_{\bar{\lambda}} + \left(1 - \frac{1}{\bar{\lambda}}\right) y\right) \ge \frac{1}{\bar{\lambda}} f(z_{\bar{\lambda}}) + \left(1 - \frac{1}{\bar{\lambda}}\right) f(y) > \frac{1}{\bar{\lambda}} f(\hat{x}) + \left(1 - \frac{1}{\bar{\lambda}}\right) f(\hat{x}) = f(\hat{x})$$
It follows that $\hat{x} \in \partial C$, as desired. (ii) Let $\hat{x} \in \arg\min_{x \in C} f(x)$. Suppose, by contradiction, that $\hat{x} \notin \operatorname{ext} C$. Then, there exist $x, y \in C$ with $x \ne y$ and $\lambda \in (0,1)$ such that $\hat{x} = \lambda x + (1-\lambda)y$. By strict quasi-concavity, $f(\hat{x}) = f(\lambda x + (1-\lambda)y) > \min\{f(x), f(y)\} \ge f(\hat{x})$, a contradiction. We conclude that $\hat{x} \in \operatorname{ext} C$, as desired.
Hence, under (i) the search for minimizers can be restricted to the boundary points of $C$. More is true under (ii), where the search can be restricted to the extreme points of $C$, an even smaller set (Proposition 693).
Extreme points take center stage in the compact case, a remarkable fact because the set of extreme points can be a small subset of the frontier – for instance, if $C$ is a polytope we can restrict the search for minimizers to its vertices.
$$\min_{x \in C} f(x) = \min_{x \in \operatorname{ext} C} f(x) \tag{18.24}$$
and
$$\emptyset \ne \arg\min_{x \in \operatorname{ext} C} f(x) \subseteq \arg\min_{x \in C} f(x) \subseteq \operatorname{co}\left(\arg\min_{x \in \operatorname{ext} C} f(x)\right) \tag{18.25}$$
as well as, when $f$ is strictly quasi-concave,
$$\emptyset \ne \arg\min_{x \in C} f(x) \subseteq \operatorname{ext} C$$
Relative to the previous result, Weierstrass' Theorem now ensures the existence of minimizers. More interestingly, thanks to Minkowski's Theorem, in (i) we can now say that a concave function attains its minimum value at some extreme point. So, in terms of value attainment the minimization problem
$$\min_x f(x) \quad \text{sub} \quad x \in C \tag{18.26}$$
is equivalent to the problem
$$\min_x f(x) \quad \text{sub} \quad x \in \operatorname{ext} C \tag{18.27}$$
that only involves extreme points. In particular, in the important case when $f$ is strictly concave we can take advantage of both (i) and (ii), so
$$\emptyset \ne \arg\min_{x \in \operatorname{ext} C} f(x) = \arg\min_{x \in C} f(x)$$
The minimization problem (18.26) then reduces to the simpler problem (18.27) in terms of both solutions and value attainment.
Proof By Weierstrass' Theorem, $\arg\min_{x \in C} f(x) \ne \emptyset$. Point (ii) thus follows from the previous result. As to (i), we first prove that
$$\arg\min_{x \in C} f(x) \subseteq \operatorname{co}\left(\arg\min_{x \in C} f(x) \cap \operatorname{ext} C\right) \tag{18.28}$$
that is, that minimizers are convex combinations of extreme points which are, themselves, minimizers. Let $\hat{x} \in \arg\min_{x \in C} f(x)$. By Minkowski's Theorem, we have $C = \operatorname{co}(\operatorname{ext} C)$. Therefore, there exist a finite collection $\{x_i\}_{i \in I} \subseteq \operatorname{ext} C$ and a finite collection $\{\lambda_i\}_{i \in I} \subseteq (0,1]$,16 with $\sum_{i \in I} \lambda_i = 1$, such that $\hat{x} = \sum_{i \in I} \lambda_i x_i$. Since $\hat{x}$ is a minimizer, we have $f(x_i) \ge f(\hat{x})$ for all $i \in I$. Together with concavity, this implies that
$$f(\hat{x}) = f\left(\sum_{i \in I} \lambda_i x_i\right) \ge \sum_{i \in I} \lambda_i f(x_i) \ge \sum_{i \in I} \lambda_i f(\hat{x}) = f(\hat{x}) \tag{18.29}$$
Hence, we conclude that $\sum_{i \in I} \lambda_i f(x_i) = f(\hat{x})$, which implies $f(x_i) = f(\hat{x})$ for all $i \in I$. Indeed, if we had $f(x_i) > f(\hat{x})$ for some $i \in I$, then we would reach the contradiction $\sum_{i \in I} \lambda_i f(x_i) > f(\hat{x})$. It follows that for each $i \in I$ we have $x_i \in \arg\min_{x \in C} f(x) \cap \operatorname{ext} C$, proving (18.28).
We are ready to prove (18.25). By the previous part of the proof, $\arg\min_{x \in C} f(x) \cap \operatorname{ext} C \ne \emptyset$. Consider $x \in \arg\min_{x \in C} f(x) \cap \operatorname{ext} C$. Let $\hat{x} \in \arg\min_{x \in \operatorname{ext} C} f(x)$. By definition, and since $x \in \operatorname{ext} C$, we have $f(\hat{x}) \le f(x)$. Since $x \in \arg\min_{x \in C} f(x)$, we have $f(x) \le f(\hat{x})$. This implies $f(x) = f(\hat{x})$ and, therefore, $\hat{x} \in \arg\min_{x \in C} f(x)$. Since $\hat{x}$ was arbitrarily chosen, it follows that $\arg\min_{x \in \operatorname{ext} C} f(x) \subseteq \arg\min_{x \in C} f(x) \cap \operatorname{ext} C$, proving the first inclusion in (18.25). Clearly, $\operatorname{ext} C \cap \arg\min_{x \in C} f(x) \subseteq \arg\min_{x \in \operatorname{ext} C} f(x)$. So, $\operatorname{ext} C \cap \arg\min_{x \in C} f(x) = \arg\min_{x \in \operatorname{ext} C} f(x)$, and (18.28) yields the second inclusion in (18.25).
It remains to prove (18.24). Let $\hat{x} \in \arg\min_{x \in C} f(x)$. By (18.25), there exist a finite collection $\{\hat{x}_i\}_{i \in I} \subseteq \arg\min_{x \in \operatorname{ext} C} f(x)$ and a finite collection $\{\lambda_i\}_{i \in I} \subseteq (0,1]$, with $\sum_{i \in I} \lambda_i = 1$, such that $\hat{x} = \sum_{i \in I} \lambda_i \hat{x}_i$. By concavity:
$$\min_{x \in C} f(x) = f(\hat{x}) \ge \sum_{i \in I} \lambda_i f(\hat{x}_i) = \sum_{i \in I} \lambda_i \min_{x \in \operatorname{ext} C} f(x) = \min_{x \in \operatorname{ext} C} f(x) \ge \min_{x \in C} f(x)$$
Consider, for example, the function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by
$$f(x) = -\frac{1}{2}(1 - x_1 - x_2)^2 - \frac{1}{2}(1 - x_3)^2$$
It is easy to check that $f$ is continuous and concave. Since $\Delta^2$ is convex and compact, with the versors $e^1, e^2, e^3$ as extreme points, by Bauer's Theorem-(i) we have
$$\emptyset \ne \arg\min_{i \in \{1,2,3\}} f(e^i) \subseteq \arg\min_{x \in \Delta^2} f(x) \subseteq \operatorname{co}\left(\arg\min_{i \in \{1,2,3\}} f(e^i)\right) \tag{18.30}$$
It is immediate to check that $f(e^i) = -1/2$ for all $i \in \{1,2,3\}$, that is, all three versors are minimizers among the extreme points. Let $\bar{x} = (1/4, 1/4, 1/2) \in \Delta^2$ and $\hat{x} = (1/2, 1/2, 0)$. We have $f(\bar{x}) = -1/4 > -1/2 = f(\hat{x})$, so $\bar{x}$ does not belong to $\arg\min_{x \in \Delta^2} f(x)$ but, clearly, belongs to $\operatorname{co}(\arg\min_{i \in \{1,2,3\}} f(e^i))$. Moreover, $\hat{x}$ belongs to $\arg\min_{x \in \Delta^2} f(x)$ but, clearly, does not belong to $\arg\min_{i \in \{1,2,3\}} f(e^i)$. This proves that the inclusions in (18.30) are strict. N
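The values quoted in this example are easy to verify numerically; the sketch below takes the example's function to be $f(x) = -\frac{1}{2}(1-x_1-x_2)^2 - \frac{1}{2}(1-x_3)^2$ (reading back the dropped signs and fractions, which reproduces every value in the text):

```python
# f(x) = -(1/2)(1 - x1 - x2)^2 - (1/2)(1 - x3)^2 on the simplex,
# evaluated at the versors and at the two bundles used in the example.
def f(x):
    return -0.5 * (1 - x[0] - x[1]) ** 2 - 0.5 * (1 - x[2]) ** 2

e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
assert all(f(v) == -0.5 for v in e)   # f(e^i) = -1/2 for every versor

x_bar = (0.25, 0.25, 0.5)
x_hat = (0.5, 0.5, 0.0)
assert f(x_bar) == -0.25              # above the minimum -1/2: not a minimizer
assert f(x_hat) == -0.5               # a minimizer that is not a versor
```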
18.6.3 Affine functions
If we consider affine functions – i.e., functions that are both concave and convex – we have the following corollary of Bauer's Theorem.
$$\max_{x \in C} f(x) = \max_{x \in \operatorname{ext} C} f(x) \quad \text{and} \quad \min_{x \in C} f(x) = \min_{x \in \operatorname{ext} C} f(x) \tag{18.31}$$
as well as
$$\emptyset \ne \arg\max_{x \in C} f(x) = \operatorname{co}\left(\arg\max_{x \in \operatorname{ext} C} f(x)\right) \tag{18.32}$$
and
$$\emptyset \ne \arg\min_{x \in C} f(x) = \operatorname{co}\left(\arg\min_{x \in \operatorname{ext} C} f(x)\right) \tag{18.33}$$
Proof By (18.24) we have (18.31). By Proposition 671, $f$ is continuous. So, the sets in (18.32) and (18.33) are non-empty by Weierstrass' Theorem. Since $f$ is affine, it is also concave. By (18.25),
$$\arg\min_{x \in \operatorname{ext} C} f(x) \subseteq \arg\min_{x \in C} f(x) \subseteq \operatorname{co}\left(\arg\min_{x \in \operatorname{ext} C} f(x)\right)$$
so
$$\operatorname{co}\left(\arg\min_{x \in \operatorname{ext} C} f(x)\right) = \operatorname{co}\left(\arg\min_{x \in C} f(x)\right) = \arg\min_{x \in C} f(x)$$
because $\arg\min_{x \in C} f(x)$ is convex, given that $f$ is affine. Since $-f$ is also affine, the result holds for $\arg\max_{x \in C} f(x)$ as well.
In sum, for an affine objective function both the maximization and the minimization problem can be replaced by problems that only involve extreme points. Moreover, by (18.31), the values attained are the same. So, the simpler problems are equivalent to the original ones in terms of both solutions and value attainment.
An earlier instance of such a remarkable simplification afforded by affine objective functions was discussed in Example 789-(ii). Next we provide another couple of examples.
$$f(e^3) = 4 < f(e^1) = 6 < f(e^2) = 7$$
By (18.32) and (18.33), $\arg\max_{x \in C} f(x) = \{e^2\}$ and $\arg\min_{x \in C} f(x) = \{e^3\}$.
(ii) Consider the affine function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by $f(x) = x_1 + 2x_2 + 2x_3 + 5$. Now we have
$$f(e^1) = 6 < f(e^2) = f(e^3) = 7$$
By (18.32) and (18.33), $\arg\max_{x \in C} f(x) = \operatorname{co}(e^2, e^3)$ and $\arg\min_{x \in C} f(x) = \{e^1\}$.
A set of the form
$$P = \{x \in \mathbb{R}^n : Ax \le b\}$$
of $\mathbb{R}^n$ is called a polyhedron. Let us write explicitly the row vectors $a^1, \dots, a^m$ of the matrix $A$. Each row vector $a^i$ identifies an inequality constraint $a^i \cdot x \le b_i$ that a vector $x \in \mathbb{R}^n$ has to satisfy in order to belong to the polyhedron. We can indeed write $P$ as the intersection
$$P = \bigcap_{i=1}^m H_i$$
of the half-spaces $H_i = \{x \in \mathbb{R}^n : a^i \cdot x \le b_i\}$.
Example 838 (i) Affine sets are the polyhedra featuring equality constraints (Proposition 666). (ii) Simplices are polyhedra: for instance, Δ₂ in R³ can be written as {x ∈ R³ : Ax ≤ b} for suitable A and b.
This polyhedron is not bounded: for instance, the vectors xₙ = (−n, 1/2, 0) belong to P for all n ≥ 1. N
Example 840 The elements of a polyhedron are often required to be positive, so let P = {x ∈ Rⁿ₊ : Ax ≤ b}. This polyhedron can be written, however, in the standard form P′ = {x ∈ Rⁿ : A′x ≤ b′} via suitable A′ and b′. For instance, if we require the elements of the polyhedron of the previous example to be positive, we have b′ = (1, 1, 2, 0, 0, 0) and

    A′ = ⎡  1   2   2 ⎤
         ⎢  0   2   1 ⎥
         ⎢  0   1   1 ⎥
         ⎢  0   0  −1 ⎥
         ⎢  0  −1   0 ⎥
         ⎣ −1   0   0 ⎦

in which we appended the (negative) versors as rows of the matrix A, since the positivity constraint xᵢ ≥ 0 amounts to the inequality −xᵢ ≤ 0. In sum, the standard formulation of polyhedra easily includes positivity constraints. N
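A small numerical sketch of this construction, assuming (as the displayed b′ suggests) that the original polyhedron has as A the first three rows of A′ and b = (1, 1, 2); it verifies that the two descriptions of the polyhedron agree on a few test points:

```python
import numpy as np

# Polyhedron P = {x in R^3_+ : A x <= b} rewritten in standard form
# P' = {x in R^3 : A' x <= b'} by appending the rows -e^3, -e^2, -e^1
# (positivity x_i >= 0 is the inequality -x_i <= 0).
A = np.array([[1, 2, 2],
              [0, 2, 1],
              [0, 1, 1]], dtype=float)
b = np.array([1.0, 1.0, 2.0])

A1 = np.vstack([A, [[0, 0, -1], [0, -1, 0], [-1, 0, 0]]])
b1 = np.concatenate([b, np.zeros(3)])

def in_P(x):
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0) and np.all(A @ x <= b + 1e-12))

def in_P1(x):
    x = np.asarray(x, dtype=float)
    return bool(np.all(A1 @ x <= b1 + 1e-12))

# The two descriptions agree on a small grid of test points.
pts = [(0, 0, 0), (1, 0, 0), (0.2, 0.1, 0.3), (-0.1, 0.5, 0.5), (0, 0.4, 0.2)]
agree = all(in_P(p) == in_P1(p) for p in pts)
```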
Polyhedra are easily seen to be closed. So, they are compact if and only if they are bounded. Bounded polyhedra are actually old friends.

In other words, this result (we omit the non-trivial proof) shows that a bounded polyhedron P can be written as the convex envelope of a collection of vectors xⁱ ∈ Rⁿ, i.e., P = co(x¹, ..., xᵐ) (cf. Example 689). This means, inter alia, that bounded polyhedra have a finite number of extreme points (cf. Example 694).

We can actually characterize the extreme points of polyhedra. To this end, denote by Aₓ the submatrix of A that consists of the rows aⁱ of A featuring constraints that are binding at x, i.e., such that aⁱ · x = bᵢ. Clearly, ρ(Aₓ) ≤ ρ(A) ≤ min{m, n}.
In other words, a vector is an extreme point of a polyhedron of Rⁿ if and only if there exist n linearly independent binding constraints at that vector. Besides its theoretical interest, this characterization operationalizes the search for extreme points by reducing it to checking a matrix property.

Proof We prove the "if", leaving the converse to the reader. Suppose that ρ(Aₓ) = n. We want to show that x is an extreme point. Suppose, by contradiction, that there exist λ ∈ (0, 1) and two distinct vectors x′, x″ ∈ P such that x = λx′ + (1 − λ)x″. Denote by I(x) = {i ∈ {1, ..., m} : aⁱ · x = bᵢ} the set of binding constraints. Then,

    bᵢ = aⁱ · x = λ aⁱ · x′ + (1 − λ) aⁱ · x″ ≤ λbᵢ + (1 − λ)bᵢ = bᵢ    ∀i ∈ I(x)

so

    aⁱ · x′ = aⁱ · x″ = bᵢ    ∀i ∈ I(x)

This implies that x′ and x″ are distinct solutions of the linear system

    aⁱ · x = bᵢ    ∀i ∈ I(x)

In view of Theorem 630, this contradicts the hypothesis ρ(Aₓ) = n. We conclude that x is an extreme point of P.
Example 843 Let us check that the versors e¹, e² and e³ are the extreme points of the simplex Δ₂. For each x ∈ R³ we have

    Ax = b ⟺ ⎧ x₁ = 0
             ⎪ x₂ = 0
             ⎨ x₃ = 0
             ⎩ x₁ + x₂ + x₃ = 1

So,

    A_{e¹} = ⎡ 0  1  0 ⎤
             ⎢ 0  0  1 ⎥
             ⎣ 1  1  1 ⎦

which has rank ρ(A_{e¹}) = 3, so that e¹ is an extreme point. A similar argument shows that e² and e³ are extreme points as well. N
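The rank test can be automated. The following sketch (NumPy; the simplex constraints are encoded here as −xᵢ ≤ 0 together with x₁ + x₂ + x₃ ≤ 1, a sign convention that does not affect the ranks) computes ρ(Aₓ) at the versors and at the barycenter:

```python
import numpy as np

# Extreme points of the simplex via the rank test rho(A_x) = n.
# Candidate binding constraints: -x_i <= 0 (i = 1, 2, 3) and
# x1 + x2 + x3 <= 1, as in Example 843.
A = np.array([[-1, 0, 0],
              [0, -1, 0],
              [0, 0, -1],
              [1, 1, 1]], dtype=float)
b = np.array([0.0, 0.0, 0.0, 1.0])

def binding_rank(x):
    x = np.asarray(x, dtype=float)
    rows = [a for a, bi in zip(A, b) if abs(a @ x - bi) < 1e-12]
    return np.linalg.matrix_rank(np.array(rows)) if rows else 0

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
barycenter = (1 / 3, 1 / 3, 1 / 3)

# At each versor three constraints bind and the rank is 3 = n ...
ranks_versors = [binding_rank(e) for e in (e1, e2, e3)]  # [3, 3, 3]
# ... while at an interior point of the simplex only one binds.
rank_bary = binding_rank(barycenter)  # 1
```

The versors pass the test, while the barycenter fails it, in accordance with the characterization above.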
or, equivalently,

    max_{x₁,...,xₙ} Σ_{j=1}^{n} cⱼxⱼ
    sub Σ_{j=1}^{n} a₁ⱼxⱼ ≤ b₁, Σ_{j=1}^{n} a₂ⱼxⱼ ≤ b₂, ..., Σ_{j=1}^{n} a_{mj}xⱼ ≤ b_m

In view of Corollary 836, we can solve this optimization problem when P is bounded (hence compact).
and

    ∅ ≠ arg max_{x∈P} c · x = co(arg max_{x∈{y∈P : ρ(A_y)=n}} c · x)    (18.36)

    max_x c · x   sub x ∈ Δ_{n−1}
A general study of optimization problems with equality and inequality constraints will be carried out in Chapter 30. Linear programming is the special case of a concave optimization problem (Section 30.4) in which the objective function is linear and the constraints are expressed via affine functions.¹⁷

¹⁷ By Riesz's Theorem and Proposition 656, we can write the objective function and the constraints in the inner product and matrix form that (18.34) features.
18.7 Consumption

18.7.1 Optimal bundles

Let us go back to the consumer problem:

    max_x u(x)   sub x ∈ B(p, w)

This powerful theorem generalizes Proposition 798 and covers most cases of interest in consumer theory. For instance, consider the log-linear utility function u : Rⁿ₊₊ → R given by u(x) = Σ_{i=1}^{n} aᵢ log xᵢ, with aᵢ > 0 and Σ_{i=1}^{n} aᵢ = 1. It has an open consumption set Rⁿ₊₊, so Proposition 798 cannot be applied. Fortunately, the following lemma shows that it is coercive on B(p, w). Since it is also continuous and strictly concave, by Theorem 846 the consumer problem with log-linear utility has a unique solution.

Lemma 847 The log-linear utility function u : Rⁿ₊₊ → R is coercive on B(p, w), provided p ≫ 0.

Proof By Proposition 806, it suffices to show that the result holds for the Cobb-Douglas utility function u(x) = ∏_{i=1}^{n} xᵢ^{aᵢ} defined over Rⁿ₊₊. We begin by showing that the upper contour sets (u ≥ t) are closed for every t ∈ R. If (u ≥ t) = ∅ the statement is trivially true, so let t > 0 be such that (u ≥ t) ≠ ∅. Consider a sequence {xₙ} ⊆ (u ≥ t) that converges to a bundle x̃ ∈ Rⁿ. To prove that (u ≥ t) is closed, it is necessary to show that x̃ ∈ (u ≥ t). Since {xₙ} ⊆ Rⁿ₊₊, we have x̃ ≥ 0. Let us show that x̃ ≫ 0. Suppose, by contradiction, that x̃ has at least one null coordinate. This implies that u(xₙ) → ∏_{i=1}^{n} x̃ᵢ^{aᵢ} = 0, thus contradicting

    u(xₙ) ≥ t > 0    ∀n ≥ 1

It is easily seen that, for t > 0 small enough, the intersection (u ≥ t) ∩ B(p, w) is non-empty.

We have

    D(p, w) = x̂(p, w)    ∀(p, w) ∈ Rⁿ₊₊ × R₊₊
The study of the demand function is usually based on methods of constrained optimization that rely on differential calculus, as we will see in Section 29.5. However, in the important case of log-linear utility functions the demand for good i is, in view of Example 788,

    Dᵢ(p, w) = aᵢ w / pᵢ    (18.37)

The demanded quantity of good i depends on the income w, on its price pᵢ, and on the relative importance aᵢ that the log-linear utility function gives it with respect to the other goods. Specifically, the larger aᵢ is, the higher is good i's relative importance and, ceteris paribus (i.e., keeping prices and income constant), the higher is its demand.
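Formula (18.37) is easy to verify numerically. A sketch with made-up weights, prices and income; it checks budget exhaustion and the invariance to proportional changes in prices and income discussed next:

```python
# Demand (18.37) for a log-linear utility: D_i(p, w) = a_i * w / p_i.
# A hypothetical three-good example; all numbers are illustrative.
a = [0.5, 0.3, 0.2]          # weights, a_i > 0, summing to 1
p = [2.0, 1.0, 4.0]          # prices
w = 100.0                    # income

D = [ai * w / pi for ai, pi in zip(a, p)]

# The optimal bundle exhausts the budget: p . D(p, w) = w ...
spent = sum(pi * di for pi, di in zip(p, D))  # 100.0

# ... and demand is invariant to proportional changes of (p, w).
alpha = 3.0
D_scaled = [ai * (alpha * w) / (alpha * pi) for ai, pi in zip(a, p)]
```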
The proof is straightforward: it is enough to note that the budget set does not change if one multiplies prices and income by the same scalar α > 0, that is,

    B(αp, αw) = {x ∈ A : (αp) · x ≤ αw} = {x ∈ A : p · x ≤ w} = B(p, w)
As simple as it may seem, this proposition has an important economic meaning. Indeed, it shows that only relative prices matter. To see why, choose any good among those in bundle x, for example the first good x₁, and call it the numeraire, that is, the unit of account. By setting its price to 1, we can express income and the other goods' prices in terms of the numeraire:

    (1, p₂/p₁, ..., pₙ/p₁, w/p₁)

so that

    x̂(p₁, ..., pₙ, w) = x̂(1, p₂/p₁, ..., pₙ/p₁, w/p₁)    ∀p ≫ 0
As an example, suppose that bundle x is made up of different kinds of fruit (apples, bananas, oranges, and so on). In particular, assume that good 1, the numeraire, is apples. Set w̃ = w/p₁ and qᵢ = pᵢ/p₁ for every i = 2, ..., n, so that

    (1, p₂/p₁, p₃/p₁, ..., pₙ/p₁, w/p₁) = (1, q₂, q₃, ..., qₙ, w̃)

In terms of the "apple" numeraire, the price of one unit of fruit 2 is q₂ apples, the price of one unit of fruit 3 is q₃ apples, ..., the price of one unit of fruit n is qₙ apples, while the value of income is w̃ apples. To give a concrete example, if

    (1, p₂/p₁, p₃/p₁, ..., pₙ/p₁, w/p₁) = (1, 3, 7, ..., 5, 12)

the price of one unit of fruit 2 is 3 apples, the price of one unit of fruit 3 is 7 apples, ..., the price of one unit of good n is 5 apples, while the value of income is 12 apples.
Any good in bundle x can be chosen as numeraire: it is merely a conventional choice within an economy (justified by political reasons, availability of the good itself, etc.), and consumers can solve their optimization problems using any numeraire whatsoever. Such a role, however, can also be played by an artificial object, such as money, say euros. In this case, we say that the price of a unit of apples is p₁ euros, the price of a unit of fruit 2 is p₂ euros, the price of a unit of fruit 3 is p₃ euros, ..., the price of a unit of fruit n is pₙ euros, while the value of income is w euros. It is a mere change of scale, akin to measuring quantities of fruit in kilograms rather than in pounds. In conclusion, Proposition 848 shows that in consumer theory money is a mere unit of account, nothing but a "veil". The choice of optimal bundles does not vary as long as relative prices p₂/p₁, ..., pₙ/p₁ and relative income w/p₁ remain unchanged. "Nominal" price and income variations do not matter for consumers' behavior.
18.8 Equilibrium analysis
Agents thus play two roles in this economy. Their trader role is, however, ancillary to their
consumer role: what agent i cares about is consumption, trading being only instrumental to
that.
Assume that there is a unique optimal bundle x̂ᵢ(p, p · ωᵢ). Since it only depends on the price vector p, the demand function Dᵢ : Rⁿ₊ → Rⁿ₊ of agent i can be defined by

    Dᵢ(p) = x̂ᵢ(p, p · ωᵢ)    ∀p ∈ Rⁿ₊

The individual demand Dᵢ still has the remarkable invariance property Dᵢ(αp) = Dᵢ(p) for every α > 0. So, nominal changes in prices do not affect agents' consumption behavior. Moreover, if uᵢ : Rⁿ₊ → R is strongly increasing, then Walras' law is easily seen to hold for agent i, i.e.,

    p · Dᵢ(p) = p · ωᵢ    (18.39)
We can now aggregate individual behavior. The aggregate demand function D : Rⁿ₊ → Rⁿ is defined by

    D(p) = Σ_{i∈I} Dᵢ(p)

¹⁹ We say "net trade" because z may be the outcome of several market operations, here not modelled, in which agents may have been on both sides of the market (i.e., buyers and sellers).
Note that the aggregate demand function inherits the invariance property of individual demand functions, that is,

    D(αp) = D(p)    ∀α > 0    (18.40)

So, nominal changes do not affect the aggregate demand of goods. Condition A.2 of the Arrow-Debreu Theorem (Section 12.8) is thus satisfied.

Let ω = Σ_{i∈I} ωᵢ be the sum of the individual endowments, that is, the total resources in the economy. The aggregate supply function S : Rⁿ₊ → Rⁿ is given by such sum, i.e.,

    S(p) = ω

So, in this simplified exchange economy the aggregate supply function does not depend on prices. It is a "flat" supply.

In this economy we have the weak Walras' law

    p · E(p) ≤ 0

where E : Rⁿ₊ → Rⁿ is the excess demand function defined by E(p) = D(p) − ω. Indeed,

    p · D(p) = p · Σ_{i∈I} Dᵢ(p) = Σ_{i∈I} p · Dᵢ(p) ≤ Σ_{i∈I} p · ωᵢ = p · ω

If Walras' law (18.39) holds for each agent i ∈ I, then its aggregate version holds:

    p · E(p) = 0

So, besides condition A.2, also conditions W.1 and W.2 used in the Arrow-Debreu Theorem naturally arise in this simple exchange economy.
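These aggregate conditions can be illustrated numerically. A sketch of a hypothetical two-agent, two-good economy with log-linear utilities, whose demands are computed via formula (18.37) with income p · ωᵢ; the endowments and utility weights are made up, and the code checks the aggregate Walras' law p · E(p) = 0:

```python
# A two-agent exchange economy: agent i has endowment omega_i and
# log-linear utility with weights a_i, hence demands
# D_ij(p) = a_ij * (p . omega_i) / p_j for each good j.
omegas = [(4.0, 1.0), (1.0, 3.0)]          # endowments (illustrative)
weights = [(0.5, 0.5), (0.25, 0.75)]       # utility weights per agent

def demand(p, omega, a):
    income = sum(pj * oj for pj, oj in zip(p, omega))
    return [aj * income / pj for aj, pj in zip(a, p)]

def excess_demand(p):
    D = [sum(ds) for ds in zip(*(demand(p, o, a)
                                 for o, a in zip(omegas, weights)))]
    omega = [sum(os) for os in zip(*omegas)]
    return [dj - oj for dj, oj in zip(D, omega)]

p = (1.0, 2.0)
E = excess_demand(p)
walras = sum(pj * ej for pj, ej in zip(p, E))  # = 0: aggregate Walras' law
```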
The wellbeing of each agent i in the economy E depends on the bundle of goods xᵢ = (xᵢ₁, ..., xᵢₙ) ∈ Rⁿ that he receives, as ranked via a utility function uᵢ : Rⁿ₊ → R. A consumption allocation of such bundles is a vector

    x = (x₁, ..., x_{|I|}) ∈ (Rⁿ₊)^{|I|}

Next we define allocations that may arise via market exchanges which are, at the same time, voluntary and feasible.

Definition 849 A pair (p, x) ∈ Rⁿ₊ × (Rⁿ₊)^{|I|} of prices and consumption allocations is a weak Arrow-Debreu (market) equilibrium of the exchange economy E if:

(i) xᵢ = Dᵢ(p) for each agent i ∈ I;

(ii) Σ_{i∈I} xᵢ ≤ ω.

The optimality condition (i) requires that allocation x consists of bundles that, at the price level p, are optimal for each agent i; so, as a trader, agent i is freely trading. The market clearing condition (ii) requires that such allocation x relies on trades that are feasible in the market. Jointly, conditions (i) and (ii) ensure that allocation x is attained via market exchanges that are both voluntary and feasible.
The Arrow-Debreu equilibrium notion thus aggregates individual behavior. What distinguishes a weak equilibrium from an equilibrium is that in the latter optimal bundles exhaust endowments, so no resources are left unused. The next result is mathematically trivial yet of great economic importance, in that it shows that the aggregate equilibrium notions of Section 12.8 can be interpreted in terms of a simple exchange economy.

Lemma 850 Given a pair (p, x) ∈ Rⁿ₊ × (Rⁿ₊)^{|I|} of prices and consumption allocations, set q = Σ_{i∈I} xᵢ. The pair (p, x) is a:

(i) Arrow-Debreu equilibrium if and only if (12.16) holds, i.e., q = D(p) = S(p);

(ii) weak Arrow-Debreu equilibrium if and only if (12.18) holds, i.e., q = D(p) ≤ S(p).

In view of this result, we can then establish the existence of a weak market equilibrium of the exchange economy E using the existence results of Section 12.8, in particular the Arrow-Debreu Theorem. For simplicity, next we consider the existence of a weak market price equilibrium, i.e., a price p such that E(p) ≤ 0 (so, at p there is no excess demand).
Proposition 851 Let E = {(uᵢ, ωᵢ)}_{i∈I} be an economy in which, for each agent i ∈ I, the endowment ωᵢ is strictly positive and the utility function uᵢ is continuous and strictly quasi-concave on a convex and compact consumption set Aᵢ. Then, a weak Arrow-Debreu equilibrium of the exchange economy E exists.

In sum, in this simple exchange economy we have connected individual and aggregate behavior via an equilibrium notion. In particular, the existence of a (weak) market equilibrium is established only via conditions on agents' individual characteristics, i.e., utility functions and endowments, as methodological individualism prescribes. Indeed, aggregating individual behavior via an equilibrium notion is a common mode of analysis in economics. A caveat, however, is in order: how does a market price equilibrium come about? The previous analysis provides conditions under which it exists, but says nothing about what kind of individual choices may actually implement it. A deus ex machina, the "market", sets equilibrium prices: a significant limitation of the analysis from a methodological individualism viewpoint.
All allocations in C(ω) can, in principle, be attained via trading; for this reason, we call them attainable allocations. Yet, if there exists a mighty planner, say a pharaoh, endowed with a vector ω of goods, the attainable allocations may result, rather than via trading, from an arbitrary consumption allocation selected by the pharaoh, who decides which bundle each agent can consume.

The operator f : (Rⁿ₊)^{|I|} → R^{|I|} given by

    f(x) = (u₁(x₁), ..., u_{|I|}(x_{|I|}))    (18.41)

represents the utility profile across agents of each allocation. So, the image f(C(ω)) consists of all utility profiles (u₁(x₁), ..., u_{|I|}(x_{|I|})) that agents can achieve at attainable allocations. Because of its importance, we denote such image by the more evocative symbol U_E, i.e., we set U_E = f(C(ω)). The subscript reminds us that this set depends on the individual characteristics, utility functions and endowments, of the agents in the economy.
A vector x ∈ (Rⁿ₊)^{|I|} is said to be a (weak, resp.) equilibrium market allocation of economy E if there is a non-zero price vector p such that the pair (p, x) is a (weak, resp.) Arrow-Debreu equilibrium of the exchange economy E. Clearly, equilibrium allocations are attainable.

Can a benevolent pharaoh improve upon an equilibrium market allocation? Specifically, given an equilibrium market allocation x, is there an alternative attainable allocation x′ such that f(x′) > f(x), i.e., such that under x′ at least one agent is strictly better off than under allocation x and none is worse off?

Formally, a negative answer to this question amounts to saying that equilibrium market allocations are Pareto optimal, that is, they result in utility profiles that are maximal in the set U_E, i.e., that are Pareto optima in such set (Section 2.5). Remarkably, this is indeed the case, as the next fundamental result shows.
Theorem 852 (First Welfare Theorem) Let E = {(uᵢ, ωᵢ)}_{i∈I} be an economy in which ω ≫ 0 and, for each agent i ∈ I, the utility function uᵢ is concave and strongly increasing on a convex and closed under majorization consumption set Aᵢ. An equilibrium allocation of economy E is (if it exists) Pareto optimal.

Thus, it is not possible to Pareto improve upon an equilibrium allocation. The First Welfare Theorem can be viewed as a possible formalization of the famous invisible hand of Adam Smith. Indeed, an exchange economy reaches via feasible and voluntary exchanges an equilibrium allocation that even a benevolent pharaoh would not be able to Pareto improve upon, i.e., he would not be able to select a different attainable allocation that makes at least one agent strictly better off, yet none worse off.
Proof Suppose there exists an equilibrium allocation x ∈ C(ω) under a non-zero price vector p. Suppose, by contradiction, that there exists a different x′ ∈ C(ω) such that f(x′) > f(x). Let i ∈ I. If uᵢ(x′ᵢ) > uᵢ(xᵢ), then p · x′ᵢ > p · ωᵢ because xᵢ is an optimal bundle. If uᵢ(x′ᵢ) = uᵢ(xᵢ), then p · x′ᵢ ≥ p · ωᵢ; indeed, if p · x′ᵢ < p · ωᵢ then x′ᵢ would be an optimal bundle that violates the individual Walras' law, a contradiction because uᵢ is strongly increasing and Aᵢ is closed under majorization (Proposition 796). Being f(x′) > f(x), we conclude that p · Σ_{i∈I} x′ᵢ > p · ω. On the other hand, from x′ ∈ C(ω) it follows that p · ω ≥ p · Σ_{i∈I} x′ᵢ because p > 0. We thus reached the contradiction p · Σ_{i∈I} x′ᵢ > p · ω ≥ p · Σ_{i∈I} x′ᵢ. This proves that x is a Pareto optimum.
18.9 Least squares

A linear system

    A x = b    (18.42)

with A of order m × n, x ∈ Rⁿ and b ∈ Rᵐ, may not have a solution. This is often the case when the system has more equations than unknowns, i.e., m > n.
When a system has no solution, there is no vector x̂ ∈ Rⁿ such that Ax̂ = b. That said, one may wonder whether there is a surrogate for a solution: a vector x ∈ Rⁿ that minimizes the approximation error

    ‖Ax − b‖    (18.43)

that is, the distance between the vector of constants b and the image Ax of the linear operator F(x) = Ax. The error is null in the fortunate case where x solves the system: Ax − b = 0. In general, the error (18.43) is positive, as the norm always is.

By Proposition 782, minimizing the approximation error is equivalent to minimizing the quadratic transformation ‖Ax − b‖² of the norm. This justifies the following definition.
The least squares solution is an approximate solution of the linear system: it is the best we can do to minimize the distance between the vectors Ax and b in Rᵐ. As ‖·‖² is a sum of squares, finding the least squares solution by solving the optimization problem (18.44) is called the least squares method. The fathers of this method are Gauss and Legendre, who proposed it for the analysis of astronomical data at the beginning of the nineteenth century.

As we remarked, when it exists, the linear system's solution is also a least squares solution. To be a good surrogate, a least squares solution should exist also when the system has no solution. In other words, the more general the conditions ensuring the existence of solutions of the optimization problem (18.44), the more useful the least squares method. The following fundamental result shows that such solutions do indeed exist and are unique under the hypothesis that ρ(A) = n. In the more relevant case where m > n, this amounts to requiring that the matrix A have maximum rank. The result relies on Tonelli's Theorem for existence and on Theorem 831 for uniqueness.
Theorem 854 Let m ≥ n. The optimization problem (18.44) has a unique solution if ρ(A) = n.

Later in the book we will see the form of this unique solution (Sections 19.4 and 24.5.1). To prove the result, let us consider the function g : Rⁿ → R defined by g(x) = −‖Ax − b‖² and the equivalent optimization problem

    max_x g(x)   sub x ∈ Rⁿ    (18.45)

The following lemma illustrates the remarkable properties of the objective function g which allow us to use Tonelli's Theorem and Theorem 831. Note that the condition ρ(A) = n is equivalent to requiring the injectivity of the linear operator F(x) = Ax (Corollary 579).

Proof Let us start by showing that g is strictly concave. Take distinct x₁, x₂ ∈ Rⁿ and λ ∈ (0, 1). The condition ρ(A) = n implies that F is injective, hence F(x₁) ≠ F(x₂). Therefore,

    ‖F(λx₁ + (1 − λ)x₂) − b‖² = ‖λ(F(x₁) − b) + (1 − λ)(F(x₂) − b)‖²
                              < λ‖F(x₁) − b‖² + (1 − λ)‖F(x₂) − b‖²

where the strict inequality follows from the strict convexity of ‖·‖².²⁰ So,

    g(λx₁ + (1 − λ)x₂) = −‖F(λx₁ + (1 − λ)x₂) − b‖²
                       > −λ‖F(x₁) − b‖² − (1 − λ)‖F(x₂) − b‖²
                       = λg(x₁) + (1 − λ)g(x₂)

²⁰ Indeed, the function ‖x‖² = Σ_{i=1}^{n} xᵢ² is strictly convex, as we already noted for n = 2 in Example 654.
To show that g is coercive, define f : Rᵐ → R by f(y) = −‖y − b‖², so that g = f ∘ F. Since

    ‖y‖ = ‖y − b + b‖ ≤ ‖y − b‖ + ‖b‖

we have

    ‖y‖ → +∞ ⟹ ‖y − b‖ → +∞ ⟹ f(y) = −‖y − b‖² → −∞

Set B_t = {y ∈ Im F : f(y) ≥ t} = (f ≥ t) ∩ Im F for t ∈ R. As f is supercoercive and continuous, by Proposition 820 f is coercive on the closed set Im F and the sets B_t = (f ≥ t) ∩ Im F are compact for every t. Furthermore,

    (g ≥ t) = {x ∈ Rⁿ : f(F(x)) ≥ t} = {x ∈ Rⁿ : F(x) ∈ B_t} = F⁻¹(B_t)

Proof of Theorem 854 In light of the previous lemma, problem (18.45), and so problem (18.44), has a solution thanks to Tonelli's Theorem because g is coercive. Such a solution is unique thanks to Theorem 831 because g is strictly concave.
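For readers who want to experiment, the unique solution can be computed via the normal equations (AᵀA)x = Aᵀb, the standard closed form whose derivation the book postpones to Sections 19.4 and 24.5.1; this is a sketch with made-up data, not the book's own derivation:

```python
import numpy as np

# When rho(A) = n, the problem min ||Ax - b||^2 has the unique solution
# of the normal equations (A^T A) x = A^T b.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])        # m = 4 > n = 2, full column rank
b = np.array([0.0, 1.0, 1.0, 3.0])

x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# Sanity check against the library least squares solver.
x_np, *_ = np.linalg.lstsq(A, b, rcond=None)

# Any perturbation of x_ls increases the approximation error (18.43).
err = np.linalg.norm(A @ x_ls - b)
err_perturbed = np.linalg.norm(A @ (x_ls + 0.1) - b)
```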
that is, the sum of the squares of the errors yᵢ − αxᵢ that are made by using the production function f(x) = αx to evaluate output. Therefore, one is faced with the following optimization problem

    min_α Σ_{i=1}^{m} (yᵢ − αxᵢ)²   sub α ∈ R
By denoting by X = (x₁, ..., x_m) and Y = (y₁, ..., y_m) the data vectors regarding input and output, the problem can be restated as

    min_α ‖αX − Y‖²   sub α ∈ R    (18.47)

which is the special case n = 1 of the optimization problem (18.44), with the notation A = X, x = α and b = Y.²¹

By Theorem 854, problem (18.47) has a unique solution ᾱ ∈ R because the rank condition is trivially satisfied when n = 1. The farmer can use the production function

    f(x) = ᾱx

in order to decide how much fertilizer to use for the next crop, for whichever level of output he might choose. Given the data he has at hand and the (possibly simplistic) choice of a linear production function, the least squares method suggests to the farmer that this is the production function that best fits the available data.
Such a procedure can be used in the analysis of data regarding any pair of variables. The independent variable x, referred to as the regressor, is not in general unique. For example, suppose the same farmer needs n kinds of input x₁, x₂, ..., xₙ, that is, n regressors, to produce a quantity y of output. The data collected by the farmer are thus the vectors

    Xᵢ = (xᵢ₁, ..., xᵢ_m)    i = 1, ..., n

where xᵢⱼ is the quantity of input i used in year j. The vector Y = (y₁, ..., y_m) denotes the output, as before. The linear production function is now a function of several variables, that is, f(x) = α · x with x ∈ Rⁿ. The data matrix

    X = [X₁ᵀ X₂ᵀ ... Xₙᵀ] = ⎡ x₁₁   x₂₁   ...  xₙ₁  ⎤
                            ⎢ x₁₂   x₂₂   ...  xₙ₂  ⎥
                            ⎢  ⋮     ⋮          ⋮   ⎥
                            ⎣ x₁_m  x₂_m  ...  xₙ_m ⎦    (18.48)

of order m × n has the vectors X₁, X₂, ..., Xₙ as columns, so that the latter contain the data on each regressor throughout the years.

The least squares method leads to

    min_α ‖Xα − Y‖²   sub α ∈ Rⁿ
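A sketch of the multi-regressor problem with made-up data for n = 2 regressors and m = 5 years; here Y is generated exactly by α = (1, 2), so the least squares solution recovers it:

```python
import numpy as np

# The farmer's regression: columns of X hold the two inputs across
# m = 5 years, Y the output.  All numbers are illustrative.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0],
              [4.0, 2.0],
              [5.0, 4.0]])
Y = np.array([5.0, 4.0, 9.0, 8.0, 13.0])

# min || X a - Y ||^2 over a in R^2
a, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The fitted linear production function f(x) = a . x can then be used
# to predict output for a new input bundle.
y_hat = X @ a
```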
18.10 Operator optima

A point x̂ ∈ C is a Pareto optimizer of f on C if there is no x ∈ C such that

    f(x) > f(x̂)

The value f(x̂) of the function at x̂ is called a Pareto value of f on C.

Because of the planner example, f is sometimes called the social objective function and C the social choice set. Note that a Pareto value of the objective function f on the choice set C is a Pareto optimum of the set f(C) = {f(x) : x ∈ C}. Unlike the maximum value, which is unique, there are in general multiple Pareto values. The collection of all such values is called the Pareto frontier of f on C (in accordance with the terminology of Section 2.5).
We will write an operator optimization problem as

    opt_x f(x)   sub x ∈ C    (18.49)

Given a vector λ ≫ 0 of weights, with Σ_{i=1}^{m} λᵢ = 1, define the welfare function W : C → R by W(x) = λ · f(x) = Σ_{i=1}^{m} λᵢ fᵢ(x).

Lemma 857 We have arg max_{x∈C} W(x) ⊆ arg opt_{x∈C} f(x) for every λ ≫ 0.

Proof Fix λ ≫ 0, with Σ_{i=1}^{m} λᵢ = 1. Let x̂ ∈ arg max_{x∈C} W(x). The point x̂ is clearly a Pareto optimizer. Otherwise, there would exist x ∈ C such that f(x) > f(x̂). But, being λ ≫ 0, this implies W(x) = λ · f(x) > λ · f(x̂) = W(x̂), a contradiction.
This lemma implies the next Weierstrass-type result that ensures the existence of solutions of an operator optimization problem.

In this case, by suitably choosing the vector λ of weights we can retrieve all Pareto optimizers. The next examples show that this may, or may not, happen.

²² As the reader can check, a dual notion of Pareto optimality would lead to minimum problems.
Example 859 (i) Consider f : [0, 1] → R² given by f(x) = (eˣ, e⁻ˣ). All the points of the unit interval are Pareto optimizers for f. The welfare function W : [0, 1] → R is given by W(x) = λeˣ + (1 − λ)e⁻ˣ, where λ ∈ (0, 1). Its maximizer is x̂ = 0 if (1 − λ)/λ ≥ e and x̂ = 1 otherwise. Hence, only the two Pareto optimizers {0, 1} can be found through scalarization.

(ii) Consider f : [0, 1] → R² given by f(x) = (x², −x²). Again, all the points of the unit interval are Pareto optimizers for f. The welfare function W : [0, 1] → R is given by W(x) = λx² − (1 − λ)x² = (2λ − 1)x², where λ ∈ (0, 1). We have

    arg max_{x∈C} W(x) = ⎧ {0}     if λ < 1/2
                         ⎨ [0, 1]  if λ = 1/2
                         ⎩ {1}     if λ > 1/2

and so (18.50) holds. In this case, all Pareto optimizers can be retrieved via scalarization. N
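The scalarization failure in part (i) can be seen numerically: scanning λ over a grid, the maximizer of W is always an endpoint. A plain-Python sketch, in which the grid search is merely a numerical stand-in for the exact maximization:

```python
import math

# Scalarization in part (i): W(x) = lam*e^x + (1-lam)*e^(-x) on [0, 1].
# W is convex, so its maximum over [0, 1] sits at an endpoint; scanning
# lam in (0, 1) therefore retrieves only the Pareto optimizers 0 and 1.
def argmax_W(lam, grid_size=1001):
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda x: lam * math.exp(x) + (1 - lam) * math.exp(-x))

maximizers = {argmax_W(lam / 10) for lam in range(1, 10)}
# maximizers == {0.0, 1.0}: interior Pareto optimizers are never found
```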
Given f : (Rⁿ₊)^{|I|} → R^{|I|} defined in (18.41), i.e., f(x) = (u₁(x₁), ..., u_{|I|}(x_{|I|})), the operator optimization problem of the planner is

    opt_x f(x)   sub x ∈ C(ω)    (18.52)

The solutions of this problem, i.e., the Pareto optimizers, are called Pareto optimal allocations (in accordance with the terminology of the First Welfare Theorem).

In view of the previous discussion, the planner can tackle his problem through a welfare function W(x) = Σ_{i=1}^{m} λᵢ uᵢ(xᵢ) and the associated optimization problem

    max_x W(x)   sub x ∈ C(ω)    (18.53)

Unless (18.50) holds, some Pareto optimizers will be missed by a planner who relies on this scalar optimization problem, whatever λ he chooses to scalarize with.
Example 860 Consider an exchange economy with two agents and one good. Assume that the total amount of the good in the economy is ω > 0. For the sake of simplicity, assume that the two agents have the same preferences over this single good. In this way, they share the same utility function, for example the linear u : R₊ → R defined by

    u₁(x) = u₂(x) = x

A planner has to allocate the total endowment ω to the two agents. In other words, he has to choose an attainable vector x = (x₁, x₂) ∈ R²₊, that is, such that x₁ + x₂ ≤ ω, where x₁ will be the share of ω allotted to the first agent and x₂ the share of the second agent. Indeed, each agent can only receive a positive quantity of the good, x ∈ R²₊, and the planner cannot allocate to the agents more than what is available in the economy, x₁ + x₂ ≤ ω. Here the collection (18.51) of attainable allocations is

    C(ω) = {x ∈ R²₊ : x₁ + x₂ ≤ ω}

and

    arg opt_{x∈C(ω)} f(x) = {x ∈ R²₊ : x₁ + x₂ = ω}

that is, the allocations that exhaust total resources are the Pareto optimizers of f on C(ω). Since the agents' utility functions are linear, the Pareto frontier is {x ∈ R²₊ : x₁ + x₂ = ω}. N
Example 861 If in the previous example we have two agents and two goods, we get back to the setup of the Edgeworth box (Section 2.5). Recall that we assumed that there is a unit of each good to split among the two agents (Albert and Barbara), so ω = (1, 1). They have the same utility function uᵢ : R²₊ → R defined by²³

    uᵢ(xᵢ₁, xᵢ₂) = √(xᵢ₁ xᵢ₂)

Define f : (R²₊)² → R²₊ by f(x₁, x₂) = (√(x₁₁x₁₂), √(x₂₁x₂₂)). The planner operator optimization problem (18.49) is here

    opt_x f(x)   sub x ∈ C(ω)

By Proposition 58,

    arg opt_{x∈C(ω)} f(x) = {x ∈ (R²₊)² : 0 ≤ x₁₁ = x₁₂ = 1 − x₂₁ = 1 − x₂₂ ≤ 1}

that is, the allocations that are symmetric (i.e., each agent receives the same quantity of each good) and that exhaust total resources are the Pareto optimizers of f on C(ω). The Pareto frontier is

    {(√(x₁₁x₁₂), √(x₂₁x₂₂)) ∈ R²₊ : 0 ≤ x₁₁ = x₁₂ = 1 − x₂₁ = 1 − x₂₂ ≤ 1}

N

²³ We denote by xᵢ = (xᵢ₁, ..., xᵢₙ) ∈ Rⁿ a bundle of goods of agent i.
O.R. As the First Welfare Theorem suggests, there is a close connection between Pareto optimal allocations and the equilibrium allocations that would arise if agents were given individual endowments and could trade among themselves under a price vector. We do not further discuss this topic, which readers will study in some microeconomics course. Just note that, through such connection, the possible equilibrium allocations may be found by solving the operator optimization problem (18.52) or, under condition (18.50), the standard optimization problem (18.53). H
18.11 Infracoda: cuneiform functions

Definition 862 A real-valued function f : A → R is said to be cuneiform if, for every pair of distinct elements x, y ∈ A, there exists an element z ∈ A such that f(z) > min{f(x), f(y)}.

The next result shows that being cuneiform is a necessary and sufficient condition for the uniqueness of solutions. In view of the last example, this result generalizes the uniqueness result that we established for strictly quasi-concave functions.

Proof "If". Let f : A → R be cuneiform. We want to show that there exists at most one maximizer in A. Suppose, by contradiction, that there exist in A two such points x′ and x″, i.e., f(x′) = f(x″) = max_{x∈A} f(x). Since f is cuneiform, there exists z ∈ A such that

    f(z) > min{f(x′), f(x″)} = max_{x∈A} f(x)

which contradicts the optimality of x′ and x″. "Only if". Suppose that there exists at most one maximizer in A. Let x′ and x″ be any two distinct elements of A. If there are no maximizers, then in particular x′ and x″ are not maximizers; so, there exists z ∈ A such that f(z) > min{f(x′), f(x″)}. We conclude that f is cuneiform. On the other hand, if there is one maximizer, it is easy to check that it plays the role of z in Definition 862. Also in this case f is cuneiform.
Though for brevity we omit the details, it is easy to see that there is a dual notion in which the inequality in the previous definition is reversed, and that the previous result then holds for minimizers.
    inf_{(x,y,z,n)∈C} f(x, y, z, n) = 0

since lim_{n→∞} f(1, 1, ⁿ√2, n) = lim_{n→∞} (1 − cos 2π ⁿ√2)² = 0. Indeed, lim_{n→∞} ⁿ√2 = 1 (Proposition 322).

The infimum is thus 0. The question is whether there is a solution of the problem, that is, a vector (x̂, ŷ, ẑ, n̂) ∈ C such that f(x̂, ŷ, ẑ, n̂) = 0. Since f is a sum of squares, this requires that in such a vector they all be null:

    x̂^n̂ + ŷ^n̂ − ẑ^n̂ = 1 − cos 2πx̂ = 1 − cos 2πŷ = 1 − cos 2πẑ = 0
Recall that f is continuous at x₀ ∈ A if, for every ε > 0, there exists δ_ε > 0 such that

    ‖x − x₀‖ < δ_ε ⟹ f(x₀) − ε < f(x) < f(x₀) + ε    ∀x ∈ A    (18.54)

If in this definition we keep only the second inequality, we have the following weakening of continuity.

A function that is upper semicontinuous at each point of a set E is called upper semicontinuous on E. The function is called upper semicontinuous when it is upper semicontinuous at all the points of its domain.²⁸

Upper semicontinuity has a dual notion of lower semicontinuity, with f(x) > f(x₀) − ε in place of f(x) < f(x₀) + ε.
Proof The "if" is obvious. As to the converse, assume that f is both upper and lower semicontinuous at x₀ ∈ A. Fix ε > 0. There exist δ′_ε, δ″_ε > 0 such that, for each x ∈ A,

    ‖x − x₀‖ < δ′_ε ⟹ f(x) < f(x₀) + ε
    ‖x − x₀‖ < δ″_ε ⟹ f(x) > f(x₀) − ε

so, by taking δ_ε = min{δ′_ε, δ″_ε}, we have (18.54). We conclude that f is continuous at x₀.

²⁷ Clearly, the sandwich f(x₀) − ε < f(x) < f(x₀) + ε amounts to |f(x₀) − f(x)| < ε.
²⁸ Semicontinuity was introduced by René Baire in 1905.
The study of the two forms of semicontinuity, upper and lower, is analogous: indeed, it is easy to see that f is upper semicontinuous if and only if −f is lower semicontinuous. For this reason, we will focus on upper semicontinuity, which is the more relevant for the study of maximizers.

By Proposition 475, for continuous functions we have lim f(xₙ) = f(x₀), so this sequential characterization of semicontinuous functions helps to understand to what extent upper semicontinuity generalizes continuity. For lower semicontinuous functions, we have the dual condition lim inf f(xₙ) ≥ f(x₀).²⁹
Proof Let f be upper semicontinuous at the point x₀. Let {xₙ} be such that xₙ → x₀. Fix ε > 0. There is n_ε ≥ 1 such that ‖xₙ − x₀‖ < δ_ε for all n ≥ n_ε. By Definition 865, we then have f(xₙ) < f(x₀) + ε for each n ≥ n_ε. Therefore, lim sup f(xₙ) ≤ f(x₀) + ε. Since this is true for each ε > 0, we conclude that lim sup f(xₙ) ≤ f(x₀).

Suppose now that lim sup f(xₙ) ≤ f(x₀) for each sequence {xₙ} such that xₙ → x₀. Let ε > 0 and suppose, by contradiction, that f is not upper semicontinuous at x₀. Then, for each δ > 0 there exists x_δ such that ‖x_δ − x₀‖ < δ and f(x_δ) ≥ f(x₀) + ε. Setting δ = 1/n, it follows that for each n there exists xₙ such that ‖xₙ − x₀‖ < 1/n and f(xₙ) ≥ f(x₀) + ε. In this way we can construct a sequence {xₙ} such that xₙ → x₀ and f(xₙ) ≥ f(x₀) + ε for each n. Therefore, lim inf f(xₙ) ≥ f(x₀) + ε > f(x₀), which contradicts lim sup f(xₙ) ≤ f(x₀) and thus proves that f is upper semicontinuous at x₀.
[Figure: graph of a function with an upward jump at x₀ = 1, where f(1) = 2.]
The function is upper semicontinuous at x₀ = 1. In fact, let {xₙ} ⊆ R with xₙ → 1. For every such xₙ we have f(xₙ) ≤ 1 and therefore lim sup f(xₙ) ≤ 1 < 2 = f(1). By Proposition 867, f is upper semicontinuous also at x₀ (so, it is upper semicontinuous because it is continuous at each x ≠ x₀). N
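The sequential test of Proposition 867 can be sketched numerically. The function f below is a hypothetical stand-in for the one in this example (f(1) = 2 and f(x) = x elsewhere), not the book's exact formula (18.55):

```python
# Hypothetical stand-in: f jumps up to 2 at x0 = 1 and equals x elsewhere,
# so f(x) <= 1 for x near 1 with x != 1.
def f(x):
    return 2.0 if x == 1 else x

# Along sequences x_n -> 1 (from the left and from the right), estimate
# lim sup f(x_n) by the largest value on a far tail of the sequence.
for seq in ([1 - 1 / n for n in range(1, 500)],
            [1 + 1 / n for n in range(1, 500)]):
    tail = [f(x) for x in seq[-100:]]
    lim_sup_estimate = max(tail)
    # Upper semicontinuity at 1: lim sup f(x_n) <= f(1) = 2
    assert lim_sup_estimate <= f(1)
```

Since f never exceeds 1 away from the jump point, the estimated lim sup stays well below f(1) = 2, which is exactly the inequality upper semicontinuity requires.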
This last example shows that, in general, if a function f has a removable discontinuity at a point x₀ – i.e., the limit limₓ→ₓ₀ f(x) exists but is different from f(x₀) – then f is either upper semicontinuous at x₀, if f(x₀) > limₓ→ₓ₀ f(x), or lower semicontinuous at x₀, if f(x₀) < limₓ→ₓ₀ f(x).
continuous from the left, that is, limₓ→ₓ₀⁻ f(x) = f(x₀). For example, let us modify the function (18.55) at x₀ = 1, so as to have
    f(x) = 2 if x > 1,  and  f(x) = x if x ≤ 1
In words, the function 1_C takes value 1 on C and 0 elsewhere. Though not continuous, it is upper semicontinuous. Indeed, let x₀ ∈ Rⁿ. If x₀ ∈ C, then f(x₀) ≥ f(x) for all x ∈ Rⁿ, so it trivially holds that lim sup f(xₙ) ≤ f(x₀) whenever xₙ → x₀. If x₀ ∉ C, then it belongs to the open set Cᶜ, so if xₙ → x₀ then there is n̄ ≥ 1 such that xₙ ∈ Cᶜ, and hence f(xₙ) = 0, for all n ≥ n̄. Thus, lim f(xₙ) = f(x₀) = 0. By Proposition 867, we conclude that f is upper semicontinuous since x₀ was arbitrarily chosen. Its upper contour sets are:
    (1_C ≥ t) = Rⁿ if t ≤ 0,  C if t ∈ (0, 1],  ∅ if t > 1
Since C is closed, all these upper contour sets are closed.
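The upper contour sets above, and upper semicontinuity at a boundary point, can be checked numerically. A sketch with the concrete closed set C = [0, 1], a choice of our own for illustration:

```python
# Indicator of the closed set C = [0, 1] (a concrete choice for illustration)
def ind_C(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

xs = [k / 100 for k in range(-100, 201)]   # grid on [-1, 2] with step 0.01

# Upper contour sets (ind_C >= t): the whole line for t <= 0, C for t in (0, 1],
# and the empty set for t > 1
assert sum(ind_C(x) >= -0.5 for x in xs) == 301   # every grid point qualifies
assert sum(ind_C(x) >= 0.5 for x in xs) == 101    # exactly the grid points of C
assert sum(ind_C(x) >= 1.5 for x in xs) == 0      # none

# Upper semicontinuity at the boundary point 1: x_n -> 1 from outside C gives
# ind_C(x_n) = 0, so lim sup ind_C(x_n) = 0 <= 1 = ind_C(1)
seq = [1 + 1 / n for n in range(1, 100)]
assert max(ind_C(x) for x in seq) <= ind_C(1.0)
```

The three counts reproduce the three cases of the display: all points, exactly the points of C, and no points.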
From the previous result it follows that Proposition 810 also continues to hold under upper semicontinuity.
A final important property is the stability of upper semicontinuity with respect to infima and suprema of functions.
In words, upper semicontinuity is preserved by infima over sets of functions of any cardinality, while it is preserved under suprema only over finite sets of functions. In the finite case, we can actually write h(x) = maxᵢ∈I fᵢ(x).
The last example showed that there is a tight connection between upper semicontinuous functions and closed sets. It is therefore not surprising that the stability of upper semicontinuous functions relative to infima and suprema mirrors that of closed sets relative to intersections and unions, respectively.
Example 875 The union of the closed sets Aₙ = [−1 + 1/n, 1 − 1/n] is the open interval (−1, 1), as noted after Corollary 158. The supremum of the infinitely many upper semicontinuous functions
    fₙ(x) = 1_{[−1+1/n, 1−1/n]}(x)
is
    h(x) = sup_{n∈N} 1_{[−1+1/n, 1−1/n]}(x) = 1_{(−1,1)}(x)
which is not upper semicontinuous. N
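The failure of upper semicontinuity under this infinite supremum can be seen numerically. The truncation level N below is an illustrative computational device, since the true supremum runs over all n:

```python
# The pointwise supremum of the indicators f_n = 1_[-1+1/n, 1-1/n] is the
# indicator of the open interval (-1, 1), which is not upper semicontinuous at 1.
def f_n(n, x):
    return 1.0 if -1 + 1 / n <= x <= 1 - 1 / n else 0.0

def h(x, N=1000):
    # numerical stand-in for the sup over all n, truncated at a large N
    return max(f_n(n, x) for n in range(1, N + 1))

assert h(0.0) == 1.0 and h(0.5) == 1.0   # points of (-1, 1) get value 1
assert h(1.0) == 0.0                      # 1 lies in no A_n, so h(1) = 0

# Failure of upper semicontinuity at 1: x_n = 1 - 1/n -> 1 but
# lim sup h(x_n) = 1 > 0 = h(1)
seq = [1 - 1 / n for n in range(2, 500)]
assert max(h(x) for x in seq) == 1.0 > h(1.0)
```

The sequence xₙ = 1 − 1/n converges to 1 with h(xₙ) = 1 for every n, while h(1) = 0: exactly the behavior that closedness of upper contour sets rules out.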
Proof of Proposition 874 Let x₀ ∈ A. Given ε > 0, there exists i ∈ I such that fᵢ(x₀) < g(x₀) + ε. Since fᵢ is upper semicontinuous, there exists δ_ε > 0 such that
    ‖x − x₀‖ < δ_ε ⟹ fᵢ(x) < fᵢ(x₀) + ε   ∀x ∈ A
So,
    ‖x − x₀‖ < δ_ε ⟹ g(x) ≤ fᵢ(x) < fᵢ(x₀) + ε < g(x₀) + 2ε   ∀x ∈ A
that is,
    ‖x − x₀‖ < δ_ε ⟹ g(x) < g(x₀) + 2ε   ∀x ∈ A
This proves that g is upper semicontinuous at x₀ ∈ A. We leave to the reader the proof that h is upper semicontinuous at x₀ ∈ A when I is finite.
Dual properties hold for lower semicontinuous functions: lower semicontinuity is preserved by suprema over sets of functions of any cardinality, while it is preserved under infima only over finite sets of functions. Now the analogy is with the stability properties of open sets relative to intersections and unions. Indeed, a tight connection – dual to the one established in Example 872 – is easily seen to exist between lower semicontinuous functions and open sets.
In view of Proposition 866, we then have the following important corollary about the "finite" stability of continuous functions.
Infima and suprema of infinitely many continuous functions are, in general, no longer continuous. This fragility of continuity is a main reason for the importance of lower and upper semicontinuity.
The proof is a slight modification of the first proof of Weierstrass' Theorem, which essentially still goes through under upper semicontinuity (a further sign that upper semicontinuity is the relevant notion of continuity to establish the existence of maximizers).
Proof Since f is coercive, there exists t ∈ R such that the upper contour set K = (f ≥ t) ∩ C is non-empty and compact. Set α = sup_{x∈K} f(x), that is, α = sup f(K). By Lemma 800, there exists a sequence {aₙ} ⊆ f(K) such that aₙ → α. Let {xₙ} ⊆ K be such that aₙ = f(xₙ) for every n ≥ 1. Since K is compact, the Bolzano-Weierstrass Theorem yields a subsequence {x_{n_k}} ⊆ {xₙ} that converges to some x̂ ∈ K, that is, x_{n_k} → x̂ ∈ K. Since {aₙ} converges to α, also the subsequence {a_{n_k}} converges to α. Since f is upper semicontinuous, it follows that
    α = lim_{k→∞} a_{n_k} = lim_{k→∞} f(x_{n_k}) ≤ f(x̂)
so f(x̂) = α, that is, x̂ is a maximizer of f on K and therefore, by coercivity, on C.
We conclude with two remarks. (i) Dual versions of the results we established hold for minimizers, with, for instance, lower contour sets in place of the upper ones (as readers can check). (ii) Coercivity becomes a necessary condition for global optimality for upper semicontinuous objective functions f and compact choice sets C. Indeed, in this case by Tonelli's Theorem the upper contour set (f ≥ max_{x∈C} f(x)) ∩ C is non-empty and compact.
is neither lower nor upper semicontinuous at 0. To see the failure of upper semicontinuity, just note that xₙ = 1/n → 0 but lim g(xₙ) = 1 > 0 = g(0). Since g ∘ f = g, this proves that lower and upper semicontinuity are not preserved by strictly increasing transformations, so they are not ordinal properties. N
Since upper semicontinuity is not an ordinal notion, we might end up with equivalent
objective functions –in the sense of Section 18.1.5 –for which Tonelli’s Theorem is applicable
to only one of them, thus creating an unnatural asymmetry between them. To address this
issue, next we present an ordinal version of upper semicontinuity.
Proof We only prove the "only if", the converse being similarly proved. Let f be upper quasi-continuous. We want to show that g ∘ f is upper quasi-continuous at x₀. Let {xₙ} ⊆ A be such that xₙ → x₀. Suppose that y ∈ A is such that (g ∘ f)(xₙ) ≥ (g ∘ f)(y). Since g is strictly increasing, by Proposition 209 we have f(xₙ) ≥ f(y).
Since f is upper quasi-continuous at x₀, we then have f(x₀) ≥ f(y). In view of (18.57), this in turn implies (g ∘ f)(x₀) ≥ (g ∘ f)(y), thus proving that g ∘ f is upper quasi-continuous at x₀.
We can now state and prove a general ordinal version of Tonelli's Theorem in which upper quasi-continuity replaces upper semicontinuity.30
Lemma 884 Let A be a subset of the real line. There exists a convergent and increasing sequence {aₙ} ⊆ A such that aₙ ↑ sup A.
30 We leave to readers the dual minimization version, based on a lower quasi-continuity notion.
18.13. ULTRACODA: THE SEMICONTINUOUS TONELLI 591
Proof Set α = sup A. Suppose that α ∈ R. In the proof of Lemma 800 we proved the existence of a sequence {aₙ} ⊆ A such that aₙ ≤ α and aₙ → α. Set bₙ = max{a₁, ..., aₙ}. Then 0 ≤ α − bₙ ≤ α − aₙ → 0, so bₙ → α. Suppose now α = +∞. In the proof of Lemma 800 we proved the existence of a sequence {aₙ} ⊆ A such that aₙ → +∞. Again, by setting bₙ = max{a₁, ..., aₙ}, we have bₙ ↑ +∞.
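The construction in this proof, bₙ = max{a₁, ..., aₙ}, can be sketched numerically; the sequence below is an illustrative non-monotone sequence of our own choosing with supremum 1:

```python
import itertools

# From any sequence a_n -> sup A one gets an increasing sequence
# b_n = max{a_1, ..., a_n} with the same limit (Lemma 884).
# Illustrative non-monotone sequence with sup = 1:
a = [1 - 1 / n if n % 2 else 1 - 2 / n for n in range(1, 1001)]

b = list(itertools.accumulate(a, max))    # b_n = max{a_1, ..., a_n}

assert b == sorted(b)                     # b is increasing
assert all(bn <= 1 for bn in b)           # it stays below sup = 1
assert 1 - b[-1] < 0.005                  # and approaches it
```

The running maximum turns an arbitrary sequence converging to the supremum into a monotone one, which is the device used again in the ordinal Tonelli proof below.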
Proof Since f is coercive, there exists t ∈ R such that the upper contour set K = (f ≥ t) ∩ C is non-empty and compact. Set α = sup_{x∈K} f(x). By Lemma 884, there exists a sequence {aₙ} ⊆ f(K) such that aₙ ↑ α. Let {xₙ} ⊆ K be such that aₙ = f(xₙ) for every n ≥ 1. Since K is compact, the Bolzano-Weierstrass Theorem yields a subsequence {x_{n_k}} ⊆ {xₙ} that converges to some x̂ ∈ K. We want to show that α = f(x̂). Suppose, by contradiction, that f(x̂) < α. Since a_{n_k} ↑ α, there exists k̄ ≥ 1 large enough so that a_{n_k} > f(x̂) for all k ≥ k̄. Hence, f(x_{n_k}) ≥ f(x_{n_k̄}) for all k ≥ k̄. Since f is upper quasi-continuous at x̂, we then have f(x̂) ≥ f(x_{n_k̄}) > f(x̂), a contradiction.31 We conclude that α = f(x̂). So, f(x̂) ≥ f(x) for every x ∈ K. At the same time, if x ∈ C∖K we have f(x) < t and so f(x̂) ≥ t > f(x). It follows that f(x̂) ≥ f(x) for every x ∈ C, as desired.
The ordinal Tonelli's Theorem is the most general form of this existence theorem that we present. The earlier pre-coda version of Tonelli's Theorem for continuous functions, Theorem 814, is enough for the results of the book. Yet, when later in the book readers come across topics that rely on Tonelli's Theorem, they may wonder how much generality would be gained via its stronger semicontinuous and quasi-continuous versions.
31 Here x_{n_k̄} plays the role of y in (18.56).
Chapter 19
Projections and approximations
[Figure: a vector x, its projection m onto a subspace, and the distance ‖x − m‖.]
Clearly, the problem is trivial if x belongs to V: just set m = x. Things become interesting when x is not in V. In this regard, note that we can paraphrase the problem by saying that it consists in finding in V the best approximation of a given x ∈ Rⁿ: the vector subspace V thus represents the space of "admissible approximations" and x − m is interpreted as an "approximation error" because it represents the error made by approximating x with m.
The problem described above is an optimization problem that consists in minimizing ‖x − y‖ under the constraint y ∈ V, that is,
    min_{y∈V} ‖x − y‖   (19.1)
594 CHAPTER 19. PROJECTIONS AND APPROXIMATIONS
The following theorem addresses all these questions. It relies on the notions of orthogonality we studied earlier in the book (Chapter 4). In particular, recall that two vectors x, y ∈ Rⁿ are orthogonal, written x ⊥ y, when their inner product is null. When x is orthogonal to all vectors in a subset S of Rⁿ, we write x ⊥ S.
Note that the uniqueness of m implies that ‖x − m‖ < ‖x − y‖ for each y ∈ V different from m.
This remarkable result ensures the existence and uniqueness of the solution, thus answering the first two questions, and characterizes it as the vector in V which makes the approximation error orthogonal to V itself. Orthogonality of the error is a key property of the solution that has a number of consequences in applications. Furthermore, Theorem 890 will show how orthogonality allows for identifying the solution in closed form in terms of a basis of V, thus fully answering also the last question.
Thanks to the following lemma, one can apply Tonelli's Theorem and Theorem 831 to this optimization problem.
Proof The proof is analogous to that of Lemma 855 and is thus left to the reader (note that, by Proposition 701, V is a closed and convex subset of Rⁿ).
Proof of the Projection Theorem In light of the previous lemma, problem (19.2), and so problem (19.1), has a solution by Tonelli's Theorem, because f is coercive on V, and such a solution is unique by Theorem 831, because f is strictly concave.
It remains to show that, if m minimizes ‖x − y‖, then (x − m) ⊥ V. Suppose, by contradiction, that there is a ỹ ∈ V which is not orthogonal to x − m. Without loss of generality, suppose that ‖ỹ‖ = 1 (otherwise, it would suffice to take ỹ/‖ỹ‖, which has norm 1) and that (x − m) · ỹ = α ≠ 0. Denote by y′ the element of V such that y′ = m + αỹ. We have
    ‖x − y′‖² = ‖x − m − αỹ‖² = ‖x − m‖² − 2α(x − m) · ỹ + α²
              = ‖x − m‖² − α²
              < ‖x − m‖²
thus contradicting the assumption that m minimizes ‖x − y‖, as the element y′ would make ‖x − y‖ even smaller. The contradiction proves the desired result.
Denote by V⊥ = {x ∈ Rⁿ : x ⊥ V} the set of vectors that are orthogonal to V. The reader can easily check that such a set is a vector subspace of Rⁿ. It is thus called the orthogonal complement of V.
Example 887 Let V = span{y₁, ..., yₖ} be the vector subspace generated by the vectors {yᵢ}ᵢ₌₁ᵏ and let Y ∈ M(k, n) be the matrix whose rows are such vectors. Given x ∈ Rⁿ, we have x ⊥ V if and only if Yx = 0. Therefore, V⊥ consists of all the solutions of this homogeneous linear system. N
In words, any vector can be uniquely represented as the sum of vectors in V and in its orthogonal complement V⊥, and this can be done for any vector subspace V of Rⁿ. The uniqueness of such a decomposition is remarkable as it entails that the vectors y and z are uniquely determined. For this reason we say that Rⁿ is the direct sum of the subspaces V and V⊥, in symbols Rⁿ = V ⊕ V⊥. In many applications it is important to be able to regard Rⁿ as the direct sum of one of its subspaces and its orthogonal complement.
19.2 Projections
Given a vector subspace V of Rⁿ, the solution of the minimization problem (19.1) is called the projection of x onto V. In this way one can define an operator P_V : Rⁿ → Rⁿ, called the projection, that associates to each x ∈ Rⁿ its projection P_V(x).
Therefore,
    (αP_V(x) + βP_V(y) − (αx + βy)) ⊥ V
and, by the Projection Theorem and by the uniqueness of the decomposition (19.3), αP_V(x) + βP_V(y) is the projection of αx + βy on V, that is, P_V(αx + βy) = αP_V(x) + βP_V(y).
Being linear, projections thus have a matrix representation. To find it, consider a set {yᵢ}ᵢ₌₁ᵏ of vectors that generate the subspace V, that is, V = span{y₁, ..., yₖ}. Given x ∈ Rⁿ, by the Projection Theorem we have (x − P_V(x)) ⊥ V, so
    (x − P_V(x)) · yᵢ = 0   ∀i = 1, ..., k
These are the so-called normal equations of the projection. Since P_V(x) ∈ V, we can write this vector as a linear combination P_V(x) = Σ_{j=1}^k αⱼyⱼ. The normal equations then become
    (x − Σ_{j=1}^k αⱼyⱼ) · yᵢ = 0   ∀i = 1, ..., k
that is,
    Σ_{j=1}^k αⱼ(yⱼ · yᵢ) = x · yᵢ   ∀i = 1, ..., k
We thus end up with the system
    α₁(y₁ · y₁) + α₂(y₂ · y₁) + ⋯ + αₖ(yₖ · y₁) = x · y₁
    α₁(y₁ · y₂) + α₂(y₂ · y₂) + ⋯ + αₖ(yₖ · y₂) = x · y₂
    ⋮
    α₁(y₁ · yₖ) + α₂(y₂ · yₖ) + ⋯ + αₖ(yₖ · yₖ) = x · yₖ
Let Y ∈ M(n, k) be the matrix that has the generating vectors {yᵢ}ᵢ₌₁ᵏ as columns. We can rewrite the system in matrix form as
    Yᵀ Y α = Yᵀ x   (19.4)
where Yᵀ is k × n, Y is n × k, α is k × 1, and Yᵀx is k × 1.
We thus end up with the Gram matrix YᵀY, a square matrix of order k which has rank equal to that of Y by Proposition 582, that is, ρ(YᵀY) = ρ(Y).
If the vectors {yᵢ}ᵢ₌₁ᵏ are linearly independent, the matrix Y has full rank k and so the Gram matrix is invertible. By multiplying both sides of system (19.4) by the inverse (YᵀY)⁻¹ of the Gram matrix, we get
    α = (YᵀY)⁻¹Yᵀx
So, the projection is given by
    P_V(x) = Σ_{j=1}^k αⱼyⱼ = Yα = Y(YᵀY)⁻¹Yᵀx   ∀x ∈ Rⁿ
We have thus proven the important:
Theorem 890 Let V be a vector subspace of Rⁿ generated by the linearly independent vectors {yᵢ}ᵢ₌₁ᵏ.1 The projection P_V : Rⁿ → Rⁿ onto V is given by
    P_V(x) = Y(YᵀY)⁻¹Yᵀx   ∀x ∈ Rⁿ   (19.5)
where Y ∈ M(n, k) is the matrix that has such vectors as columns.
In conclusion, the matrix Y(YᵀY)⁻¹Yᵀ represents the linear operator P_V.
1 The assumption that V is generated by the linearly independent vectors {yᵢ}ᵢ₌₁ᵏ is equivalent to requiring that such vectors be a basis for V. The theorem can be equivalently formulated as: let {yᵢ}ᵢ₌₁ᵏ be a basis of a vector subspace of Rⁿ.
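Formula (19.5) is easy to verify numerically. A sketch with illustrative vectors of our own choosing, checking the orthogonality of the error and the idempotence of the projection matrix:

```python
import numpy as np

# Projection onto V = span{y1, y2} via P = Y (Y^T Y)^{-1} Y^T  (Theorem 890).
# The vectors below are illustrative choices, not taken from the book.
Y = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])             # columns y1, y2 are linearly independent

P = Y @ np.linalg.inv(Y.T @ Y) @ Y.T   # 3x3 projection matrix

x = np.array([3.0, -1.0, 2.0])
m = P @ x                              # projection of x onto V

# Projection Theorem: the error x - m is orthogonal to V, i.e., to each column of Y
assert np.allclose(Y.T @ (x - m), 0.0)
# P is idempotent and symmetric, as every orthogonal projection matrix is
assert np.allclose(P @ P, P) and np.allclose(P, P.T)
```

Idempotence reflects the fact that the projection of a vector already in V is the vector itself.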
19.3. THE ULTIMATE RIESZ 597
In light of this lemma, let us denote the common projection by π, that is, π = P_V(β) for each representing vector β. By the decomposition (19.3), every such β can be uniquely written as β = π + ε, where ε ∈ V⊥, so that the vectors ε and π are orthogonal. In other words, the set of representing vectors is {π + ε : ε ∈ V⊥}.
Since
    f(x) = β · x = (π + ε) · x = π · x + ε · x = π · x   ∀x ∈ V
the projection π is the only vector in V that represents f. We have thus proven the following version of Riesz's Theorem for vector subspaces.
In what follows, when mentioning Riesz's Theorem we will refer to this general version of the result.
Projections have made it possible to address the multiplicity of vectors that afflicted Theorem 638, which resulted from the multiplicity of the extensions f̄ : Rⁿ → R of a linear function defined on a subspace, provided by the Hahn-Banach Theorem (Section 13.10).
In particular, if f̄ : Rⁿ → R is a linear function on Rⁿ and β is the unique vector of Rⁿ such that f̄(x) = β · x for every x ∈ Rⁿ, then for its restriction f̄|_V to a vector subspace V the vector π = P_V(β) is the only vector in V such that f̄(x) = π · x for every x ∈ V. By (19.5), we then have the following remarkable formula:
    π = Y(YᵀY)⁻¹Yᵀβ
Least squares The least squares solution x* ∈ Rⁿ solves the minimization problem
    min_{x∈Rⁿ} ‖Ax − b‖
At the same time, since the image Im F of the linear operator F(x) = Ax is a vector subspace of Rᵐ, the projection P_{Im F}(b) of the vector b ∈ Rᵐ solves the optimization problem
    min_{y∈Im F} ‖y − b‖
that is,
    ‖P_{Im F}(b) − b‖ ≤ ‖y − b‖   ∀y ∈ Im F
So, x* solves the least squares problem if and only if
    Ax* = P_{Im F}(b)   (19.7)
that is, if and only if its image Ax* is the projection of b onto the vector subspace Im F generated by the columns of A. The image Ax* is often denoted by ŷ. With such notation, (19.7) can be rewritten as ŷ = P_{Im F}(b).
Errors Equality (19.7) shows the tight relationship between projections and least squares. In particular, by the Projection Theorem the error Ax* − b is orthogonal to the vector subspace Im F:
    (Ax* − b) ⊥ Im F
or, equivalently, (ŷ − b) ⊥ Im F.
The vector subspace Im F is generated by the columns of A, which are therefore orthogonal to the approximation error. For example, in the statistical interpretation of least squares from Section 18.9.2, the matrix A is denoted by X and has the form (18.48); each column Xᵢᵀ of X displays data on the i-th regressor in every period. If we identify each such column with the regressor whose data it portrays, we can see Im F as the vector subspace of Rᵐ generated by the regressors. The least squares method is equivalent to considering the projection of the output vector Y on the subspace generated by the regressors X₁, ..., Xₙ. In particular, the regressors are orthogonal to the approximation error:
    (Xβ̂ − Y) ⊥ Xᵢ   ∀i = 1, ..., n
By setting Ŷ = Xβ̂ one equivalently has that (Y − Ŷ) ⊥ Xᵢ for every i = 1, ..., n, a classic property of least squares that we already mentioned.
19.5. A FINANCE ILLUSTRATION 599
Solution's formula Assume that ρ(A) = n, so that the matrix A has full rank and the linear operator F is injective (Corollary 579). In this case, we have
    x* = F⁻¹(P_{Im F}(b))   (19.8)
so that the least squares solution can be determined via the projection. Equality (19.8) is even more significant if we can express it in matrix form. In doing so, note that the linearly independent (since ρ(A) = n) columns of A generate the subspace Im F, thus taking the role of the matrix Y from Section 19.2. By Theorem 890, we have
    Ax* = P_{Im F}(b) = A(AᵀA)⁻¹Aᵀb
so
    AᵀAx* = AᵀA(AᵀA)⁻¹Aᵀb = Aᵀb
that is,
    x* = (AᵀA)⁻¹Aᵀb
This is the matrix representation of (19.8) that is made possible by the matrix representation of projections established in Theorem 890. Cramer's Theorem is the special case when A is an invertible square matrix of order n. Indeed, in this case also the transpose Aᵀ is invertible (Proposition 603), so by Proposition 594 we have
    x* = (AᵀA)⁻¹Aᵀb = A⁻¹(Aᵀ)⁻¹Aᵀb = A⁻¹b
We have thus found the least squares solution when the matrix A has full rank. Using the statistical notation, we end up with the well-known least squares formula
    β̂ = (XᵀX)⁻¹XᵀY
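This closed-form solution can be checked against a library solver on simulated data. A sketch with illustrative data of our own (not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: m = 50 observations, n = 3 regressors
X = rng.normal(size=(50, 3))
Y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Closed-form least squares solution (full-rank X): beta = (X^T X)^{-1} X^T Y
beta = np.linalg.inv(X.T @ X) @ X.T @ Y

# It agrees with the library solver...
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta, beta_lstsq)

# ...and the residual Y - X beta is orthogonal to every regressor (column of X)
assert np.allclose(X.T @ (Y - X @ beta), 0.0)
```

The second assertion is exactly the orthogonality property (Y − Ŷ) ⊥ Xᵢ discussed above. (In numerical practice one prefers `lstsq` or a QR factorization to forming (XᵀX)⁻¹ explicitly, which can be ill-conditioned.)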
Example 895 In the previous example the market generated by the four primary assets (19.9) is easily seen to be complete. On the other hand, suppose that only the first two assets are available, that is, L = {y₁, y₂}. Then W = span L = {(x, 0, y) : x, y ∈ R}, and so the market is now incomplete. Indeed, it is not possible to replicate contingent claims that feature non-zero payments when state s₂ obtains. N
is the linear operator that describes the contingent claim determined by portfolio x. In other words, Rᵢ(x) is the payoff of portfolio x if state sᵢ obtains. Clearly, W = Im R and so the rank ρ(R) of the linear operator R : Rⁿ → Rᵏ is the dimension of the market W.
To derive the matrix representation of the payoff operator R, consider the payoff matrix
    Y = (yᵢⱼ) = [ y₁₁  y₁₂  ⋯  y₁ₙ ]
                [ y₂₁  y₂₂  ⋯  y₂ₙ ]
                [  ⋮    ⋮   ⋱   ⋮  ]
                [ yₖ₁  yₖ₂  ⋯  yₖₙ ]
It has k rows (states) and n columns (assets), where the entry yᵢⱼ represents the payoff of primary asset yⱼ in state sᵢ. In words, Y is the matrix rendering of the collection L of primary assets. It is easy to see that the payoff operator R : Rⁿ → Rᵏ can be represented as
    R(x) = Yx
The payoff matrix Y is thus the matrix associated with the operator R. Its rank is then the dimension of the market W (see Section 13.4.2).
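The identification of the market dimension with the rank of Y can be sketched numerically. The payoffs below are our own illustrative numbers, chosen so that, as in Example 895, no asset pays in state s₂:

```python
import numpy as np

# Payoff matrix of an illustrative market with k = 3 states and n = 2 assets
# (these payoffs are our own example, not the book's (19.9)).
Y = np.array([[1.0, 2.0],    # payoffs in state s1
              [0.0, 0.0],    # payoffs in state s2
              [1.0, 3.0]])   # payoffs in state s3

dim_W = np.linalg.matrix_rank(Y)   # dimension of the market W = Im R
k = Y.shape[0]

assert dim_W == 2
assert dim_W < k   # rank < number of states: the market is incomplete
```

Since every portfolio payoff Yx has a zero second component, claims paying in state s₂ are not replicable, which is what the rank deficiency detects.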
In a frictionless market, the (market) value
    v(x) = p · x = Σ_{j=1}^n pⱼxⱼ
of a portfolio x is its (today) cost caused by the market operations it requires.2 The (market) value function v : Rⁿ → R is the linear function that assigns to each portfolio x its value v(x). In particular, the value of a primary asset is its price. For, recalling that the primary asset yⱼ is identified by the portfolio e^j, we have
    v(e^j) = p · e^j = pⱼ   (19.10)
Note that it is the frictionless nature of the market that ensures the linearity of the value function. For instance, if there are transaction costs and so the price of asset yⱼ depends on the traded quantity – e.g., v(2e^j) < 2pⱼ – then the value function is no longer linear.
2 Since there are no restrictions to trade, and so it is possible to go long or short on assets, to be precise v(x) is actually a cost if positive, but a benefit if negative.
Definition 896 The financial market (L, p) satisfies the Law of One Price (LOP) if, for all portfolios x, x′ ∈ Rⁿ,
    R(x) = R(x′) ⟹ v(x) = v(x′)   (19.11)
In words, portfolios that induce the same contingent claim must share the same market value. Indeed, the contingent claims that they determine are all that matters in portfolios, which are just instruments to achieve them. If two portfolios inducing the same contingent claim had different market values, a (sure) saving opportunity would be missed in the market. The LOP requires that the financial market take advantage of any such opportunity.
Since W = Im R, we have R(x) = R(x′) if and only if x, x′ ∈ R⁻¹(w) for some w ∈ W. The LOP can then be equivalently stated as follows: given any replicable claim w ∈ W,
    x, x′ ∈ R⁻¹(w) ⟹ v(x) = v(x′)   (19.12)
All portfolios x that replicate a contingent claim w thus share the same value v(x). It is then natural to regard such common value as the price of the claim.
In words, p_w is the market cost v(x) incurred today to form a portfolio x that tomorrow will ensure the contingent claim w, that is, w = R(x). By the form (19.12) of the LOP, the definition is well posed: it is immaterial which specific replicating portfolio x is considered to determine the price p_w. The LOP thus permits us to price all replicable claims.
For primary assets we get back to (19.10), that is, pⱼ = v(e^j). In general, we have
    p_w = v(x) = Σ_{j=1}^n pⱼxⱼ   ∀x ∈ R⁻¹(w)   (19.13)
The price of a contingent claim in the market is thus a linear combination of the prices of the primary assets held in any replicating portfolio, weighted according to the assets' weights in such portfolio. This formula permits us to price all contingent claims in the market, starting from the market prices of primary assets.
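Pricing by replication can be sketched numerically. The market below uses illustrative payoffs and prices of our own choosing (not the book's (19.9)):

```python
import numpy as np

# Illustrative complete market: k = 3 states, n = 3 primary assets
Y = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 2.0, 0.0]])   # column j = payoffs of asset y_j across states
p = np.array([2.0, 1.5, 0.5])     # today's prices of the primary assets

w = np.array([1.0, 2.0, 3.0])     # contingent claim to price

x = np.linalg.solve(Y, w)         # a replicating portfolio: Y x = w
p_w = p @ x                       # its cost = the price of the claim

assert np.allclose(Y @ x, w)      # x indeed replicates w
assert np.allclose(p_w, 2.5)      # here x = (-1, 2, 3), so p_w = -2 + 3 + 1.5
```

Under the LOP the value 2.5 does not depend on which replicating portfolio is used, which is exactly what makes the price of w well defined.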
Theorem 899 Suppose the financial market (L, p) satisfies the LOP. Then, the pricing rule f : W → R is linear.
Proof First observe that, by the LOP, v = f ∘ R, that is, v(x) = f(R(x)) for each x ∈ Rⁿ. Let us prove the linearity of f. Let w, w′ ∈ W and α, β ∈ R. We want to show that f(αw + βw′) = αf(w) + βf(w′). Since W = Im R, there exist vectors x, x′ ∈ Rⁿ such that R(x) = w and R(x′) = w′. By Definition 897, p_w = v(x) and p_{w′} = v(x′). By the linearity of R and v, we then have
    f(αw + βw′) = f(αR(x) + βR(x′)) = f(R(αx + βx′)) = v(αx + βx′)
                = αv(x) + βv(x′) = αp_w + βp_{w′} = αf(w) + βf(w′)
The fact that the linearity of the pricing rule characterizes the (frictionless) financial markets in which the LOP holds is a remarkable result, upon which modern asset pricing theory relies. It permits us to price all contingent claims in the market in terms of other contingent claims, thus generalizing formula (19.13). For, suppose a contingent claim w can be written as a linear combination of some replicable contingent claims, that is, w = Σ_{j=1}^m αⱼwⱼ. Then w is replicable, with
    p_w = f(w) = f(Σ_{j=1}^m αⱼwⱼ) = Σ_{j=1}^m αⱼf(wⱼ) = Σ_{j=1}^m αⱼp_{wⱼ}   (19.14)
Formula (19.13) is the special case where the contingent claims wⱼ are primary assets and their weights are the portfolio ones. In general, it may be easier (e.g., more natural from a financial standpoint) to express a contingent claim in terms of other contingent claims rather than in terms of primary assets. The pricing formula
    p_w = Σ_{j=1}^m αⱼp_{wⱼ}   (19.15)
permits us to price contingent claims when they are expressed in terms of other contingent claims.
Inspection of the proof of Theorem 899 shows that the pricing rule inherits its linearity from that of the value function, which in turn depends on the frictionless nature of the financial market. We conclude that, in the final analysis, the pricing rule is linear because the financial market is frictionless. Whether or not the market is complete is, instead, irrelevant.
Theorem 900 Suppose the financial market (L, p) satisfies the LOP. Then, there exists a unique vector π ∈ W such that
    f(w) = π · w   ∀w ∈ W   (19.16)
The representing vector π is called the pricing kernel. When the market is complete, π ∈ Rᵏ. In this case we have πᵢ = p_{e^i}, where p_{e^i} is the price of the Arrow contingent claim e^i; indeed, by (19.16),
    p_{e^i} = f(e^i) = π · e^i = πᵢ
In words, the i-th component πᵢ of the pricing kernel is the price of the Arrow contingent claim that corresponds to state sᵢ. That is, πᵢ is the cost of having, for sure, one euro tomorrow if state sᵢ obtains (and zero otherwise).
As a result, when the market is complete the price of a contingent claim w is the weighted average
    p_w = f(w) = π · w = Σ_{i=1}^k πᵢwᵢ   (19.17)
of its payments in the different states, each state weighted according to how much it costs today to have one euro tomorrow in that state. Consequently, knowledge of the pricing kernel (i.e., of the prices of the Arrow contingent claims) permits us to price all contingent claims in the market via the pricing formula
    p_w = Σ_{i=1}^k πᵢwᵢ   (19.18)
The earlier pricing formulas (19.13) and (19.15) require, to price each claim, the knowledge of replicating portfolios or of the prices of some other contingent claims. In contrast, the pricing formula (19.18) only requires a single piece of information, the pricing kernel, to price all claims. In particular, for primary assets it takes the form pⱼ = Σ_{i=1}^k πᵢyᵢⱼ.
Example 901 In the three-state economy of Example 894, there are three Arrow contingent claims e¹, e², and e³. Suppose today's market price of having tomorrow one euro in the recession state (and zero otherwise) is higher than in the stasis state, which is in turn higher than in the growth state, say p_{e¹} = 3, p_{e²} = 2, and p_{e³} = 1. Then, the pricing kernel is π = (3, 2, 1) and the pricing formula (19.18) becomes p_w = 3w₁ + 2w₂ + w₃ for all w ∈ W. For instance, the price of the contingent claim w = (2, 1, 4) is p_w = 12. N
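The pricing kernel of Example 901 can be coded directly:

```python
import numpy as np

# Pricing kernel of Example 901: Arrow-claim prices in the recession,
# stasis, and growth states.
pi = np.array([3.0, 2.0, 1.0])

def price(w):
    """Price of a contingent claim w via formula (19.18): p_w = sum_i pi_i w_i."""
    return float(pi @ np.asarray(w, dtype=float))

assert price([2, 1, 4]) == 12.0   # the claim priced in Example 901
assert price([1, 0, 0]) == 3.0    # the Arrow claim e1 recovers its own price
```

A single vector, the pricing kernel, prices every claim, which is the economy of information that distinguishes (19.18) from the replication-based formulas.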
19.5.6 Arbitrage
A portfolio x ∈ Rⁿ is an arbitrage if either of the following conditions holds:3
    I.  Yx ≥ 0 and p · x < 0        II.  Yx > 0 and p · x ≤ 0
A portfolio that satisfies condition I has a strictly negative market value and, nevertheless, ensures a positive payment in all states. On the other hand, a portfolio that satisfies condition II has a negative market value and, nevertheless, a strictly positive payoff in all states. Well-functioning financial markets should be able to take advantage of any such opportunity of a sure gain, and so they should feature no arbitrage portfolios.
In this section we will study such well-functioning markets. In particular, in a market without arbitrages I we have:
    R(x) ≥ 0 ⟹ v(x) ≥ 0   ∀x ∈ Rⁿ   (19.19)
The first no-arbitrage condition is enough to ensure that the market satisfies the LOP.
Lemma 902 A financial market (L, p) that has no arbitrages I satisfies the LOP.
Proof By applying (19.19) to the portfolio −x and using the linearity of R and v, we get
    R(x) ≤ 0 ⟹ v(x) ≤ 0   ∀x ∈ Rⁿ
Along with (19.19), this implies
    R(x) = 0 ⟹ v(x) = 0   ∀x ∈ Rⁿ
Let x and x′ be two portfolios such that R(x) = R(x′). The linearity of R implies R(x − x′) = 0, and so v(x − x′) = 0, i.e., v(x′) = v(x).
Consider a complete market, that is, W = Rᵏ. Thanks to the lemma, the no-arbitrage condition (19.19) implies that contingent claims are priced according to formula (19.16). But much more is true: under this no-arbitrage condition the vector π is positive, and so the pricing rule becomes linear and increasing. Better claims command higher market prices.
Proposition 903 A complete financial market (L, p), with p ≠ 0, satisfies the no-arbitrage condition (19.19) if and only if the pricing rule is linear and increasing, that is, there exists a unique vector π ∈ Rᵏ₊ such that
    f(w) = π · w   ∀w ∈ W   (19.21)
3 Yx > 0 means (Yx)ᵢ > 0 for each i = 1, ..., k.
Proof "If". Let R(x) ≥ 0. Then v(x) = f(R(x)) = π · R(x) ≥ 0 since π ≥ 0 by hypothesis. "Only if". Since the market is complete, we have W = Im R = Rᵏ. By Lemma 902, the LOP holds and so f is linear (Theorem 899). We need to show that f is increasing. Since f is linear, this amounts to showing that π is positive, i.e., that w ≥ 0 implies f(w) ≥ 0. Let w ∈ Rᵏ with w ≥ 0. Being Im R = Rᵏ, there exists x ∈ Rⁿ such that R(x) = w. We thus have R(x) = w ≥ 0, and so (19.19) implies v(x) ≥ 0. Hence, f(w) = f(R(x)) = v(x) ≥ 0. We conclude that the linear function f is positive, and so increasing. By the monotone version of Riesz's Theorem (Proposition 641), there exists a positive vector π ∈ Rᵏ such that f(z) = π · z for every z ∈ Rᵏ.4
The result becomes sharper when the market also satisfies the second no-arbitrage condition (19.20): the vector π then becomes strictly positive, so that the pricing rule becomes linear and strictly increasing. Strictly better claims thus command strictly higher market prices. But, as both the no-arbitrage conditions (19.19) and (19.20) are compelling, a well-functioning market should actually satisfy both of them. We thus have the following important result (as its demanding name shows).5
Theorem 904 (Fundamental Theorem of Finance) A complete financial market (L, p), with p ≠ 0, satisfies the no-arbitrage conditions (19.19) and (19.20) if and only if the pricing rule is linear and strictly increasing, that is, there exists a unique vector π ∈ Rᵏ₊₊ such that
    f(w) = π · w   ∀w ∈ W   (19.22)
Proof "If". Let R(x) > 0. Then v(x) = f(R(x)) = π · R(x) > 0 because π > 0 by hypothesis. "Only if". By Proposition 903, f is linear and increasing. We need to show that f is strictly increasing. Since f is linear, this amounts to showing that π is strictly positive, i.e., that w > 0 implies f(w) > 0. Let w ∈ Rᵏ with w > 0. Being Im R = Rᵏ, there exists x ∈ Rⁿ such that R(x) = w. We thus have R(x) = w > 0, and so (19.20) implies v(x) > 0. Hence, f(w) = f(R(x)) = v(x) > 0. We conclude that the linear function f is strictly positive, and so strictly increasing. By the strict monotone version of Riesz's Theorem (Proposition 641), there exists a strictly positive vector π ∈ Rᵏ₊₊ such that f(z) = π · z for every z ∈ Rᵏ.
The price of any replicable contingent claim w is thus the weighted average
    p_w = f(w) = π · w = Σ_{i=1}^k πᵢwᵢ
of its payments in the different states, with strictly positive weights. If market prices do not have this form, the market is not exhausting all arbitrage opportunities. Some sure gains are still possible.
4 The vector π in (19.22) is unique because the market is complete, and so the vector in Proposition 641 is unique.
5 We refer interested readers to Cochrane (2005) and Ross (2005).
Part VI
Differential calculus
Chapter 20
Derivatives
    ∆c = c(x + ∆x) − c(x)
where the variation ∆x takes values in {..., −3, −2, −1, 1, 2, 3, ...}.
As the production increases, while the average cost decreases the difference quotient increases. This means that the average cost of each additional unit increases. Therefore, increasing the production is, "at the margin", more and more expensive for the producer. In
particular, the last additional unit has determined an increase in costs of 5 euros: for the producer such an increase in production is profitable if (and only if) there is an at least equal increase in the difference quotient of the return R(x), that is, in the return of each additional unit:
    ∆R/∆x = (R(x + ∆x) − R(x))/∆x   (20.2)
Let us add to the table two columns with the returns and their difference quotients:
[Table: columns x, c(x), c(x)/x, ∆c/∆x, R(x), ∆R/∆x; values not reproduced.]
The first two increases in production are profitable for the producer: they determine a
difference quotient of the returns equal to 50 euros and 33.3 euros, respectively, versus a
difference quotient of the costs equal to 3 euros and 3.3 euros, respectively. After the last
increment in production, the difference quotient of the returns decreases to only 4 euros,
lower than the corresponding value of 5 euros of the difference quotient of the costs. The
producer will therefore find it profitable to increase production to 105 units, but not to
106. That this choice is correct is confirmed by the trend of the profit π(x) = R(x) − c(x),
which for convenience we add to the table:
[Table: x, c(x), c(x)/x, Δc/Δx, R(x), ΔR/Δx, π(x); first row: x = 100, c(x) = 4,494,
c(x)/x = 44.94, R(x) = 5,000, π(x) = 506]
The profit of the producer continues to increase up to the output level of 105 units, but
decreases in case of a further increase to 106. The "incremental" information, quantified by
difference quotients such as (20.1) and (20.2), is therefore key for the producer's ability to
assess his production decisions. In contrast, the information on average costs or on average
returns is completely irrelevant (in our example it is actually misleading: the
decrease in average costs can lead to wrong decisions). In the economics jargon, the producer
should decide based on what happens at the margin, not on average.
Until now we have considered the ratio (20.1) for discrete variations Δx. Idealizing, let
us consider arbitrary non-zero variations Δx ∈ R and, in particular, smaller and smaller
variations, that is, Δx → 0. Their limit c′(x) is given by

    c′(x) = lim_{Δx→0} (c(x + Δx) − c(x))/Δx          (20.3)
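Numerically, the convergence in (20.3) can be watched directly. The sketch below uses a hypothetical cost function (not one from the text) and computes difference quotients for shrinking increments:

```python
# Difference quotients of a hypothetical cost function c(x) = 100 + 5*x**1.5
# for shrinking increments dx; they approach the marginal cost c'(x).
def c(x):
    return 100 + 5 * x ** 1.5

def diff_quotient(f, x, dx):
    # (c(x + dx) - c(x)) / dx, the ratio in (20.1)
    return (f(x + dx) - f(x)) / dx

x = 4.0
exact = 7.5 * x ** 0.5  # c'(x) = 7.5 * sqrt(x), so c'(4) = 15
for dx in (1.0, 0.1, 0.01, 0.001):
    print(dx, diff_quotient(c, x, dx))
# The quotients approach the marginal cost c'(4) = 15.
```

As dx shrinks, the printed quotients settle on the marginal cost, exactly as the limit (20.3) predicts.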
When it exists and is finite, c′(x) is called the marginal cost at x: it indicates the variation
in cost determined by infinitesimal variations of output with respect to the "initial" quantity
x.

This idealization permits us to frame marginal analysis within differential calculus, a fun-
damental mathematical theory that will be the subject matter of the chapters of this part of
the book. Because it formalizes marginal analysis, differential calculus pervades economics.
20.2 Derivatives

For a function f : (a, b) → R, the difference quotient (20.1) takes the form¹

    Δf/Δx = (f(x + h) − f(x))/((x + h) − x) = (f(x + h) − f(x))/h          (20.4)
Therefore, the derivative is nothing but the limit of the difference quotient, when it exists
and is finite. Other notations used for the derivative at x₀ are

    Df(x₀)   and   (df/dx)(x₀)

The notation f′(x₀), which we will mostly use, is probably the most convenient; sometimes
we will also use the other two notations, whenever convenient.²
Note the double requirement that the limit exist and be finite: if at a point the limit of
the difference quotient (20.5) exists but is infinite, the function does not have a derivative
at that point (see Example 909).
A few remarks are in order. (i) Differential calculus, of which derivatives are a first key
notion, originated in the works of Leibniz and Newton in the second half of the seventeenth
century. Newton was motivated by physics, which indeed features a classic example of a
derivative: let t be time and s the distance covered by a moving object. Suppose the
function s(t) indicates the total distance covered until time t. The difference quotient
Δs/Δt is the average velocity over a time interval of length Δt. Therefore, its derivative at a point
t₀ can be interpreted as the instantaneous velocity at t₀. If space is measured in kilometers
and time in hours, velocity is measured in km/h, that is, in "kilometers per hour" (as
speedometers do).
(ii) In applications, the dependent and independent variables y and x that appear in
a function y = f(x) take on a concrete meaning and are both evaluated in terms of a unit of

¹ Since the domain (a, b) is an open interval, for h sufficiently small we have x + h ∈ (a, b).
² Different notations for the same mathematical object can be convenient in different contexts. For this
reason, it may be important to have several notations at hand (provided they are then used consistently).
measure (€, $, kg, liters, years, miles, parsecs, etc.): if we denote by T the unit of measure of
the dependent variable y and by S that of the independent variable x, the difference quotient
Δy/Δx (and so the derivative, if it exists) is then expressed in the unit of measure T/S. For
instance, if in the initial example the cost is expressed in euros and the quantity produced
in quintals, the difference quotient (20.1) is expressed in €/q, that is, in "euros per quintal".

(iii) The notation df/dx (or the equivalent dy/dx) is meant to suggest that the derivative
is a limit of ratios.³ Note, however, that df/dx is only a symbol, not a true ratio (indeed, it
is the limit of ratios). Nevertheless, heuristically it is often treated as a true ratio (see, for
example, the remark on the chain rule at the end of Section 20.9). This can be a useful trick
to help our intuition, as long as what is found is then checked formally.

(iv) The terminology "derivable at" is not so common, but its motivation will become
apparent in Section 20.12.2. In any case, a function f : (a, b) → R which is derivable at each
point of (a, b) is called derivable, without any further qualification.
[Figure: secant line through the points (x₀, f(x₀)) and (x₀ + h, f(x₀ + h)) on the graph of f]
    y = f(x₀) + ((f(x₀ + h) − f(x₀))/h)(x − x₀)          (20.6)
³ This notation is due to Leibniz, while the f′ notation is due to Lagrange.
which is the equation of the sought-after straight line passing through the points (x₀, f(x₀))
and (x₀ + h, f(x₀ + h)). Taking the limit as h → 0, we get

    y = f(x₀) + f′(x₀)(x − x₀)

that is, the equation of the straight line which is tangent to the graph of f at the point
(x₀, f(x₀)) ∈ Gr f.

As h tends to 0, the straight line (20.6) thus tends to the tangent (straight) line, whose
slope is the derivative f′(x₀). The graph of the tangent line is:
[Figure: the tangent line to the graph of f at (x₀, f(x₀))]
In sum, geometrically the derivative can be regarded as the slope of the tangent line at
the point (x₀, f(x₀)). In turn, the tangent line can be regarded as a local approximation
of the function f at x₀, a key observation that will be developed through the fundamental
notion of differential (Section 20.12).

The derivative exists at each x ∈ R and is given by 2x. For example, the derivative at x = 1
is f′(1) = 2, with tangent line

    y = f(1) + f′(1)(x − 1) = 2x − 1
[Figure: graph of the function with its tangent line]
In this case the tangent line is horizontal (constant) and is always equal to 1. N
Example 907 Consider a constant function f : R → R, that is, f(x) = k for every x ∈ R.
For every h ≠ 0 we have

    (f(x + h) − f(x))/h = (k − k)/h = 0

and therefore f′(x) = 0 for every x ∈ R. The derivative of a constant function is zero. N
with graph:

[Figure: graph of f(x) = 1/x]
At a point x ≠ 0 we have

    f′(x) = lim_{h→0} (f(x + h) − f(x))/h = lim_{h→0} (1/(x + h) − 1/x)/h = lim_{h→0} (x − (x + h))/(h x(x + h))
          = lim_{h→0} −h/(h x(x + h)) = lim_{h→0} −1/(x(x + h)) = −1/x²

The derivative exists at each x ≠ 0 and is given by −x⁻². For example, the derivative at
x = 1 is f′(1) = −1, and at x = −2 it is f′(−2) = −1/4.
If we consider the origin x = 0 we have, for h ≠ 0,

    (f(0 + h) − f(0))/h = (1/h − 0)/h = 1/h²

so that

    lim_{h→0} (f(0 + h) − f(0))/h = +∞

The limit is not finite and hence the function does not have a derivative at x = 0. Recall
that the function is not continuous at this point (Example 477). N
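As a quick numerical check of this computation, a difference quotient with small h should approach −1/x² away from the origin (the evaluation points and tolerances below are illustrative):

```python
# Numerical check that the derivative of 1/x is -1/x**2 at points x != 0.
def f(x):
    return 1.0 / x

def diff_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

for x in (1.0, -2.0, 0.5):
    approx = diff_quotient(f, x, 1e-7)
    exact = -1.0 / x ** 2
    print(x, approx, exact)
```

At x = 1 the quotient is close to −1, at x = −2 close to −1/4, matching the limits computed above.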
    f(x) = { √x       if x ≥ 0
           { −√(−x)   if x < 0
with graph:

[Figure: graph of f]
Definition 910 Let f : (a, b) → R be a function with domain of derivability D ⊆ (a, b).
The function f′ : D → R that to each x ∈ D associates the derivative f′(x) is called the
derivative function of f.
The derivative function f′ describes the derivative of f at the different points where it
exists, thus describing its overall behavior. In the examples previously discussed:
(iii) for f(x) = 1/x = x⁻¹, the derivative function f′ : R ∖ {0} → R is given by f′(x) = −x⁻².

The notion of derivative function permits us to frame in a bigger picture the computations
that we did in the examples of the last section: to compute the derivative of a function f at
a generic point x of the domain amounts to computing its derivative function f′. When we
found that the derivative of f(x) = x² is, at any point x ∈ R, given by 2x, we
actually found that its derivative function f′ : R → R is given by f′(x) = 2x.
Example 911 Let r : R₊ → R be the return function and c : R₊ → R be the cost function
of a producer (see Section 18.1.4). The derivative function r′ : D ⊆ R₊ → R is called the
marginal return function, and the derivative function c′ : D ⊆ R₊ → R is called the marginal
cost function. Their economic interpretation should be, by now, clear. N
Definition 912 A function f : (a, b) → R is said to be derivable from the right at the point
x₀ ∈ (a, b) if the one-sided limit

    lim_{h→0⁺} (f(x₀ + h) − f(x₀))/h          (20.8)

exists and is finite, and to be derivable from the left at x₀ ∈ (a, b) if the one-sided limit

    lim_{h→0⁻} (f(x₀ + h) − f(x₀))/h          (20.9)

exists and is finite.

When it exists and is finite, the limit (20.8) is called the right derivative of f at x₀, and
it is denoted by f′₊(x₀). Analogously, when it exists and is finite, the limit (20.9) is called the
left derivative of f at x₀, and it is denoted by f′₋(x₀). Since two-sided limits exist if and
only if both one-sided limits exist and coincide (Proposition 445), we have:
[Figure: graph of a function equal to x² for x ≤ 0 and to 0 for x > 0]
Therefore, by Proposition 913 the function is derivable also at 0, with f′(0) = 0. In conclu-
sion,

    f′(x) = { 2x   if x ≤ 0
            { 0    if x > 0

N
Through one-sided derivatives we can classify two important classes of points where
derivability fails. Specifically, a point x₀ of the domain of f is called:

(i) a corner point if the right derivative and the left derivative exist but are different, i.e.,
f′₊(x₀) ≠ f′₋(x₀);

(ii) a cuspidal point (or a cusp) if the right and left limits of the difference quotient are
infinite with different signs:

    lim_{h→0⁺} (f(x₀ + h) − f(x₀))/h = +∞   and   lim_{h→0⁻} (f(x₀ + h) − f(x₀))/h = −∞

or vice versa.
[Figure: graph of the absolute value function f(x) = |x|]
At x₀ = 0 we have

    (f(x₀ + h) − f(x₀))/h = |h|/h = { 1    if h > 0
                                    { −1   if h < 0

The two-sided limit of the difference quotient does not exist at 0, so the function is not
derivable at 0. Nevertheless, the one-sided derivatives exist at 0. In particular,

    f′₊(0) = lim_{h→0⁺} (f(0 + h) − f(0))/h = 1 ;   f′₋(0) = lim_{h→0⁻} (f(0 + h) − f(0))/h = −1

The origin x₀ = 0 is, therefore, a corner point. The reader can check that the function is
derivable at each point x ≠ 0, with

    f′(x) = { 1    if x > 0
            { −1   if x < 0
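The one-sided limits in (20.8) and (20.9) can be watched numerically. The sketch below evaluates the difference quotient of |x| at the origin from the right and from the left:

```python
# One-sided difference quotients of f(x) = |x| at x0 = 0.
# From the right (h > 0) the quotient is |h|/h = 1; from the left it is -1.
def dq(f, x, h):
    return (f(x + h) - f(x)) / h

f = abs
hs = [10 ** -k for k in range(1, 7)]
right = [dq(f, 0.0, h) for h in hs]    # quotients for h -> 0+
left = [dq(f, 0.0, -h) for h in hs]    # quotients for h -> 0-
print(right[-1], left[-1])
# prints 1.0 -1.0
```

The two sequences are constant at 1 and −1: the one-sided derivatives exist but differ, which is exactly the corner point at the origin.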
    f(x) = { √x      if x ≥ 0
           { √(−x)   if x < 0
has a cuspidal point at the origin x = 0, as we can see from its graph:

[Figure: graph of f, with a cusp at the origin]
We close by noting that the right and left derivative functions are defined in the same
way, mutatis mutandis, as the derivative function. In Example 915, the one-sided derivative
functions f′₊ : R → R and f′₋ : R → R are given by

    f′₊(x) = { 1    if x ≥ 0         f′₋(x) = { 1    if x > 0
             { −1   if x < 0   and            { −1   if x ≤ 0
Proof We have to prove that lim_{x→x₀} f(x) = f(x₀). Since f is derivable at x₀, the limit of
the difference quotient exists and is finite, and it is equal to f′(x₀):

    lim_{h→0} (f(x₀ + h) − f(x₀))/h = f′(x₀)

Let us rewrite the limit by setting x = x₀ + h, so that h = x − x₀. Observing that, as h
tends to 0, x tends to x₀, we get:

    lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) = f′(x₀)

Therefore, by the algebra of limits (Proposition 309) we have:

    lim_{x→x₀} (f(x) − f(x₀)) = lim_{x→x₀} ((f(x) − f(x₀))/(x − x₀)) (x − x₀) = lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) · lim_{x→x₀} (x − x₀)
                              = f′(x₀) · lim_{x→x₀} (x − x₀) = f′(x₀) · 0 = 0
where the last equality holds since f′(x₀) exists and is finite. We have thus proved that
lim_{x→x₀} (f(x) − f(x₀)) = 0. On the other hand, again by the algebra of limits, we have:

    0 = lim_{x→x₀} (f(x) − f(x₀)) = lim_{x→x₀} f(x) − lim_{x→x₀} f(x₀) = lim_{x→x₀} f(x) − f(x₀)

so that lim_{x→x₀} f(x) = f(x₀), as desired.
Derivability at a point thus implies continuity at that point. The converse is false: the
absolute value function f(x) = |x| is continuous at x = 0 but is not derivable at that point
(Example 915). In other words, continuity is a necessary, but not sufficient, condition for
derivability.

Proposition 917, and the examples seen until now, allow us to identify five possible causes
of non-derivability at a point x:
(iv) f has at x a point at which a one-sided derivative exists but, on the other side, the limit
of the difference quotient is +∞ or −∞; for example, the function

    f(x) = { √x       if x ≥ 0
           { −√(−x)   if x < 0

seen in Example 909 has a vertical tangent at x = 0 because lim_{h→0} f(h)/h = +∞.
The five cases just identified are, however, not exhaustive: there are other sources of
non-derivability. For example, the function

    f(x) = { x sin(1/x)   if x ≠ 0
           { 0            if x = 0

is continuous everywhere.⁴ At the origin x₀ = 0 it is, however, not derivable because the
limit

    lim_{h→0} (f(x₀ + h) − f(x₀))/h = lim_{h→0} (h sin(1/h) − 0)/h = lim_{h→0} sin(1/h)

⁴ Indeed, lim_{x→0} x sin(1/x) = 0 because |sin(1/x)| ≤ 1 and so −|x| ≤ x sin(1/x) ≤ |x|.
does not exist. The origin is not a corner point and there is no vertical tangent at this point.
The lack of derivability here is due to the fact that f has, in any neighborhood of the origin,
infinitely many oscillations, which are such that the difference quotient sin(1/h) oscillates
infinitely many times between −1 and 1. Note that in this example the one-sided derivatives
f′₊(0) and f′₋(0) do not exist either.

Terminology When f is derivable at all the interior points of (a, b) and is one-sided derivable
at the endpoints a and b, we say that it is derivable on the closed interval [a, b]. It is
immediate to see that f is then also continuous on such an interval.
    f′(x) = n x^(n−1)          (20.10)

For example, the function f(x) = x⁵ has derivative function f′(x) = 5x⁴ and the function
f(x) = x³ has derivative function f′(x) = 3x².
We give two proofs of this basic result.

Proof 1 By Newton's binomial formula,

    f′(x) = lim_{h→0} (f(x + h) − f(x))/h = lim_{h→0} ((x + h)ⁿ − xⁿ)/h
          = lim_{h→0} (Σ_{k=0}^{n} (n!/(k!(n − k)!)) x^(n−k) h^k − xⁿ)/h
          = lim_{h→0} (xⁿ + n x^(n−1) h + (n(n − 1)/2) x^(n−2) h² + ⋯ + n x h^(n−1) + hⁿ − xⁿ)/h
          = lim_{h→0} (n x^(n−1) + (n(n − 1)/2) x^(n−2) h + ⋯ + n x h^(n−2) + h^(n−1))
          = n x^(n−1)

as claimed.
Proof 2 We establish (20.10) by induction, using the derivative of the product of functions
(see Section 20.8). First, we show that the derivative of the function f(x) = x is equal to 1.
The limit of the difference quotient of f is

    lim_{h→0} (f(x + h) − f(x))/h = lim_{h→0} ((x + h) − x)/h = lim_{h→0} h/h = 1

Therefore f′(x) = 1, so (20.10) holds for n = 1. Suppose that (20.10) holds for n − 1
(induction hypothesis), that is,

    D(x^(n−1)) = (n − 1) x^(n−2)

Consider the function xⁿ = x · x^(n−1). Using the derivative of the product of functions (see
(20.13) below) and the induction hypothesis, we have

    D(xⁿ) = 1 · x^(n−1) + x · D(x^(n−1)) = x^(n−1) + x · (n − 1) x^(n−2) = (1 + n − 1) x^(n−1) = n x^(n−1)
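A small numerical sanity check of (20.10): a difference quotient of x⁵ should approach 5x⁴ (the exponent and evaluation point below are chosen arbitrarily for illustration):

```python
# Numerical check of the power rule D(x**n) = n * x**(n-1) for n = 5 at x = 2.
def dq(f, x, h):
    return (f(x + h) - f(x)) / h

n, x = 5, 2.0
approx = dq(lambda t: t ** n, x, 1e-7)
exact = n * x ** (n - 1)  # 5 * 2**4 = 80
print(approx, exact)
```

The difference quotient agrees with n x^(n−1) to several decimal places, as both proofs above guarantee.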
    f′(x) = αˣ log α

In particular, d eˣ/dx = eˣ, that is, the derivative function of the exponential function is
the exponential function itself. So, the exponential function equals its derivative function, a
truly remarkable invariance property that gives the exponential function a special status in
differential calculus.

Proof We have

    f′(x) = lim_{h→0} (f(x + h) − f(x))/h = lim_{h→0} (α^(x+h) − αˣ)/h = lim_{h→0} αˣ (αʰ − 1)/h
          = αˣ lim_{h→0} (αʰ − 1)/h = αˣ log α

where the last equality follows from the basic limit (11.32).
    f′(x) = cos x

Proof From the basic trigonometric formula sin(a + b) = sin a cos b + cos a sin b, it follows
that

    f′(x) = lim_{h→0} (f(x + h) − f(x))/h = lim_{h→0} (sin(x + h) − sin x)/h
          = lim_{h→0} (sin x cos h + cos x sin h − sin x)/h
          = lim_{h→0} (sin x (cos h − 1) + cos x sin h)/h
          = sin x lim_{h→0} (cos h − 1)/h + cos x lim_{h→0} (sin h)/h = cos x

The last equality follows from the basic limits (11.31) and (11.30) for cos x and sin x, re-
spectively.
In a similar way it is possible to prove that the function f : R → R given by f(x) = cos x
is derivable at each x ∈ R, with derivative function f′ : R → R given by

    f′(x) = −sin x
Proposition 921 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b). The sum
function f + g : (a, b) → R is derivable at x, with

    (f + g)′(x) = f′(x) + g′(x)

The result actually holds more generally: for any linear combination αf + βg : (a, b) → R,
with α, β ∈ R, we have

    (αf + βg)′(x) = α f′(x) + β g′(x)          (20.12)

Proof We prove the result directly in the more general form (20.12). We have

    (αf + βg)′(x) = lim_{h→0} ((αf + βg)(x + h) − (αf + βg)(x))/h
                  = lim_{h→0} ((αf)(x + h) + (βg)(x + h) − (αf)(x) − (βg)(x))/h
                  = lim_{h→0} (α (f(x + h) − f(x))/h + β (g(x + h) − g(x))/h)
                  = α lim_{h→0} (f(x + h) − f(x))/h + β lim_{h→0} (g(x + h) − g(x))/h
                  = α f′(x) + β g′(x)

as desired.
Thus, the sum behaves in a simple manner with respect to derivatives: the "derivative of
a sum" is the "sum of the derivatives".⁵ More subtle is the case of the product of functions.

Proposition 922 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b). The product
function fg : (a, b) → R is derivable at x, with

    (fg)′(x) = f′(x) g(x) + f(x) g′(x)          (20.13)

⁵ The converse does not hold: if the sum of two functions has a derivative, it is not necessarily true that
the individual functions have a derivative (for example, the origin is a corner point of both f(x) = |x| and
g(x) = −|x|, but the sum f + g is a constant function that has a derivative at every point of the real line).
The same is true for the multiplication and division operations on functions.
Proof We have

    (fg)′(x) = lim_{h→0} ((fg)(x + h) − (fg)(x))/h = lim_{h→0} (f(x + h) g(x + h) − f(x) g(x))/h
             = lim_{h→0} (f(x + h) g(x + h) − f(x) g(x + h) + f(x) g(x + h) − f(x) g(x))/h
             = lim_{h→0} (g(x + h)(f(x + h) − f(x)) + f(x)(g(x + h) − g(x)))/h
             = lim_{h→0} (g(x + h) (f(x + h) − f(x))/h + f(x) (g(x + h) − g(x))/h)
             = lim_{h→0} g(x + h) · lim_{h→0} (f(x + h) − f(x))/h + f(x) lim_{h→0} (g(x + h) − g(x))/h
             = g(x) f′(x) + f(x) g′(x)

as desired. In the last step we have lim_{h→0} g(x + h) = g(x) thanks to the continuity of g,
which is ensured by its derivability.
The derivative of the product, therefore, is not the product of the derivatives, but is
given by the more subtle product rule (20.13). A similar rule, the so-called quotient rule,
holds mutatis mutandis for the quotient.

Proposition 923 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b), with g(x) ≠
0. The quotient function f/g : (a, b) → R is derivable at x, with

    (f/g)′(x) = (f′(x) g(x) − f(x) g′(x))/g(x)²          (20.14)
Proof We start with the case in which f is constant and equal to 1. We have

    (1/g)′(x) = lim_{h→0} (1/g(x + h) − 1/g(x))/h = lim_{h→0} (g(x) − g(x + h))/(g(x) g(x + h) h)
              = −(1/g(x)) lim_{h→0} (g(x + h) − g(x))/(g(x + h) h)
              = −(1/g(x)) lim_{h→0} (g(x + h) − g(x))/h · lim_{h→0} 1/g(x + h) = −g′(x)/g(x)²

Now consider any f : (a, b) → R. Thanks to (20.13), we have

    (f/g)′(x) = (f · (1/g))′(x) = f′(x) (1/g)(x) + f(x) (1/g)′(x)
              = f′(x)/g(x) − f(x) g′(x)/g(x)² = (f′(x) g(x) − f(x) g′(x))/g(x)²

as desired.
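The product rule (20.13) and the quotient rule (20.14) can likewise be checked numerically. The sketch below compares them with difference quotients for f(x) = x³ and g(x) = sin x (the point x = 1.3 is arbitrary):

```python
# Compare the product and quotient rules with difference quotients
# for f(x) = x**3 and g(x) = sin(x) at an arbitrary point.
import math

def dq(f, x, h):
    return (f(x + h) - f(x)) / h

f, df = lambda t: t ** 3, lambda t: 3 * t ** 2
g, dg = math.sin, math.cos
x, h = 1.3, 1e-7

prod_rule = df(x) * g(x) + f(x) * dg(x)                 # (20.13)
quot_rule = (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2   # (20.14)
prod_err = abs(dq(lambda t: f(t) * g(t), x, h) - prod_rule)
quot_err = abs(dq(lambda t: f(t) / g(t), x, h) - quot_rule)
print(prod_err, quot_err)
```

Both errors are tiny, confirming that the formulas, and not the naive "product of derivatives", match the limit of the difference quotients.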
Example 924 (i) Let f, g : R → R be given by f(x) = x³ and g(x) = sin x. We have
f′(x) = 3x² and g′(x) = cos x, so

    (fg)′(x) = 3x² sin x + x³ cos x   for every x ∈ R

as well as

    (f/g)′(x) = (3x² sin x − x³ cos x)/sin² x   for every x ∈ R ∖ {nπ : n ∈ Z}

In the last formula {nπ : n ∈ Z} is the set of the points {…, −2π, −π, 0, π, 2π, …} where
the function g(x) = sin x in the denominator is zero.

(ii) Let f : R → R be given by f(x) = tan x. Since tan x = sin x / cos x, we have

    f′(x) = 1 + tan² x = 1/cos² x
The average cost function is

    c_m(x) = c(x)/x

Using the quotient rule, its derivative is

    c_m′(x) = (x c′(x) − c(x))/x² = (c′(x) − c(x)/x)/x = (c′(x) − c_m(x))/x          (20.15)

Therefore, at a point x the variation in average costs is positive if and only if marginal costs
are larger than average costs. In other words, average costs continue to increase as long as they
are lower than marginal costs (cf. the numerical examples with which we began the chapter).
More generally, the same reasoning holds for each function f : [0, ∞) → R that represents,
as x ≥ 0 varies, an economic "quantity": return, profit, etc. The function f_m : (0, ∞) →
R defined by

    f_m(x) = f(x)/x

is the corresponding "average quantity" (average return, average profit, etc.), while the
derivative function f′(x) represents the "marginal quantity" (marginal return, marginal
profit, etc.). At each x > 0, the derivative f′(x) can be interpreted geometrically as the slope
of the tangent line to f at x, while f_m(x) is the slope of the straight line passing through
the origin and the point (x, f(x)).

[Figure: two panels showing the graph of f(x) together with f′(x) and f(x)/x]
Geometrically, (20.15) says that the variation of the average f_m is positive at a point x > 0,
that is, f_m′(x) ≥ 0, as long as the slope of the tangent line is larger than that of the straight line
passing through the origin and the point (x, f(x)), that is, f′(x) ≥ f_m(x). N
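The relation (20.15) between marginal and average quantities can be illustrated with a small numerical sketch (the cost function below is hypothetical, chosen only so that the computations are easy):

```python
# For the hypothetical cost function c(x) = 100 + x**2:
# marginal cost c'(x) = 2x, average cost c_m(x) = c(x)/x = 100/x + x.
# The average falls while it exceeds the marginal and rises once it is
# below it; the switch is where they are equal, i.e. 2x = 100/x + x, x = 10.
def marginal(x):
    return 2 * x

def average(x):
    return 100 / x + x

for x in (5, 10, 20):
    print(x, marginal(x), average(x))
# at x = 5:  marginal 10 < average 25  (average still falling)
# at x = 10: marginal 20 = average 20  (minimum of the average)
# at x = 20: marginal 40 > average 25  (average rising)
```

The minimum of the average cost occurs exactly where marginal and average cost cross, which is (20.15) read as a sign condition on c_m′.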
Proposition 925 Let f : (a, b) → R and g : (c, d) → R be two functions with Im f ⊆ (c, d).
If f is derivable at x ∈ (a, b) and g is derivable at f(x), then the composite function g ∘ f :
(a, b) → R is derivable at x, with

    (g ∘ f)′(x) = g′(f(x)) f′(x)

Thus, the chain rule features the product of the derivatives g′ and f′, where g′ has as its
argument the image f(x). Before proving it, we provide a simple heuristic argument. For h
small enough, we have

    ((g ∘ f)(x + h) − (g ∘ f)(x))/h = ((g(f(x + h)) − g(f(x)))/(f(x + h) − f(x))) · ((f(x + h) − f(x))/h)

If h → 0, then

    (g ∘ f)′(x) = g′(f(x)) f′(x)

Note that we tacitly assumed that the denominator f(x + h) − f(x) is always different from
zero, something that the hypotheses of the theorem do not guarantee. For this reason, we
need the following rigorous proof.
Example 926 Let f, g : R → R be given by f(x) = x³ and g(x) = sin x. We have, at every
x ∈ R, (g ∘ f)(x) = sin x³ and (f ∘ g)(x) = sin³ x, so

    (g ∘ f)′(x) = g′(f(x)) f′(x) = (cos x³) · 3x² = 3x² cos x³

and

    (f ∘ g)′(x) = f′(g(x)) g′(x) = 3 sin² x cos x

N

Example 927 Let f : (a, b) → R be any function derivable at every x ∈ (a, b) and let
g(x) = eˣ. We have

    (g ∘ f)′(x) = g′(f(x)) f′(x) = e^(f(x)) f′(x)          (20.18)

For example, if f(x) = x⁴, then (g ∘ f)(x) = e^(x⁴) and (20.18) becomes (g ∘ f)′(x) = 4x³ e^(x⁴). N

The chain rule is very useful to compute the derivative of a function that can be written
as a composition of other functions.

Example 928 Let φ : R → R be given by φ(x) = sin³(9x + 1). To calculate φ′(x) it is
useful to write φ as

    φ = f ∘ g ∘ h          (20.19)

where f, g, h : R → R are given by f(x) = x³, g(x) = sin x, and h(x) = 9x + 1. By the
chain rule, we have

    φ′(x) = f′((g ∘ h)(x)) · (g ∘ h)′(x) = f′((g ∘ h)(x)) · g′(h(x)) · h′(x)
          = 3 sin²(9x + 1) · cos(9x + 1) · 9 = 27 sin²(9x + 1) cos(9x + 1)

Expressing the function φ as in (20.19) thus simplifies the computation of its derivative. N
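The chain-rule computation of φ′ can be verified numerically against a central difference quotient (the evaluation point below is arbitrary):

```python
# Check the chain-rule derivative of phi(x) = sin(9x + 1)**3 numerically.
import math

def phi(x):
    return math.sin(9 * x + 1) ** 3

def dphi(x):
    # chain rule: 27 * sin(9x + 1)**2 * cos(9x + 1)
    return 27 * math.sin(9 * x + 1) ** 2 * math.cos(9 * x + 1)

x, h = 0.4, 1e-7
central = (phi(x + h) - phi(x - h)) / (2 * h)
print(abs(central - dphi(x)))
```

The discrepancy is negligibly small, confirming the peeled-from-the-outside computation above.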
O.R. If we write z = f(x) and y = g(z), we clearly have y = g(f(x)). What we have
proved can be summarized by stating that

    dy/dx = (dy/dz) · (dz/dx)

which is easy to remember if the symbol d·/d· is interpreted as a true ratio: it is a kind
of Pinocchio, a puppet that behaves like a true kid. H

O.R. The chain rule has an onion flavor because the derivative of a composite function is
obtained by successively "peeling" the function from the outside:
    (f⁻¹)′(y₀) = 1/f′(x₀)          (20.20)

In short, the derivative of the inverse function of f, at y₀, is the reciprocal of the derivative
of f, at x₀.

It would be nice to invoke the chain rule and say that from y₀ = f(f⁻¹(y₀)) it
follows that 1 = f′(f⁻¹(y₀)) · (f⁻¹)′(y₀), so that 1 = f′(x₀) · (f⁻¹)′(y₀), which is formula
(20.20). Unfortunately, we cannot use the chain rule because we are not sure (yet) that f⁻¹
is derivable: indeed, this is what we first need to prove in this theorem.
Proof Set f(x₀ + h) = y₀ + k and observe that, by the continuity of f, when h → 0, also
k → 0. By the definition of inverse function, x₀ = f⁻¹(y₀) and x₀ + h = f⁻¹(y₀ + k).
Therefore, h = f⁻¹(y₀ + k) − f⁻¹(y₀). By hypothesis, there exists

    lim_{h→0} (f(x₀ + h) − f(x₀))/h = f′(x₀)

But

    (f(x₀ + h) − f(x₀))/h = (y₀ + k − y₀)/(f⁻¹(y₀ + k) − f⁻¹(y₀)) = 1/((f⁻¹(y₀ + k) − f⁻¹(y₀))/k)

Therefore, provided f′(x₀) ≠ 0, the limit of the ratio

    (f⁻¹(y₀ + k) − f⁻¹(y₀))/k

as k → 0 also exists, and it is the reciprocal of the previous one, i.e., (f⁻¹)′(y₀) = 1/f′(x₀).
The derivative of the inverse function is thus given by a unit fraction in which the
derivative f′ at the denominator has as its argument the preimage f⁻¹(y), that is,

    (f⁻¹)′(y) = 1/f′(x) = 1/f′(f⁻¹(y))

Example 930 Let f : R → R be the exponential function f(x) = eˣ, so that f⁻¹ : R₊₊ →
R is the logarithmic function f⁻¹(y) = log y. Given that d eˣ/dx = eˣ = y, we have

    d log y/dy = 1/f′(x) = 1/eˣ = 1/e^(log y) = 1/y

for every y > 0. N
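Example 930 can be checked numerically: the inverse-function rule applied to f(x) = eˣ should reproduce 1/y (a small illustrative sketch):

```python
# Inverse-function rule for f(x) = e^x, f^{-1}(y) = log y:
# (f^{-1})'(y) = 1 / f'(f^{-1}(y)) = 1 / e^{log y} = 1 / y.
import math

def inverse_derivative(y):
    x = math.log(y)           # preimage f^{-1}(y)
    return 1.0 / math.exp(x)  # reciprocal of f'(x) = e^x

for y in (0.5, 1.0, math.e, 10.0):
    print(y, inverse_derivative(y), 1.0 / y)
```

The two printed columns coincide up to rounding, as (20.20) predicts.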
This example, along with the chain rule, yields the important formula

    d log f(x)/dx = f′(x)/f(x)

for strictly positive derivable functions f. It is the logarithmic version of (20.18).

The last example, again along with the chain rule, also leads to an important generaliz-
ation of Proposition 918.

Proposition 931 For every a ∈ R, the power function f(x) = xᵃ is derivable at every
x > 0, with f′(x) = a x^(a−1).

Proof We have

    xᵃ = e^(log xᵃ) = e^(a log x)          (20.21)

Setting f(x) = eˣ and g(x) = a log x, from (20.21) it follows that

    d(xᵃ)/dx = f′(g(x)) g′(x) = e^(a log x) · (a/x) = xᵃ · (a/x) = a x^(a−1)

as desired.
    d tan x/dx = 1 + tan² x = 1 + y²

and so, for every y ∈ R,

    d arctan y/dy = 1/(1 + y²)

N
We relegate to an example the derivative of a function with variable base and exponent.
Writing F(x) = f(x)^(g(x)) = e^(g(x) log f(x)) for f > 0, the chain rule gives

    F′(x) = e^(g(x) log f(x)) · D[g(x) log f(x)] = F(x) (g′(x) log f(x) + g(x) f′(x)/f(x))

For example, the derivative of F(x) = xˣ is

    d(xˣ)/dx = xˣ (log x + x · (1/x)) = xˣ (1 + log x)

while the derivative of F(x) = x^(x²) is

    d(x^(x²))/dx = x^(x²) (2x log x + x² · (1/x)) = x^(x²+1) (1 + 2 log x)

The reader can try to calculate the derivative of F(x) = x^(xˣ). N
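A numerical check of the formula for the derivative of xˣ (the evaluation point below is arbitrary):

```python
# Check d(x**x)/dx = x**x * (1 + log x) against a central difference quotient.
import math

def F(x):
    return x ** x

def dF(x):
    # from F(x) = e^{x log x}
    return x ** x * (1 + math.log(x))

x, h = 2.0, 1e-7
central = (F(x + h) - F(x - h)) / (2 * h)
print(central, dF(x))
```

Both values agree closely, confirming the logarithmic-exponential rewriting used above.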
O.R. Denoting by y = f(x) a function and by x = f⁻¹(y) its inverse, we can summarize
what we have seen by writing

    dx/dy = 1/(dy/dx)

Again the symbol d·/d· behaves like a true ratio, a further proof of its Pinocchio nature. H
20.11 Formulary

The chain rule permits us to broaden considerably the scope of the results on the derivatives of
elementary functions seen in Section 20.7. In Example 927 we already saw how to calculate
the derivative of a generic function e^(f(x)), which is much more general than the exponential
eˣ of Proposition 919.

In a similar way it is possible to generalize all the results on the derivatives of elementary
functions seen until now. We summarize all this in two tables: the first one lists the deriv-
atives of elementary functions, while the second one contains their generalizations, which can be
obtained through the chain rule.
    f            f′                         Reference
    k            0                          Example 907
    xᵃ           a x^(a−1)                  Proposition 931
    eˣ           eˣ                         Proposition 919
    αˣ           αˣ log α                   Proposition 919
    log x        1/x                        Example 930
    log_a x      1/(x log a)                Exercise for the reader
    sin x        cos x                      Proposition 920
    cos x        −sin x                     Observation 20.11
    tan x        1/cos² x = 1 + tan² x      Example 924
    cotan x      −1/sin² x                  Exercise for the reader
    arcsin x     1/√(1 − x²)                Example 932
    arccos x     −1/√(1 − x²)               Exercise for the reader
    arctan x     1/(1 + x²)                 Example 933
    arccotan x   −1/(1 + x²)                Exercise for the reader
                                            (20.22)
Given their importance in so many contexts, it is useful to memorize the previous table,
as one learned the multiplication tables by heart as a child. Let us now see its general
version obtained through the chain rule. In the next table, f are the elementary functions
of the previous table, while g is any derivable function. Most of the derivatives that arise in
applications can be computed through it.
    f ∘ g           (f ∘ g)′                                    Image of g
    g(x)ᵃ           a g(x)^(a−1) g′(x)                          A ⊆ R
    e^(g(x))        g′(x) e^(g(x))                              A ⊆ R
    α^(g(x))        g′(x) α^(g(x)) log α                        A ⊆ R
    log g(x)        g′(x)/g(x)                                  A ⊆ R₊₊
    log_a g(x)      g′(x)/(g(x) log a)                          A ⊆ R₊₊
    sin g(x)        g′(x) cos g(x)                              A ⊆ R
    cos g(x)        −g′(x) sin g(x)                             A ⊆ R
    tan g(x)        g′(x)/cos² g(x) = g′(x)(1 + tan² g(x))      A ⊆ R
    arcsin g(x)     g′(x)/√(1 − g²(x))                          A ⊆ [−1, 1]
    arccos g(x)     −g′(x)/√(1 − g²(x))                         A ⊆ [−1, 1]
    arctan g(x)     g′(x)/(1 + g²(x))                           A ⊆ R
                                                                (20.23)
20.12 Differentiability and linearity

20.12.1 Differential

A fundamental question is whether it is possible to approximate a function f : (a, b) → R
locally, that is, in a neighborhood of a given point of its domain, by an affine function,
namely, by a straight line (recall Proposition 656). If this is possible, we could locally
approximate the function, however complicated, by the simplest function: a straight
line.
To make this idea precise, given a function f : (a, b) → R and a point x₀ ∈ (a, b), suppose
that there exists an affine function r : R → R that approximates f at x₀. Two requirements
make r an adequate approximation of f at x₀. First, the straight line must coincide with f at x₀, that
is, f(x₀) = r(x₀): at the point x₀ the approximation must be exact, without any error.
Second, and most important, the approximation error

    f(x₀ + h) − r(x₀ + h)

at x₀ + h is o(h), that is, as x₀ + h approaches x₀, the error goes to zero faster than h: the
approximation is (locally) "very good".
Since the straight line r can be written as r(x) = mx + q, the condition f(x₀) = r(x₀)
implies

    r(x₀ + h) = m(x₀ + h) + q = mh + mx₀ + q = mh + f(x₀)

Denote by l : R → R the linear function defined by l(h) = mh, which geometrically is
a straight line passing through the origin. The approximation condition (20.24) can be
equivalently written as

    f(x₀ + h) − f(x₀) = l(h) + o(h)          (20.25)

Expression (20.25) emphasizes the linearity of the approximation l(h) of the difference
f(x₀ + h) − f(x₀), as well as the goodness of this approximation: the difference f(x₀ + h) −
f(x₀) − l(h) is o(h). This emphasis is important and motivates the following definition.
for every h ∈ (a − x₀, b − x₀).

In other words, the definition requires that there exists a number m ∈ R, independent
of h (but, in general, dependent on x₀) such that

    f(x₀ + h) = f(x₀) + mh + o(h)          (20.26)

The linear function l(h) = mh in (20.26) is called the differential of f at x₀ and is denoted
by df(x₀) : R → R. With such notation, (20.26) becomes⁶

    f(x₀ + h) = f(x₀) + df(x₀)(h) + o(h)          (20.27)
(i) its mere existence ensures that the function is well behaved (it is continuous);

(ii) it reveals whether the function goes up or down and, with its slope, it tells us approx-
imately the rate of change of the function at the point studied.

These two pieces of information are often useful in applications. Chapter 23 will study
these issues in more depth and will present sharper local approximations. H
The differential at a point can thus be written in terms of the derivative at that point.
Inter alia, this also shows the uniqueness of the differential df(x₀).

The linear function l : R → R is a straight line passing through the origin, so there exists
m ∈ R such that l(h) = mh. Hence df(x₀)(h) = f′(x₀) h for every h ∈ R.

Differentiability and derivability are, therefore, equivalent notions for scalar functions.
When they hold, we have, as h → 0,

    f(x₀ + h) = f(x₀) + f′(x₀) h + o(h)

or, equivalently, as x → x₀,

    f(x) = f(x₀) + f′(x₀)(x − x₀) + o(x − x₀)

In particular,

    y = f(x₀) + f′(x₀)(x − x₀)

is the equation of the tangent line at x₀. This confirms the natural intuition that such a line
is the affine approximation that makes f differentiable at x₀. Graphically:
[Figure: f and its tangent line at (x₀, f(x₀))]
O.R. The difference f(x₀ + h) − f(x₀) is called the increment of f at x₀ and is often denoted
by Δf(x₀)(h). When f is differentiable at x₀, we have

    Δf(x₀)(h) = df(x₀)(h) + o(h)

So,

    Δf(x₀) ∼ df(x₀)   as h → 0

The two infinitesimals Δf(x₀) and df(x₀) are, therefore, of the same order. This is another
way of saying that, when f is differentiable at x₀, the differential well approximates the true
increment. H
The converse is clearly false, as shown by the absolute value function f (x) = jxj at
x0 = 0.
Therefore, f is continuous at x0 .
Notation The set of all the continuously differentiable functions on a set E in R is denoted
by C¹(E).
Example 939 The quadratic function is three times differentiable at every point of the real
line. Indeed, its second derivative function f″ : R → R has a derivative at each x ∈ R, with f‴(x) = 0 for
each x ∈ R. N

These definitions can be iterated ad libitum, with fourth derivative, fifth derivative, and
so on. Denoting by f⁽ⁿ⁾ the n-th derivative, we can define by recurrence the differentiability
of higher order of a function.

    lim_{h→0} (f⁽ⁿ⁻¹⁾(x + h) − f⁽ⁿ⁻¹⁾(x))/h          (20.32)

exists and is finite.
f 0 (x) = 4x3 ; f 00 (x) = 12x2 ; f 000 (x) = 24x; f (iv) (x) = 24; f (v) (x) = 0
$$\Delta_h f(x_0)=\frac{\Delta a_0}{h}\,;\qquad \Delta_h^2 f(x_0)=\frac{\Delta^2 a_0}{h^2}\,;\qquad \dots\,;\qquad \Delta_h^k f(x_0)=\frac{\Delta^k a_0}{h^k}$$

where $a_i=f(x_0+ih)$. We have:

$$\Delta_h f(x_0)=\frac{\Delta a_0}{h}=\frac{f(x_0+h)-f(x_0)}{h}$$

$$\Delta_h^2 f(x_0)=\frac{\Delta^2 a_0}{h^2}=\frac{1}{h^2}\left(\Delta a_1-\Delta a_0\right)=\frac{f(x_0+2h)-2f(x_0+h)+f(x_0)}{h^2}$$

and, in general,

$$\Delta_h^k f(x_0)=\frac{1}{h^k}\sum_{i=0}^{k}(-1)^{k-i}\binom{k}{i}f(x_0+ih)$$
where the last equality follows from (10.4). By definition, the first derivative is the limit, as h approaches 0, of the difference quotient Δ_h f(x0). Interestingly, the next result shows that the second difference quotient also converges to the second derivative, the third difference quotient to the third derivative, and so on.
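The k-th difference quotient can be computed directly from the binomial formula above. The following Python sketch (ours, not from the text) implements it and compares it with the known derivatives of f(x) = x⁴:

```python
from math import comb

def diff_quotient(f, x0, h, k):
    # k-th difference quotient: h^{-k} * sum_{i=0}^{k} (-1)^{k-i} C(k,i) f(x0 + i h)
    return sum((-1) ** (k - i) * comb(k, i) * f(x0 + i * h)
               for i in range(k + 1)) / h ** k

f = lambda x: x ** 4
x0, h = 2.0, 1e-4
# exact derivatives of x^4 at x0 = 2: f' = 32, f'' = 48, f''' = 48
for k, exact in [(1, 32.0), (2, 48.0), (3, 48.0)]:
    print(k, diff_quotient(f, x0, h, k), exact)
```

For small h the quotients approach the corresponding derivatives, as Proposition 942 below asserts.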
Proposition 942 Let f be n − 1 times differentiable on R and n times differentiable at x0. We have f⁽ᵏ⁾(x0) = lim_{h→0} Δ_h^k f(x0) for all 1 ≤ k ≤ n.
Proof We only prove the case n = 2. In Chapter 23 we will establish the following quadratic approximation:

$$f(x_0+h)=f(x_0)+f'(x_0)h+\frac{1}{2}f''(x_0)h^2+o(h^2)$$

Then f(x0 + 2h) = f(x0) + 2f'(x0)h + 2f''(x0)h² + o(h²), so

$$\Delta_h^2 f(x_0)=\frac{f(x_0+2h)-2f(x_0+h)+f(x_0)}{h^2}=\frac{f''(x_0)h^2+o(h^2)}{h^2}\to f''(x_0)$$

as desired.⁹
8 Here it is convenient to start the sequence at n = 0.
9 For a direct proof of this result, we refer readers to Jordan (1893), pp. 116-118.
Conceptually, this result shows that derivatives can be viewed as limits of finite differences, so the "discrete" and "continuous" calculi are consistent. Indeed, some important continuous properties can be viewed as inherited, via limits, from discrete ones: for instance, the algebra of derivatives can be easily deduced from that of finite differences via limits. All this is important (and, in a sense, reassuring) because discrete properties are often much easier to grasp intuitively.
By establishing a "direct" characterization of second and higher order derivatives, this proposition is also important for their numerical computation. For instance, inspection of the proof shows that f''(x0) = Δ²_h f(x0) + o(h²). In general, Δ²_h f(x0) is much easier to compute numerically than f''(x0), with o(h²) being the magnitude of the approximation error.
So far so good. Yet, from this formula one might be tempted to take finer and finer subdivisions by letting n → +∞.¹⁰ For each k we have

$$n^{(k)}\sim n^k$$

as well as

$$\Delta_h^k f(x_0)\sim f^{(k)}(x_0)$$

provided f is infinitely differentiable. Indeed, by Proposition 942 we have Δ_h^k f(x0) → f⁽ᵏ⁾(x0) as h → 0, so also as n → +∞. Unfortunately, the equivalence relation does not necessarily go through sums, let alone through infinite ones (cf. Lemma 331). Yet, if we take a leap of faith – in an eighteenth-century style – we "then" have a series expansion

$$f(x)\approx\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k \qquad \forall x\in\mathbb{R}$$

10 A notation short circuit: here n plays the role of m in (10.12) and k that of j, while in the notation of (10.12) here we have n = 0.
20.14. DISCRETE LIMITS 641
Fortunately, later in the book Section 23.5 will make all this rigorous by showing that infinitely differentiable functions that are analytic admit an (exact) series expansion, something that makes them the most tractable class of functions. Though rough, the previous heuristic argument thus opens a door on a key topic.
Chapter 21
Differential calculus in several variables
Our study of differential calculus has so far focused on functions of a single variable. Its extension to functions of several variables is a fundamental, but subtle, topic. We can begin, however, with a simple notion of differentiation in Rⁿ: partial differentiation. Let us start with the two-dimensional case. Consider the origin x = (0, 0) in the plane. There are, intuitively, two main directions along which to approach the origin: the horizontal one – that is, moving along the horizontal axis – and the vertical one – that is, moving along the vertical axis.
[Figure: the horizontal and vertical directions along which the origin O can be approached]
Just as we can approach the origin along the two main directions, vertical and horizontal, the same can be done for any point x in the plane.
644 CHAPTER 21. DIFFERENTIAL CALCULUS IN SEVERAL VARIABLES
[Figure: the horizontal and vertical directions through a generic point x = (x1, x2)]
To formalize this intuition, let us consider the two versors e¹ = (1, 0) and e² = (0, 1) in R². For every x = (x₁, x₂) ∈ R² and every scalar h ∈ R, we have

$$x+he^1=(x_1+h,x_2)\,;\qquad x+he^2=(x_1,x_2+h)$$

Graphically:
[Figure: the point x + he¹, obtained from x by moving h along the horizontal direction]
The set

$$\{x+he^1 : h\in\mathbb{R}\}$$

is, therefore, formed by the vectors of R² with the same second coordinate, but with a different first coordinate.
21.1. PARTIAL DERIVATIVES 645
[Figure: the horizontal straight line {x + he¹ : h ∈ R} through the point x]
Graphically, it is the horizontal straight line that passes through the point x. For example, if x is the origin (0, 0), the set

$$\{x+he^1 : h\in\mathbb{R}\}=\{(h,0) : h\in\mathbb{R}\}$$

is the horizontal axis. Next consider the point x + he². Graphically:
[Figure: the point x + he², obtained from x by moving h along the vertical direction]
In this case the set {x + he² : h ∈ R} is formed by the vectors of R² with the same first coordinate, but with a different second coordinate.
[Figure: the vertical straight line {x + he² : h ∈ R} through the point x]
Graphically, it is the vertical straight line that passes through the point x. When x is the origin (0, 0), the set {x + he² : h ∈ R} is the vertical axis.
Though key for understanding the meaning of partial derivatives, (21.1) and (21.2) are less useful to compute them. To this end, for a fixed x ∈ R² we introduce two auxiliary scalar functions, called projections, φ₁, φ₂ : R → R defined by

$$\varphi_1(t)=f(t,x_2)\,;\qquad \varphi_2(t)=f(x_1,t)$$

The partial derivative ∂f/∂xᵢ is nothing but the ordinary derivative φ'ᵢ of the scalar function φᵢ calculated at t = xᵢ, with i = 1, 2. Thus, using the auxiliary functions φᵢ we go back to the differentiation of scalar functions studied in the last chapter. Formulas (21.3) and (21.4) are very useful for the computation of partial derivatives, which is thus reduced to the computation of standard derivatives of scalar functions.
Example 943 (i) Let f : R² → R be given by f(x₁, x₂) = x₁x₂. Let us compute the partial derivatives of f at x = (1, −1). We have

$$\varphi_1(t)=f(t,-1)=-t\,;\qquad \varphi_2(t)=f(1,t)=t$$

Therefore, at the point t = 1 we have φ'₁(1) = −1 and at the point t = −1 we have φ'₂(−1) = 1, which implies

$$\frac{\partial f}{\partial x_1}(1,-1)=\varphi_1'(1)=-1\,;\qquad \frac{\partial f}{\partial x_2}(1,-1)=\varphi_2'(-1)=1$$

More generally, at any point x ∈ R² we have

$$\varphi_1(t)=tx_2\,;\qquad \varphi_2(t)=x_1t$$

Therefore, their derivatives at the point x are φ'₁(x₁) = x₂ and φ'₂(x₂) = x₁. Hence,

$$\frac{\partial f}{\partial x_1}(x)=\varphi_1'(x_1)=x_2\,;\qquad \frac{\partial f}{\partial x_2}(x)=\varphi_2'(x_2)=x_1$$
(ii) Let f : R² → R be given by f(x₁, x₂) = x₁²x₂. Let us compute the partial derivatives of f at x = (1, 2). We have

$$\varphi_1(t)=f(t,2)=2t^2\,;\qquad \varphi_2(t)=f(1,t)=t$$

Therefore, at the point t = 1 we have φ'₁(1) = 4 and at the point t = 2 we have φ'₂(2) = 1, whence

$$\frac{\partial f}{\partial x_1}(1,2)=\varphi_1'(1)=4\,;\qquad \frac{\partial f}{\partial x_2}(1,2)=\varphi_2'(2)=1$$

Again, more generally, at any point x ∈ R² we have

$$\varphi_1(t)=t^2x_2\,;\qquad \varphi_2(t)=x_1^2t$$

Therefore, their derivatives at the point x are φ'₁(x₁) = 2x₁x₂ and φ'₂(x₂) = x₁², so

$$\frac{\partial f}{\partial x_1}(x)=\varphi_1'(x_1)=2x_1x_2\,;\qquad \frac{\partial f}{\partial x_2}(x)=\varphi_2'(x_2)=x_1^2$$

N
Example 944 Let f : R × R₊₊ → R be given by f(x₁, x₂) = x₁ log x₂. Let us calculate the partial derivatives at x ∈ R × R₊₊. We start with ∂f/∂x₁(x). If we consider f as a function of the single variable x₁, its derivative is log x₂. Therefore,

$$\frac{\partial f}{\partial x_1}(x)=\log x_2$$

On the other hand, φ₁(t) = t log x₂, and therefore at the point t = x₁ we have φ'₁(x₁) = log x₂. Let us move to ∂f/∂x₂(x). If we consider f as a function of the single variable x₂, its derivative is x₁/x₂. Therefore,

$$\frac{\partial f}{\partial x_2}(x)=\frac{x_1}{x_2}$$

N
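The projection method lends itself directly to numerical computation. The following Python sketch (ours, not part of the text) builds the projections φ₁, φ₂ for f(x₁, x₂) = x₁ log x₂ of Example 944 and differentiates them by a central difference quotient, recovering log x₂ and x₁/x₂:

```python
import math

def partials_via_projections(f, x1, x2, h=1e-6):
    # project f onto each coordinate and differentiate the scalar functions
    phi1 = lambda t: f(t, x2)   # x2 held fixed
    phi2 = lambda t: f(x1, t)   # x1 held fixed
    d1 = (phi1(x1 + h) - phi1(x1 - h)) / (2 * h)
    d2 = (phi2(x2 + h) - phi2(x2 - h)) / (2 * h)
    return d1, d2

f = lambda x1, x2: x1 * math.log(x2)
d1, d2 = partials_via_projections(f, 3.0, 2.0)
print(d1, math.log(2.0))   # df/dx1 = log x2
print(d2, 3.0 / 2.0)       # df/dx2 = x1/x2
```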
O.R. Geometrically, at a point (x₁, x₂) the projection φ₁(t) = f(t, x₂) is obtained by sectioning the surface that represents f with the vertical plane of equation x₂ = x₂, while the projection φ₂(t) = f(x₁, t) is obtained by sectioning the same surface with the vertical plane (perpendicular to the previous one) of equation x₁ = x₁. Therefore, as with a panettone, the surface is cut with two planes perpendicular to one another: the projections are nothing but the shapes of the two slices and, as such, scalar functions (whose graph lies on the plane with which we cut the surface).
The partial derivatives at (x₁, x₂) are therefore simply the slopes of the two projections at this point. H
provided the limits

$$\lim_{h\to 0}\frac{f\left(x+he^i\right)-f(x)}{h}\qquad i=1,\dots,n \tag{21.5}$$

exist and are finite. These limits are called the partial derivatives of f at x.
The limit (21.5) is the i-th partial derivative of f at x, denoted by either f'ₓᵢ(x) or

$$\frac{\partial f}{\partial x_i}(x)$$

Often, it is actually convenient to write

$$\frac{\partial f(x)}{\partial x_i}$$

The choice among these alternatives will be just a matter of convenience. The vector

$$\left(\frac{\partial f}{\partial x_1}(x),\frac{\partial f}{\partial x_2}(x),\dots,\frac{\partial f}{\partial x_n}(x)\right)\in\mathbb{R}^n$$

is called the gradient of f at x and is denoted by ∇f(x).
Also in the general case of n independent variables, to calculate the partial derivatives at a point x one can introduce the projections φᵢ defined by

$$\varphi_i(t)=f\left(x_1,\dots,x_{i-1},t,x_{i+1},\dots,x_n\right)$$

which generalizes to Rⁿ formulas (21.3) and (21.4), reducing in this case, too, the calculation of partial derivatives to that of standard derivatives of scalar functions.
Hence

$$\frac{\partial f}{\partial x_1}(x)=\varphi_1'(x_1)=1\,;\qquad \frac{\partial f}{\partial x_2}(x)=\varphi_2'(x_2)=x_3e^{x_2x_3}$$

$$\frac{\partial f}{\partial x_3}(x)=\varphi_3'(x_3)=x_2e^{x_2x_3}\,;\qquad \frac{\partial f}{\partial x_4}(x)=\varphi_4'(x_4)=4x_4$$

By putting them together, we have the gradient

$$\nabla f(x)=\left(1,\;x_3e^{x_2x_3},\;x_2e^{x_2x_3},\;4x_4\right)$$

N
As in the special case n = 2, in the general case too, calculating the partial derivative ∂f(x)/∂xᵢ through the projection φᵢ amounts to considering f as a function of the single variable xᵢ, keeping the other n − 1 variables constant. We then calculate the ordinary derivative at xᵢ of this scalar function. In other words, we study the incremental behavior of f with respect to variations of xᵢ only, keeping the other variables constant.
$$\frac{\partial f}{\partial x_1}(x)=x_2x_3\,;\qquad \frac{\partial f}{\partial x_2}(x)=x_1x_3\,;\qquad \frac{\partial f}{\partial x_3}(x)=x_1x_2$$

of the function f(x₁, x₂, x₃) = x₁x₂x₃ are functions on all of R³. Together they form the derivative operator

$$\nabla f(x)=\left(\frac{\partial f}{\partial x_1}(x),\frac{\partial f}{\partial x_2}(x),\frac{\partial f}{\partial x_3}(x)\right)=(x_2x_3,\,x_1x_3,\,x_1x_2)$$

of f. N
with g : R → R strictly increasing. Since u and g ∘ u are utility functions that are equivalent from the ordinal point of view, this shows that the differences (21.8) per se have no meaning. For this reason, ordinalist consumer theory uses marginal rates of substitution rather than marginal utilities – as we will see in Section 25.3.2. Nevertheless, marginal utility remains a notion commonly used in economics because of its intuitive appeal.
21.2 Differential
The notion of differential introduced in Definition 935 naturally extends to functions of several variables.
The linear function l is called the differential of f at x, denoted by df(x) : Rⁿ → R. The differential is the linear approximation at the point x of the variation f(x + h) − f(x), with error of magnitude o(‖h‖), that is,⁵

$$f(x+h)-f(x)=df(x)(h)+o(\|h\|)$$

i.e.,

$$\lim_{h\to 0}\frac{f(x+h)-f(x)-df(x)(h)}{\|h\|}=\lim_{h\to 0}\frac{o(\|h\|)}{\|h\|}=0$$

$$df(x)(h)=\alpha\cdot h$$

for a suitable vector α ∈ Rⁿ. The next important theorem identifies such a vector and shows that differentiability guarantees both continuity and partial derivability.
But:
(i) lim_{h→0} l(h) = l(0) = 0, since linear functions l : Rⁿ → R are continuous (Theorem 535);
To show the existence of the partial derivatives at x, let us consider the case n = 2 (the general case presents no novelties, except in notation). In this case, (21.10) implies the existence of α = (α₁, α₂) ∈ R² such that

$$\lim_{(h_1,h_2)\to(0,0)}\frac{f(x_1+h_1,x_2+h_2)-f(x_1,x_2)-\alpha_1h_1-\alpha_2h_2}{\sqrt{h_1^2+h_2^2}}=0 \tag{21.13}$$

and therefore, setting h₂ = 0,

$$\alpha_1=\lim_{h_1\to 0}\frac{f(x_1+h_1,x_2)-f(x_1,x_2)}{h_1}=\frac{\partial f(x_1,x_2)}{\partial x_1}$$

In a similar way it is possible to prove that α₂ = ∂f(x₁, x₂)/∂x₂, that is, ∇f(x₁, x₂) = α. In conclusion, both partial derivatives exist, so the function f is partially derivable, with
generalizes the tangent line (20.31). The approximation (21.10) assumes the form f(x) = r(x) + o(‖x − x₀‖), that is,
In the special case n = 2, the affine function (21.14) that best approximates a function f : U ⊆ R² → R at a point x₀ = (x₀₁, x₀₂) ∈ U takes the form⁶

$$r(x_1,x_2)=f(x_{01},x_{02})+\frac{\partial f(x_0)}{\partial x_1}(x_1-x_{01})+\frac{\partial f(x_0)}{\partial x_2}(x_2-x_{02})$$

6 Here x₀₁ and x₀₂ denote the components of the vector x₀.
21.2. DIFFERENTIAL 655
[Figure: tangent plane to the graph of a function of two variables at a point]
For n ≥ 3, the affine function (21.14) that best approximates a function in the neighborhood of a point x₀ of its domain is called the tangent hyperplane. For obvious reasons, it cannot be visualized graphically.
We close with a piece of terminology. When f is differentiable at all the points of a subset E of U, for brevity we say that f is differentiable on E. When f is differentiable at all the points of its domain, it is called differentiable, without further specification.
Since f(0, 0) = 0 and ∇f(0, 0) = (0, 0), we have f(h, k) = o(√(h² + k²)), that is,

$$\lim_{(h,k)\to(0,0)}\frac{f(h,k)}{\sqrt{h^2+k^2}}=0$$

i.e.,

$$\lim_{(h,k)\to(0,0)}\sqrt{\frac{hk}{h^2+k^2}}=0$$

But this is not possible. Indeed, if for example we consider the points on the straight line x₂ = x₁, that is, of the form (t, t), we get

$$\sqrt{\frac{hk}{h^2+k^2}}=\sqrt{\frac{t^2}{t^2+t^2}}=\sqrt{\frac{1}{2}}\qquad\forall t\neq 0$$

This shows that f is not differentiable at (0, 0),⁷ even if it has partial derivatives at (0, 0). N
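The failure of the limit above can be seen numerically: along the line x₂ = x₁ the ratio stays constant, however close to the origin we go. A minimal Python sketch (ours, not part of the text):

```python
import math

def ratio(h, k):
    # the quantity sqrt(hk / (h^2 + k^2)) from the example
    return math.sqrt(h * k / (h ** 2 + k ** 2))

# along the line x2 = x1 the ratio is constant, however small t is
for t in [1.0, 1e-3, 1e-9]:
    print(ratio(t, t))  # always sqrt(1/2)
```

Since the ratio does not vanish along this line, the limit cannot be 0.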
Summing up:
differentiability implies partial derivability (Theorem 952), but not vice versa when n ≥ 2 (Example 953);
It is natural to ask which additional hypotheses are required for partial derivability to imply differentiability (so, continuity). The answer is given by the next remarkable result, which extends Theorem 936 to the vector case by showing that, under a simple regularity hypothesis (the continuity of the partial derivatives), a partially derivable function is also differentiable (so, continuous).
Theorem 954 Let f : U → R be partially derivable. If the partial derivatives are continuous, then f is differentiable.
Proof⁸ For simplicity of notation, we consider the case in which n = 2, the function f is defined on the entire plane R², and the partial derivatives ∂f/∂x₁ and ∂f/∂x₂ exist on R². Apart from more complicated notation, the general case can be proved in a similar way.
Therefore, let f : R² → R and x ∈ R². Assume that ∂f/∂x₁ and ∂f/∂x₂ are both continuous at x. By adding and subtracting f(x₁ + h₁, x₂), for each h ∈ R² we have:

$$f(x+h)-f(x)=\left[f(x_1+h_1,x_2)-f(x_1,x_2)\right]+\left[f(x_1+h_1,x_2+h_2)-f(x_1+h_1,x_2)\right] \tag{21.15}$$
7 For the more demanding reader: note that each neighbourhood of the origin contains points of the type (t, t) with t ≠ 0. For such points we have √(hk/(h² + k²)) = √(1/2). Therefore, for 0 < ε < √(1/2) there is no neighbourhood of the origin such that, for all its points (h, k) ≠ (0, 0), we have |√(hk/(h² + k²)) − 0| < ε.
8 Since this proof uses the Mean Value Theorem for scalar functions that will be presented in the next chapter, it is best understood after learning that result. The same remark applies to the proof of Schwarz's Theorem.
The partial derivative ∂f/∂x₁(x) is the derivative of the function ψ₁ : R → R defined by ψ₁(x₁) = f(x₁, x₂), in which x₂ is considered as a constant. By the Mean Value Theorem,⁹ there exists z₁ ∈ (x₁, x₁ + h₁) ⊆ R such that

$$\psi_1'(z_1)=\frac{\psi_1(x_1+h_1)-\psi_1(x_1)}{(x_1+h_1)-x_1}=\frac{\psi_1(x_1+h_1)-\psi_1(x_1)}{h_1}=\frac{f(x_1+h_1,x_2)-f(x_1,x_2)}{h_1}$$

Similarly, the partial derivative ∂f/∂x₂(x + h) is the derivative of the function ψ₂ : R → R defined by ψ₂(x₂) = f(x₁ + h₁, x₂), in which x₁ + h₁ is considered as a constant. Again by the Mean Value Theorem, there exists z₂ ∈ (x₂, x₂ + h₂) ⊆ R such that

$$\psi_2'(z_2)=\frac{\psi_2(x_2+h_2)-\psi_2(x_2)}{(x_2+h_2)-x_2}=\frac{\psi_2(x_2+h_2)-\psi_2(x_2)}{h_2}=\frac{f(x_1+h_1,x_2+h_2)-f(x_1+h_1,x_2)}{h_2}$$

Since by construction ∂f/∂x₁(z₁, x₂) = ψ'₁(z₁) and ∂f/∂x₂(x₁ + h₁, z₂) = ψ'₂(z₂), we can rewrite (21.15) as:

$$f(x+h)-f(x)=\frac{\partial f}{\partial x_1}(z_1,x_2)\,h_1+\frac{\partial f}{\partial x_2}(x_1+h_1,z_2)\,h_2$$
On the other hand, by definition ∇f(x) · h = ∂f/∂x₁(x₁, x₂)h₁ + ∂f/∂x₂(x₁, x₂)h₂. Thus:

$$\lim_{h\to 0}\frac{|f(x+h)-f(x)-\nabla f(x)\cdot h|}{\|h\|}$$
$$=\lim_{h\to 0}\frac{\left|\left(\frac{\partial f}{\partial x_1}(z_1,x_2)h_1+\frac{\partial f}{\partial x_2}(x_1+h_1,z_2)h_2\right)-\left(\frac{\partial f}{\partial x_1}(x_1,x_2)h_1+\frac{\partial f}{\partial x_2}(x_1,x_2)h_2\right)\right|}{\|h\|}$$
$$=\lim_{h\to 0}\frac{\left|\left(\frac{\partial f}{\partial x_1}(z_1,x_2)-\frac{\partial f}{\partial x_1}(x_1,x_2)\right)h_1+\left(\frac{\partial f}{\partial x_2}(x_1+h_1,z_2)-\frac{\partial f}{\partial x_2}(x_1,x_2)\right)h_2\right|}{\|h\|}$$
$$\leq\lim_{h\to 0}\left|\frac{\partial f}{\partial x_1}(z_1,x_2)-\frac{\partial f}{\partial x_1}(x_1,x_2)\right|\frac{|h_1|}{\|h\|}+\lim_{h\to 0}\left|\frac{\partial f}{\partial x_2}(x_1+h_1,z_2)-\frac{\partial f}{\partial x_2}(x_1,x_2)\right|\frac{|h_2|}{\|h\|}$$
$$\leq\lim_{h\to 0}\left|\frac{\partial f}{\partial x_1}(z_1,x_2)-\frac{\partial f}{\partial x_1}(x_1,x_2)\right|+\lim_{h\to 0}\left|\frac{\partial f}{\partial x_2}(x_1+h_1,z_2)-\frac{\partial f}{\partial x_2}(x_1,x_2)\right|$$

where the last inequality holds because

$$0\leq\frac{|h_1|}{\|h\|}\leq 1\qquad\text{and}\qquad 0\leq\frac{|h_2|}{\|h\|}\leq 1$$
9 The Mean Value Theorem for scalar functions will be studied in the next chapter.
Since z₁ ∈ (x₁, x₁ + h₁) and z₂ ∈ (x₂, x₂ + h₂), the continuity of the partial derivatives at x yields

$$\lim_{h\to 0}\frac{\partial f}{\partial x_1}(z_1,x_2)=\frac{\partial f}{\partial x_1}(x_1,x_2)\qquad\text{and}\qquad\lim_{h\to 0}\frac{\partial f}{\partial x_2}(x_1+h_1,z_2)=\frac{\partial f}{\partial x_2}(x_1,x_2)$$

which implies

$$\lim_{h\to 0}\left|\frac{\partial f}{\partial x_1}(z_1,x_2)-\frac{\partial f}{\partial x_1}(x_1,x_2)\right|=\lim_{h\to 0}\left|\frac{\partial f}{\partial x_2}(x_1+h_1,z_2)-\frac{\partial f}{\partial x_2}(x_1,x_2)\right|=0$$

Hence

$$\lim_{h\to 0}\frac{|f(x+h)-f(x)-\nabla f(x)\cdot h|}{\|h\|}=0$$
Example 955 (i) Consider the function f : Rⁿ → R given by f(x) = ‖x‖². Its gradient is

$$\nabla f(x)=\left(\frac{\partial f}{\partial x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)\right)=(2x_1,\dots,2x_n)=2x\qquad\forall x\in\mathbb{R}^n$$

and

$$\|x+h\|^2-\|x\|^2=2x\cdot h+o(\|h\|)$$

as ‖h‖ → 0.
(ii) Consider the function f : Rⁿ₊₊ → R given by f(x) = Σⁿᵢ₌₁ log xᵢ. Its gradient is

$$\nabla f(x)=\left(\frac{\partial f}{\partial x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)\right)=\left(\frac{1}{x_1},\dots,\frac{1}{x_n}\right)\qquad\forall x\in\mathbb{R}^n_{++}$$

The partial derivatives are continuous on Rⁿ₊₊ and therefore f is differentiable on Rⁿ₊₊. By (21.10), at each x ∈ Rⁿ₊₊ we have
so that, as ‖h‖ → 0,

$$\sum_{i=1}^n\log(x_i+h_i)-\sum_{i=1}^n\log x_i=\sum_{i=1}^n\frac{h_i}{x_i}+o(\|h\|)$$

N
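The quality of the approximation ‖x + h‖² − ‖x‖² ≈ 2x · h of point (i) can be verified numerically: the error, divided by ‖h‖, vanishes as h shrinks. A Python sketch of ours (not part of the text):

```python
def sqnorm(x):
    # f(x) = ||x||^2
    return sum(xi * xi for xi in x)

x = [1.0, 2.0, 3.0]
grad = [2 * xi for xi in x]  # gradient of ||x||^2 is 2x

for scale in [1e-1, 1e-2, 1e-3]:
    h = [scale, -scale, scale]
    inc = sqnorm([xi + hi for xi, hi in zip(x, h)]) - sqnorm(x)
    lin = sum(gi * hi for gi, hi in zip(grad, h))
    norm_h = scale * 3 ** 0.5
    # the error term is o(||h||): this ratio shrinks with ||h||
    print(norm_h, (inc - lin) / norm_h)
```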
However evocative, one should not forget that the total differential (21.16) is only a heuristic version of the differential df(x), which is the rigorous notion.¹⁰
10 As we already remarked a few times, heuristics plays an important role in the quest for new results (a "vanguard of heuristic efforts towards the new," wrote Carlo Emilio Gadda). The rigorous verification of the results so obtained is, however, key; only a few outstanding mathematicians, dear to the gods, can rely on intuition without caring too much about rigor. Yet one of them, the great Archimedes, writes in his Method: "... certain things became clear to me by a mechanical method, although they had to be demonstrated by geometry afterwards because their investigation by the said method did not furnish an actual demonstration." (Trans. Heath)
$$\nabla(f\circ g)(x)=f'(g(x))\,\nabla g(x)=\left(f'(g(x))\frac{\partial g(x)}{\partial x_1},\dots,f'(g(x))\frac{\partial g(x)}{\partial x_n}\right)$$

In the scalar case n = 1, we get back the classic rule (f ∘ g)'(x) = f'(g(x)) g'(x). Moreover, by Theorem 952 the differential of the composition f ∘ g is:

$$d(f\circ g)(x)(h)=\sum_{i=1}^n f'(g(x))\frac{\partial g(x)}{\partial x_i}h_i \tag{21.17}$$

In terms of total differentials, (21.17) reads

$$d(f\circ g)=\frac{df}{dg}\frac{\partial g}{\partial x_1}dx_1+\dots+\frac{df}{dg}\frac{\partial g}{\partial x_m}dx_m \tag{21.18}$$

The variation of f ∘ g can be decomposed according to the different infinitesimal variations dxᵢ, each of which induces the variation (∂g/∂xᵢ)dxᵢ on g, which in turn causes a variation df/dg on f. Summing these partial effects we get the overall variation d(f ∘ g).
Example 958 (i) Let f : R → R be given by f(x) = e²ˣ and let g : R² → R be given by g(x) = x₁x₂². Let us calculate with the chain rule the differential of the composite function f ∘ g : R² → R given by

$$(f\circ g)(x)=e^{2x_1x_2^2}$$

We have

$$\nabla(f\circ g)(x)=\left(2x_2^2e^{2x_1x_2^2},\;4x_1x_2e^{2x_1x_2^2}\right)$$

and therefore

$$d(f\circ g)(x)(h)=2e^{2x_1x_2^2}\left(x_2^2h_1+2x_1x_2h_2\right)$$

for every h ∈ R². The total differential is

$$d(f\circ g)=2e^{2x_1x_2^2}\left(x_2^2\,dx_1+2x_1x_2\,dx_2\right)$$
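The chain-rule gradient of point (i) can be cross-checked against a direct finite-difference computation on the composite function. A Python sketch of ours (not part of the text), with f(x) = e²ˣ and g(x₁, x₂) = x₁x₂² as above:

```python
import math

g = lambda x1, x2: x1 * x2 ** 2
f = lambda t: math.exp(2 * t)

def grad_chain(x1, x2):
    # chain rule: grad(f o g)(x) = f'(g(x)) * grad g(x), with f'(t) = 2 e^{2t}
    fp = 2 * math.exp(2 * g(x1, x2))
    return fp * x2 ** 2, fp * 2 * x1 * x2

def grad_numeric(x1, x2, h=1e-7):
    # central differences applied directly to the composite f o g
    comp = lambda a, b: f(g(a, b))
    return ((comp(x1 + h, x2) - comp(x1 - h, x2)) / (2 * h),
            (comp(x1, x2 + h) - comp(x1, x2 - h)) / (2 * h))

print(grad_chain(0.3, 0.5))
print(grad_numeric(0.3, 0.5))
```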
(ii) Let f : (0, +∞) → R be given by f(x) = log x and let g : R²₊₊ ∪ R²₋₋ → R be given by g(x₁, x₂) = √(x₁x₂). Here the function g must be restricted to R²₊₊ ∪ R²₋₋ to satisfy the condition Im g ⊆ (0, +∞). Let us calculate with the chain rule the differential of the composite function f ∘ g : R²₊₊ ∪ R²₋₋ → R given by

$$(f\circ g)(x)=\log\sqrt{x_1x_2}$$

We have

$$\frac{\partial g(x)}{\partial x_1}=\frac{1}{2}\sqrt{\frac{x_2}{x_1}}\qquad\text{and}\qquad\frac{\partial g(x)}{\partial x_2}=\frac{1}{2}\sqrt{\frac{x_1}{x_2}}$$

so that

$$\nabla(f\circ g)(x)=\left(f'(g(x))\frac{\partial g(x)}{\partial x_1},\;f'(g(x))\frac{\partial g(x)}{\partial x_2}\right)=\left(\frac{1}{\sqrt{x_1x_2}}\frac{1}{2}\sqrt{\frac{x_2}{x_1}},\;\frac{1}{\sqrt{x_1x_2}}\frac{1}{2}\sqrt{\frac{x_1}{x_2}}\right)=\left(\frac{1}{2x_1},\frac{1}{2x_2}\right)$$

and

$$d(f\circ g)(x)(h)=\frac{h_1}{2x_1}+\frac{h_2}{2x_2}$$

for every h ∈ R². The total differential is

$$d(f\circ g)=\frac{dx_1}{2x_1}+\frac{dx_2}{2x_2}$$
(iii) Let g : Rⁿ₊₊ → R and f : R₊ → R be given by g(x) = Σⁿᵢ₌₁ aᵢxᵢ^ρ and f(x) = x^(1/ρ), with aᵢ ∈ R and ρ ≠ 0, so that f ∘ g : Rⁿ₊₊ → R is

$$(f\circ g)(x)=\left(\sum_{i=1}^n a_ix_i^\rho\right)^{\frac{1}{\rho}}$$

We have

$$\nabla(f\circ g)(x)=\left(a_1\left(\sum_{i=1}^n a_ix_i^\rho\right)^{\frac{1}{\rho}-1}x_1^{\rho-1},\dots,a_n\left(\sum_{i=1}^n a_ix_i^\rho\right)^{\frac{1}{\rho}-1}x_n^{\rho-1}\right)$$

and

$$d(f\circ g)(x)(h)=\sum_{i=1}^n a_i\left(\sum_{j=1}^n a_jx_j^\rho\right)^{\frac{1}{\rho}-1}x_i^{\rho-1}h_i=\left(\sum_{j=1}^n a_jx_j^\rho\right)^{\frac{1}{\rho}-1}\sum_{i=1}^n a_ix_i^{\rho-1}h_i$$
(iv) Let g : Rⁿ → R and f : R₊₊ → R be given by g(x) = Σⁿᵢ₌₁ aᵢe^(ρxᵢ) and f(x) = (1/ρ) log x, with aᵢ ∈ R and ρ ≠ 0, so that f ∘ g : Rⁿ → R is

$$(f\circ g)(x)=\frac{1}{\rho}\log\sum_{i=1}^n a_ie^{\rho x_i}$$

We have

$$\nabla g(x)=\left(\frac{\partial g}{\partial x_1}(x),\dots,\frac{\partial g}{\partial x_n}(x)\right)=\left(\rho a_1e^{\rho x_1},\dots,\rho a_ne^{\rho x_n}\right)$$

so that

$$\nabla(f\circ g)(x)=\left(f'(g(x))\frac{\partial g(x)}{\partial x_1},\dots,f'(g(x))\frac{\partial g(x)}{\partial x_n}\right)=\left(\frac{\rho a_1e^{\rho x_1}}{\rho\sum_{i=1}^n a_ie^{\rho x_i}},\dots,\frac{\rho a_ne^{\rho x_n}}{\rho\sum_{i=1}^n a_ie^{\rho x_i}}\right)$$

$$=\left(\frac{a_1e^{\rho x_1}}{\sum_{i=1}^n a_ie^{\rho x_i}},\dots,\frac{a_ne^{\rho x_n}}{\sum_{i=1}^n a_ie^{\rho x_i}}\right)$$

and

$$d(f\circ g)(x)(h)=\sum_{i=1}^n\frac{a_ie^{\rho x_i}}{\sum_{j=1}^n a_je^{\rho x_j}}h_i=\frac{1}{g(x)}\sum_{i=1}^n a_ie^{\rho x_i}h_i$$
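The gradient formula of point (iv) can be checked numerically; note that the exponent symbol ρ follows our reconstruction of the garbled display above. A Python sketch of ours (not part of the text):

```python
import math

def lse(x, a, rho):
    # (f o g)(x) = (1/rho) * log( sum_i a_i e^{rho x_i} )
    return (1 / rho) * math.log(sum(ai * math.exp(rho * xi) for ai, xi in zip(a, x)))

def grad_formula(x, a, rho):
    # component i: a_i e^{rho x_i} / sum_j a_j e^{rho x_j}
    denom = sum(aj * math.exp(rho * xj) for aj, xj in zip(a, x))
    return [ai * math.exp(rho * xi) / denom for ai, xi in zip(a, x)]

def grad_numeric(x, a, rho, h=1e-6):
    # central-difference gradient of the composite function
    out = []
    for i in range(len(x)):
        xp = list(x); xm = list(x)
        xp[i] += h; xm[i] -= h
        out.append((lse(xp, a, rho) - lse(xm, a, rho)) / (2 * h))
    return out

x, a, rho = [0.1, 0.4, -0.2], [1.0, 2.0, 0.5], 1.5
print(grad_formula(x, a, rho))
print(grad_numeric(x, a, rho))
```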
$$\frac{\partial f}{\partial x_i}:U\to\mathbb{R}$$

$$\frac{\partial f}{\partial x_1}(x)=e^{x_2}\qquad\text{and}\qquad\frac{\partial f}{\partial x_2}(x)=x_1e^{x_2}$$
Hence, it makes sense to talk about the existence of partial derivatives of the partial derivative functions ∂f/∂xᵢ : U → R at a point x ∈ U. In this case, for every i, j = 1, ..., n we have the partial derivative

$$\frac{\partial\left(\frac{\partial f}{\partial x_i}\right)}{\partial x_j}(x)$$

with respect to xⱼ of the partial derivative ∂f/∂xᵢ. These partial derivatives are called second-order partial derivatives of f and are denoted by

$$\frac{\partial^2 f}{\partial x_i\partial x_j}(x)$$

or by f''ₓᵢₓⱼ. When i = j we write

$$\frac{\partial^2 f}{\partial x_i^2}(x)$$

instead of ∂²f/∂xᵢ∂xᵢ. Using this notation, we can construct the matrix
$$\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2}(x) & \dfrac{\partial^2 f}{\partial x_1\partial x_2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_1\partial x_n}(x)\\[1ex]
\dfrac{\partial^2 f}{\partial x_2\partial x_1}(x) & \dfrac{\partial^2 f}{\partial x_2^2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_2\partial x_n}(x)\\[1ex]
\vdots & \vdots & \ddots & \vdots\\[1ex]
\dfrac{\partial^2 f}{\partial x_n\partial x_1}(x) & \dfrac{\partial^2 f}{\partial x_n\partial x_2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}(x)
\end{bmatrix}$$
The second-order partial derivatives can, in turn, be seen as functions of several variables. We can therefore look for their partial derivatives, which (if they exist) are called the third-order partial derivatives. We can then move to their partial derivatives (if they exist) and get the fourth-order derivatives, and so on.
For instance, going back to the previous example, consider the partial derivative

$$\frac{\partial^2 f}{\partial x_1\partial x_2}(x)=(1+x_1x_2)e^{x_1x_2}$$
Theorem 962 (Schwarz) Let f : U → R be a function that has second-order partial derivatives on U. If they are continuous at x ∈ U, then

$$\frac{\partial^2 f}{\partial x_i\partial x_j}(x)=\frac{\partial^2 f}{\partial x_j\partial x_i}(x) \tag{21.19}$$

Proof For simplicity we consider the case n = 2. In this case, (21.19) reduces to:

$$\frac{\partial^2 f}{\partial x_1\partial x_2}=\frac{\partial^2 f}{\partial x_2\partial x_1} \tag{21.20}$$

Again for simplicity, we also assume that the domain is the whole space R², so that we consider a function f : R² → R. By definition,

$$\frac{\partial f}{\partial x_1}(x)=\lim_{h_1\to 0}\frac{f(x_1+h_1,x_2)-f(x_1,x_2)}{h_1}$$
21.3. PARTIAL DERIVATIVES OF HIGHER ORDER 665
and therefore:

$$\frac{\partial^2 f}{\partial x_1\partial x_2}(x)=\lim_{h_2\to 0}\frac{\frac{\partial f}{\partial x_1}(x_1,x_2+h_2)-\frac{\partial f}{\partial x_1}(x_1,x_2)}{h_2}$$
$$=\lim_{h_2\to 0}\frac{1}{h_2}\left[\lim_{h_1\to 0}\frac{f(x_1+h_1,x_2+h_2)-f(x_1,x_2+h_2)}{h_1}-\lim_{h_1\to 0}\frac{f(x_1+h_1,x_2)-f(x_1,x_2)}{h_1}\right]$$

Setting Δ(h₁, h₂) = f(x₁+h₁, x₂+h₂) − f(x₁, x₂+h₂) − f(x₁+h₁, x₂) + f(x₁, x₂), we can write

$$\frac{\partial^2 f}{\partial x_1\partial x_2}(x)=\lim_{h_2\to 0}\lim_{h_1\to 0}\frac{\Delta(h_1,h_2)}{h_2h_1} \tag{21.21}$$

Consider in addition the scalar auxiliary function ψ₁ : R → R defined by ψ₁(x) = f(x, x₂+h₂) − f(x, x₂) for each x ∈ R. We have:

$$\psi_1'(x)=\frac{\partial f}{\partial x_1}(x,x_2+h_2)-\frac{\partial f}{\partial x_1}(x,x_2) \tag{21.22}$$

Moreover, by the Mean Value Theorem (applied twice) there exist z₁ ∈ (x₁, x₁+h₁) and z₂ ∈ (x₂, x₂+h₂) such that

$$\frac{\partial^2 f}{\partial x_2\partial x_1}(z_1,z_2)=\frac{\Delta(h_1,h_2)}{h_2h_1} \tag{21.25}$$
Therefore,

$$\frac{\partial^2 f}{\partial x_1\partial x_2}(x)=\lim_{h_2\to 0}\lim_{h_1\to 0}\frac{\partial^2 f}{\partial x_2\partial x_1}(z_1,z_2) \tag{21.26}$$

Since ∂²f/∂x₂∂x₁ is continuous at x,

$$\lim_{h_2\to 0}\lim_{h_1\to 0}\frac{\partial^2 f}{\partial x_2\partial x_1}(z_1,z_2)=\frac{\partial^2 f}{\partial x_2\partial x_1}(x_1,x_2) \tag{21.27}$$
Thus, when they are continuous, the order in which we take partial derivatives does not matter: we can compute first the partial derivative with respect to xᵢ and then the one with respect to xⱼ, or vice versa, with the same result. So, we can choose the way that seems computationally easier, obtaining then "for free" the other second-order partial derivative. This simplifies considerably the computation of derivatives and, moreover, results in an elegant symmetry property of the Hessian matrix.
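Schwarz's theorem can be illustrated numerically by computing the two mixed partials in both orders with nested central differences. In the Python sketch below (ours, not part of the text) we take f(x₁, x₂) = e^(x₁x₂), a choice consistent with the mixed partial (1 + x₁x₂)e^(x₁x₂) computed above:

```python
import math

def d_dx1(f, x1, x2, h=1e-5):
    return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)

def d_dx2(f, x1, x2, h=1e-5):
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

f = lambda x1, x2: math.exp(x1 * x2)

# mixed partial computed in both orders: first x1 then x2, and vice versa
f12 = d_dx2(lambda a, b: d_dx1(f, a, b), 0.5, 0.3)
f21 = d_dx1(lambda a, b: d_dx2(f, a, b), 0.5, 0.3)
print(f12, f21)  # equal up to numerical error, by Schwarz's theorem
```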
The reader can verify that: (i) f has continuous partial derivatives ∂f/∂x₁ and ∂f/∂x₂; (ii) f has second-order partial derivatives ∂²f/∂x₁∂x₂ and ∂²f/∂x₂∂x₁ defined on all of R², but discontinuous at the origin (0, 0). Therefore, the hypothesis of continuity of the second-order
21.4. TAKING STOCK: THE NATURAL DOMAIN OF ANALYSIS 667
partial derivatives of Schwarz's Theorem does not hold at the origin, so the theorem cannot say anything about the behavior of these derivatives at the origin. Let us calculate them:

$$\frac{\partial^2 f}{\partial x_1\partial x_2}(0,0)=-1\qquad\text{and}\qquad\frac{\partial^2 f}{\partial x_2\partial x_1}(0,0)=1$$

So,

$$\frac{\partial^2 f}{\partial x_1\partial x_2}(0,0)\neq\frac{\partial^2 f}{\partial x_2\partial x_1}(0,0)$$

The continuity of the second-order partial derivatives is, therefore, needed for the validity of equality (21.19). N
This limit represents the infinitesimal increments of the function f at the point x when we move along the direction determined by the vector y of Rⁿ, which is no longer required to be a versor eⁱ. The limit tells us the "incremental" behavior of the function when we move along the straight line

$$\langle x,x+y\rangle=\{(1-h)x+h(x+y) : h\in\mathbb{R}\}$$

Each y ∈ Rⁿ identifies a line and, therefore, gives us a direction along which we can study the increments of the function.
Not all lines ⟨x, x+y⟩ identify different directions: the next result shows that, given a vector y ∈ Rⁿ, all vectors λy identify the same direction provided λ ≠ 0.
Proof "If". Let y' = λy with λ ≠ 0. Since

$$x+y'=x+\lambda y=(1-\lambda)x+\lambda(x+y)$$

we have x + y' ∈ ⟨x, x+y⟩. This implies ⟨x, x+y'⟩ ⊆ ⟨x, x+y⟩. Since y = (1/λ)y', by proceeding in a similar way we can prove that ⟨x, x+y⟩ ⊆ ⟨x, x+y'⟩. We conclude that ⟨x, x+y⟩ = ⟨x, x+y'⟩. "Only if". Suppose that ⟨x, x+y'⟩ = ⟨x, x+y⟩. Suppose y ≠ y' (otherwise the result is trivially true). At least one of them then has to be non-zero, say y'. Since x + y' ∈ ⟨x, x+y⟩ and y' ≠ 0, there exists h ≠ 0 such that x + y' = (1−h)x + h(x+y). This implies y' = hy and therefore, by setting λ = h, we have the desired result.
The next corollary shows that this redundancy of the directions translates, in a simple and elegant way, into the homogeneity of the directional derivative, a property that permits us to determine the value f'(x, λy) for every scalar λ once we know the value of f'(x, y).
Proof We have

$$f'(x,\lambda y)=\lim_{h\to 0}\frac{f(x+(\lambda h)y)-f(x)}{h}=\lambda\lim_{\lambda h\to 0}\frac{f(x+(\lambda h)y)-f(x)}{\lambda h}=\lambda f'(x,y)$$
Partial derivatives are nothing but the directional derivatives computed along the fundamental directions in Rⁿ represented by the versors eⁱ. That is,

$$f'\left(x;e^i\right)=\frac{\partial f(x)}{\partial x_i}$$

for each i = 1, 2, ..., n. So, functions that are derivable at x are partially derivable there. The converse is false, as the next example shows.
$$f(x_1,x_2)=\begin{cases}0 & \text{if } x_1x_2=0\\ 1 & \text{if } x_1x_2\neq 0\end{cases}$$

is partially derivable at the origin. However, it is not derivable at the origin 0 = (0, 0). Indeed, consider x = 0 and y = (1, 1). We have

$$\frac{f(x+hy)-f(x)}{h}=\frac{f(h,h)}{h}=\frac{1}{h}\qquad\forall h\neq 0$$

so the limit (21.29) does not exist, and the function is not derivable at 0. N
In sum, partial derivability is a weaker notion than derivability, something not surprising (indeed, the former notion controls only two directions out of the infinitely many controlled by the latter notion).
21.5.2 Algebra
Like that of partial derivatives, the calculus of directional derivatives can also be reduced to the calculus of ordinary derivatives of scalar functions. Given a point x ∈ Rⁿ and a direction y ∈ Rⁿ, define an auxiliary scalar function ψ by ψ(h) = f(x + hy) for every h ∈ R. The domain of ψ is the set {h ∈ R : x + hy ∈ U}, which is an open set in R containing the point 0. By definition of right-sided derivative, we have

$$\psi_+'(0)=\lim_{h\to 0^+}\frac{\psi(h)-\psi(0)}{h}=\lim_{h\to 0^+}\frac{f(x+hy)-f(x)}{h}$$

and therefore

$$f'(x;y)=\psi_+'(0) \tag{21.31}$$

The derivative f'(x, y) can therefore be seen as the right-sided ordinary derivative of the scalar function ψ computed at the point 0. Naturally, when ψ is differentiable at 0, (21.31) reduces to f'(x, y) = ψ'(0).
Example 969 (i) Let f : R³ → R be defined by f(x₁, x₂, x₃) = x₁² + x₂² + x₃². Compute the directional derivative of f at x = (1, −1, 2) along the direction y = (2, 3, 5). We have:

$$\psi(h)=f(x+hy)=f(1+2h,-1+3h,2+5h)$$

and therefore

$$\psi(h)=(1+2h)^2+(-1+3h)^2+(2+5h)^2=38h^2+18h+6$$

It follows that ψ'(h) = 76h + 18 and, by (21.31), we conclude that f'(x, y) = ψ'(0) = 18.
(ii) Let us generalize the previous example and consider the function f : Rⁿ → R defined by f(x) = ‖x‖². We have

$$\psi'(h)=\frac{d}{dh}\sum_{i=1}^n(x_i+hy_i)^2=2\sum_{i=1}^n y_i(x_i+hy_i)=2y\cdot(x+hy)$$

Therefore, f'(x, y) = ψ'(0) = 2x · y. The directional derivative of f(x) = ‖x‖² thus exists at all points and along all possible directions, that is, f is derivable on Rⁿ. Its general form is

$$f'(x;y)=2x\cdot y$$

In the special direction y = (2, 3, 5) of point (i), we indeed have f'(x, y) = 2(1, −1, 2) · (2, 3, 5) = 18.
(iii) Consider the function f : R² → R defined by

$$f(x_1,x_2)=\begin{cases}\dfrac{x_1x_2^2}{x_1^2+x_2^2} & \text{if } (x_1,x_2)\neq(0,0)\\[1ex] 0 & \text{if } (x_1,x_2)=(0,0)\end{cases}$$

Consider the origin 0 = (0, 0). For every y ∈ R² we have ψ(h) = f(hy) = hy₁y₂²/(y₁² + y₂²) and so f'(0; y) = ψ'(0) = y₁y₂²/(y₁² + y₂²). In conclusion,

$$f'(0;y)=f(y)$$

for every y ∈ R². So, the function f is derivable at the origin and equals its own directional derivative there. N
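The auxiliary-function approach of this section translates directly into a numerical recipe. The Python sketch below (ours, not part of the text) approximates f'(x; y) via ψ(h) = f(x + hy) and reproduces the value 18 obtained above, with the point taken as x = (1, −1, 2):

```python
def directional(f, x, y, h=1e-6):
    # f'(x; y) via the auxiliary scalar function psi(h) = f(x + h y)
    psi = lambda t: f([xi + t * yi for xi, yi in zip(x, y)])
    return (psi(h) - psi(-h)) / (2 * h)

f = lambda x: sum(xi ** 2 for xi in x)  # f(x) = ||x||^2
x, y = [1.0, -1.0, 2.0], [2.0, 3.0, 5.0]
print(directional(f, x, y))  # approx 2 x . y = 18
```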
Using the auxiliary functions ψ, it is easy to prove that the usual algebraic rules hold for directional derivatives:
We already learned that partial derivability does not imply differentiability (Example 953). Now we learn that even full-fledged derivability is not enough to imply differentiability. It is, indeed, not even enough to imply continuity: there exist functions that are derivable at some point but discontinuous there, as the following example shows.
$$\lim_{h\to 0}\frac{h^6y_1^4y_2^2}{h^5\left(h^4y_1^8+y_2^4\right)}=\lim_{h\to 0}\frac{hy_1^4y_2^2}{h^4y_1^8+y_2^4}=0$$

Therefore, f'(0; y) = 0 for every y ∈ R² and the directional derivative at the origin 0 is then the null linear function. It follows that f is derivable at 0. However, it is not continuous at 0 (a fortiori, it is not differentiable at 0 by Theorem 952). Indeed, consider the points (t, t²) ∈ R² that lie on the graph of the parabola x₂ = x₁². We have

$$f\left(t,t^2\right)=\frac{t^4\left(t^2\right)^2}{t^8+\left(t^2\right)^4}=\frac{t^8}{t^8+t^8}=\frac{1}{2}$$

Along these points the function is constant and takes on value 1/2. It follows that lim_{t→0} f(t, t²) = 1/2 and, since f(0) = 0, the function is discontinuous at 0. N
differentiability implies derivability (Theorem 970), but not vice versa when n ≥ 2 (Example 971);
These relations sharpen some of the findings of Section 21.2.1 on partial derivability.
This definition generalizes Definition 951, which is the special case m = 1. The linear approximation is now given by a linear operator with values in Rᵐ, while in the numerator of the incremental ratio in (21.33) we find a norm instead of an absolute value because we now have to deal with vectors in Rᵐ.
The differential for operators satisfies properties similar to those we saw in the case m = 1. Naturally, instead of the vector representation of Theorem 952 we now have a more general matrix representation based on the operator version of Riesz's Theorem (Theorem 564). To see its form, we introduce the Jacobian matrix. Recall that an operator f : U → Rᵐ can be regarded as an m-tuple (f₁, ..., fₘ) of functions defined on U and with values in R. The Jacobian matrix Df(x) of an operator f : U → Rᵐ at x ∈ U is, then, an m × n matrix given by:

$$Df(x)=\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1}(x) & \dfrac{\partial f_1}{\partial x_2}(x) & \cdots & \dfrac{\partial f_1}{\partial x_n}(x)\\[1ex]
\dfrac{\partial f_2}{\partial x_1}(x) & \dfrac{\partial f_2}{\partial x_2}(x) & \cdots & \dfrac{\partial f_2}{\partial x_n}(x)\\[1ex]
\vdots & \vdots & & \vdots\\[1ex]
\dfrac{\partial f_m}{\partial x_1}(x) & \dfrac{\partial f_m}{\partial x_2}(x) & \cdots & \dfrac{\partial f_m}{\partial x_n}(x)
\end{bmatrix}$$

that is,

$$Df(x)=\begin{bmatrix}\nabla f_1(x)\\ \nabla f_2(x)\\ \vdots\\ \nabla f_m(x)\end{bmatrix} \tag{21.34}$$
rfm (x)
We can now give the matrix representation of differentials, which shows that the Jacobian matrix Df(x) is, indeed, the matrix associated with the linear operator df(x). This representation generalizes the vector representation of Theorem 952 because the Jacobian matrix Df(x) reduces to the gradient ∇f(x) in the special case m = 1.
Proof We begin by considering a simple property of the norm. Let x = (x₁, ..., xₙ) ∈ Rⁿ. For every j = 1, ..., n we have:

$$|x_j|=\sqrt{x_j^2}\leq\sqrt{\sum_{j=1}^n x_j^2}=\|x\| \tag{21.35}$$

By differentiability, taking h = teʲ (so that ‖h‖ = |t|), we have

$$\lim_{t\to 0}\frac{\left\|f\left(x+te^j\right)-f(x)-df(x)\left(te^j\right)\right\|}{|t|}=0 \tag{21.36}$$
By inequality (21.35), each component of the vector inside the norm in (21.36) tends to zero as well:

    lim_{t→0} | (fᵢ(x + teʲ) − fᵢ(x))/t − dfᵢ(x)(eʲ) | = 0

for each i = 1, …, m. We can therefore conclude that, for every i = 1, …, m and every j = 1, …, n, we have:

    ∂fᵢ/∂xⱼ(x) = lim_{t→0} (fᵢ(x + teʲ) − fᵢ(x))/t = dfᵢ(x)(eʲ)        (21.37)
The matrix associated to a linear operator f : Rⁿ → Rᵐ is (Theorem 564):

    A = [ f(e¹)  f(e²)  ⋯  f(eⁿ) ]
In our case, thanks to (21.37) we therefore have

    A = [ df(x)(e¹)  ⋯  df(x)(eⁿ) ]

      = ⎡ df₁(x)(e¹)  df₁(x)(e²)  ⋯  df₁(x)(eⁿ) ⎤
        ⎢ df₂(x)(e¹)  df₂(x)(e²)  ⋯  df₂(x)(eⁿ) ⎥
        ⎢      ⋮                                  ⎥
        ⎣ dfₘ(x)(e¹)  dfₘ(x)(e²)  ⋯  dfₘ(x)(eⁿ) ⎦

      = ⎡ ∂f₁/∂x₁(x)  ∂f₁/∂x₂(x)  ⋯  ∂f₁/∂xₙ(x) ⎤
        ⎢ ∂f₂/∂x₁(x)  ∂f₂/∂x₂(x)  ⋯  ∂f₂/∂xₙ(x) ⎥  =  Df(x)
        ⎢      ⋮            ⋮               ⋮     ⎥
        ⎣ ∂fₘ/∂x₁(x)  ∂fₘ/∂x₂(x)  ⋯  ∂fₘ/∂xₙ(x) ⎦
as desired.
Example 977 Let f : R → R³ be defined by f(x) = (x, sin x, cos x). For example, if x = π, then f(x) = (π, 0, −1) ∈ R³. We have f₁′(x) = 1, f₂′(x) = cos x, and f₃′(x) = −sin x, and so

    Df(x) = ⎡   1    ⎤
            ⎢  cos x ⎥
            ⎣ −sin x ⎦

By Theorem 974, the differential at x is given by the linear operator df(x) : R → R³ defined by

    df(x)(h) = Df(x) h = (h, h cos x, −h sin x)

for each h ∈ R. For example, at x = π we have df(x)(h) = (h, −h, 0). N
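The Jacobian in this example is easy to check numerically. The following sketch (in Python, an illustration not part of the text) compares a central finite-difference approximation with Df(x) = [1; cos x; −sin x] at x = π:

```python
import math

def f(x):
    # the operator f : R -> R^3 of Example 977
    return (x, math.sin(x), math.cos(x))

def df_dx(x, h=1e-6):
    # central finite differences approximate the single Jacobian column
    fp, fm = f(x + h), f(x - h)
    return [(p - m) / (2 * h) for p, m in zip(fp, fm)]

x = math.pi
approx = df_dx(x)
exact = [1.0, math.cos(x), -math.sin(x)]  # Df(x) = [1; cos x; -sin x]
```
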
Example 978 Let f : Rⁿ → Rᵐ be the linear operator defined by f(x) = Ax, with

    A = ⎡ a₁₁  a₁₂  ⋯  a₁ₙ ⎤
        ⎢ a₂₁  a₂₂  ⋯  a₂ₙ ⎥
        ⎢  ⋮                ⎥
        ⎣ aₘ₁  aₘ₂  ⋯  aₘₙ ⎦

Let a₁, …, aₘ be the row vectors of A, that is, a₁ = (a₁₁, a₁₂, …, a₁ₙ), …, aₘ = (aₘ₁, aₘ₂, …, aₘₙ). We have fᵢ(x) = aᵢ · x, so ∇fᵢ(x) = aᵢ for each i = 1, …, m, which implies Df(x) = A. Hence, the Jacobian matrix of a linear operator coincides with the associated matrix A. By Theorem 974, the differential at x is therefore given by the linear operator df(x)(h) = Ah, that is, by f itself. This naturally generalizes the well-known result that for scalar functions of the form f(x) = ax, with a ∈ R, the differential is df(x)(h) = ah. N
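Numerically, the content of this example is that finite-difference differentiation of x ↦ Ax returns A at every base point. A small sketch (the matrix and the base point are arbitrary choices for illustration):

```python
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # arbitrary 2x3 matrix

def f(x):
    # the linear operator f(x) = Ax
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def jacobian(F, x, h=1e-6):
    # central finite-difference Jacobian of F : R^n -> R^m at x
    m, n = len(F(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

J = jacobian(f, [0.7, -1.3, 2.1])  # any base point works: Df(x) = A everywhere
```
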
The right-hand side is the product of the linear operators df(g(x)) and dg(x). By Theorem 569, its matrix representation is given by the product Df(g(x)) Dg(x) of the Jacobian matrices. We thus have the fundamental chain rule formula:

    D(f ∘ g)(x) = Df(g(x)) Dg(x)

In the scalar case n = m = q = 1, the rule takes its basic form (f ∘ g)′(x) = f′(g(x)) g′(x) studied in Proposition 925.
Another important special case is q = 1. In this case we have f : B ⊆ Rᵐ → R and g = (g₁, …, gₘ) : U ⊆ Rⁿ → Rᵐ, with g(U) ⊆ B. For the composite function f ∘ g : U ⊆ Rⁿ → R the chain rule takes the form:

    ∇(f ∘ g)(x) = ∇f(g(x)) Dg(x)

                = ( ∂f/∂x₁(g(x)), …, ∂f/∂xₘ(g(x)) ) ⎡ ∂g₁/∂x₁(x)  ∂g₁/∂x₂(x)  ⋯  ∂g₁/∂xₙ(x) ⎤
                                                     ⎢ ∂g₂/∂x₁(x)  ∂g₂/∂x₂(x)  ⋯  ∂g₂/∂xₙ(x) ⎥
                                                     ⎢      ⋮                                  ⎥
                                                     ⎣ ∂gₘ/∂x₁(x)  ∂gₘ/∂x₂(x)  ⋯  ∂gₘ/∂xₙ(x) ⎦

                = ( Σᵢ₌₁ᵐ ∂f/∂xᵢ(g(x)) ∂gᵢ/∂x₁(x), …, Σᵢ₌₁ᵐ ∂f/∂xᵢ(g(x)) ∂gᵢ/∂xₙ(x) )

Grouping the terms for ∂f/∂xᵢ, we get the following equivalent form:

    d(f ∘ g)(x)(h) = Σᵢ₌₁ⁿ ∂f/∂x₁(g(x)) ∂g₁/∂xᵢ(x) hᵢ + ⋯ + Σᵢ₌₁ⁿ ∂f/∂xₘ(g(x)) ∂gₘ/∂xᵢ(x) hᵢ

This is the formula of the total differential for the composite function f ∘ g. The total variation d(f ∘ g) of f ∘ g is the sum of the effects on the function f of the variations of the single functions gᵢ determined by infinitesimal variations dxᵢ of the different variables.
The differential is

    d(f ∘ g)(x)(h) = Σᵢ₌₁ᵐ ∂f/∂xᵢ(g(x)) (dgᵢ/dx)(x) h
Consider, for instance, g : R³ → R² and f : R² → R² given by g(x) = (2x₁² + x₂ + x₃, x₁ − x₂⁴) and f(x) = (x₁, x₁x₂), so that

    Dg(x) = ⎡ 4x₁    1     1 ⎤
            ⎣  1   −4x₂³   0 ⎦

    Df(x) = ⎡ 1    0  ⎤
            ⎣ x₂   x₁ ⎦

and therefore

    Df(g(x)) = ⎡ 1             0             ⎤
               ⎣ x₁ − x₂⁴   2x₁² + x₂ + x₃  ⎦

It follows that:

    Df(g(x)) Dg(x) = ⎡ 1             0             ⎤ ⎡ 4x₁    1     1 ⎤
                     ⎣ x₁ − x₂⁴   2x₁² + x₂ + x₃  ⎦ ⎣  1   −4x₂³   0 ⎦

                   = ⎡ 4x₁                         1                                  1       ⎤
                     ⎣ 6x₁² − 4x₁x₂⁴ + x₂ + x₃   x₁ − 8x₁²x₂³ − 5x₂⁴ − 4x₂³x₃   x₁ − x₂⁴ ⎦

so that

    d(f ∘ g)(x)(h) = ⎡ 4x₁                         1                                  1       ⎤ ⎡ h₁ ⎤
                     ⎣ 6x₁² − 4x₁x₂⁴ + x₂ + x₃   x₁ − 8x₁²x₂³ − 5x₂⁴ − 4x₂³x₃   x₁ − x₂⁴ ⎦ ⎢ h₂ ⎥
                                                                                               ⎣ h₃ ⎦
Naturally, though it is in general more complicated, the Jacobian matrix of the composition f ∘ g can be computed directly, without using the chain rule, by writing explicitly the form of f ∘ g and computing its partial derivatives. In this example, f ∘ g : R³ → R² is given by

    (f ∘ g)(x) = ( 2x₁² + x₂ + x₃ , (2x₁² + x₂ + x₃)(x₁ − x₂⁴) )

and we have:

    ∂(f∘g)₁/∂x₁ = 4x₁,    ∂(f∘g)₁/∂x₂ = 1,    ∂(f∘g)₁/∂x₃ = 1

    ∂(f∘g)₂/∂x₁ = 6x₁² − 4x₁x₂⁴ + x₂ + x₃

    ∂(f∘g)₂/∂x₂ = x₁ − 8x₁²x₂³ − 5x₂⁴ − 4x₂³x₃

    ∂(f∘g)₂/∂x₃ = x₁ − x₂⁴

The Jacobian matrix

    ⎡ ∂(f∘g)₁/∂x₁   ∂(f∘g)₁/∂x₂   ∂(f∘g)₁/∂x₃ ⎤
    ⎣ ∂(f∘g)₂/∂x₁   ∂(f∘g)₂/∂x₂   ∂(f∘g)₂/∂x₃ ⎦

thus coincides with the product Df(g(x)) Dg(x) computed via the chain rule.
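The chain rule computation above can be verified numerically: differentiating the composite f ∘ g directly by finite differences should reproduce, at any point, the product Df(g(x)) Dg(x). A sketch at the (arbitrarily chosen) point x = (1, 1, 1), where the product evaluates to [[4, 1, 1], [4, −16, 0]]:

```python
def g(x1, x2, x3):
    # g : R^3 -> R^2 from the example
    return (2 * x1**2 + x2 + x3, x1 - x2**4)

def f(y1, y2):
    # f : R^2 -> R^2 from the example
    return (y1, y1 * y2)

def composite(x1, x2, x3):
    return f(*g(x1, x2, x3))

def jacobian(F, x, h=1e-6):
    # central finite-difference Jacobian of F at x
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(*xp), F(*xm)
        cols.append([(p - m) / (2 * h) for p, m in zip(Fp, Fm)])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows

J = jacobian(composite, (1.0, 1.0, 1.0))
chain_rule = [[4.0, 1.0, 1.0], [4.0, -16.0, 0.0]]  # Df(g(x)) Dg(x) at (1,1,1)
```
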
Proof Fix x ∈ Rⁿ₊₊ and consider the scalar function φ : (0, ∞) → R defined by φ(t) = f(tx). If we define g : (0, ∞) → Rⁿ₊₊ by g(t) = tx, we can write φ = f ∘ g. By (21.41), we have φ′(t) = ∇f(tx) · x. On the other hand, homogeneity of degree α implies φ(t) = tᵅ f(x), so φ′(t) = α t^(α−1) f(x). We conclude that ∇f(tx) · x = α t^(α−1) f(x). For t = 1, this is Euler's Formula.

Equality (21.42) is called Euler's Formula.¹³ The most interesting cases are α = 0 and α = 1. For instance, the indirect utility function v : Rⁿ₊₊ × R₊ → R is easily seen to be homogeneous of degree 0 (cf. Proposition 848). By Euler's Formula, we have:

    Σᵢ₌₁ⁿ (∂v(p, w)/∂pᵢ) pᵢ = −(∂v(p, w)/∂w) w
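Euler's Formula is easy to test numerically. As an illustration (the Cobb–Douglas function and its exponent are our own choices, not from the text), take f(x₁, x₂) = x₁^0.3 x₂^0.7, which is homogeneous of degree 1, so that x · ∇f(x) should equal f(x):

```python
ALPHA = 0.3  # arbitrary exponent; f below is homogeneous of degree 1

def f(x1, x2):
    # Cobb-Douglas function f(x1, x2) = x1^a * x2^(1-a)
    return x1**ALPHA * x2**(1 - ALPHA)

def partial(i, x, h=1e-6):
    # central finite-difference partial derivative of f
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(*xp) - f(*xm)) / (2 * h)

x = (2.0, 3.0)
euler_lhs = sum(x[i] * partial(i, x) for i in range(2))  # x . grad f(x)
euler_rhs = f(*x)                                        # degree 1: equals f(x)
```
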
Fix ε > 0. By (21.46), there exists δ_ε > 0 such that ‖k‖ ≤ δ_ε implies ‖σ(k)‖ ≤ ε‖k‖. In other words, there exists δ_ε > 0 such that ‖g(x + h) − g(x)‖ ≤ δ_ε implies

    ‖σ(g(x + h) − g(x))‖ / ‖g(x + h) − g(x)‖ ≤ ε

On the other hand, since g is continuous at x, there exists δ₁ > 0 such that ‖h‖ ≤ δ₁ implies ‖g(x + h) − g(x)‖ ≤ δ_ε. Therefore, for ‖h‖ sufficiently small we have ‖σ(g(x + h) − g(x))‖ ≤ ε‖g(x + h) − g(x)‖. By applying Lemma 730 to the linear operator dg(x), there exists k > 0 such that

    ‖σ(g(x + h) − g(x))‖ ≤ ε‖g(x + h) − g(x)‖ = ε‖ρ(h) + dg(x)(h)‖        (21.47)
                          ≤ ε‖ρ(h)‖ + ε‖dg(x)(h)‖ ≤ ε‖ρ(h)‖ + εk‖h‖
Since ε was fixed arbitrarily, it can be taken as small as we like. Therefore, the ratio ‖σ(g(x + h) − g(x))‖ / ‖h‖ tends to 0 as h → 0, as desired.
Chapter 22

Differential methods
and so

    f′_{|U}(x₀) = lim_{h→0} (f(x₀ + h) − f(x₀)) / h

We can therefore consider directly the limit

    lim_{h→0} (f(x₀ + h) − f(x₀)) / h

and say that its value, denoted by f′(x₀), is the derivative of f at the interior point x₀ if it exists and is finite.

In sum, derivability and differentiability are local notions that use only the properties of the function in a neighborhood, however small, of the point at hand. They can therefore be defined at any interior point of any set.
    f′(x̂) = 0        (22.1)
Proof Let x̂ ∈ C be an interior point and a local maximizer on C (a similar argument holds if it is a local minimizer). There exists therefore B_ε(x̂) such that (18.21) holds, that is, f(x̂) ≥ f(x) for every x ∈ B_ε(x̂) ∩ C. For every h > 0 sufficiently small, that is, h ∈ (0, ε), we have x̂ + h ∈ B_ε(x̂). Hence

    (f(x̂ + h) − f(x̂)) / h ≤ 0        ∀h ∈ (0, ε)

which implies

    lim_{h→0⁺} (f(x̂ + h) − f(x̂)) / h ≤ 0        (22.2)

On the other hand, for every h < 0 sufficiently small, that is, h ∈ (−ε, 0), we have x̂ + h ∈ B_ε(x̂). Therefore,

    (f(x̂ + h) − f(x̂)) / h ≥ 0        ∀h ∈ (−ε, 0)

which implies

    lim_{h→0⁻} (f(x̂ + h) − f(x̂)) / h ≥ 0        (22.3)

Together, inequalities (22.2) and (22.3) imply that

    0 ≤ lim_{h→0⁻} (f(x̂ + h) − f(x̂)) / h = lim_{h→0} (f(x̂ + h) − f(x̂)) / h = lim_{h→0⁺} (f(x̂ + h) − f(x̂)) / h ≤ 0

so that

    f′(x̂) = lim_{h→0} (f(x̂ + h) − f(x̂)) / h = 0

as desired.
The first-order condition (22.1) will turn out to be key in solving optimization problems, hence the important instrumental role of local extremal points. Conceptually, it tells us that in order to maximize (or minimize) an objective function we need to consider what happens at the margin: a point cannot be a maximizer if there is still room for improvement through infinitesimal changes, be they positive or negative. At a maximizer, all marginal opportunities must have been exhausted.
The fundamental principle highlighted by the first-order condition is that, to maximize levels of utility (or of production, or of welfare, and so on), one needs to work at the margin. In economics, the understanding of this principle was greatly facilitated by a proper mathematical formalization of the optimization problem that made it possible to rely on differential calculus (and so on the shoulders of the giants who created it). What becomes crystal clear through calculus is highly non-trivial otherwise, in particular through a purely literary analysis. Only in the 1870s was the marginal principle fully understood, becoming the heart of the marginalist theory of value pioneered by Jevons, Menger, and Walras. This approach has continued to evolve since then (at first with the works of Edgeworth, Marshall, and Pareto) and, over the years, has shown a surprising ability to shed light on economic phenomena. In all this, the first-order condition and its generalizations (momentarily we will see its version for functions of several variables) is, like Shakespeare's Julius Caesar, the colossus that bestrides the economics world.
That said, let us continue with the analysis of Fermat's Theorem. It is important to focus on the following aspects:

(i) The hypothesis that x̂ is an interior point of C is essential for Fermat's Theorem. Indeed, consider for example f : R → R given by f(x) = x, and let C = [0, 1]. The boundary point x = 0 is a global minimizer of f on [0, 1], but f′(0) = 1 ≠ 0. In the same way, the boundary point x = 1 is a maximizer, but f′(1) = 1 ≠ 0. Therefore, if x is a boundary local extremal point, it is not necessarily true that f′(x) = 0.

(ii) Fermat's Theorem cannot be applied to functions that, even if they have interior maximizers or minimizers, are not differentiable at these points. A classic example is the function f : R → R given by f(x) = |x|: the point x = 0 is a global minimizer but f, at
¹ This heuristic argument can also be articulated as follows. Since f is derivable at x₀, we have f(x₀ + h) − f(x₀) = f′(x₀)h + o(h). Heuristically, we can set f(x₀ + h) − f(x₀) = f′(x₀)h by neglecting the term o(h). If f′(x₀) > 0, we have f(x₀ + h) > f(x₀) if h > 0, so a strict increase is strictly beneficial; if f′(x₀) < 0, we have f(x₀ + h) > f(x₀) if h < 0, so a strict decrease is strictly beneficial. Only if f′(x₀) = 0 can such strictly beneficial variations not occur, so f may be maximized at x₀.
that point, does not admit a derivative, so the condition f′(x) = 0 is not relevant in this case. Another example is the following.
Example 984 Let f : R → R be given by f(x) = ∛((x² − 5x + 6)²). Its derivative is

    f′(x) = (2/3)(x² − 5x + 6)^(−1/3) (2x − 5) = 2(2x − 5) / (3 ∛(x² − 5x + 6))

and so it does not exist where x² − 5x + 6 is zero, that is, at the two minimizers x = 2 and x = 3! The point x = 5/2 is such that f′(x) = 0 and is a local maximizer (being unbounded above, this function has no global maximizers). N
(iii) Lastly, the condition f′(x) = 0 is only necessary. The following simple example should not leave any doubt on this. For the cubic function f(x) = x³ we have f′(0) = 0, although the origin x₀ = 0 is neither a local maximizer nor a local minimizer.² Condition (22.1) is therefore necessary, but not sufficient, for a point to be a local extremum. N
We now address the multivariable version of Fermat's Theorem. In this case the first-order condition (22.1) takes the more general form (22.4), in which gradients replace derivatives.

We leave the proof to the reader. Indeed, mutatis mutandis, it is the same as that of Fermat's Theorem.³

The observations (i)-(iii), just made for the scalar case, continue to hold in the multivariable case. In particular, as in the scalar case the first-order condition is necessary, but not sufficient, as the next example shows.
The unique solution of this system is (0, 0), which in turn is the unique point in R² where f satisfies condition (22.4). It is easy to see that this point is neither a maximizer nor a minimizer. Indeed, if we consider any point (0, x₂) different from the origin on the vertical axis and any point (x₁, 0) different from the origin on the horizontal axis, we have

    f(0, x₂) = −x₂² < 0    and    f(x₁, 0) = x₁² > 0

that is, being f(0, 0) = 0,

    f(0, x₂) < f(0, 0) < f(x₁, 0)        ∀ 0 ≠ x₁, x₂ ∈ R

In every neighborhood of the point (0, 0) there are, therefore, both points in which the function is strictly positive and points in which it is strictly negative: the origin (0, 0) is a "saddle" point of f, which is neither a maximizer nor a minimizer. N
The points x̂ of Rⁿ such that ∇f(x̂) = 0 (in particular, for n = 1, the points such that f′(x̂) = 0) are said to be stationary points or critical points of f. Using this terminology, Theorem 986 can be paraphrased as saying that a necessary condition for an interior point x̂ to be a local minimizer or maximizer is to be stationary.
Example 988 Let f : R → R be given by f(x) = 10x³(x − 1)². The first-order condition (22.1) becomes

    10x²(x − 1)(5x − 3) = 0

and therefore the points that satisfy it are x = 0, x = 1, and x = 3/5. N
Example 989 Let f : R² → R be given by f(x₁, x₂) = 2x₁² + x₂² − 3(x₁ + x₂) + x₁x₂ − 3. We have

    ∇f(x) = (4x₁ − 3 + x₂, 2x₂ − 3 + x₁)

so here the first-order condition (22.4) assumes the form

    4x₁ − 3 + x₂ = 0
    2x₂ − 3 + x₁ = 0

It is easy to see that x = (3/7, 9/7) is the unique solution of the system, so it is the unique stationary point of f on R². N
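The linear system of this example can be solved exactly, for instance by Cramer's rule. A sketch in exact rational arithmetic:

```python
from fractions import Fraction

# first-order conditions of Example 989, rewritten as
#   4*x1 + 1*x2 = 3
#   1*x1 + 2*x2 = 3
a11, a12, b1 = Fraction(4), Fraction(1), Fraction(3)
a21, a22, b2 = Fraction(1), Fraction(2), Fraction(3)

det = a11 * a22 - a12 * a21       # determinant of the coefficient matrix
x1 = (b1 * a22 - a12 * b2) / det  # Cramer's rule
x2 = (a11 * b2 - b1 * a21) / det
```
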
We have ∇f(x) = (−4x₁³ + 4x₂, −4x₂³ + 4x₁), so the first-order condition is

    −4x₁³ + 4x₂ = 0
    −4x₂³ + 4x₁ = 0

that is,

    x₁³ = x₂
    x₂³ = x₁

The stationary points are (0, 0), (1, 1), and (−1, −1). Among them we have to look for the possible solutions of the unconstrained optimization problem
Theorem 991 (Rolle) Let f : [a, b] → R be continuous on [a, b], with f(a) = f(b), and differentiable on (a, b). Then, there exists (at least) one critical point x̂ ∈ (a, b), that is, a point x̂ ∈ (a, b) such that f′(x̂) = 0.
⁴ Recall that in Section 18.1 optimization problems were called unconstrained when C is open.
This theorem, which provides a simple sufficient condition for a function to have a critical point, has an immediate graphical intuition: since the graph starts and ends at the same height f(a) = f(b), at some intermediate point c the tangent line must be horizontal.
Proof By Weierstrass' Theorem, there exist x₁, x₂ ∈ [a, b] such that f(x₁) = min_{x∈[a,b]} f(x) and f(x₂) = max_{x∈[a,b]} f(x). Denote m = min_{x∈[a,b]} f(x) and M = max_{x∈[a,b]} f(x). If m = M, then f is constant, that is, f(x) = m = M, and therefore f′(x) = 0 for every x ∈ (a, b). If m < M, then at least one of the points x₁ and x₂ is interior to [a, b]. Indeed, they cannot both be boundary points because f(a) = f(b). If x₁ is an interior point of [a, b], that is, x₁ ∈ (a, b), then by Fermat's Theorem we have f′(x₁) = 0, so x̂ = x₁. Analogously, if x₂ ∈ (a, b), we have f′(x₂) = 0, and therefore x̂ = x₂.
Example 992 Let f : [−1, 1] → R be given by f(x) = √(1 − x²). This function is continuous on [−1, 1] and differentiable on (−1, 1). Since f(−1) = f(1) = 0, by Rolle's Theorem there exists a critical point x̂ ∈ (−1, 1), that is, a point such that f′(x̂) = 0. In particular, from

    f′(x) = −x (1 − x²)^(−1/2)

it follows that this point is x̂ = 0. N
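Rolle's Theorem only asserts existence, but the critical point can often be located numerically, for instance by bisection on the derivative. A sketch for Example 992 (the bracket [−0.9, 0.9], on which f′ changes sign, is our own choice):

```python
import math

def f(x):
    # Example 992: f(x) = sqrt(1 - x^2), with f(-1) = f(1) = 0
    return math.sqrt(1 - x * x)

def fprime(x, h=1e-7):
    # central finite-difference derivative
    return (f(x + h) - f(x - h)) / (2 * h)

# f'(-0.9) > 0 > f'(0.9): bisect the sign change guaranteed by Rolle
lo, hi = -0.9, 0.9
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    if fprime(lo) * fprime(mid) <= 0:
        hi = mid
    else:
        lo = mid
critical = (lo + hi) / 2  # numerically the point x_hat = 0
```
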
Given a function f : [a, b] → R, consider the points (a, f(a)) and (b, f(b)) of its graph. The straight line passing through these points has equation

    y = f(a) + [(f(b) − f(a)) / (b − a)] (x − a)        (22.6)

as the reader can verify by solving the system

    f(a) = ma + q
    f(b) = mb + q

This straight line plays a key role in the important Mean Value (or Lagrange's) Theorem, which we now state and prove.
Theorem 993 (Mean Value) Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Then, there exists x̂ ∈ (a, b) such that

    f′(x̂) = (f(b) − f(a)) / (b − a)        (22.7)
Rolle’s Theorem is the special case in which f (a) = f (b), so that condition (22.7)
becomes f 0 (^x) = 0.
Note that
f (b) f (a)
b a
is the slope of the straight line (22.6) passing through the points (a; f (a)) and (b; f (b)) of the
graph of f , while f 0 (x) is the slope of the straight line tangent to the graph of f at the point
(x; f (x)). The Mean Value Theorem establishes, therefore, a simple su¢ cient condition for
the existence of a point x ^ 2 (a; b) such that the straight line tangent at (^
x; f (^
x)) is parallel
to the straight line passing through the points (a; f (a)) and (b; f (b)). Graphically:
6
y
1
O a c b x
0
0 1 2 3 4 5
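The Mean Value point can also be computed in concrete cases. A sketch (the choice f(x) = x³ on [0, 2] is an arbitrary illustration): the secant slope is (f(2) − f(0))/2 = 4 and, since f′(x) = 3x², the theorem's point is x̂ = 2/√3.

```python
def f(x):
    return x ** 3  # illustrative choice

a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)  # secant slope: 4

def g(x):
    # g(x) = f'(x) - slope, with f'(x) = 3x^2 known in closed form
    return 3 * x ** 2 - slope

# g(a) < 0 < g(b): bisect to locate the Mean Value point
lo, hi = a, b
while hi - lo > 1e-12:
    mid = (lo + hi) / 2
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
x_hat = (lo + hi) / 2  # approximately 2 / sqrt(3)
```
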
Note that the increment f(b) − f(a) on the whole interval [a, b] can be written, thanks to the Mean Value Theorem, as

    f(b) − f(a) = f′(x̂)(b − a)

or, in an equivalent way, as

    f(b) − f(a) = f′(a + t̂(b − a))(b − a)

for a suitable 0 ≤ t̂ ≤ 1. Indeed, we have

    [a, b] = {(1 − t)a + tb : t ∈ [0, 1]} = {a + t(b − a) : t ∈ [0, 1]}

so every point x̂ ∈ [a, b] can be written in the form a + t̂(b − a) for a suitable t̂ ∈ [0, 1].
and therefore

    f′(x̂) − (f(b) − f(a)) / (b − a) = 0

That is, x̂ satisfies condition (22.7).
A first interesting application of the Mean Value Theorem shows that constant functions are characterized by having a zero derivative at every point.

Corollary 994 Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Then f′(x) = 0 for every x ∈ (a, b) if and only if f is constant, that is, if and only if there exists k ∈ R such that

    f(x) = k        ∀x ∈ [a, b]
Proof Let us prove the "only if", since the "if" is the simple property of derivatives seen in Example 907. Let x ∈ (a, b) and let us apply the Mean Value Theorem on the interval [a, x]. It yields a point x̂ ∈ (a, x) such that

    0 = f′(x̂) = (f(x) − f(a)) / (x − a)

that is, f(x) = f(a). Since x is any point in (a, b), it follows that f(x) = f(a) for any x ∈ [a, b). By the continuity of f at b, we also have f(a) = f(b).
This characterization of constant functions will prove important in the theory of integration. In particular, the following simple generalization of Corollary 994 will be key.

Corollary 995 Let f, g : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Then f′(x) = g′(x) for every x ∈ (a, b) if and only if there exists k ∈ R such that

    f(x) = g(x) + k        ∀x ∈ [a, b]

Two functions that have the same first derivative are, thus, equal up to an (additive) constant k.

Proof Here too we prove the "only if", the "if" being obvious. Let h : [a, b] → R be the auxiliary function h(x) = f(x) − g(x). We have h′(x) = f′(x) − g′(x) = 0 for every x ∈ (a, b). Therefore, by Corollary 994, h is constant on [a, b]. That is, there exists k ∈ R such that h(x) = k for every x ∈ [a, b], so f(x) = g(x) + k for every x ∈ [a, b].
Via higher-order derivatives, we next establish the ultimate version of the Mean Value Theorem.⁵

The Mean Value Theorem is the special case n = 1 because (22.7) can be equivalently written as

    f(b) − f(a) = f′(x̂)(b − a)

Formula (22.8) is a version of Taylor's formula, arguably the most important formula of calculus, which will be studied in detail later in the book (Chapter 23).
The function g is continuous on [a, b] and differentiable on (a, b). Some algebra shows that

    g′(x) = [k − f⁽ⁿ⁾(x)] (b − x)ⁿ⁻¹ / (n − 1)!
We close by noting that, as easily checked, there is a dual version of (22.8) involving the derivatives at the other endpoint of the interval:

    f(a) − f(b) = Σₖ₌₁ⁿ⁻¹ [f⁽ᵏ⁾(b) / k!] (a − b)ᵏ + [f⁽ⁿ⁾(x̂) / n!] (a − b)ⁿ        (22.9)

where, again, x̂ ∈ (a, b).
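As a numeric illustration of the n = 2 instance of this formula, with derivatives at a as in the version dual to (22.9) (the choice f = exp on [0, 1] is our own): the identity f(b) − f(a) = f′(a)(b − a) + f″(x̂)(b − a)²/2 forces exp(x̂) = 2(e − 2), and the resulting x̂ indeed lies strictly inside (0, 1):

```python
import math

a, b = 0.0, 1.0
# For f = exp and n = 2:
#   f(b) - f(a) = f'(a)(b - a) + f''(x_hat)/2 * (b - a)^2
# i.e.  e - 1 = 1 + exp(x_hat)/2,  so  exp(x_hat) = 2*(e - 2).
target = 2 * (math.e - 2)
x_hat = math.log(target)  # solves the remainder equation in closed form
```
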
Although it might be discontinuous, the derivative function still satisfies the intermediate value property of Lemma 493, as the next important result proves.

Theorem 998 (Darboux) Let f : [a, b] → R be differentiable, with f′(a) < f′(b). If

    f′(a) ≤ z ≤ f′(b)

then there exists a ≤ c ≤ b such that f′(c) = z. If f′ is strictly increasing, such c is unique.
Proof Let f′(a) < z < f′(b) (otherwise the result is trivially true). Set g(x) = f(x) − zx. We have g′(x) = f′(x) − z, and therefore g′(a) < 0 and g′(b) > 0. The function g is continuous on [a, b] and, therefore, by Weierstrass' Theorem it has a minimizer x_m on [a, b]. Let us prove that the minimizer x_m is interior. Since g′(a) < 0, there exists a point x₁ ∈ (a, b) such that g(x₁) < g(a). Analogously, being g′(b) > 0, there exists a point x₂ ∈ (a, b) such that g(x₂) < g(b). This implies that neither a nor b is a minimizer of g on [a, b], so x_m ∈ (a, b). By Fermat's Theorem, g′(x_m) = 0, that is, f′(x_m) = z. In conclusion, there exists c ∈ (a, b) such that f′(c) = z.
As in Lemma 493, the case f′(a) > f′(b) is analogous. We can thus say that, for any z such that

    min{f′(a), f′(b)} ≤ z ≤ max{f′(a), f′(b)}

there exists a ≤ c ≤ b such that f′(c) = z. If f′ is strictly monotonic, such c is unique.
Since in general the derivative function is not continuous (so Weierstrass' Theorem cannot be invoked), Darboux's Theorem does not imply, unlike Lemma 493, a version of the Intermediate Value Theorem for the derivative function. Still, Darboux's Theorem is per se a remarkable continuity property of the derivative function, which implies, inter alia, that such a function can only have essential non-removable discontinuities.
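The construction in the proof of Darboux's Theorem can also be run numerically: minimizing g(x) = f(x) − zx produces an interior point where f′ equals the intermediate value z. A sketch (the choices f(x) = x³ on [0, 1] and z = 1, with f′(0) = 0 < z < 3 = f′(1), are our own):

```python
# Darboux's construction for f(x) = x^3 on [0, 1] and z = 1:
# g(x) = f(x) - z*x attains its minimum at an interior point c with f'(c) = z.
Z = 1.0

def g(x):
    return x ** 3 - Z * x

# minimize g on a fine grid of [0, 1]; the minimizer is interior
N = 10 ** 6
c = min((i / N for i in range(N + 1)), key=g)
fprime_at_c = 3 * c ** 2  # should equal z = 1 (c is about 1/sqrt(3))
```
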
By taking any 0 < δ′ < δ, we therefore have

    x₀ ≠ x ∈ [x₀ − δ′, x₀ + δ′]  ⟹  L − ε < f′(x) < L + ε < f′(x₀)        (22.10)

Consider the interval [x₀ − δ′, x₀]. By (22.10), we have f′(x₀ − δ′) < f′(x₀). By Darboux's Theorem, for every z with f′(x₀ − δ′) < z < f′(x₀) there exists c ∈ (x₀ − δ′, x₀) such that f′(c) = z. But this contradicts (22.10), which implies that, taking z ∈ (L + ε, f′(x₀)), there is no c ∈ [x₀ − δ′, x₀] such that f′(c) = z. Hence, f′ cannot have removable discontinuities.

The function f′ cannot have jump discontinuities either. Suppose, by contradiction, that f′ has such a discontinuity at x₀ ∈ (a, b), that is, lim_{x→x₀⁺} f′(x) ≠ lim_{x→x₀⁻} f′(x). Suppose that f′(x₀) = lim_{x→x₀⁺} f′(x) (the proof is analogous if f′(x₀) = lim_{x→x₀⁻} f′(x)). By setting L = lim_{x→x₀⁻} f′(x), the proof proceeds in an analogous way to the one seen for the removable discontinuity, as the reader can verify.
Moreover, the function is said to be (locally) strictly increasing if the inequalities in (22.11) are all strict.

Similar definitions hold for the (strictly) decreasing monotonicity at a point. To avoid misunderstandings, recall that in Section 6.4.4 we defined monotonicity in a global way, by saying (in Definition 206) that a function f : A ⊆ R → R is increasing if
Proof If f is increasing, the difference quotients of f at x₀ are all ≥ 0 (at least for h sufficiently small), so their limit is ≥ 0. If instead f′(x₀) > 0, the difference quotients are, at least for h close to 0, strictly positive by the Theorem on the permanence of sign. It follows that f(x₀ + h) > f(x₀) for h > 0 and f(x₀ + h) < f(x₀) for h < 0, with h sufficiently small, so f is strictly increasing at x₀.
Note the asymmetry between points (i) and (ii) of the previous proposition:

    f increasing at x₀  ⟹  f′(x₀) ≥ 0        (22.12)

but

    f′(x₀) > 0  ⟹  f strictly increasing at x₀        (22.13)

The non-negativity of the derivative is necessary for increasing monotonicity, while its strict positivity is sufficient for strictly increasing monotonicity.

This asymmetry is unavoidable because the converses of (22.12) and (22.13) do not hold. For example, the function f(x) = −x³ is strictly decreasing at 0 although f′(0) = 0, so the converse of (22.12) is false. The function f(x) = x³ is strictly increasing at x₀ = 0 (indeed x³ > 0 for every x > 0 and x³ < 0 for every x < 0), but f′(0) = 0, so the converse of (22.13) is false as well.
We might think that, if a function is monotonic at each point of a set A, it enjoys the same type of monotonicity on the entire set A, i.e., globally. This is not the case. Indeed, consider the function f(x) = −1/x defined on the open set R ∖ {0}. It is strictly increasing at each point of its domain because f′(x) = 1/x² > 0 for every x ≠ 0. However, it is not increasing at all because, for example, −1 < 1 while f(−1) = 1 > −1 = f(1).
Therefore, monotonicity at each point of a set does not imply global monotonicity (of the same type). Intuitively, this may happen because if such a set is a union of disjoint intervals, then on each interval the function "gets back to the beginning". The next important result confirms this intuition by showing that the implication does hold when the set is an interval (so ruling out the unions of disjoint intervals just mentioned). It is the classic differential criterion of monotonicity.

Under the clause a, b ∈ R̄ the interval (a, b) can be unbounded, for example (a, b) = R. A similar result, with negativity of the derivative on (a, b), holds for decreasing monotonicity. Note that Corollary 994 is a special case of this result: f′(x) = 0 for every x ∈ (a, b) is equivalent to having both f′(x) ≥ 0 and f′(x) ≤ 0 for every x ∈ (a, b), and therefore, being simultaneously increasing and decreasing, f is constant.
Proof "Only if". Suppose that f is increasing. Let x ∈ (a, b). For every h > 0 we have f(x + h) ≥ f(x), hence

    (f(x + h) − f(x)) / h ≥ 0

It follows that

    f′(x) = lim_{h→0} (f(x + h) − f(x)) / h = lim_{h→0⁺} (f(x + h) − f(x)) / h ≥ 0

"If". Let f′(x) ≥ 0 for every x ∈ (a, b). Let x₁, x₂ ∈ (a, b) with x₁ < x₂. By the Mean Value Theorem, there exists x̂ ∈ (x₁, x₂) such that

    f′(x̂) = (f(x₂) − f(x₁)) / (x₂ − x₁)        (22.14)

Since f′(x̂) ≥ 0 and x₂ − x₁ > 0, this shows that f(x₂) ≥ f(x₁).
Example 1004 (i) Let f : R → R be given by f(x) = 3x⁵ + 2x³. Since f′(x) = 15x⁴ + 6x² ≥ 0 for every x ∈ R, by Proposition 1003 the function is increasing. (ii) Let f : R → R be the quadratic function f(x) = x². We have f′(x) = 2x, and hence Proposition 1003 (and its analog for decreasing monotonicity) shows that f is neither increasing nor decreasing on R, and that it is increasing on (0, +∞) and decreasing on (−∞, 0). N
Next we show that the strict positivity of the derivative implies strict increasing monotonicity, thus providing a most useful differential criterion of strict monotonicity.

Proposition 1005 Let f : (a, b) → R be a differentiable function, with a, b ∈ R̄. If f′(x) > 0 for every x ∈ (a, b), then f is (globally) strictly increasing on (a, b).

Proof The proof is similar to that of Proposition 1003 and is a simple application of the Mean Value Theorem. Let f′(x) > 0 for every x ∈ (a, b) and let x₁, x₂ ∈ (a, b) with x₁ < x₂. By the Mean Value Theorem, there exists c ∈ (x₁, x₂) such that

    f′(c) = (f(x₂) − f(x₁)) / (x₂ − x₁)        (22.15)

Since f′(c) > 0 and x₂ − x₁ > 0, from (22.15) it follows that f(x₂) > f(x₁).
The next example shows that the converse of the last result is false, so that for the derivative of a strictly increasing function on an interval we can only say that it is ≥ 0 (and not > 0).

Propositions 1003 and 1005 give very useful differential criteria for the monotonicity of scalar functions (dual versions hold for decreasing monotonicity). They hold also for closed or half-open intervals, once the derivatives at the boundary points are understood as one-sided ones.
We illustrate Proposition 1005 with an example.

Example 1007 (i) By Proposition 1005 (and its analog for decreasing monotonicity), the quadratic function f(x) = x² is strictly increasing on (0, +∞) and strictly decreasing on (−∞, 0). (ii) By Proposition 1005, the function f(x) = 3x⁵ + 2x³ is strictly increasing both on (−∞, 0) and on (0, +∞). Nevertheless, the proposition cannot say anything about the strict increasing monotonicity of f on R because f′(0) = 0. We can, however, check whether f is strictly increasing on R through the definition of strict increasing monotonicity. To this end, note that f(y) < f(0) = 0 < f(x) for every y < 0 < x, so f is indeed strictly increasing on the entire real line. N
That said, we close with a curious characterization of strict monotonicity that, in a sense, completes Proposition 1005.

Thus, it is the strict positivity at the points of an "order dense" subset of (a, b) that characterizes strictly increasing functions. In view of Proposition 207, for a differentiable monotone function this strict positivity amounts to being injective.

We can revisit the last two examples in view of Proposition 1008. Indeed, by this result we can say that the cubic function and the function f(x) = 3x⁵ + 2x³ are both strictly increasing because their derivatives are everywhere strictly positive except at the origin.

A final twist: under continuous differentiability, the "dense" strict positivity of the derivative actually characterizes strictly increasing functions.
Proof In view of Proposition 1008, it is enough to show that f′ ≥ 0 if for every a ≤ x′ < x″ ≤ b there exists x′ ≤ z ≤ x″ such that f′(z) > 0. Let x ∈ (a, b). For each n large enough so that x + 1/n ∈ (a, b), there is a point x ≤ zₙ ≤ x + 1/n with f′(zₙ) > 0. Since f′ is continuous, from zₙ → x it follows that f′(x) = lim f′(zₙ) ≥ 0. Since x was arbitrarily chosen, we conclude that f′ ≥ 0.
In a dual way, x₀ is a local minimizer if in (22.16) we have f′(x) ≤ 0 ≤ f′(y), and a strong one if f′(x) < 0 < f′(y).⁶ Note that the differentiability of f at x₀ is not required, only its continuity.

Proof Without loss of generality, assume that B_ε(x₀) = (x₀ − ε, x₀ + ε) ⊆ C. Let x ∈ (x₀ − ε, x₀). By the Mean Value Theorem, there exists ξ ∈ (x₀ − ε, x₀) such that

    (f(x₀) − f(x)) / (x₀ − x) = f′(ξ)

By (22.16), we have f′(ξ) ≥ 0, from which we deduce that f(x₀) ≥ f(x). In a similar way, we can prove that f(x₀) ≥ f(y) for every y ∈ (x₀, x₀ + ε). So, f(x₀) ≥ f(x) for every x ∈ B_ε(x₀), and therefore x₀ is a local maximizer.
In particular, the following classic corollary of Proposition 1010 holds. Though weaker,
in many cases it is good enough.
Example 1013 Let f : R → R be given by f(x) = −|x| and take x₀ = 0. The function is continuous at x₀ and is differentiable at every x ≠ 0. We have

    f′(x) = ⎧  1   if x < 0
            ⎩ −1   if x > 0

and hence (22.16) is satisfied in a strict sense. By Proposition 1010, x₀ is a strong local maximizer. Note that in this case Corollary 1011 cannot be applied. N
The previous sufficient condition can be substantially simplified if we assume that the function is twice continuously differentiable. In this case, it is indeed sufficient to evaluate the sign of the second derivative at the point.

Proof Thanks to the continuity of f″ at x₀, we have lim_{x→x₀} f″(x) = f″(x₀) < 0. The Theorem on the permanence of sign implies the existence of a neighborhood B_ε(x₀) such that f″(x) < 0 for every x ∈ B_ε(x₀). Hence, by Proposition 1005 the first derivative f′ is strictly decreasing on B_ε(x₀), that is,

Example 1015 Going back to Example 1012, in view of Corollary 1014 it is actually sufficient to observe that f″(0) = −2 < 0 to conclude that x₀ = 0 is a strong local maximizer. Instead, Corollary 1014 cannot be applied to Example 1013 because f(x) = −|x| is not differentiable at x₀ = 0. N
The next example shows that the condition f″(x₀) < 0 is sufficient, but not necessary: there exist local maximizers x₀ for which we do not have f″(x₀) < 0.

22.5.2 Searching local extremal points via first and second order conditions

Let x₀ be an interior point of C. In view of Corollary 1014, we can say that:

(iii) f′(x₀) = 0 and f″(x₀) ≤ 0 does not exclude that x₀ is a local maximizer;

(iv) f′(x₀) = 0 and f″(x₀) ≥ 0 does not exclude that x₀ is a local minimizer.
(i) A necessary condition for an interior point x₀ of C to be a local maximizer is that there exists a neighborhood B_ε(x₀) of x₀ on which f is twice continuously differentiable, with f′(x₀) = 0 and f″(x₀) ≤ 0.

(ii) A sufficient condition for an interior point x₀ of C to be a (strong) local maximizer is that there exists a neighborhood B_ε(x₀) of x₀ on which f is twice continuously differentiable, with f′(x₀) = 0 and f″(x₀) < 0.
Intuitively, if f′(x₀) = 0 and f″(x₀) < 0, the derivative function f′ is zero at x₀ and strictly decreasing there (because its derivative f″ is strictly negative): being zero at x₀, it therefore goes from positive values to negative ones. Hence, the function is increasing before x₀, stationary at x₀, and decreasing after x₀. It follows that x₀ is a maximizer.⁷ A similar intuition holds for the necessary part.

As should be clear by now, (i) is a necessary but not sufficient condition, while (ii) is a sufficient but not necessary condition. It is an unavoidable asymmetry with which we have to live.
Terminology The conditions on the second derivatives of the last corollary are called second-order conditions. In particular:

(i) the inequality f''(x0) ≤ 0 (resp., f''(x0) ≥ 0) is called second-order necessary condition for a maximizer (resp., for a minimizer);

(ii) the inequality f''(x0) < 0 (resp., f''(x0) > 0) is called second-order sufficient condition for a maximizer (resp., for a minimizer).
The interest of Corollary 1017 lies in allowing us to establish a procedure for the search of local maximizers and minimizers on C of a twice-differentiable function f : A ⊆ R → R. Though it will be considerably refined in Section 23.3, it is often good enough.

Suppose that f is twice continuously differentiable on the set int C of the interior points of C. The procedure has two stages, based on the first and second order sufficient conditions. Specifically:

1. Determine the set S ⊆ int C of the stationary interior points of f; in other words, solve the first-order condition f'(x) = 0.

2. Compute f'' at each of the stationary points x ∈ S and check the second order sufficient conditions: the point x is a strong local maximizer if f''(x) < 0, while it is a strong local minimizer if f''(x) > 0. If f''(x) = 0 the procedure fails.
7 Alternatively, at x0 the function f is stationary and concave (see below), so x0 is a maximizer.
22.5. SUFFICIENT CONDITIONS FOR LOCAL EXTREMAL POINTS 703
The procedure is based on Corollary 1017-(ii). The first stage – i.e., the solution of the first-order condition – is based on Fermat's Theorem: stationary points are the only interior points that are possible candidates for local extremal points. Hence, the knowledge acquired in the first stage is "negative": it rules out all the interior points that are not stationary, as none of them can be a local maximizer or minimizer.

The second stage – i.e., the check of the second order condition – examines one by one the possible candidates from the first stage to see if they meet the sufficient condition established in Corollary 1017-(ii).
Example 1018 Let f : R → R be given by f(x) = 10x³(x - 1)² and C = R. Via the procedure, we search for the local extremal points of f on R. We have C = int C = R and f is twice continuously differentiable on R. As to stage 1, by recalling what we saw in Example 988, we have

S = {0, 1, 3/5}

The stationary points in S are the unique candidates for local extremal points. As to stage 2, we have

f''(x) = 60x(x - 1)² + 120x²(x - 1) + 20x³

and therefore f''(0) = 0, f''(1) > 0 and f''(3/5) < 0. Hence, the point 1 is a strong local minimizer, the point 3/5 is a strong local maximizer, while the nature of the point 0 remains undetermined. N
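The two stages of the procedure can be sketched in code. The following minimal Python illustration (ours, not the book's) classifies the stationary points of f(x) = 10x³(x - 1)² by the sign of f'':

```python
# Two-stage search for local extremal points of f(x) = 10 x^3 (x-1)^2.
# Stage 1: the stationary points S = {0, 1, 3/5} solve f'(x) = 0.
# Stage 2: classify each point by the sign of the second derivative.

def f2(x):
    # Second derivative, computed by hand as in the text.
    return 60 * x * (x - 1) ** 2 + 120 * x ** 2 * (x - 1) + 20 * x ** 3

def classify(x, tol=1e-12):
    v = f2(x)
    if v < -tol:
        return "strong local maximizer"
    if v > tol:
        return "strong local minimizer"
    return "undetermined"  # the procedure fails when f''(x) = 0

S = [0.0, 1.0, 3 / 5]
for x in S:
    print(x, classify(x))
```

The point 0 is left undetermined, exactly as in the example.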
The procedure, although very useful, has important limitations. First of all, it can deal only with the interior points of C at which f is twice continuously differentiable. It is, instead, completely silent on the other points of C – that is, on its boundary points as well as on its interior points at which f is not twice continuously differentiable. For instance, for a function defined on C = [0, 1] it may well happen that the boundary points 0 and 1 are local maximizers, yet the procedure is not able to recognize them as such.
A further limitation of the procedure is its indeterminacy in the case f''(x) = 0, as the simple function f(x) = x⁴ most eloquently shows: whether or not the stationary point x = 0 is a local minimizer cannot be determined through the procedure, because f''(0) = 0.
22.5.3 Searching global extremal points via first and second order conditions
We can apply what we just learned to the unconstrained optimization problem (22.5), refining for the scalar case the analysis of Section 22.1.3. So, consider the unconstrained optimization problem (22.5). The procedure now has three stages:

1. Determine the set S ⊆ C of the stationary interior points of f by solving the first-order condition f'(x) = 0.

2. Compute f'' at each point x ∈ S and discard the points at which f''(x) > 0, since they cannot be maximizers; call S2 the set of the remaining points.

3. Compute f at each point of S2 and call S3 the subset of S2 at which f attains the largest of these values: the points of S3 are the candidate solutions of the problem.
Note that the procedure is not conclusive because a key piece of information is lacking: whether the problem actually admits a solution. The differential methods of this chapter do not ensure the existence of a solution, which only Weierstrass' and Tonelli's Theorems are able to guarantee (in the absence of concavity properties of the objective function). In Chapter 28 we will show how the elimination method refines, decisively, the procedure that we outlined here by combining such existence theorems with the differential methods.
Example 1021 As usual, the study of the cubic function f(x) = x³ is of illuminating simplicity: though the unconstrained optimization problem

max_x x³   sub x ∈ R

does not admit solutions, the procedure nevertheless determines the singleton S3 = {0}. According to the procedure, the point 0 is the unique candidate solution of the problem: unfortunately, the solution does not exist and it is, therefore, a useless candidacy. N
Example 1022 Let f : R → R be given by f(x) = e^(x² - x⁴). The first-order condition f'(x) = (2x - 4x³) e^(x² - x⁴) = 0 yields

S = { -1/√2, 0, 1/√2 }

Since

f''(x) = 2(8x⁶ - 8x⁴ - 4x² + 1) e^(-x⁴ + x²)

we have f''(0) > 0 and f''(-1/√2) = f''(1/√2) < 0, so

S2 = { -1/√2, 1/√2 }

On the other hand, f(-1/√2) = f(1/√2), and hence S3 = S2. In conclusion, the points x = ±1/√2 are the candidate solutions of the unconstrained optimization problem. Example 1266, through the elimination method, will show that these points are, indeed, solutions of the problem. N
Example 1023 Consider again Example 1018 and the unconstrained optimization problem

max_x 10x³(x - 1)²   sub x ∈ R

We have

S = {0, 1, 3/5}

Since f''(1) > 0, the point 1 is discarded, so

S2 = {0, 3/5}

Since

f(0) = 0 < f(3/5)

we get

S3 = {3/5}

The point x = 3/5 is therefore the unique candidate solution of the unconstrained optimization problem. As in the example of the cubic function, unfortunately this candidacy is vain: indeed,

lim_{x→+∞} 10x³(x - 1)² = +∞

Therefore the function, being unbounded above, has no global maximizers on R. The unconstrained optimization problem has no solutions. N
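A quick numeric sanity check (ours, not the book's) makes the unboundedness concrete: the value of f at the unique candidate 3/5 is quickly exceeded for larger x.

```python
# f(x) = 10 x^3 (x-1)^2: the unique candidate from the procedure is x = 3/5,
# but f is unbounded above, so the global maximization problem has no solution.

def f(x):
    return 10 * x ** 3 * (x - 1) ** 2

candidate = 3 / 5
print(f(candidate))  # the local maximum value, roughly 0.3456
print(f(10.0))       # already far larger: f grows like 10 x^5
```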
It is important to observe how the global nature of the solution gives a different perspective on Corollary 1017. Of this result, we are interested in point (i), which provides a necessary condition for local maximizers (the second-order necessary condition of the form f''(x) ≤ 0). At the same time, in the previous search for local extremal points we considered point (ii) of that result, which covers sufficiency (the second-order sufficient condition of the form f''(x) < 0). From the "global" point of view, the fact that f''(x) < 0 implies that x is a strong local maximizer is of secondary importance. Indeed, it is not conclusive: the point could be just a local maximizer and, moreover, we could also have solutions where f''(x) = 0.8 In contrast, the information f''(x) > 0 is conclusive in that it excludes, ipso facto, that x may be a solution.

This is another example of how the global point of view, the one in which we are really interested in applications, can lead us to view things differently relative to a local point of view.9
Naturally, "x < x0 < y implies f'(x) ≤ 0 ≤ f'(y)" is the dual version of (22.18) that leads to global minimizers.
Proof Let x ∈ (a, b) be such that x < x0. Fixing any ε ∈ (x0 - x, x0 - a), it follows that x ∈ (x0 - ε, x0). By the Mean Value Theorem there exists ξ ∈ (x0 - ε, x0) such that

(f(x0) - f(x)) / (x0 - x) = f'(ξ)
Despite being attractive because of its simplicity, the global hypothesis (22.18) on derivatives is less relevant than one might think prima facie because in applications it is typically subsumed by concavity. Indeed, under concavity the first derivative (if it exists) is decreasing (cf. Corollary 1092), so condition (22.18) automatically holds provided the first order condition f'(x0) = 0 holds. Though condition (22.18) can be used to find the maximizers of functions that are not concave – e.g., in Example 1258 we will apply it to the Gaussian function, which is neither concave nor convex – it is much more convenient to consider a general property of a function, like concavity, that does not require, a priori, the identification of a point x0 at which to check (22.18). All this explains the brevity of this section (and its title). The role of concavity, instead, will be studied at length later in the book.
Theorem 1026 (de l'Hospital)10 Let f, g : (a, b) → R be differentiable on (a, b), with a, b ∈ [-∞, +∞] and g'(x) ≠ 0 for every x ∈ (a, b), and let x0 ∈ [a, b], with

lim_{x→x0} f'(x)/g'(x) = L ∈ [-∞, +∞]    (22.19)

If either lim_{x→x0} f(x) = lim_{x→x0} g(x) = 0 or lim_{x→x0} f(x) = lim_{x→x0} g(x) = ±∞, then

lim_{x→x0} f(x)/g(x) = L
Thus, de l’Hospital’s rule says that, under the hypotheses just indicated, we have
f 0 (x) f (x)
lim = L =) lim =L
x!x0 g 0 (x) x!x0 g (x)
i.e., the calculation of the limit limx!x0 f (x) =g (x) can be reduced to the calculation of
the limit of the ratio of the derivatives limx!x0 f 0 (x) =g 0 (x). The simpler the second limit
compared to the original one, the greater the usefulness of the rule.
Note that the – by now usual – clause a, b ∈ [-∞, +∞] allows the interval (a, b) to be unbounded. The rule therefore holds also for limits as x → ±∞. Moreover, it applies also to one-sided limits, even if for brevity we have omitted this case in the statement.
10 The result is actually due to Johann Bernoulli.
We omit the proof of de l'Hospital's Theorem. Next we illustrate the rule with some examples.
Example 1028 Let f, g : R → R be given by f(x) = sin x and g(x) = x. Set x0 = 0 and consider the classic limit lim_{x→x0} f(x)/g(x). In every interval (-ε, ε) the hypotheses of de l'Hospital's rule are satisfied, so

lim_{x→0} sin x / x = lim_{x→0} cos x / 1 = 1

It is nice to see how de l'Hospital's rule solves, in a simple way, this classic limit. N
The next example shows that for the solution of some limits it may be necessary to apply de l'Hospital's rule several times. Consider f(x) = e^x and g(x) = x² as x → +∞. A first application of the rule gives

lim_{x→+∞} f'(x)/g'(x) = lim_{x→+∞} e^x/(2x) = (1/2) lim_{x→+∞} e^x/x  ⟹  lim_{x→+∞} f(x)/g(x) = lim_{x→+∞} e^x/x² = (1/2) lim_{x→+∞} e^x/x    (22.20)

obtaining a simpler limit, but still not solved. Let us apply de l'Hospital's rule again, now to the derivative functions f', g' : R → R given by f'(x) = e^x and g'(x) = 2x. Again, in every interval (a, +∞), with a > 0, the hypotheses of de l'Hospital's rule are satisfied, so

lim_{x→+∞} f''(x)/g''(x) = lim_{x→+∞} e^x/2 = +∞  ⟹  lim_{x→+∞} f'(x)/g'(x) = lim_{x→+∞} e^x/(2x) = +∞
22.6. DE L’HOSPITAL’S THEOREM AND RULE 709
Hence, by (22.20),

lim_{x→+∞} f(x)/g(x) = lim_{x→+∞} e^x/x² = +∞
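A rough numeric illustration of the conclusion (ours, not the book's): the ratio e^x/x² indeed grows without bound.

```python
import math

# e^x / x^2 diverges to +infinity as x -> +infinity: each application of
# de l'Hospital's rule lowers the power of x in the denominator by one,
# while the numerator e^x is unchanged.
def ratio(x):
    return math.exp(x) / x ** 2

values = [ratio(x) for x in (5.0, 10.0, 20.0)]
print(values)  # strictly increasing and already large
```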
Example 1031 In a similar way it is possible to calculate the limit of the ratio between f(x) = 1 - cos x and g(x) = x² as x → 0:

lim_{x→0} (1 - cos x)/x² = lim_{x→0} sin x/(2x) = 1/2

N
Consider instead the ratio between f(x) = e^(x²) and g(x) = e^x as x → +∞. Here the rule yields

lim_{x→+∞} f'(x)/g'(x) = lim_{x→+∞} 2x e^(x²)/e^x

and therefore the application of de l'Hospital's rule has led to a more complicated limit than the original one. In this case, the rule is useless, while the limit can be solved very easily in a direct way:

lim_{x→+∞} e^(x²)/e^x = lim_{x→+∞} e^(x² - x) = lim_{x→+∞} e^(x(x-1)) = +∞

As usual, cogito ergo solvo: mindless mechanical arguments may well lead astray. N
does not exist. If we tried to compute the simple limit lim_{x→x0} f(x)/g(x) = 0 through de l'Hospital's rule, we would have used a tool that is both useless, given the simplicity of the limit, and ineffective. Again, a mechanical use of the rule can be very misleading. N
Summing up, de l’Hospital’s rule is a useful tool in the computation of limits, but its use-
fulness must be evaluated case by case. Moreover, it is important to note that de l’Hospital’s
Theorem states that, if lim f 0 =g 0 exists, then lim f =g exists too, and the two limits are equal.
The converse does not hold: it may happen that lim f =g exists but not lim f 0 =g 0 . We have
already seen an example of this, but we show two other examples, a bit more complicated.
Example 1034 Let f(x) = x - sin x and g(x) = x + sin x. We have

lim_{x→∞} f(x)/g(x) = lim_{x→∞} (x - sin x)/(x + sin x) = lim_{x→∞} [1 - (sin x)/x] / [1 + (sin x)/x] = 1

but

lim_{x→∞} f'(x)/g'(x) = lim_{x→∞} (1 - cos x)/(1 + cos x)

does not exist because both the numerator and the denominator oscillate between 0 and 2, so the ratio oscillates between 0 and +∞. N
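The contrast between the two limits can be seen numerically; the sketch below (ours, not the book's) evaluates both ratios.

```python
import math

# f(x)/g(x) = (x - sin x)/(x + sin x) tends to 1, even though the ratio of
# derivatives (1 - cos x)/(1 + cos x) keeps oscillating and has no limit.
def fg(x):
    return (x - math.sin(x)) / (x + math.sin(x))

def dfdg(x):
    return (1 - math.cos(x)) / (1 + math.cos(x))

print([round(fg(10.0 ** k), 6) for k in (2, 4, 6)])  # approaches 1
print(dfdg(3.14), dfdg(6.28))  # huge near x = pi, near zero at x = 2*pi
```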
Example 1035 Given f(x) = x² sin(1/x) and g(x) = log(1 + x), we have

lim_{x→0} f(x)/g(x) = lim_{x→0} [x² sin(1/x)] / [x + o(x)] = 0

since |x² sin(1/x)| ≤ x². Nevertheless,

lim_{x→0} f'(x)/g'(x) = lim_{x→0} (1 + x) [2x sin(1/x) - cos(1/x)]

does not exist, because cos(1/x) oscillates as x → 0 while 2x sin(1/x) → 0. N
Consider now the indeterminate form 0·∞, say lim_{x→x0} f(x) g(x) with f(x) → 0 and g(x) → ∞. We can write

lim_{x→x0} f(x) g(x) = lim_{x→x0} f(x) / (1/g(x))

with lim_{x→x0} 1/g(x) = 0, and de l'Hospital's rule is applicable to the functions f and 1/g. If f is different from zero in a neighborhood of x0, we can also write

lim_{x→x0} f(x) g(x) = lim_{x→x0} g(x) / (1/f(x))

with lim_{x→x0} 1/f(x) = ∞. In this case, de l'Hospital's rule can be applied to the functions g and 1/f. Which one of the two possible applications of the rule is more convenient must be evaluated case by case.
22.6. DE L’HOSPITAL’S THEOREM AND RULE 711
Similarly, for a sum f + g that gives rise to an indeterminate form we can write

lim_{x→x0} (f(x) + g(x)) = lim_{x→x0} f(x) [1 + g(x)/f(x)]    (22.21)

and apply de l'Hospital's rule to the limit lim_{x→x0} g(x)/f(x), which has the form ∞/∞. Alternatively, we can consider

lim_{x→x0} (f(x) + g(x)) = lim_{x→x0} [1/f(x) + 1/g(x)] / [1/(f(x) g(x))]    (22.22)
Approximation
Recall that the first-order approximation f(x0 + h) = f(x0) + f'(x0) h + o(h) combines two properties:

(i) the simplicity of the approximating function: the affine function f(x0) + f'(x0) h = f(x0) + df(x0)(h) (geometrically, a straight line);

(ii) the quality of the approximation, given by the error term o(h).
Intuitively, there is a tension between these two properties: the simpler the approximating function, the worse the quality of the approximation. In other terms, the simpler we want the approximating function to be, the higher the error we may incur. In this section we study in detail the relation between these two key properties. In particular, suppose one modifies property (i) by taking as approximating function a polynomial of degree n, not necessarily with n = 1 as in the case of a straight line. The desideratum that we posit is that there be a corresponding improvement in the error term: it should become of magnitude o(h^n). In other words, when the degree n of the approximating polynomial increases, and so does the complexity of the approximating function, we want the error term to improve in a parallel way: an increase in the complexity of the approximating function should be compensated by an improvement in the quality of the approximation.
f(x0 + h) = α0 + α1 h + α2 h² + o(h²)   as h → 0

The approximating function is now more complicated: instead of a straight line – the polynomial of first degree α0 + α1 h – we have a quadratic function – the polynomial of second degree α0 + α1 h + α2 h². On the other hand, the error term is now better: instead of o(h) we have o(h²).
Next we establish a key property: when they exist, polynomial expansions are unique.

Lemma 1039 A function f : (a, b) → R has at most one polynomial expansion of degree n at every point x0 ∈ (a, b).
Proof Suppose that, for every h ∈ (a - x0, b - x0), there are two expansions

α0 + α1 h + α2 h² + ... + αn h^n + o(h^n) = β0 + β1 h + β2 h² + ... + βn h^n + o(h^n)    (23.3)

Then

α0 = lim_{h→0} [α0 + α1 h + α2 h² + ... + αn h^n + o(h^n)] = lim_{h→0} [β0 + β1 h + β2 h² + ... + βn h^n + o(h^n)] = β0

Hence,

α1 = lim_{h→0} [α1 + α2 h + ... + αn h^(n-1) + o(h^(n-1))] = lim_{h→0} [β1 + β2 h + ... + βn h^(n-1) + o(h^(n-1))] = β1

Continuing in this way we can show that α2 = β2, and so on until we show that αn = βn. This proves that at most one polynomial p(h) can satisfy approximation (23.1).
To ease notation we put f^(0) = f. The Taylor polynomial of degree n of f at x0 is

Tn(h) = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k

so Tn has as coefficients the derivatives of f at the point x0, up to order n. In particular, if x0 = 0 the Taylor polynomial is sometimes called Maclaurin's polynomial.

The next result, fundamental and of great elegance, shows that if f has a suitable number of derivatives at x0, the unique polynomial expansion is given precisely by the Taylor polynomial.
Theorem 1041 (Taylor) Let f : (a, b) → R be a function that is n - 1 times differentiable on (a, b). If f is n times differentiable at x0 ∈ (a, b), then it has at x0 a unique polynomial expansion pn of degree n, given by

f(x0 + h) = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k + o(h^n) = Tn(h) + o(h^n)    (23.6)

where Tn is the unique polynomial, of degree at most n, that satisfies Definition 1038, i.e., which is able to approximate f(x0 + h) with error o(h^n).
The approximation (23.6) is called Taylor's expansion (or formula) of order n of f at x0. The important special case x0 = 0 is called Maclaurin's expansion (or formula) of order n of f.

Note that for n = 1 Taylor's Theorem coincides with the "if" direction of Theorem 936. Indeed, since we set f^(0) = f, saying that f is 0 times differentiable on (a, b) is simply equivalent to saying that f is defined on (a, b). Hence, for n = 1 Taylor's Theorem states that, if f : (a, b) → R is differentiable at x0 ∈ (a, b), then

f(x0 + h) = f(x0) + f'(x0) h + o(h)
Approximation (23.6) is key in applications and is the actual form that the aforementioned tension between the complexity of the approximating polynomial and the goodness of the approximation takes. The trade-off must be solved case by case, according to the relative importance that the two properties of the approximation – complexity and quality – have in the particular application in which we are interested. In many cases, however, the quadratic approximation (23.7) is a good compromise and so, among all the possible approximations, it has a particular importance.
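To make the trade-off concrete, here is a small numeric sketch of ours (not from the text) for f(x) = e^x at x0 = 0, whose Maclaurin coefficients are 1/k!: since the degree-n error has leading term of order h^(n+1), halving h shrinks it by roughly 2^(n+1).

```python
import math

# Maclaurin polynomial of f(x) = e^x: all derivatives at 0 equal 1,
# so T_n(h) = sum of h^k / k! for k = 0, ..., n.
def taylor_exp(h, n):
    return sum(h ** k / math.factorial(k) for k in range(n + 1))

n = 3
e1 = abs(math.exp(0.1) - taylor_exp(0.1, n))    # error at h = 0.1
e2 = abs(math.exp(0.05) - taylor_exp(0.05, n))  # error at h = 0.05
print(e1, e2, e1 / e2)  # the ratio is close to 2^(n+1) = 16
```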
Proof In light of Lemma 1039, it is sufficient to show that Taylor's polynomial satisfies (23.1). Let us start by observing, preliminarily, that by hypothesis the higher order derivative functions f^(k) : (a, b) → R exist for every 1 ≤ k ≤ n - 1. Moreover, by Proposition 937, f^(k) is continuous at x0 for 1 ≤ k ≤ n - 1. Let φ : (a - x0, b - x0) → R and ψ : R → R be the auxiliary functions given by, respectively,

φ(h) = f(x0 + h) - Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k   and   ψ(h) = h^n
1 From the Latin os, mouth (that is, it is the "kissing" parabola, where the kiss is with f at x0).
23.1. TAYLOR’S POLYNOMIAL APPROXIMATION 717
so that

lim_{h→0} φ^(k)(h) = φ^(k)(0) = 0    (23.11)

Thanks to (23.9) and (23.11), we can apply de l'Hospital's rule n - 1 times, and get

lim_{h→0} φ(h)/ψ(h) = lim_{h→0} φ^(n-1)(h)/ψ^(n-1)(h) = L

with L ∈ [-∞, +∞]. Simple calculations show that ψ^(n-1)(h) = n! h. Hence, since f has n derivatives at x0, expression (23.10) with k = n - 1 yields

lim_{h→0} φ^(n-1)(h)/ψ^(n-1)(h) = (1/n!) lim_{h→0} [f^(n-1)(x0 + h) - f^(n-1)(x0) - h f^(n)(x0)] / h
  = (1/n!) lim_{h→0} ( [f^(n-1)(x0 + h) - f^(n-1)(x0)] / h - f^(n)(x0) ) = 0
By Taylor's Theorem, the Maclaurin coefficients are given by

αk = f^(k)(0)/k!   ∀ 1 ≤ k ≤ n

In the case at hand we get

α0 = f(0) = 0,   α1 = f'(0) = 0,   α2 = f''(0)/2! = 0
α3 = f'''(0)/3! = 18/6 = 3,   α4 = f(iv)(0)/4! = 24/24 = 1

N
Example 1043 Let f : (-1, +∞) → R be given by f(x) = log(1 + x). It is n times differentiable at each point of its domain, with

f^(n)(x) = (-1)^(n+1) (n-1)!/(1+x)^n    ∀ n ≥ 1

Taylor's expansion of order n at x0 is therefore

log(1 + x0 + h) = log(1 + x0) + h/(1 + x0) - h²/(2(1 + x0)²) + h³/(3(1 + x0)³) - ... + (-1)^(n+1) h^n/(n(1 + x0)^n) + o(h^n)
  = log(1 + x0) + Σ_{k=1}^{n} (-1)^(k+1) h^k/(k(1 + x0)^k) + o(h^n)

that is, setting x = x0 + h,

log(1 + x) = log(1 + x0) + Σ_{k=1}^{n} (-1)^(k+1) (x - x0)^k/(k(1 + x0)^k) + o((x - x0)^n)

In particular, for x0 = 0 we get Maclaurin's expansion

log(1 + x) = x - x²/2 + x³/3 - ... + (-1)^(n+1) x^n/n + o(x^n)    (23.14)
  = Σ_{k=1}^{n} (-1)^(k+1) x^k/k + o(x^n)

N
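The quality of this expansion can be checked numerically; the sketch below (ours, not the book's) compares the partial sums of (23.14) with the computer's logarithm.

```python
import math

# Partial sums of the Maclaurin expansion log(1+x) = sum (-1)^(k+1) x^k / k,
# compared against math.log: the error shrinks as the order n grows.
def maclaurin_log(x, n):
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, n + 1))

x = 0.5
errors = [abs(math.log(1 + x) - maclaurin_log(x, n)) for n in (2, 4, 8)]
print(errors)  # strictly decreasing
```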
Example 1044 In a similar way the reader can verify the Maclaurin expansions of other elementary functions. Here too it is important to observe how such functions can be (well) approximated by simple polynomials. N
Example 1045 Let f(x) = 3 sin²x - log(1 + x³). We have

f'(x) = 6 cos x sin x - 3x²/(1 + x³),   f''(x) = 6(cos²x - sin²x) + (3x⁴ - 6x)/(1 + x³)²

So, f(0) = f'(0) = 0 and f''(0) = 6, and the second-order Maclaurin expansion is

f(x) = f(0) + f'(0) x + (1/2) f''(0) x² + o(x²) = 3x² + o(x²)    (23.15)

N
In other words, under the hypotheses of the theorem the error term o(h^n) can always be taken equal to

[f^(n+1)(x0 + ϑh)/(n+1)!] h^(n+1)    (23.17)

where the (n+1)-th derivative is calculated at an intermediate point between x0 and x0 + h. This expression allows us to control the approximation error: if |f^(n+1)(x)| ≤ k for every x ∈ (a, b), then the approximation error does not exceed k |h|^(n+1)/(n+1)!, that is,

| f(x0 + h) - Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k | ≤ [k/(n+1)!] |h|^(n+1)

The error term (23.17) is called the Lagrange remainder, while o(h^n) is called the Peano remainder. The former permits error estimates, as just remarked, but the latter is often enough to express the quality of the approximation.
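For f(x) = sin x every derivative is bounded by k = 1, so the Lagrange remainder gives the explicit bound |h|^(n+1)/(n+1)!. A small numeric check (ours, not the book's):

```python
import math

# Lagrange-remainder bound for sin at x0 = 0: since |sin^(m)(x)| <= 1 for
# every order m, the degree-n Maclaurin polynomial satisfies
# |sin h - T_n(h)| <= |h|^(n+1) / (n+1)!.
def maclaurin_sin(h, n):
    # only odd powers appear: sin h = h - h^3/3! + h^5/5! - ...
    return sum((-1) ** j * h ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(n // 2 + 1))

h, n = 0.7, 5
err = abs(math.sin(h) - maclaurin_sin(h, n))
bound = abs(h) ** (n + 1) / math.factorial(n + 1)
print(err, bound)  # the actual error sits below the bound
```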
Proof Suppose that h > 0. Consider the interval [x0, x0 + h] ⊆ (a, b). By formula (22.8), we have

f(x0 + h) = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k + [f^(n+1)(x̂)/(n+1)!] h^(n+1)

for some x̂ ∈ (x0, x0 + h). Thus, for some 0 < t < 1 we have x̂ = t x0 + (1 - t)(x0 + h), so x̂ = x0 + ϑh by setting ϑ = 1 - t. We thus get (23.16).

Suppose that h < 0. If we now consider the interval [x0 + h, x0] ⊆ (a, b), by formula (22.9) we have

f(x0 + h) = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k + [f^(n+1)(x̂)/(n+1)!] h^(n+1)
(i) Consider the limit

lim_{x→0} [log(1 + x³) - 3 sin²x] / log(1 + x)
Since the limit is as x → 0, we can use the second-order Maclaurin expansion (23.15) and (23.14) to approximate the numerator and the denominator. Using Lemma 470 and the little-o algebra, we have

lim_{x→0} [log(1 + x³) - 3 sin²x]/log(1 + x) = lim_{x→0} (-3x² + o(x²))/(x + o(x)) = lim_{x→0} (-3x²/x) = 0

The calculation of the limit has, therefore, been considerably simplified through the combined use of Maclaurin expansions and of the comparison of infinitesimals seen in Lemma 470.
(ii) Consider the limit

lim_{x→0} x sin x / log²(1 + x)

This limit can also be calculated by combining an expansion and a comparison of infinitesimals:

lim_{x→0} x sin x / log²(1 + x) = lim_{x→0} x(x + o(x)) / (x + o(x))² = lim_{x→0} (x² + o(x²))/(x² + o(x²)) = lim_{x→0} x²/x² = 1
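Both limits can be confirmed numerically; the sketch below (ours, not the book's) evaluates the two ratios near 0.

```python
import math

# Numeric check of the two limits computed via Maclaurin expansions:
# (log(1+x^3) - 3 sin(x)^2) / log(1+x)  -> 0   as x -> 0, and
# x sin(x) / log(1+x)^2                 -> 1   as x -> 0.
def r1(x):
    return (math.log(1 + x ** 3) - 3 * math.sin(x) ** 2) / math.log(1 + x)

def r2(x):
    return x * math.sin(x) / math.log(1 + x) ** 2

print(r1(1e-4), r2(1e-4))
```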
Proposition 1049 Let f be n times differentiable at an interior point x0 of C, with f^(k)(x0) = 0 for every 1 ≤ k ≤ n - 1 and f^(n)(x0) ≠ 0. Then:

(i) If n is even and f^(n)(x0) < 0, the point x0 is a strong local maximizer.

(ii) If n is even and f^(n)(x0) > 0, the point x0 is a strong local minimizer.

(iii) If n is odd, x0 is not a local extremal point and, moreover, f is increasing or decreasing at x0 depending on whether f^(n)(x0) > 0 or f^(n)(x0) < 0.
For n = 1, point (iii) is nothing but the fundamental first-order necessary condition f'(x0) = 0. Indeed, for n = 1, point (iii) states that if f'(x0) ≠ 0, then x0 is not a local extremal point (i.e., neither a local maximizer nor a local minimizer). By taking the contrapositive, this amounts to saying that if x0 is a local extremal point, then f'(x0) = 0. Hence, (iii) extends to higher order derivatives the first-order necessary condition.
Point (i) instead, together with the hypothesis f^(k)(x0) = 0 for every 1 ≤ k ≤ n - 1, extends to higher order derivatives the second-order sufficient condition f''(x0) < 0 for strong local maximizers. Indeed, for n = 2, (i) is exactly the condition f''(x0) < 0. Analogously, (ii) extends the analogous condition f''(x0) > 0 for minimizers.2
N.B. In this and in the next section we will focus on the generalization of the sufficiency point (ii) of Corollary 1017. It is possible to generalize in a similar way its necessity point (i), as readers can check. O
Proof (i). Let n be even and let f^(n)(x0) < 0. By Taylor's Theorem, from the hypothesis f^(k)(x0) = 0 for every 1 ≤ k ≤ n - 1 and f^(n)(x0) ≠ 0 it follows that

f(x0 + h) - f(x0) = [f^(n)(x0)/n!] h^n + o(h^n) = [f^(n)(x0)/n!] h^n (1 + o(h^n)/h^n)

Since lim_{h→0} o(h^n)/h^n = 0, there exists δ > 0 such that |h| < δ implies |o(h^n)/h^n| < 1. Hence,

h ∈ (-δ, δ)  ⟹  1 + o(h^n)/h^n > 0

Since f^(n)(x0) < 0 and h^n > 0 for h ≠ 0 (n being even), we therefore have

h ∈ (-δ, δ), h ≠ 0  ⟹  [f^(n)(x0)/n!] h^n (1 + o(h^n)/h^n) < 0  ⟹  f(x0 + h) - f(x0) < 0

that is, setting x = x0 + h,

x ∈ (x0 - δ, x0 + δ), x ≠ x0  ⟹  f(x) < f(x0)

So, x0 is a strong local maximizer. This proves (i). In a similar way we prove (ii). Finally, (iii) can be proved by adapting in a suitable way the proof of Fermat's Theorem.
Proposition 1049 is powerful but has important limitations. Like Corollary 1014, it can only treat interior points, and it is useless for local extremal points that are not strong, for which in general the derivatives of any order are zero. The most classic instance of such failure is that of constant functions: their points are all, trivially, both maximizers and minimizers, but Proposition 1049 (like Corollary 1014) is not able to tell us anything about them.

Moreover, to apply Proposition 1049 it is necessary that the function have a sufficient number of derivatives at a stationary point, which may not be the case, as the next example shows.
Example 1052 The general version of the previous example considers f : R → R defined by

f(x) = x^n sin(1/x) if x ≠ 0,   and f(x) = 0 if x = 0

with n ≥ 1, and shows that f does not have derivatives of order n at the origin (in the case n = 1, this means that at the origin the first derivative does not exist). We leave to the reader the analysis of this example. N
1. Determine the set S of stationary points by solving the first-order condition f'(x) = 0. If S = ∅ the procedure ends (we conclude that, since there are no stationary points, there are no extremal ones); otherwise we move to the next step.

2. Compute f'' at each point x ∈ S and check the second-order sufficient conditions: the point x is a strong local maximizer if f''(x) < 0 and a strong local minimizer if f''(x) > 0.

This is the classic procedure to find local extremal points based on the first-order and second-order conditions of Section 22.5.2. The version just presented improves on what we saw there because, using again what we observed in a previous footnote, it requires only that the function have two derivatives on int C, not necessarily continuous. However, we are still left with the other limitations discussed in Section 22.5.2.
1. Determine the set S of the stationary points by solving the equation f'(x) = 0. If S = ∅, the procedure ends; otherwise move to the next step.

2. Compute f'' at each point of S: the point x is a strong local maximizer if f''(x) < 0 and a strong local minimizer if f''(x) > 0. Call S(2) the subset of S in which f''(x) = 0. If S(2) = ∅, the procedure ends; otherwise move to the next step.

3. Compute f''' at each point of S(2): if f'''(x) ≠ 0, the point x is not an extremal one. Call S(3) the subset of S(2) in which f'''(x) = 0. If S(3) = ∅, the procedure ends; otherwise move to the next step.

4. Compute f(iv) at each point of S(3): the point x is a strong local maximizer if f(iv)(x) < 0 and a strong local minimizer if f(iv)(x) > 0. Call S(4) the subset of S(3) in which f(iv)(x) = 0. If S(4) = ∅, the procedure ends; otherwise move to the next step.

The procedure thus ends if there exists n such that S(n) = ∅. Otherwise, the procedure iterates ad libitum (or ad nauseam).
Example 1053 Consider again the function f(x) = -x⁴, with C = R. We saw in Example 1016 that for its maximizer x0 = 0 it was not possible to apply the sufficient condition f'(x0) = 0 and f''(x0) < 0. We have, however,

f'(x) = -4x³,   f''(x) = -12x²,   f'''(x) = -24x,   f(iv)(x) = -24

so that

S = S(2) = S(3) = {0}   and   S(4) = ∅

Stage 1 identifies the set S = {0}, about which stage 2, however, has nothing to say since f''(0) = 0. Also stage 3 does not add any extra information, since f'''(0) = 0. Stage 4 instead is conclusive: since f(iv)(0) = -24 < 0, we can assert that x = 0 is a strong local maximizer (actually, it is a global maximizer, but this procedure does not allow us to say this). N
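The iterative procedure can be sketched as a small routine; the derivative values below are hard-coded for f(x) = -x⁴ at the origin (an illustration of ours, not the book's).

```python
# Higher-order derivative test at a stationary point x0, following the
# iterative procedure: find the first non-vanishing derivative of order
# n >= 2. If n is even its sign decides; if n is odd, x0 is not extremal.
def classify(derivs, tol=1e-12):
    # derivs[i] = f^(i+2)(x0): the derivatives of order 2, 3, 4, ... at x0
    for i, d in enumerate(derivs):
        n = i + 2  # order of this derivative
        if abs(d) > tol:
            if n % 2 == 1:
                return "not an extremal point"
            return "strong local maximizer" if d < 0 else "strong local minimizer"
    return "undetermined"  # all supplied derivatives vanish

# f(x) = -x^4 at x0 = 0: f''(0) = 0, f'''(0) = 0, f''''(0) = -24.
print(classify([0.0, 0.0, -24.0]))
```

The routine stops at stage 4 with "strong local maximizer", just as in the example.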
Naturally, the procedure is of practical interest when it ends after a few stages.
For example, f(x1, x2, x3) = 3x1x3 - x2x3 is a quadratic form because it is the sum of the monomials of second degree 3x1x3 and -x2x3. It is easy to see that the following functions are quadratic forms:

f(x) = x²
f(x1, x2) = x1² + x2² - 4x1x2
f(x1, x2, x3) = x1x3 + 5x2x3 + x3²
f(x1, x2, x3, x4) = x1x4 - 2x1² + 3x2x3
In other words, given a symmetric matrix A of order n there exists a unique quadratic form f : R^n → R for which (23.18) holds. Vice versa, given a quadratic form f : R^n → R there exists a unique symmetric matrix A of order n for which (23.18) holds.

The matrix A = (aij) is called the matrix associated to the quadratic form f. We can write (23.18) in an extended manner as

f(x) = a11 x1² + a22 x2² + a33 x3² + ... + ann xn²
     + 2 a12 x1x2 + 2 a13 x1x3 + ... + 2 a1n x1xn
     + 2 a23 x2x3 + ... + 2 a2n x2xn + ... + 2 a(n-1)n x(n-1)xn

The coefficients of the squares x1², x2², ..., xn² are therefore the elements (a11, a22, ..., ann) of the diagonal of A, while for every i ≠ j the coefficient of the monomial xixj is 2aij. It is therefore very simple to move from the matrix to the quadratic form and vice versa. Let us give some examples.
Let us see give some examples.
Example 1056 The matrix associated to the quadratic form f (x1 ; x2 ; x3 ) = 3x1 x3 x2 x3
is given by 2 3
3
0 0 2
A=4 0 0 1 5
2
3 1
2 2 0
Indeed, for every x 2 R3 we have:
2 3
323
0 0 2 x1
x Ax = (x1 ; x2 ; x3 ) 4 0 0 1 54
x2 5
2
3 1
2 2 0 x3
3 1 3 1
= (x1 ; x2 ; x3 ) x3 ; x3 ; x1 x2
2 2 2 2
3 1 3 1
= x1 x3 x2 x3 + x1 x3 x2 x3 = 3x1 x3 x2 x3
2 2 2 2
Note that also the matrices
2 3 2 3
0 0 3 0 0 0
A=4 0 0 1 5 and A=4 0 0 0 5 (23.19)
0 0 0 3 1 0
are such that f (x) = x Ax, although they are not symmetric. What we loose without
symmetry is the one-to-one correspondence between quadratic forms and matrices. Indeed,
while given the quadratic form f (x1 ; x2 ; x3 ) = 3x1 x3 x2 x3 there exists a unique symmetric
matrix for which (23.18) holds, this is no longer true if we do not require the symmetry of
the matrix, as the two matrices in (23.19) show: for both of them, (23.18) holds. N
3 To ease notation we write x·Ax instead of the more precise x A x^T (cf. the discussion on vector notation in Section 13.2.4).
Example 1057 As to the quadratic form f(x1, x2) = x1² + x2² - 4x1x2, we have

A =
[  1  -2 ]
[ -2   1 ]

Indeed,

x·Ax = (x1, x2)·(x1 - 2x2, -2x1 + x2) = x1² - 2x1x2 - 2x1x2 + x2² = x1² + x2² - 4x1x2

N
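Moving between a symmetric matrix and its quadratic form is easy to automate; the sketch below (ours, not the book's) checks x·Ax against the polynomial of this example.

```python
# Quadratic form x . Ax for the symmetric matrix associated with
# f(x1, x2) = x1^2 + x2^2 - 4 x1 x2, checked against the polynomial itself.
def quad_form(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[1.0, -2.0],
     [-2.0, 1.0]]

def f(x1, x2):
    return x1 ** 2 + x2 ** 2 - 4 * x1 * x2

print(quad_form(A, [3.0, -1.0]), f(3.0, -1.0))  # the two values coincide
```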
Example 1058 Let f : R^n → R be defined by f(x) = ‖x‖² = Σ_{i=1}^{n} xi². The symmetric matrix associated to this quadratic form is the identity matrix I. Indeed, x·Ix = x·x = Σ_{i=1}^{n} xi². More generally, let f(x) = Σ_{i=1}^{n} λi xi² with λi ∈ R for every i = 1, ..., n. It is easy to see that the matrix associated to f is the diagonal matrix

[ λ1  0   0  ...  0  ]
[ 0   λ2  0  ...  0  ]
[ 0   0   λ3 ...  0  ]
[ ...                ]
[ 0   0   0  ...  λn ]

N
A quadratic form f : R^n → R is said to be:

(i) positive definite if f(x) > 0 for every x ≠ 0, and positive semi-definite if f(x) ≥ 0 for every x ∈ R^n;

(ii) negative definite if f(x) < 0 for every x ≠ 0, and negative semi-definite if f(x) ≤ 0 for every x ∈ R^n;

(iii) indefinite if there exist x, x' ∈ R^n such that f(x) < 0 < f(x').

In some cases it is easy to check the sign of a quadratic form. For example, it is immediate to see that the quadratic form f(x) = Σ_{i=1}^{n} λi xi² is positive semi-definite if and only if λi ≥ 0 for every i, while it is positive definite if and only if λi > 0 for every i. In general, however, it is not simple to determine directly the sign of a quadratic form and, therefore, some useful criteria have been elaborated. Among them, we consider the classic Sylvester-Jacobi criterion.
Given a symmetric matrix A, consider its leading principal submatrices A1, A2, A3, ..., An and their determinants det A1, det A2, det A3, ..., det An = det A.4 By the Sylvester-Jacobi criterion, the quadratic form f(x) = x·Ax is:

(i) positive definite if and only if det Ai > 0 for every i = 1, ..., n;

(ii) negative definite if and only if the determinants det Ai alternate in sign starting with a negative one (that is, det A1 < 0, det A2 > 0, det A3 < 0, and so on);

(iii) indefinite if the determinants det Ai are all non-zero and the sequence of their signs respects neither (i) nor (ii).
Example 1061 Let f(x1, x2, x3) = x1² + 2x2² + x3² + (x1 + x3)x2. The matrix associated to f is:

A =
[  1   1/2   0  ]
[ 1/2   2   1/2 ]
[  0   1/2   1  ]

Indeed, we have

x·Ax = (x1, x2, x3)·( x1 + (1/2)x2, (1/2)x1 + 2x2 + (1/2)x3, (1/2)x2 + x3 )
  = x1² + 2x2² + x3² + (x1 + x3)x2

Let us determine the sign of the quadratic form with the Sylvester-Jacobi criterion. We have:

det A1 = 1 > 0
det A2 = det [ 1 1/2 ; 1/2 2 ] = 7/4 > 0
det A3 = det A = 3/2 > 0

Hence, by the Sylvester-Jacobi criterion our quadratic form is positive definite. N
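The leading principal minors of Example 1061 can also be computed mechanically; the sketch below (ours, not the book's) uses a naive Laplace expansion, which is fine for small matrices.

```python
# Sylvester-Jacobi check for the quadratic form of Example 1061: compute the
# leading principal minors det A1, det A2, det A3 of the symmetric matrix A.
def det(M):
    # recursive Laplace expansion along the first row
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[1.0, 0.5, 0.0],
     [0.5, 2.0, 0.5],
     [0.0, 0.5, 1.0]]

minors = [det([row[:k] for row in A[:k]]) for k in (1, 2, 3)]
print(minors)  # [1.0, 1.75, 1.5]: all positive, so the form is positive definite
```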
for every x ∈ U.

We can now present the Taylor expansion for functions of several variables. As in the scalar case, in the general multivariable case the Taylor expansion refines the first order approximation (23.21). In stating it, we limit ourselves to a second order approximation, which suffices for our purposes.5
Expression (23.22) is called the quadratic (or second-order) Taylor expansion (or formula). The polynomial in the variable x

f(x0) + ∇f(x0)·(x - x0) + (1/2)(x - x0)·∇²f(x0)(x - x0)

is called the Taylor polynomial of second degree at the point x0. The second-degree term is a quadratic form. Its associated matrix, the Hessian ∇²f(x0), is symmetric by Schwarz's Theorem.
Naturally, if terminated at the first order, the Taylor expansion reduces to (23.21). Moreover, observe that in the scalar case the Taylor polynomial assumes the well-known form:

f(x0) + f'(x0)(x - x0) + (1/2) f''(x0)(x - x0)²

Indeed, in this case we have ∇f(x0) = f'(x0) and ∇²f(x0) = f''(x0).
As in the scalar case, here too we have a trade-o¤ between the simplicity of the approx-
imation and its accuracy. Indeed, the …rst order approximation (23.21) has the advantage
5
In the rest of this section U is an open convex set. We omit the proof of this theorem and refer readers
to more advanced courses for the study of approximations of higher order.
730 CHAPTER 23. APPROXIMATION
of simplicity compared to the quadratic one: we approximate with a linear function rather than with a second-degree polynomial, but to the detriment of the degree of accuracy of the approximation, given by $o(\|x - x_0\|)$ instead of the better $o(\|x - x_0\|^2)$.
Also in the multivariable case, the choice of the order at which to terminate the Taylor expansion therefore depends on the particular use we are interested in, and on which aspect of the approximation is more important, simplicity or accuracy.
Example 1063 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = 3x_1^2 e^{x_2^2}$. We have:
$$
\nabla f(x) = \left( 6x_1 e^{x_2^2},\ 6x_1^2 x_2 e^{x_2^2} \right)
$$
and
$$
\nabla^2 f(x) = \begin{bmatrix} 6 e^{x_2^2} & 12 x_1 x_2 e^{x_2^2} \\ 12 x_1 x_2 e^{x_2^2} & 6 x_1^2 e^{x_2^2}\left(1 + 2x_2^2\right) \end{bmatrix}
$$
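The gradient and Hessian above can be verified numerically with central finite differences; the following is a stdlib-only sketch (the evaluation point and step size are arbitrary choices):

```python
import math

def f(x1, x2):
    # the function of Example 1063
    return 3 * x1**2 * math.exp(x2**2)

def grad(x1, x2):
    # closed-form gradient from the example
    e = math.exp(x2**2)
    return (6 * x1 * e, 6 * x1**2 * x2 * e)

def num_partial(g, x1, x2, i, h=1e-5):
    # central-difference partial derivative of g in the i-th coordinate
    d1, d2 = (h, 0.0) if i == 0 else (0.0, h)
    return (g(x1 + d1, x2 + d2) - g(x1 - d1, x2 - d2)) / (2 * h)

x1, x2 = 0.7, -0.3
for i in range(2):
    assert abs(num_partial(f, x1, x2, i) - grad(x1, x2)[i]) < 1e-6

# Schwarz's Theorem: the two mixed second partials coincide
m12 = num_partial(lambda a, b: grad(a, b)[0], x1, x2, 1)
m21 = num_partial(lambda a, b: grad(a, b)[1], x1, x2, 0)
assert abs(m12 - m21) < 1e-6
print("gradient and mixed partials agree")
```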
We close with a first order approximation with Lagrange remainder that sharpens the approximation (23.20) with Peano remainder.6
$$
f(x_0 + h) = f(x_0) + \nabla f(x_0) \cdot h + \frac{1}{2}\, h \cdot \nabla^2 f(x_0 + \vartheta h)\, h \tag{23.24}
$$
Note that the same differentiability assumption that permitted the quadratic approximation (23.22) with a Peano remainder only allows for a first order approximation with the sharper Lagrange remainder. As usual, no free meals.
6. Higher order approximations with Lagrange remainders are notationally cumbersome, and we leave them to more advanced courses.
23.4. TAYLOR EXPANSION: FUNCTIONS OF SEVERAL VARIABLES 731
$$
f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2}(x - x_0) \cdot \nabla^2 f(x_0)(x - x_0) + o\left(\|x - x_0\|^2\right)
$$
If $\hat{x}$ is a local extremal point (either a maximizer or minimizer), by Fermat's Theorem we have $\nabla f(\hat{x}) = 0$ and therefore the approximation becomes
$$
f(x) = f(\hat{x}) + \frac{1}{2}(x - \hat{x}) \cdot \nabla^2 f(\hat{x})(x - \hat{x}) + o\left(\|x - \hat{x}\|^2\right) \tag{23.25}
$$
that is,
$$
f(\hat{x} + h) = f(\hat{x}) + \frac{1}{2}\, h \cdot \nabla^2 f(\hat{x})\, h + o\left(\|h\|^2\right)
$$
Based on this simple observation, we obtain the following second-order conditions, which rest on the sign of the quadratic form $h \cdot \nabla^2 f(\hat{x})\, h$.
Note that from point (i) it follows that if the quadratic form $h \cdot \nabla^2 f(\hat{x})\, h$ is indefinite, the point $\hat{x}$ is neither a local maximizer nor a local minimizer on U. This theorem is the multivariable analog of Corollary 1017. Indeed, in the proof we will use that corollary, since we will be able to reduce the problem from functions of several variables to functions of a single variable.
Proof We will prove only point (i), leaving point (ii) to the reader. So, let $\hat{x}$ be a local maximizer on U. We want to prove that the quadratic form $h \cdot \nabla^2 f(\hat{x})\, h$ is negative semi-definite. For simplicity, let us suppose that $\hat{x}$ is the origin $0 = (0, \dots, 0)$. First of all, let us prove that $v \cdot \nabla^2 f(0)\, v \le 0$ for every unit vector v of $\mathbb{R}^n$. We will then prove that $h \cdot \nabla^2 f(0)\, h \le 0$ for every vector $h \in \mathbb{R}^n$.
Since 0 is a local maximizer and U is open, there exists a small enough neighborhood $B_\varepsilon(0)$ so that $B_\varepsilon(0) \subseteq U$ and $f(0) \ge f(x)$ for every $x \in B_\varepsilon(0)$. Note that every vector $x \in B_\varepsilon(0)$ can be written as $x = tv$, where v is a unit vector of $\mathbb{R}^n$ (i.e., $\|v\| = 1$) and $t \in \mathbb{R}$.8
7. For simplicity we continue to consider functions defined on open sets. We leave to readers the routine extension of the results to functions $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and to interior points $\hat{x}$ that belong to a choice set $C \subseteq A$.
8. Intuitively, v represents the direction of x and t its norm (indeed, $\|x\| = |t|$).
Clearly, $tv \in B_\varepsilon(0)$ if and only if $|t| < \varepsilon$. Fix an arbitrary unit vector v in $\mathbb{R}^n$, and define the function $\varphi_v : (-\varepsilon, \varepsilon) \to \mathbb{R}$ by $\varphi_v(t) = f(tv)$. Since $tv \in B_\varepsilon(0)$ for $|t| < \varepsilon$, we have
$$
\varphi_v(t) = f(tv) \le f(0) = \varphi_v(0)
$$
for every $t \in (-\varepsilon, \varepsilon)$. It follows that $t = 0$ is a local maximizer for the function $\varphi_v$ and hence, being $\varphi_v$ differentiable and $t = 0$ an interior point of the domain of $\varphi_v$, by applying Corollary 1017 we get $\varphi_v'(0) = 0$ and $\varphi_v''(0) \le 0$. By applying the chain rule to the function $\varphi_v$ we obtain $\varphi_v''(0) = v \cdot \nabla^2 f(0)\, v$, so that $v \cdot \nabla^2 f(0)\, v \le 0$. Since the unit vector v of $\mathbb{R}^n$ is arbitrary, this last inequality holds for every unit vector of $\mathbb{R}^n$.
Now, let $h \in \mathbb{R}^n$. In much the same way as before, observe that $h = t_h v$ for some unit vector $v \in \mathbb{R}^n$ and $t_h \in \mathbb{R}$ such that $|t_h| = \|h\|$.
[Figure: decomposition $h = t_h v$ of a vector h into a unit direction v and a scalar length $t_h$]
Then
$$
h \cdot \nabla^2 f(0)\, h = t_h v \cdot \nabla^2 f(0)\, t_h v = t_h^2\; v \cdot \nabla^2 f(0)\, v
$$
Since $v \cdot \nabla^2 f(0)\, v \le 0$, we have also $h \cdot \nabla^2 f(0)\, h \le 0$. This holds for every $h \in \mathbb{R}^n$, so the quadratic form $h \cdot \nabla^2 f(0)\, h$ is negative semi-definite.
In the scalar case we get back to the usual second-order conditions, based on the sign of the second derivative $f''(\hat{x})$. Indeed, we already observed in (23.23) that in the scalar case one has
$$
x \cdot \nabla^2 f(\hat{x})\, x = f''(\hat{x})\, x^2
$$
Thus, in this case the sign of the quadratic form depends only on the sign of $f''(\hat{x})$; that is, it is negative (positive) definite if and only if $f''(\hat{x}) < 0$ ($> 0$), and it is negative (positive) semi-definite if and only if $f''(\hat{x}) \le 0$ ($\ge 0$).
Naturally, as in the scalar case, also in this general multivariable case condition (i) is only necessary for $\hat{x}$ to be a local maximizer.
Example 1066 Consider the function $f(x_1, x_2) = x_1^2 x_2$. At $\hat{x} = 0$ we have $\nabla^2 f(0) = O$. The corresponding quadratic form $x \cdot \nabla^2 f(0)\, x$ is identically zero and is therefore both negative and positive semi-definite. Nevertheless, $\hat{x} = 0$ is neither a local maximizer nor a local minimizer. Indeed, by taking a generic neighborhood $B_\varepsilon(0)$, let $x = (x_1, x_2) \in B_\varepsilon(\hat{x})$ be such that $x_1 = x_2$. Let t be such a common value, so that
$$
(t, t) \in B_\varepsilon(0) \iff \|(t, t)\| = \sqrt{t^2 + t^2} = |t|\sqrt{2} < \varepsilon \iff |t| < \frac{\varepsilon}{\sqrt{2}}
$$
Since $f(t, t) = t^3$, for every $(t, t) \in B_\varepsilon(0)$ we have $f(t, t) < f(0)$ if $t < 0$ and $f(0) < f(t, t)$ if $t > 0$, which shows that $\hat{x} = 0$ is neither a local maximizer nor a local minimizer.9 N
Example 1067 For instance, consider the function $f(x) = -x_1^2 x_2^2$. The point $\hat{x} = 0$ is clearly a (global) maximizer for the function f, but $\nabla^2 f(0) = O$, so the corresponding quadratic form $x \cdot \nabla^2 f(0)\, x$ is not negative definite. N
The Hessian $\nabla^2 f(\hat{x})$ is the symmetric matrix associated with the quadratic form $x \cdot \nabla^2 f(\hat{x})\, x$. We can therefore equivalently state Theorem 1065 in the following way:
This Hessian version is important operationally because there exist criteria, such as the Sylvester-Jacobi one, to determine whether a symmetric matrix is positive/negative definite or semi-definite. For instance, consider a generic function of two variables $f : \mathbb{R}^2 \to \mathbb{R}$ that is twice continuously differentiable. Let $x_0 \in \mathbb{R}^2$ be a stationary point, $\nabla f(x_0) = (0, 0)$, and let
$$
\nabla^2 f(x_0) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2}(x_0) & \dfrac{\partial^2 f}{\partial x_1 \partial x_2}(x_0) \\[2ex] \dfrac{\partial^2 f}{\partial x_2 \partial x_1}(x_0) & \dfrac{\partial^2 f}{\partial x_2^2}(x_0) \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{23.26}
$$
be the Hessian matrix computed at the point $x_0$. Since the gradient at $x_0$ is zero, the point is a candidate to be a maximizer or minimizer of f. To determine its exact nature, it is necessary to analyze the Hessian matrix at the point. By Theorem 1065, $x_0$ is a maximizer if the Hessian is negative definite, a minimizer if it is positive definite, and neither a maximizer nor a minimizer if it is indefinite. If the Hessian is only semi-definite, positive or negative, it is not possible to draw conclusions on the nature of $x_0$. Applying the Sylvester-Jacobi criterion to the matrix (23.26) we have that:
9. In an alternative way, it is sufficient to observe that at each point of the I or II quadrant, except the axes, we have $f(x_1, x_2) > 0$, and that at each point of the III or IV quadrant, except the axes, we have $f(x_1, x_2) < 0$. Every neighborhood of the origin necessarily contains both points of the I and II quadrants (except the axes), for which we have $f(x_1, x_2) > 0 = f(0)$, and points of the III and IV quadrants (except the axes), for which we have $f(x_1, x_2) < 0 = f(0)$. Hence 0 is neither a local maximizer nor a local minimizer.
(i) if $a > 0$ and $ad - bc > 0$, the Hessian is positive definite, so $x_0$ is a strong local minimizer;
(ii) if $a < 0$ and $ad - bc > 0$, the Hessian is negative definite, so $x_0$ is a strong local maximizer;
(iii) if $ad - bc < 0$, the Hessian is indefinite, and therefore $x_0$ is neither a local maximizer nor a local minimizer.
In all the other cases it is not possible to say anything on the nature of the point $x_0$.
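The three cases above translate into a small decision helper; this is a sketch with a hypothetical function name `classify`, not code from the text:

```python
def classify(a, b, c, d):
    """Classify a stationary point x0 from its 2x2 Hessian [[a, b], [c, d]]."""
    det = a * d - b * c
    if det > 0 and a > 0:
        return "strong local minimizer"
    if det > 0 and a < 0:
        return "strong local maximizer"
    if det < 0:
        return "neither (indefinite Hessian)"
    return "inconclusive"

print(classify(6, 0, 0, 2))      # the Hessian of Example 1068: minimizer
print(classify(-4, -1, -1, -2))  # a negative definite Hessian: maximizer
print(classify(2, 3, 3, 1))      # det = 2 - 9 < 0: neither
print(classify(0, 0, 0, 6))      # only semi-definite: inconclusive
```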
Example 1068 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = 3x_1^2 + x_2^2 + 6x_1$. We have $\nabla f(x) = (6x_1 + 6,\ 2x_2)$ and
$$
\nabla^2 f(x) = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix}
$$
It is easy to see that the unique point where the gradient vanishes is $x_0 = (-1, 0) \in \mathbb{R}^2$, that is, $\nabla f(-1, 0) = (0, 0)$. Moreover, in view of the previous discussion, since $a > 0$ and $ad - bc > 0$, the point $x_0 = (-1, 0)$ is a strong local minimizer of f. N
Example 1069 Let $f : \mathbb{R}^3 \to \mathbb{R}$ be given by $f(x_1, x_2, x_3) = x_1^3 + x_2^3 + 3x_3^2 - 2x_3 + x_1^2 x_2^2$. We have
$$
\nabla f(x) = \left( 3x_1^2 + 2x_1 x_2^2,\ 3x_2^2 + 2x_1^2 x_2,\ 6x_3 - 2 \right)
$$
and
$$
\nabla^2 f(x) = \begin{bmatrix} 6x_1 + 2x_2^2 & 4x_1 x_2 & 0 \\ 4x_1 x_2 & 6x_2 + 2x_1^2 & 0 \\ 0 & 0 & 6 \end{bmatrix}
$$
The stationary points are $x' = (-3/2, -3/2, 1/3)$ and $x'' = (0, 0, 1/3)$. At $x'$, we have
$$
\nabla^2 f(x') = \begin{bmatrix} -\frac{9}{2} & 9 & 0 \\ 9 & -\frac{9}{2} & 0 \\ 0 & 0 & 6 \end{bmatrix}
$$
and therefore
$$
\det\begin{bmatrix} -\frac{9}{2} \end{bmatrix} < 0, \qquad
\det\begin{bmatrix} -\frac{9}{2} & 9 \\ 9 & -\frac{9}{2} \end{bmatrix} < 0, \qquad
\det \nabla^2 f(x') < 0
$$
By the Sylvester-Jacobi criterion the Hessian matrix is indefinite. By Theorem 1065, the point $x' = (-3/2, -3/2, 1/3)$ is neither a local minimizer nor a local maximizer. For the point $x'' = (0, 0, 1/3)$ we have
$$
\nabla^2 f(x'') = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 6 \end{bmatrix}
$$
which is positive semi-definite since $x \cdot \nabla^2 f(x'')\, x = 6x_3^2$ (note that it is not positive definite: for example, we have $(1, 1, 0) \cdot \nabla^2 f(x'')\,(1, 1, 0) = 0$). N
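A numerical sanity check of this example (plain Python, not from the text): the gradient vanishes at both stationary points, and the three leading principal minors at $x'$ are all negative, confirming indefiniteness:

```python
def grad(x1, x2, x3):
    # gradient of f(x) = x1^3 + x2^3 + 3*x3^2 - 2*x3 + x1^2 * x2^2
    return (3 * x1**2 + 2 * x1 * x2**2,
            3 * x2**2 + 2 * x1**2 * x2,
            6 * x3 - 2)

# both stationary points annihilate the gradient
for p in [(-1.5, -1.5, 1 / 3), (0.0, 0.0, 1 / 3)]:
    assert all(abs(g) < 1e-12 for g in grad(*p))

# leading principal minors of the Hessian [[-9/2, 9, 0], [9, -9/2, 0], [0, 0, 6]],
# exploiting its block-diagonal structure
a, b = -4.5, 9.0
minors = [a, a * a - b * b, (a * a - b * b) * 6]
assert all(m < 0 for m in minors)  # all negative: indefinite, not semi-definite
print(minors)  # [-4.5, -60.75, -364.5]
```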
1. Determine the set $S \subseteq C$ of the stationary interior points of f by solving the first order condition $\nabla f(x) = 0$ (Section 22.1.3).
2. Calculate the Hessian matrix $\nabla^2 f$ at each of the stationary points $x \in S$ and determine the set
$$
S_2 = \left\{ x \in S : \nabla^2 f(x)\ \text{is negative semi-definite} \right\}
$$
Also here the procedure is not conclusive because nothing ensures the existence of a solution. Later in the book we will discuss this crucial problem by combining, in the elimination method, such existence theorems with the differential methods.
Here $C = \mathbb{R}^2_{++}$ is the first quadrant of the plane without the axes (hence an open set). We have
$$
\nabla f(x) = \left( -4x_1 + 3 - x_2,\ -2x_2 + 3 - x_1 \right)
$$
Therefore, from the first-order condition $\nabla f(x) = 0$ it follows that the unique stationary point is $\bar{x} = (3/7, 9/7)$, that is, $S = \{(3/7, 9/7)\}$. We have
$$
\nabla^2 f(x) = \begin{bmatrix} -4 & -1 \\ -1 & -2 \end{bmatrix}
$$
By the Sylvester-Jacobi criterion, the Hessian matrix $\nabla^2 f(x)$ is negative definite.10 Hence, $S_2 = \{(3/7, 9/7)\}$. Since $S_2$ is a singleton, we have trivially $S_3 = S_2$. In conclusion, the point $\bar{x} = (3/7, 9/7)$ is the unique candidate to be a solution of the unconstrained optimization problem. One can show that this point is indeed the solution of the problem. For the moment we can only say that, by Theorem 1065-(ii), it is a local maximizer. N
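Since the first-order condition here is a 2×2 linear system, it can be solved exactly; the following sketch (not from the text) uses Cramer's rule with exact rationals:

```python
from fractions import Fraction as F

# First-order condition: -4*x1 - x2 + 3 = 0 and -x1 - 2*x2 + 3 = 0,
# i.e. [[-4, -1], [-1, -2]] @ x = [-3, -3], solved by Cramer's rule.
a, b, c, d = F(-4), F(-1), F(-1), F(-2)
e1, e2 = F(-3), F(-3)
det = a * d - b * c            # = 7
x1 = (e1 * d - b * e2) / det   # = 3/7
x2 = (a * e2 - e1 * c) / det   # = 9/7
print(x1, x2)  # 3/7 9/7

# Sylvester-Jacobi on the constant Hessian: a < 0 and det > 0
assert a < 0 and det > 0  # negative definite: (3/7, 9/7) is a local maximizer
```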
10. Since $\nabla^2 f(x)$ is negative definite for all $x \in \mathbb{R}^2_{++}$, this also proves that f is concave.
Example 1071 (i) Power functions $\varphi_n(x) = (x - x_0)^n$ are an asymptotic scale at $x_0 \in (a, b)$. (ii) Negative power functions $\varphi_n(x) = x^{-n}$ are an asymptotic scale at $x_0 = +\infty$.13 More generally, powers $\varphi_n(x) = x^{-\alpha_n}$ form an asymptotic scale at $x_0 = +\infty$ as long as $\alpha_{n+1} > \alpha_n$ for every $n \ge 1$. (iii) The trigonometric functions $\varphi_n(x) = \sin^n(x - x_0)$ form an asymptotic scale at $x_0 \in (a, b)$. (iv) Logarithms $\varphi_n(x) = 1/\log^n x$ form an asymptotic scale at $x_0 = +\infty$. N
are a special case of (23.27) in which the asymptotic scale is given by power functions. Contrary to the polynomial case where $x_0$ had to be a scalar, now we can take $x_0 = \pm\infty$. Indeed, general expansions are relevant because, relative to the special case of polynomial expansions, they also allow us to approximate a function for large values of the argument, that is, asymptotically.
In symbols, condition (23.27) can be expressed as
$$
f(x) \sim \sum_{k=0}^{n} \alpha_k \varphi_k(x) \quad \text{as } x \to x_0
$$
11. Throughout this section we will maintain this assumption.
12. The expression $x_0 \in [a, b]$ entails that $x_0$ is an accumulation point of (a, b). For example, if (a, b) is the real line, the point $x_0$ belongs to the extended real line; in symbols, if $(a, b) = (-\infty, +\infty)$ we have $x_0 \in [-\infty, +\infty]$.
13. When, as in this example, we have $x_0 = +\infty$, the interval (a, b) is understood to be unbounded above: $b = +\infty$ (the example of the negative power function scale was made by Poincaré himself).
23.5. CODA: ASYMPTOTIC EXPANSIONS 737
By using the scale of power functions, we end up with the well-known quadratic approximation
$$
f(x) \sim \alpha_0 + \alpha_1 x + \alpha_2 x^2 \quad \text{as } x \to 0
$$
However, if we use the scale of negative power functions, we get:
$$
f(x) \sim \alpha_0 + \frac{\alpha_1}{x} + \frac{\alpha_2}{x^2} \quad \text{as } x \to +\infty
$$
In such a case, being $x_0 = +\infty$, we are dealing with a quadratic asymptotic approximation.
The key uniqueness property of polynomial expansions (Lemma 1039) still holds in the general case.
Lemma 1074 A function $f : (a, b) \to \mathbb{R}$ has at most a unique expansion of order n with respect to a scale at every point $x_0 \in [a, b]$.
Proof Consider the expansion $\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)$ at $x_0 \in [a, b]$. We have
$$
\lim_{x \to x_0} \frac{f(x)}{\varphi_0(x)} = \lim_{x \to x_0} \frac{\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)}{\varphi_0(x)} = \alpha_0 \tag{23.29}
$$
$$
\lim_{x \to x_0} \frac{f(x) - \alpha_0 \varphi_0(x)}{\varphi_1(x)} = \lim_{x \to x_0} \frac{\sum_{k=1}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)}{\varphi_1(x)} = \alpha_1 \tag{23.30}
$$
$$
\lim_{x \to x_0} \frac{f(x) - \sum_{k=0}^{n-1} \alpha_k \varphi_k(x)}{\varphi_n(x)} = \alpha_n \tag{23.31}
$$
Suppose that, for every $x \in (a, b)$, there are two different expansions
$$
\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n) = \sum_{k=0}^{n} \beta_k \varphi_k(x) + o(\varphi_n) \tag{23.32}
$$
Equalities (23.29)-(23.31) must hold for both expansions. Hence, by (23.29) we have that $\alpha_0 = \beta_0$. Iterating such a procedure, from equality (23.30) we get $\alpha_1 = \beta_1$, and so on until $\alpha_n = \beta_n$.
Limits (23.29)-(23.31) are crucial: it is easy to prove that the expansion (23.27) holds if and only if the limits exist (and are finite).14 Such limits, in turn, determine the expansion's coefficients $\{\alpha_k\}_{k=0}^{n}$.
Example 1075 Let us determine the quadratic asymptotic approximation, with respect to the scale of negative power functions, for the function $f : (-1, +\infty) \to \mathbb{R}$ defined by $f(x) = 1/(1 + x)$. Thanks to equalities (23.29)-(23.31), we have
$$
\alpha_0 = \lim_{x \to +\infty} \frac{f(x)}{\varphi_0(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x}}{1} = \lim_{x \to +\infty} \frac{1}{1+x} = 0
$$
$$
\alpha_1 = \lim_{x \to +\infty} \frac{f(x) - \alpha_0 \varphi_0(x)}{\varphi_1(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x}}{\frac{1}{x}} = \lim_{x \to +\infty} \frac{x}{1+x} = 1
$$
$$
\alpha_2 = \lim_{x \to +\infty} \frac{f(x) - \alpha_0 \varphi_0(x) - \alpha_1 \varphi_1(x)}{\varphi_2(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x} - \frac{1}{x}}{\frac{1}{x^2}} = \lim_{x \to +\infty} \frac{-x}{1+x} = -1
$$
By the previous lemma, it is the only quadratic asymptotic approximation with respect to the scale of negative power functions. N
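Numerically, the error of the approximation $1/x - 1/x^2$ vanishes faster than $1/x^2$, as the expansion requires (a plain-Python sketch with arbitrary sample points):

```python
# f(x) = 1/(1 + x); its quadratic asymptotic approximation is 1/x - 1/x**2.
# The exact error is 1/(x**2 * (1 + x)), so x**2 * error -> 0 as x -> +inf.
for x in [1e2, 1e4, 1e6]:
    err = 1 / (1 + x) - (1 / x - 1 / x**2)
    print(x, x**2 * err)  # the scaled error shrinks like 1/(1 + x)

x = 1e6
assert abs(1 / (1 + x) - (1 / x - 1 / x**2)) * x**2 < 1e-5
```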
If we change the scale, the expansion changes as well. For example, approximation (23.28) is a quadratic approximation for $1/(x - 1)$ with respect to the scale of negative power functions. However, by changing scale one obtains a different quadratic approximation. Indeed, if for example at $x_0 = +\infty$ we consider the asymptotic scale $\varphi_n(x) = (x + 1)/x^{2n}$, we obtain the quadratic asymptotic approximation
$$
\frac{1}{x - 1} \sim \frac{x + 1}{x^2} + \frac{x + 1}{x^4} \quad \text{as } x \to +\infty
$$
In fact,
The reader might recall that we considered the following two formulations of the De Moivre-Stirling formula
where the integral is an improper one (Section 35.11.1). We already know that this function is log-convex (Example 772). Moreover, it satisfies the following formula.
Proof By integrating by parts, one obtains that for every $0 < a < b$
$$
\int_a^b t^x e^{-t}\,dt = \left[ -e^{-t} t^x \right]_a^b + x \int_a^b t^{x-1} e^{-t}\,dt = -e^{-b} b^x + e^{-a} a^x + x \int_a^b t^{x-1} e^{-t}\,dt
$$
$$
\Gamma(n + 1) = n\,\Gamma(n) = n(n - 1)\,\Gamma(n - 1) = \cdots = n!\,\Gamma(1) = n!
$$
15. Since $x > 0$, we have $\lim_{a \to 0} a^x = 0$ since $-\infty = x \lim_{a \to 0} \log a = \lim_{a \to 0} \log a^x$.
since $\Gamma(1) = 1$. The gamma function can, therefore, be thought of as the extension to the real line of the factorial function $f(n) = n!$, which is defined on the natural numbers (so, it is a sequence).16 It is an important function: the next remarkable result makes more rigorous its interpretation in terms of the expansion of the two versions of the De Moivre-Stirling formula.
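The identity $\Gamma(n + 1) = n!$ is easy to confirm with the standard library, whose `math.gamma` implements the gamma function (a sketch, not from the text):

```python
import math

# Gamma extends the factorial: Gamma(n + 1) = n * Gamma(n) = n!
for n in range(1, 10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
    assert math.isclose(math.gamma(n + 1), n * math.gamma(n))
print("Gamma(n + 1) == n! for n = 1, ..., 9")
```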
Example 1079 (i) The function $f : (1, +\infty) \to \mathbb{R}$ defined by $f(x) = 1/(x - 1)$ has, with respect to the scale of negative power functions, the asymptotic expansion
$$
f(x) \sim \sum_{k=1}^{\infty} \frac{1}{x^k} \quad \text{as } x \to +\infty \tag{23.33}
$$
The asymptotic expansion is, for every given x, a geometric series. Therefore, it converges for every $x > 1$ (i.e., for every x in the domain of f) with
$$
f(x) = \sum_{k=1}^{\infty} \frac{1}{x^k}
$$
16. Instead of $\Gamma(n + 1) = n!$ we would have exactly $\Gamma(n) = n!$ if in the gamma function the exponent were x instead of $x - 1$ (we adopt the standard notation). This detail also explains the opposite sign of the logarithmic term in the approximations of $n!$ and of $\Gamma(x)$. The properties of the gamma function, including the next theorem and its proof, can be found in Artin (1964).
In this (fortunate) case the asymptotic expansion is actually correct: the series determined by the asymptotic expansion converges to $f(x)$ for every $x \in (a, b)$.
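Convergence in this fortunate case is easy to observe numerically: the partial sums of $\sum 1/x^k$ approach $1/(x-1)$ (a plain-Python sketch; the value x = 3 is arbitrary):

```python
# For x > 1 the geometric series sum_{k>=1} x**(-k) converges to 1/(x - 1).
def partial_sum(x, n):
    return sum(x ** (-k) for k in range(1, n + 1))

x = 3.0
exact = 1 / (x - 1)  # 0.5
assert abs(partial_sum(x, 60) - exact) < 1e-12
print(partial_sum(x, 5), exact)  # the partial sum is already close to 0.5
```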
(ii) Also the function $f : (1, +\infty) \to \mathbb{R}$ defined by $f(x) = (1 + e^{-x})/(x - 1)$ has, with respect to the scale of negative power functions, the asymptotic expansion (23.33) for $x \to +\infty$. However, in this case we have, for every $x > 1$,
$$
f(x) \neq \sum_{k=1}^{\infty} \frac{1}{x^k}
$$
In this example the asymptotic expansion is merely an approximation, with degree of accuracy $x^{-n}$ for every n.
(iii) Consider the function $f : (1, +\infty) \to \mathbb{R}$ defined by:17
$$
f(x) = e^{-x} \int_1^x \frac{e^t}{t}\,dt
$$
We have
$$
\int_1^x \frac{e^t}{t^{n+1}}\,dt = o\left( \frac{e^x}{x^n} \right) \quad \text{as } x \to +\infty
$$
Hence,
$$
f(x) = \frac{g(x)}{e^x} = \frac{1}{x} + \frac{1}{x^2} + \frac{2!}{x^3} + \frac{3!}{x^4} + \cdots + \frac{(n-1)!}{x^n} + o\left( \frac{1}{x^n} \right) \quad \text{as } x \to +\infty
$$
and
$$
f(x) \sim \sum_{k=1}^{\infty} \frac{(k-1)!}{x^k} \quad \text{as } x \to +\infty
$$
For any given $x > 1$, the ratio criterion implies $\sum_{k=1}^{\infty} (k-1)!/x^k = \sum_{k=1}^{\infty} k!/(k x^k) = +\infty$. The asymptotic expansion thus determines a divergent series. In this (very unfortunate) case not only does the series not converge to $f(x)$, but it even diverges. N
17. This example is taken from de Bruijn (1961).
Let us go back to the polynomial case, in which the asymptotic expansion of $f : (a, b) \to \mathbb{R}$ at $x_0 \in (a, b)$ has the power series form18
$$
f(x) \sim \sum_{k=0}^{\infty} \alpha_k (x - x_0)^k \quad \text{as } x \to x_0
$$
The right-hand side of the expansion is a power series called the Taylor series (Maclaurin if $x_0 = 0$) of f at $x_0$, with coefficients $\alpha_k = f^{(k)}(x_0)/k!$.
But when can we turn $\sim$ into $=$, that is, when can these approximations become, at least locally, exact? To answer this important question, we introduce the following classic class of functions.
Definition 1080 A function $f : (a, b) \to \mathbb{R}$ is said to be analytic if, for every $x_0 \in (a, b)$, there is a neighborhood $B(x_0)$ and a sequence of scalars $\{\alpha_k\}_{k=0}^{\infty}$ such that
$$
f(x) = \sum_{k=0}^{\infty} \alpha_k (x - x_0)^k \qquad \forall x \in B(x_0)
$$
Proposition 1081 A function $f : (a, b) \to \mathbb{R}$ is analytic if and only if it is infinitely differentiable and, for every $x_0 \in (a, b)$, there is a neighborhood $B(x_0)$ such that
$$
f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \qquad \forall x \in B(x_0) \tag{23.34}
$$
Lemma 1074 implies that $\alpha_k = f^{(k)}(x_0)/k!$ for every $1 \le k \le n$. Since n was arbitrarily chosen, the desired result follows.
To answer the previous "approximation vs. exact" question thus amounts to establishing the analyticity of a function: we can turn $\sim$ into $=$, at least locally, if the function is analytic.
18. For simplicity, in Section 10.5 we considered power series with $x_0 = 0$ but, of course, everything goes through if $x_0$ is any scalar.
is infinitely differentiable at every point of the real line, hence at the origin. So,
$$
f(x) \sim \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k \quad \text{as } x \to 0
$$
Next we present two classic analyticity criteria.19 The first one is based on the radius of convergence of the Taylor series.
The second, quite striking, criterion is based on the sign of the derivatives.
$$
f(x) = \frac{1}{1 - x}
$$
19. The first criterion was proved by Alfred Pringsheim in 1893, the second one by Sergei Bernstein in 1912. We omit the proofs of these deep results and refer interested readers to Krantz and Parks (2002).
as desired. So, at all $x < 1$ we have $f^{(k)}(x) \ge 0$ for all $k \ge 1$. By Bernstein's Theorem, f is analytic on $(-\infty, 1)$. That is, at all $x_0 < 1$ there is a neighborhood $B(x_0) \subseteq (-\infty, 1)$ such that
$$
f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k = \sum_{k=0}^{\infty} \frac{(x - x_0)^k}{(1 - x_0)^{k+1}} \qquad \forall x \in B(x_0)
$$
because $\left| (x - x_0)/(1 - x_0) \right| < 1$ if and only if $x \in (2x_0 - 1, 1)$. So, we can take $B(x_0) = (2x_0 - 1, 1)$, a neighborhood of $x_0$ of radius $1 - x_0$.20 For instance, at the origin $x_0 = 0$ we have
$$
f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k = \sum_{k=0}^{\infty} x^k \qquad \forall x \in (-1, 1)
$$
If the functions $f, g : (a, b) \to \mathbb{R}$ are analytic and $\alpha, \beta \in \mathbb{R}$ are any two scalars, then the function $\alpha f + \beta g : (a, b) \to \mathbb{R}$ is still analytic. So, linear combinations of analytic functions are analytic. This simple remark, combined with analyticity criteria like the previous ones, makes it possible to establish that many functions of interest are analytic. The following result shows that, indeed, some classic elementary functions are analytic.
Proposition 1086 (i) The exponential and logarithmic functions are analytic. In particular,
$$
e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} \qquad \forall x \in \mathbb{R}
$$
$$
\log(1 + x) = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^k}{k} \qquad \forall x \in (-1, 1]
$$
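The two Maclaurin series can be checked against the standard library (a stdlib-only sketch; truncation orders are arbitrary):

```python
import math

def exp_series(x, n=30):
    # partial Maclaurin sum for e^x: sum_{k=0}^{n} x**k / k!
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def log_series(x, n):
    # partial sum of log(1+x) = sum_{k>=1} (-1)**(k+1) * x**k / k, valid on (-1, 1]
    return sum((-1) ** (k + 1) * x**k / k for k in range(1, n + 1))

for x in [-2.0, 0.5, 3.0]:
    assert math.isclose(exp_series(x), math.exp(x), rel_tol=1e-12)
assert math.isclose(log_series(0.5, 60), math.log(1.5), rel_tol=1e-9)
print("partial sums match math.exp and math.log")
```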
20. Note that $x_0 < 1$ implies $2x_0 - 1 < 1$.
(ii) The trigonometric functions sine and cosine are analytic. In particular,
$$
\sin x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!} x^{2k+1} \quad \text{and} \quad \cos x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k)!} x^{2k} \qquad \forall x \in \mathbb{R}
$$
on the real line. The same conclusion could have been achieved via Bernstein’s Theorem.
Theorem 1087 (Hille) Let $f : (0, \infty) \to \mathbb{R}$ be a bounded continuous function and $x_0 > 0$. Then, for each $h > 0$,
$$
f(x_0 + h) = \lim_{\delta \to 0^+} \sum_{k=0}^{\infty} \frac{\Delta_\delta^k f(x_0)}{k!}\, h^k \tag{23.35}
$$
We call the limit (23.35) Hille's formula. When f is infinitely differentiable, Hille's formula intuitively should approach the series expansion (23.34), i.e.,
$$
f(x_0 + h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}\, h^k
$$
because $\lim_{\delta \to 0^+} \Delta_\delta^k f(x_0) = f^{(k)}(x_0)$ for every $k \ge 1$ (Proposition 942). This is actually true when f is analytic because in this case (23.34) and (23.35) together imply
$$
\lim_{\delta \to 0^+} \sum_{k=0}^{\infty} \frac{\Delta_\delta^k f(x_0)}{k!}\, h^k = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}\, h^k
$$
Hille's formula, however, holds when f is just bounded and continuous, thus providing a remarkable generalization of Taylor's expansion of analytic functions.
But do the coefficients of Taylor (in particular, of Maclaurin) series have some characterizing property? Is there some peculiar property that such coefficients satisfy? In the special case of analytic functions, the answer is positive: the Cauchy-Hadamard Theorem requires that $\limsup \sqrt[n]{|\alpha_n|} < +\infty$, so only sequences of scalars $\{\alpha_k\}_{k=0}^{\infty}$ that satisfy such a bound may qualify to be coefficients of a Taylor series of some analytic function. Yet, we learned in Example 1082 that there exist infinitely differentiable functions that are not analytic. Indeed, the next deep theorem, whose highly non-trivial proof we omit, shows that, in general, the previous questions have a negative answer.21
is easily seen to be also such that $f^{(k)}(0) = c_k$ for all $k = 0, 1, \dots, n, \dots$. A continuum of infinitely differentiable functions that satisfy (23.36) thus exists.
21. The theorem was independently proved between 1884 and 1895 by Giuseppe Peano and Emile Borel (Borel's version is the best known, hence the name of this subsection).
Chapter 24

Concavity and differentiability
Concave functions have remarkable differentiability properties that confirm the great tractability of these widely used functions. The study of these properties is the subject matter of this chapter. We begin with scalar functions and then move to functions of several variables. Throughout the chapter C always denotes a convex set (so an interval in the scalar case). For brevity, we will focus on concave functions, leaving to the readers the dual results that hold for convex functions.
$$
\frac{f(y) - f(x)}{y - x}
$$
as one can verify with a simple modification of what was done for (20.6). Graphically:
[Figure: the chord through (x, f(x)) and (y, f(y)); its slope is the difference quotient (f(y) - f(x))/(y - x)]
748 CHAPTER 24. CONCAVITY AND DIFFERENTIABILITY
If the function f is concave, the slope of the chord decreases when we move the chord rightward. This basic geometric property characterizes concavity, as the next lemma shows.
Lemma 1089 A function $f : C \subseteq \mathbb{R} \to \mathbb{R}$ is concave if and only if, for any four points $x, w, y, z \in C$ with $x \le w < y \le z$, we have
$$
\frac{f(y) - f(x)}{y - x} \ge \frac{f(z) - f(w)}{z - w} \tag{24.1}
$$
In other words, by moving rightward from [x, y] to [w, z], the slope of the chords decreases. Graphically:
[Figure: chords over [x, y] and [w, z] of a concave function, with endpoints A, B, C, D; the chord slope decreases moving rightward]
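Inequality (24.1) can be probed numerically for a concrete concave function, say log (a plain-Python sketch with arbitrarily chosen quadruples of points):

```python
import math

def slope(f, u, v):
    # slope of the chord of f over [u, v]
    return (f(v) - f(u)) / (v - u)

f = math.log  # a concave function on (0, +inf)
# quadruples x <= w < y <= z, as in Lemma 1089
for (x, w, y, z) in [(1, 1, 2, 3), (1, 1.5, 2, 2), (0.5, 1, 4, 9)]:
    assert slope(f, x, y) >= slope(f, w, z)
print("chord slopes decrease when moving rightward")
```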
Proof "Only if". Let f be concave. The proof is divided into two steps: first we show that the chord AC has a greater slope than the chord BC:
[Figure: first step, comparing the chords AC and BC]
24.1. SCALAR FUNCTIONS 749
Then, we show that the chord BC has a greater slope than the chord BD:
[Figure: second step, comparing the chords BC and BD]
The first step amounts to proving (24.1) for $z = y$. Since $x \le w < y$, there exists $\lambda \in [0, 1]$ such that $w = \lambda x + (1 - \lambda) y$. Since f is concave, we have $f(w) \ge \lambda f(x) + (1 - \lambda) f(y)$, so that
$$
\frac{f(y) - f(w)}{y - w} \le \frac{f(y) - \lambda f(x) - (1 - \lambda) f(y)}{y - \lambda x - (1 - \lambda) y} = \frac{f(y) - f(x)}{y - x} \tag{24.2}
$$
This completes the first step. We now move to the second step, which amounts to proving (24.1) for $x = w$. Since $w < y \le z$, there exists $\lambda \in [0, 1]$ such that $y = \lambda w + (1 - \lambda) z$. Further, since f is concave we have $f(y) \ge \lambda f(w) + (1 - \lambda) f(z)$, so that
The geometric property (24.1) has the following analytical counterpart, of great economic significance.
Proof Let $x \le y$ and $h \ge 0$. Then the points y and $x + h$ belong to the interval $[x, y + h]$. Under the change of variable $z = y + h$, we have $x + h,\, z - h \in [x, z]$. Hence there is a $\lambda \in [0, 1]$ for which $x + h = \lambda x + (1 - \lambda) z$. It is immediate to check that $z - h = (1 - \lambda) x + \lambda z$. By the concavity of f, we then have $f(x + h) \ge \lambda f(x) + (1 - \lambda) f(z)$ and $f(z - h) \ge (1 - \lambda) f(x) + \lambda f(z)$. Adding the two inequalities, we have
$$
f(x + h) + f(z - h) \ge f(x) + f(z)
$$
that is,
$$
f(x + h) - f(x) \ge f(z) - f(z - h) = f(y + h) - f(y).
$$
The inequality (24.4) does not change if we divide both sides by an $h > 0$. Hence,
$$
f'_+(x) = \lim_{h \to 0^+} \frac{f(x + h) - f(x)}{h} \ge \lim_{h \to 0^+} \frac{f(y + h) - f(y)}{h} = f'_+(y)
$$
provided the limits exist. Similarly $f'_-(x) \ge f'_-(y)$, and so $f'(x) \ge f'(y)$ when the (bilateral) derivative exists. Concave functions f thus feature decreasing marginal effects as their argument increases, and so embody a fundamental economic principle: additional units have a lower and lower marginal impact on levels (of utility, of production, and so on; we then talk of decreasing marginal utility, decreasing marginal returns, and so on). It is through this principle that forms of concavity first entered economics.1
The next lemma establishes this property rigorously by showing that one-sided derivatives exist and are decreasing.
(i) the right derivative $f'_+(x)$ and the left derivative $f'_-(x)$ exist at each $x \in \operatorname{int} C$;2
(ii) the right derivative function $f'_+$ and the left derivative function $f'_-$ are both decreasing on $\operatorname{int} C$;
Proof Since $x_0$ is an interior point, it has a neighborhood $(x_0 - \varepsilon, x_0 + \varepsilon)$ included in C, that is, $(x_0 - \varepsilon, x_0 + \varepsilon) \subseteq C$. Let $0 < a < \varepsilon$, so that we have $[x_0 - a, x_0 + a] \subseteq C$. Let $\varphi : [-a, a] \to \mathbb{R}$ be defined by
$$
\varphi(h) = \frac{f(x_0 + h) - f(x_0)}{h} \qquad \forall h \in [-a, a],\ h \neq 0
$$
Property (24.1) implies that $\varphi$ is decreasing, that is, $\varphi(h'') \le \varphi(h')$ whenever $h' < h''$. Indeed, if $h' < 0 < h''$ it is sufficient to apply (24.1) with $w = y = x_0$, $x = x_0 + h'$ and $z = x_0 + h''$. If $h' \le h'' < 0$, apply (24.2) with $y = x_0$, $x = x_0 + h'$ and $w = x_0 + h''$. If $0 < h' \le h''$, apply (24.3) with $w = x_0$, $y = x_0 + h'$ and $z = x_0 + h''$.
$$
\frac{f(x + h) - f(x)}{h} \ge \frac{f(y + h) - f(y)}{h} \qquad \forall h \in [-a, a]
$$
Hence,
$$
f'_+(x) = \lim_{h \to 0^+} \frac{f(x + h) - f(x)}{h} \ge \lim_{h \to 0^+} \frac{f(y + h) - f(y)}{h} = f'_+(y)
$$
which implies that the right derivative function is decreasing. A similar argument holds for the left derivative function.
Example 1093 (i) The concave function $f(x) = -|x|$ does not have a derivative at $x = 0$. Nevertheless, the one-sided derivatives exist at each point of the domain, with
$$
f'_+(x) = \begin{cases} 1 & \text{if } x < 0 \\ -1 & \text{if } x \ge 0 \end{cases}
$$
and
$$
f'_-(x) = \begin{cases} 1 & \text{if } x \le 0 \\ -1 & \text{if } x > 0 \end{cases}
$$
Therefore, $f'_+(x) \le f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing.
(ii) The concave function
$$
f(x) = \begin{cases} x + 1 & \text{if } x \le -1 \\ 0 & \text{if } -1 < x < 1 \\ 1 - x & \text{if } x \ge 1 \end{cases}
$$
has one-sided derivatives at each point of the domain, with
$$
f'_+(x) = \begin{cases} 1 & \text{if } x < -1 \\ 0 & \text{if } -1 \le x < 1 \\ -1 & \text{if } x \ge 1 \end{cases}
$$
and
$$
f'_-(x) = \begin{cases} 1 & \text{if } x \le -1 \\ 0 & \text{if } -1 < x \le 1 \\ -1 & \text{if } x > 1 \end{cases}
$$
Therefore, $f'_+(x) \le f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing.
(iii) The concave function $f(x) = 1 - x^2$ is differentiable on $\mathbb{R}$ with $f'(x) = -2x$. The derivative function is decreasing. N
Proposition 1091 says, inter alia, that at the interior points x we have $f'_+(x) \le f'_-(x)$. The next result says that we actually have $f'_+(x) = f'_-(x)$, and so f is differentiable at x, at all points $x \in C$ except those belonging to an, at most, countable subset of C. For the three concave functions of the previous example, such set of non-differentiability is $\{0\}$, $\{-1, 1\}$ and $\emptyset$, respectively.
Theorem 1095 Let $f : (a, b) \to \mathbb{R}$ be differentiable at $x \in (a, b)$. If f is concave, then
Proof Let f be concave and let x and y be two distinct points of (a, b). If $\lambda \in (0, 1)$, we have
$$
f(x + (1 - \lambda)(y - x)) = f(\lambda x + (1 - \lambda) y) \ge \lambda f(x) + (1 - \lambda) f(y)
$$
Therefore,
$$
\frac{f(x + (1 - \lambda)(y - x)) - f(x)}{1 - \lambda} \ge f(y) - f(x)
$$
that is,
$$
\frac{f(x + (1 - \lambda)(y - x)) - f(x)}{(1 - \lambda)(y - x)}\,(y - x) \ge f(y) - f(x)
$$
This inequality holds for every $\lambda \in (0, 1)$. Hence, thanks to the differentiability of f at x, we have
$$
\lim_{\lambda \to 1} \frac{f(x + (1 - \lambda)(y - x)) - f(x)}{(1 - \lambda)(y - x)}\,(y - x) = f'(x)(y - x)
$$
The right-hand side of inequality (24.6) is the tangent line of f at x, that is, the linear approximation of f that holds, locally, at x. By Theorem 1095, such a line always lies above the graph of the function: the approximation is in "excess".
Geometrically, this remarkable property is clear: the definition of concavity requires that the straight line passing through the two points (x, f(x)) and (y, f(y)) lies below the graph of f in the interval between x and y, and hence that it lies above it outside that interval.4 Letting y tend to x, the straight line becomes tangent and lies entirely above the curve.
4. For completeness, let us prove it. Let z be outside the interval [x, y]: suppose that $z > y$. We can then write $y = \lambda x + (1 - \lambda) z$ with $\lambda \in (0, 1)$ and, by the concavity of f, we have $f(y) \ge \lambda f(x) + (1 - \lambda) f(z)$, that is, $f(z) \le (1 - \lambda)^{-1} f(y) - \lambda (1 - \lambda)^{-1} f(x)$. But, since $1/(1 - \lambda) = \mu > 1$ and $1 - \mu = 1 - 1/(1 - \lambda) = -\lambda/(1 - \lambda) < 0$, we have $f(z) = f(\mu y + (1 - \mu) x) \le \mu f(y) + (1 - \mu) f(x)$ for every $\mu > 1$. If $z < x$, we reason similarly.
[Figure: the tangent line f(x) + f'(x)(y - x) at x lies above the graph of the concave function f, here evaluated at points y, y1, y2]
Theorem 1096 Let $f : (a, b) \to \mathbb{R}$ be differentiable on (a, b). Then, f is concave if and only if
$$
f(y) \le f(x) + f'(x)(y - x) \qquad \forall x, y \in (a, b) \tag{24.7}
$$
Thus, for a function f differentiable on an open interval, a necessary and sufficient condition for concavity of f is that the tangent lines at the various points of its domain all lie above its graph.
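Condition (24.7) can likewise be checked pointwise for a concrete differentiable concave function, here f = log with f'(x) = 1/x (a sketch with arbitrary sample points):

```python
import math

# For a differentiable concave function, (24.7) says the tangent at x
# lies above the graph: f(y) <= f(x) + f'(x) * (y - x). Here f = log.
def tangent_gap(x, y):
    return math.log(x) + (1 / x) * (y - x) - math.log(y)

for x in [0.5, 1.0, 3.0]:
    for y in [0.25, 1.0, 2.0, 10.0]:
        assert tangent_gap(x, y) >= -1e-12  # nonnegative up to rounding
print("tangent lines of log lie above its graph")
```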
Proof The "only if" follows from the previous theorem. We prove the "if". Suppose that inequality (24.7) holds and consider the point $z = \lambda x + (1 - \lambda) y$. Let us apply (24.7) twice: first with the points z and x, and then with the points z and y. Then:
$$
f'(z)(1 - \lambda)(x - y) \ge f(x) - f(\lambda x + (1 - \lambda) y)
$$
$$
f'(z)\,\lambda\,(y - x) \ge f(y) - f(\lambda x + (1 - \lambda) y)
$$
Let us multiply the first inequality by $\lambda$, the second one by $(1 - \lambda)$, and add them. We get
$$
0 \ge \lambda f(x) + (1 - \lambda) f(y) - f(\lambda x + (1 - \lambda) y)
$$
To this end, remember that a significant property established in Proposition 1091 is the decreasing monotonicity of the one-sided derivative functions of concave functions. The next important result shows that for continuous functions this property characterizes concavity.
(i) f is concave if and only if the right derivative function $f'_+$ exists and is decreasing on $\operatorname{int} C$;
(ii) f is strictly concave if and only if the right derivative function $f'_+$ exists and is strictly decreasing on $\operatorname{int} C$.
Proof (i) We only prove the "if" since the converse follows from Proposition 1091. For simplicity, assume that f is differentiable on the open interval $\operatorname{int} C$.5 By hypothesis, $f'$ is decreasing on $\operatorname{int} C$. Let $x, y \in \operatorname{int} C$, with $x < y$, and $\lambda \in (0, 1)$. Set $z = \lambda x + (1 - \lambda) y$, so that $x < z < y$. By the Mean Value Theorem, there exist $\xi_x \in (x, z)$ and $\xi_y \in (z, y)$ such that
$$
f'(\xi_x) = \frac{f(z) - f(x)}{z - x}, \qquad f'(\xi_y) = \frac{f(y) - f(z)}{y - z}
$$
Since $f'$ is decreasing, $f'(\xi_x) \ge f'(\xi_y)$. Hence,
$$
\frac{f(\lambda x + (1 - \lambda) y) - f(x)}{\lambda x + (1 - \lambda) y - x} \ge \frac{f(y) - f(\lambda x + (1 - \lambda) y)}{y - \lambda x - (1 - \lambda) y}
$$
A similar result, left to the reader, holds for the other one-sided derivative $f'_-$. This theorem thus establishes a differential characterization of concavity by showing that it is equivalent to the decreasing monotonicity of the one-sided derivative functions.
The function f is continuous. It has one-sided derivatives at each point of the domain, with
$$
f'_+(x) = \begin{cases} 1 + 3x^2 & \text{if } x < 0 \\ 1 - 3x^2 & \text{if } x \ge 0 \end{cases}
$$
5. Using a version of the Mean Value Theorem for unilateral derivatives, we can prove the result without any differentiability assumption on f.
and
(
0 1 + 3x2 if x 0
f (x) =
1 3x2 if x > 0
To see that this is the case, consider the origin, which is the most delicate point. We have
$$f'_+(0) = \lim_{h\to0^+} \frac{f(h)-f(0)}{h} = \lim_{h\to0^+} \frac{h - h^3}{h} = \lim_{h\to0^+} (1 - h^2) = 1$$
and
$$f'_-(0) = \lim_{h\to0^-} \frac{f(h)-f(0)}{h} = \lim_{h\to0^-} \frac{h + h^3}{h} = \lim_{h\to0^-} (1 + h^2) = 1$$
Therefore, $f'_+(x) \le f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing. By Theorem 1097, the function f is concave. N
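The computation above can be spot-checked numerically. The sketch below assumes the function of the example is $f(x) = x - |x|^3$ (a hypothetical reconstruction, chosen only because it matches the one-sided derivatives displayed above) and verifies both the decreasing monotonicity of the derivative and the concavity inequality on random points:

```python
# Numerical sanity check; f below is a hypothetical reconstruction,
# f(x) = x - |x|**3, matching the one-sided derivatives
# 1 + 3x^2 (x < 0) and 1 - 3x^2 (x >= 0) displayed above.
import random

def f(x):
    return x - abs(x) ** 3

def fprime(x):
    # here the left and right derivatives coincide at every point
    return 1 + 3 * x * x if x < 0 else 1 - 3 * x * x

xs = [i / 10 for i in range(-30, 31)]
assert all(fprime(a) >= fprime(b) for a, b in zip(xs, xs[1:]))  # decreasing

random.seed(0)
for _ in range(1000):
    x, y, a = random.uniform(-3, 3), random.uniform(-3, 3), random.random()
    # concavity: f(a x + (1-a) y) >= a f(x) + (1-a) f(y)
    assert f(a * x + (1 - a) * y) >= a * f(x) + (1 - a) * f(y) - 1e-9
```

A finite grid and random sample only probe the inequalities, of course; the proof above is what establishes them.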
One-sided derivatives are key in the previous theorem because concavity per se only ensures their existence, not that of the two-sided derivatives. One-sided derivatives are, however, less easy to handle than the two-sided derivative. So, in applications differentiability is often assumed. In this case we have the following simple consequence of the previous theorem, which provides a useful concavity criterion for functions.
Under differentiability, a necessary and sufficient condition for a function to be (strictly) concave is, thus, that its first derivative is (strictly) decreasing.6
Proof We only prove (i), as (ii) is similar. Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be differentiable on int C and continuous on C. If f is concave, Theorem 1097 implies that $f' = f'_+$ is decreasing. Vice versa, if $f' = f'_+$ is decreasing then Theorem 1097 implies that f is concave.
6 When C is open, the continuity assumption becomes superfluous (a similar observation applies to Corollary 1101 below).
24.1. SCALAR FUNCTIONS 757
The derivatives are strictly decreasing and therefore f and g are strictly concave thanks to Corollary 1099. N
The previous corollary provides a simple differential criterion of concavity that reduces the test of concavity to that, often operationally simple, of a property of first derivatives. The next result shows that it is, actually, possible to do even better by recalling the differential characterization of monotonicity seen in Section 22.4.
Corollary 1101 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be twice differentiable on int C and continuous on C. Then:
Proof (i) It is sufficient to observe that, thanks to the “decreasing” version of Proposition 1003, the first derivative $f'$ is decreasing on int C if and only if $f''(x) \le 0$ for every $x \in \text{int } C$. (ii) It follows from the “strictly decreasing” version of Proposition 1005.
Under the further hypothesis that f is twice differentiable on int C, concavity thus becomes equivalent to the negativity of the second derivative, a condition often easier to check than the decreasing monotonicity of the first derivative. In any case, thanks to the last two corollaries we now have powerful differential tests of concavity.7
Note the asymmetry between points (i) and (ii): while in (i) the decreasing monotonicity is a necessary and sufficient condition for concavity, in (ii) the strictly decreasing monotonicity is only a sufficient condition for strict concavity. This follows from the analogous asymmetry for monotonicity between Propositions 1003 and 1005.
Example 1102 (i) The functions $f(x) = \sqrt{x}$ and $g(x) = \log x$ have, respectively, derivatives $f'(x) = 1/(2\sqrt{x})$ and $g'(x) = 1/x$ that are strictly decreasing. Therefore, they are strictly concave. The second derivatives $f''(x) = -1/(4x^{3/2}) < 0$ and $g''(x) = -1/x^2 < 0$ confirm this conclusion.
(ii) The function $f(x) = x^2$ has derivative $f'(x) = 2x$ that is strictly increasing. Therefore, it is strictly convex. Indeed, $f''(x) = 2 > 0$.
(iii) The function $f(x) = x^3$ has derivative $f'(x) = 3x^2$ that is strictly decreasing on $(-\infty, 0]$ and strictly increasing on $[0, \infty)$. Indeed, the second derivative $f''(x) = 6x$ is $\le 0$ on $(-\infty, 0]$ and $\ge 0$ on $[0, \infty)$. N
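These sign checks are easy to replicate numerically. The following sketch (illustrative only; the grid and step size are arbitrary choices) estimates second derivatives by central differences and confirms the signs claimed in the example:

```python
# Hedged numeric illustration of the second-derivative test: estimate f''
# with central differences and check its sign on a grid (illustrative only).
import math

def second_derivative(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

grid = [0.5 + i / 10 for i in range(30)]  # points in (0, +infinity)
assert all(second_derivative(math.sqrt, x) < 0 for x in grid)        # concave
assert all(second_derivative(math.log, x) < 0 for x in grid)         # concave
assert all(second_derivative(lambda t: t * t, x) > 0 for x in grid)  # convex
# f(x) = x^3: f''(x) = 6x changes sign at 0, so f is neither concave
# nor convex on the whole real line
assert second_derivative(lambda t: t ** 3, -1.0) < 0
assert second_derivative(lambda t: t ** 3, 1.0) > 0
```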
24.2 Intermezzo
In the next section we will study the differential properties of concave functions of several variables. This important topic relies, in turn, on two notions, superlinear functions and monotone operators, that we now present.
Example 1103 (i) The norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is a sublinear function (cf. Example 652). (ii) Define $f : \mathbb{R}^n \to \mathbb{R}$ by
$$f(x) = \inf_{i \in I} \alpha_i \cdot x \qquad \forall x \in \mathbb{R}^n$$
where $\{\alpha_i\}_{i \in I}$ is a collection, finite or infinite, of vectors of $\mathbb{R}^n$. This function is easily seen to be superlinear.
A simple consequence of the last result is the following corollary, which motivates the
“superlinear” terminology.
Proof Let f be both superlinear and sublinear. By (24.8), we have both $f(-x) \le -f(x)$ and $f(-x) \ge -f(x)$ for all $x \in \mathbb{R}^n$, that is, $f(-x) = -f(x)$ for all $x \in \mathbb{R}^n$. By Proposition 1104, f is then linear. The converse is trivial.
Proof Let $l \in (\mathbb{R}^n)'$ and suppose that $f(x) \le l(x)$ for all $x \in \mathbb{R}^n$. Let $x \in \mathbb{R}^n$. Then, we have both $f(x) \le l(x)$ and $f(-x) \le l(-x)$, which in turn implies $f(x) \le l(x) = -l(-x) \le -f(-x)$. This proves (24.9).
(i) g is monotone if and only if the Jacobian matrix $Dg(x)$ is negative semidefinite for all $x \in C$;
(ii) g is strictly monotone if the Jacobian matrix $Dg(x)$ is negative definite for all $x \in C$.
Proof We only prove (i) and leave (ii) to the reader. Suppose that g is monotone. Let $x \in C$ and $y \in \mathbb{R}^n$. Then, for a scalar $h > 0$ small enough we have $(g(x+hy) - g(x)) \cdot ((x+hy) - x) \le 0$. Since g is continuously differentiable, we have
$$0 \ge \lim_{h\to0^+} \frac{(g(x+hy) - g(x)) \cdot ((x+hy) - x)}{h^2} = \lim_{h\to0^+} \frac{g(x+hy) - g(x)}{h} \cdot y = Dg(x)y \cdot y$$
Since this holds for any $y \in \mathbb{R}^n$, we conclude that $Dg(x)$ is negative semidefinite.
Conversely, suppose that $Dg(x)$ is negative semidefinite at all $x \in C$. Let $x_1, x_2 \in C$ and define $\varphi : [0,1] \to \mathbb{R}$ by
$$\varphi(t) = (g(tx_1 + (1-t)x_2) - g(x_2)) \cdot (x_1 - x_2)$$
To prove that g is monotone it is enough to show that $\varphi(1) \le 0$. But, $\varphi(0) = 0$ and $\varphi$ is decreasing since, for all $t \in (0,1)$,
$$\varphi'(t) = (x_1 - x_2) \cdot Dg(tx_1 + (1-t)x_2)(x_1 - x_2) \le 0$$
A market demand function $D : \mathbb{R}^n_+ \to \mathbb{R}^n_+$ (Section 18.8) is a strictly monotone operator if
$$(D(p) - D(p')) \cdot (p - p') < 0 \qquad \forall p, p' \ge 0, \; p \ne p'$$
that is, if it satisfies the law of demand. In this case, (24.11) takes a strict form, which means that, ceteris paribus, a higher price of good i results in a lower demand for this good. In sum, monotonicity formalizes a key economic concept. Its Jacobian characterization established in the last proposition plays an important role in demand theory.
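As a minimal numerical illustration, consider a hypothetical linear demand function $D(p) = a - Bp$ with B positive definite (the vector a, the matrix B, and the price range below are all made-up for the sketch); its Jacobian is $-B$, negative definite, so the strict inequality of the law of demand holds:

```python
# Hypothetical linear demand sketch: D(p) = a - B p with B positive definite.
# Then (D(p) - D(q)).(p - q) = -(p - q)' B (p - q) < 0 for p != q: the
# strict law of demand. All numbers here are made-up for illustration.
import random

a = [10.0, 10.0]
B = [[2.0, 0.5], [0.5, 1.0]]  # positive definite: trace > 0, det = 1.75 > 0

def D(p):
    return [a[i] - sum(B[i][j] * p[j] for j in range(2)) for i in range(2)]

random.seed(1)
for _ in range(500):
    p = [random.uniform(0.0, 3.0) for _ in range(2)]
    q = [random.uniform(0.0, 3.0) for _ in range(2)]
    inner = sum((D(p)[i] - D(q)[i]) * (p[i] - q[i]) for i in range(2))
    assert inner <= 0  # law of demand (equality only when p = q)
```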
Finally, we have a dual notion of increasing monotonicity when the inequality (24.10) is
reversed.
$$f'_+(x; y) = \lim_{h\to0^+} \frac{f(x+hy) - f(x)}{h} \qquad (24.12)$$
exists and is finite. This limit is called the directional right derivative of f at x along the direction y.
This result implies, inter alia, that $f'_+(x;\cdot)$ is superlinear if and only if $f'_-(x;\cdot)$ is sublinear.
Proof Assume that f is derivable from the right at $x \in U$. For each $y \in \mathbb{R}^n$ we then have:
$$f'_-(x;y) = \lim_{h\to0^-} \frac{f(x+hy) - f(x)}{h} = \lim_{h\to0^+} \frac{f(x-hy) - f(x)}{-h} = -f'_+(x;-y)$$
So, f is derivable from the left at x, and (24.13) holds. A similar argument shows that derivability from the left yields derivability from the right.
(i) the right $f'_+(x;\cdot) : \mathbb{R}^n \to \mathbb{R}$ and left $f'_-(x;\cdot) : \mathbb{R}^n \to \mathbb{R}$ directional derivatives exist at each $x \in C$;
The proof relies on the following lemma, which shows that the difference quotient is decreasing.
Lemma 1112 Let $f : C \to \mathbb{R}$ be concave. Given any $x \in C$ and $y \in \mathbb{R}^n$, the function
$$0 < h \mapsto \frac{f(x+hy) - f(x)}{h} \qquad (24.14)$$
is decreasing for all values $h > 0$ such that $x + hy \in C$.
24.3. MULTIVARIABLE CASE 763
Proof Let $x \in C$. Assume first that $x = 0$ and $f(0) = 0$. Fix $y \in \mathbb{R}^n$ and let $0 < h_1 < h_2$. By concavity,
$$f(h_1 y) = f\left(\frac{h_1}{h_2}\, h_2 y\right) \ge \frac{h_1}{h_2} f(h_2 y) + \left(1 - \frac{h_1}{h_2}\right) f(0) = \frac{h_1}{h_2} f(h_2 y),$$
and so $f(h_1 y)/h_1 \ge f(h_2 y)/h_2$. To complete the proof, define $g : C - x \to \mathbb{R}$ by $g(z) = f(z+x) - f(x)$ for all $z \in C - x$. Then, $g(0) = 0$ and $g(hy)/h = (f(x+hy) - f(x))/h$. We conclude that the difference quotient (24.14) has the desired properties.
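Lemma 1112 can be checked numerically for a sample concave function; in the sketch below the function, basepoint, and direction are arbitrary choices:

```python
# Numeric check of Lemma 1112 (a sketch; the concave function, basepoint,
# and direction below are arbitrary): h -> (f(x + h y) - f(x))/h decreases.
import math

def f(v):  # concave on the positive orthant: a sum of concave functions
    return math.sqrt(v[0]) + math.log(v[1])

x, y = [1.0, 1.0], [0.5, 2.0]
hs = [0.1 * k for k in range(1, 20)]
q = [(f([x[0] + h * y[0], x[1] + h * y[1]]) - f(x)) / h for h in hs]
assert all(q1 >= q2 for q1, q2 in zip(q, q[1:]))  # decreasing in h
```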
Proof of Proposition 1111 (i) In view of Proposition 1110, we can focus on the right derivative function $f'_+(x;\cdot) : \mathbb{R}^n \to \mathbb{R}$. By Lemma 1112, the difference quotient is decreasing, so the limit (24.12) exists. It remains to show that it is finite. We only consider the case when f is positive, i.e., $f \ge 0$. By concavity, for each $h > 0$ we then have
$$f'_+(x; y_1 + y_2) = f'_+\left(x; 2\,\frac{y_1 + y_2}{2}\right) = 2 f'_+\left(x; \frac{y_1 + y_2}{2}\right)$$
$$\ge 2\left(\frac{f'_+(x; y_1)}{2} + \frac{f'_+(x; y_2)}{2}\right) = f'_+(x; y_1) + f'_+(x; y_2).$$
The last result leads to an interesting characterization of derivability via one-sided derivative functions.
(i) f is derivable at x;
A concave function derivable at a point has, thus, a linear directional derivative function represented via the inner product (24.15). Since, in general, the directional derivative function is only homogeneous (Corollary 967), it is a further noteworthy property of concavity that the much stronger property of linearity, with its inner product representation, holds.
Proof (iv) implies (iii). Assume that $f'(x;\cdot) : \mathbb{R}^n \to \mathbb{R}$ is linear. By (24.13), we have, for all $y, y' \in \mathbb{R}^n$ and all $\alpha, \beta \in \mathbb{R}$,
$$\frac{\partial f(x)}{\partial x_i} = f'(x; e^i) = \lambda \cdot e^i = \lambda_i \qquad \forall i = 1, \dots, n$$
Thus, $\lambda = \nabla f(x)$.
A remarkable property of concave functions of several variables is that for them partial derivability and differentiability are equivalent notions.
(ii) f is derivable at x;
Compared to Theorem 954, here the continuity of partial derivatives is not required. Thus, for concave functions we recover the remarkable equivalence between derivability and differentiability that holds for scalar functions but fails, in general, for functions of several variables. This is another sign of the great analytical convenience of concavity.
Proof It is enough to prove that (i) implies (ii) and that (ii) implies (iii), since (iii) implies (i) by Theorem 952. (i) implies (ii). Suppose f is partially derivable at x. Then, $f'_+(x; e^i) = f'_-(x; e^i)$ for each versor $e^i$ of $\mathbb{R}^n$. By Proposition 1111, $f'_+(x;\cdot)$ is superlinear and $f'_-(x;\cdot)$ is sublinear. So, $f'_+(x; 0) = f'_-(x; 0) = 0$. Let $0 \ne y \in \mathbb{R}^n_+$. Since $f'_+(x; e^i) = f'_-(x; e^i)$, we have:
$$f'_+(x; y) = \left(\sum_{i=1}^n y_i\right) f'_+\!\left(x; \sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i}\, e^i\right) \ge \left(\sum_{i=1}^n y_i\right) \sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i}\, f'_+(x; e^i)$$
$$= \left(\sum_{i=1}^n y_i\right) \sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i}\, f'_-(x; e^i) \ge \left(\sum_{i=1}^n y_i\right) f'_-\!\left(x; \sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i}\, e^i\right) = f'_-(x; y)$$
So, $f'_+(x; y) = f'_-(x; y)$ because, again by Proposition 1111, $f'_+(x; y) \le f'_-(x; y)$. We conclude that $f'_+(x;\cdot) = f'_-(x;\cdot)$ on $\mathbb{R}^n_+$. A similar argument, based on $f'_+(x; -e^i) = f'_-(x; -e^i)$, shows that $f'_+(x;\cdot) = f'_-(x;\cdot)$ on $\mathbb{R}^n_-$. Let $y \in \mathbb{R}^n$. Define the positive vectors $y^+ = \max\{y, 0\}$ and $y^- = -\min\{y, 0\}$. Since $y = y^+ - y^-$, we have
By Proposition 1111, we conclude that $f'_+(x; y) = f'_-(x; y)$. In turn, this implies $f'_+(x;\cdot) = f'_-(x;\cdot)$ on $\mathbb{R}^n$. By Corollary 1113, f is derivable.
(ii) implies (iii). Suppose f is derivable at x. To show that f is differentiable at x, in view of the last corollary we need to show that
$$\lim_{h\to0} \frac{f(x+h) - f(x) - \nabla f(x) \cdot h}{\|h\|} = 0$$
Proof We consider the concave case, and leave the strictly concave one to the reader. (i) implies (ii). Suppose f is concave. Let $x, y \in C$ and $t_1, t_2 \in C_{x,y}$. Then, for each $\alpha \in [0,1]$,
$$\varphi_{x,y}(\alpha t_1 + (1-\alpha)t_2) = f\big((1 - (\alpha t_1 + (1-\alpha)t_2))x + (\alpha t_1 + (1-\alpha)t_2)y\big)$$
$$= f\big(\alpha((1-t_1)x + t_1 y) + (1-\alpha)((1-t_2)x + t_2 y)\big) \ge \alpha f((1-t_1)x + t_1 y) + (1-\alpha) f((1-t_2)x + t_2 y)$$
$$= \alpha\,\varphi_{x,y}(t_1) + (1-\alpha)\,\varphi_{x,y}(t_2)$$
$$f((1-t)x + ty) = \varphi_{x,y}(t) \ge t\,\varphi_{x,y}(1) + (1-t)\,\varphi_{x,y}(0) = (1-t)f(x) + t f(y)$$
Proof Let f be concave. Fix $x, y \in C$. Let $\varphi_{x,y} : C_{x,y} \to \mathbb{R}$ be given by (24.16). By Lemma 1115, $C_{x,y}$ is an open interval, and by Proposition 1116 the function $\varphi_{x,y}$ is concave on $C_{x,y}$. Hence,11
$$\varphi'_+(0) = \lim_{\varepsilon\to0^+} \frac{\varphi(\varepsilon) - \varphi(0)}{\varepsilon} = \lim_{\varepsilon\to0^+} \frac{f(x + \varepsilon(y-x)) - f(x)}{\varepsilon} = f'_+(x; y-x) = f'_-(x; y-x) = \varphi'_-(0)$$
So, $\varphi$ is differentiable at $0 \in C_{x,y}$. Since $[0,1] \subseteq C_{x,y}$, by (24.6) we have
$$\varphi(1) \le \varphi(0) + \varphi'(0) = \varphi(0) + f'(x; y-x)$$
i.e., $f(y) \le f(x) + \nabla f(x) \cdot (y-x)$ (Theorem 970). So, the inequality (24.17) holds.
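Inequality (24.17) is easy to test numerically. Below is a sketch with the concave function $f(x_1, x_2) = -x_1^2 - x_2^2$ (an arbitrary choice), for which the inequality in fact holds with slack exactly $\|y-x\|^2$:

```python
# Spot-check of inequality (24.17) for an arbitrary concave function:
# f(x) = -x1^2 - x2^2, whose gradient is grad f(x) = (-2 x1, -2 x2).
import random

def f(v):
    return -v[0] ** 2 - v[1] ** 2

def grad(v):
    return (-2 * v[0], -2 * v[1])

random.seed(2)
for _ in range(1000):
    x = [random.uniform(-2.0, 2.0) for _ in range(2)]
    y = [random.uniform(-2.0, 2.0) for _ in range(2)]
    g = grad(x)
    bound = f(x) + g[0] * (y[0] - x[0]) + g[1] * (y[1] - x[1])
    # here bound - f(y) = ||y - x||^2 >= 0 exactly
    assert f(y) <= bound + 1e-9
```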
Proof The “only if” follows from (24.17). As to the converse, suppose that (24.18) holds. For each $x \in C$, consider the function $F_x : C \to \mathbb{R}$ given by $F_x(y) = f(x) + \nabla f(x) \cdot (y-x)$. By (24.18), $f(y) \le F_x(y)$ for all $x, y \in C$. Since $F_x(x) = f(x)$, we conclude that $f(y) = \min_{x\in C} F_x(y)$ for each $y \in C$. Since each $F_x$ is affine, we conclude that f is concave since, as the reader can check, a function that is a minimum of a family of concave functions is concave.
(ii) f is strictly concave if and only if $\nabla f : C \to \mathbb{R}^n$ is strictly monotone, i.e., the previous inequality is strict if $x \ne y$.
So, $\nabla f(x) \cdot (x-y) \le f(x) - f(y) \le \nabla f(y) \cdot (x-y)$. In turn, this implies $(\nabla f(x) - \nabla f(y)) \cdot (x-y) \le 0$ and we conclude that $\nabla f : C \to \mathbb{R}^n$ is monotone decreasing.
Conversely, suppose $\nabla f : C \to \mathbb{R}^n$ is monotone decreasing, i.e., (24.19) holds. Suppose first that $n = 1$. Let $x \in C$, and define $\psi_x : C \to \mathbb{R}$ by $\psi_x(y) = f(y) - f(x) - \nabla f(x)(y-x)$. Then, $\psi'_x(y) = \nabla f(y) - \nabla f(x)$, and so $\psi'_x(y) \ge 0$ if $y < x$ and $\psi'_x(y) \le 0$ if $y > x$. Hence, $\psi_x$ has a maximum at x, i.e.,
$$\psi_x(y) \le \psi_x(x) = 0 \qquad \forall y \in C$$
Since x was arbitrary, we conclude that $f(y) \le f(x) + \nabla f(x)(y-x)$ for all $x, y \in C$. By Theorem 1118, f is concave. This completes the proof for $n = 1$.
Suppose now that $n > 1$. Let $x, y \in C$ and let $\varphi_{x,y} : C_{x,y} \to \mathbb{R}$ be given by (24.16). By Lemma 1115, $C_{x,y}$ is an open interval $(a,b)$, with $[0,1] \subseteq C_{x,y}$. Then, $\varphi_{x,y}$ is concave and differentiable on $(a,b)$, with
$$\varphi'_{x,y}(t) = \nabla f((1-t)x + ty) \cdot (y-x) \qquad \forall t \in C_{x,y} \qquad (24.20)$$
$$f((1-t)x + ty) = \varphi_{x,y}(t) \ge (1-t)\,\varphi_{x,y}(0) + t\,\varphi_{x,y}(1) = (1-t)f(x) + t f(y)$$
which implies f (x2 ) f (x1 ) = (x2 x1 ). Given 2 (0; 1), by (24.21) we have:
A dual result, with the opposite inequality, characterizes convex functions. The next result makes this characterization truly operational via a negativity condition on the Hessian matrix $\nabla^2 f(x)$ of f – that is, the matrix of second partial derivatives of f – which generalizes the condition $f''(x) \le 0$ of Corollary 1101. In other words, in the general case the role of the second derivative is played by the Hessian matrix.
Proof The result follows from Proposition 1107 once one remembers that the Hessian matrix of a function of several variables is the Jacobian matrix of its derivative operator (Exercise 975). So, the Hessian matrix $\nabla^2 f(x)$ of f is the Jacobian matrix of the derivative operator $\nabla f : C \to \mathbb{R}^n$, which plays here the role of g in Proposition 1107.
This is the most useful differential criterion to establish concavity and strict concavity for functions of several variables. Naturally, dual results hold for convex functions, which are characterized by having positive semidefinite Hessian matrices.
and we saw how its Hessian matrix was positive definite. By Proposition 1120, f is strictly convex. N
Example 1122 Consider the CES production function $f : \mathbb{R}^2_+ \to \mathbb{R}$ defined, as in Example 705, by
$$f(x) = \left(\alpha x_1^\rho + (1-\alpha)x_2^\rho\right)^{\frac{1}{\rho}}$$
with $\alpha \in [0,1]$ and $\rho > 0$. Some tedious algebra shows that the Hessian matrix is
$$\nabla^2 f(x) = \alpha(1-\alpha)(\rho-1)\, t^{\frac{1}{\rho}-2}\, x_1^{\rho-2} x_2^{\rho-2}\, H$$
where $t = \alpha x_1^\rho + (1-\alpha)x_2^\rho$ and
$$H = \begin{pmatrix} x_2^2 & -x_1 x_2 \\ -x_1 x_2 & x_1^2 \end{pmatrix}$$
If $\xi = (\xi_1, \xi_2)$, we have
$$\xi \cdot H\xi = x_2^2 \xi_1^2 - 2 x_1 x_2 \xi_1 \xi_2 + x_1^2 \xi_2^2 = (x_2 \xi_1 - x_1 \xi_2)^2 \ge 0$$
Thus, the matrix H is positive semidefinite. It follows that for $\rho > 1$ the matrix $\nabla^2 f(x)$ is positive semidefinite for all $x_1, x_2 > 0$, so by Proposition 1120 f is convex, while f is concave when $0 < \rho < 1$.
In Corollary 711 we already established the concavity of the CES functions without doing
any calculation. Readers can compare the pros and cons of the two approaches. N
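The definiteness claims can also be probed numerically. The sketch below estimates the Hessian of the CES function by finite differences at an arbitrary point and checks its sign for one value of $\rho$ on each side of 1 (the point, the weight, and the tolerances are made-up illustrative choices):

```python
# Finite-difference probe of the CES Hessian (a sketch; the point (1.2, 0.7),
# the weight a = 0.3, and the tolerances are arbitrary illustrative choices).
def ces(a, r):
    return lambda x1, x2: (a * x1 ** r + (1 - a) * x2 ** r) ** (1 / r)

def hessian(f, x1, x2, h=1e-4):
    fxx = (f(x1 + h, x2) - 2 * f(x1, x2) + f(x1 - h, x2)) / h ** 2
    fyy = (f(x1, x2 + h) - 2 * f(x1, x2) + f(x1, x2 - h)) / h ** 2
    fxy = (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
           - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h ** 2)
    return fxx, fxy, fyy

for r, sign in [(0.5, -1.0), (2.0, 1.0)]:
    fxx, fxy, fyy = hessian(ces(0.3, r), 1.2, 0.7)
    det = fxx * fyy - fxy ** 2
    # semidefinite with the expected sign: concave for r < 1, convex for r > 1
    assert sign * fxx > 0 and sign * fyy > 0 and det > -1e-6
```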
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \le 0 \qquad \forall i, j = 1, \dots, n$$
Again, a plain vanilla negativity condition on the Hessian matrix characterizes inframodularity, while for concave functions we needed a notion of monotonicity based on quadratic forms (Theorem 1120). Note that submodularity requires this negativity property only when $i \ne j$. This differential characterization thus sheds further light on the relations between submodularity or supermodularity and inframodularity or ultramodularity.
The differential characterizations established in the last two results show that, unlike the scalar case, inframodularity and concavity are quite unrelated properties in the multivariable case, as we remarked at the beginning of this section.
12 That is, each section $f(\cdot, x_{-i}) : [a_i, b_i] \to \mathbb{R}$ is convex in $x_i$.
13 We omit the proofs of these differentiability results (their inframodular, rather than ultramodular, focus will be self-explanatory).
14 In reading the result, recall from Section 2.3 that $(a,b) = \{x \in \mathbb{R}^n : a_i < x_i < b_i\}$.
$$f(y) \le f(\hat{x}) + f'(\hat{x})(y - \hat{x}) = f(\hat{x}) \qquad \forall y \in (a,b)$$
Proposition 1129 Let $f : (a,b) \to \mathbb{R}$ be a concave and differentiable function. A point $\hat{x} \in (a,b)$ is a global maximizer of f on $(a,b)$ if and only if $f'(\hat{x}) = 0$.
Example 1130 (i) Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = -(x+1)^4 + 2$. We have $f''(x) = -12(x+1)^2 \le 0$. The function is concave on $\mathbb{R}$ and it is therefore sufficient to find a point where its first derivative is zero to find a maximizer. We have $f'(x) = -4(x+1)^3$.
24.5. GLOBAL OPTIMIZATION 773
Theorem 1131 Let $f : C \to \mathbb{R}$ be a concave function differentiable on int C and continuous on C. A point $\hat{x}$ of int C is a global maximizer of f on C if and only if $\nabla f(\hat{x}) = 0$.
Proof In view of Fermat’s Theorem, we need to prove the “if” part, that is, sufficiency. So, let $\hat{x} \in \text{int } C$ be such that $\nabla f(\hat{x}) = 0$. We want to show that $\hat{x}$ is a global maximizer. By inequality (24.17), we have
$$f(y) \le f(\hat{x}) + \nabla f(\hat{x}) \cdot (y - \hat{x}) \qquad \forall y \in \text{int } C$$
Since f is continuous, the inequality is easily seen to hold for all $y \in C$. Since $\nabla f(\hat{x}) = 0$, we conclude that $f(y) \le f(\hat{x})$ for all $y \in C$, as desired.
Example 1132 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x_1, x_2) = -(x_1 - 1)^2 - (x_2 + 3)^2 - 6$. We have
$$\nabla^2 f(x_1, x_2) = \begin{pmatrix} -2 & 0 \\ 0 & -2 \end{pmatrix}$$
Since $-2 < 0$ and $\det \nabla^2 f(x_1, x_2) = 4 > 0$, the Hessian matrix is negative definite for every $(x_1, x_2) \in \mathbb{R}^2$ and hence f is strictly concave. We have
$$\nabla f(x_1, x_2) = \left(-2(x_1 - 1), -2(x_2 + 3)\right)$$
The unique point where the gradient is zero is $(1, -3)$, which is, therefore, the unique global maximizer. The maximum value of f on $\mathbb{R}^2$ is $f(1, -3) = -6$. N
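Example 1132 can be verified with a few lines of code (a sketch; the finite grid is only a probe of the plane, not a proof):

```python
# Verifying Example 1132 (a sketch; the finite grid is only a probe):
def f(x1, x2):
    return -(x1 - 1) ** 2 - (x2 + 3) ** 2 - 6

def grad(x1, x2):
    return (-2 * (x1 - 1), -2 * (x2 + 3))

assert grad(1, -3) == (0, 0)   # the unique critical point
assert f(1, -3) == -6          # the maximum value
# every other point of a grid around (1, -3) gives a strictly smaller value
assert all(f(a / 2, b / 2) < -6
           for a in range(-10, 11) for b in range(-20, 1)
           if (a / 2, b / 2) != (1.0, -3.0))
```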
Example 1133 In Section 18.9 we considered the least squares optimization problem
$$\max_x g(x) \quad \text{sub } x \in \mathbb{R}^n \qquad (24.22)$$
with $g : \mathbb{R}^n \to \mathbb{R}$ defined by $g(x) = -\|Ax - b\|^2$. We learned that if $\rho(A) = n$, then there is a unique solution $\hat{x}$ (Theorem 854). In Section 19.4 we then noted, via the Projection Theorem, that such a solution is given by $\hat{x} = (A^T A)^{-1} A^T b$. This can be established also from Theorem 1131. Indeed, $\nabla g(x) = -2A^T(Ax - b)$ and so the first order condition $-2A^T(Ax - b) = 0$ can be written as a linear system
$$A^T A x = A^T b \qquad (24.23)$$
Since $\rho(A) = n$, by Proposition 582 we have $\rho(A^T A) = n$, so the Gram matrix is invertible. By Cramer’s Theorem, $\hat{x} = (A^T A)^{-1} A^T b$ is the unique solution of the linear system (24.23), so by Theorem 1131 the only solution of the optimization problem (24.22). N
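The normal equations (24.23) can be illustrated with a tiny pure-Python computation (a sketch with made-up data; the 2×2 system is solved by Cramer's rule, echoing Cramer's Theorem above):

```python
# Normal equations (24.23) for a tiny made-up least squares problem,
# solved in pure Python by Cramer's rule (a sketch).
A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # rank 2, so A'A is invertible
b = [1.0, 2.0, 2.0]

# Gram matrix G = A'A and right-hand side c = A'b
G = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
c = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]

# solve the 2x2 system G x = c by Cramer's rule
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x = [(c[0] * G[1][1] - c[1] * G[0][1]) / det,
     (G[0][0] * c[1] - G[1][0] * c[0]) / det]

# first order condition: A'(A x - b) = 0 at the solution
residual = [A[k][0] * x[0] + A[k][1] * x[1] - b[k] for k in range(3)]
foc = [sum(A[k][i] * residual[k] for k in range(3)) for i in range(2)]
assert all(abs(v) < 1e-12 for v in foc)
```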
We close by noting that for scalar functions $f : (a,b) \to \mathbb{R}$, with $C = (a,b)$, the last theorem also follows from Proposition 1024. That said, it is the last theorem that is used in applications, because of the conceptual and analytical appeal of concavity (cf. the discussion that ends Section 22.5.4).
(i) $\text{co} f \ge f$;
(ii) $h \ge \text{co} f$ for all concave functions $h : C \to \mathbb{R}$ such that $h \ge f$.
Proof Let $\{g_i\}_{i\in I}$ be the collection of all concave functions $g_i : C \to \mathbb{R}$ such that $g_i \ge f$. This collection is not empty because f is concavifiable. Define $\text{co} f : C \to \mathbb{R}$ by
$$\text{co} f(x) = \inf_{i\in I} g_i(x) \qquad \forall x \in C$$
For each $x \in C$, the scalar $f(x)$ is a lower bound for the set $\{g_i(x) : i \in I\}$. By the Least Upper Bound Principle $\inf_{i\in I} g_i(x)$ exists, so the function $\text{co} f$ is well defined. It is easily seen to be concave. Indeed, let $\alpha \in [0,1]$ and $x, y \in C$. By Proposition 120, for each $\varepsilon > 0$ there exists $i_\varepsilon$ such that
$$\text{co} f(\alpha x + (1-\alpha)y) \ge g_{i_\varepsilon}(\alpha x + (1-\alpha)y) - \varepsilon \ge \alpha g_{i_\varepsilon}(x) + (1-\alpha)g_{i_\varepsilon}(y) - \varepsilon \ge \alpha\,\text{co} f(x) + (1-\alpha)\,\text{co} f(y) - \varepsilon$$
Since this inequality holds for every $\varepsilon > 0$, we conclude that $\text{co} f(\alpha x + (1-\alpha)y) \ge \alpha\,\text{co} f(x) + (1-\alpha)\,\text{co} f(y)$, so $\text{co} f$ is concave. In turn, this implies that $\text{co} f(x) = \min_{i\in I} g_i(x)$. In particular, $\text{co} f$ satisfies properties (i) and (ii).
Example 1135 (i) Both the sine and cosine functions are concavifiable. Their concave envelope is constantly equal to 1, i.e., $\text{co} \sin(x) = \text{co} \cos(x) = 1$ for all $x \in \mathbb{R}$. (ii) Let $f : \mathbb{R} \to \mathbb{R}$ be the Gaussian function $f(x) = e^{-x^2}$. It is concavifiable with
$$\text{co} f(x) = \begin{cases} f(x) & x \in \left[-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right] \\ e^{-\frac{1}{2}} & \text{else} \end{cases}$$
(iii) The quadratic function is not concavifiable on the real line. (iv) Functions that have at least one global maximizer are automatically concavifiable: just take the function constantly equal to the maximum value. For instance, continuous supercoercive functions $f : \mathbb{R}^n \to \mathbb{R}$ are concavifiable. N
This remarkable result shows how concavity is deeply connected to global maximization,
more than it may appear prima facie. It is a result, however, mostly of theoretical interest
because concave envelopes are, in general, not easy to compute. Indeed, Theorem 1131 can
be regarded as its operational special case.
The proof relies on two elegant lemmas of independent interest.
$$f(\hat{x}) \ge \text{co} f(x) \qquad \forall x \in C \qquad (24.24)$$
$\nabla f(\hat{x})$. So, $\nabla \text{co} f(\hat{x}) = 0$. Since f is continuous, by proceeding as in the proof of Theorem 1131 we can show that inequality (24.17) implies that $\hat{x}$ is a global maximizer of $\text{co} f$. Hence,
$$f(\hat{x}) = \text{co} f(\hat{x}) \ge \text{co} f(x) \ge f(x) \qquad \forall x \in C$$
We conclude that $\hat{x}$ is a global maximizer of f.
“Only if”. Let $\hat{x} \in \text{int } C$ be a global maximizer of f on C. By Lemma 1138, $\hat{x}$ is a global maximizer of $\text{co} f$ on C, with $\text{co} f(\hat{x}) = f(\hat{x})$. By Lemma 1137, $\text{co} f$ is differentiable at $\hat{x}$ with $\nabla \text{co} f(\hat{x}) = \nabla f(\hat{x})$. By Fermat’s Theorem, $\nabla \text{co} f(\hat{x}) = 0$. We conclude that $\nabla f(\hat{x}) = 0$.
In view of Lemma 1138, in optimization problems with convex choice sets – e.g., consumer problems, since budget sets are typically convex – in terms of value attainment one can assume that the objective function is concave. If in such problems we are interested only in the value function, we can deal with concave objective functions without any loss. This is no longer the case, however, if we are also interested in the solutions per se, i.e., in the solution correspondence. Indeed, in this regard Lemma 1138 only says that
$$\arg\max_{x\in C} f(x) \subseteq \arg\max_{x\in C} \text{co} f(x)$$
So, by replacing an objective function with its concave envelope we do not lose solutions, but we might well get intruders that solve the concavified problem but not the original one. To understand the scope of this issue, note that $\text{co}(\arg\max_{x\in C} f(x)) \subseteq \arg\max_{x\in C} \text{co} f(x)$ because the solutions of a concave objective function form a convex set. Thus, the best one can hope is that
$$\text{co}\left(\arg\max_{x\in C} f(x)\right) = \arg\max_{x\in C} \text{co} f(x)$$
Even in such a best case, there might well be many vectors that solve the optimization problem for the concave envelope $\text{co} f$ but not for the original objective function f. We thus might end up overestimating the solution correspondence. For instance, if in a consumer problem we replace a utility function with its concave envelope, we do not lose any optimal bundle but we might well get “extraneous” bundles, optimal for the concave envelope but not for the original utility function. For an analytical example, if we maximize the cosine function over the real line, the maximizers are the points $\hat{x} = 2k\pi$ with $k \in \mathbb{Z}$ (Example 780). If we replace the cosine function with its concave envelope, the maximizers become all the points of the real line. So, the solution set is vastly inflated. Still, the common maximum value is 1.
A final remark: there is a dual notion of convex envelope of a function as the largest dominated convex function, relevant for minimization problems (the reader can establish the dual version of Theorem 1136).
24.6 Superdifferentials
Theorem 1118 showed that differentiable concave functions feature the important inequality15
$$f(y) \le f(x) + \nabla f(x) \cdot (y-x) \qquad \forall y \in C$$
15 Unless otherwise stated, throughout this section C denotes an open and convex set in $\mathbb{R}^n$.
This inequality has a natural geometric interpretation: the tangent hyperplane (line, in the scalar case) lies above the graph of f, which it touches only at $(x, f(x))$. Remarkably, next we show that this property actually characterizes the differentiability of concave functions. In other words, this geometric property is peculiar to the tangent hyperplanes of concave functions.
Proof “If”. Suppose $\lambda \in \mathbb{R}^n$ satisfies (24.25). Let $z \in \mathbb{R}^n$. Since C is open, for $h > 0$ small enough we have $x + hz \in C$, so
$$f'_+(x; z) = \lim_{h\to0^+} \frac{f(x+hz) - f(x)}{h} \le \lambda \cdot z$$
so $\lambda$ satisfies (24.26).
“Only if”. Assume that $\lambda \in \mathbb{R}^n$ satisfies (24.26). Let $y \in C$. Since C is open, there is $h > 0$ small enough so that $x + h(y-x) \in C$. Then, by Lemma 1112,
$$\frac{f(x + t(y-x)) - f(x)}{t} \le f'_+(x; y-x) \le \lambda \cdot (y-x) \qquad (24.27)$$
which is (24.25) when $t = 1$.
0 (t + ") (t)
+ (t) = lim
"!0+ "
f ((1 t) x + ty + " (x y)) f ((1 t) x + ty)
= lim
"!0+ "
= f+0 ((1 t) x + ty; x y)
16
To ease notation, in the rest of the proof we use in place of v;w .
for each $t \in C_{x,y}$. Since $[0,1] \subseteq C_{x,y}$ and f is differentiable at x, we have $\varphi'_+(0) = \varphi'_-(0)$ and so, by (24.6),
$$\varphi(1) \le \varphi(0) + \varphi'(0) = \varphi(0) + f'(x; y-x)$$
i.e., $f(y) \le f(x) + f'(x; y-x)$. Since f is differentiable at x, we have $f'(x; y) = \nabla f(x) \cdot y$ for all $y \in \mathbb{R}^n$, so (24.18) holds with $\lambda = \nabla f(x)$.
“If”. Assume there is a unique vector $\lambda \in \mathbb{R}^n$ such that (24.25) holds. By the last lemma, $f'_+(x; z) \le \lambda \cdot z$ for all $z \in \mathbb{R}^n$. Since $\lambda$ is unique, by Corollary 1170, $f'_+(x;\cdot) : \mathbb{R}^n \to \mathbb{R}$ is a linear function. By Corollary 1113, f is derivable at x. Then, by Theorem 1114, f is differentiable at x.
The superdifferential thus consists of all vectors (and so of the linear functions) for which (24.25) holds. There may exist no such vector (Example 1149 below); in this case the superdifferential is empty and the function is not superdifferentiable at the basepoint.
In words, r is equal to f at the basepoint x and dominates f elsewhere. It follows that $\partial f(x)$ identifies the set of all affine functions that touch the graph of f at x and that lie above this graph at all other points of the domain. In the scalar case, affine functions are the straight lines. So, in the next figure the straight lines r, r′, and r″ belong to the superdifferential
It is easy to see that, at the points where the function is differentiable, the only straight line that satisfies conditions (24.29)-(24.30) is the tangent line $f(x) + f'(x)(y-x)$. But, at the points where the function is not differentiable, we might well have several straight lines $r : \mathbb{R} \to \mathbb{R}$ that satisfy such conditions, that is, that touch the graph of the function at the basepoint x and that lie above such graph elsewhere. The superdifferential, being the collection of these straight lines, can thus be viewed as a surrogate of the tangent line, i.e., of the differential. This is the idea behind the superdifferential: it is a surrogate of the differential when the latter does not exist. The next result, an immediate consequence of Theorem 1139, confirms this intuition.
Proposition 1142 A concave function $f : C \to \mathbb{R}$ is differentiable at $x \in C$ if and only if $\partial f(x)$ is a singleton. In this case, $\partial f(x) = \{\nabla f(x)\}$.
Before presenting an example, we state a first important property of the superdifferential.
Proposition 1143 If $f : C \to \mathbb{R}$ is concave, then the set $\partial f(x)$ is compact at every $x \in C$.
Proof It is easy to check that $\partial f(x)$ is closed and convex. To show that $\partial f(x)$ is compact, assume that it is non-empty (otherwise the result is trivially true) and, without loss of generality, that $0 \in C$ and $x = 0$. By Lemma 736, there exist a neighborhood $B_\varepsilon(0) \subseteq C$ and a constant $k > 0$ such that $|f(y)| \le k\|y\|$ for all $y \in B_\varepsilon(0)$; in particular, $f(0) = 0$. Let $\lambda \in \partial f(0)$. Since $y \in B_\varepsilon(0)$ if and only if $-y \in B_\varepsilon(0)$, by (24.28) we have:
$$-k\|y\| \le f(y) \le \lambda \cdot y \le -f(-y) \le k\|y\| \qquad \forall y \in B_\varepsilon(0)$$
Hence, $|\lambda \cdot y| \le k\|y\|$ for all $y \in B_\varepsilon(0)$. For each versor $e^i$, there is $\delta > 0$ small enough so that $\delta e^i \in B_\varepsilon(0)$. Hence,
$$\delta|\lambda_i| = |\lambda \cdot (\delta e^i)| \le k\|\delta e^i\| = k\delta \qquad \forall i = 1, \dots, n$$
so $|\lambda_i| \le k$ for each $i = 1, \dots, n$. Since $\lambda$ was arbitrarily chosen in $\partial f(0)$, by Proposition 161 we conclude that $\partial f(0)$ is a bounded (so, compact) set.
Example 1144 Consider $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 1 - |x|$. The only point where f is not differentiable is $x = 0$. By Proposition 1142, we have $\partial f(x) = \{f'(x)\}$ for each $x \ne 0$. It remains to determine $\partial f(0)$. This amounts to finding the scalars $\lambda$ that satisfy the inequality
$$1 - |y| \le 1 - |0| + \lambda(y - 0) \qquad \forall y \in \mathbb{R}$$
i.e., the scalars $\lambda$ such that $-|y| \le \lambda y$ for each $y \in \mathbb{R}$. If $y = 0$, this inequality trivially holds for all $\lambda$. If $y \ne 0$, we have
$$\lambda \frac{y}{|y|} \ge -1 \qquad (24.31)$$
Since
$$\frac{y}{|y|} = \begin{cases} 1 & \text{if } y > 0 \\ -1 & \text{if } y < 0 \end{cases}$$
from (24.31) it follows both $\lambda \ge -1$ and $\lambda \cdot (-1) \ge -1$. That is, $\lambda \in [-1, 1]$. We conclude that $\partial f(0) = [-1, 1]$. Thus:
$$\partial f(x) = \begin{cases} \{-1\} & \text{if } x > 0 \\ [-1, 1] & \text{if } x = 0 \\ \{1\} & \text{if } x < 0 \end{cases}$$
N
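The interval $[-1,1]$ can be confirmed by brute force (a sketch; the finite grid only probes the defining inequality, it does not prove it):

```python
# Brute-force check of Example 1144 (a sketch; grids only probe, not prove):
def f(x):
    return 1 - abs(x)

def in_superdiff_at_0(lam, grid):
    # lam is in the superdifferential at 0 when f(y) <= f(0) + lam*y
    return all(f(y) <= f(0) + lam * y + 1e-12 for y in grid)

grid = [k / 10 for k in range(-50, 51)]
assert in_superdiff_at_0(-1.0, grid) and in_superdiff_at_0(1.0, grid)
assert in_superdiff_at_0(0.0, grid)
assert not in_superdiff_at_0(1.5, grid)
assert not in_superdiff_at_0(-1.5, grid)
```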
In words, the superdifferential of a scalar function consists of all coefficients that lie between the right and left derivatives. This makes precise the geometric intuition we gave above on scalar functions.
Proof We only prove that $\partial f(x) \subseteq [f'_+(x), f'_-(x)]$. Let $\lambda \in \partial f(x)$. Given any $h \ne 0$, by definition we have $f(x+h) \le f(x) + \lambda h$. If $h > 0$, we then have
$$\frac{f(x+h) - f(x)}{h} \le \frac{f(x) + \lambda h - f(x)}{h} = \lambda$$
and so $f'_+(x) \le \lambda$. If $h < 0$, then
$$\frac{f(x+h) - f(x)}{h} \ge \frac{f(x) + \lambda h - f(x)}{h} = \lambda$$
and so $\lambda \le f'_-(x)$. We conclude that $\lambda \in [f'_+(x), f'_-(x)]$, as desired.
$$\lambda_i = \lambda \cdot e^i \ge f(e^i) = 0 \qquad \forall i = 1, \dots, n$$
$$\sum_{i=1}^n \lambda_i = \lambda \cdot (1, \dots, 1) \ge f(1, \dots, 1) = 1$$
$$-\sum_{i=1}^n \lambda_i = \lambda \cdot (-1, \dots, -1) \ge f(-1, \dots, -1) = -1$$
we conclude that $\sum_{i=1}^n \lambda_i = 1$ and $\lambda_i \ge 0$ for each $i = 1, \dots, n$. That is, $\lambda$ belongs to the simplex $\Delta_{n-1}$. Thus, $\partial f(0) \subseteq \Delta_{n-1}$. On the other hand, if $\lambda \in \Delta_{n-1}$, then
and so $\lambda \in \partial f(0)$. We conclude that $\partial f(0) = \Delta_{n-1}$, that is, the superdifferential at the origin is the simplex. The reader can check that, for every $x \in \mathbb{R}^n$,
i.e., $\partial f(x)$ consists of the vectors $\lambda$ of the simplex such that $\lambda \cdot x = f(x)$. N
Example 1147 We can generalize the previous example by showing that for any positively homogeneous function $f : \mathbb{R}^n \to \mathbb{R}$ we have
so $f(x) \le \lambda \cdot x$. We conclude that $f(x) = \lambda \cdot x$ for all $\lambda \in \partial f(x)$. In turn, this implies that (24.28) takes the form
$$f(y) \le \lambda \cdot y \qquad \forall y \in \mathbb{R}^n$$
for all $\lambda \in \partial f(x)$, i.e., $\partial f(x) \subseteq \partial f(0)$. So, (24.33) holds.17 N
Theorem 1148 A function $f : C \to \mathbb{R}$ is concave if and only if $\partial f(x)$ is non-empty for all $x \in C$.
Proof “If”. Suppose $\partial f(x) \ne \emptyset$ at all $x \in C$. Let $x_1, x_2 \in C$ and $t \in [0,1]$. Let $\lambda \in \partial f(tx_1 + (1-t)x_2)$. By (24.28),
$$f(x_1) \le f(tx_1 + (1-t)x_2) + \lambda \cdot (x_1 - (tx_1 + (1-t)x_2)), \quad f(x_2) \le f(tx_1 + (1-t)x_2) + \lambda \cdot (x_2 - (tx_1 + (1-t)x_2))$$
that is,
$$f(x_1) \le f(tx_1 + (1-t)x_2) + (1-t)\,\lambda \cdot (x_1 - x_2), \quad f(x_2) \le f(tx_1 + (1-t)x_2) + t\,\lambda \cdot (x_2 - x_1)$$
Hence,
$$f(tx_1 + (1-t)x_2) \ge t f(x_1) - t(1-t)\,\lambda \cdot (x_1 - x_2) + (1-t) f(x_2) - (1-t)t\,\lambda \cdot (x_2 - x_1) = t f(x_1) + (1-t) f(x_2)$$
as desired.
“Only if”. Suppose f is concave. Let $x \in C$. By proceeding as in the proof of the coda Theorem 1169, it is easy to check that the Hahn-Banach Theorem implies that there exists $\lambda \in \mathbb{R}^n$ such that $\lambda \cdot y \ge f'_+(x; y)$ for all $y \in \mathbb{R}^n$. Hence, by (24.35), $\partial f(x)$ is non-empty.
The maintained hypothesis that C is open is key for the last two propositions, as the
next example shows.
Example 1149 Consider $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = \sqrt{x}$. The only point of the (closed) domain at which the function is not differentiable is the boundary point $x = 0$. The superdifferential $\partial f(0)$ is given by the scalars $\lambda$ such that
$$\sqrt{y} \le \sqrt{0} + \lambda(y - 0) \qquad \forall y \ge 0 \qquad (24.34)$$
i.e., such that $\sqrt{y} \le \lambda y$ for each $y \ge 0$. If $y = 0$, this inequality holds for all $\lambda$. If $y > 0$, the inequality is equivalent to $\lambda \ge \sqrt{y}/y = 1/\sqrt{y}$. But, letting y tend to 0, this implies $\lambda \ge \lim_{y\to0^+} 1/\sqrt{y} = +\infty$. Therefore, there is no scalar $\lambda$ for which (24.34) holds. It follows that $\partial f(0) = \emptyset$. We conclude that f is not superdifferentiable at the boundary point 0. N
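Numerically, the failure of (24.34) shows up as follows (a sketch: for any candidate scalar, points y near 0 violate the inequality):

```python
# Example 1149 numerically (a sketch): every candidate scalar lam fails
# the inequality sqrt(y) <= lam * y for y > 0 small enough, since it
# would require lam >= 1/sqrt(y) -> +infinity as y -> 0+.
import math

def violates(lam):
    ys = [10.0 ** (-k) for k in range(1, 12)]
    return any(math.sqrt(y) > lam * y for y in ys)

assert all(violates(lam) for lam in [0.0, 1.0, 10.0, 1000.0])
```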
N.B. We focused on open convex sets C to ease matters, but this example shows that non-open domains may be important. Fortunately, the results of this section can be easily extended to such domains. For instance, Theorem 1148 can be stated for any convex set C (possibly not open) by saying that a continuous function $f : C \to \mathbb{R}$ is concave on int C if and only if $\partial f(x)$ is non-empty at all $x \in \text{int } C$, i.e., at all interior points x of C.18
18 If the domain C is not assumed to be open, we need to require continuity (which is otherwise automatically satisfied by Theorem 669).
The concave function $f(x) = \sqrt{x}$ is indeed differentiable – and so superdifferentiable, with $\partial f(x) = \{f'(x)\}$ – at all $x \in (0, \infty)$, that is, at all interior points of the function’s domain $\mathbb{R}_+$. O
and
$$f'_+(x; y) = \min_{\lambda \in \partial f(x)} \lambda \cdot y \qquad \forall y \in \mathbb{R}^n \qquad (24.37)$$
Proof Lemma 1140 implies (24.35), while (24.36) follows from (24.35) via (24.13). Finally, the coda Theorem 1169 implies (24.37) because $f'_+(x;\cdot) : \mathbb{R}^n \to \mathbb{R}$ is superlinear.
This theorem gives as a corollary the most general version of the first order condition for concave functions. Indeed, in view of Corollary 1142, the earlier Theorem 1131 is a special case of this result.
The next example shows how this corollary makes it possible to find maximizers even when Fermat’s Theorem does not apply because there are points where the function is not differentiable.
Example 1153 For the function f : R → R defined by f(x) = 1 − |x| we have (Example 1144):

    ∂f(x) = {−1}     if x > 0
            [−1, 1]  if x = 0
            {1}      if x < 0

By Corollary 1152, x̂ = 0 is a maximizer since 0 ∈ ∂f(0). N
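The first-order condition of this example can be spot-checked numerically (an illustrative sketch, not from the text): λ = 0 belongs to ∂f(0) precisely because f(y) ≤ f(0) + 0·(y − 0) for every y.

```python
# Check that lambda = 0 lies in the superdifferential of f(x) = 1 - |x|
# at x = 0, i.e. f(y) <= f(0) + 0*(y - 0) = 1 for every y, so that
# x_hat = 0 is a maximizer by the first-order condition.
f = lambda x: 1 - abs(x)

grid = [i / 100 for i in range(-300, 301)]
assert all(f(y) <= f(0) for y in grid)   # 0 is in the superdifferential at 0
assert max(grid, key=f) == 0             # and 0 indeed maximizes f on the grid
print("0 is a maximizer; f(0) =", f(0))
```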
784 CHAPTER 24. CONCAVITY AND DIFFERENTIABILITY
24.7 Quasi-concavity

24.7.1 Ordinal superdifferential

The next definition introduces a notion of superdifferential suitable for quasi-concave functions.^19 In reading it, keep in mind that quasi-concavity is an ordinal notion, unlike concavity, which is cardinal (Section 14.4).
The next result shows the ordinal nature of this notion, thus justifying its name.

Proof Given x, y ∈ C, it is enough to observe that f(y) ≥ f(x) if and only if (g ∘ f)(y) ≥ (g ∘ f)(x) (cf. Proposition 209). ∎

Because of its ordinal nature, the ordinal superdifferential is a convex semicone, as the next result shows.
It follows that at least one addendum must be negative. Without loss of generality, say the first: αλ·(y − x) ≤ 0. We have two cases: either α > 0 or α = 0. In the former case, we have that λ·(y − x) ≤ 0. In the latter case, since α + β > 0, we have βλ′·(y − x) ≤ 0 and β > 0, so that λ′·(y − x) ≤ 0. We can conclude that either λ·(y − x) ≤ 0 or λ′·(y − x) ≤ 0, which implies f(y) ≤ f(x), given that λ, λ′ ∈ ∂°f(x). This yields αλ + βλ′ ∈ ∂°f(x). ∎
Next we show that for concave functions the notions of ordinal superdifferential and of superdifferential are connected. Before doing so, we introduce an ancillary result which shows how monotonicity is captured by the ordinal superdifferential.

So, the elements of the ordinal superdifferential of a strongly increasing function are positive and non-zero vectors.
Proof Note that λ ∈ R^n is such that λ ∈ ∂°f(x) if and only if for every y ∈ C

    f(y) > f(x) ⟹ λ·(y − x) > 0   (24.38)

Let λ ∈ ∂°f(x). Consider z ∈ R^n_++. Since C is open, it follows that x + z/n ∈ C for n large enough and, in particular, x + z/n ≥ x. Since f is strongly increasing, we have that f(x + z/n) > f(x), yielding that λ·(z/n) > 0, that is, λ·z > 0. By Lemma 541, since z ∈ R^n_++ was arbitrarily chosen and by continuity of the function x ↦ λ·x, we have that λ·z ≥ 0 for all z ∈ R^n_+, proving that λ ≥ 0. Finally, let 1 be the constant vector whose components are all 1. Since 1 ∈ R^n_++, the vector λ must be different from 0; otherwise 0 = λ·1 > 0, a contradiction. ∎
Proof Let λ ∈ ∂f(x). By definition, we have that f(y) − f(x) ≤ λ·(y − x) for all y ∈ C. This implies that if y ∈ C and λ·(y − x) ≤ 0, then f(y) ≤ f(x), yielding that λ ∈ ∂°f(x) and ∂f(x) ⊆ ∂°f(x). Now, assume that f is concave, strongly increasing, and x ∈ C. Note that ∂f(x) is non-empty. By the previous part of the proof, we have that ∂f(x) ⊆ ∂°f(x). By Proposition 1156, it follows that ⋃_{α>0} α∂f(x) ⊆ ∂°f(x). Vice versa, consider λ ∈ ∂°f(x). By Proposition 1157 and since f is strongly increasing, we have that λ > 0. Let y ∈ R^n be such that λ·y = 0. It follows that for every h > 0 small enough x + hy ∈ C and λ·((x + hy) − x) ≤ 0. Since λ ∈ ∂°f(x), it follows that f(x + hy) − f(x) ≤ 0 for every h > 0 small enough. We can conclude that

    f′_+(x; y) = lim_{h→0+} ( f(x + hy) − f(x) ) / h ≤ 0

Since y was arbitrarily chosen, it follows that f′_+(x; y) ≤ 0 for all y ∈ R^n such that λ·y = 0. Define V = {y ∈ R^n : λ·y = 0} and g : V → R by g(y) = 0. Clearly, V is a vector subspace and g is linear. By the Hahn-Banach Theorem (Theorem 1168), since f′_+(x; ·) ≤ g on V and f′_+(x; ·) is superlinear, it follows that g admits a linear extension ḡ such that f′_+(x; y) ≤ ḡ(y) for every y ∈ R^n. By Riesz's Theorem, there exists λ′ ∈ R^n such that ḡ(y) = λ′·y for every y ∈ R^n. We can conclude that

    λ·y = 0 ⟹ λ′·y = 0   (24.39)

By Theorem 1150, it follows that λ′ ∈ ∂f(x). Since f is strongly increasing, we also have that λ′ > 0.^20 We are left to show that λ = αλ′ for some α > 0. By Theorem 1167 and since (24.39) holds, we have that λ′ = βλ for some β ∈ R. Since λ > 0 and λ′ > 0, we have that β > 0; it is enough to set α = 1/β > 0. ∎

^20 By the previous part of the proof, λ′ ∈ ∂°f(x). By Proposition 1157 and since f is strongly increasing, λ′ ∈ ∂°f(x) ⊆ R^n_+ \ {0}.
Proof Since f is bounded above, there exists M ∈ R such that f(y) ≤ M for all y ∈ C. We need to introduce two connected ancillary objects. We start with the function G : R^n × C → R such that for every λ ∈ R^n and for every x ∈ C

    G(λ, x) = sup { f(y) : y ∈ C and λ·y ≤ λ·x }

We then set

    f̂(x) = inf_{λ ∈ R^n} G(λ, x)

Observe that f(x) ≤ f̂(x) ≤ M for every x ∈ C. Note that f̂ is also quasi-concave on C (why?).

We can now prove the main statement. We begin with the "If" part. Consider x ∈ C. Let λ ∈ ∂°f(x). This implies that if y ∈ C is such that λ·(y − x) ≤ 0, then f(y) ≤ f(x). It follows that

    G(λ, x) ≤ f(x)

This implies that f̂(x) = f(x). Since x ∈ C was arbitrarily chosen, we can conclude that f = f̂, yielding that f is quasi-concave. As for the "Only if" part, let x ∈ C. We have two cases: either x is a maximizer or x is not a maximizer of f on C. In the first case, choose λ = 0. Note that the implication

    λ·(y − x) ≤ 0 ⟹ f(y) ≤ f(x)

trivially holds, since f(y) ≤ f(x) for all y ∈ C, x being a maximizer. Thus, λ ∈ ∂°f(x) and this latter set is non-empty. In the second case, since x is not a maximizer and f is continuous and quasi-concave, we have that the strict upper contour set

    (f > f(x)) = { y ∈ C : f(y) > f(x) }

is non-empty, open, convex, and x does not belong to it. By Proposition 824, there exists λ ∈ R^n such that if y ∈ (f > f(x)), that is f(y) > f(x), then λ·y > λ·x. By taking the contrapositive, we have that λ ∈ ∂°f(x) and this latter set is non-empty. ∎
provided ∇f(x) ≠ 0.

Proof Let λ be an element of the set on the left-hand side. To prove the inclusion, we want to show that if y ∈ C, then

    λ·(y − x) ≤ 0 ⟹ f(y) ≤ f(x)

Example 1162 The conditions ∇f(x) ≠ 0 and the strong increasing monotonicity of Proposition 1160 are needed. For instance, for the quasi-concave function f(x) = x³ we have 0 = f′(0) ∉ ∂°f(0) = (0, ∞). On the other hand, for the function f(x) = −x², the origin is a global maximum and 0 = f′(0) ∈ ∂°f(0) = R. N
Proof Before starting, note that, by contrapositive, (24.41) is equivalent to the following property: for each x, y ∈ C

We only prove the "Only if" part. Consider x, y ∈ C and assume that f(y) ≥ f(x). Since f is quasi-concave, it follows that f((1 − t)x + ty) ≥ f(x) for every t ∈ (0, 1). By Theorem 970, we have that
The next result is the quasi-concave counterpart of Theorem 1119, where a suitable notion of quasi-monotonicity is used.

The function φ is thus decreasing on (0, 1). By continuity, φ is decreasing on [0, 1]. Since φ(1) ≥ φ(0), this implies that φ is constant on [0, 1]. Since φ′_+(0) = ∇f(x)·(y − x) (why?), in turn this implies that 0 = φ′_+(0) = ∇f(x)·(y − x) < 0, a contradiction.

"Only if" Let f be quasi-concave and suppose that (24.43) does not hold. So, there exists a pair x, y ∈ C such that
24.7.4 Optima

We can characterize maximizers via the ordinal superdifferential, as we did in Theorem 1151 for the superdifferential.

Proof Let x̂ ∈ C be a maximizer. We have f(y) ≤ f(x̂) for every y ∈ C. Thus, for every λ ∈ R^n we trivially have that if y ∈ C and λ·(y − x̂) ≤ 0, then f(y) ≤ f(x̂), yielding that λ ∈ ∂°f(x̂). Since λ was arbitrarily chosen, it follows that ∂°f(x̂) = R^n. Vice versa, let 0 ∈ ∂°f(x̂). It follows that if y ∈ C and 0·(y − x̂) ≤ 0, then f(y) ≤ f(x̂). Since 0·(y − x̂) = 0 ≤ 0 holds for every y ∈ C, we have that f(y) ≤ f(x̂) for all y ∈ C, i.e., x̂ ∈ C is a maximizer. ∎

We thus have the following general first-order condition for quasi-concave functions, the counterpart here of Corollary 1152.
Proof The "if" part is obvious and therefore left to the reader. "Only if". Before starting, we introduce some derived objects, since reasoning in terms of linear functions rather than vectors will simplify things quite significantly. Define f_i : R^n → R by f_i(x) = λ_i·x for each i = 1, …, k. Similarly, define f : R^n → R by f(x) = λ·x. Next, define the operator F : R^n → R^k to be such that the i-th component of F(x) is F(x)_i = f_i(x). Since F is linear (why?), note that Im F is a vector subspace of R^k. Next, we define a function g : Im F → R by the following formula: for each v ∈ Im F

    g(v) = f(x)  where x ∈ R^n is any vector such that F(x) = v

First, we need to show that g is well defined. In other words, we need to check that g assigns one and only one value to each vector of Im F. In fact, by definition, given v ∈ Im F there always exists a vector x ∈ R^n such that F(x) = v. The potential issue is that there might exist a second vector y ∈ R^n such that F(y) = v, but f(x) ≠ f(y). We next show that this latter inequality can never hold. Indeed, since F is linear, if F(y) = v, then F(x) − F(y) = 0 and F(x − y) = 0. By definition of F, we have λ_i·(x − y) = f_i(x − y) = 0 for every i = 1, …, k. By (24.45), this yields that λ·(x − y) = f(x − y) = 0, that is, f(x) = f(y). We just proved that g is well defined. The reader can verify that g is also linear. By the Hahn-Banach Theorem (Theorem 636), g admits a linear extension ḡ to R^k. By Riesz's Theorem, there exists a vector β ∈ R^k such that ḡ(v) = Σ_{i=1}^k β_i v_i for all v ∈ R^k. By definition of f_i, f, g, and F, we conclude that for every x ∈ R^n

    λ·x = f(x) = g(F(x)) = ḡ(F(x)) = Σ_{i=1}^k β_i f_i(x) = Σ_{i=1}^k β_i λ_i·x

yielding that λ = Σ_{i=1}^k β_i λ_i.^21

The version of the theorem seen in Section 13.10 is a special case. Indeed, let f : V → R be any linear function defined on V. Theorem 729 is easily seen to hold for linear functions defined on vector subspaces, so there is k > 0 such that |f(x)| ≤ k‖x‖ for all x ∈ V. The function g : R^n → R defined by g(x) = −k‖x‖ is concave (Example 652). Since f(x) ≥ g(x) for all x ∈ V, by the last theorem there exists a linear function f̄ : R^n → R that extends f to R^n.

^21 Readers who struggle with this last step should consult the proof of Riesz's Theorem (in particular, the part dealing with "uniqueness").
Proof Let dim V = k ≤ n and let {x_1, …, x_k} be a basis of V. If k = n, there is nothing to prove since V = R^n. Otherwise, by Theorem 87 there are n − k vectors {x_{k+1}, …, x_n} such that the overall set {x_1, …, x_n} is a basis of R^n. Let V_1 = span{x_1, …, x_{k+1}}. Clearly, V ⊆ V_1. Given any x̄ ∈ V_1, there exists a unique collection of scalars {α_i}_{i=1}^{k+1} ⊆ R such that x̄ = Σ_{i=1}^k α_i x_i + α_{k+1} x_{k+1}. Since Σ_{i=1}^k α_i x_i ∈ V, every element of V_1 can be uniquely written as x + λx_{k+1}, with x ∈ V and λ ∈ R. That is, V_1 = {x + λx_{k+1} : x ∈ V, λ ∈ R}.

Let r be an arbitrary scalar. Define f_1 : V_1 → R by f_1(x + λx_{k+1}) = f(x) + λr for all x ∈ V and all λ ∈ R. The function f_1 is linear, with f_1(x_{k+1}) = r, and is equal to f on V. We need to show that r can be chosen so that f_1(x) ≥ g(x) for all x ∈ V_1.
If λ > 0, we have that for every x ∈ V

    f_1(x + λx_{k+1}) ≥ g(x + λx_{k+1}) ⟺ f(x) + λr ≥ g(x + λx_{k+1}) ⟺ r ≥ ( g(x + λx_{k+1}) − f(x) ) / λ

If λ < 0, write λ = −μ with μ > 0. Then, for every y ∈ V,

    f_1(y − μx_{k+1}) ≥ g(y − μx_{k+1}) ⟺ f(y) − μr ≥ g(y − μx_{k+1}) ⟺ r ≤ ( f(y) − g(y − μx_{k+1}) ) / μ

Summing up, we have f_1(x) ≥ g(x) for all x ∈ V_1 if and only if we choose r ∈ R so that

    sup_{x∈V, λ>0} ( g(x + λx_{k+1}) − f(x) ) / λ ≤ r ≤ inf_{y∈V, μ>0} ( f(y) − g(y − μx_{k+1}) ) / μ   (24.46)

Note that, for every x, y ∈ V and every α, β > 0,

    αf(y) + βf(x) = f(αy + βx) = (α + β) f( (α/(α+β)) y + (β/(α+β)) x )
        ≥ (α + β) g( (α/(α+β)) y + (β/(α+β)) x )
        = (α + β) g( (α/(α+β)) (y − βx_{k+1}) + (β/(α+β)) (x + αx_{k+1}) )
        ≥ α g(y − βx_{k+1}) + β g(x + αx_{k+1})

where the last inequality follows from the concavity of g. Dividing by αβ and rearranging, ( g(x + αx_{k+1}) − f(x) ) / α ≤ ( f(y) − g(y − βx_{k+1}) ) / β. In turn, this implies (24.46), as desired. We conclude that there exists a linear function f_1 : V_1 → R that extends f and such that f_1(x) ≥ g(x) for all x ∈ V_1.

Consider now V_2 = span{x_1, …, x_{k+1}, x_{k+2}}. By proceeding as before, we can show the existence of a linear function f_2 : V_2 → R that extends f_1 and such that f_2(x) ≥ g(x) for all x ∈ V_2. In particular, since V ⊆ V_1 ⊆ V_2, the linear function f_2 is such that f_2(x) = f_1(x) = f(x) for all x ∈ V. So, f_2 extends f to V_2. By iterating, we reach a final extension f_{n−k} : R^n → R that extends f and is such that f_{n−k}(x) ≥ g(x) for all x ∈ V_{n−k} = span{x_1, …, x_n} = R^n. This completes the proof. ∎
(iv) ∂f(0) ⊆ Δ^{n−1} if and only if f is increasing and translation invariant with f(1) = 1.
Proof We prove the "only if" part, as the "if" follows from Example 1103. Suppose f is superlinear. By the Hahn-Banach Theorem, ∂f(0) is not empty. Indeed, let x ∈ R^n and consider the vector subspace V_x = {αx : α ∈ R} generated by x (see Example 82). Define l_x : V_x → R by l_x(αx) = αf(x) for all α ∈ R. The function l_x is linear on the vector subspace V_x. Since f is superlinear, recall that f(x) + f(−x) ≤ 0, that is, f(−x) ≤ −f(x). We next show that l_x ≥ f on V_x. Since f is superlinear, if α ≥ 0, then l_x(αx) = αf(x) = f(αx). If α < 0, then l_x(αx) = αf(x) = −((−α)f(x)) = −f(−αx) ≥ f(αx), proving that l_x ≥ f on V_x. By the Hahn-Banach Theorem, there exists l ∈ (R^n)′ such that l ≥ f on R^n and l = l_x on V_x.^22 By Riesz's Theorem, there exists λ ∈ R^n such that l(x) = λ·x for all x ∈ R^n. We have thus showed that λ ∈ ∂f(0) and f(x) = λ·x. The first fact implies that ∂f(0) is not empty, hence min_{λ∈∂f(0)} λ·x ≥ f(x) for all x ∈ R^n, while the second fact implies that the min is attained, so that

    f(x) = min_{λ∈∂f(0)} λ·x   (24.48)

Since x was arbitrarily chosen, (24.48) holds for every x ∈ R^n. Next, suppose C, C′ ⊆ R^n are any two non-empty convex and compact sets such that min_{λ∈C} λ·x = min_{λ∈C′} λ·x for all x ∈ R^n. We conclude that C = C′. In turn, in view of (24.48) this implies that ∂f(0) is the unique non-empty compact and convex set in R^n for which (24.47) holds.
(i) Let ∂f(0) ⊆ R^n_+. If x, y ∈ R^n are such that x ≥ y, then λ·x ≥ λ·y for all λ ∈ ∂f(0). Let λ_y ∈ ∂f(0) be such that f(y) = λ_y·y. Then,

    f(x) = min_{λ∈∂f(0)} λ·x ≥ min_{λ∈∂f(0)} λ·y = λ_y·y = f(y)

as desired. Conversely, assume that f is increasing. Then, for each i = 1, …, n we have

    0 ≤ f(e^i) = min_{λ∈∂f(0)} λ·e^i = min_{λ∈∂f(0)} λ_i

^22 Recall that (R^n)′ denotes the dual space of R^n, i.e., the collection of all linear functions on R^n (Section 13.1.2).
So, 0 ∉ ∂f(0).

(iii) The proof is similar to (i) and left to the reader.

(iv) Let ∂f(0) ⊆ Δ^{n−1}. By (i), f is increasing. It remains to prove that it is translation invariant. Let x ∈ R^n and k ∈ R. We have λ·(k1) = k because λ ∈ Δ^{n−1}. So,

    f(x + k1) = min_{λ∈∂f(0)} λ·(x + k1) = min_{λ∈∂f(0)} ( λ·x + λ·(k1) ) = min_{λ∈∂f(0)} ( λ·x + k ) = k + min_{λ∈∂f(0)} λ·x = f(x) + k

as desired. Conversely, assume that f is increasing and translation invariant. By point (i), ∂f(0) ⊆ R^n_+. Moreover, since f(k1) = k for all k ∈ R, we have

    Σ_{i=1}^n λ_i = λ·1 ≥ min_{μ∈∂f(0)} μ·1 = f(1) = 1   ∀λ ∈ ∂f(0)

and

    −Σ_{i=1}^n λ_i = λ·(−1) ≥ min_{μ∈∂f(0)} μ·(−1) = f(−1) = −1   ∀λ ∈ ∂f(0)

So, we have both Σ_{i=1}^n λ_i ≥ 1 and Σ_{i=1}^n λ_i ≤ 1, which implies Σ_{i=1}^n λ_i = 1. We conclude that ∂f(0) ⊆ Δ^{n−1}. ∎
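Part (iv) can be illustrated numerically (an assumption-labeled sketch, not from the text: the set C below is a hypothetical finite subset of the simplex): a function of the form f(x) = min over λ ∈ C of λ·x is then increasing and translation invariant with f(1) = 1.

```python
# Sketch: take a finite set C of probability vectors (in the simplex) and
# define f(x) = min over lambda in C of lambda . x.  Theorem 1169-(iv) says
# such an f is increasing and translation invariant with f(1) = 1.
C = [(0.5, 0.5, 0.0), (0.2, 0.3, 0.5), (1.0, 0.0, 0.0)]  # assumed kernels

def f(x):
    return min(sum(l_i * x_i for l_i, x_i in zip(lam, x)) for lam in C)

x = (1.0, -2.0, 3.0)
k = 0.7
shifted = tuple(x_i + k for x_i in x)           # x + k*1

assert f((1.0, 1.0, 1.0)) == 1.0                # f(1) = 1
assert abs(f(shifted) - (f(x) + k)) < 1e-12     # translation invariance
assert f((2.0, -1.0, 3.5)) >= f(x)              # increasing: (2,-1,3.5) >= (1,-2,3)
print(f(x))
```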
Proof Let f be superlinear. Suppose f is linear. Let l ∈ (R^n)′ be such that l ≥ f. By (24.9), f = l. Conversely, suppose there is a unique l ∈ (R^n)′ such that l ≥ f. Then (24.47) implies f = l. ∎

We can actually say something more about the domain of additivity of a superlinear function. To this end, consider the collection A_f = {x ∈ R^n : f(x) = −f(−x)} of all vectors where the gap −f(−x) − f(x) closes. For instance, if x, y ∈ A_f and λ̄ ∈ C attains the minimum at x + y, then

    f(x + y) = min_{λ∈C} λ·(x + y) = min_{λ∈C} ( λ·x + λ·y ) = λ̄·x + λ̄·y = −λ̄·(−x) − λ̄·(−y) ≤ −( f(−x) + f(−y) ) = f(x) + f(y) ≤ f(x + y)
describes a financial market with bid-ask spreads. If p^a_j = p^b_j for each j, we are back to the frictionless framework of Section 19.5.

Before moving on, a piece of notation based on joins and meets (Section 17.1): given a vector x ∈ R^n, the positive vectors x⁺ = x ∨ 0 and x⁻ = −(x ∧ 0) are called positive and negative part of x, respectively. In terms of components, we have

    x⁺_i = max{x_i, 0}  and  x⁻_i = −min{x_i, 0}

In words, the components of x⁺ coincide with the positive ones of x and are 0 otherwise. Similarly, the components of x⁻ coincide with the absolute values of the negative ones of x and are 0 otherwise. It is immediate to check that x = x⁺ − x⁻. This decomposition can be interpreted as a trading strategy: if x denotes a portfolio, its positive and negative parts x⁺ and x⁻ describe the long and short positions that it involves, respectively, i.e., how much one has to buy and sell, respectively, of each primary asset to form portfolio x.

Example 1172 Let x = (1, 2, −3) ∈ R³ be a portfolio in a market with three primary assets. We have x⁺ = (1, 2, 0) and x⁻ = (0, 0, 3), so to form portfolio x one has to buy one unit of the first asset and two units of the second one, and to sell three units of the third asset. N
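The join/meet decomposition above can be written out in a few lines (an illustrative sketch):

```python
import numpy as np

# Positive/negative part decomposition of the example portfolio:
# x+ keeps the positive components, x- the absolute values of the
# negative ones, and x = x+ - x- recovers the portfolio.
x = np.array([1.0, 2.0, -3.0])

x_plus = np.maximum(x, 0.0)     # x+ =   x v 0 : long positions
x_minus = np.maximum(-x, 0.0)   # x- = -(x ^ 0): short positions

print(x_plus, x_minus)                      # [1. 2. 0.] [0. 0. 3.]
assert np.array_equal(x, x_plus - x_minus)  # x = x+ - x-
```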
Market values

To describe how much it costs to form a portfolio x, we need the ask market value v_a : R^n → R defined by

    v_a(x) = Σ_{j=1}^n x⁺_j p^a_j − Σ_{j=1}^n x⁻_j p^b_j   ∀x ∈ R^n   (24.51)

So, v_a(x) is the cost of portfolio x. In particular, since each primary asset y_j corresponds to the portfolio e^j, we have v_a(e^j) = p^a_j. Note that we can attain the primary assets' holdings of portfolio x also by buying and selling according to any pair of positive vectors x′ and x″ such that x = x′ − x″. In this case, the cost of x would be

    Σ_{j=1}^n x′_j p^a_j − Σ_{j=1}^n x″_j p^b_j   (24.52)

Example 1173 In the last example we noted that to form portfolio x = (1, 2, −3) one has to buy and sell the amounts prescribed by x⁺ = (1, 2, 0) and x⁻ = (0, 0, 3), respectively. At the same time, this portfolio can also be formed by buying an extra unit of the third asset and by selling the same extra unit of that asset. In other words, we have that x = x′ − x″, where x′ = (1, 2, 1) and x″ = (0, 0, 4). The cost of the first trading strategy is (24.51), while the cost of the second one is (24.52). N

A moment's reflection shows that there are actually infinitely many possible decompositions of x as a difference of two positive vectors x′ and x″. Each of them is a possible trading strategy that delivers the assets' holdings that portfolio x features. Of course, one would choose the cheapest among such trading strategies. The next result shows that the cheapest way to form portfolio x is, indeed, the one obtained by buying the amounts in x⁺ and selling those in x⁻. So, we can focus on them and forget about alternative buying and selling pairs x′ and x″.
Proposition 1174 The ask market value v_a : R^n → R is such that, for each x ∈ R^n,

    v_a(x) = min { Σ_{j=1}^n x′_j p^a_j − Σ_{j=1}^n x″_j p^b_j : x′, x″ ≥ 0 and x = x′ − x″ }

Proof Denote by v̂_a(x) the infimum of the set on the right-hand side. On the one hand, since x⁺, x⁻ ≥ 0 and x = x⁺ − x⁻, it follows that v̂_a(x) ≤ v_a(x) for all x ∈ R^n. On the other hand, consider x′, x″ ≥ 0 such that x = x′ − x″. It follows that x′ ≥ x⁺ and x″ ≥ x⁻. Indeed, note that x′ = x + x″. Let i ∈ {1, …, n}. Since x″ ≥ 0, it follows that x′_i = x_i + x″_i ≥ x_i. Since x′ ≥ 0, we thus have that x′_i = max{x′_i, 0} ≥ max{x_i, 0} = x⁺_i. Since i was arbitrarily chosen, we conclude that x′ ≥ x⁺. Finally, since x″ = x′ − x and x = x⁺ − x⁻, we also have x″ ≥ x⁻. Since x′ and x″ were arbitrarily chosen, we conclude that v_a(x) ≤ v̂_a(x). In particular, since the inf is attained at x⁺ and x⁻, we can replace it with a min. ∎
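The intuition behind Proposition 1174 can be checked on a toy market (the prices below are hypothetical, for illustration only): with ask ≥ bid, any decomposition x = x′ − x″ other than (x⁺, x⁻) adds a common extra amount z ≥ 0 to both legs and costs z·(p^a − p^b) ≥ 0 more.

```python
import numpy as np

# Toy market with assumed ask/bid prices, ask >= bid componentwise.
pa = np.array([10.0, 20.0, 5.0])   # ask prices (hypothetical)
pb = np.array([9.0, 19.0, 4.5])    # bid prices (hypothetical)
x = np.array([1.0, 2.0, -3.0])

def cost(x1, x2):
    """Cost of buying x1 at ask and selling x2 at bid, as in (24.52)."""
    return float(x1 @ pa - x2 @ pb)

x_plus, x_minus = np.maximum(x, 0), np.maximum(-x, 0)
va = cost(x_plus, x_minus)                  # the ask market value (24.51)

z = np.array([0.0, 0.0, 1.0])               # wash trade: buy and sell 1 extra unit
assert cost(x_plus + z, x_minus + z) >= va  # alternative strategy is weakly dearer
print(va, cost(x_plus + z, x_minus + z))    # 36.5 37.0
```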
In particular, we have v_b(e^j) = p^b_j for each primary asset j. There is a tight relationship between bid and ask market values, as we show next.

    v_b(x) = −v_a(−x)   ∀x ∈ R^n   (24.53)

In particular, v_b is superlinear.

So, ask and bid market values are each the dual of the other. The superlinearity of v_b is a first dividend of this duality.

Proof If x ∈ R^n, then

    −v_a(−x) = −( Σ_{j=1}^n (−x)⁺_j p^a_j − Σ_{j=1}^n (−x)⁻_j p^b_j ) = −( Σ_{j=1}^n x⁻_j p^a_j − Σ_{j=1}^n x⁺_j p^b_j ) = Σ_{j=1}^n x⁺_j p^b_j − Σ_{j=1}^n x⁻_j p^a_j = v_b(x)

proving the first part of the statement. Consider now x, x′ ∈ R^n. Since v_a is sublinear, we have that v_a(−x − x′) ≤ v_a(−x) + v_a(−x′), yielding that

    v_b(x + x′) = −v_a(−x − x′) ≥ −v_a(−x) − v_a(−x′) = v_b(x) + v_b(x′)
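The duality (24.53) is easy to verify numerically (an illustrative sketch with the same hypothetical prices as before):

```python
import numpy as np

# Check the bid-ask duality (24.53): vb computed directly from its
# definition agrees with -va(-x), and vb is superadditive when ask >= bid.
pa = np.array([10.0, 20.0, 5.0])   # ask prices (hypothetical)
pb = np.array([9.0, 19.0, 4.5])    # bid prices (hypothetical)

def va(x):
    return float(np.maximum(x, 0) @ pa - np.maximum(-x, 0) @ pb)

def vb(x):
    # bid market value: liquidate longs at bid, close shorts at ask
    return float(np.maximum(x, 0) @ pb - np.maximum(-x, 0) @ pa)

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=3)
    assert abs(vb(x) - (-va(-x))) < 1e-12      # duality (24.53)
x, y = rng.normal(size=3), rng.normal(size=3)
assert vb(x + y) >= vb(x) + vb(y) - 1e-12      # superadditivity of vb
print("duality verified")
```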
By Proposition 1171, the set of portfolios without bid-ask spreads fx 2 Rn : vb (x) = va (x)g
is a vector subspace of Rn over which the bid and ask market values are linear.
Definition 1177 The financial market (L, p^b, p^a) satisfies the Law of One Price (LOP) if, for all portfolios x, x′ ∈ R^n, we have

    R(x) = R(x′) ⟹ v_a(x) = v_a(x′)   (24.54)

or, equivalently,

    R(x) = R(x′) ⟹ v_b(x) = v_b(x′)   (24.55)

Conditions (24.54) and (24.55) are equivalent because of the bid-ask duality (24.53), so the definition is well posed. Note that if p^a_i = p^b_i for all i, then we get back to the LOP of Section 19.5 since v_a = v. The rationale behind this more general version of the LOP is, mutatis mutandis, the same: portfolios that induce the same contingent claims should have the same market value, whether we form or liquidate them.

In a market with bid-ask spreads, the LOP allows us to define a pair of pricing rules. Specifically, the ask pricing rule f_a : W → R and the bid pricing rule f_b : W → R are the functions that associate to each replicable contingent claim w ∈ W its ask and bid prices, respectively. That is, for each w ∈ W we have

    f_a(w) = v_a(x)  and  f_b(w) = v_b(x)

where x ∈ R^{−1}(w). Clearly, we have f_b ≤ f_a and, by the bid-ask duality (24.53), the pricing rules are dual as well:

    f_b(w) = −f_a(−w)   ∀w ∈ W   (24.56)
Next we show that they also inherit the shape of their corresponding market values.

Theorem 1178 Suppose the financial market (L, p^b, p^a) satisfies the LOP. Then, the ask pricing rule f_a : W → R is sublinear and the bid pricing rule f_b : W → R is superlinear.

In sum, the pricing of contingent claims made possible by the LOP inherits the bid-ask duality of the underlying market values.

Proof First, we verify that f_a is well defined. In other words, we are going to check that the rule defining f_a assigns one and only one value to each vector w of W. Indeed, assume that there exist x, x′ ∈ R^n such that R(x) = w = R(x′). The potential issue could be that v_a(x) ≠ v_a(x′). But the LOP prevents exactly this from happening. Next, consider w, w′ ∈ W. By definition, there exist x, x′ ∈ R^n such that R(x) = w and R(x′) = w′. Since R is linear, we also have that R(x + x′) = R(x) + R(x′) = w + w′. Since v_a is sublinear, this yields that

    f_a(w + w′) = v_a(x + x′) ≤ v_a(x) + v_a(x′) = f_a(w) + f_a(w′)

Moreover, for every α ≥ 0 we have R(αx) = αR(x) = αw, so that

    f_a(αw) = v_a(αx) = αv_a(x) = αf_a(w)
Pricing kernels

In Theorem 1169 we established a representation result for superlinear functions that we can now use to provide a representation result for ask and bid pricing rules that generalizes Theorem 900. Recall that the financial market is complete when W = R^k.

Theorem 1179 Suppose the financial market (L, p^b, p^a) is complete and satisfies the LOP. Then, there exists a unique non-empty, compact, and convex set C ⊆ R^k such that

    f_a(w) = max_{λ∈C} λ·w  and  f_b(w) = min_{λ∈C} λ·w   ∀w ∈ R^k

Compared to the linear case of Section 19.5, bid-ask spreads result in a multiplicity of pricing kernels λ, given by the set C. In particular, the ask and bid prices of a claim w can be expressed as f_a(w) = λ_a^w·w and f_b(w) = λ_b^w·w via pricing kernels λ_a^w and λ_b^w in C that, respectively, attain the maximum and the minimum in the linear pricing λ·w.
Let us continue to consider a complete market. In such a market there are no arbitrages I if, for all x, x′ ∈ R^n,

    R(x′) ≥ R(x) ⟹ v_a(x′) ≥ v_a(x)   (24.57)

or, equivalently,^23 if

    R(x′) ≥ R(x) ⟹ v_b(x′) ≥ v_b(x)   (24.58)

^23 To see the equivalence, note that R(x′) ≥ R(x) ⟹ R(−x) ≥ R(−x′) ⟹ v_a(−x) ≥ v_a(−x′) ⟹ −v_a(−x′) ≥ −v_a(−x) ⟹ v_b(x′) ≥ v_b(x).
24.10. ULTRACODA: STRONG CONCAVITY 801
Without bid-ask spreads, the unique pricing rule is linear, so each of these two conditions reduces to (19.19) because for linear functions positivity and monotonicity are equivalent properties (Proposition 538). Here we need to make explicit the monotonicity assumption that in the linear case was implicitly assumed.

It is easy to see that the no-arbitrage conditions (24.57) and (24.58) imply the LOPs (24.54) and (24.55). Under such stronger conditions we can get a stronger version of the last result, in which the pricing kernels are positive, thus generalizing Proposition 903.

Proposition 1180 Suppose the financial market (L, p^b, p^a) is complete and has no arbitrages I. Then, there exists a non-empty, compact, and convex set C ⊆ R^k_+ such that

    f_a(w) = max_{λ∈C} λ·w  and  f_b(w) = min_{λ∈C} λ·w

for all w ∈ R^k. If, in addition, the risk-free contingent claim 1 has no bid-ask spread, with f_a(1) = f_b(1) = 1, then C ⊆ Δ^{k−1}.
Since the market is complete, by Proposition 1171 the set of contingent claims without bid-ask spreads A_f = {w ∈ R^k : f_b(w) = f_a(w)} is a vector subspace of R^k over which the bid and ask pricing rules are linear. The second part of the result says that if the constant (so, risk-free) contingent claim 1 belongs to such a subspace and if its price is normalized to 1, then the pricing kernels are actually probability measures.^24

Proof Under condition (24.57), the superlinear function f_b is easily seen to be increasing. By Theorem 1169-(i), we then have C = ∂f_b(0) ⊆ R^k_+. If 1 ∈ A_f, then f_b is translation invariant. By Theorem 1169-(iv), we then have C = ∂f_b(0) ⊆ Δ^{k−1} provided f_a(1) = f_b(1) = 1. ∎

Finally, the absence of arbitrages II is here modelled via strict monotonicity. So, the resulting nonlinear version of the Fundamental Theorem of Finance, in which C ⊆ R^k_++, relies on Theorem 1169-(iii). We leave the details to readers.
is concave.

Proof Let f : C → R be strongly concave. By definition, there exists k > 0 such that the function g : C → R defined by g(x) = f(x) + k‖x‖² is concave. Let x, y ∈ C, with x ≠ y, and α ∈ (0, 1). Since ‖x‖² = Σ_{i=1}^n x_i² is strictly convex, we have

    f(αx + (1 − α)y) = g(αx + (1 − α)y) − k‖αx + (1 − α)y‖²
        > αg(x) + (1 − α)g(y) − k( α‖x‖² + (1 − α)‖y‖² )
        = αf(x) + (1 − α)f(y)

as desired. ∎
Strong concavity is thus a strong version of strict concavity. The next result shows the great interest of this stronger version.

In Example 811 we showed that the function f(x) = 1 − x² is coercive. Since this function is easily seen to be strongly concave, that example can now be seen as an illustration of the proposition just stated.

The proof relies on a lemma of independent interest.
Proof Since f is concave and upper semicontinuous, the convex set hypo f is closed. For, let {(x_n, t_n)} ⊆ hypo f be such that (x_n, t_n) → (x, t) ∈ R^{n+1}. We need to show that (x, t) ∈ hypo f. By definition, t_n ≤ f(x_n) for each n ≥ 1, so t = lim t_n ≤ lim sup f(x_n) ≤ f(x) because f is upper semicontinuous. This shows that (x, t) ∈ hypo f.

Let (x_0, t_0) ∉ hypo f, with x_0 ∈ C and t_0 > f(x_0). By Proposition 824, there exist (a, c) ∈ R^{n+1}, b ∈ R, and ε > 0 such that

    a·x_0 + ct_0 ≥ b + ε > b ≥ a·x + ct   ∀(x, t) ∈ hypo f   (24.59)

We have c > 0. For, suppose that c = 0. Then, a·x_0 ≥ b + ε > b ≥ a·x for all x ∈ C, so in particular a·x_0 > a·x_0 by taking x = x_0, a contradiction. Next, suppose c < 0. Again by taking x = x_0 and t = f(x_0), from (24.59) it follows that ct_0 ≥ b + ε − a·x_0 > b − a·x_0 ≥ cf(x_0). So t_0 < f(x_0), which contradicts t_0 > f(x_0).

In sum, c > 0. Without loss of generality, set c = 1. Define the affine function r : C → R by r(x) = a·(x_0 − x) + t_0. We then have r(x) ≥ t for all (x, t) ∈ hypo f. In particular, this is the case for (x, f(x)) for all x ∈ C, so r(x) ≥ f(x) for all x ∈ C. We conclude that r is the sought-after affine function. ∎
Proof of Proposition 1183 We first show that every upper contour set (f ≥ t) is bounded. Suppose, by contradiction, that there exists an unbounded sequence {x_n} ⊆ (f ≥ t), i.e., such that ‖x_n‖ → +∞. Since g is concave and continuous, by the previous lemma there is an affine function r, with r(x) = a·x + b for some a ∈ R^n and b ∈ R, such that r ≥ g. So, a·x_n + b ≥ f(x_n) + k‖x_n‖² for all n. By the Cauchy-Schwarz inequality we have a·x_n ≤ ‖a‖‖x_n‖, so

    f(x_n) ≤ ‖a‖‖x_n‖ + b − k‖x_n‖² → −∞

which contradicts f(x_n) ≥ t for all n.

By Tonelli's Theorem, we then have the following remarkable existence and uniqueness result that combines the best of the two worlds of coercivity and concavity: strict concavity ensures the existence of at most one maximizer, strong concavity ensures via coercivity that such a maximizer indeed exists.
In view of this remarkable result, one may wonder whether there are strong concavity criteria. The next result shows that this is, indeed, the case.

Proposition 1186 A twice differentiable function f : C → R defined on an open convex set of R^n is strongly concave if and only if there exists c < 0 such that the matrix ∇²f(x) − cI is negative semidefinite, i.e.,

    y·( ∇²f(x) − cI )y ≤ 0   ∀x ∈ C, ∀y ∈ R^n   (24.60)

Proof The function f is strongly concave if and only if the function g(x) = f(x) + k‖x‖² is concave for some k > 0, i.e., if and only if y·∇²g(x)y ≤ 0 for all x ∈ C and all y ∈ R^n (Proposition 1120). Some simple algebra shows that ∇²g(x) = ∇²f(x) + 2kI, where I is the identity matrix of order n. Setting c = −2k < 0, condition (24.60) is thus equivalent to the concavity of g. ∎
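The criterion of Proposition 1186 can be checked mechanically via eigenvalues (an illustrative sketch on an assumed example, f(x) = −‖x‖²):

```python
import numpy as np

# Criterion (24.60) for f(x) = -||x||^2: its Hessian is -2I, so with
# c = -1/2 the matrix Hess f(x) - cI = -(3/2) I is negative semidefinite,
# certifying strong concavity.
n = 3
hess_f = -2.0 * np.eye(n)       # Hessian of f(x) = -||x||^2
c = -0.5
M = hess_f - c * np.eye(n)      # the matrix in condition (24.60)

eigenvalues = np.linalg.eigvalsh(M)
assert np.all(eigenvalues <= 1e-12)   # negative semidefinite
print(eigenvalues)                    # all equal to -1.5
```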
Theorem 1187 (Projection Theorem) Let C be a closed and convex set of R^n. For every x ∈ R^n, the optimization problem

    min_{y∈C} ‖x − y‖   (24.61)

admits a unique solution m ∈ C, characterized by the condition

    (x − m)·(m − y) ≥ 0   ∀y ∈ C   (24.62)

Proof Let C be a vector subspace. By taking y = 0 and y = 2m, condition (24.62) is easily seen to imply (x − m)·m = 0. So, (x − m)·(m − y) = −(x − m)·y ≥ 0 for all y ∈ C. Fix y ∈ C. Then, −(x − m)·(ty) ≥ 0 for t = ±1, so (x − m)·y = 0. Since y was arbitrarily chosen, we conclude that (x − m)·y = 0 for all y ∈ C, i.e., (x − m) ⊥ C.

Conversely, assume (x − m) ⊥ C. Then, (x − m)·(m − y) = (x − m)·m − (x − m)·y = (x − m)·m for all y ∈ C. Since m ∈ C, from (x − m) ⊥ C it follows in particular that (x − m)·m = 0. We conclude that (x − m)·(m − y) = 0 for all y ∈ C, so condition (24.62) holds. ∎
To prove this general form of the Projection Theorem, given x ∈ R^n we consider the function f : R^n → R defined by f(y) = −‖x − y‖². Problem (24.61) can be rewritten as

    max_{y∈C} f(y)   (24.63)

Thanks to the following lemma, we can apply Theorem 1185 to this optimization problem.^25

Proof Simple algebra shows that ∇²f(y) = −2I for all y ∈ R^n. By taking c = −1/2, we have y′·( ∇²f(y) − cI )y′ = −(3/2)‖y′‖² ≤ 0 for all y′ ∈ R^n, so condition (24.60) is satisfied. This proves that f is strongly concave. ∎
Proof of the Projection Theorem In view of the previous lemma, by Theorem 1185 there exists a unique solution m ∈ C of the optimization problem (24.61). Clearly,

    ‖x − m‖² ≤ ‖x − y‖²   ∀y ∈ C   (24.64)

It remains to show that conditions (24.62) and (24.64) are equivalent, so that condition (24.62) characterizes the minimizer m.^26 Fix any y ∈ C and let y_t = ty + (1 − t)m for t ∈ [0, 1]. From (24.64) it follows that, for each t ∈ (0, 1], we have

    0 ≥ ‖x − m‖² − ‖x − y_t‖² = −‖m − y_t‖² − 2(x − m)·(m − y_t)
        = −‖m − ty − (1 − t)m‖² − 2(x − m)·(m − ty − (1 − t)m)
        = −t²‖m − y‖² − 2t(x − m)·(m − y)

Dividing by t and letting t → 0⁺, we get (x − m)·(m − y) ≥ 0, so (24.64) implies (24.62). Conversely, since ‖x − y‖² = ‖x − m‖² + ‖m − y‖² + 2(x − m)·(m − y), condition (24.62) implies ‖x − m‖² − ‖x − y‖² ≤ 0, so (24.64). Summing up, we proved that conditions (24.62) and (24.64) are equivalent. ∎

^25 The reader should compare this result with Lemma 886. In a similar vein, the function of Lemma 855 can be shown to be strongly concave. In these cases, strong concavity combines strict concavity and coercivity, thus confirming its dual role across concavity and coercivity.

^26 Here we follow Zarantonello (1971).
In particular, for C = R^n_+ we have

    P_C(x) = x⁺   ∀x ∈ R^n   (24.66)

where x⁺ = x ∨ 0 is the positive part of vector x. For instance, if n = 3 we have P_C(1, −3, 2) = (1, 0, 2). To verify the form of this projection, we use the characterization (24.62). So, let m ≥ 0 be such that

    (x − m)·(m − y) ≥ 0   ∀y ∈ R^n_+
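The projection formula (24.66) and its variational characterization can be spot-checked numerically (an illustrative sketch):

```python
import numpy as np

# The projection of x onto C = R^n_+ is x+, and it satisfies the variational
# characterization (x - m) . (m - y) >= 0 for all y >= 0, which we spot-check
# on random nonnegative points y.
x = np.array([1.0, -3.0, 2.0])
m = np.maximum(x, 0.0)             # claimed projection: m = x+ = (1, 0, 2)

rng = np.random.default_rng(1)
for _ in range(1000):
    y = rng.uniform(0.0, 5.0, size=3)       # arbitrary point of R^3_+
    assert (x - m) @ (m - y) >= -1e-12      # characterization (24.62)
    assert np.linalg.norm(x - m) <= np.linalg.norm(x - y) + 1e-12
print(m)  # [1. 0. 2.]
```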
Implicit functions

A scalar function f : A ⊆ R → R is usually given in explicit form:

    y = f(x)

This form separates the independent variable x from the dependent one y, so it permits determining the values of the latter from those of the former. The same function can be rewritten in implicit form through an equation that keeps all the variables on the same side of the equality sign:

    g(x, f(x)) = 0

where, for instance, g is defined by

    g(x, y) = f(x) − y

Example 1192 (i) The function f(x) = x² + x − 3 can be written in implicit form as g(x, f(x)) = 0 with g(x, y) = x² + x − 3 − y. (ii) The function f(x) = 1 + lg x can be written in implicit form as g(x, f(x)) = 0 with g(x, y) = 1 + lg x − y. N
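The two implicit forms of Example 1192 can be checked on a grid of points (an illustrative sketch; lg is read here as the natural logarithm, an assumption about the book's notation):

```python
import math

# Verify g(x, f(x)) = 0 for the two functions of Example 1192 on grids of
# points in each function's domain.
f1 = lambda x: x**2 + x - 3
g1 = lambda x, y: x**2 + x - 3 - y

f2 = lambda x: 1 + math.log(x)          # lg assumed to denote the natural log
g2 = lambda x, y: 1 + math.log(x) - y

assert all(g1(x, f1(x)) == 0 for x in [-2.0, -0.5, 0.0, 1.0, 3.0])
assert all(g2(x, f2(x)) == 0 for x in [0.1, 1.0, 2.5, 10.0])
print("g(x, f(x)) = 0 holds on the grid")
```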
Note that

    g^{−1}(0) ∩ (A × Im f) = Gr f

The graph of the function f thus coincides with the level curve g^{−1}(0) of the function g of two variables.^1

^1 The rectangle A × Im f has as its factors (its edges, geometrically) the domain and the image of f. Clearly, Gr f ⊆ A × Im f. For example, for the function f(x) = √x this rectangle is the first orthant R²_+ of the plane, while for the function f(x) = √x restricted to [0, 1] it is the unit square [0, 1] × [0, 1] of the plane.
808 CHAPTER 25. IMPLICIT FUNCTIONS
[Figure: graph of the function in the (x, y) plane]
The implicit rewriting of a scalar function f whose explicit form is known is little more than a curiosity, because the explicit form already contains all the relevant information on f, in particular on the dependence between the independent variable x and the dependent variable y. Unfortunately, applications often feature important scalar functions that are not given in "ready to use" explicit form, but only in implicit form through equations g(x, y) = 0. For this reason, it is important to consider the inverse problem: does an equation of the type g(x, y) = 0 implicitly define a scalar function f? In other words, does there exist f such that g(x, f(x)) = 0? If so, which properties does it have? For instance, is it unique? Is it convex or concave? Is it differentiable?

This chapter addresses these motivating questions by showing that, under suitable regularity conditions, such a function f exists and is unique (locally or globally, as will become clear) and that it may enjoy remarkable properties. As usual, we will emphasize a global viewpoint, the one most relevant for applications.
the implicit functions considered. In other words, the lemma considers functions f : A → B that belong to a posited space B^A (cf. Section 6.3.2). It is a purely set-theoretic result, so in the statement we consider generic sets A, B, C and D.
that is, the level curve g⁻¹(k) of the function g is described on the rectangle A × B by the function of a single variable f. Thus, f provides a "functional description" of this level curve that specifies the relationship between the arguments x and y of g when they belong to g⁻¹(k). By the lemma, for a function f to satisfy condition (25.1) thus amounts to providing such a functional description of the level curve.
Proof (i) implies (ii). We first show that Gr f ⊆ g⁻¹(k) ∩ (A × B). Let (x, y) ∈ Gr f. By definition, (x, y) ∈ A × B and y = f(x), thus g(x, y) = g(x, f(x)) = k. This implies (x, y) ∈ g⁻¹(k) ∩ (A × B), so Gr f ⊆ g⁻¹(k) ∩ (A × B). As to the converse inclusion, let (x̄, ȳ) ∈ g⁻¹(k) ∩ (A × B). We want to show that ȳ = f(x̄). Suppose not, i.e., ȳ ≠ f(x̄). Define f̃ : A → B by f̃(x) = f(x) if x ≠ x̄ and f̃(x̄) = ȳ. Since g(x̄, ȳ) = k, we have g(x, f̃(x)) = k for every x ∈ A. Since (x̄, ȳ) ∈ A × B, we have f̃ ∈ B^A. Being by construction f̃ ≠ f, this contradicts the uniqueness of f. We conclude that (25.2) holds, as desired.

(ii) implies (i). Let f ∈ B^A be such that (25.2) holds. By definition, (x, f(x)) ∈ Gr f for each x ∈ A. By (25.2), we have (x, f(x)) ∈ g⁻¹(k), so g(x, f(x)) = k for each x ∈ A. It remains to prove the uniqueness of f. Let h ∈ B^A satisfy (25.1). We have Gr h ⊆ g⁻¹(k) ∩ (A × B), since we can argue as in the first inclusion of the first part of the proof. By (25.2), this inclusion then yields Gr h ⊆ Gr f. In turn, this implies h = f. Indeed, if we consider x ∈ A, then (x, h(x)) ∈ Gr h ⊆ Gr f. Since (x, h(x)) ∈ Gr f, then (x, h(x)) = (x′, f(x′)) for some x′ ∈ A. This implies x = x′ and h(x) = f(x′), and so h(x) = f(x). Since x was arbitrarily chosen, we conclude that f = h, as desired.
In this case we say that the equation g(x, y) = k implicitly defines f on the rectangle A × B.
Proof For simplicity, let k = 0. Let x₀ ∈ A. By condition (25.3), there exist scalars y′, y″ ∈ B, say with y′ ≤ y″, such that g(x₀, y′) ≤ 0 ≤ g(x₀, y″). Since g is continuous, by Bolzano's Theorem there exists y₀ ∈ [y′, y″] such that g(x₀, y₀) = 0. Since x₀ was arbitrarily chosen, this proves the existence of the implicit function f.
So, if g is continuous and strictly monotone in y and satisfies condition (25.3), then the equation g(x, y) = k implicitly defines a unique f on the rectangle A × B.
Proof Let f, h : A → B be such that g(x, f(x)) = g(x, h(x)) = k for all x ∈ A. We want to show that h = f. Suppose, by contradiction, that h ≠ f. So, there is at least some x̄ ∈ A with h(x̄) ≠ f(x̄), say h(x̄) > f(x̄). The function g is strictly monotone in y, say increasing. Thus, k = g(x̄, h(x̄)) > g(x̄, f(x̄)) = k, a contradiction. We conclude that h = f.
Consider the equation

g(x, y) = x² − 2y − e^y = 0

By Propositions 1195 and 1196, there is a unique implicit function f : R → R such that g(x, f(x)) = 0 for every x ∈ R. Note that we are not able to write y as an explicit function of x, that is, we are not able to provide the explicit form of f. N
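Although the explicit form of f is unavailable, the existence argument (Bolzano's Theorem plus strict monotonicity in y) is directly computational: since g(x, ·) is strictly decreasing here, the unique root can be bracketed and bisected. A minimal sketch in Python, not from the text; the bracket [−50, 50] is an ad hoc choice:

```python
import math

def g(x, y):
    # g(x, y) = x^2 - 2y - e^y, strictly decreasing in y
    return x * x - 2 * y - math.exp(y)

def implicit_f(x, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for the unique y with g(x, y) = 0; Bolzano applies since
    g(x, lo) > 0 > g(x, hi) for moderate x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(x, mid) > 0:
            lo = mid        # the root lies above mid (g is decreasing in y)
        else:
            hi = mid
    return (lo + hi) / 2

# at x = 1 the root is y = 0, since g(1, 0) = 1 - 0 - 1 = 0
assert abs(implicit_f(1.0)) < 1e-9
assert abs(g(0.0, implicit_f(0.0))) < 1e-9
```

The strict monotonicity of g(x, ·) is what makes the bracketed root unique, exactly as in the uniqueness proof above.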
Having discussed existence and uniqueness, we can now turn to the properties that the implicit function f inherits from g. In short, the continuity of g is passed on to the implicit function, and so are its monotonicity and convexity, although in reversed form.
(ii) f is (strictly) convex if g is (strictly) quasi-concave, provided the sets A, B and C are convex.

(iii) f is (strictly) concave if g is (strictly) quasi-convex, provided the sets A, B and C are convex.
Proof (i) Let n = 1, so that C ⊆ R². We begin by showing that assuming that g is strictly increasing both in x and in y is equivalent to directly assuming that g is strictly increasing on A.
Proof Let us only show the "if" part, the converse being trivial. Hence, let g : C ⊆ R² → R be strictly increasing both in x and in y. Let (x, y) > (x′, y′). Our aim is to show that g(x, y) > g(x′, y′). If x = x′ or y = y′, the result is trivial. Hence, let x > x′ and y > y′. We have (x, y) > (x′, y) > (x′, y′), so g(x, y) > g(x′, y) > g(x′, y′), which implies g(x, y) > g(x′, y′).
g(λx + (1 − λ)x′, λf(x) + (1 − λ)f(x′)) ≥ g(x, f(x)) = g(λx + (1 − λ)x′, f(λx + (1 − λ)x′))
g(x, f(x̄) − 1/m) < k < g(x, f(x̄) + 1/m)   ∀x ∈ B_ε̃(x̄)

Since g is strictly increasing in y and g(x, f(x)) = k, this yields

f(x̄) − 1/m < f(x) < f(x̄) + 1/m   ∀x ∈ B_ε̃(x̄)   (25.4)
In turn, this guarantees that f is continuous at x̄. In fact, let xₙ → x̄. Fix any m ≥ 1 large enough so that 0 < 1/m < ε. By what we just proved, there exists ε̃ > 0 such that (25.4) holds. By the definition of convergence, there is n_ε̃ ≥ 1 such that xₙ ∈ B_ε̃(x̄) for every n ≥ n_ε̃, so that

f(x̄) − 1/m < f(xₙ) < f(x̄) + 1/m   ∀n ≥ n_ε̃
Thus

f(x̄) − 1/m ≤ lim inf f(xₙ) ≤ lim sup f(xₙ) ≤ f(x̄) + 1/m

Since this holds for all m large enough, we have

f(x̄) = lim_{m→∞} (f(x̄) − 1/m) ≤ lim inf f(xₙ) ≤ lim sup f(xₙ) ≤ lim_{m→∞} (f(x̄) + 1/m) = f(x̄)

We conclude that lim f(xₙ) = f(x̄). Since x̄ was arbitrarily chosen, the function f is continuous.
We leave to the reader the dual version of this result, in which the strict monotonicity of g changes from increasing to decreasing. Instead, we turn to the all-important issue of the differentiability of the implicit function.
Proposition 1200 Let g : C → D with A × B ⊆ C and let k ∈ D. Suppose that the sets A and B are open and that g is continuously differentiable on A × B, with either ∂g(x, y)/∂y > 0 for all (x, y) ∈ A × B or ∂g(x, y)/∂y < 0 for all (x, y) ∈ A × B. If f : A → B is such that g(x, f(x)) = k for all x ∈ A, then it is continuously differentiable, with

f′(x) = − (∂g/∂x)(x, y) / (∂g/∂y)(x, y)   (25.5)

for every (x, y) ∈ g⁻¹(k) ∩ (A × B).
In the next section we will discuss at length the differential formula (25.5), which plays a fundamental role in applications.
Example 1201 In the last example we learned that the equation

g(x, y) = x² − 2y − e^y = 0

defines on the plane a unique implicit function f : R → R. The function g is continuously differentiable, with

∂g/∂y (x, y) = −2 − e^y < 0   ∀(x, y) ∈ R²

By Proposition 1200, f is then continuously differentiable, with

f′(x) = − (∂g/∂x)(x, y) / (∂g/∂y)(x, y) = 2x / (2 + e^y)   ∀(x, y) ∈ g⁻¹(0)

Though we were not able to provide the explicit form of f, we have a formula for its derivative. As we will see in the next section when discussing the Implicit Function Theorem, this is a main feature of formula (25.5). For instance, at every (x₀, y₀) ∈ g⁻¹(0) we can then write the first-order approximation

f(x) = f(x₀) + f′(x₀)(x − x₀) + o(x − x₀) = y₀ + (2x₀ / (2 + e^{y₀}))(x − x₀) + o(x − x₀)

that gives us some precious information on f. N
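The derivative formula can be checked numerically. The sketch below, an illustration and not part of the text, recovers f by Newton iterations on y ↦ g(x, y), using the known ∂g/∂y, and compares a finite-difference slope with the closed form 2x/(2 + e^y):

```python
import math

def g(x, y):
    return x * x - 2 * y - math.exp(y)

def f(x):
    """Implicit f(x) via Newton iterations on y -> g(x, y), with dg/dy = -2 - e^y."""
    y = 0.0
    for _ in range(50):
        y -= g(x, y) / (-2.0 - math.exp(y))
    return y

x0 = 1.0
y0 = f(x0)                      # g(1, 0) = 0, so y0 = 0
assert abs(g(x0, y0)) < 1e-12

# closed-form derivative from the formula above: f'(x) = 2x/(2 + e^y)
deriv_formula = 2 * x0 / (2 + math.exp(y0))
h = 1e-6
deriv_fd = (f(x0 + h) - f(x0 - h)) / (2 * h)   # finite-difference check
assert abs(deriv_fd - deriv_formula) < 1e-6
```

At (1, 0) the formula gives f′(1) = 2/3, which the finite difference reproduces.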
Proof of Proposition 1200 Since either ∂g(x, y)/∂y > 0 for all (x, y) ∈ A × B or the opposite inequality holds, g is strictly monotone in y. By Proposition 1196, f is then the unique function in B^A such that g(x, f(x)) = k for all x ∈ A. Let us show that f is continuously differentiable. Let x ∈ A and y = f(x). Set h₂ = f(x + h₁) − f(x). Since g is continuously differentiable, for every h₁, h₂ ≠ 0 there exists 0 < ϑ < 1 such that⁴

f′(x) = lim_{h₁→0} h₂/h₁ = lim_{h₁→0} − (∂g/∂x)(x + ϑh₁, y + ϑh₂) / (∂g/∂y)(x + ϑh₁, y + ϑh₂) = − (∂g/∂x)(x, y) / (∂g/∂y)(x, y)   (25.7)

because of the continuity of ∂g/∂x and of ∂g/∂y. In turn, this shows that the continuity of the derivative function f′ is a direct consequence of the continuity of ∂g/∂x and of ∂g/∂y.
From (25.7) it follows that

f′(x) = − (∂g/∂x)(x, f(x)) / (∂g/∂y)(x, f(x))   ∀x ∈ A
f′(x) = − (∂g/∂x)(x, y) / (∂g/∂y)(x, y)   (25.10)
Along with the continuous differentiability of g, the easily checked differential condition (25.8) thus ensures that locally, near the point (x₀, y₀), there exists a unique and continuously differentiable implicit function f : B(x₀) → V(y₀). It is a remarkable achievement: the hypotheses of the global results of the previous section, Propositions 1195, 1196 and 1200, are definitely clumsier. Yet the global viewpoint, the most relevant for applications, will be partly vindicated by the Global Implicit Function Theorem of the next chapter; more important here, the proof of the Implicit Function Theorem will show how this theorem in turn builds on the previous global results.

To emphasize the local perspective of the Implicit Function Theorem, here we say that the equation g(x, y) = 0 implicitly defines a unique f at the point (x₀, y₀) ∈ g⁻¹(0).
Proof Suppose, without loss of generality, that (25.8) takes the positive form

∂g/∂y (x₀, y₀) > 0   (25.11)

Since g is continuously differentiable, by the Theorem on the permanence of sign there exists a neighborhood B̃(x₀, y₀) ⊆ U for which

∂g/∂y (x, y) > 0   ∀(x, y) ∈ B̃(x₀, y₀)   (25.12)
Since ∂g(x, y)/∂y > 0 for every (x, y) ∈ [x₀ − ε, x₀ + ε] × [y₀ − ε, y₀ + ε], the function g(x, ·) is strictly increasing in y for every x ∈ [x₀ − ε, x₀ + ε]. So, g(x₀, y₀ − ε) < 0 = g(x₀, y₀) < g(x₀, y₀ + ε). The functions g(·, y₀ − ε) and g(·, y₀ + ε) are both continuous in x, so by the Theorem on the permanence of sign there exists a small enough neighborhood B(x₀) ⊆ [x₀ − ε, x₀ + ε] so that

g(x, y₀ − ε) < 0 < g(x, y₀ + ε)   ∀x ∈ B(x₀)   (25.13)

By Bolzano's Theorem, for each x ∈ B(x₀) there exists y₀ − ε < y < y₀ + ε such that g(x, y) = 0. By the strict monotonicity of g(x, ·) on [y₀ − ε, y₀ + ε], such y is unique. By setting V(y₀) = (y₀ − ε, y₀ + ε), we have thus defined a unique implicit function f : B(x₀) → V(y₀) on the rectangle B(x₀) × V(y₀) such that (25.9) holds.⁶
Having established the existence of a unique implicit function, its differential properties now follow from Proposition 1200.
Since the function f : B(x₀) → V(y₀) defined implicitly by the equation g(x, y) = 0 at (x₀, y₀) is unique, in view of Proposition 1194 the relation (25.9) is equivalent to

g(x, y) = 0 ⇔ y = f(x)   ∀(x, y) ∈ B(x₀) × V(y₀)   (25.14)

that is, to

g⁻¹(0) ∩ (B(x₀) × V(y₀)) = Gr f   (25.15)

Thus, the level curve g⁻¹(0), that is, the set of solutions of the equation g(x, y) = 0, can be represented locally by the graph of the implicit function. This is precisely, in the final analysis, the reason why the theorem is so important in applications (as we will see shortly in Section 25.3.2).
Inspection of the proof of the Implicit Function Theorem shows that on the rectangle B(x₀) × V(y₀) we have either ∂g(x, y)/∂y > 0 or ∂g(x, y)/∂y < 0. Assume the former, so that g is strictly increasing in y. By Proposition 1199, we then have that:

Thus, some basic properties of the implicit function provided by the Implicit Function Theorem can be easily established. Note that formula (25.10) permits the computation of the first derivative of the implicit function even without knowing the function in explicit form. Since the first derivative is often what is really needed for such a function (because, for example, we are interested in solving a first-order condition), this is a most useful feature of the Implicit Function Theorem.
Note that the use of formula (25.10) is based on the clause "(x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀))" that requires fixing both variables x and y. This is the price to pay in implicit differentiation; in contrast, in explicit differentiation it is sufficient to fix the variable x to compute f′(x). On the other hand, we can rewrite (25.10) as

f′(x) = − (∂g/∂x)(x, f(x)) / (∂g/∂y)(x, f(x))   (25.16)

for each x ∈ B(x₀), thus emphasizing the role played by the implicit function. Formulations (25.10) and (25.16) are both useful, for different reasons; it is better to keep both of them in mind. As we remarked, formulation (25.10) allows one to compute the first derivative of f even without knowing f itself, thereby yielding a useful first-order local approximation of f. For this reason, in the examples we will always use (25.10), because the closed form of f will not be available.
We can provide a heuristic derivation of formula (25.10) through the total differential

dg = (∂g/∂x) dx + (∂g/∂y) dy

of the function g. We have dg = 0 for variations (dx, dy) that keep us along the level curve g⁻¹(0). Therefore,

(∂g/∂x) dx = − (∂g/∂y) dy

which "yields" (the power of heuristics!):

dy/dx = − (∂g/∂x) / (∂g/∂y)

It is a rather rough (and incorrect) argument, but certainly useful for remembering formula (25.10).
Sometimes it is possible to find stationary points of the implicit function without knowing its explicit form. When this happens, it is a remarkable application of the Implicit Function Theorem. For instance, consider in the previous example the point (4, 2) ∈ g⁻¹(0). We have (∂g/∂y)(4, 2) = 32 ≠ 0. Let f : B(4) → V(2) be the unique function then defined implicitly at the point (4, 2).⁸ We have:

f′(4) = − (∂g/∂x)(4, 2) / (∂g/∂y)(4, 2) = −0/32 = 0

Therefore, x₀ = 4 is a stationary point for the implicit function f. It is possible to check that it is actually a local maximizer.
f(x) = f(1/√7) + 2√7 (x − 1/√7) + o(x − 1/√7)
for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)). The first-order local approximation is then, as x → 0,

f(x) = y₀ + f′(0) x + o(x) = (1/4) x + o(x)

N
By exchanging the variables in the Implicit Function Theorem, we can say that the continuity of the partial derivatives of g in a neighborhood of (x₀, y₀) and the condition ∂g(x₀, y₀)/∂x ≠ 0 ensure the existence of a (unique) implicit function x = φ(y) such that locally g(φ(y), y) = 0. It follows that, if at least one of the two partial derivatives ∂g(x₀, y₀)/∂x and ∂g(x₀, y₀)/∂y is not zero, there is locally a univocal tie between the two variables. As a result, the Implicit Function Theorem fails to apply only when both partial derivatives ∂g(x₀, y₀)/∂y and ∂g(x₀, y₀)/∂x are zero.

For example, if g(x, y) = x² + y² − 1, then for every point (x₀, y₀) that satisfies the equation g(x, y) = 0 we have ∂g(x₀, y₀)/∂y = 2y₀, which is zero only for y₀ = 0 (and hence x₀ = ±1). At the two points (1, 0) and (−1, 0) the equation does not define any implicit function of the type y = f(x). But ∂g(±1, 0)/∂x = ±2 ≠ 0 and, therefore, at such points the equation defines an implicit function of the type x = φ(y). Symmetrically, at the two points (0, 1) and (0, −1) the equation defines an implicit function of the type y = f(x) but not one of the type x = φ(y).
This last remark suggests a final important observation on the Implicit Function Theorem. Suppose that, as at the beginning of the chapter, φ is a standard function defined in explicit form, which can be written in implicit form as

g(x, y) = φ(x) − y = 0

Given (x₀, y₀) ∈ g⁻¹(0), suppose ∂g(x₀, y₀)/∂x ≠ 0. The Implicit Function Theorem (in "exchanged" form) then ensures the existence of neighborhoods B(y₀) and V(x₀) and of a unique function f : B(y₀) → V(x₀) such that

g(f(y), y) = 0   ∀y ∈ B(y₀)

The function f is, therefore, the inverse of φ on the neighborhood B(y₀). The Implicit Function Theorem thus implies the existence, locally around the point y₀, of the inverse of φ. In particular, formula (25.10) here becomes

f′(y₀) = − (∂g/∂y)(x₀, y₀) / (∂g/∂x)(x₀, y₀) = 1/φ′(x₀)

which is the classic formula (20.20) for the derivative of the inverse function. In sum, there is a close connection between implicit and inverse functions, which the reader will see later in the book (Section 26.1).
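A quick numerical illustration of the identity f′(y₀) = 1/φ′(x₀): take φ = exp, a choice made here only for illustration, whose inverse is the logarithm:

```python
import math

phi = math.exp          # illustrative choice: phi(x) = e^x, inverse is log
x0 = 1.3
y0 = phi(x0)

# slope of the inverse at y0, via a central finite difference of log
h = 1e-6
inv_deriv_fd = (math.log(y0 + h) - math.log(y0 - h)) / (2 * h)

# the formula f'(y0) = 1/phi'(x0); here phi'(x) = e^x = phi(x)
assert abs(inv_deriv_fd - 1 / phi(x0)) < 1e-8
```

The finite-difference slope of log at y₀ = e^{1.3} indeed equals 1/e^{1.3}, the reciprocal of the slope of exp at x₀.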
Proposition 1206 Let g : U → R be defined (at least) on an open set U of R² and let g(x₀, y₀) = k. If g is continuously differentiable on a neighborhood of (x₀, y₀), and

∂g/∂y (x₀, y₀) ≠ 0

then there exist neighborhoods B(x₀) and V(y₀) and a unique function f : B(x₀) → V(y₀) such that

g(x, f(x)) = k   ∀x ∈ B(x₀)

Moreover, f is continuously differentiable on B(x₀), with

f′(x) = − (∂g/∂x)(x, y) / (∂g/∂y)(x, y)   (25.19)

for every (x, y) ∈ g⁻¹(k) ∩ (B(x₀) × V(y₀)).
This is the version of the Implicit Function Theorem that we will refer to in the rest of the section when discussing marginal rates.

In view of Proposition 1194, the implicit function f : B(x₀) → V(y₀) makes it possible to establish a functional representation of the level curve g⁻¹(k) through the basic relation

g⁻¹(k) ∩ (B(x₀) × V(y₀)) = Gr f   (25.20)

which is the general form of (25.15) for any k ∈ R. Implicit functions thus describe the link between the variables x and y that belong to the same level curve, thus making it possible to formulate through them some key properties of these curves. The great effectiveness of this formulation explains the importance of implicit functions, as mentioned right after (25.14).
For example, the isoquant g⁻¹(k) is a level curve of the production function g : R²₊ → R, which features two inputs, x and y, and one output. The points (x, y) that belong to the isoquant are all the input combinations that keep the quantity of output produced constant. The implicit function y = f(x) tells us, locally, how the quantity y has to change, when x varies, in order to keep the output produced constant. Therefore, the properties of the function f : B(x₀) → V(y₀) characterize, locally, the relations between the inputs that guarantee the level k of output. We usually assume that f is:

(i) decreasing, that is, f′(x) ≤ 0 for every x ∈ B(x₀): the two inputs are partially substitutable and, in order to keep the quantity produced unchanged at the level k, lower quantities of the input x have to be matched by larger quantities of the input y (and vice versa);

(ii) convex, that is, f″(x) ≥ 0 for every x ∈ B(x₀): at greater levels of x, larger and larger quantities of y are needed to compensate (negative) infinitesimal variations of x in order to keep production at level k.
Remarkably, as noted after the proof of the Implicit Function Theorem, via Proposition
1199 we can tell which properties of g induce these desirable properties.
The absolute value |f′| of the derivative of the implicit function is called the marginal rate of transformation because, for infinitesimal variations of the inputs, it describes their degree of substitutability, that is, the variation of y that balances an increase in x. Thanks to the functional representation (25.20) of the isoquant, geometrically the marginal rate of transformation can be interpreted as the slope of the isoquant at (x, y). This is the classic interpretation of the rate, which follows from (25.20).

The Implicit Function Theorem implies the classic formula

MRT_{x,y} = |f′(x)| = (∂g/∂x)(x, y) / (∂g/∂y)(x, y)   (25.21)

This is the usual form in which the notion of marginal rate of transformation MRT_{x,y} appears.
For example, at a point at which we use equal quantities of the two inputs, that is, x = y, if we increase the first input by one unit, the second one must decrease by α/(1 − α) units to leave the quantity of output produced unchanged; in particular, when α = 1/2, the decrease of the second one must be of one unit. At a point at which we use a quantity of the second input five times larger than that of the first input, that is, y = 5x, an increase of one unit of the first input is compensated by a decrease of 5α/(1 − α) units of the second one. N
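The production function behind these numbers is not displayed in this excerpt; the rates α/(1 − α) and 5α/(1 − α) are exactly what a Cobb-Douglas technology g(x, y) = x^α y^{1−α} delivers. A sketch under that assumption:

```python
def mrt_cobb_douglas(x, y, alpha):
    """MRT = (dg/dx)/(dg/dy) for the assumed g(x, y) = x**alpha * y**(1 - alpha)."""
    gx = alpha * x ** (alpha - 1) * y ** (1 - alpha)
    gy = (1 - alpha) * x ** alpha * y ** (-alpha)
    return gx / gy      # simplifies to (alpha/(1 - alpha)) * (y/x)

alpha = 0.3
# x = y: one unit of input x trades for alpha/(1 - alpha) units of y
assert abs(mrt_cobb_douglas(2.0, 2.0, alpha) - alpha / (1 - alpha)) < 1e-12
# y = 5x: the trade-off becomes 5*alpha/(1 - alpha)
assert abs(mrt_cobb_douglas(2.0, 10.0, alpha) - 5 * alpha / (1 - alpha)) < 1e-12
```

The simplification (α/(1 − α))(y/x) also makes the two special cases in the text immediate.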
Similar considerations hold for the level curves of a utility function u : R²₊ → R, that is, for its indifference curves u⁻¹(k). The implicit functions provided by the Implicit Function Theorem tell us, locally, how one has to vary the quantity y when x varies in order to keep the overall utility level constant. For them we assume properties of monotonicity and convexity similar to those assumed for the implicit functions defined by isoquants. The monotonicity of the implicit function reflects the partial substitutability of the two goods: it is possible to consume a bit less of one good and a bit more of the other one and yet keep the overall level of utility unchanged. The convexity of the implicit function models the classic hypothesis of decreasing rates of substitution: when the quantity of a good, for example x, increases, we then need greater and greater "compensative" variations of the other good y in order to stay on the same indifference curve, i.e., in order to have u(x, y) = u(x + Δx, y + Δy).
⁹ Later in the chapter we will revisit this example (Example 1223).
Here as well, it is important to note that via Proposition 1199 we can tell which properties of the utility function u induce these desirable properties, thus for instance making rigorous the common expression "convex indifference curves" (cf. Chapter 14). Indeed, they have a functional representation via convex implicit functions.

In the present case the absolute value |f′| of the derivative of the implicit function is called the marginal rate of substitution: it measures the (negative) variation in y that balances marginally an increase in x. Geometrically, it is the slope of the indifference curve at (x, y). Thanks to the Implicit Function Theorem, we have

MRS_{x,y} = |f′(x)| = (∂u/∂x)(x, y) / (∂u/∂y)(x, y)

which is the classic form of the marginal rate of substitution.
Let h be a scalar function with a strictly positive derivative, so that it is strictly increasing and h ∘ u is then a utility function equivalent to u. By the chain rule,

(∂(h ∘ u)/∂x)(x, y) / (∂(h ∘ u)/∂y)(x, y) = [h′(u(x, y)) (∂u/∂x)(x, y)] / [h′(u(x, y)) (∂u/∂y)(x, y)] = (∂u/∂x)(x, y) / (∂u/∂y)(x, y)   (25.22)

Since we can cancel the derivative h′(u(x, y)), the marginal rate of substitution is the same for u and for all its increasing transformations h ∘ u. Thus, the marginal rate of substitution is an ordinal notion, invariant under strictly increasing (differentiable) transformations. It does not depend on which of the two equivalent utility functions, u or h ∘ u, is considered. This explains the centrality of this ordinal notion in consumer theory, where after Pareto's ordinalist revolution it replaced the cardinal notion of marginal utility (cf. Section 29.5).
Example 1209 To illustrate (25.22), consider on R²₊₊ the equivalent Cobb-Douglas utility function u(x, y) = x^a y^{1−a} and log-linear utility function log u(x, y) = a log x + (1 − a) log y. We have

MRS_{x,y} = (∂u/∂x)(x, y) / (∂u/∂y)(x, y) = a x^{a−1} y^{1−a} / ((1 − a) x^a y^{−a}) = a y / ((1 − a) x) = (∂ log u/∂x)(x, y) / (∂ log u/∂y)(x, y)

The two utility functions have the same marginal rate of substitution. N
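The ordinality of the MRS in Example 1209 can also be verified numerically: computing the ratio of partial derivatives by central differences for u and for log u gives the same number. A sketch, with an arbitrary exponent a = 0.4:

```python
import math

a = 0.4                         # an arbitrary exponent for the illustration

def u(x, y):
    return x ** a * y ** (1 - a)        # Cobb-Douglas utility

def mrs(util, x, y, h=1e-6):
    """Ratio of partial derivatives, computed by central differences."""
    ux = (util(x + h, y) - util(x - h, y)) / (2 * h)
    uy = (util(x, y + h) - util(x, y - h)) / (2 * h)
    return ux / uy

def log_u(x, y):
    return math.log(u(x, y))            # an increasing transformation of u

x, y = 2.0, 3.0
# the closed form a*y/((1 - a)*x) and the invariance property (25.22)
assert abs(mrs(u, x, y) - a * y / ((1 - a) * x)) < 1e-6
assert abs(mrs(u, x, y) - mrs(log_u, x, y)) < 1e-6
```

At (2, 3) with a = 0.4 the common value is a·y/((1 − a)·x) = 1.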
Finally, let us consider a consumer who consumes in two periods, today and tomorrow, with intertemporal utility function U : R²₊ → R given by

U(c₁, c₂) = u(c₁) + u(c₂)

where we assume the same instantaneous utility function u in the two periods. Given a utility level k, let

U⁻¹(k) = {(c₁, c₂) ∈ R²₊ : U(c₁, c₂) = k}

be the intertemporal indifference curve and let (c̄₁, c̄₂) be a point on it. When the hypotheses of the Implicit Function Theorem, with the variables exchanged, are satisfied at (c̄₁, c̄₂), there exists an implicit function f : B(c̄₂) → V(c̄₁) such that

U(f(c₂), c₂) = k   ∀c₂ ∈ B(c̄₂)

The scalar function c₁ = f(c₂) tells us how consumption today c₁ has to vary when consumption tomorrow c₂ varies, so as to keep the overall utility U constant. We have:

f′(c₂) = − (∂U/∂c₂)(c₁, c₂) / (∂U/∂c₁)(c₁, c₂) = − u′(c₂)/u′(c₁)

When the number

IMRS_{c₁,c₂} = −f′(c₂) = u′(c₂)/u′(c₁)   (25.23)

exists, it is called the intertemporal marginal rate of substitution: it measures the (negative) variation in c₁ that balances an increase in c₂.
Example 1210 Consider the power utility function u(c) = c^γ/γ for γ > 0. We have

U(c₁, c₂) = c₁^γ/γ + c₂^γ/γ

so that u′(c) = c^{γ−1} and IMRS_{c₁,c₂} = (c₂/c₁)^{γ−1}.
Theorem 1211 If in the Implicit Function Theorem the function g is n times continuously differentiable, then so is the implicit function f.¹⁰ In particular, for n = 2 we have

f″(x) = − [ (∂²g/∂x²)(x, y) (∂g/∂y)²(x, y) − 2 (∂²g/∂x∂y)(x, y) (∂g/∂x)(x, y) (∂g/∂y)(x, y) + (∂²g/∂y²)(x, y) (∂g/∂x)²(x, y) ] / (∂g/∂y)³(x, y)   (25.24)
Proof We will omit the proof of the first part of the statement. Suppose f is twice differentiable and let us apply the chain rule to (25.10), that is, to

f′(x) = − (∂g/∂x)(x, f(x)) / (∂g/∂y)(x, f(x)) = − g′_x(x, f(x)) / g′_y(x, f(x))

¹⁰ Also analyticity is preserved: if g is analytic, so is f.
For the sake of brevity we do not make the dependence of the derivatives of g on (x, f(x)) explicit, so we can write

f″(x) = − [ (g″_xx + g″_xy f′(x)) g′_y − g′_x (g″_yx + g″_yy f′(x)) ] / (g′_y)²

Substituting f′(x) = −g′_x/g′_y and using g″_xy = g″_yx, we obtain

f″(x) = − [ g″_xx (g′_y)² − 2 g″_xy g′_x g′_y + g″_yy (g′_x)² ] / (g′_y)³

as desired.
The two previous theorems allow us to give local approximations of an implicitly defined function. As we know, one is rarely able to write the explicit formulation of a function implicitly defined by an equation: being able to give approximations is hence of great importance.
If g is of class C¹ on an open set U, the first-order approximation of the implicitly defined function at a point (x₀, y₀) ∈ U such that g(x₀, y₀) = 0 is

f(x) = y₀ − [ (∂g/∂x)(x₀, f(x₀)) / (∂g/∂y)(x₀, f(x₀)) ] (x − x₀) + o(x − x₀)

as x → x₀.
If g is of class C² on an open set U, the second-order (or quadratic) approximation of the implicit function at a point (x₀, y₀) ∈ U such that g(x₀, y₀) = 0 is, as x → x₀,

f(x) = y₀ − (g′_x/g′_y)(x − x₀) − [ g″_xx (g′_y)² − 2 g″_xy g′_x g′_y + g″_yy (g′_x)² ] / (2 (g′_y)³) (x − x₀)² + o((x − x₀)²)

where we omit the dependence of the derivatives on the point (x₀, f(x₀)).
Example 1212 Given the function in Example 1204 we have

f″(x₀) = − [ 2(3x₀ + 2y₀)² − 6(2x₀ + 3y₀)(3x₀ + 2y₀) + 2(2x₀ + 3y₀)² ] / (3x₀ + 2y₀)³

so that the quadratic approximation of f is, as x → x₀,

f(x) = y₀ − [(2x₀ + 3y₀)/(3x₀ + 2y₀)] (x − x₀) − [ 2(3x₀ + 2y₀)² − 6(2x₀ + 3y₀)(3x₀ + 2y₀) + 2(2x₀ + 3y₀)² ] / (2(3x₀ + 2y₀)³) (x − x₀)² + o((x − x₀)²)
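The partial derivatives displayed above are consistent with g(x, y) = x² + 3xy + y² studied on a level curve g = k; this is an inference, since Example 1204 itself is not in this excerpt. Under that assumption, formula (25.24) can be checked against a second central difference of the explicit branch of the level curve:

```python
import math

# Assumed (inferred) function of Example 1204: g(x, y) = x^2 + 3xy + y^2 = k
k = 5.0

def f(x):
    # explicit branch of the level curve: y^2 + 3xy + (x^2 - k) = 0
    return (-3 * x + math.sqrt(5 * x ** 2 + 4 * k)) / 2

def f2_formula(x, y):
    # formula (25.24): f'' = -(gxx*gy^2 - 2*gxy*gx*gy + gyy*gx^2)/gy^3
    gx, gy = 2 * x + 3 * y, 3 * x + 2 * y
    return -(2 * gy ** 2 - 2 * 3 * gx * gy + 2 * gx ** 2) / gy ** 3

x0 = 1.0
y0 = f(x0)                      # here y0 = 1, since g(1, 1) = 5 = k
h = 1e-4
f2_fd = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h ** 2  # central 2nd difference
assert abs(f2_fd - f2_formula(x0, y0)) < 1e-4
```

At (1, 1) both the formula and the finite difference give f″ = 0.4.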
g(x₁, ..., xₙ, y) = 0
Theorem 1213 Let g : U → R be defined (at least) on an open set U of R^{n+1} and let g(x₀, y₀) = 0. If g is continuously differentiable on a neighborhood of (x₀, y₀), with

∂g/∂y (x₀, y₀) ≠ 0

then there exist neighborhoods B(x₀) ⊆ Rⁿ and V(y₀) ⊆ R and a unique vector function f : B(x₀) → V(y₀) such that

g(x, f(x)) = 0   ∀x ∈ B(x₀)
Example 1214 Let g : R³ → R be defined by g(x₁, x₂, y) = x₁² − x₂² + y³ and let (x̄₁, x̄₂, y₀) = (6, 3, −3). We have that g ∈ C¹(R³) and (∂g/∂y)(x, y) = 3y², therefore

∂g/∂y (6, 3, −3) = 27 ≠ 0

By the Implicit Function Theorem, there exists a unique y = f(x₁, x₂) defined in a neighborhood U(6, 3), which is differentiable there and takes values in a neighborhood V(−3). Since

∂g/∂x₁ (x, y) = 2x₁   and   ∂g/∂x₂ (x, y) = −2x₂

we have

∂f/∂x₁ (x) = −2x₁/(3y²)   and   ∂f/∂x₂ (x) = 2x₂/(3y²)

In particular

∇f(6, 3) = (−12/27, 6/27)

The reader can check that a global implicit function f : R² → R exists and, after having recovered the explicit expression (which exists because of the simplicity of g), can verify that formula (25.26) is correct in computing ∇f(x). N
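Here the explicit expression is y = (x₂² − x₁²)^{1/3}, so the check suggested in the example can be scripted (a sketch, not from the text):

```python
import math

def f_explicit(x1, x2):
    # explicit solution of x1^2 - x2^2 + y^3 = 0: y = (x2^2 - x1^2)^(1/3)
    t = x2 ** 2 - x1 ** 2
    return math.copysign(abs(t) ** (1 / 3), t)   # real cube root

def grad_f_ift(x1, x2):
    # formula (25.26): df/dxi = -(dg/dxi)/(dg/dy), with dg/dy = 3y^2
    y = f_explicit(x1, x2)
    return -2 * x1 / (3 * y ** 2), 2 * x2 / (3 * y ** 2)

g1, g2 = grad_f_ift(6.0, 3.0)
assert abs(g1 + 12 / 27) < 1e-12 and abs(g2 - 6 / 27) < 1e-12

# cross-check against central finite differences of the explicit f
h = 1e-6
fd1 = (f_explicit(6 + h, 3) - f_explicit(6 - h, 3)) / (2 * h)
fd2 = (f_explicit(6, 3 + h) - f_explicit(6, 3 - h)) / (2 * h)
assert abs(fd1 - g1) < 1e-6 and abs(fd2 - g2) < 1e-6
```

The finite differences of the explicit cube-root formula agree with the gradient ∇f(6, 3) = (−12/27, 6/27) from the Implicit Function Theorem.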
N.B. Global versions, in the spirit of Proposition 1200, of Theorems 1211 and 1213 can be easily established, as readers can check. O
Here also g = (g₁, ..., gₘ) : A ⊆ R^{n+m} → R^m is an operator and the equation defines an operator f = (f₁, ..., fₘ) between Rⁿ and R^m such that

g₁(x₁, ..., xₙ, f₁(x₁, ..., xₙ), ..., fₘ(x₁, ..., xₙ)) = 0
g₂(x₁, ..., xₙ, f₁(x₁, ..., xₙ), ..., fₘ(x₁, ..., xₙ)) = 0
⋮
gₘ(x₁, ..., xₙ, f₁(x₁, ..., xₙ), ..., fₘ(x₁, ..., xₙ)) = 0     (25.27)
Let us focus directly on this latter general case. Here the following square submatrix of the Jacobian matrix of the operator g plays a key role:

D_y g(x, y) =
[ ∂g₁/∂y₁ (x, y)   ∂g₁/∂y₂ (x, y)   ···   ∂g₁/∂yₘ (x, y) ]
[ ∂g₂/∂y₁ (x, y)   ∂g₂/∂y₂ (x, y)   ···   ∂g₂/∂yₘ (x, y) ]
[       ⋮                ⋮                       ⋮        ]
[ ∂gₘ/∂y₁ (x, y)   ∂gₘ/∂y₂ (x, y)   ···   ∂gₘ/∂yₘ (x, y) ]
We can now state, without proof, the operator version of the Implicit Function Theorem,
which is the most general form of this result that we consider.
Theorem 1215 Let g : U ! Rm be de…ned (at least) on an open set U of Rn+m and let
g (x0 ; y0 ) = 0. If g is continuously di¤ erentiable on a neighborhood of (x0 ; y0 ), with
then there exist neighborhoods B (x0 ) Rn and V (y0 ) Rm and a unique operator f =
(f1 ; :::; fm ) : B (x0 ) ! V (y0 ) such that (25.27) holds for every x 2 B (x0 ). The operator f
is continuously di¤ erentiable on B (x0 ), with
1
Df (x) = (Dy g (x; y)) Dx g (x; y) (25.29)
The Jacobian of the implicit operator is thus pinned down by formula (25.29). To better understand this formula, it is convenient to write it as the equality

D_y g(x, y) Df(x) = − D_x g(x, y)

of two m × n matrices. In terms of the (i, j) ∈ {1, ..., m} × {1, ..., n} component of each such matrix, the equality is

Σ_{k=1}^{m} (∂gᵢ/∂yₖ)(x) (∂fₖ/∂xⱼ)(x) = − (∂gᵢ/∂xⱼ)(x)

For each independent variable xⱼ, we can determine the sought-after m-dimensional vector

( (∂f₁/∂xⱼ)(x), ..., (∂fₘ/∂xⱼ)(x) )

By doing this for each j, we can finally determine the Jacobian Df(x) of the implicit operator.
and let (x₀, y₀) = (1, 0, 1, 0). The submatrix of the Jacobian matrix of the operator g containing the partial derivatives of g with respect to y₁ and y₂ is given by

D_y g(x, y) =
[ 2y₁                 6      ]
[ 4x₂e^{y₁} + 2y₁     4x₁y₂  ]

while

D_x g(x, y) =
[ 3       4e^{x₂} ]
[ 2y₂²    4e^{y₁} ]

The determinant of D_y g(x, y) is |D_y g(x, y)| = 8x₁y₁y₂ − 24x₂e^{y₁} − 12y₁, so |D_y g(x₀, y₀)| = −12 ≠ 0. Condition (25.28) is thus satisfied. By the last theorem, there exists an implicit operator f = (f₁, f₂) : B(x₀) → V(y₀) which is continuously differentiable on B(x₀). The partial derivatives

∂f₁/∂x₁ (x),   ∂f₂/∂x₁ (x)

satisfy the following system:

[ 2y₁                 6      ] [ ∂f₁/∂x₁ (x) ]       [ 3    ]
[ 4x₂e^{y₁} + 2y₁     4x₁y₂  ] [ ∂f₂/∂x₁ (x) ]  = − [ 2y₂² ]
Our previous discussion implies, inter alia, that in the special case m = 1 formula (25.29) reduces to

(∂g/∂y)(x) (∂f/∂xⱼ)(x) = − (∂g/∂xⱼ)(x)

which is formula (25.26) of the vector function version of the Implicit Function Theorem. Since condition (25.28) reduces to (∂g/∂y)(x₀, y₀) ≠ 0, we conclude that the vector function version is, indeed, the special case m = 1. Everything fits together.
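Formula (25.29) amounts to solving one linear system per independent variable. A hand-rolled sketch for the 2 × 2 example above, with the matrices evaluated at (x₀, y₀) = (1, 0, 1, 0) as reconstructed here (the extraction may have dropped signs in D_x g, so the entries are an assumption):

```python
import math

# Dy_g and Dx_g evaluated at (x0, y0) = (1, 0, 1, 0), read off the example above
Dy = [[2.0, 6.0],
      [2.0, 0.0]]
Dx = [[3.0, 4.0],
      [0.0, 4.0 * math.e]]

det = Dy[0][0] * Dy[1][1] - Dy[0][1] * Dy[1][0]
assert det != 0  # condition (25.28): Dy_g is invertible at the point

def solve_column(b1, b2):
    """Solve Dy * v = -(b1, b2) by Cramer's rule; v is one column of Df."""
    v1 = (-b1 * Dy[1][1] + b2 * Dy[0][1]) / det
    v2 = (-b2 * Dy[0][0] + b1 * Dy[1][0]) / det
    return v1, v2

# column j of Df collects the partials (df1/dxj, df2/dxj)
df_dx1 = solve_column(Dx[0][0], Dx[1][0])
df_dx2 = solve_column(Dx[0][1], Dx[1][1])

assert abs(df_dx1[0]) < 1e-12 and abs(df_dx1[1] + 0.5) < 1e-12
```

Solving column by column, rather than inverting D_y g, mirrors the component-wise reading of (25.29) given above.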
is the set of points x on the horizontal axis for which there exists a point y on the vertical axis such that the pair (x, y) belongs to A.¹²

Likewise, define the projection

π₂(A) = {y ∈ R : there exists x ∈ R such that (x, y) ∈ A}

on the vertical axis, that is, the set of points y on the vertical axis for which there exists (at least) one point x on the horizontal axis such that (x, y) belongs to A.

The projections π₁(A) and π₂(A) are nothing but the "shadows" of the set A ⊆ R² on the two axes, as the following figure illustrates:

¹² This notion of projection is not to be confused with the altogether different one seen in Section 21.1.
[Figure: a set A in the plane with its projections π₁(A) on the horizontal axis and π₂(A) on the vertical axis]
and

π₂(B_ε(x, y)) = B_ε(y) = (y − ε, y + ε)

In particular, π₁(Gr f) is the domain of f and π₂(Gr f) is the image Im f. This holds in general: if f : A ⊆ R → R, one has π₁(Gr f) = A and π₂(Gr f) = Im f. N
If the implicit function f exists, its domain will be included in π₁(g⁻¹(0)) and its codomain will be included in π₂(g⁻¹(0)). This leads us to the following definition.

g(x, f(x)) = 0   ∀x ∈ A

that is,

g(x, y) = 0 ⇔ y = f(x)   ∀(x, y) ∈ A × B

In such a significant case, the implicit function f allows us to represent the level curve g⁻¹(0) on A × B by means of its graph Gr f. In other words, the level curve admits a functional representation.
is the unit circle. Since π₁(g⁻¹(0)) = π₂(g⁻¹(0)) = [−1, 1], a possible implicit function on a rectangle A × B takes the form f : A → B with A ⊆ [−1, 1] and B ⊆ [−1, 1]. Let us fix x ∈ [−1, 1], so as to analyze the set

S(x) = {y ∈ [−1, 1] : x² + y² = 1}

The set has two elements, except for x = ±1. In other words, for every −1 < x < 1 there are two values y for which g(x, y) = 0. Let us consider the projections' rectangle

A × B = [−1, 1] × [−1, 1]

Every function f : [−1, 1] → [−1, 1] such that

f(x) ∈ S(x)   ∀x ∈ [−1, 1]

entails that

g(x, f(x)) = 0   ∀x ∈ [−1, 1]

and is thus implicitly defined by g on A × B. There are infinitely many such functions; for example, this is the case for the function

f(x) = √(1 − x²) if x ∈ Q ∩ [−1, 1],   f(x) = −√(1 − x²) otherwise
p p
f (x) = 1 x2 and f (x) = 1 x2 8x 2 [ 1; 1] (25.32)
Therefore, there are infinitely many functions implicitly defined by g on the rectangle A × B = [−1, 1] × [−1, 1].¹³ The equation g(x, y) = 0 is therefore not explicitable on this rectangle, which makes this case hardly interesting. Let us consider instead the less ambitious rectangle
Ã × B̃ = [−1, 1] × [0, 1]
The function f : [−1, 1] → [0, 1] defined by f(x) = √(1 − x²) is the only function such that
g(x, f(x)) = g(x, √(1 − x²)) = 0   ∀x ∈ [−1, 1]
that is, f is the only function implicitly defined by g on the rectangle Ã × B̃. Equation g(x, y) = 0 is then explicitable on Ã × B̃, with
g⁻¹(0) ∩ (Ã × B̃) = Gr f
¹³ Note that most of them are somewhat irregular; the only continuous ones among them are the two in (25.32).
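As a numerical sanity check, the following Python sketch (ours, not part of the text) verifies that both continuous branches in (25.32), as well as an arbitrary mixture of the two branches, satisfy g(x, f(x)) = 0:

```python
import math
import random

def g(x, y):
    """Level-curve function of the unit circle."""
    return x ** 2 + y ** 2 - 1

def upper(x):
    return math.sqrt(1 - x ** 2)    # f(x) =  sqrt(1 - x^2)

def lower(x):
    return -math.sqrt(1 - x ** 2)   # f(x) = -sqrt(1 - x^2)

random.seed(0)

def mixture(x):
    # an arbitrary selection f with f(x) in S(x): also implicitly defined by g
    return upper(x) if random.random() < 0.5 else lower(x)

for f in (upper, lower, mixture):
    for k in range(-10, 11):
        x = k / 10
        assert abs(g(x, f(x))) < 1e-12
```

Any selection from S(x) passes the check; only the two branches in (25.32) are continuous.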
[Figures: the level curve g⁻¹(0) framed by the two smaller rectangles, isolating the upper and the lower unit semicircle.]
To sum up, there are infinitely many implicit functions on the projections’ rectangle A × B, while uniqueness obtains when we restrict ourselves to the smaller rectangles Ã × B̃ and Ā × B̄. The study of implicit functions is of interest on these two rectangles because the unique implicit function f defined thereon describes a univocal relationship between the variables x and y which the equation g(x, y) = 0 implicitly determines. N
O.R. If we draw the graph of the level curve g⁻¹(0), one notes how the rectangle A × B can be thought of as a sort of “frame” on this graph, isolating a part of it. In some frames the graph is explicitable; in other, less fortunate, ones it is not. By changing the framing we can tell apart different parts of the graph according to their explicitability. H
The last example showed how important it is to study, for each x ∈ π₁(g⁻¹(0)), the solution set
S(x) = {y ∈ π₂(g⁻¹(0)) : g(x, y) = 0}
The scalar functions f : π₁(g⁻¹(0)) → π₂(g⁻¹(0)), with f(x) ∈ S(x) for every x in their domain, are the possible implicit functions. In particular, when the rectangle A × B is such that S(x) ∩ B is a singleton for each x ∈ A, we have a unique implicit function f : A → B. In this case, for each x ∈ A there is a unique solution y ∈ B to the equation g(x, y) = 0.
Let us see another simple example, warning the reader that – though useful to fix ideas – these are very fortunate cases: usually, constructing S(x) is far from easy (though local in nature, the Implicit Function Theorem is key in this regard).
Example 1220 Let g : R²₊ → R be given by g(x, y) = √(xy) − 1. We have
g⁻¹(0) = {(x, y) ∈ R²₊ : xy = 1}
since π₁(g⁻¹(0)) = π₂(g⁻¹(0)) = (0, ∞), and so
A × B ⊆ (0, ∞) × (0, ∞) = R²₊₊
Let us fix x ∈ (0, ∞) and let us analyze the set
S(x) = {y ∈ (0, ∞) : xy = 1}
Since
S(x) = {1/x}   ∀x ∈ (0, ∞)
we consider A × B = R²₊₊ and f : (0, ∞) → (0, ∞) given by f(x) = 1/x. We have
g(x, f(x)) = g(x, 1/x) = 0   ∀x ∈ (0, ∞)
and f is the only function implicitly defined by g on R²₊₊. Moreover, we have
g⁻¹(0) ∩ R²₊₊ = Gr f
The level curve g⁻¹(0) can be represented on R²₊₊ as the graph of f. N
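A quick numerical check (a sketch of ours; g is written as √(xy) − 1, whose zero set is xy = 1):

```python
import math

def g(x, y):
    # g(x, y) = sqrt(x*y) - 1 on the positive quadrant; g = 0 iff x*y = 1
    return math.sqrt(x * y) - 1

def f(x):
    return 1 / x    # the unique implicit function on R^2_++

for x in (0.1, 0.5, 1.0, 3.0, 100.0):
    assert abs(g(x, f(x))) < 1e-12    # the graph of f lies on the level curve
```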
A final remark. When writing g(x, y) = 0, the variables x and y play symmetric roles, so that we can think of a relationship of type y = f(x) or of type x = φ(y) indifferently. In what follows we will always consider a function y = f(x), as the case x = φ(y) can be easily recovered via an analysis parallel to the one conducted here.
(i) equilibrium analysis, where equation (25.33) derives from an equilibrium condition in which y is an equilibrium (endogenous) variable and x is an (exogenous) parameter;
(ii) optimization problems, where equation (25.33) comes from a first order condition in which y is a choice variable and x is a parameter.
The analysis of the relationship between x and y, that is, between the values of the parameter and the resulting choice or equilibrium variable, is a comparative statics exercise that thus consists in studying the function f implicitly defined by the economic relation (25.33). The uniqueness of such an implicit function, and hence the explicitability of equation (25.33), is essential to best conduct comparative statics exercises.
The following two subsections will present these two comparative statics problems.¹⁴
and the equilibrium price p̂ varies as the taxation level τ changes. What is the relationship between taxation levels and equilibrium prices? Which properties does such a relationship have?
Answering these simple, yet important, economic questions is equivalent to asking: (i) whether a (unique) function p = f(τ) connecting taxation and equilibrium prices (i.e., the exogenous and endogenous variables of this simple market model) exists, and (ii) which properties such a function has.
To deal with this problem, we introduce the function g : [0, b] × R₊ → R given by g(p, τ) = S(p) − D(p, τ), so that the equilibrium condition (25.35) can be written as
g(p, τ) = 0
¹⁴ In Chapter 33 we will further study comparative statics exercises in optimization problems.
In particular,
g⁻¹(0) = {(p, τ) ∈ [0, b] × R₊ : g(p, τ) = 0}
is the set of all pairs of equilibrium prices/taxation levels (i.e., of endogenous/exogenous variables).
The two questions asked above are now equivalent to asking:
(i) whether a unique function p = f(τ) with g(f(τ), τ) = 0 for every τ exists;
(ii) if so, which are the properties of such a function f: for example, whether it is decreasing, so that higher indirect taxes correspond to lower equilibrium prices.
Problems such as these, where the relationship between endogenous and exogenous variables is studied – in particular, how changes in the latter impact the former – are of central importance in economic theory and in its empirical tests.
To fix ideas, let us examine the simple linear case where everything is straightforward. Let
D(p, τ) = α − β(p + τ)
S(p) = a + bp
so that
g(p, τ) = a + bp − α + β(p + τ)
The function
f(τ) = (α − a)/(b + β) − β/(β + b) τ   (25.36)
clearly satisfies (25.35). The equation g(p, τ) = 0 thus implicitly defines (and in this case also explicitly) the function f given by (25.36). Its properties are obvious: for example, it is strictly decreasing, so that changes in the taxation level bring about opposite changes in equilibrium prices.
Regarding the equilibrium quantity q̂, for every τ it is
q̂ = D(f(τ), τ) = S(f(τ))
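With illustrative parameter values (the numbers for α, β, a, b below are ours, chosen only for the sketch), one can verify in Python that f(τ) solves the equilibrium condition and is strictly decreasing:

```python
# Illustrative parameters (any positive values work)
alpha, beta, a, b = 10.0, 1.5, 2.0, 0.5

D = lambda p, tau: alpha - beta * (p + tau)   # demand
S = lambda p: a + b * p                        # supply
g = lambda p, tau: S(p) - D(p, tau)            # equilibrium condition g = 0

# the implicit (here also explicit) function of (25.36)
f = lambda tau: (alpha - a) / (b + beta) - beta * tau / (beta + b)

taus = [0.0, 0.5, 1.0, 2.0]
for tau in taus:
    assert abs(g(f(tau), tau)) < 1e-12         # f(tau) is the equilibrium price
assert all(f(t1) > f(t2) for t1, t2 in zip(taus, taus[1:]))  # strictly decreasing
```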
of a firm with profit function π(p, y) = py − c(y), where c : [0, ∞) → R is a differentiable cost function (cf. Section 18.1.4). The choice variable is the production level y of some good, say potatoes.
If, as one would expect, there is at least one production level y > 0 such that π(p, y) > 0, the level y = 0 is not optimal. So, problem (25.37) becomes
max_{y > 0} π(p, y)   (25.38)
Since the interval (0, ∞) is open, by Fermat's Theorem a necessary condition for y > 0 to be optimal is that it satisfies the first order condition
∂π(p, y)/∂y = p − c′(y) = 0   (25.39)
The key aspect of the producer's problem is to assess how the optimal production of potatoes varies as the market price of potatoes changes, i.e., how the production of potatoes is affected by their price. This relevant relationship between prices and quantities is expressed by the scalar function f such that
p − c′(f(p)) = 0   ∀p ≥ 0
that is, by the function implicitly defined by the first order condition (25.39). The function f is referred to as the producer's supply function (of potatoes). For each price level p, it gives the optimal quantity y = f(p). Its existence and properties (for example, whether it is increasing, so that higher prices lead to larger produced quantities of potatoes, hence larger supplied quantities in the market) are of central importance in studying a good's market. In particular, the sum of the supply functions of all producers present in the market constitutes the market supply function S(p) that we saw in Chapter 12.
To formalize the derivation of the supply function from the optimization problem (25.38), we define a function g : [0, ∞) × (0, ∞) → R by
g(p, y) = p − c′(y)
so that the first order condition (25.39) takes the form
g(p, y) = 0
If there exists an implicit function y = f(p) such that g(p, f(p)) = 0, it is nothing but the supply function itself. Let us see a simple example where the function f and its properties can be recovered with simple computations.
Example 1222 Consider quadratic costs c(y) = y² for y ≥ 0. Here g(p, y) = p − 2y, so the only function f : [0, ∞) → [0, ∞) implicitly defined by g on R²₊ is f(p) = p/2. In particular, f is strictly increasing, so that higher prices entail a higher production, and hence a larger supply. N
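A small Python check (ours) that f(p) = p/2 both solves the first order condition and actually maximizes profit when c(y) = y²:

```python
def c(y):
    return y ** 2               # quadratic cost

def profit(p, y):
    return p * y - c(y)

def g(p, y):
    return p - 2 * y            # first order condition g(p, y) = 0

def f(p):
    return p / 2                # candidate supply function

for p in (0.5, 1.0, 4.0):
    y_star = f(p)
    assert g(p, y_star) == 0
    # y* beats a grid of alternative production levels
    assert all(profit(p, y_star) >= profit(p, k / 100) for k in range(0, 500))
```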
25.4.4 Properties
The first important problem one faces when analyzing implicit functions is that of determining which conditions on the function g guarantee that the equation g(x, y) = 0 is explicitable on a rectangle, that is, that it defines a unique implicit function there. Later in the book we will establish a Global Implicit Function Theorem (Section 26.3), a deep result. Here we can, however, establish a few simple, yet quite interesting, facts that follow from Propositions 1195 and 1196.
If, for simplicity,¹⁵ we focus on the rectangle π₁(g⁻¹(0)) × π₂(g⁻¹(0)), the non-emptiness condition
S(x) ≠ ∅   ∀x ∈ π₁(g⁻¹(0))   (25.40)
holds. So, for every possible x at least one solution (x, y) of the equation g(x, y) = 0 exists. As previously noted, every scalar function f : π₁(g⁻¹(0)) → π₂(g⁻¹(0)) with f(x) ∈ S(x) for all x ∈ π₁(g⁻¹(0)) is a possible implicit function.
In view of Proposition 1195, the non-emptiness condition (25.40) holds if
inf_{y ∈ π₂(g⁻¹(0))} g(x, y) ≤ 0 ≤ sup_{y ∈ π₂(g⁻¹(0))} g(x, y)   ∀x ∈ π₁(g⁻¹(0))
The results of Section 25.2 permit us to ascribe some notable properties to the implicit function. Specifically, let f : π₁(g⁻¹(0)) → π₂(g⁻¹(0)) be the unique function such that g(x, f(x)) = 0 for all x ∈ π₁(g⁻¹(0)). By Propositions 1199 and 1200, if g is strictly increasing in y, then f is, in particular,¹⁶ continuously differentiable, with
f′(x) = − (∂g/∂x)(x, y) / (∂g/∂y)(x, y)   ∀(x, y) ∈ g⁻¹(0)
if g is continuously differentiable on A × B, with either ∂g(x, y)/∂y > 0 for all (x, y) ∈ A × B or ∂g(x, y)/∂y < 0 for all (x, y) ∈ A × B.
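The derivative formula can be illustrated numerically (our sketch) on the unit circle g(x, y) = x² + y² − 1, whose upper branch is f(x) = √(1 − x²):

```python
import math

def f(x):
    return math.sqrt(1 - x ** 2)        # upper branch of the unit circle

def implicit_derivative(x):
    y = f(x)
    g_x, g_y = 2 * x, 2 * y             # partials of g(x, y) = x^2 + y^2 - 1
    return -g_x / g_y                   # f'(x) = -(dg/dx)/(dg/dy)

def numeric_derivative(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-0.7, -0.2, 0.0, 0.4, 0.8):
    assert abs(implicit_derivative(x) - numeric_derivative(x)) < 1e-6
```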
¹⁵ What we establish here and in the next subsection is easily seen to hold for any rectangle A × B.
¹⁶ In points (ii) and (iii) we tacitly assume that the domain of f is convex, while in points (iv) and (v) we assume that it is open.
Point (ii) makes rigorous in a global sense – in contrast to the local one already remarked upon in Section 25.3.2 – the expression “convex indifference curves” by showing that they are, indeed, represented via convex implicit functions.
Thus, the results of Section 25.2 are all we need in this example: there is no need to invoke the Implicit Function Theorem. For instance, the continuous differentiability of f_k follows from Proposition 1200 since ∂g(x, y)/∂y > 0 for all (x, y) ∈ R²₊₊. In sum, here the Implicit Function Theorem actually delivers an inferior, local rather than global, result. N
(i) D : [0, b] × R₊ → R and S : [0, b] → R are continuous and such that D(0, τ) ≥ S(0) and D(b, τ) ≤ S(b) for every τ.
g(f(τ), τ) = 0   ∀τ ≥ 0
By Proposition 1199, it is:
(iii) (strictly) convex if S is (strictly) quasi-concave and D is (strictly) quasi-convex.
¹⁷ Indeed, D and S are continuous and, furthermore, D(0, τ) ≥ S(0) and D(b, τ) ≤ S(b) for every τ.
Property (ii) is especially interesting. Under the natural hypothesis that D is strictly decreasing in τ, we have that f is strictly decreasing: changes in taxation bring about opposite changes in equilibrium prices (increases in τ entail decreases in p, and decreases in τ determine increases in p).
In the linear case of Example 1221, the existence and properties of f followed from simple computations. The results in this section allow us to extend the same conclusions to much more general demand and supply functions.
where c is the choice variable and θ ≥ 0 parameterizes the objective function F : (a, b) × [0, ∞) → R. Assume that F is partially derivable. If the partial derivative ∂F(θ, c)/∂c is strictly increasing in c – for example, ∂²F(θ, c)/∂c² > 0 if F is twice differentiable – and if condition (25.3) holds, then by Propositions 1195 and 1196 the first order condition
g(c, θ) = ∂F(θ, c)/∂c = 0
implicitly defines a unique function f : [0, ∞) → (a, b) such that
∂F(θ, f(θ))/∂c = 0   ∀θ ≥ 0
By Proposition 1199, the function f is:
In the special case of the producer's problem, market prices p are the parameters and production levels y are the choice variables. So, F(p, y) = py − c(y) is the profit function and
g(p, y) = ∂F(p, y)/∂y = p − c′(y)
The strict monotonicity of g in y is equivalent to the strict monotonicity of the derivative function c′ (and to the strict convexity or concavity of c). In particular, in the standard case when c′ is strictly increasing (so, c is strictly convex), the function g is concave, which implies that the supply function y = f(p) is convex. In such a case, since g is strictly increasing in p, the supply function is strictly increasing in p.
Chapter 26
Inverse functions
26.1 Equations
A general form of an equation is
f(x) = y₀   (26.1)
where f : A ⊆ Rⁿ → Rⁿ is an operator and y₀ is a given element of Rⁿ.¹ The variable x is the unknown of the equation and y₀ is the known term. The solutions of the equation are all x ∈ A such that f(x) = y₀.
A basic taxonomy: equation (26.1) is
Earlier in the book we studied the special cases of homogeneous equations (Section 12.8) and linear equations (Section 13.7).
Three main questions can be asked about the solutions of equation (26.1):
(i) can the equation be solved globally: for every y₀ ∈ Rⁿ, is there x ∈ A that satisfies (26.1)? If so, is the solution unique?
(ii) can the equation be solved locally: given a y₀ ∈ Rⁿ, is there x ∈ A that satisfies (26.1)? If so, is the solution unique?
(iii) if the solution is globally unique, does it change continuously as the known term changes?
844 CHAPTER 26. INVERSE FUNCTIONS
The global question (i) is clearly much more demanding than the local one (ii). In particular, the existence and uniqueness of solutions at each y₀ ∈ Rⁿ amounts to the existence of the inverse function f⁻¹ : Rⁿ → Rⁿ, which then describes how solutions vary as the known term varies. Finally, question (iii) is about the “robustness” of the unique solutions: whether they change abruptly, discontinuously, under small changes of the known term. If they did, the equation would have an unpleasant instability, in that small changes in the known term would determine significant changes in its solutions.
Ideally, solutions should be globally unique and vary continuously with respect to the known term. Formally, this means that f is globally invertible and its inverse f⁻¹ : Rⁿ → Rⁿ is continuous (or, even better, differentiable). In this case, we say that the problem of solving the equation is well posed.
³ Recall that a function is invertible if it is injective (Section 6.4.1). So, global invertibility is a much stronger notion that requires the function to be a bijection of Rⁿ onto Rⁿ.
26.2. LOCAL ANALYSIS 845
Example 1225 This ideal case may occur for a linear equation Ax = b. Indeed, the linear operator T : Rⁿ → Rⁿ defined by T(x) = Ax is globally invertible if and only if the matrix A is invertible, that is, if and only if det A ≠ 0 (Cramer's Theorem). The condition det A ≠ 0 thus ensures that, for each b ∈ Rⁿ, there is a unique solution x ∈ Rⁿ given by T⁻¹(b) = A⁻¹b. The inverse T⁻¹ : Rⁿ → Rⁿ is a continuous function that describes how solutions vary as b varies. N
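For instance, for n = 2 the inverse can be written down by hand via Cramer's rule; the sketch below (ours) also illustrates the continuity of the solution in the known term b:

```python
def solve2(A, b):
    """Solve a 2x2 system Ax = b via Cramer's rule (needs det A != 0)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    assert det != 0, "Cramer's Theorem requires det A != 0"
    return ((b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det)

A = [[2.0, 1.0], [1.0, 3.0]]            # det A = 5 != 0
x = solve2(A, (3.0, 5.0))               # the unique solution of Ax = b
assert abs(2 * x[0] + x[1] - 3.0) < 1e-12 and abs(x[0] + 3 * x[1] - 5.0) < 1e-12

# a small perturbation of b perturbs the solution only slightly
y = solve2(A, (3.0001, 5.0))
assert abs(y[0] - x[0]) < 1e-3 and abs(y[1] - x[1]) < 1e-3
```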
O.R. Every equation f(x) = y₀ can be put in homogeneous form f_{y₀}(x) = 0 via the auxiliary function f_{y₀}(x) = f(x) − y₀. If we are interested in addressing question (ii), that is, in what happens at a given y₀, it is then without loss of generality to consider homogeneous equations (as we did, for example, in Section 12.8). However, for the global questions (i) and (iii) it is important to keep track of the known term by studying the general form f(x) = y₀. H
at x₀ ∈ U, then there exist neighborhoods B(x₀) and V(y₀) such that the restriction f : B(x₀) → V(y₀) is a bijective operator, with a k times continuously differentiable inverse operator f⁻¹ : V(y₀) → B(x₀) such that
Df⁻¹(y) = (Df(x))⁻¹   ∀x ∈ B(x₀)   (26.3)
where y = f(x).
The Inverse Function Theorem thus provides conditions that ensure the local invertibility of a function. This important theorem is a simple consequence of the Implicit Function Theorem.⁴
Proof Assume, for simplicity, that Im f is an open set, so that the set U × Im f is open in R²ⁿ. Define g : R²ⁿ → Rⁿ by
g(x, y) = f(x) − y   (26.4)
Given (x₀, y₀) ∈ g⁻¹(0), by (26.2) we have
det Dₓg(x₀, y₀) = det Df(x₀) ≠ 0
The operator version of the Implicit Function Theorem (Theorem 1215) (in “exchanged” form) then ensures the existence of neighborhoods B(y₀) and V(x₀) and of a unique function φ : B(y₀) → V(x₀) such that
For n = 1, formula (26.3) has as a special case the basic formula (20.20) on the derivative of the inverse of a scalar function, i.e., (f⁻¹)′(y₀) = 1/f′(x₀). So, the Inverse Function Theorem vastly generalizes this basic finding. More importantly, this classic result provides an answer to the local question (ii). Indeed, suppose that – by skill or luck – we have been able to find a solution x₀ of the equation f(x) = y₀. Based on this knowledge, under a differential condition at x₀ the Inverse Function Theorem ensures, first, that x₀ is the unique solution and, second, that for all known terms y belonging to a neighborhood V(y₀) of the known term y₀, the corresponding equations f(x) = y have unique solutions as well, all lying in the neighborhood B(x₀) = f⁻¹(V(y₀)).
Recall that the Jacobian matrix is the matrix associated with the differential operator df(x₀) : Rⁿ → Rⁿ (Theorem 974), i.e.,
df(x₀)(h) = Df(x₀)h   ∀h ∈ Rⁿ
Condition (26.2) amounts to requiring that the Jacobian matrix be invertible, so that the differential operator is invertible. Its inverse operator d⁻¹f(x₀) : Rⁿ → Rⁿ is then given by
d⁻¹f(x₀)(h) = (Df(x₀))⁻¹h   ∀h ∈ Rⁿ
The Inverse Function Theorem shows that the invertibility of its differential at x₀, ensured by condition (26.2), is inherited locally at x₀ by the function f itself. By formula (26.3), we also have
df⁻¹(y₀)(h) = Df⁻¹(y₀)h = (Df(x₀))⁻¹h = d⁻¹f(x₀)(h)   ∀h ∈ Rⁿ
So, the differential of the inverse coincides with the inverse of the differential. Formula (26.3) thus ensures the mutual consistency of the linear approximations at x₀ of the function f and of its inverse f⁻¹, a further dividend of the Inverse Function Theorem.
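For n = 1 this consistency can be seen concretely. In the sketch below (ours), f(x) = x³ + x is inverted by bisection and the finite-difference derivative of f⁻¹ is compared with 1/f′(x₀), as in (20.20):

```python
def f(x):
    return x ** 3 + x                   # strictly increasing, hence invertible

def f_prime(x):
    return 3 * x ** 2 + 1

def f_inv(y, lo=-10.0, hi=10.0):
    """Invert f by bisection (valid since f is strictly increasing)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x0 = 1.0
y0 = f(x0)                              # y0 = 2
h = 1e-6
num = (f_inv(y0 + h) - f_inv(y0 - h)) / (2 * h)   # (f^-1)'(y0), numerically
assert abs(num - 1 / f_prime(x0)) < 1e-5           # 1/f'(x0) = 1/4
```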
The Inverse Function Theorem may fail if we remove either of its hypotheses – i.e., condition (26.2) and (at least) continuous differentiability. A non-trivial, omitted, example can be given to show that differentiability alone is not enough for the theorem, so continuous differentiability is needed. A simple example, which we give next, shows that condition (26.2) is needed.
Example 1227 The continuously differentiable quadratic function f(x) = x² does not satisfy condition (26.2) at the origin. Accordingly, this function is not locally invertible at the origin: there is no neighborhood of the origin on which we can restrict the quadratic function so as to make it injective. N
26.3. GLOBAL ANALYSIS 847
For instance, the level sets f⁻¹(y) = {x ∈ Rⁿ : f(x) = y} of continuous functions are closed sets since the singletons {y} are closed sets in Rᵐ.
The proof of the proposition relies on some basic set-theoretic properties of images and preimages, whose proof is left to the reader.
Lemma 1229 Let f : X → Y be a function between any two sets X and Y. We have:
In view of (ii), there is a dual version of the last proposition for open sets: an operator is continuous if and only if the preimage of each open set is open.
Proof of Proposition 1228 “If”. Suppose that f is continuous. Let C be a closed set of Rᵐ. Let {xₙ} ⊆ f⁻¹(C) be such that xₙ → x₀ ∈ Rⁿ. We want to show that x₀ ∈ f⁻¹(C). Set yₙ = f(xₙ). Since f is continuous, we have f(xₙ) → f(x₀). Then f(x₀) ∈ C because C is closed. In turn, this implies x₀ ∈ f⁻¹(C), as desired.
“Only if”. Suppose that, for each closed set C of Rᵐ, the set f⁻¹(C) is closed in Rⁿ. So, for each open set V of Rᵐ, the set f⁻¹(V) is open in Rⁿ because f⁻¹(V)ᶜ = f⁻¹(Vᶜ). So, given x₀ ∈ f⁻¹(V), there exists a neighborhood B(x₀) such that B(x₀) ⊆ f⁻¹(V). So, f(B(x₀)) ⊆ f(f⁻¹(V)) ⊆ V. We conclude that f is continuous at x₀.
There is no counterpart of the last proposition for images: given a continuous function,
in general the image of an open set is not open and the image of a closed set is not closed.
Example 1230 (i) Let f : R → R be the quadratic function f(x) = x². For the open interval I = (−1, 1) we have f(I) = [0, 1), which is not open. (ii) Let f : R → R be the exponential function f(x) = eˣ. The real line R is a closed set (it is also open, but that is not of interest here), with f(R) = (0, ∞), which is not closed. N
In view of Lemma 801, it is not surprising that in this example the closed set considered, i.e., R, is unbounded, hence not compact.
Properness requires the norms of the images of f to diverge to +∞ along any unbounded sequence {xₙ} ⊆ Rⁿ – i.e., any sequence such that ‖xₙ‖ → +∞. In words, the function cannot indefinitely take values of bounded norm on a sequence that “dashes off” to infinity.
Example 1232 If m = 1, supercoercive functions are proper. Indeed, for them we have |f(xₙ)| → +∞ along any sequence with ‖xₙ‖ → +∞. The converse is obviously false: the cubic function f(x) = x³ is proper but not supercoercive. N
By now, the next characterization of proper functions should not be that surprising.
In view of Proposition 1228, we have the following simple, yet interesting, corollary.
Proof Invertible linear operators are globally invertible, and the inverse f⁻¹ : Rⁿ → Rⁿ is a linear operator (see Chapter 13). By Lemma 730, there exists a constant k > 0 such that ‖f⁻¹(x)‖ ≤ k‖x‖ for every x ∈ Rⁿ. Let {xₙ} ⊆ Rⁿ be such that ‖xₙ‖ → +∞. Then ‖xₙ‖ = ‖f⁻¹(f(xₙ))‖ ≤ k‖f(xₙ)‖, so ‖f(xₙ)‖ → +∞. We conclude that f is proper.
Proof Let f : Rⁿ → Rⁿ be continuously differentiable. We prove the “only if”, the converse being much more complicated. So, suppose that f is bijective, with differentiable inverse f⁻¹ : Rⁿ → Rⁿ. Since f⁻¹ is continuous, by Lemma 801 the image f⁻¹(K) of each compact set K of Rⁿ is compact. Since f is continuous, by Corollary 1234 this implies that f is proper. Moreover, since f⁻¹(f(x)) = x for all x ∈ Rⁿ, by the chain rule formula (21.39) we have Df⁻¹(f(x)) Df(x) = I, so det[Df⁻¹(f(x)) Df(x)] = 1. By Binet's Theorem, det Df(x) ≠ 0.
Without the hypothesis that f is proper, the “if” can fail, as the next classic example
shows.
Thus, det Df(x) = e^{2x₁} cos²x₂ + e^{2x₁} sin²x₂ = e^{2x₁} > 0 for all x ∈ R², so condition (26.5) holds. However, this function is not proper. Indeed, if we take xₙ = (0, n), then ‖xₙ‖ = n but ‖f(xₙ)‖ = ‖(cos n, sin n)‖ = √(cos²n + sin²n) = 1, so ‖xₙ‖ → +∞ does not imply ‖f(xₙ)‖ → +∞.
This function is neither injective nor surjective. To see that it is not surjective, note that there is no x ∈ R² such that f(x) = 0. Indeed, if f(x) = 0 then e^{x₁} cos x₂ = 0, so cos x₂ = 0. In turn, this implies sin x₂ = ±1, which contradicts f(x) = 0. As to injectivity, for example we have f(0, 0) = f(0, 2π) = (1, 0).
In sum, by the Inverse Function Theorem f is locally invertible at each x ∈ R², but we just showed that it is not globally invertible on R². Thus, a function locally invertible at each point of its domain might not be globally invertible. N
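A numerical confirmation (our sketch; the map is the f(x₁, x₂) = (e^{x₁} cos x₂, e^{x₁} sin x₂) of this example):

```python
import math

def f(x1, x2):
    return (math.exp(x1) * math.cos(x2), math.exp(x1) * math.sin(x2))

def det_Df(x1, x2):
    # e^{2 x1} (cos^2 x2 + sin^2 x2) = e^{2 x1} > 0: the Jacobian never vanishes
    return math.exp(2 * x1)

# not injective: f(0, 0) = f(0, 2*pi) = (1, 0)
p, q = f(0.0, 0.0), f(0.0, 2 * math.pi)
assert abs(p[0] - q[0]) < 1e-12 and abs(p[1] - q[1]) < 1e-12

# not proper: along x_n = (0, n), ||x_n|| -> +inf while ||f(x_n)|| = 1
for n in range(1, 50):
    assert abs(math.hypot(*f(0.0, float(n))) - 1.0) < 1e-12
```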
⁵ A first version of this theorem was proved by Jacques Hadamard in 1906 and then substantially generalized by Renato Caccioppoli in 1932.
det Df(x) ≠ 0   ∀x ∈ Rⁿ
where y = f(x).
g(x, f(x)) = 0   ∀x ∈ Rⁿ
Since g is proper, so is F. Indeed, if ‖(x, y)‖ → +∞, then ‖g(x, y)‖ → +∞, so ‖F(x, y)‖ → +∞ because ‖g(x, y)‖ ≤ ‖F(x, y)‖.
Since
Fᵢ(x, y) = xᵢ   ∀i = 1, …, n
Fₙ₊ⱼ(x, y) = gⱼ(x, y)   ∀j = 1, …, m
the Jacobian matrix of F has the block form
DF(x, y) = ( I 0 ; Dₓg(x, y) D_y g(x, y) )
where I is the n × n identity matrix and 0 is the n × m zero matrix. So,
det DF(x, y) = det D_y g(x, y)
By (26.7), we thus have det DF(x, y) ≠ 0 for all (x, y) ∈ Rⁿ⁺ᵐ.
By the Caccioppoli-Hadamard Theorem, F is globally invertible with differentiable inverse F⁻¹ : Rⁿ⁺ᵐ → Rⁿ⁺ᵐ. Fix x ∈ Rⁿ. Since there is y ∈ Rᵐ such that F⁻¹(x, 0) = (x, y), we have g(x, y) = 0. We claim that such a y ∈ Rᵐ is unique. Indeed, let y, y′ ∈ Rᵐ be such that g(x, y) = g(x, y′) = 0. Then
F(x, y) = (x, g(x, y)) = (x, 0) = (x, g(x, y′)) = F(x, y′)
Since F is bijective, it follows that y = y′, as desired. So, let f : Rⁿ → Rᵐ be the operator that associates with each x ∈ Rⁿ the unique y ∈ Rᵐ such that g(x, y) = 0. By definition, g(x, f(x)) = 0 for all x ∈ Rⁿ and f is the unique such operator. Moreover, from
F(x, f(x)) = (x, 0)   ∀x ∈ Rⁿ
it follows that
F⁻¹(x, 0) = (x, f(x))   ∀x ∈ Rⁿ
Since F⁻¹ is differentiable, it can be proved that this implies that f is differentiable. Since
g(x, f(x)) = 0   ∀x ∈ Rⁿ
by the chain rule we have
Dₓg(x, f(x)) + D_y g(x, f(x)) Df(x) = 0   ∀x ∈ Rⁿ
So, formula (26.8) holds because condition (26.7) ensures that the matrix D_y g(x, f(x)) is invertible at all x ∈ Rⁿ.
f(x, θ) = y₀
(i) is the solution set S_{y₀}(θ) non-empty for some θ ∈ Θ, or for all θ ∈ Θ? If so, is it a singleton?
(ii) if S_{y₀} is a function (locally or globally), is it continuous (or differentiable)?
We have
f(S_{y₀}(θ), θ) = y₀
So, a positive answer to question (i) would amount to saying that S_{y₀} is a function implicitly defined, locally or globally, by the equation f(x, θ) = y₀; that is, S_{y₀} would give the functional representation of the level curve f⁻¹(y₀) = {(x, θ) ∈ A × Θ : f(x, θ) = y₀}. Thus, the study of the solutions of a parametric equation given a known term and the study of the functional representations of a level curve are, mathematically, equivalent exercises.
To answer questions (i) and (ii) we then need to invoke suitable versions of the Implicit Function Theorem: local versions of the theorem give local answers, global versions global answers. In any case, a déjà vu: in our discussion of implicit functions we already (implicitly) took this angle, which in economics is at the heart of comparative statics analysis (cf. Section 25.4.3). Indeed, conditions that ensure the existence, at least locally, of a solution function S_{y₀} : Θ → Rⁿ permit us to effectively describe how solutions – the endogenous variables – react to changes in the parameters – the exogenous variables. For brevity, we leave readers to revisit those discussions through the lens of this section.
We can consider four main problems about a scientific inquiry described by a triple (X, Y, M). We formalize them by means of the evaluation function g : X × M → Y defined by g(x, m) = m(x), which relates causes, effects and models through the expression
y = g(x, m)   (26.9)
(i) Direct problems: Given a model m and a cause x, what is the resulting effect y? Formally, what is the (unique) value y = g(x, m) given x ∈ X and m ∈ M?
(ii) Causation problems: Given a model m and an effect y, what is the underlying cause x? Formally, which are the (possibly multiple) values of x that solve equation (26.9) given y ∈ Y and m ∈ M?
(iii) Identification problems: Given a cause x and an effect y, what is the underlying model m? Formally, which are the (possibly multiple) values of m ∈ M that solve equation (26.9) given x ∈ X and y ∈ Y?
(iv) Induction problems: Given an effect y, what are the underlying cause x and model m? Formally, which are the (possibly multiple) pairs of x ∈ X and m ∈ M that solve equation (26.9) given y ∈ Y?
The latter three problems (causation, identification and induction) are formalized by regarding (26.9) as an equation. For this reason, we call them inverse problems.⁶ We can thus view the study of equations as a way to address such problems. In this regard, note that:
1. In causation and identification problems, y is the known term of equation (26.9), while x and m, respectively, are the unknowns.
2. In an induction problem, y is the known term of equation (26.9), while x and m are the unknowns.
Example 1240 Consider an orchard with several apple trees that produce a quantity of apples according to the summer weather conditions; in particular, the summer could be either cold, hot, or mild. Here m is an apple tree belonging to the collection M of the apple trees of the orchard, y is the apple harvest with Y = [0, ∞), and x is the average summer temperature with X = [0, ∞). We interpret m(x) as the quantity of apples that the tree m produces when the summer weather is x. The trees in the orchard thus differ in their performance under the different weather conditions.
In this example the previous four problems take the form:
(i) Given a tree m and an average summer temperature x, what is the resulting apple harvest y?
⁶ In this chapter we considered the case X, Y ⊆ Rⁿ, but the study of equations can be carried out more generally, as readers will learn in more advanced courses.
(ii) Given a tree m and an apple harvest y, what is the underlying average summer temperature x?
(iii) Given an average summer temperature x and an apple harvest y, what is the underlying tree m?
(iv) Given an apple harvest y, what are the underlying average summer temperature x and tree m? N
Chapter 27
Study of functions
It is often useful to have, roughly, a sense of what a function looks like. In this chapter we will outline a qualitative study of functions. To this end, we first introduce a couple of classes of points.
A dual definition holds for (strict) convexity at a point. From Corollary 1101 the next result immediately follows.
Example 1243 (i) The function f : R → R given by f(x) = 2x² − 3 is strictly convex at every point because f″(x) = 4 > 0 at every x. (ii) The function f : R → R given by f(x) = x³ is strictly convex at x₀ = 5 since f″(5) = 30 > 0, and it is strictly concave at x₀ = −1 since f″(−1) = −6 < 0. N
856 CHAPTER 27. STUDY OF FUNCTIONS
[Figures: two functions, one strictly concave and one strictly convex at a point x₀, with the values f(x₀) marked.]
O.R. Just as the first derivative of a function at a point gives information on its increase or decrease, so the second derivative gives information on its concavity or convexity at a point. The greater |f″(x₀)|, the more pronounced the curvature (the “belly”) of f at x₀ – and the “belly” is upward if f″(x₀) < 0 and downward if f″(x₀) > 0, as the previous figure shows.
Economic applications often consider the ratio
f″(x₀)/f′(x₀)
which does not depend on the unit of measure of f(x). Indeed, let T and S be the units of measure of the dependent and independent variables, respectively. Then the units of measure of f′ and f″ are T/S and T/S², so the unit of measure of f″/f′ is
(T/S²)/(T/S) = 1/S
In short, at an inflection point the “sign” of the concavity of the function changes. By Proposition 1242, we have the following simple result.
(i) If x₀ is an inflection point for f, then f″(x₀) = 0 (provided f is twice differentiable at x₀).
(ii) If f″(x₀) = 0 and f‴(x₀) ≠ 0, then x₀ is an inflection point for f (provided f is three times continuously differentiable at x₀).
27.2. ASYMPTOTES 857
Example 1246 (i) The origin is an inflection point of the cubic function f(x) = x³. (ii) Let f : R → R be the Gaussian function f(x) = e^{−x²}. Then f′(x) = −2x e^{−x²} and f″(x) = (4x² − 2) e^{−x²}, so the function is concave for
−1/√2 < x < 1/√2
and convex for |x| > 1/√2. The two points ±1/√2 are therefore inflection points. Indeed, f″(±1/√2) = 0. We will continue the study of this function later in the chapter, in Example 1258. N
For differentiable functions, geometrically at an inflection point x₀ the tangent line cuts the graph: it cannot lie (locally) above or below it. In particular, if f′(x₀) = f″(x₀) = 0, then the tangent line is horizontal and cuts the graph of the function: we talk of an inflection point with horizontal tangent.

Example 1247 The origin is an inflection point with horizontal tangent of the cubic function, as well as of any function f(x) = xⁿ with n odd. N
27.2 Asymptotes
Intuitively, an asymptote is a straight line to which the graph of a function gets arbitrarily
close. Such straight lines can be vertical, horizontal, or oblique.
(i) When

lim_{x→x₀⁺} f(x) = +∞ or −∞    or    lim_{x→x₀⁻} f(x) = +∞ or −∞

the straight line of equation x = x₀ is a vertical asymptote for f.

(ii) When

lim_{x→+∞} f(x) = L  (or lim_{x→−∞} f(x) = L)

the straight line of equation y = L is a horizontal asymptote for f to +∞ (or to −∞).

(iii) When

lim_{x→+∞} (f(x) − ax − b) = 0  (or lim_{x→−∞} (f(x) − ax − b) = 0)

that is, when the distance between the function and the straight line y = ax + b tends to 0 as x → +∞ (or x → −∞), the straight line of equation y = ax + b is an oblique asymptote for f to +∞ (or to −∞).
Horizontal asymptotes are actually the special case of oblique asymptotes with a = 0. Moreover, it is evident that there can be at most one oblique asymptote as x → −∞ and at most one as x → +∞. It is, instead, possible for f to have several vertical asymptotes.
Consider the function

f(x) = 3 − 7/(x² + 1)

with graph

[Figure: graph of f(x) = 3 − 7/(x² + 1), approaching the horizontal asymptote y = 3]

Since lim_{x→+∞} f(x) = lim_{x→−∞} f(x) = 3, the straight line y = 3 is both a right and a left horizontal asymptote for f(x). N
Consider the function

f(x) = 1/(x + 1) + 2

with graph

[Figure: graph of f(x) = 1/(x + 1) + 2, with vertical asymptote x = −1 and horizontal asymptote y = 2]
Consider the function

f(x) = 1/(x² + x − 2)

with graph

[Figure: graph of f(x) = 1/(x² + x − 2), with vertical asymptotes x = 1 and x = −2]

Since lim_{x→1⁺} f(x) = +∞ and lim_{x→1⁻} f(x) = −∞, the straight line x = 1 is a vertical asymptote for f(x). Moreover, since lim_{x→−2⁺} f(x) = −∞ and lim_{x→−2⁻} f(x) = +∞, also the straight line x = −2 is a vertical asymptote for f(x). N
Consider the function

f(x) = 2x²/(x + 1)

with graph

[Figure: graph of f(x) = 2x²/(x + 1), with vertical asymptote x = −1 and an oblique asymptote]
Vertical and horizontal asymptotes are easily identified. We thus shift our attention to oblique asymptotes. To this end, we provide two simple results.

Proof "If". When f(x)/x → a, consider the difference f(x) − ax. If it tends to a finite limit b, then (and only then) f(x) − ax − b → 0. "Only if". From f(x) − ax − b → 0 it follows that f(x) − ax → b and, by dividing by x, that f(x)/x − a → 0.

Proposition 1252 gives a necessary and sufficient condition for the search of oblique asymptotes, while Proposition 1253 only provides a sufficient condition. To use this latter condition, the limits involved must exist. In this regard, consider the following example.
Consider the function

f(x) = x + cos(x²)/x

As x → ∞ we have

f(x)/x = 1 + cos(x²)/x² → 1

and

f(x) − x = cos(x²)/x → 0

Therefore, y = x is an oblique asymptote of f as x → ∞. Nevertheless, the first derivative of f is

f′(x) = 1 + (−2x² sin(x²) − cos(x²))/x² = 1 − 2 sin(x²) − cos(x²)/x²

It is immediate to verify that the limit of f′(x) as x → ∞ does not exist. N
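The two limits in Proposition 1252 also suggest a quick numerical check for oblique asymptotes. Below is a minimal Python sketch (an illustration, not part of the text): the slope is approximated by a finite difference at a large abscissa, and the intercept from f(x) − ax; the sample function and the evaluation point x = 10⁶ are illustrative assumptions.

```python
def oblique_asymptote(f, x=1e6):
    # slope: a = lim f(x)/x, estimated via the difference (f(2x) - f(x))/x
    a = (f(2 * x) - f(x)) / x
    # intercept: b = lim (f(x) - a x), estimated at the same large x
    b = f(x) - a * x
    return a, b

# f(x) = 2x^2/(x + 1), studied above: since f(x) = 2x - 2 + 2/(x + 1),
# its oblique asymptote is y = 2x - 2
a, b = oblique_asymptote(lambda x: 2 * x**2 / (x + 1))
```

The estimates agree with the exact asymptote up to the truncation error of the finite limits.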
For instance, consider f(x) = √(x² − x). We have f(x)/x → 1 as x → +∞ and, as x → +∞,

f(x) − x = √(x² − x) − x = x(√(1 − 1/x) − 1) = ((1 − 1/x)^(1/2) − 1) / (1/x) → −1/2

Therefore,

y = x − 1/2

is an oblique asymptote as x → +∞ for f. N
(i) If f(x) = g(x) + h(x) and h(x) → 0 as x → ∞, then f and g share the possible oblique asymptotes.

(ii) The root function f(x) = (a₀xⁿ + a₁xⁿ⁻¹ + ⋯ + aₙ)^(1/n) has oblique asymptote

y = ⁿ√a₀ x + ⁿ√a₀ (1/n)(a₁/a₀)
Let us verify only (ii) for n odd (for n even the calculations are analogous). If n is odd, as x → ∞ we have

f(x)/x = ⁿ√(a₀xⁿ) ⁿ√(1 + a₁/(a₀x) + ⋯ + aₙ/(a₀xⁿ)) / x → ⁿ√a₀

hence the slope of the oblique asymptote is ⁿ√a₀. Moreover,

f(x) − ⁿ√a₀ x = ⁿ√a₀ x [(1 + (a₁xⁿ⁻¹ + ⋯ + aₙ)/(a₀xⁿ))^(1/n) − 1]

= ⁿ√a₀ x · [(1 + (a₁xⁿ⁻¹ + ⋯ + aₙ)/(a₀xⁿ))^(1/n) − 1] / [(a₁xⁿ⁻¹ + ⋯ + aₙ)/(a₀xⁿ)] · (a₁xⁿ⁻¹ + ⋯ + aₙ)/(a₀xⁿ)

Since, as x → ∞,

[(1 + (a₁xⁿ⁻¹ + ⋯ + aₙ)/(a₀xⁿ))^(1/n) − 1] / [(a₁xⁿ⁻¹ + ⋯ + aₙ)/(a₀xⁿ)] → 1/n  and  ⁿ√a₀ x (a₁xⁿ⁻¹ + ⋯ + aₙ)/(a₀xⁿ) → ⁿ√a₀ (a₁/a₀)

we have, as x → ∞,

f(x) − ⁿ√a₀ x → ⁿ√a₀ (a₁/a₀)(1/n)

In the previous example we had n = 2, a₀ = 1, and a₁ = −1. Indeed, as x → +∞, the asymptote had equation

y = √1 x + √1 (1/2)(−1/1) = x − 1/2
(i) We first calculate the limits of f at the boundary points of the domain, and also as x → ±∞ when A is unbounded.

(ii) We determine the sets on which the function is positive, f(x) ≥ 0, increasing, f′(x) ≥ 0, and concave/convex, f″(x) ≶ 0. Once we also determine the intersections of the graph with the axes, by finding the point f(0) on the vertical axis and the set f⁻¹(0) on the horizontal axis, we begin to have a first idea of its graph.

(iii) We look for candidate extremal points via first- and second-order conditions (or, more generally, via the omnibus procedure of Section 23.3).

(iv) We look, via the condition f″(x) = 0, for candidate inflection points; they are certainly so if at them f‴ ≠ 0 (provided f is three times continuously differentiable at x).
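Steps (i)-(iv) can be mimicked numerically. The sketch below (an illustration, not the text's procedure) uses central finite differences for f′ and f″ and checks the sign information for the function f(x) = x⁶ − 3x² + 1 of the next example; the grid and step sizes are arbitrary choices.

```python
def f(x):
    return x**6 - 3 * x**2 + 1

def diff(g, x, h=1e-5):
    # central finite-difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

def f1(x):  # numerical f'(x); the exact derivative is 6x^5 - 6x
    return diff(f, x)

def f2(x):  # numerical f''(x); the exact second derivative is 30x^4 - 6
    return diff(f1, x, h=1e-4)

# step (iii): the critical points solve 6x^5 - 6x = 0, i.e. x = -1, 0, 1
critical = [x for x in (-1.0, 0.0, 1.0) if abs(f1(x)) < 1e-6]

# step (iv): f'' changes sign, so the concavity changes between 0 and 1
concave_at_0, convex_at_1 = f2(0.0) < 0, f2(1.0) > 0
```

The sign pattern of f″ confirms a local maximizer at the origin and local minimizers at ±1.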
Example 1257 Let f : ℝ → ℝ be given by f(x) = x⁶ − 3x² + 1. We look for possible local extremal points. The first-order condition f′(x) = 0 has the form

6x⁵ − 6x = 0

so x = 0 and x = ±1 are the critical points; since f″(0) = −6 < 0 and f″(±1) = 24 > 0, the origin is a local maximizer and the points x = ±1 are local minimizers. From lim_{x→+∞} f(x) = lim_{x→−∞} f(x) = +∞ it follows that the graph of this function is:
[Figure: graph of f(x) = x⁶ − 3x² + 1]
Example 1258 Let f : ℝ → ℝ be the Gaussian function f(x) = e^(−x²). Both limits, as x → ±∞, are 0. So, the horizontal axis is a horizontal asymptote. The function is always strictly positive and f(0) = 1. Next, we look for possible local extremal points. The first-order condition f′(x) = 0 has the form −2xe^(−x²) = 0, so the origin x = 0 is the unique critical point. The second derivative is

f″(x) = −2e^(−x²) + (−2x)e^(−x²)(−2x) = 2e^(−x²)(2x² − 1)

Since f″(0) = −2 < 0, by Proposition 1024 the origin is actually a strong global maximizer. Moreover, we have

f″(x) < 0 ⟺ 2x² − 1 < 0 ⟺ x ∈ (−1/√2, 1/√2)

f″(x) = 0 ⟺ 2x² − 1 = 0 ⟺ x = ±1/√2

f″(x) > 0 ⟺ 2x² − 1 > 0 ⟺ x ∈ (−∞, −1/√2) ∪ (1/√2, +∞)

So, the points x = ±1/√2 are inflection points, with f concave on the open interval (−1/√2, 1/√2) and convex on the open intervals (−∞, −1/√2) and (1/√2, +∞). The graph of the function is:
[Figure: bell-shaped graph of the Gaussian function f(x) = e^(−x²), with inflection points at x = ±1/√2]
x = (14 ± √(196 − 144))/6 = (14 ± √52)/6 = (7 ± √13)/3

The derivative is ≥ 0 when x ∈ (−∞, (7 − √13)/3] ∪ [(7 + √13)/3, +∞).

3. Since f″(x) = 6x − 14, it is zero for x = 7/3. The second derivative is ≥ 0 when x ≥ 7/3.

4. Since f″((7 − √13)/3) < 0, the point is a local maximizer; since instead f″((7 + √13)/3) > 0, the point is a local minimizer. Finally, the point 7/3 is of inflection.
27.3. STUDY OF FUNCTIONS 865
[Figure: graph of the cubic function studied above]
Example 1260 Let f : ℝ → ℝ be given by f(x) = xeˣ. Its limits are lim_{x→−∞} xeˣ = 0 and lim_{x→+∞} xeˣ = +∞. We then have:

1. f(x) ≥ 0 ⟺ x ≥ 0.

2. f′(x) = (x + 1)eˣ ≥ 0 ⟺ x ≥ −1.

3. f″(x) = (x + 2)eˣ ≥ 0 ⟺ x ≥ −2.

4. f(0) = 0, so the origin is the unique point of intersection with the axes.
[Figure: graph of f(x) = xeˣ]

N
lim_{x→−∞} x²eˣ = 0⁺,  lim_{x→+∞} x²eˣ = +∞

We then have:

[Figure: graph of f(x) = x²eˣ]
lim_{x→−∞} x³eˣ = 0⁻,  lim_{x→+∞} x³eˣ = +∞

1. f(0) = 0; f(x) ≥ 0 ⟺ x ≥ 0.

[Figure: graph of f(x) = x³eˣ]
Consider the function f(x) = 2x + 3 + 1/(x − 2).

1. f(0) = 3 − 0.5 = 2.5; we have f(x) = 0 when (2x + 3)(x − 2) = −1, that is, when 2x² − x − 5 = 0, i.e., for

x = (1 ± √41)/4 ≃ 1.85 and −1.35

Since

lim_{x→±∞} [f(x) − 2x] = lim_{x→±∞} (3 + 1/(x − 2)) = 3

the straight line y = 2x + 3 is an oblique asymptote for f.

[Figure: graph of f(x) = 2x + 3 + 1/(x − 2), with vertical asymptote x = 2 and oblique asymptote y = 2x + 3]

Note that

f(x) ∼ 1/(x − 2)

as x → 2 (near 2, f(x) behaves like 1/(x − 2), i.e., it diverges) and that f(x) ∼ 2x + 3 as x → ±∞ (for |x| sufficiently large it behaves like y = 2x + 3). N
Part VII

Differential optimization

Chapter 28

Unconstrained optimization
S = {x ∈ C : ∇f(x) = 0}

If x̂ ∈ S is such that

f(x̂) ≥ f(x)  ∀x ∈ S   (28.2)

then x̂ is a solution of the optimization problem (28.1).
In other words, once the conditions for Tonelli's Theorem to apply are verified, one constructs the set of critical points. A point where f attains its maximum value on this set is a solution of the optimization problem.

N.B. If the function f is twice continuously differentiable, in phase 1 one can consider, instead of S, the subset S₂ ⊆ S of the critical points that satisfy the second-order necessary condition (Sections 22.5.3 and 23.4.4). O
The rationale of the elimination method is simple. By Fermat's Theorem, the set S consists of all points in C which are candidate local solutions of the optimization problem (28.1). On the other hand, if f is continuous and coercive on C, by Tonelli's Theorem there exists at least one solution of this optimization problem. Such a solution must belong to the set S (as long as it is non-empty) because a solution of the optimization problem is, a fortiori, a local solution. Hence, the solutions of the "restricted" optimization problem (28.3) are also solutions of the optimization problem (28.1). But the solutions of the restricted problem (28.3) are the points x̂ ∈ S for which condition (28.2) holds, which are then the solutions of the optimization problem (28.1), as phase 2 of the elimination method states.

As the following examples show, the elimination method elegantly and effectively combines Tonelli's global result with Fermat's local one. Note how Tonelli's Theorem is crucial since in unconstrained differential optimization problems the choice set C is open, so Weierstrass' Theorem is inapplicable (as it requires C to be compact).

The smaller the set S of critical points, the better the method works, in that phase 2 requires a direct comparison of f at all points of S. For this reason, the method is particularly effective when we can consider, instead of S, its subset S₂ consisting of all critical points which satisfy the second-order necessary condition.
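The two phases can be sketched in a few lines of code. The following illustration (assumptions: the objective e^(−x⁴+x²) of a later example in this section, with its critical points entered by hand from the first-order condition) renders phase 1 as a finite candidate set and phase 2 as a direct comparison.

```python
import math

def f(x):
    return math.exp(-x**4 + x**2)

# phase 1: the critical points solve f'(x) = (-4x^3 + 2x) f(x) = 0
S = [0.0, -1 / math.sqrt(2), 1 / math.sqrt(2)]

# phase 2: keep the points of S where f attains its largest value on S
best = max(f(x) for x in S)
solutions = [x for x in S if f(x) >= best - 1e-12]
```

Both points ±1/√2 survive phase 2 with value e^(1/4), while the critical point 0 is eliminated.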
Example 1264 Let f : ℝⁿ → ℝ be given by f(x) = (1 − ‖x‖²)e^(‖x‖²) and let C = ℝⁿ. The function f is coercive on ℝⁿ. Indeed, it is supercoercive: by taking tₙ = ‖xₙ‖, it follows that

f(xₙ) = (1 − ‖xₙ‖²)e^(‖xₙ‖²) = (1 − tₙ²)e^(tₙ²) → −∞

for any sequence {xₙ} of vectors such that tₙ = ‖xₙ‖ → +∞. Since it is continuous, f is coercive on ℝⁿ by Proposition 820. The unconstrained differential optimization problem

max_x (1 − ‖x‖²)e^(‖x‖²)  sub x ∈ ℝⁿ   (28.4)

can thus be solved with the elimination method.

Phase 1: The first-order condition f′(x) = 0 takes the form 6x⁵ − 6x = 0, so x = 0 and x = ±1 are the only critical points, that is, S = {−1, 0, 1}. We have f″(0) = −6, f″(−1) = 24 and f″(1) = 24, so S₂ = {0}.
of Example 1022. Let us check that this differential problem is coercive. By setting g(x) = eˣ and h(x) = −x⁴ + x², it follows that f = g ∘ h. We have lim_{x→±∞} h(x) = lim_{x→±∞} (−x⁴ + x²) = −∞. So, by Proposition 820 the function h is coercive on ℝ. Since g is strictly increasing, the function f is a strictly increasing transformation of a coercive function. By Proposition 806, f is coercive.

This unconstrained differential optimization problem is thus coercive and can be solved with the elimination method.

Phase 1: From Example 1022 we know that S₂ = {−1/√2, 1/√2}.

Phase 2: We have f(−1/√2) = f(1/√2), so both points x̂ = ±1/√2 are solutions of the unconstrained optimization problem. The elimination method allowed us to identify the nature of such points, something not possible by using solely differential methods as in Example 1022. N
is said to be concave if the set C ⊆ A is both open and convex and if the function f : A ⊆ ℝⁿ → ℝ is both differentiable and concave on C.

As we learned earlier in the book (Section 24.5.1), in such a problem the first-order condition ∇f(x̂) = 0 becomes necessary and sufficient for a point x̂ ∈ C to be a solution. This remarkable property explains the importance of concavity in optimization problems. But more is true: by Theorem 831, such a solution is unique if f is strictly quasi-concave. Besides existence, also the study of the uniqueness of solutions, key for comparative statics exercises, is best carried out under concavity.

The necessary and sufficient status of the first-order condition leads to the concave (elimination) method to solve the concave problem (28.6). It consists of a single phase:

1. Find the set S = {x ∈ C : ∇f(x) = 0} of the stationary points of f on C; all, and only, the points x̂ ∈ S solve the optimization problem.

It requires the concavity of the objective function, a demanding condition that, however, is often assumed in economic applications, as remarked before.³
Example 1268 Let f : ℝ → ℝ be given by f(x) = −x log x and let C = (0, +∞). The function f is strictly concave since f″(x) = −1/x < 0 for all x > 0 (Corollary 1101). Let us solve the concave problem

max_x −x log x  sub x > 0   (28.7)

We have

f′(x) = 0 ⟺ log x = −1 ⟺ e^(log x) = e^(−1) ⟺ x = 1/e

According to the concave method, x̂ = 1/e is the unique solution of problem (28.7). N
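As a quick sanity check (not part of the concave method itself), one can compare the stationary point against a grid of other feasible points; the grid below is an arbitrary sample of (0, 5].

```python
import math

def f(x):
    # the objective of problem (28.7)
    return -x * math.log(x)

candidate = 1 / math.e                      # stationary point: -log(x) - 1 = 0
grid = [k / 1000 for k in range(1, 5001)]   # sample points in (0, 5]
is_best = all(f(candidate) >= f(x) for x in grid)
```

The candidate dominates every sampled point, and f(1/e) = 1/e, as a direct substitution confirms.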
Example 1269 Let f : ℝ² → ℝ be given by f(x) = −2x² − 3xy − 6y² and let C = ℝ². The function f is strictly concave since the Hessian

[ −4   −3 ]
[ −3  −12 ]

is negative definite. We have

∇f(x) = 0 ⟺ { −4x − 3y = 0 ; −3x − 12y = 0 } ⟺ x = (0, 0)

By the concave method, the origin x̂ = (0, 0) is the unique solution of problem (28.8). N
Example 1270 For bundles with two goods, the Cobb-Douglas utility function u : ℝ²₊ → ℝ is u(x₁, x₂) = x₁^a x₂^(1−a), with a ∈ (0, 1). Consider the consumer problem

max_x u(x)  sub p₁x₁ + p₂x₂ = w, x ∈ ℝ²₊   (28.9)

with strictly positive prices p₁, p₂ > 0. We can easily solve this problem by substitution. Indeed, from the budget constraint we have

x₂ = (w − p₁x₁)/p₂

In view of this expression, define f : [0, w/p₁] → ℝ by⁴

f(x₁) = x₁^a ((w − p₁x₁)/p₂)^(1−a)

Since f is easily checked to be strictly concave, by the concave method the unique maximizer is

x̂₁ = a w/p₁

By replacing it in the budget constraint, we conclude that

x̂ = (a w/p₁, (1 − a) w/p₂)

is the unique solution of the Cobb-Douglas consumer problem (28.9). N

³ Actually, in these applications strict concavity is often assumed in order to have unique solutions, so as to best carry out comparative statics exercises. For instance, in many works in economics, utility functions u that are defined on monetary outcomes, i.e., on the real line, are assumed to be such that u′ > 0 and u″ < 0, so strictly increasing (Proposition 1005) and strictly concave (Corollary 1101).

⁴ The condition x₁ ≤ w/p₁ ensures that x₂ ≥ 0.
is neither coercive nor concave: the cosine function is neither coercive on the real line (see Example 805) nor concave. Nonetheless, the problem is trivial: as one can easily infer from the graph of the cosine function, its solutions are the points x = 2kπ with k ∈ ℤ. As usual, common sense gives the best guidance in solving any problem (in particular, optimization ones), more so than any classification.
2. The two classes are not disjoint: there are unconstrained differential optimization problems which are both coercive and concave. For example, the unconstrained differential optimization problem

max_x 1 − x²  sub x ∈ ℝ

is both coercive and concave: the function 1 − x² is indeed both coercive (see Example 811) and strictly concave on the real line. In cases such as this one, we use the more powerful concave method.⁵

3. The two classes are distinct: there are unconstrained differential optimization problems which are coercive but not concave, and vice versa.
(a) The graph of the objective function

[Figure: graph of a concave function that is not coercive]

shows how it is concave, but not coercive. The optimization problem is thus concave, but not coercive.
(b) The unconstrained differential optimization problem

max_x e^(−x²)  sub x ∈ ℝ

is coercive but not concave: the Gaussian function e^(−x²) is indeed coercive (Example 807) but not concave, as its famous bell graph shows:

[Figure: bell-shaped graph of the Gaussian function e^(−x²)]

⁵ As coda readers may have noted, this objective function is strongly concave. Indeed, it is for such a class of concave functions that the overlap of the two classes of unconstrained differential optimization problems works at best.
28.5 Relaxation

An optimization problem

max_x f(x)  sub x ∈ C

can be "relaxed" by considering a larger choice set C ⊆ B ⊆ A which is, however, analytically more convenient (for example, it may be convex or open), so that the relaxed problem becomes coercive or concave. If a solution of the relaxed problem belongs to the original choice set C, it automatically solves the original problem as well. The following examples should clarify this simple yet powerful idea, which can allow us to solve optimization problems which are neither coercive nor concave.
where ℚⁿ₊ is the set of vectors in ℝⁿ whose coordinates are rational and positive. An obvious relaxation of the problem is

max_x (1 − ‖x‖²)e^(‖x‖²)  sub x ∈ ℝⁿ

whose choice set is larger yet analytically more convenient. Indeed, the relaxed problem is coercive and a simple application of the elimination method shows that its solution is the origin x̂ = 0 (Example 1264). Since it belongs to ℚⁿ₊, we conclude that the origin is also the unique solution of problem (28.10). It would have been far more complex to reach such a conclusion by studying the original problem directly.
(ii) Consider the consumer problem with log-linear utility

max_x Σᵢ₌₁ⁿ aᵢ log xᵢ  sub x ∈ C   (28.11)

where C = B(p, w) ∩ ℚⁿ is the set of bundles with rational components (a realistic assumption). Consider the relaxed version

max_x Σᵢ₌₁ⁿ aᵢ log xᵢ  sub x ∈ B(p, w)

with a larger yet convex, thus analytically more convenient, choice set. Indeed, convexity itself allowed us to conclude in Section 18.6 that the unique solution of the problem is the bundle x̂ such that x̂ᵢ = aᵢw/pᵢ for every good i = 1, ..., n. If aᵢ, pᵢ, w ∈ ℚ for every i, the bundle x̂ belongs to C, so it is the unique solution of problem (28.11). It would have been far more complex to reach such a conclusion by studying problem (28.11) directly. N
f(x) = y₀   (28.12)

If a vector x̂ ∈ A solves equation (28.12), then it solves problem (28.13). Indeed, ‖f(x̂) − y₀‖² = 0. The converse is false because the optimization problem might have solutions even though the equation has no solutions. Even in this case, however, the optimization connection is important because the solutions of the optimization problem are the best approximations, i.e., the best surrogates, of the missing solutions. A classic example is a system of linear equations Ax = b, which has the form (28.13) via the linear function f(x) = Ax defined on ℝⁿ and the known term b ∈ ℝᵐ, i.e.,

min_x ‖Ax − b‖²  sub x ∈ ℝⁿ   (28.14)

In this case (28.13) is a least squares problem and, when the system has no solutions, we have the least squares solutions studied in Section 18.9.

In sum, the solutions of the optimization problem (28.13) are candidate solutions of equation (28.12). If they turn out not to be solutions, they are nevertheless best approximations.

As to problem (28.13), assume that the image of f is a closed convex set of ℝᵐ. Consider the auxiliary problem

min_y ‖y − y₀‖²  sub y ∈ Im f

whose solution ŷ satisfies

(y₀ − ŷ) · (ŷ − y) ≥ 0  ∀y ∈ Im f
that admits at least one solution, i.e., arg max_{x∈C} f(x) ≠ ∅. To ease notation, we denote the maximum value by f̂ = max_{x∈C} f(x).

In words, a sequence {xₙ} in the choice set is relaxing if the objective function assumes larger and larger values, so it gets closer and closer to the maximum value f̂, as n increases. The following notion gives some computational content to problem (28.15).

xₙ₊₁ = h(xₙ)

⁶ We refer interested readers to Nesterov (2004) for an authoritative presentation of this topic.
f̂ − f(xₙ) ≤ 2β‖x₀ − x̂‖²/n   (28.18)

for the sequence {xₙ} of its iterates.

Thus, objective functions that are β-smooth and concave have an optimal decision procedure (28.17), called gradient descent, with unitary speed. The gradient descent procedure prescribes that, if at x we have ∂f(x)/∂xᵢ > 0 (resp., < 0), in the next iterate we increase (resp., decrease) the component i of the vector x. If one draws the graph of a scalar concave function, the intuition behind this rule should be apparent.⁷ This rule recalls a basic rule of thumb when trying to reach the peak of a mountain: at a crossroad, always take the rising path.
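The procedure h(x) = x + (1/β)∇f(x) can be sketched in a few lines. In the illustration below, the quadratic objective, the constant β = 4 (read off the Hessian diag(−2, −4)), and the starting point are all illustrative assumptions, not the text's general construction.

```python
def grad_f(x):
    # gradient of f(x1, x2) = -(x1^2 + 2*x2^2), a beta-smooth concave function
    return [-2 * x[0], -4 * x[1]]

beta = 4.0
x = [1.0, -1.0]  # x0, an arbitrary starting point
for n in range(200):
    g = grad_f(x)
    x = [x[i] + g[i] / beta for i in range(2)]  # x_{n+1} = h(x_n)

# the iterates approach the unique maximizer x_hat = (0, 0)
```

Each step moves every coordinate in the direction of the corresponding partial derivative, exactly as the rule above prescribes.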
The proof relies on the following lemma of independent interest (it is a first-order approximation with integral remainder).

⁷ A dual version of this result holds for minimization problems with convex objective functions, with h(x) = x − (1/β)∇f(x).
for all x, y ∈ U.

Proof Let x, y ∈ U. Define the auxiliary function φ : [0, 1] → ℝ by φ(t) = f((1 − t)x + ty). Since f is differentiable, the function φ is easily seen to be differentiable. By the chain rule, we then have

φ′(t) = Σᵢ₌₁ⁿ ∂f((1 − t)x + ty)/∂xᵢ (yᵢ − xᵢ) = ∇f(x + t(y − x)) · (y − x)

By (35.57), we have

f(y) − f(x) = φ(1) − φ(0) = ∫₀¹ φ′(t) dt = ∫₀¹ ∇f(x + t(y − x)) · (y − x) dt

as desired.
The next lemma reports some important inequalities for β-smooth functions.

for all x, y ∈ U.

where the first inequality follows from the Cauchy-Schwarz inequality. This proves (28.19). Assume that f and U are convex. Then, (28.19) implies
Proof of Theorem 1274 Set g = −f. Clearly, the function g is also β-smooth. Since g is convex, the previous inequalities applied along the iterates (with the second inequality following from the Cauchy-Schwarz inequality) and the identity ‖∇f(x)‖ = ‖∇g(x)‖ for all x ∈ ℝⁿ yield

f(xₙ₊₁) ≥ f(xₙ) + (1/(2β))‖∇f(xₙ)‖²

for all n, so the sequence {xₙ} is relaxing. In particular, we have

f̂ − f(xₙ₊₁) ≤ f̂ − f(xₙ) − (1/(2β))‖∇f(xₙ)‖²   (28.23)

Next we show that

‖xₙ₊₁ − x̂‖ ≤ ‖xₙ − x̂‖  ∀n ≥ 0   (28.24)

Indeed, since g is β-smooth and convex, we have

‖xₙ₊₁ − x̂‖² = ‖xₙ − (1/β)∇g(xₙ) − x̂‖² = ‖xₙ − x̂‖² + (1/β²)‖∇g(xₙ)‖² − (2/β)∇g(xₙ) · (xₙ − x̂)

≤ ‖xₙ − x̂‖² + (1/β²)‖∇g(xₙ)‖² − (2/β)(1/β)‖∇g(xₙ)‖²

= ‖xₙ − x̂‖² − (1/β²)‖∇g(xₙ)‖² ≤ ‖xₙ − x̂‖²

Setting dₙ = f̂ − f(xₙ), a telescoping argument based on (28.23) and (28.24) then gives

1/dₙ ≥ n/(2β‖x₀ − x̂‖²) + 1/d₀

so that

0 < dₙ ≤ 2β‖x₀ − x̂‖²/n
Example 1277 Given an m × n matrix A, with n ≤ m, consider the least squares optimization problem (28.14), i.e.,

max_x g(x)  sub x ∈ ℝⁿ

with g : ℝⁿ → ℝ defined by g(x) = −(1/2)‖Ax − b‖² (the factor 1/2 is immaterial for the solutions). Then ∇g(x) = −Aᵀ(Ax − b), so for some β > 0 we have

‖∇g(x) − ∇g(y)‖ = ‖AᵀA(x − y)‖ ≤ β‖x − y‖

where the last inequality holds because the Gram matrix AᵀA induces a linear operator on ℝⁿ, which is Lipschitz continuous by Theorem 729. We conclude that g is β-smooth. Since it is also concave, by the last theorem the map h : ℝⁿ → ℝⁿ defined by

h(x) = x − (1/β)Aᵀ(Ax − b)

has speed

f̂ − f(xₙ) ≤ 2β‖x₀ − x̂‖²/n

for the sequence of iterates

xₙ₊₁ = xₙ − (1/β)Aᵀ(Axₙ − b)

generated by h. N
Definition 1278 A sequence {xₙ} ⊆ C is maximizing for problem (28.1) if lim f(xₙ) = f̂.

Next we show that, under some standard conditions, maximizing sequences converge to solutions.

Proof We prove the "if" because the converse is trivial. Let x̂ be the unique solution of problem (28.1). Let {xₙ} be maximizing, i.e., lim f(xₙ) = f̂. We want to show that xₙ → x̂. Suppose, by contradiction, that there exist ε > 0 and a subsequence {xₙₖ} such that ‖xₙₖ − x̂‖ ≥ ε for all k (cf. Proposition 1557). Since lim_{k→+∞} f(xₙₖ) = f̂, there exists some scalar t such that eventually all terms of the subsequence {xₙₖ} belong to the upper contour set (f ≥ t). The supercoercive function f is continuous because it is concave (Theorem 669). So, the set (f ≥ t) is compact (cf. Proposition 820). By the Bolzano-Weierstrass Theorem, there exists a subsubsequence {xₙₖₛ} that converges to some x* ∈ (f ≥ t). Since f is continuous, we have f(x*) = lim_{s→+∞} f(xₙₖₛ) = f̂, where the last equality follows from lim f(xₙ) = f̂. So, f̂ = f(x*). In turn, this implies x̂ = x*. We thus reach a contradiction:

0 < ε ≤ ‖xₙₖₛ − x̂‖ ≤ ‖xₙₖₛ − x*‖ + ‖x* − x̂‖ = ‖xₙₖₛ − x*‖ → 0

We conclude that xₙ → x̂.
Example 1280 In the last example, assume that ρ(A) = n. By Theorem 854, the function g is strictly concave and supercoercive. So, the iterates

xₙ₊₁ = xₙ − (1/β)Aᵀ(Axₙ − b)   (28.25)

converge to the least squares solution x̂ = (AᵀA)⁻¹Aᵀb. The iteration does not require any matrix inversion. N
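The iteration (28.25) is easy to run on a small overdetermined system. In the sketch below, the matrix, the right-hand side, and the bound β = 4 on the largest eigenvalue of AᵀA are illustrative assumptions; the iterates approach the least squares solution without inverting any matrix.

```python
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [1.0, 2.0, 2.0]
beta = 4.0  # any upper bound on the eigenvalues of A^T A (here 1 and 3) works

def residual(x):  # computes A x - b
    return [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(3)]

x = [0.0, 0.0]
for n in range(400):
    r = residual(x)
    grad = [sum(A[i][j] * r[i] for i in range(3)) for j in range(2)]  # A^T (A x - b)
    x = [x[j] - grad[j] / beta for j in range(2)]  # iteration (28.25)

# for this system the least squares solution (A^T A)^{-1} A^T b is (2/3, 5/3)
```

Solving AᵀA x = Aᵀb directly for this system gives x̂ = (2/3, 5/3), which the iterates match to machine precision.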
Thus, for optimization problems featuring strictly concave and supercoercive objective functions, the sequence recursively defined via a decision procedure converges to the solution. If we make the stronger assumption that the objective function is strongly concave,⁸ then we can bound the rate of convergence of maximizing sequences to solutions.
Thus, for the sequence {xₙ} recursively defined via a decision procedure with speed k we have

‖xₙ − x̂‖ ≤ √c/(γ n^(k/2))

provided the objective function is strongly concave.

The proof of Proposition 1281 is an easy consequence of the following lemma, which sharpens for strongly concave functions a classic inequality that holds for concave functions (cf. Theorem 1117).
for all x, y ∈ U.

Proof By definition, there is k > 0 such that the function g : U → ℝ defined by g(x) = f(x) + k‖x‖² is concave. Then, for all x, y ∈ U we have

g(y) ≤ g(x) + ∇g(x) · (y − x)

so that

f(y) + k‖y‖² ≤ f(x) + k‖x‖² + (∇f(x) + 2kx) · (y − x)

We have

k‖x‖² + 2kx · (y − x) − k‖y‖² = k(‖x‖² − ‖y‖² + 2x·y − 2x·x) = −k‖x − y‖²

so that f(y) ≤ f(x) + ∇f(x) · (y − x) − k‖x − y‖², as desired.
Proof of Proposition 1281 Assume that f is strongly concave with constant k > 0. By (28.26), we have f(x) ≤ f(x̂) + ∇f(x̂) · (x − x̂) − k‖x − x̂‖² = f(x̂) − k‖x − x̂‖² for all x ∈ ℝⁿ. So, √k ‖x̂ − x‖ ≤ √(f(x̂) − f(x)) for all x ∈ ℝⁿ. In turn, by setting γ = √k, this easily implies the desired result.
where P_C : ℝⁿ → C is the projection operator (Section 24.10). Indeed, the projection ensures that the next iterate remains an element of the choice set C.

If ρ(A) = m, by (24.65) we have P_C(x) = x + Aᵀ(AAᵀ)⁻¹(b − Ax) for all x ∈ ℝⁿ. So

h(x) = P_C(x + (1/β)∇f(x)) = x + (1/β)∇f(x) + Aᵀ(AAᵀ)⁻¹ (b − A(x + (1/β)∇f(x)))

= x + (1/β)∇f(x) + Aᵀ(AAᵀ)⁻¹ b − Aᵀ(AAᵀ)⁻¹ A (x + (1/β)∇f(x))

provided f is differentiable.

(ii) Let C = ℝⁿ₊ be the positive orthant. Consider an optimization problem with this choice set. By (24.65), P_C(x) = x⁺ for all x ∈ ℝⁿ, so h(x) = (x + (1/β)∇f(x))⁺ provided f is differentiable. N
Finally, there exist "accelerated" decision procedures that have speed 2, i.e., for some constant c > 0 we have

f̂ − f(xₙ) ≤ c/n²

Roughly speaking, they have a bivariate form

yₙ₊₁ = xₙ + (1/β)∇f(xₙ)

xₙ₊₁ = αₙ yₙ₊₁ + γₙ yₙ

for suitable scalar coefficients αₙ and γₙ.
Chapter 29

Equality constraints
29.1 Introduction
The classic necessary condition for local extremal points given by Fermat's Theorem considers interior points of the choice set C, something that greatly limits its use in finding candidate solutions of optimization problems coming from economics. Indeed, in many of them the monotonicity hypotheses of Proposition 783 hold and, therefore, the possible solutions are on the boundary of the choice set, not in its interior. A classic example is the consumer problem

max_x u(x)  sub x ∈ B(p, w)   (29.1)

Under a standard hypothesis of monotonicity, by Walras' law the problem can be rewritten with the budget constraint holding as an equality, that is, over the budget line {x ∈ ℝⁿ₊ : p · x = w}. Since the budget line has an empty interior, Fermat's Theorem is useless for finding the candidate solutions of the consumer problem. The equality constraint, with its drastic topological consequences, deprives us of this fundamental result in the study of the consumer problem. Fortunately, there is an equally important result of Lagrange that rescues us, as this chapter will show.
the functions f and gᵢ are continuously differentiable on a non-empty and open subset D of their domain A; that is, ∅ ≠ D ⊆ int A.

The set

C = {x ∈ A : gᵢ(x) = bᵢ  ∀i = 1, ..., m}   (29.3)

is the subset of A identified by the constraints. Therefore, optimization problem (29.2) can be equivalently formulated in canonical form as

max_x f(x)  sub x ∈ C   (29.4)

Nevertheless, for this special class of optimization problems we will often use the more evocative formulation (29.2).

In what follows we will first study in detail the important special case of a single constraint, which we will then generalize in Section 29.7 to the case of several constraints.
The next fundamental lemma gives the key to finding the solutions of problem (29.4).

The hypothesis x̂ ∈ C ∩ D requires that x̂ be a point of the choice set at which f and g are both continuously differentiable. Moreover, we require that ∇g(x̂) ≠ 0. In this regard, note that a point x ∈ D is said to be regular (with respect to the constraint) if ∇g(x) ≠ 0, and singular otherwise. According to this terminology, the condition ∇g(x̂) ≠ 0 requires the point x̂ to be regular.

∇f(x̂) = λ̂ ∇g(x̂)   (29.5)

that is,

(∂f/∂xₖ)(x̂) = λ̂ (∂g/∂xₖ)(x̂)  ∀k = 1, ..., n

Thus, a necessary condition for x̂ to be a local solution of the optimization problem (29.4) is that the gradients of the functions f and g are proportional. The "hat" over λ reminds us that this scalar depends on the point x̂ considered.

Next we give a proof of this remarkable fact based on the Implicit Function Theorem.
29.3. ONE CONSTRAINT 891
Proof We prove the lemma for n = 2 (the extension to arbitrary n is routine if one uses a version of the Implicit Function Theorem for functions of n variables). Since ∇g(x̂) ≠ 0, at least one of the two partial derivatives ∂g/∂x₁ or ∂g/∂x₂ is non-zero at x̂. Let, for example, (∂g/∂x₂)(x̂) ≠ 0 (in the case (∂g/∂x₁)(x̂) ≠ 0 the proof is symmetric). As seen in Section 25.3.2, the Implicit Function Theorem can be applied also to study locally points belonging to the level curves g⁻¹(b) with b ∈ ℝ. Since x̂ = (x̂₁, x̂₂) ∈ g⁻¹(b), this theorem yields neighborhoods U(x̂₁) and V(x̂₂) and a unique differentiable function h : U(x̂₁) → V(x̂₂) such that x̂₂ = h(x̂₁) and g(x₁, h(x₁)) = b for each x₁ ∈ U(x̂₁), with

h′(x₁) = − (∂g/∂x₁)(x₁, x₂) / (∂g/∂x₂)(x₁, x₂)   ∀(x₁, x₂) ∈ g⁻¹(b) ∩ (U(x̂₁) × V(x̂₂))

Define the auxiliary function φ : U(x̂₁) → ℝ by φ(x₁) = f(x₁, h(x₁)). By the chain rule,

φ′(x₁) = (∂f/∂x₁)(x₁, h(x₁)) + (∂f/∂x₂)(x₁, h(x₁)) h′(x₁)

Since x̂ is a local solution of the optimization problem (29.4), there exists a neighborhood B_ε(x̂) of x̂ such that

f(x̂) ≥ f(x)  ∀x ∈ g⁻¹(b) ∩ B_ε(x̂)   (29.6)

Without loss of generality, suppose that ε is sufficiently small so that

(x̂₁ − ε, x̂₁ + ε) ⊆ U(x̂₁)  and  (x̂₂ − ε, x̂₂ + ε) ⊆ V(x̂₂)

Hence, B_ε(x̂) ⊆ U(x̂₁) × V(x̂₂). This permits us to rewrite (29.6) as

f(x̂₁, h(x̂₁)) ≥ f(x₁, h(x₁))  ∀x₁ ∈ (x̂₁ − ε, x̂₁ + ε)

that is, φ(x̂₁) ≥ φ(x₁) for every x₁ ∈ (x̂₁ − ε, x̂₁ + ε). The point x̂₁ is, therefore, a local maximizer for φ. The first-order condition reads

φ′(x̂₁) = (∂f/∂x₁)(x̂₁, x̂₂) − (∂f/∂x₂)(x̂₁, x̂₂) (∂g/∂x₁)(x̂₁, x̂₂)/(∂g/∂x₂)(x̂₁, x̂₂) = 0   (29.7)

If (∂g/∂x₁)(x̂₁, x̂₂) ≠ 0, we have

(∂f/∂x₁)(x̂₁, x̂₂)/(∂g/∂x₁)(x̂₁, x̂₂) = (∂f/∂x₂)(x̂₁, x̂₂)/(∂g/∂x₂)(x̂₁, x̂₂)

If (∂g/∂x₁)(x̂₁, x̂₂) = 0, then (29.7) yields

(∂f/∂x₁)(x̂₁, x̂₂) = 0
The next example shows that condition (29.5) is necessary, but not sufficient.

For instance, the problem

max_{x₁,x₂} (x₁³ + x₂³)/2  sub x₁ − x₂ = 0   (29.8)

is of the form (29.4), where f, g : ℝ² → ℝ are given by f(x) = (1/2)(x₁³ + x₂³) and g(x) = x₁ − x₂, while b = 0. We have ∇f(0, 0) = (0, 0) and ∇g(0, 0) = (1, −1), so λ̂ = 0 is such that ∇f(0, 0) = λ̂ ∇g(0, 0). Hence, the origin (0, 0) satisfies condition (29.5) with λ̂ = 0. But the origin is not a solution of problem (29.8): on the constraint we have f(t, t) = t³ > 0 = f(0, 0) for every t > 0. Note that the origin is not even a constrained (global) minimizer since f(t, t) = t³ < 0 for every t < 0. N
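The failure of sufficiency in this example can be checked mechanically. A small sketch (the gradients are computed by hand; the feasible test point is an arbitrary choice):

```python
def f(x1, x2):
    # the objective of problem (29.8)
    return (x1**3 + x2**3) / 2

grad_f_origin = (0.0, 0.0)   # grad f(x) = (3x1^2/2, 3x2^2/2), zero at the origin
grad_g = (1.0, -1.0)         # g(x) = x1 - x2 has constant gradient

lam = 0.0                    # condition (29.5) holds at the origin with lambda = 0
condition_holds = grad_f_origin == (lam * grad_g[0], lam * grad_g[1])

# yet the feasible point (t, t) with t > 0 beats the origin: f(t, t) = t^3 > 0
beats_origin = f(0.1, 0.1) > f(0.0, 0.0)
```

The Lagrange condition holds at the origin, yet a nearby feasible point attains a strictly larger value.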
((∂f/∂x₁)(x̂), (∂f/∂x₂)(x̂)) = λ̂ ((∂g/∂x₁)(x̂), (∂g/∂x₂)(x̂))

that is,

(∂f/∂x₁)(x̂) = λ̂ (∂g/∂x₁)(x̂)  and  (∂f/∂x₂)(x̂) = λ̂ (∂g/∂x₂)(x̂)   (29.10)

The condition ∇g(x̂) ≠ 0 means that at least one of the partial derivatives (∂g/∂xᵢ)(x̂) is different from zero. If, for convenience, we suppose that both are non-zero and that λ̂ ≠ 0, then (29.10) is equivalent to

(∂f/∂x₁)(x̂)/(∂g/∂x₁)(x̂) = (∂f/∂x₂)(x̂)/(∂g/∂x₂)(x̂)   (29.11)
The differentials of $f$ and $g$ at $\hat x$ are
$$df(\hat x)(h)=\nabla f(\hat x)\cdot h=\frac{\partial f}{\partial x_1}(\hat x)h_1+\frac{\partial f}{\partial x_2}(\hat x)h_2 \qquad \forall h\in\mathbb{R}^2$$
$$dg(\hat x)(h)=\nabla g(\hat x)\cdot h=\frac{\partial g}{\partial x_1}(\hat x)h_1+\frac{\partial g}{\partial x_2}(\hat x)h_2 \qquad \forall h\in\mathbb{R}^2$$
They linearly approximate the differences $f(\hat x+h)-f(\hat x)$ and $g(\hat x+h)-g(\hat x)$, that is, the effect of moving from $\hat x$ to $\hat x+h$ on $f$ and $g$. As we know well by now, such an approximation is the better the smaller $h$. Suppose, ideally, that $h$ is infinitesimal and that the approximation is exact, so that $f(\hat x+h)-f(\hat x)=df(\hat x)(h)$ and $g(\hat x+h)-g(\hat x)=dg(\hat x)(h)$. This is clearly incorrect formally, but here we are proceeding heuristically.
Continuing in our heuristic reasoning, let us start now from the point $\hat x$ and let us consider variations $\hat x+h$ with $h$ infinitesimal. The first issue to worry about is whether they are legitimate, i.e., whether they satisfy the equality constraint $g(\hat x+h)=b$. This means that $g(\hat x+h)=g(\hat x)$, so $h$ must be such that $dg(\hat x)(h)=0$. It follows that
$$\frac{\partial g}{\partial x_1}(\hat x)h_1+\frac{\partial g}{\partial x_2}(\hat x)h_2=0$$
and so
$$h_1=-\frac{\dfrac{\partial g}{\partial x_2}(\hat x)}{\dfrac{\partial g}{\partial x_1}(\hat x)}\,h_2 \tag{29.12}$$
The effect of moving from $\hat x$ to $\hat x+h$ on the objective function $f$ is given by $df(\hat x)(h)$. When $h$ is legitimate, by (29.12) this effect is given by
$$df(\hat x)(h)=-\frac{\partial f}{\partial x_1}(\hat x)\frac{\dfrac{\partial g}{\partial x_2}(\hat x)}{\dfrac{\partial g}{\partial x_1}(\hat x)}h_2+\frac{\partial f}{\partial x_2}(\hat x)h_2 \tag{29.13}$$
If $\hat x$ is a solution of the optimization problem, we must necessarily have $df(\hat x)(h)=0$ for every legitimate variation $h$. Otherwise, if, say, $df(\hat x)(h)>0$, one would have a point $\hat x+h$ that satisfies the equality constraint but is such that $f(\hat x+h)>f(\hat x)$. If instead $df(\hat x)(h)<0$, the same observation can be made this time for $-h$, which is obviously a legitimate variation, and that would lead to the point $\hat x-h$ with $f(\hat x-h)>f(\hat x)$.
The necessary condition $df(\hat x)(h)=0$, together with (29.13), gives
$$-\frac{\partial f}{\partial x_1}(\hat x)\frac{\dfrac{\partial g}{\partial x_2}(\hat x)}{\dfrac{\partial g}{\partial x_1}(\hat x)}h_2+\frac{\partial f}{\partial x_2}(\hat x)h_2=0$$
which is precisely expression (29.11). At an intuitive level, all this explains why (29.5) is necessary for $\hat x$ to be a solution of the problem.
894 CHAPTER 29. EQUALITY CONSTRAINTS
This function, called the Lagrangian, plays a key role in optimization problems. Its gradient is
$$\nabla L(x,\lambda)=\left(\frac{\partial L}{\partial x_1}(x,\lambda),\dots,\frac{\partial L}{\partial x_n}(x,\lambda),\frac{\partial L}{\partial\lambda}(x,\lambda)\right)\in\mathbb{R}^{n+1}$$
It is important to distinguish in this gradient the two parts $\nabla_xL$ and $\nabla_\lambda L$ given by
$$\nabla_xL(x,\lambda)=\left(\frac{\partial L}{\partial x_1}(x,\lambda),\dots,\frac{\partial L}{\partial x_n}(x,\lambda)\right)\in\mathbb{R}^n$$
and
$$\nabla_\lambda L(x,\lambda)=\frac{\partial L}{\partial\lambda}(x,\lambda)\in\mathbb{R}$$
Using this notation, we have
$$\nabla_xL(x,\lambda)=\nabla f(x)-\lambda\nabla g(x) \tag{29.15}$$
and
$$\nabla_\lambda L(x,\lambda)=b-g(x) \tag{29.16}$$
which leads to the following fundamental formulation of the necessary optimality condition of Lemma 1285 in terms of the Lagrangian function.

Proof Let $\hat x$ be a solution of the optimization problem (29.4). By Lemma 1285, there exists $\hat\lambda\in\mathbb{R}$ such that
$$\nabla f(\hat x)-\hat\lambda\nabla g(\hat x)=0$$
By (29.15), this condition is equivalent to
$$\nabla_xL(\hat x,\hat\lambda)=0$$
On the other hand, by (29.16) we have $\nabla_\lambda L(x,\lambda)=b-g(x)$, so we also have $\nabla_\lambda L(\hat x,\hat\lambda)=0$ since $b-g(\hat x)=0$. It follows that $(\hat x,\hat\lambda)$ is a stationary point of $L$.
Thanks to Lagrange's Theorem, the search for local solutions of the constrained optimization problem (29.4) reduces to the search for the stationary points of a suitable function of several variables, the Lagrangian function. It is a more complicated function than the
original function $f$ because of the new variable $\lambda$, but through it the search for the solutions of the optimization problem can be carried out by solving a standard first-order condition, similar to the ones seen for unconstrained optimization problems.
Needless to say, we are discussing a condition that is only necessary: there is no guarantee that the stationary points are actually solutions of the problem. It is already a remarkable achievement, however, to be able to use the simple (first-order) condition
$$\nabla L(x,\lambda)=0 \tag{29.17}$$
to search for the possible candidate solutions of the constrained optimization problem (29.4). In the next section we will see that this condition plays a fundamental role in the search for the local solutions of problem (29.4) with Lagrange's method, which in turn may lead to the global solutions through a version of the elimination method.
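The stationarity condition can be tested numerically. The following sketch is a hypothetical helper (not from the text) that checks, by finite differences, whether a pair $(x,\lambda)$ is a stationary point of the Lagrangian $L(x,\lambda)=f(x)+\lambda(b-g(x))$ for a two-variable problem; the illustrative problem $\max x_1x_2$ sub $x_1+x_2=2$ is likewise an assumption chosen for the demonstration.

```python
# Sketch: numerically test whether (x, lam) is a stationary point of
# L(x, lam) = f(x) + lam * (b - g(x)), using central finite differences.
def is_stationary(f, g, b, x, lam, h=1e-6, tol=1e-4):
    def L(x1, x2, l):
        return f(x1, x2) + l * (b - g(x1, x2))
    dLx1 = (L(x[0] + h, x[1], lam) - L(x[0] - h, x[1], lam)) / (2 * h)
    dLx2 = (L(x[0], x[1] + h, lam) - L(x[0], x[1] - h, lam)) / (2 * h)
    dLl = (L(x[0], x[1], lam + h) - L(x[0], x[1], lam - h)) / (2 * h)
    return max(abs(dLx1), abs(dLx2), abs(dLl)) < tol

# Illustration: max x1*x2 sub x1 + x2 = 2 has candidate x = (1, 1) with lam = 1.
assert is_stationary(lambda a, b_: a * b_, lambda a, b_: a + b_, 2.0, (1.0, 1.0), 1.0)
```

A point satisfying the constraint but not the first-order condition, such as $(1.5,0.5)$ here, is rejected by the same helper.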
We close with two important remarks. First, observe that in general the pair $(\hat x,\hat\lambda)$ is not a maximizer of the Lagrangian function, even when $\hat x$ turns out to solve the optimization problem. The pair $(\hat x,\hat\lambda)$ is just a stationary point of the Lagrangian function, nothing more. Therefore, it is erroneous to assert that the search for solutions of the constrained optimization problem reduces to the search for maximizers of the Lagrangian function.
Second, note that problem (29.4) has a symmetric version
$$\min_x f(x) \quad\text{sub } g(x)=b$$
in which, instead of looking for maximizers, we look for minimizers. Condition (29.5) is necessary also for this version of problem (29.4) and, therefore, the stationary points of the Lagrangian function could be minimizers instead of maximizers. It can also be the case that they are neither maximizers nor minimizers. This is the usual ambiguity of first-order conditions, encountered also in unconstrained optimization: it reflects the fact that first-order conditions are only necessary conditions.
On the other hand, again by a heuristic application of the chain rule we have
$$\frac{df(\hat x(b))}{db}=\frac{df(\hat x(b))}{dx}\frac{dx(b)}{db}=\left(\frac{\partial f(\hat x(b))}{\partial x}-\hat\lambda(b)\frac{\partial g(\hat x(b))}{\partial x}\right)\hat x'(b)+\hat\lambda(b)\frac{\partial g(\hat x(b))}{\partial x}\frac{dx(b)}{db}$$
where the term in parentheses is zero by (29.5), so that
$$\frac{df(\hat x(b))}{db}=\hat\lambda(b)\frac{\partial g(\hat x(b))}{\partial x}\frac{dx(b)}{db}=\hat\lambda(b)$$
where the last equality follows from (29.18). Summing up, for every scalar $b$ we have
$$\frac{df(\hat x(b))}{db}=\hat\lambda(b)$$
The multiplier is thus the "marginal maximum value" in that it quantifies the marginal effect on the attained maximum value of (slightly) altering the constraint. For instance, in the consumer problem the scalar $b$ is the income of the consumer, so the multiplier quantifies the marginal effect on the attained maximum utility of a (small) variation in income.

N.B. We are using the word "altering" rather than "relaxing" because by changing $b$ the choice set (29.3) does not get larger; it just becomes different. So, a priori, a change in $b$ might not be beneficial (indeed, the sign of the multiplier can be positive or negative). In contrast, the word "relaxing" becomes appropriate in studying variations of the scalars that define inequality constraints (cf. the discussion in Section 33.6). O
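The "marginal maximum value" reading can be illustrated numerically on a hypothetical problem (not from the text): $\max -(x_1^2+x_2^2)$ sub $x_1+x_2=b$, whose solution is $x_1=x_2=b/2$, with value function $v(b)=-b^2/2$ and multiplier $\hat\lambda(b)=-b$ from $\nabla f=\lambda\nabla g$.

```python
# Hedged illustration on the assumed problem max -(x1^2 + x2^2) sub x1 + x2 = b:
# substituting x2 = b - x1 gives x1 = x2 = b/2, so v(b) = -b^2/2 and lam(b) = -b.
def value(b):
    return -(b**2) / 2

def multiplier(b):
    return -b

b, h = 1.5, 1e-6
finite_diff = (value(b + h) - value(b - h)) / (2 * h)  # numerical dv/db

# The multiplier equals the marginal effect of altering the constraint scalar b.
assert abs(finite_diff - multiplier(b)) < 1e-6
```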
1. determine the set $D$ where the functions $f$ and $g$ are continuously differentiable;

2. determine the set $C-D$ of the points of the constraint set where the functions $f$ and $g$ are not continuously differentiable;

3. determine the set $D_0\subseteq D$ of the singular points, that is, the points $x\in D$ such that $\nabla g(x)=0$;

4. determine the set $S$ of the regular points $x\in C\cap(D-D_0)$ for which there exists a Lagrange multiplier $\lambda\in\mathbb{R}$ such that the pair $(x,\lambda)\in\mathbb{R}^{n+1}$ is a stationary point of the Lagrangian function, that is, it satisfies the first-order condition (29.17);¹
¹ Note that $S\subseteq C$ because the points that satisfy condition (29.17) also satisfy the constraint. It is therefore not necessary to check whether a point $x\in S$ also satisfies $x\in C$.
29.4. THE METHOD OF ELIMINATION 897
5. the local solutions of the optimization problem (29.4), if they exist, belong to the set
$$S\cup(C\cap D_0)\cup(C-D) \tag{29.19}$$

Thus, according to Lagrange's method, the possible local solutions of the optimization problem (29.4) must be searched for among the points of the subset (29.19) of $C$. Indeed, a local solution that is a regular point will belong to the set $S$ thanks to Lagrange's Theorem. However, this theorem does not say anything about possible local solutions that are singular points – and so belong to the set $C\cap D_0$ – as well as about possible local solutions where the functions do not have a continuous derivative – and so belong to the set $C-D$.
In conclusion, a necessary condition for a point $x\in C$ to be a local solution of the optimization problem (29.4) is that it belongs to the subset $S\cup(C\cap D_0)\cup(C-D)\subseteq C$. This is what this procedure, a key dividend of Lagrange's Theorem, establishes. Clearly, the smaller such a set is, the more effective the application of the theorem is: the search for local solutions can then be restricted to a significantly smaller set than the original set $C$.
That said, what about global solutions? If the objective function $f$ is coercive and continuous on $C$, the five phases of Lagrange's method plus the following extra sixth phase provide a version of the elimination method to find global solutions.
6. if $\hat x$ is such that
$$f(\hat x)\geq f(x) \qquad \forall x\in S\cup(C\cap D_0)\cup(C-D) \tag{29.20}$$
then $\hat x$ is a (global) solution of the optimization problem (29.4).
In other words, the points of the set (29.19) at which $f$ attains its maximum value are the solutions of the optimization problem. Indeed, by Lagrange's method this is the set of the possible local solutions; global solutions, whose existence is ensured by Tonelli's Theorem, must then belong to such a set. Hence, the solutions of the "restricted" optimization problem are also the solutions of the optimization problem (29.4). Phase 6 is based on this remarkable fact. As for Lagrange's method, the smaller the set (29.19) is, the more effective the application of the elimination method is. In particular, in the lucky case when it is a singleton, the elimination method determines the unique solution of the optimization problem, a remarkable achievement.
The optimization problem
$$\max_x e^{-\|x\|^2} \quad\text{sub } \sum_{i=1}^n x_i=1 \tag{29.22}$$
is of the form (29.4), where $f,g:\mathbb{R}^n\to\mathbb{R}$ are given by $f(x)=e^{-\|x\|^2}$ and $g(x)=\sum_{i=1}^n x_i$, and $b=1$. The functions are both continuously differentiable on the entire space, so $D=\mathbb{R}^n$. We then trivially have $C-D=\emptyset$: at all the points of the constraint set, the functions $f$ and $g$ are both continuously differentiable. We have therefore completed phases 1 and 2 of Lagrange's method.
Since $\nabla g(x)=(1,1,\dots,1)$, there are no singular points, that is, $D_0=\emptyset$. This completes phase 3 of Lagrange's method.
The Lagrangian function $L:\mathbb{R}^{n+1}\to\mathbb{R}$ is given by
$$L(x,\lambda)=e^{-\|x\|^2}+\lambda\left(1-\sum_{i=1}^n x_i\right)$$
To find the set of its stationary points, it is necessary to solve the first-order condition (29.17), given here by the following (nonlinear) system of $n+1$ equations:
$$\begin{cases}\dfrac{\partial L}{\partial x_i}=-2x_ie^{-\|x\|^2}-\lambda=0 & \forall i=1,\dots,n\\[1ex] \dfrac{\partial L}{\partial\lambda}=1-\sum_{i=1}^n x_i=0\end{cases}$$
We observe that no solution can have $\lambda=0$. Indeed, otherwise the first $n$ equations would imply $x_i=0$ for every $i$, which contradicts the last equation. It follows that every solution has $\lambda\neq 0$. The first $n$ equations yield
$$x_i=-\frac{\lambda}{2}e^{\|x\|^2}$$
and, upon substituting these values in the last equation, we get
$$1+\frac{n\lambda}{2}e^{\|x\|^2}=0$$
that is,
$$\lambda=-\frac{2}{n}e^{-\|x\|^2}$$
Substituting this value of $\lambda$ in any of the first $n$ equations we find $x_i=1/n$, so the only point $(x,\lambda)\in\mathbb{R}^{n+1}$ that satisfies the first-order condition (29.17) is
$$\left(\frac{1}{n},\frac{1}{n},\dots,\frac{1}{n},-\frac{2}{n}e^{-\frac{1}{n}}\right)$$
$$S\cup(C\cap D_0)\cup(C-D)=S \tag{29.23}$$
Thus, in this example the first-order condition (29.17) turns out to be necessary for any local solution of the optimization problem (29.22). The unique element of $S$ is, therefore, the only candidate to be a local solution of the problem. This completes Lagrange's method.
Turn now to the elimination method, which we can use since the continuous function $f$ is coercive on the (non-compact, being closed but unbounded) set
$$C=\left\{x=(x_1,\dots,x_n)\in\mathbb{R}^n:\sum_{i=1}^n x_i=1\right\}$$
Indeed:
$$(f\geq t)=\begin{cases}\mathbb{R}^n & \text{if } t\leq 0\\ \left\{x\in\mathbb{R}^n:\|x\|\leq\sqrt{-\lg t}\right\} & \text{if } t\in(0,1]\\ \emptyset & \text{if } t>1\end{cases}$$
so the set $(f\geq t)$ is compact and non-empty for each $t\in(0,1]$. Since the set in (29.23) is a singleton, the elimination method allows us to conclude that $(1/n,\dots,1/n)$ is the unique solution of the optimization problem (29.22). N
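The example can be corroborated numerically; the sketch below, for the case $n=3$, checks the reconstructed stationary point and multiplier against the first-order condition and compares the objective with a few other feasible points.

```python
import math

# Numeric sketch of example (29.22): max exp(-||x||^2) sub sum(x) = 1, here with n = 3.
n = 3
def f(x):
    return math.exp(-sum(xi * xi for xi in x))

xhat = [1.0 / n] * n
lam = -(2.0 / n) * math.exp(-1.0 / n)  # reconstructed multiplier value

# First-order condition grad L = 0: -2*xi*exp(-||x||^2) - lam = 0 for each i.
for xi in xhat:
    assert abs(-2 * xi * math.exp(-1.0 / n) - lam) < 1e-12

# Sample other points whose coordinates sum to 1: none does better than xhat.
for x in ([1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [2.0, -0.5, -0.5]):
    assert sum(x) == 1.0 and f(x) <= f(xhat)
```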
To find the set of its stationary points we need to solve the first-order condition (29.17), given here by the following (nonlinear) system of $n+1$ equations:
$$\begin{cases}\dfrac{\partial L}{\partial x_i}=\dfrac{p_i}{x_i}-\lambda=0 & \forall i=1,\dots,n\\[1ex] \dfrac{\partial L}{\partial\lambda}=1-\sum_{i=1}^n x_i=0\end{cases}$$
Because the coordinates of the vector $p$ are all different from zero,² one cannot have $\lambda=0$ for any solution. It follows that for each solution $\lambda\neq 0$. Because $x\in\mathbb{R}^n_{++}$, the first $n$ equations imply $p_i=\lambda x_i$, and by substituting these values in the last equation we find $\lambda=\sum_{i=1}^n p_i$. Then, by substituting this value of $\lambda$ in each of the first $n$ equations, we find $x_i=p_i/\sum_{i=1}^n p_i$. Thus, the unique point $(x,\lambda)\in\mathbb{R}^{n+1}$ that satisfies the first-order condition (29.17) is
$$\left(\frac{p_1}{\sum_{i=1}^n p_i},\frac{p_2}{\sum_{i=1}^n p_i},\dots,\frac{p_n}{\sum_{i=1}^n p_i},\ \sum_{i=1}^n p_i\right)$$
so that $S$ is the singleton
$$S=\left\{\left(\frac{p_1}{\sum_{i=1}^n p_i},\frac{p_2}{\sum_{i=1}^n p_i},\dots,\frac{p_n}{\sum_{i=1}^n p_i}\right)\right\}$$
² That is, all coordinates of $p$ are either strictly positive or strictly negative.
$$S\cup(C\cap D_0)\cup(C-D)=S \tag{29.25}$$
Thus, also in this example the first-order condition (29.17) is necessary for each local solution of the optimization problem (29.24). Again, the unique element of $S$ is the only candidate to be a local solution of the problem. This completes Lagrange's method.
We can apply the elimination method because the continuous function $f$ is, by Lemma 847, also coercive on the set $C=\left\{x\in\mathbb{R}^n_{++}:\sum_{i=1}^n x_i=1\right\}$, which is not compact because it is not closed. In view of (29.25), the elimination method implies that
$$\left(\frac{p_1}{\sum_{i=1}^n p_i},\dots,\frac{p_n}{\sum_{i=1}^n p_i}\right)$$
is the unique solution of the optimization problem (29.24). N
When the elimination method is based on Weierstrass' Theorem, rather than on the weaker (but more widely applicable) Tonelli's Theorem, as a "by-product" we can also find the global minimizers, that is, the points $x\in C$ that solve the problem $\min_x f(x)$ sub $x\in C$. Indeed, it is easy to see that such are the points that minimize $f$ over $S\cup(C\cap D_0)\cup(C-D)$. Clearly, this is no longer true with Tonelli's Theorem because it only ensures the existence of maximizers and remains silent on possible minimizers.
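The example around (29.24) can be verified numerically as well. The sketch below assumes the objective $u(x)=\sum_i p_i\log x_i$ (reconstructed from the first-order system $p_i/x_i=\lambda$) with constraint $\sum_i x_i=1$, and checks the candidate $x_i=p_i/\sum_j p_j$ with multiplier $\lambda=\sum_j p_j$.

```python
# Check of the example around (29.24), assuming the objective u(x) = sum_i p_i*log(x_i)
# (reconstructed from the first-order condition p_i / x_i = lam) with sum(x) = 1.
p = [2.0, 3.0, 5.0]
total = sum(p)
xhat = [pi / total for pi in p]
lam = total

# First-order condition: p_i / x_i - lam = 0 for each i, and the constraint holds.
for pi, xi in zip(p, xhat):
    assert abs(pi / xi - lam) < 1e-9
assert abs(sum(xhat) - 1.0) < 1e-12
```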
The optimization problem
$$\max_{x_1,x_2} 2x_1^2+5x_2^2 \quad\text{sub } x_1^2+x_2^2=1 \tag{29.26}$$
is of the form (29.4), with $f,g:\mathbb{R}^2\to\mathbb{R}$ given by $f(x_1,x_2)=2x_1^2+5x_2^2$ and $g(x_1,x_2)=x_1^2+x_2^2$, while $b=1$. Both $f$ and $g$ are continuously differentiable on the entire plane, so $D=\mathbb{R}^2$. Hence, $C-D=\emptyset$: at all the points of the constraint set the functions $f$ and $g$ are continuously differentiable. This completes phases 1 and 2 of Lagrange's method.
We have $\nabla g(x)=(2x_1,2x_2)$, so the origin $(0,0)$ is the unique singular point, that is, $D_0=\{(0,0)\}$. This singular point does not satisfy the constraint, so $C\cap D_0=\emptyset$. This completes phase 3 of Lagrange's method.
The Lagrangian function $L:\mathbb{R}^3\to\mathbb{R}$ is given by
$$L(x_1,x_2,\lambda)=2x_1^2+5x_2^2+\lambda\left(1-x_1^2-x_2^2\right)$$
To find the set of its stationary points we must solve the first-order condition (29.17):
$$\begin{cases}\dfrac{\partial L}{\partial x_1}=4x_1-2\lambda x_1=0\\[1ex] \dfrac{\partial L}{\partial x_2}=10x_2-2\lambda x_2=0\\[1ex] \dfrac{\partial L}{\partial\lambda}=1-x_1^2-x_2^2=0\end{cases}$$
Its solutions are $(0,1,5)$, $(0,-1,5)$, $(1,0,2)$ and $(-1,0,2)$,³ so that
$$S=\{(0,1),(0,-1),(1,0),(-1,0)\}$$
As in the last two examples, the first-order condition is necessary for any local solution of the optimization problem (29.26).
Having completed Lagrange's method, let us turn to the elimination method to find the global solutions. Since the set $C=\left\{(x_1,x_2)\in\mathbb{R}^2:x_1^2+x_2^2=1\right\}$ is compact and the function $f$ is continuous, we can use the method through Weierstrass' Theorem. In view of (29.27), in phase 6 we have
$$f(0,1)=f(0,-1)=5 \quad\text{and}\quad f(1,0)=f(-1,0)=2$$
The points $(0,1)$ and $(0,-1)$ are thus the (global) solutions of the optimization problem (29.26), while the reliance here of the elimination method on Weierstrass' Theorem makes it possible to say that the points $(1,0)$ and $(-1,0)$ are global minimizers. N
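A brute-force check of this example, assuming the reconstruction $f(x_1,x_2)=2x_1^2+5x_2^2$, parametrizes the unit circle and confirms the maximum value $5$ near $(0,\pm 1)$ and the minimum value $2$ near $(\pm 1,0)$.

```python
import math

# Grid check of example (29.26): max 2*x1^2 + 5*x2^2 on the unit circle x1^2 + x2^2 = 1.
def f(x1, x2):
    return 2 * x1**2 + 5 * x2**2

best = max(
    ((math.cos(t), math.sin(t)) for t in (k * 2 * math.pi / 100000 for k in range(100000))),
    key=lambda point: f(*point),
)
# The maximum value 5 is attained near (0, +-1); the minimum value 2 near (+-1, 0).
assert abs(f(*best) - 5) < 1e-6
assert abs(abs(best[1]) - 1) < 1e-2
```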
The optimization problem
$$\max_{x_1,x_2} e^{-x_1} \quad\text{sub } x_1^3-x_2^2=0 \tag{29.28}$$
is of the form (29.4), with $f,g:\mathbb{R}^2\to\mathbb{R}$ given by $f(x)=e^{-x_1}$ and $g(x)=x_1^3-x_2^2$, and $b=0$. We have $D=\mathbb{R}^2$, hence $C-D=\emptyset$. Phases 1 and 2 of Lagrange's method have been completed.
Moreover, we have
$$\nabla g(x)=\left(3x_1^2,-2x_2\right)$$
so the origin $(0,0)$ is the unique singular point and it also satisfies the constraint, i.e., $D_0=C\cap D_0=\{(0,0)\}$. This completes phase 3 of Lagrange's method.
The Lagrangian function $L:\mathbb{R}^3\to\mathbb{R}$ is given by
$$L(x_1,x_2,\lambda)=e^{-x_1}+\lambda\left(x_2^2-x_1^3\right)$$
³ Note that there are no other points that satisfy $\nabla L=0$. Indeed, suppose that $\nabla L(\hat x_1,\hat x_2,\hat\lambda)=0$, with $\hat x_1\neq 0$ and $\hat x_2\neq 0$. Then, from $\partial L/\partial x_1=0$ we deduce $\lambda=2$, whereas from $\partial L/\partial x_2=0$ we deduce $\lambda=5$.
To find the set of its stationary points, we need to solve the first-order condition (29.17), given here by the following (nonlinear) system of three equations:
$$\begin{cases}\dfrac{\partial L}{\partial x_1}=-e^{-x_1}-3\lambda x_1^2=0\\[1ex] \dfrac{\partial L}{\partial x_2}=2\lambda x_2=0\\[1ex] \dfrac{\partial L}{\partial\lambda}=x_2^2-x_1^3=0\end{cases}$$
Note that no solution can have $\lambda=0$. Indeed, for $\lambda=0$ the first equation becomes $-e^{-x_1}=0$, which does not have solutions. Let us suppose therefore $\lambda\neq 0$. The second equation implies $x_2=0$, hence from the third one it follows that $x_1=0$. The first equation then becomes $-1=0$, and this contradiction shows that the system does not have solutions. Therefore, there are no points that satisfy the first-order condition (29.17), so $S=\emptyset$. Phase 4 of Lagrange's method shows that
$$S\cup(C\cap D_0)\cup(C-D)=C\cap D_0=\{(0,0)\}$$
By Lagrange's method, the unique possible local solution of the optimization problem (29.28) is the origin $(0,0)$.
Turn now to the elimination method. To use it we need to show that the continuous $f$ is coercive on the (non-compact, being closed but unbounded) set $C=\left\{(x_1,x_2)\in\mathbb{R}^2:x_1^3=x_2^2\right\}$. Note that:
$$(f\geq t)=\begin{cases}\mathbb{R}^2 & \text{if } t\leq 0\\ (-\infty,-\lg t]\times\mathbb{R} & \text{if } t\in(0,1]\\ \emptyset & \text{if } t>1\end{cases}$$
Thus, $f$ is not coercive on the entire plane but it is coercive on $C$, which is all that matters here. Indeed, note that $x_1$ can satisfy the constraint $x_1^3=x_2^2$ only if $x_1\geq 0$, so that $C\subseteq\mathbb{R}_+\times\mathbb{R}$ and
where $C(p,w)=\{x\in A:p\cdot x=w\}$, with strictly positive prices $p\gg 0$. To best solve this problem with the differential methods of this chapter, assume also that the utility function $u:A\subseteq\mathbb{R}^n_+\to\mathbb{R}$ is continuously differentiable on $\operatorname{int}A$.⁴
For instance, consumer problems that satisfy such assumptions are those featuring a log-linear utility function $u:\mathbb{R}^n_{++}\to\mathbb{R}$ defined by $u(x)=\sum_{i=1}^n a_i\log x_i$, with $A=\operatorname{int}A=\mathbb{R}^n_{++}$, or a separable utility function $u:\mathbb{R}^n_+\to\mathbb{R}$ defined by $u(x)=\sum_{i=1}^n x_i$, with $A=\mathbb{R}^n_+$.
Let us first find the local solutions of the consumer problem through Lagrange's method. The function $g(x)=p\cdot x$ expresses the constraint, and $u$ is continuously differentiable on $\operatorname{int}A$, so $D=\operatorname{int}A$. Hence, the set $C(p,w)-D$ consists of the boundary points of $A$ that satisfy the constraint.⁵ Note that when $A=\operatorname{int}A$, as in the log-linear case, we have $C(p,w)-D=\emptyset$.
From
$$\nabla g(x)=p \qquad \forall x\in\mathbb{R}^n$$
it follows that there are no singular points, that is, $D_0=\emptyset$. Hence,
$$C(p,w)\cap D_0=\emptyset$$
The Lagrangian function is
$$L(x,\lambda)=u(x)+\lambda(w-p\cdot x)$$
so to find the set of its stationary points it is necessary to solve the first-order condition:
$$\begin{cases}\dfrac{\partial L}{\partial x_1}(x,\lambda)=\dfrac{\partial u(x)}{\partial x_1}-\lambda p_1=0\\ \quad\vdots\\ \dfrac{\partial L}{\partial x_n}(x,\lambda)=\dfrac{\partial u(x)}{\partial x_n}-\lambda p_n=0\\[1ex] \dfrac{\partial L}{\partial\lambda}(x,\lambda)=w-p\cdot x=0\end{cases}$$
In a more compact way, we write
$$\frac{\partial u(x)}{\partial x_i}=\lambda p_i \qquad \forall i=1,\dots,n \tag{29.30}$$
$$p\cdot x=w \tag{29.31}$$
The fundamental condition (29.30) is read in a different way according to the interpretation, cardinalist or ordinalist, of the utility function. Let us suppose, for simplicity, that $\lambda\neq 0$. In the cardinalist interpretation, the condition is recast in the equivalent form
$$\frac{\partial u(x)/\partial x_1}{p_1}=\dots=\frac{\partial u(x)/\partial x_n}{p_n}$$
⁴ Note that $A\subseteq\mathbb{R}^n_+$ implies $\operatorname{int}A\subseteq\mathbb{R}^n_{++}$, i.e., the interior points of $A$ have strictly positive coordinates.
⁵ Here the choice set $C(p,w)$ is by definition included in the domain $A$, so $\partial A\cap A\cap C(p,w)=\partial A\cap C(p,w)$.
which emphasizes that, at a bundle $x$ which is a (local) solution of the consumer problem, the marginal utilities of the income spent for the various goods, measured by the ratios
$$\frac{\partial u(x)/\partial x_i}{p_i}$$
are all equal. Note that $1/p_i$ is the quantity of good $i$ that can be purchased with one unit of income.
In the ordinalist interpretation, where the notion of marginal utility becomes meaningless, condition (29.30) is rewritten as
$$\frac{\partial u(x)/\partial x_i}{\partial u(x)/\partial x_j}=\frac{p_i}{p_j}$$
for every pair of goods $i$ and $j$ of the solution bundle $x$. At such a bundle, therefore, the marginal rate of substitution between each pair of goods must be equal to the ratio between their prices, that is, $MRS_{x_i,x_j}=p_i/p_j$. For $n=2$ we have the classic geometric interpretation of the optimality condition for a bundle $(x_1,x_2)$ as the equality between the slope of the indifference curve (in the sense of Section 25.3.2) and the slope of the straight line of the budget constraint.
[Figure: an indifference curve tangent to the budget line in the $(x_1,x_2)$ plane]
The ordinalist interpretation does not require the cardinalist notion of marginal utility, a notion that – by Occam's razor – becomes thus superfluous for the study of the consumer problem. The observation dates back to a classic 1900 work of Vilfredo Pareto and represented a turning point in the history of utility theory, so much so that we talk of an "ordinalist revolution".
In any case, relations (29.30) and (29.31) are first-order conditions for the consumer problem and their resolution determines the set $S$ of the stationary points. In conclusion, Lagrange's method implies that the local solutions of the consumer problem must be looked for among the points of the set
$$S\cup\left(\partial A\cap C(p,w)\right) \tag{29.32}$$
29.5. THE CONSUMER PROBLEM 905
Besides points that satisfy the first-order conditions (29.30) and (29.31), local solutions can therefore be boundary points $\partial A\cap C(p,w)$ of the set $A$ that satisfy the constraint.⁶
When $u$ is coercive and continuous on $C(p,w)$, we can apply the elimination method to find the (global) solutions of the consumer problem, that is, the optimal bundles (which are the economically meaningful notion: consumers do not care about bundles that are just locally optimal). In view of (29.32), the solutions are the bundles $\hat x\in S\cup\left(\partial A\cap C(p,w)\right)$ such that
$$u(\hat x)\geq u(x) \qquad \forall x\in S\cup\left(\partial A\cap C(p,w)\right)$$
In other words, we have to compare the utility levels attained by the stationary points in $S$ and by the boundary points that satisfy the constraint in $\partial A\cap C(p,w)$. As the comparison requires the computation of all these utility levels, the smaller the set $S\cup\left(\partial A\cap C(p,w)\right)$, the more effective the elimination method.
Example 1292 Consider the log-linear utility function in the case $n=2$, i.e.,
$$u(x_1,x_2)=a\log x_1+(1-a)\log x_2$$
with $a\in(0,1)$. The first-order conditions at every $(x_1,x_2)\in\mathbb{R}^2_{++}$ take the form
$$\frac{a}{x_1}=\lambda p_1, \qquad \frac{1-a}{x_2}=\lambda p_2 \tag{29.33}$$
$$p_1x_1+p_2x_2=w \tag{29.34}$$
Relation (29.33) implies
$$\frac{a}{p_1x_1}=\frac{1-a}{p_2x_2}$$
Substituting this in (29.34), we have
$$p_1x_1+\frac{1-a}{a}p_1x_1=w$$
and hence
$$x_1=a\frac{w}{p_1}, \qquad x_2=(1-a)\frac{w}{p_2}$$
In conclusion,
$$S=\left\{\left(a\frac{w}{p_1},(1-a)\frac{w}{p_2}\right)\right\} \tag{29.35}$$
Since $\partial A=\emptyset$, we have $\partial A\cap C(p,w)=\emptyset$. By Lagrange's method, the unique possible local solution of the consumer problem is the bundle
$$x=\left(a\frac{w}{p_1},(1-a)\frac{w}{p_2}\right) \tag{29.36}$$
We turn now to the elimination method, which we can use because the continuous function $u$ is, by Lemma 847, coercive on the set $C(p,w)=\left\{x\in\mathbb{R}^2_{++}:p_1x_1+p_2x_2=w\right\}$, which is not compact since it is not closed. In view of (29.35), the elimination method implies that the bundle (29.36) is the unique solution of the log-linear consumer problem, that is, the unique optimal bundle. Note that this finding confirms what we already proved and discussed in Section 18.7, in a more general and elegant way through Jensen's inequality. N
⁶ When $A=\mathbb{R}^n_+$, they lie on the axes and are called corner solutions in the economics jargon (as remarked earlier in the book). In the case $n=2$ and $A=\mathbb{R}^2_+$, corner solutions can be $(0,w/p_2)$ and $(w/p_1,0)$.
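The log-linear demand of Example 1292 can be checked numerically: walking along the budget line, no feasible bundle beats the candidate $(aw/p_1,(1-a)w/p_2)$. The parameter values below are illustrative choices, not from the text.

```python
import math

# Numeric check of Example 1292: with u = a*log(x1) + (1-a)*log(x2) and budget
# p1*x1 + p2*x2 = w, the candidate optimal bundle is (a*w/p1, (1-a)*w/p2).
a, p1, p2, w = 0.3, 2.0, 5.0, 10.0
def u(x1, x2):
    return a * math.log(x1) + (1 - a) * math.log(x2)

xstar = (a * w / p1, (1 - a) * w / p2)

# Walk along the budget line: no feasible bundle attains higher utility.
for k in range(1, 1000):
    x1 = (w / p1) * k / 1000          # x1 in (0, w/p1)
    x2 = (w - p1 * x1) / p2           # spend the rest of income on good 2
    assert u(x1, x2) <= u(*xstar) + 1e-12
```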
where $g=(g_1,\dots,g_m):A\subseteq\mathbb{R}^n\to\mathbb{R}^m$ and $b=(b_1,\dots,b_m)\in\mathbb{R}^m$. The functions $f$ and $g_i$ are all assumed to be continuously differentiable on a non-empty open subset $D\subseteq A$. Thus, at all points $x\in D$ we can define the Jacobian matrix $Dg(x)$ by
$$Dg(x)=\begin{bmatrix}\nabla g_1(x)\\ \nabla g_2(x)\\ \vdots\\ \nabla g_m(x)\end{bmatrix}$$
A point $x\in D$ is called regular (with respect to the constraints) if $Dg(x)$ has full rank; otherwise it is called singular. For instance, the Jacobian $Dg(\hat x)$ has full rank if the gradients $\nabla g_1(\hat x),\dots,\nabla g_m(\hat x)$ are linearly independent vectors of $\mathbb{R}^n$. In such a case, the full rank condition requires $m\leq n$, that is, that the number $m$ of constraints not exceed the dimension $n$ of the space.
Two observations about regularity: (i) when $m=n$, the Jacobian has full rank if and only if it is a non-singular square matrix, that is, $\det Dg(x)\neq 0$;⁷ (ii) when $m=1$, we have $Dg(x)=\nabla g(x)$, and so the full rank condition amounts to requiring $\nabla g(x)\neq 0$, which brings us back to the notions of regular and singular points seen in the case $m=1$ of a single constraint.
The following result extends Lemma 1285 to the case with multiple constraints and shows that the regularity condition $\nabla g(\hat x)\neq 0$ from that lemma can be generalized by requiring the Jacobian $Dg(\hat x)$ to have full rank. In other words, $\hat x$ must not be a singular point here either.⁸

Lemma 1294 Let $\hat x\in C\cap D$ be a local solution of the optimization problem (29.37). If $Dg(\hat x)$ has full rank, then there is a vector $\hat\lambda\in\mathbb{R}^m$ such that
$$\nabla f(\hat x)=\sum_{i=1}^m\hat\lambda_i\nabla g_i(\hat x) \tag{29.38}$$
The Lagrangian function $L:A\times\mathbb{R}^m\to\mathbb{R}$ is now given by
$$L(x,\lambda)=f(x)+\sum_{i=1}^m\lambda_i\left(b_i-g_i(x)\right) \tag{29.39}$$
for every $(x,\lambda)\in A\times\mathbb{R}^m$, and Lagrange's Theorem takes the following general form.
⁷ So, in this case a point $x$ is singular if its Jacobian matrix $Dg(x)$ is a singular matrix. The notion of singular point is thus consistent with the notion of singular matrix (Section 13.6.6).
⁸ We omit the proof, which generalizes that of Lemma 1285 by means of a suitable version of the Implicit Function Theorem. We then also omit the simple proof of Theorem 1295, which is similar to that of the special case of a single constraint.
The comments that we made for Lagrange's Theorem also hold in this more general case. In particular, the search for local candidate solutions of the constrained problem must still be conducted following Lagrange's method, while the elimination method can still be used to check whether such local candidates actually solve the optimum problem. The examples will momentarily illustrate all this.
From an operational standpoint, note however that the first-order condition
$$\nabla L(x,\lambda)=0$$
is now based on a Lagrangian $L$ that has the more complex form (29.39). The form of the set of singular points $D_0$ is also more complex: the study of the Jacobian's determinant may be involved, thus making the search for singular points quite hard. The best course is often to look directly for the singular points that satisfy the constraints – i.e., for the set $C\cap D_0$ – instead of trying to determine the set $D_0$ first and the intersection $C\cap D_0$ afterwards (as we did in the case with one constraint). The points $x\in C\cap D_0$ are such that $g_i(x)=b_i$ and the gradients $\nabla g_i(x)$ are linearly dependent. So, we must verify whether the system
$$\begin{cases}\sum_{i=1}^m\lambda_i\nabla g_i(x)=0\\ g_1(x)=b_1\\ \quad\vdots\\ g_m(x)=b_m\end{cases}$$
admits solutions $(x,\lambda)\in\mathbb{R}^n\times\mathbb{R}^m$ with $\lambda=(\lambda_1,\dots,\lambda_m)\neq 0$, that is, with $\lambda_i$ that are not all null. Such possible solutions identify the singular points that satisfy the constraints. To ease calculations, it is useful to note that the system can be written as
$$\begin{cases}\sum_{i=1}^m\lambda_i\dfrac{\partial g_i(x)}{\partial x_1}=0\\ \quad\vdots\\ \sum_{i=1}^m\lambda_i\dfrac{\partial g_i(x)}{\partial x_n}=0\\ g_1(x)=b_1\\ \quad\vdots\\ g_m(x)=b_m\end{cases} \tag{29.40}$$
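Linear dependence of the gradients can also be detected numerically. The sketch below, a hypothetical helper, tests whether a $2\times 3$ Jacobian has full rank via the Gram determinant $\det(Dg\,Dg^{\mathsf T})$, using the constraints $g_1=x_1^2+x_2^2$ and $g_2=x_1+x_2-x_3$ that appear in the example below.

```python
# Sketch: test regularity of a point by checking whether the 2x3 Jacobian Dg(x)
# has full rank, via the Gram determinant det(Dg * Dg^T) != 0.
def gram_det(r1, r2):
    a = sum(u * u for u in r1)
    b = sum(u * v for u, v in zip(r1, r2))
    c = sum(v * v for v in r2)
    return a * c - b * b

# Constraints g1 = x1^2 + x2^2 and g2 = x1 + x2 - x3 (as in the next example).
def Dg(x1, x2, x3):
    return (2 * x1, 2 * x2, 0.0), (1.0, 1.0, -1.0)

assert gram_det(*Dg(1.0, 0.0, 0.0)) != 0   # regular point
assert gram_det(*Dg(0.0, 0.0, 0.0)) == 0   # singular: the first row vanishes
```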
29.7. SEVERAL CONSTRAINTS 909
The optimization problem
$$\max_x 7x_1-3x_3 \quad\text{sub } x_1^2+x_2^2=1 \text{ and } x_1+x_2-x_3=1 \tag{29.41}$$
has the form (29.37), where $f:\mathbb{R}^3\to\mathbb{R}$ and $g=(g_1,g_2):\mathbb{R}^3\to\mathbb{R}^2$ are given by $f(x_1,x_2,x_3)=7x_1-3x_3$, $g_1(x_1,x_2,x_3)=x_1^2+x_2^2$ and $g_2(x_1,x_2,x_3)=x_1+x_2-x_3$, while $b=(1,1)\in\mathbb{R}^2$.
These functions are all continuously differentiable on $\mathbb{R}^3$, so $D=\mathbb{R}^3$. Hence, $C-D=\emptyset$: at all points of the constraint set, the functions $f$, $g_1$ and $g_2$ are all continuously differentiable. This completes phases 1 and 2 of Lagrange's method.
Let us find the singular points satisfying the constraints, that is, the set $C\cap D_0$. The system (29.40) becomes
$$\begin{cases}2\lambda_1x_1+\lambda_2=0\\ 2\lambda_1x_2+\lambda_2=0\\ -\lambda_2=0\\ x_1^2+x_2^2=1\\ x_1+x_2-x_3=1\end{cases}$$
Since $\lambda_2=0$, $\lambda_1$ must be different from $0$. This implies that $x_1=x_2=0$, thus contradicting the fourth equation. Therefore, there are no singular points satisfying the constraints, that is, $C\cap D_0=\emptyset$. Phase 3 of Lagrange's method is thus completed.
The Lagrangian $L:\mathbb{R}^5\to\mathbb{R}$ is
$$L(x_1,x_2,x_3,\lambda_1,\lambda_2)=7x_1-3x_3+\lambda_1\left(1-x_1^2-x_2^2\right)+\lambda_2\left(1-x_1-x_2+x_3\right)$$
To find the set of its stationary points we must solve the first-order condition (29.17), which is given by the following (nonlinear) system of five equations
$$\begin{cases}\dfrac{\partial L}{\partial x_1}=7-2\lambda_1x_1-\lambda_2=0\\[1ex] \dfrac{\partial L}{\partial x_2}=-2\lambda_1x_2-\lambda_2=0\\[1ex] \dfrac{\partial L}{\partial x_3}=-3+\lambda_2=0\\[1ex] \dfrac{\partial L}{\partial\lambda_1}=1-x_1^2-x_2^2=0\\[1ex] \dfrac{\partial L}{\partial\lambda_2}=1-x_1-x_2+x_3=0\end{cases}$$
in the five unknowns $x_1$, $x_2$, $x_3$, $\lambda_1$ and $\lambda_2$. The third equation implies $\lambda_2=3$, so the first equation implies that $\lambda_1\neq 0$. Therefore, from the first two equations it follows that $x_1=2/\lambda_1$ and $x_2=-3/(2\lambda_1)$. By substituting into the fourth equation we get that $\lambda_1=\pm 5/2$. If $\lambda_1=5/2$, we have $x_1=4/5$, $x_2=-3/5$ and $x_3=-4/5$. If $\lambda_1=-5/2$, we have $x_1=-4/5$, $x_2=3/5$ and $x_3=-6/5$. We have thus found the two stationary points of the Lagrangian
$$\left(\frac{4}{5},-\frac{3}{5},-\frac{4}{5},\frac{5}{2},3\right) \quad\text{and}\quad \left(-\frac{4}{5},\frac{3}{5},-\frac{6}{5},-\frac{5}{2},3\right)$$
so that
$$S=\left\{\left(\frac{4}{5},-\frac{3}{5},-\frac{4}{5}\right),\left(-\frac{4}{5},\frac{3}{5},-\frac{6}{5}\right)\right\}$$
thus completing all phases of Lagrange's method. Since $C-D=\emptyset$ and $C\cap D_0=\emptyset$, we conclude that
$$S\cup(C\cap D_0)\cup(C-D)=S=\left\{\left(\frac{4}{5},-\frac{3}{5},-\frac{4}{5}\right),\left(-\frac{4}{5},\frac{3}{5},-\frac{6}{5}\right)\right\} \tag{29.42}$$
thus proving that in the example the first-order condition (29.17) is necessary for any local solution of the optimization problem (29.41).
We now turn to the elimination method. Clearly, the set
$$C=\left\{(x_1,x_2,x_3)\in\mathbb{R}^3:x_1^2+x_2^2=1,\ x_1+x_2-x_3=1\right\}$$
is closed. It is also bounded (and so compact). For the $x_1$ and $x_2$ such that $x_1^2+x_2^2=1$ we have $x_1,x_2\in[-1,1]$, while for the $x_3$ such that $x_3=x_1+x_2-1$ and $x_1,x_2\in[-1,1]$ we have $x_3\in[-3,1]$. It follows that $C\subseteq[-1,1]\times[-1,1]\times[-3,1]$, and so $C$ is bounded. Since $f$ is continuous, we can thus use the elimination method through Weierstrass' Theorem. In view of (29.42), in the last phase of the elimination method we have
$$f\left(\frac{4}{5},-\frac{3}{5},-\frac{4}{5}\right)=8 \quad\text{and}\quad f\left(-\frac{4}{5},\frac{3}{5},-\frac{6}{5}\right)=-2$$
Hence, $(4/5,-3/5,-4/5)$ solves the optimum problem (29.41), while $(-4/5,3/5,-6/5)$ is a minimizer. N
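This example lends itself to a brute-force check: parametrizing the circle by $x_1=\cos t$, $x_2=\sin t$ and pinning down $x_3$ with the second constraint reduces the problem to one variable.

```python
import math

# Grid check of example (29.41): max 7*x1 - 3*x3 sub x1^2 + x2^2 = 1, x1 + x2 - x3 = 1.
def f_on_circle(t):
    x1, x2 = math.cos(t), math.sin(t)
    x3 = x1 + x2 - 1          # the second constraint pins down x3
    return 7 * x1 - 3 * x3, (x1, x2, x3)

values = [f_on_circle(k * 2 * math.pi / 100000) for k in range(100000)]
fmax, xmax = max(values)
fmin, xmin = min(values)

# Candidates found by Lagrange's method: (4/5, -3/5, -4/5) with f = 8 (maximizer)
# and (-4/5, 3/5, -6/5) with f = -2 (minimizer).
assert abs(fmax - 8) < 1e-6
assert all(abs(c - d) < 1e-2 for c, d in zip(xmax, (0.8, -0.6, -0.8)))
assert abs(fmin - (-2)) < 1e-6
```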
The optimization problem
$$\max_x -x_1^2 \quad\text{sub } -x_1^2+x_2^3=0 \text{ and } x_3^2+x_2^2-2x_2=0 \tag{29.43}$$
has also the form (29.37), where $f:\mathbb{R}^3\to\mathbb{R}$ and $g=(g_1,g_2):\mathbb{R}^3\to\mathbb{R}^2$ are given by $f(x_1,x_2,x_3)=-x_1^2$, $g_1(x_1,x_2,x_3)=-x_1^2+x_2^3$ and $g_2(x_1,x_2,x_3)=x_3^2+x_2^2-2x_2$, while $b=(0,0)\in\mathbb{R}^2$.
As before, these functions are all continuously differentiable on $\mathbb{R}^3$, so $D=\mathbb{R}^3$. Therefore, $C-D=\emptyset$: at all points of the constraint set, the functions $f$, $g_1$ and $g_2$ are all continuously differentiable. This completes phases 1 and 2 of Lagrange's method.
Let us find the set $C\cap D_0$ of the singular points satisfying the constraints. The system (29.40) becomes
$$\begin{cases}-2\lambda_1x_1=0\\ 3\lambda_1x_2^2+\lambda_2(2x_2-2)=0\\ 2\lambda_2x_3=0\\ -x_1^2+x_2^3=0\\ x_3^2+x_2^2-2x_2=0\end{cases}$$
In light of the first and the third equations, we must consider three cases:
In conclusion, the origin $(0,0,0)$ is the unique singular point that satisfies the constraints, so $C\cap D_0=\{(0,0,0)\}$. This completes phase 3 of Lagrange's method.
The Lagrangian $L:\mathbb{R}^5\to\mathbb{R}$ is given by
$$L(x_1,x_2,x_3,\lambda_1,\lambda_2)=-x_1^2+\lambda_1\left(x_1^2-x_2^3\right)+\lambda_2\left(2x_2-x_2^2-x_3^2\right)$$
The first-order condition (29.17) is given by the following (nonlinear) system of five equations
$$\begin{cases}\dfrac{\partial L}{\partial x_1}=-2x_1+2\lambda_1x_1=0\\[1ex] \dfrac{\partial L}{\partial x_2}=-3\lambda_1x_2^2-2\lambda_2(x_2-1)=0\\[1ex] \dfrac{\partial L}{\partial x_3}=-2\lambda_2x_3=0\\[1ex] \dfrac{\partial L}{\partial\lambda_1}=x_1^2-x_2^3=0\\[1ex] \dfrac{\partial L}{\partial\lambda_2}=-x_3^2-x_2^2+2x_2=0\end{cases}$$
Solving it yields the points $(0,0,0)$, $\left(\sqrt{8},2,0\right)$ and $\left(-\sqrt{8},2,0\right)$. Among such three points one must search for the possible local solutions of the optimization problem (29.43).
As to the elimination method, also here the set
$$C=\left\{(x_1,x_2,x_3)\in\mathbb{R}^3:-x_1^2+x_2^3=0,\ x_3^2+x_2^2-2x_2=0\right\}$$
is clearly closed. It is also bounded (and so compact). In fact, the second constraint can be written as $x_3^2+(x_2-1)^2=1$, and so the $x_2$ and $x_3$ that satisfy it are such that $x_2\in[0,2]$ and $x_3\in[-1,1]$. Now, the constraint $x_1^2=x_2^3$ implies $x_1^2\in[0,8]$, and so $x_1\in\left[-\sqrt{8},\sqrt{8}\right]$. We conclude that $C\subseteq\left[-\sqrt{8},\sqrt{8}\right]\times[0,2]\times[-1,1]$, and so $C$ is bounded. As in the previous example, we can thus use the elimination method through Weierstrass' Theorem. In view of (29.44), in the last phase of the elimination method we have
$$f\left(\pm\sqrt{8},2,0\right)=-8 \quad\text{and}\quad f(0,0,0)=0$$
Hence, the origin $(0,0,0)$ solves the optimum problem (29.43), while $\left(\sqrt{8},2,0\right)$ and $\left(-\sqrt{8},2,0\right)$ are minimizers. N
$$Dg(x)=\begin{bmatrix}2x_1 & -1 & 0\\ 1 & 0 & 1\end{bmatrix}$$
It is easy to see that for no value of $x_1$ the two row vectors, that is, the two gradients $\nabla g_1(x)$ and $\nabla g_2(x)$, are linearly dependent.⁹ Therefore, there are no singular points, that is, $D_0=\emptyset$. It follows that $C\cap D_0=\emptyset$, and so we have concluded phase 3 of Lagrange's method.
Let us now move to the search for the stationary points of the Lagrangian $L:\mathbb{R}^5\to\mathbb{R}$, which is given by
$$L(x_1,x_2,x_3,\lambda_1,\lambda_2)=-x_1^2-x_2^2-x_3^2+\lambda_1\left(1-x_1^2+x_2\right)+\lambda_2\left(1-x_1-x_3\right)$$
To find such points we must solve the following (nonlinear) system of five equations
$$\begin{cases}\dfrac{\partial L}{\partial x_1}=-2x_1-2\lambda_1x_1-\lambda_2=0\\[1ex] \dfrac{\partial L}{\partial x_2}=-2x_2+\lambda_1=0\\[1ex] \dfrac{\partial L}{\partial x_3}=-2x_3-\lambda_2=0\\[1ex] \dfrac{\partial L}{\partial\lambda_1}=1-x_1^2+x_2=0\\[1ex] \dfrac{\partial L}{\partial\lambda_2}=1-x_1-x_3=0\end{cases}$$
We have that $\lambda_1=2x_2$ and $\lambda_2=-2x_3$, which, if substituted in the first equation, lead to the following nonlinear system in three equations:
$$\begin{cases}x_1+2x_1x_2-x_3=0\\ 1-x_1^2+x_2=0\\ 1-x_1-x_3=0\end{cases}$$
From the last two equations it follows that $x_2=x_1^2-1$ and $x_3=1-x_1$, which, if substituted in the first equation, imply that $2x_1^3-1=0$, from which $x_1=1/\sqrt[3]{2}$ follows and so
$$x_2=\frac{1}{\sqrt[3]{4}}-1 \quad\text{and}\quad x_3=1-\frac{1}{\sqrt[3]{2}}$$
Therefore, the Lagrangian has a unique stationary point
$$\left(\frac{1}{\sqrt[3]{2}},\frac{1}{\sqrt[3]{4}}-1,1-\frac{1}{\sqrt[3]{2}},\frac{2}{\sqrt[3]{4}}-2,\frac{2}{\sqrt[3]{2}}-2\right)$$
so that
$$S=\left\{\left(\frac{1}{\sqrt[3]{2}},\frac{1}{\sqrt[3]{4}}-1,1-\frac{1}{\sqrt[3]{2}}\right)\right\}$$
⁹ At a "mechanical" level, one can easily verify that no value of $x_1$ can be such that the matrix $Dg(x)$ does not have full rank.
The constraint set C is closed but not bounded (so it is not compact). In fact, consider the sequence {xₙ} given
by xₙ = (√(1+n), n, 1 − √(1+n)). The sequence belongs to C, but ‖xₙ‖ → +∞ and so there is
no neighborhood in R³ that may contain it. On the other hand, by Proposition 820 the
function f is coercive and continuous on C. As in the last two examples, we can thus use the
elimination method, but this time via Tonelli's Theorem. In view of (29.46), the elimination
method implies that the point
(1/∛2, 1/∛4 − 1, 1 − 1/∛2)
is the solution of the optimization problem (29.45). In this case the elimination method
is silent about possible minimizers because it relies on Tonelli's Theorem rather than on
Weierstrass' Theorem. N
Chapter 30
Inequality constraints
30.1 Introduction
Let us go back to the consumer problem seen at the beginning of the previous chapter, in
which we considered a consumer with utility function u : A ⊆ Rⁿ → R and income w ≥ 0.
Given the vector p ∈ Rⁿ₊ of prices of the goods, because of Walras' law we wrote his budget
constraint as

C(p, w) = {x ∈ A : p · x = w}

and his optimization problem as:

max u(x)   sub   x ∈ C(p, w)

In this formulation we assumed that the consumer exhausts his budget (hence the equality in
the budget constraint) and we did not impose other constraints on the bundle x except that
of satisfying the budget constraint. However, the hypothesis that income is entirely spent
may be too strong, so one may wonder what happens to the consumer optimization problem
if we weaken the constraint to p · x ≤ w, that is, if the constraint is given by an inequality
and not anymore by an equality.
As to the bundles of goods x, in many cases it is meaningless to talk of negative quantities.
Think for example of the purchase of physical goods, say fruit or vegetables in an open air
market, in which the quantity purchased has to be positive. This suggests imposing the
positivity constraint x ≥ 0 in the optimization problem.
By keeping these observations in mind, the consumer problem becomes:

max u(x)   sub   p · x ≤ w and x ≥ 0

The optimization problem still takes the form (30.1), but the budget set C(p, w) is now
different.
The general form of an optimization problem with both equality and inequality constraints
is:

max f(x)   sub   gᵢ(x) = bᵢ for i ∈ I and hⱼ(x) ≤ cⱼ for j ∈ J   (30.4)

where I and J are finite sets of indexes (possibly empty), f : A ⊆ Rⁿ → R is the objective
function, the functions gᵢ : A ⊆ Rⁿ → R and the associated scalars bᵢ characterize the |I|
equality constraints, while the functions hⱼ : A ⊆ Rⁿ → R with the associated scalars cⱼ
induce the |J| inequality constraints. We continue to assume, as in the previous chapter, that
the functions f, gᵢ and hⱼ are continuously differentiable on a non-empty and open subset
D of their domain A.
The optimization problem (30.4) can be equivalently formulated in canonical form. Two observations are in order:

(i) A constraint of the form h(x) ≥ c can be included in the formulation (30.4) by considering −h(x) ≤ −c. In particular, the constraint x ≥ 0 can be included by considering −x ≤ 0;

(ii) a constrained minimization problem for f can be written in the formulation (30.4) by considering −f.

These two observations show the scope and flexibility of formulation (30.4). In particular,
in light of (ii) it should be clear that also the choice of the sign ≤ in expressing the inequality
constraints is just a convention. That said, next we give some discipline to this formulation.
Definition 1299 The problem (30.4) is said to be well posed if, for each j ∈ J, there exists
x ∈ C such that hⱼ(x) < cⱼ.

To understand this definition, observe that an equality constraint g(x) = b can be written
in the form of inequality constraints as g(x) ≤ b and −g(x) ≤ −b. This blurs the
distinction between equality and inequality constraints in (30.4). To avoid this, and so
to have a clear distinction between the two types of constraints, in what follows we will
always consider optimization problems (30.4) that are well posed, so that it is not possible
to express equality constraints in the form of inequality constraints. Naturally, Definition
1299 is automatically satisfied when J = ∅, so there are no inequality constraints to worry
about.
is of the form (30.4) with |I| = |J| = 1, f(x) = x₁² + x₂² + x₃³, g(x) = x₁ + x₂ − x₃,
h(x) = x₁² + x₂² and b = c = 1.¹ These functions are continuously differentiable, so D = R³.
Moreover, C = {x ∈ R³ : x₁ + x₂ − x₃ = 1 and x₁² + x₂² ≤ 1}
(ii) The optimization problem:

max x₁   sub   x₁² = x₂³ and x₃² + x₂² = 2x₂

is of the form (30.4) with I = {1, 2}, J = ∅, f(x) = x₁, g₁(x) = x₁² − x₂³, g₂(x) =
x₃² + x₂² − 2x₂ and b₁ = b₂ = 0. These functions are continuously differentiable, so D = R³.
Moreover, C = {x ∈ R³ : x₂³ = x₁² and x₃² + x₂² = 2x₂}
(iii) The optimization problem with choice set

C = { x ∈ R³ : x₁ + x₂ + x₃ = 1, x₁² + x₂² + x₃² = 1/2, x₁ ≥ 0 and x₂ ≤ 1/10 }
(iv) The optimization problem:

max (x₁³ − x₂³)   sub   x₁ + x₂ ≤ 1 and x₁ − x₂ ≤ 1

is of the form (30.4) with I = ∅, J = {1, 2}, f(x) = x₁³ − x₂³, h₁(x) = x₁ + x₂, h₂(x) =
x₁ − x₂ and c₁ = c₂ = 1. These functions are continuously differentiable, so D = R².
Moreover, C = {x ∈ R² : x₁ + x₂ ≤ 1 and x₂ ≥ −1 + x₁}
(v) The minimum problem:

min (x₁ + x₂ + x₃)   sub   x₁ + x₂ = 1 and x₂² + x₃² ≤ 1/2
¹ To be pedantic, here we should have set I = J = {1}, g₁(x) = x₁ + x₂ − x₃, h₁(x) = x₁² + x₂² and
b₁ = c₁ = 1. But, in this case of a single equality constraint and of a single inequality constraint, the
subscripts just make the notation heavy.
max −(x₁ + x₂ + x₃)   sub   x₁ + x₂ = 1 and −x₂² − x₃² ≥ −1/2

N
In words, A (x) is the set of the indices of the so-called binding constraints at x, that is, of
the constraints that hold as equalities at the given point x. For example, in the problem
max_{x₁,x₂,x₃} f(x₁, x₂, x₃)
In other words, a point x ∈ D is regular if the gradients of the functions that induce
constraints binding at such a point are linearly independent. This condition generalizes the
notion of regularity upon which Lemma 1294 was based. Indeed, if we form the matrix whose
rows consist of the gradients of the functions that induce binding constraints at the point
considered, the regularity of the point amounts to requiring that such a matrix has full rank.
Note that, in view of Corollary 89-(ii), a point is regular only if |A(x)| ≤ n, that is, only
if the number of binding constraints at x does not exceed the dimension of the space on
which the optimization problem is defined.
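The full-rank test can be made concrete with a small sketch (ours, not the book's): compute the rank of the matrix whose rows are the binding-constraint gradients by Gaussian elimination and compare it with the number of rows.

```python
# Regularity check (illustrative): a point is regular when the matrix of the
# gradients of the constraints binding there has full rank.
def rank(M, tol=1e-12):
    """Rank of a small matrix via Gaussian elimination on a copy."""
    M = [row[:] for row in M]
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if abs(M[i][col]) > tol), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and abs(M[i][col]) > tol:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Rows = gradients of binding constraints at some point (illustrative data).
assert rank([[2.0, -1.0, 0.0], [1.0, 0.0, 1.0]]) == 2  # regular: full rank
assert rank([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]) == 1   # singular: dependent rows
```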
We can now state the generalization of Lemma 1294 for problem (30.4). In reading it,
note how the vector μ̂ associated with the inequality constraints has positive sign, while there
is no restriction on the sign of the vector λ̂ associated with the equality constraints.
∇f(x̂) = Σ_{i∈I} λ̂ᵢ ∇gᵢ(x̂) + Σ_{j∈J} μ̂ⱼ ∇hⱼ(x̂)   (30.8)

μ̂ⱼ (cⱼ − hⱼ(x̂)) = 0   ∀j ∈ J   (30.9)

In terms of partial derivatives, (30.8) reads

∂f/∂xₖ(x̂) = Σ_{i∈I} λ̂ᵢ ∂gᵢ/∂xₖ(x̂) + Σ_{j∈J} μ̂ⱼ ∂hⱼ/∂xₖ(x̂)   ∀k = 1, …, n
This lemma generalizes both Fermat's Theorem and Lemma 1294. Indeed:
The novelty of Lemma 1302 relative to these previous results is, besides the positivity of
the vector μ̂ associated with the inequality constraints, the condition (30.9). To understand
the role of this condition, the following characterization is useful.
Lemma 1303 Condition (30.9) holds if and only if μ̂ⱼ = 0 for each j such that hⱼ(x̂) < cⱼ,
that is, for each j ∉ A(x̂).
Proof Assume (30.9). Since for each j ∈ J we have hⱼ(x̂) ≤ cⱼ, from the positive sign of μ̂
it follows that (30.9) implies μ̂ⱼ = 0 for each j such that hⱼ(x̂) < cⱼ. Conversely, if this last
property holds we have

μ̂ⱼ (cⱼ − hⱼ(x̂)) = 0   ∀j ∈ J   (30.10)

because, being hⱼ(x̂) ≤ cⱼ for each j ∈ J, either hⱼ(x̂) < cⱼ (in which case μ̂ⱼ = 0) or
hⱼ(x̂) = cⱼ. Condition (30.10) immediately implies (30.9).
In words, (30.9) is equivalent to requiring the nullity of each μ̂ⱼ associated with a non-binding
constraint. Hence, we can have μ̂ⱼ > 0 only if constraint j is binding at the solution x̂.
For example, if x̂ is such that hⱼ(x̂) < cⱼ for each j ∈ J, i.e., if at x̂
all the inequality constraints are non-binding, then we have μ̂ⱼ = 0 for each j ∈ J and the
vector μ̂ does not play any role in the determination of x̂. Naturally, this reflects the fact
that for this solution x̂ the inequality constraints themselves do not play any role.
The next example shows that conditions (30.8) and (30.9) are necessary, but not sufficient
(something not surprising, since the same is true for Fermat's Theorem and for Lemma 1294).

max (x₁³ + x₂³)/2   sub   x₁ − x₂ ≤ 0   (30.11)

It is a simple modification of Example 1286, and has the form (30.4) with f, h : R² → R
given by f(x) = (x₁³ + x₂³)/2 and h(x) = x₁ − x₂, while c = 0. We have:

∇f(0, 0) = (0, 0) = μ ∇h(0, 0)   and   μ (0 − 0) = 0

The origin (0, 0) satisfies with μ = 0 the conditions (30.8) and (30.9), but it is not a solution
of the optimization problem (30.11), as (29.9) shows. N
We defer the proof of Lemma 1302 to the appendix.³ It is possible, however, to give a
heuristic proof of this lemma by reducing problem (30.4) to a problem with only equality
constraints, and then exploiting the results seen in the previous chapter. For simplicity,
we give this argument for the special case

for each (x, λ, μ) ∈ A × R^{|I|} × R₊^{|J|}. Note that the vector μ is required to be positive.
The next famous result, proved in 1951 by Harold Kuhn and Albert Tucker, generalizes
Lagrange’s Theorem to the optimization problem (30.4). We omit the proof because it is
analogous to that of Lagrange’s Theorem.
∇ₓL(x̂, λ̂, μ̂) = 0   (30.15)

μ̂ⱼ ∂L/∂μⱼ(x̂, λ̂, μ̂) = 0   ∀j ∈ J   (30.16)

∇_λL(x̂, λ̂, μ̂) = 0   (30.17)

∇_μL(x̂, λ̂, μ̂) ≥ 0   (30.18)
The components λ̂ᵢ and μ̂ⱼ of the vectors λ̂ and μ̂ are called Lagrange multipliers, while
(30.15)-(30.18) are called Kuhn-Tucker conditions. The points x ∈ A for which there exists
a pair (λ, μ) ∈ R^{|I|} × R₊^{|J|} such that the triple (x, λ, μ) satisfies the conditions (30.15)-(30.18)
are called Kuhn-Tucker points.
The Kuhn-Tucker points are, therefore, the solutions of the (typically nonlinear) system
of equations and inequalities given by the Kuhn-Tucker conditions. By Kuhn-Tucker's Theorem,
a necessary condition for a regular point x to be a solution of the optimization problem (30.4)
is that it is a Kuhn-Tucker point.⁷ Observe, however, that a Kuhn-Tucker point (x, λ, μ)
is not necessarily a stationary point of the Lagrangian function: condition (30.18) only
requires ∇_μL(x, λ, μ) ≥ 0, not the stronger property ∇_μL(x, λ, μ) = 0.
Later in the book, in Section 33.6, we will present a marginal interpretation of the
multipliers (λ̂, μ̂), along the lines sketched in the case of equality constraints (Section 29.3.3).
Let D₀ be the set of the singular points x ∈ D where the regularity condition of the
constraints does not hold, and let D₁ be, instead, the set of the points x ∈ A where this
condition holds. The method of elimination consists of the following phases:
1. Verify whether Tonelli's Theorem can be applied, that is, whether f is continuous and coercive on
C;

2. determine the set D where the functions f, gᵢ and hⱼ are continuously differentiable;

3. determine the set C − D of the points of the constraint set where the functions f, gᵢ and
hⱼ are not continuously differentiable;

4. determine the set C ∩ D₀ of the singular points that satisfy the constraints;

5. determine the set S of the regular Kuhn-Tucker points, i.e., the points x ∈ C ∩ (D − D₀)
for which there exists a pair (λ, μ) ∈ R^{|I|} × R₊^{|J|} of Lagrange multipliers such that the
triple (x, λ, μ) satisfies the Kuhn-Tucker conditions (30.15)-(30.18);⁸

6. determine the set {f(x) : x ∈ S ∪ (C ∩ D₀)}; if x̂ ∈ S ∪ (C ∩ D₀) is such that

f(x̂) ≥ f(x)   ∀x ∈ S ∪ (C ∩ D₀) ∪ (C − D)

then such x̂ is a solution of the optimization problem (30.4).

The first phase of the method of elimination is the same as in the previous chapter, while
the other phases are the obvious extension of the method to the case of problem (30.4).
Example 1307 The optimization problem:

max_{x₁,x₂} x₁ − 2x₂²   (30.19)
We start by observing that μ ≠ 0, that is, μ > 0. Indeed, if μ = 0 the first equation
becomes 1 = 0, a contradiction. We therefore assume that μ > 0. The second equation
implies x₂ = 0, and in turn the third equation implies x₁ = −1. From the first equation it
follows that μ = 1/2, and hence the only solution of the system is (−1, 0, 1/2). The only
Kuhn-Tucker point is therefore (−1, 0), i.e., S = {(−1, 0)}.
In sum, since the sets C ∩ D₀ and C − D are both empty, we have

S ∪ (C ∩ D₀) ∪ (C − D) = S = {(−1, 0)}

The method of elimination allows us to conclude that (−1, 0) is the only solution of the
optimization problem (30.19). Note that at this solution the constraint is binding (i.e., it is
satisfied with equality); indeed μ = 1/2 > 0, as required by Proposition 1306. N
Consider the optimization problem

max −Σ_{i=1}^n xᵢ²   sub   Σ_{i=1}^n xᵢ = 1, x₁ ≥ 0, …, xₙ ≥ 0   (30.20)

The Lagrangian function is

L(x, λ, μ) = −Σ_{i=1}^n xᵢ² + λ (1 − Σ_{i=1}^n xᵢ) + Σ_{i=1}^n μᵢxᵢ   ∀(x, λ, μ) ∈ R²ⁿ⁺¹
30.2. RESOLUTION OF THE PROBLEM 925
To find the set S of its Kuhn-Tucker points, it is necessary to solve the system

∂L/∂xᵢ = −2xᵢ − λ + μᵢ = 0,   ∀i = 1, …, n
∂L/∂λ = 1 − Σ_{i=1}^n xᵢ = 0
μᵢ ∂L/∂μᵢ = μᵢxᵢ = 0,   ∀i = 1, …, n
∂L/∂μᵢ = xᵢ ≥ 0,   ∀i = 1, …, n
μᵢ ≥ 0,   ∀i = 1, …, n
Multiplying the first n equations by xᵢ gives

−2xᵢ² − λxᵢ + μᵢxᵢ = 0,   ∀i = 1, …, n

Therefore, summing over i and using Σ_{i=1}^n xᵢ = 1 and μᵢxᵢ = 0,

−2 Σ_{i=1}^n xᵢ² − λ = 0

that is, λ = −2 Σ_{i=1}^n xᵢ². We conclude that λ ≤ 0.
If xᵢ = 0, from the condition ∂L/∂xᵢ = 0 it follows that λ = μᵢ. Since μᵢ ≥ 0 and λ ≤ 0,
it follows that μᵢ = 0. In turn, this implies λ = 0 and hence, using again the condition
∂L/∂xᵢ = 0, we conclude that xᵢ = μᵢ = 0 for each i = 1, …, n. But this contradicts the
condition 1 − Σ_{i=1}^n xᵢ = 0, and we therefore conclude that xᵢ ≠ 0, that is, xᵢ > 0.
Since this holds for each i = 1, …, n, it follows that xᵢ > 0 for each i = 1, …, n. From
the condition μᵢxᵢ = 0 it follows that μᵢ = 0 for each i = 1, …, n, and the first n equations
become:

−2xᵢ − λ = 0   ∀i = 1, …, n

that is, xᵢ = −λ/2 for each i = 1, …, n. The xᵢ are therefore all equal; from Σ_{i=1}^n xᵢ = 1 it
follows that

xᵢ = 1/n   ∀i = 1, …, n

In conclusion,

S = { (1/n, …, 1/n) }

Since C − D = ∅ and D₀ = ∅, we have

S ∪ (C ∩ D₀) = { (1/n, …, 1/n) }

The method of elimination allows us to conclude that the point (1/n, …, 1/n) is the solution
of the optimization problem (30.20). N
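The Kuhn-Tucker system just solved can be checked numerically; a minimal sketch (ours, not the book's) with n = 5:

```python
# Check that x = (1/n, ..., 1/n), lambda = -2/n, mu_i = 0 solves the
# Kuhn-Tucker system of problem (30.20).
n = 5
x = [1 / n] * n
lam = -2 / n
mu = [0.0] * n

# First-order conditions dL/dx_i = -2 x_i - lam + mu_i = 0
assert all(abs(-2 * xi - lam + mi) < 1e-12 for xi, mi in zip(x, mu))
# Budget constraint, positivity, complementary slackness mu_i x_i = 0
assert abs(sum(x) - 1) < 1e-12
assert all(xi >= 0 for xi in x) and all(mi >= 0 for mi in mu)
assert all(abs(mi * xi) < 1e-12 for xi, mi in zip(x, mu))
# lam = -2 sum x_i^2, consistent with the derivation above
assert abs(lam + 2 * sum(xi ** 2 for xi in x)) < 1e-12
```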
has solution (1/n, …, 1/n). It is the unique solution if h is strictly concave.
If h(x) = −x log x, the function Σ_{i=1}^n h(xᵢ) is called entropy (Examples 219 and 1268).
Proof Let x₁, x₂, …, xₙ ∈ [0, 1] with the constraint Σ_{i=1}^n xᵢ = 1. Since h is concave, by
Jensen's inequality we have

(1/n) Σ_{i=1}^n h(xᵢ) ≤ h( (1/n) Σ_{i=1}^n xᵢ ) = h(1/n)

Namely,

Σ_{i=1}^n h(xᵢ) ≤ n h(1/n) = h(1/n) + ⋯ + h(1/n)

This shows that (1/n, …, 1/n) is a solution. Clearly, Σ_{i=1}^n h(xᵢ) is strictly concave if h is.
Hence, the uniqueness of the solution is ensured by Theorem 831.
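For the entropy case h(x) = −x log x, a quick numerical illustration (ours, not the book's) that the uniform vector dominates some other points of the simplex:

```python
# Entropy sum_i -x_i log x_i over the simplex is maximized at (1/n, ..., 1/n).
import math

def entropy(x):
    return sum(-xi * math.log(xi) for xi in x if xi > 0)

n = 4
uniform = [1 / n] * n
candidates = [[0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1], [0.25, 0.25, 0.3, 0.2]]
assert all(entropy(c) <= entropy(uniform) for c in candidates)
print(round(entropy(uniform), 4))  # → 1.3863, i.e. log 4
```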
Proposition 1310 Let A be convex. If the functions gᵢ are affine for each i ∈ I and the
functions hⱼ are convex for each j ∈ J, then the choice set C defined in (30.5) is convex.

It is easy to give examples where C is no longer convex when the conditions of convexity
and affinity used in this result are not satisfied. Note that the convexity condition on the
hⱼ is much weaker than the affinity condition on the gᵢ. This shows that the convexity of the
choice set is more natural for inequality constraints than for equality ones. This is a crucial
“structural” difference between the two types of constraints, which are more different than
it may appear prima facie.

Definition 1311 The optimization problem (30.4) is said to be concave if the objective
function f is concave, the functions gᵢ are affine and the functions hⱼ are convex on the open
and convex set A.
Gx = b   (30.23)

In a similar vein, when also the functions hⱼ happen to be affine, say hⱼ(x) = αⱼ · x + qⱼ,
we can also write the inequality constraints in the matrix form Hx ≤ c, where H is the
|J| × n matrix with rows αⱼ and c ∈ R^{|J|}. Thus, when all constraints are identified by affine
functions, the choice set is a polyhedron C = {x ∈ Rⁿ : Gx = b and Hx ≤ c}. This case often
arises in applications. Indeed, if also the objective function is affine, we are back to linear
programming, an important class of concave problems that we already studied via convexity
arguments (Section 18.6.4).
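A minimal sketch (ours, not the book's) of a polyhedron membership test, with illustrative data: the equality block encodes x₁ + x₂ = 1 and the inequality block encodes x ≥ 0.

```python
# Membership test for a polyhedron C = { x : G x = b, H x <= c } (illustrative).
def in_polyhedron(x, G, b, H, c, tol=1e-9):
    eq = all(abs(sum(gi * xi for gi, xi in zip(row, x)) - bi) <= tol
             for row, bi in zip(G, b))
    ineq = all(sum(hi * xi for hi, xi in zip(row, x)) <= ci + tol
               for row, ci in zip(H, c))
    return eq and ineq

G, b = [[1, 1]], [1]                # x1 + x2 = 1
H, c = [[-1, 0], [0, -1]], [0, 0]   # -x1 <= 0 and -x2 <= 0, i.e. x >= 0

assert in_polyhedron([0.25, 0.75], G, b, H, c)
assert not in_polyhedron([1.5, -0.5], G, b, H, c)  # violates x2 >= 0
```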
Theorem 1312 The Kuhn-Tucker points solve a concave optimization problem in which the
functions f, {gᵢ}_{i∈I} and {hⱼ}_{j∈J} are differentiable.
Proof Let (x*, λ*, μ*) be a Kuhn-Tucker point for the optimization problem (30.4), that is,
(x*, λ*, μ*) satisfies the conditions (30.15)-(30.18). In particular, this means that

∇f(x*) = Σ_{i∈I} λᵢ* ∇gᵢ(x*) + Σ_{j∈A(x*)∩J} μⱼ* ∇hⱼ(x*)   (30.24)

∇hⱼ(x*) · (x − x*) ≤ 0   ∀j ∈ A(x*), ∀x ∈ C

∇gᵢ(x*) · (x − x*) = 0   ∀i ∈ I, ∀x ∈ C

f(x) ≤ f(x*) + ∇f(x*) · (x − x*)   ∀x ∈ A
This theorem provides a sufficient condition for optimality: if a point is a Kuhn-Tucker point,
then it solves the optimization problem. The condition is, however, not necessary: there can
be solutions of a concave optimization problem that are not Kuhn-Tucker points. In view
of Kuhn-Tucker's Theorem, this can happen only if the solution is not a regular point. The
next example illustrates this situation.
has the form (30.4), where f, h₁, h₂ : R³ → R are the continuously differentiable functions given
by f(x₁, x₂, x₃) = −x₁ − x₂ − x₃², h₁(x₁, x₂, x₃) = x₁² + x₂² − 2x₁, h₂(x₁, x₂, x₃) = x₁² + x₂² + 2x₁,
while c₁ = c₂ = 0.
30.4. CONCAVE OPTIMIZATION 929
∇f(0, 0, 0) = (−1, −1, 0)
By combining Kuhn-Tucker's Theorem and Theorem 1312 we get the following necessary
and sufficient optimality condition.

Theorem 1314 Consider a concave optimization problem in which the functions f, {gᵢ}_{i∈I}
and {hⱼ}_{j∈J} are continuously differentiable. A regular point x ∈ A is a solution of the
problem if and only if it is a Kuhn-Tucker point.
3. determine the set C \ D0 of the singular points that satisfy the constraints;
5. if S ≠ ∅, then all the points of S are solutions of the problem,⁹ while a singular
point x ∈ C ∩ D₀ is also a solution if and only if f(x) = f(x̂) for some x̂ ∈ S;
Since either phase 5 or 6 applies, depending on whether or not S is empty, the actual
phases of the convex method are …ve.
The convex method works thanks to Theorems 1312 and 1314. Indeed, if S ≠ ∅ then
by Theorem 1312 all points of S are solutions of the problem. In this case, a singular point
x ∈ C ∩ D₀ can in turn be a solution when its value f(x) is equal to that of any point in S.
When, instead, we have S = ∅, then Theorem 1314 guarantees that no regular point in A
is a solution of the problem. At this stage, if Tonelli's Theorem is able to ensure the existence
of at least a solution, we can restrict the search to the set C ∩ D₀ of the singular points that
satisfy the constraints. In other words, it is sufficient to find the maximizers of f on C ∩ D₀:
they are also solutions of problem (30.4), and vice versa.
Clearly, the convex method becomes especially powerful when S ≠ ∅ because in such a
case there is no need to verify the validity of global existence theorems à la Weierstrass or
Tonelli; it is sufficient to find the Kuhn-Tucker points.
If we content ourselves with solutions that are regular points, without worrying about
the possible existence of singular solutions, we can give a short version of the convex method
that is based only on Theorem 1312. We can call it the short convex method. It is based
on only three phases:
Indeed, by Theorem 1312 all regular Kuhn-Tucker points are solutions of the problem.
The short convex method is simpler than the convex method, and it does not require the use
of global existence theorems. The price of this simplification is the possible inaccuracy of
the method: being based on sufficient conditions, it is not able to find the solutions where
these conditions are not satisfied (by Theorem 1314, such solutions would be singular points).
Furthermore, the short method cannot be applied when S = ∅; in such a case, it is necessary
to apply the complete convex method.
The short convex method is especially powerful when the objective function f is strictly
concave, as often assumed in applications. Indeed, in such a case a solution found with the
short method is necessarily also the unique solution of the concave optimization problem.
The next example illustrates.
This problem is of the form (30.4), where f, h₁, h₂ : R³ → R are given by f(x) = −(x₁² + x₂² + x₃²),
h₁(x) = −(3x₁ + x₂ + 2x₃) and h₂(x) = −x₁, while c₁ = −1 and c₂ = 0.
Using Theorem 1120 it is easy to verify that f is strictly concave, while it is immediate
to verify that h₁ and h₂ are convex. Therefore, (30.28) is a concave optimization problem.
Moreover, the functions f, h₁ and h₂ are continuously differentiable. This completes the
first two phases of the short convex method, which we apply here since f is strictly concave.
Let us find the Kuhn-Tucker points. The Lagrangian function L : R⁵ → R is given by

L(x, μ) = −(x₁² + x₂² + x₃²) + μ₁ (−1 + 3x₁ + x₂ + 2x₃) + μ₂ x₁
To find the set S of its Kuhn-Tucker points it is necessary to solve the system of equalities
and inequalities:

∂L/∂x₁ = −2x₁ + 3μ₁ + μ₂ = 0
∂L/∂x₂ = −2x₂ + μ₁ = 0
∂L/∂x₃ = −2x₃ + 2μ₁ = 0
μ₁ ∂L/∂μ₁ = μ₁ (−1 + 3x₁ + x₂ + 2x₃) = 0
μ₂ ∂L/∂μ₂ = μ₂ x₁ = 0
∂L/∂μ₁ = −1 + 3x₁ + x₂ + 2x₃ ≥ 0
∂L/∂μ₂ = x₁ ≥ 0
μ₁ ≥ 0,  μ₂ ≥ 0   (30.29)
We consider four cases, depending on whether the multipliers μ₁ and μ₂ are zero or
not.
Case 1: μ₁ > 0 and μ₂ > 0. The conditions μ₂ ∂L/∂μ₂ = ∂L/∂x₁ = 0 imply x₁ = 0 and
3μ₁ + μ₂ = 0. This last equation does not have strictly positive solutions μ₁ and μ₂, and
hence we conclude that we cannot have μ₁ > 0 and μ₂ > 0.
Case 2: μ₁ = 0 and μ₂ > 0. The conditions μ₂ ∂L/∂μ₂ = ∂L/∂x₁ = 0 imply x₁ = 0 and
μ₂ = 0. This contradiction shows that we cannot have μ₁ = 0 and μ₂ > 0.
Case 3: μ₁ > 0 and μ₂ = 0. The conditions μ₁ ∂L/∂μ₁ = ∂L/∂x₁ = ∂L/∂x₂ = ∂L/∂x₃ =
0 imply:

−2x₁ + 3μ₁ = 0
−2x₂ + μ₁ = 0
−2x₃ + 2μ₁ = 0
3x₁ + x₂ + 2x₃ = 1

Solving for μ₁, we get μ₁ = 1/7, and hence x₁ = 3/14, x₂ = 1/14 and x₃ = 1/7. The
quintuple (3/14, 1/14, 1/7, 1/7, 0) solves the system (30.29), and hence (3/14, 1/14, 1/7) is a
Kuhn-Tucker point.
Case 4: μ₁ = μ₂ = 0. The condition ∂L/∂x₁ = 0 implies x₁ = 0, while the conditions
∂L/∂x₂ = ∂L/∂x₃ = 0 imply x₂ = x₃ = 0. It follows that the condition ∂L/∂μ₁ ≥ 0 implies
−1 ≥ 0, and this contradiction shows that we cannot have μ₁ = μ₂ = 0.
In conclusion, S = {(3/14, 1/14, 1/7)}. Since f is strictly concave, the short convex
method allows us to conclude that

(3/14, 1/14, 1/7)

is the unique solution of the optimization problem (30.28).¹⁰ N
¹⁰ The objective function is easily seen to be strongly concave. So, coda readers may note that the existence
and uniqueness of the solution would also follow from Theorem 1185.
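The Kuhn-Tucker point just found can be verified numerically; a sketch (ours, not the book's) checking stationarity, feasibility and complementary slackness at (3/14, 1/14, 1/7):

```python
# Verify that the candidate point of problem (30.28) satisfies the
# Kuhn-Tucker conditions, with f(x) = -(x1^2+x2^2+x3^2),
# h1(x) = -(3x1+x2+2x3), h2(x) = -x1, c1 = -1, c2 = 0.
x1, x2, x3 = 3 / 14, 1 / 14, 1 / 7
mu1, mu2 = 1 / 7, 0.0

# Stationarity: grad f = mu1 * grad h1 + mu2 * grad h2
grad_f = (-2 * x1, -2 * x2, -2 * x3)
grad_h1 = (-3.0, -1.0, -2.0)
grad_h2 = (-1.0, 0.0, 0.0)
station = all(abs(gf - (mu1 * g1 + mu2 * g2)) < 1e-12
              for gf, g1, g2 in zip(grad_f, grad_h1, grad_h2))

# Feasibility and complementary slackness: mu_j (c_j - h_j(x)) = 0
h1 = -(3 * x1 + x2 + 2 * x3); c1 = -1.0
h2 = -x1; c2 = 0.0
feasible = h1 <= c1 + 1e-12 and h2 <= c2
slack = abs(mu1 * (c1 - h1)) < 1e-12 and abs(mu2 * (c2 - h2)) < 1e-12

print(station, feasible, slack)  # → True True True
```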
We close with an important observation. The solution methods seen in this chapter are
based on the search for Kuhn-Tucker points, and therefore they require the resolution of
systems of nonlinear equations. In general, these systems are not easy to solve, and this limits
the computational usefulness of these methods, whose importance is mostly theoretical. At
a numerical level, other methods are used (which the interested reader can find in books on
numerical analysis).
Lemma 1316 (i) The function y = x|x| is continuously differentiable on R and D(x|x|) =
2|x|. (ii) The square (x⁺)² of the function x⁺ = max{x, 0} is continuously differentiable on
R, and D(x⁺)² = 2x⁺.
Proof (i) Observe that x|x| is infinitely differentiable for x ≠ 0 and its first derivative is,
by the product rule for differentiation,

D(x|x|) = x D|x| + |x| Dx = x (|x|/x) + |x| = 2|x|

This is true for x ≠ 0. Now it suffices to invoke a basic calculus result that asserts: let f : I →
R be continuous on a real interval, and f be differentiable on I − {x₀}; if lim_{x→x₀} Df(x) = L,
then f is differentiable at x₀ and Df(x₀) = L. As an immediate consequence, D(x|x|) = 2|x|
also at x = 0. (ii) We have x⁺ = ½(x + |x|). Therefore

(x⁺)² = ¼(x + |x|)² = ½x² + ½x|x|

It follows that (x⁺)² is continuously differentiable and D(x⁺)² = x + |x| = 2x⁺.
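Lemma 1316 can be illustrated with a finite-difference check (ours, not the book's), including at the delicate point x = 0:

```python
# Finite-difference check of Lemma 1316: D(x|x|) = 2|x| and D((x^+)^2) = 2 x^+.
def g(x):
    return x * abs(x)

def h(x):
    return max(x, 0.0) ** 2

def num_deriv(f, x, eps=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

for x in (-1.5, -0.3, 0.0, 0.3, 1.5):
    assert abs(num_deriv(g, x) - 2 * abs(x)) < 1e-5
    assert abs(num_deriv(h, x) - 2 * max(x, 0.0)) < 1e-5
```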
Proof of Lemma 1302 Let ‖·‖ be the Euclidean norm. We have hⱼ(x̂) < cⱼ for each
j ∉ A(x̂). Since A is open, there exists ε̃ > 0 sufficiently small such that B_ε̃(x̂) =
{x ∈ A : ‖x − x̂‖ ≤ ε̃} ⊆ A. Moreover, since each hⱼ is continuous, for each j ∉ A(x̂) there
exists εⱼ sufficiently small such that hⱼ(x) < cⱼ for each x ∈ B_{εⱼ}(x̂) = {x ∈ A : ‖x − x̂‖ ≤ εⱼ}.
Let ε₀ = min_{j∉A(x̂)} εⱼ and ε̂ = min{ε̃, ε₀}; in other words, ε̂ is the minimum between ε̃ and
the εⱼ. In this way we have B_ε̂(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε̂} ⊆ A and hⱼ(x) < cⱼ for each
x ∈ B_ε̂(x̂) and each j ∉ A(x̂).
Given ε ∈ (0, ε̂], the set S_ε(x̂) = {x ∈ A : ‖x − x̂‖ = ε} is compact. Moreover, by what has
just been seen, hⱼ(x) < cⱼ for each x ∈ S_ε(x̂) and each j ∉ A(x̂), that is, on S_ε(x̂) all the non-binding
constraints are always satisfied.
For each j ∈ J, let h̃ⱼ : A ⊆ Rⁿ → R be defined by h̃ⱼ(x) = max{hⱼ(x), cⱼ}. By Lemma 1316-(ii),

∂(h̃ⱼ(x) − cⱼ)² / ∂xₚ = 2 (h̃ⱼ(x) − cⱼ) ∂hⱼ(x)/∂xₚ,   ∀p = 1, …, n   (30.30)
30.5. APPENDIX: PROOF OF A KEY LEMMA 933
Since the sequence {xₙ} just constructed is contained in the compact set S_ε(x̂), by the
Bolzano-Weierstrass Theorem there exists a subsequence {x_{n_k}}_k convergent in S_ε(x̂), i.e.,
there exists x* ∈ S_ε(x̂) such that x_{n_k} → x*. Inequality (30.32) implies that, for each k ≥ 1,
we have:

[f(x_{n_k}) − f(x̂) − ‖x_{n_k} − x̂‖²] / N_{n_k} ≥ Σ_{i∈I} (gᵢ(x_{n_k}) − gᵢ(x̂))² + Σ_{j∈J∩A(x̂)} (h̃ⱼ(x_{n_k}) − h̃ⱼ(x̂))²   (30.33)
It follows that (gᵢ(x*) − gᵢ(x̂))² = (h̃ⱼ(x*) − h̃ⱼ(x̂))² = 0 for each i ∈ I and for each
j ∈ J ∩ A(x̂), from which gᵢ(x*) = gᵢ(x̂) = bᵢ for each i ∈ I and h̃ⱼ(x*) = h̃ⱼ(x̂) = cⱼ for
each j ∈ J ∩ A(x̂).
Since on S_ε(x̂) the non-binding constraints are always satisfied, i.e., hⱼ(x) < cⱼ for each
x ∈ S_ε(x̂) and each j ∉ A(x̂), we can conclude that x* satisfies all the constraints. We
therefore have f(x̂) ≥ f(x*), given that x̂ solves the optimization problem.
On the other hand, since x_{n_k} ∈ S_ε(x̂) for each k ≥ 1, (30.33) implies

f(x_{n_k}) − f(x̂) ≥ ‖x_{n_k} − x̂‖² + N_{n_k} [ Σ_{i∈I} (gᵢ(x_{n_k}) − gᵢ(x̂))² + Σ_{j∈J∩A(x̂)} (h̃ⱼ(x_{n_k}) − h̃ⱼ(x̂))² ] ≥ ε²

for each k ≥ 1, and hence f(x_{n_k}) ≥ f(x̂) + ε² for each k ≥ 1. Thanks to the continuity of
f, this leads to

f(x*) = lim_k f(x_{n_k}) ≥ f(x̂) + ε² > f(x̂)

which contradicts f(x̂) ≥ f(x*). This contradiction proves Fact 1. △
Using Fact 1, we now prove a second property that we will need. Here we set S =
S_{R^{|I|+|J|+1}} = {x ∈ R^{|I|+|J|+1} : ‖x‖ = 1}.

Fact 2. For each ε ∈ (0, ε̂], there exist x^ε ∈ B_ε(x̂) and a vector
(λ₀^ε, λ₁^ε, …, λ_{|I|}^ε, μ₁^ε, …, μ_{|J|}^ε) ∈ S, with each μⱼ^ε ≥ 0, such that

λ₀^ε ∂f/∂x_z(x^ε) − 2(x_z^ε − x̂_z) − Σ_{i∈I} λᵢ^ε ∂gᵢ/∂x_z(x^ε) − Σ_{j∈J∩A(x̂)} μⱼ^ε ∂hⱼ/∂x_z(x^ε) = 0   (30.34)

for each z = 1, …, n.
Proof of Fact 2 Given ε ∈ (0, ε̂], let N_ε > 0 be the positive constant whose existence is
guaranteed by Fact 1. Define the function φ_ε : A ⊆ Rⁿ → R as:

φ_ε(x) = f(x) − f(x̂) − ‖x − x̂‖² − N_ε [ Σ_{i∈I} (gᵢ(x) − gᵢ(x̂))² + Σ_{j∈J∩A(x̂)} (h̃ⱼ(x) − h̃ⱼ(x̂))² ]

The function φ_ε is continuous on the compact set B_ε(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε} and, by
Weierstrass' Theorem, there exists x^ε ∈ B_ε(x̂) such that φ_ε(x^ε) ≥ φ_ε(x) for each x ∈ B_ε(x̂).
In particular, φ_ε(x^ε) ≥ φ_ε(x̂) = 0, and hence (30.35) implies that ‖x^ε − x̂‖ < ε, that is, x^ε is
an interior point of B_ε(x̂).
so that (30.34) is obtained by dividing (30.36) by c_ε. Observe that μⱼ^ε ≥ 0 for each j ∈ J
and that (λ₀^ε)² + Σ_{i∈I} (λᵢ^ε)² + Σ_{j∈J} (μⱼ^ε)² = 1, i.e.,
(λ₀^ε, λ₁^ε, …, λ_{|I|}^ε, μ₁^ε, …, μ_{|J|}^ε) ∈ S. △
Using Fact 2, we can now complete the proof. Take a decreasing sequence {εₙ} ⊆ (0, ε̂]
with εₙ ↓ 0, and consider the associated sequence {(λ₀ⁿ, λ₁ⁿ, …, λ_{|I|}ⁿ, μ₁ⁿ, …, μ_{|J|}ⁿ)}ₙ ⊆ S whose
existence is guaranteed by Fact 2.
Since the sequence {(λ₀ⁿ, λ₁ⁿ, …, λ_{|I|}ⁿ, μ₁ⁿ, …, μ_{|J|}ⁿ)}ₙ is contained in the compact set S, by
the Bolzano-Weierstrass Theorem there exists a subsequence

{(λ₀^{n_k}, λ₁^{n_k}, …, λ_{|I|}^{n_k}, μ₁^{n_k}, …, μ_{|J|}^{n_k})}_k

convergent in S, that is, there exists (λ₀, λ₁, …, λ_{|I|}, μ₁, …, μ_{|J|}) ∈ S such that

(λ₀^{n_k}, λ₁^{n_k}, …, λ_{|I|}^{n_k}, μ₁^{n_k}, …, μ_{|J|}^{n_k}) → (λ₀, λ₁, …, λ_{|I|}, μ₁, …, μ_{|J|})
λ₀^{n_k} ∂f/∂x_z(x^{n_k}) − 2(x_z^{n_k} − x̂_z) − Σ_{i∈I} λᵢ^{n_k} ∂gᵢ/∂x_z(x^{n_k}) − Σ_{j∈J∩A(x̂)} μⱼ^{n_k} ∂hⱼ/∂x_z(x^{n_k}) = 0

for each z = 1, …, n. Consider the sequence {x^{n_k}}_k so constructed. From x^{n_k} ∈ B_{ε_{n_k}}(x̂) it
follows that ‖x^{n_k} − x̂‖ < ε_{n_k} → 0 and hence, for each z = 1, …, n,

λ₀ ∂f/∂x_z(x̂) − Σ_{i∈I} λᵢ ∂gᵢ/∂x_z(x̂) − Σ_{j∈J∩A(x̂)} μⱼ ∂hⱼ/∂x_z(x̂)   (30.37)
= lim_k ( λ₀^{n_k} ∂f/∂x_z(x^{n_k}) − 2(x_z^{n_k} − x̂_z) − Σ_{i∈I} λᵢ^{n_k} ∂gᵢ/∂x_z(x^{n_k}) − Σ_{j∈J∩A(x̂)} μⱼ^{n_k} ∂hⱼ/∂x_z(x^{n_k}) )
= 0.
The linear independence of the gradients associated with the constraints, which holds by the
hypothesis of regularity of the constraints, implies λᵢ = 0 for each i ∈ I, which contradicts
(λ₀, λ₁, …, λ_{|I|}, μ₁, …, μ_{|J|}) ∈ S.
Chapter 31

General constraints
where X is a subset of A and the other elements are as in the optimization problem (30.4).
This problem includes as special cases the optimization problems that we have seen so far:
we get back to the optimization problem (30.4) when X = A and to an unconstrained
optimization problem when I = J = ; and C = X is open.
Formulation (31.1) may also be useful when there are conditions on the sign or on the
value of the choice variables xᵢ. The classic example is the non-negativity condition on the xᵢ,
which is best expressed as a constraint x ∈ Rⁿ₊ rather than through n inequalities xᵢ ≥ 0.
Here a constraint of the form x ∈ X simplifies the exposition.
In this chapter we want to address the general optimization problem (31.1). If X is open,
the solution techniques of Section 30.2 can be easily adapted by restricting the analysis to
X itself (which can play the role of the set A). Matters are more interesting when X is
not open. Here we focus on the concave case of Section 30.4, widely used in applications.
Consequently, throughout the chapter we assume that X is a closed and convex subset of
an open convex set A, that f : A ⊆ Rⁿ → R is a concave differentiable objective
function, that the gᵢ : Rⁿ → R are affine functions and that the hⱼ : Rⁿ → R are convex differentiable
functions.²
¹ Sometimes this distinction is made by talking of implicit and explicit constraints. Different authors,
however, may give an opposite meaning to this terminology (which, in any case, we do not adopt).
² To ease matters, we define the functions gᵢ and hⱼ on the entire space Rⁿ.
The set C is closed and convex. As is often the case, the best way to proceed is to abstract
from the specific problem at hand, with its potentially distracting details. For this reason,
we will consider the following optimization problem:
where C is a generic closed and convex choice set that, for the moment, we treat as a black
box. Throughout this section we assume that f is continuously differentiable on an open
convex set that contains C. The simplest case in which this assumption holds is when f is
continuously differentiable on its entire domain A.
The next lemma gives a simple and elegant way to unify these two cases.
Proposition 1317 If x̂ ∈ [a, b] is a solution of the optimization problem (31.4), then

f′(x̂)(x − x̂) ≤ 0   ∀x ∈ [a, b]   (31.5)
Proof We divide the proof into three parts, one for each of the equivalences to prove.
(i) Let x̂ ∈ (a, b). We prove that (31.5) is equivalent to f′(x̂) = 0. If f′(x̂) = 0 holds,
then f′(x̂)(x − x̂) = 0 for each x ∈ [a, b], and hence (31.5) holds. Vice versa, suppose that
(31.5) holds. Setting x = a, we have (a − x̂) < 0 and so (31.5) implies f′(x̂) ≥ 0. On
the other hand, setting x = b, we have (b − x̂) > 0 and so (31.5) implies f′(x̂) ≤ 0. In
conclusion, x̂ ∈ (a, b) implies f′(x̂) = 0.
(ii) Let x̂ = a. We prove that (31.5) is equivalent to f′(a) ≤ 0. Let f′(a) ≤ 0. Since
(x − a) ≥ 0 for each x ∈ [a, b], we have f′(a)(x − a) ≤ 0 for each x ∈ [a, b] and (31.5) holds.
Vice versa, suppose that (31.5) holds. By taking x ∈ (a, b], we have (x − a) > 0 and so (31.5)
implies f′(a) ≤ 0.
(iii) Let x̂ = b. We prove that (31.5) is equivalent to f′(b) ≥ 0. Let f′(b) ≥ 0. Since
(x − b) ≤ 0 for each x ∈ [a, b], we have f′(b)(x − b) ≤ 0 for each x ∈ [a, b] and (31.5) holds.
Vice versa, suppose that (31.5) holds. By taking x ∈ [a, b), we have (x − b) < 0 and so (31.5)
implies f′(b) ≥ 0.
Proof of Proposition 1317 In view of Lemma 1318, it only remains to prove that (31.5) becomes a sufficient condition when f is concave. Suppose, therefore, that f is concave and that x̂ ∈ [a, b] is such that (31.5) holds. We prove that this implies that x̂ is a solution of problem (31.4). Indeed, by (24.7) we have f(x) ≤ f(x̂) + f′(x̂)(x − x̂) for each x ∈ [a, b], which implies f(x) − f(x̂) ≤ f′(x̂)(x − x̂) for each x ∈ [a, b]. Thus, (31.5) implies that f(x) − f(x̂) ≤ 0, that is, f(x) ≤ f(x̂) for each x ∈ [a, b]. Hence, x̂ solves the optimization problem (31.4).
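As a numerical sketch (not part of the text), the scalar variational inequality can be checked on a grid. The objective f(x) = −(x − c)² used below is a hypothetical concave example chosen so that the maximizer on [a, b] is simply the projection of c onto the interval; the sketch verifies f′(x̂)(x − x̂) ≤ 0 for interior and corner solutions alike.

```python
# Numerical sketch: check the variational inequality f'(x_hat)(x - x_hat) <= 0
# for the concave objective f(x) = -(x - c)^2 maximized over [a, b].

def f_prime(x, c):
    return -2.0 * (x - c)

def maximizer(a, b, c):
    # For f(x) = -(x - c)^2 the maximizer on [a, b] is the projection of c.
    return min(max(c, a), b)

def variational_inequality_holds(a, b, c, steps=1000):
    x_hat = maximizer(a, b, c)
    grid = [a + (b - a) * k / steps for k in range(steps + 1)]
    return all(f_prime(x_hat, c) * (x - x_hat) <= 1e-12 for x in grid)

# Interior solution (c inside [a, b]): f'(x_hat) = 0.
assert variational_inequality_holds(0.0, 1.0, 0.3)
# Corner solution at b (c > b): f'(b) >= 0 while (x - b) <= 0 on [a, b].
assert variational_inequality_holds(0.0, 1.0, 2.0)
# Corner solution at a (c < a): f'(a) <= 0 while (x - a) >= 0 on [a, b].
assert variational_inequality_holds(0.0, 1.0, -1.0)
print("variational inequality verified")
```

The three assertions mirror the three cases of the lemma: interior point, right endpoint, left endpoint.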
∇f(x̂) · (x − x̂) ≤ 0   ∀x ∈ C   (31.6)

As in the scalar case, the variational inequality unifies the necessary optimality conditions for interior and boundary points. Indeed, it is easy to check that, when x̂ is an interior point of C, (31.6) reduces to the classic first-order condition ∇f(x̂) = 0 of Fermat's Theorem.
Proof Let x̂ ∈ C be a solution of the optimization problem (31.3), i.e., f(x̂) ≥ f(x) for each x ∈ C. Given x ∈ C, set z_t = x̂ + t(x − x̂) for t ∈ [0, 1]. Since C is convex, z_t ∈ C for each t ∈ [0, 1]. Define φ : [0, 1] → R by φ(t) = f(z_t). Then

φ′₊(0) = lim_{t→0⁺} (φ(t) − φ(0))/t = lim_{t→0⁺} (f(x̂ + t(x − x̂)) − f(x̂))/t
       = lim_{t→0⁺} (df(x̂)(t(x − x̂)) + o(‖t(x − x̂)‖))/t
       = df(x̂)(x − x̂) + lim_{t→0⁺} o(t‖x − x̂‖)/t = df(x̂)(x − x̂) = ∇f(x̂) · (x − x̂)

For each t ∈ [0, 1] we have φ(0) = f(x̂) ≥ f(z_t) = φ(t), and so φ : [0, 1] → R has a (global) maximizer at t = 0. It follows that φ′₊(0) ≤ 0, which implies ∇f(x̂) · (x − x̂) ≤ 0, as desired.
As to the converse, assume that f is concave. By (24.18), f(x) ≤ f(x̂) + ∇f(x̂) · (x − x̂) for each x ∈ C, and therefore (31.6) implies f(x) ≤ f(x̂) for each x ∈ C.

For the dual minimum problems, the variational inequality is easily seen to take the dual form ∇f(x̂) · (x − x̂) ≥ 0. For interior solutions, instead, the condition ∇f(x̂) = 0 is the same in both maximization and minimization problems.³
N_C(x) = {y ∈ Rⁿ : y · (x′ − x) ≤ 0  ∀x′ ∈ C}

Next we provide a couple of important properties of N_C(x). In particular, (ii) shows that N_C(x) is non-trivial only if x is a boundary point.
Proof (i) The set N_C(x) is clearly closed. Moreover, given y, z ∈ N_C(x) and α, β ≥ 0, we have

(αy + βz) · (x′ − x) = α y · (x′ − x) + β z · (x′ − x) ≤ 0   ∀x′ ∈ C

and so αy + βz ∈ N_C(x). By Proposition 699, N_C(x) is a convex cone. (ii) We only prove the "if" part. Let x be an interior point of C. Suppose, by contradiction, that there is a vector y ≠ 0 in N_C(x). As x is interior, we have that x + ty ∈ C for t > 0 sufficiently small. Hence we would have y · (x + ty − x) = t y · y = t‖y‖² ≤ 0. This implies y = 0, a contradiction. Hence N_C(x) = {0}.
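A small numerical sketch (an illustration, not the text's construction) makes point (ii) concrete for C = [0,1] × [0,1]: sampling candidate vectors y, only y = 0 satisfies the normal-cone inequality at an interior point, while at the corner (0,0) the whole negative orthant passes. Since x′ ↦ y·(x′ − x) is linear, checking the inequality on the four corners of the square suffices.

```python
# Sketch: sample test of the normal-cone condition y . (x' - x) <= 0 for all x' in C,
# with C = [0,1] x [0,1]. At an interior point only y = 0 should pass; at the
# corner (0,0) every y <= 0 componentwise should pass.

import itertools

def in_normal_cone(y, x, corners):
    # Checking the inequality at the extreme points suffices (linearity in x').
    return all(y[0]*(c[0]-x[0]) + y[1]*(c[1]-x[1]) <= 1e-12 for c in corners)

corners = list(itertools.product([0.0, 1.0], repeat=2))
candidates = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]

interior = (0.5, 0.5)
passing = [y for y in candidates if in_normal_cone(y, interior, corners)]
assert passing == [(0.0, 0.0)]            # interior point: trivial normal cone

corner = (0.0, 0.0)
passing = [y for y in candidates if in_normal_cone(y, corner, corners)]
assert passing == [y for y in candidates if y[0] <= 0 and y[1] <= 0]
print("normal-cone checks passed")
```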
To see the importance of normal cones, note that condition (31.6) can be written as:

∇f(x̂) ∈ N_C(x̂)   (31.7)
3
The unifying power of variational inequalities in optimization is the outcome of a few works of Guido
Stampacchia in the early 1960s. For an overview, see Kinderlehrer and Stampacchia (1980).
31.2. ANALYSIS OF THE BLACK BOX 941
Therefore, x̂ solves the optimization problem (31.3) only if the gradient ∇f(x̂) belongs to the normal cone of C at x̂. This way of writing condition (31.6) is useful because, given a set C, if we can describe the form of the normal cone (something that does not require any knowledge of the objective function f), we can then have a sense of what form the "first order condition" takes for the optimization problems that have C as a choice set.

In other words, (31.7) can be seen as a general first order condition in which we can distinguish the part, N_C(x̂), determined by the constraint C, and the part, ∇f(x̂), determined by the objective function. This distinction between the roles of the objective function and of the constraint is illuminating.⁴
The next result characterizes the normal cone for convex cones.

N_C(x) = {y ∈ Rⁿ : y · x = 0 and y · x′ ≤ 0  ∀x′ ∈ C}   (31.8)

This result implies that, given a closed and convex cone C, a point x̂ satisfies the first order condition (31.7) when

∇f(x̂) · x̂ = 0   (31.9)

∇f(x̂) · x ≤ 0   ∀x ∈ C   (31.10)
The first order condition is thus easier to check on cones. Even more so in the important special case C = Rⁿ₊, when from (31.8) it follows that conditions (31.9) and (31.10) reduce to the following n equalities and n inequalities,

(∂f(x̂)/∂xᵢ) x̂ᵢ = 0   (31.11)

∂f(x̂)/∂xᵢ ≤ 0   (31.12)

for each i = 1, …, n.
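As an illustrative sketch (the objective below is a hypothetical example, not from the text), conditions (31.11)-(31.12) can be verified directly at the known maximizer of a concave function over the positive orthant.

```python
# Sketch: verify conditions (31.11)-(31.12) for f(x1, x2) = -(x1-1)^2 - (x2+1)^2
# maximized over the positive orthant R^2_+. The maximizer is x_hat = (1, 0).

def grad_f(x):
    return (-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 1.0))

x_hat = (1.0, 0.0)
g = grad_f(x_hat)

for k in range(2):
    assert abs(g[k] * x_hat[k]) < 1e-12   # (31.11): complementary slackness
    assert g[k] <= 1e-12                  # (31.12): partials nonpositive

print("orthant first-order conditions verified at", x_hat)
```

The binding coordinate x̂₂ = 0 carries a strictly negative partial, while the free coordinate x̂₁ > 0 forces its partial to vanish, exactly as (31.11)-(31.12) require.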
4
For a thorough account of this important viewpoint, we refer readers to Rockafellar (1993).
We can also characterize the normal cones of the simplices Δⁿ⁻¹ = {x ∈ Rⁿ₊ : ∑ₖ₌₁ⁿ xₖ = 1}, another all-important class of closed and convex sets. To this end, given x ∈ Δⁿ⁻¹, set P(x) = {i : xᵢ > 0}, the support of x. The set {λy ∈ Rⁿ : y ∈ I(x) and λ ≥ 0} is easily seen to be the smallest convex cone that contains I(x). The normal cone is thus such a set.
∂f(x̂)/∂xᵢ = λ̂  if x̂ᵢ > 0 ;   ∂f(x̂)/∂xᵢ ≤ λ̂  if x̂ᵢ = 0

∂f(x̂)/∂xᵢ ≤ λ̂   ∀i = 1, …, n   (31.13)

(∂f(x̂)/∂xᵢ − λ̂) x̂ᵢ = 0   ∀i = 1, …, n   (31.14)
Proof of Proposition 1323 Suppose that P(x) is not a singleton and let i, j ∈ P(x). Clearly, 0 < xᵢ, xⱼ < 1. Consider the points x^ε ∈ Rⁿ having coordinates x^ε_i = xᵢ + ε, x^ε_j = xⱼ − ε, and x^ε_k = xₖ for all k ≠ i and k ≠ j, while the parameter ε runs over [−ε₀, ε₀] with ε₀ > 0 sufficiently small in order that x^ε ≥ 0 for ε ∈ [−ε₀, ε₀]. Note that ∑ₖ₌₁ⁿ x^ε_k = 1 and so x^ε ∈ Δⁿ⁻¹. Let y ∈ N_{Δⁿ⁻¹}(x). By definition, y · (x^ε − x) ≤ 0 for every ε ∈ [−ε₀, ε₀]. Namely, εyᵢ − εyⱼ = ε(yᵢ − yⱼ) ≤ 0 for both signs of ε, which implies yᵢ = yⱼ. Hence, it must hold that yᵢ = λ for all i ∈ P(x). That is, the values of y must be constant on P(x). This is trivially true when P(x) is a singleton. Let now j ∉ P(x). Consider the vector x^j ∈ Rⁿ, where x^j_j = 1 and x^j_k = 0 for each k ≠ j. If y ∈ N_{Δⁿ⁻¹}(x), then y · (x^j − x) ≤ 0. That is,

yⱼ − ∑_{k≠j} yₖxₖ = yⱼ − ∑_{k∈P(x)} yₖxₖ = yⱼ − λ ∑_{k∈P(x)} xₖ = yⱼ − λ ≤ 0
Therefore, N_{Δⁿ⁻¹}(x) ⊆ {λy ∈ Rⁿ : y ∈ I(x) and λ ≥ 0}. We now show the converse inclusion. Let y ∈ Rⁿ be such that, for some λ ≥ 0, we have yᵢ = λ for all i ∈ P(x) and yₖ ≤ λ for each k ∉ P(x). If x′ ∈ Δⁿ⁻¹, then

y · (x′ − x) = ∑ᵢ₌₁ⁿ yᵢ(x′ᵢ − xᵢ) = ∑_{i∈P(x)} yᵢ(x′ᵢ − xᵢ) + ∑_{i∉P(x)} yᵢ x′ᵢ
            = λ ∑_{i∈P(x)} (x′ᵢ − xᵢ) + ∑_{i∉P(x)} yᵢ x′ᵢ = λ (∑_{i∈P(x)} x′ᵢ − 1) + ∑_{i∉P(x)} yᵢ x′ᵢ
            ≤ λ (∑_{i∈P(x)} x′ᵢ − 1) + λ ∑_{i∉P(x)} x′ᵢ = λ (∑ᵢ₌₁ⁿ x′ᵢ − 1) = 0

Hence y ∈ N_{Δⁿ⁻¹}(x).
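A simple numerical sketch (with a hypothetical linear objective, chosen for illustration) confirms conditions (31.13)-(31.14) on the simplex: for f(x) = a·x a maximizer puts all mass on a largest coefficient, and λ̂ = max a works as the multiplier.

```python
# Sketch: for the linear objective f(x) = a . x on the simplex, a maximizer is the
# vertex putting all mass on a largest coefficient. We verify conditions
# (31.13)-(31.14) with the multiplier lambda_hat = max(a).

a = [0.2, 1.5, -0.3, 1.1]
j = max(range(len(a)), key=lambda i: a[i])       # index of a largest coefficient
x_hat = [1.0 if i == j else 0.0 for i in range(len(a))]
lam = max(a)

assert abs(sum(x_hat) - 1.0) < 1e-12 and min(x_hat) >= 0.0   # x_hat in the simplex
for i in range(len(a)):
    assert a[i] <= lam + 1e-12                    # (31.13): partials bounded by lambda_hat
    assert abs((a[i] - lam) * x_hat[i]) < 1e-12   # (31.14): complementary slackness
print("simplex first-order conditions verified")
```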
Proposition 1325 Let C = C₁ ∩ ⋯ ∩ Cₙ, with each Cᵢ closed and convex. Then, for all x ∈ C,

∑ᵢ₌₁ⁿ N_{Cᵢ}(x) ⊆ N_C(x)

Equality holds if C satisfies Slater's condition int C₁ ∩ ⋯ ∩ int Cₙ ≠ ∅, where the set Cᵢ itself can replace its interior int Cᵢ if it is affine.

Proof Let x ∈ C. Suppose y = ∑ᵢ₌₁ⁿ yᵢ, with yᵢ ∈ N_{Cᵢ}(x) for every i = 1, …, n. Then, y · (x′ − x) = ∑ᵢ₌₁ⁿ yᵢ · (x′ − x) ≤ 0 for every x′ ∈ C, and so y ∈ N_C(x). This proves the inclusion. We omit the proof that Slater's condition implies the equality.
In words, under Slater's condition the normal cone of an intersection of sets is the sum of their normal cones. Hence, a point x̂ satisfies the first order condition (31.7) if and only if there is an n-tuple of vectors (ŷ₁, …, ŷₙ) such that

∇f(x̂) = ∑ᵢ₌₁ⁿ ŷᵢ

ŷᵢ ∈ N_{Cᵢ}(x̂)   ∀i = 1, …, n

A familiar "multipliers" format emerges. The next section will show how Kuhn-Tucker's Theorem fits in this general framework.
Lemma 1327 The set C satisfies Slater's condition if there is x ∈ int X such that gᵢ(x) = bᵢ for all i ∈ I and hⱼ(x) < cⱼ for all j ∈ J.

Proof The level sets Cᵢ are affine (Proposition 662). Since x ∈ int X ∩ ⋂_{i∈I} Cᵢ ∩ ⋂_{j∈J} int Cⱼ, such intersection is non-empty and so C satisfies Slater's condition.

In what follows we thus assume the existence of such an x.⁵ In view of Proposition 1325, it now becomes key to characterize the normal cones of the sets Cᵢ and Cⱼ.
Lemma 1328 (i) For each x ∈ Cᵢ, we have N_{Cᵢ}(x) = {λ∇gᵢ(x) : λ ∈ R};

(ii) For each x ∈ Cⱼ, we have

N_{Cⱼ}(x) = {λ∇hⱼ(x) : λ ≥ 0}  if hⱼ(x) = cⱼ
N_{Cⱼ}(x) = {0}                 if hⱼ(x) < cⱼ
N_{Cⱼ}(x) = ∅                   if hⱼ(x) > cⱼ
where A(x̂) is the collection of the binding inequality constraints defined in (30.7). Since here the first order condition (31.7) is a necessary and sufficient optimality condition, we can say that x̂ ∈ C solves the optimization problem (31.1) if and only if there exists a triple of vectors (λ̂, μ̂, ν̂) ∈ R^{|I|} × R₊^{|J|} × Rⁿ, with ν̂ ∈ N_X(x̂), such that

∇f(x̂) = ν̂ + ∑_{i∈I} λ̂ᵢ ∇gᵢ(x̂) + ∑_{j∈J} μ̂ⱼ ∇hⱼ(x̂)   (31.15)

μ̂ⱼ (cⱼ − hⱼ(x̂)) = 0   ∀j ∈ J   (31.16)

Indeed, as we noted in Lemma 1303, condition (31.16) amounts to requiring μ̂ⱼ = 0 for each j ∉ A(x̂).
To sum up, under Slater's condition we get back the Kuhn-Tucker conditions (30.8) and (30.9), suitably modified to cope with the new constraint x ∈ X. We leave to the reader the formulation of these conditions via a Lagrangian function.

Example 1329 Let X = Rⁿ₊. By (31.8), ν̂ₖ x̂ₖ = 0 and ν̂ₖ ≤ 0 for each k = 1, …, n. By (31.15), we have

ν̂ = ∇f(x̂) − ∑_{i∈I} λ̂ᵢ ∇gᵢ(x̂) − ∑_{j∈J} μ̂ⱼ ∇hⱼ(x̂)   (31.17)
So, conditions (31.15) and (31.16) can be equivalently written (with gradients unzipped) as:

∂f(x̂)/∂xₖ ≤ ∑_{i∈I} λ̂ᵢ ∂gᵢ(x̂)/∂xₖ + ∑_{j∈J} μ̂ⱼ ∂hⱼ(x̂)/∂xₖ   ∀k = 1, …, n

(∂f(x̂)/∂xₖ − ∑_{i∈I} λ̂ᵢ ∂gᵢ(x̂)/∂xₖ − ∑_{j∈J} μ̂ⱼ ∂hⱼ(x̂)/∂xₖ) x̂ₖ = 0   ∀k = 1, …, n

μ̂ⱼ (cⱼ − hⱼ(x̂)) = 0   ∀j ∈ J
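The unzipped conditions can be checked numerically. The sketch below uses a hypothetical problem (numbers chosen for illustration, not from the text): maximize −(x₁−2)² − (x₂−2)² subject to x₁ + x₂ ≤ 2 over R²₊, whose solution is x̂ = (1, 1) with multiplier μ̂ = 2 and ν̂ = (0, 0).

```python
# Sketch (hypothetical numbers): Kuhn-Tucker check for
#   max -(x1-2)^2 - (x2-2)^2   sub  x1 + x2 <= 2,  x in R^2_+.
# The solution is x_hat = (1, 1) with multiplier mu_hat = 2 and nu_hat = (0, 0).

x_hat = (1.0, 1.0)
mu_hat, c = 2.0, 2.0

grad_f = (-2.0 * (x_hat[0] - 2.0), -2.0 * (x_hat[1] - 2.0))   # = (2, 2)
grad_h = (1.0, 1.0)
h_val = x_hat[0] + x_hat[1]

for k in range(2):
    nu_k = grad_f[k] - mu_hat * grad_h[k]     # (31.17): nu picks up the slack
    assert nu_k <= 1e-12                      # nu_hat <= 0 (normal cone of R^n_+)
    assert abs(nu_k * x_hat[k]) < 1e-12       # nu_hat_k * x_hat_k = 0
assert mu_hat >= 0.0
assert abs(mu_hat * (c - h_val)) < 1e-12      # (31.16): complementary slackness
print("Kuhn-Tucker conditions verified at", x_hat)
```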
Chapter 32

Intermezzo: correspondences
Example 1331 (i) The correspondence φ : R ⇉ R given by φ(x) = [−|x|, |x|] associates to each scalar x the interval [−|x|, |x|]. For instance, φ(1) = φ(−1) = [−1, 1] and φ(0) = {0}.

(ii) Given a consumption set A = [0, b] with b ∈ Rⁿ₊₊, the budget correspondence B : Rⁿ₊ × R₊ ⇉ Rⁿ₊ defined by B(p, w) = {x ∈ A : p · x ≤ w} associates to each pair (p, w) of prices and income the corresponding budget set.

(iii) Given a concave function f : Rⁿ → R, the superdifferential correspondence ∂f : Rⁿ ⇉ Rⁿ has as image ∂f(x) the superdifferential of f at x (cf. Proposition 1143). The superdifferential correspondence generalizes for concave functions the derivative operator ∇f : Rⁿ → Rⁿ defined in (21.6).

(iv) Let f : X → Y be a function between any two sets X and Y. The inverse correspondence f⁻¹ : Im f ⇉ X is defined by f⁻¹(y) = {x ∈ X : f(x) = y}. If f is injective, we get back the inverse function f⁻¹ : Im f → X. For instance, if f : R → R is the quadratic function f(x) = x², then Im f = [0, ∞) and so the inverse correspondence f⁻¹ : [0, ∞) ⇉ R is defined by

f⁻¹(y) = {−√y, √y}

for all y ≥ 0. Recall that in Example 170 we argued that this rule does not define a function since, to each strictly positive scalar, it associates two elements of the codomain, i.e., its positive and negative square roots. N
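A correspondence can be modeled in code as a function returning a set. The sketch below (an illustration, not from the text) encodes the inverse correspondence of the quadratic function from point (iv).

```python
# Sketch: a correspondence modeled as a set-valued function. The inverse
# correspondence of f(x) = x^2 returns {-sqrt(y), +sqrt(y)}.

import math

def f(x):
    return x * x

def f_inverse(y):
    # Inverse correspondence f^{-1}(y) = {x : f(x) = y}, defined for y >= 0.
    if y < 0:
        return set()
    r = math.sqrt(y)
    return {-r, r}

assert f_inverse(4.0) == {-2.0, 2.0}
assert f_inverse(0.0) == {0.0}          # a singleton: -0.0 == 0.0 in Python
assert all(f(x) == 4.0 for x in f_inverse(4.0))
print("inverse correspondence checks passed")
```

Note that φ(0) is a singleton, matching the text's observation that images of a correspondence may collapse to single points.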
Like the graph of a function, the graph of a correspondence is a subset of X × Y:

Gr φ = {(x, y) ∈ X × Y : y ∈ φ(x)}

If φ is a function, we get back the notion of the graph of a function Gr φ = {(x, y) ∈ X × Y : y = φ(x)}. Indeed, the condition y ∈ φ(x) reduces to y = φ(x) when each image φ(x) is a singleton.

Example 1332 (i) The graph of the correspondence φ : R ⇉ R given by φ(x) = [−|x|, |x|] is Gr φ = {(x, y) ∈ R² : −|x| ≤ y ≤ |x|}. Graphically:

Functions are, trivially, both compact-valued and convex-valued because singletons are compact convex sets. Let us see an important economic example.

Example 1333 Suppose that the consumption set A is both closed and convex, say it is Rⁿ₊. Then, the budget correspondence is convex-valued, as well as compact-valued if p ≫ 0 and w > 0, that is, when restricted to Rⁿ₊₊ × R₊₊ (cf. Proposition 792). N
32.1. DEFINITION AND BASIC NOTIONS 949
The converse implications are false: closedness and convexity of the graph of φ are significantly stronger assumptions than the closedness and convexity of the images φ(x). This is best seen by considering scalar functions, as we show next.

The lack of convexity is obvious. To see that Gr φ is not closed, observe that the origin is a boundary point that does not belong to Gr φ.

(ii) A scalar function f : R → R has a convex graph if and only if it is affine (i.e., it is a straight line). The "if" is obvious. As to the "only if," suppose that Gr f is convex. Given any x, y ∈ R and any α ∈ [0, 1], then (αx + (1 − α)y, αf(x) + (1 − α)f(y)) ∈ Gr f, that is, f(αx + (1 − α)y) = αf(x) + (1 − α)f(y), proving that f is affine. By Proposition 656, this implies that there exist m, q ∈ R such that f(x) = mx + q. We conclude that all scalar functions that are not affine are convex-valued but do not have convex graphs. N
32.2 Hemicontinuity

There are several notions of continuity for correspondences. For bounded correspondences, the main class of correspondences for which continuity will be needed (cf. Section 33.3), the following notions are adequate.

Definition 1336 A correspondence φ : A ⊆ Rⁿ ⇉ Rᵐ is

(i) upper hemicontinuous at x ∈ A if

xₙ → x, yₙ → y and yₙ ∈ φ(xₙ)

imply y ∈ φ(x);

(ii) lower hemicontinuous at x ∈ A if

xₙ → x and y ∈ φ(x)

imply that there exist elements yₙ ∈ φ(xₙ) such that yₙ → y;

(iii) continuous at x ∈ A if it is both upper and lower hemicontinuous at x.

A correspondence φ is upper (lower) hemicontinuous if it is upper (lower) hemicontinuous at all x ∈ A. A correspondence φ is continuous if it is upper and lower hemicontinuous.
φ(x) = [0, 2]  if 0 ≤ x < 1 ;   φ(x) = {1/2}  if x = 1

Formally, let xₙ → 1 and y ∈ φ(1) = {1/2}, that is, y = 1/2. If we take, for instance, yₙ = 1/2 ∈ φ(xₙ) for all n, we have yₙ → y. In contrast, φ is not upper hemicontinuous at x = 1 (where an "abrupt shrink" in the graph occurs). For example, consider the sequences xₙ = 1 − 1/n and yₙ = 1/4. It holds that xₙ → 1 and yₙ ∈ φ(xₙ), but yₙ trivially converges to 1/4 ∉ φ(1) = {1/2}. Finally, φ is easily seen to be continuous on [0, 1). N
φ(x) = [1, 2]  if 0 ≤ x < 1 ;   φ(x) = [1, 3]  if x = 1

Formally, if xₙ → 1, yₙ → y and yₙ ∈ φ(xₙ) = [1, 2], then y ∈ [1, 2] ⊆ φ(1). In contrast, φ is not lower hemicontinuous at x = 1 (where an "abrupt dilation" in the graph occurs). For example, consider the sequence xₙ = 1 − 1/n and y = 3. It holds that xₙ → 1 and y ∈ φ(1), but there is no sequence {yₙ} with yₙ ∈ φ(xₙ) that converges to y. Finally, φ is easily seen to be continuous on [0, 1). N
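The two counterexamples can be encoded directly. The sketch below (an illustration) represents each image as an interval (lo, hi) and checks the witnessing sequences used in the text.

```python
# Sketch: the two example correspondences, with the sequences witnessing the
# failure of upper (resp. lower) hemicontinuity at x = 1. Images are intervals (lo, hi).

def phi_shrink(x):            # [0,2] on [0,1), {1/2} at x = 1: not uhc at 1
    return (0.0, 2.0) if x < 1.0 else (0.5, 0.5)

def phi_dilate(x):            # [1,2] on [0,1), [1,3] at x = 1: not lhc at 1
    return (1.0, 2.0) if x < 1.0 else (1.0, 3.0)

def member(y, interval):
    lo, hi = interval
    return lo <= y <= hi

# Failure of upper hemicontinuity: x_n = 1 - 1/n, y_n = 1/4 in phi_shrink(x_n),
# but the limit 1/4 is not in phi_shrink(1).
assert all(member(0.25, phi_shrink(1.0 - 1.0/n)) for n in range(1, 100))
assert not member(0.25, phi_shrink(1.0))

# Failure of lower hemicontinuity: y = 3 in phi_dilate(1), but every y_n in
# phi_dilate(x_n) stays at distance >= 1 from 3.
assert member(3.0, phi_dilate(1.0))
assert all(phi_dilate(1.0 - 1.0/n)[1] <= 2.0 for n in range(1, 100))
print("hemicontinuity counterexamples verified")
```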
The next two results further clarify the nature of upper hemicontinuous correspondences.
Proof Suppose Gr φ is closed. Let xₙ → x, yₙ → y and yₙ ∈ φ(xₙ). Since (xₙ, yₙ) → (x, y) and Gr φ is a closed set, we have (x, y) ∈ Gr φ, yielding that y ∈ φ(x). We conclude that φ is upper hemicontinuous. As to the converse, assume that the domain A is closed and φ : A ⊆ Rⁿ ⇉ Rᵐ is upper hemicontinuous. Let {(xₙ, yₙ)} ⊆ Gr φ be such that (xₙ, yₙ) → (x, y) ∈ Rⁿ × Rᵐ. To show that Gr φ is closed, we need to show that (x, y) ∈ Gr φ. Since A is closed, x ∈ A; since φ is upper hemicontinuous, y ∈ φ(x), that is, (x, y) ∈ Gr φ.
Proof Let x ∈ A. We need to show that φ(x) is a closed set. Consider {yₙ} ⊆ φ(x) such that yₙ → y ∈ Rᵐ. Define {xₙ} ⊆ A by xₙ = x for every n. It follows that xₙ → x, yₙ → y and yₙ ∈ φ(xₙ) for every n. Since φ is upper hemicontinuous, we can conclude that y ∈ φ(x), yielding that φ(x) is closed.
For bounded functions the two notions of hemicontinuity are equivalent to continuity.

(i) f is continuous at x;

Proof First observe that, f being a function, y = f(x) amounts to y ∈ f(x) when we look at the function f as a single-valued correspondence. (i) implies (ii). Let xₙ → x and y = f(x). Since f is a function, we can only choose {yₙ} to be such that yₙ = f(xₙ). By continuity, yₙ = f(xₙ) → f(x) = y, so f is lower hemicontinuous at x. (ii) implies (iii). Let xₙ → x and {yₙ} be such that yₙ ∈ f(xₙ) and yₙ → y. Since f is a function, we can only choose {yₙ} to be such that yₙ = f(xₙ). Since f is lower hemicontinuous at x, it holds that yₙ → f(x) = y. This implies that f is upper hemicontinuous at x. (iii) implies (i). Let xₙ → x. We want to show that yₙ = f(xₙ) → f(x). Suppose not. Then there are ε > 0 and a subsequence {yₙₖ} such that
Definition 1342 Given any two sets A and B in Rⁿ, their sum A + B is the set in Rⁿ such that

A + B = {x + y : x ∈ A and y ∈ B}

Example 1343 (i) The sum of the unit square A = [0, 1] × [0, 1] and of the singleton B = {(3, 3)} is the square A + B = [3, 4] × [3, 4]. (ii) The sum of the squares A = [0, 1] × [0, 1] and B = [2, 3] × [2, 3] is the square A + B = [2, 4] × [2, 4]. Note that B ⊆ A + B since 0 ∈ A. (iii) The sum of the sides A = {(x₁, x₂) ∈ [0, 1] × [0, 1] : x₁ = 0} and B = {(x₁, x₂) ∈ [0, 1] × [0, 1] : x₂ = 0} of the unit square is the unit square itself, i.e., A + B = [0, 1] × [0, 1]. (iv) The sum of the vertical axis A = {(x₁, x₂) ∈ R² : x₁ = 0} and the horizontal axis B = {(x₁, x₂) ∈ R² : x₂ = 0} is the entire plane, i.e., A + B = R². N
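For finite point sets the Minkowski sum is directly computable. The sketch below (an illustration) checks example (ii) on the corner points of the two squares.

```python
# Sketch: Minkowski sum of finite point sets, checking example (ii) above on the
# corner points of the squares [0,1]^2 and [2,3]^2.

from itertools import product

def minkowski_sum(A, B):
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}

A = set(product([0.0, 1.0], repeat=2))      # corners of [0,1] x [0,1]
B = set(product([2.0, 3.0], repeat=2))      # corners of [2,3] x [2,3]

S = minkowski_sum(A, B)
# The corners of [2,4] x [2,4] are among the sums, consistent with A + B = [2,4]^2.
assert set(product([2.0, 4.0], repeat=2)) <= S
# B is contained in A + B because 0 is in A.
assert B <= S
print("Minkowski sum checks passed")
```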
Proof (i) Let A and B be convex. Let v, w ∈ A + B and α ∈ [0, 1]. By definition, there exist x′, x″ ∈ A and y′, y″ ∈ B such that v = x′ + y′ and w = x″ + y″. Since A and B are convex, we have that αx′ + (1 − α)x″ ∈ A and αy′ + (1 − α)y″ ∈ B. This implies that

αv + (1 − α)w = (αx′ + (1 − α)x″) + (αy′ + (1 − α)y″) ∈ A + B

and so A + B is convex.

By iterating, we can define the sum A₁ + ⋯ + Aₙ of n sets Aᵢ in Rⁿ. Properties (i) and (iii) just established for the sum of two sets continue to hold for sums of n sets.

Definition 1345 Given a scalar α ∈ R and a set A in Rⁿ, their product αA is the set in Rⁿ such that αA = {αx : x ∈ A}.

Example 1346 The product of the unit square A = [0, 1] × [0, 1] and of α = 2 is the square 2A = [0, 2] × [0, 2]. N
(i) if φ and ψ are bounded and upper hemicontinuous at a point, their sum φ + ψ is upper hemicontinuous at that point;

(ii) if φ and ψ are lower hemicontinuous at a point, their sum φ + ψ is lower hemicontinuous at that point.
Proof It is enough to consider the case α = β = 1, as the general case then easily follows.

(i) Suppose that at x we have xₙ → x, yₙ → y and yₙ ∈ (φ + ψ)(xₙ). We want to show that y ∈ (φ + ψ)(x). By definition, for each n there exist y′ₙ ∈ φ(xₙ) and y″ₙ ∈ ψ(xₙ) such that yₙ = y′ₙ + y″ₙ. Since φ and ψ are bounded, there exist compact sets K_φ and K_ψ such that {y′ₙ} ⊆ K_φ and {y″ₙ} ⊆ K_ψ. Hence, both sequences are bounded, so by the Bolzano-Weierstrass Theorem there exist subsequences {y′ₙₖ} and {y″ₙₖ} that converge to some points y′ ∈ Rᵐ and y″ ∈ Rᵐ, respectively. Since y′ₙₖ ∈ φ(xₙₖ) and y″ₙₖ ∈ ψ(xₙₖ) for every k and xₙₖ → x, we then have y′ ∈ φ(x) and y″ ∈ ψ(x) because φ and ψ are upper hemicontinuous at x. We conclude that y = limₖ→∞ yₙₖ = limₖ→∞ (y′ₙₖ + y″ₙₖ) = y′ + y″ ∈ (φ + ψ)(x), as desired.

(ii) Suppose that at x we have xₙ → x and y ∈ (φ + ψ)(x). We want to show that there exist elements yₙ ∈ (φ + ψ)(xₙ) such that yₙ → y. By definition, there exist y′ ∈ φ(x) and y″ ∈ ψ(x) such that y = y′ + y″. Since φ and ψ are lower hemicontinuous, there exist elements y′ₙ ∈ φ(xₙ) and y″ₙ ∈ ψ(xₙ) such that y′ₙ → y′ and y″ₙ → y″. Setting yₙ = y′ₙ + y″ₙ, we then have yₙ ∈ (φ + ψ)(xₙ) and yₙ = y′ₙ + y″ₙ → y′ + y″ = y, as desired.
By iterating the linear combination of two correspondences, we can define the linear combination of n correspondences φ₁, …, φₙ:

∑ᵢ₌₁ⁿ αᵢφᵢ   (32.3)
x ∈ f(x)

For instance, for the self-correspondence f : [0, 1] ⇉ [0, 1] given by f(x) = [0, x²], the endpoints 0 and 1 are fixed points in that 0 ∈ f(0) = {0} and 1 ∈ f(1) = [0, 1].

The next theorem establishes the existence of fixed points by generalizing Brouwer's Theorem (we omit its non-trivial proof).¹
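The fixed points of the example f(x) = [0, x²] can be located by a grid search (a sketch for illustration): x ∈ f(x) here means 0 ≤ x ≤ x², which on [0, 1] holds only at the endpoints.

```python
# Sketch: grid search for fixed points of the self-correspondence f(x) = [0, x^2]
# on [0, 1], i.e., points with x in f(x), which here means x <= x^2.

def is_fixed_point(x):
    return 0.0 <= x <= x * x      # x in [0, x^2]

grid = [k / 1000 for k in range(1001)]
fixed = [x for x in grid if is_fixed_point(x)]

# On [0, 1], x <= x^2 only at the endpoints 0 and 1.
assert fixed == [0.0, 1.0]
print("fixed points on the grid:", fixed)
```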
0 ∈ E(p)   (32.5)

and q ∈ D(p).

The existence of equilibria thus reduces to the solution of some inclusion equations defined by the excess market demand correspondence. To solve these inclusion equations, and thus establish the existence of equilibria, we consider the following assumptions on such correspondence:

We denoted the assumptions as in the earlier Section 12.8.3 because they have the same economic interpretation (upon which we already expatiated). We use the letter "E" for the first and fourth assumptions because they have to adapt their mathematical form to the more general setting of correspondences.

We can now state and prove a general version of Arrow-Debreu's Theorem.

Theorem 1352 (Arrow-Debreu) Under assumptions E.1, A.2, and W.1 a weak market equilibrium exists. If, in addition, assumptions E.4 and W.2 hold, then a market equilibrium exists.
Proof We follow Debreu (1959). Since E is bounded, there is a compact set K in Rⁿ such that E(p) ⊆ K for all p ∈ Rⁿ₊. Without loss of generality, we can assume that K is convex. By E.2, we can limit ourselves to the upper hemicontinuous restriction E : Δⁿ⁻¹ ⇉ K. Define g : K ⇉ Δⁿ⁻¹ by

g(z) = arg max_{p∈Δⁿ⁻¹} p · z
that, in his consumer role, agent i solves, we no longer assume that the solution is unique, but permit multiple optimal bundles. Consequently, we now have a demand correspondence Dᵢ : Rⁿ₊ ⇉ Rⁿ₊ defined by

Dᵢ(p) = arg max_{x∈Bᵢ(p, p·ωᵢ)} uᵢ(x)   ∀p ∈ Rⁿ₊

The aggregate demand correspondence D : Rⁿ₊ ⇉ Rⁿ₊ is defined by D(p) = ∑_{i∈I} Dᵢ(p), where now, though, the sum is in the sense of (32.3). The aggregate demand correspondence still inherits the invariance property of individual demand correspondences, i.e., D(αp) = D(p) for all α > 0, since this invariance property is easily seen to continue to hold for each agent.

The aggregate supply function S : Rⁿ₊ → Rⁿ continues to be S(p) = {ω}. So, the weak Walras' law still takes the form p · E(p) ≤ 0, where E : Rⁿ₊ ⇉ Rⁿ is the excess demand correspondence defined by E(p) = D(p) − {ω}. If Walras' law holds for each agent i ∈ I, i.e., p · Dᵢ(p) = p · ωᵢ for each i ∈ I, then its aggregate version p · E(p) = 0 holds.
Here a pair (p, x) ∈ Rⁿ₊ × (Rⁿ₊)^{|I|} of prices and consumption allocations is a weak Arrow-Debreu (market) equilibrium of the exchange economy E if

The pair (p, x) becomes an Arrow-Debreu (market) equilibrium if in the market clearing condition (ii) we have equality, so that optimal bundles exhaust endowments.
The next result, a general version of Lemma 850, connects the Arrow-Debreu and the aggregate market equilibrium notions.

Lemma 1353 Given a pair (p, x) ∈ Rⁿ₊ × (Rⁿ₊)^{|I|} of prices and consumption allocations, set q = ∑_{i∈I} xᵢ. The pair (p, x) is a:

(i) Arrow-Debreu equilibrium if and only if p solves the inclusion equation (32.4) and q ∈ D(p);

(ii) weak Arrow-Debreu equilibrium if and only if p solves the inclusion equation (32.5) and q ∈ D(p).
We can now establish which properties of the utility functions and endowments of the agents of economy E imply the properties of the aggregate demand correspondence that the Arrow-Debreu Theorem requires. For simplicity, we consider weak equilibria and prove the desired existence result that generalizes Proposition 851.

Proposition 1354 Let E = {(uᵢ, ωᵢ)}_{i∈I} be an economy in which, for each agent i ∈ I, the endowment ωᵢ is strictly positive and the utility function uᵢ is continuous and quasi-concave on a consumption set Aᵢ = [0, bᵢ] where bᵢ ∈ Rⁿ₊₊. Then, a weak market price equilibrium of the exchange economy E exists.

This existence result generalizes Proposition 851 in that utility functions are only required to be quasi-concave and not strictly quasi-concave.

The aggregate demand correspondence D inherits these properties, i.e., it is bounded, convex-valued, and upper hemicontinuous on Rⁿ₊₊. So, condition E.1 is satisfied. Since we already noted that conditions A.2 and W.1 hold, we conclude that a weak market price equilibrium exists by the Arrow-Debreu Theorem.
Chapter 33

Parametric optimization problems

33.1 Definition
Given a set Θ ⊆ Rᵐ of parameters and an all-inclusive choice space A ⊆ Rⁿ, suppose that each value of the parameter vector θ determines a choice (or feasible) set φ(θ) ⊆ A. Choice sets are thus identified, as the parameter varies, by a feasibility correspondence φ : Θ ⇉ A. An objective function f : A × Θ → R, defined over pairs (a, θ) of choices a and parameters θ, has to be optimized on the feasible sets determined by the correspondence φ : Θ ⇉ A. Jointly, φ and f thus determine an optimization problem in parametric form:

maxₓ f(x, θ)  sub  x ∈ φ(θ)   (33.1)

When f(·, θ) is, for every θ ∈ Θ, concave (quasi-concave) on the convex set A and φ is convex-valued, this problem is called concave (quasi-concave).

A point x̂ ∈ φ(θ) is a solution for θ ∈ Θ if it is an optimal choice given θ, that is,

f(x̂, θ) ≥ f(x, θ)   ∀x ∈ φ(θ)

That is, the solution correspondence σ associates to each θ the corresponding solution set, i.e., the set of optimal choices. Its domain S is the solution domain, that is, the collection of all θ's for which problem (33.1) admits a solution. If the solution is unique at all θ ∈ S, then σ is single-valued, that is, it is a function. In this case we say that σ is a solution function.
961
962 CHAPTER 33. PARAMETRIC OPTIMIZATION PROBLEMS
Example 1355 The parametric optimization problem with equality and inequality constraints has the form

maxₓ f(x, θ)   (33.3)
sub  gᵢ(x, θ) = 0  ∀i ∈ I
     hⱼ(x, θ) ≤ 0  ∀j ∈ J

where gᵢ : A × Θ ⊆ Rⁿ × Rᵐ → R for every i ∈ I, hⱼ : A × Θ ⊆ Rⁿ × Rᵐ → R for every j ∈ J, and θ = (θ₁, …, θₘ) ∈ Rᵐ. Here
Example 1356 The consumer problem (Section 18.1.4) is a parametric optimization problem. The set A is the consumption set. The space Rⁿ⁺¹₊ of all price and income pairs is the parameter set Θ, with generic element θ = (p, w). The budget correspondence B : Rⁿ⁺¹₊ ⇉ Rⁿ₊ is the feasibility correspondence and the utility function u : A → R is the objective function (interestingly, in this important example the objective function does not depend on the parameter). Let S ⊆ Θ be the set of all parameters (p, w) for which the consumer problem has a solution (i.e., an optimal bundle). The demand correspondence D : S ⇉ Rⁿ₊ is the solution correspondence, which becomes a demand function D : S → Rⁿ₊ when optimal bundles are unique. Finally, the indirect utility function v : S → R is the value function. N
We now turn to convexity properties. We assume that the set A is convex and, to ease matters, that S = Θ.

f(x̂₁, θ) ≥ f(αx̂₁ + (1 − α)x̂₂, θ) ≥ min{f(x̂₁, θ), f(x̂₂, θ)} = f(x̂₁, θ) = f(x̂₂, θ) = v(θ)

and so f(αx̂₁ + (1 − α)x̂₂, θ) = v(θ), i.e., αx̂₁ + (1 − α)x̂₂ ∈ σ(θ).

The convexity of the solution set means inter alia that, when non-empty, such a set is either a singleton or an infinite set. That is, either the solution is unique or there are infinitely many solutions. Next we give the most important sufficient condition that ensures uniqueness.
Turn now to value functions. In the following result we assume the convexity of the graph of φ. As we already remarked, this is a substantially stronger assumption than the convexity of the images φ(θ).

v(αθ₁ + (1 − α)θ₂) ≥ f(αx̂₁ + (1 − α)x̂₂, αθ₁ + (1 − α)θ₂) = f(α(x̂₁, θ₁) + (1 − α)(x̂₂, θ₂))
                   ≥ min{f(x̂₁, θ₁), f(x̂₂, θ₂)} = min{v(θ₁), v(θ₂)}

v(αθ₁ + (1 − α)θ₂) ≥ f(αx̂₁ + (1 − α)x̂₂, αθ₁ + (1 − α)θ₂) = f(α(x̂₁, θ₁) + (1 − α)(x̂₂, θ₂))
                   ≥ αf(x̂₁, θ₁) + (1 − α)f(x̂₂, θ₂) = αv(θ₁) + (1 − α)v(θ₂)

So, v is concave.

Example 1361 In the consumer problem, the graph of the budget correspondence is convex if the consumption set is convex. Indeed, let ((p, w), x), ((p′, w′), x′) ∈ Gr B and let α ∈ [0, 1]. Then, p · (αx + (1 − α)x′) ≤ αw + (1 − α)w′, so the set Gr B is convex. By Proposition 1358, the demand correspondence is convex-valued if the utility function is quasi-concave, while by Proposition 1360 the indirect utility is quasi-concave (concave) if the utility function is quasi-concave (concave). N
Under the continuity of both the objective function and feasibility correspondence, the
optimization problem is thus stable under changes in parameters: both the value function
and the solution correspondence are continuous. The Maximum Theorem is an important
result in applications because, as remarked before, the stability that it ensures is often a
desirable property of the optimization problems that they feature. Natura non facit saltus
as long as the hypotheses of the Maximum Theorem are satis…ed.
Lemma 1363 Given any bounded sequence of scalars {aₙ}, if lim supₙ→∞ aₙ = a, then there exists a subsequence {aₙₖ} such that limₖ→∞ aₙₖ = a.

Set n₁ = min{n ≥ 1 : |aₙ − a| < 1} and recursively

nₖ₊₁ = min{n ≥ 1 : n > nₖ and |aₙ − a| < 1/(k + 1)}

In this way, {aₙₖ} is a subsequence of {aₙ}. Indeed, by construction, nₖ₊₁ > nₖ for every k ≥ 1. At the same time, {aₙₖ} converges to a. Again, by construction, it is sufficient to note that |aₙₖ − a| < 1/k for every k ≥ 1. Thus, the subsequence {aₙₖ} is the subsequence we were looking for. Nevertheless, we are not done. Indeed, to end the proof we have to show that {aₙₖ} is well defined. The careful reader has probably noted already that the current proof, despite being correct, is incomplete. Indeed, we do not know that the sets whose minima we are taking are non-empty, so that these minima are well defined. The rest of the proof is devoted to showing exactly this.

For each n ≥ 1, set Aₙ = sup_{m≥n} aₘ ∈ R. Recall that a = limₙ→∞ Aₙ = infₙ Aₙ. Fix any ε > 0. On the one hand, since Aₙ converges to a, there exists some n_ε ≥ 1 such that Aₙ_ε − a < ε/2. On the other hand, by the definition of supremum, there is some m ≥ n_ε such that Aₙ_ε − ε/2 ≤ aₘ ≤ Aₙ_ε. In turn, this easily implies that the set

{n ≥ 1 : |aₙ − a| < ε}

is not empty. If we set ε = 1, the set {n ≥ 1 : |aₙ − a| < 1} is then not empty. At the same time, in view of the trivial inclusion {m ≥ 1 : |aₘ − a| < ε′} ⊆ {m ≥ 1 : |aₘ − a| < ε} if ε > ε′ > 0, we conclude that the latter set is infinite. This yields that

{n ≥ 1 : n > nₖ and |aₙ − a| < 1/(k + 1)} = {n ≥ 1 : |aₙ − a| < 1/(k + 1)} ∖ {1, …, nₖ}

is the difference between an infinite set and a finite one, and so is non-empty, as desired.
Proof of the Maximum Theorem Since φ is bounded, recall that there exists a compact set K such that φ(θ) ⊆ K ⊆ A for all θ ∈ Θ. Suppose that φ and f are continuous. By Proposition 1339, the set φ(θ) is closed for each θ ∈ Θ. Since φ is bounded, φ(θ) turns out to be compact as well. By Proposition 1357, S = Θ and σ is compact-valued. Fix any point θ̄ ∈ Θ and consider a sequence {θₙ} ⊆ Θ such that limₙ→∞ θₙ = θ̄. We first prove that {v(θₙ)} is bounded. By contradiction, assume that supₙ |v(θₙ)| = +∞. It follows that there exists a subsequence {θₙₖ} such that |v(θₙₖ)| ≥ k for every k ≥ 1. For each k ≥ 1, let x̂ₙₖ ∈ φ(θₙₖ) be such that v(θₙₖ) = f(x̂ₙₖ, θₙₖ). By the Bolzano-Weierstrass Theorem and since φ is bounded, there exists a subsequence {x̂ₙₖₛ} that converges to some x̄ ∈ K. Since φ is continuous and limₛ→∞ θₙₖₛ = θ̄, we can conclude that x̄ ∈ φ(θ̄). Since f is continuous, this implies that

+∞ = limₛ→∞ |v(θₙₖₛ)| = limₛ→∞ |f(x̂ₙₖₛ, θₙₖₛ)| = |f(x̄, θ̄)| < +∞

a contradiction. Hence {v(θₙ)} is bounded; a similar argument, now based on Lemma 1363, then shows that limₙ→∞ v(θₙ) = v(θ̄).

It remains to show that σ is upper hemicontinuous at θ̄. Let θₙ → θ̄ and xₙ → x̄ with xₙ ∈ σ(θₙ). We want to show that x̄ ∈ σ(θ̄). Since σ(θₙ) ⊆ φ(θₙ) and φ is upper hemicontinuous, clearly x̄ ∈ φ(θ̄). By the continuity of both f and v, we then have

f(x̄, θ̄) = limₙ→∞ f(xₙ, θₙ) = limₙ→∞ v(θₙ) = v(θ̄)

and so x̄ ∈ σ(θ̄), as desired.
For instance, the continuity properties of demand correspondences and indirect utility
functions follow from the Maximum Theorem. To this end, we need the following continuity
property of the budget correspondence.
33.3. MAXIMUM THEOREM 967
Proposition 1364 The budget correspondence is continuous at all (p, w) such that w > 0.

Proof Let (p, w) ∈ Rⁿ₊ × R₊₊. We first show that B is upper hemicontinuous at (p, w). Let (pₙ, wₙ) → (p, w), xₙ → x and xₙ ∈ B(pₙ, wₙ). We want to show that x ∈ B(p, w). Since pₙ · xₙ ≤ wₙ for each n, it holds that p · x = limₙ→∞ pₙ · xₙ ≤ limₙ→∞ wₙ = w, that is, x ∈ B(p, w). We conclude that B is upper hemicontinuous at (p, w).

The correspondence B is also lower hemicontinuous at (p, w). Let (pₙ, wₙ) → (p, w) and x ∈ B(p, w). We want to show that there is a sequence {xₙ} such that xₙ ∈ B(pₙ, wₙ) and xₙ → x. We consider two cases.

(i) Suppose p · x < w. Since (pₙ, wₙ) → (p, w), there is n̄ large enough so that pₙ · x < wₙ for all n ≥ n̄. Hence, the constant sequence xₙ = x is such that xₙ ∈ B(pₙ, wₙ) for all n ≥ n̄ and xₙ → x.

(ii) Suppose p · x = w. Since w > 0, there is x̄ ∈ Rⁿ₊ such that p · x̄ < w. Since (pₙ, wₙ) → (p, w), there is n̄ large enough so that pₙ · x̄ < wₙ for all n ≥ n̄. Set

xₙ = (1 − 1/n) x + (1/n) x̄

In both cases it then easily follows that there exists a sequence {xₙ} such that xₙ ∈ B(pₙ, wₙ) and xₙ → x. We conclude that B is lower hemicontinuous at (p, w).
We can now apply the Maximum Theorem to the consumer problem that, under a mild
continuity hypothesis on the utility function, turns out to be stable with respect to changes
in prices and wealth.
(i) the demand correspondence is compact-valued and upper hemicontinuous at (p; w);
Proof Since the consumption set is compact, the budget correspondence is bounded and
continuous on Rn+ R++ . Since the utility function is continuous, the result then follows
from the Maximum Theorem.
Observe that (i) implies that demand functions are continuous at (p; w) since upper
hemicontinuity and continuity coincide for bounded functions (Proposition 1341).
where the feasibility correspondence is constant, with $\varphi(\theta) = C \subseteq A$ for all $\theta \in \Theta$. The parameter only affects the objective function. To ease matters, throughout the section we also assume that $S = \Theta$.
We first approach the issue heuristically. To this end, suppose that $n = k = 1$, so that both the parameter $\theta$ and the choice variable $x$ are scalars. Moreover, assume that there is a unique solution for each $\theta$, so that $\sigma : \Theta \to \mathbb{R}$ is the solution function. Then $v(\theta) = f(\sigma(\theta), \theta)$ for every $\theta \in \Theta$. A heuristic application of the chain rule, a "back of the envelope" calculation, then suggests that, if it exists, the derivative of $v$ at $\theta_0$ is:
$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x}\,\sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}$$
Remarkably, the first term is null because by Fermat's Theorem $(\partial f/\partial x)(\sigma(\theta_0), \theta_0) = 0$ (provided the solution is interior). Thus,
$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} \tag{33.5}$$
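Formula (33.5) can be checked numerically on a toy objective of our own choosing (not from the text): for $f(x, \theta) = -x^2 + \theta x$ the solution is $\sigma(\theta) = \theta/2$, and the envelope formula predicts $v'(\theta_0) = \partial f/\partial\theta$ evaluated at $(\sigma(\theta_0), \theta_0)$, which here equals $\sigma(\theta_0)$ itself.

```python
# Envelope-formula sanity check on f(x, t) = -x**2 + t*x (illustrative choice).
# Solution: x(t) = t/2; value function: v(t) = t**2/4; formula (33.5)
# predicts v'(t) = df/dt at (x(t), t), i.e. x(t).

def f(x, t):
    return -x**2 + t*x

def solution(t):              # argmax of f(., t), known in closed form here
    return t / 2.0

def value(t):                 # value function v(t) = f(x(t), t)
    return f(solution(t), t)

t0 = 1.5
h = 1e-6
v_prime = (value(t0 + h) - value(t0 - h)) / (2*h)   # central difference
envelope = solution(t0)       # df/dt at (x(t0), t0) equals x(t0) here

print(v_prime, envelope)      # both close to 0.75
```

The direct term $(\partial f/\partial x)\,\sigma'$ is absent from `envelope`, exactly as Fermat's Theorem predicts.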
Next we make this important finding general and rigorous.
Theorem 1366 Suppose $f(x, \cdot)$ is, for every $x \in C$, differentiable at $\theta_0 \in \operatorname{int}\Theta$. If $v$ is differentiable at $\theta_0$, then for every $\hat{x} \in \sigma(\theta_0)$ we have $\nabla v(\theta_0) = \nabla_\theta f(\hat{x}, \theta_0)$, that is,
$$\frac{\partial v(\theta_0)}{\partial \theta_i} = \frac{\partial f(\hat{x}, \theta_0)}{\partial \theta_i} \qquad \forall i = 1, \ldots, k \tag{33.6}$$
In particular, when $\sigma(\theta_0)$ is the unique solution at $\theta_0$,
$$\frac{\partial v(\theta_0)}{\partial \theta_i} = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta_i} \qquad \forall i = 1, \ldots, k$$
which is the general form of the heuristic formula (33.5).
We thus have
$$\frac{w(\theta_0 + tu) - w(\theta_0)}{t} \le \frac{v(\theta_0 + tu) - v(\theta_0)}{t}$$
for all $u \in \mathbb{R}^k$ and $t > 0$ sufficiently small. Hence,
$$\frac{\partial f(x, \theta_0)}{\partial \theta_i} = \lim_{h \to 0^+} \frac{f(x, \theta_0 + he^i) - f(x, \theta_0)}{h} = \lim_{h \to 0^+} \frac{w(\theta_0 + he^i) - w(\theta_0)}{h} \le \lim_{h \to 0^+} \frac{v(\theta_0 + he^i) - v(\theta_0)}{h} = \frac{\partial v(\theta_0)}{\partial \theta_i}$$
so that
$$\frac{\partial f(x, \theta_0)}{\partial \theta_i} \le \frac{\partial v(\theta_0)}{\partial \theta_i}$$
The hypothesis that $v$ is differentiable is not that appealing because it is not stated in terms of the primitive elements $f$ and $C$ of problem (33.4). Indeed, to check it we need to know the value function. Remarkably, in concave problems this differentiability hypothesis follows from hypotheses directly on the objective function.

Theorem 1367 Let $C$ and $\Theta$ be convex. Suppose $f(x, \cdot)$ is, for every $x \in C$, differentiable at $\theta_0 \in \operatorname{int}\Theta$. If $f$ is concave on $C \times \Theta$, then $v$ is differentiable at $\theta_0$.
Proof By Proposition 1360, $v$ is concave. We begin by proving that $\partial v(\theta_0) \subseteq \bigcap_{x \in \sigma(\theta_0)} \partial_\theta f(x, \theta_0)$. Let $\xi \in \partial v(\theta_0)$, so that $v(\theta) \le v(\theta_0) + \xi \cdot (\theta - \theta_0)$ for all $\theta \in \Theta$. Since $v(\theta_0) = w(\theta_0)$, by (33.7) we have, for all $\theta \in \Theta$,
$$w(\theta) \le v(\theta) \le v(\theta_0) + \xi \cdot (\theta - \theta_0) = w(\theta_0) + \xi \cdot (\theta - \theta_0)$$
$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} - \hat{\lambda}(\theta_0)\,\frac{\partial g(\sigma(\theta_0), \theta_0)}{\partial \theta}$$
where $\hat{\lambda}(\theta_0)$ is the Lagrange multiplier that corresponds to the unique solution $\sigma(\theta_0)$. Indeed, since $g(\sigma(\theta), \theta) = 0$ for every $\theta \in \Theta$, by a heuristic application of the chain rule we have
$$\frac{\partial g(\sigma(\theta_0), \theta_0)}{\partial x}\,\sigma'(\theta_0) + \frac{\partial g(\sigma(\theta_0), \theta_0)}{\partial \theta} = 0$$
On the other hand, since $v(\theta) = f(\sigma(\theta), \theta)$ for every $\theta \in \Theta$, again by a heuristic application of the chain rule we have
$$\begin{aligned}
v'(\theta_0) &= \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x}\,\sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}\\
&= \left[\frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x} - \hat{\lambda}(\theta_0)\frac{\partial g(\sigma(\theta_0), \theta_0)}{\partial x}\right]\sigma'(\theta_0) + \hat{\lambda}(\theta_0)\frac{\partial g(\sigma(\theta_0), \theta_0)}{\partial x}\,\sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}\\
&= \underbrace{\left[\frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x} - \hat{\lambda}(\theta_0)\frac{\partial g(\sigma(\theta_0), \theta_0)}{\partial x}\right]}_{=0}\sigma'(\theta_0) - \hat{\lambda}(\theta_0)\frac{\partial g(\sigma(\theta_0), \theta_0)}{\partial \theta} + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}\\
&= \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} - \hat{\lambda}(\theta_0)\,\frac{\partial g(\sigma(\theta_0), \theta_0)}{\partial \theta}
\end{aligned}$$
where the bracketed term is null by the first order condition of Lagrange's Theorem, and the chain-rule identity for $g$ was used to rewrite the middle term. This gives the formula, as desired. Next we make the result more rigorous and general. We study the case of unique solutions, common in applications.
33.5. ENVELOPE THEOREMS II: VARIABLE CONSTRAINT 971
Theorem 1368 Suppose that problem (33.8) has a unique solution $\sigma(\theta)$ at all $\theta \in \Theta$.² Suppose that the sets $A$ and $\Theta$ are open and that $f$ and $g$ are continuously differentiable on $A \times \Theta$. If the determinant of the Jacobian of the operator $(\nabla_x L, g)$ is non-zero on $\Theta$, then
$$\nabla v(\theta) = \nabla_\theta f(\sigma(\theta), \theta) - \hat{\lambda}(\theta) \cdot \nabla_\theta g(\sigma(\theta), \theta) \qquad \forall \theta \in \Theta$$
That is,
$$\frac{\partial v(\theta)}{\partial \theta_s} = \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta_s} - \sum_{i=1}^m \hat{\lambda}_i(\theta)\,\frac{\partial g_i(\sigma(\theta), \theta)}{\partial \theta_s} \qquad \forall s = 1, \ldots, k \tag{33.9}$$
for all $\theta \in \Theta$.
Proof As in the heuristic argument, we consider the case $n = k = m = 1$ (the general case being just notationally messier). By hypothesis, there is a solution function $\sigma : \Theta \to A$. By Lagrange's Theorem, $\sigma$ is then the unique function that, along with a "multiplier" function $\hat{\lambda} : \Theta \to \mathbb{R}$, satisfies for all $\theta \in \Theta$ the equations
$$\nabla_x L(\sigma(\theta), \hat{\lambda}(\theta)) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x} - \hat{\lambda}(\theta)\frac{\partial g(\sigma(\theta), \theta)}{\partial x} = 0$$
$$\nabla_\lambda L(\sigma(\theta), \hat{\lambda}(\theta)) = g(\sigma(\theta), \theta) = 0$$
Differentiating the constraint identity $g(\sigma(\theta), \theta) = 0$ yields
$$\frac{\partial g(\sigma(\theta), \theta)}{\partial \theta} + \frac{\partial g(\sigma(\theta), \theta)}{\partial x}\,\sigma'(\theta) = 0 \qquad \forall \theta \in \Theta \tag{33.10}$$
while differentiating $v(\theta) = f(\sigma(\theta), \theta)$ yields
$$v'(\theta) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x}\,\sigma'(\theta) + \frac{\partial f}{\partial \theta}(\sigma(\theta), \theta) \qquad \forall \theta \in \Theta \tag{33.11}$$
Putting together (33.10) and (33.11) via the simple algebra seen in the heuristic derivation, we get
$$v'(\theta) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x}\,\sigma'(\theta) + \frac{\partial f}{\partial \theta}(\sigma(\theta), \theta) = \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta} - \hat{\lambda}(\theta)\,\frac{\partial g(\sigma(\theta), \theta)}{\partial \theta} \qquad \forall \theta \in \Theta$$
as desired.
² Earlier in the chapter we saw which conditions ensure the existence and uniqueness of solutions.
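The constrained envelope formula of Theorem 1368 can also be illustrated numerically. The problem below is a toy of our own choosing, not from the text: maximize $f(x, \theta) = x\theta$ subject to $g(x, \theta) = x - \theta = 0$, whose solution is $\sigma(\theta) = \theta$ with multiplier $\hat{\lambda}(\theta) = \theta$, so the theorem predicts $v'(\theta) = \partial f/\partial\theta - \hat{\lambda}\,\partial g/\partial\theta = \theta + \theta = 2\theta$.

```python
# Numerical check of the constrained envelope formula on a toy problem
# (illustrative choice): maximize f(x, t) = x*t subject to g(x, t) = x - t = 0.
# Solution x(t) = t, multiplier lam(t) = t (from t - lam = 0), and the
# theorem predicts v'(t) = df/dt - lam*dg/dt = x(t) - t*(-1) = 2t.

def value(t):
    x = t                      # the unique feasible (hence optimal) point
    return x * t               # v(t) = t**2

t0 = 0.7
h = 1e-6
v_prime = (value(t0 + h) - value(t0 - h)) / (2*h)   # central difference
predicted = 2 * t0             # df/dt - lam*dg/dt evaluated at (x(t0), t0)

print(v_prime, predicted)      # both close to 1.4
```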
$$\frac{\partial v(\theta_0)}{\partial \theta_s} = \frac{\partial f(\hat{x}, \theta_0)}{\partial \theta_s} - \sum_{i \in I} \hat{\lambda}_i(\theta_0)\,\frac{\partial \varphi_i(\sigma(\theta_0), \theta_0)}{\partial \theta_s} - \sum_{j \in J} \hat{\mu}_j(\theta_0)\,\frac{\partial \psi_j(\sigma(\theta_0), \theta_0)}{\partial \theta_s} \tag{33.12}$$
for every $s = 1, \ldots, k$. Here $(\hat{\lambda}(\theta_0), \hat{\mu}(\theta_0)) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+$ are the Lagrange multipliers associated with the solution $\sigma(\theta_0)$, assumed to be unique (for simplicity).
We can derive this formula heuristically with the argument just used for the equality case. Indeed, if we denote by $A(\sigma(\theta_0))$ the set of binding constraints at $\theta_0$, by Lemma 1303 we have $\hat{\mu}_j = 0$ for each $j \notin A(\sigma(\theta_0))$. So, the non-binding constraints at $\theta_0$ do not affect the derivation because their multipliers are null.
That said, let us consider the standard problem (30.4) in which the objective function does not depend on the parameter, $\varphi_i(x, \theta) = g_i(x) - b_i$ for every $i \in I$, and $\psi_j(x, \theta) = h_j(x) - c_j$ for every $j \in J$ (Example 1355). Formula (33.12) then implies
$$\frac{\partial v(b, c)}{\partial b_i} = \hat{\lambda}_i(b, c) \qquad \forall i \in I$$
$$\frac{\partial v(b, c)}{\partial c_j} = \hat{\mu}_j(b, c) \qquad \forall j \in J$$
Interestingly, the multipliers describe the marginal effect on the value function of relaxing the constraints, that is, how valuable it is to relax them. In particular, we have $\partial v(b, c)/\partial c_j = \hat{\mu}_j(b, c) \ge 0$ because it is always beneficial to relax an inequality constraint: more alternatives become available. In contrast, this might not be the case for an equality constraint, so the sign of $\partial v(b, c)/\partial b_i = \hat{\lambda}_i(b, c)$ is ambiguous.
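The marginal-value interpretation can be checked numerically on a toy problem of our own choosing: maximize $\sqrt{x}$ subject to $x \le b$. The constraint binds at the optimum, so $v(b) = \sqrt{b}$, and the multiplier is $\hat{\mu}(b) = 1/(2\sqrt{b})$, which should equal $\partial v/\partial b$.

```python
import math

# Multiplier as marginal value of relaxing the constraint (toy problem):
# maximize sqrt(x) subject to x <= b. The optimum is x = b, so
# v(b) = sqrt(b) and the Kuhn-Tucker multiplier is 1/(2*sqrt(b)).

def value(b):
    return math.sqrt(b)        # v(b): the constraint binds at the optimum

b0 = 4.0
mu = 1.0 / (2.0 * math.sqrt(b0))    # multiplier at b0, equals 0.25
h = 1e-6
dv_db = (value(b0 + h) - value(b0 - h)) / (2*h)   # finite-difference dv/db

print(dv_db, mu)               # both close to 0.25
```

As the text notes, the multiplier of this inequality constraint is positive: relaxing $b$ enlarges the feasible set and raises the value.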
for all x; y 2 I.
$$f(x, \theta) - f(x + h, \theta) = \varphi(x) - \varphi(x + h) - \theta \cdot h \qquad \forall \theta \in \Theta$$
We can now address the question that we posed at the beginning of this section. To ease matters, from now on we assume that problem (33.13) has a solution for every $\theta \in \Theta$ (e.g., $I$ is compact and $f$ is continuous in $x$), so we can write the solution correspondence as $\sigma : \Theta \rightrightarrows \mathbb{R}^n$. In most applications, comparative statics exercises actually feature solution functions $\sigma : \Theta \to \mathbb{R}^n$ rather than correspondences (as we already argued several times). This motivates the next result.
Example 1375 From the last example we know that, given a supermodular function $g : I \to \mathbb{R}$, the function $f : I \times \mathbb{R}^n \to \mathbb{R}$ defined by $f(x, \theta) = g(x) + \theta \cdot x$ is parametrically supermodular. Consider the parametric problem
$$\max_x f(x, \theta) \quad \text{sub} \quad x \in \varphi(\theta)$$
where the feasibility correspondence $\varphi$ is ascending. By the last corollary, the solution correspondence of this problem is ascending. For instance, consider a Cobb-Douglas production function $g(x_1, x_2) = x_1^{\alpha_1} x_2^{\alpha_2}$, with $\alpha_1, \alpha_2 > 0$. If $q \ge 0$ is the output's price and $p = (p_1, p_2) \gg 0$ are the inputs' prices, the profit function $\pi(x, q) = q x_1^{\alpha_1} x_2^{\alpha_2} - p_1 x_1 - p_2 x_2$ is parametrically supermodular because $g$ is supermodular (see Example 760). The producer problem is
$$\max_{x_1, x_2} \pi(x, q) \quad \text{sub} \quad x_1, x_2 \ge 0$$
where the output's price $q$ plays the role of the parameter $\theta$. Since the profit function is strictly concave (which requires $\alpha_1 + \alpha_2 < 1$), solutions are unique (if they exist). In particular, a solution of the producer problem is an optimal amount of inputs that the producer will demand. By the last corollary,⁴ the solution function is increasing: if the output's price increases, the producer's demand for inputs increases. N
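The monotonicity of input demand in the output price can be seen numerically by grid search. The parameter values below ($\alpha_1 = \alpha_2 = 1/4$, $p_1 = p_2 = 1$) are our own illustrative choices; in closed form the demand is $x_1 = x_2 = q^2/16$, so demand at $q = 5$ should exceed demand at $q = 4$ componentwise.

```python
# Grid-search check that input demand rises with the output price q.
# Profit: pi(x, q) = q*x1**a1*x2**a2 - p1*x1 - p2*x2 with a1 = a2 = 0.25
# and p1 = p2 = 1 (illustrative parameters); closed form: x1 = x2 = q**2/16.

def profit(x1, x2, q, a1=0.25, a2=0.25, p1=1.0, p2=1.0):
    return q * x1**a1 * x2**a2 - p1*x1 - p2*x2

def demand(q, step=0.01, hi=3.0):
    # coarse grid search for the profit-maximizing input bundle
    grid = [i * step for i in range(int(hi / step) + 1)]
    return max(((x1, x2) for x1 in grid for x2 in grid),
               key=lambda x: profit(x[0], x[1], q))

d4, d5 = demand(4.0), demand(5.0)
print(d4, d5)   # about (1.0, 1.0) and (1.56, 1.56): demand rises with q
```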
and
$$f(y, \theta') > f(x \vee y, \theta') \implies f(x \wedge y, \theta) > f(x, \theta) \tag{33.16}$$
Interdependent optimization
So far we have considered individual optimization problems. Many economic and social phenomena, however, are characterized by the interplay of several such problems, in which the outcomes of agents' decisions depend both on their own decisions and on the decisions of other agents. Market interactions are an obvious example of interdependence among agents' decisions: for instance, in an oligopoly the profits that each producer can earn depend both on his production decision and on the production decisions of the other oligopolists.

Interdependent decisions must coexist: the mutual compatibility of agents' decisions is the novel conceptual issue that emerges in the study of interdependent optimization. Equilibrium notions address this issue. In this chapter we present an introductory mathematical analysis of this most important topic, which is the subject matter of game theory and is at the heart of economic analysis. In particular, the theorems of von Neumann and Nash that we present in this chapter are wonderful examples of deep mathematical results motivated by economic applications.
In other words, $(\hat{x}_1, \hat{x}_2)$ is a saddle point if the function $f(\hat{x}_1, \cdot) : C_2 \to \mathbb{R}$ has a minimum at $\hat{x}_2$ and the function $f(\cdot, \hat{x}_2) : C_1 \to \mathbb{R}$ has a maximum at $\hat{x}_1$. To visualize these points, think of the center of a horse saddle: such a point at the same time maximizes $f$ along one dimension and minimizes it along the other, perpendicular, one. This motivates their name. Their nature is clarified by the next characterization.
¹ Since we have inf and sup, we must allow the values $-\infty$ and $+\infty$, respectively.
978 CHAPTER 34. INTERDEPENDENT OPTIMIZATION
(i) the function $\inf_{x_2 \in C_2} f(\cdot, x_2) : C_1 \to [-\infty, +\infty)$ attains its maximum value at $\hat{x}_1$,

(ii) the function $\sup_{x_1 \in C_1} f(x_1, \cdot) : C_2 \to (-\infty, +\infty]$ attains its minimum value at $\hat{x}_2$,
This characterization consists of two optimization conditions, (i) and (ii), and a final condition, (iii), that requires their mutual consistency. Let us consider these conditions one by one.
By condition (i), the component $\hat{x}_1$ of a saddle point, called a maximinimizer, solves a maximinimization (or primal) problem in which $\inf_{x_2 \in C_2} f(\cdot, x_2) : C_1 \to [-\infty, +\infty)$ is the objective function. If $f$ does not depend on $x_2$, this problem reduces to a standard maximization problem. Symmetrically, by condition (ii) the component $\hat{x}_2$ solves a minimaximization problem in which $\sup_{x_1 \in C_1} f(x_1, \cdot) : C_2 \to (-\infty, +\infty]$ is the objective function. If $f$ does not depend on $x_1$, this problem reduces to a standard minimization problem.
The optimization conditions (i) and (ii) have standard optimization (maximization or minimization) problems as special cases, so conceptually they generalize familiar notions. In contrast, the consistency condition (iii) is the truly novel feature of the characterization in that it introduces a notion of mutual consistency between optimization problems, which are no longer studied in isolation, as we have done so far. The scope of this condition will become clearer with the notion of Nash equilibrium.

The proof of Proposition 1381 relies on the following simple but important lemma (inter alia, it shows that the more interesting part in an equality $\sup\inf = \inf\sup$ is the inequality $\sup\inf \ge \inf\sup$).
34.1. MINIMAX THEOREM 979
Then, $\inf_{x_2 \in A_2}\sup_{x_1 \in A_1} f(x_1, x_2) \ge \sup_{x_1 \in A_1}\inf_{x_2 \in A_2} f(x_1, x_2)$.
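A finite analogue of the lemma can be checked directly: for any payoff matrix, the maximin (largest row minimum) never exceeds the minimax (smallest column maximum).

```python
import random

# Finite check of the lemma: for a matrix f(i, j), we always have
# max_i min_j f(i, j) <= min_j max_i f(i, j).

random.seed(0)
rows, cols = 5, 7
for _ in range(100):
    M = [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    maximin = max(min(row) for row in M)
    minimax = min(max(M[i][j] for i in range(rows)) for j in range(cols))
    assert maximin <= minimax

verified = True
print("sup inf <= inf sup verified on 100 random matrices")
```

The reverse inequality generally fails for an arbitrary matrix, which is exactly why equality (the minimax property) is the substantive part.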
$$\inf_{x_2 \in C_2} f(\hat{x}_1, x_2) = f(\hat{x}_1, \hat{x}_2) = \sup_{x_1 \in C_1} f(x_1, \hat{x}_2) \tag{34.6}$$
So,
$$\sup_{x_1 \in C_1}\inf_{x_2 \in C_2} f(x_1, x_2) \ge f(\hat{x}_1, \hat{x}_2) \ge \inf_{x_2 \in C_2}\sup_{x_1 \in C_1} f(x_1, x_2)$$
By the previous lemma, the inequalities are actually equalities, that is,
$$\inf_{x_2 \in C_2} f(\hat{x}_1, x_2) = \sup_{x_1 \in C_1}\inf_{x_2 \in C_2} f(x_1, x_2) \quad\text{and}\quad \sup_{x_1 \in C_1} f(x_1, \hat{x}_2) = \inf_{x_2 \in C_2}\sup_{x_1 \in C_1} f(x_1, x_2)$$
so that
$$\inf_{x_2 \in C_2} f(\hat{x}_1, x_2) = f(\hat{x}_1, \hat{x}_2) = \sup_{x_1 \in C_1} f(x_1, \hat{x}_2)$$
The last proposition implies the next remarkable interchangeability property of saddle
points.
In words, if we interchange the two components of a saddle point, we get a new saddle
point.
$\sup_{x_1 \in C_1} f(x_1, \cdot) : C_2 \to (-\infty, +\infty]$ attains its minimum value at $\hat{x}_2'$. In turn, by the "if" part of Proposition 1381 this implies that $(\hat{x}_1, \hat{x}_2')$ is a saddle point of $f$ on $C_1 \times C_2$.
$$\nabla_{x_1} f(x_1, x_2) = \left(\frac{\partial f(x_1, x_2)}{\partial x_{11}}, \ldots, \frac{\partial f(x_1, x_2)}{\partial x_{1m}}\right)$$
$$\nabla_{x_2} f(x_1, x_2) = \left(\frac{\partial f(x_1, x_2)}{\partial x_{21}}, \ldots, \frac{\partial f(x_1, x_2)}{\partial x_{2n}}\right)$$
This distinction is key for the next differential characterization of saddle points.
(i) Ci is a closed and convex subset of the open and convex set Ai for i = 1; 2;
If $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ is a saddle point of $f$ on $C_1 \times C_2$, then
$$\nabla_{x_1} f(\hat{x}_1, \hat{x}_2) \cdot (x_1 - \hat{x}_1) \le 0 \qquad \forall x_1 \in C_1 \tag{34.7}$$
$$\nabla_{x_2} f(\hat{x}_1, \hat{x}_2) \cdot (x_2 - \hat{x}_2) \ge 0 \qquad \forall x_2 \in C_2 \tag{34.8}$$
When $\hat{x}_1$ is an interior point, condition (34.7) takes the simpler Fermat form
$$\nabla_{x_1} f(\hat{x}_1, \hat{x}_2) = 0$$
and the same is true for condition (34.8) if $\hat{x}_2$ is an interior point. Remarkably, conditions (34.7) and (34.8) become necessary and sufficient when $f$ is a saddle function on $C_1 \times C_2$, i.e., when $f$ is concave in $x_1 \in C_1$ and convex in $x_2 \in C_2$. For saddle points, saddle functions therefore have the remarkable status that concave and convex functions have for maximizers and minimizers, respectively, in standard optimization problems.
Example 1385 Consider the saddle function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x_1, x_2) = -x_1^2 + x_2^2$. Since
$$\frac{\partial f(x_1, x_2)}{\partial x_1} = \frac{\partial f(x_1, x_2)}{\partial x_2} = 0 \iff x_1 = x_2 = 0$$
from the last theorem it follows that the origin $(0, 0)$ is the only saddle point of $f$ on $\mathbb{R}^2$ (cf. Example 987). Graphically:
[Figure: graph of $f$ over $[-2, 2] \times [-2, 2]$, with the saddle at the origin; axes $x_1$, $x_2$, $x_3$.]
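The example can be verified by a quick grid check (using the sign convention $f(x_1, x_2) = -x_1^2 + x_2^2$, concave in $x_1$ and convex in $x_2$): along the $x_1$ direction the origin is a maximum, along the $x_2$ direction a minimum.

```python
# Grid check that the origin is a saddle point of f(x1, x2) = -x1**2 + x2**2:
# a maximum in x1 (x2 fixed at 0) and a minimum in x2 (x1 fixed at 0).

def f(x1, x2):
    return -x1**2 + x2**2

grid = [i / 10.0 for i in range(-20, 21)]     # points in [-2, 2]
max_in_x1 = max(f(x1, 0.0) for x1 in grid)    # should be f(0, 0) = 0
min_in_x2 = min(f(0.0, x2) for x2 in grid)    # should be f(0, 0) = 0

print(max_in_x1, min_in_x2)   # both compare equal to 0.0
```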
The previous result establishes, inter alia, the existence of saddle points under differentiability and concavity assumptions on the function $f$. Next we give a fundamental existence result, the Minimax Theorem, which relaxes these requirements on $f$; in particular, it drops any differentiability assumption. It requires, however, the sets $C_1$ and $C_2$ to be compact (as usual, there are no free meals).
Theorem 1386 (Minimax) Let $f : A_1 \times A_2 \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ be a real-valued function and $C_1$ and $C_2$ subsets of $A_1$ and $A_2$, respectively. Suppose that:

(i) $C_1$ and $C_2$ are convex and compact subsets of $A_1$ and $A_2$, respectively;

(ii) $f(\cdot, x_2) : A_1 \to \mathbb{R}$ is continuous and quasi-concave on $C_1$;

(iii) $f(x_1, \cdot) : A_2 \to \mathbb{R}$ is continuous and quasi-convex on $C_2$.

Then, $f$ has a saddle point on $C_1 \times C_2$, with
$$\max_{x_1 \in C_1}\min_{x_2 \in C_2} f(x_1, x_2) = f(\hat{x}_1, \hat{x}_2) = \min_{x_2 \in C_2}\max_{x_1 \in C_1} f(x_1, x_2) \tag{34.9}$$
Proof The existence of the saddle point follows from Nash's Theorem, which will be proved below. Since the sets $C_1$ and $C_2$ are compact and the function $f$ is continuous in $x_1$ and in $x_2$, by Weierstrass' Theorem we can define the functions $\min_{x_2 \in C_2} f(\cdot, x_2) : C_1 \to \mathbb{R}$ and $\max_{x_1 \in C_1} f(x_1, \cdot) : C_2 \to \mathbb{R}$. So, (34.2) implies (34.9).

The Minimax Theorem was proved in 1928 by John von Neumann in his seminal paper on game theory. Interestingly, the choice sets $C_1$ and $C_2$ are required to be convex, so they have to be infinite (unless they are singletons, a trivial case).

A simple, yet useful, corollary of the Minimax Theorem is that continuous saddle functions on a compact convex set $C_1 \times C_2$ have a saddle point on $C_1 \times C_2$. If, in addition, they are differentiable, conditions (34.7) and (34.8) then characterize any such point.
$$f_i : C_1 \times \cdots \times C_n \to \mathbb{R}$$
For instance, the objective function $f_1$ of agent 1 depends on that agent's decision $x_1$, as well as on the decisions $x_2, \ldots, x_n$ of the other agents. In the oligopoly example below, $x_1$ is the production decision of agent 1, while $x_2, \ldots, x_n$ are the production decisions of the other agents.

Decisions are simultaneous, described by a vector $(x_1, \ldots, x_n)$. The operator $f = (f_1, \ldots, f_n) : C_1 \times \cdots \times C_n \to \mathbb{R}^n$ describes the value $f_i(x_1, \ldots, x_n)$ that each agent attains at $(x_1, \ldots, x_n)$.
Example 1387 Consider $n$ firms that produce the same output, say potatoes, which they sell in the same market. The market price of the output depends on the total output that all firms together offer. Assume that the output has in the market a strictly decreasing inverse demand function $D^{-1} : [0, \infty) \to [0, \infty)$. So, $D^{-1}(q)$ is the market price of the output when $q = \sum_{i=1}^n q_i$ is the sum of the individual quantities $q_i \ge 0$ of the output produced by each firm $i = 1, \ldots, n$. The profit function $\pi_i : \mathbb{R}^n_+ \to \mathbb{R}$ of firm $i$ is
$$\pi_i(q_1, \ldots, q_n) = D^{-1}(q)\,q_i - c_i(q_i)$$
where $c_i : [0, \infty) \to \mathbb{R}$ is its cost function. Thus, the profit of firm $i$ depends via $q$ on the production decisions of all firms, not just on its own decision $q_i$. We thus have an interdependent optimization problem, called Cournot oligopoly. Here the choice sets $C_i$ are the positive half-line $[0, \infty)$ and the operator $f$ is given by $\pi = (\pi_1, \ldots, \pi_n) : \mathbb{R}^n_+ \to \mathbb{R}^n$. N
⁵ In game theory agents are often called players (or co-players or opponents).
34.2. NASH EQUILIBRIA 983
To introduce the next equilibrium notion, to fix ideas we first consider the case $n = 2$ of two agents. Here $f : C_1 \times C_2 \to \mathbb{R}^2$ with $f(x_1, x_2) = (f_1(x_1, x_2), f_2(x_1, x_2))$. Suppose a decision profile $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ is such that
$$f_1(\hat{x}_1, \hat{x}_2) \ge f_1(x_1, \hat{x}_2) \qquad \forall x_1 \in C_1 \tag{34.10}$$
$$f_2(\hat{x}_1, \hat{x}_2) \ge f_2(\hat{x}_1, x_2) \qquad \forall x_2 \in C_2$$
In this case, each agent is doing his best given what the other agent does. Agent $i$ has no incentive to deviate from $\hat{x}_i$, that is, to select a different decision, as long as he knows that the other agent (his "opponent"), denoted $-i$, is playing $\hat{x}_{-i}$.⁶ In this sense, the decisions $(\hat{x}_1, \hat{x}_2)$ are mutually compatible.
All this motivates the following classic definition, proposed in 1950 by John Nash, which is the most important equilibrium notion in economics. Here for each agent $i$ we denote by $x_{-i} \in C_{-i} = \prod_{j \neq i} C_j$ the decision profile of his opponents.
$$f_i(\hat{x}) \ge f_i(x_i, \hat{x}_{-i}) \qquad \forall x_i \in C_i \tag{34.11}$$
N.B. Nash equilibrium is defined purely in terms of agents' individual decisions $x_i$, unlike the notion of Arrow-Debreu equilibrium (Section 18.8), which involves a variable, the price vector, that is not under the control of agents. In this sense, the Arrow-Debreu equilibrium is a spurious equilibrium notion from the standpoint of methodological individualism, though most useful in understanding markets' behavior.⁷ O
where the opponents' decisions $x_{-i}$ play the role of the parameter. The solution correspondence $\beta_i : C_{-i} \rightrightarrows C_i$ defined by $\beta_i(x_{-i}) = \arg\max_{x_i \in C_i} f_i(x_i, x_{-i})$ is called the best reply correspondence. We can reformulate the equilibrium condition (34.11) as
$$\hat{x}_i \in \beta_i(\hat{x}_{-i}) \qquad \forall i = 1, \ldots, n \tag{34.12}$$
⁶ How such mutual understanding among agents emerges is a non-trivial conceptual issue from which we abstract away, leaving it to game theory courses.
⁷ Methodological principles are important, but a pragmatic attitude should be kept so as not to turn them into dogmas.
$$\max_{x_i} f_i(x_i, \hat{x}_{-i}) \quad \text{sub} \quad x_i \in C_i \tag{34.13}$$
In turn, this easily leads to a differential characterization of Nash equilibria via Stampacchia's Theorem. To ease matters, we assume that each $A_i$ is a subset of the same space $\mathbb{R}^m$, so that both $A$ and $C$ are subsets of $(\mathbb{R}^m)^n$.
(i) Ci is a closed and convex subset of the open and convex set Ai ;
If $\hat{x} = (\hat{x}_1, \ldots, \hat{x}_n) \in C$ is a Nash equilibrium of $f$ on $C$, then, for each $i = 1, \ldots, n$,
$$\nabla_{x_i} f_i(\hat{x}) \cdot (x_i - \hat{x}_i) \le 0 \qquad \forall x_i \in C_i \tag{34.14}$$
When $m = 1$, so that each $A_i$ is a subset of the real line, the condition takes the simpler form:
$$\frac{\partial f_i(\hat{x})}{\partial x_i}(x_i - \hat{x}_i) \le 0 \qquad \forall x_i \in C_i$$
Moreover, when $\hat{x}_i$ is an interior point of $C_i$, the condition takes the Fermat form
$$\nabla_{x_i} f_i(\hat{x}) = 0 \tag{34.15}$$
Example 1390 In the Cournot oligopoly, assume that both the demand and cost functions are linear, with $D^{-1}(q) = a - bq$ and $c_i(q_i) = cq_i$, where $a > c$ and $b > 0$. Then, the profit function of firm $i$ is $\pi_i(q_1, \ldots, q_n) = (a - bq)q_i - cq_i$, which is strictly concave in $q_i$. The choice set of firm $i$ is the set $C_i = [0, +\infty)$. By the last proposition, the first order condition (34.14) is necessary and sufficient for a Nash equilibrium $(\hat{q}_1, \ldots, \hat{q}_n)$. This condition is, for every $i$,
$$\frac{\partial \pi_i(\hat{q}_1, \ldots, \hat{q}_n)}{\partial q_i}(q_i - \hat{q}_i) = (a - b\hat{q} - b\hat{q}_i - c)(q_i - \hat{q}_i) \le 0 \qquad \forall q_i \ge 0$$
So, for every $i$ we have $a - b\hat{q} - b\hat{q}_i = c$ if $\hat{q}_i > 0$, and $a - b\hat{q} - b\hat{q}_i \le c$ if $\hat{q}_i = 0$.

We have $\hat{q}_i > 0$ for every $i$. Indeed, assume by contradiction that $\hat{q}_i = 0$ for some $i$. The first order condition then implies $a - b\hat{q} \le c$. If $\hat{q} = 0$, this gives $a \le c$, contradicting $a > c$; if instead $\hat{q} > 0$, there is a firm $j$ with $\hat{q}_j > 0$, for which $c = a - b\hat{q} - b\hat{q}_j < a - b\hat{q} \le c$, again a contradiction. We conclude that $\hat{q}_i > 0$ for every $i$. Then, the first order condition implies
$$\hat{q}_i = \frac{a - c - b\hat{q}}{b} \qquad \forall i = 1, \ldots, n$$
34.3. NASH EQUILIBRIA AND SADDLE POINTS 985
In particular, the $\hat{q}_i$ are all equal, so summing over $i$ gives $b\hat{q} = n(a - c) - nb\hat{q}$, that is, $\hat{q} = \frac{n}{n+1}\frac{a - c}{b}$, and hence
$$\hat{q}_i = \frac{1}{1 + n}\,\frac{a - c}{b} \qquad \forall i = 1, \ldots, n$$
The best reply formulation (34.12) permits us to establish the existence of Nash equilibria via a fixed point argument based on Kakutani's Theorem.
$$(\hat{x}_1, \ldots, \hat{x}_n) \in \varphi(\hat{x}_1, \ldots, \hat{x}_n) = \beta_1(\hat{x}_{-1}) \times \cdots \times \beta_n(\hat{x}_{-n})$$
So, $\hat{x}_i \in \beta_i(\hat{x}_{-i})$ for each $i = 1, \ldots, n$, as desired.
Example 1392 When $\varphi(x) = -x$, we have $f_2 = -f_1$. This strictly competitive operator $f$ is called zero-sum. It is the polar case that may arise, for example, in military interactions. This is the case originally studied by von Neumann and Morgenstern in their celebrated (wartime) 1944 opus. N
Since $\varphi$ is strictly decreasing, we have
$$(\varphi \circ f_1)(\hat{x}_1, \hat{x}_2) \ge (\varphi \circ f_1)(\hat{x}_1, x_2) \iff f_1(\hat{x}_1, \hat{x}_2) \le f_1(\hat{x}_1, x_2)$$
So the two equilibrium conditions become
$$f_1(\hat{x}_1, \hat{x}_2) \ge f_1(x_1, \hat{x}_2) \qquad \forall x_1 \in C_1$$
$$f_1(\hat{x}_1, \hat{x}_2) \le f_1(\hat{x}_1, x_2) \qquad \forall x_2 \in C_2$$
that is,
$$f_1(\hat{x}_1, x_2) \ge f_1(\hat{x}_1, \hat{x}_2) \ge f_1(x_1, \hat{x}_2)$$
In this case, a pair $(\hat{x}_1, \hat{x}_2)$ is a Nash equilibrium if and only if it is a saddle point of $f_1$ on $C_1 \times C_2$. We have thus proved the following mathematically simple, yet conceptually important, result.
Saddle points are thus Nash equilibria of strictly competitive operators. In particular, the Minimax Theorem is the special case of Nash's Theorem for strictly competitive operators. This further clarifies the nature of saddle points as a way to model individual optimization problems that are "negatively" interdependent: agents expect the worst from their opponent and best reply by maxminimizing.
$$\nabla_{x_i} f_i(\hat{x}) \in N_{\Delta^{m-1}}(\hat{x}_i) \qquad \forall i = 1, \ldots, n$$
⁸ Recall Section 31.2.2.
34.5. APPLICATIONS 987
So, the result follows from Proposition 1323 and from Stampacchia's Theorem.

Here we have
$$\max_{x_i \in \Delta^{m-1}} f_i(x_i, \hat{x}_{-i}) = \max_{x_i \in \{e^1, \ldots, e^m\}} f_i(x_i, \hat{x}_{-i}) \tag{34.16}$$
and
$$\emptyset \neq \arg\max_{x_i \in \Delta^{m-1}} f_i(x_i, \hat{x}_{-i}) = \operatorname{co}\arg\max_{x_i \in \{e^1, \ldots, e^m\}} f_i(x_i, \hat{x}_{-i}) \tag{34.17}$$
By (34.17), the set of Nash equilibria is a non-empty set that consists of the $n$-tuples $(\hat{x}_1, \ldots, \hat{x}_n) \in \Delta^{m-1} \times \cdots \times \Delta^{m-1}$ such that
$$\hat{x}_i \in \operatorname{co}\arg\max_{x_i \in \{e^1, \ldots, e^m\}} f_i(x_i, \hat{x}_{-i})$$
In terms of value attainment, each agent's problem is then equivalent to the problem
$$\max_{x_i} V_i(x_i, \hat{x}_{-i}) \quad \text{sub} \quad x_i \in \{e^1, \ldots, e^m\}$$
that only involves the versors. In the next section we will discuss the significance of all this for games and decisions under randomization.
34.5 Applications
34.5.1 Randomization in games and decisions
Suppose that an agent has a set $S = \{s_1, s_2, \ldots, s_m\}$ of $m$ pure actions (or strategies), evaluated with a utility function $u : S \to \mathbb{R}$. Since the set $S$ is finite, it is not convex (unless it is a singleton), so we cannot use the powerful results, such as Nash's Theorem, that throughout the book we saw to hold for concave (or convex) functions defined on convex sets. A standard way to embed $S$ in a convex set is via randomization, as readers will learn in game theory courses. Here we just outline the argument to illustrate the results of the chapter.
Specifically, by randomizing via some random device (coin tossing, roulette wheels, and the like) agents can select a mixed (or randomized) action $\mu$ in which $\mu(s_k)$ is the probability that the random device assigns to the pure action $s_k$. Denote by $\Delta(S)$ the set of all randomized actions. According to the expected utility criterion, an agent evaluates the randomized action $\mu$ via the function $U : \Delta(S) \to \mathbb{R}$ defined by
$$U(\mu) = \sum_{k=1}^m u(s_k)\,\mu(s_k)$$
In words, the randomized action $\mu$ is evaluated by taking the average of the utilities of the pure actions weighted by their probabilities under $\mu$.⁹ Note that each pure action $s_k$ corresponds to the "degenerate" randomized action that assigns it probability 1, i.e., $\mu(s_k) = 1$. Via this identification, we can regard $S$ as a subset of $\Delta(S)$ and thus write, with an abuse of notation, $S \subseteq \Delta(S)$.
Under randomization, agents aim to select the best randomized action by solving the optimization problem
$$\max_\mu U(\mu) \quad \text{sub} \quad \mu \in \Delta(S) \tag{34.19}$$
Each randomized action $\mu$ can be identified with a vector $x$ of the simplex $\Delta^{m-1}$ via
$$\mu(s_k) \mapsto x_k$$
In particular, a degenerate $\mu$, with $\mu(s_k) = 1$, is identified with the versor $e^k$. That is, pure actions can be identified with the versors of the simplex, i.e., with its extreme points. For instance, if $\mu$ is such that $\mu(s_2) = 1$, then it corresponds to the versor $e^2$.

Summing up, we have the following identifications and inclusions:
$$S \leftrightarrow \operatorname{ext}\Delta^{m-1} \qquad \Delta(S) \leftrightarrow \Delta^{m-1}$$
In this way, we have "convexified" $S$ by identifying it with a subset of the simplex $\Delta^{m-1}$, which is a convex set in $\mathbb{R}^m$.

Here we have:
$$S = \{s_1, s_2, s_3\} \leftrightarrow \operatorname{ext}\Delta^2 = \{e^1, e^2, e^3\}$$
For instance, if $\mu \in \Delta(S)$ is such that $\mu(s_1) = \mu(s_2) = 1/4$ and $\mu(s_3) = 1/2$, then it corresponds to $x = (1/4, 1/4, 1/2)$. N
By setting $u_k = u(s_k)$ for each $k$, the expected utility function $U$ can be identified with the affine function $V : \Delta^{m-1} \to \mathbb{R}$ defined by
$$V(x) = \sum_{k=1}^m u_k x_k = u \cdot x$$
where $u = (u_1, u_2, \ldots, u_m) \in \mathbb{R}^m$. The optimization problem (34.19) of the agent becomes
$$\max_x V(x) \quad \text{sub} \quad x \in \Delta^{m-1} \tag{34.20}$$
By the previous analysis we then have
$$\max_{x \in \Delta^{m-1}} V(x) = \max_{x \in \{e^1, \ldots, e^m\}} V(x) \tag{34.21}$$
and
$$\emptyset \neq \arg\max_{x \in \Delta^{m-1}} V(x) = \operatorname{co}\arg\max_{x \in \{e^1, \ldots, e^m\}} V(x) \tag{34.22}$$
By (34.22), agents' optimal mixed actions are convex combinations of pure actions that are, in turn, optimal. So, an optimal $\hat{x}$ is such that
$$\hat{x}_k > 0 \implies e^k \in \arg\max_{x \in \{e^1, \ldots, e^m\}} V(x)$$
That is, the pure actions that are assigned a strictly positive weight by an optimal mixed action are, in turn, optimal. By (34.21), in terms of value attainment problem (34.20) is equivalent to the much simpler problem
$$\max_x V(x) \quad \text{sub} \quad x \in \{e^1, \ldots, e^m\}$$
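The value-attainment equivalence can be illustrated with a small numeric check; the utility values below are our own illustrative numbers. Since $V(x) = u \cdot x$ is affine, no mixed action can beat the best pure action.

```python
import random

# The affine function V(x) = u . x on the simplex attains its maximum at a
# versor: every mixed action's expected utility is a weighted average of
# the pure utilities, hence at most the best pure utility.

random.seed(1)
u = [3.0, -1.0, 2.0]          # illustrative pure-action utilities
best_pure = max(u)            # value of the best versor e^k

for _ in range(1000):
    w = [random.random() for _ in u]
    s = sum(w)
    x = [wi / s for wi in w]  # a random point of the simplex
    V = sum(uk * xk for uk, xk in zip(u, x))
    assert V <= best_pure + 1e-12

print("max over the simplex equals the best pure value:", best_pure)
```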
$$U_i(\hat{\mu}_i, \hat{\mu}_{-i}) \ge U_i(\mu_i, \hat{\mu}_{-i}) \qquad \forall \mu_i \in \Delta(S_i)$$
for each $i = 1, 2$.

The mixed actions $\Delta(S_i)$ can be identified with the simplex $\Delta^{m-1}$, with its extreme points representing the pure actions. Define $u_i : \{1, \ldots, m\} \times \{1, \ldots, m\} \to \mathbb{R}$ by $u_i(k', k'') = u_i(s_{1k'}, s_{2k''})$. We can then identify $U_i$ with the function $V_i : \Delta^{m-1} \times \Delta^{m-1} \to \mathbb{R}$ defined by
$$V_i(x_1, x_2) = \sum_{(k', k'') \in \{1, \ldots, m\} \times \{1, \ldots, m\}} x_{1k'}\,x_{2k''}\,u_i(k', k'') = x_1 \cdot U_i x_2$$
where $U_i$ is the square matrix of order $m$ that has the values $u_i(k', k'')$ as entries.
The function $V_i$ is affine in $x_i$. A pair $(\hat{x}_1, \hat{x}_2) \in \Delta^{m-1} \times \Delta^{m-1}$ is a Nash equilibrium if
$$V_i(\hat{x}_i, \hat{x}_{-i}) \ge V_i(x_i, \hat{x}_{-i}) \qquad \forall x_i \in \Delta^{m-1}$$
for each $i = 1, 2$. We have
$$\max_{x_i \in \Delta^{m-1}} V_i(x_i, \hat{x}_{-i}) = \max_{x_i \in \{e^1, \ldots, e^m\}} V_i(x_i, \hat{x}_{-i}) \tag{34.23}$$
and
$$\emptyset \neq \arg\max_{x_i \in \Delta^{m-1}} V_i(x_i, \hat{x}_{-i}) = \operatorname{co}\arg\max_{x_i \in \{e^1, \ldots, e^m\}} V_i(x_i, \hat{x}_{-i}) \tag{34.24}$$
By (34.24), equilibrium mixed actions are convex combinations of pure actions that, in turn, best reply to the opponent's mixed action. So, the equilibrium $\hat{x}_i$ is such that (34.18) holds, i.e.,
$$\hat{x}_{ik} > 0 \implies e^k \in \arg\max_{x_i \in \Delta^{m-1}} V_i(x_i, \hat{x}_{-i})$$
By (34.23), in terms of value attainment each agent's equilibrium problem is equivalent to the much simpler problem
$$\max_{x_i} V_i(x_i, \hat{x}_{-i}) \quad \text{sub} \quad x_i \in \{e^1, \ldots, e^m\}$$
A pair $(\hat{x}, \hat{\lambda}) \in A \times \mathbb{R}^m_+$ is a saddle point of $L$ on $A \times \mathbb{R}^m_+$ if
$$L(\hat{x}, \lambda) \ge L(\hat{x}, \hat{\lambda}) \ge L(x, \hat{\lambda}) \qquad \forall x \in A, \ \forall \lambda \ge 0$$

(i) $f(\hat{x}) \ge f(x) + \hat{\lambda} \cdot (b - g(x))$ for every $x \in A$;

(ii) $g(\hat{x}) \le b$ and $\hat{\lambda}_i(b_i - g_i(\hat{x})) = 0$ for all $i = 1, \ldots, m$.
Proof "Only if". Let $(\hat{x}, \hat{\lambda}) \in A \times \mathbb{R}^m_+$ be a saddle point of the Lagrangian function $L : A \times \mathbb{R}^m_+ \to \mathbb{R}$. Since $L(\hat{x}, \lambda) \ge L(\hat{x}, \hat{\lambda})$ for all $\lambda \ge 0$, it follows that
$$(\lambda - \hat{\lambda}) \cdot (b - g(\hat{x})) \ge 0 \qquad \forall \lambda \ge 0 \tag{34.26}$$
Taking $\lambda = \hat{\lambda} + e^i$ in (34.26) gives $b_i - g_i(\hat{x}) \ge 0$ for each $i$, while taking $\lambda = 0$ and $\lambda = 2\hat{\lambda}$ gives $\hat{\lambda} \cdot (b - g(\hat{x})) = 0$; this proves (ii). In turn, complementary slackness implies $f(\hat{x}) = L(\hat{x}, \hat{\lambda})$, so that
$$f(\hat{x}) = L(\hat{x}, \hat{\lambda}) \ge L(x, \hat{\lambda}) = f(x) + \hat{\lambda} \cdot (b - g(x)) \qquad \forall x \in A \tag{34.27}$$
which proves (i).

"If". By (ii),
$$L(\hat{x}, \hat{\lambda}) - L(\hat{x}, \lambda) = (\hat{\lambda} - \lambda) \cdot (b - g(\hat{x})) = -\lambda \cdot (b - g(\hat{x})) \le 0$$
which implies $L(\hat{x}, \hat{\lambda}) \le L(\hat{x}, \lambda)$ for all $\lambda \ge 0$. On the other hand, (i) and the equality $\hat{\lambda} \cdot (b - g(\hat{x})) = 0$ in (ii) imply
$$L(\hat{x}, \hat{\lambda}) = f(\hat{x}) \ge f(x) + \hat{\lambda} \cdot (b - g(x)) = L(x, \hat{\lambda}) \qquad \forall x \in A$$
so that $L(\hat{x}, \hat{\lambda}) \ge L(x, \hat{\lambda})$ for all $x \in A$. We conclude that $(\hat{x}, \hat{\lambda})$ is a saddle point of $L$ on $A \times \mathbb{R}^m_+$.
Proposition 1398 A vector $\hat{x} \in A$ solves problem (34.25) if there exists $\hat{\lambda} \ge 0$ such that $(\hat{x}, \hat{\lambda})$ is a saddle point of the Lagrangian function $L$ on $A \times \mathbb{R}^m_+$.
So, the existence of a saddle point for the Lagrangian function implies the existence of a solution for the underlying optimization problem with inequality constraints. No assumptions are made on the functions $f$ and $g_i$. If we make some standard assumptions on them, the converse becomes true as well, thus establishing the following remarkable "saddle" version of Kuhn-Tucker's Theorem.
(i) $\hat{x} \in A$ solves problem (34.25);

(ii) there exists a vector $\hat{\lambda} \ge 0$ such that $(\hat{x}, \hat{\lambda})$ is a saddle point of the Lagrangian function $L$ on $A \times \mathbb{R}^m_+$;

(iii) there exists a vector $\hat{\lambda} \ge 0$ such that the Kuhn-Tucker conditions hold:
$$\nabla_x L(\hat{x}, \hat{\lambda}) = 0 \tag{34.29}$$
$$\hat{\lambda}_i\,\frac{\partial L(\hat{x}, \hat{\lambda})}{\partial \lambda_i} = 0 \qquad \forall i = 1, \ldots, m \tag{34.30}$$
$$\nabla_\lambda L(\hat{x}, \hat{\lambda}) \ge 0 \tag{34.31}$$
Proof (ii) implies (i) by the last proposition. (i) implies (iii) by what we learned in Section 31.3. (iii) implies (ii) by Theorem 1384. Indeed, the Kuhn-Tucker conditions are nothing but conditions (34.7) and (34.8) for the Lagrangian function (cf. Example 1322). First, note that condition (34.7) takes the form $\nabla_x L(\hat{x}, \hat{\lambda}) = 0$ because the set $A$ is open. As to condition (34.8), here it becomes
$$\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot (\lambda - \hat{\lambda}) \ge 0 \qquad \forall \lambda \ge 0 \tag{34.32}$$
This condition is equivalent to (34.30) and (34.31). From (34.30) it follows that $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} = 0$, while from (34.31) it follows that $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \lambda \ge 0$ for all $\lambda \ge 0$. So, (34.32) holds. Conversely, by taking $\lambda = 0$ in (34.32) we have $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} \le 0$, and by taking $\lambda = 2\hat{\lambda}$ we have $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} \ge 0$, so $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} = 0$. Finally, by taking $\lambda = \hat{\lambda} + e^i$ in (34.32), we easily get $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \ge 0$. Since $\nabla_\lambda L(\hat{x}, \hat{\lambda}) = b - g(\hat{x})$, from $g(\hat{x}) \le b$ and the positivity of $\hat{\lambda}$ it follows that $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} = 0$ is equivalent to $\hat{\lambda}_i\,\partial L(\hat{x}, \hat{\lambda})/\partial \lambda_i = 0$ for all $i = 1, \ldots, m$. In sum, the Kuhn-Tucker conditions are the form that conditions (34.7) and (34.8) take here. Since the Lagrangian function is easily seen to be a saddle function when $f$ is concave and each $g_i$ convex, this proves that properties (ii) and (iii) are equivalent, thus completing the proof.
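The equivalence can be seen concretely on a toy concave problem of our own choosing: maximize $f(x) = -(x - 2)^2$ subject to $x \le 1$. Kuhn-Tucker gives $\hat{x} = 1$ and $\hat{\lambda} = 2$, and a grid check confirms that $(\hat{x}, \hat{\lambda})$ is a saddle point of the Lagrangian.

```python
# Saddle-point check for L(x, lam) = -(x - 2)**2 + lam*(1 - x), the
# Lagrangian of: maximize -(x - 2)**2 subject to x <= 1 (toy problem).
# Kuhn-Tucker: -2*(x - 2) - lam = 0 at x = 1 gives lam = 2, and
# L(x_hat, lam) >= L(x_hat, lam_hat) >= L(x, lam_hat) should hold.

def L(x, lam):
    return -(x - 2)**2 + lam * (1 - x)

x_hat, lam_hat = 1.0, 2.0
L_saddle = L(x_hat, lam_hat)                     # = -1

xs = [i / 100.0 for i in range(-500, 501)]       # x in [-5, 5]
lams = [i / 100.0 for i in range(0, 501)]        # lam in [0, 5]

assert all(L(x, lam_hat) <= L_saddle + 1e-12 for x in xs)    # max in x
assert all(L(x_hat, lam) >= L_saddle - 1e-12 for lam in lams) # min in lam
print("saddle point at", (x_hat, lam_hat), "with value", L_saddle)
```

Note that $L(\hat{x}, \lambda)$ is constant in $\lambda$ here because the constraint binds, which is exactly complementary slackness at work.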
(i) $\hat{x}$ solves the primal problem
$$\max_x \inf_{\lambda \ge 0} L(x, \lambda) \quad \text{sub} \quad x \in A$$
The primal problem is actually equivalent to the original problem (34.25). Indeed, writing problem (34.25) in canonical form, we have
$$\inf_{\lambda \ge 0} L(x, \lambda) = \begin{cases} -\infty & \text{if } x \notin C \\ f(x) & \text{if } x \in C \end{cases}$$
and
$$\arg\max_{x \in A}\inf_{\lambda \ge 0} L(x, \lambda) = \arg\max_{x \in C} f(x)$$
so the primal and the original problem are equivalent in terms of both solutions and value attainment. We thus have the following corollary of the last theorem, which relates the original and dual problems.
Summing up, in concave optimization problems with inequality constraints the solution $\hat{x}$ and the multiplier $\hat{\lambda}$ solve dual optimization problems that are mutually consistent. In particular, multipliers admit a dual optimization interpretation in which they can be viewed as (optimally) chosen by some fictitious, yet malevolent, opponent (say, nature). An individual optimization problem is thus solved by embedding it in a fictitious game against nature, a surprising paranoid twist on multipliers.
$$\frac{\partial L(\hat{x}, \hat{\lambda})}{\partial x_i} = \frac{\partial f(\hat{x})}{\partial x_i} - \sum_{j=1}^m \hat{\lambda}_j\,\frac{\partial g_j(\hat{x})}{\partial x_i} = 0 \qquad \forall i = 1, \ldots, n$$
$$\frac{\partial L(\hat{x}, \hat{\lambda})}{\partial \lambda_i}\,\hat{\lambda}_i = \hat{\lambda}_i(b_i - g_i(\hat{x})) = 0 \qquad \forall i = 1, \ldots, m$$
$$\frac{\partial L(\hat{x}, \hat{\lambda})}{\partial \lambda_i} = b_i - g_i(\hat{x}) \ge 0 \qquad \forall i = 1, \ldots, m$$
This is our last angle on Kuhn-Tucker’s Theorem, the deepest one.
As the proof clarifies, the two problems (34.34) and (34.35) are each the dual of the other, either one providing the multipliers for the other. In particular, solutions exist if either of the two feasible polyhedra is bounded (Corollary 836).
$$L(x, \lambda) = c \cdot x + \lambda \cdot (b - Ax)$$
Its dual problem is
$$\min_\lambda \sup_{x \ge 0} L(x, \lambda) \quad \text{sub} \quad \lambda \ge 0 \tag{34.36}$$
We have
$$\sup_{x \ge 0} L(x, \lambda) = \sup_{x \ge 0}\left[c \cdot x + \lambda \cdot (b - Ax)\right] = \lambda \cdot b + \sup_{x \ge 0}\sum_{j=1}^n \left(c_j - \sum_{i=1}^m a_{ij}\lambda_i\right)x_j = \lambda \cdot b + \sup_{x \ge 0}\,(c - A^T\lambda) \cdot x$$
In turn, the Lagrangian function $\tilde{L} : \mathbb{R}^m_+ \times \mathbb{R}^n_+ \to \mathbb{R}$ of this problem is
$$\tilde{L}(\lambda, x) = \lambda \cdot b + x \cdot (c - A^T\lambda) = \lambda \cdot b + \sum_{j=1}^n \left(c_j - \sum_{i=1}^m a_{ij}\lambda_i\right)x_j = c \cdot x + \lambda \cdot (b - Ax) = L(x, \lambda)$$
So, $(\hat{x}, \hat{\lambda})$ is a saddle point of $L$ if and only if $(\hat{\lambda}, \hat{x})$ is a saddle point of $\tilde{L}$. We conclude that the linear programs (34.34) and (34.37) are each the dual of the other, either one providing the multipliers for the other. By Corollary 1400 the result then follows.
Example 1402 Let
$$A = \begin{bmatrix} 1 & -2 & 2 & 1 \\ 0 & 2 & -1 & 2 \\ 0 & 1 & -1 & 3 \end{bmatrix}$$
and $b = (1, 3, 2)$ and $c = (-1, 2, 4, -2)$. Consider the linear programming problem
$$\max_{x_1, x_2, x_3, x_4} -x_1 + 2(x_2 - x_4) + 4x_3$$
$$\text{sub} \quad x_1 - 2x_2 + 2x_3 + x_4 \le 1, \quad 2(x_2 + x_4) - x_3 \le 3, \quad x_2 - x_3 + 3x_4 \le 2$$
$$x_1 \ge 0, \ x_2 \ge 0, \ x_3 \ge 0, \ x_4 \ge 0$$
Since
$$A^T = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 2 & 1 \\ 2 & -1 & -1 \\ 1 & 2 & 3 \end{bmatrix}$$
the dual problem is
$$\min_{\lambda_1, \lambda_2, \lambda_3} \lambda_1 + 3\lambda_2 + 2\lambda_3$$
$$\text{sub} \quad \lambda_1 \ge -1, \quad 2(\lambda_2 - \lambda_1) + \lambda_3 \ge 2, \quad 2\lambda_1 - \lambda_2 - \lambda_3 \ge 4, \quad \lambda_1 + 2\lambda_2 + 3\lambda_3 \ge -2$$
$$\lambda_1 \ge 0, \ \lambda_2 \ge 0, \ \lambda_3 \ge 0$$
In view of the Duality Theorem of Linear Programming, if the two problems satisfy Slater's condition (do they?), then either problem has a solution if the other does, with
$$\max_{x \ge 0}\left[-x_1 + 2(x_2 - x_4) + 4x_3\right] = \min_{\lambda \ge 0}\left[\lambda_1 + 3\lambda_2 + 2\lambda_3\right]$$
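Weak duality gives a way to certify optima in this example numerically: any primal-feasible $x$ and dual-feasible $\lambda$ satisfy $c \cdot x \le b \cdot \lambda$, so a feasible pair with equal objective values is optimal for both programs. The candidate pair below is our own guess, under the sign reconstruction of $A$ used above.

```python
# Weak-duality certificate for the example. Feasibility of x_hat and
# lam_hat plus equal objective values proves both are optimal.

A = [[1, -2, 2, 1],
     [0, 2, -1, 2],
     [0, 1, -1, 3]]
b = [1, 3, 2]
c = [-1, 2, 4, -2]

x_hat = [0.0, 3.5, 4.0, 0.0]          # candidate primal solution (our guess)
lam_hat = [5.0, 6.0, 0.0]             # candidate dual solution (our guess)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# primal feasibility: A x <= b, x >= 0
assert all(dot(row, x_hat) <= bi + 1e-9 for row, bi in zip(A, b))
assert all(xj >= 0 for xj in x_hat)
# dual feasibility: A^T lam >= c, lam >= 0
At = list(zip(*A))
assert all(dot(col, lam_hat) >= cj - 1e-9 for col, cj in zip(At, c))
assert all(li >= 0 for li in lam_hat)

primal_value = dot(c, x_hat)
dual_value = dot(b, lam_hat)
print(primal_value, dual_value)       # equal, hence both optimal
```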
Part VIII
Integration
Chapter 35

The Riemann Integral

[Figure: the region under the graph of a positive function $f$ on the interval $[a, b]$.]
The problem is how to make this natural intuition rigorous. As the figure shows, the plane region $A_{f,[a,b]}$ is a "curved" trapezoid, with three straight sides and a curved one. So, it is not an elementary geometric figure whose area we know how to compute. To our rescue comes a classic procedure known as the method of exhaustion. It consists in approximating the area of a non-trivial geometric figure (such as our trapezoid) from above and below through the areas of simple circumscribed and inscribed elementary geometric figures, typically polygons (in our case, the so-called "plurirectangles"), whose measure can be calculated in an elementary way. If the resulting upper and lower approximations can be made more and more precise via polygons having more and more sides, until in the limit of "infinitely many sides" they reach a common limit value, we then take such a common value as the sought-after area of the non-trivial geometric figure (in our case, the area of the trapezoid, i.e., the integral of $f$ on $[a, b]$).

1000 CHAPTER 35. THE RIEMANN INTEGRAL
In the next sections we will make rigorous the procedure just outlined. The method of exhaustion originates in Greek mathematics, where it found wonderful applications in the works of Eudoxus of Cnidus and Archimedes of Syracuse, who with this method were able to compute or approximate the areas of some highly non-trivial geometric figures.¹
35.2 Plurirectangles
We know how to calculate the areas of elementary geometric figures. Among them, the simplest ones are rectangles, whose area is given by the product of the side lengths. A simple, but key for our purposes, generalization of a rectangle is the plurirectangle, that is, the polygon formed by contiguous rectangles. Graphically:
[Figure omitted: a plurirectangle formed by contiguous rectangles.]
Clearly, the area of a plurirectangle is just the sum of the areas of the individual rectangles that compose it.
Let us go back now to the plane region $A_{f_{[a,b]}}$ under the graph of a positive function f on [a, b]. It is easy to see how such a region can be sandwiched between inscribed plurirectangles and circumscribed plurirectangles. For example, the following plurirectangle
¹ For instance, Example 1546 of Appendix C reports the famous Archimedes approximation of $\pi$, the area of the closed unit ball, via the method of exhaustion based on circumscribed and inscribed regular polygons.
[Figure omitted: a plurirectangle inscribed in the region under f on [a, b].]
[Figure omitted: a plurirectangle circumscribing the region under f on [a, b].]
Naturally, the area of $A_{f_{[a,b]}}$ is larger than the area of any inscribed plurirectangle and smaller than the area of any circumscribed plurirectangle. The area of $A_{f_{[a,b]}}$ is, therefore, in between the areas of the inscribed and circumscribed plurirectangles.
We thus have a first key observation: the area of $A_{f_{[a,b]}}$ can always be sandwiched between areas of plurirectangles. This yields simple lower approximations (the areas of the inscribed plurirectangles) and upper approximations (the areas of the circumscribed plurirectangles) of the area of $A_{f_{[a,b]}}$.
A second key observation is that such a sandwich, and consequently the relative approximations, can be made better and better by considering finer and finer plurirectangles, obtained by subdividing further and further their bases:
[Figure omitted: two finer plurirectangles, inscribed and circumscribed, obtained by subdividing the bases further.]
Indeed, by subdividing further and further the bases, the area of the inscribed plurirectangles becomes larger and larger, though it remains always smaller than the area of $A_{f_{[a,b]}}$. On the other hand, the area of the circumscribed plurirectangles becomes smaller and smaller, though it remains always larger than the area of $A_{f_{[a,b]}}$. In other words, the two slices of the sandwich that include the region $A_{f_{[a,b]}}$ – i.e., the lower and the upper approximations – take values that become closer and closer to each other.
If by considering finer and finer plurirectangles, corresponding to finer and finer subdivisions of the bases, in the limit the lower and upper approximations coincide – so, the two slices of the sandwich merge – such a limit common value can be rightfully taken to be the area of $A_{f_{[a,b]}}$. In this way, starting with objects, the plurirectangles, that are simple to measure, we are able to measure via better and better approximations a much more complicated object such as the area of the plane region $A_{f_{[a,b]}}$ under f. The method of exhaustion is one of the most powerful ideas in mathematics.
35.3 Definition
We now formalize the method of exhaustion. We first consider positive and bounded functions $f:[a,b]\to\mathbb{R}_{+}$. In the next section, we will then consider general bounded functions, not necessarily positive.
Let us construct on them the largest plurirectangle inscribed in the plane region under f. In particular, for the i-th base, the maximum height $m_i$ of an inscribed rectangle with base $[x_{i-1},x_i]$ is
$$m_i=\inf_{x\in[x_{i-1},x_i]}f\left(x\right)$$
Since f is bounded, by the Least Upper Bound Principle this infimum exists and is finite, that is, $m_i\in\mathbb{R}$. The length $\Delta x_i$ of each base $[x_{i-1},x_i]$ is
$$\Delta x_i=x_i-x_{i-1}$$
Therefore, the area $I\left(f,\pi\right)$ of the maximal inscribed plurirectangle is
$$I\left(f,\pi\right)=\sum_{i=1}^{n}m_i\,\Delta x_i \tag{35.3}$$
In a similar way, let us construct on the contiguous bases (35.2) determined by the subdivision $\pi$, the smallest plurirectangle that circumscribes the plane region under f. For the i-th base, the minimum height $M_i$ of a circumscribed rectangle with base $[x_{i-1},x_i]$ is
$$M_i=\sup_{x\in[x_{i-1},x_i]}f\left(x\right)$$
Graphically:
[Figure omitted: the heights $m_i$ and $M_i$ of the inscribed and circumscribed rectangles on the base $[x_{i-1},x_i]$.]
As before, since f is bounded, by the Least Upper Bound Principle the supremum exists and is finite, that is, $M_i\in\mathbb{R}$. Therefore, the area $S\left(f,\pi\right)$ of the minimal circumscribed plurirectangle is
$$S\left(f,\pi\right)=\sum_{i=1}^{n}M_i\,\Delta x_i \tag{35.4}$$
In particular, the area of the plane region under f lies between these two values. Hence, $I\left(f,\pi\right)$ gives a lower approximation of this area, while $S\left(f,\pi\right)$ gives an upper approximation of it. They are called the lower and upper integral sums of f with respect to $\pi$, respectively.
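As a numerical illustration (ours, not from the text), the two integral sums can be computed for $f\left(x\right)=x^{2}$ on [0, 1]. Since f is increasing there, the infimum and supremum on each base are attained at its endpoints, which the sketch below exploits; the helper names are our own.

```python
# Lower and upper integral sums I(f, pi) and S(f, pi) for an increasing f
# (illustrative sketch; endpoint min/max is justified only by monotonicity).

def lower_sum(f, pts):
    # I(f, pi): inf of f on each base, times the base length
    return sum(min(f(u), f(v)) * (v - u) for u, v in zip(pts, pts[1:]))

def upper_sum(f, pts):
    # S(f, pi): sup of f on each base, times the base length
    return sum(max(f(u), f(v)) * (v - u) for u, v in zip(pts, pts[1:]))

f = lambda x: x ** 2
pi = [i / 10 for i in range(11)]   # subdivision of [0, 1] into ten bases
I, S = lower_sum(f, pi), upper_sum(f, pi)
# the area 1/3 under f is sandwiched between the two sums
assert I < 1 / 3 < S
```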
Definition 1404 Given two subdivisions $\pi$ and $\pi'$ of [a, b], we say that $\pi'$ refines $\pi$ if $\pi\subseteq\pi'$.
In other words, the finer subdivision $\pi'$ is obtained by adding further points to $\pi$. For example, the subdivision
$$\pi'=\left\{0,\frac{1}{4},\frac{1}{2},\frac{3}{4},1\right\}$$
of the unit interval [0, 1] refines the subdivision $\pi=\left\{0,1/2,1\right\}$.
It is easy to see that if $\pi'$ refines $\pi$, then
$$I\left(f,\pi\right)\leq I\left(f,\pi'\right)\leq S\left(f,\pi'\right)\leq S\left(f,\pi\right) \tag{35.6}$$
In other words, a finer subdivision $\pi'$ yields a better approximation, both lower and upper, of the area under f.² By starting from any subdivision, we can always refine it, thus improving (or, at least, not worsening) the approximations given by the corresponding plurirectangles.
The same can be done by starting from any two subdivisions $\pi$ and $\pi'$, not necessarily nested. Indeed, the subdivision $\pi''=\pi\cup\pi'$ formed by all the points that belong to the two subdivisions $\pi$ and $\pi'$ refines both of them. In other words, $\pi''$ is a common refinement of $\pi$ and $\pi'$.
For example, consider the subdivisions
$$\pi=\left\{0,\frac{1}{3},\frac{1}{2},\frac{2}{3},1\right\}\qquad\text{and}\qquad\pi'=\left\{0,\frac{1}{4},\frac{1}{2},\frac{3}{4},1\right\}$$
of [0, 1]. They are not nested: neither $\pi$ refines $\pi'$ nor $\pi'$ refines $\pi$. However, the subdivision
$$\pi''=\pi\cup\pi'=\left\{0,\frac{1}{4},\frac{1}{3},\frac{1}{2},\frac{2}{3},\frac{3}{4},1\right\}$$
refines both of them, so that
$$I\left(f,\pi\right)\leq I\left(f,\pi''\right)\leq S\left(f,\pi''\right)\leq S\left(f,\pi\right) \tag{35.7}$$
and
$$I\left(f,\pi'\right)\leq I\left(f,\pi''\right)\leq S\left(f,\pi''\right)\leq S\left(f,\pi'\right) \tag{35.8}$$
The common refinement $\pi''$ gives a better approximation, both lower and upper, of the area under f than the original subdivisions $\pi$ and $\pi'$.
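This improvement can be verified numerically for, say, $f\left(x\right)=x$ (an illustration of ours, not from the text; exact rational arithmetic avoids rounding issues, and the endpoint inf/sup is justified because f is increasing).

```python
# Common refinement pi'' = pi ∪ pi' improves both approximations (sketch).
from fractions import Fraction as F

def lower_sum(f, pts):
    return sum(min(f(u), f(v)) * (v - u) for u, v in zip(pts, pts[1:]))

def upper_sum(f, pts):
    return sum(max(f(u), f(v)) * (v - u) for u, v in zip(pts, pts[1:]))

f = lambda x: x
pi1 = [F(0), F(1, 3), F(1, 2), F(2, 3), F(1)]
pi2 = [F(0), F(1, 4), F(1, 2), F(3, 4), F(1)]
pi12 = sorted(set(pi1) | set(pi2))   # common refinement of the two

# refining never worsens either bound
assert lower_sum(f, pi1) <= lower_sum(f, pi12) <= upper_sum(f, pi12) <= upper_sum(f, pi1)
assert lower_sum(f, pi2) <= lower_sum(f, pi12) <= upper_sum(f, pi12) <= upper_sum(f, pi2)
```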
All this motivates the next definition.
² For sake of brevity, we write "area under f" instead of the more precise expression "area of the plane region that lies under the graph of f".
A first important question is whether the lower and upper integrals of a bounded function exist. Fortunately, this is the case, as we show next.
Lemma 1407 If $f:[a,b]\to\mathbb{R}_{+}$ is a bounded function, then both the lower integral and the upper integral exist and are finite, with
$$\underline{\int_{a}^{b}}f\left(x\right)dx\leq\overline{\int_{a}^{b}}f\left(x\right)dx \tag{35.11}$$
Proof Since f is positive and bounded, there exists $M\geq 0$ such that $0\leq f\left(x\right)\leq M$ for every $x\in[a,b]$. Therefore, for every subdivision $\pi=\{x_i\}_{i=0}^{n}$ we have $0\leq m_i\leq M_i\leq M$ for every i, and so
$$0\leq I\left(f,\pi\right)\leq S\left(f,\pi\right)\leq M\left(b-a\right)\qquad\forall\pi\in\Pi$$
By the Least Upper Bound Principle, the supremum in (35.9) and the infimum in (35.10) exist and are finite and positive, that is, $\underline{\int_{a}^{b}}f\left(x\right)dx\in\mathbb{R}_{+}$ and $\overline{\int_{a}^{b}}f\left(x\right)dx\in\mathbb{R}_{+}$.
We still need to prove the inequality (35.11). Let us suppose, by contradiction, that
$$\underline{\int_{a}^{b}}f\left(x\right)dx-\overline{\int_{a}^{b}}f\left(x\right)dx=\varepsilon>0$$
By the previous lemma, every bounded function $f:[a,b]\to\mathbb{R}_{+}$ has both the lower integral and the upper integral, with
$$\underline{\int_{a}^{b}}f\left(x\right)dx\leq\overline{\int_{a}^{b}}f\left(x\right)dx$$
The area under f lies between these two values. The last inequality is the most refined version of (35.6). The lower and upper integrals are, respectively, the best lower and upper approximations of the area under f that can be obtained through plurirectangles. In particular, when $\underline{\int_{a}^{b}}f\left(x\right)dx=\overline{\int_{a}^{b}}f\left(x\right)dx$, the area under f will be assumed to be such common value. This motivates the next fundamental definition.
For brevity, in the rest of the chapter we will often talk about integrals and integrable functions, omitting the clause "in the sense of Riemann". Since there are other notions of integral, it is however important to always keep in mind such qualification. In addition, note that the definition applies only to bounded functions. When in the sequel we consider integrable functions, they will be assumed to be bounded (even if not stated explicitly).
O.R. The notation $\int_{a}^{b}f\left(x\right)dx$ reminds us that the integral is obtained as the limit of sums of the type $\sum_{i=1}^{n}y_i\,\Delta x_i$, in which the symbol $\sum$ is replaced by the integral sign $\int$ ("a long letter s"), the length $\Delta x_i$ by $dx$, and the values $y_i$ of the function by $f\left(x\right)$. H
Let us illustrate the definition of the integral with, first, an example of an integrable function and, then, of a non-integrable one.
Example 1409 Let $f:[a,b]\to\mathbb{R}$ be defined by $f\left(x\right)=x$. For any subdivision $\{x_i\}_{i=0}^{n}$ we have
$$I\left(f,\pi\right)=x_0\,\Delta x_1+x_1\,\Delta x_2+\cdots+x_{n-1}\,\Delta x_n=\sum_{i=1}^{n}x_{i-1}\,\Delta x_i$$
$$S\left(f,\pi\right)=x_1\,\Delta x_1+x_2\,\Delta x_2+\cdots+x_n\,\Delta x_n=\sum_{i=1}^{n}x_i\,\Delta x_i$$
Therefore,
$$S\left(f,\pi\right)-I\left(f,\pi\right)=\left(x_1-x_0\right)\Delta x_1+\left(x_2-x_1\right)\Delta x_2+\cdots+\left(x_n-x_{n-1}\right)\Delta x_n=\sum_{i=1}^{n}\left(\Delta x_i\right)^{2}$$
This difference can be made arbitrarily small: for the equidistant subdivision with $\Delta x_i=\left(b-a\right)/n$ it equals $\left(b-a\right)^{2}/n$, which vanishes as n grows (by Jensen's inequality, since the quadratic function is convex, the equidistant subdivision actually minimizes this sum).³ Thus, $\underline{\int_{a}^{b}}f\left(x\right)dx=\overline{\int_{a}^{b}}f\left(x\right)dx$ and we conclude that $f\left(x\right)=x$ is integrable. N
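Numerically (an illustration of ours, not from the text), the identity $S-I=\sum\left(\Delta x_i\right)^{2}$ makes the convergence visible: for the equidistant subdivision of [0, 1] with n bases the gap is exactly $1/n$.

```python
# Gap S(f, pi) - I(f, pi) = sum of (dx_i)^2 for f(x) = x on [0, 1] (sketch).
def gap(n):
    pts = [i / n for i in range(n + 1)]           # equidistant subdivision
    return sum((v - u) ** 2 for u, v in zip(pts, pts[1:]))

# the gap equals 1/n and so vanishes as the subdivision gets finer
assert abs(gap(10) - 1 / 10) < 1e-9
assert gap(1000) < gap(10)
```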
Finally, let us introduce a useful quantity that characterizes the "fineness" of a subdivision $\pi$ of [a, b].
Definition 1411 Given a subdivision $\pi$ of [a, b], we define the mesh of $\pi$, denoted by $\left|\pi\right|$, as the positive quantity
$$\left|\pi\right|=\max_{i=1,2,\dots,n}\Delta x_i$$
[Figure omitted: a function taking positive and negative values on [a, b]; the region above the horizontal axis is marked "+" and the region below it "−".]
Intuitively, the integral is now the difference between the area of the positive part and the area of the negative part. If they have equal value, the integral is zero: this is the case, for example, of the function $f\left(x\right)=\sin x$ on the interval $[0,2\pi]$.
To make this idea rigorous, it is useful to decompose a function into its positive and negative parts. The function $f^{+}$ is called the positive part of f, while $f^{-}$ is called the negative part. For instance, for $f\left(x\right)=x$ we have
$$f^{+}\left(x\right)=\begin{cases}0 & x<0\\ x & x\geq 0\end{cases}\qquad\text{and}\qquad f^{-}\left(x\right)=\begin{cases}-x & x<0\\ 0 & x\geq 0\end{cases}$$
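The decomposition can also be spelled out computationally (an illustration of ours, not from the text): $f^{+}=\max\left(f,0\right)$ and $f^{-}=-\min\left(f,0\right)$ are both positive and reconstruct f as their difference.

```python
# Positive and negative parts of f(x) = sin x (illustrative sketch).
import math

f = math.sin
f_plus = lambda x: max(f(x), 0.0)     # positive part f+
f_minus = lambda x: -min(f(x), 0.0)   # negative part f-

for x in [-2.0, -0.5, 0.0, 1.0, 4.0]:
    assert f_plus(x) >= 0 and f_minus(x) >= 0
    assert abs(f_plus(x) - f_minus(x) - f(x)) < 1e-12   # f = f+ - f-
```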
Graphically:
[Figure omitted: the graphs of $f^{+}$ and $f^{-}$ for $f\left(x\right)=x$.]
and
$$f^{-}\left(x\right)=\begin{cases}0 & x\in\bigcup_{n\in\mathbb{Z}}\left[2n\pi,\left(2n+1\right)\pi\right]\\ -\sin x & \text{otherwise}\end{cases}$$
Graphically:
[Figure omitted: the graphs of $f^{+}$ and $f^{-}$ for $f\left(x\right)=\sin x$.] N
Note that
$$f\left(x\right)=\max\left\{f\left(x\right),0\right\}+\min\left\{f\left(x\right),0\right\}=\max\left\{f\left(x\right),0\right\}-\left(-\min\left\{f\left(x\right),0\right\}\right)=f^{+}\left(x\right)-f^{-}\left(x\right)$$
so that every function f can be written as the difference
$$f=f^{+}-f^{-} \tag{35.13}$$
of its positive and negative parts. Such a decomposition permits us to extend in a natural way the notion of integral to any function, not necessarily positive. Indeed, since both functions $f^{+}$ and $f^{-}$ are positive, the definition of the Riemann integral for positive functions applies to the areas under each of them. The difference between their integrals
$$\int_{a}^{b}f^{+}\left(x\right)dx-\int_{a}^{b}f^{-}\left(x\right)dx$$
is the difference between the areas under $f^{+}$ and $f^{-}$. So, it is the integral which we were looking for.
All of this motivates the following definition of the Riemann integral for general bounded functions, not necessarily positive.
This definition makes rigorous and transparent the idea of considering with different sign the areas of the plane regions bounded by f that lie, respectively, above and below the horizontal axis.
For general functions, too, the sums $I\left(f,\pi\right)$ and $S\left(f,\pi\right)$ are called the lower and upper integral sums of f with respect to the subdivision $\pi$, respectively. The reader can easily verify that for these sums the properties (35.5), (35.6), (35.7) and (35.8) continue to hold. In particular,
$$\sup_{\pi\in\Pi}I\left(f,\pi\right)\leq\inf_{\pi\in\Pi}S\left(f,\pi\right)$$
Moreover, for any bounded function $f:[a,b]\to\mathbb{R}$, positive or not, we can still define the lower and upper integrals
$$\underline{\int_{a}^{b}}f\left(x\right)dx=\sup_{\pi\in\Pi}I\left(f,\pi\right)\qquad\text{and}\qquad\overline{\int_{a}^{b}}f\left(x\right)dx=\inf_{\pi\in\Pi}S\left(f,\pi\right) \tag{35.14}$$
in perfect analogy with what we did for positive functions. The next result shows that everything fits together: the notion of Riemann integral obtained through the decomposition (35.13) into positive and negative parts is given by the equality between upper and lower integrals of (35.14).
Proposition 1415 A bounded function $f:[a,b]\to\mathbb{R}$ is integrable if and only if $\underline{\int_{a}^{b}}f\left(x\right)dx=\overline{\int_{a}^{b}}f\left(x\right)dx$. In this case,
$$\int_{a}^{b}f\left(x\right)dx=\underline{\int_{a}^{b}}f\left(x\right)dx=\overline{\int_{a}^{b}}f\left(x\right)dx$$
The proof is based on three lemmas. The first one establishes a general property of the suprema and infima of sums of functions, the second one also has a theoretical interest for the theory of integration (as we will explain at the end of the section), while the last one has a more technical nature.
Lemma 1416 For any two bounded functions $g,h:A\to\mathbb{R}$, we have $\sup_{x\in A}\left(g+h\right)\left(x\right)\leq\sup_{x\in A}g\left(x\right)+\sup_{x\in A}h\left(x\right)$ and $\inf_{x\in A}\left(g+h\right)\left(x\right)\geq\inf_{x\in A}g\left(x\right)+\inf_{x\in A}h\left(x\right)$.
Proof By contradiction, suppose that $\sup_{x\in A}\left(g+h\right)\left(x\right)>\sup_{x\in A}g\left(x\right)+\sup_{x\in A}h\left(x\right)$. Let $\varepsilon=\sup_{x\in A}\left(g+h\right)\left(x\right)-\left(\sup_{x\in A}g\left(x\right)+\sup_{x\in A}h\left(x\right)\right)>0$. By a property of the sup of a set, there exists $x_0\in A$ such that $\left(g+h\right)\left(x_0\right)>\sup_{x\in A}\left(g+h\right)\left(x\right)-\varepsilon=\sup_{x\in A}g\left(x\right)+\sup_{x\in A}h\left(x\right)$.⁵ At the same time, by the definition of sup of a function, we have $g\left(x\right)\leq\sup_{x\in A}g\left(x\right)$ and $h\left(x\right)\leq\sup_{x\in A}h\left(x\right)$ for every $x\in A$, from which it follows that $g\left(x\right)+h\left(x\right)\leq\sup_{x\in A}g\left(x\right)+\sup_{x\in A}h\left(x\right)$ for every $x\in A$. In particular, $\left(g+h\right)\left(x_0\right)\leq\sup_{x\in A}g\left(x\right)+\sup_{x\in A}h\left(x\right)$, a contradiction. The reader can prove, in a similar way, that $\inf_{x\in A}\left(g+h\right)\left(x\right)\geq\inf_{x\in A}g\left(x\right)+\inf_{x\in A}h\left(x\right)$.
Lemma 1417 Let $f:[a,b]\to\mathbb{R}$ be a bounded function. Then, for every subdivision $\pi=\{x_i\}_{i=0}^{n}$ of [a, b], we have
$$S\left(f,\pi\right)=S\left(f^{+},\pi\right)-I\left(f^{-},\pi\right) \tag{35.15}$$
and
$$I\left(f,\pi\right)=I\left(f^{+},\pi\right)-S\left(f^{-},\pi\right) \tag{35.16}$$
⁵ Note that $\sup_{x\in A}\left(g+h\right)\left(x\right)=\sup\operatorname{Im}\left(g+h\right)=\sup\left(g+h\right)\left(A\right)$.
Proof Let $f:[a,b]\to\mathbb{R}$ be a bounded function and let $\pi=\{x_i\}_{i=0}^{n}$ be a subdivision of [a, b]. For a generic interval $[x_{i-1},x_i]$, put $\alpha=\sup_{x\in[x_{i-1},x_i]}f\left(x\right)$ and $\beta=\inf_{x\in[x_{i-1},x_i]}f\left(x\right)$. Since f is bounded, $\alpha$ and $\beta$ exist by the Least Upper Bound Principle. We have
$$\alpha\geq 0\implies\alpha=\sup_{x\in[x_{i-1},x_i]}f^{+}\left(x\right)\ \text{ and }\ \inf_{x\in[x_{i-1},x_i]}f^{-}\left(x\right)=0$$
and
$$\alpha<0\implies\sup_{x\in[x_{i-1},x_i]}f^{+}\left(x\right)=0\ \text{ and }\ -\alpha=\inf_{x\in[x_{i-1},x_i]}f^{-}\left(x\right)$$
In both cases,
$$\alpha=\sup_{x\in[x_{i-1},x_i]}f^{+}\left(x\right)-\inf_{x\in[x_{i-1},x_i]}f^{-}\left(x\right)$$
A symmetric argument, based on Lemma 1416, shows that
$$\beta=\inf_{x\in[x_{i-1},x_i]}f^{+}\left(x\right)-\sup_{x\in[x_{i-1},x_i]}f^{-}\left(x\right)$$
Multiplying by $\Delta x_i$ and summing over i yields (35.15) and (35.16).
Lemma 1418 Let $f:[a,b]\to\mathbb{R}$ be a bounded function. Then, for every subdivision $\pi=\{x_i\}_{i=0}^{n}$ of [a, b],
$$I\left(f,\pi\right)\leq\underline{\int_{a}^{b}}f^{+}\left(x\right)dx-\overline{\int_{a}^{b}}f^{-}\left(x\right)dx\leq\overline{\int_{a}^{b}}f^{+}\left(x\right)dx-\underline{\int_{a}^{b}}f^{-}\left(x\right)dx\leq S\left(f,\pi\right) \tag{35.18}$$
Putting together (35.19), (35.20) and (35.5) applied to both $f^{+}$ and $f^{-}$, we get the inequality (35.18).
Proof of Proposition 1415 We begin with the "if": suppose $\underline{\int_{a}^{b}}f\left(x\right)dx=\overline{\int_{a}^{b}}f\left(x\right)dx$. We show that $f^{+}$ and $f^{-}$ are integrable. From (35.18) it follows that
$$\sup_{\pi\in\Pi}I\left(f,\pi\right)\leq\underline{\int_{a}^{b}}f^{+}\left(x\right)dx-\overline{\int_{a}^{b}}f^{-}\left(x\right)dx\leq\overline{\int_{a}^{b}}f^{+}\left(x\right)dx-\underline{\int_{a}^{b}}f^{-}\left(x\right)dx\leq\inf_{\pi\in\Pi}S\left(f,\pi\right)$$
So
$$\underline{\int_{a}^{b}}f^{+}\left(x\right)dx-\overline{\int_{a}^{b}}f^{-}\left(x\right)dx=\overline{\int_{a}^{b}}f^{+}\left(x\right)dx-\underline{\int_{a}^{b}}f^{-}\left(x\right)dx$$
which implies
$$\overline{\int_{a}^{b}}f^{+}\left(x\right)dx-\underline{\int_{a}^{b}}f^{+}\left(x\right)dx=\underline{\int_{a}^{b}}f^{-}\left(x\right)dx-\overline{\int_{a}^{b}}f^{-}\left(x\right)dx\leq 0$$
which implies, in view of (35.11), that the lower and upper integrals of $f^{+}$ coincide, and similarly for $f^{-}$; that is, $f^{+}$ and $f^{-}$ are integrable.
It remains to prove the "only if". Suppose that f is integrable, that is, that $f^{+}$ and $f^{-}$ are both integrable. We show that
$$\underline{\int_{a}^{b}}f\left(x\right)dx=\overline{\int_{a}^{b}}f\left(x\right)dx=\int_{a}^{b}f\left(x\right)dx$$
By (35.18), we have
$$\sup_{\pi\in\Pi}I\left(f,\pi\right)\leq\int_{a}^{b}f^{+}\left(x\right)dx-\int_{a}^{b}f^{-}\left(x\right)dx=\int_{a}^{b}f\left(x\right)dx\leq\inf_{\pi\in\Pi}S\left(f,\pi\right) \tag{35.23}$$
Since $f^{+}$ and $f^{-}$ are both integrable, by the integrability criterion of Proposition 1419 we have that, for every $\varepsilon>0$, there exist subdivisions $\pi$ and $\pi'$ such that⁶ $S\left(f^{+},\pi\right)-I\left(f^{+},\pi\right)<\varepsilon$ and $S\left(f^{-},\pi'\right)-I\left(f^{-},\pi'\right)<\varepsilon$; it follows that the two outer inequalities in (35.23) hold as equalities, as desired.
N.B. The Riemann integral is often defined directly for general functions, not necessarily positive, through the lower and upper sums. What is lost in defining these sums for not necessarily positive functions is the geometric intuition. While for positive functions $I\left(f,\pi\right)$ is the area of the inscribed plurirectangle and $S\left(f,\pi\right)$ the area of the circumscribed plurirectangle, this is no longer true for a generic function that takes positive and negative values, as (35.15) and (35.16) show. The formulation we adopt with Definition 1414 is suggested by pedagogical motivations and is equivalent to the usual formulation, as Proposition 1415 shows. O
Proof "If". Suppose that, for every $\varepsilon>0$, there exists a subdivision $\pi$ such that $S\left(f,\pi\right)-I\left(f,\pi\right)<\varepsilon$. Then
$$0\leq\overline{\int_{a}^{b}}f\left(x\right)dx-\underline{\int_{a}^{b}}f\left(x\right)dx\leq S\left(f,\pi\right)-I\left(f,\pi\right)<\varepsilon$$
and therefore, since $\varepsilon>0$ is arbitrary, we have $\overline{\int_{a}^{b}}f\left(x\right)dx=\underline{\int_{a}^{b}}f\left(x\right)dx$.
"Only if". Suppose that $\overline{\int_{a}^{b}}f\left(x\right)dx=\underline{\int_{a}^{b}}f\left(x\right)dx$. By Proposition 120, for every $\varepsilon>0$ there exist a subdivision $\pi'$ such that $S\left(f,\pi'\right)-\overline{\int_{a}^{b}}f\left(x\right)dx<\varepsilon$ and a subdivision $\pi''$ such that $\underline{\int_{a}^{b}}f\left(x\right)dx-I\left(f,\pi''\right)<\varepsilon$. Let $\pi$ be a subdivision that refines both $\pi'$ and $\pi''$. Thanks to (35.6), we have $I\left(f,\pi''\right)\leq I\left(f,\pi\right)\leq S\left(f,\pi\right)\leq S\left(f,\pi'\right)$, so
$$S\left(f,\pi\right)-I\left(f,\pi\right)\leq S\left(f,\pi'\right)-I\left(f,\pi''\right)<\overline{\int_{a}^{b}}f\left(x\right)dx+\varepsilon-\underline{\int_{a}^{b}}f\left(x\right)dx+\varepsilon=2\varepsilon$$
as desired.
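Proposition 1419 also suggests a simple numerical procedure (ours, not from the text): refine an equidistant subdivision until $S-I<\varepsilon$. For the increasing function below, the endpoint inf/sup is justified by monotonicity.

```python
# Certifying integrability via the criterion S(f, pi) - I(f, pi) < eps
# (illustrative sketch; valid as written only for monotone f, whose
# inf and sup on each base sit at its endpoints).

def gap(f, pts):
    lo = sum(min(f(u), f(v)) * (v - u) for u, v in zip(pts, pts[1:]))
    hi = sum(max(f(u), f(v)) * (v - u) for u, v in zip(pts, pts[1:]))
    return hi - lo

def refine_until(f, a, b, eps):
    n = 1
    pts = [a, b]
    while gap(f, pts) >= eps:
        n *= 2                                       # halve the mesh
        pts = [a + (b - a) * i / n for i in range(n + 1)]
    return pts

pts = refine_until(lambda x: x * x, 0.0, 2.0, 1e-3)
assert gap(lambda x: x * x, pts) < 1e-3
```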
The next result shows that, if two functions are equal except at a finite number of points, then their integrals (if they exist) are equal. It is an important stability property of the integral, whose value does not change if we modify a function $f:[a,b]\to\mathbb{R}$ at a finite number of points.
Proof It is sufficient to prove the statement for the case in which g differs from f at only one point $\hat{x}\in[a,b]$. The case of n points is then proved by (finite) induction by adding one point at a time.
Suppose, therefore, that $f\left(\hat{x}\right)\neq g\left(\hat{x}\right)$ with $\hat{x}\in[a,b]$. Without loss of generality, suppose that $f\left(\hat{x}\right)>g\left(\hat{x}\right)$. Setting $k=f\left(\hat{x}\right)-g\left(\hat{x}\right)>0$, let $h:[a,b]\to\mathbb{R}$ be the function $h=f-g$. Then
$$h\left(x\right)=\begin{cases}0 & x\neq\hat{x}\\ k & x=\hat{x}\end{cases}$$
Let us prove that h is integrable and that $\int_{a}^{b}h\left(x\right)dx=0$. Let $\varepsilon>0$. Consider an arbitrary subdivision $\pi=\{x_0,x_1,\dots,x_n\}$ of [a, b] such that $\left|\pi\right|<\varepsilon/\left(2k\right)$. Since $\hat{x}\in[a,b]$, there are two possibilities: (i) $\hat{x}$ is not an intermediate point of the subdivision, that is, we have either $\hat{x}\in\{x_0,x_n\}$ or $\hat{x}\in\left(x_{i-1},x_i\right)$ for some $i=1,\dots,n$; (ii) $\hat{x}$ is a point of the subdivision, with the exclusion of the extremes, that is, $\hat{x}=x_i$ for some $i=1,\dots,n-1$. Since $h\left(x\right)=0$ for every $x\neq\hat{x}$, we have
$$I\left(h,\pi\right)=0$$
In case (i), with either $\hat{x}\in\{x_0,x_n\}$ or $\hat{x}\in\left(x_{i-1},x_i\right)$ for some $i=1,\dots,n$, we have⁷
$$S\left(h,\pi\right)=k\,\Delta x_i<k\,\frac{\varepsilon}{2k}=\frac{\varepsilon}{2}<\varepsilon$$
In case (ii), with $\hat{x}=x_i$ for some $i=1,\dots,n-1$, we have
$$S\left(h,\pi\right)=k\left(\Delta x_i+\Delta x_{i+1}\right)<2k\,\frac{\varepsilon}{2k}=\varepsilon$$
Therefore, in both cases (i) and (ii) we have $S\left(h,\pi\right)-I\left(h,\pi\right)<\varepsilon$. Since $\varepsilon>0$ is arbitrary, by Proposition 1419 h is integrable on [a, b]. Hence
$$\int_{a}^{b}h\left(x\right)dx=\sup_{\pi\in\Pi}I\left(h,\pi\right)=\inf_{\pi\in\Pi}S\left(h,\pi\right) \tag{35.24}$$
Since $I\left(h,\pi\right)=0$ for every $\pi$,
$$\sup_{\pi\in\Pi}I\left(h,\pi\right)=0$$
⁷ If $\hat{x}=x_0$, we have $S\left(h,\pi\right)=k\,\Delta x_1$, while if $\hat{x}=x_n$, we have $S\left(h,\pi\right)=k\,\Delta x_n$. In both cases, we have $S\left(h,\pi\right)<\varepsilon$.
By applying the linearity of the integral (Theorem 1429), we have that $g=f-h$ is integrable because f and h are so, with
$$\int_{a}^{b}g\left(x\right)dx=\int_{a}^{b}f\left(x\right)dx-\int_{a}^{b}h\left(x\right)dx=\int_{a}^{b}f\left(x\right)dx$$
as desired.
O.R. Even if a function f is not defined at a finite number of points of the interval [a, b], we can still talk about its integral: it coincides with that of any function defined also at the missing points and equal to f at the points where f is defined. In particular, the integrals of f on [a, b], (a, b], [a, b) and (a, b) always coincide: this makes unambiguous the notation $\int_{a}^{b}f\left(x\right)dx$. H
Proof Let $\varepsilon>0$. Since g is continuous on [m, M], by Theorem 526 the function g is uniformly continuous on [m, M], that is, there exists $\delta_{\varepsilon}>0$ such that
$$y,z\in[m,M]\ \text{ and }\ \left|y-z\right|<\delta_{\varepsilon}\implies\left|g\left(y\right)-g\left(z\right)\right|<\varepsilon$$
and therefore
$$S\left(g\circ f,\pi\right)-I\left(g\circ f,\pi\right)=\sum_{i=1}^{n}\left[\sup_{x\in[x_{i-1},x_i]}\left(g\circ f\right)\left(x\right)-\inf_{x\in[x_{i-1},x_i]}\left(g\circ f\right)\left(x\right)\right]\Delta x_i$$
$$=\sum_{i\in I}\left[\sup_{x\in[x_{i-1},x_i]}\left(g\circ f\right)\left(x\right)-\inf_{x\in[x_{i-1},x_i]}\left(g\circ f\right)\left(x\right)\right]\Delta x_i+\sum_{i\notin I}\left[\sup_{x\in[x_{i-1},x_i]}\left(g\circ f\right)\left(x\right)-\inf_{x\in[x_{i-1},x_i]}\left(g\circ f\right)\left(x\right)\right]\Delta x_i$$
$$\leq\varepsilon\sum_{i\in I}\Delta x_i+2\max_{y\in[m,M]}\left|g\left(y\right)\right|\sum_{i\notin I}\Delta x_i<\varepsilon\left(b-a\right)+2\max_{y\in[m,M]}\left|g\left(y\right)\right|\varepsilon$$
Since the function $g\left(x\right)=\left|x\right|$ is continuous, a simple but important consequence of Proposition 1421 is that the integrability of a bounded function $f:[a,b]\to\mathbb{R}$ implies the integrability of the absolute value function $\left|f\right|:[a,b]\to\mathbb{R}$. Note that the converse is false: the function
$$f\left(x\right)=\begin{cases}1 & \text{if }x\in\mathbb{Q}\cap[0,1]\\ -1 & \text{if }x\notin\mathbb{Q}\cap[0,1]\end{cases} \tag{35.26}$$
is a simple modification of the Dirichlet function and hence it is not integrable, contrary to its absolute value $\left|f\right|$, which is the constant function equal to 1 on the interval [0, 1].
Finally, observe that the first integrability criterion of this section, Proposition 1419, opens an interesting perspective on the Riemann integral. Given any subdivision $\pi=\{x_i\}_{i=0}^{n}$, by definition we have $m_i\leq f\left(x_i'\right)\leq M_i$ for every $x_i'\in[x_{i-1},x_i]$, so that
$$I\left(f,\pi\right)\leq\sum_{i=1}^{n}f\left(x_i'\right)\Delta x_i\leq S\left(f,\pi\right)$$
Hence, since
$$I\left(f,\pi\right)\leq\int_{a}^{b}f\left(x\right)dx\leq S\left(f,\pi\right)$$
we have
$$I\left(f,\pi\right)-S\left(f,\pi\right)\leq\sum_{i=1}^{n}f\left(x_i'\right)\Delta x_i-\int_{a}^{b}f\left(x\right)dx\leq S\left(f,\pi\right)-I\left(f,\pi\right)$$
⁸ Here $i\notin I$ stands for $i\in\{1,2,\dots,n\}\setminus I$.
which is equivalent to
$$\left|\sum_{i=1}^{n}f\left(x_i'\right)\Delta x_i-\int_{a}^{b}f\left(x\right)dx\right|\leq S\left(f,\pi\right)-I\left(f,\pi\right)$$
By Proposition 1419, for every $\varepsilon>0$ there exists a sufficiently fine subdivision for which
$$\left|\sum_{i=1}^{n}f\left(x_i'\right)\Delta x_i-\int_{a}^{b}f\left(x\right)dx\right|\leq S\left(f,\pi\right)-I\left(f,\pi\right)<\varepsilon$$
Definition 1422 A function $f:[a,b]\to\mathbb{R}$ is called a step function if there exist a subdivision $\pi=\{x_i\}_{i=0}^{n}$ and a set $\{c_i\}_{i=1}^{n}$ of constants such that
$$f\left(x\right)=c_i\qquad\forall x\in\left(x_{i-1},x_i\right),\ i=1,\dots,n \tag{35.28}$$
For example, the functions
$$f\left(x\right)=\sum_{i=1}^{n-1}c_i 1_{[x_{i-1},x_i)}\left(x\right)+c_n 1_{[x_{n-1},x_n]}\left(x\right) \tag{35.29}$$
and
$$g\left(x\right)=c_1 1_{[x_0,x_1]}\left(x\right)+\sum_{i=2}^{n}c_i 1_{\left(x_{i-1},x_i\right]}\left(x\right) \tag{35.30}$$
⁹ Often called Riemann sums (or, sometimes, Cauchy sums).
are step functions where, for every set A in $\mathbb{R}$, we denote by $1_A:\mathbb{R}\to\mathbb{R}$ the indicator function
$$1_A\left(x\right)=\begin{cases}1 & \text{if }x\in A\\ 0 & \text{if }x\notin A\end{cases} \tag{35.31}$$
The two following figures give, for n = 4, examples of functions f and g described by (35.29) and (35.30). Note that f and g are, respectively, continuous from the right and from the left, that is, $\lim_{x\to x_0^{+}}f\left(x\right)=f\left(x_0\right)$ and $\lim_{x\to x_0^{-}}g\left(x\right)=g\left(x_0\right)$.
[Figure omitted: the step functions f and g of (35.29) and (35.30) for n = 4, with values $c_1,c_2,c_3,c_4$ on the bases determined by $x_0<x_1<x_2<x_3<x_4$.]
On the intervals
$$[x_0,x_1)\cup\left(x_1,x_2\right)\cup\left(x_2,x_3\right)\cup\left(x_3,x_4\right]$$
the two functions coincide and generate the plurirectangle
[Figure omitted: the plurirectangle generated by f and g on these intervals.]
determined by the subdivision $\{x_i\}_{i=0}^{4}$ and by the constants $\{c_i\}_{i=1}^{4}$. Nevertheless, at the points $x_1<x_2<x_3$ the functions f and g differ, and it is easy to verify that on the entire interval $[x_0,x_4]$ they do not generate this plurirectangle, as the next figure shows. Indeed, the dashed segment at $x_2$ is not under f and the dashed segments at $x_1$ and $x_3$ are not under g.
[Figure omitted: the graphs of f and g on $[x_0,x_4]$, with dashed segments at the points $x_1$, $x_2$, $x_3$ where they differ from the plurirectangle.]
But, thanks to Proposition 1420, such a discrepancy at a finite number of points is irrelevant for the integral. The next result shows that the area under the step functions f and g is, actually, equal to that of the corresponding plurirectangle (independently of the values of the function at the points $x_1<x_2<x_3$).
Proposition 1423 A step function $f:[a,b]\to\mathbb{R}$, determined by the subdivision $\{x_i\}_{i=0}^{n}$ and the constants $\{c_i\}_{i=1}^{n}$ according to (35.28), is integrable, with
$$\int_{a}^{b}f\left(x\right)dx=\sum_{i=1}^{n}c_i\,\Delta x_i \tag{35.32}$$
All the step functions that are determined by a subdivision $\{x_i\}_{i=0}^{n}$ and a set of constants $\{c_i\}_{i=1}^{n}$ according to (35.28) share, therefore, the same integral (35.32). In particular, this holds for the step functions (35.29) and (35.30).
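Formula (35.32) is immediate to evaluate; here is a small numerical sketch of ours, with made-up data.

```python
# Integral of a step function: sum of c_i * dx_i, as in (35.32) (sketch).
pts = [0.0, 1.0, 3.0, 4.0]    # subdivision x_0 < x_1 < x_2 < x_3
cs = [2.0, -1.0, 3.0]         # constant value c_i on each open base

integral = sum(c * (v - u) for c, (u, v) in zip(cs, zip(pts, pts[1:])))
# 2*1 + (-1)*2 + 3*1 = 3: the values at the subdivision points are irrelevant
assert abs(integral - 3.0) < 1e-12
```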
Proof Since f is bounded, Lemma 1407 shows that $\underline{\int_{a}^{b}}f\left(x\right)dx,\overline{\int_{a}^{b}}f\left(x\right)dx\in\mathbb{R}$. Let $m=\inf_{x\in[a,b]}f\left(x\right)$ and $M=\sup_{x\in[a,b]}f\left(x\right)$. Fix $\varepsilon>0$ sufficiently small, and consider the subdivision $\pi_{\varepsilon}$ given by
$$x_0<x_1-\varepsilon<x_1+\varepsilon<x_2-\varepsilon<x_2+\varepsilon<\cdots<x_{n-1}-\varepsilon<x_{n-1}+\varepsilon<x_n$$
We have
$$I\left(f,\pi_{\varepsilon}\right)=c_1\left(\Delta x_1-\varepsilon\right)+2\varepsilon\sum_{i=1}^{n-1}\inf_{x\in[x_i-\varepsilon,x_i+\varepsilon]}f\left(x\right)+\sum_{i=2}^{n-1}c_i\left(\Delta x_i-2\varepsilon\right)+c_n\left(\Delta x_n-\varepsilon\right)$$
$$=\sum_{i=1}^{n}c_i\,\Delta x_i-\varepsilon\left(c_1+c_n\right)+2\varepsilon\sum_{i=1}^{n-1}\inf_{x\in[x_i-\varepsilon,x_i+\varepsilon]}f\left(x\right)-2\varepsilon\sum_{i=2}^{n-1}c_i$$
$$\geq\sum_{i=1}^{n}c_i\,\Delta x_i-2\varepsilon M+2\varepsilon\left(n-1\right)m-2\varepsilon M\left(n-2\right)=\sum_{i=1}^{n}c_i\,\Delta x_i-2\varepsilon\left(n-1\right)\left(M-m\right)$$
A symmetric computation shows that $S\left(f,\pi_{\varepsilon}\right)\leq\sum_{i=1}^{n}c_i\,\Delta x_i+2\varepsilon\left(n-1\right)\left(M-m\right)$, so that, setting $K=2\left(n-1\right)\left(M-m\right)$, we have $S\left(f,\pi_{\varepsilon}\right)-I\left(f,\pi_{\varepsilon}\right)\leq 2K\varepsilon$.
Since $\varepsilon>0$ is arbitrary, Proposition 1419 shows that f is integrable. Moreover, since
$$I\left(f,\pi_{\varepsilon}\right)\leq\int_{a}^{b}f\left(x\right)dx\leq S\left(f,\pi_{\varepsilon}\right)$$
we have
$$\sum_{i=1}^{n}c_i\,\Delta x_i-K\varepsilon\leq\int_{a}^{b}f\left(x\right)dx\leq\sum_{i=1}^{n}c_i\,\Delta x_i+K\varepsilon$$
which, given the arbitrariness of $\varepsilon>0$, guarantees that $\int_{a}^{b}f\left(x\right)dx=\sum_{i=1}^{n}c_i\,\Delta x_i$.
For every bounded function $f:[a,b]\to\mathbb{R}$ one can show that
$$\underline{\int_{a}^{b}}f\left(x\right)dx=\sup\left\{\int_{a}^{b}h\left(x\right)dx:h\leq f\text{ and }h\in S\left([a,b]\right)\right\} \tag{35.33}$$
and
$$\overline{\int_{a}^{b}}f\left(x\right)dx=\inf\left\{\int_{a}^{b}h\left(x\right)dx:h\geq f\text{ and }h\in S\left([a,b]\right)\right\} \tag{35.34}$$
where $S\left([a,b]\right)$ denotes the set of step functions on [a, b]. So, f is integrable if and only if the lower approximation given by the integrals of step functions smaller than f coincides, at the limit, with the upper approximation given by the integrals of step functions larger than f. In this case the method of exhaustion assumes a more analytic and less geometric aspect,¹⁰ with the approximation by elementary polygons (the plurirectangles) replaced by the one given by elementary functions (the step functions).
This suggests a different approach to the Riemann integral, more analytic and less geometric. In such an approach, we first define the integrals of step functions (that is, the areas under them), which can be determined on the basis of elementary geometric considerations based on plurirectangles. We then use these "elementary" integrals to suitably approximate the areas under more complicated functions. In particular, we define the lower integral of a bounded function $f:[a,b]\to\mathbb{R}$ as the best approximation "from below" obtained by means of step functions $h\leq f$, and, analogously, the upper integral of a bounded function $f:[a,b]\to\mathbb{R}$ as the best approximation "from above" obtained by means of step functions $h\geq f$.
Thanks to (35.33) and (35.34), this more analytic interpretation of the method of exhaustion is equivalent to the geometric one previously adopted. The analytic approach is quite fruitful, as readers will learn in more advanced courses.
Proof Since f is continuous on [a, b], by Weierstrass' Theorem f is bounded. Let $\varepsilon>0$. By Theorem 526, f is uniformly continuous, that is, there exists $\delta_{\varepsilon}>0$ such that
$$x,y\in[a,b]\ \text{ and }\ \left|x-y\right|<\delta_{\varepsilon}\implies\left|f\left(x\right)-f\left(y\right)\right|<\varepsilon \tag{35.35}$$
Let $\pi=\{x_i\}_{i=0}^{n}$ be a subdivision of [a, b] such that $\left|\pi\right|<\delta_{\varepsilon}$. By (35.35), for every $i=1,2,\dots,n$ we therefore have
$$\sup_{x\in[x_{i-1},x_i]}f\left(x\right)-\inf_{x\in[x_{i-1},x_i]}f\left(x\right)\leq\varepsilon$$
¹⁰ That is, based also on the use of notions of analysis, such as functions, and not only on that of geometric figures, such as plurirectangles.
Because of the stability of the integral seen in Proposition 1420, we have the following immediate generalization of the last result: every bounded function $f:[a,b]\to\mathbb{R}$ that has at most a finite number of removable discontinuities is integrable. Indeed, by recalling (12.7) of Chapter 12, if $S=\{x_i\}_{i=1}^{n}$ is the set of points where f has removable discontinuities, the function
$$\tilde{f}\left(x\right)=\begin{cases}f\left(x\right) & \text{if }x\notin S\\ \lim_{y\to x}f\left(y\right) & \text{if }x\in S\end{cases}$$
is continuous (so, integrable) and is equal to f except at the points of S.
More is true: the hypothesis that the discontinuities are removable is actually superfluous, and we can allow for countably many points of discontinuity (but not more than that).
Theorem 1426 Every bounded function $f:[a,b]\to\mathbb{R}$ with at most countably many discontinuities is integrable.
is continuous at all the points of [0, 1], except at the two extreme points 0 and 1. By Theorem 1426, the function f is integrable.
(ii) Consider the countable set
$$E=\left\{\frac{1}{n}:n\geq 1\right\}\subseteq[0,1]$$
The function $f:[0,1]\to\mathbb{R}$ defined by
$$f\left(x\right)=\begin{cases}x^{2} & \text{if }x\notin E\\ 0 & \text{if }x\in E\end{cases}$$
is continuous at all the points of [0, 1], except at the points of E.¹¹ Since E is a countable set, by Theorem 1426 the function f is integrable. N
The result follows immediately from Theorem 1426 because monotonic functions have at most countably many points of discontinuity (Proposition 483). Next we give, however, a simple direct proof of the result.
Proof Let $\varepsilon>0$. Let $\pi=\{x_i\}_{i=0}^{n}$ be a subdivision of [a, b] such that $\left|\pi\right|<\varepsilon$. Let us suppose that f is increasing (the argument for f decreasing is analogous). We have
$$\sup_{x\in[x_{i-1},x_i]}f\left(x\right)=f\left(x_i\right)\qquad\text{and}\qquad\inf_{x\in[x_{i-1},x_i]}f\left(x\right)=f\left(x_{i-1}\right)$$
and therefore
$$S\left(f,\pi\right)-I\left(f,\pi\right)=\sum_{i=1}^{n}\sup_{x\in[x_{i-1},x_i]}f\left(x\right)\Delta x_i-\sum_{i=1}^{n}\inf_{x\in[x_{i-1},x_i]}f\left(x\right)\Delta x_i$$
$$=\sum_{i=1}^{n}f\left(x_i\right)\Delta x_i-\sum_{i=1}^{n}f\left(x_{i-1}\right)\Delta x_i=\sum_{i=1}^{n}\left(f\left(x_i\right)-f\left(x_{i-1}\right)\right)\Delta x_i$$
$$\leq\left|\pi\right|\sum_{i=1}^{n}\left(f\left(x_i\right)-f\left(x_{i-1}\right)\right)<\varepsilon\left(f\left(b\right)-f\left(a\right)\right)$$
Since $\varepsilon>0$ is arbitrary, Proposition 1419 implies that f is integrable.
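The key bound of this proof, $S-I\leq\left|\pi\right|\left(f\left(b\right)-f\left(a\right)\right)$, can be checked numerically for an increasing function (an illustration of ours; the uneven subdivision is arbitrary).

```python
# The monotone bound S(f, pi) - I(f, pi) <= |pi| (f(b) - f(a)) (sketch).
import math

f = math.exp                                  # increasing on [0, 1]
pts = [0.0, 0.1, 0.35, 0.5, 0.8, 1.0]         # an uneven subdivision
mesh = max(v - u for u, v in zip(pts, pts[1:]))
# for increasing f: sup = f(x_i) and inf = f(x_{i-1}) on each base
diff = sum((f(v) - f(u)) * (v - u) for u, v in zip(pts, pts[1:]))
assert diff <= mesh * (f(1.0) - f(0.0))
```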
Theorem 1429 Let $f,g:[a,b]\to\mathbb{R}$ be two bounded and integrable functions. Then, for every $\alpha,\beta\in\mathbb{R}$ the function $\alpha f+\beta g:[a,b]\to\mathbb{R}$ is integrable, with
$$\int_{a}^{b}\left(\alpha f+\beta g\right)\left(x\right)dx=\alpha\int_{a}^{b}f\left(x\right)dx+\beta\int_{a}^{b}g\left(x\right)dx \tag{35.36}$$
¹¹ Note that f is continuous at the origin, as the reader can verify.
Proof The proof is divided into two parts. First we will prove homogeneity, that is,
$$\int_{a}^{b}\alpha f\left(x\right)dx=\alpha\int_{a}^{b}f\left(x\right)dx\qquad\forall\alpha\in\mathbb{R} \tag{35.37}$$
and then additivity, that is,
$$\int_{a}^{b}\left(f+g\right)\left(x\right)dx=\int_{a}^{b}f\left(x\right)dx+\int_{a}^{b}g\left(x\right)dx \tag{35.38}$$
whenever f and g are integrable. Together, relations (35.37) and (35.38) are equivalent to (35.36).
(i) Homogeneity. Let $\pi=\{x_i\}_{i=0}^{n}$ be a subdivision of [a, b]. If $\alpha\geq 0$ we have $I\left(\alpha f,\pi\right)=\alpha I\left(f,\pi\right)$ and $S\left(\alpha f,\pi\right)=\alpha S\left(f,\pi\right)$. Therefore, $\alpha f$ is integrable, with
$$\int_{a}^{b}\alpha f\left(x\right)dx=\alpha\int_{a}^{b}f\left(x\right)dx \tag{35.39}$$
Moreover, since $I\left(-f,\pi\right)=-S\left(f,\pi\right)$ for every $\pi$,
$$\int_{a}^{b}\left(-f\right)\left(x\right)dx=\sup_{\pi\in\Pi}I\left(-f,\pi\right)=\sup_{\pi\in\Pi}\left(-S\left(f,\pi\right)\right)=-\inf_{\pi\in\Pi}S\left(f,\pi\right)=-\int_{a}^{b}f\left(x\right)dx$$
Now let $\alpha<0$. We have $\alpha f=\left(-\alpha\right)\left(-f\right)$ with $-\alpha>0$. Then, by applying (35.39) we obtain
$$\int_{a}^{b}\left(\alpha f\right)\left(x\right)dx=\int_{a}^{b}\left(-\alpha\right)\left(-f\right)\left(x\right)dx=\left(-\alpha\right)\int_{a}^{b}\left(-f\right)\left(x\right)dx=-\left(-\alpha\right)\int_{a}^{b}f\left(x\right)dx=\alpha\int_{a}^{b}f\left(x\right)dx$$
Therefore,
$$\int_{a}^{b}\alpha f\left(x\right)dx=\alpha\int_{a}^{b}f\left(x\right)dx\qquad\forall\alpha\in\mathbb{R} \tag{35.40}$$
that is, (35.37).
(ii) Additivity. Let us prove (35.38). Let $\varepsilon>0$. Since f and g are integrable, by Proposition 1419 there exists a subdivision $\pi$ of [a, b] such that $S\left(f,\pi\right)-I\left(f,\pi\right)<\varepsilon$ and there exists $\pi'$ such that $S\left(g,\pi'\right)-I\left(g,\pi'\right)<\varepsilon$. Let $\pi''$ be a subdivision of [a, b] that refines both $\pi$ and $\pi'$. Thanks to (35.6), we have $S\left(f,\pi''\right)-I\left(f,\pi''\right)<\varepsilon$ and $S\left(g,\pi''\right)-I\left(g,\pi''\right)<\varepsilon$. Moreover, by applying the inequalities of Lemma 1416,
$$I\left(f,\pi''\right)+I\left(g,\pi''\right)\leq I\left(f+g,\pi''\right)\leq S\left(f+g,\pi''\right)\leq S\left(f,\pi''\right)+S\left(g,\pi''\right) \tag{35.41}$$
and therefore
$$S\left(f+g,\pi''\right)-I\left(f+g,\pi''\right)\leq S\left(f,\pi''\right)-I\left(f,\pi''\right)+S\left(g,\pi''\right)-I\left(g,\pi''\right)<2\varepsilon$$
By Proposition 1419, $f+g$ is integrable. Hence, (35.41) becomes
$$I\left(f,\pi\right)+I\left(g,\pi\right)\leq\int_{a}^{b}\left(f+g\right)\left(x\right)dx\leq S\left(f,\pi\right)+S\left(g,\pi\right)$$
for every subdivision $\pi\in\Pi$. By subtracting $\int_{a}^{b}f\left(x\right)dx+\int_{a}^{b}g\left(x\right)dx$ from all the three members of the inequality, we obtain
$$I\left(f,\pi\right)+I\left(g,\pi\right)-\left[\int_{a}^{b}f\left(x\right)dx+\int_{a}^{b}g\left(x\right)dx\right]\leq\int_{a}^{b}\left(f+g\right)\left(x\right)dx-\left[\int_{a}^{b}f\left(x\right)dx+\int_{a}^{b}g\left(x\right)dx\right]\leq S\left(f,\pi\right)+S\left(g,\pi\right)-\left[\int_{a}^{b}f\left(x\right)dx+\int_{a}^{b}g\left(x\right)dx\right]$$
that is,
$$\left[I\left(f,\pi\right)-\int_{a}^{b}f\left(x\right)dx\right]+\left[I\left(g,\pi\right)-\int_{a}^{b}g\left(x\right)dx\right]\leq\int_{a}^{b}\left(f+g\right)\left(x\right)dx-\int_{a}^{b}f\left(x\right)dx-\int_{a}^{b}g\left(x\right)dx\leq\left[S\left(f,\pi\right)-\int_{a}^{b}f\left(x\right)dx\right]+\left[S\left(g,\pi\right)-\int_{a}^{b}g\left(x\right)dx\right]$$
Since f and g are integrable, given any $\varepsilon>0$ we can find a subdivision $\pi_{\varepsilon}$ such that, for $h=f,g$, we have
$$I\left(h,\pi_{\varepsilon}\right)-\int_{a}^{b}h\left(x\right)dx>-\frac{\varepsilon}{2}\qquad\text{and}\qquad S\left(h,\pi_{\varepsilon}\right)-\int_{a}^{b}h\left(x\right)dx<\frac{\varepsilon}{2}$$
Therefore,
$$-\varepsilon<\int_{a}^{b}\left(f+g\right)\left(x\right)dx-\left[\int_{a}^{b}f\left(x\right)dx+\int_{a}^{b}g\left(x\right)dx\right]<\varepsilon$$
and, given the arbitrariness of $\varepsilon>0$, one necessarily has
$$\int_{a}^{b}\left(f+g\right)\left(x\right)dx=\int_{a}^{b}f\left(x\right)dx+\int_{a}^{b}g\left(x\right)dx \tag{35.42}$$
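The sandwich (35.41) underlying additivity can be observed numerically (an illustration of ours; the endpoint inf/sup is justified because f, g and $f+g$ below are all increasing, and a small tolerance absorbs floating-point rounding).

```python
# Lower/upper sums of f + g versus those of f and g, as in (35.41) (sketch).

def sums(h, pts):
    lo = sum(min(h(u), h(v)) * (v - u) for u, v in zip(pts, pts[1:]))
    hi = sum(max(h(u), h(v)) * (v - u) for u, v in zip(pts, pts[1:]))
    return lo, hi

pts = [i / 1000 for i in range(1001)]   # fine subdivision of [0, 1]
f = lambda x: x
g = lambda x: x * x
lf, uf = sums(f, pts)
lg, ug = sums(g, pts)
ls, us = sums(lambda x: f(x) + g(x), pts)

tol = 1e-9   # floating-point slack
# I(f)+I(g) <= I(f+g) <= S(f+g) <= S(f)+S(g)
assert lf + lg <= ls + tol and ls <= us and us <= uf + ug + tol
```

Both sums close in on $\int_{0}^{1}\left(x+x^{2}\right)dx=5/6$, consistent with (35.42).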
An important consequence of the linearity of the integral is that the product of two integrable functions is integrable.
Corollary 1430 If $f,g:[a,b]\to\mathbb{R}$ are two bounded and integrable functions, then their product $fg:[a,b]\to\mathbb{R}$ is integrable.
O.R. Thanks to the linearity of the integral, knowing the integrals of f and g allows one to calculate the integral of $f+g$. It is not so for the product or for the composition of integrable functions: the integrability of f guarantees the integrability of $f^{2}$, but knowing the value of the integral of f does not help in the calculation of the integral of $f^{2}$ – indeed, in general $\int_{a}^{b}f^{2}\left(x\right)dx\neq\left(\int_{a}^{b}f\left(x\right)dx\right)^{2}$. More generally, knowing that $g\circ f$ is integrable does not give any useful indication for the computation of the integral of the composite function. H
Finally, the linearity of the integral implies that it is possible to freely subdivide the domain of integration [a, b] into subintervals.
Corollary 1431 Let $f:[a,b]\to\mathbb{R}$ be a bounded and integrable function. If $a<c<b$, then
$$\int_{a}^{b}f\left(x\right)dx=\int_{a}^{c}f\left(x\right)dx+\int_{c}^{b}f\left(x\right)dx \tag{35.43}$$
Vice versa, if $f_1:[a,c]\to\mathbb{R}$ and $f_2:[c,b]\to\mathbb{R}$ are bounded and integrable, then the function $f:[a,b]\to\mathbb{R}$ defined by
$$f\left(x\right)=\begin{cases}f_1\left(x\right) & \text{if }x\in[a,c]\\ f_2\left(x\right) & \text{if }x\in\left(c,b\right]\end{cases}$$
is also bounded and integrable, with
$$\int_{a}^{b}f\left(x\right)dx=\int_{a}^{c}f_1\left(x\right)dx+\int_{c}^{b}f_2\left(x\right)dx$$
Proof Let us prove the first part. Since (recall the definition (35.31) of the indicator function)
$$f = 1_{[a,c]} f + 1_{(c,b]} f$$
the linearity of the integral implies that
$$\int_a^b f(x)\,dx = \int_a^b \left( 1_{[a,c]} f + 1_{(c,b]} f \right)(x)\,dx = \int_a^b \left( 1_{[a,c]} f(x) + 1_{(c,b]} f(x) \right) dx = \int_a^b 1_{[a,c]} f(x)\,dx + \int_a^b 1_{(c,b]} f(x)\,dx$$
1028 CHAPTER 35. THE RIEMANN INTEGRAL
and
$$S\left( 1_{[a,c]} f, \pi' \right) = \sum_{i=1}^n M_i\,\Delta x_i = \sum_j M_j\,\Delta x_j = S\left( f_{|[a,c]}, \pi'' \right) \tag{35.45}$$
Therefore,
$$S\left( f_{|[a,c]}, \pi'' \right) - I\left( f_{|[a,c]}, \pi'' \right) < \varepsilon$$
By Proposition 1419 we conclude that $f_{|[a,c]} : [a,c] \to \mathbb{R}$ is integrable. Moreover, from (35.44) and (35.45) we deduce that
$$\int_a^b 1_{[a,c]} f(x)\,dx = \int_a^c f_{|[a,c]}(x)\,dx = \int_a^c f(x)\,dx$$
as desired.
The next property of monotonicity of the integral shows that to larger functions there correspond larger integrals. The writing $f \le g$ means $f(x) \le g(x)$ for every $x \in [a,b]$, i.e., the function f is pointwise smaller than the function g.

Theorem 1432 Let $f, g : [a,b] \to \mathbb{R}$ be two bounded and integrable functions. If $f \le g$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
From the monotonicity of the integral we obtain an important inequality between “absolute values of integrals” and “integrals of absolute values”, the latter being larger. In reading the result keep in mind that, as observed after Proposition 1421, the integrability of $|f|$ follows from that of f.

Proposition 1434 Let $f : [a,b] \to \mathbb{R}$ be a bounded and integrable function. Then, setting $m = \inf_{[a,b]} f(x)$ and $M = \sup_{[a,b]} f(x)$, we have
$$m(b-a) \le \int_a^b f(x)\,dx \le M(b-a) \tag{35.47}$$
Proof We have
$$m \le f(x) \le M \qquad \forall x \in [a,b]$$
whence, by the monotonicity of the integral,
$$\int_a^b m\,dx \le \int_a^b f(x)\,dx \le \int_a^b M\,dx$$
Clearly, $\int_a^b m\,dx = m(b-a)$ (it is the area of a rectangle of base $b-a$ and height m) and $\int_a^b M\,dx = M(b-a)$.13 This shows that (35.47) holds.
We end with the classic Integral Mean Value Theorem, which follows from the previous sandwich property.

Theorem 1435 (Integral Mean Value) Let $f : [a,b] \to \mathbb{R}$ be a bounded and integrable function. Then, setting $m = \inf_{[a,b]} f(x)$ and $M = \sup_{[a,b]} f(x)$, there exists a scalar $\gamma \in [m,M]$ such that
$$\int_a^b f(x)\,dx = \gamma\,(b-a) \tag{35.48}$$
In particular, if f is continuous, there exists $c \in [a,b]$ such that $f(c) = \gamma$, that is,
$$\int_a^b f(x)\,dx = f(c)(b-a)$$
For this reason, $\gamma$ is called the mean value (of the ordinates) of f: the value of the integral does not change if we replace all the ordinates of the function by the constant value $\gamma$.
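A numerical sketch of the theorem, not from the text: we approximate the integral with a midpoint Riemann sum (the helper `riemann` is ours), check that the mean value lies in $[m, M]$, and, since the chosen f is continuous, locate a point c with $f(c)$ equal to the mean value by bisection.

```python
# Sketch of the Integral Mean Value Theorem for f = sin on [0, pi]
# (assumptions: f continuous; Riemann sums as numerical stand-in for the integral).
import math

def riemann(f, a, b, n=100000):
    """Midpoint Riemann sum approximating the integral of f on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = math.sin
a, b = 0.0, math.pi
gamma = riemann(f, a, b) / (b - a)   # mean value of the ordinates of f

# gamma lies between the inf and the sup of f, sampled on a fine grid
m = min(f(a + k * (b - a) / 1000) for k in range(1001))
M = max(f(a + k * (b - a) / 1000) for k in range(1001))
assert m <= gamma <= M

# f continuous: bisect for c with f(c) = gamma on [0, pi/2], where sin increases
lo, hi = 0.0, math.pi / 2
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) < gamma else (lo, mid)
c = (lo + hi) / 2
assert abs(f(c) - gamma) < 1e-9   # the mean value is attained at c
```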
O.R. The Integral Mean Value Theorem is quite intuitive: there exists a rectangle with base $[a,b]$ and height $\gamma$, with area equal to the one under f on $[a,b]$:

13 Note that $m(b-a)$ and $M(b-a)$ are the areas of the rectangles of base $b-a$ and height m and M, respectively.
35.7. INTEGRAL CALCULUS 1031
[Figure: a rectangle with base $[a,b]$ whose height is the mean value of f, with area equal to the area under f]
If, moreover, the function f is continuous, the height of such a rectangle coincides with one of the ordinates of f. H
N.B. Given a function $f : [a,b] \to \mathbb{R}$, until now we have considered the definite integral of f from a to b, that is, $\int_a^b f(x)\,dx$. Sometimes it is useful to consider the integral of f from b to a, that is, $\int_b^a f(x)\,dx$, as well as the integral of f from a to a, that is, $\int_a^a f(x)\,dx$.14 What do we mean by such expressions? By convention, we set, for $a < b$,
$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx \tag{35.49}$$
and
$$\int_a^a f(x)\,dx = 0 \tag{35.50}$$
Thanks to these conventions, it is no longer essential that in $\int_a^b$ we have $a < b$: in the case in which $a \ge b$ the integral assumes the meaning given to it by (35.49) and (35.50). Moreover, it is possible to prove that the properties established for the integral $\int_a^b f(x)\,dx$ hold also in the case $a \ge b$. O
In other words, moving from the function f to its primitive P can be seen as the inverse procedure with respect to moving from P to f through differentiation. In this sense, the primitive function is the inverse of the derivative function (indeed, it is sometimes called the antiderivative).

Let us provide a couple of examples. Here it is important to keep in mind that, as Example 1442 will show, a function might not have a primitive, so the search for a primitive of a function might be in vain. In any case, by Corollary 999 a necessary condition for a function f to have a primitive is that it has no removable or jump discontinuities.
Example 1437 Let $f : [0,1] \to \mathbb{R}$ be given by $f(x) = x$. The function $P : [0,1] \to \mathbb{R}$ given by $P(x) = x^2/2$ is a primitive of f. Indeed, $P'(x) = 2x/2 = x$. N
N.B. If $I_1$ and $I_2$ are two nested intervals, with $I_1 \subseteq I_2$, then a primitive of f on $I_2$ is also a primitive on $I_1$. For example, if we consider the restriction of $f(x) = x/(1+x^2)$ on $[0,1]$, that is, the function $\tilde{f} : [0,1] \to \mathbb{R}$ given by $\tilde{f}(x) = x/(1+x^2)$, then the primitive on $[0,1]$ remains $P(x) = \frac{1}{2} \log(1+x^2)$. O
Proof The “if” is obvious. Let us prove the “only if”. Let $I = [a,b]$ and let $P_1, P_2 : [a,b] \to \mathbb{R}$ be two primitive functions of f on $[a,b]$. Since $P_1'(x) = f(x)$ and $P_2'(x) = f(x)$ for every $x \in [a,b]$, we have
$$(P_1 - P_2)'(x) = P_1'(x) - P_2'(x) = 0 \qquad \forall x \in [a,b]$$
Therefore, the function $P_1 - P_2$ has zero derivative on $[a,b]$. The Mean Value Theorem, via Corollary 995, implies that the function $P_1 - P_2$ is constant, that is, there exists $k \in \mathbb{R}$ such that $P_1 = P_2 + k$.
Let now I be an open and bounded interval $(a,b)$. Let $\varepsilon > 0$ be sufficiently small so that $a + \varepsilon < b - \varepsilon$. We have
$$(a,b) = \bigcup_{n=1}^{\infty} \left[ a + \frac{\varepsilon}{n}, b - \frac{\varepsilon}{n} \right]$$
By what has just been proved, for every $n \ge 1$ there exists a constant $k_n \in \mathbb{R}$ such that
$$P_1(x) = P_2(x) + k_n \qquad \forall x \in \left[ a + \frac{\varepsilon}{n}, b - \frac{\varepsilon}{n} \right] \tag{35.51}$$
Let $x_0 \in (a,b)$ be such that $a + \varepsilon < x_0 < b - \varepsilon$, so that $x_0 \in [a + \varepsilon/n, b - \varepsilon/n]$ for every $n \ge 1$. From (35.51) it follows that $P_1(x_0) = P_2(x_0) + k_n$ for every $n \ge 1$. Therefore, $k_n = P_1(x_0) - P_2(x_0)$ for every $n \ge 1$, that is, $k_1 = k_2 = \cdots = k_n = \cdots$. There exists, therefore, $k \in \mathbb{R}$ such that $P_1(x) = P_2(x) + k$ for every $x \in (a,b)$.

In a similar way one can show the result when I is a half-open and bounded interval $(a,b]$ or $[a,b)$. If $I = \mathbb{R}$, we proceed as in the case $(a,b)$, observing that $\mathbb{R} = \bigcup_{n=1}^{\infty} [-n, n]$. A similar argument, which we leave to the reader, holds also for unbounded intervals.
This proposition is another important application of the Mean Value Theorem (of differential calculus). Thanks to it, once a primitive P of a function f is identified, we can write the family of all the primitives as $\{P + k\}_{k \in \mathbb{R}}$. This important family deserves a name.

Definition 1440 Given a function $f : I \to \mathbb{R}$, the family of all its primitives is called the indefinite integral of f and is denoted by
$$\int f(x)\,dx$$
Example 1441 Let us go back to Examples 1437 and 1438. For the function $f : [0,1] \to \mathbb{R}$ given by $f(x) = x$, we have
$$\int f(x)\,dx = \frac{x^2}{2} + k$$
For the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x/(1+x^2)$ we have
$$\int f(x)\,dx = \frac{1}{2} \log(1+x^2) + k$$
N
We close the section by showing that not all functions admit a primitive, and so an indefinite integral.

Example 1442 The signum function $\operatorname{sgn} : \mathbb{R} \to \mathbb{R}$, given by
$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$$
does not admit a primitive. Let us suppose, by contradiction, that there exists a primitive $P : \mathbb{R} \to \mathbb{R}$, i.e., a differentiable function such that $P'(x) = \operatorname{sgn} x$. By Proposition 1439, there exists $k \in \mathbb{R}$ such that
$$P(x) = \begin{cases} x + k & \text{if } x > 0 \\ -x + k & \text{if } x < 0 \end{cases}$$
Since P is differentiable, hence continuous, at $0$, letting $x \to 0$ yields $P(0) = k$, so that $P(x) = |x| + k$. But then the difference quotient $(P(h) - P(0))/h = |h|/h$ has no limit as $h \to 0$, so P is not differentiable at $0$: a contradiction. N
The Riemann integral $\int_a^b f(x)\,dx$ is often called a definite integral to distinguish it from the indefinite integral just introduced. Note that the indefinite integral is a differential calculus notion. The Riemann integral, with its connection with the method of exhaustion, is a conceptually much deeper notion.
35.7.2 Formulary

The next table, obtained by “reversing” the corresponding table of the basic derivatives, records some fundamental indefinite integrals.
$f$ : $\int f(x)\,dx$

$x^a$ : $\dfrac{x^{a+1}}{a+1} + k$  ($-1 \ne a \in \mathbb{R}$ and $x > 0$)

$x^n$ : $\dfrac{x^{n+1}}{n+1} + k$  ($x \in \mathbb{R}$)

$\dfrac{1}{x}$ : $\log x + k$  ($x > 0$)

$\dfrac{1}{x}$ : $\log(-x) + k$  ($x < 0$)

$\cos x$ : $\sin x + k$  ($x \in \mathbb{R}$)

$\sin x$ : $-\cos x + k$  ($x \in \mathbb{R}$)

$e^x$ : $e^x + k$  ($x \in \mathbb{R}$)

$\alpha^x$ : $\dfrac{\alpha^x}{\log \alpha} + k$  ($\alpha > 0$ and $x \in \mathbb{R}$)

$\dfrac{1}{\sqrt{1-x^2}}$ : $\arcsin x + k$  ($x \in (-1,1)$)

$\dfrac{1}{1+x^2}$ : $\arctan x + k$  ($x \in \mathbb{R}$)

$\dfrac{1}{(\cos x)^2}$ : $\tan x + k$  ($x \in \mathbb{R}$)
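Since the table is obtained by reversing the derivative table, each row can be spot-checked by differentiating the listed primitive. A small Python sketch (the finite-difference helper `deriv` and the chosen sample rows are ours):

```python
# Sketch: verify some formulary rows by numerical differentiation of the
# listed primitive; the derivative should reproduce the integrand.
import math

def deriv(F, x, h=1e-6):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

checks = [
    (lambda x: x**3,           lambda x: x**4 / 4),  # x^n -> x^(n+1)/(n+1) + k
    (lambda x: 1 / x,          math.log),            # 1/x -> log x + k (x > 0)
    (math.cos,                 math.sin),            # cos x -> sin x + k
    (lambda x: 1 / (1 + x**2), math.atan),           # 1/(1+x^2) -> arctan x + k
]
for f, F in checks:
    assert abs(deriv(F, 0.7) - f(0.7)) < 1e-5
```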
Proof Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$. If we add and subtract $P(x_i)$ for every $i = 1, 2, \ldots, n-1$, we have
$$P(b) - P(a) = P(x_n) - P(x_{n-1}) + P(x_{n-1}) - \cdots - P(x_1) + P(x_1) - P(x_0) = \sum_{i=1}^n \left( P(x_i) - P(x_{i-1}) \right)$$
Let us consider P on $[x_{i-1}, x_i]$. Since P is differentiable on $(a,b)$ and is continuous on $[a,b]$, by the Mean Value Theorem there exists $\hat{x}_i \in (x_{i-1}, x_i)$ such that
$$P'(\hat{x}_i) = \frac{P(x_i) - P(x_{i-1})}{x_i - x_{i-1}}$$
Since P is a primitive, we have
$$f(\hat{x}_i) = P'(\hat{x}_i) = \frac{P(x_i) - P(x_{i-1})}{x_i - x_{i-1}}$$
and hence
$$P(b) - P(a) = \sum_{i=1}^n \left( P(x_i) - P(x_{i-1}) \right) = \sum_{i=1}^n f(\hat{x}_i)(x_i - x_{i-1}) = \sum_{i=1}^n f(\hat{x}_i)\,\Delta x_i$$
which implies
$$I(f,\pi) \le P(b) - P(a) \le S(f,\pi) \tag{35.53}$$
Since $\pi$ is any subdivision, (35.53) holds for every $\pi \in \Pi$ and therefore
$$\sup_{\pi \in \Pi} I(f,\pi) \le P(b) - P(a) \le \inf_{\pi \in \Pi} S(f,\pi)$$
Since f is integrable, the supremum and the infimum both equal $\int_a^b f(x)\,dx$, so that $P(b) - P(a) = \int_a^b f(x)\,dx$.
Let us illustrate the theorem with some examples, which use again the primitives computed in Examples 1437 and 1438.
For the integrable functions without primitives, such as the signum function, the last theorem cannot be applied and the calculation of integrals cannot be done through formula (35.52). In some simple cases it is, however, possible to calculate the integral using directly the definition. For example, the signum function is a step function and therefore we can apply Proposition 1423 in which, using the definition of the integral, we determined the value of the integral for this class of functions. In particular, we have
$$\int_a^b \operatorname{sgn} x\,dx = \begin{cases} b - a & \text{if } a \ge 0 \\ a + b & \text{if } a < 0 < b \\ a - b & \text{if } b \le 0 \end{cases}$$
The cases $a \ge 0$ and $b \le 0$ are obvious using (35.32). Let us consider the case $a < 0 < b$. Using (35.32) and (35.43), we have
$$\int_a^b \operatorname{sgn} x\,dx = \int_a^0 \operatorname{sgn} x\,dx + \int_0^b \operatorname{sgn} x\,dx = \int_a^0 (-1)\,dx + \int_0^b 1\,dx = (-1)(0-a) + (1)(b-0) = a + b$$
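The three-case formula can be checked directly against Riemann sums, since sgn is a step function. A short Python sketch (the `riemann` and `sgn_integral` helpers are our own, not from the text):

```python
# Sketch: the case formula for the integral of sgn on [a, b], checked against
# a midpoint Riemann sum.
def sgn(x):
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def riemann(f, a, b, n=200000):
    """Midpoint Riemann sum approximating the integral of f on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def sgn_integral(a, b):
    """The three-case closed form for the integral of sgn on [a, b]."""
    if a >= 0:
        return b - a
    if b <= 0:
        return a - b
    return a + b          # the case a < 0 < b

for a, b in [(1.0, 3.0), (-3.0, -1.0), (-1.0, 2.0)]:
    assert abs(riemann(sgn, a, b) - sgn_integral(a, b)) < 1e-3
```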
In other words, the value $F(x)$ of the integral function is the (signed) area under f on the interval $[a,x]$, when x varies.15

N.B. The integral function $F(x) = \int_a^x f(t)\,dt$ is a function $F : [a,b] \to \mathbb{R}$ that has, as variable, the upper limit of integration x which, as it varies, determines a different Riemann integral $\int_a^x f(t)\,dt$. The value of this integral (which is a scalar) is the image $F(x)$ of the integral function. In this regard, note that F is defined on $[a,b]$ since, f being integrable on this interval, it is integrable on all the subintervals $[a,x] \subseteq [a,b]$. O
Proof Since f is bounded, there exists $M > 0$ such that $|f(x)| \le M$ for every $x \in [a,b]$. Let $x, y \in [a,b]$. By the definition of the integral function, we have $F(x) - F(y) = \int_y^x f(t)\,dt$. Thanks to (35.46), we have
$$|F(x) - F(y)| = \left| \int_y^x f(t)\,dt \right| \le \left| \int_y^x |f(t)|\,dt \right| \le \left| \int_y^x M\,dt \right| = M|x - y|$$
Armed with the notion of integral function, we can address the problem that opened the section: the next important result, the Second Fundamental Theorem of Calculus, shows that the integral function is a primitive of a continuous f. Continuity is, therefore, a simple condition that guarantees the existence of a primitive.

for all $a \le y \le x \le b$. In view of (35.54), the fact that the integral function may be a primitive is then not that surprising. Next we give a rigorous argument.
Proof Let $x_0 \in (a,b)$. First of all, let us see which form the difference quotient of F at $x_0$ assumes. Take $h > 0$ such that $x_0 + h \in [a,b]$. By Corollary 1431,
$$F(x_0 + h) - F(x_0) = \int_a^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_a^{x_0} f(t)\,dt + \int_{x_0}^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_{x_0}^{x_0+h} f(t)\,dt$$
Therefore, by the Mean Value Theorem, letting $x_0 + \vartheta h$, $0 \le \vartheta \le 1$, denote a point of the interval $[x_0, x_0+h]$, we have:
$$\frac{F(x_0 + h) - F(x_0)}{h} - f(x_0) = \frac{\int_{x_0}^{x_0+h} f(t)\,dt}{h} - f(x_0) = \frac{h f(x_0 + \vartheta h) - h f(x_0)}{h} = f(x_0 + \vartheta h) - f(x_0) \to 0$$
by the continuity of f.

A similar argument holds if $h < 0$.16 Therefore,
$$F'(x_0) = \lim_{h \to 0} \frac{F(x_0 + h) - F(x_0)}{h} = f(x_0)$$
completing in this way the proof when $x_0 \in (a,b)$. The cases $x_0 = a$ and $x_0 = b$ are proved in a similar way, as the reader can easily verify. We conclude that $F'(x_0)$ exists and is equal to $f(x_0)$.
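The convergence of the difference quotient that drives this proof can be seen numerically. A Python sketch, not from the text (the midpoint-sum helper `riemann` is ours): we build the integral function of a continuous f and watch its difference quotient settle on $f(x_0)$ as h shrinks.

```python
# Sketch of the Second Fundamental Theorem: for continuous f, the difference
# quotient of F(x) = integral of f on [a, x] approaches f(x).
import math

def riemann(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = math.cos
F = lambda x: riemann(f, 0.0, x)   # integral function with a = 0

x0 = 1.0
for h in (1e-2, 1e-3):
    dq = (F(x0 + h) - F(x0)) / h
    assert abs(dq - f(x0)) < 10 * h   # the quotient settles on f(x0)
```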
The Second Fundamental Theorem gives a sufficient condition, continuity, for an integrable function to have a primitive (so, an indefinite integral). More importantly, however, in so doing it shows that differentiation can be seen as the inverse operation of integration: condition (35.54) can, indeed, be written as
$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x) \tag{35.56}$$
On the other hand, a differentiable function $f : [a,b] \to \mathbb{R}$ is, obviously, a primitive of its derivative function $f' : [a,b] \to \mathbb{R}$. By the First Fundamental Theorem of Calculus, if $f'$ is integrable, then $\int_a^b f'(x)\,dx = f(b) - f(a)$.
16 Observe that in this case we have
$$\int_a^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_a^{x_0+h} f(t)\,dt - \left( \int_a^{x_0+h} f(t)\,dt + \int_{x_0+h}^{x_0} f(t)\,dt \right) = -\int_{x_0+h}^{x_0} f(t)\,dt$$
The next example shows that continuity is only a sufficient, but not necessary, condition for an integrable function to admit a primitive.

Indeed, for $x \ne 0$ this can be verified by differentiating $x^2 \sin(1/x)$, while for $x = 0$ one observes that
$$P'(0) = \lim_{h \to 0} \frac{P(h) - P(0)}{h} = \lim_{h \to 0} \frac{h^2 \sin \frac{1}{h}}{h} = \lim_{h \to 0} h \sin \frac{1}{h} = 0 = f(0)$$
So, there exist discontinuous integrable functions that have primitives (for which the First Fundamental Theorem can therefore be applied). N
The signum function, which has no primitive (Example 1442), is an example of a discontinuous function for which the last theorem altogether fails. Next we present another example of such failure, yet more subtle in that it features a differentiable integral function. The function f, a well behaved modification of the Dirichlet function, is continuous at every irrational point and discontinuous at every rational point of the unit interval. By Theorem 1426, f is integrable. In particular, $\int_0^1 f(t)\,dt = 0$. It is a useful (non-trivial) exercise to check all this.

35.8. PROPERTIES OF THE INDEFINITE INTEGRAL

That said, if $F(x) = \int_0^x f(t)\,dt$ for every $x \in [0,1]$, we then have $F(x) = 0$ for every $x \in [0,1]$. Hence, F is trivially differentiable, with $F'(x) = 0$ for every $x \in [0,1]$, but $F' \ne f$ because $F'(x) = f(x)$ if and only if x is irrational. We conclude that (35.54) does not hold, and so the last theorem fails because F is not a primitive of f. Nevertheless, we have $F(x) = \int_0^x F'(t)\,dt$ for every $x \in [0,1]$. N
O.R. The operation of integration makes a function more regular: the integral function F of f is always continuous and, if f is continuous, it is differentiable. In contrast, the operation of differentiation makes a function more irregular. Specifically, integration scales the regularity up by one degree: F is always continuous; if f is continuous, F is differentiable and, continuing in this way, if f is differentiable, F is twice differentiable, and so on and so forth. Differentiation, instead, scales down the regularity of a function. H
(ii) we calculate the difference $P(b) - P(a)$: this difference is often denoted by $P(x)|_a^b$ or $[P(x)]_a^b$.

Next we present some properties of the indefinite integral that simplify its calculation. A first observation is that the linearity of derivatives, established in (20.12), implies the linearity of the indefinite integral.17
Proposition 1451 Let $f, g : I \to \mathbb{R}$ be two functions that admit primitives. Then for every $\alpha, \beta \in \mathbb{R}$, the function $\alpha f + \beta g : I \to \mathbb{R}$ admits a primitive and
$$\int (\alpha f + \beta g)(x)\,dx = \alpha \int f(x)\,dx + \beta \int g(x)\,dx \tag{35.58}$$
A simple application of the result is the calculation of the indefinite integral of a polynomial. Namely, given a polynomial $f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_n x^n$, it follows from (35.58) that
$$\int f(x)\,dx = \int \left( \sum_{i=0}^n \alpha_i x^i \right) dx = \sum_{i=0}^n \alpha_i \int x^i\,dx = \sum_{i=0}^n \alpha_i \frac{x^{i+1}}{i+1} + k$$
17 As in Section 35.7.1, in this section we denote by I a generic interval, bounded or unbounded, of the real line.
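The coefficient-by-coefficient rule for polynomials can be turned into a few lines of code. A Python sketch (the helpers `poly_antiderivative` and `poly_eval` and their coefficient convention are ours, not from the text):

```python
# Sketch: the term-by-term primitive of a polynomial, per the rule derived
# from (35.58); coeffs[i] is the coefficient of x^i.
def poly_antiderivative(coeffs):
    """Return the coefficients of a primitive, with constant term k = 0."""
    return [0.0] + [c / (i + 1) for i, c in enumerate(coeffs)]

def poly_eval(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

P = poly_antiderivative([1.0, 0.0, 3.0])   # f(x) = 1 + 3x^2 -> x + x^3
assert P == [0.0, 1.0, 0.0, 1.0]
# First Fundamental Theorem check: integral of 1 + 3x^2 on [0, 2] is P(2) - P(0) = 10
assert poly_eval(P, 2.0) - poly_eval(P, 0.0) == 10.0
```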
The product rule for differentiation leads to an important formula for the calculation of the indefinite integral, called integration by parts.

Proof By the product rule (20.13), $(fg)' = f'g + fg'$. Hence, fg is a primitive of $f'g + fg'$, and thanks to (35.58) we have
$$f(x)g(x) + k = \int \left( f'(x)g(x) + f(x)g'(x) \right) dx = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx$$
as claimed.

Formula (35.59) is useful because sometimes there is a strong asymmetry in the computability of the indefinite integrals $\int f'(x)g(x)\,dx$ and $\int f(x)g'(x)\,dx$: one of them may be much simpler to calculate than the other one. By exploiting this asymmetry, thanks to (35.59) we may be able to calculate the more complicated integral as the difference between $f(x)g(x)$ and the simpler integral.
Example 1453 Let us calculate the indefinite integral $\int \log x\,dx$. Let $f, g : (0,\infty) \to \mathbb{R}$ be defined by $f(x) = \log x$ and $g(x) = x$, so that $\int \log x\,dx$ can be rewritten as $\int \log x \cdot g'(x)\,dx$. By formula (35.59), we have
$$\int x f'(x)\,dx + \int \log x\,dx = x \log x + k$$
that is,
$$\int x \frac{1}{x}\,dx + \int \log x\,dx = x \log x + k$$
So,
$$\int \log x\,dx = x(\log x - 1) + k$$
N
Example 1454 Let us calculate the indefinite integral $\int x \sin x\,dx$. Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x$ and $g(x) = -\cos x$, so that $\int x \sin x\,dx$ can be rewritten as $\int f(x)g'(x)\,dx$. By formula (35.59),
$$\int f'(x)g(x)\,dx + \int x \sin x\,dx = -x \cos x + k$$
that is,
$$\int x \sin x\,dx = \int \cos x\,dx - x \cos x + k = \sin x - x \cos x + k$$
N
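Both results of Examples 1453 and 1454 can be confirmed by differentiating the primitive found: the derivative must give back the integrand. A Python sketch (the finite-difference helper `deriv` is ours):

```python
# Sketch: check the two integration-by-parts results by numerical
# differentiation of the primitives obtained.
import math

def deriv(F, x, h=1e-6):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Example 1453: the indefinite integral of log x is x(log x - 1) + k
assert abs(deriv(lambda x: x * (math.log(x) - 1), 2.0) - math.log(2.0)) < 1e-6

# Example 1454: the indefinite integral of x sin x is sin x - x cos x + k
assert abs(deriv(lambda x: math.sin(x) - x * math.cos(x), 2.0)
           - 2.0 * math.sin(2.0)) < 1e-6
```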
35.9. CHANGE OF VARIABLE 1043
Note that in the last example, if instead we set $f(x) = \sin x$ and $g(x) = x^2/2$, formula (35.59) becomes useless. Also with such a choice of f and g, it is still possible to rewrite $\int x \sin x\,dx$ as $\int f(x)g'(x)\,dx$. Yet, here (35.59) implies
$$\int f'(x)g(x)\,dx + \int x \sin x\,dx = \frac{x^2}{2} \sin x + k$$
that is,
$$\int x \sin x\,dx = \frac{x^2}{2} \sin x - \frac{1}{2} \int x^2 \cos x\,dx + k$$
which actually complicated things because the integral $\int x^2 \cos x\,dx$ is more difficult to compute compared to the original integral $\int x \sin x\,dx$. This shows that integration by parts cannot proceed in a mechanical way, but requires a bit of imagination and experience.
O.R. Example 1454 shows that to calculate an integral $\int x^n h(x)\,dx$, where h is a function whose primitive has a similar “complexity” (e.g., h is $\sin x$, $\cos x$ or $e^x$), a good choice is to take $f(x) = x^n$ and g with $g'(x) = h(x)$. Indeed, after having differentiated $f(x)$ n times, the polynomial form disappears and one is left with $g(x)$ or $g'(x)$, which is immediately integrable. Such a choice has been used in Example 1454. H
The two factors of the product $f(x)g'(x)\,dx$ are called, respectively, the finite factor, $f(x)$, and the differential factor, $g'(x)\,dx$. So, the formula says that “the integral of the product between the finite factor and a differential factor is equal to the product between the finite factor and the integral of the differential factor, minus the integral of the product between the derivative of the finite factor and the integral just found”. We repeat that it is important to carefully choose which of the two factors to take as finite factor and which as differential factor.
Theorem 1455 Let $\varphi : [c,d] \to [a,b]$ be a differentiable and strictly increasing function such that $\varphi' : [c,d] \to \mathbb{R}$ is integrable. If $f : [a,b] \to \mathbb{R}$ is continuous, then the function $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$ is integrable and
$$\int_c^d f(\varphi(t))\,\varphi'(t)\,dt = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dx \tag{35.61}$$
If $\varphi$ is surjective, we have $a = \varphi(c)$ and $b = \varphi(d)$. Formula (35.61) can therefore be rewritten as
$$\int_a^b f(x)\,dx = \int_c^d f(\varphi(t))\,\varphi'(t)\,dt \tag{35.62}$$
Heuristically, (35.61) can be seen as the result of the change of variable $x = \varphi(t)$ and of the corresponding change
$$dx = \varphi'(t)\,dt = d\varphi(t) \tag{35.63}$$
in dx. At a mnemonic and calculation level, this observation can be useful, even if the writing (35.63) is per se meaningless.
$$(F \circ \varphi)'(t) = F'(\varphi(t))\,\varphi'(t) = (f \circ \varphi)(t)\,\varphi'(t)$$
that is, $F \circ \varphi$ is a primitive of $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$. By Proposition 1421, the composite function $f \circ \varphi : [c,d] \to \mathbb{R}$ is integrable. Since, by hypothesis, $\varphi' : [c,d] \to \mathbb{R}$ is integrable, so is the product function $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$ (recall what we saw at the end of Section 35.6). By the First Fundamental Theorem, we have
$$\int_c^d (f \circ \varphi)(t)\,\varphi'(t)\,dt = (F \circ \varphi)(d) - (F \circ \varphi)(c) \tag{35.65}$$
Since $\varphi$ is bijective (being strictly increasing), we have $\varphi(c) = a$ and $\varphi(d) = b$. Therefore, (35.65) and (35.64) imply
$$\int_c^d (f \circ \varphi)(t)\,\varphi'(t)\,dt = F(\varphi(d)) - F(\varphi(c)) = \int_a^b f(x)\,dx$$
as desired.
Theorem 1455, besides having a theoretical interest, can be useful in the calculation of integrals. Formula (35.61), and its rewriting (35.62), can be used both from “right to left” and from “left to right”. In the first case, from right to left, the objective is to calculate the integral $\int_a^b f(x)\,dx$ by finding a suitable change of variable $x = \varphi(t)$ that leads to an integral $\int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(t))\,\varphi'(t)\,dt$ that is easier to calculate. The difficulty is in finding a suitable change of variable $x = \varphi(t)$: indeed, nothing guarantees that there exists a “simplifying” change and, even if it existed, it might not be obvious how to find it.
On the other hand, the application in the direction left to right of formula (35.61) is useful to calculate an integral that can be written as $\int_c^d f(\varphi(t))\,\varphi'(t)\,dt$ for some function f of which we know the primitive F. In such a case, the corresponding integral $\int_{\varphi(c)}^{\varphi(d)} f(x)\,dx$, obtained by setting $x = \varphi(t)$, is easier to calculate since
$$\int f(\varphi(x))\,\varphi'(x)\,dx = F(\varphi(x)) + k$$
In such a case the difficulty is in recognizing the composite form $\int_c^d f(\varphi(t))\,\varphi'(t)\,dt$ in the integral that we want to calculate. Also here, nothing guarantees that the integral can be rewritten in this form, nor that, when possible, it is easy to recognize. Only experience (and exercise) can be of help. The next example presents some classic integrals that can be calculated with this technique.
(iii) We have
$$\int \sin(\varphi(x))\,\varphi'(x)\,dx = -\cos \varphi(x) + k \qquad \text{and} \qquad \int \cos(\varphi(x))\,\varphi'(x)\,dx = \sin \varphi(x) + k$$
For example,
$$\int \cos(3x^3 - 2x^2)(9x^2 - 4x)\,dx = \sin(3x^3 - 2x^2) + k$$
(iv) We have
$$\int e^{\varphi(x)}\,\varphi'(x)\,dx = e^{\varphi(x)} + k$$
For example,
$$\int x e^{x^2}\,dx = \frac{1}{2} \int 2x e^{x^2}\,dx = \frac{1}{2} e^{x^2} + k$$
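The chain-rule patterns (iii) and (iv) can be spot-checked by differentiating the right-hand sides. A Python sketch (the helper `deriv` and the chosen $\varphi$ are ours):

```python
# Sketch: verify the chain-rule patterns (iii)-(iv) by numerical
# differentiation of the primitives obtained.
import math

def deriv(F, x, h=1e-6):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

phi = lambda x: 3 * x**3 - 2 * x**2
dphi = lambda x: 9 * x**2 - 4 * x

# (iii): d/dx sin(phi(x)) = cos(phi(x)) * phi'(x)
assert abs(deriv(lambda x: math.sin(phi(x)), 0.5)
           - math.cos(phi(0.5)) * dphi(0.5)) < 1e-5

# (iv): the primitive of x * e^{x^2} is e^{x^2}/2 + k
assert abs(deriv(lambda x: math.exp(x * x) / 2, 0.5)
           - 0.5 * math.exp(0.25)) < 1e-6
```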
We present now three examples that illustrate the two possible applications of formula (35.61). The first example considers the case right to left, the second example can be solved both going right to left and left to right, while the last example considers the case left to right. For simplicity we use the variables x and t as they appear in (35.61), even if this is obviously a mere convenience, without substantial value.

and so
$$\int_a^b \sin \sqrt{x}\,dx = 2\left( \sin \sqrt{b} - \sin \sqrt{a} + \sqrt{a} \cos \sqrt{a} - \sqrt{b} \cos \sqrt{b} \right)$$
Note how the starting point has been to set $t = \sqrt{x}$, that is, to specify the inverse function $t = \varphi^{-1}(x) = \sqrt{x}$. This is often the case because it is simpler to think of which transformation of x may simplify the integration. N
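The closed form just obtained can be confirmed against a direct Riemann sum. A Python sketch (the helpers `riemann` and `closed_form` are ours, not from the text):

```python
# Sketch: check the change-of-variable result for the integral of sin(sqrt(x))
# on [a, b] against a midpoint Riemann sum.
import math

def riemann(f, a, b, n=200000):
    """Midpoint Riemann sum approximating the integral of f on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def closed_form(a, b):
    """2(sin sqrt(b) - sin sqrt(a) + sqrt(a) cos sqrt(a) - sqrt(b) cos sqrt(b))."""
    sa, sb = math.sqrt(a), math.sqrt(b)
    return 2 * (math.sin(sb) - math.sin(sa) + sa * math.cos(sa) - sb * math.cos(sb))

a, b = 1.0, 4.0
assert abs(riemann(lambda x: math.sin(math.sqrt(x)), a, b) - closed_form(a, b)) < 1e-6
```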
with $\varphi(x) = 1 + \sin x$ and $a = -3$. Since $\int \varphi(x)^a\,\varphi'(x)\,dx = \frac{\varphi(x)^{a+1}}{a+1} + k$, we have
$$\int_0^{\pi/2} \frac{\cos x}{(1+\sin x)^3}\,dx = \left[ -\frac{1}{2(1+\sin x)^2} \right]_0^{\pi/2} = -\frac{1}{8} + \frac{1}{2} = \frac{3}{8}$$
with $[c,d] \subseteq (0,\infty)$. Here we recognize again a form of type (i) of Example 1456, an integral of the type
$$\int \varphi(t)^a\,\varphi'(t)\,dt$$
with $\varphi(t) = \log t$ and $a = 1$. Since again $\int \varphi(t)^a\,\varphi'(t)\,dt = \frac{\varphi(t)^{a+1}}{a+1} + k$, we have
$$\int_c^d \frac{\log t}{t}\,dt = \left[ \frac{\log^2 t}{2} \right]_c^d = \frac{1}{2} \left( \log^2 d - \log^2 c \right)$$
N
(i) rational if it can be expressed as a ratio of polynomials (Section 10.5.1), that is,
$$f(x) = \frac{a_0 + a_1 x + \cdots + a_n x^n}{b_0 + b_1 x + \cdots + b_m x^m} \tag{35.67}$$
(ii) algebraic if it is defined through finite combinations of the four elementary operations and root extraction.

(vi) the functions obtained through both finite combinations and finite compositions of functions that belong to the previous classes.

The elementary functions that are neither rational nor algebraic are called transcendental. For example, such are the exponential functions, the logarithmic functions, and the trigonometric functions.
The elementary functions can be written in finite terms (that is, in closed form), which gives them simplicity and tractability. However, the relevant question for integral calculus is whether their primitives are themselves elementary functions, so that they keep enjoying the tractability of the original functions. This motivates the following definition:

Definition 1462 An elementary function is said to be integrable in finite terms if its primitive is an elementary function.

In this case, we will also say that f is explicitly integrable or integrable in closed form. For example, $f(x) = 2x$ is explicitly integrable since its primitive $F(x) = x^2$ is an elementary function. Also the functions $f(x) = \sin x$, $f(x) = \cos x$, as well as all the polynomials and the exponential functions $f(x) = e^{kx}$, with $k \in \mathbb{R}$, are explicitly integrable.
Nevertheless, and this is what makes the topic of this section interesting, not all elementary functions are explicitly integrable. The next result reports the remarkable example of the Gaussian function.

Proposition 1463 The elementary functions $e^{-x^2}$ and $e^x/x$ are not integrable in finite terms.

The proof of the proposition is based on results of complex analysis. The non-integrability in finite terms of these functions implies that of other important functions.
Example 1464 The function $1/\log x$ is not integrable in finite terms. Indeed, with the change of variable $x = e^t$, we get $dx = e^t\,dt$ and therefore, by substitution,
$$\int \frac{1}{\log x}\,dx = \int \frac{e^t}{t}\,dt$$

18 Through complex numbers, it is possible to express trigonometric functions as linear combinations of exponential functions, as the reader will learn in more advanced courses.
35.10. CLOSED FORMS 1049
Since $e^x/x$ is not integrable in finite terms, the same holds for $1/\log x$. In particular, the integral function
$$\operatorname{Li}(x) = \int_2^x \frac{1}{\log t}\,dt$$
which plays a key role in the study of prime numbers, is not an elementary function. N
In view of this example, it becomes important to have criteria that guarantee the integrability, or the non-integrability, in finite terms of a given elementary function. For rational functions everything is simple, as the next result shows (we omit its proof).

Proposition 1465 Rational functions are integrable in finite terms. In particular, the primitive of a rational function f(x) is an elementary function given by a linear combination of the following functions:
$$\log(ax^2 + bx + c), \qquad \arctan(dx + k) \qquad \text{and} \qquad r(x)$$
where $a, b, c, d, k \in \mathbb{R}$ and $r(x)$ is a rational function.
Things are more complicated for algebraic and transcendental functions: some of them are integrable in finite terms, others are not. A full analysis of the topic is well beyond the scope of this book.19 We just mention that Liouville proved an important result that establishes a necessary and sufficient condition for the integrability in finite terms of functions of the form $f(x)e^{g(x)}$. Inter alia, this result permits to prove Proposition 1463, that is, the non-integrability in finite terms of the functions $e^{-x^2}$ and $e^x/x$.

This said, in some (lucky) cases the integrability in finite terms of non-rational elementary functions can be reduced, through suitable substitutions, to that of rational functions. This is the case, for example, for functions of the type $r(e^x)$, where $r(\cdot)$ is a rational function. Indeed, by setting $x = \log t$ and by recalling what we saw in Section 35.9 on integration by substitution, we get
$$\int r(e^x)\,dx = \int \frac{r(t)}{t}\,dt$$
Thanks to Proposition 1465, the rational function $r(t)/t$ is integrable in finite terms.
Another example is the transcendental function
$$f(x) = \frac{a \sin \alpha x + b \cos \beta x}{c \sin \gamma x + d \cos \delta x}$$
with $a, b, c, d \in \mathbb{R}$ and $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$. By setting $x = 2 \arctan t$, that is,
$$\tan \frac{x}{2} = t$$
simple trigonometric arguments yield:
$$\sin x = \frac{2t}{1+t^2} \qquad \text{and} \qquad \cos x = \frac{1-t^2}{1+t^2} \tag{35.68}$$
Indeed, we have $\sin x = 2 \sin \frac{x}{2} \cos \frac{x}{2}$ and $\cos x = \cos^2 \frac{x}{2} - \sin^2 \frac{x}{2}$. Since $1 + \tan^2 \frac{x}{2} = \cos^{-2} \frac{x}{2}$, we have
$$\cos \frac{x}{2} = \frac{1}{\sqrt{1 + \tan^2 \frac{x}{2}}}$$
Moreover,
$$\sin \frac{x}{2} = \tan \frac{x}{2} \cos \frac{x}{2} = \frac{\tan \frac{x}{2}}{\sqrt{1 + \tan^2 \frac{x}{2}}}$$
By substituting $\sin \frac{x}{2}$ and $\cos \frac{x}{2}$ in $\sin x$ and $\cos x$, we get (35.68).

With this substitution (here illustrated in the case $\alpha = \beta = \gamma = \delta = 1$) we transform f(x) into the rational function
$$\frac{a \frac{2t}{1+t^2} + b \frac{1-t^2}{1+t^2}}{c \frac{2t}{1+t^2} + d \frac{1-t^2}{1+t^2}}$$
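The half-angle identities (35.68), on which the whole substitution rests, can be verified numerically. A short Python sketch (sample points are our own choice; the identities hold for x in $(-\pi, \pi)$):

```python
# Sketch: check the half-angle substitution identities (35.68),
# sin x = 2t/(1+t^2) and cos x = (1-t^2)/(1+t^2) with t = tan(x/2).
import math

for x in (0.3, 1.0, 2.5):
    t = math.tan(x / 2)
    assert abs(math.sin(x) - 2 * t / (1 + t * t)) < 1e-12
    assert abs(math.cos(x) - (1 - t * t) / (1 + t * t)) < 1e-12
```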
O.R. The question of determining whether or not the indefinite integral of a function belongs to a given class of functions was tackled already by Newton and Leibniz. While Newton, to avoid resorting to transcendental functions, preferred to express the primitive through algebraic functions (also through infinite series of algebraic functions), Leibniz gave priority to formulations in finite terms and considered acceptable also non-algebraic primitives. Leibniz's vision prevailed and in the nineteenth century the problem of integrability in finite terms became an important area of research, with major contributions by Joseph Liouville in the 1830s. H
[Figure: the bell-shaped graph of the Gaussian function $e^{-x^2}$]
seen in Example 1258 and whose area is given by the Gauss integral
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx \tag{35.69}$$
In this case the domain of integration is the whole real line $(-\infty, +\infty)$.

Let us begin with domains of integration of the form $[a, +\infty)$. Given a function $f : [a, +\infty) \to \mathbb{R}$, consider the integral function $F : [a, +\infty) \to \mathbb{R}$ given by
$$F(x) = \int_a^x f(t)\,dt$$
The definition of the improper integral $\int_a^{+\infty} f(x)\,dx$ is based on the limit $\lim_{x \to +\infty} F(x)$, that is, on the asymptotic behavior of the integral function. For such behavior, we can have three cases:

Definition 1468 Let $f : [a, +\infty) \to \mathbb{R}$ be a function integrable on every interval $[a,b] \subseteq [a, +\infty)$ with integral function F. If $\lim_{x \to +\infty} F(x)$ exists, we set
$$\int_a^{+\infty} f(x)\,dx = \lim_{x \to +\infty} F(x)$$
and the function f is said to be integrable in the improper sense on $[a, +\infty)$. The value $\int_a^{+\infty} f(x)\,dx$ is called the improper (or generalized) Riemann integral.
For brevity, in the sequel we will say that a function f is integrable on $[a, +\infty)$, omitting “in an improper sense”. We have the following terminology:

(i) the integral $\int_a^{+\infty} f(x)\,dx$ converges if $\lim_{x \to +\infty} F(x) \in \mathbb{R}$;

(ii) the integral $\int_a^{+\infty} f(x)\,dx$ diverges positively (resp., negatively) if $\lim_{x \to +\infty} F(x) = +\infty$ (resp., $-\infty$);

(iii) finally, if $\lim_{x \to +\infty} F(x)$ does not exist, we say that the integral $\int_a^{+\infty} f(x)\,dx$ does not exist (or that it is oscillating).
Example 1469 Fix $\alpha > 0$ and let $f : [1, +\infty) \to \mathbb{R}$ be given by $f(x) = x^{-\alpha}$. The integral function $F : [1, +\infty) \to \mathbb{R}$ is
$$F(x) = \int_1^x t^{-\alpha}\,dt = \begin{cases} \dfrac{1}{1-\alpha} \left( x^{1-\alpha} - 1 \right) & \text{if } \alpha \ne 1 \\ \log x & \text{if } \alpha = 1 \end{cases}$$
So,
$$\lim_{x \to +\infty} F(x) = \begin{cases} +\infty & \text{if } \alpha \le 1 \\ \dfrac{1}{\alpha - 1} & \text{if } \alpha > 1 \end{cases}$$
It follows that the improper integral
$$\int_1^{+\infty} \frac{1}{x^\alpha}\,dx$$
converges if $\alpha > 1$ and diverges positively if $\alpha \le 1$.
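The two regimes of this example can be watched through the integral function at growing upper limits. A Python sketch (the closed form of F is the one computed in the example; the sample values of $\alpha$ and x are ours):

```python
# Sketch: the two regimes of the improper integral of x^(-alpha) on [1, +inf),
# seen through F(x) = integral of t^(-alpha) on [1, x] at growing x.
import math

def F(alpha, x):
    """Integral function of t**(-alpha) on [1, x], closed form as in the text."""
    if alpha == 1.0:
        return math.log(x)
    return (x**(1.0 - alpha) - 1.0) / (1.0 - alpha)

# alpha = 2 > 1: F(x) converges to 1/(alpha - 1) = 1
assert abs(F(2.0, 1e9) - 1.0) < 1e-8
# alpha = 1: F(x) = log x grows without bound
assert F(1.0, 1e9) > 20.0
```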
Example 1470 A continuous time version of the discrete time intertemporal problem of Section 9.1.2 features an infinitely lived consumer who chooses over consumption streams $f : [0,\infty) \to [0,\infty)$ of a single good. Such streams are evaluated by a continuous time intertemporal utility function $U : A \subseteq \mathbb{R}^{[0,\infty)} \to \mathbb{R}$, often defined by the improper integral
$$U(f) = \int_0^\infty u(f(t))\,e^{-\delta t}\,dt$$
Let us now consider the improper integral on the domain of integration $(-\infty, \infty)$.

Definition 1472 Let $f : \mathbb{R} \to \mathbb{R}$ be a function integrable on every interval $[a,b]$. If there exist the integrals $\int_a^{+\infty} f(x)\,dx$ and $\int_{-\infty}^a f(x)\,dx$, the function f is said to be integrable (in an improper sense) on $\mathbb{R}$ and we set
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_a^{+\infty} f(x)\,dx + \int_{-\infty}^a f(x)\,dx \tag{35.70}$$
provided we do not have an indeterminate form $\infty - \infty$. The value $\int_{-\infty}^{+\infty} f(x)\,dx$ is called the improper (or generalized) Riemann integral of f on $\mathbb{R}$.

It is easy to see that this definition does not depend on the choice of the point $a \in \mathbb{R}$. Often, for convenience, we take $a = 0$. Also the improper integral $\int_{-\infty}^{+\infty} f(x)\,dx$ is called convergent or divergent according to whether its value is finite or is equal to $\pm\infty$.

Next we illustrate this notion with a couple of examples. Note that it is necessary to compute separately the two integrals $\int_a^{+\infty} f(x)\,dx$ and $\int_{-\infty}^a f(x)\,dx$, whose values must then be summed (unless the indeterminate form $\infty - \infty$ arises).
The value of the integral in the previous example is consistent with the geometric interpretation of the integral as the area (with sign) of the region under f. Indeed, such a figure is a big rectangle with infinite base and height k. Its area is $+\infty$ if $k > 0$, zero if $k = 0$, and $-\infty$ if $k < 0$.

$$\int_0^{+\infty} x\,dx + \int_{-\infty}^0 x\,dx = \lim_{x \to +\infty} \frac{x^2}{2} + \lim_{x \to -\infty} \left( -\frac{x^2}{2} \right) = \infty - \infty$$
So, the improper integral
$$\int_{-\infty}^{+\infty} x\,dx$$
does not exist.
Unlike in Example 1473, the value of the integral in this last example is not consistent with the geometric interpretation of the integral. Indeed, look at the following picture:
35.11. IMPROPER INTEGRALS 1055
[Figure: graph of $f(x) = x$, with the region above the axis for $x > 0$ marked $(+)$ and the region below the axis for $x < 0$ marked $(-)$.]
The areas of the two regions under $f$ for $x < 0$ and $x > 0$ are two "big triangles" of infinite base and height. They are intuitively equal because they are perfectly symmetric with respect to the vertical axis, but of opposite sign, as indicated by the signs $(+)$ and $(-)$ in the figure. It is then natural to think that they compensate each other, resulting in an integral equal to $0$. Nevertheless, the definition requires the separate calculation of the two integrals as $x \to +\infty$ and as $x \to -\infty$, which in this case generates the indeterminate form $\infty - \infty$.
It is then natural to ask what happens if we consider the single symmetric limit $\lim_{k\to+\infty}\int_{-k}^{k} f(x)\,dx$ instead of the two separate limits in (35.70). This motivates the following definition.
Definition 1476 Let $f : \mathbb{R} \to \mathbb{R}$ be a function integrable on each interval $[a,b]$. The Cauchy principal value, denoted by $\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx$, of the integral $\int_{-\infty}^{+\infty} f(x)\,dx$ is given by

$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^{k} f(x)\,dx$$
In place of the two limits upon which the definition of the improper integral is based, the principal value considers only the limit of $\int_{-k}^{k} f(x)\,dx$. We will see in examples below that, with this definition, the geometric intuition of the integral as the area (with sign) of the region under $f$ is preserved. It is, however, a weaker notion than the improper integral. Indeed:

(i) when the improper integral exists, the principal value also exists and

$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx$$
(ii) the principal value may exist even when the improper integral does not: in Example 1475 the improper integral $\int_{-\infty}^{+\infty} x\,dx$ does not exist, yet

$$\mathrm{PV}\int_{-\infty}^{+\infty} x\,dx = \lim_{k\to+\infty}\int_{-k}^{k} x\,dx = 0$$

and therefore $\mathrm{PV}\int_{-\infty}^{+\infty} x\,dx$ exists and is finite.
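A numerical sketch of this contrast (not part of the text): midpoint Riemann sums of $f(x) = x$ over the symmetric intervals $[-k, k]$ vanish for every $k$, while the one-sided sums over $[0, k]$ grow like $k^2/2$.

```python
# Midpoint Riemann sum of f(x) = x on [a, b].
def midpoint_sum(a, b, n=100_000):
    h = (b - a) / n
    return sum((a + (i + 0.5) * h) * h for i in range(n))

for k in (10.0, 100.0, 1000.0):
    # first column stays near 0; second column grows like k**2 / 2
    print(midpoint_sum(-k, k), midpoint_sum(0.0, k))
```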
In sum, the principal value may exist even when the improper integral does not. To better illustrate this key relation between the two notions of integral on $(-\infty, +\infty)$, let us consider a more general version of Example 1475.
Example 1477 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x + \beta$, with $\beta \in \mathbb{R}$. The improper integral

$$\int_{-\infty}^{+\infty} (x+\beta)\,dx = \lim_{x\to+\infty}\left(\frac{x^2}{2} + \beta x\right) - \lim_{x\to-\infty}\left(\frac{x^2}{2} + \beta x\right) = \infty - \infty$$

does not exist because we have the indeterminate form $\infty - \infty$. By taking the principal value, we have

$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^{k} (x+\beta)\,dx = \lim_{k\to+\infty}\left(\int_{-k}^{k} x\,dx + 2\beta k\right) = 2\beta\lim_{k\to+\infty} k = \begin{cases} +\infty & \text{if } \beta > 0 \\ 0 & \text{if } \beta = 0 \\ -\infty & \text{if } \beta < 0 \end{cases}$$

So, the principal value exists: $\mathrm{PV}\int_{-\infty}^{+\infty}(x+\beta)\,dx = \pm\infty$, unless $\beta$ is zero, in which case it is $0$. N
In the last example the principal value agrees with the geometric intuition of the integral as area with sign. Indeed, when $\beta = 0$ the intuition is obvious (see the figure and the comment after Example 1475). In the case $\beta > 0$, look at the figure
[Figure: graph of $f(x) = x + \beta$ with $\beta > 0$; the region marked $(-)$ over negative $x$ is matched by a symmetric region marked $(+)$, leaving an infinite dotted region above the axis.]
The negative area of the "big triangle" indicated by $(-)$ in the negative part of the horizontal axis is equal and opposite to the positive area of the big triangle indicated by $(+)$ in the positive part of the horizontal axis. If we imagine that such areas cancel each other, what "is left" is the area of the dotted figure, which is clearly infinite and with $+$ sign (lying above the horizontal axis). For $\beta < 0$ similar considerations hold:
[Figure: graph of $f(x) = x + \beta$ with $\beta < 0$; the symmetric regions marked $(+)$ and $(-)$ cancel, leaving an infinite dotted region below the axis.]
The negative area of the "big triangle" indicated by $(-)$ in the negative part of the horizontal axis is equal and opposite to the positive area of the big triangle indicated by $(+)$ in the positive part of the horizontal axis. If we imagine that such areas cancel each other out, "what is left" is here again the area of the dotted figure, which is clearly infinite and with negative sign (lying below the horizontal axis).
Consider now $f(x) = x/\left(1+x^2\right)$. Since $\int_0^{+\infty} x/(1+x^2)\,dx = +\infty$ and $\int_{-\infty}^{0} x/(1+x^2)\,dx = -\infty$, the improper integral does not exist because we have the indeterminate form $\infty - \infty$. By calculating the principal value, we have instead

$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^{k} \frac{x}{1+x^2}\,dx = \lim_{k\to+\infty}\left[\frac{1}{2}\log\left(1+k^2\right) - \frac{1}{2}\log\left(1+k^2\right)\right] = 0$$

and so

$$\mathrm{PV}\int_{-\infty}^{+\infty} \frac{x}{1+x^2}\,dx = 0$$

N
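The same behavior can be checked numerically (a sketch, not part of the text): symmetric midpoint sums of $x/(1+x^2)$ vanish, while each one-sided piece grows like $\tfrac{1}{2}\log\left(1+k^2\right)$.

```python
import math

# Midpoint Riemann sum of f on [a, b].
def midpoint_sum(f, a, b, n=200_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

f = lambda x: x / (1 + x * x)
k = 50.0
print(midpoint_sum(f, -k, k))                               # near 0 (odd integrand)
print(midpoint_sum(f, 0.0, k), 0.5 * math.log(1 + k * k))   # one-sided piece
```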
Properties
Being defined as limits, improper integrals inherit the properties of limits of functions (Section 11.4). In particular, the improper integral retains the linearity and monotonicity properties of the Riemann integral.

Let us begin with linearity, which follows from the algebra of limits established in Proposition 459.
Proposition 1479 Let $f, g : [a, +\infty) \to \mathbb{R}$ be two functions integrable on $[a, +\infty)$. Then, for every $\alpha, \beta \in \mathbb{R}$, the function $\alpha f + \beta g : [a, +\infty) \to \mathbb{R}$ is integrable on $[a, +\infty)$ and

$$\int_a^{+\infty} (\alpha f + \beta g)(x)\,dx = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx \qquad (35.71)$$

Proof By the linearity of the Riemann integral, and by points (i) and (ii) of Proposition 459, we have

$$\lim_{x\to+\infty}\int_a^x (\alpha f + \beta g)(t)\,dt = \lim_{x\to+\infty}\left(\alpha F(x) + \beta G(x)\right) = \alpha\lim_{x\to+\infty} F(x) + \beta\lim_{x\to+\infty} G(x) = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx$$

$\blacksquare$
The monotonicity property of limits of functions (see Proposition 458 and its scalar variant) yields the monotonicity of the improper integral.

Proposition 1480 Let $f, g : [a, +\infty) \to \mathbb{R}$ be two functions integrable on $[a, +\infty)$. If $f \le g$, then $\int_a^{+\infty} f(x)\,dx \le \int_a^{+\infty} g(x)\,dx$.

Proof Thanks to the monotonicity of the Riemann integral, $F(x) \le G(x)$ for every $x \in [a, +\infty)$. By the monotonicity of limits of functions, we therefore have $\lim_{x\to+\infty} F(x) \le \lim_{x\to+\infty} G(x)$. $\blacksquare$
As we have seen in Example 1473, $\int_a^{+\infty} 0\,dx = 0$. So, a simple consequence of Proposition 1480 is that $\int_a^{+\infty} f(x)\,dx \ge 0$ whenever $f$ is positive and integrable on $[a, +\infty)$.
Integrability criteria
We now give some integrability criteria, limiting ourselves for simplicity to positive functions $f : [a, +\infty) \to \mathbb{R}$. In this case, the integral function $F : [a, +\infty) \to \mathbb{R}$ is increasing. Indeed, for every $x_2 \ge x_1 \ge a$,

$$F(x_2) = \int_a^{x_2} f(t)\,dt = \int_a^{x_1} f(t)\,dt + \int_{x_1}^{x_2} f(t)\,dt \ge \int_a^{x_1} f(t)\,dt = F(x_1)$$

since $\int_{x_1}^{x_2} f(t)\,dt \ge 0$. Thanks to the monotonicity of the integral function, we have the following characterization of improper integrals of positive functions.

Proposition 1481 Let $f : [a, +\infty) \to \mathbb{R}$ be a function positive and integrable on every interval $[a,b] \subseteq [a, +\infty)$. Then, $f$ is integrable on $[a, +\infty)$ and

$$\int_a^{+\infty} f(t)\,dt = \sup_{x\in[a,+\infty)} F(x) \qquad (35.72)$$

In particular, $\int_a^{+\infty} f(t)\,dt$ converges only if $\lim_{x\to+\infty} f(x) = 0$ (provided this limit exists).
The condition $\lim_{x\to+\infty} f(x) = 0$ is only necessary for convergence, as Example 1469 with $0 < \alpha \le 1$ shows. For instance, if $\alpha = 1$ we have $\lim_{x\to+\infty} 1/x = 0$, but for every $a > 0$ we have

$$\int_a^{+\infty} \frac{1}{t}\,dt = \lim_{x\to+\infty}\int_a^{x} \frac{1}{t}\,dt = \lim_{x\to+\infty}\log\frac{x}{a} = +\infty$$

and therefore $\int_a^{+\infty} (1/t)\,dt$ diverges positively.
In stating the necessary condition $\lim_{x\to+\infty} f(x) = 0$ we added the clause "provided this limit exists". The next simple example shows that the clause is important because the limit may not exist even if the integral $\int_a^{+\infty} f(t)\,dt$ converges.
Example 1482 Let $f : [0, \infty) \to \mathbb{R}$ be given by

$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{N} \\ 0 & \text{otherwise} \end{cases}$$

By Proposition 1420, it is easy to see that $\int_0^x f(t)\,dt = 0$ for every $x > 0$ and, therefore, $\int_0^{+\infty} f(x)\,dx = 0$. Nevertheless, $\lim_{x\to+\infty} f(x)$ does not exist. N
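A computational sketch of this example (not part of the text): with $a = 0$, the midpoints of a uniform subdivision are the numbers $(2i+1)h/2$, never integers, so every midpoint Riemann sum is exactly $0$, in line with $\int_0^x f(t)\,dt = 0$, even though $f$ keeps returning to $1$.

```python
def f(x):
    # 1 on the natural numbers, 0 elsewhere
    return 1.0 if float(x).is_integer() else 0.0

def midpoint_sum(a, b, n=10_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

print(f(7.0), f(7.5))            # 1.0 0.0
print(midpoint_sum(0.0, 100.0))  # 0.0: no midpoint (2i+1)h/2 is an integer
```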
The proof of Proposition 1481 rests on the following simple property of limits of monotonic functions, which is the version for functions of Theorem 299 for monotonic sequences.

Lemma 1483 Let $\varphi : [a, +\infty) \to \mathbb{R}$ be an increasing function. Then, $\lim_{x\to+\infty} \varphi(x) = \sup_{x\in[a,+\infty)} \varphi(x)$.

Proof Consider first the case $\sup_{x\in[a,+\infty)} \varphi(x) \in \mathbb{R}$. Let $\varepsilon > 0$. Since $\sup_{x\in[a,+\infty)} \varphi(x) = \sup \varphi([a,+\infty))$, thanks to Proposition 120 there exists $x_\varepsilon \in [a, +\infty)$ such that $\varphi(x_\varepsilon) > \sup_{x\in[a,+\infty)} \varphi(x) - \varepsilon$. Since $\varphi$ is increasing, we have

$$\sup_{x\in[a,+\infty)} \varphi(x) - \varepsilon < \varphi(x_\varepsilon) \le \varphi(x) \le \sup_{x\in[a,+\infty)} \varphi(x) \qquad \forall x \ge x_\varepsilon$$

so that $\lim_{x\to+\infty} \varphi(x) = \sup_{x\in[a,+\infty)} \varphi(x)$. The case $\sup_{x\in[a,+\infty)} \varphi(x) = +\infty$ is treated similarly. $\blacksquare$
Proof of Proposition 1481 Since $f$ is positive, its integral function $F : [a, +\infty) \to \mathbb{R}$ is increasing and therefore, by Lemma 1483,

$$\lim_{x\to+\infty} F(x) = \sup_{x\in[a,+\infty)} F(x)$$

which proves (35.72). Suppose that $\lim_{x\to+\infty} f(x)$ exists. Let us show that the integral converges only if $\lim_{x\to+\infty} f(x) = 0$. Suppose, by contradiction, that $\lim_{x\to+\infty} f(x) = L \in (0, +\infty]$. Given $0 < \varepsilon < L$, there exists $x_\varepsilon > a$ such that $f(x) \ge L - \varepsilon > 0$ for every $x \ge x_\varepsilon$. Therefore

$$\int_a^{+\infty} f(t)\,dt = \int_a^{x_\varepsilon} f(t)\,dt + \int_{x_\varepsilon}^{+\infty} f(t)\,dt \ge \int_{x_\varepsilon}^{+\infty} f(t)\,dt = \lim_{x\to+\infty}\int_{x_\varepsilon}^{x} f(t)\,dt \ge \lim_{x\to+\infty}\int_{x_\varepsilon}^{x} (L-\varepsilon)\,dt = (L-\varepsilon)\lim_{x\to+\infty}(x - x_\varepsilon) = +\infty$$

i.e., $\int_a^{+\infty} f(t)\,dt$ diverges positively. $\blacksquare$
The next result is a simple comparison criterion to determine whether the improper integral of a positive function is convergent or divergent.

Corollary 1484 Let $f, g : [a, +\infty) \to \mathbb{R}$ be two positive functions integrable on every $[a,b] \subseteq [a, +\infty)$, with $f \le g$. Then

$$\int_a^{+\infty} g(x)\,dx \in [0, \infty) \implies \int_a^{+\infty} f(x)\,dx \in [0, \infty) \qquad (35.73)$$

and

$$\int_a^{+\infty} f(x)\,dx = +\infty \implies \int_a^{+\infty} g(x)\,dx = +\infty \qquad (35.74)$$

The study of the integral (35.69) of the Gaussian function $f(x) = e^{-x^2}$, to which we will devote the next section, is a remarkable application of this corollary.

Proof By Proposition 1480, $\int_a^{+\infty} f(x)\,dx \le \int_a^{+\infty} g(x)\,dx$, while thanks to Proposition 1481 we have $\int_a^{+\infty} f(x)\,dx \in [0, +\infty]$ and $\int_a^{+\infty} g(x)\,dx \in [0, +\infty]$. Therefore, $\int_a^{+\infty} f(x)\,dx$ converges if $\int_a^{+\infty} g(x)\,dx$ converges, while $\int_a^{+\infty} g(x)\,dx$ diverges positively if $\int_a^{+\infty} f(x)\,dx$ diverges positively. $\blacksquare$
Proposition 1485 Let $f, g : [a, +\infty) \to \mathbb{R}$ be positive functions integrable on every interval $[a,b] \subseteq [a, +\infty)$.

(i) If $f \sim g$ as $x \to +\infty$, then $\int_a^{+\infty} g(x)\,dx$ converges (diverges positively) if and only if $\int_a^{+\infty} f(x)\,dx$ converges (diverges positively).

(ii) If $f = o(g)$ as $x \to +\infty$ and $\int_a^{+\infty} g(x)\,dx$ converges, then so does $\int_a^{+\infty} f(x)\,dx$.

(iii) If $f = o(g)$ as $x \to +\infty$ and $\int_a^{+\infty} f(x)\,dx$ diverges positively, then so does $\int_a^{+\infty} g(x)\,dx$.

In light of Example 1469, Proposition 1485 implies that $\int_a^{+\infty} f(x)\,dx$ converges if there exists $\alpha > 1$ such that

$$f \sim \frac{1}{x^{\alpha}} \quad \text{or} \quad f = o\!\left(\frac{1}{x^{\alpha}}\right) \quad \text{as } x \to +\infty$$
The comparison with the powers $x^{-\alpha}$ is an important convergence criterion for improper integrals, as the next two examples show.

As $x \to +\infty$, we have

$$f \sim \frac{1}{x}$$

By Proposition 1485, $\int_0^{+\infty} f(x)\,dx = +\infty$, i.e., the integral diverges positively. N
N.B. As the reader can check, what has been proved for positive functions extends easily to functions $f : [a, +\infty) \to \mathbb{R}$ that are eventually positive, that is, such that there exists $c > a$ for which $f(x) \ge 0$ for every $x \ge c$. O
If $x > 0$, we have

$$f(x) \le g(x) \iff e^{-x^2} \le e^{-x} \iff -x^2 \le -x \iff x \ge 1$$

where $g(x) = e^{-x}$. By (35.73) of Corollary 1484, if $\int_1^{+\infty} g(x)\,dx$ converges, then $\int_1^{+\infty} f(x)\,dx$ also converges. In turn, this implies that $\int_a^{+\infty} f(x)\,dx$ converges for every $a \in \mathbb{R}$. This is obvious if $a \ge 1$. If $a < 1$, we have

$$\int_a^{+\infty} f(x)\,dx = \int_a^{1} f(x)\,dx + \int_1^{+\infty} f(x)\,dx$$

Since $\int_a^{1} f(x)\,dx$ exists because of the continuity of $f$ on $[a,1]$, the convergence of $\int_1^{+\infty} f(x)\,dx$ then implies that of $\int_a^{+\infty} f(x)\,dx$.

Thus, it remains to show that $\int_1^{+\infty} g(x)\,dx$ converges. We have

$$G(x) = \int_1^x g(t)\,dt = e^{-1} - e^{-x}$$

so that $\lim_{x\to+\infty} G(x) = e^{-1}$, i.e., $\int_1^{+\infty} e^{-x}\,dx$ converges.
The equality between integrals (35.75) and (35.76) is quite intuitive in light of the symmetry
of the Gaussian bell with respect to the vertical axis.
Thanks to Definition 1472, the Gauss integral, i.e., the integral of the Gaussian function, therefore has value

$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \int_0^{+\infty} e^{-x^2}\,dx + \int_{-\infty}^{0} e^{-x^2}\,dx = \sqrt{\pi} \qquad (35.77)$$

The Gauss integral is central in probability theory, where it is usually presented in the form

$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx$$

By proceeding by substitution, it is easy to verify that, for every scalar $a \in \mathbb{R}$ and every $b > 0$, one has

$$\int_{-\infty}^{+\infty} e^{-\frac{(x+a)^2}{b^2}}\,dx = b\sqrt{\pi} \qquad (35.78)$$

By setting $b = \sqrt{2}$ and $a = 0$, we then have

$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = 1$$

The integral of the function

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$$

therefore has unit value and, thus, $f$ is a density function (as will be seen in Section 38.1). This explains the importance of this particular form of the Gaussian function.
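A numerical check of these two facts (not part of the text); truncating at $|x| = 10$ is harmless, since the Gaussian tail beyond that point is below $e^{-100}$:

```python
import math

# Midpoint Riemann sum of f on [a, b].
def midpoint_sum(f, a, b, n=200_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

gauss = midpoint_sum(lambda x: math.exp(-x * x), -10.0, 10.0)
normal = midpoint_sum(lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi),
                      -10.0, 10.0)
print(gauss, math.sqrt(math.pi))  # both close to 1.77245
print(normal)                     # close to 1.0
```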
Definition 1489 Let $f : [a, b) \to \mathbb{R}$ be a continuous function such that $\lim_{x\to b^-} f(x) = \pm\infty$. If the limit

$$\lim_{z\to b^-}\int_a^z f(x)\,dx = \lim_{z\to b^-}\left[F(z) - F(a)\right]$$

exists, finite or infinite, its value is called the improper (or generalized) Riemann integral of $f$ on $[a, b)$.

If the unboundedness of the function concerns the other endpoint $a$, or both endpoints, we can give a similar definition. If the unboundedness concerns an interior point $c \in (a, b)$, it is enough to consider separately the two intervals $[a, c]$ and $[c, b]$.
For instance, fix $\alpha > 0$ and consider $f(x) = 1/(b-x)^{\alpha}$ on $[a, b)$. For $\alpha \neq 1$ the integral function is

$$F(x) = \int_a^x \frac{1}{(b-t)^{\alpha}}\,dt = \frac{(b-a)^{1-\alpha} - (b-x)^{1-\alpha}}{1-\alpha}$$

while $F(x) = \log\dfrac{b-a}{b-x}$ for $\alpha = 1$. So,

$$\lim_{x\to b^-} F(x) = \begin{cases} \dfrac{(b-a)^{1-\alpha}}{1-\alpha} & \text{if } 0 < \alpha < 1 \\[6pt] +\infty & \text{if } \alpha \ge 1 \end{cases}$$

It follows that the improper integral

$$\int_a^b \frac{1}{(b-x)^{\alpha}}\,dx$$

exists for every $\alpha > 0$: it converges if $0 < \alpha < 1$ and diverges positively if $\alpha \ge 1$. N
Proposition 1485 also holds for these improper integrals and allows us to state that $\int_a^b f(x)\,dx$ converges if there exists $\alpha < 1$ such that

$$f \sim \frac{1}{(b-x)^{\alpha}} \quad \text{or} \quad f = o\!\left(\frac{1}{(b-x)^{\alpha}}\right) \quad \text{as } x \to b^-$$
O.R. When the interval is unbounded, for the improper integral to converge the function must tend to zero quite rapidly (as $x^{-\alpha}$ with $\alpha > 1$). When the function is unbounded, for the improper integral to converge the function must tend to infinity slowly enough, as $(b-x)^{-\alpha}$ with $\alpha < 1$. Both things are quite intuitive: for the area of an unbounded region to be finite, its portion "that escapes to infinity" must be very narrow.

For example, the function $f : (0, +\infty) \to \mathbb{R}_+$ defined by $f(x) = 1/x$ is integrable neither on intervals of the type $[a, +\infty)$, with $a > 0$, nor on intervals of the type $(0, a]$: indeed the integral function of $f$ is $F(x) = \log x$, which diverges as $x \to +\infty$ as well as as $x \to 0^+$. The functions (asymptotic to) $1/x^{1+\varepsilon}$, with $\varepsilon > 0$, are integrable on intervals of the type $[a, +\infty)$, $a > 0$, while the functions (asymptotic to) $1/x^{1-\varepsilon}$ are integrable on intervals of the type $(0, a]$. H
Chapter 36
Parameter-dependent integrals
Consider a function of two variables

$$f : [a,b] \times [c,d] \to \mathbb{R}$$

defined on a rectangle $[a,b] \times [c,d]$ in $\mathbb{R}^2$. If for every $y \in [c,d]$ the scalar function $f(\cdot, y) : [a,b] \to \mathbb{R}$ is integrable on $[a,b]$, then to every such $y$ we can associate the scalar

$$\int_a^b f(x,y)\,dx \qquad (36.1)$$

Unlike the integrals seen so far, the value of the definite integral (36.1) depends on the value of the variable $y$, which is usually interpreted as a parameter. Such an integral, referred to as a parameter-dependent integral, therefore defines a scalar function $F : [c,d] \to \mathbb{R}$ in the following way:

$$F(y) = \int_a^b f(x,y)\,dx \qquad (36.2)$$

Note that, although $f$ is a function of two variables, the function $F$ is scalar. Indeed, it does not depend in any way on the variable $x$, which here plays the role of a dummy variable of integration.

Functions of type (36.2) appear in applications more frequently than one may initially think. Therefore, it is important to have the appropriate tools to study them.
36.1 Properties
We will study two properties of the function $F$, namely continuity and differentiability. Let us start with continuity.
Formula (36.3) is referred to as the "passage of the limit under the integral sign".

Proof Take $\varepsilon > 0$. We must show that there exists a $\delta > 0$ such that $|y - y_0| < \delta$ implies $|F(y) - F(y_0)| < \varepsilon$. By hypothesis, $f$ is continuous on the compact set $[a,b] \times [c,d]$. By Theorem 526, it is therefore uniformly continuous on $[a,b] \times [c,d]$, so there is a $\delta > 0$ such that

$$\|(x,y) - (x_0,y_0)\| < \delta \implies |f(x,y) - f(x_0,y_0)| < \frac{\varepsilon}{b-a} \qquad (36.4)$$

for every $(x,y), (x_0,y_0) \in [a,b] \times [c,d]$. Therefore, for every $y \in [c,d] \cap (y_0 - \delta, y_0 + \delta)$ we have

$$|F(y) - F(y_0)| \le \int_a^b |f(x,y) - f(x,y_0)|\,dx < \frac{\varepsilon}{b-a}\,(b-a) = \varepsilon$$

as desired. $\blacksquare$
Proposition 1492 Suppose that $f : [a,b] \times [c,d] \to \mathbb{R}$ and its partial derivative $\partial f/\partial y$ are both continuous on $[a,b] \times [c,d]$. Then, the function $F : [c,d] \to \mathbb{R}$ is differentiable on $(c,d)$, with

$$F'(y) = \int_a^b \frac{\partial}{\partial y} f(x,y)\,dx \qquad (36.5)$$

Since

$$F'(y) = \lim_{h\to 0} \frac{F(y+h) - F(y)}{h} = \lim_{h\to 0} \int_a^b \frac{f(x, y+h) - f(x,y)}{h}\,dx$$

and

$$\int_a^b \frac{\partial}{\partial y} f(x,y)\,dx = \int_a^b \lim_{h\to 0} \frac{f(x, y+h) - f(x,y)}{h}\,dx$$

formula (36.5) is then equivalent to

$$\lim_{h\to 0} \int_a^b \frac{f(x, y+h) - f(x,y)}{h}\,dx = \int_a^b \lim_{h\to 0} \frac{f(x, y+h) - f(x,y)}{h}\,dx$$

that is, to an exchange of limit and integral.
Proof Let $y_0 \in (c,d)$. For every $x \in [a,b]$ the function $f(x, \cdot) : [c,d] \to \mathbb{R}$ is by hypothesis differentiable. By the Mean Value Theorem, there then exists $\theta_x \in (0,1)$ such that

$$\frac{f(x, y_0 + h) - f(x, y_0)}{h} = \frac{\partial f}{\partial y}\left(x, y_0 + \theta_x h\right)$$

Note that $\theta_x$ depends on $x$. Let us write the difference quotient of the function $F$ at $y_0$:

$$\left|\frac{F(y_0 + h) - F(y_0)}{h} - \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx\right| = \left|\int_a^b \frac{f(x, y_0 + h) - f(x, y_0)}{h}\,dx - \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx\right| \qquad (36.6)$$

$$= \left|\int_a^b \left[\frac{\partial f}{\partial y}\left(x, y_0 + \theta_x h\right) - \frac{\partial f}{\partial y}(x, y_0)\right]dx\right| \le \int_a^b \left|\frac{\partial f}{\partial y}\left(x, y_0 + \theta_x h\right) - \frac{\partial f}{\partial y}(x, y_0)\right|dx$$

The partial derivative $\partial f/\partial y$ is continuous on the compact set $[a,b] \times [c,d]$, so it is also uniformly continuous. Thus, given any $\varepsilon > 0$, there exists a $\delta > 0$ such that

$$\|(x,y) - (x,y_0)\| < \delta \implies \left|\frac{\partial f}{\partial y}(x,y) - \frac{\partial f}{\partial y}(x,y_0)\right| < \frac{\varepsilon}{b-a} \qquad (36.7)$$

for every $x \in [a,b]$ and every $y \in [c,d]$. Therefore, for $|h| < \delta$ we have

$$\left|\frac{F(y_0 + h) - F(y_0)}{h} - \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx\right| < \varepsilon \qquad \forall\,|h| < \delta$$

that is,

$$\int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx - \varepsilon < \frac{F(y_0 + h) - F(y_0)}{h} < \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx + \varepsilon \qquad \forall\,|h| < \delta$$

Since the above holds for every $\varepsilon > 0$, it follows that

$$\lim_{h\to 0}\frac{F(y_0 + h) - F(y_0)}{h} = \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx$$

as desired. $\blacksquare$
As the hypotheses of Proposition 1492 are satisfied, we differentiate under the integral sign:

$$F'(y) = \int_a^b 2xy\,dx = 2y\,\frac{b^2 - a^2}{2} = y\left(b^2 - a^2\right)$$

N
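A numerical cross-check (not part of the text), assuming, as the displayed derivative $2xy$ suggests, that the example's integrand is $f(x,y) = xy^2$: a central finite difference of $F$ should match $F'(y) = y\left(b^2 - a^2\right)$.

```python
# Hypothetical reconstruction of the example's integrand: f(x, y) = x * y**2
# on [a, b] = [1, 3]. F(y) is computed with a midpoint Riemann sum.
def F(y, a=1.0, b=3.0, n=20_000):
    h = (b - a) / n
    return sum((a + (i + 0.5) * h) * y * y * h for i in range(n))

a, b, y, eps = 1.0, 3.0, 2.0, 1e-5
finite_diff = (F(y + eps) - F(y - eps)) / (2 * eps)
print(finite_diff, y * (b * b - a * a))  # both close to 16
```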
The following result extends Proposition 1492 to the case of variable limits of integration.

Proposition 1494 Suppose that $f : [a,b] \times [c,d] \subseteq \mathbb{R}^2 \to \mathbb{R}$ and its partial derivative $\partial f/\partial y$ are both continuous on $[a,b] \times [c,d]$. If $\alpha, \beta : [c,d] \to [a,b]$ are differentiable, then the function $G : [c,d] \to \mathbb{R}$ defined by $G(y) = \int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx$ is differentiable on $(c,d)$, with

$$G'(y) = \int_{\alpha(y)}^{\beta(y)} \frac{\partial f}{\partial y}(x,y)\,dx + \beta'(y)\,f(\beta(y), y) - \alpha'(y)\,f(\alpha(y), y) \qquad (36.9)$$

Define $H(v, z, y) = \int_v^z f(x,y)\,dx$. Since

$$G(y) = H(\alpha(y), \beta(y), y)$$

the derivative of $G$ with respect to $y$ at a point $y_0 \in (c,d)$ can be calculated via the chain rule:

$$G'(y_0) = \frac{\partial H}{\partial v}(a_0, b_0, y_0)\,\alpha'(y_0) + \frac{\partial H}{\partial z}(a_0, b_0, y_0)\,\beta'(y_0) + \frac{\partial H}{\partial y}(a_0, b_0, y_0) \qquad (36.10)$$

where $a_0 = \alpha(y_0)$ and $b_0 = \beta(y_0)$. By Proposition 1492, we have

$$\frac{\partial H}{\partial y}(a_0, b_0, y_0) = \int_{a_0}^{b_0} \frac{\partial}{\partial y} f(x, y_0)\,dx \qquad (36.11)$$
Example 1495 Let $f(x,y) = x^2 + y^2$, $\alpha(y) = \sin y$ and $\beta(y) = \cos y$. Set

$$G(y) = \int_{\sin y}^{\cos y} \left(x^2 + y^2\right) dx$$
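A numerical sketch of formula (36.9) for Example 1495 (not part of the text): the closed-form right-hand side of (36.9) should match a central finite difference of $G$.

```python
import math

# G(y) for f(x, y) = x**2 + y**2, alpha(y) = sin y, beta(y) = cos y,
# computed with a midpoint Riemann sum.
def G(y, n=20_000):
    a, b = math.sin(y), math.cos(y)
    h = (b - a) / n
    return sum(((a + (i + 0.5) * h) ** 2 + y * y) * h for i in range(n))

def G_prime(y):
    a, b = math.sin(y), math.cos(y)
    # integral term of (36.9): the inner derivative is ∂f/∂y = 2y
    # boundary terms: beta'(y) f(beta(y), y) - alpha'(y) f(alpha(y), y)
    return (2 * y * (b - a)
            - math.sin(y) * (b * b + y * y)
            - math.cos(y) * (a * a + y * y))

y, eps = 0.7, 1e-4
print((G(y + eps) - G(y - eps)) / (2 * eps), G_prime(y))  # nearly equal
```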
The proof of this result is not simple, so we omit it. Note that the dominance condition (36.14), which is based on the auxiliary function $g$, guarantees inter alia that the integral $\int_{-\infty}^{+\infty} f(x,y)\,dx$ converges (thanks to the comparison convergence criterion stated in Corollary 1484).

Consider, for instance, $f(x,y) = \sin x\,e^{-y^2x^2}$ on a rectangle with $c \ge 1$ or $d \le -1$, so that $y^2 \ge 1$ for every $y \in [c,d]$. Let $g$ be the Gaussian function, that is, $g(x) = e^{-x^2}$. For every $y \in [c,d]$, we have

$$\left|\sin x\,e^{-y^2x^2}\right| = |\sin x|\,e^{-y^2x^2} \le e^{-y^2x^2} \le e^{-x^2} = g(x)$$

Moreover, $\int_{-\infty}^{+\infty} e^{-x^2}\,dx < +\infty$. The hypotheses of Proposition 1496 are satisfied, so formula (36.15) takes the form

$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y}\left(\sin x\,e^{-y^2x^2}\right)dx = -2y\int_{-\infty}^{+\infty} x^2 \sin x\,e^{-y^2x^2}\,dx$$
Chapter 37
Stieltjes' integral
Stieltjes integration is based on sums of the form

$$\sum_{k=1}^{n} m_k\left(g(x_k) - g(x_{k-1})\right) \quad \text{and} \quad \sum_{k=1}^{n} M_k\left(g(x_k) - g(x_{k-1})\right) \qquad (37.2)$$

where $g$ is a scalar function. Clearly, (37.1) is the special case of (37.2) that corresponds to the identity function $g(x) = x$.
But why are the more general sums (37.2) relevant? Recall that the sums (37.1) arise in Riemann integration because every interval $[x_{i-1}, x_i]$ obtained by subdividing $[a,b]$ is measured according to its length $\Delta x_i = x_i - x_{i-1}$. Clearly, length is a most natural way to measure an interval. However, it is not the only way: in some problems it might be more natural to measure an interval differently. For example, if $[x_{i-1}, x_i]$ represents levels of production between $x_{i-1}$ and $x_i$, the most appropriate economic measure for such an interval may be the additional cost that a higher production level entails: if $C(x)$ is the total cost of producing $x$, the measure that must be assigned to $[x_{i-1}, x_i]$ is then the difference $C(x_i) - C(x_{i-1})$. If $[x_{i-1}, x_i]$ represents, instead, an interval in which a random variable may take values and $F(x)$ is the probability that such a value is at most $x$, then the most natural way to measure $[x_{i-1}, x_i]$ is the difference $F(x_i) - F(x_{i-1})$. In such cases, which are quite common in economic applications (see, e.g., Section 37.8), the Stieltjes integral is the natural notion of integral to use.

Besides its interest for applications, however, Stieltjes integration also sheds further light on Riemann integration. Indeed, we will see in this chapter that some results that we established for Riemann integrals are actually best understood in terms of the more general Stieltjes integral.
37.1 Definition
Consider two functions $f, g : [a,b] \subseteq \mathbb{R} \to \mathbb{R}$, with $f$ bounded and $g$ increasing.¹ For every subdivision $\pi = \{a = x_0, x_1, \ldots, x_n = b\}$ of $[a,b]$ and for every interval $I_i = [x_{i-1}, x_i]$, we can define the quantities

$$m_i = \inf_{x\in I_i} f(x) \quad \text{and} \quad M_i = \sup_{x\in I_i} f(x)$$

The sum

$$I(\pi, f, g) = \sum_{i=1}^{n} m_i\left(g(x_i) - g(x_{i-1})\right)$$

is referred to as the lower Stieltjes sum, while

$$S(\pi, f, g) = \sum_{i=1}^{n} M_i\left(g(x_i) - g(x_{i-1})\right)$$

is referred to as the upper Stieltjes sum. It can be easily shown that, for every subdivision $\pi$ of $[a,b]$, we have

$$I(\pi, f, g) \le S(\pi, f, g)$$

The function $f$ is said to be Stieltjes integrable with respect to $g$ when

$$\sup_{\pi\in\Pi([a,b])} I(\pi, f, g) = \inf_{\pi\in\Pi([a,b])} S(\pi, f, g)$$

The common value, denoted by $\int_a^b f(x)\,dg(x)$, is called the integral in the sense of Stieltjes (or Stieltjes integral) of $f$ with respect to $g$ on $[a,b]$.

When $g(x) = x$, we get back Riemann's integral. The functions $f$ and $g$ are called the integrand function and the integrator function, respectively. For brevity, we will often write $\int_a^b f\,dg$, thus omitting the arguments of such functions.
N.B. In the rest of the chapter we will tacitly assume that $f$ and $g$ are two scalar functions defined on $[a,b]$, with $f$ bounded and $g$ increasing. O
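A minimal computational sketch of the definition (not part of the text): lower and upper Stieltjes sums on a uniform subdivision, with the infimum and supremum over each cell approximated on a finite sample grid. As the subdivision is refined, the two sums squeeze $\int_a^b f\,dg$.

```python
def stieltjes_sums(f, g, a, b, n):
    lower = upper = 0.0
    for i in range(n):
        x0 = a + i * (b - a) / n
        x1 = a + (i + 1) * (b - a) / n
        # crude inf/sup of f over [x0, x1] via a sample grid
        vals = [f(x0 + j * (x1 - x0) / 50) for j in range(51)]
        lower += min(vals) * (g(x1) - g(x0))
        upper += max(vals) * (g(x1) - g(x0))
    return lower, upper

# with g(x) = x this is the ordinary Riemann integral: ∫_0^1 x dx = 1/2
lo, up = stieltjes_sums(lambda x: x, lambda x: x, 0.0, 1.0, 1000)
print(lo, up)  # the two sums bracket 0.5 tightly
```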
Proposition 1499 The function $f$ is Stieltjes integrable with respect to $g$ if, for every $\varepsilon > 0$, there exists a subdivision $\pi \in \Pi([a,b])$ such that $S(\pi, f, g) - I(\pi, f, g) < \varepsilon$.

As for Riemann's integral, it is important to know the classes of integrable functions. As one may expect, the answer depends on the regularity of both functions $f$ and $g$ (recall that we assume $g$ to be increasing).
Proposition 1500 The integral $\int_a^b f\,dg$ exists if at least one of the following two conditions is satisfied:

(i) $f$ is continuous;

(ii) $f$ is monotonic and $g$ is continuous.

Note that (i) and (ii) generalize, respectively, Propositions 1425 and 1428 for Riemann's integral.

Proof (i) The proof relies on the same steps as that of Proposition 1425. Since $f$ is continuous on $[a,b]$, it is also bounded (Weierstrass' Theorem) and uniformly continuous (Theorem 526). Take $\varepsilon > 0$. There exists a $\delta_\varepsilon > 0$ such that

$$|x - y| < \delta_\varepsilon \implies |f(x) - f(y)| < \varepsilon \qquad (37.3)$$

Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. By condition (37.3), for every $i = 1, 2, \ldots, n$ we have

$$\max_{x\in[x_{i-1},x_i]} f(x) - \min_{x\in[x_{i-1},x_i]} f(x) < \varepsilon$$

so that $S(\pi, f, g) - I(\pi, f, g) < \varepsilon\left(g(b) - g(a)\right)$, and the conclusion follows from Proposition 1499.
(ii) Since $g$ is continuous on $[a,b]$, it is also bounded and uniformly continuous. Let $\varepsilon > 0$. There is a $\delta_\varepsilon > 0$ such that

$$|x - y| < \delta_\varepsilon \implies |g(x) - g(y)| < \varepsilon$$

Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. For every pair of consecutive points of such a subdivision, we have $g(x_i) - g(x_{i-1}) = |g(x_i) - g(x_{i-1})| < \varepsilon$. The proof now follows the same steps as that of Proposition 1428. Suppose that $f$ is increasing (if $f$ is decreasing the argument is analogous). We have $\sup_{x\in[x_{i-1},x_i]} f(x) = f(x_i)$ and $\inf_{x\in[x_{i-1},x_i]} f(x) = f(x_{i-1})$, so that

$$S(\pi, f, g) - I(\pi, f, g) = \sum_{i=1}^n \sup_{x\in[x_{i-1},x_i]} f(x)\left(g(x_i) - g(x_{i-1})\right) - \sum_{i=1}^n \inf_{x\in[x_{i-1},x_i]} f(x)\left(g(x_i) - g(x_{i-1})\right)$$

$$= \sum_{i=1}^n f(x_i)\left(g(x_i) - g(x_{i-1})\right) - \sum_{i=1}^n f(x_{i-1})\left(g(x_i) - g(x_{i-1})\right)$$

$$= \sum_{i=1}^n \left(f(x_i) - f(x_{i-1})\right)\left(g(x_i) - g(x_{i-1})\right)$$

$$< \varepsilon \sum_{i=1}^n \left(f(x_i) - f(x_{i-1})\right) = \varepsilon\left(f(b) - f(a)\right)$$

and the conclusion follows from Proposition 1499. $\blacksquare$
Lastly, we extend Proposition 1426 to the Stieltjes integral by requiring that $g$ not share discontinuities with $f$.

Proposition 1501 If $f$ has finitely many discontinuities and $g$ is continuous at such points,² then $f$ is Stieltjes integrable with respect to $g$.

We omit the proof of this remarkable result which, inter alia, generalizes Proposition 1500-(i). However, while Proposition 1426 allowed for infinitely many discontinuities, in this more general setting we restrict ourselves to finitely many.
37.3 Calculus
When $g$ is differentiable, the Stieltjes integral can be written as a Riemann integral.
Proposition 1502 Let $g$ be differentiable with $g'$ Riemann integrable. Then $f$ is Stieltjes integrable with respect to $g$ if and only if $fg'$ is Riemann integrable. In such a case, we have

$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\,g'(x)\,dx \qquad (37.4)$$
Proof Since $g'$ is Riemann integrable, for any given $\varepsilon > 0$ there exists a subdivision $\pi$ such that

$$S(g', \pi) - I(g', \pi) < \varepsilon \qquad (37.5)$$

² In other words, we require that the two functions $f$ and $g$ not be discontinuous at the same points.
From (37.5) we also deduce that, for any pair of points $s_i, t_i \in I_i$, we have

$$\sum_{i=1}^n \left|g'(s_i) - g'(t_i)\right|\Delta x_i < \varepsilon \qquad (37.6)$$

Always referring to the generic interval $I_i$ of the subdivision, we can observe that, thanks to the differentiability of $g$, by the Mean Value Theorem there is a point $t_i \in [x_{i-1}, x_i]$ such that

$$\Delta g_i = g(x_i) - g(x_{i-1}) = g'(t_i)\,\Delta x_i$$

So, denoting by $M$ a bound for $|f|$ on $[a,b]$ and taking arbitrary points $s_i \in I_i$, by (37.6) we get

$$-M\varepsilon \le \sum_{i=1}^n f(s_i)\,\Delta g_i - \sum_{i=1}^n f(s_i)\,g'(s_i)\,\Delta x_i \le M\varepsilon$$

Note that $S(fg', \pi) \ge \sum_{i=1}^n f(s_i)\,g'(s_i)\,\Delta x_i$, from which $\sum_{i=1}^n f(s_i)\,\Delta g_i \le S(fg', \pi) + M\varepsilon$, and so also

$$S(\pi, f, g) \le S(fg', \pi) + M\varepsilon \qquad (37.7)$$

One can symmetrically prove that

$$I(\pi, f, g) \ge I(fg', \pi) - M\varepsilon \qquad (37.8)$$

Together, (37.7) and (37.8) yield

$$I(fg', \pi) - M\varepsilon \le I(\pi, f, g) \le S(\pi, f, g) \le S(fg', \pi) + M\varepsilon \qquad (37.9)$$

Inequality (37.9) holds for any subdivision $\pi$ of the interval $[a,b]$ and for every $\varepsilon > 0$; since the bound $M\varepsilon$ applies in both directions, the Stieltjes sums of $f$ with respect to $g$ and the Riemann sums of $fg'$ have the same supremum and infimum:

$$\sup_{\pi} I(\pi, f, g) = \sup_{\pi} I(fg', \pi) \qquad (37.10)$$

$$\inf_{\pi} S(\pi, f, g) = \inf_{\pi} S(fg', \pi) \qquad (37.11)$$

From (37.10) and (37.11) one can see that $fg'$ is Riemann integrable if and only if $f$ is Stieltjes integrable with respect to $g$, in which case we get (37.4). $\blacksquare$
This greatly simplifies computations because the techniques developed to solve Riemann integrals can then be used for Stieltjes integrals.³

From a theoretical standpoint, the Stieltjes integral substantially extends the scope of the Riemann integral, while keeping, also thanks to (37.4), its remarkable analytical properties. Such a remarkable balance between generality and tractability explains the importance of the Stieltjes integral.
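A numerical sketch of formula (37.4) (not part of the text): take $f(x) = x$ and $g(x) = x^2$ on $[0,1]$, so that $\int_0^1 x\,dg(x) = \int_0^1 x \cdot 2x\,dx = 2/3$.

```python
def stieltjes_riemann_sum(f, g, a, b, n=100_000):
    # Σ f(x_{i-1}) (g(x_i) - g(x_{i-1})) on a uniform subdivision
    h = (b - a) / n
    return sum(f(a + i * h) * (g(a + (i + 1) * h) - g(a + i * h))
               for i in range(n))

val = stieltjes_riemann_sum(lambda x: x, lambda x: x * x, 0.0, 1.0)
print(val, 2 / 3)  # nearly equal
```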
Proposition 1503 Let $g$ be the integral function of a Riemann integrable function $\sigma$, that is, $g(x) = \int_a^x \sigma(t)\,dt$ for every $x \in [a,b]$. If $f$ is continuous, we have

$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\,\sigma(x)\,dx$$

We omit the proof of this result. However, when $\sigma$ is continuous (so, Riemann integrable) it follows from the previous result because, by the Second Fundamental Theorem of Calculus, the function $g$ is differentiable with $g' = \sigma$.
37.4 Properties
Properties similar to those of the Riemann integral hold for the Stieltjes integral. The only substantial novelty lies in a linearity property that now holds with respect to both the integrand function $f$ and the integrator function $g$. Next we list the properties without proving them (the proofs being similar to those of Section 35.6).

(iv) Monotonicity:

$$f_1 \le f_2 \implies \int_a^b f_1\,dg \le \int_a^b f_2\,dg$$
In other words, the Stieltjes integral is the sum of all the jumps of the integrator at its points of discontinuity, each multiplied by the value of the integrand at such points. Note that, as the integrator $g$ is monotone, the jumps are either all positive (if $g$ is increasing) or all negative (if $g$ is decreasing).

Proof By Proposition 1500, the integral $\int_a^b f\,dg$ exists. We must show that its value is (37.13). Let us consider a subdivision $\pi$ of $[a,b]$ fine enough that every interval $I_i = [x_{i-1}, x_i]$ contains at most one point of discontinuity $c_j$ (otherwise, it would be enough to add at most $n$ points to obtain the desired subdivision). Therefore, we have $\pi = \{x_0, x_1, \ldots, x_m\}$ with $m \ge n$. For such a subdivision, it holds that

$$I(\pi, f, g) = \sum_{i=1}^m m_i\left(g(x_i) - g(x_{i-1})\right) \qquad (37.14)$$

where $m_i = \inf_{x\in I_i} f(x)$. Consider the generic $i$-th term of the sum in (37.14), which refers to the interval $I_i$. There are two cases:

⁵ That is, $g(x_0^+) = \lim_{x\to x_0^+} g(x)$ and $g(x_0^-) = \lim_{x\to x_0^-} g(x)$. We also set $g(a^-) = g(a)$ and $g(b^+) = g(b)$.
1. There exists $j \in \{1, 2, \ldots, n\}$ such that $c_j \in I_i$. If so, since $I_i$ does not contain any other point of discontinuity of $g$ besides $c_j$, we have

$$g(x_{i-1}) = g(c_j^-) \quad \text{and} \quad g(x_i) = g(c_j^+)$$

and furthermore

$$f(c_j) \ge \inf_{I_i} f(x) = m_i$$

In this case it thus holds that

$$m_i\left(g(x_i) - g(x_{i-1})\right) \le f(c_j)\left[g(c_j^+) - g(c_j^-)\right] \qquad (37.15)$$

Denote by $J$ the set of indices $i \in \{1, 2, \ldots, m\}$ such that $c_j \in I_i$ for some $j \in \{1, 2, \ldots, n\}$. Clearly, $|J| = n$.

2. $I_i$ does not contain any $c_j$. In such a case, $g(x_i) = g(x_{i-1})$ and so

$$m_i\left(g(x_i) - g(x_{i-1})\right) = 0 \qquad (37.16)$$

Let us denote by $J^c$ the set of indices $i \in \{1, 2, \ldots, m\}$ such that $c_j \notin I_i$ for every $j = 1, 2, \ldots, n$. Clearly, $|J^c| = m - n$.

So,

$$I(\pi, f, g) \le \sum_{i=1}^n f(c_i)\left[g(c_i^+) - g(c_i^-)\right] \le S(\pi, f, g)$$

Since these inequalities hold for subdivisions finer than the one considered as well, we have

$$\sup_{\pi\in\Pi} I(\pi, f, g) \le \sum_{i=1}^n f(c_i)\left[g(c_i^+) - g(c_i^-)\right] \le \inf_{\pi\in\Pi} S(\pi, f, g)$$

Since the integral $\int_a^b f\,dg$ exists, this implies that

$$\int_a^b f\,dg = \sup_{\pi\in\Pi} I(\pi, f, g) = \inf_{\pi\in\Pi} S(\pi, f, g) = \sum_{i=1}^n f(c_i)\left[g(c_i^+) - g(c_i^-)\right]$$

$\blacksquare$
Consider an integrator step function with unit jumps, that is, for every $i$ we have

$$g(c_i^+) - g(c_i^-) = 1$$

In this case $\int_a^b f\,dg = \sum_{i=1}^n f(c_i)$: the Stieltjes integral thus includes summation as a particular case. More generally, we will soon see that the moments of a random variable are represented by a Stieltjes integral.
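A computational sketch of the jump formula (not part of the text): for a step integrator, the Stieltjes integral reduces to the finite sum $\sum_i f(c_i)\left[g(c_i^+) - g(c_i^-)\right]$.

```python
def step_stieltjes_integral(f, jumps):
    # jumps: list of (c_i, size of the jump g(c_i+) - g(c_i-))
    return sum(f(c) * size for c, size in jumps)

f = lambda x: x * x
# unit jumps at c = 1, 2, 3: the integral is f(1) + f(2) + f(3)
print(step_stieltjes_integral(f, [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]))  # 14.0
```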
Proof For every $\varepsilon > 0$ there are two subdivisions, $\pi = \{x_i\}_{i=0}^n$ and $\pi' = \{y_i\}_{i=0}^n$, of $[a,b]$ such that

$$\left|\int_a^b f\,dg - \sum_{i=1}^n f(x_{i-1})\left(g(x_i) - g(x_{i-1})\right)\right| < \frac{\varepsilon}{2}$$

and

$$\left|\int_a^b g\,df - \sum_{i=1}^n g(y_i)\left(f(y_i) - f(y_{i-1})\right)\right| < \frac{\varepsilon}{2}$$

Let $\pi'' = \{z_i\}_{i=0}^m$ be the subdivision $\pi'' = \pi \cup \pi'$. The two inequalities still hold for the subdivision $\pi''$. Moreover, note that

$$\sum_{i=1}^m f(z_{i-1})\left(g(z_i) - g(z_{i-1})\right) + \sum_{i=1}^m g(z_i)\left(f(z_i) - f(z_{i-1})\right) = f(b)\,g(b) - f(a)\,g(a)$$

since the left-hand side telescopes. This implies

$$\left|\int_a^b f\,dg + \int_a^b g\,df - \left[f(b)\,g(b) - f(a)\,g(a)\right]\right| < \varepsilon$$

and, since $\varepsilon$ is arbitrary,

$$\int_a^b f\,dg + \int_a^b g\,df = f(b)\,g(b) - f(a)\,g(a)$$

$\blacksquare$

In particular, when $g(x) = x$ we obtain the integration by parts formula (35.60) for Riemann's integral.
Theorem 1507 Let $f$ be continuous and $g$ increasing. If $\varphi : [c,d] \to [a,b]$ is a strictly increasing function, then $f \circ \varphi$ is Stieltjes integrable with respect to $g \circ \varphi$, with

$$\int_c^d f(\varphi(t))\,d(g\circ\varphi)(t) = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dg(x) \qquad (37.18)$$

In particular, if $g(x) = x$ we get back the Riemann change of variable formula (35.62), that is,

$$\int_c^d f(\varphi(t))\,\varphi'(t)\,dt = \int_a^b f(x)\,dx$$

when $\varphi$ is differentiable with $\varphi(c) = a$ and $\varphi(d) = b$. The more general Stieltjes formula thus clarifies the nature of this earlier formula, besides extending its scope. After integration by parts, the change of variable formula is thus another result that is best understood in terms of the Stieltjes integral.

When $g$ is strictly increasing, the Stieltjes integral can be computed via a Riemann integral. This result complements Proposition 1502, which showed that the same is true, but with a different formula, when $g$ is differentiable.
where $\pi = \{t_i\}_{i=0}^n$ is a subdivision $0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = T$ of $[0,T]$. At each time $t \in [t_{k-1}, t_k)$ the portfolio $x(t)$ thus features $c_k$ units of the asset, the outcome of trading at the market open at time $t_{k-1}$. Until time $t_k$ the portfolio does not change, so no trading occurs. The last trade occurs at $t_{n-1}$, so at $T$ the position does not change.⁶

How do a portfolio's gains and losses cumulate over time? This is a basic bookkeeping question that we need to answer to assess a portfolio's performance. To this end, define the gains process $G_x : [0,T] \to \mathbb{R}$ by the Stieltjes integral

$$G_x(t) = \int_0^t x(s)\,dp(s) \qquad (37.19)$$

⁶ For simplicity, we do not consider any dividends, so the cumulated gains and losses come only from trading ("capital gains" in the finance jargon).
where $x$ is the integrand and $p$ is the integrator. Since $x$ is a step function, it is easy to see that

$$G_x(t) = \begin{cases} \displaystyle\sum_{i=1}^{k-1} c_i\left(p(t_i) - p(t_{i-1})\right) + c_k\left(p(t) - p(t_{k-1})\right) & \text{if } t \in [t_{k-1}, t_k),\ k = 2, \ldots, n \\[8pt] \displaystyle\sum_{i=1}^{n} c_i\left(p(t_i) - p(t_{i-1})\right) & \text{if } t = T \end{cases}$$
The gains process describes how a portfolio's gains and losses cumulate over time, thus answering the previous question. To fix ideas, suppose that each $c_i$ is positive, i.e., $x \ge 0$, and consider $t \in [t_0, t_1)$. Throughout the time interval $[t_0, t_1)$, the portfolio $x$ features $c_1$ units of the asset. These units were traded at time $0$ at price $p(0)$ and at time $t$ their price is $p(t)$. The change in price is $p(t) - p(t_0)$, so the portfolio's gains/losses up to time $t$ are

$$G_x(t) = c_1\left(p(t) - p(t_0)\right) \qquad (37.20)$$

At time $t_1$, our position changes from $c_1$ to $c_2$ and then remains constant throughout the time interval $[t_1, t_2)$. To obtain this new position, we could have, for example, sold $c_1$ at time $t_1$ and simultaneously bought $c_2$, or just directly acquired the difference $c_2 - c_1$. If markets are frictionless, these possible trading strategies are equivalent. So, let us focus on the former. It yields that, up to time $t \in [t_1, t_2)$, the portfolio's cumulated gain is

$$G_x(t) = c_1\left(p(t_1) - p(t_0)\right) + c_2\left(p(t) - p(t_1)\right) \qquad (37.21)$$

Indeed, $c_1\left(p(t_1) - p(t_0)\right)$ are the gains/losses matured in the period $[0, t_1]$, coming from buying $c_1$ units at $0$ and selling them at time $t_1$, while $c_2\left(p(t) - p(t_1)\right)$ are the gains/losses occurred in $[t_1, t)$, given by the new position $c_2$. By iterating this reasoning, the Stieltjes integral (37.19) follows immediately; indeed, (37.20) and (37.21) correspond to $t = t_1$ and $t = t_2$ in such an integral. In particular, if one operates in the markets throughout, from time $0$ through time $T$, so as to keep the long and short positions of portfolio $x$, then one ends up with the gain/loss $G_x(T)$.
Finally, we can relax the assumption that portfolios are adjusted only finitely many times: as long as the functions $x$ and $p$ satisfy, for example, the hypotheses of Proposition 1501, the gains' process defined via the Stieltjes integral (37.19) is well defined and can be interpreted in terms of gains/losses.
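The bookkeeping above can be sketched numerically. The following Python fragment is ours, not the book's (the function and variable names are made up): it computes the gains' process of a step-function portfolio by accumulating $c_k \left( p(\min(t, t_k)) - p(t_{k-1}) \right)$ over the holding periods.

```python
# A minimal sketch (not from the book) of the gains' process G_x for a
# step-function portfolio x: on [t_{k-1}, t_k) the portfolio holds c_k
# units, traded at the price p(t_{k-1}).

def gains(t, trade_dates, positions, p):
    """Cumulated gain G_x(t) for positions c_1..c_n held on
    [t_0, t_1), [t_1, t_2), ..., against a price function p."""
    total = 0.0
    for k, c in enumerate(positions, start=1):
        t_prev, t_next = trade_dates[k - 1], trade_dates[k]
        if t <= t_prev:
            break
        # gain on the k-th holding period, truncated at t
        total += c * (p(min(t, t_next)) - p(t_prev))
    return total

# toy example: two holding periods, price p(s) = 1 + s
dates = [0.0, 1.0, 2.0]        # t_0, t_1, t_2 = T
c = [3.0, 5.0]                 # c_1 on [0, 1), c_2 on [1, 2)
p = lambda s: 1.0 + s
G_half = gains(0.5, dates, c, p)   # c_1 * (p(0.5) - p(0)) = 1.5
G_T = gains(2.0, dates, c, p)      # 3 * 1 + 5 * 1 = 8.0
```

For the toy price $p(s) = 1 + s$, the cumulated gain at $t = 0.5$ is $c_1 \cdot 0.5 = 1.5$, and at $T = 2$ it is $3 + 5 = 8$, matching the iteration described above.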
Chapter 38
Moments
In this final chapter we outline a study of moments, a notion that plays a fundamental role in probability theory and, through it, in a number of applications. For us, it is also a way to illustrate what we learned in the last two chapters.
38.1 Densities
We say that an increasing function $g : \mathbb{R} \to \mathbb{R}$ is a probability integrator if:
This class of integrators is pervasive in probability theory (in the form of cumulative distribution functions), and this justifies their name. If $g$ takes on the values $0$ and $1$ outside a bounded interval, say the unit interval $[0, 1]$ for concreteness, condition (i) reduces to $g(0) = 0$ and $g(1) = 1$.
If $g$ is the integral function of a positive function $\varphi : \mathbb{R} \to \mathbb{R}_+$, that is,
$$g(x) = \int_{-\infty}^{x} \varphi(t) \, dt \qquad \forall x \in \mathbb{R}$$
we say that $\varphi$ is a probability density of $g$. By condition (i), $\int_{-\infty}^{+\infty} \varphi(x) \, dx = 1$. When $g$ is continuously differentiable, the Second Fundamental Theorem of Calculus implies $g' = \varphi$.
Example 1508 (i) Given any two scalars $a < b$, consider the probability integrator
$$g(x) = \begin{cases} 0 & \text{if } x < a \\[0.5ex] \dfrac{x - a}{b - a} & \text{if } a \leq x \leq b \\[0.5ex] 1 & \text{if } x > b \end{cases}$$
Its probability density is the uniform density $\varphi(x) = \dfrac{1}{b - a}$ for $x \in [a, b]$, with $\varphi(x) = 0$ elsewhere,
because
$$\int_{-\infty}^{x} \varphi(t) \, dt = \int_{a}^{x} \frac{1}{b - a} \, dt = g(x) \qquad \forall x \in [a, b]$$
and $\int_{-\infty}^{+\infty} \varphi(x) \, dx = 1$.
(ii) The Gaussian integrator is
$$g(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}} \, dt$$
38.2 Moments
The improper Stieltjes integral, denoted $\int_{-\infty}^{+\infty} f(x) \, dg(x)$, can be defined in a way similar to the improper Riemann integral. For it, properties (i)-(v) of Section 37.4 continue to hold. The next important definition rests upon this notion.
Definition 1509 The n-th moment of an integrator function $g$ is given by the Stieltjes integral
$$\mu_n = \int_{-\infty}^{+\infty} x^n \, dg(x) \tag{38.1}$$
For instance, $\mu_1$ is the first moment (often called average or mean) of $g$, $\mu_2$ is its second moment, $\mu_3$ is its third moment, and so on.
Proposition 1510 If the moment $\mu_n$ exists, then all lower moments $\mu_k$, with $k \leq n$, exist.
To assume the existence of higher and higher moments is, therefore, a more and more demanding requirement. For instance, to assume the existence of the second moment is a stronger hypothesis than to assume the existence of the first moment.
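As a numerical illustration (ours, not the book's), the moments of the uniform density on $[a, b]$ can be approximated by a Riemann sum and compared with the closed form $(b^{n+1} - a^{n+1}) / ((n+1)(b-a))$:

```python
# Numerical sketch (not from the book): the n-th moment of the uniform
# density phi = 1/(b-a) on [a,b] via a midpoint Riemann sum, compared
# with the closed form (b^{n+1} - a^{n+1}) / ((n+1)(b-a)).

def uniform_moment(n, a, b, steps=200_000):
    h = (b - a) / steps
    # midpoint Riemann sum of x^n * 1/(b-a) over [a, b]
    return sum((a + (i + 0.5) * h) ** n for i in range(steps)) * h / (b - a)

def uniform_moment_exact(n, a, b):
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))

mu1 = uniform_moment(1, 0.0, 1.0)   # the mean, close to 1/2
mu2 = uniform_moment(2, 0.0, 1.0)   # the second moment, close to 1/3
```

The first two moments of the uniform density on $[0, 1]$ are $1/2$ and $1/3$, and the numeric values agree with the closed form to high accuracy.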
Proof To ease matters, assume that there is a scalar $a$ such that $g(a) = 0$, so that $\mu_n = \int_{a}^{+\infty} x^n \, dg(x)$. Since $x^k = o(x^n)$ if $k < n$, the version for improper Stieltjes integrals of Proposition 1485-(ii) ensures the convergence of $\int_{a}^{+\infty} x^k \, dg(x)$, that is, the existence of $\mu_k$.
When the integrator $g$ has a probability density $\varphi$, the moment (38.1) becomes the Riemann integral
$$\mu_n = \int_{-\infty}^{+\infty} x^n \varphi(x) \, dx \tag{38.2}$$
In this case, we are back to Riemann integration and we directly say that $\mu_n$ is the n-th moment of the density $\varphi$.
38.3 The problem of moments
Given a sequence $\{\mu_n\}$ of scalars in $[0, 1]$, is there an integrator $g$ such that, for each $n$, the term $\mu_n$ is exactly its n-th moment?
The question amounts to asking whether sequences of moments have a characterizing property, which $\{\mu_n\}$ should then satisfy in order to have the desired property. This question was first posed by Thomas Stieltjes in the same 1894-95 articles where he developed his notion of integral. Indeed, to provide a setting where the problem of moments could be properly addressed was a main motivation for his integral (which, as we just remarked, is indeed the natural setting where moments are defined).
Next we present a most beautiful answer, given by Felix Hausdorff in the early 1920s. To do so, we need to go back to the finite differences of Chapter 10.
In words, a sequence is totally monotone if its finite differences keep alternating sign across their orders. A totally monotone sequence is positive because $\Delta^0 x_n = x_n \geq 0$, as well as decreasing because $\Delta x_n \leq 0$ (Lemma 386).
Proof We prove the "only if" part, the converse being significantly more complicated. So, let $\{x_n\}$ be a sequence of moments (38.3). It suffices to show that $(-1)^k \Delta^k x_n = \int_{0}^{1} t^n (1 - t)^k \, dg(t) \geq 0$. We proceed by induction on $k$. For $k = 0$ we trivially have $(-1)^k \Delta^k x_n = x_n = \int_{0}^{1} t^n \, dg(t)$ for all $n$. Assume $(-1)^{k-1} \Delta^{k-1} x_n = \int_{0}^{1} t^n (1 - t)^{k-1} \, dg(t)$ for all $n$ (induction hypothesis). Then,
$$\begin{aligned} \Delta^k x_n &= \Delta\left( \Delta^{k-1} x_n \right) = \Delta^{k-1} x_{n+1} - \Delta^{k-1} x_n \\ &= (-1)^{k-1} \left( \int_{0}^{1} t^{n+1} (1 - t)^{k-1} \, dg(t) - \int_{0}^{1} t^{n} (1 - t)^{k-1} \, dg(t) \right) \\ &= -(-1)^{k-1} \int_{0}^{1} t^{n} (1 - t)^{k-1} (1 - t) \, dg(t) = -(-1)^{k-1} \int_{0}^{1} t^{n} (1 - t)^{k} \, dg(t) \end{aligned}$$
so that $(-1)^k \Delta^k x_n = \int_{0}^{1} t^n (1 - t)^k \, dg(t) \geq 0$, as desired.
The characterizing property of moment sequences is, thus, total monotonicity. It is truly remarkable that a property of finite differences is able to pin down moment sequences. Note that for this result the Stieltjes integral is required: in the "if" part the integrator, whose moments turn out to be the terms of the given totally monotone sequence, might well be non-differentiable (so, the Riemann version (38.2) might not hold).
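Total monotonicity is easy to test mechanically. A quick check (ours, not from the book; the helper names are made up): for the uniform integrator on $[0, 1]$ the moments are $\mu_n = \int_0^1 t^n \, dt = 1/(n+1)$, and the alternating finite differences are indeed all positive.

```python
from fractions import Fraction

# Sketch (not from the book): check Hausdorff's total monotonicity
# condition (-1)^k Δ^k μ_n >= 0 for μ_n = 1/(n+1), the moments of the
# uniform integrator on [0, 1].

def finite_diff(seq, k):
    """k-th forward difference Δ^k applied termwise."""
    for _ in range(k):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq

mu = [Fraction(1, n + 1) for n in range(12)]
totally_monotone = all(
    (-1) ** k * d >= 0
    for k in range(8)
    for d in finite_diff(mu, k)
)
```

Exact rational arithmetic (via `Fraction`) avoids any floating-point doubt about the signs.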
We can then use Proposition 1496 to establish the existence and differentiability of the moment generating function
$$F(y) = \int_{-\infty}^{+\infty} e^{yx} \varphi(x) \, dx \tag{38.4}$$
of a density $\varphi$. In particular, if there exist $\varepsilon > 0$ and a positive function $g : \mathbb{R} \to \mathbb{R}$ such that $\int_{-\infty}^{+\infty} g(x) \, dx < +\infty$ and, for every $y \in [-\varepsilon, \varepsilon]$,
$$|e^{yx} \varphi(x)| = e^{yx} \varphi(x) \leq g(x) \qquad \forall x \in \mathbb{R}$$
then, by Proposition 1496, $F : (-\varepsilon, \varepsilon) \to \mathbb{R}$ is differentiable, with
$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y} e^{yx} \varphi(x) \, dx = \int_{-\infty}^{+\infty} x e^{yx} \varphi(x) \, dx$$
At $y = 0$ we get
$$F'(0) = \mu_1$$
The derivative at $0$ of the moment generating function is, thus, the first moment of the density.
If there exists a positive function $h : \mathbb{R} \to \mathbb{R}$ such that $\int_{-\infty}^{+\infty} h(x) \, dx < +\infty$ and, for every $y \in [-\varepsilon, \varepsilon]$,
$$|x e^{yx} \varphi(x)| = |x| \, e^{yx} \varphi(x) \leq h(x) \qquad \forall x \in \mathbb{R}$$
then, by Proposition 1496, $F : (-\varepsilon, \varepsilon) \to \mathbb{R}$ is twice differentiable, with
$$F''(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y} \, x e^{yx} \varphi(x) \, dx = \int_{-\infty}^{+\infty} x^2 e^{yx} \varphi(x) \, dx$$
At $y = 0$ we get
$$F''(0) = \mu_2$$
By proceeding in this way (if possible), with higher order derivatives we get:
$$F'''(0) = \mu_3, \qquad F^{(iv)}(0) = \mu_4, \qquad \dots, \qquad F^{(n)}(0) = \mu_n$$
The derivative of order $n$ at $0$ of the moment generating function is, thus, the n-th moment of the density. This fundamental property justifies the name of this function.
Example 1515 For the Gaussian density $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$ we have
$$F(y) = \int_{-\infty}^{+\infty} e^{yx} \varphi(x) \, dx = \int_{-\infty}^{+\infty} e^{yx} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\left( x^2 - 2yx \right)} \, dx$$
$$= \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\left( x^2 - 2yx + y^2 - y^2 \right)} \, dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\left( x^2 - 2yx + y^2 \right) + \frac{y^2}{2}} \, dx = e^{\frac{y^2}{2}} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - y)^2} \, dx$$
where in the fourth equality we have added and subtracted $y^2$. But (35.78) of Chapter 35 implies $\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - y)^2} \, dx = 1$, so $F(y) = e^{\frac{y^2}{2}}$. We have $F'(y) = y e^{\frac{y^2}{2}}$ and $F''(y) = e^{\frac{y^2}{2}} \left( 1 + y^2 \right)$, so $\mu_1 = F'(0) = 0$ and $\mu_2 = F''(0) = 1$. N
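The computation in Example 1515 can be verified numerically. The following check is ours, not the book's; it truncates the improper integral at $|x| = 12$, where the Gaussian tails are negligible.

```python
import math

# Numerical sketch (not from the book): check F(y) = e^{y^2/2} for the
# Gaussian density phi(x) = e^{-x^2/2} / sqrt(2*pi) via a midpoint
# Riemann sum on a truncated range.

def gaussian_mgf(y, lo=-12.0, hi=12.0, steps=400_000):
    h = (hi - lo) / steps
    return sum(
        math.exp(y * x) * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        for x in (lo + (i + 0.5) * h for i in range(steps))
    ) * h

F0 = gaussian_mgf(0.0)   # should be close to 1 (a density integrates to 1)
F1 = gaussian_mgf(1.0)   # should be close to e^{1/2}
```

The two checks confirm both the normalization of $\varphi$ and the closed form $F(y) = e^{y^2/2}$.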
The next example shows that not all densities have a moment generating function; in this case there is no $\varepsilon > 0$ such that the integral (38.4) is finite.
the first moment does not exist either. By the comparison criterion for improper Riemann integrals, this implies $\mu_n = +\infty$ for every $n \geq 1$. This density has no moments of any order. N
Suppose that the moment generating function has derivatives of all orders. By Theorem 367,
$$e^{yx} = 1 + yx + \frac{y^2 x^2}{2} + \frac{y^3 x^3}{3!} + \dots + \frac{y^n x^n}{n!} + \dots = \sum_{n=0}^{\infty} \frac{y^n x^n}{n!}$$
So, it is tempting to write:
$$F(y) = \int_{-\infty}^{+\infty} e^{yx} \varphi(x) \, dx = \int_{-\infty}^{+\infty} \sum_{n=0}^{\infty} \frac{y^n x^n}{n!} \varphi(x) \, dx = \sum_{n=0}^{\infty} \int_{-\infty}^{+\infty} \frac{y^n x^n}{n!} \varphi(x) \, dx = \sum_{n=0}^{\infty} \frac{y^n}{n!} \mu_n$$
Under suitable hypotheses, spelled out in more advanced courses, it is legitimate to give in to this temptation. Moment generating functions can then be expressed as a power series with coefficients given by the moments of the density (divided by factorials).
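For the Gaussian density this power series can be checked numerically. It is a known fact – going beyond the $\mu_1$ and $\mu_2$ computed above – that the odd Gaussian moments vanish and the even ones are $\mu_{2m} = (2m - 1)!!$, so the partial sums should approach $F(y) = e^{y^2/2}$. The sketch below is ours, not from the book.

```python
import math

# Sketch (not from the book): partial sums of sum_n mu_n y^n / n! for the
# Gaussian density, whose moments are mu_n = (n-1)!! for even n and 0 for
# odd n; the series should reproduce F(y) = e^{y^2/2}.

def double_factorial(m):
    return math.prod(range(m, 0, -2)) if m > 0 else 1

def mgf_series(y, terms=40):
    total = 0.0
    for n in range(terms):
        mu_n = double_factorial(n - 1) if n % 2 == 0 else 0
        total += mu_n * y ** n / math.factorial(n)
    return total

approx = mgf_series(0.7)   # should be close to e^{0.49/2}
```

With 40 terms the series has long since converged at $y = 0.7$.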
Part IX
Appendices
Appendix A
Binary Relations
A.1 Definition
Throughout the book we have already encountered binary relations a few times, but we never formally introduced them. In a nutshell, the notion of binary relation formalizes the idea that an element $x$ is in a relation with an element $y$. It is an abstract notion that is best understood after having seen a few concrete examples that make it possible to appreciate its unifying power. We discuss it in an Appendix, so that readers can decide if and when to go through it.
A first example of a binary relation is the relation "being greater than or equal to" among natural numbers: given any two natural numbers $x$ and $y$, we can always say whether $x$ is greater than or equal to $y$. For instance, $6$ is greater than or equal to $4$. In this example, $x$ and $y$ are natural numbers and "being in relation with" is equivalent to saying "being greater than or equal to".
The imagination is the only limit to the number of binary relations one can think of. Set
theory is the language that we can use to formalize the idea that two objects are related to
each other. For example, given the set of citizens I of a country, we could say that x is in
relation with y if x is the mother of y. In this case, “being in relation with” amounts to
“being the mother of ”.
Economics is a source of examples of binary relations. For instance, consider an agent
and a set of alternatives X. The preference relation % is a binary relation. In this case, “x
is in relation with y” is equivalent to say “x is at least as good as y”.
What do all these examples have in common? First, in all of them we considered two elements $x$ and $y$ of a set $X$. Second, these elements $x$ and $y$ were in a specific order: it is one thing to say that $x$ is in relation with $y$, and another to say that $y$ is in relation with $x$. So, the pair formed by $x$ and $y$ is an ordered pair $(x, y)$ that belongs to the Cartesian product $X \times X$. Finally, in all three examples it might well happen that a generic pair of elements $x$ and $y$ is actually unrelated. For instance, if in our second example $x$ and $y$ are siblings, neither is obviously the mother of the other. In other words, a given notion of "being in relation with" might not include all pairs of elements of $X$.
We are now ready to give a (set theoretic) definition of binary relations.
In terms of notation, we write xRy in place of (x; y) 2 R. Indeed, the notation xRy,
which reads “x is in the relation R with y”, is more evocative of what the concept of binary
relation is trying to capture. So, in what follows we will adopt it.
To get acquainted with this new mathematical notion, let us now formalize our first three examples.
Example 1518 (i) Let $X$ be the set of natural numbers $\mathbb{N}$. The binary relation $\geq$ can be viewed as the subset of $\mathbb{N} \times \mathbb{N}$ given by
$$\{ (x, y) \in \mathbb{N} \times \mathbb{N} : x \geq y \}$$
Indeed, it contains all pairs in which the first element $x$ is greater than or equal to the second element $y$.
(ii) Let $X$ be the set of all citizens $C$ of a country. The binary relation "being the mother of" can be viewed as the subset of $C \times C$ given by
$$\{ (x, y) \in C \times C : x \text{ is the mother of } y \}$$
Indeed, it contains all pairs in which the first element is the mother of the second element.
(iii) Let $X$ be the set of all consumption bundles $\mathbb{R}^n_+$. The binary relation $\succsim$ can be seen as the subset of $\mathbb{R}^n_+ \times \mathbb{R}^n_+$ given by
$$\{ (x, y) \in \mathbb{R}^n_+ \times \mathbb{R}^n_+ : x \succsim y \}$$
Indeed, it contains all pairs of bundles in which the first bundle is at least as good as the second one. N
A binary relation associates to each element $x$ of $X$ some elements $y$ of the same set (possibly $x$ itself, i.e., $x = y$). We denote by $R(x) = \{ y \in X : xRy \}$ the image of $x$ through $R$, i.e., the collection of all $y$ that stand in the relation $R$ with a given $x$.
Example 1519 (i) For the binary relation $\geq$ on $\mathbb{N}$, the image $R(x) = \{ y \in \mathbb{N} : y \geq x \}$ of $x \in \mathbb{N}$ consists of all natural numbers that are greater than or equal to $x$. (ii) For the binary relation "being the mother of" on $C$, the image $R(x)$ consists of all the children of $x$. (iii) For the binary relation $\succsim$ on $\mathbb{R}^n_+$, the image $R(x) = \{ y \in \mathbb{R}^n_+ : y \succsim x \}$ of $x \in \mathbb{R}^n_+$ consists of all bundles that are at least as good as $x$. N
Example 1520 A function $f : X \to X$ determines the binary relation
$$R_f = \{ (x, f(x)) : x \in X \}$$
on $X$ consisting of all pairs $(x, f(x))$. The image $R_f(x) = \{ f(x) \}$ is a singleton consisting of the image $f(x)$. Indeed, functions can be regarded as the binary relations on $X$ that have singleton images, i.e., that associate to each element of $X$ a unique element of $X$. N
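On a finite set, a binary relation can be stored literally as a set of ordered pairs. The following sketch (ours, not the book's; the names are made up) encodes $\geq$ on $X = \{1, \dots, 5\}$ and computes images $R(x) = \{ y : xRy \}$:

```python
# Sketch (not from the book): a binary relation on a finite set as a set
# of ordered pairs, here >= on X = {1,...,5}, with its images R(x).

X = range(1, 6)
R = {(x, y) for x in X for y in X if x >= y}

def image(R, x):
    """R(x) = {y : x R y}."""
    return {y for (a, y) in R if a == x}

img3 = image(R, 3)   # the y with 3 >= y, namely {1, 2, 3}
```

Membership tests on the set of pairs directly express $xRy$: `(4, 2) in R` holds, while `(2, 4) in R` does not.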
A.2 Properties
A binary relation R can satisfy several properties. In particular, a binary relation R on a
set X is:
Often we will consider binary relations that satisfy more than one of these properties. However, some of them are incompatible, for example asymmetry and symmetry, while others are related, for example completeness implies reflexivity.$^{1}$
Example 1521 (i) Consider the binary relation $\geq$ on $\mathbb{N}$. Clearly, $\geq$ is complete (so, it is reflexive). Indeed, given any two natural numbers $x$ and $y$, one of them is greater than or equal to the other. Moreover, if both $x \geq y$ and $y \geq x$, then $x = y$. Thus, $\geq$ is antisymmetric. Finally, $\geq$ is transitive but it is neither symmetric nor asymmetric.
(ii) Let $R$ be the binary relation "being the mother of" on $C$. An individual cannot be his/her own mother, so $R$ is not reflexive (thus, it is not complete either). Similarly, $R$ is not symmetric since if $x$ is the mother of $y$, then $y$ cannot be the mother of $x$. A similar argument shows that, instead, $R$ is antisymmetric. We leave it to the reader to verify that $R$ is not transitive. N
Example 1522 Let $R$ be the binary relation "being married to" on $C$. This relation consists of all pairs of citizens $(x, y) \in C \times C$ such that $x$ is the spouse of $y$. That is, $xRy$ means that $x$ is married to $y$. The image $R(x)$ is a singleton consisting of the spouse of $x$. The "married to" relation is neither reflexive (individuals cannot be married to themselves) nor antisymmetric (married couples do not merge into single individuals). It is symmetric since spouses are married to each other, while transitivity fails: $xRy$ and $yRz$ imply $x = z$, and no individual is married to himself. Finally, this relation is not complete if $|C| \geq 3$. In fact, suppose that $R$ is complete and that there exist three distinct elements $x, y, z \in X$. By completeness, we have $xRy$, $xRz$ and $yRz$. By symmetry, $zRx$. Since $xRy$ and $xRz$ imply $z = y$, we then contradict $z \neq y$. N
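On a finite set, the properties above can be checked by brute force. The sketch below is ours, not from the book (the function names are made up):

```python
# Sketch (not from the book): checking the properties of a binary
# relation R, stored as a set of ordered pairs, by brute force.

def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_complete(R, X):
    return all((x, y) in R or (y, x) in R for x in X for y in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

X = range(4)
GE = {(x, y) for x in X for y in X if x >= y}   # the relation >=
```

Running the checkers on `GE` reproduces Example 1521-(i): complete (hence reflexive), antisymmetric, transitive, not symmetric.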
The relation $\geq$ on $\mathbb{N}$ is the prototype for the following important class of binary relations.
For example, the binary relation $\geq$ on $\mathbb{R}^n$ satisfies reflexivity, transitivity, and antisymmetry, so it is a partial order (cf. Section 2.3). If $n = 1$, this binary relation is complete, thus it is a complete order. If $n > 1$, this is no longer the case, as we emphasized several times in the text – for instance, the vectors $(1, 2)$ and $(2, 1)$ cannot be ordered by the relation $\geq$.
Example 1524 (i) Consider the space of sequences $\mathbb{R}^{\infty} = \{ x = (x_1, \dots, x_n, \dots) : x_n \in \mathbb{R} \text{ for each } n \}$. The componentwise order $\geq$ on $\mathbb{R}^{\infty}$, defined by $x \geq y$ if $x_n \geq y_n$ for each $n \geq 1$, is easily seen to be a partial order. (ii) Given any set $A$, consider the space $\mathbb{R}^{A}$ of real-valued functions $f : A \to \mathbb{R}$. The pointwise order $\geq$ on $\mathbb{R}^{A}$, defined by $f \geq g$ if $f(x) \geq g(x)$ for all $x \in A$, is also easily seen to be a partial order (the componentwise order on $\mathbb{R}^{\infty}$ is the special case $A = \mathbb{N}$). (iii) Consider the power set $2^X = \{ A : A \subseteq X \}$ of a set $X$, i.e., the collection of all its subsets (cf. Section 7.3). The inclusion relation $\subseteq$ on $2^X$ is a partial order. Unless $X$ contains at most one element, $\subseteq$ is not complete – e.g., if $X = \{ a, b, c \}$, the sets $\{ a, b \}$ and $\{ b, c \}$ cannot be ordered by the inclusion relation. N
So, the preference relations that one usually encounters in economics are an important example of complete preorders. Interestingly, we also encountered a preorder when we discussed the notion of "having cardinality less than or equal to" (Section 7.3).
Example 1526 Let $2^{\mathbb{R}}$ be the collection of all subsets of the real line. Define the binary relation $\succsim$ on $2^{\mathbb{R}}$ by $A \succsim B$ if $|A| \geq |B|$, i.e., if $A$ has cardinality greater than or equal to that of $B$ (Section 7.3). By Proposition 259, $\succsim$ is reflexive and transitive, so it is a preorder. It is not, however, a partial order because antisymmetry is clearly violated: for example, the sets $A = \{ 1, \pi \}$ and $B = \{ 2, 5 \}$ have the same cardinality – i.e., both $A \succsim B$ and $B \succsim A$ – yet they are different, i.e., $A \neq B$. N
Clearly, a partial order is a preorder, while this example shows that the converse is false.
Proposition 1527 Let $R$ be a preorder on a set $X$. The induced binary relation $I$, defined by $xIy$ if both $xRy$ and $yRx$, is reflexive, symmetric, and transitive.
This result is the general abstract version of what Lemma 239 established for a preference
relation.
Proof Consider $x \in X$ and $y = x$. Since $R$ is reflexive and $y = x$, we have both $xRy$ and $yRx$. So, by definition $xIx$, proving the reflexivity of $I$. Next assume that $xIy$. By definition, we have that $xRy$ and $yRx$, which means that $yRx$ and $xRy$, yielding that $yIx$ and proving symmetry. Finally, assume that $xIy$ and $yIz$. It follows that $xRy$ and $yRx$ as well as $yRz$ and $zRy$. By $xRy$ and $yRz$ and the transitivity of $R$, we conclude that $xRz$. By $yRx$ and $zRy$ and the transitivity of $R$, we conclude that $zRx$. So, we have both $xRz$ and $zRx$, yielding $xIz$ and proving the transitivity of $I$. We have thus proved that $I$ is an equivalence relation.
Given an equivalence relation $R$ on $X$ and an element $x \in X$, define
$$[x] = \{ y \in X : yRx \}$$
The collection $[x]$, which is nothing but the image $R(x)$ of $x$, is called the equivalence class of $x$.
Thus, the choice of the representative $x$ in defining the equivalence class is immaterial: any element of the equivalence class can play that role.
Proof Let $y \in [x]$. Then $[y] \subseteq [x]$. In fact, if $y' \in [y]$, then $y'Ry$ and so, by transitivity, $y'Rx$, i.e., $y' \in [x]$. On the other hand, $y \in [x]$ implies $x \in [y]$ by symmetry. So, $[x] \subseteq [y]$. We conclude that $[y] = [x]$.
For a preference relation, the equivalence classes are the indifference classes, i.e., $[x]$ is the collection of all alternatives indifferent to $x$. Let us see another classic example.
Example 1530 The preorder $\succsim$ on $2^{\mathbb{R}}$ of Example 1526 induces the equivalence relation $\sim$ on $2^{\mathbb{R}}$ defined by $A \sim B$ if and only if $|A| = |B|$, i.e., if $A$ has the same cardinality as $B$. If we consider the set $\mathbb{Q}$, the equivalence class $[\mathbb{Q}]$ is the class of all sets that are countable, for example $\mathbb{N}$ and $\mathbb{Z}$. Intuitively, this binary relation declares two sets similar if they share the same number of elements. N
At this point the reader might think that all equivalence relations are necessarily induced by a preorder, so that they have the form $I$. The next classic example shows that this is not the case.
Example 1531 Let $n \in \mathbb{Z}$ be such that $n \geq 2$. Consider the binary relation $R$ on the set of integers $\mathbb{Z}$ such that $xRy$ if and only if $n$ divides $x - y$, that is, there exists $k \in \mathbb{Z}$ such that $x - y = kn$. Clearly, for any $x \in \mathbb{Z}$, we have $xRx$ since $x - x = kn$ with $k = 0$; so $R$ is reflexive. At the same time, if $x$ and $y$ in $\mathbb{Z}$ are such that $xRy$, then $x - y = kn$ for some $k \in \mathbb{Z}$, yielding that $y - x = (-k) n$. It follows that $yRx$, proving that $R$ is symmetric. Finally, if $x$, $y$, and $z$ in $\mathbb{Z}$ are such that $xRy$ and $yRz$, then $x - y = kn$ and $y - z = k'n$ for some $k, k' \in \mathbb{Z}$, yielding that $x - z = (k + k') n$. It follows that $xRz$, proving that $R$ is transitive. We conclude that $R$ is an equivalence relation. It is often denoted by $x \equiv y \pmod{n}$.
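The equivalence classes of this relation can be computed by grouping a finite window of integers. The sketch below is ours, not the book's (the helper `equivalence_classes` is made up):

```python
# Sketch (not from the book): the equivalence classes of x ≡ y (mod n)
# on a finite window of integers; there are exactly n distinct classes.

def equivalence_classes(elements, related):
    """Group elements into classes of a given equivalence relation."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

n = 3
related = lambda x, y: (x - y) % n == 0
classes = equivalence_classes(range(-6, 7), related)
# three classes: the integers congruent to 0, 1, 2 modulo 3
```

Any two members of the same class differ by a multiple of $n$, exactly as in the example.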
The next result shows that equivalence relations are closely connected to partitions of $X$, that is, to subdivisions of the set of interest $X$ into mutually exclusive classes. It generalizes the basic property that indifference curves are disjoint (Lemma 240).
Example 1533 (i) The relation "having the same age" is an equivalence relation on $C$, whose equivalence classes consist of all citizens that have the same age, that is, who belong to the same age cohort. The quotient space has, as points, the age cohorts. (ii) For the indifference relation $\sim$ on $\mathbb{R}^n_+$, the quotient space has, as points, the indifference curves. N
Appendix B
Permutations
B.1 Generalities
Combinatorics is an important area of discrete mathematics, useful in many applications. Here we focus on permutations, a fundamental combinatorial notion that is important for understanding some of the topics of the book.
We start with a simple problem. We have at our disposal three pairs of pants and five T-shirts. If there are no chromatic pairings that hurt our aesthetic sense, in how many possible ways can we dress? The answer is very simple: in $3 \cdot 5 = 15$ ways. Indeed, let us call the pairs of pants $a$, $b$, $c$ and the T-shirts $1$, $2$, $3$, $4$, $5$: since the choice of a certain T-shirt does not impose any (aesthetic) restriction on the choice of the pants, the possible pairings are
a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5
We can therefore conclude that if we have to make two independent choices, one among $n$ different alternatives and the other among $m$ different alternatives, the total number of possible choices is $n \cdot m$. In particular, suppose that $A$ and $B$ are two sets with $n$ and $m$ elements, respectively. Their Cartesian product $A \times B$, which is the set of ordered pairs $(a, b)$ with $a \in A$ and $b \in B$, has $n \cdot m$ elements. That is:
Proposition 1534 $|A \times B| = |A| \cdot |B|$.
What has been said can be easily extended to the case of more than two choices: if we
have to make multiple choices, none of which imposes restrictions on the others, the total
number of possible choices is the product of the numbers of alternatives for each choice.
Formally:
Proposition 1535 $|A_1 \times A_2 \times \cdots \times A_n| = |A_1| \cdot |A_2| \cdots |A_n|$.
Example 1536 (i) How many Italian licence plates are possible? They have the form AA 000 AA, with two letters, three digits, and again two letters. There are 22 letters that can be used and, obviously, 10 digits. The number of (different) plates is, therefore, $22 \cdot 22 \cdot 10 \cdot 10 \cdot 10 \cdot 22 \cdot 22 = 234{,}256{,}000$. (ii) In a multiple choice test, in each question students have to select one of three possible answers. If there are 13 questions, then the overall number of possible selections is $3^{13} = 1{,}594{,}323$. N
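Both counts can be reproduced in a few lines (our sketch, not from the book):

```python
from itertools import product

# Sketch (not from the book): checking the product rule
# |A1 x ... x An| = |A1|...|An| on the licence-plate and test examples.

letters = 22   # usable letters
digits = 10
plates = letters**2 * digits**3 * letters**2

# multiple-choice test: 13 questions, 3 answers each, counted by
# brute-force enumeration of the Cartesian product
selections = sum(1 for _ in product(range(3), repeat=13))
```

The brute-force enumeration of the test answers agrees with the closed form $3^{13}$.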
B.2 Permutations
Intuitively, a permutation of $n$ distinct objects is a possible arrangement of these objects. For instance, with three objects $a$, $b$, $c$ there are $6$ permutations:
$$abc, \quad acb, \quad bac, \quad bca, \quad cab, \quad cba \tag{B.1}$$
Permutations are thus nothing but the bijective functions $f : X \to X$. Though combinatorics typically considers finite sets $X$, the definition is fully general.
For instance, if $X = \{ a, b, c \}$, the permutations $f : \{ a, b, c \} \to \{ a, b, c \}$ that correspond to the arrangements (B.1) are the six bijections of $\{ a, b, c \}$ onto itself: each arrangement $xyz$ in (B.1) corresponds to the bijection with $f(a) = x$, $f(b) = y$, and $f(c) = z$.
Example 1539 (i) A deck of 52 cards can be reshuffled in $52!$ different ways. (ii) Six passengers can occupy a six-passenger car in $6! = 720$ different ways. N
Indeed, Lemma 337 showed that $\alpha^n = o(n!)$. The already very fast exponentials are actually slower than factorials, which definitely deserve their exclamation mark.
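Factorials can be checked by generating the permutations explicitly for a small $n$, and compared with exponentials (our sketch, not from the book):

```python
from itertools import permutations
from math import factorial

# Sketch (not from the book): the n! permutations of n distinct objects,
# generated explicitly for a small n.

objs = "abcd"
perms = list(permutations(objs))
count = len(perms)            # 4! = 24 arrangements

# factorials outgrow exponentials: compare 2^n with n! for n = 1..10
exp_vs_fact = [(2**n, factorial(n)) for n in range(1, 11)]
```

Already at $n = 10$ the factorial ($3{,}628{,}800$) dwarfs the exponential ($2^{10} = 1{,}024$).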
B.3 Anagrams
We now drop the requirement that the objects be distinct and allow for repetitions. Specifically, in this section we consider $n$ objects of $h \leq n$ different types, each type $i$ having multiplicity $k_i$, with $i = 1, \dots, h$, and $\sum_{i=1}^{h} k_i = n$.$^{1}$ For instance, consider the $6$ objects
$$a, a, b, b, b, c$$
Proposition 1540 The number of distinct arrangements, called permutations with repetitions (or anagrams), is
$$\frac{n!}{k_1! \, k_2! \cdots k_h!} \tag{B.2}$$
Example 1541 (i) The possible anagrams of the word ABA are $3!/(2!\,1!) = 3$. They are ABA, AAB, BAA. (ii) The possible anagrams of the word MAMMA are $5!/(3!\,2!) = 120/(6 \cdot 2) = 10$. N
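The brute-force count of distinct arrangements agrees with formula (B.2) (our sketch, not from the book):

```python
from itertools import permutations
from math import factorial

# Sketch (not from the book): count the distinct anagrams of MAMMA both
# by brute force (distinct orderings of the letters) and by formula (B.2).

word = "MAMMA"
brute = len(set(permutations(word)))                      # distinct arrangements
formula = factorial(5) // (factorial(3) * factorial(2))   # 5!/(3! 2!)
```

The `set` collapses the $5! = 120$ orderings onto the $10$ distinct ones, since repeated letters produce identical tuples.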
In the important two-type case, $h = 2$, we have $k$ objects of one type and $n - k$ of the other type. By (B.2), the number of distinct arrangements is
$$\frac{n!}{k! \, (n - k)!} \tag{B.3}$$
$^{1}$ Note that, because of repetitions, these $n$ objects do not form a set $X$. The notion of "multiset" is sometimes used for collections in which repetitions are permitted.
Example 1542 (i) In a parking lot, spots can be either free or busy. Suppose that 15 out of the 20 available spots are busy. The possible arrangements of the 5 free spots (or, symmetrically, of the 15 busy spots) are:
$$\binom{20}{5} = \binom{20}{15} = 15{,}504$$
(ii) We repeat an experiment 100 times: each time we can record either a "success" or a "failure", so a string of 100 outcomes like $FSFF \dots S$ results. Suppose that we have recorded 92 "successes" and 8 "failures". The number of different strings that may result is:
$$\binom{100}{92} = \binom{100}{8} = 186{,}087{,}894{,}300$$
N
$$(a + b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a b^{n-1} + b^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \tag{B.4}$$
Proof We proceed by induction. The initial step, that is, the veracity of the statement for $n = 1$, is trivially verified. Indeed:
$$(a + b)^1 = a + b = \binom{1}{0} a^1 b^0 + \binom{1}{1} a^0 b^1 = \sum_{k=0}^{1} \binom{1}{k} a^{1-k} b^k$$
We next prove the inductive step. We assume the statement holds for $n$, that is,
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$$
and we show it holds for $n + 1$ as well. In doing so, we will use the combinatorial identity (10.5), that is,
$$\binom{n+1}{i} = \binom{n}{i-1} + \binom{n}{i} \qquad \forall i = 1, \dots, n$$
Note that
$$\begin{aligned} (a + b)^{n+1} &= (a + b)(a + b)^n = (a + b) \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \\ &= \sum_{k=0}^{n} \binom{n}{k} a^{n+1-k} b^k + \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k+1} \\ &= \sum_{i=0}^{n} \binom{n}{i} a^{n+1-i} b^i + \sum_{i=1}^{n+1} \binom{n}{i-1} a^{n+1-i} b^i \\ &= a^{n+1} + \sum_{i=1}^{n} \binom{n}{i} a^{n+1-i} b^i + \sum_{i=1}^{n} \binom{n}{i-1} a^{n+1-i} b^i + b^{n+1} \\ &= a^{n+1} + \sum_{i=1}^{n} \left( \binom{n}{i-1} + \binom{n}{i} \right) a^{n+1-i} b^i + b^{n+1} \\ &= a^{n+1} + \sum_{i=1}^{n} \binom{n+1}{i} a^{n+1-i} b^i + b^{n+1} = \sum_{i=0}^{n+1} \binom{n+1}{i} a^{n+1-i} b^i \end{aligned}$$
So, the statement holds for $n + 1$, thus proving the induction step and the main statement.
Formula (B.4) is called the Newton binomial formula. It motivates the name of binomial coefficients for the integers $\binom{n}{k}$. In particular, setting $a = 1$ and $b = x$ yields
$$(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$$
which can be used to prove that if a finite set has cardinality $n$, then its power set has cardinality $2^n$ (cf. Proposition 257). Indeed, there is only one, $1 = \binom{n}{0}$, subset with $0$ elements (the empty set), $n = \binom{n}{1}$ subsets with only one element, $\binom{n}{2}$ subsets with two elements, ..., and finally only one, $1 = \binom{n}{n}$, subset – the set itself – with all the $n$ elements.
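Both the binomial formula (B.4) and the $2^n$ subset count can be verified mechanically (our sketch, not from the book, using Python's built-in binomial coefficient `math.comb`):

```python
from math import comb

# Sketch (not from the book): checking the Newton binomial formula (B.4)
# and the 2^n subset count obtained from it with a = b = 1 (i.e., x = 1).

def binomial_expansion(a, b, n):
    return sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

check = binomial_expansion(3, 5, 7) == (3 + 5) ** 7
subsets = sum(comb(10, k) for k in range(11))   # = 2^10 subsets of a 10-set
```

Summing $\binom{10}{k}$ over $k = 0, \dots, 10$ gives $2^{10} = 1024$, the number of subsets of a set with 10 elements.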
Appendix C
Notions of trigonometry
C.1 Generalities
We call trigonometric circle the unit circle centered at the origin, with radius $1$, oriented counterclockwise, and on which one moves starting from the point of coordinates $(1, 0)$.
[Figure: the trigonometric circle, with center $O$ and starting point $(1, 0)$.]
Clearly, each point on the circle determines an angle between the positive horizontal axis and the straight line joining the point with the origin; vice versa, each angle determines a point on the circle. This correspondence between points and angles can, equivalently, be viewed as a correspondence between points and arcs of the circle, as the following figure illustrates.
[Figure: a point $P = (P_1, P_2)$ on the trigonometric circle, together with the angles $\alpha$ and $\alpha'$ that it determines.]
Angles are usually measured in either degrees or radians. A degree is the 360th part of a round angle (corresponding to a complete round of the circle); a radian is an, apparently strange, unit of measure that assigns measure $2\pi$ to a round angle; it is therefore its $2\pi$-th part. We will use the radian as the unit of measure of angles because it presents some advantages over the degree. In any case, the next table lists some equivalent values of degrees and radians.
degrees: $0$, $30$, $45$, $60$, $90$, $180$, $270$, $360$
radians: $0$, $\frac{\pi}{6}$, $\frac{\pi}{4}$, $\frac{\pi}{3}$, $\frac{\pi}{2}$, $\pi$, $\frac{3\pi}{2}$, $2\pi$
Angles that differ by one or more complete rounds of the circle are identical: to write $\alpha$ or $\alpha + 2k\pi$, with $k \in \mathbb{Z}$, is the same. We will therefore always take $0 \leq \alpha < 2\pi$.
Fix a point $P = (P_1, P_2)$ on the trigonometric circle, as in the previous figure. The sine of the angle $\alpha$ determined by the point $P$ is the ordinate $P_2$ of such point, while the cosine of $\alpha$ is the abscissa $P_1$.
The sine and the cosine of the angle $\alpha$ are denoted, respectively, by $\sin \alpha$ and $\cos \alpha$. The sine is positive in quadrants I and II, and negative in quadrants III and IV. The cosine is positive in quadrants I and IV, and negative in quadrants II and III. For example,
$\alpha$: $0$, $\frac{\pi}{4}$, $\frac{\pi}{2}$, $\pi$, $\frac{3\pi}{2}$, $2\pi$
$\sin \alpha$: $0$, $\frac{\sqrt{2}}{2}$, $1$, $0$, $-1$, $0$
$\cos \alpha$: $1$, $\frac{\sqrt{2}}{2}$, $0$, $-1$, $0$, $1$
$$\sin^2 \alpha = \frac{\tan^2 \alpha}{1 + \tan^2 \alpha}$$
Finally, the reciprocals of sine, cosine, and tangent are called cosecant, secant, and cotangent, respectively.
Next we list some formulas that we do not prove (in any case, it would be enough to prove the first two because the others are simple consequences).
$$\sin(\alpha + \beta) = \sin \alpha \cos \beta + \sin \beta \cos \alpha, \qquad \cos(\alpha + \beta) = \cos \alpha \cos \beta - \sin \alpha \sin \beta$$
and
$$\sin(\alpha - \beta) = \sin \alpha \cos \beta - \sin \beta \cos \alpha, \qquad \cos(\alpha - \beta) = \cos \alpha \cos \beta + \sin \alpha \sin \beta \tag{C.4}$$
and
$$\sin \frac{\alpha}{2} = \sqrt{\frac{1 - \cos \alpha}{2}}, \qquad \cos \frac{\alpha}{2} = \sqrt{\frac{1 + \cos \alpha}{2}}$$
Prostaphaeresis formulas (addition and subtraction):
$$\sin \alpha + \sin \beta = 2 \sin \frac{\alpha + \beta}{2} \cos \frac{\alpha - \beta}{2}, \qquad \sin \alpha - \sin \beta = 2 \cos \frac{\alpha + \beta}{2} \sin \frac{\alpha - \beta}{2}$$
and
$$\cos \alpha + \cos \beta = 2 \cos \frac{\alpha + \beta}{2} \cos \frac{\alpha - \beta}{2}, \qquad \cos \alpha - \cos \beta = -2 \sin \frac{\alpha + \beta}{2} \sin \frac{\alpha - \beta}{2}$$
C.2 Concerto d'archi (string concert)
We close with a few classic theorems that show how trigonometry is intimately linked to the study of triangles. In these theorems $a$, $b$, $c$ denote the lengths of the three sides of a triangle and $\alpha$, $\beta$, $\gamma$ the angles opposite to them.
Theorem 1544 (Law of Sines) Sides are proportional to the sines of their opposite angles, that is,
$$\frac{a}{\sin \alpha} = \frac{b}{\sin \beta} = \frac{c}{\sin \gamma}$$
An interesting consequence of the law of sines is that the area of a triangle can be expressed in trigonometric form via the lengths of two sides and the angle opposite to the third side. Specifically, if the two sides are $b$ and $c$, the area is
$$\frac{1}{2} \, b c \sin \alpha \tag{C.5}$$
Indeed, draw in the last figure a perpendicular from the top vertex to the side of length $c$, and denote its length by $h$. From, at least, high school we know that the area of the triangle
is $ch/2$ (it is the classic "half the base times the height" formula). Consider the right triangle that has the side of length $b$ as hypotenuse and the perpendicular of length $h$ as a cathetus. By the law of sines,
$$\frac{h}{\sin \alpha} = \frac{b}{\sin \frac{\pi}{2}}$$
So, $h = b \sin \alpha$. From the high school formula $ch/2$ the trigonometric formula (C.5) then follows.
Example 1545 Some important geometric figures in the plane can be subdivided into triangles, so their area can be recovered by adding up the areas of such triangles. For instance, consider a regular polygon with $n$ sides of equal length and $n$ central angles of equal measure $2\pi/n$. For example, in the following figure we have a hexagon with six sides of equal length and six central angles of equal measure $\pi/3$ (i.e., $60$ degrees).
Denote by $r$ the radius of this regular polygon. Each regular polygon is partitioned into $n$ identical isosceles triangles with two sides of equal length $r$. For instance, in the hexagon there are six such triangles. By formula (C.5), the area of each of these identical isosceles triangles is $\frac{1}{2} r^2 \sin \frac{2\pi}{n}$, so the area of the polygon is
$$\frac{n}{2} \, r^2 \sin \frac{2\pi}{n} \tag{C.6}$$
For example, the area of the hexagon is $3 r^2 \sqrt{3}/2$ since $\sin \pi/3 = \sqrt{3}/2$.
The subdivision of geometric figures of the plane into triangles is called triangulation, an important technique that may make it possible to reduce the study of geometric figures to that of triangles (by taking limits via arbitrarily small triangles, the technique becomes especially powerful). N
Example 1546 The famous number $\pi$ can be defined as the area of the closed unit ball.
[Figure: the closed unit ball centered at the origin $O$.]
To compute $\pi$ amounts to computing this area, a problem that Archimedes famously approached via the method of exhaustion. This method considers the areas of inscribed and circumscribed polygons, which provide lower and upper approximations for $\pi$, respectively. Indeed, the area of any inscribed polygon is always $\leq \pi$, while the area of any circumscribed polygon is always $\geq \pi$. For instance, consider a regular polygon inscribed in the closed unit ball, like the hexagon.
By increasing the number of sides, we get larger and larger inscribed regular polygons that provide better and better lower approximations of $\pi$. The area of each such polygon is given by formula (C.6). Since their radius $r$ is $1$, we thus have the lower approximations
$$\frac{n}{2} \sin \frac{2\pi}{n} \leq \pi \qquad \forall n \geq 1$$
that are better and better as $n$ increases. At the limit, we have:
$$\lim_{n \to \infty} \frac{n}{2} \sin \frac{2\pi}{n} = \pi$$
Similarly, by increasing the number of sides we get smaller and smaller circumscribed regular polygons that provide better and better upper approximations of $\pi$. The radius $r$ of the circumscribed regular polygon with $n$ sides is the length of the equal sides of the isosceles triangles in which it can be partitioned. So, $r = 1/\cos(\pi/n) > 1$, as the reader can check with the help of the next figure.
Summing up,
$$\frac{n}{2} \sin \frac{2\pi}{n} \leq \pi \leq \frac{n}{2 \cos^2 \frac{\pi}{n}} \sin \frac{2\pi}{n} \qquad \forall n \geq 1 \tag{C.7}$$
Via a trigonometric argument, we thus showed that the areas of the inscribed and circumscribed regular polygons provide lower and upper approximations of $\pi$ that, as the number of sides increases, sandwich $\pi$ better and better till, in the limit of "infinitely many sides", they reach $\pi$ as their common limit value.$^{1}$
The trigonometric approximations (C.7) thus justify the use of the method of exhaustion to compute $\pi$. Archimedes was able to compute the areas of the inscribed and circumscribed regular polygons till $n = 96$, getting the remarkable approximation
$$3.1408 \approx 3 + \frac{10}{71} \leq \pi \leq 3 + \frac{1}{7} \approx 3.1429$$
$^{1}$ The role of $\pi$ in the approximations is to identify radians, so the actual knowledge of $\pi$ is not needed (thus, there is no circularity in using these approximations for $\pi$).
By computing the areas of the inscribed and circumscribed regular polygons for larger and
larger n, we get better and better approximations of π. N
We close with a result that generalizes Pythagoras' Theorem, which is the special case
in which the triangle is right and side a is the hypotenuse (indeed, $\cos\alpha = \cos(\pi/2) = 0$).
C.3 Perpendicularity
The trigonometric circle consists of the points x ∈ ℝ² of unit norm, that is, ‖x‖ = 1. Hence,
any point x = (x₁, x₂) ∈ ℝ² can be moved back onto the unit circle by dividing it by its norm
‖x‖, since
$$\left\|\frac{x}{\|x\|}\right\| = 1$$
The following picture illustrates:
It follows that
$$\sin\theta = \frac{x_2}{\|x\|} \quad\text{and}\quad \cos\theta = \frac{x_1}{\|x\|} \tag{C.8}$$
that is,
$$x = (\|x\|\cos\theta, \|x\|\sin\theta)$$
This trigonometric representation of the vector x is called polar. The components ‖x‖ cos θ
and ‖x‖ sin θ are called polar coordinates.
The angle θ can be expressed through the inverse trigonometric functions arcsin x,
arccos x, and arctan x. To this end, observe that
$$\tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{x_2/\|x\|}{x_1/\|x\|} = \frac{x_2}{x_1}$$
The equality θ = arctan(x₂/x₁) is especially important because it permits us to express the angle
θ as a function of the coordinates of the point x = (x₁, x₂).
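In numerical work the quadrant ambiguity of arctan(x₂/x₁) is usually resolved by a two-argument arctangent. A sketch of the polar representation (function names are ours; `atan2` also handles x₁ = 0, where the quotient x₂/x₁ is undefined):

```python
import math

def to_polar(x1, x2):
    # Norm and angle of the vector x = (x1, x2); atan2 picks the correct
    # quadrant, which arctan(x2/x1) alone cannot do.
    r = math.hypot(x1, x2)
    theta = math.atan2(x2, x1)
    return r, theta

def from_polar(r, theta):
    # Inverse map: x = (||x|| cos(theta), ||x|| sin(theta))
    return r * math.cos(theta), r * math.sin(theta)
```

For instance, `to_polar(1.0, 1.0)` returns the norm √2 and the angle π/4, and `from_polar` recovers the original coordinates.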
Let x and y be two vectors in the plane ℝ² that determine the angles α and β:

By (C.4), we have
$$x \cdot y = x_1 y_1 + x_2 y_2 = \|x\|\|y\|(\cos\alpha\cos\beta + \sin\alpha\sin\beta) = \|x\|\|y\|\cos(\alpha - \beta)$$
that is,
$$\frac{x \cdot y}{\|x\|\,\|y\|} = \cos(\alpha - \beta)$$
where α − β is the angle that is the difference of the angles determined by the two points.
This angle is a right one, i.e., the vectors x and y are "perpendicular", when
$$\frac{x \cdot y}{\|x\|\,\|y\|} = \cos\frac{\pi}{2} = 0$$
that is, if and only if x · y = 0. In other words, two vectors in the plane ℝ² are perpendicular
when their inner product is zero.
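The zero-inner-product test translates directly into code. A sketch (the function names and the numerical tolerance are ours; a tolerance is needed because floating-point inner products are rarely exactly zero):

```python
def inner(x, y):
    # Inner product x . y in R^2 (works for R^n as well)
    return sum(a * b for a, b in zip(x, y))

def perpendicular(x, y, tol=1e-12):
    # Two vectors are perpendicular iff their inner product is zero
    return abs(inner(x, y)) <= tol
```

For instance, (2, 1) and (−1, 2) are perpendicular, while (1, 1) and (1, 0) are not.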
Appendix D

Elements of intuitive logic

In this chapter we will introduce some basic notions of logic. Though, "logically", these
notions should actually be placed at the beginning of a textbook, they are best appreciated
after having learned some mathematics (even if in a logically disordered way). This is why
this chapter is an appendix, leaving it to the reader to judge when it is best to read it.
D.1 Propositions
We call proposition a statement that can be either true or false. For example, "ravens are
black" and "in the year 1965 it rained in Milan" are propositions. On the contrary, the
statement "in the year 1965 it was cold in Milan" is not a proposition, unless we specify
the meaning of cold, for example with the proposition "in the year 1965 the temperature
went below zero in Milan".
We will denote propositions by letters such as p, q, .... Moreover, for the sake of brevity
we will denote by 1 and 0, respectively, the truth and the falsity of a proposition: these are
called truth values.
D.2 Operations
Let us list some operations on propositions.
(i) Negation. Let p be a proposition; its negation, denoted by ¬p, is the proposition that
is true when p is false and that is false when p is true. We can summarize the definition
in the following truth table

    p   ¬p
    1   0
    0   1

which reports the truth values of p and ¬p. For instance, if p is "in the year 1965 it
rained in Milan", then ¬p is "in the year 1965 it did not rain in Milan".
(ii) Conjunction. Let p and q be two propositions; the conjunction of p and q, denoted by
p ∧ q, is the proposition that is true when p and q are both true and is false when at
least one of them is false. The truth table is:

    p   q   p ∧ q
    1   1   1
    1   0   0
    0   1   0
    0   0   0

For instance, if p is "in the year 1965 it rained in Milan" and q is "in the year 1965
the temperature went below zero in Milan", then p ∧ q is "in the year 1965 it rained in
Milan and the temperature went below zero".
(iii) Disjunction. Let p and q be two propositions; the disjunction of p and q, denoted by
p ∨ q, is the proposition that is true when at least one between p and q is true and is
false when both of them are false.1 The truth table is:

    p   q   p ∨ q
    1   1   1
    1   0   1
    0   1   1
    0   0   0

For instance, with the previous examples of p and q, the disjunction p ∨ q is "in the year 1965 it
rained in Milan or the temperature went below zero".
(iv) Conditional. Let p and q be two propositions; the conditional of p and q, denoted by
p ⟹ q, is the proposition that is false only when p is true and q is false. The truth
table is:

    p   q   p ⟹ q
    1   1   1
    1   0   0
    0   1   1
    0   0   1        (D.1)

The conditional is therefore true if, when p is true, also q is true, or if p is false (in
which case the truth value of q is irrelevant). The proposition p is called the antecedent
and q the consequent. For instance, suppose the antecedent p is "I go on vacation"
and the consequent q is "I go to the sea"; the conditional p ⟹ q is "If I go on
vacation, then I go to the sea".
(v) Biconditional. Let p and q be two propositions; the biconditional of p and q, denoted
by p ⟺ q, is the proposition that is true when the conditionals p ⟹ q and q ⟹ p
are both true. The truth table is:

    p   q   p ⟹ q   q ⟹ p   p ⟺ q
    1   1   1         1         1
    1   0   0         1         0
    0   1   1         0         0
    0   0   1         1         1

The biconditional is, therefore, true when p and q are both true or both false, i.e.,
when the two involved conditionals are both true. With the last example of p and q,
the biconditional p ⟺ q is "I go on vacation if and only if I go to the sea".
These five logical operations allow us to build new propositions from old ones. Starting
from the three propositions p, q, and r, through negation, disjunction, and the conditional we
can build, for example, the proposition
¬((p ∨ ¬q) ⟹ r)
O.R. The true–false dichotomy originates in the Eleatic school, which based its dialectics
upon it (Section 1.8). Apparently, it first appears as "[a thing] is or it is not" in the poem
of Parmenides (trans. Raven). A serious challenge to the universal validity of the true–false
dichotomy has been posed by some paradoxes, old and new. We already encountered the set-theoretic
paradox of Russell (Section 1.1.4). A simpler, much older, paradox is that of the
liar: consider the self-referential proposition "this proposition is false". Is it true or false?
Maybe it is both.2 Be that as it may, in many matters – in mathematics, let alone in the
empirical sciences – the dichotomy can be safely assumed.
A tautology is a proposition that is always true, whatever the truth values of the propositions
that compose it, while a contradiction is a proposition that is always false. In other words,
the symbol 0 denotes a generic contradiction and the symbol 1 a generic tautology.
Two propositions p and q are said to be (logically) equivalent, written p ≡ q, when they
have the same truth values, i.e., they are always both true or both false. In other words, two
propositions p and q are equivalent when the biconditional p ⟺ q is a tautology, i.e., it
is always true. The relation ≡ is called logical equivalence.
The following properties are evident: the conjunction p ∧ ¬p is always false, while the
disjunction p ∨ ¬p is always true:

    p   ¬p   p ∧ ¬p   p ∨ ¬p
    1   0    0        1
    0   1    0        1

That is, p ∧ ¬p is a contradiction (law of non-contradiction) and p ∨ ¬p is a tautology (law
of the excluded middle). If p is the proposition "all ravens are black", the contradiction p ∧ ¬p
is "all ravens are both black and non-black" and the tautology p ∨ ¬p is "all ravens are either
black or non-black".
De Morgan's laws state that
¬(p ∧ q) ≡ ¬p ∨ ¬q   and   ¬(p ∨ q) ≡ ¬p ∧ ¬q
They can be proved through the truth tables; we confine ourselves to the first law:

    p   q   p ∧ q   ¬(p ∧ q)   ¬p   ¬q   ¬p ∨ ¬q
    1   1   1       0          0    0    0
    1   0   0       1          0    1    1
    0   1   0       1          1    0    1
    0   0   0       1          1    1    1

The table shows that the truth values of ¬(p ∧ q) and of ¬p ∨ ¬q are identical, as claimed.
Note an interesting duality: the laws of non-contradiction and of the excluded middle can
be derived one from the other via de Morgan’s laws.
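Checking an equivalence with a truth table is a purely mechanical enumeration, and so is easy to automate. A sketch (the helper name `equivalent` is ours):

```python
from itertools import product

def equivalent(f, g, nvars):
    # Two propositional forms are equivalent iff they agree on
    # every truth assignment to their variables
    return all(f(*vals) == g(*vals) for vals in product([True, False], repeat=nvars))

# First De Morgan law: not(p and q)  ==  (not p) or (not q)
assert equivalent(lambda p, q: not (p and q), lambda p, q: (not p) or (not q), 2)
# Second De Morgan law: not(p or q)  ==  (not p) and (not q)
assert equivalent(lambda p, q: not (p or q), lambda p, q: (not p) and (not q), 2)
# Excluded middle: p or not p is a tautology
assert all(p or not p for p in (True, False))
```

The enumeration over 2ⁿ assignments is exactly what a truth table with n propositional letters does by hand.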
N.B. Given two equivalent propositions, one of them is a tautology if and only if the other
one is so. O
D.4 Deduction
D.4.1 Theorems and proofs
An equivalence is a biconditional which is a tautology, i.e., which is always true. In a similar
vein, we call implication a conditional which is a tautology, that is, such that (p ⟹ q) ≡ 1. In this
case, if p is true then also q is true.3 We say that q is a logical consequence of p, written
p ⊨ q.

The antecedent p is now called the hypothesis and the consequent q the thesis. Naturally, we
have p ≡ q when simultaneously p ⊨ q and q ⊨ p.
In our naive setup, a theorem is a proposition of the form p ⊨ q, that is, an implication.
The proof is a logical argument that establishes that the conditional p ⟹ q is actually an
implication.4 To do this it is necessary to establish that, if the hypothesis p is true, then
also the thesis q is true. Usually one chooses among the following three different types of
proof:
3 When p is false the implication is automatically true, as the truth table (D.1) shows.
4 In these introductory notes we remain vague about what a "logical argument" is, leaving a more detailed
analysis to more advanced courses. We expect, however, that readers can (intuitively) recognize, and
elaborate, such arguments.
(a) direct proof: p ⊨ q, i.e., establish directly that, if p is true, so is q;

(b) proof by contraposition: ¬q ⊨ ¬p, i.e., establish that the contrapositive ¬q ⟹ ¬p
is a tautology (i.e., that if q is false, so is p);

(c) proof by contradiction (reductio ad absurdum): p ∧ ¬q ⊨ r ∧ ¬r, i.e., establish that
the conditional p ∧ ¬q ⟹ r ∧ ¬r is a tautology (i.e., that, if p is true and q is false,
we reach a contradiction r ∧ ¬r).
The proof by contraposition relies on the equivalence (D.2) and is, basically, an upside-down
direct proof (for instance, Theorem 1554 will be proved by contraposition). For this
reason, in what follows we will focus on the two main types of proof, direct and by
contradiction.
N.B. (i) When both p ⊨ q and q ⊨ p hold, the theorem takes the form of an equivalence p ≡ q.
The implications p ⊨ q and q ⊨ p are independent and each of them requires its own proof
(this is why in the book we studied separately the "if" and the "only if"). (ii) When, as is
often the case, the hypothesis is the conjunction of several propositions, we write
p₁ ∧ ⋯ ∧ pₙ ⊨ q   (D.4)
So, the scope of the implication p ⊨ q is broader than it may appear prima facie. O
Direct proofs are, however, often articulated in several steps, in a divide et impera spirit.
In this regard, the next result is key.
Proposition 1549 j= is transitive.
Proof Assume p j= r and r j= q. We have to show that p =) q is a tautology, that is, that
if p is true, then q is true. Assume that p is true. Then, r is true because p j= r. In turn,
this implies that q is true because r j= q.
Example 1550 (i) Let p be "n² + 1 is odd" and q be "n is even". To prove p ⊨ q,
let us consider the auxiliary proposition r = "n² is even". The implication p ⊨ r is obvious,
while the implication r ⊨ q will be proved momentarily (Theorem 1553). Jointly, these two
implications provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition "if n² + 1
is odd, then n is even". (ii) Let p be "the scalar function f is differentiable" and q
be "the scalar function f is integrable". To prove p ⊨ q it is natural to consider the auxiliary
proposition r = "the scalar function f is continuous". The implications p ⊨ r and r ⊨ q are
basic calculus results that, jointly, provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the
proposition "if the scalar function f is differentiable, then it is integrable". N
    p   q   p ∧ ¬q   r ∧ ¬r   p ⟹ q   p ∧ ¬q ⟹ r ∧ ¬r
    1   1   0        0        1         1
    1   0   1        0        0         0
    0   1   0        0        1         1
    0   0   0        0        1         1

The table establishes the equivalence
(p ⟹ q) ≡ (p ∧ ¬q ⟹ r ∧ ¬r)   (D.6)
that is,
(p ⟹ q) ≡ (p ∧ ¬q ⟹ 0)
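Equivalence (D.6) can also be checked mechanically by enumerating the eight truth assignments of p, q, and r. A sketch (the helper name `implies` is ours; it encodes the conditional as ¬a ∨ b):

```python
from itertools import product

def implies(a, b):
    # Material conditional: a => b is false only when a is true and b is false
    return (not a) or b

# (D.6): (p => q) is equivalent to (p and not q) => (r and not r), whatever r is
for p, q, r in product([True, False], repeat=3):
    assert implies(p, q) == implies(p and not q, r and not r)
```

The loop succeeds because r ∧ ¬r is always false, so the right-hand conditional is true exactly when its antecedent p ∧ ¬q is false, i.e., exactly when p ⟹ q is true.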
The proof by contradiction is the most intriguing (recall Section 1.8 on the birth of the
deductive method). We illustrate it with one of the gems of Greek mathematics that we saw
in the first chapter. For brevity, we do not repeat the proof of the first chapter and just
present its logical analysis.

Theorem 1552 $\sqrt{2} \notin \mathbb{Q}$.
Logical analysis In this, as in other theorems, it might seem that there is no hypothesis, but
it is not so: the hypothesis is simply concealed. For example, here the concealed hypothesis
is "the axioms of arithmetic, in particular those about arithmetical operations, hold". Let
a be this concealed hypothesis,5 let q be the thesis "$\sqrt{2} \notin \mathbb{Q}$", and let r be the proposition
"m/n is reduced to its lowest terms". The scheme of the proof is a ∧ ¬q ⊨ r ∧ ¬r, i.e., if
arithmetical operations apply, the negation of the thesis leads to a contradiction.
An important special case of the equivalence (D.6) is when the role of r is played by the
hypothesis p itself. In this case, (D.6) becomes
(p ⟹ q) ≡ (p ∧ ¬q ⟹ p ∧ ¬p)
As the truth table

    p   q   p ⟹ q   p ∧ ¬q   ¬p   p ∧ ¬q ⟹ ¬p   p ∧ ¬q ⟹ p ∧ ¬p
    1   1   1         0        0    1               1
    1   0   0         1        0    0               0
    0   1   1         0        1    1               1
    0   0   1         0        1    1               1

shows, we then also have
(p ⟹ q) ≡ (p ∧ ¬q ⟹ ¬p)
In words, it is necessary to show that the hypothesis and the negation of the thesis imply,
jointly, the negation of the hypothesis. Let us see an example.
Theorem 1553 If n² is even, then n is even.

Proof Let us assume, by contradiction, that n is odd. Then n² is odd, which contradicts
the hypothesis.

Logical analysis Let p be the hypothesis "n² is even" and q the thesis "n is even". The
scheme of the proof is p ∧ ¬q ⊨ ¬p.
5 This discussion will become clearer after the next section on the deductive method. In any case, we can
think of a = a₁ ∧ ⋯ ∧ aₙ as the conjunction of a collection A = {a₁, ..., aₙ} of axioms of arithmetic (in our
naive setup, we do not worry whether all such axioms can be expressed via propositional calculus, an issue
that readers will study in more advanced courses). In terms of (D.7), in this theorem there is no specific
hypothesis p.
D.4.4 Summing up
Proofs require, in general, some inspiration: there are no recipes or mechanical rules that
can help us to find, in a proof by contradiction, an auxiliary proposition r that determines
the contradiction or, in a direct proof, the auxiliary propositions rᵢ that permit us to articulate
a direct argument.
As to terminology, the implication p ⊨ q can be read in different, but equivalent, ways:

(i) p implies q;

(ii) if p, then q;

(iii) p only if q;

(iv) q if p;

(v) p is a sufficient condition for q;

(vi) q is a necessary condition for p.

The choice among these versions is a matter of expositional convenience. Similarly, the
equivalence p ≡ q can be read as "p if and only if q" or as "p is a necessary and sufficient
condition for q".
For example, the next simple result shows that the implication "a > 1 ⊨ a² > 1" is
true, i.e., that "a > 1 is a sufficient condition for a² > 1", i.e., that "a² > 1 is a necessary
condition for a > 1".
A ∪ {p} ⊨ q   (D.7)
That is, the premises consist of the axioms as well as of a specific hypothesis (which, of course, can in
turn be the conjunction of several propositions). Note that here A ∪ {p} ⊨ q stands for a ∧ p ⊨ q,
where a = a₁ ∧ ⋯ ∧ aₙ is the conjunction of the axioms A = {a₁, ..., aₙ}.
Normally, to ease exposition, axioms are omitted in theorems' statements because they
are taken for granted within the mathematical theory at hand. So, we just write p ⊨ q in
place of A ∪ {p} ⊨ q. For instance, in Euclidean geometry theorems do not mention the
axioms upon which they rely, for instance the parallel axiom, but only the specific hypothesis
of the theorem.
The scope of a mathematical theory is given by the propositions that, via theorems (D.7),
can be established to be true from the axioms in A and from a specific hypothesis p (which is
required not to contradict the axioms, i.e., a ∧ p is not a contradiction). If these hypotheses
follow from the axioms, (D.7) is actually A ⊨ q. If the axiomatic system is complete, all
theorems then take the form A ⊨ q.
6 As Tarski (1994) writes on p. 110. Alfred Tarski has been, along with David Hilbert and Giuseppe Peano,
a central figure in the modern analysis of the deductive method in mathematics. We refer readers to his book
for a masterly introduction to the subject.
Proof We have a₂ ⊨ r, where r = "z ∼ z and y ∼ z imply z ∼ y for all y, z ∈ I". So, the
proof relies on the deduction scheme a₁ ∧ a₂ ⊨ a₁ ∧ r ⊨ q.7

Thus, under the axioms A.1 and A.2, the binary relation ∼ is symmetric. It is easily checked
to be also transitive.
D.5.4 Interpretations
The specific meaning attached to the primitive terms is irrelevant for the formal deductions
carried out via (D.7). For instance, following again Tarski (1994), consider an alternative
interpretation of the primitive terms of the previous theory in which I now indicates a set of
numbers and the symbol ∼ indicates a congruence relation in which x ∼ y reads as "there
is an integer z such that x − y = z". Axioms A.1 and A.2 and the resulting Theorem 1555
still apply.
So, the same mathematical theory may admit different interpretations, whose meaning is
understood outside the theory – which thus takes it for granted. The expression "self-evident"
is now replaced by this more general principle. For this reason, in modern mathematics
the emphasis is on the consistency of the axioms rather than on their self-evidence (as it
was in Greek geometry), a notion that implicitly refers to a specific interpretation. As
readers will learn in logic courses, axioms have their own syntactic life that abstracts from
any specific interpretation (semantics). For instance, in Tarski's miniature example the
underlying general abstract structure consists of a set X and a binary relation R on it. Any
interpretation of X and R provides a model for such abstract structure. The abstract axioms
are:
A.1 R is reflexive;

A.2 the proposition a₂ = "xRz and yRz imply xRy for all x, y, z ∈ X" is true.

7 It is easy to check using truth tables that from q ⊨ r it follows p ∧ q ⊨ p ∧ r for all propositions p, q and
r.
If we call Tarskian the property in A.2, we can state the abstract version of Theorem
1555 in a legible way. In all models of the abstract structure (X, R) this theorem holds,
suitably interpreted.
∃x ∈ ℝ, x² = 1   (D.9)
we would assert a (simple) truth: there is some real number (there are actually two of them:
x = ±1) whose square is 1.
Note that when the domain is finite, say X = {x₁, ..., xₙ}, the propositions (D.10)
and (D.11) can be written as p(x₁) ∨ ⋯ ∨ p(xₙ) and p(x₁) ∧ ⋯ ∧ p(xₙ), respectively.
Quantifiers transform, therefore, predicates into propositions, that is, into statements that
are either true or false. That said, if X is infinite, to verify whether proposition (D.11)
is true requires an infinite number of checks, i.e., whether p(x) is true for each x ∈ X.
Operationally, such a truth value cannot be determined. In contrast, to verify whether (D.11)
is false it is enough to exhibit one x ∈ X such that p(x) is false. There is, therefore, a clear
asymmetry between the operational content of the two truth values of (D.11). A large X
reinforces the asymmetry between verification and falsification that a large n already causes,
as we remarked in the Coda (a proposition "∀x ∈ X, p₁(x) ∧ ⋯ ∧ pₙ(x)" would combine, so
magnify, these two sources of asymmetry).

In contrast, the existential proposition (D.10) can be verified via an element x ∈ X such
that p(x) is true. Of course, if X is large (let alone if it is infinite), it may be operationally
not obvious how to find such an element. Be that as it may, falsification is in much bigger
trouble: to verify that proposition (D.10) is false we should check that, for all x ∈ X, the
proposition p(x) is false. Operationally, existential propositions are typically not falsifiable.
N.B. (i) In the book we often write "p(x) for every x ∈ X" in the form
p(x)   ∀x ∈ X
D.6.2 Algebra

In a sense, ∀ and ∃ each represent the negation of the other. So8
¬(∃x, p(x)) ≡ ∀x, ¬p(x)
and, symmetrically,
¬(∀x, p(x)) ≡ ∃x, ¬p(x)
In the example where p(x) is "x² = 1", we can equally well write:
¬(∀x, x² = 1)   or   ∃x, x² ≠ 1
(respectively: it is not true that x² = 1 for every x, and it is true that for some x one has
x² ≠ 1).
More generally,
¬(∀x, ∃y, p(x, y)) ≡ ∃x, ∀y, ¬p(x, y)
For example, let p(x, y) be the proposition "x + y² = 0". We can equally assert that
¬(∀x, ∃y, x + y² = 0)
(it is not true that, for every x ∈ ℝ, we can find a value of y ∈ ℝ such that the sum x + y²
is zero: it is sufficient to take x = 5) or
∃x, ∀y, x + y² ≠ 0
(it is true that there exists some value of x ∈ ℝ such that, for every choice of y ∈ ℝ, one has
x + y² ≠ 0: it is sufficient to take any x > 0).

8 To ease notation, in the quantifiers we omit the clause "∈ X".
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0 \;\Longrightarrow\; \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$$
The set $\{x^i\}_{i=1}^m$ has been, instead, called linearly dependent if it is not linearly independent,
i.e., if there exists a set $\{\alpha_i\}_{i=1}^m$ of real numbers, not all equal to zero, such that
$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$.

We can write these notions by making the role of predicates explicit. Let $p(\alpha_1, ..., \alpha_m)$ and
$q(\alpha_1, ..., \alpha_m)$ be the predicates "$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$" and "$\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$",
respectively. The set $\{x^i\}_{i=1}^m$ is linearly independent when
$$\forall \{\alpha_i\}_{i=1}^m,\; p(\alpha_1, ..., \alpha_m) \Longrightarrow q(\alpha_1, ..., \alpha_m)$$
and linearly dependent when
$$\exists \{\alpha_i\}_{i=1}^m,\; \neg\big(p(\alpha_1, ..., \alpha_m) \Longrightarrow q(\alpha_1, ..., \alpha_m)\big)$$
that is, when
$$\exists \{\alpha_i\}_{i=1}^m,\; p(\alpha_1, ..., \alpha_m) \wedge \neg q(\alpha_1, ..., \alpha_m)$$
In other words, a sequence {xₙ} does not converge to a point L ∈ ℝ if there exists ε > 0
such that for each k ≥ 1 there is n ≥ k such that
$$|x_n - L| \ge \varepsilon$$
By denoting by n_k any such n ≥ k,9 we define a subsequence {x_{n_k}} such that |x_{n_k} − L| ≥ ε
for all k ≥ 1. So, we have the following useful characterization of non-convergence to a given
point.

Proposition 1557 A sequence {xₙ} does not converge to a point L ∈ ℝ if and only if there
exist ε > 0 and a subsequence {x_{n_k}} such that |x_{n_k} − L| ≥ ε for all k ≥ 1.
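Proposition 1557 in action: for xₙ = (−1)ⁿ and L = 0, indices witnessing |xₙ − L| ≥ ε are easy to collect. A sketch (the function name is ours; the loop terminates only because this sequence really does not converge to L):

```python
def witness_indices(x, L, eps, how_many):
    # Collect indices n with |x(n) - L| >= eps. If x does not converge to L,
    # such indices exist beyond every k, so they form a subsequence x_{n_k}.
    indices = []
    n = 1
    while len(indices) < how_many:
        if abs(x(n) - L) >= eps:
            indices.append(n)
        n += 1
    return indices

x = lambda n: (-1) ** n      # the sequence -1, 1, -1, 1, ...
idx = witness_indices(x, L=0, eps=1, how_many=5)  # here every index qualifies
```

Since |(−1)ⁿ − 0| = 1 for every n, the collected indices are simply 1, 2, 3, 4, 5, and the whole sequence is its own witnessing subsequence.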
A predicate p(x) defined on a set X can be identified with the subset A of X consisting of
the elements at which it is true:
p(x) is true ⟺ x ∈ A
So, predicates and sets are two sides of the same coin. Indeed, predicates formalize the
specification of sets via a property that their elements have in common, as we mentioned at
the very beginning of the book.
In a similar vein, a binary predicate p(x, y) with two arguments that belong to the same
set X can be identified with the binary relation R on X consisting of all pairs (x, y) such
that the proposition p(x, y) is true, i.e., R = {(x, y) ∈ X × X : p(x, y) is true}. Clearly,
p(x, y) is true ⟺ (x, y) ∈ R
We conclude that binary predicates and binary relations, too, are two sides of the same coin.
In general, predicates with n arguments can be identified with n-ary relations, as readers
will learn in more advanced courses. In any case, the set-theoretic translation of these key
logical notions is a further wonder of Cantor's paradise.
D.7 Coda: the logic of empirical scientific theories

Consider the collection P of the propositions that can be formulated in terms of the primitive
terms posited by the theory, and of defined terms. So written, the propositions in P are either
true or false, and the collection P describes the empirical phenomenon under investigation.11
A function v : P → {0, 1} assigns a truth value to all propositions in P. Each truth
assignment v corresponds to a possible configuration of the empirical reality in which the
propositions in P are either true or false. Each truth assignment is, thus, a possible
interpretation that reality may give P. There is a unique true v* because there is a unique true
empirical reality.

Let V be the collection of all truth assignments. A proposition p ∈ P is a tautology if
v(p) = 1 for all v ∈ V and is a contradiction if v(p) = 0 for all v ∈ V. In words, a tautology
is a proposition that is true under all interpretations, while a contradiction is a proposition
that is false under all of them. The truth value of tautologies and contradictions thus depends
only on their form, regardless of any interpretation that they can take.12
In this language, p ⊨ q amounts to requiring that v(p) ≤ v(q) for all v ∈ V.

Proof Let p ⊨ q. If p is true, also q is true (both values equal to 1); if p is false (value 0), q
can be true or false (value either 0 or 1). Thus, v(p) ≤ v(q) for all v ∈ V. The converse is
easily checked.
Let v* be the true configuration of the empirical reality under investigation. A scientific
theory takes a stance about the empirical reality that it is studying by positing a consistent
collection A = {a₁, ..., aₙ} of propositions, called axioms, that are assumed to be true under
the (unknown) true configuration v*, i.e., it is assumed that v*(aᵢ) = 1 for each i = 1, ..., n.
All propositions that are logical consequences of the axioms are then assumed to be true
under v*.13 In particular, if A is complete, the truth value of all propositions in P can be, in
principle, decided. So, the function v* is identified.
Example 1559 (i) A choice theory studies the behavior of a consumer who faces different
bundles of goods. Consider a choice theory that has two primitive terms, I and ∼ (cf. Section
D.5.3). The symbol I indicates the set of all bundles of goods available to the consumer. The
symbol ∼ indicates the consumer's indifference relation between the bundles, so that x ∼ y
reads as "for the consumer, bundle x is indifferent to bundle y".14 If the theory assumes
axioms A.1 and A.2, so the truth of propositions a₁ and a₂, then ∼ is symmetric (Theorem
1555) and transitive. By assuming these two axioms, the theory takes a stance about the
consumer's behavior, which is the empirical reality that it is studying. The theory is correct
as long as these axioms are true, i.e., v*(a₁) = v*(a₂) = 1.15 (ii) Special relativity is based
11 Of course, behind this sentence there are a number of highly non-trivial conceptual issues about meaning,
truth, reality, etc. (an early classical analysis of these issues can be found in Carnap, 1936).
12 The importance of propositions whose truth value is independent of any interpretation was pointed out
by Ludwig Wittgenstein in his famous Tractatus (the use of the term tautology in logic is due to him; he also
popularized the use of truth tables to handle truth assignments).
13 In the words of Wittgenstein: "If a god creates a world in which certain propositions are true, he creates
thereby also a world in which all propositions consequent on them are true." (Tractatus, proposition 5.123)
14 Needless to say, after congruence relations on segments and integers, the indifference relation on bundles
of goods is yet another model of the abstract structure (X, R) of Section D.5.4.
15 Different interpretations of this theory are, of course, possible. Debreu (1959) is a classic axiomatic work
in economics; in the preface of his book, Debreu writes that "Allegiance to rigor dictates the axiomatic form
of the analysis where the theory, in the strict sense, is logically entirely disconnected from its interpretations."
on two axioms: a₁ = "invariance of the laws of physics in all inertial frames of reference" and
a₂ = "the velocity of light in vacuum is the same in all inertial frames of reference". If v* is
the true physical configuration, the theory is true if v*(a₁) = v*(a₂) = 1. N
To decide whether a scientific theory is true we thus have to check whether v*(aᵢ) = 1
for each i = 1, ..., n. If n is large, operationally this might be complicated (infeasible if A is
infinite). In contrast, to falsify the theory it is enough to exhibit, directly, a proposition of
A that is false or, indirectly, a consequence of A that is false. This operational asymmetry
between verification and falsification (emphasized by Karl Popper in the 1930s) is an
important methodological aspect. Indirect falsification is, in general, the kind of falsification
that one might hope for. It is the so-called testing of the implications of a scientific theory.
In this indirect case, however, it is unclear which one of the posited axioms actually fails: in
fact, ¬(p₁ ∧ ⋯ ∧ pₙ) ≡ ¬p₁ ∨ ⋯ ∨ ¬pₙ. If not all the posited axioms have the same status,
only some of them being "core" axioms (as opposed to auxiliary ones), it is then unclear how
serious the falsification is. Indeed, falsification is often a chimera (especially in the social
sciences), as even the highly stylized setup of this section should suggest.
Appendix E
Mathematical induction
E.1 Generalities
Suppose that we want to prove that a proposition p(n), formulated for every natural number
n, is true for every such number. Intuitively, it is sufficient to show that the "initial"
proposition p(1) is true and that the truth of each proposition p(n) implies that of the
"subsequent" one p(n + 1). Next we formalize this domino argument:1
Theorem 1560 (Induction principle) Let p(n) be a proposition stated in terms of each
natural number n. Suppose that:

(i) Initial step: p(1) is true;

(ii) Induction step: for each n, if p(n) is true (induction hypothesis), then p(n + 1) is true.

Then, p(n) is true for every natural number n.

Proof Suppose, by contradiction, that proposition p(n) is false for some n. Denote by n₀
the smallest such n, which exists since every non-empty collection of natural numbers has
a smallest element.2 By (i), n₀ > 1. Moreover, by the definition of n₀, the proposition
p(n₀ − 1) is true. By (ii), p(n₀) is then true, a contradiction.

We illustrate this important type of proof by determining the sum of some important
series.
1 Think of many soldiers, one next to the other. The first has the "right scarlet fever", a rare form of
scarlet fever that instantaneously contaminates whoever is at the right of the sick person. All the soldiers
catch it because the first one infects the second one, the second one infects the third one, and so on and so
forth.
2 In the set-theoretic jargon, we say that ℕ is a well-ordered set.
(i) We have
$$1 + 2 + \cdots + n = \sum_{s=1}^{n} s = \frac{n(n+1)}{2}$$
Initial step. For n = 1 the property is trivially true:
$$1 = \frac{1(1+1)}{2}$$
Induction step. Assume it is true for n = k (induction hypothesis), that is,
$$\sum_{s=1}^{k} s = \frac{k(k+1)}{2}$$
We prove that it is then true for n = k + 1. Indeed,3
$$\sum_{s=1}^{k+1} s = \sum_{s=1}^{k} s + (k+1) = \frac{k(k+1)}{2} + k + 1 = \frac{(k+1)(k+2)}{2}$$
as claimed.
(ii) We have
$$1^2 + 2^2 + \cdots + n^2 = \sum_{s=1}^{n} s^2 = \frac{n(n+1)(2n+1)}{6}$$
Initial step. For n = 1 the property is trivially true:
$$1^2 = \frac{1(1+1)(2+1)}{6}$$
Induction step. By proceeding as above, we get:
$$\sum_{s=1}^{k+1} s^2 = \sum_{s=1}^{k} s^2 + (k+1)^2 = \frac{k(k+1)(2k+1)}{6} + (k+1)^2$$
$$= \frac{(k+1)\left[k(2k+1) + 6(k+1)\right]}{6} = \frac{(k+1)\left(2k^2 + 7k + 6\right)}{6} = \frac{(k+1)(k+2)(2k+3)}{6}$$
as claimed.
3 Alternatively, this sum can be derived by observing that the sum of the first and of the last addend is
n + 1, the sum of the second one and of the second-to-last one is still n + 1, and so on. There are n/2 pairs
and therefore the sum is (n + 1)n/2.
(iii) We have
$$1^3 + 2^3 + \cdots + n^3 = \sum_{s=1}^{n} s^3 = \left(\sum_{s=1}^{n} s\right)^2 = \frac{n^2(n+1)^2}{4}$$
Initial step. For n = 1 the property is trivially true:
$$1^3 = \frac{1^2(1+1)^2}{4}$$
Induction step. By proceeding as above, we get:
$$\sum_{s=1}^{k+1} s^3 = \sum_{s=1}^{k} s^3 + (k+1)^3 = \frac{k^2(k+1)^2}{4} + (k+1)^3$$
$$= \frac{(k+1)^2\left[k^2 + 4(k+1)\right]}{4} = \frac{(k+1)^2(k+2)^2}{4}$$
as claimed.
(iv) We have
$$a + aq + aq^2 + \cdots + aq^{n-1} = a\,\frac{1 - q^n}{1 - q}$$
for the sum of n terms of the geometric progression with first term a and common ratio q ≠ 1.
Initial step. For n = 1 the formula is trivially true:
$$a = a\,\frac{1 - q}{1 - q}$$
Induction step. By proceeding as above, we get:
$$\sum_{s=1}^{k+1} aq^{s-1} = a\,\frac{1 - q^k}{1 - q} + aq^k = a\,\frac{1 - q^k + q^k(1 - q)}{1 - q} = a\,\frac{1 - q^{k+1}}{1 - q}$$
as claimed.
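All four closed forms proved above are easy to spot-check numerically. A sketch (the helper name, and the choice of a and q, are ours; the geometric sum is compared with a relative tolerance because it is computed in floating point):

```python
def check(n, a=2.0, q=3.0):
    s = range(1, n + 1)
    # Sum of the first n integers, squares, and cubes
    assert sum(s) == n * (n + 1) // 2
    assert sum(k * k for k in s) == n * (n + 1) * (2 * n + 1) // 6
    assert sum(k ** 3 for k in s) == (n * (n + 1) // 2) ** 2
    # Geometric progression with first term a and common ratio q != 1
    geometric = sum(a * q ** k for k in range(n))
    assert abs(geometric - a * (1 - q ** n) / (1 - q)) < 1e-6 * abs(geometric)

for n in (1, 2, 10, 100):
    check(n)
```

A finite check like this is no substitute for the induction proofs, of course, but it is a quick guard against algebra slips in the closed forms.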
E.2 The harmonic series

Theorem (Mengoli) The harmonic series $\sum_{k=1}^{\infty} \frac{1}{k}$ diverges.

The proof is based on a couple of lemmas, the second of which is proven by induction.
Lemma For each integer k ≥ 2,
$$\frac{1}{k} \le \frac{1}{3}\left(\frac{1}{k-1} + \frac{1}{k} + \frac{1}{k+1}\right)$$

Proof Consider the convex function f : (0, ∞) → (0, ∞) defined by f(x) = 1/x. Since
$$k = \frac{1}{3}(k-1) + \frac{1}{3}k + \frac{1}{3}(k+1)$$
Jensen's inequality implies
$$\frac{1}{k} = f(k) = f\left(\frac{1}{3}(k-1) + \frac{1}{3}k + \frac{1}{3}(k+1)\right) \le \frac{1}{3}\big(f(k-1) + f(k) + f(k+1)\big) = \frac{1}{3}\left(\frac{1}{k-1} + \frac{1}{k} + \frac{1}{k+1}\right)$$
as claimed.
Let $s_n = \sum_{k=1}^{n} x_k$ be the n-th partial sum of the harmonic series, whose terms are $x_k = 1/k$.

Lemma For each n ≥ 1, we have $s_{3n+1} \ge 1 + s_n$.
Proof We proceed by induction. Initial step: n = 1. We apply the previous lemma for
k = 3:
$$s_{3 \cdot 1 + 1} = s_4 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} > 1 + 3 \cdot \frac{1}{3} = 1 + s_1$$
Induction step: let us assume that the statement holds for n. We prove that it holds
for n + 1. We apply the previous lemma for k = 3n + 3:
$$s_{3(n+1)+1} = s_{3n+4} = s_{3n+1} + \frac{1}{3n+2} + \frac{1}{3n+3} + \frac{1}{3n+4}$$
$$\ge s_n + 1 + \frac{1}{3n+2} + \frac{1}{3n+3} + \frac{1}{3n+4} \ge s_n + 1 + \frac{3}{3n+3} = s_n + 1 + \frac{1}{n+1} = s_{n+1} + 1$$
which completes the induction step. In conclusion, the result holds thanks to the induction
principle.
Proof of the theorem Since the harmonic series has positive terms, the sequence of its
partial sums {sₙ} is increasing. Therefore, it either converges or diverges. By
contradiction, let us assume that it converges, i.e., sₙ ↑ L < ∞. From the last lemma it
follows that
$$L = \lim_n s_{3n+1} \ge \lim_n (1 + s_n) = 1 + \lim_n s_n = 1 + L$$
which is a contradiction.
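The key lemma s_{3n+1} ≥ 1 + s_n, and the painfully slow divergence it implies, can both be observed numerically. A sketch (the function name is ours):

```python
def partial_sum(n):
    # n-th partial sum s_n of the harmonic series 1 + 1/2 + ... + 1/n
    return sum(1.0 / k for k in range(1, n + 1))

# The lemma behind the divergence proof: s_{3n+1} >= 1 + s_n
for n in (1, 5, 20, 100):
    assert partial_sum(3 * n + 1) >= 1 + partial_sum(n)
```

Iterating the lemma shows that the partial sums grow beyond every bound, yet tripling the number of terms gains only a bit more than 1: divergence, but at a logarithmic pace.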
Appendix F
Cast of characters
Muḥammad ibn Mūsā al-Khwārizmī (750 ca – Baghdad 850 ca), astronomer and mathematician.
Giuseppe Lagrange (Turin 1736 – Paris 1813), mathematician.
Gabriel Lamé (Tours 1795 – Paris 1870), mathematician.
Edmund Landau (Berlin 1877 – 1938), mathematician.
Pierre-Simon de Laplace (Beaumont-en-Auge 1749 – Paris 1827), mathematician and physicist.
Adrien-Marie Legendre (Paris 1752 – 1833), mathematician.
Gottfried Leibniz (Leipzig 1646 – Hannover 1716), mathematician and philosopher.
Wassily Leontief (Saint Petersburg 1905 – New York 1999), economist.
Joseph Liouville (Saint-Omer 1809 – Paris 1882), mathematician.
Rudolph Lipschitz (Königsberg 1832 – Bonn 1903), mathematician.
John Littlewood (Rochester 1885 – Cambridge 1977), mathematician.
Colin Maclaurin (Kilmodan 1698 – Edinburgh 1746), mathematician.
Lorenzo Mascheroni (Bergamo 1750 – Paris 1800), mathematician.
Melissus (Samos V century BC), philosopher.
Carl Menger (Nowy Sącz 1840 – Vienna 1921), economist.
Pietro Mengoli (Bologna 1626 – 1686), mathematician.
Marin Mersenne (Oizé 1588 – Paris 1648), mathematician and physicist.
Hermann Minkowski (Aleksotas 1864 – Göttingen 1909), mathematician.
Carlo Miranda (Naples 1912 – 1982), mathematician.
Abraham de Moivre (Vitry-le-François 1667 – London 1754), mathematician.
John Napier (Edinburgh 1550 – 1617), mathematician.
John Nash (Bluefield 1928 – Monroe 2015), mathematician.
Isaac Newton (Woolsthorpe 1642 – London 1727), mathematician and physicist.
Vilfredo Pareto (Paris 1848 – Céligny 1923), economist and sociologist.
Parmenides (Elea VI century BC), philosopher.
Giuseppe Peano (Spinetta di Cuneo 1858 – Turin 1932), mathematician.
Plato (Athens 428 BC ca. – 348 BC ca.), philosopher.
Henri Poincaré (Nancy 1854 – Paris 1912), mathematician.
Alfred Pringsheim (Oława 1850 – Zurich 1941), mathematician.
Pythagoras (Samos 570 BC ca. – Metapontum 495 BC ca.), mathematician and philosopher.
Hudalricus Regius (Ulrich Rieger) (XVI century), mathematician.
Bernhard Riemann (Breselenz 1826 – Selasca 1866), mathematician.
Michel Rolle (Ambert 1652 – Paris 1719), mathematician.
Bertrand Russell (Trellech 1872 – Penrhyndeudraeth 1970), philosopher.
Karl Schwarz (Hermsdorf 1843 – Berlin 1921), mathematician.
Index
Union, 6, 1114
Unit ball, 42
Unit circle, 43
Upper bound, 23
Value
absolute, 75
maximum, 151
principal, according to Cauchy, 1055
saddle, 977
Variable
dependent, 107
independent, 107
of choice, 526
Vector, 42, 44
unit, 79
zero, 45
Vector subspace, 60
generated, 66
Vectors
addition, 45
collinear, 62
column, 387
linearly dependent, 62
linearly independent, 62
orthogonal, 80
product, 45
row, 387
scalar multiplication, 45
sum, 45
Venn diagrams, 4
Versors, 62
fundamental of R^n, 79
Walras’ Law, 538