
Principles of Mathematics for Economics¹

Simone Cerreia-Vioglio
Department of Decision Sciences and IGIER, Università Bocconi

Massimo Marinacci
AXA-Bocconi Chair, Department of Decision Sciences and IGIER, Università Bocconi

Elena Vigna
Dipartimento Esomas, Università di Torino and Collegio Carlo Alberto

August 2017

¹ This manuscript is a very preliminary version of a textbook that will be published by Springer International Publishing (ISBN 978-3-319-44713-1). It is for the personal use of Bocconi students who are attending first year mathematics courses. We thank Gabriella Chiomio and Claudio Mattalia, who thoroughly translated a first version of the manuscript, as well as Alexandra Fotiou, Giacomo Lanzani and Kelly Gail Strada for excellent research assistance, and Margherita Cigola, Guido Osimo, and Lorenzo Peccati for some very useful comments that helped us to improve the manuscript. We are especially indebted to Pierpaolo Battigalli, Erio Castagnoli (with whom this project started), Itzhak Gilboa, Fabio Maccheroni, Luigi Montrucchio, and David Schmeidler for the discussions that over the years shaped our views on economics and mathematics.
Contents

I Structures 1

1 Sets and numbers: an intuitive introduction 3


1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.3 Properties of the operations . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.4 A naive remark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Structure of the integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.1 Divisors and algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.2 Prime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4 Order structure of R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.4.1 Maxima and minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4.2 Supremum and infimum . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4.3 Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5 Powers and logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.5.1 Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.5.2 Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.6 Numbers, fingers and circuits . . . . . . . . . . . . . . . . . . . . . . . . 32
1.7 The extended real line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.8 The birth of the deductive method . . . . . . . . . . . . . . . . . . . . . . . . 38

2 Cartesian structure and Rn 41


2.1 Cartesian products and Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2 Operations in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.3 Order structure on Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.1 Static choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.2 Intertemporal choices . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.5 Pareto optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.5.2 Maxima and maximals . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.5.3 Pareto frontier and Edgeworth box . . . . . . . . . . . . . . . . . . . . 54


3 Linear structure 59
3.1 Vector subspaces of Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Linear independence and dependence . . . . . . . . . . . . . . . . . . . . . . . 62
3.3 Linear combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4 Generated subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.5 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6 Bases of subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.7 Post scriptum: some high school algebra . . . . . . . . . . . . . . . . . . . . . 73

4 Euclidean structure 75
4.1 Absolute value and norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.1 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.2 Absolute value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.3 Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5 Topological structure 85
5.1 Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2 Neighborhoods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.3 Taxonomy of the points of Rn with respect to a set . . . . . . . . . . . . . . . 90
5.3.1 Interior, exterior and boundary points . . . . . . . . . . . . . . . . . . 90
5.3.2 Limit points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.4 Open and closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.5 Set stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.6 Compact sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.7 Closure and convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6 Functions 105
6.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2.1 Static choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2.2 Intertemporal choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3 General properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.1 Preimages and level curves . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.2 Algebra of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.3.3 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.4 Classes of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.4.1 Injective, surjective, and bijective functions . . . . . . . . . . . . . . . 126
6.4.2 Inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.4.3 Bounded functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.4.4 Monotonic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.4.5 Concave and convex functions: a preview . . . . . . . . . . . . . . . . 139
6.4.6 Separable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.5 Elementary functions on R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.5.1 Polynomial functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.5.2 Exponential and logarithmic functions . . . . . . . . . . . . . . . . . . 143

6.5.3 Trigonometric and periodic functions . . . . . . . . . . . . . . . . . . . 146


6.6 Maxima and minima of a function: a preview . . . . . . . . . . . . . . . . . . 151
6.7 Domains and restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.8 Grand finale: preferences and utility . . . . . . . . . . . . . . . . . . . . 155
6.8.1 Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.8.2 Paretian utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.8.3 Existence and lexicographic preference . . . . . . . . . . . . . . . . . . 159

7 Cardinality 163
7.1 Actual infinite and potential infinite . . . . . . . . . . . . . . . . . . . . 163
7.2 Bijective functions and cardinality . . . . . . . . . . . . . . . . . . . . . . . . 164
7.3 A Pandora’s box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

II Discrete analysis 177

8 Sequences 179
8.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.2 The space of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.3 Application: intertemporal choices . . . . . . . . . . . . . . . . . . . . . . . . 187
8.4 Application: prices and expectations . . . . . . . . . . . . . . . . . . . . . . . 187
8.4.1 A market for a good . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
8.4.2 Delays in production . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
8.4.3 Expectation formation . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.5 Images and classes of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.6 Eventually: a key adverb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.7 Limits: introductory examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.8 Limits and asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.8.1 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.8.2 Limits from above and from below . . . . . . . . . . . . . . . . . . . . 197
8.8.3 Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.8.4 Topology of R and a general definition of limit . . . . . . . . . . . . . 199
8.9 Properties of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.9.1 Monotonicity and convergence . . . . . . . . . . . . . . . . . . . . . . 203
8.9.2 Bolzano-Weierstrass' Theorem . . . . . . . . . . . . . . . . . . . . . . 204
8.10 Algebra of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.10.1 The (many) certainties . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.10.2 Some common limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8.10.3 Indeterminate forms for the limits . . . . . . . . . . . . . . . . . . . . 212
8.10.4 Summary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
8.10.5 How many indeterminate forms are there? . . . . . . . . . . . . . . . . 215
8.11 Convergence criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.11.1 Comparison criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.11.2 Ratio criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.11.3 Root criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
8.12 The Cauchy condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

8.13 Napier’s constant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223


8.14 Orders of convergence and of divergence . . . . . . . . . . . . . . . . . . . . . 227
8.14.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
8.14.2 Little-o algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8.14.3 Asymptotic equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.14.4 Characterization and decay . . . . . . . . . . . . . . . . . . . . . . . . 235
8.14.5 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8.14.6 Scales of infinities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8.14.7 The De Moivre-Stirling formula . . . . . . . . . . . . . . . . . . . . . . 237
8.14.8 Distribution of prime numbers . . . . . . . . . . . . . . . . . . . . . . 238
8.15 Sequences in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

9 Series 243
9.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
9.1.1 Three classic series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
9.1.2 Sub specie aeternitatis: infinite horizon . . . . . . . . . . . . . . . . . 247
9.2 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
9.3 Series with positive terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
9.3.1 Comparison criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
9.3.2 Ratio criterion: prelude . . . . . . . . . . . . . . . . . . . . . . . . . . 255
9.3.3 Ratio criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
9.3.4 A first series expansion . . . . . . . . . . . . . . . . . . . . . . . . . . 258
9.4 Series with terms of any sign . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
9.4.1 Absolute convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
9.4.2 Hic sunt leones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

10 Discrete calculus 267


10.1 Preamble: limit points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
10.2 Discrete calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
10.2.1 Finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
10.2.2 Newton difference formula . . . . . . . . . . . . . . . . . . . . . . . . 274
10.2.3 Asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
10.3 Convergence in mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
10.3.1 In medio stat virtus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
10.3.2 Creatio ex nihilo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
10.4 Convergence criteria for series . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
10.4.1 Root criterion for convergence . . . . . . . . . . . . . . . . . . . . . . . 284
10.4.2 The power of the root criterion . . . . . . . . . . . . . . . . . . . . . . 286
10.5 Power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
10.5.1 Preamble: rational functions . . . . . . . . . . . . . . . . . . . . . . . 288
10.5.2 Cauchy-Hadamard’s Theorem . . . . . . . . . . . . . . . . . . . . . . . 290
10.5.3 Generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
10.5.4 Solving recurrences via generating functions . . . . . . . . . . . . . . . 293
10.6 Infinite patience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

III Continuity 301

11 Limits of functions 303


11.1 Introductory examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
11.2 Functions of a single variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
11.2.1 Two-sided limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
11.2.2 One-sided limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
11.2.3 Relations between one-sided and two-sided limits . . . . . . . . . . . . 316
11.2.4 Grand finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
11.3 Functions of several variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
11.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
11.3.2 Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
11.3.3 Sequential characterization . . . . . . . . . . . . . . . . . . . . . . . . 322
11.4 Properties of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
11.5 Algebra of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
11.5.1 Indeterminacies for limits . . . . . . . . . . . . . . . . . . . . . . . . . 328
11.6 Common limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
11.7 Orders of convergence and of divergence . . . . . . . . . . . . . . . . . . . . . 333
11.7.1 Little-o algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
11.7.2 Asymptotic equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . 336
11.7.3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
11.7.4 The usual bestiary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

12 Continuous functions 339


12.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
12.2 Discontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
12.3 Operations and composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
12.4 Zeros and equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
12.4.1 Zeros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
12.4.2 Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
12.5 Weierstrass' Theorem: a preview . . . . . . . . . . . . . . . . . . . . . . 351
12.6 Intermediate Value Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
12.7 Limits and continuity of operators . . . . . . . . . . . . . . . . . . . . . . . . 356
12.8 Equations, fixed points, and market equilibria . . . . . . . . . . . . . . 358
12.8.1 Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
12.8.2 Fixed points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
12.8.3 Aggregate market analysis via fixed points . . . . . . . . . . . . . . . 362
12.9 Asymptotic behavior of recurrences . . . . . . . . . . . . . . . . . . . . . . . . 365
12.9.1 A general definition for recurrences . . . . . . . . . . . . . . . . . . . 365
12.9.2 Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
12.9.3 Price dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
12.9.4 Heron’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
12.10 Coda continua . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374

IV Linear and nonlinear analysis 377

13 Linear functions and operators 379


13.1 Linear functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
13.1.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . . . 379
13.1.2 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
13.1.3 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
13.1.4 Application: averages . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
13.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
13.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
13.2.2 Operations on matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 388
13.2.3 A first taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
13.2.4 Product of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
13.3 Linear operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
13.3.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . . . 394
13.3.2 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
13.3.3 Matrices and operations . . . . . . . . . . . . . . . . . . . . . . . . . . 399
13.4 Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
13.4.1 Linear operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
13.4.2 Rank of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
13.4.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
13.4.4 Gaussian elimination procedure . . . . . . . . . . . . . . . . . . . . . . 409
13.5 Invertible operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
13.5.1 Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
13.5.2 Inverse matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
13.6 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
13.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
13.6.2 Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
13.6.3 Combinatorics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
13.6.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
13.6.5 Laplace’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
13.6.6 Inverses and determinants . . . . . . . . . . . . . . . . . . . . . . . . . 428
13.6.7 Kronecker’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
13.6.8 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
13.7 Square linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
13.8 General linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
13.8.1 Kronecker-Capelli’s Theorem . . . . . . . . . . . . . . . . . . . . . . . 436
13.8.2 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
13.8.3 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
13.9 Solving systems: Cramer’s method . . . . . . . . . . . . . . . . . . . . . . . . 440
13.10 Coda: Hahn-Banach et similia . . . . . . . . . . . . . . . . . . . . . . . 443

14 Concave functions 451


14.1 Convex sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
14.1.1 Definition and basic properties . . . . . . . . . . . . . . . . . . . . . 451
14.1.2 Back to high school: polytopes . . . . . . . . . . . . . . . . . . . . . . 454

14.2 Concave functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457


14.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
14.3.1 Concave functions and convex sets . . . . . . . . . . . . . . . . . . . . 462
14.3.2 Affine functions and affine sets . . . . . . . . . . . . . . . . . . . . . 465
14.3.3 Jensen’s inequality and continuity . . . . . . . . . . . . . . . . . . . . 467
14.4 Quasi-concave functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
14.5 Diversification principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
14.6 Grand finale: Cauchy's equation . . . . . . . . . . . . . . . . . . . . . . 477
14.6.1 The basic equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
14.6.2 Remarkable variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
14.6.3 Continuous compounding . . . . . . . . . . . . . . . . . . . . . . . . . 481
14.6.4 Additive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
14.7 Fireworks: the skeleton of convexity . . . . . . . . . . . . . . . . . . . . . . . 483
14.7.1 Convex envelope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
14.7.2 Extreme points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484

15 Homogeneous functions 489


15.1 Preamble: cones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
15.2 Homogeneity and returns to scale . . . . . . . . . . . . . . . . . . . . . . . . . 490
15.2.1 Homogeneous functions . . . . . . . . . . . . . . . . . . . . . . . . . . 490
15.2.2 Average functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
15.2.3 Homogeneity and quasi-concavity . . . . . . . . . . . . . . . . . . . . . 494
15.3 Homotheticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
15.3.1 Semicones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
15.3.2 Homotheticity and utility . . . . . . . . . . . . . . . . . . . . . . . . . 497

16 Lipschitz functions 499


16.1 Global control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
16.2 Local control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
16.3 Translation invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504

17 Supermodular functions 507


17.1 Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
17.2 Supermodular functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
17.3 Functions with increasing cross differences . . . . . . . . . . . . . . . . 509
17.3.1 Sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
17.3.2 Increasing cross differences and complementarity . . . . . . . . . . . 511
17.4 Supermodularity and concavity . . . . . . . . . . . . . . . . . . . . . . . . . . 514
17.5 Log-convex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516

V Optima 519

18 Optimization problems 521


18.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
18.1.1 The beginner’s luck . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
18.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529

18.1.3 Cogito ergo solvo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531


18.1.4 Consumption and production . . . . . . . . . . . . . . . . . . . . . . . 535
18.1.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
18.2 Existence: Weierstrass' Theorem . . . . . . . . . . . . . . . . . . . . . . 541
18.2.1 Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
18.2.2 First proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
18.2.3 Second proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
18.3 Existence: Tonelli’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
18.3.1 Coercivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
18.3.2 Tonelli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
18.3.3 Supercoercivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
18.4 Separating sets and points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
18.5 Local extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
18.6 Concavity and quasi-concavity . . . . . . . . . . . . . . . . . . . . . . . . . . 555
18.6.1 Maxima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
18.6.2 Minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
18.6.3 Affine functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
18.6.4 Linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562
18.7 Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
18.7.1 Optimal bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
18.7.2 Demand function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567
18.7.3 Nominal changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567
18.8 Equilibrium analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
18.8.1 Exchange economies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
18.8.2 Invisible hand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
18.9 Least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
18.9.1 Linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
18.9.2 Descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
18.10 Operator optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
18.10.1 Operator optimization problems . . . . . . . . . . . . . . . . . . . . . 577
18.10.2 Planner’s problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
18.11 Infracoda: cuneiform functions . . . . . . . . . . . . . . . . . . . . . . 581
18.12 Coda: no illusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
18.13 Ultracoda: the semicontinuous Tonelli . . . . . . . . . . . . . . . . . . 583
18.13.1 Semicontinuous functions: definition . . . . . . . . . . . . . . . . . . 583
18.13.2 Semicontinuous functions: properties . . . . . . . . . . . . . . . . . . . 586
18.13.3 The (almost) ultimate Tonelli . . . . . . . . . . . . . . . . . . . . . . . 588
18.13.4 The ordinal Tonelli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589

19 Projections and approximations 593


19.1 Projection Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
19.2 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595
19.3 The ultimate Riesz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
19.4 Least squares and projections . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
19.5 A finance illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
19.5.1 Portfolios and contingent claims . . . . . . . . . . . . . . . . . . . . . 600

19.5.2 Market value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601


19.5.3 Law of one price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
19.5.4 Pricing rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
19.5.5 Pricing kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
19.5.6 Arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605

VI Differential calculus 607

20 Derivatives 609
20.1 Marginal analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
20.2 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
20.3 Geometric interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
20.4 Derivative function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
20.5 One-sided derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
20.6 Derivability and continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
20.7 Derivatives of elementary functions . . . . . . . . . . . . . . . . . . . . . . . . 622
20.8 Algebra of derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
20.9 The chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627
20.10 Derivative of inverse functions . . . . . . . . . . . . . . . . . . . . . . . 629
20.11 Formulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
20.12 Differentiability and linearity . . . . . . . . . . . . . . . . . . . . . . . 633
20.12.1 Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
20.12.2 Differentiability and derivability . . . . . . . . . . . . . . . . . . . . 635
20.12.3 Differentiability and continuity . . . . . . . . . . . . . . . . . . . . . 637
20.12.4 A terminological turning point . . . . . . . . . . . . . . . . . . . . . 637
20.13 Derivatives of higher order . . . . . . . . . . . . . . . . . . . . . . . . . 638
20.14 Discrete limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639

21 Differential calculus in several variables 643


21.1 Partial derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
21.1.1 The notion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
21.1.2 A continuity failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
21.1.3 Derivative operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
21.1.4 Ceteris paribus: marginal analysis . . . . . . . . . . . . . . . . . . . . 651
21.2 Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
21.2.1 Differentiability and partial derivability . . . . . . . . . . . . . . . . 655
21.2.2 Total differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
21.2.3 Chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 660
21.3 Partial derivatives of higher order . . . . . . . . . . . . . . . . . . . . . . . . . 662
21.4 Taking stock: the natural domain of analysis . . . . . . . . . . . . . . . . . . 667
21.5 Incremental and approximation viewpoints . . . . . . . . . . . . . . . . . . . 667
21.5.1 Directional derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
21.5.2 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
21.5.3 The two viewpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672
21.6 Differential of operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 673

21.6.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673


21.6.2 Chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676
21.6.3 Proof of the chain rule (Theorem 979) . . . . . . . . . . . . . . . . . . 681

22 Differential methods 683


22.1 Extremal and critical points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
22.1.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
22.1.2 Fermat’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
22.1.3 Unconstrained optima: incipit . . . . . . . . . . . . . . . . . . . . . . . 689
22.2 Mean Value Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689
22.3 Continuity properties of the derivative . . . . . . . . . . . . . . . . . . . . . . 693
22.4 Monotonicity and differentiability . . . . . . . . . . . . . . . . . . . . . 695
22.5 Sufficient conditions for local extremal points . . . . . . . . . . . . . . . 699
22.5.1 Local extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
22.5.2 Searching local extremal points via first and second order conditions . 701
22.5.3 Searching global extremal points via first and second order conditions 704
22.5.4 A false start: global extremal points . . . . . . . . . . . . . . . . . . . 706
22.6 De l’Hospital’s Theorem and rule . . . . . . . . . . . . . . . . . . . . . . . . . 707
22.6.1 Indeterminate forms 0/0 and ∞/∞ . . . . . . . . . . . . . . . . . . . 707
22.6.2 Other indeterminacies . . . . . . . . . . . . . . . . . . . . . . . . . . . 710

23 Approximation 713
23.1 Taylor’s polynomial approximation . . . . . . . . . . . . . . . . . . . . . . . . 713
23.1.1 Polynomial expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
23.1.2 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
23.1.3 Taylor expansion and limits . . . . . . . . . . . . . . . . . . . . . . . . 720
23.2 Omnibus proposition for local extremal points . . . . . . . . . . . . . . . . . . 721
23.3 Omnibus procedure of search of local extremal points . . . . . . . . . . . . . . 724
23.3.1 Twice differentiable functions . . . . . . . . . . . . . . . . . . . . . . 724
23.3.2 Infinitely differentiable functions . . . . . . . . . . . . . . . . . . . . 724
23.4 Taylor expansion: functions of several variables . . . . . . . . . . . . . . . . . 725
23.4.1 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
23.4.2 Taylor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
23.4.3 Second-order conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 731
23.4.4 Multivariable unconstrained optima . . . . . . . . . . . . . . . . . . . 735
23.5 Coda: asymptotic expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
23.5.1 Asymptotic scales and expansions . . . . . . . . . . . . . . . . . . . . 736
23.5.2 Asymptotic expansions and analytic functions . . . . . . . . . . . . . . 740
23.5.3 Hille’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
23.5.4 Borel’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745

24 Concavity and differentiability 747


24.1 Scalar functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
24.1.1 Decreasing marginal effects . . . . . . . . . . . . . . . . . . . . . . . 747
24.1.2 Chords and tangents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753
24.1.3 Concavity criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754

24.2 Intermezzo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758


24.2.1 Superlinear functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758
24.2.2 Monotonic operators and the law of demand . . . . . . . . . . . . . . 760
24.3 Multivariable case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761
24.3.1 Derivability and differentiability . . . . . . . . . . . . . . . . . . . . 761
24.3.2 A key inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
24.3.3 Concavity criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 767
24.4 Ultramodular functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770
24.5 Global optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772
24.5.1 Sufficiency of the first order condition . . . . . . . . . . . . . . . . . 772
24.5.2 A deeper result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774
24.6 Superdifferentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776
24.7 Quasi-concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784
24.7.1 Ordinal superdifferential . . . . . . . . . . . . . . . . . . . . . . . . . 784
24.7.2 Quasi-concavity and differentiability . . . . . . . . . . . . . . . . . . 787
24.7.3 Quasi-concavity criteria . . . . . . . . . . . . . . . . . . . . . . . . . . 788
24.7.4 Optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
24.8 Infracoda: a linear algebra result . . . . . . . . . . . . . . . . . . . . . . . . . 789
24.9 Coda: representation of superlinear functions . . . . . . . . . . . . . . . . . . 790
24.9.1 The ultimate Hahn-Banach’s Theorem . . . . . . . . . . . . . . . . . . 790
24.9.2 Representation of superlinear functions . . . . . . . . . . . . . . . . . 792
24.9.3 Modelling bid-ask spreads . . . . . . . . . . . . . . . . . . . . . . . . . 795
24.10 Ultracoda: strong concavity . . . . . . . . . . . . . . . . . . . . . . . . 801

25 Implicit functions 807


25.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
25.2 Implicit functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 810
25.3 A local perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814
25.3.1 Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . 814
25.3.2 Level curves and marginal rates . . . . . . . . . . . . . . . . . . . . . . 820
25.3.3 Quadratic expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 824
25.3.4 Implicit functions of several variables . . . . . . . . . . . . . . . . . . . 826
25.3.5 Implicit operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
25.4 A global perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830
25.4.1 Preamble: projections and shadows . . . . . . . . . . . . . . . . . . . . 830
25.4.2 Implicit functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 832
25.4.3 Comparative statics I . . . . . . . . . . . . . . . . . . . . . . . . . . . 836
25.4.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
25.4.5 Comparative statics II . . . . . . . . . . . . . . . . . . . . . . . . . . . 840

26 Inverse functions 843


26.1 Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
26.2 Local analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845
26.3 Global analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847
26.3.1 Preamble: preimages of continuous functions . . . . . . . . . . . . . . 847
26.3.2 Proper functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848

26.3.3 Global Inverse Function Theorem . . . . . . . . . . . . . . . . . . . . . 849


26.3.4 Global Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . 850
26.4 Parametric equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 852
26.5 Coda: direct and inverse problems . . . . . . . . . . . . . . . . . . . . . . . . 852

27 Study of functions 855


27.1 Inflection points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855
27.2 Asymptotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 857
27.3 Study of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 862

VII Differential optimization 869

28 Unconstrained optimization 871


28.1 Unconstrained problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871
28.2 Coercive problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871
28.3 Concave problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 874
28.4 Relationship among problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 876
28.5 Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 878
28.6 Optimization and equations: general least squares . . . . . . . . . . . . . . . 879
28.7 Coda: computational issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 880
28.7.1 Decision procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 880
28.7.2 Gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 881
28.7.3 Maximizing sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 885
28.7.4 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 887

29 Equality constraints 889


29.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
29.2 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
29.3 One constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890
29.3.1 A key lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890
29.3.2 Lagrange’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894
29.3.3 A heuristic interpretation of the multiplier . . . . . . . . . . . . . . . . 895
29.4 The method of elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 896
29.5 The consumer problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 902
29.6 Cogito ergo solvo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 906
29.7 Several constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 906

30 Inequality constraints 915


30.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 915
30.2 Resolution of the problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 918
30.2.1 Kuhn-Tucker’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 921
30.2.2 The method of elimination . . . . . . . . . . . . . . . . . . . . . . . . 922
30.3 Cogito et solvo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 926
30.4 Concave optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 926
30.4.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 926
30.4.2 Kuhn-Tucker points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927

30.5 Appendix: proof of a key lemma . . . . . . . . . . . . . . . . . . . . . . . . . 932

31 General constraints 937


31.1 A general concave problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 937
31.2 Analysis of the black box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 938
31.2.1 Variational inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . 938
31.2.2 A general first order condition . . . . . . . . . . . . . . . . . . . . . 940
31.2.3 Divide et impera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 943
31.3 Resolution of the general concave problem . . . . . . . . . . . . . . . . . . . . 944

32 Intermezzo: correspondences 947


32.1 Definition and basic notions . . . . . . . . . . . . . . . . . . . . . . . . 947
32.2 Hemicontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 950
32.3 Addition and scalar multiplication of sets . . . . . . . . . . . . . . . . . . . . 953
32.4 Combining correspondences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 955
32.5 Inclusion equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
32.5.1 Inclusion equations and fixed points . . . . . . . . . . . . . . . . . . 956
32.5.2 Aggregate market analysis . . . . . . . . . . . . . . . . . . . . . . . . . 957
32.5.3 Back to agents: exchange economy . . . . . . . . . . . . . . . . . . . . 958

33 Parametric optimization problems 961


33.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 961
33.2 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 962
33.3 Maximum Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 964
33.4 Envelope theorems I: fixed constraint . . . . . . . . . . . . . . . . . . . 968
33.5 Envelope theorems II: variable constraint . . . . . . . . . . . . . . . . . . . . 970
33.6 Marginal interpretation of multipliers . . . . . . . . . . . . . . . . . . . . . . . 972
33.7 Monotone solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 972

34 Interdependent optimization 977


34.1 Minimax Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 977
34.2 Nash equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 982
34.3 Nash equilibria and saddle points . . . . . . . . . . . . . . . . . . . . . . . . . 985
34.4 Nash equilibria on a simplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986
34.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 987
34.5.1 Randomization in games and decisions . . . . . . . . . . . . . . . . . . 987
34.5.2 Kuhn-Tucker’s saddles . . . . . . . . . . . . . . . . . . . . . . . . . . . 990
34.5.3 Linear programming: duality . . . . . . . . . . . . . . . . . . . . . . . 994

VIII Integration 997

35 The Riemann integral 999


35.1 The method of exhaustion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 999
35.2 Plurirectangles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1000
35.3 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1002
35.3.1 Positive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1002

35.3.2 General functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1008


35.3.3 Everything holds together . . . . . . . . . . . . . . . . . . . . . . . . . 1010
35.4 Integrability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014
35.5 Classes of integrable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1018
35.5.1 Step functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1018
35.5.2 Analytic and geometric approaches . . . . . . . . . . . . . . . . . . . . 1021
35.5.3 Continuous functions and monotonic functions . . . . . . . . . . . . . 1022
35.6 Properties of the integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1024
35.7 Integral calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1031
35.7.1 Primitive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1032
35.7.2 Formulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1035
35.7.3 The First Fundamental Theorem of Calculus . . . . . . . . . . . . . . 1036
35.7.4 The Second Fundamental Theorem of Calculus . . . . . . . . . . . . . 1037
35.8 Properties of the indefinite integral . . . . . . . . . . . . . . . . . . . . 1041
35.9 Change of variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1043
35.10 Closed forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1047
35.11 Improper integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1051
35.11.1 Unbounded intervals of integration: generalities . . . . . . . . . . . . . 1051
35.11.2 Unbounded integration intervals: properties and criteria . . . . . . . . 1058
35.11.3 Gauss integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1062
35.11.4 Unbounded functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1064

36 Parameter-dependent integrals 1067


36.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1067
36.2 Variability: Leibniz’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1070
36.3 Improper integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1072

37 Stieltjes' integral 1073
37.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1074
37.2 Integrability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1074
37.3 Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076
37.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1078
37.5 Step integrators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1079
37.6 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1081
37.7 Change of variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1082
37.8 Modelling assets' gains . . . . . . . . . . . . . . . . . . . . . . . . . . . 1083

38 Moments 1085
38.1 Densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1085
38.2 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1086
38.3 The problem of moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1087
38.4 Moment generating function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1088

IX Appendices 1091

A Binary Relations 1093


A.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1093
A.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1095
A.3 Equivalence relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096

B Permutations 1099
B.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1099
B.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100
B.3 Anagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1101
B.4 Newton’s binomial formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1102

C Notions of trigonometry 1105


C.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1105
C.2 Concerto d’archi (string concert) . . . . . . . . . . . . . . . . . . . . . . . . . 1107
C.3 Perpendicularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1112

D Elements of intuitive logic 1115


D.1 Propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1115
D.2 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1115
D.3 Logical equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1117
D.4 Deduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1119
D.4.1 Theorems and proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1119
D.4.2 Direct proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1120
D.4.3 Reductio ad absurdum . . . . . . . . . . . . . . . . . . . . . . . . . . . 1121
D.4.4 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1123
D.5 Deductive method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1123
D.5.1 Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1123
D.5.2 Deductive method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1124
D.5.3 A miniature theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1125
D.5.4 Interpretations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1125
D.6 Predicates and quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . 1126
D.6.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126
D.6.2 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1127
D.6.3 Example: linear dependence . . . . . . . . . . . . . . . . . . . . . . . . 1128
D.6.4 Example: negation of convergence . . . . . . . . . . . . . . . . . . . . 1128
D.6.5 A set-theoretic twist . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1129
D.7 Coda: the logic of empirical scientific theories . . . . . . . . . . . . . . 1129

E Mathematical induction 1133


E.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1133
E.2 The harmonic Mengoli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1135

F Cast of characters 1137


Part I

Structures

Chapter 1

Sets and numbers: an intuitive introduction

1.1 Sets
A set is a collection of distinguishable objects. There are two ways to describe a set: by directly listing its elements, or by specifying a property that its elements have in common. The second way is more common: for instance,

{11, 13, 17, 19, 23, 29}                                (1.1)

can be described as the set of the prime numbers between 10 and 30. The chairs of your kitchen form a set of objects, the chairs, that have in common the property of being part of your kitchen. The chairs of your bedroom form another set, just as the letters of the Latin alphabet form a set, distinct from the set of the letters of the Greek alphabet (and from the set of chairs or the set of numbers considered above).

Sets are usually denoted by capital letters: A, B, C, and so on; their elements are denoted by small letters: a, b, c, and so on. To denote that an element a belongs to the set A we write

a ∈ A

where ∈ is the symbol of belonging. Instead, to denote that an element a does not belong to the set A we write a ∉ A.

Off the record remark (O.R.) The concept of set, apparently introduced in 1847 by Bernhard Bolzano, is for us a primitive concept, not defined through other notions. This is as in Euclidean geometry, in which points and lines are primitive concepts (with an intuitive geometric meaning that readers may give them).

1.1.1 Subsets
The chairs of your bedroom are a subset of the chairs of your home: a chair that belongs to your bedroom also belongs to your home. In general, a set A is a subset of a set B when all the elements of A are also elements of B. In this case we write A ⊆ B. Formally,

Definition 1 Given two sets A and B, we say that A is a subset of B, in symbols A ⊆ B, if all the elements of A are also elements of B, that is, if x ∈ A implies x ∈ B.

For instance, denote by A the set (1.1), that is,

A = {11, 13, 17, 19, 23, 29}

and let

B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}                (1.2)

be the set of the odd numbers between 10 and 30. We have A ⊆ B.

Graphically, the relation A ⊆ B can be illustrated as

[Figure: Venn diagram illustrating A ⊆ B, with the region A drawn inside the region B]

by using the so-called Venn diagrams to represent graphically the sets A and B: it is an ingenuous, yet effective, way to visualize sets.
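
As a computational aside (ours, not part of the original text), these relations can also be checked mechanically: Python's built-in set type mirrors the notation, with "in" playing the role of ∈ and "<=" that of ⊆. A minimal sketch:

# The sets (1.1) and (1.2): primes and odd numbers between 10 and 30.
A = {11, 13, 17, 19, 23, 29}
B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}

print(11 in A)   # True:  11 belongs to A
print(15 in A)   # False: 15 does not belong to A
print(A <= B)    # True:  every element of A is also in B, i.e. A is a subset of B
print(B <= A)    # False: B is not a subset of A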

When we have both A ⊆ B and B ⊆ A, that is, x ∈ A if and only if x ∈ B, the two sets A and B are said to be equal; in symbols, A = B. For example, let A be the set of the solutions of the quadratic equation x² − 3x + 2 = 0 and let B be the set formed by the numbers 1 and 2. It is easy to see that A = B.
When A ⊆ B and A ≠ B, we write A ⊂ B and say that A is a proper subset of B.
The sets A = {a} that consist of a unique element are called singletons. They are a peculiar, but altogether legitimate, class of sets.¹

Nota Bene (N.B.) Though the two symbols ∈ and ⊆ are conceptually well distinct and must not be confused, there exists an interesting relation between them. Indeed, consider the set formed by a unique element a, that is, the singleton {a}. Through such a singleton, we can establish the relation

a ∈ A if and only if {a} ⊆ A

between ∈ and ⊆.
1
Note that a and fag are not the same thing; a is an element and fag is a set, even if it is formed by only
one element. For instance, the set A of the Nations of the Earth with the ‡ag of only one colour had (until
2011) only one element, Libya, but it is not “the Libya”: Tripoli is not the capital of A.
1.1. SETS 5

1.1.2 Operations

There are three basic operations among sets: union, intersection, and difference. As we will
see, they take any two sets and, starting from them, form a new set.

The first operation that we consider is the intersection of two sets A and B. As the
term "intersection" suggests, with this operation we select all the elements that belong
simultaneously to the sets A and B.

Definition 2 Given two sets A and B, their intersection A ∩ B is the set of all the elements
that belong both to A and to B, that is, x ∈ A ∩ B if x ∈ A and x ∈ B.

The operation can be illustrated graphically in the following way:

[Venn diagram: two overlapping sets A and B with the common region A ∩ B shaded]

For example, let A be the set of the left-handed and B the set of the right-handed citizens
of a country. The intersection A ∩ B is the set of the ambidextrous citizens. If, instead, A is
the set of the gasoline cars and B the set of the methane cars, the intersection A ∩ B is the
set of the bi-fuel cars that run on both gasoline and methane.

It can happen that two sets have no elements in common. For example, let

C = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30}    (1.3)

be the set of the even numbers between 10 and 30. It has no elements in common with the
set B in (1.2). In this case we talk of disjoint sets. Such a notion gives us the opportunity
to introduce a fundamental set.

Definition 3 The empty set, denoted by ∅, is the set without elements.

As a first use of the notion, note that two sets A and B are disjoint when they have
empty intersection, that is, A ∩ B = ∅. For example, for the sets B and C in (1.2) and (1.3),
we have B ∩ C = ∅.

We write A ≠ ∅ when the set A is not empty, that is, when it contains at least one element.
Conventionally, we consider the empty set to be a subset of any set, that is, ∅ ⊆ A for every
set A.

It is immediate that A ∩ B ⊆ A and A ∩ B ⊆ B. The next result is more subtle and
establishes a useful property that links ⊆ and ∩.

Proposition 4 A ∩ B = A if and only if A ⊆ B.

Proof "If". Let A ⊆ B. We want to prove that A ∩ B = A. To show that two sets are equal,
we always need to prove separately the two opposite inclusions: in this case, A ∩ B ⊆ A and
A ⊆ A ∩ B.

The inclusion A ∩ B ⊆ A is easily proven to be true. Indeed, let x ∈ A ∩ B.² Then, by
definition, x belongs both to A and to B. In particular, x ∈ A, and this is enough to conclude
that A ∩ B ⊆ A.

Let us prove the inclusion A ⊆ A ∩ B. Let x ∈ A. Since, by hypothesis, A ⊆ B, each
element of A also belongs to B, so it follows that x ∈ B. Hence, x belongs both to A and to
B, i.e., x ∈ A ∩ B. This proves that A ⊆ A ∩ B.

We have shown that both the inclusions A ∩ B ⊆ A and A ⊆ A ∩ B hold; we can therefore
conclude that A ∩ B = A, which completes the proof of the "if" part.

"Only if". Let A ∩ B = A and let x ∈ A. By hypothesis A ∩ B = A, so x ∈ A ∩ B. In
particular, x then belongs to B, as claimed.

The next operation we consider is the union. Here again the term "union" already
suggests how in this operation all the elements of both sets are collected together.

Definition 5 Given two sets A and B, their union A ∪ B is the set of all the elements that
belong to A or to B, that is, x ∈ A ∪ B if x ∈ A or x ∈ B.³

Note that an element can belong to both sets (unless they are disjoint). For example, if
A is again the set of the left-handed and B is the set of the right-handed citizens, the union
set contains all citizens with at least one hand, and there are individuals (the ambidexters)
who belong to both sets.⁴

It is immediate to show that A ⊆ A ∪ B and that B ⊆ A ∪ B. It then follows that

A ∩ B ⊆ A ∪ B

Graphically, the union is represented in the following way:

² In proving an inclusion between sets, say C ⊆ D, throughout the book we will tacitly assume that C ≠ ∅
because the inclusion is trivially true when C = ∅. For this reason our inclusion proofs will show that x ∈ C
(i.e., C ≠ ∅) implies x ∈ D.
³ The conjunction "or" has the inclusive sense of the Latin "vel" (x belongs to A or to B or to both) and
not the exclusive sense of "aut" (x belongs either to A or to B, but not to both). Indeed, Giuseppe Peano
gave the symbol ∪ the meaning "vel" when he first introduced it, along with the intersection symbol ∩ and
the membership symbol ε, which he interpreted as the Latin "et" and "est", respectively (see the "signorum
tabula" in his 1889 Arithmetices principia, a seminal work on the foundations of mathematics).
⁴ The clause "with at least one hand", though needed, may seem pedantic, even tactless. The distinction
between being precise and being pedantic is subtle and, ultimately, subjective. Experience may help to balance
rigor and readability. In any case, in mathematics loose ends have to be handled with care and, definitely,
are not for beginners.
[Venn diagram: two overlapping sets A and B with the whole region A ∪ B shaded]
The last operation that we consider is the difference.

Definition 6 Given two sets A and B, their difference A − B is the set of all the elements
that belong to A, but not to B, that is, x ∈ A − B if both x ∈ A and x ∉ B.

The set A − B is, therefore, obtained by eliminating from A all the elements that belong
(also) to B.⁵ Graphically:

[Venn diagram: sets A and B overlapping, with the part of A lying outside B shaded as A − B]

For example, let us go back to the sets A and B specified in (1.1) and (1.2). Then,

B − A = {15, 21, 25, 27}

that is, B − A is the set of the non-prime odd numbers between 10 and 30.

Note that: (i) when A and B are disjoint, we have A − B = A and B − A = B; (ii) A ⊆ B
is equivalent to A − B = ∅ since, by removing from A all the elements that belong also to
B, the set A is deprived of all its elements, that is, we are left with the empty set.
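
These operations are implemented directly by the set type of most programming languages, which makes the examples of this section easy to replay. A minimal Python sketch (illustrative only), using the sets A, B, and C of (1.1)-(1.3):

```python
# The sets A, B, C of (1.1)-(1.3) and the three basic operations.
A = {11, 13, 17, 19, 23, 29}                      # primes between 10 and 30, see (1.1)
B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}      # odd numbers between 10 and 30, see (1.2)
C = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30}  # even numbers between 10 and 30, see (1.3)

print(A & B)   # intersection A ∩ B: equals A, since A ⊆ B (Proposition 4)
print(B & C)   # B ∩ C = set(): B and C are disjoint
print(B - A)   # difference B − A = {15, 21, 25, 27}, the non-prime odd numbers
print(A <= B)  # True: <= tests the inclusion A ⊆ B
```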

In many applications there is a general set of reference, an all-inclusive set, of which
various subsets are considered. For example, for demographers this set can be the entire
population of a country, of which they can consider various subsets according to the demographic
properties that are of interest (for instance, age is a standard demographic variable
through which the population can be subdivided into subsets).

The general set of reference is called the universal set or, more commonly, the space. There
is no standard notation for this set (which is often clear from the context). We denote it
temporarily by S. Given any of its subsets A, the difference S − A is denoted by Aᶜ and
is called the complement set, or simply the complement, of A. The difference operation is
called complementation when it involves the universal set.

⁵ The difference A − B is often denoted by A\B.

Example 7 If S is the set of all citizens of a country and A is the set of all citizens that are
at least 65 years old, the complement Aᶜ consists of all citizens that are (strictly) less than
65 years old. N

It is immediate to verify that, for every A, we have A ∪ Aᶜ = S and A ∩ Aᶜ = ∅. We also
have:

Proposition 8 (Aᶜ)ᶜ = A.

Proof Since we have to verify an equality between sets (as in the proof of Proposition 4),
we have to consider separately the two inclusions (Aᶜ)ᶜ ⊆ A and A ⊆ (Aᶜ)ᶜ.

If a ∈ (Aᶜ)ᶜ, then a ∉ Aᶜ and therefore a ∈ A. It follows that (Aᶜ)ᶜ ⊆ A.

Vice versa, if a ∈ A, then a ∉ Aᶜ and therefore a ∈ (Aᶜ)ᶜ. Hence, A ⊆ (Aᶜ)ᶜ.

Finally, we can easily prove that A − B = A ∩ Bᶜ. Indeed, x ∈ A − B means that x ∈ A
and x ∉ B, that is, x ∈ A and x ∈ Bᶜ.

1.1.3 Properties of the operations

Proposition 9 The operations of union and intersection are:

(i) commutative, that is, for any two sets A and B, we have A ∩ B = B ∩ A and A ∪ B =
B ∪ A;

(ii) associative, that is, for any three sets A, B, and C, we have A ∪ (B ∪ C) = (A ∪ B) ∪ C
and A ∩ (B ∩ C) = (A ∩ B) ∩ C.

We leave the simple proof to the reader. Property (ii) permits us to write A ∪ B ∪ C
and A ∩ B ∩ C and, therefore, to extend without ambiguity the operations of union and
intersection to an arbitrary (finite) number of sets:

⋃_{i=1}^{n} Aᵢ and ⋂_{i=1}^{n} Aᵢ

It is possible to extend such operations also to infinitely many sets. If A₁, A₂, ..., Aₙ, ... is an
infinite collection of sets, the union

⋃_{n=1}^{∞} Aₙ

is the set of the elements that belong to at least one of the Aₙ, that is,

⋃_{n=1}^{∞} Aₙ = {a : a ∈ Aₙ for at least one index n}

The intersection

⋂_{n=1}^{∞} Aₙ

is the set of the elements that belong to every Aₙ, that is,

⋂_{n=1}^{∞} Aₙ = {a : a ∈ Aₙ for every index n}

Example 10 Let Aₙ be the set of the even numbers ≤ n. For example, A₃ = {0, 2} and
A₆ = {0, 2, 4, 6}. We have ⋂_{n=1}^{∞} Aₙ = {0} because 0 is the only even number that
belongs to Aₙ for each n ≥ 1. Moreover, ⋃_{n=1}^{∞} Aₙ = {2n : n ∈ N}, that is, ⋃_{n=1}^{∞} Aₙ is
the set of all even numbers. N

We turn to the relations between the operations of intersection and union. Note the
symmetry between properties (1.4) and (1.5), in which ∩ and ∪ are exchanged.

Proposition 11 The operations of union and intersection are distributive, that is, given
any three sets A, B, and C, we have

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)    (1.4)

and

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)    (1.5)

Proof We prove only (1.4). We have to consider separately the two inclusions A ∩ (B ∪ C) ⊆
(A ∩ B) ∪ (A ∩ C) and (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).

If x ∈ A ∩ (B ∪ C), then x ∈ A and x ∈ B ∪ C, that is, (i) x ∈ A and (ii) x ∈ B or
x ∈ C. It follows that x ∈ A ∩ B or x ∈ A ∩ C, i.e., x ∈ (A ∩ B) ∪ (A ∩ C), and therefore
A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C).

Vice versa, if x ∈ (A ∩ B) ∪ (A ∩ C), then x ∈ A ∩ B or x ∈ A ∩ C, that is, x belongs
to A and to at least one of B and C, and therefore x ∈ A ∩ (B ∪ C). It follows that
(A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).

We now introduce a concept that plays an important role in many applications.

Definition 12 A family {A₁, A₂, ..., Aₙ} of subsets of a set A is a partition of A if the
subsets are pairwise disjoint, that is, Aᵢ ∩ Aⱼ = ∅ for every i ≠ j, and if their union coincides
with A, that is, ⋃_{i=1}^{n} Aᵢ = A.

Example 13 Let A be the set of all citizens of a country. Its subsets A₁, A₂, and A₃
formed, respectively, by the citizens of school or pre-school age (from 0 to 17 years old), by
the citizens of working age (from 18 to 65 years old), and by the elders (over 65 years old)
form a partition of the set A. Relatedly, age cohorts, formed by citizens who have the
same age, form a partition of A. N
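
Whether a given family of subsets is a partition can be checked mechanically. A small illustrative sketch (the function is_partition and the toy population are ours, introduced only for this example):

```python
def is_partition(family, A):
    """Check whether the family of sets is a partition of A (Definition 12)."""
    # (i) pairwise disjointness: A_i ∩ A_j = ∅ for every i ≠ j
    disjoint = all(family[i].isdisjoint(family[j])
                   for i in range(len(family)) for j in range(i + 1, len(family)))
    # (ii) the union of the family coincides with A
    return disjoint and set().union(*family) == A

# The age classes of Example 13, on a toy "population" of ages 0, ..., 100
A = set(range(101))
A1, A2, A3 = set(range(18)), set(range(18, 66)), set(range(66, 101))
print(is_partition([A1, A2, A3], A))  # True
```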

We conclude with the so-called De Morgan's laws for complementation: they illustrate
the relationship between the operations of intersection, union, and complementation.

Proposition 14 Given two subsets A and B of a space S, we have (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.

Proof We prove only the first law, leaving the second one to the reader. As usual, to prove
an equality between sets we have to consider separately the two inclusions that compose it.
(i) (A ∪ B)ᶜ ⊆ Aᶜ ∩ Bᶜ. If x ∈ (A ∪ B)ᶜ, then x ∉ A ∪ B, that is, x belongs neither
to A nor to B. It follows that x belongs simultaneously to Aᶜ and to Bᶜ and, therefore, to
their intersection. (ii) Aᶜ ∩ Bᶜ ⊆ (A ∪ B)ᶜ. If x ∈ Aᶜ ∩ Bᶜ, then x ∉ A and x ∉ B; therefore,
x does not belong to their union.

De Morgan's laws show that, when considering complements, the operations ∪ and ∩
are, essentially, interchangeable. Often these laws are written in the equivalent form

A ∪ B = (Aᶜ ∩ Bᶜ)ᶜ and A ∩ B = (Aᶜ ∪ Bᶜ)ᶜ
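
Since complementation is relative to a space S, De Morgan's laws lend themselves to direct verification on small examples. An illustrative sketch:

```python
# Verify De Morgan's laws (Proposition 14) on a small space S.
S = set(range(10))
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}

def complement(X):
    return S - X  # X^c = S − X

print(complement(A | B) == complement(A) & complement(B))  # (A ∪ B)^c = A^c ∩ B^c: True
print(complement(A & B) == complement(A) | complement(B))  # (A ∩ B)^c = A^c ∪ B^c: True
```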

1.1.4 A naive remark

In this book we will usually define sets by means of the properties of their elements. Such
a "naive" notion of a set is sufficient for our purposes. The naiveté of this approach is
highlighted by the classic paradoxes that, between the end of the nineteenth century and
the early twentieth century, were discovered by Cesare Burali-Forti and Bertrand Russell. Such
paradoxes arise by considering sets of sets, that is, sets whose elements are sets themselves.

As in Burali-Forti, using the naive notion of a set we can define "the set of all sets", that is,
the set whose elements share the property of being sets. If such a universal set "U" existed,
we could also form the set {B : B ⊆ U} that consists of U and all of its subsets. Yet, as
will be shown in Cantor's Theorem 261, such a set does not belong to U, which contradicts
the supposed universality of U. Among the bizarre features of a universal set there is the
fact that it belongs to itself, i.e., U ∈ U, a completely unintuitive property (as observed by
Russell, "the human race, for instance, is not a human").

As suggested by Russell, let us consider the set A formed by all sets that are not members
of themselves (e.g., the set of red oranges belongs to A because its elements are red oranges
and, obviously, none of them is the entire collection of all of them). If A ∉ A, namely if A
does not belong to itself, then A ∈ A because it is a set that satisfies the property of not
belonging to itself. On the other hand, if A ∈ A, namely if A contains itself, then A ∉ A
because, by definition, the elements of A do not contain themselves. In conclusion, we reach
the absurdity A ∉ A if and only if A ∈ A. This is the famous Russell's paradox.

These logical paradoxes (often called antinomies) can be addressed within a non-naive
set theory, in particular that of Zermelo-Fraenkel. In the practice of mathematics, all the
more in an introductory book, these foundational aspects can be safely ignored (their study
would require an ad hoc, highly non-trivial, course). But it is important to be aware of these
paradoxes because the methods that have been developed to address them have affected the
practice of mathematics, as well as that of the empirical sciences.
1.2 Numbers

To quantify the variables of interest in economic applications (for example, the prices and
quantities of goods traded in some market) we need an adequate set of numbers. This is the
topic of the present section.

The natural numbers

0, 1, 2, 3, ...

do not need any introduction; their set will be denoted by the symbol N.

The set N of natural numbers is closed with respect to the fundamental operations of
addition and multiplication:

(i) m + n ∈ N when m, n ∈ N;

(ii) m · n ∈ N when m, n ∈ N.

On the contrary, N is not closed with respect to the fundamental operations of subtraction
and division: for example, neither 5 − 6 nor 5/6 is a natural number. It is, therefore, clear
that N is inadequate as a set of numbers for economic applications: the budget of a company
is an obvious example in which closure with respect to subtraction is crucial –
otherwise, how could we quantify losses?⁶

The integer numbers

..., −3, −2, −1, 0, 1, 2, 3, ...

form a first extension, denoted by the symbol Z, of the set N. It leads to a set that is closed
with respect to addition and multiplication, as well as to subtraction. Indeed, by setting
m − n = m + (−n),⁷ we have

(i) m − n ∈ Z when m, n ∈ Z;

(ii) m · n ∈ Z when m, n ∈ Z.

Formally, the set Z can be written in terms of N as

Z = {m − n : m, n ∈ N}

Proposition 15 N ⊆ Z.

Proof Let m ∈ N. We have m = m − 0 ∈ Z because 0 ∈ N.

We are left with a fundamental operation with respect to which Z is not closed: division.
For example, 1/3 is not an integer. To remedy this important shortcoming of the integers
(if we want to divide 1 cake among 3 guests, how can we quantify their portions if only Z
is available?), we need a further enlargement to the set of the rational numbers, denoted by
the symbol Q, and given by

Q = {m/n : m, n ∈ Z with n ≠ 0}

⁶ Historically, negative numbers have often been viewed with suspicion. It is in economics, indeed, that
they have a most natural interpretation in terms of losses.
⁷ The difference m − n is simply the sum of m and the negative −n of n (recall the notion of algebraic
sum).
In words, the set of the rational numbers consists of all the fractions with an integer
numerator and a (nonzero) integer denominator.

Proposition 16 Z ⊆ Q.

Proof Let m ∈ Z. We have m = m/1 ∈ Q because 1 ∈ Z.

The set of rational numbers is closed with respect to all four fundamental operations:⁸

(i) m ± n ∈ Q when m, n ∈ Q;

(ii) m · n ∈ Q when m, n ∈ Q;

(iii) m/n ∈ Q when m, n ∈ Q with n ≠ 0.

O.R. Each rational number that is not periodic, that is, that has a finite number of decimals,
has two decimal representations. For example, 1 = 0.999... because

0.999... = 3 · 0.333... = 3 · (1/3) = 1

In an analogous way, 2.5 = 2.4999..., 51.2 = 51.1999..., and so on. On the contrary, periodic rational
numbers and irrational numbers have a unique decimal representation (which is infinite).
This is not a simple curiosity: if 0.999... were not equal to 1, we could state that 0.999... is the
number that immediately precedes 1 (without any other number in between), which would
violate a notable property that we will discuss shortly. H

The set of rational numbers seems, therefore, to have all that we need. Some simple
observations on multiplication, however, will bring us some surprising findings. If q is a
rational number, the notation qⁿ, with n ≥ 1, means

q · q ⋯ q (n times)

with q⁰ = 1 for every q ≠ 0. The notation qⁿ, called the power of base q and exponent n, is per se
just shorthand notation for the repeated multiplication of the same factor. Nevertheless,
given a rational q > 0, it is natural to consider the inverse path, that is, to determine the
positive "number", denoted by q^(1/n) – or, equivalently, by ⁿ√q – and called the root of order n of
q, such that

(q^(1/n))ⁿ = q

For example,⁹ √25 = 5 because 5² = 25. To understand the importance of roots, we can
consider the following simple geometric figure:

[Figure: a right triangle with both legs of length 1]

By Pythagoras' Theorem, the length of the hypotenuse is √2. To quantify elementary
geometric entities, we thus need square roots. Here we have a – tragic to some – surprise.¹⁰

Theorem 17 √2 ∉ Q.

Proof Suppose, by contradiction, that √2 ∈ Q. Then there exist m, n ∈ Z such that
m/n = √2, and therefore

(m/n)² = 2    (1.6)

We can assume that m/n is already reduced to its lowest terms, i.e., that m and n have no
factors in common.¹¹ This means that m and n cannot both be even numbers (otherwise, 2
would be a common factor).

Formula (1.6) implies

m² = 2n²    (1.7)

and, therefore, m² is even. As the square of an odd number is odd, m is also even (otherwise,
if m were odd, then m² would also be odd). Therefore, there exists an integer k ≠ 0 such
that

m = 2k    (1.8)

From (1.7) and (1.8) it follows that

n² = 2k²

Therefore n² is even, and so n itself is even. In conclusion, both m and n are even, but this
contradicts the fact that m/n is reduced to its lowest terms. This contradiction proves that
√2 ∉ Q.

⁸ The names of the four fundamental operations are addition, subtraction, multiplication, and division,
while the names of their results are sum, difference, product, and quotient, respectively (the addition of 3
and 4 has 7 as its sum, and so on).
⁹ The square root ²√q is simply denoted by √q, omitting the index 2.

This magnificent result is one of the great theorems of Greek mathematics. Proved by the
Pythagorean school between the sixth and the fifth century B.C., the unexpected outcome of the
– prima facie innocuous – distinction between even and odd numbers that the Pythagoreans
were the first to make, it represented a turning point in the history of mathematics. Leaving
aside the philosophical aspects,¹² from the mathematical point of view it shows the need for
a further enlargement of the set of numbers in order to quantify basic geometric entities (as
well as basic economic variables, as will become clear in the sequel).

To introduce, at an intuitive level, this final enlargement,¹³ consider the real line:

[Figure: the real line]

It is easy to see how on this line we can represent the rational numbers:

[Figure: the real line with some rational numbers marked on it]

The rational numbers do not exhaust, however, the real line. For example, roots like
√2, as well as other non-rational numbers, such as π, must also find their representation on the real
line:¹⁴

[Figure: the real line with √2 and π marked on it]

We denote by R the set of all the numbers that can be represented on the real line; they are
called real numbers.

The set R has the following properties in terms of the fundamental operations (here a, b,
and c are generic real numbers):

(i) a + b ∈ R and a · b ∈ R;

(ii) a + b = b + a and a · b = b · a;

(iii) (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c);

(iv) a + 0 = a and b · 1 = b;

(v) a + (−a) = 0 and b · b⁻¹ = 1 provided b ≠ 0;

(vi) a · (b + c) = a · b + a · c.

Clearly, Q ⊆ R. But Q ≠ R: there are many real numbers, called irrationals, that are
not rational. Many roots and the numbers π and e are examples of irrational numbers. It
is actually possible to prove that most real numbers are irrational. Although a rigorous
treatment of this topic would take us too far, the next simple result is already a clear
indication of how rich the set of the irrational numbers is.

Proposition 18 Given any two real numbers a < b, there exists an irrational number c ∈ R
such that a < c < b.

¹⁰ For the Pythagorean philosophy, in which the proportions (that is, the rational numbers) were central,
the discovery of the non-rationality of square roots was a traumatic event. We refer the curious reader to
Fritz (1945).
¹¹ For example, 14/10 is not reduced to its lowest terms because the numerator and the denominator have
the factor 2 in common. On the contrary, 7/5 is reduced to its lowest terms.
¹² The theorem shows, inter alia, that the hypotenuse contains infinitely many points (otherwise √2 would
be a rational number). This questions the relations between geometry and the physical world that originally
motivated its study (at least under any kind of Atomism, back then advocated by the Ionian school).
¹³ For a rigorous treatment we refer, for example, to the first chapter of Rudin (1976).
¹⁴ Though intuitive, it is actually a postulate (of continuity of the real line).
Proof For each natural number n ≥ 1, let

cₙ = a + √2/n

We have cₙ > a for every n, and it is easy to check that every cₙ is irrational. Moreover,

cₙ < b ⟺ n > √2/(b − a)

Let therefore n ∈ N be any natural number such that n > √2/(b − a).¹⁵ Since a < cₙ < b,
the proof is complete.
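
The proof is constructive, so it can be mimicked numerically. An illustrative sketch (with the caveat that floating-point numbers only approximate the reals):

```python
import math

def irrational_between(a, b):
    """Return c_n = a + sqrt(2)/n with n chosen as in the proof of Proposition 18."""
    n = math.floor(math.sqrt(2) / (b - a)) + 1  # any natural number n > sqrt(2)/(b - a)
    return a + math.sqrt(2) / n

print(irrational_between(0.5, 0.51))  # a number strictly between 0.5 and 0.51
```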

In conclusion, R is the set of numbers that we will consider in the rest of the book. It
turns out to be adequate for most economic applications.¹⁶

1.3 Structure of the integers

Let us now analyze some basic – yet not trivial – properties of the integers. The main result we
will present is the Fundamental Theorem of Arithmetic, which shows the central role that
prime numbers play in the structure of the set of integers.

1.3.1 Divisors and algorithms

In this first section we will present some preliminary notions which will be needed in the
following section on prime numbers. In so doing we will encounter and get acquainted
with the notion of algorithm, which is of paramount importance for applications.

We begin by introducing in a rigorous fashion some notions, the essence of which the
reader may have learned in elementary school. An integer n is divisible by an integer p ≠ 0
if there is a third integer q such that n = pq. In symbols we write p | n, which is read as "p
divides n".

Example 19 The integer 6 is divisible by the integer 2, that is, 2 | 6, because the integer 3
is such that 6 = 2 · 3. Furthermore, 6 is divisible by 3, that is, 3 | 6, because the integer
2 is such that 6 = 3 · 2. N

The reader may have learned in elementary school how to divide two integers by using
remainders and quotients. For example, if n = 7 and m = 2, we have n = 3 · 2 + 1, with 3 as
the quotient and 1 as the remainder. The next simple result formalizes the above procedure
and shows that it holds for any pair of integers (something that young learners take for
granted, but from now on we will take nothing for granted).

Proposition 20 Given any two integers m and n, with m strictly positive,¹⁷ there is one
and only one pair of integers q and r such that

n = qm + r

with 0 ≤ r < m.

¹⁵ Such an n exists because of the Archimedean property of the real numbers, which we will soon see in
Proposition 38.
¹⁶ An important further enlargement, which we do not consider, is the set C of complex numbers.
¹⁷ An integer m is said to be strictly positive if m > 0, that is, m ≥ 1.
Proof Two distinct properties are stated in the proposition: the existence of the pair (q, r)
and its uniqueness. Let us start by proving existence. We will only consider the case
in which n ≥ 0 (one only needs to change signs if n < 0). Consider the set A =
{p ∈ N : p ≤ n/m}. Since n ≥ 0, the set A is non-empty because it contains at least the
integer zero. Let q be the largest element of A. By definition, qm ≤ n < (q + 1)m. Setting
r = n − qm, we have

0 ≤ n − qm = r < (q + 1)m − qm = m

We have thus shown the existence of the desired pair (q, r).

Let us now consider uniqueness. By contradiction, let (q′, r′) and (q″, r″) be two different
pairs such that

n = q′m + r′ = q″m + r″    (1.9)

with 0 ≤ r′, r″ < m. Since (q′, r′) and (q″, r″) are different, we have either q′ ≠ q″ or r′ ≠ r″
or both. If q′ ≠ q″, without loss of generality we can suppose that q′ < q″, that is,

q′ + 1 ≤ q″    (1.10)

since q′ and q″ are integers. It follows from (1.9) that (q″ − q′)m = r′ − r″. Since
(q″ − q′)m ≥ 0, we have that 0 ≤ r″ ≤ r′ < m. Hence,

(q″ − q′)m = r′ − r″ < m

which implies that q″ − q′ < 1, that is, q″ < q′ + 1, which contradicts (1.10). We can
conclude that, necessarily, q′ = q″. This leaves open only the possibility that r′ ≠ r″. But,
since q′ = q″, we have that

0 = (q″ − q′)m = r′ − r″ ≠ 0

a contradiction. Hence, the assumption of having two different pairs (q′, r′) and (q″, r″) is
false.
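
In many programming languages the pair (q, r) of Proposition 20 is built in; in Python, for instance, it is returned by divmod. A quick illustrative check:

```python
# The unique quotient-remainder pair of Proposition 20, via the built-in divmod.
n, m = 7, 2
q, r = divmod(n, m)
print(q, r)           # 3 1, i.e., 7 = 3·2 + 1
assert n == q * m + r and 0 <= r < m

print(divmod(-7, 2))  # (-4, 1), i.e., -7 = (-4)·2 + 1: the remainder still satisfies 0 <= r < m
```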

Given two strictly positive integers m and n, their greatest common divisor, denoted by
gcd(m, n), is the largest divisor that both numbers share. The next result, which was proven
by Euclid in his Elements, shows exactly what was taken for granted in elementary school,
namely, that any pair of integers has a unique greatest common divisor.

Theorem 21 (Euclid) Any pair of strictly positive integers has one and only one greatest
common divisor.

Proof Like Proposition 20, this is also an existence and uniqueness result. Uniqueness is
obvious; let us prove existence. Let m and n be any two strictly positive integers. By
Proposition 20, there is a unique pair (q₁, r₁) such that

n = q₁m + r₁    (1.11)

with 0 ≤ r₁ < m. If r₁ = 0, then gcd(m, n) = m, and the proof is concluded. If r₁ > 0, we
iterate the procedure by applying Proposition 20 to m. We thus have a unique pair (q₂, r₂)
such that

m = q₂r₁ + r₂    (1.12)
m = q2 r1 + r2 (1.12)
where 0 ≤ r₂ < r₁. If r₂ = 0, then gcd(m, n) = r₁. Indeed, (1.12) implies r₁ | m. Furthermore,
by (1.11) and (1.12), we have that

n/r₁ = (q₁m + r₁)/r₁ = (q₁q₂r₁ + r₁)/r₁ = q₁q₂ + 1

and so r₁ | n. Thus r₁ is a divisor of both n and m. We now need to show that it is the
greatest of those divisors. Suppose p is a strictly positive integer such that p | m and p | n.
By definition, there are two strictly positive integers a and b such that n = ap and m = bp.
We have that

0 < r₁/p = (n − q₁m)/p = a − q₁b

Hence r₁/p is a strictly positive integer, which implies that r₁ ≥ p. To sum up, gcd(m, n) =
r₁ if r₂ = 0. If this is the case, the proof is concluded.

If r₂ > 0, we iterate the procedure once more by applying Proposition 20 to r₂. We thus
have a unique pair (q₃, r₃) such that

r₁ = q₃r₂ + r₃

where 0 ≤ r₃ < r₂. If r₃ = 0, proceeding as above we can show that gcd(m, n) = r₂,
and the proof is complete. If r₃ > 0, we iterate the procedure. Iteration after iteration, a
strictly decreasing sequence of positive integers r₁ > r₂ > ⋯ > rₖ is generated. A strictly
decreasing sequence of positive integers can only be finite: there is a k ≥ 1 such that rₖ = 0.
Proceeding as above we can show that gcd(m, n) = rₖ₋₁, which completes the proof of
the existence of gcd(m, n).

From a methodological standpoint, the above argument is a good example of a constructive
proof, since it is based on an algorithm (known as Euclid's Algorithm) which
determines in a finite number of iterations the mathematical entity whose existence is
stated – here, the greatest common divisor. The notion of algorithm is of paramount importance
because, when available, it makes mathematical entities computable. In principle,
an algorithm can be automated by means of an appropriate computer program (for example,
Euclid's Algorithm allows us to automate the search for greatest common divisors).

Euclid's Algorithm is the first algorithm we encounter and it is of such importance in
number theory that it deserves to be reviewed in greater detail. Given two strictly positive
integers m and n, the algorithm unfolds in the following k ≥ 1 steps:

Step 1: n = q₁m + r₁
Step 2: m = q₂r₁ + r₂
Step 3: r₁ = q₃r₂ + r₃
...
Step k: rₖ₋₂ = qₖrₖ₋₁ (that is, rₖ = 0)

The algorithm stops at step k when rₖ = 0. In this case gcd(m, n) = rₖ₋₁, as we saw in
the previous proof.

Example 22 Let us consider the strictly positive integers 3801 and 1708. Their greatest
common divisor is not apparent at first sight. Fortunately, we can calculate it by means of
Euclid's Algorithm. We proceed as follows:

Step 1: 3801 = 2 · 1708 + 385
Step 2: 1708 = 4 · 385 + 168
Step 3: 385 = 2 · 168 + 49
Step 4: 168 = 3 · 49 + 21
Step 5: 49 = 2 · 21 + 7
Step 6: 21 = 3 · 7

In six steps we have found that gcd(3801, 1708) = 7. N
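
Euclid's Algorithm translates into a few lines of code. An illustrative sketch that prints each step and thus reproduces Example 22:

```python
def euclid_gcd(m, n):
    """Greatest common divisor via Euclid's Algorithm, printing each step."""
    n, m = max(m, n), min(m, n)
    step = 0
    while m != 0:
        q, r = divmod(n, m)  # Proposition 20: n = qm + r with 0 <= r < m
        step += 1
        print(f"Step {step}: {n} = {q} * {m} + {r}")
        n, m = m, r          # iterate on the pair (m, r)
    return n                 # the last nonzero remainder

print(euclid_gcd(3801, 1708))  # prints the six steps above and returns 7
```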

The quality of an algorithm depends on the number of steps, or iterations, that are
required to reach the solution. The fewer the iterations, the more powerful the algorithm is.
The following remarkable property – proven by Gabriel Lamé – holds for Euclid's Algorithm.

Theorem 23 (Lamé) Given two integers m and n, the number of iterations needed by
Euclid's Algorithm is less than or equal to five times the number of digits of min{m, n}.

For example, if we go back to the numbers 3801 and 1708, the number of relevant digits
is 4. Lamé's Theorem guaranteed in advance that Euclid's Algorithm would require
at most 20 iterations. It took us only 6 steps, but thanks to Lamé's Theorem we already
knew, before starting, that it would not take too much effort (and thus it was worth
giving it a shot without running the risk of getting stuck in a grueling number of iterations).

1.3.2 Prime numbers

Among the natural numbers, a prominent position is held by the prime numbers, which the
reader has most likely encountered in secondary school.

Definition 24 A natural number n ≥ 2 is said to be prime if it is divisible only by 1 and
itself.

A natural number which is not prime is called composite. Let us denote the set of prime
numbers by P. Obviously, P ⊆ N and N − P is the set of composite numbers. The reader can
easily verify that the following naturals

{2, 3, 5, 7, 11, 13, 17, 19, 23, 29}

are the first ten prime numbers.

The importance of prime numbers becomes more apparent if we note how composite
numbers (strictly greater than 1) can be expressed as a product of primes. For example, the
composite number 12 can be written as

12 = 2² · 3
while the composite number 60 can be written as

60 = 2² · 3 · 5

In general, the prime factorization (or decomposition) of a composite number n can be
written as

n = p₁^{n₁} · p₂^{n₂} ⋯ pₖ^{nₖ}    (1.13)

where pᵢ ∈ P and nᵢ ∈ N for each i = 1, ..., k, with

p₁ < p₂ < ⋯ < pₖ and n₁ > 0, ..., nₖ > 0

Example 25 (i) For n = 12 we have p₁ = n₁ = 2, p₂ = 3 and n₂ = 1; in this case k = 2.
(ii) For n = 60 we have p₁ = n₁ = 2, p₂ = 3, n₂ = 1, p₃ = 5 and n₃ = 1; in this case k = 3.
(iii) For n = 200 we have

200 = 2³ · 5²

hence p₁ = 2, n₁ = 3, p₂ = 5 and n₂ = 2; in this case k = 2. (iv) For n = 522 we have

522 = 2 · 3² · 29

hence p₁ = 2, n₁ = 1, p₂ = 3, n₂ = 2, p₃ = 29 and n₃ = 1; in this case k = 3. N
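
For small numbers, the factorization (1.13) can be computed by naive trial division. An illustrative sketch (as discussed below, this approach is hopeless for large n):

```python
def prime_factorization(n):
    """Return the factorization (1.13) of n > 1 as a list of (p_i, n_i) pairs."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            exponent = 0
            while n % p == 0:  # divide out the prime p as many times as possible
                n //= p
                exponent += 1
            factors.append((p, exponent))
        p += 1
    if n > 1:                  # whatever is left over is itself prime
        factors.append((n, 1))
    return factors

print(prime_factorization(522))  # [(2, 1), (3, 2), (29, 1)], i.e., 522 = 2 · 3² · 29
```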

What we have just seen raises two questions: whether every natural number admits
a prime factorization (we have only seen a few examples so far) and whether such a
factorization is unique. The next result, the Fundamental Theorem of Arithmetic, addresses
both questions by showing that every natural number n > 1 admits one and only one prime
factorization. In other words, every such number can be expressed uniquely as a product of
prime numbers.

Prime numbers are thus the "atoms" of N: they are "indivisible" – as they are divisible
only by 1 and themselves – and by means of them any other natural number can be expressed
uniquely. The importance of this result, which shows the centrality of prime numbers, can
be seen in its name. Its first proof can be found in the famous Disquisitiones Arithmeticae,
published in 1801 by Carl Friedrich Gauss, although Euclid was already aware of the result
in its essence.

Theorem 26 (Fundamental Theorem of Arithmetic) Any natural number n > 1 admits
one and only one prime factorization as in (1.13).

Proof Let us start by showing the existence of the factorization. We will proceed by
contradiction. Suppose there are natural numbers that do not have a prime factorization
as in (1.13). Let n > 1 be the smallest among them. Obviously, n is a composite number.
There are then two natural numbers p and q such that n = pq with 1 < p, q < n. Since n
is the smallest number that does not admit a prime factorization, the numbers p and q do
admit such factorizations. In particular, we can write

p = p₁^{n₁} · p₂^{n₂} ⋯ pₖ^{nₖ} and q = q₁^{n′₁} · q₂^{n′₂} ⋯ qₛ^{n′ₛ}

Thus, we have that

n = pq = p₁^{n₁} · p₂^{n₂} ⋯ pₖ^{nₖ} · q₁^{n′₁} · q₂^{n′₂} ⋯ qₛ^{n′ₛ}
By collecting the terms pᵢ and qⱼ appropriately, n can be rewritten as in (1.13). Hence, n
admits a prime factorization, which contradicts our assumption on n, thus concluding the
proof of existence.

Let us proceed by contradiction to prove uniqueness as well. Suppose that there are
natural numbers that admit more than one factorization. Let n > 1 be the smallest among
them: then n admits at least two different factorizations, so that we can write

n = p₁^{n₁} · p₂^{n₂} ⋯ pₖ^{nₖ} = q₁^{n′₁} · q₂^{n′₂} ⋯ qₛ^{n′ₛ}

Since q₁ is a divisor of n, it must be a divisor of at least one of the factors p₁ < ⋯ < pₖ.¹⁸
For example, let p₁ be one such factor. Since both q₁ and p₁ are primes, we have that q₁ = p₁.
Hence

p₁^{n₁−1} · p₂^{n₂} ⋯ pₖ^{nₖ} = q₁^{n′₁−1} · q₂^{n′₂} ⋯ qₛ^{n′ₛ} < n

which contradicts the minimality of n, as the number p₁^{n₁−1} · p₂^{n₂} ⋯ pₖ^{nₖ} also admits multiple
factorizations. The contradiction proves the uniqueness of the prime factorization.

From a methodological viewpoint it must be noted that this proof of existence is carried
out by contradiction and, as such, cannot be constructive. Indeed, such proofs are based on
the law of excluded middle (a property is either true or false; cf. Appendix D) and the truth
of a statement is established by showing its non-falseness. This often allows such proofs
to be short and elegant but, although logically air-tight,¹⁹ they are almost metaphysical, as
they do not provide a procedure for constructing the mathematical entities whose existence
they establish. In other words, they do not provide an algorithm with which such entities
can be determined.

To sum up, we invite the reader to compare this proof of existence with the constructive
one provided for Theorem 21. This comparison should clarify the differences between the two
fundamental types of proofs of existence, constructive/direct and non-constructive/indirect.

It is not a coincidence that the proof of existence in the Fundamental Theorem of
Arithmetic is not constructive. Indeed, designing algorithms which allow us to factorize
a natural number n into prime numbers – the so-called factorization tests – is exceedingly
complex. After all, constructing algorithms which can assess whether n is prime or composite
– the so-called primality tests – is already extremely cumbersome, and it is to this day an
active research field (so much so that an important result in this field dates to 2002).²⁰

To grasp the complexity of the problem it suffices to observe that, if n is composite, there
are two natural numbers a, b > 1 such that n = ab. Hence, a ≤ √n or b ≤ √n (otherwise,
ab > n), so there is a divisor of n among the natural numbers between 1 and √n. To verify
whether n is prime or composite, we can merely divide n by all natural numbers between 1
and √n: if none of them is a divisor of n, we can safely conclude that n is a prime number,
or, if this is not the case, that n is composite. This procedure requires at most √n steps.

¹⁸ This mathematical fact, although intuitive, requires a mathematical proof. It is indeed the content of
Euclid's Lemma, which we do not prove. This lemma permits us to conclude that, if a prime p divides a product
of strictly positive integers, then it must divide at least one of them.
¹⁹ Unless one rejects the law of excluded middle, as some eminent mathematicians have done (although this
constitutes a minority view and a very subtle methodological issue, the analysis of which is surely premature).
²⁰ One of the reasons why the study of factorization tests is an active research field is that the difficulty
of factorizing natural numbers is exploited by modern cryptography to build unbreakable codes (see Section
6.4).
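
The trial-division procedure just described takes only a few lines of code. An illustrative sketch:

```python
import math

def is_prime(n):
    """Trial-division primality test: at most sqrt(n) divisions, as in the text."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):  # candidate divisors up to sqrt(n)
        if n % d == 0:
            return False                   # d divides n: n is composite
    return True

print([p for p in range(2, 31) if is_prime(p)])  # the first ten primes of Section 1.3.2
```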
With this in mind, suppose we want to test whether the number 10¹⁰⁰ + 1 is prime or
composite (it is a number with 101 digits, so it is big but not huge). The procedure requires
at most √(10¹⁰⁰ + 1) operations, that is, at most 10⁵⁰ operations (approximately). Suppose we
have an extremely powerful computer which is able to carry out 10¹⁰ (ten billion) operations
per second. Since there are 31,536,000 seconds in a year, that is, approximately 3 · 10⁷
seconds, our computer would be able to carry out approximately 3 · 10⁷ · 10¹⁰ = 3 · 10¹⁷
operations in one year. To carry out the operations that our procedure might require, our
computer would thus need

10⁵⁰ / (3 · 10¹⁷) = (1/3) · 10³³

years. We had better get started...

It should be noted that, if the prime factorizations of two natural numbers n and m are
known, we can easily determine their greatest common divisor. For example, from

3801 = 3 · 7 · 181 and 1708 = 2² · 7 · 61

it easily follows that gcd(3801, 1708) = 7, which confirms the result of Euclid's Algorithm.
Given how difficult it is to factorize natural numbers, however, this observation is hardly useful
from a computational standpoint. Thus, it is a good idea to hold on to Euclid's Algorithm, which
thanks to Lamé's Theorem is able to produce greatest common divisors with reasonable
efficiency, without having to conduct any factorization.
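
As a sketch of this observation (reusing the prime_factorization function from the earlier illustration), the gcd is obtained by taking the primes common to both factorizations, each with the smaller of its two exponents:

```python
# gcd from the prime factorizations: common primes, each with its minimum exponent.
f1 = dict(prime_factorization(3801))  # {3: 1, 7: 1, 181: 1}
f2 = dict(prime_factorization(1708))  # {2: 2, 7: 1, 61: 1}

gcd = 1
for p in f1.keys() & f2.keys():       # primes appearing in both factorizations
    gcd *= p ** min(f1[p], f2[p])
print(gcd)                            # 7, as found by Euclid's Algorithm
```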

But how many are there?

Given the importance of prime numbers, it is natural to ask how many there are. The next
celebrated result of Euclid shows that there are infinitely many. After Theorem 17, it is the
second remarkable gem of Greek mathematics we have the pleasure to meet in these few pages.

Theorem 27 (Euclid) There are infinitely many prime numbers.

Proof The proof is carried out by contradiction. Suppose that there are only finitely many
prime numbers and denote them by p₁ < p₂ < ⋯ < pₙ. Define

q = p₁ · p₂ ⋯ pₙ

and set m = q + 1. The natural number m is larger than any prime number, hence it is a
composite number. By the Fundamental Theorem of Arithmetic, it is divisible by at least
one of the prime numbers p₁, p₂, ..., pₙ. Let us denote this divisor by p. Both natural
numbers m and q are thus divisible by p. It follows that their difference, that is, the
natural number 1 = m − q, is also divisible by p, which is impossible since p > 1. Hence, the
assumption that there are finitely many prime numbers is false.

In conclusion, we have looked at some basic notions of number theory, the branch of
mathematics which deals with the properties of the integers. It is one of the most fascinating
and complex fields of mathematics, and it bears incredibly deep results, often easy to state
but hard to prove. A classic example is the famous Fermat's Last Theorem, whose statement
is quite simple: if n ≥ 3, there cannot exist three strictly positive integers x, y, and z
such that xⁿ + yⁿ = zⁿ. Thanks to Pythagoras' Theorem we know that for n = 2 such
triplets of integers do exist (for example, 3² + 4² = 5²); Fermat's Last Theorem states that
n = 2 is indeed the only case in which this remarkable property holds. Stated by Fermat,
the theorem was first proven in 1994 by Andrew Wiles, after more than three centuries of
unfruitful attempts.

1.4 Order structure of R

We now turn our attention to the set R of the real numbers, which is central for applications.
An important property of R is the possibility of ordering its elements through the inequality
≥. The intuitive meaning of this inequality is clear: given two real numbers a and b, we
have a ≥ b when a is at least as great as b.

Consider the following properties of the inequality ≥:

(i) reflexivity: a ≥ a;

(ii) antisymmetry: if a ≥ b and b ≥ a, then a = b;

(iii) transitivity: if a ≥ b and b ≥ c, then a ≥ c;

(iv) completeness (or totality): for every pair a, b ∈ R, we have a ≥ b or b ≥ a (or both);

(v) additive independence: if a ≥ b, then a + c ≥ b + c for every c ∈ R;

(vi) multiplicative independence: let a ≥ b; then

ac ≥ bc if c > 0

ac = bc = 0 if c = 0

ac ≤ bc if c < 0

(vii) separation:²¹ given two sets of real numbers A and B, if a ≥ b for every a ∈ A and
b ∈ B, then there exists c ∈ R such that a ≥ c ≥ b for every a ∈ A and b ∈ B.

The first three properties have an obvious interpretation. Completeness guarantees that
any two real numbers can always be ordered. Additive independence ensures that the initial
ordering between two real numbers a and b is not altered by adding the same real number c
to both. Multiplicative independence considers, instead, the stability of such an ordering with
respect to multiplication.

Finally, separation permits us to separate two sets ordered by ≥ – that is, such that each
element of one of the two sets is greater than or equal to each element of the other one –

²¹ Sometimes the property of separation of real numbers is called the axiom of completeness (or of continuity, or
also of Dedekind). We do not adopt this terminology to avoid confusion with property (iv) of completeness
or totality.
through a real number c, called a separating element.²² Separation is a fundamental property
of "continuity" of the real numbers; it is what mainly distinguishes them from the rational
numbers (for which such a property does not hold, as remarked in footnote 22) and makes
them the natural environment for mathematical analysis.

The strict form a > b of the "weak" inequality ≥ indicates that a is strictly greater than
b. In terms of ≥, we have a > b if and only if b ≱ a; that is, the strict inequality can be
defined as the negation of the weak inequality (of opposite direction). The reader can verify
that transitivity and independence (both additive and multiplicative) hold also for the strict
inequality >, while the other properties of the inequality ≥ do not hold for >.

The order structure, characterized by properties (i)-(vii), is fundamental in R. Before
starting its study, we introduce by means of ≥ and > some fundamental subsets of R:

(i) the closed bounded intervals [a, b] = {x ∈ R : a ≤ x ≤ b};

(ii) the open bounded intervals (a, b) = {x ∈ R : a < x < b};

(iii) the half-closed (or half-open) bounded intervals (a, b] = {x ∈ R : a < x ≤ b} and
[a, b) = {x ∈ R : a ≤ x < b}.

Other important intervals are:

(iv) the unbounded intervals [a, ∞) = {x ∈ R : x ≥ a} and (a, ∞) = {x ∈ R : x > a}, and
their analogues (−∞, a] and (−∞, a).²³ In particular, the positive half-line [0, ∞) is
often denoted by R₊, while R₊₊ denotes (0, ∞), that is, the positive half-line without
the origin.

The use of the adjectives open, closed, and unbounded will become clear in Chapter 5. To
ease notation, in the rest of the chapter (a, b) will denote both an open bounded interval and
the unbounded intervals (a, ∞), (−∞, b), and (−∞, ∞) = R. Analogously, (a, b] and [a, b) will
denote both the half-closed bounded intervals and the unbounded intervals (−∞, b] and [a, ∞).

1.4.1 Maxima and minima

Definition 28 Let A ⊆ R be a non-empty set. A number h ∈ R is called an upper bound of A
if it is greater than or equal to each element of A, that is, if²⁴

h ≥ x ∀x ∈ A

while it is called a lower bound of A if it is smaller than or equal to each element of A, that
is, if

h ≤ x ∀x ∈ A

²² The property of separation holds also for N and Z, but not for Q. For example, the sets A =
{q ∈ Q : q < √2} and B = {q ∈ Q : q > √2} do not have a rational separating element (as the reader can
verify in light of Theorem 17 and of what we will see in Section 1.4.3).
²³ When there is no danger of confusion, we will write simply ∞ instead of +∞. The symbol ∞, introduced
in mathematics by John Wallis in the 17th century, recalls a curve called the lemniscate and a kind of hat or
halo (symbol of force) put on the head of some tarot card figures: in any case, it is definitely not a flattened
8.
²⁴ The universal quantifier ∀ reads "for every". Therefore, "∀x ∈ A" reads "for every element x that belongs
to the set A" (see Appendix D).
For example, if A = [0, 1], the number 3 is an upper bound and the number −1 is a lower
bound since −1 ≤ x ≤ 3 for every x ∈ [0, 1]. In particular, the set of upper bounds of A is
the interval [1, ∞) and the set of lower bounds is the interval (−∞, 0].

We will denote by A^* the set of upper bounds of A and by A_* the set of lower bounds.
In the example just seen, A^* = [1, ∞) and A_* = (−∞, 0].

A few simple remarks. Let A be any set.

(i) Upper bounds and lower bounds do not necessarily belong to the set A: the upper
bound 3 and the lower bound −1, for the set [0, 1], are an example of this.

(ii) Upper bounds and lower bounds might not exist. For example, for the set of even
numbers

{0, 2, 4, 6, ...}    (1.14)

there is no real number which is greater than all its elements: hence, this set does not
have upper bounds. Analogously, the set

{0, −2, −4, −6, ...}    (1.15)

has no lower bounds, while the set of integers Z is a simple example of a set without
upper and lower bounds.

(iii) If h is an upper bound, so is any h′ > h; analogously, if h is a lower bound, so is any
h″ < h. Therefore, if they exist, upper bounds and lower bounds are not unique.

Through upper bounds and lower bounds we can give a first classification of sets of the
real line.

Definition 29 A non-empty set A ⊆ R is said to be:

(i) bounded (from) above if it has an upper bound, that is, A^* ≠ ∅;

(ii) bounded (from) below if it has a lower bound, that is, A_* ≠ ∅;

(iii) bounded if it is bounded both above and below.

For example, the closed interval [0, 1] is bounded because it is bounded both above and
below, while the set of even numbers (1.14) is bounded below, but not above (indeed, it has
no upper bounds).²⁵ Analogously, the set (1.15) is bounded above, but not below.

Note that this classification of sets is not exhaustive: there exist sets that do not fall
into any of the types (i)-(iii) of the previous definition. For example, Z has neither an upper
bound nor a lower bound in R, and therefore it is not of any of the types (i)-(iii). Such sets
are called unbounded.

We now introduce a fundamental class of upper and lower bounds.

²⁵ By using Proposition 38, the reader can formally prove that, indeed, the set of even numbers is not
bounded from above.
Definition 30 Given a non-empty set A ⊆ R, an element x̂ of A is called the maximum of A
if it is the greatest element of A, that is, if

x̂ ≥ x ∀x ∈ A

while it is called the minimum of A if it is the smallest element of A, that is, if

x̂ ≤ x ∀x ∈ A

The key feature of this definition is the condition that the maximum and the minimum belong
to the set A at hand. It is immediate to see that maxima and minima are, respectively, upper
bounds and lower bounds. Indeed, they are nothing but the upper bounds and lower bounds
that belong to the set A. For this reason, maxima and minima can be seen as the "best"
among the upper bounds and the lower bounds. Many economic applications are, indeed,
based on the search for maxima or minima of suitable sets of alternatives.

Example 31 The closed interval [0, 1] has minimum 0 and maximum 1. N

Unfortunately, maxima and minima are fragile notions: sets often do not admit them.

Example 32 The half-closed interval [0, 1) has minimum 0, but it has no maximum. Indeed,
suppose by contradiction that there exists a maximum x̂ ∈ [0, 1), so that x̂ ≥ x for every
x ∈ [0, 1). Set

x̃ = (1/2)x̂ + (1/2) · 1

Since x̂ < 1, we have x̂ < x̃. But it is obvious that x̃ ∈ [0, 1), which contradicts the fact
that x̂ is the maximum of [0, 1). N

By reasoning in a similar way, we see that:

(i) the half-closed interval (0, 1] has maximum 1, but it has no minimum;

(ii) the open interval (0, 1) has neither minimum nor maximum.

When they exist, maxima and minima are unique:

Proposition 33 A set A ⊆ R has at most one maximum and one minimum.

Proof Let x̂₁, x̂₂ ∈ A be two maxima of A. We show that x̂₁ = x̂₂. Since x̂₁ is a maximum,
we have x̂₁ ≥ x for every x ∈ A. In particular, since x̂₂ ∈ A, we have x̂₁ ≥ x̂₂. Analogously,
x̂₂ ≥ x̂₁ because x̂₂ is also a maximum. Therefore, x̂₁ = x̂₂. In a similar way, we can prove
the uniqueness of the minimum.

The maximum of a set A is denoted by max A, and the minimum by min A. For example,
for A = [0, 1] we have max A = 1 and min A = 0.
1.4.2 Supremum and infimum

Since maxima and minima are key for applications (and not only there), their fragility is a
substantial problem. To mitigate it, we look for a "surrogate": a conceptually similar, but
less fragile, notion which is available also when maxima or minima are absent.

Let us consider maxima first.²⁶ We begin by noting that the maximum, when it exists,
is the smallest (least) upper bound, that is,

max A = min A^*    (1.16)

Indeed, let x̂ ∈ A be the maximum of A. If h is an upper bound of A, we have h ≥ x̂, since
x̂ ∈ A. On the other hand, x̂ is itself an upper bound, and we thus obtain (1.16).

Example 34 The set of upper bounds of [0, 1] is the interval [1, ∞). In this example, the
equality (1.16) takes the form max[0, 1] = min[1, ∞). N

Thus, when it exists, the maximum is the smallest upper bound. But the smallest upper
bound – that is, min A^* – might exist also when the maximum does not exist. For example,
consider A = [0, 1): the maximum does not exist, but the smallest upper bound exists and
it is 1, i.e., min A^* = 1.

All of this suggests that the smallest upper bound is the surrogate for the maximum
which we are looking for. Indeed, in the example just seen, the point 1 is, in the absence of a
maximum, its closest approximation.

Reasoning in a similar way, the greatest lower bound, i.e., max A_*, is the natural candidate
to be the surrogate for the minimum when the latter does not exist. Motivated by what
we have just seen, we give the following definition.

Definition 35 Given a non-empty set A ⊆ R, the supremum of A is the least upper bound
of A, that is, min A^*, while the infimum is the greatest lower bound of A, that is, max A_*.

Thanks to Proposition 33, both the supremum and the infimum of A are unique, when
they exist. We denote them by sup A and inf A. For example, for A = (0, 1) we have
inf A = 0 and sup A = 1.

As already remarked, when inf A ∈ A, it is the minimum of A, and when sup A ∈ A, it
is the maximum of A.

Although suprema and infima may exist when maxima and minima do not, they do not
always exist.

Example 36 Consider the set A of the even numbers in (1.14). In this case A^* = ∅ and so
A has no supremum. More generally, if A is not bounded above, we have A^* = ∅ and the
supremum does not exist. In a similar way, the sets that are not bounded below have no
infima.²⁷ N

²⁶ As already mentioned, in economics maxima play a fundamental role.
²⁷ If A does not admit a supremum, we write sup A = +∞ and, when it does not admit an infimum, inf A = −∞.
Moreover, by convention, we set sup ∅ = −∞ and inf ∅ = +∞. This is motivated by the fact that each real
number must be considered simultaneously an upper bound and a lower bound of ∅: it is then natural to
conclude that sup ∅ = inf ∅^* = inf R = −∞ and inf ∅ = sup ∅_* = sup R = +∞.
To be a useful surrogate, suprema and infima must exist for a large class of sets; otherwise,
if their existence were also problematic, they would be of little help as surrogates.²⁸
Fortunately, the next important result shows that suprema and infima do indeed exist for a
large class of sets (with sets of the kind seen in the last example being the only troublesome
ones).

Theorem 37 (Least Upper Bound Principle) Each non-empty set A ⊆ R has a supremum
if it is bounded above and an infimum if it is bounded below.

Proof We limit ourselves to proving the first statement. To say that A is bounded above
means that it admits an upper bound, i.e., that A^* ≠ ∅. Since a ≤ h for every a ∈ A and
every h ∈ A^*, by the separation property there exists a separating element c ∈ R such that
a ≤ c ≤ h for every a ∈ A and every h ∈ A^*. Since c ≥ a for every a ∈ A, we have that c
is an upper bound of A, so that c ∈ A^*. But, since c ≤ h for every h ∈ A^*, it follows that
c = min A^*, that is, c = sup A. This proves the existence of the supremum of A.

Except for the sets that are not bounded above, all the other sets in R admit a supremum.
Analogously, except for the sets that are not bounded below, all the other sets in R have
an infimum. Suprema and infima are thus excellent surrogates that exist, and so help us, for a
large class of subsets of R.

Note that a simple, but useful, consequence of the previous theorem is that bounded sets
have both a supremum and an infimum.

1.4.3 Density

The order structure is also useful to clarify the relations among the sets N, Z, Q, and R.
First of all, we make rigorous a natural intuition: however great a real number is, there
always exists a greater natural number. This is the so-called Archimedean property of the real
numbers.

Proposition 38 For each real number a ∈ R, there exists a natural number n ∈ N such that
n ≥ a.

Proof By contradiction, assume that there exists a ∈ R such that a > n for all n ∈ N.
By the Least Upper Bound Principle, sup N exists and belongs to R. Recall that, by the
definition of sup,

sup N ≥ n ∀n ∈ N    (1.17)

At the same time, again by the definition of sup, we have sup N − 1 < n for some n ∈ N
(otherwise, sup N − 1 would be an upper bound of N, thus violating the fact that sup N is the
least of these upper bounds). We can conclude that sup N < n + 1 ∈ N, which contradicts
(1.17).

The next property shows a fundamental difference between the structures of N and Z,
on the one side, and of Q and R, on the other side. If we take an integer, we can talk in
a natural way of its predecessor and successor: if m ∈ Z, its predecessor is the integer m − 1,

²⁸ The utility of a surrogate depends on how well it approximates the original, as well as on its availability.
28 CHAPTER 1. SETS AND NUMBERS: AN INTUITIVE INTRODUCTION

while its successor is the integer m + 1 (for example, the predecessor of 317 is 316 and its
successor is 318). In other words, Z has a discrete “rhythm”.
In contrast, we cannot talk of predecessors and successors in Q or in R. Consider …rst
Q. Given a rational number q = m=n, let q 0 = m0 =n0 be any rational such that q 0 > q. Set

1 0 1
q 00 = q + q
2 2
The number q 00 is rational, since

1 m0 1 m 1 m0 n + mn0
q 00 = 0
+ =
2 n 2 n 2 nn0

and one has


q < q 00 < q 0 (1.18)
Therefore, there is no smallest rational number greater than q. Analogously, it is easy to see
that there is no greatest rational number smaller than q. Rational numbers, hence, do not
admit predecessors and successors.
In a similar way, given any two real numbers a < b there exists a real number c such that
a < c < b. Indeed,
1 1
a< a+ b<b
2 2
Real numbers as well, therefore, do not admit predecessors and successors. The rhythm of
both rational and real numbers is “tight”, without discrete interruptions (which are inter-
vals). Such property of Q and R is called density. Unlike N and Z, which are discrete sets,
Q and R are dense sets.29

We conclude with an important density relationship between Q and R. We already
observed how most real numbers are not rational. Nevertheless, rational numbers are a
“dense” – and therefore very significant – subset of the real numbers because, as we show next,
between any two real numbers we can always “insert” a rational number.

Proposition 39 Given any two real numbers a < b, there exists a rational number q ∈ Q
such that a < q < b.

This property can be stated by saying that Q is dense in R. In the proof of this result
we use the notion of integer part [a] of a real number a ∈ R, which is the greatest integer
n ∈ Z such that n ≤ a. For example, [π] = 3, [5/2] = 2, [√2] = 1, [−π] = −4, and so on.
The reader can verify that
[a + 1] = [a] + 1   (1.19)
since, for each n ∈ Z, we have n ≤ a if and only if n + 1 ≤ a + 1. Moreover, [a] < a when
a ∉ Z.
^29 In his famous argument against plurality, Zeno of Elea remarks that a “plurality” is infinite because “...
there will always be other things between the things that are, and yet others between those others.” (trans.
Raven). Zeno thus identifies density as the characterizing property of an infinite collection. With (twenty-five
centuries of) hindsight, we can say that he is neglecting the integers. Yet, it is stunning how he was
able to identify a key property of infinite sets.

Proof Let a, b ∈ R, with a < b. For simplicity, we distinguish three cases.

Case 1: Let a + 1 = b. If a ∈ Q the result follows from (1.18). Let a ∉ Q, and therefore
a + 1 ∉ Q. We have
[a] ≤ a < [a] + 1 = [a + 1] < a + 1   (1.20)
So, q = [a] + 1 is the rational number we were looking for.

Case 2: Let b − a > 1, i.e., a < a + 1 < b. From Case 1 it follows that there exists q ∈ Q
such that a < q < a + 1 < b.

Case 3: Let b − a < 1. By the Archimedean property of the real numbers, there exists 0 ≠ n ∈ N
such that
n ≥ 1 / (b − a)
So, nb − na = n(b − a) ≥ 1. Then, by what we have just seen in Cases 1 and 2, there exists
q ∈ Q such that na < q < nb. Therefore a < q/n < b, which completes the proof because
q/n ∈ Q.
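As an aside, the construction in Case 3 is effectively an algorithm. Here is a minimal Python sketch (the helper name rational_between is ours, not the book's) that, assuming a < b, scales by an integer n given by the Archimedean property and uses the integer part to produce a rational strictly between two reals:

```python
import math
from fractions import Fraction

def rational_between(a, b):
    """Follow Case 3 of the proof: pick n with n(b - a) > 1, then
    q = [na] + 1 is an integer with na < q < nb, so the rational q/n
    lies strictly between a and b."""
    assert a < b
    n = math.ceil(1 / (b - a)) + 1   # Archimedean property
    q = math.floor(n * a) + 1        # integer part construction, as in (1.20)
    return Fraction(q, n)

a, b = math.sqrt(2), math.sqrt(2) + 1e-6
q = rational_between(a, b)
print(float(q), a < q < b)   # a rational in (a, b); True
```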

1.5 Powers and logarithms


1.5.1 Powers
Given n ∈ N, we have already recalled the meaning of q^n with q ∈ Q and of q^(1/n) with
0 < q ∈ Q. In a similar way we define a^n with a ∈ R and a^(1/n) with 0 < a ∈ R. More generally,
we set
a^(−n) = 1 / a^n  and  a^(m/n) = (a^m)^(1/n)
for m, n ∈ N and 0 < a ∈ R. We have, therefore, defined the power a^r with positive real base
and rational exponent. Sometimes we write ⁿ√(a^m) instead of a^(m/n).
Given 0 < a ∈ R, we now want to extend this notion to the case a^x with x ∈ R, i.e., with
real exponent. Before doing this, we make two important observations.

(i) We have defined a^r only for a > 0 to avoid dangerous and embarrassing misunderstandings.
Think, for example, of (−5)^(3/2). It could be rewritten as √((−5)^3) = √(−125) or as
(√(−5))^3, which do not exist (among the real numbers). But, it could also be written
as (−5)^(6/4) = ((−5)^6)^(1/4) which, in turn, can be expressed as either ⁴√((−5)^6) = ⁴√15,625, or
(⁴√(−5))^6. The former exists and is approximately equal to 11.180339, but the latter
does not exist.
(ii) Let us consider the root √a = a^(1/2). From high school we know that each positive number
has two algebraic roots: for example, √9 = ±3. The unique positive value of the root
is called, instead, the arithmetical root. For example, 3 and −3 are the two algebraic roots
of 9, while 3 is its unique arithmetical root. In what follows the (even order) roots will
always be taken in the arithmetical sense (and therefore with a unique value). This is, by the
way, the standard convention: for example, in the classic solution formula
x = (−b ± √(b^2 − 4ac)) / (2a)
of the quadratic equation ax^2 + bx + c = 0, the root is in the arithmetical sense (this
is why we need to write ±).

We now extend the notion of power to the case a^x, with 0 < a ∈ R and x ∈ R. Unfortunately,
the details of this extension are tedious, so we limit ourselves to saying that, if
a > 1, the power a^x is the supremum of the set of all the values a^q when the exponent q
varies among the rational numbers such that q ≤ x. Formally,
a^x = sup {a^q : q ≤ x with q ∈ Q}   (1.21)
In a similar way we define a^x for 0 < a < 1. We have the following properties that, by (1.21),
follow from the analogous properties that hold when the exponent is rational.

Lemma 40 Let a, b > 0 and x, y ∈ R. We have a^x > 0 for every x ∈ R. Moreover:

(i) a^x a^y = a^(x+y) and a^x / a^y = a^(x−y);

(ii) (a^x)^y = a^(xy);

(iii) a^x b^x = (ab)^x and a^x / b^x = (a/b)^x;

(iv) if x ≠ y and a ≠ 1, then a^x ≠ a^y; in particular, if x > y then
a^x > a^y if a > 1
a^x < a^y if a < 1
a^x = a^y = 1 if a = 1

The most important base a is Napier's constant e, which will be introduced in Chapter 8.
As we will see, the power e^x has truly remarkable properties.
Finally, note that point (ii) of the lemma implies, inter alia, that
a^x = b^y ⟹ a = b^(y/x)   (1.22)
for all a, b > 0 and x, y ∈ R with x ≠ 0. Indeed, (b^(y/x))^x = b^((y/x)x) = b^y. For instance,
a^2 = b^3 implies a = b^(3/2), while a^3 = b^5 implies a = b^(5/3).
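These properties are easy to spot-check numerically; a small Python sketch (the sample values are ours, and floating-point equalities hold only up to rounding error):

```python
# Numerical check of Lemma 40 and of implication (1.22).
a, b, x, y = 2.0, 5.0, 3.0, -1.5

print(abs(a**x * a**y - a**(x + y)) < 1e-9)   # (i)   a^x a^y = a^(x+y)
print(abs((a**x)**y - a**(x * y)) < 1e-9)     # (ii)  (a^x)^y = a^(xy)
print(abs(a**x * b**x - (a * b)**x) < 1e-9)   # (iii) a^x b^x = (ab)^x

# (1.22): if a^x = b^y then a = b^(y/x); e.g., a^2 = b^3 gives a = b^(3/2)
a = b**(3 / 2)
print(abs(a**2 - b**3) < 1e-9)                # True
```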

1.5.2 Logarithms
The operations of addition and multiplication are commutative: a + b = b + a and ab = ba.
Therefore, each of them has only one inverse operation, namely subtraction and division, respectively:

(i) if a + b = c, then b = c − a and a = c − b.



(ii) if ab = c, then b = c/a and a = c/b, with a, b ≠ 0.

The power operation a^b, with a > 0, is not commutative: a^b might well be different from
b^a. Therefore, it has two distinct inverse operations.
Let a^b = c. The first inverse operation – given c and b, find a – is called the root with
index b of c:
a = ᵇ√c = c^(1/b)
The second one – given c and a, find b – is called the logarithm with base a of c:
b = log_a c
Note that, together with a > 0 and c > 0, one must also have a ≠ 1 because 1^b = c is
impossible except when c = 1.

The logarithm is a fundamental notion, introduced in 1614 by John Napier, ubiquitous in
mathematics and in its applications. As we have just seen, it is a simple notion: the number
b = log_a c is nothing but the exponent that must be given to a in order to get c, that is,
a^(log_a c) = c

The properties of the logarithms derive easily from the properties of the powers established
in Lemma 40.

Lemma 41 Let a, c, d > 0, with a ≠ 1. Then:

(i) log_a (cd) = log_a c + log_a d;

(ii) log_a (c/d) = log_a c − log_a d;

(iii) log_a c^k = k log_a c for every k ∈ R;

(iv) log_{a^k} c = k^(−1) log_a c for every 0 ≠ k ∈ R.

Proof (i) Let a^x = c, a^y = d, and a^z = cd. Since a^z = cd = a^x a^y = a^(x+y), by Lemma 40-(iv)
it follows that z = x + y. (ii) The proof is similar to the previous one. (iii) Let b = log_a c^k.
Then, a^b = c^k and so by (1.22) we have c = a^(b/k), which implies b/k = log_a c. We conclude
that log_a c^k = b = k log_a c.^30 (iv) Let b = log_{a^k} c, so that (a^k)^b = c. Then a^(kb) = c, so
kb = log_a c. In turn, this implies b = k^(−1) log_a c.
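Since Python's math.log accepts an arbitrary base as its second argument, Lemma 41 can also be checked numerically; a minimal sketch with sample values of our choosing:

```python
import math

# Numerical check of Lemma 41; math.log(x, base) is log_base(x).
a, c, d, k = 3.0, 7.0, 2.0, 4.0

print(abs(math.log(c * d, a) - (math.log(c, a) + math.log(d, a))) < 1e-9)  # (i)
print(abs(math.log(c / d, a) - (math.log(c, a) - math.log(d, a))) < 1e-9)  # (ii)
print(abs(math.log(c**k, a) - k * math.log(c, a)) < 1e-9)                  # (iii)
print(abs(math.log(c, a**k) - math.log(c, a) / k) < 1e-9)                  # (iv)
```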

The key property of the logarithm is that it transforms the product of two numbers into a sum
of two other numbers, that is, property (i) above. Sums are much easier to handle than
products, whence the importance of logarithms also computationally (till the age of computers,
tables of logarithms were a most important aid to perform computations). To emphasize this
key property of logarithms, denote a (strictly positive) scalar by a lower case letter and its
logarithm by the corresponding upper case letter; e.g., C = log_a c. Then, we can summarize
property (i) as:
cd → C + D
^30 For example, log_a x^2 = 2 log_a x for x > 0. Note that log_a x^2 exists for each x ≠ 0, while 2 log_a x exists
only for x > 0.

The importance of this transformation can hardly be overestimated.^31

A simple formula permits a change of base.

Lemma 42 Let a, b, c > 0, with a, b ≠ 1. Then
log_a c = log_b c / log_b a

Proof Let a^x = c, b^y = c, and b^z = a. We have a^x = (b^z)^x = b^(zx) = c = b^y and therefore
zx = y, that is, x = y/z.

Thanks to this change of base formula, it is possible to always take the same number, say 10,
as the base of the logarithms, because
log_a c = log_10 c / log_10 a
As for the powers a^x, also for the logarithms the most common base is Napier's constant
e. In such a case we simply write log x instead of log_e x. Because of its importance, log x is
called the natural logarithm of x, which leads to the notation ln x sometimes used in place
of log x.

The next result shows the close connection between logarithms and powers, which can
actually be seen as inverse notions.

Proposition 43 Given a > 0, a ≠ 1, we have
log_a a^x = x for every x ∈ R
and
a^(log_a x) = x for every x > 0

We leave to the reader the simple proof. To check their understanding of the material
of this section, the reader may want to verify that b^(log_a c) = c^(log_a b) for all strictly positive
numbers a ≠ 1, b, and c.
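Both identities of Proposition 43, and the closing exercise, can be probed numerically; a small Python sketch (the sample values are ours):

```python
import math

# Exercise: b^(log_a c) = c^(log_a b). Taking log_a of both sides
# gives log_a c * log_a b on both sides, which explains the symmetry.
a, b, c = 3.0, 4.0, 9.0
print(abs(b ** math.log(c, a) - c ** math.log(b, a)) < 1e-9)   # True

# Proposition 43: log_a a^x = x and a^(log_a x) = x
x = 2.7
print(abs(math.log(a**x, a) - x) < 1e-9)     # True
print(abs(a ** math.log(x, a) - x) < 1e-9)   # True
```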

1.6 Numbers, fingers and circuits


The most natural way to write numbers makes use of the “decimal notation”. Ten symbols
have been chosen,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9   (1.23)
called digits. Using positional notation, any natural number can be written by means of
digits which represent, from right to left respectively, units, tens, hundreds, thousands, etc.
^31 Napier entitled his 1614 work Mirifici logarithmorum canonis descriptio, that is, “A description of the
wonderful law of logarithms”. He was not exaggerating (the importance of logarithms was very soon realized).

For example, in this manner, 4357 means 4 thousands, 3 hundreds, 5 tens and 7 units.
The natural numbers are thus expressed by powers of 10, each of which causes a digit to be
added: writing 4357 is the abbreviation of
4·10^3 + 3·10^2 + 5·10^1 + 7·10^0
To employ positional notation, it is fundamental to adopt the 0 to signal an empty slot: for
example, when writing 4057 the zero signals the absence of the hundreds, that is,
4·10^3 + 0·10^2 + 5·10^1 + 7·10^0
Decimals are represented in a completely analogous fashion through the powers of 1/10 =
10^(−1): for example 0.501625 is the abbreviation of
5·10^(−1) + 0·10^(−2) + 1·10^(−3) + 6·10^(−4) + 2·10^(−5) + 5·10^(−6)

The choice of decimal notation is due to the mere fact that we have ten fingers, but it is
obviously not the only possible one. Some Native American tribes used to count on their
hands using the eight spaces between their fingers rather than the ten fingers themselves.
They would have chosen only 8 digits, say
0, 1, 2, 3, 4, 5, 6, 7
and they would have articulated the integers along the powers of 8, that is 8, 64, 512, 4096,
... They would have written our decimal number 4357 as
1·4096 + 0·512 + 4·64 + 0·8 + 5 = 1·8^4 + 0·8^3 + 4·8^2 + 0·8^1 + 5·8^0 = 10405
and the decimal 0.515625 as
4·0.125 + 1·0.015625 = 4·8^(−1) + 1·8^(−2) = 0.41
In general, given a base b and a set of digits
C_b = {c_0, c_1, ..., c_(b−1)}
used to represent the integers between 0 and b − 1, every natural number n is written in the
base b as
d_k d_(k−1) ··· d_1 d_0
where k is an appropriate natural number and
n = d_k b^k + d_(k−1) b^(k−1) + ··· + d_1 b + d_0
with d_i ∈ C_b for each i = 0, ..., k.
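This representation translates directly into an algorithm: repeated division by b extracts the digits d_0, d_1, ... from right to left. A minimal Python sketch (the helper names to_base and from_base are ours):

```python
def to_base(n, b, digits="0123456789ABCDEF"):
    """Write the natural number n in base b (2 <= b <= 16) by
    repeatedly extracting remainders modulo b."""
    if n == 0:
        return digits[0]
    out = ""
    while n > 0:
        n, r = divmod(n, b)
        out = digits[r] + out
    return out

def from_base(s, b, digits="0123456789ABCDEF"):
    """Recover n = d_k b^k + ... + d_1 b + d_0 from its base-b digits."""
    n = 0
    for ch in s:
        n = n * b + digits.index(ch)
    return n

print(to_base(4357, 8))        # '10405', as in the text
print(from_base("10405", 8))   # 4357
```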


For example, let us consider the duodecimal base, with digits
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, |, •
We have used the symbols | and • for the two additional digits we need compared to the
decimal notation. The duodecimal number
9|0•2 = 9·12^4 + |·12^3 + 0·12^2 + •·12 + 2
can be converted to decimal notation as
9|0•2 = 9·12^4 + 10·12^3 + 0·12^2 + 11·12 + 2 = 204038
using the conversion table
Duod.  0  1  2  3  4  5  6  7  8  9  |   •
Dec.   0  1  2  3  4  5  6  7  8  9  10  11
One can note that the duodecimal notation 9|0•2 requires fewer digits than the decimal
204038, that is, five instead of six. On the other hand, the duodecimal notation requires 12
symbols to be used as digits, instead of 10. It is a typical trade-off one faces in choosing the
base in which to represent numbers: larger bases make it possible to represent numbers with
fewer digits, but require a larger set of digits. The resolution of the trade-off, and the resulting
choice of base, depends on the characteristics of the application of interest.
For example, in electronic engineering it is important to have a set of digits which is as
simple as possible, with only two elements, as computers and electrical appliances naturally
have only two digits at their disposal (open or closed circuit, positive or negative polarity).
For this reason, the base 2 is incredibly common, as it is the most efficient base in terms of
the complexity of the digit set C_2, which only consists of the digits 0 and 1 (which are called
bits, from binary digits).
In binary notation, the integers can be written as
Dec. 0 1 2  3  4   5   6   7   8    9    10   11   16
Bin. 0 1 10 11 100 101 110 111 1000 1001 1010 1011 10000
where, for example, in binary notation
1011 = 1·2^3 + 0·2^2 + 1·2^1 + 1·2^0
and in decimal notation
11 = 1·10^1 + 1·10^0
The considerable reduction in the digit set C_2 made possible by the base 2 has as its cost
the large number of bits required to represent numbers in binary notation. For example:
if 16 consists of two decimal digits, the corresponding binary 10000 requires five bits; if 201
requires three digits, the corresponding binary 11001001 requires eight bits; if 2171 requires
four digits, the corresponding binary 100001111011 requires twelve bits, and so on. Very
quickly, binary notation requires a number of bits that only a computer is able to process.
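Python's built-ins make these binary conversions immediate, and can be used to double-check the numbers above:

```python
# bin() writes a number in base 2; int(s, 2) reads a binary string back.
print(bin(11))          # '0b1011'
print(bin(2171))        # '0b100001111011' (twelve bits, as noted above)

# The binary addition 1011 + 1001 = 10100, i.e., 11 + 9 = 20
print(int("1011", 2) + int("1001", 2) == 20)    # True
print(bin(int("1011", 2) + int("1001", 2)))     # '0b10100'
```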

From a purely mathematical perspective, the choice of base is merely conventional, and
going from one base to another is easy (although tedious).32 Bases 2 and 10 are nowadays
^32 Operations on numbers written in a non-decimal notation are not particularly difficult either. For example,
11 + 9 = 20 can be calculated in binary as
  1011 +
  1001 =
 10100
It is sufficient to remember that the “carrying” must be done at 2 and not at 10.

the most important ones, but others have been used in the past, such as 20 (the number of
fingers and toes, a trace of which is still found in the French language, where “quatre-vingts”
– i.e., “four-twenties” – stands for eighty and “quatre-vingt-dix” – “four-twenty-ten” – stands
for ninety), as well as 16 (the number of spaces between fingers and toes) and 60 (which is
convenient because it is divisible by 2, 3, 4, 5, 6, 10, 12, 15, 20 and 30; a significant trace of
this system remains in how we divide hours and minutes and in how we measure angles).

The positional notation has been used to perform manual calculations since the dawn
of time (just think about computations carried out with the abacus), but it is a relatively
recent conquest in terms of writing, made possible by the fundamental innovation of the zero,
and has been exceptionally important in the development of mathematics and its countless
applications – commercial, scientific, and technological. Born in India (apparently around
the fifth century AD), the positional notation was developed during the early Middle Ages
in the Arab world (especially thanks to the works of Al-Khwarizmi), from which the name
“Arabic numerals” for the digits (1.23) derives. It arrived in the Western world thanks
to Italian merchants between the 11th and 12th centuries. In particular, the son of one
of those merchants, Leonardo da Pisa (also known as Fibonacci), was the most important
medieval mathematician: for the first time in Western Europe after so many dark centuries,
he conducted original research in mathematics with the overt ambition of going beyond
what the great mathematicians of the classical world had established. Inter alia, Leonardo
authored a famous treatise in 1202, the Liber Abaci, which was the most important among
the first works that brought positional notation to Europe. Until then, the non-positional
Roman numerals
I, II, III, IV, V, ..., X, ..., L, ..., C, ..., M, ...
were used, which made even trivial operations overly complex (try to sum up CXL and MCL, and then
140 and 1150).
Let us conclude with the incipit of the first chapter of the Liber Abaci, with the extraordinary
innovation that the book brought to the Western world:

Novem figure indorum he sunt
9, 8, 7, 6, 5, 4, 3, 2, 1
Cum his itaque novem figuris, et cum hoc signo, quod arabice zephirum appellatur,
scribitur quilibet numerus, ut inferius demonstratur. [...] ut in sequenti cum
figuris numeris super notatis ostenditur.

MI    MMXXIII  MMMXXII  MMMXX  MMMMMDC  MMM
1001  2023     3022     3020   5600     3000

... Et sic in reliquis numeris est procedendum.^33


^33 “The nine Indian symbols are ... With these nine symbols and with the symbol 0, which the Arabs call
zephyr, any number can be written as shown below. [...] the above numbers are shown below in symbols
... And in this way you continue for the following numbers.” Interestingly, Roman numerals continued to be
used in bookkeeping for a long time because they are more difficult to manipulate (just add a 0 to an Arabic
numeral in a balance sheet...).

1.7 The extended real line


In the theory of limits that we will study later in the book, it is very useful to consider the
extended real line. It is obtained by adding to the real line the two ideal points +∞ and
−∞. We obtain in this way the set
R ∪ {−∞, +∞}
denoted by the symbol R̄ or, sometimes, by [−∞, +∞]. The order structure of R can be
naturally extended to R̄ by setting −∞ < a < +∞ for each a ∈ R.

The operations defined in R can be partially extended to R̄. In particular, besides the
usual rules of calculation in R, on the extended real line the following additional rules hold:

(i) addition with a real number:
a + ∞ = +∞,  a − ∞ = −∞  for every a ∈ R   (1.24)

(ii) addition between infinities of the same sign:
+∞ + ∞ = +∞  and  −∞ − ∞ = −∞

(iii) multiplication with a non-zero number:
a·(+∞) = +∞ and a·(−∞) = −∞ for every a > 0
a·(+∞) = −∞ and a·(−∞) = +∞ for every a < 0

(iv) multiplication of infinities:
(+∞)·(+∞) = (−∞)·(−∞) = +∞
(+∞)·(−∞) = (−∞)·(+∞) = −∞

(v) division:
a/(+∞) = a/(−∞) = 0 for every a ∈ R

(vi) power of a real number:
a^(+∞) = +∞ if a > 1
a^(+∞) = 0 if 0 < a < 1
a^(−∞) = 0 if a > 1
a^(−∞) = +∞ if 0 < a < 1

(vii) power between infinities:
(+∞)^(+∞) = +∞
(+∞)^(−∞) = 0
with, in particular,
(+∞)^a = +∞ if a > 0 and (+∞)^a = 0 if a < 0

While the addition of infinities with the same sign is a well-defined operation (for example,
the sum of two positive infinities is again a positive infinity), the addition of infinities of
different sign is not defined. For example, the result of +∞ − ∞ is not defined. This is a
first example of an indeterminate operation in R̄. In general, the following operations are
indeterminate:

(i) addition of infinities with different sign:
+∞ − ∞ and −∞ + ∞   (1.25)

(ii) multiplication between 0 and infinity:
(±∞)·0 and 0·(±∞)   (1.26)

(iii) divisions with denominator equal to zero or with numerator and denominator that are
both infinite:
a/0 and ∞/∞   (1.27)
with a ∈ R;

(iv) the powers:
1^∞, 0^0, (+∞)^0   (1.28)

The indeterminate operations (i)-(iv) are called forms of indetermination and will play
an important role in the theory of limits. Note that, by setting a = 0, formula (1.27) takes
the form
0/0
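As an aside, IEEE floating-point arithmetic implements several of these conventions: Python's float('inf') follows rules (i)-(v), and the indeterminate forms are flagged by the special value nan (“not a number”). A small illustration:

```python
inf = float('inf')

print(5 + inf, 5 - inf)     # inf -inf        rule (i)
print(inf + inf)            # inf             rule (ii)
print(-3 * inf)             # -inf            rule (iii)
print(5 / inf, 5 / -inf)    # 0.0 -0.0        rule (v)

# Indeterminate forms are returned as nan rather than a number
print(inf - inf)            # nan             form (1.25)
print(0 * inf)              # nan             form (1.26)
print(inf / inf)            # nan             form (1.27)
```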

O.R. As we have observed, the most natural geometric image of R is the (real) line: to each
point there corresponds a number and, vice versa, to each number there corresponds a point.
If we take a closed (and obviously bounded) segment, we can “transport” all the numbers
from the real line to the open interval (0, 1), as the following figure shows:^34

^34 We refer to the proof of Proposition 253 for the analytic expression of the bijection shown here.
[Figure: a bijection “transporting” the real line onto the open interval (0, 1).]

All the real numbers that found a place on the real line also find a place on the interval (0, 1)
– perhaps tightly packed, but they all fit. Two points are left over, the endpoints of the interval,
to which it is natural to associate, respectively, +∞ and −∞. The closed interval [0, 1] is,
therefore, a geometric image of R̄.

1.8 The birth of the deductive method


The deductive method, upon which mathematics is based, was born between the VI and
the V century B.C. and, in that period, came to dominate Greek mathematics. As we have
seen throughout the chapter, mathematical properties are stated in theorems, whose truth
is established by a logical argument, their proof, which is based on axioms and definitions.
It is a revolutionary innovation in the history of human thought, celebrated in several
Dialogues of Plato and elaborated and codified in the Elements of Euclid. It places reason as
the sole guide for scientific (and non-scientific) investigations. A mathematical property – for
example, that the sum of the squares of the catheti is equal to the square of the hypotenuse –
is true because it can be logically proved, not because it is empirically verified in concrete
examples, or because a nice drawing makes the intuition clear, or because some “authority”
reveals its truth.
Little is known about the birth of the deductive method; the surviving documentation is
scarce. Reason emerged in the Ionian Greek colonies (first in Miletus with Thales and
Anaximander) to guide the first scientific investigations of physical phenomena. It was, however,
in Magna Graecia that reason first tackled abstract matters. An intriguing hypothesis,
proposed by Arpad Szabo,^35 underlines the importance of the Eleatic philosophy, which flourished at
Elea in the V century B.C. and has in Parmenides and Zeno its most famous exponents.
In Parmenides' famous doctrine of the Being, a turning point in intellectual history that the
reader might have encountered in some high school philosophy course, it is logic that permits
the study of the Being, that is, of the world of truth (aletheia). This study is impossible for
the senses, which can only guide us among the appearances that characterize the world of
opinion (doxa). In particular, only reason can master arguments by contradiction,
which have no empirical substratum but are the pure result of reason. Such arguments,
developed – according to Szabo – by the Eleatic school and central to its dialectics
(which culminated in the famous paradoxes of Zeno), for example enabled the Eleatic philosopher
Melissus of Samos to state that the Being “always was what it was and always will be. For
if it had come into being, necessarily before it came into being there was nothing. But, if
there was nothing, in no way could something come into being from nothing”.^36
True knowledge is thus theoretic: only the eye of the mind can see the truth, while
empirical analysis necessarily stops at appearances. The anti-empirical character of the
Eleatic school could have been decisive in the birth of the deductive method, at least in
creating a favorable intellectual environment. Naturally, it is not possible to exclude a
causality opposite to the one proposed by Szabo: the deductive method could have been
developed inside mathematics and could have then influenced philosophy, and in particular
the Eleatics.^37 Indeed, the irrationality of √2, established by the Pythagorean school (the
other great Presocratic school of Magna Graecia), is a first decisive triumph of such a method
in mathematics: only the eye of the mind could see such a property, which is devoid of
any “empirical” intuition. It is the eye of the mind that explains the inescapable error
in which every empirical measurement of the hypotenuse of a right triangle with
catheti of unitary length incurs: however accurate this measurement is, it will always be a rational
approximation of the true irrational distance, √2, with a consequent approximation error
(that, by the way, will probably vary from measurement to measurement).
In any case, between the VI and the V century B.C. two Presocratic schools of Magna
Graecia were the cradle of an incredible intellectual revolution. In the III century B.C. another
famous Magna Graecia scholar, Archimedes of Syracuse, led this revolution to its
maximum splendor in the classical world (and beyond). We close with Plato's famous (probably
fictional) description of two protagonists of this revolution, Parmenides and Zeno.^38
^35 See Szabo (1978). Elea was a town of Magna Graecia, around 140 kilometers south of Naples.
^36 Barnes (1982) calls this beautiful fragment the theorem of ungenerability (trans. Allhoff, Smith, and
Vaidya in “Ancient philosophy”, Blackwell, 2008). In a less transparent way (but it was part of the first
logical argument ever reported) Parmenides had written in his poem “And how might what is be then? And
how might it have come into being? For if it came into being, it is not, nor if it is about to be at some time”
(trans. Barnes). We refer to Calogero (1977) for a classic work on Eleatic philosophy, and to Barnes (1982),
as well as to the recent Warren (2014), for general introductions to the Presocratics.
^37 For instance, arguments by contradiction could have been developed within the Pythagorean school
through the odd-even dichotomy for natural numbers that is central in the proof of the irrationality of √2.
This is what Cardini Timpanaro (1964) argues, contra Szabo, in her comprehensive book. See also pp. 258-259
in Vlastos (1996). Interestingly, the archaic Greek enigmas were formulated in contradictory terms (their
role in the birth of dialectics is emphasized by Colli, 1975).
^38 In Plato's dialogue “Parmenides” (trans. Jowett, reported in Barnes ibid.). A caveat: over the centuries
– actually, over the millennia – the strict Eleatic anti-empirical stance (understandable, back then, in the
excitement of a new approach) has inspired a great deal of metaphysical thinking. Reason without empirical
motivation and discipline becomes, at best, sterile.

They came to Athens ... the former was, at the time of his visit, about 65 years
old, very white with age, but well favoured. Zeno was nearly 40 years of age,
tall and fair to look upon: in the days of his youth he was reported to have been
beloved by Parmenides.
Chapter 2

Cartesian structure and R^n

2.1 Cartesian products and R^n


Suppose we want to classify a wine according to two characteristics, aging and alcoholic
content. For example, suppose one reads on a label: 2 years of aging and 12 degrees. We
can write
(2, 12)
On another label one reads: 1 year of aging and 10 degrees. In this case we can write
(1, 10)
The pairs (2, 12) and (1, 10) are called ordered pairs. In them we distinguish the first element,
the aging, from the second one, the alcoholic content. In an ordered pair the position is,
therefore, crucial: a (2, 12) wine is very different from a (12, 2) wine (try the latter...).
Let A1 be the set of the possible years of aging and A2 the set of the possible alcoholic
contents. We can then write
(2, 12) ∈ A1 × A2,  (1, 10) ∈ A1 × A2
We denote by a1 a generic element of A1 and by a2 a generic element of A2. For example,
in (2, 12) we have a1 = 2 and a2 = 12.

Definition 44 Given two sets A1 and A2, the Cartesian product A1 × A2 is the set of all
the ordered pairs (a1, a2) with a1 ∈ A1 and a2 ∈ A2.

In the example, we have A1 ⊆ N and A2 ⊆ N, i.e., the elements of A1 and A2 are natural
numbers. More generally, we can assume that A1 = A2 = R, so that the elements of A1
and A2 are real numbers, although with a possibly different interpretation according to their
position. In this case
A1 × A2 = R × R = R^2
and the pair (a1, a2) can be represented by a point in the plane.


An ordered pair of real numbers (a1, a2) ∈ R^2 is called a vector.

Among the subsets of R^2, of particular importance are:

(i) {(a1, a2) ∈ R^2 : a1 = 0}, that is, the set of the ordered pairs of the form (0, a2); it is
the vertical axis (or axis of the ordinates).
(ii) {(a1, a2) ∈ R^2 : a2 = 0}, that is, the set of the ordered pairs of the form (a1, 0); it is
the horizontal axis (or axis of the abscissae).
(iii) {(a1, a2) ∈ R^2 : a1 ≥ 0 and a2 ≥ 0}, that is, the set of the ordered pairs (a1, a2) with
both components positive; it is the first quadrant of the plane (also called the
positive orthant). In a similar way we can define the other quadrants:

[Figure: the four quadrants I, II, III, IV of the plane.]

(iv) {(a1, a2) ∈ R^2 : a1^2 + a2^2 ≤ 1} and {(a1, a2) ∈ R^2 : a1^2 + a2^2 < 1}, that is, the closed unit
ball and the open unit ball, respectively (both centered at the origin and with radius one).^1
^1 The meaning of the adjectives “closed” and “open” will become clear in Chapter 5.

(v) {(a1, a2) ∈ R^2 : a1^2 + a2^2 = 1}, that is, the unit circle; it is the skin of the closed unit
ball:

[Figure: the unit circle.]

Earlier we classified wines according to two characteristics, aging and alcoholic content.
We now consider a more complicated example, that is, portfolios of assets. Suppose that
there exist four different assets that can be purchased in a financial market. A portfolio is
then described by an ordered quadruple
(a1, a2, a3, a4)
where a1 is the amount of money invested in the first asset, a2 is the amount of money
invested in the second asset, and so on. For example,
(1000, 1500, 1200, 600)
denotes a portfolio in which 1000 euros have been invested in the first asset, 1500 in the
second one, and so on. The position is crucial: the portfolio
(1500, 1200, 1000, 600)
is very different from the previous one, although the amounts of money involved are the
same.
Since amounts of money are numbers that are not necessarily integers, and are possibly negative
(in case of sales), it is natural to assume A1 = A2 = A3 = A4 = R, where Ai is the set of the
possible amounts of money that can be invested in asset i = 1, 2, 3, 4. We have
(a1, a2, a3, a4) ∈ A1 × A2 × A3 × A4 = R^4
In particular,
(1000, 1500, 1200, 600) ∈ R^4
In general, if we consider n sets A1, A2, ..., An we can give the following definition.

Definition 45 Given n sets A1, A2, ..., An, their Cartesian product
A1 × A2 × ··· × An
denoted by ∏_{i=1}^n Ai (or by ×_{i=1}^n Ai), is the set of all the ordered n-tuples (a1, a2, ..., an) with
a1 ∈ A1, a2 ∈ A2, ..., an ∈ An.

We call a1, a2, ..., an the components (or elements) of a. When A1 = A2 = ··· = An = A,
we write
A1 × A2 × ··· × An = A × A × ··· × A = A^n
In particular, if A1 = A2 = ··· = An = R the Cartesian product is denoted by R^n, which
therefore is the set of all the (ordered) n-tuples of real numbers. In other words,
R^n = R × R × ··· × R  (n times)
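For finite sets, Cartesian products are easy to build explicitly; in Python, itertools.product does exactly this (the sample sets below are ours):

```python
from itertools import product

# Tuples are ordered, so (2, 12) and (12, 2) are distinct elements.
A1 = {1, 2}       # years of aging
A2 = {10, 12}     # alcoholic content
print(sorted(product(A1, A2)))   # [(1, 10), (1, 12), (2, 10), (2, 12)]

# A^n as a repeated product: {0, 1}^3 has 2^3 = 8 ordered triples
print(len(list(product({0, 1}, repeat=3))))   # 8
```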

An element
x = (x1, x2, ..., xn) ∈ R^n
is called a vector.^2 The Cartesian product R^n is called the (n-dimensional) Euclidean space.
For n = 1, R is represented by the real line and, for n = 2, R^2 is represented by the plane.
As one learns in high school, it was Descartes who understood this in 1637 – so that all points of
the plane can be identified with pairs (a1, a2), as seen in a previous figure – a marvelous
insight that permitted the study of geometry through algebra (this is why Cartesian products
are named after him). Also the vectors (a1, a2, a3) in R^3 admit a graphic representation:

[Figure: graphic representation of a vector (a1, a2, a3) in R^3.]

However, this is no longer possible in R^n when n ≥ 4. The graphic representation may help
the intuition, but from a theoretical and computational viewpoint it has no importance: the
vectors of R^n, with n ≥ 4, are completely well-defined entities. They actually turn out to be
fundamental in economics, as we will see in Section 2.4 and as the portfolio example already
showed.
^2 For real numbers we use the letter x instead of a.

Notation We will denote the components of a vector by the same letter used for the vector
itself, along with ad hoc indexes: for example, a3 is the third component of the vector a, y7
the seventh component of the vector y, and so on.

2.2 Operations in R^n
Let us consider two vectors in R^n,
x = (x1, x2, ..., xn),  y = (y1, y2, ..., yn)
We define the vector sum x + y by
x + y = (x1 + y1, x2 + y2, ..., xn + yn)
For example, for the two vectors x = (7, 8, 9) and y = (2, 4, 7) in R^3, we have
x + y = (7 + 2, 8 + 4, 9 + 7) = (9, 12, 16)
Note that x + y ∈ R^n: through the operation of addition we have constructed a new element of
R^n.
Now, let α ∈ R and x ∈ R^n. We define the product αx by
αx = (αx1, αx2, ..., αxn)
For example, for α = 2 and x = (7, 8, 9) ∈ R^3, we have
2x = (2·7, 2·8, 2·9) = (14, 16, 18)
Even in this case, we have αx ∈ R^n. In other words, also through the operation of scalar
multiplication we have constructed a new element of R^n.^3
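A minimal Python sketch of these two operations, with vectors represented as tuples (the helper names are ours):

```python
# Componentwise sum and scalar multiple in R^n.
def vsum(x, y):
    return tuple(xi + yi for xi, yi in zip(x, y))

def smul(alpha, x):
    return tuple(alpha * xi for xi in x)

x, y = (7, 8, 9), (2, 4, 7)
print(vsum(x, y))   # (9, 12, 16)
print(smul(2, x))   # (14, 16, 18)
```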

Notation We set −x = (−1)x = (−x1, −x2, ..., −xn) and x − y = x + (−1)y. We will also
set 0 = (0, 0, ..., 0), where boldface distinguishes the vector 0 of zeros from the scalar 0. The
vector 0 is called the zero vector.

We have introduced in R^n two operations, addition and scalar multiplication, that extend
to vectors the corresponding operations for real numbers. Let us see their properties. We
start with addition.

Proposition 46 Let x, y, z ∈ R^n. The operation of addition satisfies the following properties:

(i) x + y = y + x (commutativity),

(ii) (x + y) + z = x + (y + z) (associativity),

(iii) x + 0 = x (existence of the neutral element for addition),

(iv) x + (−x) = 0 (existence of the opposite of any vector).

Proof We prove (i), leaving the other properties to the reader. We have
x + y = (x1 + y1, x2 + y2, ..., xn + yn) = (y1 + x1, y2 + x2, ..., yn + xn) = y + x
as desired.
^3 A real number is often called a scalar. Throughout the book we will use the terms “scalar” and “real
number” interchangeably.

We now consider scalar multiplication.

Proposition 47 Let x, y ∈ R^n and α, β ∈ R. The operation of scalar multiplication satisfies
the following properties:

(i) α(x + y) = αx + αy (distributivity with respect to the addition of vectors),

(ii) (α + β)x = αx + βx (distributivity with respect to the addition of scalars),

(iii) 1x = x (existence of the neutral element for scalar multiplication),

(iv) α(βx) = (αβ)x (associativity).

Proof We only prove (ii); the other properties are left to the reader. We have:
(α + β)x = ((α + β)x1, (α + β)x2, ..., (α + β)xn)
= (αx1 + βx1, αx2 + βx2, ..., αxn + βxn)
= (αx1, αx2, ..., αxn) + (βx1, βx2, ..., βxn) = αx + βx
as claimed.

The last operation in R^n that we consider is the inner product. Given two vectors x and
y in R^n, their inner product, denoted by x · y, is the scalar defined by
x · y = x1 y1 + x2 y2 + ··· + xn yn
That is, in more compact notation,^4
x · y = Σ_{i=1}^n xi yi
Other common notations for the inner product are (x, y) and ⟨x, y⟩.
For example, for the vectors x = (1, −1, 5, −3) and y = (−2, 3, π, −1) of R^4, we have
x · y = 1·(−2) + (−1)·3 + 5·π + (−3)·(−1) = 5π − 2
^4 Given n real numbers ri, their sum r1 + r2 + ··· + rn is denoted by Σ_{i=1}^n ri, while their product r1 r2 ··· rn
is denoted by ∏_{i=1}^n ri.
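The inner product is just as simple to implement; note that, unlike vsum and smul above, it returns a scalar. A small sketch reproducing the example:

```python
import math

def inner(x, y):
    """Inner product: the scalar sum of the componentwise products."""
    return sum(xi * yi for xi, yi in zip(x, y))

x = (1, -1, 5, -3)
y = (-2, 3, math.pi, -1)
print(inner(x, y))                                     # 13.7079... = 5*pi - 2
print(abs(inner(x, y) - (5 * math.pi - 2)) < 1e-12)    # True
```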

The inner product is an operation that differs from addition and scalar multiplication in a
structural respect: while the latter operations determine a new vector of R^n, the result of the
inner product is a scalar. The next result gathers the main properties of the inner product
(we leave to the reader the simple proof).

Proposition 48 Let x, y, z ∈ R^n and α ∈ R. We have:

(i) x · y = y · x (commutativity),

(ii) (x + y) · z = (x · z) + (y · z) (distributivity),

(iii) (αx) · z = α(x · z) (distributivity).

Note that the two distributive properties can be summarized in the single property
(αx + βy) · z = α(x · z) + β(y · z).

2.3 Order structure on R^n
The order structure of R^n is based on the order structure of R, but with some important
novelties. We begin by defining the order ≥ on R^n: given two vectors x = (x1, x2, ..., xn) and
y = (y1, y2, ..., yn) in R^n, we write
x ≥ y
when xi ≥ yi for every i = 1, 2, ..., n. In particular, we have x = y if and only if we have
both x ≥ y and y ≥ x.
In other words, ≥ orders two vectors by applying, component by component, the order ≥
on R studied in Section 1.4. For example, x = (0, 3, 4) ≥ y = (0, 2, 1). When n = 1, the
order thus reduces to the standard one on R.

The study of the basic properties of the inequality ≥ on R^n reveals a first important
novelty: when n ≥ 2, the order ≥ does not satisfy completeness. Indeed, consider for
example x = (0, 1) and y = (1, 0) in R^2: neither x ≥ y nor y ≥ x. We say, therefore, that
≥ on R^n is a partial order (which becomes a complete order when n = 1).
It is easy to find vectors in R^n that are not comparable. The following figure shows the
vectors of R^2 that are ≤ or ≥ the vector x = (1, 2); the darker area represents the points
smaller than x, the lighter area those greater than x, and the two white areas represent the
points that are not comparable with x.

[Figure: points of R^2 comparable and not comparable with the vector x = (1, 2).]

Apart from completeness, it is easy to verify that ≥ on R^n continues to enjoy the properties
seen for n = 1:

(i) reflexivity: x ≥ x,

(ii) transitivity: if x ≥ y and y ≥ z, then x ≥ z,

(iii) independence: if x ≥ y, then x + z ≥ y + z for every z ∈ R^n,

(iv) separation: given two sets A and B in R^n, if a ≥ b for every a ∈ A and b ∈ B, then
there exists c ∈ R^n such that a ≥ c ≥ b for every a ∈ A and b ∈ B.

Another notion that becomes surprisingly delicate when n ≥ 2 is that of strict inequality.
Indeed, given two vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) of R^n, two cases can
occur.

1. All the components of x are ≥ than the corresponding components of y, with some of
them strictly greater; i.e., xi ≥ yi for each index i = 1, 2, ..., n, with xi > yi for at least
one index i.

2. All the components of x are > than the corresponding components of y; i.e., xi > yi
for each i = 1, 2, ..., n.

In the first case we have a strict inequality, in symbols x > y; in the second case a strong
inequality, in symbols x ≫ y.

Example 49 For x = (1, 3, 4) and y = (0, 1, 2) in R^3, we have x ≫ y. For x = (0, 3, 4) and
y = (0, 1, 2), we have x > y, but not x ≫ y, because x has only two components out of three
strictly greater than the corresponding components of y. N
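The three order notions translate directly into componentwise checks. A minimal Python sketch (the helper names are ours) that reproduces Example 49 and the incomparability of (0, 1) and (1, 0):

```python
def geq(x, y):      # x >= y : every component weakly greater
    return all(xi >= yi for xi, yi in zip(x, y))

def strict(x, y):   # x > y : >= plus at least one strictly greater component
    return geq(x, y) and any(xi > yi for xi, yi in zip(x, y))

def strong(x, y):   # x >> y : every component strictly greater
    return all(xi > yi for xi, yi in zip(x, y))

x, y = (0, 3, 4), (0, 1, 2)
print(geq(x, y), strict(x, y), strong(x, y))       # True True False
print(geq((0, 1), (1, 0)), geq((1, 0), (0, 1)))    # False False: not comparable
```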

Given two vectors x, y ∈ R^n, we have
x ≫ y ⟹ x > y ⟹ x ≥ y
The three notions of inequality among vectors in R^n are, therefore, more and more
stringent. Indeed, we have:

(i) a weak notion, ≥, that permits the equality of the two vectors;

(ii) an intermediate notion, >, that requires at least one strict inequality among the
components;

(iii) a strong notion, ≫, that requires strict inequality among all the components of the
two vectors.

When n = 1, both > and ≫ reduce to the standard > on R. Moreover, the “reversed”
symbols ≤, <, and ≪ are used for the converse inequalities.

An important comparison is that between a vector x and the zero vector 0. We say that
the vector x is:

(i) positive if x ≥ 0, i.e., if all the components of x are positive;

(ii) strictly positive if x > 0, i.e., if all the components of x are positive and at least one
of them is strictly positive;

(iii) strongly positive if x ≫ 0, i.e., if all the components of x are strictly positive.

N.B. The notation and terminology that we introduced is not the only possible one. For
example, some authors use ≧, ≥, and > in place of our ≥, >, and ≫; other authors call
“non-negative” the vectors that we call positive, and so on. O

Together with the lack of completeness of ≥, the presence of the two different notions of
strict inequality is the main novelty, relative to what happens in the real line, that we have
in R^n when n ≥ 2.

We conclude this section by generalizing the intervals introduced in R (Section 1.4).
Given a, b ∈ R^n, we have:

(i) the bounded closed interval
[a, b] = {x ∈ R^n : a ≤ x ≤ b} = {x ∈ R^n : ai ≤ xi ≤ bi for every i}

(ii) the bounded open interval
(a, b) = {x ∈ R^n : a ≪ x ≪ b} = {x ∈ R^n : ai < xi < bi for every i}

(iii) the bounded half-closed (or half-open) intervals
(a, b] = {x ∈ R^n : a ≪ x ≤ b} and [a, b) = {x ∈ R^n : a ≤ x ≪ b}

(iv) the unbounded intervals [a, ∞) = {x ∈ R^n : x ≥ a} and (a, ∞) = {x ∈ R^n : x ≫ a},
and their analogues (−∞, a] and (−∞, a).

N.B. (i) The intervals [0, ∞) = {x ∈ R^n : x ≥ 0} and (0, ∞) = {x ∈ R^n : x ≫ 0} are often
denoted by R^n_+ and R^n_++, respectively. The intervals R^n_− = {x ∈ R^n : x ≤ 0} and R^n_−− =
{x ∈ R^n : x ≪ 0} are similarly defined. (ii) The intervals in R^n can be expressed as Cartesian
products of intervals in R; for example, [a, b] = ∏_{i=1}^n [ai, bi]. (iii) In the intervals just
introduced we used the inequalities ≥ or ≫. By replacing them with the inequality >, we
obtain other possible intervals that, however, are not that relevant for our purposes. O

2.4 Applications
2.4.1 Static choices
Consider a consumer who has to choose how many kilograms of apples and of potatoes to
buy at the market. For convenience, we assume that these goods are infinitely divisible, so
that the consumer can buy any positive real quantity (for example, 3 kg of apples and π
kg of potatoes). In this case, R_+ is the set of the possible quantities of apples or potatoes
that can be bought. Therefore, the collection of all bundles of apples and potatoes that the
consumer can buy is
R^2_+ = R_+ × R_+ = {(x1, x2) : x1, x2 ≥ 0}
Graphically, it is the first quadrant of the plane. In general, if a consumer chooses among n goods,
the set of the bundles is represented by the Cartesian product
R^n_+ = R_+ × R_+ × ··· × R_+ = {(x1, x2, ..., xn) : xi ≥ 0 for i = 1, 2, ..., n}
In production theory, a vector in R^n_+ represents, instead, a possible configuration of n
inputs for the producer. In this case the vector x = (x1, x2, ..., xn) indicates that the producer
has at his disposal x1 units of the first input, x2 units of the second input, ..., and xn units
of the last input.

2.4.2 Intertemporal choices
In consumer theory a vector x = (x1, x2, ..., xn) may thus be interpreted as a bundle in which
xi is the quantity of good i = 1, 2, ..., n. But there is another possible interpretation, in which
there is a single good and x = (x1, x2, ..., xn) indicates the quantity of such a good available
in different periods, with xi being the quantity of the good available in the i-th period. For
example, if the single good is apples, x1 is the quantity of apples in period 1, x2 is the
quantity of apples in period 2, and so on, until xn, which is the quantity of apples in the n-th
period.
In this case, R^n_+ denotes the space of all streams of quantities of a given good, say apples,
over n periods. The more evocative notation R^T is often used, where T is the number of
periods and xt is the quantity of apples in period t, with t = 1, 2, ..., T.^5 A fundamental
example is the one in which the good is money, so that
x = (x1, x2, ..., xt, ..., xT) ∈ R^T
represents the quantity of money in different periods: in this case the stream x is called a cash
flow. For example, the checking account of a family records each day the balance between
revenues (wages, incomes, etc.) and expenditures (purchases, rents, etc.). Setting T = 365,
the resulting cash flow is
x = (x1, x2, ..., x365)
So, x1 is the balance of the checking account on January 1, x2 is the balance on January 2,
and so on until x365, which is the balance at the end of the year.
Instead of a stream of quantities of a single good, we can consider a stream of bundles of
several goods. Similarly, in an intertemporal problem of production, we will have streams of
input vectors. Such situations are modeled by means of matrices, a simple notion that will
be studied in Chapter 13. Many economic applications focus, however, on the single good
case, so R^T is an important space in economics.
^5 The notation t = 1, 2, ..., T is equivalent to t ∈ {1, 2, ..., T}, just as the notation i = 1, 2, ..., n is equivalent
to i ∈ {1, 2, ..., n}. Choosing one of them is a matter of convenience.

2.5 Pareto optima
2.5.1 Definition
The concept of maximum of a subset of R (Definition 30) can be equivalently reformulated
as follows:

Lemma 50 Let A ⊆ R. A point x̂ ∈ A is the maximum of A if and only if there is no x ∈ A
such that x > x̂.

Indeed, since ≥ is complete on the real line, requiring that all the points of A be ≤ x̂
amounts to requiring that none of them be > x̂. A similar reformulation can be given for
minima.
That said, let us now turn to subsets of the space R^n, with its order ≥. We can extend the
notion of maximum in the following way.

Definition 51 Let A ⊆ R^n. A point x̂ ∈ A is called the maximum of A if x̂ ≥ x for every
x ∈ A.

In an analogous way we can define the minimum. Moreover, Proposition 33 continues to
hold: the maximum (minimum) of a set A ⊆ R^n, if it exists, is unique (as the reader can
check).
Unfortunately, this last definition is of little interest in economic applications because
subsets of R^n often do not have maxima (or minima), since the order ≥ is not complete in
R^n when n ≥ 2 (Section 2.3). The binary set {(1, 2), (2, 1)} is a trivial example of a set of
the plane without maxima and minima.
It is much more fruitful to follow, instead, the order of ideas sketched in Lemma 50. Indeed,
the characterization established there is equivalent to the usual definition of maximum
in R, but it becomes more general in R^n because ≥ is no longer complete when n ≥ 2, and
so the “if” part of Lemma 50 is easily seen to fail. This motivates the next definition, of great
importance in economic applications.

Definition 52 Let A ⊆ R^n. A point x̂ ∈ A is called maximal (or a Pareto optimum) of A
if there is no x ∈ A such that x > x̂.

In a similar way we can define minimals, which are also called Pareto optima.^6
To understand the nature of maximals,^7 say that a point x ∈ A is dominated by another
point y ∈ A if x < y, that is, if xi ≤ yi for each index i, with xi < yi for at least one index
i (Section 2.3). A dominated point is thus outperformed by another point available in the
set. For instance, if the points represent bundles of goods, a dominated bundle x is obviously a no
better alternative than the dominant one y. In terms of dominance, we can say that a point
of A is maximal if it is not dominated by any other point in A. That is, it is not outperformed
by any other alternative available in A. Maximality is thus the natural extension of the
notion of maximum when dealing – as is often the case in applications – with alternatives
that are multi-dimensional (and so represented by vectors of R^n).
^6 Optima, like angels, have no gender. Note that here “maximal” is an adjective used as a noun (as was
the case for “maximum” in Definitions 30 and 51). If used as adjectives, we would have “maximal element”
(as well as “maximum element”).
^7 In the rest of the chapter we focus on maxima and maximals, which are the most relevant in economic
applications, leaving to the reader the dual properties that hold for minima and minimals.
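For a finite set A, maximality can be checked directly from this dominance criterion. A minimal Python sketch (the helper names are ours):

```python
# "y dominates x" means y > x: componentwise >=, with at least one strict.
def dominates(y, x):
    return (all(yi >= xi for yi, xi in zip(y, x))
            and any(yi > xi for yi, xi in zip(y, x)))

def maximals(A):
    """The maximals (Pareto optima) of a finite A: the undominated points."""
    return [x for x in A if not any(dominates(y, x) for y in A)]

A = [(1, 2), (2, 1), (0, 0), (1, 1)]
print(maximals(A))   # [(1, 2), (2, 1)]: no maximum, but two maximals
```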

2.5.2 Maxima and maximals
Lemma 50 shows that the notions of maximum and maximal are equivalent in R. This is no
longer true in R^n when n > 1: the notion of maximum becomes (much) stronger than that
of maximal.

Lemma 53 The maximum of a set A ⊆ R^n is, if it exists, the unique maximal of A.

Proof Let x̂ ∈ A be the maximum of A. Clearly, x̂ is a maximal. We need to show that it
is the unique maximal. Let x ∈ A with x ≠ x̂. Since x̂ is the maximum of A, we have x̂ ≥ x.
Since x ≠ x̂, we have x̂ > x. Therefore, x is not a maximal.

The set in the next figure has a maximum, point a. Thanks to this lemma, a is
therefore also its unique maximal.
[Figure: a set whose maximum a is its unique maximal.]
Thus:
maximum ⟹ maximal
But the converse is false: there exist maximals that are not maxima, that is,
maximal ⇏ maximum

Example 54 In the binary set A = {(1, 2), (2, 1)} of the plane, the vector (2, 1) is a maximal
that is not a maximum, while the vector (1, 2) is a minimal that is not a minimum. N

Example 55 The next figure shows a set A of R^2 that has no maxima, but infinitely many
maximals.
[Figure: a set A whose dark edge consists of its maximals.]
It is easy to see that any point a ∈ A on the dark edge is maximal: there is no x ∈ A such
that x > a. On the other hand, a is not a maximum: we have a ≥ x only for the points
x ∈ A that are comparable with a, which are represented in the shaded part of A; nothing
can be said, instead, for the points that are not comparable with a (the non-shaded
part of A). The lack of maxima for this set is thus due to the fact that the order ≥ is only
partial in R^n when n > 1. N

The set A of the last example illustrates another fundamental difference between maxima
and maximals in R^n with n > 1: the maximum of a set, if it exists, is unique, while a maximal
might well not be unique.
Summing up, because of the incompleteness of the order ≥ on R^n, maxima are much less
important than maximals in R^n. That said, maximals might also fail to exist: the 45° straight
line is a subset of R^2 without maximals (and minimals).^8

2.5.3 Pareto frontier and Edgeworth box
Maximals are fundamental in economics, where they are often called Pareto optima. The
set of these points is of particular importance.

Definition 56 The set of the maximals of a set A ⊆ R^n is called the Pareto (or efficient)
frontier of A.

In the last example, the dark edge is the Pareto frontier of the set A:
[Figure: the Pareto frontier (dark edge) of the set A.]

As a first economic application, assume for example that the different vectors of a set
A ⊆ R^n represent the profits that n individuals can earn. So, in x = (x1, ..., xn) ∈ A the
component xi is the profit of individual i, with i = 1, ..., n. The Pareto optima represent
the situations from which it is not possible to move away without reducing the profit of at
least one of the individuals. In other words, the n individuals would not object to restricting A
to the set of its Pareto optima (nobody loses), that is, to its Pareto frontier. A conflict of
interests arises among them, instead, when a specific point on the frontier has to be selected.
Thus, the concept of Pareto optimum permits us to narrow down, with unanimous
consensus, a set A of alternatives by identifying the truly “critical” subset, the Pareto frontier,
which is often much smaller than the original set A.^9
^8 This set is the graph of the function f : R → R given by f(x) = x, as we will see in Chapter 6.
^9 For Pareto optimality it is key that agents only consider their own alternatives (bundles of goods, profits,
etc.), without caring about those of their peers. In other words, they should not feel envy or similar social
emotions. To see why, think of a tribe of “envious” members whose chief decides to double the food rations of half of
the members of the tribe, leaving unchanged those of the other members. The new allocation would provoke
lively protests by the “unchanged” members even though nothing changed for them.

A magnificent illustration of this key aspect of Pareto optimality is the famous Edgeworth
box.^10 Consider two agents, Albert and Barbara, who have to divide between them unitary
quantities of two infinitely divisible goods (for example, a kilogram of flour and a liter of
wine). We want to model the problem of division (probably determined by bargaining
between them) and to see if, thanks to Pareto optimality, we can say something non-trivial
about it.
Each pair x = (x1, x2), with x1 ∈ [0, 1] and x2 ∈ [0, 1], represents a possible allocation of
the two goods to one of the two agents. In particular, the Cartesian product [0, 1] × [0, 1]
describes them all. The two agents must agree on the allocations (a1, a2) of Albert and
(b1, b2) of Barbara. Clearly,
a1 + b1 = a2 + b2 = 1   (2.1)
To complete the description of the problem, we have to specify the desiderata of the two
agents. To this end, we suppose that they have identical utility functions ua, ub : [0, 1] ×
[0, 1] → R that, for simplicity, are of the Cobb-Douglas type ua(x1, x2) = ub(x1, x2) = √(x1 x2)
(see Example 178). The indifference curves can be “packed” in the following way:
[Figure: the indifference curves of the two agents, packed into the Edgeworth box.]

This is the classic Edgeworth box. By condition (2.1), we can think of a point (x1, x2) ∈
[0, 1] × [0, 1] as the allocation of Albert. We can actually identify each possible division
between the two agents with the allocation (x1, x2) of Albert. Indeed, the allocation of
Barbara (1 − x1, 1 − x2) is uniquely determined once that of Albert is known.

Each allocation (x1, x2) has utility ua(x1, x2) for Albert and ub(1 − x1, 1 − x2) for Barbara.
Let
A = {(ua(x1, x2), ub(1 − x1, 1 − x2)) ∈ R^2_+ : (x1, x2) ∈ [0, 1] × [0, 1]}
be the set of all the utility profiles of the two agents determined by the divisions of the two
goods. We are interested in the allocations whose utility profiles belong to the Pareto frontier
^10 Since we will use notions that we will introduce in Chapter 6, the reader may want to read this application
after that chapter.

of A, that is, that are Pareto optima of the set A. Indeed, these are the allocations that cannot be
improved upon with unanimous consensus.
By looking at the Edgeworth box, it is easy to see that the Pareto frontier P of A is
given by the utility values of the allocations on the diagonal of the box, i.e.,
P = {(ua(d, d), ub(1 − d, 1 − d)) ∈ R^2_+ : d ∈ [0, 1]}
that is, by the locus of the tangency points of the indifference curves (called the contract curve).
To prove it rigorously, we need the next simple result.

Lemma 57 Given x1, x2 ∈ [0, 1], we have
1 − √(x1 x2) ≥ √((1 − x1)(1 − x2))   (2.2)
with equality if and only if x1 = x2.

Proof Since x1, x2 ∈ [0, 1], we have:
1 − √(x1 x2) ≥ √((1 − x1)(1 − x2)) ⟺ (1 − √(x1 x2))^2 ≥ (1 − x1)(1 − x2)
⟺ (x1 + x2)/2 ≥ √(x1 x2) ⟺ ((x1 + x2)/2)^2 ≥ x1 x2 ⟺ (x1 − x2)^2 ≥ 0
Since the last inequality is always true, we conclude that (2.2) holds. Moreover, these
equivalences imply that
1 − √(x1 x2) = √((1 − x1)(1 − x2)) ⟺ (x1 − x2)^2 = 0
which holds if and only if x1 = x2.

Having established this lemma, we can now prove rigorously what the last picture
suggested.

Proposition 58 A utility profile (ua(x1, x2), ub(1 − x1, 1 − x2)) ∈ A is a Pareto optimum
of A if and only if x1 = x2.

Proof Let D = {(d, d) ∈ R^2_+ : d ∈ [0, 1]} be the diagonal of the box. We start by showing
that, for any division of goods (x1, x2) ∉ D – i.e., with x1 ≠ x2 – there exists (d, d) ∈ D such
that
(ua(d, d), ub(1 − d, 1 − d)) > (ua(x1, x2), ub(1 − x1, 1 − x2))   (2.3)
For Albert, we have
ua(√(x1 x2), √(x1 x2)) = √(x1 x2) = ua(x1, x2)
Therefore, (√(x1 x2), √(x1 x2)) is for him indifferent to (x1, x2). By Lemma 57, for Barbara we
have
ub(1 − √(x1 x2), 1 − √(x1 x2)) = 1 − √(x1 x2) > √((1 − x1)(1 − x2)) = ub(1 − x1, 1 − x2)
where the inequality is strict since x1 ≠ x2. Therefore, setting d = √(x1 x2), (2.3) holds.
It follows that the divisions (x1, x2) outside of the diagonal have utility profiles that
are not Pareto optima. It remains to show that the divisions on the diagonal are Pareto optima. Let
(d, d) ∈ D and suppose, by contradiction, that there exists (x1, x2) ∈ [0, 1] × [0, 1] such that
(ua(x1, x2), ub(1 − x1, 1 − x2)) > (ua(d, d), ub(1 − d, 1 − d))   (2.4)
Without loss of generality,^11 suppose that
ua(x1, x2) > ua(d, d) and ub(1 − x1, 1 − x2) ≥ ub(1 − d, 1 − d)
that is,
√(x1 x2) > √(d·d) = d and √((1 − x1)(1 − x2)) ≥ √((1 − d)(1 − d)) = 1 − d
Therefore,
1 − √(x1 x2) < 1 − d ≤ √((1 − x1)(1 − x2))
which contradicts (2.2). It follows that there is no (x1, x2) ∈ [0, 1] × [0, 1] for which (2.4)
holds. This completes the proof.

In sum, if the agents maximize their Cobb-Douglas utilities, the bargaining will result in
a division of the goods on the diagonal of the Edgeworth box, i.e., one in which each agent
receives an equal quantity of both goods. Proposition 58 does not say anything about which of
the points of the diagonal is then actually determined by the bargaining, that is, about how the
ensuing conflict of interest among the agents is resolved. Nevertheless, through the notion of
Pareto optimum we have been able to say something highly non-trivial about the problem
of division.

^11 A similar argument holds when ua(x1, x2) ≥ ua(d, d) and ub(1 − x1, 1 − x2) > ub(1 − d, 1 − d).
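Proposition 58 can also be probed numerically: for random off-diagonal divisions, the diagonal division d = √(x1 x2) leaves Albert indifferent and makes Barbara weakly (and, off the diagonal, strictly) better off. A small Monte Carlo sketch under these Cobb-Douglas utilities:

```python
import math, random

ua = lambda x1, x2: math.sqrt(x1 * x2)   # Albert's Cobb-Douglas utility
ub = lambda x1, x2: math.sqrt(x1 * x2)   # Barbara's, identical

random.seed(0)
for _ in range(10000):
    x1, x2 = random.random(), random.random()
    d = math.sqrt(x1 * x2)   # the dominating diagonal division
    # Albert is indifferent; Barbara gains, as Lemma 57 guarantees
    assert abs(ua(d, d) - ua(x1, x2)) < 1e-12
    assert ub(1 - d, 1 - d) >= ub(1 - x1, 1 - x2) - 1e-12
print("diagonal dominates: OK")
```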
Chapter 3

Linear structure

In this chapter we study in more depth the linear structure of R^n, which was introduced
in Section 2.2. The study of this fundamental structure of R^n, which we will continue
in Chapter 13 on linear functions, is part of linear algebra. The theory of finance is a
fundamental application of linear algebra, as we will see in Section 19.5.

3.1 Vector subspaces of Rn


Propositions 46 and 47 have shown that the operations of addition and scalar multiplication
on Rn satisfy the following properties, for every vectors x; y; z 2 Rn and every scalars ; 2
R,
(v1) x + y = y + x
(v2) (x + y) + z = x + (y + z)
(v3) x + 0 = x
(v4) x + ( x) = 0
(v5) (x + y) = x + y
(v6) ( + ) x = x + x
(v7) 1x = x
(v8) ( x) = ( )x
For this reason, Rn is an example of a vector space, which is, in general, a set where
we can de…ne two operations of addition and scalar multiplication that satisfy properties
(v1)-(v8). For instance, in Chapter 13 we will see another example of vector space, the space
of matrices.1
We call vector subspaces of Rn its subsets that behave well with respect to the two
operations:
1
The notion of vector space, …rst proposed by Giuseppe Peano in 1888 in his book “Calcolo geometrico”
and then developed to its full power by Stefan Banach in the 1920s, is central in mathematics but it is
necessary to go beyond Rn to fully understand it. For this reason the reader will study in depth vector spaces
in more advanced courses.


Definition 59 A non-empty subset $V$ of $\mathbb{R}^n$ is called a vector subspace if it is closed with respect to the operations of addition and scalar multiplication, i.e.,$^2$

(i) $x + y \in V$ if $x, y \in V$;

(ii) $\alpha x \in V$ if $x \in V$ and $\alpha \in \mathbb{R}$.

We leave to the reader the easy check that the two operations satisfy in $V$ properties (v1)-(v8). In this regard, it is important to note that by (ii) the origin belongs to each vector subspace $V$, i.e., $0 \in V$, because $0x = 0$ for every vector $x \in V$.

The following characterization is useful when one needs to check whether a subset of Rn
is a vector subspace.

Proposition 60 A non-empty subset $V$ of $\mathbb{R}^n$ is a vector subspace if and only if
$$\alpha x + \beta y \in V \tag{3.1}$$
for every $\alpha, \beta \in \mathbb{R}$ and every $x, y \in V$.

Proof "Only if". Let $V$ be a vector subspace and let $x, y \in V$. As $V$ is closed with respect to scalar multiplication, we have $\alpha x \in V$ and $\beta y \in V$. It follows that $\alpha x + \beta y \in V$ since $V$ is closed with respect to addition.
"If". Putting $\alpha = \beta = 1$ in (3.1), we get $x + y \in V$, while putting $\beta = 0$ we get $\alpha x \in V$. Therefore, $V$ is closed with respect to the operations of addition and scalar multiplication inherited from $\mathbb{R}^n$.

Putting $\alpha = \beta = 0$, (3.1) implies that $0 \in V$. This confirms that each vector subspace contains the origin $0$.

Example 61 There are two legitimate, yet trivial, subspaces of $\mathbb{R}^n$: the singleton $\{0\}$ and the space $\mathbb{R}^n$ itself. In particular, the reader can check that a singleton $\{x\}$ is a vector subspace of $\mathbb{R}^n$ if and only if $x = 0$. N

Example 62 Let $m \le n$ and set
$$M = \{x \in \mathbb{R}^n : x_1 = \cdots = x_m = 0\}$$
For example, if $n = 3$ and $m = 2$, we have $M = \{x \in \mathbb{R}^3 : x_1 = x_2 = 0\}$. The subset $M$ is a vector subspace. Indeed, let $x, y \in M$ and $\alpha, \beta \in \mathbb{R}$. We have:
$$\alpha x + \beta y = (\alpha x_1 + \beta y_1, \dots, \alpha x_n + \beta y_n) = (0, \dots, 0, \alpha x_{m+1} + \beta y_{m+1}, \dots, \alpha x_n + \beta y_n) \in M$$
In particular, the vertical axis in $\mathbb{R}^2$, which corresponds to $M = \{x \in \mathbb{R}^2 : x_1 = 0\}$, is a vector subspace of $\mathbb{R}^2$. N

$^2$ Recall that a set is closed with respect to an operation when the result of the operation still belongs to the set.

Example 63 Let $M$ be the set of all $x \in \mathbb{R}^4$ such that
$$\begin{cases} 2x_1 - x_2 + 2x_3 + 2x_4 = 0 \\ x_1 - x_2 - 2x_3 - 4x_4 = 0 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases}$$
In other words, $M$ is the set of the solutions of this system of equations. It is a vector subspace: the reader can check that, given $x, y \in M$ and $\alpha, \beta \in \mathbb{R}$, we have $\alpha x + \beta y \in M$. Performing the computations,$^3$ we find that the vectors
$$\left(-\frac{10}{3}t, -6t, -\frac{2}{3}t, t\right) \tag{3.2}$$
solve the system for each $t \in \mathbb{R}$, so that
$$M = \left\{\left(-\frac{10}{3}t, -6t, -\frac{2}{3}t, t\right) : t \in \mathbb{R}\right\}$$
is a description of the subspace. N
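The solution family can also be spot-checked by machine. The following sketch (ours, not part of the text; it assumes the SymPy library is available) computes a basis of the null space of the coefficient matrix, which is exactly the subspace $M$:

```python
# Sketch: verify Example 63 by computing the null space of the
# coefficient matrix with SymPy (our tool of choice, not the book's).
from sympy import Matrix

A = Matrix([[2, -1,  2,   2],
            [1, -1, -2,  -4],
            [1, -2, -2, -10]])

# nullspace() returns a basis of the solution set of Ax = 0
print(A.nullspace())   # one basis vector: (-10/3, -6, -2/3, 1), i.e. (3.2) with t = 1
```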

If $V_1$ and $V_2$ are two vector subspaces, we can show that their intersection $V_1 \cap V_2$ is also a vector subspace. More generally:

Proposition 64 The intersection of any collection of vector subspaces of $\mathbb{R}^n$ is a vector subspace.

Proof Let $\{V_i\}$ be any collection of vector subspaces of $\mathbb{R}^n$. Since $0 \in V_i$ for every $i$, we have $\bigcap_i V_i \neq \emptyset$. Let $x, y \in \bigcap_i V_i$ and $\alpha, \beta \in \mathbb{R}$. Since $x, y \in \bigcap_i V_i$, we have $x, y \in V_i$ for every $i$ and, therefore, $\alpha x + \beta y \in V_i$ for every $i$ since each $V_i$ is a vector subspace of $\mathbb{R}^n$. Hence, $\alpha x + \beta y \in \bigcap_i V_i$, and so $\bigcap_i V_i$ is a vector subspace of $\mathbb{R}^n$.

Differently from the intersection, the union of vector subspaces is not in general a vector subspace, as the next simple example shows.$^4$

Example 65 The sets $V_1 = \{x \in \mathbb{R}^2 : x_1 = 0\}$ and $V_2 = \{x \in \mathbb{R}^2 : x_2 = 0\}$ are both vector subspaces of $\mathbb{R}^2$. The set
$$V_1 \cup V_2 = \{x \in \mathbb{R}^2 : x_1 = 0 \text{ or } x_2 = 0\}$$
is not a vector subspace of $\mathbb{R}^2$. Indeed,
$$(1, 0) \in V_1 \cup V_2 \quad \text{and} \quad (0, 1) \in V_1 \cup V_2$$
but $(1, 0) + (0, 1) = (1, 1) \notin V_1 \cup V_2$. N

$^3$ The system is properly solved in Example 633. But, for completeness, at the end of the chapter (Section 3.7) we provide a simple high school argument.
$^4$ Examples that show the failure of a property are often called counterexamples. In general, the simpler they are, the better, because the failure is then starker.

3.2 Linear independence and dependence

In this chapter we will adopt the notation $x^i = (x^i_1, \dots, x^i_n) \in \mathbb{R}^n$, in which the superscript identifies different vectors and the subscripts their components. We use this notation immediately in the next important definition.

Definition 66 A finite set of vectors $x^1, \dots, x^m$ of $\mathbb{R}^n$ is said to be linearly independent if, whenever
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$
for some set $\{\alpha_1, \dots, \alpha_m\}$ of scalars, then
$$\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$$
The set $\{x^1, \dots, x^m\}$ is, instead, said to be linearly dependent if it is not linearly independent, i.e.,$^5$ if there exists a set $\{\alpha_1, \dots, \alpha_m\}$ of scalars, not all equal to zero, such that
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$

Example 67 Consider the vectors
$$e^1 = (1, 0, 0, \dots, 0), \quad e^2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e^n = (0, 0, \dots, 0, 1)$$
called standard unit vectors or versors of $\mathbb{R}^n$. The set $\{e^1, \dots, e^n\}$ is linearly independent. Indeed,
$$\alpha_1 e^1 + \cdots + \alpha_n e^n = (\alpha_1, \dots, \alpha_n)$$
and so $\alpha_1 e^1 + \cdots + \alpha_n e^n = 0$ implies $\alpha_1 = \cdots = \alpha_n = 0$. N

Example 68 All the sets of vectors $x^1, \dots, x^m$ of $\mathbb{R}^n$ that include the zero vector $0$ are linearly dependent. Indeed, without loss of generality, set $x^1 = 0$. Given a set $\{\alpha_1, \dots, \alpha_m\}$ of scalars with $\alpha_1 \neq 0$ and $\alpha_i = 0$ for $i = 2, \dots, m$, we have
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$
which proves the linear dependence of the set $\{x^i\}_{i=1}^m$. N

Example 69 Two vectors $x^1$ and $x^2$ that are linearly dependent are called collinear. This happens if and only if either $x^1 = 0$ or $x^2 = 0$ or there exists $\alpha \neq 0$ such that $x^1 = \alpha x^2$. In other words, if and only if there exist two scalars $\alpha_1$ and $\alpha_2$, of which at least one is different from zero, such that $\alpha_1 x^1 = \alpha_2 x^2$. N

Before presenting other examples, we must clarify a terminological question. Although linear independence and dependence are properties of a set of vectors $\{x^i\}_{i=1}^m$, they are often referred to the single vectors. We then speak of a "set of linearly independent (dependent) vectors" instead of a "linearly independent (dependent) set of vectors".

$^5$ See Section D.6.3 of the Appendix for a careful logical analysis of this important negation.

Example 70 In $\mathbb{R}^3$, the vectors
$$x^1 = (1, 1, 1), \quad x^2 = (3, 1, 5), \quad x^3 = (9, 1, 25)$$
are linearly independent. Indeed,
$$\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 = \alpha_1(1, 1, 1) + \alpha_2(3, 1, 5) + \alpha_3(9, 1, 25) = (\alpha_1 + 3\alpha_2 + 9\alpha_3,\ \alpha_1 + \alpha_2 + \alpha_3,\ \alpha_1 + 5\alpha_2 + 25\alpha_3)$$
Therefore, $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 = 0$ means
$$\begin{cases} \alpha_1 + 3\alpha_2 + 9\alpha_3 = 0 \\ \alpha_1 + \alpha_2 + \alpha_3 = 0 \\ \alpha_1 + 5\alpha_2 + 25\alpha_3 = 0 \end{cases}$$
which is a system of equations whose unique solution is $(\alpha_1, \alpha_2, \alpha_3) = (0, 0, 0)$. More generally, to check whether $k$ vectors
$$x^1 = (x^1_1, \dots, x^1_n), \quad x^2 = (x^2_1, \dots, x^2_n), \quad \dots, \quad x^k = (x^k_1, \dots, x^k_n)$$
are linearly independent in $\mathbb{R}^n$, it suffices to solve the linear system
$$\begin{cases} \alpha_1 x^1_1 + \alpha_2 x^2_1 + \cdots + \alpha_k x^k_1 = 0 \\ \alpha_1 x^1_2 + \alpha_2 x^2_2 + \cdots + \alpha_k x^k_2 = 0 \\ \quad \vdots \\ \alpha_1 x^1_n + \alpha_2 x^2_n + \cdots + \alpha_k x^k_n = 0 \end{cases}$$
If $(\alpha_1, \dots, \alpha_k) = (0, \dots, 0)$ is the unique solution, then the vectors are linearly independent in $\mathbb{R}^n$. For example, consider in $\mathbb{R}^3$ the two vectors $x^1 = (1, 3, 4)$ and $x^2 = (2, 5, 1)$. The system to solve is
$$\begin{cases} \alpha_1 + 2\alpha_2 = 0 \\ 3\alpha_1 + 5\alpha_2 = 0 \\ 4\alpha_1 + \alpha_2 = 0 \end{cases}$$
It has the unique solution $(\alpha_1, \alpha_2) = (0, 0)$, so the two vectors $x^1$ and $x^2$ are linearly independent. N
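For readers who want a computational companion (a sketch of ours, with NumPy assumed), there is an equivalent test: $k$ vectors of $\mathbb{R}^n$ are linearly independent exactly when the matrix having them as columns has rank $k$.

```python
# Sketch: test linear independence via matrix rank (NumPy assumed).
import numpy as np

# The three vectors of Example 70, placed as columns of a matrix
A = np.column_stack([[1, 1, 1], [3, 1, 5], [9, 1, 25]])
print(np.linalg.matrix_rank(A) == 3)   # True: linearly independent

# The two vectors x1 = (1,3,4) and x2 = (2,5,1)
B = np.column_stack([[1, 3, 4], [2, 5, 1]])
print(np.linalg.matrix_rank(B) == 2)   # True: linearly independent
```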

Example 71 Consider the vectors
$$x^1 = (2, 1, 1), \quad x^2 = (-1, -1, -2), \quad x^3 = (2, -2, -2), \quad x^4 = (2, -4, -10)$$
To check whether these vectors are linearly independent in $\mathbb{R}^3$, we solve the system
$$\begin{cases} 2\alpha_1 - \alpha_2 + 2\alpha_3 + 2\alpha_4 = 0 \\ \alpha_1 - \alpha_2 - 2\alpha_3 - 4\alpha_4 = 0 \\ \alpha_1 - 2\alpha_2 - 2\alpha_3 - 10\alpha_4 = 0 \end{cases}$$
As we have seen previously (Example 63), it is solved by the vectors
$$\left(-\frac{10}{3}t, -6t, -\frac{2}{3}t, t\right) \tag{3.3}$$
for each $t \in \mathbb{R}$. Therefore, $(0, 0, 0, 0)$ is not the unique solution of the system, and so the vectors $x^1$, $x^2$, $x^3$, and $x^4$ are linearly dependent. Indeed, by setting for example $t = 1$ in (3.3), the set of four numbers
$$(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = \left(-\frac{10}{3}, -6, -\frac{2}{3}, 1\right)$$
is a set of scalars, with at least one different from zero, such that $-\frac{10}{3}x^1 - 6x^2 - \frac{2}{3}x^3 + x^4 = 0$. N

Subsets retain linear independence.

Proposition 72 The subsets of a linearly independent set are, in turn, linearly independent.

The simple proof is left to the reader, who can also check that if we add vectors to a
linearly dependent set, the set remains linearly dependent.

3.3 Linear combinations

Definition 73 A vector $x \in \mathbb{R}^n$ is said to be a linear combination of the vectors $x^1, \dots, x^m$ of $\mathbb{R}^n$ if there exist $m$ scalars $\{\alpha_1, \dots, \alpha_m\}$ such that
$$x = \alpha_1 x^1 + \cdots + \alpha_m x^m$$
The scalars $\alpha_i$ are called the coefficients of the linear combination.

Example 74 Consider the two vectors $e^1 = (1, 0, 0)$ and $e^2 = (0, 1, 0)$ in $\mathbb{R}^3$. A vector of $\mathbb{R}^3$ is a linear combination of $e^1$ and $e^2$ if and only if it has the form $(\alpha_1, \alpha_2, 0)$ for some $\alpha_1, \alpha_2 \in \mathbb{R}$. Indeed, $(\alpha_1, \alpha_2, 0) = \alpha_1 e^1 + \alpha_2 e^2$. N

The notion of linear combination allows us to establish a remarkable characterization of linear dependence.

Theorem 75 A finite set $S$ of $\mathbb{R}^n$, with $S \neq \{0\}$, is linearly dependent if and only if at least one element of $S$ is a linear combination of other elements of $S$.$^6$

Proof "Only if". Let $S = \{x^i\}_{i=1}^m$ be a linearly dependent set of $\mathbb{R}^n$. Let $2 \le k \le m$ be the smallest natural number between $2$ and $m$ such that the set $\{x^1, \dots, x^k\}$ is linearly dependent. At worst, $k$ is equal to $m$, since by hypothesis $\{x^i\}_{i=1}^m$ is linearly dependent. By the definition of linear dependence, there exist $k$ scalars $\{\alpha_i\}_{i=1}^k$, with at least one different from zero, such that
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_k x^k = 0$$
We have $\alpha_k \neq 0$, because otherwise $\{x^1, \dots, x^{k-1}\}$ would be a linearly dependent set, contradicting the fact that $k$ is the smallest natural number between $2$ and $m$ such that $\{x^1, \dots, x^k\}$ is a linearly dependent set. Given that $\alpha_k \neq 0$, we can write
$$x^k = -\frac{\alpha_1}{\alpha_k} x^1 - \frac{\alpha_2}{\alpha_k} x^2 - \cdots - \frac{\alpha_{k-1}}{\alpha_k} x^{k-1}$$
and, therefore, $x^k$ is a linear combination of the vectors $x^1, \dots, x^{k-1}$. In other words, the vector $x^k$ of $S$ is a linear combination of other elements of $S$.
"If". Suppose that the vector $x^k$ of a finite set $S = \{x^i\}_{i=1}^m$ is a linear combination of other elements of $S$. Without loss of generality, assume $k = 1$. There exists a set $\{\beta_i\}_{i=2}^m$ of scalars such that $x^1 = \beta_2 x^2 + \cdots + \beta_m x^m$. Define the scalars $\{\alpha_i\}_{i=1}^m$ as follows:
$$\alpha_i = \begin{cases} -1 & \text{if } i = 1 \\ \beta_i & \text{if } i \ge 2 \end{cases}$$
By construction, $\{\alpha_i\}_{i=1}^m$ is a set of scalars, with at least one different from zero, such that $\sum_{i=1}^m \alpha_i x^i = 0$. Indeed,
$$\sum_{i=1}^m \alpha_i x^i = -x^1 + \beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_m x^m = -x^1 + x^1 = 0$$
It follows that $\{x^i\}_{i=1}^m$ is a linearly dependent set.

$^6$ In view of Example 61, the condition $S \neq \{0\}$ amounts to requiring that $S$ is not the singleton $\{0\}$.

Example 76 (i) Consider the vectors $x^1 = (1, 3, 4)$, $x^2 = (2, 5, 1)$, and $x^3 = (0, 1, 7)$ in $\mathbb{R}^3$. Since $x^3 = 2x^1 - x^2$, the third vector is a linear combination of the other two. By Theorem 75, the set $\{x^1, x^2, x^3\}$ is linearly dependent (in the proof we have $k = 3$). It is immediate to check that each of the vectors in the set $\{x^1, x^2, x^3\}$ is also a linear combination of the other two, something that, as the next case shows, does not hold in general for sets of linearly dependent vectors.
(ii) Consider the vectors $x^1 = (1, 3, 4)$, $x^2 = (2, 6, 8)$, and $x^3 = (2, 5, 1)$ in $\mathbb{R}^3$. Since $x^2 = 2x^1$, the second vector is a multiple (so, a linear combination) of the first vector. By Theorem 75, the set $\{x^1, x^2, x^3\}$ is linearly dependent (in the proof we have $k = 2$). Note how $x^3$ is not a linear combination of $x^1$ and $x^2$, i.e., there are no $\alpha_1, \alpha_2 \in \mathbb{R}$ such that $x^3 = \alpha_1 x^1 + \alpha_2 x^2$. In conclusion, Theorem 75 ensures that, in a set of linearly dependent vectors, some of them are linear combinations of others, but this is not necessarily the case for all the vectors of the set. For example, this happened for all the vectors in case (i), but not in case (ii). N
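Whether a given vector is a linear combination of others can also be checked numerically. The sketch below is ours (the helper `in_span` is a hypothetical name, and NumPy is assumed); it tests membership in a span via a least-squares residual:

```python
# Sketch: test whether x lies in the span of the given vectors (NumPy assumed).
import numpy as np

def in_span(vectors, x, tol=1e-10):
    A = np.column_stack(vectors).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(x, dtype=float), rcond=None)
    return np.allclose(A @ coeffs, x, atol=tol)

# Case (i): x3 = 2*x1 - x2, so x3 belongs to span{x1, x2}
print(in_span([[1, 3, 4], [2, 5, 1]], [0, 1, 7]))   # True
# Case (ii): x2 = 2*x1, but x3 lies outside span{x1, x2}
print(in_span([[1, 3, 4], [2, 6, 8]], [2, 5, 1]))   # False
```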

The next result is an immediate, yet fundamental, consequence of Theorem 75.

Corollary 77 A finite set $S$ of $\mathbb{R}^n$ is linearly independent if and only if none of the vectors in $S$ is a linear combination of other vectors in $S$.

3.4 Generated subspaces

Let $S$ be a set of vectors of $\mathbb{R}^n$ and let $\{V_i\}$ be the collection of all the vector subspaces that contain $S$. The collection is non-empty because, trivially, $\mathbb{R}^n$ contains $S$ and is, therefore, an element of the collection. By Proposition 64, the intersection $\bigcap_i V_i$ of all such subspaces is itself a vector subspace of $\mathbb{R}^n$ that contains $S$. Therefore, $\bigcap_i V_i$ is the smallest (with respect to set inclusion) vector subspace of $\mathbb{R}^n$ that contains $S$: for each such subspace $V$, we have $\bigcap_i V_i \subseteq V$.
The vector subspace $\bigcap_i V_i$ is very important and is called the vector subspace generated (or spanned) by $S$, denoted by $\operatorname{span} S$. In other words, $\operatorname{span} S$ is the smallest "enlargement" of $S$ with the property of being a vector subspace.
The next result shows that $\operatorname{span} S$ has a "concrete" representation in terms of linear combinations of $S$.

Theorem 78 Let $S$ be a set of $\mathbb{R}^n$. A vector $x \in \mathbb{R}^n$ belongs to $\operatorname{span} S$ if and only if it is a linear combination of vectors of $S$.

Proof We need to prove that $x \in \mathbb{R}^n$ belongs to $\operatorname{span} S$ if and only if there exist a finite set $\{x^i\}_{i \in I}$ of vectors in $S$ and a finite set $\{\alpha_i\}_{i \in I}$ of scalars such that $x = \sum_{i \in I} \alpha_i x^i$.
"If". Let $x \in \mathbb{R}^n$ be a linear combination of a finite set $\{x^i\}_{i \in I}$ of vectors of $S$. For simplicity, set $\{x^i\}_{i \in I} = \{x^1, \dots, x^k\}$. There exists, therefore, a set $\{\alpha_i\}_{i=1}^k$ of real numbers such that $x = \sum_{i=1}^k \alpha_i x^i$. By the definition of a vector subspace, we have $\alpha_1 x^1 + \alpha_2 x^2 \in \operatorname{span} S$ since $x^1, x^2 \in \operatorname{span} S$. In turn, $\alpha_1 x^1 + \alpha_2 x^2 \in \operatorname{span} S$ implies $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 \in \operatorname{span} S$, and by proceeding in this way we get that $x = \sum_{i=1}^k \alpha_i x^i \in \operatorname{span} S$, as claimed.
"Only if". Let $V$ be the set of all vectors $x \in \mathbb{R}^n$ that can be expressed as linear combinations of vectors of $S$, that is, $x \in V$ if there exist finite sets $\{x^i\}_{i \in I} \subseteq S$ and $\{\alpha_i\}_{i \in I} \subseteq \mathbb{R}$ such that $x = \sum_{i \in I} \alpha_i x^i$. It is easy to see that $V$ is a vector subspace of $\mathbb{R}^n$ containing $S$. It follows that $\operatorname{span} S \subseteq V$, and so each $x \in \operatorname{span} S$ is a linear combination of vectors of $S$.

Before illustrating the theorem with some examples, we state a simple consequence.

Corollary 79 Let $S$ be a set of $\mathbb{R}^n$. If $x \in \mathbb{R}^n$ is a linear combination of vectors of $S$, then $\operatorname{span} S = \operatorname{span}(S \cup \{x\})$.

In words, the vector subspace generated by a set does not change by adding to the set a vector that is already a linear combination of its elements. The "generative" capability of a set is not improved by adding to it vectors that are linear combinations of its elements.

Example 80 Let $S = \{x^1, \dots, x^k\} \subseteq \mathbb{R}^n$. By Theorem 78 we have
$$\operatorname{span} S = \left\{ x \in \mathbb{R}^n : x = \sum_{i=1}^k \alpha_i x^i \text{ with } \alpha_i \in \mathbb{R} \text{ for each } i = 1, \dots, k \right\} = \left\{ \sum_{i=1}^k \alpha_i x^i : \alpha_i \in \mathbb{R} \text{ for each } i = 1, \dots, k \right\}$$
N

Example 81 Let $S = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\} \subseteq \mathbb{R}^3$. We have
$$\operatorname{span} S = \{x \in \mathbb{R}^3 : x = \alpha_1(1, 0, 0) + \alpha_2(0, 1, 0) + \alpha_3(0, 0, 1) \text{ with each } \alpha_i \in \mathbb{R}\} = \{(\alpha_1, \alpha_2, \alpha_3) : \alpha_i \in \mathbb{R} \text{ for every } i = 1, 2, 3\} = \mathbb{R}^3$$
More generally, let $S = \{e^1, \dots, e^n\} \subseteq \mathbb{R}^n$. We have
$$\operatorname{span} S = \left\{ x \in \mathbb{R}^n : x = \sum_{i=1}^n \alpha_i e^i \text{ with each } \alpha_i \in \mathbb{R} \right\} = \{(\alpha_1, \alpha_2, \dots, \alpha_n) : \alpha_i \in \mathbb{R} \text{ for every } i = 1, \dots, n\} = \mathbb{R}^n$$
N

Example 82 If $S = \{x\}$, then $\operatorname{span} S = \{\alpha x : \alpha \in \mathbb{R}\}$. For example, let $x = (2, 3) \in \mathbb{R}^2$. We have
$$\operatorname{span} S = \{(2\alpha, 3\alpha) : \alpha \in \mathbb{R}\}$$
i.e., $\operatorname{span} S$ is the graph of the straight line $y = (3/2)x$ that passes through the origin and the point $x$. Graphically:

[Figure: the straight line $y = (3/2)x$ through the origin $O$ and the point $(2, 3)$.] N

3.5 Bases

By Theorem 78, the subspace generated by a subset $S$ of $\mathbb{R}^n$ is formed by all the linear combinations of the vectors in $S$. Suppose that $S$ is a linearly dependent set. By Theorem 75, some vectors in $S$ are then linear combinations of other elements of $S$. By Corollary 79, such vectors are, therefore, redundant for the generation of $\operatorname{span} S$. Indeed, if a vector $x \in S$ is a linear combination of other vectors of $S$, then by Corollary 79 we have
$$\operatorname{span} S = \operatorname{span}(S \setminus \{x\})$$
where $S \setminus \{x\}$ is the set $S$ without the vector $x$.
A linearly dependent set $S$ thus contains some elements that are redundant for the generation of $\operatorname{span} S$. This does not happen if, on the contrary, $S$ is a linearly independent set: by Corollary 77, no vector of $S$ can then be a linear combination of other elements of $S$. In other words, when $S$ is linearly independent, all its vectors are essential for the generation of $\operatorname{span} S$.
These observations lead us to the notion of basis.

Definition 83 A finite subset $S$ of $\mathbb{R}^n$ is a basis of $\mathbb{R}^n$ if $S$ is a linearly independent set such that $\operatorname{span} S = \mathbb{R}^n$.

If $S$ is a basis of $\mathbb{R}^n$, we therefore have:

(i) each $x \in \mathbb{R}^n$ can be represented as a linear combination of vectors in $S$;

(ii) all the vectors of $S$ are essential for this representation: none of them is redundant.

This "essentiality" of a basis in representing, as linear combinations, the elements of $\mathbb{R}^n$ is evident in the following result.

Theorem 84 A finite subset $S$ of $\mathbb{R}^n$ is a basis of $\mathbb{R}^n$ if and only if each $x \in \mathbb{R}^n$ can be written in only one way as a linear combination of vectors in $S$.

Proof "Only if". Let $S = \{x^i\}_{i=1}^m$ be a basis of $\mathbb{R}^n$. By definition, each vector $x \in \mathbb{R}^n$ can be represented as a linear combination of elements of $S$. Given $x \in \mathbb{R}^n$, suppose that there exist two sets of scalars $\{\alpha_i\}_{i=1}^m$ and $\{\beta_i\}_{i=1}^m$ such that
$$x = \sum_{i=1}^m \alpha_i x^i = \sum_{i=1}^m \beta_i x^i$$
Hence,
$$\sum_{i=1}^m (\alpha_i - \beta_i) x^i = 0$$
and, since the vectors in $S$ are linearly independent, it follows that $\alpha_i - \beta_i = 0$ for every $i = 1, \dots, m$; that is, $\alpha_i = \beta_i$ for every $i = 1, \dots, m$.
"If". Let $S = \{x^1, \dots, x^m\}$ and suppose that each $x \in \mathbb{R}^n$ can be written in a unique way as a linear combination of vectors in $S$. Clearly, by Theorem 78 we have $\mathbb{R}^n = \operatorname{span} S$. It remains to prove that $S$ is a linearly independent set. Suppose that the scalars $\{\alpha_i\}_{i=1}^m$ are such that
$$\sum_{i=1}^m \alpha_i x^i = 0$$
Since we also have
$$\sum_{i=1}^m 0 x^i = 0$$
we conclude that $\alpha_i = 0$ for every $i = 1, \dots, m$ because, by hypothesis, the vector $0$ can be written in only one way as a linear combination of vectors in $S$.

Example 85 The standard basis of $\mathbb{R}^n$ is given by the versors $\{e^1, \dots, e^n\}$. Each $x \in \mathbb{R}^n$ can be written, in a unique way, as a linear combination of these vectors. In particular,
$$x = x_1 e^1 + \cdots + x_n e^n = \sum_{i=1}^n x_i e^i \tag{3.4}$$
That is, the coefficients of the linear combination are the components of the vector $x$. N

Example 86 The standard basis of $\mathbb{R}^2$ is $\{(1, 0), (0, 1)\}$. But there exist infinitely many other bases of $\mathbb{R}^2$: for example, $S = \{(1, 2), (0, 7)\}$ is another such basis. It is easy to prove the linear independence of $S$. To show that $\operatorname{span} S = \mathbb{R}^2$, consider any vector $x = (x_1, x_2) \in \mathbb{R}^2$. We need to show that there exist $\alpha_1, \alpha_2 \in \mathbb{R}$ such that
$$(x_1, x_2) = \alpha_1(1, 2) + \alpha_2(0, 7)$$
i.e., that solve the simple linear system
$$\begin{cases} \alpha_1 = x_1 \\ 2\alpha_1 + 7\alpha_2 = x_2 \end{cases}$$
Since
$$\alpha_1 = x_1, \qquad \alpha_2 = \frac{x_2 - 2x_1}{7}$$
solve the system, we conclude that $S$ is indeed a basis of $\mathbb{R}^2$. N
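In practice, the coefficients with respect to a basis of $\mathbb{R}^2$ can be found by solving a $2 \times 2$ linear system, as the following sketch of ours illustrates (NumPy assumed; the vector $x$ is an arbitrary choice of ours):

```python
# Sketch: coefficients of x in the basis S = {(1,2), (0,7)} of Example 86.
import numpy as np

A = np.column_stack([[1, 2], [0, 7]])   # basis vectors as columns
x = np.array([5.0, -3.0])               # an arbitrary vector of R^2
alpha = np.linalg.solve(A, x)
print(alpha)        # [5., -1.857...]: indeed alpha1 = x1 and alpha2 = (x2 - 2*x1)/7
print(A @ alpha)    # recovers x
```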

Each vector of $\mathbb{R}^n$ can be represented ("recovered") as a linear combination of the vectors of a basis of $\mathbb{R}^n$. In a sense, a basis is therefore the "genetic code" of a vector space: it contains all the pieces of information necessary to identify its elements. Since there are several bases of $\mathbb{R}^n$, such pieces of "genetic" information can be encoded in different sets of vectors. It is therefore important to understand what the relations among the different bases are. They will become clear after the next theorem, whose remarkable implications make it the deus ex machina of the chapter.

Theorem 87 For each linearly independent set $\{x^1, \dots, x^k\}$ of $\mathbb{R}^n$ with $k \le n$, there exist $n - k$ vectors $x^{k+1}, \dots, x^n$ such that the overall set $\{x^i\}_{i=1}^n$ is a basis of $\mathbb{R}^n$.

Because of its importance, we give two different proofs of the result. They both require the following lemma.

Lemma 88 Let $\{b^1, \dots, b^n\}$ be a basis of $\mathbb{R}^n$. If
$$x = c_1 b^1 + \cdots + c_n b^n$$
with $c_i \neq 0$, then $\{b^1, \dots, b^{i-1}, x, b^{i+1}, \dots, b^n\}$ is a basis of $\mathbb{R}^n$.

Proof Without loss of generality suppose that $c_1 \neq 0$. We prove that $\{x, b^2, \dots, b^n\}$ is a basis of $\mathbb{R}^n$. As $c_1 \neq 0$, we can write
$$b^1 = \frac{1}{c_1} x - \frac{c_2}{c_1} b^2 - \cdots - \frac{c_n}{c_1} b^n$$
Therefore, for each choice of the coefficients $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ we have
$$\sum_{i=1}^n \alpha_i b^i = \sum_{i=2}^n \alpha_i b^i + \alpha_1 \left[ \frac{1}{c_1} x - \sum_{i=2}^n \frac{c_i}{c_1} b^i \right] = \frac{\alpha_1}{c_1} x + \sum_{i=2}^n \left( \alpha_i - \frac{\alpha_1 c_i}{c_1} \right) b^i$$
It follows that
$$\operatorname{span}\{x, b^2, \dots, b^n\} = \operatorname{span}\{b^1, b^2, \dots, b^n\} = \mathbb{R}^n$$
It remains to show that the set $\{x, b^2, \dots, b^n\}$ is linearly independent, so that we can conclude that it is a basis of $\mathbb{R}^n$. Let $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ be coefficients for which
$$\alpha_1 x + \sum_{i=2}^n \alpha_i b^i = 0 \tag{3.5}$$
If $\alpha_1 \neq 0$, we have
$$x = -\sum_{i=2}^n \frac{\alpha_i}{\alpha_1} b^i = 0 b^1 - \sum_{i=2}^n \frac{\alpha_i}{\alpha_1} b^i$$
Since $x$ can be written in a unique way as a linear combination of the vectors of the basis $\{b^i\}_{i=1}^n$, one gets that $c_1 = 0$, which contradicts the hypothesis $c_1 \neq 0$. This means that $\alpha_1 = 0$ and (3.5) simplifies to
$$0 b^1 + \sum_{i=2}^n \alpha_i b^i = 0$$
Since $\{b^1, \dots, b^n\}$ is a basis, one obtains $\alpha_2 = \cdots = \alpha_n = 0 = \alpha_1$.

Proof 1 of Theorem 87 We proceed by induction.$^7$ The theorem holds for $k = 1$. Indeed, consider a singleton $\{x\}$,$^8$ with $x \neq 0$, and the standard basis $\{e^1, \dots, e^n\}$ of $\mathbb{R}^n$. As $x = \sum_{i=1}^n x_i e^i$, there exists at least one index $i$ such that $x_i \neq 0$. By Lemma 88, $\{e^1, \dots, e^{i-1}, x, e^{i+1}, \dots, e^n\}$ is a basis of $\mathbb{R}^n$.
Suppose now that the statement of the theorem is true for each set of $k - 1$ vectors (induction hypothesis); we want to show that it is true for each set of $k$ vectors. Let therefore $\{x^1, \dots, x^k\}$ be a set of $k$ linearly independent vectors. The subset $\{x^1, \dots, x^{k-1}\}$ is linearly independent and has $k - 1$ elements. By the induction hypothesis, there exist $n - (k - 1)$ vectors $\tilde{y}^k, \dots, \tilde{y}^n$ such that $\{x^1, \dots, x^{k-1}, \tilde{y}^k, \dots, \tilde{y}^n\}$ is a basis of $\mathbb{R}^n$. Therefore, there exist coefficients $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ such that
$$x^k = \sum_{i=1}^{k-1} \alpha_i x^i + \sum_{i=k}^{n} \alpha_i \tilde{y}^i \tag{3.6}$$
As the vectors $x^1, \dots, x^{k-1}, x^k$ are linearly independent, at least one of the coefficients $\{\alpha_i\}_{i=k}^n$ is different from zero. Otherwise, $x^k = \sum_{i=1}^{k-1} \alpha_i x^i$ and so the vector $x^k$ would be a linear combination of the vectors $x^1, \dots, x^{k-1}$, something that by Corollary 77 cannot happen. Let, for example, $\alpha_k \neq 0$. By Lemma 88, $\{x^1, \dots, x^k, \tilde{y}^{k+1}, \dots, \tilde{y}^n\}$ is then a basis of $\mathbb{R}^n$. This completes the induction.

Proof 2 of Theorem 87 The theorem holds for $k = 1$ (see the previous proof). So, suppose by contradiction that it fails for some $k$, and let $1 < k \le n$ be the smallest integer for which the property is false: there exists a linearly independent set $\{x^1, \dots, x^k\}$ such that there are no $n - k$ vectors of $\mathbb{R}^n$ that, added to $\{x^1, \dots, x^k\}$, yield a basis of $\mathbb{R}^n$. Given that $\{x^1, \dots, x^{k-1}\}$ is, in turn, linearly independent, the minimality of $k$ implies that there are vectors $\bar{x}^k, \dots, \bar{x}^n$ such that $\{x^1, \dots, x^{k-1}, \bar{x}^k, \dots, \bar{x}^n\}$ is a basis of $\mathbb{R}^n$. But then
$$x^k = c_1 x^1 + \cdots + c_{k-1} x^{k-1} + c_k \bar{x}^k + \cdots + c_n \bar{x}^n$$
Given that $\{x^1, \dots, x^k\}$ is linearly independent, one cannot have $c_k = \cdots = c_n = 0$. So, $c_j \neq 0$ for some index $j \in \{k, \dots, n\}$. By Lemma 88,
$$\left\{ x^1, \dots, x^{k-1}, \bar{x}^k, \dots, \bar{x}^{j-1}, x^k, \bar{x}^{j+1}, \dots, \bar{x}^n \right\}$$
is a basis of $\mathbb{R}^n$, a contradiction.

$^7$ See Appendix E for the induction principle.
$^8$ Note that a singleton $\{x\}$ is linearly independent when $\alpha x = 0$ implies $\alpha = 0$, which is equivalent to requiring $x \neq 0$.

The next result is a simple, but important, consequence of Theorem 87.

Corollary 89 (i) Each linearly independent set of $\mathbb{R}^n$ with $n$ elements is a basis of $\mathbb{R}^n$.

(ii) Each linearly independent set of $\mathbb{R}^n$ has at most $n$ elements.

Proof (i) It is enough to set $k = n$ in Theorem 87. (ii) Let $S = \{x^1, \dots, x^k\}$ be a linearly independent set in $\mathbb{R}^n$. We want to show that $k \le n$. By contradiction, suppose $k > n$. Then $\{x^1, \dots, x^n\}$ is in turn a linearly independent set and by point (i) is a basis of $\mathbb{R}^n$. Hence, the vectors $x^{n+1}, \dots, x^k$ are linear combinations of the vectors $x^1, \dots, x^n$, which, by Corollary 77, contradicts the linear independence of the vectors $x^1, \dots, x^k$. Therefore, $k \le n$, which completes the proof.

Example 90 By point (i), any two linearly independent vectors form a basis of $\mathbb{R}^2$. Going back to Example 86, it is therefore sufficient to verify that the vectors $(1, 2)$ and $(0, 7)$ are linearly independent to conclude that $S = \{(1, 2), (0, 7)\}$ is a basis of $\mathbb{R}^2$. N

We can finally state the main result of the section.

Theorem 91 All bases of $\mathbb{R}^n$ have the same number $n$ of elements.

In other words, although the "genetic" information of $\mathbb{R}^n$ can be codified in different sets of vectors, that is, in different bases, such sets all have the same finite number of elements, that is, the same "length". The number $n$ can, therefore, be seen as the dimension of the space $\mathbb{R}^n$. Indeed, it is natural to think that the "greater" a space $\mathbb{R}^n$ is, the more elements its bases have, that is, the greater is the quantity of information that the bases require in order to represent all the elements of $\mathbb{R}^n$ through linear combinations.
Summing up, the number $n$ that emerges from Theorem 91 indicates the "dimension" of $\mathbb{R}^n$ and, in a sense, justifies its superscript $n$. This notion of dimension makes rigorous the intuitive idea that $\mathbb{R}^n$ is a larger space than $\mathbb{R}^m$ when $m < n$.

Proof The standard basis of $\mathbb{R}^n$ has $n$ elements, so by Corollary 89-(ii) every basis of $\mathbb{R}^n$ can have at most $n$ elements. Let $\{x^1, \dots, x^k\}$ be any other basis of $\mathbb{R}^n$. We show that one cannot have $k < n$, and so conclude that $k = n$. Suppose that $k < n$. By Theorem 87, there exist $n - k$ vectors $x^{k+1}, \dots, x^n$ such that the set $\{x^1, \dots, x^k, x^{k+1}, \dots, x^n\}$ is a basis of $\mathbb{R}^n$. This, however, contradicts the assumption that $\{x^1, \dots, x^k\}$ is a basis of $\mathbb{R}^n$, because the vectors $x^{k+1}, \dots, x^n$ are not linear combinations of the vectors $x^1, \dots, x^k$: $\{x^1, \dots, x^n\}$ is a linearly independent set. Therefore, $k = n$.

3.6 Bases of subspaces


The notions introduced in the previous section for $\mathbb{R}^n$ extend in a natural way to its vector subspaces.

Definition 92 Let $V$ be a vector subspace of $\mathbb{R}^n$. A finite subset $S$ of $V$ is a basis of $V$ if $S$ is a linearly independent set such that $\operatorname{span} S = V$.

The bases of vector subspaces thus also permit to represent, without redundancies, each vector of the subspace as a linear combination.
The results of the previous section continue to hold.$^9$ We start with Theorem 84.

Theorem 93 Let $V$ be a vector subspace of $\mathbb{R}^n$. A finite subset $S$ of $V$ is a basis of $V$ if and only if each $x \in V$ can be written in a unique way as a linear combination of vectors in $S$.

Example 94 (i) The horizontal axis $M = \{x \in \mathbb{R}^2 : x_2 = 0\}$ is a vector subspace of $\mathbb{R}^2$. The singleton $\{e^1\} \subseteq M$ is a basis. (ii) The plane through the origin $M = \{x \in \mathbb{R}^3 : x_3 = 0\}$ is a vector subspace of $\mathbb{R}^3$. The set $\{e^1, e^2\} \subseteq M$ is a basis. N

Since $V$ is a subset of $\mathbb{R}^n$, it has at most $n$ linearly independent vectors. In particular, the following generalization of Theorem 87 holds.

ular, the following generalization of Theorem 87 holds.

Theorem 95 Let $V$ be a vector subspace of $\mathbb{R}^n$ with a basis of $m \le n$ elements. For each linearly independent set of vectors $\{v^1, \dots, v^k\}$ of $V$, with $k \le m$, there exist $m - k$ vectors $v^{k+1}, \dots, v^m$ such that the set $\{v^i\}_{i=1}^m$ is a basis of $V$.

In turn, Theorem 95 leads to the following extension of Theorem 91.

Theorem 96 All bases of a vector subspace of $\mathbb{R}^n$ have the same number of elements.

Although in view of Theorem 91 the result is not surprising, it remains of great elegance because it shows how, despite their diversity, the bases share a fundamental characteristic: their cardinality. This motivates the next definition, which was implicit in the discussion that followed Theorem 91.

Definition 97 The dimension of a vector subspace $V$ of $\mathbb{R}^n$ is the number of elements of any basis of $V$.

By Theorem 96, this number is unique, and it is denoted by $\dim V$. It is the notion of dimension that, indeed, makes this (otherwise routine) section interesting, as the next examples show.

Example 98 In the special case $V = \mathbb{R}^n$ we have $\dim \mathbb{R}^n = n$, which makes rigorous the discussion that followed Theorem 91. N

$^9$ We leave to the reader the proofs of the results of this section because they are similar to those of the previous section.

Example 99 (i) The horizontal axis is a vector subspace of dimension one of $\mathbb{R}^2$. (ii) The plane $M = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 = 0\}$ is a vector subspace of dimension two of $\mathbb{R}^3$, that is, $\dim M = 2$. N

Example 100 If $V = \{0\}$, that is, if $V$ is the trivial vector subspace formed only by the origin $0$, we set $\dim V = 0$. Indeed, $V$ does not contain linearly independent vectors (why?) and, therefore, it has as basis the empty set $\emptyset$. N

3.7 Post scriptum: some high school algebra

We solve the system of equations in Example 63, i.e.,
$$\begin{cases} 2x_1 - x_2 + 2x_3 + 2x_4 = 0 \\ x_1 - x_2 - 2x_3 - 4x_4 = 0 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases}$$
through a simple high school argument. Consider $x_4$ as a known term and solve the system in $x_1$, $x_2$, and $x_3$; clearly, we will get solutions that depend on the value of the parameter $x_4$. The second equation gives $x_1 = x_2 + 2x_3 + 4x_4$. Substituting into the first equation,
$$2(x_2 + 2x_3 + 4x_4) - x_2 + 2x_3 + 2x_4 = 0 \implies x_2 = -6x_3 - 10x_4$$
and, consequently,
$$x_1 = x_2 + 2x_3 + 4x_4 = -4x_3 - 6x_4$$
Substituting these two expressions into the third equation,
$$(-4x_3 - 6x_4) - 2(-6x_3 - 10x_4) - 2x_3 - 10x_4 = 0 \implies 6x_3 + 4x_4 = 0 \implies x_3 = -\frac{2}{3}x_4$$
so that
$$x_2 = -6\left(-\frac{2}{3}x_4\right) - 10x_4 = -6x_4 \quad \text{and} \quad x_1 = -4\left(-\frac{2}{3}x_4\right) - 6x_4 = -\frac{10}{3}x_4$$
In conclusion, setting $t = x_4$, the vectors of $\mathbb{R}^4$ of the form (3.2) are the solutions of the system for every $t \in \mathbb{R}$.
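As a quick check (ours, not in the original), set $t = 3$ in (3.2): the vector $(-10, -18, -2, 3)$ satisfies all three equations, since $2(-10) - (-18) + 2(-2) + 2(3) = 0$, $(-10) - (-18) - 2(-2) - 4(3) = 0$, and $(-10) - 2(-18) - 2(-2) - 10(3) = 0$.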
Chapter 4

Euclidean structure

4.1 Absolute value and norm


4.1.1 Inner product
The operations of addition and scalar multiplication and their properties determine the linear structure of $\mathbb{R}^n$. The operation of inner product and its properties characterize, instead, the Euclidean structure of $\mathbb{R}^n$, which will be the subject matter of this chapter.
Recall from Section 2.2 that the inner product $x \cdot y$ of two vectors in $\mathbb{R}^n$ is defined by
$$x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^n x_i y_i$$
and that it is commutative, $x \cdot y = y \cdot x$, and distributive, $(\alpha x + \beta y) \cdot z = \alpha(x \cdot z) + \beta(y \cdot z)$. Note, moreover, that
$$x \cdot x = \sum_{i=1}^n x_i^2 \ge 0$$
The sum of the squares of the components of a vector is thus the inner product of the vector with itself. This simple observation will be central in this chapter because it will allow us to define the fundamental notion of norm using the inner product. In this regard, note that $x \cdot x = 0$ if and only if $x = 0$: a sum of squares is zero if and only if all addends are zero.
Before studying the norm we introduce the absolute value, which is the scalar version of the norm and is probably already familiar to the reader.

4.1.2 Absolute value

The absolute value $|x|$ of a scalar $x \in \mathbb{R}$ is
$$|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases}$$
For example, $|5| = |-5| = 5$. Geometrically, the absolute value represents the distance of a scalar from the origin. It satisfies the following elementary properties, which the reader can verify:

(i) $|x| \ge 0$ for every $x \in \mathbb{R}$;

(ii) $|x| = 0$ if and only if $x = 0$;

(iii) $|xy| = |x| \, |y|$ for every $x, y \in \mathbb{R}$;

(iv) $|x + y| \le |x| + |y|$ for every $x, y \in \mathbb{R}$.

Property (iv) is called the triangle inequality. Another basic, but important, property of the absolute value is
$$|x| < c \iff -c < x < c \qquad \forall c > 0 \tag{4.1}$$
as the reader can check.
Recall that we agreed to consider only the positive root $\sqrt{x}$ of a positive scalar $x$ (Section 1.5). For example, $\sqrt{25} = 5$. Formally, this amounts to taking
$$\sqrt{x^2} = |x| \qquad \forall x \in \mathbb{R} \tag{4.2}$$
as is easily checked.

4.1.3 Norm

The notion of norm generalizes that of absolute value to $\mathbb{R}^n$. In particular, the (Euclidean) norm of a vector $x \in \mathbb{R}^n$, denoted by $\|x\|$, is given by
$$\|x\| = (x \cdot x)^{\frac{1}{2}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
When $n = 1$, the norm reduces to the absolute value; indeed, by (4.2) we have
$$\|x\| = \sqrt{x^2} = |x| \qquad \forall x \in \mathbb{R}$$
For example, if $x = -4$ we have $\|x\| = \sqrt{(-4)^2} = \sqrt{16} = 4 = |-4| = |x|$.
Geometrically, the norm of a vector $x = (x_1, x_2)$ of the plane is the length of the segment that joins it with the origin, that is, it is the distance of the vector from the origin. Indeed, this length is, by Pythagoras' Theorem, exactly $\|x\| = \sqrt{x_1^2 + x_2^2}$.
A similar geometric interpretation holds for $n = 3$, but is obviously lost when $n \ge 4$.
Example 101 (i) If $x = (1, -1) \in \mathbb{R}^2$, then $\|x\| = \sqrt{1^2 + (-1)^2} = \sqrt{2}$.
(ii) If $x = (a, a^2) \in \mathbb{R}^2$, with $a \in \mathbb{R}$, then
$$\|x\| = \sqrt{a^2 + (a^2)^2} = \sqrt{a^2 + a^4} = |a| \sqrt{1 + a^2}$$
(iii) If $x = (a, 2a, -a) \in \mathbb{R}^3$, then $\|x\| = \sqrt{a^2 + (2a)^2 + (-a)^2} = |a| \sqrt{6}$.
(iv) If $x = (2, \pi, -\sqrt{2}, 3) \in \mathbb{R}^4$, then
$$\|x\| = \sqrt{2^2 + \pi^2 + \left(-\sqrt{2}\right)^2 + 3^2} = \sqrt{4 + \pi^2 + 2 + 9} = \sqrt{15 + \pi^2}$$
N
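These computations are easy to reproduce numerically. A small sketch of ours (NumPy assumed; the value $a = 3$ is an arbitrary choice):

```python
# Sketch: spot-check the norms of Example 101 with NumPy.
import numpy as np

print(np.linalg.norm([1, -1]))                                # sqrt(2) ~ 1.4142
a = 3.0
print(np.linalg.norm([a, a**2]), abs(a) * np.sqrt(1 + a**2))  # equal values
print(np.linalg.norm([2, np.pi, -np.sqrt(2), 3]),
      np.sqrt(15 + np.pi**2))                                 # equal values
```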

The norm satisfies some elementary properties that extend to $\mathbb{R}^n$ those of the absolute value. The next result gathers the simplest ones.

Proposition 102 Let $x, y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$. Then:

(i) $\|x\| \ge 0$;

(ii) $\|x\| = 0$ if and only if $x = 0$;

(iii) $\|\alpha x\| = |\alpha| \, \|x\|$.

Proof We prove point (ii), leaving the other points to the reader. If $x = 0 = (0, 0, \dots, 0)$, then $\|x\| = \sqrt{0 + 0 + \cdots + 0} = 0$. Vice versa, if $\|x\| = 0$ then
$$0 = \|x\|^2 = x_1^2 + x_2^2 + \cdots + x_n^2 \tag{4.3}$$
Since $x_i^2 \ge 0$ for each $i = 1, 2, \dots, n$, from (4.3) it follows that $x_i^2 = 0$ for each such $i$, since a sum of squares is zero if and only if they are all zero.

Property (iii) extends the property $|xy| = |x| \, |y|$ of the absolute value. The famous Cauchy-Schwarz inequality is a different, more subtle, extension of this property.

Proposition 103 (Cauchy-Schwarz) Let $x, y \in \mathbb{R}^n$. Then:
$$|x \cdot y| \le \|x\| \, \|y\| \tag{4.4}$$
Equality holds if and only if the vectors $x$ and $y$ are collinear.$^1$

$^1$ Recall that two vectors are collinear if they are linearly dependent (Example 69).

Proof Let $x, y \in \mathbb{R}^n$ be any two vectors. If either $x = 0$ or $y = 0$, the result is trivially true. Indeed, in this case we have $|x \cdot y| = 0 = \|x\| \, \|y\|$ and, moreover, the two vectors are trivially collinear, consistently with the fact that in (4.4) we have equality.
So, let us assume that $x$ and $y$ are both different from $0$. Note that $(x + ty) \cdot (x + ty) = \|x + ty\|^2 \ge 0$ for all $t \in \mathbb{R}$. Therefore,
$$0 \le (x + ty) \cdot (x + ty) = x \cdot x + 2t(x \cdot y) + t^2(y \cdot y) = at^2 + bt + c$$
where $a = y \cdot y$, $b = 2(x \cdot y)$ and $c = x \cdot x$. From high school algebra we know that $at^2 + bt + c \ge 0$ for all $t$ only if the discriminant $\Delta = b^2 - 4ac$ is smaller than or equal to $0$. Therefore,
$$0 \ge \Delta = b^2 - 4ac = 4(x \cdot y)^2 - 4(x \cdot x)(y \cdot y) = 4\left[(x \cdot y)^2 - \|x\|^2 \|y\|^2\right] \tag{4.5}$$
Whence
$$(x \cdot y)^2 \le \|x\|^2 \|y\|^2$$
and, by taking square roots of both sides, we obtain the Cauchy-Schwarz inequality (4.4).
It remains to prove that equality holds if and only if the vectors $x$ and $y$ are collinear. "Only if". Let us assume that (4.4) holds as an equality. Then, by (4.5), it follows that $\Delta = 0$. Thus, there exists a point $\hat{t}$ where the parabola $at^2 + bt + c$ takes the value $0$, i.e.,
$$0 = (x + \hat{t}y) \cdot (x + \hat{t}y) = \left\|x + \hat{t}y\right\|^2$$
By Proposition 102, this implies that $x + \hat{t}y = 0$, i.e., $x = -\hat{t}y$. "If". If $x$ and $y$ are collinear, then $x = -\hat{t}y$ for some $\hat{t} \in \mathbb{R}$. Then, $0 = 0 \cdot 0 = (x + \hat{t}y) \cdot (x + \hat{t}y)$. This implies that the parabola $at^2 + bt + c$, besides being always non-negative, takes the value $0$ at the point $\hat{t}$, and thus the discriminant must be zero. By (4.5), we deduce that (4.4) holds as an equality.
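The inequality is easy to illustrate numerically; in the sketch below (ours, with NumPy assumed and arbitrarily chosen vectors), equality is attained for collinear vectors:

```python
# Sketch: the Cauchy-Schwarz inequality (4.4) on sample vectors.
import numpy as np

x = np.array([1.0, 2.0, -1.0])
y = np.array([3.0, 0.0, 4.0])
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))             # True

z = -2.5 * x   # collinear with x, so (4.4) holds as an equality
print(np.isclose(abs(x @ z), np.linalg.norm(x) * np.linalg.norm(z)))   # True
```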

The Cauchy-Schwarz inequality allows us to prove the triangle inequality for the norm, thereby completing the extension to the norm of the properties (i)-(iv) of the absolute value.

Corollary 104 Let $x, y \in \mathbb{R}^n$. Then:
$$\|x + y\| \le \|x\| + \|y\| \tag{4.6}$$

Proof Squaring both sides, (4.6) becomes
$$\|x + y\|^2 \le \|x\|^2 + \|y\|^2 + 2\|x\| \, \|y\|$$
That is,
$$\sum_{i=1}^n (x_i + y_i)^2 \le \sum_{i=1}^n x_i^2 + \sum_{i=1}^n y_i^2 + 2\left(\sum_{i=1}^n x_i^2\right)^{\frac{1}{2}}\left(\sum_{i=1}^n y_i^2\right)^{\frac{1}{2}}$$
Hence, simplifying,
$$\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n x_i^2\right)^{\frac{1}{2}}\left(\sum_{i=1}^n y_i^2\right)^{\frac{1}{2}}$$
which holds thanks to the Cauchy-Schwarz inequality.


A vector with norm $1$ is called a unit vector. In the figure, the vectors $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$ and $\left(\frac{\sqrt{3}}{2}, -\frac{1}{2}\right)$ are two unit vectors in $\mathbb{R}^2$:

[Figure: two unit vectors $x$ and $y$ drawn from the origin $O$ in the plane.]

Note that, for any vector $x \neq 0$, the vector
$$v = \frac{x}{\|x\|}$$
is a unit vector: to "normalize" a vector it is enough to divide it by its own norm. Indeed, we have
$$\left\| \frac{x}{\|x\|} \right\| = \frac{1}{\|x\|} \|x\| = 1 \tag{4.7}$$
where, $\|x\|$ being a scalar, the first equality follows from Proposition 102-(iii).
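Normalization is immediate to carry out in practice; for instance (a sketch of ours, NumPy assumed):

```python
# Sketch: normalize a nonzero vector as in (4.7).
import numpy as np

x = np.array([3.0, -4.0])
v = x / np.linalg.norm(x)   # v = (0.6, -0.8)
print(np.linalg.norm(v))    # 1.0: v is a unit vector
```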
The unit vectors
$$e^1 = (1, 0, 0, \dots, 0), \quad e^2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e^n = (0, 0, \dots, 0, 1)$$
are the versors of $\mathbb{R}^n$ introduced in Chapter 3. To see their special status, note that in $\mathbb{R}^2$ they are
$$e^1 = (1, 0) \quad \text{and} \quad e^2 = (0, 1)$$
and lie on the horizontal and on the vertical axes, respectively. In particular, $e^1, e^2$ belong to the Cartesian axes of $\mathbb{R}^2$:

[Figure: the versors $\pm e^1$ and $\pm e^2$ on the Cartesian axes of $\mathbb{R}^2$.]

In $\mathbb{R}^3$ the versors are
$$e^1 = (1, 0, 0), \quad e^2 = (0, 1, 0) \quad \text{and} \quad e^3 = (0, 0, 1)$$
Also in this case, $e^1, e^2, e^3$ belong to the Cartesian axes of $\mathbb{R}^3$.

4.2 Orthogonality

Through a simple trigonometric analysis, Appendix C.3 shows that two vectors $x$ and $y$ of the plane can be regarded as perpendicular when their inner product is zero, i.e., $x \cdot y = 0$. This suggests the following definition.

Definition 105 Two vectors $x, y \in \mathbb{R}^n$ are said to be orthogonal (or perpendicular), written $x \perp y$, if
$$x \cdot y = 0$$

From the commutativity of the inner product it follows that $x \perp y$ is equivalent to $y \perp x$.

Example 106 (i) Two different versors are orthogonal. For example, for $e^1$ and $e^2$ in $\mathbb{R}^3$ we have $e^1 \cdot e^2 = (1, 0, 0) \cdot (0, 1, 0) = 0$. (ii) The vectors $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{6}}{2}\right)$ and $\left(\frac{\sqrt{3}}{2}, -\frac{1}{2}\right)$ are orthogonal:
$$\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{6}}{2}\right) \cdot \left(\frac{\sqrt{3}}{2}, -\frac{1}{2}\right) = \frac{\sqrt{6}}{4} - \frac{\sqrt{6}}{4} = 0$$
N

The next result clarifies the importance of orthogonality.

Theorem 107 (Pythagoras) Let $x, y \in \mathbb{R}^n$. If $x \perp y$, then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

Proof We have
$$\|x + y\|^2 = (x + y) \cdot (x + y) = \|x\|^2 + x \cdot y + y \cdot x + \|y\|^2 = \|x\|^2 + \|y\|^2$$
as desired.

The basic Pythagoras' Theorem is the case $n = 2$. Thanks to the notion of orthogonality, we established a general version for $\mathbb{R}^n$ of this celebrated result of Greek mathematics.
Orthogonality extends in a natural way to sets of vectors.

Definition 108 A set of vectors of $\mathbb{R}^n$ is said to be orthogonal if its elements are pairwise orthogonal vectors.

The set $\{e^1, \dots, e^n\}$ of the versors is the most classic example of an orthogonal set. Indeed, $e^i \cdot e^j = 0$ for every $1 \le i \neq j \le n$.
A remarkable property of orthogonal sets of non-zero vectors is linear independence.$^2$ This implies, inter alia, that such an orthogonal set has at most $n$ elements (cf. Corollary 89-(ii)).

Proposition 109 Any orthogonal set that does not contain the zero vector is linearly independent.

Proof Let $\{x^1, \dots, x^k\}$ be an orthogonal set of $\mathbb{R}^n$ and $\{\alpha_1, \dots, \alpha_k\}$ a set of scalars such that $\sum_{i=1}^k \alpha_i x^i = 0$. We have to show that $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. We have:
$$0 = \left(\sum_{j=1}^k \alpha_j x^j\right) \cdot \left(\sum_{i=1}^k \alpha_i x^i\right) = \sum_{j=1}^k \sum_{i=1}^k \alpha_j \alpha_i \left(x^j \cdot x^i\right) = \sum_{i=1}^k \alpha_i^2 \left\|x^i\right\|^2$$
where the last equality uses the hypothesis that the vectors are pairwise orthogonal, i.e., $x^i \cdot x^j = 0$ for every $i \neq j$. Since none of the vectors $x^i$ is zero, we have $\|x^i\|^2 > 0$ for every $i = 1, 2, \dots, k$. From $0 = \sum_{i=1}^k \alpha_i^2 \|x^i\|^2$, it then follows that $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, as desired.

An orthogonal set composed of unit vectors is called orthonormal. The set $\{e^1, \dots, e^n\}$ is, for example, orthonormal. In general, given an orthogonal set $\{x^1, \dots, x^k\}$ of non-zero vectors of $\mathbb{R}^n$, the set
$$\left\{ \frac{x^1}{\|x^1\|}, \dots, \frac{x^k}{\|x^k\|} \right\}$$
obtained by dividing each element by its norm is orthonormal. Indeed, by (4.7) each vector $x^i / \|x^i\|$ has norm $1$ (so it is a unit vector), and for every $i \neq j$ we have
$$\frac{x^i}{\|x^i\|} \cdot \frac{x^j}{\|x^j\|} = \frac{1}{\|x^i\| \, \|x^j\|} \left(x^i \cdot x^j\right) = 0$$

$^2$ In reading this result, recall that a set of vectors containing the zero vector is necessarily linearly dependent (see Example 68).

Example 110 Consider the following three orthogonal vectors in $\mathbb{R}^3$:
$$x^1 = (1, 1, 1), \quad x^2 = (-2, 1, 1), \quad x^3 = (0, 1, -1)$$
Then
$$\|x^1\| = \sqrt{3}, \quad \|x^2\| = \sqrt{6}, \quad \|x^3\| = \sqrt{2}$$
By dividing each vector by its norm, we get the orthonormal vectors
$$\frac{x^1}{\|x^1\|} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \quad \frac{x^2}{\|x^2\|} = \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right), \quad \frac{x^3}{\|x^3\|} = \left(0, \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$$
In particular, these three vectors form an orthonormal basis. N

The orthonormal bases of $\mathbb{R}^n$, in primis the standard basis $\{e^1, \dots, e^n\}$, are the most important bases of $\mathbb{R}^n$ because for them it is easy to determine the coefficients of the linear combinations that represent the vectors of $\mathbb{R}^n$, as the next result shows.

Proposition 111 Let $\{x^1, \dots, x^n\}$ be an orthonormal basis of $\mathbb{R}^n$. For every $y \in \mathbb{R}^n$, we have
$$y = (y \cdot x^1)x^1 + (y \cdot x^2)x^2 + \cdots + (y \cdot x^n)x^n = \sum_{i=1}^n (y \cdot x^i)x^i \tag{4.8}$$

The coefficients $y \cdot x^i$ are called Fourier coefficients of $y$ (with respect to the given orthonormal basis).

Proof Since $\{x^1, \dots, x^n\}$ is a basis, there exist $n$ scalars $\alpha_1, \alpha_2, \dots, \alpha_n$ such that
$$y = \sum_{i=1}^n \alpha_i x^i$$
For $j = 1, 2, \dots, n$ the inner product $y \cdot x^j$ is
$$y \cdot x^j = \sum_{i=1}^n \alpha_i \left(x^i \cdot x^j\right)$$
Since $\{x^1, \dots, x^n\}$ is orthonormal, we have
$$x^i \cdot x^j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
Hence $y \cdot x^j = \alpha_j$, from which the statement follows.



With respect to the standard basis $\{e^1, \dots, e^n\}$, each vector $y = (y_1, \dots, y_n) \in \mathbb{R}^n$ has the Fourier coefficients $y \cdot e^i = y_i$. In this case, (4.8) thus reduces to (3.4), i.e., to
$$y = \sum_{i=1}^n y_i e^i$$
This way of writing vectors, which plays a key role in many results, is a special case of the general expression (4.8). In other words, the components of a vector $y$ are its Fourier coefficients with respect to the standard basis.
For a change, the next example considers an orthonormal basis different from the standard one.

Example 112 Consider the orthonormal basis of $\mathbb{R}^3$ of Example 110, i.e.,
$$x^1 = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \quad x^2 = \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right), \quad x^3 = \left(0, \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$$
Consider, for example, the vector $y = (2, 3, 4)$. Since
$$x^1 \cdot y = \frac{9}{\sqrt{3}}, \quad x^2 \cdot y = \frac{3}{\sqrt{6}}, \quad x^3 \cdot y = -\frac{1}{\sqrt{2}}$$
we have
$$y = (x^1 \cdot y)x^1 + (x^2 \cdot y)x^2 + (x^3 \cdot y)x^3 = \frac{9}{\sqrt{3}}\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right) + \frac{3}{\sqrt{6}}\left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right) - \frac{1}{\sqrt{2}}\left(0, \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$$
Thus, $9/\sqrt{3}$, $3/\sqrt{6}$, $-1/\sqrt{2}$ are the Fourier coefficients of $y = (2, 3, 4)$ with respect to this orthonormal basis of $\mathbb{R}^3$. N
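The Fourier coefficients of formula (4.8) can be computed mechanically, as in the following sketch of ours (NumPy assumed; the basis is that of Example 110):

```python
# Sketch: Fourier coefficients of y = (2,3,4) in the orthonormal basis
# of Example 112, and reconstruction of y via (4.8).
import numpy as np

s3, s6, s2 = np.sqrt(3), np.sqrt(6), np.sqrt(2)
x1 = np.array([1/s3, 1/s3, 1/s3])
x2 = np.array([-2/s6, 1/s6, 1/s6])
x3 = np.array([0.0, 1/s2, -1/s2])

y = np.array([2.0, 3.0, 4.0])
coeffs = [y @ b for b in (x1, x2, x3)]
print(coeffs)                                      # [9/sqrt(3), 3/sqrt(6), -1/sqrt(2)]
print(coeffs[0]*x1 + coeffs[1]*x2 + coeffs[2]*x3)  # recovers (2, 3, 4)
```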

We close by showing that Pythagoras' Theorem extends to orthogonal sets of vectors.

Proposition 113 For an orthogonal set $\{x^1, \dots, x^k\}$ of vectors of $\mathbb{R}^n$ we have
$$\left\|\sum_{i=1}^k x^i\right\|^2 = \sum_{i=1}^k \left\|x^i\right\|^2$$

Proof We proceed by induction. By Pythagoras' Theorem, the result holds for $k = 2$. Now, assume that it holds for $k - 1$ (induction hypothesis), i.e.,
$$\left\|\sum_{i=1}^{k-1} x^i\right\|^2 = \sum_{i=1}^{k-1} \left\|x^i\right\|^2 \tag{4.9}$$
We show that this implies that it holds for $k$. Observe that, setting $y = \sum_{i=1}^{k-1} x^i$, we have $y \perp x^k$. Indeed,
$$y \cdot x^k = \left(\sum_{i=1}^{k-1} x^i\right) \cdot x^k = \sum_{i=1}^{k-1} x^i \cdot x^k = 0$$
By Pythagoras' Theorem and (4.9), we have
$$\left\|\sum_{i=1}^k x^i\right\|^2 = \left\|\sum_{i=1}^{k-1} x^i + x^k\right\|^2 = \left\|y + x^k\right\|^2 = \|y\|^2 + \left\|x^k\right\|^2 = \left\|\sum_{i=1}^{k-1} x^i\right\|^2 + \left\|x^k\right\|^2 = \sum_{i=1}^{k-1} \left\|x^i\right\|^2 + \left\|x^k\right\|^2 = \sum_{i=1}^k \left\|x^i\right\|^2$$
as desired.
Chapter 5

Topological structure

In this chapter we introduce the fundamental notion of distance between points of $\mathbb{R}^n$ which, by formalizing the notion of "proximity", endows $\mathbb{R}^n$ with a topological structure.

5.1 Distances

The norm, studied in Section 4.1, allows us to define a distance in $\mathbb{R}^n$. We start with $n = 1$, when the norm is simply the absolute value $|x|$. Consider two points $x$ and $y$ on the real line, with $x > y$. The distance between the two points is $x - y$, which is the length of the segment that joins them. On the other hand, if we take any two points $x$ and $y$ on the real line, without knowing their order (i.e., whether $x \ge y$ or $x \le y$), the distance becomes
$$|x - y|$$
which is the absolute value of their difference. Indeed,
$$|x - y| = \begin{cases} x - y & \text{if } x \ge y \\ y - x & \text{if } x < y \end{cases}$$
and so the absolute value of the difference represents the distance between the two points, independently of their order. In symbols, we write
$$d(x, y) = |x - y| \qquad \forall x, y \in \mathbb{R}$$
In particular, $d(0, x) = |x|$ and therefore the absolute value, i.e., the norm, of a point $x \in \mathbb{R}$ can be regarded as its distance from the origin.
Let us now consider $n = 2$. Take two vectors $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in the plane.

The distance between $x$ and $y$ is given by the length of the segment that joins them. By Pythagoras' Theorem, this distance is
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} \tag{5.1}$$
since it is the hypotenuse of the right triangle whose catheti are the segments that join $x_i$ and $y_i$ for $i = 1, 2$.
The distance (5.1) is nothing but the norm of the vector $x - y$ (and also of $y - x$), i.e.,
$$d(x, y) = \|x - y\|$$
The distance between two vectors in $\mathbb{R}^2$ is, therefore, given by the norm of their difference.
It is easy to see that, by applying again Pythagoras' Theorem, the distance between two vectors $x$ and $y$ in $\mathbb{R}^3$ is given by
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$$
Therefore, we have again
$$d(x, y) = \|x - y\|$$
At this point we can generalize the notion of distance to any dimension $n$.

Definition 114 The (Euclidean) distance $d(x, y)$ between two vectors $x$ and $y$ in $\mathbb{R}^n$ is the norm of their difference, i.e., $d(x, y) = \|x - y\|$.

In particular, $d(x, 0) = \|x\|$: the norm of a vector $x \in \mathbb{R}^n$ can be regarded as its distance from the vector $0$ (i.e., the length of the segment that joins $0$ and $x$).
The following proposition collects the basic properties of the distance (we leave the simple proof to the reader).

Proposition 115 Let $x, y \in \mathbb{R}^n$. Then:

(i) $d(x, y) \ge 0$;

(ii) $d(x, y) = 0$ if and only if $x = y$;

(iii) $d(x, y) = d(y, x)$;

(iv) $d(x, y) \le d(x, z) + d(z, y)$ for every $z \in \mathbb{R}^n$.

Properties (i)-(iv) are natural for a notion of distance. Property (i) says that a distance is always a non-negative quantity, which by (ii) is zero only between vectors that are equal (so, the distance between distinct vectors is always strictly positive). Property (iii) says that distance is a symmetric notion: in measuring the distance between two vectors, it does not matter from which vector we take the measurement. Finally, property (iv) is the so-called triangle inequality: for example, the distance between cities $x$ and $y$ cannot exceed the sum of the distances between $x$ and any other city $z$ and between $z$ and $y$: detours cannot reduce the distance one needs to cover.

Example 116 (i) If $x = 1/3$ and $y = -1/3$, then
$$d(x, y) = \left|\frac{1}{3} - \left(-\frac{1}{3}\right)\right| = \left|\frac{2}{3}\right| = \frac{2}{3}$$
(ii) If $x = a$ and $y = a^2$ with $a \in \mathbb{R}$, then $d(x, y) = d(a, a^2) = |a - a^2| = |a| \, |1 - a|$.
(iii) If $x = (1, -3)$ and $y = (3, -1)$, then
$$d(x, y) = \sqrt{(1 - 3)^2 + (-3 - (-1))^2} = \sqrt{8} = 2\sqrt{2}$$
(iv) If $x = (a, b)$ and $y = (-a, b)$, then
$$d(x, y) = \sqrt{(a - (-a))^2 + (b - b)^2} = \sqrt{(2a)^2 + 0} = \sqrt{4a^2} = 2|a|$$
(v) If $x = (0, a, 0)$ and $y = (1, 0, -a)$, then
$$d(x, y) = \sqrt{(0 - 1)^2 + (a - 0)^2 + (0 - (-a))^2} = \sqrt{1 + 2a^2}$$
N

5.2 Neighborhoods

Definition 117 We call neighborhood of center $x_0 \in \mathbb{R}^n$ and radius $\varepsilon > 0$, denoted by $B_\varepsilon(x_0)$, the set
$$B_\varepsilon(x_0) = \{x \in \mathbb{R}^n : d(x, x_0) < \varepsilon\}$$

The neighborhood $B_\varepsilon(x_0)$ is, therefore, the locus of the points of $\mathbb{R}^n$ that lie at distance strictly smaller than $\varepsilon$ from $x_0$.$^1$

$^1$ In the mathematical jargon, they are said to be "$\varepsilon$-close" to $x_0$.

In $\mathbb{R}$ the neighborhoods are the open intervals $(x_0 - \varepsilon, x_0 + \varepsilon)$, i.e.,
$$B_\varepsilon(x_0) = (x_0 - \varepsilon, x_0 + \varepsilon)$$
Indeed,
$$\{x \in \mathbb{R} : d(x, x_0) < \varepsilon\} = \{x \in \mathbb{R} : |x - x_0| < \varepsilon\} = \{x \in \mathbb{R} : -\varepsilon < x - x_0 < \varepsilon\} = (x_0 - \varepsilon, x_0 + \varepsilon)$$
where we have used (4.1), i.e., $|x| < c \iff -c < x < c$.
Hence in $\mathbb{R}$ the neighborhoods are open intervals. It is easily seen that in $\mathbb{R}^2$ they are open discs (so, without the circumference), in $\mathbb{R}^3$ open balls (so, without the spherical surface), and so on. Indeed, the points that lie at a distance strictly less than $\varepsilon$ from $x_0$ form an open, so "skinless", ball of center $x_0$. Graphically, in the plane we have:
[Figure: a neighborhood $B_\varepsilon(x_0)$ in the plane: an open disc of center $x_0$ and radius $\varepsilon$.]

Next we give some examples of neighborhoods. To ease notation, we write $B_\varepsilon(x_1, \dots, x_n)$ instead of $B_\varepsilon((x_1, \dots, x_n))$.

Example 118 (i) We have $B_3(-1) = (-1 - 3, -1 + 3) = (-4, 2)$, as well as
$$B_{\frac{3}{2}}(1) = \left(1 - \frac{3}{2}, 1 + \frac{3}{2}\right) = \left(-\frac{1}{2}, \frac{5}{2}\right)$$
(ii) The notations $B_{-1}(0)$ and $B_0(1)$ are meaningless because we need $\varepsilon > 0$.
(iii) We have
$$B_3(0, 0) = B_3(0) = \{x \in \mathbb{R}^2 : d(x, 0) < 3\} = \left\{x \in \mathbb{R}^2 : \sqrt{x_1^2 + x_2^2} < 3\right\} = \{x \in \mathbb{R}^2 : x_1^2 + x_2^2 < 9\}$$
(iv) We have
$$B_1(1, 1, 1) = \{x \in \mathbb{R}^3 : d(x, (1, 1, 1)) < 1\} = \left\{x \in \mathbb{R}^3 : \sqrt{(x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2} < 1\right\} = \{x \in \mathbb{R}^3 : (x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2 < 1\}$$
For example, $(1/2, 1/2, 1/2) \in B_1(1, 1, 1)$. Indeed,
$$\left(\frac{1}{2} - 1\right)^2 + \left(\frac{1}{2} - 1\right)^2 + \left(\frac{1}{2} - 1\right)^2 = \frac{3}{4} < 1$$
Check that, instead, $0 = (0, 0, 0) \notin B_1(1, 1, 1)$. N
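Membership in a neighborhood reduces to one distance computation, as in this small sketch of ours (NumPy assumed; the helper `in_ball` is a hypothetical name):

```python
# Sketch: check membership in B_1(1,1,1) from Example 118-(iv).
import numpy as np

def in_ball(x, center, eps):
    return np.linalg.norm(np.asarray(x, float) - np.asarray(center, float)) < eps

print(in_ball([0.5, 0.5, 0.5], [1, 1, 1], 1))   # True:  d = sqrt(3)/2 < 1
print(in_ball([0, 0, 0],       [1, 1, 1], 1))   # False: d = sqrt(3) > 1
```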

N.B. Each point $x_0$ of $\mathbb{R}^n$ has infinitely many neighborhoods $B_\varepsilon(x_0)$, one for each value of the radius $\varepsilon > 0$. O

Sometimes we will use, though only on the real line, "half neighborhoods" of a point $x_0$. Specifically:

Definition 119 Given $\varepsilon > 0$, the interval $[x_0, x_0 + \varepsilon)$ is called the right neighborhood of $x_0 \in \mathbb{R}$ of radius $\varepsilon$, while the interval $(x_0 - \varepsilon, x_0]$ is called the left neighborhood of $x_0$ of radius $\varepsilon$.

Through them we can give a useful characterization of suprema and infima of subsets of the real line (Section 1.4.2).

Proposition 120 Let $A \subseteq \mathbb{R}$. We have $a = \sup A$ if and only if

(i) $a \ge x$ for every $x \in A$;

(ii) for every $\varepsilon > 0$, there exists $x \in A$ such that $x > a - \varepsilon$.

Thus, a point $a \in \mathbb{R}$ is the supremum of $A \subseteq \mathbb{R}$ if and only if (i) it is an upper bound of $A$ and (ii) each left neighborhood of $a$ contains elements of $A$. A similar characterization holds for infima, with left neighborhoods replaced by right ones.

Proof "Only if". If $a = \sup A$, (i) is obviously satisfied. Let $\varepsilon > 0$. Since $\sup A > a - \varepsilon$, the point $a - \varepsilon$ is not an upper bound of $A$. Therefore, there exists $x \in A$ such that $x > a - \varepsilon$.
"If". Suppose that $a \in \mathbb{R}$ satisfies (i) and (ii). By (i), $a$ is an upper bound of $A$. By (ii), it is also the least upper bound. Indeed, each $b < a$ can be written as $b = a - \varepsilon$ by setting $\varepsilon = a - b > 0$. Given $b < a$, by (ii) there exists $x \in A$ such that $x > a - \varepsilon = b$. Therefore, $b$ is not an upper bound of $A$, which implies that there is no upper bound smaller than $a$.

5.3 Taxonomy of the points of $\mathbb{R}^n$ with respect to a set

The notion of neighborhood permits us to classify the points of $\mathbb{R}^n$ in various categories, according to their relations with a given set $A \subseteq \mathbb{R}^n$.

5.3.1 Interior, exterior and boundary points

The first fundamental notion is that of interior point. Intuitively, a point is interior to a set if it is "well inside" the set, i.e., if it is surrounded by other points that belong to the set (so, from an interior point one can always move in any direction while remaining, at least for a while, in the set).

Definition 121 Let $A$ be a set in $\mathbb{R}^n$. A point $x_0 \in A$ is an interior point of $A$ if there exists $\varepsilon > 0$ such that $B_\varepsilon(x_0) \subseteq A$.

In words, $x_0$ is an interior point of $A$ if there exists at least one neighborhood of $x_0$ completely contained in $A$. This motivates the adjective "interior". An interior point $x$ of $A$ is, therefore, contained in $A$ together with an entire neighborhood $B_\varepsilon(x)$, however small. Thus, we can say that it belongs to $A$ both in a set-theoretic sense, $x \in A$, and in a topological sense, $B_\varepsilon(x) \subseteq A$.
In a dual way, a point $x_0 \in \mathbb{R}^n$ is called exterior to $A$ if it is interior to the complement $A^c$ of $A$, i.e., if there exists $\varepsilon > 0$ such that $B_\varepsilon(x_0)$ is contained in $A^c$ (so that $B_\varepsilon(x_0) \cap A = \emptyset$). A point that is exterior to a set is thus "well outside" it.
The set of the interior points of $A$ is called the interior of $A$ and is denoted by $\operatorname{int} A$. By definition, $\operatorname{int} A \subseteq A$. The set of the exterior points of $A$ is then $\operatorname{int} A^c$.

Example 122 Let $A = (0, 1)$. Each point of $A$ is interior, that is, $\operatorname{int} A = A$. Indeed, let $x \in (0, 1)$. Consider the smallest distance of $x$ from the two endpoints $0$ and $1$ of the interval, i.e., $\min\{d(0, x), d(1, x)\}$. Let $\varepsilon > 0$ be such that $\varepsilon < \min\{d(0, x), d(1, x)\}$. Then
$$B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon) \subseteq (0, 1)$$
Therefore, $x$ is an interior point of $A$. Since $x$ was arbitrarily chosen, it follows that $\operatorname{int} A = A$. Finally, the set of exterior points is $\operatorname{int} A^c = (-\infty, 0) \cup (1, +\infty)$. N

Example 123 Let $A = [0, 1]$. We have $\operatorname{int} A = (0, 1)$. Indeed, by proceeding as above we see that the points in $(0, 1)$ are all interior, that is, $(0, 1) \subseteq \operatorname{int} A$. It remains to check the endpoints $0$ and $1$. Consider $0$. Its neighborhoods have the form $(-\varepsilon, \varepsilon)$, so they also contain points of $A^c$. It follows that $0 \notin \operatorname{int} A$. Similarly, $1 \notin \operatorname{int} A$. We conclude that $\operatorname{int} A = (0, 1)$. The set of the exterior points is $A^c$, i.e., $\operatorname{int} A^c = A^c$ (as the reader can easily verify). N

Definition 124 Let $A$ be a set in $\mathbb{R}^n$. A point $x_0 \in \mathbb{R}^n$ is a boundary point of $A$ if it is neither interior nor exterior, i.e., if for every $\varepsilon > 0$ both $B_\varepsilon(x_0) \cap A \neq \emptyset$ and $B_\varepsilon(x_0) \cap A^c \neq \emptyset$.

A point $x_0$ is, therefore, a boundary point of $A$ if all its neighborhoods contain both points of $A$ (because it is not exterior) and points of $A^c$ (because it is not interior). The set of the boundary points of a set $A$ is called the boundary (or frontier) of $A$ and is denoted by $\partial A$. Intuitively, the frontier is the "border" of a set.
The definition of boundary points is residual: a point is a boundary point if it is neither interior nor exterior. This implies that the classification into interior, exterior, and boundary points is exhaustive: given a set $A$, each point $x_0$ of $\mathbb{R}^n$ necessarily falls into one of these three categories. The classification is also exclusive: given a set $A$, each point $x_0$ of $\mathbb{R}^n$ is either interior or exterior or a boundary point.

Example 125 (i) Let $A = (0, 1)$. Given the residual nature of the definition of boundary points, to determine $\partial A$ we need to find the interior and exterior points. From Example 122, we know that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = (-\infty, 0) \cup (1, +\infty)$. It follows that
$$\partial A = \{0, 1\}$$
i.e., the boundary of $(0, 1)$ is formed by the two endpoints $0$ and $1$. Note that $A \cap \partial A = \emptyset$: in this example the boundary points do not belong to the set $A$.
(ii) Let $A = [0, 1]$. In Example 123 we have seen that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = A^c$. Therefore, $\partial A = \{0, 1\}$. Here $\partial A \subseteq A$: the set $A$ contains its own boundary points.
(iii) Let $A = (0, 1]$. The reader can verify that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = (-\infty, 0) \cup (1, +\infty)$. Hence, $\partial A = \{0, 1\}$. In this example, the frontier is partly outside and partly inside the set: the boundary point $1$ belongs to $A$, while the boundary point $0$ does not. N

Example 126 Consider the closed unit ball
$$A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1\}$$
All the points such that $x_1^2 + x_2^2 < 1$ are interior, that is,
$$\operatorname{int} A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 < 1\}$$
while all the points such that $x_1^2 + x_2^2 > 1$ are exterior, that is,
$$\operatorname{int} A^c = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 > 1\}$$
Therefore, the unit circle is the frontier of $A$:
$$\partial A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$$
The set $A$ contains all its own boundary points. N

Example 127 Let $A = \mathbb{Q}$ be the set of rational numbers, so that $A^c$ is the set of the irrational numbers. By Propositions 18 and 39, between any two rational numbers $q < q'$ there exists an irrational number $a$ such that $q < a < q'$, and between any two irrational numbers $a < b$ there exists a rational number $q \in \mathbb{Q}$ such that $a < q < b$. The reader can check that this implies $\operatorname{int} A = \operatorname{int} A^c = \emptyset$, and so $\partial A = \mathbb{R}$. This example shows that the interpretation of the boundary as a "border" can be misleading in some cases. Indeed, mathematical notions have a life of their own and we must be ready to follow them also when our intuition may fall short. N

The next lemma generalizes what we saw in Example 125.

Lemma 128 Let $A \subseteq \mathbb{R}$ be a bounded set. Then $\sup A \in \partial A$ and $\inf A \in \partial A$.

Proof We prove that $\alpha = \sup A \in \partial A$ (the proof for the infimum is similar). Consider any neighborhood $(\alpha - \varepsilon, \alpha + \varepsilon)$ of $\alpha$. We have $(\alpha, \alpha + \varepsilon) \subseteq A^c$, so $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \neq \emptyset$. Moreover, by Proposition 120, for every $\varepsilon > 0$ there exists $x_0 \in A$ such that $x_0 > \alpha - \varepsilon$, so that $(\alpha - \varepsilon, \alpha] \cap A \neq \emptyset$. Thus, $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \neq \emptyset$. We conclude that, for every $\varepsilon > 0$, we have both $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \neq \emptyset$ and $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \neq \emptyset$, that is, $\alpha \in \partial A$.

Next we identify an important class of boundary points.

Definition 129 Let $A$ be a set in $\mathbb{R}^n$. A point $x_0 \in A$ is isolated if there exists a neighborhood $B_\varepsilon(x_0)$ of $x_0$ that does not contain other points of $A$ besides $x_0$ itself, i.e., $A \cap B_\varepsilon(x_0) = \{x_0\}$.

As the terminology suggests, isolated points are "separated" from the rest of the set.

Example 130 Let $A = [0, 1] \cup \{2\}$. It consists of the closed unit interval and, in addition, of the point $2$. This point is isolated. Indeed, if $B_\varepsilon(2)$ is a neighborhood of $2$ with $\varepsilon < 1$, then $A \cap B_\varepsilon(2) = \{2\}$. N

As anticipated, we have:

Lemma 131 Isolated points are boundary points.

Proof Let $x_0$ be an isolated point of $A$. Since $x_0$ belongs to each of its neighborhoods, we have $B_\varepsilon(x_0) \cap A \neq \emptyset$ for every $\varepsilon > 0$. It remains to prove that $B_\varepsilon(x_0) \cap A^c \neq \emptyset$ for every $\varepsilon > 0$. Let $\varepsilon > 0$. Since $x_0$ is an isolated point of $A$, there exists $\varepsilon' > 0$ such that $B_{\varepsilon'}(x_0) \setminus \{x_0\} \subseteq A^c$. Let $\delta = \min\{\varepsilon, \varepsilon'\}$. We have $B_\delta(x_0) \setminus \{x_0\} \subseteq B_{\varepsilon'}(x_0) \setminus \{x_0\} \subseteq A^c$ and $B_\delta(x_0) \setminus \{x_0\} \subseteq B_\varepsilon(x_0) \setminus \{x_0\}$. Let $y \in B_\delta(x_0) \setminus \{x_0\}$. By what we have seen, $y \in A^c$ and $y \in B_\varepsilon(x_0) \setminus \{x_0\}$, so $y \in A^c \cap B_\varepsilon(x_0)$. It follows that $B_\varepsilon(x_0) \cap A^c \neq \emptyset$. Hence, for every $\varepsilon > 0$, we have both $B_\varepsilon(x_0) \cap A \neq \emptyset$ and $B_\varepsilon(x_0) \cap A^c \neq \emptyset$, that is, $x_0$ is a boundary point of $A$.

5.3.2 Limit points


Definition 132 Let $A$ be a set in $\mathbb{R}^n$. A point $x_0 \in \mathbb{R}^n$ is called a limit (or accumulation) point of $A$ if each neighborhood $B_\varepsilon(x_0)$ of $x_0$ contains at least one point of $A$ distinct from $x_0$.

Hence, $x_0$ is a limit point of $A$ if, for every $\varepsilon > 0$, there exists some $x \in A$ such that $0 < \|x_0 - x\| < \varepsilon$.² The set of limit points of $A$ is denoted by $A'$ and is called the derived set of $A$. Note that limit points are not required to belong to the set.

Clearly, limit points are never exterior. Moreover:


²The inequality $0 < \|x_0 - x\|$ is equivalent to the condition $x \neq x_0$. So, this inequality is a way to require that $x$ is a point of $A$ distinct from $x_0$.

Lemma 133 Let $A$ be a set in $\mathbb{R}^n$.

(i) Each interior point of $A$ is a limit point, that is, $\operatorname{int} A \subseteq A'$.

(ii) A boundary point of $A$ is a limit point if and only if it is not isolated.

Proof (i) If $x_0 \in \operatorname{int} A$, there exists a neighborhood $B_{\varepsilon_0}(x_0)$ of $x_0$ such that $B_{\varepsilon_0}(x_0) \subseteq A$. Let $B_\varepsilon(x_0)$ be any neighborhood of $x_0$. The intersection
$$B_{\varepsilon_0}(x_0) \cap B_\varepsilon(x_0) = B_{\min\{\varepsilon_0, \varepsilon\}}(x_0)$$
is, in turn, a neighborhood of $x_0$ of radius $\min\{\varepsilon_0, \varepsilon\} > 0$. Hence $B_{\min\{\varepsilon_0,\varepsilon\}}(x_0) \subseteq A$ and, to complete the proof, it is sufficient to consider any $x \in B_{\min\{\varepsilon_0,\varepsilon\}}(x_0)$ such that $x \neq x_0$. Indeed, $x$ belongs also to the neighborhood $B_\varepsilon(x_0)$ and is distinct from $x_0$.

(ii) "If". Consider a boundary point $x_0$ which is not an isolated point. By the definition of boundary points, for every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \neq \emptyset$. Since $x_0$ is not isolated, for every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \neq \{x_0\}$. This implies that for every $\varepsilon > 0$ we have $(B_\varepsilon(x_0) \setminus \{x_0\}) \cap A \neq \emptyset$, i.e., that $x_0$ is a limit point of $A$.

"Only if". Take a point $x_0$ that is both a boundary point and a limit point, i.e., $x_0 \in \partial A \cap A'$. Each neighborhood $B_\varepsilon(x_0)$ contains at least one point $x \in A$ distinct from $x_0$, that is, $B_\varepsilon(x_0) \cap A \neq \{x_0\}$. It follows that $x_0$ is not isolated.

In view of this result, we can say that the set $A'$ of the limit points consists of the interior points of $A$ as well as of the boundary points of $A$ that are not isolated. Therefore, a point of a set $A$ is either a limit point or an isolated point, tertium non datur.

Example 134 (i) The points of the interval $A = [0,1)$ are all limit points, and so is $1$; i.e., $A' = [0,1]$. Note that the limit point $1$ does not belong to $A$. (ii) The points of the closed unit ball $A = \{(x_1,x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1\}$ are all limit points, i.e., $A = A'$. N
Example 135 The set $A = \{(x_1,x_2) \in \mathbb{R}^2 : x_1 + x_2 = 1\}$ is a straight line in the plane. We have $\operatorname{int} A = \emptyset$ and $\partial A = A' = A$. Hence, the set $A$ has no interior points (as the next figure shows, if one draws a disc around a point of $A$, however small it may be, there is no way to include it all in $A$), while all its points are both limit and boundary points.

[Figure: the line $x_1 + x_2 = 1$ in the $(x_1, x_2)$ plane]

The definition of limit point requires that its neighborhoods contain at least one point of $A$ other than the point itself. As we show next, they actually contain infinitely many of them.

Proposition 136 Each neighborhood of a limit point of $A$ contains infinitely many points of $A$.

Proof Let $x$ be a limit point of $A$. Suppose, by contradiction, that there exists a neighborhood $B_\varepsilon(x)$ of $x$ containing only a finite number of points $\{x_1, \ldots, x_n\}$ of $A$ distinct from $x$. Since the set $\{x_1, \ldots, x_n\}$ is finite, the minimum distance $\min_{i=1,\ldots,n} d(x, x_i)$ exists and is strictly positive, i.e.,
$$\min_{i=1,\ldots,n} d(x, x_i) > 0$$
Let $\delta > 0$ be such that $\delta < \min_{i=1,\ldots,n} d(x, x_i)$. Clearly, $0 < \delta < \varepsilon$ since $\delta < \min_{i=1,\ldots,n} d(x, x_i) < \varepsilon$. Hence, $B_\delta(x) \subseteq B_\varepsilon(x)$. It is also clear, by construction, that $x_i \notin B_\delta(x)$ for each $i = 1, 2, \ldots, n$. So, if $x \in A$ we have $B_\delta(x) \cap A = \{x\}$. Instead, if $x \notin A$ we have $B_\delta(x) \cap A = \emptyset$. Regardless of whether $x$ belongs to $A$ or not, we thus have $B_\delta(x) \cap A \subseteq \{x\}$. Therefore, the unique point of $A$ that $B_\delta(x)$ may contain is $x$ itself. But this contradicts the hypothesis that $x$ is a limit point of $A$.

O.R. The concept of interior point of a set $A$ requires the existence of a neighborhood of the point that is entirely formed by points of $A$. This means that it is possible to move away, at least a bit, from the point while remaining inside $A$ – i.e., it is possible to go for a "little walk" in any direction without showing the passport. Retracing one's steps, it is then possible to approach the point from any direction while remaining inside $A$.
The concept of limit point of a set $A$ does not require the point to belong to $A$ but requires, instead, that we can get as close as we want to the point by "jumping" on points of the set (by jumping on river stones, we can get as close as we want to our target through stones that all belong to the set). This idea of approaching a point while remaining within a given set will be crucial to define limits of functions. H
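The "river stones" image can be made concrete with a small Python sketch of ours (not part of the text): $0$ is a limit point of $A = \{1/n : n \ge 1\}$ even though $0 \notin A$, since every neighborhood $B_\varepsilon(0)$ contains a "stone" of $A$.

    # For each eps > 0, pick n > 1/eps: then the point 1/n of A lies in B_eps(0).
    for eps in [0.5, 0.1, 0.01, 0.001]:
        n = int(1 / eps) + 1          # guarantees 0 < 1/n < eps
        stone = 1 / n                 # a "river stone" of A inside B_eps(0)
        print(f"eps={eps}: 1/{n} = {stone:.6f} lies in (0, eps)")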

5.4 Open and closed sets


We now introduce the fundamental notions of open and closed sets. We begin with open sets.

Definition 137 A set $A$ in $\mathbb{R}^n$ is called open if all its points are interior, that is, if $\operatorname{int} A = A$.

Thus, a set is open if it does not contain its borders (so it is skinless).

Example 138 The open interval $(a,b)$ is open (whence the name). Indeed, let $x \in (a,b)$ and let $\varepsilon > 0$ be such that
$$\varepsilon < \min\{d(x,a), d(x,b)\}$$
We have $B_\varepsilon(x) \subseteq (a,b)$, so $x$ is an interior point of $(a,b)$. Since $x$ was arbitrarily chosen, it follows that $(a,b)$ is open. N
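The argument of the example is easy to check numerically. In the Python sketch below (ours; the function name is an assumption for illustration), we compute a radius as in the example and verify the inclusion $B_\varepsilon(x) \subseteq (a,b)$.

    def interior_radius(x, a, b):
        # any eps with 0 < eps < min{d(x,a), d(x,b)} works, as in Example 138
        assert a < x < b
        return 0.5 * min(x - a, b - x)

    a, b = 0.0, 1.0
    for x in [0.1, 0.5, 0.999]:
        eps = interior_radius(x, a, b)
        # the ball B_eps(x) = (x - eps, x + eps) stays inside (a, b)
        print(a < x - eps and x + eps < b)  # True, True, True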

Example 139 The set $\{x \in \mathbb{R}^2 : 0 < x_1^2 + x_2^2 < 1\}$ is open. Graphically, it is the unit ball deprived of both its skin and the origin:

[Figure: the open unit ball without the origin]

Given that the neighborhoods in $\mathbb{R}$ are all of the type $(a,b)$, they are all open. The next result shows that, more generally, neighborhoods are open in $\mathbb{R}^n$.

Lemma 140 Neighborhoods are open sets.

Proof Let $B_\varepsilon(x_0)$ be a neighborhood of a point $x_0 \in \mathbb{R}^n$. To show that $B_\varepsilon(x_0)$ is open, we have to show that all its points are interior. Let $x \in B_\varepsilon(x_0)$. To prove that $x$ is interior to $B_\varepsilon(x_0)$, let
$$0 < \varepsilon' < \varepsilon - d(x, x_0) \qquad (5.2)$$
Then $B_{\varepsilon'}(x) \subseteq B_\varepsilon(x_0)$. Indeed, let $y \in B_{\varepsilon'}(x)$. Then
$$d(y, x_0) \le d(y, x) + d(x, x_0) < \varepsilon' + d(x, x_0) < \varepsilon$$
where the last inequality follows from (5.2). Therefore $B_{\varepsilon'}(x) \subseteq B_\varepsilon(x_0)$, which completes the proof.

This proof can be illustrated by the following picture:

[Figure: a small ball $B_{\varepsilon'}(x)$ contained in the ball $B_\varepsilon(x_0)$]

Definition 141 The set
$$A \cup \partial A$$
formed by the points of $A$ and by its boundary points is called the closure of $A$, denoted by $\overline{A}$.

Clearly, $A \subseteq \overline{A}$. The closure of $A$ is, thus, an "enlargement" of $A$ that includes all its boundary points, that is, the borders. Naturally, the notion of closure becomes relevant when the borders are not already part of $A$.

Example 142 (i) If $A = [0,1) \subseteq \mathbb{R}$, then $\overline{A} = [0,1]$. (ii) If $A = \{(x_1,x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1\}$ is the closed unit ball, then $\overline{A} = A$. N

Example 143 Given a neighborhood $B_\varepsilon(x_0)$ of a point $x_0 \in \mathbb{R}^n$, we have
$$\overline{B_\varepsilon(x_0)} = \{x \in \mathbb{R}^n : d(x, x_0) \le \varepsilon\} \qquad (5.3)$$
The closure of a neighborhood features "$\le \varepsilon$" instead of "$< \varepsilon$". N

We can now introduce closed sets.

Definition 144 A set $A$ in $\mathbb{R}^n$ is called closed if it contains all its boundary points, that is, if $A = \overline{A}$.

Hence, a set is closed when it includes its border (so it has a skin).

Example 145 (i) The set $A = [0,1)$ is not closed since $A \neq \overline{A}$, while the closed unit ball $A = \{(x_1,x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1\}$ is closed since $A = \overline{A}$. (ii) The closed interval $[a,b]$ is closed (whence the name). The unbounded intervals $(a, \infty)$ and $(-\infty, a)$ are open. The unbounded intervals $[a, \infty)$ and $(-\infty, a]$ are closed. (iii) The circumference $A = \{(x_1,x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$ is closed because $\overline{A} = \partial A = A' = A$. N

Open and closed sets are dual notions, as the next result shows.³

Theorem 146 A set $A$ in $\mathbb{R}^n$ is open if and only if its complement is closed.

Proof "Only if". Let $A$ be open. We show that $A^c$ is closed. Let $x$ be a boundary point of $A^c$, that is, $x \in \partial A^c$. By definition, $x$ is not an interior point of either $A$ or $A^c$. Hence, $x \notin \operatorname{int} A$. But $A = \operatorname{int} A$ because $A$ is open. Therefore $x \notin A$, that is, $x \in A^c$. Since $x$ was an arbitrary point of $\partial A^c$, it follows that $\partial A^c \subseteq A^c$. Therefore, $\overline{A^c} = A^c$, which proves that $A^c$ is closed.

"If". Let $A^c$ be closed. We show that $A$ is open. Let $x$ be a point of $A$. Since $x \notin A^c = \overline{A^c}$, the point $x$ is not a boundary point of $A^c$. It is, therefore, an interior point of either $A$ or $A^c$. But, since $x \notin A^c$ implies $x \notin \operatorname{int} A^c$, we have $x \in \operatorname{int} A$. Since $x$ was arbitrarily chosen, we conclude that $A$ is open.
³Often, a set is defined to be closed when its complement is open. It is then proved as a theorem that a closed set contains its boundary. In other words, the definition and the theorem are switched relative to the approach that we have chosen.

Example 147 The finite sets of $\mathbb{R}^n$ (so, in particular, the singletons) are closed. To verify it, let $A = \{x_1, x_2, \ldots, x_n\}$ be a generic finite set. Its complement $A^c$ is open. Indeed, let $x \in A^c$. If $\varepsilon > 0$ is such that
$$\varepsilon < d(x, x_i) \quad \forall i = 1, \ldots, n$$
then $B_\varepsilon(x) \subseteq A^c$. So, $x$ is an interior point of $A^c$. Since $x$ was arbitrarily chosen, it follows that $A^c$ is open. As the reader can check, we also have $\operatorname{int} A = \emptyset$ and $\partial A = A$. N

Example 148 The figure

[Figure: a point, a parabola, and a small closed disc in the plane]

represents the closed set
$$\{(2,1)\} \cup \{(x_1,x_2) \in \mathbb{R}^2 : x_2 = x_1^2\} \cup \{(x_1,x_2) \in \mathbb{R}^2 : (x_1+1)^2 + (x_2+1)^2 \le 1/4\}$$
of $\mathbb{R}^2$. N

Open and closed sets are, therefore, two sides of the same coin: a set is closed (open) if and only if its complement is open (closed). Naturally, there are many sets that are neither open nor closed. Here is a simple example of such a set.

Example 149 The set $A = [0,1)$ is neither open nor closed. Indeed, $\operatorname{int} A = (0,1) \neq A$ and $\overline{A} = [0,1] \neq A$. N

There is a case in which the duality of open and closed sets takes a curious form.

Example 150 The empty set $\emptyset$ and the whole space $\mathbb{R}^n$ are simultaneously open and closed. By Theorem 146, it is sufficient to show that $\mathbb{R}^n$ is both open and closed. But this is obvious. Indeed, $\mathbb{R}^n$ is open because, trivially, all its points are interior (all neighborhoods are included in $\mathbb{R}^n$), and it is closed because it trivially coincides with its own closure. It is possible to show that $\emptyset$ and $\mathbb{R}^n$ are the unique sets with such a double personality. N

Let us go back to the notion of closure $\overline{A}$. The next result shows that it can be equivalently seen as the addition to the set $A$ of its limit points $A'$. In other terms, adding the borders turns out to be equivalent to adding the limit points.

Proposition 151 We have $\overline{A} = A \cup A'$.

Proof We need to prove that $A \cup A' = A \cup \partial A$. We first prove that $A \cup A' \subseteq A \cup \partial A$. Since $A \subseteq A \cup \partial A$, we have to prove that $A' \subseteq A \cup \partial A$. Let $x \in A'$. In view of what we observed after the proof of Lemma 133, $x$ is either an interior or a boundary point, so $x \in A \cup \partial A$. We conclude that $A \cup A' \subseteq A \cup \partial A$.
It remains to show that $A \cup \partial A \subseteq A \cup A'$. Since $A \subseteq A \cup A'$, we have to prove that $\partial A \subseteq A \cup A'$. Let $x \in \partial A$. If $x$ is an isolated point, then by definition $x \in A$. Otherwise, by Lemma 133 $x$ is a limit point of $A$, that is, $x \in A'$. Hence, $x \in A \cup A'$. This proves $A \cup \partial A \subseteq A \cup A'$, and so the result.

A corollary of this result is that a set is closed when it contains all its limit points. This
sheds further light on the nature of closed sets.

Corollary 152 A set in $\mathbb{R}^n$ is closed if and only if it contains all its limit points.

Proof Let $A$ be closed. By definition, $A = \overline{A}$ and hence, by Proposition 151, $A \cup A' = A$, that is, $A' \subseteq A$. Vice versa, if $A' \subseteq A$, then obviously $A \cup A' = A$. By Proposition 151, $\overline{A} = A \cup A' = A$.

Example 153 The inclusion $A' \subseteq A$ in this corollary can be strict, in which case the set $A \setminus A'$ consists of the isolated points of $A$. For example, let $A = [0,1] \cup \{-1, 4\}$. Then $A$ is closed and $A' = [0,1]$. Hence, $A'$ is strictly included in $A$ and the set $A \setminus A' = \{-1, 4\}$ consists of the isolated points of $A$. N

As already remarked, we have
$$\operatorname{int} A \subseteq A \subseteq \overline{A} \qquad (5.4)$$
The next result shows the importance of these inclusions.

Proposition 154 Given a set $A$ in $\mathbb{R}^n$, we have that:

(i) $\operatorname{int} A$ is the largest open set contained in $A$;

(ii) $\overline{A}$ is the smallest closed set that contains $A$.

The set of interior points $\operatorname{int} A$ is, therefore, the largest open set that approximates $A$ "from inside", while the closure $\overline{A}$ is the smallest closed set that approximates $A$ "from outside". The relation (5.4) is, therefore, the best topological sandwich – with a lower open slice and an upper closed slice – that we can have for the set $A$.⁴

It is now easy to prove an interesting and intuitive property of the boundary of a set.

Corollary 155 The boundary of a set in $\mathbb{R}^n$ is a closed set.


⁴Clearly, there are also sandwiches with a lower closed slice and an upper open slice, as the reader will see in more advanced courses.

Proof Let $A$ be any set in $\mathbb{R}^n$. Since the points exterior to $A$ are interior to its complement, we have $(\partial A)^c = \operatorname{int} A \cup \operatorname{int} A^c$. So, $\partial A$ is closed because $\operatorname{int} A$ and $\operatorname{int} A^c$ are open and, as we will momentarily see in Theorem 157, a union of open sets is open.

The next result, whose proof is left to the reader, shows that the difference between the closure and the interior of a set is given by its boundary points.

Proposition 156 For each set $A$ in $\mathbb{R}^n$, we have $\partial A = \overline{A} \setminus \operatorname{int} A$.

This result makes rigorous the intuition that open sets are sets without borders (or skinless). Indeed, it implies that $A$ is open if and only if $\partial A \cap A = \emptyset$. On the other hand, by definition, a set is closed if and only if $\partial A \subseteq A$, that is, when it includes its borders (it has a skin).

5.5 Set stability


We saw in Theorem 146 that the set operation of complementation plays a crucial role for open and closed sets. It is then natural to ask what the stability properties of open and closed sets are with respect to the other basic set operations of intersection and union.
We start by considering this issue for neighborhoods, the simplest open sets. The intersection of two neighborhoods of $x_0$ is still a neighborhood of $x_0$: indeed $B_{\varepsilon_1}(x_0) \cap B_{\varepsilon_2}(x_0)$ is nothing but the smaller of the two, i.e.,
$$B_{\varepsilon_1}(x_0) \cap B_{\varepsilon_2}(x_0) = B_{\min\{\varepsilon_1, \varepsilon_2\}}(x_0)$$
The same is true for intersections of a finite number of neighborhoods:
$$B_{\varepsilon_1}(x_0) \cap \cdots \cap B_{\varepsilon_n}(x_0) = B_{\min\{\varepsilon_1, \ldots, \varepsilon_n\}}(x_0)$$

It is, however, no longer true for intersections of infinitely many neighborhoods. For example,
$$\bigcap_{n=1}^{\infty} B_{1/n}(x_0) = \bigcap_{n=1}^{\infty} \left(x_0 - \frac{1}{n}, x_0 + \frac{1}{n}\right) = \{x_0\} \qquad (5.5)$$
i.e., this intersection reduces to the singleton $\{x_0\}$, which is closed (Example 147). Therefore, the intersection of infinitely many neighborhoods might well not be open. To check (5.5), note that a point belongs to the intersection $\bigcap_{n=1}^{\infty} B_{1/n}(x_0)$ if and only if it belongs to each neighborhood $B_{1/n}(x_0)$. This is true for $x_0$, so $x_0 \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. It is, however, the unique point that satisfies this property. Indeed, suppose by contradiction that $y \neq x_0$ is such that $y \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. Since $y \neq x_0$, we have $d(x_0, y) > 0$. If we take $n$ sufficiently large, in particular if
$$n > \frac{1}{d(x_0, y)}$$
then its reciprocal $1/n$ will be small enough so that
$$0 < \frac{1}{n} < d(x_0, y)$$
Therefore, $y \notin B_{1/n}(x_0)$, which contradicts the assumption that $y \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. It follows that $x_0$ is the only point in the intersection $\bigcap_{n=1}^{\infty} B_{1/n}(x_0)$, i.e., (5.5) holds.

A union of neighborhoods of $x_0$ is, instead, always a neighborhood of $x_0$, even if the union is infinite. The union of two neighborhoods is nothing but the larger of the two:
$$B_{\varepsilon_1}(x_0) \cup B_{\varepsilon_2}(x_0) = B_{\max\{\varepsilon_1, \varepsilon_2\}}(x_0)$$
More generally, in the case of infinitely many neighborhoods $B_{\varepsilon_i}(x_0)$, if $\sup_i \varepsilon_i < +\infty$ we set $\varepsilon = \sup_i \varepsilon_i$, so that
$$\bigcup_{i=1}^{\infty} B_{\varepsilon_i}(x_0) = B_\varepsilon(x_0)$$
For example,
$$\bigcup_{n=1}^{\infty} B_{1/n}(x_0) = \bigcup_{n=1}^{\infty} \left(x_0 - \frac{1}{n}, x_0 + \frac{1}{n}\right) = B_1(x_0)$$
When, instead, $\sup_i \varepsilon_i = +\infty$, we have
$$\bigcup_{i=1}^{\infty} B_{\varepsilon_i}(x_0) = \mathbb{R}^n$$
For example, on the real line,
$$\bigcup_{n=1}^{\infty} B_n(x_0) = \bigcup_{n=1}^{\infty} (x_0 - n, x_0 + n) = \mathbb{R}$$

In any case, we always get an open set.

Finite intersections of neighborhoods are, therefore, neighborhoods, while arbitrary (finite or not) unions of neighborhoods are neighborhoods. The next result shows that these stability properties continue to hold for all open sets.

Theorem 157 The intersection of a finite family of open sets is open, while the union of any family (finite or not) of open sets is open.

Proof Let $A = \bigcap_{i=1}^n A_i$ with each $A_i$ open. Each point $x \in A$ belongs to all the sets $A_i$ and is interior to each of them (because they are open), i.e., there exist neighborhoods $B_{\varepsilon_i}(x)$ of $x$ such that $B_{\varepsilon_i}(x) \subseteq A_i$. Put $B = \bigcap_{i=1}^n B_{\varepsilon_i}(x)$. The set $B$ is still a neighborhood of $x$ – with radius $\varepsilon = \min\{\varepsilon_1, \ldots, \varepsilon_n\}$ – and $B \subseteq A_i$ for each $i$. So, $B$ is a neighborhood of $x$ contained in $A$. Therefore, $A$ is open.
Let $A = \bigcup_{i \in I} A_i$, where $i$ runs over a finite or infinite index set $I$. Each $x \in A$ belongs to at least one of the sets $A_i$, say to $A_{\bar{\imath}}$. Since all the sets $A_i$ are open, there exists a neighborhood of $x$ contained in $A_{\bar{\imath}}$, and so in $A$. Therefore, $x$ is interior to $A$ and, given the arbitrariness of $x$, $A$ is open.

By Theorem 146 and the De Morgan laws, it is easy to prove that dual properties hold for closed sets.

Corollary 158 The union of a finite family of closed sets is closed, while the intersection of any family (finite or not) of closed sets is closed.

In general, infinite unions of closed sets are not closed: for example, for the closed sets $A_n = [-1 + 1/n, 1 - 1/n]$ we have $\bigcup_{n=1}^{\infty} A_n = (-1, 1)$.

5.6 Compact sets


This section is short, yet important. We first introduce bounded sets. On the real line, bounded sets have already been introduced with Definition 29: a set $A$ in $\mathbb{R}$ is bounded when it is bounded both below and above. As the reader can easily verify, this is equivalent to the existence of a scalar $K > 0$ such that $-K < x < K$ for every $x \in A$, that is,
$$|x| < K \quad \forall x \in A$$

The next definition is the natural extension of this idea to $\mathbb{R}^n$, where the absolute value is replaced by the more general notion of norm.

Definition 159 A set $A$ in $\mathbb{R}^n$ is bounded if there exists $K > 0$ such that
$$\|x\| < K \quad \forall x \in A$$

Recalling that $\|x\|$ is the distance $d(x, 0)$ of $x$ from the origin, it is easily seen that a set $A$ is bounded if and only if there is a $K > 0$ such that $d(x, 0) < K$ for every $x \in A$, i.e., all its points have distance from the origin smaller than $K$. So, a set $A$ is bounded if it is contained in a neighborhood $B_K(0)$ of the origin, geometrically if it can be inscribed in a large enough open ball.

Example 160 Neighborhoods $B_\varepsilon(x_0)$ and their closures (5.3) are bounded sets: it is sufficient to take $K > \|x_0\| + \varepsilon$. In contrast, $(a, \infty)$ is a simple example of an unbounded set (for this reason, it is called an unbounded open interval). N

A set is bounded if and only if its elements are componentwise bounded.

Proposition 161 A set $A$ is bounded if and only if there exists $K > 0$ such that, for every $x = (x_1, \ldots, x_n) \in A$, we have
$$|x_i| < K \quad \forall i = 1, \ldots, n$$

Proof We prove the "if" and leave the converse to the reader. Let $x \in A$. If $|x_i| < K$ for all $i = 1, \ldots, n$, then $x_i^2 < K^2$ for all $i = 1, \ldots, n$. So, $\sum_{i=1}^n x_i^2 < nK^2$. In turn, this implies $\|x\| = \sqrt{\sum_{i=1}^n x_i^2} < \sqrt{n}\, K$. Since $x$ was arbitrarily chosen in $A$, by setting $K' = \sqrt{n}\, K$ it follows that $\|x\| < K'$ for each $x \in A$, so $A$ is bounded.
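The bound $K' = \sqrt{n}\, K$ from this proof can be verified on sample points with a small Python sketch of ours (the data are an illustrative assumption):

    import math

    # Componentwise bound |x_i| < K gives the norm bound ||x|| < sqrt(n) * K.
    def norm_bound(points, K):
        n = len(points[0])
        K_prime = math.sqrt(n) * K
        assert all(math.hypot(*x) < K_prime for x in points)
        return K_prime

    A = [(1.0, -2.5), (2.9, 2.9), (-0.5, 0.0)]  # components bounded by K = 3
    print(norm_bound(A, K=3))  # 4.2426... = sqrt(2) * 3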

Using boundedness, we can define a class of closed sets that turns out to be very important for applications.

Definition 162 A set $A$ in $\mathbb{R}^n$ is called compact if it is both closed and bounded.



For example, all closed and bounded intervals in $\mathbb{R}$ are compact.⁵ More generally, the closure $\overline{B_\varepsilon(x_0)}$ of a neighborhood in $\mathbb{R}^n$ is compact. For example, the set
$$\overline{B_1(0)} = \{(x_1, \ldots, x_n) \in \mathbb{R}^n : x_1^2 + \cdots + x_n^2 \le 1\}$$
is compact in $\mathbb{R}^n$. This classic set of $\mathbb{R}^n$ is called the closed unit ball and generalizes to $\mathbb{R}^n$ the notion of closed unit ball that in Section 2.1 we presented in $\mathbb{R}^2$ (if the inequality is strict we have the open unit ball, which instead is an open set).
Like closedness, compactness is stable under finite unions and arbitrary intersections, as the reader can check.⁶

Example 163 Finite sets – so, in particular, the singletons – are compact. Indeed, in Example 147 we showed that they are closed sets. Since they are obviously bounded, they are then compact. N

Example 164 Provided there are no free goods, budget sets are a fundamental example of
compact sets in consumer theory, as Proposition 792 will show. N

5.7 Closure and convergence


In this final section we present an important characterization of closed sets by means of sequences.⁷

Theorem 165 A set $C$ in $\mathbb{R}^n$ is closed if and only if it contains the limit of every convergent sequence of its points. That is, $C$ is closed if and only if
$$\{x_n\} \subseteq C, \; x_n \to x \implies x \in C \qquad (5.6)$$

Proof "Only if". Let $C$ be closed and let $\{x_n\} \subseteq C$ be a sequence such that $x_n \to x$. We want to show that $x \in C$. Suppose, by contradiction, that $x \notin C$. Since $x_n \to x$, for every $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that $x_n \in B_\varepsilon(x)$ for every $n \ge n_\varepsilon$. Therefore, $x$ is a limit point of $C$, which contradicts $x \notin C$ because $C$ is closed and so contains all its limit points.
"If". Let $C$ be a set for which property (5.6) holds. By contradiction, suppose $C$ is not closed. Then there exists at least one boundary point $x$ of $C$ that does not belong to $C$. Since it cannot be isolated (otherwise it would belong to $C$), by Lemma 133 $x$ is a limit point of $C$. Each neighborhood $B_{1/n}(x)$ thus contains a point of $C$, call it $x_n$. The sequence of such $x_n$ converges to $x \notin C$, contradicting (5.6). Hence, $C$ is closed.

This property is important: a set is closed if and only if "it is closed with respect to the limit operation", that is, if we never leave the set by taking limits of its sequences. This is a main reason why in applications sets are often assumed to be closed: otherwise, one could get arbitrarily close to a point $x$ without being able to reach it, a "discontinuity" that applications typically do not feature (it would be like licking the windows of a pastry shop without being able to reach the pastries: close, yet unreachable).
⁵The empty set $\emptyset$ is considered a compact set.
⁶Note that, the empty set being compact, the intersection of two disjoint compact sets is the empty (so, compact) set.
⁷This section can be skipped at a first reading, and read only after having studied sequences in Chapter 8.

Example 166 Consider the closed interval $C = [a,b]$. We show that it is closed using Theorem 165. Let $\{x_n\} \subseteq C$ be such that $x_n \to x \in \mathbb{R}$. By Theorem 165, to show that $C$ is closed it is sufficient to show that $x \in C$. Since $a \le x_n \le b$, a simple application of the comparison criterion shows that $a \le x \le b$, that is, $x \in C$. N
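The sequential criterion also explains, concretely, why $[0,1)$ is not closed. A small Python sketch of ours (illustrative, not from the text):

    # x_n = 1 - 1/n is a sequence in [0, 1) converging to 1.
    # Its limit belongs to [0, 1] but not to [0, 1), so [0, 1)
    # fails the criterion (5.6) of Theorem 165 while [0, 1] satisfies it.
    xs = [1 - 1 / n for n in range(1, 10_000)]
    limit = 1.0
    print(all(0 <= x < 1 for x in xs))   # True: the sequence stays in [0, 1)
    print(0 <= limit < 1)                # False: the limit escapes [0, 1)
    print(0 <= limit <= 1)               # True: [0, 1] retains it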

Example 167 Consider the rectangle $C = [a,b] \times [c,d]$ in $\mathbb{R}^2$. Let $\{x^k\} \subseteq C$ be such that $x^k \to x \in \mathbb{R}^2$. By Theorem 165, to show that $C$ is closed it is sufficient to show that $x = (x_1, x_2) \in C$. By (8.59), $x^k \to x$ implies $x_1^k \to x_1$ and $x_2^k \to x_2$. Since $x_1^k \in [a,b]$ and $x_2^k \in [c,d]$ for every $k$, again a simple application of the comparison criterion shows that $x_1 \in [a,b]$ and $x_2 \in [c,d]$, that is, $x \in C$. N
Chapter 6

Functions

6.1 The concept


Consider a shopkeeper who, at a wholesale market, faces the following table, which lists the unit price of a kilogram of walnuts for the various quantities of walnuts that can be purchased from his dealer:

Quantity   Price per kg
10 kg      4 euros
20 kg      3.9 euros
30 kg      3.8 euros
40 kg      3.7 euros

In other words, if the shopkeeper buys 10 kg of walnuts he will pay 4 euros per kg, if he buys 20 kg he will pay 3.9 euros per kg, and so on (as is often the case, the dealer offers quantity discounts: the higher the quantity purchased, the lower the unit price).
The table is an example of a supply function that associates to each quantity the corresponding selling price, where $A = \{10, 20, 30, 40\}$ is the set of the quantities and $B = \{4, 3.9, 3.8, 3.7\}$ is the set of their unit prices. The supply function is a rule that, to each element of the set $A$, associates an element of the set $B$.
In general, we have:
In general, we have:

Definition 168 Given any two sets $A$ and $B$, a function defined on $A$ with values in $B$, denoted by $f : A \to B$, is a rule that associates to each element of the set $A$ one, and only one, element of the set $B$.

We write
$$b = f(a)$$
to indicate that, to the element $a \in A$, the function $f$ associates the element $b \in B$. Graphically:

[Figure: diagram of $f$ mapping $a \in A$ to $b = f(a) \in B$]
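The supply function of the opening example is small enough to write out explicitly. A minimal Python sketch of ours (the name `supply` is an illustrative assumption): a finite function is just a table that assigns one, and only one, value to each element of the domain.

    # f : A -> B for the walnut example, as an explicit finite rule.
    supply = {10: 4.0, 20: 3.9, 30: 3.8, 40: 3.7}

    print(supply[30])                    # 3.8 euros per kg for 30 kg
    print(sorted(set(supply.values())))  # Im f = {3.7, 3.8, 3.9, 4.0}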


The rule can be completely arbitrary; what matters is that it associates to each element $a$ of $A$ only one element $b$ of $B$.¹ The arbitrariness of the rule is the key feature of the notion of function. It is one of the fundamental ideas of mathematics, key for applications, and it was fully understood only relatively recently: the notion of function that we just presented was introduced in 1829 by Dirichlet after about 150 years of discussions (the first ideas on the subject go back at least to Leibniz at the end of the seventeenth century).

Note that it is perfectly legitimate for the same element of $B$ to be associated to two (or more) different elements of $A$, that is,

[Figure: a many-to-one assignment – legitimate]

In contrast, it cannot happen that different elements of $B$ are associated to the same element

¹We have emphasized in italics the most important words: the rule must hold for each element of $A$ and, to each of them, it must associate only one element of $B$.

of $A$, that is,

[Figure: a one-to-many assignment – illegitimate]

In terms of the supply function of the initial example, different quantities of walnuts might well have the same unit price (e.g., when there are no quantity discounts), but the same quantity cannot have different unit prices!

Before considering some examples, we introduce a bit of terminology. The two variables a
and b are called the independent variable and the dependent variable, respectively. Moreover,
the set A is called the domain of the function, while the set B is its codomain.
The codomain is the set in which the function takes on its values, but it does not necessarily contain only such values: it might well be larger. In this respect, the next notion is important: given $a \in A$, the element $f(a) \in B$ is called the image of $a$. Given any subset $C$ of the domain $A$, the set
$$f(C) = \{f(x) : x \in C\} \subseteq B \qquad (6.1)$$
of the images of the points in $C$ is called the image of $C$. In particular, the set $f(A)$ of all the images of points of the domain is called the image (or range) of the function $f$, denoted by $\operatorname{Im} f$. Therefore, $\operatorname{Im} f$ is the subset of the codomain formed by the elements that are actually the image of some element of the domain:
$$\operatorname{Im} f = f(A) = \{f(x) : x \in A\} \subseteq B$$

Note that any set that contains $\operatorname{Im} f$ is, indeed, a possible codomain for the function: if $\operatorname{Im} f \subseteq B$ and $\operatorname{Im} f \subseteq C$, then writing both $f : A \to B$ and $f : A \to C$ is fine. The choice of the codomain is, ultimately, a matter of convenience. For example, throughout this book we will often consider functions that take on real values, that is, $f(x) \in \mathbb{R}$ for each $x$ in the domain of $f$. In this case, the most convenient choice for the codomain is the entire real line, so we will usually write $f : A \to \mathbb{R}$.

Example 169 (i) Let $A$ be the set of all countries in the world and $B$ a set containing some colors. The function $f : A \to B$ associates to each country the color given to it on a geographic map, so $\operatorname{Im} f$ is the set of the colors used at least once on the map.
(ii) The rule that associates to each living human being his date of birth is a function $f : A \to B$, where $A$ is the set of human beings and, for example, $B$ is the set of the dates of the last 150 years (a codomain sufficiently large to contain all the possible birthdates). N

Let us see an example of a rule that does not define a function.

Example 170 Consider the rule that associates to each positive scalar $x$ both its positive and negative square roots, that is, $\{\sqrt{x}, -\sqrt{x}\}$. For example, it associates to $4$ the elements $\{-2, 2\}$. This rule does not describe a function $f : [0, \infty) \to \mathbb{R}$ because, to each element of the domain different from $0$, two different elements of the codomain are associated. N

The main classes of functions that we will consider are:

(i) $f : A \subseteq \mathbb{R} \to \mathbb{R}$, real-valued functions of a real variable, called functions of a single variable or scalar functions.²

(ii) $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, real-valued functions of $n$ real variables, called functions of several variables or vector functions.

(iii) $f : A \subseteq \mathbb{R} \to \mathbb{R}^m$, vector-valued functions of a real variable, called curves.³

(iv) $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$, vector-valued functions of $n$ real variables, called operators.

We now present some classic examples of functions of a single variable.

Example 171 The cubic function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3$ is a rule that associates to each scalar its cube. Since each scalar has a unique cube, this rule defines a function. Graphically:

[Figure: graph of $f(x) = x^3$]

In particular, we have $\operatorname{Im} f = f(\mathbb{R}) = \mathbb{R}$. N


²The terminology "scalar function" is convenient, but not standard (and it can have different meanings in different books). So, the reader must use it with some care. The same is true for the terminology "vector function".
³We will rarely consider functions $f : A \subseteq \mathbb{R} \to \mathbb{R}^m$ (we mention them here for the sake of completeness), so this specific meaning of the word "curve" will not be relevant for us in the book.

Example 172 The quadratic function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is a rule that associates to each scalar its square. Since each scalar has a unique square, this rule defines a function. In particular, $\operatorname{Im} f = f(\mathbb{R}) = [0, \infty)$. Graphically:

[Figure: graph of the parabola $f(x) = x^2$]

In this case, to two different elements of the domain there may correspond the same element of the codomain: for example, $f(1) = f(-1) = 1$. N

The clause "is a rule that" is usually omitted, as we will do from now on.
Example 173 The square root function $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = \sqrt{x}$ associates to each positive scalar its (arithmetic) square root. The domain is the positive half-line and $\operatorname{Im} f = [0, \infty)$. Graphically:

[Figure: graph of $f(x) = \sqrt{x}$]

N

Example 174 The logarithmic function $f : (0, \infty) \to \mathbb{R}$ defined by $f(x) = \log_a x$, with $a > 0$ and $a \neq 1$, associates to each strictly positive scalar its logarithm. Its domain is $(0, \infty)$, while $\operatorname{Im} f = \mathbb{R}$. Graphically:

[Figure: graph of $f(x) = \log_a x$]

N

Example 175 The absolute value function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$ associates to each scalar its absolute value. This function has domain $\mathbb{R}$, with $\operatorname{Im} f = [0, \infty)$. Graphically:

[Figure: graph of $f(x) = |x|$]

N

Example 176 Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be defined by $f(x) = 1/|x|$ for every scalar $x \neq 0$. Graphically:

[Figure: graph of $f(x) = 1/|x|$]

Here the domain is $A = \mathbb{R} \setminus \{0\}$, the real line without the origin. Moreover, $\operatorname{Im} f = (0, \infty)$. N

Functions of several variables $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ play a key role in economics. Let us provide some examples.

Example 177 (i) The function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x_1, x_2) = x_1 + x_2 \qquad (6.2)$$
associates to each vector $x = (x_1, x_2) \in \mathbb{R}^2$ the sum of its components.⁴ For every $x \in \mathbb{R}^2$, such a sum is unique, so the rule defines a function with $\operatorname{Im} f = f(\mathbb{R}^2) = \mathbb{R}$.
(ii) The function $f : \mathbb{R}^n \to \mathbb{R}$ defined by
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^n x_i$$
generalizes to $\mathbb{R}^n$ the function of two variables (6.2). N

Example 178 (i) The function $f : \mathbb{R}^2_+ \to \mathbb{R}$ defined by
$$f(x_1, x_2) = \sqrt{x_1 x_2} \qquad (6.3)$$
associates to each vector $x = (x_1, x_2) \in \mathbb{R}^2_+$ the square root of the product of its components. For each $x \in \mathbb{R}^2_+$, this root is unique, so the rule defines a function with $\operatorname{Im} f = \mathbb{R}_+$.
(ii) The function $f : \mathbb{R}^n_+ \to \mathbb{R}$ defined by
$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n x_i^{\alpha_i}$$
with exponents $\alpha_i > 0$ such that $\sum_{i=1}^n \alpha_i = 1$, generalizes to $\mathbb{R}^n$ the function of two variables (6.3) – which is the special case with $n = 2$ and $\alpha_1 = \alpha_2 = 1/2$. It is widely used in economics under the name of Cobb-Douglas function. N

⁴To be consistent with the notation adopted for vectors, we should write $f((x_1, x_2))$. But, to ease notation, we write $f(x_1, x_2)$.
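The Cobb-Douglas function is easy to compute. A minimal Python sketch of ours (the function name and data are illustrative assumptions):

    import math

    def cobb_douglas(x, alpha):
        # f(x) = prod x_i^alpha_i, with alpha_i > 0 summing to 1
        assert all(a > 0 for a in alpha) and math.isclose(sum(alpha), 1.0)
        return math.prod(xi ** ai for xi, ai in zip(x, alpha))

    # the special case (6.3): n = 2 and alpha_1 = alpha_2 = 1/2
    print(cobb_douglas((4.0, 9.0), (0.5, 0.5)))  # 6.0 = sqrt(4 * 9)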

In economics the operators $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$, too, are important. Next we present a few examples.

Example 179 (i) Define $f : \mathbb{R}^2 \to \mathbb{R}^2$ by
$$f(x_1, x_2) = (x_1, x_1 x_2)$$
For example, if $(x_1, x_2) = (2, 5)$, then $f(x_1, x_2) = (2, 2 \cdot 5) = (2, 10) \in \mathbb{R}^2$.

(ii) Define $f : \mathbb{R}^3 \to \mathbb{R}^2$ by
$$f(x_1, x_2, x_3) = \left(2x_1^2 + x_2 + x_3, \; x_1 - x_2^4\right)$$
For example, if $x = (2, 5, -3)$, then
$$f(x_1, x_2, x_3) = \left(2 \cdot 2^2 + 5 - 3, \; 2 - 5^4\right) = (10, -623)$$
N

O.R. A function $f : A \to B$ is a kind of machine that transforms each element $a \in A$ into an element $b = f(a) \in B$.

[Figure: a machine taking $a$ as input and producing $b = f(a)$]

If we insert in it any element $a \in A$, it "spits out" $f(a) \in B$. If we insert an element $a \notin A$, the machine jams and does not produce anything. The image $\operatorname{Im} f = f(A) \subseteq B$ is simply the "list" of all the elements that can come out of the machine.
In particular, for scalar functions the machine transforms real numbers into real numbers, for vector functions it transforms vectors of $\mathbb{R}^n$ into real numbers, for curves it transforms real numbers into vectors of $\mathbb{R}^m$, and for operators it transforms vectors of $\mathbb{R}^n$ into vectors of $\mathbb{R}^m$.

The names of the variables are altogether irrelevant: we can indifferently write $a = f(b)$, or $y = f(x)$, or $s = f(t)$, or $\beta = f(\alpha)$, etc.: the names of the variables are just placeholders; what matters is only the sequence of operations (almost always numerical) that lead from $a$ to $b = f(a)$. Writing $b = a^2 + 2a + 1$ is exactly the same as writing $y = x^2 + 2x + 1$, or $s = t^2 + 2t + 1$, or $\beta = \alpha^2 + 2\alpha + 1$. This function is identified by the operations "square + double + 1" that allow us to move from the independent variable to the dependent one. H

We close this introductory section by making rigorous the notion of graph of a function, until now used intuitively. For the quadratic function $f(x) = x^2$ the graph is the parabola

[Figure: the parabola $y = x^2$]

that is, the locus of the points $(x, x^2)$ of the plane, as $x$ varies over the real line – which is the domain of the function. For example, the points $(-1, 1)$, $(0, 0)$ and $(1, 1)$ belong to the parabola.

Definition 180 The graph of a function $f : A \to B$, denoted by $\operatorname{Gr} f$, is the set
$$\operatorname{Gr} f = \{(x, f(x)) : x \in A\} \subseteq A \times B$$

The graph is, therefore, a subset of the Cartesian product $A \times B$. In particular:

(i) When $A, B \subseteq \mathbb{R}$, the graph is a subset of the plane $\mathbb{R}^2$. Geometrically, it is a curved line (without thickness) in $\mathbb{R}^2$ because, to each $x \in A$, there corresponds a unique $f(x)$. Graphically:

[Figure: the graph of a scalar function as a curve in the plane]

(ii) When $A \subseteq \mathbb{R}^2$ and $B \subseteq \mathbb{R}$, the graph is a subset of the three-dimensional space $\mathbb{R}^3$, i.e., a surface (without thickness). Graphically:

[Figure: the graph of a function of two variables as a surface in $\mathbb{R}^3$]

6.2 Applications
6.2.1 Static choices
Let us interpret the vectors in $\mathbb{R}^n_+$ as bundles of goods (Section 2.4.1). It is natural to assume that the consumer will prefer some bundles to others. For example, it is reasonable to assume that, if $x \ge y$ (bundle $x$ is "richer" than $y$), then $x$ is preferred to $y$. In symbols, we then write $x \succsim y$, where the symbol $\succsim$ represents the preference (binary) relation of the consumer over the bundles.
In general, we assume that the preference $\succsim$ over the available bundles of goods can be represented by a function $u : \mathbb{R}^n_+ \to \mathbb{R}$, called utility function, such that
$$x \succsim y \iff u(x) \ge u(y) \qquad (6.4)$$
That is, bundle $x$ is preferred to $y$ if and only if it gets a higher "utility". The image, $\operatorname{Im} u$, represents all the levels of utility that can be attained by the consumer.
Originally, around 1870, the first marginalists – in particular, Jevons, Menger and Walras – interpreted $u(x)$ as the level of physical satisfaction caused by the bundle $x$. They gave, therefore, a physiological interpretation of utility functions, which quantified the emotions that consumers felt in owning different bundles. It is the so-called cardinalist interpretation of utility functions, which goes back to Jeremy Bentham and to his "pain and pleasure calculus".⁵ Utility functions, besides representing the preference $\succsim$, are then inherently interesting because they quantify an emotional state of the consumer, i.e., the degree of pleasure determined by the bundles. In addition to the comparison $u(x) \ge u(y)$, it is also meaningful to compare the differences
$$u(x) - u(y) \ge u(z) - u(w) \qquad (6.5)$$
which indicate that bundle $x$ is more intensively preferred to bundle $y$ than bundle $z$ is relative to bundle $w$. Moreover, since $u(x)$ measures the degree of pleasure that the consumer gets from the bundle $x$, in the cardinalist interpretation it is also legitimate to compare these measures among different consumers, i.e., to make interpersonal comparisons of utility. Such interpersonal comparisons can then be used, for example, to assess the impact of different economic policies on the welfare of economic agents. For instance, we can ask whether a given policy, though making some agents worse off, still increases the overall utility across agents.

The cardinalist interpretation came into question at the end of the nineteenth century due to the impossibility of measuring experimentally the physiological aspects that were assumed to underlie utility functions.⁶ For this reason, with the works of Vilfredo Pareto at the beginning of the twentieth century, developed first by Eugen Slutsky in 1915 and then by John Hicks in the 1930s,⁷ the ordinalist interpretation of utility functions prevailed: more modestly, utility functions are assumed to be only a mere numerical representation of the preference $\succsim$ of the consumer. According to this less demanding interpretation, what matters is only that the ordering $u(x) \ge u(y)$ represents the preference for bundle $x$ over bundle $y$, that is, $x \succsim y$. It is no longer of interest to know whether it also represents the, more or less intense, consumers' emotions over the bundles. In other terms, in the ordinalist approach the fundamental notion is the preference $\succsim$, while the utility function becomes just a numerical representation of it. The comparisons of intensity (6.5), as well as the interpersonal comparisons of utility, no longer have meaning.
⁵See his Introduction to the Principles of Morals and Legislation, published in 1789.
⁶Around 1901, the famous mathematician Henri Poincaré wrote to Leon Walras: "I can say that one satisfaction is greater than another, since I prefer one to the other, but I cannot say that the first satisfaction is two or three times greater than the other." Poincaré, with great sensibility, understood a key issue.
⁷We refer interested readers to Stigler (1950).

At the empirical level, the consumers’ preferences % are revealed through their choices
among bundles, which are much simpler to observe than emotions or other mental states.

The ordinalist interpretation became the mainstream one because, besides the superior empirical content just mentioned, the works of Pareto showed that it is sufficient for developing a powerful consumer theory (cf. Section 18.1.4). So, Occam's razor was a further reason to abandon the earlier cardinalist interpretation. Nevertheless, economists often use, at an intuitive level, cardinalist categories because of their introspective plausibility.

Be that as it may, through utility functions we can address the problem of a consumer who has to choose a bundle within a given set $A$ of $\mathbb{R}^n_+$. The consumer will be guided in this choice by his utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$; namely, $u(x) \ge u(y)$ indicates that the consumer prefers the bundle $x$ of goods to the bundle $y$ or that he is indifferent between the two.
For example,
$$u(x) = \sum_{i=1}^n x_i$$
is the utility function of a consumer who orders bundles simply according to the sum of the quantities of the different goods that they contain. The classic Cobb-Douglas utility function is
$$u(x) = \prod_{i=1}^n x_i^{\alpha_i}$$
with exponents $\alpha_i > 0$ such that $\sum_{i=1}^n \alpha_i = 1$ (see Example 178). When $\alpha_i = 1/n$ for each $i$, we have
$$u(x) = \prod_{i=1}^n x_i^{1/n} = \left(\prod_{i=1}^n x_i\right)^{1/n}$$
with bundles being ordered according to the $n$-th root of the product of the quantities of the different goods that they contain.⁸

We close by considering a producer who has to decide how much output to produce (Section 2.4.1). In this decision the production function $f : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ plays a crucial role in that it describes how much output $f(x)$ is obtained starting from a vector $x \in \mathbb{R}^n_+$ of inputs. For example,
$$f(x) = \left(\prod_{i=1}^n x_i\right)^{1/n}$$
is the Cobb-Douglas production function in which the output is equal to the $n$-th root of the product of the input components.

⁸Because of its multiplicative form, bundles with at least one zero component $x_i$ have zero utility according to the Cobb-Douglas utility function. Since it is not that plausible that the presence of a zero component has such drastic consequences, this utility function is often defined only on $\mathbb{R}^n_{++}$ (as we will also often do).

6.2.2 Intertemporal choice

Assume that the consumer has, over the possible consumption streams $x = (x_1, x_2, \ldots, x_T)$ of some good, preferences quantified by an intertemporal utility function $U : A \subseteq \mathbb{R}^T \to \mathbb{R}$ (Section 2.4.2). For example, assume that he has a utility function $u_t : A \subseteq \mathbb{R} \to \mathbb{R}$, called instantaneous, for the consumption level $x_t$ of each period. In this case a possible form of the intertemporal utility function is
$$U(x) = u_1(x_1) + \delta u_2(x_2) + \cdots + \delta^{T-1} u_T(x_T) = \sum_{t=1}^T \delta^{t-1} u_t(x_t) \qquad (6.6)$$
where $\delta \in (0,1)$ is a subjective discount factor that depends on how "patient" the consumer is. The more patient the consumer is – i.e., the more he is willing to postpone his consumption of a given quantity of the good – the higher the value of $\delta$ is. In particular, the closer $\delta$ gets to $1$, the closer we approach the form
$$U(x) = u_1(x_1) + u_2(x_2) + \cdots + u_T(x_T) = \sum_{t=1}^T u_t(x_t)$$
in which consumption in each period is evaluated in the same way. In contrast, the closer $\delta$ gets to $0$, the closer $U(x)$ gets to $u_1(x_1)$, that is, the consumer becomes extremely impatient and does not give any importance to future consumption.
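Formula (6.6) is a simple weighted sum and can be computed directly. In the Python sketch below (ours), the instantaneous utility $u_t(x) = \sqrt{x}$ and the consumption stream are purely illustrative assumptions, not choices made in the text:

    import math

    def U(x, delta, u=math.sqrt):
        # intertemporal utility (6.6): sum of delta^(t-1) * u_t(x_t)
        return sum(delta ** (t - 1) * u(x_t) for t, x_t in enumerate(x, start=1))

    stream = (4.0, 4.0, 4.0)
    for delta in (0.99, 0.5, 0.01):
        print(delta, round(U(stream, delta), 4))
    # as delta -> 1 all periods weigh equally; as delta -> 0, U(x) -> u_1(x_1) = 2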

6.3 General properties


6.3.1 Preimages and level curves

The notion of preimage is dual to that of image. Specifically, let $f : A \to B$. Given a point $y \in B$, its preimage, denoted by $f^{-1}(y)$, is the set
$$f^{-1}(y) = \{x \in A : f(x) = y\}$$
of the elements of the domain whose image is $y$. More generally, given any subset $D$ of the codomain $B$, its preimage $f^{-1}(D)$ is the set
$$f^{-1}(D) = \{x \in A : f(x) \in D\}$$
of the elements of the domain whose images belong to $D$.

The next examples illustrate these notions.⁹

Example 181 Consider the function $f : A \to B$ that to each (living) person associates the date of birth. If $y \in B$ is a possible such date, $f^{-1}(y)$ is the set of the persons that have $y$ as date of birth; in other words, all the persons in $f^{-1}(y)$ have the same age (they form a cohort, in the demography terminology). N

⁹For the sake of brevity, we will consider as sets $D$ only intervals and singletons, but similar considerations hold for other types of sets.

Example 182 Let $f : \mathbb{R} \to \mathbb{R}$ be the cubic function $f(x) = x^3$. We have $\operatorname{Im} f = \mathbb{R}$. For each $y \in \mathbb{R}$,
$$f^{-1}(y) = \left\{y^{1/3}\right\}$$
For example, $f^{-1}(27) = \{3\}$. The preimage of a closed interval $[a, b]$ is
$$f^{-1}([a, b]) = \left[a^{1/3}, b^{1/3}\right]$$
For example, $f^{-1}([-8, 27]) = [-2, 3]$. N

Example 183 Let $f : \mathbb{R} \to \mathbb{R}$ be the quadratic function $f(x) = x^2$. We have $\operatorname{Im} f = \mathbb{R}_+$. The preimage of each $y \ge 0$ is
$$f^{-1}(y) = \{-\sqrt{y}, \sqrt{y}\}$$
while that of each $y < 0$ is $f^{-1}(y) = \emptyset$.¹⁰ So,
$$f^{-1}(a, b) = \begin{cases} (-\sqrt{b}, -\sqrt{a}) \cup (\sqrt{a}, \sqrt{b}) & \text{if } a \ge 0 \\ \emptyset & \text{if } b \le 0 \\ (-\sqrt{b}, \sqrt{b}) & \text{if } a < 0 < b \end{cases}$$
Note that $f^{-1}(a, b) = f^{-1}([0, b))$ when $a < 0$. Indeed, the elements between $a$ and $0$ have no preimage. For example, if $D = (-1, 2)$, then $f^{-1}(D) = (-\sqrt{2}, \sqrt{2})$. Since
$$f^{-1}(D) = f^{-1}([0, 2)) = f^{-1}((-1, 2))$$
the negative elements of $D$ are irrelevant (as they do not belong to the image of the function). N

¹⁰To ease notation, we denote the preimage of an open interval $(a, b)$ by $f^{-1}(a, b)$ instead of $f^{-1}((a, b))$.

By resorting to an appropriate topographic term, the preimage
$$f^{-1}(k) = \{x \in A : f(x) = k\}$$
of a function $f : A \to \mathbb{R}$ is often called the level curve (or level set) of $f$ of level $k \in \mathbb{R}$. This terminology nicely expresses the idea that the set $f^{-1}(k)$ is formed by the points of the domain at which the function attains the "level" $k$. It is particularly fitting in economic applications, as we will see shortly.
The level curves of functions of two variables have a geometric representation that may prove illuminating, as we show next.

Example 184 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = x_1^2 + x_2^2$. For every $k \ge 0$, the level curve $f^{-1}(k)$ is the locus in $\mathbb{R}^2$ of equation
$$x_1^2 + x_2^2 = k$$
That is, it is the circumference with center at the origin and radius $\sqrt{k}$. Graphically, the level curves can be represented as:

[Figure: concentric circles of radius $\sqrt{k}$ around the origin]

while the graph of the function is:

[Figure: the paraboloid $x_3 = x_1^2 + x_2^2$ in $\mathbb{R}^3$] N

Two different level curves of the same function cannot have any point in common, that is,
$$k_1 \neq k_2 \implies f^{-1}(k_1) \cap f^{-1}(k_2) = \emptyset \qquad (6.7)$$
Indeed, if there were a point $x \in A$ belonging to both the curves of levels $k_1$ and $k_2$, we would have $f(x) = k_1$ and $f(x) = k_2$ with $k_1 \neq k_2$, but this is impossible because, by definition, a function may assume only one value at each point.
Example 185 Let $f : A \subseteq \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = \sqrt{7x_1^2 - x_2}$. For every $k \ge 0$, the level curve $f^{-1}(k)$ is the locus in $\mathbb{R}^2$ of equation $\sqrt{7x_1^2 - x_2} = k$, that is, $x_2 = -k^2 + 7x_1^2$. It is a parabola that intersects the vertical axis at $-k^2$. Graphically:

[Figure: the parabolas $x_2 = -k^2 + 7x_1^2$ for $k = 0, 1, 2$] N

Example 186 The function $f : \mathbb{R}_{++} \times \mathbb{R} \to \mathbb{R}$ given by
$$f(x_1, x_2) = \sqrt{\frac{x_1^2 + x_2^2}{x_1}}$$
is defined only for $x_1 > 0$. Its level curves $f^{-1}(k)$ are the loci of equation
$$\sqrt{\frac{x_1^2 + x_2^2}{x_1}} = k$$
that is, $x_1^2 + x_2^2 - k^2 x_1 = 0$. Therefore, they are circumferences passing through the origin and with centers $(k^2/2, 0)$, all on the horizontal axis. Graphically:

[Figure: circles through the origin with centers on the horizontal axis]

Although all such circumferences have the origin as a common point, the "true" level curves are the circumferences without the origin, because at $(0, 0)$ the function is not defined. So, they do not actually have any point in common. N

O.R. The equation $f(x_1, x_2) = k$ of a generic level curve of a function $f$ of two variables can be rewritten, in an apparently more complicated form, as the system
$$\begin{cases} y = f(x_1, x_2) \\ y = k \end{cases}$$
This rewriting clarifies its geometric meaning:

(i) the equation $y = f(x_1, x_2)$ represents a surface in $\mathbb{R}^3$ (the graph of $f$);

(ii) the equation $y = k$ represents a horizontal plane (it contains the points $(x_1, x_2, k) \in \mathbb{R}^3$, i.e., all the points of "height" $k$);

(iii) the brace "{" geometrically means intersection between the sets defined by the two previous equations.

The curve of level $k$ is, therefore, viewed as the intersection between the surface that represents $f$ and a horizontal plane.

[Figure: the surface of the graph cut by horizontal planes at different heights]

Hence, the different level curves are obtained by cutting the surface with horizontal planes (at different levels). They represent the edges of the "slices" obtained in this way, projected onto the $(x_1, x_2)$ plane. H

Indifference curves

We now turn to a classic economic application of level curves. Given a utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$, its level curves
$$u^{-1}(k) = \{x \in A : u(x) = k\}$$
are called indifference curves. So, an indifference curve is formed by all the bundles $x \in A$ that have the same utility $k$, which are therefore indifferent for the consumer. The collection $\{u^{-1}(k) : k \in \mathbb{R}\}$ of all the indifference curves is sometimes called the indifference map.

Example 187 Consider the Cobb-Douglas utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$ given by $u(x) = \sqrt{x_1 x_2}$. For every $k > 0$ we have
$$u^{-1}(k) = \{x \in \mathbb{R}^2_+ : \sqrt{x_1 x_2} = k\} = \{x \in \mathbb{R}^2_+ : x_1 x_2 = k^2\} = \left\{x \in \mathbb{R}^2_+ : x_2 = \frac{k^2}{x_1}\right\}$$
Therefore, the indifference curve of level $k$ is the hyperbola of equation
$$x_2 = \frac{k^2}{x_1}$$
By varying $k > 0$, we get the indifference map $\{u^{-1}(k) : k > 0\}$, i.e.,

[Figure: hyperbolic indifference curves for $k = 1, 2, 3$] N

Introductory economics courses emphasize that indifference curves "do not cross", i.e., are disjoint: $k_1 \neq k_2$ implies $u^{-1}(k_1) \cap u^{-1}(k_2) = \emptyset$. Clearly, this is just a special case of the more general property (6.7) that holds for any family of level curves.

The level curves
$$f^{-1}(k) = \{x \in A : f(x) = k\}$$
of a production function $f : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ are called isoquants. An isoquant is, thus, the set of all the input vectors $x \in \mathbb{R}^n_+$ that produce the same output. The set $\{f^{-1}(k) : k \in \mathbb{R}\}$ of all the isoquants is sometimes called the isoquant map.
Finally, the level curves
$$c^{-1}(k) = \{x \in A : c(x) = k\}$$
of a cost function $c : A \subseteq \mathbb{R}_+ \to \mathbb{R}$ are called isocosts. So, an isocost is the set of all the levels of output $x \in A$ that have the same cost. The set $\{c^{-1}(k) : k \in \mathbb{R}\}$ of all the isocosts is sometimes called the isocost map.

In sum, indifference curves, isoquants and isocosts are all examples of level curves, whose general properties they inherit. For example, the fact that two level curves have no points in common – property (6.7) – implies the analogous classic property of indifference curves, as already noted, as well as the property that isoquants and isocosts never intersect.

6.3.2 Algebra of functions


Given any two sets $A$ and $B$, we denote by $B^A$ the set of all functions $f : A \to B$.¹¹ In particular, $\mathbb{R}^A$ is the set of all real-valued functions $f : A \to \mathbb{R}$ defined on any set $A$ whatsoever. In $\mathbb{R}^A$ we can define in a natural way some operations that associate to two functions in $\mathbb{R}^A$ a new function still in $\mathbb{R}^A$.

¹¹Sometimes the notation ${}^{A}B$ is used instead of $B^A$ (the context should clarify).
124 CHAPTER 6. FUNCTIONS

Definition 188 Given any two functions $f$ and $g$ in $\mathbb{R}^A$, the sum function $f + g$ is the element of $\mathbb{R}^A$ such that
$$(f + g)(x) = f(x) + g(x) \quad \forall x \in A$$

The sum function $f + g : A \to \mathbb{R}$ is thus constructed by adding, for each element $x$ of the domain $A$, the images $f(x)$ and $g(x)$ of $x$ under the two functions.

Example 189 Let $\mathbb{R}^{\mathbb{R}}$ be the set of all the functions $f : \mathbb{R} \to \mathbb{R}$. Consider $f(x) = x$ and $g(x) = x^2$. The sum function $f + g$ is defined by $(f + g)(x) = x + x^2$. N

In a similar way we define:

(i) the difference function $(f - g)(x) = f(x) - g(x)$ for every $x \in A$;
(ii) the product function $(fg)(x) = f(x)\, g(x)$ for every $x \in A$;
(iii) the quotient function $(f/g)(x) = f(x)/g(x)$ for every $x \in A$, provided $g(x) \neq 0$.

We have thus introduced four operations in the set $\mathbb{R}^A$, based on the four basic operations on the real numbers. It is easy to see that these operations inherit the properties of the basic operations. For example, addition is commutative, $f + g = g + f$, and associative, $(f + g) + h = f + (g + h)$.
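These pointwise operations are easy to realize in code. A minimal Python sketch of ours (the helper name `add` is an illustrative assumption), using Example 189:

    # Pointwise sum on R^A, as in Definition 188: (f + g)(x) = f(x) + g(x).
    def add(f, g):
        return lambda x: f(x) + g(x)

    f = lambda x: x        # f(x) = x
    g = lambda x: x ** 2   # g(x) = x^2
    h = add(f, g)          # h = f + g of Example 189

    print(h(3))            # 12 = 3 + 3^2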

N.B. (i) These operations require the functions to have the same domain $A$. For example, if $f(x) = x^2$ and $g(x) = \sqrt{x}$, the sum $f + g$ is meaningless because, for $x < 0$, the function $g$ is not defined. (ii) The domain $A$ can be any set: numbers, chairs, or whatever else. Instead, it is key that the codomain be $\mathbb{R}$, because it is among real numbers that we are able to perform the four basic operations. O

6.3.3 Composition

Consider two functions $f : A \to B$ and $g : C \to D$, with $\operatorname{Im} f \subseteq C$. Take any point $x \in A$. Since $\operatorname{Im} f \subseteq C$, the image $f(x)$ belongs to the domain $C$ of the function $g$. We can then apply the function $g$ to the image $f(x)$, obtaining in such a way the element $g(f(x))$ of $D$. Indeed, the function $g$ has as its argument the image $f(x)$ of $x$. Graphically:

[Figure: $x \mapsto f(x) \mapsto g(f(x))$ across the sets $A$, $\operatorname{Im} f \subseteq C$, and $D$]

We have, therefore, associated to each element $x$ of the set $A$ the element $g(f(x))$ of the set $D$. This rule, called composition, starts with the functions $f$ and $g$ and defines a new function from $A$ to $D$, denoted by $g \circ f$. Formally:

Definition 190 Let $A$, $B$, $C$ and $D$ be four sets and $f : A \to B$ and $g : C \to D$ two functions. If $\operatorname{Im} f \subseteq C$, the composite (or compound) function $g \circ f : A \to D$ is defined by
$$(g \circ f)(x) = g(f(x)) \quad \forall x \in A$$

Note that the inclusion condition, $\operatorname{Im} f \subseteq C$, is key in making the composition possible. Let us give some examples.

Example 191 Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^2$ and $g(x) = x + 1$. In this case $A = B = C = D = \mathbb{R}$, so the inclusion condition is trivially satisfied. Consider $g \circ f$. Given $x \in \mathbb{R}$, we have $f(x) = x^2$. The function $g$ therefore has $x^2$ as its argument, so
$$g(f(x)) = g(x^2) = x^2 + 1$$
Hence, the composite function $g \circ f : \mathbb{R} \to \mathbb{R}$ is given by $(g \circ f)(x) = x^2 + 1$.
Consider instead $f \circ g$. Given $x \in \mathbb{R}$, one has $g(x) = x + 1$. The function $f$ therefore has $x + 1$ as its argument, whence
$$f(g(x)) = f(x + 1) = (x + 1)^2$$
The composite function $f \circ g : \mathbb{R} \to \mathbb{R}$ is thus given by $(f \circ g)(x) = (x + 1)^2$. N
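Definition 190 and the example above can be mirrored in code. A minimal Python sketch of ours (the helper name `compose` is an illustrative assumption):

    # Composition of Definition 190: (g o f)(x) = g(f(x)).
    def compose(g, f):
        return lambda x: g(f(x))

    f = lambda x: x ** 2   # f(x) = x^2
    g = lambda x: x + 1    # g(x) = x + 1

    print(compose(g, f)(3))  # 10 = 3^2 + 1   (g o f)
    print(compose(f, g)(3))  # 16 = (3 + 1)^2 (f o g): the order matters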
Example 192 Consider $f : \mathbb{R}_+ \to \mathbb{R}$ given by $f(x) = \sqrt{x}$ and $g : \mathbb{R} \to \mathbb{R}$ given by $g(x) = x - 1$. In this case $B = C = D = \mathbb{R}$ and $A = \mathbb{R}_+$. The inclusion condition is satisfied for $g \circ f$ because $\operatorname{Im} f = \mathbb{R}_+ \subseteq \mathbb{R}$, but not for $f \circ g$ because $\operatorname{Im} g = \mathbb{R}$ is not included in $\mathbb{R}_+$, which is the domain of $f$.
Consider $g \circ f$. Given $x \in \mathbb{R}_+$, we have $f(x) = \sqrt{x}$. The function $g$ therefore has $\sqrt{x}$ as its argument, so
$$g(f(x)) = g(\sqrt{x}) = \sqrt{x} - 1$$
The composite function $g \circ f : \mathbb{R}_+ \to \mathbb{R}$ is given by $(g \circ f)(x) = \sqrt{x} - 1$. N

Example 193 If in the previous example we consider $\tilde{g} : [1, +\infty) \to \mathbb{R}$ given by $\tilde{g}(x) = x - 1$, the inclusion condition is satisfied for $f \circ \tilde{g}$ because $\operatorname{Im} \tilde{g} = [0, +\infty) = \mathbb{R}_+$. In particular, $f \circ \tilde{g} : [1, +\infty) \to \mathbb{R}$ is given by $(f \circ \tilde{g})(x) = \sqrt{x - 1}$. As we will soon see in Section 6.7, the function $\tilde{g}$ is the restriction of $g$ to $[1, +\infty)$. N

Example 194 Let $A$ be the set of all citizens of a country, $f : A \to \mathbb{R}$ the function that to each of them associates his income for this year, and $g : \mathbb{R} \to \mathbb{R}$ the function that to each possible income associates the tax that must be paid. The composite function $g \circ f : A \to \mathbb{R}$ establishes the correspondence between each citizen and the tax that he has to pay. For the revenue service (and also for the citizens) such a composite function is of great interest. N

Example 195 Consider any function $g : \mathbb{R}_+ \to \mathbb{R}$ and the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x_1, x_2) = x_1^2 + x_2^2$. The composite function $g \circ f : \mathbb{R}^2 \to \mathbb{R}$, given by $(g \circ f)(x) = g(x_1^2 + x_2^2)$, takes on the same values on all circles centered at the origin. For instance, if $g(x) = \sqrt{x}$ then $(g \circ f)(x) = \sqrt{x_1^2 + x_2^2}$ is the norm of $x$. N

6.4 Classes of functions


In this section we introduce some important classes of functions.

6.4.1 Injective, surjective, and bijective functions


Given any two sets $A$ and $B$, a function $f : A \to B$ is called injective (or one-to-one) if
$$x \neq y \implies f(x) \neq f(y) \quad \forall x, y \in A \qquad (6.8)$$
To different elements of the domain, an injective $f$ thus associates different elements of the codomain. Graphically:

[Figure: distinct elements of $A$ mapped to distinct elements of $B$]

Example 196 A simple example of an injective function is the cubic $f(x) = x^3$. Indeed, two distinct scalars always have distinct cubes, so $x \neq y$ implies $x^3 \neq y^3$ for all $x, y \in \mathbb{R}$. A classic example of a non-injective function is the quadratic $f(x) = x^2$: for instance, to the two distinct points $2$ and $-2$ of $\mathbb{R}$ there corresponds the same square, that is, $f(2) = f(-2) = 4$. N

Note that (6.8) is equivalent to the contrapositive:¹²
$$f(x) = f(y) \implies x = y \quad \forall x, y \in A$$
which requires that two elements of the domain that have the same image be equal.
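On a finite domain, injectivity can be tested by brute force: the function is injective exactly when no two distinct points share an image. A minimal Python sketch of ours (the function name is an illustrative assumption), revisiting Example 196:

    def is_injective(f, A):
        # f is injective on the finite set A iff all images are distinct
        images = [f(x) for x in A]
        return len(images) == len(set(images))

    A = [-2, -1, 0, 1, 2]
    print(is_injective(lambda x: x ** 3, A))  # True: the cubic is injective
    print(is_injective(lambda x: x ** 2, A))  # False: f(2) = f(-2) = 4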

Given any two sets $A$ and $B$, a function $f : A \to B$ is called surjective (or onto) if
$$\operatorname{Im} f = B$$
that is, if for each element $y$ of $B$ there exists at least one element $x$ of $A$ such that $f(x) = y$. In other words, a function is surjective if each element of the codomain is the image of at least one point of the domain.

¹²Given two properties $p$ and $q$, we have $p \implies q$ if and only if $\neg q \implies \neg p$ ($\neg$ stands for "not"). The implication $\neg q \implies \neg p$ is the contrapositive of the original implication $p \implies q$. See Appendix D.

Example 197 The cubic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is surjective because each $y \in \mathbb{R}$ is the image of $y^{1/3} \in \mathbb{R}$, that is, $f(y^{1/3}) = y$. On the other hand, the quadratic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ is not surjective, because no $y < 0$ is the image of a point of the domain. N

A function $f : A \to B$ can always be written as $f : A \to \operatorname{Im} f$, that is, it can be made surjective by taking $B = \operatorname{Im} f$. For example, if we write the quadratic function as $f : \mathbb{R} \to \mathbb{R}_+$, it becomes surjective. Therefore, by suitably choosing the codomain, each function becomes surjective. This, however, does not mean that surjectivity is a notion without interest: as we will see, the set $B$ is often fixed a priori (for various reasons) and it is then important to distinguish the functions that have $B$ as image, that is, the surjective ones, from those whose image is only contained in $B$.

Finally, given any two sets $A$ and $B$, a function $f : A \to B$ is called bijective if it is both injective and surjective. In this case, we can go "back and forth" between the sets $A$ and $B$ by using $f$: from any $x \in A$ we arrive at a unique $y = f(x) \in B$, while from any $y \in B$ we go back to a unique $x \in A$ such that $y = f(x)$. Graphically:

[Figure: a one-to-one correspondence between $A = \{a_1, a_2, a_3\}$ and $B = \{b_1, b_2, b_3\}$]

For example, the cubic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is bijective.

Through bijective functions we can establish a simple, but interesting, result about finite sets. Here |A| denotes the cardinality of a finite set A, that is, the number of its elements.

Proposition 198 Let A and B be any two finite sets. There exists a bijection f : A → B if and only if |A| = |B|.

As we will see in Chapter 7, by paraphrasing a famous sentence of David Hilbert we can say that this result is the door to the paradise of Cantor.

Proof “If”. Let |A| = |B| = n and write A = {a1, a2, …, an} and B = {b1, b2, …, bn}. Then define the bijection f : A → B by f(ai) = bi for i = 1, 2, …, n. “Only if”. Let f : A → B be a bijection. By injectivity, we have |A| ≤ |B|. Indeed, to each x ∈ A there corresponds a distinct f(x) ∈ B. On the other hand, by surjectivity we have |B| ≤ |A|. Indeed, for each y1 ≠ y2 we have f⁻¹(y1) ∩ f⁻¹(y2) = ∅. Hence, setting C = {f⁻¹(y) : y ∈ B}, we have |B| = |C|. But it is easy to see that |C| ≤ |A|, and so |B| ≤ |A|. We conclude that |A| = |B|.
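For readers who like to experiment, the content of Proposition 198 is easy to test mechanically. The following minimal sketch (in Python, our choice of language; all names are ours) represents a function between finite sets as a dictionary and checks injectivity and surjectivity:

    def is_injective(f):
        # f is a dict: domain element -> codomain element
        images = list(f.values())
        return len(images) == len(set(images))

    def is_surjective(f, B):
        return set(f.values()) == set(B)

    A = {"a1", "a2", "a3"}
    B = {"b1", "b2", "b3"}
    f = {"a1": "b1", "a2": "b2", "a3": "b3"}

    # A bijection between finite sets can exist only when |A| = |B|
    assert is_injective(f) and is_surjective(f, B)
    assert len(A) == len(B)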

6.4.2 Inverse functions

Given any two sets A and B, let f : A → B be an injective function. Then, to each element y of the image Im f there corresponds a unique element x ∈ A such that f(x) = y. The function so determined, called the inverse function of f, therefore associates to each element of the image of f its unique preimage. Formally:

Definition 199 Let f : A → B be an injective function. The function f⁻¹ : Im f → A defined by f⁻¹(y) = x if and only if f(x) = y is called the inverse function of f.

We have both

f⁻¹(f(x)) = x   ∀x ∈ A                              (6.9)

and

f(f⁻¹(y)) = y   ∀y ∈ Im f                           (6.10)

Inverse functions go in the opposite direction of the original ones; they retrace the steps back to the domain: from x ∈ A we arrive at f(x) ∈ B, and we go back with f⁻¹(f(x)) = x. Graphically:

[Figure: f maps x ∈ A to y ∈ B, and f⁻¹ maps y back to x]

It makes sense to talk about the inverse function only for injective functions, which are then called invertible. Indeed, if f were not injective, there would be at least two elements of the domain x1 ≠ x2 with the same image y = f(x1) = f(x2). So, the set of preimages of y would not be a singleton (because it would contain at least the two elements x1 and x2) and the relation f⁻¹ would not be a function.
We actually have f⁻¹ : B → A when the function f is also surjective, and so bijective. In such a case the domain of the inverse is the entire codomain of f.

Example 200 (i) Let f : R → R be the bijective function f(x) = x³. From y = x³ it follows that x = y^(1/3). The inverse f⁻¹ : R → R is given by f⁻¹(y) = y^(1/3). That is, because the label of the independent variable is irrelevant, f⁻¹(x) = x^(1/3).
(ii) Let f : R → R₊₊ be the bijective function f(x) = 3ˣ. From y = 3ˣ it follows that x = log₃ y. The inverse f⁻¹ : R₊₊ → R is given by f⁻¹(y) = log₃ y, that is, f⁻¹(x) = log₃ x. N

Example 201 Let f : R → R be defined by

f(x) =  x/2   if x < 0
        3x    if x ≥ 0

From y = x/2 it follows x = 2y, while from y = 3x it follows x = y/3. Therefore,

f⁻¹(y) =  2y    if y < 0
          y/3   if y ≥ 0

N

Example 202 Let f : R \ {0} → R be defined by f(x) = 1/x. From y = 1/x it follows that x = 1/y, so f⁻¹ : R \ {0} → R is given by f⁻¹(y) = 1/y. In this case f = f⁻¹. Note that R \ {0} is both the domain of f⁻¹ and the image of f. N

Example 203 The curious function f : R → R defined by

f(x) =  x     if x ∈ Q
        −x    if x ∉ Q

is bijective, so invertible, with f⁻¹ : R → R. In this case too we have f = f⁻¹, as the reader can check. N

It is easy to see that, when it exists, the inverse (g ∘ f)⁻¹ of the composite function g ∘ f is

(g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹                               (6.11)

That is, it is the composition of the inverse functions, with their order exchanged. Indeed, from y = g(f(x)) we get g⁻¹(y) = f(x) and finally f⁻¹(g⁻¹(y)) = x. To put it vividly: in dressing, we first put on the underpants, f, and then the pants, g; in undressing, we first take off the pants, g⁻¹, and then the underpants, f⁻¹.
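Rule (6.11) is also easy to verify numerically. A minimal sketch (ours, in Python), with f(x) = x³ and g(x) = 3x, checking that composing the inverses in the exchanged order undoes g ∘ f:

    f = lambda x: x ** 3            # f(x) = x^3, inverse: cube root
    g = lambda x: 3 * x             # g(x) = 3x,  inverse: y/3
    f_inv = lambda y: y ** (1 / 3)
    g_inv = lambda y: y / 3

    x = 2.0
    y = g(f(x))                     # (g o f)(x) = 3x^3 = 24
    # (g o f)^(-1) = f^(-1) o g^(-1): first undo g, then undo f
    assert abs(f_inv(g_inv(y)) - x) < 1e-12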

O.R. The graph of the inverse function f⁻¹ is the mirror image of the graph of the function f with respect to the 45-degree line:

[Figure: graphs of f and f⁻¹, symmetric about the 45-degree line]

Inverses and cryptography The computation of the cube x³ of any scalar x is much easier than the computation of the cube root ∛x: it is much easier to compute 80³ = 512,000 (three multiplications suffice) than ∛512,000 = 80. In other words, the computation of the cubic function f(x) = x³ is much easier than the computation of its inverse f⁻¹(x) = ∛x. This computational difference increases significantly as we take higher and higher odd powers (for example f(x) = x⁵, f(x) = x⁷, and so on).
Similarly, while the computation of eˣ is fairly easy, that of log x is much harder (before electronic calculators became available, logarithmic tables were used to aid such computations). From a computational viewpoint (in the theoretical world everything works smoothly), the inverse function f⁻¹ may be very difficult to deal with. Injective functions for which the computation of f is easy, while that of the inverse f⁻¹ is complex, are called one-way.¹³
For example, let A = {(p, q) ∈ P × P : p < q} and consider the function f : A ⊆ P × P → N defined by f(p, q) = pq, which associates to each pair of prime numbers p, q ∈ P, with p < q, their product pq. For example, f(2, 3) = 6 and f(11, 13) = 143. By the Fundamental Theorem of Arithmetic, it is an injective function.¹⁴ Given two prime numbers p and q, the computation of their product is a trivial multiplication. Instead, given any natural number n, it is quite complex, and may require a long time even for a powerful computer, to determine whether it is the product of two prime numbers. In this regard, the reader may recall the discussion of factorization and primality tests in Section 1.3.2 (to experience the difficulty first-hand, the reader may try to check whether the number 4343 is the product of two prime numbers). This makes the computation of the inverse function f⁻¹ very complex, as opposed to the very simple computation of f. For this reason, f is a classic example of a one-way function.

¹³ The notions of “simple” and “complex”, here used qualitatively, can be made more rigorous (as the curious reader may discover in cryptography texts).

Let us now look at a simple application of one-way functions to cryptography. Consider a user who handles sensitive data with an information system accessible by means of a password. Suppose the password is numerical and that, for the sake of simplicity, it is made up of a pair of natural numbers. The system has a specific data storage unit in which it saves the password chosen by the user. When the user inputs this password, the system verifies whether it coincides with the one stored in its memory.
This scheme has an obvious Achilles’ heel: the system manager can access the data storage and reveal the password to any third party interested in accessing the user’s personal data. One-way functions help to mitigate this problem. Indeed, let f : A ⊆ N × N → N be a one-way function that associates a natural number f(n, m) to any pair of natural numbers (n, m) ∈ A. Instead of memorizing the chosen password, say (n̄, m̄), the system now memorizes its image f(n̄, m̄). When the user inserts a password (n, m), the system computes f(n, m) and compares it with f(n̄, m̄). If f(n, m) = f(n̄, m̄), the password is correct – that is, (n, m) = (n̄, m̄) – and the system allows the user to log in.
Since the function is one-way, the computation of f(n, m) is simple and requires a level of effort only slightly higher than that needed to compare passwords directly. The memory will no longer store the password (n̄, m̄), but its image f(n̄, m̄). This image will be the only piece of information that the manager will be able to access. Even if he (or the third party to whom he gives the information) knows the function f, the fact that the computation of the inverse f⁻¹ is very complex (and requires a good deal of time) makes it computationally, and so practically, very difficult to recover the password (n̄, m̄) from the knowledge of f(n̄, m̄). But, without the knowledge of (n̄, m̄) it is impossible to access the sensitive data.
For example, if instead of a pair of arbitrary natural numbers we require the password to consist of a pair (p, q) of prime numbers, we can use f(p, q) = pq as a one-way function. The manager has access only to the product pq, for example the number 4343, and it will not be easy for him to recover the pair of prime numbers (p, q) that generated the product – and so the password – in a reasonably short amount of time.
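The verification scheme just described can be sketched in a few lines. This is only a toy illustration of the idea – a real system would use a cryptographic hash function rather than the product of two primes – and all names are ours:

    def f(p, q):
        # Toy one-way function: product of the two primes of the password
        return p * q

    stored_image = f(43, 101)      # the system stores 4343, not (43, 101)

    def login(p, q):
        # The system recomputes f and compares images; the password itself
        # is never stored, so the manager only ever sees the number 4343
        return f(p, q) == stored_image

    print(login(43, 101))   # True: correct password
    print(login(41, 103))   # False: wrong password

Injectivity of f on pairs of primes is what guarantees that equality of images implies equality of passwords.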
To sum up, one-way functions make it possible to significantly strengthen the protection of restricted-access systems. The design of better and better one-way functions, which combine the ease of computation of f(x) with increasingly complex inverses f⁻¹(x), is an important field of research in cryptography.

¹⁴ But not surjective: for example, 4 ∉ Im f because there are no two different prime numbers whose product is 4.

6.4.3 Bounded functions

Let f : A → R be a function with domain an arbitrary set A and codomain the real line. We say that f is:

(i) bounded (from) above if its image Im f is a set bounded above in R, i.e., if there exists M ∈ R such that f(x) ≤ M for every x ∈ A;

(ii) bounded (from) below if its image Im f is a set bounded below in R, i.e., if there exists m ∈ R such that f(x) ≥ m for every x ∈ A;

(iii) bounded if it is bounded both above and below.

For example, the function f : R \ {0} → R given by

f(x) = 1/|x|

is bounded below, but not above, since f(x) ≥ 0 for every x ≠ 0. Instead, the function f : R → R given by f(x) = −x² is bounded above, but not below, since f(x) ≤ 0 for every x ∈ R.
The next lemma establishes a simple, but useful, condition of boundedness.

Lemma 204 A function f : A → R is bounded if and only if there exists k > 0 such that

|f(x)| ≤ k   ∀x ∈ A                                 (6.12)

Proof If f is bounded, there exist m, M ∈ R such that m ≤ f(x) ≤ M. Let k > 0 be such that −k ≤ m ≤ M ≤ k. Then (6.12) holds. Vice versa, suppose that (6.12) holds. By (4.1), which holds also for ≤, we have −k ≤ f(x) ≤ k, so f is bounded both above and below.

The function f : R → R defined by

f(x) =  1     if x ≥ 1
        0     if 0 < x < 1                          (6.13)
        −2    if x ≤ 0

is bounded since |f(x)| ≤ 2 for every x ∈ R.

Thus, we have a first taxonomy of the real-valued functions f : A → R, that is, of the elements of the space R^A.¹⁵ This taxonomy is not exhaustive: there exist functions that do not satisfy any of the conditions (i)–(iii). This is the case, for example, of the function f(x) = x. Such “unclassified” functions are called unbounded (their image being an unbounded set).

We denote by sup_{x∈A} f(x) the supremum of the image of a function f : A → R bounded above, that is,

sup_{x∈A} f(x) = sup(Im f)

¹⁵ Note the use of the term “space” to denote a set of reference (in this case R^A).

By the definition of the supremum, for a scalar M we have f(x) ≤ M for all x ∈ A if and only if sup_{x∈A} f(x) ≤ M.
Similarly, we denote by inf_{x∈A} f(x) the infimum of the image of a function f : A → R bounded below, that is,

inf_{x∈A} f(x) = inf(Im f)

By the definition of the infimum, for a scalar m we have f(x) ≥ m for all x ∈ A if and only if inf_{x∈A} f(x) ≥ m.
Clearly, a bounded function f : A → R has both extrema, with

inf_{x∈A} f(x) ≤ f(x) ≤ sup_{x∈A} f(x)   ∀x ∈ A

In particular, for two scalars m and M we have m ≤ f(x) ≤ M for all x ∈ A if and only if m ≤ inf_{x∈A} f(x) ≤ sup_{x∈A} f(x) ≤ M.

Example 205 For the function (6.13) we have sup_{x∈R} f(x) = 1 and inf_{x∈R} f(x) = −2. For the function f : R \ {0} → R given by f(x) = 1/|x|, which is bounded below but not above, one has inf_{x∈R\{0}} f(x) = 0. N

6.4.4 Monotonic functions

We now introduce monotonic functions, an important class of real-valued functions f : A ⊆ Rⁿ → R defined in terms of the underlying order structure of Rⁿ.

Monotonic functions on R
We begin by studying scalar functions.

Definition 206 A function f : A ⊆ R → R is said to be:

(i) increasing if

x > y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A                    (6.14)

and strictly increasing if

x > y ⟹ f(x) > f(y)   ∀x, y ∈ A                    (6.15)

(ii) decreasing if

x > y ⟹ f(x) ≤ f(y)   ∀x, y ∈ A                    (6.16)

and strictly decreasing if

x > y ⟹ f(x) < f(y)   ∀x, y ∈ A

(iii) constant if there exists k ∈ R such that

f(x) = k   ∀x ∈ A

Note that a function is constant if and only if it is both increasing and decreasing. In other words, constancy is equivalent to having both monotonicity properties. This is why we have introduced constancy among the forms of monotonicity. Soon we will see that in the multivariable case the relation between constancy and monotonicity is a bit more subtle.

Increasing or decreasing functions are called, generically, monotonic (or monotone). They are called strictly monotonic when they are either strictly increasing or strictly decreasing (two mutually exclusive properties: there are no functions that are both strictly increasing and strictly decreasing). The next result shows that strict monotonicity excludes the possibility that the function be constant on some region of its domain. Formally:

Proposition 207 An increasing function f : A ⊆ R → R is strictly increasing if and only if

f(x) = f(y) ⟹ x = y   ∀x, y ∈ A                    (6.17)

that is, if and only if it is injective.

A similar result holds for strictly decreasing functions. Strictly monotonic functions are therefore injective, and so invertible.¹⁶

Proof “Only if”. Let f be strictly increasing and let f(x) = f(y). Suppose, by contradiction, that x ≠ y, say x > y. By (6.15), we have f(x) ≠ f(y), which contradicts f(x) = f(y). It follows that x = y, as desired.
“If”. Suppose that (6.17) holds and let f be increasing. We prove that it is also strictly increasing. Let x > y. By increasing monotonicity, we have f(x) ≥ f(y), but we cannot have f(x) = f(y) because (6.17) would imply x = y. Thus f(x) > f(y), as claimed.

Example 208 The functions f : R → R given by f(x) = x and f(x) = x³ are strictly increasing, while the function

f(x) =  x    if x ≥ 0
        0    if x < 0

is increasing, but not strictly increasing, because it is constant for every x < 0. The same is true for the function

f(x) =  x − 1    if x ≥ 1
        0        if −1 < x < 1                      (6.18)
        x + 1    if x ≤ −1

because it is constant on [−1, 1]. N

Note that in (6.14) we can replace x > y by x ≥ y without any consequence because we have f(x) = f(y) if x = y. Hence, increasing monotonicity can be equivalently stated as

x ≥ y ⟹ f(x) ≥ f(y)                                (6.19)

Consider the converse implication

f(x) ≥ f(y) ⟹ x ≥ y                                (6.20)

¹⁶ Later in the book we will see a partial converse of this result (Proposition 495).

It requires that larger values of the image correspond to larger values of the argument. Clearly, f(x) = f(y) is equivalent to having both f(x) ≥ f(y) and f(y) ≥ f(x), which in turn, by (6.20), imply both x ≥ y and y ≥ x, that is, x = y. Therefore, from (6.20) it follows that

f(x) = f(y) ⟹ x = y                                (6.21)

In view of Proposition 207, we conclude that an increasing function that also satisfies (6.20) is strictly increasing. The next result shows that the converse is also true, thus establishing an important characterization of strictly increasing functions (a dual result holds for strictly decreasing functions).

Proposition 209 A function f : A ⊆ R → R is strictly increasing if and only if

x ≥ y ⟺ f(x) ≥ f(y)   ∀x, y ∈ A                    (6.22)

Momentarily, we will see that this result plays an important role in the ordinalist approach to utility theory.

Proof Thanks to what we have seen above, we just need to prove the “only if” part, i.e., that a strictly increasing function satisfies (6.22). Since a strictly increasing function is increasing, the implication

x ≥ y ⟹ f(x) ≥ f(y)

is obvious. To prove (6.22) it remains to show that

f(x) ≥ f(y) ⟹ x ≥ y

Let f(x) ≥ f(y) and suppose, by contradiction, that x < y. Strict increasing monotonicity implies f(x) < f(y), which contradicts f(x) ≥ f(y). So x ≥ y, as desired.
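Condition (6.22) also suggests a simple finite test: on any sample of points, a strictly increasing function must rank images exactly as it ranks arguments. A small sketch (ours, in Python) that confirms this for x³ and falsifies it for x²:

    def respects_order(f, points):
        # Check x >= y  <=>  f(x) >= f(y) on all pairs of sample points
        return all((x >= y) == (f(x) >= f(y))
                   for x in points for y in points)

    sample = [-2, -1, 0, 1, 2]
    print(respects_order(lambda x: x ** 3, sample))  # True: strictly increasing
    print(respects_order(lambda x: x ** 2, sample))  # False: -2 < 1 but 4 > 1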

Monotonic functions on Rⁿ

The monotonicity notions seen in the case n = 1 generalize in a natural way to the case of arbitrary n, though some subtle issues arise because of the two peculiarities of the case n ≥ 2, that is, the incompleteness of ≥ and the presence of two notions of strict inequality, > and ≫.
Basic monotonicity is easily generalized: a function f : A ⊆ Rⁿ → R is said to be:

(i) increasing if

x ≥ y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A                    (6.23)

(ii) decreasing if

x ≥ y ⟹ f(x) ≤ f(y)   ∀x, y ∈ A

(iii) constant if there exists k ∈ R such that

f(x) = k   ∀x ∈ A

This notion of increasing and decreasing function has bite only on vectors x and y that can be compared; vectors x and y that cannot be compared, such as for example (1, 2) and (2, 1) in R², are ignored. As a result, while constant functions are both increasing and decreasing, the converse is no longer true when n ≥ 2, as the next example shows.

Example 210 Let A = {a, a′, b, b′} be a subset of the plane with four elements. Assume that a ≤ a′ and b ≤ b′ are the only comparisons that can be made in A. For instance, a = (−1, 0), a′ = (0, 1), b = (1, −1/2), and b′ = (2, −1/2). The function f : A ⊆ R² → R defined by f(a) = f(a′) = 0 and f(b) = f(b′) = 1 is both increasing and decreasing, but it is not constant. N

More delicate is the generalization to Rⁿ of strict monotonicity, because of the two distinct concepts of strict inequality.¹⁷ We say that a function f : A ⊆ Rⁿ → R is:

(iv) strictly increasing if

x > y ⟹ f(x) > f(y)   ∀x, y ∈ A

(v) strongly increasing if it is increasing and

x ≫ y ⟹ f(x) > f(y)   ∀x, y ∈ A                    (6.24)

We have a simple hierarchy among these notions:

Proposition 211 Let f : A ⊆ Rⁿ → R. We have:

strictly increasing ⟹ strongly increasing ⟹ increasing      (6.25)

They are, therefore, increasingly stringent notions of monotonicity. In applications we have to choose the form most relevant for the problem at hand.

Proof A strongly increasing function is, by definition, increasing. It remains to prove that strictly increasing implies strongly increasing. Thus, let f be strictly increasing. We need to prove that f is increasing and satisfies (6.24). If x ≥ y, we have x = y or x > y. In the first case f(x) = f(y); in the second case f(x) > f(y); so f(x) ≥ f(y). Thus, f is increasing. Moreover, if x ≫ y then a fortiori x > y, and therefore f(x) > f(y). We conclude that f is strongly increasing.

The converses of the previous implications do not hold. An increasing function that, like (6.18), has constant parts is an example of an increasing, but not strongly increasing, function (so, not strictly increasing either¹⁸). Therefore,

increasing ⇏ strongly increasing

Moreover, the next example shows that there exist functions that are strongly but not strictly increasing, that is,

strongly increasing ⇏ strictly increasing

¹⁷ We focus on the increasing case, leaving the decreasing case to the reader.
¹⁸ By the contrapositive of (6.25), a function which is not strongly increasing is not strictly increasing either.

Example 212 The Leontief function f : R² → R given by

f(x) = min{x1, x2}

is strongly increasing, but not strictly increasing. For example, x = (1, 2) > y = (1, 1) but f(x) = f(y) = 1. N
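Both facts can be checked mechanically. A small sketch (ours, in Python) testing the Leontief function on the vectors of Example 212 and on a pair with x ≫ y:

    leontief = lambda x: min(x)

    # x > y (greater in one component, equal in the other): utility ties,
    # so the function is not strictly increasing
    x, y = (1, 2), (1, 1)
    print(leontief(x) == leontief(y))   # True

    # x >> y (strictly greater in every component): utility strictly rises,
    # consistent with strong monotonicity
    x, y = (2, 3), (1, 2)
    print(leontief(x) > leontief(y))    # True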

N.B. For operators f : Rⁿ → Rᵐ with m > 1 the notions of monotonicity studied for the case m = 1 take on a different meaning, since also the images f(x) and f(y) might not be comparable, that is, neither f(x) ≥ f(y) nor f(y) ≥ f(x) may hold. For example, if f : R² → R² is such that f(0, 1) = (1, 2) and f(3, 4) = (2, 1), the images (1, 2) and (2, 1) are not comparable. A notion of monotonicity suitable for operators f : Rⁿ → Rᵐ when m > 1 will be studied in Section 24.2.2. O

Utility functions

Let u : A → R be a utility function defined on a set A ⊆ Rⁿ₊ of bundles of goods. A transformation f ∘ u : A → R of u, where f : Im u ⊆ R → R, defines a mathematically different but conceptually equivalent utility function provided

u(x) ≥ u(y) ⟺ (f ∘ u)(x) ≥ (f ∘ u)(y)   ∀x, y ∈ A   (6.26)

Indeed, under this condition the function f ∘ u orders the bundles in the same way as the original utility function u, that is,

x ≿ y ⟺ (f ∘ u)(x) ≥ (f ∘ u)(y)   ∀x, y ∈ A

The utility functions u and f ∘ u are thus equivalent because they represent the same underlying preference ≿.
By Proposition 209, the function f satisfies (6.26) if and only if it is strictly increasing. Therefore, f ∘ u is an equivalent utility function if and only if f is strictly increasing. To describe this fundamental invariance property of utility functions, we say that they are ordinal, that is, unique up to (strictly increasing) monotonic transformations. This property lies at the heart of the ordinalist approach, in which utility functions are regarded as mere numerical representations of the underlying preference ≿, which is the fundamental notion (recall the discussion in Section 6.2.1).

Example 213 Consider the Cobb-Douglas utility function on Rⁿ₊₊ given by

u(x1, x2, …, xn) = ∏_{i=1}^n xi^αi                  (6.27)

with each αi > 0 and ∑_{i=1}^n αi = 1. Taking f(x) = log x, its monotonic transformation

(f ∘ u)(x) = ∑_{i=1}^n αi log xi

is a utility function equivalent to u on Rⁿ₊₊. It is the logarithmic version of the Cobb-Douglas function, often called the log-linear utility function.¹⁹ N
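That u and f ∘ u rank bundles identically is easy to see numerically. A sketch (ours, in Python), with hypothetical weights α = (0.3, 0.7):

    import math

    alpha = (0.3, 0.7)

    def cobb_douglas(x):
        return math.prod(xi ** ai for xi, ai in zip(x, alpha))

    def log_linear(x):
        return sum(ai * math.log(xi) for xi, ai in zip(x, alpha))

    bundles = [(1, 2), (2, 1), (3, 3), (0.5, 4)]
    # The two utility functions induce the same ranking of the bundles
    rank_u = sorted(bundles, key=cobb_douglas)
    rank_f = sorted(bundles, key=log_linear)
    print(rank_u == rank_f)   # True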

The three notions of monotonicity on Rⁿ – increasing, strongly increasing, and strictly increasing – are key for utility functions u : A → R. Since their argument x ∈ Rⁿ is a bundle of “goods”, it is natural to assume that the consumer prefers vectors with larger amounts of the different goods, that is, “the more, the better”. According to how we state this motto, one of the three forms of monotonicity becomes the appropriate one.
If in a vector x ∈ Rⁿ each component – i.e., each type of good – is deemed important by the consumer, it is natural to assume that u is strictly increasing:

x > y ⟹ u(x) > u(y)   ∀x, y ∈ A

In this case it is sufficient to increase the amount of any one of the goods to attain a greater utility: “the more of any good is always better”.
If, instead, we want to contemplate the possibility that some goods may actually be useless for the consumer, we only require u to be increasing:

x ≥ y ⟹ u(x) ≥ u(y)   ∀x, y ∈ A                    (6.28)

Indeed, if a good in the bundles is “useless” for the consumer (as wine is for a teetotaler, or for a drunk person who has already had too much of it), the inequality x > y might be caused by a larger amount of such a good, with all other goods unchanged; it is then reasonable that u(x) = u(y) because the consumer does not get any benefit in passing from y to x. In this case “the more of any good can be better or indifferent”.
Finally, the “more of any good is always better” motto that motivates strict monotonicity can be weakened in the sense of strong monotonicity by assuming “the more of all the goods is always better”, that is,

x ≫ y ⟹ u(x) > u(y)   ∀x, y ∈ A

In this case there is an increase in utility only when the amounts of all goods increase; it is no longer enough to increase the amount of only some good. Strong monotonicity may reflect a form of complementarity among goods, so that an increase in the amounts of only some of them can be irrelevant for the consumer if the quantities of the other goods remain unchanged. Perfect complementarity à la Leontief is the extreme case, a classic example being pairs of shoes, right and left.²⁰

Example 214 (i) The Cobb-Douglas utility function on Rⁿ₊₊ given by (6.27) is strictly increasing. By (6.25), it is also strongly increasing.
(ii) The Leontief utility function on Rⁿ₊₊ given by

u(x1, x2, …, xn) = min_{i=1,…,n} xi

in which the goods are perfect complements, is strongly increasing. As we saw in Example 212, it is not strictly increasing.
(iii) The reader can check which monotonicity properties hold if we consider the two previous utility functions on the entire positive orthant Rⁿ₊ rather than just on Rⁿ₊₊. N

¹⁹ Recall that, even if mathematically it can be defined on the entire positive orthant Rⁿ₊, from the economic viewpoint it is on Rⁿ₊₊ that the Cobb-Douglas utility function is interesting (cf. Example 214). The fact that the log-linear utility function can only be defined on Rⁿ₊₊ can be viewed as a further sign that this is, indeed, the proper economic domain of the Cobb-Douglas utility function.
²⁰ It is useless to increase the number of right shoes without increasing, in the same quantity, that of left shoes (and vice versa).

Consumers with strictly or strongly monotonic utility functions are “insatiable” because, by suitably increasing their bundles, their utility also increases. This property of utility functions is sometimes called insatiability, and it is thus shared by both strict and strong monotonicity. The only form of monotonicity compatible with satiety is increasing monotonicity (6.28): as observed for the drunk consumer, this weaker form of monotonicity allows for the possibility that a given good, when it exceeds a certain level, does not result in a further increase of utility. However, it cannot happen that utility decreases: if (6.28) holds, utility either increases or remains constant, but it never decreases. Therefore, if an extra glass of wine results in a decrease of the drunk’s utility, this cannot be modelled by any form of increasing monotonicity, no matter how weak.

6.4.5 Concave and convex functions: a preview

The class of concave and convex functions is of great importance in economics. The concept,
which will be fully developed in Chapter 14, is anticipated here in the scalar case.

Definition 215 A function f : I → R, defined on an interval I of R, is said to be concave if

f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y)

for every x, y ∈ I and every λ ∈ [0, 1], while it is said to be convex if

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

for every x, y ∈ I and every λ ∈ [0, 1].

Geometrically, a function is concave if the segment (called chord) that joins any two points (x, f(x)) and (y, f(y)) of its graph lies below the graph of the function, while it is convex if the opposite happens, that is, if such a chord lies above the graph of the function. The following figure illustrates:

[Figure: a concave function with a chord below its graph, and a convex function with a chord above its graph]

Note that the domain of concave and convex functions must be an interval, so that the points λx + (1 − λ)y belong to it and the expression f(λx + (1 − λ)y) is meaningful.

Example 216 The functions f, g : R → R defined by f(x) = x² and g(x) = eˣ are convex, while the function f : R₊₊ → R defined by f(x) = log x is concave. The function f : R → R given by f(x) = x³ is neither concave nor convex. All this can be checked analytically through the last definition, but it is best seen graphically:

[Figures: graphs of the convex functions x² and eˣ, of the concave function log x, and of the function x³, which is neither concave nor convex] N
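The chord condition of Definition 215 can also be checked numerically on sample points. A minimal sketch (ours, in Python):

    def violates_convexity(f, xs, lambdas):
        # Return a witness (x, y, lam) with f(lam*x + (1-lam)*y) above the chord
        for x in xs:
            for y in xs:
                for lam in lambdas:
                    z = lam * x + (1 - lam) * y
                    if f(z) > lam * f(x) + (1 - lam) * f(y) + 1e-12:
                        return (x, y, lam)
        return None

    xs = [-2, -1, 0, 1, 2]
    lambdas = [0.25, 0.5, 0.75]
    print(violates_convexity(lambda t: t ** 2, xs, lambdas))  # None: x^2 is convex
    print(violates_convexity(lambda t: t ** 3, xs, lambdas))  # a witness: x^3 is not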

6.4.6 Separable functions

In economics an important role is played by functions of several variables that are sums of scalar functions.

Definition 217 A function f : A ⊆ Rⁿ → R is said to be separable if there exist n scalar functions gi : Ai ⊆ R → R such that

f(x) = ∑_{i=1}^n gi(xi)   ∀x = (x1, …, xn) ∈ A

The importance of this class of functions is due to their great tractability. The simplest example is f(x) = ∑_{i=1}^n xi, for which the functions gi are the identity, i.e., gi(x) = x for each i. Let us give some more examples.

Example 218 The function f : R² → R defined by

f(x) = x1² + 4x2   ∀x = (x1, x2) ∈ R²

is separable, with g1(x1) = x1² and g2(x2) = 4x2. N

Example 219 The function f : Rⁿ₊₊ → R, called entropy, defined by

f(x) = −∑_{i=1}^n xi log xi   ∀x = (x1, …, xn) ∈ Rⁿ₊₊

is separable, with gi(xi) = −xi log xi. N
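Separability means the function can be evaluated coordinate by coordinate and the pieces summed. A sketch (ours, in Python) for the entropy as reconstructed above:

    import math

    def g(t):
        # Scalar piece of the entropy: g(t) = -t log t, for t > 0
        return -t * math.log(t)

    def entropy(x):
        # Separable: the value is the sum of the scalar pieces
        return sum(g(t) for t in x)

    print(entropy((0.5, 0.5)))        # log 2, about 0.6931
    print(entropy((0.2, 0.3, 0.5)))   # each coordinate contributes independently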

Example 220 The intertemporal utility function (6.6), that is,

U(x) = ∑_{t=1}^T β^(t−1) u_t(x_t)

is separable, with g_t(x_t) = β^(t−1) u_t(x_t) for each t.


Separable utility functions are important in the static case as well. The utility functions used by the first marginalists were indeed of the form

u(x) = ∑_{i=1}^n ui(xi)                             (6.29)

In other words, they assumed that the utility of a bundle x is decomposable into the utility of the quantities xi of the various goods that compose it. It is a restrictive assumption that ignores any possible interdependence, for example of complementarity or substitutability, among the different goods of a bundle. Due to its remarkable tractability, however, (6.29) remained for a long time the standard form of the utility functions until, at the end of the nineteenth century, the works of Edgeworth and Pareto showed how to develop consumer theory for utility functions that are not necessarily separable. N

Example 221 If in (6.29) we set ui(xi) = xi for all i, we obtain the important special case

u(x) = ∑_{i=1}^n xi

where the goods are perfect substitutes. The utility of a bundle x depends only on the sum of the amounts of the different goods, regardless of the specific amounts of the individual goods. For example, think of x as a bundle of different types of oranges, which differ in origin and taste, but are identical in terms of nutritional values. In this case, if the consumer only cares about such values, then these different types of oranges are perfect substitutes. This case is opposite to that of the perfect complementarity that characterizes the Leontief utility function.
More generally, if in (6.29) we set ui(xi) = αi xi for all i, with αi > 0, we have

u(x) = ∑_{i=1}^n αi xi

In this case, the goods in the bundle are no longer perfect substitutes; rather, their relevance depends on their weights αi. Therefore, to keep utility constant each good can be replaced with another according to a linear trade-off: intuitively, one unit of good i is equivalent to αi/αj units of good j. The notion of marginal rate of substitution formalizes this idea (Section 25.3.2). N

Example 222 The log-linear utility function

log u(x) = ∑_{i=1}^n αi log xi

studied in Example 213 is separable. It is the logarithmic transformation of the Cobb-Douglas utility function, which is not separable. Thus, sometimes it is possible to obtain separable versions of utility functions via strictly monotonic transformations. Usually, the separable versions are the most convenient from the analytical point of view – the log-linear utility is, indeed, more tractable than the Cobb-Douglas (6.27). N
6.5 Elementary functions on R

This section introduces the so-called “elementary” functions, which include most of the scalar functions of interest in applications. Section 35.10 of Chapter 35 will continue their study.

6.5.1 Polynomial functions

A polynomial function, or polynomial, f : R → R of degree n ≥ 0 is defined by

f(x) = a0 + a1 x + ··· + an xⁿ

with ai ∈ R for every 0 ≤ i ≤ n and an ≠ 0. Let Pn be the set of all polynomials of degree lower than or equal to n. Clearly,

P0 ⊆ P1 ⊆ P2 ⊆ ··· ⊆ Pn ⊆ ···

Example 223 (i) We have f(x) = x + x² ∈ P2 and f(x) = 3x − 10x⁴ ∈ P4. (ii) A polynomial f has degree zero when there exists a ∈ R such that f(x) = a for every x. Constant functions can, therefore, be regarded as polynomials of degree zero. N

The set of all polynomials, of any degree, is denoted by P. That is, P = ⋃_{n≥0} Pn.

6.5.2 Exponential and logarithmic functions

Given a > 0, the function f : R → R defined by

f(x) = aˣ

is called the exponential function of base a. By Lemma 40-(iv), the exponential function is:

(i) strictly increasing if a > 1 (e.g., e > 1);

(ii) constant if a = 1;

(iii) strictly decreasing if 0 < a < 1.

Provided a ≠ 1, the exponential function aˣ is strictly monotonic, and therefore injective. Its inverse has as domain the image (0, ∞) and, by Proposition 43, it is the function f : (0, ∞) → R defined by

f(x) = logₐ x

called the logarithmic function of base a > 0. Note that, by what we have just observed, a ≠ 1.
The properties established in Proposition 43, i.e.,

logₐ aˣ = x   ∀x ∈ R

and

a^(logₐ x) = x   ∀x ∈ (0, ∞)

are therefore nothing but the relations (6.9) and (6.10) for inverse functions – i.e., the relations f⁻¹(f(x)) = x and f(f⁻¹(y)) = y – in the special case of the exponential and logarithmic functions.
The next result summarizes the monotonicity properties of these elementary functions.

Lemma 224 Both the exponential function aˣ and the logarithmic function logₐ x are increasing if a > 1 and decreasing if 0 < a < 1.

Proof For the exponential function, observe that, when a > 1, also aʰ > 1 for every h > 0. Therefore a^(x+h) = aˣ aʰ > aˣ for every h > 0. For the logarithmic function, after observing that logₐ k > 0 if a > 1 and k > 1, we have

logₐ(x + h) = logₐ(x(1 + h/x)) = logₐ x + logₐ(1 + h/x) > logₐ x

for every h > 0, as desired.

That said, in the sequel we will mostly use Napier’s constant e as base, and so we will refer to f(x) = eˣ as the exponential function, without further specification (sometimes it is denoted by f(x) = exp x). Thanks to the remarkable properties of the power eˣ (Section 1.5), the exponential function plays a fundamental role in mathematics and in its applications. Its image is (0, ∞) and its graph is:

[Figure: graph of the exponential function eˣ]

The negative exponential function f(x) = e⁻ˣ is also important. Its graph is:
[Figure: graph of the negative exponential function e⁻ˣ]

In a similar vein, in view of the special importance of the natural logarithm (Section 1.5), we refer to f(x) = log x as the logarithmic function, without further specification. Like the exponential function f(x) = eˣ, which is its inverse, the logarithmic function f(x) = log x is widely used in applications. Its image is R and its graph is:

[Figure: graph of the logarithmic function log x]
The functions eˣ and log x, being each the inverse of the other, have graphs that are mirror images of each other:

[Figure: graphs of eˣ and log x, mirror images with respect to the 45-degree line]

6.5.3 Trigonometric and periodic functions

Trigonometric functions, and more generally periodic functions, are also important in many applications.²¹

²¹ We refer readers to Appendix C for some basic notions of trigonometry.

Trigonometric functions

The sine function f : R → R defined by f(x) = sin x is the first example of a trigonometric function. For each x ∈ R we have

sin(x + 2kπ) = sin x   ∀k ∈ Z

The graph of the sine function is:
[Figure: graph of the sine function]

The function f : R → R defined by f(x) = cos x is the cosine function. For each x ∈ R we have

cos(x + 2kπ) = cos x   ∀k ∈ Z

Its graph is:

[Figure: graph of the cosine function]

Finally, the function f : R \ {π/2 + kπ : k ∈ Z} → R defined by f(x) = tan x is the tangent function. By (C.3),

tan(x + kπ) = tan x   ∀k ∈ Z

The graph is:


[Figure: graph of the tangent function]

It is immediate to see that, for x ∈ (0, π/2), we have the sandwich 0 < sin x < x < tan x.

The functions sin x, cos x and tan x are monotonic (so invertible) on, respectively, the intervals [−π/2, π/2], [0, π], and (−π/2, π/2). Their inverse functions are denoted respectively by arcsin x (or sin⁻¹ x), arccos x (or cos⁻¹ x), and arctan x (or tan⁻¹ x).
Specifically, by restricting ourselves to the interval [−π/2, π/2] of strict monotonicity of the function sin x, we have

sin x : [−π/2, π/2] → [−1, 1]

Hence, the inverse function of sin x is

arcsin x : [−1, 1] → [−π/2, π/2]

with graph:

[Figure: graph of arcsin x]

Restricting ourselves to the interval [0, π] of strict monotonicity of cos x, we have

cos x : [0, π] → [−1, 1]

Therefore, the inverse function of cos x is

arccos x : [−1, 1] → [0, π]

with graph:

[Figure: graph of arccos x]

Finally, restricting ourselves to the interval (−π/2, π/2) of strict monotonicity of tan x, we have

tan x : (−π/2, π/2) → R

so that the inverse function of tan x is

arctan x : R → (−π/2, π/2)

with graph:

[Figure: graph of arctan x]

Note that (2/π) arctan x is a one-to-one correspondence between the real line and the open interval (−1, 1). As we will learn in the next chapter, this means that the open interval (−1, 1) has the same cardinality as the real line.²²

Periodic functions

Trigonometric functions are the most important class of periodic functions.

Definition 225 A function f : R → R is said to be periodic if there exists p ∈ R such that, for each x ∈ R, we have

f(x + kp) = f(x)   ∀k ∈ Z                           (6.30)

The smallest (if it exists) among such p > 0 is called the period of f. In particular, the periodic functions sin x and cos x have period 2π, while the periodic function tan x has period π. Their graphs well illustrate the property that characterizes periodic functions, that is, that of repeating themselves identically on each interval of width p.

Example 226 The functions sin² x and log tan x are periodic with period π. N

Let us now see an example of a periodic function which is not trigonometric.

²² The more readers are puzzled by this remark, the higher the chance that they are actually understanding it.

Example 227 The function f : R → R given by f(x) = x − [x] is called the mantissa.²³ The mantissa of x > 0 is its decimal part; for example, f(2.37) = 0.37. The mantissa function is periodic with period 1. Indeed, by (1.19) we have [x + 1] = [x] + 1 for every x ∈ R. So,

f(x + 1) = x + 1 − [x + 1] = x + 1 − ([x] + 1) = x − [x] = f(x)

Its graph

[Figure: graph of the mantissa function, a sawtooth repeating on each interval of width 1]

well illustrates the periodicity. N

²³ Recall from Proposition 39 that the integer part [x] of a scalar x ∈ R is the greatest integer ≤ x.
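The computation in Example 227 is immediate to replicate. A sketch (ours, in Python), using math.floor for the integer part [x]:

    import math

    def mantissa(x):
        # f(x) = x - [x], where [x] is the greatest integer <= x
        return x - math.floor(x)

    print(mantissa(2.37))   # about 0.37
    # Periodicity with period 1: f(x + k) = f(x) for every integer k
    print(all(abs(mantissa(0.37 + k) - 0.37) < 1e-9 for k in range(-3, 4)))  # True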

Finally, readers can verify that periodicity is preserved by the fundamental operations among functions: if f and g are two periodic functions with the same period p, the functions f(x) + g(x), f(x)g(x), and f(x)/g(x) are also periodic (with period at most p).

6.6 Maxima and minima of a function: a preview


At this point, it is useful to introduce the concepts of maximizer and minimizer of a scalar
function. We will then discuss them in full generality in Chapter 18.

Definition 228 Let f : A ⊆ R → R be a real-valued function. An element x̂ ∈ A is called a (global) maximizer (or maximum point) of f on A if

f(x̂) ≥ f(x)   ∀x ∈ A

The value f(x̂) of the function at x̂ is called the (global) maximum value of f on A.

Maximizers thus attain the highest values of the function f on its domain; they outperform all other elements of the domain. Note that the maximum value of f on A is nothing but the maximum of the set Im f, which is a subset of R. That is,

f(x̂) = max f(A) = max Im f

By Proposition 33, the maximum value is unique. We denote this unique value by

max_{x∈A} f(x)

Example 229 Consider the function f : R → R given by f(x) = 1 − x², with graph:

[Figure: graph of the downward parabola 1 − x²]

The maximizer of f is 0 and the maximum value is 1. Indeed, 1 = f(0) ≥ f(x) for every x ∈ R. On the other hand, since Im f = (−∞, 1], we have 1 = max(−∞, 1]. N
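Lacking, for now, the tools of differential calculus, maximizers can at least be approximated by brute force on a grid. A sketch (ours, in Python) for f(x) = 1 − x²:

    f = lambda x: 1 - x ** 2

    # Evaluate f on a fine grid and keep the best point
    grid = [i / 1000 for i in range(-4000, 4001)]   # grid on [-4, 4]
    x_hat = max(grid, key=f)

    print(x_hat)      # 0.0, the maximizer
    print(f(x_hat))   # 1.0, the maximum value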

Similar definitions hold for the minimum value of f on A and for the minimizers of f on A.

Example 230 Consider the quadratic function f(x) = x², whose graph is the parabola:

[Figure: graph of the parabola x²]

The minimizer of f is 0 and the minimum value is 0. Indeed, 0 = f(0) ≤ f(x) for every x ∈ R. On the other hand, since Im f = [0, ∞), we have 0 = min[0, ∞). N

While the maximum (minimum) value is unique, maximizers and minimizers might well
not be unique, as the next example shows.

Example 231 Let f : R → R be the sine function f(x) = sin x. Since Im f = [−1, 1], the unique maximum value of f on R is 1 and the unique minimum value of f on R is −1. Nevertheless, there are both infinitely many maximizers – i.e., all the points x = π/2 + 2kπ with k ∈ Z – and infinitely many minimizers – i.e., all the points x = −π/2 + 2kπ with k ∈ Z. The next graph should clarify.

[Figure: graph of sin x with its infinitely many maximizers and minimizers] N

6.7 Domains and restrictions

In the first section of the chapter we defined the domain of a function as the set on which the function is defined: the set A is the domain of a function f : A → B. In the various examples of real-valued functions presented until now we have identified as domain the greatest set A ⊆ R where the function f could be defined. For example, for f(x) = x² the domain is R, for f(x) = √x the domain is R₊, for f(x) = log x the domain is R₊₊, and so on. For a function f of one or several variables we call the natural domain the largest set on which f can be defined. For example, R is the natural domain of x², R₊ is that of √x, R₊₊ is that of log x, and so on.
But there is nothing special, except for maximality, about the natural domain: a function can be regarded as defined on any subset of the natural domain. For example, we can consider x² only for positive values of x, so as to have a quadratic function f : R₊ → R, or we can consider log x only for values of x greater than 1, so as to have a logarithmic function f : [1, +∞) → R, and so on.
In general, given a function f : A → B, it is sometimes important to consider its restrictions to subsets of A.

Definition 232 Let f : A → B be a function and C ⊆ A. The function g : C → B defined by

g(x) = f(x)   ∀x ∈ C

is called the restriction of f to C and is denoted by f|C.

The restriction f|C can, therefore, be seen as f restricted to the subset C of A. Thanks to the smaller domain, the function f|C may satisfy properties different from those of the original function f.

Example 233 (i) Let g : [0, 1] → R be defined by g(x) = x². The function g can be seen as the restriction to the interval [0, 1] of the quadratic function f : R → R given by f(x) = x²; that is, g = f|[0,1]. Thanks to its restricted domain, the function g has better properties than the function f. For example: g is strictly increasing, while f is not; g is injective (so, invertible), while f is not; g is bounded, while f is only bounded below; g has both a maximizer and a minimizer, while f does not have a maximizer.
(ii) Let g : (−∞, 0] → R be defined by g(x) = −x. The function g can be seen as the restriction to (−∞, 0] of both f : R → R given by f(x) = |x| and h : R → R given by h(x) = −x. Indeed, a function may be the restriction of several functions (in fact, of infinitely many functions), and it is the specific application at hand that may suggest which is the most relevant. In any case, let us analyze the differences between g and f and those between g and h. The function g is injective, while f is not; g is monotonically decreasing, while f is not. The function g is bounded below, while h is not; g has a global minimizer, while h does not. N
Example 234 The function f(x1, x2) = √(x1 x2) has as natural domain R²₊ ∪ R²₋, i.e., the first and third quadrants of the plane. Nevertheless, when we regard it as a utility function of Cobb-Douglas type, its domain is restricted to the first quadrant, R²₊, because bundles of goods always have positive components. Moreover, since f(x1, x2) = 0 even when just one component is zero, something not that plausible from an economic viewpoint, this utility function is often considered only on R²₊₊. Therefore, purely economic considerations determine the domain on which to study f when it is interpreted as a utility function. N

Example 235 (i) Let g : [0, +∞) → R be defined by g(x) = x³. The function g can be seen as the restriction to the interval [0, +∞) of the cubic function f : R → R given by f(x) = x³, that is, g = f|[0,+∞). We observe that g is convex, while f is not; g is bounded below, while f is not; g has a minimizer, while f does not.
(ii) Let g : (−∞, 0] → R be defined by g(x) = x³. The function g can be seen as the restriction to the interval (−∞, 0] of the function f : R → R given by f(x) = x³, that is, g = f|(−∞,0]. We observe that g is concave, while f is not; g is bounded above, while f is not; g has a maximizer, while f does not.
(iii) Sometimes smaller domains may actually deprive functions of some of their properties. For instance, the restriction of the sine function to the interval [0, π/2] is no longer periodic, while the restriction of the quadratic function to the open unbounded interval (0, ∞) has no minimizers. N

We now introduce the concept of the extension of a function to a larger domain, which is dual to that of restriction.

Definition 236 Let f : A → B be a function and let A ⊆ C. A function g : C → B such that

g(x) = f(x)   ∀x ∈ A

is called an extension of f to C.

Restriction and extension are, thus, two sides of the same coin: g is an extension of f if and only if f is a restriction of g. In particular, a function defined on its natural domain A is an extension to A of each restriction of this function. Moreover, if a function has an extension, it has infinitely many of them.²⁴

Example 237 (i) The function g : R → R defined by

g(x) =  1/x   if x ≠ 0
        0     if x = 0

is an extension of the function f(x) = 1/x, which has as natural domain R \ {0}.
(ii) The function g : R → R defined by

g(x) =  x       if x ≤ 0
        log x   if x > 0

is an extension of the function f(x) = log x, which has natural domain R₊₊. N

6.8 Grand finale: preferences and utility

6.8.1 Preferences

We close the chapter by studying in more depth the notions of preference and utility introduced in Section 6.2.1. Consider a preference (binary) relation ≿ defined on a subset A of Rⁿ₊, called the consumption set, whose elements are interpreted as the bundles of goods relevant for the choices of the consumer.
The preference relation represents the tastes of the consumer over the bundles. In particular, x ≿ y means that the consumer prefers bundle x to bundle y.²⁵ It is a basic relation that economists take as given (leaving to psychologists the study of the psychological motivations that underlie it). From it, the following two important notions are derived:

(i) we write x ≻ y if the bundle x is strictly preferred to y, that is, if x ≿ y but not y ≿ x;

(ii) we write x ∼ y if the bundle x is indifferent to the bundle y, that is, if both x ≿ y and y ≿ x.

The relations ≻ and ∼ are, obviously, mutually exclusive: between two indifferent bundles there cannot exist strict preference, and vice versa. The next simple result further clarifies the different nature of the two relations.

Lemma 238 The strict preference relation ≻ is asymmetric (i.e., x ≻ y implies not y ≻ x), while the indifference relation ∼ is symmetric (i.e., x ∼ y implies y ∼ x).

²⁴ A function might not have restrictions or extensions. Indeed, let f : A ⊆ R → R. In the singleton case A = {x0}, f has no restrictions. Instead, if A is the natural domain, then f has no extensions.
²⁵ In the weak sense of “prefers or is indifferent to”. The preference relation is an important example of a binary relation (see Appendix A).

Proof Suppose x ≻ y. By definition, x ≿ y but not y ≿ x, so we cannot have y ≻ x. This proves the asymmetry of ≻. As to the symmetry of ∼, suppose x ∼ y. By definition, both x ≿ y and y ≿ x. So, y ∼ x.

On the preference ≿ we consider some axioms.

Reflexivity: x ≿ x for every x ∈ A.

This first axiom reflects the “weakness” of ≿: each bundle is weakly preferred to itself. The next axiom is more interesting.

Transitivity: x ≿ y and y ≿ z implies x ≿ z for every x, y, z ∈ A.

It is a rationality axiom that requires that the preferences of the decision maker have no cycles of the form

x ≿ y ≿ z ≻ x

Strict preference and indifference inherit these first two properties (with the obvious exception of reflexivity for the strict preference).

Lemma 239 Let ≿ be reflexive and transitive. Then:

(i) ∼ is reflexive and transitive;

(ii) ≻ is transitive.

Proof (i) We have x ∼ x since, thanks to the reflexivity of ≿, both x ≿ x and x ≾ x hold. Hence, the relation ∼ is reflexive. To prove transitivity, suppose that x ∼ y and y ∼ z. We show that this implies x ∼ z. By definition, x ∼ y means that x ≿ y and y ≿ x, while y ∼ z means that y ≿ z and z ≿ y. Thanks to the transitivity of ≿, from x ≿ y and y ≿ z it follows x ≿ z, while from y ≿ x and z ≿ y it follows z ≿ x. We therefore have both x ≿ z and z ≿ x, i.e., x ∼ z.
(ii) Suppose that x ≻ y and y ≻ z. We show that this implies x ≻ z. Suppose, by contradiction, that this is not the case, i.e., z ≿ x. By definition, x ≻ y and y ≻ z imply x ≿ y and y ≿ z. Since y ≿ z and z ≿ x, the transitivity of ≿ implies y ≿ x; since also x ≿ y, we get x ∼ y. But x ∼ y contradicts x ≻ y.

The last two lemmas together show that, if ≿ is reflexive and transitive, the indifference relation ∼ is reflexive, symmetric, and transitive (so, it is an equivalence relation; cf. Appendix A). For each bundle x ∈ A, denote by

[x] = {y ∈ A : y ∼ x}

the collection of the bundles indifferent to it. This set is the indifference class of ≿ determined by the bundle x.

Lemma 240 If ≿ is reflexive and transitive, we have

x ∼ y ⟺ [x] = [y]                                   (6.31)

and

x ≁ y ⟺ [x] ∩ [y] = ∅                               (6.32)

Relations (6.31) and (6.32) express two fundamental properties of the indifference classes. By (6.31), the indifference class [x] does not depend on the choice of the bundle x: each indifferent bundle determines the same indifference class. By (6.32), different indifference classes have no elements in common: they do not intersect.

Proof By the previous lemmas, ∼ is reflexive, symmetric, and transitive. We first prove (6.31). Suppose that x ∼ y. We show that this implies [x] ⊆ [y]. Let z ∈ [x], that is, z ∼ x. Since ∼ is transitive, x ∼ y and z ∼ x imply that z ∼ y, that is, z ∈ [y], which shows that [x] ⊆ [y]. By symmetry, x ∼ y implies y ∼ x. Then the previous argument shows that [y] ⊆ [x]. So, we conclude that x ∼ y implies [y] = [x]. Since the converse is obvious, (6.31) is proved.
We now move to (6.32) and suppose that x ≁ y. We claim that this implies [x] ∩ [y] = ∅. Suppose, by contradiction, that this is not the case and there exists z ∈ [x] ∩ [y]. By definition, we have both z ∼ x and z ∼ y. By the transitivity of ∼, we then have x ∼ y, which contradicts x ≁ y. The contradiction shows that x ≁ y implies [x] ∩ [y] = ∅. Since the converse is obvious, the proof is complete.

The collection {[x] : x ∈ A} of all the indifference classes is denoted by A/∼ and is sometimes called the indifference map. Thanks to the last lemma, A/∼ forms a partition of A.

Let us continue the study of ≿. The next axiom concerns not the rationality, but rather the information, of the consumer.

Completeness: x ≿ y or y ≿ x for every x, y ∈ A.

Completeness requires the consumer to be able to compare any two bundles of goods, even very different ones. Naturally, to do so the consumer must, at least, have sufficient information about the two alternatives: it is easy to think of examples where this assumption is unrealistic. So, completeness is a non-trivial assumption on preferences.
In any case, note that completeness requires, inter alia, that each bundle be comparable to itself, that is, x ≿ x. Thus, it implies reflexivity.

Given the completeness assumption, the relations ≻ and ∼ are both exclusive (as seen above) and exhaustive.

Lemma 241 Let ≿ be complete. Given any two bundles x and y, we always have either x ≻ y or y ≻ x or x ∼ y.²⁶

²⁶ These “or”s are intended as the Latin “aut”.

Proof By completeness, we have x ≿ y or y ≿ x.²⁷ Suppose, without loss of generality, that x ≿ y. One has y ≿ x if and only if x ∼ y, while one does not have y ≿ x if and only if x ≻ y.

²⁷ Here “or” is intended as the Latin “vel”.

Since we are considering bundles of economic goods (and not of “bads”), it is natural to assume monotonicity, i.e., that “more is better”. The triad ≥, >, and ≫ leads to three possible incarnations of this simple principle of rationality:

Monotonicity: x ≥ y implies x ≿ y for every x, y ∈ A.

Strict monotonicity: x > y implies x ≻ y for every x, y ∈ A.

Strong monotonicity: ≿ is monotonic and x ≫ y implies x ≻ y for every x, y ∈ A.

The relationships among the three notions are similar to those seen for the analogous notions of monotonicity studied (also for utility functions) in Section 6.4.4. For example, strict monotonicity means that, given a bundle, an increase in the quantity of any good of the bundle determines a strictly preferred bundle.
Similar considerations hold for the other notions. In particular, (6.25) takes the form:

strict monotonicity ⟹ strong monotonicity ⟹ monotonicity

6.8.2 Paretian utility

Although the preference ≿ is the fundamental notion, it is analytically convenient to find a numerical representation of ≿, that is, a function u : A → R such that, for each pair of bundles x, y ∈ A, we have

x ≿ y ⟺ u(x) ≥ u(y)                                 (6.33)

The function u is called a (Paretian) utility function. It represents also the strict preference and the indifference:

Lemma 242 We have

x ∼ y ⟺ u(x) = u(y)                                 (6.34)

and

x ≻ y ⟺ u(x) > u(y)                                 (6.35)

Proof We have

x ∼ y ⟺ x ≿ y and y ≿ x ⟺ u(x) ≥ u(y) and u(y) ≥ u(x) ⟺ u(x) = u(y)

which proves (6.34). Now consider (6.35). If x ≻ y, then u(x) > u(y). Indeed, suppose, by contradiction, that u(x) ≤ u(y). By (6.33), we then have x ≾ y, which contradicts x ≻ y. It remains to show that u(x) > u(y) implies x ≻ y. Arguing again by contradiction, suppose that x ≾ y. Again, by (6.33) we have u(x) ≤ u(y), which contradicts u(x) > u(y). This completes the proof of (6.35).

The equivalence (6.34) allows us to represent the indifference classes as the indifference curves of the utility function:

[x] = {y ∈ A : u(y) = u(x)}

Thus, when a preference admits a utility representation, (6.32) reduces to the standard property that indifference curves are disjoint (Section 6.3.1).
As already observed, in the ordinalist approach the utility function is a mere representation of the preference relation, without any special psychological meaning. Indeed, we already noted that each strictly increasing function f : Im u → R defines an equivalent utility function f ∘ u, for which it still holds that

x ≿ y ⟺ (f ∘ u)(x) ≥ (f ∘ u)(y)

6.8.3 Existence and the lexicographic preference

In view of all this, a key theoretical problem is to establish under which conditions a preference relation ≿ admits a utility function. Things are easy when the consumption set is finite.

Theorem 243 Let ≿ be a preference defined on a finite set A. The following conditions are equivalent:

(i) ≿ is transitive and complete;

(ii) there exists a utility function u : A → R.

Proof (i) implies (ii). Suppose ≿ is transitive and complete. Define u : A → R by u(x) = |{y ∈ A : y ≾ x}|. As the reader can check, we have x ≿ y if and only if u(x) ≥ u(y), as desired. (ii) implies (i). Assume that there exists u : A → R such that u(x) ≥ u(y) if and only if x ≿ y. The preference ≿ is transitive. Indeed, let x, y, z ∈ A be such that x ≿ y and y ≿ z. By hypothesis, we have u(x) ≥ u(y) and u(y) ≥ u(z). Since the order ≥ on R is transitive, we obtain u(x) ≥ u(z), which in turn yields x ≿ z, as desired. The preference ≿ is complete. Indeed, let x, y ∈ A. Since u(x) and u(y) are scalars, we either have u(x) ≥ u(y) or u(y) ≥ u(x) or both, because the order ≥ on R is complete. Therefore, either x ≿ y or y ≿ x or both, as desired.
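The construction in the proof is entirely effective: count, for each alternative, how many alternatives are weakly worse. A sketch (ours, in Python), where the complete and transitive preference is encoded by a hypothetical function weakly_preferred:

    # Hypothetical finite consumption set; the preference is encoded here
    # through a numerical score only so as to have a concrete example
    score = {"x": 2, "y": 1, "z": 1, "w": 0}
    A = list(score)

    def weakly_preferred(a, b):      # a weakly preferred to b
        return score[a] >= score[b]

    def u(x):
        # Utility as in the proof: u(x) = |{y in A : y weakly worse than x}|
        return sum(1 for y in A if weakly_preferred(x, y))

    print({x: u(x) for x in A})      # {'x': 4, 'y': 3, 'z': 3, 'w': 1}
    # u represents the preference: u(a) >= u(b) iff a weakly preferred to b
    print(all((u(a) >= u(b)) == weakly_preferred(a, b) for a in A for b in A))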

Thus, if there is a finite number of alternatives, transitivity and completeness are necessary and sufficient conditions for the existence of a utility function. Matters become more complicated when A is infinite: later we will present the famous lexicographic preference on R²₊, which does not admit any numerical representation. The next theorem solves the existence problem on the key infinite set Rⁿ₊. To this end we need a final axiom, which is reminiscent of the Archimedean property of the real numbers seen in Section 1.4.3.²⁸

Archimedean: given any three bundles x, y, z ∈ Rⁿ₊ with x ≻ y ≻ z, there exist weights α, β ∈ (0, 1) such that

αx + (1 − α)z ≻ y ≻ βx + (1 − β)z

²⁸ For simplicity, we will assume that the consumption set A is the entire Rⁿ₊. The axiom can be stated more generally for convex sets, an important notion that we will study in Chapter 14.

The axiom implies that there exist no infinitely preferred and no infinitely “unpreferred” bundles. Given the preferences x ≻ y and y ≻ z, for the consumer the bundle x cannot be infinitely better than y, nor can the bundle z be infinitely worse than y. Indeed, by suitably combining the bundles x and z we get both a bundle better than y, that is, αx + (1 − α)z, and a bundle worse than y, that is, βx + (1 − β)z. This would be impossible if x were infinitely better than y, or if z were infinitely worse than y.
In this respect, recall the analogous property of the real numbers: if x, y, z ∈ R are three scalars with x > y > z, there exist α, β ∈ (0, 1) such that

αx + (1 − α)z > y > βx + (1 − β)z                   (6.36)

The property does not hold if we consider 1 and 1, that is, the extended real line
R = [ 1; 1]. In this case, if y 2 R but x = +1 and/or z = 1, the scalar x is in…nitely
greater than y, and z is in…nitely smaller than y, and there are no ; 2 (0; 1) that satisfy
the inequality (6.36). Indeed, 1 = +1 and ( 1) = 1 for every ; 2 (0; 1), as seen
in Section 1.7.

In conclusion, the Archimedean axiom makes the bundles of di¤erent but comparable
quality: however di¤erent, they belong to the same league. Thanks to this axiom, we can
now state the existence theorem (its not simple proof is omitted).

Theorem 244 Let ≿ be a preference defined on A = Rⁿ₊. The following conditions are equivalent:

(i) ≿ is transitive, complete, strictly monotonic and Archimedean;

(ii) there exists a strictly monotonic and continuous²⁹ utility function u : A → R.

This is a remarkable result: most economic applications use utility functions and the theorem shows which conditions on preferences justify such use.³⁰
To appreciate the importance of Theorem 244, we close the chapter with a famous example of a preference that does not admit a utility function. Let A = R²₊ and, given two bundles x and y, write x ≿ y if either x₁ > y₁ or x₁ = y₁ and x₂ ≥ y₂. The consumer starts by considering the first coordinate: if x₁ > y₁, then x ≿ y. If, on the other hand, x₁ = y₁, then he turns his attention to the second coordinate: if x₂ ≥ y₂, then x ≿ y.
The preference is inspired by how dictionaries order words; for this reason, it is called lexicographic preference. In particular, we have x ≻ y if x₁ > y₁ or x₁ = y₁ and x₂ > y₂, while we have x ∼ y if and only if x = y. The indifference classes are therefore singletons, a first remarkable feature of this preference.
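The lexicographic rule is easy to state operationally; the following Python sketch (a hypothetical illustration, not part of the text) compares two bundles in R²₊ lexicographically.

```python
# Lexicographic preference on pairs: compare the first coordinate;
# only in case of a tie compare the second one.

def lex_weakly_preferred(x, y):
    """x weakly preferred to y lexicographically, for x = (x1, x2), y = (y1, y2)."""
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])

print(lex_weakly_preferred((1, 0), (0, 100)))  # True: the first coordinate wins
print(lex_weakly_preferred((1, 0), (1, 2)))    # False: tie, then 0 < 2
# Indifference only under equality, so indifference classes are singletons:
x, y = (1, 2), (1, 2)
print(lex_weakly_preferred(x, y) and lex_weakly_preferred(y, x))  # True
```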
The lexicographic preference is complete, transitive and strictly monotonic, as the reader can easily verify. It is not Archimedean, however. Indeed, consider for example x = (1, 0), y = (0, 1), and z = (0, 0). We have x ≻ y ≻ z and

    βx + (1 − β)z = (β, 0) ≻ y ≻ z   ∀β ∈ (0, 1)

which shows that the Archimedean axiom does not hold.


²⁹ Continuity is an important property, to which Chapter 12 is devoted.
³⁰ There exist other results on the existence of utility functions, mostly proved in the 1940s and in the 1950s.

For this reason, Theorem 244 does not apply to the lexicographic preference, which therefore cannot be represented by a strictly monotonic and continuous utility function. Actually, this preference does not admit any utility function at all.

Proposition 245 The lexicographic preference does not admit any utility function.

Proof Suppose, by contradiction, that there exists u : R²₊ → R that represents the lexicographic preference. Let a < b be any two positive scalars. For each x ≥ 0 we have (x, a) ≺ (x, b) and, therefore, u(x, a) < u(x, b). By Proposition 39, there exists a rational number q(x) such that u(x, a) < q(x) < u(x, b). The rule x ↦ q(x) defines, therefore, a function q : R₊ → Q. It is injective. If x ≠ y, say y < x, then:

    u(y, a) < q(y) < u(y, b) < u(x, a) < q(x) < u(x, b)

and so q(x) ≠ q(y). But, since R₊ has the same cardinality as R, the injectivity of the function q : R₊ → Q implies |R| ≤ |Q|, contradicting Cantor’s Theorem 254. This proves that the lexicographic preference does not admit any utility function.
Chapter 7

Cardinality

7.1 Actual infinite and potential infinite


Ideally, a quantity can always be made larger by a unit increase, a set can always become larger by adding an extra element, a segment can be subdivided into smaller and smaller parts (of positive length) by continuing to cut it in half. Therefore, potentially, we have arbitrarily large quantities and sets, as well as arbitrarily small segments. In these cases, we talk of potential infinite. It is a notion that has been playing a decisive role in mathematics since the dawn of Greek mathematics. The ε-δ arguments upon which the study of limits is based are a brilliant example of this, as is the method of exhaustion upon which integration relies.¹
When the potential infinite realizes and becomes actual, we have an actual infinite. In set theory, our main interest here, the actual infinite corresponds to sets formed by infinitely many elements. Not in potentia (in power) but in act: a set with a finite number of grains of sand to which we add more and more new grains is infinite in potentia, but not in act, because, however large, the number of grains remains finite.² Instead, a set that consists of infinitely many grains of sand is infinite in the actual sense. It is, of course, a metaphysical notion that only the eye of the mind can see: (sensible) reality is necessarily finite. Thus, the actual infinite, starting from Aristotle, to whom the distinction between the two notions of infinite dates back, was considered with great suspicion (summarized by the Latin saying infinitum actu non datur).³ On the other hand, the dangers of a naive approach to the actual infinite, based purely on intuition, had been masterfully highlighted already in Presocratic times by some
¹ The ε-δ arguments will be seen in Chapters 8 and 11. The potential infinite will come into play when, for example, we will consider ε > 0 arbitrarily small (but always non-zero) or n arbitrarily large (yet finite). In Chapter 35 we will study in detail the role of the method of exhaustion in integration.
² Archimedes, who masterfully used the method of exhaustion to compute some remarkable areas, in his work Arenarius argued that about 8 · 10⁶³ grains of sand are enough to fill the universe. It is a huge, but finite, number.
³ In a conference held in 1925, David Hilbert described these notions of infinite with the following words: “Someone who wished to characterize briefly the new conception of the infinite which Cantor introduced might say that in analysis we deal with the infinitely large and the infinitely small only as limit concepts, as something becoming, happening, i.e., with the potential infinite. But this is not the true infinite. We meet the true infinite when we regard the totality of numbers 1, 2, 3, 4, ... itself as a completed unity, or when we regard the points of an interval as a totality of things which exists all at once. This kind of infinity is known as actual infinity.” (Trans. in P. Benacerraf and H. Putnam, “Philosophy of mathematics”, Cambridge University Press, 1964.)


of the celebrated paradoxes of Zeno of Elea.


All this changed, after more than twenty centuries, with the epoch-making work of Georg Cantor. Approximately between 1875 and 1885, Cantor revolutionized mathematics by finding the key concept (bijective functions) that allows for a rigorous study of sets, finite and infinite, thus putting the notion of set at the foundations of mathematics. It is not by chance that our book starts with such a notion. The rest of the chapter is devoted to the Cantorian study of infinite sets, in particular of their cardinality.

7.2 Bijective functions and cardinality


Bijective functions, introduced in the last chapter, are fundamental in mathematics because criteria of similarity between mathematical entities are often based on them. Cantor’s study of the cardinality of infinite sets is, indeed, a magnificent example of this role of bijective functions.
We start by considering a finite set A, that is, a set with a finite number of elements. We call the number of elements of the set A the cardinality (or power) of A. We usually denote it by |A|.

Example 246 The set A = {11, 13, 15, 17, 19} of the odd integers between 10 and 20 is finite, with |A| = 5. ▲

Thanks to Proposition 198, two finite sets have the same cardinality if and only if their elements can be put in a one-to-one correspondence. For example, if we have seven seats and seven students, we can assign one (and only one) seat to each student, say by putting a name tag on it. All this motivates the following definition.

Definition 247 A set A is finite if it can be put in a one-to-one correspondence with a subset of the form {1, 2, ..., n} of N. In this case, we write |A| = n.

In other words, A is finite if there exist a set {1, 2, ..., n} of natural numbers and a bijective function f : {1, 2, ..., n} → A. The set {1, 2, ..., n} can be seen as the “prototypical” set of cardinality n, a benchmark that permits us to “calibrate” all the other finite sets of the same cardinality via bijective functions.

This definition provides a functional angle on the cardinality of finite sets, based on bijective functions and on the identification of a prototypical set. For finite sets, this angle is not much more than a curiosity. However, it becomes fundamental when we want to extend the notion of cardinality to infinite sets. This was the key insight of Georg Cantor that, by finding the right angle, led to the birth of the theory of infinite sets. Indeed, the possibility of establishing a one-to-one correspondence among infinite sets allows for a classification of these sets by “size” and leads to the discovery of deep and surprising properties.

Definition 248 A set A is said to be countable if it can be put in a one-to-one correspondence with the set N of the natural numbers. In this case, we write |A| = |N|.

In other words, A is countable if there exists a bijective function f : N → A, that is, if the elements of the set A can be ordered in a sequence a₀, a₁, ..., aₙ, ... (i.e., 0 corresponds to f(0) = a₀, 1 to f(1) = a₁, and so on). The set N is, therefore, the “prototype” for countable sets: any other set is countable if it is possible to pair its elements in a one-to-one fashion (as the aforementioned seats and students) with those of N. This is the first category of infinite sets that we encounter.

Relative to finite sets, countable sets immediately exhibit a remarkable, possibly puzzling, property: it is always possible to put a countable set into a one-to-one correspondence with an infinite proper subset of it. In other words, losing elements might not affect cardinality when dealing with countable sets.

Theorem 249 Each infinite subset of a countable set is also countable.

Proof Let X be a countable set and let A ⊆ X be an infinite proper subset of X, i.e., A ≠ X. Since X is countable, its elements can be listed as a sequence of distinct elements X = {x₀, x₁, ..., xₙ, ...} = {xᵢ}ᵢ∈N. Let us denote by n₀ the smallest integer larger than or equal to 0 such that xₙ₀ ∈ A (if, for example, x₀ ∈ A, we have n₀ = 0; if x₀ ∉ A and x₁ ∈ A, we have n₀ = 1; and so on). Analogously, let us denote by n₁ the smallest integer (strictly) larger than n₀ such that xₙ₁ ∈ A. Given n₀, n₁, ..., nⱼ, with j ≥ 1, let us define nⱼ₊₁ as the smallest integer larger than nⱼ such that xₙⱼ₊₁ ∈ A. Consider now the function f : N → A defined by f(i) = xₙᵢ for i = 0, 1, ..., n, .... It is easy to check that f is a one-to-one correspondence between N and A, so A is countable.

The following example should clarify the scope of the previous theorem. The set E of even numbers is, clearly, a proper subset of N that, one may think, contains only “half” of its elements. Nevertheless, it is possible to establish a one-to-one correspondence with N by putting each even number 2n in correspondence with its half n, that is,

    2n ∈ E ⟷ n ∈ N

Therefore, |E| = |N|. Already Galileo realized this remarkable peculiarity of infinite sets, which clearly distinguishes them from finite sets, whose proper subsets always have smaller cardinality.⁴ In a famous passage of the Discorsi e dimostrazioni matematiche intorno a due nuove scienze,⁵ published in 1638, he observed that the natural numbers can be put in a one-to-one correspondence with their squares by setting n² ⟷ n. The squares, which prima facie seem to form a rather small subset of N, are thus in equal number with the natural numbers: “in an infinite number, if one could conceive of such a thing, he would be forced
numbers: “in an in…nite number, if one could conceive of such a thing, he would be forced
4
The mathematical fact considered here is at the basis of several little stories. For example, The Paradise
Hotel has countably in…nite rooms, progressively numbered 1; 2; 3; . At a certain moment, they are all
occupied when a new guest checks in. At this point, the hotel manager faces a conundrum: how to …nd a
room for the new guest? Well, after some thought, he realizes that it is easier than he imagined! It is enough
to ask every other guest to move to the room coming after the one they are actually occupying (1 ! 2; 2 ! 3;
3 ! 4, etc.). In this way, room number 1 will become free. He also realizes that it is possible to improve
upon this new arrangement! It is enough to ask everyone to move to the room with a number which is twice
the one of the room actually occupied (1 ! 2; 2 ! 4; 3 ! 6, etc.). In this way, in…nite rooms will become
available: all the odd ones.
5
The passage is in a dialogue between Sagredo, Salviati, and Simplicio, during the …rst day.

to admit that there are as many squares as there are numbers all taken together”. The clarity with which Galileo exposes the problem is worthy of his genius. Unfortunately, the mathematical notions available to him were completely insufficient for further developing his intuitions. For example, the notion of function, fundamental for the ideas of Cantor, emerged (in a primitive form) only at the end of the seventeenth century in the works of Leibniz.

Clearly, the union of a finite number of countable sets is also countable. Much more is actually true.

Theorem 250 The union of a countable collection of countable sets is also countable.

Before providing a proof of this theorem, we give a heuristic argument. Denote by {Aₙ}ₙ₌₁^∞ the countable collection of countable sets. The result claims that their union ⋃ₙ₌₁^∞ Aₙ is a countable set. Since each set Aₙ is countable, we can list their elements as follows:

    A₁ = {a₁₁, a₁₂, ..., a₁ₙ, ...},  A₂ = {a₂₁, a₂₂, ..., a₂ₙ, ...},  ...,  Aₙ = {aₙ₁, aₙ₂, ..., aₙₙ, ...},  ...

We can then construct an infinite matrix A in which the elements of the set Aₙ form the n-th row:

        ⎡ a₁₁  a₁₂  a₁₃  a₁₄  a₁₅  ⋯ ⎤
        ⎢ a₂₁  a₂₂  a₂₃  a₂₄  a₂₅  ⋯ ⎥
    A = ⎢ a₃₁  a₃₂  a₃₃  a₃₄  a₃₅  ⋯ ⎥        (7.1)
        ⎢ a₄₁  a₄₂  a₄₃  a₄₄  a₄₅  ⋯ ⎥
        ⎢ a₅₁  a₅₂  a₅₃  a₅₄  a₅₅  ⋯ ⎥
        ⎣  ⋮    ⋮    ⋮    ⋮    ⋮   ⋱ ⎦

The matrix A contains at least as many elements as the union ⋃ₙ₌₁^∞ Aₙ. Indeed, it may contain more elements because some elements can be repeated more than once in the matrix, while they would only appear once in the union (net of such repetitions, the two sets have the same number of elements).
We now introduce another infinite matrix, denoted by N, which contains all the natural numbers except 0:

        ⎡  1   3   6  10  15  ⋯ ⎤
        ⎢  2   5   9  14  ⋯     ⎥
    N = ⎢  4   8  13  ⋯         ⎥        (7.2)
        ⎢  7  12  ⋯             ⎥
        ⎢ 11  ⋯                 ⎥
        ⎣  ⋮                    ⎦
Observe that:

1. The first diagonal of A (moving from SW to NE) consists of one element: a₁₁. We map this element into the natural number 1, which is the corresponding element in the first diagonal of N. Note that the sum of the indexes of a₁₁ is 1 + 1 = 2.

2. The second diagonal of A consists of two elements: a₂₁ and a₁₂. We map these elements, respectively, into the natural numbers 2 and 3, which are the corresponding elements in the second diagonal of N. Note that the sum of the indexes of a₂₁ and a₁₂ is 3.

3. The third diagonal of A consists of three elements: a₃₁, a₂₂, and a₁₃. We map these elements, respectively, into the natural numbers 4, 5, and 6, which are the corresponding elements in the third diagonal of N. Note that the sum of the indexes of a₃₁, a₂₂, and a₁₃ is 4.

4. The fourth diagonal of A consists of four elements: a₄₁, a₃₂, a₂₃, and a₁₄. We map these elements, respectively, into the natural numbers 7, 8, 9, and 10, which are the corresponding elements in the fourth diagonal of N. Note that the sum of the indexes of a₄₁, a₃₂, a₂₃, and a₁₄ is 5.

These four steps can be illustrated as follows:

[Figure: the entries a₁₁, a₁₂, a₁₃, ... of the matrix A, with arrows sweeping the successive SW–NE diagonals.]

At each step we have an arrow, indexed by the sum of the indexes of the entries that it hits, minus 1. So, arrow 1 hits entry a₁₁; arrow 2 hits entries a₂₁ and a₁₂; arrow 3 hits entries a₃₁, a₂₂, and a₁₃; and arrow 4 hits entries a₄₁, a₃₂, a₂₃, and a₁₄. Each arrow hits one more entry than the previous one.
Intuitively, by proceeding in this way we cover the entire matrix A with countably many arrows, each hitting a finite number of entries. So, matrix A has countably many entries. The union ⋃ₙ₌₁^∞ Aₙ is then a countable set.
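The diagonal sweep is fully algorithmic. Here is a small Python sketch (our illustration) that enumerates the entries aₘₙ by successive SW–NE diagonals, reproducing the matrix N of (7.2).

```python
# Enumerate the pairs (row, column) of an infinite matrix by diagonals:
# the d-th diagonal contains the entries whose indexes sum to d + 1.

def diagonal_enumeration(max_diagonal):
    """Yield (position, (row, col)) following the diagonal sweep."""
    position = 0
    for d in range(1, max_diagonal + 1):   # the d-th diagonal has d entries
        for row in range(d, 0, -1):        # from SW to NE: the row decreases
            col = d + 1 - row              # indexes sum to d + 1
            position += 1
            yield position, (row, col)

for pos, (m, n) in diagonal_enumeration(4):
    print(f"natural number {pos} <-> entry a_{m}{n}")
# 1 <-> a_11; 2 <-> a_21, 3 <-> a_12; 4 <-> a_31, 5 <-> a_22, 6 <-> a_13; ...
```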
That said, next we give a rigorous proof.

Proof of Theorem 250 We first prove two auxiliary claims.

Claim 1 N × N is countable.

Proof of Claim 1 Consider the function f₁ : N × N → N given by f₁(m, n) = 2ⁿ⁺¹3ᵐ⁺¹. Note that f₁(m, n) = f₁(m′, n′) means that 2ⁿ⁺¹3ᵐ⁺¹ = 2ⁿ′⁺¹3ᵐ′⁺¹. By the Fundamental Theorem of Arithmetic, this implies that n + 1 = n′ + 1 and m + 1 = m′ + 1, proving that (m, n) = (m′, n′). Thus, f₁ is injective and f₁ : N × N → Im f₁ is bijective. At the same time, by Theorem 249 and since Im f₁ is infinite (indeed, it contains the set {2 · 3, 2² · 3, ..., 2ⁿ · 3, ...}), it follows that Im f₁ is countable, that is, there exists a bijection f₂ : N → Im f₁. The reader can easily verify that the map f = f₁⁻¹ ∘ f₂ is a bijection from N to N × N, proving that N × N is countable.

Claim 2 If g : N → B is surjective and B is infinite, then B is countable.

Proof of Claim 2 Define h₁ : B → N by h₁(b) = min{n ∈ N : g(n) = b} for all b ∈ B. Since g is surjective, {n ∈ N : g(n) = b} is non-empty for all b ∈ B, thus h₁ is well-defined. Note that b ≠ b′ implies that h₁(b) ≠ h₁(b′), thus h₁ is injective. It follows that h₁ : B → Im h₁ is bijective. At the same time, by Theorem 249 and since Im h₁ is infinite (B is infinite), there exists a bijection h₂ : N → Im h₁. The reader can easily verify that the map h = h₁⁻¹ ∘ h₂ is a bijection from N to B, thus proving that B is countable.

We are ready to prove the result. Consider the countable collection

    A₀, A₁, ..., Aₘ, ...   (7.3)

and define B = ⋃ₘ₌₀^{+∞} Aₘ. Since each Aₘ is countable, clearly B is infinite and there exists a bijection gₘ : N → Aₘ. Define the map ĝ : N × N → B by the rule ĝ(m, n) = gₘ(n). In other words, the first natural number m chooses the set while the second natural number n chooses the n-th element of that set. The map ĝ is surjective, for, given an element b ∈ B, it belongs to Aₘ for some m and it is paired to a natural number n by the map gₘ, that is, ĝ(m, n) = gₘ(n) = b. Unfortunately, ĝ might not be injective, since the sets in (7.3) might have elements in common. If we consider g = ĝ ∘ f, where f is as in Claim 1, this function maps N to B and it is surjective. By Claim 2, it follows that B is countable, thus proving the result.

With a similar argument it is possible to prove that the Cartesian product of a finite number of countable sets is also countable. Moreover, the previous result yields that the set of rational numbers is countable.

Corollary 251 Z and Q are countable.

Proof We first prove that Z is countable. Define f : N → Z by the rule

    f(n) = ⎧  n/2           if n is even
           ⎩  −(n + 1)/2    if n is odd

The reader can verify that f is bijective, thus proving that Z is countable. On the other hand, the set

    Q = {m/n : m ∈ Z and 0 ≠ n ∈ N}

of rational numbers can be written as a union of infinitely many countable sets:

    Q = ⋃ₙ₌₁^{+∞} Aₙ

where

    Aₙ = {0/n, 1/n, −1/n, 2/n, −2/n, ..., m/n, −m/n, ...}

Each Aₙ is countable because it is in a one-to-one correspondence with Z, which, in turn, is countable. By Theorem 250, it follows that Q is countable.
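A Python sketch (our illustration) of the bijection f : N → Z used in the proof; listing its first values shows how the integers are swept out alternately.

```python
# The bijection f : N -> Z from the proof of Corollary 251:
# even n go to n/2, odd n go to -(n + 1)/2.

def f(n):
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

print([f(n) for n in range(9)])  # [0, -1, 1, -2, 2, -3, 3, -4, 4]

# Injectivity and surjectivity on an initial segment (a finite sanity check):
values = [f(n) for n in range(101)]
assert len(set(values)) == 101             # no repetitions: injective
assert set(values) == set(range(-50, 51))  # hits every integer in [-50, 50]
```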

This corollary is quite surprising: though the rational numbers seem much more numerous than the natural numbers, there exists a way to put these two classes of numbers into a one-to-one correspondence. The cardinality of N, and so of any countable set, is usually denoted by ℵ₀, that is, |N| = ℵ₀. We can then write as

    |Q| = ℵ₀

the remarkable property that Q is countable.⁶


At this point, we might suspect that all infinite sets are countable. The next result of Cantor shows that this is not the case: the set R of real numbers is infinite but not countable, its cardinality being higher than ℵ₀. To establish this fundamental result, we need a new definition and an interesting result.

Definition 252 A set A has the cardinality of the continuum if it can be put in a one-to-one correspondence with the set R of the real numbers. In this case, we write |A| = |R|.

The cardinality of the continuum is often denoted by c, that is, |R| = c. Also in this case there exist subsets that are, prima facie, much smaller than R but turn out to have the same cardinality. Let us see an example which will be useful in proving that R is uncountable.

Proposition 253 The interval (0, 1) has the cardinality of the continuum.⁷

Proof We want to show that |(0, 1)| = |R|. To do this we have to show that the numbers of (0, 1) can be put in a one-to-one correspondence with those of R. The bijection f : R → (0, 1) defined by

    f(x) = ⎧  1 − (1/2)eˣ    if x < 0
           ⎩  (1/2)e⁻ˣ       if x ≥ 0

⁶ ℵ (aleph) is the first letter of the Hebrew alphabet. In the next section we will formalize also for infinite sets the notion of same or greater cardinality. For the time being, we treat these notions intuitively.
⁷ At the end of Section 6.5.3 we noted that the trigonometric function f : R → (−1, 1) defined by (2/π) arctan x is a bijection. In view of what we learned so far, this shows that (−1, 1) has the cardinality of the continuum.

with graph

[Figure: the decreasing graph of f, approaching 1 as x → −∞, passing through (0, 1/2), and approaching 0 as x → +∞.]

shows that, indeed, this is the case (as the reader can also formally verify).

Theorem 254 (Cantor) R is uncountable, that is, |R| > ℵ₀.

Proof Assume, by contradiction, that R is countable. Hence, there exists a bijective function g : N → R. By Proposition 253, there exists a bijective function f : R → (0, 1). The reader can easily prove that f ∘ g is a bijective function from N to (0, 1), yielding that (0, 1) is countable. We will next reach a contradiction, showing that (0, 1) cannot be countable. To this end, we write all the numbers in (0, 1) using their decimal representation: each x ∈ (0, 1) will be written as

    x = 0.c₀c₁⋯cₙ⋯

with cᵢ ∈ {0, 1, ..., 9}, always using infinitely many digits (for example, 0.354 will be written 0.354000000...). Since so far we have obtained that (0, 1) is countable, there exists a way to list its elements as a sequence:

    x₀ = 0.c₀₀c₀₁c₀₂c₀₃⋯c₀ₙ⋯
    x₁ = 0.c₁₀c₁₁c₁₂c₁₃⋯c₁ₙ⋯
    x₂ = 0.c₂₀c₂₁c₂₂c₂₃⋯c₂ₙ⋯

and so on. Let us then take the number x̄ = 0.d₀d₁d₂d₃⋯dₙ⋯ such that its generic decimal digit dₙ is different from cₙₙ (but without choosing 9 infinitely many times, so as to avoid a periodic 9 which, as we know, does not exist on its own). The number x̄ belongs to (0, 1), but does not belong to the list written above since dₙ ≠ cₙₙ (and therefore it is different from x₀ since d₀ ≠ c₀₀, from x₁ since d₁ ≠ c₁₁, etc.). We conclude that the list written above cannot be complete and hence the numbers of (0, 1) cannot be put in a one-to-one correspondence with N. So, the interval (0, 1) is not countable, a contradiction.
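The diagonal construction is algorithmic in spirit: given any (attempted) list of decimals in (0, 1), one can compute a number that escapes it. A Python sketch, assuming the listed numbers are accessed through their digit expansions (the toy list below is purely illustrative):

```python
# Diagonal construction: given digit expansions x_n = 0.c_n0 c_n1 c_n2 ...,
# build x = 0.d_0 d_1 d_2 ... with d_n != c_nn (and never choosing 9).

def diagonal_number(digit_of, length):
    """digit_of(n, i) returns the i-th digit of the n-th listed number.
    Returns the first `length` digits of a number not in the list."""
    digits = []
    for n in range(length):
        c = digit_of(n, n)                 # the diagonal digit c_nn
        digits.append(1 if c != 1 else 2)  # any digit != c_nn, never 9
    return "0." + "".join(map(str, digits))

# A toy list: the n-th number has all digits equal to n mod 10 (an
# assumption made purely for illustration).
toy_list = lambda n, i: n % 10
print(diagonal_number(toy_list, 10))  # 0.1211111111 differs from x_n at digit n
```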

The set R of real numbers is, therefore, much richer than N and Q. The rational numbers – which have, as we remarked, a “quick rhythm” – are comparatively very few with respect to the real numbers: they form a kind of fine dust that overlaps with the real numbers without covering them all. At the same time, it is a dust so fine that between any two real numbers, no matter how close they are, there are particles of it.
Summing up, the real line is a new prototype of infinite set.

It is possible to prove that both the union and the Cartesian product of a finite or countable collection of sets that have the cardinality of the continuum have, in turn, the cardinality of the continuum. This has the next consequence.

Theorem 255 Rⁿ has the power of the continuum for each n ≥ 1.

This is another remarkable finding, which is surprising already in the special case of the plane R², which, intuitively, may appear to contain many more points than the real line. It is in front of results of this type, so surprising for our “finitary” intuition, that Cantor wrote in a letter to Dedekind “I see it, but I do not believe it”. His key intuition on the use of bijective functions to study the cardinality of infinite sets opened a new and fundamental area of mathematics, which is also rich in terms of philosophical implications (mentioned at the beginning of the chapter).

7.3 A Pandora’s box

The symbols ℵ₀ and c are called infinite cardinal numbers. The role played by the natural numbers in representing the cardinality of finite sets is now played by the cardinal numbers ℵ₀ and c for the infinite sets N and R. For this reason, the natural numbers are also called finite cardinal numbers. The cardinal numbers

    0, 1, 2, ..., n, ..., ℵ₀, and c   (7.4)

represent, therefore, the cardinality of the prototype sets

    ∅, {1}, {1, 2}, ..., {1, 2, ..., n}, ..., N, and R

respectively. Looking at (7.4), it is natural to wonder whether ℵ₀ and c are the only infinite cardinal numbers. As we will see shortly, this is far from being true. Indeed, we are about to uncover a genuine Pandora’s box (from which, however, no evil will emerge but only wonders). To do this, we first need to generalize to any pair of sets the comparative notion of size we considered in Definitions 248 and 252.

Definition 256 Two sets A and B have the same cardinality if there exists a bijective correspondence f : A → B. In this case, we write |A| = |B|.

In particular, when A is finite we have |A| = |{1, ..., n}| = n, when A is countable we have |A| = |N| = ℵ₀, and when A has the cardinality of the continuum we have |A| = |R| = c.
We denote by 2^A the power set of the set A, that is, the collection

    2^A = {B : B ⊆ A}

of all its subsets. The notation 2^A is justified by the cardinality of the power set in the finite case, as we next show.

Proposition 257 If |A| = n, then |2^A| = 2ⁿ.

Proof Combinatorial analysis shows immediately that 2^A contains the empty set, C(n, 1) sets with one element, C(n, 2) sets with two elements, ..., C(n, n−1) sets with n − 1 elements, and C(n, n) = 1 set with all the n elements, where C(n, k) denotes the binomial coefficient. Therefore,

    |2^A| = 1 + C(n, 1) + C(n, 2) + ⋯ + C(n, n−1) + C(n, n) = Σₖ₌₀ⁿ C(n, k) 1ᵏ 1ⁿ⁻ᵏ = (1 + 1)ⁿ = 2ⁿ

where the penultimate equality follows from Newton’s binomial formula.
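The count |2^A| = 2ⁿ can also be checked by direct enumeration, as in this short Python sketch (our illustration):

```python
# Enumerate the power set of a finite set and verify |2^A| = 2^n.
from itertools import combinations

def power_set(A):
    A = list(A)
    return [set(c) for k in range(len(A) + 1) for c in combinations(A, k)]

A = {1, 2, 3}
subsets = power_set(A)
print(len(subsets), 2 ** len(A))  # 8 8
# Grouped by size, the subsets reproduce the binomial counts 1, 3, 3, 1.
```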

Sets can have the same size, but also different sizes. This motivates the following definition:

Definition 258 Given any two sets A and B, we say that:

(i) A has cardinality less than or equal to B, written |A| ≤ |B|, if there exists an injective function f : A → B;

(ii) A has cardinality strictly less than B, written |A| < |B|, if |A| ≤ |B| and |A| ≠ |B|.

Next we list a few properties of these comparative notions of cardinality.

Proposition 259 Let A, B, and C be any three sets. Then:

(i) |A| ≤ |A|;

(ii) |A| ≤ |B| and |B| ≤ |C| imply that |A| ≤ |C|;

(iii) |A| ≤ |B| and |B| ≤ |A| if and only if |A| = |B|;

(iv) A ⊆ B implies that |A| ≤ |B|.

Example 260 We have |N| < |R|. Indeed, by Theorem 254 |N| ≠ |R| and, by point (iv), N ⊆ R implies |N| ≤ |R|. ▲

Properties (i) and (ii) say that the order ≤ is reflexive and transitive. As for property (iii), it tells us that ≤ and = are related in a natural way. Finally, (iv) confirms the intuitive idea that smaller sets have a smaller cardinality. Remarkably, this intuition does not carry over to < – i.e., A ⊂ B does not imply |A| < |B| – because, as already noted, a proper subset of an infinite set may have the same cardinality as the original set (as Galileo had envisioned).

Proof We start by proving an auxiliary fact: if f : A → B and g : B → C are injective, then g ∘ f is injective. For, set h = g ∘ f. Assume that h(a) = h(a′). Denote b = f(a) and b′ = f(a′). By the definition of h, we have g(b) = g(b′). Since g is injective, this implies b = b′, that is, f(a) = f(a′). Since f is injective, we conclude that a = a′, proving that h is injective.
(i) Let f : A → A be the identity, that is, f(a) = a for all a ∈ A. The function f is trivially injective and the statement follows.
(ii) Since |A| ≤ |B|, there exists an injective function f : A → B. Since |B| ≤ |C|, there exists an injective function g : B → C. Next, note that h = g ∘ f : A → C is well-defined and, by the initial part of the proof, we also know that it is injective, thus proving that |A| ≤ |C|.
(iii) We only prove the “if” part.⁸ By definition and since |A| = |B|, there exists a bijection f : A → B. Since f is bijective, it follows that f⁻¹ : B → A is well-defined and bijective. Thus, both f : A → B and f⁻¹ : B → A are injective, yielding that |A| ≤ |B| and |B| ≤ |A|.
(iv) Define f : A → B by the rule f(a) = a. Since A ⊆ B, the function f is well-defined and, clearly, injective, thus proving the statement.

When a set A is finite and non-empty, we clearly have |A| < |2^A|. Remarkably, the inequality continues to hold for infinite sets.

Theorem 261 (Cantor) For each set A, finite or infinite, we have |A| < |2^A|.

Proof Consider a set A and the collection of all singletons C = {{a}}ₐ∈A. It is immediate to see that there is a bijective mapping between A and C, that is, |A| = |C|, and C ⊆ 2^A. Since |C| ≤ |2^A|, we conclude that |A| ≤ |2^A|. Next, by contradiction, assume that |A| = |2^A|. Then there exists a bijection between A and 2^A which associates to each element a ∈ A an element b = b(a) ∈ 2^A and vice versa: a ⟷ b. Observe that each b(a), being an element of 2^A, is a subset of A. Consider now all the elements a ∈ A such that the corresponding subset b(a) does not contain a. Call S the subset of these elements, that is, S = {a ∈ A : a ∉ b(a)}. Since S is a subset of A, S ∈ 2^A. Since we have a bijection between A and 2^A, there must exist an element c ∈ A such that b(c) = S. We have two cases:

(i) if c ∈ S, then by the definition of S, b(c) does not contain c, so c ∉ b(c) = S;

(ii) if c ∉ S, then by the definition of S, b(c) contains c, so c ∈ b(c) = S.

In both cases, we have reached a contradiction, thus proving |A| < |2^A|.
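The diagonal set S = {a ∈ A : a ∉ b(a)} can be computed explicitly for any finite assignment a ↦ b(a), and the same argument shows it is always missed. A Python sketch under an illustrative assignment:

```python
# Cantor's diagonal set: for any map b from A into subsets of A,
# S = {a in A : a not in b(a)} is never of the form b(c) for c in A.

A = {1, 2, 3}
b = {1: {1, 2}, 2: set(), 3: {1, 3}}  # an arbitrary illustrative assignment

S = {a for a in A if a not in b[a]}
print(S)                              # {2}
assert all(b[c] != S for c in A)      # S is missed: no c has b(c) = S
```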

Cantor’s Theorem offers a simple way to make a “cardinality jump” starting from a given set A: it is sufficient to consider the power set 2^A. For example,

    |2^R| > |R|,  |2^(2^R)| > |2^R|

and so on. We can, therefore, construct an infinite sequence of sets of higher and higher cardinality. In this way, we enrich (7.4), which now becomes

    {1, 2, ..., n, ..., ℵ₀, c, |2^R|, |2^(2^R)|, ...}   (7.5)

⁸ The “only if” part is the content of the Schroeder-Bernstein Theorem, which we leave to more advanced courses.

Here is the Pandora’s box mentioned above, which Theorem 261 has allowed us to uncover. The breathtaking sequence (7.5) is only the incipit of the theory of infinite sets, whose study (even its introductory part) would take us too far away.
Before moving on with the book, however, we consider a final famous aspect of the theory, the so-called continuum hypothesis (which the reader might have already heard of). By Theorem 261, we know that |2^N| > |N|. On the other hand, by Theorem 254 we also have |R| > |N|. The next result (we omit its proof) shows that these two inequalities are actually not distinct.

Theorem 262 |2^N| = |R|.

Therefore, the power set of N has the cardinality of the continuum. The continuum hypothesis states that there is no set A such that

    |N| < |A| < |R|

That is, there is no infinite set of cardinality intermediate between ℵ₀ and c. In other words, a set that has cardinality larger than ℵ₀ must have at least the cardinality of the continuum.
The validity of the continuum hypothesis is the first among the celebrated Hilbert problems, posed by David Hilbert in 1900, and represents one of the deepest questions in mathematics. By adopting this hypothesis, it is possible to set

    ℵ₁ = |R|

and to consider the cardinality of the continuum as the second infinite cardinal number ℵ₁ after the first one ℵ₀ = |N|.
The continuum hypothesis can be reformulated in a suggestive way by writing

    ℵ₁ = 2^ℵ₀

That is, the smallest cardinal number greater than ℵ₀ is equal to the cardinality of the power set of N or, equivalently, of any set of cardinality ℵ₀ (like, for example, the rational numbers).
The generalized continuum hypothesis states that, for each n, we have

    ℵₙ₊₁ = 2^ℵₙ

All the jumps of cardinality in (7.5), not only the first one from ℵ₀ to ℵ₁, are thus obtained by considering the power set. Therefore,

    ℵ₂ = |2^R|,  ℵ₃ = |2^(2^R)|

and so on. At this point, (7.5) becomes

    {1, 2, ..., n, ..., ℵ₀, ℵ₁, ℵ₂, ℵ₃, ...}

The elements of this sequence are the cardinal numbers that represent all the different cardinalities (finite or infinite) that sets might have, however large they might be. According to the generalized continuum hypothesis, the power sets in (7.5) are the prototype sets of the infinite cardinal numbers (the first two being the two infinite cardinal numbers ℵ₀ = |N| and ℵ₁ = c with which we started this section).

Summing up, the depth of the problems that the use of bijective functions opened up is incredible. As we have seen, this study started by Cantor is, at the same time, rigorous and intrepid – as is typical of the best mathematics, and at the basis of its beauty. It relies on the use of bijective functions to capture the fundamental principle of similarity (in terms of numerosity) among sets.⁹

⁹ The reader who wants to learn more about set theory can consult Halmos (1960), Suppes (1960), as well as Lombardo Radice (1981).
Part II

Discrete analysis

Chapter 8

Sequences

8.1 The concept


A numerical sequence is an infinite, endless, “list” of real numbers, for example

    {2, 4, 6, 8, ...}   (8.1)

where each number occupies a place of order, a position, so that (except the first one) it follows a number and precedes another one. The next definition formalizes this idea. We denote by N₊ the set of the natural numbers without 0.

Definition 263 A function f : N₊ → R is called a sequence of real numbers.

In other words, a sequence is a function that associates to each natural number n ≥ 1 a real number f(n). In (8.1), to each n we associate f(n) = 2n, that is,

    n ⟼ 2n   (8.2)

and so we have the sequence of even integers (that are strictly positive). The image f(n) is usually denoted by xₙ. With such notation, the sequence of even integers is xₙ = 2n for each n ≥ 1. The images xₙ are called terms (or elements) of the sequence. We will denote sequences by {xₙ}ₙ₌₁^∞, or briefly by {xₙ}.¹

There are different ways to define a specific sequence {xₙ}, that is, to describe the underlying function f : N₊ → R. A first possibility is to describe it in closed form through a formula: for instance, this is what we did with the sequence of the even numbers using (8.2). Other defining rules are, for example,

    n ⟼ 2n − 1   (8.3)
    n ⟼ n²   (8.4)
    n ⟼ 1/√(2ⁿ⁻¹)   (8.5)

¹ The choice of starting the sequence from n = 1 instead of n = 0 (or of any other natural number k) is a mere convention. When needed, it is perfectly legitimate to consider sequences {xₙ}ₙ₌₀^∞ or, more generally, {xₙ}ₙ₌ₖ^∞.


Rule (8.3) defines the sequence of odd integers

    {1, 3, 5, 7, ...}   (8.6)

while rule (8.4) defines the sequence of the squares

    {1, 4, 9, 16, ...}

and rule (8.5) defines the sequence

    {1, 1/√2, 1/√4, 1/√8, ...}   (8.7)

To define a sequence in closed form thus amounts to specifying explicitly the underlying function f : N₊ → R. The next example presents a couple of classic sequences defined in closed form.

Example 264 The sequence with xₙ = 1/n, that is,

    {1, 1/2, 1/3, 1/4, 1/5, ...}

is called harmonic,² while the sequence with xₙ = aqⁿ⁻¹, that is,

    {a, aq, aq², aq³, aq⁴, ...}

is called geometric (or geometric progression) with first term a and common ratio q. For example, if a = 1 and q = 1/2, we have {1, 1/2, 1/4, 1/8, 1/16, ...}. ▲

Another important way to define a sequence is by recurrence (or recursion). Consider the famous Fibonacci sequence

    {0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...}

in which each term is the sum of the two terms that precede it, with fixed initial values 0 and 1. For example, in the fourth position we find the number 2, i.e., the sum 1 + 1 of the two terms that precede it; in the fifth position we find the number 3, i.e., the sum 1 + 2 of the two terms that precede it; and so on. The underlying function f : N₊ → R is, hence,

    f(1) = 0,  f(2) = 1
    f(n) = f(n − 1) + f(n − 2)   for n ≥ 3   (8.8)

We have two initial values, f(1) = 0 and f(2) = 1, and a recursive rule that allows us to compute the term in position n once the two preceding terms are known. Unlike the sequences defined through a closed formula, such as (8.3)-(8.5), to obtain the term xₙ we now have to first construct, using the recursive rule, all the terms that precede it. For example, to compute the term x₁₀₀ in the sequence of the odd numbers (8.6), it is sufficient to substitute n = 100 in formula (8.3), finding x₁₀₀ = 199. In contrast, to compute the term

² Indeed, 1/2, 1/3, 1/4, ... are the positions in which we have to put a finger on a vibrating string to obtain the different notes.

x100 in the Fibonacci sequence we …rst have to construct by recurrence the …rst 99 terms of
the sequence. Indeed, it is true that to determine x100 it is su¢ cient to know the values of
x99 and x98 and then to use the rule x100 = x99 + x98 , but to determine x99 and x98 we must
…rst know x97 and x96 , and so on.
Therefore, the recursive de…nition of a sequence consists of one or more initial values and
of a recurrence rule that, by starting from them, allows to compute the various terms of the
sequence. The initial values are arbitrary. For example, if in (8.8) we choose f (1) = 2 and
f (2) = 1 we have the following Fibonacci sequence
f2; 1; 3; 4; 7; 11; 18; 29; 47; :::g
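Computing terms of a sequence defined by recurrence means building all preceding terms first, as the following Python sketch of (8.8) with arbitrary initial values makes explicit (an illustration of ours, not part of the text):

```python
# Fibonacci-type recurrence: two initial values, then x_n = x_{n-1} + x_{n-2}.

def fibonacci_terms(x1, x2, how_many):
    terms = [x1, x2]
    for _ in range(how_many - 2):
        terms.append(terms[-1] + terms[-2])  # the recursive rule
    return terms

print(fibonacci_terms(0, 1, 11))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(fibonacci_terms(2, 1, 9))   # [2, 1, 3, 4, 7, 11, 18, 29, 47]
```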
Next we define by recurrence a classic sequence.
Example 265 Given any a, b ∈ R, define f : N₊ → R by

    f(1) = a
    f(n) = f(n − 1) + b   for n ≥ 2

Starting from the initial value f(1) = a, it is possible to construct the entire sequence through the recursive formula f(n) = f(n − 1) + b. This is the so-called arithmetic sequence (or arithmetic progression) with first term a and common difference b. For example, if a = 2 and b = 4, we have {2, 6, 10, 14, 18, 22, ...}. ▲

To ease notation, the underlying function f is often omitted in recursive formulas. For instance, the arithmetic sequence is written as

    x₁ = a
    xₙ = xₙ₋₁ + b   for n ≥ 2   (8.9)

The next examples adopt this simplified notation.
Example 266 Let P = {3k : k ∈ N₊} be the collection of all multiples of 3, i.e., P = {3, 6, 9, 12, 15, ...}. Define recursively a sequence {xₙ} by x₁ = a ∈ R and, for each n ≥ 2,

    xₙ − xₙ₋₁ = ⎧  −1   if n ∈ P
                ⎩  +1   otherwise   (8.10)

In words, at each position we go either up or down by one unit: we go down if we are getting to a position that is a multiple of 3, we go up otherwise. This sequence is an example of a random walk: it may describe the walk of a drunk person who, at each block, goes either North, +1, or South, −1, and who, for some (random) reason, always goes South after having gone twice North. For instance, if the initial condition is a = 0, the first terms are 0, 1, 0, 1, 2, 1, 2, 3, 2, ...

More generally, given any subset P (finite or not) of N₊, the recurrence (8.10) is called a random walk. ▲
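A Python sketch of the recurrence (8.10) (our illustration): down by one unit at the multiples of 3, up by one unit otherwise.

```python
# Random walk (8.10): x_1 = a and, for n >= 2,
# x_n - x_{n-1} = -1 if n is a multiple of 3, +1 otherwise.

def random_walk(a, how_many):
    terms = [a]
    for n in range(2, how_many + 1):
        step = -1 if n % 3 == 0 else +1
        terms.append(terms[-1] + step)
    return terms

print(random_walk(0, 9))  # [0, 1, 0, 1, 2, 1, 2, 3, 2]
```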

Example 267 A Star Wars jedi begins his career as a padawan apprentice under a jedi master, then becomes a knight and, once ready to train, becomes a master and takes a padawan apprentice.
Let

    pₜ = number of jedi padawans at time t
    kₜ = number of jedi knights at time t
    mₜ = number of jedi masters at time t

Assume that, as one (galactic) year passes, padawans become knights, knights become masters, and masters take a padawan apprentice. Formally:

    kₜ₊₁ = pₜ
    mₜ₊₁ = mₜ + kₜ
    pₜ₊₁ = mₜ₊₁

The total number of jedis at time t + 2, denoted by xₜ₊₂, is then:

    xₜ₊₂ = kₜ₊₂ + mₜ₊₂ + pₜ₊₂ = pₜ₊₁ + (mₜ₊₁ + kₜ₊₁) + (mₜ₊₁ + kₜ₊₁)
         = xₜ₊₁ + mₜ₊₁ + kₜ₊₁ = xₜ₊₁ + mₜ + kₜ + pₜ = xₜ₊₁ + xₜ

So, we have a Fibonacci recursion

    xₜ₊₂ = xₜ₊₁ + xₜ

which says something simple but not so obvious a priori: the number of jedis at time t + 2 can be regarded as the sum of the numbers of jedis at time t + 1 and at time t. Indeed, a jedi is a master at t + 2 if and only if he was a jedi (of any kind) at t. So, xₜ gives the number of all masters at t + 2, who in turn increase at t + 2 the population of jedis by taking new apprentices.
The recursion is initiated at t = 1 by a “self-taught” original padawan, who becomes a knight at t = 2 and a master with a new padawan at t = 3. So:

    x₁ = 1,  x₂ = 1
    xₜ = xₜ₋₁ + xₜ₋₂   for t ≥ 3

with initial values x₁ = x₂ = 1. We can diagram the recursion as:

    p                 1 = 1
    k                 1 = 1
    mp                1 + 1 = 2
    mpk               1 + 2 = 3
    mpkmp             2 + 3 = 5
    mpkmpmpk          3 + 5 = 8
    mpkmpmpkmpkmp     5 + 8 = 13

Note how every string is the concatenation of the previous two ones. ▲

Example 268 A Fibonacci recurrence is a classic instance of a linear recurrence of order k, given by

    x₁ = δ₁, x₂ = δ₂, ..., xₖ = δₖ
    xₙ = a₁xₙ₋₁ + a₂xₙ₋₂ + ⋯ + aₖxₙ₋ₖ   for n ≥ k + 1   (8.11)

with k initial conditions δᵢ and k coefficients aᵢ. A Fibonacci recurrence is a linear recurrence of order 2 with unitary coefficients a₁ = a₂ = 1. For example,

    x₁ = 1, x₂ = 2, x₃ = 2
    xₙ = 2xₙ₋₁ − xₙ₋₂ + xₙ₋₃   for n ≥ 4

is a linear recurrence of order 3. ▲
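A linear recurrence of order k is straightforward to evaluate term by term. Here is a Python sketch (our illustration) covering (8.11), applied to the two examples above:

```python
# Linear recurrence of order k: k initial conditions, then
# x_n = a_1 x_{n-1} + a_2 x_{n-2} + ... + a_k x_{n-k}.

def linear_recurrence(initial, coefficients, how_many):
    k = len(initial)
    terms = list(initial)
    for _ in range(how_many - k):
        nxt = sum(a * x for a, x in zip(coefficients, reversed(terms[-k:])))
        terms.append(nxt)
    return terms

# Fibonacci as an order-2 linear recurrence with unit coefficients:
print(linear_recurrence([0, 1], [1, 1], 10))        # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
# The order-3 example x_n = 2x_{n-1} - x_{n-2} + x_{n-3}:
print(linear_recurrence([1, 2, 2], [2, -1, 1], 8))  # [1, 2, 2, 3, 6, 11, 19, 33]
```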

A closed form explicitly describes the underlying function f : N₊ → R, while a recurrence gives a partial description of such a function that only specifies what happens next. So, a definition in closed form is, in general, more informative than one by recurrence – however interesting, as a property of a sequence, a recurrence might be per se. Yet, in applications sequences are often defined by recurrence because a partial description is all one is able to say about the phenomenon under study. For instance, if in studying the walking habits of drunk people the only pattern that one is able to detect is that a drunk person always goes South after having gone twice North, then the recurrence (8.10) is all one can specify about this phenomenon.
An important topic is, then, whether it is possible to solve a recurrence – that is, to find the closed form – so as to have a complete description of the sequence. In general, solving a recurrence is not a simple endeavor. However, next we present a few examples where this is possible via a “guess and verify” method, in which we first guess a solution and then verify it by mathematical induction. Fortunately, there are more systematic methods to solve recurrences. Though we do not study them in this book – except for a few remarks in Section 10.5.4 (where we solve linear recursions via generating functions) – it is important to keep this issue in mind.³

Example 269 Consider the recursion

    x₁ = 2
    xₙ = 2xₙ₋₁   for n ≥ 2

We have

    x₂ = 4,  x₃ = 8,  x₄ = 16

and so on. This suggests that the closed form is the geometric sequence

    xₙ = 2ⁿ   ∀n ≥ 1   (8.12)

of both first term and common ratio 2. Let us verify that this guess is correct. We proceed by induction. Initial step: at n = 1 we have x₁ = 2, as desired. Induction step: assume that (8.12) holds at some n ≥ 1; then

    xₙ₊₁ = 2xₙ = 2(2ⁿ) = 2ⁿ⁺¹

³ We refer readers to courses in difference equations for a study of this topic.

and so (8.12) holds at n + 1. By induction, it then holds at all n ≥ 1.

In general, the geometric sequence of first term a and common ratio q solves the recursion

    x₁ = a
    xₙ = qxₙ₋₁   for n ≥ 2

as the reader can prove. This recursion also motivates the “first term” and “common ratio” terminology. ▲

Example 270 For the arithmetic sequence (8.9), we have

    x₂ = a + b,  x₃ = a + 2b,  x₄ = a + 3b

and so on. This suggests the closed form

    xₙ = a + (n − 1)b   ∀n ≥ 1   (8.13)

Let us verify that this guess is correct. We proceed by induction. Initial step: at n = 1 we have x₁ = a, as desired. Induction step: assume that (8.13) holds at some n ≥ 1; then

    xₙ₊₁ = xₙ + b = a + (n − 1)b + b = a + nb

and so (8.13) holds at n + 1. By induction, it then holds at all n ≥ 1. ▲

Example 271 An investor can at each period of time invest an amount of money x, a monetary capital, and receive at the next period the original amount invested x along with an additional amount rx computed according to the interest rate r ≥ 0. Such additional amount is the fruit of his investment. For instance, if x = 100 and r = 0.1, then rx = 10 is such an amount.
Assume that the investor has an initial monetary capital c that he keeps investing at all periods. The resulting cash flow is described by the following recursion

    x₁ = c
    xₜ = (1 + r)xₜ₋₁   for t ≥ 2

We have

    x₂ = c(1 + r),  x₃ = x₂(1 + r) = c(1 + r)²,  x₄ = x₃(1 + r) = c(1 + r)³

This suggests that the solution of the recursion is

    xₜ = (1 + r)ᵗ⁻¹c   ∀t ≥ 1   (8.14)

To verify this guess, we can proceed by induction. Initial step: at t = 1 we have x₁ = c, as desired. Induction step: assume that (8.14) holds at some t ≥ 1; then

    xₜ₊₁ = (1 + r)xₜ = (1 + r)(1 + r)ᵗ⁻¹c = (1 + r)ᵗc

and so (8.14) holds at t + 1. By induction, it then holds at all t ≥ 1. Formula (8.14) is the classic compound interest formula of financial mathematics. ▲
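“Guess and verify” can also be checked numerically: the following Python sketch (our illustration) iterates the recursion and compares it with the closed form (8.14).

```python
# Compound interest: x_1 = c, x_t = (1 + r) x_{t-1}, whose closed form
# is x_t = (1 + r)^(t - 1) * c.

def capital_by_recursion(c, r, t):
    x = c
    for _ in range(t - 1):
        x = (1 + r) * x
    return x

c, r = 100.0, 0.1
for t in range(1, 6):
    closed_form = (1 + r) ** (t - 1) * c
    assert abs(capital_by_recursion(c, r, t) - closed_form) < 1e-9
    print(t, round(closed_form, 2))  # 100.0, 110.0, 121.0, 133.1, 146.41
```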

Not all sequences can be described in closed or recursive form. In this regard, the most famous example is the sequence {pₙ} of prime numbers: it is infinite by Euclid’s Theorem, but it does not have a (known) explicit description. In particular:

(i) Given n, we do not know any formula that tells us what pₙ is; in other words, the sequence {pₙ} cannot be defined in closed form.

(ii) Given pₙ (or any smaller prime), we do not know any formula that tells us what pₙ₊₁ is; in other words, the sequence {pₙ} cannot be defined by recurrence.

The situation is actually even sadder:

(iii) Given any prime number p, we do not know of any (operational) formula that gives us a prime number q greater than p; in other words, the knowledge of a prime number does not give any information on the subsequent prime numbers.

Hence, we do not have a clue about how prime numbers follow one another, that is, about the form of the function f : N₊ → R that defines such a sequence. We have to consider all the natural numbers and check, one by one, whether or not they are prime numbers through the primality tests (Section 1.3.2). Having eternity at our disposal, we could then construct term by term the sequence {pₙ}. More modestly, in the short time that passed between Euclid and us, tables of prime numbers have been compiled; they establish the terms of the sequence {pₙ} up to numbers that may seem huge to us, but that are nothing relative to the infinity of all the prime numbers.
O.R. As to (iii), for centuries mathematicians have looked for a (workable) rule that, given a prime number p, would make it possible to find a greater prime q > p, that is, a function q = f(p). A famous example of a possible such rule is given by the so-called Mersenne primes, which are the prime numbers that can be written in the form 2ᵖ − 1 with p prime. It is possible to prove that if 2ᵖ − 1 is prime, then so is p. For centuries, it was believed (or hoped) that the much more interesting converse was true, namely: if p is prime, so is 2ᵖ − 1. This conjecture was definitively disproved in 1536 when Hudalricus Regius showed that

    2¹¹ − 1 = 2047 = 23 · 89

thus finding the first counterexample to the conjecture. Indeed, p = 11 does not satisfy it. In any case, Mersenne primes are among the most important prime numbers. In particular, as of 2016, the greatest prime number known is

    2⁷⁴²⁰⁷²⁸¹ − 1

which has 22,338,618 digits and is a Mersenne prime.⁴ ▼
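The 1536 counterexample is easy to reproduce; here is a Python sketch (our illustration) with a naive primality test:

```python
# Check small Mersenne numbers 2^p - 1 for prime p: the conjecture
# "p prime implies 2^p - 1 prime" fails first at p = 11.

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

for p in [2, 3, 5, 7, 11, 13]:
    m = 2 ** p - 1
    print(p, m, is_prime(m))   # prime for p = 2, 3, 5, 7, 13; not for p = 11
print(2 ** 11 - 1 == 23 * 89)  # True: 2047 = 23 * 89
```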

We close the section by observing that, given any function f : R₊ → R, its restriction f|N₊ to N₊ is a sequence. So, functions defined on (at least) the positive half-line automatically define also a sequence.

⁴ See the Great Internet Mersenne Prime Search.

8.2 The space of sequences

We denote by R^∞ the space of all the sequences x = {xₙ} of real numbers. We denote, therefore, by x a generic element of R^∞ that, written in “extended” form, reads

    x = {xₙ} = {x₁, x₂, ..., xₙ, ...}

The operations on functions studied in Section 6.3.2 have, as a special case, the operations on sequences – that is, on elements of the space R^∞. In particular, given any two sequences x = {xₙ} and y = {yₙ} in R^∞, we have:

(i) the sequence sum (x + y)ₙ = xₙ + yₙ for every n ≥ 1;

(ii) the sequence difference (x − y)ₙ = xₙ − yₙ for every n ≥ 1;

(iii) the sequence product (xy)ₙ = xₙyₙ for every n ≥ 1;

(iv) the sequence quotient (x/y)ₙ = xₙ/yₙ for every n ≥ 1, provided yₙ ≠ 0.

To ease notation, we will denote the sum directly by {xₙ + yₙ} instead of {(x + y)ₙ}. We will do the same for the other operations.⁵

On R^∞ we have an order structure similar to that of Rⁿ. In particular, given x, y ∈ R^∞, we write:

(i) x ≥ y if xₙ ≥ yₙ for every n ≥ 1;

(ii) x > y if x ≥ y and x ≠ y, i.e., if x ≥ y and there is at least one position n such that xₙ > yₙ;

(iii) x ≫ y if xₙ > yₙ for every n ≥ 1.

Moreover, (iii) ⟹ (ii) ⟹ (i), i.e.,

    x ≫ y ⟹ x > y ⟹ x ≥ y   ∀x, y ∈ R^∞

That said, as in Rⁿ, in R^∞ the order is not complete and sequences might well not be comparable. For instance, the alternating sequence xₙ = (−1)ⁿ and the constant sequence yₙ = 0 cannot be compared. Indeed, they are {−1, 1, −1, 1, ...} and {0, 0, 0, 0, ...}, respectively.

The functions g : A ⊆ R^∞ → R defined on subsets of R^∞ are important. Thanks to the order structure of R^∞, we can classify these functions through monotonicity, as we did on Rⁿ (Section 6.4.4). Specifically, a function g : A ⊆ R^∞ → R is:

(i) increasing if

    x ≥ y ⟹ g(x) ≥ g(y)   ∀x, y ∈ A   (8.15)

⁵ If f, g : N₊ → R are the functions underlying the sequences {xₙ} and {yₙ}, their sum is equivalently written (x + y)ₙ = (f + g)(n) = f(n) + g(n) for every n ≥ 1. A similar remark holds for the other operations. So, the operations on functions imply those on sequences, as claimed.

(ii) strongly increasing if it is increasing and

    x ≫ y ⟹ g(x) > g(y)   ∀x, y ∈ A

(iii) strictly increasing if

    x > y ⟹ g(x) > g(y)   ∀x, y ∈ A

(iv) constant if there exists k ∈ R such that

    g(x) = k   ∀x ∈ A

The decreasing counterparts of these notions are similarly defined. For brevity, we do not dwell upon these notions. We just note that, as in Rⁿ, strict monotonicity implies the other two kinds of monotonicity and that constancy implies both increasing and decreasing monotonicity, but not vice versa (cf. Example 210).

8.3 Application: intertemporal choices

The Euclidean space R^T can model a problem of intertemporal choice of a consumer over T periods (Section 2.4.2). However, in many applications it is important not to fix a priori a finite horizon T for the consumer, but to imagine that he faces an infinite horizon. In this case, in the sequence x = {x₁, x₂, ..., xₜ, ...} the term xₜ denotes the quantity of the good consumed (say, potatoes) at time t ≥ 1.
This is, of course, an idealization. But it permits us to model in a simple way the intertemporal choices of agents that ex ante, at the time of the decision, are not able to specify the last period T relevant for them (for example, the final date might be their death, which they do not know ex ante).
In analogy with what we saw in Section 6.2.2, the consumer has a preference over the consumption streams x = {x₁, x₂, ..., xₜ, ...} that is represented by an intertemporal utility function U : R^∞₊ → R. For example, if we assume that the consumer evaluates the consumption xₜ of each period through a (bounded) instantaneous utility function uₜ : R₊ → R, then a standard form of the intertemporal utility function is

    U(x) = u₁(x₁) + βu₂(x₂) + ⋯ + βᵗ⁻¹uₜ(xₜ) + ⋯

where β ∈ (0, 1) can be interpreted as a subjective discount factor that, as we have seen, depends on the degree of patience of the consumer (Section 6.2.2).
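For a consumption stream truncated at a finite horizon, the discounted sum is immediate to compute. A Python sketch (an illustration, with a hypothetical square-root instantaneous utility common to all periods):

```python
# Intertemporal utility of a (truncated) consumption stream:
# U(x) = u(x_1) + beta * u(x_2) + ... + beta^(t-1) * u(x_t) + ...
import math

def intertemporal_utility(stream, beta, u=math.sqrt):
    """Discounted utility of a finite stream; u is a stand-in
    instantaneous utility (here the square root, an assumption)."""
    return sum(beta ** (t - 1) * u(x) for t, x in enumerate(stream, start=1))

stream = [4.0, 4.0, 4.0, 4.0]  # constant consumption of potatoes
print(intertemporal_utility(stream, beta=0.9))  # 2 * (1 + 0.9 + 0.81 + 0.729)
```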
The monotonicity properties of intertemporal utility functions U : R^∞₊ → R are, clearly, those seen in points (i)-(iv) of the previous section for a generic function g defined on subsets of R^∞.

8.4 Application: prices and expectations

Economic agents’ decisions are often based on variables whose value they will only learn in the future. At the moment of the decision, agents can only rely on their subjective expectations about such values. For this reason, expectations come to play a key role in economics, and the relevance of this subjective component is a key feature of economics as a social science that distinguishes it from, for instance, the natural sciences. Through sequences we can give a first illustration of their importance.

8.4.1 A market for a good

Let us consider the market, denoted by M, of some agricultural good, say potatoes. It is formed by a demand function D : [a, b] → R and by a supply function S : [a, b] → R, with 0 ≤ a < b. The image D(p) is the overall amount of potatoes demanded at price p by consumers, while the image S(p) is the overall amount of potatoes supplied at price p by producers. We assume that both such quantities respond instantaneously to changes in the market price p: in particular, producers are able to adjust their production levels in real time according to the market price p.

Definition 272 A pair (p̄, q̄) ∈ [a, b] × R₊ of prices and quantities is called an equilibrium of market M if

    q̄ = D(p̄) = S(p̄)

The pair (p̄, q̄) is the equilibrium of our market of potatoes. Graphically, it corresponds to the classic intersection of supply and demand:

[Figure: downward-sloping demand curve D and upward-sloping supply curve S, crossing at the equilibrium point.]

For simplicity, let us consider linear demand and supply functions:

    D(p) = α − βp   (M)
    S(p) = −γ + δp

with α > 0, γ ≥ 0 and β, δ > 0. Since producers supply positive quantities, we set a = γ/δ ≥ 0 (because S(p) ≥ 0 if and only if p ≥ γ/δ); similarly, since consumers demand positive quantities, we set b = α/β (because D(p) ≥ 0 if and only if p ≤ α/β). There can be trade only at prices that belong to the interval

    [a, b] = [γ/δ, α/β]   (8.16)

where both quantities are positive. So, we consider demand and supply functions defined only on such an interval even though, mathematically, they are straight lines defined on the entire real line.⁶
For our linear economy, the equilibrium condition becomes

    α − βp̄ = −γ + δp̄

So, the equilibrium price and quantity are

    p̄ = (α + γ)/(β + δ)   (8.17)

and

    q̄ = D(p̄) = α − βp̄ = α − β(α + γ)/(β + δ) = (αδ − βγ)/(β + δ)

Note that, equivalently, we can retrieve the equilibrium quantity via the supply function:

    q̄ = S(p̄) = −γ + δp̄ = −γ + δ(α + γ)/(β + δ) = (αδ − βγ)/(β + δ)

Thus, the pair

    ((α + γ)/(β + δ), (αδ − βγ)/(β + δ))

is the equilibrium of our market of potatoes.
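The equilibrium formulas are easy to verify numerically, as in this Python sketch (under the linear specification above, with hypothetical coefficient values):

```python
# Equilibrium of the linear market: D(p) = alpha - beta*p, S(p) = -gamma + delta*p.

def equilibrium(alpha, beta, gamma, delta):
    p_bar = (alpha + gamma) / (beta + delta)
    q_bar = (alpha * delta - beta * gamma) / (beta + delta)
    return p_bar, q_bar

alpha, beta, gamma, delta = 10.0, 2.0, 1.0, 3.0  # hypothetical coefficients
p_bar, q_bar = equilibrium(alpha, beta, gamma, delta)
print(p_bar, q_bar)                              # 2.2 5.6
# Demand and supply agree at the equilibrium price:
assert abs((alpha - beta * p_bar) - (-gamma + delta * p_bar)) < 1e-12
```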

8.4.2 Delays in production

Suppose that the market of potatoes opens periodically, say once a month. Denote by t, with t = 1, 2, ..., a generic month and by pₜ the corresponding market price. Assume that the demand and supply functions

    D(pₜ) = α − βpₜ   (Mₜ)
    S(pₜ) = −γ + δpₜ

form the market, denoted by Mₜ, of potatoes at t. Besides the hypothesis of instantaneous adjustment, already made for the market M, we make two further assumptions on the markets Mₜ: (i) at every t the same producers and consumers trade, so the coefficients α, β, γ, and δ do not change; (ii) the good traded at each t, the potatoes, is perishable and does not last till the next month t + 1: the quantities demanded and supplied at t + 1 and at t are independent, so the markets Mₜ have no links among them.
Now we need to consider all markets Mₜ, not just a single one M, so demand and supply have to be in equilibrium at each t. In place of the pair of scalars (p̄, q̄) of the last definition, we now have a pair of sequences.⁷

⁶ Yet another example where the relevant domain of a function is determined by economic considerations. Note that the interval (8.16) is non-empty only if γ/δ ≤ α/β, i.e., βγ ≤ αδ. So, this is a further condition that the coefficients of the demand and supply functions must satisfy.
⁷ Here [a, b]^∞ denotes the collection of sequences with terms that all belong to the interval [a, b].

De…nition 273 A pair of sequences fpt g 2 [a; b]1 and fqt g 2 R1


+ of prices and quantities
is called a uniperiodal market equilibrium of markets Mt if

qt = D (pt ) = S (pt ) 8t 1

It is easy to check that the resulting sequence of equilibrium prices fpt g is constant:

pt = 8t 1 (8.18)
+

We thus go back to the equilibrium price (8.17) of market M . This is not surprising: because
of our assumptions, the markets Mt are independent and, at each t, we have a market identical
to M .
The hypothesis of instantaneous production upon which our analysis relies is, however, implausible. Let us make the more plausible hypothesis that producers can adjust their production only after one period: their production technology requires that the quantity that they supply at t has to be decided at t − 1 (to harvest potatoes at t, we need to sow at t − 1).
At the decision time t − 1, producers do not know the value of the future equilibrium price p_t; they can only have a subjective expectation about it. Denote by E_{t−1}(p_t) such an expected value. In this case the market at t, denoted by MR_t, has the form

D(p_t) = α − βp_t,  S(E_{t−1}(p_t)) = −γ + δE_{t−1}(p_t)   (MR_t)

where the expectation E_{t−1}(p_t) replaces the price p_t as an argument of the supply function. Indeed, producers' decisions now rely upon such an expectation.

Definition 274 A triple of sequences of prices {p_t} ∈ [a, b]^∞, quantities {q_t} ∈ ℝ₊^∞, and expectations {E_{t−1}(p_t)} ∈ [a, b]^∞ is called a uniperiodal market equilibrium of the markets MR_t if

q_t = D(p_t) = S(E_{t−1}(p_t))  ∀t ≥ 1

In a uniperiodal market equilibrium, the sequences of prices and expectations have to be such that demand and supply are in equilibrium at each t. In particular, in equilibrium we have

α − βp_t = −γ + δE_{t−1}(p_t)  ∀t ≥ 1   (8.19)

Since prices are positive, we must have

0 ≤ E_{t−1}(p_t) ≤ (α + γ)/δ  ∀t ≥ 1

This inequality is a necessary condition for equilibrium expectations. But, except for such simple inequalities, there are no restrictions on equilibrium expectations: they just have to balance with prices, nothing else.

8.4.3 Expectation formation


Let us make a few hypotheses on how expectations can be formed. An important piece of information that producers have at time t is the sequence of previous equilibrium prices {p_1, p_2, ..., p_{t−1}}. Let us assume that, a bit lazily, producers expect that the last observed price, p_{t−1}, will also be the future equilibrium price, that is,

E_{t−1}(p_t) = p_{t−1}  ∀t ≥ 2   (8.20)

with an arbitrary initial expectation E_0(p_1).⁸ With this process of expectation formation, the market MR_t becomes

D(p_t) = α − βp_t,  S(p_{t−1}) = −γ + δp_{t−1}

In view of (8.19), at a uniperiodal market equilibrium, prices then evolve according to the linear recursion

p_t = (α + γ)/β − (δ/β) p_{t−1}  ∀t ≥ 2   (8.21)

with initial value

p_1 = (α + γ − δE_0(p_1))/β   (8.22)

determined by the initial expectation E_0(p_1).

So, starting from an initial expectation, prices are determined by recurrence. Expectations no longer play an explicit role in the evolution of prices, thus dramatically simplifying the analysis. Yet, one should not forget that, though they do not appear in the recursion, expectations are key in the underlying economic process. Specifically, once a value of E_0(p_1) is fixed, (8.22) gives the initial equilibrium price, which in turn determines both the expectation E_1(p_2) via (8.20) and the next equilibrium price p_2 via the recursion (8.21), and so on and so forth. Thus, starting from an initial expectation, this process generates equilibrium sequences {p_t} and {E_{t−1}(p_t)} of prices and expectations.
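To see the recursion (8.21) at work, here is a minimal Python sketch of the resulting price dynamics (the parameter values and the initial expectation are made up for illustration); as one can check, when δ/β < 1 the iterates approach the equilibrium price (8.17).

```python
# Minimal sketch of the price recursion (8.21); parameter values are illustrative.
def price_path(alpha, beta, gamma, delta, E0_p1, T):
    """Iterate p_t = (alpha + gamma)/beta - (delta/beta) * p_{t-1},
    starting from p_1 given by (8.22)."""
    p = [(alpha + gamma - delta * E0_p1) / beta]   # initial value (8.22)
    for _ in range(T - 1):
        p.append((alpha + gamma) / beta - (delta / beta) * p[-1])
    return p

alpha, beta, gamma, delta = 10.0, 3.0, 1.0, 2.0    # here delta/beta < 1
path = price_path(alpha, beta, gamma, delta, E0_p1=1.0, T=40)
print(path[-1])  # close to (alpha + gamma)/(beta + delta) = 2.2
```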

Assume, instead, that producers expect that the future price will be an average of the last two observed prices:

E_{t−1}(p_t) = (1/2) p_{t−1} + (1/2) p_{t−2}  ∀t ≥ 3   (8.23)

with arbitrary initial expectations E_0(p_1) and E_1(p_2). In view of (8.19), at a uniperiodal market equilibrium, prices then evolve according to the following linear recursion of order 2:

p_1 = (α + γ − δE_0(p_1))/β,  p_2 = (α + γ − δE_1(p_2))/β
p_t = (α + γ)/β − (δ/(2β)) p_{t−1} − (δ/(2β)) p_{t−2}  for t ≥ 3

with initial conditions determined by the arbitrary initial expectations E_0(p_1) and E_1(p_2). Expectations based on (possibly weighted) averages of past prices, the so-called extrapolative expectations, make it possible to describe equilibrium prices via a linear recurrence, a very tractable form. It is, however, a quite naive mechanism of price formation: agents might well feature more sophisticated ways to form expectations (as readers will learn in some economics course).
8 Indeed, expectations on the initial price p_1 cannot rely on any previous price information.

8.5 Images and classes of sequences


In a sequence the same values can appear several times. For example, the two values −1 and 1 keep being repeated in the alternating sequence x_n = (−1)^n, i.e.,

{−1, 1, −1, 1, ...}   (8.24)

The constant sequence x_n = 2 is

{2, 2, 2, ...}   (8.25)

It is thus constituted only by the value 2 (so, the underlying f is the constant function f(n) = 2 for every n ≥ 1).
In this respect, an important role is played by the image

Im f = {f(n) : n ≥ 1}

of the sequence, which consists exactly of the values that the sequence takes on, disregarding repetitions. For example, the image of the alternating sequence (8.24) is {−1, 1}, while for the constant sequence (8.25) it is the singleton {2}. The image thus gives an important piece of information in that it indicates which values the sequence actually takes on, net of repetitions: as we have seen, such values may be very few and just repeat themselves over and over again along the sequence. On the other hand, the sequence of the odd numbers (8.6) does not contain any repetition; its image consists of all its terms, that is, Im f = {2n − 1 : n ≥ 1}.

Through the image, in Section 6.4.3 we studied some notions of boundedness for functions. In the special case of sequences, i.e., of the functions f : ℕ₊ → ℝ, these notions take the following form. A sequence {x_n} is:

(i) bounded (from) above if there exists k ∈ ℝ such that x_n ≤ k for every n ≥ 1;

(ii) bounded (from) below if there exists k ∈ ℝ such that x_n ≥ k for every n ≥ 1;

(iii) bounded if it is bounded both above and below, i.e., if there exists k > 0 such that |x_n| ≤ k for every n ≥ 1.

For example, the alternating sequence x_n = (−1)^n is bounded, while that of the odd numbers (8.6) is only bounded below. Note that, as usual, this classification is not exhaustive because there exist sequences that are unbounded both above and below: for example, the (strongly) alternating sequence x_n = (−1)^n n.⁹ Such sequences are called unbounded.

Monotonic sequences are another important class of sequences. By applying to the underlying function f : ℕ₊ → ℝ the notions of monotonicity introduced for functions (Section 6.4.4), we say that a sequence {x_n} is:

(i) increasing if
x_{n+1} ≥ x_n  ∀n ≥ 1
and strictly increasing if
x_{n+1} > x_n  ∀n ≥ 1

9 By "unbounded above (below)" we mean "not bounded from above (below)".

(ii) decreasing if
x_{n+1} ≤ x_n  ∀n ≥ 1
and strictly decreasing if
x_{n+1} < x_n  ∀n ≥ 1

(iii) constant if it is both increasing and decreasing, i.e., if there exists k ∈ ℝ such that
x_n = k  ∀n ≥ 1

A (strictly) increasing or decreasing sequence is called (strictly) monotonic. For example, the Fibonacci sequence is increasing (not strictly, though), the sequence (8.6) of the odd numbers is strictly increasing, while the sequence (8.7) is strictly decreasing.

8.6 Eventually: a key adverb


A key feature of sequences is that properties often hold “eventually”.

Definition 275 We say that a sequence satisfies a property P eventually if, starting from a certain position n = n_P, all the terms of the sequence satisfy P.

The position n depends on the property P, as indicated by writing n = n_P.

Example 276 (i) The sequence {2, 4, 6, 32, 57, 1, 3, 5, 7, 9, 11, ...} is eventually increasing: indeed, starting from the 6th term, it is increasing.

(ii) The sequence {n} is eventually ≥ 1,000: indeed, all the terms of the sequence, starting from the one of position 1,000, are ≥ 1,000.

(iii) The same sequence is also eventually ≥ 1,000,000,000 as well as ≥ 10^123.

(iv) The sequence {1/n} is eventually smaller than 1/1,000,000.

(v) The sequence

{27, 65, 13, 32, ..., 125, 32, 3, 3, 3, 3, 3, 3, 3, 3, ...}

is eventually constant. N

O.R. To eventually satisfy a property, the sequence in its "youth" can do whatever it wants; what matters is that, when old enough (i.e., from a certain n onward), it settles down. Youthful blunders are forgiven as long as, sooner or later, all the terms of the sequence satisfy the property. H
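For a concrete instance, the threshold in Example 276-(iv) can be found by brute force; a minimal Python sketch (illustrative only, for a property that, once true, stays true):

```python
# Minimal sketch: first position from which 1/n < 1/1,000,000 (Example 276-(iv)).
# For a monotone property such as this one, the property then holds eventually.
def first_position(property_holds, start=1):
    n = start
    while not property_holds(n):
        n += 1
    return n

n_P = first_position(lambda n: 1 / n < 1e-6)
print(n_P)  # 1000001
```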

8.7 Limits: introductory examples


The purpose of the notion of limit is to formalize rigorously the concept of "how a sequence behaves as n becomes larger and larger", that is, asymptotically. In other words, as for a thriller story, we ask ourselves "how it will end". For sequences that represent the values that an economic variable takes on at subsequent dates, economists talk of "long run behavior".
We start with some examples to understand intuitively what we mean by the limit of a sequence. Consider the sequence (8.7), i.e.,

1, 1/√2, 1/√4, 1/√8, ...

For larger and larger values of n, its terms x_n = 1/√(2^{n−1}) become closer and closer to, "tend to", the value L = 0. In this case, we say that the sequence tends to 0 and write

lim_{n→∞} 1/√(2^{n−1}) = 0

For the sequence (8.6) of the odd numbers

{1, 3, 5, 7, ...}

the terms x_n = 2n − 1 of the sequence become larger and larger as the values of n become larger and larger. In this case, we say that the sequence diverges positively and write

lim_{n→∞} (2n − 1) = +∞

In a dual manner, the sequence of the negative odd numbers x_n = −2n + 1 diverges negatively, written

lim_{n→∞} (−2n + 1) = −∞

Finally, the alternating sequence x_n = (−1)^n, i.e.,

{−1, 1, −1, 1, ...}

continues to oscillate, as n varies, between the values −1 and 1, never approaching (eventually) any particular value. In this case, the sequence is irregular (or oscillating): it does not have any limit.

8.8 Limits and asymptotic behavior


In the introductory examples we identified three possible asymptotic behaviors of the terms of a sequence:
(i) convergence to a value L ∈ ℝ;
(ii) divergence to either +∞ or −∞;
(iii) oscillation.
In cases (i) and (ii) we say that the sequence is regular: it tends to (it approaches asymptotically) a value, possibly infinite. In case (iii) we say that the sequence is irregular (or oscillating). In the rest of the section we focus on regular sequences and formalize the intuitive idea of "tending to a value".

8.8.1 Convergence
We start with convergence, that is, with case (i) above.

Definition 277 A sequence {x_n} converges to a point L ∈ ℝ, in symbols x_n → L or lim_{n→∞} x_n = L, if for every ε > 0 there exists n_ε ≥ 1 such that

n ≥ n_ε ⟹ |x_n − L| < ε   (8.26)

The number L is called the limit of the sequence.

The implication (8.26) can be rewritten as

n ≥ n_ε ⟹ d(x_n, L) < ε   (8.27)

Therefore, a sequence {x_n} converges to L when, for each quantity ε > 0, arbitrarily small but positive, there exists a position n_ε (that depends on ε!) starting from which the distance between the terms x_n of the sequence and the limit L is always smaller than ε. A sequence {x_n} that converges to a point L ∈ ℝ is called convergent.

O.R. To show the convergence of a sequence to L, you have to pass a highly demanding test: given any threshold ε > 0 selected by a relentless examiner, you have to be able to come up with a position n_ε far enough along so that all terms of the sequence that come after such a position are ε-close to L. A convergent sequence is able to pass any such test, however tough the examiner may be (i.e., however small the posited ε > 0 is). H

We emphasized through an exclamation point that the position n_ε depends on ε, a key feature of the previous definition. Moreover, such an n_ε is not unique: if there exists a position n_ε such that |x_n − L| < ε for every n ≥ n_ε, the same is true for any subsequent position, which then also qualifies as n_ε. The choice of which among these positions to call n_ε is irrelevant for the definition, which only requires the existence of at least one of them.
That said, there is always a smallest n_ε, which is a genuine threshold. As such, its dependence on ε takes a natural monotonic form: such n_ε becomes larger and larger as ε becomes smaller and smaller. The smallest n_ε thus best captures, because of its threshold nature, the spirit of the definition: for each arbitrarily small ε > 0, there exists a threshold n_ε (the larger, the smaller, so the more demanding, ε is) beyond which the terms x_n are ε-close to the limit L. The two examples that we will present shortly should clarify this discussion.

A neighborhood of a scalar L has the form

B_ε(L) = {x ∈ ℝ : d(x, L) < ε} = (L − ε, L + ε)

So, in view of (8.27) we can rewrite the definition of convergence in the language of neighborhoods. Conceptually, it is an important rewriting that deserves a separate mention.

Definition 278 A sequence {x_n} converges to a point L ∈ ℝ if, for every neighborhood B_ε(L) of L, there exists n_ε ≥ 1 such that

n ≥ n_ε ⟹ x_n ∈ B_ε(L)


In words, a sequence tends to a scalar L if, eventually, it belongs to each neighborhood of L, however small it might be (it is easy to belong to a large neighborhood, but difficult to belong to a very small one). Although this last definition is a mere rewriting of Definition 277, the use of neighborhoods should further clarify the nature of convergence.

Example 279 Consider the sequence x_n = 1/n. The natural candidate for its limit is 0. Let us verify that this is the case. Let ε > 0. We have

|1/n − 0| < ε ⟺ 1/n < ε ⟺ n > 1/ε

Therefore, if we take as n_ε any integer greater than 1/ε, for example the smallest one, n_ε = [1/ε] + 1,¹⁰ we then have

n ≥ n_ε ⟹ 0 < 1/n < ε

Therefore, 0 is indeed the limit of the sequence. For example, if ε = 10^{−100}, we have n_ε = 10^{100} + 1, which is the smallest n_ε; any larger integer would work as well. N
Example 280 Consider the sequence (8.7), that is, x_n = 1/√(2^{n−1}). Also here the natural candidate for its limit is 0. Let us verify this. Let ε > 0. We have

|1/√(2^{n−1}) − 0| < ε ⟺ 2^{(n−1)/2} > 1/ε ⟺ n > 1 + 2 log₂(1/ε)

Therefore, by taking n_ε to be any integer greater than 1 + 2 log₂(1/ε), for example the smallest one, n_ε = [2 + 2 log₂(1/ε)], we have

n ≥ n_ε ⟹ 0 < 1/√(2^{n−1}) < ε

Therefore, 0 is the limit of the sequence. For example, if ε = 10^{−100} the smallest n_ε is [2 + 2 log₂ 10^{100}] = 2 + [200 log₂ 10] = 666. N
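These thresholds are easy to check numerically. A minimal Python sketch along the lines of Examples 279 and 280 (purely illustrative):

```python
import math

# Minimal sketch: the thresholds n_eps of Examples 279 and 280.
def n_eps_harmonic(eps):
    """Smallest position from which 1/n < eps (Example 279)."""
    return math.floor(1 / eps) + 1

def n_eps_geometric(eps):
    """Smallest position from which 1/sqrt(2**(n-1)) < eps (Example 280)."""
    return math.floor(2 + 2 * math.log2(1 / eps))

eps = 1e-3
n1, n2 = n_eps_harmonic(eps), n_eps_geometric(eps)
assert 1 / n1 < eps and 1 / math.sqrt(2 ** (n2 - 1)) < eps
print(n1, n2)  # 1001 21
```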

We saw two examples of sequences that converge to 0. Such sequences are called infinitesimal (or null). Thanks to the next result, they play a special role in the computation of limits.

Proposition 281 A sequence {x_n} converges to a point L ∈ ℝ if and only if d(x_n, L) → 0.

Proof "If". Suppose that lim_{n→∞} d(L, x_n) = 0. Let ε > 0. There exists n_ε ≥ 1 such that d(L, x_n) < ε for every n ≥ n_ε. Therefore, x_n ∈ B_ε(L) for every n ≥ n_ε, as desired.
"Only if". Let lim_{n→∞} x_n = L. Consider the sequence of distances, whose term is y_n = d(x_n, L). We have to prove that lim_{n→∞} y_n = 0, i.e., that for every ε > 0 there exists n_ε ≥ 1 such that n ≥ n_ε implies |y_n| < ε. Since y_n ≥ 0, this is actually equivalent to showing that

n ≥ n_ε ⟹ y_n < ε   (8.28)

10 Recall that [·] denotes the integer part (Section 1.4.3).

Since x_n → L, given ε > 0 there exists n_ε ≥ 1 such that d(x_n, L) < ε for every n ≥ n_ε. Therefore, (8.28) holds.

We can thus reduce the study of the convergence of any sequence to the convergence to 0 of the sequence of distances {d(x_n, L)}_{n≥1}. In other words, to check whether x_n → L, it is sufficient to check whether d(x_n, L) → 0, that is, whether the sequence of distances is infinitesimal.

Example 282 The sequence

x_n = 1 + (−1)^n/n

converges to L = 1. Indeed,

d(x_n, 1) = |1 + (−1)^n/n − 1| = |(−1)^n/n| = 1/n → 0

and so, by Proposition 281, x_n → 1. N

Since d(x_n, 0) = |x_n|, a simple noteworthy consequence of the last proposition is that

x_n → 0 ⟺ |x_n| → 0   (8.29)

A sequence is, thus, infinitesimal if and only if it is "absolutely" infinitesimal, in that the distances of its terms from the origin become smaller and smaller.

We close with an important observation: in applying Definition 277 of convergence, we always have to posit a possible candidate limit L ∈ ℝ, and then verify whether it satisfies the definition. It is a "guess and verify" procedure.¹¹ For some sequences, however, guessing a candidate limit L might not be obvious, which makes the application of the definition problematic. We will return to this important issue when discussing Cauchy sequences (Section 8.12).¹²

8.8.2 Limits from above and from below

It may happen that x_n → L ∈ ℝ and that, eventually, we also have x_n ≥ L. In other words, {x_n} approaches L by remaining to its right. In such a case we say that {x_n} tends to L from above, and write lim_{n→∞} x_n = L⁺ or x_n → L⁺. In particular, if {x_n} is decreasing, we write x_n ↓ L.
The notations x_n → L⁺ and x_n ↓ L are more informative than x_n → L: besides saying that {x_n} converges to L, they also convey the information that this happens from above (monotonically if x_n ↓ L).
Similarly, if x_n → L ∈ ℝ and eventually x_n ≤ L, we say that {x_n} tends to L from below and write lim_{n→∞} x_n = L⁻ or x_n → L⁻. In particular, if {x_n} is increasing we write x_n ↑ L.
Example 283 (i) We have 1/n ↓ 0 and 1/√(2^{n−1}) ↓ 0, as well as {1 − 1/n} ↑ 1. (ii) We have 1 + (−1)^n/n → 1, but neither to 1⁺ nor to 1⁻. N
11 The "guess" part, i.e., how to posit a candidate limit, relies on experience (so we have an "educated guess"), inspiration, revelation, or just a little bird's suggestion.
12 Section 12.9 will show that for sequences defined by recurrences there is an elegant way, via fixed points, to supply candidate limit points.

Example 284 Consider the sequence x_n = n^{−1} + (−1)^n n^{−1}, i.e.,

x_n = 0 if n is odd,  x_n = 2/n if n is even

So, x_n → 0⁺ but not x_n ↓ 0, because this sequence is not monotonic. N

The notions of limits from above and from below can be made rigorous via right and left neighborhoods of L, as readers can check.

8.8.3 Divergence

We now consider divergence. We begin with positive divergence. The spirit of the definition is similar, mutatis mutandis, to that of convergence (as will soon be clear).

Definition 285 A sequence {x_n} diverges positively, written x_n → +∞ or lim_{n→∞} x_n = +∞, if for every K ∈ ℝ there exists n_K ≥ 1 such that

n ≥ n_K ⟹ x_n > K

In other words, a sequence diverges positively when it eventually becomes greater than every scalar K. Since the constant K can be taken arbitrarily large, this can happen only if the sequence is not bounded above (it is easy to be > K when K is small, increasingly difficult the larger K is).
Example 286 The sequence of the even numbers x_n = 2n diverges positively. Indeed, let K ∈ ℝ. We have

2n > K ⟺ n > K/2

and so we can choose as n_K any integer greater than K/2. For example, if K = 10^{100}, we can put n_K = 10^{100}/2 + 1. Therefore, x_n = 2n diverges positively. N
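The bar test of Definition 285 is just as concrete; a small illustrative Python sketch for Example 286:

```python
import math

# Minimal sketch: for x_n = 2n (Example 286), the position n_K = floor(K/2) + 1
# guarantees x_n > K from n_K onward.
def n_K(K):
    return math.floor(K / 2) + 1

K = 10 ** 6
assert all(2 * n > K for n in range(n_K(K), n_K(K) + 1000))
print(n_K(K))  # 500001
```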
O.R. For divergence there is a demanding "above the bar" test to pass: a relentless examiner now sets an arbitrary bar K; to show the divergence of a sequence, you have to come up with a position n_K far enough along so that all terms of the sequence that come after such a position are above the posited bar. A divergent sequence is able to pass any such test, however tough the examiner may be (i.e., however high K is). H

The definition of negative divergence is dual.

Definition 287 A sequence {x_n} diverges negatively, written x_n → −∞ or lim_{n→∞} x_n = −∞, if for every K ∈ ℝ there exists n_K ≥ 1 such that

n ≥ n_K ⟹ x_n < K

In such a case, the terms of the sequence are eventually smaller than every scalar K: although the constant can take arbitrarily large negative values (in absolute value), there exists a position beyond which all the terms of the sequence are smaller than the constant. This characterizes the convergence to −∞ of the sequence.

Intuitively, divergence is a form of "convergence to infinity". The next simple, but important, result highlights the strong connection between convergence and divergence.

Proposition 288 A sequence {x_n}, with eventually x_n > 0, diverges positively if and only if the sequence {1/x_n} converges to zero.

A dual result holds for negative divergence.¹³

Proof "If". Let 1/x_n → 0. Let K > 0. Setting ε = 1/K > 0, by Definition 277 there exists n_{1/K} ≥ 1 such that 1/x_n < 1/K for every n ≥ n_{1/K}. Therefore, x_n > K for every n ≥ n_{1/K}, and by Definition 285 we have x_n → +∞.
"Only if". Let x_n → +∞ and let ε > 0. Setting K = 1/ε > 0, by Definition 285 there exists n_{1/ε} such that x_n > 1/ε for every n ≥ n_{1/ε}. Therefore, 0 < 1/x_n < ε for every n ≥ n_{1/ε} and so 1/x_n → 0.

Adding, subtracting, or changing in any other way a finite number of terms of a sequence does not alter its asymptotic behavior: if it is regular, i.e., convergent or (properly) divergent, it remains so, and with the same limit; if it is irregular (oscillating), it remains so. Clearly, this depends on the fact that the notion of limit requires that a property (either "hitting" an arbitrarily small neighborhood, in case of convergence, or being greater than an arbitrarily large number, in case of divergence) only holds eventually.

8.8.4 Topology of ℝ̄ and a general definition of limit

The topology of the real line can be extended in a natural way to the extended real line ℝ̄ by defining the neighborhoods of the points at infinity +∞ and −∞ in the following way.

Definition 289 A neighborhood of +∞ is a half-line (K, +∞], with K ∈ ℝ. A neighborhood of −∞ is a half-line [−∞, K), with K ∈ ℝ.

Therefore, a neighborhood of +∞ is formed by all scalars greater than a scalar K, while a neighborhood of −∞ is formed by all scalars smaller than K.

O.R. The smaller ε > 0 is, the smaller a neighborhood B_ε(x) of a point is. In contrast, the greater K > 0 is, the smaller a neighborhood (K, +∞] of +∞ is. For this reason, for a neighborhood of +∞ the value of K becomes significant when positive and arbitrarily large (while for a neighborhood of −∞ the value of K becomes significant when negative and arbitrarily large, in absolute value). H

The neighborhoods (K, +∞] and [−∞, K) are open intervals in ℝ̄ for every K ∈ ℝ.¹⁴ That said, we can state a lemma that will be useful in defining limits of sequences.

Lemma 290 Let A be a set in ℝ̄. Then,

(i) +∞ is a point of accumulation of A if and only if A is unbounded above;

(ii) −∞ is a point of accumulation of A if and only if A is unbounded below.

13 The hypothesis "eventually x_n > 0" is redundant in the "only if" since a sequence that diverges positively always satisfies this condition.
14 Each point x ∈ (K, +∞] is interior because, by taking K′ with K < K′ < x, we have x ∈ (K′, +∞] ⊆ (K, +∞]. A similar argument shows that each point x ∈ [−∞, K) is interior.

Proof We only prove (i) since the proof of (ii) is similar. "If". Let A be unbounded above, i.e., A has no upper bounds. Let (K, +∞] be a neighborhood of +∞. Since A has no upper bounds, K is not an upper bound of A. Therefore, there exists x ∈ A such that x > K, i.e., x ∈ (K, +∞] ∩ A and x ≠ +∞. It follows that +∞ is a limit point of A. Indeed, each neighborhood of +∞ contains points of A different from +∞.
"Only if". Let +∞ be a limit point of A. We show that A does not have any upper bound. Suppose, by contradiction, that K ∈ ℝ is an upper bound of A. Since +∞ is a limit point of A, the neighborhood (K, +∞] of +∞ contains a point x ∈ A such that x ≠ +∞. Therefore K < x, contradicting the fact that K is an upper bound of A.

Example 291 The sets A such that (a, +∞) ⊆ A for some a ∈ ℝ are an important class of sets unbounded above. By Lemma 290, +∞ is a limit point for such sets A. Similarly, −∞ is a limit point for the sets A such that (−∞, a) ⊆ A for some a ∈ ℝ. N

Using the topology of ℝ̄ we can give a general definition of convergence that generalizes Definition 278 of convergence so as to include Definitions 285 and 287 of divergence as special cases. In the next definition, which unifies all previous definitions of limit of a sequence, we set:

U(L) = B_ε(L) if L ∈ ℝ,  U(L) = (K, +∞] if L = +∞,  U(L) = [−∞, K) if L = −∞

Definition 292 A sequence {x_n} in ℝ converges to a point L ∈ ℝ̄ if, for every neighborhood U(L) of L, there exists n_U ≥ 1 such that

n ≥ n_U ⟹ x_n ∈ U(L)

If L ∈ ℝ, we get back to Definition 278. If L = ±∞, thanks to Definition 289 of neighborhood, Definition 292 becomes a reformulation in terms of neighborhoods of Definitions 285 and 287.
This general definition of convergence shows the unity of the notions of convergence and divergence studied so far, thus confirming the strong connection between convergence and divergence that already emerged in Proposition 288.

O.R. If L ∈ ℝ, the position n_U depends on an arbitrary radius ε > 0 (in particular, as small as we want), so we can write n_U = n_ε. If, instead, L = +∞, then n_U depends on an arbitrary scalar K (in particular, positive and arbitrarily large), so we can write n_U = n_K. Finally, if L = −∞, then n_U depends on any negative real number K (in particular, negative and arbitrarily large in absolute value) and, without losing generality, we can set n_U = n_K. Thus, when L is finite it is crucial that the property holds also for arbitrarily small values of ε. When L = ±∞, it is instead key that the property holds also for K arbitrarily large in absolute value. H

8.9 Properties of limits

In this section we study some properties of limits. The first result shows that the limit of a sequence, if it exists, is unique.

Theorem 293 (Uniqueness of the limit) A sequence {x_n} converges to at most one limit L ∈ ℝ̄.

Proof Suppose, by contradiction, that there exist two distinct limits L′ and L″ that belong to the set ℝ̄. Without loss of generality, we assume that L″ > L′. We consider different cases and show that in each of them we reach a contradiction; so L′ = L″ and we conclude that the limit is unique.
We begin with the case when both L′ and L″ are finite, i.e., L′, L″ ∈ ℝ. Take ε > 0 so that

ε < (L″ − L′)/2

Then

B_ε(L′) ∩ B_ε(L″) = ∅

as the reader can verify and the next figure illustrates:

[Figure: the disjoint neighborhoods (L′ − ε, L′ + ε) and (L″ − ε, L″ + ε).]

By Definition 278, there exists n′_ε ≥ 1 such that x_n ∈ B_ε(L′) for every n ≥ n′_ε, and there exists n″_ε ≥ 1 such that x_n ∈ B_ε(L″) for every n ≥ n″_ε. Setting n_ε = max{n′_ε, n″_ε}, we therefore have both x_n ∈ B_ε(L′) and x_n ∈ B_ε(L″) for every n ≥ n_ε, i.e., x_n ∈ B_ε(L′) ∩ B_ε(L″) for every n ≥ n_ε. But this contradicts B_ε(L′) ∩ B_ε(L″) = ∅. We conclude that L′ = L″, so the limit is unique.
Turn now to the case in which L′ is finite and L″ = +∞. For every ε > 0 and every K > 0, there exist n_ε and n_K such that

L′ − ε < x_n < L′ + ε  ∀n ≥ n_ε  and  x_n > K  ∀n ≥ n_K

For n ≥ max{n_ε, n_K}, we therefore have simultaneously

L′ − ε < x_n < L′ + ε  and  x_n > K

It is now sufficient to take K = L′ + ε to realize that, for n ≥ max{n_ε, n_K}, the two inequalities cannot coexist. Also in this case we reached a contradiction.

The remaining cases can be treated in a similar way and are thus left to the reader.

The next result shows that, when a sequence converges to a point L ∈ ℝ, each neighborhood of L contains almost all the points of the sequence.

Proposition 294 A sequence {x_n} converges to L ∈ ℝ if and only if each neighborhood B_ε(L) of L contains all the terms of the sequence, except at most a finite number of them.

In other words, the sequence eventually belongs to any neighborhood B_ε(L) of L.

Proof Let x_n → L. By Definition 278, for every ε > 0 there exists n_ε ≥ 1 such that x_n ∈ B_ε(L) for every n ≥ n_ε. Therefore, except at most the terms x_n with 1 ≤ n < n_ε, all the terms of the sequence belong to B_ε(L).
Vice versa, given any neighborhood B_ε(L) of L, suppose that all the terms of the sequence belong to it, except at most a finite number of them. Denote by {x_{n_k}}, with k = 1, 2, ..., m, the set of the elements of the sequence that do not belong to B_ε(L). Setting n_ε = n_m + 1, we have that x_n ∈ B_ε(L) for every n ≥ n_ε. Since this is true for each neighborhood B_ε(L) of L, by Definition 278 we have x_n → L.

The next classic result shows that the terms of a convergent sequence eventually have the same sign as the limit point. In other words, the sign of the limit point eventually determines the sign of the terms of the sequence.

Theorem 295 (Permanence of sign) Let {x_n} be a sequence that converges to a limit L ≠ 0. Then, eventually x_n has the same sign as L, that is, x_n L > 0.

Analogously, it is easy to see that if x_n → +∞ (resp., −∞), then eventually x_n ≥ K (resp., x_n ≤ K) for every K > 0 (resp., K < 0).

Proof Suppose L > 0 (a similar argument holds if L < 0). Let ε ∈ (0, L). By Definition 277, there exists n̄ ≥ 1 such that |x_n − L| < ε, i.e., L − ε < x_n < L + ε, for every n ≥ n̄. Since ε ∈ (0, L), we have L − ε > 0. Therefore,

0 < L − ε < x_n  ∀n ≥ n̄

We conclude that x_n > 0 for every n ≥ n̄, as desired.

This last theorem established a property of limits with respect to the order structure of the real line. Next we give another simple result of the same kind, leaving the proof to the reader. A piece of notation: x_n → L ∈ ℝ̄ indicates that the sequence {x_n} either converges to L ∈ ℝ or diverges (positively or negatively).

Proposition 296 Let {x_n} and {y_n} be two sequences such that x_n → L ∈ ℝ̄ and y_n → H ∈ ℝ̄. If eventually x_n ≥ y_n, then L ≥ H.

The scope of this proposition is noteworthy. It allows us, for example, to check the positive or negative divergence of a sequence through a simple comparison with other divergent sequences. Indeed, if x_n ≥ y_n and x_n diverges negatively, so does y_n; if x_n ≥ y_n and y_n diverges positively, so does x_n.

The converse of the proposition does not hold: for example, let L = H = 0, {x_n} = {−1/n} and {y_n} = {1/n}. We have L ≥ H, but x_n < y_n for every n. However, if we assume L > H, then the converse holds "strictly".

Proposition 297 Let {x_n} and {y_n} be two sequences such that x_n → L ∈ ℝ̄ and y_n → H ∈ ℝ̄. If L > H, then eventually x_n > y_n.

Proof We prove the statement for L, H ∈ ℝ, leaving the other cases to the reader. Let 0 < ε < (L − H)/2. Since H + ε < L − ε, we have (H − ε, H + ε) ∩ (L − ε, L + ε) = ∅. Moreover, there exist n′_ε, n″_ε ≥ 1 such that y_n ∈ (H − ε, H + ε) for every n ≥ n′_ε and x_n ∈ (L − ε, L + ε) for every n ≥ n″_ε. For every n ≥ max{n′_ε, n″_ε}, we then have y_n ∈ (H − ε, H + ε) and x_n ∈ (L − ε, L + ε), so x_n > L − ε > H + ε > y_n. We conclude that eventually x_n > y_n.

8.9.1 Monotonicity and convergence

The next result gives a simple necessary condition for convergence.

Proposition 298 Each convergent sequence is bounded.

Proof Suppose x_n → L. Setting ε = 1, there exists n₁ ≥ 1 such that x_n ∈ B₁(L) for every n ≥ n₁. Let M > 0 be a constant such that

M > max[1, d(x₁, L), ..., d(x_{n₁−1}, L)]

We have d(x_n, L) < M for every n ≥ 1, i.e., |x_n − L| < M for every n ≥ 1. This implies that, for all n ≥ 1,

L − M < x_n < L + M

Therefore, the sequence is bounded.

Thanks to this proposition, the convergent sequences form a subset of the bounded ones. Therefore, if a sequence is unbounded, it cannot be convergent.

In general, the converse of Proposition 298 is false. For example, the alternating sequence x_n = (−1)^n is bounded but does not converge. A partial converse will soon be established by the Bolzano-Weierstrass Theorem. A full-fledged converse, however, holds for the important class of monotonic sequences: for such sequences, boundedness is both a necessary and a sufficient condition for convergence. This result is actually a corollary of the following general theorem on the asymptotic behavior of monotonic sequences.

Theorem 299 Each monotonic sequence is regular. In particular,

(i) it converges if it is bounded;

(ii) it diverges positively if it is increasing and unbounded;

(iii) it diverges negatively if it is decreasing and unbounded.



Proof Let {x_n} be an increasing sequence (the proof for decreasing sequences is similar). It can be either bounded or unbounded above (for sure, it is bounded below because x₁ ≤ x_n for every n ≥ 1). Suppose that {x_n} is bounded. We want to prove that it is convergent. Let E be the image of the sequence. By hypothesis, it is a bounded subset of ℝ. By the Least Upper Bound Principle, sup E exists. Set L = sup E. Let us prove that x_n → L. Let ε > 0. Since L is the supremum of E, by Proposition 120 we have: (i) L ≥ x_n for every n ≥ 1; (ii) there exists an element x_{n_ε} of E such that x_{n_ε} > L − ε. Since {x_n} is an increasing sequence, it then follows that

L ≥ x_n ≥ x_{n_ε} > L − ε  ∀n ≥ n_ε

Hence, x_n ∈ B_ε(L) for every n ≥ n_ε, as desired.
Suppose that {x_n} is unbounded above. Then, for every K > 0 there exists an element x_{n_K} such that x_{n_K} > K. Since {x_n} is increasing, we then have x_n ≥ x_{n_K} > K for every n ≥ n_K, so it diverges to +∞.

Thus, monotonic sequences cannot be irregular. We are now able to state and prove the result anticipated above on the equivalence of boundedness and convergence for monotonic sequences.

Corollary 300 A monotonic sequence is convergent if and only if it is bounded.

Proof Consider an increasing sequence. If it is convergent, then by Proposition 298 it is bounded. If it is bounded, then by Theorem 299 it is convergent.

Needless to say, the results just discussed hold, more generally, for sequences that are
eventually monotonic.

8.9.2 Bolzano-Weierstrass' Theorem

The famous Bolzano-Weierstrass' Theorem is a partial converse of Proposition 298. It is the deepest result of this chapter, with far-reaching consequences. To state it, we must first introduce subsequences. Consider a sequence {x_n}. Given a strictly increasing sequence {n_k}_{k=1}^∞ that takes on only strictly positive integer values, i.e.,

n₁ < n₂ < n₃ < ... < n_k < ...

the sequence

{x_{n_k}}_{k=1}^∞ = {x_{n₁}, x_{n₂}, x_{n₃}, ..., x_{n_k}, ...}

is called a subsequence of {x_n}. In words, the subsequence {x_{n_k}} is a new sequence constructed from the original sequence {x_n} by taking only the terms of position n_k. A few examples should clarify.

Example 301 Consider the sequence

1, 1/2, 1/3, 1/4, ..., 1/n, ...   (8.30)

with term x_n = 1/n. A subsequence is given by

1, 1/3, 1/5, 1/7, ..., 1/(2k + 1), ...

where {n_k}_{k≥1} is the sequence of the odd numbers {1, 3, 5, ...}. Thus, this subsequence has been constructed by selecting the elements of odd position in the original sequence. Another subsequence of (8.30) is given by

1/2, 1/4, 1/8, 1/16, ..., 1/2^n, ...

where now {n_k}_{k≥1} is formed by the powers of 2, that is, {2, 2², 2³, ...}. This subsequence is constructed by selecting the elements of the original sequence whose position is a power of 2. N

Example 302 Consider the alternating sequence x_n = (−1)^n. A simple subsequence is given by

{1, 1, 1, ..., 1, ...}   (8.31)

where {n_k}_{k≥1} is the sequence of the even numbers. This subsequence has thus been constructed by selecting the elements of even position in the original sequence. If we select those of odd position, we construct the subsequence

{−1, −1, −1, ..., −1, ...}   (8.32)

By taking {n_k}_{k≥1} = {1000k}, i.e., by selecting only the elements of positions 1,000, 2,000, 3,000, ..., we still get the subsequence (8.31). On the other hand, (8.31) is not a subsequence of (8.30) because the term 1 appears only at the initial position of (8.30). N

A subsequence is obtained by discarding some terms (possibly, infinitely many) of the original sequence, still keeping an infinite number of them. So, if a sequence is regular, all its subsequences are regular and with the same limit: ubi maior, minor cessat. More is true:

Proposition 303 A sequence is regular, with limit L ∈ ℝ̄, if and only if all its subsequences are regular with the same limit L.

Proof We prove the result for L ∈ ℝ, leaving the case L = ±∞ to the reader. "Only if". Suppose that {x_n} converges to L. Let ε > 0. There exists n_ε ≥ 1 such that |x_n − L| < ε for every n ≥ n_ε. Let {x_{n_k}}_{k=1}^∞ be a subsequence of {x_n}. Since n_k ≥ k for every k ≥ 1, a fortiori we have |x_{n_k} − L| < ε for every k ≥ n_ε, so that {x_{n_k}} converges to L.
"If". Suppose that each subsequence of {x_n} converges to L. Suppose, by contradiction, that {x_n} does not converge to L. Then, there is an ε₀ > 0 such that, for every integer k ≥ 1, there exists a position n_k ≥ k for which x_{n_k} ∉ B_{ε₀}(L), i.e., |x_{n_k} − L| > ε₀. Construct the sequence of such x_{n_k}.¹⁵ It is a subsequence of {x_n} that, by construction, does not converge to L. So, we reached a contradiction. We conclude that {x_n} converges to L.

15 For the first term we take k = 1 and the integer n₁ ≥ 1 such that |x_{n₁} − L| > ε₀; for the second term we take k = 2 and the integer n₂ ≥ 2 such that |x_{n₂} − L| > ε₀; and so on.

In the last example we extracted, from an oscillating sequence, a constant subsequence by selecting only the elements of even position (or only those of odd position). So, it might well happen that, by suitably selecting the elements, we can extract a convergent "trend" out of an irregular sequence. There might be order even in chaos (and method in madness). Bolzano-Weierstrass' Theorem shows that this is always possible, as long as the sequence is bounded.

Theorem 304 (Bolzano-Weierstrass) Each bounded sequence has (at least) one convergent subsequence.

In other words, from any bounded sequence {x_n}, even if highly irregular, it is always possible to extract a convergent subsequence {x_{n_k}}, i.e., one such that there exists L ∈ ℝ for which lim_{k→∞} x_{n_k} = L. So, we can always extract convergent behavior from any bounded sequence, a truly remarkable property.

Example 305 The alternating sequence x_n = (−1)^n is bounded because its image is the bounded set {−1, 1}. By Bolzano-Weierstrass' Theorem, it has at least one convergent subsequence. Indeed, such are the constant subsequences (8.31) and (8.32). N
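A quick numerical illustration of Example 305 (a sketch, not part of the proof): selecting the even positions of x_n = (−1)^n yields a constant, hence convergent, subsequence.

```python
# Minimal sketch: extract the even-position subsequence of x_n = (-1)**n,
# as in Example 305; it is constantly 1, hence convergent.
x = lambda n: (-1) ** n
subsequence = [x(n) for n in range(2, 101, 2)]   # positions n_k = 2k
assert all(term == 1 for term in subsequence)
print(subsequence[:5])  # [1, 1, 1, 1, 1]
```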

The proof of Bolzano-Weierstrass' Theorem is based on the next lemma.

Lemma 306 Each sequence has a monotonic subsequence.

Proof Let {x_n} be a sequence. We consider two cases.

Case 1: for every n ≥ 1 there exists m > n such that x_m ≤ x_n. Set n₁ = 1. Let n₂ > n₁ be such that x_{n₂} ≤ x_{n₁}; then let n₃ > n₂ be such that x_{n₃} ≤ x_{n₂}, and so on. We construct in this way a decreasing monotonic subsequence {x_{n_k}}, so the lemma is proved in this case.

Case 2: there exists a position n ≥ 1 such that, for each m > n, we have x_m > x_n. Let I ⊆ ℕ be the set of all the positions with this property. If I is a finite set, then Case 1 holds for all the positions n > max I. By considering n > max I, we can therefore construct, as in Case 1, a decreasing monotonic subsequence {x_{n_k}}.
Suppose that, instead, I is not finite. So, there exist infinitely many positions n ≥ 1 such that

m > n ⟹ x_m > x_n   (8.33)

Since they are infinitely many, we can write I = {n₁, n₂, ..., n_k, ...}, with n₁ < n₂ < ... < n_k < ... By (8.33), we have:

x_{n₁} < x_{n₂} < ... < x_{n_k} < ...

The subsequence {x_{n_k}} is, therefore, monotonic increasing. This completes the proof of the lemma also in Case 2.

Proof of Bolzano-Weierstrass' Theorem Let {x_n} be a bounded sequence. By Lemma 306, there exists a monotonic subsequence {x_{n_k}}. Since this subsequence is bounded (being a subsequence of a bounded sequence), Theorem 299 shows that it is convergent, as desired.

For unbounded sequences, it is possible to establish a quite similar property.



Proposition 307 Each unbounded sequence has a divergent subsequence (to +∞ if unbounded above, to −∞ if unbounded below).¹⁶

Proof Suppose that the sequence is unbounded above (the other case is similar). Then, for every K > 0 there exists at least one element of the sequence greater than K. We denote by x_{n_K} the first term in the sequence {x_n} that turns out to be > K. By taking K = 1, 2, ..., the resulting sequence {x_{n_K}} is clearly a subsequence of {x_n} (indeed, all its terms have been taken among those of {x_n}) that diverges to +∞.

Summing up:

Proposition 308 Each sequence has a regular subsequence.

Remarkably, from any sequence, however wild, we can always extract a regular asymptotic behavior.

O.R. Bolzano-Weierstrass' Theorem says that it is not possible to take infinitely many scalars (the elements of the sequence) in a bounded interval in a way that makes them (or a part of them) "well separated" one from the other: necessarily, they crowd in the proximity of (at least) one point. More generally, the last proposition says that there is no way of taking infinitely many scalars without at least a part of them crowding somewhere (in proximity either of a finite number or of +∞ or of −∞, i.e., of some point of ℝ̄). H

8.10 Algebra of limits

8.10.1 The (many) certainties

In computing limits it is important to know how they behave with respect to the basic operations on sequences of Section 8.2. Besides its theoretical interest, this is important operationally because, through the basic operations, the computation of limits often reduces to the computation of simpler limits, of some common limits (that we will introduce soon), or of both.
The next result, based on the properties of the extended real line, shows that limits nicely interchange with the basic operations (so, the "limit of a sum" is the "sum of the limits", and so on), except in the forms of indetermination, i.e., except with respect to the operations that are indeterminate in the extended real line (Section 1.7).

Proposition 309 Let x_n → L ∈ ℝ̄ and y_n → H ∈ ℝ̄. Then:¹⁷

(i) x_n + y_n → L + H, provided that L + H is not an indeterminate form (1.25), of the type

+∞ − ∞ or −∞ + ∞

16 If it is both unbounded above and below, it has both a subsequence diverging to +∞ and a subsequence diverging to −∞.
17 Recall that x_n → L ∈ ℝ̄ indicates that the sequence {x_n} either converges to L ∈ ℝ or diverges positively or negatively.

(ii) x_n y_n → LH, provided that LH is not an indeterminate form (1.26), of the type

±∞ · 0 or 0 · (±∞)

(iii) x_n/y_n → L/H, provided that eventually y_n ≠ 0 and that L/H is not an indeterminate form (1.27), of the type¹⁸

±∞/±∞ or a/0
Proof (i) Let x_n → L and y_n → H, with L, H ∈ ℝ. This means that, for every ε > 0, there exist n₁ and n₂ such that

L − ε < x_n < L + ε  ∀n ≥ n₁  and  H − ε < y_n < H + ε  ∀n ≥ n₂

By adding the inequalities member by member, for every n ≥ n₃ = max{n₁, n₂} we have

L + H − 2ε < x_n + y_n < L + H + 2ε

Since 2ε is arbitrary, it follows that x_n + y_n → L + H.
Now let x_n → L ∈ ℝ and y_n → +∞. This means that, for every ε > 0 and for every K > 0, there exist n₁ and n₂ such that

L − ε < x_n < L + ε  ∀n ≥ n₁  and  y_n > K  ∀n ≥ n₂

By adding, we have, for every n ≥ n₃ = max{n₁, n₂},

x_n + y_n > K + L − ε

Since K + L − ε > 0 is arbitrary, it follows that x_n + y_n → +∞. The other cases with infinite limit are treated similarly.

(ii) Let x_n → L and y_n → H, with L, H ∈ ℝ. This means that, for every ε > 0, there exist n₁ and n₂ such that

L − ε < x_n < L + ε  ∀n ≥ n₁  and  H − ε < y_n < H + ε  ∀n ≥ n₂

Moreover, being convergent, {y_n} is bounded (recall Proposition 298): there exists b > 0 such that |y_n| ≤ b for every n. Now, for every n ≥ n₃ = max{n₁, n₂},

|x_n y_n − LH| = |y_n (x_n − L) + L (y_n − H)| ≤ |y_n| |x_n − L| + |L| |y_n − H| < ε (b + |L|)

By the arbitrariness of ε (b + |L|), we conclude that x_n y_n → LH.
If L > 0 and H = +∞, then in addition to having, for every ε > 0,

L − ε < x_n < L + ε  ∀n ≥ n₁

we also have, for every K > 0, y_n > K for every n ≥ n₂. It follows that, for every n ≥ n₃ = max{n₁, n₂},

x_n y_n > (L − ε) K

By the arbitrariness of (L − ε) K > 0, we conclude that x_n y_n → +∞. If L < 0 and H = +∞, we have x_n y_n < (L + ε) K and therefore x_n y_n → −∞. The other cases of infinite limits are treated in an analogous way.

Finally, we leave point (iii) to the reader.


18 Note that a/0 is equivalent to H = 0.

Example 310 (i) Let x_n = n/(n + 1) and y_n = 1 + (−1)^n/n. Since x_n → 1 and y_n → 1, we have x_n + y_n → 1 + 1 = 2 and x_n y_n → 1.
(ii) Let x_n = 2^n and y_n = 1 + (−1)^n/n. Since x_n → +∞ and y_n → 1, we have x_n + y_n → +∞ and x_n y_n → +∞. N

The following result shows that the case a/0 of point (iii) with a ≠ 0 is actually not indeterminate for the algebra of limits, although it is so for the extended real line (as seen in Section 1.7).

Proposition 311 Let x_n → L ∈ ℝ̄, with L ≠ 0, and y_n → 0. The limit of the sequence x_n/y_n exists if and only if the sequence {y_n} eventually has constant sign.¹⁹ In such a case:

(i) if either L > 0 and y_n → 0⁺ or L < 0 and y_n → 0⁻, then

x_n/y_n → +∞

(ii) if either L > 0 and y_n → 0⁻ or L < 0 and y_n → 0⁺, then

x_n/y_n → −∞

This proposition does not, unfortunately, say anything for the case a = 0, that is, for the indeterminate form 0/0.

Proof Let us prove the "only if" part (we leave the rest of the proof to the reader). Let L > 0 (the case L < 0 is similar). Suppose that the sequence {y_n} does not eventually have constant sign. Hence, there exist two subsequences {y_{n_k}} and {y_{n′_k}} such that y_{n_k} → 0⁺ and y_{n′_k} → 0⁻. Therefore, x_{n_k}/y_{n_k} → +∞ while x_{n′_k}/y_{n′_k} → −∞. Since two subsequences of x_n/y_n have distinct limits, Proposition 303 shows that the sequence x_n/y_n has no limit.

Example 312 (i) Take x_n = 1/n − 2 and y_n = 1/n. We have x_n → −2 and y_n → 0. Since {y_n} always has (and therefore also eventually has) positive sign, the proposition yields x_n/y_n → −∞.
(ii) Take x_n = 1/n + 3 and y_n = (−1)^n/n. In this case x_n → 3, but y_n → 0 with alternating signs, that is, y_n does not eventually have constant sign. Thanks to the proposition, the sequence {x_n/y_n} has no limit. N

Summing up, in view of the last two propositions we have the following indeterminate forms for the limits:

+∞ − ∞ or −∞ + ∞   (8.34)

which is often denoted by just writing ∞ − ∞;

±∞ · 0 or 0 · (±∞)   (8.35)

19 That is, its terms are eventually either all positive or all negative.

which is often denoted by just writing 0 · ∞; and

±∞/±∞ or 0/0   (8.36)

which are often denoted by just writing ∞/∞ and 0/0. Section 8.10.3 will be devoted to these indeterminate forms.

Besides the basic operations, the next result shows that limits also interchange nicely with the power (and the root, which is a special case), the exponential, and the logarithm. Indeed, (12.8) of Chapter 12 will show that such nice interchanging holds, more generally, for all functions that, like the power, exponential, and logarithm functions, are continuous. We thus omit the proof of the next result.

Proposition 313 Except in the indeterminate forms (1.28), that is,

1^{±∞}, 0⁰, (+∞)⁰

we have:²⁰

(i) lim x_n^α = (lim x_n)^α, provided α ∈ ℝ and x_n > 0;

(ii) lim α^{x_n} = α^{lim x_n}, provided α > 0;

(iii) lim log_a x_n = log_a lim x_n.

We have, therefore, also the following indeterminate forms for the limits:

1^{±∞}

which is often denoted by 1^∞;

(+∞)⁰

which is often denoted by ∞⁰; and

0⁰
8.10.2 Some common limits

We introduce two basic sequences (one being the reciprocal of the other). From their limit behavior we will then deduce many other limits thanks to the algebra of limits (Propositions 309 and 313).
For the sequence x_n = n, we have

lim n = +∞

because n > K for every n ≥ [K] + 1.

20 From now on, since there is no danger of confusion, we will simply write lim x_n instead of lim_{n→∞} x_n. Indeed, the limit of a sequence is defined only for n → ∞, so we can safely omit this detail.

For the "reciprocal" harmonic sequence x_n = 1/n, we have

lim 1/n = 0

because 0 < 1/n < ε for every n ≥ [1/ε] + 1.

As anticipated, from these two elementary limits we can infer, via the algebra of limits, many other ones. Specifically:

(i) lim n^α = +∞ for every α > 0;

(ii) lim (1/n)^α = lim n^{−α} = 0⁺ for every α > 0; therefore,

lim n^α = +∞ if α > 0,  1 if α = 0,  0⁺ if α < 0

(iii) we have:

lim α^n = +∞ if α > 1,  1 if α = 1,  0⁺ if 0 < α < 1

lim log_α n = +∞ if α > 1,  −∞ if 0 < α < 1
Many other limits hold; for example,

lim (5n⁷ + n² + 1) = +∞ + ∞ + 1 = +∞

as well as

lim (n² − 3n + 1) = lim n² (1 − 3/n + 1/n²) = +∞ · (1 − 0 + 0) = +∞

lim (n² − 5n − 7)/(2n² + 4n + 6) = lim [n² (1 − 5/n − 7/n²)]/[n² (2 + 4/n + 6/n²)] = (1 − 0 − 0)/(2 + 0 + 0) = 1/2

lim (5 − 1/n)/n² = [0 · (5 − 0)] = 0

and

lim [n (n + 1) (n + 2)]/[(2n − 1) (3n − 2) (5n − 4)]
= lim [n · n (1 + 1/n) · n (1 + 2/n)]/[2n (1 − 1/(2n)) · 3n (1 − 2/(3n)) · 5n (1 − 4/(5n))]
= (1/30) lim [(1 + 1/n) (1 + 2/n)]/[(1 − 1/(2n)) (1 − 2/(3n)) (1 − 4/(5n))]
= (1/30) · (1 · 1)/(1 · 1 · 1) = 1/30
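These algebraic manipulations are easy to sanity-check numerically; a small illustrative Python sketch (a large n stands in for n → ∞):

```python
# Minimal sketch: numerical check of two of the limits above.
n = 10 ** 6
r1 = (n**2 - 5*n - 7) / (2*n**2 + 4*n + 6)
r2 = (n * (n + 1) * (n + 2)) / ((2*n - 1) * (3*n - 2) * (5*n - 4))
print(round(r1, 6), round(r2, 6))  # approximately 0.5 and 0.033333 (= 1/30)
```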

8.10.3 Indeterminate forms for the limits

In the previous section we carefully avoided the indeterminate forms for the limits (8.34)-(8.36) because in such cases we cannot say, in general, anything. For instance, the limit of the sum of two sequences whose limits are infinite of opposite sign can be finite, infinite, or even fail to exist, as the examples below will show. Such a limit is thus "indeterminate" based on the information that the two summands diverge to +∞ and to −∞, respectively.
Fortunately, in many cases such indeterminacies do not arise and the limit of a sequence can be computed via the algebra of limits established in Propositions 309 and 313. For instance, if x_n → 5 and y_n → −3, then x_n + y_n → 5 + (−3) = 2 and x_n y_n → 5 · (−3) = −15. Indeed, these limits involve operations on the extended real line that are well defined, so the algebra of limits is effective.
That said, when we come across an indeterminate form, the algebra of limits is useless: we need to roll up our sleeves and work on the specific limit at hand. There are no shortcuts.

Indeterminate form ∞ − ∞

Consider the indeterminate form ∞ − ∞. For example, the limit of the sum x_n + y_n of the sequences x_n = n and y_n = −n² falls under this form of indetermination, so one cannot resort to the algebra of limits. We have, however,

x_n + y_n = n − n² = n (1 − n)

where n → +∞ and 1 − n → −∞, so that, being in the case +∞ · (−∞), it follows that x_n + y_n → −∞. Through a very simple algebraic manipulation, we have been able to find our way out of the indeterminacy.
Now take x_n = n² and y_n = −n. Also in this case, the limit of the sum x_n + y_n falls under the indeterminacy ∞ − ∞. By proceeding as we just did, this time we get

lim (x_n + y_n) = lim n (n − 1) = lim n · lim (n − 1) = +∞

Next, take x_n = 1/n + n and y_n = −n, still of type ∞ − ∞. Here again, a simple manipulation allows us to find a way out:

lim (x_n + y_n) = lim (1/n + n − n) = lim 1/n = 0

Finally, take x_n = n² + (−1)^n n and y_n = −n², which is again of type ∞ − ∞ since x_n → +∞ because x_n ≥ n² − n = n (n − 1). Now, the limit

lim (x_n + y_n) = lim (−1)^n n

does not exist.

In sum, when we have an indeterminate form ∞ − ∞, the limit might be either +∞ or −∞ or finite or nonexistent. In other words, everything goes. So, just to remark that the case at hand is of type ∞ − ∞ does not allow us to say anything on the limit of the sum.²¹

21 In contrast, if the case were, say, of type ∞ + a, then, even without knowing the specific form of the two sequences, the algebra of limits (specifically, Proposition 309-(i)) would allow us to conclude that the limit of their sum is ∞.

We have to study carefully the two sequences and come up, each time, with a way out of the indeterminacy (as we did in the simple examples just discussed). The same is true for the other indeterminate forms, as will be seen next.
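A small numerical sketch of three of the outcomes above (illustrative only; a large n stands in for n → ∞):

```python
# Minimal sketch: three instances of the form "infinity minus infinity"
# with different outcomes (see the examples above).
n = 10 ** 4
print(n - n**2)          # large negative: n - n^2 -> -infinity
print(n**2 - n)          # large positive: n^2 - n -> +infinity
print((1/n + n) - n)     # close to 0:    (1/n + n) - n -> 0
```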

Indeterminate form 0 · ∞

Let, for example, x_n = 1/n and y_n = n³. The limit of their product has the indeterminate form 0 · ∞, so we cannot use the algebra of limits. We have, however,

lim x_n y_n = lim (1/n) · n³ = lim n² = +∞

If x_n = 1/n³ and y_n = n, then

lim x_n y_n = lim (1/n³) · n = lim 1/n² = 0

If x_n = n³ and y_n = 7/n³, then

lim x_n y_n = lim n³ · (7/n³) = lim 7 = 7

If x_n = 1/n and y_n = n (cos n + 2),²² then

lim x_n y_n = lim (cos n + 2)

does not exist.


Again, everything goes. Only the direct calculation of the limit at hand can determine
its value.

Indeterminate forms ∞/∞ and 0/0

Consider, for example, x_n = n and y_n = n². The limit of their ratio has the form ∞/∞, but

lim x_n/y_n = lim n/n² = lim 1/n = 0

On the other hand, by exchanging x_n with y_n, the indeterminate form ∞/∞ remains but

lim y_n/x_n = lim n²/n = lim n = +∞

with a limit altogether different from the previous one.²³

Another example of ∞/∞ is given by x_n = n² and y_n = 1 + 2n². We have

lim x_n/y_n = lim n²/(1 + 2n²) = lim 1/(1/n² + 2) = 1/2

22 Using the comparison criterion, which we will study soon (Theorem 314), it is possible to prove easily that y_n → +∞.
23 Since x_n/y_n = 1/(y_n/x_n), Proposition 288 holds for the two limits.

That said, if x_n = n² (sin n + 7) and y_n = n², then

lim x_n/y_n = lim (sin n + 7)

which does not exist. Everything goes.
Naturally, the same is true for the indeterminate form 0/0. For example, let x_n = 1/n and y_n = 1/n². We have

lim x_n/y_n = lim (1/n)/(1/n²) = lim n = +∞

whereas, by exchanging the roles of x_n and y_n, we have

lim y_n/x_n = lim (1/n²)/(1/n) = lim 1/n = 0

The indeterminate forms ∞/∞ and 0/0 are closely connected: if the limit of the ratio of the sequences {x_n} and {y_n} falls under the indeterminate form ∞/∞, then the limit of the ratio of the sequences {1/x_n} and {1/y_n} falls under the indeterminate form 0/0, and vice versa.

8.10.4 Summary tables

We can summarize what we learned on the algebra of limits in three tables. In them, the first row indicates the limit of the sequence {x_n}, and the first column indicates the limit of the sequence {y_n}.
We start with the limit of the sum: the cells report the value of lim (x_n + y_n); we write ?? in case of indeterminacy.

sum  | +∞ | L     | −∞
+∞   | +∞ | +∞    | ??
H    | +∞ | L + H | −∞
−∞   | ?? | −∞    | −∞

We have two indeterminate cases out of nine.

Turn to the product: the cells now report the value of lim x_n y_n.

product | +∞ | L > 0 | 0  | L < 0 | −∞
+∞      | +∞ | +∞    | ?? | −∞    | −∞
H > 0   | +∞ | LH    | 0  | LH    | −∞
0       | ?? | 0     | 0  | 0     | ??
H < 0   | −∞ | LH    | 0  | LH    | +∞
−∞      | −∞ | −∞    | ?? | +∞    | +∞

Here there are four indeterminate cases out of twenty-five.



Finally, for the ratio we have the following table, where the cells report the value of lim (x_n/y_n).

ratio | +∞ | L > 0 | 0  | L < 0 | −∞
+∞    | ?? | 0     | 0  | 0     | ??
H > 0 | +∞ | L/H   | 0  | L/H   | −∞
0     | ±∞ | ±∞    | ?? | ±∞    | ±∞
H < 0 | −∞ | L/H   | 0  | L/H   | +∞
−∞    | ?? | 0     | 0  | 0     | ??

In view of Proposition 311, in the third row we assumed that y_n tends to 0 from above, y_n → 0⁺, or from below, y_n → 0⁻. In turn, this determines the sign of the infinity; for example,

lim 1/(1/n) = lim n = +∞  and  lim 1/(−1/n) = lim (−n) = −∞

For the ratio, we thus have five indeterminate cases out of twenty-five.

The tables make it clear that in the majority of the cases we can rely upon the algebra of limits (in particular, Propositions 309 and 313). Only relatively few cases are actually indeterminate.

O.R. The case 0^∞ is not indeterminate. Clearly, it is shorthand notation for lim x_n^{y_n}, where the base is a sequence (positive, otherwise the power is not defined) approaching 0 (more precisely, 0⁺) and the exponent is a divergent sequence. We can set 0^{+∞} = 0: if we multiply 0 by itself "infinitely many times" we still get a zero (a "zerissimo", if you wish). The form 0^{−∞} is the reciprocal, so 0^{−∞} = +∞. H

8.10.5 How many indeterminate forms are there?

We mentioned seven indeterminate forms:

∞/∞, 0/0, 0 · ∞, ∞ − ∞, 0⁰, ∞⁰, 1^∞

They are actually all connected. We could regard, for example, 0 · ∞ (or any other) as the basic indeterminate form and reduce all the other ones to it. Indeed:

(i) If x_n, y_n → ∞, their ratio x_n/y_n appears in the form ∞/∞, but it is sufficient to write the ratio as

x_n · (1/y_n)

to get the form ∞ · 0.

(ii) If x_n, y_n → 0, their ratio x_n/y_n appears in the form 0/0, but it is sufficient to write the ratio as

x_n · (1/y_n)

to get the form 0 · ∞.

(iii) If x_n → ∞ and y_n → −∞, their sum x_n + y_n appears in the form ∞ − ∞. However, we can write

x_n + y_n = (1 + y_n/x_n) x_n

If y_n/x_n does not tend to −1, the form is no longer indeterminate, while if y_n/x_n → −1 then the form is of the type 0 · ∞.

(iv) For the last three cases it is sufficient to consider the logarithm to end up, again, in the case 0 · ∞. Indeed:

log 0⁰ = 0 · log 0 = 0 · (−∞);  log ∞⁰ = 0 · log ∞ = 0 · ∞;  log 1^∞ = ∞ · log 1 = ∞ · 0

The reader can try to reduce all the forms of indeterminacy to either 0/0 or ∞/∞.

8.11 Convergence criteria

The computation of limits can be rather tedious and, in many cases, might not be that easy. In these cases, results that establish sufficient conditions for convergence, the so-called convergence criteria, are most useful.²⁴

8.11.1 Comparison criterion

We start with the classic comparison criterion: when two sequences converge to the same limit, the same is true for any sequence whose terms are "sandwiched" between those of the two original sequences.

Theorem 314 (Comparison criterion) Let {x_n}, {y_n}, and {z_n} be three sequences. If, eventually,

y_n ≤ x_n ≤ z_n   (8.37)

and

lim y_n = lim z_n = L ∈ ℝ   (8.38)

then

lim x_n = L

We can think of {x_n} as a convict who is escorted by the two policemen {y_n} and {z_n} (one on each "side"), so he is forced to go wherever they go.

Proof Suppose L ∈ ℝ (we leave to the reader the case L = ±∞). Let ε > 0. From (8.38) it follows, by Definition 278, that there exists n₁ such that y_n ∈ B_ε(L) for every n ≥ n₁, and there exists n₂ such that z_n ∈ B_ε(L) for every n ≥ n₂. Finally, let n₃ be the position starting from which one has y_n ≤ x_n ≤ z_n. Setting n̄ = max{n₁, n₂, n₃}, we then have y_n ∈ B_ε(L), z_n ∈ B_ε(L), and y_n ≤ x_n ≤ z_n for every n ≥ n̄. So,

L − ε < y_n ≤ x_n ≤ z_n < L + ε  ∀n ≥ n̄

24 In this book the term "criterion" (or "test") will always be understood as "sufficient condition".

that is, x_n ∈ B_ε(L) for every n ≥ n̄. Hence, x_n → L, as claimed.

The typical use of this result is in proving the convergence of a given sequence by showing
that it can be “trapped” between two suitable convergent sequences.

Example 315 (i) Consider the sequence xn = n 2 sin2 n. Since 1 sin n 1 for every
n 1, we have 0 sin2 n 1 for every n 1. So,

sin2 n 1
0 8n 1
n2 n2
(ii) Consider the sequences yn = 0 and zn = 1=n2 . Conditions (8.37) and (8.38) hold with
L = 0. By the comparison criterion, we conclude that lim xn = 0. N

Example 316 The sequence xn = n 1 sin n converges to 0. Indeed,


1 sin n 1
8n 1
n n n
and both sequences f1=ng and f 1=ng converge to 0. N

The previous example suggests that, if fxn g is a bounded sequence, say k xn k for
all n 1, and yn ! +1 or yn ! 1, then
xn
!0
yn
Indeed, we have
k xn k
jyn j yn jyn j
and k= jyn j ! 0.

8.11.2 Ratio criterion


The ratio and root criteria are often useful to establish that a sequence is in…nitesimal. They
will be used also for the convergence of series, as we will see in next chapter. Let us begin
with the ratio criterion.

Theorem 317 (Ratio criterion) If there exists a scalar q < 1 such that, eventually,

xn+1
q (8.39)
xn
then lim xn = 0.

Condition (8.39) requires hat the sequence of the absolute values jxn j to be eventually
strictly decreasing, i.e., eventually jxn+1 j < jxn j. By Corollary 300, we then have jxn j # L
for some L 0. The theorem claims that, indeed, L = 0.

Proof Suppose that the inequality holds starting from n = 1 (if it held from a certain n
onwards, just recall that eliminating a …nite number of terms does not alter the limit). It
218 CHAPTER 8. SEQUENCES

is enough to prove that jxn j ! 0 (recall (8.29)). From (8.39), it follows jxn+1 j q jxn j. In
particular, by iterating this inequality from n = 1 we have:

jx2 j q jx1 j ; jx3 j q jx2 j q 2 jx1 j ; ; jxn j qn 1


jx1 j ;

So,
0 jxn j qn 1
jx1 j 8n 2
Since 0 < q < 1, we have q n 1 ! 0. So, by the comparison criterion we jxn j ! 0.

Note that the theorem does not simply require the ratio jxn+1 =xn j to be < 1, that is,

xn+1
<1
xn

but that it be “far from it”, i.e., smaller than a number q which, in turn, is itself smaller
than 1. The next example clari…es this observation.

Example 318 The sequence xn = ( 1)n (1 + 1=n) does not converge – indeed, the sub-
sequence of its terms of even positions tends to +1, whereas that of its terms of odd positions
tends to 1. Yet:
1
xn+1 1 + n+1 n2 + 2n
= = <1
xn 1 + n1 n2 + 2n + 1
for every n 1. N

Though stated as a criterion to establish whether a sequence is in…nitesimal, the ratio


criterion is important for the general study of convergence because of the special status that
Proposition 281 gives in…nitesimal sequences. Indeed, by that proposition we have xn ! L
if and only if jxn Lj ! 0, so by the ratio criterion we have

xn+1 L
q =) xn ! L
xn L

The ratio criterion (and also the root criterion that we will see soon) thus applies, mutatis
mutandis, to the study of any convergence xn ! L.

An important case when condition (8.39) holds is when the ratio jxn+1 =xn j has a limit,
and such limit is < 1, that is,
xn+1
lim <1 (8.40)
xn
Indeed, denote by L this limit and let " > 0 be such that L + " < 1. By the de…nition of
limit, eventually we have
xn+1
L <"
xn
that is, L " < jxn+1 =xn j < L + ". Therefore, by setting q = L + " it follows that eventually
jxn+1 =xn j < q, which is property (8.39).
The limit form (8.40) is actually the most common form in which the ratio criterion is
applied. The next common limits illustrate its use:
8.11. CONVERGENCE CRITERIA 219

(i) For any > 1 and k 2 R, we have

nk
lim n
=0 (8.41)

Indeed, set
nk
xn = n

By taking the ratio of two consecutive terms (the absolute value is here irrelevant since
all terms are positive), we have
k k
xn+1 (n + 1)k n n+1 1 1 1 1
= n+1
= = 1+ ! <1
xn nk n n

(ii) If k 2 R and yn ! +1, then


logk yn
lim =0
yn
Indeed, by setting yn = ezn we get back to the previous case. In particular,

logk n log n
lim = lim =0
n n

O.R. What precedes indicates a hierarchy among the following classes of divergent sequences:
n
with > 1; nk with k > 0; logk n with k > 0 (8.42)

The “strongest”are the exponentials, graded according to the base , then the powers follow,
graded according to the exponent k, and, …nally, the logarithms, graded according to the
exponent k. For example, we have

5n 6 2n n123 + 7n87 n36 log n ! +1

since the sequence inherits the behavior of 5n , while we have

n4 3n3 + 6n2 4 1
!
5n4 + 7n3 + 25n2 + 342 5
because the numerator inherits the behavior of n4 and the denominator that of 5n4 .
Soon, in Section 8.14 we will make rigorous these observations on limits based on the
rate of convergence (or divergence). H

8.11.3 Root criterion


Next we turn to the second convergence criterion for in…nitesimal sequences.

Theorem 319 (Root criterion) If there exists a scalar q < 1 such that, eventually,
p
n
jxn j q (8.43)

then lim xn = 0.
220 CHAPTER 8. SEQUENCES

The strict inequality


p q < 1 is, again, key: the constant sequence xn = 1 does not converge
to 0 although n jxn j 1 for every n.

Proof As in the previous proof, suppose that (8.43) holds starting with n = 1. From
p
n
jxn j q

we immediately get jxn j q n , i.e., q n xn q n . Since 0 < q < 1, then q n ! 0, so the


result follows from the comparison criterion.

For the root criterion we can make observations similar to those n pthatowe made for the
ratio criterion. In particular, property (8.43) holds if the sequence n jxn j has a limit, and
such limit is < 1, that is, p
lim n jxn j < 1 (8.44)
This limit form is the most common with which the criterion is applied.

Example 320 Given k 2 R, let


n
n2 + 3
xn = k+
n3
p
Then, lim n
jxn j = jkj, so the root criterion implies lim xn = 0 as long as jkj < 1. N

The next simple example shows that both the ratio and the root criteria are su¢ cient,
but not necessary, conditions for convergence. However useful, they might turn out to be
useless to establish the convergence of some sequences.

Example 321 The harmonic sequence xn = 1=n converges to 0. It is hard to think of a


simpler limit. However, we have
xn+1 n
= !1
xn n+1
p
n
and so the ratio criterion is not applicable. Furthermore, we have n ! 1 since
p log n
log n
n = log n1=n = !0
n
Then, also the root criterion is is not applicable since
r
n 1 1
= p !1
n n

In sum, none of the two criteria is of any use for such a simple limit. N

Finally, note that both sequences xn = 1=n and xn = ( 1)n =n satisfy condition

xn+1
!1
xn
8.12. THE CAUCHY CONDITION 221

although the …rst sequence converges to 0 and the second one does not converge at all.
Therefore, this condition does not allow us to draw any conclusion about the asymptotic
behavior of a sequence. The same is true for the condition
p
n
jxn j ! 1

Indeed, it is enough to look at the sequences xn = n and xn = 1=n. All this con…rms the
key importance of the “strict”clause < 1 in (8.40) and (8.44). The next classic limit further
illustrates this remark.
p
Proposition 322 For every k > 0, we have lim n k = 1.

Proof The result is obvious


p for k = 1. Let k > 1. For any n, let xn > 0 be such that
(1 + xn )n = k, so that n k = 1 + xn . pFrom Newton’s binomial formula (B.4), we have
n
nxn k, and so xn ! 0. It follows that k ! 1. p p
Now, let k < 1. From what just seen, we have n 1=k ! 1, so thep sequence n 1=k
is bounded (Proposition
p 298). This, in turn, implies that the sequence n k is bounded as
n
well, say 0 k K for some scalar K > 0. By the comparison criterion, the equality
p
n
p p
k 1 = n
1=k 1 n k implies
r r
p
n n 1 p
n n 1
0 k 1 = 1 k K 1 !0
k k
p
n
So, lim k = 1.

8.12 The Cauchy condition


To check whether a sequence converges amounts to compute its limit, a “guess and verify”
procedure in which we …rst posit a candidate limit and then we check whether it is indeed
a limit (Section 8.8.1). It is often not so easy to implement this procedure,25 so to check
convergence. Moreover, the limit is an object which is, in a sense, “extraneous” to the
sequence because, in general, it is not a term of the sequence. Therefore, to establish the
convergence of a sequence we have to rely on a “stranger” that, in addition, might even be
di¢ cult to identify.
For this reason, it is important to have an “intrinsic”criterion for convergence that only
makes use of the terms of the sequence, without involving any extraneous object. To see how
to do this, consider the following simple intuition: if a sequence converges, then its elements
become closer and closer to the limit; but, if they become closer and closer to the limit, then
as a by-product they also become closer and closer one another. The next result formalizes
this intuition.

Theorem 323 (Cauchy) A sequence fxn g is convergent if and only if it satis…es the Cauchy
condition, that is, for each " > 0 there exists an integer n" 1 such that

jxn xm j < " 8n; m n" (8.45)


25
The role of little birds’suggestions in the “guess” part is especially troublesome.
222 CHAPTER 8. SEQUENCES

Sequences that satisfy the Cauchy condition are called Cauchy sequences. The Cauchy
condition is an intrinsic condition that only involves the terms of the sequence. According
to the theorem, a sequence converges if and only if it is Cauchy. Thus, to determine whether
a sequence converges it is enough to check whether it is Cauchy, something that does not
require to consider any extraneous object and just rely on the sequence itself.
But, as usual, there are no free meals: checking that a sequence is Cauchy informs us
about its convergence, but it does not say anything about the actual limit point. To …nd it,
we need to go back to the usual procedure that requires that a candidate be posited.

Proof “Only if”. If xn ! L then, by de…nition, for each " > 0 there exists n" 1 such that
jxn Lj < " for every n n" . This implies that, for every n; m n" ,

jxn xm j = jxn L+L xm j jxn Lj + jxm Lj < " + " = 2"

Since " was arbitrarily chosen, the statement follows.


“If”. If jxn xm j < " for every n; m n" , it easily follows that jxn xn" j < " for
n = n" + 1; n" + 2; : : :, that is,

xn" " < xn < xn" + " for n = n" + 1; n" + 2; : : :

Set A = fa 2 R : xn > a eventuallyg and B = fb 2 R : xn < b eventuallyg. Note that:

(i) A and B are not empty. Indeed, we have xn" " 2 A and xn" + " 2 B.
(ii) If a 2 A and b 2 B, then b > a. Indeed, since a 2 A (respectively, b 2 B), there
exists na 1 such that xn > a for every n na (resp., there exists nb 1 such that
b > xn for every n nb ). De…ne n = max fna ; nb g. It follows that b > xn > a.
(iii) We have sup A = inf B. Indeed, by the Least Upper Bound Principle and by the
previous two points, sup A and inf B are well-de…ned and are such that sup A inf B.
Since, by point (i), xn" " 2 A and xn" + " 2 B, we have xn" " sup A inf B
xn" + "; in particular, jinf B sup Aj 2". Since " can be chosen to be arbitrarily
small, we then have jinf B sup Aj = 0, that is, inf B = sup A.

Call z the common value of sup A and inf B. We claim that xn ! z. Indeed, by …xing
arbitrarily a number > 0, there exist a 2 A and b 2 B such that 0 b a < and,
therefore,
z <a<b<z+
because a z b, and so z < a and b < z + . But, by the de…nition of A and B, the
sequence is eventually strictly larger than a and strictly smaller than b. So, eventually,

z < xn < z +

Due to the arbitrary choice of , this shows that xn ! z, as desired.

Example 324 (i) The harmonic sequence xn = 1=n is Cauchy. Indeed, let " > 0. We have
to show that there exists n" 1 such that for every n; m n" one has jxn xm j < ".
Without loss of generality, assume that n m. Note that for n m we have
1 1 1
0 < jxn xm j = <
m n m
8.13. NAPIER’S CONSTANT 223

Since " > 1=m amounts to m > 1=", by choosing n" = [1="] + 1 we have jxn xm j < " for
every n m n" , thus proving that xn = 1=n is a Cauchy sequence.
(ii) The sequence xn = log n is not Cauchy. Suppose, by contradiction, that for a …xed
" > 0 there exists n" 1 such that for every n; m n" we have jxn xm j < ". First, note
that if n = m + k with k 2 N, we have

m+k
jxn xm j = log < " () k < m(e" 1)
m

Thus, by choosing k = [m(e" 1)] + 1 and m n" , we obtain jxn xm j = log m+k m ".
This contradicts jxn xm j < " since n; m n" . We conclude that xn = log n is not a Cauchy
sequence. N

The previous theorem states a fundamental property of convergent sequences, yet its
relevance is also due to the structural property of the real line that it isolates, the so-called
completeness of the real line. For example, let us assume –as it was the case for Pythagoras
–that we only knew the rational numbers: so, the space on which we operate is Q. Consider
the sequence whose elements (all rationals) are the decimal approximations of :

x1 = 3, x2 = 3:1, x3 = 3:14, x4 = 3:141, x5 = 3:1415, :::

Being a decimal approximation, this sequence satis…es the Cauchy condition because the
inequality
jxn xm j < 10 minfm 1;n 1g
can be made arbitrarily small. The sequence, however, does not converge to any point of Q:
if we knew R, we could say that it converges to . Therefore, in Q the Cauchy condition is
necessary, but not su¢ cient, for convergence. The reason is that Q has not “enough points”
to handle well convergence, unlike R. For instance, the previous sequence converges in R
because of the point , which is missing in Q. We thus say that R is complete (with respect
to convergence), while Q is incomplete. Indeed, R can be seen as a way to complete Q by
adding all the missing limit points, like , as readers will learn in more advanced courses.

8.13 Napier’s constant


The limit of the sequence
n
1
xn = 1+ (8.46)
n
involves the indeterminate form 11 , so the algebra of limits is useless and we have to study
it directly.
The next result proves that the limit exists and is, indeed, a fundamental number, denoted
by e and called Napier’s constant.

Theorem 325 The sequence (8.46) is convergent. Its limit is denoted by e, i.e.
n
1
e = lim 1 + (8.47)
n
224 CHAPTER 8. SEQUENCES

Since the sequence involves powers, the root criterion is a …rst possibility to consider to
prove the result. Unfortunately,
s
n 1 n 1
1+ =1+ !1
n n

and, therefore, this criterion cannot be applied. The proof is based, instead, on the following
classic inequality.

Lemma 326 Let 1 < a 6= 0. We have, for every n > 1,26

(1 + a)n > 1 + an (8.48)

Proof The proof is done by induction. Inequality (8.48) holds for n = 2. Indeed, for each
a 6= 0 we have:
(1 + a)2 = 1 + 2a + a2 > 1 + 2a
Suppose now that (8.48) holds for some n 2 (induction hypothesis), i.e.,

(1 + a)n > 1 + an

We want to prove that (8.48) holds for n + 1. We have:

(1 + a)n+1 = (1 + a)(1 + a)n > (1 + a)(1 + an)


= 1 + a(n + 1) + a2 n > 1 + a(n + 1)

where the …rst inequality, due to the induction hypothesis, holds because a > 1. This
completes the induction step.

Proof of Theorem 325 Set, for each n 1,


n n+1
1 1
an = 1+ ; bn = 1+
n n
We proceed by steps.

Step 1: fbn g is decreasing. Clearly, b1 > b2 . Moreover, for n 2 we have


2 3n " #
1 n+1 1 n+1 n
bn 1+ n 1 4 1+ n 5 1 n
= n = 1+ = 1+ n
bn 1 1+ 1 n 1 + 1 n n 1
n 1 n 1
n 1
1 (n + 1) (n 1) 1+ n
= 1+ = n
n n2 1+ 1
n2 1

and, using the inequality (8.48),27 we see that


n
1 n n 1
1+ >1+ >1+ >1+
n2 1 n2 1 n 2 n
26
For n = 1, equality holds trivially.
27
Note that 1 < 1= n2 1 6= 0 for n 2.
8.13. NAPIER’S CONSTANT 225

So, bn =bn 1 < 1.

Step 2: fan g is increasing. Clearly, a1 < a2 . Moreover, for n 2 we have


n
1 n n+1 n n 1 n n2 1 1 n
an 1+ n n n n2 1 n2
= n 1 = n 1 = 1 = 1
an 1 1 1 1
1+ n 1
n n n

and, again by the inequality used above,


n
1 n 1
1 >1 =1
n2 n2 n

we see that an =an 1 > 1.

Step 3: bn > an for every n and, moreover, bn an ! 0. Indeed


!
n+1 n n+1
1 1 1 1
bn an = 1+ 1+ = 1+ 1 1
n n n 1+ n
n+1
1 1 1
= 1+ = bn >0
n n+1 n+1

Given that bn < b1 , one gets that

bn b1
0 < bn an = < !0
n+1 n+1
By step 1, the sequence fbn g is decreasing and bounded below (being positive). So,
lim bn = inf bn . By step 2, the sequence fan g is increasing and, being an < bn for each
n (step 3), is bounded above. Hence, lim an = sup an . Since bn an ! 0 (step 3), from
bn inf bn sup an an it follows sup an = inf bn , so lim an = lim bn .

One obtains
a1 = 21 = 2 b1 = 22 = 4
3 2 3 3
a2 = 2 = 2:25 b2 = 2 = 3:375

11 10 11 11
a10 = 10 ' 2:59 b10 = 10 ' 2:85
Therefore, Napier’s constant lies between 2:59 and 2:85. Indeed, it is equal to 2:71828:::
Later we will prove that it is an irrational number (Theorem 368). It can be proved that it
is actually a transcendental number.28
Napier’s constant is, inter alia, the most convenient base of exponential and logarithmic
functions (Section 6.5.2). Later in the book we will see that it can be studied from di¤erent
28
An irrational
p number is called algebraic if it is a root of some polynomial equation with integer coe¢ cients:
for example, 2 is algebraic because it is a root of the equation x2 2 = 0. Irrational numbers that are not
algebraic are called transcendental.
226 CHAPTER 8. SEQUENCES

angles: as many important mathematical entities, Napier’s constant is a multi-faceted dia-


mond. Besides the “sequential” angle just seen in Theorem 325, a summation angle will be
studied in Section 9.3.4, a functional angle – with a compelling economic interpretation in
terms of compounding –will be presented in Section 14.6, and a di¤erential angle in Section
20.7.

From the fundamental limit (8.47), we can deduce many other important limits.

(i) If jxn j ! +1 (for example, xn ! +1 or xn ! 1), we have


xn
k
lim 1 + = ek
xn

For k = 1 the proof just requires to consider the integer part of xn . For any k, it is
su¢ cient to set k=xn = 1=yn , so that
xn kyn yn k
k 1 1
1+ = 1+ = 1+ ! ek
xn yn yn

(ii) If an ! 0 and an 6= 0, then


1
lim (1 + an ) an = e
It is su¢ cient to set an = 1=xn to …nd again the previous case (i).

(iii) If an ! 0 and an 6= 0, then


log (1 + an )
lim =1
an
It is su¢ cient to take the logarithm in the previous limit. More generally,

logb (1 + an )
lim = logb e 80 < b 6= 1
an

(iv) If c > 0, yn ! 0, and yn 6= 0, then

cyn 1
lim = log c
yn

It is su¢ cient to set cyn 1 = an (so that also an ! 0) to see that

cyn 1 an
=
yn logc (1 + an )

So, we are back to the (reciprocal of the) previous case in which the limit is 1= logc e =
loge c = log c.

(vi) If 2 R and zn ! 0, with zn 6= 0, then

(1 + zn ) 1
lim =
zn
8.14. ORDERS OF CONVERGENCE AND OF DIVERGENCE 227

The result is obvious for = 1. Let 6= 1, and set an = (1 + zn ) 1. That is,


log (1 + an ) = log (1 + zn ), so that also an ! 0. We have, therefore,

log (1 + an ) log (1 + zn ) log (1 + zn ) zn


= =
an (1 + zn ) 1 zn (1 + zn ) 1

Since
log (1 + an ) log (1 + zn )
lim = lim =1
an zn
the result then follows.

Let us apply to some simple limits what we just learned. We have:


n n
n+5 5
= 1+ ! e5
n n

as well as !
3 3
1 1 + 1=n2 1
n2 1+ 2 1 = !3
n 1=n2

and
1 log (1 + 1=n)
n log 1 + = !1
n 1=n
and
2n 1
! log 2
n

8.14 Orders of convergence and of divergence


8.14.1 Generalities
Some sequences converge to their limit “faster” than others. For instance, consider two
sequences fxn g and fyn g, both diverging to +1. For example, yn = n and xn = n2 .
Intuitively, the sequence fxn g diverges faster than fyn g. If we compare them through their
ratio
yn
xn
we have
yn 1
lim = lim = 0
xn n
Even though the numerator also tends to +1, the denominator has driven the ratio to its
end, forcing it to zero. Hence, the higher rate of divergence – i.e., of convergence to +1 –
of the sequence fxn g reveals itself in the convergence to zero of the ratio yn =xn . The ratio
seems, therefore, to be a natural test for the relative speed of convergence/divergence of the
two sequences.

The next de…nition formalizes this intuition, important both conceptually and computa-
tionally.
228 CHAPTER 8. SEQUENCES

De…nition 327 Let fxn g and fyn g be two sequences, with the terms of the former eventually
di¤ erent from zero.

(i) If
yn
!0
xn
we say that fyn g is negligible with respect to fxn g, and write

yn = o (xn )

(ii) If
yn
! k 6= 0 (8.49)
xn
we say that fyn g is of the same order (or comparable) with fxn g, and write

yn xn

(iii) In particular when k = 1, i.e., when


yn
!1
xn
we say that fyn g and fxn g are asymptotic, and write

yn xn

This classi…cation is comparative. For example, if fyn g is negligible with respect to fxn g,
it does not mean that fyn g is negligible per se, but that it becomes so when compared to
fxn g. The sequence yn = n2 is negligible with respect to xn = n5 , but it is not negligible at
all per se (it tends to in…nity!).
Observe that, thanks to Proposition 288, we have
yn xn
! 1 () ! 0 () xn = o (yn )
xn yn
Therefore, we can use the previous classi…cation also when the ratio yn =xn diverges, no
separate analysis is needed.

Terminology The expression yn = o (xn ) reads “fyn g is little-o of fxn g”.

We collect a few simple properties of these notions.

Lemma 328 Let fxn g and fyn g be two sequences with terms eventually di¤ erent from zero.

(i) The relation of comparability (in particular, ) is both symmetric, i.e., yn xn if


and only if xn yn , and transitive, i.e., zn yn and yn xn imply zn xn .29

(ii) The relation of negligibility is transitive, i.e., zn = o (yn ) and yn = o (xn ) implies
zn = o (xn ).
29
Comparability is, indeed, an equivalence relation (cf. Appendix A).
8.14. ORDERS OF CONVERGENCE AND OF DIVERGENCE 229

Proof The symmetry of follows from


yn xn 1
! k 6= 0 () ! 6= 0
xn yn k
We leave to the reader the easy proof of the other properties.

Finally, observe that


1 1
yn xn ()
yn xn
and, in particular,
1 1
yn xn () (8.50)
yn xn
provided that fxn g and fyn g are eventually di¤erent from zero. In other words, comparability
and negligibility are preserved when one moves to the reciprocals.

We now consider the more interesting cases in which both sequences are either in…n-
itesimal or divergent. We start with two in…nitesimal sequences fxn g and fyn g, that is,
lim xn = lim yn = 0. In this case, the negligible sequence tends faster to zero. Consider, for
example, xn = 1=n and yn = 1=n2 . Intuitively, yn goes to zero faster than xn . Indeed,
1
n2 1
1 = !0
n
n

that is yn = o (xn ). On the other hand, we have


p r
n 1
p = 1 !1
n+1 n+1
p p
and so the in…nitesimal sequences xn = 1= n and yn = 1= n + 1 are comparable.

Suppose now that the sequences fxn g and fyn g are both divergent, positively or negat-
ively, that is, limn!1 xn = 1 and limn!1 yn = 1. In this case, negligible sequences
tend slower to in…nity (independently on the sign), that is, they take on values greater and
greater, in absolute value, less rapidly. For example, let xn = n2 and yn = n. Intuitively, yn
goes to in…nity more slowly than xn . Indeed,
yn n 1
= 2 = !0
xn n n
that is, yn = o (xn ). On the other hand, the same is true if xn = n2 and yn = n because
it is not the sign of the in…nity that matters, but the rate of divergence.

The meaning of negligibility must, therefore, be quali…ed depending on whether we con-


sider convergence to zero or to in…nity (i.e., divergence). It is important to distinguish
carefully the two cases.

N.B. Setting xn = n and yn = n + k, with k > 0, the sequences fxn g and fyn g are
asymptotic. Indeed, no matter how large k is, the divergence to +1 of the two sequences
230 CHAPTER 8. SEQUENCES

will make negligible, from the asymptotic point of view, the role of k. Such a fundamental
viewpoint, central to the theory of sequences, should not make us forget that two asymptotic
sequences are, in general, very di¤erent (to …x ideas, set for example k = 1010 , i.e., 10 billions,
and consider the asymptotic, yet very di¤erent, sequences xn = n and yn = n + 1010 ). O

8.14.2 Little-o algebra


The application of the concept of “little-o” is not always straightforward. Indeed, knowing
that a sequence fyn g is little-o of another sequence fxn g does not convey too much inform-
ation on the form of fyn g, apart from being negligible with respect to fxn g. There exists,
however, an “algebra” of little-o that allows for manipulating safely the little-o of sums and
products of sequences.

Proposition 329 For every pair of sequences fxn g and fyn g and for every scalar c 6= 0, it
holds that:

(i) o(xn ) + o (xn ) = o (xn );

(ii) o(xn )o(yn ) = o(xn yn );

(iii) c o(xn ) = o(xn );

(iv) o(yn ) + o (xn ) = o (xn ) if yn = o(xn ).

The relation o(xn ) + o (xn ) = o (xn ) in (i), bizarre at …rst sight, simply means that the
sum of two little-o of a sequence is still a little-o of such sequence, that is, it continues to be
negligible with respect to that sequence. Similar re-readings hold for the other properties in
the proposition. Note that (ii) has the remarkable special case

o(xn )o(xn ) = o(x2n )

Proof If a sequence is little-o of xn it can be written as xn "n , where "n is an in…nitesimal


sequence. Indeed
xn " n
lim = lim "n = 0
xn
and therefore xn "n is little-o of xn . The proof will be based on this very useful arti…ce.
(i) Let us call xn "n the …rst of the two little-o on the left-hand side of the equality and
xn n the second one, with "n and n two in…nitesimal sequences. Then
xn " n + xn n
lim = lim ("n + n) =0
xn
which shows that o(xn ) + o (xn ) is o (xn ).
(ii) Let us call xn "n the little-o of xn and yn n the little-o of yn , with "n and n two
in…nitesimal sequences. Then
x n "n y n n
lim = lim ("n n) =0
x n yn

so that o(xn )o (yn ) is o (xn yn ).


8.14. ORDERS OF CONVERGENCE AND OF DIVERGENCE 231

(iii) Let us call xn "n the little-o of xn , with "n in…nitesimal sequence. Then
c x n "n
lim = c lim "n = 0
xn
that shows that c o(xn ) is o (xn ).
(iv) Let us call yn = xn "n , with "n an in…nitesimal sequence. Then, the little-o of yn
can be written as yn n that is, xn "n n , with n an in…nitesimal sequence. Moreover, we call
xn n the little-o of xn , with n an in…nitesimal sequence. Then
x n "n n+ xn n
lim = lim ("n n + n) =0
xn
so that o(yn ) + o (xn ) = o (xn ).

Example 330 Consider the sequence xn = n2 , as well as the sequences yn = n and zn =


2(log n n). It is immediate to see that yn = o(xn ) = o(n2 ) and zn = o(xn ) = o(n2 ).

(i) Adding up the two sequences we obtain yn + zn = 2 log n n, which is still o(n2 ) in
accordance with (i) proved above.

(ii) Multiplying the two sequences we obtain yn zn = 2n log n 2n2 , which is o(n2 n2 ) ,
i.e., o(n4 ), in accordance with (ii) proved above (in the special case o(xn )o(xn )). Note
that yn zn is not o(n2 ).

(iii) Take c = 3 and consider c yn = 3n. It is immediate that 3n is still o(n2 ), in accordance
with (iii) proved above.
p
(iv) Consider the sequence wn = n 1. It is immediate that wn = o(yn ) = o(n). Consider
now the sum wn + zn (with zn de…ned above), which is the sum of a o(yn ) and a o(xn ),
p
with yn = o(xn ). We have wn + zn = n 1 + 2 log n 2n, which is o(xn ) = o(n2 ) in
accordance with (iv) proved above. Note that wn + zn is not o(yn ), even if wn is o(yn ).
N

N.B. (i) To say that a sequence is o (1) simply means that it tends to 0. Indeed, xn = o (1)
means that xn =1 = xn ! 0. (ii) The fourth property in the last proposition is especially
important because it highlights that, if yn is negligible with respect to xn , in the sum
o(yn ) + o (xn ) the little-o o(yn ) is subsumed in o (xn ). O

8.14.3 Asymptotic equivalence


The relation identi…es sequences that are asymptotically equivalent to one another. Indeed,
it is easy to see that yn xn implies that, for L 2 R,

yn ! L () xn ! L (8.51)

In detail:

(i) if L 2 R, we have yn ! L if and only if xn ! L;

(ii) if L = +1, we have yn ! +1 if and only if xn ! +1;


232 CHAPTER 8. SEQUENCES

(iii) if L = 1, we have yn ! 1 if and only if xn ! 1;

All this suggests that it is possible to replace xn by yn (or vice versa) in the calculation
of the limits. Intuitively, such possibility is attractive because it might allow to replace a
complicate sequence by a simpler one that is asymptotic to it.
To make this intuition precise we start by observing that the asymptotic equivalence
is preserved under the fundamental operations.

Lemma 331 Let yn xn and zn wn . Then,

(i) yn + zn xn + wn provided there exists k > 0 such that, eventually,30

xn
k
x n + wn

(ii) yn zn x n wn ;

(iii) yn =zn xn =wn provided that eventually zn 6= 0 and wn 6= 0.

Note that for sums, di¤erently from the case of products and ratios, the result does not
hold in general, but only with a non-trivial ad hoc hypothesis. For this reason, points (ii)
and (iii) are the most interesting ones. In the sequel we will thus focus on the asymptotic
equivalence of products and ratios, leaving to the reader the study of sums.
Proof (i) We have
yn + zn yn zn yn xn zn wn
= + = +
x n + wn xn + wn xn + wn x n x n + wn wn xn + wn
yn xn zn xn yn zn xn zn
= + 1 = +
xn xn + wn wn xn + wn x n wn x n + wn wn

Since yn =xn ! 1 and zn =wn ! 1, we have


yn zn
!0
xn wn
hence
yn zn xn yn zn xn yn zn
0 = k!0
xn wn x n + wn xn wn x n + wn xn wn

By the comparison criterion,

yn zn xn
!0
xn wn x n + wn

and hence, since zn =wn ! 1, we have


yn + zn
!1
x n + wn
30
For example, the condition holds if fxn g and fwn g are both eventually positive.
8.14. ORDERS OF CONVERGENCE AND OF DIVERGENCE 233

as desired.
(ii) and (iii) We have
yn zn yn zn
= !1
x n wn x n wn
and yn
zn y n wn y n wn
xn = = !1
wn zn x n x n zn
since yn =xn ! 1 and zn =wn ! 1.

The next simple lemma is very useful: in the calculation of a limit, one should neglect
what is negligible.

Lemma 332 We have


xn xn + o (xn )

Proof It is su¢ cient to observe that


xn + o (xn ) o (xn )
=1+ !1
xn xn

By (8.51), we therefore have

xn + o (xn ) ! L () xn ! L

What is negligible with respect to the sequence fxn g –i.e., what is o (xn ) –is asymptotically
irrelevant and one can safely ignore it. Together with Lemma 331, this implies for products
and ratios, that
(xn + o (xn )) (yn + o (xn )) xn yn (8.52)
and
xn + o (xn ) xn
(8.53)
yn + o (xn ) yn
We illustrate these very useful asymptotic equivalences with some examples, which should
be read with particular attention.

Example 333 (i) Consider the limit

n4 3n3 + 5n2 7
lim
2n5 + 12n4 6n3 + 4n + 1
By (8.53), we have

n4 3n3 + 5n2 7 n4 + o n4 n4 1
= = !0
2n5 + 12n4 6n3 + 4n + 1 2n5 + o (n5 ) 2n 5 2n

(ii) Consider the limit


1 3
lim n2 7n + 3 2 +
n n2
234 CHAPTER 8. SEQUENCES

By (8.52),31 we have
1 3
n2 7n + 3 2 + 2 = n2 + o n2 (2 + o (1)) 2n2 ! +1
n n
(iii) Consider the limit
n (n + 1) (n + 2) (n + 3)
lim
(n 1) (n 2) (n 3) (n 4)
By (8.53), we have
n (n + 1) (n + 2) (n + 3) n4 + o n4 n4
= 4 =1!1
(n 1) (n 2) (n 3) (n 4) n + o (n4 ) n4
(iv) Consider the limit
n 1
lim e 7+
n
By (8.52), we have
n 1 n n
e 7+ =e (7 + o (1)) 7e !0
n
N
By (8.50), we have
yn xn zn wn
() (8.54)
zn wn yn xn
provided that the ratios are (eventually) well-de…ned and not zero. Therefore, once we have
established the asymptoticity of the ratios yn =zn and xn =wn , we “automatically” have also
the asymptoticity of their reciprocals zn =yn and wn =xn .
Example 334 Consider the limit
e5n n7 4n2 + 3n
lim
6n + n8 n4 + 5n3
By (8.53),
n
e5n n7 4n2 + 3n e5n + o e5n e5n e5
= = ! +1
6n + n8 n4 + 5n3 6n + o (6n ) 6n 6
If, instead, we consider the reciprocal limit
6n + n8 n4 + 5n3
lim
e5n n7 4n2 + 3n
then, by (8.54),
n
6n + n8 n4 + 5n3 6
!0
e5n n7 4n2 + 3n e5
N
In conclusion, a clever use of (8.52)-(8.53) often allows to simplify substantially the
calculation of limits. But, beyond calculations, they are illuminating relations conceptually.
31
For 0 6= k 2 R, we have k + o(1) k. Indeed,
k + o(1) 1
= 1 + o(1) ! 1
k k
8.14. ORDERS OF CONVERGENCE AND OF DIVERGENCE 235

8.14.4 Characterization and decay


The next result establishes an enlightening characterization of asymptotic equivalence.

Proposition 335 We have

xn yn () xn = yn + o (yn )

In words, two sequences are asymptotic when they are equal, up to a component that is
asymptotically negligible with respect to them. This result further clari…es how the relation
can be seen as an asymptotic equality.

Proof “If.” From xn = yn + o (yn ) it follows that

xn yn + o (yn ) o (yn )
= =1+ !1
yn yn yn

“Only if.” Let xn yn . Denoting zn = xn yn , one has that

zn x n yn xn
= = 1!0
yn yn yn

and therefore zn = o (yn ).

The next result is a nice application of this characterization.

Proposition 336 Let fxn g be a sequence with terms eventually non-zero. Then

1
log jxn j ! k 6= 0 (8.55)
n

if and only if jxn j = ekn+o(n) .

Proof “If.” From jxn j = ekn+o(n) it follows that

1 1 kn + o (n)
log jxn j = log ekn+o(n) = !k
n n n

“Only if.” Set zn = log jxn j. Since k 6= 0, from (8.55) it follows that zn =kn ! 1, i.e.,
zn kn. From the previous proposition and Proposition 329-(iii) it follows that

jxn j = ezn = ekn+o(kn) = ekn+o(n)

as claimed.

When k < 0, the condition (8.55) characterizes the sequences that converge to zero at
exponential rate. In that case, we speak of exponential decay. When k > 0, there is instead
an explosive exponential behavior.
236 CHAPTER 8. SEQUENCES

8.14.5 Terminology
Due to its importance, for the comparison both of in…nitesimal sequences and of divergent
sequences there is a speci…c terminology. In particular,

(i) if two in…nitesimal sequences fxn g and fyn g are such that yn = o (xn ), we say that the
sequence fyn g is in…nitesimal of higher order with respect to fxn g;

(ii) if two divergent sequences fxn g and fyn g are such that yn = o (xn ), we say that the
sequence fyn g is of lower order of in…nity with respect to fxn g.

In other words, a sequence is in…nitesimal of higher order if it tends to zero faster, while
it is of lower order of in…nity if it tends to in…nity slower. Besides the terminology (which is
not universal), it is important to recall the idea of negligibility that lies at the basis of the
relation yn = o (xn ).

8.14.6 Scales of in…nities


Through the orders of convergence we can compare exponential sequences f n g, power se-
quences nk , and logarithmic sequences logk n , thus making precise the hierarchy (8.42)
that we established with the ratio criterion.
First of all, observe that they are of in…nite order when > 1 and k > 0 and in…nitesimal
when 0 < < 1 and k < 0. Moreover, we have:

(i) If > , then n


= o( n ). Indeed, n
= n = ( = )n ! 0.

(ii) nk = o ( n ) for every > 1, as already proved with the ratio criterion. We have
n = o nk if, instead, 0 < < 1 and k > 0.

(iii) If k1 > k2 , then nk2 = o nk1 . Indeed, nk2 =nk1 = 1=nk1 k2 ! 0.

(iv) logk n = o (n), as already proved with the ratio criterion.

(v) If k1 > k2 , then logk2 n = o logk1 n . Indeed,

logk2 n 1
k1
= k1 k2
!0
log n log n

The next lemma reports two important comparisons of in…nities that show that expo-
nentials are of lower order of in…nity than factorials (we omit the proof).

Lemma 337 We have n = o (n!), with > 0, and n! = o (nn ).

Note that this implies, by Lemma 328, that n = o (nn ). Exponentials are, therefore, of
lower order of in…nity also compared with sequences of the type nn .

The di¤erent orders of in…nity and in…nitesimal are sometimes organized through scales.
If we limit ourselves to the in…nities (similar considerations hold for the in…nitesimals), the
8.14. ORDERS OF CONVERGENCE AND OF DIVERGENCE 237

most classic scale of in…nities is the logarithmic-exponential one. Taking xn = n as the basis,
we have the ascending scale
2 k n
n; n2 ; :::; nk ; :::; en ; e2n ; :::; ekn ; :::; en ; :::; en ; :::; ee ; :::

and the descending scale


1 1 p p p p
n; n 2 ; :::; n k ; :::; log n; log n; :::; k log n; :::; log log n; log log n; :::; k log log n; :::

They provide “benchmarks”to caliber the asymptotic behavior of a sequence fxn g that tends
to in…nity. For example, if xn log n, the sequence fxn g is asymptotically logarithmic; if
xn n , the sequence fxn g is asymptotically quadratic, and so on.32
2

n
In applications one rarely considers orders of in…nity higher than ee and lower than
log log n. Indeed, log log n has an almost imperceptible increase, it is almost constant:

n 10 102 103 104 105 106


log log n 0:834 03 1:527 2 1:932 6 2:220 3 2:443 5 2:625 8
n
while ee increases explosively:

n 3 4 5 6
n
ee 5:284 9 108 5:148 4 1023 2:851 1 1064 1:610 3 10175

The asymptotic behavior of divergent sequences that are relevant in applications usually
n
ranges between the slowness of log log n and the explosiveness of ee . But, from a theoretical
point of view, we can go well beyond them. The study of the scales of in…nity is of great
elegance (see, Hardy, 1910).

8.14.7 The De Moivre-Stirling formula


To better illustrate how little-o analysis works, we will present the De Moivre-Stirling for-
mula. Besides being a quite surprising formula, it is also used in many theoretical and applied
problems in dealing with the asymptotic behavior of n!.

Theorem 338 We have

log n! = n log n n + o (n)


1 p
= n log n n + log n + log 2 + o (1)
2
Two approximations of log n! are thus established. The …rst one, which De Moivre came
up with, is slightly less precise because it has an error term of order o (n). The second
approximation was given by Stirling and is more accurate –its error term is o (1) –but also
more complex.33
32
Although for brevity we omit the details, Lemma 337 shows that the logarithmic-exponential scale can
be remarkably re…ned with orders of in…nity of the type n! and nn .
33
Since o (1) =n ! 0, a sequence which is o (1) is also o (n). For this reason, an error term of order o (1) is
better than one of order o (n).
238 CHAPTER 8. SEQUENCES

Proof We will only show the …rst equality. By setting xn = n!=nn , in the proof of Lemma
337 we have seen that
xn+1 1
lim =
xn e
From (10.16), we have also that
p
n
p n! 1
lim n
xn = lim =
n e
p n
We can thus conclude that n= n n! = e (1 + o (1)), or n!=nn = e n (1 + o (1)) , that is,
n
n! = nn e n
(1 + o (1))

Hence, log n! = n log n n n log (1 + o (1)). Since log (1 + an ) an as an ! 0, we have


n log (1 + o (1)) n o (1) = o (n).
p
We conclude that n! = nn e n 2 neo(1) , and so

n!
p = eo(1) ! 1
nn e n 2 n
We thus obtain the following remarkable formula
p
n! nn e n 2 n

that allow us to elegantly conclude our asymptotic analysis of factorials.

8.14.8 Distribution of prime numbers


The little-o notation was born and …rst used at the end of the nineteenth century in the
study of the distribution of prime numbers. We introduced prime numbers in Section 1.3
where we showed their “atomic”centrality among the other natural numbers by means of the
Fundamental Theorem of Arithmetic. The existence of in…nitely many prime numbers was
also proven thanks to a well-known theorem by Euclid, so that we can speak of the sequence
of prime numbers fpn g. Nevertheless, in Section 8.1 we noted that it is unfortunately not
possible to explicitly describe such a sequence. This issue brought mathematicians to wonder
about the distribution of prime numbers in N. Let : N+ ! R be the sequence whose n-th
term (n) is the number of prime numbers that are less than or equal than n. For example

n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
(n) 0 1 2 2 3 3 4 4 4 4 5 5 6 6 6

It is, of course, not possible to fully describe the sequence as this would be equivalent to
describing the sequence of prime numbers, which we have argued to be hopeless (at least,
operationally). Nevertheless, we can still ask ourselves whether there is a sequence fxn g that
is described in closed form and is asymptotically equal to . In other words, the question is
whether we can …nd a reasonably simple sequence that asymptotically approximates well
enough.
8.14. ORDERS OF CONVERGENCE AND OF DIVERGENCE 239

Around the year 1800, Gauss and Legendre noted independently that the sequence
fn= log ng well approximates , as we can check by inspection of the following table.

n (n)
n (n) log n n= log n

10 4 4; 3 0; 921
102 25 21; 7 1; 151
103 168 145 1; 161
104 1:229 1:086 1; 132
105 9:592 8:686 1; 104
1010 455:052:511 434:294:482 1; 048
1015 29:844:570:422:669 28:952:965:460:217 1; 031
1020 2:220:819:602:560:918:840 2:171:472:409:516:250:000 1; 023

One can easily see that the ratio


(n)
n
log n

becomes closer and closer to 1 as n increases. Gauss and Legendre’s conjectured that this
was so because is asymptotically equal to fn= log ng. Their conjecture remained open
for about a century, until it was, independently, proven to be true in 1896 by two great
mathematicians, Jacques Hadamard and Charles de la Vallée Poussin. The importance of
such a result is testi…ed by its name, which is as simple as it is demanding.34

Theorem 339 (Prime Number Theorem) It holds that


n
(n)
log n

Although we are not able to describe the sequence , thanks to the Prime Number
Theorem we can say that its asymptotic behavior is similar to that of the simple sequence
fn= log ng, that is, their number in any given interval of natural numbers [m; n] is approx-
imately
n m
(n) (m) =
log n log m
with increasing accuracy. This wonderful result, which undoubtedly has a statistical “‡avor”,
is incredibly elegant. Even more so if we consider its following remarkable consequence.

Theorem 340 It holds that


pn n log n (8.56)
34
The proof of this theorem requires complex analysis methods which we do not cover in this book. The
use of complex analysis in the study of prime numbers is due to a Bernhard Riemann’s deep insight. Only
in 1949 two outstanding mathematicians, Paul Erdös and Atle Selberg, were able to prove this results using
real analysis methods.
240 CHAPTER 8. SEQUENCES

The sequence of prime numbers fpn g is thus asymptotically equivalent to fn log ng. The
n-th prime number’s value is, approximately, n log n. For example, by inspecting the prime
number table one can see that for n = 100 one has that pn = 541 while its “estimate” is
n log n = 460 (rounding down). Similarly:
pn
n pn n log n n log n

100 541 460 1; 176 1


1:000 7:919 6:907 1; 146 5
10:000 104:729 92:104 1; 137 1
100:000 1:299:709 1:151:292 1; 128 9
10:00:000 154:85:863 13:815:510 1; 120 9
10:000:000 179:424:673 161:180:956 1; 113 2
100:000:000 2:038:074:743 1:842:068:074 1; 106 4
1:000:000:000 22:801:763:489 20:723:265:836 1; 100 3

One can see that the ratio between pn and its estimate n log n stays steadily around 1.

Proof From the Prime Number Theorem one has that


log n
(n) !1
n
Hence, for any " > 0, there is an n" such that
log n
(n) 1 <" 8n n" (8.57)
n
Since pn ! 1, there is an n" such that pn n" per n n" . Hence, (8.57) implies that
log pn
(pn ) 1 <" 8n n"
pn
At the same time, one has that (pn ) = n, so that
log pn
n 1 <" 8n n"
pn
that is,
log pn
n !1 (8.58)
pn
from which it follows that
log pn
log n ! log 1 = 0
pn
or, log n + log log pn log pn ! 0. Since log pn ! +1,
log n log log pn log n + log log pn log pn
+ 1= !0
log pn log pn log pn
8.15. SEQUENCES IN RN 241

Yet, log log pn = log pn ! 0 (can you explain why?), and so

log n
!1
log pn

Multiplying by (8.58), we get that

n log n log pn log n


=n !1
pn pn log pn

thus showing that (8.56) holds.

O.R. Counting objects is one of the most basic activities common across cultures, arguably
the most universal one: counting emerges as soon as similar, yet distinguished, entities come
up. If so, the identi…cation of prime numbers –the atoms of numbers –can be viewed as an
important step in the evolution of a civilization. Indeed, their study emerged in the Greek
world, which also marked the emergence of reason (Section 1.8). The depth with which
a civilization studies prime numbers is, then, a possible universal benchmark to assess its
degree of evolution. Under this scale, the Prime Number Theorem is one of best evidence of
its evolution that mankind can o¤er when going where no one has gone before (unless sure
of their intentions, better not to meet civilizations that have found the closed form of the
sequence of primes). H

8.15 Sequences in Rn
We close the chapter by considering sequences xk of vectors in Rn . For them we give a
de…nition of limit that follows closely the one already given for sequences in R. The funda-
mental di¤erence is that each element of the sequence is now a vector xk = (xk1 ; xk2 ; :::; xkn ) 2
Rn and not a scalar.

De…nition 341 We say that the sequence xk in Rn tends to L 2 Rn , in symbols xk ! L


or lim xk = L, if for every " > 0 there exists n" 1 such that

k n" =) kxk Lk < "

In other words, xk = (xk1 ; xk2 ; :::; xkn ) ! L = (L1 ; L2 ; :::; Ln ) if the scalar sequence of
distances xk L converges to zero (cf. Proposition 281). Since
r
Xn 2
k
x L = xki Li
i=1

we see immediately that

xk L ! 0 () xki Li ! 0 8i = 1; 2; : : : ; n (8.59)

That is, xk ! L if and only if the scalar sequences xki of the i-th components converge
to the component Li of the vector L. The convergence of a sequence of vectors, therefore,
242 CHAPTER 8. SEQUENCES

reduces to the convergence of the sequences of the single components. So, it is a compon-
entwise notion of convergence that, as such, does not present any signi…cant novelty relative
to the scalar case.

N.B. A sequence in Rn may be regarded as the restriction to N+ of a vector function


f : R ! Rn . O

Example 342 Consider the sequence

1 1 2k + 3
1 + ; 2;
k k 5k 7

in R3 . Since
1 1 2k + 3 2
1+ !1 , !0 and !
k k2 5k 7 5
the sequence converges to the vector (1; 0; 2=5). N

In a similar way, we de…ne the divergences to +1 and to 1 when all the components of
the vectors that form the sequence diverge to +1 or to 1, respectively. Finally, when the
single components have di¤erent behaviors (some converge, others diverge or are irregular)
the sequence of vectors does not have a limit (for brevity, we omit the details).

Notation Sequences of vectors are denoted by a superscript xk instead of a subscript


fxn g to avoid confusion with the dimension n of the space Rn and to be able to indicate the
single components xki of each vector xk of the sequence.
Chapter 9

Series

9.1 The concept


The idea that we want to develop here is, roughly, about the possibility of summing in…nitely
many addends. Imagine a stick 1 meter long and cut it in half, obtaining in this way two
pieces 1=2 meter long; then cut the second piece in half, obtaining two pieces 1=4 meter long;
cut again the second piece, obtaining two pieces 1=8 meter long, and continue, without never
stopping. This cutting process results in in…nitely many pieces of length 1=2, 1=4, 1=8, ...
in which the original stick of 1 meter has been divided into. It is rather natural to imagine
that
1 1 1 1
+ + + + n+ =1 (9.1)
2 4 8 2
i.e., that –by reassembling the individual pieces –we get back the original stick.
In this chapter we will give a precise meaning to equalities like (9.1). Consider, therefore,
a sequence fxn g and suppose that we want to “sum” all its terms, i.e., to carry out the
operation
X1
x1 + x2 + + xn + = xn
n=1

To make rigorous this new operation of “addition of in…nitely many summands”, which is
di¤erent from the ordinary addition (as we will realize), we will sum a …nite number of terms,
say n, then make n tend to in…nity and take the resulting limit, if it exists, as the value to
assign to the series. We are, therefore, thinking of constructing a new sequence fsn g de…ned
by

s1 = x1 (9.2)
s2 = x1 + x2
s3 = x1 + x2 + x3

sn = x1 + + xn

and to take the limit of fsn g as the sum of the series. Formally:

243
244 CHAPTER 9. SERIES

De…nition
P1 343 The series with terms given by a sequence fxn g of scalars, in symbols
n=1 x n , is the sequence fsn g de…ned in (9.2). The terms sn of the sequence are called
partial sums of the series.
P
The series 1 n=1 xn is therefore de…ned as the sequence fsPn g of the partial sums (9.2).
Its limit behavior determines its value; in particular, a series 1
1
n=1 xn is:

P
1
(i) convergent, with sum S, in symbols xn = S, if lim sn = S 2 R;
n=1
P1
(ii) positively divergent, in symbols n=1 xn = +1, if lim sn = +1;
P1
(iii) negatively divergent, in symbols n=1 xn = 1, if lim sn = 1;

(iv) irregular (or oscillating) if the sequence fsn g is irregular.

In sum, we attribute to the series the same character – convergence, divergence, or


irregularity –as that of its sequence of partial sums.2

Partial sums can be de…ned recursively by


(
s1 = x1
(9.3)
sn = sn 1 + xn for n 2

This formulation can be operationally useful to construct partial sums through a guess and
verify procedure: we …rst posit a candidate expression for the partial sum, which we then
verify by induction. Example 347 will illustrate this procedure. However, as little birds
suggesting guesses are often not around, the main interest of this recursive formulation is,
ultimately, theoretical in that it further clari…es that a series is nothing but a new sequence
constructed from an existing one. Indeed, given a sequence fxn g, the recursion (9.3) de…nes
the sequence of partial sums fsn g. It is this recursion that, thus, underlies the notion of
series.

O.R. Sometimes it is useful to start the series with the index n = 0 rather than from n = 1.
When the option exists (we will see that this is not the case for some types of series, like the
harmonic series, which cannot be de…ned for n = 0), the choice to start a series from either
n = 0 or n = 1 (or from another value of n) is a pure matter of convenience (as it was for
sequences). Actually, one can start the series from any k in N. The context itself typically
suggests the best choice. In any case, this choice does not alter the character of the series
and, therefore, it does not a¤ect the problem of determining whether the series converges or
not. H
1
We thus resorted to a limit, that is, to a notion of potential in…nity. On the other hand, we cannot really
sum in…nitely many summands: all the world paper would not su¢ ce, nor would our entire life (and, by
the way, we would not know where to put the line that one traditionally writes under the summands before
adding them).
2
Using the terminology already employed for the sequences, a series is sometimes called regular when it is
not irregular, that is when one of the cases (i)-(iii) holds.
9.1. THE CONCEPT 245

9.1.1 Three classic series


We illustrate the previous notions with three important series (and a Epicurus piece).

Example 344 (Mengoli series) The Mengoli series is given by:


1
X
1 1 1 1
+ + + + =
1 2 2 3 n (n + 1) n (n + 1)
n=1

Since
1 1 1
=
n (n + 1) n n+1
one has that
1 1 1
sn = + + +
1 2 2 3 n (n + 1)
1 1 1 1 1 1 1 1
=1 + + + + =1 !1
2 2 3 3 4 n n+1 n+1
Therefore,
1
X 1
=1
n (n + 1)
n=1
So, the Mengoli series converges and has sum 1. N

Example 345 (Harmonic series) The harmonic series is given by:


1
X
1 1 1 1
1+ + + + + =
2 3 n n
n=1

Consider the partial sums with indexes n that are powers of 2 (i.e., n = 2k ):
1
s1 = 1; s2 = 1 +
2
1 1 1 1 1 1 1 1
s4 = 1 + + + > 1 + + + = s2 + = 1 + 2
2 3 4 2 4 4 2 2
1 1 1 1 1 1 1 1 1 1
s8 = s4 + + + + > s4 + + + + = s4 + > 1 + 3
5 6 7 8 8 8 8 8 2 2
By continuing in this way we see that
1
s2k > 1 + k (9.4)
2
The sequence of partial sums is strictly increasing (since the summands are all positive) and
so it admits limit; the inequality (9.4) guarantees that it is unbounded above and therefore
lim sn = +1. Hence,
X1
1
= +1
n
n=1

i.e., the harmonic series diverges positively.3 N


3
In Appendix E.2, we present another proof of the divergence of the harmonic series, due to Pietro Mengoli.
246 CHAPTER 9. SERIES

Example 346 (Geometric series) The geometric series with ratio q is de…ned by:
1
X
1 + q + q2 + q3 + + qn + = qn
n=0

Its character depends on the value of q. In particular, we have that:


8
> +1 if q 1
>
>
1
X < 1
qn = if jqj < 1
>
> 1 q
n=0 >
:
irregular if q 1

To verify this, we start by observing that when q = 1 we have

sn = |1 + 1 +
{z + 1} = n + 1 ! +1
n+1 times

Let now q 6= 1. Since

sn qsn = 1 + q + q 2 + q 3 + + qn q 1 + q + q2 + q3 + + qn
= 1 + q + q2 + q3 + + qn q + q2 + q3 + + q n+1 = 1 q n+1

we have
(1 q) sn = 1 q n+1

and therefore, since q 6= 1,


1 q n+1
sn =
1 q
It follows that
1
X 1 q n+1
q n = lim
n!1 1 q
n=0

The study of this limit is divided into several cases:

(i) if 1 < q < 1, we have q n+1 ! 0 and so

1
sn !
1 q

(ii) if q > 1, we have q n+1 ! +1 and so sn ! +1;

(iii) if q = 1, the partial sums of odd order are equal to zero, while those of even order
are equal to 1. The sequence formed by them is hence irregular;

(iv) if q < 1, the sequence q n+1 is irregular and, therefore, so is fsn g as well. N
9.1. THE CONCEPT 247

Example 347 We can use the recursive de…nition of partial sums (9.3) to guess and verify
(by induction) what are the partial sums of the geometric series. The, highly inspired, guess
is that
1 q n+1
sn =
1 q
We verify the guess by induction. At n = 0; 1 it is trivially true. Assume it is true at n
(induction hypothesis). Then

1 q n+1 1 q n+1 + q n+1 q n+2 1 q (n+1)+1


sn+1 = sn + q n+1 = + q n+1 = =
1 q 1 q 1 q
as desired. N

Epicurus in a letter to Herodotus wrote “Once one says that there are in…nite parts in
a body or parts of any degree of smallness, it is not possible to conceive how this should
be, and indeed how could the body any longer be limited in size?” The previous examples
show that, indeed, if these “parts”, these particles, have a strictly positive, but di¤erent size
– for example either 1=n (n + 1) or q n , with q 2 (0; 1) – then the series might converge, so
the size of the “body” can be de…ned. Nevertheless, Epicurus was right in the sense that, if
we assume – as it seems he does too – that all the particles have same size, no matter how
small, then the series
"+"+"+ +"+
P1
positively diverges. That is, n=1 " = +1 for every " > 0. Indeed, for the partial sums we
have sn = n" ! +1. This simple series has an interesting philosophical meaning (properties
of series have been often used, even within philosophy, to try to clarify the nature of the
potential in…nite).

9.1.2 Sub specie aeternitatis: in…nite horizon


Series are important in economics. For example, let us go back to the intertemporal choices
introduced in Section 8.3. We saw that a consumption stream can be represented by a
sequence
x = fx1 ; x2 ; : : : ; xt ; : : :g
and can be evaluated by an intertemporal utility function U : A R1 ! R. In particular,
we mentioned the discounted U given by
t 1
U (x) = u1 (x1 ) + u2 (x2 ) + + ut (xt ) + (9.5)

where 2 (0; 1) is the subjective discount factor. In view of what we have just seen, (9.5) is
the series
X1
t 1
ut (xt ) (9.6)
t=1

Series thus give a rigorous meaning to the fundamental discounted form (9.5) of intertem-
poral utility functions. Naturally, we are interested in the case in which the series (9.6)
is convergent, so that the overall utility that the consumer gets from a stream is …nite.
Otherwise, how could we compare, hence choose, streams if they have in…nite utility?
248 CHAPTER 9. SERIES

Using the properties of the geometric series, momentarily will show in Example 360 that
the series (9.6) converges if < 1, provided that the utility functions ut are positive and
bounded by the same constant.4 In such a case, the intertemporal utility function
1
X
t 1
U (x) = ut (xt ) (9.7)
t=1

has as domain the entire space R1 , that is, U (x) 2 R for every x 2 R1 . We can thus
compare all possible consumption streams.

9.2 Basic properties


Given that the character of a series is determined by the character of the sequence of its
partial sums, it is evident that subtracting, adding, or modifying a …nite number of terms of
aPseries, does not change its character. In contrast, its sum P
might well change. For instance,
1 1
n=1 xn has the same character, but not the same sum, as n=k xn for every integer k > 1.
As to the fundamental operations, we have
1
X 1
X
cxn = c xn 8c 2 R
n=1 n=1

and
1
X 1
X 1
X
(xn + yn ) = xn + yn
n=1 n=1 n=1

when we do not fall in a indeterminate form 0 1 or 1 1, respectively.


The next result is simple, yet important. If a series converges, then its terms necessarily
tend to 0: summands must eventually vanish to avoid having an exploding sum (memento
Epicurus).
P1
Theorem 348 If the series n=1 xn converges, then xn ! 0.

Proof Clearly, we have xn = sn sn 1and, given that the series converges, sn ! S as well
as sn 1 ! S. Therefore, xn = sn sn 1 !S S = 0.

Convergence to zero of the sequence fxn g is, therefore, a necessary condition for conver-
gence P
of its series. This condition is only necessary: even though 1=n ! 0, the harmonic
series 1n=1 1=n diverges.

Example 349 The series with term

2n2 3n + 4
xn =
17n2 + 4n + 5

is not convergent because xn is asymptotic to 2n2 =17n2 = 2=17, so it does not tend to 0. N
4
Actually, (9.6) converges if and only if < 1, as long as the instantaneous utility functions are equal
across periods as well as strictly positive and bounded.
9.3. SERIES WITH POSITIVE TERMS 249

9.3 Series with positive terms


9.3.1 Comparison criterion
P
We study now the important case of series 1 n=1 xn with positive terms, that is, xn 0 for
all n 1.5 In such a case, the sequence fsn g of the partial sums is increasing and therefore
the following regularity result holds trivially.

Proposition 350 Each series with positive terms is either convergent or positively diver-
gent. In particular, it is convergent if and only if it is bounded above.6

Series with positive terms thus inherit the remarkable regularity properties of monotonic
sequences. This gives them an important status among series. In particular, for them we
now recast the convergence criteria presented in Section 8.11 for sequences.
Proposition 351 (Comparison criterion) Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be two series with positive terms, with $x_n \leq y_n$ eventually.

(i) If $\sum_{n=1}^{\infty} x_n$ diverges positively, then so does $\sum_{n=1}^{\infty} y_n$.

(ii) If $\sum_{n=1}^{\infty} y_n$ converges, then so does $\sum_{n=1}^{\infty} x_n$.
Proof Let $n_0 \geq 1$ be such that $x_n \leq y_n$ for all $n \geq n_0$, and set $\alpha = \sum_{n=1}^{n_0} (y_n - x_n)$. By calling $s_n$ (resp., $\sigma_n$) the partial sums of the sequence $\{x_n\}$ (resp., $\{y_n\}$), for $n > n_0$ we have
$$\sigma_n - s_n = \alpha + \sum_{k=n_0+1}^{n} (y_k - x_k)$$
That is, $\sigma_n \geq s_n + \alpha$. Therefore, the result follows from Proposition 296 (which is the sequential counterpart of this statement).

Note that (i) is the contrapositive of (ii), and vice versa: indeed, thanks to Proposition 350, for a series with positive terms the negation of convergence is positive divergence.7 Because of their usefulness, we stated both; but it is the same property seen in two equivalent ways.

Example 352 The series
$$\sum_{n=1}^{\infty} \frac{10^n}{n\, 5^{2n+3}}$$
converges. Indeed, since
$$\frac{10^n}{n\, 5^{2n+3}} \leq \frac{10^n}{5^{2n+2}} = \frac{10^n}{5^2 \cdot 25^n} = \frac{1}{5^2}\left(\frac{2}{5}\right)^n$$
the convergence of the geometric series with ratio $2/5$ guarantees, via the comparison criterion, the convergence of the series. N
5 Nothing changes if the terms are positive only eventually. Indeed, we can always discard a finite number of terms without altering the asymptotic behavior of the series. Hence, all the results on the asymptotic behavior of series with positive terms hold, more generally, for series with terms that are eventually positive.
6 By definition, a series is bounded above when the sequence of the partial sums is so, i.e., there exists $k > 0$ such that $s_n \leq k$ for every $n \geq 1$.
7 Recall that, given two properties $p$ and $q$, the implication $\neg q \Longrightarrow \neg p$ is the contrapositive of the original implication $p \Longrightarrow q$ (see Appendix D).

Example 353 The series of the reciprocals of the factorials8
$$\sum_{n=0}^{\infty} \frac{1}{n!}$$
converges. Indeed, observe that
$$\sum_{n=0}^{\infty} \frac{1}{n!} = 1 + 1 + \sum_{n=2}^{\infty} \frac{1}{n!} = 2 + \sum_{n=1}^{\infty} \frac{1}{(n+1)!}$$
But the series
$$\sum_{n=1}^{\infty} \frac{1}{(n+1)!}$$
converges because, for every $n \geq 3$,
$$\frac{1}{(n+1)!} < \frac{1}{n(n+1)}$$
where the right-hand side is the generic term of the Mengoli series, which we know converges. By the comparison criterion, the convergence of $\sum_{n=0}^{\infty} 1/n!$ then follows from that of the Mengoli series. We will see later that, remarkably, its sum is Napier's constant $e$ (Theorem 366). N
Example 354 We call generalized harmonic series the series
$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$$
with $\alpha \in \mathbb{R}$. If $\alpha = 1$, it reduces to the harmonic series that we know diverges to $+\infty$.
If $\alpha < 1$, it is easy to see that, for every $n > 1$,
$$\frac{1}{n^{\alpha}} > \frac{1}{n} \qquad \text{(i.e., } n^{\alpha} < n\text{)}$$
Therefore, by the comparison criterion,
$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}} = +\infty$$
If $\alpha = 2$, the generalized harmonic series converges. Indeed, let us observe that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = 1 + \sum_{n=2}^{\infty} \frac{1}{n^2} = 1 + \sum_{n=1}^{\infty} \frac{1}{(n+1)^2}$$
But the series
$$\sum_{n=1}^{\infty} \frac{1}{(n+1)^2}$$
converges because, for every $n \geq 1$,
$$\frac{1}{(n+1)^2} < \frac{1}{n(n+1)}$$
which is the generic term of the convergent Mengoli series.9 By the comparison criterion, the convergence of $\sum_{n=1}^{\infty} 1/n^2$ is a consequence of the convergence of $\sum_{n=1}^{\infty} 1/(n+1)^2$.
If $\alpha > 2$, then
$$\frac{1}{n^{\alpha}} < \frac{1}{n^2}$$
for every $n > 1$ and therefore we still have convergence.
Finally, it is possible to see, but it is more delicate, that the generalized harmonic series converges also if $\alpha \in (1, 2)$.
Summing up, the generalized harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$$
converges for $\alpha > 1$, while it diverges for $\alpha \leq 1$. N

8 Recall that $0! = 1$. For this reason, we start the series from $n = 0$ (so, in Theorem 366 we will be able to write $\sum_{n=0}^{\infty} 1/n! = e$, a more elegant expression than $\sum_{n=1}^{\infty} 1/n! = e - 1$).
9 Indeed, $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$, but here we do not have the tools to prove this remarkable result.

For the generalized harmonic series, the case $\alpha = 1$ is thus the "last" case of divergence: it is sufficient to increase the exponent ever so slightly, from $1$ to $1 + \varepsilon$ with $\varepsilon > 0$, and the series will converge. This suggests that the divergence is extremely slow, as the reader can check by calculating some of the partial sums.10 This intuition is made precise by the following beautiful result.

Proposition 355 We have
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \sim \log n \tag{9.8}$$
In words, the sequence of the partial sums of the harmonic series is asymptotic to the logarithm. This result can be further improved: it can be shown that there is a scalar $\gamma > 0$, the so-called Euler-Mascheroni constant, such that
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \gamma + \log n + o(1) \tag{9.9}$$
This approximation, with an error term $o(1)$, is more accurate than (9.8), which by Proposition 335 can be written as
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \log n + o(\log n)$$
with an error term $o(\log n)$.11 Thus, the partial sums of the harmonic series are equal to the logarithm, up to a positive constant and a term that goes to 0. In particular, in view of (9.9) we have
$$\gamma = \lim_{n \to \infty}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \log n\right)$$
So, the Euler-Mascheroni constant is the limit of the difference between the partial sums of the harmonic series and the logarithm. It is a remarkable number, approximately equal to $0.5772156649$, whose nature is still elusive.12
10 A "cadaverous infinity", in the words of a professor.
11 Indeed $o(1)/\log n \to 0$, so a sequence which is $o(1)$ is also $o(\log n)$. This is why an error term of order $o(1)$ is better than one of order $o(\log n)$. Mutatis mutandis, the relation between these two approximations is similar to that between the two approximations that we saw for the De Moivre-Stirling formula.
12 It is not even known whether it is irrational, i.e., we do not have for it the counterpart of Euler's Theorem 368.

Proof The proof of this result may be skipped on a first reading since it relies on integration notions that will be presented in Chapter 35. Define $\varphi : [0, \infty) \to \mathbb{R}$ by
$$\varphi(x) = \frac{1}{i} \qquad \forall x \in [i-1, i)$$
with $i \geq 1$. That is, $\varphi(x) = 1$ if $x \in [0,1)$, $\varphi(x) = 1/2$ if $x \in [1,2)$, and so on. It is easy to see that
$$\frac{1}{x+1} \leq \varphi(x) \leq \frac{1}{x} \qquad \forall x > 0 \tag{9.10}$$
Therefore, the restriction of $\varphi$ on every closed interval is a step function. By Proposition 1423, we then have
$$\sum_{i=k}^{n} \frac{1}{i} = \sum_{i=k}^{n} \int_{i-1}^{i} \varphi(x)\, dx = \int_{k-1}^{n} \varphi(x)\, dx \qquad \forall k = 1, \dots, n$$
for every $n \geq 1$. By (9.10),
$$\log(1+n) = \int_{0}^{n} \frac{1}{x+1}\, dx \leq \sum_{i=1}^{n} \frac{1}{i} = 1 + \sum_{i=2}^{n} \frac{1}{i} \leq 1 + \int_{1}^{n} \frac{1}{x}\, dx = 1 + \log n$$
for every $n \geq 2$. Therefore,
$$\frac{\log(1+n)}{\log n} \leq \frac{\sum_{i=1}^{n} \frac{1}{i}}{\log n} \leq \frac{1 + \log n}{\log n} \qquad \forall n \geq 2$$
By the comparison criterion, we conclude that
$$\frac{\sum_{i=1}^{n} \frac{1}{i}}{\log n} \to 1$$
as desired.

Example 356 The last example can be generalized by showing that the series13
$$\sum_{n=2}^{\infty} \frac{1}{n^{\alpha} \log^{\beta} n}$$
converges for $\alpha > 1$ and any $\beta \in \mathbb{R}$, as well as for $\alpha = 1$ and $\beta > 1$. It diverges for $\alpha < 1$ and any $\beta \in \mathbb{R}$, as well as for $\alpha = 1$ and any $\beta \leq 1$. N

The comparison criterion has a nice and useful asymptotic version, based on the asymptotic comparison of the terms of the sequences.

Proposition 357 (Asymptotic comparison criterion) Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be two series with strictly positive terms.14 If $x_n \sim y_n$, then the two series have the same character.
13 The series starts with $n = 2$ because for $n = 1$ the term is not defined.
14 The hypothesis that the terms are strictly positive, so non-zero, is necessary to make the ratio $x_n/y_n$ well defined. This hypothesis will be used several times throughout the chapter.

Therefore, the character of a series is invariant with respect to the asymptotic equivalence
relation.

Proof Since $x_n \sim y_n$, for every $\varepsilon > 0$ there exists $n_{\varepsilon} \geq 1$ such that
$$1 - \varepsilon \leq \frac{x_n}{y_n} \leq 1 + \varepsilon \qquad \forall n \geq n_{\varepsilon}$$
For every $n > n_{\varepsilon}$, we have
$$\sum_{k=1}^{n} x_k = \sum_{k=1}^{n_{\varepsilon}} x_k + \sum_{k=n_{\varepsilon}+1}^{n} \frac{x_k}{y_k}\, y_k \leq c + (1+\varepsilon) \sum_{k=n_{\varepsilon}+1}^{n} y_k \tag{9.11}$$
and
$$\sum_{k=1}^{n} x_k = \sum_{k=1}^{n_{\varepsilon}} x_k + \sum_{k=n_{\varepsilon}+1}^{n} \frac{x_k}{y_k}\, y_k \geq c + (1-\varepsilon) \sum_{k=n_{\varepsilon}+1}^{n} y_k \tag{9.12}$$
where $c = \sum_{k=1}^{n_{\varepsilon}} x_k$. The character of the series $\sum_{n=1}^{\infty} y_n$ is the same as that of $\sum_{k=n_{\varepsilon}+1}^{\infty} y_k$ because the value assumed by a finite number of initial terms is irrelevant for the character of a series. Therefore, if $\sum_{n=1}^{\infty} y_n$ converges, by (9.11) it follows that $\sum_{n=1}^{\infty} x_n$ converges, whereas if $\sum_{n=1}^{\infty} y_n$ diverges to $+\infty$, from (9.12) it follows that $\sum_{n=1}^{\infty} x_n$ also diverges to $+\infty$.

Example 358 Let
$$x_n = \frac{2n^3 - 3n + 8}{5n^5 - n^4 - 4n^3 + 2n^2 - 12}$$
Since
$$x_n \sim \frac{2n^3}{5n^5} = \frac{2}{5n^2}$$
the series $\sum_{n=1}^{\infty} x_n$ converges. Let, instead,
$$x_n = \frac{n+1}{n^2 - 3n + 4}$$
Since $x_n \sim 1/n$, the series $\sum_{n=1}^{\infty} x_n$ diverges to $+\infty$. N

We can use the asymptotic comparison criterion to establish a celebrated result, proved in 1737 by Euler, that says that the sum of the reciprocals of the prime numbers is infinite.

Theorem 359 (Euler) We have
$$\frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \cdots = \sum_{n=1}^{\infty} \frac{1}{p_n} = +\infty$$

Proof By Theorem 340, we have $p_n \sim n \log n$. Therefore (recall (8.50)),
$$\frac{1}{p_n} \sim \frac{1}{n \log n}$$

By the asymptotic comparison criterion, the series $\sum_{n=1}^{\infty} 1/p_n$ has the same character as $\sum_{n=2}^{\infty} 1/(n \log n)$. In view of Example 356, we have
$$\sum_{n=2}^{\infty} \frac{1}{n \log n} = +\infty$$
It follows that $\sum_{n=1}^{\infty} 1/p_n = +\infty$, as desired.

Euler's Theorem, along with the comparison criterion, implies the divergence to $+\infty$ of the harmonic series. Indeed,
$$\frac{1}{p_n} \leq \frac{1}{n}$$
for every $n \geq 1$.15 Euler's Theorem is, however, a much stronger result than the divergence of the harmonic series, in that it involves only the reciprocals of the prime numbers, whereas the harmonic series considers the reciprocals of all natural numbers (be they prime or not).

Euler's Theorem confirms that there are infinitely many prime numbers, and shows that they are "dense" in $\mathbb{N}$ because they tend to $+\infty$ more slowly than the powers $n^{\alpha}$, with $\alpha > 1$, for which we have $\sum_{n=1}^{\infty} 1/n^{\alpha} < +\infty$.

We conclude our analysis of the comparison criterion with an important economic application.

Example 360 Consider the series (9.6), that is,
$$\sum_{t=1}^{\infty} \delta^{t-1} u_t(x_t)$$
Suppose that the functions $u_t : \mathbb{R} \to \mathbb{R}$ are positive and uniformly bounded above, that is, there is a common constant $M > 0$ such that, for all $t \geq 1$,
$$0 \leq u_t(x) \leq M \qquad \forall x \in \mathbb{R}$$
As a result, we can write:
$$0 \leq \sum_{t=1}^{\infty} \delta^{t-1} u_t(x_t) \leq M \sum_{t=1}^{\infty} \delta^{t-1}$$
By the comparison criterion, it remains to check whether the geometric series $\sum_{t=1}^{\infty} \delta^{t-1}$ converges. In view of Example 346, we conclude that the series (9.6) converges if and only if $\delta < 1$.16 N
15 We have $1/p_1 = 1/2 \leq 1$, $1/p_2 = 1/3 \leq 1/2$, $1/p_3 = 1/5 \leq 1/3$, $1/p_4 = 1/7 \leq 1/4$, and so on.
16 The asymptotic behavior as $\delta \to 1$, that is, as patience becomes infinite, will be addressed by the Frobenius-Littlewood Theorem in Section 10.6.

9.3.2 Ratio criterion: prelude


The next section will be devoted to the important ratio criterion for convergence. For the impatient reader, we first see its simplest version.

Proposition 361 (Ratio criterion, elementary limit form) Let $\sum_{n=1}^{\infty} x_n$ be a series with, eventually, strictly positive terms. Suppose that $\lim x_{n+1}/x_n$ exists.

(i) If
$$\lim \frac{x_{n+1}}{x_n} < 1$$
the series converges.

(ii) If
$$\lim \frac{x_{n+1}}{x_n} > 1$$
the series diverges positively.

The criterion is thus based on the study of the limit of the ratio
$$\frac{x_{n+1}}{x_n}$$
of the terms of the series. The condition that the limit $\lim x_{n+1}/x_n$ exists is rather demanding, as we will see in the next section. But, when it is satisfied, the elementary limit form of the ratio criterion is the easiest to apply.

Example 362 (i) The series
$$\sum_{n=1}^{\infty} \frac{n^2 + 5n + 1}{n 2^n + 1}$$
converges. Indeed,
$$\frac{x_{n+1}}{x_n} = \frac{\frac{(n+1)^2 + 5(n+1) + 1}{(n+1)2^{n+1} + 1}}{\frac{n^2 + 5n + 1}{n 2^n + 1}} = \frac{(n+1)^2 + 5(n+1) + 1}{n^2 + 5n + 1} \cdot \frac{n 2^n + 1}{(n+1) 2^{n+1} + 1} = \frac{n^2 + 7n + 7}{n^2 + 5n + 1} \cdot \frac{n 2^n + 1}{(n+1) 2^{n+1} + 1} \sim \frac{n^2}{n^2} \cdot \frac{n 2^n}{(n+1) 2^{n+1}} = \frac{n}{n+1} \cdot \frac{1}{2} \to \frac{1}{2}$$
So, by the ratio criterion the series converges.

(ii) The series
$$\sum_{n=1}^{\infty} \frac{2\, n!}{3^n}$$
diverges positively. Indeed,
$$\frac{x_{n+1}}{x_n} = \frac{\frac{2(n+1)!}{3^{n+1}}}{\frac{2\, n!}{3^n}} = \frac{2(n+1)!}{3^{n+1}} \cdot \frac{3^n}{2\, n!} = \frac{1}{3} \cdot \frac{(n+1)!}{n!} = \frac{1}{3}(n+1) \to +\infty$$
which, by the ratio criterion, implies the divergence of the series. N

If $\lim x_{n+1}/x_n$ exists but
$$\lim \frac{x_{n+1}}{x_n} = 1$$
then nothing can be said about the character of the series. This is well illustrated by the two series $\sum_{n=1}^{\infty} 1/n$ and $\sum_{n=1}^{\infty} 1/n^2$: although for both we have $\lim x_{n+1}/x_n = 1$, the former diverges, while the latter converges.

9.3.3 Ratio criterion


We now study the ratio criterion in more generality by dropping the hypothesis that $\lim x_{n+1}/x_n$ exists.

Proposition 363 (Ratio criterion) Let $\sum_{n=1}^{\infty} x_n$ be a series with, eventually, strictly positive terms.

(i) If there exists a number $q < 1$ such that, eventually,
$$\frac{x_{n+1}}{x_n} \leq q \tag{9.13}$$
then the series converges.

(ii) If, instead, the ratio is eventually $\geq 1$, then the series diverges positively.

The theorem requires that the ratios be (uniformly) smaller than a number $q$ which is itself smaller than 1 (so, the terms form a strictly decreasing sequence), and not just that they all be smaller than 1. Indeed, the ratios of the harmonic series
$$\frac{\frac{1}{n+1}}{\frac{1}{n}} = \frac{n}{n+1}$$
are all $< 1$, but the series diverges (since the ratios tend to 1, there is no room to insert a number $q < 1$ greater than all of them).

Since the convergence of a series implies that the sequence of its terms is infinitesimal (Theorem 348), the ratio criterion for series can be seen as an extension of the homonymous criterion for sequences. The same is true for the root criterion that we will see in the next chapter.

Proof (i) Without loss of generality, assume that $x_n > 0$ and (9.13) holds for every $n$. From $x_{n+1} \leq q x_n$ we deduce, as in the analogous criterion for sequences, that $0 < x_n \leq q^{n-1} x_1$, and the first statement follows from the comparison criterion (Proposition 351) and from the convergence of the geometric series. (ii) If we have eventually $x_{n+1}/x_n \geq 1$ and $x_n > 0$, then eventually $x_{n+1} \geq x_n > 0$. In other words, the sequence $\{x_n\}$ is eventually increasing and therefore it cannot tend to 0, yielding that the series must diverge positively.
Example 364 Let $\{x_n\}$ be a sequence such that $x_1 > 0$ and
$$x_{n+1} = \begin{cases} \frac{1}{2} x_n & \text{if } n \text{ even} \\ \frac{1}{3} x_n & \text{if } n \text{ odd} \end{cases}$$
For instance, if $x_1 = 1$ then $\{1, 1/3, 1/6, 1/18, \dots\}$. Since $x_{n+1}/x_n \leq 1/2$ for all $n \geq 1$, by the ratio criterion the series $\sum_{n=1}^{\infty} x_n$ converges. Note that here $\lim x_{n+1}/x_n$ does not exist. N

It is possible to prove (see Section 10.4) that, if $\lim x_{n+1}/x_n$ exists, then the ratio criterion assumes exactly the tripartite form given in Proposition 361. That is:

(i) if $\lim x_{n+1}/x_n < 1$, then the series converges;

(ii) if $\lim x_{n+1}/x_n > 1$, then it diverges positively;

(iii) if $\lim x_{n+1}/x_n = 1$, then the criterion fails and gives no indication about the character of the series.

Operationally, this tripartite form is the standard form in which the ratio criterion is applied. At a mechanical level, it might be sufficient to recall this tripartition and the illustrative examples given in the prelude. But, lest we do plumbing rather than mathematics, it is important to keep in mind the theoretical foundations provided by Proposition 363 (the last simple example, in which the tripartite form is useless, shows that the general form can also be useful).
Let us see other tripartite examples.
Example 365 (i) By the ratio criterion, the series $\sum_{n=1}^{\infty} q^n/n^{\alpha}$ converges for every $\alpha \in \mathbb{R}$ and every $0 < q < 1$. Indeed,
$$\frac{n^{\alpha} q^{n+1}}{(n+1)^{\alpha} q^n} = \left(\frac{n}{n+1}\right)^{\alpha} q \to q < 1$$
Again by the ratio criterion, this series diverges positively when $q > 1$. Finally, if $q = 1$ we are back to the generalized harmonic series $\sum_{n=1}^{\infty} 1/n^{\alpha}$ of Example 354.

(ii) By the ratio criterion, the series
$$\sum_{n=0}^{\infty} \frac{x^n}{n!}$$
converges for every $x > 0$. Indeed, for $n \geq 0$ we have
$$\frac{x^{n+1}}{(n+1)!} \cdot \frac{n!}{x^n} = \frac{x}{n+1} \to 0 \qquad \forall x > 0$$

(iii) By the ratio criterion, the series
$$\sum_{n=1}^{\infty} \frac{x^n}{n}$$
converges for every $0 < x < 1$. Indeed,
$$\frac{x^{n+1}}{n+1} \cdot \frac{n}{x^n} = \frac{n}{n+1}\, x \to x$$
which obviously is $< 1$ when $0 < x < 1$. If $x > 1$, the ratio criterion implies that the series diverges positively. Finally, if $x = 1$ we are back to the harmonic series, which diverges positively. N

We stop here our study of convergence criteria. Much more can be said: in Section 10.4
we will continue to investigate this topic in some more depth.

9.3.4 A first series expansion


Napier's constant was introduced in the previous chapter as the limit of the sequence $(1 + 1/n)^n$. Surprisingly, it emerges also as the sum of the series of the reciprocals of the factorials, as Newton proved in 1665.

Theorem 366 (Newton) We have
$$\sum_{n=0}^{\infty} \frac{1}{n!} = e \tag{9.14}$$

Proof In Example 353 we showed that the series converges. Let us compute its sum. By Newton's binomial formula (B.4), for each $n \geq 1$ we have
$$\left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^k} = \sum_{k=0}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}$$
On the other hand,
$$\frac{n!}{(n-k)!} = \underbrace{n(n-1)\cdots(n-k+1)}_{k \text{ times}} \leq \underbrace{n \cdots n}_{k \text{ times}} = n^k$$
Therefore,
$$\frac{n!}{(n-k)!} \frac{1}{n^k} \leq 1$$
which implies
$$\left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} \leq \sum_{k=0}^{n} \frac{1}{k!}$$
It follows that
$$e \leq \sum_{n=0}^{\infty} \frac{1}{n!} \tag{9.15}$$
For every $k \geq 0$ we have
$$\lim_{n \to \infty} \frac{n!}{(n-k)!} \frac{1}{n^k} = 1 \tag{9.16}$$
Indeed,
$$\frac{n!}{(n-k)!} \frac{1}{n^k} = \frac{n(n-1)\cdots(n-k+1)}{n^k} \sim \frac{n^k}{n^k} = 1$$
Fix $m \geq 1$. For every $n > m$, we have
$$\left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} = \sum_{k=0}^{m} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} + \sum_{k=m+1}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} \geq \sum_{k=0}^{m} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}$$
Therefore, thanks to (9.16),
$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \geq \lim_{n \to \infty} \sum_{k=0}^{m} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} = \sum_{k=0}^{m} \frac{1}{k!} \lim_{n \to \infty} \frac{n!}{(n-k)!} \frac{1}{n^k} = \sum_{k=0}^{m} \frac{1}{k!}$$
Since this holds for every $m \geq 1$, we have that
$$e \geq \lim_{m \to \infty} \sum_{k=0}^{m} \frac{1}{k!} = \sum_{n=0}^{\infty} \frac{1}{n!}$$
that, along with (9.15), implies (9.14).

The beautiful equality (9.14) can be substantially generalized.

Theorem 367 For every $x \in \mathbb{R}$, we have
$$e^x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n = \sum_{n=0}^{\infty} \frac{x^n}{n!} \tag{9.17}$$

The equality (9.17) holds for every number $x$ and reduces to (9.14) in the special case $x = 1$. Note the remarkable series expansion of the exponential function
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots \tag{9.18}$$
From Example 365 we know that the series $\sum_{n=0}^{\infty} x^n/n!$ converges for $x > 0$. The case $x = 0$ is trivial. At the same time, Example 371 of the next section will show that this series converges also for $x < 0$, when it no longer has positive terms.

Proof As in Theorem 366, we start by applying Newton's binomial formula:
$$\left(1 + \frac{x}{n}\right)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{x^k}{n^k} = \sum_{k=0}^{n} \frac{x^k}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}$$
As before, note that
$$\frac{n!}{(n-k)!} = \underbrace{n(n-1)\cdots(n-k+1)}_{k \text{ times}} \leq \underbrace{n \cdots n}_{k \text{ times}} = n^k$$
and
$$\frac{n!}{(n-k)!} \frac{1}{n^k} \leq 1$$
Fix $m \geq 1$. For every $n > m$, we have
$$\left|\left(1 + \frac{x}{n}\right)^n - \sum_{k=0}^{m} \frac{x^k}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}\right| = \left|\sum_{k=m+1}^{n} \frac{x^k}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}\right| \leq \sum_{k=m+1}^{n} \frac{|x|^k}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} \leq \sum_{k=m+1}^{n} \frac{|x|^k}{k!}$$
Therefore, thanks to (9.16),
$$\left|e^x - \sum_{k=0}^{m} \frac{x^k}{k!}\right| = \lim_{n \to \infty}\left|\left(1 + \frac{x}{n}\right)^n - \sum_{k=0}^{m} \frac{x^k}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}\right| \leq \lim_{n \to \infty}\sum_{k=m+1}^{n} \frac{|x|^k}{k!} = \sum_{k=m+1}^{\infty} \frac{|x|^k}{k!}$$
Since this holds for every $m \geq 1$, we have
$$0 \leq \lim_{m \to \infty}\left|e^x - \sum_{k=0}^{m} \frac{x^k}{k!}\right| \leq \lim_{m \to \infty}\sum_{k=m+1}^{\infty} \frac{|x|^k}{k!} = 0$$
This proves that $\sum_{k=0}^{m} x^k/k!$ converges to $e^x$ as $m \to \infty$, hence the statement.
P
N.B. In the proof we used a noteworthy fact: if the series $\sum_{k=1}^{\infty} x_k$ converges, then the sequence of "forward" sums $\{\sum_{k=m}^{\infty} x_k\}$ converges to 0 as $m \to +\infty$. Intuitively, if from an infinite sum we first remove the first summand, then the first two summands, then the first three summands, and so on and so forth, then what is left should vanish. The reader may want to make this argument rigorous. O

Later in the book we will see that (9.17) is a power series (Chapter 10). For this reason, the equality (9.18) is called the power series expansion of the exponential function. It is a result, as elegant as it is important, that allows us to "decompose" the exponential function into a sum of (infinitely many) simple functions such as the powers $x^n$.
We will study series expansions in greater generality with the tools of differential calculus, of which series expansions are one of the most remarkable applications.

We close this section by establishing the irrationality of Napier's constant, a property, first proved by Euler in 1737, that we already mentioned a few times. We can now finally prove it as a corollary of its series expansion (9.14).

Theorem 368 (Euler) Napier’s constant is an irrational number.

Proof We have:
$$0 < e - \sum_{k=0}^{n} \frac{1}{k!} = \sum_{k=n+1}^{\infty} \frac{1}{k!} = \frac{1}{n!}\left(\frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots + \frac{1}{(n+1)(n+2)\cdots(n+k)} + \cdots\right) < \frac{1}{n!}\left(\frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots + \frac{1}{(n+1)^k} + \cdots\right) = \frac{1}{n!}\sum_{k=1}^{\infty} \frac{1}{(n+1)^k} = \frac{1}{n!} \frac{1}{n}$$

where the last equality holds because the geometric series that starts at $k = 1$ with ratio $1/(n+1)$ has sum $1/n$. By Theorem 366, we then have the following interesting bounds:
$$0 < e - \sum_{k=0}^{n} \frac{1}{k!} < \frac{1}{n!} \frac{1}{n}$$
Suppose, by contradiction, that $e$ is rational, i.e., $e = p/q$ for some natural numbers $p$ and $q$. By multiplying both sides of the last inequality by $n!$, we then have
$$0 < n! \frac{p}{q} - n! \sum_{k=0}^{n} \frac{1}{k!} < \frac{1}{n} \tag{9.19}$$
If $n = q$, then
$$q!\, \frac{p}{q} - q! \sum_{k=0}^{q} \frac{1}{k!} = p(q-1)! - \sum_{k=0}^{q} \frac{q!}{k!}$$
is an integer, which cannot be strictly between $0$ and $1/n \leq 1$ as (9.19) requires. This contradiction proves that $e$ is not rational.

9.4 Series with terms of any sign


9.4.1 Absolute convergence
We close the chapter by briefly considering the general case of series $\sum_{n=1}^{\infty} x_n$ with terms that are not necessarily positive, even eventually. To study such series, we consider an auxiliary series with positive terms.

Definition 369 The series $\sum_{n=1}^{\infty} x_n$ is said to be absolutely convergent if the series $\sum_{n=1}^{\infty} |x_n|$ of its absolute values is convergent.

The next result shows that the convergence of the series of absolute values – which can be verified with the criteria discussed in the previous sections – guarantees the convergence of the not necessarily positive, so possibly much wilder, original series.

Theorem 370 If a series converges absolutely, then it converges.

The condition is only sufficient, as we will soon show (Proposition 374). The class of absolutely convergent series is, therefore, contained in that of convergent series. As the next section will show, this subclass has key regularity properties: absolutely convergent series are, among the series with terms of any sign, the ones that behave well.

Example 371 Let us revisit the series in Example 365 by permitting negative terms.

(i) By Theorem 370 and by the ratio criterion, the series $\sum_{n=1}^{\infty} q^n/n^{\alpha}$ converges for every $\alpha \in \mathbb{R}$ and every $-1 < q < 1$. Indeed, from
$$\frac{|x_{n+1}|}{|x_n|} = \frac{n^{\alpha} |q|^{n+1}}{(n+1)^{\alpha} |q|^n} = \left(\frac{n}{n+1}\right)^{\alpha} |q| \to |q| < 1$$
it follows that it converges absolutely.

(ii) The series $\sum_{n=1}^{\infty} x^n/n!$ converges for every $x \in \mathbb{R}$. In fact, from
$$\frac{|x_{n+1}|}{|x_n|} = \left|\frac{x^{n+1}}{(n+1)!}\right| \left|\frac{n!}{x^n}\right| = \frac{|x|}{n+1} \to 0 \qquad \forall x \in \mathbb{R}$$
it follows that it converges absolutely. So, the series in Theorem 367 is, indeed, convergent.

(iii) The series $\sum_{n=1}^{\infty} x^n/n$ converges for every $-1 < x < 1$. Indeed,
$$\frac{|x_{n+1}|}{|x_n|} = \left|\frac{x^{n+1}}{n+1}\right| \left|\frac{n}{x^n}\right| = \frac{n}{n+1}\, |x| \to |x|$$
which obviously is $< 1$ when $-1 < x < 1$. Thus, also this series converges absolutely. N

Example 372 (i) The series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2}$$
converges. Indeed, we have $|(-1)^n/n^2| = 1/n^2$, so this series converges absolutely.

(ii) The series
$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}$$
converges for every $x \in \mathbb{R}$. Indeed,
$$\left|\frac{x^{2n+3}}{(2n+3)!}\right| \left|\frac{(2n+1)!}{x^{2n+1}}\right| = \frac{x^2}{(2n+3)(2n+2)} \to 0 \qquad \forall x \in \mathbb{R}$$
and so the series converges absolutely.

(iii) The series
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}$$
converges for every $x \in \mathbb{R}$. Indeed,
$$\left|\frac{x^{2n+2}}{(2n+2)!}\right| \left|\frac{(2n)!}{x^{2n}}\right| = \frac{x^2}{(2n+2)(2n+1)} \to 0 \qquad \forall x \in \mathbb{R}$$
and, therefore, also this last series converges absolutely. N

Theorem 370 is a consequence of the following simple lemma, which should also further clarify its nature.

Lemma 373 Given a series $\sum_{n=1}^{\infty} x_n$, suppose there is a convergent series $\sum_{n=1}^{\infty} y_n$ with positive terms such that, for every $n \geq 1$,

(i) $x_n + y_n \geq 0$,

(ii) $x_n \leq k y_n$ for some $k > 0$.

Then, both the series $\sum_{n=1}^{\infty} (x_n + y_n)$ and $\sum_{n=1}^{\infty} x_n$ converge, with
$$\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty} (x_n + y_n) - \sum_{n=1}^{\infty} y_n$$

Proof Set $z_n = x_n + y_n$. Since $0 \leq z_n \leq (1+k) y_n$, by the comparison criterion the convergence of $\sum_{n=1}^{\infty} y_n$ implies that of $\sum_{n=1}^{\infty} z_n$. Let $s_n^x$, $s_n^y$ and $s_n^z$ be the partial sums of the three series involved. Both $\lim s_n^z$ and $\lim s_n^y$ exist. Clearly, $s_n^x = s_n^z - s_n^y$ for every $n \geq 1$. By Proposition 309-(i), we then have $\lim s_n^x = \lim s_n^z - \lim s_n^y$, as desired.
The series $\sum_{n=1}^{\infty} y_n$ thus "lifts", via addition, the series of interest $\sum_{n=1}^{\infty} x_n$ and takes it back to the familiar terrain of series with positive terms. The convergence of $\sum_{n=1}^{\infty} x_n$ can then be established by studying two auxiliary series with positive terms, for which we have at our disposal all the tools learned in the previous sections.
Theorem 370 follows from the lemma by considering $y_n = |x_n|$, because $|x_n| + x_n \geq 0$ and $x_n \leq |x_n|$ for every $n \geq 1$. This clarifies the "lifting" nature of absolute convergence. In particular, it implies $\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty} (x_n + |x_n|) - \sum_{n=1}^{\infty} |x_n|$, so that the sum of the series $\sum_{n=1}^{\infty} x_n$ can be expressed in terms of the sums of two series with positive terms.

Absolute convergence is only a sufficient condition for convergence. Indeed, the alternating harmonic series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots \tag{9.20}$$
converges to $\log 2$, as the next elegant result will show. However, it does not converge absolutely:
$$\sum_{n=1}^{\infty} \left|\frac{(-1)^{n+1}}{n}\right| = \sum_{n=1}^{\infty} \frac{1}{n} = +\infty$$

Proposition 374 We have
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = \log 2$$

Proof The subsequences of the odd and even partial sums
$$s_1, s_3, s_5, \dots \qquad \text{and} \qquad s_2, s_4, s_6, \dots$$
are decreasing and increasing, respectively. So, they converge to two scalars $L_{\mathrm{odd}}$ and $L_{\mathrm{even}}$, respectively. Since $s_{2n+1} - s_{2n} = x_{2n+1} \to 0$, we then have $L_{\mathrm{odd}} = L_{\mathrm{even}}$. If we call $L$ this common limit, we conclude that $s_n \to L$, so the alternating harmonic series converges.
It remains to show that $L = \log 2$. It is enough to consider the even partial sums $s_{2n}$ and show that $\lim s_{2n} = \log 2$. We have
$$s_{2n} = \sum_{k=1}^{2n} \frac{(-1)^{k+1}}{k} = \sum_{k=0}^{n-1} \frac{1}{2k+1} - \sum_{k=1}^{n} \frac{1}{2k} = \sum_{k=0}^{n-1} \frac{1}{2k+1} + \sum_{k=1}^{n} \frac{1}{2k} - 2\sum_{k=1}^{n} \frac{1}{2k} = \sum_{k=1}^{2n} \frac{1}{k} - \sum_{k=1}^{n} \frac{1}{k}$$
By (9.9),
$$\sum_{k=1}^{n} \frac{1}{k} = \gamma + \log n + o(1) \qquad \text{and} \qquad \sum_{k=1}^{2n} \frac{1}{k} = \gamma + \log 2n + o(1)$$
where $\gamma$ is the Euler-Mascheroni constant. Thus,
$$s_{2n} = \sum_{k=1}^{2n} \frac{1}{k} - \sum_{k=1}^{n} \frac{1}{k} = \log 2 + o(1)$$
so that $\lim s_{2n} = \log 2$, as desired.

It is easy to check that the argument just used to show the convergence of the alternating series (9.20) proves, more generally, that any alternating series $\sum_{n=1}^{\infty} (-1)^{n+1} x_n$, with $x_n \geq 0$ for every $n \geq 1$, converges provided the sequence $\{x_n\}$ is decreasing and infinitesimal, i.e., $x_n \downarrow 0$.

9.4.2 Hic sunt leones


Series that are not absolutely convergent are, in general, not that well behaved.17 To see why this is the case, we introduce rearrangements. Given a series $\sum_{n=1}^{\infty} x_n$, fix a permutation $\sigma : \mathbb{N} \to \mathbb{N}$ that, to each position $n$, associates a unique position $\sigma(n)$ and vice versa.18 The new series
$$\sum_{n=1}^{\infty} x_{\sigma(n)}$$
constructed via $\sigma$ is called a rearrangement of $\sum_{n=1}^{\infty} x_n$. In words, the new series has been obtained by permuting the terms of the original series. That said, is it true that
$$\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty} x_{\sigma(n)}$$
for any permutation $\sigma : \mathbb{N} \to \mathbb{N}$? In other words, are series stable under permutations of their elements?
This stability seems inherent to any proper notion of "addition", which should not be affected by mere rearrangements of the summands. Indeed, the answer is obviously positive for finite sums because of the classic associative and commutative properties of addition. The next result shows that the answer continues to be positive for series that are absolutely convergent.

Proposition 375 Let $\sum_{n=1}^{\infty} x_n$ be a series that converges absolutely. Then, $\sum_{n=1}^{\infty} x_n$ and all its rearrangements have the same sum.
17 We refer interested readers to Chapter 3 of Rudin (1976) for a more detailed analysis, which includes the proofs of the results of this section.
18 Recall that a permutation is a bijective function (see Appendix B).

Absolutely convergent series thus exhibit the same good behavior that characterizes finite sums. Unfortunately, this is no longer the case if we drop absolute convergence. For instance, consider the alternating harmonic series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n+1}}{n} + \cdots$$
We learned that it converges, with sum $\log 2$, but that it is not absolutely convergent. Through a suitable permutation, we can construct the rearrangement
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots$$
which is still convergent, but with sum $\log 2\sqrt{2}$. So, rearrangements have, in general, different sums. The next classic result of Riemann shows that anything goes, so the answer to the previous question turns out to be dramatically negative.
Theorem 376 (Riemann) Let $\sum_{n=1}^{\infty} x_n$ be a series that converges but not absolutely (i.e., $\sum_{n=1}^{\infty} |x_n| = +\infty$). Given any $L \in \mathbb{R}$, there is a rearrangement of $\sum_{n=1}^{\infty} x_n$ that has sum $L$.

Summing up, series that are absolutely convergent behave like standard addition. But, as soon as we drop absolute convergence, anything goes.
Chapter 10

Discrete calculus

Discrete calculus deals with problems analogous to those of differential calculus, with the difference that sequences, that is, functions $f : \mathbb{N} \setminus \{0\} \to \mathbb{R}$ with discrete domain, are considered instead of functions on the real line. Despite a rougher domain, some highly non-trivial results hold that make discrete calculus useful in applications.1 In particular, in this chapter we will show its use in the study of series and sequences, allowing for a deeper analysis of some issues which we have already discussed.

10.1 Preamble: limit points


Let $\{x_n\}$ be a bounded sequence of scalars, so that there exists a positive constant $M > 0$ with $-M \leq x_n \leq M$ for every $n$. Consider the ancillary sequences $\{y_n\}$ and $\{z_n\}$ defined by
$$y_n = \sup_{k \geq n} x_k \qquad \text{and} \qquad z_n = \inf_{k \geq n} x_k$$

Example 377 For the alternating sequence $x_n = (-1)^n$, we have $y_n = 1$ and $z_n = -1$ for every $n$, whereas for the sequence $x_n = 1/n$ we have $y_n = 1/n$ and $z_n = 0$ for every $n$. N

It is immediate to check that
$$-M \leq z_n \leq x_n \leq y_n \leq M \qquad \forall n \geq 1 \tag{10.1}$$
Hence, the ancillary sequences are also bounded. Moreover,
$$n_1 < n_2 \implies \sup_{k \geq n_1} x_k \geq \sup_{k \geq n_2} x_k \quad \text{and} \quad \inf_{k \geq n_1} x_k \leq \inf_{k \geq n_2} x_k$$
so $\{y_n\}$ is decreasing and $\{z_n\}$ is increasing. Being monotone, both $\{y_n\}$ and $\{z_n\}$ converge (Theorem 299). If we denote their limits by $y$ and $z$, that is, $y_n \to y$ and $z_n \to z$, we can write
$$\lim_{n \to \infty} \sup_{k \geq n} x_k = y \qquad \text{and} \qquad \lim_{n \to \infty} \inf_{k \geq n} x_k = z$$
The limits $y$ and $z$ are, respectively, called limit superior and limit inferior of $\{x_n\}$, and are denoted by $\limsup x_n$ and $\liminf x_n$.
1 Some parts of this chapter require a basic knowledge of differential calculus. This chapter can be read seamlessly after reading Chapter 20.


Example 378 For the alternating sequence $x_n = (-1)^n$ we have
$$\limsup x_n = 1 \qquad \text{and} \qquad \liminf x_n = -1$$
whereas for the convergent sequence $x_n = 1/n$ we have
$$\limsup x_n = \liminf x_n = \lim x_n = 0$$
N

This example shows two key properties of the limits inferior and superior: they always exist, even if the original sequence has no limit, and their equality is a necessary and sufficient condition for the convergence of the sequence $\{x_n\}$.2 Formally:

Proposition 379 Let $\{x_n\}$ be a bounded sequence. We have
$$-\infty < \liminf x_n \leq \limsup x_n < +\infty \tag{10.2}$$
In particular, $x_n \to L \in \mathbb{R}$ if and only if $\liminf x_n = \limsup x_n = L$.

Proof Thanks to (10.1), Proposition 296 implies (10.2). The proof of the second part of the statement is left to the reader.

Other noteworthy properties are
$$\liminf x_n = -\limsup(-x_n) \qquad \text{and} \qquad \limsup x_n = -\liminf(-x_n) \tag{10.3}$$
They are duality properties that relate the limit superior and limit inferior of a sequence $\{x_n\}$ with those of the opposite sequence $\{-x_n\}$. For instance, this simple duality allows one to easily translate some properties of the limit superior into properties of the limit inferior, and vice versa (this is exactly what will happen in the next proof). Another interesting consequence of the duality is the possibility to rewrite the inequality (10.2) as $\liminf x_n \leq -\liminf(-x_n)$.

The next result lists some basic properties of the limits superior and inferior. Thanks to the previous result, they imply the analogous properties that we established for convergent sequences.3

Lemma 380 Let $\{x_n\}$ and $\{y_n\}$ be two bounded sequences. We have:

(i) $\liminf x_n + \liminf y_n \leq \liminf (x_n + y_n)$;

(ii) $\limsup (x_n + y_n) \leq \limsup x_n + \limsup y_n$;

(iii) $\liminf x_n \leq \liminf y_n$ and $\limsup x_n \leq \limsup y_n$ if eventually $x_n \leq y_n$.

2 Since it is bounded, $\{x_n\}$ converges or oscillates, but does not diverge.
3 Specifically, (i) and (ii) have as a special case Proposition 309-(i), while (iii) has as a special case Proposition 296.

Proof We start by observing that $\{x_n + y_n\}$ is bounded. (i) For every $n$ we have $\inf_{k \geq n} (x_k + y_k) \geq \inf_{k \geq n} x_k + \inf_{k \geq n} y_k$. Since the sequences $\{\inf_{k \geq n} (x_k + y_k)\}$, $\{\inf_{k \geq n} x_k\}$ and $\{\inf_{k \geq n} y_k\}$ converge, (i) follows from Proposition 296. (ii) follows from (i) and the duality formulas contained in (10.3):
$$\limsup (x_n + y_n) = -\liminf ((-x_n) + (-y_n)) \leq -\liminf (-x_n) - \liminf (-y_n) = \limsup x_n + \limsup y_n$$
The proof of (iii) is left to the reader.

It is possible to give a topological characterization of these limits; to do so, we introduce the notion of limit point.

Definition 381 A scalar $L \in \mathbb{R}$ is a limit point for a sequence if every neighborhood of $L$ contains an infinite number of elements of the sequence.

If the sequence converges, there exists a unique limit point: the limit of the sequence. If the sequence does not converge, the limit points are the scalars that are approached by infinitely many elements of the sequence. Indeed, it can easily be shown that $L$ is a limit point for a sequence if and only if there exists a subsequence that converges to $L$.

Example 382 (i) The interval $[-1, 1]$ is the set of limit points of the sequence $x_n = \sin n$, whereas $\{-1, 1\}$ are the limit points of the alternating sequence $x_n = (-1)^n$. (ii) The singleton $\{0\}$ is the unique limit point of the convergent sequence $x_n = 1/n$. N

The next result shows that the limit points belong to the interval determined by the limit superior and the limit inferior.

Proposition 383 Let $\{x_n\}$ be a bounded sequence. If $x \in \mathbb{R}$ is a limit point for the sequence, then $x \in [\liminf x_n, \limsup x_n]$.

Proof Consider a limit point $x$. By contradiction, assume that $\liminf x_n > x$. Define $\varepsilon = \liminf x_n - x > 0$ and $z_n = \inf_{k \geq n} x_k$ for every $n$. On the one hand, in light of the previous part of the chapter, we know that $z_{n+1} \geq z_n$ for every $n$ and $z_n \to \liminf x_n$. This implies that there exists $n_{\varepsilon} \in \mathbb{N}$ such that
$$\liminf x_n - \frac{\varepsilon}{2} < z_n < \liminf x_n + \frac{\varepsilon}{2}$$
for every $n \geq n_{\varepsilon}$. On the other hand, since $x$ is a limit point, there exists $x_n$ such that $x - \frac{\varepsilon}{2} < x_n < x + \frac{\varepsilon}{2}$, where $n$ can be chosen to be strictly greater than $n_{\varepsilon}$ (recall that each neighborhood of $x$ must contain an infinite number of elements of the sequence). By construction, we have that $z_n = \inf_{k \geq n} x_k \leq x_n$. This yields that
$$\liminf x_n - \frac{\varepsilon}{2} < z_n \leq x_n < x + \frac{\varepsilon}{2}$$
thus $\liminf x_n < x + \varepsilon$. We reached a contradiction since, by definition, $\varepsilon = \liminf x_n - x$, which we just proved to be strictly smaller than $\varepsilon$. An analogous argument yields that $\limsup x_n \geq x$ (why?).

Intuitively, the larger the set of limit points, the more "divergent" the sequence; in particular, this set reduces to a singleton when the sequence converges. In light of the last result, the difference between the limits superior and inferior, that is, the length of $[\liminf x_n, \limsup x_n]$, is a (not that precise) indicator of the divergence of a sequence.

Thanks to the duality (10.3), the interval $[\liminf x_n, \limsup x_n]$ can be rewritten as $[\liminf x_n, -\liminf(-x_n)]$. For instance, if $x_n = \sin n$ or $x_n = \cos n$, we have that $[\liminf x_n, -\liminf(-x_n)] = [-1, 1]$.

N.B. Up to this point, we have considered only bounded sequences. Versions of the previous results, however, can be provided for generic sequences. Clearly, we need to allow the limits superior and inferior to assume infinity as a value. For instance, if we consider the sequence $x_n = n$, which diverges to $+\infty$, we have $\liminf x_n = \limsup x_n = +\infty$; for the sequence $x_n = -e^n$, which diverges to $-\infty$, we have $\limsup x_n = \liminf x_n = -\infty$; whereas for the sequence $x_n = (-1)^n n$ we have $\liminf x_n = -\infty$ and $\limsup x_n = +\infty$, so that $[\liminf x_n, \limsup x_n] = \overline{\mathbb{R}}$. We leave to the reader the extension of the previous results to generic sequences. O

10.2 Discrete calculus


10.2.1 Finite differences
The (finite) differences
$$\Delta x_n = x_{n+1} - x_n$$
of a sequence $\{x_n\}$ are the discrete case counterparts of the derivatives of a function defined on the real line.4 Indeed, the smallest discrete increment starting from $n$ is equal to 1, therefore
$$\Delta x_n = \frac{x_{n+1} - x_n}{1} = \frac{x_{n+1} - x_n}{(n+1) - n} = \frac{\Delta x_n}{\Delta n}$$

Definition 384 The sequence $\{\Delta x_n\} = \{x_{n+1} - x_n\}$ is called the sequence of (finite) differences of a sequence $\{x_n\}$.

The next result lists the algebraic properties of the differences, that is, their behavior with respect to the fundamental operations.5

Proposition 385 Let $\{x_n\}$ and $\{y_n\}$ be any two sequences. For every $n$, we have:

(i) $\Delta(\alpha x_n + \beta y_n) = \alpha\, \Delta x_n + \beta\, \Delta y_n$ for every $\alpha, \beta \in \mathbb{R}$;

(ii) $\Delta(x_n y_n) = x_{n+1}\, \Delta y_n + y_n\, \Delta x_n$;

(iii) $\Delta\left(\dfrac{x_n}{y_n}\right) = \dfrac{y_n\, \Delta x_n - x_n\, \Delta y_n}{y_n y_{n+1}}$, provided $y_n \neq 0$ for every $n$.

4 See Section 20.14.
5 It is the discrete counterpart of the results in Section 20.8.

On the one hand, (i) shows that the difference operator $\Delta$ preserves addition and subtraction; on the other hand, (ii) and (iii) show that more complex rules hold for multiplication and division. Properties (ii) and (iii) are called product rule and quotient rule, respectively.

Proof (i) Obvious. (ii) It follows from
$$\Delta(x_n y_n) = x_{n+1} y_{n+1} - x_n y_n = x_{n+1} y_{n+1} - x_{n+1} y_n + x_{n+1} y_n - x_n y_n = x_{n+1}(y_{n+1} - y_n) + y_n(x_{n+1} - x_n) = x_{n+1}\, \Delta y_n + y_n\, \Delta x_n$$
(iii) It follows from
$$\Delta\left(\frac{x_n}{y_n}\right) = \frac{x_{n+1}}{y_{n+1}} - \frac{x_n}{y_n} = \frac{x_{n+1} y_n - x_n y_{n+1}}{y_n y_{n+1}} = \frac{x_{n+1} y_n - x_n y_n + x_n y_n - x_n y_{n+1}}{y_n y_{n+1}} = \frac{y_n(x_{n+1} - x_n) - x_n(y_{n+1} - y_n)}{y_n y_{n+1}} = \frac{y_n\, \Delta x_n - x_n\, \Delta y_n}{y_n y_{n+1}}$$

Monotonicity of sequences is characterized through differences in a simple, yet interesting, way.

Lemma 386 A sequence is increasing (decreasing) if and only if $\Delta x_n \geq 0$ ($\leq 0$) for every $n \geq 1$.

Therefore, the monotonicity of the original sequence is revealed by the sign of the differences.

Example 387 (i) If $x_n = c$ for all $n \geq 1$, then $\Delta x_n = 0$ for all $n \geq 1$. In words, constant sequences (which are both increasing and decreasing) have zero differences. (ii) If $x_n = a^n$, with $a > 0$, we have that
$$\Delta x_n = a^{n+1} - a^n = (a-1)\, a^n = (a-1)\, x_n$$
Therefore, the sequence $\{a^n\}$ is increasing if and only if $a \geq 1$. N

The case $a = 2$ in this last example is noteworthy.

Proposition 388 We have $\Delta x_n = x_n$ for every $n \geq 1$ and $x_1 = 2$ if and only if $x_n = 2^n$ for every $n$.

The sequence $x_n = 2^n$ thus equals the sequence of its own finite differences, so it is the discrete counterpart of the exponential function in differential calculus.

Proof "If". From the last example, if $a = 2$ then for the increasing sequence $\{2^n\}$ we have $\Delta x_n = x_n$ for every $n$ and $x_1 = 2$. "Only if". Suppose that $\Delta x_n = x_n$ for all $n \geq 1$, that is, $x_{n+1} - x_n = x_n$, i.e., $x_{n+1} = 2 x_n$. A simple recurrence argument shows that $x_n = 2^{n-1} x_1$. Since $x_1 = 2$, we obtain $x_n = 2^n$ for every $n$.

The sequence of differences of $\{\Delta x_n\}$ is denoted by $\{\Delta^2 x_n\}$ and is called the sequence of second differences; in particular:
$$\Delta^2 x_n = x_{n+2} - x_{n+1} - (x_{n+1} - x_n) = x_{n+2} - 2x_{n+1} + x_n$$
Analogously, for every $k \geq 2$, we denote by $\Delta^k x_n$ the differences of $\Delta^{k-1} x_n$, that is,
$$\Delta^k x_n = \Delta\left(\Delta^{k-1} x_n\right) = \Delta^{k-1} x_{n+1} - \Delta^{k-1} x_n = \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} x_{n+i} \tag{10.4}$$

This formula can be proved by induction on $k$ (a common technique in this chapter). Here, we only outline the induction step. Assume that (10.4) holds for $k$. We show it holds for $k + 1$. Fix $n$. First, observe that (why?)
$$\binom{k+1}{i} = \binom{k}{i-1} + \binom{k}{i} \qquad \forall i = 1, \dots, k \tag{10.5}$$
This implies that
$$\begin{aligned} \Delta^{k+1} x_n &= \Delta^k x_{n+1} - \Delta^k x_n = \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} x_{n+1+i} - \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} x_{n+i} \\ &= \sum_{i=0}^{k-1} (-1)^{k-i} \binom{k}{i} x_{n+1+i} + x_{n+k+1} - (-1)^k x_n - \sum_{i=1}^{k} (-1)^{k-i} \binom{k}{i} x_{n+i} \\ &= (-1)^{k+1} x_n + \sum_{i=1}^{k} (-1)^{k+1-i} \binom{k}{i-1} x_{n+i} + \sum_{i=1}^{k} (-1)^{k+1-i} \binom{k}{i} x_{n+i} + x_{n+k+1} \\ &= (-1)^{k+1} \binom{k+1}{0} x_n + \sum_{i=1}^{k} (-1)^{k+1-i} \binom{k+1}{i} x_{n+i} + \binom{k+1}{k+1} x_{n+k+1} \\ &= \sum_{i=0}^{k+1} (-1)^{k+1-i} \binom{k+1}{i} x_{n+i} \end{aligned}$$
Note that the second equality is justified by the inductive hypothesis.

Example 389 If $x_n = n$, we have
$$\Delta n = (n+1) - n = 1$$
and $\Delta^k n = 0$ for every $k > 1$. If $x_n = n^2$, we have
$$\Delta n^2 = (n+1)^2 - n^2 = 2n + 1$$
$$\Delta^2 n^2 = 2(n+1) + 1 - (2n+1) = 2$$
and $\Delta^k n^2 = 0$ for every $k > 2$. N

Formula (10.4) permits the following beautiful generalization of the series expansion (9.17) of the exponential function. From now on, we set $\Delta^0 x_n = x_n$ for every $n$. Note that if we also set $\binom{0}{0} = 1$, then (10.4) holds for $k = 0$ as well.

Theorem 390 Let $\{y_n\}$ be any bounded sequence. Then, for each $n \geq 1$,
$$\sum_{k=0}^{\infty} \frac{x^k}{k!}\, \Delta^k y_n = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!}\, y_{n+j} \qquad \forall x \in \mathbb{R} \tag{10.6}$$

Proof Since $\{y_n\}$ is bounded, the two series in the formula converge. By (10.4), we have to show that, for each $n$,
$$\sum_{k=0}^{\infty} \frac{x^k}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} y_{n+i} = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!}\, y_{n+j} \qquad \forall x \in \mathbb{R} \tag{10.7}$$
In fact, we are going to prove a much stronger fact. Fix an integer $j \geq 0$. We show that the coefficients of $y_{n+j}$ on the two sides of (10.7) are equal. Clearly, on the right-hand side this coefficient is $e^{-x} x^j/j!$. As to the left-hand side, note that $y_{n+j}$ appears as soon as $k \geq j$ and this coefficient is
$$\sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j}$$
Therefore, it remains to prove that
$$\sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j} = e^{-x} \frac{x^j}{j!} \tag{10.8}$$
Set $i = k - j$. Then,
$$\sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j} = \sum_{i=0}^{\infty} (-1)^i \frac{x^{i+j}}{(i+j)!} \binom{i+j}{j} = \sum_{i=0}^{\infty} (-1)^i \frac{x^{i+j}}{(i+j)!} \frac{(i+j)!}{i!\, j!} = \frac{x^j}{j!} \sum_{i=0}^{\infty} \frac{(-1)^i x^i}{i!} = \frac{x^j}{j!} \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} = \frac{x^j}{j!}\, e^{-x}$$
where the last equality follows from Theorem 367, thus proving (10.8) and the statement.

The series expansion (9.17) is a special case of (10.6). Indeed, let $n = 1$ so that (10.6) becomes
$$\sum_{k=0}^{\infty} \frac{x^k}{k!}\, \Delta^k y_1 = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!}\, y_{1+j} \tag{10.9}$$
Assume that $y_j = 1$ for every $j$. Then, $\Delta^0 y_1 = y_1 = 1$ and $\Delta^k y_1 = 0$ if $k \geq 1$. Hence, (10.9) becomes
$$1 = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!}$$
which is the series expansion (9.17).



10.2.2 Newton difference formula


The next result, which generalizes Example 389, shows a further analogy between $\Delta$ in discrete calculus and the derivative in "continuous" calculus. Indeed, in the continuous case it is necessary to differentiate the power function $x^k$ $k$ times in order to obtain a constant, and $k+1$ times to get the constant 0. In the discrete case, we must apply the operator $\Delta$ $k$ times to the sequence $n^k$ – the restriction of the power function to $\mathbb{N}_+$ – in order to obtain a constant, and $k+1$ times to get the constant 0.

Proposition 391 Let $x_n = n^k$ with $k \geq 1$. Then, $\Delta^k n^k = k!$ and
$$\Delta^m n^k = 0 \qquad \forall m > k \tag{10.10}$$

The proof relies on the following lemma of independent interest (we leave its proof to the reader).

Lemma 392 Let $\{x_n\}$ be a sequence. For every $k$ and for every $n$, we have $\Delta^{k+1} x_n = \Delta^k(\Delta x_n) = \Delta(\Delta^k x_n)$.

Proof We begin by proving a version of (10.10), namely that
$$\Delta^{k+1} n^s = 0 \qquad \forall k \in \mathbb{N},\ \forall s \in \{0, 1, \dots, k\} \tag{10.11}$$
We proceed by induction. For $k = 1$, note that $s$ can only be either 0 or 1 and the result holds in view of the last example. Assume now that $\Delta^{k+1} n^s = 0$ for all $s \in \{0, 1, \dots, k\}$ (induction hypothesis on $k$); we need to show that $\Delta^{k+2} n^s = 0$ for all $s \in \{0, 1, \dots, k+1\}$. Let $s$ belong to $\{1, \dots, k+1\}$: either $s < k+1$ or $s = k+1$. In the first case, by the induction hypothesis, we have that $\Delta^{k+2} n^s = \Delta(\Delta^{k+1} n^s) = 0$. In the second case, by using Newton's binomial formula, we have
$$\Delta n^{k+1} = (n+1)^{k+1} - n^{k+1} = n^{k+1} + \binom{k+1}{1} n^k + \binom{k+1}{2} n^{k-1} + \cdots + 1 - n^{k+1} = (k+1)\, n^k + \binom{k+1}{2} n^{k-1} + \cdots + 1$$
Therefore, by the previous lemma we have
$$\Delta^{k+2} n^{k+1} = \Delta^{k+1}\left(\Delta n^{k+1}\right) = \Delta^{k+1}\left((k+1)\, n^k + \binom{k+1}{2} n^{k-1} + \cdots + 1\right) = (k+1)\, \Delta^{k+1} n^k + \binom{k+1}{2}\, \Delta^{k+1} n^{k-1} + \cdots + \Delta^{k+1} 1 = 0 + 0 + \cdots + 0 = 0$$
where the zeroes follow from the induction hypothesis. We conclude that $\Delta^{k+2} n^{k+1} = 0$. The statement in (10.11) follows. From (10.11), it is then immediate to derive, by induction on $m$, equation (10.10) (why?). Next we show that $\Delta^k n^k = k!$. We proceed by induction. Again, for $k = 1$ the result holds in view of the last example. Assume now that the statement holds for $k$ (induction hypothesis). We need to show that $\Delta^{k+1} n^{k+1} = (k+1)!$. We then have
$$\Delta^{k+1} n^{k+1} = \Delta^k\left(\Delta n^{k+1}\right) = \Delta^k\left((k+1)\, n^k + \binom{k+1}{2} n^{k-1} + \cdots + 1\right) = (k+1)\, \Delta^k n^k + \binom{k+1}{2}\, \Delta^k n^{k-1} + \cdots + \Delta^k 1 = (k+1)\, k! + 0 + \cdots + 0 = (k+1)!$$
where the zeroes follow from (10.11). Summing up, $\Delta^k n^k = k!$, as desired.

That said, in differential calculus a key feature of the powers $x^k$ is that their derivatives are $k x^{k-1}$. In this respect, the discrete powers $n^k$ are disappointing because their differences do not take such a form: for instance, for the sequence $x_n = n^2$ we have $\Delta n^2 = 2n + 1 \neq 2n$ (Example 389).
To restore the formula $k x^{k-1}$, we need to introduce the falling factorial $n^{(k)}$ defined by
$$n^{(k)} = \frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$$
with $0 \leq k \leq n$. Clearly, if $k = n$ we go back to standard factorials, i.e., $n^{(n)} = n!$.

Proposition 393 We have $\Delta n^{(k)} = k\, n^{(k-1)}$ for every $1 \leq k \leq n$.

Proof We have
$$\Delta n^{(k)} = (n+1)^{(k)} - n^{(k)} = \frac{(n+1)!}{(n+1-k)!} - \frac{n!}{(n-k)!} = \frac{(n+1)\, n!}{(n+1-k)(n-k)!} - \frac{n!}{(n-k)!} = \left(\frac{n+1}{n+1-k} - 1\right) \frac{n!}{(n-k)!} = \frac{k}{n+1-k}\, n^{(k)} = \frac{k}{n+1-k}\, n(n-1)\cdots(n-k+2)(n-k+1) = k\, n(n-1)\cdots(n-k+2) = k\, n^{(k-1)}$$
as desired.

Thus, for finite differences the sequences $x_n = n^{(k)}$ are the analog of the powers for differential calculus.6 This analogy underlies the next classic difference formula proved by Isaac Newton in 1687 in the Principia. Recall that $\Delta^0 x_n = x_n$.

Theorem 394 (Newton) We have
$$x_{n+m} = \sum_{j=0}^{m} \frac{m^{(j)}}{j!}\, \Delta^j x_n \tag{10.12}$$

6 Observe that, given $k$, the terms $x_n = n^{(k)}$ are well defined for $n \geq k$.

Proof Before starting, note that, since $m^{(j)}/j! = \binom{m}{j}$, for every sequence $\{x_n\}$ and for $n \geq 1$ and $m \geq 1$ equality (10.12) can be rewritten as
$$x_{n+m} = \sum_{j=0}^{m} \frac{m!}{j!(m-j)!}\, \Delta^j x_n = \sum_{j=0}^{m} \binom{m}{j}\, \Delta^j x_n$$
Let $\{x_n\}$ be a generic sequence and $n$ a generic element of $\mathbb{N}_+$. We proceed by induction on $m$. For $m = 1$ the statement is true; indeed, we have
$$x_{n+1} = x_n + x_{n+1} - x_n = \binom{1}{0} \Delta^0 x_n + \binom{1}{1} \Delta^1 x_n = \sum_{j=0}^{1} \binom{1}{j}\, \Delta^j x_n$$
Assume now the statement is true for $m$. We need to show it holds for $m + 1$. Note that
$$\begin{aligned} x_{n+m+1} &= x_{n+m} + \Delta x_{n+m} = \sum_{j=0}^{m} \binom{m}{j}\, \Delta^j(\Delta x_n) + \sum_{j=0}^{m} \binom{m}{j}\, \Delta^j x_n \\ &= \sum_{j=0}^{m} \binom{m}{j}\, \Delta^{j+1} x_n + \sum_{j=0}^{m} \binom{m}{j}\, \Delta^j x_n \\ &= \Delta^{m+1} x_n + \sum_{j=0}^{m-1} \binom{m}{j}\, \Delta^{j+1} x_n + \sum_{j=1}^{m} \binom{m}{j}\, \Delta^j x_n + \Delta^0 x_n \\ &= \Delta^{m+1} x_n + \sum_{j=1}^{m} \binom{m}{j-1}\, \Delta^j x_n + \sum_{j=1}^{m} \binom{m}{j}\, \Delta^j x_n + \Delta^0 x_n \\ &= \Delta^{m+1} x_n + \sum_{j=1}^{m} \binom{m+1}{j}\, \Delta^j x_n + \Delta^0 x_n = \sum_{j=0}^{m+1} \binom{m+1}{j}\, \Delta^j x_n \end{aligned}$$
where the second-to-last equality follows from (10.5), proving the statement.

This expansion can be written as
$$x_{n+m} - x_n = m\, \Delta x_n + \frac{m(m-1)}{2}\, \Delta^2 x_n + \cdots + \Delta^m x_n$$
So, it represents the difference between two terms of a sequence via differences of higher orders. It can be viewed as a discrete analog of the Taylor expansion.

Example 395 Let $x_n = n^k$ with $k \geq 1$. By Proposition 391, we have
$$x_{n+m} - x_n = m\, \Delta n^k + \frac{m(m-1)}{2}\, \Delta^2 n^k + \cdots + \frac{m^{(k-1)}}{(k-1)!}\, \Delta^{k-1} n^k + m^{(k)}$$
provided $m \geq k$. N

10.2.3 Asymptotic behavior


The limit of the ratio
$$\frac{x_n}{y_n}$$
is fundamental, as we have seen in the analysis of the order of convergence. Consider the following example.

Example 396 Let $x_n = n(-1)^n$ and $y_n = n^2$. We have
$$\frac{x_n}{y_n} = \frac{(-1)^n}{n} \to 0$$
If we consider their differences we get
$$\frac{\Delta x_n}{\Delta y_n} = \frac{x_{n+1} - x_n}{y_{n+1} - y_n} = \frac{(-1)^{n+1}(1 + 2n)}{1 + 2n} = (-1)^{n+1}$$
So, the ratio $\Delta x_n/\Delta y_n$ does not converge. N

Therefore, even if the ratio $x_n/y_n$ converges, the ratio $\Delta x_n/\Delta y_n$ of the differences may not. On the other hand, the next result shows that the asymptotic behavior of the ratio $\Delta x_n/\Delta y_n$ determines that of $x_n/y_n$.

Theorem 397 (Cesàro) Let $\{y_n\}$ be a strictly increasing sequence that diverges to infinity, that is, $y_n \uparrow +\infty$, and let $\{x_n\}$ be any sequence. Then,
$$\liminf \frac{\Delta x_n}{\Delta y_n} \leq \liminf \frac{x_n}{y_n} \leq \limsup \frac{x_n}{y_n} \leq \limsup \frac{\Delta x_n}{\Delta y_n} \tag{10.13}$$

In particular, this inequality implies that, if the (finite or infinite) limit of the ratio $\Delta x_n/\Delta y_n$ exists, we have
$$\liminf \frac{\Delta x_n}{\Delta y_n} = \liminf \frac{x_n}{y_n} = \limsup \frac{x_n}{y_n} = \limsup \frac{\Delta x_n}{\Delta y_n} \tag{10.14}$$
that is, $x_n/y_n$ converges to the same limit. Therefore, as stated above, the "regularity" of the asymptotic behavior of the ratio $\Delta x_n/\Delta y_n$ implies the "regularity" of the original ratio $x_n/y_n$. At the same time, if the ratio $x_n/y_n$ presents an "irregular" asymptotic behavior, so does the difference ratio.

Proof We will only prove the special case (10.14) in which $\Delta x_n/\Delta y_n$ admits a finite limit. Therefore, let $\Delta x_n/\Delta y_n \to L \in \mathbb{R}$. It follows that, for $\varepsilon > 0$, there exists $n_{\varepsilon}$ such that
$$L - \varepsilon < \frac{\Delta x_n}{\Delta y_n} < L + \varepsilon$$
for every $n \geq n_{\varepsilon}$. Since, by hypothesis, $y_{n+1} - y_n > 0$ for every $n$, we have
$$(L - \varepsilon)(y_{n+1} - y_n) < x_{n+1} - x_n < (L + \varepsilon)(y_{n+1} - y_n) \qquad \forall n \geq n_{\varepsilon}$$
In particular, for every $n > n_{\varepsilon}$, we obtain
$$(L - \varepsilon)(y_{n_{\varepsilon}+1} - y_{n_{\varepsilon}}) < x_{n_{\varepsilon}+1} - x_{n_{\varepsilon}} < (L + \varepsilon)(y_{n_{\varepsilon}+1} - y_{n_{\varepsilon}})$$
$$(L - \varepsilon)(y_{n_{\varepsilon}+2} - y_{n_{\varepsilon}+1}) < x_{n_{\varepsilon}+2} - x_{n_{\varepsilon}+1} < (L + \varepsilon)(y_{n_{\varepsilon}+2} - y_{n_{\varepsilon}+1})$$
$$\vdots$$
$$(L - \varepsilon)(y_n - y_{n-1}) < x_n - x_{n-1} < (L + \varepsilon)(y_n - y_{n-1})$$
Summing over the previous inequalities, we get for each $n > n_{\varepsilon}$
$$(L - \varepsilon)(y_n - y_{n_{\varepsilon}}) < x_n - x_{n_{\varepsilon}} < (L + \varepsilon)(y_n - y_{n_{\varepsilon}})$$
that is, for each $n > n_{\varepsilon}$,
$$L - \varepsilon + \frac{x_{n_{\varepsilon}} - (L - \varepsilon) y_{n_{\varepsilon}}}{y_n} < \frac{x_n}{y_n} < L + \varepsilon + \frac{x_{n_{\varepsilon}} - (L + \varepsilon) y_{n_{\varepsilon}}}{y_n}$$
Since $n_{\varepsilon}$ is a given integer and $y_n \uparrow +\infty$ as $n \to \infty$, it follows that
$$\lim_n \frac{x_{n_{\varepsilon}} - (L - \varepsilon) y_{n_{\varepsilon}}}{y_n} = \lim_n \frac{x_{n_{\varepsilon}} - (L + \varepsilon) y_{n_{\varepsilon}}}{y_n} = 0$$
Therefore, we have
$$L - \varepsilon \leq \liminf_n \frac{x_n}{y_n} \leq \limsup_n \frac{x_n}{y_n} \leq L + \varepsilon$$
Since $\varepsilon > 0$ is arbitrary, it follows that
$$\liminf_n \frac{x_n}{y_n} = \limsup_n \frac{x_n}{y_n} = L$$
as desired. If $\Delta x_n/\Delta y_n \to \pm\infty$ we can proceed in a similar way, as the reader can verify.

The previous result can be interpreted as a discrete version of de l'Hospital's Theorem. As de l'Hospital's Theorem is useful in finding the limits of functions, in particular when they present indeterminate forms, its discrete analog due to Cesàro proves useful operationally in finding the limits of sequences that present indeterminate forms.

Example 398 The limit of the sequence
$$\frac{\log(1+n)}{n} \tag{10.15}$$
has the indeterminate form $\infty/\infty$. Consider the sequences $x_n = \log(1+n)$ and $y_n = n$. The sequence (10.15) can then be written as $x_n/y_n$. We have
$$\frac{\Delta x_n}{\Delta y_n} = \frac{\log(1+n+1) - \log(1+n)}{1} = \log\left(1 + \frac{1}{1+n}\right) \to 0$$
Therefore
$$\lim \frac{\log(1+n)}{n} = 0$$
by Cesàro's Theorem. N

At a conceptual level, in the next section we will see how Cesàro’s Theorem allows for
a better understanding of convergence criteria for series (see Section 10.4). To this end, the
following remarkable consequence of Cesàro’s Theorem will be crucial.

Corollary 399 Let $\{x_n\}$ be a sequence such that, eventually, $x_n > 0$. Then,
$$\liminf \frac{x_{n+1}}{x_n} \leq \liminf \sqrt[n]{x_n} \leq \limsup \sqrt[n]{x_n} \leq \limsup \frac{x_{n+1}}{x_n} \tag{10.16}$$

Proof Without loss of generality, let $\{x_n\}$ be a strictly positive sequence. We have
$$\log \frac{x_{n+1}}{x_n} = \log x_{n+1} - \log x_n \qquad \text{and} \qquad \log \sqrt[n]{x_n} = \frac{1}{n} \log x_n$$
Considering the sequences $\log x_n$ and $y_n = n$, (10.13) takes the form
$$\liminf \frac{\Delta \log x_n}{\Delta y_n} \leq \liminf \frac{\log x_n}{y_n} \leq \limsup \frac{\log x_n}{y_n} \leq \limsup \frac{\Delta \log x_n}{\Delta y_n}$$
that is,
$$\liminf \frac{\log \frac{x_{n+1}}{x_n}}{1} \leq \liminf \log \sqrt[n]{x_n} \leq \limsup \log \sqrt[n]{x_n} \leq \limsup \frac{\log \frac{x_{n+1}}{x_n}}{1}$$
from which (10.16) follows since, for every sequence $\{z_n\}$, we have
$$e^{\liminf z_n} = \liminf e^{z_n} \qquad \text{and} \qquad e^{\limsup z_n} = \limsup e^{z_n}$$
as the reader can check.

10.3 Convergence in mean


10.3.1 In medio stat virtus
The next result, apart from being particularly elegant, is a deterministic version of the law
of large numbers, one of the main results in probability theory.

Theorem 400 Let $\{x_n\}$ be a sequence that converges to $L \in \mathbb{R}$. We have
$$\frac{x_1 + x_2 + \cdots + x_n}{n} \to L$$
Proof Consider the sequences $z_n = x_1 + x_2 + \cdots + x_n$ and $y_n = n$. We have
$$\frac{\Delta z_n}{\Delta y_n} = \frac{z_{n+1} - z_n}{y_{n+1} - y_n} = \frac{x_{n+1}}{1} = x_{n+1}$$
Therefore, from the previous results, it follows that
$$\liminf x_{n+1} \leq \liminf \frac{z_n}{n} \leq \limsup \frac{z_n}{n} \leq \limsup x_{n+1}$$
and, since by hypothesis $\liminf x_{n+1} = \limsup x_{n+1} = \lim x_n = L$, it follows that
$$\lim \frac{z_n}{n} = \lim \frac{x_1 + x_2 + \cdots + x_n}{n} = L$$
as desired.

The sequence
$$\frac{\sum_{i=1}^{n} x_i}{n}$$
of arithmetic means thus always converges to the same limit as the sequence $\{x_n\}$, whereas the converse does not hold: the sequence of means may converge while the original one does not.

Example 401 The alternating sequence $x_n = (-1)^n$ does not converge, whereas
$$\frac{\sum_{i=1}^{n} x_i}{n} \to 0$$
Indeed
$$\frac{x_1 + x_2 + \cdots + x_n}{n} = \begin{cases} 0 & \text{if } n \text{ is even} \\ -\dfrac{1}{n} & \text{if } n \text{ is odd} \end{cases}$$
N

Therefore, the sequence of means is more "stable" than the original one. This motivates the following, more general, definition of the limit of a sequence, named after Ernesto Cesàro. It is fundamental in probability theory (and in its applications).

Definition 402 We say that a sequence $\{x_n\}$ converges in the sense of Cesàro (or in mean) to $L$, and we write $x_n \xrightarrow{C} L$, when
$$\frac{x_1 + x_2 + \cdots + x_n}{n} \to L$$
From the last result, it follows that standard convergence to a limit implies Cesàro convergence to the same limit. The converse does not hold: we may have Cesàro convergence without standard convergence.

Example 403 The alternating sequence $x_n = (-1)^n$ from the last example does not converge but it converges in the sense of Cesàro, i.e., $(-1)^n \xrightarrow{C} 0$. N

It is useful to find conditions under which the converse holds, that is, under which the convergence of the sequence of means implies the convergence of the original sequence. Such results are called Tauberian theorems. We state one of them as an example.

Proposition 404 (Landau) Let $\{x_n\}$ be a sequence for which there exists $k < 0$ such that
$$\Delta x_n > \frac{k}{n} \qquad \forall n \geq 1$$
Then $x_n \to L \in \mathbb{R}$ if and only if $x_n \xrightarrow{C} L$.

In particular, the hypothesis is always satisfied when the sequence $\{x_n\}$ is increasing. So, an increasing sequence converges to $L$ if and only if it Cesàro converges to $L$.

Whenever a sequence does not converge in mean, we may consider the sequence of the "means of the means", which, by the previous results, is more likely to converge than the sequence of means: this is called $(C, 2)$ convergence. This idea can be extended to the mean of the means iterated $k$ times. We will not consider such cases.7 However, the fundamental principle is that means tend to smooth the behavior of a sequence. In various fashions, often stochastic (an example is the law of large numbers previously mentioned), this principle is of central importance in applications. In medio stat virtus.

10.3.2 Creatio ex nihilo


The previous analysis has a particularly interesting application to the sequence of partial sums. Indeed, if we consider the Cesàro limit of the sequence of partial sums $\{s_n\}$, we can extend the concept of summation of a series: if $s_n \xrightarrow{C} S$ we will write $\sum_{n=1}^{\infty} x_n \stackrel{C}{=} S$.
Highly divergent series become convergent according to this broader definition. Consider this famous example.

Example 405 The series, named after Grandi,
$$1 - 1 + 1 - 1 + \cdots = \sum_{n=1}^{\infty} (-1)^{n+1}$$
does not converge. Its partial sums
$$s_1 = 1,\ s_2 = 0,\ s_3 = 1,\ s_4 = 0,\ s_5 = 1,\ \dots$$
lead to the following sequence of means (of partial sums)
$$y_1 = 1,\ y_2 = \frac{1+0}{2} = \frac{1}{2},\ y_3 = \frac{2}{3},\ y_4 = \frac{2}{4} = \frac{1}{2},\ y_5 = \frac{3}{5},\ \dots$$
It is quite obvious that $y_n$ is equal to $1/2$ when $n$ is even and
$$y_n = \frac{1/2 + n/2}{n} = \frac{n+1}{2n} = \frac{1}{2} + \frac{1}{2n}$$
when $n$ is odd. Therefore, $y_n \to 1/2$, so
$$\sum_{n=1}^{\infty} (-1)^{n+1} \stackrel{C}{=} \frac{1}{2}$$
Grandi's series converges in the sense of Cesàro. N

Even if this is not his main scientific contribution, the name of Guido Grandi is remembered for his treatment of this series. It is curious to note that, until the mid-nineteenth century, even the greatest mathematicians believed – like Grandi – that this series summed to $1/2$. Until then, mathematics had been developing untidily: highly complex theorems were known, but the attention to well-posed definitions and rigor that we are now used to was lacking.

7 We refer interested readers to Hardy (1949).
The monk Guido Grandi proposed the following explanation, which contains two mistakes. First of all, he identified
$$1 - 1 + 1 - 1 + 1 - 1 + \cdots$$
as a geometric series with common ratio $q = -1$ (correct) and therefore having sum
$$\frac{1}{1-q} = \frac{1}{1-(-1)} = \frac{1}{2}$$
(wrong: the geometric series converges only when $|q| < 1$). In an unfortunate crescendo, by pairing the addends (wrong: the associative property does not generally hold for series; cf. Section 9.4.2), Grandi then derived the equality
$$(1-1) + (1-1) + \cdots = 0 + 0 + \cdots$$
in order to conclude that
$$\frac{1}{2} = 0 + 0 + \cdots$$
That is, the sum of infinitely many zeroes is equal to $1/2$. This led him not to deny the existence of God, but to deem His intervention in the creation as irrelevant. Indeed, even without divine intervention something can come out of nothing (if you wait long enough): creatio ex nihilo.
That said, in a sense Grandi's intuition that the sum of the series is $1/2$ can be vindicated through Cesàro convergence.

10.4 Convergence criteria for series


The results of this chapter will allow us to achieve a better understanding of the convergence
criteria for series provided in Section 9.3.8 We begin with a useful lemma.

Lemma 406 Let $\{x_n\}$ be a sequence with, eventually, $x_n > 0$. There exists $q < 1$ such that, eventually, $x_{n+1}/x_n \leq q$ if and only if
$$\limsup \frac{x_{n+1}}{x_n} < 1 \tag{10.17}$$
Proof Without loss of generality, assume that $x_n > 0$ for every $n$. "Only if". Suppose that there exists $q < 1$ such that eventually (9.13) holds. There exists $\bar{n}$ such that $x_{n+1}/x_n \leq q$ for every $n \geq \bar{n}$. Therefore, for any such $n$ we have $\sup_{k \geq n} x_{k+1}/x_k \leq q$, which implies
$$\limsup \frac{x_{n+1}}{x_n} = \lim_{n \to \infty} \sup_{k \geq n} \frac{x_{k+1}}{x_k} \leq q < 1$$
"If". Suppose that (10.17) holds. Since
$$\limsup \frac{x_{n+1}}{x_n} = \lim_{n \to \infty} \sup_{k \geq n} \frac{x_{k+1}}{x_k} = L < 1$$
8 For the sake of brevity, we will only consider series. Nonetheless, similar considerations hold for sequences (Section 8.11). Example 398 is explanatory.

for every $\varepsilon > 0$ there exists $\bar{n}$ such that
$$\left|\sup_{k \geq n} \frac{x_{k+1}}{x_k} - L\right| < \varepsilon \qquad \forall n \geq \bar{n}$$
that is,
$$L - \varepsilon < \sup_{k \geq n} \frac{x_{k+1}}{x_k} < L + \varepsilon \qquad \forall n \geq \bar{n}$$
If we choose $\varepsilon$ sufficiently small so that $L + \varepsilon < 1$, by setting $q = L + \varepsilon$ we obtain the desired condition.

Analogously, we can prove that eventually $x_{n+1}/x_n \geq 1$ if
$$\liminf \frac{x_{n+1}}{x_n} > 1 \tag{10.18}$$
and only if
$$\liminf \frac{x_{n+1}}{x_n} \geq 1 \tag{10.19}$$
Therefore, the condition "eventually $x_{n+1}/x_n \geq 1$" implies (10.19) and is implied by (10.18). However, we cannot prove anything more. The constant sequence $x_n = 1$ shows that the aforementioned condition holds even if (10.18) does not hold, whereas the sequence $\{1/n\}$ shows that (10.19) may hold even if the condition is violated.

The previous analysis leads to the following corollary, useful in computations, in which the ratio criterion is expressed in terms of limits.

Corollary 407 Let $\sum_{n=1}^{\infty} x_n$ be a series with, eventually, $x_n > 0$.

(i) If
$$\limsup \frac{x_{n+1}}{x_n} < 1$$
then the series converges.

(ii) If
$$\liminf \frac{x_{n+1}}{x_n} > 1$$
then the series diverges positively.

Note that, thanks to Lemma 406, point (i) is equivalent to point (i) of Proposition 363. In contrast, point (ii) is weaker than point (ii) of Proposition 363 since condition (10.18) is only sufficient, but not necessary, for $x_{n+1}/x_n \ge 1$ to hold eventually.

As the following examples show, this version of the ratio criterion is particularly useful when the limit
$$\lim \frac{x_{n+1}}{x_n}$$
exists, that is, whenever
$$\lim \frac{x_{n+1}}{x_n} = \limsup \frac{x_{n+1}}{x_n} = \liminf \frac{x_{n+1}}{x_n}$$
In this case, the ratio criterion takes the useful tripartite form of Proposition 361:

(i) if
$$\lim \frac{x_{n+1}}{x_n} < 1$$
the series converges;

(ii) if
$$\lim \frac{x_{n+1}}{x_n} > 1$$
the series diverges positively, i.e., its limit is $+\infty$;

(iii) if
$$\lim \frac{x_{n+1}}{x_n} = 1$$
the criterion fails: it does not determine the character of the series.

As we have seen in Section 8.11, this form of the ratio criterion is the one usually used in applications. Examples 362 and 365 have shown cases (i) and (ii). The unfortunate case (iii) is well exemplified by $\sum_{n=1}^{\infty} 1/n$ and $\sum_{n=1}^{\infty} 1/n^2$: the first series diverges and the second converges, yet in both cases the ratio of successive terms tends to $1$.
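
Case (iii) is easy to witness numerically. A minimal Python sketch (using exact fractions to avoid rounding noise) shows that the ratios of successive terms of both series tend to $1$, so the criterion is silent even though the two series have opposite characters:

```python
from fractions import Fraction

# Ratios x_{n+1}/x_n for the harmonic (1/n) and quadratic (1/n^2) series:
# both tend to 1, so the tripartite ratio criterion decides nothing,
# although the first series diverges and the second converges.
for n in (10, 100, 1000):
    r_harmonic = Fraction(1, n + 1) / Fraction(1, n)           # n / (n+1)
    r_quadratic = Fraction(1, (n + 1) ** 2) / Fraction(1, n ** 2)
    print(n, float(r_harmonic), float(r_quadratic))
# 10   0.9090...  0.8264...
# 100  0.9900...  0.9802...
# 1000 0.9990...  0.9980...
```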

10.4.1 Root criterion for convergence

The next convergence criterion is, from a theoretical point of view, the most powerful one (as the next section will show).

Proposition 408 (Root criterion) Let $\sum_{n=1}^{\infty} x_n$ be a series with positive terms.

(i) If there exists a number $q < 1$ such that, eventually,
$$\sqrt[n]{x_n} \le q$$
then the series converges.

(ii) If instead $\sqrt[n]{x_n} \ge 1$ for infinitely many values of $n$, then the series diverges.

Proof From $\sqrt[n]{x_n} \le q$ we immediately have $0 \le x_n \le q^n$; by the comparison criterion and the convergence of the geometric series, the statement follows. If instead $\sqrt[n]{x_n} \ge 1$ for infinitely many values of $n$, then for those $n$ we have $x_n \ge 1$, so $x_n$ cannot tend to $0$.

Let us see the limit form of this result. By an argument similar to the one in Lemma 406, point (i) can be equivalently stated as
$$\limsup \sqrt[n]{x_n} < 1$$
As to point (ii), it requires that $\sqrt[n]{x_n} \ge 1$ for infinitely many values of $n$, that is, that there is a subsequence $\{n_k\}$ such that $\sqrt[n_k]{x_{n_k}} \ge 1$ for every $k$. Such a condition holds if
$$\limsup \sqrt[n]{x_n} > 1 \tag{10.20}$$
and only if
$$\limsup \sqrt[n]{x_n} \ge 1 \tag{10.21}$$
The constant sequence $x_n = 1$ exemplifies how condition (10.21) can hold even if (10.20) does not. The sequence $x_n = (1 - 1/n)^n$, on the other hand, shows how even condition (ii) of Proposition 408 may fail although (10.21) holds: here $\sqrt[n]{x_n} = 1 - 1/n < 1$ for every $n$, yet $\limsup \sqrt[n]{x_n} = 1$. It is, therefore, clear that (10.20) implies point (ii) of Proposition 408, which in turn implies (10.21), but the opposite implications do not hold.

All this brings us to the following limit form, in which point (i) is equivalent to that of Proposition 408, while point (ii) is weaker than its counterpart since, as we have just seen, condition (10.20) is only a sufficient condition for $\sqrt[n]{x_n} \ge 1$ to hold for infinitely many values of $n$.

Corollary 409 (Root criterion in limit form) Let $\sum_{n=1}^{\infty} x_n$ be a series with positive terms.

(i) If $\limsup \sqrt[n]{x_n} < 1$, the series converges.

(ii) If $\limsup \sqrt[n]{x_n} > 1$, the series diverges positively.

Proof If $\limsup \sqrt[n]{x_n} < 1$, then $\sqrt[n]{x_n} \le q$ for some $q < 1$, eventually. The desideratum follows from Proposition 408. If $\limsup \sqrt[n]{x_n} > 1$, then $\sqrt[n]{x_n} \ge 1$ for infinitely many values of $n$, and the result again follows from Proposition 408.

As with the limit form of the ratio criterion, the limit form of the root criterion is particularly useful when $\lim \sqrt[n]{x_n}$ exists. Under such circumstances the criterion takes the following tripartite form:

(i) if
$$\lim \sqrt[n]{x_n} < 1$$
the series converges;

(ii) if
$$\lim \sqrt[n]{x_n} > 1$$
the series diverges positively;

(iii) if
$$\lim \sqrt[n]{x_n} = 1$$
the criterion fails: it does not determine the character of the series.

As for the tripartite form of the ratio criterion, that of the root criterion is its most useful form at a computational level. Nonetheless, we hope the reader will always keep in mind the theoretical background of the criterion: "ye were not made to live like unto brutes, but for pursuit of virtue and of knowledge", as Dante's Ulysses famously remarked.9

9 "fatti non foste a viver come bruti, ma per seguir virtute e canoscenza", Inferno, Canto XXVI.

Example 410 (i) Let $q > 0$. The series
$$\sum_{n=1}^{\infty} \frac{q^n}{n^n}$$
converges, as
$$\sqrt[n]{\frac{q^n}{n^n}} = \frac{q}{n} \to 0$$
(ii) Let $0 \le q < 1$. The series $\sum_{n=1}^{\infty} n^k q^n$ converges for every $k$: indeed
$$\sqrt[n]{n^k q^n} = q \, n^{k/n} \to q$$
because $n^{k/n} \to 1$ (since $\log n^{k/n} = (k/n) \log n \to 0$). $\blacktriangle$

10.4.2 The power of the root criterion

The ratio and root criteria are based on the behavior of the sequences $\{x_{n+1}/x_n\}$ and $\{\sqrt[n]{x_n}\}$, which are related via the important inequalities (10.16). In particular, if $\lim x_{n+1}/x_n$ exists, we have
$$\lim \frac{x_{n+1}}{x_n} = \lim \sqrt[n]{x_n} \tag{10.22}$$
and so the two criteria are equivalent in their limit form. However, if $\lim x_{n+1}/x_n$ does not exist, we still obtain from (10.16) that
$$\limsup \frac{x_{n+1}}{x_n} < 1 \implies \limsup \sqrt[n]{x_n} < 1$$
and
$$\liminf \frac{x_{n+1}}{x_n} > 1 \implies \limsup \sqrt[n]{x_n} > 1$$
This suggests that the root criterion is more powerful than the ratio criterion in determining convergence: whenever the ratio criterion rules in favor of convergence or of divergence, we would have reached the same conclusion by using the root criterion. The opposite does not hold, as the next example shows: the ratio criterion fails, while the root criterion determines that the series in question converges.

Example 411 Consider the sequence10
$$x_n = \begin{cases} \dfrac{1}{2^n} & \text{if } n \text{ odd} \\[1ex] \dfrac{1}{2^{n-2}} & \text{if } n \text{ even} \end{cases}$$
that is,
$$\frac{1}{2} + 1 + \frac{1}{8} + \frac{1}{4} + \frac{1}{32} + \frac{1}{16} + \frac{1}{128} + \frac{1}{64} + \cdots$$
We have
$$\frac{x_{n+1}}{x_n} = \begin{cases} \dfrac{1/2^{(n+1)-2}}{1/2^n} = 2 & \text{if } n \text{ odd} \\[1ex] \dfrac{1/2^{n+1}}{1/2^{n-2}} = \dfrac{1}{8} & \text{if } n \text{ even} \end{cases}$$
and
$$\sqrt[n]{x_n} = \begin{cases} \dfrac{1}{2} & \text{if } n \text{ odd} \\[1ex] \dfrac{\sqrt[n]{4}}{2} & \text{if } n \text{ even} \end{cases}$$
so that
$$\limsup \frac{x_{n+1}}{x_n} = 2 \quad , \quad \liminf \frac{x_{n+1}}{x_n} = \frac{1}{8}$$
and
$$\limsup \sqrt[n]{x_n} = \frac{1}{2}$$
The ratio criterion thus fails, while the root criterion tells us that the series converges. $\blacktriangle$

10 See Rudin (1976) p. 67.
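
The contrast between the two indicators is easy to observe numerically. A minimal Python sketch for Rudin's interleaved sequence (the helper name `x` is ours):

```python
# Ratio vs. root indicators for the interleaved sequence of Example 411.
def x(n: int) -> float:
    return 2.0 ** (-n) if n % 2 == 1 else 2.0 ** (-(n - 2))

ratios = [x(n + 1) / x(n) for n in range(1, 40)]
roots = [x(n) ** (1.0 / n) for n in range(1, 40)]

print(max(ratios), min(ratios))  # 2.0 and 0.125: the ratios keep oscillating
print(roots[-1])                 # ≈ 0.5: the n-th roots settle at 1/2
```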

Even though the root criterion is more powerful, the ratio criterion can still be useful, as it is generally easier to compute the limit of ratios than that of roots. The root criterion may be more powerful from a theoretical standpoint, yet it is harder to use from a computational perspective.

In light of this, when using the criteria to solve problems, one should first check whether $\lim x_{n+1}/x_n$ exists and, if it does, compute it. In such a case, thanks to (10.22) we also know the value of $\lim \sqrt[n]{x_n}$ and can thus use the more powerful root criterion. In the unfortunate case in which $\lim x_{n+1}/x_n$ does not exist, and we can at best compute $\limsup x_{n+1}/x_n$ and $\liminf x_{n+1}/x_n$, we can either use the less powerful ratio criterion (which may fail, as we have seen in the previous example), or we may try to compute $\lim \sqrt[n]{x_n}$ directly, hoping it exists (as in the previous example) so that the root criterion can be used in its handier limit form.

Finally, note that, however powerful it may be, the root criterion – a fortiori, the weaker ratio criterion – only gives a sufficient condition for convergence, as the following example shows.

Example 412 The series
$$\sum_{n=1}^{\infty} \frac{1}{n^2}$$
converges. However, recalling Example 321, we have
$$\lim \sqrt[n]{\frac{1}{n^2}} = \lim \sqrt[n]{\frac{1}{n}} \cdot \lim \sqrt[n]{\frac{1}{n}} = 1 \qquad \blacktriangle$$

The root criterion is thus of no help in determining whether the simple series $\sum_{n=1}^{\infty} n^{-2}$ converges. The reason behind such a "failure" is evident in the following simple result, which shows that the criterion applies only when the terms of the series converge to zero at least as fast as a geometric sequence.

Proposition 413 Let $\sum_{n=1}^{\infty} x_n$ be a series with positive terms such that $\limsup \sqrt[n]{x_n} < 1$. For every $q < 1$ such that
$$\limsup \sqrt[n]{x_n} < q$$
we have that, eventually,
$$x_n \le q^n \tag{10.23}$$

Proof Take $q < 1$ such that $\limsup \sqrt[n]{x_n} < q$. There is an $n_q \ge 1$ such that
$$\sqrt[n]{x_n} \le q$$
for every $n \ge n_q$. For every such $n$ we have
$$\sqrt[n]{x_n} \le q \iff x_n \le q^n$$
and so (10.23) holds.

Thanks to (10.23), those convergent series whose terms converge to zero more slowly than every geometric sequence – i.e., such that $q^n = o(x_n)$ for all $q \in (0,1)$ – are out of the root criterion's reach. For example, for every natural number $k \ge 2$ and every $q \in (0,1)$ we have
$$\frac{q^n}{n^{-k}} = n^k q^n \to 0$$
and so $q^n = o(n^{-k})$. To determine whether the series $\sum_{n=1}^{\infty} n^{-k}$ converges, the root criterion is thus useless. This is confirmed by the fact that
$$\lim \sqrt[n]{\frac{1}{n^k}} = 1$$
But it is thanks to Proposition 413 that we understand why the root criterion fails in this instance.

10.5 Power series

10.5.1 Preamble: rational functions

A scalar function $f$ is rational if it is the ratio of two polynomials $p$ and $q$:
$$f(x) = \frac{p(x)}{q(x)} = \frac{b_0 + b_1 x + \cdots + b_m x^m}{a_0 + a_1 x + \cdots + a_n x^n}$$
Its domain consists of all points of the real line except the real solutions of the equation $a_0 + a_1 x + \cdots + a_n x^n = 0$.

A rational function is proper if the degree of the polynomial at the numerator is lower than that of the polynomial at the denominator, i.e., $m < n$. Proper rational functions admit a simple representation – called partial fraction expansion – that often simplifies their analysis. We focus on the case of distinct real roots, leaving to readers the case of multiple roots.

Proposition 414 Let $f(x) = p(x)/q(x)$ be a proper rational function such that $q$ has $k$ distinct real roots $r_1, r_2, \dots, r_k$, so $q(x) = \prod_{i=1}^{k} (x - r_i)$. Then
$$f(x) = \frac{c_1}{x - r_1} + \frac{c_2}{x - r_2} + \cdots + \frac{c_k}{x - r_k} \tag{10.24}$$
where, for all $i = 1, \dots, k$,
$$c_i = \frac{p(r_i)}{q'(r_i)} \tag{10.25}$$

Proof We first establish that there exist $k$ coefficients $c_1, c_2, \dots, c_k$ such that (10.24) holds. For simplicity, we only consider the case
$$f(x) = \frac{b_0 + b_1 x}{a_0 + a_1 x + a_2 x^2}$$
leaving to readers the general case. Since the denominator is $(x - r_1)(x - r_2)$, we look for coefficients $c_1$ and $c_2$ such that
$$\frac{b_0 + b_1 x}{q(x)} = \frac{c_1}{x - r_1} + \frac{c_2}{x - r_2}$$
Since
$$\frac{c_1}{x - r_1} + \frac{c_2}{x - r_2} = \frac{c_1 (x - r_2) + c_2 (x - r_1)}{q(x)} = \frac{(c_1 + c_2) x - (c_1 r_2 + c_2 r_1)}{q(x)}$$
we have
$$\frac{b_0 + b_1 x}{q(x)} = \frac{(c_1 + c_2) x - (c_1 r_2 + c_2 r_1)}{q(x)}$$
So, by equating coefficients we obtain the simple linear system
$$\begin{cases} c_1 + c_2 = b_1 \\ c_1 r_2 + c_2 r_1 = -b_0 \end{cases}$$
Since $r_1 \ne r_2$, the system is easily seen to have a unique solution $(c_1, c_2)$ that provides the sought-after coefficients.

It remains to show that the coefficients of (10.24) satisfy (10.25). We have
$$\lim_{x \to r_i} (x - r_i) f(x) = \lim_{x \to r_i} (x - r_i) \left( \frac{c_1}{x - r_1} + \cdots + \frac{c_k}{x - r_k} \right) = \lim_{x \to r_i} \left( \frac{c_1 (x - r_i)}{x - r_1} + \cdots + c_i + \cdots + \frac{c_k (x - r_i)}{x - r_k} \right) = c_i$$
as well as, by de l'Hospital's rule,
$$\lim_{x \to r_i} (x - r_i) f(x) = \lim_{x \to r_i} (x - r_i) \frac{p(x)}{q(x)} = p(r_i) \lim_{x \to r_i} \frac{x - r_i}{q(x)} = p(r_i) \frac{1}{q'(r_i)}$$
Putting the two limits together, we conclude that $c_i = p(r_i)/q'(r_i)$ for all $i = 1, \dots, k$, as desired.

Example 415 Consider the proper rational function
$$f(x) = \frac{x - 1}{x^2 + 3x + 2}$$
The roots of the polynomial at the denominator are $-1$ and $-2$, so by (10.25) we have $c_1 = p(-1)/q'(-1) = -2$ and $c_2 = p(-2)/q'(-2) = 3$. So, the partial fraction expansion of $f$ is
$$f(x) = -\frac{2}{x + 1} + \frac{3}{x + 2}$$
This can also be checked directly. Indeed, since the denominator is $(x+1)(x+2)$, let us look for $c_1$ and $c_2$ such that
$$\frac{c_1}{x+1} + \frac{c_2}{x+2} = \frac{x - 1}{x^2 + 3x + 2} \tag{10.26}$$
The left-hand side of (10.26) is equal to
$$\frac{c_1 (x+2) + c_2 (x+1)}{(x+1)(x+2)} = \frac{(c_1 + c_2) x + (2 c_1 + c_2)}{(x+1)(x+2)} \tag{10.27}$$
Expressions (10.26) and (10.27) are equal if and only if $c_1$ and $c_2$ satisfy the system
$$\begin{cases} c_1 + c_2 = 1 \\ 2 c_1 + c_2 = -1 \end{cases}$$
Therefore, $c_1 = -2$ and $c_2 = 3$. This confirms what was established via formula (10.25). $\blacktriangle$
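
Formula (10.25) is also convenient to automate. A minimal sketch using the symbolic library sympy (assumed available; `sympy.apart` computes partial fraction expansions directly):

```python
# Check of the partial fraction coefficients c_i = p(r_i)/q'(r_i)
# from Example 415, using sympy.
import sympy as sp

x = sp.symbols('x')
p = x - 1
q = x**2 + 3*x + 2

for r in sp.solve(q, x):                 # roots -2 and -1
    c = p.subs(x, r) / sp.diff(q, x).subs(x, r)
    print(f"root {r}: coefficient {c}")  # r = -1 gives -2, r = -2 gives 3

print(sp.apart(p / q))                   # 3/(x + 2) - 2/(x + 1)
```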

10.5.2 Cauchy-Hadamard's Theorem

Power series are an important class of series of the form
$$\sum_{n=0}^{\infty} a_n x^n \tag{10.28}$$
with $a_n \in \mathbb{R}$ for every $n \ge 0$. The scalars $a_n$ are called the coefficients of the series.

The generic term of a power series is $x_n = a_n x^n$. The scalar $x$ parameterizes the series: to different values of $x$ correspond different series, possibly with a different character.

Definition 416 A power series $\sum_{n=0}^{\infty} a_n x^n$ is said to converge (diverge) at $x_0 \in \mathbb{R}$ if the series $\sum_{n=0}^{\infty} a_n x_0^n$ converges (diverges).

We set $0^0 = 1$. In this way, a power series always converges at $0$: indeed, from $0^0 = 1$ it follows that $\sum_{n=0}^{\infty} a_n 0^n = a_0$.

Proposition 417 If a power series with positive coefficients $\sum_{n=0}^{\infty} a_n x^n$ converges at $x_0 \ge 0$, then it converges at every $x \in \mathbb{R}$ such that $|x| < x_0$. If it diverges at $x_0 \in \mathbb{R}$, then it diverges at every $x \in \mathbb{R}$ such that $|x| > x_0$.

Proof We only prove convergence, the other part being similar. Let $|x| < x_0$. We have $a_n |x|^n \le a_n x_0^n$, so the series $\sum_{n=0}^{\infty} a_n x^n$ is absolutely convergent by the comparison criterion. By Theorem 370, the series $\sum_{n=0}^{\infty} a_n x^n$ converges.

Inspired by this result, given a power series $\sum_{n=0}^{\infty} a_n x^n$ we say that $r \in [0, +\infty]$ is the radius of convergence of the power series if the series converges at every $|x| < r$ and diverges at every $|x| > r$. So, if it exists, the radius of convergence is a watershed that separates the convergent and divergent behavior of the power series (at $|x| = r$ the character of the series is ambiguous: it may be regular or not). In particular, if $r = +\infty$ the power series converges at every $x \in \mathbb{R}$, while if $r = 0$ it converges only at the origin.

The next powerful result, a simple yet remarkable consequence of the root criterion, proves the existence of such a radius and gives a formula to compute it.

Theorem 418 (Cauchy-Hadamard) The radius of convergence of a power series $\sum_{n=0}^{\infty} a_n x^n$ is
$$r = \frac{1}{\lambda}$$
where
$$\lambda = \limsup \sqrt[n]{|a_n|} \in [0, +\infty]$$
with $r = +\infty$ if $\lambda = 0$ and $r = 0$ if $\lambda = +\infty$.

Proof Assume $\lambda \in (0, +\infty)$. We already remarked that the power series converges at $x = 0$. So, let $x \ne 0$. We have
$$\limsup \sqrt[n]{|a_n x^n|} = |x| \limsup \sqrt[n]{|a_n|} = |x| \lambda = \frac{|x|}{r}$$
So, by the root criterion the series converges if $|x|/r < 1$, namely if $|x| < r$, and it diverges if $|x|/r > 1$, namely if $|x| > r$. We leave the case $\lambda \in \{0, +\infty\}$ to the reader.

Example 419 (i) The power series
$$\sum_{n=0}^{\infty} \frac{x^n}{n!} \tag{10.29}$$
has radius of convergence $r = +\infty$. Indeed,
$$\frac{1/(n+1)!}{1/n!} = \frac{1}{n+1} \to 0$$
which, thanks to the inequalities (10.16), implies $\lambda = \limsup \sqrt[n]{1/n!} = 0$, namely $r = +\infty$. The power series thus converges at every $x \in \mathbb{R}$. Indeed, in Theorem 367 we saw that its sum is $e^x$ for every $x \in \mathbb{R}$.
(ii) The power series
$$\sum_{n=1}^{\infty} \frac{x^n}{n} \tag{10.30}$$
has radius of convergence $r = 1$. Indeed,
$$\frac{1/(n+1)}{1/n} = \frac{n}{n+1} \to 1$$
which, thanks to the inequalities (10.16), implies $\lambda = \limsup \sqrt[n]{1/n} = 1$, namely $r = 1$. At $x = 1$ it becomes the harmonic series, so it diverges, while at $x = -1$ it becomes the alternating harmonic series, so it converges (Proposition 374). We conclude that the power series (10.30) converges at every $x \in [-1, 1)$.
(iii) The geometric power series $\sum_{n=0}^{\infty} x^n$ has radius of convergence $r = 1$. Indeed, $\lambda = \limsup \sqrt[n]{1} = 1$. As is well known, it converges at every $x \in (-1, 1)$.
(iv) The power series with factorial coefficients $\sum_{n=1}^{\infty} n! \, x^n$ has radius of convergence $r = 0$. This can be checked directly because, if $x \ne 0$, we have $n! \, |x|^n \to +\infty$, as well as via the last theorem by noting that $\lambda = \limsup \sqrt[n]{n!} = +\infty$. $\blacktriangle$
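
The Cauchy-Hadamard quantity $\lambda$ can also be approximated numerically. Here is a rough Python sketch: since $\lim \sqrt[n]{|a_n|}$ exists for the series of Example 419, evaluating the $n$-th root at a single large $n$ gives a usable estimate (for the factorial series one would instead see the roots grow without bound).

```python
# Rough numerical estimates of lambda = limsup |a_n|^(1/n) for the
# power series of Example 419, evaluated at one large n.
import math

n = 300
log_roots = {
    "x^n / n!": -math.lgamma(n + 1) / n,  # log of (1/n!)^(1/n)
    "x^n / n ": -math.log(n) / n,         # log of (1/n)^(1/n)
    "x^n     ": 0.0,                      # log of 1^(1/n)
}
for name, lr in log_roots.items():
    print(f"sum {name}: root at n=300 ≈ {math.exp(lr):.4f}")
# ≈ 0.0089 (tends to 0, so r = +inf), ≈ 0.9812 (tends to 1, so r = 1), = 1
```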

10.5.3 Generating functions

We can revisit the previous notions from a functional angle that will clarify the nature of power series and will be useful later in the book (Section 23.5). Given a sequence $\{a_n\}$ of scalars, the function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \sum_{n=0}^{\infty} a_n x^n \tag{10.31}$$
is called the generating function of the sequence $\{a_n\}$. Its domain $A$ is formed by the points at which the power series $\sum_{n=0}^{\infty} a_n x^n$ converges. By Cauchy-Hadamard's Theorem, there exists a radius of convergence $r \in [0, +\infty]$ such that $(-r, r) \subseteq A \subseteq [-r, r]$. Depending on the character of the series at $x = \pm r$, the inclusions may become equalities. For instance, if the power series converges at both points $\pm r$, we have $A = [-r, r]$, while if it converges at neither point we have $A = (-r, r)$.

Example 420 (i) The generating function
$$f(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$
of the sequence $\{1/n!\}$, defined via the power series (10.29), has the entire real line as its domain. By Theorem 367, it is the exponential $f(x) = e^x$.
(ii) The generating function
$$f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n}$$
of the sequence $\{1/n\}$, defined via the power series (10.30), has domain $[-1, 1)$.
(iii) The "geometric" function $f(x) = \sum_{n=0}^{\infty} x^n$, generating the constant sequence $\{1, 1, \dots, 1, \dots\}$, has domain $(-1, 1)$.
(iv) The generating function $f(x) = \sum_{n=1}^{\infty} n! \, x^n$ of the factorial sequence has the singleton domain $\{0\}$. $\blacktriangle$

Next we give an important property of generating functions, where we adopt the convention $f^{(0)}(0) = f(0)$.

Proposition 421 The generating function of a sequence $\{a_n\}$ is infinitely differentiable on $(-r, r)$, with
$$a_n = \frac{f^{(n)}(0)}{n!} \qquad \forall n \ge 0 \tag{10.32}$$

This result shows, inter alia, that generating functions determine their sequences uniquely: if $f$ is the generating function of both sequences $\{a_n\}$ and $\{b_n\}$, then these sequences are equal, that is, $a_n = b_n$ for all $n \ge 0$. Indeed, $a_n = b_n = f^{(n)}(0)/n!$ for all $n \ge 0$.

Proof Let $f : (-r, r) \to \mathbb{R}$ be the generating function of the sequence $\{a_n\}$ restricted to the open interval $(-r, r)$. We prove that it is analytic,11 so that the result follows from Proposition 1081. By definition, $f(x) = \sum_{n=0}^{\infty} a_n x^n$ for all $x \in (-r, r)$. Let $x_0 \in (-r, r)$ and $B_\varepsilon(x_0) \subseteq (-r, r)$. By the binomial formula, for each $x \in B_\varepsilon(x_0)$ we have
$$f(x) = \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} a_n (x - x_0 + x_0)^n = \sum_{n=0}^{\infty} a_n \sum_{m=0}^{n} \binom{n}{m} x_0^{n-m} (x - x_0)^m = \sum_{m=0}^{\infty} \left( \sum_{n=m}^{\infty} \binom{n}{m} a_n x_0^{n-m} \right) (x - x_0)^m$$
where for the change in the order of summation in the last step we refer readers to, e.g., Rudin (1976) p. 176. By setting $b_m = \sum_{n=m}^{\infty} \binom{n}{m} a_n x_0^{n-m}$, we then have $f(x) = \sum_{m=0}^{\infty} b_m (x - x_0)^m$ for all $x \in B_\varepsilon(x_0)$. This proves the analyticity of $f$.

11 Analytic functions will be introduced in Section 23.5.

10.5.4 Solving recurrences via generating functions

Denote by $f_a$ the generating function of a sequence $a = \{a_n\}$. As remarked after the last proposition, $f_a$ is uniquely determined by $a$, so one can go back and forth between $a$ and $f_a$. We can diagram this univocal relationship as follows:
$$a \longleftrightarrow f_a$$
This observation is important because, remarkably, it turns out that a generating function $f_a$ may be constructed by just using a definition by recurrence of the sequence $a = \{a_n\}$. This makes it possible to solve the recurrence if one is able to retrieve (in closed form) the coefficients of the sequence $a = \{a_n\}$ that generates $f_a$. Indeed, such a sequence is unique and so it has to be the one defined by the recurrence at hand.12 We can diagram this solution scheme as follows:
$$a \text{ recurrence} \longrightarrow f_a \longrightarrow a \text{ closed form}$$
The next classic example gives a flavor of this scheme.

12 The differential formula (10.32) is of less operational interest than one might expect for finding the sequence $\{a_n\}$, because taking successively higher-order derivatives is another kind of recurrence that can be as demanding as going over the original recurrence itself. That said, it will be momentarily used in proving Proposition 424.
Example 422 Consider the classic Fibonacci recursion, started at $n = 0$,
$$\begin{cases} a_0 = 0 \; , \; a_1 = 1 \\ a_n = a_{n-1} + a_{n-2} & \text{for } n \ge 2 \end{cases} \tag{10.33}$$
that is, $\{0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \dots\}$. We want to construct its generating function $f : A \subseteq \mathbb{R} \to \mathbb{R}$. Since the sequence is positive and increasing, clearly $\limsup \sqrt[n]{a_n} \ge 1$; moreover, an easy induction shows that $a_n \le 2^n$, so $\limsup \sqrt[n]{a_n} \le 2$. By Cauchy-Hadamard's Theorem, the domain $A$ thus contains an open interval $(-\varepsilon, \varepsilon)$ with $0 < \varepsilon < 1$. For each scalar $x$, we have
$$\sum_{n=0}^{N} a_n x^n = a_0 + a_1 x + \sum_{n=2}^{N} a_n x^n = a_0 + a_1 x + \sum_{n=2}^{N} (a_{n-1} + a_{n-2}) x^n = x + x \sum_{n=1}^{N-1} a_n x^n + x^2 \sum_{n=0}^{N-2} a_n x^n$$
If $x \in (-\varepsilon, \varepsilon)$, by taking limits as $N \to \infty$ we then get $f(x) = x + x f(x) + x^2 f(x)$, so
$$f(x) = \frac{x}{1 - x - x^2} \qquad \forall x \in (-\varepsilon, \varepsilon)$$
The solutions of the equation $1 - x - x^2 = 0$ are
$$x = \frac{-1 \pm \sqrt{5}}{2}$$
Some simple algebra then shows that
$$\frac{1}{1 - x - x^2} = \frac{\frac{1}{\sqrt{5}}}{x + \frac{1+\sqrt{5}}{2}} - \frac{\frac{1}{\sqrt{5}}}{x + \frac{1-\sqrt{5}}{2}} \tag{10.34}$$
Write $\varphi = (1+\sqrt{5})/2$ and $\psi = (1-\sqrt{5})/2$, so that $\varphi \psi = -1$ and hence $-1/\varphi = \psi$ and $-1/\psi = \varphi$. By the properties of the geometric series, for each $x \in (-\varepsilon, \varepsilon)$ we have
$$\frac{1}{x + \varphi} = \frac{1}{\varphi} \cdot \frac{1}{1 + \frac{x}{\varphi}} = \frac{1}{\varphi} \sum_{n=0}^{\infty} \left( -\frac{1}{\varphi} \right)^n x^n = -\sum_{n=0}^{\infty} \psi^{n+1} x^n$$
and, analogously,
$$\frac{1}{x + \psi} = -\sum_{n=0}^{\infty} \varphi^{n+1} x^n$$
So, for each $x \in (-\varepsilon, \varepsilon)$:
$$f(x) = \frac{x}{\sqrt{5}} \left( \frac{1}{x + \varphi} - \frac{1}{x + \psi} \right) = \frac{x}{\sqrt{5}} \sum_{n=0}^{\infty} \left( \varphi^{n+1} - \psi^{n+1} \right) x^n = \frac{1}{\sqrt{5}} \sum_{n=1}^{\infty} \left( \varphi^n - \psi^n \right) x^n = \sum_{n=0}^{\infty} \frac{\varphi^n - \psi^n}{\sqrt{5}} x^n$$
where the last equality holds because the term with $n = 0$ is zero. By equating coefficients, we conclude that $f$ is generated by the sequence with terms
$$a_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^n - \left( \frac{1-\sqrt{5}}{2} \right)^n \right] \qquad \forall n \ge 0 \tag{10.35}$$
So, this sequence solves the previous Fibonacci recursion. $\blacktriangle$
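
The closed form (10.35) is easy to verify numerically against the recursion itself. A minimal Python sketch (rounding compensates for floating-point error):

```python
# Check that the closed form (10.35) reproduces the Fibonacci recursion.
import math

def fib_closed(n: int) -> float:
    s5 = math.sqrt(5)
    return (((1 + s5) / 2) ** n - ((1 - s5) / 2) ** n) / s5

fib = [0, 1]
for n in range(2, 15):
    fib.append(fib[-1] + fib[-2])       # a_n = a_{n-1} + a_{n-2}

print(fib)                                         # [0, 1, 1, 2, 3, 5, 8, ...]
print([round(fib_closed(n)) for n in range(15)])   # the same list
```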

We call Fibonacci numbers the terms of the sequence (10.35). There is an elegant characterization of their asymptotic behavior.

Proposition 423 For the Fibonacci numbers $a_n$ we have
$$a_n \sim \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^n$$

Proof We have
$$\frac{a_n}{\frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^n} = \frac{\left( \frac{1+\sqrt{5}}{2} \right)^n - \left( \frac{1-\sqrt{5}}{2} \right)^n}{\left( \frac{1+\sqrt{5}}{2} \right)^n} = 1 - \left( \frac{1-\sqrt{5}}{1+\sqrt{5}} \right)^n \to 1$$
where the last step follows from (8.29) since $0 < \left| 1 - \sqrt{5} \right| / \left( 1 + \sqrt{5} \right) < 1$.

In solving the Fibonacci recurrence (10.33) it was key that its generating function (10.34) is a proper rational function, which can then be studied via its partial fraction expansion. This suggests that, more generally, one can solve recurrences that have proper rational generating functions. For simplicity, we focus on the case of distinct real roots.

Proposition 424 Let $f(x) = p(x)/q(x)$ be a proper rational function such that $q$ has $k$ distinct real roots $r_1, r_2, \dots, r_k$, so $q(x) = \prod_{i=1}^{k} (x - r_i)$. Then, $f$ is the generating function of the sequence with terms
$$a_n = b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n}$$
where, for all $i = 1, \dots, k$,
$$b_i = -\frac{p(r_i)}{r_i \, q'(r_i)}$$

We give two proofs of this result: the first one is direct, while the second one relies on formula (10.32).

Proof 1 By Proposition 414, the partial fraction expansion of $f$ is
$$f(x) = \frac{c_1}{x - r_1} + \frac{c_2}{x - r_2} + \cdots + \frac{c_k}{x - r_k} \tag{10.36}$$
where $c_i = p(r_i)/q'(r_i)$ for all $i = 1, \dots, k$. Since
$$\frac{c_i}{x - r_i} = -\frac{c_i}{r_i} \cdot \frac{1}{1 - \frac{x}{r_i}}$$
by the properties of the geometric series, for $|x| < \min_i |r_i|$ we can write
$$f(x) = -\frac{c_1}{r_1} \sum_{n=0}^{\infty} \frac{x^n}{r_1^n} - \frac{c_2}{r_2} \sum_{n=0}^{\infty} \frac{x^n}{r_2^n} - \cdots - \frac{c_k}{r_k} \sum_{n=0}^{\infty} \frac{x^n}{r_k^n} = \sum_{n=0}^{\infty} \left( b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n} \right) x^n$$
where $b_i = -p(r_i)/(r_i q'(r_i))$ for all $i = 1, \dots, k$. We conclude that $f$ is the generating function of the sequence with terms
$$a_n = b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n}$$
as desired.

Proof 2 Consider the function $g(x) = 1/(x - r)$. It can be proved by induction that its derivative of order $n$ is
$$g^{(n)}(x) = -\frac{n!}{(r - x)^{n+1}}$$
In view of (10.36), we then have
$$f^{(n)}(x) = -c_1 \frac{n!}{(r_1 - x)^{n+1}} - c_2 \frac{n!}{(r_2 - x)^{n+1}} - \cdots - c_k \frac{n!}{(r_k - x)^{n+1}}$$
where $c_i = p(r_i)/q'(r_i)$ for all $i = 1, \dots, k$. So,
$$\frac{f^{(n)}(0)}{n!} = -\frac{c_1}{r_1} \frac{1}{r_1^n} - \frac{c_2}{r_2} \frac{1}{r_2^n} - \cdots - \frac{c_k}{r_k} \frac{1}{r_k^n} = b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n}$$
The result now follows from formula (10.32).

As a dividend of this result, we can solve linear recurrences of order $k$ given by (8.11), that is,13
$$\begin{cases} a_0 = \alpha_0 \; , \; a_1 = \alpha_1 \; , \; \dots \; , \; a_{k-1} = \alpha_{k-1} \\ a_n = p_1 a_{n-1} + p_2 a_{n-2} + \cdots + p_k a_{n-k} & \text{for } n \ge k \end{cases} \tag{10.37}$$
Some algebra, left to the reader, shows that the Fibonacci formula (10.34) here takes the general form of a proper rational function given by
$$f(x) = \frac{\alpha_0 + (\alpha_1 - p_1 \alpha_0) x + (\alpha_2 - p_1 \alpha_1 - p_2 \alpha_0) x^2 + \cdots + (\alpha_{k-1} - p_1 \alpha_{k-2} - \cdots - p_{k-1} \alpha_0) x^{k-1}}{1 - p_1 x - p_2 x^2 - \cdots - p_k x^k}$$
Assume that the polynomial at the denominator has $k$ distinct real roots $r_1, r_2, \dots, r_k$. By the last result, $f$ is then the generating function of the sequence with terms
$$a_n = b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n}$$
where, for all $i = 1, \dots, k$,
$$b_i = \frac{\alpha_0 + (\alpha_1 - p_1 \alpha_0) r_i + (\alpha_2 - p_1 \alpha_1 - p_2 \alpha_0) r_i^2 + \cdots + (\alpha_{k-1} - p_1 \alpha_{k-2} - \cdots - p_{k-1} \alpha_0) r_i^{k-1}}{r_i \left( p_1 + 2 p_2 r_i + \cdots + k p_k r_i^{k-1} \right)}$$
This sequence thus solves the linear recurrence (10.37). The key equation
$$1 - p_1 x - p_2 x^2 - \cdots - p_k x^k = 0$$
is a version of the so-called characteristic equation of the recurrence.

13 Relative to (8.11), we use the letters $a$ and $p$ in place of $x$ and $a$, respectively, because the letter $x$ is in this section the argument of a power series.

Example 425 We can solve the Fibonacci recurrence (10.33) through this method. It is a linear recurrence of order $2$ where $p_1 = p_2 = 1$, $\alpha_0 = 0$, $\alpha_1 = 1$, $r_1 = \left( -1 + \sqrt{5} \right)/2$, and $r_2 = \left( -1 - \sqrt{5} \right)/2$. So,
$$b_1 = \frac{r_1}{r_1 (1 + 2 r_1)} = \frac{1}{1 + 2 r_1} = \frac{1}{1 + \left( -1 + \sqrt{5} \right)} = \frac{1}{\sqrt{5}}$$
$$b_2 = \frac{r_2}{r_2 (1 + 2 r_2)} = \frac{1}{1 + 2 r_2} = \frac{1}{1 + \left( -1 - \sqrt{5} \right)} = -\frac{1}{\sqrt{5}}$$
and, by the last proposition, the sequence with terms
$$a_n = \frac{1}{\sqrt{5}} \frac{1}{r_1^n} - \frac{1}{\sqrt{5}} \frac{1}{r_2^n} = \frac{1}{\sqrt{5}} \left( \frac{2}{-1 + \sqrt{5}} \right)^n - \frac{1}{\sqrt{5}} \left( \frac{2}{-1 - \sqrt{5}} \right)^n = \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^n - \frac{1}{\sqrt{5}} \left( \frac{1-\sqrt{5}}{2} \right)^n$$
solves the Fibonacci recurrence (10.33), in accordance with (10.35). $\blacktriangle$

Example 426 Consider the linear recurrence of order $3$ given by
$$\begin{cases} a_0 = 1 \; , \; a_1 = 2 \; , \; a_2 = 2 \\ a_n = \dfrac{11}{6} a_{n-1} - a_{n-2} + \dfrac{1}{6} a_{n-3} & \text{for } n \ge 3 \end{cases}$$
where $p_1 = 11/6$, $p_2 = -1$, $p_3 = 1/6$, $\alpha_0 = 1$, $\alpha_1 = 2$, and $\alpha_2 = 2$. Since the cubic equation
$$1 - \frac{11}{6} x + x^2 - \frac{1}{6} x^3 = 0$$
has solutions $r_1 = 1$, $r_2 = 2$, and $r_3 = 3$, we have
$$b_i = \frac{1 + \frac{1}{6} r_i - \frac{2}{3} r_i^2}{r_i \left( \frac{11}{6} - 2 r_i + \frac{1}{2} r_i^2 \right)}$$
So, $b_1 = 3/2$, $b_2 = 4$, and $b_3 = -9/2$. By the last proposition, the sequence with terms
$$a_n = \frac{3}{2} + 4 \cdot \frac{1}{2^n} - \frac{9}{2} \cdot \frac{1}{3^n}$$
solves this linear recurrence of order $3$. $\blacktriangle$
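
As a sanity check, the closed form can be compared against the recurrence itself. A minimal Python sketch:

```python
# Verify the closed-form solution of the order-3 recurrence in Example 426.
def a_closed(n: int) -> float:
    return 1.5 + 4 / 2**n - 4.5 / 3**n

a = [1.0, 2.0, 2.0]
for n in range(3, 10):
    a.append(11/6 * a[n-1] - a[n-2] + 1/6 * a[n-3])

print([round(v, 6) for v in a])
print([round(a_closed(n), 6) for n in range(10)])  # the same values
```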

10.6 Infinite patience

In Section 9.1.2 we introduced, for every $\beta \in (0, 1)$, the intertemporal utility function $U : \mathbb{R}^{\infty} \to \mathbb{R}$ given by
$$U(x) = \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t) \tag{10.38}$$
where all the $u_t$ are uniformly bounded (Example 360). Such a utility function ranks all possible consumption streams $x = (x_1, \dots, x_t, \dots) \in \mathbb{R}^{\infty}$. In particular, the higher the subjective discount factor $\beta$, the more the decision maker cares about future periods, that is, the more patient he is.

One may ask what happens in the limit case $\beta \uparrow 1$, that is, when the subjective discount factor tends to $1$.14 Intuitively, we are in an "infinite patience" setting, where all periods – present and future – matter the same to the decision maker. When the horizon $T$ is finite, the answer is simple:
$$\lim_{\beta \uparrow 1} \sum_{t=1}^{T} \beta^{t-1} u_t(x_t) = \sum_{t=1}^{T} u_t(x_t) \tag{10.39}$$
so that the limit case corresponds to the sum of the utilities of all periods, all with equal unitary weight. When the horizon is infinite the problem becomes far more complex because, for the series $\sum_{t=1}^{\infty} u_t(x_t)$ to converge, it must be that $\lim_{t \to \infty} u_t(x_t) = 0$ (Theorem 348), which is hardly justifiable from an economic standpoint.

Let us consider, instead, the limit
$$\lim_{\beta \uparrow 1} (1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t)$$
where $1 - \beta$ is a normalization factor since
$$(1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} = 1 \tag{10.40}$$
Such a limit may not exist:

Example 427 Consider the sequence $\{x_t\}$ given by
$$\underbrace{0, 0}_{2 \text{ elements}} \; , \; \underbrace{1, 1}_{2 \text{ elements}} \; , \; \underbrace{0, 0, 0, 0}_{4 \text{ elements}} \; , \; \underbrace{1, 1, 1, 1, 1, 1, 1, 1}_{8 \text{ elements}} \; , \; \dots$$
where every block of $0$s and $1$s has length equal to the sum of the lengths of the previous blocks. One can show that $\lim_{\beta \uparrow 1} (1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} x_t$ does not exist. $\blacktriangle$

The next remarkable result, whose non-simple proof we omit, shows that the existence of the limit is equivalent to convergence in mean.

14 For the meaning of $\beta \uparrow 1$ we refer the reader to Section 8.8.2.

Theorem 428 (Frobenius-Littlewood) Let $x = \{x_t\}$ be a bounded sequence. The limit $\lim_{\beta \uparrow 1} (1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} x_t$ exists if and only if $\{x_t\}$ converges in the sense of Cesàro. In this case,
$$\lim_{\beta \uparrow 1} (1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} x_t = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} x_t$$

The theorem suggests defining the function $V : \mathbb{R}^{\infty} \to \mathbb{R}$ by
$$V(x) = (1 - \beta) U(x) \qquad \forall x \in \mathbb{R}^{\infty}$$
For every $\beta \in (0, 1)$ the function $V$ is equivalent to $U$:
$$V(x) \ge V(y) \iff U(x) \ge U(y) \qquad \forall x, y \in \mathbb{R}^{\infty}$$
In light of (10.40), $V$ is a normalization of $U$ that assigns value $1$ to the constant stream $x_t = 1$ for every $t$ (at least when $u_t(1) = 1$ for every $t$).

Thanks to Frobenius-Littlewood's Theorem, we have
$$\lim_{\beta \uparrow 1} V(x) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_t(x_t)$$
as long as the limits exist. The infinite patience case is thus captured by the limit of the average utilities
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_t(x_t) \tag{10.41}$$
that is, by the Cesàro limit of the sequence $\{u_t(x_t)\}$. Such a criterion can thus be seen as the limit case for $\beta \uparrow 1$ of the intertemporal utility function $V$.

The role that the sum $\sum_{t=1}^{T} u_t(x_t)$ plays in the finite horizon case (10.39) is thus played in the infinite horizon case by the limit of the average utilities (10.41). This important economic application of Frobenius-Littlewood's Theorem allows us to elegantly conclude this chapter.
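
Theorem 428 is also easy to illustrate numerically. The following is a minimal Python sketch for the periodic stream $1, 0, 1, 0, \dots$, whose Cesàro mean is $1/2$ (the closed form $1/(1+\beta)$ in the comments follows from the geometric series):

```python
# Illustration of Theorem 428 for the periodic stream 1, 0, 1, 0, ...
# Its Cesàro mean is 1/2, and (1 - b) * sum b^(t-1) x_t approaches 1/2
# as the discount factor b tends to 1.
def discounted_average(b: float, n_terms: int = 200_000) -> float:
    return (1 - b) * sum(b ** (t - 1) * (t % 2) for t in range(1, n_terms + 1))

for b in (0.9, 0.99, 0.999):
    print(b, discounted_average(b))
# 0.9   -> ≈ 0.5263  (exactly 1/(1+b))
# 0.99  -> ≈ 0.5025
# 0.999 -> ≈ 0.5003
```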
Part III

Continuity

Chapter 11

Limits of functions

11.1 Introductory examples

The concept of limit has been introduced to formalize the idea of "how a function behaves when the independent variable approaches (tends to) a point $x_0$". To fix ideas, we start with some introductory examples in which we consider scalar functions, and then move to a rigorous formalization in the scalar case as well as in the general multivariable case.

Consider the function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ defined by
$$f(x) = \frac{\sin x}{x}$$
and analyze its behavior for points closer and closer to $x_0 = 0$, i.e., to the origin. In the next table we find the values that the function assumes at several such points:

x    | -0.1  | -0.01   | -0.001    | 0.001     | 0.01    | 0.1
f(x) | 0.998 | 0.99998 | 0.9999999 | 0.9999999 | 0.99998 | 0.998

By inserting other points, closer and closer to the origin, we can verify that the corresponding values of $f(x)$ get closer and closer to $L = 1$. In this case we say that "the limit of $f(x)$, as $x$ tends to $x_0 = 0$, is $L = 1$". In symbols,
$$\lim_{x \to 0} f(x) = 1$$
Note that in this example the point $x_0 = 0$ where we take the limit does not belong to the domain of the function $f$.
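
The table is easy to reproduce. A minimal Python sketch:

```python
# Values of f(x) = sin(x)/x near the origin, as in the table above.
import math

for x in (-0.1, -0.01, -0.001, 0.001, 0.01, 0.1):
    print(f"{x:>8} {math.sin(x) / x:.10f}")
# every value is close to 1, and gets closer as x approaches 0
```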

Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by
$$f(x) = \begin{cases} x & \text{for } x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$$
Its graph is:

How does $f$ behave when $x$ approaches the point $x_0 = 1$? Taking points closer and closer to $x_0 = 1$ we have:

x    | 0.98 | 0.99 | 0.999 | 0.9999 | 1.0001 | 1.001 | 1.01 | 1.02
f(x) | 0.98 | 0.99 | 0.999 | 0.9999 | 1      | 1     | 1    | 1

Adding other points, closer and closer to $x_0 = 1$, we can verify that, as $x$ gets closer and closer to $x_0 = 1$, $f(x)$ gets closer and closer to $L = 1$. In this case we say that "the limit of $f(x)$ as $x$ tends to $x_0 = 1$ is $L = 1$", and write
$$\lim_{x \to 1} f(x) = 1$$
Observe that the value that the function assumes at the point $x_0 = 1$ is $f(1) = 1$, so the limit $L = 1$ is equal to the value $f(1)$ of the function at $x_0 = 1$.

Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by
$$f(x) = \begin{cases} x & \text{if } x < 1 \\ 2 & \text{if } x = 1 \\ 1 & \text{if } x > 1 \end{cases}$$
Compared to the previous example we have introduced a "jump" at the point $x = 1$, so that the function jumps to the value $2$ – indeed, $f(1) = 2$. The graph now is:

If we study the behavior of $f$ for values of $x$ closer and closer to $x_0 = 1$, we build the same table as before (because, except at the point $1$, the function is identical to the one in the previous example). Therefore, also in this case we have
$$\lim_{x \to 1} f(x) = 1$$
This time the value that the function assumes at the point $1$ is $f(1) = 2$, different from the value $L = 1$ of the limit.

Until now we have approached the point $x_0$ from both the right and the left, that is, bilaterally (in a two-sided manner). Sometimes this is not possible; rather, one can approach $x_0$ only from the right or only from the left, that is, unilaterally (in a one-sided manner). Consider, for example, the function $f : \mathbb{R} \setminus \{2\} \to \mathbb{R}$ given by $f(x) = 1/(x-2)$ and let $x_0 = 2$. Its graph is:

"To approach the point $x_0 = 2$ from the right" means to approach it by considering only values $x > 2$:

x    | 2.0001 | 2.001 | 2.01 | 2.05 | 2.1 | 2.2 | 2.5
f(x) | 10,000 | 1,000 | 100  | 20   | 10  | 5   | 2

For values closer and closer to $2$ from the right, the function assumes values that are larger and larger, unbounded above. In this case we say that "the function $f$ tends to $+\infty$ as $x$ tends to $2$ from the right" and write
$$\lim_{x \to 2^+} f(x) = +\infty$$
Let us see now what happens by approaching $x_0 = 2$ from the left, that is, by considering values $x < 2$:

x    | 1.5 | 1.8 | 1.9 | 1.95 | 1.99 | 1.999  | 1.9999
f(x) | -2  | -5  | -10 | -20  | -100 | -1,000 | -10,000

For values closer and closer to $2$ from the left, the function assumes negative values that are larger and larger in absolute value. In this case we say that "the function $f$ tends to $-\infty$ as $x$ tends to $2$ from the left" and write
$$\lim_{x \to 2^-} f(x) = -\infty$$
Summing up, as the graph also suggests, we have
$$+\infty = \lim_{x \to 2^+} f(x) \ne \lim_{x \to 2^-} f(x) = -\infty \tag{11.1}$$
The "right-hand" and the "left-hand" limits both exist but are (dramatically) different.

As we will see in Proposition 445, the fact that the one-sided limits are distinct reflects the fact that the two-sided limit of $f(x)$, as $x$ tends to $2$, does not exist. Indeed, the equality of the one-sided limits is equivalent to the existence of the two-sided limit. For example, if we modify the previous function by considering $f(x) = 1/|x - 2|$, we have
$$\lim_{x \to 2^-} f(x) = \lim_{x \to 2^+} f(x) = \lim_{x \to 2} f(x) = +\infty \tag{11.2}$$
Now the two one-sided limits are equal and coincide with the two-sided one, which in this case exists (even if infinite).

Considering again the function $f(x) = 1/(x - 2)$, what happens if, as $x_0$, we take $+\infty$? In other terms, what happens if we consider larger and larger values of $x$? Look at the following table:

x    | 100    | 1,000    | 10,000 | 100,000 | 1,000,000
f(x) | 0.0102 | 0.001002 | 0.0001 | 0.00001 | 0.000001

For increasingly large values of $x$, the function assumes values closer and closer to $0$. In this case we say that "the function tends to $0$ as $x$ tends to $+\infty$" and write
$$\lim_{x \to +\infty} f(x) = 0$$
Observe that the function assumes values close to $0$, but always strictly positive: $f$ approaches $0$ "from above". If we want to emphasize this aspect we write
$$\lim_{x \to +\infty} f(x) = 0^+$$
where $0^+$ suggests that, while converging to $0$, the values of $f(x)$ remain positive.

What happens if, instead, as $x_0$ we take $-\infty$? We have the following table of values:

x    | -100    | -1,000    | -10,000 | -100,000 | -1,000,000
f(x) | -0.0098 | -0.000998 | -0.0001 | -0.00001 | -0.000001

For negative values of $x$ that are larger and larger in absolute value, the function assumes values closer and closer to $0$. We say that "the function tends to $0$ as $x$ tends to $-\infty$" and write
$$\lim_{x \to -\infty} f(x) = 0$$
If we want to emphasize that the function, in approaching $0$, remains negative, we write
$$\lim_{x \to -\infty} f(x) = 0^-$$

Finally, after having seen various types of limits, let us consider a function that has no limit, i.e., one that does not exhibit any "definite trend". Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by
$$f(x) = \sin \frac{1}{x}$$
At the origin, i.e., at $x_0 = 0$, the function does not have a limit: for $x$ closer and closer to the origin, the function keeps oscillating with a tighter and tighter sinusoidal trend:

[Graph: $y = \sin(1/x)$, oscillating ever faster as $x$ approaches the origin]

The origin is, however, the only point where this function does not have a limit: at all other points of the domain the limit exists. A much more dramatic behavior is displayed by the Dirichlet function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} 1 & \text{for } x \in \mathbb{Q} \\ 0 & \text{for } x \notin \mathbb{Q} \end{cases} \tag{11.3}$$
This remarkable function oscillates "obsessively" between the values $0$ and $1$ because, by the density of the rational numbers in the real numbers, for any pair $x < y$ of real numbers there exists a rational number $q$ such that $x < q < y$. As we will see, the Dirichlet function does not have a limit at any point $x_0 \in \mathbb{R}$.

11.2 Functions of a single variable

11.2.1 Two-sided limits

The introductory examples exhibit four possible cases in which the limit exists, depending on the finiteness or not of the point $x_0$ and of the value $L$ of the limit. Specifically:

(i) $\lim_{x \to x_0} f(x) = L \in \mathbb{R}$, i.e., both the point $x_0$ and the limit $L$ are finite (scalars);

(ii) $\lim_{x \to x_0} f(x) = \pm\infty$, i.e., the point $x_0$ is finite, but the limit $L$ is infinite;

(iii) $\lim_{x \to +\infty} f(x) = L \in \mathbb{R}$ or $\lim_{x \to -\infty} f(x) = L \in \mathbb{R}$, i.e., the point $x_0$ is infinite, but the limit $L$ is finite;

(iv) $\lim_{x \to +\infty} f(x) = \pm\infty$ or $\lim_{x \to -\infty} f(x) = \pm\infty$, i.e., both the point $x_0$ and the limit $L$ are infinite.

We formalize the notion of limit in these cases. We begin with case (i). First of all, let us observe that we can meaningfully talk of the limit at $x_0 \in \mathbb{R}$ of a function with domain $A$ only when $x_0$ is a limit point of $A$. Indeed, only in this case is the sentence "as $x \in A$ tends to $x_0$" meaningful.

Definition 429 Given a function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ and a limit point $x_0$ of $A$, we write
$$\lim_{x \to x_0} f(x) = L \in \mathbb{R}$$
if, for every $\varepsilon > 0$, there exists a $\delta_\varepsilon > 0$ such that, for every $x \in A$,
$$0 < |x - x_0| < \delta_\varepsilon \implies |f(x) - L| < \varepsilon \tag{11.4}$$
The value $L$ is called the limit of the function at $x_0$.

Note that (11.4) can be written as
$$0 < d(x, x_0) < \delta_\varepsilon \implies d(f(x), L) < \varepsilon \tag{11.5}$$
The definition requires that, for any fixed quantity $\varepsilon > 0$, arbitrarily small, there exists a value $\delta_\varepsilon$ such that all the points $x \in A$ that are $\delta_\varepsilon$-close to the point $x_0$ have images $f(x)$ that are $\varepsilon$-close to the value $L$ of the limit. Note that the condition $d(x, x_0) > 0$ amounts to requiring $x \ne x_0$.

Example 430 Let us show that limx!2 (3x 5) = 1. We have to verify that, for every
" > 0, there exists " > 0 such that

jx 2j < " =) j(3x 5) 1j < " (11.6)

We have j(3x 5) 1j < " if and only if jx 2j < "=3. Therefore, setting " = "=3 yields
(11.6). N

Intuitively, the smaller (so the more demanding) the value of " is, the smaller " is. To
make more precise this intuition, note that the relationship between " and " is similar,
mutatis mutandis, to that between " and n" in the de…nition of converge of sequences, which
we discussed at length after De…nition 277. Now to show that L is a limit of f at x0 , you
have to pass the following, still highly demanding, test: given any threshold " > 0 selected
by a relentless examiner, you have to be able to come up with a small enough " so that all
points that are close to x0 have images that are " close to L.
Note that " depends on " and is not unique: when we …nd a value of " , all smaller
values also work …ne. For instance, in the last example we can choose as " any (positive)
value lower than "=3. But, one typically focuses on the largest such " (if exists), which is a
genuine threshold value. It is in terms of such “threshold” " that we can, indeed, say: the
smaller (so the more demanding) the value of " is, the smaller " is.

N.B. The value of " , besides depending on ", clearly depends also on x0 . This dependence
is, however, so obvious that it can safely omitted in the notation. O

O.R. It is hard to overestimate the importance of the previous “test” in making rigorous
limit notions in mathematics. Its origin traces back to Eudoxus’method of exhaustion that
underlies integration theory (Chapter 35). Perhaps, the best classic description of a form of
such test is Proposition 1 in Euclid’s Book X: “Two unequal magnitudes being set out, if
from the greater there is subtracted a magnitude greater than its half, and from that which
is left a magnitude greater than its half, and if this process is repeated continually, then there
will be left some magnitude less than the lesser magnitude set out” (trans. Heath – we put
in italics the words where the test emerges). Yet, it was only in XIX century that, through
the works of Cauchy and Weierstrass, the test took the form that we presented in De…nitions
277 and 429. H
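
The "examiner's test" for Example 430 can be checked by brute force. A minimal Python sketch (random sampling under the stated choice $\delta_\varepsilon = \varepsilon/3$):

```python
# Sanity check of the epsilon-delta argument in Example 430: with
# delta = eps/3, every x with 0 < |x - 2| < delta satisfies
# |(3x - 5) - 1| < eps, since |(3x - 5) - 1| = 3|x - 2|.
import random

def passes(eps: float, trials: int = 10_000) -> bool:
    delta = eps / 3
    for _ in range(trials):
        x = 2 + random.uniform(-delta, delta)
        if 0 < abs(x - 2) < delta and not abs((3 * x - 5) - 1) < eps:
            return False
    return True

print(all(passes(eps) for eps in (1.0, 0.1, 1e-3, 1e-6)))  # True
```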

We now provide an example in which the limit does not exist.

Example 431 For the Dirichlet function (11.3), $\lim_{x \to x_0} f(x)$ does not exist for any $x_0 \in \mathbb{R}$. Indeed, given $x_0 \in \mathbb{R}$, let us suppose, by contradiction, that $\lim_{x \to x_0} f(x)$ exists and is equal to $L \in \mathbb{R}$. Let $0 < \varepsilon < 1/2$. By definition, there exists $\delta = \delta_\varepsilon$ such that1
$$x_0 \ne x \in (x_0 - \delta, x_0 + \delta) \implies |f(x) - L| < \varepsilon < \frac{1}{2}$$
In each neighborhood $(x_0 - \delta, x_0 + \delta)$ there exist both rational points and irrational points distinct from $x_0$ (see Proposition 39), so there are points $x', x'' \in (x_0 - \delta, x_0 + \delta)$ for which $f(x') = 1$ and $f(x'') = 0$. We thus reach the contradiction
$$1 = |1 - 0| = \left| f(x') - f(x'') \right| \le \left| f(x') - L \right| + \left| L - f(x'') \right| < \frac{1}{2} + \frac{1}{2} = 1$$
Therefore, $\lim_{x \to x_0} f(x)$ does not exist at any point $x_0 \in \mathbb{R}$. $\blacktriangle$

1 The expression "$x_0 \ne x \in (x_0 - \delta, x_0 + \delta)$" means "$x \in (x_0 - \delta, x_0 + \delta)$ and $x \ne x_0$". In words, $x$ belongs to the interval $(x_0 - \delta, x_0 + \delta)$ but is distinct from $x_0$. To ease notation, similar expressions are used throughout the chapter.

Definition 429, in which the distances are made explicit, is of the "$\varepsilon$-$\delta$" type. In view of (11.5), it is immediate to rewrite it in the language of neighborhoods. To make the notation more expressive, we denote by $U_\delta(x_0)$ a neighborhood of $x_0$ of radius $\delta$ and by $V_\varepsilon(L)$ a neighborhood of $L$ of radius $\varepsilon$. Graphically, the former is a neighborhood on the horizontal axis, while the latter is a neighborhood on the vertical axis.

Definition 432 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \mathbb{R}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
$$x_0 \ne x \in U_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L) \tag{11.7}$$

As for convergence of sequences, the rewriting in the language of neighborhoods is very evocative.2 In particular, via the topology of the extended real line (Section 8.8.4), we can immediately generalize the definition so as to include also the cases (ii), (iii) and (iv), in analogy with what we did with Definition 292 for limits of sequences.

Definition 433 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \overline{\mathbb{R}}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
$$x_0 \ne x \in U_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L) \tag{11.8}$$

The difference between Definitions 432 and 433 is apparently minor: in the former definition we have $\mathbb{R}$, in the latter we have $\overline{\mathbb{R}}$. This simple modification allows us, however, to consider also the cases (ii), (iii) and (iv). In particular:

case (ii) is obtained by setting $x_0 \in \mathbb{R}$ and $L = \pm\infty$;

case (iii) is obtained by setting $x_0 = \pm\infty$ and $L \in \mathbb{R}$;

case (iv) is obtained by setting $x_0 = \pm\infty$ and $L = \pm\infty$.

To exemplify, we consider explicitly a few subcases, leaving the others to the reader. We start with the subcase $x_0 \in \mathbb{R}$ and $L = +\infty$ of (ii). In this case Definition 433 reduces to the following "$\varepsilon$-$\delta$" form (that is, with distances made explicit).

2 In a nutshell, we can say that "there exists a neighborhood" takes the place of the adverb "eventually" used for sequences.

Definition 434 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = +\infty$$
if, for every $M > 0$, there exists $\delta_M > 0$ such that, for every $x \in A$, we have
$$0 < |x - x_0| < \delta_M \implies f(x) > M \tag{11.9}$$

In other words, for each constant $M$, no matter how large, there exists $\delta_M > 0$ such that all the points $x_0 \ne x \in A$ that are $\delta_M$-close to $x_0$ have images $f(x)$ larger than $M$.

Example 435 Let $f : \mathbb{R} \setminus \{2\} \to \mathbb{R}$ be given by $f(x) = 1/|x - 2|$. Graphically:

The point $x_0 = 2$ is a limit point of $\mathbb{R} \setminus \{2\}$, so we can consider $\lim_{x \to 2} f(x)$. Let $M > 0$. Setting $\delta_M = 1/M$, we have
$$0 < |x - x_0| < \delta_M \iff 0 < |x - 2| < \frac{1}{M} \implies \frac{1}{|x - 2|} > M$$
and therefore
$$0 < |x - 2| < \delta_M \implies f(x) > M$$
That is, $\lim_{x \to 2} f(x) = +\infty$. $\blacktriangle$

Let us now consider case (iii) with $x_0 = +\infty$ and $L \in \mathbb{R}$. Here Definition 433 reduces to the following "$\varepsilon$-$\delta$" one.

Definition 436 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$, with $A$ unbounded above.3 We write
$$\lim_{x \to +\infty} f(x) = L \in \mathbb{R}$$
if, for every $\varepsilon > 0$, there exists $M_\varepsilon > 0$ such that, for every $x \in A$, we have
$$x > M_\varepsilon \implies |f(x) - L| < \varepsilon \tag{11.10}$$

In this case, for each choice of $\varepsilon > 0$, arbitrarily small, there exists a value $M_\varepsilon$ such that the images of points $x$ greater than $M_\varepsilon$ are $\varepsilon$-close to $L$.

Example 437 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 1 + e^{-x}$. By Lemma 290, $+\infty$ is a limit point of $\mathbb{R}$. We can, therefore, consider the limit $\lim_{x \to +\infty} f(x)$. Let us verify that $\lim_{x \to +\infty} f(x) = 1$. Let $\varepsilon > 0$. We have
$$|f(x) - L| = \left| 1 + e^{-x} - 1 \right| = e^{-x} < \varepsilon \iff -x < \log \varepsilon \iff x > -\log \varepsilon$$
Therefore, setting $M_\varepsilon = -\log \varepsilon$, we have
$$x > M_\varepsilon \implies |f(x) - L| < \varepsilon$$
That is, $\lim_{x \to +\infty} f(x) = 1$. $\blacktriangle$

Finally, we consider case (iv) with $x_0 = L = +\infty$. In this case Definition 433 reduces to the following one:

Definition 438 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$, with $A$ unbounded above. We write
$$\lim_{x \to +\infty} f(x) = +\infty$$
if, for every $M > 0$, there exists $N$ such that, for every $x \in A$, we have
$$x > N \implies f(x) > M \tag{11.11}$$

Example 439 Let $f : \mathbb{R}_+ \to \mathbb{R}$ be given by $f(x) = \sqrt{x}$. By Lemma 290, $+\infty$ is a limit point of $\mathbb{R}_+$, so we can consider $\lim_{x \to +\infty} f(x)$. Let us verify that $\lim_{x \to +\infty} f(x) = +\infty$. For every $M > 0$ we have
$$f(x) > M \iff \sqrt{x} > M \iff x > M^2$$
Setting $N = M^2$ yields
$$x > N \implies f(x) > M$$
That is, $\lim_{x \to +\infty} f(x) = +\infty$. $\blacktriangle$

3 By Lemma 290, the fact that $A$ is unbounded above guarantees that $+\infty$ is a limit point of $A$. For example, this is the case when $(a, +\infty) \subseteq A$.

N.B. If $A = \mathbb{N}_+$, that is, if $f : \mathbb{N}_+ \to \mathbb{R}$ is a sequence, with the last two definitions we recover the notions of convergence and of (positive) divergence for sequences. The theory of limits of functions extends, therefore, the theory of limits of sequences of Chapter 8. In this respect, note that the set $\mathbb{N}_+$ has only one limit point: $+\infty$. This is why the only meaningful limit for sequences is $\lim_{n \to \infty}$. O

O.R. It may be useful to see the concept of limit "in three stages" (as a rocket):

(i) for every neighborhood $V$ of $L$ (on the ordinate axis)

(ii) there exists a neighborhood $U$ of $x_0$ (on the abscissa axis) such that

(iii) all the values of $f$ at $x \in U$, $x \ne x_0$, belong to $V$; i.e., all the images – excluding at most $f(x_0)$ – of $f$ in $U \cap A$ belong to $V$: $f((U \cap A) \setminus \{x_0\}) \subseteq V$.

[Figure: the neighborhoods $V(L)$ on the vertical axis and $U(x_0)$ on the horizontal axis]

We are often tempted to simplify to two stages: "the values of $x$ close to $x_0$ have images $f(x)$ close to $L$", that is,

for every $U$ there exists $V$ such that $f((U \cap A) \setminus \{x_0\}) \subseteq V$

Unfortunately, this is an empty statement that is always (vacuously) true, as the figure shows:

[Figure: for any neighborhood $U(x_0)$, however small, a sufficiently large neighborhood $V(L)$ always contains $f(U \setminus \{x_0\})$]

In the figure, for every neighborhood $U(x_0)$ of $x_0$, however small, there always exists a neighborhood (possibly quite big) $V(L)$ of $L$ inside which all the values of $f(x)$ with $x \in U \setminus \{x_0\}$ fall. Such a $V$ can always be taken as an open interval that contains $f(U \setminus \{x_0\})$. H

11.2.2 One-sided limits

We cannot always talk of two-sided (or bilateral) limits. For example, consider the simple function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 2 & \text{if } x \ge 1 \\ x & \text{if } x < 1 \end{cases}$$
with graph

It is easy to see that $\lim_{x \to 1} f(x)$ does not exist. In these cases one can resort to the weaker notion of one-sided (or unilateral) limit, which we already met in an intuitive way in the introductory examples of this chapter. Those examples, indeed, suggest two possible cases in which the right limit exists:

(i) $\lim_{x \to x_0^+} f(x) \in \mathbb{R}$;

(ii) $\lim_{x \to x_0^+} f(x) = \pm\infty$.

Similarly, we also have two "left" cases. Note that in both (i) and (ii) the point $x_0$ is in $\mathbb{R}$, while the value of the limit is in $\overline{\mathbb{R}}$.

The next "right" definition includes both cases.

Definition 440 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = L \in \overline{\mathbb{R}}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a right neighborhood $U^+_{\delta_\varepsilon}(x_0) = [x_0, x_0 + \delta_\varepsilon)$ of $x_0$ such that
$$x_0 \ne x \in U^+_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L) \tag{11.12}$$
The value $L$ is called the right limit of the function at $x_0$.

In a similar way we can define left limits, denoted by $\lim_{x \to x_0^-} f(x)$, as readers can check.

By excluding $x_0$, the neighborhood $U^+_{\delta_\varepsilon}(x_0)$ reduces to $(x_0, x_0 + \delta_\varepsilon)$, so (11.12) can be more simply written as
$$x \in (x_0, x_0 + \delta_\varepsilon) \cap A \implies f(x) \in V_\varepsilon(L)$$
But it is important to keep track of neighborhoods.

This definition includes both cases:

case (i) is obtained by setting $L \in \mathbb{R}$;

case (ii) is obtained by setting $L = \pm\infty$.

In case (i), Definition 440 reduces to the following "$\varepsilon$-$\delta$" one.

Definition 441 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = L \in \mathbb{R}$$
if, for every $\varepsilon > 0$, there exists $\delta = \delta_\varepsilon > 0$ such that, for every $x \in A$,
$$x_0 < x < x_0 + \delta \implies |f(x) - L| < \varepsilon \tag{11.13}$$

Example 442 Consider $f : \mathbb{R}_+ \to \mathbb{R}$ given by $f(x) = \sqrt{x}$. We claim that $\lim_{x \to 0^+} \sqrt{x} = 0$. Let $\varepsilon > 0$. Then
$$|f(x) - L| = \sqrt{x} < \varepsilon \iff x < \varepsilon^2$$
Setting $\delta_\varepsilon = \varepsilon^2$, we have
$$0 < x < \delta_\varepsilon \implies |f(x) - L| < \varepsilon$$
That is, $\lim_{x \to 0^+} \sqrt{x} = 0$. $\blacktriangle$

Let us consider the subcase $L = +\infty$ of (ii), leaving to the reader the subcase $L = -\infty$. For this case, Definition 440 reduces to the following "$\varepsilon$-$\delta$" one.

Definition 443 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = +\infty$$
if, for every $M > 0$, there exists $\delta_M > 0$ such that, for every $x \in A$,
$$x_0 < x < x_0 + \delta_M \implies f(x) > M \tag{11.14}$$

We close this section with an example, from the introduction, in which both one-sided limits (right and left) exist but are different.

Example 444 Let $f : \mathbb{R} \setminus \{2\} \to \mathbb{R}$ be given by $f(x) = 1/(x - 2)$. The point $x_0 = 2$ is a limit point of $\mathbb{R} \setminus \{2\}$, so we can consider the two one-sided limits $\lim_{x \to 2^+} f(x)$ and $\lim_{x \to 2^-} f(x)$. Let $M > 0$. Setting $\delta_M = 1/M$, for every $x > 2$ we have
$$x - x_0 < \delta_M \iff x - 2 < \frac{1}{M} \implies \frac{1}{x - 2} > M$$
Therefore
$$0 < x - 2 < \delta_M \implies f(x) > M$$
that is, $\lim_{x \to 2^+} f(x) = +\infty$. On the other hand, for every $x < 2$ we have
$$x_0 - x < \delta_M \iff 2 - x < \frac{1}{M} \implies \frac{1}{x - 2} < -M$$
Therefore
$$0 < 2 - x < \delta_M \implies f(x) < -M$$
That is, $\lim_{x \to 2^-} f(x) = -\infty$. We conclude that the two one-sided limits exist but are dramatically different. This formally proves (11.1), which was intuitively discussed in the introduction. $\blacktriangle$

11.2.3 Relations between one-sided and two-sided limits

Next we show that two-sided limits (at finite points) exist if and only if the corresponding one-sided limits exist and are equal. In other words, a two-sided limit can be regarded as the case in which the two one-sided limits coincide. When they differ (or at least one of them does not exist), the two-sided limit no longer exists.

Proposition 445 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a point for which there exists a neighborhood $B_\varepsilon(x_0)$ such that $B_\varepsilon(x_0) \setminus \{x_0\} \subseteq A$. Then, $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$ if and only if
$$\lim_{x \to x_0^+} f(x) = \lim_{x \to x_0^-} f(x) = L \in \overline{\mathbb{R}}$$

Note that $B_\varepsilon(x_0) \setminus \{x_0\}$ is a neighborhood of $x_0$ deprived of $x_0$ itself, so "with a hole" in the middle. The result requires that at least one such neighborhood lie in $A$. Clearly, if $x_0 \in A$ this amounts to requiring that $x_0$ be an interior point of $A$. But the hole permits $x_0$ to be outside $A$. For instance, this is the case if we consider (again) the function $f(x) = 1/|x - 2|$ and the point $x_0 = 2$, which is outside the domain of $f$. We have $\lim_{x \to 2} f(x) = +\infty$ and hence, by Proposition 445,
$$\lim_{x \to 2^-} f(x) = \lim_{x \to 2^+} f(x) = \lim_{x \to 2} f(x) = +\infty$$
which confirms (11.2). For $f(x) = 1/(x - 2)$ we have, instead,
$$+\infty = \lim_{x \to 2^+} f(x) \ne \lim_{x \to 2^-} f(x) = -\infty$$
So, by Proposition 445 the two-sided limit $\lim_{x \to 2} f(x)$ does not exist.

Proof We prove the proposition for $L \in \mathbb{R}$, leaving to the reader the case $L = \pm\infty$. Moreover, for simplicity we suppose that $x_0$ is an interior point of $A$.
"If". We show that $\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) = L$ implies $\lim_{x \to x_0} f(x) = L$. Let $\varepsilon > 0$. Since $\lim_{x \to x_0^+} f(x) = L$, there exists $\delta'_\varepsilon > 0$ such that, for every $x \in (x_0, x_0 + \delta'_\varepsilon) \cap A$, we have $|f(x) - L| < \varepsilon$. On the other hand, since $\lim_{x \to x_0^-} f(x) = L$, there exists $\delta''_\varepsilon > 0$ such that for every $x \in (x_0 - \delta''_\varepsilon, x_0) \cap A$ we have $|f(x) - L| < \varepsilon$. Let $\delta_\varepsilon = \min \{\delta'_\varepsilon, \delta''_\varepsilon\}$. Then
$$x \in (x_0, x_0 + \delta_\varepsilon) \cap A \implies |f(x) - L| < \varepsilon \tag{11.15}$$
and
$$x \in (x_0 - \delta_\varepsilon, x_0) \cap A \implies |f(x) - L| < \varepsilon \tag{11.16}$$
that is,
$$x_0 \ne x \in (x_0 - \delta_\varepsilon, x_0 + \delta_\varepsilon) \cap A \implies |f(x) - L| < \varepsilon$$
Therefore, $\lim_{x \to x_0} f(x) = L$.
"Only if". We show that $\lim_{x \to x_0} f(x) = L$ implies $\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) = L$. Let $\varepsilon > 0$. Since $\lim_{x \to x_0} f(x) = L$, there exists $\delta_\varepsilon > 0$ such that
$$x_0 \ne x \in (x_0 - \delta_\varepsilon, x_0 + \delta_\varepsilon) \cap A \implies |f(x) - L| < \varepsilon \tag{11.17}$$
Since $x_0$ is an interior point, both intersections $(x_0 - \delta_\varepsilon, x_0) \cap A$ and $(x_0, x_0 + \delta_\varepsilon) \cap A$ are non-empty. Therefore, (11.17) implies both (11.15) and (11.16), so $\lim_{x \to x_0^+} f(x) = \lim_{x \to x_0^-} f(x) = L$.

As the reader may have noted, when $A$ is an interval the hypothesis $B_\varepsilon(x_0) \setminus \{x_0\} \subseteq A$ of Proposition 445 forbids $x_0$ from being a boundary point. Indeed, to fix ideas, assume that $A$ is an interval of the real line with endpoints $a < b$.4 When $x_0 = a = \inf A$, it does not make sense to talk of the one-sided limit $\lim_{x \to a^-} f(x)$, while when $x_0 = b = \sup A$ it does not make sense to talk of the one-sided limit $\lim_{x \to b^+} f(x)$. So, at the endpoints one of the one-sided limits becomes meaningless.

Interestingly, at the endpoints we have, instead, $\lim_{x \to a} f(x) = \lim_{x \to a^+} f(x)$ and $\lim_{x \to b} f(x) = \lim_{x \to b^-} f(x)$. Indeed, the definition of two-sided limit is perfectly satisfied: for each neighborhood $V$ of $L$ there exists a neighborhood – necessarily one-sided because $x_0$ is an endpoint – such that the images of $f$, except perhaps $f(x_0)$, fall in $V$.

A similar observation can be made, more generally, at each boundary point $x_0$ of $A$. For instance, if $A$ is the half-line $[x_0, +\infty)$, the left limit at $x_0$ is meaningless: for $f(x) = \sqrt{x}$ and $x_0 = 0$, the left limit $\lim_{x \to 0^-} \sqrt{x}$ is meaningless.

4 In other words, one of the following four cases holds: (i) $A = (a, b)$; (ii) $A = [a, b)$; (iii) $A = (a, b]$; (iv) $A = [a, b]$.
Example 446 Let $f : [0, +\infty) \to \mathbb{R}$ be given by $f(x) = \sqrt{x}$. We just remarked that $\lim_{x \to 0^-} f(x)$ is meaningless, while in Example 442 we saw that $\lim_{x \to 0^+} f(x) = 0$. By what we just noted, we can also write $\lim_{x \to 0} f(x) = 0$. It is instructive to compute this two-sided limit directly, through Definition 429. Let $\varepsilon > 0$. As we saw in Example 442, we have
$$|f(x) - L| = \sqrt{x} < \varepsilon \iff x < \varepsilon^2$$
Setting $\delta_\varepsilon = \varepsilon^2$, for every $x \in A$, that is, for every $x \ge 0$, we have
$$0 < |x - x_0| < \delta_\varepsilon \iff 0 < x < \delta_\varepsilon \implies |f(x) - L| < \varepsilon$$
Therefore, $\lim_{x \to 0} \sqrt{x} = 0$. $\blacktriangle$

11.2.4 Grand finale

We conclude by observing that in the general Definition 433 of the two-sided limit – which includes all the cases of finite or infinite points and finite or infinite limits – the mention of $\varepsilon$ and $\delta_\varepsilon$ is actually superfluous. We can therefore rewrite such a definition in the following neater way.

Definition 447 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \overline{\mathbb{R}}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$$
if, for every neighborhood $V$ of $L$, there exists a neighborhood $U$ of $x_0$ such that
$$f((U \cap A) \setminus \{x_0\}) \subseteq V$$

It is this version of the two-sided limit that the reader will find generalized to topological spaces in more advanced courses. A similar general version holds for one-sided limits, as the reader can check.

11.3 Functions of several variables

11.3.1 Definition

The extension of the definition of limit, $\lim_{x \to x_0} f(x) = L$, to functions of several variables $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is altogether natural, almost effortless. Indeed, once we consider neighborhoods in $\mathbb{R}^n$ defined through the general distance $d(x, x_0) = \|x - x_0\|$, the sentence "to approach $x_0$" continues to mean "the distance between $x$ and $x_0$ becomes smaller and smaller". Formally:5

Definition 448 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}^n$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \mathbb{R} \tag{11.18}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
$$x_0 \ne x \in U_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L) \tag{11.19}$$
The value $L$ is called the limit of the function at $x_0$.

5 For brevity, we consider only the two-sided case $x_0 \in \mathbb{R}^n$, leaving the other cases to readers.

Definition 433 is the special case with $n = 1$. In the "$\varepsilon$-$\delta$" version, (11.18) holds if, for every $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that, for every $x \in A$,
$$0 < d(x, x_0) = \|x - x_0\| < \delta_\varepsilon \implies d(f(x), L) = |f(x) - L| < \varepsilon \tag{11.20}$$
Clearly, (11.20) reduces to (11.5) when $n = 1$, i.e., when $\|x - x_0\|$ reduces to $|x - x_0|$.

Example 449 Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by $f(x) = 1 + \sum_{i=1}^{n} x_i$. We verify that $\lim_{x \to 0} f(x) = 1$. Let $\varepsilon > 0$. We have
$$d(f(x), 1) = \left| 1 + \sum_{i=1}^{n} x_i - 1 \right| < \varepsilon \iff \left| \sum_{i=1}^{n} x_i \right| < \varepsilon$$
Set $\delta_\varepsilon = \varepsilon / n$. Since $\left| \sum_{i=1}^{n} x_i \right| \le \sum_{i=1}^{n} |x_i|$, we have
$$d(x, 0) < \delta_\varepsilon \iff \sqrt{\sum_{i=1}^{n} x_i^2} < \frac{\varepsilon}{n} \implies x_i^2 < \frac{\varepsilon^2}{n^2} \;\; \forall i = 1, \dots, n \implies |x_i| < \frac{\varepsilon}{n} \;\; \forall i = 1, \dots, n \implies \sum_{i=1}^{n} |x_i| < \varepsilon \implies d(f(x), 1) = \left| \sum_{i=1}^{n} x_i \right| \le \sum_{i=1}^{n} |x_i| < \varepsilon$$
That is, $\lim_{x \to 0} f(x) = 1$. $\blacktriangle$

As the reader can check, we can easily extend to functions of several variables the limits from above and from below (indeed, the limit L remains a scalar, not a vector). Moreover, the notion of limit can be easily extended to operators. But we postpone this to Chapter 12 (Definition 497), when we will study the continuity of operators, a topic that will motivate this further extension.

11.3.2 Directions

So far, so good. Too good, in a sense, because the multivariable extension of the notion of limit seems just a matter of upgrading the distance, from the absolute value |x − x₀| between scalars to the more general case of the norm ‖x − x₀‖ between vectors. Formally, this is true, but one should not forget that, when n > 1, the condition ‖x − x₀‖ < δ_ε controls many more ways to approach a point. Indeed, on the real line there are only two ways to approach a point x₀, the left direction and the right one. They are identified with − and + in the next figure, respectively.

Instead, in the plane – a fortiori, in a general space Rⁿ – there are infinitely many directions along which to approach a point x₀, as the figure illustrates:

Intuitively, condition (11.20) requires that, as x approaches x₀ along all such directions, the function f tends to the same value L. In other words, the behavior of f is consistent across all such directions. If, therefore, there are two such directions along which f does not tend to the same limit value, the function does not have a limit as x → x₀. The following example should clarify the issue.

Example 450 Let f : R² → R be given by

f(x₁, x₂) = log(1 + x₁x₂) / x₁²

Let us verify that lim_{(x₁,x₂)→(0,0)} f(x) does not exist. Consider two possible directions along which we can approach the origin: along the parabola x₂ = x₁², and along the straight line x₂ = x₁. Graphically:

Along the parabola we have

lim_{(x₁,x₂)→(0,0)} f(x₁, x₂) = lim_{x₁→0} f(x₁, x₁²) = lim_{x₁→0} log(1 + x₁³)/x₁² = lim_{x₁→0} x₁ · log(1 + x₁³)/x₁³ = 0

Along the straight line, we instead have

lim_{(x₁,x₂)→(0,0)} f(x₁, x₂) = lim_{x₁→0} f(x₁, x₁) = lim_{x₁→0} log(1 + x₁²)/x₁² = 1

Since f tends to two different limit values along the two directions, we conclude that lim_{(x₁,x₂)→(0,0)} f(x) does not exist.
We can prove this failure rigorously using Definition 448. Suppose, by contradiction, that the limit exists, that is,

lim_{(x₁,x₂)→(0,0)} f(x₁, x₂) = L

Set ε = 1/4. By definition of limit, there exists δ₁ > 0 such that, for (0,0) ≠ (x₁, x₂) ∈ B_{δ₁}(0,0), we have

d(f(x₁, x₂), L) < 1/4   (11.21)

From the limit along the parabola, by setting

g(x) = log(1 + x³)/x²

one gets lim_{x₁→0} g(x₁) = 0. Therefore, by setting again ε = 1/4, there exists δ₂ > 0 such that, for 0 ≠ x₁ ∈ B_{δ₂}(0) ⊆ R, we have

g(x₁) ∈ (−ε, ε) = (−1/4, 1/4)

Now consider the neighborhood B_{δ₂}(0,0) ⊆ R² of (0,0). Take a point on the parabola x₂ = x₁² that belongs to this neighborhood, that is, a point (0,0) ≠ (x̂₁, x̂₁²) ∈ B_{δ₂}(0,0). We have x̂₁ ∈ B_{δ₂}(0),⁶ so

f(x̂₁, x̂₁²) = g(x̂₁) ∈ (−1/4, 1/4)   (11.22)

Similarly, from the limit along the straight line, by setting

h(x) = log(1 + x²)/x²

one gets lim_{x₁→0} h(x₁) = 1. Therefore, setting again ε = 1/4, there exists δ₃ > 0 such that, for 0 ≠ x₁ ∈ B_{δ₃}(0) ⊆ R, we have

h(x₁) ∈ (1 − ε, 1 + ε) = (3/4, 5/4)

Now consider the neighborhood B_{δ₃}(0,0) ⊆ R² of (0,0) and take a point of the straight line x₂ = x₁ that belongs to it, that is, a point (0,0) ≠ (x̃₁, x̃₁) ∈ B_{δ₃}(0,0). We have x̃₁ ∈ B_{δ₃}(0), so that

f(x̃₁, x̃₁) = h(x̃₁) ∈ (3/4, 5/4)   (11.23)

Let δ = min{δ₁, δ₂, δ₃} and consider two points (x̂₁, x̂₁²) and (x̃₁, x̃₁), on the parabola and on the straight line respectively, that belong to B_δ(0,0) and are different from the origin (0,0). By (11.22) and (11.23), we have

d(f(x̂₁, x̂₁²), f(x̃₁, x̃₁)) > 1/2

On the other hand, from (11.21) we have

d(f(x̂₁, x̂₁²), f(x̃₁, x̃₁)) ≤ d(f(x̂₁, x̂₁²), L) + d(L, f(x̃₁, x̃₁)) < 1/4 + 1/4 = 1/2

This contradiction shows that the limit lim_{(x₁,x₂)→(0,0)} f(x₁, x₂) does not exist. ▲

⁶ Indeed, d((x̂₁, x̂₁²), (0,0)) < δ₂, that is, x̂₁² + x̂₁⁴ < δ₂², implies x̂₁² < δ₂², whence d(x̂₁, 0) < δ₂.
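The direction-dependence can also be seen numerically. The following Python sketch (with illustrative step sizes of our own choosing) evaluates f along the two curves of the example:

    import math

    # f(x1, x2) = log(1 + x1*x2) / x1**2 from Example 450.
    def f(x1, x2):
        return math.log(1 + x1 * x2) / x1 ** 2

    for t in (0.1, 0.01, 0.001):
        on_parabola = f(t, t ** 2)   # along x2 = x1^2: values head to 0
        on_line     = f(t, t)        # along x2 = x1:   values head to 1
        print(f"t={t}: parabola -> {on_parabola:.6f}, line -> {on_line:.6f}")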

11.3.3 Sequential characterization

Limits of functions admit a key characterization through limits of approaching sequences.

Proposition 451 Given a function f : A ⊆ Rⁿ → R and a limit point x₀ ∈ Rⁿ of A, we have lim_{x→x₀} f(x) = L ∈ R̄ if and only if

xₙ → x₀ ⟹ f(xₙ) → L

for every sequence {xₙ} in A with terms distinct from x₀ (i.e., xₙ ≠ x₀ for every n ≥ 1).

Proof We consider L ∈ R, leaving to the reader the case L = ±∞. “If”. Suppose f(xₙ) → L for every sequence {xₙ} of points of A, with xₙ ≠ x₀ for every n, such that xₙ → x₀. Suppose, by contradiction, that lim_{x→x₀} f(x) = L is false. Then, there is ε > 0 such that, for every δ > 0, there exists x_δ ∈ A such that 0 < d(x_δ, x₀) < δ and d(f(x_δ), L) ≥ ε. For every n, set δ = 1/n and let xₙ be the corresponding point of A just denoted by x_δ. For the sequence {xₙ} of points of A so constructed, we have d(x₀, xₙ) < 1/n for every n, so lim_{n→∞} d(x₀, xₙ) = 0. By Proposition 281, xₙ → x₀. But, by construction, d(f(xₙ), L) ≥ ε for every n, so the sequence f(xₙ) does not converge to L. Having contradicted the hypothesis, we conclude that lim_{x→x₀} f(x) = L.

“Only if”. Suppose lim_{x→x₀} f(x) = L ∈ R. Let {xₙ} be a sequence of points of A, with xₙ ≠ x₀ for every n, such that xₙ → x₀. Let ε > 0. There exists δ_ε > 0 such that, for every x ∈ A, 0 < d(x, x₀) < δ_ε implies d(f(x), L) < ε. Since xₙ → x₀ and xₙ ≠ x₀, there exists n_ε ≥ 1 such that 0 < d(xₙ, x₀) < δ_ε for every n ≥ n_ε. For every n ≥ n_ε we thus have d(f(xₙ), L) < ε, which implies f(xₙ) → L.

Example 452 Let us go back to lim_{x→2}(3x − 5) of Example 430. Since A = R, let {xₙ} be any sequence of scalars, with xₙ ≠ 2 for every n, such that xₙ → 2. For example, xₙ = 2 + 1/n or xₙ = 2 − 1/n². By the algebra of limits of sequences, we have

lim_{n→∞} (3xₙ − 5) = 3 lim_{n→∞} xₙ − 5 = 3 · 2 − 5 = 1

For example, in the special case xₙ = 2 + 1/n we have

lim_{n→∞} (3(2 + 1/n) − 5) = 3 lim_{n→∞} (2 + 1/n) − 5 = 3 · 2 − 5 = 1

By Proposition 451, this confirms that lim_{x→2}(3x − 5) = 1. ▲
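A sketch of this sequential computation in Python (the two sequences are those of the example; the cutoffs for n are illustrative):

    # Sequential check of lim_{x->2} (3x - 5) = 1 along two sequences x_n -> 2.
    def f(x):
        return 3 * x - 5

    for n in (10, 100, 1000):
        print(n, f(2 + 1 / n), f(2 - 1 / n ** 2))  # both columns approach 1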

Example 453 Consider the function f : (0, +∞) → R given by

f(x) = √x / x

and the limit lim_{x→0} f(x). Since A = (0, +∞) and x₀ = 0, let {xₙ} be a sequence of strictly positive scalars such that xₙ → 0. For example, xₙ = 1/n or xₙ = 1/n². By the algebra of limits of sequences, we have

lim_{n→∞} √xₙ / xₙ = lim_{n→∞} 1/√xₙ = +∞

and so Proposition 451 allows us to conclude that lim_{x→0} f(x) = +∞. ▲

The characterization of limits through sequences is important both operationally, because the calculation of limits of functions reduces to the simpler calculation of limits of sequences (as the last examples just showed), and theoretically, because in this way many of the properties established for limits of sequences easily extend to the more general case of limits of functions. In this chapter we will mostly focus on the second, more theoretical, aspect of the sequential characterization in order to obtain basic properties of limits.

In sum, though limits of sequences are a special case of limits of functions, they have a special status because of the sequential characterization established in the last proposition.

11.4 Properties of limits

In this section we present some basic properties of limits. To ease matters and state the properties directly in terms of functions of several variables, we will consider only limit points x₀ that belong to Rⁿ. However, as the reader can check, for scalar functions these properties also hold for the case x → ±∞.⁷ Later in the book we will also use such versions.

We start with the uniqueness of the limit.

Theorem 454 (Uniqueness of the limit) Let f : A ⊆ Rⁿ → R be a function and x₀ ∈ Rⁿ a limit point of A. There exists at most one L ∈ R̄ such that lim_{x→x₀} f(x) = L.

Proof Let us suppose, by contradiction, that there exist two different limits L′ ≠ L″. Let {xₙ} be a sequence in A, with xₙ ≠ x₀ for every n, such that xₙ → x₀. By Proposition 451, f(xₙ) → L′ and f(xₙ) → L″, which contradicts the uniqueness of the limit for sequences. It follows that L′ = L″.

Here is an alternative proof, which does not use limits of sequences.

Alternative proof By contradiction, let us suppose that there exist two different limits L₁ and L₂, that is, L₁ ≠ L₂. We assume therefore that

lim_{x→x₀} f(x) = L₁

and

lim_{x→x₀} f(x) = L₂

with L₁ ≠ L₂. Without loss of generality, suppose that L₁ > L₂. There exists a number K such that L₁ > K > L₂. Setting 0 < ε₁ < L₁ − K and 0 < ε₂ < K − L₂, the neighborhoods B_{ε₁}(L₁) = (L₁ − ε₁, L₁ + ε₁) and B_{ε₂}(L₂) = (L₂ − ε₂, L₂ + ε₂) are disjoint.

⁷ That is, for the case x₀ ∈ R̄ that, indeed, includes x → ±∞ as the special cases x → x₀ = ±∞.

[Figure: the disjoint neighborhoods (L₁ − ε₁, L₁ + ε₁) and (L₂ − ε₂, L₂ + ε₂) on the y-axis]

Since by hypothesis lim_{x→x₀} f(x) = L₁, given ε₁ > 0 one can find δ₁ > 0 such that

x₀ ≠ x ∈ (x₀ − δ₁, x₀ + δ₁) ∩ A ⟹ f(x) ∈ (L₁ − ε₁, L₁ + ε₁)   (11.24)

Analogously, since by hypothesis lim_{x→x₀} f(x) = L₂, given ε₂ > 0 one can find δ₂ > 0 such that

x₀ ≠ x ∈ (x₀ − δ₂, x₀ + δ₂) ∩ A ⟹ f(x) ∈ (L₂ − ε₂, L₂ + ε₂)   (11.25)

Taking δ = min{δ₁, δ₂}, the neighborhood (x₀ − δ, x₀ + δ) of x₀ with radius δ is contained in the two previous neighborhoods, i.e., in (x₀ − δ, x₀ + δ) both (11.24) and (11.25) hold:

x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ⟹ f(x) ∈ (L₁ − ε₁, L₁ + ε₁) and f(x) ∈ (L₂ − ε₂, L₂ + ε₂)

Hence,

x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ⟹ f(x) ∈ (L₁ − ε₁, L₁ + ε₁) ∩ (L₂ − ε₂, L₂ + ε₂)

which is a contradiction, since we assumed that

(L₁ − ε₁, L₁ + ε₁) ∩ (L₂ − ε₂, L₂ + ε₂) = ∅

The limit is therefore unique.

We continue with a version for functions of the theorem on the permanence of sign
(Theorem 295).

Theorem 455 (Permanence of sign) Let f : A ⊆ Rⁿ → R be a function and x₀ ∈ Rⁿ a limit point of A. If lim_{x→x₀} f(x) = L ≠ 0, then there exists a neighborhood B_ε(x₀) of x₀ on which f(x) and L have the same sign, i.e.,

f(x) · L > 0   for every x₀ ≠ x ∈ B_ε(x₀) ∩ A

In words, if L ≠ 0, it is always possible to choose a neighborhood of x₀ small enough so that the function takes on, at all its points (distinct from x₀), a value that has the same sign as L – i.e., such that f(x) · L > 0.

We leave to the reader the easy “sequential” proof based on Theorem 295 and on Proposition 451. We give, instead, a proof that directly uses the definition of limit.

Alternative proof Let L ≠ 0, say L > 0. Since lim_{x→x₀} f(x) = L, by taking ε = L/2 > 0 there exists a neighborhood B_ε(x₀) of x₀ such that

x₀ ≠ x ∈ B_ε(x₀) ∩ A ⟹ f(x) ∈ (L − L/2, L + L/2) = (L/2, 3L/2)

Since L/2 > 0, we are done. For L < 0, the proof is similar.

The comparison criterion takes the following form for functions.

Theorem 456 (Comparison criterion) Let f, g, h : A ⊆ Rⁿ → R be three functions and x₀ ∈ Rⁿ a limit point of A. If

g(x) ≤ f(x) ≤ h(x)   for every x ∈ A   (11.26)

and

lim_{x→x₀} g(x) = lim_{x→x₀} h(x) = L ∈ R   (11.27)

then

lim_{x→x₀} f(x) = L

Again we leave to the reader the easy “sequential” proof based on Theorem 314 and on Proposition 451, and give a proof based on the definition of limit.

Alternative proof Let ε > 0. We have to show that there exists δ > 0 such that f(x) ∈ (L − ε, L + ε) for every x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ∩ A. Since lim_{x→x₀} g(x) = L, there exists δ₁ > 0 such that

x₀ ≠ x ∈ (x₀ − δ₁, x₀ + δ₁) ∩ A ⟹ L − ε < g(x) < L + ε   (11.28)

Since lim_{x→x₀} h(x) = L, there exists δ₂ > 0 such that

x₀ ≠ x ∈ (x₀ − δ₂, x₀ + δ₂) ∩ A ⟹ L − ε < h(x) < L + ε   (11.29)

By taking δ = min{δ₁, δ₂}, both (11.28) and (11.29) then hold in (x₀ − δ, x₀ + δ) ∩ A. By (11.26), we then have

L − ε < g(x) ≤ f(x) ≤ h(x) < L + ε   for every x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ∩ A

that is,

f(x) ∈ (L − ε, L + ε)   for every x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ∩ A

Since ε was arbitrary, we conclude that lim_{x→x₀} f(x) = L.

The comparison criterion for functions has the same interpretation as the original version for sequences (Theorem 314). The next simple application of this criterion is similar, mutatis mutandis, to that seen in Example 315.

Example 457 Let f : R \ {0} → R be given by f(x) = e^{x cos²(1/x)} and let x₀ = 0. Since

0 ≤ cos²(1/x) ≤ 1   for every x ≠ 0

by the monotonicity of the exponential function we have

1 = e^{0·x} ≤ e^{x cos²(1/x)} ≤ e^{1·x} = eˣ   for every x > 0

Setting g(x) = 1 and h(x) = eˣ, conditions (11.26) and (11.27) are satisfied with L = 1. Therefore, lim_{x→0} f(x) = 1. The proof for x < 0 is analogous. ▲
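The squeeze can be checked numerically, as in the following Python sketch for x > 0 (the sample points are our own choices):

    import math

    # Squeeze check for f(x) = exp(x * cos(1/x)**2) near x0 = 0 (x > 0):
    # 1 <= f(x) <= e^x, and both bounds tend to 1 as x -> 0+.
    def f(x):
        return math.exp(x * math.cos(1 / x) ** 2)

    for x in (0.1, 0.01, 0.001):
        assert 1 <= f(x) <= math.exp(x)
        print(f"x={x}: f(x)={f(x):.6f}, upper bound e^x={math.exp(x):.6f}")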

As was the case for sequences, more generally also for functions the last two results establish properties of limits with respect to the underlying order structure of Rⁿ. The next proposition, which extends Propositions 296 and 297 to functions, is yet another simple result of this kind.

Proposition 458 Let f, g : A ⊆ Rⁿ → R be two functions, x₀ ∈ Rⁿ a limit point of A, and lim_{x→x₀} f(x) = L ∈ R̄ and lim_{x→x₀} g(x) = H ∈ R̄.

(i) If f(x) ≥ g(x) in a neighborhood of x₀, then L ≥ H.

(ii) If L > H, then there exists a neighborhood of x₀ in which f(x) > g(x).

Observe that in (i) we can only say L ≥ H even when we have the strict inequality f(x) > g(x). For example, for the functions f, g : R → R given by

f(x) = 1 if x = 0,   f(x) = x² if x ≠ 0

and g(x) = 0 we have, for x → 0, L = H = 0 although f(x) > g(x) for every x ∈ R. Similarly, if f(x) = 1/x and g(x) = 0, for x → +∞ we have L = H = 0 although f(x) > g(x) for every x > 0.

As we did so far in this section, we leave the sequential proof – based on Propositions 296 and 297 – to readers and give, instead, a proof based on the definition of limit.

Alternative proof (i) By contradiction, assume that L < H. Set ε = H − L, so that ε > 0. The neighborhoods (L − ε/4, L + ε/4) and (H − ε/4, H + ε/4) are disjoint since L + ε/4 < H − ε/4. Since lim_{x→x₀} f(x) = L, there exists δ₁ > 0 such that

x₀ ≠ x ∈ (x₀ − δ₁, x₀ + δ₁) ⟹ f(x) ∈ (L − ε/4, L + ε/4)

Analogously, since lim_{x→x₀} g(x) = H, there exists δ₂ > 0 such that

x₀ ≠ x ∈ (x₀ − δ₂, x₀ + δ₂) ⟹ g(x) ∈ (H − ε/4, H + ε/4)

By setting δ = min{δ₁, δ₂}, we have

x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ⟹ L − ε/4 < f(x) < L + ε/4 < H − ε/4 < g(x) < H + ε/4

That is, f(x) < g(x) for every x ∈ B_δ(x₀). This contradicts the hypothesis that f(x) ≥ g(x) in a neighborhood of x₀.

(ii) We prove the contrapositive. It is enough to note that, if f(x) ≤ g(x) in every neighborhood of x₀, then (i) implies L ≤ H.

11.5 Algebra of limits

The next result extends the algebra of limits established for sequences (Propositions 309 and 313) to the general case of functions.⁸

Proposition 459 Given two functions f, g : A ⊆ Rⁿ → R and a limit point x₀ ∈ Rⁿ of A, suppose that lim_{x→x₀} f(x) = L ∈ R̄ and lim_{x→x₀} g(x) = M ∈ R̄. Then:

(i) lim_{x→x₀} (f + g)(x) = L + M, provided that L + M is not an indeterminate form (1.25), of the type

+∞ − ∞   or   −∞ + ∞

(ii) lim_{x→x₀} (fg)(x) = LM, provided that LM is not an indeterminate form (1.26), of the type

±∞ · 0   or   0 · (±∞)

(iii) lim_{x→x₀} (f/g)(x) = L/M, provided that g(x) ≠ 0 in a neighborhood of x₀, with x ≠ x₀, and L/M is not an indeterminate form (1.27), of the type⁹

±∞/±∞   or   a/0

Proof We prove only (i), leaving to the reader the analogous proofs of (ii) and (iii). Let {xₙ} be a sequence in A, with xₙ ≠ x₀ for every n ≥ 1, such that xₙ → x₀. By Proposition 451, f(xₙ) → L and g(xₙ) → M. Suppose that L + M is not an indeterminate form. By Proposition 309, (f + g)(xₙ) → L + M, and therefore, by Proposition 451, it follows that lim_{x→x₀} (f + g)(x) = L + M.

⁸ For brevity, we focus on Proposition 309 and leave to the reader the analogous extension of Proposition 313.
⁹ As for sequences, excluding the indeterminacy a/0 amounts to requiring M ≠ 0.

Example 460 Let f, g : R \ {0} → R be given by f(x) = sin x / x and g(x) = 1/|x|. We have lim_{x→0} sin x / x = 1 and lim_{x→0} 1/|x| = +∞. Therefore,

lim_{x→0} (sin x / x + 1/|x|) = 1 + ∞ = +∞

If, instead, g(x) = eˣ, we have lim_{x→0} (sin x / x + eˣ) = 1 + 1 = 2. ▲

As for sequences, when a ≠ 0 the case a/0 of point (iii) is actually not an indeterminate form for the algebra of limits, as the following version for functions of Proposition 311 shows.

Proposition 461 Let lim_{x→x₀} f(x) = L ∈ R̄, with L ≠ 0, and lim_{x→x₀} g(x) = 0. The limit lim_{x→x₀} (f/g)(x) exists if and only if there is a neighborhood U(x₀) of x₀ ∈ Rⁿ where the function g has constant sign, except at most at x₀. In this case:¹⁰

(i) if L > 0 and g → 0⁺ or if L < 0 and g → 0⁻, then

lim_{x→x₀} f(x)/g(x) = +∞

(ii) if L > 0 and g → 0⁻ or if L < 0 and g → 0⁺, then

lim_{x→x₀} f(x)/g(x) = −∞

Example 462 Consider f(x) = x + 5 and g(x) = x. As x → 0, we have f → 5, but in every neighborhood of 0 the sign of the function g(x) alternates, that is, there is no neighborhood of 0 where g has constant sign. By Proposition 461, the limit of (f/g)(x) as x → 0 does not exist. ▲

As in the previous section, we considered only limits at points x₀ ∈ Rⁿ. The reader can verify that for scalar functions the results of this section extend to the case x → ±∞.

Example 463 Take f(x) = 1/x − 1 and g(x) = 1/x. As x → +∞ we have f → −1 and g → 0. Since g(x) > 0 for every x > 0, so also in any neighborhood of +∞, we have g → 0⁺. Thanks to the version for x → ±∞ of Proposition 461, we have lim_{x→+∞} (f/g)(x) = −∞. ▲

11.5.1 Indeterminacies for limits

The algebra of limits presents indeterminacies similar to those of sequences (Section 8.10.3). Here we briefly review them.

¹⁰ Here g → 0⁺ and g → 0⁻ indicate that lim_{x→x₀} g(x) = 0 with, respectively, g(x) ≥ 0 and g(x) ≤ 0 for every x₀ ≠ x ∈ U(x₀).

Indeterminate form ∞ − ∞

For example, the limit lim_{x→0} (f + g)(x) of the sum of the functions f, g : R \ {0} → R given by f(x) = 1/x² and g(x) = −1/x⁴ falls under the indeterminate form ∞ − ∞. We have

(f + g)(x) = 1/x² − 1/x⁴ = (1/x²)(1 − 1/x²)

and, therefore,

lim_{x→0} (f + g)(x) = lim_{x→0} 1/x² · lim_{x→0} (1 − 1/x²) = −∞

since (+∞) · (−∞) is not an indeterminate form. Exchanging the signs between these two functions, that is, by setting f(x) = −1/x² and g(x) = 1/x⁴, we again have the indeterminate form ∞ − ∞ at x₀ = 0, but this time lim_{x→0} (f + g)(x) = +∞. Thus, also for functions the indeterminate forms can give completely different results, everything goes. So, they must be solved case by case.

Finally, note that these functions f and g give rise to an indeterminacy at x₀ = 0, but not at x₀ ≠ 0. Therefore, for functions it is crucial to specify the point x₀ that we are considering. This is, indeed, the only novelty that the study of indeterminate forms of functions features relative to that of sequences (for which we only have the case n → +∞).

Indeterminate form 0 · ∞

For example, consider the functions f, g : R \ {3} → R given by f(x) = (x − 3)² and g(x) = 1/(x − 3)⁴. The limit lim_{x→3} (fg)(x) falls under the indeterminate form 0 · ∞. But we have

lim_{x→3} (fg)(x) = lim_{x→3} (x − 3)² · 1/(x − 3)⁴ = lim_{x→3} 1/(x − 3)² = +∞

On the other hand, by considering f(x) = 1/(x − 3)² and g(x) = (x − 3)⁴, we have

lim_{x→3} (fg)(x) = lim_{x→3} (x − 3)⁴ · 1/(x − 3)² = lim_{x→3} (x − 3)² = 0

Again, only the direct calculation of the limit can determine its value.

Indeterminate forms ∞/∞ and 0/0

For example, let f, g : R → R be given by f(x) = 5 − x and g(x) = x² − 25. The limit of their ratio as x → 5 has the form 0/0, but

lim_{x→5} (f/g)(x) = lim_{x→5} (5 − x)/(x² − 25) = lim_{x→5} −(x − 5)/((x − 5)(x + 5)) = lim_{x→5} −1/(x + 5) = −1/10

On the other hand, by taking f, g : R → R given by f(x) = x² and g(x) = x, as x → +∞ we have an indeterminate form of the type ∞/∞ and

lim_{x→+∞} (f/g)(x) = lim_{x→+∞} x²/x = lim_{x→+∞} x = +∞

while, as x → −∞, we still have a form of the type ∞/∞ but

lim_{x→−∞} (f/g)(x) = lim_{x→−∞} x²/x = lim_{x→−∞} x = −∞

In the two cases the limits are infinities of opposite sign: again, one cannot avoid the direct calculation of the limit.

For the functions f and g just seen, at the point x₀ = 0 we have the indeterminate form 0/0, but

lim_{x→0} (f/g)(x) = lim_{x→0} x²/x = lim_{x→0} x = 0

while, setting g(x) = x⁴, we still have an indeterminate form of the type 0/0 and

lim_{x→0} (f/g)(x) = lim_{x→0} x²/x⁴ = lim_{x→0} 1/x² = +∞

On the other hand, by taking f : R₊ → R given by f(x) = x + √x − 2 and g : R \ {1} → R given by g(x) = x − 1, we have

lim_{x→1} (f/g)(x) = lim_{x→1} (x + √x − 2)/(x − 1) = lim_{x→1} (x − 1 + √x − 1)/(x − 1) = lim_{x→1} (1 + (√x − 1)/(x − 1))

= 1 + lim_{x→1} (√x − 1)/((√x − 1)(√x + 1)) = 1 + lim_{x→1} 1/(√x + 1) = 1 + 1/2 = 3/2

Summing up, everything goes.
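To see this “everything goes” numerically, the following Python sketch (with illustrative evaluation points of our own choosing) tabulates three of the 0/0 ratios above near their respective points:

    # Three 0/0 ratios from above, evaluated ever closer to the relevant
    # point; each settles on a different value, so "everything goes".
    cases = [
        ("(5-x)/(x^2-25) as x->5", 5.0, lambda x: (5 - x) / (x**2 - 25)),
        ("x^2/x as x->0",          0.0, lambda x: x**2 / x),
        ("x^2/x^4 as x->0",        0.0, lambda x: x**2 / x**4),
    ]
    for name, x0, g in cases:
        print(name, [g(x0 + h) for h in (0.1, 0.01, 0.001)])
    # Expected patterns: about -1/10, about 0, and blowing up to +infinity.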

We close with two observations: (i) as for sequences (Section 8.10.5), for functions the
various indeterminate forms can be reduced to one another; (ii) also in the case of functions
we can summarize what we have seen till now in tables similar to those in Section 8.10.4, as
readers can check.

11.6 Common limits

Using what we have studied so far, we now calculate some, more or less elementary, common limits. We begin with a few examples of limits of elementary functions.

Example 464 (i) Let f : R → R be given by f(x) = xⁿ with n ≥ 1. For every x₀ ∈ R, by the basic properties of limits we have

lim_{x→x₀} xⁿ = x₀ⁿ

Moreover, lim_{x→±∞} xⁿ = +∞ if n is even, while lim_{x→+∞} xⁿ = +∞ and lim_{x→−∞} xⁿ = −∞ if n is odd.

(ii) Let f : R \ {0} → R be given by f(x) = 1/xⁿ for n ≥ 1. For every 0 ≠ x₀ ∈ R, we have

lim_{x→x₀} f(x) = 1/x₀ⁿ

Moreover, lim_{x→±∞} 1/xⁿ = 0⁺ if n is even, while lim_{x→+∞} 1/xⁿ = 0⁺ and lim_{x→−∞} 1/xⁿ = 0⁻ if n is odd. Finally, lim_{x→0⁺} 1/xⁿ = +∞ and lim_{x→0⁻} 1/xⁿ = −∞ if n is odd, while lim_{x→0⁺} 1/xⁿ = lim_{x→0⁻} 1/xⁿ = +∞ if n is even.

(iii) Let f : R → R be given by f(x) = αˣ, with α > 0. For every x₀ ∈ R, we have lim_{x→x₀} αˣ = α^{x₀}. Moreover,

lim_{x→−∞} αˣ = 0 if α > 1;  = 1 if α = 1;  = +∞ if α < 1

and

lim_{x→+∞} αˣ = +∞ if α > 1;  = 1 if α = 1;  = 0 if α < 1

(iv) Let f : R₊₊ → R be given by f(x) = log_a x, with a > 0, a ≠ 1. For every x₀ > 0, we have lim_{x→x₀} log_a x = log_a x₀. Moreover,

lim_{x→0⁺} log_a x = −∞ if a > 1;  = +∞ if a < 1   and   lim_{x→+∞} log_a x = +∞ if a > 1;  = −∞ if a < 1

(v) Let f, g : R → R be given by f(x) = sin x and g(x) = cos x. For every x₀ ∈ R, we have lim_{x→x₀} sin x = sin x₀ and lim_{x→x₀} cos x = cos x₀. The limits lim_{x→±∞} sin x and lim_{x→±∞} cos x do not exist. ▲

Next we prove some classic limits for trigonometric functions (we already met the first one in the introduction of this chapter).

Proposition 465 Let f, g : R \ {0} → R be defined by f(x) = sin x / x and g(x) = (cos x − 1)/x. Then

lim_{x→0} sin x / x = 1   (11.30)

and

lim_{x→0} (1 − cos x)/x = 0,   lim_{x→0} (1 − cos x)/x² = 1/2   (11.31)

Proof It is easy to see graphically that 0 < sin x < x < tan x for x ∈ (0, π/2) and that tan x < x < sin x < 0 for x ∈ (−π/2, 0). Therefore, by dividing all the terms by sin x and by observing that sin x > 0 when x ∈ (0, π/2) and sin x < 0 when x ∈ (−π/2, 0), in all cases we have

1 < x/sin x < 1/cos x

The first limit then follows from the comparison criterion. For the third one, it is sufficient to observe that

(1 − cos x)/x² = ((1 − cos x)/x²) · ((1 + cos x)/(1 + cos x)) = (1 − cos²x)/(x²(1 + cos x)) = (sin²x/x²) · (1/(1 + cos x))

and that, as x → 0, the first factor tends to 1 while the second one tends to 1/2. Finally, the second limit follows immediately from the third one:

(1 − cos x)/x = x · ((1 − cos x)/x²) → 0 · (1/2) = 0
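The three limits can be checked numerically with a short Python sketch (the evaluation points are illustrative):

    import math

    # Numerical check of the three classic limits of Proposition 465.
    for x in (0.5, 0.05, 0.005):
        print(f"x={x}: sin(x)/x = {math.sin(x)/x:.6f}, "
              f"(1-cos x)/x = {(1-math.cos(x))/x:.6f}, "
              f"(1-cos x)/x^2 = {(1-math.cos(x))/x**2:.6f}")
    # The three columns approach 1, 0, and 1/2, respectively.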

Finally, from the analogous propositions that we proved for sequences, we easily deduce (the proofs are essentially identical) the following limits:

(i) If f(x) → ±∞ as x → x₀, then

lim_{x→x₀} (1 + k/f(x))^{f(x)} = e^k

In particular,

lim_{x→x₀} (1 + 1/f(x))^{f(x)} = e,   lim_{x→±∞} (1 + 1/x)^x = e

(ii) Let a > 0 and f(x) → 0 as x → x₀. Then

lim_{x→x₀} (a^{f(x)} − 1)/f(x) = log a

In particular,

lim_{x→0} (aˣ − 1)/x = log a   (11.32)

which, when a = e, becomes

lim_{x→0} (eˣ − 1)/x = 1

(iii) Let 0 < a ≠ 1 and f(x) → 0 as x → x₀. Then

lim_{x→x₀} log_a(1 + f(x))/f(x) = 1/log a

In particular,

lim_{x→0} log_a(1 + x)/x = 1/log a

which, when a = e, becomes

lim_{x→0} log(1 + x)/x = 1

(iv) If f(x) → 0 as x → x₀, we have

lim_{x→x₀} ((1 + f(x))^α − 1)/f(x) = α

In particular,

lim_{x→0} ((1 + x)^α − 1)/x = α

N.B. The function u_γ : (0, +∞) → R defined by

u_γ(x) = (x^{1−γ} − 1)/(1 − γ) if γ ≠ 1,   u_γ(x) = log x if γ = 1

is the classic CRRA (constant relative risk aversion) utility function, where the scalar γ is interpreted as a coefficient of relative risk aversion (see Pratt, 1964, p. 134). In view of the limit (11.32),¹¹ we have lim_{γ→1} u_γ(x) = lim_{γ→1} (x^{1−γ} − 1)/(1 − γ) = log x. △

¹¹ Here 1 − γ plays the role of x in (11.32).
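A numerical illustration of this CRRA limit (a Python sketch; x = 2 and the values of γ are arbitrary choices for the experiment):

    import math

    # CRRA utility u_gamma(x) = (x**(1-gamma) - 1)/(1-gamma) for gamma != 1;
    # the text shows u_gamma(x) -> log(x) as gamma -> 1.
    def u(x, gamma):
        return (x ** (1 - gamma) - 1) / (1 - gamma)

    x = 2.0
    for gamma in (0.9, 0.99, 0.999):
        print(f"gamma={gamma}: u={u(x, gamma):.6f}  (log x = {math.log(x):.6f})")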

11.7 Orders of convergence and of divergence

As for sequences, also for functions it may happen that some of them approach their limit “faster” than others.

For simplicity we limit ourselves to scalar functions. We first extend to them the key Definition 327. Note the importance of the clause “as x → x₀”, which (as already remarked) is the unique true novelty with respect to the case of sequences, in which this clause could only take the form “n → +∞”.

Definition 466 Given two functions f, g : A ⊆ R → R, let x₀ ∈ R̄ be a limit point of A for which there exists a neighborhood B_ε(x₀) such that g(x) ≠ 0 for every x ∈ A ∩ B_ε(x₀).

(i) If

lim_{x→x₀} f(x)/g(x) = 0

we say that f is negligible with respect to g as x → x₀; in symbols,

f = o(g) as x → x₀

(ii) If

lim_{x→x₀} f(x)/g(x) = k ≠ 0   (11.33)

we say that f is comparable with g as x → x₀; in symbols,

f ≍ g as x → x₀

(iii) In particular, if

lim_{x→x₀} f(x)/g(x) = 1

we say that f and g are asymptotic (or asymptotically equivalent) to one another as x → x₀ and we write

f(x) ∼ g(x) as x → x₀

Terminology For functions, too, the expression f = o(g) as x → x₀ reads “f is little-o of g, as x → x₀”.

It is easy to see that for functions, too, the relations ≍ and ∼ continue to satisfy the properties seen in Section 8.14 for sequences, i.e.,

(i) the relations of comparability and of asymptotic equivalence are symmetric and transitive;

(ii) the relation of negligibility is transitive;

(iii) if lim_{x→x₀} f(x) and lim_{x→x₀} g(x) are both finite and non-zero, then f ≍ g as x → x₀;

(iv) if lim_{x→x₀} f(x) = 0 and 0 ≠ lim_{x→x₀} g(x) ∈ R, then f = o(g) as x → x₀.

We now consider the cases, which also for functions are the most interesting ones, in which both functions either converge to zero or diverge to ±∞. We start with convergence to zero: lim_{x→x₀} f(x) = lim_{x→x₀} g(x) = 0. In this case, intuitively, f is negligible with respect to g as x → x₀ if it tends to zero faster. Let, for example, x₀ = 1, f(x) = (x − 1)² and g(x) = x − 1. We have

lim_{x→1} (x − 1)²/(x − 1) = lim_{x→1} (x − 1) = 0

that is, f = o(g) as x → 1. On the other hand, as x → +∞, we have

lim_{x→+∞} √x/√(x + 1) = lim_{x→+∞} √(1 − 1/(x + 1)) = 1

Therefore, the functions f(x) = √x and g(x) = √(x + 1) are comparable (even better, they are asymptotic to one another) as x → +∞.

Let us now consider two functions both tending to ±∞ as x → x₀. In this case, intuitively, f is negligible with respect to g when it tends to infinity more slowly, that is, when it assumes larger and larger values (in absolute value) less rapidly. For example, if f(x) = x and g(x) = x², for x₀ = +∞ we have

lim_{x→+∞} x/x² = lim_{x→+∞} 1/x = 0

and so f = o(g) as x → +∞. When x → −∞, too, we have

lim_{x→−∞} x/x² = lim_{x→−∞} 1/x = 0

So, f = o(g) also as x → −∞: in both cases x tends to infinity more slowly than x². Note that, as x → 0, we have instead lim_{x→0} x² = lim_{x→0} x = 0 and

lim_{x→0} x²/x = lim_{x→0} x = 0

so that g = o(f) as x → 0.

In sum, also for functions the meaning of negligibility must be specified according to whether we consider convergence to zero or divergence to infinity. Moreover, the point x₀ where we take the limit is key, as already remarked several times (repetita iuvant, hopefully).

11.7.1 Little-o algebra

As for sequences, also for functions the application of the concept of “little-o” is not always straightforward. Indeed, knowing that a function f is little-o of another function g as x → x₀ does not give much information on the form of f, apart from its being negligible with respect to g. Fortunately, there exists an “algebra” of little-o, which extends the one seen for sequences (Proposition 329), and which allows us to manipulate safely the little-o of sums and products of functions. To ease notation, in what follows we will always assume that the negligibility of the various functions is as x approaches the same point x₀, so we will always omit the clause “as x → x₀”.¹²

¹² In any case, it would be meaningless to consider sums or products of little-o at different x₀.

Proposition 467 For every pair of functions f and g and for every scalar c ≠ 0, we have:

(i) o(f) + o(f) = o(f);

(ii) o(f)o(g) = o(fg);

(iii) c · o(f) = o(f);

(iv) o(g) + o(f) = o(f) if g = o(f).

We omit the proof because it is similar, mutatis mutandis, to that of Proposition 329. Also the comments we made about that proposition still apply – in particular, about the important special case o(f)o(f) = o(f²) of point (ii).

Example 468 Let f(x) = xⁿ, with n > 2. Consider the two functions g(x) = x^{n−1} and h(x) = e^{−x} − 3x^{n−1}. It is easy to check that g = o(f) = o(xⁿ) and h = o(f) = o(xⁿ) as x → +∞.

(i) Summing the two functions we obtain g + h = e^{−x} − 2x^{n−1}, which is still o(xⁿ) as x → +∞, in accordance with Proposition 467-(i).

(ii) Multiplying the two functions we obtain g · h = x^{n−1}e^{−x} − 3x^{2n−2}, which is o(xⁿ · xⁿ) = o(x^{2n}) as x → +∞, in accordance with Proposition 467-(ii) in the special case o(f)o(f). Note that g · h is not o(xⁿ).

(iii) Set c = 3 and consider c · g = 3x^{n−1}. It is easy to check that 3x^{n−1} is still o(xⁿ) as x → +∞, in accordance with Proposition 467-(iii).

(iv) Consider the function l(x) = x + 1. It is easy to check that l = o(g) = o(x^{n−1}) as x → +∞. Consider now the sum l + h, which is the sum of a o(g) and of a o(f), with g = o(f). We have l + h = x + 1 + e^{−x} − 3x^{n−1}, which is o(xⁿ) as x → +∞, i.e., o(f), in accordance with Proposition 467-(iv). Note that l + h is not o(g), even if l = o(g). ▲
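The little-o claims of this example can be tested numerically. The following Python sketch fixes n = 3 (an arbitrary choice with n > 2) and tabulates the relevant ratios:

    import math

    # Ratio checks for Example 468 with n = 3: g(x) = x^2 and
    # h(x) = exp(-x) - 3x^2 are both o(x^3) as x -> +infinity.
    n = 3
    g = lambda x: x ** (n - 1)
    h = lambda x: math.exp(-x) - 3 * x ** (n - 1)

    for x in (10.0, 100.0, 1000.0):
        print(f"x={x}: g/x^n = {g(x)/x**n:.2e}, "
              f"(g+h)/x^n = {(g(x)+h(x))/x**n:.2e}, "
              f"g*h/x^(2n) = {g(x)*h(x)/x**(2*n):.2e}")
    # All three ratios shrink to 0, in line with Proposition 467-(i) and (ii).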

The next proposition presents some classic instances of functions with different rates of divergence.

Proposition 469 Let k, h > 0, α > 1 and a > 1. Then,

(i) x^k = o(αˣ) as x → +∞, that is,

lim_{x→+∞} x^k/αˣ = 0

(ii) x^h = o(x^k) as x → +∞ if h < k;

(iii) log_a x = o(x^k) as x → +∞, that is,

lim_{x→+∞} (log_a x)/x^k = 0

By the transitivity property of the negligibility relation, from (i) and (iii) it follows that

log_a x = o(αˣ) as x → +∞

Proof Each of the three functions αˣ, x^k, and log_a x is increasing, so f(n) ≤ f(x) ≤ f(n + 1), where n = [x] is the integer part of x. It is then sufficient to use the sequential characterization of the limit of a function and the comparison criterion.

N.B. A function is o(1) as x → x₀ if it tends to 0. Indeed, f(x) = o(1) means that f(x)/1 = f(x) → 0. △

11.7.2 Asymptotic equivalence

Asymptotic equivalence for functions is analogous to that for sequences. In particular, we will see that in the calculation of limits it is possible to replace a function by an asymptotically equivalent one, which often substantially simplifies such calculations.

The development of this argument parallels that seen for sequences in Section 8.14.3. Such parallelism, and the unavoidable repetitiveness that it implies, should not make us lose sight of the importance of what we will see now. To minimize repetitions, we will omit some details and comments, as well as the proofs (referring the reader to Section 8.14.3).

Let us start by observing that f(x) ∼ g(x) as x → x₀ implies, for given L ∈ R̄,

lim_{x→x₀} f(x) = L ⟺ lim_{x→x₀} g(x) = L

That is, two functions asymptotic to one another as x → x₀ have the same limit as x → x₀. In particular, we have the following version for functions of Lemma 331.¹³

Lemma 470 Let f(x) ∼ g(x) and h(x) ∼ l(x) as x → x₀. Then:

(i) f(x)h(x) ∼ g(x)l(x) as x → x₀;

(ii) f(x)/h(x) ∼ g(x)/l(x) as x → x₀, provided that h(x) ≠ 0 and l(x) ≠ 0 at every point x ≠ x₀ of a neighborhood B_ε(x₀).

¹³ Relative to that lemma, for brevity here we limit ourselves to products and quotients (which are, in any case, the more interesting cases).

We now give the analog of the important Lemma 332.

Lemma 471 We have

f(x) ∼ f(x) + o(f(x)) as x → x₀   (11.34)

Therefore,

lim_{x→x₀} f(x) = L ⟺ lim_{x→x₀} (f(x) + o(f(x))) = L

What is negligible with respect to f as x → x₀ – which is what o(f(x)) is as x → x₀ – is asymptotically irrelevant and can be neglected. Thanks to Lemma 470, we therefore have:

(f(x) + o(f(x))) · (g(x) + o(g(x))) ∼ f(x)g(x) as x → x₀   (11.35)

and

(f(x) + o(f(x))) / (g(x) + o(g(x))) ∼ f(x)/g(x) as x → x₀   (11.36)

Example 472 (i) Consider the limit

lim_{x→+∞} (2√x + 5 ∛(x²) + x) / (3 + √(x³) + 3x) = lim_{x→+∞} (2x^{1/2} + 5x^{2/3} + x) / (3 + x^{3/2} + 3x)

and let us set f(x) = x and g(x) = x^{3/2}. As x → +∞, we have

2x^{1/2} + 5x^{2/3} = o(f)   and   3 + 3x = o(g)

By (11.36), we then have

(2x^{1/2} + 5x^{2/3} + x) / (3 + x^{3/2} + 3x) ∼ x/x^{3/2} = 1/√x → 0   as x → +∞

(ii) Consider the limit

lim_{x→+∞} (1/x² + 2/x⁴ + 1/eˣ) / (1/x⁴ + 1/x⁸ + 3/x¹⁰) = lim_{x→+∞} (x⁻² + 2x⁻⁴ + e⁻ˣ) / (x⁻⁴ + x⁻⁸ + 3x⁻¹⁰)

As x → +∞, we have x⁻⁸ + 3x⁻¹⁰ = o(x⁻⁴) and, by Proposition 469-(i), e⁻ˣ + 2x⁻⁴ = o(x⁻²). By (11.36), we then have

(x⁻² + 2x⁻⁴ + e⁻ˣ) / (x⁻⁴ + x⁻⁸ + 3x⁻¹⁰) ∼ x⁻²/x⁻⁴ = x² → +∞   as x → +∞

(iii) Consider the limit

lim_{x→0} (1 − cos x) / (sin²x + x³)

By applying first (11.36) and then Lemma 470-(ii), we get

(1 − cos x)/(sin²x + x³) ∼ (1 − cos x)/sin²x ∼ (1 − cos x)/x² → 1/2   as x → 0

▲
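As a numerical illustration of point (i), the following Python sketch compares the full ratio with its asymptotic simplification 1/√x (the evaluation points are illustrative; the two columns agree better and better as x grows):

    import math

    # Example 472-(i): the full ratio and its asymptotic simplification
    # x / x^(3/2) = 1/sqrt(x) should agree for large x.
    def full(x):
        return (2*math.sqrt(x) + 5*x**(2/3) + x) / (3 + x**1.5 + 3*x)

    for x in (1e2, 1e4, 1e6):
        print(f"x={x:.0e}: full ratio = {full(x):.6e}, "
              f"1/sqrt(x) = {1/math.sqrt(x):.6e}")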

11.7.3 Terminology

Here too, for the comparison of two functions that both either converge to 0 or diverge to ±∞, there is a specific terminology. In particular,

(i) a function f such that lim_{x→x₀} f(x) = 0 is called infinitesimal as x → x₀;

(ii) a function f such that lim_{x→x₀} f(x) = ±∞ is called infinite as x → x₀;

(iii) if two functions f and g are infinitesimal at x₀ and such that f = o(g) as x → x₀, then f is said to be infinitesimal of higher order at x₀ with respect to g;

(iv) if two functions f and g are infinite at x₀ and such that f = o(g) as x → x₀, then f is said to be infinite of lower order with respect to g.

A function is, therefore, infinitesimal of higher order than another one if it tends to zero faster, while it is infinite of lower order if it tends to infinity more slowly.

Example 473 (i) The functions defined by (x − x₀)^a are infinitesimal as x → x₀⁺ when a > 0 and infinite when a < 0. (ii) The functions defined by αˣ are infinite as x → +∞ and infinitesimal as x → −∞ when α > 1, and vice versa when 0 < α < 1. ▲

11.7.4 The usual bestiary

We recast the results, already provided for sequences, concerning the comparison among exponential functions αˣ, power functions x^k, and logarithmic functions log^h x. As x → +∞, they are infinite when α > 1, k > 0 and h > 0, and infinitesimal when 0 < α < 1, k < 0 and h < 0.

(i) If α > β > 0, then βˣ = o(αˣ); indeed, βˣ/αˣ = (β/α)ˣ → 0.

(ii) x^k = o(αˣ) for every α > 1 and k > 0, as already proved with the ratio criterion. If instead 0 < α < 1 and k > 0, then αˣ = o(x^k).

(iii) If k₁ > k₂ > 0, then x^{k₂} = o(x^{k₁}); indeed, x^{k₂}/x^{k₁} = x^{k₂−k₁} → 0.

(iv) If k > 0, then log^h x = o(x^k).

(v) If h₁ > h₂, then log^{h₂} x = o(log^{h₁} x); indeed, log^{h₂} x / log^{h₁} x = log^{h₂−h₁} x → 0.

We can still add:

(vi) αˣ = o(xˣ) for every α > 0; indeed, αˣ/xˣ = (α/x)ˣ → 0.

The previous results can be organized in scales of infinities and infinitesimals, in analogy with what we saw for sequences. For brevity we omit the details.
Chapter 12

Continuous functions

Ibis redibis, non morieris in bello (you will go, you will return, you will not die in war). So the oracle muttered to the inquiring king, who had to decide whether to go to war. Or, maybe, the oracle actually said: ibis redibis non, morieris in bello (you will go, you will not return, you will die in war). A small change in a comma, a dramatic difference in meaning.

When small changes have large effects, instability may result: a small change may, suddenly, dramatically alter matters. In contrast, stability prevails when small changes can only have small effects, in which case nothing dramatic can happen because of small alterations. Continuity is the mathematical translation of this general principle of stability for the relations between dependent and independent variables that functions represent.

12.1 Generalities

Intuitively, a scalar function is continuous when the relation between the independent variable x and the dependent variable y is “regular”, without breaks. The graph of a continuous function can be drawn without ever lifting the pencil.

This means that a function is continuous at a point x₀ of the domain if the behavior of the function towards x₀ is consistent with the value f(x₀) that it actually assumes at x₀, that is, if the limit lim_{x→x₀} f(x) is equal to the image f(x₀).

Definition 474 A function f : A ⊆ Rⁿ → R is said to be continuous at a limit point x₀ ∈ A if

lim_{x→x₀} f(x) = f(x₀)   (12.1)

By convention, f is continuous at each isolated point of A.

Note that we required x₀ to belong to the domain A. Indeed, continuity is a consistency property of the function at the points of its domain, so it loses meaning at points where the function is not defined.

The definition distinguishes between the points of A that are limit points, for which it makes sense to talk of limits, and the points of A that are isolated.¹ For the latter points the notion of continuity is, conceptually, vacuous: being isolated, they cannot be approached by other points of A and, therefore, there is no limit behavior for which to require consistency.

¹ Recall that a point of A is either a limit point or an isolated point, tertium non datur (Section 5.3.2).


Nevertheless, it is convenient to assume that a function is continuous at the isolated points of its domain. As an example, consider the function f : R₊ ∪ {−1} → R defined by

f(x) = √x for x ≥ 0,   f(x) = 1 for x = −1

Here x₀ = −1 is an isolated point of the domain. Hence, we can (conveniently) say that f is continuous at every point of its domain.

[Figure: the graph of √x on R₊ together with the isolated point (−1, 1)]

In sum, as a matter of convenience, we assume by convention that functions are automatically continuous at isolated points. That said, the important case is, clearly, when x₀ is a limit point of A. In such a case, condition (12.1) requires consistency between the limit behavior of the function towards x₀ and the value f(x₀) that it assumes at x₀. As we saw in the previous chapter, such consistency might well not hold. For example, we considered the function f : R → R given by

f(x) = x for x < 1,   f(x) = 2 for x = 1,   f(x) = 1 for x > 1   (12.2)

For this function lim_{x→1} f(x) = 1 ≠ f(1) because at x₀ = 1 there is a jump:

[Figure: the graph of (12.2), with a jump at x₀ = 1]

The function f is, thus, not continuous at the point x₀ = 1 because there is no consistency between the behavior at the limit and the value at x₀. On the other hand, f is continuous at all the other points of its domain: indeed, it is immediate to verify that lim_{x→x₀} f(x) = f(x₀) for every x₀ ≠ 1, so f does not exhibit other jumps besides the one at x₀ = 1.

The distinction between limit points and isolated points becomes superfluous for the important case of functions f : I → R defined on an interval I of the real line. Indeed, the points of any such interval (be it bounded or unbounded, closed, open, or semi-closed) are always limit points, so that f is continuous at any x₀ ∈ I if lim_{x→x₀} f(x) = f(x₀). For example, f : (a, b) → R is continuous at x₀ ∈ (a, b) if lim_{x→x₀} f(x) = f(x₀).

A function continuous at all the points of a subset E of the domain A is said to be continuous on E. The set of all continuous functions on E is denoted by C(E). For example, the function defined by (12.2) is not continuous on R, but it is continuous on R \ {1}. When the function is continuous at all the points of its domain, it is called continuous, without further specification. For example, the function sin x is continuous.

We now provide an important characterization of continuity through sequences, based on Proposition 451. Note that it does not distinguish between isolated and limit points x₀.²

Proposition 475 A function f : A ⊆ Rⁿ → R is continuous at a point x₀ of A if and only if f(xₙ) → f(x₀) for every sequence {xₙ} of points of A such that xₙ → x₀.

Proof The result follows immediately from Proposition 451 once we observe that, when x₀ is an isolated point of A, the unique sequence contained in A that tends to x₀ is (eventually) the constant one, i.e., {x₀, x₀, …}.

Let us give some examples. We start by observing that elementary functions are continuous.

² The condition xₙ ≠ x₀ of Proposition 451 does not appear here because x₀ belongs to A.

Example 476 (i) Let f : R₊₊ → R be given by f(x) = log x. Since lim_{x→x₀} log x = log x₀ for every x₀ > 0, the function is continuous.

(ii) Let f : R → R be given by f(x) = aˣ, with a > 0. Since lim_{x→x₀} aˣ = a^{x₀} for every x₀ ∈ R, the function is continuous.

(iii) Let f, g : R → R be given by f(x) = sin x and g(x) = cos x. Since lim_{x→x₀} sin x = sin x₀ and lim_{x→x₀} cos x = cos x₀, both functions are continuous. ▲

Let us now see some examples of discontinuity.

Example 477 The function f : R → R given by

f(x) = 1/x if x ≠ 0,   f(x) = 0 if x = 0   (12.3)

is not continuous at x₀ = 0, and therefore on its domain R, but it is so on R \ {0}. The same is true for the function f : R → R given by

f(x) = 1/x² if x ≠ 0,   f(x) = 0 if x = 0   (12.4)

▲

Example 478 The function f : R → R given by

f(x) = 2 if x > 1,   f(x) = x if x ≤ 1   (12.5)

is not continuous at x₀ = 1, and therefore on its domain R, but it is so both on (−∞, 1) and on (1, +∞). ▲

Example 479 The Dirichlet function is not continuous at any point of its domain: lim_{x→x₀} f(x) does not exist for any x₀ ∈ R (Example 431). ▲

Let us now consider some functions of several variables.

Example 480 (i) Let f : Rⁿ → R be given by f(x) = 1 + Σᵢ₌₁ⁿ xᵢ. Proceeding as in Example 449, we can verify that lim_{x→x₀} f(x) = f(x₀) for every x₀ ∈ Rⁿ. The function is, therefore, continuous.

(ii) The function f(x₁, x₂) = x₁² + 1/x₂ is continuous: it is indeed continuous at each point of its domain A = {x = (x₁, x₂) ∈ R² : x₂ ≠ 0}. ▲

Example 481 Consider the function f : R → R given by

f(x) = 2x + b if x ≤ 2,   f(x) = 4 − x² if x > 2   (12.6)

For which values of b is f continuous at x₀ = 2 (so, on its domain)? To answer this question, it is necessary to find the value of b such that

lim_{x→2⁻} f(x) = lim_{x→2⁺} f(x) = f(2)

We have lim_{x→2⁻} f(x) = 4 + b = f(2) and lim_{x→2⁺} f(x) = 0, so that f is continuous at x₀ = 2 if and only if 4 + b = 0, i.e., when b = −4. Therefore, for b = −4 the function (12.6) is continuous on R, while for b ≠ −4 it is continuous on R \ {2}. ▲
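A quick numerical check of this gluing condition (a Python sketch with illustrative step sizes):

    # Example 481: f(x) = 2x + b for x <= 2, f(x) = 4 - x^2 for x > 2.
    # Matching the one-sided limits at x0 = 2 forces 4 + b = 0, i.e. b = -4.
    def make_f(b):
        return lambda x: 2 * x + b if x <= 2 else 4 - x ** 2

    f = make_f(-4)
    for h in (0.1, 0.01, 0.001):
        print(f(2 - h), f(2), f(2 + h))  # all three values cluster around 0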

Note that when f is continuous at x₀, we can write

lim_{x→x₀} f(x) = f(x₀) = f(lim_{x→x₀} x)

so that f and lim become interchangeable. Such interchangeability is the essence of the concept of continuity.

O.R. Naively, we could claim that a function such as f(x) = 1/x has a (huge) discontinuity at x = 0. After all, it makes a “big jump” by passing from −∞ to +∞.

[Figure: the graph of f(x) = 1/x, with a vertical asymptote at x = 0]

In contrast, the function g(x) = log x does not suffer from any such problem, so it seems “more continuous”:

[Figure: the graph of g(x) = log x]

If we pay close attention to these two functions, however, we realize that 1/x commits the little sin of not being defined for x = 0 (an “original” sin), while log x commits the much more serious sin of being defined neither at x = 0 nor at any x < 0.

The truth is that, at the points at which a function is not defined, it is meaningless to wonder about its continuity,³ a property that can only be considered at points where the function is defined. At such points, the functions 1/x and log x are both continuous. ▼

³ It would be like asking whether green pigs are able to fly: they do not exist, so the question is meaningless.

12.2 Discontinuity

As the examples just seen indicate, for functions of a single variable there are different types of discontinuity:⁴

(i) f is not continuous at x₀ because lim_{x→x₀} f(x) exists and is finite, but it is different from f(x₀);

(ii) f is not continuous at x₀ because the one-sided limits lim_{x→x₀⁻} f(x) and lim_{x→x₀⁺} f(x) exist and are finite, but they are different, i.e., lim_{x→x₀⁻} f(x) ≠ lim_{x→x₀⁺} f(x) (so, lim_{x→x₀} f(x) does not exist);

(iii) f is not continuous at x₀ because at least one of the one-sided limits lim_{x→x₀⁻} f(x) and lim_{x→x₀⁺} f(x) is either ±∞ or does not exist.

For example, the discontinuity at x₀ = 1 of the function (12.2) is of type (i) because lim_{x→1} f(x) exists, but it is different from f(1). The discontinuity at x₀ = 1 of the function (12.5) is of type (ii) because

lim_{x→1⁻} f(x) = 1 ≠ lim_{x→1⁺} f(x) = 2

⁴ Recall that f(x₀) ∈ R; we cannot have f(x₀) = ±∞.

On the contrary, the discontinuity at x₀ = 0 of the function (12.3) is of type (iii) because

lim_{x→0⁻} f(x) = −∞ ≠ lim_{x→0⁺} f(x) = +∞

In the same way, the discontinuity at x₀ = 0 of the function (12.4) is of type (iii) because

lim_{x→0⁻} f(x) = lim_{x→0⁺} f(x) = lim_{x→0} f(x) = +∞

(the two-sided limit here exists, but it is infinite). The discontinuity at each point x₀ ∈ R of the Dirichlet function is also of type (iii) because it is easy to see that its one-sided limits do not exist.

When the discontinuity at a point x₀ is of type (i) we talk of a removable discontinuity, while when it is of type (ii) or (iii) we talk of a non-removable discontinuity. In particular, the non-removable discontinuity (ii) is called a jump, while (iii) is called an essential non-removable discontinuity.

Note that when a function f has a jump discontinuity at a point x₀, its “jump” is given by the difference

lim_{x→x₀⁺} f(x) − lim_{x→x₀⁻} f(x)

For example, the function (12.5) has at x₀ = 1 a jump equal to

lim_{x→x₀⁺} f(x) − lim_{x→x₀⁻} f(x) = 2 − 1 = 1

A non-removable discontinuity is, definitely, a more severe form of discontinuity than a removable one (as the terminology suggests). Indeed, the latter can be “fixed” by modifying the function f at x₀ in the following way:

f̃(x) = f(x) if x ≠ x₀,   f̃(x) = lim_{x→x₀} f(x) if x = x₀   (12.7)

The function f̃ is the “fixed” version of the function f that restores continuity at x₀. For example, the fixed version of the function (12.2) is

f̃(x) = f(x) if x ≠ 1, f̃(x) = lim_{x→1} f(x) if x = 1;   that is,   f̃(x) = x if x ≤ 1, f̃(x) = 1 if x > 1

As the reader can easily verify, such fixing is no longer possible for non-removable discontinuities, which represent substantial discontinuities of a function.

A monotonic (increasing or decreasing) function cannot have discontinuities of type (i) or (iii). Indeed, suppose that f is increasing (similar considerations hold in the decreasing case). Increasing monotonicity guarantees that the right and the left limits exist, with

lim_{x→x₀⁻} f(x) ≤ lim_{x→x₀⁺} f(x) ≤ lim_{x→y₀⁻} f(x) ≤ lim_{x→y₀⁺} f(x)

for each pair of points x₀ < y₀ of the domain of f. Therefore, these limits cannot be infinite, which excludes discontinuities of type (iii).

Moreover, f cannot even have removable discontinuities because they would violate monotonicity. Therefore, a monotonic function can only have jump discontinuities. Indeed, the next result shows that a monotonic function can have at most countably many jump discontinuities. The proof of this useful result is based on the following lemma, which is of independent interest.

Lemma 482 A collection of disjoint intervals of R is at most countable.

Proof Let {I_j}_{j∈J} be a set of disjoint intervals of R. By the density of the rational numbers, each interval I_j contains a rational number q_j. Since the intervals are disjoint, q_j ≠ q_{j′} for j ≠ j′. Then the set of rational numbers {q_j}_{j∈J} is a subset of Q and is, therefore, at most countable. In turn, this implies that the index set J is at most countable.

The disjointedness hypothesis cannot be removed: for instance, the set of overlapping intervals {(−r, r) : r > 0} is clearly uncountable.

Proposition 483 A monotonic function can have at most countably many jump discontinuities.

Proof A jump discontinuity of the function f at the point x₀ determines a bounded interval with endpoints lim_{x→x₀⁻} f(x) and lim_{x→x₀⁺} f(x). By the monotonicity of f, the intervals determined by the jumps are disjoint. By Lemma 482, the intervals, and therefore the jumps of f, are at most countable.

In the proof the monotonicity hypothesis is key for having countably many discontinuities: it guarantees that the intervals defined by the jumps of the function do not overlap.

12.3 Operations and composition

The next result illustrates the behavior of continuity with respect to the algebra of functions.

Proposition 484 Let f, g : A ⊆ Rⁿ → R be continuous at x₀ ∈ A. Then:

(i) the function f + g is continuous at x₀;

(ii) the function fg is continuous at x₀;

(iii) the function f/g is continuous at x₀, provided that g(x₀) ≠ 0.

Proof We prove (i), leaving the other points to the reader. Since lim_{x→x₀} f(x) = f(x₀) ∈ R and lim_{x→x₀} g(x) = g(x₀) ∈ R, Proposition 459-(i) yields

lim_{x→x₀} (f + g)(x) = lim_{x→x₀} f(x) + lim_{x→x₀} g(x) = f(x₀) + g(x₀) = (f + g)(x₀)

Therefore, f + g is continuous at x₀.

For example, each polynomial f(x) = α₀ + α₁x + α₂x² + ⋯ + αₙxⁿ is continuous. Indeed, for each x₀ ∈ R we have

lim_{x→x₀} f(x) = lim_{x→x₀} (α₀ + α₁x + α₂x² + ⋯ + αₙxⁿ)

= lim_{x→x₀} α₀ + lim_{x→x₀} α₁x + lim_{x→x₀} α₂x² + ⋯ + lim_{x→x₀} αₙxⁿ

= α₀ + α₁x₀ + α₂x₀² + ⋯ + αₙx₀ⁿ = f(x₀)

Continuity is preserved by the composition of functions:

Proposition 485 Let f : A ⊆ Rⁿ → R and g : B ⊆ R → R be such that Im f ⊆ B. If f is continuous at x₀ ∈ A and g is continuous at f(x₀), then g ∘ f is continuous at x₀.

Proof Let {xₙ} ⊆ A be such that xₙ → x₀. By Proposition 475, f(xₙ) → f(x₀). Since g is continuous at f(x₀), another application of Proposition 475 shows that g(f(xₙ)) → g(f(x₀)). Therefore, g ∘ f is continuous at x₀.

As the next example shows, the result can be useful also in the computation of limits since, when its hypotheses hold, we can write

lim_{x→x₀} (g ∘ f)(x) = (g ∘ f)(x₀) = g(f(x₀)) = g(lim_{x→x₀} f(x))   (12.8)

If a limit involves a composition of continuous functions, (12.8) makes its computation immediate.

Example 486 Let f : R \ {−π} → R be given by f(x) = x²/(x + π) and g : R → R be given by g(x) = sin x. Since g is continuous, by Proposition 485 g ∘ f is continuous at every x ∈ R \ {−π}. The observation is useful, for example, to compute the limit

lim_{x→π} sin(x²/(x + π))

Indeed, once we observe that it can be written in terms of g ∘ f, then by (12.8) we have

lim_{x→π} sin(x²/(x + π)) = lim_{x→π} (g ∘ f)(x) = (g ∘ f)(π) = sin(π²/2π) = sin(π/2) = 1

Therefore, continuity allows us to calculate limits by substitution. ▲

12.4 Zeros and equilibria

Continuous functions have remarkable properties that often assign them a key role in applications. In this section we study some of these applications, with in addition a short preview of Weierstrass' Theorem, a fundamental property of continuous functions whose detailed study is postponed to Chapter 18.

12.4.1 Zeros

The first result, Bolzano's Theorem,⁵ is very intuitive. Yet its proof, although simple, is not trivial, showing how statements that are intuitive might be difficult to prove. Intuition is a fundamental guide in the search for new results, but it may be misleading. Sometimes, properties that appeared to be intuitively true turned out to be false.⁶ For this reason, proof is the unique way of establishing the validity of a result; intuition, even the most refined one, must at a certain point give way to the rigor of the mathematical argument.

Theorem 487 (Bolzano) Let f : [a, b] → R be a continuous function. If f(a) · f(b) ≤ 0, then there exists c ∈ [a, b] such that f(c) = 0. Moreover, if f is strictly monotonic, such c is unique.

Note that the condition f(a) · f(b) ≤ 0 is equivalent to asking that the two values do not have the same sign. The clear intuitive meaning of this theorem is revealed by the next figure:

[Figure: a continuous function on [a, b] with f(a) < 0 < f(b), crossing the x-axis at c]

Proof If f(a) · f(b) = 0, either f(a) = 0 or f(b) = 0. In the first case, the result holds by setting c = a; in the second case, by setting c = b. If instead f(a) · f(b) < 0, then we have either f(a) < 0 < f(b) or f(b) < 0 < f(a). Let us study the case f(a) < 0 < f(b) (the case f(b) < 0 < f(a) is analogous). Denote by C the set of values of x ∈ [a, b] such that f(x) < 0 and let c = sup C. By Proposition 120, recall that: (i) c ≥ x for all x ∈ C, and (ii) for each ε > 0 there exists x₀ ∈ C such that x₀ > c − ε.

We next prove that f(c) = 0. By contradiction, assume that f(c) ≠ 0, that is, either f(c) < 0 or f(c) > 0. If f(c) < 0, by the Theorem on the permanence of sign there exists a neighborhood (c − δ, c + δ) such that f(x) < 0 for all x ∈ (c − δ, c + δ). By the definition of C, this implies that c + δ/2 ∈ C, yielding that c cannot be the supremum, a contradiction. Conversely, if f(c) > 0, again by the Theorem on the permanence of sign there exists a neighborhood (c − δ, c + δ) of c such that f(x) > 0 for all x ∈ (c − δ, c + δ). By the definition of C, we have (c − δ, c + δ) ∩ C = ∅. By choosing ε = δ, this implies that there exists no x₀ ∈ C such that x₀ > c − ε, a contradiction.

Finally, if f is strictly monotonic, it is injective (Proposition 207) and therefore there exists a unique point c ∈ [a, b] such that f(c) = 0.

⁵ The result is named after Bernard Bolzano, who gave a first proof in 1817.
⁶ Recall Guidi's crescendo in Section 10.3.2.

A simple application of the result concerns the real solutions of a polynomial equation. Let f : R → R be the polynomial

f(x) = α₀ + α₁x + α₂x² + ⋯ + αₙxⁿ   (12.9)

and let us study the polynomial (or algebraic) equation f(x) = 0. The equation does not always have real solutions: for example, this is the case for the equation f(x) = 0 with f(x) = x² + 1. Thanks to Bolzano's Theorem, we have the following result, which guarantees that each polynomial equation of odd degree always has at least one real solution.

Corollary 488 If the degree of the polynomial f in (12.9) is odd, there exists at least one x̂ ∈ R such that f(x̂) = 0.

Proof Let us suppose αₙ > 0 (otherwise, we consider −f) and let g : R → R be given by g(x) = α₀ + α₁x + α₂x² + ⋯ + α_{n−1}x^{n−1}. We have g(x) = o(xⁿ) both as x → +∞ and as x → −∞. We can therefore write f(x) = αₙxⁿ + o(xⁿ) both as x → +∞ and as x → −∞, which implies lim_{x→+∞} f(x) = +∞ and lim_{x→−∞} f(x) = −∞. Hence there exist x₁ < x₂ such that f(x₁) < 0 < f(x₂). The function f is continuous on the interval [x₁, x₂]. Therefore, by Bolzano's Theorem there exists x̂ ∈ (x₁, x₂) such that f(x̂) = 0.
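Bolzano's Theorem also has a constructive side: repeatedly halving the interval while keeping endpoints of opposite signs traps a zero. The following Python sketch implements this bisection idea and applies it to an odd-degree polynomial (the polynomial x³ − x − 2, the bracket [−10, 10], and the tolerance are our own illustrative choices):

    # A minimal bisection routine: the constructive side of Bolzano's Theorem.
    # Given f continuous on [a, b] with f(a)*f(b) <= 0, it traps a zero.
    def bisect(f, a, b, tol=1e-10):
        fa, fb = f(a), f(b)
        assert fa * fb <= 0, "need opposite signs at the endpoints"
        while b - a > tol:
            c = (a + b) / 2
            if fa * f(c) <= 0:   # zero lies in [a, c]
                b = c
            else:                # zero lies in [c, b]
                a, fa = c, f(c)
        return (a + b) / 2

    # Odd-degree polynomial from Corollary 488: x^3 - x - 2 has a real root.
    root = bisect(lambda x: x**3 - x - 2, -10, 10)
    print(root, root**3 - root - 2)  # root ~ 1.5214, residual ~ 0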

O.R. In presenting Bolzano's Theorem, we remarked on the limits of intuition. A nice example in this regard is the following. Imagine you put a rope around the Earth at the equator (about 40,000 km) such that it perfectly adheres to the equator at each point. Now, imagine that you add one meter to the rope and you lift it by keeping its distance from the ground uniform. What is the measure of this uniform distance? We are all tempted to say “very, very small: one meter out of forty thousand km is nothing!” Instead, no: the distance is 16 cm. Indeed, if c denotes the equatorial Earth circumference (in meters), the Earth radius is r = c/2π; if we add one meter, the new radius is r′ = (c + 1)/2π and the difference between the two is r′ − r = 1/2π ≈ 0.1592. This proves another remarkable fact: the distance of about 16 centimeters is independent of c: no matter whether it is the Earth, or the Sun, or a tennis ball, the addition of one meter to the length of the rope always causes a lift of 16 cm! As the manifesto of the Vienna Circle remarked, “Intuition ... is especially emphasized by metaphysicians as a source of knowledge.... However, rational justification has to pursue all intuitive knowledge step by step. The seeker is allowed any method; but what has been found must stand up to testing.” ▼

12.4.2 Equilibria

The next result is a further consequence of Bolzano's Theorem, with a remarkable economic application: the existence and the uniqueness of the market equilibrium price.

Proposition 489 Let f, g : [a, b] → R be continuous. If f(a) ≥ g(a) and f(b) ≤ g(b), there exists c ∈ [a, b] such that

f(c) = g(c)

If f is strictly decreasing and g is strictly increasing, such c is unique.

Proof Let h : [a, b] → R be defined by h(x) = f(x) − g(x). Then

h(a) = f(a) − g(a) ≥ 0   and   h(b) = f(b) − g(b) ≤ 0

Since h is continuous, by Bolzano's Theorem there exists c ∈ [a, b] such that h(c) = 0, that is, f(c) = g(c).

If f is strictly decreasing and g is strictly increasing, then h is strictly decreasing. Therefore, again by Bolzano's Theorem, c is unique.

We now apply the result to establish the existence and uniqueness of the market equi-
librium price. Let D : [a, b] → R and S : [a, b] → R be the demand and supply functions of
some good, where [a, b] ⊆ R₊ is the set of the prices at which the good can be traded (see
Section 8.4). A pair (p, q) ∈ [a, b] × R₊ of prices and quantities is called a market equilibrium
if

    q = D(p) = S(p)

A fundamental problem is the existence, and the possible uniqueness, of such an equilib-
rium. By Proposition 489, so ultimately by Bolzano's Theorem, we can solve the problem in
a very general way. Let us assume that S(a) ≤ D(a) and S(b) ≥ D(b). That is, at the smal-
lest possible price a, the demand of the good is greater than its supply, while the opposite
is true at the highest possible price b. These hypotheses are natural. By Proposition 489,
they guarantee the existence of an equilibrium price p ∈ [a, b], i.e., such that D(p) = S(p).
The equilibrium quantity is q = D(p) = S(p). Therefore, the pair of prices and quantities
(p, q) is a market equilibrium.

Moreover, again by Proposition 489, the market has a unique market equilibrium (p, q) if
we assume that the demand function D is strictly decreasing (at greater prices, smaller
quantities are demanded) and that the supply function S is strictly increasing (at
greater prices, greater quantities are offered).

Because of its importance, we formally state this market equilibrium result.

Proposition 490 Let D : [a, b] → R and S : [a, b] → R be continuous and such that
D(a) ≥ S(a) and D(b) ≤ S(b). Then there exists a market equilibrium (p, q) ∈ [a, b] × R₊.
If, in addition, D is strictly decreasing and S is strictly increasing, such an equilibrium is unique.

The next figure illustrates the result graphically; it corresponds to the classic "inter-
section" of demand and supply:
[Figure: the graph of the decreasing demand function D and of the increasing supply function S, whose intersection is the market equilibrium]

In equilibrium analysis, Bolzano's Theorem is often applied through the excess demand
function E : [a, b] → R defined by

    E(p) = D(p) − S(p)

We have E(p) ≥ 0 when at the price p the demand exceeds the supply; otherwise, we have
E(p) ≤ 0. Therefore, p ∈ [a, b] is an equilibrium price if and only if E(p) = 0, i.e., if and only
if p equalizes demand and supply. The equilibrium price p is thus a zero of the excess demand
function; the conditions on the functions D and S assumed in Proposition 490 guarantee the
existence and uniqueness of such a zero.
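Numerically, such a zero can be found by bisection applied to E; a sketch reusing the
bisect routine from the earlier sketch, with illustrative linear demand and supply functions
of our own choosing:

    # Illustrative (not from the text): linear demand and supply on [a, b] = [0, 10].
    D = lambda p: 8 - 0.5 * p          # strictly decreasing demand
    S = lambda p: 1 + 0.9 * p          # strictly increasing supply
    E = lambda p: D(p) - S(p)          # excess demand; E(0) > 0 > E(10)

    p_star = bisect(E, 0, 10)          # bisect defined in the earlier sketch
    print(p_star, D(p_star))           # equilibrium price = 5, quantity = 5.5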

A final observation: the reader can easily verify that the uniqueness part of Proposition 489
holds as long as (i) f and g have opposite monotonicity (one is increasing and the other
decreasing), and (ii) at least one of them is strictly monotone. In the statement we assumed
f to be strictly decreasing and g to be strictly increasing both for simplicity and in view of
the application to market equilibrium.

12.5 Weierstrass' Theorem: a preview

A continuous function defined on a compact (i.e., closed and bounded) domain enjoys a fun-
damental property: on such a domain it attains both its maximum and minimum values, that
is, it has a maximizer and a minimizer. This result is the content of Weierstrass' Theorem
(sometimes called the Extreme Value Theorem), which is central in mathematical analysis. Here
we state the theorem for functions of a single variable defined on a compact interval [a, b].
In Chapter 18 we will state and prove it in the more general case of functions of several
variables defined on compact sets of Rⁿ.

Theorem 491 A continuous function f : [a, b] → R has (at least) one minimizer and (at
least) one maximizer in [a, b], that is, there exist x₁, x₂ ∈ [a, b] such that

    f(x₁) = max_{x∈[a,b]} f(x)   and   f(x₂) = min_{x∈[a,b]} f(x)

The hypotheses of continuity of f and of compactness (closure and boundedness) of its
domain are both indispensable. In the absence of either one, the existence of a maximizer
or of a minimizer is no longer guaranteed, as the next simple examples show.

Example 492 (i) Let f : [0, 1] → R be given by

    f(x) =  x      if x ∈ (0, 1)
            1/2    if x ∈ {0, 1}

Then f is defined on the compact interval [0, 1] but is not continuous. It is easy to see that
f has neither a maximizer nor a minimizer: its supremum 1 and its infimum 0 are approached
but never attained.


(ii) Let f : (0, 1) → R be given by f(x) = x. Here f is continuous but the interval (0, 1)
is not compact (it is open). In this case, too, the function has neither a maximizer nor a
minimizer.

(iii) Let f : [0, +∞) → R be given by f(x) = x. The function f is continuous but the
interval [0, +∞) is not compact (it is closed but not bounded). The function does not have a
maximizer (it has only the minimizer 0).


(iv) Let f : R → R be given by (see Proposition 253)

    f(x) =  1 − (1/2)eˣ    if x < 0
            (1/2)e⁻ˣ       if x ≥ 0

with graph

[Figure: graph of f, strictly decreasing from the asymptote y = 1 at −∞ down to the asymptote y = 0 at +∞, with f(0) = 1/2]

The function f is continuous (and bounded) but R is not compact (it is closed but not
bounded). The function has neither a maximizer nor a minimizer. N

12.6 Intermediate Value Theorem

An important extension of Bolzano's Theorem is the Intermediate Value Theorem, to which
we devote this section. The next lemma establishes a first remarkable property.

Lemma 493 Let f : [a, b] → R be continuous, with f(a) ≤ f(b). If

    f(a) ≤ z ≤ f(b)

then there exists a ≤ c ≤ b such that f(c) = z. If f is strictly increasing, such c is unique.

Proof If f(a) = f(b), it is sufficient to set c = a or c = b. Let f(a) < f(b) and let
g : [a, b] → R be defined by g(x) = f(x) − z. We have

    g(a) = f(a) − z ≤ 0   and   g(b) = f(b) − z ≥ 0

Since f is continuous, by Bolzano's Theorem there exists c ∈ [a, b] such that g(c) = 0, that
is, f(c) = z.
The function g is strictly monotonic if and only if f is so. Therefore, by Bolzano's
Theorem such c is unique whenever f is strictly monotonic.

The function assumes, therefore, all the values between f(a) and f(b), without any
"breaks". The lemma formalizes the intuition, given at the beginning of the chapter, that the
graph of a continuous function can be drawn without ever lifting the pencil.

The case f(a) ≥ f(b) is analogous. We can thus say, in general, that for any z such that

    min{f(a), f(b)} ≤ z ≤ max{f(a), f(b)}

there exists a ≤ c ≤ b such that f(c) = z. If f is strictly monotonic, such c is unique.


The Theorem of the zeros is, therefore, the special case in which

    min{f(a), f(b)} ≤ 0 ≤ max{f(a), f(b)}

that is, f(a) · f(b) ≤ 0.

Together with Weierstrass' Theorem, Lemma 493 implies the following classic result.

Theorem 494 (Intermediate Value Theorem) Let f : [a, b] → R be continuous. Set

    m = min_{x∈[a,b]} f(x)   and   M = max_{x∈[a,b]} f(x)

Then, for any z with

    m ≤ z ≤ M

there exists c ∈ [a, b] such that f(c) = z. If f is strictly monotonic, such c is unique.

In other words, we have

    Im f = [m, M]

Since min_{x∈[a,b]} f(x) and max_{x∈[a,b]} f(x) are, respectively, the minimum and the maximum
values among all the values that f(x) assumes on the interval [a, b], the Intermediate Value
Theorem, too, has a clear intuitive meaning. It is illustrated by the following figure:

[Figure: graph of f on [a, b], with horizontal lines at the minimum value m, at an intermediate value z = f(c), and at the maximum value M]

Proof Let z ∈ [m, M]. By Weierstrass' Theorem, there exist a maximizer and a minimizer of
f in [a, b]. Let x₁, x₂ ∈ [a, b] be such that m = f(x₁) and M = f(x₂). Suppose, without loss
of generality, that x₁ ≤ x₂ and consider the interval [x₁, x₂]. The function f is continuous
on [x₁, x₂]. Since f(x₁) ≤ z ≤ f(x₂), thanks to Lemma 493 there exists c ∈ [x₁, x₂] ⊆ [a, b]
such that f(c) = z.
If f is strictly monotonic, it is injective (Proposition 207) and therefore the point c ∈ [a, b]
such that f(c) = z is unique.

The continuity of f on [a, b] is crucial for Lemma 493 (and therefore for the Intermediate
Value Theorem). To see this, consider, for example, the so-called signum function sgn : R → R
defined by

    sgn x =   1    if x > 0
              0    if x = 0
             −1    if x < 0

Its restriction sgn : [−1, 1] → R to the interval [−1, 1] is continuous at all the points of this
interval except the origin 0, at which it has a non-removable jump discontinuity. So, the
continuity hypothesis of Lemma 493 does not hold. Indeed, the image of sgn consists of only
three points, {−1, 0, 1}: for every z ∈ [−1, 1] with z ≠ −1, 0, 1, there is no x ∈ [−1, 1] such
that sgn x = z.

A nice consequence of the Intermediate Value Theorem is a characterization of scalar
continuous injective functions that completes what we established in Proposition 207.

Proposition 495 Let f : I → R be a continuous function defined on an interval, bounded
or not, of the real line. Then, f is injective if and only if it is strictly monotone.

Proof The "if" follows from Proposition 207. As to the converse, assume that f is in-
jective. Suppose, by contradiction, that f is not strictly monotone. Then, there exist
x < z < y such that either f(z) > max{f(x), f(y)} or f(z) < min{f(x), f(y)}. Suppose
that f(z) > max{f(x), f(y)}, the other case being handled similarly. Let f(z) > k >
max{f(x), f(y)}. By the Intermediate Value Theorem, there exist t′ₖ ∈ [x, z] and t″ₖ ∈ [z, y]
such that f(t′ₖ) = f(t″ₖ) = k, thus contradicting the injectivity of f. We conclude that f is
strictly monotone.

Without continuity the "only if" fails: consider the discontinuous function f : R → R
given by

    f(x) =   x    if x ∈ Q
            −x    otherwise

It is not strictly monotone: if x = 3, z = π and y = 4, we have x < z < y and f(z) <
min{f(x), f(y)}. Yet, f is injective. Indeed, let x ≠ y. Clearly, f(x) ≠ f(y) if either
x, y ∈ Q or x, y ∉ Q. If x ∈ Q and y ∉ Q, then f(x) = x ∈ Q and f(y) = −y ∉ Q, and so
f(x) ≠ f(y). We conclude that f is injective.

12.7 Limits and continuity of operators

The notion of continuity extends in a natural way to operators f : A ⊆ Rⁿ → Rᵐ. First of
all, note that such an operator can be seen as an m-tuple (f₁, ..., fₘ) of functions of several
variables

    fᵢ : A ⊆ Rⁿ → R    for i = 1, 2, ..., m

defined by

    y₁ = f₁(x₁, ..., xₙ)
    y₂ = f₂(x₁, ..., xₙ)
    ⋮
    yₘ = fₘ(x₁, ..., xₙ)

The functions fᵢ are the component functions of the operator f. For example, let us go back
to the operators of Example 179.

Example 496 (i) If f : R² → R² is defined by f(x₁, x₂) = (x₁, x₁x₂), then

    f₁(x₁, x₂) = x₁
    f₂(x₁, x₂) = x₁x₂

(ii) If f : R³ → R² is defined by

    f(x₁, x₂, x₃) = (2x₁² + x₂ + x₃, x₁x₂⁴)

then

    f₁(x₁, x₂, x₃) = 2x₁² + x₂ + x₃
    f₂(x₁, x₂, x₃) = x₁x₂⁴

N
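In code, an operator is naturally represented through its component functions; a minimal
sketch of Example 496(ii):

    # Operator f : R^3 -> R^2 from Example 496(ii), built from its components.
    def f1(x1, x2, x3):
        return 2 * x1**2 + x2 + x3

    def f2(x1, x2, x3):
        return x1 * x2**4

    def f(x):
        x1, x2, x3 = x
        return (f1(x1, x2, x3), f2(x1, x2, x3))   # m-tuple of component values

    print(f((1.0, 2.0, 3.0)))   # (7.0, 16.0)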

The notion of limit extends in a natural way to operators.

Definition 497 Let f : A ⊆ Rⁿ → Rᵐ be an operator and x₀ ∈ Rⁿ a limit point of A. We
write

    lim_{x→x₀} f(x) = L ∈ Rᵐ

if, for every neighborhood V_ε(L) of L, there exists a neighborhood U_{δ_ε}(x₀) of x₀ such that

    x₀ ≠ x ∈ U_{δ_ε}(x₀) ∩ A  ⟹  f(x) ∈ V_ε(L)

The value L is called the limit of the operator f at x₀.

For m = 1 we find again Definition 448 of the limit of a function of several variables. Note
that here L is a vector of Rᵐ.⁷

Definition 498 An operator f : A ⊆ Rⁿ → Rᵐ is said to be continuous at a limit point
x₀ ∈ A if

    lim_{x→x₀} f(x) = f(x₀)

Moreover, by convention f is continuous at each isolated point of A.

Here, too, an operator that is continuous at all the points of a subset E of the domain
A is called continuous on E, while an operator that is continuous at all the points of its
domain is called continuous. It is easy to see that the two operators of the last example are
continuous.

By writing f = (f₁, ..., fₘ) one obtains the following componentwise characterization of
continuity, whose proof is left to the reader.

Proposition 499 An operator f = (f₁, ..., fₘ) : A ⊆ Rⁿ → Rᵐ is continuous at a point
x₀ ∈ A if and only if all its component functions fᵢ : A ⊆ Rⁿ → R are continuous at x₀.

The continuity of an operator is thus brought back to the continuity of its component
functions: a componentwise notion of continuity.

In Section 8.15 we saw that the convergence of vectors is equivalent to that of their
components. This allows the reader to prove the next sequential characterization of
continuity, which extends Proposition 475 to operators.

Proposition 500 An operator f : A ⊆ Rⁿ → Rᵐ is continuous at a point x₀ of A if and
only if f(xₙ) → f(x₀) for every sequence {xₙ} of points of A such that xₙ → x₀.

The statement is formally identical to that of Proposition 475, but here f(xₙ) → f(x₀)
indicates convergence of vectors in Rᵐ.

Proposition 500 permits us to extend to operators the continuity results established for
functions of several variables, except those that use in an essential way the order structure
of their codomain R (e.g., Bolzano's and Weierstrass' Theorems). We leave such extensions
to the reader.

7 For simplicity, we do not consider possible "extended values", that is, vectors L with one or more
coordinates equal to ±∞.

12.8 Equations, fixed points, and market equilibria

12.8.1 Equations
An operator f = (f₁, ..., fₙ) : A ⊆ Rⁿ → Rⁿ defines an equation

    f(x) = 0        (12.10)

that is,

    f₁(x) = 0
    f₂(x) = 0
    ⋮               (12.11)
    fₙ(x) = 0

The vector x is the unknown of the equation. The solutions of equation (12.10) are all x ∈ A
such that f(x) = 0.⁸
For example, the second order equation

    α₀ + α₁x + α₂x² = 0

can be written as

    f(x) = 0        (12.12)

where f : R → R is the polynomial f(x) = α₀ + α₁x + α₂x². Its solutions are all x ∈ R that
satisfy (12.12).
Later in the book (Section 13.7) we will study systems of linear equations, which can be
written as f(x) = 0 through the affine operator f : Rⁿ → Rⁿ defined by f(x) = Ax − b.

A main issue in dealing with equations is the existence of solutions, that is, whether there
exist vectors x ∈ A such that f(x) = 0. As is well known from (at least) high school, this
might well not be the case: consider f : R → R given by f(x) = x² + 1; there is no x ∈ R
such that x² + 1 = 0.
Bolzano's Theorem is a powerful result to establish the existence of solutions in the scalar
case. Indeed, if f : A ⊆ R → R is a continuous function, then the equation

    f(x) = 0        (12.13)

has a solution provided there exist x′, x″ ∈ A such that f(x′) < 0 < f(x″). For instance, in
this way Corollary 488 was able to establish the existence of solutions of some polynomial
equations.
Bolzano's Theorem admits a generalization to Rⁿ that, surprisingly, turns out to be a
quite difficult result, known as the Poincaré-Miranda Theorem.⁹ A piece of notation: given a
vector x ∈ Rⁿ, we write (xᵢ, x₋ᵢ) to emphasize the component i of vector x. For instance, if
x = (4, 7, 11) then x₁ = 4 and x₋₁ = (7, 11), while x₃ = 11 and x₋₃ = (4, 7).

8 Often (12.11) is referred to as a "system of equations", each fᵢ(x) = 0 being an equation. We will also use
this terminology when dealing with systems of linear equations (Section 13.7). In view of (12.10), however,
one should use this terminology cum grano salis.
9 It was stated in 1883 by Henri Poincaré and proved by Carlo Miranda in 1940. For a proof, we refer
interested readers to Kulpa (1997).

Theorem 501 (Poincaré-Miranda) Consider a continuous operator f = (f₁, ..., fₙ) :
[a, b] → Rⁿ defined on an interval of Rⁿ. If, for each i = 1, ..., n, we have

    fᵢ(aᵢ, x₋ᵢ) · fᵢ(bᵢ, x₋ᵢ) ≤ 0    for all x₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]        (12.14)

then there exists c ∈ [a, b] such that f(c) = 0.¹⁰

If n = 1, we are back to Bolzano's Theorem. If n = 2, condition (12.14) becomes:

    f₁(a₁, x₂) · f₁(b₁, x₂) ≤ 0    for all x₂ ∈ [a₂, b₂]             (12.15)
    f₂(x₁, a₂) · f₂(x₁, b₂) ≤ 0    for all x₁ ∈ [a₁, b₁]

Under this condition, the Poincaré-Miranda Theorem ensures that, for a continuous oper-
ator f = (f₁, f₂) : [a, b] → R², there exists a point x ∈ [a, b] such that f₁(x) = f₂(x) = 0.
In general, if there exist vectors x′, x″ ∈ A such that condition (12.14) holds on the interval
[x′, x″] ⊆ A, then the equation (12.10) induced by a continuous function f : A ⊆ Rⁿ → Rⁿ
has a solution.

Example 502 Define f : R² → R² by f(x₁, x₂) = (x₁⁵ + x₂², e^(−x₁²) + x₂³). Consider the
equation

    x₁⁵ + x₂² = 0
    e^(−x₁²) + x₂³ = 0

We have lim_{x₁→±∞} f₁(x₁, x₂) = ±∞ for each x₂ ∈ R, as well as lim_{x₂→±∞} f₂(x₁, x₂) = ±∞ for
each x₁ ∈ R. So, there exists an interval [x′, x″] in the plane on which condition (12.15) is
satisfied in the form

    f₁(x′₁, x₂) < 0 < f₁(x″₁, x₂)    for all x₂ ∈ [x′₂, x″₂]
    f₂(x₁, x′₂) < 0 < f₂(x₁, x″₂)    for all x₁ ∈ [x′₁, x″₁]

By the Poincaré-Miranda Theorem, the equation has a solution x ∈ [x′, x″], with f₁(x) =
f₂(x) = 0. N
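The sign conditions can be checked numerically on a concrete box; in the sketch below the
box [−1.5, 1] × [−2, 0] is our own illustrative choice:

    import math

    f1 = lambda x1, x2: x1**5 + x2**2
    f2 = lambda x1, x2: math.exp(-x1**2) + x2**3

    # Candidate box [x1', x1''] x [x2', x2''] (our choice, for illustration).
    a1, b1, a2, b2 = -1.5, 1.0, -2.0, 0.0
    grid = [i / 50 for i in range(-100, 101)]

    ok = (all(f1(a1, x2) < 0 < f1(b1, x2) for x2 in grid if a2 <= x2 <= b2) and
          all(f2(x1, a2) < 0 < f2(x1, b2) for x1 in grid if a1 <= x1 <= b1))
    print(ok)   # True: condition (12.15) holds, so a zero exists in the box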

Thanks to the Poincaré-Miranda Theorem, we can establish an operator version of
Proposition 489.

Proposition 503 Let f = (f₁, ..., fₙ), g = (g₁, ..., gₙ) : [a, b] → Rⁿ be continuous operators
defined on an interval of Rⁿ. If, for each i = 1, ..., n, we have

    fᵢ(aᵢ, x₋ᵢ) ≥ gᵢ(aᵢ, x₋ᵢ)   and   fᵢ(bᵢ, x₋ᵢ) ≤ gᵢ(bᵢ, x₋ᵢ)    for all x₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]

then there exists c ∈ [a, b] such that f(c) = g(c).

Proof Let h : [a, b] → Rⁿ be defined by h(x) = f(x) − g(x). Then, for each i = 1, ..., n, we
have

    hᵢ(aᵢ, x₋ᵢ) = fᵢ(aᵢ, x₋ᵢ) − gᵢ(aᵢ, x₋ᵢ) ≥ 0   and   hᵢ(bᵢ, x₋ᵢ) = fᵢ(bᵢ, x₋ᵢ) − gᵢ(bᵢ, x₋ᵢ) ≤ 0

10 For instance, if a, b ∈ R³, then [a₋₁, b₋₁] = [a₂, b₂] × [a₃, b₃], [a₋₂, b₋₂] = [a₁, b₁] × [a₃, b₃] and
[a₋₃, b₋₃] = [a₁, b₁] × [a₂, b₂].

for all x₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]. Since h is continuous, by the Poincaré-Miranda Theorem there exists
c ∈ [a, b] such that h(c) = 0, that is, f(c) = g(c).

Through this result we can generalize the equilibrium analysis that we carried out earlier
in the chapter for the market of a single good (Proposition 490). Consider now a market
where bundles x ∈ Rⁿ₊ of n goods are traded. Let D : [a, b] → Rⁿ₊ and S : [a, b] → Rⁿ₊ be,
respectively, the aggregate demand and supply functions of such bundles, that is, at price
p ∈ [a, b] ⊆ Rⁿ₊ the market demands a quantity Dᵢ(p) ≥ 0 and offers a quantity Sᵢ(p) ≥ 0
of each good i = 1, ..., n.
A pair (p, q) ∈ [a, b] × Rⁿ₊ of prices and quantities is a market equilibrium if

    q = D(p) = S(p)        (12.16)

The last result permits us to establish the existence of such an equilibrium, thus generalizing
Proposition 490 to the general case of n goods. In particular, existence requires that, for
each good i, we have

    Dᵢ(aᵢ, p₋ᵢ) ≥ Sᵢ(aᵢ, p₋ᵢ)   and   Dᵢ(bᵢ, p₋ᵢ) ≤ Sᵢ(bᵢ, p₋ᵢ)    for all p₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]

That is, at its smallest possible price aᵢ, the demand of good i is greater than its supply
regardless of the prices of the other goods, while the opposite is true at its highest possible
price bᵢ. To fix ideas, assume that a = 0. Then, condition Dᵢ(0, p₋ᵢ) ≥ Sᵢ(0, p₋ᵢ) just means
that the demand for a free good always exceeds its supply, regardless of the prices of
the other goods (a reasonable assumption). In contrast, the opposite happens at the highest
price bᵢ, at which the supply of good i exceeds its demand regardless of the prices of the
other goods (a reasonable assumption as long as bᵢ is "high enough").
Via the excess demand function E : [a, b] → Rⁿ defined by

    E(p) = D(p) − S(p)

we can formulate the equilibrium condition (12.16) as a market equation

    E(p) = 0        (12.17)

A pair (p, q) of prices and quantities is a market equilibrium if and only if the price p solves this
equation and q = D(p). There is excess demand for good i at price p if Eᵢ(p) ≥ 0 and excess
supply if Eᵢ(p) ≤ 0. In equilibrium, there is neither excess demand nor excess supply. Next
we state the general existence result in excess demand terms.

Proposition 504 Let the excess demand function E : [a, b] → Rⁿ be continuous and such
that, for each good i = 1, ..., n,

    Eᵢ(bᵢ, p₋ᵢ) ≤ 0 ≤ Eᵢ(aᵢ, p₋ᵢ)    for all p₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]

Then there exists a market equilibrium (p, q) ∈ [a, b] × Rⁿ₊.



12.8.2 Fixed points

We can look at the scalar equation f(x) = 0 from a different angle. Define the auxiliary
function g : A ⊆ R → R by g(x) = λf(x) + x, with λ ≠ 0. A scalar x ∈ A solves the scalar
equation if and only if g(x) = x. The scalar x is then said to be a fixed point of the function g.
So, a scalar is a solution of the equation defined by the function f if and only if it is a fixed
point of the function g. Solving an equation thus amounts to finding a fixed point.
In the scalar case, this remark is just a bit more than a curiosum. In contrast, it becomes
important in the general vector case because sometimes the best way to solve the general
equation (12.10) is to consider an associated fixed point problem, so as to reduce the solution
of an equation to the search for the fixed points of suitable operators. For this reason, in this
section we study fixed points.

An operator f : A ⊆ Rⁿ → Rⁿ is said to be a self-map if f(A) ⊆ A, that is, if f(x) ∈ A
for all x ∈ A. In words, a self-map associates an element of A to each element of A: it
never escapes A. To emphasize this key feature, we often write f : A → A.

Example 505 (i) All operators f : Rⁿ → Rⁿ are, trivially, self-maps. (ii) The function
f : [0, 1] → R given by f(x) = x² is a self-map because x² ∈ [0, 1] for all x ∈ [0, 1]. In
contrast, the function f : [0, 1] → R given by f(x) = x + 1 is not a self-map because, for
instance, f(1) = 2 ∉ [0, 1]. N

Self-maps are important here because they may admit fixed points.

Definition 506 Given a self-map f : A → A, a vector x ∈ A is said to be a fixed point of f
if f(x) = x.

For instance, for the quadratic self-map f : [0, 1] → [0, 1] given by f(x) = x², the
endpoints 0 and 1 are fixed points. For the self-map f : R² → R² given by f(x₁, x₂) =
(x₁, x₁x₂), the origin is a fixed point, in that f(0) = 0.

Turn now to the key question of the existence of fixed points. In the scalar case, it is an
immediate consequence of Bolzano's Theorem.

Lemma 507 A continuous self-map f : [0, 1] → [0, 1] has a fixed point.

Proof The result is obviously true if either f(0) = 0 or f(1) = 1. Suppose f(0) > 0 and
f(1) < 1. Define the auxiliary function g : [0, 1] → R by g(x) = x − f(x). Then, g(0) < 0
and g(1) > 0. Since g is continuous, by Bolzano's Theorem there exists x ∈ (0, 1) such that
g(x) = 0. Hence, f(x) = x, and so x is a fixed point.
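The proof is constructive once bisection is available; a sketch reusing the bisect routine
from the earlier sketch, with cos as an arbitrary illustrative self-map of [0, 1]:

    import math

    # Fixed point of a continuous self-map f : [0,1] -> [0,1] via Lemma 507:
    # bisect (from the earlier sketch) applied to g(x) = x - f(x).
    f = lambda x: math.cos(x)      # cos maps [0,1] into [cos 1, 1], a subset of [0,1]
    g = lambda x: x - f(x)         # g(0) < 0 < g(1), so Bolzano applies
    x_fix = bisect(g, 0, 1)
    print(x_fix, f(x_fix))         # ~ 0.7390851, the unique fixed point of cos here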

In the general case, the existence of fixed points is ensured by the famous Brouwer
Fixed Point Theorem.¹¹ In analogy with the scalar case, it can be viewed as an immediate
consequence of the Poincaré-Miranda Theorem.

Theorem 508 (Brouwer) A continuous self-map f : K → K defined on a convex compact
subset K of Rⁿ has a fixed point.

11 It is named after Luitzen Brouwer, who proved it in 1912.

Proof We prove the result in the special case K = [0, 1]ⁿ. Let I : [0, 1]ⁿ → [0, 1]ⁿ be the
identity function I(x) = x. We have Iᵢ(0ᵢ, x₋ᵢ) ≤ fᵢ(0ᵢ, x₋ᵢ) and Iᵢ(1ᵢ, x₋ᵢ) ≥ fᵢ(1ᵢ, x₋ᵢ)
for all x ∈ [0, 1]ⁿ, where 1 = (1, ..., 1). So, we can apply the Poincaré-Miranda Theorem to
the function I − f, which ensures the existence of a vector x ∈ [0, 1]ⁿ such that (I − f)(x) = 0.
Hence, f(x) = x.

Brouwer's Theorem is a powerful result that only requires the self-map to be continuous.
However, it is demanding on the domain, which has to be a compact and convex set, and it
is a non-constructive existence result: it ensures the existence of a fixed point but gives no
information on how to find it.¹²

12.8.3 Aggregate market analysis via fixed points

Let us go back to equation (12.10), i.e.,

    f(x) = 0

In view of Brouwer's Theorem, we may solve this equation by finding a self-map g : K → K
defined on a convex compact subset K of Rⁿ such that f(x) = 0 if and only if g(x) = x.
In this way, we reduce the solution of the equation to the search for the fixed points of a
self-map.
Nice on paper, but in practice it might well not be an easy task to carry out. Remarkably,
however, this approach works very well to establish the existence of market equilibria. So,
let D : Rⁿ₊ → Rⁿ₊ and S : Rⁿ₊ → Rⁿ₊ be, respectively, the aggregate demand and supply
functions of bundles of n goods.¹³ Through the excess demand operator E = D − S we
can define the market equation (12.17), i.e., E(p) = 0. A pair (p, q) ∈ Rⁿ₊ × Rⁿ₊ of prices and
quantities is a market equilibrium if (12.16) holds, i.e., q = D(p) = S(p). Thus, a market
equilibrium exists if and only if there exists a price vector p, called an equilibrium price, that
solves the market equation.
A weaker notion is often considered, however, that only requires goods' demand not to
exceed their supply: a pair (p, q) ∈ Rⁿ₊ × Rⁿ₊ is a weak market equilibrium if

    q = D(p) ≤ S(p)        (12.18)

To define the corresponding operator equation, define the positive part E⁺ : Rⁿ₊ → Rⁿ₊ of E
by

    Eᵢ⁺(p) = max{Eᵢ(p), 0}    for all i = 1, ..., n

That is, Eᵢ⁺(p) = Eᵢ(p) if Eᵢ(p) > 0 and Eᵢ⁺(p) = 0 otherwise. As a result, given any
price vector p, we have E(p) ≤ 0 if and only if E⁺(p) = 0. So, a weak market equilibrium
exists if and only if there exists a price vector p, called a weak equilibrium price, that solves
the equation

    E⁺(p) = 0        (12.19)

with q = D(p). Note that the domain and the range of E⁺ are Rⁿ₊, not Rⁿ.
A remarkable application of Brouwer's Fixed Point Theorem is the resolution of this more
general market equation. We assume that:

12 Recall the discussion in Section 1.3.2 on existence results.
13 Relative to Section 12.8.1, we set a = 0 and we no longer assume that there exists a highest price b at
which trade may occur.

A.1 D and S are continuous on Rⁿ₊₊;

A.2 D(λp) = D(p) and S(λp) = S(p) for each λ > 0: nominal changes in prices do not
matter;

A.3 Dᵢ(p) > Sᵢ(p) for some i with pᵢ > 0 implies Sⱼ(p) > Dⱼ(p) for some j: if some goods
are in excess demand at a positive price, other ones must be in excess supply.

A.4 Dᵢ(p) ≥ Sᵢ(p) if pᵢ = 0: free goods are in excess demand.

Theorem 509 Under conditions A.1-A.4, a weak market equilibrium exists.


Proof Let Δⁿ⁻¹ = {p ∈ Rⁿ₊ : Σᵢ₌₁ⁿ pᵢ = 1} be the simplex of Rⁿ. By A.2, without loss of
generality we can consider E : Δⁿ⁻¹ → Rⁿ, that is, the restriction of E to the simplex. We
want to show that there is some p ∈ Δⁿ⁻¹ such that E⁺(p) = 0, i.e., E(p) ≤ 0. Define
f : Δⁿ⁻¹ → Δⁿ⁻¹ by

    f(p) = (1 / (1 + Σᵢ₌₁ⁿ Eᵢ⁺(p))) (p + E⁺(p))    for all p ∈ Δⁿ⁻¹

By A.1, the function f is continuous (why?). By Brouwer's Fixed Point Theorem, there is
some p ∈ Δⁿ⁻¹ such that f(p) = p, that is,

    (1 / (1 + Σᵢ₌₁ⁿ Eᵢ⁺(p))) (p + E⁺(p)) = p

Hence, E⁺(p) = (Σᵢ₌₁ⁿ Eᵢ⁺(p)) p. That is,

    Eₖ⁺(p) = pₖ Σᵢ₌₁ⁿ Eᵢ⁺(p)    for all k = 1, ..., n        (12.20)

We want to prove that E⁺(p) = 0. Suppose, by contradiction, that there exists a good k for
which Eₖ⁺(p) = Eₖ(p) > 0. By (12.20), it follows that pₖ > 0. Hence, by A.3 there exists a
good j for which Sⱼ(p) > Dⱼ(p). Hence, Eⱼ⁺(p) = 0. Moreover, A.4 implies that its price is
strictly positive, i.e., pⱼ > 0. In view of (12.20) we can write

    0 = Eⱼ⁺(p) = pⱼ Σᵢ₌₁ⁿ Eᵢ⁺(p)

This yields Σᵢ₌₁ⁿ Eᵢ⁺(p) = 0, which contradicts Eₖ⁺(p) > 0. We conclude that E⁺(p) = 0,
so p is a weak equilibrium price.

Consider the following additional condition, which complements condition A.3:

A.5 Dᵢ(p) < Sᵢ(p) for some i with pᵢ > 0 implies Sⱼ(p) < Dⱼ(p) for some j: if some goods
are in excess supply at a positive price, other ones must be in excess demand.

Proposition 510 Under conditions A.1-A.5, a market equilibrium exists.



This result shares with our earlier equilibrium existence result, Proposition 504, condi-
tions A.1 and A.4, the latter being, essentially, the condition Eᵢ(aᵢ, x₋ᵢ) ≥ 0. Conditions
A.2, A.3 and A.5 are, instead, new and replace the highest price condition Eᵢ(bᵢ, x₋ᵢ) ≤ 0.
In particular, condition A.2 will be given a compelling foundation in Section 18.8.

Proof By the previous result there exists p* ∈ Δⁿ⁻¹ such that E(p*) ≤ 0. We want to
show that E(p*) = 0. Suppose, by contradiction, that Eᵢ(p*) < 0 for some good i. By
A.4, pᵢ* > 0. By A.5, there exists some good j such that Eⱼ(p*) > 0, which contradicts
E(p*) ≤ 0. We conclude that E(p*) = 0, so p* is an equilibrium price.

In Section 18.8 we will present a simple exchange economy that provides a foundation,
in terms of individual behavior, for the aggregate market analysis of this section. There
we will see that it is natural to expect the excess demand to satisfy the following
property:

W.1 p · E(p) ≤ 0 for all p ∈ Rⁿ₊.

This condition is a weak version of the (aggregate) Walras' law, which is:

W.2 p · E(p) = 0 for all p ∈ Rⁿ₊.

As will be seen in Section 18.8, W.1 only requires agents to buy affordable bundles, while
Walras' law requires them to exhaust their budgets, a reasonable but non-trivial assumption.
In any case, W.1 implies condition A.3, so in the existence Theorem 509 we can replace
A.3 with a weak Walras' law that has a compelling economic foundation. The stronger
condition W.2 implies both A.3 and A.5, so in the last result Walras' law can replace these
two conditions. A bit more is actually true, so next we state and prove the version of the
last two existence results that takes advantage of conditions W.1 and W.2. It is a simplified
version of classical results proved by Kenneth Arrow and Gerard Debreu in the early 1950s.¹⁴

Theorem 511 (Arrow-Debreu) Under conditions A.1, A.2 and W.1, a weak market equi-
librium exists. If, in addition, A.4 and W.2 hold, then a market equilibrium exists.

Proof As in the previous proof, using A.1 and A.2 we can prove the existence of a price vector
p ∈ Δⁿ⁻¹ such that E⁺(p) = (Σᵢ₌₁ⁿ Eᵢ⁺(p)) p. Multiply this equation by the vector E(p)
and use W.1 to get

    E⁺(p) · E(p) = (Σᵢ₌₁ⁿ Eᵢ⁺(p)) p · E(p) ≤ 0

So, we have Σᵢ₌₁ⁿ Eᵢ⁺(p) Eᵢ(p) ≤ 0. But every addendum is positive because Eᵢ⁺(p) Eᵢ(p)
is either 0 or Eᵢ²(p). So,

    Σᵢ₌₁ⁿ Eᵢ⁺(p) Eᵢ(p) = 0

This implies Eᵢ⁺(p) Eᵢ(p) = 0 for each i, namely Eᵢ(p) ≤ 0. Therefore, p is a weak equilib-
rium price.
It remains to show that, if also A.4 and W.2 hold, then p is an equilibrium price. Since
W.2 implies A.5, we can proceed as in the proof of Proposition 510.

14 The classic work on this topic is Debreu (1959).
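For a market with two goods, the equilibrium price on the simplex can be computed directly;
a sketch with an illustrative one-consumer Cobb-Douglas economy of our own construction
(it satisfies Walras' law exactly), again reusing the bisect routine from the earlier sketch:

    # Two-good illustration (ours): one Cobb-Douglas consumer with utility
    # x1^a * x2^(1-a) and endowment w = (w1, w2); Walras' law holds exactly.
    a, w1, w2 = 0.3, 1.0, 2.0

    def excess_demand(p1):
        p2 = 1.0 - p1                      # prices normalized on the simplex
        wealth = p1 * w1 + p2 * w2
        E1 = a * wealth / p1 - w1          # excess demand for good 1
        E2 = (1 - a) * wealth / p2 - w2    # excess demand for good 2
        return E1, E2

    p1_star = bisect(lambda p1: excess_demand(p1)[0], 1e-6, 1 - 1e-6)
    print(p1_star, excess_demand(p1_star))   # E vanishes at p1 = 6/13 ~ 0.4615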

12.9 Asymptotic behavior of recurrences

12.9.1 A general definition for recurrences
The notions introduced so far in this chapter permit us to study the convergence of sequences
defined by recurrences, a most important class of sequences.
We first give a general definition of a recurrence that properly formalizes the informal
analysis of recurrences of Section 8.1. Throughout this section, A denotes a subset of the real
line.¹⁵

Definition 512 A function φ : Aᵏ = A × ⋯ × A → A defines a recurrence of order k if

    x₀ = α₀ ; x₁ = α₁ ; ... ; x_{k−1} = α_{k−1}
    xₙ = φ(x_{n−1}, x_{n−2}, ..., x_{n−k})    for n ≥ k

with k initial conditions αᵢ ∈ A.

A closed form sequence f : N → R solves the recurrence if

    f(0) = α₀ ; f(1) = α₁ ; ... ; f(k−1) = α_{k−1}
    f(n) = φ(f(n−1), f(n−2), ..., f(n−k))    for n ≥ k

If φ is linear and A is the real line, by Riesz's Theorem there exists a vector a = (a₁, ..., aₖ) ∈
Rᵏ such that φ(x) = a · x, so we get back to the linear recurrence (8.11). Solutions of this
important class of recurrences have been studied in Section 10.5.4.
If k = 1, the function φ : A → A is a self-map that defines a recurrence of order 1 given
by

    x₀ = α₀
    xₙ = φ(x_{n−1})    for n ≥ 1        (12.21)

with initial condition α₀ ∈ A. If the self-map φ : A → A is linear, it reduces to the geometric
recurrence

    x₀ = α₀
    xₙ = a x_{n−1}    for n ≥ 1        (12.22)
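A recurrence of order k is straightforward to unroll in code; a minimal sketch (Fibonacci is
our illustrative choice of φ):

    def unroll(phi, init, n):
        """First n terms of the order-k recurrence x_m = phi(x_{m-1}, ..., x_{m-k})
        with initial conditions init = (x_0, ..., x_{k-1})."""
        xs = list(init)
        k = len(init)
        while len(xs) < n:
            xs.append(phi(*xs[-1:-k-1:-1]))   # most recent term first
        return xs[:n]

    # Order-2 example: phi(x_{n-1}, x_{n-2}) = x_{n-1} + x_{n-2} (Fibonacci).
    print(unroll(lambda a, b: a + b, (0, 1), 10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]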

12.9.2 Asymptotics
From now on, we focus on the recurrence (12.21). We need some notation. Given any self-map
φ : A → A, its second iterate φ ∘ φ : A → A is denoted by φ². More generally, φⁿ : A → A
denotes the n-th iterate φⁿ = φⁿ⁻¹ ∘ φ, i.e.,

    φⁿ(x) = φ(φⁿ⁻¹(x)) = (φ ∘ φ ∘ ⋯ ∘ φ)(x)    (n times)    for all x ∈ A

We adopt the convention that φ⁰ is the identity map: φ⁰(x) = x for all x ∈ A.

15 Most of the analysis of this section continues to hold if A is a subset of Rⁿ, as readers can check.

Example 513 (i) Consider the self-map φ : (0, +∞) → (0, +∞) defined by φ(x) = x/(1 + x).
Then,

    φ²(x) = φ(φ(x)) = (x/(1+x)) / (1 + x/(1+x)) = x/(1 + 2x)
    φ³(x) = φ(φ²(x)) = (x/(1+2x)) / (1 + x/(1+2x)) = x/(1 + 3x)

This suggests that

    φⁿ(x) = x/(1 + nx)    for all n ≥ 1        (12.23)

Let us verify this guess by induction. Initial step: the guess clearly holds for n = 1. Induction
step: assume it holds for n. Then,

    φⁿ⁺¹(x) = φ(φⁿ(x)) = (x/(1+nx)) / (1 + x/(1+nx)) = x/(1 + (n+1)x)

as desired.
(ii) Consider the self-map φ : [0, +∞) → [0, +∞) defined by φ(x) = ax². Then,

    φ²(x) = φ(φ(x)) = a(ax²)² = a³x⁴
    φ³(x) = φ(φ²(x)) = a(a³x⁴)² = a⁷x⁸

With the help of a little bird, this suggests that

    φⁿ(x) = a^(2ⁿ−1) x^(2ⁿ)    for all n ≥ 1        (12.24)

Let us verify this guess by induction. Initial step: the guess clearly holds for n = 1. Induction
step: assume it holds for n. Then,

    φⁿ⁺¹(x) = φ(φⁿ(x)) = a (a^(2ⁿ−1) x^(2ⁿ))² = a^(2ⁿ⁺¹−1) x^(2ⁿ⁺¹)

as desired. N

We can represent the sequence {xₙ} defined via the recurrence (12.21) using the iterates
φⁿ of the self-map φ : A → A. Indeed, we have

    xₙ = φⁿ(x₀)    for all n ≥ 0        (12.25)

The sequence of iterates {φⁿ(x₀)} of points in A that starts from an initial point x₀ of A is
called the orbit of x₀ under φ. The collection {{φⁿ(x₀)} : x₀ ∈ A} of all the orbits determined
by the possible initial conditions is called the phase portrait of φ. In view of (12.25), the orbits
that form the phase portrait of φ describe how the sequence defined by the recurrence (12.21)
may evolve according to how it is initialized.

Example 514 (i) For the geometric recurrence, relation (12.25) takes the familiar form

    xₙ = φⁿ(x₀) = aⁿx₀    for all n ≥ 0

So, the phase portrait of φ(x) = ax is {{aⁿx₀} : x₀ ∈ R}.

(ii) For the nonlinear recurrence defined by the self-map φ : (0, +∞) → (0, +∞) given by
φ(x) = x/(1 + x), we have

    xₙ = φⁿ(x₀) = x₀/(1 + nx₀)    for all n ≥ 1

Here the phase portrait is {{x₀/(1 + nx₀)} : x₀ > 0}. N

Orbits solve the recurrence (12.21) if they can be described in closed form, as is the case
for the recurrences of the last example. Unfortunately, this is often not possible, and so the
main interest of (12.25) is theoretical; operationally, however, it may suggest a qualitative
analysis of the recurrence. A main issue in this regard is the asymptotic behavior of orbits:
where do they end up eventually? For instance, do they converge?
The next simple, yet important, result shows that fixed points play a key role in studying
the convergence of orbits.

Theorem 515 Let φ : A → A be a continuous self-map and x₀ a point of A. If the orbit
{φⁿ(x₀)} converges to x ∈ A, then x is a fixed point of φ.

Proof Assume that xₙ = φⁿ(x₀) → x ∈ A. Since φ is continuous, we have φ(x) =
lim φ(φⁿ(x₀)). So,

    φ(x) = lim φ(φⁿ(x₀)) = lim φⁿ⁺¹(x₀) = lim xₙ₊₁ = lim xₙ = lim φⁿ(x₀) = x

where the equality lim xₙ₊₁ = lim xₙ holds because, as is easily checked, if xₙ → x then
xₙ₊ₖ → x for every given k ≥ 1. We conclude that x is a fixed point, as desired.

So, a necessary condition for a point to be the limit of a sequence defined by a recurrence
of order 1 is that it be a fixed point of the underlying self-map. If there are no fixed points,
convergence is hopeless. If they exist (e.g., by Brouwer's Theorem), we have some hope.
Yet, being a fixed point is only a necessary condition: as will become clear later in the section,
there are fixed points of φ that are not limit points of the recurrence (12.21).

Fixed points thus provide the candidate limit points. We have the following procedure
to study the limits of sequences defined by a recurrence (12.21):

1. Find the collection {x ∈ A : φ(x) = x} of the fixed points of the self-map φ.

2. Check whether they are limits of the orbits {φⁿ(x₀)}, that is, whether φⁿ(x₀) → x.

This procedure is especially effective when the fixed point is unique. Indeed, in this
case there is a unique candidate limit point for all possible initial conditions x₀ ∈ A, so if the
orbits converge (e.g., because they form a monotonic sequence, so that Theorem 299 applies), then they
have to converge to the fixed point. Remarkably, in this case the iterations swamp the initial
condition, which asymptotically plays no role in the behavior of the recursion. Regardless of
how the recursion starts, it eventually behaves in the same way.
In view of this discussion, the next result is especially interesting.¹⁶

16 Contractions are introduced in Section 16.1.
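The two-step procedure is easy to mimic numerically; a sketch for the self-map
φ(x) = x/(1 + x), whose orbits we know in closed form from (12.23):

    def orbit(phi, x0, n):
        """First n points of the orbit of x0 under the self-map phi."""
        xs = [x0]
        for _ in range(n - 1):
            xs.append(phi(xs[-1]))
        return xs

    phi = lambda x: x / (1 + x)
    xs = orbit(phi, 2.0, 1001)
    print(xs[-1], 2.0 / (1 + 1000 * 2.0))   # both ~ 0.0009995: the orbit heads to 0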

Proposition 516 If the self-map φ : A → A is a contraction, it has at most one fixed
point.

Proof Suppose that x₁, x₂ ∈ A are fixed points. Then, for some k ∈ (0, 1),

    0 ≤ |x₁ − x₂| = |φ(x₁) − φ(x₂)| ≤ k|x₁ − x₂|

and so |x₁ − x₂| = 0. This implies x₁ = x₂, as desired.

So, recursions defined by self-maps that are contractions have at most a single candidate
limit point. It is then enough to check whether it is actually a limit point.

Example 517 A continuously differentiable function φ : [a, b] → R is a contraction if 0 <
k = max_{x∈[a,b]} |φ′(x)| < 1 (cf. Example 727). Take the contraction self-map φ : [0, 1] → [0, 1]
given by φ(x) = x²/4. Its unique fixed point is the origin x = 0. By (12.24), we have

    φⁿ(x₀) = (1/4^(2ⁿ−1)) x₀^(2ⁿ) → 0    for all x₀ ∈ [0, 1]

So, the orbits converge to the fixed point for all initial conditions x₀ ∈ [0, 1]. N

The next example shows, inter alia, that being a contraction is a sufficient but not a
necessary condition for the uniqueness of fixed points.

Example 518 Consider the self-map φ : (0, +∞) → (0, +∞) defined by φ(x) = x/(1 + x).
We have, for all x, y > 0,

    |φ(x) − φ(y)| = |x − y| / ((1 + x)(1 + y))

Since the factor 1/((1 + x)(1 + y)) gets arbitrarily close to 1 as x and y approach 0, no
constant k < 1 works uniformly: φ is not a contraction. Nevertheless, it is easy to check that
it has a unique fixed point, given by the origin x = 0. By (12.23), we have

    φⁿ(x₀) = x₀/(1 + nx₀) → 0    for all x₀ > 0

So, the orbits converge to the fixed point for all initial conditions x₀ > 0. N

In the rest of the section we illustrate our asymptotic analysis through some important
applications.

12.9.3 Price dynamics

Let us go back to the recurrence, with initial expectation E₀(p₁),

    p₁ = (α+γ)/β − (δ/β) E₀(p₁)
    pₜ = (α+γ)/β − (δ/β) pₜ₋₁    for t ≥ 2        (12.26)

of the equilibrium prices of markets with production delays and classic expectations, that is,
extrapolative expectations of the simplest form Eₜ₋₁(pₜ) = pₜ₋₁ (cf. Section 8.4.3; β and δ
denote the slopes of the demand and supply functions).
We now study lim pₜ to understand the asymptotic behavior of such equilibrium prices.
To this end, consider the map φ : [a, b] → R defined by

    φ(x) = (α+γ)/β − (δ/β) x

where [a, b] = [γ/δ, α/β], with α > 0, γ ≥ 0 and β, δ > 0.

Lemma 519 The function φ : [a, b] → R is a self-map.

Proof We have

    φ(γ/δ) = (α+γ)/β − (δ/β)(γ/δ) = α/β

as well as, being γ/δ ≤ α/β,

    φ(α/β) = (α+γ)/β − (δ/β)(α/β) ≥ γ/δ

Since −δ/β < 0, the function φ is decreasing. So

    a = γ/δ ≤ φ(α/β) ≤ φ(x) ≤ φ(γ/δ) = α/β = b    for all x ∈ [a, b]

We conclude that φ is a self-map.

We can thus write φ : [a, b] → [a, b]. This self-map defines the price recurrence (12.26).
The unique fixed point of φ is easily seen to be

    p = (α+γ)/(β+δ)

Thus, the unique candidate limit price is the equilibrium price (8.17) of the market without
delays in production.
Let us check whether or not p is indeed the limit point. The following formula is key.
Let us check whether or not p is indeed the limit point. The following formula is key.

Lemma 520 We have

    pₜ − p = (−1)ᵗ⁻¹ (δ/β)ᵗ⁻¹ (p₁ − p)    for all t ≥ 2        (12.27)

Proof We have

    pₜ − p = (α+γ)/β − (δ/β) pₜ₋₁ − (α+γ)/(β+δ)
           = (δ/β) · (α+γ)/(β+δ) − (δ/β) pₜ₋₁ = −(δ/β)(pₜ₋₁ − p)

that is,

    pₜ − p = −(δ/β)(pₜ₋₁ − p)    for all t ≥ 2        (12.28)

By iterating this geometric recursion, we have

    p₂ − p = −(δ/β)(p₁ − p)
    p₃ − p = −(δ/β)(p₂ − p) = (δ/β)²(p₁ − p)
    p₄ − p = −(δ/β)(p₃ − p) = −(δ/β)³(p₁ − p)
    ⋮
    pₜ − p = (−1)ᵗ⁻¹ (δ/β)ᵗ⁻¹ (p₁ − p)

as desired.

Since

    (−1)ᵗ⁻¹ =   1    if t is odd
               −1    if t is even

from formula (12.27) it follows that

    |pₜ − p| = (δ/β)ᵗ⁻¹ |p₁ − p|    for all t ≥ 2        (12.29)

The value of lim pₜ thus depends on the ratio δ/β of the slopes of the supply and demand
functions. We need to distinguish three cases according to whether this ratio is lower than,
equal to, or greater than 1, that is, according to whether

    δ < β ;   δ = β ;   δ > β

Case 1: δ < β. The supply function has a lower slope than the demand function. We have

    lim |pₜ − p| = |p₁ − p| · lim (δ/β)ᵗ⁻¹ = 0

So,

    lim pₜ = p        (12.30)

as well as

    lim Eₜ₋₁(pₜ) = p        (12.31)

When δ < β, the fixed point p is indeed a limit point. Equilibrium prices of markets with
delays and classic expectations thus converge to the equilibrium price of the market without
delays in production. This holds for any possible initial expectation E₀(p₁), which in the
long run turns out to be immaterial.
Note that the (one-step-ahead) forecast error vanishes asymptotically:

    eₜ = pₜ − Eₜ₋₁(pₜ) → 0

Classic expectations, though lazy, are nevertheless asymptotically correct provided δ < β.

Case 2: δ = β. The demand and supply functions have the same slope. Formula (12.27)
implies

    pₜ − p = (−1)ᵗ⁻¹ (p₁ − p)    for all t ≥ 2

The initial price p₁ is equal to p if and only if the initial expectation is correct:

    E₀(p₁) = p₁  ⟺  p₁ = (α+γ)/β − (δ/β) E₀(p₁) = (α+γ)/β − (δ/β) p₁  ⟺  p₁ = p

So, if the initial expectation is correct, then pₜ = p for all t. Otherwise, the initial error
E₀(p₁) ≠ p₁ determines an oscillating sequence of equilibrium prices

    pₜ = p + (−1)ᵗ⁻¹ (p₁ − p) =  2p − p₁    if t is even
                                 p₁         if t is odd

for all t ≥ 2. Also the forecast error

    eₜ = pₜ − Eₜ₋₁(pₜ) = pₜ − pₜ₋₁ = ((−1)ᵗ⁻¹ − (−1)ᵗ⁻²)(p₁ − p) = 2(−1)ᵗ⁻¹ (p₁ − p)

keeps oscillating.

Case 3: δ > β. The supply function has a higher slope than the demand function. From
δ > β it follows that

    lim (δ/β)ᵗ⁻¹ = +∞

When the initial expectation is not correct, p₁ ≠ p, the oscillations

    pₜ − p = (−1)ᵗ⁻¹ (δ/β)ᵗ⁻¹ (p₁ − p)

have larger and larger amplitude. Indeed:

    lim |pₜ − p| = |p₁ − p| · lim (δ/β)ᵗ⁻¹ = +∞

In this case, the initial forecast error propagates, causing exploding price dynamics. When
δ > β, the laziness of classic expectations translates into explosive price behavior.
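The three cases are easy to visualize by simulating the recurrence; a sketch with illustrative
parameter values of our own choosing:

    # Cobweb recurrence p_t = (alpha + gamma)/beta - (delta/beta) * p_{t-1}.
    alpha, gamma, beta = 10.0, 2.0, 1.0

    for delta in (0.5, 1.0, 1.5):          # delta < beta, = beta, > beta
        p_bar = (alpha + gamma) / (beta + delta)
        p = p_bar + 1.0                    # start with a wrong initial expectation
        path = []
        for _ in range(8):
            p = (alpha + gamma) / beta - (delta / beta) * p
            path.append(round(p, 3))
        print(f"delta/beta = {delta/beta}: p_bar = {p_bar:.3f}, path = {path}")
    # delta < beta: damped oscillations towards p_bar; delta = beta: constant
    # oscillation around p_bar; delta > beta: exploding oscillations.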

As we already remarked, given a sequence of equilibrium prices {pₜ} and of price expect-
ations {Eₜ₋₁(pₜ)}, the forecast error eₜ at each t is given by

    eₜ = pₜ − Eₜ₋₁(pₜ)

The expectation underestimates the price pₜ if eₜ > 0 and overestimates it if eₜ < 0. Instead, if
eₜ = 0 the expectation is correct.
It is plausible that rational producers do not err systematically: errare humanum est,
perseverare diabolicum. An extreme form of this principle requires that expectations be
always correct:

    Eₜ₋₁(pₜ) = pₜ    for all t ≥ 1

This is the so-called hypothesis of rational expectations (or perfect foresight). Though extreme,
it is a clear-cut hypothesis that is important to fix ideas.

In view of (8.19), in the market for potatoes with production delays the producers' forecast
error eₜ at time t is

    eₜ = pₜ − Eₜ₋₁(pₜ) = (α+γ)/β − ((δ/β) + 1) Eₜ₋₁(pₜ)

In particular, at each t ≥ 1 one has

    eₜ = 0  ⟺  (α+γ)/β − ((δ/β) + 1) Eₜ₋₁(pₜ) = 0  ⟺  Eₜ₋₁(pₜ) = (α+γ)/(β+δ)

So, expectations are rational if and only if

    Eₜ₋₁(pₜ) = pₜ = p = (α+γ)/(β+δ)    for all t ≥ 1

We have thus proved the following result.

Proposition 521 A uniperiodal market equilibrium of the markets Mₜᴿ features rational expect-
ations if and only if the sequence of equilibrium prices is constant, with pₜ = Eₜ₋₁(pₜ) = p
for all t ≥ 1.

The constancy of equilibrium prices is thus equivalent to the correctness of expectations.
A non-trivial price dynamics is, thus, the outcome of forecast errors. This result holds for
any kind of expectations, extrapolative or not. Indeed, the rationality of expectations is
a property of expectations, not a hypothesis on how they are formed: once a possible
expectation formation mechanism is specified, a theoretical issue is to understand when the
resulting expectations are correct. For instance, in the previous case δ = β, we saw that
classic expectations are rational if and only if the initial expectation is correct, that is,
E₀(p₁) = p₁.

The uniperiodal price equilibrium under rational expectations of the markets Mₜᴿ with pro-
duction delays is equal to the equilibrium price (8.17) of the market M. Remarkably, rational
expectations neutralize, in equilibrium, any effect of differences in production tech-
nologies. In terms of the equilibrium prices of potatoes, it is immaterial whether we have a
traditional technology, with sowing in t − 1 and harvest in t, or a Star Trek one with
instantaneous production.

12.9.4 Heron's method

While computing the square a² of a number a is quite simple, computing the square root
√a of a positive number a is significantly harder. Fortunately, we can count on Heron's
method, a powerful algorithm also known as the "Babylonian method".
Given 0 < a ≠ 1, Heron's sequence {xₙ} is defined by recurrence by setting x₁ = a and

    xₙ₊₁ = (1/2)(xₙ + a/xₙ)    for all n ≥ 1        (12.32)

Theorem 522 (Heron) Let 0 < a ≠ 1. Then xₙ → √a.

Thus, Heron's sequence converges to the square root of a. On top of that, the rate of
convergence is quite fast, as we will see in a few examples.

Proof Heron's sequence is convergent because it is (strictly) decreasing, at least from n = 2
on. To prove this, we first observe that

    xₙ > √a  ⟹  xₙ > xₙ₊₁ > √a        (12.33)

Indeed, let xₙ > √a. It follows that xₙ² > a, i.e., xₙ > a/xₙ. So,

    xₙ₊₁ = (1/2)(xₙ + a/xₙ) < (1/2)(xₙ + xₙ) = xₙ

Moreover, since (xₙ² − a)² > 0 when xₙ ≠ √a, we have

    xₙ⁴ − 2xₙ²a + a² > 0  ⟹  (xₙ⁴ + a²)/xₙ² > 2a  ⟹  xₙ² + a²/xₙ² > 2a
    ⟹  xₙ² + a²/xₙ² + 2a > 4a  ⟹  (xₙ + a/xₙ)² > 4a

that is,

    xₙ₊₁² = (1/4)(xₙ + a/xₙ)² > a

So, xₙ₊₁ > √a. This completes the proof of (12.33).
If a > 1, we have x₁ = a > √a. By (12.33), x₂ > √a. If, instead, 0 < a < 1, then
x₂ = (a + 1)/2 > √a. Indeed, by squaring we obtain

    (a + 1)² > 4a  ⟺  a² + 2a + 1 > 4a  ⟺  a² − 2a + 1 > 0  ⟺  (a − 1)² > 0

In sum, for all 0 < a ≠ 1 we have x₂ > √a. From (12.33) it then follows that √a < x₃ < x₂,
which in turn implies √a < x₄ < x₃, and so on. The elements of the sequence, starting from
the second one, are thus decreasing and greater than √a.
We conclude that Heron's sequence is decreasing, at least for n ≥ 2, with lower bound √a.
So, it is bounded and, by Theorem 299-(i), it has a finite limit L ≥ √a > 0. The recurrence
(12.32) is defined by the self-map φ : (0, +∞) → (0, +∞) given by

    φ(x) = (1/2)(x + a/x)

Since

    x = (1/2)(x + a/x)  ⟺  2x = x + a/x  ⟺  x = a/x  ⟺  x² = a

the unique fixed point of φ on (0, +∞) is √a. By Theorem 515, we conclude that L = √a, as
desired.
p
Example 523 (i) Let us compute √2, which we know to be approximately 1.4142135.
Heron's sequence is:

    x₁ = 2 ;   x₂ = (1/2)(2 + 2/2) = 3/2 = 1.5
    x₃ = (1/2)(3/2 + 2/(3/2)) = 17/12 ≈ 1.4166667
    x₄ = (1/2)(17/12 + 2/(17/12)) = 577/408 ≈ 1.4142156
    x₅ = (1/2)(577/408 + 2/(577/408)) = 665857/470832 ≈ 1.4142135

The quality of the approximation after only five steps is remarkable.
(ii) Let us compute √428356 ≈ 654.48911. Heron's sequence is:

    x₁ = 428356 ;   x₂ ≈ 214178.5 ;   x₃ ≈ 107090.24
    x₄ ≈ 53547.115 ;   x₅ ≈ 26777.619 ;   x₆ ≈ 13396.807
    x₇ ≈ 6714.3905 ;   x₈ ≈ 3389.0936 ;   x₉ ≈ 1757.743
    x₁₀ ≈ 1000.7198 ;   x₁₁ ≈ 714.3838 ;   x₁₂ ≈ 656.9999
    x₁₃ ≈ 654.4939 ;   x₁₄ ≈ 654.4891

Here fourteen steps delivered a sharp approximation.


(iii) For √0.13 ≈ 0.3605551, Heron's sequence is:

    x₁ = 0.13 ;   x₂ ≈ 0.565 ;   x₃ ≈ 0.3975442
    x₄ ≈ 0.3622759 ;   x₅ ≈ 0.3605592 ;   x₆ ≈ 0.360555

The sequence is decreasing starting from the second element. N
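Heron's recurrence takes a few lines to iterate; a minimal sketch reproducing the computation
in (i):

    def heron(a, n):
        """First n terms of Heron's sequence x_1 = a, x_{k+1} = (x_k + a/x_k)/2."""
        xs = [a]
        for _ in range(n - 1):
            xs.append((xs[-1] + a / xs[-1]) / 2)
        return xs

    print(heron(2.0, 5))   # [2.0, 1.5, 1.4166..., 1.4142156..., 1.4142135...]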

The geometric intuition behind Heron's method is elegant. It is based on a sequence of
rectangles, each of area equal to a, that converge to a square with sides of length √a (thus
with area a). The n-th rectangle's longer side is equal to xₙ and its shorter side is equal to
a/xₙ (given that the area must equal a); at step n + 1 the longer side shrinks to

    xₙ₊₁ = (1/2)(xₙ + a/xₙ) < xₙ

By iterating the algorithm, xₙ and a/xₙ become closer and closer, until they reach their
common value √a. The following figure illustrates:

[Figure: a rectangle of area a with sides xₙ and a/xₙ, together with the next, more square-like, rectangle with sides xₙ₊₁ and a/xₙ₊₁]

12.10 Coda continua

In view of the sequential characterization of limits (Proposition 451), the notion of continuity
can be easily reformulated using the concept of limit (Definition 447), as follows.

Proposition 524 A function f : A ⊆ Rⁿ → R is continuous at x₀ ∈ A if and only if, for
every ε > 0, there exists δ_ε > 0 such that

    ‖x − x₀‖ < δ_ε  ⟹  |f(x) − f(x₀)| < ε    for all x ∈ A        (12.34)

This characterization is identical to the definition of lim_{x→x₀} f(x) = f(x₀) for a point
x₀ that belongs to the domain of the function, except for the elimination of the condition
0 < ‖x − x₀‖ (i.e., of the requirement that x ≠ x₀), so as to include points x₀ that are isolated
points of A.

Proof If x₀ is an accumulation point of A, condition (12.34) amounts to lim_{x→x₀} f(x) =
f(x₀). If x₀ is an isolated point of A, condition (12.34) always holds. Indeed, by the
definition of isolated point, there exists a neighborhood B_δ(x₀) of small enough radius δ > 0
such that B_δ(x₀) ∩ A = {x₀}. Thus, for each x ∈ A we have ‖x − x₀‖ < δ if and only if x = x₀.
It follows that, for each ε > 0, there exists δ > 0 such that ‖x − x₀‖ < δ implies x = x₀ for
all x ∈ A, so that |f(x) − f(x₀)| = 0 < ε.

In the language of neighborhoods, this characterization reads as follows: f is continuous
at x₀ ∈ A if and only if, for every neighborhood V_ε(f(x₀)), there exists a neighborhood
U_{δ_ε}(x₀) such that

    x ∈ U_{δ_ε}(x₀) ∩ A  ⟹  f(x) ∈ V_ε(f(x₀))        (12.35)

that is, f(U_{δ_ε}(x₀) ∩ A) ⊆ V_ε(f(x₀)). Equivalently (why?), for each open set V containing
f(x₀), there exists an open set U containing x₀ such that f(U ∩ A) ⊆ V.
Besides on ε, the value of δ_ε in (12.35) depends also on the point x₀ at hand. If it happens
that, given ε > 0, we can choose the same δ_ε for every x₀ ∈ A (i.e., once ε is fixed, the same
δ_ε works at all the points of the domain of f), we have a stronger notion of continuity,
called uniform continuity. It is a remarkable property of uniformity that allows us to "control"
the distance |f(x) − f(y)| between images just through the distance |x − y| between each
pair of points x and y of the domain of f.

Definition 525 A function f : A ⊆ Rⁿ → R is said to be uniformly continuous on A if, for
every ε > 0, there exists δ_ε > 0 such that

    ‖x − y‖ < δ_ε  ⟹  |f(x) − f(y)| < ε    for all x, y ∈ A

Here the value of δ_ε thus depends only on ε, no longer on a point x₀. Indeed, no specific
points x₀ are mentioned in this definition, which only considers the domain per se.
Uniform continuity implies continuity, but the converse does not hold. For example, we
will see shortly that the quadratic function is continuous on R, but not uniformly so. Yet, the
two notions of continuity turn out to be equivalent on the fundamental class of compact sets
(Section 5.6).

Theorem 526 A function f : A ⊆ Rⁿ → R is continuous on a compact set K ⊆ A if and
only if it is uniformly continuous on K.

Proof The "if" is obvious because uniform continuity implies continuity. We prove the "only
if". For simplicity, consider the scalar case n = 1 with K = [a, b]. So, let f : [a, b] → R be
continuous. We need to show that it is also uniformly continuous. Suppose, by contradiction,
that there exist ε > 0 and two sequences {xₙ} and {yₙ} in [a, b] with xₙ − yₙ → 0 and

    |f(xₙ) − f(yₙ)| ≥ ε    for all n ≥ 1        (12.36)

Since the sequences {xₙ} and {yₙ} are bounded, the Bolzano-Weierstrass Theorem yields
two convergent subsequences {xₙₖ} and {yₙₖ}, i.e., there exist x, y ∈ [a, b] such that xₙₖ → x
and yₙₖ → y. Since xₙ − yₙ → 0, we have xₙₖ − yₙₖ → 0 and, therefore, x − y = 0 because of the
uniqueness of the limit. Since f is continuous, we have f(xₙₖ) → f(x) and f(yₙₖ) → f(y).
Hence, f(xₙₖ) − f(yₙₖ) → f(x) − f(y) = 0, which contradicts (12.36). We conclude that f
is uniformly continuous.

Theorem 526 does not hold without the compactness of K, as the next two
counterexamples show. In the first counterexample we consider a closed but unbounded set,
the real line, while in the second one we consider a bounded set which is not closed, the
open interval (0, 1).

Example 527 The quadratic function f : R → R is not uniformly continuous. Suppose, by
contradiction, that f(x) = x² is uniformly continuous on R. Setting ε = 1, there exists
δ_ε > 0 such that

    |x − y| < δ_ε  ⟹  |x² − y²| < 1    for all x, y ∈ R        (12.37)

If we take xₙ = n and yₙ = n + δ_ε/2, we have |xₙ − yₙ| < δ_ε for every n ≥ 1, but
limₙ |xₙ² − yₙ²| = +∞, which contradicts (12.37). Therefore, the function x² is not uniformly
continuous on R. But its restriction to any compact interval [a, b] is uniformly continuous
thanks to Theorem 526. N

Example 528 The function f : (0, 1) → R defined by f(x) = 1/x is continuous, but
not uniformly continuous, on (0, 1). Indeed, suppose, by contradiction, that f is uniformly
continuous. Setting ε = 1, there exists δ_ε > 0 such that

    |x − y| < δ_ε  ⟹  |1/x − 1/y| < 1    for all x, y ∈ (0, 1)        (12.38)

Let y = min{δ_ε/2, 1/2} and x = y/2. It is immediate that 0 < x < y < 1 and |x − y| < δ_ε.
By (12.38), we thus have

    1/x − 1/y = |1/x − 1/y| < 1        (12.39)

On the other hand,

    1/x − 1/y = 1/y ≥ 2

which contradicts (12.39). We conclude that the function 1/x is not uniformly continuous
on (0, 1). Nevertheless, by Theorem 526 its restriction to any compact interval [a, b] ⊆ (0, 1)
is uniformly continuous. N
Part IV

Linear and nonlinear analysis

Chapter 13

Linear functions and operators

In Chapter 3 we studied at length the linear structure of Rⁿ. In this chapter we consider
linear functions, which form an important family of functions defined on Rⁿ that preserve
its linear structure.

13.1 Linear functions

13.1.1 Definition and first properties
Definition 529 A function f : Rⁿ → R is said to be linear if

    f(αx + βy) = αf(x) + βf(y)        (13.1)

for every x, y ∈ Rⁿ and every α, β ∈ R.¹

Example 530 The scalar functions f : R → R defined by f(x) = mx for some m ∈ R are
linear. Geometrically, they are the straight lines passing through the origin. N

Example 531 Through inner products (Section 4.1.1) it is easy to define linear functions.
Indeed, given a vector α ∈ Rⁿ, define f : Rⁿ → R by

    f(x) = α · x    for all x ∈ Rⁿ        (13.2)

This function f is linear:

    f(βx + γy) = α · (βx + γy) = Σᵢ₌₁ⁿ αᵢ(βxᵢ + γyᵢ)
               = β Σᵢ₌₁ⁿ αᵢxᵢ + γ Σᵢ₌₁ⁿ αᵢyᵢ = β(α · x) + γ(α · y)
               = βf(x) + γf(y)

for every x, y ∈ Rⁿ and every β, γ ∈ R. When n = 1 we go back to the previous example. N

1 These functions are sometimes called linear functionals.
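A quick numerical illustration of the linearity of (13.2), with an arbitrary vector and
arbitrary scalars of our own choosing:

    # f(x) = a . x is linear: f(b*x + c*y) == b*f(x) + c*f(y).
    a = (2.0, -1.0, 0.5)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    f = lambda x: dot(a, x)

    x, y, b, c = (1.0, 2.0, 3.0), (0.0, -1.0, 4.0), 2.5, -3.0
    lhs = f(tuple(b * xi + c * yi for xi, yi in zip(x, y)))
    rhs = b * f(x) + c * f(y)
    print(lhs, rhs)   # equal (up to floating point): linearity in action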

Example 532 In production theory, production functions may be assumed to have the
linear form (13.2). The vector α = (α₁, α₂, ..., αₙ) ∈ Rⁿ is then interpreted as the vector of
constant production coefficients. Indeed, we have

    f(e¹) = α · e¹ = α₁
    f(e²) = α · e² = α₂
    ⋮
    f(eⁿ) = α · eⁿ = αₙ

which means that α₁ is the quantity of output produced by one unit of the first input,
α₂ is the quantity of output produced by one unit of the second input, and so on. These
coefficients are called constant because they do not depend on the quantity of the inputs. This
implies that the returns to scale of these production functions are constant. N

Next, we give a simple but important characterization: a function is linear if and only if
it preserves the operations of addition and scalar multiplication. Linear functions are, thus,
the functions that preserve the linear structure of Rⁿ. This clarifies their nature.

Proposition 533 A function f : Rⁿ → R is linear if and only if

(i) f(x + y) = f(x) + f(y) for all x, y ∈ Rⁿ;

(ii) f(αx) = αf(x) for all x ∈ Rⁿ and α ∈ R.

Proof "If". Suppose that (i) and (ii) hold. Then

    f(αx + βy) = f(αx) + f(βy) = αf(x) + βf(y)

so f is a linear function. "Only if". Let f be a linear function. If in (13.1) we set α = β = 1
we get (i); if in (13.1) we instead set β = 0, we get (ii).

Next, we show that, more generally, linear combinations are preserved by linear functions.
When k = 2 we are back to the definition, but the result goes well beyond that, as it holds
for every k ≥ 2.

Proposition 534 Let f : Rⁿ → R be a linear function. We have f(0) = 0 and

    f(Σᵢ₌₁ᵏ αᵢxⁱ) = Σᵢ₌₁ᵏ αᵢf(xⁱ)        (13.3)

for every set of vectors {xⁱ}ᵢ₌₁ᵏ in Rⁿ and every set of scalars {αᵢ}ᵢ₌₁ᵏ.

Proof Let us show that f(0) = 0. Since f is linear, we have f(α0) = αf(0) for every
α ∈ R. So, f(0) = αf(0) for every α ∈ R, which can happen if and only if f(0) = 0. The
proof of (13.3) is left to the reader.

A more general version of (13.3), called Jensen's inequality, will be proved in Chapter
14. Property (13.3) has an important consequence: once we know the values taken by a
14. Property (13.3) has an important consequence: once we know the values taken by a

linear function on the elements of a basis, we can determine its value for any vector of Rn
whatsoever. Indeed, let S be a basis of Rn . Each vector x 2 Rn can be written as a linear
n
combination of elements of S, Pso there exists a …nite set of vectors xi i=1 in S and a set of
scalars f i gni=1 such that x = ni=1 i xi . By (13.3), we then have
n
X
f (x) = if xi
i=1

Thus, by exploiting the linearity of f , if we know the values of f xi : xi 2 S , we can


determine the value f (x) for each vector x 2 Rn .
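Concretely, with the standard basis this reconstruction is immediate. A small Python sketch (our illustration, with an arbitrarily chosen vector of basis values):

    import numpy as np

    c = np.array([2.0, -1.0, 0.5])       # c_i = f(e^i): the values of f on the basis

    def f(x):
        x = np.asarray(x, dtype=float)
        return float(x @ c)              # f(x) = Σ x_i f(e^i), by (13.3)

    print(f([1, 2, 3]))                  # 2·1 + (-1)·2 + 0.5·3 = 1.5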

Linearity is a purely algebraic property that requires functions to behave consistently with respect to the operations of addition and scalar multiplication. Thus, prima facie, linearity has no topological consequences. It is, therefore, remarkable that linear functions turn out to be continuous.

Theorem 535 Linear functions are continuous.

This elegant result is important because continuity is, as we learned in the last chapter, a highly desirable property. We omit the proof, however, because it is a special case of a result, Theorem 669, that will be proved later in the book.

13.1.2 Representation

Definition 536 The set of all linear functions f : R^n → R is called the dual space of R^n and is denoted by (R^n)'.

The space (R^n)' is, thus, the set of all linear functions defined on R^n. On (R^n)' it is possible to define in a natural way addition and scalar multiplication:

1. If f, g ∈ (R^n)', then f + g is the element of (R^n)' defined by

    (f + g)(x) = f(x) + g(x)    (13.4)

for every x ∈ R^n.

2. If f ∈ (R^n)' and α ∈ R, then αf is the element of (R^n)' defined by

    (αf)(x) = αf(x)    (13.5)

for every x ∈ R^n.

The two operations satisfy the properties (v1)-(v8) that, in Chapter 3, we discussed for R^n. Hence, intuitively, (R^n)' is an example of a vector space. In particular, the neutral element for the addition is the zero function f such that f(x) = 0 for every x ∈ R^n, while the opposite element of f ∈ (R^n)' is the function g = (−1)f = −f such that g(x) = −f(x) for every x ∈ R^n.

The next important result, an elementary version of the celebrated Riesz's Theorem, describes the dual space (R^n)'. We saw that every vector λ ∈ R^n induces a linear function f : R^n → R defined by f(x) = λ · x (Example 531). The following result shows that the converse holds: all linear functions defined on R^n have this form, i.e., the dual space (R^n)' consists of the linear functions of the type f(x) = λ · x for some λ ∈ R^n. In particular, the straight lines passing through the origin are the unique linear functions defined on the real line (Example 530).

Theorem 537 (Riesz) A function f : R^n → R is linear if and only if there exists a (unique) vector λ ∈ R^n such that

    f(x) = λ · x    ∀x ∈ R^n

Proof We have already seen the "if" part in Example 531. It remains to prove the "only if" part. So, let f : R^n → R be a linear function and consider the standard basis e1, ..., en of R^n. Set

    λ = (f(e1), ..., f(en)) ∈ R^n

We can write each vector x ∈ R^n as x = Σ_{i=1}^n xi ei. Thus, by the linearity of f we have:

    f(x) = f(Σ_{i=1}^n xi ei) = Σ_{i=1}^n xi f(ei) = Σ_{i=1}^n λi xi = λ · x    ∀x ∈ R^n

As to the uniqueness of λ, let λ' ∈ R^n be a vector such that f(x) = λ' · x for every x ∈ R^n. Then, for every i = 1, ..., n we have

    λ'i = λ' · ei = f(ei) = λ · ei = λi

and so λ' = λ.

A linear function f : R^n → R is, therefore, identified by a unique vector λ ∈ R^n. In a slightly improper way, we can say that the dual space of R^n is R^n itself.
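The proof is constructive: the representing vector can be recovered by evaluating f on the standard basis. A Python sketch of this recipe (the function names are ours):

    import numpy as np

    def riesz_vector(f, n):
        """Recover λ such that f(x) = λ·x by evaluating f on e^1, ..., e^n."""
        return np.array([f(np.eye(n)[i]) for i in range(n)])

    g = lambda x: 3 * x[0] - x[1] + 2 * x[2]   # a linear function on R^3
    lam = riesz_vector(g, 3)                   # -> array([ 3., -1.,  2.])
    x = np.array([1.0, 4.0, -2.0])
    assert np.isclose(g(x), lam @ x)           # the representation f(x) = λ·x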

13.1.3 Monotonicity

Turn now to the order structure of R^n. A function f : R^n → R is said to be:

(i) positive if f(x) ≥ 0 for each x ≥ 0;

(ii) strictly positive if f(x) > 0 for each x > 0.²

In words, a (strictly) positive function f assigns (strictly) positive values f(x) to (strictly) positive vectors x.

In general, positivity is a much weaker property than monotonicity: for example, the function f(x) = ||x|| is positive but it is not increasing. Indeed, for n = 2, the vectors x = (−3, 2) and y = (2, 2) are such that y ≥ x, while f(x) = √13 > f(y) = √8. A remarkable feature of linear functions is that the two properties become equivalent.

² Positivity with respect to the order structure is weaker than positivity of the image of a function f : A ⊆ R^n → R. This latter, stronger, notion requires that f(x) ≥ 0 for all x that belong to the domain A. In what follows, it should be clear from the context which notion of positivity we are referring to.

Proposition 538 A linear function f : R^n → R is (strictly) increasing if and only if it is (strictly) positive.

Proof We only prove the "if" part, since the converse is rather trivial. Let f be positive. We show that it is also increasing. Let x, y ∈ R^n be such that x ≥ y. Let also z = x − y ∈ R^n. Since z ≥ 0, positivity and linearity imply

    f(x) − f(y) = f(x − y) = f(z) ≥ 0

yielding that f(x) ≥ f(y), as desired. The proof for f strictly positive is similar.

Thus, to prove that a linear function is increasing, it is enough to show that it is positive, while to prove that it is strictly increasing, it suffices to show that it is strictly positive.

Positivity emerges also in the monotone version of Riesz's Theorem. This result, which will be generalized in Proposition 641, is of great importance in applications, as we will see in Section 19.5.

Proposition 539 A function f : R^n → R is linear and (strictly) increasing if and only if there is a positive (strongly positive) vector λ ∈ R^n_+ such that f(x) = λ · x for all x ∈ R^n.

(Strictly) increasing linear functions are thus characterized by (strongly) positive representing vectors λ. Let us see an instance of this result.

Example 540 Consider the linear functions f, g : R^3 → R defined by f(x) = x1 + 2x2 + 5x3 and g(x) = x1 + 3x2. Denote by λf and λg their representing vectors. By the last proposition, f is strictly increasing because λf = (1, 2, 5) ≫ 0, and g is increasing because λg = (1, 3, 0) ≥ 0. N

As the reader can check, the proof of Proposition 539 is an immediate consequence of Riesz's Theorem when it is combined with Proposition 538 and the following lemma.

Lemma 541 Let a, b ∈ R^n. We have:

(i) a · b ≥ 0 for each b ≥ 0 if and only if a ≥ 0;

(ii) a · b > 0 for each b > 0 if and only if a ≫ 0.

Proof The "if" parts are trivial. As for the "only if" parts, consider b = ei: it follows that a · b = ai which, in turn, must be, respectively, ≥ 0 and > 0 for each i.

Similar results can be proven by replacing "strictly" with "strongly". Moreover, as the reader can easily verify, dual results hold for decreasing and negative linear functions.

13.1.4 Application: averages

Averages are a nice application of the representation theorems seen in this section. So, assume that a firm has n branches (for example, in n different countries). We can collect in a vector x = (x1, x2, ..., xn) ∈ R^n the different profits of the different branches: x1 denotes the profit of the first branch, x2 denotes the profit of the second branch, and so on. A negative xi is interpreted as a loss (which, indeed, can be regarded as a negative profit).

Suppose that we would like to have a summary measure that returns, for every vector of profits, a single scalar that summarizes the entire vector of profits. Indeed, the board of directors asks the management for a single number that indicates the firm's profitability. Formally, we need a "summary" function f : R^n → R that, to each vector of profits x, associates a scalar f(x) interpreted as a summary measure of the vector x. Let us proceed from first principles by postulating some properties that we deem natural for a summary function – in particular, a "profitability" summary function – and then try to see where these properties lead us.

For example, we can suppose that the summary function f is linear. Indeed, assume that, if we merge two firms with profit vectors x and y, then the profit vector of the merged company is x + y (so, no "synergies"). In this case, the summary f(x + y) should also be the sum of the summaries f(x) and f(y) of the two companies, that is, f(x + y) = f(x) + f(y). We could argue and justify in a similar fashion the property of homogeneity.
Riesz's Theorem tells us that a linear f must have the form

    f(x) = Σ_{i=1}^n λi xi    ∀x ∈ R^n    (13.6)

In other words, f must be a weighted average with weights λi. So, linearity is the property that underlies weighted averages as summary measures: if it is a property meaningful in the application at hand, then weighted averages become the way to summarize vectors through a scalar.

For instance, if all weights are positive and equal, we have λi = 1/n and (13.6) becomes the classic arithmetic average

    f(x) = (1/n) Σ_{i=1}^n xi

The contemplation of this classic average makes us realize that in (13.6) weights might be negative, something unnatural (at least for our profit example). We thus need some further assumption on f that makes possible the use of the monotone version of Riesz's Theorem. This is easily done by requiring f to be positive:

    x ≥ 0 ⟹ f(x) ≥ 0

It is, indeed, a rather intuitive property: if the vector (say, of profits) is positive, its summary measure should also be positive. If f is linear and positive, by Proposition 539 we indeed have λ ≥ 0, so the weights are positive.
Another property that seems natural is normalization:

    f(1, 1, ..., 1) = 1

If, for instance, the vector of profits is constant and equal to 1 (so all branches make a unit of profit), then the summary measure of profits is 1 as well. This property is characterized by having the weights in (13.6) add up to 1.

Proposition 542 The function f : R^n → R is linear, positive and normalized if and only if there exists a (unique) positive vector λ ∈ R^n_+, with Σ_{i=1}^n λi = 1, such that f(x) = λ · x for all x ∈ R^n.

Indeed, weights are often assumed to add up to 1, so that they can be interpreted as proportions and expressed, if needed, in percentage terms. This result shows that normalization is the property of linear and positive summary functions that underlies this natural requirement.

Proof By Proposition 539, there exists a (unique) positive vector λ ∈ R^n_+ such that f(x) = λ · x for all x ∈ R^n. We need to prove that Σ_{i=1}^n λi = 1 if and only if f is normalized. "If". Suppose that Σ_{i=1}^n λi = 1. Then, f(1, ..., 1) = Σ_{i=1}^n λi = 1, so f is normalized. "Only if". Suppose f is normalized. Then,

    1 = f(1, ..., 1) = f(Σ_{i=1}^n ei) = Σ_{i=1}^n f(ei) = Σ_{i=1}^n λi

as desired.
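In code, such a summary function is just a dot product against a weight vector whose entries add up to 1. A small sketch, with weights of our choosing:

    import numpy as np

    w = np.array([0.5, 0.3, 0.2])            # positive weights with Σ w_i = 1

    def summary(x):
        return float(w @ np.asarray(x, dtype=float))   # f(x) = Σ w_i x_i

    assert np.isclose(summary(np.ones(3)), 1.0)        # normalization
    print(summary([1000, -200, 400]))                  # 500 - 60 + 80 ≈ 520.0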

A further interesting property that f may satisfy is symmetry. In our profit example, symmetry says that we do not care which branch realized which profit, but only about the size of the profit. For instance, if n = 2 and x = (1000, 4000) and y = (4000, 1000), under symmetry f(x) = f(y) because the only difference between the two vectors is which branch earned a given profit.

To state symmetry formally we need permutations, that is, bijections π : N → N where N = {1, 2, ..., n} (Appendix B.2). Given x, y ∈ R^n, write x ~ y if there exists a permutation π such that xi = y_{π(i)} for all i = 1, 2, ..., n. In other words, y can be obtained from x by permuting indexes.

Example 543 We have x = (1000, 4000) ~ y = (4000, 1000). Indeed, let π : {1, 2} → {1, 2} be the permutation given by π(1) = 2 and π(2) = 1, in which indexes are interchanged. Then (y_{π(1)}, y_{π(2)}) = (y2, y1) = (1000, 4000) = x. N

We say that f is symmetric if

    x ~ y ⟹ f(x) = f(y)

In other words, a symmetric f assigns the same value to all vectors that can be obtained from one another via a permutation.

Proposition 544 The function f : R^n → R is linear, positive, normalized and symmetric if and only if

    f(x) = (1/n) Σ_{i=1}^n xi    ∀x ∈ R^n

Remarkably, this result provides a foundation for the classic arithmetic average: it is the only summary function on R^n which is linear, positive, normalized, and symmetric. As long as these properties are compelling in our application, we can summarize vectors via their arithmetic averages.

Proof In view of the last proposition, f is linear, positive and normalized if and only if there exists a (unique) positive vector λ ∈ R^n_+, with Σ_{i=1}^n λi = 1, such that f(x) = λ · x for all x ∈ R^n. Thus, it remains to prove that f is symmetric if and only if λi = 1/n for each i = 1, 2, ..., n. "If". Suppose that λi = 1/n for each i = 1, 2, ..., n. Let x, y ∈ R^n be such that x ~ y. By definition, there is a permutation π such that xi = y_{π(i)} for all i = 1, 2, ..., n. Clearly, finite sums are commutative, so they are invariant under permutations, i.e., Σ_{i=1}^n y_{π(i)} = Σ_{i=1}^n yi. Then,

    f(x) = (1/n) Σ_{i=1}^n xi = (1/n) Σ_{i=1}^n y_{π(i)} = (1/n) Σ_{i=1}^n yi = f(y)

proving that f is symmetric. "Only if". Suppose f is symmetric. Note that ei ~ ej for all indexes i ≠ j. Indeed, it is enough to consider the permutation π : N → N defined by

    π(k) = j if k = i,  π(k) = i if k = j,  π(k) = k otherwise

By symmetry, we then have λi = f(ei) = f(ej) = λj for all indexes i ≠ j. So, the weights are equal. In turn, this implies λi = 1/n for each i, as desired.

Summing up, Riesz's Theorem and its variations permit a principled approach to weighted averages, which are justified via the properties of the summary functions; the summary functions are the fundamental objects of interest, averages just being a way to represent them (however convenient they might be).
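As a quick illustration (ours), one can verify numerically that the arithmetic mean satisfies all four properties of Proposition 544:

    import numpy as np

    mean = lambda x: float(np.asarray(x, dtype=float).mean())   # f(x) = (1/n) Σ x_i

    x = np.array([1000.0, 4000.0, -500.0])
    y = np.array([4000.0, -500.0, 1000.0])      # a permutation of x
    a, b = 0.7, -1.3
    assert np.isclose(mean(a * x + b * y), a * mean(x) + b * mean(y))  # linear
    assert mean(np.ones(3)) == 1.0              # normalized; the weights 1/n are positive
    assert mean(x) == mean(y)                   # symmetric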

13.2 Matrices

13.2.1 Definition

Matrices play a key role in the study of linear operators. Specifically, an m × n matrix is simply a table, with m rows and n columns, of scalars

    [ a11  a12  ...  a1j  ...  a1n ]
    [ a21  a22  ...  a2j  ...  a2n ]
    [ ...                      ... ]
    [ am1  am2  ...  amj  ...  amn ]

For example,

    [  1   5   7   9 ]
    [  3   2   1   4 ]
    [ 12  15  11   9 ]

is a 3 × 4 matrix, where

    a11 = 1   a12 = 5   a13 = 7   a14 = 9
    a21 = 3   a22 = 2   a23 = 1   a24 = 4
    a31 = 12  a32 = 15  a33 = 11  a34 = 9

Notation The elements (or components or entries) of a matrix are denoted by aij and the matrix itself is also denoted by (aij). A matrix with m rows and n columns will often be denoted by A_{m×n}.

In a matrix (aij) we have n column vectors:

    [ a11 ]   [ a12 ]         [ a1n ]
    [ ... ] , [ ... ] , ... , [ ... ]
    [ am1 ]   [ am2 ]         [ amn ]

and m row vectors:

    (a11, ..., a1n)
    (a21, ..., a2n)
    ...
    (am1, ..., amn)

A matrix is called square (of order n) when m = n and rectangular when m ≠ n.

Example 545 The 3 × 4 matrix

    [  1   5   7   9 ]
    [  3   2   1   4 ]
    [ 12  15  11   9 ]

is rectangular, with three row vectors

    (1, 5, 7, 9),  (3, 2, 1, 4),  (12, 15, 11, 9)

and four column vectors

    [ 1  ]   [ 5  ]   [ 7  ]   [ 9 ]
    [ 3  ] , [ 2  ] , [ 1  ] , [ 4 ]
    [ 12 ]   [ 15 ]   [ 11 ]   [ 9 ]

The 3 × 3 matrix

    [ 1  5  1 ]
    [ 3  4  2 ]
    [ 1  7  9 ]

is square, with three row vectors

    (1, 5, 1),  (3, 4, 2),  (1, 7, 9)

and three column vectors

    [ 1 ]   [ 5 ]   [ 1 ]
    [ 3 ] , [ 4 ] , [ 2 ]
    [ 1 ]   [ 7 ]   [ 9 ]

N

Example 546 (i) The square matrix of order n obtained by writing, one next to the other, the versors ei of R^n is called the identity (or unit) matrix and is denoted by In or, when there is no danger of confusion, simply by I:

    I = [ 1  0  ...  0 ]
        [ 0  1  ...  0 ]
        [ .  .   .   . ]
        [ 0  0  ...  1 ]

(ii) The m × n matrix with all zero elements is called null and is denoted by Omn or, when there is no danger of confusion, simply by O:

    O = [ 0  0  ...  0 ]
        [ 0  0  ...  0 ]
        [ .  .   .   . ]
        [ 0  0  ...  0 ]

N

13.2.2 Operations on matrices

Let M(m, n) be the set of all the m × n matrices. On M(m, n) we can define in a natural way the operations of addition and scalar multiplication:

(i) given two matrices (aij) and (bij) in M(m, n), the addition (aij) + (bij) is defined by

    [ a11 ... a1n ]   [ b11 ... b1n ]   [ a11 + b11 ... a1n + b1n ]
    [     ...     ] + [     ...     ] = [            ...          ]
    [ am1 ... amn ]   [ bm1 ... bmn ]   [ am1 + bm1 ... amn + bmn ]

that is, (aij) + (bij) = (aij + bij);

(ii) given α ∈ R and (aij) ∈ M(m, n), the scalar multiplication α(aij) is defined by

      [ a11 ... a1n ]   [ αa11 ... αa1n ]
    α [     ...     ] = [      ...      ]
      [ am1 ... amn ]   [ αam1 ... αamn ]

that is, α(aij) = (αaij).

Example 547 We have

    [  1   5   7   9 ]   [  0   2   1   4 ]   [  1   7   8  13 ]
    [  3   2   1   4 ] + [ −1  −3  −1  −4 ] = [  2  −1   0   0 ]
    [ 12  15  11   9 ]   [  5   8   1   2 ]   [ 17  23  12  11 ]

and

      [  1   5   7   9 ]   [  4  20  28  36 ]
    4 [  3   2   1   4 ] = [ 12   8   4  16 ]
      [ 12  15  11   9 ]   [ 48  60  44  36 ]

N
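These entrywise operations are exactly what NumPy's + and * do on arrays; a sketch reproducing Example 547:

    import numpy as np

    A = np.array([[1, 5, 7, 9], [3, 2, 1, 4], [12, 15, 11, 9]])
    B = np.array([[0, 2, 1, 4], [-1, -3, -1, -4], [5, 8, 1, 2]])

    print(A + B)    # entrywise addition
    print(4 * A)    # entrywise scalar multiplication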

Example 548 Given a square matrix A = (aij) of order n and two scalars α and β, we have

    αA + βI = [ αa11 + β    αa12      ...    αa1n     ]
              [ αa21        αa22 + β  ...    αa2n     ]
              [  ...                          ...     ]
              [ αan1        αan2      ...  αann + β   ]

It is easy to verify that the operations of addition and scalar multiplication just introduced on M(m, n) satisfy the properties (v1)-(v8) that in Chapter 3 we established for R^n, that is:

(v1) A + B = B + A

(v2) (A + B) + C = A + (B + C)

(v3) A + O = A

(v4) A + (−A) = O

(v5) α(A + B) = αA + αB

(v6) (α + β)A = αA + βA

(v7) 1A = A

(v8) α(βA) = (αβ)A

Intuitively, we can say that M(m, n) is another example of a vector space. Note that the neutral element for the addition is the null matrix.

13.2.3 A first taxonomy

Square matrices are particularly important. We call main (or principal) diagonal of a square matrix the set of the elements aii on the diagonal. A square matrix is said to be:

(i) symmetric if aij = aji for every i, j = 1, 2, ..., n, i.e., when the two triangles separated by the main diagonal are mirror images of each other;

(ii) lower triangular if all the elements above the main diagonal are zero, that is, aij = 0 for i < j;

(iii) upper triangular if all the elements below the main diagonal are zero, that is, aij = 0 for i > j;

(iv) diagonal if it is both lower and upper triangular, that is, if all the elements outside the main diagonal are zero: aij = 0 for i ≠ j.

Example 549 The matrix

    [ 1  2  1 ]
    [ 2  4  0 ]
    [ 1  0  9 ]

is symmetric. The matrices

    [ 1  0  0 ]   [ 1  5  1 ]   [ 1  0  0 ]
    [ 3  4  0 ] , [ 0  4  2 ] , [ 0  4  0 ]
    [ 1  7  9 ]   [ 0  0  0 ]   [ 0  0  9 ]

are lower triangular, upper triangular and diagonal, respectively. N

We call transpose of a matrix A ∈ M(m, n) the matrix B ∈ M(n, m) obtained by interchanging the rows and the columns of A, that is,

    bij = aji

for every i = 1, 2, ..., n and every j = 1, 2, ..., m. The transpose of A is denoted by A^T.

Example 550 We have:

        [  1   5   7 ]               [ 1   3  12 ]
    A = [  3   2   1 ]   and   A^T = [ 5   2  15 ]
        [ 12  15  11 ]               [ 7   1  11 ]

as well as

        [ 1  0  7 ]               [ 1  3 ]
    A = [ 3  5  1 ]   and   A^T = [ 0  5 ]
                                  [ 7  1 ]

N

Note that

    (A^T)^T = A

so the "transpose of the transpose" of a matrix is the matrix itself. In particular, it is easy to see that a square matrix A is symmetric if and only if A^T = A. In this case, transposition has no effect. Finally, in terms of operations we have

    (αA)^T = αA^T    and    (A + B)^T = A^T + B^T

for every α ∈ R and every A, B ∈ M(m, n).
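These transposition rules are easy to check numerically; a short sketch (ours):

    import numpy as np

    A = np.array([[1, 0, 7], [3, 5, 1]])
    B = np.array([[2, -1, 4], [0, 6, -3]])

    assert (A.T.T == A).all()                # (A^T)^T = A
    assert ((A + B).T == A.T + B.T).all()    # (A + B)^T = A^T + B^T
    assert ((3 * A).T == 3 * A.T).all()      # (αA)^T = αA^T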



A row vector x = (x1, ..., xn) ∈ R^n can be regarded as a 1 × n matrix, so we can identify R^n with M(1, n). According to this identification, the transpose x^T of x is the column vector

    [ x1 ]
    [ ...]
    [ xn ]

that is, x^T ∈ M(n, 1). This allows us to identify R^n also with M(n, 1).

In what follows we will often identify the vectors of R^n with matrices. Sometimes it will be convenient to regard them as row vectors, that is, as elements of M(1, n), sometimes as column vectors, that is, as elements of M(n, 1). In any case, one should not forget that vectors are elements of R^n; identifications are holograms.

13.2.4 Product of matrices

It is possible to define the product of two matrices A and B under suitable conditions on their dimensions. We first present the special case of the product of a matrix with a vector. Let A = (aij) ∈ M(m, n) and x ∈ R^n. The choice of the dimensions of A and x is not arbitrary: the product of the type Ax^T between the matrix A and the column vector x^T requires that the number of rows of x^T be equal to the number of columns of A. If this is the case, the product Ax^T is defined by

           [ a11  a12  ...  a1n ] [ x1 ]   [ Σ_{i=1}^n a1i xi ]   [ a1 · x ]
    Ax^T = [ a21  a22  ...  a2n ] [ x2 ] = [ Σ_{i=1}^n a2i xi ] = [ a2 · x ]
           [          ...       ] [ ...]   [        ...       ]   [  ...   ]
           [ am1  am2  ...  amn ] [ xn ]   [ Σ_{i=1}^n ami xi ]   [ am · x ]

where a1, a2, ..., am are the rows of A and a1 · x, a2 · x, ..., am · x are the inner products between the rows of A and the vector x. In particular, Ax^T ∈ M(m, 1).

It is thus evident why the dimension of the vector x must be equal to the number of columns of A: in multiplying A with x, the components of Ax^T are the inner products between the rows of A and the vector x. But inner products are possible only between vectors of the same dimension.

Notation To ease notation, in what follows we will just write Ax instead of Ax^T.

Example 551 Let A ∈ M(3, 4) and x ∈ R^4 be given by

        [ 3  −2   0  −1 ]
    A = [ 0  10   2  −2 ]    and    x = (1, 2, 3, −4)
        [ 4   0  −2   3 ]

It is possible to compute the product Ax:

         [ 3 · 1 + (−2) · 2 + 0 · 3 + (−1) · (−4) ]   [   3 ]
    Ax = [ 0 · 1 + 10 · 2 + 2 · 3 + (−2) · (−4)   ] = [  34 ]
         [ 4 · 1 + 0 · 2 + (−2) · 3 + 3 · (−4)    ]   [ −14 ]

However, it is not possible to take the product xA: the number of rows of A (i.e., 3) is not equal to the number of columns of x (i.e., 1). N
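In NumPy the same computation is the @ operator; a one-line check of Example 551:

    import numpy as np

    A = np.array([[3, -2, 0, -1],
                  [0, 10, 2, -2],
                  [4,  0, -2, 3]])
    x = np.array([1, 2, 3, -4])

    print(A @ x)    # -> [  3  34 -14]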

In a similar way, we define the product of two matrices A and B by suitably multiplying the rows of A and the columns of B. The prerequisite on the dimensions of the matrices is that the number of columns of A is equal to the number of rows of B. In other words, the product AB is possible when A ∈ M(m, n) and B ∈ M(n, q). If we denote by a1, a2, ..., am the rows of A and by b1, b2, ..., bq the columns of B, we then have

         [ a1 ]                        [ a1 · b1  a1 · b2  ...  a1 · bq ]
    AB = [ a2 ] (b1, b2, ..., bq)   =  [ a2 · b1  a2 · b2  ...  a2 · bq ]
         [ ...]                        [               ...              ]
         [ am ]                        [ am · b1  am · b2  ...  am · bq ]

The elements abij of the product matrix AB are, therefore,

    abij = ai · bj = Σ_{k=1}^n aik bkj

for i = 1, ..., m and j = 1, ..., q.

The product matrix AB is of type m × q: so, it has the same number of rows as A and the same number of columns as B. Note that it is possible to take the product A_{m×n} B_{n×q} of the matrices A and B if and only if the product B^T_{q×n} A^T_{n×m} of the transpose matrices B^T and A^T is well-defined. Momentarily it will be seen that, indeed, (AB)^T = B^T A^T.

This definition of product between matrices finds its justification in Proposition 569, which we discuss later in the chapter. For the moment, it is important to understand the "mechanics" of the definition. To this end, we proceed with some examples.

Example 552 Let A ∈ M(2, 4) and B ∈ M(4, 3) be given by

                                         [  0   2   3 ]
    A = [  3  −2   8  −6 ]    and    B = [  5  −6   1 ]
        [ 13   0  −4   9 ]               [ 12   7   0 ]
                                         [ −1   9  11 ]

It is possible to compute the product AB:

    AB = [ 3·0 + (−2)·5 + 8·12 + (−6)·(−1)   3·2 + (−2)·(−6) + 8·7 + (−6)·9   3·3 + (−2)·1 + 8·0 + (−6)·11 ]
         [ 13·0 + 0·5 + (−4)·12 + 9·(−1)     13·2 + 0·(−6) + (−4)·7 + 9·9     13·3 + 0·1 + (−4)·0 + 9·11   ]

       = [  92  20  −59 ]
         [ −57  79  138 ]

However, it is not possible to take the product BA: the number of rows of A (i.e., 2) is not equal to the number of columns of B (i.e., 3). As we just remarked, it is possible, though, to take the product B^T A^T; indeed, the number of columns of B^T (i.e., 4) is equal to the number of rows of A^T. N

Example 553 Consider the matrices

                                 [ 1  2  1  0 ]
    A = [ 1  3  1 ]    and   B = [ 2  5  2  2 ]
        [ 0  1  4 ]              [ 0  1  3  2 ]

The product matrix AB is 2 × 4. In this regard, note the useful mnemonic rule (2 × 4) = (2 × 3)(3 × 4). We have:

    AB = [ 1·1 + 3·2 + 1·0   1·2 + 3·5 + 1·1   1·1 + 3·2 + 1·3   1·0 + 3·2 + 1·2 ]
         [ 0·1 + 1·2 + 4·0   0·2 + 1·5 + 4·1   0·1 + 1·2 + 4·3   0·0 + 1·2 + 4·2 ]

       = [ 7  18  10   8 ]
         [ 2   9  14  10 ]

N

The product of matrices has the following properties, as the reader can verify.

Proposition 554 Let A, B and C be any three matrices for which it is possible to take the products indicated below. Then

(i) (AB)C = A(BC);

(ii) A(B + C) = AB + AC;

(iii) (A + B)C = AC + BC;

(iv) α(AB) = (αA)B = A(αB) for every α ∈ R;

(v) (AB)^T = B^T A^T.

Among the properties of the product, commutativity is missing. Indeed, the product of matrices does not satisfy this property: if both products AB and BA are well-defined, in general we have AB ≠ BA. The next example will illustrate this notable failure.

When AB = BA, we say that the two matrices commute. Since (AB)^T = B^T A^T, the matrices A and B commute if and only if their transposes commute.

Example 555 Let A and B be given by

        [ 1  0  3 ]              [ 2  1  4 ]
    A = [ 2  1  0 ]    and   B = [ 0  3  1 ]
        [ 1  4  6 ]              [ 4  2  4 ]

Since A and B are square matrices, both BA and AB are well-defined 3 × 3 matrices. We have:

    BA = [ 2·1 + 1·2 + 4·1   2·0 + 1·1 + 4·4   2·3 + 1·0 + 4·6 ]   [  8  17  30 ]
         [ 0·1 + 3·2 + 1·1   0·0 + 3·1 + 1·4   0·3 + 3·0 + 1·6 ] = [  7   7   6 ]
         [ 4·1 + 2·2 + 4·1   4·0 + 2·1 + 4·4   4·3 + 2·0 + 4·6 ]   [ 12  18  36 ]

while

    AB = [ 1·2 + 0·0 + 3·4   1·1 + 0·3 + 3·2   1·4 + 0·1 + 3·4 ]   [ 14   7  16 ]
         [ 2·2 + 1·0 + 0·4   2·1 + 1·3 + 0·2   2·4 + 1·1 + 0·4 ] = [  4   5   9 ]
         [ 1·2 + 4·0 + 6·4   1·1 + 4·3 + 6·2   1·4 + 4·1 + 6·4 ]   [ 26  25  32 ]

So AB ≠ BA: the product is not commutative. N
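The failure of commutativity is immediate to see in code as well:

    import numpy as np

    A = np.array([[1, 0, 3], [2, 1, 0], [1, 4, 6]])
    B = np.array([[2, 1, 4], [0, 3, 1], [4, 2, 4]])

    print(A @ B)                              # [[14  7 16] [ 4  5  9] [26 25 32]]
    print(B @ A)                              # [[ 8 17 30] [ 7  7  6] [12 18 36]]
    assert not np.array_equal(A @ B, B @ A)   # AB ≠ BA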

13.3 Linear operators

13.3.1 Definition and first properties

The functions T : R^n → R^m defined on R^n and with values in R^m are called operators (Chapter 6).

Definition 556 An operator T : R^n → R^m is linear if

    T(αx + βy) = αT(x) + βT(y)    (13.7)

for every x, y ∈ R^n and every α, β ∈ R.

The notion of linear operator generalizes that of linear function (Definition 529), which is the special case m = 1, that is, R^m = R.

Linear operators are the operators which preserve the operations of addition and scalar multiplication, thus generalizing the analogous result that we established for linear functions (Proposition 533). Though natural, it is a significant generalization: here T(x) is a vector of R^m, not a scalar (unless m = 1).

Proposition 557 An operator T : R^n → R^m is linear if and only if

(i) T(x + y) = T(x) + T(y) for all x, y ∈ R^n;

(ii) T(αx) = αT(x) for all x ∈ R^n and α ∈ R.

We omit the proof because it is similar to that of Proposition 533.

Example 558 Given a matrix A ∈ M(m, n), define the operator T : R^n → R^m by

    T(x) = Ax    ∀x ∈ R^n    (13.8)

It is easy to see that T is linear. Soon, in Theorem 564, we will show that all linear operators T : R^n → R^m actually have such a form.

Note that this operator can be written in the form T = (T1, ..., Tm) : R^n → R^m introduced in Section 12.7 by setting, for every i = 1, ..., m,

    Ti(x) = ai · x

where ai is the i-th row vector of the matrix A. N

Example 559 (i) The operator 0 : R^n → R^m defined by

    0(x) = 0    ∀x ∈ R^n

is linear and is called the null or zero operator.

(ii) The identity operator I : R^n → R^n defined by

    I(x) = x    ∀x ∈ R^n

is of great importance. Clearly, it is linear. N

When n = m, we have the important special case of "self" operators T : R^n → R^n that have the same domain and codomain.

Example 560 Let A = (aij) be an n × n square matrix. As in Example 558, define the operator T : R^n → R^n by

    T(x) = Ax    ∀x ∈ R^n

Now, this operator has the same domain and codomain. N

We conclude this first section with some basic properties of linear operators that generalize those stated in Proposition 534 for linear functions (the easy proof is left to the reader).

Proposition 561 Let T : R^n → R^m be a linear operator. We have T(0) = 0 and

    T(Σ_{i=1}^k αi x^i) = Σ_{i=1}^k αi T(x^i)    (13.9)

for every set of vectors {x^i}_{i=1}^k in R^n and every set of scalars {αi}_{i=1}^k.

As we have already seen for linear functions, property (13.9) has the important consequence that, once we know the values taken by a linear operator T on the elements of a basis of R^n, we can determine the values of T for each vector of R^n.

The operations of addition and scalar multiplication are easily defined for operators. Specifically, given two operators S, T : R^n → R^m, linear or not, and a scalar α ∈ R, define S + T : R^n → R^m and αT : R^n → R^m by

    (S + T)(x) = S(x) + T(x)

and

    (αT)(x) = αT(x)

for every x ∈ R^n. Denote by L(R^n, R^m) the space of all linear operators T : R^n → R^m. In the case of linear functions, i.e., R^m = R, the space L(R^n, R) reduces to the dual space (R^n)' that we studied before. It is easy to check that the two operations just introduced satisfy the "usual" properties (v1)-(v8). Again, this means that L(R^n, R^m) is, intuitively, another example of a vector space. In particular, in the case of linear operators T : R^n → R^n, to ease notation we denote this space just by L(R^n), in place of L(R^n, R^n).

Addition and scalar multiplication are, by now, routine. The next notion is, instead, peculiar to operators.

Definition 562 Given two linear operators T : R^n → R^m and S : R^m → R^q, their product is the function ST : R^n → R^q defined by

    (ST)(x) = S(T(x))

for every x ∈ R^n.

In other words, the product operator ST is the composite function S ∘ T. If the operators S and T are linear, so is the product ST. Indeed:

    (ST)(αx + βy) = S(T(αx + βy)) = S(αT(x) + βT(y))
                  = αS(T(x)) + βS(T(y)) = α(ST)(x) + β(ST)(y)

for every x, y ∈ R^n and every α, β ∈ R. The product of two linear operators is, therefore, still a linear operator.

As Proposition 569 will make clear, in general the product is not commutative: when both products ST and TS are defined, in general we have ST ≠ TS. Hence, when one writes ST and TS, the order in which the two operators appear is important.

Last, but not least, we state the version for operators of the remarkable Theorem 535 on continuity.

Proposition 563 Linear operators are continuous.

The proof is a simple elaboration on Theorem 535, so it is left to readers (who read the
proof of Theorem 669).

13.3.2 Representation

In this section we study in more detail linear operators T : R^n → R^m. We start by establishing a representation theorem for them. In Riesz's Theorem we saw that a function f : R^n → R is linear if and only if there exists a vector λ ∈ R^n such that f(x) = λ · x for every x ∈ R^n. The next result generalizes Riesz's Theorem to linear operators.

Theorem 564 An operator T : R^n → R^m is linear if and only if there exists a (unique) m × n matrix A such that

    T(x) = Ax    (13.10)

for every x ∈ R^n.

The matrix A is called the matrix associated to the operator T (or also the representative matrix of the operator T).

Matrices allow us, therefore, to represent operators in the form (13.10), which is of great importance both theoretically and operationally. This is why matrices are so important: though the fundamental notion is that of operator, thanks to the representation (13.10) matrices become a most useful auxiliary notion that will accompany us in the rest of the book.

Proof "If". This part is contained, essentially, in Example 558. "Only if". Let T be a linear operator. Set

    A = [ T(e1) | T(e2) | ... | T(en) ]    (13.11)

that is, A is the m × n matrix whose n columns are the column vectors T(ei) ∈ M(m, 1) for i = 1, ..., n. We can write every x ∈ R^n as x = Σ_{i=1}^n xi ei. Therefore, for every x ∈ R^n,

    T(x) = T(Σ_{i=1}^n xi ei) = Σ_{i=1}^n xi T(ei)

         = x1 (a11, a21, ..., am1) + x2 (a12, a22, ..., am2) + ... + xn (a1n, a2n, ..., amn)

         = (a11 x1 + a12 x2 + ... + a1n xn, ..., am1 x1 + am2 x2 + ... + amn xn)

         = (a1 · x, a2 · x, ..., am · x) = Ax

where a1, a2, ..., am are the rows of A.

As to uniqueness, let B be an m × n matrix for which (13.10) holds. By considering the vectors ei we have

    (a11, a21, ..., am1) = T(e1) = Be1 = (b11, b21, ..., bm1)
    (a12, a22, ..., am2) = T(e2) = Be2 = (b12, b22, ..., bm2)
    ...
    (a1n, a2n, ..., amn) = T(en) = Ben = (b1n, b2n, ..., bmn)

Therefore, A = B.

Example 565 Let T : R^3 → R^3 be defined by

    T(x) = (0, x2, x3)    ∀x ∈ R^3

In other words, T projects every vector in R^3 on the plane {x ∈ R^3 : x1 = 0}. For example, T(2, 3, 5) = (0, 3, 5). We have

    T(e1) = (0, 0, 0)
    T(e2) = (0, 1, 0)
    T(e3) = (0, 0, 1)

and therefore

    A = [ T(e1) | T(e2) | T(e3) ] = [ 0  0  0 ]
                                    [ 0  1  0 ]
                                    [ 0  0  1 ]

Hence, T(x) = Ax for every x ∈ R^3. N

Example 566 Let T : R^3 → R^2 be defined by

    T(x) = (x1 − x3, x1 + x2 + x3)    ∀x ∈ R^3

For example, T(2, 3, 5) = (−3, 10). We have

    T(e1) = (1, 1)
    T(e2) = (0, 1)
    T(e3) = (−1, 1)

and therefore

    A = [ T(e1) | T(e2) | T(e3) ] = [ 1  0  −1 ]
                                    [ 1  1   1 ]

So, we can write T(x) = Ax for every x ∈ R^3. N
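The recipe of (13.11) – stack the images of the standard basis as columns – is easily mechanized. A Python sketch for Example 566 (the helper's name is ours):

    import numpy as np

    def matrix_of(T, n):
        """Representative matrix (13.11): columns are T(e^1), ..., T(e^n)."""
        return np.column_stack([T(np.eye(n)[i]) for i in range(n)])

    T = lambda x: np.array([x[0] - x[2], x[0] + x[1] + x[2]])   # Example 566
    A = matrix_of(T, 3)                    # [[ 1.  0. -1.], [ 1.  1.  1.]]
    x = np.array([2.0, 3.0, 5.0])
    assert np.allclose(T(x), A @ x)        # T(x) = Ax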



13.3.3 Matrices and operations

At this point it is natural to ask what are the matrix representations of the operations on operators. For addition and scalar multiplication we have the following simple result (the easy proof is left to the reader).

Proposition 567 Let S, T : R^n → R^m be two linear operators and let α ∈ R. Let A and B be the two m × n matrices associated to S and T, respectively. Then

(i) A + B is the matrix associated to the operator S + T;

(ii) αA is the matrix associated to the operator αS.

Example 568 Let S, T : R^3 → R^3 be the linear operators defined, for all x ∈ R^3, by

    S(x) = (0, x2, x3)    and    T(x) = (2x1 − x3, x1 + x2 + 3x3, 2x1 − x2)

In Example 565 we saw that

    A = [ 0  0  0 ]
        [ 0  1  0 ]
        [ 0  0  1 ]

is the matrix associated to the operator S. By proceeding in the same way, it is easy to check that

    B = [ 2   0  −1 ]
        [ 1   1   3 ]
        [ 2  −1   0 ]

is the matrix associated to the operator T. By Proposition 567,

    A + B = [ 2   0  −1 ]
            [ 1   2   3 ]
            [ 2  −1   1 ]

is then the matrix associated to the operator S + T. Moreover, if we take for example α = 10, by Proposition 567,

    αA = [ 0   0   0 ]
         [ 0  10   0 ]
         [ 0   0  10 ]

is then the matrix associated to the operator αS. N

We move to the more interesting case of the product of operators.

Proposition 569 Let S : R^m → R^q and T : R^n → R^m be two linear operators with associated matrices, respectively,

    A = (aij) ∈ M(q, m)    and    B = (bij) ∈ M(m, n)

Then, the matrix associated to the product operator ST : R^n → R^q is the product matrix

    AB = (abij) ∈ M(q, n)

The product matrix AB is, therefore, the matrix representation of the product operator ST. This motivates the notion of product of matrices that, when it was introduced earlier in the chapter, might have seemed quite artificial.

Proof Let {ei}_{i=1}^n, {ẽi}_{i=1}^q, and {ēi}_{i=1}^m be, respectively, the standard bases of R^n, R^q, and R^m. We have

    T(ej) = Bej = (b1j, b2j, ..., bmj)
          = b1j (1, 0, ..., 0) + b2j (0, 1, 0, ..., 0) + ... + bmj (0, 0, ..., 1) = Σ_{k=1}^m bkj ēk

In the same way,

    S(ēk) = Aēk = (a1k, ..., aqk) = Σ_{i=1}^q aik ẽi

We can therefore write

    (ST)(ej) = S(T(ej)) = S(Σ_{k=1}^m bkj ēk) = Σ_{k=1}^m bkj S(ēk)
             = Σ_{k=1}^m bkj Σ_{i=1}^q aik ẽi = Σ_{i=1}^q (Σ_{k=1}^m aik bkj) ẽi

On the other hand, if C is the q × n matrix associated to the operator ST, then

    (ST)(ej) = Cej = (c1j, ..., cqj) = Σ_{i=1}^q cij ẽi

Therefore, cij = Σ_{k=1}^m aik bkj and we conclude that C = AB.

As we saw in Section 13.2.4, the product of matrices is in general not commutative: this, indeed, reflects the lack of commutativity of the product of linear operators.

13.4 Rank

13.4.1 Linear operators

The kernel, denoted ker T, of an operator T : R^n → R^m is the set

    ker T = {x ∈ R^n : T(x) = 0}    (13.12)

That is, ker T = T^{−1}(0). The kernel is thus the set of the points at which the operator takes on a null value (i.e., the zero vector 0 of R^m).

Another important set is the image (or range) of T, which is defined in the usual way as

    Im T = {y ∈ R^m : y = T(x) for some x ∈ R^n}    (13.13)

The image is, therefore, the set of the vectors of R^m that are "reached" from R^n through the operator T.

For linear operators the above sets turn out to be vector subspaces: the kernel of the domain R^n and the image of the codomain R^m.

Lemma 570 If T ∈ L(R^n, R^m), then ker T and Im T are vector subspaces of R^n and of R^m, respectively.

Proof We show the result for ker T, leaving Im T to the reader. Let x, x' ∈ ker T, so T(x) = 0 and T(x') = 0. We have to prove that αx + βx' ∈ ker T for every α, β ∈ R. Indeed, we have

    T(αx + βx') = αT(x) + βT(x') = 0 + 0 = 0

as desired.

These vector subspaces are important when dealing with the properties of injectivity and surjectivity of linear operators. In particular, by definition the operator T is surjective when Im T = R^m, that is, when the subspace Im T coincides with the entire space R^m. As to injectivity, by exploiting the linearity of T we have the following simple characterization through a null kernel.

Lemma 571 An operator T ∈ L(R^n, R^m) is injective if and only if ker T = {0}.

Proof "If". Suppose that ker T = {0}. Let x, y ∈ R^n with x ≠ y. Since x − y ≠ 0, the hypothesis ker T = {0} implies T(x − y) ≠ 0, so T(x) ≠ T(y). "Only if". Let T : R^n → R^m be an injective linear operator and let x ∈ ker T. If x ≠ 0, then by injectivity we have the contradiction T(x) ≠ T(0) = 0. Hence, x = 0, which implies ker T = {0}.

We can now state the important Rank-Nullity Theorem, which says that the dimension of R^n – that is, n – is always the sum of the dimensions of the two subspaces ker T and Im T determined by a linear operator T. To this end, we give a name to such dimensions.

Definition 572 The rank ρ(T) of a linear operator T : R^n → R^m is the dimension of Im T, while the nullity ν(T) is the dimension of ker T.

Using this terminology, we can now state and prove the result.

Theorem 573 (Rank-Nullity) Given a linear operator T : R^n → R^m, we have

    ρ(T) + ν(T) = n    (13.14)

Proof Setting ρ(T) = k and ν(T) = h, let {yi}_{i=1}^k be a basis of the vector subspace Im T of R^m and {x̄i}_{i=1}^h a basis of the vector subspace ker T of R^n. Since {yi}_{i=1}^k ⊆ Im T, by definition there exist k vectors {xi}_{i=1}^k in R^n such that T(xi) = yi for every i = 1, ..., k. Set

    E = {x1, ..., xk, x̄1, ..., x̄h}

To prove the theorem it is sufficient to show that E is a basis of R^n. Indeed, in this case E consists of n vectors and therefore k + h = n.

First of all, we show that the set E is linearly independent. Let {α1, ..., αk, β1, ..., βh} be scalars such that

    Σ_{i=1}^k αi xi + Σ_{i=1}^h βi x̄i = 0    (13.15)

Since T(0) = 0,³ we have

    T(Σ_{i=1}^k αi xi + Σ_{i=1}^h βi x̄i) = T(Σ_{i=1}^k αi xi) + T(Σ_{i=1}^h βi x̄i) = 0

On the other hand, since {x̄i}_{i=1}^h is a basis of ker T, we have T(Σ_{i=1}^h βi x̄i) = 0. Therefore,

    T(Σ_{i=1}^k αi xi) = Σ_{i=1}^k αi T(xi) = Σ_{i=1}^k αi yi = 0    (13.16)

Being a basis, {yi}_{i=1}^k is a linearly independent set, so (13.16) implies αi = 0 for every i = 1, ..., k. Therefore, (13.15) reduces to Σ_{i=1}^h βi x̄i = 0, which implies βi = 0 for every i = 1, ..., h because {x̄i}_{i=1}^h, as a basis, is a linearly independent set. Thus, we conclude that the set E is linearly independent.

It remains to show that span E = R^n. Let x ∈ R^n and consider its image T(x). By definition, T(x) ∈ Im T and therefore, since {yi}_{i=1}^k is a basis of Im T, there exists a set {αi}_{i=1}^k ⊆ R such that T(x) = Σ_{i=1}^k αi yi. Setting yi = T(xi) for every i = 1, ..., k, one obtains

    T(x) = Σ_{i=1}^k αi T(xi) = T(Σ_{i=1}^k αi xi)

Therefore, T(x − Σ_{i=1}^k αi xi) = 0, and so x − Σ_{i=1}^k αi xi ∈ ker T. On the other hand, {x̄i}_{i=1}^h is a basis of ker T, and therefore there exists a set {βi}_{i=1}^h of scalars such that x − Σ_{i=1}^k αi xi = Σ_{i=1}^h βi x̄i. In conclusion, x = Σ_{i=1}^k αi xi + Σ_{i=1}^h βi x̄i, which shows that x ∈ span E, as desired.
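Numerically, (13.14) is easy to verify: the rank is the dimension of the column space of the associated matrix and the nullity is what is left over. A small sketch:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])        # rank 1: the rows are proportional
    n = A.shape[1]

    rank = np.linalg.matrix_rank(A)        # ρ(T), for T(x) = Ax
    nullity = n - rank                     # ν(T), by the Rank-Nullity Theorem
    assert rank + nullity == n
    print(rank, nullity)                   # 1 2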

To appreciate the importance of the result, we present some interesting consequences that it has.

Corollary 574 A linear operator T : R^n → R^m is injective only if n ≤ m, while it is surjective only if n ≥ m.

Proof Let T be injective, so that ker T = {0}. Since Im T is a vector subspace of R^m, we have ρ(T) = dim(Im T) ≤ dim R^m = m. Therefore, (13.14) reduces to

    n = dim R^n = ρ(T) + dim{0} = ρ(T) ≤ dim R^m = m

Assume now that T is surjective, i.e., Im T = R^m. Since ν(T) ≥ 0, (13.14) yields

    n = dim R^n = ρ(T) + ν(T) = dim R^m + ν(T) ≥ dim R^m = m

as claimed.

³ In this proof we use two different zero vectors 0: the zero vector 0_{R^m} in R^m and the zero vector 0_{R^n} in R^n. For simplicity, we omit subscripts as no confusion should arise.

For a generic function, injectivity and surjectivity are distinct and independent properties: it is very easy to give examples of injective, but not surjective, functions and vice versa. The next important result, another remarkable consequence of the Rank-Nullity Theorem, shows that for linear "self" operators (i.e., with the same domain and codomain) the two properties turn out to be, instead, equivalent.

Corollary 575 A linear operator T : R^n → R^n is injective if and only if it is surjective. In particular, the following properties are equivalent:

(i) T is bijective;

(ii) ker T = {0};

(iii) Im T = R^n.

Proof (i) trivially implies (ii). As for (ii) implies (iii), let us assume ker T = {0}. Since ν(T) = 0, (13.14) implies ρ(T) = n. Since Im T is a subspace of R^n, this implies Im T = R^n and, therefore, (ii) implies (iii).

It remains to prove that (iii) implies (i). Assume (iii), i.e., Im T = R^n. To show that T is bijective it suffices to show that it is injective. Using (13.14), from ρ(T) = n it follows that ν(T) = 0, which implies ker T = {0}. By Lemma 571, T is then injective, as desired.

An equivalent way to state the second part of Corollary 575 is to say that the following conditions are equivalent:

(i) T is bijective;

(ii) ν(T) = 0;

(iii) ρ(T) = n.

13.4.2 Rank of matrices

The rank of a matrix is one of the central notions of linear algebra.

Definition 576 The rank of a matrix A, denoted by ρ(A), is the maximum number of its columns that are linearly independent.

Example 577 Let

    A = [ 3  6  18  2 ]
        [ 1  2   6  4 ]
        [ 0  1   3  6 ]
        [ 2  1   3  8 ]

Since the third column can be obtained by multiplying the second column by 3, the set of all four columns is linearly dependent. Therefore, ρ(A) < 4. Instead, it is easy to verify that the first, second, and fourth columns are linearly independent, as are the first, third, and fourth columns. Thus, ρ(A) = 3. Note that there are two different sets of linearly independent columns, which have the same cardinality. N
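NumPy computes the rank via a singular value decomposition; it confirms the count of Example 577:

    import numpy as np

    A = np.array([[3, 6, 18, 2],
                  [1, 2,  6, 4],
                  [0, 1,  3, 6],
                  [2, 1,  3, 8]])

    assert (A[:, 2] == 3 * A[:, 1]).all()     # the third column is 3 times the second
    print(np.linalg.matrix_rank(A))           # -> 3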

N.B. To establish whether k vectors x1, x2, ..., xk ∈ R^n are linearly independent (with k ≤ n, otherwise the answer is certainly negative) one can construct the n × k matrix that has these vectors as columns. They are linearly independent if and only if the rank of this matrix is k. O

Let A be the matrix associated to a linear operator T. Since the vector subspace Im T is generated by the column vectors of A,⁴ we have ρ(T) ≤ ρ(A) (why?). The next result shows that, actually, equality holds: the notions of rank for operators and for matrices are consistent. In other words, the dimension of the image of a linear operator is equal to the maximum number of linearly independent columns of the matrix associated to it.

Proposition 578 Let A ∈ M(m, n) be the matrix associated to a linear operator T : R^n → R^m. Then ρ(A) = ρ(T).

Proof Denote ρ(A) = k ≤ n. By (13.11), we have A = [T(e1) | T(e2) | ... | T(en)]. Without loss of generality, suppose that the k linearly independent columns are T(e1), ..., T(ek). The (possible) remaining columns T(e^{k+1}), ..., T(en) can therefore be expressed as their linear combinations, so that

    span{T(e1), T(e2), ..., T(en)} = span{T(e1), T(e2), ..., T(ek)}

Let y ∈ Im T. By definition, there exists x ∈ R^n such that T(x) = y. Therefore, y = T(x) = T(Σ_{i=1}^n xi ei) = Σ_{i=1}^n xi T(ei). It follows that

    Im T = span{T(e1), T(e2), ..., T(en)} = span{T(e1), T(e2), ..., T(ek)}

which proves that the set {T(e1), T(e2), ..., T(ek)} is a basis of Im T. Therefore, ρ(T) = dim(Im T) = k.

Thanks to the Rank-Nullity Theorem, the proposition has the following corollary, which shows that the linear independence of the columns is the matrix counterpart of injectivity.

Corollary 579 A linear operator T : R^n → R^m, with associated matrix A ∈ M(m, n), is injective if and only if the columns of A are linearly independent.

Proof By Lemma 571, T is injective if and only if ν(T) = 0. By the Rank-Nullity Theorem, this happens if and only if ρ(T) = n, i.e., if and only if ρ(A) = n (by Proposition 578).

So far we have considered the linear independence of the columns of A. The connection with the linear independence of the rows of A is, however, very tight, as the next important result shows. In reading it, note that the rank of the transpose matrix A^T is the maximum number of linearly independent rows of A.

Theorem 580 For every matrix A, the maximum numbers of linearly independent rows and columns coincide, i.e.,

    ρ(A) = ρ(A^T)

⁴ Indeed, recall that the i-th column of A is T(ei) and therefore T(x) = T(Σ_{i=1}^n xi ei) = Σ_{i=1}^n xi T(ei). This shows that the image T(x) is a linear combination of the columns of A.

Proof Let A = (aij) ∈ M(m, n). In the proof we denote the i-th row by Ri and the j-th column by Cj. We have to prove that the subspace of R^n generated by the rows of A, called the row space of A, has the same dimension as the subspace of R^m generated by the columns of A, called the column space of A. Let r be the dimension of the row space of A, that is, r = ρ(A^T), and let {x1, x2, ..., xr} ⊆ R^n be a basis of this space, where

    xi = (xi1, xi2, ..., xin)    ∀i = 1, 2, ..., r

Each row Ri of A can be written in a unique way as a linear combination of {x1, x2, ..., xr}, that is, there exists a vector of r coefficients (w1i, w2i, ..., wri) such that

    Ri = w1i x1 + w2i x2 + ... + wri xr    ∀i = 1, 2, ..., m    (13.17)

Let us concentrate now on the first column of A, i.e., C1 = (a11, a21, ..., am1). The first component a11 of C1 is equal to the first component of R1, the second component a21 of C1 is equal to the first component of R2, and so on until the m-th component am1 of C1, which is equal to the first component of Rm. Thanks to (13.17), we have

    a11 = w11 x11 + w21 x21 + ... + wr1 xr1
    a21 = w12 x11 + w22 x21 + ... + wr2 xr1
    ...
    am1 = w1m x11 + w2m x21 + ... + wrm xr1

that is,

    C1 = x11 w1 + x21 w2 + ... + xr1 wr

where wk = (wk1, wk2, ..., wkm) ∈ R^m is, for k = 1, 2, ..., r, the vector that collects the k-th coefficient of the expansions (13.17) of the rows R1, ..., Rm. The column C1 of A can, therefore, be written as a linear combination of the vectors w1, w2, ..., wr. In a similar way it is possible to verify that all the n columns of A can be written as linear combinations of w1, w2, ..., wr. Therefore, the column space of A is generated by the r vectors w1, w2, ..., wr of R^m, which implies that its dimension ρ(A) is lower than or equal to r. That is,

    ρ(A) ≤ r = ρ(A^T)

By interchanging the rows and the columns and by repeating the same reasoning, we get

    r = ρ(A^T) ≤ ρ(A)

This concludes the proof.



Example 581 Consider the rows of the matrix

    A = [ 3  6  18 ]
        [ 1  2   6 ]
        [ 0  1   3 ]

Since the first row is obtained by multiplying the second one by 3, the set of all the three rows is linearly dependent. Therefore, ρ(A^T) < 3. Instead, the two rows (3, 6, 18) and (0, 1, 3) are linearly independent, as are the rows (1, 2, 6) and (0, 1, 3). Therefore, ρ(A^T) = 2. N

Even though the maximal sets of linearly independent rows or columns can be different (in the matrix of the last example we have two different sets, both for the rows and for the columns), they have the same cardinality because ρ(A) = ρ(A^T). It is a remarkable result that, in view of Corollary 575, shows that for a linear operator T : R^n → R^n with associated square matrix A the following conditions are equivalent:

(i) T is injective;

(ii) T is surjective;

(iii) the columns of A are linearly independent, that is, ρ(A) = n;

(iv) the rows of A are linearly independent, that is, ρ(A^T) = n.

The equivalence of these conditions is one of the deepest results of linear algebra.

O.R. Sometimes one calls rank by rows the maximum number of linearly independent rows, and rank by columns what we have defined as the rank, that is, the maximum number of linearly independent columns. According to these definitions, Theorem 580 says that the rank by columns always coincides with the rank by rows. The rank is their common value. H

13.4.3 Properties

From Theorem 580 it follows that, if A ∈ M(m, n), we have

    ρ(A) ≤ min{m, n}    (13.18)

If it happens that ρ(A) = min{m, n}, the matrix A is said to be of full (or maximum) rank. Indeed, the rank cannot assume a higher value.

Note that the rank of a matrix does not change if one permutes the places of two columns. Without loss of generality, we can then assume that, for a matrix A of rank r, the first r columns are linearly independent. This useful convention will be used several times in the proofs below.

The next result gathers some useful properties of the rank.

Proposition 582 Let A, B ∈ M(m, n). Then

(i) ρ(A + B) ≤ ρ(A) + ρ(B) and ρ(αA) = ρ(A) for every α ≠ 0;

(ii) ρ(A) = ρ(CA) = ρ(AD) = ρ(CAD) if C and D are square matrices of full rank;⁵

(iii) ρ(A) = ρ(A^T A).

Point (i) shows the behavior of the rank with respect to the matrix operations of addition and scalar multiplication. Points (ii) and (iii) are interesting invariance properties of the rank with respect to the product of matrices. The square matrix A^T A is important in applications and is called the Gram matrix (we will meet it in connection with the least squares method).

Proof (i) Let r and r' be the ranks of A and of B: there are r and r' linearly independent columns in A and in B, respectively. If r + r' ≥ n the result is trivial because the number of columns of A + B is n and there cannot be more than n linearly independent columns.

Let therefore r + r' < n. We denote by as and by bs, with s = 1, ..., n, the generic columns of the two matrices, so that the s-th column of A + B is as + bs. We can always suppose that the r linearly independent columns of A are the first ones (i.e., a1, ..., ar) and that the r' linearly independent columns of B are the last ones (i.e., b^{n−r'+1}, ..., bn). In this way the n − (r + r') central columns of A + B (that is, the as + bs with s = r + 1, ..., n − r') are certainly linear combinations of {a1, ..., ar, b^{n−r'+1}, ..., bn}, because the as can be written as linear combinations of {a1, ..., ar} and the bs of {b^{n−r'+1}, ..., bn}. It follows that the number of linearly independent columns of A + B cannot exceed r + r'. We leave to the reader the proof of the rest of the statement.

(ii) Let us prove ρ(A) = ρ(AD), leaving to the reader the proof of ρ(A) = ρ(CA) (the equality ρ(A) = ρ(CAD) can be obtained immediately from the other two). If A = O, the result is trivially true. Let therefore A ≠ O and let r be the rank of A; there are therefore r linearly independent columns: let us call them a1, a2, ..., ar, since we can always suppose that they are the first r ones; the others, a^{r+1}, a^{r+2}, ..., an, are linear combinations of the first ones. Let us prove, now, that the columns of AD are linear combinations of the columns of A. To this end, let A = (aij) and D = (dij). Moreover, let αi for i = 1, 2, ..., m and aj for j = 1, 2, ..., n be the rows and the columns of A, and dj for j = 1, 2, ..., n be the columns of D. Then

    AD = [ α1 ]                       [ α1 · d1   α1 · d2   ...   α1 · dn ]
         [ α2 ] (d1 | d2 | ... | dn) = [ α2 · d1   α2 · d2   ...   α2 · dn ]
         [ ...]                       [                ...               ]
         [ αm ]                       [ αm · d1   αm · d2   ...   αm · dn ]

The first column of AD, denoted by (ad)1, is

    (ad)1 = (α1 · d1, α2 · d1, ..., αm · d1)
          = (a11 d11 + a12 d21 + ... + a1n dn1, ..., am1 d11 + am2 d21 + ... + amn dn1)
          = d11 a1 + d21 a2 + ... + dn1 an

⁵ Of order m and n, respectively, so that the products CA and AD are well defined.

The first column of AD is, therefore, a linear combination of the columns of A. Analogously, it is possible to prove that the second column of AD is

    (ad)2 = d12 a1 + d22 a2 + ... + dn2 an

and, in general, the j-th column of AD is

    (ad)j = d1j a1 + d2j a2 + ... + dnj an    ∀j = 1, 2, ..., n    (13.19)

Therefore, since each column of AD is a linear combination of the columns of A, the space generated by the columns of AD is a subspace of R^m of dimension lower than or equal to that of the space generated by the columns of A. In other words,

    ρ(AD) ≤ ρ(A) = r    (13.20)

Let us suppose, by contradiction, that ρ(AD) < ρ(A) = r. Then, in the linear combinations (13.19) one of the first r columns of A always has coefficient zero (otherwise, the column space of AD would have dimension at least r, a1, a2, ..., ar being linearly independent vectors of R^m). Without loss of generality, let us suppose that column a1 is the one having coefficient zero in all linear combinations (13.19). Then, we have

    d11 = d12 = ... = d1n = 0

which is a contradiction, since D has full rank and so it cannot have a row of only zeros. Therefore, the space generated by the columns of AD has dimension at least r, that is, ρ(AD) ≥ r. Together with (13.20), this proves the result.

(iii) If A, and therefore A^T, are of full rank, the result follows from (ii). Suppose that A does not have full rank and let ρ(A) = r, with r < min{m, n}. As seen in (ii), the columns of A^T A are linear combinations of the columns of A^T, and so

    ρ(A^T A) ≤ ρ(A^T) = ρ(A) = r    (13.21)

By assuming that the first r columns of A are linearly independent, we can write A as

    A = [ B  C ]

with B ∈ M(m, r) of full rank equal to r and C ∈ M(m, n − r). Therefore,

    A^T A = [ B^T ] [ B  C ] = [ B^T B   B^T C ]
            [ C^T ]            [ C^T B   C^T C ]

By property (ii), the submatrix B^T B, which is square of order r, has full rank r. Therefore, the r columns of B^T B are linearly independent vectors of R^r. Consequently, the first r columns of A^T A are linearly independent vectors of R^n (otherwise, the r columns of B^T B would not be linearly independent). The column space of A^T A has dimension at least r, that is, ρ(A^T A) ≥ r. Together with (13.21), this proves the result.
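Properties (i) and (iii) are easy to sanity-check numerically (property (iii) holds exactly in exact arithmetic; with floating point, matrix_rank uses a tolerance):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.integers(-3, 4, size=(4, 3)).astype(float)
    B = rng.integers(-3, 4, size=(4, 3)).astype(float)
    rk = np.linalg.matrix_rank

    assert rk(A + B) <= rk(A) + rk(B)     # property (i)
    assert rk(2.5 * A) == rk(A)           # ρ(αA) = ρ(A) for α ≠ 0
    assert rk(A.T @ A) == rk(A)           # property (iii): the Gram matrix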

13.4.4 Gaussian elimination procedure

The Gaussian elimination procedure is an important algorithm for the calculation of the rank of matrices. Another algorithm, due to Kronecker, will be presented in Section 13.6.7 after having introduced the notion of determinant.

We start with a trivial observation. There are matrices that reveal immediately their properties, among them the rank. For example, both matrices

    [ 1  0  0  0  0 ]        [ 1  0  0 ]
    [ 0  1  0  0  0 ]   and  [ 0  1  0 ]    (13.22)
    [ 0  0  1  0  0 ]        [ 0  0  1 ]
                             [ 0  0  0 ]

have rank 3: in the first one the first three columns are linearly independent (they are the three versors of R^3); in the second one the first three rows are linearly independent. The matrices (13.22) are a special case of echelon matrices, which are characterized by the following properties:

(i) the rows with not all elements zero have 1 as first non-zero component, called the pivot element, or simply pivot;

(ii) the other elements of the column of a pivot are zero;

(iii) pivots form a "little scale" from the left to the right: the pivot of a lower row is to the right of the pivot of an upper row;

(iv) the rows with all elements zero (if they exist) lie under the other rows, so in the lower part of the matrix.

The matrices (13.22) are echelon matrices, and so is the matrix

    [ 1  0  0  0  0 ]
    [ 0  1  0  0  0 ]
    [ 0  0  1  3  0 ]
    [ 0  0  0  0  0 ]

in which the pivots are the leading 1s of the non-zero rows. Note that a square matrix is an echelon matrix when it is diagonal, possibly followed by rows of only zeros; for example:

    [ 1  0  0 ]
    [ 0  1  0 ]
    [ 0  0  0 ]

Clearly, the non-zero rows (that is, the rows with at least one non-zero element) are linearly independent. The rank of an echelon matrix is, therefore, obvious.

Lemma 583 The rank of an echelon matrix is equal to the number of non-zero rows.

There exist some simple operations that permit to transform any matrix A into an echelon matrix. Such operations, called elementary operations (by row),⁶ are:

(i) multiplying any row by a non-zero scalar λ (denoted by E1);

(ii) adding to any row a multiple of any other row (denoted by E2);

(iii) interchanging any two rows (denoted by E3).

The three operations amount to multiplying, on the left, the matrix A ∈ M(m, n) by suitable m × m square matrices, called elementary. Specifically:

(i) multiplying the s-th row of A by a scalar λ amounts to multiplying, on the left, A by the elementary matrix Ps(λ) that coincides with the identity matrix Im except that, in the place (s, s), we have λ instead of 1;

(ii) adding to the r-th row of A the s-th row multiplied by λ amounts to multiplying, on the left, A by the elementary matrix Srs(λ) that coincides with the identity matrix Im except that, in the place (r, s), we have λ instead of 0;

(iii) interchanging the r-th row and the s-th row of A amounts to multiplying, on the left, A by the elementary matrix Trs that coincides with the identity matrix Im except that the r-th row and the s-th row have been interchanged.

Example 584 Let

    A = [  3  −2   4   1 ]
        [ −1   0   6   9 ]
        [ −5   3  −7  −4 ]

(i) Multiplying A by

    P2(λ) = [ 1  0  0 ]
            [ 0  λ  0 ]
            [ 0  0  1 ]

on the left, we get

    P2(λ) A = [ 1  0  0 ] [  3  −2   4   1 ]   [  3  −2   4   1 ]
              [ 0  λ  0 ] [ −1   0   6   9 ] = [ −λ   0  6λ  9λ ]
              [ 0  0  1 ] [ −5   3  −7  −4 ]   [ −5   3  −7  −4 ]

in which the second row has been multiplied by λ.

(ii) Multiplying A by

    S12(λ) = [ 1  λ  0 ]
             [ 0  1  0 ]
             [ 0  0  1 ]

on the left, we get

    S12(λ) A = [ 1  λ  0 ] [  3  −2   4   1 ]   [ 3 − λ  −2  4 + 6λ  1 + 9λ ]
               [ 0  1  0 ] [ −1   0   6   9 ] = [  −1     0    6       9    ]
               [ 0  0  1 ] [ −5   3  −7  −4 ]   [  −5     3   −7      −4    ]

in which the second row multiplied by λ has been added to the first one.

(iii) Multiplying A by

    T12 = [ 0  1  0 ]
          [ 1  0  0 ]
          [ 0  0  1 ]

on the left, we get

    T12 A = [ 0  1  0 ] [  3  −2   4   1 ]   [ −1   0   6   9 ]
            [ 1  0  0 ] [ −1   0   6   9 ] = [  3  −2   4   1 ]
            [ 0  0  1 ] [ −5   3  −7  −4 ]   [ −5   3  −7  −4 ]

in which the first two rows have been interchanged. N

⁶ Though we could define analogous elementary operations by column, we prefer not to do it and to refer always to the rows, in order to avoid confusion and errors in computations. Choosing the rows over the columns does not change the results.

The next result, the proof of which we omit, shows the uniqueness of the echelon matrix at which we arrive via elementary operations.

Lemma 585 Each matrix A ∈ M(m, n) is transformed, via elementary operations, into a unique echelon matrix, denoted by Ā ∈ M(m, n).

Naturally, different matrices can be transformed into the same echelon matrix. The sequence of elementary operations that transforms a matrix A into the echelon matrix Ā is called the Gaussian elimination procedure.

Example 586 Let
$$A=\begin{bmatrix} 3 & 2 & 4 & 1 \\ -1 & 0 & 6 & 9 \\ 5 & 3 & 7 & 4 \end{bmatrix}$$
We proceed as follows (the sign $\rightsquigarrow$ means that we pass from a matrix to the next one via an elementary operation):
$$A \;\rightsquigarrow_{(1)}\; \begin{bmatrix} 1 & \frac{2}{3} & \frac{4}{3} & \frac{1}{3} \\ -1 & 0 & 6 & 9 \\ 5 & 3 & 7 & 4 \end{bmatrix} \;\rightsquigarrow_{(2)}\; \begin{bmatrix} 1 & \frac{2}{3} & \frac{4}{3} & \frac{1}{3} \\ 0 & \frac{2}{3} & \frac{22}{3} & \frac{28}{3} \\ 5 & 3 & 7 & 4 \end{bmatrix} \;\rightsquigarrow_{(3)}\; \begin{bmatrix} 1 & \frac{2}{3} & \frac{4}{3} & \frac{1}{3} \\ 0 & \frac{2}{3} & \frac{22}{3} & \frac{28}{3} \\ 0 & -\frac{1}{3} & \frac{1}{3} & \frac{7}{3} \end{bmatrix}$$
$$\rightsquigarrow_{(4)}\; \begin{bmatrix} 1 & 0 & -6 & -9 \\ 0 & \frac{2}{3} & \frac{22}{3} & \frac{28}{3} \\ 0 & -\frac{1}{3} & \frac{1}{3} & \frac{7}{3} \end{bmatrix} \;\rightsquigarrow_{(5)}\; \begin{bmatrix} 1 & 0 & -6 & -9 \\ 0 & \frac{2}{3} & \frac{22}{3} & \frac{28}{3} \\ 0 & 0 & 4 & 7 \end{bmatrix} \;\rightsquigarrow_{(6)}\; \begin{bmatrix} 1 & 0 & 0 & \frac{3}{2} \\ 0 & \frac{2}{3} & \frac{22}{3} & \frac{28}{3} \\ 0 & 0 & 4 & 7 \end{bmatrix}$$
$$\rightsquigarrow_{(7)}\; \begin{bmatrix} 1 & 0 & 0 & \frac{3}{2} \\ 0 & \frac{2}{3} & 0 & -\frac{7}{2} \\ 0 & 0 & 4 & 7 \end{bmatrix} \;\rightsquigarrow_{(8)}\; \begin{bmatrix} 1 & 0 & 0 & \frac{3}{2} \\ 0 & 1 & 0 & -\frac{21}{4} \\ 0 & 0 & 1 & \frac{7}{4} \end{bmatrix}$$
where: (1) multiplication of the first row by 1/3; (2) addition of the first row to the second one; (3) addition of -5 times the first row to the third one; (4) subtraction of the second row from the first one; (5) addition of the second row multiplied by 1/2 to the third one; (6) addition of the third row multiplied by 3/2 to the first one; (7) subtraction of the third row multiplied by 22/12 from the second one; (8) multiplication of the second row by 3/2 and of the third one by 1/4. Finally, we get
$$A^{E}=\begin{bmatrix} 1 & 0 & 0 & \frac{3}{2} \\ 0 & 1 & 0 & -\frac{21}{4} \\ 0 & 0 & 1 & \frac{7}{4} \end{bmatrix}$$
N
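The steps of the example are easy to mechanize. The following sketch of ours (Python with numpy assumed; the function name echelon is our own) reduces a matrix to the echelon form characterized by properties (i)-(iv) via the operations E1-E3, and reproduces the matrix $A^E$ just found:

```python
import numpy as np

def echelon(A, tol=1e-12):
    """Reduce A to echelon form (unit pivots, zeros above and below each pivot)."""
    E = A.astype(float).copy()
    m, n = E.shape
    row = 0
    for col in range(n):
        # look for a row at or below `row` with a non-zero entry in this column
        pivot = next((r for r in range(row, m) if abs(E[r, col]) > tol), None)
        if pivot is None:
            continue
        E[[row, pivot]] = E[[pivot, row]]   # E3: interchange rows
        E[row] /= E[row, col]               # E1: make the pivot equal to 1
        for r in range(m):
            if r != row:
                E[r] -= E[r, col] * E[row]  # E2: annihilate the rest of the column
        row += 1
        if row == m:
            break
    return E

A = np.array([[3., 2., 4., 1.],
              [-1., 0., 6., 9.],
              [5., 3., 7., 4.]])
print(echelon(A))   # [[1, 0, 0, 1.5], [0, 1, 0, -5.25], [0, 0, 1, 1.75]]
```

In line with Lemma 583 and Proposition 588, the rank of A can then be read off the output as the number of its non-zero rows.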
Example 587 If A is square of order n, the echelon matrix $A^E$ that the Gaussian elimination procedure yields is square of order n and upper triangular, with diagonal composed only of 1s and 0s. N
Going back to the calculation of the rank, which was the motivation of this section, Proposition 582 shows that the elementary operations by row do not alter the rank of A because the elementary matrices are square matrices of full rank. We therefore have:

Proposition 588 For each matrix A we have $\rho(A) = \rho(A^E)$.

To calculate the rank of a matrix one can, therefore, apply Gaussian elimination to obtain an echelon matrix of equal rank, whose rank is evident.

Example 589 In the last example $\rho(A^E) = 3$ because all three rows are non-zero. By Proposition 588, we have $\rho(A) = 3$, so the matrix A has full rank. N

13.5 Invertible operators


13.5.1 Invertibility
An injective operator is usually called invertible. An invertible operator $T \in L(\mathbb{R}^n)$ has, indeed, an inverse operator $T^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ whose domain is the entire space $\mathbb{R}^n$ because T, being injective, is also surjective by Corollary 575.7

7 Recall that $L(\mathbb{R}^n)$ is the space of linear operators $T : \mathbb{R}^n \to \mathbb{R}^n$.
Lemma 590 Let $T \in L(\mathbb{R}^n)$ be invertible. Then $T^{-1} \in L(\mathbb{R}^n)$.

This lemma, whose proof is left to the reader, shows that the inverse operator $T^{-1}$ is a linear operator too, that is, $T^{-1} \in L(\mathbb{R}^n)$. Moreover, it is easy to verify that
$$T^{-1}T = TT^{-1} = I \qquad (13.23)$$
where I is the identity operator.

Example 591 (i) The identity operator $I : \mathbb{R}^n \to \mathbb{R}^n$ is clearly invertible, with $I^{-1} = I$.

(ii) Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $T(x) = Ax$ for every $x \in \mathbb{R}^2$, where
$$A=\begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}$$
The operator T is invertible, as the reader can verify, with $T^{-1}(x) = Bx$ for every $x \in \mathbb{R}^2$ and
$$B=\begin{bmatrix} 1 & 0 \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}$$
Finding an inverse operator is, in general, not an easy task, nor is it just a matter of guessing: later in the chapter we will discuss a procedure for computing B. N
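As a quick numerical check of Example 591 (a sketch of ours, assuming numpy), both products of the two matrices give the identity:

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 2.]])
B = np.array([[1., 0.],
              [-0.5, 0.5]])

# B represents T^{-1}: both products equal the identity matrix
print(A @ B)              # [[1. 0.] [0. 1.]]
print(B @ A)              # [[1. 0.] [0. 1.]]
print(np.linalg.inv(A))   # numpy recovers B directly
```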

In the last section we saw a first characterization of invertibility through the notions of rank and nullity (Corollary 575). We now give another characterization of invertibility.

Proposition 592 An operator $T \in L(\mathbb{R}^n)$ is invertible if and only if there exist $S, R \in L(\mathbb{R}^n)$ such that
$$TS = RT = I \qquad (13.24)$$
In this case, S and R are unique and we have $S = R = T^{-1}$.

Proof "Only if". Let T be invertible; (13.23) implies that (13.24) holds with $S = R = T^{-1}$.
"If". Assume that there exist $S, R \in L(\mathbb{R}^n)$ such that (13.24) holds. Let $x, y \in \mathbb{R}^n$ with $x \neq y$. We have $T(x) \neq T(y)$ and, therefore, T is injective. Indeed, from $T(x) = T(y)$ it would follow, by (13.24),
$$x = R(T(x)) = R(T(y)) = y$$
which contradicts $x \neq y$. It remains to show that T is surjective. Let $x \in \mathbb{R}^n$ and set $y = S(x)$. By (13.24), we have
$$T(y) = T(S(x)) = x$$
and, therefore, $x \in \operatorname{Im} T$. This implies that $\mathbb{R}^n = \operatorname{Im} T$, as desired. In conclusion, T is invertible.
Using (13.23) and (13.24), we have
$$S(x) = T^{-1}(T(S(x))) = T^{-1}((TS)(x)) = T^{-1}(x)$$
$$R(x) = R(T(T^{-1}(x))) = (RT)(T^{-1}(x)) = T^{-1}(x)$$
for every $x \in \mathbb{R}^n$, and so $S = R = T^{-1}$.

In (13.24) we need both $TS = I$ and $RT = I$; otherwise, T might not be invertible.


13.5.2 Inverse matrix


The square matrix A associated with an invertible linear operator $T : \mathbb{R}^n \to \mathbb{R}^n$ is said to be invertible. The matrix associated with the inverse operator $T^{-1}$ is called the inverse matrix of A and is denoted by $A^{-1}$. Going back to Example 591, we have
$$A=\begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix} \qquad\text{and}\qquad A^{-1}=\begin{bmatrix} 1 & 0 \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}$$
From (13.23) we have
$$A^{-1}A = AA^{-1} = I$$
More generally, in view of Corollary 579, Theorem 580 and Proposition 592, we have the following characterization.

Corollary 593 For a square matrix A of order n the following properties are equivalent:

(i) A is invertible;

(ii) the columns of A are linearly independent;

(iii) the rows of A are linearly independent;

(iv) $\rho(A) = n$;

(v) there exist two square matrices B and C of order n such that $AB = CA = I$; such matrices are unique, with $B = C = A^{-1}$.

From this corollary one deduces a noteworthy property of inverse matrices.

Proposition 594 If the square matrices A and B of order n are invertible, then their product is invertible and
$$(AB)^{-1} = B^{-1}A^{-1}$$

Proof Let A and B be of order n and invertible. We have $\rho(A) = \rho(B) = n$, so that $\rho(AB) = n$ by Proposition 582. By Corollary 593, the matrix AB is invertible. Recall from (6.11) of Section 6.4 that, for the composition of invertible functions f and g, one has $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$. In particular, this holds for linear operators, that is, $(ST)^{-1} = T^{-1}S^{-1}$, so Proposition 569 implies $(AB)^{-1} = B^{-1}A^{-1}$.

So far so good. But, operationally, how do we compute the inverse of an (invertible) matrix A, i.e., how do we find the elements of the inverse matrix $A^{-1}$? To answer this important question, we must first introduce determinants.

13.6 Determinants
13.6.1 Definition

A matrix contained in a matrix $A \in M(m,n)$ is called a submatrix of A. It can be thought of as obtained from A by deleting some rows and/or columns. In particular, we denote by $A_{ij}$ the $(m-1) \times (n-1)$ submatrix obtained from A by deleting row i and column j.
Example 595 Let
$$A=\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}=\begin{bmatrix} 2 & 1 & 4 \\ 3 & 1 & 0 \\ 1 & 6 & 3 \end{bmatrix}$$
We have, for example,
$$A_{12}=\begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}=\begin{bmatrix} 3 & 0 \\ 1 & 3 \end{bmatrix}; \qquad A_{32}=\begin{bmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{bmatrix}=\begin{bmatrix} 2 & 4 \\ 3 & 0 \end{bmatrix}$$
$$A_{22}=\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix}=\begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}; \qquad A_{31}=\begin{bmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{bmatrix}=\begin{bmatrix} 1 & 4 \\ 1 & 0 \end{bmatrix}$$
N

Through submatrices, we can define in a recursive way the determinants of square matrices (this notion is defined only for them). To ease notation we denote by M(n), in place of M(n,n), the space of the square matrices of order n.

Definition 596 The determinant is the function $\det : M(n) \to \mathbb{R}$ defined, for every $A \in M(n)$, by:

(i) $\det A = a_{11}$ if $n = 1$, $A = [a_{11}]$;

(ii) $\det A = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A_{1j}$ if $n > 1$, $A = (a_{ij})$.

Example 597 If n = 2, the determinant of the matrix
$$A=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
is
$$\det A = (-1)^{1+1} a_{11} \det([a_{22}]) + (-1)^{1+2} a_{12} \det([a_{21}]) = a_{11}a_{22} - a_{12}a_{21}$$
For example, if
$$A=\begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}$$
we have $\det A = 2 \cdot 3 - 4 \cdot 1 = 2$. N

Example 598 If n = 3, the determinant of the matrix
$$A=\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
is given by
$$\begin{aligned}
\det A &= (-1)^{1+1} a_{11} \det A_{11} + (-1)^{1+2} a_{12} \det A_{12} + (-1)^{1+3} a_{13} \det A_{13} \\
&= a_{11}\det A_{11} - a_{12}\det A_{12} + a_{13}\det A_{13} \\
&= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\
&= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}
\end{aligned}$$
For example, suppose we want to calculate the determinant of the matrix
$$A=\begin{bmatrix} 2 & 1 & 4 \\ 3 & 1 & 0 \\ 1 & 6 & 3 \end{bmatrix}$$
Let us first calculate the determinants of the three submatrices $A_{11}$, $A_{12}$ and $A_{13}$. We have
$$\det A_{11} = 1 \cdot 3 - 0 \cdot 6 = 3, \qquad \det A_{12} = 3 \cdot 3 - 0 \cdot 1 = 9, \qquad \det A_{13} = 3 \cdot 6 - 1 \cdot 1 = 17$$
and, therefore,
$$\det A = 2\det A_{11} - 1\det A_{12} + 4\det A_{13} = 2 \cdot 3 - 1 \cdot 9 + 4 \cdot 17 = 65$$
N
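Definition 596 translates directly into a recursive procedure. The following sketch of ours (numpy assumed; the function name det_laplace is our own) computes the determinant by expanding along the first row, and checks it on the matrix above:

```python
import numpy as np

def det_laplace(A):
    """Determinant via the recursive expansion of Definition 596 (first row)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        A1j = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 1 and column j
        total += (-1) ** j * A[0, j] * det_laplace(A1j)      # (-1)**j is (-1)^(1+j) for 0-based j
    return total

A = np.array([[2., 1., 4.],
              [3., 1., 0.],
              [1., 6., 3.]])
print(det_laplace(A))     # 65.0
print(np.linalg.det(A))   # same value, up to rounding
```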
Example 599 For a lower triangular matrix A we have
$$\det A = a_{11}a_{22} \cdots a_{nn}$$
that is, its determinant is simply the product of the elements of the main diagonal. Indeed, expanding along the first row $(a_{11}, 0, \ldots, 0)$, all the summands other than the first one are zero because they contain a zero element of the first row; iterating the argument yields the claim.
Since $\det A = \det A^T$ (Proposition 603), a similar result holds for upper triangular matrices, and so also for diagonal ones. N
Example 600 If the matrix A has all the elements of its first row equal to zero except for the first one, which is equal to 1, then
$$\det \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \det \begin{bmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{n2} & \cdots & a_{nn} \end{bmatrix}$$
That is, the determinant coincides with the determinant of the submatrix $A_{11}$. Indeed, in
$$\det A = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A_{1j}$$
all the summands except for the first one are zero. More generally, for any scalar k we have
$$\det \begin{bmatrix} k & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = k \det \begin{bmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{n2} & \cdots & a_{nn} \end{bmatrix}$$
Similar properties hold also for the columns. N
The determinant of a square matrix can, therefore, be calculated through a well specified procedure – an algorithm – based on its submatrices. There exist various techniques that simplify the calculation of determinants (we will see some of them shortly) but, for our purposes, it is important to know that determinants can be calculated through algorithms.
13.6.2 Geometry
Geometrically, the determinant of a square matrix measures (with a sign!) the "space taken up" by its column vectors. Let us try to explain this, at least in the simplest case. So, let A be the $2 \times 2$ matrix
$$A=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
in which we assume that $a_{11} > a_{12} > 0$ and $a_{22} > a_{21} > 0$ (the other possibilities can be studied similarly, as readers can check).

[Figure: the parallelogram OBGC spanned by the column vectors $(a_{11}, a_{21})$ and $(a_{12}, a_{22})$ of A, together with the rectangle ODEF and the triangles ODB, OCF and BEC used in the computation below.]

The determinant of A is the area of the parallelogram OBGC (see the figure), i.e., twice the area of the triangle OBC that is determined by the two column vectors of A. The area of the triangle OBC can be easily calculated by subtracting from the area of the rectangle ODEF the areas of the three triangles ODB, OCF, and BEC. Since
$$\text{area } ODEF = a_{11}a_{22}; \qquad \text{area } ODB = \frac{a_{11}a_{21}}{2}; \qquad \text{area } OCF = \frac{a_{22}a_{12}}{2}$$
$$\text{area } BEC = \frac{(a_{11}-a_{12})(a_{22}-a_{21})}{2} = \frac{a_{11}a_{22} - a_{11}a_{21} - a_{12}a_{22} + a_{12}a_{21}}{2}$$
one gets
$$\text{area } OBC = a_{11}a_{22} - \frac{a_{11}a_{21} + a_{22}a_{12} + a_{11}a_{22} - a_{11}a_{21} - a_{12}a_{22} + a_{12}a_{21}}{2} = \frac{a_{11}a_{22} - a_{12}a_{21}}{2}$$
Therefore,
$$\det A = \text{area } OBGC = a_{11}a_{22} - a_{12}a_{21}$$
The reader will immediately realize that:

(i) if we exchange the two columns, the determinant changes only its sign (because the parallelogram is traversed in the opposite direction);

(ii) if the two vectors are proportional, that is, linearly dependent, the determinant is zero (because the parallelogram collapses into a segment).

For example, let
$$A=\begin{bmatrix} 6 & 4 \\ 2 & 8 \end{bmatrix}$$
One has
$$\text{area } ODEF = 6 \cdot 8 = 48; \qquad \text{area } ODB = \frac{6 \cdot 2}{2} = 6$$
$$\text{area } OCF = \frac{8 \cdot 4}{2} = 16; \qquad \text{area } BEC = \frac{(6-4)(8-2)}{2} = 6$$
and so
$$\text{area } OBC = 48 - 6 - 16 - 6 = 20$$
We conclude that
$$\det A = \text{area } OBGC = 40$$
For $3 \times 3$ matrices, the determinant is the volume (with sign) of the parallelepiped determined by the three column vectors.

13.6.3 Combinatorics
A permutation of the set of numbers $N = \{1, 2, \ldots, n\}$ is any bijection $\sigma : N \to N$ (Appendix B.2). There are $n!$ possible permutations. For example, the permutation
$$\{2, 1, 3, 4, \ldots, n\} \qquad (13.25)$$
interchanges the first two elements of N and leaves the others unchanged. So, it is represented by the function $\sigma : N \to N$ defined by
$$\sigma(k) = \begin{cases} 2 & \text{if } k = 1 \\ 1 & \text{if } k = 2 \\ k & \text{otherwise} \end{cases}$$

Let $\Pi$ be the set of all the permutations of N. We have an inversion in a permutation $\sigma \in \Pi$ if, for some $k, k' \in N$, we have $k < k'$ and $\sigma(k) > \sigma(k')$. We say that the permutation $\sigma$ is odd (resp., even) if it features an odd (resp., even) number of inversions. The function $\operatorname{sgn} : \Pi \to \{-1, 1\}$ defined by
$$\operatorname{sgn} \sigma = \begin{cases} +1 & \text{if } \sigma \text{ is even} \\ -1 & \text{if } \sigma \text{ is odd} \end{cases}$$
is called parity. In particular, an even permutation has parity +1, while an odd permutation has parity -1.

Example 601 (i) The permutation (13.25) is odd because there is only one inversion, with $k = 1$ and $k' = 2$. So, its parity is -1. (ii) The identity permutation $\sigma(k) = k$ has, clearly, no inversions. So, it is an even permutation, with parity +1. N
Let us go back to determinants. Consider a $2 \times 2$ matrix A, and set $N = \{1, 2\}$. In this case $\Pi$ consists of only two permutations $\sigma$ and $\sigma'$, defined by
$$\sigma(k) = \begin{cases} 1 & \text{if } k = 1 \\ 2 & \text{if } k = 2 \end{cases} \qquad\text{and}\qquad \sigma'(k) = \begin{cases} 2 & \text{if } k = 1 \\ 1 & \text{if } k = 2 \end{cases}$$
In particular, we have $\operatorname{sgn} \sigma = +1$ and $\operatorname{sgn} \sigma' = -1$. Remarkably, we then have
$$\det A = (\operatorname{sgn} \sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} + (\operatorname{sgn} \sigma')\, a_{1\sigma'(1)} a_{2\sigma'(2)}$$
Indeed:
$$(\operatorname{sgn} \sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} + (\operatorname{sgn} \sigma')\, a_{1\sigma'(1)} a_{2\sigma'(2)} = a_{11}a_{22} - a_{12}a_{21}$$
The next result shows that this remarkable fact is true in general, thus providing an important combinatorial characterization of determinants (we omit the proof).

Theorem 602 We have
$$\det A = \sum_{\sigma \in \Pi} \operatorname{sgn} \sigma \prod_{i=1}^{n} a_{i\sigma(i)} \qquad (13.26)$$
for every square matrix $A = (a_{ij})$ of order n.

Note that each term in the sum (13.26) contains exactly one element of each row and one element of each column. This will be crucial in the proofs of the next section.
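Formula (13.26) can also be tested directly, at the cost of enumerating all $n!$ permutations. A sketch of ours, assuming numpy and the standard library's itertools:

```python
import numpy as np
from itertools import permutations

def sgn(p):
    """Parity of a permutation, counted via its inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
                     if p[i] > p[j])
    return 1 if inversions % 2 == 0 else -1

def det_perm(A):
    """Determinant via formula (13.26): a sum over all n! permutations."""
    n = A.shape[0]
    return sum(sgn(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.array([[2., 1., 4.],
              [3., 1., 0.],
              [1., 6., 3.]])
print(det_perm(A))   # 65.0, as in Example 598
```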

13.6.4 Properties
The next proposition collects the main properties of determinants, which are also useful for their computation. In the statement, "line" stands for either row or column: the properties hold, indeed, symmetrically for both the rows and the columns of the matrix. "Parallel lines" means two rows or two columns.

Proposition 603 Let A and B be two square matrices of the same order. Then:

(i) If a line of A is zero, then det A = 0.

(ii) If B is obtained from A by multiplying a line by a scalar k, then $\det B = k \det A$.

(iii) If B is obtained from A by interchanging two parallel lines, then $\det B = -\det A$.

(iv) If two parallel lines of A are equal, then det A = 0.

(v) If a line of A is the sum of two vectors b and c, then det A is the sum of the determinants of the two matrices that are obtained by taking that line equal first to b and then to c.

(vi) If B is obtained from A by adding to a line a multiple of a parallel line, then $\det B = \det A$.

(vii) $\det A = \det A^T$.
Proof The proof relies on the combinatorial characterization of the determinant established in Theorem 602, in particular on the observation that each term that appears in the determinant contains exactly one element of each row and one element of each column. In the proof we only consider rows (similar arguments hold for the columns).
(i) In all the products that constitute the determinant, there appears one element of each row: if a row is zero, all the products are then zero. (ii) For the same reason, all the products turn out to be multiplied by k.
(iii) By exchanging two rows, all the even permutations become odd and vice versa. Therefore, the determinant changes sign.
(iv) Let A be the matrix that has rows i and j equal and let $A^{ij}$ be the matrix A with such rows interchanged. By (iii), we have $\det A^{ij} = -\det A$. Nevertheless, since the two interchanged rows are equal, we have $A = A^{ij}$. So, $\det A^{ij} = \det A$. This is possible if and only if $\det A^{ij} = \det A = 0$.
(v) Suppose
$$A = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ a^r \\ \vdots \\ a^n \end{bmatrix} = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ b + c \\ \vdots \\ a^n \end{bmatrix}$$
and let
$$A_b = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ b \\ \vdots \\ a^n \end{bmatrix} \qquad\text{and}\qquad A_c = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ c \\ \vdots \\ a^n \end{bmatrix}$$
be the two matrices obtained by taking b and c, respectively, as the r-th row. Then
$$\det A = \sum_{\sigma \in \Pi} \operatorname{sgn} \sigma \prod_{i=1}^{n} a_{i\sigma(i)} = \sum_{\sigma \in \Pi} \operatorname{sgn} \sigma \left( \prod_{i \neq r} a_{i\sigma(i)} \right) (b + c)_{\sigma(r)}$$
$$= \sum_{\sigma \in \Pi} \operatorname{sgn} \sigma \left( \prod_{i \neq r} a_{i\sigma(i)} \right) b_{\sigma(r)} + \sum_{\sigma \in \Pi} \operatorname{sgn} \sigma \left( \prod_{i \neq r} a_{i\sigma(i)} \right) c_{\sigma(r)} = \det A_b + \det A_c$$
which completes the proof of this point.


(vi) Let
$$A = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ a^n \end{bmatrix}$$
The matrix obtained from A by adding, for example, k times the first row to the second one, is
$$B = \begin{bmatrix} a^1 \\ a^2 + ka^1 \\ \vdots \\ a^n \end{bmatrix}$$
Moreover, let
$$C = \begin{bmatrix} a^1 \\ ka^1 \\ \vdots \\ a^n \end{bmatrix} \qquad\text{and}\qquad D = \begin{bmatrix} a^1 \\ a^1 \\ \vdots \\ a^n \end{bmatrix}$$
By (v), $\det B = \det A + \det C$. On the other hand, by (ii) we have $\det C = k \det D$. But, since D has two equal rows, by (iv) we have $\det D = 0$. We conclude that $\det B = \det A$.
(vii) Transposition alters neither the $n!$ products in the sum (13.26) nor their parities.

An important operational consequence of this proposition is that we can now say how the elementary operations E1-E3, which characterize the Gaussian elimination procedure, modify the determinant of A. Specifically:

E1: if B is obtained from A by multiplying a row of the matrix A by a constant $\lambda \neq 0$, then $\det B = \lambda \det A$ by Proposition 603-(ii);

E2: if B is obtained from A by adding to a row of A a multiple of another row, then $\det B = \det A$ by Proposition 603-(vi);

E3: if B is obtained from A by exchanging two rows of A, then $\det B = -\det A$ by Proposition 603-(iii).

In particular, if the matrix B is obtained from A via elementary operations, we have
$$\det A \neq 0 \iff \det B \neq 0 \qquad (13.27)$$
or, equivalently, det A = 0 if and only if det B = 0. This observation leads to the following important characterization of square matrices of full rank.

Proposition 604 A square matrix A has full rank if and only if $\det A \neq 0$.

Proof "Only if". If A has full rank, its rows are linearly independent (Corollary 593). By Lemma 585 and Proposition 588, A can then be transformed via elementary operations into a unique echelon square matrix of full rank, that is, the identity matrix $I_n$. Since $\det I_n = 1 \neq 0$, by (13.27) we conclude that $\det A \neq 0$.
"If". Let $\det A \neq 0$. Suppose, by contradiction, that A does not have full rank. Then, its rows are not linearly independent (Corollary 593), so at least one of them is a linear combination of the others. Such a row can be reduced to zero by repeatedly adding to it carefully chosen multiples of the other rows. Denote by B the matrix so transformed. By Proposition 603-(i), $\det B = 0$, so by (13.27) we have $\det A = 0$, a contradiction. We conclude that A has full rank.

Corollary 593 and the previous result jointly imply the following important result.

Corollary 605 For a square matrix A the following conditions are equivalent:

(i) the rows are linearly independent;

(ii) the columns are linearly independent;

(iii) $\det A \neq 0$.

Determinants behave well with respect to the product, as the next result shows. It is a key property of determinants.

Theorem 606 (Binet) If A and B are two square matrices of the same order n, then $\det AB = \det A \det B$.

So, determinants commute: $\det AB = \det BA$. This is a first interesting consequence of Binet's Theorem. Since $I = A^{-1}A$, another interesting consequence of this result is that
$$\det A^{-1} = \frac{1}{\det A}$$
when A is invertible. Indeed, $1 = \det I = \det(A^{-1}A) = \det A^{-1} \det A$.

Proof If (at least) one of the two matrices has linearly dependent rows or columns, then the statement is trivially true: since the columns of AB are linear combinations of the columns of A, and the rows of AB are linear combinations of the rows of B, in both cases AB also has linearly dependent rows or columns, so $\det AB = 0 = \det A \det B$.
Suppose, therefore, that both A and B have full rank. Suppose first that the matrix A is diagonal. If so, $\det A = a_{11}a_{22} \cdots a_{nn}$. Moreover, we have
$$AB = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} & a_{11}b_{12} & \cdots & a_{11}b_{1n} \\ a_{22}b_{21} & a_{22}b_{22} & \cdots & a_{22}b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{nn}b_{n1} & a_{nn}b_{n2} & \cdots & a_{nn}b_{nn} \end{bmatrix}$$
By applying Proposition 603-(ii) to each row,
$$\det AB = a_{11}a_{22} \cdots a_{nn} \det B = \det A \det B$$
proving the result in this case.


If A is not diagonal, we can transform it into a diagonal matrix by suitably applying the elementary operations E2 and E3. As we have seen, such operations are equivalent to multiplying A on the left by square matrices $S_{rs}(\lambda)$ and $T_{rs}$, respectively. Let us agree to perform first the transformations T and then the transformations $S(\lambda)$. Let us suppose, moreover, that the diagonalization requires h transformations $S(\lambda)$ and k transformations T. If D is the diagonal matrix obtained in this way, we then have
$$D = \underbrace{S(\lambda) S(\lambda) \cdots S(\lambda)}_{h \text{ times}} \underbrace{T\, T \cdots T}_{k \text{ times}} A$$
Since D is diagonal, we know that
$$\det DB = \det D \det B$$
Since D is obtained from A through h elementary operations that do not modify the determinant and k elementary operations that only change its sign, we have $\det D = (-1)^k \det A$. Therefore,
$$\det DB = (-1)^k \det A \det B \qquad (13.28)$$
Analogously, since the product of matrices is associative, we have
$$DB = (S(\lambda) \cdots S(\lambda) T \cdots T\, A) B = (S(\lambda) \cdots S(\lambda) T \cdots T)(AB)$$
Therefore, DB is obtained from AB via h elementary operations that do not modify the determinant and k elementary operations that only change its sign. So, as before, we have
$$\det DB = (-1)^k \det AB \qquad (13.29)$$
Putting together (13.28) and (13.29), we get $\det AB = \det A \det B$, as desired.

13.6.5 Laplace’s Theorem


Let A be a square matrix of order n. The algebraic complement (or cofactor) of $a_{ij}$, denoted by $a^*_{ij}$, is the number
$$a^*_{ij} = (-1)^{i+j} \det A_{ij}$$
The cofactor matrix (or matrix of algebraic complements) of A, denoted by $A^*$, is the matrix whose elements are the algebraic complements of the elements of A, that is,
$$A^* = \left( a^*_{ij} \right)$$
with $i, j = 1, 2, \ldots, n$. The transpose $(A^*)^T$ is sometimes called the (classical) adjoint matrix.

Example 607 Let
$$A = \begin{bmatrix} 1 & -3 & 0 \\ -5 & -1 & -2 \\ -3 & -6 & 4 \end{bmatrix}$$
For $a_{11} = 1$, we have
$$A_{11} = \begin{bmatrix} -1 & -2 \\ -6 & 4 \end{bmatrix} \qquad\text{and}\qquad \det A_{11} = -16$$
Therefore, $a^*_{11} = (-1)^{1+1}(-16) = -16$.
For $a_{12} = -3$, we have
$$A_{12} = \begin{bmatrix} -5 & -2 \\ -3 & 4 \end{bmatrix} \qquad\text{and}\qquad \det A_{12} = -26$$
Therefore, $a^*_{12} = (-1)^{1+2}(-26) = 26$.
For $a_{13} = 0$, we have
$$A_{13} = \begin{bmatrix} -5 & -1 \\ -3 & -6 \end{bmatrix} \qquad\text{and}\qquad \det A_{13} = 27$$
Therefore, $a^*_{13} = (-1)^{1+3} \cdot 27 = 27$.
Similarly,
$$a^*_{21} = (-1)^{2+1}(-12) = 12; \qquad a^*_{22} = (-1)^{2+2} \cdot 4 = 4; \qquad a^*_{23} = (-1)^{2+3}(-15) = 15$$
$$a^*_{31} = (-1)^{3+1} \cdot 6 = 6; \qquad a^*_{32} = (-1)^{3+2}(-2) = 2; \qquad a^*_{33} = (-1)^{3+3}(-16) = -16$$
We conclude that
$$A^* = \begin{bmatrix} -16 & 26 & 27 \\ 12 & 4 & 15 \\ 6 & 2 & -16 \end{bmatrix}$$
N
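The computation of cofactor matrices is mechanical and easy to automate. A sketch of ours (numpy assumed; cofactor_matrix is our own name) that reproduces $A^*$ for the matrix of Example 607:

```python
import numpy as np

def cofactor_matrix(A):
    """Matrix of algebraic complements a*_ij = (-1)^(i+j) det A_ij."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            Aij = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(Aij)
    return C

A = np.array([[1., -3., 0.],
              [-5., -1., -2.],
              [-3., -6., 4.]])
print(np.round(cofactor_matrix(A)))
# [[-16. 26. 27.]
#  [ 12.  4. 15.]
#  [  6.  2. -16.]]
```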

Using the notion of algebraic complement, the definition of the determinant of a square matrix (Definition 596) can be viewed as the sum of the products of the elements of the first row by their algebraic complements, that is,
$$\det A = \sum_{j=1}^{n} a_{1j} a^*_{1j}$$
The next result shows that, actually, there is nothing special about the first row: the determinant can be computed using any row or column of the matrix. The choice of which one to use is then just a matter of analytical convenience.

Proposition 608 The determinant of a square matrix A is equal to the sum of the products of the elements of any line (row or column) by their algebraic complements.

In symbols, choosing the row i,
$$\det A = \sum_{j=1}^{n} a_{ij} a^*_{ij}$$
or, choosing the column j,
$$\det A = \sum_{i=1}^{n} a_{ij} a^*_{ij}$$
Proof For the first row, the result is just a rephrasing of the definition of determinant. Let us verify it for the i-th row. By points (ii) and (v) of Proposition 603 we can rewrite det A in the following way:
$$\det A = \det \begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix} \qquad (13.30)$$
$$= a_{i1} \det \begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ 1 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix} + \cdots + a_{ij} \det \begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix} + \cdots + a_{in} \det \begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 1 \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix}$$
Let us calculate the determinant of the matrix relative to the term (i, j):
$$\det \begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix} \qquad (13.31)$$
Note that, to be able to apply the definition of the determinant and to use the notion of algebraic complement, it is necessary to bring the i-th row to the top and the j-th column to the left, i.e., to transform the matrix (13.31) into a matrix $\tilde{A}$ that has $(1, 0, \ldots, 0)$ as first row, $(1, a_{1j}, a_{2j}, \ldots, a_{i-1,j}, a_{i+1,j}, \ldots, a_{nj})$ as first column, and $A_{ij}$ as the $(n-1) \times (n-1)$ South-East submatrix:
$$\tilde{A} = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ a_{1j} & a_{11} & \cdots & a_{1,j-1} & a_{1,j+1} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ a_{i-1,j} & a_{i-1,1} & \cdots & a_{i-1,j-1} & a_{i-1,j+1} & \cdots & a_{i-1,n} \\ a_{i+1,j} & a_{i+1,1} & \cdots & a_{i+1,j-1} & a_{i+1,j+1} & \cdots & a_{i+1,n} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ a_{nj} & a_{n1} & \cdots & a_{n,j-1} & a_{n,j+1} & \cdots & a_{nn} \end{bmatrix}$$
The transformation requires $i - 1$ exchanges of adjacent rows to bring the i-th row to the top, and $j - 1$ exchanges of adjacent columns to bring the j-th column to the left (leaving the order of the other rows and columns unchanged). Clearly, by Example 600, we have
$$\det \tilde{A} = 1 \cdot \det A_{ij}$$
and so, since each exchange changes the sign of the determinant,
$$\det \begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix} = (-1)^{i+j-2} \det \tilde{A} = (-1)^{i+j} \det A_{ij} = a^*_{ij} \qquad (13.32)$$
By applying formula (13.30) and using (13.32) we complete the proof.

Example 609 Let
$$A = \begin{bmatrix} 1 & 3 & 4 \\ -2 & 0 & 2 \\ 1 & 3 & -1 \end{bmatrix}$$
By Proposition 608, we can compute the determinant using any line. It is, however, simpler to compute it using the second row because it contains a zero, a feature that lightens the algebra. Indeed,
$$\begin{aligned}
\det A &= a_{21} a^*_{21} + a_{22} a^*_{22} + a_{23} a^*_{23} \\
&= (-2)(-1)^{2+1} \det \begin{bmatrix} 3 & 4 \\ 3 & -1 \end{bmatrix} + 0 + 2 \cdot (-1)^{2+3} \det \begin{bmatrix} 1 & 3 \\ 1 & 3 \end{bmatrix} \\
&= (-2)(-1)(-15) + 0 + 2 \cdot (-1) \cdot 0 = -30
\end{aligned}$$
N

The next result completes Proposition 608 by showing what happens if we use the algebraic complements of a different row (or column).

Proposition 610 The sum of the products of the elements of any row (column) by the algebraic complements of a different row (column) is zero.

In symbols, choosing the row i,
$$\sum_{j=1}^{n} a_{ij} a^*_{qj} = 0 \qquad \forall q \neq i$$
or, choosing the column j,
$$\sum_{i=1}^{n} a_{ij} a^*_{iq} = 0 \qquad \forall q \neq j$$

Proof Replace the q-th row of A by the i-th row and expand the determinant of the matrix thus obtained along its q-th row: the expansion is exactly $\sum_{j=1}^{n} a_{ij} a^*_{qj}$. But, on the other hand, this determinant is zero because the matrix has two equal rows.
Example 611 Let
$$A = \begin{bmatrix} 1 & 0 & 2 \\ 2 & 1 & 3 \\ -2 & 4 & -1 \end{bmatrix}$$
Then
$$a^*_{11} = (-1)^{1+1}(-13) = -13; \qquad a^*_{12} = (-1)^{1+2} \cdot 4 = -4; \qquad a^*_{13} = (-1)^{1+3} \cdot 10 = 10$$
$$a^*_{21} = (-1)^{2+1}(-8) = 8; \qquad a^*_{22} = (-1)^{2+2} \cdot 3 = 3; \qquad a^*_{23} = (-1)^{2+3} \cdot 4 = -4$$
$$a^*_{31} = (-1)^{3+1}(-2) = -2; \qquad a^*_{32} = (-1)^{3+2}(-1) = 1; \qquad a^*_{33} = (-1)^{3+3} \cdot 1 = 1$$
Let us add the products of the elements of the second row by the algebraic complements of the first row:
$$2a^*_{11} + a^*_{12} + 3a^*_{13} = -26 - 4 + 30 = 0$$
Now, let us add the products of the elements of the second row by the algebraic complements of the third row:
$$2a^*_{31} + a^*_{32} + 3a^*_{33} = -4 + 1 + 3 = 0$$
The reader can verify that, in accordance with the last result, we get 0 in all the cases in which we add the products of the elements of a row by the algebraic complements of a different row. N
The last two results are summarized in the famous, all-inclusive, Laplace's Theorem:

Theorem 612 (Laplace) Let A be a square matrix of order n. Then:

(i) choosing the row i,
$$\sum_{j=1}^{n} a_{ij} a^*_{qj} = \begin{cases} \det A & \text{if } q = i \\ 0 & \text{if } q \neq i \end{cases}$$

(ii) choosing the column j,
$$\sum_{i=1}^{n} a_{ij} a^*_{iq} = \begin{cases} \det A & \text{if } q = j \\ 0 & \text{if } q \neq j \end{cases}$$

Laplace's Theorem is the occasion to introduce the classic Kronecker delta function $\delta : \mathbb{N} \times \mathbb{N} \to \{0, 1\}$ defined by
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
Here i and j are, thus, any two natural numbers (e.g., $\delta_{11} = \delta_{33} = 1$ and $\delta_{13} = \delta_{31} = 0$). Using this function, points (i) and (ii) of Laplace's Theorem assume the following elegant forms:
$$\sum_{j=1}^{n} a_{ij} a^*_{qj} = \delta_{iq} \det A$$
and
$$\sum_{i=1}^{n} a_{ij} a^*_{iq} = \delta_{jq} \det A$$
as the reader may verify.


13.6.6 Inverses and determinants


Let us go back to inverse matrices. The next result shows the importance of determinants in their calculation.

Theorem 613 A square matrix A is invertible if and only if $\det A \neq 0$. In this case, we have
$$A^{-1} = \frac{1}{\det A} (A^*)^T$$

Thus, the elements $a^{-1}_{ij}$ of the inverse matrix $A^{-1}$ are
$$a^{-1}_{ij} = \frac{a^*_{ji}}{\det A} = (-1)^{i+j} \frac{\det A_{ji}}{\det A} \qquad (13.33)$$
A (square) matrix A for which det A = 0 is called singular. With this terminology, the theorem says that a matrix is invertible if and only if it is not singular. By Corollary 593, the following properties are therefore equivalent:

(i) A is invertible;

(ii) $\det A \neq 0$, that is, A is not singular;

(iii) the columns of A are linearly independent;

(iv) the rows of A are linearly independent;

(v) $\rho(A) = n$.

Proof Write
$$A = (a_{ij}) = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ a^n \end{bmatrix} \qquad\text{and}\qquad A^* = (a^*_{ij}) = \begin{bmatrix} (a^*)^1 \\ (a^*)^2 \\ \vdots \\ (a^*)^n \end{bmatrix}$$
where $a^i$ and $(a^*)^i$ denote the i-th rows of A and $A^*$. We then have
$$A (A^*)^T = \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ a^n \end{bmatrix} \begin{bmatrix} (a^*)^1 \mid (a^*)^2 \mid \cdots \mid (a^*)^n \end{bmatrix}$$
By Laplace's Theorem, the element in place (i, q) of the product $A(A^*)^T$ is
$$a^i \cdot (a^*)^q = \sum_{j=1}^{n} a_{ij} a^*_{qj} = \begin{cases} \det A & \text{if } i = q \\ 0 & \text{if } i \neq q \end{cases}$$
Analogously, the element in place (i, q) of the product $(A^*)^T A$ is
$$(a^*_C)^i \cdot (a_C)^q = \sum_{j=1}^{n} a^*_{ji} a_{jq} = \begin{cases} \det A & \text{if } i = q \\ 0 & \text{if } i \neq q \end{cases}$$
where $(a^*_C)^i$ is the i-th column of $A^*$ and $(a_C)^q$ is the q-th column of A. Therefore,
$$A(A^*)^T = (A^*)^T A = \begin{bmatrix} \det A & 0 & \cdots & 0 \\ 0 & \det A & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \det A \end{bmatrix} = \det A \cdot I_n$$
That is,
$$A \left( \frac{1}{\det A} (A^*)^T \right) = \left( \frac{1}{\det A} (A^*)^T \right) A = I_n$$
which allows us to conclude that
$$A^{-1} = \frac{1}{\det A} (A^*)^T$$
as desired.

This last theorem is important because, through determinants, it provides an algorithm that allows us both to verify the invertibility of A and to compute the elements of the inverse $A^{-1}$. Note that in formula (13.33) the subscript of $A_{ji}$ is exactly ji, and not ij.
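Operationally, formula (13.33) reads as follows (a sketch of ours, assuming numpy; the function name inverse_via_adjoint is our own, and it anticipates the matrix of Example 614 below):

```python
import numpy as np

def inverse_via_adjoint(A):
    """A^{-1} = (A*)^T / det A, as in Theorem 613 (A must be non-singular)."""
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("matrix is singular")
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            Aij = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(Aij)
    return C.T / d   # the transpose is the point: entry (i, j) uses the cofactor a*_ji

A = np.array([[1., 2.],
              [3., 5.]])
print(inverse_via_adjoint(A))   # [[-5. 2.] [ 3. -1.]]
```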

Example 614 We use formula (13.33) to calculate the inverse of the matrix
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}$$
We have
$$a^{-1}_{11} = (-1)^{1+1} \frac{\det A_{11}}{\det A} = \frac{a_{22}}{a_{11}a_{22} - a_{12}a_{21}} = \frac{5}{-1} = -5$$
$$a^{-1}_{12} = (-1)^{1+2} \frac{\det A_{21}}{\det A} = \frac{-a_{12}}{a_{11}a_{22} - a_{12}a_{21}} = \frac{-2}{-1} = 2$$
$$a^{-1}_{21} = (-1)^{2+1} \frac{\det A_{12}}{\det A} = \frac{-a_{21}}{a_{11}a_{22} - a_{12}a_{21}} = \frac{-3}{-1} = 3$$
$$a^{-1}_{22} = (-1)^{2+2} \frac{\det A_{22}}{\det A} = \frac{a_{11}}{a_{11}a_{22} - a_{12}a_{21}} = \frac{1}{-1} = -1$$
So,
$$A^{-1} = \begin{bmatrix} \frac{a_{22}}{\det A} & -\frac{a_{12}}{\det A} \\ -\frac{a_{21}}{\det A} & \frac{a_{11}}{\det A} \end{bmatrix} = \begin{bmatrix} -5 & 2 \\ 3 & -1 \end{bmatrix}$$
N

Example 615 A diagonal matrix A is invertible if and only if no element of its diagonal is zero. In this case the inverse $A^{-1}$ is diagonal and formula (13.33) implies that
$$a^{-1}_{ij} = \begin{cases} \dfrac{1}{a_{ij}} & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
N
Example 616 For the matrix
$$A = \begin{bmatrix} 1 & -3 & 0 \\ -5 & -1 & -2 \\ -3 & -6 & 4 \end{bmatrix}$$
we saw that
$$A^* = \begin{bmatrix} -16 & 26 & 27 \\ 12 & 4 & 15 \\ 6 & 2 & -16 \end{bmatrix}$$
Therefore,
$$(A^*)^T = \begin{bmatrix} -16 & 12 & 6 \\ 26 & 4 & 2 \\ 27 & 15 & -16 \end{bmatrix}$$
Also $\det A = -94$ and so
$$A^{-1} = \frac{1}{\det A} (A^*)^T = -\frac{1}{94} \begin{bmatrix} -16 & 12 & 6 \\ 26 & 4 & 2 \\ 27 & 15 & -16 \end{bmatrix} = \begin{bmatrix} \frac{8}{47} & -\frac{6}{47} & -\frac{3}{47} \\ -\frac{13}{47} & -\frac{2}{47} & -\frac{1}{47} \\ -\frac{27}{94} & -\frac{15}{94} & \frac{8}{47} \end{bmatrix}$$
N

13.6.7 Kronecker’s Algorithm


Kronecker's Algorithm allows us to determine the rank of a matrix by using determinants. To introduce it, we need some terminology. Let A be a square matrix of order n. We call:

(i) principal minors the determinants of the square submatrices that are obtained by eliminating some rows and the columns with the same indexes (places);

(ii) North-West (NW) principal minors the principal minors that are obtained by eliminating the last k rows and the last k columns, with $0 \le k \le n - 1$.

Example 617 Let
$$A = \begin{bmatrix} 1 & 3 & 2 \\ 10 & 1 & 2 \\ 3 & 5 & 7 \end{bmatrix}$$
Its principal minors are the determinants
$$\det A = -101; \qquad \det \begin{bmatrix} 1 & 3 \\ 10 & 1 \end{bmatrix} = -29; \qquad \det \begin{bmatrix} 1 & 2 \\ 5 & 7 \end{bmatrix} = -3; \qquad \det \begin{bmatrix} 1 & 2 \\ 3 & 7 \end{bmatrix} = 1$$
$$\det [1] = 1; \qquad \det [1] = 1; \qquad \det [7] = 7$$
The matrix A has only three NW principal minors:
$$\det A = -101; \qquad \det \begin{bmatrix} 1 & 3 \\ 10 & 1 \end{bmatrix} = -29; \qquad \det [1] = 1$$
N
A square matrix of order n has
$$\binom{n}{k} \binom{n}{k}$$
minors of order k (that is, determinants of square submatrices of order k). Indeed, we can discard $n - k$ rows in $\binom{n}{k}$ different ways, and in as many ways we can discard $n - k$ columns (so as to leave k of them). Of these, $\binom{n}{k}$ are principal minors. There is only one NW principal minor of order k: the one obtained by discarding the last $n - k$ rows and columns.

Before we present Kronecker's Algorithm, we recall some results proved previously:

1. if the rank of a matrix is r, it contains at most r linearly independent columns (so, also at most r linearly independent rows);

2. r vectors $x^1, x^2, \ldots, x^r$ of $\mathbb{R}^r$ are linearly independent if and only if the determinant of the square matrix of order r that has them as row (or column) vectors is non-zero;

3. if r vectors $x^1, x^2, \ldots, x^r$ of $\mathbb{R}^r$ are linearly independent in $\mathbb{R}^r$, then the r vectors $y^1, y^2, \ldots, y^r$ of $\mathbb{R}^n$, with $n > r$, that have exactly $x^1, x^2, \ldots, x^r$ as their first r components are linearly independent in $\mathbb{R}^n$.8

The following proposition, whose proof we omit, is very useful for determining the rank of a matrix.

Proposition 618 (Kronecker) The following properties are equivalent for a matrix A:

(i) A has rank r;

(ii) A has a non-zero minor of order r and all the minors of order r + 1 are zero;

(iii) A has a non-zero minor of order r and all the minors of order r + 1 that contain it are zero;

(iv) A has a non-zero minor of order r and all the minors of order > r are zero.

Kronecker's Algorithm for determining the rank of a matrix is based on this proposition and can be illustrated as follows:

(i) We choose as "leader" a square submatrix of order $\ell$ of A that is readily seen to be non-singular; pragmatically, we often take a submatrix of order 2.

(ii) We "border" in all the possible ways the "leader" submatrix with one of the surviving rows and one of the surviving columns. If all such "bordered" minors (of order $\ell + 1$) are zero, the rank of A is $\ell$ and the procedure ends here. If we run into a non-zero minor of order $\ell + 1$, we start again by taking it as the new "leader".

8 The property is easy to verify and has already been used in the proof of Proposition 582.
Example 619 Let
$$A = \begin{bmatrix} 6 & 3 & 9 & 0 \\ 4 & 1 & 7 & -2 \\ 8 & 10 & 6 & 12 \end{bmatrix}$$
Let us choose as "leader" the minor of order 2
$$\det \begin{bmatrix} 6 & 3 \\ 4 & 1 \end{bmatrix} = -6 \neq 0$$
Hence, the rank of A is at least 2. With the last two columns and the last, not yet used, row we obtain the following "bordered" minors:
$$\det \begin{bmatrix} 6 & 3 & 9 \\ 4 & 1 & 7 \\ 8 & 10 & 6 \end{bmatrix} = 0; \qquad \det \begin{bmatrix} 6 & 3 & 0 \\ 4 & 1 & -2 \\ 8 & 10 & 12 \end{bmatrix} = 0$$
So, the rank of A is 2. N
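A sketch of ours of the bordering procedure (numpy assumed; it implements the equivalence (iii) of Proposition 618 greedily, starting from an order-1 leader rather than an order-2 one):

```python
import numpy as np

def rank_kronecker(A, tol=1e-10):
    """Rank via bordered minors: once a non-singular leader of order l is found,
    the rank is l as soon as every bordered minor of order l+1 vanishes."""
    m, n = A.shape
    rows, cols = [], []          # indexes of the current leader submatrix
    grew = True
    while grew:
        grew = False
        for r in range(m):
            if r in rows:
                continue
            for c in range(n):
                if c in cols:
                    continue
                idx_r, idx_c = rows + [r], cols + [c]
                if abs(np.linalg.det(A[np.ix_(idx_r, idx_c)])) > tol:
                    rows, cols = idx_r, idx_c   # non-zero bordered minor: new leader
                    grew = True
                    break
            if grew:
                break
    return len(rows)

A = np.array([[6., 3., 9., 0.],
              [4., 1., 7., -2.],
              [8., 10., 6., 12.]])
print(rank_kronecker(A))   # 2
```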

13.6.8 Summing up
We conclude this section by noting how the rank of a matrix is simultaneously many things (each one of them being a possible definition of it). Indeed, it is:

(i) the maximum number of linearly independent columns;

(ii) the maximum number of linearly independent rows;

(iii) the maximum order of its non-zero minors;

(iv) the dimension of the image of the linear function that the matrix determines.

The rank is a multi-faceted notion that plays a key role in linear algebra and its many applications. Operationally, the Gaussian elimination procedure and Kronecker's Algorithm make it possible to compute it.

13.7 Square linear systems


Using inverse matrices we can give a procedure for solving "square" linear systems of equations, i.e., systems of n equations in n unknowns:
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \quad\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n \end{cases}$$
In matrix form:
$$\underset{n \times n}{A}\, \underset{n \times 1}{x} = \underset{n \times 1}{b} \qquad (13.34)$$
where A is a square $n \times n$ matrix, while x and b are vectors in $\mathbb{R}^n$. We ask two questions concerning the system (13.34):

Existence: which conditions ensure that the system has a solution for every vector $b \in \mathbb{R}^n$, that is, when, for any given $b \in \mathbb{R}^n$, does there exist a vector $x \in \mathbb{R}^n$ such that $Ax = b$?

Uniqueness: which conditions ensure that such a solution is unique, that is, when, for any given $b \in \mathbb{R}^n$, does there exist a unique $x \in \mathbb{R}^n$ such that $Ax = b$?

To frame the problem within what we have studied until now, consider the linear operator $T : \mathbb{R}^n \to \mathbb{R}^n$ associated with A, defined by $T(x) = Ax$ for every $x \in \mathbb{R}^n$. The system (13.34) can be written in functional form as
$$T(x) = b$$
So, it is immediate that:

the system admits a solution for a given $b \in \mathbb{R}^n$ if and only if $b \in \operatorname{Im} T$; in particular, the system admits a solution for every $b \in \mathbb{R}^n$ if and only if T is surjective, that is, $\operatorname{Im} T = \mathbb{R}^n$;

the system admits a unique solution for a given $b \in \mathbb{R}^n$ if and only if the preimage $T^{-1}(b)$ is a singleton; in particular, the system admits a unique solution for every $b \in \mathbb{R}^n$ if and only if T is injective.9

Since injectivity and surjectivity are, by Corollary 575, equivalent properties for linear operators from $\mathbb{R}^n$ into $\mathbb{R}^n$, the two problems of existence and uniqueness are equivalent: there exists a solution of the system (13.34) for every $b \in \mathbb{R}^n$ if and only if such a solution is unique.
In particular, a necessary and sufficient condition for such a unique solution to exist for every $b \in \mathbb{R}^n$ is that the operator T is invertible, i.e., that one of the following equivalent conditions holds:

(i) the matrix A is invertible;

(ii) the matrix A is non-singular, i.e., $\det A \neq 0$;

(iii) the matrix A is of full rank, i.e., $\rho(A) = n$.

The condition required is, therefore, the invertibility of the matrix A, or one of the equivalent properties (ii) and (iii). This is the content of Cramer's Theorem, which thus follows easily from what we have learned so far.

Theorem 620 (Cramer) Let A be a square matrix of order n. The system (13.34) has one, and only one, solution for every $b \in \mathbb{R}^n$ if and only if the matrix A is invertible. In this case, the solution is given by
$$x = A^{-1}b$$

9 Recall that a function is injective if and only if all its (non-empty) preimages are singletons.
Proof "If". Let A be invertible. The associated linear operator $T : \mathbb{R}^n \to \mathbb{R}^n$ is invertible, so both surjective and injective. Since T is surjective, the system has a solution. Since T is injective, this solution is unique. In particular, the solution that corresponds to a given $b \in \mathbb{R}^n$ is $T^{-1}(b)$. Since $T^{-1}(y) = A^{-1}y$ for every $y \in \mathbb{R}^n$, it follows that the solution is $T^{-1}(b) = A^{-1}b$.10
"Only if". Assume that the system (13.34) admits one and only one solution for every $b \in \mathbb{R}^n$. This means that, for every vector $b \in \mathbb{R}^n$, there exists only one vector $x \in \mathbb{R}^n$ such that $T(x) = b$. Hence, the operator T is bijective, so invertible. It follows that A is invertible too.

Thus, the system (13.34) admits a solution for every b if and only if the matrix A is invertible and, even more importantly, the unique solution is expressed in terms of the inverse matrix $A^{-1}$. Since we are able to calculate $A^{-1}$ using determinants (Theorem 613), we have obtained a procedure for solving linear systems of n equations in n unknowns: the formula $x = A^{-1}b$ can indeed be written as
$$x = \frac{1}{\det A} (A^*)^T b \qquad (13.35)$$
Using Laplace's Theorem, it is easy to show that formula (13.35), called Cramer's rule, can be written in detail as:
$$x = \begin{bmatrix} \dfrac{\det A^1}{\det A} \\ \dfrac{\det A^2}{\det A} \\ \vdots \\ \dfrac{\det A^n}{\det A} \end{bmatrix} \qquad (13.36)$$
where $A^k$ denotes the matrix obtained by replacing the k-th column of the matrix A with the column vector
$$b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$
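Cramer's rule (13.36) is straightforward to implement. A sketch of ours, assuming numpy:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b via formula (13.36); requires det A != 0."""
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("matrix is singular: Cramer's rule does not apply")
    x = np.empty(len(b))
    for k in range(len(b)):
        Ak = A.copy()
        Ak[:, k] = b          # A^k: replace the k-th column with b
        x[k] = np.linalg.det(Ak) / d
    return x

A = np.array([[1., 2.],
              [3., 5.]])
b = np.array([1., 2.])
print(cramer(A, b))              # [-1.  1.]
print(np.linalg.solve(A, b))     # same solution
```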

Example 621 A special case of the system (13.34) is when b = 0. Then the system is called homogeneous and, if A is invertible, by Theorem 620 the unique solution is x = 0. N

Example 622 For the system
$$\begin{cases} x_1 + 2x_2 = b_1 \\ 3x_1 + 5x_2 = b_2 \end{cases}$$
of two equations in two unknowns we have
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}$$
From Example 614 we know that A is invertible. By Theorem 620, the unique solution of the system is therefore
$$x = A^{-1}b = \begin{bmatrix} -5 & 2 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} -5b_1 + 2b_2 \\ 3b_1 - b_2 \end{bmatrix}$$
Using Cramer's rule (13.36), we see that
$$\det A = -1; \qquad \det A^1 = \det \begin{bmatrix} b_1 & 2 \\ b_2 & 5 \end{bmatrix} = 5b_1 - 2b_2; \qquad \det A^2 = \det \begin{bmatrix} 1 & b_1 \\ 3 & b_2 \end{bmatrix} = b_2 - 3b_1$$
Therefore,
$$x_1 = \frac{5b_1 - 2b_2}{-1} = -5b_1 + 2b_2; \qquad x_2 = \frac{b_2 - 3b_1}{-1} = 3b_1 - b_2$$
which coincides with the solution found above. N

10 Alternatively, it is possible to prove the "if" part in the following, rather mechanical, way. Set $x = A^{-1}b$; we have $Ax = A(A^{-1}b) = (AA^{-1})b = Ib = b$, so $x = A^{-1}b$ solves the system. It is also the unique solution. Indeed, if $\tilde{x} \in \mathbb{R}^n$ is another solution, we have $\tilde{x} = I\tilde{x} = (A^{-1}A)\tilde{x} = A^{-1}(A\tilde{x}) = A^{-1}b = x$, as claimed.

Example 623 For the system
$$\begin{cases} x_1 - 2x_2 + 2x_3 = b_1 \\ 2x_2 - x_3 = b_2 \\ x_2 - x_3 = b_3 \end{cases}$$
of three equations in three unknowns we have
$$A = \begin{bmatrix} 1 & -2 & 2 \\ 0 & 2 & -1 \\ 0 & 1 & -1 \end{bmatrix}$$
Using submatrices, it is easy to verify that $\det A = -1 \neq 0$. Therefore, A is invertible and, using formula (13.33), we obtain
$$A^{-1} = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 1 & -2 \end{bmatrix}$$
By Theorem 620, the unique solution of the system is
$$x = A^{-1}b = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 1 & -2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} b_1 + 2b_3 \\ b_2 - b_3 \\ b_2 - 2b_3 \end{bmatrix}$$
For example, if $b = (1, -1, 2)$, we have
$$x = (1 + 2 \cdot 2,\; -1 - 2,\; -1 - 2 \cdot 2) = (5, -3, -5)$$
Using Cramer's rule (13.36), we see that
$$\det A = -1; \qquad \det A^1 = -b_1 - 2b_3; \qquad \det A^2 = -b_2 + b_3; \qquad \det A^3 = -b_2 + 2b_3$$
Hence
$$x_1 = \frac{-b_1 - 2b_3}{-1} = b_1 + 2b_3; \qquad x_2 = \frac{-b_2 + b_3}{-1} = b_2 - b_3; \qquad x_3 = \frac{-b_2 + 2b_3}{-1} = b_2 - 2b_3$$
which coincides with the solution found above. N
13.8 General linear systems


13.8.1 Kronecker-Capelli’s Theorem
We now turn to a general linear system of m equations in n unknowns
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \quad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \end{cases}$$
where it is no longer required that $n = m$, i.e., the number of equations and the number of unknowns may differ. The system can be written in matrix form as
$$\underset{m \times n}{A}\, \underset{n \times 1}{x} = \underset{m \times 1}{b}$$
where $A \in M(m,n)$, $x \in \mathbb{R}^n$, and $b \in \mathbb{R}^m$. The square system is the special case where $n = m$.

Let $T(x) = Ax$ be the operator $T : \mathbb{R}^n \to \mathbb{R}^m$ associated with the system, which can then be written as $T(x) = b$. We say that the system is:

(i) unsolvable when it does not admit any solution, i.e., $b \notin \operatorname{Im} T$;

(ii) solvable when it admits at least one solution, i.e., $b \in \operatorname{Im} T$.

Moreover, a solvable linear system is said to be:

(ii.a) determined (or uniquely solvable) when it admits only one solution, i.e., $T^{-1}(b)$ is a singleton;

(ii.b) undetermined when it admits infinitely many solutions, i.e., $T^{-1}(b)$ has infinite cardinality.11

These two cases exhaust all the possibilities: if a system admits two solutions, it certainly has infinitely many of them. Indeed, if x and x' are two different solutions – that is, $Ax = Ax' = b$ – then all the linear combinations $\alpha x + (1 - \alpha)x'$ with $\alpha \in \mathbb{R}$ are also solutions of the system because
$$A(\alpha x + (1 - \alpha)x') = \alpha Ax + (1 - \alpha)Ax' = \alpha b + (1 - \alpha)b = b$$
Using this terminology, in the case $n = m$ Cramer's Theorem says that a square linear system is solvable for every vector b if and only if it is determined for every such vector. In this section we modify the analysis of the last section in two different directions:

(i) we consider general systems, without requiring that $m = n$;

(ii) we study the existence and uniqueness of solutions for a given vector b (so, for a specific system at hand), rather than for every such vector.

11 Since the set $T^{-1}(b)$ is convex, it is either a singleton or it has infinite cardinality (in particular, it has the power of the continuum), tertium non datur. We will introduce convexity in the next chapter.
To this end, let us consider the so-called augmented (or complete) matrix of the system
$$\underset{m \times (n+1)}{A \mid b}$$
obtained by placing the vector b of the known terms beside A. The next famous result gives a necessary and sufficient condition for a linear system to have a solution.

Theorem 624 (Kronecker-Capelli) Let $A \in M(m,n)$ and $b \in \mathbb{R}^m$. The linear system $Ax = b$ is solvable if and only if the matrix A has the same rank as the augmented matrix $A \mid b$, that is,
$$\rho(A) = \rho(A \mid b) \qquad (13.37)$$

Proof Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be the linear operator associated with the system, which can therefore be written as $T(x) = b$. The system is solvable if and only if $b \in \operatorname{Im} T$. Since $\operatorname{Im} T$ is the vector subspace of $\mathbb{R}^m$ generated by the columns of A, the system is solvable if and only if b is a linear combination of such columns. That is, if and only if the matrices A and $A \mid b$ have the same number of linearly independent columns (so, the same rank).

Example 625 Consider
$$\begin{cases} x_1 + 2x_2 + 3x_3 = 3 \\ 6x_1 + 4x_2 + 2x_3 = 7 \\ 5x_1 + 2x_2 - x_3 = 4 \end{cases}$$
For both matrices
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 6 & 4 & 2 \\ 5 & 2 & -1 \end{bmatrix} \qquad\text{and}\qquad A \mid b = \begin{bmatrix} 1 & 2 & 3 & 3 \\ 6 & 4 & 2 & 7 \\ 5 & 2 & -1 & 4 \end{bmatrix}$$
the third row is the difference between the second and first rows. These three rows are thus not linearly independent: $\rho(A) = \rho(A \mid b) = 2$. So, the system is solvable. N
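The rank condition (13.37) is easy to check numerically. A sketch of ours for the system of Example 625, assuming numpy and its matrix_rank function:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [6., 4., 2.],
              [5., 2., -1.]])
b = np.array([3., 7., 4.])

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))  # the augmented matrix A|b
print(rank_A, rank_Ab)     # 2 2
print(rank_A == rank_Ab)   # True: condition (13.37) holds, so the system is solvable
```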

Example 626 A homogeneous system is always solvable because the zero vector is always a solution. This is confirmed by Kronecker-Capelli's Theorem because the ranks of A and of $A \mid 0$ are always equal. N

Note that Kronecker-Capelli's Theorem considers a given pair (A, b), while Cramer's Theorem considers, as given, only a square matrix A. This reflects the new direction (ii) mentioned above and, for this reason, the two theorems are only partly comparable in the case of square matrices A. Indeed, Cramer's Theorem considers only the case $\rho(A) = n$, in which condition (13.37) is automatically satisfied for every $b \in \mathbb{R}^n$ (why?). For this case, it is more powerful than Kronecker-Capelli's Theorem: existence holds for every vector b and, moreover, we also have uniqueness. But, differently from Cramer's Theorem, Kronecker-Capelli's Theorem is able to handle also the case $\rho(A) < n$ by giving, for a given vector b, a necessary and sufficient condition for the system to be solvable.
13.8.2 Uniqueness
We now turn our attention to the uniqueness of the solutions of a system $Ax = b$ whose existence is guaranteed by Kronecker-Capelli's Theorem. The next result shows that for uniqueness, too, it is necessary to consider the rank of the matrix A (recall that, thanks to condition (13.18), we have $\rho(A) \le n$).

Proposition 627 Let $Ax = b$ be a solvable linear system, with $A \in M(m,n)$ and $b \in \mathbb{R}^m$. Then:

(i) if $\rho(A) = n$, the system is determined;

(ii) if $\rho(A) < n$, the system is undetermined.

The proof is based on the following result, of independent interest.

Proposition 628 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear operator and suppose $T(x) = b$. The vectors $x' \in \mathbb{R}^n$ for which $T(x') = b$ are those of the form $x + z$ with $z \in \ker T$, and only them. That is,
$$T^{-1}(b) = \{x + z : z \in \ker T\} \qquad (13.38)$$

Proof Since $T(z) = 0$, one has $T(x + z) = T(x) + T(z) = b + 0 = b$. Now, let $x'$ be another vector for which $T(x') = b$. Subtracting member by member the two equalities $T(x') = b$ and $T(x) = b$, we get $T(x') - T(x) = 0$, that is, $T(x' - x) = 0$, and therefore $x' - x \in \ker T$. We conclude that $x' = x + z$ with $z \in \ker T$.

The "only if" part of Lemma 571 – i.e., that linear and injective operators have trivial kernels – is a special case of this result. Indeed, suppose that the linear operator T is injective, so that $T^{-1}(0) = \{0\}$. If $b = 0$, we can set $x = 0$ and (13.38) then implies $\{0\} = T^{-1}(0) = \{0 + z : z \in \ker T\} = \ker T$. So, $\ker T = \{0\}$.

For systems, the last result takes the following form:

Corollary 629 If x is a solution of the system $Ax = b$, then all the solutions are of the form
$$x + z$$
with z such that $Az = 0$ (i.e., z solves the homogeneous system $Ax = 0$).

Therefore, once we find a solution of the system $Ax = b$, all the other solutions can be found by adding to it the solutions of the homogeneous system $Ax = 0$. Besides its theoretical interest, this is relevant also operationally (especially when it is significantly simpler to solve the homogeneous system than the original one).12
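The representation (13.38) can also be verified numerically. A sketch of ours for the solvable system of Example 625, assuming numpy and scipy (whose null_space function returns a basis of the kernel):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [6., 4., 2.],
              [5., 2., -1.]])
b = np.array([3., 7., 4.])

x_bar, *_ = np.linalg.lstsq(A, b, rcond=None)  # one particular solution of Ax = b
Z = null_space(A)                              # basis of ker T; here a single vector

# every x_bar + t * Z[:, 0] solves the system, as Corollary 629 asserts:
for t in (0.0, 1.0, -2.5):
    print(np.allclose(A @ (x_bar + t * Z[:, 0]), b))   # True, True, True
```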
That said, Corollary 629 allows us to prove Proposition 627.

12 As readers will see in more advanced courses, the representation of all solutions as the sum of a particular solution and the solutions of the associated homogeneous system holds also for systems of linear differential equations, as well as for linear differential equations of order n.
Proof of Proposition 627 By hypothesis, the system has at least one solution x. Moreover, since $\rho(A) = \rho(T)$, by the Rank-Nullity Theorem $\rho(A) + \nu(T) = n$. If $\rho(A) = n$, we have $\nu(T) = 0$, that is, $\ker T = \{0\}$. From Corollary 629 it follows that x is the unique solution. If, instead, $\rho(A) < n$, we have $\nu(T) > 0$ and therefore $\ker T$ is a non-trivial vector subspace of $\mathbb{R}^n$, with infinitely many elements. By Corollary 629, adding such elements to the solution x we find the infinitely many solutions of the system.

13.8.3 Summing up
Summing up, we are now able to state a general result on the resolution of linear systems that combines Kronecker-Capelli's Theorem and Proposition 627.

Theorem 630 Let $A \in M(m,n)$ and $b \in \mathbb{R}^m$. The linear system $Ax = b$ is

(i) unsolvable if and only if $\rho(A) < \rho(A \mid b)$;

(ii) solvable if and only if $\rho(A) = \rho(A \mid b)$. In this case, it is

(ii.a) determined if and only if $\rho(A) = \rho(A \mid b) = n$;

(ii.b) undetermined if and only if $\rho(A) = \rho(A \mid b) < n$.

The comparison of the ranks $\rho(A)$ and $\rho(A \mid b)$ with the number n of the unknowns allows us, therefore, to establish the existence and the possible uniqueness of the solutions of the system. If the system is square, we have $\rho(A) = n$ if and only if $\rho(A) = \rho(A \mid b) = n$ for every $b \in \mathbb{R}^m$.13 Cramer's Theorem, which was only partly comparable with Kronecker-Capelli's Theorem, becomes a special case of the more general Theorem 630.

Example 631 Consider a homogeneous linear system $Ax = 0$. Since, as already observed, the condition $\rho(A) = \rho(A \mid 0)$ is always satisfied, the system has a unique solution (that is, the zero vector) if and only if $\rho(A) = n$, and it is undetermined if and only if $\rho(A) < n$. N

O.R. It is often said that a linear system $Ax = b$ with $A \in M(m,n)$

(i) has a unique solution if $m = n$, i.e., there are as many equations as unknowns;

(ii) is undetermined if $m < n$, i.e., there are fewer equations than unknowns;14

(iii) is unsolvable if $m > n$, i.e., there are more equations than unknowns.

The idea is wrong because it might well happen that some equations are redundant: some of them are a multiple of another one, or a linear combination of other ones (in such cases, they would be automatically satisfied once the others are satisfied). In view of Theorem 630, however, claims (i) and (ii) become true provided that by m we mean the number of non-redundant equations, that is, the rank of A: indeed, the rank counts the equations that cannot be expressed as linear combinations of the others. H

13 Why? (we have already made a similar observation).

14 Sometimes we say that there are more degrees of freedom (unknowns) than constraints (equations). The opposite holds in (iii).
13.9 Solving systems: Cramer’s method


We close with a "quadrature" procedure that, by permitting the use of Cramer's rule, is useful in calculations. Consider a generic solvable linear system
$$\underset{m \times n}{A}\, x = b$$
i.e., one such that $\rho(A) = \rho(A \mid b)$. Set $\rho(A) = k$.

1. If $k < m$, there are $m - k$ rows that can be written as linear combinations of the other k. Given that each row of A identifies an equation of the system, there are $m - k$ equations that, being linear combinations of the other ones, are "fictitious": they are satisfied whenever the other k are satisfied. We can simply delete them, reducing in this way the system to one with k linearly independent equations.

2. If $k < n$, there are $n - k$ columns that can be written as linear combinations of the other k (so, they are "fictitious"). The corresponding $n - k$ "unknowns" are not really unknowns (they are "fictitious unknowns") but can assume completely arbitrary values: for each choice of such values, the system reduces to one with k unknowns (and k equations) and, therefore, there is only one solution for the k "true unknowns". We can simply assign arbitrary values to the $n - k$ "fictitious unknowns", reducing in this way the system to one with k unknowns.

As usual, we can assume that the k rows and the k columns that determine the rank of A are the first ones. Let $A'$ be a non-singular $k \times k$ submatrix of A,15 and write
$$\underset{m \times n}{A} = \begin{bmatrix} A' & B \\ C & D \end{bmatrix}$$
where $A'$ is $k \times k$, B is $k \times (n-k)$, C is $(m-k) \times k$, and D is $(m-k) \times (n-k)$. Then we can eliminate the last $m - k$ rows and give arbitrary values, say $z \in \mathbb{R}^{n-k}$, to the last $n - k$ unknowns, obtaining in this way the system
$$A' x' = b' - Bz \qquad (13.39)$$
in which $x' \in \mathbb{R}^k$ is the vector that contains the only k "true" unknowns and $b' \in \mathbb{R}^k$ is the vector of the first k known terms.
The square system (13.39) satisfies the hypothesis of Cramer's Theorem for every $z \in \mathbb{R}^{n-k}$, so it can be solved with Cramer's rule. If we call $\hat{x}'(z)$ the unique solution for each given $z \in \mathbb{R}^{n-k}$, the solutions of the original system $Ax = b$ are
$$(\hat{x}'(z), z) \qquad \forall z \in \mathbb{R}^{n-k}$$

15 Often there is more than one, i.e., there is some freedom in choosing which equations to delete and which unknowns are "fictitious".

Example 632 Consider again the system
$$\begin{cases} x_1 + 2x_2 + 3x_3 = 3 \\ 6x_1 + 4x_2 + 2x_3 = 7 \\ 5x_1 + 2x_2 - x_3 = 4 \end{cases}$$
of Example 625, which we showed to be solvable because $\rho(A) = \rho(A \mid b) = 2$.
Since the last equation is redundant (recall that it is the difference between the second and first equations), one has
$$\underset{2 \times 2}{A'} = \begin{bmatrix} 1 & 2 \\ 6 & 4 \end{bmatrix}; \quad \underset{2 \times 1}{B} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}; \quad \underset{1 \times 2}{C} = \begin{bmatrix} 5 & 2 \end{bmatrix}; \quad \underset{1 \times 1}{D} = [-1]; \quad \underset{2 \times 1}{b'} = \begin{bmatrix} 3 \\ 7 \end{bmatrix}$$
so that, setting $b'_z = b' - Bz$, the square system (13.39) becomes $A'x' = b'_z$, that is,
$$\begin{cases} x_1 + 2x_2 = 3 - 3z \\ 6x_1 + 4x_2 = 7 - 2z \end{cases}$$
In other words, the procedure consists in deleting the redundant equation and in assigning an arbitrary value z to the unknown $x_3$.
Since $\det A' = -8 \neq 0$, by Cramer's rule the infinitely many solutions are described as
$$x_1 = \frac{-2 - 8z}{-8} = \frac{1}{4} + z; \qquad x_2 = \frac{-11 + 16z}{-8} = \frac{11}{8} - 2z; \qquad x_3 = z$$
for every $z \in \mathbb{R}$. We can verify it:
$$\text{First equation}: \quad \left( \frac{1}{4} + z \right) + 2\left( \frac{11}{8} - 2z \right) + 3z = \frac{1 + 11}{4} + 0 \cdot z = 3$$
$$\text{Second equation}: \quad 6\left( \frac{1}{4} + z \right) + 4\left( \frac{11}{8} - 2z \right) + 2z = \frac{6 + 22}{4} + 0 \cdot z = 7$$

Alternatively, we could have noted that the second equation is the sum of the first and third ones, and could then have deleted the second equation rather than the third one. In this way the system reduces to
$$\begin{cases} x_1 + 2x_2 + 3x_3 = 3 \\ 5x_1 + 2x_2 - x_3 = 4 \end{cases}$$
We can now assign an arbitrary value to the first unknown, say $x_1 = \tilde{z}$, rather than to the third one.16 This yields the system
$$\begin{cases} 2x_2 + 3x_3 = 3 - \tilde{z} \\ 2x_2 - x_3 = 4 - 5\tilde{z} \end{cases}$$
that is, $A''x' = b''_{\tilde{z}}$, with matrix
$$A'' = \begin{bmatrix} 2 & 3 \\ 2 & -1 \end{bmatrix}$$
and vectors $x' = (x_2, x_3)^T$ and $b''_{\tilde{z}} = (3 - \tilde{z}, 4 - 5\tilde{z})^T$. Since $\det A'' = -8 \neq 0$, Cramer's rule expresses the infinitely many solutions as
$$x_1 = \tilde{z}; \qquad x_2 = \frac{15 - 16\tilde{z}}{8}; \qquad x_3 = -\frac{1}{4} + \tilde{z}$$
for every $\tilde{z} \in \mathbb{R}$.
In the first way we got $x_1 = 1/4 + z$, while in the second one $x_1 = \tilde{z}$. Therefore $\tilde{z} = 1/4 + z$. With such a value the solutions just found,
$$x_1 = \tilde{z} = \frac{1}{4} + z$$
$$x_2 = \frac{15 - 16\tilde{z}}{8} = \frac{15 - 16\left( \frac{1}{4} + z \right)}{8} = \frac{15 - 4 - 16z}{8} = \frac{11}{8} - 2z$$
and
$$x_3 = -\frac{1}{4} + \tilde{z} = -\frac{1}{4} + \frac{1}{4} + z = z$$
become the old ones. The two sets of solutions are the same, just written using two different parameters. We invite the reader to delete the first equation and redo the calculations. N

16 The tilde on z helps to distinguish this case from the previous one.

Example 633 Consider the homogeneous system
$$\begin{cases} 2x_1 - x_2 + 2x_3 + 2x_4 = 0 \\ x_1 - x_2 - 2x_3 - 4x_4 = 0 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases}$$
If we treat $x_4$ as a known term, so that $x' = (x_1, x_2, x_3)$ and $z = x_4$, we can write the system in the "square" form (13.39) as $A'x' = -Bz$ with
$$A' = \begin{bmatrix} 2 & -1 & 2 \\ 1 & -1 & -2 \\ 1 & -2 & -2 \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} 2 \\ -4 \\ -10 \end{bmatrix}$$
The square matrix $A'$ is invertible, with
$$(A')^{-1} = \begin{bmatrix} \frac{1}{3} & 1 & -\frac{2}{3} \\ 0 & 1 & -1 \\ \frac{1}{6} & -\frac{1}{2} & \frac{1}{6} \end{bmatrix}$$
Since
$$(A')^{-1}(-Bz) = \begin{bmatrix} \frac{1}{3} & 1 & -\frac{2}{3} \\ 0 & 1 & -1 \\ \frac{1}{6} & -\frac{1}{2} & \frac{1}{6} \end{bmatrix} \begin{bmatrix} -2x_4 \\ 4x_4 \\ 10x_4 \end{bmatrix} = \begin{bmatrix} -\frac{10}{3}x_4 \\ -6x_4 \\ -\frac{2}{3}x_4 \end{bmatrix}$$
in view of Cramer's Theorem we conclude that the vectors x of $\mathbb{R}^4$ of the form
$$x = \left( -\frac{10}{3}t,\; -6t,\; -\frac{2}{3}t,\; t \right)$$
solve the system for every $t \in \mathbb{R}$. This confirms what was found in Section 3.7. N

The solution procedure for systems explained above, based on Cramer's rule, is theoretically elegant. However, from the computational viewpoint there is a better procedure, which we do not discuss, known as the Gauss method and based on the Gaussian elimination procedure.
13.10 Coda: Hahn-Banach et similia


So far we have considered linear functions defined on the entire space $\mathbb{R}^n$. However, they can be defined on any vector subspace V of $\mathbb{R}^n$.

Definition 634 A function $f : V \to \mathbb{R}$ is said to be linear if
$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$
for every $x, y \in V$ and every $\alpha, \beta \in \mathbb{R}$.

Since V is closed with respect to sums and multiplications by a scalar, we have that $\alpha x + \beta y \in V$, and therefore this definition is well posed and generalizes Definition 529.

Example 635 Consider in $\mathbb{R}^3$ the vector subspace
$$V = \{(x_1, x_2, 0) : x_1, x_2 \in \mathbb{R}\}$$
generated by the versors $e^1$ and $e^2$. It is a "zero level" plane in $\mathbb{R}^3$. The function $f : V \to \mathbb{R}$ defined by $f(x) = x_1 + x_2$ for every $x \in V$ is linear. N

Given a linear function $f : V \to \mathbb{R}$ defined on a vector subspace of $\mathbb{R}^n$, one may wonder whether it can be extended to the entire space $\mathbb{R}^n$ while still preserving linearity or if, instead, it remains "trapped" in the subspace V without having any possible extension to $\mathbb{R}^n$. More formally, we wonder whether there is a linear function $\bar{f} : \mathbb{R}^n \to \mathbb{R}$ such that $\bar{f}_{|V} = f$, that is,
$$\bar{f}(x) = f(x) \qquad \forall x \in V$$
This is quite an important problem, as we will see shortly, also for applications. Fortunately, the following positive result holds.

Theorem 636 (Hahn-Banach) Let V be a vector subspace of $\mathbb{R}^n$. Every linear function $f : V \to \mathbb{R}$ can be linearly extended to $\mathbb{R}^n$.

Proof Let $\dim V = k \le n$ and let $\{x^1, \dots, x^k\}$ be a basis for $V$. If $k = n$, there is nothing to prove since $V = \mathbb{R}^n$. Otherwise, by Theorem 87, there are $n - k$ vectors $x^{k+1}, \dots, x^n$ such that the overall set $\{x^1, \dots, x^n\}$ is a basis for $\mathbb{R}^n$. Let $\{r_{k+1}, \dots, r_n\}$ be an arbitrary set of $n - k$ real numbers. By Theorem 84, note that for each vector $x$ in $\mathbb{R}^n$ there exists a unique collection of scalars $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ such that $x = \sum_{i=1}^n \alpha_i x^i$. Define $\bar f : \mathbb{R}^n \to \mathbb{R}$ to be such that $\bar f(x) = \sum_{i=1}^k \alpha_i f(x^i) + \sum_{i=k+1}^n \alpha_i r_i$. Since for each vector $x$ the collection $\{\alpha_i\}_{i=1}^n$ is unique, we have that $\bar f$ is well defined and linear (why?). Note also that

$$\bar f(x^i) = \begin{cases} f(x^i) & \text{for } i = 1, \dots, k \\ r_i & \text{for } i = k+1, \dots, n \end{cases}$$

Since $\{x^1, \dots, x^k\}$ is a basis for $V$, for every $x \in V$ there are $k$ scalars $\{\alpha_i\}_{i=1}^k$ such that $x = \sum_{i=1}^k \alpha_i x^i$. Hence,

$$\bar f(x) = \bar f\left(\sum_{i=1}^k \alpha_i x^i\right) = \sum_{i=1}^k \alpha_i \bar f(x^i) = \sum_{i=1}^k \alpha_i f(x^i) = f\left(\sum_{i=1}^k \alpha_i x^i\right) = f(x)$$
We conclude that the linear function $\bar f : \mathbb{R}^n \to \mathbb{R}$ extends the linear function $f : V \to \mathbb{R}$ to $\mathbb{R}^n$.
As one can clearly infer from the proof, such an extension is far from unique: to every set of scalars $\{r_i\}_{i=k+1}^n$ a different extension is associated.
Example 637 Consider the previous example, with the plane $V = \{(x_1, x_2, 0) : x_1, x_2 \in \mathbb{R}\}$ of $\mathbb{R}^3$ and the linear function $f : V \to \mathbb{R}$ defined by $f(x) = x_1 + x_2$. By the Hahn-Banach Theorem, there is a linear function $\bar f : \mathbb{R}^3 \to \mathbb{R}$ such that $\bar f(x) = f(x)$ for each $x \in V$. For example, $\bar f(x) = x_1 + x_2 + x_3$ for each $x \in \mathbb{R}^3$, but also $\bar f(x) = x_1 + x_2 + \beta x_3$ is an extension, for each $\beta \in \mathbb{R}$. This confirms the multiplicity of the extensions. N
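The construction used in the proof of Theorem 636 is easy to carry out numerically. A minimal Python sketch for Example 637 (the value $r$ assigned to the completing basis vector is an arbitrary choice of ours, as the proof allows):

import numpy as np

# Example 637: V = span{e1, e2} in R^3 and f(x) = x1 + x2 on V.
# Complete the basis with e3 and assign to it an arbitrary value r.
r = 7.0   # any real number gives a (different) linear extension
f_bar = lambda x: 1.0 * x[0] + 1.0 * x[1] + r * x[2]   # f(e1) = f(e2) = 1

x = np.array([2.0, -5.0, 0.0])        # a vector of V (third component zero)
print(f_bar(x), x[0] + x[1])          # -3.0 -3.0: f_bar agrees with f on V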
Although it may appear as a fairly innocuous result, the Hahn-Banach Theorem is very powerful. Let us see one of its remarkable consequences by extending Riesz's Theorem to linear functions defined on subspaces.17
Theorem 638 Let $V$ be a vector subspace of $\mathbb{R}^n$. A function $f : V \to \mathbb{R}$ is linear if and only if there exists a vector $\alpha \in \mathbb{R}^n$ such that

$$f(x) = \alpha \cdot x \qquad \forall x \in V \qquad (13.40)$$

Such a vector is unique if $V = \mathbb{R}^n$.
Proof We prove the "only if", since the converse is obvious. Let $f : V \to \mathbb{R}$ be a linear function. By the Hahn-Banach Theorem, there is a linear function $\bar f : \mathbb{R}^n \to \mathbb{R}$ such that $\bar f(x) = f(x)$ for each $x \in V$. By Riesz's Theorem, there is $\alpha \in \mathbb{R}^n$ such that $\bar f(x) = \alpha \cdot x$ for each $x \in \mathbb{R}^n$. Therefore $f(x) = \bar f(x) = \alpha \cdot x$ for every $x \in V$, as desired.
Conceptually, the main novelty relative to this version of Riesz's Theorem is the loss of the uniqueness of the vector $\alpha$. Indeed, the proof shows that such a vector is determined by the extension $\bar f$ whose existence is guaranteed by the Hahn-Banach Theorem. Yet, such extensions are far from being unique, thus implying the non-uniqueness of the vector $\alpha$.
Example 639 Going back to the previous examples, we already noted that all the linear functions $\bar f : \mathbb{R}^3 \to \mathbb{R}$ defined by $\bar f(x) = x_1 + x_2 + \beta x_3$, with $\beta \in \mathbb{R}$, extend $f$ to $\mathbb{R}^3$. By setting $\alpha = (1, 1, \beta)$, we have that $\bar f(x) = \alpha \cdot x$ for every $\beta \in \mathbb{R}$, so that

$$f(x) = \alpha \cdot x \qquad \forall x \in V$$

for every $\beta \in \mathbb{R}$. Hence, in this example there are infinitely many vectors $\alpha$ for which the representation (13.40) holds. N
The monotone version of the Hahn-Banach Theorem is of great importance.
Theorem 640 Let $V$ be a vector subspace of $\mathbb{R}^n$. Every (strictly) increasing linear function $f : V \to \mathbb{R}$ can be extended to $\mathbb{R}^n$ so as to remain (strictly) increasing and linear.

17 In Section 19.5 we will see an important financial application of this result.
Proof We prove the statement in the particular, yet important, case in which $V \cap \mathbb{R}^n_{++}$ is not empty and $f$ is increasing.18 We start by introducing a piece of notation which is going to be useful.

Let $W$ be a vector subspace of $\mathbb{R}^n$ such that $V \subseteq W$. Consider a linear function $\hat f : W \to \mathbb{R}$ such that $\hat f(x) = f(x)$ for all $x \in V$. In other words, $\hat f$ extends $f$ to the subspace $W$. Define $\dim \hat f = \dim W$. Now consider the set

$$N = \left\{k \in \{1, \dots, n\} : k = \dim \tilde f \text{ and } \tilde f \text{ is a monotone increasing linear extension of } f\right\}$$

Note that this set is not empty since it contains $\dim V$. For, $f$ is an extension of itself which is linear and monotone increasing by assumption. Consider now $\max N$. Being $N$ not empty and finite, $\max N$ is well defined. If $\max N = n$, then the statement is proved. Indeed, in such a case we can conclude that there exists a linear monotone increasing extension of $f$ whose domain is a vector subspace of $\mathbb{R}^n$ with dimension $n$, that is, the domain is $\mathbb{R}^n$ itself.

By contradiction, assume instead that $\bar n = \max N < n$. It means that, in looking for an extension of $f$ which preserves linearity and monotonicity, one can at most find a monotone increasing linear extension $\tilde f : W \to \mathbb{R}$ where $W$ is a vector subspace of dimension $\bar n < n$. Let $\{x^1, \dots, x^{\bar n}\}$ be a basis of $W$. Since $\bar n < n$, we can find at least a vector $x^{\bar n + 1} \in \mathbb{R}^n$ such that $\{x^1, \dots, x^{\bar n}, x^{\bar n + 1}\}$ is still linearly independent. Fix a vector $\bar x \in V \cap \mathbb{R}^n_{++}$. Clearly, we have that $\bar x \in V \subseteq W$ and for each $z \in \mathbb{R}^n$ there exists $m \in \mathbb{N}$ such that $-m\bar x \le z \le m\bar x$.

Let $U = \{x \in W : x \ge x^{\bar n + 1}\}$ and $L = \{y \in W : x^{\bar n + 1} \ge y\}$. Since $\bar x \in W$ and $-m\bar x \le x^{\bar n + 1} \le m\bar x$ for some $m \in \mathbb{N}$, both sets are not empty. Consider now $\tilde f(U)$ and $\tilde f(L)$, which are both subsets of the real line. Since $\tilde f$ is monotone increasing, it is immediate to see that each element of $\tilde f(U)$ is greater than or equal to each element of $\tilde f(L)$. By the separation property of the real line, we have that there exists $c \in \mathbb{R}$ such that $a \ge c \ge b$ for every $a \in \tilde f(U)$ and for every $b \in \tilde f(L)$. Observe also that each vector $x \in \mathrm{span}\{x^1, \dots, x^{\bar n}, x^{\bar n + 1}\}$ can be written in a unique way as $x = y_x + \lambda_x x^{\bar n + 1}$, where $y_x \in W$ and $\lambda_x \in \mathbb{R}$ (why?).

Define now $\hat f : \mathrm{span}\{x^1, \dots, x^{\bar n}, x^{\bar n + 1}\} \to \mathbb{R}$ to be such that $\hat f(x) = \tilde f(y_x) + \lambda_x c$ for every $x \in \mathrm{span}\{x^1, \dots, x^{\bar n}, x^{\bar n + 1}\}$. We leave to the reader to verify that $\hat f$ is indeed linear and that $\hat f$ extends $f$. Note instead that $\hat f$ is positive, that is, $\hat f(x) \ge 0$ for all $x \in \mathrm{span}\{x^1, \dots, x^{\bar n}, x^{\bar n + 1}\} \cap \mathbb{R}^n_+$. Otherwise, there would exist $x \in \mathrm{span}\{x^1, \dots, x^{\bar n}, x^{\bar n + 1}\}$ such that $x \ge 0$ and $\hat f(x) < 0$. If $\lambda_x = 0$, then $y_x = y_x + \lambda_x x^{\bar n + 1} = x \ge 0$ and this would yield that $y_x \ge 0$, that is, since $\tilde f$ is monotone increasing, $0 > \hat f(x) = \tilde f(y_x) \ge 0$, a contradiction. If $\lambda_x \ne 0$, say $\lambda_x > 0$, then $x^{\bar n + 1} \ge -y_x/\lambda_x$ and $c < \tilde f(-y_x/\lambda_x)$. In other words, we have that $-y_x/\lambda_x$ belongs to $L$, thus $\tilde f(-y_x/\lambda_x) \in \tilde f(L)$ and $c \ge \tilde f(-y_x/\lambda_x) > c$, a contradiction (the case $\lambda_x < 0$ is handled in the same way, with $U$ in place of $L$). Since we just showed that $\hat f$ must be positive, by Proposition 538, this implies that $\hat f$ is monotone increasing as well. To sum up, we just constructed a function (namely $\hat f$) which extends $f$ to a vector subspace which has dimension $\bar n + 1$ (namely $\mathrm{span}\{x^1, \dots, x^{\bar n}, x^{\bar n + 1}\}$), thus $\max N \ge \bar n + 1$. At the same time, our working hypothesis was that $\bar n = \max N$, thus reaching a contradiction.
In Example 637, the function $f(x) = x_1 + x_2$ is linear and strictly increasing on $V = \{(x_1, x_2, 0) : x_1, x_2 \in \mathbb{R}\}$, and any $\bar f(x) = x_1 + x_2 + \beta x_3$ with $\beta > 0$ is a strictly increasing linear extension of it on $\mathbb{R}^3$. Note that there may be non-monotone linear extensions: it is enough to consider $\bar f(x)$ with $\beta < 0$.

18 In financial applications this assumption is often satisfied (see Section 19.5). The proof of the more general case, as well as of the strictly increasing version of the result, relies on mathematical facts that the reader will encounter in more advanced courses.
The last theorem and Proposition 539 lead to the following monotone version of Riesz's Theorem.

Proposition 641 Let $V$ be a vector subspace of $\mathbb{R}^n$. A function $f : V \to \mathbb{R}$ is linear and (strictly) increasing if and only if there exists a (strongly) positive vector $\alpha \in \mathbb{R}^n_+$ such that

$$f(x) = \alpha \cdot x \qquad \forall x \in V$$

Such a vector is unique if $V = \mathbb{R}^n$.
A similar result holds for strong monotonicity. In this regard, note that the function $f(x) = x_1 + x_2$ is strongly positive, and so is $\bar f(x) = x_1 + x_2 + \beta x_3$ with $\beta > 0$.
A nice dividend of the Hahn-Banach Theorem is the following extension result for affine functions, which will be introduced momentarily in the next chapter (they play a key role in applications; cf. Chapter 34).
Theorem 642 Let $C$ be a convex subset of $\mathbb{R}^n$. If $f : C \to \mathbb{R}$ is affine, then there exists an affine extension of $f$ to the entire space $\mathbb{R}^n$.

Proof. We begin with a Claim.
Claim Let $C$ be a convex subset of $\mathbb{R}^n$. If $f : C \to \mathbb{R}$ is affine, then for each triple $x, y, z \in C$ and weights $\alpha, \beta, \gamma \in \mathbb{R}$ such that $\alpha + \beta + \gamma = 1$ and $\alpha x + \beta y + \gamma z \in C$,

$$f(\alpha x + \beta y + \gamma z) = \alpha f(x) + \beta f(y) + \gamma f(z) \qquad (13.41)$$
Proof of the Claim We start by proving that the statement is true when $\gamma = 0$. Let $x, y \in C$ and $\alpha, \beta \in \mathbb{R}$ be such that $\alpha + \beta = 1$ as well as $\alpha x + \beta y \in C$. We have two cases: either $\alpha, \beta \ge 0$ or at least one of the two is strictly negative. In the first case, since $\alpha + \beta = 1$, we have that $\alpha \le 1$. Since $f$ is affine and $\beta = 1 - \alpha$, this implies that

$$f(\alpha x + \beta y) = f(\alpha x + (1-\alpha)y) = \alpha f(x) + (1-\alpha) f(y) = \alpha f(x) + \beta f(y) \qquad (13.42)$$

In the second case, without loss of generality, we can assume that $\beta < 0$. Since $\alpha + \beta = 1$, we have that $\alpha = 1 - \beta > 1$. Define $w = \alpha x + (1-\alpha)y = \alpha x + \beta y \in C$. Define $\lambda = 1/\alpha$ and note that $\lambda \in (0, 1)$. Observe that $x = \lambda w + (1-\lambda)y$. Since $f$ is affine, we have that

$$f(x) = f(\lambda w + (1-\lambda)y) = \lambda f(w) + (1-\lambda)f(y) = \frac{1}{\alpha} f(\alpha x + (1-\alpha)y) + \left(1 - \frac{1}{\alpha}\right) f(y)$$

By rearranging terms, we get that (13.42) holds. We next prove that (13.41) holds. Let us now consider the more general case, that is, $x, y, z \in C$ and $\alpha, \beta, \gamma \in \mathbb{R}$ such that $\alpha + \beta + \gamma = 1$ and $\alpha x + \beta y + \gamma z \in C$. We split the proof in three cases:
1. All three scalars are positive, i.e., $\alpha, \beta, \gamma \ge 0$. Since $\alpha + \beta + \gamma = 1$, we have that $\alpha x + \beta y + \gamma z$ is a standard convex combination. Since $f$ is affine, (13.41) holds.
2. Only two scalars are positive, say $\alpha, \beta \ge 0$. Define $w = \frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha+\beta} y$ and $\lambda = \alpha + \beta$. Since $\alpha + \beta + \gamma = 1$, then $\lambda > 0$. Since $C$ is convex and $x, y \in C$, we have that $w \in C$. It is immediate to check that $\lambda w + (1-\lambda)z = \alpha x + \beta y + \gamma z \in C$, where $\lambda \in \mathbb{R}$. Since (13.42) holds, we have that

$$f(\alpha x + \beta y + \gamma z) = f(\lambda w + (1-\lambda)z) = \lambda f(w) + (1-\lambda)f(z)$$
$$= (\alpha + \beta) f\left(\frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha+\beta} y\right) + (1-\lambda)f(z)$$
$$= \alpha f(x) + \beta f(y) + (1-\lambda)f(z) = \alpha f(x) + \beta f(y) + \gamma f(z)$$

proving the statement.
3. One scalar is positive, say $\alpha, \beta < 0$ (so that $\gamma > 0$). Define $w = \frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha+\beta} y$ and $\lambda = 1 - \gamma$. It follows that $1 - \gamma = \alpha + \beta < 0$ and $\frac{\alpha}{\alpha+\beta}, \frac{\beta}{\alpha+\beta} > 0$ as well as $\frac{\alpha}{\alpha+\beta} + \frac{\beta}{\alpha+\beta} = 1$. Since $C$ is convex and $x, y \in C$, this implies that $w \in C$. It is immediate to check that $\gamma z + (1-\gamma)w = \alpha x + \beta y + \gamma z \in C$, where $\gamma \in \mathbb{R}$. Since (13.42) holds, we have that

$$f(\alpha x + \beta y + \gamma z) = f(\gamma z + (1-\gamma)w) = \gamma f(z) + (1-\gamma)f(w)$$
$$= \gamma f(z) + (\alpha + \beta) f\left(\frac{\alpha}{\alpha+\beta} x + \frac{\beta}{\alpha+\beta} y\right)$$
$$= \alpha f(x) + \beta f(y) + \gamma f(z)$$

proving the statement.
We can now start proving the main statement. We do so by further assuming that $0 \in C$ and $f(0) = 0$. We will show that $f$ admits a linear extension to $\mathbb{R}^n$. This will prove the statement in this particular case (why?). If $C = \{0\}$, then any linear function extends $f$, and so any linear function is an affine extension of $f$. Assume $C \ne \{0\}$. Since $\{0\} \ne C \subseteq \mathbb{R}^n$, there exists a linearly independent collection $\{x^1, \dots, x^k\} \subseteq C$ with $1 \le k \le n$; take $k$ to be the maximum number of linearly independent vectors of $C$. Note that $C \subseteq \mathrm{span}\{x^1, \dots, x^k\}$. Otherwise, we would have that there exists a vector $\bar x$ in $C$ that does not belong to $\mathrm{span}\{x^1, \dots, x^k\}$. Now, observe that if we consider a collection $\{\gamma\} \cup \{\gamma_i\}_{i=1}^k \subseteq \mathbb{R}$ of $k+1$ scalars, we can say that if $\gamma \bar x + \sum_{i=1}^k \gamma_i x^i = 0$, then we have two cases: either $\gamma \ne 0$ or $\gamma = 0$. In the former case, we could conclude that $\bar x = \sum_{i=1}^k (-\gamma_i/\gamma) x^i \in \mathrm{span}\{x^1, \dots, x^k\}$, a contradiction with $\bar x \notin \mathrm{span}\{x^1, \dots, x^k\}$. In the latter case, we could conclude that $\sum_{i=1}^k \gamma_i x^i = 0$. Since the vectors $x^1, \dots, x^k$ are linearly independent, it follows that $\gamma_i = 0$ for all $i \in \{1, \dots, k\}$, proving that $x^1, \dots, x^k, \bar x$ are linearly independent, a contradiction with the fact that $\{x^1, \dots, x^k\}$ contains the maximum number of linearly independent vectors of $C$. Define $\bar f : \mathrm{span}\{x^1, \dots, x^k\} \to \mathbb{R}$ by $\bar f(x) = \sum_{i=1}^k \lambda_i f(x^i)$, where $\{\lambda_i\}_{i=1}^k$ is the unique collection of scalars such that $x = \sum_{i=1}^k \lambda_i x^i$. By construction, $\bar f$ is linear (why?). Next, we show it extends $f$. Let $x \in C$. There exists a unique collection of scalars $\{\lambda_i\}_{i=1}^k$ such that $x = \sum_{i=1}^k \lambda_i x^i$. Divide these scalars in three sets

$$P = \{i \in \{1, \dots, k\} : \lambda_i > 0\}, \quad N = \{i \in \{1, \dots, k\} : \lambda_i < 0\}, \quad Z = \{i \in \{1, \dots, k\} : \lambda_i = 0\}$$

Define $\lambda = \sum_{i \in P} \lambda_i$ and $\mu = \sum_{i \in N} \lambda_i$. We have four cases:
1. $\lambda = 0 = \mu$. Then, $\lambda_i = 0$ for all $i \in \{1, \dots, k\}$ and $x = 0$, so that

$$\bar f(x) = \sum_{i=1}^k \lambda_i f(x^i) = 0 = f(0) = f(x)$$
2. $\lambda \ne 0$ and $\mu = 0$. Then, $\lambda_i = 0$ for all $i \in N \cup Z$. Define $\beta_i = \lambda_i / \sum_{i \in P} \lambda_i > 0$ for all $i \in P$. It follows that $x = \sum_{i \in P} \lambda_i x^i$. Note that $\sum_{i \in P} \beta_i = 1$ and $\sum_{i \in P} \lambda_i x^i = \lambda \sum_{i \in P} \beta_i x^i = \lambda \sum_{i \in P} \beta_i x^i + (1-\lambda) \cdot 0$. We have that

$$\bar f(x) = \sum_{i=1}^k \lambda_i f(x^i) = \lambda \sum_{i \in P} \frac{\lambda_i}{\lambda} f(x^i) = \lambda \sum_{i \in P} \beta_i f(x^i)$$
$$= \lambda f\left(\sum_{i \in P} \beta_i x^i\right) = \lambda f\left(\sum_{i \in P} \beta_i x^i\right) + (1-\lambda) f(0)$$
$$= f\left(\lambda \sum_{i \in P} \beta_i x^i + (1-\lambda) \cdot 0\right) = f\left(\sum_{i \in P} \lambda_i x^i\right) = f(x)$$
3. $\lambda = 0$ and $\mu \ne 0$. Then, $\lambda_i = 0$ for all $i \in P \cup Z$. Define $\gamma_i = \lambda_i / \sum_{i \in N} \lambda_i > 0$ for all $i \in N$. It follows that $x = \sum_{i \in N} \lambda_i x^i$. Note that $\sum_{i \in N} \gamma_i = 1$ and $\sum_{i \in N} \lambda_i x^i = \mu \sum_{i \in N} \gamma_i x^i = \mu \sum_{i \in N} \gamma_i x^i + (1-\mu) \cdot 0$. We have that

$$\bar f(x) = \sum_{i=1}^k \lambda_i f(x^i) = \mu \sum_{i \in N} \frac{\lambda_i}{\mu} f(x^i) = \mu \sum_{i \in N} \gamma_i f(x^i)$$
$$= \mu f\left(\sum_{i \in N} \gamma_i x^i\right) = \mu f\left(\sum_{i \in N} \gamma_i x^i\right) + (1-\mu) f(0)$$
$$= f\left(\mu \sum_{i \in N} \gamma_i x^i + (1-\mu) \cdot 0\right) = f\left(\sum_{i \in N} \lambda_i x^i\right) = f(x)$$
4. $\lambda \ne 0$ and $\mu \ne 0$. Define $\beta_i$ and $\gamma_i$ as in points 2 and 3. We have that

$$\bar f(x) = \sum_{i=1}^k \lambda_i f(x^i) = \sum_{i \in P \cup N} \lambda_i f(x^i) = \sum_{i \in P} \lambda_i f(x^i) + \sum_{i \in N} \lambda_i f(x^i)$$
$$= \lambda \sum_{i \in P} \beta_i f(x^i) + \mu \sum_{i \in N} \gamma_i f(x^i) = \lambda f\left(\sum_{i \in P} \beta_i x^i\right) + \mu f\left(\sum_{i \in N} \gamma_i x^i\right)$$
$$= \lambda f\left(\sum_{i \in P} \beta_i x^i\right) + \mu f\left(\sum_{i \in N} \gamma_i x^i\right) + (1 - \lambda - \mu) f(0)$$
$$= f\left(\lambda \sum_{i \in P} \beta_i x^i + \mu \sum_{i \in N} \gamma_i x^i + (1 - \lambda - \mu) \cdot 0\right) = f\left(\sum_{i \in P \cup N} \lambda_i x^i\right) = f(x)$$

where the penultimate equality follows from the Claim, since $\lambda + \mu + (1 - \lambda - \mu) = 1$.
Thus, we have that $\bar f$ is a linear extension of $f$ to $\mathrm{span}\{x^1, \dots, x^k\}$. By the Hahn-Banach Theorem, $\bar f$ can then be linearly extended to the entire space $\mathbb{R}^n$, proving the statement for the case $0 \in C$ and $f(0) = 0$.

Now assume that either $0 \notin C$ or $f(0) \ne 0$. Let $\bar x \in C$. Define $D = \{y \in \mathbb{R}^n : y = x - \bar x \text{ for some } x \in C\}$. As the reader can verify, $D$ has three notable features: (a) $D$ is convex, (b) for each $y \in D$ there exists a unique vector $x_y \in C$ such that $y = x_y - \bar x$, (c) $0 \in D$. Define the function $\hat f : D \to \mathbb{R}$ to be such that $\hat f(y) = f(x_y) - f(\bar x)$ for every $y \in D$. The reader can verify that $\hat f$ is affine and such that $\hat f(0) = 0$. By the previous part of the proof, there exists a linear extension of $\hat f$ to $\mathbb{R}^n$. Denote such an extension by $\bar f$ and define $k = f(\bar x) - \bar f(\bar x) \in \mathbb{R}$. It follows that for every $x \in C$

$$\bar f(x) = \bar f(x - \bar x) + \bar f(\bar x) = \hat f(x - \bar x) + \bar f(\bar x) = f(x) - f(\bar x) + \bar f(\bar x) = f(x) - k$$

that is, $f$ is extended to the entire space $\mathbb{R}^n$ by the affine function $\bar f + k$.
Chapter 14

Concave functions

14.1 Convex sets

14.1.1 Definition and basic properties
In economics it is often important to be able to combine the different alternatives among which decision makers have to choose. For example, if $x$ and $y$ are bundles of goods or vectors of inputs, we may want to consider also their mixtures $\lambda x + (1-\lambda)y$, with $\lambda \in [0,1]$. If $x = (10, 0)$ and $y = (0, 10)$ are vectors of inputs, the first one with ten units of iron and zero of copper, the second one with zero units of iron and ten of copper, we may want to consider also their combination

$$\frac{1}{2}(0, 10) + \frac{1}{2}(10, 0) = (5, 5)$$

that consists of five units of both materials.

The sets that always allow such combinations are called convex. They play a key role in economics.
Definition 643 A set $C$ in $\mathbb{R}^n$ is said to be convex if, for every pair of points $x, y \in C$,

$$\lambda x + (1-\lambda)y \in C \qquad \forall \lambda \in [0,1]$$

The meaning of convexity is based on the notion of convex (linear) combination:

$$\lambda x + (1-\lambda)y$$

which, when $\lambda$ varies in $[0,1]$, represents geometrically the points of the segment

$$\{\lambda x + (1-\lambda)y : \lambda \in [0,1]\} \qquad (14.1)$$

that joins $x$ with $y$. A set $C$ is convex if it contains the segment (14.1) that joins any two points $x$ and $y$ of $C$.
Graphically, a convex set:

[Figure: a convex set]

and a non-convex set:

[Figure: a non-convex set]

Other examples:

[Figures: further examples of convex and non-convex sets]
Example 644 (i) On the real line the only convex sets are the intervals, bounded or unbounded. Convex sets can, therefore, be seen as the generalization to $\mathbb{R}^n$ of the notion of interval. (ii) The neighborhoods $B_\varepsilon(x) = \{y \in \mathbb{R}^n : \|x - y\| < \varepsilon\}$ of $\mathbb{R}^n$ are convex. Indeed, let $y', y'' \in B_\varepsilon(x)$ and $\lambda \in [0,1]$. By the properties of the norm (Proposition 102),

$$\|x - (\lambda y' + (1-\lambda)y'')\| = \|\lambda x + (1-\lambda)x - \lambda y' - (1-\lambda)y''\| = \|\lambda(x - y') + (1-\lambda)(x - y'')\|$$
$$\le \lambda\|x - y'\| + (1-\lambda)\|x - y''\| < \varepsilon$$

Therefore, $\lambda y' + (1-\lambda)y'' \in B_\varepsilon(x)$, which proves that the set $B_\varepsilon(x)$ is convex. N
Let us see a first topological property of convex sets (for brevity, we omit its proof).

Proposition 645 The closure and the interior of a convex set are convex sets.

The converse does not hold: a non-convex set may well have a convex interior or closure. For example, the set $[2,5] \cup \{7\} \subseteq \mathbb{R}$ is not convex (it is not an interval), but its interior $(2,5)$ is; the set $(0,1) \cup (1,5) \subseteq \mathbb{R}$ is not convex, but its closure $[0,5]$ is. Even more interesting is to consider a square in the plane and to remove from it a point on a side that is not a vertex: the resulting set is not convex, yet both its closure and its interior are so.
Proposition 646 The intersection of any collection of convex sets is a convex set.

In contrast, a union of convex sets is not necessarily convex. For example, $(0,1) \cup (2,5)$ is not a convex set although both sets $(0,1)$ and $(2,5)$ are so.

Proof Let $\{C_i\}_{i \in I}$ be any collection of convex sets, where $i$ runs over a finite or infinite index set $I$. Let $C = \bigcap_{i \in I} C_i$. The empty set is trivially convex, so if $C = \emptyset$ the result holds. Suppose, therefore, that $C \ne \emptyset$. Let $x, y \in C$ and let $\lambda \in [0,1]$. We want to prove that $\lambda x + (1-\lambda)y \in C$. Since $x, y \in C_i$ for each $i$, we have that $\lambda x + (1-\lambda)y \in C_i$ for each $i$ because each set $C_i$ is convex. Hence, $\lambda x + (1-\lambda)y \in \bigcap_{i \in I} C_i$, as desired.
Notation Throughout the chapter, $C$ denotes a convex set in $\mathbb{R}^n$.
14.1.2 Back to high school: polytopes
The points of the segment (14.1) are convex combinations of the vectors $x$ and $y$. In general, given a collection $\{x^i\}_{i=1}^k$ of vectors, a linear combination

$$\sum_{i=1}^k \alpha_i x^i$$

is called a convex (linear) combination of the vectors $\{x^i\}_{i=1}^k$ if $\alpha_i \ge 0$ for each $i$ and $\sum_{i=1}^k \alpha_i = 1$. In the case $k = 2$, $\alpha_1 + \alpha_2 = 1$ implies $\alpha_2 = 1 - \alpha_1$; hence convex combinations of two vectors have the form $\alpha x + (1-\alpha)y$ with $\alpha \in [0,1]$.

Via convex combinations we can define a basic class of convex sets.
Definition 647 Given a finite collection of vectors $\{x^i\}_{i=1}^k$ of $\mathbb{R}^n$, the polytope that they generate is the set

$$\left\{\sum_{i=1}^k \alpha_i x^i : \sum_{i=1}^k \alpha_i = 1 \text{ and } \alpha_i \ge 0 \text{ for every } i\right\}$$

of all their convex combinations.
Clearly, polytopes are convex sets. In particular, the polytope generated by two vectors $x$ and $y$ is the segment that joins them. On the plane, polytopes have simple geometric interpretations that take us back to high school. Given three vectors $x$, $y$ and $z$ of the plane (not aligned), the polytope1

$$\{\alpha_1 x + \alpha_2 y + (1 - \alpha_1 - \alpha_2)z : \alpha_1, \alpha_2 \ge 0 \text{ and } \alpha_1 + \alpha_2 \le 1\}$$

is the triangle that has them as vertices:2

[Figure: the triangle with vertices x, y and z]

1 Note that $\{(\alpha_1, \alpha_2, \alpha_3) \in \mathbb{R}^3_+ : \alpha_1 + \alpha_2 + \alpha_3 = 1\} = \{(\alpha_1, \alpha_2, 1 - \alpha_1 - \alpha_2) : \alpha_1, \alpha_2 \ge 0 \text{ and } \alpha_1 + \alpha_2 \le 1\}$.
2 A caveat: if, for instance, $x$ lies on the segment that joins $y$ and $z$ (i.e., the vectors are linearly dependent), the triangle generated by $x$, $y$ and $z$ reduces to that segment. In this case, the vertices are only $y$ and $z$. Similar remarks apply to general polygons.
In general, given $k$ vectors $x^1, \dots, x^k$ of the plane, the polytope

$$\left\{\sum_{i=1}^k \alpha_i x^i : \sum_{i=1}^k \alpha_i = 1 \text{ and } \alpha_i \ge 0 \text{ for every } i\right\} \qquad (14.2)$$

is the polygon that has them as vertices. The polygons that we studied in high school can be regarded as the locus of all convex combinations of their vertices.
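Computationally, drawing random convex combinations of a set of vertices "fills in" the polytope they generate. A minimal Python sketch (with NumPy; the vertices are an illustrative choice of ours), where the Dirichlet distribution produces exactly weights $\alpha_i \ge 0$ with $\sum_i \alpha_i = 1$:

import numpy as np

rng = np.random.default_rng(0)

# Vertices of a triangle in the plane (an arbitrary illustrative choice).
vertices = np.array([[0., 0.], [4., 0.], [1., 3.]])

# Random weights with alpha_i >= 0 and sum alpha_i = 1.
alpha = rng.dirichlet(np.ones(3), size=5)

# Each row of `points` is a convex combination sum_i alpha_i x^i,
# hence a point of the polytope (14.2) generated by the vertices.
points = alpha @ vertices
print(points)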
Example 648 (i) The rhombus

[Figure: the rhombus with vertices (0,1), (1,0), (-1,0) and (0,-1)]

is the polytope generated by the four vectors $\{(0,1), (1,0), (-1,0), (0,-1)\}$, which are its vertices.

(ii) The five vectors $\{(0,1), (1,0), (-1,0), (0,-1), (1/2, 1/2)\}$ also generate the same rhombus

[Figure: the same rhombus, with the additional generator (1/2, 1/2) marked]

because the added vector $(1/2, 1/2)$ already belonged to the rhombus. As mentioned in the last footnote, not all the vectors that generate a polygon have to be among its vertices. N
Proposition 649 A set is convex if and only if it is closed with respect to all convex combinations of its own elements.

In other words, a set is convex if and only if it contains all the polytopes generated by its elements (in the plane, all polygons whose vertices are elements of the set). Though they are defined in terms of segments, convex sets actually contain all polytopes. In symbols, $C$ is convex if and only if $\sum_{i=1}^k \alpha_i x^i \in C$ for every finite collection $\{x^i\}_{i=1}^k$ of vectors of $C$ and every collection $\{\alpha_i\}_{i=1}^k$ of positive scalars such that $\sum_{i=1}^k \alpha_i = 1$.
Proof The "if" is obvious because, by considering the convex combinations of $n = 2$ elements, we get Definition 643. We prove the "only if". Let $C$ be convex and let $\{x^i\}_{i=1}^n$ be a collection of vectors of $C$ and $\{\alpha_i\}_{i=1}^n$ a collection of scalars such that $\alpha_i \ge 0$ for each $i = 1, \dots, n$ and $\sum_{i=1}^n \alpha_i = 1$. We want to prove that $\sum_{i=1}^n \alpha_i x^i \in C$. By Definition 643, this is true for $n = 2$. We proceed by induction on $n$: we assume that it is true for $n - 1$ (induction hypothesis) and show that this implies that the property holds also for $n$. If $\alpha_n = 1$, the claim is trivial, so assume $\alpha_n < 1$. We have:

$$\sum_{i=1}^n \alpha_i x^i = \sum_{i=1}^{n-1} \alpha_i x^i + \alpha_n x^n = (1 - \alpha_n)\sum_{i=1}^{n-1} \frac{\alpha_i}{1 - \alpha_n} x^i + \alpha_n x^n$$

By the induction hypothesis, we have:

$$\sum_{i=1}^{n-1} \frac{\alpha_i}{1 - \alpha_n} x^i \in C$$

Hence, the convexity of $C$ implies:

$$(1 - \alpha_n)\sum_{i=1}^{n-1} \frac{\alpha_i}{1 - \alpha_n} x^i + \alpha_n x^n \in C$$

We conclude that $C$ is closed with respect to the convex combinations of $n$ elements, as desired.
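The induction step of the proof can be mirrored in code: an $n$-point convex combination can be computed through two-point combinations only. A minimal Python sketch of this reduction (an illustration of ours):

import numpy as np

def convex_combination_by_pairs(points, alphas):
    """Compute sum_i alphas[i]*points[i] using only two-point convex
    combinations, mirroring the induction of Proposition 649."""
    x, a = points[0], alphas[0]
    for y, b in zip(points[1:], alphas[1:]):
        if a + b == 0:                    # both weights zero so far: skip
            continue
        lam = a / (a + b)                 # relative weight of the running point
        x = lam * x + (1 - lam) * y       # a two-point convex combination
        a = a + b
    return x

pts = [np.array([0., 0.]), np.array([4., 0.]), np.array([1., 3.])]
w = [0.2, 0.3, 0.5]
print(convex_combination_by_pairs(pts, w))   # equals 0.2 p1 + 0.3 p2 + 0.5 p3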
Example 650 Given the versors $e^1, e^2, \dots, e^n$ of $\mathbb{R}^n$, the set

$$\Delta_{n-1} = \left\{\sum_{i=1}^n \alpha_i e^i : \sum_{i=1}^n \alpha_i = 1 \text{ and } \alpha_i \ge 0 \text{ for every } i\right\} = \left\{(\alpha_1, \dots, \alpha_n) : \sum_{i=1}^n \alpha_i = 1 \text{ and } \alpha_i \ge 0 \text{ for every } i\right\}$$

of all their convex combinations is called simplex. For instance, the simplex of the plane

$$\Delta_1 = \{\alpha_1 e^1 + \alpha_2 e^2 : \alpha_1, \alpha_2 \ge 0 \text{ and } \alpha_1 + \alpha_2 = 1\} = \{\alpha(1,0) + (1-\alpha)(0,1) : \alpha \in [0,1]\} = \{(\alpha, 1-\alpha) : \alpha \in [0,1]\}$$

is the segment that joins the versors $e^1$ and $e^2$. The simplex of $\mathbb{R}^3$ is:

$$\Delta_2 = \{\alpha_1 e^1 + \alpha_2 e^2 + \alpha_3 e^3 : \alpha_1, \alpha_2, \alpha_3 \ge 0 \text{ and } \alpha_1 + \alpha_2 + \alpha_3 = 1\}$$
$$= \{\alpha_1(1,0,0) + \alpha_2(0,1,0) + (1 - \alpha_1 - \alpha_2)(0,0,1) : \alpha_1, \alpha_2 \ge 0 \text{ and } \alpha_1 + \alpha_2 \le 1\}$$
$$= \{(\alpha_1, \alpha_2, 1 - \alpha_1 - \alpha_2) : \alpha_1, \alpha_2 \ge 0 \text{ and } \alpha_1 + \alpha_2 \le 1\}$$

Graphically, $\Delta_2$ is:

[Figure: the simplex $\Delta_2$ in $\mathbb{R}^3$]

Simplices are an important class of polytopes. N
14.2 Concave functions
A convex set can represent, for example, a collection of bundles on which a utility function is defined, or a collection of inputs on which a production function is defined. The convexity of these sets allows us to combine bundles or inputs. It then becomes important to study how the functions defined on such sets, be they utility or production functions, behave with respect to these combinations.

For this reason, concave and convex functions are extremely important in economics. We have already introduced them in Section 6.4.5 for scalar functions defined on intervals of $\mathbb{R}$. The following definition holds for any function defined on a convex set $C$ of $\mathbb{R}^n$.
Definition 651 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be concave if

$$f(\lambda x + (1-\lambda)y) \ge \lambda f(x) + (1-\lambda)f(y) \qquad (14.3)$$

for every $x, y \in C$ and every $\lambda \in [0,1]$, and it is said to be convex if

$$f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y) \qquad (14.4)$$

for every $x, y \in C$ and every $\lambda \in [0,1]$.
The geometric interpretation is the same as the one seen in the scalar case: a function is concave if the chord that joins any two points $(x, f(x))$ and $(y, f(y))$ of its graph lies below the graph of the function, while it is convex if the opposite happens, that is, if this chord lies above the graph of the function.
[Figure: a concave function and a convex function]

Indeed, such a chord consists of the points

$$\{\lambda(x, f(x)) + (1-\lambda)(y, f(y)) : \lambda \in [0,1]\} = \{(\lambda x + (1-\lambda)y,\ \lambda f(x) + (1-\lambda)f(y)) : \lambda \in [0,1]\}$$

So, the following figure of a concave function should clarify its geometric interpretation:

[Figure: a concave function with a chord lying below its graph]
Example 652 The absolute value function $|\cdot| : \mathbb{R} \to \mathbb{R}$ is convex since

$$|\lambda x + (1-\lambda)y| \le |\lambda x| + |(1-\lambda)y| = \lambda|x| + (1-\lambda)|y|$$

for every $x, y \in \mathbb{R}$ and every $\lambda \in [0,1]$. More generally, the norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is a convex function. Indeed,

$$\|\lambda x + (1-\lambda)y\| \le \|\lambda x\| + \|(1-\lambda)y\| = \lambda\|x\| + (1-\lambda)\|y\| \qquad (14.5)$$

for every $x, y \in \mathbb{R}^n$ and every $\lambda \in [0,1]$. N
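A quick random check of inequality (14.5) can be run in Python (with NumPy); of course, such a check is a numerical illustration, not a proof:

import numpy as np

rng = np.random.default_rng(1)

# Sample two vectors and verify the convexity inequality of the norm.
x, y = rng.normal(size=5), rng.normal(size=5)
for lam in np.linspace(0.0, 1.0, 11):
    lhs = np.linalg.norm(lam * x + (1 - lam) * y)
    rhs = lam * np.linalg.norm(x) + (1 - lam) * np.linalg.norm(y)
    assert lhs <= rhs + 1e-12             # small tolerance for rounding
print("inequality (14.5) holds at all sampled lambda")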
Note that a function $f$ is convex if and only if $-f$ is concave: through this simple duality, the properties of convex functions can be easily obtained from those of concave functions. Accordingly, we will consider only the properties of concave functions, leaving to the reader the simple deduction of the corresponding properties of convex functions.

N.B. The domain of a concave (convex) function must be a convex set. Otherwise, in Definition 651 the combination $\lambda f(x) + (1-\lambda)f(y)$ would be defined for every $\lambda \in [0,1]$ while $f(\lambda x + (1-\lambda)y)$ would not be defined for some $\lambda \in [0,1]$. From now on we will assume, often without mentioning it, that the concave (and convex) functions that we consider are always defined on convex sets. O
An important subclass of concave functions is that of the strictly concave ones, which are the functions $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ such that

$$f(\lambda x + (1-\lambda)y) > \lambda f(x) + (1-\lambda)f(y)$$

for every $x, y \in C$, with $x \ne y$, and every $\lambda \in (0,1)$. In other words, inequality (14.3) is required here to be strict, which implies that the graph of a strictly concave function has no linear parts. In a dual way, a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is called strictly convex if

$$f(\lambda x + (1-\lambda)y) < \lambda f(x) + (1-\lambda)f(y)$$

for every $x, y \in C$, with $x \ne y$, and every $\lambda \in (0,1)$. In particular, a function is strictly convex if and only if $-f$ is strictly concave.
We now give some examples of concave and convex functions. Verifying whether a function satisfies such properties using the definition is often not easy. For this reason we invite readers to resort to their geometric intuition for these examples, and to wait to see later in the book some sufficient conditions based on differential calculus that greatly simplify the verification (Chapter 24).
Example 653 (i) The functions $f : \mathbb{R}_+ \to \mathbb{R}$ and $g : \mathbb{R}_{++} \to \mathbb{R}$ given by $f(x) = \sqrt{x}$ and $g(x) = \log x$ are strictly concave. (ii) The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ is strictly convex. (iii) The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is neither concave nor convex; however, on the interval $(-\infty, 0]$ it is strictly concave, while on $[0, \infty)$ it is strictly convex. (iv) The function $f : \mathbb{R} \to \mathbb{R}$ given by

$$f(x) = \begin{cases} x & \text{if } x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$$

is concave (but not strictly). Indeed, its graph is:

[Figure: the graph of $f$] N
Example 654 (i) The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = x_1^2 + x_2^2$ is strictly convex. (ii) Cobb-Douglas functions (Example 178) are concave (as will be seen in Corollary 711). N
Example 655 The function $f : \mathbb{R}^n \to \mathbb{R}$ defined by

$$f(x) = \min_{i=1,\dots,n} x_i$$

is concave. Indeed, given any two vectors $x, y \in \mathbb{R}^n$, we have

$$\min_{i=1,\dots,n}(x_i + y_i) \ge \min_{i=1,\dots,n} x_i + \min_{i=1,\dots,n} y_i$$

because in minimizing separately $x$ and $y$ we have more degrees of freedom than in minimizing them jointly, i.e., their sum. It then follows that, if $x, y \in \mathbb{R}^n$ and $\lambda \in [0,1]$, we have

$$f(\lambda x + (1-\lambda)y) = \min_{i=1,\dots,n}(\lambda x_i + (1-\lambda)y_i) \ge \min_{i=1,\dots,n} \lambda x_i + \min_{i=1,\dots,n}(1-\lambda)y_i = \lambda f(x) + (1-\lambda)f(y)$$

In consumer theory, $u(x) = \min_{i=1,\dots,n} x_i$ is the Leontief utility function (Example 214). N
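The concavity of the minimum can also be probed numerically, by searching at random for a violation of inequality (14.3). A minimal Python sketch (not finding a counterexample is evidence, not a proof):

import numpy as np

rng = np.random.default_rng(2)
f = lambda x: x.min()                     # the function of Example 655

for _ in range(10_000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    lam = rng.uniform()
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y)
    assert lhs >= rhs - 1e-12             # concavity inequality (14.3)
print("no violation of concavity found")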
Since inequalities (14.3) and (14.4) are weak, it is possible that a function is at the same time concave and convex. In such a case, the function is said to be affine. In other words, a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is affine if

$$f(\lambda x + (1-\lambda)y) = \lambda f(x) + (1-\lambda)f(y)$$

for every $x, y \in C$ and every $\lambda \in [0,1]$. The notion of affine function is closely related to that of linear function.
Proposition 656 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is affine if and only if there exist a linear function $l : \mathbb{R}^n \to \mathbb{R}$ and a scalar $q \in \mathbb{R}$ such that

$$f(x) = l(x) + q \qquad \forall x \in C \qquad (14.6)$$
Affine functions are thus translations of linear functions. To fix ideas, consider the important case when $0 \in C$ (for instance, when $C$ is the entire space $\mathbb{R}^n$). Then the translation is given by $f(0) = q$, so $f$ is linear if and only if $f(0) = 0$. Affinity can, therefore, be seen as a weakening of linearity that permits a non-zero "intercept" $q$.

By Riesz's Theorem, we can recast expression (14.6) as

$$f(x) = \alpha \cdot x + q = \sum_{i=1}^n \alpha_i x_i + q \qquad (14.7)$$

where $\alpha \in \mathbb{R}^n$ and $q \in \mathbb{R}$. In the scalar case, we get

$$f(x) = mx + q \qquad (14.8)$$

with $m \in \mathbb{R}$.3 Affine functions of a single variable have, therefore, a well-known form: they are the straight lines with slope $m$ and intercept $q$. In particular, this confirms that the linear functions of a single variable are the straight lines passing through the origin, since for them $f(0) = q = 0$.

In general, expression (14.7) tells us that the value $f(x)$ of an affine function is a weighted sum, with weights $\alpha_i$, of the components $x_i$ of the argument $x$, plus a known term $q \in \mathbb{R}$. It is the simplest form that a function of several variables may assume. For example, if $\alpha = (3, 4)$ and $q = 2$, we obtain the affine function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = 3x_1 + 4x_2 + 2$.
Proof In view of Theorem 642, it is enough to prove the result for $C = \mathbb{R}^n$. "If". Let $x, y \in \mathbb{R}^n$ and $\lambda \in [0,1]$. We have

$$f(\lambda x + (1-\lambda)y) = l(\lambda x + (1-\lambda)y) + q = \lambda l(x) + (1-\lambda)l(y) + \lambda q + (1-\lambda)q = \lambda(l(x) + q) + (1-\lambda)(l(y) + q)$$

So, $f(x) = l(x) + q$ is affine.

"Only if". Let $f : \mathbb{R}^n \to \mathbb{R}$ be affine and set $l(x) = f(x) - f(0)$ for every $x \in \mathbb{R}^n$. Setting $q = f(0)$, we have to show that $l$ is linear. We start by showing that

$$l(\lambda x) = \lambda l(x) \qquad \forall x \in \mathbb{R}^n,\ \forall \lambda \in \mathbb{R} \qquad (14.9)$$

For every $\lambda \in [0,1]$ we have

$$l(\lambda x) = f(\lambda x) - f(0) = f(\lambda x + (1-\lambda)0) - (1-\lambda)f(0) - \lambda f(0) = \lambda f(x) + (1-\lambda)f(0) - (1-\lambda)f(0) - \lambda f(0) = \lambda f(x) - \lambda f(0) = \lambda l(x)$$

Let now $\lambda > 1$. Setting $y = \lambda x$, by what has just been proved we have

$$l(x) = l\left(\frac{y}{\lambda}\right) = \frac{1}{\lambda} l(y)$$

and so $l(\lambda x) = \lambda l(x)$. On the other hand,

$$0 = l(0) = l\left(\frac{1}{2}x - \frac{1}{2}x\right) = f\left(\frac{1}{2}x - \frac{1}{2}x\right) - f(0) = \frac{1}{2}f(x) + \frac{1}{2}f(-x) - \frac{1}{2}f(0) - \frac{1}{2}f(0) = \frac{1}{2}l(x) + \frac{1}{2}l(-x)$$

so that $l(-x) = -l(x)$. Hence, if $\lambda < 0$ then

$$l(\lambda x) = l((-\lambda)(-x)) = (-\lambda)l(-x) = (-\lambda)(-l(x)) = \lambda l(x)$$

All this proves that (14.9) holds. In view of Proposition 533, to complete the proof of the linearity of $l$ we have to show that

$$l(x + y) = l(x) + l(y) \qquad \forall x, y \in \mathbb{R}^n \qquad (14.10)$$

We have

$$l(x + y) = 2l\left(\frac{x+y}{2}\right) = 2l\left(\frac{x}{2} + \frac{y}{2}\right) = 2\left(f\left(\frac{x}{2} + \frac{y}{2}\right) - f(0)\right) = 2\left(\frac{1}{2}f(x) + \frac{1}{2}f(y) - \frac{1}{2}f(0) - \frac{1}{2}f(0)\right) = l(x) + l(y)$$

as desired.

3 We use in the scalar case the more common letter $m$ in place of $\alpha$.
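The decomposition of the proposition is also easy to compute: the intercept is $q = f(0)$ and, in view of (14.7), the coefficients of the linear part can be read off the versors. A minimal Python sketch, using the affine function $f(x) = 3x_1 + 4x_2 + 2$ of the text:

import numpy as np

f = lambda x: 3 * x[0] + 4 * x[1] + 2      # the affine function of the text

q = f(np.zeros(2))                          # intercept: q = f(0) = 2
alpha = np.array([f(e) - q for e in np.eye(2)])   # coefficients: alpha = (3, 4)
print(alpha, q)

x = np.array([1.5, -2.0])
print(f(x), alpha @ x + q)                  # the two values coincide: -1.5 -1.5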
14.3 Properties
14.3.1 Concave functions and convex sets
There exists a simple characterization of concave functions $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ that uses convex sets. Namely, consider the set

$$\mathrm{hypo}\, f = \{(x, y) \in C \times \mathbb{R} : f(x) \ge y\} \subseteq \mathbb{R}^{n+1} \qquad (14.11)$$

called the hypograph of $f$, constituted by the points $(x, y) \in \mathbb{R}^{n+1}$ that lie below the graph of the function.4 Graphically, the hypograph of a function is:

[Figure: the hypograph of a function]

4 Recall that the graph is given by $\mathrm{Gr}\, f = \{(x, y) \in C \times \mathbb{R} : f(x) = y\} \subseteq \mathbb{R}^{n+1}$.
The next result shows that the concavity of $f$ is equivalent to the convexity of its hypograph.

Proposition 657 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave if and only if its hypograph $\mathrm{hypo}\, f$ is a convex set in $\mathbb{R}^{n+1}$.

Proof Let $f$ be concave, and let $(x, t), (y, z) \in \mathrm{hypo}\, f$. By definition, $t \le f(x)$ and $z \le f(y)$. It follows that

$$\lambda t + (1-\lambda)z \le \lambda f(x) + (1-\lambda)f(y) \le f(\lambda x + (1-\lambda)y)$$

for every $\lambda \in [0,1]$. Therefore, $(\lambda x + (1-\lambda)y,\ \lambda t + (1-\lambda)z) \in \mathrm{hypo}\, f$, which proves that $\mathrm{hypo}\, f$ is convex.

For the converse, suppose that $\mathrm{hypo}\, f$ is convex. Since $(x, f(x)), (y, f(y)) \in \mathrm{hypo}\, f$, by convexity, for every $x, y \in C$ and $\lambda \in [0,1]$,

$$(\lambda x + (1-\lambda)y,\ \lambda f(x) + (1-\lambda)f(y)) \in \mathrm{hypo}\, f$$

that is,

$$\lambda f(x) + (1-\lambda)f(y) \le f(\lambda x + (1-\lambda)y)$$

as desired.
In Section 6.3.1 we defined the level curves of a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ as the preimages

$$f^{-1}(k) = \{x \in C : f(x) = k\}$$

for $k \in \mathbb{R}$. In a similar way, the sets

$$\{x \in C : f(x) \ge k\}$$

are called upper contour (or superlevel) sets, denoted by $(f \ge k)$, while the sets

$$\{x \in C : f(x) \le k\}$$

are called lower contour (or sublevel) sets, denoted by $(f \le k)$. Clearly,

$$f^{-1}(k) = (f \ge k) \cap (f \le k) \qquad (14.12)$$

and so sometimes we use the notation $(f = k)$ in place of $f^{-1}(k)$.
The next two figures show the upper contour sets of two scalar functions $u$. In the first figure we have a non-monotonic function with upper contour sets that are not all convex:

[Figure: a non-monotonic function whose upper contour set at level $k$ is not convex]

In contrast, in the second figure we have a monotonic function with upper contour sets that are convex:

[Figure: a monotonic function whose upper contour sets are convex]
In economics we meet upper contour sets already in the first lectures of a course in microeconomics principles. For a utility function $u : C \subseteq \mathbb{R}^n \to \mathbb{R}$, the upper contour set $(u \ge k)$ is the set of all the bundles that have utility at least equal to $k$. When $n = 2$, graphically $(u \ge k)$ is the region of the plane bounded below by the indifference curve $u^{-1}(k)$. Usually in microeconomics such regions are assumed to be convex. Indeed, it is this convexity of $(u \ge k)$ that one has in mind when one talks, improperly, of convex indifference curves.5 As the next result shows, this convexity holds when the utility function $u$ is concave.

Proposition 658 If $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave, then all its upper contour sets $(f \ge k)$ are convex.

5 This notion will be made rigorous later in the book (cf. Section 25.3).
Proof Given $k \in \mathbb{R}$, let $(f \ge k)$ be non-empty (otherwise, the result is obvious because empty sets are trivially convex). Let $x^1, x^2 \in (f \ge k)$ and $\lambda \in [0,1]$. By the concavity of $f$,

$$f(\lambda x^1 + (1-\lambda)x^2) \ge \lambda f(x^1) + (1-\lambda)f(x^2) \ge \lambda k + (1-\lambda)k = k$$

and therefore $\lambda x^1 + (1-\lambda)x^2 \in (f \ge k)$.
We have thus shown that the usual form of the indifference curves is implied by the concavity of the utility functions. That is, more rigorously, we have shown that concave functions have convex upper contour sets. The converse is not true! Think for example of any strictly increasing function $f : \mathbb{R} \to \mathbb{R}$: we have

$$(f \ge k) = [f^{-1}(k), +\infty)$$

for every $k \in \mathbb{R}$. All the upper contour sets are therefore convex, although in general such functions are not concave.6

The concavity of the utility functions is therefore a sufficient, but not necessary, condition for the "convexity" of the indifference curves: there exist non-concave utility functions that have indifference curves of this form. At this point it is natural to ask what is the class of functions, larger than that of the concave ones, characterized by having "convex" indifference curves. Section 14.4 will answer this question by introducing quasi-concavity.

14.3.2 Affine functions and affine sets
The dual version of the last result holds for convex functions, for which the lower contour sets $(f \le k)$ are convex. If $f$ is affine, it then follows by (14.12) that the level sets $(f = k)$ are convex, being the intersection of convex sets. But much more can be said for affine functions defined on $\mathbb{R}^n$. Indeed, recall that they are translations of linear functions (Proposition 656). This property has a simple, but noteworthy, consequence.
Corollary 659 A function $f : \mathbb{R}^n \to \mathbb{R}$ is affine if and only if $f(\lambda x + (1-\lambda)y) = \lambda f(x) + (1-\lambda)f(y)$ for all $x, y \in \mathbb{R}^n$ and all scalars $\lambda \in \mathbb{R}$.

Remarkably, $\lambda$ is any scalar: it is not required to lie in $[0,1]$.
Proof Consider the "only if", the converse being trivial. If $f$ is affine, it can be written as $f(x) = l(x) + q$ for every $x \in \mathbb{R}^n$ (Proposition 656). This implies that, for all $\lambda \in \mathbb{R}$ and all $x, y \in \mathbb{R}^n$,

$$f(\lambda x + (1-\lambda)y) = l(\lambda x + (1-\lambda)y) + q = \lambda l(x) + (1-\lambda)l(y) + q = \lambda f(x) + (1-\lambda)f(y)$$

as desired.
Given two vectors $x$ and $y$, the linear combination $\lambda x + (1-\lambda)y$ is called affine if $\lambda \in \mathbb{R}$. An affine combination is convex when $\lambda$ belongs to $[0,1]$. Using this terminology, the last result says that affine functions preserve affine combinations, not just the convex ones.

All this suggests the following definition.

6 To fix ideas, think of the cubic function $f(x) = x^3$, for which we have $(f \ge c) = [c^{1/3}, +\infty)$ for every $c \in \mathbb{R}$.
Definition 660 A set $A$ of $\mathbb{R}^n$ is said to be affine if $\lambda x + (1-\lambda)y \in A$ for all $x, y \in A$ and all $\lambda \in \mathbb{R}$.
Affine sets are an important class of convex sets that preserve affine combinations. If we say that a linear combination $\sum_{i=1}^k \lambda_i x^i$ is affine if $\sum_{i=1}^k \lambda_i = 1$, affine sets are easily seen to contain all affine combinations of their elements (not just their convex combinations, as is the case for generic convex sets; cf. Proposition 649).
Example 661 Given an $m \times n$ matrix $B$ and a vector $b \in \mathbb{R}^m$, the set $A = \{x \in \mathbb{R}^n : Bx = b\}$ is affine. Indeed, let $x, y \in A$ and $\lambda \in \mathbb{R}$. Then,

$$B(\lambda x + (1-\lambda)y) = \lambda Bx + (1-\lambda)By = \lambda b + (1-\lambda)b = b$$

So, $\lambda x + (1-\lambda)y \in A$, as desired. N
Back to our original motivation, we can now explain why much more can be said about the level sets of affine functions on $\mathbb{R}^n$ than just that they are convex.

Proposition 662 Let $A$ be an affine subset of $\mathbb{R}^n$. If $f : A \to \mathbb{R}$ is affine, then all its level sets $(f = k)$ are affine.

The proof of this result is just the observation, which by now should be fairly obvious, that Corollary 659 holds for $f$ defined on any affine set, not just on the entire $\mathbb{R}^n$.
Example 663 Consider the affine function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x_1, x_2) = 2x_1 + x_2 + 5$. Clearly, the level set $(f = k) = \{(x_1, x_2) \in \mathbb{R}^2 : 2x_1 + x_2 + 5 = k\}$ is affine. Geometrically, it is the graph of the straight line $x_2 = -2x_1 + k - 5$. Note that when $k = 5$ the affine set $(f = 5)$ is a vector subspace of $\mathbb{R}^2$; if $k \ne 5$, this is no longer the case. N
To fully appreciate the strength of this result, next we characterize affine sets. Vector subspaces are an important example of affine sets. Up to translations, the converse is true: any affine set is "parallel" to a vector subspace.
Proposition 664 A set $A$ of $\mathbb{R}^n$ is affine if and only if there is a vector subspace $V$ of $\mathbb{R}^n$ and a vector $z \in \mathbb{R}^n$ such that $A = V + z = \{x + z : x \in V\}$. In particular, $A$ is a vector subspace if and only if $0 \in A$.
Proof "If". Let $A = V + z$, where $V$ is a vector subspace. Let $x, y \in A$. Then, $x = x_1 + z$ and $y = x_2 + z$ for some $x_1, x_2 \in V$, and so $\lambda x + (1-\lambda)y = \lambda x_1 + (1-\lambda)x_2 + z \in V + z = A$.

"Only if". Take a point $z \in A$ and set $V = A - z$. We must prove that $V$ is a vector space. Let $x \in V$, that is, $x = y - z$ for some $y \in A$. For all $\lambda \in \mathbb{R}$ we have $\lambda x = \lambda y - \lambda z = \lambda y + (1-\lambda)z - z$. As $y, z \in A$, then $\lambda y + (1-\lambda)z \in A$ and so $\lambda x \in A - z = V$. To conclude, let $x_1, x_2 \in V$, namely, $x_1 = y_1 - z$ and $x_2 = y_2 - z$ with $y_1, y_2 \in A$. Then

$$x_1 + x_2 = y_1 + y_2 - 2z = 2\left(\frac{y_1 + y_2}{2} - z\right) \in V$$

since $\frac{y_1 + y_2}{2} \in A$ and $V$ is closed under multiplication by scalars. So, $V$ is a vector space. We leave to the reader the proof of the final part of the statement.
Example 665 In the last example, $(f = 5)$ is already a vector subspace. Take $k \ne 5$, for instance $k = 0$. Take any vector $x_0$ such that $f(x_0) = 0$, say $x_0 = (-3, 1)$. It is easy to see that

$$V = (f = 0) - x_0 = \{(x_1 + 3, x_2 - 1) : f(x_1, x_2) = 0\} = \{(t + 3, -2(t + 3)) : t \in \mathbb{R}\}$$

is a vector subspace of $\mathbb{R}^2$. We can then write $(f = 0) = V + x_0$. N
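Numerically, the decomposition $(f = 0) = V + x_0$ of the example is immediate to verify. A minimal Python sketch:

import numpy as np

# Example 665: f(x1, x2) = 2 x1 + x2 + 5, with x0 = (-3, 1) and f(x0) = 0.
f = lambda x: 2 * x[0] + x[1] + 5
x0 = np.array([-3.0, 1.0])

for t in (-2.0, 0.0, 1.5):
    v = np.array([t, -2 * t])          # a point of the subspace V
    print(f(v + x0))                   # 0.0: every v + x0 lies in (f = 0)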
The last proposition makes it possible to establish a concrete representation of affine sets by showing that they all have the form of Example 661.
Proposition 666 A set $A$ in $\mathbb{R}^n$ is affine if and only if there is an $m \times n$ matrix $B$ and a vector $b \in \mathbb{R}^m$ such that

$$A = \{x \in \mathbb{R}^n : Bx = b\}$$

So, affine sets correspond to the sets of solutions of linear systems. In particular, in view of Proposition 664 we can say that vector subspaces have the form $\{x \in \mathbb{R}^n : Bx = 0\}$, so they correspond to the solutions of homogeneous linear systems.

Proof The "if" is contained in Example 661. We omit the proof of the converse, which relies on the last proposition.
14.3.3 Jensen’s inequality and continuity
Although concavity is defined via convex combinations involving only two elements, we next show that it actually holds for all convex combinations.

Proposition 667 (Jensen's inequality) A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave if and only if, for every finite collection $\{x^1, x^2, \dots, x^n\}$ of elements of $C$, we have

$$f\left(\sum_{i=1}^n \alpha_i x^i\right) \ge \sum_{i=1}^n \alpha_i f(x^i) \qquad (14.13)$$

for all $\alpha_i \ge 0$ such that $\sum_{i=1}^n \alpha_i = 1$.
The inequality (14.13) is known as Jensen's inequality and is very important in applications.7 A dual version, with $\le$, holds for convex functions, while for affine functions we have a "Jensen equality" $f\left(\sum_{i=1}^n \alpha_i x^i\right) = \sum_{i=1}^n \alpha_i f(x^i)$. So, affine functions preserve all affine combinations, be they with two or more elements.
Proof The "if" is obvious. As to the "only if" part, we proceed by induction on $n$. Let $f$ be concave. The inequality (14.13) obviously holds for $n = 2$. Suppose that it holds for $n - 1$ (induction hypothesis), i.e., $f\left(\sum_{i=1}^{n-1} \alpha_i x^i\right) \ge \sum_{i=1}^{n-1} \alpha_i f(x^i)$ for every convex combination of $n - 1$ elements of $C$. If $\alpha_n = 1$, inequality (14.13) holds trivially. Let therefore $\alpha_n < 1$. We have

$$f\left(\sum_{i=1}^n \alpha_i x^i\right) = f\left(\sum_{i=1}^{n-1} \alpha_i x^i + \alpha_n x^n\right) = f\left((1 - \alpha_n)\sum_{i=1}^{n-1} \frac{\alpha_i}{1 - \alpha_n} x^i + \alpha_n x^n\right)$$
$$\ge (1 - \alpha_n) f\left(\sum_{i=1}^{n-1} \frac{\alpha_i}{1 - \alpha_n} x^i\right) + \alpha_n f(x^n)$$
$$\ge (1 - \alpha_n)\sum_{i=1}^{n-1} \frac{\alpha_i}{1 - \alpha_n} f(x^i) + \alpha_n f(x^n) = \sum_{i=1}^n \alpha_i f(x^i)$$

as desired.

7 The inequality is named after Johan Jensen, who introduced concave functions in 1906.
Concavity is preserved by addition, as well as by "positive" scalar multiplication (the proof is left to the reader):

Proposition 668 Let $f, g : C \subseteq \mathbb{R}^n \to \mathbb{R}$ be two concave functions. The function $f + g$ is concave, and $\alpha f$ is concave if $\alpha \ge 0$.
Concave functions are very well behaved; in particular, they have remarkable continuity properties.

Theorem 669 A concave function is continuous at every interior point of its domain.

Geometrically, it should be easy to see that the presence of a discontinuity at an interior point of the domain forces some chord to cut the graph of the function, thereby preventing it from being concave (or convex). If the discontinuity is on the boundary, this does not necessarily happen.
Example 670 (i) Let $f : [0, 1] \to \mathbb{R}$ be defined by:

$$f(x) = \begin{cases} 2 - x^2 & \text{if } x \in (0, 1) \\ 0 & \text{if } x \in \{0, 1\} \end{cases}$$

Then $f$ is concave on the entire domain $[0, 1]$ and is discontinuous at $0$ and $1$, i.e., at the boundary points of the domain. In accordance with the last theorem, $f$ is continuous on $(0, 1)$, the interior of its domain $[0, 1]$. (ii) Concave functions $f : \mathbb{R}^n \to \mathbb{R}$ defined on the entire space $\mathbb{R}^n$ are continuous. N
Proof of Theorem 669 We prove the result for scalar functions. Let $f$ be a concave function defined on an interval $C$ of the real line. We will show that $f$ is continuous on every closed interval $[a, b]$ included in the interior of $C$: this will imply the continuity of $f$ on the interior of $C$.

So, let $[a, b] \subseteq \mathrm{int}\, C$. Let $m$ be the smallest of the two values $f(a)$ and $f(b)$; for every $x = \lambda a + (1-\lambda)b$, with $0 \le \lambda \le 1$, that is, for every $x \in [a, b]$, one has

$$f(x) \ge \lambda f(a) + (1-\lambda)f(b) \ge \lambda m + (1-\lambda)m = m$$

Therefore, $f$ is bounded below by $m$ on $[a, b]$. For every

$$-\frac{b-a}{2} \le t \le \frac{b-a}{2}$$

one has, due to the concavity of $f$, that

$$f\left(\frac{a+b}{2}\right) \ge \frac{1}{2} f\left(\frac{a+b}{2} + t\right) + \frac{1}{2} f\left(\frac{a+b}{2} - t\right)$$

That is,

$$f\left(\frac{a+b}{2} + t\right) \le 2f\left(\frac{a+b}{2}\right) - f\left(\frac{a+b}{2} - t\right)$$

Moreover, since

$$\frac{a+b}{2} - t \in [a, b] \qquad \forall t \in \left[-\frac{b-a}{2}, \frac{b-a}{2}\right]$$

we have

$$f\left(\frac{a+b}{2} - t\right) \ge m$$

whence

$$f\left(\frac{a+b}{2} + t\right) \le 2f\left(\frac{a+b}{2}\right) - m$$

By setting

$$M = 2f\left(\frac{a+b}{2}\right) - m$$

and by observing that

$$[a, b] = \left\{\frac{a+b}{2} + t : t \in \left[-\frac{b-a}{2}, \frac{b-a}{2}\right]\right\}$$

we conclude that $f$ is also bounded above by $M$ on $[a, b]$. Thus, the function $f$ is bounded on $[a, b]$.

Now consider the interval $[a - \varepsilon, b + \varepsilon]$, with $\varepsilon > 0$ small enough that it is still contained in the interior of $C$. Then $f$ is bounded also on it (by what we have just proved). Let $m_\varepsilon$ and $M_\varepsilon$ be the infimum and the supremum of $f$ on $[a - \varepsilon, b + \varepsilon]$. If $m_\varepsilon = M_\varepsilon$, the function is constant and, even more so, continuous. Let then $m_\varepsilon < M_\varepsilon$. Take two points $x \ne y$ in $[a, b]$ and set

$$z = y - \varepsilon\frac{x - y}{|x - y|}, \qquad \lambda = \frac{|x - y|}{\varepsilon + |x - y|}$$

We see immediately that $z \in [a - \varepsilon, b + \varepsilon]$ and that $y = \lambda z + (1-\lambda)x$. Therefore,

$$f(y) \ge \lambda f(z) + (1-\lambda)f(x) = f(x) + \lambda(f(z) - f(x))$$

that is,

$$f(x) - f(y) \le \lambda(f(x) - f(z)) \le \lambda(M_\varepsilon - m_\varepsilon) = \frac{|x - y|}{\varepsilon + |x - y|}(M_\varepsilon - m_\varepsilon) < \frac{M_\varepsilon - m_\varepsilon}{\varepsilon}|x - y|$$

Exchanging the roles of $x$ and $y$, we conclude that

$$|f(x) - f(y)| \le k|x - y|$$

where $k = (M_\varepsilon - m_\varepsilon)/\varepsilon$. Now, if $y \to x$, that is, $|x - y| \to 0$, then $f(y) \to f(x)$. This proves the continuity of $f$ at $x$. Since $x$ is arbitrary, the statement follows.
So, concave functions on open convex sets are continuous. If we strengthen the hypothesis on $f$, we can weaken that on its domain, as the next interesting result shows.

Proposition 671 An affine function defined on a convex set is continuous.

Proof Let $f : C \to \mathbb{R}$ be affine on the convex set $C$. By Proposition 656, we have $f = l + q$ with $l$ linear. By the last theorem, $l$ is continuous, and so $f$ is continuous as well.
14.4 Quasi-concave functions
In the previous section we posed a question, motivated by some simple observations from utility theory, that we can reformulate as follows: given that concavity is only a sufficient condition for the convexity of the upper contour sets, which weakening of the notion of concavity permits to identify the functions featuring convex upper contour sets? In the language of utility theory, what is the characterization of utility functions with "convex" indifference curves?

The answer to these questions is given by the following class of functions.
Definition 672 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ defined on a convex set $C$ is said to be quasi-concave if

$$f(\lambda x + (1-\lambda)y) \ge \min\{f(x), f(y)\} \qquad (14.14)$$

for every $x, y \in C$ and every $\lambda \in [0,1]$, and it is said to be quasi-convex if

$$f(\lambda x + (1-\lambda)y) \le \max\{f(x), f(y)\} \qquad (14.15)$$

for every $x, y \in C$ and every $\lambda \in [0,1]$.

When the inequality in (14.14) is strict for $\lambda \in (0,1)$ with $x \ne y$, the function $f$ is said to be strictly quasi-concave. Similarly, when the inequality in (14.15) is strict for $\lambda \in (0,1)$ with $x \ne y$, the function $f$ is said to be strictly quasi-convex.

Finally, a function $f$ is said to be quasi-affine if it is both quasi-concave and quasi-convex, that is,

$$\min\{f(x), f(y)\} \le f(\lambda x + (1-\lambda)y) \le \max\{f(x), f(y)\} \qquad (14.16)$$

for every $x, y \in C$ and every $\lambda \in [0,1]$.
Concave functions are quasi-concave because

$$f(\lambda x + (1-\lambda)y) \ge \lambda f(x) + (1-\lambda)f(y) \ge \min\{f(x), f(y)\}$$

while convex functions are quasi-convex. In particular, affine functions are quasi-affine. The converses of these implications are false, as the following example shows.
Example 673 Monotonic scalar functions (e.g., the cubic) are quasi-affine. Indeed, let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be increasing on the interval $C$ and let $x, y \in C$ and $\lambda \in [0,1]$, with $x \le y$. Then, $x \le \lambda x + (1-\lambda)y \le y$ and the increasing monotonicity implies $f(x) \le f(\lambda x + (1-\lambda)y) \le f(y)$, that is, (14.16) holds. A similar argument applies when $f$ is decreasing. This example shows that, unlike concave functions, quasi-concave functions may be quite irregular. For instance, they might well be discontinuous at interior points of their domain (just take any discontinuous monotonic scalar function). N
Strictly concave functions are strictly quasi-concave:

$$f(\lambda x + (1-\lambda)y) > \lambda f(x) + (1-\lambda)f(y) \ge \min\{f(x), f(y)\}$$

while strictly convex functions are strictly quasi-convex. The converses of these implications are false. In particular, note that a quasi-concave function can be strictly convex – for example, the exponential $f(x) = e^x$. The terminology must, therefore, be taken cum grano salis.
The next important result justifies the study of quasi-concave functions.
Proposition 674 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is quasi-concave if and only if all its upper contour sets $(f \ge k)$ are convex.

Proof Let $f$ be quasi-concave. Given a non-empty (otherwise the result is trivial) upper contour set $(f \ge k)$, let $x, y \in (f \ge k)$ and $\lambda \in [0,1]$. We have

$$f(\lambda x + (1-\lambda)y) \ge \min\{f(x), f(y)\} \ge k$$

and so $\lambda x + (1-\lambda)y \in (f \ge k)$. The set $(f \ge k)$ is therefore convex.

Vice versa, suppose that all the upper contour sets $(f \ge k)$ are convex. Let $x, y \in C$ and $\lambda \in [0,1]$. Without loss of generality, suppose $f(x) \ge f(y)$. Setting $k = f(y)$, we have $\lambda x + (1-\lambda)y \in (f \ge k)$, and therefore

$$f(\lambda x + (1-\lambda)y) \ge k = \min\{f(x), f(y)\}$$

This proves the quasi-concavity of $f$.
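Definition (14.14) lends itself to a direct numerical test. A minimal Python sketch (a random search of ours; a returned True only means that no violation was found among the sampled points):

import numpy as np

rng = np.random.default_rng(3)

def looks_quasi_concave(f, dim, trials=10_000):
    """Search for a violation of f(lam x + (1-lam) y) >= min(f(x), f(y))."""
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.uniform()
        if f(lam * x + (1 - lam) * y) < min(f(x), f(y)) - 1e-12:
            return False
    return True

print(looks_quasi_concave(lambda x: x.min(), 3))    # True: min is concave, hence quasi-concave
print(looks_quasi_concave(lambda x: x[0]**3 - x[0]**2, 1))  # typically False (cf. Example 675 below)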
Quasi-concave functions are thus characterized by the convexity of their upper contour sets. So, quasi-concavity is the weakening of the notion of concavity that answers the opening question.

In utility theory, quasi-concave utility functions are precisely those featuring "convex" indifference curves, the usual form of indifference curves. This makes quasi-concave utility functions the most important class of utility functions. Before studying in more detail this important economic application of quasi-concavity, we close with some bad news: additivity preserves concavity (Proposition 668) but not quasi-concavity.
Example 675 Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^3$ and $g(x) = -x^2$. These two scalar functions are quasi-concave: the first because it is monotone, the second because it is concave. Define $h : \mathbb{R} \to \mathbb{R}$ by $h = f + g$, that is, $h(x) = x^3 - x^2$. If we take the points $x = 0$ and $y = 1$, we have

$$h\left(\frac{1}{2}x + \frac{1}{2}y\right) = h\left(\frac{1}{2}\right) = -\frac{1}{8} < 0 = h(x) = h(y)$$

So, $h$ is not quasi-concave. N
Indifference curves For quasi-convex functions, Proposition 674 holds with lower contour sets in place of the upper contour ones. As a consequence, a quasi-affine function $f$ has level curves $(f = k)$ that are convex, because $(f = k) = (f \ge k) \cap (f \le k)$. The converse is, however, false: injective functions have level curves that are singletons, so convex, but they might not be quasi-affine. For instance, take the function $f : \mathbb{R} \to \mathbb{R}$ given by

$$f(x) = \begin{cases} \dfrac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{otherwise} \end{cases}$$

Since $f$ is injective, its level curves are singletons. In particular,

$$f^{-1}(y) = \begin{cases} \{0\} & \text{if } y = 0 \\ \left\{\dfrac{1}{y}\right\} & \text{if } y \ne 0 \end{cases}$$

So the level curves are, trivially, convex. But $f$ is neither quasi-concave nor quasi-convex, a fortiori not quasi-affine.
In utility theory, what has just been observed shows that a sufficient, but not necessary, condition for a utility function $u$ to have convex (in a proper sense!) indifference curves is to be quasi-affine. Recall that previously we talked about convexity in an improper sense (within quotation marks) of the indifference curves, meaning by this the convexity of the upper contour sets $(u \ge k)$. Although improper, this is a common terminology. In a proper sense, the convexity of the indifference curves is the convexity of the level curves $(u = k)$. Thanks to Proposition 674, the improper convexity of the indifference curves characterizes quasi-concave utility functions, while their proper convexity is satisfied by quasi-affine utility functions (without being, however, a characterizing property of them).
Cardinality and ordinality Quasi-concavity is preserved by monotonic transformations, unlike concavity. To shed light on this key difference, it is useful to study together the behavior of concavity and of quasi-concavity with respect to composition.
Proposition 676 Let $g : C \subseteq \mathbb{R}^n \to \mathbb{R}$ and $f : D \subseteq \mathbb{R} \to \mathbb{R}$ be two functions defined on convex sets and such that $\mathrm{Im}\, g \subseteq D$.

(i) If $g$ is concave and $f$ is concave and increasing, then the composite function $f \circ g : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave.

(ii) If $g$ is quasi-concave and $f$ is increasing, then the composite function $f \circ g : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is quasi-concave.
Proof We show only (i), leaving (ii) to the reader. Let $x, y \in C$ and $\lambda \in [0,1]$. Thanks to the properties of the functions $f$ and $g$, we have

$$(f \circ g)(\lambda x + (1-\lambda)y) = f(g(\lambda x + (1-\lambda)y)) \ge f(\lambda g(x) + (1-\lambda)g(y))$$
$$\ge \lambda f(g(x)) + (1-\lambda)f(g(y)) = \lambda(f \circ g)(x) + (1-\lambda)(f \circ g)(y)$$

as desired.
Between (i) and (ii) there is an important difference: concavity is preserved by the monotonic transformation $f \circ g$ if $f$ is both increasing and concave, while, in order to preserve quasi-concavity, the increasing monotonicity of $f$ is sufficient. In other terms, quasi-concavity is preserved by monotone (increasing) transformations, while this is not true for concavity. For example, if $f, g : [0, \infty) \to \mathbb{R}$ are $g(x) = \sqrt{x}$ and $f(x) = x^4$, the composite function $f \circ g : [0, \infty) \to \mathbb{R}$ is the quasi-concave and strictly convex function $x^2$.8 So, with $f$ increasing but not concave, the concavity of $g$ only implies the quasi-concavity of $f \circ g$, nothing more.

This difference between (i) and (ii) is important in utility theory. A property of the utility functions that is preserved by strictly increasing transformations is called ordinal, while a property that is preserved only by strictly increasing affine transformations – that is, by $f(x) = \alpha x + \beta$ with $\alpha > 0$ and $\beta \in \mathbb{R}$ – is called cardinal. Naturally, an ordinal property is also cardinal, while the converse is false. Thanks to Proposition 676, we can thus say that quasi-concavity is an ordinal property, while concavity is only cardinal.
The distinction between cardinal and ordinal properties is, conceptually, very important. Indeed, given a utility function $u : C \subseteq \mathbb{R}^n \to \mathbb{R}$ and a function $f : D \subseteq \mathbb{R} \to \mathbb{R}$, with $\mathrm{Im}\, u \subseteq D$, we saw in Section 6.4.4 that, when $f$ is strictly increasing, the transformation $f \circ u : C \subseteq \mathbb{R}^n \to \mathbb{R}$ of the utility function $u : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is itself a utility function equivalent to $u$. In other words, $f \circ u$ represents the same preference relation $\succsim$, which is the fundamental economic notion (Section 6.8). Indeed, what matters is how the decision maker ranks the pairs of bundles $x$ and $y$: whether $x \succsim y$ ($x$ is preferred to $y$), that is, $u(x) \ge u(y)$, or, vice versa, $y \succsim x$ ($y$ is preferred to $x$), that is, $u(y) \ge u(x)$. When $f$ is strictly increasing, the preferential ordering $\succsim$ is preserved by $f \circ u$ since

$$x \succsim y \iff u(x) \ge u(y) \iff (f \circ u)(x) \ge (f \circ u)(y)$$

For this reason, ordinal properties – which are satisfied by $u$ and all its equivalent transformations $f \circ u$ – are characteristic of utility functions in that they are numeric representations of an underlying preference $\succsim$. In contrast, this is not true for cardinal properties, which are preserved only by positive (therefore, increasing) linear transformations $f$, so might well get lost through strictly increasing transformations that are not linear.
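The ordinal invariance just described is easy to see at work numerically. A minimal Python sketch (the utility function and the transformation are illustrative choices of ours):

import numpy as np

rng = np.random.default_rng(4)

u = lambda x: np.sqrt(x[0] * x[1])     # an illustrative utility function
f = np.exp                             # a strictly increasing transformation

# The rankings induced by u and by f(u(.)) coincide on random bundle pairs.
for _ in range(1000):
    x, y = rng.uniform(0.1, 10., size=2), rng.uniform(0.1, 10., size=2)
    assert (u(x) >= u(y)) == (f(u(x)) >= f(u(y)))
print("f o u represents the same preference as u")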
In light of this, the ordinal quasi-concavity, rather than the cardinal concavity, seems to be the relevant property for utility functions $u : C \subseteq \mathbb{R}^n \to \mathbb{R}$. Nevertheless, before we declare quasi-concavity to be the relevant property, in place of concavity, we have to make a last subtle observation. The monotonic transformation $f \circ u$ is quasi-concave if $u$ is concave; does the opposite also hold? That is, can any quasi-concave function be expressed in this way, as a monotonic transformation of a concave function?

If this were the case, concavity would be back in business also in an ordinalist approach:9 given a quasi-concave function, it would then be sufficient to consider its equivalent concave version, obtained through a suitable strictly increasing transformation.

The answer to the question is negative: there exist quasi-concave functions that are not monotonic transformations of concave functions.

8 Note that $x^4$ is here strictly increasing because we are considering its restriction on $[0, +\infty)$. For the same reason, $x^2$ is quasi-concave.
9 Recall the discussion of Section 6.2.1.
Example 677 Let $g : \mathbb{R} \to \mathbb{R}$ be given by

$$g(x) = \begin{cases} x & \text{if } x \le 0 \\ 0 & \text{if } x \in (0, 1) \\ x - 1 & \text{if } x \ge 1 \end{cases}$$

This function is increasing, so quasi-concave. We claim that there is no $f : \mathbb{R} \to \mathbb{R}$ strictly increasing and no $h : \mathbb{R} \to \mathbb{R}$ concave such that

$$g = f \circ h \qquad (14.17)$$

If $f$ is strictly increasing, it has a strictly increasing inverse $f^{-1}$. Therefore, (14.17) is equivalent to $h = f^{-1} \circ g$. Hence, our claim amounts to saying that $f^{-1} \circ g$ is not concave for any $f : \mathbb{R} \to \mathbb{R}$ strictly increasing. Suppose, by contradiction, that $f^{-1} \circ g$ is concave. By setting $x = 3/2$ and $y = 0$, we have

$$f^{-1}(0) = \left(f^{-1} \circ g\right)\left(\frac{3}{4}\right) = \left(f^{-1} \circ g\right)\left(\frac{1}{2}x + \frac{1}{2}y\right) \ge \frac{1}{2}\left(f^{-1} \circ g\right)(x) + \frac{1}{2}\left(f^{-1} \circ g\right)(y) = \frac{1}{2}f^{-1}\left(\frac{1}{2}\right) + \frac{1}{2}f^{-1}(0)$$

that is,

$$f^{-1}(0) \ge f^{-1}\left(\frac{1}{2}\right)$$

which contradicts the fact that $f^{-1}$ is strictly increasing. This proves the claim. N
This example shows that there exist genuinely quasi-concave functions that cannot be represented as monotonic transformations of concave functions. It is the definitive proof that quasi-concavity, and not concavity, is the relevant property in an ordinalist approach. This important conclusion was reached in 1949 by Bruno de Finetti in the article in which he introduced quasi-concave functions, whose theory was then extended in 1954 by Werner Fenchel.
14.5 Diversification principle
It is time to justify the economic relevance of the notions studied in this chapter. We will focus on consumer theory, but similar considerations hold for production theory.
We have observed many times that in consumer theory we usually consider utility functions with "convex" indifference curves, that is, utility functions with convex upper contour sets.10 As observed, this is why quasi-concavity is a fundamental property of utility functions. But what is the economic motivation for assuming convex indifference curves, that is, quasi-concave utility functions?

The answer lies in the diversification principle: if two bundles of goods ensure a certain level of utility, say $k$, a convex combination of them, a mixture, $\lambda x + (1-\lambda)y$ will yield at least as much. In other words, the diversification that the compound bundle $\lambda x + (1-\lambda)y$ affords relative to the original bundles $x$ and $y$ guarantees a utility level which is not smaller than the original one, i.e., $k$. If $x = (0, 1)$ is the bundle composed by $0$ units of water and $1$ of bread, while $y = (1, 0)$ is composed by $1$ unit of water and $0$ of bread, their mixture

$$\frac{1}{2}(0, 1) + \frac{1}{2}(1, 0) = \left(\frac{1}{2}, \frac{1}{2}\right)$$

is a diversified bundle, with positive quantities of both water and bread. It is natural to think that this mixture gives a utility which is not smaller than the utility of the two original bundles.
Formally, the diversification principle translates into the condition:

$$u(x) \ge k \text{ and } u(y) \ge k \implies u(\lambda x + (1-\lambda)y) \ge k \qquad \forall k \in \mathbb{R} \qquad \text{(DP)}$$

for every $\lambda \in [0,1]$. This is, precisely, the classic property of convexity of indifference curves. Mathematically, it is the convexity of the upper contour set $(u \ge k)$. Since it holds for every $k \in \mathbb{R}$, it corresponds to the quasi-concavity of utility functions.

Everything fine? Almost: we can actually sharpen what was just said. Observe that the diversification principle implies that, for every $x, y \in C$,

$$u(x) = u(y) \implies u(\lambda x + (1-\lambda)y) \ge u(x) \qquad \forall \lambda \in [0,1] \qquad \text{(PDP)}$$

Indeed, by setting $k = u(x) = u(y)$, we obviously have $u(x) \ge k$ and $u(y) \ge k$, which implies $u(\lambda x + (1-\lambda)y) \ge k$ by the diversification principle. We call condition PDP the pure diversification principle. In preferential terms, the PDP takes the nice form

$$x \sim y \implies \lambda x + (1-\lambda)y \succsim x \qquad \forall \lambda \in [0,1]$$

which well expresses its nature.

The PDP is very interesting: it states that each bundle which is a mixture of indifferent bundles is preferred to the original ones. If we draw an indifference curve, we see that the weaker property PDP is often used as the property that characterizes the convexity of the indifference curves. Indeed, PDP is the purest and most intuitive form of the diversification principle: by combining two indifferent alternatives, we get a better one. Going back to the example of bread and water, it is plausible that

$$(0, 1) \sim (1, 0) \precsim \left(\frac{1}{2}, \frac{1}{2}\right)$$
¹⁰ Throughout this section the convexity of indifference curves is always to be understood in an improper sense (that, as already remarked, will be made precise in Section 25.3). For simplicity, we omit the quotation marks in the adjective "convex".

The next result shows that, in most cases of interest for consumer theory, the two principles turn out to be equivalent. The result uses the notion of directed set.

Definition 678 A set C in R^n is said to be directed if, for every x, y ∈ C, there exists z ∈ C such that z ≤ x and z ≤ y.

In words, a set is directed when any pair of its elements has a common lower bound that belongs to the set. In consumer theory many sets of interest are directed. For example, all sets C ⊆ R^n_+ that contain the origin are directed: indeed, 0 ≤ x for every x ∈ R^n_+ and, therefore, the origin itself is a lower bound common to all pairs of elements of C.

Proposition 679 Let u : C ⊆ R^n → R be a continuous and increasing function defined on a convex and directed set C. The function u is quasi-concave if and only if it satisfies condition PDP.

Proof Since the "only if" part is obvious, we prove the "if" part: the PDP implies the quasi-concavity of u. Let x, y ∈ C and α ∈ [0,1], with u(x) ≥ u(y). Since C is directed, there exists z ∈ C such that z ≤ x and z ≤ y. By the increasing monotonicity of u, we have u(z) ≤ u(x) and u(z) ≤ u(y). Let us define the auxiliary function φ : [0,1] → R by φ(t) = u(tx + (1−t)z) for t ∈ [0,1]. Since C is convex, the function φ is well defined. The continuity of u implies that of φ. Indeed:

t_n → t ⟹ t_n x + (1−t_n)z → tx + (1−t)z ⟹ u(t_n x + (1−t_n)z) → u(tx + (1−t)z) ⟹ φ(t_n) → φ(t)

Since φ(0) = u(z) ≤ u(y) ≤ u(x) = φ(1), by the Intermediate Value Theorem the continuity of φ implies the existence of t̄ ∈ [0,1] such that φ(t̄) = u(y). By setting w = t̄x + (1−t̄)z, we therefore have u(w) = u(y). Moreover, z ≤ x implies w ≤ x.
By the PDP condition, it follows that u(αw + (1−α)y) ≥ u(w) = u(y), while w ≤ x implies αw + (1−α)y ≤ αx + (1−α)y. Since u is increasing, we conclude that

u(αx + (1−α)y) ≥ u(αw + (1−α)y) ≥ u(y) = min{u(x), u(y)}

which proves that u is quasi-concave.

The result just proved guarantees that, under assumptions typically satisfied in consumer theory, the two possible interpretations of the convexity of indifference curves are equivalent. We can therefore consider the pure diversification principle, which is the clearest form of the diversification principle, as the motivation for the use of quasi-concave utility functions.

What about concave functions? They satisfy the diversification principle, and therefore their use does not violate it. Example 677 has shown, however, that there exist quasi-concave functions that are not monotonic transformations of concave functions, i.e., that do not have the form f ∘ g with f strictly increasing and g concave. In other words, quasi-concavity (so, the diversification principle) is a weaker property than concavity in ordinal utility theory.

In conclusion, the use of concave functions is consistent with the diversification principle, but it is not justified by it. Only quasi-concavity is justified by this principle, being its mathematical counterpart.¹¹

We make a last observation on the pure diversification principle that does not add much conceptually, but is useful in applications. Consider a version of condition PDP with strict inequality: for every x ≠ y,

u(x) = u(y) ⟹ u(αx + (1−α)y) > u(x)   ∀α ∈ (0,1)   (SDP)

or, equivalently, in preferential terms,

x ∼ y ⟹ αx + (1−α)y ≻ x   ∀α ∈ (0,1)

We thus obtain a strong form of the principle in which diversification is always strictly preferred by the consumer. Condition SDP is implied by the strict quasi-concavity of u since

u(x) = u(y) ⟹ u(αx + (1−α)y) > min{u(x), u(y)} = u(x)   ∀α ∈ (0,1)

Under the hypotheses of Proposition 679, strict quasi-concavity is indeed equivalent to SDP. We thus have the following version of that proposition (the proof is left to the reader).

Proposition 680 Let u : C ⊆ R^n → R be a continuous and increasing function defined on a convex and directed set C. The function u is strictly quasi-concave if and only if it satisfies condition SDP.

SDP is thus the version of the diversification principle that characterizes strict quasi-concavity, a property often used in applications because it ensures the uniqueness of the solutions of optimization problems, as will be discussed in Section 18.6.

We close by observing that, although important, the diversification principle does not have universal validity: there are cases in which it makes little sense. For example, if the bundle (1,0) consists of 1 unit of brewer's yeast and 0 of cake yeast, while the bundle (0,1) consists of 1 unit of cake yeast and 0 of brewer's yeast, and we judge them indifferent, their combination (1/2, 1/2) might be useless for making either a pizza or a cake. In this case, the combination turns out to be rather bad.

14.6 Grand finale: Cauchy's equation

14.6.1 The basic equation
We close the chapter with a remarkable refinement of Proposition 533 which shows that, for functions satisfying a minimal regularity condition (continuity at one point), the additivity property f(x + y) = f(x) + f(y) is sufficient to characterize linear functions of a single variable.

¹¹ In a microeconomics course, readers will learn that concavity can be given a compelling justification in terms of risk aversion in choice under risk.

This refinement is usually stated through the Cauchy functional equation: we ask whether or not there are functions f : R → R that satisfy the condition¹²

f(x + y) = f(x) + f(y)   ∀x, y ∈ R

Naturally, a function satisfies Cauchy's equation if and only if it is additive (cf. Definition 684). Much more is true:

Theorem 681 (Cauchy) If f : R → R is continuous at least at one point, then it satisfies Cauchy's equation if and only if it is linear, that is, such that f(x) = mx for some m ∈ R.

In the language of Proposition 533 the theorem reads: a function f : R → R, continuous at least at one point, is linear if and only if it is additive. With a minimal regularity property (continuity at a point), the homogeneity property (i) of Proposition 533 is automatically satisfied.¹³

N.B. The conclusion of Theorem 681 also holds when f is defined only on R_+: the proof is the same. O

Proof The "if" part is trivial; let us show the "only if" part in three steps. (i) Taking x = y = 0, the equation gives f(0) = f(0) + f(0) = 2f(0), that is, f(0) = 0: the graphs of all functions that satisfy the equation pass through the origin.
(ii) We claim that f is continuous at every point. Let x₀ be the point at which, by hypothesis, f is continuous, so that f(x) → f(x₀) as x → x₀. Take another (generic) point z₀. By the Cauchy equation and the continuity of f at x₀,

f(x) = f(x − z₀ + z₀) = f(x − z₀) + f(z₀) → f(x₀)   as x → x₀

Therefore,

f(x − z₀) → f(x₀) − f(z₀) = f(x₀ − z₀)   as x → x₀

which proves the continuity of f at x₀ − z₀ and, by the arbitrariness of x₀ − z₀, f is everywhere continuous.
(iii) Using Cauchy's equation n times, we can write that, for every x ∈ R and every n ∈ N,

f(nx) = f(x + x + ⋯ + x) = f(x) + f(x) + ⋯ + f(x) = nf(x)

with n summands in each case. Since f(0) = 0, we have 0 = f(x − x) = f(x + (−x)) = f(x) + f(−x), whence f(−x) = −f(x) and therefore f(−nx) = (−n)f(x) for every n ∈ N. We conclude that

f(kx) = kf(x)   ∀x ∈ R, ∀k ∈ Z   (14.18)


¹² Here we speak of a functional equation because the "unknown" is a function f, not just a scalar or a vector, as is the case for the equations studied in Section 12.8.
¹³ In view of Theorem 681, non-linear additive functions must be discontinuous at each point, so they are extremely irregular. This makes them complicated to describe (and not particularly nice to see); for brevity we do not provide examples of them.

By setting y = kx, we then have f(y) = kf(y/k), so

f(y/k) = (1/k) f(y)   ∀y ∈ R, ∀k ∈ Z with k ≠ 0   (14.19)

In conclusion, combining the two equalities (14.18) and (14.19), we get

f((m/n)x) = (m/n) f(x)   ∀x ∈ R, ∀m, n ∈ Z with n ≠ 0

that is,

f(rx) = rf(x)   ∀x ∈ R, ∀r ∈ Q

Hence, putting x = 1 and denoting f(1) = a, we have f(r) = ar for every r ∈ Q. The function f is therefore linear on the rationals. Now let x be irrational and take a sequence {r_k} of rationals that tends to x. We know that f(r_k) = ar_k for every k ≥ 1. Since ar_k → ax as k → ∞, the continuity of f at every x ∈ R then yields

ax = a lim_{k→∞} r_k = lim_{k→∞} ar_k = lim_{k→∞} f(r_k) = f(lim_{k→∞} r_k) = f(x)

as desired.
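Step (iii) of the proof can be made tangible with exact rational arithmetic (a small Python sketch of ours, with an arbitrary illustrative value for a = f(1)): once additivity forces f(p) = p f(1) and q f(p/q) = f(p), the value of f on every rational is pinned down.

from fractions import Fraction

a = Fraction(7, 3)              # a = f(1), an arbitrary illustrative value

def f(r: Fraction) -> Fraction:
    # the only value on the rationals compatible with additivity and f(1) = a
    return a * r

r, s = Fraction(5, 4), Fraction(-2, 7)
assert f(r + s) == f(r) + f(s)            # Cauchy's equation on Q
assert 4 * f(Fraction(5, 4)) == 5 * a     # q * f(p/q) = p * f(1), as in the proof
print("f(r) = a r satisfies Cauchy's equation on the rationals")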

14.6.2 Remarkable variants

Simple variants of Cauchy's equation are:

(i) "+ → ×": consider the functional equation

f(x + y) = f(x) f(y)   ∀x, y ∈ R   (14.20)

It admits the trivial solution f(x) = 0 for every x ∈ R. Every other solution is strictly positive. Indeed, if f is such a solution, for every x ∈ R we have:

f(x) = f(x/2 + x/2) = f(x/2) f(x/2) = [f(x/2)]² ≥ 0

Moreover, if there exists y ≠ 0 with f(y) = 0, then f(x) = f((x − y) + y) = f(x − y) f(y) = 0 for every x ∈ R, which contradicts the non-triviality of f. Every non-trivial solution of (14.20) is therefore strictly positive. This allows us to take the logarithm of both sides of (14.20), so that

log f(x + y) = log f(x) + log f(y)   ∀x, y ∈ R

which is the Cauchy equation in the unknown function log f. The solution is therefore log f(x) = mx with m ∈ R, so the exponential function

f(x) = e^{mx}

is the non-trivial solution of the functional equation (14.20).



(ii) "× → +": consider the functional equation

f(xy) = f(x) + f(y)   ∀x, y > 0   (14.21)

It also admits the trivial solution f(x) = 0 for every x > 0. By using the identity xy = e^{log x + log y}, (14.21) becomes

f(e^{log x + log y}) = f(e^{log x}) + f(e^{log y})   ∀x, y > 0

By setting g(x) = f(eˣ) for every x ∈ R, we obtain the Cauchy equation

g(log x + log y) = g(log x) + g(log y)

in the unknown function g. We know that its solution is g(x) = mx with m ∈ R. This yields f(eˣ) = mx. In other words, the logarithmic function

f(x) = log xᵐ = m log x

is the solution of the functional equation (14.21).

(iii) "× → ×": consider the functional equation

f(xy) = f(x) f(y)   ∀x, y > 0   (14.22)

It, too, admits the trivial solution f(x) = 0. The reader can verify that also in this case we can take the logarithm of both sides, so that the equation reduces to (ii) with log f in place of f, with solution log f(x) = m log x, that is, the power function

f(x) = e^{m log x} = xᵐ

The results just seen are remarkable because they establish a functional foundation for the elementary functions. For example, the exponential function can be characterized, as in Theorem 367, via the limit

eˣ = lim_{n→∞} (1 + x/n)ⁿ

but also, from a completely different angle, as the function that solves the functional equation (14.20). Both points of view are of great importance.
Because of the importance of this new perspective on elementary functions, we record as a theorem what we have established.

Theorem 682 (i) The exponential function f(x) = e^{mx}, with m ∈ R, is the unique non-trivial solution of the functional equation

f(x + y) = f(x) f(y)   ∀x, y ∈ R

(ii) The logarithmic function f(x) = log xᵐ, with m ∈ R, is the unique non-trivial solution of the functional equation

f(xy) = f(x) + f(y)   ∀x, y > 0

(iii) The power function f(x) = xᵐ, with m ∈ R, is the unique non-trivial solution of the functional equation

f(xy) = f(x) f(y)   ∀x, y > 0
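A quick numerical confirmation of the three characterizations (an illustrative Python sketch; the exponent m is arbitrary):

import math, random

random.seed(1)
m = 1.7                                   # an arbitrary illustrative exponent
for _ in range(100):
    x, y = random.uniform(0.1, 5), random.uniform(0.1, 5)
    # (i) exponential: f(x + y) = f(x) f(y) on R
    assert math.isclose(math.exp(m * (x + y)), math.exp(m * x) * math.exp(m * y))
    # (ii) logarithmic: f(xy) = f(x) + f(y) on the positives
    assert math.isclose(m * math.log(x * y), m * math.log(x) + m * math.log(y))
    # (iii) power: f(xy) = f(x) f(y) on the positives
    assert math.isclose((x * y) ** m, x ** m * y ** m)
print("all three functional equations verified numerically")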

14.6.3 Continuous compounding

A common financial problem consists in calculating the value of a monetary capital at a future date. Denote by c the amount of capital available today, date 0, and by m its terminal value, that is, its value at a date t ≥ 0. The most common, and simplest, hypothesis is that m depends only on c and t, so that

m = m(c, t) : R × R_+ → R

Here, c < 0 is interpreted as a debt. Consider the following properties of this function:

(i) m(c₁ + c₂, t) = m(c₁, t) + m(c₂, t) for every t ≥ 0 and every c₁, c₂ ∈ R;

(ii) t₁ < t₂ implies m(c, t₁) ≤ m(c, t₂) for every c ≥ 0;

(iii) m(c, 0) = c for every c ∈ R.

Condition (i) requires that the terminal value of a sum of capitals be the sum of their terminal values. Observe that it would be meaningless to suppose that m(c₁ + c₂, t) < m(c₁, t) + m(c₂, t) for some c₁, c₂ ≥ 0 because, in such a case, it would be more profitable to invest c₁ and c₂ separately than their sum c₁ + c₂. In contrast, it might be reasonable to have m(c₁ + c₂, t) ≥ m(c₁, t) + m(c₂, t), but this would lead us a bit too far away.
Condition (ii) requires that the terminal value increase with the length of the investment. This presumes that such value is measured in nominal terms. Finally, condition (iii) is obvious.

Theorem 683 Let m be continuous in c for, at least, some value of c. It satisfies conditions (i)-(iii) if and only if

m(c, t) = c f(t)

where f : [0, ∞) → R is an increasing function such that f(0) = 1.

Proof Define m_t : R → R by m_t(c) = m(c, t). By condition (i), m_t satisfies the Cauchy functional equation. Therefore, for each t ≥ 0 there is a scalar α_t such that m_t(c) = α_t c. Define f : [0, ∞) → R by f(t) = α_t, so that we can write m(c, t) = c f(t). To satisfy (ii), f must be increasing and, by (iii), we have f(0) = 1.

Under conditions (i)-(iii), the terminal value is therefore proportional to the amount c of capital. In particular, we have f(t) = m(1, t), so f(t) can be interpreted as the terminal value at t of a unit capital. The terminal value of any other amount of capital can be obtained simply by multiplying it by f(t). For this reason, f(t) is called the compound factor.
The most common compound factor has the form

f(t) = e^{δt}

with δ ≥ 0. To see how the exponential factor may come up, suppose that one has to invest a capital c from today, 0, until the date t₁ + t₂. We can think of two investment strategies:

(a) to invest from the beginning to the end, thus obtaining the terminal value c f(t₁ + t₂);

(b) to invest first from 0 to t₁, getting the terminal value c f(t₁), and then to reinvest this amount for the remaining t₂, thus obtaining the terminal value (c f(t₁)) f(t₂).

If the two terminal values differ, that is, f(t₁ + t₂) ≠ f(t₁) f(t₂), arbitrage opportunities may open up if in the financial market it is possible to lend and borrow without quantity constraints and transaction costs. Indeed, if f(t₁ + t₂) > f(t₁) f(t₂), it would be profitable to invest without interruption from 0 to t₁ + t₂ and to borrow with an interruption at t₁, earning in this way the difference f(t₁ + t₂) − f(t₁) f(t₂) > 0. Vice versa, if f(t₁ + t₂) < f(t₁) f(t₂), it would be profitable to borrow without interruption, and to invest with an interruption at t₁.
In sum, the equality f(t₁ + t₂) = f(t₁) f(t₂) must hold for every t₁, t₂ ≥ 0 in order not to open arbitrage opportunities. Remarkably, from the study of the variant (14.20) of Cauchy's equation, it follows that this equality amounts to

f(t) = e^{δt}

provided f is continuous at least at one point. The exponential compound factor is thus the outcome of a no-arbitrage argument, as is the case for many key results in finance (cf. Section 19.5).

N.B. In this section we assumed that time is continuous, so t can take any positive value and each t induces a function m_t (see the proof of the last theorem). In contrast, if time were discrete, with t ∈ N_+, we would have a sequence. In this case, the discrete compound factor f : N_+ → R that corresponds to the exponential continuous compound factor is given by f(t) = (1 + r)ᵗ, with m_t(c) = (1 + r)ᵗ c (cf. Example 271). O
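The correspondence between the two factors is immediate to check numerically (a Python sketch with an illustrative rate r): setting δ = log(1 + r), the continuous factor e^{δt} reproduces the discrete factor (1 + r)ᵗ at every date, and the no-arbitrage multiplicativity f(t₁ + t₂) = f(t₁) f(t₂) holds.

import math

r = 0.05                        # an illustrative annual rate
delta = math.log(1 + r)         # the matching continuous intensity
c = 1000.0                      # initial capital

for t in [0, 1, 2.5, 10]:
    discrete = c * (1 + r) ** t
    continuous = c * math.exp(delta * t)
    assert math.isclose(discrete, continuous)
    print(t, round(continuous, 4))

f = lambda t: math.exp(delta * t)
assert math.isclose(f(3 + 4), f(3) * f(4))   # no-arbitrage multiplicativity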

14.6.4 Additive functions

Definition 684 A function f : R^n → R is called additive if it satisfies the Cauchy functional equation, that is,

f(x + y) = f(x) + f(y)   ∀x, y ∈ R^n

A function f : R^n → R is linear if and only if it is additive and homogeneous (Proposition 533). The following result generalizes Cauchy's Theorem to the multivariable case and, in so doing, provides a new twist on Riesz's Theorem. Remarkably, it shows that for additive functions the topological property of continuity (even just at some point) and the algebraic property of homogeneity become equivalent.

Theorem 685 For a function f : R^n → R, the following conditions are equivalent:

(i) f is continuous at least at one point and additive;

(ii) f is continuous and additive;

(iii) f is linear;

(iv) there exists a (unique) vector α ∈ R^n such that f(x) = α · x for all x ∈ R^n.

Proof (iv) implies (iii) by Riesz's Theorem. (iii) implies (ii) by Theorem 535. (ii) trivially implies (i). Finally, to prove that (i) implies (iv) it is enough to show, along the lines of the proof of Cauchy's Theorem for scalar functions (which is easily adapted to R^n, as readers can check), that (i) implies that f is homogeneous, hence linear.

Interestingly, it can be proved that a function f : R^n → R is a non-trivial continuous solution of the multidimensional version of variant (14.20), i.e.,

f(x + y) = f(x) f(y)   ∀x, y ∈ R^n

if and only if there exists a vector α ∈ R^n such that f(x) = e^{α·x} for all x ∈ R^n.

14.7 Fireworks: the skeleton of convexity

14.7.1 Convex envelope
Definition 686 Given a set A in R^n, its convex envelope (or hull) co A is the smallest convex set that contains A.

Next we show that convex envelopes are the counterpart for convex combinations of what
generated subspaces were for linear combinations (recall Section 3.4).

Proposition 687 The convex envelope of a set is the intersection of all convex sets that
contain it.

Proof Given a set A of R^n, let {C_i}_{i∈I} be the collection of all convex subsets containing A, where I is a (finite or infinite) index set. We want to show that co A = ∩_{i∈I} C_i. By Proposition 646, ∩_{i∈I} C_i is a convex set. Since A ⊆ C_i for each i, we have co A ⊆ ∩_{i∈I} C_i since, by definition, co A is the smallest convex subset containing A. On the other hand, co A belongs to the collection {C_i}_{i∈I}, being a convex subset containing A. It follows that ∩_{i∈I} C_i ⊆ co A, and we can therefore conclude that ∩_{i∈I} C_i = co A.

The next result shows that convex envelopes can be represented through convex combinations.

Theorem 688 Let A be a set in R^n. A vector x ∈ R^n belongs to co A if and only if it is a convex combination of vectors of A.

In other words, x ∈ co A if and only if there exist a finite set {x_i}_{i∈I} of vectors of A and a finite set {α_i}_{i∈I} of positive scalars, with ∑_{i∈I} α_i = 1, such that x = ∑_{i∈I} α_i x_i.

Proof "If." Let x ∈ R^n be a convex combination of a finite set {x_i}_{i∈I} of vectors of A. The set co A is convex and, since {x_i}_{i∈I} ⊆ co A, Lemma 649 implies x ∈ co A, as desired. "Only if." Let C be the set of all vectors that can be expressed as convex combinations of vectors of A, i.e., x ∈ C if there exist finite sets {x_i}_{i∈I} ⊆ A and {α_i}_{i∈I} ⊆ [0,1], with ∑_{i∈I} α_i = 1, such that x = ∑_{i∈I} α_i x_i. It is easy to see that C is a convex subset containing A. It follows that co A ⊆ C, and hence each x ∈ co A is a convex combination of vectors of A.

Example 689 Let A = {x₁, ..., x_k} ⊆ R^n. The polytope generated by the set A is its convex envelope co A. In particular, simplices are the convex envelopes of the versors. N
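Theorem 688 is easy to probe numerically (an illustrative Python sketch of ours): every convex combination of the versors of R³ has positive coordinates summing to 1, i.e., it lies in the simplex they generate.

import random

random.seed(2)
n = 3
versors = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]

for _ in range(1000):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    w = [wi / s for wi in w]                 # convex weights: positive, summing to 1
    x = [sum(w[i] * versors[i][j] for i in range(n)) for j in range(n)]
    assert all(xj >= 0 for xj in x) and abs(sum(x) - 1) < 1e-12
print("every sampled convex combination lies in the simplex")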

Proposition 690 The convex envelope of a compact set is compact.

Thus, convex envelopes preserve compactness (we omit the proof). When K is a compact subset, co K is then compact. For instance, polytopes are compact because they are the convex envelopes of finite (so, compact) collections of vectors of R^n.

14.7.2 Extreme points


We have seen how, given any set, we can construct through the convex combinations of its elements the smallest convex set that contains it, that is, its convex envelope. Now we consider, in a sense, the opposite problem: given a convex set C, we ask what is the smallest set of its points from which C can be reconstructed as their convex combinations. In other words, we ask what is the minimal set A ⊆ C such that co A = C.
If it exists, such a set A gives us the essence of the set C, its "skeleton." From the point of view of convexity, the knowledge of A would be equivalent to the knowledge of the entire set C because C could be reconstructed from A in a "mechanical" way through convex combinations of its elements.
To understand how to address this problem, we go back to polytopes (Example 689) and consider the rhombus which is the convex envelope of its vertices A = {(0,1), (1,0), (−1,0), (0,−1)}. In general, as we already remarked, a polygon is the convex envelope of its vertices. On the other hand, the same rhombus can also be seen as the convex envelope of the set

A′ = {(0,1), (1,0), (−1,0), (0,−1), (1/2, 1/2)}

In this set, besides the vertices there is also the vector (1/2, 1/2), which is useless for the representation of the polygon because it is itself a convex combination of the vertices.¹⁴ We therefore have a redundancy in the set A′, while this does not happen in the set A of the vertices, whose elements are all essential for the representation of the rhombus.
Hence, for a polygon the set of vertices is the natural candidate to be the minimal set that allows us to represent each point of the polygon as a convex combination of its elements. This motivates the notion of extreme point, which generalizes that of vertex to any convex set.

Definition 691 A point x₀ of a convex set C is said to be an extreme point of C if x₀ = tx + (1−t)y with t ∈ (0,1) and x, y ∈ C implies x = y = x₀.

Thus, a point x₀ ∈ C is extreme if it is not a convex combination of two other vectors of C. The set of the extreme points of C is denoted by ext C. In the case of polytopes, the extreme points are called vertices. The next result gives a simple characterization of extreme points: they are the points that can be eliminated without altering the convex nature of the set considered. Indeed, if in the plane we remove a vertex from a polygon, we still have a convex set.

¹⁴ Indeed, (1/2, 1/2) = (1/2)(1,0) + (1/2)(0,1).

Lemma 692 A point x₀ of a convex set C is extreme if and only if the set C \ {x₀} is convex.

Proof Let x₀ ∈ ext C and x, y ∈ C \ {x₀}. Since C is convex, tx + (1−t)y ∈ C for each t ∈ [0,1]. To prove that tx + (1−t)y ∈ C \ {x₀}, it is therefore sufficient to prove that x₀ ≠ tx + (1−t)y. This is obvious if t ∈ {0,1}. On the other hand, if x₀ = tx + (1−t)y held for some t ∈ (0,1), then Definition 691 would imply x = y = x₀, which contradicts x, y ∈ C \ {x₀}. In conclusion, tx + (1−t)y ∈ C \ {x₀}, and the set C \ {x₀} is therefore convex.
Vice versa, assume that x₀ ∈ C is such that the set C \ {x₀} is convex. We prove that x₀ ∈ ext C. Let x, y ∈ C be such that x₀ = tx + (1−t)y with t ∈ (0,1). Since C \ {x₀} is convex, if x and y both belonged to C \ {x₀}, then tx + (1−t)y ∈ C \ {x₀} for each t ∈ [0,1], so that x₀ ≠ tx + (1−t)y for each t ∈ [0,1]. It follows that x and y cannot both belong to C \ {x₀}, which is equivalent to saying that x = y = x₀. In conclusion, x₀ ∈ ext C.

The next result shows that extreme points must be boundary points: no interior point of a convex set can be an extreme point.

Proposition 693 We have ext C ⊆ ∂C.

Proof Let x be an interior point of C. We prove that x ∉ ext C. Since x is an interior point, there exists a neighborhood B_ε(x) such that B_ε(x) ⊆ C. Consider the points (1 − ε/n)x and (1 + ε/n)x. We have:

‖(1 − ε/n)x − x‖ = (ε/n)‖x‖   and   ‖(1 + ε/n)x − x‖ = (ε/n)‖x‖

and hence (1 − ε/n)x, (1 + ε/n)x ∈ B_ε(x) for n sufficiently large. On the other hand,

x = (1/2)(1 − ε/n)x + (1/2)(1 + ε/n)x

and so x ∉ ext C.

Open convex sets (like, for example, open unit balls) thus do not have extreme points. We now look at other examples in which we find the extreme points of some convex sets.

Example 694 Consider the polytope co A generated by a finite collection A = {x₁, ..., x_k} ⊆ R^n. It is easy to see that ext co A is not empty, with ext co A ⊆ A. That is, the vertices of the polytope necessarily belong to the finite collection that generates the polytope. N

Example 695 Consider the closed unit ball B₁(0) = {x ∈ R^n : ‖x‖ ≤ 1} of R^n. In this case, we have:

ext B₁(0) = {x ∈ R^n : ‖x‖ = 1}

That is, ext B₁(0) = ∂B₁(0): the set of the extreme points is given by the "circumference" of the ball, its skin. Though a quite intuitive result (just draw a circle), it is a bit delicate to prove. Since ∂B₁(0) = {x ∈ R^n : ‖x‖ = 1}, the previous proposition implies the inclusion ext B₁(0) ⊆ {x ∈ R^n : ‖x‖ = 1}. As to the converse inclusion, let x₀ ∈ ∂B₁(0) and let x₀ = tx + (1−t)y with x, y ∈ B₁(0) and t ∈ (0,1). We want to show that x = y. We have

‖tx + (1−t)y‖² = t²‖x‖² + (1−t)²‖y‖² + 2t(1−t) x · y
              = t²‖x‖² + (1−t)²‖y‖² + 2t(1−t)‖x‖‖y‖ cos θ
              ≤ t² + (1−t)² + 2t(1−t) cos θ

where θ is the angle determined by the two vectors (Section C.3). If x ≠ y, we have cos θ < 1, so ‖tx + (1−t)y‖² < 1 (the case in which x and y are parallel but have different norms is handled directly, since then ‖tx + (1−t)y‖ = t‖x‖ + (1−t)‖y‖ < 1). This contradicts x₀ ∈ ∂B₁(0), therefore x = y. We conclude that x₀ ∈ ext B₁(0), as desired. N

We are now ready to address the opening question of this section. We first need a preliminary lemma showing that ext C is included in all subsets of C whose convex envelope is C itself.

Lemma 696 If A ⊆ C is such that co A = C, then ext C ⊆ A.

Proof Let x ∈ ext C. We want to show that x ∈ A. Since x ∈ co A, there are a collection {x_i}_{i=1}^n ⊆ A and a collection {t_i}_{i=1}^n ⊆ [0,1], with ∑_{i=1}^n t_i = 1, such that x = ∑_{i=1}^n t_i x_i. Without loss of generality, assume t_i > 0 for every i. We have:

x = t₁x₁ + (1 − t₁) ∑_{i=2}^n (t_i / (1 − t₁)) x_i

Since C is convex, ∑_{i=2}^n (t_i / (1 − t₁)) x_i belongs to C. Then,

x = x₁ = ∑_{i=2}^n (t_i / (1 − t₁)) x_i

since x is an extreme point. Set λ_i = t_i / (1 − t₁) for i = 2, ..., n, so that

x = λ₂x₂ + (1 − λ₂) ∑_{i=3}^n (λ_i / (1 − λ₂)) x_i

Since x is an extreme point, we now have

x = x₂ = ∑_{i=3}^n (λ_i / (1 − λ₂)) x_i

By proceeding in this way, we prove that x = x_i for every i. Hence, x ∈ A.

The next fundamental result shows that convex and compact sets can be reconstructed from their extreme points by taking all their convex combinations. We omit the proof.

Theorem 697 (Minkowski) Let K be a convex and compact subset of R^n. Then:

K = co(ext K)   (14.23)

In view of the previous lemma, Minkowski's Theorem answers the opening question: ext K is the minimal set in K for which (14.23) holds. Indeed, if A ⊆ K is another set for which K = co A, then ext K ⊆ A by the lemma. Summing up:

(i) all the points of a compact and convex set K can be expressed as convex combinations of the extreme points;

(ii) the set of the extreme points of K is the minimal set in K for which this is true.

Minkowski's Theorem stands out as the deepest and most beautiful result of the chapter. It shows that, in a sense, convex and compact sets in R^n are generalized polytopes (cf. Example 694), with extreme points generalizing the role of vertices. In particular, polytopes are the convex and compact sets of R^n that have a finite number of extreme points (which are then their vertices).
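Computationally, the extreme points of a finite set can be recovered as the vertices of its convex hull (a sketch assuming NumPy and SciPy are available; scipy.spatial.ConvexHull wraps the Qhull library). Feeding it the rhombus vertices together with the redundant point (1/2, 1/2) of the earlier discussion, only the four vertices should be reported:

import numpy as np
from scipy.spatial import ConvexHull

# the four vertices of the rhombus plus a redundant boundary point
points = np.array([[0, 1], [1, 0], [-1, 0], [0, -1], [0.5, 0.5]])
hull = ConvexHull(points)
print(points[hull.vertices])   # the extreme points: (1/2, 1/2) is discarded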
Chapter 15

Homogeneous functions

15.1 Preamble: cones


Definition 698 A set C in R^n is said to be a cone if, for each x ∈ C, we have λx ∈ C for all λ ≥ 0.

Geometrically, C is a cone if, any time x belongs to C, the set C also includes the whole
half-line starting at the origin and passing through x.

[Figure: a convex cone (left) and a cone that is not convex (right).]

Note that the origin 0 always belongs to a cone: given any x ∈ C, by taking λ = 0 we have 0 = 0x ∈ C.

One can easily show that the closure of a cone is a cone and that the intersection of two
cones is still a cone.

Proposition 699 A convex set C in R^n is a cone if and only if

x, y ∈ C ⟹ αx + βy ∈ C   ∀α, β ≥ 0


While a generic convex set is closed with respect to convex combinations, convex cones are closed with respect to all linear combinations with positive coefficients (regardless of whether or not they add up to 1). This is what distinguishes them among all convex sets.

Proof "Only if". Let C be a cone. Take x, y ∈ C. We want to show that αx + βy ∈ C for all α, β ≥ 0. Fix α, β ≥ 0. If α = β = 0, then αx + βy = 0 ∈ C. Assume that α + β > 0. Since C is convex, we have

(α/(α + β))x + (β/(α + β))y ∈ C

Since C is a cone, we have

αx + βy = (α + β)((α/(α + β))x + (β/(α + β))y) ∈ C

as desired.
"If". Suppose that x, y ∈ C implies αx + βy ∈ C for all α, β ≥ 0. We want to show that C is a cone. By taking α = β = 0, one can conclude that 0 ∈ C and, by taking y = 0, that αx ∈ C for all α ≥ 0. Hence, C is a cone.

Example 700 (i) A singleton {x} ⊆ R^n is always convex; it is also a cone if x = 0. (ii) The only non-trivial cones in R are the two half-lines (−∞, 0] and [0, ∞).¹ (iii) The set R^n_+ = {x ∈ R^n : x ≥ 0} of the positive vectors is a convex cone. N

Cones can be closed, for example R^n_+, or not closed, for example R^n_{++} ∪ {0}. Vector subspaces form an important class of closed convex cones (the non-trivial proof is omitted).

Proposition 701 Vector subspaces are closed subsets of R^n.

For example, this proposition implies that the graphs of straight lines passing through
the origin are closed sets because they are vector subspaces of R2 .

15.2 Homogeneity and returns to scale


15.2.1 Homogeneous functions
Returns to scale are a main property of production functions. Their mathematical counterpart is homogeneity. We begin with the simplest kind of homogeneity, namely positive homogeneity. For production functions, it corresponds to the hypothesis of constant returns to scale.

Definition 702 A function f : C ⊆ R^n → R defined on a convex set C with 0 ∈ C is said to be positively homogeneous if

f(λx) = λf(x)   (15.1)

for all x ∈ C and all λ ∈ [0,1].

Hence, a proportional reduction λx of all the components of a vector x determines an analogous reduction λf(x) of the value f(λx) of the function.

¹ The trivial cones in R are the singleton {0} and R itself.

Example 703 (i) Linear functions f : R^n → R are positively homogeneous. (ii) The function f : R²_+ → R given by f(x) = √(x₁x₂) is positively homogeneous. Indeed,

f(λx) = √((λx₁)(λx₂)) = √(λ²x₁x₂) = λ√(x₁x₂) = λf(x)

for all λ ≥ 0. N

For any positively homogeneous function we have

f(0) = 0   (15.2)

Indeed, for all λ ∈ [0,1] we have f(0) = f(λ0) = λf(0), which, taking any λ < 1, implies f(0) = 0. Positively homogeneous functions thus have zero value at the origin.

The condition 0 ∈ C in the definition ensures that λx ∈ C for all λ ∈ [0,1], so that (15.1) is well defined. Whenever C is a cone, as in the previous examples, property (15.1) holds, more generally, for any positive scalar λ.

Proposition 704 A function f : C ⊆ R^n → R defined on a cone C is positively homogeneous if and only if

f(λx) = λf(x)   (15.3)

for all x ∈ C and all λ ≥ 0.

Proof Since the "if" side is trivial, we focus on the "only if". Let f be positively homogeneous and let x ∈ C. We must show that f(λx) = λf(x) for every λ > 1. Let λ > 1 and set y = λx, so that x = y/λ. From λ > 1 it follows that 1/λ < 1. Thanks to the positive homogeneity of f, we have f(x) = f(y/λ) = f(y)/λ = f(λx)/λ, that is, f(λx) = λf(x), as desired.

A positively homogeneous function on a cone thus preserves positive scalar multiplication: if one multiplies a vector x by any positive scalar λ, the image f(λx) is equal to the image f(x) of x times the scalar λ. Hence, both proportional reductions and increases determine analogous reductions and increases in f(x). When f is a production function, we are in a classic constant returns to scale scenario: by doubling the inputs we double the output (λ = 2), by tripling the inputs we triple the output (λ = 3), and so on.

Linear production functions are positively homogeneous, thus having constant returns to
scale (Example 532). Let us now illustrate another famous example.

Example 705 Let f : R²_+ → R be a CES (constant elasticity of substitution) production function defined by

f(x) = (αx₁^ρ + (1−α)x₂^ρ)^{1/ρ}

with α ∈ [0,1] and ρ > 0. It is positively homogeneous:

f(λx) = (α(λx₁)^ρ + (1−α)(λx₂)^ρ)^{1/ρ} = (λ^ρ(αx₁^ρ + (1−α)x₂^ρ))^{1/ρ} = λ(αx₁^ρ + (1−α)x₂^ρ)^{1/ρ} = λf(x)

for all λ ≥ 0. N

Apart from being constant, returns to scale may be increasing or decreasing. This motivates the following definition.

Definition 706 A function f : C ⊆ R^n → R defined on a convex set C with 0 ∈ C is said to be (positively) superhomogeneous if

f(λx) ≤ λf(x)

for all x ∈ C and all λ ∈ [0,1], while it is said to be (positively) subhomogeneous if

f(λx) ≥ λf(x)

for all x ∈ C and all λ ∈ [0,1].

Naturally, a function is positively homogeneous if and only if it is both superhomogeneous and subhomogeneous.
Whenever f is a production function, subhomogeneity captures decreasing returns to scale, while superhomogeneity captures increasing returns. This can easily be seen in the next result, a version of Proposition 704 for subhomogeneous functions (we leave the analogous superhomogeneous case to the reader).

Proposition 707 A function f : C ⊆ R^n → R defined on a convex cone is subhomogeneous if and only if for every x ∈ C we have

f(λx) ≥ λf(x)   ∀λ ∈ [0,1]

and

f(λx) ≤ λf(x)   ∀λ ≥ 1

Proof We consider the "only if" side, the converse being trivial. Let f be subhomogeneous and x ∈ C. Our aim is to show that f(λx) ≤ λf(x) for all λ > 1. Take λ > 1 and set y = λx, so that x = y/λ. Since λ > 1, we have 1/λ < 1. By the subhomogeneity of f, we have f(x) = f(y/λ) ≥ f(y)/λ = f(λx)/λ, that is, f(λx) ≤ λf(x), as desired.

Thus, by doubling all inputs (λ = 2) the output is less than doubled, by tripling all inputs (λ = 3) the output is less than tripled, and so on for each λ ≥ 1. A proportional increase of all inputs brings along a less than proportional increase in output, which models decreasing returns to scale. Dual considerations hold for increasing returns to scale, which entail more than proportional increases in output as all inputs increase proportionally. Note that when λ ∈ [0,1], so that we cut inputs, opposite output patterns emerge.

Example 708 Consider the following version of a Cobb-Douglas production function f : R²_+ → R:

f(x) = x₁ᵃ x₂ᵇ

with a, b > 0 (we do not require a + b = 1). For each λ ≥ 0 we have

f(λx) = (λx₁)ᵃ(λx₂)ᵇ = λ^{a+b} x₁ᵃ x₂ᵇ = λ^{a+b} f(x)

Such a production function is, thus, positively:

(i) homogeneous if a + b = 1 (constant returns to scale);

(ii) subhomogeneous if a + b ≤ 1 (decreasing returns to scale);

(iii) superhomogeneous if a + b ≥ 1 (increasing returns to scale).

All of this can easily be extended to the general case where

f(x) = ∏_{i=1}^n x_i^{a_i}

with a_i > 0 for each i. Indeed:

f(λx) = ∏_{i=1}^n (λx_i)^{a_i} = ∏_{i=1}^n λ^{a_i} x_i^{a_i} = λ^{∑_{i=1}^n a_i} ∏_{i=1}^n x_i^{a_i} = λ^{∑_{i=1}^n a_i} f(x)

for each λ ∈ [0,1]. It follows that f is homogeneous if ∑_{i=1}^n a_i = 1, subhomogeneous if ∑_{i=1}^n a_i ≤ 1, and superhomogeneous if ∑_{i=1}^n a_i ≥ 1. N
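The scaling identity of this example is immediate to verify numerically (an illustrative Python sketch with arbitrary exponents a_i):

import math, random

random.seed(3)
a = [0.3, 0.5, 0.4]             # illustrative exponents; here their sum is 1.2 > 1

def f(x):
    return math.prod(xi ** ai for xi, ai in zip(x, a))

for _ in range(100):
    x = [random.uniform(0.1, 10) for _ in range(3)]
    lam = random.uniform(0.1, 3)
    fx = f([lam * xi for xi in x])
    assert math.isclose(fx, lam ** sum(a) * f(x))
    if lam >= 1:
        assert fx >= lam * f(x) - 1e-9      # increasing returns since sum(a) > 1
print("f(lam x) = lam**(a_1 + ... + a_n) f(x) verified")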

In conclusion, the notions of homogeneity are defined for λ ∈ [0,1], that is, for proportional cuts, on convex sets containing the origin. Nonetheless, their natural domains are cones, where they model the classic returns to scale hypotheses in which both cuts, λ ∈ [0,1], and raises, λ ≥ 1, in inputs are considered.

15.2.2 Average functions

When f : [0, ∞) → R is a scalar function defined on the positive half-line, the corresponding "average function" f_m : (0, ∞) → R is defined by

f_m(x) = f(x)/x

for each x > 0. It is important in applications: for example, if f is a production function, f_m is the average production function; if f is a cost function, f_m is the average cost function; and so on.
If f : R^n_+ → R is a function of several variables, it is no longer possible to "divide" it by a vector x. We must, therefore, come up with an alternative concept of "average function". The most natural surrogate is the following. Having chosen a generic vector 0 ≠ y ∈ R^n_+, consider the function f_m^y : (0, ∞) → R given by

f_m^y(z) = f(zy)/z

It yields the average value of f along the positive multiples zy of y (which is arbitrarily chosen). In the n = 1 case, by choosing y = 1 one recovers the previous definition of average function.

The following characterization allows for a simple reinterpretation of subhomogeneity in


terms of average functions.

Proposition 709 A function f : C ⊆ R^n_+ → R defined on a convex cone, with f(0) = 0, is subhomogeneous if and only if the corresponding average functions f_m^y : (0, ∞) → R are decreasing (for any choice of y).

A function is thus subhomogeneous if and only if the corresponding average functions are decreasing. Similarly, a function is superhomogeneous if and only if its average functions are increasing.
A subhomogeneous production function is, thus, characterized by a decreasing average production function. In other words, a decreasing average production function characterizes decreasing returns to scale (as is quite natural to expect).

Proof "Only if". If f is subhomogeneous one has that, for any 0 < λ ≤ μ,

f(λy) = f((λ/μ)(μy)) ≥ (λ/μ) f(μy)

that is, f(λy)/λ ≥ f(μy)/μ, or f_m^y(λ) ≥ f_m^y(μ). Therefore, the function f_m^y is decreasing.
"If". If f_m^y is decreasing, by setting μ = 1 we have f_m^y(λ) ≥ f_m^y(1) for 0 < λ ≤ 1, and so f(λy)/λ ≥ f(y), that is, f(λy) ≥ λf(y) for each 0 < λ ≤ 1. Since f(0) = 0, the function f is subhomogeneous.
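A numerical illustration of the proposition (a Python sketch with an arbitrary direction y): for the subhomogeneous function f(x) = (x₁x₂)^{0.2}, whose exponents sum to 0.4 < 1, the average function f_m^y is decreasing along every ray.

def f(x):
    # Cobb-Douglas with exponents 0.2 + 0.2 = 0.4 < 1: subhomogeneous
    return (x[0] * x[1]) ** 0.2

y = (2.0, 3.0)                                 # an arbitrary direction
avg = lambda z: f((z * y[0], z * y[1])) / z    # the average function f_m^y
zs = [0.5, 1.0, 2.0, 4.0, 8.0]
vals = [avg(z) for z in zs]
print([round(v, 4) for v in vals])
assert all(v1 >= v2 for v1, v2 in zip(vals, vals[1:]))   # decreasing in z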

15.2.3 Homogeneity and quasi-concavity


We conclude our study of homogeneity with a nice result, the non-simple proof of which we omit, that shows how quasi-concavity becomes equivalent to concavity as long as we consider positive functions that are also positively homogeneous. To better appreciate the significance of this result, recall that quasi-concavity is, in general, much weaker than concavity.

Theorem 710 Let f : C ⊆ R^n → R be a positively homogeneous function defined on a convex cone. If f ≥ 0, then f is concave if and only if it is quasi-concave.

The condition f ≥ 0 is necessary: the function f : R → R given by

f(x) =  2x   if x ≥ 0
        x    if x < 0

is strictly increasing (so, quasi-concave) and positively homogeneous. Nonetheless, it is not concave (it is convex!).

Let us illustrate a couple of noteworthy applications of this result. In both of them, we will use the result to prove the concavity of some classic functions by showing their positivity, quasi-concavity, and positive homogeneity. This route, made possible by Theorem 710, is far simpler than verifying concavity directly.

Corollary 711 (i) The CES production function is concave if 0 < ρ ≤ 1. (ii) The Cobb-Douglas production function is concave as long as ∑_{i=1}^n a_i = 1.

The proof is the occasion to present a useful result.



Lemma 712 The product of two concave and strictly positive functions is a quasi-concave function.

Proof Let f, g : C ⊆ R^n → R be concave and strictly positive. Then we can write log fg = log f + log g. The functions log f and log g are concave thanks to Proposition 676. Hence, log fg is concave because it is the sum of concave functions (Proposition 668). It follows that fg is quasi-concave because fg = e^{log fg} is a strictly increasing transformation of a concave function.

Proof of Corollary 711 (i) For ρ = 1 the statement is obvious. If ρ < 1, note that on R_+ the power function x^ρ is concave if ρ ∈ (0,1). Hence, also g(x) = αx₁^ρ + (1−α)x₂^ρ is concave. Since h(x) = x^{1/ρ} is strictly increasing on R_+ for any ρ > 0, it follows that f = h ∘ g is quasi-concave. Since f ≥ 0 and thanks to Theorem 710, we conclude that f is concave, as we have previously shown its homogeneity. (ii) Each power function x_i^{a_i} is concave and strictly positive. As the function f is their product ∏_{i=1}^n x_i^{a_i}, from the previous lemma it is quasi-concave. Since f ≥ 0, Theorem 710 implies that f is concave on R^n_+, as we have already seen that f is positively homogeneous whenever ∑_{i=1}^n a_i = 1.

15.3 Homotheticity
15.3.1 Semicones
For the sake of simplicity, until now we have considered convex sets containing the origin 0, and cones in particular. To introduce the notions of this final section such an assumption becomes too cumbersome to maintain, so we will consider the following generalization of the notion of cone.

Definition 713 A set C in R^n is said to be a semicone if, for every x ∈ C, we have λx ∈ C for every λ > 0.²

Unlike the definition of cone, here we require that λx belong to C only for λ > 0 rather than for λ ≥ 0. A cone is thus, a fortiori, a semicone. However, the converse does not hold: the set R^n_{++} is a notable example of a semicone that is not a cone.

Lemma 714 A semicone C is a cone if and only if 0 ∈ C.

Therefore, semicones do not necessarily contain the origin and, when they do, they automatically become cones. In any case, the origin is always in the surroundings of a semicone:

Lemma 715 If C is a semicone, then 0 ∈ ∂C.

The easy proofs of the above lemmas are left to the reader. The last lemma, in particular,
leads to the following result.

Proposition 716 A closed semicone is a cone.


² This terminology is not standard.

The distinction between cones and semicones thus disappears when considering closed sets. Finally, the following version of Proposition 699 holds for semicones, with coefficients that are now required to be strictly positive, as the reader can check.

Proposition 717 A set C in R^n is a convex semicone if and only if

x, y ∈ C ⟹ αx + βy ∈ C

for all α, β ≥ 0 with α + β > 0.

Proof "Only if". Consider α, β ≥ 0 such that α + β > 0 and x, y ∈ C. Define α̂ = α/(α + β) as well as β̂ = β/(α + β). Note that α̂, β̂ ∈ [0,1] and β̂ = 1 − α̂. Since C is convex, we have α̂x + β̂y ∈ C. Since C is a semicone and α + β > 0, we have αx + βy = (α + β)(α̂x + β̂y) ∈ C. "If". Consider x, y ∈ C as well as α ∈ [0,1] and λ > 0. Note that if we define β = 1 − α, then β ≥ 0 and α + β = 1 > 0, so that αx + (1−α)y = αx + βy ∈ C, proving that C is convex. Similarly, if we set α = λ and β = 0, we have α + β = λ > 0 and λx = αx + βy ∈ C, proving that C is a semicone.

Example 718 (i) The two half-lines (−∞, 0) and (0, ∞) are semicones in R (but they are not cones). (ii) The set R^n_{++} = {x ∈ R^n : x ≫ 0} of the strongly positive vectors is a convex semicone (which is not a cone). N

The notion of positive homogeneity can be easily extended to semicones.

Definition 719 A function f : C ⊆ R^n → R defined on a semicone C is said to be positively homogeneous if

f(λx) = λf(x)   (15.4)

for all x ∈ C and all λ > 0.

The next result shows that this notion is consistent with what we did so far.

Lemma 720 Let f : C ⊆ R^n → R be a positively homogeneous function on a semicone C. If 0 ∈ C, then f(0) = 0.

Proof If 0 ∈ C, then for every λ > 0 we have f(0) = f(λ0) = λf(0). Taking any λ ≠ 1, this yields f(0) = 0.

Thus, when the semicone is actually a cone, i.e., it contains the origin (Lemma 714), we get back the notion of positive homogeneity on cones of the previous section. Everything fits together.
Example 721 Consider the function f : R^n_{++} → R given by f(x) = e^{∑_{i=1}^n a_i log x_i}, with a_i > 0. If ∑_{i=1}^n a_i = 1, the function is positively homogeneous. Indeed, for any λ > 0 we have

f(λx) = e^{∑_{i=1}^n a_i log(λx_i)} = e^{∑_{i=1}^n a_i (log λ + log x_i)} = e^{log λ} e^{∑_{i=1}^n a_i log x_i} = λf(x)

N

15.3.2 Homotheticity and utility


The following ordinal version of positive homogeneity is used in consumer theory.

Definition 722 A function f : C ⊆ R^n → R defined on a semicone is said to be homothetic if

f(x) = f(y) ⟹ f(λx) = f(λy)

for every x, y ∈ C and every λ > 0.

In particular, a utility function u is homothetic whenever the ordering between consumption bundles x and y is preserved when both bundles are multiplied by the same positive constant λ. By doubling (tripling, and so on) vectors, their ranking is not altered. In preferential terms:

x ∼ y ⟹ λx ∼ λy   ∀λ > 0

This property can be interpreted, in some applications, as invariance with respect to a measurement scale.

Homotheticity has a mathematically simple, yet economically important, characterization (the proof is left to the reader).

Proposition 723 A function h : C ⊆ R^n → R defined on a semicone is homothetic if and only if it can be written as

h = f ∘ g

with g : C ⊆ R^n → R positively homogeneous and f : Im g → R strictly increasing.

In other words, a function is homothetic if and only if it is a strictly increasing transformation of a positively homogeneous function.³ In particular, homogeneous functions themselves are homothetic because f(x) = x is, trivially, strictly increasing.
In sum, homotheticity is the ordinal version of positive homogeneity. As such, it is the version relevant in ordinal utility theory.

Example 724 Let u : R^n_+ → R be the Cobb-Douglas utility function u(x) = ∏_{i=1}^n x_i^{a_i}, with a_i > 0 and ∑_{i=1}^n a_i = 1. It follows from Example 708 that such a function is positively homogeneous. If f is strictly increasing, the transformations f ∘ u of the Cobb-Douglas utility function are homothetic. For example, if we consider the restriction of u to the semicone R^n_{++} (where it is still positively homogeneous) and the logarithmic transformation f(x) = log x, we obtain the log-linear utility function v = log u given by v(x) = ∑_{i=1}^n a_i log x_i, which is thus homothetic. N
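Homotheticity of the log-linear utility can also be checked by simulation (an illustrative Python sketch of ours): rescaling two bundles by the same λ never reverses their ranking, since v(λx) = v(x) + log λ when the exponents sum to 1.

import math, random

random.seed(4)
a = [0.4, 0.6]                                    # exponents summing to 1

def v(x):
    return sum(ai * math.log(xi) for ai, xi in zip(a, x))

for _ in range(1000):
    x = [random.uniform(0.1, 5) for _ in range(2)]
    y = [random.uniform(0.1, 5) for _ in range(2)]
    if abs(v(x) - v(y)) < 1e-9:
        continue                                  # skip numerical near-ties
    lam = random.uniform(0.1, 10)
    xs = [lam * xi for xi in x]
    ys = [lam * yi for yi in y]
    assert (v(x) > v(y)) == (v(xs) > v(ys))       # ranking preserved
print("ranking preserved under every rescaling tested")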

³ Let the reader be reminded that the same does not hold for quasi-concavity: as previously noted, there are quasi-concave functions which are not transformations of concave functions.
Chapter 16

Lipschitz functions

16.1 Global control


Lipschitz functions are an important class of functions that, unlike concavity, relies not on the vector structure of R^n but only on its topological structure.¹ Yet, we will see that Lipschitzianity sheds light on the continuity properties of linear and concave functions.
We begin with the definition, which is stated directly in terms of operators.

Definition 725 An operator f : A ⊆ R^n → R^m is said to be Lipschitz on a subset B of A if there exists a positive scalar k > 0 such that

‖f(x₁) − f(x₂)‖ ≤ k‖x₁ − x₂‖   ∀x₁, x₂ ∈ B   (16.1)

An operator is called Lipschitz, without further qualifications, when inequality (16.1) holds on its entire domain. When f is a function, this inequality takes the simpler form

|f(x₁) − f(x₂)| ≤ k‖x₁ − x₂‖

where on the left-hand side we have the absolute value in place of the norm.

In a Lipschitz operator, the distance ‖f(x₁) − f(x₂)‖ between the images of two vectors x₁ and x₂ is controlled, through a positive coefficient k, by the distance ‖x₁ − x₂‖ between the vectors x₁ and x₂ themselves. This "variation control" that the independent variable exerts on the dependent variable is at the heart of Lipschitzianity. The rein is especially tight when k < 1, so that variations in the independent variable cause strictly smaller variations in the dependent variable. In this case, the Lipschitz operator is called a contraction.
The control nature of Lipschitzianity translates into a strong form of continuity. To see how, first note that Lipschitz operators are continuous. Indeed, let x₀ ∈ A. If x_n → x₀, we have:

‖f(x_n) − f(x₀)‖ ≤ k‖x_n − x₀‖ → 0   (16.2)

and hence f(x_n) → f(x₀). So, f is continuous at x₀. More is true:

Lemma 726 Lipschitz operators are uniformly continuous.

¹ This chapter and the next one are for coda readers. They use some (basic) differential calculus notions that will be introduced later in the book.


The converse is false, as Example 728 will show momentarily. Because of its control nature, Lipschitzianity thus embodies a stronger form of continuity than the uniform one.

Proof For each ε > 0, take 0 < δ_ε < ε/k. Then, ‖f(x) − f(y)‖ ≤ k‖x − y‖ < ε for all x, y in the domain such that ‖x − y‖ < δ_ε.

Example 727 A continuously differentiable function f : [a, b] → R is Lipschitz. Indeed, set k = max_{x∈[a,b]} |f′(x)|. Since the derivative f′ is continuous on [a, b], by Weierstrass' Theorem the constant k is well defined. Let x, y ∈ [a, b]. By the Mean Value Theorem, there exists c ∈ [x, y] such that

(f(x) − f(y)) / (x − y) = f′(c)

Hence,

|f(x) − f(y)| / |x − y| = |f′(c)| ≤ k

So, f is Lipschitz. N
Example 728 The continuous function f : [0, ∞) → R defined by f(x) = √x is not Lipschitz. Indeed,

lim_{x→0⁺} (f(x) − f(0)) / (x − 0) = lim_{x→0⁺} √x / x = lim_{x→0⁺} 1/√x = +∞

So, setting y = 0, there is no k > 0 such that |f(x) − f(y)| ≤ k|x − y| for each x, y ≥ 0.
That said, the previous example shows that f is Lipschitz on each interval [a, b] with a > 0. So f is not Lipschitz on its entire domain, but it is on suitable subsets of it. More interestingly, by Theorem 526 the function f is uniformly continuous on each interval [0, b], with b > 0, but it is not Lipschitz on [0, b]. This also shows that the converse of the last lemma does not hold. N
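The contrast between the two examples can be seen by estimating the worst difference quotient on a grid (an illustrative Python sketch of ours): for sin on [0, π] it stays below the Lipschitz constant max|cos| = 1, while for √x it is large near 0 and grows without bound as the grid refines toward 0.

import math

def worst_quotient(f, points):
    # largest difference quotient |f(x) - f(y)| / |x - y| over a grid
    return max(abs(f(x) - f(y)) / abs(x - y)
               for x in points for y in points if x != y)

grid = [i / 200 for i in range(1, 201)]
print(worst_quotient(math.sin, [math.pi * t for t in grid]))   # stays below 1
print(worst_quotient(math.sqrt, grid))                         # already large near 0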

Next we present a remarkable class of Lipschitz operators.

Theorem 729 Linear operators are Lipschitz.

The theorem is a consequence of the following lemma of independent interest.

Lemma 730 Given a linear operator f : R^n → R^m, there exists a constant k > 0 such that ‖f(x)‖ ≤ k‖x‖ for every x ∈ R^n.

In other words, if x ≠ 0 we have

‖f(x)‖ / ‖x‖ ≤ k

The ratio ‖f(x)‖/‖x‖ is thus bounded above by the constant k for all non-zero vectors x, so it cannot explode. In other words, there is no sequence {x_n} of vectors such that ‖f(x_n)‖/‖x_n‖ → +∞.

Proof Set k = ∑_{i=1}^n ‖f(eⁱ)‖. We have:

‖f(x)‖ = ‖f(∑_{i=1}^n x_i eⁱ)‖ = ‖∑_{i=1}^n x_i f(eⁱ)‖ ≤ ∑_{i=1}^n |x_i| ‖f(eⁱ)‖

Let x = (x₁, ..., x_n) ∈ R^n. For every j = 1, ..., n we have:

|x_j| = √(x_j²) ≤ √(∑_{j=1}^n x_j²) = ‖x‖   (16.3)

So, |x_i| ≤ ‖x‖ for each i = 1, ..., n. Therefore,

∑_{i=1}^n |x_i| ‖f(eⁱ)‖ ≤ ∑_{i=1}^n ‖x‖ ‖f(eⁱ)‖ = ‖x‖ ∑_{i=1}^n ‖f(eⁱ)‖ = k‖x‖

which implies ‖f(x)‖ ≤ k‖x‖, as desired.

Proof of Theorem 729 Let x, y ∈ R^n. Since f is linear, the last lemma implies

‖f(x) − f(y)‖ = ‖f(x − y)‖ ≤ k‖x − y‖

So, f is Lipschitz.

16.2 Local control


Lipschitzianity is a global property because the constant k in (16.1) is required to be the same for each pair of vectors x and y in B. It is, however, possible to give a local version of Lipschitzianity.

Definition 731 An operator f : A ⊆ R^m → R^n is said to be locally Lipschitz at a point x₀ ∈ A if there exist a neighborhood B_ε(x₀) and a positive scalar k_{x₀} > 0 such that

‖f(x) − f(y)‖ ≤ k_{x₀}‖x − y‖   ∀x, y ∈ B_ε(x₀) ∩ A

Note the local nature of this definition: the constant k_{x₀} depends on the point x₀ at hand, and the inequality is required only between points of a neighborhood of x₀ (not between any two points of the domain of f).
When f is locally Lipschitz at each point of a set B, we say that it is locally Lipschitz on B. If B is the entire domain, we say that the operator is locally Lipschitz, without further qualifications.
Now, the "variation control" that the independent variable exerts on the dependent variable is only local, in a neighborhood of a given point. This local control still translates into a strong form of continuity at a point (with k_{x₀} in place of k, (16.2) still holds as x_n → x₀), but no longer across points, as was the case with global Lipschitzianity.

Example 732 A function f : [a, b] → R is locally Lipschitz at x₀ ∈ (a, b) if there is a neighborhood B_ε(x₀) ⊆ [a, b] on which f is continuously differentiable. Indeed, set

k_{x₀} = max_{x∈[x₀−ε′, x₀+ε′]} |f′(x)|

where 0 < ε′ < ε. Since the derivative f′ is continuous on [x₀−ε′, x₀+ε′], by Weierstrass' Theorem the constant k_{x₀} is well defined. By proceeding as in Example 727, mutatis mutandis, the reader can then check that f is locally Lipschitz at x₀. N

Clearly, an operator that is Lipschitz on B is also locally Lipschitz on B. The converse fails, as the next example shows.

Example 733 The function f : R → R defined by f(x) = x² is easily seen to be locally Lipschitz at each x ∈ R. But f is not Lipschitz: otherwise, there would exist k such that |x² − y²| ≤ k|x − y| for all x, y ∈ R, so that |x + y| ≤ k for all x, y ∈ R, which is impossible. N

There is, however, an important case where local and global Lipschitzianity are equivalent.

Proposition 734 An operator f : A ⊆ R^m → R^n is Lipschitz on a compact set K ⊆ A if and only if it is locally Lipschitz on K.

Proof Since the "only if" is obvious, we only prove the "if." Assume that f is locally Lipschitz on K. Suppose, by contradiction, that f is not Lipschitz on K. Then there exist two sequences {x_n} and {y_n} in K such that

‖f(x_n) − f(y_n)‖ / ‖x_n − y_n‖ → +∞   (16.4)

Since K is compact, by the Bolzano-Weierstrass Theorem there exist two subsequences {x_{n_k}} and {y_{n_k}} such that x_{n_k} → x̄ ∈ K and y_{n_k} → ȳ ∈ K. Since f is continuous, we have f(x_{n_k}) → f(x̄) and f(y_{n_k}) → f(ȳ). We consider two cases.

(i) Suppose x̄ ≠ ȳ. Then, ‖x̄ − ȳ‖ > 0 and so

lim_{k→∞} ‖f(x_{n_k}) − f(y_{n_k})‖ / ‖x_{n_k} − y_{n_k}‖ = ‖f(x̄) − f(ȳ)‖ / ‖x̄ − ȳ‖ < +∞

which contradicts (16.4).

(ii) Suppose x̄ = ȳ. By hypothesis, f is locally Lipschitz at x̄, so there is B_ε(x̄) such that

‖f(x) − f(y)‖ ≤ k_{x̄}‖x − y‖   ∀x, y ∈ B_ε(x̄)

Since x_{n_k} → x̄ and y_{n_k} → x̄, there is a large enough k_ε ≥ 1 so that x_{n_k}, y_{n_k} ∈ B_ε(x̄) for all k ≥ k_ε. Then,

‖f(x_{n_k}) − f(y_{n_k})‖ / ‖x_{n_k} − y_{n_k}‖ ≤ k_{x̄}   ∀k ≥ k_ε

which contradicts (16.4).

In both cases, we thus end up with a contradiction. We conclude that f is Lipschitz on K.

The next important result shows that concave functions are locally Lipschitz, thus clarifying the continuity properties of these fundamental functions.

Theorem 735 Let f : C → R be a concave function defined on an open convex set C of R^n. Then, f is locally Lipschitz.

In view of Proposition 734, f is then Lipschitz on each compact set K ⊆ C. The theorem is a consequence of the following lemma of independent interest.

Lemma 736 Let f : C → R be a concave function defined on an open convex set C of R^n. Then, f is locally bounded at each x₀ ∈ C, i.e., there exist a positive scalar m_{x₀} > 0 and a neighborhood B_ε(x₀) such that

|f(x)| ≤ m_{x₀}   ∀x ∈ B_ε(x₀)

Proof Let x₀ ∈ C. Since C is open, there is δ > 0 small enough so that x₀ + δeⁱ ∈ C for every versor eⁱ. Consider the convex hull D = co{x₀, x₀ + δe¹, ..., x₀ + δeⁿ}. Since C is convex, we have D ⊆ C. By Proposition 645, int D is convex. Let x ∈ int D. There is {λ_i}_{i=0}^n, with each λ_i ≥ 0 and ∑_{i=0}^n λ_i = 1, such that x = λ₀x₀ + ∑_{i=1}^n λ_i(x₀ + δeⁱ). By concavity, we then have

f(x) ≥ λ₀f(x₀) + ∑_{i=1}^n λ_i f(x₀ + δeⁱ) ≥ min_{i=1,...,n} {f(x₀), f(x₀ + δeⁱ)}

Set m_{x₀} = min_{i=1,...,n} {f(x₀), f(x₀ + δeⁱ)}. We thus have f(x) ≥ m_{x₀} for all x ∈ int D. Given any neighborhood B_ε(x₀) ⊆ int D, we have f(x) ≥ m_{x₀} for all x ∈ B_ε(x₀).
So, f is locally bounded below. Next we show that it is also bounded above on B_ε(x₀). To this end, let y ∈ B_ε(x₀) and consider the point z = 2x₀ − y = x₀ − (y − x₀). Clearly, z ∈ B_ε(x₀) and x₀ = (z + y)/2. By concavity,

f(x₀) = f((1/2)z + (1/2)y) ≥ (1/2)f(z) + (1/2)f(y)

Since y was arbitrarily chosen in B_ε(x₀), we have

f(y) ≤ 2f(x₀) − f(z) ≤ 2f(x₀) − m_{x₀}   ∀y ∈ B_ε(x₀)

as desired.

Proof of Theorem 735 We want to show that f is locally Lipschitz at any x ∈ C. By the last lemma, f is locally bounded at x, i.e., there exist m_x ∈ R and a neighborhood B_{2ε}(x), without loss of generality of radius 2ε, such that |f(y)| ≤ m_x for all y ∈ B_{2ε}(x). Given y₁, y₂ ∈ B_ε(x), set

y₃ = y₂ + (ε/‖y₂ − y₁‖)(y₂ − y₁)

Then, y₃ ∈ B_{2ε}(x) since

‖y₃ − x‖ = ‖y₂ − x + (ε/‖y₂ − y₁‖)(y₂ − y₁)‖ ≤ ‖y₂ − x‖ + ε ≤ 2ε

Since

y₂ = (ε/(‖y₂ − y₁‖ + ε)) y₁ + (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε)) y₃

concavity implies

f(y₂) ≥ (ε/(‖y₂ − y₁‖ + ε)) f(y₁) + (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε)) f(y₃)

so that

f(y₁) − f(y₂) ≤ (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε))(f(y₁) − f(y₃)) ≤ (‖y₂ − y₁‖/ε) 2m_x   (16.5)

Interchanging the roles of y₁ and y₂, we get

f(y₂) − f(y₁) ≤ (‖y₁ − y₂‖/(‖y₁ − y₂‖ + ε))(f(y₂) − f(y₃)) ≤ (‖y₁ − y₂‖/ε) 2m_x

Along with (16.5), this implies

|f(y₁) − f(y₂)| ≤ (2m_x/ε) ‖y₁ − y₂‖

So, f is locally Lipschitz at x.

16.3 Translation invariance


Definition 737 A function f : R^n → R, with f(1) ≠ 0, is said to be:²

(i) translation invariant if, for all x ∈ R^n,

f(x + k) = f(x) + kf(1)   ∀k ≥ 0   (16.6)

(ii) Blackwell if in (16.6) we replace = with ≤.³

In words, a function is translation invariant if we can “take out positive constants”, a


very weak form of linearity. Indeed, if f is linear we can take out any function, a much
stronger property. Even less is required on Blackwell functions.4
Note that a translation invariant f is normalized (Section 13.1.4) provided f (0) = 0 and
f (1) = 1. Indeed, by taking x = 0, we then have f (k) = f (0 + k) = f (0) + kf (1) = k.
Before presenting an example, next we show that translation invariant is a stronger notion
than it may appear prima facie.
2
Throughout this section we set k = (k; :::; k) 2 Rn .
3
This terminology is not standard.
4
They are named after David Blackwell, who showed their great importance in dynamic programming.
Lemma 738 A function f : Rⁿ → R is translation invariant if and only if condition (16.6) holds for all scalars k ∈ R.

So, even if in the definition we only require invariance with respect to positive constants, it actually holds for any constant, positive or not.

Proof We only prove the "only if", the converse being trivial. Let f : Rⁿ → R be translation invariant. We need to prove that (16.6) holds when k < 0. Let c ≥ 0. For each x ∈ Rⁿ, we have f(x) = f(x − c + c) = f(x − c) + c f(1), so f(x − c) = f(x) − c f(1). Now, let k < 0. Since −k ≥ 0, setting c = −k, by what we just proved we have

f(x + k) = f(x − (−k)) = f(x) − (−k) f(1) = f(x) + k f(1)

as desired. ∎
Example 739 Define f : Rⁿ → R by

f(x) = min_{i=1,...,n} l_i(x)

where each l_i : Rⁿ → R is a linear function with l_i(1) = c ≠ 0. Clearly, f(0) = 0 and f(1) = c. The function f is translation invariant: for every x ∈ Rⁿ we have

f(x + k) = min_{i=1,...,n} l_i(x + k) = kc + min_{i=1,...,n} l_i(x) = f(x) + k f(1)    ∀k ≥ 0

It is normalized if and only if c = 1. Later in the book, Theorem 1169 will characterize this class of translation invariant functions. N
Though translation invariance is much weaker than linearity, under monotonicity we still have Lipschitzianity. Actually, for the result it is enough that the function be Blackwell.

Proposition 740 An increasing Blackwell function is Lipschitz.

Proof First, note that since f is increasing, we have f(1) > 0. Let x ∈ Rⁿ. By (16.3), we have |x_i| ≤ ‖x‖ for each i = 1, ..., n.⁵ Therefore, max_{i=1,...,n} |x_i − y_i| ≤ ‖x − y‖, which in turn implies x ≤ y + ‖x − y‖ for all x, y ∈ Rⁿ. Since f is increasing and Blackwell, we then have

f(x) ≤ f(y + ‖x − y‖) ≤ f(y) + ‖x − y‖ f(1)

So, f(x) − f(y) ≤ f(1) ‖x − y‖ for all x, y ∈ Rⁿ. By exchanging the roles of x and y, we also have f(y) − f(x) ≤ f(1) ‖x − y‖ for all x, y ∈ Rⁿ. We conclude that

|f(x) − f(y)| ≤ f(1) ‖x − y‖    ∀x, y ∈ Rⁿ

as desired. ∎

N.B. The proof shows that an increasing Blackwell function f is a contraction if and only if f(1) < 1. In applications, this is the most relevant case. O

Remarkably, as with positive homogeneity (Theorem 710), under translation invariance concavity and quasi-concavity are equivalent properties.

⁵ To ease matters, in this proof we write, with an abuse of notation, x − k and x + k in place of x − (k, ..., k) and x + (k, ..., k).

Theorem 741 A translation invariant function is concave if and only if it is quasi-concave.

Proof We only prove the "if", the converse being obvious. Let f be quasi-concave. For all x ∈ Rⁿ and t ∈ R we have, using the notation t = (t, ..., t),

f(x) ≥ t ⟺ f(x) − t ≥ 0 ⟺ f(x − t/f(1)) = f(x) − (t/f(1)) f(1) ≥ 0

So, for all x ∈ Rⁿ,

x ∈ (f ≥ t) ⟺ x − t/f(1) ∈ (f ≥ 0)    ∀t ∈ R

which implies⁶

(f ≥ t) = (f ≥ 0) + t/f(1)    ∀t ∈ R

If t and s are any two scalars and λ ∈ (0, 1), then, since by quasi-concavity the upper contour set (f ≥ 0) is convex and so λ(f ≥ 0) + (1−λ)(f ≥ 0) = (f ≥ 0),

λ(f ≥ t) + (1−λ)(f ≥ s) = λ(f ≥ 0) + (1−λ)(f ≥ 0) + (λt + (1−λ)s)/f(1)    (16.7)
= (f ≥ 0) + (λt + (1−λ)s)/f(1) = (f ≥ λt + (1−λ)s)

Take any two points x, y ∈ Rⁿ and set f(x) = t and f(y) = s. Then x ∈ (f ≥ t) and y ∈ (f ≥ s), and λx + (1−λ)y ∈ λ(f ≥ t) + (1−λ)(f ≥ s). By (16.7), λx + (1−λ)y ∈ (f ≥ λt + (1−λ)s), that is,

f(λx + (1−λ)y) ≥ λt + (1−λ)s = λf(x) + (1−λ)f(y)

So, f is concave. ∎

Example 742 Define f : Rⁿ → R by

f(x) = −(1/λ) log ∑_{i=1}^n α_i e^{−θx_i}

where λ, θ > 0 and ∑_{i=1}^n α_i = 1, with each α_i ≥ 0. We have f(0) = 0 and f(1) = θ/λ, as well as f(x + k) = f(x) + k f(1) for all k ≥ 0. Hence, f is translation invariant, while it is normalized if and only if θ = λ. The function f is easily seen to be increasing. It is also quasi-concave because log ∑_{i=1}^n α_i e^{−θx_i} is quasi-convex, being a strictly increasing transformation of the convex function ∑_{i=1}^n α_i e^{−θx_i} (which is a sum of convex functions). By the last two results, we conclude that f is concave and Lipschitz. It is a contraction if and only if 0 < θ < λ. N

⁶ To be precise, the right-hand side is the sum of sets

(f ≥ 0) + t/f(1) = {x + t/f(1) : x ∈ (f ≥ 0)}

in the sense of Section 32.3. Later in the proof we also add upper contour sets.
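The properties claimed in Example 742 are easy to probe numerically. The following Python sketch uses hypothetical parameter values θ = 0.5, λ = 1 and random weights α_i (none of these values come from the text) to check translation invariance and the contraction bound |f(x) − f(y)| ≤ (θ/λ)‖x − y‖ of Proposition 740 on random points.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, lam = 4, 0.5, 1.0              # hypothetical parameters with 0 < theta < lam
alpha = rng.random(n)
alpha /= alpha.sum()                     # weights alpha_i >= 0 summing to 1

def f(x):
    # f(x) = -(1/lam) * log(sum_i alpha_i * exp(-theta * x_i))
    return -np.log(alpha @ np.exp(-theta * x)) / lam

x, k = rng.normal(size=n), 2.3
# translation invariance: f(x + k*1) = f(x) + k*f(1), with f(1) = theta/lam
print(np.isclose(f(x + k), f(x) + k * theta / lam))      # True
# Lipschitz/contraction bound on random pairs
for _ in range(1000):
    y, z = rng.normal(size=(2, n))
    assert abs(f(y) - f(z)) <= theta / lam * np.linalg.norm(y - z) + 1e-12
print("contraction bound verified on 1000 random pairs")
```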
Chapter 17

Supermodular functions

17.1 Lattices

We begin by introducing lattices, an important class of sets. Given any two vectors x, y ∈ Rⁿ, the join x ∨ y is the vector of Rⁿ such that

(x ∨ y)_i = max{x_i, y_i}    ∀i = 1, ..., n

while the meet x ∧ y is the vector of Rⁿ such that

(x ∧ y)_i = min{x_i, y_i}    ∀i = 1, ..., n

In words, x ∨ y is the smallest vector that is larger than both x and y, while x ∧ y is the largest vector that is smaller than both of them. That is, for all z ∈ Rⁿ we have

z ≥ x and z ≥ y ⟹ z ≥ x ∨ y

and

z ≤ x and z ≤ y ⟹ z ≤ x ∧ y

Example 743 Let x = (0, 1) and y = (2, 0) be two vectors in the plane. We have

(x ∨ y)_1 = max{x_1, y_1} = max{0, 2} = 2 ,  (x ∨ y)_2 = max{x_2, y_2} = max{1, 0} = 1

so x ∨ y = (2, 1), while

(x ∧ y)_1 = min{x_1, y_1} = min{0, 2} = 0 ,  (x ∧ y)_2 = min{x_2, y_2} = min{1, 0} = 0

so x ∧ y = (0, 0). N
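Joins and meets are just componentwise maxima and minima, so they are immediate to compute. Here is a minimal NumPy sketch of Example 743 that also anticipates the identity x + y = x ∨ y + x ∧ y of Proposition 746 below.

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([2.0, 0.0])

join = np.maximum(x, y)    # x ∨ y = (2, 1), componentwise max
meet = np.minimum(x, y)    # x ∧ y = (0, 0), componentwise min
print(join, meet)

# the identity x + y = x ∨ y + x ∧ y (Proposition 746 below)
print(np.allclose(x + y, join + meet))   # True
```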

Definition 744 A set L of Rⁿ is a lattice if, for any two elements x and y of L, both x ∨ y and x ∧ y belong to L.

Lattices are, thus, subsets L of Rⁿ that are closed under joins and meets, that is, both the join and the meet of any two of their elements belong to L.


Example 745 (i) Given any x, y ∈ Rⁿ, the quadruple {x, y, x∨y, x∧y} is the simplest example of a finite lattice. (ii) Given any a, b ∈ Rⁿ, with a ≤ b, the interval

[a, b] = {x ∈ Rⁿ : a ≤ x ≤ b}

is clearly a lattice. Indeed, if a ≤ x ≤ b and a ≤ y ≤ b, it is easy to check that a ≤ x∧y ≤ x∨y ≤ b. Also the open and half-closed intervals in Rⁿ are easily seen to be lattices. (iii) A rectangle I = I_1 × ⋯ × I_n in Rⁿ, where each I_i is an interval of the real line (bounded or not), is a lattice. The intervals [a, b] are the compact rectangles, in which I_i = [a_i, b_i]. N

The next simple, yet key, property relates meets, joins, and sums.

Proposition 746 Given any x, y ∈ Rⁿ, we have

x + y = x ∨ y + x ∧ y    (17.1)

Proof The equality is trivially true if x and y are scalars. If x and y are vectors of Rⁿ, we then have

x ∧ y + x ∨ y = ((x∧y)_1, ..., (x∧y)_n) + ((x∨y)_1, ..., (x∨y)_n)
= ((x∧y)_1 + (x∨y)_1, ..., (x∧y)_n + (x∨y)_n) = (x_1 + y_1, ..., x_n + y_n) = x + y

as desired. ∎

17.2 Supermodular functions

Next we introduce functions that have lattices as their natural domain.¹

Definition 747 A function f : L ⊆ Rⁿ → R is said to be:

(i) supermodular if f(x∨y) + f(x∧y) ≥ f(x) + f(y) for all x, y ∈ L;

(ii) submodular if the inequality is reversed;

(iii) modular if it is both supermodular and submodular.

Clearly, supermodularity and submodularity are dual notions, with f supermodular if and only if −f is submodular. In the rest of the chapter we will focus on supermodular functions.

Example 748 (i) Functions of a single variable are modular. Indeed, let x, y ∈ R with, say, x ≥ y. Then, x ∧ y = y and x ∨ y = x, so modularity trivially holds. (ii) Linear functions f : Rⁿ → R are modular: by (17.1) we have

f(x∨y) + f(x∧y) = f(x∨y + x∧y) = f(x + y) = f(x) + f(y)

for all x, y ∈ Rⁿ. (iii) The function f : R²₊ → R defined by f(x_1, x_2) = x_1x_2 is supermodular, as the reader can check. N

¹ Throughout the chapter L denotes a lattice in Rⁿ.
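Example 748-(iii) leaves the supermodularity of f(x_1, x_2) = x_1x_2 to the reader. Algebraically, f(x∨y) + f(x∧y) − f(x) − f(y) = (x_1 − y_1)(y_2 − x_2) ≥ 0 whenever, say, x_1 ≥ y_1 and x_2 ≤ y_2, and equals 0 in the comonotone cases. A brute-force numerical check on random points (a sketch, of course not a proof) is also reassuring:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(v):
    return v[0] * v[1]

for _ in range(10_000):
    x, y = rng.random(2), rng.random(2)
    join, meet = np.maximum(x, y), np.minimum(x, y)
    # supermodularity: f(x ∨ y) + f(x ∧ y) >= f(x) + f(y)
    assert f(join) + f(meet) >= f(x) + f(y) - 1e-12
print("no counterexample found on 10,000 random pairs")
```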

Interestingly, the modularity notions just introduced have no bite on functions of a single
variable, so they are of interest only in the multivariable case. That said, the next two results
show how to manufacture supermodular functions via convex transformations.
Proposition 749 Let f : L → R be a monotone and modular function. If φ : C → R is a convex function defined on a convex set C of the real line, with Im f ⊆ C, then φ∘f is supermodular.

Proof Let x, y ∈ L with, say, f(x) ≥ f(y). By modularity, we have f(x∨y) − f(x) = f(y) − f(x∧y). We consider two cases. (i) Suppose that f is increasing. We then have f(x∧y) ≤ f(y) ≤ f(x) ≤ f(x∨y). Since φ has increasing increments (cf. Proposition 1090), we then have φ(f(y)) − φ(f(x∧y)) ≤ φ(f(x∨y)) − φ(f(x)). So, φ∘f is supermodular. (ii) Suppose that f is decreasing. Now, f(x∨y) ≤ f(y) ≤ f(x) ≤ f(x∧y) and, since φ has increasing increments, we have φ(f(y)) − φ(f(x∨y)) ≤ φ(f(x∧y)) − φ(f(x)). We conclude that also in this case φ∘f is supermodular. ∎
Example 750 Let f : Rⁿ → R be a positive linear function. Given any convex function φ : R → R, the function φ∘f is supermodular. N
Proposition 751 Let f : L → R be an increasing and supermodular function. If φ : C → R is a convex and increasing function defined on a convex set C of the real line, with Im f ⊆ C, then φ∘f is supermodular.

Proof Let x, y ∈ L with, say, f(x) ≥ f(y). Since f is increasing, we have f(x∧y) ≤ f(y) ≤ f(x) ≤ f(x∨y). Set k = f(x∨y) − f(x) and h = f(y) − f(x∧y). Since f is supermodular, we have k ≥ h ≥ 0. Since φ has increasing increments, we then have

φ(f(y)) − φ(f(x∧y)) = φ(f(x∧y) + h) − φ(f(x∧y)) ≤ φ(f(x) + h) − φ(f(x)) ≤ φ(f(x) + k) − φ(f(x)) = φ(f(x∨y)) − φ(f(x))

where the last inequality holds because φ is increasing. So, φ∘f is supermodular. ∎
Example 752 Define f : R²₊ → R by f(x_1, x_2) = x_1x_2. Given any increasing and convex function φ : R₊ → R, the function φ∘f is supermodular. N

17.3 Functions with increasing cross differences

17.3.1 Sections

A function f : A_1 × A_2 → R defined on a Cartesian product A_1 × A_2 induces the functions f^{x_1} : A_2 → R defined by f^{x_1}(x_2) = f(x_1, x_2) for each x_1 ∈ A_1, as well as the functions f^{x_2} : A_1 → R defined by f^{x_2}(x_1) = f(x_1, x_2) for each x_2 ∈ A_2. These functions are called the sections of f.

Example 753 Consider the function f : [1, +∞) × [3, +∞) → R defined by f(x_1, x_2) = √((x_1 − 1)(x_2 − 3)). For a fixed x_1 ≥ 1, the section f^{x_1} : [3, +∞) → R has x_2 as the independent variable. For instance, if x_1 = 5 the section f⁵ : [3, +∞) → R is defined by f⁵(x_2) = 2√(x_2 − 3). On the other hand, for a fixed x_2 ≥ 3, the section f^{x_2} : [1, +∞) → R has x_1 as the independent variable. For example, if x_2 = 12 the section f¹² : [1, +∞) → R is defined by f¹²(x_1) = 3√(x_1 − 1). N

More generally, a function f : A_1 × ⋯ × A_n → R defined on a Cartesian product A_1 × ⋯ × A_n induces, for each i = 1, ..., n, the sections f^{x_i} : A_{−i} → R defined by f^{x_i}(x_{−i}) = f(x_i, x_{−i}), in which the vector x_{−i} is the variable.²

On the other hand, rather than blocking a single variable, we can do the opposite: block all but a single variable. In this case, for each i = 1, ..., n we have the section f^{x_{−i}} : A_i → R defined by f^{x_{−i}}(x_i) = f(x_i, x_{−i}), in which the scalar x_i is the variable.

Example 754 Consider the function f : [1, +∞) × [3, +∞) × [2, +∞) → R defined by f(x_1, x_2, x_3) = √((x_1 − 1)(x_2 − 3)(x_3 − 2)). For a fixed x_1 ≥ 1, the section f^{x_1} : [3, +∞) × [2, +∞) → R now has x_2 and x_3 as the independent variables – indeed, we have x_{−1} = (x_2, x_3). For instance, if x_1 = 5 the section f⁵ : [3, +∞) × [2, +∞) → R is defined by f⁵(x_2, x_3) = 2√((x_2 − 3)(x_3 − 2)). In a similar way we can define the sections f^{x_2} : [1, +∞) × [2, +∞) → R and f^{x_3} : [1, +∞) × [3, +∞) → R.

On the other hand, if we fix x_{−1} = (x_2, x_3) ∈ [3, +∞) × [2, +∞), we have the section f^{x_2,x_3} : [1, +∞) → R that has x_1 as the independent variable. For instance, if x_2 = 6 and x_3 = 10, the section f^{6,10} : [1, +∞) → R is defined by f^{6,10}(x_1) = 2√6 √(x_1 − 1). In a similar way we can define the sections f^{x_1,x_3} : [3, +∞) → R and f^{x_1,x_2} : [2, +∞) → R. N

The sections f^{x_{−i}} can be used to formalize ceteris paribus arguments in which all variables are kept fixed, except x_i. Indeed, partial differentiation at a point x ∈ Rⁿ can be expressed in terms of these sections:

∂f(x)/∂x_i = lim_{h→0} [f^{x_{−i}}(x_i + h) − f^{x_{−i}}(x_i)] / h

In sum, we have sections f^{x_i} in which the variable x_i is kept fixed and the other variables vary, as well as sections f^{x_{−i}} in which the opposite holds: the variable x_i is the only independent variable, the other ones being kept fixed. In a similar spirit, we can have "intermediate" sections in which we block a subset of the variables.

Example 755 Consider the function f : [1, +∞) × [3, +∞) × [2, +∞) × [−1, +∞) → R defined by f(x_1, x_2, x_3, x_4) = √((x_1 − 1)(x_2 − 3)(x_3 − 2)(x_4 + 1)). The "intermediate" section f^{x_2,x_3} : [1, +∞) × [−1, +∞) → R has x_1 and x_4 as independent variables. So, if x_2 = 6 and x_3 = 5, we have f^{6,5}(x_1, x_4) = 3√((x_1 − 1)(x_4 + 1)). N

In terms of notation, the sections f^{x_{−i}} : A_i → R and f^{x_i} : A_{−i} → R are often written as f(·, x_{−i}) : A_i → R and f(x_i, ·) : A_{−i} → R, respectively. For instance, we then write

∂f(x)/∂x_i = lim_{h→0} [f(x_i + h, x_{−i}) − f(x)] / h

Though this notation is handier, superscripts best emphasize the parametric role of the blocked variables.

² Recall the notation x_{−i} from Section 12.8.1. Here A_{−i} is the Cartesian product of all the sets {A_1, ..., A_n} except A_i, i.e., A_{−i} = ×_{j≠i} A_j.

17.3.2 Increasing cross differences and complementarity

In what follows we denote by I = I_1 × ⋯ × I_n a rectangle in Rⁿ, where each interval I_i is bounded or not.

Definition 756 A function f : I ⊆ Rⁿ → R has increasing (cross) differences if, for each x_i ∈ I_i and h_i ≥ 0 with x_i + h_i ∈ I_i, the difference

f^{x_i+h_i}(x_{−i}) − f^{x_i}(x_{−i})

is increasing in x_{−i}, while f has decreasing differences if such a difference is decreasing in x_{−i}.

Increasing and decreasing differences are dual notions, so we will focus on the former. For functions of two variables, we have a simple characterization of this property.

Proposition 757 A function f : I ⊆ R² → R of two variables has increasing differences if and only if

f(x_1, x_2 + h_2) − f(x_1, x_2) ≤ f(x_1 + h_1, x_2 + h_2) − f(x_1 + h_1, x_2)    (17.2)

for all x_i ∈ I_i and h_i ≥ 0 with x_i + h_i ∈ I_i.

Proof Let (x_1, x_2) ∈ I and (h_1, h_2) ≥ 0 with x_i + h_i ∈ I_i. By definition, f has increasing differences when the differences

f^{x_1+h_1}(x_2) − f^{x_1}(x_2)  and  f^{x_2+h_2}(x_1) − f^{x_2}(x_1)

are increasing in x_2 ∈ I_2 and in x_1 ∈ I_1, respectively. In particular, we then have

f^{x_1+h_1}(x_2) − f^{x_1}(x_2) ≤ f^{x_1+h_1}(x_2 + h_2) − f^{x_1}(x_2 + h_2)    (17.3)

and

f^{x_2+h_2}(x_1) − f^{x_2}(x_1) ≤ f^{x_2+h_2}(x_1 + h_1) − f^{x_2}(x_1 + h_1)    (17.4)

which are both equivalent to (17.2). ∎

The inequality (17.2) admits an important economic interpretation. If f is a production function, it says that the marginal contribution of increasing the second input from x_2 to x_2 + h_2 increases when we increase the first input from x_1 to x_1 + h_1. By rearranging the terms in the inequality (17.2) we have

f(x_1 + h_1, x_2) − f(x_1, x_2) ≤ f(x_1 + h_1, x_2 + h_2) − f(x_1, x_2 + h_2)

So, symmetrically, an increase in the first input has a higher impact when also the second input increases. In sum, the marginal contribution of an input is increasing in the other input: the two inputs are complementary.

Proposition 758 A function f : I ⊆ Rⁿ → R has increasing differences if and only if, for each 1 ≤ i ≠ j ≤ n, the section f^{x_{−ij}} : I_i × I_j ⊆ R² → R satisfies (17.2), i.e.,

f^{x_{−ij}}(x_i, x_j + h_j) − f^{x_{−ij}}(x_i, x_j) ≤ f^{x_{−ij}}(x_i + h_i, x_j + h_j) − f^{x_{−ij}}(x_i + h_i, x_j)    (17.5)

for all (x_i, x_j) ∈ I_i × I_j and h_i, h_j ≥ 0 with x_i + h_i ∈ I_i and x_j + h_j ∈ I_j.



In terms of the previous interpretation, we can say that a production function has increasing differences if and only if its inputs are pairwise complementary. Increasing differences thus model this form of complementarity. In a dual way, decreasing differences model an analogous form of substitutability.
Proof Assume that f has increasing differences. To fix ideas, let i = 1 and j = 2. We want to show that

f^{x_{−12}}(x_1, x_2 + h_2) − f^{x_{−12}}(x_1, x_2) ≤ f^{x_{−12}}(x_1 + h_1, x_2 + h_2) − f^{x_{−12}}(x_1 + h_1, x_2)

We have

f^{x_{−12}}(x_1, x_2 + h_2) − f^{x_{−12}}(x_1, x_2) = f(x_1, x_2 + h_2, x_3, ..., x_n) − f(x_1, x_2, x_3, ..., x_n)
= f^{x_2+h_2}(x_1, x_3, ..., x_n) − f^{x_2}(x_1, x_3, ..., x_n)
≤ f^{x_2+h_2}(x_1 + h_1, x_3, ..., x_n) − f^{x_2}(x_1 + h_1, x_3, ..., x_n)
= f(x_1 + h_1, x_2 + h_2, x_3, ..., x_n) − f(x_1 + h_1, x_2, x_3, ..., x_n)
= f^{x_{−12}}(x_1 + h_1, x_2 + h_2) − f^{x_{−12}}(x_1 + h_1, x_2)

as desired, where the inequality follows from increasing differences. The general case is analogous, just notationally more cumbersome. So, (17.5) holds. We omit the proof of the converse.

The complementarity nature of functions with increasing differences, in which "the marginal contribution of an input is increasing in the other input", has, mathematically, a (cross) second-order flavor. The next differential characterization confirms this intuition.

Proposition 759 A continuously differentiable function f : (a, b) ⊆ Rⁿ → R has increasing differences if and only if, for each 1 ≤ i ≠ j ≤ n, we have

∂²f(x)/∂x_i∂x_j ≥ 0    (17.6)

Proof "Only if". Suppose f has increasing differences. To fix ideas, let i = 1 and j = 2. By Proposition 758, the section f^{x_{−12}} : I_1 × I_2 → R satisfies (17.2). Let x_1 ≥ x_1′. By setting h_1 = x_1 − x_1′, we get

[f^{x_{−12}}(x_1, x_2 + h_2) − f^{x_{−12}}(x_1, x_2)] / h_2 ≥ [f^{x_{−12}}(x_1′, x_2 + h_2) − f^{x_{−12}}(x_1′, x_2)] / h_2

So, letting h_2 → 0, we conclude that

x_1 ≥ x_1′ ⟹ ∂f(x_1, x_2, ..., x_n)/∂x_2 ≥ ∂f(x_1′, x_2, ..., x_n)/∂x_2

In turn, this implies ∂²f(x)/∂x_2∂x_1 ≥ 0. A similar argument shows that ∂²f(x)/∂x_1∂x_2 ≥ 0.

"If". Suppose ∂²f(x)/∂x_i∂x_j ≥ 0 for all 1 ≤ i ≠ j ≤ n. In view of Proposition 758, it is enough to show that the sections f^{x_{−ij}} have increasing differences. Again to fix ideas, let i = 1 and j = 2. By hypothesis, x_1 ≥ x_1′ implies ∂f^{x_{−12}}(x_1, t)/∂x_2 ≥ ∂f^{x_{−12}}(x_1′, t)/∂x_2. Since f is continuously differentiable, its partial derivatives are continuous. So, we have

f^{x_{−12}}(x_1, x_2 + h_2) − f^{x_{−12}}(x_1, x_2) = ∫_{x_2}^{x_2+h_2} [∂f^{x_{−12}}(x_1, t)/∂x_2] dt
≤ ∫_{x_2}^{x_2+h_2} [∂f^{x_{−12}}(x_1 + h_1, t)/∂x_2] dt = f^{x_{−12}}(x_1 + h_1, x_2 + h_2) − f^{x_{−12}}(x_1 + h_1, x_2)

By Proposition 757, f^{x_{−12}} has increasing differences. ∎

Example 760 (i) Let f : R²₊ → R be a CES production function defined by f(x) = (αx_1^ρ + (1−α)x_2^ρ)^{1/ρ}, with α ∈ [0, 1] and ρ > 0 (cf. Example 705). We have

∂²f(x)/∂x_1∂x_2 = (1 − ρ)α(1 − α)(x_1x_2)^{ρ−1}(αx_1^ρ + (1−α)x_2^ρ)^{1/ρ−2}

By the previous result, f has decreasing differences if ρ > 1 and increasing differences if 0 < ρ < 1. So, the parameter ρ determines whether the inputs in the CES production function are complements or substitutes. (ii) Let f : R²₊ → R be a Cobb-Douglas production function f(x) = x_1^{α_1}x_2^{α_2}, with α_1, α_2 > 0 (cf. Example 708). Since ∂²f(x)/∂x_1∂x_2 = α_1α_2 x_1^{α_1−1}x_2^{α_2−1} ≥ 0, by the previous result f has increasing differences (so, its inputs are complements). N
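The sign test (17.6) is easy to apply numerically via finite differences. A sketch for the Cobb-Douglas case of Example 760-(ii), with hypothetical exponents α_1 = 0.4 and α_2 = 0.7 (any positive values would do), comparing a central-difference estimate of the cross partial with the exact expression:

```python
import numpy as np

a1, a2 = 0.4, 0.7     # hypothetical Cobb-Douglas exponents, both > 0

def f(x1, x2):
    return x1**a1 * x2**a2

def cross_partial(f, x1, x2, h=1e-4):
    # central-difference estimate of d^2 f / dx1 dx2
    return (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
            - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h**2)

x1, x2 = 2.0, 3.0
print(cross_partial(f, x1, x2))                  # numeric estimate, positive
print(a1 * a2 * x1**(a1 - 1) * x2**(a2 - 1))     # exact value, for comparison
```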

Next we establish a key characterization of increasing differences through supermodularity, a simpler analytical property. Because of this result, one can say that supermodular functions model complementarities.

Theorem 761 A function f : I ⊆ Rⁿ → R has increasing differences if and only if it is supermodular, i.e.,

f(x∨y) + f(x∧y) ≥ f(x) + f(y)    ∀x, y ∈ I

A function f of several variables is easily seen to admit the following "telescopic" expansion: if x ≤ y, then

f(y) − f(x) = [f(y_1, x_2, ..., x_n) − f(x_1, ..., x_n)] + [f(y_1, y_2, x_3, ..., x_n) − f(y_1, x_2, x_3, ..., x_n)] + ⋯ + [f(y_1, ..., y_n) − f(y_1, ..., y_{n−1}, x_n)]
= ∑_{i=1}^n [f(y_1, ..., y_i, x_{i+1}, ..., x_n) − f(y_1, ..., y_{i−1}, x_i, ..., x_n)]

The proof of the previous theorem relies on this expansion.

Proof "Only if". Suppose that f has increasing differences. Let x, y ∈ I. By (17.1), we can set

h = x∨y − x = y − x∧y ≥ 0

and so x∨y = x + h and x∧y = y − h. By the telescopic expansion, we have

f(x∨y) − f(x) = f(x + h) − f(x)
= ∑_{i=1}^n [f(x_1+h_1, ..., x_i+h_i, x_{i+1}, ..., x_n) − f(x_1+h_1, ..., x_{i−1}+h_{i−1}, x_i, ..., x_n)]
= ∑_{i=1}^n [f^{x_i+h_i}(x_1+h_1, ..., x_{i−1}+h_{i−1}, x_{i+1}, ..., x_n) − f^{x_i}(x_1+h_1, ..., x_{i−1}+h_{i−1}, x_{i+1}, ..., x_n)]
≥ ∑_{i=1}^n [f^{x_i+h_i}(y_1, ..., y_{i−1}, y_{i+1}−h_{i+1}, ..., y_n−h_n) − f^{x_i}(y_1, ..., y_{i−1}, y_{i+1}−h_{i+1}, ..., y_n−h_n)]
= ∑_{i=1}^n [f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}(x_i + h_i) − f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}(x_i)]
≥ ∑_{i=1}^n [f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}(y_i) − f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}(y_i − h_i)]
= ∑_{i=1}^n [f(y_1, ..., y_{i−1}, y_i, y_{i+1}−h_{i+1}, ..., y_n−h_n) − f(y_1, ..., y_{i−1}, y_i−h_i, y_{i+1}−h_{i+1}, ..., y_n−h_n)]
= f(y) − f(y − h) = f(y) − f(x∧y)

where the first inequality follows from increasing differences (the arguments in the fourth line are componentwise smaller than those in the third), while the second one holds because a function of a single variable – like the section f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n} – is trivially supermodular, with x_i + h_i = x_i ∨ y_i and y_i − h_i = x_i ∧ y_i. Rearranging, f(x∨y) + f(x∧y) ≥ f(x) + f(y), so f is supermodular.

"If". Suppose that f is supermodular. In view of Proposition 758, it is enough to show that (17.5) holds. Let y = (x_i, x_j + h_j, x_{−ij}) ∈ Rⁿ and z = (x_i + h_i, x_j, x_{−ij}) ∈ Rⁿ, so that z∨y = (x_i + h_i, x_j + h_j, x_{−ij}) and z∧y = x. By the supermodularity of f, we then have

f^{x_{−ij}}(x_i + h_i, x_j + h_j) + f^{x_{−ij}}(x_i, x_j) = f(x_i + h_i, x_j + h_j, x_{−ij}) + f(x) = f(z∨y) + f(z∧y)
≥ f(z) + f(y) = f(x_i + h_i, x_j, x_{−ij}) + f(x_i, x_j + h_j, x_{−ij}) = f^{x_{−ij}}(x_i + h_i, x_j) + f^{x_{−ij}}(x_i, x_j + h_j)

as desired. ∎

17.4 Supermodularity and concavity

In general, concavity and supermodularity are independent properties: there exist supermodular functions that are not concave (just take any non-concave function of a single variable) as well as concave functions that are not supermodular – for instance, the function f : R²₊₊ → R defined by f(x_1, x_2) = log(x_1 + x_2) is concave but not supermodular.

Remarkably, supermodular functions become tightly connected to concave functions under either positive homogeneity or translation invariance, as was the case for quasi-concave functions (Theorems 710 and 741).

Theorem 762 (Choquet) Let f : Rⁿ₊ → R be positively homogeneous. If f is supermodular, then it is concave. The converse holds if n = 2.

For production functions, this means that, under constant returns to scale, complementarity implies concavity.

Proof We only prove the result when f is twice differentiable on Rⁿ₊₊. Let x ∈ Rⁿ₊₊ and y ∈ Rⁿ. From

(y_i/x_i − y_j/x_j)² = y_i²/x_i² + y_j²/x_j² − 2 y_iy_j/(x_ix_j)

it follows that

y_iy_j = (1/2)(x_j/x_i)y_i² + (1/2)(x_i/x_j)y_j² − (1/2) x_ix_j (y_i/x_i − y_j/x_j)²

So, writing f_{ij}(x) for ∂²f(x)/∂x_i∂x_j,

∑_{1≤i,j≤n} f_{ij}(x) y_iy_j = ∑_{i=1}^n (y_i²/x_i) (∑_{j=1}^n f_{ij}(x) x_j) − (1/2) ∑_{1≤i,j≤n} f_{ij}(x) x_ix_j (y_i/x_i − y_j/x_j)²

Since f is positively homogeneous, by Euler's formula we have

f(x) = ∑_{i=1}^n (∂f(x)/∂x_i) x_i    ∀x ∈ Rⁿ₊₊

By differentiating with respect to x_j, we then have

∂f(x)/∂x_j = ∂f(x)/∂x_j + ∑_{i=1}^n f_{ij}(x) x_i    ∀x ∈ Rⁿ₊₊

that is,

∑_{i=1}^n f_{ij}(x) x_i = 0    ∀x ∈ Rⁿ₊₊

We conclude that, for all x ∈ Rⁿ₊₊,

∑_{1≤i,j≤n} f_{ij}(x) y_iy_j = −(1/2) ∑_{1≤i,j≤n} f_{ij}(x) x_ix_j (y_i/x_i − y_j/x_j)² = −(1/2) ∑_{1≤i≠j≤n} f_{ij}(x) x_ix_j (y_i/x_i − y_j/x_j)² ≤ 0

where the last inequality follows from (17.6) and Theorem 761. The Hessian matrix of f is thus negative semidefinite at every x ∈ Rⁿ₊₊, and so f is concave on Rⁿ₊₊. The reader can check that the converse holds when n = 2. ∎

Example 763 Let f : R²₊ → R be the positively homogeneous function defined by f(x) = (x_1^{α_1}x_2^{α_2})^{1/(α_1+α_2)}, with α_1, α_2 > 0. It is supermodular if α_1 + α_2 ≤ 1 (why?), so it is concave by Choquet's Theorem. N

A similar result holds for translation invariant functions (we omit the proof of this noteworthy result).

Theorem 764 Let f : Rⁿ → R be translation invariant. If f is supermodular, then it is concave. The converse holds if n = 2.

17.5 Log-convex functions

In what follows we denote by C a convex set in Rⁿ.

Definition 765 A strictly positive function f : C → (0, ∞) is said to be log-convex if

f(λx + (1−λ)y) ≤ [f(x)]^λ [f(y)]^{1−λ}

for every x, y ∈ C and λ ∈ [0, 1], and it is said to be log-concave if the inequality is reversed.

The next lemma motivates the terminology.

Lemma 766 A strictly positive function f : C → (0, ∞) defined on a convex set C is log-convex (log-concave) if and only if the composite function log f is convex (concave).

Proof We prove the convex version, the concave one being similar. "If". Let log f be convex. In view of Proposition 43, we have

f(λx + (1−λ)y) = e^{log f(λx+(1−λ)y)} ≤ e^{λ log f(x) + (1−λ) log f(y)} = e^{λ log f(x)} e^{(1−λ) log f(y)} = e^{log[f(x)]^λ} e^{log[f(y)]^{1−λ}} = [f(x)]^λ [f(y)]^{1−λ}

So, f is log-convex. "Only if". Let f be log-convex. Then,

log f(λx + (1−λ)y) ≤ log([f(x)]^λ [f(y)]^{1−λ}) = log[f(x)]^λ + log[f(y)]^{1−λ} = λ log f(x) + (1−λ) log f(y)

as desired. ∎
Example 767 (i) The function f : R → (0, ∞) given by f(x) = e^{x²} is log-convex. (ii) The Gaussian function f : R → (0, ∞) defined by f(x) = e^{−x²} is log-concave. (iii) The exponential function is both log-concave and log-convex. N

Log-convexity is much better behaved than log-concavity, as the next result and example show. They are far from being dual notions.

Proposition 768 (i) Log-convex functions are convex. (ii) Concave functions are log-concave, and log-concave functions are in turn quasi-concave.

Proof (i) Let f be log-convex. Since log f is convex, the result follows from the convex version of Proposition 676-(i) because we can write f = e^{log f}. (ii) Obvious. ∎

Example 769 The quadratic function f : (0, ∞) → (0, ∞) defined by f(x) = x² is, at the same time, strictly convex and log-concave. Indeed, in view of the last lemma, it is enough to note that log f(x) = 2 log x is concave. So, the converse of point (i) of the last proposition fails (there exist convex functions that are not log-convex), while point (ii) is all we can say about log-concave functions (they can even be strictly convex). N

It is easy to check that the product of log-convex functions is log-convex, as well as that
the product of log-concave functions is log-concave. Addition, instead, does not preserve
log-concavity.

Example 770 Let f, g : R → R be the log-concave functions given by f(x) = e^x and g(x) = e^{2x}. Their sum h(x) = e^x + e^{2x} is not log-concave. Indeed,

d²/dx² log(e^x + e^{2x}) = e^{−x} / (1 + e^{−x})² > 0

so log h is not concave. N
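The failure of log-concavity in Example 770 can also be seen numerically: a second-difference estimate of (log h)″ is strictly positive, matching the closed form above (a sketch).

```python
import numpy as np

def log_h(x):
    return np.log(np.exp(x) + np.exp(2 * x))

h = 1e-4
for x in (-1.0, 0.0, 1.0):
    second_diff = (log_h(x + h) - 2 * log_h(x) + log_h(x - h)) / h**2
    exact = np.exp(-x) / (1 + np.exp(-x))**2
    print(x, second_diff, exact)   # both strictly positive: log h is convex here
```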

As a further proof of the much better behavior of log-convexity, we have the following
remarkable result that shows that addition preserves log-convexity (we omit the proof).

Theorem 771 (Artin) The sum of log-convex functions is log-convex.

Example 772 Given n strictly positive scalars t_i > 0 and a strictly positive function φ : (0, ∞) → (0, ∞), define f : C → (0, ∞) by

f(x) = ∑_{i=1}^n φ(t_i) t_i^x

where C is any interval of the real line, bounded or not. By Artin's Theorem, f is log-convex. Indeed, each function φ(t_i) t_i^x is log-convex in x because log(φ(t_i) t_i^x) = log φ(t_i) + x log t_i is affine in x.

An integral version of Artin's Theorem actually permits to conclude that, if φ is continuous, then the function f : C → (0, ∞) defined by

f(x) = ∫_0^∞ φ(t) t^{x−1} dt

is log-convex (provided the improper integrals are well defined for all x ∈ C). In this regard, note that the function φ(t) t^{x−1} is log-convex in x since log(φ(t) t^{x−1}) = log φ(t) + (x − 1) log t is affine in x. In the special case φ(t) = e^{−t} and C = (0, ∞), the function f is the classic gamma function

Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt

We will consider this log-convex function later in the book (Section 23.5). N
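The log-convexity of the gamma function can be sampled numerically through scipy.special.gammaln, which computes log Γ directly; the sketch below checks the defining convexity inequality of log Γ on random triples.

```python
import numpy as np
from scipy.special import gammaln   # gammaln(x) = log Gamma(x)

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.uniform(0.1, 10.0, size=2)
    t = rng.random()
    # convexity of log Gamma, i.e., log-convexity of Gamma
    assert gammaln(t * x + (1 - t) * y) <= t * gammaln(x) + (1 - t) * gammaln(y) + 1e-10
print("log-convexity of Gamma verified on 1000 random triples")
```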
Part V

Optima

Chapter 18

Optimization problems

Optimization problems are fundamental in economics, which is based on the analysis of maximization/minimization problems solved by economic agents, such as individuals (consumers, producers, and investors), families, and governments. Methodological individualism is, indeed, at the heart of economic analysis, which thus aims to explain economic phenomena in terms of agents' purposeful behavior, assumed to be rational (so, optimal). A purposeful rational agent – the homo oeconomicus – is the, idealized, basic unit of analysis of economic theory.¹

As a result, this is the central chapter of the book: it justifies the study of the notions discussed so far, as well as of those that we will see in the rest of the book.

18.1 Generalities

Consider the function f : R → R given by f(x) = 1 − x², with graph:

¹ It is a kind of abstraction that any scientific inquiry requires, as Vilfredo Pareto most eloquently remarked in his seminal 1900 piece, to which we refer readers. Note that purposeful individual behavior might well be boundedly rational (hence, suboptimal); thus, rationality is an additional assumption relative to methodological individualism (cf. Arrow, 1994).


[Graph of f(x) = 1 − x² on R]

It is immediate to see that f attains its maximum value, equal to 1, at the point x = 0, that is, at the origin (Example 229). On the other hand, there is no point at which f attains a minimum value.

Suppose that, for some reason, we are interested in the behavior of f only on the interval [1, 2], not on the entire domain R. Then f has 0 as maximum value, attained at the point x = 1, while it has −3 as minimum value, attained at the point x = 2. Graphically:

[Graph of f(x) = 1 − x² restricted to the interval [1, 2]]

From this example two crucial observations follow:

(i) the distinction between maximum value and maximizer: a maximizer is an element of the domain at which the function reaches its maximum value; the maximum value is the element of the codomain which is the image of a maximizer;²

² As already anticipated in Section 6.6.

(ii) the importance of the subset of the domain in which we are interested in establishing the existence of maximizers or minimizers.

These two observations lead to the next definition, in which we consider an objective function f and a subset C of its domain, called the choice set.

Definition 773 Let f : A ⊆ Rⁿ → R be a real-valued function and C a subset of A. A point x̂ ∈ C is called a (global) maximizer of f on C if

f(x̂) ≥ f(x)    ∀x ∈ C    (18.1)

The value f(x̂) of the function at x̂ is called the (global) maximum value of f on C.

In the special case C = A, when the choice set is the entire domain, the point x̂ is simply called a maximizer, without further specification (in this way, we recover the definition of Section 6.6).

In the initial example we considered two cases:

(i) in the first case C was the entire domain, that is, C = R, and we had x̂ = 0 and f(x̂) = max f(R) = 1;

(ii) in the second case C was the interval [1, 2] and we had x̂ = 1 and f(x̂) = max f([1, 2]) = 0.

The maximum value of the objective function f on the choice set C is, thus, nothing but the maximum of the set f(C), i.e.,³

f(x̂) = max f(C)

By Proposition 33, the maximum value is unique. We denote this unique value by

max_{x∈C} f(x)

The maximizers may, instead, fail to be unique; their set, called the solution set, is denoted by arg max_{x∈C} f(x), that is,

arg max_{x∈C} f(x) = {x̂ ∈ C : f(x̂) = max_{x∈C} f(x)}

For example, for the function f : R → R defined by

f(x) =  x + 1   if x ≤ −1
        0       if −1 < x < 1
        −x + 1  if x ≥ 1

³ Recall that f(C) = {f(x) : x ∈ C} is the set (6.1) of all the images of points that belong to C.

with graph

[Graph of the piecewise function: flat at height 0 on [−1, 1], decreasing outside]

we have max_{x∈R} f(x) = 0 and arg max_{x∈R} f(x) = [−1, 1], so the set of maximizers is the entire interval [−1, 1]. On the other hand, if we restrict ourselves to [1, +∞), we have max_{x∈[1,+∞)} f(x) = 0 and arg max_{x∈[1,+∞)} f(x) = {1}, so 1 is the unique maximizer of f on [1, +∞). Graphically:

[Graph of the piecewise function restricted to [1, +∞): the unique maximizer is x = 1]
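Maximum values and solution sets can be approximated by brute force on a grid, which makes the distinction between the two concrete. A sketch for the piecewise function above (the grid and tolerance are arbitrary choices):

```python
import numpy as np

def f(x):
    # the piecewise function above
    return np.where(x <= -1, x + 1, np.where(x < 1, 0.0, -x + 1))

grid = np.linspace(-3, 3, 6001)
values = f(grid)
max_value = values.max()
maximizers = grid[np.isclose(values, max_value)]
print(max_value)                            # 0.0, the maximum value
print(maximizers.min(), maximizers.max())   # -1.0 and 1.0: arg max is [-1, 1]
```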

Definition 774 Let f : A ⊆ Rⁿ → R be a real-valued function and C a subset of A. A point x̂ ∈ C is called a strong (or strict) maximizer if

f(x̂) > f(x)

for every x ∈ C distinct from x̂.

Strong maximizers are an important class of maximizers, with the following uniqueness property.

Proposition 775 A maximizer is strong if and only if it is unique, that is, if and only if arg max_{x∈C} f(x) is a singleton.

Proof "Only if". Suppose, by contradiction, that there exist two distinct strong global maximizers x̂_1 and x̂_2. By definition, f(x̂_1) > f(x̂_2) and f(x̂_2) > f(x̂_1), which is impossible.

"If". Let arg max_{x∈C} f(x) be a singleton {x̂}. By hypothesis, f(x̂) ≥ f(x) for every x ∈ C. Suppose, by contradiction, that x̂ is not strong: there exists y ∈ C, with y ≠ x̂, such that f(x̂) = f(y). Then f(y) = f(x̂) ≥ f(x) for every x ∈ C, so y too is a maximizer, which contradicts the fact that x̂ is the unique maximizer. ∎

In other words, "strongness" of a maximizer is equivalent to its uniqueness. Strongness (hence uniqueness) is a remarkable property that greatly simplifies the study of how maximizers and maximum values change when the choice set C changes. For example, this is the case in the study of how optimal bundles, and their utility, change when the budget set changes as a consequence of variations in income and prices (see Section 18.1.4). In economic applications this analysis, known as comparative statics, plays a fundamental role. It is particularly effective when maximizers are unique (i.e., when they are strong). Indeed, it is much easier to keep track of and compare unique solutions (e.g., a unique optimal bundle for each profile of prices and income) than sets of them.

Until now we have talked about maximizers, but analogous considerations hold for minimizers. For example, in Definition 773 an element x̂ ∈ C is a (global) minimizer of f on C if f(x̂) ≤ f(x) for every x ∈ C, with minimum value f(x̂) = min f(C), denoted by min_{x∈C} f(x). Maximizing and minimizing are actually two sides of the same coin, as formalized by the next result. Its obvious proof is based on the observation that f(x) ≥ f(y) if and only if −f(x) ≤ −f(y) for every x, y ∈ A.

Proposition 776 Let f : A ⊆ Rⁿ → R be a real-valued function and C a subset of A. A point x̂ ∈ C is a minimizer of f on C if and only if it is a maximizer of −f on C, and it is a maximizer of f on C if and only if it is a minimizer of −f on C. In particular,

min_{x∈C} f(x) = −max_{x∈C} (−f)(x)  and  max_{x∈C} f(x) = −min_{x∈C} (−f)(x)

For example, it is immediate that the minimizers of the function f : R → R given by f(x) = x² − 1 are the maximizers of the function −f(x) = 1 − x² seen at the beginning of the section.

Thus, between maximizers and minimizers there is a natural duality that makes the results for one case a simple dual version of those for the other. Therefore, from the mathematical viewpoint, the choice of which of these two equivalent problems to study is only a question of convenience, bearing no conceptual relevance. Given their great importance in economic applications, in the sequel we will tend to consider the properties of maximizers, leaving the analogous properties of minimizers to the reader. In any case, keep in mind that the term extremal refers both to maximizers and minimizers, and that the term optimum value refers both to maximum and minimum values.

The problem of maximizing an objective function f : A ⊆ Rⁿ → R on a given choice set C ⊆ A, that is, of finding its maximum value and its maximizers, is called a maximization problem. In a maximization problem, the maximizers are called solutions, and the solutions are said to be strong if so are the underlying maximizers. By Proposition 775, a solution is strong if and only if it is unique.

Analogous notions hold for minimization problems, in which we look for the minimum value and the minimizers of an objective function on a given choice set. Finally, optimization problems include both maximization and minimization problems: they are "genderless".⁴

Formally, we will write a maximization problem as

max_x f(x)  sub x ∈ C    (18.2)

("sub" from "subject to") and a minimization problem with min in place of max. The x below max indicates the choice variable, that is, the variable that we control in order to maximize the objective function. When C = A, we sometimes omit the clause "sub x ∈ C" since x must obviously belong to the domain of f. In the important case in which the set C is open, we talk of unconstrained optimization problems;⁵ otherwise, we talk of constrained optimization problems.

18.1.1 The beginner’s luck


Normally, it is quite complicated to solve an optimization problem. Nevertheless, maximizers
(or minimizers) can sometimes be found by working with bare hands on the problem, as the
next examples show.

Example 777 Let f : R → R be given by f(x) = 2x − x² and consider the optimization problem

max_x f(x)  sub x ∈ R

that is, we look for maximizers of f on its entire domain:

[Graph of f(x) = 2x − x²]

⁴ Because of our maximization emphasis, however, in what follows we often use the terms "optimization problem" and "maximization problem" interchangeably.
⁵ Since an open set C is still a constraint, this terminology is unsatisfactory. To make some sense of it, note that all the points x of an open set C are interior and so have a neighborhood B_ε(x) included in C. One can thus "move around" the point x while still remaining within C. In this local sense, an open choice set allows for some freedom.

We can write f(x) = 2x − x² − 1 + 1 = 1 − (x − 1)², so one has

f(x) ≤ 1    ∀x ∈ R

Since f takes value 1 at x̂ = 1 (actually, f takes on value 1 only at x̂ = 1), we can say that x̂ = 1 is a strong maximizer of f on R. The maximum value of f on R is the scalar 1. Finally, f is unbounded below, so it has no minimizers. N

Example 778 Let f : R² → R be defined by f(x) = x_1² − 6x_1x_2 + 12x_2² for every x = (x_1, x_2) ∈ R² and consider the optimization problem

min_x f(x)  sub x ∈ R²

Since f(x_1, x_2) = x_1² − 6x_1x_2 + 9x_2² + 3x_2² = (x_1 − 3x_2)² + 3x_2², that is, f is the sum of two squares, we have

f(x_1, x_2) ≥ 0    ∀(x_1, x_2) ∈ R²

Next, since f(0, 0) = 0 (actually, f assumes value 0 only at the origin), we conclude that the origin (0, 0) is a strong minimizer of f on R². The minimum value of f on R² is the scalar 0. Finally, f is unbounded above, so it has no maximizers. N

Example 779 Let f : R³ → R be given by f(x) = e^{−x_1²−x_2²−x_3²} for every x = (x_1, x_2, x_3) ∈ R³ and consider the optimization problem

max_x f(x)  sub x ∈ R³

Since 0 < f(x_1, x_2, x_3) ≤ 1 for every (x_1, x_2, x_3) ∈ R³ and f(0, 0, 0) = 1, the origin (0, 0, 0) is a strong maximizer of f on R³. The maximum value of f on R³ is the scalar 1. However, f does not have a minimizer because it never attains the infimum of its values, that is, 0. N

Example 780 Let f : R → R be defined by f(x) = cos x, and consider the optimization problem

min_x f(x)  sub x ∈ R

Since −1 ≤ cos x ≤ 1, all the points at which f(x) = 1 are maximizers and all the points at which f(x) = −1 are minimizers. The maximizers are, therefore, x̂ = 2kπ with k ∈ Z and the minimizers are x̃ = (2k + 1)π with k ∈ Z. The maximum and minimum values are the scalars 1 and −1, respectively.

These maximizers and minimizers on R are not strong. However, if we consider a smaller choice set, such as C = [0, 2π), we find that the unique strong maximizer is x̂ = 0 and the unique strong minimizer is x̃ = π. N

Example 781 For a constant function, all the points of the domain are simultaneously
maximizers and minimizers. Its constant value is simultaneously the maximum and minimum
value. N

Note that Definition 773 does not require the function to satisfy any special property; in particular, neither continuity nor differentiability is invoked. For example, the function f : R → R given by f(x) = |x| attains its minimum value at the point x̂ = 0, where it is not differentiable. The function f : R → R given by

f(x) =  x + 1  if x ≤ 1
        −x     if x > 1
with graph

[Graph of the piecewise function: a jump discontinuity at x = 1]

attains its maximum value at the point x̂ = 1, where it is discontinuous.

It may also happen that an isolated point is extremal. For example, the function defined by

f(x) =  x + 1  if x ≤ 1
        5      if x = 2
        −x     if x > 4
with graph

[Graph of the function: two branches and the isolated point (2, 5)]

attains its maximum value at x̂ = 2, which is an isolated point of the domain (−∞, 1] ∪ {2} ∪ (4, +∞) of f.

O.R. As we have already observed, the maximum value of f : A ⊆ Rⁿ → R on C ⊆ A is nothing but max f(C). It is a value actually attained by f, that is, there exists a point x̂ ∈ C such that f(x̂) = max f(C). We can, therefore, choose a point in C at which f "attains" the maximum.

When the maximum value does not exist, the image set f(C) might still have a finite supremum sup f(C). The unpleasant aspect is that there might well be no point in C that attains such a value, that is, we might not be able to attain it. Pragmatically, this aspect is less negative than it might appear prima facie. Indeed, as Proposition 120 indicates, we can choose a point at which f is arbitrarily close to the sup. If sup f(C) = 48, we will never be able to get exactly 48, but we can get arbitrarily close to it: we can always choose a point at which the function has value 47.9 and, if this is not enough, a point at which f takes value 47.999999999999, and so on. Similar remarks hold for minimum values. H

18.1.2 Properties

The optimization problems (18.2) enjoy a simple, but important, property of invariance.

Proposition 782 Let g : B ⊆ R → R be a strictly increasing function with Im f ⊆ B. The two optimization problems

max_x f(x)  sub x ∈ C

and

max_x (g∘f)(x)  sub x ∈ C

are equivalent, that is, they have the same solutions.

Proof By Proposition 209, since g is strictly increasing, we have

f(x) ≥ f(y) ⟺ (g∘f)(x) ≥ (g∘f)(y)    ∀x, y ∈ A

Therefore, f(x̂) ≥ f(x) for every x ∈ C if and only if (g∘f)(x̂) ≥ (g∘f)(x) for every x ∈ C. ∎

Thus, two objective functions – here f and f′ = g∘f – are equivalent when one is a strictly increasing transformation of the other.⁶ Later in the chapter, we will comment more on this simple, yet conceptually important, result (Section 18.1.5).
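Proposition 782 is easy to see in action numerically: maximizing f and a strictly increasing transformation g∘f over the same grid returns the same maximizer, although the maximum values differ. A sketch with f(x) = 1 − x² and g = exp (both chosen here only for illustration):

```python
import numpy as np

def f(x):
    return 1 - x**2

grid = np.linspace(-2, 2, 4001)
i_f = np.argmax(f(grid))            # maximizer of f
i_gf = np.argmax(np.exp(f(grid)))   # maximizer of g ∘ f, with g = exp strictly increasing
print(grid[i_f], grid[i_gf])        # same point, 0.0
print(f(grid).max(), np.exp(f(grid)).max())   # maximum values differ: 1 vs e
```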

Let us now consider the case, important in economic applications (as we will soon see),
in which the objective function is strongly increasing.

Proposition 783 Let f : A ⊆ Rⁿ → R be a real-valued function and C a subset of A. If f is strongly increasing on C, then arg max_{x∈C} f(x) ⊆ ∂C.

⁶ Note that f′ = g∘f if and only if f = g⁻¹∘f′, so one can move back and forth between equivalent objective functions via strictly increasing transformations.

Proof Let x̂ ∈ arg max_{x∈C} f(x). We want to show that x̂ ∈ ∂C. Suppose, by contradiction, that x̂ ∉ ∂C, i.e., x̂ is an interior point of C. There exists, therefore, a neighborhood B_ε(x̂) of x̂ included in C. It is easy to see that, then, there exists y ∈ B_ε(x̂) such that x̂ < y. Since f is strongly increasing on C, we obtain f(y) > f(x̂), which contradicts the optimality of x̂. We conclude that x̂ ∈ ∂C. ∎

The possible solutions of the optimization problem (18.2) are, thus, boundary points when the objective function is strongly increasing (a fortiori, when it is strictly increasing; cf. Proposition 211). With this kind of objective function, we can thus simplify problem (18.2) as follows:

max_x f(x)  sub x ∈ ∂C

We will soon see a remarkable application of this observation in Walras' law.

The last proposition implies that when ∂C ∩ C = ∅, which happens for example when C is open, the optimization problem (18.2) does not admit any solution if f is strongly increasing. A trivial example is f(x) = x on C = (0, 1), as the graph shows:

[Graph of f(x) = x on C = (0, 1): the supremum 1 is not attained]

Finally, let us consider an obvious, yet noteworthy, property of monotonicity in C.

Proposition 784 Given f : A ⊆ Rⁿ → R, let C and C′ be any two subsets of A. Then

C ⊆ C′ ⟹ max_{x∈C} f(x) ≤ max_{x∈C′} f(x)

Proof Let x̂′ ∈ arg max_{x∈C′} f(x). Since C ⊆ C′, we have

arg max_{x∈C} f(x) ⊆ C ⊆ C′

Therefore, f(x̂′) ≥ f(x̂) for every x̂ ∈ arg max_{x∈C} f(x). ∎

Larger sets C always lead to higher maximum values of the objective function. In other terms, having more opportunities to choose from is never detrimental, whatever the form of the objective function. This simple principle of monotonicity is often important. The basic economic principle that removing constraints on agents' choices can only benefit them is, indeed, formalized by this proposition.

Example 785 Recall the initial example, in which we considered two different choice sets, R and [1, 2], for the function f(x) = 1 − x². We had max_{x∈[1,2]} f(x) = 0 < 1 = max_{x∈R} f(x), in accordance with the last proposition. N

18.1.3 Cogito ergo solvo

Optimization problems are often solved through the differential methods that will be studied later in the book. However, before using any "method", it is important to ponder over the problem at hand and see whether our insight can suggest anything relevant about it. In this way we can often simplify the problem, sometimes even guess a solution that we can then try to verify.

We will illustrate all this through a few optimization problems, a couple of them inspired by classic economic problems. Here, however, we abstract from applications and treat them in purely analytical terms.
Example 786 Let f : R → R be the scalar function defined by f(x) = (1 − x²)³. Consider the optimization problem

max_x f(x)  sub x ≥ 0

Consider the strictly increasing transformation g = f^{1/3} of the objective function f, that is, g(x) = 1 − x². The problem

max_x g(x)  sub x ≥ 0

is equivalent to the previous one by Proposition 782 but, clearly, it is more tractable. We can actually do better by getting rid of the constant 1 in the objective function (constants affect the maximum value but not the maximizers). So, we can just study the problem

max_x −x²  sub x ≥ 0

Clearly, the unique solution is x̂ = 0. By plugging it into the original objective function, we get the maximum value f(x̂) = 1. N

Example 787 Let f : R²₊₊ → R be defined by f(x) = log x_1 + log x_2. Consider the optimization problem

max_x f(x)  sub x_1 + x_2 = 1

The problem is symmetric in each x_i, so it is natural to guess a symmetric solution x̂ with equal components x̂_1 = x̂_2. Then, x̂_1 = x̂_2 = 1/2 because of the constraint x_1 + x_2 = 1. Let us verify this guess. Since the logarithmic function is strictly concave, if y ≠ x̂ and y_1 + y_2 = 1, we have

f(y) − f(x̂) = log 2y_1 + log 2y_2 = 2[(1/2) log 2y_1 + (1/2) log 2y_2] < 2 log(y_1 + y_2) = 2 log 1 = 0

So, x̂ indeed solves the problem. Here the maximum value is f(x̂) = −log 4. N

The next examples are a bit more complicated, but they are important in applications and show how a little thinking can save many calculations.

Example 788 Let f : Rⁿ₊ → R be a Cobb-Douglas function defined by f(x) = ∏_{i=1}^n x_i^{a_i}, with ∑_{i=1}^n a_i = 1 and a_i > 0 for each i. Given β ∈ Rⁿ₊₊ and δ > 0, consider the optimization problem

max_x f(x)  sub x ∈ C    (18.3)

with choice set C = {x ∈ Rⁿ₊ : ∑_{i=1}^n β_i x_i = δ}. It is easy to see that the maximizers belong to Rⁿ₊₊, that is, they have strictly positive components. Indeed, if x lies on some axis of Rⁿ – i.e., x_i = 0 for some i – then f(x) = 0. Since f ≥ 0 on C, and f is not identically 0 there, such an x cannot solve the problem. For this reason, we can consider the equivalent optimization problem

max_x f(x)  sub x ∈ C ∩ Rⁿ₊₊    (18.4)

We can do better: since f > 0 on Rⁿ₊₊, we can consider the logarithmic transformation g = log f of the objective function f, that is, the log-linear function g(x) = ∑_{i=1}^n a_i log x_i. The problem

max_x g(x)  sub x ∈ C ∩ Rⁿ₊₊    (18.5)

is equivalent to the previous one by Proposition 782. It is, however, more tractable because of the log-linear form of the objective function.

Let us ponder over problem (18.5). Suppose first that both the coefficients a_i and β_i are equal among themselves, with a_i = 1/n (because ∑_{i=1}^n a_i = 1) and β_i = 1 for each i. The problem is then symmetric in each x_i, so it is natural to guess a symmetric solution x̂, with x̂_1 = ⋯ = x̂_n. Then, x̂_i = δa_i for each i because of the constraint ∑_{i=1}^n x_i = δ. If, instead, the coefficients differ, the asymmetry in the solutions should depend on the coefficients β_i and a_i peculiar to each x_i. An (educated) guess is that the solution is

x̂ = (δa_1/β_1, ..., δa_n/β_n)    (18.6)

Let us verify this guess. We have x̂ ∈ C ∩ Rⁿ₊₊ because x̂ ∈ Rⁿ₊₊ and

∑_{i=1}^n β_i x̂_i = ∑_{i=1}^n β_i (δa_i/β_i) = δ ∑_{i=1}^n a_i = δ

We now show that ∑_{i=1}^n a_i log y_i < ∑_{i=1}^n a_i log x̂_i for every y ∈ C ∩ Rⁿ₊₊ with y ≠ x̂. Since log x is strictly concave, by Jensen's inequality (14.13) we have

∑_{i=1}^n a_i log y_i − ∑_{i=1}^n a_i log x̂_i = ∑_{i=1}^n a_i log(y_i/x̂_i) = ∑_{i=1}^n a_i log(β_i y_i/(δa_i)) < log ∑_{i=1}^n a_i (β_i y_i/(δa_i)) = log((1/δ) ∑_{i=1}^n β_i y_i) = log 1 = 0

as desired. We conclude that (18.6) is indeed the unique solution of the problem. N
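The closed form (18.6) can also be checked against a numerical solver. The sketch below uses scipy.optimize.minimize (the SLSQP method handles the equality constraint) on the equivalent log-linear problem (18.5), with hypothetical values a = (0.2, 0.3, 0.5), β = (1, 2, 4) and δ = 10:

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([0.2, 0.3, 0.5])        # exponents a_i, summing to 1
beta = np.array([1.0, 2.0, 4.0])     # hypothetical coefficients beta_i
delta = 10.0                         # hypothetical right-hand side

neg_g = lambda x: -(a @ np.log(x))   # minimize -g, with g the log-linear objective
constraint = {"type": "eq", "fun": lambda x: beta @ x - delta}
res = minimize(neg_g, x0=np.ones(3), method="SLSQP",
               bounds=[(1e-9, None)] * 3, constraints=constraint)

print(res.x)              # numerical solution
print(delta * a / beta)   # closed form (18.6): [2.0, 1.5, 1.25]
```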

Example 789 Let f : Rⁿ → R be a convex function. Consider the optimization problem

max_x f(x)  sub x ∈ C    (18.7)

where C = {x ∈ Rⁿ₊ : ∑_{i=1}^n β_i x_i = 1}, with each β_i > 0. We start by observing that C is convex and that its elements can be written as convex combinations of the vectors

ẽ^i = (1/β_i) e^i = (0, ..., 0, 1/β_i, 0, ..., 0)    ∀i = 1, ..., n

Indeed, if x ∈ C then

x = ∑_{i=1}^n x_i e^i = ∑_{i=1}^n β_i x_i (1/β_i) e^i = ∑_{i=1}^n β_i x_i ẽ^i

where β_i x_i ≥ 0 for each i and ∑_{i=1}^n β_i x_i = 1 (because x ∈ C). It is easy to check that each ẽ^i belongs to C. We are now in a position to say something about the optimization problem (18.7). Since f is convex, we have

f(x) = f(∑_{i=1}^n β_i x_i ẽ^i) ≤ ∑_{i=1}^n β_i x_i f(ẽ^i) ≤ max_{i=1,...,n} f(ẽ^i)

Thus, to find a maximizer it is enough to check which ẽ^i receives the highest evaluation under f. Since the vectors ẽ^i lie on the axes of Rⁿ, in this way we find what in the economics jargon are called corner solutions.

That said, there might well be maximizers that this simple reasoning neglects. In other words, we only showed that

arg max_{x∈{ẽ¹,...,ẽⁿ}} f(x) ⊆ arg max_{x∈C} f(x)

To say something more about all possible maximizers, i.e., about the set arg max_{x∈C} f(x), we need to assume more on the objective function f. We consider two important cases:

(i) Assume that f is strictly convex. Then, the only maximizers in C are among the vectors ẽ^i, that is,

arg max_{x∈{ẽ¹,...,ẽⁿ}} f(x) = arg max_{x∈C} f(x)

So, problem (18.7) reduces to the much simpler problem

max_x f(x)  sub x ∈ {ẽ¹, ..., ẽⁿ}    (18.8)

Indeed, strict convexity yields a strict inequality as soon as β_i x_i > 0 for at least two indexes i, that is,

f(x) = f(∑_{i=1}^n β_i x_i ẽ^i) < ∑_{i=1}^n β_i x_i f(ẽ^i)

For instance, consider the problem

max_x x_1² + x_2² + x_3²  sub β_1x_1 + β_2x_2 + β_3x_3 = 1

It is enough to solve the problem

max_x x_1² + x_2² + x_3²  sub x ∈ {(1/β_1, 0, 0), (0, 1/β_2, 0), (0, 0, 1/β_3)}

For example, if β_1 < β_2 < β_3, then ẽ¹ = (1/β_1, 0, 0) is the only solution, while if β_1 = β_2 < β_3, then ẽ¹ = (1/β_1, 0, 0) and ẽ² = (0, 1/β_2, 0) are the only two solutions.

(ii) Assume that f is affine, i.e., f(x) = γ_0 + γ_1x_1 + ⋯ + γ_nx_n. Then, the set of maximizers consists of the vectors ẽ^j that solve problem (18.8) and of their convex combinations (as the reader can easily check). That is,

co(arg max_{x∈{ẽ¹,...,ẽⁿ}} f(x)) = arg max_{x∈C} f(x)

where the left-hand side is the convex envelope of the vectors in arg max_{x∈{ẽ¹,...,ẽⁿ}} f(x) (a polytope; cf. Example 689). For instance, consider the problem

max_x γ_0 + γ_1x_1 + γ_2x_2 + γ_3x_3  sub β_1x_1 + β_2x_2 + β_3x_3 = 1    (18.9)

as well as the simpler problem

max_x γ_0 + γ_1x_1 + γ_2x_2 + γ_3x_3  sub x ∈ {(1/β_1, 0, 0), (0, 1/β_2, 0), (0, 0, 1/β_3)}    (18.10)

If γ_1/β_1 > γ_2/β_2 > γ_3/β_3, then ẽ¹ = (1/β_1, 0, 0) is the only solution of problem (18.10), so of problem (18.9). On the other hand, if γ_1/β_1 = γ_2/β_2 > γ_3/β_3, then ẽ¹ = (1/β_1, 0, 0) and ẽ² = (0, 1/β_2, 0) solve problem (18.10), so the polytope

co{ẽ¹, ẽ²} = {tẽ¹ + (1−t)ẽ² : t ∈ [0, 1]} = {(t/β_1, (1−t)/β_2, 0) : t ∈ [0, 1]}

is the set of all solutions of problem (18.9).

To sum up, some simple arguments show that optimization problems featuring convex objective functions and linear constraints have corner solutions. Section 18.6.2 will discuss these problems, which often arise in applications. N

Example 790 Let f : Rⁿ → R be a Leontief function defined by f(x) = min_{i=1,...,n} x_i. Recall that f is concave (cf. Example 655). Given β ∈ Rⁿ₊₊ and δ > 0, consider the optimization problem

max_x f(x)  sub x ∈ C

with choice set C = {x ∈ Rⁿ₊ : ∑_{i=1}^n β_i x_i = δ}. Because of the symmetry of the objective function, we again guess a symmetric solution x̂, with x̂_1 = ⋯ = x̂_n. Then,

x̂ = (δ/∑_{i=1}^n β_i, ..., δ/∑_{i=1}^n β_i)    (18.11)

because of the constraint. To verify this guess, let x* ∈ C be a solution of the problem, so that f(x*) ≥ f(y) for all y ∈ C. As we will see, by Weierstrass' Theorem such a solution exists. We want to show that x* = x̂. It is easy to check that, if k = (k, ..., k) ∈ Rⁿ is a constant vector and α ≥ 0 a positive scalar, we have

f(αx + k) = αf(x) + k    ∀x ∈ Rⁿ    (18.12)

Since C is convex, (1/2)x* + (1/2)x̂ ∈ C and, x̂ being a constant vector, (18.12) implies

f(x*) ≥ f((1/2)x* + (1/2)x̂) = (1/2)f(x*) + (1/2) δ/∑_{i=1}^n β_i

So, min_{i=1,...,n} x*_i = f(x*) ≥ δ/∑_{i=1}^n β_i, that is, x*_i ≥ δ/∑_{i=1}^n β_i for each i. Suppose x* ≠ x̂, that is, x* > x̂. Since x* ∈ C, we reach the contradiction

δ = ∑_{i=1}^n β_i x*_i > ∑_{i=1}^n β_i δ/∑_{j=1}^n β_j = δ

We conclude that x* = x̂. The constant vector (18.11) is thus the unique solution of the problem. N
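The guess-and-verify argument of Example 790 can be stress-tested numerically: no randomly drawn point of the constraint set should beat the symmetric solution (18.11). A sketch with hypothetical β = (1, 2, 3) and δ = 12:

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, 2.0, 3.0])          # hypothetical coefficients
delta = 12.0
x_hat = np.full(3, delta / beta.sum())    # closed form (18.11): every entry 2.0

def f(x):
    return x.min()                        # Leontief objective

for _ in range(10_000):
    x = rng.random(3) + 1e-9
    x *= delta / (beta @ x)               # rescale x onto the constraint set C
    assert f(x) <= f(x_hat) + 1e-12
print(f(x_hat))                           # 2.0, the maximum value
```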

18.1.4 Consumption and production

The next two classic examples illustrate the centrality of optimization problems in economics.

The consumer problem Consider a consumer whose preferences are represented by a utility function u : A ⊆ Rⁿ₊ → R, where the domain A is a set of bundles x = (x_1, x_2, ..., x_n) of n goods, called the consumption set of the consumer. It consists of the bundles that are of interest to the consumer.

Denote by p = (p_1, p_2, ..., p_n) ∈ Rⁿ₊ the vector of the market prices of the goods. Suppose that the consumer has income w ≥ 0. The budget set of the consumer, that is, the set of bundles that he can purchase given the vector of prices p and his income w, is

B(p, w) = {x ∈ A : p · x ≤ w}

We write B(p, w) to highlight the dependence of the budget set on p and on w. For example,

w ≤ w′ ⟹ B(p, w) ⊆ B(p, w′)    (18.13)

that is, to a greater income there corresponds a larger budget set. Analogously,

p ≥ p′ ⟹ B(p, w) ⊆ B(p′, w)    (18.14)

that is, to lower prices there corresponds a larger budget set.

By definition, B(p, w) is a subset of the consumer's consumption set A. Indeed, B(p, w) is the set of the bundles of interest to the consumer that he can afford given the prices p and the income w. Consumers with different consumption sets may therefore have different budget sets.

Example 791 (i) Let u : R²₊ → R be the CES utility function

u(x) = (αx_1^ρ + (1−α)x_2^ρ)^{1/ρ}

with α ∈ [0, 1] and ρ ∈ (0, 1]. In this case the consumption set is A = R²₊.

(ii) Let u : R²₊₊ → R be the log-linear utility function

u(x) = a log x_1 + (1 − a) log x_2

with a ∈ (0, 1). Here the consumption set is A = R²₊₊. CES and log-linear consumers therefore have different consumption sets.

(iii) Suppose that the consumer has a subsistence bundle x̄ ≫ 0, so that he can consider only bundles x ≥ x̄ (in order to survive). In this case it is natural to take as consumption set the closed and convex set

A = {x ∈ Rⁿ₊ : x ≥ x̄} ⊆ Rⁿ₊₊    (18.15)

For instance, we can consider the restrictions of the CES and log-linear utility functions to this set A. N

The next result shows some remarkable properties of the budget set.

Proposition 792 The budget set B(p, w) is convex if A is convex, and it is compact if A is closed and p ≫ 0.

The importance of the condition p ≫ 0 is obvious: if some good were free (and available in unlimited quantity), the consumer could obtain any quantity of it and the budget set would then be unbounded.

In light of this proposition, we will often assume that the consumption set A is closed and convex (but the log-linear utility function is an important example featuring an open consumption set).

Proof Let A be closed and p ≫ 0. Let us show that B(p, w) is closed. Consider a sequence of bundles {x^k} ⊆ B(p, w) such that x^k → x. Since A is closed, x belongs to A. Since p · x^k ≤ w for every k, we have p · x = lim p · x^k ≤ w. Therefore, x ∈ B(p, w). By Theorem 165, B(p, w) is closed.

We are left to show that B(p, w) is a bounded set. By contradiction, suppose that there exists a sequence {x^k} ⊆ B(p, w) such that x^k_i → +∞ for some good i. Since p ≫ 0 and x^k ∈ Rⁿ₊, we have p · x^k ≥ p_i x^k_i for every k. We therefore reach the contradiction

w ≥ lim p · x^k ≥ lim p_i x^k_i = +∞

We conclude that B(p, w) is both closed and bounded, i.e., compact.

As to convexity, let A be convex. Let x, y ∈ B(p, w) and λ ∈ [0, 1]. Since A is convex, λx + (1−λ)y belongs to A. We have

p · (λx + (1−λ)y) = λ(p · x) + (1−λ)(p · y) ≤ λw + (1−λ)w = w

Hence, λx + (1−λ)y ∈ B(p, w). The budget set is therefore convex. ∎

The consumer (optimization) problem consists in maximizing the consumer's utility function u : A ⊆ ℝⁿ₊ → ℝ on the budget set B(p, w), that is,

max_x u(x)   sub x ∈ B(p, w)   (18.16)

Given prices and income, the budget set B(p, w) is the choice set of the consumer problem. In particular, a bundle x̂ ∈ B(p, w) is a maximizer, that is, a solution of the optimization problem (18.16), if

u(x̂) ≥ u(x)   ∀x ∈ B(p, w)

while max_{x∈B(p,w)} u(x) is the maximum utility that can be attained by the consumer.

By Proposition 782, every strictly increasing transformation u′ = g ∘ u of u defines an optimization problem

max_x u′(x)   sub x ∈ B(p, w)   (18.17)

equivalent to the original one (18.16) in that it has the same solutions (the optimal bundles). The choice of which one to solve, among such equivalent problems, is a matter of analytical convenience. The utility functions u′ and u are thus equivalent objective functions. The equivalence is also economic in that they represent the same underlying preference (Section 6.8). The economic and mathematical equivalences shed light on one another.

Example 793 The log-linear utility function u(x) = Σ_{i=1}^n α_i log x_i is an analytically convenient transformation of the Cobb-Douglas utility function (as already observed). N

The maximum utility max_{x∈B(p,w)} u(x) depends on the income w and on the vector of prices p: the function v : ℝⁿ₊₊ × ℝ₊₊ → ℝ defined by

v(p, w) = max_{x∈B(p,w)} u(x)   ∀(p, w) ∈ ℝⁿ₊₊ × ℝ₊₊

is called the indirect utility function.⁷ When prices and income vary, it indicates how the maximum utility that the consumer may attain varies.

Example 794 The unique optimal bundle for the log-linear utility function u(x) = a log x1 + (1 − a) log x2, with a ∈ (0, 1), is given by x̂1 = aw/p1 and x̂2 = (1 − a)w/p2 (Example 788). It follows that the indirect utility function associated with the log-linear utility function is

v(p, w) = u(x̂) = a log(aw/p1) + (1 − a) log((1 − a)w/p2)
        = a(log a + log w − log p1) + (1 − a)(log(1 − a) + log w − log p2)
        = log w + a log a + (1 − a) log(1 − a) − (a log p1 + (1 − a) log p2)

for every (p, w) ∈ ℝ²₊₊ × ℝ₊₊. N
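To make this closed form concrete, here is a minimal numerical sketch in Python (not part of the original text); the parameter values a = 0.3, p = (2, 5), w = 10 are illustrative assumptions. It checks the formula for v(p, w) against a brute-force search along the budget line.

import math

# Illustrative parameters (assumptions, not from the text)
a, p1, p2, w = 0.3, 2.0, 5.0, 10.0

def u(x1, x2):
    # log-linear utility u(x) = a log x1 + (1 - a) log x2
    return a * math.log(x1) + (1 - a) * math.log(x2)

# Optimal bundle from Example 788: x1 = a w / p1, x2 = (1 - a) w / p2
x1_hat, x2_hat = a * w / p1, (1 - a) * w / p2

# Closed-form indirect utility derived above
v = (math.log(w) + a * math.log(a) + (1 - a) * math.log(1 - a)
     - (a * math.log(p1) + (1 - a) * math.log(p2)))

# Brute-force check: by Walras' law we can search on the budget line
# p1 x1 + p2 x2 = w, parametrized by the share t spent on good 1
best = max(u(t * w / p1, (1 - t) * w / p2)
           for t in (k / 10000 for k in range(1, 10000)))

print(u(x1_hat, x2_hat), v, best)  # all three agree up to grid error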

Thanks to (18.13) and (18.14), the property of monotonicity seen in Proposition 784 takes the following form for indirect utility functions.

⁷ Here, we are tacitly assuming that a maximizer exists for every pair (p, w) of prices and income. Later in the chapter we will present results, namely Weierstrass' and Tonelli's theorems, that guarantee this.

Proposition 795 Let u : A ⊆ ℝⁿ₊ → ℝ be continuous. Then,

w ≤ w′ ⟹ v(p, w) ≤ v(p, w′)

and

p ≥ p′ ⟹ v(p, w) ≤ v(p′, w)

In other words, consumers always benefit both from a higher income and from lower prices, regardless of their utility functions (provided they are continuous).

As we observed in Section 6.4.4, it is natural to assume that the utility function u : A ⊆ ℝⁿ₊ → ℝ is, at least, increasing. By Proposition 783, if we assume that u is actually strongly increasing, the solution of the consumer problem will belong to the boundary ∂B(p, w) of the budget set. Thanks to the particular form of the budget set, a sharper result holds.

Proposition 796 (Walras' Law) Let u : A ⊆ ℝⁿ₊ → ℝ be strongly increasing on a set A closed under majorization.⁸ If x̂ is a solution of the consumer problem, then p · x̂ = w.

Proof Let x ∈ B(p, w) be such that p · x < w. It is easy to see that, A being closed under majorization, there exists y ≥ x such that p · y ≤ w. Indeed, taking any 0 < ε < (w − p · x)/Σ_{i=1}^n p_i, it is sufficient to set y = x + ε(1, ..., 1), that is, y_i = x_i + ε for every i = 1, ..., n. Since u is strongly increasing, we have u(y) > u(x) and therefore x cannot be a solution of the consumer problem.

The consumer therefore allocates all his income to the purchase of an optimal bundle x̂, that is, p · x̂ = w.⁹ This property is called Walras' law and, thanks to it, in the consumer problem with strongly increasing utility functions we can replace the budget set B(p, w) by its subset

{x ∈ A : p · x = w} ⊆ ∂B(p, w)

defined by the equality constraint.
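The construction in the proof is easy to trace numerically. The following hedged sketch, with assumed prices, income and bundle (none taken from the text), shows that a bundle with p · x < w leaves room for the strictly better affordable bundle y = x + ε(1, ..., 1).

# A minimal sketch of the construction in the proof of Walras' law,
# with illustrative numbers (assumptions). Take a bundle x that does
# not exhaust income and push every coordinate up by epsilon.
p = [2.0, 5.0, 1.0]          # prices (assumed)
w = 10.0                     # income (assumed)
x = [1.0, 1.0, 1.0]          # p . x = 8 < w, so x is not optimal

slack = w - sum(pi * xi for pi, xi in zip(p, x))
eps = 0.5 * slack / sum(p)   # any 0 < eps < slack / sum(p) works

y = [xi + eps for xi in x]   # y = x + eps (1, ..., 1), so y > x
cost_y = sum(pi * yi for pi, yi in zip(p, y))
assert cost_y <= w           # y is still affordable...
# ...and u(y) > u(x) for any strongly increasing u, so x is not a solution.
print(eps, cost_y, w)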

Producer problem Consider a producer who must decide the quantity y of a given output to produce. In taking this decision the producer must consider both the revenue r(y) that he will earn by selling the quantity y and the cost c(y) that he will bear to produce it.

Let r : [0, ∞) → ℝ be the revenue function and c : [0, ∞) → ℝ be the cost function of the producer. His profit is therefore represented by the function π : [0, ∞) → ℝ given by

π(y) = r(y) − c(y)

The producer (optimization) problem is to maximize his profit function π : [0, ∞) → ℝ, that is,

max_y π(y)   sub y ≥ 0   (18.18)

⁸ A set A is closed under majorization if x ∈ A and y ≥ x imply y ∈ A. That is, if A contains a vector x, it also contains all the vectors y that are greater than x. For instance, ℝⁿ₊ and ℝⁿ₊₊ are both closed under majorization, so to fix ideas the reader can think of them in reading Walras' law.
⁹ Proposition 796 is sharper than Proposition 783 because there exist points of the boundary ∂B(p, w) such that p · x < w. For example, the origin 0 ∈ ∂B(p, w) (provided 0 ∈ A).

In particular, a quantity ŷ ≥ 0 of output is a maximizer if

π(ŷ) ≥ π(y)   ∀y ≥ 0

while max_{y∈[0,∞)} π(y) is the maximum profit that can be obtained by the producer. The set of the (profit) maximizing outputs is arg max_{y∈[0,∞)} π(y).

The form of the revenue function depends on the structure of the market in which the
producer sells the output, while that of the cost function depends on the structure of the
market where the producer buys the inputs necessary to produce the good. Let us consider
some classic market structures.

(i) The output market is perfectly competitive, so that the sale price p ≥ 0 of the output is independent of the quantity that the producer decides to produce. In such a case the revenue function r : [0, ∞) → ℝ is given by

r(y) = py

(ii) The producer is a monopolist on the output market. Let us suppose that the demand function on this market is D : [0, ∞) → ℝ, where D(y) denotes the unit price at which the market absorbs the quantity y of the output. Usually, for obvious reasons, we assume that the demand function is decreasing: the market absorbs greater and greater quantities of output as its unit price gets lower and lower. The revenue function r : [0, ∞) → ℝ is therefore given by

r(y) = yD(y)

(iii) The input market is perfectly competitive, that is, the vectors

x = (x1, x2, ..., xn)

of inputs necessary for the production of y have prices gathered in the vector

w = (w1, w2, ..., wn) ∈ ℝⁿ₊

that are independent of the quantity that the producer decides to buy (w_i is the price of the i-th input). The cost of a vector x of inputs is thus equal to w · x = Σ_{i=1}^n w_i x_i. But how does this cost translate into a cost function c(y)?
To answer this question, assume that f : ℝⁿ₊ → ℝ is the production function that the producer has at his disposal to transform a vector x ∈ ℝⁿ₊ of inputs into the quantity f(x) of output. The cost c(y) of producing the quantity y of output is then obtained by minimizing the cost w · x among all the vectors x ∈ ℝⁿ₊ that belong to the isoquant

f⁻¹(y) = {x ∈ ℝⁿ₊ : f(x) = y}

that is, among all the vectors that make it possible to produce the quantity y of output. Indeed, in terms of production, the inputs in f⁻¹(y) are equivalent and so the producer will opt for the cheaper ones. In other terms, the cost function c : [0, ∞) → ℝ is given by

c(y) = min_{x∈f⁻¹(y)} w · x

that is,¹⁰ it is equal to the minimum value of the minimum problem for the cost w · x on the isoquant f⁻¹(y).

To sum up, a producer who, for example, is a monopolist in the output market and faces perfect competition in the input markets has the profit function

π(y) = r(y) − c(y) = yD(y) − min_{x∈f⁻¹(y)} w · x

Instead, a producer who faces perfect competition in all markets, for the output and the inputs, has the profit function

π(y) = r(y) − c(y) = py − min_{x∈f⁻¹(y)} w · x
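As a concrete illustration of the monopolist's problem, here is a small Python sketch; the linear demand D(y) = 10 − y and the cost c(y) = 2y + 0.5y² are assumptions chosen for the example, not data from the text.

# A hedged numerical sketch of the monopolist's problem pi(y) = y D(y) - c(y).
def D(y):          # decreasing inverse demand (assumed)
    return 10.0 - y

def c(y):          # cost function (assumed)
    return 2.0 * y + 0.5 * y ** 2

def profit(y):
    return y * D(y) - c(y)

# Coarse grid search for the profit-maximizing output on [0, 10]
ys = [k / 1000 for k in range(10001)]
y_star = max(ys, key=profit)

# Analytically here pi(y) = 8 y - 1.5 y^2, so y* = 8/3
print(y_star, profit(y_star))   # approximately 2.667 and 10.667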

18.1.5 Comments

Ordinality Properties of functions that are preserved under strictly increasing transformations are called ordinal, as we mentioned when discussing utility theory (Sections 6.4.4 and 14.4). In view of Proposition 782, a property may hold for all equivalent objective functions only if it is ordinal. For instance, all of them can be quasi-concave but not concave (quasi-concavity, but not concavity, is an ordinal property). So, if we are interested in a property of solutions and wonder which properties of objective functions would ensure it, ideally we should look for ordinal properties. If we come up with sufficient conditions that are not so – for instance, concavity or continuity conditions – chances are that there exist more general sufficient conditions that are ordinal. In any case, any necessary condition must be ordinal in that it has to hold for all equivalent objective functions.

To illustrate this subtle, yet important, methodological point, consider the uniqueness of solutions, a most desirable property for comparative statics exercises (as we remarked earlier in the chapter). We will soon learn that strict quasi-concavity is an ordinal property that ensures such uniqueness (Theorem 831). So does strict concavity, which is not an ordinal property. Yet, conceptually, strict quasi-concavity is the best way to frame this sufficient condition – though, operationally, strict concavity might be the workable version. What about a necessary condition for uniqueness of solutions? At the end of the chapter we will digress on cuneiformity, an ordinal property that is both necessary and sufficient for uniqueness (Proposition 864). As soon as we look for necessary conditions, ordinality takes center stage.

Rationality Optimization problems are fundamental also in the natural sciences, as Leonida Tonelli well explains in a 1940 piece: "Maximum and minimum questions have always had a great importance also in the interpretation of natural phenomena because they are governed by a general principle of parsimony. Nature, in its manifestations, tends to save the most possible of what it uses; therefore, the solutions that it finds are always solutions of either minimization or maximization problems". The general principle to which Tonelli alludes, the so-called principle of minimum action, is a metaphysical principle (in the most basic meaning of this term). Not by chance, Tonelli continues by writing "Euler said that,

¹⁰ To be mathematically precise, the min in the previous expression should be an inf. We tacitly assume that the inf is indeed achieved.

since the construction of the world is the most perfect and was established by the wisest creator, nothing happens in this world without an underlying maximum or minimum principle".

In economics, instead, the centrality of optimization problems rests on a (secular) assumption of rationality of economic agents. The resulting optimal choices of the agents – for example, optimal bundles for the consumers and optimal outputs for the producers – are the natural benchmark with respect to which to assess any suboptimal, boundedly rational, behavior that agents may exhibit.

18.2 Existence: Weierstrass' Theorem

18.2.1 Statement

The first fundamental question which arises for optimization problems, of both theoretical and applied relevance, is the existence of a solution. Fortunately, there exist remarkable existence results which guarantee, under very general conditions, the existence of a solution. The most famous and fundamental among them, already introduced for functions of a single variable in Section 12.5, is Weierstrass' Theorem (also known as the Extreme Value Theorem). It guarantees the existence of both a maximizer and a minimizer for continuous functions defined on compact sets. Given the centrality of optimization problems in economic applications, Weierstrass' Theorem is one of the most important results that we present in this book.

Theorem 797 (Weierstrass) A function f : A ⊆ ℝⁿ → ℝ continuous on a compact subset K of A admits (at least) a minimizer and (at least) a maximizer in K, that is, there exist x1, x2 ∈ K such that

f(x1) = max_{x∈K} f(x)   and   f(x2) = min_{x∈K} f(x)

Thanks to this result, the optimization problem (18.2), that is,

max_x f(x)   sub x ∈ C

admits a solution whenever f is continuous and C is compact. This holds also for the dual optimization problem with min in place of max.

The hypotheses of continuity and compactness in Weierstrass' Theorem cannot be weakened, as the simple examples presented in Section 12.5 show.

A classic economic application of Weierstrass' Theorem is the consumer problem (18.16), i.e.,

max_x u(x)   sub x ∈ B(p, w)

Proposition 798 If the utility function u : A ⊆ ℝⁿ₊ → ℝ is continuous on the closed set A, then the consumer problem has a solution provided p ≫ 0 (no free goods).

In words, if the utility function is continuous and the consumption set is closed, optimal bundles exist as long as there are no free goods. These conditions are fairly mild and often satisfied.¹¹

Proof By Proposition 792, the budget set B(p, w) is compact. By Weierstrass' Theorem, the consumer problem then has a solution.

Example 799 The CES utility function u : ℝ²₊ → ℝ given by

u(x) = (αx1^ρ + (1 − α)x2^ρ)^{1/ρ}

with α ∈ [0, 1] and ρ ∈ (0, 1], is continuous and has a closed consumption set ℝ²₊. By Weierstrass' Theorem, the consumer problem with this utility function has a solution (provided p ≫ 0). N

Given the importance of Weierstrass’ Theorem, we close the section with two possible
proofs. First, we need an important remark on notation.

Notation In the rest of the book, to simplify notation we denote also sequences of vectors by
fxn g rather than fxn g. If needed, the writing fxn g Rn should clarify the vector nature of
the sequence even though here n denotes both the dimension of the space Rn and a generic
term xn of a sequence. It is a slight abuse of notation, as the same letter denotes two
altogether di¤erent entities, but hopefully it should not cause any confusion.

18.2.2 First proof

The first proof is based on the following lemma.

Lemma 800 Let A be a subset of the real line. There exists a convergent sequence {a_n} ⊆ A such that a_n → sup A.

Proof Set α = sup A. Suppose that α ∈ ℝ. By Proposition 120, for every ε > 0 there exists a_ε ∈ A such that a_ε > α − ε. By taking ε = 1/n for every n ≥ 1, it is therefore possible to build a sequence {a_n} ⊆ A such that a_n > α − 1/n for every n. It is immediate to see that a_n → α.
Suppose now α = +∞. It follows that for every K > 0 there exists a_K ∈ A such that a_K ≥ K. By taking K = n for every n ≥ 1, we can therefore build a sequence {a_n} such that a_n ≥ n for every n. It is immediate to see that a_n → +∞.

First proof of Weierstrass' Theorem Set α = sup_{x∈C} f(x), that is, α = sup f(C). By the previous lemma, there exists a sequence {a_n} ⊆ f(C) such that a_n → α. Let {x_n} ⊆ C be such that a_n = f(x_n) for every n ≥ 1. Since C is compact, the Bolzano-Weierstrass Theorem yields a subsequence {x_{n_k}} ⊆ {x_n} that converges to some x̂ ∈ C, that is, x_{n_k} → x̂ ∈ C. Since {a_n} converges to α, the subsequence {a_{n_k}} also converges to α. Since f is continuous, it follows that

α = lim_{k→∞} a_{n_k} = lim_{k→∞} f(x_{n_k}) = f(x̂)

¹¹ Free goods short-circuit the consumer problem, so constraints may actually help consumers to focus: (homo oeconomicus) e vinculis ratiocinatur.

We conclude that x̂ is a solution and α = max f(C), that is, x̂ ∈ arg max_{x∈C} f(x) and α = max_{x∈C} f(x). A similar argument shows that arg min_{x∈C} f(x) is not empty.

18.2.3 Second proof

The second proof of Weierstrass' Theorem is based on the next lemma, which shows that the image f(K) of a compact set is compact in ℝ (recall Definition 29).

Lemma 801 Let f : A ⊆ ℝⁿ → ℝ be continuous on a compact subset K of A. Then, the image f(K) is a compact set in ℝ.

Proof With the notions of topology at our disposal we are able to prove the result only in the case n = 1 (the general case, however, does not present substantial differences). So, let n = 1. By Definition 29, to show that the set f(K) is bounded in ℝ it is necessary to show that it is bounded both above and below in ℝ. Suppose, by contradiction, that f(K) is unbounded above. Then there exists a sequence {y_n} ⊆ f(K) such that lim_{n→∞} y_n = +∞. Let {x_n} ⊆ K be the corresponding sequence such that f(x_n) = y_n for every n. The sequence {x_n} is bounded since it is contained in the bounded set K. By the Bolzano-Weierstrass Theorem, there exists a subsequence {x_{n_k}} and a point x̃ ∈ ℝ such that lim_{k→∞} x_{n_k} = x̃. Since K is closed, we have x̃ ∈ K. Moreover, the continuity of f implies lim_{k→∞} y_{n_k} = lim_{k→∞} f(x_{n_k}) = f(x̃) ∈ ℝ. This contradicts lim_{k→∞} y_{n_k} = lim_{n→∞} y_n = +∞. It follows that the set f(K) is bounded above. In a similar way, one shows that the set f(K) is bounded below. Thus, f(K) is bounded.

To complete the proof that f(K) is compact, it remains to show that f(K) is closed. Consider a sequence {y_n} ⊆ f(K) that converges to y ∈ ℝ. By Theorem 165, we must show that y ∈ f(K). Since {y_n} ⊆ f(K), by definition there exists a sequence {x_n} ⊆ K such that f(x_n) = y_n. As seen above, the sequence {x_n} is bounded. The Bolzano-Weierstrass Theorem yields a subsequence {x_{n_k}} and a point x̃ ∈ ℝ such that lim_{k→∞} x_{n_k} = x̃. Since K is closed, x̃ ∈ K. Moreover, the continuity of f implies that

y = lim_{k→∞} y_{n_k} = lim_{k→∞} f(x_{n_k}) = f(x̃)

Therefore, y ∈ f(K), as desired.

Before proving Weierstrass' Theorem, observe that the fact that continuity preserves compactness is quite remarkable. It is another characteristic that distinguishes compact sets among closed sets, for which in general this fact does not hold, as the next example shows.

Example 802 The function f(x) = e^{−x} is continuous, but the image of the closed, but not compact, set [0, ∞) under f is the set (0, 1], which is not closed. N

Second proof of Weierstrass' Theorem As for the previous lemma, we prove the result for n = 1. By Lemma 801, f(K) is compact, so it is bounded. By the Least Upper Bound Principle, there exists sup f(K). Since sup f(K) ∈ ∂f(K) (why?) and f(K) is closed, it follows that sup f(K) ∈ f(K). Therefore, sup f(K) = max f(K), that is, there exists x1 ∈ K such that f(x1) = max_{x∈K} f(x). A similar argument shows that arg min_{x∈K} f(x) is not empty.

18.3 Existence: Tonelli's Theorem

18.3.1 Coercivity

Weierstrass' Theorem guarantees the existence of both maximizers and minimizers. However, when studying optimization problems in economics, one is generally interested in the existence of maximizers or of minimizers, but rarely in both. For example, in many economic applications the existence of maximizers is of crucial importance, while that of minimizers is of little or no interest at all.

For such a reason we will now introduce a class of functions which, thanks to an ingenious use of Weierstrass' Theorem, are guaranteed to admit maximizers under weaker hypotheses, without making any mention of minimizers.¹² Recall that for a function f : A ⊆ ℝⁿ → ℝ the upper contour set {x ∈ A : f(x) ≥ t} is indicated as (f ≥ t).

Definition 803 A function f : A ⊆ ℝⁿ → ℝ is said to be coercive on a subset C of A if there is a scalar t ∈ ℝ such that the set

(f ≥ t) ∩ C = {x ∈ C : f(x) ≥ t}   (18.19)

is non-empty and compact.

Thus, a function is coercive on C when there is at least one upper contour set that has a non-empty and compact intersection with C. In particular, when A = C the function is just said to be coercive, without any further specification.

Example 804 The function f : ℝ → ℝ given by f(x) = −x² is coercive. Its graph is a downward parabola

[Figure: the downward parabola y = −x², cut by a horizontal line y = t]

that already suggests its coercivity. Formally, we have

{x ∈ ℝ : f(x) ≥ t} = [−√(−t), √(−t)] if t ≤ 0, and = ∅ if t > 0

So, {x ∈ ℝ : f(x) ≥ t} is non-empty and compact for every t ≤ 0. N

¹² Needless to say, the theorems of this section can be "flipped over" (just take −f) in order to guarantee the existence of minimizers, now without caring about maximizers.
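The upper contour sets just computed can be checked numerically. A minimal sketch, assuming a grid on [−5, 5]:

import math

# Check that {x : f(x) >= t} = [-sqrt(-t), sqrt(-t)] for t <= 0,
# and is empty for t > 0, where f(x) = -x^2.
def f(x):
    return -x * x

for t in (-4.0, -1.0, -0.25, 0.5):
    grid = [k / 100 for k in range(-500, 501)]          # grid on [-5, 5]
    contour = [x for x in grid if f(x) >= t]
    if contour:
        lo, hi = min(contour), max(contour)
        print(t, (lo, hi), (-math.sqrt(-t), math.sqrt(-t)))
    else:
        print(t, "empty")   # happens exactly when t > 0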

Example 805 Consider the cosine function f : ℝ → ℝ given by f(x) = cos x, with graph:

[Figure: graph of y = cos x]

This function is coercive on [−π, π]. For example, for t = 0 one has that

{x ∈ [−π, π] : f(x) ≥ 0} = [−π/2, π/2]

More generally, from the graph it is easy to see that the set {x ∈ [−π, π] : f(x) ≥ t} is non-empty and compact for every t ≤ 1. However, the function fails to be coercive on the entire real line: the set {x ∈ ℝ : f(x) ≥ t} is unbounded – so, not compact – for every t ≤ 1 and is empty for every t > 1 (as one can easily see from the graph). N

As the last example shows, coercivity is a joint property of the function f and of the set C, that is, of the pair (f, C). It is also an ordinal property:

Proposition 806 Given a function f : A ⊆ ℝⁿ → ℝ, let g : B ⊆ ℝ → ℝ be strictly increasing with Im f ⊆ B. The function f is coercive on C ⊆ A if and only if the composite function g ∘ f is coercive on C.

Proof In proving Proposition 782 we noted that

f(x) ≥ f(y) ⟺ (g ∘ f)(x) ≥ (g ∘ f)(y)   ∀x, y ∈ A

It thus follows that

(f ≥ t) = (g ∘ f ≥ g(t))   ∀t ∈ ℝ

which implies the desired result (as the reader can easily verify).

Example 807 Thanks to Example 804 and Proposition 806, the famous Gaussian function f : ℝ → ℝ defined by f(x) = e^{−x²} is coercive. This should be clear by inspection of its graph:

[Figure: the bell-shaped graph of y = e^{−x²}]

which is the well-known "bell curve" found in statistics courses (cf. Example 1258). N

All continuous functions are coercive on compact sets. This will be a simple consequence of the following important property of upper and lower contour sets of continuous functions.

Lemma 808 Let f : A ⊆ ℝⁿ → ℝ be continuous on a closed subset C of A. Then, the sets (f ≥ t) ∩ C are closed for every t ∈ ℝ.

Proof If (f ≥ t) is empty, we have (f ≥ t) ∩ C = ∅, which is trivially closed. So, let (f ≥ t) be non-empty. Let {x_n} ⊆ (f ≥ t) ∩ C be a sequence converging to x ∈ ℝⁿ. By Theorem 165, to prove that (f ≥ t) ∩ C is closed one must show that x ∈ (f ≥ t) ∩ C. The fact that C is closed implies that x ∈ C. The continuity of f at x implies that f(x_n) → f(x). Since f(x_n) ≥ t for every n, a simple application of Proposition 296 shows that f(x) ≥ t, that is, x ∈ (f ≥ t). We conclude that x ∈ (f ≥ t) ∩ C, as desired.

Example 809 The hypothesis that C is closed is crucial. Take for example f : ℝ → ℝ given by f(x) = x. If C = (0, 1), we have (f ≥ t) ∩ C = [t, 1) for every t ∈ (0, 1) and such sets are not closed. N

In view of Lemma 808, the next result is now quite obvious.

Proposition 810 A function f : A ⊆ ℝⁿ → ℝ which is continuous on a compact subset C of A is coercive on C.

Proof Let C ⊆ A be compact. If f : A ⊆ ℝⁿ → ℝ is continuous on C, Lemma 808 implies that every set (f ≥ t) ∩ C is closed. Since a closed subset of a compact set is itself compact, it follows that every (f ≥ t) ∩ C is compact. Therefore, f is coercive on C.

Continuous functions f on compact sets C are, thus, a first relevant example of pairs (f, C) exhibiting coercivity. Let us see a few more examples.

Example 811 Let f : ℝ → ℝ be defined by f(x) = 1 − x². Its graph is:

[Figure: the downward parabola y = 1 − x²]

This function is coercive, as the graph suggests. Formally, we have

{x ∈ ℝ : f(x) ≥ t} = [−√(1 − t), √(1 − t)] if t ≤ 1, and = ∅ if t > 1

and so the set {x ∈ ℝ : f(x) ≥ t} is non-empty and compact for every t ≤ 1. For example, for t = 0 we have

{x ∈ ℝ : f(x) ≥ 0} = [−1, 1]

which suffices to conclude that f is coercive (indeed, in Definition 803 we require the existence of at least one t ∈ ℝ for which the set {x ∈ ℝ : f(x) ≥ t} is non-empty and compact). N

Example 812 The function f : ℝ → ℝ defined by f(x) = e^{−|x|} is coercive. Indeed

{x ∈ ℝ : f(x) ≥ t} = ℝ if t ≤ 0; = [log t, −log t] if t ∈ (0, 1]; = ∅ if t > 1

and so {x ∈ ℝ : f(x) ≥ t} is non-empty and compact for each t ∈ (0, 1]. N

Example 813 Let f : ℝ → ℝ be defined by

f(x) = log |x| if x ≠ 0, and f(0) = 0

Set C = [−1, 1]. We have

{x ∈ ℝ : f(x) ≥ t} = (−∞, −e^t] ∪ [e^t, +∞) ∪ {0} if t ≤ 0; = (−∞, −e^t] ∪ [e^t, +∞) if t > 0

and so

{x ∈ ℝ : f(x) ≥ t} ∩ C = ∅ if t > 0; = [−1, −e^t] ∪ [e^t, 1] ∪ {0} if t ≤ 0

Thus the function is coercive on the compact set [−1, 1] (although it is discontinuous at 0, thus making Proposition 810 inapplicable). N

18.3.2 Tonelli

The fact that coercivity and continuity of a function guarantee the existence of a maximizer is rather intuitive. The upper contour set (f ≥ t) indeed "cuts out the low part" – i.e., the part under the value t – of Im f, leaving untouched the high part – where the maximum value lies. The following result, a version of a result of Leonida Tonelli, formalizes this intuition by establishing the existence of maximizers for coercive functions.

Theorem 814 (Tonelli) A function f : A ⊆ ℝⁿ → ℝ which is coercive and continuous on a subset C of A admits (at least) a maximizer in C, that is, there exists x̂ ∈ C such that

f(x̂) = max_{x∈C} f(x)

Proof Since f is coercive, there exists t ∈ ℝ such that the upper contour set K = (f ≥ t) ∩ C is non-empty and compact. By Weierstrass' Theorem, there exists x̂ ∈ K such that f(x̂) ≥ f(x) for every x ∈ K. At the same time, if x ∈ C ∖ K we have f(x) < t and so f(x̂) ≥ t > f(x). It follows that f(x̂) ≥ f(x) for every x ∈ C, that is, f(x̂) = max_{x∈C} f(x).

Thanks to Proposition 810, the hypotheses of Tonelli's Theorem are weaker than those of Weierstrass' Theorem. On the other hand, weaker hypotheses lead to a weaker result (as always, no free meals) in which only the existence of a maximizer is guaranteed, without making any mention of minimizers. Since, as we already noted, in many economic optimization problems one is interested in the existence of maximizers, Tonelli's Theorem is important because it allows us to "trim off" overabundant hypotheses (with respect to our needs) from Weierstrass' Theorem. In particular, we can use Tonelli's Theorem in optimization problems where the choice set is not compact – for example, in Chapter 28 we will use it with open choice sets.

To sum up, the optimization problem (18.2), that is,

max_x f(x)   sub x ∈ C

has a solution if f is coercive and continuous on C. Under such hypotheses, one cannot say anything about the dual minimization problem with min instead of max.
Example 815 The functions f, g : ℝ → ℝ defined by f(x) = 1 − x² and g(x) = e^{−x²} are both coercive (see Examples 811 and 807). Since they are continuous as well, by Tonelli's Theorem we can say that arg max_{x∈ℝ} f(x) ≠ ∅ and arg max_{x∈ℝ} g(x) ≠ ∅ (as easily seen from their graphs, for both functions the origin is the global maximizer). Note that, instead, arg min_{x∈ℝ} f(x) = arg min_{x∈ℝ} g(x) = ∅. Indeed, the set ℝ is not compact, thus making Weierstrass' Theorem inapplicable. N
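A minimal numerical illustration, assuming SciPy is available: Tonelli's Theorem guarantees maximizers for both functions, which a numerical optimizer indeed locates at the origin.

import numpy as np
from scipy.optimize import minimize

f = lambda x: 1 - x[0] ** 2
g = lambda x: np.exp(-x[0] ** 2)

res_f = minimize(lambda x: -f(x), x0=[2.0])   # start away from the optimum
res_g = minimize(lambda x: -g(x), x0=[2.0])

print(res_f.x, res_g.x)   # both approximately [0.], the global maximizer
# Neither function has a minimizer on R: -f is unbounded below and -g
# never attains its infimum, so the dual min problems have no solution.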

A constant function on ℝⁿ is a simple example of a continuous function that, trivially, admits maximizers (and minimizers as well) but is not coercive. So, coercivity is not a necessary condition for the existence of maximizers, even for continuous objective functions. Yet, by Tonelli's Theorem it is a sufficient condition for continuous objective functions.

N.B. The coercivity of f on C amounts to saying that there exists a non-empty compact set K such that

arg max_{x∈C} f(x) ⊆ K ⊆ C

Indeed, just set K = (f ≥ t) ∩ C in (18.19) because, if the solution set is non-empty, we trivially have arg max_{x∈C} f(x) = (f ≥ max_{x∈C} f(x)) ∩ C. In words, coercivity thus requires that the solution set can be "inscribed" in a compact subset of the choice set. Such a compact subset can be regarded as a first, possibly very rough, estimate of the solution set. However rough, in view of Tonelli's Theorem such an estimate ensures for continuous functions the existence of solutions. In this vein, Tonelli's Theorem can be viewed as the outcome of two elements: (i) the continuity of the objective function, (ii) a preliminary "compact" estimate of the solution set.¹³ O

18.3.3 Supercoercivity

In light of Tonelli's Theorem, it becomes important to identify classes of coercive functions. Supercoercive functions are a first relevant example.¹⁴

Definition 816 A function f : ℝⁿ → ℝ is said to be supercoercive if, for every sequence {x_n} ⊆ ℝⁿ,

‖x_n‖ → +∞ ⟹ f(x_n) → −∞

Supercoercivity requires f to diverge to −∞ along any possible unbounded sequence {x_n} ⊆ ℝⁿ – i.e., any sequence such that ‖x_n‖ → +∞. In words, the function cannot take, indefinitely, increasing values on a sequence that "dashes off" to infinity. This makes all upper contour sets bounded:

Proposition 817 A function f : ℝⁿ → ℝ is supercoercive if and only if all its upper contour sets are bounded.

Proof "Only if". Let f : ℝⁿ → ℝ be supercoercive. Suppose, by contradiction, that there is an upper contour set (f ≥ t) which is not bounded. Then, there is a sequence {x_n} ⊆ (f ≥ t) such that ‖x_n‖ → +∞. That is, {x_n} ⊆ ℝⁿ is such that ‖x_n‖ → +∞ and f(x_n) ≥ t for each n. But ‖x_n‖ → +∞ implies f(x_n) → −∞ because f is supercoercive. This contradiction proves that all sets (f ≥ t) are bounded.
"If". Suppose that all upper contour sets are bounded. Let {x_n} ⊆ ℝⁿ be such that ‖x_n‖ → +∞. Fix any scalar t < sup_{x∈ℝⁿ} f(x), so that the corresponding upper contour set (f ≥ t) is not empty. Since it is bounded, by Definition 159 there exists K > 0 large enough so that ‖x‖ < K for all x ∈ (f ≥ t). Since ‖x_n‖ → +∞, there exists n_t ≥ 1 large enough so that x_n ∉ (f ≥ t) for all n ≥ n_t, i.e., f(x_n) < t for all n ≥ n_t. In turn, this implies that lim sup f(x_n) ≤ t. Since this inequality holds for all scalars t < sup_{x∈ℝⁿ} f(x), we conclude that lim sup f(x_n) = −∞, which in turn trivially implies that lim f(x_n) = −∞, as desired.

¹³ Ultracoda readers will learn that (i) can be substantially weakened.
¹⁴ For the sake of simplicity, here we focus on functions defined on ℝⁿ, although the analysis holds for functions defined on a subset A of ℝⁿ as well (in the next definition one then requires {x_n} ⊆ A).

Example 818 (i) The function f : ℝ → ℝ defined by f(x) = −x² is supercoercive. Indeed, since |x_n|² = x_n² for every n, we have that |x_n| → +∞ only if x_n² → +∞. This implies that

|x_n| → +∞ ⟹ f(x_n) = −x_n² → −∞

yielding that the function is supercoercive.

(ii) The function f : ℝ² → ℝ given by f(x) = −x1² − x2² is supercoercive. Indeed,

f(x) = −(x1² + x2²) = −(√(x1² + x2²))² = −‖x‖²

and so ‖x_n‖ → +∞ implies f(x_n) → −∞.

(iii) More generally, the function f : ℝⁿ → ℝ given by f(x) = −‖x‖² = −Σ_{i=1}^n x_i² is supercoercive. N
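A quick numerical sketch of this divergence, along the assumed test sequence x_n = (n, n); the contrasting function anticipates the next example.

import numpy as np

def f(x):
    return -np.dot(x, x)            # f(x) = -||x||^2, supercoercive

for n in (1, 10, 100, 1000):
    x_n = np.array([n, n], dtype=float)     # ||x_n|| = n sqrt(2) -> +infinity
    print(n, np.linalg.norm(x_n), f(x_n))   # f(x_n) = -2 n^2 -> -infinity

g = lambda x: -(x[0] - x[1]) ** 2   # not supercoercive (see the next example)
print(g(np.array([1000.0, 1000.0])))  # stays at 0 along x_n = (n, n)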

Example 819 The function f : ℝ² → ℝ given by f(x) = −(x1 − x2)² is not supercoercive. Consider the sequence x_n = (n, n). One has that f(x_n) = 0 for every n ≥ 1, although ‖x_n‖ = n√2 → +∞. N

The next result shows that supercoercivity implies coercivity for functions f that are continuous on a closed set C. As a result, Tonelli's Theorem can be applied to the pair (f, C).

Proposition 820 A supercoercive function f : ℝⁿ → ℝ which is continuous on a closed subset C of ℝⁿ is coercive there. In particular, the sets (f ≥ t) ∩ C are compact for every t ∈ ℝ.

Proof The last result implies that, for every t ∈ ℝ, the sets (f ≥ t) ∩ C are bounded. Since f is continuous and C is closed, such sets are also closed. Indeed, take {x_n} ⊆ (f ≥ t) ∩ C such that x_n → x ∈ ℝⁿ. By Theorem 165, to show that (f ≥ t) ∩ C is closed it suffices to show that x ∈ (f ≥ t) ∩ C. As C is closed, we have x ∈ C. Since f is continuous, we have lim f(x_n) = f(x). Since f(x_n) ≥ t for every n ≥ 1, it follows that f(x) ≥ t, that is, x ∈ (f ≥ t). Hence, x ∈ (f ≥ t) ∩ C and the set (f ≥ t) ∩ C is closed. Since it is bounded, it is compact.

The reader should note that, when considering a supercoercive and continuous function, all sets (f ≥ t) ∩ C are compact, while coercivity requires only that at least one of them be non-empty and compact. This shows, once again, how supercoercivity is a much stronger property than coercivity. However, it is simpler both to formulate and to verify, which explains its appeal.

The next result establishes a simple comparison criterion for supercoercivity.



Proposition 821 Let f : ℝⁿ → ℝ be supercoercive. If g : ℝⁿ → ℝ is such that, for some k > 0,

‖x‖ ≥ k ⟹ g(x) ≤ f(x)   ∀x ∈ ℝⁿ

then g is supercoercive.

Proof Let {x_n} ⊆ ℝⁿ be such that ‖x_n‖ → +∞. This implies that there exists n̄ ≥ 1 such that ‖x_n‖ ≥ k, and so g(x_n) ≤ f(x_n), for every n ≥ n̄. At the same time, since f is supercoercive, the sequence {f(x_n)} is such that f(x_n) → −∞. This implies that for each K ∈ ℝ there exists n_K ≥ 1 such that f(x_n) < K for all n ≥ n_K. For each K ∈ ℝ, set n̂_K = max{n̄, n_K}. We then have g(x_n) ≤ f(x_n) < K for all n ≥ n̂_K, thus proving that g(x_n) → −∞ as well.

Supercoercivity is thus inherited via dominance: given a function g, if we can find a supercoercive function f such that g ≤ f on some set {x ∈ ℝⁿ : ‖x‖ ≥ k}, then g is also supercoercive. A natural supercoercive "test" function f : ℝⁿ → ℝ is

f(x) = α‖x‖ + β

with α < 0 and β ∈ ℝ. It is a very simple function, easily seen to be supercoercive. If a function g : ℝⁿ → ℝ is such that

g(x) ≤ α‖x‖ + β   (18.20)

on some set {x ∈ ℝⁿ : ‖x‖ ≥ k}, then it is supercoercive.

Example 822 Let g : ℝⁿ → ℝ be defined by g(x) = 1 − ‖x‖^γ. If γ ≥ 1, then g is supercoercive. Indeed, on {x ∈ ℝⁿ : ‖x‖ ≥ 1} we have ‖x‖^γ ≥ ‖x‖, so

g(x) = 1 − ‖x‖^γ ≤ 1 − ‖x‖

The inequality (18.20) holds with α = −1 and β = 1, so g is supercoercive (for γ = 2 and n = 1, we get back the function g(x) = 1 − x² that was shown to be coercive in Example 811).
Since g is continuous, by Tonelli's Theorem it has at least one maximizer in ℝⁿ. Yet, it is easily seen that the function has no minimizers (here Weierstrass' Theorem is useless because ℝⁿ is not compact). N

18.4 Separating sets and points

In applications it is sometimes important to separate a point and a set. As a dividend of Tonelli's Theorem we will state a separation theorem.
A hyperplane H in ℝⁿ is the set of points x that satisfy the condition a · x = b for some 0 ≠ a ∈ ℝⁿ and b ∈ ℝ. That is,

H = {x ∈ ℝⁿ : a · x = b}

In view of Riesz's Theorem, hyperplanes are the level curves of linear functions.
A hyperplane H defines two closed half-spaces

H₊ = {x ∈ ℝⁿ : a · x ≥ b}   and   H₋ = {x ∈ ℝⁿ : a · x ≤ b}

whose intersection is H, i.e., H₊ ∩ H₋ = H.

Definition 823 Given two sets X and Y of ℝⁿ, we say that they are separated if there exists a hyperplane H such that X ⊆ H₊ and Y ⊆ H₋. In particular, they are:

(i) strictly separated if a · x > a · y for all x ∈ X and y ∈ Y;

(ii) strongly separated if a · x ≥ b + ε > b ≥ a · y for all x ∈ X and y ∈ Y and for some ε > 0.

Intuitively, two sets are separated when there exists a hyperplane that acts like a watershed between them, with each set included in a different half-space determined by the hyperplane.
The separation between a convex set and a single point is often important. Next we focus on such a case.

Proposition 824 Let C be a convex set in ℝⁿ and let x₀ ∉ C.

(i) If C is closed, then {x₀} and C are strongly separated.

(ii) If C is open, then {x₀} and C are strictly separated.

Proof We only prove (i), while we omit the non-trivial proof of (ii). Without loss of generality, assume that x₀ = 0 ∉ C. Consider the continuous function f : ℝⁿ → ℝ given by f(x) = −‖x‖². This function is supercoercive (Example 818). By Proposition 820, f is coercive on the closed set C, so it has a maximizer c ∈ C by Tonelli's Theorem. If x is any point of C, we have ‖c‖² ≤ ‖λc + (1 − λ)x‖² for every λ ∈ [0, 1]. Hence

‖c‖² ≤ λ²‖c‖² + (1 − λ)²‖x‖² + 2λ(1 − λ) c · x
(1 + λ)‖c‖² ≤ (1 − λ)‖x‖² + 2λ c · x

For λ → 1, we get ‖c‖² ≤ c · x for all x ∈ C. Therefore, setting b = ‖c‖²/2 we have

c · x ≥ ‖c‖² > b > 0 = c · x₀

which is the desired separation property.
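The proof's construction can be traced numerically when the projection of the origin onto C is available in closed form, as for a disc. A hedged sketch, where the disc's center and radius are assumptions:

import numpy as np

# Separation of {0} from the assumed closed convex set
# C = {x : ||x - z|| <= r}. Maximizing f(x) = -||x||^2 over C amounts to
# projecting the origin onto C, which for a disc is closed form.
z, r = np.array([2.0, 2.0]), 1.0       # center and radius (assumptions)

c = z - r * z / np.linalg.norm(z)      # point of C closest to the origin
a = c                                   # normal of the separating hyperplane
b = np.dot(c, c) / 2                    # level: a . x >= ||c||^2 > b > 0 = a . 0

# Check the inequality a . x >= ||c||^2 on random points of C
rng = np.random.default_rng(0)
for _ in range(5):
    d = rng.normal(size=2)
    x = z + r * d / np.linalg.norm(d) * rng.uniform()   # random point in C
    assert np.dot(a, x) >= np.dot(c, c) - 1e-9
print(a, b)   # the hyperplane a . x = b strongly separates {0} from C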

Corollary 825 A compact convex set and a closed convex set are separated if they are disjoint.

Proof Let K be a compact convex set and C be a closed convex set, with K ∩ C = ∅. The set K − C = {x − y : x ∈ K, y ∈ C} is a closed and convex set (Proposition 1344) that does not contain the origin 0 since K ∩ C = ∅. So, by (i) of the last theorem the sets {0} and K − C are strongly separated, so 0 = a · 0 ≤ b < b + ε ≤ a · (x − y) for every x ∈ K and y ∈ C. Since b + ε > 0, this implies a · x ≥ b + ε + a · y > a · y, so K and C are separated.

18.5 Local extremal points

Let us now consider a local and weaker version of the notion of maximizer. By itself, it is a weakening of little interest, particularly for economic applications in which we are mainly interested in global extrema. For example, in the consumer problem it is not of much interest whether a bundle is a local maximizer or not: what matters is whether it is a global maximizer or not.

Nevertheless, thanks to differential calculus, local maximizers are of great instrumental importance, in primis (but not only) in the solution of optimization problems. For this reason, we will devote this section to them.

Consider a function f : ℝ → ℝ with a graph that recalls the profile of a mountain range:

[Figure: a function with several peaks of different heights]

The highest peak is the (global) maximum value, but intuitively the other peaks, too, correspond to points that, locally, are maximizers. The next definition formalizes this simple idea.

Definition 826 Let f : A ⊆ ℝⁿ → ℝ be a real-valued function and C a subset of A. A vector x̂ ∈ C is said to be a local maximizer of f on C if there exists a neighborhood B_ε(x̂) of x̂ such that

f(x̂) ≥ f(x)   ∀x ∈ B_ε(x̂) ∩ C   (18.21)

The value f(x̂) of the function at x̂ is called the local maximum value of f on C.

The local maximizer is strong if in (18.21) we have f(x̂) > f(x) for every x ∈ B_ε(x̂) ∩ C such that x ≠ x̂. In the terminology of the optimization problem (18.2), a local maximizer of f on C is called a local solution of the problem. We have analogous definitions for local minimizers, with ≤ and < in place of ≥ and >.

A global maximizer on C is obviously also a local maximizer. The notion of local maximizer is, indeed, much weaker than that of global maximizer. As the next example shows, it may happen that there are (even many) local maximizers and no global maximizers.

Example 827 (i) Let f : ℝ → ℝ be given by f(x) = x⁶ − 3x² + 1. In Example 1257 we will see that its graph is:

[Figure: graph of y = x⁶ − 3x² + 1, with a local peak at the origin]

In particular, the origin x = 0 is a local maximizer, but not a global one. Indeed, lim_{x→+∞} f(x) = lim_{x→−∞} f(x) = +∞, thus the function has no global maximizers.
(ii) Let f : ℝ → ℝ be given by

f(x) = cos x if x ≤ 0, and f(x) = x if x > 0

with the graph

[Figure: cosine waves for x ≤ 0 joined to the line y = x for x > 0]

The function has infinitely many local maximizers (i.e., x = −2kπ for k ∈ ℕ), but no global ones. N

Terminology In what follows, maximizers (and minimizers) are understood to be global even if not stated explicitly. The adjective "local" will always be added when they are local in the sense of the previous definition.

O.R. The most important part of the definition of a local maximizer is "if there exists a neighborhood". A common mistake is to replace the correct "if there exists a neighborhood" by the incorrect "if, by taking a neighborhood B_ε(x̂) of x̂". In such a way, we do not define a local maximizer but a global one. Indeed, to fix a priori a neighborhood B_ε(x̂) amounts to considering B_ε(x̂) rather than C as the choice set, so a different optimization problem will be addressed. Relatedly, in the neighborhood B_ε(x̂) in (18.21) the local maximizer is, clearly, a global one. Such a "choice set" is, however, chosen by the function, not posited by us. So, it is typically of little interest for the application that motivated the optimization problem. Applications discipline optimization problems, not vice versa. H

O.R. An isolated point x₀ of C is always both a local maximizer and a local minimizer. Indeed, by definition there is a neighborhood B_ε(x₀) of x₀ such that B_ε(x₀) ∩ C = {x₀}, so the inequalities f(x₀) ≥ f(x) and f(x₀) ≤ f(x) for every x ∈ B_ε(x₀) ∩ C reduce to f(x₀) ≥ f(x₀) and f(x₀) ≤ f(x₀), which are trivially true.
Considering isolated points as both local maximizers and local minimizers is a bit odd. To avoid this, we could reformulate the definition of local maximizer and minimizer by requiring x̂ to be a limit point of C. However, an even more unpleasant consequence would result: if an isolated point were a global extremal (e.g., recall the example at the end of Section 18.1.1), we should say that it is not so in the local sense. Thus, the remedy would be worse than the disease. H

18.6 Concavity and quasi-concavity

18.6.1 Maxima

Concave functions find their most classic application in the study of optimization problems, in which they enjoy truly remarkable properties. The first of such properties is that maximizers of concave functions are automatically global.

Theorem 828 Let f : C ⊆ ℝⁿ → ℝ be a concave function defined on a convex subset C. If the point x̂ ∈ C is a local maximizer, then it is a global maximizer.

Proof Let x̂ ∈ C be a local maximizer. By definition, there exists a neighborhood B_ε(x̂) such that

f(x̂) ≥ f(x)   ∀x ∈ B_ε(x̂) ∩ C   (18.22)

Suppose, by contradiction, that x̂ is not a global maximizer. Then, there exists y ∈ C such that f(y) > f(x̂). Since f is concave, for every t ∈ (0, 1) we have

f(tx̂ + (1 − t)y) ≥ tf(x̂) + (1 − t)f(y) > tf(x̂) + (1 − t)f(x̂) = f(x̂)   (18.23)

Moreover, since C is convex, we have tx̂ + (1 − t)y ∈ C for every t ∈ (0, 1). On the other hand,

lim_{t→1} ‖tx̂ + (1 − t)y − x̂‖ = ‖y − x̂‖ lim_{t→1} (1 − t) = 0

Therefore, there exists t̄ ∈ (0, 1) such that tx̂ + (1 − t)y ∈ B_ε(x̂) for every t ∈ (t̄, 1). From (18.23) it follows that for such t we have f(tx̂ + (1 − t)y) > f(x̂), which contradicts (18.22). We conclude that x̂ is a global maximizer.

This important result does not hold for quasi-concave functions:

Example 829 Let f : ℝ → ℝ be given by

f(x) = 2 if x ≤ 0;  f(x) = 2 − x if x ∈ (0, 1);  f(x) = 1 if x ≥ 1

Graphically:

[Figure: a decreasing graph, constant at 2 for x ≤ 0, decreasing on (0, 1), constant at 1 for x ≥ 1]

This function is quasi-concave because it is monotonic. All the points x > 1 are local maximizers, but not global maximizers. N

When f is quasi-concave, the set of maximizers arg max_{x∈C} f(x) is convex.¹⁵ Indeed, let y, z ∈ arg max_{x∈C} f(x) and let t ∈ [0, 1]. By quasi-concavity, we have

f(ty + (1 − t)z) ≥ min{f(y), f(z)} = f(y) = f(z) = max_{x∈C} f(x)

and therefore

f(ty + (1 − t)z) = max_{x∈C} f(x)

i.e., ty + (1 − t)z ∈ arg max_{x∈C} f(x).

Since arg max_{x∈C} f(x) is convex, there are three possibilities:

(i) arg max_{x∈C} f(x) is empty: there are no maximizers;

¹⁵ All the more if f is concave. Recall that the properties established for quasi-concave functions hold, a fortiori, for concave functions (the latter being a particular class of quasi-concave functions). The converse obviously does not hold: as just noted, Theorem 828 is an important example of this fact.

(ii) arg max_{x∈C} f(x) is a singleton: there exists a unique maximizer;

(iii) arg max_{x∈C} f(x) consists of infinitely many points: there exist infinitely many maximizers.

We illustrate the different possibilities with some examples.

Example 830 (i) Let f : ℝ₊₊ → ℝ be defined by f(x) = log x for every x > 0. The function f is strictly concave. It is easy to see that it has no maximizers, that is, arg max_{x>0} f(x) = ∅.
(ii) Let f : ℝ → ℝ be defined by f(x) = 1 − x² for every x ∈ ℝ. Then f is strictly concave and the unique maximizer is x̂ = 0, so that arg max_{x∈ℝ} f(x) = {0}.
(iii) Let f : ℝ → ℝ be defined by

f(x) = x if x ≤ 1;  f(x) = 1 if x ∈ (1, 2);  f(x) = 3 − x if x ≥ 2

with graph

[Figure: a concave "plateau" graph, increasing up to x = 1, flat at height 1 on [1, 2], decreasing afterwards]

Then f is concave and arg max_{x∈ℝ} f(x) = [1, 2]. N

The last function of this example, with infinitely many maximizers, is concave but not strictly concave. The next result shows that, indeed, strict quasi-concavity implies that a maximizer, if it exists, is necessarily unique. In other words, for strictly quasi-concave functions, arg max_{x∈C} f(x) is at most a singleton (so, the unique maximizer is also a strong one, if it exists).

Theorem 831 A strictly quasi-concave function f : C ⊆ ℝⁿ → ℝ defined on a convex subset C has at most one maximizer.

Proof Suppose that x̂₁, x̂₂ ∈ C are two maximizers for f. We want to show that x̂₁ = x̂₂. Suppose, by contradiction, that x̂₁ ≠ x̂₂. Since x̂₁ and x̂₂ are maximizers, we have f(x̂₁) = f(x̂₂) = max_{x∈C} f(x). Set x_t = tx̂₁ + (1 − t)x̂₂ for t ∈ (0, 1). Since C is convex, x_t ∈ C. Moreover, by strict quasi-concavity,

f(x_t) = f(tx̂₁ + (1 − t)x̂₂) > min{f(x̂₁), f(x̂₂)} = max_{x∈C} f(x)

which is a contradiction. We conclude that x̂₁ = x̂₂, as desired.

In the last example, f(x) = 1 − x² is an instance of a strictly concave function with a unique maximizer x̂ = 0, while f(x) = log x is an instance of a strictly concave function that has no maximizers. The clause "at most" is, therefore, indispensable because, unfortunately, maximizers might not exist.
Having (at most) a unique maximizer is the key characteristic of strictly quasi-concave functions that motivates their widespread use in economic applications. Indeed, strict quasi-concavity is the simplest condition which guarantees the uniqueness of the maximizer, a key property for comparative statics exercises (as we remarked earlier in the chapter).

18.6.2 Minima

Minimization problems for concave functions also have some noteworthy properties.

Proposition 832 Let f : C → ℝ be a non-constant function defined on a convex subset C of ℝⁿ.

(i) If f is concave, then arg min_{x∈C} f(x) ⊆ ∂C.

(ii) If f is strictly quasi-concave, then arg min_{x∈C} f(x) ⊆ ext C.

Proof Suppose arg min_{x∈C} f(x) ≠ ∅ (otherwise the result is trivially true). (i) Let x̂ ∈ arg min_{x∈C} f(x). Since f is not constant, there exists y ∈ C such that f(y) > f(x̂). Suppose, by contradiction, that x̂ is an interior point of C. Set z_λ = λx̂ + (1 − λ)y with λ ∈ ℝ. The points z_λ are the points of the straight line that passes through x̂ and y. Since x̂ is an interior point of C, there exists λ̄ > 1 such that z_λ̄ ∈ C. On the other hand, x̂ = z_λ̄/λ̄ + (1 − 1/λ̄)y. Therefore, we get the contradiction

f(x̂) = f(z_λ̄/λ̄ + (1 − 1/λ̄)y) ≥ (1/λ̄)f(z_λ̄) + (1 − 1/λ̄)f(y) > (1/λ̄)f(x̂) + (1 − 1/λ̄)f(x̂) = f(x̂)

It follows that x̂ ∈ ∂C, as desired. (ii) Let x̂ ∈ arg min_{x∈C} f(x). Suppose, by contradiction, that x̂ ∉ ext C. Then, there exist x, y ∈ C with x ≠ y and λ ∈ (0, 1) such that x̂ = λx + (1 − λ)y. By strict quasi-concavity, f(x̂) = f(λx + (1 − λ)y) > min{f(x), f(y)} ≥ f(x̂), a contradiction. We conclude that x̂ ∈ ext C, as desired.

Hence, under (i) the search for minimizers can be restricted to the boundary points of C. More is true under (ii), where the search can be restricted to the extreme points of C, an even smaller set (Proposition 693).
18.6. CONCAVITY AND QUASI-CONCAVITY 559

Example 833 Consider the strictly concave function f : [−1, 1] → ℝ defined by f(x) = 1 − x². Since {−1, 1} is the set of extreme points of C = [−1, 1], by the last proposition the minimizers belong to this set. Clearly, both its elements are minimizers. N

Extreme points take center stage in the compact case, a remarkable fact because the set of extreme points can be a small subset of the frontier – for instance, if C is a polytope we can restrict the search for minimizers to the vertices.

Theorem 834 (Bauer) Let f : C → ℝ be a continuous function defined on a convex and compact subset C of ℝⁿ.

(i) If f is concave, then

min_{x∈C} f(x) = min_{x∈ext C} f(x)   (18.24)

and

∅ ≠ arg min_{x∈ext C} f(x) ⊆ arg min_{x∈C} f(x) ⊆ co(arg min_{x∈ext C} f(x))   (18.25)

(ii) If f is strictly quasi-concave, then

∅ ≠ arg min_{x∈C} f(x) ⊆ ext C

Relative to the previous result, Weierstrass' Theorem now ensures the existence of minimizers. More interestingly, thanks to Minkowski's Theorem, in (i) we can now say that a concave function attains its minimum value at some extreme point. So, in terms of value attainment the minimization problem

min_x f(x)   sub x ∈ C   (18.26)

reduces to the much simpler problem

min_x f(x)   sub x ∈ ext C   (18.27)

that only involves extreme points. In particular, in the important case when f is strictly concave we can take advantage of both (i) and (ii), so

∅ ≠ arg min_{x∈ext C} f(x) = arg min_{x∈C} f(x)

The minimization problem (18.26) then reduces to the simpler problem (18.27) in terms of both solutions and value attainment.

Proof By Weierstrass' Theorem, arg min_{x∈C} f(x) ≠ ∅. Point (ii) thus follows from the previous result. As to (i), we first prove that

arg min_{x∈C} f(x) ⊆ co(ext C ∩ arg min_{x∈C} f(x))   (18.28)

that is, that minimizers are a convex combination of extreme points which are, themselves, minimizers. Let x̂ ∈ arg min_{x∈C} f(x). By Minkowski's Theorem, we have C = co ext C. Therefore, there exist a finite collection {x_i}_{i∈I} ⊆ ext C and a finite collection {λ_i}_{i∈I} ⊆ (0, 1],¹⁶ with Σ_{i∈I} λ_i = 1, such that x̂ = Σ_{i∈I} λ_i x_i. Since x̂ is a minimizer, we have f(x_i) ≥ f(x̂) for all i ∈ I. Together with concavity, this implies that

f(x̂) = f(Σ_{i∈I} λ_i x_i) ≥ Σ_{i∈I} λ_i f(x_i) ≥ Σ_{i∈I} λ_i f(x̂) = f(x̂)   (18.29)

Hence, we conclude that Σ_{i∈I} λ_i f(x_i) = f(x̂), which implies f(x_i) = f(x̂) for all i ∈ I. Indeed, if we had f(x_i) > f(x̂) for some i ∈ I, then we would reach the contradiction Σ_{i∈I} λ_i f(x_i) > f(x̂). It follows that for each i ∈ I we have x_i ∈ arg min_{x∈C} f(x) ∩ ext C, proving (18.28).

We are ready to prove (18.25). By the previous part of the proof, arg min_{x∈C} f(x) ∩ ext C ≠ ∅. Consider x̄ ∈ arg min_{x∈C} f(x) ∩ ext C. Let x̂ ∈ arg min_{x∈ext C} f(x). By definition and since x̄ ∈ ext C, we have that f(x̂) ≤ f(x̄). Since x̄ ∈ arg min_{x∈C} f(x), we have that f(x̄) ≤ f(x̂). This implies that f(x̄) = f(x̂) and, therefore, x̂ ∈ arg min_{x∈C} f(x). Since x̂ was arbitrarily chosen, it follows that arg min_{x∈ext C} f(x) ⊆ arg min_{x∈C} f(x) ∩ ext C, proving the first inclusion in (18.25). Clearly, ext C ∩ arg min_{x∈C} f(x) ⊆ arg min_{x∈ext C} f(x). So, ext C ∩ arg min_{x∈C} f(x) = arg min_{x∈ext C} f(x) and (18.28) yields the second inclusion in (18.25).

It remains to prove (18.24). Let x̂ ∈ arg min_{x∈C} f(x). By (18.25), there exist a finite collection {x̂_i}_{i∈I} ⊆ arg min_{x∈ext C} f(x) and a finite collection {λ_i}_{i∈I} ⊆ (0, 1], with Σ_{i∈I} λ_i = 1, such that x̂ = Σ_{i∈I} λ_i x̂_i. By concavity:

min_{x∈C} f(x) = f(x̂) ≥ Σ_{i∈I} λ_i f(x̂_i) = Σ_{i∈I} λ_i min_{x∈ext C} f(x) = min_{x∈ext C} f(x) ≥ min_{x∈C} f(x)

So, (18.24) holds.

Minimization problems for concave functions are, conceptually, equivalent to maximization problems for convex functions. So, Example 789 can now be viewed as an early illustration of Bauer's Theorem. Let us see other examples.
Example 835 (i) The function f in Example 833 is strictly concave. In particular, we have arg min_{x∈ext C} f(x) = arg min_{x∈C} f(x) = {−1, 1}, while co(arg min_{x∈ext C} f(x)) = [−1, 1].
(ii) Consider the simplex Δ² = {x ∈ ℝ³₊ : Σ_{i=1}^3 x_i = 1} of ℝ³. Define f : Δ² → ℝ by

f(x) = −(1/2)(1 − x1 − x2)² − (1/2)(1 − x3)²

It is easy to check that f is continuous and concave. Since Δ² is convex and compact with extreme points the versors e¹, e², e³, by Bauer's Theorem-(i) we have

∅ ≠ arg min_{i∈{1,2,3}} f(eⁱ) ⊆ arg min_{x∈Δ²} f(x) ⊆ co(arg min_{i∈{1,2,3}} f(eⁱ))   (18.30)

It is immediate to check that f(eⁱ) = −1/2 for all i ∈ {1, 2, 3}, that is,

arg min_{i∈{1,2,3}} f(eⁱ) = {e¹, e², e³}   and   co(arg min_{i∈{1,2,3}} f(eⁱ)) = Δ²
¹⁶ Without loss of generality, we assume that λ_i > 0 for all i ∈ I.

Let x̄ = (1/4, 1/4, 1/2) ∈ Δ² and x̂ = (1/2, 1/2, 0). We have f(x̄) = −1/4 > −1/2 = f(x̂), so x̄ does not belong to arg min_{x∈Δ²} f(x) but, clearly, belongs to co(arg min_{i∈{1,2,3}} f(eⁱ)). Moreover, x̂ belongs to arg min_{x∈Δ²} f(x) but, clearly, does not belong to arg min_{i∈{1,2,3}} f(eⁱ). This proves that the inclusions in (18.30) are strict. N

18.6.3 Affine functions

If we consider affine functions – i.e., functions that are both concave and convex – we have the following corollary of Bauer's Theorem.

Corollary 836 Let f : C → ℝ be a function defined on a convex and compact subset C of ℝⁿ. If f is affine, then

max_{x∈C} f(x) = max_{x∈ext C} f(x)   and   min_{x∈C} f(x) = min_{x∈ext C} f(x)   (18.31)

as well as

∅ ≠ arg max_{x∈C} f(x) = co(arg max_{x∈ext C} f(x))   (18.32)

and

∅ ≠ arg min_{x∈C} f(x) = co(arg min_{x∈ext C} f(x))   (18.33)

Proof By (18.24) we have (18.31). By Proposition 671, f is continuous. So, the sets in (18.32) and (18.33) are non-empty by Weierstrass' Theorem. Since f is affine, it is also concave. By (18.25),

co(arg min_{x∈ext C} f(x)) ⊆ co(arg min_{x∈C} f(x)) ⊆ co(co(arg min_{x∈ext C} f(x)))

so

co(arg min_{x∈ext C} f(x)) = co(arg min_{x∈C} f(x)) = arg min_{x∈C} f(x)

because arg min_{x∈C} f(x) is convex given that f is affine. Since −f is also affine, the result holds for arg max_{x∈C} f(x) as well.

For affine functions we therefore have an especially effective version of Weierstrass' Theorem: not only do both maximizers and minimizers exist, but they can be found by solving the much simpler optimization problems

max_x f(x)   sub x ∈ ext C   and   min_x f(x)   sub x ∈ ext C

that only involve extreme points. Moreover, by (18.31), the values attained are the same. So, the simpler problems are equivalent to the original ones in terms of both solutions and value attainment.
An earlier instance of such a remarkable simplification afforded by affine objective functions was discussed in Example 789-(ii). Next we provide another couple of examples.

Example 837 (i) Consider the affine function f : ℝ³ → ℝ defined by f(x) = x1 + 2x2 − x3 + 5 and the simplex Δ² = {(x1, x2, 1 − x1 − x2) : x1, x2 ≥ 0 and x1 + x2 ≤ 1}. Its extreme points are the versors e¹, e², and e³. By the last corollary, some of them have to be maximizers or minimizers. We have

f(e³) = 4 < f(e¹) = 6 < f(e²) = 7

By (18.32) and (18.33), arg max_{x∈C} f(x) = {e²} and arg min_{x∈C} f(x) = {e³}.
(ii) Consider the affine function f : ℝ³ → ℝ defined by f(x) = x1 + 2x2 + 2x3 + 5. Now we have

f(e¹) = 6 < f(e²) = f(e³) = 7

By (18.32) and (18.33),

arg max_{x∈C} f(x) = co{e², e³} = {(0, λ, 1 − λ) : λ ∈ [0, 1]}

and arg min_{x∈C} f(x) = {e¹}. N
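Corollary 836 makes this computation mechanical: evaluate f at the extreme points and compare. A minimal Python sketch of part (i):

import numpy as np

# Evaluate the affine objective f(x) = c . x + d at the extreme points
# (versors) of the simplex, per Corollary 836.
c, d = np.array([1.0, 2.0, -1.0]), 5.0
versors = np.eye(3)                  # e1, e2, e3

values = versors @ c + d             # f(e1), f(e2), f(e3)
print(values)                        # [6. 7. 4.]
print("max at e%d, min at e%d" % (values.argmax() + 1, values.argmin() + 1))
# Output: max at e2, min at e3, matching the text.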

18.6.4 Linear programming

Corollary 836 and its variations play a key role in linear programming, which studies optimization problems with linear objective functions and affine constraints. To study these problems we need to introduce an important class of convex sets. Specifically, given an m × n matrix A = (a_ij) and a vector b ∈ ℝᵐ, the convex set

P = {x ∈ ℝⁿ : Ax ≤ b} = {x ∈ ℝⁿ : Σ_{j=1}^n a_ij x_j ≤ b_i ∀i = 1, ..., m}

of ℝⁿ is called a polyhedron. Let us write explicitly the row vectors of the matrix A as:

a¹ = (a11, a12, ..., a1n)
⋮
aᵐ = (am1, am2, ..., amn)

Each row vector aⁱ thus identifies an inequality constraint aⁱ · x ≤ b_i that a vector x ∈ ℝⁿ has to satisfy in order to belong to the polyhedron. We can indeed write P as the intersection

P = ∩_{i=1}^m H_i

of the half-spaces H_i = {x ∈ ℝⁿ : aⁱ · x ≤ b_i} seen in Section 18.4.

Example 838 (i) Affine sets are the polyhedra featuring equality constraints (Proposition 666). (ii) Simplices are polyhedra: for instance, Δ² in ℝ³ can be written as {x ∈ ℝ³ : Ax ≤ b} with b = (0, 0, 0, 1) ∈ ℝ⁴ and

A =
⎡ −1  0  0 ⎤
⎢  0 −1  0 ⎥
⎢  0  0 −1 ⎥
⎣  1  1  1 ⎦

Clearly, simplices are examples of compact polyhedra. N

Example 839 Given b = (1, 1, 2) and

A =
⎡ 1 −2  2 ⎤
⎢ 0  2 −1 ⎥
⎣ 0  1 −1 ⎦

we have the polyhedron

P = {x ∈ ℝ³ : Ax ≤ b} = {x = (x1, x2, x3) ∈ ℝ³ : x1 − 2x2 + 2x3 ≤ 1, 2x2 − x3 ≤ 1, x2 − x3 ≤ 2}

This polyhedron is not bounded: for instance, the vectors x_n = (−n, 1/2, 0) belong to P for all n ≥ 1. N

Example 840 The elements of a polyhedron are often required to be positive, so let P = {x ∈ ℝⁿ₊ : Ax ≤ b}. This polyhedron can be written, however, in the standard form P′ = {x ∈ ℝⁿ : A′x ≤ b′} via suitable A′ and b′. For instance, if we require the elements of the polyhedron of the previous example to be positive, we have b′ = (1, 1, 2, 0, 0, 0) and

A′ =
⎡  1 −2  2 ⎤
⎢  0  2 −1 ⎥
⎢  0  1 −1 ⎥
⎢  0  0 −1 ⎥
⎢  0 −1  0 ⎥
⎣ −1  0  0 ⎦

in which we added (negative) versors to the matrix A. In sum, the standard formulation of polyhedra easily includes positivity constraints. N

Polyhedra are easily seen to be closed. So, they are compact if and only if they are bounded. Bounded polyhedra are actually old friends.

Proposition 841 A convex set in $\mathbb{R}^n$ is a bounded polyhedron if and only if it is a polytope.

In other words, this result (we omit the non-trivial proof) shows that a bounded polyhedron $P$ can be written as the convex envelope of a collection of vectors $x^i \in \mathbb{R}^n$, i.e., $P = \operatorname{co}(x^1, \dots, x^m)$ (cf. Example 689). This means, inter alia, that bounded polyhedra have a finite number of extreme points (cf. Example 694).
We can actually characterize the extreme points of polyhedra. To this end, denote by $A_x$ the submatrix of $A$ that consists of the rows $a^i$ of $A$ featuring constraints that are binding at $x$, i.e., such that $a^i \cdot x = b_i$. Clearly, $\rho(A_x) \le \rho(A) \le \min\{m, n\}$.

Proposition 842 Let $P = \{x \in \mathbb{R}^n : Ax \le b\}$ be a polyhedron. A vector $x \in P$ is an extreme point of $P$ if and only if $\rho(A_x) = n$.

In other words, a vector is an extreme point of a polyhedron of $\mathbb{R}^n$ if and only if there exist $n$ linearly independent binding constraints at that vector. Besides its theoretical interest, this characterization operationalizes the search for extreme points by reducing it to checking a matrix property.

Proof We prove the "if", leaving the converse to the reader. Suppose that $\rho(A_x) = n$. We want to show that $x$ is an extreme point. Suppose, by contradiction, that there exist $\lambda \in (0, 1)$ and two distinct vectors $x', x'' \in P$ such that $x = \lambda x' + (1 - \lambda) x''$. Denote by $I(x) = \{i \in \{1, \dots, m\} : a^i \cdot x = b_i\}$ the set of binding constraints. Then,
$$b_i = a^i \cdot x = a^i \cdot (\lambda x' + (1 - \lambda) x'') = \lambda \, a^i \cdot x' + (1 - \lambda) \, a^i \cdot x'' \le b_i \qquad \forall i \in I(x)$$
so, since $a^i \cdot x' \le b_i$ and $a^i \cdot x'' \le b_i$, neither inequality can be strict:
$$a^i \cdot x' = a^i \cdot x'' = b_i \qquad \forall i \in I(x)$$
This implies that $x'$ and $x''$ are solutions of the linear system
$$a^i \cdot x = b_i \qquad \forall i \in I(x)$$
In view of Theorem 630, this contradicts the hypothesis $\rho(A_x) = n$. We conclude that $x$ is an extreme point of $P$.

Example 843 Let us check that the versors e1 , e2 and e3 are the extreme points of the
simplex 2 . For each x 2 R3 we have
8
>
> x1 = 0
<
x2 = 0
Ax = b ()
> x3 = 0
>
:
x1 + x2 + x3 = 1

So,
2 3
0 1 0
Ae1 =4 0 0 1 5
1 1 1

By Proposition 842, versor e1 is an extreme point of 2 because (Ae1 ) = 3. A similar


argument shows that also e2 and e3 are the extreme points of 2 . Moreover, it is easy to see
that no other points x of 2 are such that (Ax ) = 3 (indeed, to have (Ax ) > 2 at least
two coordinates of x have to be 0). N
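As a quick illustration of how Proposition 842 operationalizes the search for extreme points, here is a minimal numerical sketch (our own construction, assuming only NumPy; the function name and tolerance are ours) that applies the rank test to the simplex of this example:

```python
import numpy as np

def is_extreme(A, b, x, tol=1e-9):
    # Rank test of Proposition 842: x in P = {x : Ax <= b} is an extreme
    # point iff the submatrix A_x of binding rows (a_i . x = b_i) has rank n.
    assert np.all(A @ x <= b + tol), "x must belong to the polyhedron"
    A_x = A[np.isclose(A @ x, b, atol=tol)]   # binding rows at x
    return np.linalg.matrix_rank(A_x) == A.shape[1]

# Constraints -x_i <= 0 (i = 1, 2, 3) and x1 + x2 + x3 <= 1.
A = np.array([[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0], [1.0, 1.0, 1.0]])
b = np.array([0.0, 0.0, 0.0, 1.0])
print(is_extreme(A, b, np.array([1.0, 0.0, 0.0])))    # True:  the versor e1
print(is_extreme(A, b, np.array([0.5, 0.5, 0.0])))    # False: only 2 binding rows
```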

Given a vector $c \in \mathbb{R}^n$ and a non-empty polyhedron $P$, a linear programming problem has the form
$$\max_x c \cdot x \quad \text{sub} \quad x \in P \qquad (18.34)$$
or, equivalently,
$$\max_{x_1, \dots, x_n} \sum_{j=1}^n c_j x_j \quad \text{sub} \quad \sum_{j=1}^n a_{1j} x_j \le b_1, \ \sum_{j=1}^n a_{2j} x_j \le b_2, \ \dots, \ \sum_{j=1}^n a_{mj} x_j \le b_m$$
In view of Corollary 836, we can solve this optimization problem when $P$ is bounded (so compact).

Theorem 844 (Fundamental Theorem of Linear Programming) For a linear programming problem with $P$ bounded, we have
$$\max_{x \in P} c \cdot x = \max_{x \in \{y \in P \,:\, \rho(A_y) = n\}} c \cdot x \qquad (18.35)$$
and
$$\emptyset \ne \arg\max_{x \in P} c \cdot x = \operatorname{co}\left(\arg\max_{x \in \{y \in P \,:\, \rho(A_y) = n\}} c \cdot x\right) \qquad (18.36)$$

Though an immediate consequence of Corollary 836 and Proposition 842, this is an important result (as its name shows). In words, it says that when $P$ is bounded (so, compact): (i) by (18.36), a solution of the linear programming problem (18.34) exists and is either an extreme point of the polyhedron $P$ or a convex combination of extreme points; (ii) by (18.35), in terms of value attainment we can consider the simpler problem
$$\max_x c \cdot x \quad \text{sub} \quad x \in \{y \in P : \rho(A_y) = n\}$$
that only involves the extreme points.

Example 845 Consider the linear programming problem
$$\max_x c \cdot x \quad \text{sub} \quad x \in \Delta^{n-1}$$
By the Fundamental Theorem of Linear Programming, the solution set is
$$\operatorname{co}\left(\arg\max_{e^i \in \Delta^{n-1}} c \cdot e^i\right) = \operatorname{co}\left\{e^i : i \in \arg\max_{j = 1, \dots, n} c_j\right\}$$
For instance, if $n = 4$ and $c = (1, 3, 3, -4)$, the problem is
$$\max_{x_1, x_2, x_3, x_4} x_1 + 3(x_2 + x_3) - 4x_4 \quad \text{sub} \quad x = (x_1, x_2, x_3, x_4) \in \Delta^3$$
Its solution set is $\{\alpha e^2 + (1 - \alpha) e^3 : \alpha \in [0, 1]\}$. N
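A numerical cross-check of this example (a sketch using SciPy, not part of the book's toolkit) can be obtained by feeding the problem to a linear programming solver; note that `linprog` minimizes, so we pass $-c$:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 3.0, 3.0, -4.0])
res = linprog(-c,                                # linprog minimizes, so use -c
              A_eq=np.ones((1, 4)), b_eq=[1.0],  # x1 + x2 + x3 + x4 = 1
              bounds=[(0, None)] * 4)            # x >= 0: x lies in the simplex
print(-res.fun)   # 3.0, the maximum value of c.x
print(res.x)      # a maximizer, lying on the solution face co{e2, e3}
```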

A general study of optimization problems with equality and inequality constraints will be carried out in Chapter 30. Linear programming is the special case of a concave optimization problem (Section 30.4) where the objective function is linear and the constraints are expressed via affine functions.¹⁷

¹⁷ By Riesz's Theorem and Proposition 656, we can write the objective function and the constraints in the inner product and matrix form that (18.34) features.

18.7 Consumption

18.7.1 Optimal bundles

Let us go back to the consumer problem:
$$\max_x u(x) \quad \text{sub} \quad x \in B(p, w)$$
If $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ is continuous and the consumption set $A$ is closed, Weierstrass' Theorem ensures via Proposition 798 that the consumer problem does have a solution.
If instead the consumption set $A$ is not closed, Weierstrass' Theorem is no longer applicable (the set $B(p, w)$ is not compact) and it is necessary to assume $u$ to be coercive on $B(p, w)$ in order to apply Tonelli's Theorem, which becomes key in this case. Furthermore, if $A$ is convex and $u$ is strictly quasi-concave, by Theorem 831 the solution is unique. To sum up:

Theorem 846 If the utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ is continuous and coercive on $B(p, w)$, the consumer problem has a solution. Such a solution is unique if $A$ is convex and $u$ is strictly quasi-concave.

This powerful theorem generalizes Proposition 798 and covers most cases of interest in consumer theory. For instance, consider the log-linear utility function $u : \mathbb{R}^n_{++} \to \mathbb{R}$ given by $u(x) = \sum_{i=1}^n a_i \log x_i$, with $a_i > 0$ and $\sum_{i=1}^n a_i = 1$. It has an open consumption set $\mathbb{R}^n_{++}$, so Proposition 798 cannot be applied. Fortunately, the following lemma shows that it is coercive on $B(p, w)$. Since it is also continuous and strictly concave, by Theorem 846 the consumer problem with log-linear utility has a unique solution.

Lemma 847 The log-linear utility function $u : \mathbb{R}^n_{++} \to \mathbb{R}$ is coercive on $B(p, w)$, provided $p \gg 0$.

Proof By Proposition 806, it suffices to show that the result holds for the Cobb-Douglas utility function $u(x) = \prod_{i=1}^n x_i^{a_i}$ defined over $\mathbb{R}^n_{++}$. We begin by showing that the upper contour sets $(u \ge t)$ are closed for every $t > 0$ (for $t \le 0$ they coincide with the whole domain $\mathbb{R}^n_{++}$ and play no role in what follows). Let $t > 0$, so that $(u \ge t) \ne \emptyset$. Consider a sequence $\{x_n\} \subseteq (u \ge t)$ that converges to a bundle $\tilde{x} \in \mathbb{R}^n$. To prove that $(u \ge t)$ is closed, it is necessary to show that $\tilde{x} \in (u \ge t)$. Since $\{x_n\} \subseteq \mathbb{R}^n_{++}$, we have $\tilde{x} \ge 0$. Let us show that $\tilde{x} \gg 0$. Suppose, by contradiction, that $\tilde{x}$ has at least one null coordinate. This implies that $u(x_n) \to \prod_{i=1}^n \tilde{x}_i^{a_i} = 0$, thus contradicting
$$u(x_n) \ge t > 0 \qquad \forall n \ge 1$$
In conclusion, $\tilde{x} \gg 0$. Hence, $\tilde{x}$ belongs to the domain of $u$, so by continuity we have $u(x_n) \to u(\tilde{x})$. As $u(x_n) \ge t$ for every $n$, we conclude that $u(\tilde{x}) \ge t$, that is, $\tilde{x} \in (u \ge t)$, as desired.

It is easily seen that, for $t > 0$ small enough, the intersection $(u \ge t) \cap B(p, w)$ is non-empty. We have
$$(u \ge t) \cap B(p, w) = \{x \in \mathbb{R}^n_{++} : u(x) \ge t\} \cap \{x \in \mathbb{R}^n_{++} : p \cdot x \le w\} = \{x \in \mathbb{R}^n_{++} : u(x) \ge t\} \cap \{x \in \mathbb{R}^n_+ : p \cdot x \le w\}$$
As $(u \ge t)$ is closed and $\{x \in \mathbb{R}^n_+ : p \cdot x \le w\}$ is compact since $p \gg 0$, it follows that the intersection $(u \ge t) \cap B(p, w)$ is a compact set. The function $u$ is thus coercive on $B(p, w)$.

18.7.2 Demand function

The solution set of the consumer problem, i.e., the set of optimal bundles, is $\arg\max_{x \in B(p,w)} u(x)$. If the utility function is strictly quasi-concave, such a set is at most a singleton. Let us denote the unique optimal bundle by $\hat{x}(p, w)$, so as to highlight its dependence on the income $w$ and on the price vector $p$. In particular, such a dependence can be formalized by means of a function $D : \mathbb{R}^n_{++} \times \mathbb{R}_{++} \to \mathbb{R}^n$ defined by
$$D(p, w) = \hat{x}(p, w) \qquad \forall (p, w) \in \mathbb{R}^n_{++} \times \mathbb{R}_{++}$$
The function $D$ is referred to as the consumer's demand function: it associates to each vector $(p, w)$ the corresponding unique optimal bundle. Of central importance in economics, the demand function thus describes how the solution of the consumer problem varies as prices and income change.¹⁸
The study of the demand function is usually based on methods of constrained optimization that rely on differential calculus, as we will see in Section 29.5. However, in the important case of log-linear utility functions the demand for good $i$ is, in view of Example 788,
$$D_i(p, w) = a_i \frac{w}{p_i} \qquad (18.37)$$
The demanded quantity of good $i$ depends on the income $w$, on its own price $p_i$, and on the relative importance $a_i$ that the log-linear utility function gives it with respect to the other goods. Specifically, the larger $a_i$ is, the higher is good $i$'s relative importance and, ceteris paribus (i.e., keeping prices and income constant), the higher is its demand.

¹⁸ Demand functions are a first, important illustration of the importance of the uniqueness of the solution of an optimization problem.
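Formula (18.37) is easy to verify numerically. The following sketch (our own toy numbers; the helper names are not the book's) compares the closed form with a direct numerical maximization of the log-linear utility on the budget set:

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([0.5, 0.3, 0.2])                 # weights a_i > 0, summing to 1
p, w = np.array([2.0, 1.0, 4.0]), 10.0        # prices and income

closed_form = a * w / p                       # formula (18.37)

res = minimize(lambda x: -a @ np.log(x),      # maximize sum a_i log x_i
               x0=np.full(3, w / (3 * p.mean())),   # a feasible starting bundle
               constraints={"type": "eq", "fun": lambda x: p @ x - w},
               bounds=[(1e-9, None)] * 3)
print(closed_form)                            # [2.5 3.  0.5]
print(res.x)                                  # numerically the same bundle
```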

18.7.3 Nominal changes

Demand functions have an important property of invariance.

Proposition 848 Given a demand function $D : \mathbb{R}^n_{++} \times \mathbb{R}_{++} \to \mathbb{R}^n$, we have
$$D(\alpha p, \alpha w) = D(p, w) \qquad \forall \alpha > 0 \qquad (18.38)$$

The proof is straightforward: it is enough to note that the budget set does not change if one multiplies prices and income by the same scalar $\alpha > 0$, that is,
$$B(\alpha p, \alpha w) = \{x \in A : (\alpha p) \cdot x \le \alpha w\} = \{x \in A : p \cdot x \le w\} = B(p, w)$$
As simple as it may seem, this proposition has an important economic meaning. Indeed, it shows that only relative prices matter. To see why, choose any good among those in bundle $x$, for example the first good $x_1$, and call it the numeraire, that is, the unit of account. By setting its price to $1$, we can express income and the other goods' prices in terms of the numeraire:
$$\left(1, \frac{p_2}{p_1}, \dots, \frac{p_n}{p_1}, \frac{w}{p_1}\right)$$
By Proposition 848, the demand remains the same:
$$\hat{x}(p_1, \dots, p_n, w) = \hat{x}\left(1, \frac{p_2}{p_1}, \dots, \frac{p_n}{p_1}, \frac{w}{p_1}\right) \qquad \forall p \gg 0$$
As an example, suppose that bundle $x$ is made up of different kinds of fruit (apples, bananas, oranges, and so on). In particular, assume that good 1, the numeraire, is apples. Set $\tilde{w} = w/p_1$ and $q_i = p_i/p_1$ for every $i = 2, \dots, n$, so that
$$\left(1, \frac{p_2}{p_1}, \frac{p_3}{p_1}, \dots, \frac{p_n}{p_1}, \frac{w}{p_1}\right) = (1, q_2, q_3, \dots, q_n, \tilde{w})$$
In terms of the "apple" numeraire, the price of one unit of fruit 2 is $q_2$ apples, the price of one unit of fruit 3 is $q_3$ apples, ..., the price of one unit of fruit $n$ is $q_n$ apples, while the value of income is $\tilde{w}$ apples. To give a concrete example, if
$$\left(1, \frac{p_2}{p_1}, \frac{p_3}{p_1}, \dots, \frac{p_n}{p_1}, \frac{w}{p_1}\right) = (1, 3, 7, \dots, 5, 12)$$
the price of one unit of fruit 2 is 3 apples, the price of one unit of fruit 3 is 7 apples, ..., the price of one unit of good $n$ is 5 apples, while the value of income is 12 apples.
Any good in bundle $x$ can be chosen as numeraire: it is merely a conventional choice within an economy (justified by political reasons, availability of the good itself, etc.); consumers can solve their optimization problems using any numeraire whatsoever. Such a role, however, can also be taken by an artificial object, such as money, say euros. In this case, we say that the price of a unit of apples is $p_1$ euros, the price of a unit of fruit 2 is $p_2$ euros, the price of a unit of fruit 3 is $p_3$ euros, ..., the price of a unit of fruit $n$ is $p_n$ euros, while the value of income is $w$ euros. It is a mere change of scale, akin to that of measuring quantities of fruit in kilograms rather than in pounds. In conclusion, Proposition 848 shows that in consumer theory money is a mere unit of account, nothing but a "veil". The choice of optimal bundles does not vary if relative prices $p_2/p_1, \dots, p_n/p_1$ and relative income $w/p_1$ remain unchanged. "Nominal" price and income variations do not matter for consumers' behavior.
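A one-line numerical check of the invariance property (18.38) for the log-linear demand (again a sketch, with our own toy numbers):

```python
import numpy as np

def demand(a, p, w):
    return a * w / p          # the log-linear demand (18.37)

a, p, w = np.array([0.5, 0.3, 0.2]), np.array([2.0, 1.0, 4.0]), 10.0
alpha = 7.3                   # any scalar alpha > 0
print(np.allclose(demand(a, alpha * p, alpha * w), demand(a, p, w)))  # True
```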

18.8 Equilibrium analysis

18.8.1 Exchange economies

In the previous section we studied the behavior of individual consumers. But how do these individual behaviors interact in a market? In particular, how is the individual analysis of this section connected with the aggregate market analysis of Section 12.8?
The simplest way to answer these important questions is through an exchange economy, a simple yet coherent general equilibrium model. Suppose there is a finite collection $I$ of agents, each with a utility function $u_i : A_i \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ and with an initial endowment $\omega_i \in \mathbb{R}^n_+$ of the $n$ goods (potatoes, apples, and so on). The exchange economy is thus represented by a collection $E = \{(u_i, \omega_i)\}_{i \in I}$, where each pair $(u_i, \omega_i)$ summarizes all economically relevant characteristics of agent $i$, his "economic persona".
Assume that agents can trade (buy or sell) among themselves any quantity of the $n$ goods at a price vector $p \in \mathbb{R}^n_+$ (say, in euros). There are no impediments to trade. Agent $i$ has a budget set
$$B_i(p, p \cdot \omega_i) = \{x \in A_i : p \cdot x \le p \cdot \omega_i\}$$
where the income $w = p \cdot \omega_i$ now depends on prices because agent $i$ can fund his consumption by trading his endowment at the market price $p$, thus earning up to $p \cdot \omega_i$ euros. The vector $z = x - \omega_i$ is the vector of net trades, per each good, of agent $i$ if he selects bundle $x$.¹⁹
As a trader, agent $i$ exchanges goods at the market price. As a consumer, agent $i$ solves the optimization problem
$$\max_x u_i(x) \quad \text{sub} \quad x \in B_i(p, p \cdot \omega_i)$$
Agents thus play two roles in this economy. Their trader role is, however, ancillary to their consumer role: what agent $i$ cares about is consumption, trading being only instrumental to that.
Assume that there is a unique optimal bundle $\hat{x}_i(p, p \cdot \omega_i)$. Since it only depends on the price vector $p$, the demand function $D_i : \mathbb{R}^n_+ \to \mathbb{R}^n_+$ of agent $i$ can be defined by
$$D_i(p) = \hat{x}_i(p, p \cdot \omega_i) \qquad \forall p \in \mathbb{R}^n_+$$
The individual demand $D_i$ still has the remarkable invariance property $D_i(\alpha p) = D_i(p)$ for every $\alpha > 0$. So, nominal changes in prices do not affect agents' consumption behavior. Moreover, if $u_i : \mathbb{R}^n_+ \to \mathbb{R}$ is strongly increasing, then Walras' law is easily seen to hold for agent $i$, i.e.,
$$p \cdot D_i(p) = p \cdot \omega_i \qquad (18.39)$$
We can now aggregate individual behavior. The aggregate demand function $D : \mathbb{R}^n_+ \to \mathbb{R}^n$ is defined by
$$D(p) = \sum_{i \in I} D_i(p)$$

¹⁹ We say "net trade" because $z$ may be the outcome of several market operations, not modelled here, in which agents may have been on both sides of the market (i.e., as buyers and as sellers).

Note that the aggregate demand function inherits the invariance property of the individual demand functions, that is,
$$D(\alpha p) = D(p) \qquad \forall \alpha > 0 \qquad (18.40)$$
So, nominal changes do not affect the aggregate demand of goods. Condition A.2 of the Arrow-Debreu Theorem (Section 12.8) is thus satisfied.
Let $\omega = \sum_{i \in I} \omega_i$ be the sum of the individual endowments, that is, the total resources in the economy. The aggregate supply function $S : \mathbb{R}^n_+ \to \mathbb{R}^n$ is given by such a sum, i.e.,
$$S(p) = \omega$$
So, in this simplified exchange economy the aggregate supply function does not depend on prices. It is a "flat" supply.
In this economy we have the weak Walras law
$$p \cdot E(p) \le 0$$
where $E : \mathbb{R}^n_+ \to \mathbb{R}^n$ is the excess demand function defined by $E(p) = D(p) - \omega$. Indeed,
$$p \cdot D(p) = p \cdot \sum_{i \in I} D_i(p) = \sum_{i \in I} p \cdot D_i(p) \le \sum_{i \in I} p \cdot \omega_i = p \cdot \omega$$
If Walras' law (18.39) holds for each agent $i \in I$, then its aggregate version
$$p \cdot E(p) = 0$$
holds as well. So, besides condition A.2, also conditions W.1 and W.2 used in the Arrow-Debreu Theorem naturally arise in this simple exchange economy.
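The aggregate Walras law can be observed numerically in a small exchange economy with log-linear utilities, whose demands have the closed form (18.37) with income $p \cdot \omega_i$. The sketch below (a toy two-agent, two-good economy of our own) confirms that $p \cdot E(p) = 0$:

```python
import numpy as np

def excess_demand(p, endowments, weights):
    # Each agent demands D_i(p) = a_i * (p . w_i) / p, as in (18.37).
    D = sum(a * (p @ e) / p for a, e in zip(weights, endowments))
    return D - sum(endowments)

p = np.array([1.0, 3.0])
endowments = [np.array([2.0, 1.0]), np.array([1.0, 4.0])]
weights = [np.array([0.6, 0.4]), np.array([0.3, 0.7])]
E = excess_demand(p, endowments, weights)
print(p @ E)   # 0.0 (up to rounding): the aggregate Walras law
```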

The wellbeing of each agent $i$ in the economy $E$ depends on the bundle of goods $x_i = (x_{i1}, \dots, x_{in}) \in \mathbb{R}^n$ that he receives, as ranked via a utility function $u_i : \mathbb{R}^n_+ \to \mathbb{R}$. A consumption allocation of such bundles is a vector
$$x = (x_1, \dots, x_{|I|}) \in (\mathbb{R}^n_+)^{|I|}$$
Next we define allocations that may arise via market exchanges which are, at the same time, voluntary and feasible.

Definition 849 A pair $(p, x) \in \mathbb{R}^n_+ \times (\mathbb{R}^n_+)^{|I|}$ of prices and consumption allocations is a weak Arrow-Debreu (market) equilibrium of the exchange economy $E$ if
(i) $x_i = D_i(p)$ for each $i \in I$;
(ii) $\sum_{i \in I} x_i \le \omega$.
If equality holds in (ii), we say that $(p, x)$ is an Arrow-Debreu (market) equilibrium.

The optimality condition (i) requires that allocation $x$ consist of bundles that, at the price level $p$, are optimal for each agent $i$; so, as a trader, agent $i$ is trading freely. The market clearing condition (ii) requires that such an allocation $x$ rely on trades that are feasible in the market. Jointly, conditions (i) and (ii) ensure that allocation $x$ is attained via market exchanges that are both voluntary and feasible.
The Arrow-Debreu equilibrium notion thus aggregates individual behavior. What distinguishes a weak equilibrium from an equilibrium is that in the latter optimal bundles exhaust endowments, so no resources are left unused. The next result is mathematically trivial yet of great economic importance, in that it shows that the aggregate equilibrium notions of Section 12.8 can be interpreted in terms of a simple exchange economy.

Lemma 850 Given a pair $(p, x) \in \mathbb{R}^n_+ \times (\mathbb{R}^n_+)^{|I|}$ of prices and consumption allocations, set $q = \sum_{i \in I} x_i$. The pair $(p, x)$ is:
(i) an Arrow-Debreu equilibrium if and only if (12.16) holds, i.e., $q = D(p) = S(p)$;
(ii) a weak Arrow-Debreu equilibrium if and only if (12.18) holds, i.e., $q = D(p) \le S(p)$.

In view of this result, we can then establish the existence of a weak market equilibrium of the exchange economy $E$ using the existence results of Section 12.8, in particular the Arrow-Debreu Theorem. For simplicity, next we consider the existence of a weak market price equilibrium, i.e., a price $p$ such that $E(p) \le 0$ (so, at $p$ there is no excess demand).

Proposition 851 Let $E = \{(u_i, \omega_i)\}_{i \in I}$ be an economy in which, for each agent $i \in I$, the endowment $\omega_i$ is strictly positive and the utility function $u_i$ is continuous and strictly quasi-concave on a convex and compact consumption set $A_i$. Then, a weak Arrow-Debreu equilibrium of the exchange economy $E$ exists.

Proof Let $i \in I$. If $u_i$ is continuous and strictly quasi-concave on the compact set $A_i$, by the Maximum Theorem (to be presented in Chapter 33) the individual demand function $D_i$ is continuous on $\mathbb{R}^n_{++}$. The aggregate demand $D$ is then also continuous on $\mathbb{R}^n_{++}$, so condition A.1 is satisfied. Since we already noted that conditions A.2 and W.1 hold, we conclude that a weak market price equilibrium exists by the Arrow-Debreu Theorem.

In sum, in this simple exchange economy we have connected individual and aggregate behavior via an equilibrium notion. In particular, the existence of a (weak) market equilibrium is established only via conditions on agents' individual characteristics (i.e., utility functions and endowments), as methodological individualism prescribes. Indeed, to aggregate individual behavior via an equilibrium notion is a common mode of analysis in economics.
A caveat, however, is in order: how does a market price equilibrium come about? The previous analysis provides conditions under which it exists, but says nothing about what kind of individual choices may actually implement it. A deus ex machina, the "market", sets equilibrium prices: a significant limitation of the analysis from a methodological individualism viewpoint.

18.8.2 Invisible hand

The set of all consumption allocations in the economy $E = \{(u_i, \omega_i)\}_{i \in I}$ is
$$C(\omega) = \left\{x \in (\mathbb{R}^n_+)^{|I|} : \sum_{i \in I} x_i \le \omega\right\}$$
All allocations in $C(\omega)$ can, in principle, be attained via trading; for this reason, we call them attainable allocations. Yet, if there exists a mighty planner (say, a pharaoh) endowed with a vector $\omega$ of goods, the attainable allocations may result not from trading but from an arbitrary consumption allocation selected by the pharaoh, who decides which bundle each agent can consume.
The operator $f : (\mathbb{R}^n_+)^{|I|} \to \mathbb{R}^{|I|}$ given by
$$f(x) = (u_1(x_1), \dots, u_{|I|}(x_{|I|})) \qquad (18.41)$$
represents the utility profile across agents of each allocation. So, the image
$$f(C(\omega)) = \{f(x) : x \in C(\omega)\}$$
consists of all utility profiles $(u_1(x_1), \dots, u_{|I|}(x_{|I|}))$ that agents can achieve at attainable allocations. Because of its importance, we denote such an image by the more evocative symbol $U_E$, i.e., we set $U_E = f(C(\omega))$. The subscript reminds us that this set depends on the individual characteristics (utility functions and endowments) of the agents in the economy.
A vector $x \in (\mathbb{R}^n_+)^{|I|}$ is said to be a (weak, resp.) equilibrium market allocation of economy $E$ if there is a non-zero price vector $p$ such that the pair $(p, x)$ is a (weak, resp.) Arrow-Debreu equilibrium of the exchange economy $E$. Clearly, equilibrium allocations are attainable.
Can a benevolent pharaoh improve upon an equilibrium market allocation? Specifically, given an equilibrium market allocation $x$, is there an alternative attainable allocation $x'$ such that $f(x') > f(x)$, i.e., such that under $x'$ at least one agent is strictly better off than under allocation $x$ and none is worse off?
Formally, a negative answer to this question amounts to saying that equilibrium market allocations are Pareto optimal, that is, they result in utility profiles that are maximal in the set $U_E$, i.e., that are Pareto optima in such a set (Section 2.5). Remarkably, this is indeed the case, as the next fundamental result shows.

Theorem 852 (First Welfare Theorem) Let $E = \{(u_i, \omega_i)\}_{i \in I}$ be an economy in which $\omega \gg 0$ and, for each agent $i \in I$, the utility function $u_i$ is concave and strongly increasing on a convex and closed under majorization consumption set $A_i$. An equilibrium allocation of economy $E$ is (if it exists) Pareto optimal.

Thus, it is not possible to Pareto improve upon an equilibrium allocation. The First Welfare Theorem can be viewed as a possible formalization of the famous invisible hand of Adam Smith. Indeed, an exchange economy reaches, via feasible and voluntary exchanges, an equilibrium allocation that even a benevolent pharaoh would not be able to Pareto improve upon, i.e., he would not be able to select a different attainable allocation that makes at least one agent strictly better off, yet none worse off.

Proof Suppose there exists an equilibrium allocation $x \in C(\omega)$ under a non-zero price vector $p$. Suppose, by contradiction, that there exists a different $x' \in C(\omega)$ such that $f(x') > f(x)$. Let $i \in I$. If $u_i(x'_i) > u_i(x_i)$, then $p \cdot x'_i > p \cdot \omega_i$ because $x_i$ is an optimal bundle. If $u_i(x'_i) = u_i(x_i)$, then $p \cdot x'_i \ge p \cdot \omega_i$; indeed, if $p \cdot x'_i < p \cdot \omega_i$ then $x'_i$ would be an optimal bundle that violates the individual Walras law, a contradiction because $u_i$ is strongly increasing and $A_i$ is closed under majorization (Proposition 796). Being $f(x') > f(x)$, we conclude that $p \cdot \sum_{i \in I} x'_i > p \cdot \omega$. On the other hand, from $x' \in C(\omega)$ it follows that $p \cdot \omega \ge p \cdot \sum_{i \in I} x'_i$ because $p > 0$. We thus reach the contradiction $p \cdot \sum_{i \in I} x'_i > p \cdot \omega \ge p \cdot \sum_{i \in I} x'_i$. This proves that $x$ is a Pareto optimum.

The First Welfare Theorem establishes a property of equilibrium allocations without worrying about their existence. To address this further issue, it is enough to combine this theorem with Proposition 851.

18.9 Least squares

The method of least squares is of central importance in applied mathematics. Like all great ideas, it can be analyzed from multiple perspectives, as we will see in this section.

18.9.1 Linear systems

Let us start with a linear algebra approach. A linear system of equations
$$A_{(m \times n)} \, x_{(n \times 1)} = b_{(m \times 1)} \qquad (18.42)$$
may not have a solution. This is often the case when a system has more equations than unknowns, i.e., $m > n$.
When a system has no solution, there is no vector $\hat{x} \in \mathbb{R}^n$ such that $A\hat{x} = b$. That said, one may wonder whether there is a surrogate for a solution, a vector $x \in \mathbb{R}^n$ that minimizes the approximation error
$$\|Ax - b\| \qquad (18.43)$$
that is, the distance between the vector of constants $b$ and the image $Ax$ of the linear operator $F(x) = Ax$. The error is null in the fortunate case where $x$ solves the system: $Ax - b = 0$. In general, the error (18.43) is positive, as the norm always is.
By Proposition 782, minimizing the approximation error is equivalent to minimizing the quadratic transformation $\|Ax - b\|^2$ of the norm. This justifies the following definition.

Definition 853 A vector $x \in \mathbb{R}^n$ is said to be a least squares solution of system (18.42) if it solves the optimization problem
$$\min_x \|Ax - b\|^2 \quad \text{sub} \quad x \in \mathbb{R}^n \qquad (18.44)$$

The least squares solution is an approximate solution of the linear system: it is the best we can do to minimize the distance between the vectors $Ax$ and $b$ in $\mathbb{R}^m$. As $\|\cdot\|^2$ is a sum of squares, finding the least squares solution by solving the optimization problem (18.44) is called the least squares method. The fathers of this method are Gauss and Legendre, who proposed it to analyze astronomical data at the beginning of the nineteenth century.
As we remarked, when it exists, the linear system's solution is also a least squares solution. To be a good surrogate, a least squares solution should exist also when the system has no solution. In other words, the more general the conditions ensuring the existence of solutions of the optimization problem (18.44), the more useful the least squares method.
The following fundamental result shows that such solutions do indeed exist and are unique under the hypothesis that $\rho(A) = n$. In the more relevant case where $m > n$, this amounts to requiring that the matrix $A$ have maximum rank. The result relies on Tonelli's Theorem for existence and on Theorem 831 for uniqueness.

Theorem 854 Let $m \ge n$. The optimization problem (18.44) has a unique solution if $\rho(A) = n$.

Later in the book we will see the form of this unique solution (Sections 19.4 and 24.5.1). To prove the result, let us consider the function $g : \mathbb{R}^n \to \mathbb{R}$ defined by
$$g(x) = -\|Ax - b\|^2$$
so that problem (18.44) is equivalent to the optimization problem
$$\max_x g(x) \quad \text{sub} \quad x \in \mathbb{R}^n \qquad (18.45)$$
The following lemma establishes the remarkable properties of the objective function $g$ that allow us to use Tonelli's Theorem and Theorem 831. Note that the condition $\rho(A) = n$ is equivalent to requiring the injectivity of the linear operator $F(x) = Ax$ (Corollary 579).

Lemma 855 If $\rho(A) = n$, then $g$ is supercoercive and strictly concave.

Proof Let us start by showing that $g$ is strictly concave. Take two distinct points $x_1, x_2 \in \mathbb{R}^n$ and $\lambda \in (0, 1)$. The condition $\rho(A) = n$ implies that $F$ is injective, hence $F(x_1) \ne F(x_2)$. Therefore,
$$\|F(\lambda x_1 + (1 - \lambda) x_2) - b\|^2 = \|\lambda F(x_1) + (1 - \lambda) F(x_2) - (\lambda b + (1 - \lambda) b)\|^2 = \|\lambda (F(x_1) - b) + (1 - \lambda)(F(x_2) - b)\|^2 < \lambda \|F(x_1) - b\|^2 + (1 - \lambda) \|F(x_2) - b\|^2 \qquad (18.46)$$
where the strict inequality follows from the strict convexity of $\|\cdot\|^2$.²⁰ So,
$$g(\lambda x_1 + (1 - \lambda) x_2) = -\|F(\lambda x_1 + (1 - \lambda) x_2) - b\|^2 > -\lambda \|F(x_1) - b\|^2 - (1 - \lambda) \|F(x_2) - b\|^2 = \lambda g(x_1) + (1 - \lambda) g(x_2)$$

²⁰ Indeed, the function $\|x\|^2 = \sum_{i=1}^n x_i^2$ is strictly convex, as we already noted for $n = 2$ in Example 654.

which implies the strict concavity of $g$.
Let us now show that $g$ is supercoercive. As $F$ is injective, its inverse $F^{-1} : \operatorname{Im} F \to \mathbb{R}^n$ exists and is continuous (Proposition 563). Furthermore, the function $f : \mathbb{R}^m \to \mathbb{R}$ defined by $f(y) = -\|y - b\|^2$ is supercoercive. Indeed,
$$\|y\| = \|y - b + b\| \le \|y - b\| + \|b\|$$
hence
$$\|y\| \to +\infty \implies \|y - b\| \to +\infty \implies f(y) = -\|y - b\|^2 \to -\infty$$
Set $B_t = \{y \in \operatorname{Im} F : f(y) \ge t\} = (f \ge t) \cap \operatorname{Im} F$ for $t \in \mathbb{R}$. As $f$ is supercoercive and continuous, by Proposition 820 $f$ is coercive on the closed set $\operatorname{Im} F$ and the sets $B_t = (f \ge t) \cap \operatorname{Im} F$ are compact for every $t$. Furthermore,
$$(g \ge t) = \{x \in \mathbb{R}^n : f(F(x)) \ge t\} = \{x \in \mathbb{R}^n : F(x) \in B_t\} = F^{-1}(B_t)$$
Since $F^{-1}$ is continuous and $B_t$ is compact, by Lemma 801 $F^{-1}(B_t)$ is compact. It follows that $(g \ge t)$ is compact for every $t$, which implies that $g$ is supercoercive (Proposition 817).

Proof of Theorem 854 In light of the previous lemma, problem (18.45), and so problem (18.44), has a solution thanks to Tonelli's Theorem because $g$ is coercive. Such a solution is unique thanks to Theorem 831 because $g$ is strictly concave.
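In practice, least squares solutions are computed routinely. The following sketch (our own toy system, assuming only NumPy) finds the unique least squares solution of an inconsistent system with $m > n$ and $\rho(A) = n$:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])            # m = 4 > n = 2, rank 2
b = np.array([0.1, 0.9, 2.1, 2.9])    # no exact solution exists
x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_hat)                              # the unique least squares solution
print(np.linalg.norm(A @ x_hat - b))      # the minimal approximation error
```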

18.9.2 Descriptive statistics

Let us now consider the least squares method from a more statistical perspective. Suppose a farmer must choose how much fertilizer $x$ (input) to use for the next crop of potatoes $y$ (output). He does not know the production function $f : \mathbb{R}_+ \to \mathbb{R}$ associating to each level of input $x$ the corresponding level of output $y$, so that, given an output objective $y$, he cannot simply compute the inverse $f^{-1}(y)$.
However, the farmer does have data on the pairs $(x_i, y_i)$ of input and output over the previous $m$ years, that is, for $i = 1, \dots, m$. The farmer wishes to find the linear production function $f(x) = \beta x$, with $\beta \in \mathbb{R}$, that best fits his data. Linearity is assumed for the sake of simplicity: once one becomes familiar with the method, more complex specifications of $f$ can be considered.
It is still unclear what "best fits his data" means precisely. This is, indeed, the crux of the matter. According to the least squares method, it consists in requiring the function to be $f(x) = \hat{\beta} x$, where the coefficient $\hat{\beta}$ minimizes
$$\sum_{i=1}^m (y_i - \beta x_i)^2$$
that is, the sum of the squares of the errors $y_i - \beta x_i$ made by using the production function $f(x) = \beta x$ to evaluate output. Therefore, one is faced with the following optimization problem:
$$\min_\beta \sum_{i=1}^m (y_i - \beta x_i)^2 \quad \text{sub} \quad \beta \in \mathbb{R}$$

By denoting by $X = (x_1, \dots, x_m)$ and $Y = (y_1, \dots, y_m)$ the data vectors regarding input and output, the problem can be restated as
$$\min_\beta \|\beta X - Y\|^2 \quad \text{sub} \quad \beta \in \mathbb{R} \qquad (18.47)$$
which is the special case $n = 1$ of the optimization problem (18.44), with the notation $A = X$, $x = \beta$ and $b = Y$.²¹
By Theorem 854, problem (18.47) has a unique solution $\hat{\beta} \in \mathbb{R}$ because the rank condition is trivially satisfied when $n = 1$. The farmer can use the production function
$$\hat{f}(x) = \hat{\beta} x$$
to decide how much fertilizer to use for the next crop, for whichever level of output he might choose. Given the data he has at hand and the (possibly simplistic) choice of a linear production function, the least squares method suggests to the farmer that this is the production function that best fits the available data.

²¹ Unfortunately, the notation we have used, which is standard in statistics, is not consistent with that of problem (18.44). In particular, here $\beta$ plays the role of $x$ in (18.44).

[Figure: scatter plot of the input-output data $(x_i, y_i)$ with the fitted least squares line.]

Such a procedure can be used in the analysis of data regarding any pair of variables. The independent variable $x$, referred to as the regressor, is generally not unique. For example, suppose the same farmer needs $n$ kinds of input $x_1, x_2, \dots, x_n$ (that is, $n$ regressors) to produce a quantity $y$ of output. The data collected by the farmer are thus
$$X_1 = (x_{11}, x_{12}, \dots, x_{1m})$$
$$X_2 = (x_{21}, x_{22}, \dots, x_{2m})$$
$$\vdots$$
$$X_n = (x_{n1}, x_{n2}, \dots, x_{nm})$$

where $x_{ij}$ is the quantity of input $i$ used in year $j$. The vector $Y = (y_1, \dots, y_m)$ denotes the output, as before. The linear production function is now a function of several variables, that is, $f(x) = \beta \cdot x$ with $x \in \mathbb{R}^n$. The data matrix
$$X = \begin{bmatrix} X_1^T & X_2^T & \cdots & X_n^T \end{bmatrix} = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & & \vdots \\ x_{1m} & x_{2m} & \cdots & x_{nm} \end{bmatrix} \qquad (18.48)$$
of order $m \times n$ has the vectors $X_1, X_2, \dots, X_n$ as columns, so that the latter contain the data on each regressor throughout the years.
The least squares method leads to
$$\min_\beta \|X\beta - Y\|^2 \quad \text{sub} \quad \beta \in \mathbb{R}^n$$
which is the optimization problem (18.44) with the notation $A = X$, $x = \beta$ and $b = Y$. If $\rho(X) = n$, Theorem 854 says that this problem has a unique solution $\hat{\beta} \in \mathbb{R}^n$. The linear production function that the farmer extracts from the available data is $\hat{f}(x) = \hat{\beta} \cdot x$, where the vector of coefficients $\hat{\beta} = (\hat{\beta}_1, \dots, \hat{\beta}_n)$ assigns to each regressor $x_i$ the explanatory power $\hat{\beta}_i$ prescribed by the least squares method.
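A sketch of the one-regressor case with our own toy data: setting the derivative of $\sum_i (y_i - \beta x_i)^2$ with respect to $\beta$ to zero gives the closed form $\hat{\beta} = (X \cdot Y)/(X \cdot X)$, which we check against the general least squares routine:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input (fertilizer) over m = 5 years
Y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])   # output (potatoes)
beta_hat = (X @ Y) / (X @ X)              # closed-form minimizer
print(beta_hat)
print(np.linalg.lstsq(X.reshape(-1, 1), Y, rcond=None)[0][0])  # same value
```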

18.10 Operator optima

18.10.1 Operator optimization problems

So far we have considered objective functions $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ that take on scalar values. In some important applications, however, the objective function is an operator $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ that takes on vectors as values. If we write the operator $f$ as an $m$-tuple $(f_1, \dots, f_m)$ of scalar functions $f_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$, it becomes clear that each alternative $x \in A$ is now evaluated through multiple criteria $(f_1(x), \dots, f_m(x))$. In a consumer problem, consumers may for example evaluate bundles according to $m$ criteria, each represented by a function $f_i$ (for instance, for a car both the color and the speed might matter, taken as indicators of design and performance, respectively). In a planner problem, $x$ can be an allocation of some resources among the $m$ agents of an economy; the planner's objective function $f$ is an operator that assesses an allocation through the utility function $f_i$ of each agent $i$ (cf. Section 18.8).
To address an optimization problem with operators as objective functions, we need the notion of Pareto optimum (Section 2.5).

Definition 856 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ be an operator and $C$ a subset of $A$. An element $\hat{x} \in C$ is called a Pareto optimizer of $f$ on $C$ if there is no $x \in C$ such that
$$f(x) > f(\hat{x})$$
The value $f(\hat{x})$ of the function at $\hat{x}$ is called a Pareto value of $f$ on $C$.

Because of the planner example, $f$ is sometimes called the social objective function and $C$ the social choice set. Note that a Pareto value of the objective function $f$ on the choice set $C$ is a Pareto optimum of the set $f(C) = \{f(x) : x \in C\}$. Unlike the maximum value, which is unique, there are in general multiple Pareto values. The collection of all such values is called the Pareto frontier of $f$ on $C$ (in accordance with the terminology of Section 2.5).
We will write an operator optimization problem as
$$\operatorname{opt}_x f(x) \quad \text{sub} \quad x \in C \qquad (18.49)$$
A vector $\hat{x} \in C$ solves this problem if it is a Pareto optimizer of $f$ on $C$. We denote by $\arg\operatorname{opt}_{x \in C} f(x)$ the set of all solutions. When $m = 1$, we get back to the maximization problem (18.2).²² Problems (18.49) are often called vector maximization problems.
To study operator optimization problems, a scalarization of the objective function is often useful. Specifically, consider the scalar function $W_\lambda : A \subseteq \mathbb{R}^n \to \mathbb{R}$ defined by
$$W_\lambda(x) = \sum_{i=1}^m \lambda_i f_i(x)$$
where $\lambda$ denotes a strictly positive and normalized element of $\mathbb{R}^m$, i.e., $\lambda \gg 0$ and $\sum_{i=1}^m \lambda_i = 1$. The vector $\lambda$ can be interpreted as a vector of weights. Again in view of the planner problem, in which $\lambda_i$ would "weight" agent $i$, the function $W_\lambda$ is sometimes called a (social) welfare function.
The next result is a first illustration of the usefulness of the scalarization provided by welfare functions.

Lemma 857 We have $\arg\max_{x \in C} W_\lambda(x) \subseteq \arg\operatorname{opt}_{x \in C} f(x)$ for every $\lambda$.

Proof Fix $\lambda \gg 0$ with $\sum_{i=1}^m \lambda_i = 1$. Let $\hat{x} \in \arg\max_{x \in C} W_\lambda(x)$. The point $\hat{x}$ is clearly a Pareto optimizer. Otherwise, there would exist $x \in C$ such that $f(x) > f(\hat{x})$. But, being $\lambda \gg 0$, this implies $W_\lambda(x) = \lambda \cdot f(x) > \lambda \cdot f(\hat{x}) = W_\lambda(\hat{x})$, a contradiction.

This lemma implies the next Weierstrass-type result, which ensures the existence of solutions for an operator optimization problem.

Proposition 858 An operator $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ which is continuous on a compact subset $K$ of $A$ admits (at least) a Pareto optimizer in $K$; that is, there exists $\hat{x} \in K$ such that there is no $x \in K$ for which $f(x) > f(\hat{x})$.

Proof The function $W_\lambda$ is continuous if the operator $f$ is continuous. By Weierstrass' Theorem, $\arg\max_{x \in K} W_\lambda(x) \ne \emptyset$. Then, by the previous lemma, $\arg\operatorname{opt}_{x \in K} f(x) \ne \emptyset$.

Scalarization is most effective when
$$\arg\operatorname{opt}_{x \in C} f(x) = \bigcup_\lambda \arg\max_{x \in C} W_\lambda(x) \qquad (18.50)$$
In this case, by suitably choosing the vector of weights $\lambda$ we can retrieve all optimizers. The next examples show that this may, or may not, happen.

²² As the reader can check, a dual notion of Pareto optimality would lead to minimum problems.

Example 859 (i) Consider $f : [0, 1] \to \mathbb{R}^2$ given by $f(x) = (e^x, e^{-x})$. All the points of the unit interval are Pareto optimizers for $f$. The welfare function $W_\lambda : [0, 1] \to \mathbb{R}$ is given by $W_\lambda(x) = \lambda e^x + (1 - \lambda) e^{-x}$, where $\lambda \in (0, 1)$. Its maximizer is $\hat{x} = 0$ if $(1 - \lambda)/\lambda \ge e$ and $\hat{x} = 1$ otherwise. Hence, only the two Pareto optimizers $\{0, 1\}$ can be found through scalarization. (ii) Consider $f : [0, 1] \to \mathbb{R}^2$ given by $f(x) = (x^2, -x^2)$. Again, all the points of the unit interval are Pareto optimizers for $f$. The welfare function $W_\lambda : [0, 1] \to \mathbb{R}$ is given by $W_\lambda(x) = \lambda x^2 - (1 - \lambda) x^2 = (2\lambda - 1) x^2$, where $\lambda \in (0, 1)$. We have
$$\arg\max_{x \in C} W_\lambda(x) = \begin{cases} \{0\} & \text{if } \lambda < \frac{1}{2} \\ [0, 1] & \text{if } \lambda = \frac{1}{2} \\ \{1\} & \text{if } \lambda > \frac{1}{2} \end{cases}$$
and so (18.50) holds. In this case, all Pareto optimizers can be retrieved via scalarization. N
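The failure of (18.50) in case (i) can be seen numerically: scanning a grid of weights recovers only the two endpoints among the Pareto optimizers. A sketch (grid sizes are our own choices):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 101)           # the choice set [0, 1], discretized
found = set()
for lam in np.linspace(0.01, 0.99, 99):   # a grid of weights in (0, 1)
    W = lam * np.exp(xs) + (1 - lam) * np.exp(-xs)
    found.add(float(xs[np.argmax(W)]))
print(sorted(found))                      # [0.0, 1.0]: only the endpoints
```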

18.10.2 Planner's problem

Consider again a planner, the pharaoh, who has to allocate at his discretion an overall endowment $\omega \in \mathbb{R}^n_+$ among a finite set $I$ of agents (Section 18.8). The set of attainable consumption allocations is the set
$$C(\omega) = \left\{x \in (\mathbb{R}^n_+)^{|I|} : \sum_{i \in I} x_i \le \omega\right\} \qquad (18.51)$$
Given $f : (\mathbb{R}^n_+)^{|I|} \to \mathbb{R}^{|I|}$ defined in (18.41), i.e., $f(x) = (u_1(x_1), \dots, u_{|I|}(x_{|I|}))$, the operator optimization problem of the planner is
$$\operatorname{opt}_x f(x) \quad \text{sub} \quad x \in C(\omega) \qquad (18.52)$$
The solutions of this problem, i.e., the Pareto optimizers, are called Pareto optimal allocations (in accordance with the terminology of the First Welfare Theorem).
In view of the previous discussion, the planner can tackle his problem through a welfare function $W_\lambda(x) = \sum_{i \in I} \lambda_i u_i(x_i)$ and the associated optimization problem
$$\max_x W_\lambda(x) \quad \text{sub} \quad x \in C(\omega) \qquad (18.53)$$
Unless (18.50) holds, some Pareto optimizers will be missed by a planner who relies on this scalar optimization problem, whatever $\lambda$ he chooses to scalarize with.
Example 860 Consider an exchange economy with two agents and one good. Assume that the total amount of the good in the economy is $\omega > 0$. For the sake of simplicity, assume that the two agents have the same preferences over this single good. In this way, they share the same utility function, for example the linear $u : \mathbb{R}_+ \to \mathbb{R}$ defined by
$$u_1(x) = u_2(x) = x$$
A planner has to allocate the total endowment $\omega$ to the two agents. In other words, he has to choose an attainable vector $x = (x_1, x_2) \in \mathbb{R}^2_+$, that is, such that $x_1 + x_2 \le \omega$, where $x_1$ will be the share of $\omega$ allotted to the first agent and $x_2$ the share of the second agent. Indeed, every agent can only receive a positive quantity of the good, $x \in \mathbb{R}^2_+$, and the planner cannot allocate to the agents more than what is available in the economy, $x_1 + x_2 \le \omega$. Here the collection (18.51) of attainable allocations is
$$C(\omega) = \{x \in \mathbb{R}^2_+ : x_1 + x_2 \le \omega\}$$
Define $f : \mathbb{R}^2_+ \to \mathbb{R}^2_+$ by
$$f(x_1, x_2) = (x_1, x_2)$$
In other words, the function $f$ associates to each allocation $x$ the utility profile $(u_1(x_1), u_2(x_2)) \in \mathbb{R}^2_+$. This latter vector represents the utility of the two agents coming from the feasible allocation $x$. The planner's operator optimization problem (18.49) is here
$$\operatorname{opt}_x f(x) \quad \text{sub} \quad x \ge 0 \text{ and } x_1 + x_2 \le \omega$$
It is easy to check that
$$\arg\operatorname{opt}_{x \in C(\omega)} f(x) = \{x \in \mathbb{R}^2_+ : x_1 + x_2 = \omega\}$$
that is, the allocations that exhaust total resources are the Pareto optimizers of $f$ on $C(\omega)$. Since the agents' utility functions are linear, the Pareto frontier is $\{x \in \mathbb{R}^2_+ : x_1 + x_2 = \omega\}$. N

Example 861 If in the previous example we have two agents and two goods, we get back to the setup of the Edgeworth box (Section 2.5). Recall that we assumed that there is a unit of each good to split among the two agents (Albert and Barbara), so $\omega = (1, 1)$. They have the same utility function $u_i : \mathbb{R}^2_+ \to \mathbb{R}$ defined by
$$u_i(x_{i1}, x_{i2}) = \sqrt{x_{i1} x_{i2}}$$
The collection (18.51) of attainable allocations becomes²³
$$C(\omega) = \left\{x \in (\mathbb{R}^2_+)^2 : x_{11} + x_{21} \le 1 \text{ and } x_{12} + x_{22} \le 1\right\}$$
Define $f : (\mathbb{R}^2_+)^2 \to \mathbb{R}^2_+$ by $f(x_1, x_2) = (\sqrt{x_{11} x_{12}}, \sqrt{x_{21} x_{22}})$. The planner's operator optimization problem (18.49) is here
$$\operatorname{opt}_x f(x) \quad \text{sub} \quad x \ge 0, \; x_{11} + x_{21} \le 1 \text{ and } x_{12} + x_{22} \le 1$$
By Proposition 58,
$$\arg\operatorname{opt}_{x \in C(\omega)} f(x) = \left\{x \in (\mathbb{R}^2_+)^2 : 0 \le x_{11} = x_{12} = 1 - x_{21} = 1 - x_{22} \le 1\right\}$$
that is, the allocations that are symmetric (i.e., each agent receives the same quantity of the two goods) and that exhaust total resources are the Pareto optimizers of $f$ on $C(\omega)$. The Pareto frontier is
$$\left\{(\sqrt{x_{11} x_{12}}, \sqrt{x_{21} x_{22}}) \in \mathbb{R}^2_+ : 0 \le x_{11} = x_{12} = 1 - x_{21} = 1 - x_{22} \le 1\right\}$$
N

²³ We denote by $x_i = (x_{i1}, \dots, x_{in}) \in \mathbb{R}^n$ a bundle of goods of agent $i$.

O.R. As the First Welfare Theorem suggests, there is a close connection between Pareto optimal allocations and the equilibrium allocations that would arise if agents were given individual endowments and could trade among themselves at a price vector. We do not discuss this topic further; readers will study it in some microeconomics course. Just note that, through such a connection, the possible equilibrium allocations may be found by solving the operator optimization problem (18.52) or, under condition (18.50), the standard optimization problem (18.53). H

18.11 Infracoda: cuneiform functions

Strict quasi-concavity is the most standard condition ensuring the uniqueness of solutions of optimization problems (Theorem 831). It is, however, a sufficient condition that requires the convexity of the choice set, and so it is useless, for example, for finite choice sets. Let us consider the following class of functions.²⁴ Here $A$ is any set.

Definition 862 A real-valued function $f : A \to \mathbb{R}$ is said to be cuneiform if, for every pair of distinct elements $x, y \in A$, there exists an element $z \in A$ such that $f(z) > \min\{f(x), f(y)\}$.

Being cuneiform is an ordinal property: if $f : A \to \mathbb{R}$ is cuneiform and $g : \operatorname{Im} f \to \mathbb{R}$ is strictly increasing, then the composition $g \circ f : A \to \mathbb{R}$ is cuneiform as well. The next example exhibits two important classes of cuneiform functions.

Example 863 (i) Strictly quasi-concave functions $f : C \to \mathbb{R}$ defined on convex sets $C$ of $\mathbb{R}^n$ are cuneiform. Indeed, given any two distinct elements $x, y \in C$, by setting $z = (1/2) x + (1/2) y$ we have, by strict quasi-concavity,
$$f(z) = f\left(\frac{1}{2} x + \frac{1}{2} y\right) > \min\{f(x), f(y)\}$$
(ii) Injective functions $f : A \to \mathbb{R}$ are cuneiform. Let $x, y \in A$ be any two distinct elements of $A$. Since injectivity implies $f(x) \ne f(y)$, without loss of generality we can assume that $f(x) > f(y)$. So, $x$ itself can play the role of $z$ in Definition 862. An important class of cuneiform functions is thus given by the strictly monotone functions (increasing or decreasing) defined on any subset, finite or not, of the real line. N

²⁴ Our terminology is not standard.

The next result shows that being cuneiform is a necessary and sufficient condition for the uniqueness of solutions. In view of the last example, this result generalizes the uniqueness result that we established for strictly quasi-concave functions.

Proposition 864 A function $f : A \to \mathbb{R}$ has at most one maximizer if and only if it is cuneiform.

Proof "If". Let $f : A \to \mathbb{R}$ be cuneiform. We want to show that there exists at most one maximizer in $A$. Suppose, by contradiction, that there exist in $A$ two such points $x'$ and $x''$, i.e., $f(x') = f(x'') = \max_{x \in A} f(x)$. Since $f$ is cuneiform, there exists $z \in A$ such that
$$f(z) > \min\{f(x'), f(x'')\} = f(x') = f(x'') = \max_{x \in A} f(x)$$
which contradicts the optimality of $x'$ and $x''$. "Only if". Suppose that there exists at most one maximizer in $A$. Let $x'$ and $x''$ be any two distinct elements of $A$. If there are no maximizers, then in particular $x'$ and $x''$ are not maximizers; so, there exists $z \in A$ such that $f(z) > \min\{f(x'), f(x'')\}$. We conclude that $f$ is cuneiform. On the other hand, if there is one maximizer, it is easy to check that it plays the role of $z$ in Definition 862. Also in this case $f$ is cuneiform.

Though for brevity we omit the details, it is easy to see that there is a dual notion in which the inequality in the previous definition is reversed, and that the previous result then holds for minimizers.

18.12 Coda: no illusions

Solving optimization problems is, in general, a quite complex endeavor, even when a limited number of variables is involved. In this section we present an example of an optimization problem whose solution is as complicated as the proof of Fermat's Last Theorem.²⁵ The latter, finally proven after three centuries of unfruitful efforts, states that, for $n \ge 3$, there do not exist three positive integers $x$, $y$ and $z$ such that $x^n + y^n = z^n$ (Section 1.3.2).
Let us consider the optimization problem
$$\min_{x, y, z, n} f(x, y, z, n) \quad \text{sub} \quad (x, y, z, n) \in C$$
where the objective function $f : \mathbb{R}^3 \times \mathbb{N} \to \mathbb{R}$ is given by
$$f(x, y, z, n) = (x^n + y^n - z^n)^2 + (1 - \cos 2\pi x)^2 + (1 - \cos 2\pi y)^2 + (1 - \cos 2\pi z)^2$$
and the choice set is $C = \{(x, y, z, n) \in \mathbb{R}^3 \times \mathbb{N} : x, y, z \ge 1, \; n \ge 3\}$.
It is an optimization problem in four variables, one of which, $n$, is discrete, thus making it impossible to use differential and convex methods. At first sight this might seem a difficult problem, but not an intractable one. Let us have a closer look. We have $f \ge 0$ because $f$ is a sum of squares. In particular,
$$\inf_{(x, y, z, n) \in C} f(x, y, z, n) = 0$$
since $\lim_{n \to \infty} f(1, 1, \sqrt[n]{2}, n) = \lim_{n \to \infty} (1 - \cos 2\pi \sqrt[n]{2})^2 = 0$. Indeed, $\lim_{n \to \infty} \sqrt[n]{2} = 1$ (Proposition 322).

²⁵ Based on Murty and Kabadi (1987).
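The limit just computed can be observed numerically. A sketch (the function below implements the objective $f$ above; the grid of exponents is our own choice):

```python
import numpy as np

def f(x, y, z, n):
    return ((x**n + y**n - z**n) ** 2
            + (1 - np.cos(2 * np.pi * x)) ** 2
            + (1 - np.cos(2 * np.pi * y)) ** 2
            + (1 - np.cos(2 * np.pi * z)) ** 2)

for n in [3, 10, 100, 1000]:
    print(n, f(1.0, 1.0, 2.0 ** (1.0 / n), n))   # decreases towards 0
```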
The minimum value, if attained, is thus 0. The question is whether there is a solution of the problem, that is, a vector $(\hat{x}, \hat{y}, \hat{z}, \hat{n}) \in C$ such that $f(\hat{x}, \hat{y}, \hat{z}, \hat{n}) = 0$. Since $f$ is a sum of squares, this requires that at such a vector they all be null:
$$\hat{x}^{\hat{n}} + \hat{y}^{\hat{n}} - \hat{z}^{\hat{n}} = 1 - \cos 2\pi \hat{x} = 1 - \cos 2\pi \hat{y} = 1 - \cos 2\pi \hat{z} = 0$$
The last three equalities imply that the points $\hat{x}$, $\hat{y}$ and $\hat{z}$ are integers.²⁶ In order to belong to the set $C$, they must be positive. Therefore, the vector $(\hat{x}, \hat{y}, \hat{z}, \hat{n}) \in C$ must be made up of three positive integers such that $\hat{x}^{\hat{n}} + \hat{y}^{\hat{n}} = \hat{z}^{\hat{n}}$ with $\hat{n} \ge 3$.

²⁶ Recall that $\cos 2\pi x = 1$ if and only if $x$ is an integer.

This is possible if and only if Fermat's Last Theorem is false. Now that we know it to be true, we can conclude that this optimization problem has no solution. We could not have made such a statement before 1994: till then, it would have been unclear whether this optimization problem had a solution. Be that as it may, solving this optimization problem, which has only four variables, amounts to solving one of the most well-known problems in mathematics.

18.13 Ultracoda: the semicontinuous Tonelli

In some optimization problems, continuity turns out to be too strong a property, and a weakened notion of continuity, called semicontinuity, comes to play a key role. Fortunately, a more general version of Tonelli's Theorem continues to hold. We first introduce semicontinuity, and then present this ultimate version of Tonelli's Theorem.

18.13.1 Semicontinuous functions: definition

Recall that a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous at a point $x_0 \in A$ when, for each $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that²⁷
$$\|x - x_0\| < \delta_\varepsilon \implies f(x_0) - \varepsilon < f(x) < f(x_0) + \varepsilon \qquad \forall x \in A \qquad (18.54)$$
If in this definition we keep only the second inequality, we have the following weakening of continuity.

Definition 865 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be upper semicontinuous at $x_0 \in A$ if, for each $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that
$$\|x - x_0\| < \delta_\varepsilon \implies f(x) < f(x_0) + \varepsilon \qquad \forall x \in A$$

A function that is upper semicontinuous at each point of a set $E$ is called upper semicontinuous on $E$. The function is called upper semicontinuous when it is upper semicontinuous at all the points of its domain.²⁸
Upper semicontinuity has a dual notion of lower semicontinuity, with $f(x) > f(x_0) - \varepsilon$ in place of $f(x) < f(x_0) + \varepsilon$.

Proposition 866 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is both upper and lower semicontinuous at a point $x_0 \in A$ if and only if it is continuous at $x_0$.

Proof The "if" is obvious. As to the converse, assume that $f$ is both upper and lower semicontinuous at $x_0 \in A$. Fix $\varepsilon > 0$. There exist $\delta'_\varepsilon, \delta''_\varepsilon > 0$ such that, for each $x \in A$,
$$\|x - x_0\| < \delta'_\varepsilon \implies f(x) < f(x_0) + \varepsilon$$
$$\|x - x_0\| < \delta''_\varepsilon \implies f(x) > f(x_0) - \varepsilon$$

²⁷ Clearly, the sandwich $f(x_0) - \varepsilon < f(x) < f(x_0) + \varepsilon$ amounts to $|f(x_0) - f(x)| < \varepsilon$.
²⁸ Semicontinuity was introduced by René Baire in 1905.

0 00
so, by taking " = min "; " , we have

kx x0 k < " =) f (x0 ) " < f (x) < f (x0 ) + " 8x 2 A

In view of (18.54), we conclude that f is continuous at x0 , as desired.

The study of the two forms of semicontinuity, upper and lower, is analogous: indeed, it
is easy to see that f is upper semicontinuous if and only if f is lower semicontinuous. For
this reason, we will focus on upper semicontinuity because it is more relevant for the study
of maximizers.

The next result presents the sequential characterization of upper semicontinuity.

Proposition 867 A function f : A Rn ! R is upper semicontinuous at a point x0 2 A if


and only if lim sup f (xn ) f (x0 ) for each sequence fxn g A such that xn ! x0 .

By Proposition 475, for continuous functions we have lim f (xn ) = f (x0 ), so this sequen-
tial characterization of semicontinuous functions helps to understand to what extent upper
semicontinuity generalizes continuity. For lower semicontinuous, we have the dual condition
lim inf f (xn ) f (x0 ).29

Proof Let f be upper semicontinuous at the point x0 . Let fxn g be such that xn ! x0 . Fix
" > 0. There is n" 1 such that kxn x0 k < " for all n n" . By De…nition 865, we then
have f (xn ) < f (x0 ) + " for each n n" . Therefore, lim sup f (xn ) f (x0 ) + ". Since this
is true for each " > 0, we conclude that lim sup f (xn ) f (x0 ).
Suppose now that lim sup f (xn ) f (x0 ) for each sequence fxn g such that xn ! x0 . Let
" > 0 and suppose, by contradiction, that f is not upper semicontinuous at x0 . Therefore, for
each > 0 there exists x such that kx x0 k < and f (x ) f (x0 )+". Setting = 1=n, it
follows that for each n there exists xn such that kxn x0 k < 1=n and f (xn ) f (x0 ) + ". In
this way we can construct a sequence fxn g such that xn ! x0 and f (xn ) f (x0 )+" for each
n. Therefore, lim inf f (xn ) f (x0 ) + " > f (x0 ), which contradicts lim sup f (xn ) f (x0 )
and thus proves that f is upper semicontinuous at x0 .

Example 868 The function $f : [0, 1] \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} 1 & \text{if } x = 0 \\ x & \text{if } x \in (0, 1] \end{cases}$$
is upper semicontinuous. Indeed, it is continuous (so, upper semicontinuous) at each $x \in (0, 1]$. As to the origin $x = 0$, consider $\{x_n\} \subseteq [0, 1]$ with $x_n \to 0$. For each such $x_n$ we have $f(x_n) \le 1$ and therefore $\limsup f(x_n) \le 1 = f(0)$. By Proposition 867, $f$ is upper semicontinuous also at $0$. N

Example 869 Recall that the function $f : \mathbb{R} \to \mathbb{R}$ given by (12.2), i.e.,
$$f(x) = \begin{cases} x & \text{if } x < 1 \\ 2 & \text{if } x = 1 \\ 1 & \text{if } x > 1 \end{cases}$$
has a removable discontinuity at $x_0 = 1$, as its graph shows:

[Figure: the graph of $f$, with the isolated value $f(1) = 2$ above the removable discontinuity at $x_0 = 1$.]

The function is upper semicontinuous at $x_0 = 1$. In fact, let $\{x_n\} \subseteq \mathbb{R}$ with $x_n \to 1$. For every such $x_n$ we have $f(x_n) \le 1$ and therefore $\limsup f(x_n) \le 1 < 2 = f(1)$. By Proposition 867, $f$ is upper semicontinuous also at $x_0$ (so, it is upper semicontinuous because it is continuous at each $x \ne x_0$). N

This last example shows that, in general, if a function $f$ has a removable discontinuity at a point $x_0$ (i.e., the limit $\lim_{x \to x_0} f(x)$ exists but is different from $f(x_0)$), then at $x_0$ it is either upper semicontinuous, if $f(x_0) > \lim_{x \to x_0} f(x)$, or lower semicontinuous, if $f(x_0) < \lim_{x \to x_0} f(x)$.
Example 870 Recall that the function $f : \mathbb{R} \to \mathbb{R}$ given by (12.5), i.e.,
$$f(x) = \begin{cases} 2 & \text{if } x \ge 1 \\ x & \text{if } x < 1 \end{cases} \qquad (18.55)$$
has a non-removable jump discontinuity at $x_0 = 1$. However, it is upper semicontinuous at $x_0$. In fact, let $\{x_n\} \subseteq \mathbb{R}$ with $x_n \to 1$. For every such $x_n$ we have $f(x_n) \le 2$ and therefore $\limsup f(x_n) \le 2 = f(1)$. By Proposition 867, $f$ is upper semicontinuous also at $1$ (so, it is upper semicontinuous because it is continuous at each $x \ne x_0$). N

In general, the reader can verify that an increasing function $f : \mathbb{R} \to \mathbb{R}$ of a single variable is upper semicontinuous at $x_0$ if and only if it is continuous at $x_0$ from the right, that is, $\lim_{x \to x_0^+} f(x) = f(x_0)$, while it is lower semicontinuous at $x_0$ if and only if it is there continuous from the left, that is, $\lim_{x \to x_0^-} f(x) = f(x_0)$. For example, let us modify the function (18.55) at $x_0 = 1$, so as to have
$$f(x) = \begin{cases} 2 & \text{if } x > 1 \\ x & \text{if } x \le 1 \end{cases}$$
It is now lower semicontinuous at $x_0 = 1$.

18.13.2 Semicontinuous functions: properties

The upper contour sets of continuous functions are closed (Proposition 808). Remarkably, this property is still true for upper semicontinuous functions, so this weaker notion of continuity preserves this important property.

Proposition 871 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be upper semicontinuous on a closed subset $C$ of $A$. Then, the sets $(f \ge t) \cap C$ are closed for every $t \in \mathbb{R}$.

Proof Let $f$ be upper semicontinuous on $C$. Fix $t \in \mathbb{R}$; we want to show that $(f \ge t) \cap C$ is closed. Let $\{x_n\} \subseteq (f \ge t) \cap C$ with $x_n \to x \in \mathbb{R}^n$. By Theorem 165, it is enough to show that $x \in (f \ge t) \cap C$. Note that $x \in C$ since $C$ is closed. Moreover, $f(x_n) \ge t$ for each $n \ge 1$. Since $f$ is upper semicontinuous, by Proposition 867 we have $\limsup f(x_n) \le f(x)$. Therefore $t \le f(x)$, i.e., $x \in (f \ge t)$. We conclude that $x \in (f \ge t) \cap C$, as desired.

Example 872 Given a closed subset $C$ of $\mathbb{R}^n$, let $1_C : \mathbb{R}^n \to \mathbb{R}$ be defined by
$$1_C(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{if } x \notin C \end{cases}$$
In words, the function $1_C$ takes on value 1 on $C$ and 0 elsewhere. Though not continuous, it is upper semicontinuous. Indeed, let $x_0 \in \mathbb{R}^n$. If $x_0 \in C$, then $1_C(x_0) \ge 1_C(x)$ for all $x \in \mathbb{R}^n$, so it trivially holds that $\limsup 1_C(x_n) \le 1_C(x_0)$ whenever $x_n \to x_0$. If $x_0 \notin C$, then it belongs to the open set $C^c$. If $x_n \to x_0$, then there is $n_0 \ge 1$ such that $x_n \in C^c$, so $1_C(x_n) = 0$, for all $n \ge n_0$. Thus, $\lim 1_C(x_n) = 1_C(x_0) = 0$. By Proposition 867, we conclude that $1_C$ is upper semicontinuous since $x_0$ was arbitrarily chosen. Its upper contour sets
$$(1_C \ge t) = \begin{cases} \mathbb{R}^n & \text{if } t \le 0 \\ C & \text{if } t \in (0, 1] \\ \emptyset & \text{if } t > 1 \end{cases}$$
are closed for each $t \in \mathbb{R}$, in accordance with the last result. N

From the previous result it follows that also Proposition 810 continues to hold under upper semicontinuity.

Proposition 873 An upper semicontinuous function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is coercive on every compact and non-empty subset $C \subseteq A$.

Proof Let $C \subseteq A$ be compact. If $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is upper semicontinuous on $C$, Proposition 871 implies that every set $(f \ge t) \cap C$ is closed. Since a closed subset of a compact set is, in turn, compact, it follows that every $(f \ge t) \cap C$ is compact. This shows that $f$ is coercive on $C$.

A final important property is the stability of upper semicontinuity with respect to infima and suprema of functions.

Proposition 874 Given a family $\{f_i\}_{i \in I}$ of functions $f_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ upper semicontinuous at $x_0 \in A$, define $h : A \subseteq \mathbb{R}^n \to (-\infty, +\infty]$ and $g : A \subseteq \mathbb{R}^n \to [-\infty, +\infty)$ by
$$g(x) = \inf_{i \in I} f_i(x) \quad \text{and} \quad h(x) = \sup_{i \in I} f_i(x)$$
Then, the function $g$ is upper semicontinuous at $x_0 \in A$, while the function $h$ is upper semicontinuous at $x_0 \in A$ provided $I$ is finite.

In words, upper semicontinuity is preserved by infima over sets of functions of any cardinality, while it is preserved under suprema only over finite sets of functions. In this latter case, we can actually write $h(x) = \max_{i \in I} f_i(x)$.
The last example showed that there is a tight connection between upper semicontinuous functions and closed sets. It is therefore not surprising that the stability of upper semicontinuous functions relative to infima and suprema resembles that of closed sets relative to intersections and unions, respectively.

Example 875 The union of the closed sets $A_n = [-1 + 1/n, 1 - 1/n]$ is the open interval $(-1, 1)$, as noted after Corollary 158. The supremum of the infinitely many upper semicontinuous functions
$$f_n(x) = 1_{[-1 + \frac{1}{n}, 1 - \frac{1}{n}]}(x)$$
is the lower, but not upper, semicontinuous function
$$h(x) = \sup_{n \in \mathbb{N}} 1_{[-1 + \frac{1}{n}, 1 - \frac{1}{n}]}(x) = 1_{(-1, 1)}(x)$$
N

Proof of Proposition 874 Let $x_0 \in A$. Given $\varepsilon > 0$, there exists $i \in I$ such that $f_i(x_0) < g(x_0) + \varepsilon$. Since $f_i$ is upper semicontinuous, there exists $\delta_\varepsilon > 0$ such that
$$\|x - x_0\| < \delta_\varepsilon \implies f_i(x) < f_i(x_0) + \varepsilon \qquad \forall x \in A$$
So,
$$\|x - x_0\| < \delta_\varepsilon \implies g(x) \le f_i(x) < f_i(x_0) + \varepsilon < g(x_0) + 2\varepsilon \qquad \forall x \in A$$
that is,
$$\|x - x_0\| < \delta_\varepsilon \implies g(x) < g(x_0) + 2\varepsilon \qquad \forall x \in A$$
This proves that $g$ is upper semicontinuous at $x_0 \in A$. We leave to the reader the proof that $h$ is upper semicontinuous at $x_0 \in A$ when $I$ is finite.

Dual properties hold for lower semicontinuous functions: lower semicontinuity is pre-
served by suprema over sets of functions of any cardinality, while is preserved under in…ma
only over …nite sets of functions. Now the analogy is with the stability properties of open
sets relative to intersections and unions. Indeed, a tight connection –dual to the established
in Example 872 –is easily seen to exist for lower semicontinuous functions and open sets.
In view of Proposition 866, we then have the following important corollary about the
“…nite” stability of continuous functions.

Corollary 876 Given a finite family $\{f_i\}_{i=1}^n$ of functions $f_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ continuous at $x_0 \in A$, the functions $g, h : A \subseteq \mathbb{R}^n \to \mathbb{R}$ defined by
$$g(x) = \min_{i = 1, \dots, n} f_i(x) \quad \text{and} \quad h(x) = \max_{i = 1, \dots, n} f_i(x)$$
are both continuous at $x_0 \in A$.

Infima and suprema of infinitely many continuous functions are, in general, no longer continuous. This fragility of continuity is a main reason for the importance of lower and upper semicontinuity.

18.13.3 The (almost) ultimate Tonelli

Proposition 873 shows that upper semicontinuity is the natural version of continuity to use for coercivity. Not surprisingly, then, we can now state and prove a version of Tonelli's Theorem in which upper semicontinuity replaces continuity, thus substantially broadening the scope of the theorem.

Theorem 877 (Tonelli) A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ which is coercive and upper semicontinuous on a subset $C$ of $A$ admits (at least) a maximizer in $C$, that is, there exists $\hat{x} \in C$ such that
$$f(\hat{x}) = \max_{x \in C} f(x)$$
If, in addition, $C$ is compact, then $\arg\max_{x \in C} f(x)$ is compact.

The proof is a slight modification of the first proof of Weierstrass' Theorem, which essentially still goes through under upper semicontinuity (a further sign that upper semicontinuity is the relevant notion of continuity to establish the existence of maximizers).

Proof Since $f$ is coercive, there exists $t \in \mathbb{R}$ such that the upper contour set $\Gamma = (f \geq t) \cap C$ is non-empty and compact. Set $\alpha = \sup_{x \in \Gamma} f(x)$, that is, $\alpha = \sup f(\Gamma)$. By Lemma 800, there exists a sequence $\{a_n\} \subseteq f(\Gamma)$ such that $a_n \to \alpha$. Let $\{x_n\} \subseteq \Gamma$ be such that $a_n = f(x_n)$ for every $n \geq 1$. Since $\Gamma$ is compact, the Bolzano-Weierstrass Theorem yields a subsequence $\{x_{n_k}\} \subseteq \{x_n\}$ that converges to some $\hat{x} \in \Gamma$, that is, $x_{n_k} \to \hat{x} \in \Gamma$. Since $\{a_n\}$ converges to $\alpha$, also the subsequence $\{a_{n_k}\}$ converges to $\alpha$. Since $f$ is upper semicontinuous, it follows that
$$\alpha = \lim_{k \to \infty} a_{n_k} = \lim_{k \to \infty} f(x_{n_k}) \leq f(\hat{x})$$
Here the last inequality is due to upper semicontinuity. So, $\alpha = f(\hat{x})$ and we thus conclude that $f(\hat{x}) \geq f(x)$ for every $x \in \Gamma$. At the same time, if $x \in C \setminus \Gamma$ we have $f(x) < t$ and so $f(\hat{x}) \geq t > f(x)$. It follows that $f(\hat{x}) \geq f(x)$ for every $x \in C$, that is, $f(\hat{x}) = \max_{x \in C} f(x)$.
It remains to show that $\arg\max_{x \in C} f(x)$ is compact if $C$ itself is compact. Since $\arg\max_{x \in C} f(x) \subseteq \Gamma$, it is enough to show that $\arg\max_{x \in C} f(x)$ is closed (in that a closed subset of a compact set is, in turn, compact). Clearly, we have
$$\arg\max_{x \in C} f(x) = \left(f \geq \max_{x \in C} f(x)\right) \cap C$$
So, $\arg\max_{x \in C} f(x)$ is closed by Proposition 871, as desired.

Example 878 (i) The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 2 & \text{if } x = 0 \\ e^{-|x|} & \text{if } x \neq 0 \end{cases}$$
is coercive and upper semicontinuous. Thanks to Tonelli's Theorem, it has a maximizer in $C = \mathbb{R}$. Note that, instead, this function has no minimizers (here Weierstrass' Theorem does not hold because the function is not continuous and $\mathbb{R}$ is not compact).
(ii) Consider the upper semicontinuous function $f : [0, 1] \to \mathbb{R}$ of Example 868. By Proposition 873, this function is coercive on its compact domain $[0, 1]$, so by Tonelli's Theorem it has at least a maximizer. Note that also this function has no minimizers (here Weierstrass' Theorem cannot be applied because the function is not continuous). N
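As a numerical sanity check of part (i), here is a minimal Python sketch of ours (not part of the original text): evaluating $f$ on a grid that contains the point $0$ shows that the maximum value $2$ is attained there, while the infimum $0$ is only approached.

import numpy as np

def f(x):
    # f(x) = 2 at x = 0 and e^{-|x|} elsewhere: upper semicontinuous, coercive
    return 2.0 if x == 0 else np.exp(-abs(x))

grid = np.arange(-1000, 1001) / 100.0   # step 0.01; contains 0.0 exactly
values = np.array([f(x) for x in grid])

print(grid[values.argmax()], values.max())  # 0.0 2.0: the maximizer promised by Tonelli
print(values.min())                         # ~ 4.5e-05: inf f = 0 is never attained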

We conclude with two remarks. (i) For minimizers, dual versions of the results that we established hold, with for instance lower contour sets in place of the upper ones (as readers can check). (ii) Coercivity becomes a necessary condition for global optimality for upper semicontinuous objective functions $f$ and compact choice sets $C$. Indeed, in this case by Tonelli's Theorem the upper contour set $(f \geq \max_{x \in C} f(x)) \cap C$ is non-empty and compact.

18.13.4 The ordinal Tonelli

There is a feature of the previous general form of Tonelli's Theorem that, conceptually, is still a bit unsatisfactory: unlike coercivity (Proposition 806), upper semicontinuity is not an ordinal notion, as the next example shows.

Example 879 The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x$ is trivially continuous. In contrast, the strictly increasing function $g : \mathbb{R} \to \mathbb{R}$ defined by
$$g(x) = \begin{cases} x + 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ x - 1 & \text{if } x < 0 \end{cases}$$
is neither lower nor upper semicontinuous at $0$. To see the failure of upper semicontinuity, just note that $x_n = 1/n \to 0$ but $\lim g(x_n) = 1 > 0 = g(0)$. Since $g \circ f = g$, this proves that lower and upper semicontinuity are not preserved by strictly increasing transformations, so they are not ordinal properties. N

Since upper semicontinuity is not an ordinal notion, we might end up with equivalent objective functions – in the sense of Section 18.1.5 – such that Tonelli's Theorem is applicable to only one of them, thus creating an unnatural asymmetry between them. To address this issue, we next present an ordinal version of upper semicontinuity.

Definition 880 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be upper quasi-continuous at $x_0 \in A$ if
$$f(x_n) \geq f(y) \implies f(x_0) \geq f(y) \qquad \forall y \in A \tag{18.56}$$
for each sequence $\{x_n\} \subseteq A$ such that $x_n \to x_0$.

It is an ordinal notion, as the next result shows.

Proposition 881 Given a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, let $g : B \subseteq \mathbb{R} \to \mathbb{R}$ be strictly increasing with $\operatorname{Im} f \subseteq B$. The function $f$ is upper quasi-continuous at $x_0 \in A$ if and only if the composite function $g \circ f$ is upper quasi-continuous at $x_0$.

Proof We only prove the "only if", the converse being similarly proved. Let $f$ be upper quasi-continuous at $x_0$. We want to show that $g \circ f$ is upper quasi-continuous at $x_0$. Let $\{x_n\} \subseteq A$ be such that $x_n \to x_0$. Suppose that $y \in A$ is such that $(g \circ f)(x_n) \geq (g \circ f)(y)$. Since $g$ is strictly increasing, by Proposition 209 we have
$$f(x_n) \geq f(y) \iff (g \circ f)(x_n) \geq (g \circ f)(y) \tag{18.57}$$
Since $f$ is upper quasi-continuous at $x_0$, we then have $f(x_0) \geq f(y)$. In view of (18.57), this in turn implies $(g \circ f)(x_0) \geq (g \circ f)(y)$, thus proving that $g \circ f$ is upper quasi-continuous at $x_0$.

Besides being ordinal, upper quasi-continuity is weaker than upper semicontinuity.

Proposition 882 If a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is upper semicontinuous at $x_0 \in A$, then it is upper quasi-continuous at $x_0$.

Proof Let $f$ be upper semicontinuous at $x_0 \in A$. Let $x_n \to x_0$ and $y \in A$ be such that $f(x_n) \geq f(y)$ for all $n \geq 1$. By upper semicontinuity, we then have $f(x_0) \geq \limsup f(x_n) \geq \liminf f(x_n) \geq f(y)$. So, $f$ is upper quasi-continuous at $x_0$.

We can now state and prove a general ordinal version of Tonelli’s Theorem in which
upper quasi-continuity replaces upper semicontinuity.30

Theorem 883 (Ordinal Tonelli) A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ which is coercive and upper quasi-continuous on a subset $C$ of $A$ admits (at least) a maximizer in $C$.

The proof relies on a sharpening of Lemma 800.

Lemma 884 Let $A$ be a subset of the real line. There exists a convergent and increasing sequence $\{a_n\} \subseteq A$ such that $a_n \uparrow \sup A$.
30 We leave to readers the dual minimization version, based on a lower quasi-continuity notion.

Proof Set $\beta = \sup A$. Suppose that $\beta \in \mathbb{R}$. In the proof of Lemma 800 we proved the existence of a sequence $\{a_n\} \subseteq A$ such that $a_n \leq \beta$ and $a_n \to \beta$. Set $b_n = \max\{a_1, \dots, a_n\}$, which is increasing by construction. Then $0 \leq \beta - b_n \leq \beta - a_n \to 0$, so $b_n \to \beta$. Suppose now $\beta = +\infty$. In the proof of Lemma 800 we proved the existence of a sequence $\{a_n\} \subseteq A$ such that $a_n \to +\infty$. Again, by setting $b_n = \max\{a_1, \dots, a_n\}$, we have $b_n \uparrow +\infty$.

Proof Since $f$ is coercive, there exists $t \in \mathbb{R}$ such that the upper contour set $\Gamma = (f \geq t) \cap C$ is non-empty and compact. Set $\alpha = \sup_{x \in \Gamma} f(x)$. By Lemma 884, there exists a sequence $\{a_n\} \subseteq f(\Gamma)$ such that $a_n \uparrow \alpha$. Let $\{x_n\} \subseteq \Gamma$ be such that $a_n = f(x_n)$ for every $n \geq 1$. Since $\Gamma$ is compact, the Bolzano-Weierstrass Theorem yields a subsequence $\{x_{n_k}\} \subseteq \{x_n\}$ that converges to some $\hat{x} \in \Gamma$. We want to show that $\alpha = f(\hat{x})$. Suppose, by contradiction, that $f(\hat{x}) < \alpha$. Since $a_{n_k} \uparrow \alpha$, there exists $\bar{k} \geq 1$ large enough so that $a_{n_k} \geq a_{n_{\bar{k}}} > f(\hat{x})$ for all $k \geq \bar{k}$. Hence, $f(x_{n_k}) \geq f(x_{n_{\bar{k}}})$ for all $k \geq \bar{k}$. Since $f$ is upper quasi-continuous at $\hat{x}$, we then have $f(\hat{x}) \geq f(x_{n_{\bar{k}}}) > f(\hat{x})$, a contradiction.31 We conclude that $\alpha = f(\hat{x})$. So, $f(\hat{x}) \geq f(x)$ for every $x \in \Gamma$. At the same time, if $x \in C \setminus \Gamma$ we have $f(x) < t$ and so $f(\hat{x}) \geq t > f(x)$. It follows that $f(\hat{x}) \geq f(x)$ for every $x \in C$, as desired.

The ordinal Tonelli's Theorem is the most general form of this existence theorem that we present. The earlier pre-coda version of Tonelli's Theorem for continuous functions, Theorem 814, is enough for the results of the book. Yet, when readers later come across topics that rely on Tonelli's Theorem, they may wonder how much generality would be gained via its stronger semicontinuous and quasi-continuous versions.

31 Here $x_{n_{\bar{k}}}$ plays the role of $y$ in (18.56).
Chapter 19

Projections and approximations

19.1 Projection Theorem


In this chapter we address a simple general problem, with far-reaching implications: given a point $x \in \mathbb{R}^n$ and a vector subspace $V$ of $\mathbb{R}^n$, we would like to identify, if it exists, the point $m$ of the vector subspace $V$ which is "closest" to $x$. Formally, $m$ is the point of $V$ that minimizes $\|x - y\|$ as $y$ varies in $V$. Graphically:

[Figure: a point $x$ off a subspace $V$, its closest point $m$ in $V$, and the distance $\|x - m\|$ between them.]

Clearly, the problem is trivial if $x$ belongs to $V$: just set $m = x$. Things become interesting when $x$ is not in $V$. In this regard, note that we can paraphrase the problem by saying that it consists in finding in $V$ the best approximation of a given $x \in \mathbb{R}^n$: the vector subspace $V$ thus represents the space of "admissible approximations" and $x - m$ is interpreted as an "approximation error" because it represents the error made by approximating $x$ with $m$.
The problem described above is an optimization problem that consists in minimizing $\|x - y\|$ under the constraint $y \in V$, that is,
$$\min_y \|x - y\| \quad \text{sub } y \in V \tag{19.1}$$

The relevant questions about this problem are:


(i) Does a solution m exist?

(ii) If it exists, is it unique?

(iii) How can it be characterized?

The following theorem addresses all these questions. It relies on the notions of orthogonality we studied earlier in the book (Chapter 4). In particular, recall that two vectors $x, y \in \mathbb{R}^n$ are orthogonal, written $x \perp y$, when their inner product is null. When $x$ is orthogonal to all vectors in a subset $S$ of $\mathbb{R}^n$, we write $x \perp S$.

Theorem 885 (Projection Theorem) Let $V$ be a vector subspace of $\mathbb{R}^n$. For every $x \in \mathbb{R}^n$, the optimization problem (19.1) has a unique solution, given by the vector $m \in V$ with error $x - m$ orthogonal to $V$, that is, $(x - m) \perp V$.

Note that the uniqueness of $m$ implies that $\|x - m\| < \|x - y\|$ for each $y \in V$ different from $m$.

This remarkable result ensures the existence and uniqueness of the solution, thus answering the first two questions, and characterizes it as the vector in $V$ which makes the approximation error orthogonal to $V$ itself. Orthogonality of the error is a key property of the solution that has a number of consequences in applications. Furthermore, Theorem 890 will show how orthogonality allows for identifying the solution in closed form in terms of a basis of $V$, thus fully answering also the last question.

To prove the theorem, given $x \in \mathbb{R}^n$ consider the function $f : \mathbb{R}^n \to \mathbb{R}$ defined by $f(y) = -\|x - y\|$. Problem (19.1) can be rewritten as
$$\max_y f(y) \quad \text{sub } y \in V \tag{19.2}$$

Thanks to the following lemma, one can apply Tonelli’s Theorem and Theorem 831 to
this optimization problem.

Lemma 886 The function $f$ is strictly concave and coercive on $V$.

Proof The proof is analogous to that of Lemma 855 and is thus left to the reader (note that, from Proposition 701, $V$ is a closed and convex subset of $\mathbb{R}^n$).

Proof of the Projection Theorem In light of the previous lemma, problem (19.2), so problem (19.1), has a solution by Tonelli's Theorem because $f$ is coercive on $V$, and such a solution is unique by Theorem 831 because $f$ is strictly concave.
It remains to show that, if $m$ minimizes $\|x - y\|$, then $(x - m) \perp V$. Suppose, by contradiction, that there is $\tilde{y} \in V$ which is not orthogonal to $x - m$. Without loss of generality, suppose that $\|\tilde{y}\| = 1$ (otherwise, it would suffice to take $\tilde{y} / \|\tilde{y}\|$, which has norm 1) and that $(x - m) \cdot \tilde{y} = \alpha \neq 0$. Denote by $y'$ the element in $V$ such that $y' = m + \alpha \tilde{y}$. We have that
$$\|x - y'\|^2 = \|x - m - \alpha \tilde{y}\|^2 = \|x - m\|^2 - 2\alpha (x - m) \cdot \tilde{y} + \alpha^2 = \|x - m\|^2 - \alpha^2 < \|x - m\|^2$$
thus contradicting the assumption that $m$ minimizes $\|x - y\|$, as the element $y'$ would make $\|x - y\|$ even smaller. The contradiction proves the desired result.

Denote by $V^\perp = \{x \in \mathbb{R}^n : x \perp V\}$ the set of vectors that are orthogonal to $V$. The reader can easily check that such a set is a vector subspace of $\mathbb{R}^n$. It is called the orthogonal complement of $V$.

Example 887 Let $V = \operatorname{span}\{y_1, \dots, y_k\}$ be the vector subspace generated by the vectors $\{y_i\}_{i=1}^k$ and let $Y \in M(k, n)$ be the matrix whose rows are such vectors. Given $x \in \mathbb{R}^n$, we have $x \perp V$ if and only if $Yx = 0$. Therefore, $V^\perp$ consists of all the solutions of this homogeneous linear system. N

The Projection Theorem has the following important corollary.

Corollary 888 Let V be a vector subspace of Rn . Each vector x 2 Rn can be uniquely


decomposed as
x=y+z (19.3)
with y 2 V and z 2 V ? .

Proof It su¢ ces to set y = m and z = x m.

In words, any vector can be uniquely represented as the sum of a vector in $V$ and a vector in its orthogonal complement $V^\perp$, and this can be done for any vector subspace $V$ of $\mathbb{R}^n$. The uniqueness of such a decomposition is remarkable as it entails that the vectors $y$ and $z$ are uniquely determined. For this reason we say that $\mathbb{R}^n$ is the direct sum of the subspaces $V$ and $V^\perp$, in symbols $\mathbb{R}^n = V \oplus V^\perp$. In many applications it is important to be able to regard $\mathbb{R}^n$ as the direct sum of one of its subspaces and its orthogonal complement.

19.2 Projections
Given a vector subspace $V$ of $\mathbb{R}^n$, the solution of the minimization problem (19.1) is called the projection of $x$ onto $V$. In this way one can define an operator $P_V : \mathbb{R}^n \to \mathbb{R}^n$, called the projection, that associates to each $x \in \mathbb{R}^n$ its projection $P_V(x)$.

Proposition 889 The projection is a linear operator.

Proof Take $x, y \in \mathbb{R}^n$ and $\alpha, \beta \in \mathbb{R}$. Our aim is to show that $P_V(\alpha x + \beta y) = \alpha P_V(x) + \beta P_V(y)$. For every $z \in V$, we have
$$(\alpha P_V(x) + \beta P_V(y) - (\alpha x + \beta y)) \cdot z = (\alpha (P_V(x) - x) + \beta (P_V(y) - y)) \cdot z = \alpha (P_V(x) - x) \cdot z + \beta (P_V(y) - y) \cdot z = 0$$
Therefore,
$$(\alpha P_V(x) + \beta P_V(y) - (\alpha x + \beta y)) \perp V$$
and, by the Projection Theorem and by the uniqueness of the decomposition (19.3), $\alpha P_V(x) + \beta P_V(y)$ is the projection of $\alpha x + \beta y$ on $V$, that is, $P_V(\alpha x + \beta y) = \alpha P_V(x) + \beta P_V(y)$.

Being linear, projections thus have a matrix representation. To find it, consider a set $\{y_i\}_{i=1}^k$ of vectors that generate the subspace $V$, that is, $V = \operatorname{span}\{y_1, \dots, y_k\}$. Given $x \in \mathbb{R}^n$, by the Projection Theorem we have $(x - P_V(x)) \perp V$, so
$$(x - P_V(x)) \cdot y_i = 0 \qquad \forall i = 1, \dots, k$$
are the so-called normal equations of the projection. Since $P_V(x) \in V$, we can write such a vector as a linear combination $P_V(x) = \sum_{j=1}^k \alpha_j y_j$. The normal equations then become
$$\left(x - \sum_{j=1}^k \alpha_j y_j\right) \cdot y_i = 0 \qquad \forall i = 1, \dots, k$$
that is,
$$\sum_{j=1}^k \alpha_j (y_j \cdot y_i) = x \cdot y_i \qquad \forall i = 1, \dots, k$$
We thus end up with the system
$$\begin{cases} \alpha_1 (y_1 \cdot y_1) + \alpha_2 (y_2 \cdot y_1) + \cdots + \alpha_k (y_k \cdot y_1) = x \cdot y_1 \\ \alpha_1 (y_1 \cdot y_2) + \alpha_2 (y_2 \cdot y_2) + \cdots + \alpha_k (y_k \cdot y_2) = x \cdot y_2 \\ \vdots \\ \alpha_1 (y_1 \cdot y_k) + \alpha_2 (y_2 \cdot y_k) + \cdots + \alpha_k (y_k \cdot y_k) = x \cdot y_k \end{cases}$$
Let $Y \in M(n, k)$ be the matrix that has as columns the generating vectors $\{y_i\}_{i=1}^k$. We can rewrite the system in matrix form as
$$\underset{k \times n}{Y^T} \; \underset{n \times k}{Y} \; \underset{k \times 1}{\alpha} = \underset{k \times n}{Y^T} \; \underset{n \times 1}{x} \tag{19.4}$$
We thus end up with the square Gram matrix $Y^T Y$, which has rank equal to that of $Y$ by Proposition 582, that is, $\rho(Y^T Y) = \rho(Y)$.
If the vectors $\{y_i\}_{i=1}^k$ are linearly independent, the matrix $Y$ has full rank $k$ and so the Gram matrix is invertible. By multiplying both sides of system (19.4) by the inverse $(Y^T Y)^{-1}$ of the Gram matrix, we get
$$\alpha = (Y^T Y)^{-1} Y^T x$$
So, the projection is given by
$$P_V(x) = \sum_{j=1}^k \alpha_j y_j = Y\alpha = Y (Y^T Y)^{-1} Y^T x \qquad \forall x \in \mathbb{R}^n$$
We have thus proven the important:

Theorem 890 Let $V$ be a vector subspace of $\mathbb{R}^n$ generated by the linearly independent vectors $\{y_i\}_{i=1}^k$.1 The projection $P_V : \mathbb{R}^n \to \mathbb{R}^n$ on $V$ is given by
$$P_V(x) = Y (Y^T Y)^{-1} Y^T x \qquad \forall x \in \mathbb{R}^n \tag{19.5}$$
where $Y \in M(n, k)$ is the matrix that has such vectors as columns.

In conclusion, the matrix $Y (Y^T Y)^{-1} Y^T$ represents the linear operator $P_V$.

1 The assumption that $V$ is generated by the linearly independent vectors $\{y_i\}_{i=1}^k$ is equivalent to requiring that such vectors be a basis for $V$. The theorem can be equivalently formulated as: Let $\{y_i\}_{i=1}^k$ be a basis of a vector subspace of $\mathbb{R}^n$.
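Formula (19.5) is easy to verify numerically. The following minimal sketch (ours, not part of the original text; it uses the NumPy library) builds the matrix $Y (Y^T Y)^{-1} Y^T$ for a subspace of $\mathbb{R}^3$ spanned by two independent vectors, then checks the orthogonality of the error and the idempotence of the projection.

import numpy as np

# columns of Y: a basis of the subspace V of R^3
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# projection matrix of formula (19.5)
P = Y @ np.linalg.inv(Y.T @ Y) @ Y.T

x = np.array([1.0, 2.0, 7.0])
m = P @ x                        # projection of x onto V

print(Y.T @ (x - m))             # ~ [0, 0]: the error x - m is orthogonal to V
print(np.allclose(P @ P, P))     # True: projecting twice changes nothing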

19.3 The ultimate Riesz

Projections make possible an important refinement of Theorem 638, the version of Riesz's Theorem for vector subspaces. Given a linear function $f : V \to \mathbb{R}$, let $\Pi$ be the set of vectors $\pi \in \mathbb{R}^n$ for which (13.40) holds, that is, the vectors such that
$$f(x) = \pi \cdot x \qquad \forall x \in V$$
By Theorem 638, such a set is non-empty. Remarkably, the projections of its elements on $V$ are all the same:

Lemma 891 $P_V(\pi') = P_V(\pi)$ for each $\pi, \pi' \in \Pi$.

Proof Take $\pi \in \Pi$. By (19.3) it holds that $\pi = P_V(\pi) + y$ with $y \in V^\perp$, so that, for every $x \in V$,
$$f(x) = \pi \cdot x = (P_V(\pi) + y) \cdot x = P_V(\pi) \cdot x + y \cdot x = P_V(\pi) \cdot x$$
If $\pi' \in \Pi$ we have
$$f(x) = P_V(\pi') \cdot x = P_V(\pi) \cdot x \qquad \forall x \in V$$
and so $(P_V(\pi') - P_V(\pi)) \cdot x = 0$ for every $x \in V$. It follows that $P_V(\pi') - P_V(\pi) \in V^\perp$, that is, $P_V(\pi') - P_V(\pi) \in V \cap V^\perp$ since by definition $P_V(\pi') - P_V(\pi) \in V$. However, $V \cap V^\perp = \{0\}$ and so $P_V(\pi') - P_V(\pi) = 0$, that is, $P_V(\pi') = P_V(\pi)$.

In light of this lemma, let us denote the common projection by $\pi^*$, that is, $\pi^* = P_V(\pi)$ with $\pi \in \Pi$. By the decomposition (19.3), every $\pi \in \Pi$ can be uniquely written as $\pi = \pi^* + \varepsilon$, where $\varepsilon \in V^\perp$, so that the vectors $\varepsilon$ and $\pi^*$ are orthogonal. In other words, $\Pi = \{\pi^* + \varepsilon : \varepsilon \in V^\perp\}$. Since
$$f(x) = \pi \cdot x = (\pi^* + \varepsilon) \cdot x = \pi^* \cdot x + \varepsilon \cdot x = \pi^* \cdot x \qquad \forall x \in V$$
the projection $\pi^*$ is the only vector in $V$ that represents $f$. We have thus proven the following version of Riesz's Theorem for vector subspaces.

Theorem 892 (Riesz) Let $V$ be a vector subspace of $\mathbb{R}^n$. A function $f : V \to \mathbb{R}$ is linear if and only if there is a unique vector $\pi^* \in V$ such that
$$f(x) = \pi^* \cdot x \qquad \forall x \in V$$

In what follows, when mentioning Riesz's Theorem we will refer to this general version of the result.

Example 893 In Example 637 we have $\pi^* = (1, 1, 0) \in V$. N

Projections have made it possible to address the multiplicity of vectors that afflicted Theorem 638, which resulted from the multiplicity of the extensions to $\mathbb{R}^n$, provided by the Hahn-Banach Theorem (Section 13.10), of a linear function defined on $V$.
In particular, if $f : \mathbb{R}^n \to \mathbb{R}$ is a linear function on $\mathbb{R}^n$ and $\pi$ is the unique vector of $\mathbb{R}^n$ such that $f(x) = \pi \cdot x$ for every $x \in \mathbb{R}^n$, then for its restriction $f_{|V}$ to a vector subspace $V$ the vector $\pi^* = P_V(\pi)$ is the only vector in $V$ such that $f(x) = \pi^* \cdot x$ for every $x \in V$. By (19.5), we then have the following remarkable formula
$$\pi^* = Y (Y^T Y)^{-1} Y^T \pi$$

19.4 Least squares and projections

The idea of approximation that underlies both least squares (Section 18.9) and projections suggests the existence of a connection between the two notions. Let us make this intuition precise.

Least squares The least squares solution $x^* \in \mathbb{R}^n$ solves the minimization problem
$$\min_x \|Ax - b\|^2 \quad \text{sub } x \in \mathbb{R}^n \tag{19.6}$$
At the same time, since the image $\operatorname{Im} F$ of the linear operator $F(x) = Ax$ is a vector subspace of $\mathbb{R}^m$, the projection $P_{\operatorname{Im} F}(b)$ of the vector $b \in \mathbb{R}^m$ solves the optimization problem
$$\min_y \|y - b\|^2 \quad \text{sub } y \in \operatorname{Im} F$$
that is,
$$\|P_{\operatorname{Im} F}(b) - b\| \leq \|y - b\| \qquad \forall y \in \operatorname{Im} F$$
Therefore, a vector $x^* \in \mathbb{R}^n$ is a least squares solution if and only if
$$Ax^* = P_{\operatorname{Im} F}(b) \tag{19.7}$$
that is, if and only if its image $Ax^*$ is the projection of $b$ on the vector subspace $\operatorname{Im} F$ generated by the columns of $A$. The image $Ax^*$ is often denoted by $y^*$. With such a notation, (19.7) can be rewritten as $y^* = P_{\operatorname{Im} F}(b)$.

Errors Equality (19.7) shows the tight relationship between projections and least squares. In particular, by the Projection Theorem the error $Ax^* - b$ is orthogonal to the vector subspace $\operatorname{Im} F$:
$$(Ax^* - b) \perp \operatorname{Im} F$$
or, equivalently, $(y^* - b) \perp \operatorname{Im} F$.
The vector subspace $\operatorname{Im} F$ is generated by the columns of $A$, which are therefore orthogonal to the approximation error. For example, in the statistical interpretation of least squares from Section 18.9.2, the matrix $A$ is denoted by $X$ and has the form (18.48); each column $X_i$ of $X$ displays data on the $i$-th regressor in every period. If we identify each such column with the regressor whose data it portrays, we can see $\operatorname{Im} F$ as the vector subspace of $\mathbb{R}^m$ generated by the regressors. The least squares method is equivalent to considering the projection of the output vector $Y$ on the subspace generated by the regressors $X_1, \dots, X_n$. In particular, the regressors are orthogonal to the approximation error:
$$(X\beta^* - Y) \perp X_i \qquad \forall i = 1, \dots, n$$
By setting $Y^* = X\beta^*$ one equivalently has that $(Y^* - Y) \perp X_i$ for every $i = 1, \dots, n$, a classic property of least squares that we already mentioned.

Solution's formula Assume that $\rho(A) = n$, so the matrix $A$ has full rank and the linear operator $F$ is injective (Corollary 579). In this case, we have
$$x^* = F^{-1}(P_{\operatorname{Im} F}(b)) \tag{19.8}$$
so that the least squares solution can be determined via the projection. Equality (19.8) is even more significant if we can express it in matrix form. In doing so, note that the linearly independent (since $\rho(A) = n$) columns of $A$ generate the subspace $\operatorname{Im} F$, thus taking the role of the matrix $Y$ from Section 19.2. By Theorem 890, we have
$$Ax^* = P_{\operatorname{Im} F}(b) = A (A^T A)^{-1} A^T b$$
By multiplying by the matrix $A^T$ we get
$$A^T A x^* = A^T A (A^T A)^{-1} A^T b = A^T b$$
Finally, by Proposition 582 we have $\rho(A) = \rho(A^T A) = n$, so the $n \times n$ Gram matrix $A^T A$ is invertible. By multiplying by its inverse $(A^T A)^{-1}$, we have the following remarkable matrix formula for the least squares solution:
$$x^* = (A^T A)^{-1} A^T b$$
This is the matrix representation of (19.8) that is made possible by the matrix representation of projections established in Theorem 890. Cramer's Theorem is the special case when $A$ is an invertible square matrix of order $n$. Indeed, in this case also the transpose $A^T$ is invertible (Proposition 603), so by Proposition 594 we have
$$x^* = (A^T A)^{-1} A^T b = A^{-1} (A^T)^{-1} A^T b = A^{-1} b$$
We have thus found the least squares solution when the matrix $A$ has full rank. Using the statistical notation, we end up with the well-known least squares formula
$$\beta^* = (X^T X)^{-1} X^T Y$$
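As a numerical cross-check (our own sketch with NumPy, not part of the original text), the formula $\beta^* = (X^T X)^{-1} X^T Y$ can be compared with the library least squares solver, and the orthogonality of the residual to the regressors can be verified on simulated data.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # 50 observations, 3 regressors
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

beta = np.linalg.inv(X.T @ X) @ X.T @ Y       # least squares formula
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]

print(np.allclose(beta, beta_lstsq))          # True: both routes agree
print(X.T @ (Y - X @ beta))                   # ~ [0, 0, 0]: residual orthogonal to the regressors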

19.5 A finance illustration

We consider a two-period frictionless financial market. At date 0 (today) investors trade $n$ primary assets – in any quantity and without any kind of impediment (transaction costs, short sales constraints, etc.) – that pay out at date 1 (tomorrow), contingent on which state $s \in S = \{s_1, \dots, s_k\}$ obtains tomorrow. States are mutually exclusive (only one of them obtains) and provide an exhaustive description of uncertainty (at least one of them obtains). Let $L = \{y_1, \dots, y_n\} \subseteq \mathbb{R}^k$ be the collection of primary assets and $p = (p_1, p_2, \dots, p_n) \in \mathbb{R}^n$ the vector of their market prices (per unit of asset). The pair $(L, p)$ describes the financial market.

19.5.1 Portfolios and contingent claims

A primary asset $j = 1, \dots, n$ is denoted by
$$y_j = (y_{1j}, \dots, y_{kj}) \in \mathbb{R}^k$$
where $y_{ij}$ represents its payoff if state $s_i$ obtains. Portfolios of primary assets can be formed in the market, each identified by a vector of weights $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ where $x_j$ is the traded quantity of primary asset $y_j$. If $x_j \geq 0$ (resp., $x_j \leq 0$) the portfolio is long (resp., short) on asset $y_j$, that is, it buys (resp., sells) $x_j$ units of the asset. In particular, the primary asset $y_1$ is identified by the portfolio $e^1 = (1, 0, \dots, 0) \in \mathbb{R}^n$, the primary asset $y_2$ by $e^2 = (0, 1, 0, \dots, 0) \in \mathbb{R}^n$, and so on.
The linear combination
$$\sum_{j=1}^n x_j y_j \in \mathbb{R}^k$$
is the state contingent payoff that, tomorrow, portfolio $x$ ensures.
Example 894 Suppose the payments of the primary assets depend on the state of the economy (e.g., dividends if assets are shares), which can be of three types:
$$s_1 = \text{"recession"} \qquad s_2 = \text{"stasis"} \qquad s_3 = \text{"growth"}$$
Each primary asset $y_j$ can be described as a vector
$$y_j = (y_{1j}, y_{2j}, y_{3j}) \in \mathbb{R}^3$$
in which $y_{ij}$ is the payment of the asset in case state $s_i$ obtains, for $i = 1, 2, 3$. Suppose there exist only four assets on the market, with $L = \{y_1, y_2, y_3, y_4\}$. Let $x_j$ be the quantity of asset $y_j$ held, so that the vector of coefficients $x = (x_1, x_2, x_3, x_4) \in \mathbb{R}^4$ represents a portfolio formed by these assets. The quantities $x_j$ can be both positive and negative. In the first case we are long in the asset and we are paid $y_{ij}$ in case state $s_i$ obtains; when $x_j$ is negative we are instead short on the asset and we have to pay $y_{ij}$ when $s_i$ obtains. The payment of a portfolio $x \in \mathbb{R}^4$ in the different states is, therefore, given by the linear combination
$$x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4 \in \mathbb{R}^3$$
For instance, suppose
$$y_1 = (-1, 0, 2), \quad y_2 = (-3, 0, 3), \quad y_3 = (0, 2, 4), \quad y_4 = (-2, 0, 2) \tag{19.9}$$
Then, the portfolio $x = (1, 2, 1, 2)$ has payoff $y_1 + 2y_2 + y_3 + 2y_4 = (-11, 2, 16) \in \mathbb{R}^3$. N
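This payoff computation is a single matrix-vector product; a quick check (our own sketch with NumPy, not part of the original text):

import numpy as np

# payoff matrix of (19.9): rows = states (recession, stasis, growth), columns = assets
Y = np.array([[-1, -3, 0, -2],
              [ 0,  0, 2,  0],
              [ 2,  3, 4,  2]])

x = np.array([1, 2, 1, 2])   # the portfolio of the example
print(Y @ x)                 # [-11  2 16]: the payoff in each state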
We call contingent claim any state contingent payoff $w \in \mathbb{R}^k$. A claim $w$ is replicable (in the market) if there exists a portfolio $x$ such that $w = \sum_{j=1}^n x_j y_j$. In words, replicable contingent claims are the state contingent payoffs that, tomorrow, can be attained by trading, today, primary assets. The market $W$ is the vector subspace of $\mathbb{R}^k$ consisting of all replicable contingent claims, that is,
$$W = \operatorname{span} L$$
The market is complete if $W = \mathbb{R}^k$: if so, all contingent claims are replicable. Otherwise, the market is incomplete. In view of Example 85, completeness of the market amounts to the replicability of the $k$ Arrow (or pure) contingent claims $e^i \in \mathbb{R}^k$ that pay out one euro if state $s_i$ obtains and zero otherwise. These important claims uniquely identify states.

Example 895 In the previous example the market generated by the four primary assets (19.9) is easily seen to be complete. On the other hand, suppose that only the first two assets are available, that is, $L = \{y_1, y_2\}$. Then, $W = \operatorname{span} L = \{(x, 0, y) : x, y \in \mathbb{R}\}$, and so the market is now incomplete. Indeed, it is not possible to replicate contingent claims that feature non-zero payments when state $s_2$ obtains. N
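Completeness amounts to the payoff matrix having rank equal to the number of states $k$; a minimal sketch of the check (ours, not part of the original text):

import numpy as np

Y = np.array([[-1, -3, 0, -2],
              [ 0,  0, 2,  0],
              [ 2,  3, 4,  2]])   # the four assets of (19.9), three states

print(np.linalg.matrix_rank(Y))          # 3 = k: the market is complete
print(np.linalg.matrix_rank(Y[:, :2]))   # 2 < 3: with y1 and y2 only, incomplete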

19.5.2 Market value

The payoff operator $R : \mathbb{R}^n \to \mathbb{R}^k$ given by
$$R(x) = \sum_{j=1}^n x_j y_j$$
is the linear operator that describes the contingent claim determined by portfolio $x$. In other words, $R_i(x)$ is the payoff of portfolio $x$ if state $s_i$ obtains. Clearly, $W = \operatorname{Im} R$ and so the rank $\rho(R)$ of the linear operator $R : \mathbb{R}^n \to \mathbb{R}^k$ is the dimension of the market $W$.
To derive the matrix representation of the payoff operator $R$, consider the payoff matrix
$$Y = (y_{ij}) = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n} \\ y_{21} & y_{22} & \cdots & y_{2n} \\ \vdots & \vdots & & \vdots \\ y_{k1} & y_{k2} & \cdots & y_{kn} \end{bmatrix}$$
It has $k$ rows (states) and $n$ columns (assets), where entry $y_{ij}$ represents the payoff of primary asset $y_j$ in state $s_i$. In words, $Y$ is the matrix rendering of the collection $L$ of primary assets. It is easy to see that the payoff operator $R : \mathbb{R}^n \to \mathbb{R}^k$ can be represented as
$$R(x) = Yx$$
The payoff matrix $Y$ is thus the matrix associated with the operator $R$. Its rank is then the dimension of the market $W$ (see Section 13.4.2).
In a frictionless market, the (market) value
$$v(x) = p \cdot x = \sum_{j=1}^n p_j x_j$$
of a portfolio $x$ is the cost, today, of the market operations it requires.2 The (market) value function $v : \mathbb{R}^n \to \mathbb{R}$ is the linear function that assigns to each portfolio $x$ its value $v(x)$. In particular, the value of primary assets is their price. For, recalling that the primary asset $y_j$ is identified by the portfolio $e^j$, we have
$$v(e^j) = p \cdot e^j = p_j \tag{19.10}$$
Note that it is the frictionless nature of the market that ensures the linearity of the value function. For instance, if there are transaction costs and so the price of asset $y_j$ depends on the traded quantity – e.g., $v(2e^j) < 2p_j$ – then the value function is no longer linear.

2 Since there are no restrictions to trade, and so it is possible to go long or short on assets, to be precise $v(x)$ is actually a cost if positive, but a benefit if negative.

19.5.3 Law of one price

The Law of one price is a fundamental property of a financial market.

Definition 896 The financial market $(L, p)$ satisfies the Law of one price (LOP) if, for all portfolios $x, x' \in \mathbb{R}^n$,
$$R(x) = R(x') \implies v(x) = v(x') \tag{19.11}$$

In words, portfolios that induce the same contingent claim must share the same market value. Indeed, the contingent claims that portfolios determine are all that matters about them: portfolios are just instruments to achieve such claims. If two portfolios inducing the same contingent claim had different market values, a (sure) saving opportunity would be missed in the market. The LOP requires that the financial market take advantage of any such opportunity.
Since $W = \operatorname{Im} R$, we have $R(x) = R(x')$ if and only if $x, x' \in R^{-1}(w)$ for some $w \in W$. The LOP can then be equivalently stated as follows: given any replicable claim $w \in W$,
$$x, x' \in R^{-1}(w) \implies v(x) = v(x') \tag{19.12}$$
All portfolios $x$ that replicate a contingent claim $w$ thus share the same value $v(x)$. It is then natural to regard such common value as the price of the claim.

Definition 897 The price $p_w$ of a replicable contingent claim $w \in W$ is the value of a replicating portfolio $x \in R^{-1}(w)$, that is, $p_w = v(x)$ where $w = R(x)$.

In words, $p_w$ is the market cost $v(x)$ incurred today to form a portfolio $x$ that tomorrow will ensure the contingent claim $w$, that is, $w = R(x)$. By the form (19.12) of the LOP, the definition is well posed: it is immaterial which specific replicating portfolio $x$ is considered to determine the price $p_w$. The LOP thus makes it possible to price all replicable claims.
For primary assets we get back to (19.10), that is, $p_j = v(e^j)$. In general, we have
$$p_w = v(x) = \sum_{j=1}^n p_j x_j \qquad \forall x \in R^{-1}(w)$$
The price of a contingent claim in the market is thus a linear combination of the prices of the primary assets held in any replicating portfolio, with the assets' weights in such portfolio as coefficients.

Example 898 (i) The portfolio $x = (c, \dots, c)$ consisting of $c$ units of each primary asset replicates the contingent claim $w = R(x) = c \sum_{j=1}^n y_j$. We have $p_w = c \sum_{j=1}^n p_j$. (ii) The portfolio $x = (p_1, \dots, p_n)$, in which the holding of each primary asset is proportional to its market price, replicates the contingent claim $w = R(x) = \sum_{j=1}^n p_j y_j$. We have $p_w = \sum_{j=1}^n p_j^2$. N

In sum, the LOP makes it possible to establish a first pricing formula
$$p_w = \sum_{j=1}^n p_j x_j \qquad \forall x \in R^{-1}(w) \tag{19.13}$$
which prices all replicable contingent claims in the market, starting from the market prices of primary assets.

19.5.4 Pricing rules

In a market that satisfies the LOP, the previous definition allows us to define the pricing rule $f : W \to \mathbb{R}$ as the function that associates to each replicable contingent claim $w \in W$ its price $p_w$, that is,
$$f(w) = p_w$$
The next result is a fundamental consequence of the LOP.

Theorem 899 Suppose the financial market $(L, p)$ satisfies the LOP. Then, the pricing rule $f : W \to \mathbb{R}$ is linear.

Proof First observe that, by the LOP, $v = f \circ R$, that is, $v(x) = f(R(x))$ for each $x \in \mathbb{R}^n$. Let us prove the linearity of $f$. Let $w, w' \in W$ and $\alpha, \beta \in \mathbb{R}$. We want to show that $f(\alpha w + \beta w') = \alpha f(w) + \beta f(w')$. Since $W = \operatorname{Im} R$, there exist vectors $x, x' \in \mathbb{R}^n$ such that $R(x) = w$ and $R(x') = w'$. By Definition 897, $p_w = v(x)$ and $p_{w'} = v(x')$. By the linearity of $R$ and $v$, we then have
$$f(\alpha w + \beta w') = f(\alpha R(x) + \beta R(x')) = f(R(\alpha x + \beta x')) = v(\alpha x + \beta x') = \alpha v(x) + \beta v(x') = \alpha p_w + \beta p_{w'} = \alpha f(w) + \beta f(w')$$
The function $f : W \to \mathbb{R}$ is thus linear on $W$.

The fact that the linearity of the pricing rule characterizes the (frictionless) financial markets in which the LOP holds is a remarkable result, upon which modern asset pricing theory relies. It makes it possible to price contingent claims in the market in terms of other contingent claims, thus generalizing formula (19.13). For, suppose a contingent claim $w$ can be written as a linear combination of some replicable contingent claims, that is, $w = \sum_{j=1}^m \alpha_j w_j$. Then $w$ is replicable, with
$$p_w = f(w) = f\left(\sum_{j=1}^m \alpha_j w_j\right) = \sum_{j=1}^m \alpha_j f(w_j) = \sum_{j=1}^m \alpha_j p_{w_j} \tag{19.14}$$
Formula (19.13) is the special case where the contingent claims $w_j$ are primary assets and their weights are the portfolio ones. In general, it may be easier (e.g., more natural from a financial standpoint) to express a contingent claim in terms of other contingent claims rather than in terms of primary assets. The pricing formula
$$p_w = \sum_{j=1}^m \alpha_j p_{w_j} \tag{19.15}$$
prices contingent claims when they are expressed in terms of other contingent claims.

Inspection of the proof of Theorem 899 shows that the pricing rule inherits its linearity from that of the value function, which in turn depends on the frictionless nature of the financial market. We conclude that, in the final analysis, the pricing rule is linear because the financial market is frictionless. Whether or not the market is complete is, instead, irrelevant.

19.5.5 Pricing kernels

Much more is true, however. Indeed, Riesz's Theorem (in its version for subspaces, Theorem 638, since the market $W$ is not necessarily complete) leads to the following key representation result for the pricing rule.

Theorem 900 Suppose the financial market $(L, p)$ satisfies the LOP. Then, there exists a unique vector $\pi \in W$ such that
$$f(w) = \pi \cdot w \qquad \forall w \in W \tag{19.16}$$

Proof By Theorem 899, the pricing rule $f : W \to \mathbb{R}$ is linear. By Theorem 638, there exists a unique vector $\pi \in W$ such that $f(w) = \pi \cdot w$ for every $w \in W$.

The representing vector $\pi$ is called the pricing kernel. When the market is complete, $\pi \in \mathbb{R}^k$. In this case we have $\pi_i = p_{e^i}$, where $p_{e^i}$ is the price of the Arrow contingent claim $e^i$; indeed, by (19.16)
$$p_{e^i} = f(e^i) = \pi \cdot e^i = \pi_i$$
In words, the $i$-th component $\pi_i$ of the pricing kernel is the price of the Arrow contingent claim that corresponds to state $s_i$. That is, $\pi_i$ is the cost of having, for sure, one euro tomorrow if state $s_i$ obtains (and zero otherwise).
As a result, when the market is complete the price of a contingent claim $w$ is the weighted average
$$p_w = f(w) = \pi \cdot w = \sum_{i=1}^k \pi_i w_i \tag{19.17}$$
of its payments in the different states, each state weighted according to how much it costs today to have one euro tomorrow in that state. Consequently, the knowledge of the pricing kernel (i.e., of the prices of the Arrow contingent claims) makes it possible to price all contingent claims in the market via the pricing formula
$$p_w = \sum_{i=1}^k \pi_i w_i \tag{19.18}$$
The earlier pricing formulas (19.13) and (19.15) require, to price each claim, the knowledge of replicating portfolios or of the prices of some other contingent claims. In contrast, the pricing formula (19.18) only requires a single piece of information, the pricing kernel, to price all claims. In particular, for primary assets it takes the form $p_j = \sum_{i=1}^k \pi_i y_{ij}$.

Example 901 In the three-state economy of Example 894, there are three Arrow contingent
claims e1 , e2 , and e3 . Suppose the today market price of having tomorrow one euro in the
recession state (and zero otherwise) is higher than in the stasis state, which is in turn higher
than in the growth state, say pe1 = 3, pe2 = 2, and pe3 = 1. Then, the pricing kernel is
= (3; 2; 1) and the pricing formula (19.18) becomes pw = 3w1 + 2w2 + w3 for all w 2 W .
For instance, the price of the contingent claim w = (2; 1; 4) is pw = 12. N
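Formula (19.18) is just an inner product. The sketch below (ours, not part of the original text) prices the claim of Example 901 via the kernel and, under the additional assumption that the primary asset prices are kernel-consistent, recovers the same price from a replicating portfolio via formula (19.13).

import numpy as np

kernel = np.array([3.0, 2.0, 1.0])   # prices of the Arrow claims e^1, e^2, e^3
w = np.array([2.0, 1.0, 4.0])        # claim to be priced

print(kernel @ w)                    # 12.0, as in Example 901

# cross-check in the complete market (19.9), assuming kernel-consistent prices
Y = np.array([[-1.0, -3.0, 0.0, -2.0],
              [ 0.0,  0.0, 2.0,  0.0],
              [ 2.0,  3.0, 4.0,  2.0]])
p = kernel @ Y                               # p_j = sum_i kernel_i * y_ij
x = np.linalg.lstsq(Y, w, rcond=None)[0]     # one portfolio replicating w
print(np.allclose(Y @ x, w), p @ x)          # True 12.0: formula (19.13) agrees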

19.5.6 Arbitrage

A portfolio $x \in \mathbb{R}^n$ is an arbitrage if either of the following conditions holds:3
$$\text{I} \; \begin{cases} Yx \geq 0 \\ p \cdot x < 0 \end{cases} \qquad \qquad \text{II} \; \begin{cases} Yx > 0 \\ p \cdot x \leq 0 \end{cases}$$
A portfolio that satisfies condition I has a strictly negative market value and, nevertheless, ensures a positive payment in all states. On the other hand, a portfolio that satisfies condition II has a negative market value and, nevertheless, a strictly positive payoff in all states. Well-functioning financial markets should be able to take advantage of any such opportunity of a sure gain, and so they should feature no arbitrage portfolios.
In this section we will study such well-functioning markets. In particular, in a market without arbitrages I we have
$$R(x) \geq 0 \implies v(x) \geq 0 \qquad \forall x \in \mathbb{R}^n \tag{19.19}$$
while without arbitrages II we have
$$R(x) > 0 \implies v(x) > 0 \qquad \forall x \in \mathbb{R}^n \tag{19.20}$$

The first no arbitrage condition is enough to ensure that the market satisfies the LOP.

Lemma 902 A financial market $(L, p)$ that has no arbitrages I satisfies the LOP.

Proof By applying (19.19) to the portfolio $-x$, we have
$$R(-x) \geq 0 \implies v(-x) \geq 0 \qquad \forall x \in \mathbb{R}^n$$
that is,
$$R(x) \leq 0 \implies v(x) \leq 0 \qquad \forall x \in \mathbb{R}^n$$
Along with (19.19), this implies
$$R(x) = 0 \implies v(x) = 0 \qquad \forall x \in \mathbb{R}^n$$
Let $x$ and $x'$ be two portfolios such that $R(x) = R(x')$. The linearity of $R$ implies $R(x - x') = 0$, and so $v(x - x') = 0$, i.e., $v(x') = v(x)$.

Consider a complete market, that is, $W = \mathbb{R}^k$. Thanks to the lemma, the no arbitrage condition (19.19) implies that contingent claims are priced according to formula (19.16). But much more is true: under this no arbitrage condition the vector $\pi$ is positive, and so the pricing rule becomes linear and increasing. Better claims command higher market prices.

Proposition 903 A complete financial market $(L, p)$, with $p \neq 0$, satisfies the no arbitrage condition (19.19) if and only if the pricing rule is linear and increasing, that is, there exists a unique vector $\pi \in \mathbb{R}^k_+$ such that
$$f(w) = \pi \cdot w \qquad \forall w \in W \tag{19.21}$$

3 $Yx > 0$ means $(Yx)_i > 0$ for each $i = 1, \dots, k$.

Proof "If". Let $R(x) \geq 0$. Then, $v(x) = f(R(x)) = \pi \cdot R(x) \geq 0$ since $\pi \geq 0$ by hypothesis. "Only if". Since the market is complete, we have $W = \operatorname{Im} R = \mathbb{R}^k$. By Lemma 902, the LOP holds and so $f$ is linear (Theorem 899). We need to show that $f$ is increasing. Since $f$ is linear, this amounts to showing that $f$ is positive, i.e., that $w \geq 0$ implies $f(w) \geq 0$. Let $w \in \mathbb{R}^k$ with $w \geq 0$. Being $\operatorname{Im} R = \mathbb{R}^k$, there exists $x \in \mathbb{R}^n$ such that $R(x) = w$. We thus have $R(x) = w \geq 0$, and so (19.19) implies $v(x) \geq 0$. Hence, $f(w) = f(R(x)) = v(x) \geq 0$. We conclude that the linear function $f$ is positive, and so increasing. By the monotone version of Riesz's Theorem (Proposition 641), there exists a positive vector $\pi \in \mathbb{R}^k_+$ such that $f(z) = \pi \cdot z$ for every $z \in \mathbb{R}^k$.4

The result becomes sharper when the market also satisfies the second no arbitrage condition (19.20): the vector $\pi$ then becomes strictly positive, so that the pricing rule becomes linear and strictly increasing. Strictly better claims thus command strictly higher market prices. But, as both the no arbitrage conditions (19.19) and (19.20) are compelling, a well-functioning market should actually satisfy both of them. We thus have the following important result (as its demanding name shows).5

Theorem 904 (Fundamental Theorem of Finance) A complete financial market $(L, p)$, with $p \neq 0$, satisfies the no arbitrage conditions (19.19) and (19.20) if and only if the pricing rule is linear and strictly increasing, that is, there exists a unique vector $\pi \in \mathbb{R}^k_{++}$ such that
$$f(w) = \pi \cdot w \qquad \forall w \in W \tag{19.22}$$

Proof "If". Let $R(x) > 0$. Then, $v(x) = f(R(x)) = \pi \cdot R(x) > 0$ because $\pi \in \mathbb{R}^k_{++}$ by hypothesis. "Only if". By Proposition 903, $f$ is linear and increasing. We need to show that $f$ is strictly increasing. Since $f$ is linear, this amounts to showing that $f$ is strictly positive, i.e., that $w > 0$ implies $f(w) > 0$. Let $w \in \mathbb{R}^k$ with $w > 0$. Being $\operatorname{Im} R = \mathbb{R}^k$, there exists $x \in \mathbb{R}^n$ such that $R(x) = w$. We thus have $R(x) = w > 0$, and so (19.20) implies $v(x) > 0$. Hence, $f(w) = f(R(x)) = v(x) > 0$. We conclude that the linear function $f$ is strictly positive, and so strictly increasing. By the (strict) monotone version of Riesz's Theorem (Proposition 641), there exists a strictly positive vector $\pi \in \mathbb{R}^k_{++}$ such that $f(z) = \pi \cdot z$ for every $z \in \mathbb{R}^k$.

The price of any replicable contingent claim $w$ is thus the weighted average
$$p_w = f(w) = \pi \cdot w = \sum_{i=1}^k \pi_i w_i$$
of its payments in the different states, with strictly positive weights. If market prices do not have this form, the market is not exhausting all arbitrage opportunities: some sure gains are still possible.

4 The vector $\pi$ in (19.22) is unique because the market is complete, and so the vector in Proposition 641 is unique.
5 We refer interested readers to Cochrane (2005) and Ross (2005).
Part VI

Differential calculus
Chapter 20

Derivatives

20.1 Marginal analysis


Consider a function $c : \mathbb{R}_+ \to \mathbb{R}$ whose value $c(x)$ represents the cost (say, in euros) required to produce the quantity $x$ of an output. Suppose that the producer wants to evaluate the impact on costs of a variation $\Delta x$ in the output produced. For example, if $x = 100$ and $\Delta x = 3$, he has to evaluate the impact on costs of a positive variation – that is, of an increment – of 3 units of output with respect to the current production of 100 units.
The output variation $\Delta x$ determines a variation of the cost
$$\Delta c = c(x + \Delta x) - c(x)$$
If $\Delta x$ is a non-zero discrete variation, that is,
$$\Delta x \in \{\dots, -3, -2, -1, 1, 2, 3, \dots\}$$
the average cost of each additional unit of output in $\Delta x$ is given by
$$\frac{\Delta c}{\Delta x} = \frac{c(x + \Delta x) - c(x)}{\Delta x} \tag{20.1}$$
The ratio $\Delta c / \Delta x$, called the difference quotient, is fundamental in evaluating the impact on the cost of the variation $\Delta x$ of the quantity produced. Let us illustrate it with the following table, in which $c(x)/x$ denotes the average cost (in euros) of each unit produced:
x     c(x)    c(x)/x        Δc/Δx
100   4,494   44.94         -
102   4,500   ≈ 44.11765    (4,500 - 4,494)/2 = 3
105   4,510   ≈ 42.95238    (4,510 - 4,500)/3 ≈ 3.33
106   4,515   ≈ 42.59434    (4,515 - 4,510)/1 = 5

As production increases, the average cost decreases while the difference quotient increases. This means that the average cost of each additional unit increases. Therefore, increasing production is, "at the margin", more and more expensive for the producer. In particular, the last additional unit has determined an increase in costs of 5 euros: for the producer such an increase in production is profitable if (and only if) there is an at least equal increase in the difference quotient of the return $R(x)$, that is, in the return of each additional unit:
$$\frac{\Delta R}{\Delta x} = \frac{R(x + \Delta x) - R(x)}{\Delta x} \tag{20.2}$$
Let us add to the table two columns with the returns and their difference quotients:

x     c(x)    c(x)/x        Δc/Δx    R(x)    ΔR/Δx
100   4,494   44.94         -        5,000   -
102   4,500   ≈ 44.11765    3        5,100   (5,100 - 5,000)/2 = 50
105   4,510   ≈ 42.95238    ≈ 3.33   5,200   (5,200 - 5,100)/3 ≈ 33.33
106   4,515   ≈ 42.59434    5        5,204   (5,204 - 5,200)/1 = 4

The first two increases in production are profitable for the producer: they determine a difference quotient of the returns equal to 50 euros and 33.33 euros, respectively, versus a difference quotient of the costs equal to 3 euros and 3.33 euros, respectively. After the last increment in production, the difference quotient of the returns decreases to only 4 euros, lower than the corresponding value of 5 euros of the difference quotient of the costs. The producer will therefore find it profitable to increase production to 105 units, but not to 106. That this choice is correct is confirmed by the trend of the profit $\pi(x) = R(x) - c(x)$, which for convenience we add to the table:

x     c(x)    c(x)/x        Δc/Δx    R(x)    ΔR/Δx     π(x)
100   4,494   44.94         -        5,000   -         506
102   4,500   ≈ 44.11765    3        5,100   50        600
105   4,510   ≈ 42.95238    ≈ 3.33   5,200   ≈ 33.33   690
106   4,515   ≈ 42.59434    5        5,204   4         689

The producer's profit continues to increase up to the output level 105, but decreases in case of a further increase to 106. The "incremental" information, quantified by difference quotients such as (20.1) and (20.2), is therefore key for the producer's ability to assess his production decisions. In contrast, the information on average costs or on average returns is, for instance, completely irrelevant (in our example it is actually misleading: the decrease in average costs can lead to wrong decisions). In the economics jargon, the producer should decide based on what happens at the margin, not on average.
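The difference quotients in the tables above take a few lines to recompute; here is a small sketch of the calculation (ours, not part of the original text):

# production levels, costs and returns from the tables above
x = [100, 102, 105, 106]
c = [4494, 4500, 4510, 4515]
R = [5000, 5100, 5200, 5204]

for i in range(1, len(x)):
    dx = x[i] - x[i - 1]
    dc = (c[i] - c[i - 1]) / dx   # difference quotient of costs, (20.1)
    dR = (R[i] - R[i - 1]) / dx   # difference quotient of returns, (20.2)
    print(x[i], dc, dR, R[i] - c[i])
# prints: 102 3.0 50.0 600 / 105 3.33.. 33.33.. 690 / 106 5.0 4.0 689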

Until now we have considered the ratio (20.1) for discrete variations $\Delta x$. Idealizing, let us consider arbitrary non-zero variations $\Delta x \in \mathbb{R}$ and, in particular, smaller and smaller variations, that is, $\Delta x \to 0$. The limit $c'(x)$ of the difference quotients is given by
$$c'(x) = \lim_{\Delta x \to 0} \frac{c(x + \Delta x) - c(x)}{\Delta x} \tag{20.3}$$

When it exists and is finite, $c'(x)$ is called the marginal cost at $x$: it indicates the variation in cost determined by infinitesimal variations of output with respect to the "initial" quantity $x$.
This idealization makes it possible to frame marginal analysis within differential calculus, a fundamental mathematical theory that will be the subject matter of the chapters of this part of the book. Because it formalizes marginal analysis, differential calculus pervades economics.

20.2 Derivatives

For a function $f : (a, b) \to \mathbb{R}$, the difference quotient (20.1) takes the form
$$\frac{\Delta f}{\Delta x} = \frac{f(x + h) - f(x)}{(x + h) - x} = \frac{f(x + h) - f(x)}{h} \tag{20.4}$$
where $\Delta x = h$ denotes a generic variation, positive if $h > 0$ and negative if $h < 0$.1

Definition 905 A function $f : (a, b) \to \mathbb{R}$ is said to be derivable at a point $x_0 \in (a, b)$ if the limit
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \tag{20.5}$$
exists and is finite. This limit is called the derivative of $f$ at $x_0$, and is denoted by $f'(x_0)$.

Therefore, the derivative is nothing but the limit of the difference quotient, when this limit exists and is finite. Other notations used for the derivative at $x_0$ are
$$Df(x_0) \quad \text{and} \quad \frac{df}{dx}(x_0)$$
The notation $f'(x_0)$, which we will mostly use, is probably the most convenient; we will also use the other two notations whenever convenient.2
Note the double requirement that the limit exist and be finite: if at a point the limit of the difference quotient (20.5) exists but is infinite, the function does not have a derivative at that point (see Example 909).

A few remarks are in order. (i) Differential calculus, of which derivatives are a first key notion, originated in the works of Leibniz and Newton in the second part of the seventeenth century. Newton was motivated by physics, which indeed features a classic example of a derivative: let $t$ be time and $s$ the distance covered by a moving object. Suppose the function $s(t)$ indicates the total distance covered until time $t$. The difference quotient $\Delta s / \Delta t$ is the average velocity over a time interval of length $\Delta t$. Therefore, its derivative at a point $t_0$ can be interpreted as the instantaneous velocity at $t_0$. If space is measured in kilometers and time in hours, the velocity is measured in km/h, that is, in "kilometers per hour" (as speedometers do).
(ii) In applications, the dependent and independent variables $y$ and $x$ that appear in a function $y = f(x)$ take a concrete meaning and are both evaluated in terms of a unit of

1 Since the domain $(a, b)$ is an open interval, for $h$ sufficiently small we have $x + h \in (a, b)$.
2 Different notations for the same mathematical object can be convenient in different contexts. For this reason, it may be important to have several notations at hand (provided they are then used consistently).

measure (€, $, kg, liters, years, miles, parsecs, etc.): if we denote by $T$ the unit of measure of the dependent variable $y$ and by $S$ that of the independent variable $x$, the difference quotient $\Delta y / \Delta x$ (and so the derivative, if it exists) is then expressed in the unit of measure $T/S$. For instance, if in the initial example the cost is expressed in euros and the quantity produced in quintals, the difference quotient (20.1) is expressed in €/q, that is, in "euros per quintal".
(iii) The notation $df/dx$ (or the equivalent $dy/dx$) is meant to suggest that the derivative is a limit of ratios.3 Note, however, that $df/dx$ is only a symbol, not a true ratio (indeed, it is the limit of ratios). Nevertheless, heuristically it is often treated as a true ratio (see, for example, the remark on the chain rule at the end of Section 20.9). This can be a useful trick to help our intuition, as long as what is found is then checked formally.
(iv) The terminology "derivable at" is not so common, but its motivation will become apparent in Section 20.12.2. In any case, a function $f : (a, b) \to \mathbb{R}$ which is derivable at each point of $(a, b)$ is called derivable, without any further qualification.

20.3 Geometric interpretation

The derivative has an important geometric interpretation. Given a function $f : (a, b) \to \mathbb{R}$ and a point $x_0 \in (a, b)$, consider the straight line passing through the points $(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$, where $h \neq 0$ is a variation. Assume, for simplicity, that $h > 0$ (similar considerations hold for $h < 0$):

[Figure: the secant line through the points $(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$ on the graph of $f$.]

The equation of such a straight line is obtained by solving the system
$$\begin{cases} f(x_0) = m x_0 + q \\ f(x_0 + h) = m(x_0 + h) + q \end{cases}$$
A simple calculation gives
$$y = f(x_0) + \frac{f(x_0 + h) - f(x_0)}{h}(x - x_0) \tag{20.6}$$
3 This notation is due to Leibniz, while the $f'$ notation is due to Lagrange.

which is the equation of the sought-after straight line passing through the points $(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$. Taking the limit as $h \to 0$, we get
$$y = f(x_0) + f'(x_0)(x - x_0) \tag{20.7}$$
that is, the equation of the straight line which is tangent to the graph of $f$ at the point $(x_0, f(x_0)) \in \operatorname{Gr} f$.
As $h$ tends to $0$, the straight line (20.6) thus tends to the tangent (straight) line, whose slope is the derivative $f'(x_0)$. Graphically:

[Figure: the tangent line to the graph of $f$ at the point $(x_0, f(x_0))$.]

In sum, geometrically the derivative can be regarded as the slope of the tangent line at the point $(x_0, f(x_0))$. In turn, the tangent line can be regarded as a local approximation of the function $f$ at $x_0$, a key observation that will be developed through the fundamental notion of differential (Section 20.12).

Example 906 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2 - 1$. At a point $x \in \mathbb{R}$ we have
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{\left[(x + h)^2 - 1\right] - \left(x^2 - 1\right)}{h} = \lim_{h \to 0} \frac{h^2 + 2xh}{h} = \lim_{h \to 0} (h + 2x) = 2x$$
The derivative exists at each $x \in \mathbb{R}$ and is given by $2x$. For example, the derivative at $x = 1$ is $f'(1) = 2$, with tangent line
$$y = f(1) + f'(1)(x - 1) = 2x - 2$$

at the point (1; 0) 2 Gr f . Graphically:

y
3

0
-1 O 1 x

-1

-2
-2 -1 0 1 2 3

The derivative at the origin is f 0 (0) = 0, with tangent line


y = f (0) + f 0 (0) x = 1
at the point (0; 1) 2 Gr f . Graphically:

y
3

0
-1 O 1 x

-1

-2
-2 -1 0 1 2 3

In this case the tangent line is horizontal (constant) and is always equal to 1. N
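Numerically, the difference quotient of this example visibly approaches $f'(1) = 2$ as $h$ shrinks; a quick sketch (ours, not part of the original text):

def f(x):
    return x**2 - 1

x0 = 1.0
for h in (0.1, 0.01, 0.001, 0.0001):
    print(h, (f(x0 + h) - f(x0)) / h)   # 2.1, 2.01, 2.001, 2.0001 -> tends to f'(1) = 2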

Example 907 Consider a constant function $f : \mathbb{R} \to \mathbb{R}$, that is, $f(x) = k$ for every $x \in \mathbb{R}$. For every $h \neq 0$ we have
$$\frac{f(x + h) - f(x)}{h} = \frac{k - k}{h} = 0$$
and therefore $f'(x) = 0$ for every $x \in \mathbb{R}$. The derivative of a constant function is zero. N

Example 908 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$

with graph:

[Figure: the hyperbola $y = 1/x$, with a discontinuity at the origin.]

At a point $x \neq 0$ we have
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{\frac{1}{x + h} - \frac{1}{x}}{h} = \lim_{h \to 0} \frac{x - (x + h)}{h x (x + h)} = \lim_{h \to 0} \frac{-1}{x(x + h)} = -\frac{1}{x^2}$$
The derivative exists at each $x \neq 0$ and is given by $-x^{-2}$. For example, the derivative at $x = 1$ is $f'(1) = -1$, and at $x = -2$ it is $f'(-2) = -1/4$.
If we consider the origin $x = 0$ we have, for $h \neq 0$,
$$\frac{f(x + h) - f(x)}{h} = \frac{\frac{1}{h} - 0}{h} = \frac{1}{h^2}$$
so that
$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = +\infty$$
The limit is not finite and hence the function does not have a derivative at $x = 0$. Recall that the function is not continuous at this point (Example 477). N

Example 909 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \sqrt{x} & \text{if } x \geq 0 \\ -\sqrt{-x} & \text{if } x < 0 \end{cases}$$

with graph:

[Figure: the graph of $f$, increasing through the origin with a vertical tangent at $x = 0$.]

Take $x = 0$. For $h > 0$ we have
$$\frac{f(x + h) - f(x)}{h} = \frac{\sqrt{h}}{h} = \frac{1}{\sqrt{h}} \to +\infty$$
and, for $h < 0$, we have
$$\frac{f(x + h) - f(x)}{h} = \frac{-\sqrt{-h}}{h} = \frac{\sqrt{-h}}{-h} = \frac{1}{\sqrt{-h}} \to +\infty$$
Therefore,
$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = +\infty$$
Since the limit is not finite, the function does not have a derivative at $x = 0$. Note that, differently from the previous example, the function is continuous at this point. N

20.4 Derivative function

Given a function $f : (a, b) \to \mathbb{R}$, the set $D \subseteq (a, b)$ of the points of the domain where $f$ is derivable is called the domain of derivability of $f$. In Examples 906 and 907 the domain of the function coincides with that of derivability. On the contrary, in Examples 908 and 909 the domain of the function is $\mathbb{R}$, while the domain of derivability is $\mathbb{R} \setminus \{0\}$.
We can now introduce a new function: the derivative function.

Definition 910 Let $f : (a, b) \to \mathbb{R}$ be a function with domain of derivability $D \subseteq (a, b)$. The function $f' : D \to \mathbb{R}$ that to each $x \in D$ associates the derivative $f'(x)$ is called the derivative function of $f$.

The derivative function $f'$ describes the derivative of $f$ at the different points where it exists, thus describing its overall behavior. In the examples previously discussed:

(i) for $f(x) = x^2 - 1$, the derivative function $f' : \mathbb{R} \to \mathbb{R}$ is given by $f'(x) = 2x$;

(ii) for $f(x) = k$, the derivative function $f' : \mathbb{R} \to \mathbb{R}$ is given by $f'(x) = 0$;

(iii) for $f(x) = 1/x = x^{-1}$, the derivative function $f' : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ is given by $f'(x) = -x^{-2}$.

The notion of derivative function puts in a bigger picture the computations that we did in the examples of the last section: to compute the derivative of a function $f$ at a generic point $x$ of the domain amounts to computing its derivative function $f'$. When we found that the derivative of $f(x) = x^2$ is, at any point $x \in \mathbb{R}$, given by $2x$, we actually found that its derivative function $f' : \mathbb{R} \to \mathbb{R}$ is given by $f'(x) = 2x$.

Example 911 Let $r : \mathbb{R}_+ \to \mathbb{R}$ be the return function and $c : \mathbb{R}_+ \to \mathbb{R}$ the cost function of a producer (see Section 18.1.4). The derivative function $r' : D \subseteq \mathbb{R}_+ \to \mathbb{R}$ is called the marginal return function, and the derivative function $c' : D \subseteq \mathbb{R}_+ \to \mathbb{R}$ is called the marginal cost function. Their economic interpretation should be, by now, clear. N

20.5 One-sided derivatives

Until now we have considered the two-sided limit (20.5) of the difference quotient. Sometimes it is useful to consider separately positive and negative variations $h$. To this end, we introduce the notions of right and left derivatives.

Definition 912 A function $f : (a, b) \to \mathbb{R}$ is said to be derivable from the right at the point $x_0 \in (a, b)$ if the one-sided limit
$$\lim_{h \to 0^+} \frac{f(x_0 + h) - f(x_0)}{h} \tag{20.8}$$
exists and is finite, and to be derivable from the left at $x_0 \in (a, b)$ if the one-sided limit
$$\lim_{h \to 0^-} \frac{f(x_0 + h) - f(x_0)}{h} \tag{20.9}$$
exists and is finite.

When it exists and is finite, the limit (20.8) is called the right derivative of $f$ at $x_0$, and it is denoted by $f'_+(x_0)$. Analogously, when it exists and is finite, the limit (20.9) is called the left derivative of $f$ at $x_0$, and it is denoted by $f'_-(x_0)$. Since a two-sided limit exists if and only if both one-sided limits exist and coincide (Proposition 445), we have:

Proposition 913 A function $f : (a, b) \to \mathbb{R}$ is derivable at $x_0 \in (a, b)$ if and only if it is derivable from both the right and the left, with $f'_+(x_0) = f'_-(x_0)$. In this case,
$$f'(x_0) = f'_+(x_0) = f'_-(x_0)$$



Example 914 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 1 - x^2 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0 \end{cases}$$
with graph:

[Figure: the graph of $f$, a downward parabola for $x \leq 0$ joined to the constant value $1$ for $x > 0$.]

It is easy to see that the function is derivable at each point $x \neq 0$, with
$$f'(x) = \begin{cases} -2x & \text{if } x < 0 \\ 0 & \text{if } x > 0 \end{cases}$$
On the other hand, at $0$ we have
$$f'_+(0) = \lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0^+} \frac{1 - 1}{h} = 0$$
$$f'_-(0) = \lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0^-} \frac{1 - h^2 - 1}{h} = \lim_{h \to 0^-} (-h) = 0$$
Therefore, by Proposition 913 the function is derivable also at $0$, with $f'(0) = 0$. In conclusion,
$$f'(x) = \begin{cases} -2x & \text{if } x \leq 0 \\ 0 & \text{if } x > 0 \end{cases}$$
N

Through one-sided derivatives we can classify two important classes of points where derivability fails. Specifically, a point $x_0$ of the domain of $f$ is called:

(i) a corner point if the right derivative and the left derivative exist but are different, i.e., $f'_+(x_0) \neq f'_-(x_0)$;

(ii) a cuspidal point (or a cusp) if the right and left limits of the difference quotient are infinite with different signs:
$$\lim_{h \to 0^+} \frac{f(x_0 + h) - f(x_0)}{h} = \pm\infty \quad \text{and} \quad \lim_{h \to 0^-} \frac{f(x_0 + h) - f(x_0)}{h} = \mp\infty$$

Example 915 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = |x|$, with graph:

[Figure: the V-shaped graph of $f(x) = |x|$, with a corner at the origin.]

At $x_0 = 0$ we have
$$\frac{f(x_0 + h) - f(x_0)}{h} = \frac{|h|}{h} = \begin{cases} 1 & \text{if } h > 0 \\ -1 & \text{if } h < 0 \end{cases}$$
The two-sided limit of the difference quotient does not exist at $0$, so the function is not derivable at $0$. Nevertheless, the one-sided derivatives exist at $0$. In particular,
$$f'_+(0) = \lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h} = 1 \quad ; \quad f'_-(0) = \lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = -1$$
The origin $x_0 = 0$ is, therefore, a corner point. The reader can check that the function is derivable at each point $x \neq 0$, with
$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$
N
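One-sided difference quotients are equally easy to probe numerically; a quick sketch (ours, not part of the original text):

f = abs
for h in (0.1, 0.01, 0.001):
    print((f(0 + h) - f(0)) / h, (f(0 - h) - f(0)) / (-h))   # 1.0 -1.0 at every step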

Example 916 The function
$$f(x) = \begin{cases} \sqrt{x} & \text{if } x \geq 0 \\ \sqrt{-x} & \text{if } x < 0 \end{cases}$$
has a cuspidal point at the origin $x = 0$, as we can see from its graph:

[Figure: the graph of $f(x) = \sqrt{|x|}$, with a cusp at the origin.]

N

We close by noting that the right and left derivative functions are defined in the same way, mutatis mutandis, as the derivative function. In Example 915, the one-sided derivative functions $f'_+ : \mathbb{R} \to \mathbb{R}$ and $f'_- : \mathbb{R} \to \mathbb{R}$ are given by
$$f'_+(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ -1 & \text{if } x < 0 \end{cases} \quad \text{and} \quad f'_-(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x \leq 0 \end{cases}$$

20.6 Derivability and continuity

A first important property of derivable functions is their continuity.

Proposition 917 A function $f : (a, b) \to \mathbb{R}$ derivable at a point $x_0 \in (a, b)$ is continuous at $x_0$.

Proof We have to prove that $\lim_{x \to x_0} f(x) = f(x_0)$. Since f is derivable at $x_0$, the limit of the difference quotient exists and is finite, and it is equal to $f'(x_0)$:

$$\lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h} = f'(x_0)$$

Let us rewrite the limit by setting $x = x_0 + h$, so that $h = x - x_0$. Observing that, as h tends to 0, x tends to $x_0$, we get:

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = f'(x_0)$$

Therefore, by the algebra of limits (Proposition 309) we have:

$$\lim_{x \to x_0} (f(x) - f(x_0)) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} (x - x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \cdot \lim_{x \to x_0} (x - x_0) = f'(x_0) \cdot 0 = 0$$

where the last equality holds since $f'(x_0)$ exists and is finite. We have thus proved that $\lim_{x \to x_0} (f(x) - f(x_0)) = 0$. On the other hand, again by the algebra of limits, we have:

$$0 = \lim_{x \to x_0} (f(x) - f(x_0)) = \lim_{x \to x_0} f(x) - \lim_{x \to x_0} f(x_0) = \lim_{x \to x_0} f(x) - f(x_0)$$

Therefore $\lim_{x \to x_0} f(x) = f(x_0)$, as desired.

Derivability at a point thus implies continuity at that point. The converse is false: the absolute value function $f(x) = |x|$ is continuous at $x = 0$ but is not derivable at that point (Example 915). In other words, continuity is a necessary, but not sufficient, condition for derivability.

Proposition 917, and the examples seen until now, allow us to identify five possible causes of non-derivability at a point x:

(i) f is not continuous at x (Example 908).

(ii) f has a corner point at x (Example 915).

(iii) f has a cuspidal point at x (Example 916).

(iv) a one-sided derivative of f exists at x but, on the other side, the limit of the difference quotient is $+\infty$ or $-\infty$; for example, the function

$$f(x) = \begin{cases} \sqrt{x} & \text{if } x \ge 0 \\ x & \text{if } x < 0 \end{cases}$$

is such that $f'_-(0) = 1$ and $\lim_{h \to 0^+} (f(0+h) - f(0))/h = +\infty$.

(v) f has a vertical tangent at x; for example, the function

$$f(x) = \begin{cases} \sqrt{x} & \text{if } x \ge 0 \\ -\sqrt{-x} & \text{if } x < 0 \end{cases}$$

seen in Example 909 has a vertical tangent at $x = 0$ because $\lim_{h \to 0} f(h)/h = +\infty$.

The five cases just identified are, however, not exhaustive: there are other sources of non-derivability. For example, the function

$$f(x) = \begin{cases} x \sin \dfrac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$

is continuous everywhere.4 At the origin $x_0 = 0$ it is, however, not derivable because the limit

$$\lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h} = \lim_{h \to 0} \frac{h \sin \frac{1}{h} - 0}{h} = \lim_{h \to 0} \sin \frac{1}{h}$$

does not exist. The origin is not a corner point and there is no vertical tangent at this point. The lack of derivability here is due to the fact that f has, in any neighborhood of the origin, infinitely many oscillations, which are such that the difference quotient $\sin(1/h)$ oscillates infinitely many times between $-1$ and $1$. Note that in this example the one-sided derivatives $f'_+(0)$ and $f'_-(0)$ do not exist either.

4 Indeed, $\lim_{x \to 0} x \sin(1/x) = 0$ because $|\sin(1/x)| \le 1$ and so $-|x| \le x \sin(1/x) \le |x|$.

Terminology When f is derivable at all the points of (a, b), derivable from the right at a, and derivable from the left at b, we say that it is derivable on the closed interval [a, b]. It is immediate to see that f is then also continuous on this interval.

20.7 Derivatives of elementary functions

Proposition 918 The power function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^n$ for $n \ge 1$ is derivable at each $x \in \mathbb{R}$, with derivative function $f' : \mathbb{R} \to \mathbb{R}$ given by

$$f'(x) = n x^{n-1} \qquad (20.10)$$

For example, the function $f(x) = x^5$ has derivative function $f'(x) = 5x^4$ and the function $f(x) = x^3$ has derivative function $f'(x) = 3x^2$.
We give two proofs of this basic result.

Proof 1 By Newton's binomial formula, we have

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{(x+h)^n - x^n}{h} = \lim_{h \to 0} \frac{\sum_{k=0}^{n} \frac{n!}{k!(n-k)!} x^{n-k} h^k - x^n}{h}$$

$$= \lim_{h \to 0} \frac{x^n + n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + n x h^{n-1} + h^n - x^n}{h}$$

$$= \lim_{h \to 0} \left( n x^{n-1} + \frac{n(n-1)}{2} x^{n-2} h + \cdots + n x h^{n-2} + h^{n-1} \right) = n x^{n-1}$$

as claimed.

Proof 2 We establish (20.10) by induction, using the derivative of the product of functions (see Section 20.8). First, we show that the derivative of the function $f(x) = x$ is equal to 1. The limit of the difference quotient of f is

$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{x + h - x}{h} = \lim_{h \to 0} \frac{h}{h} = 1$$

Therefore $f'(x) = 1$, so (20.10) holds for $n = 1$. Suppose that (20.10) holds for $n - 1$ (induction hypothesis), that is,

$$D(x^{n-1}) = (n-1) x^{n-2}$$

Consider the function $x^n = x \cdot x^{n-1}$. Using the derivative of the product of functions (see (20.13) below) and the induction hypothesis, we have

$$D(x^n) = 1 \cdot x^{n-1} + x \cdot D(x^{n-1}) = x^{n-1} + x (n-1) x^{n-2} = (1 + n - 1) x^{n-1} = n x^{n-1}$$

that is, (20.10).
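Formula (20.10) can also be checked numerically; the following Python snippet (an illustration of ours, not the book's) compares the difference quotient of $x^n$ with the closed form $n x^{n-1}$:

n, x, h = 5, 2.0, 1e-7
quotient = ((x + h)**n - x**n) / h   # difference quotient of x^n
print(quotient, n * x**(n - 1))      # both close to 80.0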

Proposition 919 The exponential function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \alpha^x$, with $\alpha > 0$, is derivable at each $x \in \mathbb{R}$, with derivative function $f' : \mathbb{R} \to \mathbb{R}$ given by

$$f'(x) = \alpha^x \log \alpha$$

In particular, $de^x/dx = e^x$, that is, the derivative function of the exponential function is the exponential function itself. So, the exponential function equals its derivative function, a truly remarkable invariance property that gives the exponential function a special status in differential calculus.

Proof We have

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{\alpha^{x+h} - \alpha^x}{h} = \lim_{h \to 0} \frac{\alpha^x (\alpha^h - 1)}{h} = \alpha^x \lim_{h \to 0} \frac{\alpha^h - 1}{h} = \alpha^x \log \alpha$$

where the last equality follows from the basic limit (11.32).

Proposition 920 The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \sin x$ is derivable at each $x \in \mathbb{R}$, with derivative function $f' : \mathbb{R} \to \mathbb{R}$ given by

$$f'(x) = \cos x$$

Proof From the basic trigonometric formula $\sin(a+b) = \sin a \cos b + \cos a \sin b$, it follows that

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{\sin(x+h) - \sin x}{h} = \lim_{h \to 0} \frac{\sin x \cos h + \cos x \sin h - \sin x}{h}$$

$$= \lim_{h \to 0} \frac{\sin x (\cos h - 1) + \cos x \sin h}{h} = \sin x \lim_{h \to 0} \frac{\cos h - 1}{h} + \cos x \lim_{h \to 0} \frac{\sin h}{h} = \cos x$$

The last equality follows from the basic limits (11.31) and (11.30) for $\cos x$ and $\sin x$, respectively.

In a similar way it is possible to prove that the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \cos x$ is derivable at each $x \in \mathbb{R}$, with derivative function $f' : \mathbb{R} \to \mathbb{R}$ given by

$$f'(x) = -\sin x \qquad (20.11)$$

20.8 Algebra of derivatives

In Section 6.3.2 we studied the algebra of functions, that is, their sums, products and quotients. Let us now see how derivatives behave with respect to these operations. We begin with addition (and scalar multiplication).

Proposition 921 Let $f, g : (a,b) \to \mathbb{R}$ be two functions derivable at $x \in (a,b)$. The sum function $f + g : (a,b) \to \mathbb{R}$ is derivable at x, with

$$(f+g)'(x) = f'(x) + g'(x)$$

The result actually holds, more generally, for any linear combination $\alpha f + \beta g : (a,b) \to \mathbb{R}$, with $\alpha, \beta \in \mathbb{R}$:

$$(\alpha f + \beta g)'(x) = \alpha f'(x) + \beta g'(x) \qquad (20.12)$$

In particular, the derivative of $\alpha f(x)$ is $\alpha f'(x)$.

Proof We prove the result directly in the more general form (20.12). We have

$$(\alpha f + \beta g)'(x) = \lim_{h \to 0} \frac{(\alpha f + \beta g)(x+h) - (\alpha f + \beta g)(x)}{h} = \lim_{h \to 0} \frac{\alpha f(x+h) + \beta g(x+h) - \alpha f(x) - \beta g(x)}{h}$$

$$= \lim_{h \to 0} \left( \alpha \frac{f(x+h) - f(x)}{h} + \beta \frac{g(x+h) - g(x)}{h} \right) = \alpha \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + \beta \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} = \alpha f'(x) + \beta g'(x)$$

as desired.

Thus, the sum behaves in a simple manner with respect to derivatives: the "derivative of a sum" is the "sum of the derivatives".5 More subtle is the case of the product of functions.

Proposition 922 Let $f, g : (a,b) \to \mathbb{R}$ be two functions derivable at $x \in (a,b)$. The product function $fg : (a,b) \to \mathbb{R}$ is derivable at x, with

$$(fg)'(x) = f'(x) g(x) + f(x) g'(x) \qquad (20.13)$$

5 The converse does not hold: if the sum of two functions has a derivative, it is not necessarily true that the individual functions have a derivative (for example, the origin is a corner point of both $f(x) = |x|$ and $g(x) = -|x|$, but the sum $f + g$ is a constant function that has a derivative at every point of the real line). The same is true for the multiplication and division operations on functions.

Proof We have

$$(fg)'(x) = \lim_{h \to 0} \frac{(fg)(x+h) - (fg)(x)}{h} = \lim_{h \to 0} \frac{f(x+h) g(x+h) - f(x) g(x)}{h}$$

$$= \lim_{h \to 0} \frac{f(x+h) g(x+h) - f(x) g(x+h) + f(x) g(x+h) - f(x) g(x)}{h}$$

$$= \lim_{h \to 0} \left( g(x+h) \frac{f(x+h) - f(x)}{h} + f(x) \frac{g(x+h) - g(x)}{h} \right)$$

$$= \lim_{h \to 0} g(x+h) \cdot \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + f(x) \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} = g(x) f'(x) + f(x) g'(x)$$

as desired. In the last step we have $\lim_{h \to 0} g(x+h) = g(x)$ thanks to the continuity of g, which is ensured by its derivability.
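The product rule can be double-checked symbolically. The sketch below uses the SymPy library (an outside tool we assume available; the book does not use it) on the pair $f(x) = x^3$, $g(x) = \sin x$:

import sympy as sp

x = sp.symbols('x')
f, g = x**3, sp.sin(x)
lhs = sp.diff(f * g, x)                       # derivative of the product
rhs = sp.diff(f, x) * g + f * sp.diff(g, x)   # product rule (20.13)
print(sp.simplify(lhs - rhs))                 # prints 0: the two coincide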

The derivative of the product, therefore, is not the product of the derivatives, but is given by the more subtle product rule (20.13). A similar rule, the so-called quotient rule, holds mutatis mutandis for the quotient.

Proposition 923 Let $f, g : (a,b) \to \mathbb{R}$ be two functions derivable at $x \in (a,b)$, with $g(x) \ne 0$. The quotient function $f/g : (a,b) \to \mathbb{R}$ is derivable at x, with

$$\left( \frac{f}{g} \right)'(x) = \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^2} \qquad (20.14)$$
Proof We start with the case in which f is constant and equal to 1. We have

$$\left( \frac{1}{g} \right)'(x) = \lim_{h \to 0} \frac{\frac{1}{g(x+h)} - \frac{1}{g(x)}}{h} = \lim_{h \to 0} \frac{g(x) - g(x+h)}{g(x) g(x+h) h} = -\frac{1}{g(x)} \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} \cdot \lim_{h \to 0} \frac{1}{g(x+h)} = -\frac{g'(x)}{g(x)^2}$$

Now consider any $f : (a,b) \to \mathbb{R}$. Thanks to (20.13), we have

$$\left( \frac{f}{g} \right)'(x) = \left( f \cdot \frac{1}{g} \right)'(x) = f'(x) \frac{1}{g(x)} + f(x) \left( \frac{1}{g} \right)'(x) = \frac{f'(x)}{g(x)} - f(x) \frac{g'(x)}{g(x)^2} = \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^2}$$

as desired.

Example 924 (i) Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^3$ and $g(x) = \sin x$. We have

$$(f+g)'(x) = 3x^2 + \cos x \quad \forall x \in \mathbb{R}$$

and

$$(fg)'(x) = 3x^2 \sin x + x^3 \cos x \quad \forall x \in \mathbb{R}$$

as well as

$$\left( \frac{f}{g} \right)'(x) = \frac{3x^2 \sin x - x^3 \cos x}{\sin^2 x} \quad \forall x \in \mathbb{R} \setminus \{n\pi : n \in \mathbb{Z}\}$$

In the last formula $\{n\pi : n \in \mathbb{Z}\}$ is the set of the points $\{\ldots, -2\pi, -\pi, 0, \pi, 2\pi, \ldots\}$ where the function $g(x) = \sin x$ in the denominator is zero.

(ii) Let f be given by $f(x) = \tan x$. Since $\tan x = \sin x / \cos x$, we have

$$f'(x) = 1 + \tan^2 x = \frac{1}{\cos^2 x}$$

as the reader can check.


(iii) Let $c : [0, \infty) \to \mathbb{R}$ be a cost function, with marginal cost function $c' : (0, \infty) \to \mathbb{R}$. Consider the average cost function $c_m : (0, \infty) \to \mathbb{R}$ given by

$$c_m(x) = \frac{c(x)}{x}$$

By the quotient rule, we have

$$c_m'(x) = \frac{x c'(x) - c(x)}{x^2} = \frac{c'(x) - \frac{c(x)}{x}}{x} = \frac{c'(x) - c_m(x)}{x}$$

Since $x > 0$, we have

$$c_m'(x) \ge 0 \iff c'(x) - c_m(x) \ge 0 \iff c'(x) \ge c_m(x) \qquad (20.15)$$

Therefore, at a point x the variation in average costs is positive if and only if marginal costs are larger than average costs. In other words, average costs keep decreasing as long as they exceed marginal costs, and start increasing once marginal costs exceed them (cf. the numerical examples with which we began the chapter). More generally, the same reasoning holds for each function $f : [0, \infty) \to \mathbb{R}$ that represents, as $x \ge 0$ varies, an economic "quantity": return, profit, etc. The function $f_m : (0, \infty) \to \mathbb{R}$ defined by

$$f_m(x) = \frac{f(x)}{x}$$

is the corresponding "average quantity" (average return, average profit, etc.), while the derivative function $f'(x)$ represents the "marginal quantity" (marginal return, marginal profit, etc.). At each $x > 0$, the value $f'(x)$ can be interpreted geometrically as the slope of the tangent line to f at x, while $f_m(x)$ is the slope of the straight line passing through the origin and the point $(x, f(x))$.

[Figure: two panels; left, the graph of f(x) with its tangent line of slope f'(x) at a point; right, the secant through the origin with slope f(x)/x]

Geometrically, (20.15) says that the variation of the average $f_m$ is positive at a point $x > 0$, that is, $f_m'(x) \ge 0$, precisely when the slope of the tangent line is larger than that of the straight line passing through the origin and the point $(x, f(x))$, that is, $f'(x) \ge f_m(x)$. N
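To see (20.15) at work numerically, consider the hypothetical cost function $c(x) = x^2 + 4$ (our choice, not taken from the chapter). Its average cost $c_m(x) = x + 4/x$ reaches its minimum at $x = 2$, exactly where marginal and average costs cross. A minimal Python sketch:

def c(x):
    return x**2 + 4   # hypothetical cost function, an assumption of ours

def marginal(x, h=1e-6):
    return (c(x + h) - c(x)) / h   # numerical marginal cost

for x in [1.0, 2.0, 3.0]:
    avg = c(x) / x
    print(x, round(marginal(x), 3), round(avg, 3))
# at x = 1 marginal < average (average falling); at x = 2 they are
# equal; at x = 3 marginal > average (average rising)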

20.9 The chain rule

We now turn to the derivative of a composite function $g \circ f$. How can we calculate it starting from the derivatives of the functions f and g? The answer is provided by the important formula (20.16), called the chain rule.

Proposition 925 Let $f : (a,b) \to \mathbb{R}$ and $g : (c,d) \to \mathbb{R}$ be two functions with $\operatorname{Im} f \subseteq (c,d)$. If f is derivable at $x \in (a,b)$ and g is derivable at $f(x)$, then the composite function $g \circ f : (a,b) \to \mathbb{R}$ is derivable at x, with

$$(g \circ f)'(x) = g'(f(x)) f'(x) \qquad (20.16)$$

Thus, the chain rule features the product of the derivatives g' and f', where g' has as its argument the image f(x). Before proving it, we provide a simple heuristic argument. For h small enough, we have

$$\frac{g(f(x+h)) - g(f(x))}{h} = \frac{g(f(x+h)) - g(f(x))}{f(x+h) - f(x)} \cdot \frac{f(x+h) - f(x)}{h}$$

If $h \to 0$, then

$$\lim_{h \to 0} \frac{g(f(x+h)) - g(f(x))}{h} = \lim_{h \to 0} \frac{g(f(x+h)) - g(f(x))}{f(x+h) - f(x)} \cdot \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = g'(f(x)) f'(x)$$

Note that we tacitly assumed that the denominator $f(x+h) - f(x)$ is always different from zero, something that the hypotheses of the theorem do not guarantee. For this reason, we need the following rigorous proof.

Proof Since g is derivable at $y = f(x)$, we have

$$\lim_{k \to 0} \frac{g(y+k) - g(y)}{k} = g'(y)$$

This is equivalent to

$$\frac{g(y+k) - g(y)}{k} = g'(y) + o(1) \quad \text{as } k \to 0$$

This equality holds for $k \ne 0$ and implies

$$g(y+k) - g(y) = (g'(y) + o(1)) k \quad \text{as } k \to 0 \qquad (20.17)$$

which holds also for $k = 0$. Choose h small enough and set $k = f(x+h) - f(x)$. Since f is derivable at x, f is continuous at x, so $k \to 0$ as $h \to 0$. By (20.17), we have

$$g(f(x+h)) - g(f(x)) = (g'(f(x)) + o(1)) [f(x+h) - f(x)] \quad \text{as } h \to 0$$

It follows that

$$\frac{g(f(x+h)) - g(f(x))}{h} = (g'(f(x)) + o(1)) \frac{f(x+h) - f(x)}{h} \to g'(f(x)) f'(x)$$

proving the statement.

Example 926 Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^3$ and $g(x) = \sin x$. We have, at every $x \in \mathbb{R}$, $(g \circ f)(x) = \sin x^3$ and $(f \circ g)(x) = \sin^3 x$, so

$$(g \circ f)'(x) = g'(f(x)) f'(x) = (\cos x^3) \, 3x^2 = 3x^2 \cos x^3$$

and

$$(f \circ g)'(x) = f'(g(x)) g'(x) = 3 \sin^2 x \cos x$$

N

Example 927 Let $f : (a,b) \to \mathbb{R}$ be any function derivable at every $x \in (a,b)$ and let $g(x) = e^x$. We have

$$(g \circ f)'(x) = g'(f(x)) f'(x) = e^{f(x)} f'(x) \qquad (20.18)$$

For example, if $f(x) = x^4$, then $(g \circ f)(x) = e^{x^4}$ and (20.18) becomes $(g \circ f)'(x) = 4x^3 e^{x^4}$. N

The chain rule is very useful to compute the derivative of a function that can be written
as a composition of other functions.

Example 928 Let $\varphi : \mathbb{R} \to \mathbb{R}$ be given by $\varphi(x) = \sin^3 (9x+1)$. To calculate $\varphi'(x)$ it is useful to write $\varphi$ as

$$\varphi = f \circ g \circ h \qquad (20.19)$$

where $f, g, h : \mathbb{R} \to \mathbb{R}$ are given by $f(x) = x^3$, $g(x) = \sin x$, and $h(x) = 9x + 1$. By the chain rule, we have

$$\varphi'(x) = f'((g \circ h)(x)) (g \circ h)'(x) = f'((g \circ h)(x)) g'(h(x)) h'(x) = 3 \sin^2(9x+1) \cos(9x+1) \cdot 9 = 27 \sin^2(9x+1) \cos(9x+1)$$
Expressing the function $\varphi$ as in (20.19) thus simplifies the computation of its derivative. N
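The chain-rule computation of Example 928 can be validated against a difference quotient; a small Python check of ours:

import math

def phi(x):
    return math.sin(9 * x + 1)**3

def phi_prime(x):   # 27 sin^2(9x+1) cos(9x+1), from Example 928
    return 27 * math.sin(9 * x + 1)**2 * math.cos(9 * x + 1)

x, h = 0.3, 1e-7
print((phi(x + h) - phi(x)) / h, phi_prime(x))   # nearly equal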

O.R. If we write $z = f(x)$ and $y = g(z)$, we clearly have $y = g(f(x))$. What we have proved can be summarized by stating that

$$\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx}$$

which is easy to remember if the symbol $d\cdot/d\cdot$ is interpreted as a true ratio. It is a kind of Pinocchio, a puppet that behaves like a true kid. H

O.R. The chain rule has an onion flavor because the derivative of a composite function is obtained by successively "peeling" the function from the outside:

$$(f \circ g \circ h \circ \cdots)' = f'(g(h(\cdots))) \cdot g'(h(\cdots)) \cdot h'(\cdots) \cdots$$

H

20.10 Derivative of inverse functions

Theorem 929 Let $f : (a,b) \to \mathbb{R}$ be an injective function derivable at $x_0 \in (a,b)$. If $f'(x_0) \ne 0$, the inverse function $f^{-1}$ is derivable at $y_0 = f(x_0)$, with

$$(f^{-1})'(y_0) = \frac{1}{f'(x_0)} \qquad (20.20)$$

In short, the derivative of the inverse function of f, at $y_0$, is the reciprocal of the derivative of f, at $x_0$.

It would be nice to invoke the chain rule and say that, from $y_0 = f(f^{-1}(y_0))$, it follows that $1 = f'(f^{-1}(y_0)) (f^{-1})'(y_0)$, so that $1 = f'(x_0) (f^{-1})'(y_0)$, which is formula (20.20). Unfortunately, we cannot use the chain rule because we are not sure (yet) that $f^{-1}$ is derivable: indeed, this is what we first need to prove in this theorem.

Proof Set $f(x_0 + h) = y_0 + k$ and observe that, by the continuity of f, when $h \to 0$, also $k \to 0$. By the definition of inverse function, $x_0 = f^{-1}(y_0)$ and $x_0 + h = f^{-1}(y_0 + k)$. Therefore, $h = f^{-1}(y_0 + k) - f^{-1}(y_0)$. By hypothesis, there exists

$$\lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h} = f'(x_0)$$

But

$$\frac{f(x_0+h) - f(x_0)}{h} = \frac{y_0 + k - y_0}{f^{-1}(y_0+k) - f^{-1}(y_0)} = \frac{1}{\dfrac{f^{-1}(y_0+k) - f^{-1}(y_0)}{k}}$$

Therefore, since $f'(x_0) \ne 0$, the limit of the ratio

$$\frac{f^{-1}(y_0+k) - f^{-1}(y_0)}{k}$$

as $k \to 0$ also exists, and it is the reciprocal of the previous one, i.e., $(f^{-1})'(y_0) = 1/f'(x_0)$.

The derivative of the inverse function is thus given by a unit fraction whose denominator is the derivative f' evaluated at the preimage $f^{-1}(y)$, that is,

$$(f^{-1})'(y) = \frac{1}{f'(x)} = \frac{1}{f'(f^{-1}(y))}$$

Example 930 Let $f : \mathbb{R} \to \mathbb{R}$ be the exponential function $f(x) = e^x$, so that $f^{-1} : \mathbb{R}_{++} \to \mathbb{R}$ is the logarithmic function $f^{-1}(y) = \log y$. Given that $de^x/dx = e^x = y$, we have

$$\frac{d \log y}{dy} = \frac{1}{f'(x)} = \frac{1}{e^x} = \frac{1}{e^{\log y}} = \frac{1}{y}$$

for every $y > 0$. N
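A one-line numerical confirmation of Example 930 (our sketch, plain Python): the difference quotient of the logarithm at y is close to 1/y.

import math

y, h = 2.0, 1e-7
print((math.log(y + h) - math.log(y)) / h, 1 / y)   # both about 0.5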

This example, along with the chain rule, yields the important formula

$$\frac{d \log f(x)}{dx} = \frac{f'(x)}{f(x)}$$

for strictly positive derivable functions f. It is the logarithmic version of (20.18).

The last example, again along with the chain rule, also leads to an important generalization of Proposition 918.

Proposition 931 The power function $f : \mathbb{R}_{++} \to \mathbb{R}$ given by $f(x) = x^a$, with $a \in \mathbb{R}$, is derivable at each $x \in \mathbb{R}_{++}$, with derivative function given by

$$f'(x) = a x^{a-1}$$

Proof We have

$$x^a = e^{\log x^a} = e^{a \log x} \qquad (20.21)$$

Setting $f(x) = e^x$ and $g(x) = a \log x$, from (20.21) it follows that

$$\frac{d(x^a)}{dx} = f'(g(x)) g'(x) = e^{a \log x} \frac{a}{x} = x^a \frac{a}{x} = a x^{a-1}$$

as desired.

Let us see two more examples.

Example 932 Let $f : [-\pi/2, \pi/2] \to \mathbb{R}$ be given by $f(x) = \sin x$, so that $f^{-1} : [-1,1] \to [-\pi/2, \pi/2]$ is given by $f^{-1}(y) = \arcsin y$. From (20.20), using

$$\frac{d \sin x}{dx} = \cos x = \sqrt{1 - \sin^2 x} = \sqrt{1 - y^2}$$

we have

$$\frac{d \arcsin y}{dy} = \frac{1}{\sqrt{1 - y^2}}$$

for every $y \in (-1, 1)$. In the same way we can prove that

$$\frac{d \arccos y}{dy} = -\frac{1}{\sqrt{1 - y^2}}$$

for every $y \in (-1, 1)$. N
Example 933 Let $f : (-\pi/2, \pi/2) \to \mathbb{R}$ be given by $f(x) = \tan x$, so that $f^{-1} : \mathbb{R} \to (-\pi/2, \pi/2)$ is given by $f^{-1}(y) = \arctan y$. From (20.20), using

$$\frac{d \tan x}{dx} = 1 + \tan^2 x = 1 + y^2$$

we have, for every $y \in \mathbb{R}$,

$$\frac{d \arctan y}{dy} = \frac{1}{1 + y^2}$$

N

We relegate to an example the derivative of a function with variable base and exponent.

Example 934 Let $F : \mathbb{R} \to \mathbb{R}$ be the function given by $F(x) = [f(x)]^{g(x)}$ with $f : \mathbb{R} \to \mathbb{R}_{++}$ and $g : \mathbb{R} \to \mathbb{R}$ both derivable. Since one can write

$$F(x) = e^{\log [f(x)]^{g(x)}} = e^{g(x) \log f(x)}$$

the chain rule yields

$$F'(x) = e^{g(x) \log f(x)} D[g(x) \log f(x)] = F(x) \left( g'(x) \log f(x) + g(x) \frac{f'(x)}{f(x)} \right)$$

For example, the derivative of $F(x) = x^x$ is

$$\frac{dx^x}{dx} = x^x \left( \log x + x \cdot \frac{1}{x} \right) = x^x (1 + \log x)$$

while the derivative of $F(x) = x^{x^2}$ is

$$\frac{dx^{x^2}}{dx} = x^{x^2} \left( 2x \log x + x^2 \cdot \frac{1}{x} \right) = x^{x^2+1} (1 + 2 \log x)$$

The reader can try to calculate the derivative of $F(x) = x^{x^x}$. N
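The closed form for the derivative of $x^x$ can likewise be checked numerically (our sketch, plain Python):

import math

def F(x):
    return x**x

x, h = 1.5, 1e-7
closed_form = F(x) * (1 + math.log(x))      # from Example 934
print((F(x + h) - F(x)) / h, closed_form)   # nearly equal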

O.R. Denoting by $y = f(x)$ a function and by $x = f^{-1}(y)$ its inverse, we can summarize what we have seen by writing

$$\frac{dx}{dy} = \frac{1}{\dfrac{dy}{dx}}$$

Again the symbol $d\cdot/d\cdot$ behaves like a true ratio, a further proof of its Pinocchio nature. H

20.11 Formulary

The chain rule permits us to broaden considerably the scope of the results on the derivatives of elementary functions seen in Section 20.7. In Example 927 we already saw how to calculate the derivative of a generic function $e^{f(x)}$, which is much more general than the exponential $e^x$ of Proposition 919.
In a similar way it is possible to generalize all the results on the derivatives of elementary functions seen until now. We summarize all this in two tables: the first one lists the derivatives of elementary functions, while the second one contains their generalizations obtained through the chain rule.

f | f' | Reference
k | 0 | Example 907
$x^a$ | $a x^{a-1}$ | Proposition 931
$e^x$ | $e^x$ | Proposition 919
$\alpha^x$ | $\alpha^x \log \alpha$ | Proposition 919
$\log x$ | $1/x$ | Example 930
$\log_a x$ | $1/(x \log a)$ | Exercise for the reader
$\sin x$ | $\cos x$ | Proposition 920
$\cos x$ | $-\sin x$ | Formula (20.11)
$\tan x$ | $1/\cos^2 x = 1 + \tan^2 x$ | Example 924
$\operatorname{cotan} x$ | $-1/\sin^2 x = -(1 + \operatorname{cotan}^2 x)$ | Exercise for the reader
$\arcsin x$ | $1/\sqrt{1-x^2}$ | Example 932
$\arccos x$ | $-1/\sqrt{1-x^2}$ | Exercise for the reader
$\arctan x$ | $1/(1+x^2)$ | Example 933
$\operatorname{arccotan} x$ | $-1/(1+x^2)$ | Exercise for the reader
(20.22)

Given their importance in so many contexts, it is useful to memorize the previous table, as one learned the multiplication tables by heart as a child. Let us now see its general version obtained through the chain rule. In the next table, f stands for the elementary functions of the previous table, while g is any derivable function. Most of the derivatives that arise in applications can be calculated by using this last table properly.

$f \circ g$ | $(f \circ g)'$ | Image of g
$g(x)^a$ | $a g(x)^{a-1} g'(x)$ | $A \subseteq \mathbb{R}$
$e^{g(x)}$ | $g'(x) e^{g(x)}$ | $A \subseteq \mathbb{R}$
$\alpha^{g(x)}$ | $g'(x) \alpha^{g(x)} \log \alpha$ | $A \subseteq \mathbb{R}$
$\log g(x)$ | $g'(x)/g(x)$ | $A \subseteq \mathbb{R}_{++}$
$\log_a g(x)$ | $g'(x)/(g(x) \log a)$ | $A \subseteq \mathbb{R}_{++}$
$\sin g(x)$ | $g'(x) \cos g(x)$ | $A \subseteq \mathbb{R}$
$\cos g(x)$ | $-g'(x) \sin g(x)$ | $A \subseteq \mathbb{R}$
$\tan g(x)$ | $g'(x)/\cos^2 g(x) = g'(x)(1 + \tan^2 g(x))$ | $A \subseteq \mathbb{R}$
$\arcsin g(x)$ | $g'(x)/\sqrt{1 - g^2(x)}$ | $A \subseteq [-1,1]$
$\arccos g(x)$ | $-g'(x)/\sqrt{1 - g^2(x)}$ | $A \subseteq [-1,1]$
$\arctan g(x)$ | $g'(x)/(1 + g^2(x))$ | $A \subseteq \mathbb{R}$
(20.23)

20.12 Differentiability and linearity

When we introduced the notion of derivative at the beginning of the chapter, we emphasized its meaning as a way to represent the incremental, "marginal", behavior of a function $f : (a,b) \to \mathbb{R}$ at a point $x_0 \in (a,b)$. This section will show that the derivative can also be seen from a different perspective, as a linear approximation of the increment of the function. These two perspectives, with their interplay, are at the heart of differential calculus.

20.12.1 Differential

A fundamental question is whether it is possible to approximate a function $f : (a,b) \to \mathbb{R}$ locally, that is, in a neighborhood of a given point of its domain, by an affine function, namely, by a straight line (recall Proposition 656). If this is possible, we could locally approximate the function, even if very complicated, by the simplest function: a straight line.

To make this idea precise, given a function $f : (a,b) \to \mathbb{R}$ and a point $x_0 \in (a,b)$, suppose that there exists an affine function $r : \mathbb{R} \to \mathbb{R}$ that approximates f at $x_0$ in the sense that

$$f(x_0 + h) = r(x_0 + h) + o(h) \quad \text{as } h \to 0 \qquad (20.24)$$

for every h such that $x_0 + h \in (a,b)$, i.e., for every $h \in (a - x_0, b - x_0)$.


When $h = 0$, the local approximation condition (20.24) becomes $f(x_0) = r(x_0)$. This condition thus requires two properties for a straight line $r : \mathbb{R} \to \mathbb{R}$ to be considered an adequate approximation of f at $x_0$. First, the straight line must coincide with f at $x_0$, that is, $f(x_0) = r(x_0)$: at the point $x_0$ the approximation must be exact, without any error. Second, and most important, the approximation error

$$f(x_0 + h) - r(x_0 + h)$$

at $x_0 + h$ is $o(h)$, that is, as $x_0 + h$ approaches $x_0$, the error goes to zero faster than h: the approximation is (locally) "very good".

Since the straight line r can be written as $r(x) = mx + q$, the condition $f(x_0) = r(x_0)$ implies

$$r(x_0 + h) = m(x_0 + h) + q = mh + m x_0 + q = mh + f(x_0)$$

Denote by $l : \mathbb{R} \to \mathbb{R}$ the linear function defined by $l(h) = mh$, which geometrically is a straight line passing through the origin. The approximation condition (20.24) can be equivalently written as

$$f(x_0 + h) - f(x_0) = l(h) + o(h) \quad \text{as } h \to 0 \qquad (20.25)$$

Expression (20.25) emphasizes the linearity of the approximation $l(h)$ of the difference $f(x_0+h) - f(x_0)$, as well as the goodness of this approximation: the difference $f(x_0+h) - f(x_0) - l(h)$ is $o(h)$. This emphasis is important and motivates the following definition.

Definition 935 A function $f : (a,b) \to \mathbb{R}$ is said to be differentiable at $x_0 \in (a,b)$ if there exists a linear function $l : \mathbb{R} \to \mathbb{R}$ such that

$$f(x_0 + h) = f(x_0) + l(h) + o(h) \quad \text{as } h \to 0 \qquad (20.26)$$

for every $h \in (a - x_0, b - x_0)$.

In other words, the definition requires that there exists a number $m \in \mathbb{R}$, independent of h (but, in general, dependent on $x_0$), such that

$$f(x_0 + h) = f(x_0) + mh + o(h) \quad \text{as } h \to 0$$

Therefore, f is differentiable at $x_0$ if the linear function $l : \mathbb{R} \to \mathbb{R}$ approximates the difference $f(x_0+h) - f(x_0)$ with an error that is $o(h)$, i.e., an error that, as $h \to 0$, goes to zero faster than h. Equivalently, f is differentiable at $x_0$ if the affine function $r : \mathbb{R} \to \mathbb{R}$ given by

$$r(h) = f(x_0) + l(h)$$

approximates f at $x_0$ according to condition (20.24).

The linear function $l : \mathbb{R} \to \mathbb{R}$ in (20.26) is called the differential of f at $x_0$ and is denoted by $df(x_0) : \mathbb{R} \to \mathbb{R}$. With such a notation, (20.26) becomes6

$$f(x_0 + h) = f(x_0) + df(x_0)(h) + o(h) \quad \text{as } h \to 0 \qquad (20.27)$$

6 Note that h in $df(x_0)(h)$ is the argument of the differential $df(x_0) : \mathbb{R} \to \mathbb{R}$. In other words, $df(x_0)$ is a function of the variable h, while $x_0$ indicates the point at which the differential approximates the function f.

By setting $h = x - x_0$, we can write (20.27) in the form

$$f(x) = f(x_0) + df(x_0)(x - x_0) + o(x - x_0) \quad \text{as } x \to x_0 \qquad (20.28)$$

which we will often use.

A final piece of terminology: a function $f : (a,b) \to \mathbb{R}$ which is differentiable at each point of (a,b) is called differentiable, without any further qualification.

O.R. Differentiability says that a function can be well approximated by an affine function (a straight line), that is, by the simplest type of function, at least near the point of interest. The approximation is good in the close proximity of the point but, as we move away from it, in general its quality deteriorates rapidly. Such an approximation, even if rough, nevertheless conveys at least two valuable pieces of information:

(i) its mere existence ensures that the function is well behaved (it is continuous);

(ii) it reveals whether the function goes up or down and, with its slope, it tells us approximately the rate of change of the function at the point studied.

These two pieces of information are often useful in applications. Chapter 23 will study these issues in more depth and will present sharper local approximations. H

20.12.2 Differentiability and derivability

The next key result shows that the two perspectives on derivability, incremental and of linear approximation, are consistent. Recalling the geometric interpretation of the derivative (Section 20.3), all this means, not surprisingly, that the tangent line is exactly the affine function that satisfies condition (20.24).

Theorem 936 A function $f : (a,b) \to \mathbb{R}$ is differentiable at $x_0 \in (a,b)$ if and only if it is derivable at this point. In this case, the differential $df(x_0) : \mathbb{R} \to \mathbb{R}$ is given by

$$df(x_0)(h) = f'(x_0) h$$

The differential at a point can thus be written in terms of the derivative at that point. Inter alia, this also shows the uniqueness of the differential $df(x_0)$.

Proof "If". Let f be derivable at $x_0 \in (a,b)$. We have

$$\lim_{h \to 0} \frac{f(x_0+h) - f(x_0) - f'(x_0) h}{h} = \lim_{h \to 0} \left( \frac{f(x_0+h) - f(x_0)}{h} - f'(x_0) \right) = 0$$

that is, $f(x_0+h) - f(x_0) - f'(x_0) h = o(h)$. Setting $m = f'(x_0)$, this implies (20.26) and therefore f is differentiable at $x_0$.

"Only if". Let f be differentiable at $x_0 \in (a,b)$. By (20.26),

$$f(x_0 + h) - f(x_0) = l(h) + o(h) \quad \text{as } h \to 0$$

The linear function $l : \mathbb{R} \to \mathbb{R}$ is a straight line passing through the origin, so there exists $m \in \mathbb{R}$ such that $l(h) = mh$. Hence

$$\lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h} = \lim_{h \to 0} \frac{l(h) + o(h)}{h} = m \in \mathbb{R}$$

At $x_0$ the limit of the difference quotient thus exists and is finite, and therefore f is derivable at $x_0$.

Differentiability and derivability are, therefore, equivalent notions for scalar functions. When they hold, we have, as $h \to 0$,

$$f(x_0 + h) = f(x_0) + df(x_0)(h) + o(h) = f(x_0) + f'(x_0) h + o(h) \qquad (20.29)$$

or, equivalently, as $x \to x_0$,

$$f(x) = f(x_0) + df(x_0)(x - x_0) + o(x - x_0) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) \qquad (20.30)$$

The reader might recall from (20.7) that

$$r(x) = f(x_0) + f'(x_0)(x - x_0) \qquad (20.31)$$

is the equation of the tangent line at $x_0$. This confirms the natural intuition that this line is the affine approximation that makes f differentiable at $x_0$. Graphically:

[Figure: the graph of f with its tangent line at $x_0$; the tangent-line value approximates $f(x_0+h)$ for $x_0 + h$ near $x_0$]

O.R. The difference $f(x_0+h) - f(x_0)$ is called the increment of f at $x_0$ and is often denoted by $\Delta f(x_0)(h)$. When f is differentiable at $x_0$, we have

$$\Delta f(x_0)(h) = df(x_0)(h) + o(h)$$

So,

$$\Delta f(x_0) \sim df(x_0) \quad \text{as } h \to 0$$

when $f'(x_0) \ne 0$. Indeed,

$$\frac{\Delta f(x_0)(h)}{h} = \frac{df(x_0)(h)}{h} + \frac{o(h)}{h} = \frac{f'(x_0) h}{h} + \frac{o(h)}{h} = f'(x_0) + \frac{o(h)}{h} \to f'(x_0)$$

The two infinitesimals $\Delta f(x_0)$ and $df(x_0)$ are, therefore, of the same order. This is another way of saying that, when f is differentiable at $x_0$, the differential approximates the true increment well. H

20.12.3 Differentiability and continuity

A fundamental property of differentiable functions, and therefore of derivable functions, is continuity. In view of Theorem 936, Proposition 917 can now be regarded as a corollary of the following result.

Proposition 937 If $f : (a,b) \to \mathbb{R}$ is differentiable at $x_0 \in (a,b)$, then it is continuous at $x_0$.

The converse is clearly false, as shown by the absolute value function $f(x) = |x|$ at $x_0 = 0$.

Proof By (20.30), we have

$$\lim_{x \to x_0} f(x) = \lim_{x \to x_0} \left( f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) \right) = f(x_0) + f'(x_0) \lim_{x \to x_0} (x - x_0) = f(x_0)$$

Therefore, f is continuous at $x_0$.

20.12.4 A terminological turning point

In view of the equivalence established in Theorem 936, from now on we say that a function $f : (a,b) \to \mathbb{R}$ is "differentiable" at $x_0$ rather than "derivable". This is also in accordance with the more standard terminology. The key conceptual distinction between the two viewpoints embodied by derivability and differentiability should be kept in mind, however, as it will be key in multivariable calculus.

In Section 20.4 we introduced the derivative function $f' : D \to \mathbb{R}$, defined on the domain D of differentiability of a function $f : (a,b) \to \mathbb{R}$. If the derivative function f' is continuous on a subset E of the domain of differentiability D, we say that f is continuously differentiable on E. That is, f is continuously differentiable on E if its derivative is continuous at all points of E. In particular, when $D = E$, the function is said to be continuously differentiable, without further specification.

Notation The set of all the continuously differentiable functions on a set E in $\mathbb{R}$ is denoted by $C^1(E)$.

20.13 Derivatives of higher order

The derivative function $f' : D \to \mathbb{R}$ can, in turn, admit a derivative at an (interior) point $x \in D$,7 denoted by $f''(x)$ and given by

$$f''(x) = \lim_{h \to 0} \frac{f'(x+h) - f'(x)}{h}$$

when the limit exists and is finite. The derivative $f''(x)$ is called the second derivative of f at x, and a function for which $f''(x)$ exists is said to be twice differentiable at x.

Example 938 The quadratic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ is twice differentiable at all points of the real line. Indeed, its derivative function $f' : \mathbb{R} \to \mathbb{R}$ is given by $f'(x) = 2x$, which, in turn, has a derivative at each $x \in \mathbb{R}$, with $f''(x) = 2$ for each $x \in \mathbb{R}$. N

Next, let D' be the domain of differentiability of f', so that its derivative function $f'' : D' \to \mathbb{R}$ associates to every $x \in D'$ the second derivative $f''(x)$. The function $f'' : D' \to \mathbb{R}$ can have a derivative at a point $x \in D'$, denoted by $f'''(x)$ and given by

$$f'''(x) = \lim_{h \to 0} \frac{f''(x+h) - f''(x)}{h}$$

when such a limit exists and is finite. The derivative $f'''(x)$ is called the third derivative of f at x, and a function for which $f'''(x)$ exists is said to be three times differentiable at x.

Example 939 The quadratic function is three times differentiable at all points of the real line. Indeed, its second derivative function $f'' : \mathbb{R} \to \mathbb{R}$ has a derivative at each $x \in \mathbb{R}$, with $f'''(x) = 0$ for each $x \in \mathbb{R}$. N

These definitions can be iterated ad libitum, with fourth derivative, fifth derivative, and so on. Denoting by $f^{(n)}$ the n-th derivative, we can define by recurrence the differentiability of higher order of a function.

Definition 940 A function $f : (a,b) \to \mathbb{R}$ which is $n-1$ times differentiable at a point $x \in (a,b)$ is said to be n times differentiable at x if the limit

$$\lim_{h \to 0} \frac{f^{(n-1)}(x+h) - f^{(n-1)}(x)}{h} \qquad (20.32)$$

exists and is finite.

For $n = 0$ we put $f^{(0)} = f$. When $n = 1$, we have ordinary differentiability and (20.32) defines the (first) derivative. When $n = 2$, (20.32) defines the second derivative, and so on.

Example 941 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^4$. At each $x \in \mathbb{R}$ we have

$$f'(x) = 4x^3, \quad f''(x) = 12x^2, \quad f'''(x) = 24x, \quad f^{(iv)}(x) = 24, \quad f^{(v)}(x) = 0$$

and $f^{(n)}(x) = 0$ for every $n \ge 5$. N
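The cascade of derivatives in Example 941 can be reproduced symbolically with the SymPy library (an outside tool we assume available; the book does not use it):

import sympy as sp

x = sp.symbols('x')
f = x**4
for n in range(1, 6):
    print(n, sp.diff(f, x, n))   # 4*x**3, 12*x**2, 24*x, 24, 0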


7 The "interior" requirement will become clear in Section 22.1.1. In any case, to ease exposition, we overlook this requirement here.

If the derivative function $f^{(n)}$ is continuous on a subset E of the domain of differentiability D, we say that f is n times continuously differentiable on E. As usual, when $D = E$ the function is said to be n times continuously differentiable, without further specification. The set of all n times continuously differentiable functions on a set E is denoted by $C^n(E)$. For $n = 1$ we get back the class $C^1(E)$ of continuously differentiable functions previously introduced.

20.14 Discrete limits

We conclude by showing that the differential analysis of this chapter is closely connected with the discrete calculus of Chapter 10. Given a function $f : \mathbb{R} \to \mathbb{R}$, fix $x_0 \in \mathbb{R}$ and $h > 0$. Set $a_n = f(x_0 + nh)$ for every $n \ge 0$.8 Define the difference quotients:

$$\Delta_h f(x_0) = \frac{\Delta a_0}{h} \;;\quad \Delta_h^2 f(x_0) = \frac{\Delta^2 a_0}{h^2} \;;\quad \ldots \;;\quad \Delta_h^k f(x_0) = \frac{\Delta^k a_0}{h^k}$$

We have:

$$\Delta_h f(x_0) = \frac{\Delta a_0}{h} = \frac{f(x_0+h) - f(x_0)}{h}$$

$$\Delta_h^2 f(x_0) = \frac{\Delta^2 a_0}{h^2} = \frac{1}{h^2}(\Delta a_1 - \Delta a_0) = \frac{f(x_0+2h) - 2f(x_0+h) + f(x_0)}{h^2}$$

$$\vdots$$

$$\Delta_h^k f(x_0) = \frac{1}{h^k} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} f(x_0 + ih)$$

where the last equality follows from (10.4). By definition, the first derivative is the limit, as h approaches 0, of the difference quotient $\Delta_h f(x_0)$. Interestingly, the next result shows that the second difference quotient also converges to the second derivative, the third difference quotient converges to the third derivative, and so on.

Proposition 942 Let f be $n-1$ times differentiable on $\mathbb{R}$ and n times differentiable at $x_0$. We have $f^{(k)}(x_0) = \lim_{h \to 0} \Delta_h^k f(x_0)$ for all $1 \le k \le n$.

Proof We only prove the case $n = 2$. In Chapter 23 we will establish the following quadratic approximation:

$$f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{1}{2} f''(x_0) h^2 + o(h^2)$$

Then $f(x_0 + 2h) = f(x_0) + 2 f'(x_0) h + 2 f''(x_0) h^2 + o(h^2)$, so

$$f(x_0 + 2h) - 2 f(x_0 + h) + f(x_0) = f''(x_0) h^2 + o(h^2)$$

as desired.9

8 Here it is convenient to start the sequence at $n = 0$.
9 For a direct proof of this result, we refer readers to Jordan (1893) pp. 116-118.

Conceptually, this result shows that derivatives can be viewed as limits of finite differences, so the "discrete" and "continuous" calculi are consistent. Indeed, some important continuous properties can be viewed as inherited, via limits, from discrete ones: for instance, the algebra of derivatives can be easily deduced from that of finite differences via limits. All this is important (and, in a sense, reassuring) because discrete properties are often much easier to grasp intuitively.

By establishing a "direct" characterization of second and higher order derivatives, this proposition is also important for their numerical computation. Indeed, inspection of the proof shows that $\Delta_h^2 f(x_0) - f''(x_0) = o(h^2)/h^2$, an approximation error that vanishes as $h \to 0$. In general, $\Delta_h^2 f(x_0)$ is much easier to compute numerically than $f''(x_0)$.
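For instance, the following Python sketch of ours approximates $f''(x_0)$ for $f = \sin$ at $x_0 = 1$ through the second difference quotient, without ever differentiating:

import math

f, x0 = math.sin, 1.0
for h in [1e-1, 1e-2, 1e-3]:
    d2 = (f(x0 + 2*h) - 2*f(x0 + h) + f(x0)) / h**2   # second difference quotient
    print(h, d2, -math.sin(x0))   # d2 approaches f''(x0) = -sin(x0)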

A leap of faith Consider a function $f : \mathbb{R} \to \mathbb{R}$. Fix a point $x_0 \in \mathbb{R}$ and an integer $n \ge 1$. Let x be any point in $\mathbb{R}$, say $x \ge x_0$. Set

$$h = \frac{x - x_0}{n}$$

and $x_i = x_0 + ih$ for $i = 1, \ldots, n$. So, $x_0 \le x_1 \le \cdots \le x_n = x$ and the n points $x_i$ form an evenly-spaced subdivision of the interval $[x_0, x]$. The choice of n determines how fine the subdivision is: larger values of n correspond to finer subdivisions.

By the Newton difference formula (10.12), we have10

$$f(x) = f(x_0 + nh) = a_n = \sum_{k=0}^{n} \frac{n^{(k)}}{k!} \Delta^k a_0 = \sum_{k=0}^{n} \frac{n^{(k)}}{k!} \Delta_h^k f(x_0) \, h^k$$

We thus get the noteworthy formula

$$f(x) = \sum_{k=0}^{n} \frac{n^{(k)}}{n^k} \frac{\Delta_h^k f(x_0)}{k!} (x - x_0)^k \quad \forall x \in \mathbb{R}$$

So far so good. Yet, from this formula one might be tempted to take finer and finer subdivisions by letting $n \to +\infty$. For each k we have

$$n^{(k)} \sim n^k$$

as well as

$$\Delta_h^k f(x_0) \sim f^{(k)}(x_0)$$

provided f is infinitely differentiable. Indeed, by Proposition 942 we have $\Delta_h^k f(x_0) \to f^{(k)}(x_0)$ as $h \to 0$, that is, as $n \to +\infty$. Unfortunately, the equivalence relation does not necessarily go through sums, let alone through infinite ones (cf. Lemma 331). Yet, if we take a leap of faith, in an eighteenth-century style, we "then" have a series expansion

$$f(x) \sim \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \quad \forall x \in \mathbb{R}$$
10 A notational short circuit: here n plays the role of m in (10.12), k that of j, while in the notation of (10.12) here we have n = 0.

Fortunately, later in the book Section 23.5 will make rigorous all this by showing that in…n-
itely di¤erentiable functions that are analytic admit an (exact) series expansion, something
that makes them the most tractable class of functions. Though rough, the previous heuristic
argument thus opens a door on a key topic.
642 CHAPTER 20. DERIVATIVES
Chapter 21

Differential calculus in several variables

21.1 Partial derivatives

21.1.1 The notion

Our study of differential calculus has so far focused on functions of a single variable. Its extension to functions of several variables is a fundamental, but subtle, topic. We can begin, however, with a simple notion of differentiation in $\mathbb{R}^n$: partial differentiation. Let us start with the two-dimensional case. Consider the origin $x = (0,0)$ in the plane. There are, intuitively, two main directions along which to approach the origin: the horizontal one, moving along the horizontal axis, and the vertical one, moving along the vertical axis.

[Figure: the plane with the origin O and the horizontal and vertical directions of approach]

As we can approach the origin along the two main directions, vertical and horizontal, the same can be done for any point x in the plane.

[Figure: a point $x = (x_1, x_2)$ in the plane, approached horizontally and vertically]

To formalize this intuition, let us consider the two versors $e^1 = (1,0)$ and $e^2 = (0,1)$ of $\mathbb{R}^2$. For every $x = (x_1, x_2) \in \mathbb{R}^2$ and every scalar $h \in \mathbb{R}$, we have

$$x + h e^1 = (x_1, x_2) + (h, 0) = (x_1 + h, x_2)$$

Graphically:

[Figure: the point $x + h e^1 = (x_1 + h, x_2)$, a horizontal displacement of x]

The set

$$\{x + h e^1 : h \in \mathbb{R}\}$$

is, therefore, formed by the vectors of $\mathbb{R}^2$ with the same second coordinate, but with a different first coordinate.

[Figure: the horizontal straight line $\{x + h e^1 : h \in \mathbb{R}\}$ through the point x]

Graphically, it is the horizontal straight line that passes through the point x. For example, if x is the origin (0,0), the set

$$\{x + h e^1 : h \in \mathbb{R}\} = \{(h, 0) : h \in \mathbb{R}\}$$

is the horizontal axis. Similarly, for every scalar $h \in \mathbb{R}$ we have

$$x + h e^2 = (x_1, x_2) + (0, h) = (x_1, x_2 + h)$$

Graphically:

[Figure: the point $x + h e^2 = (x_1, x_2 + h)$, a vertical displacement of x]

In this case the set $\{x + h e^2 : h \in \mathbb{R}\}$ is formed by the vectors of $\mathbb{R}^2$ with the same first coordinate, but with a different second coordinate.

[Figure: the vertical straight line $\{x + h e^2 : h \in \mathbb{R}\}$ through the point x]

Graphically, it is the vertical straight line that passes through the point x. When x is the origin (0,0), the set $\{x + h e^2 : h \in \mathbb{R}\}$ is the vertical axis.

The partial derivative $\partial f / \partial x_1 (x)$ of a function $f : \mathbb{R}^2 \to \mathbb{R}$ at a point $x \in \mathbb{R}^2$ considers the effect on f of infinitesimal variations along the horizontal straight line $\{x + h e^1 : h \in \mathbb{R}\}$, while the partial derivative $\partial f / \partial x_2 (x)$ considers the effect on f of infinitesimal variations along the vertical straight line $\{x + h e^2 : h \in \mathbb{R}\}$. In other words, we study the function f at x by moving along the two basic directions parallel to the Cartesian axes. In particular, we define the partial derivatives at x as the limits1

$$\frac{\partial f}{\partial x_1}(x) = \lim_{h \to 0} \frac{f(x + h e^1) - f(x)}{h} = \lim_{h \to 0} \frac{f(x_1 + h, x_2) - f(x_1, x_2)}{h} \qquad (21.1)$$

$$\frac{\partial f}{\partial x_2}(x) = \lim_{h \to 0} \frac{f(x + h e^2) - f(x)}{h} = \lim_{h \to 0} \frac{f(x_1, x_2 + h) - f(x_1, x_2)}{h} \qquad (21.2)$$

when they exist and are finite.

Though key for understanding the meaning of partial derivatives, (21.1) and (21.2) are less useful for computing them. To this end, for a fixed $x \in \mathbb{R}^2$ we introduce two auxiliary scalar functions, called projections, $\varphi_1, \varphi_2 : \mathbb{R} \to \mathbb{R}$ defined by

$$\varphi_1(t) = f(t, x_2) \;;\quad \varphi_2(t) = f(x_1, t)$$

Note that $\varphi_i$ is a function of only the i-th variable, denoted by t, while the other variable is kept constant. It is immediate to see that for the partial derivatives $\partial f / \partial x_i$ at the point $x \in \mathbb{R}^2$ we have

$$\frac{\partial f}{\partial x_1}(x) = \lim_{h \to 0} \frac{\varphi_1(x_1 + h) - \varphi_1(x_1)}{h} = \varphi_1'(x_1) \qquad (21.3)$$

$$\frac{\partial f}{\partial x_2}(x) = \lim_{h \to 0} \frac{\varphi_2(x_2 + h) - \varphi_2(x_2)}{h} = \varphi_2'(x_2) \qquad (21.4)$$

1 The symbol $\partial$, a stylized d, takes the place of d to stress that we are not dealing with functions of a single variable.

The partial derivative $\partial f / \partial x_i$ is nothing but the ordinary derivative $\varphi_i'$ of the scalar function $\varphi_i$ calculated at $t = x_i$, with $i = 1, 2$. Thus, using the auxiliary functions $\varphi_i$ we go back to the differentiation of scalar functions studied in the last chapter. Formulas (21.3) and (21.4) are very useful for the computation of partial derivatives, which is thus reduced to the computation of standard derivatives of scalar functions.

Example 943 (i) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = x_1 x_2$. Let us compute the partial derivatives of f at $x = (1, -1)$. We have

$$\varphi_1(t) = f(t, -1) = -t \;;\quad \varphi_2(t) = f(1, t) = t$$

Therefore, at the point $t = 1$ we have $\varphi_1'(1) = -1$ and at the point $t = -1$ we have $\varphi_2'(-1) = 1$, which implies

$$\frac{\partial f}{\partial x_1}(1, -1) = \varphi_1'(1) = -1 \;;\quad \frac{\partial f}{\partial x_2}(1, -1) = \varphi_2'(-1) = 1$$

More generally, at any point $x \in \mathbb{R}^2$ we have

$$\varphi_1(t) = t x_2 \;;\quad \varphi_2(t) = x_1 t$$

Therefore, their derivatives at the point x are $\varphi_1'(x_1) = x_2$ and $\varphi_2'(x_2) = x_1$. Hence,

$$\frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) = x_2 \;;\quad \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) = x_1$$

(ii) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = x_1^2 x_2$. Let us compute the partial derivatives of f at $x = (1, 2)$. We have

$$\varphi_1(t) = f(t, 2) = 2t^2 \;;\quad \varphi_2(t) = f(1, t) = t$$

Therefore, at the point $t = 1$ we have $\varphi_1'(1) = 4$ and at the point $t = 2$ we have $\varphi_2'(2) = 1$, whence

$$\frac{\partial f}{\partial x_1}(1, 2) = \varphi_1'(1) = 4 \;;\quad \frac{\partial f}{\partial x_2}(1, 2) = \varphi_2'(2) = 1$$

Again, more generally, at any point $x \in \mathbb{R}^2$ we have

$$\varphi_1(t) = t^2 x_2 \;;\quad \varphi_2(t) = x_1^2 t$$

Therefore, their derivatives at the point x are $\varphi_1'(x_1) = 2 x_1 x_2$ and $\varphi_2'(x_2) = x_1^2$, so

$$\frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) = 2 x_1 x_2 \;;\quad \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) = x_1^2$$

N

Thus, to calculate $\partial f / \partial x_1(x)$ we considered f as a function of the single variable $x_1$, keeping the other variable $x_2$ constant, and we calculated its standard derivative at $x_1$. This is what, implicitly, the projection $\varphi_1$ did. Similarly, calculating $\partial f / \partial x_2(x)$ through the projection $\varphi_2$ amounts to considering f as a function of the single variable $x_2$, keeping the other variable $x_1$ constant, and calculating its standard derivative at $x_2$. Once all this has been understood, we can skip a step and no longer mention projections explicitly. The calculation of partial derivatives then essentially reduces to that of standard derivatives.

Example 944 Let $f : \mathbb{R} \times \mathbb{R}_{++} \to \mathbb{R}$ be given by $f(x_1, x_2) = x_1 \log x_2$. Let us calculate the partial derivatives at $x \in \mathbb{R} \times \mathbb{R}_{++}$. We start with $\partial f / \partial x_1(x)$. If we consider f as a function of the single variable $x_1$, its derivative is $\log x_2$. Therefore,

$$\frac{\partial f}{\partial x_1}(x) = \log x_2$$

Consistently, $\varphi_1(t) = t \log x_2$, so that at the point $t = x_1$ we have $\varphi_1'(x_1) = \log x_2$. Let us move to $\partial f / \partial x_2(x)$. If we consider f as a function of the single variable $x_2$, its derivative is $x_1 / x_2$. Therefore,

$$\frac{\partial f}{\partial x_2}(x) = \frac{x_1}{x_2}$$

N
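The two limits (21.1) and (21.2) can be approximated directly; a minimal Python sketch of ours for the function of Example 944 at a sample point:

import math

def f(x1, x2):
    return x1 * math.log(x2)

x1, x2, h = 2.0, 3.0, 1e-7
df_dx1 = (f(x1 + h, x2) - f(x1, x2)) / h   # tends to log(x2)
df_dx2 = (f(x1, x2 + h) - f(x1, x2)) / h   # tends to x1/x2
print(df_dx1, math.log(x2))
print(df_dx2, x1 / x2)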
O.R. Geometrically, at a point $(\bar{x}_1, \bar{x}_2)$ the projection $\varphi_1(t) = f(t, \bar{x}_2)$ is obtained by sectioning the surface that represents f with the vertical plane of equation $x_2 = \bar{x}_2$, while the projection $\varphi_2(t) = f(\bar{x}_1, t)$ is obtained by sectioning the same surface with the vertical plane (perpendicular to the previous one) of equation $x_1 = \bar{x}_1$. Therefore, as with a panettone, the surface is cut with two mutually perpendicular planes: the projections are nothing but the shapes of the two slices and, as such, are scalar functions (whose graphs lie on the planes with which we cut the surface).

The partial derivatives at $(\bar{x}_1, \bar{x}_2)$ are therefore simply the slopes of the two projections at this point. H

The notion of partial derivative extends in a natural way to functions of n variables by considering the versors $e^1 = (1, 0, \ldots, 0)$, $e^2 = (0, 1, \ldots, 0)$, ..., $e^n = (0, 0, \ldots, 1)$ of $\mathbb{R}^n$. Throughout the chapter we consider functions $f : U \to \mathbb{R}$ defined (at least) on an open set U in $\mathbb{R}^n$.

Definition 945 A function $f : U \to \mathbb{R}$ is said to be partially derivable at a point $x \in U$ if, for each $i = 1, 2, \ldots, n$, the limits

$$\lim_{h \to 0} \frac{f(x + h e^i) - f(x)}{h} \qquad (21.5)$$

exist and are finite. These limits are called the partial derivatives of f at x.

The limit (21.5) is the i-th partial derivative of f at x, denoted by either $f'_{x_i}(x)$ or

$$\frac{\partial f}{\partial x_i}(x)$$

Often, it is actually convenient to write

$$\frac{\partial f(x)}{\partial x_i}$$

The choice among these alternatives will be just a matter of convenience. The vector

$$\left( \frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \ldots, \frac{\partial f}{\partial x_n}(x) \right) \in \mathbb{R}^n$$

of the partial derivatives of f at x is called the gradient of f at x, denoted by $\nabla f(x)$ or, simply, by $f'(x)$.2

When f is partially derivable at all the points of a subset E of U, for brevity we say that f is partially derivable on E. When f is partially derivable at all the points of its domain, it is called partially derivable, without further specification.

Clearly, partial derivability reduces to standard derivability when f is a scalar function.

Also in the general case of n independent variables, to calculate the partial derivatives at a point x one can introduce the projections $\varphi_i$ defined by

$$\varphi_i(t) = f(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_n) \quad \forall i = 1, 2, \ldots, n$$

Using the scalar functions $\varphi_i$, we have

$$\frac{\partial f(x)}{\partial x_i} = \lim_{h \to 0} \frac{\varphi_i(x_i + h) - \varphi_i(x_i)}{h} = \varphi_i'(x_i) \quad \forall i = 1, 2, \ldots, n$$

which generalizes formulas (21.3) and (21.4) to $\mathbb{R}^n$, reducing in this case, too, the calculation of partial derivatives to that of standard derivatives of scalar functions.

Example 946 Let $f : \mathbb{R}^4 \to \mathbb{R}$ be defined by $f(x_1, x_2, x_3, x_4) = x_1 + e^{x_2 x_3} + 2 x_4^2$. At each point $x \in \mathbb{R}^4$ we have

$$\varphi_1(t) = t + e^{x_2 x_3} + 2 x_4^2 \;;\quad \varphi_2(t) = x_1 + e^{t x_3} + 2 x_4^2$$

$$\varphi_3(t) = x_1 + e^{x_2 t} + 2 x_4^2 \;;\quad \varphi_4(t) = x_1 + e^{x_2 x_3} + 2 t^2$$

and therefore

$$\varphi_1'(t) = 1 \;;\quad \varphi_2'(t) = x_3 e^{t x_3} \;;\quad \varphi_3'(t) = x_2 e^{x_2 t} \;;\quad \varphi_4'(t) = 4t$$

Hence

$$\frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) = 1 \;;\quad \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) = x_3 e^{x_2 x_3}$$

$$\frac{\partial f}{\partial x_3}(x) = \varphi_3'(x_3) = x_2 e^{x_2 x_3} \;;\quad \frac{\partial f}{\partial x_4}(x) = \varphi_4'(x_4) = 4 x_4$$

Putting them together, we have the gradient

$$\nabla f(x) = (1, \; x_3 e^{x_2 x_3}, \; x_2 e^{x_2 x_3}, \; 4 x_4)$$

N

2 The symbol $\nabla$ is called nabla.
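The same recipe, perturb one coordinate and keep the others fixed, yields a generic numerical gradient. The helper below is our own sketch (the name numerical_gradient is ours, not the book's), applied to the function of Example 946:

import math

def f(x):
    x1, x2, x3, x4 = x
    return x1 + math.exp(x2 * x3) + 2 * x4**2

def numerical_gradient(f, x, h=1e-7):
    grad = []
    for i in range(len(x)):
        xp = list(x)          # copy x and perturb only coordinate i
        xp[i] += h
        grad.append((f(xp) - f(x)) / h)
    return grad

x = [1.0, 0.5, 2.0, 3.0]
print(numerical_gradient(f, x))
# compare with the exact gradient (1, x3*e^(x2*x3), x2*e^(x2*x3), 4*x4)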
As in the special case $n = 2$, in the general case too, calculating the partial derivative $\partial f(x) / \partial x_i$ through the projection $\varphi_i$ amounts to considering f as a function of the single variable $x_i$, keeping the other $n - 1$ variables constant. We then calculate the ordinary derivative of this scalar function at $x_i$. In other words, we study the incremental behavior of f with respect to variations of $x_i$ only, keeping the other variables constant.

21.1.2 A continuity failure

The following example shows that for functions of several variables, with $n \ge 2$, the existence of partial derivatives does not imply continuity, contrary to the scalar case $n = 1$.

Example 947 The function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$f(x_1, x_2) = \begin{cases} 0 & \text{if } x_1 x_2 = 0 \\ 1 & \text{if } x_1 x_2 \ne 0 \end{cases}$$

is partially derivable at the origin, but is discontinuous there. Intuitively, this happens because the function is 0 on the axes and 1 off the axes. Formally, fix any $0 < \varepsilon < 1$. Consider the points of the straight line $x_2 = x_1$ different from the origin, that is, the set of the points (t, t) with $t \ne 0$.3 We have $f(t,t) = 1$ and each neighborhood of the origin $B_\varepsilon(0,0)$ contains (infinitely many) such points. Therefore,

$$|f(t,t) - f(0,0)| = |1 - 0| = 1 > \varepsilon \quad \forall t \ne 0$$

Hence, for every $0 < \varepsilon < 1$ there is no neighborhood $B_\varepsilon(0,0)$ such that

$$|f(x) - f(0,0)| < \varepsilon \quad \forall x \in B_\varepsilon(0,0)$$

This shows that f is not continuous at (0,0).

Let us now consider the partial derivatives of f at (0,0). We have

$$\frac{\partial f}{\partial x_1}(0,0) = \lim_{h \to 0} \frac{f(h,0) - f(0,0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$

and

$$\frac{\partial f}{\partial x_2}(0,0) = \lim_{h \to 0} \frac{f(0,h) - f(0,0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$

so that $\nabla f(0,0) = (0,0)$. In conclusion, f is partially derivable at (0,0) but is not continuous at (0,0). N

As we will see in Section 21.2, in $\mathbb{R}^n$ a notion of differentiability is required in order to guarantee both continuity and derivability.

3 We could actually choose any straight line passing through the origin, except the axes.

21.1.3 Derivative operator

The set $D \subseteq U$ of the points of the domain where a function $f : U \to \mathbb{R}$ is partially derivable is called, as in the scalar case (Section 20.4), the domain of (partial) derivability of f. Since the gradient is a vector of $\mathbb{R}^n$, to extend the notion of derivative function it is necessary to consider operators.

Definition 948 Let $f : U \to \mathbb{R}$ be a function with domain of derivability $D \subseteq U$. The operator

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right) : D \to \mathbb{R}^n \qquad (21.6)$$

that associates to every $x \in D$ the gradient $\nabla f(x)$ is called the derivative operator.

The derivative function $f' : D \to \mathbb{R}$ is recovered in the special case $n = 1$.

Example 949 Taking up Example 946 again, let $f : \mathbb{R}^4 \to \mathbb{R}$ be given by $f(x_1, x_2, x_3, x_4) = x_1 + e^{x_2 x_3} + 2 x_4^2$. It is easy to check that the derivative operator $\nabla f : \mathbb{R}^4 \to \mathbb{R}^4$ is given by

$$\nabla f(x) = (1, \; x_3 e^{x_2 x_3}, \; x_2 e^{x_2 x_3}, \; 4 x_4)$$

N

As emphasized in (21.6), the operator $\nabla f : D \to \mathbb{R}^n$ can be regarded (cf. Section 12.7) as the n-tuple $(\partial f / \partial x_1, \ldots, \partial f / \partial x_n)$ of functions of several variables, i.e., its partial derivatives $\partial f / \partial x_i : D \subseteq \mathbb{R}^n \to \mathbb{R}$.

Example 950 The partial derivatives

$$\frac{\partial f}{\partial x_1}(x) = x_2 x_3 \;;\quad \frac{\partial f}{\partial x_2}(x) = x_1 x_3 \;;\quad \frac{\partial f}{\partial x_3}(x) = x_1 x_2$$

of the function $f(x_1, x_2, x_3) = x_1 x_2 x_3$ are functions on all of $\mathbb{R}^3$. Together they form the derivative operator

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \frac{\partial f}{\partial x_3}(x) \right) = (x_2 x_3, \; x_1 x_3, \; x_1 x_2)$$

of f. N

21.1.4 Ceteris paribus: marginal analysis

Partial derivability embodies a ceteris paribus approach, a methodological principle that studies the effect of a single explanatory variable by keeping the other ones fixed, so as not to confound matters. It informs much of economic analysis, in particular the all-important marginal analysis in which partial derivatives play, indeed, a fundamental role. Here we consider two classic examples.

Production Let $f : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}_+$ be a production function which specifies that the producer is able to transform a vector $x \in \mathbb{R}^n_+$ of inputs into the quantity $f(x)$ of output. The partial derivative

$$\frac{\partial f(x)}{\partial x_i} \qquad (21.7)$$

quantifies the variation in the output produced that the producer obtains for infinitesimal variations of the i-th input, when the values of the other inputs are kept fixed.

In other words, the partial derivative (21.7) isolates the effect on the output caused by variations in the i-th input, ceteris paribus, that is, by keeping fixed the quantities of the other inputs. The partial derivative (21.7) is called the marginal productivity of input i, with $i = 1, 2, \ldots, n$, and plays a key role in the production decisions of producers.

Utility Let $u : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a utility function. If we assume that u has a cardinal interpretation, i.e., that $u(x)$ quantifies the pleasure obtained by consuming the bundle x, then the difference

$$u(x + h e^i) - u(x) \qquad (21.8)$$

indicates the variation in pleasure that the consumer experiences when one varies the quantity consumed of good i in the bundle x, ceteris paribus, that is, when the quantities consumed of the other goods are kept fixed. It follows that the partial derivative

$$\frac{\partial u(x)}{\partial x_i} \qquad (21.9)$$

quantifies the variation in pleasure that the consumer enjoys for infinitesimal variations of good i, the quantities consumed of the other goods being fixed. It is called the marginal utility of good i in the bundle x and it is central in the cardinalist vision of consumer theory.
In the ordinalist approach, instead, marginal utilities are no longer meaningful because the differences (21.8) have no meaning. It is easy to construct examples in which, for two bundles x and y,

$$u(x + h e^i) - u(x) > u(y + h e^i) - u(y) \quad \text{and} \quad (g \circ u)(x + h e^i) - (g \circ u)(x) < (g \circ u)(y + h e^i) - (g \circ u)(y)$$

with $g : \mathbb{R} \to \mathbb{R}$ strictly increasing. Since u and $g \circ u$ are utility functions that are equivalent from the ordinal point of view, this shows that the differences (21.8) per se have no meaning. For this reason, ordinalist consumer theory uses marginal rates of substitution and not marginal utilities, as we will see in Section 25.3.2. Nevertheless, marginal utility remains a notion commonly used in economics because of its intuitive appeal.

21.2 Differential

The notion of differential introduced in Definition 935 naturally extends to functions of several variables.

Definition 951 A function $f : U \to \mathbb{R}$ is said to be differentiable at a point $x \in U$ if there exists a linear function $l : \mathbb{R}^n \to \mathbb{R}$ such that

$$f(x + h) = f(x) + l(h) + o(\|h\|) \quad \text{as } \|h\| \to 0 \qquad (21.10)$$

for every $h \in \mathbb{R}^n$ such that $x + h \in U$.4

The linear function l is called the differential of f at x, denoted by $df(x) : \mathbb{R}^n \to \mathbb{R}$. The differential is the linear approximation at the point x of the variation $f(x+h) - f(x)$ with error of magnitude $o(\|h\|)$, that is,5

$$f(x+h) - f(x) = df(x)(h) + o(\|h\|)$$

i.e.,

$$\lim_{h \to 0} \frac{f(x+h) - f(x) - df(x)(h)}{\|h\|} = \lim_{h \to 0} \frac{o(\|h\|)}{\|h\|} = 0$$

By Riesz's Theorem, the linear function $df(x) : \mathbb{R}^n \to \mathbb{R}$ has the representation

$$df(x)(h) = \alpha \cdot h$$

for a suitable vector $\alpha \in \mathbb{R}^n$. The next important theorem identifies such a vector and shows that differentiability guarantees both continuity and partial derivability.

Theorem 952 If $f : U \to \mathbb{R}$ is differentiable at $x \in U$, then it is both continuous and partially derivable at that point, with

$$df(x)(h) = \nabla f(x) \cdot h = \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i} h_i \qquad (21.11)$$

for every $h = (h_1, \ldots, h_n) \in \mathbb{R}^n$.

4 In the scalar case the clause "for every $h \in \mathbb{R}^n$ such that $x_0 + h \in U$" reduces to the clause "for every $h \in (a - x_0, b - x_0)$" of Definition 935.
5 As in the scalar case, note that h in $df(x)(h)$ is the argument of the differential $df(x) : \mathbb{R}^n \to \mathbb{R}$. In other words, $df(x)$ is a function of the variable h, while x denotes the specific point at which the differential approximates the function f.

When f is scalar we find again the classic expression

$$df(x)(h) = f'(x) h \quad \forall h \in \mathbb{R}$$

of the differential in the scalar case (Theorem 936).

Proof Let $f : U \to \mathbb{R}$ be differentiable at $x \in U$. By (21.10), we can write

$$\lim_{h \to 0} f(x+h) = \lim_{h \to 0} (f(x) + l(h) + o(\|h\|)) = \lim_{h \to 0} f(x) + \lim_{h \to 0} l(h) + \lim_{h \to 0} o(\|h\|) \qquad (21.12)$$

But:

(i) $\lim_{h \to 0} l(h) = l(0) = 0$ since linear functions $l : \mathbb{R}^n \to \mathbb{R}$ are continuous (Theorem 535);

(ii) by the definition of little-o, $\lim_{h \to 0} o(\|h\|) = 0$.

Therefore, (21.12) implies $\lim_{h \to 0} f(x+h) = f(x)$, so the function is continuous at x.

To show the existence of partial derivatives at x, let us consider the case $n = 2$ (the general case presents no novelties, except in notation). In this case, (21.10) implies the existence of $\alpha = (\alpha_1, \alpha_2) \in \mathbb{R}^2$ such that

$$\lim_{(h_1, h_2) \to (0,0)} \frac{f(x_1 + h_1, x_2 + h_2) - f(x_1, x_2) - \alpha_1 h_1 - \alpha_2 h_2}{\sqrt{h_1^2 + h_2^2}} = 0 \qquad (21.13)$$

Setting $h_2 = 0$ in (21.13), we have

$$0 = \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2) - f(x_1, x_2) - \alpha_1 h_1}{|h_1|} = \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2) - f(x_1, x_2) - \alpha_1 h_1}{h_1} = \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2) - f(x_1, x_2)}{h_1} - \alpha_1$$

and therefore

$$\alpha_1 = \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2) - f(x_1, x_2)}{h_1} = \frac{\partial f(x_1, x_2)}{\partial x_1}$$

In a similar way it is possible to prove that $\alpha_2 = \partial f(x_1, x_2) / \partial x_2$, that is, $\nabla f(x_1, x_2) = \alpha$. In conclusion, both partial derivatives exist, so the function f is partially derivable, with

$$df(x_1, x_2)(h_1, h_2) = \nabla f(x_1, x_2) \cdot (h_1, h_2)$$

This proves (21.11).

Denoting by $x_0$ the point at hand and setting $x = x_0 + h$, expression (21.11) can be rewritten as

$$df(x_0)(x - x_0) = \nabla f(x_0) \cdot (x - x_0)$$

So, the affine function $r : \mathbb{R}^n \to \mathbb{R}$ defined by

$$r(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) \qquad (21.14)$$

generalizes the tangent line (20.31). The approximation (21.10) takes the form $f(x) = r(x) + o(\|x - x_0\|)$, that is,

$$f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + o(\|x - x_0\|)$$

This vector form generalizes the scalar one (20.28).

In the special case $n = 2$, the affine function (21.14) that best approximates a function $f : U \subseteq \mathbb{R}^2 \to \mathbb{R}$ at a point $x_0 = (x_{01}, x_{02}) \in U$ takes the form6

$$r(x_1, x_2) = f(x_{01}, x_{02}) + \frac{\partial f(x_0)}{\partial x_1}(x_1 - x_{01}) + \frac{\partial f(x_0)}{\partial x_2}(x_2 - x_{02})$$

6 Here $x_{01}$ and $x_{02}$ denote the components of the vector $x_0$.
21.2. DIFFERENTIAL 655

It is called the tangent plane to f at the point x0 = (x01 ; x02 ). Graphically:

4
x3

-2

-4 -2
2 -1
1 0
0 1
-1
-2 2
x2
x1

For n ≥ 3, the affine function (21.14) that best approximates a function in the neighborhood
of a point x0 of its domain is called the tangent hyperplane. For obvious reasons, it cannot
be visualized graphically.

We close with a piece of terminology. When f is differentiable at all the points of a subset
E of U, for brevity we say that f is differentiable on E. When f is differentiable at all the
points of its domain, it is called differentiable, without further specification.
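As an illustration (ours, not part of the text; the function below is an arbitrary smooth example), one can tabulate how the tangent plane (21.14) tracks a function near the point of tangency:

    import numpy as np

    # f(x1, x2) = x1**2 + x1*x2, with gradient (2*x1 + x2, x1).
    f = lambda x1, x2: x1**2 + x1 * x2
    x0 = np.array([1.0, 2.0])
    grad0 = np.array([2 * x0[0] + x0[1], x0[0]])

    # Tangent plane r(x) = f(x0) + grad_f(x0) . (x - x0), cf. (21.14).
    r = lambda x1, x2: f(*x0) + grad0 @ (np.array([x1, x2]) - x0)

    for eps in [0.5, 0.1, 0.01]:
        p = (x0[0] + eps, x0[1] - eps)
        print(f"eps = {eps}:  f = {f(*p):.6f}  tangent plane = {r(*p):.6f}")

The gap between the two columns shrinks faster than ‖x − x0‖, as the differential guarantees.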

21.2.1 Differentiability and partial derivability

Partial derivability does not imply continuity when n ≥ 2 (Example 947). In view of the
last theorem, partial derivability then does not imply differentiability, again unlike the scalar
case n = 1. The next example illustrates this failure.

Example 953 Let f : R²₊ ∪ R²₋ → R be given by

    f(x1, x2) = 0 if (x1, x2) = (0, 0),   √(x1 x2) if (x1, x2) ≠ (0, 0)

Because of the root, the function is defined only on the first and third orthants. We can
then approach the origin only from the right and from above, so that:

    ∂f/∂x1 (0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)] / h = lim_{h→0} (0 − 0)/h = 0

and

    ∂f/∂x2 (0, 0) = lim_{k→0} [f(0, k) − f(0, 0)] / k = lim_{k→0} (0 − 0)/k = 0

Therefore, f has partial derivatives at (0, 0), with ∇f(0, 0) = (0, 0). On the other hand, f
is not differentiable at (0, 0). Let us suppose, by contradiction, that it is. Then,

    f(h, k) = f(0, 0) + ∇f(0, 0) · (h, k) + o(√(h² + k²))

Since f(0, 0) = 0 and ∇f(0, 0) = (0, 0), we have f(h, k) = o(√(h² + k²)), that is,

    lim_{(h,k)→(0,0)} f(h, k) / √(h² + k²) = 0

i.e.,

    lim_{(h,k)→(0,0)} √(hk / (h² + k²)) = 0

But this is not possible. Indeed, if for example we consider the points on the straight line
x2 = x1, that is, of the form (t, t), we get

    √(hk / (h² + k²)) = √(t² / (t² + t²)) = √(1/2)    ∀t ≠ 0

This shows that f is not differentiable at (0, 0),⁷ even though it has partial derivatives at
(0, 0). ▮

⁷ For the more demanding reader: note that each neighbourhood of the origin contains points of the type (t, t) with t ≠ 0. For such points we have √(hk/(h² + k²)) = √(1/2). Therefore, for 0 < ε < √(1/2) there is no neighbourhood of the origin such that, for all its points (h, k) ≠ (0, 0), we have |√(hk/(h² + k²)) − 0| < ε.

Summing up:

- differentiability implies partial derivability (Theorem 952), but not vice versa when
  n ≥ 2 (Example 953);

- differentiability implies continuity (Theorem 952);

- partial derivability does not imply continuity when n ≥ 2 (Example 947).

It is natural to ask which additional hypotheses are required for partial derivability to
imply differentiability (and so continuity). The answer is given by the next remarkable result,
which extends Theorem 936 to the vector case by showing that, under a simple regularity
hypothesis (the continuity of the partial derivatives), a partially derivable function is also
differentiable (and so continuous).

Theorem 954 Let f : U → R be partially derivable. If the partial derivatives are continuous,
then f is differentiable.

Proof⁸ For simplicity of notation, we consider the case in which n = 2, the function f is
defined on the entire plane R², and the partial derivatives ∂f/∂x1 and ∂f/∂x2 exist on R².
Apart from more complicated notation, the general case can be proved in a similar way.

Therefore, let f : R² → R and x ∈ R². Assume that ∂f/∂x1 and ∂f/∂x2 are both
continuous at x. By adding and subtracting f(x1 + h1, x2), for each h ∈ R² we have:

    f(x + h) − f(x)                                                      (21.15)
      = f(x1 + h1, x2) − f(x1, x2) + f(x1 + h1, x2 + h2) − f(x1 + h1, x2)

The partial derivative ∂f/∂x1 (x) is the derivative of the function ψ1 : R → R defined by
ψ1(x1) = f(x1, x2), in which x2 is considered as a constant. By the Mean Value Theorem,⁹
there exists z1 ∈ (x1, x1 + h1) ⊆ R such that

    ψ1′(z1) = [ψ1(x1 + h1) − ψ1(x1)] / [(x1 + h1) − x1] = [ψ1(x1 + h1) − ψ1(x1)] / h1
            = [f(x1 + h1, x2) − f(x1, x2)] / h1

Similarly, the partial derivative ∂f/∂x2 (x + h) is the derivative of the function ψ2 : R → R
defined by ψ2(x2) = f(x1 + h1, x2), in which x1 + h1 is considered as a constant. Again by
the Mean Value Theorem, there exists z2 ∈ (x2, x2 + h2) ⊆ R such that

    ψ2′(z2) = [ψ2(x2 + h2) − ψ2(x2)] / [(x2 + h2) − x2] = [ψ2(x2 + h2) − ψ2(x2)] / h2
            = [f(x1 + h1, x2 + h2) − f(x1 + h1, x2)] / h2

Since by construction ∂f/∂x1 (z1, x2) = ψ1′(z1) and ∂f/∂x2 (x1 + h1, z2) = ψ2′(z2), we can
rewrite (21.15) as:

    f(x + h) − f(x) = ∂f/∂x1 (z1, x2) h1 + ∂f/∂x2 (x1 + h1, z2) h2

On the other hand, by definition ∇f(x) · h = ∂f/∂x1 (x1, x2) h1 + ∂f/∂x2 (x1, x2) h2. Thus:

    lim_{h→0} |f(x + h) − f(x) − ∇f(x) · h| / ‖h‖
      = lim_{h→0} |[∂f/∂x1 (z1, x2) − ∂f/∂x1 (x1, x2)] h1 + [∂f/∂x2 (x1 + h1, z2) − ∂f/∂x2 (x1, x2)] h2| / ‖h‖
      ≤ lim_{h→0} |∂f/∂x1 (z1, x2) − ∂f/∂x1 (x1, x2)| |h1| / ‖h‖
        + lim_{h→0} |∂f/∂x2 (x1 + h1, z2) − ∂f/∂x2 (x1, x2)| |h2| / ‖h‖
      ≤ lim_{h→0} |∂f/∂x1 (z1, x2) − ∂f/∂x1 (x1, x2)| + lim_{h→0} |∂f/∂x2 (x1 + h1, z2) − ∂f/∂x2 (x1, x2)|

where the last inequality holds because

    0 ≤ |h1| / ‖h‖ ≤ 1   and   0 ≤ |h2| / ‖h‖ ≤ 1

On the other hand, since z1 ∈ (x1, x1 + h1) and z2 ∈ (x2, x2 + h2), we have z1 → x1 as
h1 → 0 and z2 → x2 as h2 → 0. Therefore, since ∂f/∂x1 and ∂f/∂x2 are both continuous
at x, we have

    lim_{h→0} ∂f/∂x1 (z1, x2) = ∂f/∂x1 (x1, x2)   and   lim_{h→0} ∂f/∂x2 (x1 + h1, z2) = ∂f/∂x2 (x1, x2)

which implies

    lim_{h→0} |∂f/∂x1 (z1, x2) − ∂f/∂x1 (x1, x2)| = lim_{h→0} |∂f/∂x2 (x1 + h1, z2) − ∂f/∂x2 (x1, x2)| = 0

In conclusion, we have proved that

    lim_{h→0} |f(x + h) − f(x) − ∇f(x) · h| / ‖h‖ = 0

and the function f is thus differentiable at x.

⁸ Since this proof uses the Mean Value Theorem for scalar functions, which will be presented in the next chapter, it is best understood after learning that result. The same remark applies to the proof of Schwarz's Theorem.
⁹ The Mean Value Theorem for scalar functions will be studied in the next chapter.

Example 955 (i) Consider the function f : Rⁿ → R given by f(x) = ‖x‖². Its gradient is

    ∇f(x) = (∂f/∂x1 (x), ..., ∂f/∂xn (x)) = (2x1, ..., 2xn) = 2x    ∀x ∈ Rⁿ

The partial derivatives are continuous on Rⁿ and therefore f is differentiable on Rⁿ. By
(21.10), at each x ∈ Rⁿ we have

    df(x)(h) = ∇f(x) · h    ∀h ∈ Rⁿ

and

    ‖x + h‖² − ‖x‖² = 2x · h + o(‖h‖)

as ‖h‖ → 0.

(ii) Consider the function f : Rⁿ₊₊ → R given by f(x) = Σ_{i=1}^{n} log xi. Its gradient is

    ∇f(x) = (∂f/∂x1 (x), ..., ∂f/∂xn (x)) = (1/x1, ..., 1/xn)    ∀x ∈ Rⁿ₊₊

The partial derivatives are continuous on Rⁿ₊₊ and therefore f is differentiable on Rⁿ₊₊. By
(21.10), at each x ∈ Rⁿ₊₊ we have

    df(x)(h) = ∇f(x) · h    ∀h ∈ Rⁿ

so that, as ‖h‖ → 0,

    Σ_{i=1}^{n} log(xi + hi) − Σ_{i=1}^{n} log xi = Σ_{i=1}^{n} hi/xi + o(‖h‖)

▮

21.2.2 Total differential

In an imprecise, yet suggestive, way expression (21.11) is often written as

    df = (∂f/∂x1) dx1 + ⋯ + (∂f/∂xn) dxn                                 (21.16)

This formula, called the total differential of f, shows how the overall effect df on f
decomposes into the sum of the effects that the infinitesimal variations dxi of the individual
variables have on f. The summands (∂f/∂xi) dxi are sometimes called partial differentials.
For example, if f : Rⁿ → R is a production function with n inputs, the total differential
tells us that the overall variation df of the output is the result of the sum of the effects

    (∂f/∂xi) dxi

that the infinitesimal variations dxi of each input have on the production function. In more
economic language, the overall variation df of the output is given by the sum of the
infinitesimal variations dxi of the inputs, multiplied by their respective marginal
productivities ∂f/∂xi. The greater (in absolute value) the marginal productivity ∂f/∂xi of
input i, the greater the impact of its variation on output.

Similarly, if u : Rⁿ₊ → R is a utility function, the total differential takes the form

    du = (∂u/∂x1) dx1 + ⋯ + (∂u/∂xn) dxn

The overall variation du of utility decomposes into the sum of the effects

    (∂u/∂xi) dxi

on the utility function of infinitesimal variations dxi of the single goods that belong to the
bundle x: the overall variation du of utility is the sum of the infinitesimal variations dxi
of the goods, multiplied by their respective marginal utilities ∂u/∂xi.
Example 956 Let u : Rⁿ₊₊ → R be the log-linear utility function u(x1, ..., xn) = Σ_{i=1}^{n} ai log xi,
with ai > 0 and Σ_{i=1}^{n} ai = 1. Its total differential is

    du = (a1/x1) dx1 + ⋯ + (an/xn) dxn

The impact of each infinitesimal variation dxi on the overall variation du of utility is
determined by the coefficient ai/xi. ▮
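To make this decomposition tangible, here is a small numerical sketch (ours, not the book's; the bundle and coefficients are arbitrary) comparing the exact utility change with the total differential of Example 956:

    import numpy as np

    a = np.array([0.5, 0.3, 0.2])      # coefficients ai, summing to 1
    x = np.array([10.0, 5.0, 2.0])     # current bundle
    dx = np.array([0.1, -0.05, 0.02])  # small variations of the goods

    u = lambda x: np.sum(a * np.log(x))
    du_exact = u(x + dx) - u(x)
    du_total = np.sum((a / x) * dx)    # total differential of Example 956

    print(f"exact change: {du_exact:.6f}   total differential: {du_total:.6f}")

For small variations dxi the two numbers agree to several decimal places.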

However evocative, one should not forget that the total differential (21.16) is only a
heuristic version of the differential df(x), which is the rigorous notion.¹⁰

¹⁰ As we already remarked a few times, heuristics plays an important role in the quest for new results (a "vanguard of heuristic efforts towards the new," wrote Carlo Emilio Gadda). The rigorous verification of the results so obtained is, however, key; only a few outstanding mathematicians, dear to the gods, can rely on intuition without caring too much about rigor. Yet one of them, the great Archimedes, writes in his Method: "... certain things became clear to me by a mechanical method, although they had to be demonstrated by geometry afterwards because their investigation by the said method did not furnish an actual demonstration." (Trans. Heath).

21.2.3 Chain rule

One of the most useful formulas of differential calculus for scalar functions is the chain rule
(f ∘ g)′(x) = f′(g(x)) g′(x) for composite functions f ∘ g. This rule generalizes to functions
of several variables as follows (we omit the proof since later we will prove a more general
chain rule).

Theorem 957 (Chain rule) Let g : U ⊆ Rⁿ → R and f : B ⊆ R → R with Im g ⊆ B. If g
is differentiable at x ∈ U and if f is differentiable at g(x), then the composition f ∘ g : U ⊆
Rⁿ → R is differentiable at x, with

    ∇(f ∘ g)(x) = f′(g(x)) ∇g(x) = (f′(g(x)) ∂g(x)/∂x1, ..., f′(g(x)) ∂g(x)/∂xn)

In the scalar case n = 1, we get back the classic rule (f ∘ g)′(x) = f′(g(x)) g′(x).
Moreover, by Theorem 952 the differential of the composition f ∘ g is:

    d(f ∘ g)(x)(h) = f′(g(x)) Σ_{i=1}^{n} (∂g(x)/∂xi) hi                 (21.17)

The total differential form of (21.17) reads

    d(f ∘ g) = (df/dg)(∂g/∂x1) dx1 + ⋯ + (df/dg)(∂g/∂xn) dxn             (21.18)

The variation of f ∘ g can be decomposed according to the different infinitesimal variations
dxi, each of which induces the variation (∂g/∂xi) dxi on g, which in turn is scaled by df/dg
to give its effect on f. Summing these partial effects we get the overall variation d(f ∘ g).
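A finite-difference sketch (ours; it uses the same f and g as the upcoming Example 958-(i)) can confirm the formula of Theorem 957 numerically:

    import numpy as np

    g = lambda x: x[0] * x[1]**2                    # g : R^2 -> R
    f = lambda u: np.exp(2 * u)                     # f : R -> R
    grad_g = lambda x: np.array([x[1]**2, 2 * x[0] * x[1]])
    f_prime = lambda u: 2 * np.exp(2 * u)

    x = np.array([0.3, -0.7])
    analytic = f_prime(g(x)) * grad_g(x)            # chain rule gradient

    # Central finite differences of f(g(x)) for comparison.
    eps = 1e-6
    numeric = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = eps
        numeric[i] = (f(g(x + e)) - f(g(x - e))) / (2 * eps)

    print("chain rule:        ", analytic)
    print("finite differences:", numeric)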

Example 958 (i) Let f : R → R be given by f(x) = e^{2x} and let g : R² → R be given by
g(x) = x1 x2². Let us calculate with the chain rule the differential of the composite function
f ∘ g : R² → R given by

    (f ∘ g)(x) = e^{2 x1 x2²}

We have

    ∇(f ∘ g)(x) = (2 x2² e^{2 x1 x2²}, 4 x1 x2 e^{2 x1 x2²})

and therefore

    d(f ∘ g)(x)(h) = 2 e^{2 x1 x2²} (x2² h1 + 2 x1 x2 h2)

for every h ∈ R². The total differential is

    d(f ∘ g) = 2 e^{2 x1 x2²} (x2² dx1 + 2 x1 x2 dx2)

(ii) Let f : (0, ∞) → R be given by f(x) = log x and let g : R²₊₊ ∪ R²₋₋ → R be given
by g(x1, x2) = √(x1 x2). Here the function g must be restricted to R²₊₊ ∪ R²₋₋ to satisfy the
condition Im g ⊆ (0, ∞). Let us calculate with the chain rule the differential of the composite
function f ∘ g : R²₊₊ ∪ R²₋₋ → R given by

    (f ∘ g)(x) = log √(x1 x2)

We have

    ∂g(x)/∂x1 = (1/2) √(x2/x1)   and   ∂g(x)/∂x2 = (1/2) √(x1/x2)

so that

    ∇(f ∘ g)(x) = (f′(g(x)) ∂g(x)/∂x1, f′(g(x)) ∂g(x)/∂x2)
                = ( (1/√(x1 x2)) (1/2) √(x2/x1), (1/√(x1 x2)) (1/2) √(x1/x2) )
                = (1/(2x1), 1/(2x2))

and

    d(f ∘ g)(x)(h) = h1/(2x1) + h2/(2x2)

for every h ∈ R². The total differential is

    d(f ∘ g) = dx1/(2x1) + dx2/(2x2)

(iii) Let g : Rⁿ₊₊ → R and f : R₊ → R be given by g(x) = Σ_{i=1}^{n} ai xi^ρ and
f(x) = x^{1/ρ}, with ai ∈ R and ρ ≠ 0, so that f ∘ g : Rⁿ₊₊ → R is

    (f ∘ g)(x) = (Σ_{i=1}^{n} ai xi^ρ)^{1/ρ}

We have, for every x ∈ Rⁿ₊₊,

    ∇g(x) = (∂g/∂x1 (x), ..., ∂g/∂xn (x)) = (ρ a1 x1^{ρ−1}, ..., ρ an xn^{ρ−1})

so that

    ∇(f ∘ g)(x) = (f′(g(x)) ∂g(x)/∂x1, ..., f′(g(x)) ∂g(x)/∂xn)
                = ( (1/ρ) (Σ_{i=1}^{n} ai xi^ρ)^{1/ρ − 1} ρ a1 x1^{ρ−1}, ..., (1/ρ) (Σ_{i=1}^{n} ai xi^ρ)^{1/ρ − 1} ρ an xn^{ρ−1} )
                = ( a1 (Σ_{i=1}^{n} ai xi^ρ)^{1/ρ − 1} x1^{ρ−1}, ..., an (Σ_{i=1}^{n} ai xi^ρ)^{1/ρ − 1} xn^{ρ−1} )

and

    d(f ∘ g)(x)(h) = (Σ_{i=1}^{n} ai xi^ρ)^{1/ρ − 1} Σ_{i=1}^{n} ai xi^{ρ−1} hi

for every h ∈ Rⁿ. The total differential is

    d(f ∘ g) = (Σ_{i=1}^{n} ai xi^ρ)^{1/ρ − 1} Σ_{i=1}^{n} ai xi^{ρ−1} dxi

(iv) Let g : Rⁿ → R and f : R₊₊ → R be given by g(x) = Σ_{i=1}^{n} ai e^{−λ xi} and
f(x) = −(1/λ) log x, with ai ∈ R and λ ≠ 0, so that f ∘ g : Rⁿ → R is

    (f ∘ g)(x) = −(1/λ) log Σ_{i=1}^{n} ai e^{−λ xi}

We have, for every x ∈ Rⁿ,

    ∇g(x) = (∂g/∂x1 (x), ..., ∂g/∂xn (x)) = (−λ a1 e^{−λ x1}, ..., −λ an e^{−λ xn})

so that

    ∇(f ∘ g)(x) = (f′(g(x)) ∂g(x)/∂x1, ..., f′(g(x)) ∂g(x)/∂xn)
                = ( (−1/λ) (1 / Σ_{i=1}^{n} ai e^{−λ xi}) (−λ a1 e^{−λ x1}), ..., (−1/λ) (1 / Σ_{i=1}^{n} ai e^{−λ xi}) (−λ an e^{−λ xn}) )
                = ( a1 e^{−λ x1} / Σ_{i=1}^{n} ai e^{−λ xi}, ..., an e^{−λ xn} / Σ_{i=1}^{n} ai e^{−λ xi} )

and

    d(f ∘ g)(x)(h) = Σ_{i=1}^{n} (ai e^{−λ xi} / Σ_{j=1}^{n} aj e^{−λ xj}) hi = (1/g(x)) Σ_{i=1}^{n} ai e^{−λ xi} hi

for every h ∈ Rⁿ. The total differential is

    d(f ∘ g) = (1/g(x)) Σ_{i=1}^{n} ai e^{−λ xi} dxi

21.3 Partial derivatives of higher order

Consider a function f : U → R defined (at least) on an open set U in Rⁿ and partially
derivable there. As already observed (Section 21.1.3), its partial derivatives ∂f/∂xi can, in
turn, be seen as functions of n variables

    ∂f/∂xi : U → R

Example 959 The partial derivatives

    ∂f/∂x1 (x) = e^{x2}   and   ∂f/∂x2 (x) = x1 e^{x2}

of the function f(x1, x2) = x1 e^{x2} are functions on R². ▮

Hence, it makes sense to talk about the existence of partial derivatives of the partial
derivative functions ∂f/∂xi : U → R at a point x ∈ U. In this case, for every i, j = 1, ..., n
we have the partial derivative

    ∂(∂f/∂xi)/∂xj (x)

with respect to xj of the partial derivative ∂f/∂xi. These partial derivatives are called
second-order partial derivatives of f and are denoted by

    ∂²f/∂xi∂xj (x)

or by f″_{xi xj}. When i = j we write

    ∂²f/∂xi² (x)

instead of ∂²f/∂xi∂xi. Using this notation, we can construct the matrix

    ⎡ ∂²f/∂x1² (x)     ∂²f/∂x1∂x2 (x)   ⋯   ∂²f/∂x1∂xn (x) ⎤
    ⎢ ∂²f/∂x2∂x1 (x)   ∂²f/∂x2² (x)     ⋯   ∂²f/∂x2∂xn (x) ⎥
    ⎢ ⋮                 ⋮                 ⋱   ⋮               ⎥
    ⎣ ∂²f/∂xn∂x1 (x)   ∂²f/∂xn∂x2 (x)   ⋯   ∂²f/∂xn² (x)   ⎦

of second-order partial derivatives. It is called the Hessian matrix of f and is denoted by
∇²f(x).
Example 960 Let f : R³ → R be given by f(x) = e^{x1 x2} + 3 x2 x3 for x ∈ R³, and let us
compute its Hessian matrix. We have:

    ∂f/∂x1 (x) = x2 e^{x1 x2};   ∂f/∂x2 (x) = x1 e^{x1 x2} + 3x3;   ∂f/∂x3 (x) = 3x2

whence

    ∂²f/∂x1² (x) = x2² e^{x1 x2};   ∂²f/∂x1∂x2 (x) = (1 + x1 x2) e^{x1 x2};   ∂²f/∂x1∂x3 (x) = 0
    ∂²f/∂x2∂x1 (x) = (1 + x1 x2) e^{x1 x2};   ∂²f/∂x2² (x) = x1² e^{x1 x2};   ∂²f/∂x2∂x3 (x) = 3
    ∂²f/∂x3∂x1 (x) = 0;   ∂²f/∂x3∂x2 (x) = 3;   ∂²f/∂x3² (x) = 0

It follows that the Hessian matrix of f is

    ∇²f(x) = ⎡ x2² e^{x1 x2}           (1 + x1 x2) e^{x1 x2}   0 ⎤
             ⎢ (1 + x1 x2) e^{x1 x2}   x1² e^{x1 x2}           3 ⎥
             ⎣ 0                       3                       0 ⎦

▮
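As a sanity check (ours, not part of the text), one can approximate the Hessian of Example 960 by second-order finite differences and compare it with the analytic matrix; note that the numerical matrix also comes out symmetric, anticipating Schwarz's Theorem below:

    import numpy as np

    f = lambda x: np.exp(x[0] * x[1]) + 3 * x[1] * x[2]

    def hessian_fd(f, x, eps=1e-5):
        # Central finite differences for all second-order partials of f at x.
        n = len(x)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei, ej = np.zeros(n), np.zeros(n)
                ei[i], ej[j] = eps, eps
                H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                           - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
        return H

    x = np.array([0.5, 1.0, 2.0])
    print(np.round(hessian_fd(f, x), 4))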

The second-order partial derivatives can, in turn, be seen as functions of several variables.
We can therefore look for their partial derivatives, which (if they exist) are called the
third-order partial derivatives. We can then move to their partial derivatives (if they exist)
and get the fourth-order derivatives, and so on.

For instance, going back to the previous example, consider the partial derivative

    ∂²f/∂x1∂x2 (x) = (1 + x1 x2) e^{x1 x2}

The third-order derivatives exist and are

    ∂³f/∂x1∂x2∂x1 (x) = ∂(∂²f/∂x1∂x2)/∂x1 (x) = (2x2 + x1 x2²) e^{x1 x2}
    ∂³f/∂x1∂x2² (x)   = ∂(∂²f/∂x1∂x2)/∂x2 (x) = (2x1 + x1² x2) e^{x1 x2}
    ∂³f/∂x1∂x2∂x3 (x) = ∂(∂²f/∂x1∂x2)/∂x3 (x) = 0

and clearly we can go on to the fourth-order partial derivatives, etc.

Example 961 Let f : R² → R be given by f(x1, x2) = x1 x2. It is immediate that f has
continuous partial derivatives of every order. More generally, this holds for all polynomials
in several variables. ▮

The following theorem establishes a key interchangeability property of second-order
partial derivatives.

Theorem 962 (Schwarz) Let f : U → R be a function that has second-order partial
derivatives on U. If they are continuous at x ∈ U, then

    ∂²f/∂xi∂xj (x) = ∂²f/∂xj∂xi (x)                                      (21.19)

for every i, j = 1, ..., n.

Proof For simplicity we consider the case n = 2. In this case, (21.19) reduces to:

    ∂²f/∂x1∂x2 = ∂²f/∂x2∂x1                                              (21.20)

Again for simplicity, we also assume that the domain is the whole space R², so that we
consider a function f : R² → R. By definition,

    ∂f/∂x1 (x) = lim_{h1→0} [f(x1 + h1, x2) − f(x1, x2)] / h1

and therefore:

    ∂²f/∂x1∂x2 (x) = lim_{h2→0} [∂f/∂x1 (x1, x2 + h2) − ∂f/∂x1 (x1, x2)] / h2
      = lim_{h2→0} (1/h2) [ lim_{h1→0} (f(x1 + h1, x2 + h2) − f(x1, x2 + h2)) / h1
                            − lim_{h1→0} (f(x1 + h1, x2) − f(x1, x2)) / h1 ]

Let Δ : R² → R be an auxiliary function defined by:

    Δ(h1, h2) = f(x1 + h1, x2 + h2) − f(x1, x2 + h2) − f(x1 + h1, x2) + f(x1, x2)

for each (h1, h2) ∈ R². Using the function Δ, we can write:

    ∂²f/∂x1∂x2 (x) = lim_{h2→0} lim_{h1→0} Δ(h1, h2) / (h2 h1)           (21.21)

Consider in addition the scalar auxiliary function ψ1 : R → R defined by ψ1(x) =
f(x, x2 + h2) − f(x, x2) for each x ∈ R. We have:

    ψ1′(x) = ∂f/∂x1 (x, x2 + h2) − ∂f/∂x1 (x, x2)                        (21.22)

Moreover, by the Mean Value Theorem there exists z1 ∈ (x1, x1 + h1) such that

    ψ1′(z1) = [ψ1(x1 + h1) − ψ1(x1)] / h1 = Δ(h1, h2) / h1

and therefore, by (21.22), such that

    ∂f/∂x1 (z1, x2 + h2) − ∂f/∂x1 (z1, x2) = Δ(h1, h2) / h1              (21.23)

Let ψ2 : R → R be another auxiliary scalar function defined by ψ2(x) = ∂f/∂x1 (z1, x) for
each x ∈ R. We have:

    ψ2′(x) = ∂²f/∂x2∂x1 (z1, x)                                          (21.24)

By the Mean Value Theorem, there exists z2 ∈ (x2, x2 + h2) such that

    ψ2′(z2) = [ψ2(x2 + h2) − ψ2(x2)] / h2 = [∂f/∂x1 (z1, x2 + h2) − ∂f/∂x1 (z1, x2)] / h2

and therefore, by (21.24), such that

    ∂²f/∂x2∂x1 (z1, z2) = [∂f/∂x1 (z1, x2 + h2) − ∂f/∂x1 (z1, x2)] / h2

Together with (21.23), this implies that

    ∂²f/∂x2∂x1 (z1, z2) = Δ(h1, h2) / (h2 h1)                            (21.25)

Go back now to (21.21). Thanks to (21.25), expression (21.21) becomes:

    ∂²f/∂x1∂x2 (x) = lim_{h2→0} lim_{h1→0} ∂²f/∂x2∂x1 (z1, z2)           (21.26)

On the other hand, since zi ∈ (xi, xi + hi) for i = 1, 2, we have zi → xi when hi → 0. Since
∂²f/∂x2∂x1 is continuous by hypothesis at x = (x1, x2), we therefore have

    lim_{h2→0} lim_{h1→0} ∂²f/∂x2∂x1 (z1, z2) = ∂²f/∂x2∂x1 (x1, x2)      (21.27)

Putting together (21.26) and (21.27), we get (21.20), as desired.

Thus, when they are continuous, the order in which we take partial derivatives does not
matter: we can compute first the partial derivative with respect to xi and then the one with
respect to xj, or vice versa, with the same result. So, we can choose the way that seems
computationally easier, obtaining then "for free" the other second-order partial derivative.
This considerably simplifies the computation of derivatives and, moreover, results in an
elegant symmetry property of the Hessian matrix.

Example 963 (i) Let f : R³ → R be given by f(x1, x2, x3) = x1² x2 x3. Simple calculations
show that:

    ∂²f/∂x1∂x2 (x) = ∂²f/∂x2∂x1 (x) = 2 x1 x3

in accordance with Schwarz's Theorem, because the second partial derivatives are continuous.

(ii) Let f : R³ → R be given by f(x1, x2, x3) = cos(x1 x2) + e^{−x3}. The Hessian matrix
of f is

    ∇²f(x) = ⎡ −x2² cos(x1 x2)                    −sin(x1 x2) − x1 x2 cos(x1 x2)   0        ⎤
             ⎢ −sin(x1 x2) − x1 x2 cos(x1 x2)     −x1² cos(x1 x2)                  0        ⎥
             ⎣ 0                                  0                                e^{−x3}  ⎦

In accordance with Schwarz's Theorem, this matrix is symmetric. ▮

To conclude, we show a case not covered by Schwarz's Theorem.

Example 964 Let f : R² → R be given by:

    f(x1, x2) = x1 x2 (x1² − x2²) / (x1² + x2²) if (x1, x2) ≠ (0, 0),   0 if (x1, x2) = (0, 0)

The reader can verify that: (i) f has continuous partial derivatives ∂f/∂x1 and ∂f/∂x2;
(ii) f has second-order partial derivatives ∂²f/∂x1∂x2 and ∂²f/∂x2∂x1 defined on all of R²,
but discontinuous at the origin (0, 0). Therefore, the hypothesis of continuity of the
second-order partial derivatives of Schwarz's Theorem does not hold at the origin, so the
theorem cannot say anything about the behavior of these derivatives at the origin. Let us
calculate them:

    ∂²f/∂x1∂x2 (0, 0) = −1   and   ∂²f/∂x2∂x1 (0, 0) = 1

So,

    ∂²f/∂x1∂x2 (0, 0) ≠ ∂²f/∂x2∂x1 (0, 0)

The continuity of the second-order partial derivatives is, therefore, needed for the validity of
equality (21.19). ▮
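The asymmetry in Example 964 can also be observed numerically. The sketch below (ours) approximates the two mixed partials at the origin directly from their difference-quotient definitions:

    import numpy as np

    def f(x1, x2):
        if (x1, x2) == (0.0, 0.0):
            return 0.0
        return x1 * x2 * (x1**2 - x2**2) / (x1**2 + x2**2)

    def f_x1(x1, x2, eps=1e-7):   # first partial with respect to x1
        return (f(x1 + eps, x2) - f(x1 - eps, x2)) / (2 * eps)

    def f_x2(x1, x2, eps=1e-7):   # first partial with respect to x2
        return (f(x1, x2 + eps) - f(x1, x2 - eps)) / (2 * eps)

    eps = 1e-4
    mixed_12 = (f_x1(0.0, eps) - f_x1(0.0, -eps)) / (2 * eps)   # about -1
    mixed_21 = (f_x2(eps, 0.0) - f_x2(-eps, 0.0)) / (2 * eps)   # about +1
    print(mixed_12, mixed_21)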

21.4 Taking stock: the natural domain of analysis

So far we have studied partial derivability and differentiability, and established some
remarkable properties. In particular, we learned that the continuity of partial derivatives of
various orders is key for some highly desirable properties. Some terminology is, thus, in
order. We say that a function f of several variables that has continuous partial derivatives
of order n on a set E is n-times continuously differentiable on E. The set of all such
functions is denoted by Cⁿ(E), thus extending the terminology of the scalar case (Section
20.13).

In particular, C¹(E) and C²(E) are the classes of functions with continuous first-order
derivatives and with continuous first- and second-order derivatives on E, respectively.
Two fundamental results, Theorem 954 and Schwarz's Theorem, show the importance of
these classes: the former showed that the functions in C¹(E) are differentiable (and so
continuous), the latter that for the functions in C²(E) the mixed partial derivatives are equal.

The most significant results of differential calculus hold for functions of, at least, class
C¹(E), which is, therefore, the natural space in which to carry out analyses that rely on
differential methods. In applications, functions are typically assumed to belong to C¹(E).

21.5 Incremental and approximation viewpoints

21.5.1 Directional derivatives

Via the difference quotient

    lim_{h→0} [f(x + h eⁱ) − f(x)] / h                                   (21.28)

partial derivatives consider infinitesimal variations along the basic directions identified by
the vectors eⁱ. But what about the other directions? Intuitively, there are infinitely many
ways to approach a point in Rⁿ and one may wonder about infinitesimal variations along
them. In particular, are they consistent, in some sense, with the variations along the basic
directions? In this section we address this issue and, in so doing, we expatiate on the
incremental (marginal) viewpoint in multivariable differential calculus.

To take into account the infinitely many directions along which we can approach a point
in Rⁿ, we generalize the quotient (21.28) as follows:

    lim_{h→0} [f(x + hy) − f(x)] / h

This limit represents the infinitesimal increments of the function f at the point x when we
move along the direction determined by the vector y of Rⁿ, which is no longer required to
be a versor eⁱ. This suggests the following definition.

Definition 965 A function f : U → R is said to be derivable at a point x ∈ U if, for each
y ∈ Rⁿ, the limit

    f′(x; y) = lim_{h→0} [f(x + hy) − f(x)] / h                          (21.29)

exists and is finite. This limit is called the directional derivative of f at x along the direction
y.

The function f′(x; ·) : Rⁿ → R is called the directional derivative of f at x.¹¹ To better
understand this notion, observe that, given any two vectors x, y ∈ Rⁿ, the straight line ⟨x, y⟩
that passes through them is given by

    ⟨x, y⟩ = {(1 − h) x + h y : h ∈ R}

Going back to (21.29), we have

    f(x + hy) = f((1 − h) x + h (x + y))

Therefore, the ratio

    [f(x + hy) − f(x)] / h

¹¹ Note that directional derivatives only consider "linear" approaches to a point x, namely along straight lines. In Section 11.3.2 we saw that there are highly nonlinear ways to approach a point.

tells us the "incremental" behavior of the function when we move along the line
⟨x, x + y⟩. Each y ∈ Rⁿ identifies a line and, therefore, gives us a direction along which we
can study the increments of the function.

Not all lines ⟨x, x + y⟩ identify different directions: the next result shows that, given a
vector y ∈ Rⁿ, all vectors αy with α ≠ 0 identify the same direction.

Proposition 966 Given a point x ∈ Rⁿ, for each y, y′ ∈ Rⁿ we have ⟨x, x + y⟩ = ⟨x, x + y′⟩
if and only if there exists α ≠ 0 such that y′ = αy.

Proof "If". Suppose that y′ = αy with α ≠ 0. We have

    x + y′ = x + αy = (1 − α) x + α (x + y)

and therefore x + y′ ∈ ⟨x, x + y⟩. This implies ⟨x, x + y′⟩ ⊆ ⟨x, x + y⟩. Since y = (1/α) y′,
by proceeding in a similar way we can prove that ⟨x, x + y⟩ ⊆ ⟨x, x + y′⟩. We conclude that
⟨x, x + y⟩ = ⟨x, x + y′⟩. "Only if". Suppose that ⟨x, x + y′⟩ = ⟨x, x + y⟩ and that y ≠ y′
(otherwise the result is trivially true). At least one of them then has to be non-zero, say y′.
Since x + y′ ∈ ⟨x, x + y⟩ and y′ ≠ 0, there exists h ≠ 0 such that x + y′ = (1 − h) x + h (x + y).
This implies y′ = hy and therefore, by setting α = h, we have the desired result.

The next corollary shows that this redundancy of the directions translates, in a simple
and elegant way, into the homogeneity of the directional derivative, a property that permits
us to determine the value f′(x; αy) for every scalar α once we know the value of f′(x; y).

Corollary 967 If f is derivable at a point x ∈ U, then the directional derivative f′(x; ·) :
Rⁿ → R is homogeneous, i.e., for every α ∈ R and every y ∈ Rⁿ, we have

    f′(x; αy) = α f′(x; y)                                               (21.30)

Proof Let α ≠ 0. Since h → 0 if and only if αh → 0, we have:

    lim_{h→0} [f(x + (αh) y) − f(x)] / (αh) = lim_{αh→0} [f(x + (αh) y) − f(x)] / (αh) = f′(x; y)

Dividing and multiplying by α, we therefore have:

    lim_{h→0} [f(x + h (αy)) − f(x)] / h = α lim_{h→0} [f(x + (αh) y) − f(x)] / (αh) = α f′(x; y)

It follows that the limit

    f′(x; αy) = lim_{h→0} [f(x + h (αy)) − f(x)] / h

exists, is finite, and is equal to α f′(x; y), as desired. On the other hand, if α = 0 we have

    f′(x; αy) = f′(x; 0) = lim_{h→0} [f(x + 0) − f(x)] / h = 0

Therefore, f′(x; αy) = 0 = α f′(x; y), which completes the proof.

Partial derivatives are nothing but the directional derivatives computed along the
fundamental directions in Rⁿ represented by the versors eⁱ. That is,

    f′(x; eⁱ) = ∂f(x)/∂xi

for each i = 1, 2, ..., n. So, functions that are derivable at x are partially derivable there.
The converse is false, as the next example shows.

Example 968 In Example 947 we showed that the function f : R² → R defined by

    f(x1, x2) = 0 if x1 x2 = 0,   1 if x1 x2 ≠ 0

is partially derivable at the origin. However, it is not derivable at the origin 0 = (0, 0).
Indeed, consider x = 0 and y = (1, 1). We have

    [f(x + hy) − f(x)] / h = f(h, h) / h = 1/h    ∀h ≠ 0

so the limit (21.29) does not exist, and the function is not derivable at 0. ▮

In sum, partial derivability is a weaker notion than derivability, which is not surprising:
in R², for instance, the former notion controls only two directions out of the infinitely many
ones controlled by the latter notion.

21.5.2 Algebra

Like that of partial derivatives, the calculus of directional derivatives can also be reduced to
the calculus of ordinary derivatives of scalar functions. Given a point x ∈ Rⁿ and a direction
y ∈ Rⁿ, define an auxiliary scalar function φ as φ(h) = f(x + hy) for every h ∈ R. The
domain of φ is the set {h ∈ R : x + hy ∈ U}, which is an open set in R containing the point
0. By definition of right-sided derivative, we have

    φ′₊(0) = lim_{h→0⁺} [φ(h) − φ(0)] / h = lim_{h→0⁺} [f(x + hy) − f(x)] / h

and therefore

    f′(x; y) = φ′₊(0)                                                    (21.31)

The derivative f′(x; y) can therefore be seen as the right-sided ordinary derivative of the
scalar function φ computed at the point 0. Naturally, when φ is differentiable at 0, (21.31)
reduces to f′(x; y) = φ′(0).
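This reduction is easy to mimic numerically. The sketch below (ours; the function and direction anticipate Example 969-(i)) builds the auxiliary function φ(h) = f(x + hy) and differentiates it at 0 by a central difference:

    import numpy as np

    f = lambda x: x[0]**2 + x[1]**2 + x[2]**2        # f(x) = ||x||^2
    x = np.array([1.0, -1.0, 2.0])
    y = np.array([2.0, 3.0, 5.0])

    phi = lambda h: f(x + h * y)                     # auxiliary scalar function
    eps = 1e-6
    dir_deriv = (phi(eps) - phi(-eps)) / (2 * eps)   # phi'(0) = f'(x; y)

    print(dir_deriv, 2 * x @ y)                      # both equal 18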

Example 969 (i) Let f : R³ → R be defined by f(x1, x2, x3) = x1² + x2² + x3². Let us
compute the directional derivative of f at x = (1, −1, 2) along the direction y = (2, 3, 5).
We have:

    x + hy = (1 + 2h, −1 + 3h, 2 + 5h)

and therefore

    φ(h) = f(x + hy) = (1 + 2h)² + (−1 + 3h)² + (2 + 5h)²

It follows that φ′(h) = 76h + 18 and, by (21.31), we conclude that f′(x; y) = φ′(0) = 18.

(ii) Let us generalize the previous example and consider the function f : Rⁿ → R defined
by f(x) = ‖x‖². We have

    φ′(h) = (d/dh) Σ_{i=1}^{n} (xi + h yi)² = 2 Σ_{i=1}^{n} yi (xi + h yi) = 2 y · (x + hy)

Therefore, f′(x; y) = φ′(0) = 2 x · y. The directional derivative of f(x) = ‖x‖² thus exists
at all points and along all possible directions, that is, f is derivable on Rⁿ. Its general form
is

    f′(x; y) = 2 x · y

For the point and direction of part (i), we indeed have f′(x; y) = 2 (1, −1, 2) · (2, 3, 5) = 18.

(iii) Consider the function f : R² → R defined by

    f(x1, x2) = x1 x2² / (x1² + x2²) if (x1, x2) ≠ (0, 0),   0 if (x1, x2) = (0, 0)

Consider the origin 0 = (0, 0). For every y ∈ R² we have φ(h) = f(hy) = h y1 y2² / (y1² + y2²)
and so f′(0; y) = φ′(0) = y1 y2² / (y1² + y2²). In conclusion,

    f′(0; y) = f(y)

for every y ∈ R². So, the function f is derivable at the origin and equals its own directional
derivative there. ▮

Using the auxiliary functions φ, it is easy to prove that the usual algebraic rules hold for
directional derivatives:

(i) (αf + βg)′(x; y) = α f′(x; y) + β g′(x; y);

(ii) (fg)′(x; y) = f′(x; y) g(x) + f(x) g′(x; y);

(iii) (f/g)′(x; y) = [f′(x; y) g(x) − f(x) g′(x; y)] / g²(x).

21.5.3 The two viewpoints

Derivability is conceptually important in that it represents, via the directional derivative
f′(x; ·) : Rⁿ → R, the incremental, marginal, behavior of a function f : U → R at a point
x ∈ U.

Differentiability, on the other hand, represents the linear approximation standpoint
(Section 21.2), the other fundamental viewpoint that we have learned characterizes
differential calculus. Remarkably, for functions of a single variable the two viewpoints are
equivalent, as Theorem 936 showed by proving that, at a given point, a scalar function is
derivable if and only if it is differentiable. We will now show that for functions of several
variables this equivalence no longer holds, thus making it all the more important to
distinguish the two viewpoints.

Theorem 970 If a function f : U → R is differentiable at a point x ∈ U, then it is derivable
at x, with

    f′(x; y) = df(x)(y) = ∇f(x) · y    ∀y ∈ Rⁿ                           (21.32)

Thus, differentiability implies derivability. Moreover, from the incremental behavior
along the basic directions (i.e., from the partial derivatives) we can retrieve such behavior
along any direction through linear combinations. Under differentiability, incremental
behavior is thus consistent across directions.

The next example shows that the converse of the previous theorem is false: derivability
does not imply differentiability. It also shows that, without differentiability, incremental
behavior might fail to be consistent across directions.

Example 971 In Example 969-(iii) we studied a function f : R² → R that, at the origin
0 = (0, 0), has directional derivative f′(0; y) = f(y). Since the function f is not linear, the
directional derivative f′(0; ·) : R² → R is not a linear function, so it cannot coincide with
the differential (which, by definition, is a linear function). Hence, in view of the last theorem
we can say that f is not differentiable at 0; otherwise, equality (21.32) would hold.

In sum, this example shows that a function derivable at a point might not be differentiable
at that point. The nonlinear nature of the directional derivative f′(0; ·) also shows how
unrelated the behavior along different directions may be. ▮

We have already learned that partial derivability does not imply differentiability (Example
953). Now we see that even full-fledged derivability is not enough to imply differentiability.
It is, indeed, not even enough to imply continuity: there exist functions that are derivable
at some point but discontinuous there, as the following example shows.

Example 972 Let f : R² → R be defined by

    f(x1, x2) = x1⁴ x2² / (x1⁸ + x2⁴) if (x1, x2) ≠ (0, 0),   0 if (x1, x2) = (0, 0)

If we set x = 0 = (0, 0), for every y ∈ R² we have:

    f′(0; y) = lim_{h→0} f(hy) / h = lim_{h→0} (hy1)⁴ (hy2)² / (h [(hy1)⁸ + (hy2)⁴])
             = lim_{h→0} h⁶ y1⁴ y2² / (h⁵ [h⁴ y1⁸ + y2⁴]) = lim_{h→0} h y1⁴ y2² / (h⁴ y1⁸ + y2⁴) = 0

Therefore, f′(0; y) = 0 for every y ∈ R² and the directional derivative at the origin 0 is then
the null linear function. It follows that f is derivable at 0. However, it is not continuous
at 0 (a fortiori, it is not differentiable at 0 by Theorem 952). Indeed, consider the points
(t, t²) ∈ R² that lie on the graph of the parabola x2 = x1². We have

    f(t, t²) = t⁴ (t²)² / (t⁸ + (t²)⁴) = t⁸ / (t⁸ + t⁸) = 1/2

Along these points the function is constant and takes the value 1/2. It follows that
lim_{t→0} f(t, t²) = 1/2 and, since f(0) = 0, the function is discontinuous at 0. ▮
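Both behaviors in Example 972 can be seen numerically: the directional quotients vanish in every direction, while the values along the parabola x2 = x1² stay at 1/2. A sketch (ours):

    import numpy as np

    def f(x1, x2):
        if (x1, x2) == (0.0, 0.0):
            return 0.0
        return x1**4 * x2**2 / (x1**8 + x2**4)

    y = np.array([1.0, 1.0])                  # an arbitrary direction
    for h in [1e-1, 1e-2, 1e-3]:
        print("quotient along y:", f(*(h * y)) / h)   # tends to 0

    for t in [1e-1, 1e-2, 1e-3]:
        print("f(t, t^2):", f(t, t * t))              # constant 1/2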

Summing up, we have just learned that:

- differentiability implies derivability (Theorem 970), but not vice versa when n ≥ 2
  (Example 971);

- derivability does not imply continuity when n ≥ 2 (Example 972).

These relations sharpen some of the findings of Section 21.2.1 on partial derivability.

21.6 Differential of operators

21.6.1 Representation

In Section 21.2 we noted that the differential df(x) : Rⁿ → R of a function f : U → R is such
that

    lim_{h→0} [f(x + h) − f(x) − df(x)(h)] / ‖h‖ = 0

or, equivalently,

    lim_{h→0} |f(x + h) − f(x) − df(x)(h)| / ‖h‖ = 0

This suggests the following generalization of the definition of differential to the case of
operators.

Definition 973 An operator f : U → Rᵐ is said to be differentiable at a point x ∈ U if
there exists a linear operator df(x) : Rⁿ → Rᵐ such that

    lim_{h→0} ‖f(x + h) − f(x) − df(x)(h)‖ / ‖h‖ = 0                     (21.33)

The operator df(x) is said to be the differential of f at x.

This definition generalizes Definition 951, which is the special case m = 1. The linear
approximation is now given by a linear operator with values in Rᵐ, while in the numerator
of the incremental ratio in (21.33) we find a norm instead of an absolute value because we
now have to deal with vectors in Rᵐ.

The differential of operators satisfies properties similar to those we saw in the case
m = 1. Naturally, instead of the vector representation of Theorem 952 we now have a more
general matrix representation based on the operator version of Riesz's Theorem (Theorem
564). To see its form, we introduce the Jacobian matrix. Recall that an operator f : U → Rᵐ
can be regarded as an m-tuple (f1, ..., fm) of functions defined on U and with values in R.
The Jacobian matrix Df(x) of an operator f : U → Rᵐ at x ∈ U is, then, the m × n matrix
given by:

    Df(x) = ⎡ ∂f1/∂x1 (x)   ∂f1/∂x2 (x)   ⋯   ∂f1/∂xn (x) ⎤
            ⎢ ∂f2/∂x1 (x)   ∂f2/∂x2 (x)   ⋯   ∂f2/∂xn (x) ⎥
            ⎢ ⋮              ⋮              ⋱   ⋮            ⎥
            ⎣ ∂fm/∂x1 (x)   ∂fm/∂x2 (x)   ⋯   ∂fm/∂xn (x) ⎦

that is,

    Df(x) = ⎡ ∇f1(x) ⎤
            ⎢ ∇f2(x) ⎥
            ⎢ ⋮       ⎥
            ⎣ ∇fm(x) ⎦                                                   (21.34)
We can now give the matrix representation of differentials, which shows that the Jacobian
matrix Df(x) is, indeed, the matrix associated with the linear operator df(x). This
representation generalizes the vector representation of Theorem 952 because the Jacobian
matrix Df(x) reduces to the gradient ∇f(x) in the special case m = 1.

Theorem 974 Let f : U → Rᵐ be differentiable at x ∈ U. Then,

    df(x)(h) = Df(x) h    ∀h ∈ Rⁿ

Proof We begin by considering a simple property of the norm. Let x = (x1, ..., xn) ∈ Rⁿ.
For every j = 1, ..., n we have:

    |xj| = √(xj²) ≤ √(Σ_{j=1}^{n} xj²) = ‖x‖                             (21.35)

Now assume that f is differentiable at x ∈ U. Set h = t eʲ with j = 1, ..., n. By definition,

    lim_{t→0} ‖f(x + t eʲ) − f(x) − df(x)(t eʲ)‖ / ‖t eʲ‖ = 0

and therefore, since ‖t eʲ‖ = |t| and a vector and its opposite have the same norm,

    lim_{t→0} ‖ [f(x + t eʲ) − f(x)] / t − df(x)(eʲ) ‖ = 0               (21.36)

From inequality (21.35), for each i = 1, ..., m we have

    | [fi(x + t eʲ) − fi(x)] / t − dfi(x)(eʲ) | ≤ ‖ [f(x + t eʲ) − f(x)] / t − df(x)(eʲ) ‖

Together with (21.36), this implies

    lim_{t→0} | [fi(x + t eʲ) − fi(x)] / t − dfi(x)(eʲ) | = 0

for each i = 1, ..., m. We can therefore conclude that, for every i = 1, ..., m and every
j = 1, ..., n, we have:

    ∂fi/∂xj (x) = lim_{t→0} [fi(x + t eʲ) − fi(x)] / t = dfi(x)(eʲ)      (21.37)

The matrix associated with a linear operator f : Rⁿ → Rᵐ is (Theorem 564):

    A = [f(e¹), f(e²), ..., f(eⁿ)]

In our case, thanks to (21.37) we therefore have

    A = [df(x)(e¹), ..., df(x)(eⁿ)]
      = ⎡ df1(x)(e¹)   df1(x)(e²)   ⋯   df1(x)(eⁿ) ⎤
        ⎢ df2(x)(e¹)   df2(x)(e²)   ⋯   df2(x)(eⁿ) ⎥
        ⎢ ⋮             ⋮             ⋱   ⋮           ⎥
        ⎣ dfm(x)(e¹)   dfm(x)(e²)   ⋯   dfm(x)(eⁿ) ⎦
      = ⎡ ∂f1/∂x1 (x)   ⋯   ∂f1/∂xn (x) ⎤
        ⎢ ⋮              ⋱   ⋮            ⎥  = Df(x)
        ⎣ ∂fm/∂x1 (x)   ⋯   ∂fm/∂xn (x) ⎦

as desired.

Example 975 The Hessian matrix of a function f : A ⊆ Rⁿ → R is the Jacobian matrix of
its gradient (derivative) operator ∇f, as the reader can easily check. ▮

Example 976 Let f : R³ → R² be defined by f(x1, x2, x3) = (2x1² + x2 − x3, x1 − x2⁴). For
example, if x = (2, 5, 3), then f(x) = (2·4 + 5 − 3, 2 − 625) = (10, −623) ∈ R².
We have:

    f1(x1, x2, x3) = 2x1² + x2 − x3;   f2(x1, x2, x3) = x1 − x2⁴

and so

    Df(x) = ⎡ 4x1   1       −1 ⎤
            ⎣ 1     −4x2³   0  ⎦

By Theorem 974, the differential at x is given by the linear operator df(x) : R³ → R² defined
by

    df(x)(h) = Df(x) h = (4x1 h1 + h2 − h3, h1 − 4x2³ h2)

for each h ∈ R³. For example, at x = (2, 5, 3) we have df(x)(h) = (8h1 + h2 − h3, h1 − 500h2).
▮
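A finite-difference check of the Jacobian in Example 976 (a sketch of ours, not part of the text):

    import numpy as np

    f = lambda x: np.array([2 * x[0]**2 + x[1] - x[2], x[0] - x[1]**4])

    def jacobian_fd(f, x, eps=1e-6):
        # Approximate the Jacobian of f at x, one column per variable.
        cols = []
        for j in range(len(x)):
            e = np.zeros(len(x)); e[j] = eps
            cols.append((f(x + e) - f(x - e)) / (2 * eps))
        return np.column_stack(cols)

    x = np.array([2.0, 5.0, 3.0])
    print(np.round(jacobian_fd(f, x), 3))   # rows (4x1, 1, -1) and (1, -4x2^3, 0)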

Example 977 Let f : R → R³ be defined by f(x) = (x, sin x, cos x). For example, if x = π,
then f(x) = (π, 0, −1) ∈ R³. We have:

    f1(x) = x;   f2(x) = sin x;   f3(x) = cos x

and so

    Df(x) = ⎡ 1      ⎤
            ⎢ cos x  ⎥
            ⎣ −sin x ⎦

By Theorem 974, the differential at x is given by the linear operator df(x) : R → R³ defined
by

    df(x)(h) = Df(x) h = (h, h cos x, −h sin x)

for each h ∈ R. For example, at x = π we have df(x)(h) = (h, −h, 0). ▮

Example 978 Let f : Rⁿ → Rᵐ be the linear operator defined by f(x) = Ax, with

    A = ⎡ a11   a12   ⋯   a1n ⎤
        ⎢ a21   a22   ⋯   a2n ⎥
        ⎢ ⋮     ⋮      ⋱   ⋮   ⎥
        ⎣ am1   am2   ⋯   amn ⎦

Let a¹, ..., aᵐ be the row vectors of A, that is, a¹ = (a11, a12, ..., a1n), ..., aᵐ = (am1, am2, ..., amn).
We have:

    f1(x1, ..., xn) = a¹ · x = a11 x1 + ⋯ + a1n xn
    f2(x1, ..., xn) = a² · x = a21 x1 + ⋯ + a2n xn
    ⋮
    fm(x1, ..., xn) = aᵐ · x = am1 x1 + ⋯ + amn xn

which implies Df(x) = A. Hence, the Jacobian matrix of a linear operator coincides with
the associated matrix A. By Theorem 974, the differential at x is therefore given by the
linear operator Ah itself. This naturally generalizes the well-known result that for scalar
functions of the form f(x) = ax, with a ∈ R, the differential is df(x)(h) = ah. ▮

21.6.2 Chain rule

Next we state the chain rule for operators, the most general form of this rule that we study.

Theorem 979 Let g : U ⊆ Rⁿ → Rᵐ and f : B ⊆ Rᵐ → R^q with g(U) ⊆ B. If g is
differentiable at x ∈ U and f is differentiable at g(x), then the composition f ∘ g : U ⊆
Rⁿ → R^q is differentiable at x, with

    d(f ∘ g)(x) = df(g(x)) ∘ dg(x)                                       (21.38)

The right-hand side is the product of the linear operators df(g(x)) and dg(x). By
Theorem 569, its matrix representation is given by the product Df(g(x)) Dg(x) of the
Jacobian matrices. We thus have the fundamental chain rule formula:

    D(f ∘ g)(x) = Df(g(x)) Dg(x)                                         (21.39)

In the scalar case n = m = q = 1, the rule takes its basic form (f ∘ g)′(x) = f′(g(x)) g′(x),
studied in Proposition 925.

Another important special case is q = 1. In this case we have f : B ⊆ Rᵐ → R and
g = (g1, ..., gm) : U ⊆ Rⁿ → Rᵐ, with g(U) ⊆ B. For the composite function f ∘ g : U ⊆
Rⁿ → R the chain rule takes the form:

    ∇(f ∘ g)(x) = ∇f(g(x)) Dg(x)
                = (∂f/∂x1 (g(x)), ..., ∂f/∂xm (g(x))) ⎡ ∂g1/∂x1 (x)  ⋯  ∂g1/∂xn (x) ⎤
                                                       ⎢ ⋮             ⋱  ⋮            ⎥
                                                       ⎣ ∂gm/∂x1 (x)  ⋯  ∂gm/∂xn (x) ⎦
                = ( Σ_{i=1}^{m} ∂f/∂xi (g(x)) ∂gi/∂x1 (x), ..., Σ_{i=1}^{m} ∂f/∂xi (g(x)) ∂gi/∂xn (x) )

As to the differential, for each h ∈ Rⁿ we have

    d(f ∘ g)(x)(h) = ∇(f ∘ g)(x) · h
                   = Σ_{i=1}^{m} ∂f/∂xi (g(x)) ∂gi/∂x1 (x) h1 + ⋯ + Σ_{i=1}^{m} ∂f/∂xi (g(x)) ∂gi/∂xn (x) hn

Grouping the terms in ∂f/∂xi, we get the following equivalent form:

    d(f ∘ g)(x)(h) = ∂f/∂x1 (g(x)) Σ_{i=1}^{n} ∂g1/∂xi (x) hi + ⋯ + ∂f/∂xm (g(x)) Σ_{i=1}^{n} ∂gm/∂xi (x) hi

which can be reformulated in the following imprecise, yet expressive, way:

    d(f ∘ g) = Σ_{i=1}^{n} [ (∂f/∂g1)(∂g1/∂xi) dxi + ⋯ + (∂f/∂gm)(∂gm/∂xi) dxi ]    (21.40)

This is the formula of the total differential for the composite function f ∘ g. The total
variation d(f ∘ g) of f ∘ g is the result of the sum of the effects on the function f of the
variations of the single functions gi determined by infinitesimal variations dxi of the different
variables.

In the next two points we consider two subcases of the case q = 1.

(i) When q = m = 1 we return, with f : B ⊆ R → R and g : U ⊆ Rⁿ → R, to the chain
rule ∇(f ∘ g)(x) = f′(g(x)) ∇g(x) of Theorem 957. It corresponds to the differential
(21.17).

(ii) Suppose q = n = 1. Let f : B ⊆ Rᵐ → R and g : U ⊆ R → Rᵐ, with g(U) ⊆ B. The
composite function f ∘ g : U ⊆ R → R is scalar and for this function we have:

    (f ∘ g)′(x) = ∇f(g(x)) Dg(x) = (∂f/∂x1 (g(x)), ..., ∂f/∂xm (g(x))) ⎡ dg1/dx (x) ⎤
                                                                       ⎢ ⋮          ⎥
                                                                       ⎣ dgm/dx (x) ⎦
                = Σ_{i=1}^{m} ∂f/∂xi (g(x)) dgi/dx (x)

The differential is

    d(f ∘ g)(x)(h) = Σ_{i=1}^{m} ∂f/∂xi (g(x)) dgi/dx (x) h

for each h ∈ R, and the total differential (21.40) becomes:

    d(f ∘ g) = (∂f/∂g1)(dg1/dx) dx + ⋯ + (∂f/∂gm)(dgm/dx) dx

Example 980 To illustrate subcase (ii), consider a production function f : Rᵐ → R whose
m inputs depend on a common parameter, time t, which indicates the availability of the
different inputs at t. Inputs are then represented by a function g = (g1, ..., gm) : R → Rᵐ,
where gi(t) denotes the quantity of input i available at time t. The composition f ∘ g : R → R
is a scalar function that tells us how the output varies with the parameter t. We have

    d(f ∘ g) = (∂f/∂g1)(dg1/dt) dt + ⋯ + (∂f/∂gm)(dgm/dt) dt             (21.41)

that is, the total variation d(f ∘ g) of the output is the result of the sum of the effects that
the variations in the availability of the different inputs, due to infinitesimal variations dt
of time, have on the production function. In this example, (21.41) has therefore a clear
economic interpretation.

More concretely, let g : R → R³ be defined by g(t) = (1/t, 3/t, e⁻ᵗ) for t ≠ 0, and let
f : R³ → R be defined by f(x1, x2, x3) = 3x1² − x1x2 + 6x1x3. We have:

    (f ∘ g)′(t) = ∂f/∂x1 (g(t)) g1′(t) + ∂f/∂x2 (g(t)) g2′(t) + ∂f/∂x3 (g(t)) g3′(t)
                = (3/t + 6e⁻ᵗ)(−1/t²) + (−1/t)(−3/t²) + (6/t)(−e⁻ᵗ)
                = −6e⁻ᵗ (1/t² + 1/t)

Therefore,

    d(f ∘ g)(t)(h) = −6e⁻ᵗ (1/t² + 1/t) h    ∀h ∈ R

and the total differential (21.41) is

    d(f ∘ g) = −6e⁻ᵗ (1/t² + 1/t) dt

▮

Next we give a chain rule example with q ≠ 1.

Example 981 Consider the operators f : R² → R² defined by f(x1, x2) = (x1, x1 x2) and
g : R³ → R² defined by g(x1, x2, x3) = (2x1² + x2 − x3, x1 − x2⁴). Since both f and g are
differentiable at each point of their domains, by the chain rule the composition f ∘ g : R³ → R²
is itself differentiable at each point of its domain R³. By the chain rule, the Jacobian matrix
of f ∘ g : R³ → R² is given by:

    D(f ∘ g)(x) = Df(g(x)) Dg(x)

In Example 976 we saw that

    Dg(x) = ⎡ 4x1   1       −1 ⎤
            ⎣ 1     −4x2³   0  ⎦

On the other hand, we also know that:

    Df(x) = ⎡ 1    0  ⎤
            ⎣ x2   x1 ⎦

and therefore

    Df(g(x)) = ⎡ 1          0              ⎤
               ⎣ x1 − x2⁴   2x1² + x2 − x3 ⎦

It follows that:

    Df(g(x)) Dg(x)
      = ⎡ 1          0              ⎤ ⎡ 4x1   1       −1 ⎤
        ⎣ x1 − x2⁴   2x1² + x2 − x3 ⎦ ⎣ 1     −4x2³   0  ⎦
      = ⎡ 4x1                         1                               −1        ⎤
        ⎣ 6x1² − 4x1x2⁴ + x2 − x3     x1 − 8x1²x2³ − 5x2⁴ + 4x2³x3    −x1 + x2⁴ ⎦

which implies that the differential of f ∘ g at x is given by the linear operator d(f ∘ g)(x) :
R³ → R² defined by

    d(f ∘ g)(x)(h)
      = ⎡ 4x1                         1                               −1        ⎤ ⎡ h1 ⎤
        ⎣ 6x1² − 4x1x2⁴ + x2 − x3     x1 − 8x1²x2³ − 5x2⁴ + 4x2³x3    −x1 + x2⁴ ⎦ ⎢ h2 ⎥
                                                                                  ⎣ h3 ⎦

For example, at x = (2, 1, 1) we have:

    d(f ∘ g)(x)(h) = (8h1 + h2 − h3, 16h1 − 31h2 − h3)

Naturally, though it is in general more complicated, the Jacobian matrix of the composition
f ∘ g can be computed directly, without using the chain rule, by writing explicitly the form
of f ∘ g and by computing its partial derivatives. In this example, f ∘ g : R³ → R² is given
by

    (f ∘ g)(x1, x2, x3) = (2x1² + x2 − x3, (x1 − x2⁴)(2x1² + x2 − x3))
                        = (2x1² + x2 − x3, 2x1³ + x1x2 − x1x3 − 2x1²x2⁴ − x2⁵ + x2⁴x3)

Therefore,

    (f ∘ g)1(x) = 2x1² + x2 − x3
    (f ∘ g)2(x) = 2x1³ + x1x2 − x1x3 − 2x1²x2⁴ − x2⁵ + x2⁴x3

and we have:

    ∂(f ∘ g)1/∂x1 = 4x1;   ∂(f ∘ g)1/∂x2 = 1;   ∂(f ∘ g)1/∂x3 = −1

    ∂(f ∘ g)2/∂x1 = 6x1² − 4x1x2⁴ + x2 − x3
    ∂(f ∘ g)2/∂x2 = x1 − 8x1²x2³ − 5x2⁴ + 4x2³x3
    ∂(f ∘ g)2/∂x3 = −x1 + x2⁴

The Jacobian matrix

    ⎡ ∂(f ∘ g)1/∂x1   ∂(f ∘ g)1/∂x2   ∂(f ∘ g)1/∂x3 ⎤
    ⎣ ∂(f ∘ g)2/∂x1   ∂(f ∘ g)2/∂x2   ∂(f ∘ g)2/∂x3 ⎦

coincides with the one found through the chain rule. ▮
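The matrix identity (21.39) for this example can also be verified numerically. The following sketch (ours) compares the product of the two analytic Jacobians with a finite-difference Jacobian of f ∘ g at x = (2, 1, 1):

    import numpy as np

    g = lambda x: np.array([2 * x[0]**2 + x[1] - x[2], x[0] - x[1]**4])
    f = lambda u: np.array([u[0], u[0] * u[1]])
    Dg = lambda x: np.array([[4 * x[0], 1.0, -1.0],
                             [1.0, -4 * x[1]**3, 0.0]])
    Df = lambda u: np.array([[1.0, 0.0],
                             [u[1], u[0]]])

    def jacobian_fd(F, x, eps=1e-6):
        cols = [(F(x + e) - F(x - e)) / (2 * eps) for e in np.eye(len(x)) * eps]
        return np.column_stack(cols)

    x = np.array([2.0, 1.0, 1.0])
    print(Df(g(x)) @ Dg(x))                                 # chain rule product
    print(np.round(jacobian_fd(lambda t: f(g(t)), x), 3))   # direct approximation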

We close with an interesting application of the chain rule. A function f : Rⁿ₊ → R is
(positively) homogeneous of order α ∈ R if f(tx) = t^α f(x) for each t > 0 and x ∈ Rⁿ₊.¹²

Corollary 982 Let f : Rⁿ₊ → R be homogeneous of order α. If f is differentiable on Rⁿ₊₊,
then the derivative operator ∇f : Rⁿ₊₊ → Rⁿ is such that

    ∇f(x) · x = α f(x)    ∀x ∈ Rⁿ₊₊                                      (21.42)

Proof Fix x ∈ Rⁿ₊₊ and consider the scalar function φ : (0, ∞) → R defined by φ(t) =
f(tx). If we define g : (0, ∞) → Rⁿ₊₊ by g(t) = tx, we can write φ = f ∘ g. By (21.41),
we have φ′(t) = ∇f(tx) · x. On the other hand, homogeneity implies φ(t) = t^α f(x), so
φ′(t) = α t^{α−1} f(x). We conclude that ∇f(tx) · x = α t^{α−1} f(x). For t = 1, this is
Euler's Formula.

Equality (21.42) is called Euler's Formula.¹³ The most interesting cases are α = 0 and
α = 1. For instance, the indirect utility function v : Rⁿ₊₊ × R₊ → R is easily seen to be
homogeneous of degree 0 (cf. Proposition 848). By Euler's Formula, we have:

    Σ_{i=1}^{n} (∂v(p, w)/∂pi) pi = − (∂v(p, w)/∂w) w

for all (p, w) ∈ R^{n+1}₊₊.

¹² If f is positively homogeneous on Rⁿ₊, then it is homogeneous of order 1 on Rⁿ₊. This notion is thus consistent with what we did in Chapter 15.
¹³ The reader can also check that the partial derivatives are homogeneous of order α − 1.
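Euler's Formula is easy to test numerically. The sketch below (ours; the Cobb-Douglas function is an illustrative choice, homogeneous of order α = a1 + a2) checks (21.42) at an arbitrary point:

    import numpy as np

    a = np.array([0.3, 0.7])              # exponents; the order is their sum
    f = lambda x: np.prod(x**a)
    grad_f = lambda x: a * f(x) / x       # for Cobb-Douglas: df/dxi = ai f(x)/xi

    x = np.array([2.0, 5.0])
    alpha = a.sum()
    print(grad_f(x) @ x, alpha * f(x))    # equal, as (21.42) predicts

    t = 3.0                               # homogeneity itself: f(tx) = t^alpha f(x)
    print(f(t * x), t**alpha * f(x))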

21.6.3 Proof of the chain rule (Theorem 979)

We show that (21.38) holds, i.e., that

    lim_{h→0} ‖(f ∘ g)(x + h) − (f ∘ g)(x) − (df(g(x)) ∘ dg(x))(h)‖ / ‖h‖ = 0    (21.43)

Set

    α(h) = g(x + h) − g(x) − dg(x)(h)
    β(k) = f(g(x) + k) − f(g(x)) − df(g(x))(k)

We have

    (f ∘ g)(x + h) − (f ∘ g)(x) − (df(g(x)) ∘ dg(x))(h)
      = f(g(x + h)) − f(g(x)) − df(g(x))(dg(x)(h))
      = f(g(x + h)) − f(g(x)) − df(g(x))(g(x + h) − g(x) − α(h))
      = f(g(x + h)) − f(g(x)) − df(g(x))(g(x + h) − g(x)) + df(g(x))(α(h))
      = β(g(x + h) − g(x)) + df(g(x))(α(h))

To prove (21.43) thus amounts to proving that

    lim_{h→0} ‖β(g(x + h) − g(x)) + df(g(x))(α(h))‖ / ‖h‖ = 0            (21.44)

Consider the linear operator df(g(x)). By Lemma 730, there exists k1 > 0 such that
‖df(g(x))(h)‖ ≤ k1 ‖h‖ for each h ∈ Rᵐ. Since α(h) ∈ Rᵐ for each h ∈ Rⁿ, we therefore
have ‖df(g(x))(α(h))‖ ≤ k1 ‖α(h)‖. On the other hand, g is differentiable at x, and so
lim_{h→0} ‖α(h)‖ / ‖h‖ = 0. It follows that

    lim_{h→0} ‖df(g(x))(α(h))‖ / ‖h‖ ≤ k1 lim_{h→0} ‖α(h)‖ / ‖h‖ = 0     (21.45)

Since f is differentiable at g(x), we have

    lim_{k→0} ‖β(k)‖ / ‖k‖ = 0                                           (21.46)

Fix ε > 0. By (21.46), there exists δ_ε > 0 such that ‖k‖ ≤ δ_ε implies ‖β(k)‖ / ‖k‖ ≤ ε. In
other words, there exists δ_ε > 0 such that ‖g(x + h) − g(x)‖ ≤ δ_ε implies

    ‖β(g(x + h) − g(x))‖ / ‖g(x + h) − g(x)‖ ≤ ε

On the other hand, since g is continuous at x, there exists δ1 > 0 such that ‖h‖ ≤ δ1 implies
‖g(x + h) − g(x)‖ ≤ δ_ε. Therefore, for ‖h‖ sufficiently small we have ‖β(g(x + h) − g(x))‖ ≤
ε ‖g(x + h) − g(x)‖. By applying Lemma 730 to the linear operator dg(x), there exists k2 > 0
such that

    ‖β(g(x + h) − g(x))‖ ≤ ε ‖g(x + h) − g(x)‖ = ε ‖α(h) + dg(x)(h)‖     (21.47)
                          ≤ ε ‖α(h)‖ + ε ‖dg(x)(h)‖ ≤ ε ‖α(h)‖ + ε k2 ‖h‖

Go back to (21.44). Using (21.45) and (21.47), we have:

    lim_{h→0} ‖β(g(x + h) − g(x)) + df(g(x))(α(h))‖ / ‖h‖
      ≤ lim_{h→0} ‖β(g(x + h) − g(x))‖ / ‖h‖ + lim_{h→0} ‖df(g(x))(α(h))‖ / ‖h‖
      ≤ ε lim_{h→0} ‖α(h)‖ / ‖h‖ + ε k2 lim_{h→0} ‖h‖ / ‖h‖ = ε k2

Since ε was fixed arbitrarily, it can be taken as small as we like. Therefore:

    lim_{h→0} ‖β(g(x + h) − g(x)) + df(g(x))(α(h))‖ / ‖h‖ ≤ k2 lim_{ε→0} ε = 0

as desired.
Chapter 22

Differential methods

22.1 Extremal and critical points

22.1.1 Preamble

So far we have considered the notions of derivability and differentiability for functions defined
on open intervals (a, b), for scalar functions, and, more generally, on open sets U, for functions
of several variables. To study optimization problems we have to consider functions f : A ⊆
Rⁿ → R defined on an arbitrary subset A of Rⁿ. Fortunately, everything we have seen so far
for a generic point of an open set U extends immediately to the interior points of any set A.
This is best seen in the scalar case. So, let x0 be an interior point of A ⊆ R. By definition,
there exists a neighborhood U of x0 such that U ⊆ A. The restriction f_{|U} of f to U is
derivable at x0 if the limit

    lim_{h→0} [f_{|U}(x0 + h) − f_{|U}(x0)] / h

exists and is finite. But, for every h small enough that x0 + h ∈ U we have

    [f_{|U}(x0 + h) − f_{|U}(x0)] / h = [f(x0 + h) − f(x0)] / h

and so

    f_{|U}′(x0) = lim_{h→0} [f(x0 + h) − f(x0)] / h

We can therefore consider directly the limit

    lim_{h→0} [f(x0 + h) − f(x0)] / h

and say that its value, denoted by f′(x0), is the derivative of f at the interior point x0,
provided it exists and is finite.

In sum, derivability and differentiability are local notions that use only the properties of
the function in a neighborhood, however small, of the point at hand. They can therefore be
defined at any interior point of any set.

22.1.2 Fermat's Theorem

In Section 18.5 we studied in detail the notions of local maximizers and minimizers. As we
remarked, in applications they are of little interest per se, but they have a key instrumental
importance. The next fundamental result, Fermat's Theorem, is central for their study.

Theorem 983 (Fermat) Let f : A ⊆ R → R be defined on a set A in R and C a subset
of A. Let f be differentiable at an interior point x̂ of C. If x̂ is a local extremal point (a
maximizer or a minimizer) of f on C, then

    f′(x̂) = 0                                                           (22.1)

Proof Let x̂ ∈ C be an interior point and a local maximizer on C (a similar argument holds
if it is a local minimizer). There exists therefore B_ε(x̂) such that (18.21) holds, that is,
f(x̂) ≥ f(x) for every x ∈ B_ε(x̂) ∩ C. For every h > 0 sufficiently small, that is, h ∈ (0, ε),
we have x̂ + h ∈ B_ε(x̂). Hence

    [f(x̂ + h) − f(x̂)] / h ≤ 0    ∀h ∈ (0, ε)

which implies

    lim_{h→0⁺} [f(x̂ + h) − f(x̂)] / h ≤ 0                               (22.2)

On the other hand, for every h < 0 sufficiently small, that is, h ∈ (−ε, 0), we have x̂ + h ∈
B_ε(x̂). Therefore,

    [f(x̂ + h) − f(x̂)] / h ≥ 0    ∀h ∈ (−ε, 0)

which implies

    lim_{h→0⁻} [f(x̂ + h) − f(x̂)] / h ≥ 0                               (22.3)

Together, inequalities (22.2) and (22.3) imply that

    0 ≤ lim_{h→0⁻} [f(x̂ + h) − f(x̂)] / h = lim_{h→0} [f(x̂ + h) − f(x̂)] / h
      = lim_{h→0⁺} [f(x̂ + h) − f(x̂)] / h ≤ 0

Therefore, since by hypothesis f′(x̂) exists, we have

    f′(x̂) = lim_{h→0} [f(x̂ + h) − f(x̂)] / h = 0

as desired.

A necessary condition for an interior point x̂ to be a local maximizer (or minimizer) is
therefore that the derivative at that point, if it exists, be zero. This condition, often called the
first-order (necessary) condition (abbreviated as FOC), has a simple heuristic interpretation.
As we will see shortly, if f′(x0) > 0 the function is strictly increasing at x0, while if f′(x0) < 0
it is strictly decreasing. If f is maximized at x0, it is neither strictly increasing there
(otherwise, an infinitesimal increase in x would be beneficial), nor strictly decreasing there
(otherwise, an infinitesimal decrease in x would be beneficial). Thus, the derivative, if it
exists, must be zero.¹

¹ This heuristic argument can also be articulated as follows. Since f is derivable at x0, we have f(x0 + h) − f(x0) = f′(x0) h + o(h). Heuristically, we can set f(x0 + h) − f(x0) = f′(x0) h by neglecting the term o(h). If f′(x0) > 0, we have f(x0 + h) > f(x0) if h > 0, so a strict increase is strictly beneficial; if f′(x0) < 0, we have f(x0 + h) > f(x0) if h < 0, so a strict decrease is strictly beneficial. Only if f′(x0) = 0 can no such strictly beneficial variation occur, so f may be maximized at x0.

The first-order condition (22.1) will turn out to be key in solving optimization problems,
hence the important instrumental role of local extremal points. Conceptually, it tells us
that in order to maximize (or minimize) an objective function we need to consider what
happens at the margin: a point cannot be a maximizer if there is still room for improvement
through infinitesimal changes, be they positive or negative. At a maximizer, all marginal
opportunities must have been exhausted.

The fundamental principle highlighted by the first-order condition is that, to maximize
levels of utility (or of production, or of welfare, and so on), one needs to work at the margin.
In economics, the understanding of this principle was greatly facilitated by a proper
mathematical formalization of the optimization problem that made it possible to rely on
differential calculus (and so on the shoulders of the giants who created it). What becomes
crystal clear through calculus is highly non-trivial otherwise, in particular if we just use a
purely literary analysis. The marginal principle was fully understood only in the 1870s, when
it became the heart of the marginalist theory of value pioneered by Jevons, Menger, and
Walras. This approach has continued to evolve since then (at first with the works of
Edgeworth, Marshall, and Pareto) and, over the years, has shown a surprising ability to shed
light on economic phenomena. In all this, the first-order condition and its generalizations
(momentarily we will see its version for functions of several variables) is, like Shakespeare's
Julius Caesar, the colossus that bestrides the economics world.

That said, let us continue with the analysis of Fermat's Theorem. It is important to
focus on the following aspects:

(i) the hypothesis that x̂ is an interior point of C;

(ii) the hypothesis of differentiability at x̂;

(iii) the condition f′(x̂) = 0 is only necessary.

Let us discuss them one by one.

(i) The hypothesis that x̂ is an interior point of C is essential for Fermat's Theorem.
Indeed, consider for example f : R → R given by f(x) = x, and let C = [0, 1]. The boundary
point x = 0 is a global minimizer of f on [0, 1], but f′(0) = 1 ≠ 0. In the same way, the
boundary point x = 1 is a maximizer, but f′(1) = 1 ≠ 0. Therefore, if x is a boundary local
extremal point, it is not necessarily true that f′(x) = 0.

(ii) Fermat's Theorem cannot be applied to functions that, even if they have interior
maximizers or minimizers, are not differentiable at these points. A classic example is the
function f : R → R given by f(x) = |x|: the point x = 0 is a global minimizer but f, at
that point, does not admit a derivative, so the condition f′(x) = 0 is not relevant in this
case. Another example is the following.

Example 984 Let f : R → R be given by f(x) = ∛((x² − 5x + 6)²), with graph

    [Figure: graph of f, nonnegative with cusps touching zero at x = 2 and x = 3 and a local maximum at x = 5/2.]

Since x² − 5x + 6 = (x − 2)(x − 3) is zero for x = 2 and x = 3, we conclude that

    f(x) ≥ f(2) = f(3) = 0    ∀x ∈ R

Therefore, x = 2 and x = 3 are global minimizers. The derivative of f is

    f′(x) = (2/3) (x² − 5x + 6)^{−1/3} (2x − 5) = 2(2x − 5) / (3 ∛(x² − 5x + 6))

and so it does not exist where x² − 5x + 6 is zero, that is, at the two minimizers! The
point x = 5/2 is such that f′(x) = 0 and is a local maximizer (being unbounded above, this
function has no global maximizers). ▮

(iii) Lastly, the condition f′(x) = 0 is only necessary. The following simple example
should leave no doubt about this.

Example 985 Let f : R → R be the cubic function f(x) = x³, with graph

    [Figure: graph of the cubic f(x) = x³, with a horizontal tangent at the origin O.]

We have f′(0) = 0, although the origin x0 = 0 is neither a local maximizer nor a local
minimizer.² Condition (22.1) is therefore necessary, but not sufficient, for a point to be a
local extremum. ▮

² Indeed, f(x) < 0 for every x < 0 and f(x) > 0 for every x > 0.

We now address the multivariable version of Fermat's Theorem. In this case the first-order
condition (22.1) takes the more general form (22.4), in which gradients replace derivatives.

Theorem 986 Let f : A ⊆ Rⁿ → R be defined on a set A in Rⁿ and C a subset of A.
Suppose f is differentiable at an interior point x̂ of C. If x̂ is a local extremal point (a
maximizer or a minimizer) of f on C, then

    ∇f(x̂) = 0                                                           (22.4)

We leave the proof to the reader. Indeed, mutatis mutandis, it is the same as that of
Fermat's Theorem.³

The observations (i)-(iii), just made for the scalar case, continue to hold in the
multivariable case. In particular, as in the scalar case, the first-order condition is necessary
but not sufficient, as the next example shows.

³ In the sequel, by Fermat's Theorem we will mean both the original scalar version and the present multivariable version (the context will clarify which one we are referring to).

Example 987 Let f : R² → R be given by f(x₁, x₂) = x₁² − x₂². We have

∇f(x) = (2x₁, −2x₂)

so the first-order condition (22.4) takes the form

2x₁ = 0
−2x₂ = 0

The unique solution of this system is (0, 0), which in turn is the unique point in R² where f satisfies condition (22.4). It is easy to see that this point is neither a maximizer nor a minimizer. Indeed, if we consider any point (0, x₂) different from the origin on the vertical axis and any point (x₁, 0) different from the origin on the horizontal axis, we have

f(0, x₂) = −x₂² < 0   and   f(x₁, 0) = x₁² > 0

that is, being f(0, 0) = 0,

f(0, x₂) < f(0, 0) < f(x₁, 0)   ∀0 ≠ x₁, x₂ ∈ R

In every neighborhood of the point (0, 0) there are, therefore, both points in which the function is strictly positive and points in which it is strictly negative: as we can see from the figure

[Figure: the saddle-shaped graph of f(x₁, x₂) = x₁² − x₂².]

the origin (0, 0) is a "saddle" point of f which is neither a maximizer nor a minimizer. N
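A quick numerical check (our illustration, not the book's) makes the saddle visible: arbitrarily close to the origin, f takes both signs.

```python
# f(x1, x2) = x1**2 - x2**2 near the origin
def f(x1, x2):
    return x1**2 - x2**2

for eps in (0.1, 0.01, 0.001):
    # along the horizontal axis f > 0, along the vertical axis f < 0
    print(f(eps, 0.0), f(0.0, eps))
```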
The points x̂ of Rⁿ such that ∇f(x̂) = 0 – in particular, for n = 1, the points such that f′(x̂) = 0 – are said to be stationary points or critical points of f. Using this terminology, Theorem 986 can be paraphrased as saying that a necessary condition for an interior point x to be a local minimizer or maximizer is to be stationary.

Example 988 Let f : R → R be given by f(x) = 10x³(x − 1)². The first-order condition (22.1) becomes

10x²(x − 1)(5x − 3) = 0

and therefore the points that satisfy it are x = 0, x = 1, and x = 3/5. N
Example 989 Let f : R² → R be given by f(x₁, x₂) = 2x₁² + x₂² − 3(x₁ + x₂) + x₁x₂ − 3. We have

∇f(x) = (4x₁ − 3 + x₂, 2x₂ − 3 + x₁)

So here the first-order condition (22.4) assumes the form

4x₁ − 3 + x₂ = 0
2x₂ − 3 + x₁ = 0

It is easy to see that x = (3/7, 9/7) is the unique solution of the system, so it is the unique stationary point of f on R². N
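For routine computations like this one, a computer algebra system can solve the first-order condition symbolically. The following sketch (our illustration, not part of the text) uses Python's sympy library to recover the stationary point of Example 989:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = 2*x1**2 + x2**2 - 3*(x1 + x2) + x1*x2 - 3

# Gradient of f and the first-order condition grad f = 0
grad = [sp.diff(f, v) for v in (x1, x2)]
print(sp.solve(grad, [x1, x2], dict=True))  # [{x1: 3/7, x2: 9/7}]
```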

22.1.3 Unconstrained optima: incipit

The role of Fermat's Theorem in solving optimization problems will be treated in detail in Chapter 28. We can, however, see a first simple use of this important theorem in an unconstrained optimization problem

maxₓ f(x)   sub x ∈ C (22.5)

where C is an open set of Rⁿ.⁴

⁴Recall that in Section 18.1 optimization problems were called unconstrained when C is open.

Let us assume, as usual in applications, that f is differentiable on C. Any local extremal point is thus interior (since C is open) and f is differentiable at that point. By Fermat's Theorem, the local extremal points of f on C are also stationary points. This is true, a fortiori, for any solution of problem (22.5) because it is, obviously, also a local maximizer.

Therefore, to find the possible solutions of problem (22.5) it is necessary to solve the first-order condition

∇f(x) = 0

The solutions of the optimization problem, if they exist, are among the solutions of this condition, which is necessary (but not sufficient!) for a point to be a local extremal one.

Example 990 Let f : R² → R be given by

f(x) = −x₁⁴ − x₂⁴ + 4x₁x₂

We have ∇f(x) = (−4x₁³ + 4x₂, −4x₂³ + 4x₁), so the first-order condition is

−4x₁³ + 4x₂ = 0
−4x₂³ + 4x₁ = 0

that is,

x₁³ = x₂
x₂³ = x₁

The stationary points are (0, 0), (1, 1), and (−1, −1). Among them we have to look for the possible solutions of the unconstrained optimization problem

maxₓ f(x)   sub x ∈ R²
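Here too the stationary points can be found mechanically; a minimal sympy sketch (ours, not from the text) solves the nonlinear system:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = -x1**4 - x2**4 + 4*x1*x2

grad = [sp.diff(f, v) for v in (x1, x2)]
# Solutions of x1**3 = x2, x2**3 = x1; complex roots are discarded
# because x1, x2 are declared real
print(sp.solve(grad, [x1, x2]))  # the three points (0,0), (1,1), (-1,-1)
```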

22.2 Mean Value Theorem

In this section we study the important Mean Value Theorem, one of the classic results of differential calculus. We start with a special case, known as Rolle's Theorem.

Theorem 991 (Rolle) Let f : [a, b] → R be continuous on [a, b], with f(a) = f(b), and differentiable on (a, b). Then, there exists (at least) one critical point x̂ ∈ (a, b), that is, a point x̂ ∈ (a, b) such that f′(x̂) = 0.

This theorem, which provides a simple sufficient condition for a function to have a critical point, has an immediate graphical intuition:

[Figure: a function with f(a) = f(b) and a horizontal tangent at an interior point c.]

Proof By Weierstrass' Theorem, there exist x₁, x₂ ∈ [a, b] such that f(x₁) = min_{x∈[a,b]} f(x) and f(x₂) = max_{x∈[a,b]} f(x). Denote m = min_{x∈[a,b]} f(x) and M = max_{x∈[a,b]} f(x). If m = M, then f is constant, that is, f(x) = m = M, and therefore f′(x) = 0 for every x ∈ (a, b). If m < M, then at least one of the points x₁ and x₂ is interior to [a, b]. Indeed, they cannot both be boundary points because f(a) = f(b). If x₁ is an interior point of [a, b], that is, x₁ ∈ (a, b), then by Fermat's Theorem we have f′(x₁) = 0, so x̂ = x₁. Analogously, if x₂ ∈ (a, b), we have f′(x₂) = 0, and therefore x̂ = x₂.

Example 992 Let f : [−1, 1] → R be given by f(x) = √(1 − x²). This function is continuous on [−1, 1] and differentiable on (−1, 1). Since f(−1) = f(1) = 0, by Rolle's Theorem there exists a critical point x̂ ∈ (−1, 1), that is, a point such that f′(x̂) = 0. In particular, from

f′(x) = −x(1 − x²)^(−1/2)

it follows that this point is x̂ = 0. N

Given a function f : [a, b] → R, consider the points (a, f(a)) and (b, f(b)) of its graph. The straight line passing through these points has equation

y = f(a) + [(f(b) − f(a)) / (b − a)] (x − a) (22.6)

as the reader can verify by solving the system

f(a) = ma + q
f(b) = mb + q

This straight line plays a key role in the important Mean Value (or Lagrange's) Theorem, which we now state and prove.

Theorem 993 (Mean Value) Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Then, there exists x̂ ∈ (a, b) such that

f′(x̂) = (f(b) − f(a)) / (b − a) (22.7)

Rolle's Theorem is the special case in which f(a) = f(b), so that condition (22.7) becomes f′(x̂) = 0.

Note that

(f(b) − f(a)) / (b − a)

is the slope of the straight line (22.6) passing through the points (a, f(a)) and (b, f(b)) of the graph of f, while f′(x) is the slope of the straight line tangent to the graph of f at the point (x, f(x)). The Mean Value Theorem establishes, therefore, a simple sufficient condition for the existence of a point x̂ ∈ (a, b) such that the straight line tangent at (x̂, f(x̂)) is parallel to the straight line passing through the points (a, f(a)) and (b, f(b)). Graphically:

[Figure: the secant through (a, f(a)) and (b, f(b)) and a parallel tangent line at an interior point c.]

Note that the increment f(b) − f(a) on the whole interval [a, b] can be written, thanks to the Mean Value Theorem, as

f(b) − f(a) = f′(x̂)(b − a)

or, in an equivalent way, as

f(b) − f(a) = f′(a + t̂(b − a))(b − a)

for a suitable 0 ≤ t̂ ≤ 1. Indeed, we have

[a, b] = {(1 − t)a + tb : t ∈ [0, 1]} = {a + t(b − a) : t ∈ [0, 1]}

so every point x̂ ∈ [a, b] can be written in the form a + t̂(b − a) for a suitable t̂ ∈ [0, 1].

Proof Let g : [a, b] → R be the auxiliary function defined by

g(x) = f(x) − [f(a) + ((f(b) − f(a)) / (b − a)) (x − a)]

It is the difference between f and the straight line passing through the points (a, f(a)) and (b, f(b)). The function g is continuous on [a, b] and differentiable on (a, b). Moreover, g(a) = g(b) = 0. By Rolle's Theorem, there exists x̂ ∈ (a, b) such that g′(x̂) = 0. But

g′(x) = f′(x) − (f(b) − f(a)) / (b − a)

and therefore

f′(x̂) − (f(b) − f(a)) / (b − a) = 0

That is, x̂ satisfies condition (22.7).
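As a concrete illustration (ours, not the book's), the point x̂ in (22.7) can often be computed explicitly; a sympy sketch for f(x) = x³ on [0, 2]:

```python
import sympy as sp

x = sp.Symbol('x', real=True)
f = x**3
a, b = 0, 2

slope = (f.subs(x, b) - f.subs(x, a)) / (b - a)    # slope of the secant, = 4
candidates = sp.solve(sp.Eq(sp.diff(f, x), slope), x)
print([c for c in candidates if a < c < b])        # [2*sqrt(3)/3]
```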

A first interesting application of the Mean Value Theorem shows that constant functions are characterized by having a zero derivative at every point.

Corollary 994 Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Then f′(x) = 0 for every x ∈ (a, b) if and only if f is constant, that is, if and only if there exists k ∈ R such that

f(x) = k   ∀x ∈ [a, b]

Proof Let us prove the "only if", since the "if" is the simple property of derivatives seen in Example 907. Let x ∈ (a, b) and let us apply the Mean Value Theorem on the interval [a, x]. It yields a point x̂ ∈ (a, x) such that

0 = f′(x̂) = (f(x) − f(a)) / (x − a)

that is, f(x) = f(a). Since x is any point in (a, b), it follows that f(x) = f(a) for any x ∈ [a, b). By the continuity of f at b, we also have f(a) = f(b).

This characterization of constant functions will prove important in the theory of integration. In particular, the following simple generalization of Corollary 994 will be key.

Corollary 995 Let f, g : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Then f′(x) = g′(x) for every x ∈ (a, b) if and only if there exists k ∈ R such that

f(x) = g(x) + k   ∀x ∈ [a, b]

Two functions that have the same first derivative are, thus, equal up to an (additive) constant k.

Proof Here too we prove the "only if", the "if" being obvious. Let h : [a, b] → R be the auxiliary function h(x) = f(x) − g(x). We have h′(x) = f′(x) − g′(x) = 0 for every x ∈ (a, b). Therefore, by Corollary 994 h is constant on [a, b]. That is, there exists k ∈ R such that h(x) = k for every x ∈ [a, b], so f(x) = g(x) + k for every x ∈ [a, b].
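A classic instance of this corollary (our illustration, not the book's): on (0, +∞) the functions arctan x and −arctan(1/x) have the same derivative, so they differ by a constant, here π/2. A sympy check:

```python
import sympy as sp

x = sp.Symbol('x', positive=True)
f = sp.atan(x)
g = -sp.atan(1/x)

# Same derivative on (0, oo) ...
print(sp.simplify(sp.diff(f, x) - sp.diff(g, x)))  # 0
# ... so f - g is constant; evaluating at x = 1 reveals k = pi/2
print((f - g).subs(x, 1))                          # pi/2
```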

Via higher-order derivatives, next we establish the ultimate version of the Mean Value Theorem.⁵

Theorem 996 (Taylor) Let f : [a, b] → R be n − 1 times continuously differentiable on [a, b] and n times differentiable on (a, b). Then, there exists x̂ ∈ (a, b) such that

f(b) − f(a) = Σ_{k=1}^{n−1} [f^(k)(a) / k!] (b − a)^k + [f^(n)(x̂) / n!] (b − a)^n (22.8)

⁵In the statement we adopt the convention that "0 times continuous differentiability" just amounts to continuity. Moreover, f^(0) = f.

The Mean Value Theorem is the special case n = 1 because (22.7) can be equivalently written as

f(b) − f(a) = f′(x̂)(b − a)

Formula (22.8) is a version of Taylor's formula, arguably the most important formula of calculus, that will be studied in detail later in the book (Chapter 23).

Proof Let g : [a, b] → R be the auxiliary function defined by

g(x) = f(b) − f(x) − Σ_{k=1}^{n−1} [f^(k)(x) / k!] (b − x)^k − (κ / n!) (b − x)^n

where κ is a scalar. The function g is continuous on [a, b] and differentiable on (a, b). Some algebra shows that

g′(x) = [(b − x)^(n−1) / (n − 1)!] (κ − f^(n)(x))

Let the scalar κ be such that g(a) = 0, i.e.,

κ = (f(b) − f(a) − Σ_{k=1}^{n−1} [f^(k)(a) / k!] (b − a)^k) · n! / (b − a)^n

We thus have g(a) = g(b) = 0. By Rolle's Theorem, there exists x̂ ∈ (a, b) such that g′(x̂) = 0. So

0 = [(b − x̂)^(n−1) / (n − 1)!] (κ − f^(n)(x̂))

and therefore κ = f^(n)(x̂). We thus have

0 = g(a) = f(b) − f(a) − Σ_{k=1}^{n−1} [f^(k)(a) / k!] (b − a)^k − [f^(n)(x̂) / n!] (b − a)^n

which implies (22.8).

We close by noting that, as is easily checked, there is a dual version of (22.8) involving the derivatives at the other endpoint of the interval:

f(a) − f(b) = Σ_{k=1}^{n−1} [f^(k)(b) / k!] (a − b)^k + [f^(n)(x̂) / n!] (a − b)^n (22.9)

where, again, x̂ ∈ (a, b).
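To make (22.8) concrete, here is a small sympy sketch (ours, not the book's) that finds the intermediate point x̂ for f = exp on [0, 1] with n = 2; it solves e − 1 − 1 = e^x̂ / 2:

```python
import sympy as sp

x, xhat = sp.symbols('x xhat', real=True)
f = sp.exp(x)
a, b, n = 0, 1, 2

# Left-hand side of (22.8) minus the Taylor part of degree n - 1 at a
lhs = f.subs(x, b) - f.subs(x, a) - sum(
    sp.diff(f, x, k).subs(x, a) / sp.factorial(k) * (b - a)**k
    for k in range(1, n)
)
# Solve f''(xhat)/n! * (b - a)**n = lhs for xhat
sol = sp.solve(sp.Eq(sp.diff(f, x, n).subs(x, xhat) / sp.factorial(n), lhs), xhat)
print(sol, [sp.N(s) for s in sol])  # xhat = log(2e - 4), roughly 0.362, in (0, 1)
```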

22.3 Continuity properties of the derivative

The derivative function may exist at a point without being continuous at that point, as the next example shows.

Example 997 Let f : R → R be defined by

f(x) =  x² sin(1/x)   if x ≠ 0
        0             if x = 0

As the reader can check, we have

f′(x) =  2x sin(1/x) − cos(1/x)   if x ≠ 0
         0                        if x = 0

So, f is differentiable at 0, but the derivative function f′ is discontinuous there. N
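The claim left to the reader can be checked mechanically; a sympy sketch (our illustration): away from zero, diff gives the formula above, while at zero the derivative is the limit of the difference quotient.

```python
import sympy as sp

x, h = sp.symbols('x h', real=True)
f = x**2 * sp.sin(1/x)

print(sp.diff(f, x))                     # 2*x*sin(1/x) - cos(1/x)
# f'(0) as the limit of the difference quotient (f(h) - f(0))/h = h*sin(1/h)
print(sp.limit(h * sp.sin(1/h), h, 0))   # 0
```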

Although it might be discontinuous, the derivative function still satisfies the intermediate value property of Lemma 493, as the next important result proves.

Theorem 998 (Darboux) Let f : [a, b] → R be differentiable, with f′(a) < f′(b). If

f′(a) ≤ z ≤ f′(b)

then there exists a ≤ c ≤ b such that f′(c) = z. If f′ is strictly increasing, such c is unique.

Proof Let f′(a) < z < f′(b) (otherwise the result is trivially true). Set g(x) = f(x) − zx. We have g′(x) = f′(x) − z, and therefore g′(a) < 0 and g′(b) > 0. The function g is continuous on [a, b] and, therefore, by Weierstrass' Theorem it has a minimizer x_m on [a, b]. Let us prove that the minimizer x_m is interior. Since g′(a) < 0, there exists a point x₁ ∈ (a, b) such that g(x₁) < g(a). Analogously, being g′(b) > 0, there exists a point x₂ ∈ (a, b) such that g(x₂) < g(b). This implies that neither a nor b are minimizers of g on [a, b], so x_m ∈ (a, b). By Fermat's Theorem, g′(x_m) = 0, that is, f′(x_m) = z. In conclusion, there exists c ∈ (a, b) such that f′(c) = z.

As in Lemma 493, the case f′(a) > f′(b) is analogous. We can thus say that, for any z such that

min{f′(a), f′(b)} ≤ z ≤ max{f′(a), f′(b)}

there exists a ≤ c ≤ b such that f′(c) = z. If f′ is strictly monotonic, such c is unique.

Since in general the derivative function is not continuous (so Weierstrass' Theorem cannot be invoked), Darboux's Theorem does not imply – unlike Lemma 493 – a version of the Intermediate Value Theorem for the derivative function. Still, Darboux's Theorem is per se a remarkable continuity property of the derivative function that implies, inter alia, that such a function can only have essential non-removable discontinuities.

Corollary 999 If f : [a, b] → R is differentiable, the derivative function f′ : [a, b] → R cannot have removable discontinuities or jump discontinuities.

Proof Let us suppose, by contradiction, that f′ has at x₀ ∈ (a, b) a removable discontinuity, that is, lim_{x→x₀} f′(x) = L ≠ f′(x₀). Suppose that L < f′(x₀) (the proof is analogous if L > f′(x₀)). If ε is such that 0 < ε < f′(x₀) − L, then there exists δ > 0 such that

x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ⟹ L − ε < f′(x) < L + ε < f′(x₀)

By taking any 0 < δ′ < δ, we therefore have

x₀ ≠ x ∈ [x₀ − δ′, x₀ + δ′] ⟹ L − ε < f′(x) < L + ε < f′(x₀) (22.10)

Consider the interval [x₀ − δ′, x₀]. By (22.10), we have f′(x₀ − δ′) < f′(x₀). By Darboux's Theorem, for every f′(x₀ − δ′) < z < f′(x₀) there exists c ∈ (x₀ − δ′, x₀) such that f′(c) = z. But this contradicts (22.10), which implies that, taking z ∈ (L + ε, f′(x₀)), there is no c ∈ [x₀ − δ′, x₀] such that f′(c) = z. Hence, f′ cannot have removable discontinuities.

The function f′ cannot have jump discontinuities either. Suppose, by contradiction, that f′ has such a discontinuity at x₀ ∈ (a, b), that is, lim_{x→x₀⁺} f′(x) ≠ lim_{x→x₀⁻} f′(x). Suppose that f′(x₀) = lim_{x→x₀⁺} f′(x) (the proof is analogous if f′(x₀) = lim_{x→x₀⁻} f′(x)). By setting L = lim_{x→x₀⁻} f′(x), the proof proceeds in an analogous way to the one seen for the removable discontinuity, as the reader can verify.

22.4 Monotonicity and differentiability

There is a strict link between the monotonicity of a differentiable function and the sign of its derivative. This allows us to study monotonicity through differential conditions based on properties of the derivatives. Such conditions are important, both conceptually and operationally, to check the monotonicity properties of a function. For simplicity, we only consider scalar functions.

We start by introducing the concept of monotonicity of a function at a point of its domain.

Definition 1000 A function f : A ⊆ R → R is said to be (locally) increasing at a limit point x₀ ∈ A if there exists a neighborhood B_ε(x₀) of x₀ such that, for every x, y ∈ B_ε(x₀) ∩ A,

x < x₀ < y ⟹ f(x) ≤ f(x₀) ≤ f(y) (22.11)

Moreover, the function is said to be (locally) strictly increasing if the inequalities in (22.11) are all strict.

Similar definitions hold for the (strictly) decreasing monotonicity at a point. To avoid misunderstandings, recall that in Section 6.4.4 we defined monotonicity in a global way by saying (in Definition 206) that a function f : A ⊆ R → R is increasing if

x > y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A

and strictly increasing if

x > y ⟹ f(x) > f(y)   ∀x, y ∈ A

with analogous definitions for decreasing monotonicity. Obviously, an increasing function on A is increasing at each point of A. We will see momentarily that, in general, the converse does not hold: local monotonicity at each point of A does not guarantee global monotonicity on A.

The following result is immediate.


Proposition 1001 Let f : A ⊆ R → R be differentiable at an interior point x₀ ∈ A.

(i) If f is increasing at x₀, then f′(x₀) ≥ 0.

(ii) If f′(x₀) > 0, then f is strictly increasing at x₀.

A dual characterization holds for (strictly) decreasing monotonicity.

Proof If f is increasing, the difference quotients of f at x₀ are all positive (at least for h sufficiently small), so their limit is ≥ 0. If instead f′(x₀) > 0, the difference quotients are, at least for h close to 0, strictly positive by the Theorem on the permanence of sign. It follows that f(x₀ + h) > f(x₀) for h > 0 and f(x₀ + h) < f(x₀) for h < 0, with h sufficiently small, so f is strictly increasing at x₀.

Example 1002 The function f : R → R defined by f(x) = 2x² − 3x is strictly increasing at x₀ = 5, since f′(5) = 17 > 0, and strictly decreasing at x₀ = 0, since f′(0) = −3 < 0. N

Note the asymmetry between points (i) and (ii) of the previous proposition:

f increasing at x₀ ⟹ f′(x₀) ≥ 0 (22.12)

but

f′(x₀) > 0 ⟹ f strictly increasing at x₀ (22.13)

The non-negativity of the derivative is necessary for increasing monotonicity, while its strict positivity is sufficient for strictly increasing monotonicity.

This asymmetry is unavoidable because the converses of (22.12) and (22.13) do not hold. For example, the function f(x) = −x³ is strictly decreasing at 0 although f′(0) = 0, so the converse of (22.12) is false. The function f(x) = x³ is strictly increasing at x₀ = 0 (indeed x³ > 0 for every x > 0 and x³ < 0 for every x < 0), but f′(0) = 0, so the converse of (22.13) is false as well.

We might think that, if a function is monotonic at each point of a set A, it enjoys the same type of monotonicity on the entire set A, i.e., globally. This is not the case. Indeed, consider the function f(x) = −1/x defined on the open set R∖{0}. It is strictly increasing at each point of its domain because f′(x) = 1/x² > 0 for every x ≠ 0. However, it is not increasing at all because, for example, −1 < 1 while f(−1) = 1 > −1 = f(1). Graphically:
[Figure: graph of f(x) = −1/x, with two increasing branches separated by the vertical asymptote at x = 0.]

Therefore, monotonicity at each point of a set does not imply global monotonicity (of the same type). Intuitively, this may happen because if such a set is a union of disjoint intervals, then at each interval the function "gets back to the beginning". The next important result confirms this intuition by showing that the implication does hold when the set is an interval (so we get rid of the case of unions of disjoint intervals just mentioned). It is the classic differential criterion of monotonicity.

Proposition 1003 Let f : (a, b) → R be a differentiable function, with a, b ∈ R̄. Then, f is (globally) increasing on (a, b) if and only if f′(x) ≥ 0 for every x ∈ (a, b).

Under the clause a, b ∈ R̄ the interval (a, b) can be unbounded, for example (a, b) = R. A similar result, with negativity of the derivative on (a, b), holds for decreasing monotonicity. Note that Corollary 994 is a special case of this result since f′(x) = 0 for every x ∈ (a, b) is equivalent to having both f′(x) ≥ 0 and f′(x) ≤ 0 for every x ∈ (a, b); therefore, being simultaneously increasing and decreasing, f is constant.

Proof "Only if". Suppose that f is increasing. Let x ∈ (a, b). For every h > 0 we have f(x + h) ≥ f(x), hence

(f(x + h) − f(x)) / h ≥ 0

It follows that

f′(x) = lim_{h→0} (f(x + h) − f(x)) / h = lim_{h→0⁺} (f(x + h) − f(x)) / h ≥ 0

"If". Let f′(x) ≥ 0 for every x ∈ (a, b). Let x₁, x₂ ∈ (a, b) with x₁ < x₂. By the Mean Value Theorem, there exists x̂ ∈ [x₁, x₂] such that

f′(x̂) = (f(x₂) − f(x₁)) / (x₂ − x₁) (22.14)

Since f′(x̂) ≥ 0 and x₂ − x₁ > 0, this shows that f(x₂) ≥ f(x₁).
x) 0 and x2 x1 > 0, this shows that f (x2 ) f (x1 ).
Example 1004 (i) Let f : R → R be given by f(x) = 3x⁵ + 2x³. Since f′(x) = 15x⁴ + 6x² ≥ 0 for every x ∈ R, by Proposition 1003 the function is increasing. (ii) Let f : R → R be the quadratic function f(x) = x². We have f′(x) = 2x and hence Proposition 1003 (and its analog for decreasing monotonicity) shows that f is neither increasing nor decreasing on R, and that it is increasing on (0, +∞) and decreasing on (−∞, 0). N

Next we show that the strict positivity of the derivative implies strict increasing monotonicity, thus providing a most useful differential criterion of strict monotonicity.

Proposition 1005 Let f : (a, b) → R be a differentiable function, with a, b ∈ R̄. If f′(x) > 0 for every x ∈ (a, b), then f is (globally) strictly increasing on (a, b).

Proof The proof is similar to that of Proposition 1003 and is a simple application of the Mean Value Theorem. Let f′(x) > 0 for every x ∈ (a, b) and let x₁, x₂ ∈ (a, b) with x₁ < x₂. By the Mean Value Theorem, there exists c ∈ [x₁, x₂] such that

f′(c) = (f(x₂) − f(x₁)) / (x₂ − x₁) (22.15)

Since f′(c) > 0 for every c and x₂ − x₁ > 0, from (22.15) it follows that f(x₂) > f(x₁).

The next example shows that the converse of the last result is false, so that for the derivative of a strictly increasing function on an interval we can only say that it is ≥ 0 (and not > 0).

Example 1006 Let f : R → R be the cubic function f(x) = x³. It is strictly increasing at every point, but f′(0) = 0 and so it is not true that f′(x) > 0 for every x ∈ R. N

Propositions 1003 and 1005 give very useful differential criteria for the monotonicity of scalar functions (dual versions hold for decreasing monotonicity). They hold also for closed or half-open intervals, once the derivatives at the boundary points are understood as one-sided ones.

We illustrate Proposition 1005 with an example.

Example 1007 (i) By Proposition 1005 (and its analog for decreasing monotonicity), the quadratic function f(x) = x² is strictly increasing on (0, +∞) and strictly decreasing on (−∞, 0). (ii) By Proposition 1005, the function f(x) = 3x⁵ + 2x³ is strictly increasing both on (−∞, 0) and on (0, +∞). Nevertheless, the proposition cannot say anything about the strict increasing monotonicity of f on R because f′(0) = 0. We can, however, check whether f is strictly increasing on R through the definition of strict increasing monotonicity. To this end, note that f(y) < f(0) = 0 < f(x) for every y < 0 < x, so f is indeed strictly increasing on the entire real line. N

That said, we close with a curious characterization of strict monotonicity that, in a sense, completes Proposition 1005.

Proposition 1008 Let f : (a, b) → R be a differentiable function, with a, b ∈ R̄. Then f is strictly increasing if and only if f′ ≥ 0 and for every a ≤ x′ < x″ ≤ b there exists x′ ≤ z ≤ x″ such that f′(z) > 0.

Thus, it is the strict positivity at the points of an "order dense" subset of (a, b) that characterizes strictly increasing functions. In view of Proposition 207, for a differentiable monotone function this strict positivity amounts to being injective.

Proof "If". Since f′ ≥ 0, the function f is increasing (Proposition 1003). Suppose, by contradiction, that it is not strictly increasing. Then, there exist a ≤ x′ < x″ ≤ b such that f(x) = f(x′) for all x ∈ [x′, x″]. By Corollary 994, f′(x) = 0 for all x ∈ [x′, x″], a contradiction. We conclude that f is strictly increasing.

"Only if". Let f be strictly increasing. Suppose, by contradiction, that there exist a ≤ x′ < x″ ≤ b such that f′(x) = 0 for all x ∈ [x′, x″]. Again by Corollary 994, the function f is constant on [x′, x″], a contradiction.

We can revisit the last two examples in view of Proposition 1008. Indeed, by this result we can say that the cubic function and the function f(x) = 3x⁵ + 2x³ are both strictly increasing because their derivatives are everywhere strictly positive except at the origin.

A final twist: under continuous differentiability, the "dense" strict positivity of the derivative actually characterizes strictly increasing functions.

Corollary 1009 Let f : (a, b) → R be a continuously differentiable function, with a, b ∈ R̄. Then f is strictly increasing if and only if for every a ≤ x′ < x″ ≤ b there exists x′ ≤ z ≤ x″ such that f′(z) > 0.

Proof In view of Proposition 1008, it is enough to show that f′ ≥ 0 if for every a ≤ x′ < x″ ≤ b there exists x′ ≤ z ≤ x″ such that f′(z) > 0. Let x ∈ (a, b). For each n large enough so that x + 1/n ∈ (a, b), there is a point x ≤ zₙ ≤ x + 1/n with f′(zₙ) > 0. Since f′ is continuous, from zₙ → x it follows that f′(x) = lim f′(zₙ) ≥ 0. Since x was arbitrarily chosen, we conclude that f′ ≥ 0.

22.5 Sufficient conditions for local extremal points

22.5.1 Local extremal points

In Section 22.1 we studied the fundamental necessary condition for a point to be a local extremal point, namely, the derivative being equal to zero at that point. The simple example f(x) = x³ showed that the condition is necessary, but not sufficient. By the results just established on monotonicity and differentiability, we can now integrate what we saw in Section 22.1 with a sufficient condition for the existence of local extremal points. For simplicity, we will consider only scalar functions.

The sufficient condition in which we are interested is based on a simple intuition: for x₀ to be a local maximizer, there must exist a neighborhood of x₀ in which the function first increases – i.e., f′(x) > 0 if x < x₀ – and then, once it has reached the maximum value at x₀, decreases – i.e., f′(y) < 0 if y > x₀. Graphically:

[Figure: a function increasing up to x₀ and decreasing afterwards, with a local peak at x₀.]

Proposition 1010 Let f : A ⊆ R → R and C ⊆ A. An interior point x₀ of C is a local maximizer if there exists a neighborhood B_ε(x₀) of x₀ such that f is continuous at x₀ and differentiable at each x₀ ≠ x ∈ B_ε(x₀), with

x < x₀ < y ⟹ f′(x) ≥ 0 ≥ f′(y)   ∀x, y ∈ B_ε(x₀) ∩ C (22.16)

If the inequalities in (22.16) are strict, the local maximizer is strong.

In a dual way, we obtain a local minimizer if in (22.16) we have f′(x) ≤ 0 ≤ f′(y), which is strong if f′(x) < 0 < f′(y).⁶ Note that the differentiability of f at x₀ is not required, only its continuity.

Proof Without loss of generality, assume that B_ε(x₀) = (x₀ − ε, x₀ + ε) ⊆ C. Let x ∈ (x₀ − ε, x₀). By the Mean Value Theorem, there exists ξ ∈ (x₀ − ε, x₀) such that

(f(x₀) − f(x)) / (x₀ − x) = f′(ξ)

By (22.16), we have f′(ξ) ≥ 0, from which we deduce that f(x₀) ≥ f(x). In a similar way, we can prove that f(x₀) ≥ f(y) for every y ∈ (x₀, x₀ + ε). So, f(x₀) ≥ f(x) for every x ∈ B_ε(x₀) and therefore x₀ is a local maximizer.

In particular, the following classic corollary of Proposition 1010 holds. Though weaker, in many cases it is good enough.

Corollary 1011 Let f : A ⊆ R → R and C ⊆ A. An interior point x₀ of C is a local maximizer if there exists a neighborhood B_ε(x₀) of x₀ on which f is differentiable, with f′(x₀) = 0 and

x < x₀ < y ⟹ f′(x) ≥ 0 ≥ f′(y)   ∀x, y ∈ B_ε(x₀) ∩ C (22.17)

If the inequalities in (22.17) are strict, the local maximizer is strong.

⁶In particular, if in (22.16) we have f′(x) = 0 = f′(y), the point x₀ is simultaneously a local maximizer and a local minimizer, that is, the function f is locally constant at x₀.
Example 1012 Let f : R → R be given by f(x) = 1 − x² and take x₀ = 0. We have f′(x) = −2x and hence (22.16) is satisfied in a strict sense. Thanks to Proposition 1010 or to Corollary 1011, x₀ is a strong local maximizer. N

Example 1013 Let f : R → R be given by f(x) = −|x| and take x₀ = 0. The function is continuous at x₀ and is differentiable at every x ≠ 0. We have

f′(x) =   1   if x < 0
         −1   if x > 0

and hence (22.16) is satisfied in a strict sense. By Proposition 1010, x₀ is a strong local maximizer. Note that in this case Corollary 1011 cannot be applied. N

The previous sufficient condition can be substantially simplified if we assume that the function is twice continuously differentiable. In this case, it is indeed sufficient to evaluate the sign of the second derivative at the point.

Corollary 1014 Let f : A ⊆ R → R and C ⊆ A. An interior point x₀ of C is a strong local maximizer if there exists a neighborhood B_ε(x₀) of x₀ on which f is twice continuously differentiable, with f′(x₀) = 0 and f″(x₀) < 0.

Proof Thanks to the continuity of f″ at x₀, we have lim_{x→x₀} f″(x) = f″(x₀) < 0. The Theorem on the permanence of sign implies the existence of a neighborhood B_ε(x₀) such that f″(x) < 0 for every x ∈ B_ε(x₀). Hence, by Proposition 1005 the first derivative f′ is strictly decreasing in B_ε(x₀), that is,

x < x₀ < y ⟹ f′(x) > f′(x₀) = 0 > f′(y)

By Proposition 1010, x₀ is a strong local maximizer.

Example 1015 Going back to Example 1012, in view of Corollary 1014 it is actually sufficient to observe that f″(0) = −2 < 0 to conclude that x₀ = 0 is a strong local maximizer. Instead, Corollary 1014 cannot be applied to Example 1013 because f(x) = −|x| is not differentiable at x₀ = 0. N

The next example shows that the condition f″(x₀) < 0 is sufficient, but not necessary: there exist local maximizers x₀ for which we do not have f″(x₀) < 0.

Example 1016 Let f : R → R be given by f(x) = −x⁴. The point x₀ = 0 is a local maximizer, yet f″(x₀) = 0. N

22.5.2 Searching local extremal points via first and second order conditions

Let x₀ be an interior point of C. In view of Corollary 1014, we can say that:

(i) if f′(x₀) = 0 and f″(x₀) < 0, then x₀ is a local maximizer;

(ii) if f′(x₀) = 0 and f″(x₀) > 0, then x₀ is a local minimizer;

(iii) f′(x₀) = 0 and f″(x₀) ≤ 0 does not exclude that x₀ is a local maximizer;

(iv) f′(x₀) = 0 and f″(x₀) ≥ 0 does not exclude that x₀ is a local minimizer.

We can therefore reformulate the previous corollary as follows.

Corollary 1017 Let f : A ⊆ R → R and C ⊆ A.

(i) A necessary condition for an interior point x₀ of C to be a local maximizer is that there exists a neighborhood B_ε(x₀) of x₀ on which f is twice continuously differentiable, with f′(x₀) = 0 and f″(x₀) ≤ 0.

(ii) A sufficient condition for an interior point x₀ of C to be a (strong) local maximizer is that there exists a neighborhood B_ε(x₀) of x₀ on which f is twice continuously differentiable, with f′(x₀) = 0 and f″(x₀) < 0.

Intuitively, if f′(x₀) = 0 and f″(x₀) < 0, the derivative function f′ at x₀ is zero and strictly decreasing (because its derivative f″ is strictly negative): therefore it goes, being zero at x₀, from positive values to negative ones. Hence, the function is increasing before x₀, stationary at x₀ and decreasing after x₀. It follows that x₀ is a maximizer.⁷ A similar intuition holds for the necessary part.

As it should be clear by now, (i) is a necessary but not sufficient condition, while (ii) is a sufficient but not necessary condition. It is an unavoidable asymmetry which we have to live with.

Terminology The conditions on the second derivatives of the last corollary are called second-order conditions. In particular:

(i) the inequality f″(x₀) ≤ 0 (resp., f″(x₀) ≥ 0) is called the second-order necessary condition for a maximizer (resp., for a minimizer);

(ii) the inequality f″(x₀) < 0 (resp., f″(x₀) > 0) is called the second-order sufficient condition for a maximizer (resp., for a minimizer).

The interest of Corollary 1017 lies in allowing us to establish a procedure for the search of local maximizers and minimizers on C of a twice-differentiable function f : A ⊆ R → R. Though it will be considerably refined in Section 23.3, it is often good enough.

Suppose that f is twice continuously differentiable on the set int C of the interior points of C. The procedure has two stages, based on the first- and second-order sufficient conditions. Specifically:

1. Determine the set S ⊆ int C of the stationary interior points of f; in other words, solve the first-order condition f′(x) = 0.

2. Compute f″ at each of the stationary points x ∈ S and check the second-order sufficient conditions: the point x is a strong local maximizer if f″(x) < 0, while it is a strong local minimizer if f″(x) > 0. If f″(x) = 0 the procedure fails.

⁷Alternatively, at x₀ the function f is stationary and concave (see below), so it admits a maximizer.

The procedure is based on Corollary 1017-(ii). The first stage – i.e., the solution of the first-order condition – is based on Fermat's Theorem: stationary points are the only interior points that are possible candidates for local extremal points. Hence, the knowledge acquired in the first stage is "negative": it rules out all the interior points that are not stationary, as none of them can be a local maximizer or minimizer.

The second stage – i.e., the check of the second-order condition – examines one by one the possible candidates from the first stage to see if they meet the sufficient condition established in Corollary 1017-(ii).

Example 1018 Let f : R → R be given by f(x) = 10x³(x − 1)² and C = R. Via the procedure, we search for the local extremal points of f on R. We have C = int C = R and f is twice continuously differentiable on R. As to stage 1, by recalling what we saw in Example 988, we have:

S = {0, 1, 3/5}

The stationary points in S are the unique candidates for local extremal points. As to stage 2, we have

f″(x) = 60x(x − 1)² + 120x²(x − 1) + 20x³

and therefore f″(0) = 0, f″(1) > 0 and f″(3/5) < 0. Hence, the point 1 is a strong local minimizer, the point 3/5 is a strong local maximizer, while the nature of the point 0 remains undetermined. N
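The two-stage procedure is easy to mechanize. A minimal sympy sketch (ours, not the book's) applied to Example 1018:

```python
import sympy as sp

x = sp.Symbol('x', real=True)
f = 10*x**3 * (x - 1)**2

# Stage 1: stationary points, i.e., solutions of f'(x) = 0
S = sp.solve(sp.diff(f, x), x)          # [0, 3/5, 1]

# Stage 2: classify each stationary point by the sign of f''
f2 = sp.diff(f, x, 2)
for p in S:
    v = f2.subs(x, p)
    if v < 0:
        print(p, 'strong local maximizer')
    elif v > 0:
        print(p, 'strong local minimizer')
    else:
        print(p, 'undetermined (procedure fails)')
```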

The procedure, although very useful, has important limitations. First of all, it can deal only with the interior points of C at which f is twice continuously differentiable. It is, instead, completely silent on the other points of C – that is, on its boundary points as well as on its interior points at which f is not twice continuously differentiable.

Example 1019 Let f : [0, 1] → R be defined by

f(x) =  x   if x ∈ (0, 1)
        2   if x ∈ {0, 1}

The boundary points 0 and 1 are local maximizers, but the procedure is not able to recognize them as such. N

A further limitation of the procedure is the indeterminacy in the case f″(x) = 0, as the simple function f(x) = x⁴ most eloquently shows: whether or not the stationary point x = 0 is a local minimizer cannot be determined through the procedure because f″(0) = 0. Let us see another example which is as trivial as disconcerting (for the procedure's self-esteem).

Example 1020 A constant function f : R → R is trivially twice continuously differentiable on R. Given any open set C of R, we have f′(x) = f″(x) = 0 for every x ∈ C. Therefore, all the points of C are stationary and the procedure is not able to say anything about their nature. But each point in C is trivially both a maximizer and a minimizer (and a global one too!). N

22.5.3 Searching global extremal points via first and second order conditions

We can apply what we just learned to the unconstrained optimization problem (22.5), refining for the scalar case the analysis of Section 22.1.3. So, consider the unconstrained optimization problem

maxₓ f(x)   sub x ∈ C

where the set C is an open set of the real line. Assume that f is twice continuously differentiable on C, that is, f ∈ C²(C).

By Corollary 1017-(i), we now have a further necessary condition for a point x̂ ∈ C to be a solution, that is, the second-order necessary condition f″(x̂) ≤ 0. We thus have the following procedure for finding solutions of the unconstrained optimization problem:

1. Determine the set S ⊆ C of the stationary interior points of f by solving the first-order condition f′(x) = 0.

2. Compute f″ at each of the stationary points x ∈ S and compute the set S₂ = {x ∈ S : f″(x) ≤ 0}.

3. Determine the set

S₃ = {x ∈ S₂ : f(x) ≥ f(x′) for all x′ ∈ S₂}

of the points of C that are candidate solutions of the optimization problem.

Note that the procedure is not conclusive because a key piece of information is lacking: whether the problem actually admits a solution. The differential methods of this chapter do not ensure the existence of a solution, which only Weierstrass' and Tonelli's Theorems are able to guarantee (in the absence of concavity properties of the objective functions). In Chapter 28, we will show how the elimination method refines, in a resolutive way, the procedure that we outlined here by combining such existence theorems with the differential methods.

Example 1021 As usual, the study of the cubic function f(x) = x³ is of illuminating simplicity: though the unconstrained optimization problem

maxₓ x³   sub x ∈ R

does not admit solutions, nevertheless the procedure determines the singleton S₃ = {0}. According to the procedure, the point 0 is the unique candidate solution of the problem: unfortunately, the solution does not exist and it is, therefore, a useless candidacy. N

The next examples illustrate the procedure.

Example 1022 Let f : R → R be given by f(x) = e^(−x⁴+x²) and let C = R. Let us apply the procedure to the unconstrained optimization problem

maxₓ e^(−x⁴+x²)   sub x ∈ R

The first-order condition f′(x) = 0 is

(−4x³ + 2x) e^(−x⁴+x²) = 0

So, x = 0 and x = ±1/√2 are the unique stationary points, that is,

S = {−1/√2, 0, 1/√2}

Since

f″(x) = 2(8x⁶ − 8x⁴ − 4x² + 1) e^(−x⁴+x²)

we have f″(0) > 0 and f″(1/√2) = f″(−1/√2) < 0, so

S₂ = {−1/√2, 1/√2}

On the other hand, f(1/√2) = f(−1/√2), and hence S₃ = S₂. In conclusion, the points x = ±1/√2 are the candidate solutions of the unconstrained optimization problem. Example 1266, through the elimination method, will show that these points are, indeed, solutions of the problem. N
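The three sets S, S₂, S₃ of the procedure can be computed mechanically; a sympy sketch (our illustration, not from the text) for Example 1022:

```python
import sympy as sp

x = sp.Symbol('x', real=True)
f = sp.exp(-x**4 + x**2)
f1, f2 = sp.diff(f, x), sp.diff(f, x, 2)

S = sp.solve(f1, x)                                   # stage 1: f'(x) = 0
S2 = [p for p in S if f2.subs(x, p) <= 0]             # stage 2: f''(p) <= 0
fmax = max((f.subs(x, p) for p in S2), key=sp.N)
S3 = [p for p in S2 if f.subs(x, p) == fmax]          # stage 3: best candidates
print(S, S2, S3)  # S3 contains -sqrt(2)/2 and sqrt(2)/2
```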

Example 1023 Consider again Example 1018 and the unconstrained optimization problem

maxₓ 10x³(x − 1)²   sub x ∈ R

From Example 1018 we know that

S = {0, 1, 3/5}

as well as that f″(0) = 0, f″(1) > 0 and f″(3/5) < 0. Hence,

S₂ = {0, 3/5}

Since

f(0) = 0 < f(3/5)

we get

S₃ = {3/5}

The point x = 3/5 is therefore the unique candidate solution of the unconstrained optimization problem. As in the example of the cubic function, unfortunately this candidacy is vain: indeed,

lim_{x→+∞} 10x³(x − 1)² = +∞

Therefore the function, being unbounded above, has no global maximizers on R. The unconstrained optimization problem has no solutions. N

It is important to observe how the global nature of the solution gives a different perspective on Corollary 1017. Of this result, we are interested in point (i), which provides a necessary condition for local maximizers (the second-order necessary condition of the form f″(x) ≤ 0). At the same time, in the previous search for local extremal points we considered point (ii) of that result, which covers sufficiency (the second-order sufficient condition of the form f″(x) < 0).

From the "global" point of view, the fact that f″(x) < 0 implies that x is a strong local maximizer is of secondary importance. Indeed, it is not conclusive: the point could be just a local maximizer and, moreover, we could also have solutions where f″(x) = 0.⁸ In contrast, the information f″(x) > 0 is conclusive in that it excludes, ipso facto, that x may be a solution.

This is another example of how the global point of view, the one in which we are really interested in applications, can lead us to view things in a different way relative to a local point of view.⁹

22.5.4 A false start: global extremal points

The intuition presented at the beginning of the section can lead, for open domains and with global hypotheses of differentiability, to simple sufficient conditions for global extremal points. Also here we limit ourselves to the scalar case.

Proposition 1024 Let f : (a, b) → R be differentiable, with a, b ∈ R̄. A point x₀ ∈ (a, b) is a global maximizer if, for every x, y ∈ (a, b), we have

x < x₀ < y ⟹ f′(x) ≥ 0 ≥ f′(y) (22.18)

If the inequalities are strict, the maximizer is strong (so, unique).

Naturally, "x < x₀ < y implies f′(x) ≤ 0 ≤ f′(y)" is the dual version of (22.18) that leads to global minimizers.

Proof Let x ∈ (a, b) be such that x < x₀. Fixing any ε ∈ (x₀ − x, x₀ − a), it follows that x ∈ (x₀ − ε, x₀). By the Mean Value Theorem there exists ξ ∈ (x₀ − ε, x₀) such that

(f(x₀) − f(x)) / (x₀ − x) = f′(ξ)

By (22.18), f′(ξ) ≥ 0, from which we deduce that f(x₀) ≥ f(x). In an analogous way we prove that f(x₀) ≥ f(y) for every y > x₀. In conclusion, f(x₀) ≥ f(x) for every x ∈ (a, b), and so x₀ is a maximizer.

Example 1025 If we go back to f(x) = 1 − x² of Example 1012, we have

x < 0 < y ⟹ f′(x) > 0 > f′(y)

So, by Proposition 1024, x₀ = 0 is a strong global maximizer. N


⁸For example, this is the case for the unconstrained optimization problem maxₓ −x⁴ sub x ∈ R.
⁹Calculus courses often emphasize the local viewpoint. Motivated by applications, throughout the book we do the opposite.
22.6 De l'Hospital's Theorem and rule

22.6.1 Indeterminate forms 0/0 and ∞/∞

In this section we consider the so-called de l'Hospital's rule,¹⁰ another classic application of the Mean Value Theorem that is most useful in the computation of limits that come in the indeterminate forms 0/0 and ∞/∞.

As we will see, the rule says that, under suitable conditions, it is possible to reduce the computation of the limit of a ratio lim_{x→x₀} f(x)/g(x) to that of the ratio of the derivatives, that is, lim_{x→x₀} f′(x)/g′(x). Since this latter limit may be simpler than the former one, the rule offers one more instrument in the calculation of limits. As just anticipated, it reveals itself particularly valuable for the indeterminate forms of the type 0/0 and ∞/∞ (to which, as we know, it is possible to reduce all the other ones).

Theorem 1026 (de l'Hospital) Let f, g : (a, b) → R be differentiable on (a, b), with a, b ∈ R̄ and g′(x) ≠ 0 for every x ∈ (a, b), and let x₀ ∈ [a, b], with

lim_{x→x₀} f′(x)/g′(x) = L ∈ R̄ (22.19)

If either lim_{x→x₀} f(x) = lim_{x→x₀} g(x) = 0 or lim_{x→x₀} f(x) = lim_{x→x₀} g(x) = ∞, then

lim_{x→x₀} f(x)/g(x) = L

Thus, de l'Hospital's rule says that, under the hypotheses just indicated, we have

lim_{x→x₀} f′(x)/g′(x) = L ⟹ lim_{x→x₀} f(x)/g(x) = L

i.e., the calculation of the limit lim_{x→x₀} f(x)/g(x) can be reduced to the calculation of the limit of the ratio of the derivatives lim_{x→x₀} f′(x)/g′(x). The simpler the second limit compared to the original one, the greater the usefulness of the rule.

Note that the – by now usual – clause a, b ∈ R̄ allows the interval (a, b) to be unbounded. The rule holds, therefore, also for limits as x → ±∞. Moreover, it applies also to one-sided limits, even if for brevity we have omitted this case in the statement.

¹⁰The result is actually due to Johann Bernoulli.
We omit the proof of de l'Hospital's Theorem. Next we illustrate the rule with some examples.

Example 1027 Let f : (−1, +∞) → R be given by f(x) = log(1 + x) and let g : R → R be given by g(x) = x. For x₀ = 0 the limit lim_{x→x₀} f(x)/g(x) is of the indeterminate form 0/0. Let us see if de l'Hospital's rule can be applied and be of any help.

Let B_ε(0) = (−ε, ε) be a neighborhood of x₀ such that (−ε, ε) ⊆ (−1, +∞). In (−ε, ε) the hypotheses of de l'Hospital's rule are satisfied. Hence,

lim_{x→x₀} f′(x)/g′(x) = lim_{x→0} [1/(1 + x)]/1 = lim_{x→0} 1/(1 + x) = 1 ⟹ lim_{x→x₀} f(x)/g(x) = lim_{x→0} log(1 + x)/x = 1

So, de l'Hospital's rule proved to be useful in the solution of an indeterminate form. N

Example 1028 Let f, g : R → R be given by f(x) = sin x and g(x) = x. Set x₀ = 0 and consider the classic limit lim_{x→x₀} f(x)/g(x). In every interval (−ε, ε) the hypotheses of de l'Hospital's rule are satisfied, so

lim_{x→x₀} f′(x)/g′(x) = lim_{x→0} cos x / 1 = lim_{x→0} cos x = 1 ⟹ lim_{x→x₀} f(x)/g(x) = lim_{x→0} sin x / x = 1

It is nice to see how de l'Hospital's rule solves, in a simple way, this classic limit. N

Example 1029 Let f : (0, +∞) → R be given by f(x) = log x and g : R → R be given by g(x) = x. Setting x₀ = +∞, the limit lim_{x→x₀} f(x)/g(x) is in the indeterminate form ∞/∞. In every interval (a, +∞), with a > 0, the hypotheses of de l'Hospital's rule are satisfied. So,

lim_{x→x₀} f′(x)/g′(x) = lim_{x→+∞} (1/x)/1 = 0 ⟹ lim_{x→x₀} f(x)/g(x) = lim_{x→+∞} log x / x = 0

N

The next example shows that for the solution of some limits it may be necessary to apply de l'Hospital's rule several times.

Example 1030 Let f, g : R → R be given by f(x) = eˣ and g(x) = x². Setting x₀ = +∞, the limit lim_{x→x₀} f(x)/g(x) is in the indeterminate form ∞/∞. In every interval (a, +∞), with a > 0, the hypotheses of de l'Hospital's rule are satisfied. We have

lim_{x→x₀} f′(x)/g′(x) = lim_{x→+∞} eˣ/(2x) = (1/2) lim_{x→+∞} eˣ/x ⟹ lim_{x→x₀} f(x)/g(x) = lim_{x→+∞} eˣ/x² = (1/2) lim_{x→+∞} eˣ/x (22.20)

obtaining a simpler limit, but still not solved.

Let us apply again de l'Hospital's rule to the derivative functions f′, g′ : R → R given by f′(x) = eˣ and g′(x) = x. Again in every interval (a, +∞), with a > 0, the hypotheses of de l'Hospital's rule are satisfied, and hence

lim_{x→x₀} f″(x)/g″(x) = lim_{x→+∞} eˣ/1 = +∞ ⟹ lim_{x→x₀} f′(x)/g′(x) = lim_{x→+∞} eˣ/x = +∞

Thanks to (22.20), we conclude that

lim_{x→x₀} f(x)/g(x) = lim_{x→+∞} eˣ/x² = +∞

To calculate this limit we had to apply de l'Hospital's rule twice. N
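For what it is worth, a computer algebra system can confirm such limits directly; a sympy sketch (our illustration) for Example 1030, checking both the original ratio and the ratio of derivatives:

```python
import sympy as sp

x = sp.Symbol('x', real=True)
f, g = sp.exp(x), x**2

print(sp.limit(f/g, x, sp.oo))                          # oo
print(sp.limit(sp.diff(f, x)/sp.diff(g, x), x, sp.oo))  # oo, as the rule predicts
```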

Example 1031 In a similar way it is possible to calculate the limit of the ratio between f(x) = 1 − cos x and g(x) = x² as x → 0:

lim_{x→x₀} f′(x)/g′(x) = lim_{x→0} sin x / (2x) = lim_{x→0} (cos x)/2 = 1/2 ⟹ lim_{x→x₀} f(x)/g(x) = lim_{x→0} (1 − cos x)/x² = 1/2

N

In some cases de l'Hospital's rule is useless or even counterproductive. This happens when the behavior of the ratio f′(x)/g′(x) is more irregular than that of the original ratio f(x)/g(x). The next examples illustrate this unpleasant situation.

Example 1032 Let f, g : R → R be given by f(x) = e^(x²) and g(x) = eˣ. Setting x₀ = +∞, the limit lim_{x→x₀} f(x)/g(x) is in the indeterminate form ∞/∞. In every interval (a, +∞), with a > 0, the hypotheses of de l'Hospital's rule are satisfied. We have

lim_{x→x₀} f′(x)/g′(x) = lim_{x→+∞} 2x e^(x²)/eˣ = 2 lim_{x→+∞} x e^(x²)/eˣ ⟹ lim_{x→x₀} f(x)/g(x) = lim_{x→+∞} e^(x²)/eˣ = 2 lim_{x→+∞} x e^(x²)/eˣ

and therefore the application of de l'Hospital's rule has led to a more complicated limit than the original one. In this case, the rule is useless, while the limit can be solved very easily in a direct way:

lim_{x→+∞} e^(x²)/eˣ = lim_{x→+∞} e^(x² − x) = lim_{x→+∞} e^(x(x−1)) = +∞

As usual, cogito ergo solvo: mindless mechanical arguments may well lead astray. N

Example 1033 Let f, g : R → R be given by f(x) = sin x and g(x) = x. By setting x₀ = +∞, we can easily prove that lim_{x→x₀} f(x)/g(x) = 0. On the other hand, in every interval (a, +∞), with a > 0, the hypotheses of de l'Hospital's rule are satisfied since lim_{x→+∞} g(x) = +∞. However, the limit

lim_{x→x₀} f′(x)/g′(x) = lim_{x→+∞} cos x / 1

does not exist. If we tried to compute the simple limit lim_{x→x₀} f(x)/g(x) = 0 through de l'Hospital's rule we would have used a tool both useless, given the simplicity of the limit, and ineffective. Again, a mechanical use of the rule can be very misleading. N

Summing up, de l'Hospital's rule is a useful tool in the computation of limits, but its usefulness must be evaluated case by case. Moreover, it is important to note that de l'Hospital's Theorem states that, if lim f′/g′ exists, then lim f/g exists too, and the two limits are equal. The converse does not hold: it may happen that lim f/g exists but not lim f′/g′. We have already seen an example of this, but we show two other examples, a bit more complicated.
Example 1034 Given f(x) = x − sin x and g(x) = x + sin x, we have

lim_{x→∞} f(x)/g(x) = lim_{x→∞} (x − sin x)/(x + sin x) = lim_{x→∞} [1 − (sin x)/x] / [1 + (sin x)/x] = 1

but

lim_{x→∞} f′(x)/g′(x) = lim_{x→∞} (1 − cos x)/(1 + cos x)

does not exist because both the numerator and the denominator oscillate between 0 and 2, so the ratio oscillates between 0 and +∞. N

Example 1035 Given f(x) = x² sin(1/x) and g(x) = log(1 + x), we have

lim_{x→0} f(x)/g(x) = lim_{x→0} x² sin(1/x) / log(1 + x) = lim_{x→0} [x sin(1/x)] / [log(1 + x)/x] = 0/1 = 0

But

lim_{x→0} f′(x)/g′(x) = lim_{x→0} [2x sin(1/x) − cos(1/x)] / [1/(1 + x)]

does not exist because the denominator tends to 1 and, in the numerator, the first summand tends to 0 and the second one does not admit a limit. N

22.6.2 Other indeterminacies

De l'Hospital's rule can be applied, through suitable manipulations, also to the indeterminate forms ∞ − ∞ and 0·∞.

Let us start with the form 0·∞. Let f, g : (a, b) → R be differentiable on (a, b) and let x₀ ∈ [a, b] be such that lim_{x→x₀} f(x) = 0 and lim_{x→x₀} g(x) = ∞, so that the limit lim_{x→x₀} f(x)g(x) appears in the indeterminate form 0·∞. Let, for example, lim_{x→x₀} g(x) = +∞ (the case lim_{x→x₀} g(x) = −∞ is analogous). There exists a > 0 such that g(x) > 0 for every x ∈ (a, +∞). Therefore,

lim_{x→x₀} f(x)g(x) = lim_{x→x₀} f(x) / (1/g(x))

with lim_{x→x₀} 1/g(x) = 0, and de l'Hospital's rule is applicable to the functions f and 1/g. If f is different from zero in a neighborhood of x₀, we can also write

lim_{x→x₀} f(x)g(x) = lim_{x→x₀} g(x) / (1/f(x))

with lim_{x→x₀} 1/f(x) = ∞. In this case, de l'Hospital's rule can be applied to the functions g and 1/f. Which one of the two possible applications of the rule is more convenient must be evaluated case by case.

Example 1036 Let f : R → R be given by f(x) = x and g : (0, +∞) → R be given by g(x) = log x. Setting x₀ = 0, the one-sided limit lim_{x→x₀⁺} f(x)g(x) is in the indeterminate form 0·∞. The function 1/x is defined and strictly positive on (0, +∞). On every interval (0, a), with a > 0, the hypotheses of de l'Hospital's rule are satisfied for the functions log x and 1/x since lim_{x→0⁺} log x = −∞ and lim_{x→0⁺} 1/x = +∞. Hence

lim_{x→x₀⁺} g′(x) / (1/f(x))′ = lim_{x→0⁺} (1/x) / (−1/x²) = lim_{x→0⁺} (−x) = 0 ⟹ lim_{x→x₀⁺} g(x) / (1/f(x)) = lim_{x→x₀⁺} f(x)g(x) = 0

N

Turn now to the indeterminate form ∞ − ∞. Let f, g : (a, b) → R be differentiable on (a, b) and let x₀ ∈ [a, b] be such that lim_{x→x₀} f(x) = +∞ and lim_{x→x₀} g(x) = −∞. Let us suppose, for simplicity, that in a neighborhood of x₀ both g and f are different from zero. There are at least two possible ways to proceed. We can consider

lim_{x→x₀} (f(x) + g(x)) = lim_{x→x₀} f(x) [1 + g(x)/f(x)] (22.21)

and apply de l'Hospital's rule to the limit lim_{x→x₀} g(x)/f(x), which has the form ∞/∞. Alternatively, we can consider

lim_{x→x₀} (f(x) + g(x)) = lim_{x→x₀} [1/f(x) + 1/g(x)] / [1/(f(x)g(x))] (22.22)

and apply de l'Hospital's rule to the limit

lim_{x→x₀} [1/f(x) + 1/g(x)] / [1/(f(x)g(x))]

which is in the form 0/0.

Example 1037 Let f : R → R be given by f(x) = x and g : (0, +∞) → R be given by g(x) = −log x. Setting x₀ = +∞, the limit lim_{x→x₀} (f(x) + g(x)) is in the indeterminate form ∞ − ∞. In Example 1029 we saw, thanks to de l'Hospital's rule, that lim_{x→+∞} (log x)/x = 0. It follows that

lim_{x→+∞} (x − log x) = lim_{x→+∞} x [1 − (log x)/x] = +∞

and hence the approach (22.21) has allowed us to calculate the limit. N
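Both reductions can be sanity-checked symbolically; a sympy sketch (ours, not from the text) for the limits of Examples 1036 and 1037:

```python
import sympy as sp

x = sp.Symbol('x', positive=True)

print(sp.limit(x * sp.log(x), x, 0, '+'))  # 0   (form 0*oo, Example 1036)
print(sp.limit(x - sp.log(x), x, sp.oo))   # oo  (form oo - oo, Example 1037)
```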
Chapter 23

Approximation

23.1 Taylor's polynomial approximation

23.1.1 Polynomial expansions

Thanks to Theorem 936, a function f : (a, b) → R that is differentiable at x₀ ∈ (a, b) admits locally, at such a point, the linear approximation

f(x₀ + h) = f(x₀) + f′(x₀)h + o(h)   as h → 0

This approximation has two basic properties:

(i) the simplicity of the approximating function: the affine function f(x₀) + f′(x₀)h = f(x₀) + df(x₀)(h) (geometrically, a straight line);

(ii) the quality of the approximation, given by the error term o(h).

Intuitively, there is a tension between these two properties: the simpler the approximating function, the worse the quality of the approximation. In other terms, the simpler we want the approximating function to be, the higher the error which we may incur. In this section we study in detail the relation between these two key properties. In particular, suppose one modifies property (i) with an approximating function that is a polynomial of degree n, not necessarily with n = 1 as in the case of a straight line. The desideratum that we posit is that there is a corresponding improvement in the error term, namely, it should become of magnitude o(hⁿ). In other words, when the degree n of the approximating polynomial increases, and so does the complexity of the approximating function, we want the error term to improve in a parallel way: an increase in the complexity of the approximating function should be compensated by an improvement in the quality of the approximation.

To formalize these ideas, we introduce polynomial expansions. Recall that a polynomial pₙ : R → R of, at most, degree n has the form pₙ(h) = α₀ + α₁h + α₂h² + ⋯ + αₙhⁿ.

Definition 1038 A function f : (a, b) → R admits a polynomial expansion of degree n at x₀ ∈ (a, b) if there exists a polynomial pₙ : R → R, of at most degree n, such that

f(x₀ + h) = pₙ(h) + o(hⁿ)   as h → 0 (23.1)

for every h ≠ 0 with x₀ + h ∈ (a, b), that is, with h ∈ (a − x₀, b − x₀).

For n = 1, the polynomial pₙ reduces to the affine function r(h) = α₀ + α₁h of Section 20.12.1, so the approximation (23.1) reduces to (20.24). Therefore, for n = 1 the expansion of f at x₀ is equal, apart from the known term α₀, to the differential of f at x₀.

For n ≥ 2 the notion of polynomial expansion goes beyond that of differential. In particular, f has a polynomial expansion of degree n at x₀ ∈ (a, b) if there exists a polynomial pₙ : R → R that approximates f(x₀ + h) with an error which is o(hⁿ), i.e., which, as h → 0, goes to zero faster than hⁿ. To a polynomial approximation of degree n there corresponds, therefore, an error term of magnitude o(hⁿ), thus formalizing the aforementioned trade-off between the complexity of the approximating function and the goodness of the approximation.

For example, if n = 2 we have the so-called quadratic approximation

f(x₀ + h) = α₀ + α₁h + α₂h² + o(h²)   as h → 0

Compared to the linear approximation

f(x₀ + h) = α₀ + α₁h + o(h)   as h → 0

the approximating function is now more complicated: instead of a straight line – the polynomial of first degree α₀ + α₁h – we have a quadratic function – the polynomial of second degree α₀ + α₁h + α₂h². On the other hand, the error term is now better: instead of o(h) we have o(h²).

N.B. By setting x = x₀ + h, the polynomial expansion can be equivalently recast as

f(x) = Σ_{k=0}^{n} αₖ(x − x₀)^k + o((x − x₀)ⁿ)   as x → x₀ (23.2)

for every x ∈ (a, b). It is a form often used. O

Next we establish a key property: when they exist, polynomial expansions are unique.

Lemma 1039 A function f : (a, b) → R has at most one polynomial expansion of degree n at every point x₀ ∈ (a, b).

Proof Suppose that, for every h ∈ (a − x₀, b − x₀), there are two expansions

α₀ + α₁h + α₂h² + ⋯ + αₙhⁿ + o(hⁿ) = β₀ + β₁h + β₂h² + ⋯ + βₙhⁿ + o(hⁿ) (23.3)

Then

α₀ = lim_{h→0} (α₀ + α₁h + α₂h² + ⋯ + αₙhⁿ + o(hⁿ)) = lim_{h→0} (β₀ + β₁h + β₂h² + ⋯ + βₙhⁿ + o(hⁿ)) = β₀

and (23.3) becomes

α₁h + α₂h² + ⋯ + αₙhⁿ + o(hⁿ) = β₁h + β₂h² + ⋯ + βₙhⁿ + o(hⁿ) (23.4)

Dividing both sides by h, we then get

α₁ + α₂h + ⋯ + αₙh^(n−1) + o(h^(n−1)) = β₁ + β₂h + ⋯ + βₙh^(n−1) + o(h^(n−1))

Hence,

α₁ = lim_{h→0} (α₁ + α₂h + ⋯ + αₙh^(n−1) + o(h^(n−1))) = lim_{h→0} (β₁ + β₂h + ⋯ + βₙh^(n−1) + o(h^(n−1))) = β₁

and (23.4) becomes

α₂h² + ⋯ + αₙhⁿ + o(hⁿ) = β₂h² + ⋯ + βₙhⁿ + o(hⁿ)

Continuing in this way we can show that α₂ = β₂, and so on until we show that αₙ = βₙ. This proves that at most one polynomial pₙ(h) can satisfy approximation (23.1).

23.1.2 Taylor's Theorem

Definition 1040 Let f : (a, b) → R be a function n times differentiable at a point x0 ∈ (a, b). The polynomial Tn : R → R of degree at most n given by

    Tn(h) = f(x0) + f'(x0) h + (1/2) f''(x0) h^2 + ··· + (1/n!) f^(n)(x0) h^n
          = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k

is called the Taylor polynomial of degree n of f at x0.

To ease notation we set f^(0) = f. The polynomial Tn has as coefficients the derivatives of f at the point x0, up to order n. In particular, if x0 = 0 the Taylor polynomial is sometimes called the Maclaurin polynomial.

The next result, fundamental and of great elegance, shows that if f has a suitable number of derivatives at x0, the unique polynomial expansion is given precisely by the Taylor polynomial.

Theorem 1041 (Taylor) Let f : (a, b) → R be a function that is n − 1 times differentiable on (a, b). If f is n times differentiable at x0 ∈ (a, b), then it has at x0 a unique polynomial expansion pn of degree n, given by

    pn(h) = Tn(h)      (23.5)

Under simple hypotheses of differentiability at x0, we thus obtain the fundamental polynomial approximation

    f(x0 + h) = Tn(h) + o(h^n) = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k + o(h^n)      (23.6)

where Tn is the unique polynomial, of degree at most n, that satisfies Definition 1038, i.e., that approximates f(x0 + h) with error o(h^n).

The approximation (23.6) is called the Taylor expansion (or formula) of order n of f at x0. The important special case x0 = 0 is called the Maclaurin expansion (or formula) of order n of f.

Note that for n = 1 Taylor's Theorem coincides with the "if" direction of Theorem 936. Indeed, since we set f^(0) = f, saying that f is 0 times differentiable on (a, b) is simply equivalent to saying that f is defined on (a, b). Hence, for n = 1 Taylor's Theorem states that, if f : (a, b) → R is differentiable at x0 ∈ (a, b), then

    f(x0 + h) = T1(h) + o(h) = f(x0) + f'(x0) h + o(h)   as h → 0

that is, f is differentiable at x0.

For n = 1, the polynomial approximation (23.6) reduces, therefore, to the linear approximation (20.29), that is, to

    f(x0 + h) = f(x0) + f'(x0) h + o(h)   as h → 0

For n = 2, (23.6) becomes the quadratic (or second-order) approximation

    f(x0 + h) = f(x0) + f'(x0) h + (1/2) f''(x0) h^2 + o(h^2)   as h → 0      (23.7)

and so on for higher orders.

Approximation (23.6) is key in applications and is the concrete form taken by the aforementioned tension between the complexity of the approximating polynomial and the quality of the approximation. The trade-off must be resolved case by case, according to the relative importance that the two properties of the approximation (complexity and quality) have in the particular application at hand. In many cases, however, the quadratic approximation (23.7) is a good compromise and so, among all the possible approximations, it has a particular importance.
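A short numerical sketch (not part of the original text; plain Python, using for f = e^x the Taylor coefficients derived in the next subsection) illustrates the trade-off: the linear error vanishes relative to h, the quadratic error relative to h^2.

    # For f(x) = e^x at x0 = 0, compare the linear and quadratic approximations.
    # Being o(h) and o(h^2) errors, error/h and error/h^2 both tend to 0 as h -> 0.
    import math

    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        f = math.exp(h)
        T1 = 1 + h                # linear approximation at x0 = 0
        T2 = 1 + h + h**2 / 2     # quadratic approximation at x0 = 0
        print(f"h={h:.0e}  (f-T1)/h = {(f - T1) / h:.6f}  (f-T2)/h^2 = {(f - T2) / h**2:.6f}")

Both ratios shrink with h, while the quadratic error itself is smaller by an order of magnitude, as the discussion above anticipates.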

O.R. Graphically, the quadratic approximation is a parabola. The linear approximation, as we well know, is graphically the straight line tangent to the graph of the function. The quadratic approximation is the so-called osculating parabola,1 which shares with the function at x0 the same value, the same slope (first derivative), and the same curvature (second derivative). H

Proof In light of Lemma 1039, it is sufficient to show that Taylor's polynomial satisfies (23.1). Let us start by observing preliminarily that, by hypothesis, the higher order derivative functions f^(k) : (a, b) → R exist for every 1 ≤ k ≤ n − 1. Moreover, by Proposition 937, f^(k) is continuous at x0 for 1 ≤ k ≤ n − 1. Let φ : (a − x0, b − x0) → R and ψ : R → R be the auxiliary functions given by, respectively,

    φ(h) = f(x0 + h) − Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k   and   ψ(h) = h^n

1 From the Latin os, mouth (that is, it is the "kissing" parabola, where the kiss is with f at x0).

We have to prove that

    lim_{h→0} φ(h)/ψ(h) = 0      (23.8)

We have, for every 0 ≤ k ≤ n − 1,

    lim_{h→0} ψ^(k)(h) = ψ^(k)(0) = 0      (23.9)

Moreover, for every 0 ≤ k ≤ n − 1,

    φ^(k)(h) = f^(k)(x0 + h) − f^(k)(x0) − Σ_{j=1}^{n−k} [f^(k+j)(x0)/j!] h^j      (23.10)

so that, since f^(k) is continuous at x0 for 0 ≤ k ≤ n − 1,

    lim_{h→0} φ^(k)(h) = φ^(k)(0) = 0      (23.11)

Thanks to (23.9) and (23.11), we can apply de l'Hospital's rule n − 1 times, and get

    lim_{h→0} φ^(n−1)(h)/ψ^(n−1)(h) = L  ⟹  lim_{h→0} φ^(n−2)(h)/ψ^(n−2)(h) = L  ⟹  ···  ⟹  lim_{h→0} φ^(0)(h)/ψ^(0)(h) = L      (23.12)

with L ∈ R. Simple calculations show that ψ^(n−1)(h) = n!h. Hence, since f has n derivatives at x0, expression (23.10) with k = n − 1 yields

    lim_{h→0} φ^(n−1)(h)/ψ^(n−1)(h) = (1/n!) lim_{h→0} [f^(n−1)(x0 + h) − f^(n−1)(x0) − h f^(n)(x0)] / h
        = (1/n!) [ lim_{h→0} (f^(n−1)(x0 + h) − f^(n−1)(x0))/h − f^(n)(x0) ] = 0

By (23.12), we can therefore conclude that (23.8) holds, as desired.

As seen for (23.2), by setting x = x0 + h the polynomial approximation (23.6) can be rewritten as

    f(x) = Σ_{k=0}^{n} [f^(k)(x0)/k!] (x − x0)^k + o((x − x0)^n)      (23.13)

This is the form in which the approximation is often stated.

We now illustrate Taylor’s (or Maclaurin’s) expansions with some examples.

Example 1042 Polynomials have, trivially, polynomial approximations. Indeed, if f : R → R is itself a polynomial, f(x) = Σ_{k=0}^{n} αk x^k, then we obtain the identity

    f(x) = Σ_{k=0}^{n} [f^(k)(0)/k!] x^k   ∀x ∈ R

since, as the reader can verify, one has

    αk = f^(k)(0)/k!   ∀1 ≤ k ≤ n

Each polynomial can therefore be equivalently rewritten as a Maclaurin expansion. For example, if f(x) = x^4 − 3x^3, we have f'(x) = 4x^3 − 9x^2, f''(x) = 12x^2 − 18x, f'''(x) = 24x − 18, and f^(iv)(x) = 24, so

    α0 = f(0) = 0 ,  α1 = f'(0) = 0 ,  α2 = f''(0)/2! = 0
    α3 = f'''(0)/3! = −18/6 = −3 ,  α4 = f^(iv)(0)/4! = 24/24 = 1

N

Example 1043 Let f : (0, +∞) → R be given by f(x) = log(1 + x). It is n times differentiable at each point of its domain, with

    f^(n)(x) = (−1)^{n+1} (n − 1)!/(1 + x)^n   ∀n ≥ 1

Therefore, the Taylor expansion of order n of f at x0 > 0 is

    log(1 + x0 + h) = log(1 + x0) + h/(1 + x0) − h^2/(2(1 + x0)^2)
                      + h^3/(3(1 + x0)^3) − ··· + (−1)^{n+1} h^n/(n(1 + x0)^n) + o(h^n)
                    = log(1 + x0) + Σ_{k=1}^{n} (−1)^{k+1} h^k/(k(1 + x0)^k) + o(h^n)

or equivalently, using (23.13),

    log(1 + x) = log(1 + x0) + Σ_{k=1}^{n} (−1)^{k+1} (x − x0)^k/(k(1 + x0)^k) + o((x − x0)^n)

Note how a simple polynomial approximates the logarithmic function as well as we wish, because o((x − x0)^n) can be made arbitrarily small. In particular, the Maclaurin expansion of order n of f is

    log(1 + x) = x − x^2/2 + x^3/3 − ··· + (−1)^{n+1} x^n/n + o(x^n)      (23.14)
               = Σ_{k=1}^{n} (−1)^{k+1} x^k/k + o(x^n)

Example 1044 In a similar way the reader can verify the Maclaurin expansions of order n of the following elementary functions:

    e^x = 1 + x + x^2/2 + x^3/3! + ··· + x^n/n! + o(x^n) = Σ_{k=0}^{n} x^k/k! + o(x^n)

    sin x = x − x^3/3! + x^5/5! − ··· + (−1)^n x^{2n+1}/(2n+1)! + o(x^{2n+1})
          = Σ_{k=0}^{n} (−1)^k x^{2k+1}/(2k+1)! + o(x^{2n+1})

    cos x = 1 − x^2/2 + x^4/4! − ··· + (−1)^n x^{2n}/(2n)! + o(x^{2n})
          = Σ_{k=0}^{n} (−1)^k x^{2k}/(2k)! + o(x^{2n})

Here too it is important to observe how such functions can be (well) approximated by simple polynomials. N
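These expansions are easy to check with a computer algebra system. A possible sketch (not part of the original text; it assumes the Python library sympy is available):

    import sympy as sp

    x = sp.symbols('x')
    # Maclaurin expansions; sympy reports the residual term as O(x**6)
    for f in (sp.exp(x), sp.sin(x), sp.cos(x), sp.log(1 + x)):
        print(f, '=', sp.series(f, x, 0, 6))

The printed series coincide, up to the order shown, with the formulas of Examples 1043 and 1044.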

Example 1045 Let f : (−1, +∞) → R be given by f(x) = log(1 + x^3) − 3 sin^2 x. This function is infinitely differentiable (i.e., it has derivatives of any order) at each point of its domain. Let us calculate the second-order Maclaurin expansion. We have

    f'(x) = 3x^2/(1 + x^3) − 6 cos x sin x ,   f''(x) = (6x − 3x^4)/(1 + x^3)^2 − 6(cos^2 x − sin^2 x)

So f(0) = 0, f'(0) = 0, f''(0) = −6, and

    f(x) = f(0) + f'(0) x + (1/2) f''(0) x^2 + o(x^2) = −3x^2 + o(x^2)      (23.15)

N

Example 1046 Let f : (−1, +∞) → R be given by f(x) = e^{−x}(log(1 + x) − 1) + 1. This function is infinitely differentiable at each point of its domain. We leave it to the reader to verify that the third-order Taylor expansion at x0 = 3 is given by

    f(x) = (log 4 − 1)/e^3 + 1 + [(5 − 4 log 4)/(4e^3)] (x − 3) + [(16 log 4 − 25)/(32e^3)] (x − 3)^2
           + [(63 − 32 log 4)/(192e^3)] (x − 3)^3 + o((x − 3)^3)

N
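As a hedge against computational slips in examples like the last one, a symbolic check is convenient. A sketch (assuming sympy):

    import sympy as sp

    x = sp.symbols('x')
    f = sp.exp(-x) * (sp.log(1 + x) - 1) + 1
    # third-order Taylor expansion at x0 = 3; removeO() drops the o((x-3)^3) term
    T3 = sp.series(f, x, 3, 4).removeO()
    print(sp.simplify(T3))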

Under stronger differentiability assumptions, we can sharpen the approximation (23.6) by using the formulas (22.8) and (22.9) of the ultimate version of the Mean Value Theorem.

Theorem 1047 Let f : (a, b) → R be a function n + 1 times continuously differentiable. If x0 ∈ (a, b), then for every 0 ≠ h ∈ R with x0 + h ∈ (a, b) there exists 0 < ϑ < 1 such that

    f(x0 + h) = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k + [f^(n+1)(x0 + ϑh)/(n + 1)!] h^{n+1}      (23.16)

In particular, f^(n+1)(x0 + ϑh) h^{n+1}/(n + 1)! = o(h^n).

In other words, under the hypotheses of the theorem the error term o(h^n) can always be taken equal to

    [f^(n+1)(x0 + ϑh)/(n + 1)!] h^{n+1}      (23.17)

where the (n + 1)-th derivative is calculated at an intermediate point between x0 and x0 + h. This expression allows one to control the approximation error: if |f^(n+1)(x)| ≤ k for every x ∈ (a, b), then the approximation error does not exceed k|h|^{n+1}/(n + 1)!, that is,

    | f(x0 + h) − Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k | ≤ [k/(n + 1)!] |h|^{n+1}

The error term (23.17) is called the Lagrange remainder, while o(h^n) is called the Peano remainder. The former permits error estimates, as just remarked, but the latter is often enough to express the quality of the approximation.
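A small numerical sketch (not from the text) of how the Lagrange remainder permits error estimates: for f = sin every derivative is bounded by k = 1, so the Taylor error of order n is at most |h|^{n+1}/(n + 1)!.

    import math

    x0, h, n = 0.5, 0.3, 3
    # derivatives of sin cycle through sin, cos, -sin, -cos
    derivs = [math.sin, math.cos, lambda t: -math.sin(t), lambda t: -math.cos(t)]
    Tn = sum(derivs[k % 4](x0) * h**k / math.factorial(k) for k in range(n + 1))
    error = abs(math.sin(x0 + h) - Tn)
    bound = abs(h)**(n + 1) / math.factorial(n + 1)   # k = 1 here
    print(error <= bound, error, bound)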

Proof Suppose that h > 0. Consider the interval [x0, x0 + h] ⊆ (a, b). By formula (22.8), we have

    f(x0 + h) = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k + [f^(n+1)(x̂)/(n + 1)!] h^{n+1}

for some x̂ ∈ (x0, x0 + h). Thus, for some 0 < t < 1 we have x̂ = t x0 + (1 − t)(x0 + h), so x̂ = x0 + ϑh by setting ϑ = 1 − t. We thus get (23.16).

Suppose that h < 0. If we now consider the interval [x0 + h, x0] ⊆ (a, b), by formula (22.9) we have

    f(x0 + h) = Σ_{k=0}^{n} [f^(k)(x0)/k!] h^k + [f^(n+1)(x̂)/(n + 1)!] h^{n+1}

for some x̂ ∈ (x0 + h, x0). In turn, this easily implies (23.16).

So far we have only needed f to be n + 1 times differentiable. But the continuity of f^(n+1) now allows us to write

    lim_{h→0} [f^(n+1)(x0 + ϑh) h^{n+1}] / [(n + 1)! h^n] = (1/(n + 1)!) lim_{h→0} f^(n+1)(x0 + ϑh) h = 0

So, f^(n+1)(x0 + ϑh) h^{n+1}/(n + 1)! = o(h^n).

23.1.3 Taylor expansion and limits

Taylor expansions prove very useful also in the calculation of limits. Indeed, by suitably expanding f at x0 we reduce the original limit to a simple limit of polynomials. We illustrate this with a couple of examples.

Example 1048 (i) Consider the limit

    lim_{x→0} [log(1 + x^3) − 3 sin^2 x] / log(1 + x)

Since the limit is as x → 0, we can use the second-order Maclaurin expansions (23.15) and (23.14) to approximate the numerator and the denominator. Using Lemma 470 and the little-o algebra, we have

    lim_{x→0} [log(1 + x^3) − 3 sin^2 x] / log(1 + x) = lim_{x→0} [−3x^2 + o(x^2)] / [x + o(x)] = lim_{x→0} −3x^2/x = 0

The calculation of the limit has, therefore, been considerably simplified through the combined use of Maclaurin expansions and of the comparison of infinitesimals seen in Lemma 470.

(ii) Consider the limit

    lim_{x→0} (x sin x) / log^2(1 + x)

This limit can also be calculated by combining an expansion and a comparison of infinitesimals:

    lim_{x→0} (x sin x)/log^2(1 + x) = lim_{x→0} x(x + o(x)) / (x + o(x))^2 = lim_{x→0} [x^2 + o(x^2)] / [x^2 + o(x^2)] = lim_{x→0} x^2/x^2 = 1

N
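Both limits can be double-checked symbolically. A sketch (assuming sympy):

    import sympy as sp

    x = sp.symbols('x')
    print(sp.limit((sp.log(1 + x**3) - 3 * sp.sin(x)**2) / sp.log(1 + x), x, 0))   # 0
    print(sp.limit(x * sp.sin(x) / sp.log(1 + x)**2, x, 0))                        # 1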

23.2 Omnibus proposition for local extremal points

Although for simplicity we have stated Taylor's Theorem for functions defined on intervals (a, b), it holds at any interior point x0 of any set A where f is n times differentiable, provided there is a neighborhood (a, b) ⊆ A of x0 where f is n − 1 times differentiable.

This version of Taylor's approximation allows one to state an "omnibus" proposition for local extremal points which includes and extends both the necessary condition f'(x0) = 0 of Fermat's Theorem and the sufficient condition f'(x0) = 0 and f''(x0) < 0 of Corollary 1014 (see also Corollary 1017-(ii)).

Proposition 1049 Let f : A ⊆ R → R and C ⊆ A. Let x0 be an interior point of C for which there exists a neighborhood Bε(x0) ⊆ C such that f is n − 1 times differentiable there. If f is n times differentiable at x0, with f^(k)(x0) = 0 for every 1 ≤ k ≤ n − 1 and f^(n)(x0) ≠ 0, then:

(i) if n is even and f^(n)(x0) < 0, the point x0 is a strong local maximizer;

(ii) if n is even and f^(n)(x0) > 0, the point x0 is a strong local minimizer;

(iii) if n is odd, x0 is not a local extremal point and, moreover, f is increasing or decreasing at x0 depending on whether f^(n)(x0) > 0 or f^(n)(x0) < 0.

For n = 1, point (iii) is nothing but the fundamental first-order necessary condition f'(x0) = 0. Indeed, for n = 1, point (iii) states that if f'(x0) ≠ 0, then x0 is not a local extremal point (i.e., neither a local maximizer nor a local minimizer). By taking the contrapositive, this amounts to saying that if x0 is a local extremal point, then f'(x0) = 0. Hence, (iii) extends the first-order necessary condition to higher order derivatives.

Point (i) instead, together with the hypothesis f^(k)(x0) = 0 for every 1 ≤ k ≤ n − 1, extends to higher order derivatives the second-order sufficient condition f''(x0) < 0 for strong local maximizers. Indeed, for n = 2 (i) is exactly the condition f''(x0) < 0. Analogously, (ii) extends the analogous condition f''(x0) > 0 for minimizers.2

N.B. In this and in the next section we will focus on the generalization of the sufficiency point (ii) of Corollary 1017. It is possible to generalize in a similar way its necessity point (i), as readers can check. O

Proof (i) Let n be even and let f^(n)(x0) < 0. By Taylor's Theorem, from the hypothesis f^(k)(x0) = 0 for every 1 ≤ k ≤ n − 1 and f^(n)(x0) ≠ 0 it follows that

    f(x0 + h) − f(x0) = [f^(n)(x0)/n!] h^n + o(h^n) = [f^(n)(x0)/n!] h^n [1 + o(h^n)/h^n]

Since lim_{h→0} o(h^n)/h^n = 0, there exists δ > 0 such that |h| < δ implies |o(h^n)/h^n| < 1. Hence,

    h ∈ (−δ, δ)  ⟹  1 + o(h^n)/h^n > 0

Since f^(n)(x0) < 0 and h^n > 0 (n being even), we therefore have

    h ∈ (−δ, δ)  ⟹  [f^(n)(x0)/n!] h^n [1 + o(h^n)/h^n] < 0  ⟹  f(x0 + h) − f(x0) < 0

that is, setting x = x0 + h,

    x ∈ (x0 − δ, x0 + δ), x ≠ x0  ⟹  f(x) < f(x0)

So, x0 is a strong local maximizer. This proves (i). In a similar way we prove (ii). Finally, (iii) can be proved by adapting in a suitable way the proof of Fermat's Theorem.

Example 1050 (i) Consider the function f : R → R given by f(x) = −x^4. We saw in Example 1016 that, for its maximizer x0 = 0, it was not possible to apply the sufficient condition f'(x0) = 0 and f''(x0) < 0. We have, however,

    f'(0) = f''(0) = f'''(0) = 0   and   f^(iv)(0) = −24 < 0

Since n = 4 is even, by Proposition 1049-(i) we conclude that x0 = 0 is a strong local maximizer (actually, it is a global maximizer, but Proposition 1049 alone is not enough to conclude this).

(ii) Consider the function f : R → R given by f(x) = −x^3. At x0 = 0 we have

    f'(0) = f''(0) = 0   and   f'''(0) = −6 < 0

Since n = 3 is odd, by Proposition 1049-(iii) we conclude that x0 = 0 is not a local extremal point (rather, at x0 the function is strictly decreasing).

(iii) The function defined by f(x) = x^6 clearly attains its minimum value at x0 = 0. Indeed, one has f'(0) = f''(0) = ··· = f^(v)(0) = 0 and f^(vi)(0) = 6! > 0.

The function f(x) = x^5 is clearly increasing at x0 = 0. One has f'(0) = f''(0) = f'''(0) = f^(iv)(0) = 0 and f^(v)(0) = 5! = 120 > 0. N


2 Observe that, given what has been proved about the Taylor approximation, the case n = 2 presents an interesting improvement with respect to Corollary 1014: it is required that the function f be twice differentiable on the neighborhood Bε(x0), but f'' is not required to be continuous.

Proposition 1049 is powerful but has important limitations. Like Corollary 1014, it can only treat interior points, and it is useless for local extremal points that are not strong, for which in general the derivatives of any order are zero. The most classic instance of such failure is given by constant functions: their points are all, trivially, both maximizers and minimizers, but Proposition 1049 (like Corollary 1014) is not able to tell us anything about them.

Moreover, to apply Proposition 1049 it is necessary that the function has a sufficient number of derivatives at a stationary point, which may not be the case, as the next example shows.

Example 1051 Consider the function f : R → R defined by

    f(x) = x^2 sin(1/x)   if x ≠ 0
         = 0              if x = 0

It is continuous at the origin x = 0. Indeed, since |sin(1/h)| ≤ 1, by the comparison criterion it follows that

    lim_{h→0} f(0 + h) = lim_{h→0} h^2 sin(1/h) = 0

It is differentiable at the origin because

    lim_{h→0} [f(0 + h) − f(0)]/h = lim_{h→0} [h^2 sin(1/h) − 0]/h = lim_{h→0} h sin(1/h) = 0

The origin is thus a stationary point of f. But the function does not admit a second derivative there. Indeed,

    f'(x) = 2x sin(1/x) − cos(1/x)   if x ≠ 0
          = 0                        if x = 0

and therefore

    lim_{h→0} [f'(0 + h) − f'(0)]/h = lim_{h→0} [2h sin(1/h) − cos(1/h) − 0]/h = lim_{h→0} [2 sin(1/h) − (1/h) cos(1/h)]

does not exist. Therefore, Proposition 1049 cannot be applied, and so it is not able to say anything about the nature of the stationary point x = 0. Nevertheless, the graph of f shows that the origin is not a local extremal point, since f has infinitely many oscillations in any neighborhood of zero. N

Example 1052 The general version of the previous example considers f : R → R defined by

    f(x) = x^n sin(1/x)   if x ≠ 0
         = 0              if x = 0

with n ≥ 1, and shows that f does not have a derivative of order n at the origin (in the case n = 1, this means that at the origin the first derivative does not exist). We leave the analysis of this example to the reader. N

23.3 Omnibus procedure for the search of local extremal points

Thanks to Proposition 1049, we can refine the procedure seen in Section 22.5.2 for the search of local extremal points of a function f : A ⊆ R → R on a set C. To fix ideas, let us study two important special cases.

23.3.1 Twice differentiable functions

Suppose that f is twice differentiable on the interior points of C, that is, on int C. The omnibus procedure consists of the following two stages:

1. Determine the set S of stationary points by solving the first-order condition f'(x) = 0. If S = ∅ the procedure ends (we conclude that, since there are no stationary points, there are no extremal ones); otherwise we move to the next step.

2. Calculate f'' at each of the stationary points x ∈ S: the point x is a strong local maximizer if f''(x) < 0; it is a strong local minimizer if f''(x) > 0; if f''(x) = 0, the procedure is not able to determine the nature of x.

This is the classic procedure to find local extremal points based on the first-order and second-order conditions of Section 22.5.2. The version just presented improves on what we saw there because, using again what we observed in a previous footnote, it requires only that the function has two derivatives on int C, not necessarily continuous. However, we are still left with the other limitations discussed in Section 22.5.2.

23.3.2 Infinitely differentiable functions

Suppose that f is infinitely differentiable on int C. The omnibus procedure consists of the following stages:

1. Determine the set S of stationary points by solving the equation f'(x) = 0. If S = ∅, the procedure ends; otherwise move to the next step.

2. Compute f'' at each of the stationary points x ∈ S: the point x is a strong local maximizer if f''(x) < 0, and a strong local minimizer if f''(x) > 0. Call S(2) the subset of S of the points such that f''(x) = 0. If S(2) = ∅, the procedure ends; otherwise move to the next step.

3. Compute f''' at each point of S(2): if f'''(x) ≠ 0, the point x is not an extremal one. Call S(3) the subset of S(2) in which f'''(x) = 0. If S(3) = ∅, the procedure ends; otherwise move to the next step.

4. Compute f^(iv) at each point of S(3): the point x is a strong local maximizer if f^(iv)(x) < 0, and a strong local minimizer if f^(iv)(x) > 0. Call S(4) the subset of S(3) in which f^(iv)(x) = 0. If S(4) = ∅, the procedure ends; otherwise move to the next step.

5. Iterate the procedure until S(n) = ∅.


The procedure thus ends if there exists n such that S(n) = ∅. Otherwise, the procedure iterates ad libitum (or ad nauseam).

Example 1053 Consider again the function f(x) = −x^4, with C = R. We saw in Example 1016 that for its maximizer x0 = 0 it was not possible to apply the sufficient condition f'(x0) = 0 and f''(x0) < 0. We have, however,

    f'(0) = f''(0) = f'''(0) = 0   and   f^(iv)(0) < 0

so that

    S = S(2) = S(3) = {0}   and   S(4) = ∅

Stage 1 identifies the set S = {0}, about which stage 2 however has nothing to say since f''(0) = 0. Stage 3 also does not add any extra information since f'''(0) = 0. Stage 4 instead is conclusive: since f^(iv)(0) < 0, we can assert that x = 0 is a strong local maximizer (actually, it is a global maximizer, but this procedure does not allow us to say this). N

Naturally, the procedure is of practical interest when it ends after a few stages.
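The procedure is mechanical enough to be sketched in code. The following hypothetical helper (not from the text; it assumes sympy) classifies a single stationary point by finding the first nonvanishing derivative, in the spirit of Proposition 1049.

    import sympy as sp

    def classify(f, x, x0, max_order=12):
        d = f
        for n in range(1, max_order + 1):
            d = sp.diff(d, x)
            v = d.subs(x, x0)
            if v != 0:
                if n % 2 == 1:                       # odd order: not extremal
                    return "not a local extremal point"
                return "strong local maximizer" if v < 0 else "strong local minimizer"
        return "inconclusive (procedure did not end)"

    x = sp.symbols('x')
    print(classify(-x**4, x, 0))   # strong local maximizer (Example 1053)
    print(classify(x**6, x, 0))    # strong local minimizer
    print(classify(-x**3, x, 0))   # not a local extremal point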

23.4 Taylor expansion: functions of several variables

In this section we study a version of the Taylor expansion for functions of several variables. To do this, it is necessary to introduce quadratic forms.

23.4.1 Quadratic forms

A function f : R^n → R of the form

    f(x1, ..., xn) = k · x1^{α1} x2^{α2} ··· xn^{αn}

with k ∈ R and αi ∈ N, is called a monomial of degree m if Σ_{i=1}^{n} αi = m. For example, f(x1, x2) = 2x1x2 is a monomial of second degree, while f(x1, x2, x3) = 5x1 x2^3 x3^4 is a monomial of degree eight.

Definition 1054 A function f : R^n → R is a quadratic form if it is a sum of monomials of second degree.

For example, f(x1, x2, x3) = 3x1x3 − x2x3 is a quadratic form because it is the sum of the second-degree monomials 3x1x3 and −x2x3. It is easy to see that the following functions are quadratic forms:

    f(x) = x^2
    f(x1, x2) = x1^2 + x2^2 − 4x1x2
    f(x1, x2, x3) = x1x3 + 5x2x3 + x3^2
    f(x1, x2, x3, x4) = x1x4 − 2x1^2 + 3x2x3

There is a one-to-one correspondence between quadratic forms and symmetric matrices, as the next result shows (we omit the proof).

Proposition 1055 There is a one-to-one correspondence between quadratic forms f : R^n → R and symmetric matrices A of order n, established by:3

    f(x) = x · Ax = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj   ∀x ∈ R^n      (23.18)

In other words, given a symmetric matrix A there exists a unique quadratic form f : R^n → R for which (23.18) holds. Vice versa, given a quadratic form f : R^n → R there exists a unique symmetric matrix A for which (23.18) holds.

The matrix A = (aij) is called the matrix associated to the quadratic form f. We can write (23.18) in extended form as

    f(x) = a11 x1^2 + a22 x2^2 + a33 x3^2 + ··· + ann xn^2
           + 2a12 x1x2 + 2a13 x1x3 + ··· + 2a1n x1xn
           + 2a23 x2x3 + ··· + 2a2n x2xn + ··· + 2a_{n−1,n} x_{n−1}xn

The coefficients of the squares x1^2, x2^2, ..., xn^2 are therefore the elements (a11, a22, ..., ann) of the diagonal of A, while for every i ≠ j the coefficient of the monomial xixj is 2aij. It is therefore very simple to move from the matrix to the quadratic form and vice versa. Let us give some examples.
Example 1056 The matrix associated to the quadratic form f(x1, x2, x3) = 3x1x3 − x2x3 is given by

        [ 0     0     3/2 ]
    A = [ 0     0    −1/2 ]
        [ 3/2  −1/2    0  ]

Indeed, for every x ∈ R^3 we have:

    x · Ax = (x1, x2, x3) · ((3/2)x3, −(1/2)x3, (3/2)x1 − (1/2)x2)
           = (3/2)x1x3 − (1/2)x2x3 + (3/2)x1x3 − (1/2)x2x3 = 3x1x3 − x2x3

Note that also the matrices

        [ 0  0   3 ]              [ 0   0  0 ]
    A = [ 0  0  −1 ]   and   A =  [ 0   0  0 ]      (23.19)
        [ 0  0   0 ]              [ 3  −1  0 ]

are such that f(x) = x · Ax, although they are not symmetric. What we lose without symmetry is the one-to-one correspondence between quadratic forms and matrices. Indeed, while given the quadratic form f(x1, x2, x3) = 3x1x3 − x2x3 there exists a unique symmetric matrix for which (23.18) holds, this is no longer true if we do not require the symmetry of the matrix, as the two matrices in (23.19) show: (23.18) holds for both of them. N
3 To ease notation we write x · Ax instead of the more precise x A xᵀ (cf. the discussion on vector notation in Section 13.2.4).

Example 1057 As to the quadratic form f(x1, x2) = x1^2 + x2^2 − 4x1x2, we have

    A = [  1  −2 ]
        [ −2   1 ]

Indeed, for every x ∈ R^2 we have

    x · Ax = (x1, x2) · (x1 − 2x2, −2x1 + x2) = x1^2 − 2x1x2 − 2x1x2 + x2^2 = x1^2 + x2^2 − 4x1x2

N
Example 1058 Let f : R^n → R be defined by f(x) = ||x||^2 = Σ_{i=1}^{n} xi^2. The symmetric matrix associated to this quadratic form is the identity matrix I. Indeed, x · Ix = x · x = Σ_{i=1}^{n} xi^2. More generally, let f(x) = Σ_{i=1}^{n} λi xi^2 with λi ∈ R for every i = 1, ..., n. It is easy to see that the matrix associated to f is the diagonal matrix

    [ λ1  0   0   ···  0  ]
    [ 0   λ2  0   ···  0  ]
    [ 0   0   λ3  ···  0  ]
    [ ···             ··· ]
    [ 0   0   0   ···  λn ]

N
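The correspondence (23.18) is easy to verify numerically. A sketch (assuming numpy), with the symmetric matrix of Example 1056:

    import numpy as np

    A = np.array([[0.0, 0.0, 1.5],
                  [0.0, 0.0, -0.5],
                  [1.5, -0.5, 0.0]])

    def quadratic_form(A, x):
        return x @ A @ x          # x . Ax

    x = np.array([1.0, 2.0, 3.0])
    print(quadratic_form(A, x))              # 3.0
    print(3 * x[0] * x[2] - x[1] * x[2])     # 3*x1*x3 - x2*x3 = 3.0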

Observe that if f : R^n → R is a quadratic form, then f(0) = 0. According to the sign of f, it is possible to classify quadratic forms as follows:

Definition 1059 A quadratic form f : R^n → R is said to be:

(i) positive (negative) semi-definite if f(x) ≥ 0 (≤ 0) for every x ∈ R^n;

(ii) positive (negative) definite if f(x) > 0 (< 0) for every 0 ≠ x ∈ R^n;

(iii) indefinite if there exist x, x' ∈ R^n such that f(x) < 0 < f(x').

In view of Proposition 1055, we have a parallel classification for symmetric matrices: a matrix is said to be positive semi-definite if the corresponding quadratic form is so, and similarly for the other cases.

In some cases it is easy to check the sign of a quadratic form. For example, it is immediate to see that the quadratic form f(x) = Σ_{i=1}^{n} λi xi^2 is positive semi-definite if and only if λi ≥ 0 for every i, while it is positive definite if and only if λi > 0 for every i. In general, however, it is not simple to determine directly the sign of a quadratic form and, therefore, some useful criteria have been elaborated. Among them, we consider the classic Sylvester-Jacobi criterion.

Given a symmetric matrix A, consider the following square submatrices A1, A2, ..., An:

    A1 = [a11] ,   A2 = [ a11  a12 ] ,   A3 = [ a11  a12  a13 ] ,   ... ,   An = A
                        [ a21  a22 ]          [ a21  a22  a23 ]
                                              [ a31  a32  a33 ]

and their determinants det A1, det A2, det A3, ..., det An = det A.4

Proposition 1060 (Sylvester-Jacobi criterion) A symmetric matrix A is:

(i) positive definite if and only if det Ai > 0 for every i = 1, ..., n;

(ii) negative definite if and only if the det Ai alternate in sign starting with a negative one (that is, det A1 < 0, det A2 > 0, det A3 < 0, and so on);

(iii) indefinite if the determinants det Ai are not zero and the sequence of their signs respects neither (i) nor (ii).

Example 1061 Let f(x1, x2, x3) = x1^2 + 2x2^2 + x3^2 + (x1 + x3)x2. The matrix associated to f is:

        [ 1    1/2  0   ]
    A = [ 1/2  2    1/2 ]
        [ 0    1/2  1   ]

Indeed, we have

    x · Ax = (x1, x2, x3) · (x1 + (1/2)x2, (1/2)x1 + 2x2 + (1/2)x3, (1/2)x2 + x3)
           = x1^2 + 2x2^2 + x3^2 + (x1 + x3)x2

Let us determine the sign of the quadratic form with the Sylvester-Jacobi criterion. We have:

    det A1 = 1 > 0
    det A2 = det [ 1    1/2 ] = 7/4 > 0
                 [ 1/2  2   ]
    det A3 = det A = 3/2 > 0

Hence, by the Sylvester-Jacobi criterion our quadratic form is positive definite. N
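A sketch (assuming numpy) of the Sylvester-Jacobi test via the North-West principal minors, applied to the matrix of Example 1061:

    import numpy as np

    def leading_minors(A):
        # determinants of the North-West square submatrices A1, ..., An
        return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

    A = np.array([[1.0, 0.5, 0.0],
                  [0.5, 2.0, 0.5],
                  [0.0, 0.5, 1.0]])
    minors = leading_minors(A)
    print(minors)                           # [1.0, 1.75, 1.5]
    print(all(m > 0 for m in minors))       # True: positive definite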

There exist versions of the Sylvester-Jacobi criterion to determine whether a symmetric matrix is positive semi-definite, negative semi-definite, or instead indefinite. We omit the details and move, instead, to the Taylor expansion.

4 These are exactly the North-West principal minors of the matrix A introduced in Section 13.6.7, considered from the smallest to the largest one.

23.4.2 Taylor expansion

By Theorem 954, a function f : U → R defined on an open set U in R^n with continuous partial derivatives is differentiable at every x ∈ U, that is, it can be linearly approximated as

    f(x + h) = f(x) + df(x)(h) + o(||h||) = f(x) + ∇f(x) · h + o(||h||)      (23.20)

for every h ∈ R^n such that x + h ∈ U. As already seen in Section 21.2, if, with a small change of notation, we denote by x0 the point at which f is differentiable and we set h = x − x0, this approximation assumes the following equivalent, but more expressive, form:

    f(x) = f(x0) + df(x0)(x − x0) + o(||x − x0||)      (23.21)
         = f(x0) + ∇f(x0) · (x − x0) + o(||x − x0||)

for every x ∈ U.

We can now present the Taylor expansion for functions of several variables. As in the scalar case, in the general multivariable case the Taylor expansion refines the first-order approximation (23.21). In stating it, we limit ourselves to a second-order approximation, which suffices for our purposes.5

Theorem 1062 Let f : U → R be twice continuously differentiable. Then, at each x0 ∈ U we have

    f(x) = f(x0) + ∇f(x0) · (x − x0) + (1/2)(x − x0) · ∇²f(x0)(x − x0) + o(||x − x0||^2)      (23.22)

for every x ∈ U.

Expression (23.22) is called the quadratic (or second-order) Taylor expansion (or formula). The polynomial in the variable x

    f(x0) + ∇f(x0) · (x − x0) + (1/2)(x − x0) · ∇²f(x0)(x − x0)

is called the Taylor polynomial of second degree at the point x0. The second-degree term is a quadratic form. Its associated matrix, the Hessian ∇²f(x0), is symmetric by Schwarz's Theorem.

Naturally, if terminated at the first order, the Taylor expansion reduces to (23.21). Moreover, observe that in the scalar case the Taylor polynomial assumes the well-known form:

    f(x0) + f'(x0)(x − x0) + (1/2) f''(x0)(x − x0)^2

Indeed, in this case we have ∇²f(x0) = f''(x0), and therefore

    (x − x0) · ∇²f(x0)(x − x0) = f''(x0)(x − x0)^2      (23.23)

As in the scalar case, here too we have a trade-off between the simplicity of the approximation and its accuracy. Indeed, the first-order approximation (23.21) has the advantage of simplicity compared to the quadratic one: we approximate with a linear function rather than with a second-degree polynomial, but to the detriment of the degree of accuracy of the approximation, given by o(||x − x0||) instead of the better o(||x − x0||^2).

Also in the multivariable case, the choice of the order at which to terminate the Taylor expansion therefore depends on the particular use we are interested in, and on which aspect of the approximation is more important, simplicity or accuracy.

5 In the rest of this section U is an open convex set. We omit the proof of this theorem and refer readers to more advanced courses for the study of approximations of higher order.
Example 1063 Let f : R^2 → R be given by f(x1, x2) = 3x1^2 e^{x2^2}. We have:

    ∇f(x) = (6x1 e^{x2^2}, 6x1^2 x2 e^{x2^2})

and

    ∇²f(x) = [ 6e^{x2^2}          12x1x2 e^{x2^2}            ]
             [ 12x1x2 e^{x2^2}    6x1^2 e^{x2^2}(1 + 2x2^2)  ]

By Theorem 1062, the Taylor expansion at x0 = (1, 1) is

    f(x) = f(1, 1) + ∇f(1, 1) · (x1 − 1, x2 − 1)
           + (1/2)(x1 − 1, x2 − 1) · ∇²f(1, 1)(x1 − 1, x2 − 1) + o(||(x1 − 1, x2 − 1)||^2)

         = 3e + (6e, 6e) · (x1 − 1, x2 − 1)
           + (1/2)(x1 − 1, x2 − 1) · [ 6e   12e ] (x1 − 1, x2 − 1) + o((x1 − 1)^2 + (x2 − 1)^2)
                                     [ 12e  18e ]

         = 3e (x1^2 − 4x1 + 5 − 8x2 + 4x1x2 + 3x2^2) + o((x1 − 1)^2 + (x2 − 1)^2)

Hence, f is approximated at the point (1, 1) by the second-degree Taylor polynomial

    3e (x1^2 − 4x1 + 5 − 8x2 + 4x1x2 + 3x2^2)

with level of accuracy given by o((x1 − 1)^2 + (x2 − 1)^2). N
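A numerical sketch (not from the text) of Example 1063: near x0 = (1, 1) the second-degree Taylor polynomial tracks f with an error that is o(||x − x0||^2), so the ratio error/||x − x0||^2 vanishes.

    import math

    def f(x1, x2):
        return 3 * x1**2 * math.exp(x2**2)

    def T2(x1, x2):
        # second-degree Taylor polynomial of Example 1063 at (1, 1)
        return 3 * math.e * (x1**2 - 4 * x1 + 5 - 8 * x2 + 4 * x1 * x2 + 3 * x2**2)

    for t in (1e-1, 1e-2, 1e-3):
        x1, x2 = 1 + t, 1 + t                # here ||x - x0||^2 = 2 t^2
        err = abs(f(x1, x2) - T2(x1, x2))
        print(f"t={t:.0e}  error/||x-x0||^2 = {err / (2 * t**2):.6f}")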

We close with a first-order approximation with Lagrange remainder that sharpens the approximation (23.20) with Peano remainder.6

Theorem 1064 Let f : U → R be twice continuously differentiable. If x0 ∈ U, then for every 0 ≠ h ∈ R^n such that x0 + h ∈ U there exists 0 < ϑ < 1 with

    f(x0 + h) = f(x0) + ∇f(x0) · h + (1/2) h · ∇²f(x0 + ϑh) h      (23.24)

Note that the same differentiability assumption that permitted the quadratic approximation (23.22) with a Peano remainder only allows for a first-order approximation with the sharper Lagrange remainder. As usual, no free meals.

6 Higher-order approximations with Lagrange remainders are notationally cumbersome, and we leave them to more advanced courses.

23.4.3 Second-order conditions

Using the Taylor expansion (23.22) we can state second-order conditions for local extremal points. Indeed, this expansion allows us to approximate a function f : U → R locally at a point x0 ∈ U by a second-degree polynomial in the following way:

    f(x) = f(x0) + ∇f(x0) · (x − x0) + (1/2)(x − x0) · ∇²f(x0)(x − x0) + o(||x − x0||^2)

If x̂ is a local extremal point (either a maximizer or a minimizer), by Fermat's Theorem we have ∇f(x̂) = 0 and therefore the approximation becomes

    f(x) = f(x̂) + (1/2)(x − x̂) · ∇²f(x̂)(x − x̂) + o(||x − x̂||^2)      (23.25)

that is,

    f(x̂ + h) = f(x̂) + (1/2) h · ∇²f(x̂) h + o(||h||^2)

Based on this simple observation, we obtain the following second-order conditions, which rest on the sign of the quadratic form h · ∇²f(x̂) h.

Theorem 1065 Let f : U → R be twice continuously differentiable and let x̂ ∈ U be a stationary point.7

(i) If x̂ is a local maximizer (minimizer) on U, the quadratic form h · ∇²f(x̂) h is negative (positive) semi-definite.

(ii) If the quadratic form h · ∇²f(x̂) h is negative (positive) definite, then x̂ is a strong local maximizer (minimizer).

Note that from point (i) it follows that if the quadratic form h · ∇²f(x̂) h is indefinite, the point x̂ is neither a local maximizer nor a local minimizer on U. This theorem is the multivariable analog of Corollary 1017. Indeed, in the proof we will use that corollary, since we will be able to reduce the problem from functions of several variables to functions of a single variable.

Proof We will prove only point (i), leaving point (ii) to the reader. So, let x̂ be a local maximizer on U. We want to prove that the quadratic form h · ∇²f(x̂) h is negative semi-definite. For simplicity, let us suppose that x̂ is the origin 0. First of all, we prove that v · ∇²f(0) v ≤ 0 for every unit vector v of R^n; we will then prove that h · ∇²f(0) h ≤ 0 for every vector h ∈ R^n.

Since 0 is a local maximizer and U is open, there exists a small enough neighborhood Bε(0) such that Bε(0) ⊆ U and f(0) ≥ f(x) for every x ∈ Bε(0). Note that every vector x ∈ Bε(0) can be written as x = tv, where v is a unit vector of R^n (i.e., ||v|| = 1) and t ∈ R.8

7 For simplicity we continue to consider functions defined on open sets. We leave to readers the routine extension of the results to functions f : A ⊆ R^n → R and to interior points x̂ that belong to a choice set C ⊆ A.

8 Intuitively, v represents the direction of x and t its norm (indeed, ||x|| = |t|).

Clearly, tv ∈ Bε(0) if and only if |t| < ε. Fix an arbitrary unit vector v in R^n, and define the function φv : (−ε, ε) → R by φv(t) = f(tv). Since tv ∈ Bε(0) for |t| < ε, we have

    φv(0) = f(0) ≥ f(tv) = φv(t)

for every t ∈ (−ε, ε). It follows that t = 0 is a local maximizer for the function φv and hence, φv being differentiable and t = 0 an interior point of the domain of φv, by applying Corollary 1017 we get φv'(0) = 0 and φv''(0) ≤ 0. By applying the chain rule to the function

    φv(t) = f(tv1, tv2, ..., tvn)

we get φv'(t) = ∇f(tv) · v and φv''(t) = v · ∇²f(tv) v. The first-order and second-order conditions become

    φv'(0) = ∇f(0) · v = 0   and   φv''(0) = v · ∇²f(0) v ≤ 0

Since the unit vector v of R^n is arbitrary, this last inequality holds for every unit vector of R^n.

Now, let h ∈ R^n. In much the same way as before, observe that h = t_h v for some unit vector v ∈ R^n and t_h ∈ R such that |t_h| = ||h||.

[Figure: the vector h = t_h v obtained by scaling the unit vector v.]

Then

    h · ∇²f(0) h = t_h v · ∇²f(0) t_h v = t_h^2 (v · ∇²f(0) v)

Since v · ∇²f(0) v ≤ 0, we also have h · ∇²f(0) h ≤ 0. This holds for every h ∈ R^n, so the quadratic form h · ∇²f(0) h is negative semi-definite.

In the scalar case we get back the usual second-order conditions, based on the sign of the second derivative f''(x̂). Indeed, we already observed in (23.23) that in the scalar case one has

    x · ∇²f(x̂) x = f''(x̂) x^2

Thus, in this case the sign of the quadratic form depends only on the sign of f''(x̂): it is negative (positive) definite if and only if f''(x̂) < 0 (> 0), and it is negative (positive) semi-definite if and only if f''(x̂) ≤ 0 (≥ 0).

Naturally, as in the scalar case, also in this general multivariable case condition (i) is only necessary for x̂ to be a local maximizer.

Example 1066 Consider the function f(x1, x2) = x1^2 x2. At x̂ = 0 we have ∇²f(0) = O. The corresponding quadratic form x · ∇²f(0) x is identically zero and is therefore both negative and positive semi-definite. Nevertheless, x̂ = 0 is neither a local maximizer nor a local minimizer. Indeed, take a generic neighborhood Bε(0) and let x = (x1, x2) ∈ Bε(x̂) be such that x1 = x2. Let t be such a common value, so that

    (t, t) ∈ Bε(0)  ⟺  ||(t, t)|| = √(t^2 + t^2) = |t|√2 < ε  ⟺  |t| < ε/√2

Since f(t, t) = t^3, for every (t, t) ∈ Bε(0) we have f(t, t) < f(0) if t < 0 and f(0) < f(t, t) if t > 0, which shows that x̂ = 0 is neither a local maximizer nor a local minimizer.9 N

Similarly, condition (ii) is only sufficient for x̂ to be a local maximizer.

Example 1067 For instance, consider the function f(x) = −x1^2 x2^2. The point x̂ = 0 is clearly a (global) maximizer for the function f, but ∇²f(0) = O, so the corresponding quadratic form x · ∇²f(0) x is not negative definite. N

The Hessian ∇²f(x̂) is the symmetric matrix associated to the quadratic form x · ∇²f(x̂) x. We can therefore equivalently state Theorem 1065 in the following way:

a necessary condition for x̂ to be a maximizer (minimizer) is that the Hessian matrix ∇²f(x̂) is negative (positive) semi-definite;

a sufficient condition for x̂ to be a strong maximizer (minimizer) is that the Hessian matrix is negative (positive) definite.

This Hessian version is important operationally because there exist criteria, such as the Sylvester-Jacobi one, to determine whether a symmetric matrix is positive/negative definite or semi-definite. For instance, consider a generic function of two variables f : R^2 → R that is twice continuously differentiable. Let x0 ∈ R^2 be a stationary point, ∇f(x0) = (0, 0), and let

    ∇²f(x0) = [ ∂²f/∂x1²(x0)     ∂²f/∂x1∂x2(x0) ] = [ a  b ]      (23.26)
              [ ∂²f/∂x2∂x1(x0)   ∂²f/∂x2²(x0)   ]   [ c  d ]

be the Hessian matrix computed at the point x0 (with b = c by Schwarz's Theorem). Since the gradient at x0 is zero, the point is a candidate to be a maximizer or minimizer of f. To determine its exact nature, it is necessary to analyze the Hessian matrix at the point. By Theorem 1065, x0 is a maximizer if the Hessian is negative definite, a minimizer if it is positive definite, and neither a maximizer nor a minimizer if it is indefinite. If the Hessian is only semi-definite, positive or negative, it is not possible to draw conclusions on the nature of x0. Applying the Sylvester-Jacobi criterion to the matrix (23.26), we have that:

9 In an alternative way, it is sufficient to observe that at each point of the I or II quadrant, except the axes, we have f(x1, x2) > 0, and that at each point of the III or IV quadrant, except the axes, we have f(x1, x2) < 0. Every neighborhood of the origin necessarily contains both points of the I and II quadrants (except the axes), for which we have f(x1, x2) > 0 = f(0), and points of the III and IV quadrants (except the axes), for which we have f(x1, x2) < 0 = f(0). Hence 0 is neither a local maximizer nor a local minimizer.

(i) if a > 0 and ad − bc > 0, the Hessian is positive definite, so x0 is a strong local minimizer;

(ii) if a < 0 and ad − bc > 0, the Hessian is negative definite, so x0 is a strong local maximizer;

(iii) if ad − bc < 0, the Hessian is indefinite, and therefore x0 is neither a local maximizer nor a local minimizer.

In all the other cases it is not possible to say anything about the nature of the point x0.

Example 1068 Let f : R^2 → R be given by f(x1, x2) = 3x1^2 + x2^2 + 6x1. We have ∇f(x) = (6x1 + 6, 2x2) and

    ∇²f(x) = [ 6  0 ]
             [ 0  2 ]

It is easy to see that the unique point where the gradient vanishes is x0 = (−1, 0) ∈ R^2, that is, ∇f(−1, 0) = (0, 0). Moreover, in view of the previous discussion, since a > 0 and ad − bc > 0, the point x0 = (−1, 0) is a strong local minimizer of f. N
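The two-variable test above is a one-liner in code. A sketch for the Hessian of Example 1068 (with b = c by symmetry):

    def classify_2x2(a, b, d):
        # Hessian [[a, b], [b, d]] at a stationary point
        det = a * d - b * b
        if det > 0:
            return "strong local minimizer" if a > 0 else "strong local maximizer"
        if det < 0:
            return "neither (indefinite Hessian)"
        return "inconclusive (semi-definite Hessian)"

    print(classify_2x2(6.0, 0.0, 2.0))   # strong local minimizer, as in Example 1068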

Example 1069 Let f : R^3 → R be given by f(x1, x2, x3) = x1^3 + x2^3 + 3x3^2 − 2x3 + x1^2 x2^2. We have

    ∇f(x) = (3x1^2 + 2x1x2^2, 3x2^2 + 2x1^2 x2, 6x3 − 2)

and

    ∇²f(x) = [ 6x1 + 2x2^2   4x1x2         0 ]
             [ 4x1x2         6x2 + 2x1^2   0 ]
             [ 0             0             6 ]

The stationary points are x' = (−3/2, −3/2, 1/3) and x'' = (0, 0, 1/3). At x', we have

    ∇²f(x') = [ −9/2   9     0 ]
              [ 9     −9/2   0 ]
              [ 0      0     6 ]

and therefore

    det [−9/2] < 0 ,   det [ −9/2   9   ] < 0 ,   det ∇²f(x') < 0
                           [ 9     −9/2 ]

By the Sylvester-Jacobi criterion the Hessian matrix is indefinite. By Theorem 1065, the point x' = (−3/2, −3/2, 1/3) is neither a local minimizer nor a local maximizer. For the point x'' = (0, 0, 1/3) we have

    ∇²f(x'') = [ 0  0  0 ]
               [ 0  0  0 ]
               [ 0  0  6 ]

which is positive semi-definite since x · ∇²f(x'') x = 6x3^2 (note that it is not positive definite: for example, we have (1, 1, 0) · ∇²f(x'')(1, 1, 0) = 0). N

23.4.4 Multivariable unconstrained optima

Lastly, we can generalize to the multivariable case the partial procedure for the solution of unconstrained optimization problems discussed in Section 22.5.3. Consider the unconstrained optimization problem

    max_x f(x)   sub x ∈ C

where C is an open convex set of R^n. Assume that f ∈ C²(C). By Theorem 1065-(i), the procedure of Section 22.5.3 assumes the following form:

1. Determine the set S ⊆ C of the stationary interior points of f by solving the first-order condition ∇f(x) = 0 (Section 22.1.3).

2. Calculate the Hessian matrix ∇²f at each of the stationary points x ∈ S and determine the set

    S2 = {x ∈ S : ∇²f(x) is negative semi-definite}

3. Determine the set

    S3 = {x ∈ S2 : f(x) ≥ f(x') for every x' ∈ S2}

of the points of C that are candidate solutions of the optimization problem.

Here too the procedure is not conclusive, because nothing ensures the existence of a solution. Later in the book we will discuss this crucial problem by combining, in the method of elimination, existence theorems with differential methods.

Example 1070 Let f : R^2 → R be given by f(x1, x2) = −2x1^2 − x2^2 + 3(x1 + x2) − x1x2 + 3 and consider the unconstrained optimization problem

    max_x f(x)   sub x ∈ R^2_{++}

Here C = R^2_{++} is the first quadrant of the plane without the axes (hence an open set). We have

    ∇f(x) = (−4x1 + 3 − x2, −2x2 + 3 − x1)

Therefore, from the first-order condition ∇f(x) = 0 it follows that the unique stationary point is x = (3/7, 9/7), that is, S = {(3/7, 9/7)}. We have

    ∇²f(x) = [ −4  −1 ]
             [ −1  −2 ]

By the Sylvester-Jacobi criterion, the Hessian matrix ∇²f(x) is negative definite.10 Hence, S2 = {(3/7, 9/7)}. Since S2 is a singleton, we trivially have S3 = S2. In conclusion, the point x = (3/7, 9/7) is the unique candidate solution of the unconstrained optimization problem. One can show that this point is indeed the solution of the problem. For the moment we can only say that, by Theorem 1065-(ii), it is a strong local maximizer. N

10 Since ∇²f(x) is negative definite for all x ∈ R^2_{++}, this also proves that f is concave.

23.5 Coda: asymptotic expansions

23.5.1 Asymptotic scales and expansions

Up to now we have considered polynomial expansions. Although they are the most relevant ones, it may be useful to mention other expansions, so as to better contextualize the polynomial case itself. Their study was pioneered by Henri Poincaré in 1886.

Let us take any open interval (a, b), bounded or unbounded; in other words, a, b ∈ R̄.11 A family Φ = {φn}_{n=0}^{∞} of scalar functions defined on (a, b) is said to be an asymptotic scale at x0 ∈ [a, b] if,12 for every n ≥ 0, we have

    φ_{n+1} = o(φn)   as x → x0

Example 1071 (i) The power functions φn(x) = (x − x0)^n are an asymptotic scale at x0 ∈ (a, b). (ii) The negative power functions φn(x) = x^{−n} are an asymptotic scale at x0 = +∞.13 More generally, the powers φn(x) = x^{−αn} form an asymptotic scale at x0 = +∞ as long as α_{n+1} > αn for every n ≥ 1. (iii) The trigonometric functions φn(x) = sin^n(x − x0) form an asymptotic scale at x0 ∈ (a, b). (iv) The logarithms φn(x) = log^{−n} x form an asymptotic scale at x0 = +∞. N

Let us now give a general definition of expansion.

Definition 1072 A function f : (a, b) → R admits an expansion of order n with respect to the scale Φ at x0 ∈ [a, b] if there exist scalars {αk}_{k=0}^{n} such that

    f(x) = Σ_{k=0}^{n} αk φk(x) + o(φn)   as x → x0      (23.27)

for every x ∈ (a, b).

Polynomial expansions (23.2), i.e.,

    f(x) = Σ_{k=0}^{n} αk (x − x0)^k + o((x − x0)^n)   as x → x0

are a special case of (23.27) in which the asymptotic scale is given by the power functions. Contrary to the polynomial case, where x0 had to be a scalar, now we can take x0 = ±∞. Indeed, general expansions are relevant because, relative to the special case of polynomial expansions, they also allow us to approximate a function for large values of the argument, that is, asymptotically.

In symbols, condition (23.27) can be expressed as

    f(x) ~ Σ_{k=0}^{n} αk φk(x)   as x → x0

11 Throughout this section we will maintain this assumption.

12 The expression x0 ∈ [a, b] entails that x0 is an accumulation point of (a, b). For example, if (a, b) is the whole real line, the point x0 belongs to the extended real line; in symbols, if (a, b) = (−∞, +∞) we have x0 ∈ [−∞, +∞].

13 When, as in this example, we have x0 = +∞, the interval (a, b) is understood to be unbounded above, b = +∞ (the example of the negative power function scale was given by Poincaré himself).

For example, for n = 2 we get the quadratic approximation:

    f(x) ~ α0 φ0(x) + α1 φ1(x) + α2 φ2(x)   as x → x0

By using the scale of power functions, we end up with the well-known quadratic approximation

    f(x) ~ α0 + α1 x + α2 x^2   as x → 0

However, if we use the scale of negative power functions, we get:

    f(x) ~ α0 + α1/x + α2/x^2   as x → +∞

In such a case, x0 being +∞, we are dealing with a quadratic asymptotic approximation.

Example 1073 It holds that:

    1/(x − 1) ~ 1/x + 1/x^2   as x → +∞      (23.28)

Indeed,

    1/(x − 1) − 1/x − 1/x^2 = 1/(x^2(x − 1)) = o(1/x^2)   as x → +∞

Approximation (23.28) is asymptotic. For values close to 0, we consider instead the quadratic polynomial approximation

    1/(x − 1) ~ −1 − x − x^2   as x → 0

N

The key uniqueness property of polynomial expansions (Lemma 1039) still holds in the general case.

Lemma 1074 A function f : (a, b) → R has at most one expansion of order n with respect to the scale Φ at every point x0 ∈ [a, b].

Proof Consider the expansion Σ_{k=0}^{n} αk φk(x) + o(φn) at x0 ∈ [a, b]. We have

    lim_{x→x0} f(x)/φ0(x) = lim_{x→x0} [Σ_{k=0}^{n} αk φk(x) + o(φn)]/φ0(x) = α0      (23.29)

    lim_{x→x0} [f(x) − α0 φ0(x)]/φ1(x) = lim_{x→x0} [Σ_{k=1}^{n} αk φk(x) + o(φn)]/φ1(x) = α1      (23.30)

    ...

    lim_{x→x0} [f(x) − Σ_{k=0}^{n−1} αk φk(x)]/φn(x) = αn      (23.31)

Suppose that, for every x ∈ (a, b), there are two different expansions

    Σ_{k=0}^{n} αk φk(x) + o(φn) = Σ_{k=0}^{n} βk φk(x) + o(φn)      (23.32)

Equalities (23.29)-(23.31) must hold for both expansions. Hence, by (23.29) we have α0 = β0. Iterating this procedure, from equality (23.30) we get α1 = β1, and so on until αn = βn.

Limits (23.29)-(23.31) are crucial: it is easy to prove that the expansion (23.27) holds if and only if these limits exist (and are finite).14 Such limits, in turn, determine the expansion's coefficients {αk}_{k=0}^{n}.

Example 1075 Let us determine the quadratic asymptotic approximation, with respect to the scale of negative power functions, of the function f : (−1, +∞) → R defined by f(x) = 1/(1 + x). Thanks to equalities (23.29)-(23.31), with x0 = +∞ we have

    α0 = lim_{x→+∞} f(x)/φ0(x) = lim_{x→+∞} [1/(1 + x)]/1 = lim_{x→+∞} 1/(1 + x) = 0

    α1 = lim_{x→+∞} [f(x) − α0 φ0(x)]/φ1(x) = lim_{x→+∞} [1/(1 + x)]/(1/x) = lim_{x→+∞} x/(1 + x) = 1

    α2 = lim_{x→+∞} [f(x) − α0 φ0(x) − α1 φ1(x)]/φ2(x) = lim_{x→+∞} [1/(1 + x) − 1/x]/(1/x^2) = lim_{x→+∞} −x/(1 + x) = −1

Hence, the desired approximation is

    1/(1 + x) ~ 1/x − 1/x^2   as x → +∞

By the previous lemma, it is the only quadratic asymptotic approximation with respect to the scale of negative power functions. N
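The coefficient formulas (23.29)-(23.31) can also be evaluated symbolically. A sketch (assuming sympy) that recovers the coefficients of Example 1075:

    import sympy as sp

    x = sp.symbols('x', positive=True)
    f = 1 / (1 + x)
    a0 = sp.limit(f, x, sp.oo)                            # 0
    a1 = sp.limit((f - a0) * x, x, sp.oo)                 # 1
    a2 = sp.limit((f - a0 - a1 / x) * x**2, x, sp.oo)     # -1
    print(a0, a1, a2)   # so f(x) ~ 1/x - 1/x^2 as x -> +oo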

If we change the scale, the expansion changes as well. For example, approximation (23.28) is a quadratic approximation of 1/(x − 1) with respect to the scale of negative power functions, but by changing scale one obtains a different quadratic approximation. Indeed, if, for example, at x0 = +∞ we consider the asymptotic scale φn(x) = (x + 1)/x^{2n}, we obtain the quadratic asymptotic approximation

    1/(x − 1) ~ (x + 1)/x^2 + (x + 1)/x^4   as x → +∞

In fact,

    1/(x − 1) − (x + 1)/x^2 − (x + 1)/x^4 = 1/((x − 1)x^4) = o((x + 1)/x^4)   as x → +∞

In conclusion, different asymptotic scales lead to different, although unique, approximations (as long as they exist). But different functions can have the same expansion, as the next example shows.

14 The "only if" part is shown in the previous proof; the reader can verify the converse.

Example 1076 Both

    1/(1 + x) ~ 1/x − 1/x^2   as x → +∞

and

    (1 + e^{−x})/(1 + x) ~ 1/x − 1/x^2   as x → +∞

hold. Indeed,

    (1 + e^{−x})/(1 + x) − 1/x + 1/x^2 = (1 + x^2 e^{−x})/((1 + x) x^2) = o(1/x^2)   as x → +∞

Therefore 1/x − 1/x^2 is the quadratic asymptotic approximation of both 1/(1 + x) and (1 + e^{−x})/(1 + x). N

The reader might recall that we considered the two following formulations of the De Moivre-Stirling formula:

    log n! = n log n − n + o(n)
           = n log n − n + (1/2) log n + log √(2π) + o(1)

the first one being slightly less precise but easier to derive (Section 8.14.7). Although they deal with discrete variables, these formulas are, in spirit, two expansions for n → +∞ of the function log n!. In particular, the former is a quadratic asymptotic approximation with respect to a scale whose first two terms are {n log n, n}, for example {n log n, n, 1, 1/n, 1/n^2, ...}; the latter is an expansion of order 4 with respect to a scale whose first four terms are {n log n, n, log n, 1}, for example {n log n, n, log n, 1, 1/n, ...}.

To incarnate this spirit, consider the famous gamma function Γ : (0, +∞) → R defined by

    Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt

where the integral is an improper one (Section 35.11.1). We already know that this function is log-convex (Example 772). Moreover, it satisfies the following formula.

Lemma 1077 Γ(x + 1) = x Γ(x) for every x > 0.

Proof By integrating by parts, one obtains that for every 0 < a < b

    ∫_a^b t^x e^{−t} dt = [−e^{−t} t^x]_a^b + x ∫_a^b t^{x−1} e^{−t} dt = −e^{−b} b^x + e^{−a} a^x + x ∫_a^b t^{x−1} e^{−t} dt

If a ↓ 0 we have e^{−a} a^x → 0, and if b ↑ +∞ we have e^{−b} b^x → 0,15 thus implying the desired result.

By iterating, for every n ≥ 1 we thus have:

    Γ(n + 1) = n Γ(n) = n(n − 1) Γ(n − 1) = ··· = n! Γ(1) = n!

15 Since x > 0, we have lim_{a→0} a^x = 0 because lim_{a→0} log a^x = x lim_{a→0} log a = −∞.
since Γ(1) = 1. The gamma function can therefore be thought of as the extension to the real line of the factorial function f(n) = n!, which is defined on the natural numbers (so, it is a sequence).16 It is an important function: the next remarkable result makes rigorous its interpretation in terms of expansions of the two versions of the De Moivre-Stirling formula.

Theorem 1078 We have, for x → +∞,

    log Γ(x) = x log x − x + o(x)
             = x log x − x − (1/2) log x + log √(2π) + o(1)

In the expansion notation, we can thus write that, for x → +∞,

    log Γ(x) ~ x log x − x
             ~ x log x − x − (1/2) log x + log √(2π)

23.5.2 Asymptotic expansions and analytic functions

If a sequence of coefficients {αk}_{k=0}^{∞} is such that (23.27) holds for every n, we say that

    f(x) ~ Σ_{k=0}^{∞} αk φk(x)   as x → x0

for every x ∈ (a, b). The expression Σ_{k=0}^{∞} αk φk(x) is called the asymptotic expansion of f at x0. For each given value of the argument x, the asymptotic expansion is a series. In general, such a series does not necessarily converge to the value f(x); indeed, it might not converge at all. An asymptotic expansion is an approximation with a certain degree of accuracy, nothing more. The next example presents the different (fortunate or less fortunate) cases one can encounter.

Example 1079 (i) The function f : (1, +∞) → R defined by f(x) = 1/(x − 1) has, with respect to the scale of negative power functions, the asymptotic expansion

    f(x) ~ Σ_{k=1}^{∞} 1/x^k   as x → +∞      (23.33)

For every given x, the asymptotic expansion is a geometric series. Therefore, it converges for every x > 1 (i.e., for every x in the domain of f), with

    f(x) = Σ_{k=1}^{∞} 1/x^k

In this (fortunate) case the asymptotic expansion is actually exact: the series determined by the asymptotic expansion converges to f(x) for every x in the domain.

(ii) Also the function f : (1, +∞) → R defined by f(x) = (1 + e^{−x})/(x − 1) has, with respect to the scale of negative power functions, the asymptotic expansion (23.33) for x → +∞. However, in this case we have, for every x > 1,

    f(x) ≠ Σ_{k=1}^{∞} 1/x^k

In this example the asymptotic expansion is merely an approximation, with degree of accuracy x^{−n} for every n.

(iii) Consider the function f : (1, +∞) → R defined by:17

    f(x) = e^{−x} ∫_1^x (e^t/t) dt

By repeatedly integrating by parts, we get:

    ∫_1^x (e^t/t) dt = [e^t/t]_1^x + ∫_1^x (e^t/t^2) dt = [e^t/t + e^t/t^2]_1^x + 2 ∫_1^x (e^t/t^3) dt
                     = ··· = [e^t (1/t + 1/t^2 + 2!/t^3 + ··· + (n − 1)!/t^n)]_1^x + n! ∫_1^x (e^t/t^{n+1}) dt

Since

    0 ≤ [∫_1^x (e^t/t^{n+1}) dt] / (e^x/x^n)
      = [∫_1^{x/2} (e^t/t^{n+1}) dt + ∫_{x/2}^x (e^t/t^{n+1}) dt] / (e^x/x^n)
      ≤ [x e^{x/2} + e^x/(x/2)^{n+1}] (x^n/e^x) = x^{n+1}/e^{x/2} + 2^{n+1}/x → 0   as x → +∞

we have

    ∫_1^x (e^t/t^{n+1}) dt = o(e^x/x^n)   as x → +∞

Hence, since the constant arising from the lower limit of integration becomes, after multiplication by e^{−x}, itself o(1/x^n),

    f(x) = 1/x + 1/x^2 + 2!/x^3 + 3!/x^4 + ··· + (n − 1)!/x^n + o(1/x^n)   as x → +∞

and

    f(x) ~ Σ_{k=1}^{∞} (k − 1)!/x^k   as x → +∞

For any given x > 1, the ratio criterion implies Σ_{k=1}^{∞} (k − 1)!/x^k = Σ_{k=1}^{∞} k!/(k x^k) = +∞. The asymptotic expansion thus determines a divergent series. In this (very unfortunate) case not only does the series not converge to f(x), but it even diverges. N
17 This example is taken from de Bruijn (1961).
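Case (iii) can be experienced numerically. The following sketch (not from the text; plain Python, with a crude midpoint rule standing in for the improper integral) shows that, at a fixed x, the truncations of the divergent asymptotic series first improve and then deteriorate as n grows.

    import math

    x = 10.0
    # crude numerical value of f(x) = e^{-x} * integral_1^x e^t/t dt (midpoint rule)
    N = 100000
    h = (x - 1) / N
    integral = sum(math.exp(1 + (i + 0.5) * h) / (1 + (i + 0.5) * h) for i in range(N)) * h
    f_val = math.exp(-x) * integral

    s = 0.0
    for n in range(1, 26):
        s += math.factorial(n - 1) / x**n        # partial sum of sum (k-1)!/x^k
        if n in (5, 10, 15, 20, 25):
            print(f"n={n:>2}  |f - partial sum| = {abs(f_val - s):.3e}")

The error shrinks up to roughly n = x and then grows without bound, the signature behavior of a divergent asymptotic series.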

Let us go back to the polynomial case, in which the asymptotic expansion of f : (a, b) → R at x0 ∈ (a, b) has the power series form18

    f(x) ~ Σ_{k=0}^{∞} αk (x − x0)^k   as x → x0

When f is infinitely differentiable at x0, by Taylor's Theorem the asymptotic expansion becomes

    f(x) ~ Σ_{k=0}^{∞} [f^(k)(x0)/k!] (x − x0)^k   as x → x0

The right-hand side of the expansion is a power series called the Taylor series (Maclaurin series if x0 = 0) of f at x0, with coefficients αk = f^(k)(x0)/k!.

But when can we turn ~ into =, that is, when do these approximations become, at least locally, exact? To answer this important question, we introduce the following classic class of functions.

Definition 1080 A function f : (a, b) → R is said to be analytic if, for every x0 ∈ (a, b), there are a neighborhood B(x0) and a sequence of scalars {αk}_{k=0}^{∞} such that

    f(x) = Σ_{k=0}^{∞} αk (x − x0)^k   ∀x ∈ B(x0)

In words, f is analytic if its polynomial asymptotic expansion is no longer a mere approximation but, locally, coincides exactly with f itself. Analytic functions are thus expandable as power series. Next we show that such a series is, indeed, the Taylor series.

Proposition 1081 A function f : (a, b) → R is analytic if and only if it is infinitely differentiable and, for every x0 ∈ (a, b), there is a neighborhood B(x0) such that

    f(x) = Σ_{k=0}^{∞} [f^(k)(x0)/k!] (x − x0)^k   ∀x ∈ B(x0)      (23.34)

Proof The converse being trivial, let us consider the "only if" side. Let f be analytic. Since, by hypothesis, the series Σ_{k=0}^{∞} αk (x − x0)^k converges for every x ∈ B(x0), with sum f(x), one can show that f is infinitely differentiable at every x0 ∈ (a, b). Let n ≥ 1. By Taylor's Theorem, we have

    f(x) ~ Σ_{k=0}^{n} [f^(k)(x0)/k!] (x − x0)^k   as x → x0

Lemma 1074 implies that αk = f^(k)(x0)/k! for every 0 ≤ k ≤ n. Since n was arbitrarily chosen, the desired result follows.

Answering the previous "approximation vs. exactness" question thus amounts to establishing the analyticity of a function: we can turn ~ into =, at least locally, if the function is analytic.

18 For simplicity, in Section 10.5 we considered power series with x0 = 0 but, of course, everything goes through if x0 is any scalar.

By Proposition 1081, being infinitely differentiable is a necessary condition for a function to be analytic. However, the following remarkable example shows that such a condition is not sufficient: an infinitely differentiable function may fail to have a power series expansion (23.34) at some point of its domain. It is this surprising fact that makes it necessary to introduce analytic functions as the class of infinitely differentiable functions for which such a failure does not occur (again Proposition 1081).

Example 1082 The function f : R → R given by

    f(x) = e^{−1/x²} if x ≠ 0, and f(0) = 0

is infinitely differentiable at every point of the real line, hence at the origin. So,

    f(x) ∼ ∑_{k=0}^∞ (f^(k)(0)/k!) x^k  as x → 0

However, it holds that f^(n)(0) = 0 for every n ≥ 1, so

    f(x) ≠ 0 = ∑_{k=0}^n (f^(k)(0)/k!) x^k  ∀0 ≠ x ∈ R

The function f is not analytic although it is infinitely differentiable. N
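The failure of analyticity here can also be seen numerically. A minimal sketch of ours in plain Python: since every Maclaurin polynomial of f is identically 0, the quality of the "approximation" f(x) ≈ 0 is governed by f(x) itself, which vanishes faster than any power of x as x → 0, yet is strictly positive for x ≠ 0.

```python
import math

def f(x):
    # The classic non-analytic, infinitely differentiable function of Example 1082.
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

# f(x)/x^n -> 0 for every n: all Maclaurin coefficients are zero,
# yet f(x) > 0 for every x != 0, so the Taylor series misses f entirely.
for x in (0.5, 0.3, 0.2, 0.1):
    print(x, f(x), f(x) / x**10)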

Next we present two classic analyticity criteria.¹⁹ The first one is based on the radius of convergence of the Taylor series.

Theorem 1083 (Pringsheim) An infinitely differentiable function f : (a, b) → R is analytic if there is δ > 0 such that

    r(x₀) ≥ δ  ∀x₀ ∈ (a, b)

where r(x₀) = 1 / limsup_k (|f^(k)(x₀)|/k!)^{1/k} is the radius of convergence of the Taylor series of f at x₀.

The second, quite striking, criterion is based on the sign of the derivatives.

Theorem 1084 (Bernstein) An infinitely differentiable function f : (a, b) → R is analytic if at all x ∈ (a, b) its derivatives of all orders are positive, i.e., f^(k)(x) ≥ 0 for all k ≥ 1.

Example 1085 For the function f : R \ {1} → R defined by

    f(x) = 1/(1 − x)

we have, for all k ≥ 1,

    f^(k)(x) = k!/(1 − x)^{k+1}

Indeed, we can proceed by induction. For k = 1 the result is obvious. If we assume that the result is true for k − 1 (induction hypothesis), then

    f^(k)(x) = d f^(k−1)(x)/dx = d[(k−1)! (1 − x)^{−k}]/dx = (k−1)! k (1 − x)^{−k−1} = k!/(1 − x)^{k+1}

as desired. So, at all x < 1 we have f^(k)(x) ≥ 0 for all k ≥ 1. By Bernstein's Theorem, f is analytic on (−∞, 1). That is, at all x₀ < 1 there is a neighborhood B(x₀) ⊆ (−∞, 1) such that

    f(x) = ∑_{k=0}^∞ (f^(k)(x₀)/k!) (x − x₀)^k = ∑_{k=0}^∞ (x − x₀)^k/(1 − x₀)^{k+1}  ∀x ∈ B(x₀)

In particular, by the properties of the geometric series we have

    f(x) = ∑_{k=0}^∞ (x − x₀)^k/(1 − x₀)^{k+1}  ∀x ∈ (2x₀ − 1, 1)

because |(x − x₀)/(1 − x₀)| < 1 if and only if x ∈ (2x₀ − 1, 1). So, we can take B(x₀) = (2x₀ − 1, 1), a neighborhood of x₀ of radius 1 − x₀.²⁰ For instance, at the origin x₀ = 0 we have

    f(x) = ∑_{k=0}^∞ (f^(k)(0)/k!) x^k = ∑_{k=0}^∞ x^k  ∀x ∈ (−1, 1)

Here we can take B(0) = (−1, 1). N

19 The first criterion was proved by Alfred Pringsheim in 1893, the second one by Sergei Bernstein in 1912. We omit the proofs of these deep results and refer interested readers to Krantz and Parks (2002).
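The following sketch (an illustration of ours, plain Python) evaluates the partial sums of this Taylor series at x₀ = −1, for which B(x₀) = (−3, 1): inside this neighborhood the partial sums converge to f(x), outside they blow up.

```python
def f(x):
    return 1.0 / (1.0 - x)

def taylor_partial(x, x0, n):
    # Partial sum of sum_{k=0}^{n} (x - x0)^k / (1 - x0)^{k+1}
    return sum((x - x0) ** k / (1.0 - x0) ** (k + 1) for k in range(n + 1))

x0 = -1.0                                # B(x0) = (2*x0 - 1, 1) = (-3, 1)
for x in (-2.5, 0.0, 0.9):               # points inside (-3, 1): convergence
    print(x, f(x), taylor_partial(x, x0, 300))
print(taylor_partial(-3.5, x0, 60))      # point outside B(x0): partial sums diverge
```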

If the functions f, g : (a, b) → R are analytic and α, β ∈ R are any two scalars, then the function αf + βg : (a, b) → R is still analytic. So, linear combinations of analytic functions are analytic. This simple remark, combined with analyticity criteria like the previous ones, permits us to establish that many functions of interest are analytic. The following result shows that, indeed, some classic elementary functions are analytic.

Proposition 1086 (i) The exponential and logarithmic functions are analytic. In particular,

    e^x = ∑_{k=0}^∞ x^k/k!  ∀x ∈ R

    log(1 + x) = ∑_{k=1}^∞ (−1)^{k+1} x^k/k  ∀x ∈ (−1, 1]

20 Note that x₀ < 1 implies 2x₀ − 1 < 1.
(ii) The trigonometric functions sine and cosine are analytic. In particular,

    sin x = ∑_{k=0}^∞ ((−1)^k/(2k+1)!) x^{2k+1}  and  cos x = ∑_{k=0}^∞ ((−1)^k/(2k)!) x^{2k}  ∀x ∈ R

Proof Let us only consider the exponential function. By Theorem 367, at x₀ = 0 we have e^x = ∑_{k=0}^∞ x^k/k! for every x ∈ R. By substitution, for every x₀ ∈ R it holds that e^x = e^{x₀} + e^{x₀} ∑_{k=1}^∞ (x − x₀)^k/k! for every x ∈ R. The exponential function is thus analytic on the real line. The same conclusion could have been achieved via Bernstein's Theorem. □

In conclusion, analytic functions are a fundamental subclass of infinitely differentiable functions. Thanks to their asymptotic expansion, which is both polynomial and exact (what more could one want?), they are the most tractable functions. This makes them perfect for applications, which can hardly do without them.

23.5.3 Hille's formula

We can now state a beautiful version of Taylor's formula, due to Einar Hille, for continuous functions (we omit its non-trivial proof).

Theorem 1087 (Hille) Let f : (0, ∞) → R be a bounded continuous function and x₀ > 0. Then, for each h > 0,

    f(x₀ + h) = lim_{δ→0⁺} ∑_{k=0}^∞ (Δ_δ^k f(x₀)/δ^k) (h^k/k!)    (23.35)

where Δ_δ^k f(x₀) denotes the k-th forward difference of f at x₀ with step δ.

We call the limit (23.35) Hille's formula. When f is infinitely differentiable, Hille's formula intuitively should approach the series expansion (23.34), i.e.,

    f(x₀ + h) = ∑_{k=0}^∞ (f^(k)(x₀)/k!) h^k

because lim_{δ→0⁺} Δ_δ^k f(x₀)/δ^k = f^(k)(x₀) for every k ≥ 1 (Proposition 942). This is actually true when f is analytic because in this case (23.34) and (23.35) together imply

    lim_{δ→0⁺} ∑_{k=0}^∞ (Δ_δ^k f(x₀)/δ^k)(h^k/k!) = ∑_{k=0}^∞ (f^(k)(x₀)/k!) h^k

Hille's formula, however, holds when f is just bounded and continuous, thus providing a remarkable generalization of the Taylor expansion of analytic functions.
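To see Hille's formula at work numerically, here is a small sketch of ours in Python. It truncates the series in (23.35) at 12 terms and uses f = sin, which is bounded and continuous on (0, +∞); the helper names fwd_diff and hille_sum are our own. As δ → 0⁺ the value approaches f(x₀ + h). (Very small δ should be avoided in floating point: the k-th difference quotient suffers catastrophic cancellation.)

```python
import math

def fwd_diff(f, x0, delta, k):
    # k-th forward difference: sum_{j=0}^k (-1)^(k-j) C(k,j) f(x0 + j*delta)
    return sum((-1) ** (k - j) * math.comb(k, j) * f(x0 + j * delta)
               for j in range(k + 1))

def hille_sum(f, x0, h, delta, n_terms):
    # Truncation of (23.35): sum_k (Delta_delta^k f(x0) / delta^k) * h^k / k!
    return sum(fwd_diff(f, x0, delta, k) / delta ** k * h ** k / math.factorial(k)
               for k in range(n_terms))

x0, h = 1.0, 0.5
for delta in (0.2, 0.1, 0.05):
    print(delta, hille_sum(math.sin, x0, h, delta, 12), math.sin(x0 + h))
```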

23.5.4 Borel's Theorem

Let f be the function of Example 1082 and let g : R → R be the function identically equal to 0 (which is analytic, unlike f), that is, g(x) = 0 for all x ∈ R. We have f^(k)(0) = g^(k)(0) for all k ≥ 0, so f and g are an example of two distinct infinitely differentiable functions that have the same Maclaurin series. Indeed, Taylor series pin down uniquely only analytic functions.

But do the coefficients of Taylor (in particular, of Maclaurin) series have some characterizing property? Is there some peculiar property that such coefficients satisfy? In the special case of analytic functions, the answer is positive: Cauchy-Hadamard's Theorem requires that limsup_n |α_n|^{1/n} < +∞, so only sequences of scalars {α_k}_{k=0}^∞ that satisfy such a bound may qualify to be coefficients of a Taylor series of some analytic function. Yet, we learned in Example 1082 that there exist infinitely differentiable functions that are not analytic. Indeed, the next deep theorem – whose highly non-trivial proof we omit – shows that, in general, the previous questions have a negative answer.²¹

Theorem 1088 (Borel-Peano) For any sequence of scalars {c_k}_{k=0}^∞ there is an infinitely differentiable function f : R → R such that

    f^(k)(0) = c_k  ∀k = 0, 1, ..., n, ...    (23.36)

So, anything goes: given any sequence whatsoever of scalars {α_k}_{k=0}^∞, there is an infinitely differentiable function f – not analytic if limsup_n |α_n|^{1/n} = +∞ – such that f^(k)(0) = α_k k! for all k, that is, with those scalars as the coefficients of its Maclaurin series. Moreover, the function satisfying (23.36) is not unique: given any such function f and any scalar β, the function f_β : R → R defined by

    f_β(x) = f(x) + β e^{−1/x²} if x ≠ 0, and f_β(0) = f(0)

is easily seen to be also such that f_β^(k)(0) = c_k for all k = 0, 1, ..., n, .... A continuum of infinitely differentiable functions satisfying (23.36) thus exists.

21 The theorem was independently proved between 1884 and 1895 by Giuseppe Peano and Émile Borel (Borel's version is the best known, hence the name of this subsection).
Chapter 24

Concavity and differentiability

Concave functions have remarkable differentiability properties that confirm the great tractability of these widely used functions. The study of these properties is the subject matter of this chapter. We begin with scalar functions and then move to functions of several variables. Throughout the chapter C always denotes a convex set (so an interval in the scalar case). For brevity, we will focus on concave functions, leaving to the readers the dual results that hold for convex functions.

24.1 Scalar functions

24.1.1 Decreasing marginal effects

The differentiability properties of a scalar concave function f : C ⊆ R → R follow from a simple geometric observation. Given two points x and y in the domain of f, the chord that joins the points (x, f(x)) and (y, f(y)) of the graph has slope

    (f(y) − f(x))/(y − x)

as one can verify with a simple modification of what was done for (20.6). Graphically:

[Figure: the chord through (x, f(x)) and (y, f(y)); its slope is the ratio of the vertical increment f(y) − f(x) to the horizontal increment y − x.]

If the function f is concave, the slope of the chord decreases when we move the chord rightward. This basic geometric property characterizes concavity, as the next lemma shows.

Lemma 1089 A function f : C ⊆ R → R is concave if and only if, for any four points x, w, y, z ∈ C with x ≤ w < y ≤ z, we have

    (f(y) − f(x))/(y − x) ≥ (f(z) − f(w))/(z − w)    (24.1)

In other words, by moving rightward from [x, y] to [w, z], the slope of the chords decreases. Graphically:

[Figure: four points A, B, C, D on the graph at abscissas x ≤ w < y ≤ z; the chord AC is steeper than the chord BD.]

Note that a strict inequality in (24.1) characterizes strict concavity.

Proof "Only if". Let f be concave. The proof is divided in two steps: first we show that the chord AC has a greater slope than the chord BC:

[Figure: points A, B, C on the graph at abscissas x ≤ w < y; the chord AC is steeper than the chord BC.]

Then, we show that the chord BC has a greater slope than the chord BD:

[Figure: points B, C, D on the graph at abscissas w < y ≤ z; the chord BC is steeper than the chord BD.]

The first step amounts to proving (24.1) for z = y. Since x ≤ w < y, there exists λ ∈ [0, 1] such that w = λx + (1−λ)y. Since f is concave, we have f(w) ≥ λf(x) + (1−λ)f(y), so that

    (f(y) − f(w))/(y − w) ≤ (f(y) − λf(x) − (1−λ)f(y))/(y − λx − (1−λ)y) = (f(y) − f(x))/(y − x)    (24.2)

This completes the first step. We now move to the second step, which amounts to proving (24.1) for x = w. Since w < y ≤ z, there exists λ ∈ [0, 1] such that y = λw + (1−λ)z. Further, since f is concave we have f(y) ≥ λf(w) + (1−λ)f(z), so that

    (f(y) − f(w))/(y − w) ≥ (λf(w) + (1−λ)f(z) − f(w))/(λw + (1−λ)z − w) = (f(z) − f(w))/(z − w)    (24.3)

Finally, from (24.2) and (24.3) it follows that

    (f(z) − f(w))/(z − w) ≤ (f(y) − f(w))/(y − w) ≤ (f(y) − f(x))/(y − x)

as desired.

"If". Assume (24.1). Let x, z ∈ C, with x < z, and λ ∈ [0, 1]. Set y = λx + (1−λ)z. If in (24.1) we set w = x, we have

    (f(λx + (1−λ)z) − f(x))/(λx + (1−λ)z − x) ≥ (f(z) − f(x))/(z − x)

Since λx + (1−λ)z − x = (1−λ)(z − x), we then have

    (f(λx + (1−λ)z) − f(x))/((1−λ)(z − x)) ≥ (f(z) − f(x))/(z − x)

that is, f(λx + (1−λ)z) − f(x) ≥ (1−λ)(f(z) − f(x)). In turn, this implies that f is concave, as desired. □
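As a quick sanity check of inequality (24.1), the following sketch of ours samples random quadruples x ≤ w < y ≤ z and compares the two chord slopes for the concave function √·:

```python
import math, random

f = math.sqrt                     # concave on [0, +inf)
random.seed(1)
for _ in range(5):
    x, w, y, z = sorted(random.uniform(0.0, 10.0) for _ in range(4))
    slope_xy = (f(y) - f(x)) / (y - x)   # chord over [x, y]
    slope_wz = (f(z) - f(w)) / (z - w)   # chord over [w, z], further right
    print(round(slope_xy, 4), ">=", round(slope_wz, 4), slope_xy >= slope_wz)
```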

The geometric property (24.1) has the following analytical counterpart, of great economic significance.

Proposition 1090 If f : C ⊆ R → R is concave, then it has decreasing increments (or differences), i.e.,

    f(x + h) − f(x) ≥ f(y + h) − f(y)    (24.4)

for all x, y ∈ C and h ≥ 0 with x ≤ y and y + h ∈ C. The converse is true if f is continuous.

Proof Let x ≤ y and h ≥ 0. The points y and x + h then belong to the interval [x, y + h]. Under the change of variable z = y + h, we have x + h, z − h ∈ [x, z]. Hence there is λ ∈ [0, 1] for which x + h = λx + (1−λ)z. It is immediate to check that z − h = (1−λ)x + λz. By the concavity of f, we then have f(x + h) ≥ λf(x) + (1−λ)f(z) and f(z − h) ≥ (1−λ)f(x) + λf(z). Adding the two inequalities, we get

    f(x + h) + f(z − h) ≥ f(x) + f(z)

that is,

    f(x + h) − f(x) ≥ f(z) − f(z − h) = f(y + h) − f(y)

as desired. We omit the proof of the converse. □

The inequality (24.4) does not change if we divide both sides by h > 0. Hence,

    f'_+(x) = lim_{h→0⁺} (f(x + h) − f(x))/h ≥ lim_{h→0⁺} (f(y + h) − f(y))/h = f'_+(y)

provided the limits exist. Similarly f'_-(x) ≥ f'_-(y), and so f'(x) ≥ f'(y) when the (bilateral) derivative exists. Concave functions f thus feature decreasing marginal effects as their argument increases, and so embody a fundamental economic principle: additional units have a lower and lower marginal impact on levels (of utility, of production, and so on; we then talk of decreasing marginal utility, decreasing marginal returns, and so on). It is through this principle that forms of concavity first entered economics.¹
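Bernoulli's logarithmic utility gives a concrete feel for the decreasing increments (24.4). In this little illustration of ours, the utility gain from one extra unit of wealth shrinks as the wealth level grows:

```python
import math

h = 1.0   # one extra unit of wealth
for x in (1.0, 2.0, 5.0, 10.0, 100.0):
    # Increment log(x + h) - log(x): decreasing in x, as (24.4) predicts.
    print(x, math.log(x + h) - math.log(x))
```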
The next result establishes this property rigorously by showing that the one-sided derivatives exist and are decreasing.

Proposition 1091 Let f : C ⊆ R → R be concave. Then,

(i) the right derivative f'_+(x) and the left derivative f'_-(x) exist at each x ∈ int C;²

(ii) the right f'_+ and left f'_- derivative functions are both decreasing on int C;

(iii) f'_+(x) ≤ f'_-(x) for each x ∈ int C.

1 In his famous 1738 essay, Daniel Bernoulli wrote: "Now it is highly probable that any increase in wealth, no matter how insignificant, will always result in an increase in utility which is inversely proportionate to the quantity of goods already possessed." This is where the principle first appeared, and through it Bernoulli justified the use of a logarithmic (so concave) utility function. This magnificent insight of Bernoulli was way ahead of his time (see for instance Stigler, 1950).
2 The interior int C of an interval C is an open interval: whether C is [a, b], [a, b) or (a, b], we always have int C = (a, b).

A concave function therefore has remarkable regularity properties: at each interior point of its domain, it is automatically continuous (Theorem 669) and has decreasing one-sided derivative functions.³

Proof Let x₀ ∈ int C. Since x₀ is an interior point, it has a neighborhood (x₀ − ε, x₀ + ε) included in C, that is, (x₀ − ε, x₀ + ε) ⊆ C. Let 0 < a < ε, so that we have [x₀ − a, x₀ + a] ⊆ C. Let φ : [−a, a] \ {0} → R be defined by

    φ(h) = (f(x₀ + h) − f(x₀))/h

Property (24.1) implies that φ is decreasing, that is,

    h' ≤ h'' ⟹ φ(h') = (f(x₀ + h') − f(x₀))/(x₀ + h' − x₀) ≥ (f(x₀ + h'') − f(x₀))/(x₀ + h'' − x₀) = φ(h'')    (24.5)

Indeed, if h' < 0 < h'' it is sufficient to apply (24.1) with w = y = x₀, x = x₀ + h' and z = x₀ + h''. If h' ≤ h'' < 0, apply (24.2) with y = x₀, x = x₀ + h' and w = x₀ + h''. If 0 < h' ≤ h'', apply (24.3) with w = x₀, y = x₀ + h' and z = x₀ + h''.

Since φ is decreasing, we have φ(a) ≤ φ(h) ≤ φ(−a) for every h, that is, φ is bounded. Therefore, φ is both decreasing and bounded, which implies that the right limit and the left limit of φ at 0 exist and are finite. This proves the existence of the one-sided derivatives. Moreover, the fact that φ is monotonically decreasing implies φ(h') ≥ φ(h'') for every h' < 0 < h'', so that

    f'_+(x₀) = lim_{h→0⁺} φ(h) ≤ lim_{h→0⁻} φ(h) = f'_-(x₀)

To show the monotonicity, consider x, y ∈ int C such that x < y. By (24.4),

    (f(x + h) − f(x))/h ≥ (f(y + h) − f(y))/h

for all admissible h ≠ 0. Hence,

    f'_+(x) = lim_{h→0⁺} (f(x + h) − f(x))/h ≥ lim_{h→0⁺} (f(y + h) − f(y))/h = f'_+(y)

which implies that the right derivative function is decreasing. A similar argument holds for the left derivative function. □

Clearly, if in addition f is differentiable at x, then f'(x) = f'_+(x) = f'_-(x). In particular:

Corollary 1092 If a concave function f : C ⊆ R → R is differentiable on int C, then its derivative function f' is decreasing on int C.

3 For brevity, one often says that a "derivative is increasing" rather than the more precise "derivative function is increasing". In what follows, at times we too will take this liberty. The more one masters a topic, the more one is tempted to abuse notation and terminology for the sake of brevity. Sometimes this is needed not to get trapped in pedantic matters, but other times it is what makes some topics impenetrable to beginners. As we already remarked (Section 1.1.2), a proper balance between rigor and pedantry is key for effective scientific communication.
Example 1093 (i) The concave function f(x) = −|x| does not have a derivative at x = 0. Nevertheless, the one-sided derivatives exist at each point of the domain, with

    f'_+(x) = 1 if x < 0, and −1 if x ≥ 0

and

    f'_-(x) = 1 if x ≤ 0, and −1 if x > 0

Therefore, f'_+(x) ≤ f'_-(x) for every x ∈ R and both one-sided derivative functions are decreasing.

(ii) The concave function

    f(x) = x + 1 if x ≤ −1,  0 if −1 < x < 1,  1 − x if x ≥ 1

does not have a derivative at x = −1 and at x = 1. Nevertheless, the one-sided derivatives exist at each point of the domain, with

    f'_+(x) = 1 if x < −1,  0 if −1 ≤ x < 1,  −1 if x ≥ 1

and

    f'_-(x) = 1 if x ≤ −1,  0 if −1 < x ≤ 1,  −1 if x > 1

Therefore, f'_+(x) ≤ f'_-(x) for every x ∈ R and both one-sided derivative functions are decreasing.

(iii) The concave function f(x) = 1 − x² is differentiable on R with f'(x) = −2x. The derivative function is decreasing. N

Proposition 1091 says, inter alia, that at the interior points x we have f'_+(x) ≤ f'_-(x). The next result says that we actually have f'_+(x) = f'_-(x), and so that f is differentiable at x, at all points x ∈ C except those belonging to an, at most, countable subset of C. For the three concave functions of the previous example, such a set of non-differentiability is {0}, {−1, 1} and ∅, respectively.

Theorem 1094 A concave function f : C ⊆ R → R is differentiable at all the points of C with the exception of an, at most, countable subset.

We omit the proof of this important result.



24.1.2 Chords and tangents

Theorem 1095 Let f : (a, b) → R be differentiable at x ∈ (a, b). If f is concave, then

    f(y) ≤ f(x) + f'(x)(y − x)  ∀y ∈ (a, b)    (24.6)

Proof Let f be concave and let x and y be two distinct points of (a, b). If λ ∈ (0, 1), we have

    f(x + (1−λ)(y − x)) = f(λx + (1−λ)y) ≥ λf(x) + (1−λ)f(y) = f(x) + (1−λ)[f(y) − f(x)]

Therefore,

    (f(x + (1−λ)(y − x)) − f(x))/(1−λ) ≥ f(y) − f(x)

Dividing and multiplying the left-hand side by y − x, we get

    [(f(x + (1−λ)(y − x)) − f(x))/((1−λ)(y − x))] (y − x) ≥ f(y) − f(x)

This inequality holds for every λ ∈ (0, 1). Hence, thanks to the differentiability of f at x, we have

    lim_{λ→1} [(f(x + (1−λ)(y − x)) − f(x))/((1−λ)(y − x))] (y − x) = f'(x)(y − x)

Therefore, f'(x)(y − x) ≥ f(y) − f(x), as desired. □

The right-hand side of inequality (24.6) is the tangent line of f at x, that is, the linear approximation of f that holds, locally, at x. By Theorem 1095, such a line always lies above the graph of the function: the approximation is in "excess".

Geometrically, this remarkable property is clear: the definition of concavity requires that the straight line passing through the two points (x, f(x)) and (y, f(y)) lies below the graph of f in the interval between x and y, and hence that it lies above it outside that interval.⁴ Letting y tend to x, the straight line becomes tangent and lies entirely above the curve.

4 For completeness, let us prove it. Let z be outside the interval [x, y]: suppose that z > y. We can then write y = λx + (1−λ)z with λ ∈ (0, 1) and, by the concavity of f, we have f(y) ≥ λf(x) + (1−λ)f(z), that is, f(z) ≤ (1−λ)⁻¹f(y) − λ(1−λ)⁻¹f(x). Setting μ = 1/(1−λ) > 1, so that 1 − μ = −λ/(1−λ) < 0, this reads f(z) = f(μy + (1−μ)x) ≤ μf(y) + (1−μ)f(x) for every μ > 1. If z < x, we reason similarly.
[Figure: the tangent line f(x) + f'(x)(y − x) at x lies above the graph of f; the chords through (y₁, f(y₁)) and (y₂, f(y₂)) approach the tangent as the points approach x.]

In the previous theorem we assumed differentiability at a given point x. If we assume it on the entire interval (a, b), inequality (24.6) characterizes concavity.

Theorem 1096 Let f : (a, b) → R be differentiable on (a, b). Then, f is concave if and only if

    f(y) ≤ f(x) + f'(x)(y − x)  ∀x, y ∈ (a, b)    (24.7)

Thus, for a function f differentiable on an open interval, a necessary and sufficient condition for the concavity of f is that the tangent lines at the various points of its domain all lie above its graph.

Proof The "only if" follows from the previous theorem. We prove the "if". Suppose that inequality (24.7) holds and consider the point z = λx + (1−λ)y. Let us apply (24.7) twice: first to the points x and z, and then to the points y and z. Then:

    f(x) ≤ f(z) + f'(z)(x − z) = f(z) + (1−λ)f'(z)(x − y)
    f(y) ≤ f(z) + f'(z)(y − z) = f(z) + λf'(z)(y − x)

Let us multiply the first inequality by λ, the second one by 1−λ, and add them. We get

    λf(x) + (1−λ)f(y) ≤ f(z) = f(λx + (1−λ)y)

Given the arbitrariness of x and y, we conclude that f is concave. □

24.1.3 Concavity criteria

The last theorem established a first differential characterization of concavity. Condition (24.7) can be viewed as a concavity criterion that can be used to check whether a given differentiable function is, indeed, concave. However, though key conceptually, condition (24.7) turns out not to be that useful operationally as a concavity criterion. For this reason, in this section we establish other differential characterizations of concavity that lead to more useful concavity criteria.

To this end, recall that a significant property established in Proposition 1091 is the decreasing monotonicity of the one-sided derivative functions of concave functions. The next important result shows that for continuous functions this property characterizes concavity.

Theorem 1097 Let f : C ⊆ R → R be continuous. Then:

(i) f is concave if and only if the right derivative function f'_+ exists and is decreasing on int C;

(ii) f is strictly concave if and only if the right derivative function f'_+ exists and is strictly decreasing on int C.

Proof (i) We only prove the "if" since the converse follows from Proposition 1091. For simplicity, assume that f is differentiable on the open interval int C. By hypothesis, f' is decreasing on int C. Let x, y ∈ int C, with x < y, and λ ∈ (0, 1). Set z = λx + (1−λ)y, so that x < z < y. By the Mean Value Theorem, there exist ξ_x ∈ (x, z) and ξ_y ∈ (z, y) such that

    f'(ξ_x) = (f(z) − f(x))/(z − x),  f'(ξ_y) = (f(y) − f(z))/(y − z)

Since f' is decreasing, f'(ξ_x) ≥ f'(ξ_y). Hence,

    (f(λx + (1−λ)y) − f(x))/(λx + (1−λ)y − x) ≥ (f(y) − f(λx + (1−λ)y))/(y − λx − (1−λ)y)

Since λx + (1−λ)y − x = (1−λ)(y − x) and y − λx − (1−λ)y = λ(y − x), we then have

    (f(λx + (1−λ)y) − f(x))/((1−λ)(y − x)) ≥ (f(y) − f(λx + (1−λ)y))/(λ(y − x))

In turn, this easily implies f(λx + (1−λ)y) ≥ λf(x) + (1−λ)f(y), as desired.⁵ (ii) This part is left to the reader. □

A similar result, left to the reader, holds for the other one-sided derivative f'_-. This theorem thus establishes a differential characterization of concavity by showing that it is equivalent to the decreasing monotonicity of the one-sided derivative functions.

Example 1098 Let f : R → R be given by f(x) = x − |x³|, that is,

    f(x) = x + x³ if x < 0, and x − x³ if x ≥ 0

The function f is continuous. It has one-sided derivatives at each point of the domain, with

    f'_+(x) = 1 + 3x² if x < 0, and 1 − 3x² if x ≥ 0

and

    f'_-(x) = 1 + 3x² if x ≤ 0, and 1 − 3x² if x > 0

To see that this is the case, consider the origin, which is the most delicate point. We have

    f'_+(0) = lim_{h→0⁺} (f(h) − f(0))/h = lim_{h→0⁺} (h − h³)/h = lim_{h→0⁺} (1 − h²) = 1

and

    f'_-(0) = lim_{h→0⁻} (f(h) − f(0))/h = lim_{h→0⁻} (h + h³)/h = lim_{h→0⁻} (1 + h²) = 1

Therefore, f'_+(x) ≤ f'_-(x) for every x ∈ R and both one-sided derivative functions are decreasing. By Theorem 1097, the function f is concave. N

5 Using a version of the Mean Value Theorem for unilateral derivatives, we can prove the result without any differentiability assumption on f.

One-sided derivatives are key in the previous theorem because concavity per se ensures only their existence, not that of the two-sided derivative. One-sided derivatives are, however, less easy to handle than the two-sided derivative. So, in applications differentiability is often assumed. In this case we have the following simple consequence of the previous theorem, which provides a useful concavity criterion for functions.

Corollary 1099 Let f : C ⊆ R → R be differentiable on int C and continuous on C. Then:

(i) f is concave if and only if f' is decreasing on int C;

(ii) f is strictly concave if and only if f' is strictly decreasing on int C.

Under differentiability, a necessary and sufficient condition for a function to be (strictly) concave is, thus, that its first derivative is (strictly) decreasing.⁶

Proof We only prove (i), as (ii) is similar. Let f : C ⊆ R → R be differentiable on int C and continuous on C. If f is concave, Theorem 1097 implies that f' = f'_+ is decreasing. Vice versa, if f' = f'_+ is decreasing, then Theorem 1097 implies that f is concave. □

Example 1100 Consider the functions f, g : R → R given by f(x) = −|x³| and g(x) = −e^{−x}. The graph of f is:

[Figure: graph of f(x) = −|x³|, a concave hump that attains its maximum 0 at the origin.]

while the graph of g is:

[Figure: graph of g(x) = −e^{−x}, increasing, concave, and approaching 0 as x → +∞.]

Both functions are differentiable on their domain, with

    f'(x) = 3x² if x ≤ 0, and −3x² if x > 0,    g'(x) = e^{−x}

The derivatives are strictly decreasing and therefore f and g are strictly concave thanks to Corollary 1099. N

6 When C is open, the continuity assumption becomes superfluous (a similar observation applies to Corollary 1101 below).

The previous corollary provides a simple differential criterion for concavity that reduces the test of concavity to the often operationally simple test of a property of first derivatives. The next result shows that it is actually possible to do even better, by recalling the differential characterization of monotonicity seen in Section 22.4.

Corollary 1101 Let f : C ⊆ R → R be twice differentiable on int C and continuous on C. Then:

(i) f is concave if and only if f'' ≤ 0 on int C;

(ii) f is strictly concave if f'' < 0 on int C.

Proof (i) It is sufficient to observe that, thanks to the "decreasing" version of Proposition 1003, the first derivative f' is decreasing on int C if and only if f''(x) ≤ 0 for every x ∈ int C. (ii) It follows from the "strictly decreasing" version of Proposition 1005. □

Under the further hypothesis that f is twice differentiable on int C, concavity thus becomes equivalent to the negativity of the second derivative, a condition often easier to check than the decreasing monotonicity of the first derivative. In any case, thanks to the last two corollaries we now have powerful differential tests of concavity.⁷

Note the asymmetry between points (i) and (ii): while in (i) decreasing monotonicity is a necessary and sufficient condition for concavity, in (ii) strictly decreasing monotonicity is only a sufficient condition for strict concavity. This follows from the analogous asymmetry for monotonicity between Propositions 1003 and 1005.

Example 1102 (i) The functions f(x) = √x and g(x) = log x have, respectively, derivatives f'(x) = 1/(2√x) and g'(x) = 1/x that are strictly decreasing. Therefore, they are strictly concave. The second derivatives f''(x) = −1/(4x^{3/2}) < 0 and g''(x) = −1/x² < 0 confirm this conclusion.

(ii) The function f(x) = x² has derivative f'(x) = 2x that is strictly increasing. Therefore, it is strictly convex. Indeed, f''(x) = 2 > 0.

(iii) The function f(x) = x³ has derivative f'(x) = 3x² that is strictly decreasing on (−∞, 0] and strictly increasing on [0, +∞): f is thus strictly concave on (−∞, 0] and strictly convex on [0, +∞). Indeed, the second derivative f''(x) = 6x is ≤ 0 on (−∞, 0] and ≥ 0 on [0, +∞). N
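The second-derivative test of Corollary 1101 is easy to automate with a computer algebra system. A minimal sketch of ours, assuming the SymPy library is available:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
for expr in (sp.sqrt(x), sp.log(x), x**2, x**3):
    print(expr, sp.simplify(sp.diff(expr, x, 2)))
# sqrt(x): second derivative -1/(4*x**(3/2)) < 0 -> strictly concave on (0, +oo)
# log(x):  second derivative -1/x**2         < 0 -> strictly concave on (0, +oo)
# x**2:    second derivative  2              > 0 -> strictly convex
# x**3:    second derivative  6*x, which changes sign at 0
```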

24.2 Intermezzo

In the next section we will study the differential properties of concave functions of several variables. This important topic relies, in turn, on two further topics, superlinear functions and monotone operators, which we now present.

24.2.1 Superlinear functions

Concavity and positive homogeneity join forces in the important class of superlinear functions. Specifically, a function f : Rⁿ → R is superlinear if it is:

(i) positively homogeneous: f(λx) = λf(x) for each λ ≥ 0 and each x ∈ Rⁿ;

(ii) superadditive: f(x + y) ≥ f(x) + f(y) for each x, y ∈ Rⁿ.

7 As the reader can check, dual results hold for convex functions, with increasing monotonicity instead of decreasing monotonicity (and f'' ≥ 0 instead of f'' ≤ 0).

Hence, a function is superlinear if it is positively homogeneous and superadditive. Similarly, a function f : Rⁿ → R is sublinear if it is positively homogeneous and subadditive, i.e., if f(x + y) ≤ f(x) + f(y) for each x, y ∈ Rⁿ. It is immediate to see that f is sublinear if and only if −f is superlinear.

Superlinear functions are concave (so sublinear functions are convex):

    f(λx + (1−λ)y) ≥ f(λx) + f((1−λ)y) = λf(x) + (1−λ)f(y)

for each x, y ∈ Rⁿ and each λ ∈ [0, 1].⁸

Example 1103 (i) The norm ‖·‖ : Rⁿ → R is a sublinear function (cf. Example 652). (ii) Define f : Rⁿ → R by

    f(x) = inf_{i∈I} αⁱ · x  ∀x ∈ Rⁿ

where {αⁱ}_{i∈I} is a collection, finite or infinite, of vectors of Rⁿ. This function is easily seen to be superlinear.
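A numerical sketch of ours (assuming NumPy) for part (ii): with finitely many vectors, f(x) = min_i αⁱ·x can be checked directly for superadditivity and positive homogeneity on random inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))          # rows play the role of the vectors alpha^i

def f(x):
    return float((A @ x).min())      # f(x) = min_i alpha^i . x

x, y = rng.normal(size=3), rng.normal(size=3)
print(f(x + y) >= f(x) + f(y) - 1e-12)       # superadditivity
print(np.isclose(f(2.5 * x), 2.5 * f(x)))    # positive homogeneity
```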

Next we report some useful properties of superlinear functions.

Proposition 1104 Let f : Rⁿ → R be superlinear. Then, f(0) = 0 and

    f(−x) ≤ −f(x)  ∀x ∈ Rⁿ    (24.8)

Furthermore, f is linear if and only if f(−x) = −f(x) for each x ∈ Rⁿ.

Proof Since f is positively homogeneous, we have f(λ0) = λf(0) for each λ ≥ 0. Since λ0 = 0, we have f(0) = λf(0) for each λ ≥ 0, which can happen only if f(0) = 0.⁹ For each x ∈ Rⁿ, we thus have 0 = f(0) = f(x − x) ≥ f(x) + f(−x), so (24.8) holds.

Clearly, if f is linear we have f(−x) = −f(x) for each x ∈ Rⁿ. As to the converse, assume that f(−x) = −f(x) for each x ∈ Rⁿ. Consider the function g : Rⁿ → R defined by g(x) = −f(−x) for each x ∈ Rⁿ. It is easy to check that g is sublinear. From f(−x) = −f(x) it follows that f(x) = g(x) for each x ∈ Rⁿ, so f is both superlinear and sublinear, hence both concave and convex, that is, affine. By Proposition 656, there exist a linear function l : Rⁿ → R and β ∈ R such that f = l + β. On the other hand, β = f(0) = 0, so f = l. We conclude that f is linear. □

A simple consequence of the last result is the following corollary, which motivates the "superlinear" terminology.

Corollary 1105 A function f : Rⁿ → R is both superlinear and sublinear if and only if it is linear.

Proof Let f be both superlinear and sublinear. By (24.8), we have both f(−x) ≤ −f(x) and f(−x) ≥ −f(x) for all x ∈ Rⁿ, that is, f(−x) = −f(x) for all x ∈ Rⁿ. By Proposition 1104, f is then linear. The converse is trivial. □

Inequality (24.8) delivers an interesting sandwich.¹⁰

8 Note the analogy with (14.5), obviously due to the sublinearity of the norm.
9 Note that the argument is analogous to the one used in the proof of Proposition 534.
10 Recall that (Rⁿ)' denotes the dual space of Rⁿ, i.e., the collection of all linear functions on Rⁿ (Section 13.1.2).

Proposition 1106 Let f : Rⁿ → R be superlinear and let l ∈ (Rⁿ)'. Then,

    f(x) ≤ l(x)  ∀x ∈ Rⁿ  ⟺  f(x) ≤ l(x) ≤ −f(−x)  ∀x ∈ Rⁿ    (24.9)

In words, a linear function l dominates f pointwise if and only if it is pointwise sandwiched between f and g, where g : Rⁿ → R is the dual sublinear function of f defined by g(x) = −f(−x).

Proof Let l ∈ (Rⁿ)' and suppose that f(x) ≤ l(x) for all x ∈ Rⁿ. Let x ∈ Rⁿ. Then, we have both f(x) ≤ l(x) and f(−x) ≤ l(−x), which in turn implies f(x) ≤ l(x) = −l(−x) ≤ −f(−x). This proves (24.9), the converse implication being trivial. □

24.2.2 Monotone operators and the law of demand

An operator g = (g₁, ..., gₙ) : C ⊆ Rⁿ → Rⁿ is said to be monotone (decreasing) if

    (g(x) − g(y)) · (x − y) = ∑_{i=1}^n (gᵢ(x) − gᵢ(y))(xᵢ − yᵢ) ≤ 0  ∀x, y ∈ C    (24.10)

and strictly monotone (decreasing) if the inequality (24.10) is strict when x ≠ y.

The reader can verify that for n = 1 we obtain again the usual notions of monotonicity. Moreover, if g is monotone and the vectors x and y have equal components, except for an index i, then

    xᵢ > yᵢ ⟹ gᵢ(x) ≤ gᵢ(y)    (24.11)

because in this case (g(x) − g(y)) · (x − y) = (gᵢ(x) − gᵢ(y))(xᵢ − yᵢ).

Proposition 1107 Let g : C ⊆ Rⁿ → Rⁿ be a continuously differentiable operator defined on an open convex set. Then,

(i) g is monotone if and only if the Jacobian matrix Dg(x) is negative semidefinite for all x ∈ C;

(ii) g is strictly monotone if the Jacobian matrix Dg(x) is negative definite for all x ∈ C.

Proof We only prove (i) and leave (ii) to the reader. Suppose that g is monotone. Let x ∈ C and y ∈ Rⁿ. Then, for a scalar h > 0 small enough we have (g(x + hy) − g(x)) · ((x + hy) − x) ≤ 0. Since g is continuously differentiable, we have

    0 ≥ lim_{h→0⁺} (g(x + hy) − g(x)) · ((x + hy) − x)/h² = lim_{h→0⁺} [(g(x + hy) − g(x))/h] · y = Dg(x)y · y

Since this holds for any y ∈ Rⁿ, we conclude that Dg(x) is negative semidefinite.

Conversely, suppose that Dg(x) is negative semidefinite at all x ∈ C. Let x₁, x₂ ∈ C and define φ : [0, 1] → R by

    φ(t) = (x₁ − x₂) · (g(tx₁ + (1−t)x₂) − g(x₂))

To prove that g is monotone it is enough to show that φ(1) ≤ 0. But φ(0) = 0 and φ is decreasing since, for all t ∈ (0, 1),

    φ'(t) = (x₁ − x₂) · Dg(tx₁ + (1−t)x₂)(x₁ − x₂) ≤ 0

Hence, φ(1) ≤ φ(0) = 0. □

Example 1108 Consider an affine operator f : Rⁿ → Rⁿ given by f(x) = Ax + b, where A is a symmetric n × n matrix and b ∈ Rⁿ. By the last result, f is monotone if and only if A is negative semidefinite, and it is strictly monotone if and only if A is negative definite. N
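A small numerical illustration of ours (assuming NumPy): for a symmetric negative definite A, the affine operator x ↦ Ax + b satisfies the strict version of (24.10).

```python
import numpy as np

A = np.array([[-2.0, 1.0],
              [1.0, -3.0]])              # symmetric, negative definite
b = np.array([1.0, 0.0])
print(np.linalg.eigvalsh(A))             # both eigenvalues are negative

g = lambda v: A @ v + b
rng = np.random.default_rng(0)
for _ in range(3):
    x, y = rng.normal(size=2), rng.normal(size=2)
    print((g(x) - g(y)) @ (x - y))       # always < 0 when x != y
```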

A market demand function D : Rⁿ₊ → Rⁿ₊ (Section 18.8) is a strictly monotone operator if

    (D(p) − D(p')) · (p − p') < 0  ∀p, p' ≥ 0, p ≠ p'

that is, if it satisfies the law of demand. In this case, (24.11) takes the strict version

    pᵢ > p'ᵢ ⟹ Dᵢ(p) < Dᵢ(p')

which means that, ceteris paribus, a higher price of good i results in a lower demand for this good. In sum, monotonicity formalizes a key economic concept. Its Jacobian characterization, established in the last proposition, plays an important role in demand theory.

Finally, we have a dual notion of increasing monotonicity when the inequality (24.10) is reversed.

24.3 Multivariable case

Concave functions of several variables have important differential properties. Armed with what we learned in the Intermezzo, we now study them.

A caveat: unless otherwise stated, in the rest of this section C denotes an open and convex set in Rⁿ. This assumption eases the exposition but, in view of what we did in the scalar case, readers should be able to easily extend the analysis to any convex set.

24.3.1 Derivability and differentiability

We begin by studying directional derivatives, which continue to play a key role also in the multivariable case. We introduce them for functions defined on an open set U.

Definition 1109 A function f : U → R is said to be derivable from the right at a point x ∈ U along the direction y ∈ Rⁿ if the limit

    f'_+(x; y) = lim_{h→0⁺} (f(x + hy) − f(x))/h    (24.12)

exists and is finite. This limit is called the directional right derivative of f at x along the direction y.

The function f'_+(x; ·) : Rⁿ → R is called the directional right derivative of f at x.

In a similar manner, by considering h → 0⁻ we can define the directional left derivative f'_-(x; ·) : Rⁿ → R of f at x. Clearly, f is derivable at x if and only if it is both left and right derivable at x with f'_-(x; ·) = f'_+(x; ·). In this case, we have f'(x; ·) = f'_+(x; ·) = f'_-(x; ·). The following duality result between the two one-sided directional derivative functions is useful.

Proposition 1110 If a function f : U → R is derivable at x ∈ U from one side, then it is derivable also from the other side. In this case

    f'_-(x; y) = −f'_+(x; −y)  ∀y ∈ Rⁿ    (24.13)

This result implies, inter alia, that f'_+(x; ·) is superlinear if and only if f'_-(x; ·) is sublinear.

Proof Assume that f is derivable from the right at x ∈ U. For each y ∈ Rⁿ we then have:

    −f'_+(x; −y) = −lim_{h→0⁺} (f(x + h(−y)) − f(x))/h = −lim_{h→0⁺} (f(x + (−h)y) − f(x))/h
                 = lim_{h→0⁻} (f(x + hy) − f(x))/h = f'_-(x; y)

So, f is derivable from the left at x, and (24.13) holds. A similar argument shows that derivability from the left yields derivability from the right. □

Next we collect a few important properties of one-sided directional derivatives.

Proposition 1111 Let f : C → R be concave. Then,

(i) the right f'_+(x; ·) : Rⁿ → R and left f'_-(x; ·) : Rⁿ → R directional derivatives exist at each x ∈ C;

(ii) the right directional derivative f'_+(x; ·) : Rⁿ → R is superlinear at each x ∈ C;

(iii) the left directional derivative f'_-(x; ·) : Rⁿ → R is sublinear at each x ∈ C;

(iv) f'_+(x; ·) ≤ f'_-(x; ·) for each x ∈ C.

The proof relies on the following lemma, which shows that the difference quotient is decreasing.

Lemma 1112 Let f : C → R be concave. Given any x ∈ C and y ∈ Rⁿ, the function

    h ↦ (f(x + hy) − f(x))/h    (24.14)

is decreasing on the set of scalars h > 0 such that x + hy ∈ C.

Proof Let x ∈ C. Assume first that x = 0 and f(0) = 0. Fix y ∈ Rⁿ and let 0 < h₁ < h₂. By concavity,

    f(h₁y) = f((h₁/h₂) h₂y + (1 − h₁/h₂) 0) ≥ (h₁/h₂) f(h₂y) + (1 − h₁/h₂) f(0) = (h₁/h₂) f(h₂y)

and so f(h₁y)/h₁ ≥ f(h₂y)/h₂. For the general case, define g : C − x → R by g(z) = f(z + x) − f(x) for all z ∈ C − x. Then, g(0) = 0 and g(hy)/h = (f(x + hy) − f(x))/h. We conclude that the difference quotient (24.14) has the desired property. □

Proof of Proposition 1111 (i) In view of Proposition 1110, we can focus on the right derivative function f'_+(x; ·) : Rⁿ → R. By Lemma 1112, the difference quotient is decreasing in h, so the limit (24.12) exists, with

    lim_{h→0⁺} (f(x + hy) − f(x))/h = sup_{h>0} (f(x + hy) − f(x))/h

It remains to show that it is finite. Since C is open, there is ε > 0 such that x − εy ∈ C. Writing x = (ε/(ε+h))(x + hy) + (h/(ε+h))(x − εy), concavity yields f(x) ≥ (ε/(ε+h)) f(x + hy) + (h/(ε+h)) f(x − εy), that is,

    (f(x + hy) − f(x))/h ≤ (f(x) − f(x − εy))/ε

so the limit f'_+(x; y) is finite for all y ∈ Rⁿ.

(ii) The proof of the positive homogeneity of f'_+(x; ·) is analogous to that of the homogeneity of f'(x; ·) in Corollary 967. As to superadditivity, note that for each λ ∈ [0, 1] we have x + h(λy₁ + (1−λ)y₂) = λ(x + hy₁) + (1−λ)(x + hy₂), so concavity gives

    (f(x + h(λy₁ + (1−λ)y₂)) − f(x))/h ≥ [λ(f(x + hy₁) − f(x)) + (1−λ)(f(x + hy₂) − f(x))]/h

Taking limits as h → 0⁺, this implies that f'_+(x; ·) : Rⁿ → R is concave. Hence,

    f'_+(x; y₁ + y₂) = f'_+(x; 2(y₁ + y₂)/2) = 2 f'_+(x; (y₁ + y₂)/2)
                     ≥ 2 [f'_+(x; y₁)/2 + f'_+(x; y₂)/2] = f'_+(x; y₁) + f'_+(x; y₂)

This shows that f'_+(x; ·) : Rⁿ → R is superadditive, and so superlinear.

(iii) By Proposition 1110, it follows from point (ii).

(iv) Since f'_+(x; ·) : Rⁿ → R is superlinear, by Proposition 1104 we have f'_+(x; y) ≤ −f'_+(x; −y) for each y ∈ Rⁿ. Since −f'_+(x; −y) = f'_-(x; y) by Proposition 1110, the result follows. □
0

The last result leads to interesting characterization of derivability via one-sided derivative
functions.

Corollary 1113 Let f : C ! R be concave. Given x 2 C, the following properties are


equivalent:
764 CHAPTER 24. CONCAVITY AND DIFFERENTIABILITY

(i) f is derivable at x;

(ii) f+0 (x; ) = f 0 (x; );

(iii) f+0 (x; ) : Rn ! R is linear;

(iv) f 0 (x; ) : Rn ! R is linear.

In this case, the directional derivative function f 0 (x; ) : Rn ! R is linear, with

f 0 (x; y) = rf (x) y 8y 2 Rn (24.15)

A concave function derivable at a point has, thus, a linear directional derivative function
represented via the inner product (24.15). Since, in general, the directional derivative func-
tion is only homogeneous (Corollary 967), it is a further noteworthy property of concavity
that the much stronger property of linearity, with its inner product representation, holds.

Proof (iv) implies (iii). Assume that f'_-(x; ·) : Rⁿ → R is linear. By (24.13), we have, for all y, y' ∈ Rⁿ and all α, β ∈ R,

    f'_+(x; αy + βy') = −f'_-(x; −αy − βy') = −(−α f'_-(x; y) − β f'_-(x; y')) = α f'_-(x; y) + β f'_-(x; y')

In particular, taking α = 1 and β = 0 gives f'_+(x; ·) = f'_-(x; ·), so f'_+(x; ·) : Rⁿ → R is linear.

(iii) implies (ii). Assume that f'_+(x; ·) : Rⁿ → R is linear. Since f'_+(x; ·) ≤ f'_-(x; ·), for each y ∈ Rⁿ we have

    f'_+(x; y) ≤ f'_-(x; y) = −f'_+(x; −y) = f'_+(x; y)

where the first equality follows from (24.13) and the second one from linearity. This proves that f'_+(x; ·) = f'_-(x; ·).

(ii) implies (i). Assume that f'_+(x; ·) = f'_-(x; ·). For each y ∈ Rⁿ we have

    lim_{h→0⁺} (f(x + hy) − f(x))/h = f'_+(x; y) = f'_-(x; y) = lim_{h→0⁻} (f(x + hy) − f(x))/h

and so the bilateral limit

    f'(x; y) = lim_{h→0} (f(x + hy) − f(x))/h

exists and is finite. We conclude that f is derivable at x.

(i) implies (iv). Assume that f is derivable at x. In view of Proposition 1111, the directional derivative function f'(x; ·) : Rⁿ → R is both superlinear, being equal to f'_+(x; ·), and sublinear, being equal to f'_-(x; ·); by Corollary 1105 it is then linear. Thus, f'_-(x; ·) : Rⁿ → R is linear. This completes the proof of the equivalence among conditions (i)-(iv).

Finally, assume that f is derivable (so, partially derivable) at x. By what was just proved, f'(x; ·) : Rⁿ → R is linear. By Riesz's Theorem, there is a vector α ∈ Rⁿ such that f'(x; y) = α · y for every y ∈ Rⁿ. Then,

    ∂f(x)/∂xᵢ = f'(x; eⁱ) = α · eⁱ = αᵢ  ∀i = 1, ..., n

Thus, α = ∇f(x). □

A remarkable property of concave functions of several variables is that for them partial derivability and differentiability are equivalent notions.

Theorem 1114 Let f : C → R be concave. Given x ∈ C, the following properties are equivalent:

(i) f is partially derivable at x;

(ii) f is derivable at x;

(iii) f is differentiable at x.

Compared to Theorem 954, here the continuity of the partial derivatives is not required. Thus, for concave functions we recover the remarkable equivalence between derivability and differentiability that holds for scalar functions but fails, in general, for functions of several variables. This is another sign of the great analytical convenience of concavity.

Proof It is enough to prove that (i) implies (ii) and that (ii) implies (iii), since (iii) implies (i) by Theorem 952.

(i) implies (ii). Suppose f is partially derivable at x. Then, f'_+(x; eⁱ) = f'_-(x; eⁱ) for each versor eⁱ of Rⁿ. By Proposition 1111, f'_+(x; ·) is superlinear and f'_-(x; ·) is sublinear. So, f'_+(x; 0) = f'_-(x; 0) = 0. Let 0 ≠ y ∈ Rⁿ₊. Since f'_+(x; eⁱ) = f'_-(x; eⁱ), we have:

    f'_+(x; y) = f'_+(x; ∑ᵢ yᵢeⁱ) ≥ ∑ᵢ yᵢ f'_+(x; eⁱ) = ∑ᵢ yᵢ f'_-(x; eⁱ) ≥ f'_-(x; ∑ᵢ yᵢeⁱ) = f'_-(x; y)

where the first inequality uses the superlinearity of f'_+(x; ·) and the last one the sublinearity of f'_-(x; ·). Since f'_+(x; y) ≤ f'_-(x; y) by Proposition 1111, we conclude that f'_+(x; ·) = f'_-(x; ·) on Rⁿ₊. A similar argument, based on f'_+(x; −eⁱ) = f'_-(x; −eⁱ), shows that f'_+(x; ·) = f'_-(x; ·) on Rⁿ₋. Now let y ∈ Rⁿ. Define the positive vectors y⁺ = max{y, 0} and y⁻ = −min{y, 0}, so that y = y⁺ − y⁻. We have

    f'_+(x; y) = f'_+(x; y⁺ − y⁻) ≥ f'_+(x; y⁺) + f'_+(x; −y⁻) = f'_-(x; y⁺) + f'_-(x; −y⁻) ≥ f'_-(x; y⁺ − y⁻) = f'_-(x; y)

By Proposition 1111, we conclude that f'_+(x; y) = f'_-(x; y). In turn, this implies f'_+(x; ·) = f'_-(x; ·) on Rⁿ. By Corollary 1113, f is derivable at x.

(ii) implies (iii). Suppose f is derivable at x. To show that f is differentiable at x, in view of the last corollary we need to show that

    lim_{h→0} (f(x + h) − f(x) − ∇f(x) · h)/‖h‖ = 0

We omit this non-trivial part of the proof. □

24.3.2 A key inequality

To state the multivariable version of the key inequality (24.6), we take a closer look at multivariable concavity. Intuitively, the concavity of a function f : C → R defined on a convex set of Rⁿ is closely related to its concavity on all line segments {tx + (1−t)y : t ∈ [0, 1]} determined by vectors x and y that belong to C. Proposition 1116 will make this intuition precise; it is important both conceptually, to better understand the scope of concavity, and operationally, since the restrictions of f to line segments are scalar functions, in general much easier to study than the original function f.

Given a convex set C and x, y ∈ C, set C_{x,y} = {t ∈ R : (1−t)x + ty ∈ C}. That is, C_{x,y} is the set of all t values such that (1−t)x + ty ∈ C. Clearly, [0, 1] ⊆ C_{x,y}. Moreover, we have the following property (under our maintained hypothesis that C is an open convex set), as the reader can prove.

Lemma 1115 C_{x,y} is an open interval.

Define φ_{x,y} : C_{x,y} → R by

    φ_{x,y}(t) = f((1−t)x + ty)    (24.16)

Proposition 1116 For a function f : C → R, the following properties are equivalent:

(i) f is concave (resp., strictly concave);

(ii) φ_{x,y} is concave (resp., strictly concave) for all x, y ∈ C;

(iii) φ_{x,y} is concave (resp., strictly concave) on [0, 1] for all x, y ∈ C.

Proof We consider the concave case, and leave the strictly concave one to the reader. (i) implies (ii). Suppose f is concave. Let x, y ∈ C and t₁, t₂ ∈ C_{x,y}. Then, for each λ ∈ [0, 1],

    φ_{x,y}(λt₁ + (1−λ)t₂) = f((1 − (λt₁ + (1−λ)t₂))x + (λt₁ + (1−λ)t₂)y)
                           = f(λ((1−t₁)x + t₁y) + (1−λ)((1−t₂)x + t₂y))
                           ≥ λf((1−t₁)x + t₁y) + (1−λ)f((1−t₂)x + t₂y)
                           = λφ_{x,y}(t₁) + (1−λ)φ_{x,y}(t₂)

and so φ_{x,y} is concave.

Since (ii) trivially implies (iii), it remains to prove that (iii) implies (i). Let x, y ∈ C. Since φ_{x,y} is concave on [0, 1], we have

    f((1−t)x + ty) = φ_{x,y}(t) ≥ (1−t)φ_{x,y}(0) + tφ_{x,y}(1) = (1−t)f(x) + tf(y)

for all t ∈ [0, 1], as desired. □

The previous result permits us to establish the sought-after multivariable inequality.

Theorem 1117 Let f : C → R be differentiable at x ∈ C. If f is concave, then

    f(y) ≤ f(x) + ∇f(x) · (y − x)  ∀y ∈ C    (24.17)

Proof Let f be concave. Fix x, y ∈ C. Let φ_{x,y} : C_{x,y} → R be given by (24.16). By Lemma 1115, C_{x,y} is an open interval and, by Proposition 1116, the function φ_{x,y} is concave on C_{x,y}. Hence,¹¹

    φ'_+(0) = lim_{ε→0⁺} (φ(ε) − φ(0))/ε = lim_{ε→0⁺} (f(x + ε(y − x)) − f(x))/ε = f'_+(x; y − x)

and, similarly, φ'_-(0) = f'_-(x; y − x). Since f is differentiable at x, we have f'_+(x; y − x) = f'_-(x; y − x) = f'(x; y − x), so φ is differentiable at 0 ∈ C_{x,y}. Since [0, 1] ⊆ C_{x,y}, by (24.6) we have

    f(y) = φ(1) ≤ φ(0) + φ'(0) = f(x) + f'(x; y − x) = f(x) + ∇f(x) · (y − x)

where the last equality uses (24.15) (Theorem 970). So, inequality (24.17) holds. □

24.3.3 Concavity criteria

So far we have considered the differentiability properties of concave functions of several variables. We now change angle and ask whether, given a differentiable function of several variables, there exist criteria based on differentiability that allow us to determine whether the function is concave. For instance, is there a multivariable counterpart of the decreasing monotonicity of the first derivative?

The key inequality (24.17) permits us to establish a first differential characterization of concavity that extends Theorem 1096 to functions of several variables.

Theorem 1118 Let f : C → R be differentiable. Then, f is concave if and only if

    f(y) ≤ f(x) + ∇f(x) · (y − x)  ∀x, y ∈ C    (24.18)

while f is strictly concave if and only if inequality (24.18) is strict when x ≠ y.

The right-hand side of (24.18) is the linear approximation of f at x; geometrically, it is the hyperplane tangent to f at x, that is, the multivariable version of the tangent line. By this theorem, such an approximation is from above, that is, the tangent hyperplane always lies above the graph of a concave function. The differential characterizations of concavity discussed in the previous sections for scalar functions thus nicely extend to functions of several variables.

Proof The "only if" follows from (24.17). As to the converse, suppose that (24.18) holds. For each x ∈ C, consider the function F_x : C → R given by F_x(y) = f(x) + ∇f(x) · (y − x). By (24.18), f(y) ≤ F_x(y) for all x, y ∈ C. Since F_x(x) = f(x), we conclude that f(y) = min_{x∈C} F_x(y) for each y ∈ C. Since each F_x is affine, f is concave because, as the reader can check, the pointwise minimum of a family of concave functions is concave. □

Though conceptually important, the previous differential characterization of concavity is less useful operationally. In this regard, the next result is more useful in that it establishes the multivariable counterpart of the decreasing monotonicity of the first derivative that, in the scalar case, characterizes concavity (Corollary 1099). For functions of several variables, the derivative function f' becomes the derivative operator ∇f : C → Rⁿ (Section 21.1.3).

11 To ease notation, in the rest of the proof we write φ in place of φ_{x,y}.

Theorem 1119 Let f : C → R be differentiable. Then,

(i) f is concave if and only if the derivative operator ∇f : C → Rⁿ is monotone, i.e.,

    (∇f(y) − ∇f(x)) · (y − x) ≤ 0  ∀x, y ∈ C    (24.19)

(ii) f is strictly concave if and only if ∇f : C → Rⁿ is strictly monotone, i.e., the previous inequality is strict if x ≠ y.

Proof (i) Suppose f is concave. Let x, y ∈ C. By (24.18),

    f(y) ≤ f(x) + ∇f(x) · (y − x)  and  f(x) ≤ f(y) + ∇f(y) · (x − y)

So, ∇f(x) · (x − y) ≤ f(x) − f(y) ≤ ∇f(y) · (x − y). In turn, this implies (∇f(x) − ∇f(y)) · (x − y) ≤ 0, and we conclude that ∇f : C → Rⁿ is monotone decreasing.

Conversely, suppose ∇f : C → Rⁿ is monotone decreasing, i.e., (24.19) holds. Suppose first that n = 1. Let x ∈ C, and define θ_x : C → R by θ_x(y) = f(y) − f(x) − ∇f(x)(y − x). Then, θ'_x(y) = ∇f(y) − ∇f(x), and so θ'_x(y) ≥ 0 if y < x and θ'_x(y) ≤ 0 if y > x. Hence, θ_x has a maximum at x, i.e.,

    0 = θ_x(x) ≥ θ_x(y) = f(y) − f(x) − ∇f(x)(y − x)  ∀y ∈ C

Since x was arbitrary, we conclude that f(y) ≤ f(x) + ∇f(x)(y − x) for all x, y ∈ C. By Theorem 1118, f is concave. This completes the proof for n = 1.

Suppose now that n > 1. Let x, y ∈ C and let φ_{x,y} : C_{x,y} → R be given by (24.16). By Lemma 1115, C_{x,y} is an open interval, with [0, 1] ⊆ C_{x,y}. Then, φ_{x,y} is differentiable on C_{x,y}, with

    φ'_{x,y}(t) = ∇f((1−t)x + ty) · (y − x)  ∀t ∈ C_{x,y}    (24.20)

Let t₁ ≤ t₂ in C_{x,y}. Since ∇f is monotone and ((1−t₁)x + t₁y) − ((1−t₂)x + t₂y) = (t₂ − t₁)(x − y), we have

    (t₂ − t₁)(∇f((1−t₁)x + t₁y) − ∇f((1−t₂)x + t₂y)) · (x − y) ≤ 0

and so, by (24.20),

    φ'_{x,y}(t₁) − φ'_{x,y}(t₂) = (∇f((1−t₁)x + t₁y) − ∇f((1−t₂)x + t₂y)) · (y − x) ≥ 0

that is, φ'_{x,y} is decreasing on C_{x,y}. By what we have already proved, φ_{x,y} is then concave, and so

    f((1−t)x + ty) = φ_{x,y}(t) ≥ (1−t)φ_{x,y}(0) + tφ_{x,y}(1) = (1−t)f(x) + tf(y)

which shows that f is concave.

(ii) For simplicity, we consider the case n = 1 and leave to the reader the extension to n > 1, with the help of Proposition 1116. Suppose f is strictly concave. Since f is concave, f' is decreasing by (i); it remains to show that it is strictly decreasing. Let x₁, x₂ ∈ C with x₁ < x₂ and suppose, by contradiction, that f'(x₁) = f'(x₂) = δ. We have, for i = 1, 2,

    f(x) ≤ f(xᵢ) + δ(x − xᵢ)  ∀x ∈ C    (24.21)

In particular, f(x₂) ≤ f(x₁) + δ(x₂ − x₁) and f(x₁) ≤ f(x₂) + δ(x₁ − x₂), so that

    f(x₂) − f(x₁) ≤ δ(x₂ − x₁) ≤ f(x₂) − f(x₁)

which implies f(x₂) − f(x₁) = δ(x₂ − x₁). Given λ ∈ (0, 1), by (24.21) we then have

    f(λx₁ + (1−λ)x₂) ≤ f(x₁) + δ(1−λ)(x₂ − x₁) = λf(x₁) + (1−λ)f(x₂)

which contradicts strict concavity.

Conversely, suppose f' is strictly decreasing. Then, by (i) the function f is concave. It remains to show that it is strictly concave. Suppose, by contradiction, that there exist x₁, x₂ ∈ C, with x₁ < x₂, and λ̄ ∈ (0, 1) such that f((1−λ̄)x₁ + λ̄x₂) = (1−λ̄)f(x₁) + λ̄f(x₂). Define ψ : [0, 1] → R by ψ(λ) = f((1−λ)x₁ + λx₂). Then, ψ is concave and continuous, with ψ(λ̄) = (1−λ̄)ψ(0) + λ̄ψ(1). This implies ψ(λ) = (1−λ)ψ(0) + λψ(1) for all λ ∈ [0, 1]. Then,

    f'(x₁) = lim_{λ↓0} (f((1−λ)x₁ + λx₂) − f(x₁))/((1−λ)x₁ + λx₂ − x₁) = (f(x₂) − f(x₁))/(x₂ − x₁)

and

    f'(x₂) = lim_{λ↑1} (f((1−λ)x₁ + λx₂) − f(x₂))/((1−λ)x₁ + λx₂ − x₂) = (f(x₁) − f(x₂))/(x₁ − x₂)

so that f'(x₁) = f'(x₂), a contradiction. □
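A numerical sketch of ours (NumPy assumed) of the monotonicity condition (24.19) for the concave function f(x) = −‖x‖², whose derivative operator is ∇f(x) = −2x:

```python
import numpy as np

grad = lambda v: -2.0 * v                # gradient of f(x) = -|x|^2
rng = np.random.default_rng(0)
for _ in range(3):
    x, y = rng.normal(size=3), rng.normal(size=3)
    # (grad f(y) - grad f(x)) . (y - x) = -2|y - x|^2 <= 0
    print((grad(y) - grad(x)) @ (y - x))
```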

A dual result, with the opposite inequality, characterizes convex functions. The next result makes this characterization truly operational via a negativity condition on the Hessian matrix ∇²f(x) of f – that is, the matrix of the second partial derivatives of f – which generalizes the condition f''(x) ≤ 0 of Corollary 1101. In other words, in the general case the role of the second derivative is played by the Hessian matrix.

Proposition 1120 Let f : C → R be twice continuously differentiable. Then:

(i) f is concave if and only if ∇²f(x) is negative semidefinite for every x ∈ C;

(ii) f is strictly concave if ∇²f(x) is negative definite for every x ∈ C.

Proof The result follows from Proposition 1107 once one remembers that the Hessian matrix of a function of several variables is the Jacobian matrix of its derivative operator (Exercise 975). So, the Hessian matrix ∇²f(x) of f is the Jacobian matrix of the derivative operator ∇f : C → Rⁿ, which plays here the role of g in Proposition 1107. □

This is the most useful differential criterion to establish the concavity and strict concavity of functions of several variables. Naturally, dual results hold for convex functions, which are characterized by positive semidefinite Hessian matrices.

Example 1121 In Example 1061 we considered the function f : R³ → R given by

    f(x₁, x₂, x₃) = x₁² + 2x₂² + x₃² + (x₁ + x₃)x₂

and we saw that its Hessian matrix is positive definite. By Proposition 1120, f is strictly convex. N

Example 1122 Consider the CES production function f : R²₊ → R defined, as in Example 705, by

    f(x) = (αx₁^ρ + (1−α)x₂^ρ)^{1/ρ}

with α ∈ [0, 1] and ρ > 0. Some tedious algebra shows that the Hessian matrix is

    ∇²f(x) = α(1−α)(ρ−1) t^{1/ρ−2} x₁^{ρ−2} x₂^{ρ−2} H

where t = αx₁^ρ + (1−α)x₂^ρ and

    H = [  x₂²   −x₁x₂ ]
        [ −x₁x₂   x₁²  ]

If ξ = (ξ₁, ξ₂), we have

    ξ · Hξ = x₂²ξ₁² − 2x₁x₂ξ₁ξ₂ + x₁²ξ₂² = (x₂ξ₁ − x₁ξ₂)² ≥ 0

Thus, the matrix H is positive semidefinite. It follows that for ρ > 1 the matrix ∇²f(x) is positive semidefinite for all x₁, x₂ > 0, so by Proposition 1120 f is convex, while f is concave when 0 < ρ < 1.

In Corollary 711 we already established the concavity of the CES functions without doing any calculation. Readers can compare the pros and cons of the two approaches. N
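The "tedious algebra" can also be delegated to a computer algebra system. A sketch of ours (SymPy assumed), for the concave case α = 1/2 and ρ = 1/3: at any sample point the Hessian has one eigenvalue equal to zero (the CES function is homogeneous of degree 1, so its Hessian is singular along rays) and one negative eigenvalue, i.e., it is negative semidefinite.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
alpha, rho = sp.Rational(1, 2), sp.Rational(1, 3)     # 0 < rho < 1: concave case
f = (alpha * x1**rho + (1 - alpha) * x2**rho) ** (sp.Integer(1) / rho)

H = sp.hessian(f, (x1, x2))
Hnum = H.subs({x1: 2, x2: 3}).evalf()
print(Hnum.eigenvals())       # one eigenvalue ~ 0, the other negative
```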

24.4 Ultramodular functions

The monotonicity of increments is a key economic characterization of concave and convex scalar functions (Section 24.1.1). Unfortunately, such a characterization no longer holds for functions of several variables, as will be seen momentarily. This motivates the next definition.

Definition 1123 A function f : I ⊆ Rⁿ → R is said to be ultramodular if, for all x, y ∈ I with x ≤ y and for all h ≥ 0, we have

    f(x + h) − f(x) ≤ f(y + h) − f(y)

provided x + h, y + h ∈ I, while it is said to be inframodular if the inequality is reversed.

In words, ultramodular functions exhibit increasing differences – so increasing marginal effects, like scalar convex functions. Unlike the weaker Definition 756, they do not consider such differences only across different variables, but allow for any possible increase h ≥ 0. Similarly, inframodular functions exhibit decreasing differences, so decreasing marginal effects like scalar concave functions (Proposition 1090). Clearly, f is ultramodular if and only if −f is inframodular, so the two properties are dual and results stated for one are easily translated for the other.

Ultramodular functions are supermodular. Indeed, in view of the equality (17.1), we can set h = x∨y − y = x − x∧y ≥ 0. So, if f is ultramodular, since x∧y ≤ y we have

    f(x) − f(x∧y) = f(x∧y + h) − f(x∧y) ≤ f(y + h) − f(y) = f(x∨y) − f(y)

which implies that f is supermodular. The converse is false: for instance, the function f(x₁, x₂) = √(x₁x₂) is supermodular but not ultramodular (Example 1128).

The next result further clarifies the relations between supermodularity and ultramodularity.

Theorem 1124 Let f : [a, b] ⊆ Rⁿ → R. If f is supermodular and separately convex,¹² then f is ultramodular. The converse holds provided f is locally bounded from below at a.

Proof Let f be ultramodular. Then, it is supermodular. Moreover, it is easy to check that each section f(·, x₋ᵢ) : [aᵢ, bᵢ] → R is ultramodular; by (the dual, convex version of) Proposition 1090, each such section is then convex. We omit the proof of the converse. □

In Section 24.3 we learned about the remarkable differential properties of concave functions. It is useful to compare them with those of inframodular functions, which are also sharp (inframodular functions are, indeed, much better behaved than submodular functions).¹³ A first important result is that, as for concave functions (Theorem 1114), also for inframodular functions partial derivability is equivalent to differentiability.¹⁴

Proposition 1125 A bounded and inframodular function f : (a, b) ⊆ Rⁿ → R is partially derivable if and only if it is differentiable.

Next, we consider a differential criterion for inframodularity.

Proposition 1126 Let f : (a, b) ⊆ Rⁿ → R be partially derivable. Then, f is inframodular if and only if the derivative operator ∇f : (a, b) ⊆ Rⁿ → Rⁿ is decreasing, i.e.,

    x ≤ y ⟹ ∇f(x) ≥ ∇f(y)  ∀x, y ∈ (a, b)

Now a "plain vanilla" monotonicity of the gradient characterizes inframodularity, whereas a special kind of operator monotonicity characterized concavity (Theorem 1119).

Proposition 1127 Let f : (a, b) ⊆ Rⁿ → R be twice continuously differentiable. Then, f is inframodular if and only if the Hessian matrix ∇²f(x) is negative, i.e.,

    ∂²f(x)/∂xᵢ∂xⱼ ≤ 0  ∀i, j = 1, ..., n

Again, a plain vanilla negativity condition on the entries of the Hessian matrix characterizes inframodularity, while for concave functions we needed a notion of negativity based on quadratic forms (Proposition 1120). Note that submodularity requires this negativity property only when i ≠ j. This differential characterization thus sheds further light on the relations between submodularity or supermodularity and inframodularity or ultramodularity.

The differential characterizations established in the last two results show that, unlike in the scalar case, inframodularity and concavity are quite unrelated properties in the multivariable case, as we remarked at the beginning of this section.

12 That is, each section f(·, x₋ᵢ) : [aᵢ, bᵢ] → R is convex in xᵢ.
13 We omit the proofs of these differentiability results (their inframodular, rather than ultramodular, focus will be self-explanatory).
14 In reading the result, recall from Section 2.3 that (a, b) = {x ∈ Rⁿ : aᵢ < xᵢ < bᵢ}.
Example 1128 (i) Define f : R²₊ → R by f(x₁, x₂) = x₁^{α₁} x₂^{α₂}, with α₁, α₂ > 0. This function is supermodular (Example 760). Its Hessian matrix is

∇²f(x) =
[ α₁(α₁ − 1) x₁^{α₁−2} x₂^{α₂}      α₁α₂ x₁^{α₁−1} x₂^{α₂−1} ]
[ α₁α₂ x₁^{α₁−1} x₂^{α₂−1}      α₂(α₂ − 1) x₁^{α₁} x₂^{α₂−2} ]

So, we conclude that f is ultramodular provided α₁, α₂ ≥ 1. (ii) In view of the previous point, the concave and supermodular function f : R²₊ → R defined by f(x₁, x₂) = √(x₁x₂) is neither ultramodular nor inframodular. (iii) The convex function f : R² → R defined by f(x₁, x₂) = log(e^{x₁} + e^{x₂}) is neither ultramodular nor inframodular: its Hessian matrix is

∇²f(x) =
[ e^{x₁+x₂}/(e^{x₁} + e^{x₂})²      −e^{x₁+x₂}/(e^{x₁} + e^{x₂})² ]
[ −e^{x₁+x₂}/(e^{x₁} + e^{x₂})²      e^{x₁+x₂}/(e^{x₁} + e^{x₂})² ]

whose diagonal entries are positive and whose off-diagonal entries are negative. This function is, however, submodular. N
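These entrywise sign conditions are easy to probe numerically for twice continuously differentiable functions. The following Python sketch is ours, not the book's: it evaluates the two Hessians of Example 1128 at sample points and tests the sign conditions of Proposition 1127 and of its ultramodular counterpart (the function names are our own).

    import numpy as np

    def hessian_cobb_douglas(x1, x2, a1, a2):
        # Hessian of f(x1, x2) = x1^a1 * x2^a2 on the open positive quadrant
        return np.array([
            [a1 * (a1 - 1) * x1**(a1 - 2) * x2**a2, a1 * a2 * x1**(a1 - 1) * x2**(a2 - 1)],
            [a1 * a2 * x1**(a1 - 1) * x2**(a2 - 1), a2 * (a2 - 1) * x1**a1 * x2**(a2 - 2)],
        ])

    def hessian_log_exp(x1, x2):
        # Hessian of f(x1, x2) = log(e^x1 + e^x2)
        d = np.exp(x1 + x2) / (np.exp(x1) + np.exp(x2)) ** 2
        return np.array([[d, -d], [-d, d]])

    def classify(H, tol=1e-12):
        off = H[~np.eye(H.shape[0], dtype=bool)]  # off-diagonal entries
        return {
            "supermodular (off-diag >= 0)": bool((off >= -tol).all()),
            "submodular (off-diag <= 0)": bool((off <= tol).all()),
            "ultramodular (all entries >= 0)": bool((H >= -tol).all()),
            "inframodular (all entries <= 0)": bool((H <= tol).all()),
        }

    print(classify(hessian_cobb_douglas(2.0, 3.0, 0.5, 0.5)))  # supermodular only
    print(classify(hessian_cobb_douglas(2.0, 3.0, 1.5, 2.0)))  # ultramodular
    print(classify(hessian_log_exp(0.5, 1.0)))                 # submodular only

Of course, a check at finitely many sample points can refute, but never prove, the global sign conditions.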
24.5 Global optimization

24.5.1 Sufficiency of the first-order condition

Though the first-order condition is in general only necessary, in Section 18.6 we saw that the maximizers of concave functions are necessarily global (Theorem 828). We may then expect that for concave functions the first-order condition comes to play a decisive role. Indeed, the results studied in this chapter allow us to show that for concave functions the first-order condition is also sufficient. In other words, a stationary point of a concave function is, necessarily, a global maximizer. It is a truly remarkable property of concave functions, a main reason behind their popularity.

To ease matters, we start by considering a scalar concave function f : (a, b) → R that is differentiable. The inequality (24.7), that is,

f(y) ≤ f(x) + f′(x)(y − x)   ∀x, y ∈ (a, b)

implies that a point x̂ ∈ (a, b) is a global maximizer if f′(x̂) = 0. Indeed, if x̂ ∈ (a, b) is such that f′(x̂) = 0, the inequality implies

f(y) ≤ f(x̂) + f′(x̂)(y − x̂) = f(x̂)   ∀y ∈ (a, b)

On the other hand, if x̂ ∈ (a, b) is a maximizer, it follows that f′(x̂) = 0 by Fermat's Theorem. Therefore:
Proposition 1129 Let f : (a, b) → R be a concave and differentiable function. A point x̂ ∈ (a, b) is a global maximizer of f on (a, b) if and only if f′(x̂) = 0.

Example 1130 (i) Consider the function f : R → R given by f(x) = −(x + 1)⁴ + 2. We have f″(x) = −12(x + 1)² ≤ 0. The function is concave on R and it is therefore sufficient to find a point where its first derivative is zero to find a maximizer. We have f′(x) = −4(x + 1)³. Hence f′ is zero only at x̂ = −1. The point x̂ = −1 is the unique global maximizer, and the maximum value of f on R is f(−1) = 2.
(ii) Consider the function f : R → R given by f(x) = x(1 − x). Because f′(1/2) = 0 and f″(x) = −2 < 0, the point x̂ = 1/2 is the unique global maximizer of f on R. N
The result easily extends to functions f : A ⊆ Rⁿ → R of several variables using the multivariable version (24.18) of inequality (24.7). We can therefore state the following general result.

Theorem 1131 Let f : C → R be a concave function differentiable on int C and continuous on C. A point x̂ of int C is a global maximizer of f on C if and only if ∇f(x̂) = 0.

Proof In view of Fermat's Theorem, we need to prove the “if” part, that is, sufficiency. So, let x̂ ∈ int C be such that ∇f(x̂) = 0. We want to show that x̂ is a global maximizer. By inequality (24.17), we have

f(y) ≤ f(x̂) + ∇f(x̂) · (y − x̂)   ∀y ∈ int C

Since f is continuous, the inequality is easily seen to hold for all y ∈ C. Since ∇f(x̂) = 0, we conclude that f(y) ≤ f(x̂) for all y ∈ C, as desired.
It is hard to overestimate the importance of this result in optimization theory, as we will learn later in the book in Section 28.

Example 1132 Consider the function f : R² → R given by f(x₁, x₂) = −(x₁ − 1)² − (x₂ + 3)² − 6. We have

∇²f(x₁, x₂) =
[ −2   0 ]
[  0  −2 ]

Since −2 < 0 and det ∇²f(x₁, x₂) = 4 > 0, the Hessian matrix is negative definite for every (x₁, x₂) ∈ R² and hence f is strictly concave. We have

∇f(x₁, x₂) = (−2(x₁ − 1), −2(x₂ + 3))

The unique point where the gradient is zero is (1, −3), which is, therefore, the unique global maximizer. The maximum value of f on R² is f(1, −3) = −6. N
Example 1133 In Section 18.9 we considered the least squares optimization problem

max_x g(x)  sub  x ∈ Rⁿ   (24.22)

with g : Rⁿ → R defined by g(x) = −‖Ax − b‖². We learned that if ρ(A) = n, then there is a unique solution x̂ (Theorem 854). In Section 19.4 we then noted, via the Projection Theorem, that such a solution is given by x̂ = (AᵀA)⁻¹Aᵀb. This can be established also from Theorem 1131. Indeed, ∇g(x) = −2Aᵀ(Ax − b), and so the first-order condition −2Aᵀ(Ax − b) = 0 can be written as a linear system

AᵀAx = Aᵀb   (24.23)

Since ρ(A) = n, by Proposition 582 we have ρ(AᵀA) = n, so the Gram matrix is invertible. By Cramer's Theorem, x̂ = (AᵀA)⁻¹Aᵀb is the unique solution of the linear system (24.23), and so by Theorem 1131 it is the only solution of the optimization problem (24.22). N
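As an illustration (our own sketch, not the book's), the normal equations (24.23) can be solved numerically and cross-checked against a library least-squares routine:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(10, 3))   # rho(A) = 3 with probability one
    b = rng.normal(size=10)

    # First-order condition of max -||Ax - b||^2: the normal equations A^T A x = A^T b
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)

    # Cross-check against the library solver
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(x_hat, x_lstsq))  # True: the stationary point is the global maximizer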
We close by noting that for scalar functions f : (a, b) → R, with C = (a, b), the last theorem also follows from Proposition 1024. That said, it is the last theorem that is used in applications, because of the conceptual and analytical appeal of concavity (cf. the discussion that ends Section 22.5.4).
24.5.2 A deeper result

A function f : C → R defined on a convex set of Rⁿ is called (weakly) concavifiable if there exists a concave function g : C → R that dominates f, that is, g ≥ f. As the examples below will show, concavifiability is a much weaker condition than concavity.

Proposition 1134 Let f : C → R be concavifiable. Then there exists a concave function co f : C → R such that

(i) co f ≥ f;
(ii) h ≥ co f for all concave functions h : C → R such that h ≥ f.

In words, a concavifiable function admits a smallest concave function that pointwise dominates it.
Proof Let {gᵢ}ᵢ∈I be the collection of all concave functions gᵢ : C → R such that gᵢ ≥ f. This collection is not empty because f is concavifiable. Define co f : C → R by

co f(x) = inf_{i∈I} gᵢ(x)

For each x ∈ C, the scalar f(x) is a lower bound for the set {gᵢ(x) : i ∈ I}. By the Least Upper Bound Principle, inf_{i∈I} gᵢ(x) exists, so the function co f is well defined. It is easily seen to be concave. Indeed, let α ∈ [0, 1] and x, y ∈ C. By Proposition 120, for each ε > 0 there exists i_ε such that

co f(αx + (1 − α)y) > g_{i_ε}(αx + (1 − α)y) − ε ≥ αg_{i_ε}(x) + (1 − α)g_{i_ε}(y) − ε ≥ α co f(x) + (1 − α) co f(y) − ε

Since this inequality holds for every ε > 0, we conclude that co f(αx + (1 − α)y) ≥ α co f(x) + (1 − α) co f(y), so co f is concave. In turn, being concave and such that co f ≥ f, the function co f belongs to the collection {gᵢ}ᵢ∈I, so the infimum is attained: co f(x) = min_{i∈I} gᵢ(x). In particular, co f satisfies properties (i) and (ii).
The function co f is called the concave envelope of f.

Example 1135 (i) Both the sine and cosine functions are concavifiable. Their concave envelope is constantly equal to 1, i.e., co sin(x) = co cos(x) = 1 for all x ∈ R. (ii) Let f : R → R be the Gaussian function f(x) = e^{−x²}. It is concavifiable, with

co f(x) = { f(x)   if x ∈ [−1/√2, 1/√2] ; e^{−1/2}   else }

(iii) The quadratic function is not concavifiable on the real line. (iv) Functions that have at least a global maximizer are automatically concavifiable: just take the function constantly equal to the maximum value. For instance, continuous supercoercive functions f : Rⁿ → R are concavifiable. N
Concavifiability permits us to generalize the fundamental Theorem 1131.

Theorem 1136 Let f : C → R be a concavifiable function differentiable on int C and continuous on C. A point x̂ of int C is a global maximizer of f on C if and only if ∇f(x̂) = 0 and co f(x̂) = f(x̂).

This remarkable result shows how concavity is deeply connected to global maximization, more than it may appear prima facie. It is a result, however, mostly of theoretical interest, because concave envelopes are, in general, not easy to compute. Indeed, Theorem 1131 can be regarded as its operational special case.

The proof relies on two elegant lemmas of independent interest.
Lemma 1137 Let f, g : C → R with g concave and g ≥ f. If f is differentiable at x ∈ int C and if g(x) = f(x), then g is differentiable at x with ∇f(x) = ∇g(x).

Proof Assume that f is differentiable at x ∈ int C. We have, for h small enough,

(g(x + hy) − g(x))/h ≥ (f(x + hy) − f(x))/h   ∀h > 0
(g(x + hy) − g(x))/h ≤ (f(x + hy) − f(x))/h   ∀h < 0

So, for all y ∈ Rⁿ we have:

g′₊(x; y) = lim_{h→0⁺} (g(x + hy) − g(x))/h ≥ lim_{h→0⁺} (f(x + hy) − f(x))/h
= lim_{h→0⁻} (f(x + hy) − f(x))/h ≥ lim_{h→0⁻} (g(x + hy) − g(x))/h = g′₋(x; y)

By Proposition 1111-(iv), we conclude that g′₊(x; ·) = g′₋(x; ·) = f′(x; ·). By Corollary 1113, we conclude that g is differentiable, as well as that ∇f(x) = ∇g(x).
Lemma 1138 Let f : C → R be concavifiable. If a point x̂ of C is a global maximizer of f on C, then it is a global maximizer of co f on C. In particular, co f(x̂) = f(x̂).

In words, global maximizers of concavifiable functions are global maximizers of their concave envelopes, and they share the same maximum value.

Proof Let x̂ ∈ C be a global maximizer of f on C. The function constantly equal to f(x̂) is a concave function that pointwise dominates f. So,

f(x̂) ≥ co f(x)   ∀x ∈ C   (24.24)

In particular, we then have f(x̂) ≥ co f(x̂) ≥ f(x̂), thus co f(x̂) = f(x̂). In view of (24.24), in turn this implies that co f(x̂) ≥ co f(x) for all x ∈ C, so x̂ is a global maximizer of co f on C.
Proof of Theorem 1136 “If”. By hypothesis, f is differentiable at x̂ ∈ int C. Since co f(x̂) = f(x̂), by Lemma 1137 the concave envelope is differentiable at x̂ with ∇ co f(x̂) = ∇f(x̂). So, ∇ co f(x̂) = 0. Since f is continuous, by proceeding as in the proof of Theorem 1131 we can show that inequality (24.17) implies that x̂ is a global maximizer of co f. Hence,

f(x̂) = co f(x̂) ≥ co f(x) ≥ f(x)   ∀x ∈ C

We conclude that x̂ is a global maximizer of f.
“Only if”. Let x̂ ∈ int C be a global maximizer of f on C. By Lemma 1138, x̂ is a global maximizer of co f on C, with co f(x̂) = f(x̂). By Lemma 1137, co f is differentiable at x̂ with ∇ co f(x̂) = ∇f(x̂). By Fermat's Theorem, ∇ co f(x̂) = 0. We conclude that ∇f(x̂) = 0.
In view of Lemma 1138, in optimization problems with convex choice sets – e.g., consumer problems, since budget sets are, typically, convex – in terms of value attainment one can assume that the objective function is concave. If in such problems we are only interested in the value functions, without any loss we can just deal with concave objective functions. This is no longer the case, however, if we are interested also in the solutions per se, i.e., in the solution correspondence. Indeed, in this regard Lemma 1138 only says that

arg max_{x∈C} f(x) ⊆ arg max_{x∈C} co f(x)

So, by replacing an objective function with its concave envelope we do not lose solutions, but we might well get intruders that solve the concavified problem but not the original one. To understand the scope of this issue, note that co(arg max_{x∈C} f(x)) ⊆ arg max_{x∈C} co f(x) because the solutions of a concave objective function form a convex set. Thus, the best one can hope for is that

co(arg max_{x∈C} f(x)) = arg max_{x∈C} co f(x)

Even in such a best case, there might well be many vectors that solve the optimization problem for the concave envelope co f but not for the original objective function f. We thus might end up overestimating the solution correspondence. For instance, if in a consumer problem we replace a utility function with its concave envelope, we do not lose any optimal bundle, but we might well get “extraneous” bundles, optimal for the concave envelope but not for the original utility function. For an analytical example, if we maximize the cosine function over the real line, the maximizers are the points x̂ = 2kπ with k ∈ Z (Example 780). If we replace the cosine function with its concave envelope, the maximizers become all the points of the real line. So, the solution set is vastly inflated. Still, the common maximum value is 1.

A final remark: there is a dual notion of convex envelope of a function, as the largest dominated convex function, relevant for minimization problems (the reader can establish the dual version of Theorem 1136).
24.6 Superdifferentials

Theorem 1118 showed that differentiable concave functions feature the important inequality¹⁵

f(y) ≤ f(x) + ∇f(x) · (y − x)   ∀y ∈ C

This inequality has a natural geometric interpretation: the tangent hyperplane (line, in the scalar case) lies above the graph of f, which it touches only at (x, f(x)). Remarkably, next we show that this property actually characterizes the differentiability of concave functions. In other words, this geometric property is peculiar to the tangent hyperplanes of concave functions.

¹⁵ Unless otherwise stated, throughout this section C denotes an open and convex set in Rⁿ.
Theorem 1139 A concave function f : C → R is differentiable at x ∈ C if and only if there exists a unique vector ξ ∈ Rⁿ such that

f(y) ≤ f(x) + ξ · (y − x)   ∀y ∈ C   (24.25)

In this case, ξ = ∇f(x).

The proof relies on this lemma of independent interest.

Lemma 1140 Let f : C → R be concave. Then, a vector ξ ∈ Rⁿ satisfies (24.25) if and only if

f′₊(x; z) ≤ ξ · z   ∀z ∈ Rⁿ   (24.26)
Proof “Only if”. Suppose ξ ∈ Rⁿ satisfies (24.25). Let z ∈ Rⁿ. Since C is open, for h > 0 small enough we have x + hz ∈ C, so

h ξ · z = ξ · ((x + hz) − x) ≥ f(x + hz) − f(x)

Since, by Proposition 1111, f′₊(x; ·) : Rⁿ → R exists, we then have

f′₊(x; z) = lim_{h→0⁺} (f(x + hz) − f(x))/h ≤ ξ · z

so ξ satisfies (24.26).
“If”. Assume that ξ ∈ Rⁿ satisfies (24.26). Let y ∈ C. Since C is convex, x + t(y − x) ∈ C for every t ∈ (0, 1]. Then, by Lemma 1112,

(f(x + t(y − x)) − f(x))/t ≤ f′₊(x; y − x) ≤ ξ · (y − x)   (24.27)

which is (24.25) when t = 1.
Proof of Theorem 1139 “Only if”. Assume f is differentiable at x ∈ C. Fix y ∈ C. Let φ_{x,y} : C_{x,y} → R be given by (24.16). By Lemma 1115, C_{x,y} is an open interval, and by Proposition 1116, φ_{x,y} is concave on C_{x,y}. Hence,¹⁶

φ′₊(t) = lim_{ε→0⁺} (φ(t + ε) − φ(t))/ε
= lim_{ε→0⁺} (f((1 − t)x + ty + ε(y − x)) − f((1 − t)x + ty))/ε
= f′₊((1 − t)x + ty; y − x)

for each t ∈ C_{x,y}. Since [0, 1] ⊆ C_{x,y} and f is differentiable at x, we have φ′₊(0) = φ′₋(0) and so, by (24.6),

φ(1) ≤ φ(0) + φ′(0) = φ(0) + f′(x; y − x)

i.e., f(y) ≤ f(x) + f′(x; y − x). Since f is differentiable at x, we have f′(x; z) = ∇f(x) · z for all z ∈ Rⁿ, so (24.25) holds with ξ = ∇f(x). As for uniqueness, by Lemma 1140 any vector ξ that satisfies (24.25) is such that ∇f(x) · z = f′₊(x; z) ≤ ξ · z for all z ∈ Rⁿ; applying this inequality to both z and −z yields ξ = ∇f(x).
“If”. Assume there is a unique vector ξ ∈ Rⁿ such that (24.25) holds. By the last lemma, f′₊(x; z) ≤ ξ · z for all z ∈ Rⁿ. Since ξ is unique, by Corollary 1170, f′₊(x; ·) : Rⁿ → R is a linear function. By Corollary 1113, f is derivable at x. Then, by Theorem 1114, f is differentiable at x.

¹⁶ To ease notation, in the rest of the proof we use φ in place of φ_{x,y}.
For concave functions, differentiability is thus equivalent to the existence of a unique vector – the gradient – for which the basic inequality (24.25) holds. Equivalently, to the existence of a unique linear function l : Rⁿ → R such that f(y) ≤ f(x) + l(y − x) for all y ∈ C. Consequently, non-differentiability is equivalent either to the existence of multiple vectors for which (24.25) holds or to the non-existence of any such vector. This observation motivates the next definition, where C is any convex (possibly not open) set.

Definition 1141 A function f : C → R is said to be superdifferentiable at a basepoint x ∈ C if the set ∂f(x) formed by the vectors ξ ∈ Rⁿ such that

f(y) ≤ f(x) + ξ · (y − x)   ∀y ∈ C   (24.28)

is non-empty. The set ∂f(x) is called the superdifferential of f at x.

The superdifferential thus consists of all vectors ξ (and so of all the linear functions) for which (24.25) holds. No such vector may exist (Example 1149 below); in this case the superdifferential is empty and the function is not superdifferentiable at the basepoint.
To visualize the superdifferential, given a basepoint x ∈ C consider the affine function r : Rⁿ → R defined by:

r(y) = f(x) + ξ · (y − x)

with ξ ∈ ∂f(x). The affine function r is, therefore, such that

r(x) = f(x)   (24.29)
r(y) ≥ f(y)   ∀y ∈ C   (24.30)

In words, r is equal to f at the basepoint x and dominates f elsewhere. It follows that ∂f(x) identifies the set of all affine functions that touch the graph of f at x and that lie above this graph at all other points of the domain. In the scalar case, affine functions are straight lines. [Figure omitted: several straight lines r, r′, and r″ belonging to the superdifferential ∂f(x) of a concave scalar function at a kink.]

It is easy to see that, at the points where the function is differentiable, the only straight line that satisfies conditions (24.29)-(24.30) is the tangent line f(x) + f′(x)(y − x). But, at the points where the function is not differentiable, we might well have several straight lines r : R → R that satisfy such conditions, that is, that touch the graph of the function at the basepoint x and lie above such graph elsewhere. The superdifferential, being the collection of these straight lines, can thus be viewed as a surrogate of the tangent line, i.e., of the differential. This is the idea behind the superdifferential: it is a surrogate of the differential when the latter does not exist. The next result, an immediate consequence of Theorem 1139, confirms this intuition.
Proposition 1142 A concave function f : C → R is differentiable at x ∈ C if and only if ∂f(x) is a singleton. In this case, ∂f(x) = {∇f(x)}.

Before presenting an example, we state a first important property of the superdifferential.

Proposition 1143 If f : C → R is concave, then the set ∂f(x) is compact for every x ∈ C.

Proof It is easy to check that ∂f(x) is closed and convex. To show that ∂f(x) is compact, assume that it is non-empty (otherwise the result is trivially true) and, without loss of generality, that 0 ∈ C, x = 0 and f(0) = 0. By Lemma 736, there exist a neighborhood B_ε(0) ⊆ C and a constant k > 0 such that |f(y)| ≤ k‖y‖ for all y ∈ B_ε(0). Let ξ ∈ ∂f(0). Since y ∈ B_ε(0) if and only if −y ∈ B_ε(0), by (24.28) we have:

−k‖y‖ ≤ f(y) ≤ ξ · y ≤ −f(−y) ≤ k‖y‖   ∀y ∈ B_ε(0)

Hence, |ξ · y| ≤ k‖y‖ for all y ∈ B_ε(0). For each versor eⁱ, there is δ > 0 small enough so that δeⁱ ∈ B_ε(0). Hence,

|ξᵢ|δ = |ξ · δeⁱ| ≤ kδ‖eⁱ‖ = kδ   ∀i = 1, ..., n

so |ξᵢ| ≤ k for each i = 1, ..., n. Since ξ was arbitrarily chosen in ∂f(0), by Proposition 161 we conclude that ∂f(0) is a bounded (so, compact) set.
In the following example we determine the superdifferential of a simple scalar function.

Example 1144 Consider f : R → R defined by f(x) = 1 − |x|. The only point where f is not differentiable is x = 0. By Proposition 1142, we have ∂f(x) = {f′(x)} for each x ≠ 0. It remains to determine ∂f(0). This amounts to finding the scalars ξ that satisfy the inequality

1 − |y| ≤ 1 − |0| + ξ(y − 0)   ∀y ∈ R

i.e., the scalars ξ such that −|y| ≤ ξy for each y ∈ R. If y = 0, this inequality trivially holds for all ξ. If y ≠ 0, we have

ξ · y/|y| ≥ −1   (24.31)

Since

y/|y| = { 1 if y > 0 ; −1 if y < 0 }

from (24.31) it follows both ξ ≥ −1 and ξ · (−1) ≥ −1. That is, ξ ∈ [−1, 1]. We conclude that ∂f(0) = [−1, 1]. Thus:

∂f(x) = { {−1} if x > 0 ; [−1, 1] if x = 0 ; {1} if x < 0 }

N
We can recast what we found in the example as

∂f(x) = { {f′(x)} if x ≠ 0 ; [f′₊(x), f′₋(x)] if x = 0 }

Next we show that this is always the case for scalar functions.

Proposition 1145 Let f : (a, b) → R be a concave function, with a, b ∈ R̄. Then,

∂f(x) = [f′₊(x), f′₋(x)]   ∀x ∈ (a, b)   (24.32)

In words, the superdifferential of a scalar function consists of all the coefficients that lie between the right and the left derivatives. This makes precise the geometric intuition we gave above on scalar functions.

Proof We only prove that ∂f(x) ⊆ [f′₊(x), f′₋(x)]. Let ξ ∈ ∂f(x). Given any h ≠ 0, by definition we have f(x + h) ≤ f(x) + ξh. If h > 0, we then have

(f(x + h) − f(x))/h ≤ (f(x) + ξh − f(x))/h = ξ

and so f′₊(x) ≤ ξ. If h < 0, then

(f(x + h) − f(x))/h ≥ (f(x) + ξh − f(x))/h = ξ

and so ξ ≤ f′₋(x). We conclude that ξ ∈ [f′₊(x), f′₋(x)], as desired.
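Formula (24.32) can be illustrated numerically with one-sided difference quotients. A small Python sketch of ours for the function f(x) = 1 − |x| of Example 1144 (the helper names are our own):

    def right_derivative(f, x, h=1e-7):
        return (f(x + h) - f(x)) / h

    def left_derivative(f, x, h=1e-7):
        return (f(x) - f(x - h)) / h

    f = lambda x: 1 - abs(x)

    # By (24.32) the superdifferential is the interval [f'_+(x), f'_-(x)]
    for x in (-1.0, 0.0, 1.0):
        print(x, [right_derivative(f, x), left_derivative(f, x)])
    # -1.0 -> approximately [1, 1]    (differentiable: a singleton)
    #  0.0 -> approximately [-1, 1]   (kink: the whole interval [-1, 1])
    #  1.0 -> approximately [-1, -1]  (differentiable: a singleton)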
Next we compute the superdifferential of an important function of several variables.

Example 1146 Consider the function f : Rⁿ → R given by f(x) = min_{i=1,...,n} xᵢ. Let us find ∂f(0), that is, the vectors ξ ∈ Rⁿ such that ξ · x ≥ f(x) for all x ∈ Rⁿ. Let ξ ∈ ∂f(0). From:

ξᵢ = ξ · eⁱ ≥ f(eⁱ) = 0   ∀i = 1, ..., n
Σ_{i=1}^n ξᵢ = ξ · (1, ..., 1) ≥ f(1, ..., 1) = 1
−Σ_{i=1}^n ξᵢ = ξ · (−1, ..., −1) ≥ f(−1, ..., −1) = −1

we conclude that Σ_{i=1}^n ξᵢ = 1 and ξᵢ ≥ 0 for each i = 1, ..., n. That is, ξ belongs to the simplex Δ_{n−1}. Thus, ∂f(0) ⊆ Δ_{n−1}. On the other hand, if ξ ∈ Δ_{n−1}, then

ξ · x ≥ ξ · (min_{i=1,...,n} xᵢ, ..., min_{i=1,...,n} xᵢ) = min_{i=1,...,n} xᵢ   ∀x ∈ Rⁿ

and so ξ ∈ ∂f(0). We conclude that ∂f(0) = Δ_{n−1}, that is, the superdifferential at the origin is the simplex. The reader can check that, for every x ∈ Rⁿ,

∂f(x) = {ξ ∈ Δ_{n−1} : ξ · x = f(x)}

i.e., ∂f(x) consists of the vectors ξ of the simplex such that ξ · x = f(x). N
Example 1147 We can generalize the previous example by showing that for any positively homogeneous function f : Rⁿ → R we have

∂f(x) = {ξ ∈ ∂f(0) : ξ · x = f(x)}   (24.33)

Indeed, let ξ ∈ ∂f(x). By positive homogeneity, if we take y = 2x in (24.28) we have

2f(x) = f(2x) ≤ f(x) + ξ · (2x − x) = f(x) + ξ · x

that is, f(x) ≤ ξ · x. By (15.2), if we take instead y = 0 we have

0 = f(0) ≤ f(x) + ξ · (0 − x) = f(x) − ξ · x

so f(x) ≥ ξ · x. We conclude that f(x) = ξ · x for all ξ ∈ ∂f(x). In turn, this implies that (24.28) takes the form

f(y) ≤ ξ · y   ∀y ∈ Rⁿ

for all ξ ∈ ∂f(x), i.e., ∂f(x) ⊆ ∂f(0). So, (24.33) holds.¹⁷ N
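A numeric sanity check of Example 1146 (a sketch of ours, not the book's): random simplex vectors satisfy the superdifferential inequality ξ · x ≥ minᵢ xᵢ at the origin, while a vector outside the simplex fails it at some sampled point.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4

    def in_superdiff_at_0(xi, trials=10_000):
        # xi belongs to the superdifferential at 0 iff xi . x >= min_i x_i for all x;
        # here we test the inequality on random points (a necessary-condition check)
        xs = rng.normal(size=(trials, n))
        return bool((xs @ xi >= np.min(xs, axis=1) - 1e-9).all())

    simplex_xi = rng.dirichlet(np.ones(n))     # a random point of the simplex
    print(in_superdiff_at_0(simplex_xi))       # True
    print(in_superdiff_at_0(np.full(n, 0.5)))  # False: coordinates sum to 2, not 1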
Earlier we argued that the superdifferential is a surrogate of the differential. To be a useful surrogate, however, it is necessary that it often exists, otherwise it would be of little help. The next key result shows that, indeed, concave functions are everywhere superdifferentiable and that, moreover, this property exactly characterizes them (another proof of the tight connection between superdifferentiability and concavity).

¹⁷ The argument shows that (24.33) actually holds for any superhomogeneous function f : Rⁿ → R with f(0) = 0.
Theorem 1148 A function f : C → R is concave if and only if ∂f(x) is non-empty for all x ∈ C.

In view of Proposition 1142, this result generalizes Theorem 1139.

Proof “If”. Suppose ∂f(x) ≠ ∅ at all x ∈ C. Let x₁, x₂ ∈ C and t ∈ [0, 1]. Let ξ ∈ ∂f(tx₁ + (1 − t)x₂). By (24.28),

f(x₁) ≤ f(tx₁ + (1 − t)x₂) + ξ · (x₁ − (tx₁ + (1 − t)x₂))
f(x₂) ≤ f(tx₁ + (1 − t)x₂) + ξ · (x₂ − (tx₁ + (1 − t)x₂))

that is,

f(x₁) − (1 − t) ξ · (x₁ − x₂) ≤ f(tx₁ + (1 − t)x₂)
f(x₂) − t ξ · (x₂ − x₁) ≤ f(tx₁ + (1 − t)x₂)

Hence,

f(tx₁ + (1 − t)x₂) ≥ tf(x₁) − t(1 − t) ξ · (x₁ − x₂) + (1 − t)f(x₂) − (1 − t)t ξ · (x₂ − x₁) = tf(x₁) + (1 − t)f(x₂)

as desired.
“Only if”. Suppose f is concave. Let x ∈ C. By proceeding as in the proof of the coda Theorem 1169, it is easy to check that the Hahn-Banach Theorem implies that there exists ξ ∈ Rⁿ such that ξ · y ≥ f′₊(x; y) for all y ∈ Rⁿ. Hence, by (24.35), ∂f(x) is non-empty.
The maintained hypothesis that C is open is key for the last two propositions, as the next example shows.

Example 1149 Consider f : [0, ∞) → R defined by f(x) = √x. The only point of the (closed) domain at which the function is not differentiable is the boundary point x = 0. The superdifferential ∂f(0) is given by the scalars ξ such that

√y ≤ √0 + ξ(y − 0)   ∀y ≥ 0   (24.34)

i.e., such that √y ≤ ξy for each y ≥ 0. If y = 0, this inequality holds for all ξ. If y > 0, the inequality is equivalent to ξ ≥ √y/y = 1/√y. But, letting y tend to 0, this implies ξ ≥ lim_{y→0⁺} 1/√y = +∞. Therefore, there is no scalar ξ for which (24.34) holds. It follows that ∂f(0) = ∅. We conclude that f is not superdifferentiable at the boundary point 0. N

N.B. We focused on open convex sets C to ease matters, but this example shows that non-open domains may be important. Fortunately, the results of this section can be easily extended to such domains. For instance, Theorem 1148 can be stated for any convex set C (possibly not open) by saying that a continuous function f : C → R is concave on int C if and only if ∂f(x) is non-empty at all x ∈ int C, i.e., at all interior points x of C.¹⁸ The concave function f(x) = √x is indeed differentiable – and so superdifferentiable, with ∂f(x) = {f′(x)} – at all x ∈ (0, ∞), that is, at all interior points of the function's domain R₊. O

¹⁸ If the domain C is not assumed to be open, we need to require continuity (which is otherwise automatically satisfied by Theorem 669).
There is a tight relationship between superdifferentials and directional derivatives, as the next result shows. Note that (24.36) generalizes (24.32) to the multivariable case.

Theorem 1150 Let f : C → R be concave. Then,

∂f(x) = {ξ ∈ Rⁿ : f′₊(x; y) ≤ ξ · y for all y ∈ Rⁿ}   (24.35)
= {ξ ∈ Rⁿ : f′₊(x; y) ≤ ξ · y ≤ f′₋(x; y) for all y ∈ Rⁿ}   (24.36)

and

f′₊(x; y) = min_{ξ∈∂f(x)} ξ · y   ∀y ∈ Rⁿ   (24.37)

Proof Lemma 1140 implies (24.35), while (24.36) follows from (24.35) via (24.13). Finally, the coda Theorem 1169 implies (24.37) because f′₊(x; ·) : Rⁿ → R is superlinear.
Superdifferentials permit us to establish a neat characterization of (global) maximizers of any function (not necessarily concave).

Theorem 1151 Given a function f : C → R, a point x̂ ∈ C is a maximizer if and only if f is superdifferentiable at x̂ and 0 ∈ ∂f(x̂).

Proof Let x̂ ∈ C be a maximizer. We have f(x) ≤ f(x̂) + 0 · (x − x̂) for every x ∈ C, and so 0 ∈ ∂f(x̂). Vice versa, let 0 ∈ ∂f(x̂). We have f(x) ≤ f(x̂) + 0 · (x − x̂) for every x ∈ C, that is, f(x) ≤ f(x̂) for each x ∈ C, which implies that x̂ is a maximizer.

For concave functions this theorem gives as a corollary the most general version of the first-order condition for concave functions. Indeed, in view of Proposition 1142, the earlier Theorem 1131 is a special case of this result.

Corollary 1152 Let f : C → R be concave. Then, x̂ ∈ C is a maximizer if and only if 0 ∈ ∂f(x̂).

The next example shows how this corollary makes it possible to find maximizers even when Fermat's Theorem does not apply because there are points where the function is not differentiable.

Example 1153 For the function f : R → R defined by f(x) = 1 − |x| we have (Example 1144):

∂f(x) = { {−1} if x > 0 ; [−1, 1] if x = 0 ; {1} if x < 0 }

By Corollary 1152, x̂ = 0 is a maximizer since 0 ∈ ∂f(0). N
24.7 Quasi-concavity

24.7.1 Ordinal superdifferential

The next definition introduces a notion of superdifferential suitable for quasi-concave functions.¹⁹ In reading it, keep in mind that quasi-concavity is an ordinal notion, unlike concavity, which is cardinal (Section 14.4).

Definition 1154 A function f : C → R is ordinally superdifferentiable at a point x ∈ C if the set ∂ᵒf(x) defined by

∂ᵒf(x) = {ξ ∈ Rⁿ : ξ · (y − x) ≤ 0 ⟹ f(y) ≤ f(x), ∀y ∈ C}

is non-empty. The set ∂ᵒf(x) is called the ordinal superdifferential of f at x.

The next result shows the ordinal nature of this notion, thus justifying its name.

Proposition 1155 Let g : C → R be ordinally superdifferentiable at x ∈ C. If f : D ⊆ R → R is strictly increasing, with Im g ⊆ D, then f ∘ g : C ⊆ Rⁿ → R is ordinally superdifferentiable at x, with ∂ᵒ(f ∘ g)(x) = ∂ᵒg(x).

Proof Given x, y ∈ C, it is enough to observe that g(y) ≤ g(x) if and only if (f ∘ g)(y) ≤ (f ∘ g)(x) (cf. Proposition 209).
Because of its ordinal nature, the ordinal superdifferential is a convex semicone, as the next result shows.

Proposition 1156 Let f : C → R be ordinally superdifferentiable at x ∈ C. Then, ∂ᵒf(x) is a convex semicone.

Proof Let ξ, ξ′ ∈ ∂ᵒf(x). In view of Proposition 717, we need to show that αξ + βξ′ ∈ ∂ᵒf(x) whenever α, β ≥ 0 and α + β > 0. Let y ∈ C be such that

αξ · (y − x) + βξ′ · (y − x) = (αξ + βξ′) · (y − x) ≤ 0

It follows that at least one addend must be non-positive. Without loss of generality, say the first: αξ · (y − x) ≤ 0. We have two cases: either α > 0 or α = 0. In the former case, we have that ξ · (y − x) ≤ 0. In the latter case, since α + β > 0, we have βξ′ · (y − x) ≤ 0 and β > 0, so ξ′ · (y − x) ≤ 0. We can conclude that either ξ · (y − x) ≤ 0 or ξ′ · (y − x) ≤ 0, which implies f(y) ≤ f(x), given that ξ, ξ′ ∈ ∂ᵒf(x), yielding that αξ + βξ′ ∈ ∂ᵒf(x).

Next we show that for concave functions the notions of ordinal superdifferential and of superdifferential are connected. Before doing so, we introduce an ancillary result which shows how monotonicity is captured by the ordinal superdifferential.

Proposition 1157 Let f : C → R be ordinally superdifferentiable at x ∈ C. If f is strongly increasing, then ∂ᵒf(x) ⊆ Rⁿ₊ \ {0}.

¹⁹ Unless otherwise stated, throughout this section C denotes an open and convex set in Rⁿ.
So, the elements of the ordinal superdifferential of a strongly increasing function are positive and non-zero vectors.

Proof Note that ξ ∈ ∂ᵒf(x) if and only if for every y ∈ C

f(y) > f(x) ⟹ ξ · (y − x) > 0   (24.38)

Let ξ ∈ ∂ᵒf(x). Consider z ∈ Rⁿ₊₊. Since C is open, it follows that x + z/n ∈ C for n large enough and, in particular, x + z/n ≥ x with x + z/n ≠ x. Since f is strongly increasing, we have that f(x + z/n) > f(x), yielding that ξ · (z/n) > 0, that is, ξ · z > 0. By Lemma 541, since z ∈ Rⁿ₊₊ was arbitrarily chosen and by continuity of the function z ↦ ξ · z, we have that ξ · z ≥ 0 for all z ∈ Rⁿ₊, proving that ξ ≥ 0. Finally, let 1 be the constant vector whose components are all 1. Since 1 ∈ Rⁿ₊₊, the vector ξ must be different from 0, otherwise 0 = ξ · 1 > 0, which would be a contradiction.
We can now relate the superdifferentials and the ordinal ones.

Proposition 1158 If f : C → R is superdifferentiable at x ∈ C, then ∂f(x) ⊆ ∂ᵒf(x). If, in addition, f is strongly increasing and concave, then

∂ᵒf(x) = ⋃_{λ>0} λ ∂f(x) = {λξ : ξ ∈ ∂f(x) and λ > 0}

Proof Let ξ ∈ ∂f(x). By definition, we have that f(y) − f(x) ≤ ξ · (y − x) for all y ∈ C. This implies that if y ∈ C and ξ · (y − x) ≤ 0, then f(y) ≤ f(x), yielding that ξ ∈ ∂ᵒf(x) and ∂f(x) ⊆ ∂ᵒf(x). Now, assume that f is concave, strongly increasing, and x ∈ C. Note that ∂f(x) is non-empty. By the previous part of the proof, we have that ∂f(x) ⊆ ∂ᵒf(x). By Proposition 1156, it follows that ⋃_{λ>0} λ ∂f(x) ⊆ ∂ᵒf(x). Vice versa, consider ξ ∈ ∂ᵒf(x). By Proposition 1157 and since f is strongly increasing, we have that ξ > 0. Let y ∈ Rⁿ be such that ξ · y = 0. It follows that for every h > 0 small enough, x + hy ∈ C and ξ · (x + hy) ≤ ξ · x. Since ξ ∈ ∂ᵒf(x), it follows that f(x + hy) − f(x) ≤ 0 for every h > 0 small enough. We can conclude that

f′₊(x; y) = lim_{h→0⁺} (f(x + hy) − f(x))/h ≤ 0

Since y was arbitrarily chosen, it follows that f′₊(x; y) ≤ 0 for all y ∈ Rⁿ such that ξ · y = 0. Define V = {y ∈ Rⁿ : ξ · y = 0} and g : V → R by g(y) = 0. Clearly, V is a vector subspace and g is linear. By the Hahn-Banach Theorem (Theorem 1168), since f′₊(x; ·) ≤ g on V and f′₊(x; ·) is superlinear, it follows that g admits a linear extension ḡ such that f′₊(x; y) ≤ ḡ(y) for every y ∈ Rⁿ. By Riesz's Theorem, there exists ξ′ ∈ Rⁿ such that ḡ(y) = ξ′ · y for every y ∈ Rⁿ. We can conclude that

ξ · y = 0 ⟹ ξ′ · y = 0   (24.39)

By Theorem 1150, it follows that ξ′ ∈ ∂f(x). Since f is strongly increasing, we also have that ξ′ > 0.²⁰ We are left to show that ξ = λξ′ for some λ > 0. By Theorem 1167 and since (24.39) holds, we have that ξ′ = αξ for some α ∈ R. Since ξ > 0 and ξ′ > 0, we have that α > 0; it is then enough to set λ = 1/α > 0.

²⁰ By the previous part of the proof, ξ′ ∈ ∂ᵒf(x). By Proposition 1157 and since f is strongly increasing, ξ′ ∈ ∂ᵒf(x) ⊆ Rⁿ₊ \ {0}.
Theorem 1159 A bounded above and continuous function f : C → R is quasi-concave if and only if ∂ᵒf(x) ≠ ∅ for all x ∈ C.

Proof Since f is bounded above, there exists M ∈ R such that f(y) ≤ M for all y ∈ C. We need to introduce two connected ancillary objects. We start with the function G : Rⁿ × C → R such that, for every ξ ∈ Rⁿ and for every x ∈ C,

G(ξ, x) = sup {f(y) : ξ · y ≤ ξ · x}

Note that f(x) ≤ G(ξ, x) ≤ M for every ξ ∈ Rⁿ and for every x ∈ C. If we fix ξ, note that the function x ↦ G(ξ, x) is quasi-concave on C. Indeed, consider z, ẑ ∈ C and α ∈ [0, 1]. Without loss of generality, assume that ξ · z ≤ ξ · ẑ. It follows that ξ · z ≤ ξ · (αz + (1 − α)ẑ) ≤ ξ · ẑ. We thus have that

{f(y) : ξ · y ≤ ξ · z} ⊆ {f(y) : ξ · y ≤ ξ · (αz + (1 − α)ẑ)}

yielding that G(ξ, αz + (1 − α)ẑ) ≥ G(ξ, z) ≥ min {G(ξ, z), G(ξ, ẑ)}, proving quasi-concavity. The second ancillary function is f̂ : C → R such that for every x ∈ C

f̂(x) = inf_{ξ∈Rⁿ} G(ξ, x)

Observe that f(x) ≤ f̂(x) ≤ M for every x ∈ C. Note that f̂ is also quasi-concave on C (why?).

We can now prove the main statement. We begin with the “if” part. Consider x ∈ C. Let ξ ∈ ∂ᵒf(x). This implies that if y ∈ C is such that ξ · (y − x) ≤ 0, then f(y) ≤ f(x). It follows that

f̂(x) ≤ G(ξ, x) = sup {f(y) : ξ · y ≤ ξ · x} = f(x) ≤ f̂(x)

This implies that f̂(x) = f(x). Since x ∈ C was arbitrarily chosen, we can conclude that f = f̂, yielding that f is quasi-concave. As for the “only if” part, let x ∈ C. We have two cases: either x is a maximizer or x is not a maximizer of f on C. In the first case, choose ξ = 0. Note that the implication

ξ · (y − x) ≤ 0 ⟹ f(y) ≤ f(x)

trivially holds, since f(y) ≤ f(x) for all y ∈ C, x being a maximizer. Thus, ξ ∈ ∂ᵒf(x) and this latter set is non-empty. In the second case, since x is not a maximizer and f is continuous and quasi-concave, we have that the strict upper contour set

(f > f(x)) = {y ∈ C : f(y) > f(x)}

is non-empty, open, convex, and x does not belong to it. By Proposition 824, there exists ξ ∈ Rⁿ such that if y ∈ (f > f(x)), that is, f(y) > f(x), then ξ · y > ξ · x. By taking the contrapositive, we have that ξ ∈ ∂ᵒf(x) and this latter set is non-empty.
24.7.2 Quasi-concavity and differentiability

Proposition 1160 Let f : C → R be a strongly increasing and quasi-concave function. If f is differentiable at x ∈ C, then

∂ᵒf(x) = {λ∇f(x) : λ > 0}   (24.40)

provided ∇f(x) ≠ 0.

The proof relies on a lemma of some independent interest.

Lemma 1161 If f : C → R is continuous, then

{0 ≠ ξ ∈ Rⁿ : ξ · (y − x) < 0 ⟹ f(y) ≤ f(x), ∀y ∈ C} ⊆ ∂ᵒf(x)

Proof Let ξ be an element of the set on the left hand side. To prove the inclusion, we want to show that if y ∈ C, then

ξ · (y − x) ≤ 0 ⟹ f(y) ≤ f(x)

By assumption, if ξ · (y − x) < 0, then f(y) ≤ f(x). Suppose then that ξ · y = ξ · x. Since ξ ≠ 0, there is some z ∈ Rⁿ such that ξ · z > 0. Let yₙ = y − z/n. Since C is open, we have yₙ ∈ C for n sufficiently large. Clearly, ξ · yₙ = ξ · y − ξ · z/n < ξ · x. By assumption, it follows that f(yₙ) ≤ f(x). Since f is continuous, by taking the limit we have f(y) = lim_{n→∞} f(yₙ) ≤ f(x). We conclude that ξ ∈ ∂ᵒf(x).
Proof of Proposition 1160 Suppose f is differentiable at x ∈ C. Let us first prove that ∇f(x) ∈ ∂ᵒf(x), provided ∇f(x) ≠ 0. In view of Lemma 1161, it is enough to prove that ∇f(x) · (y − x) < 0 implies f(y) ≤ f(x). Since f is differentiable, by Theorem 970 we have that

∇f(x) · (y − x) = lim_{t→0} (f(x + t(y − x)) − f(x))/t

If ∇f(x) · (y − x) < 0, then f(x + t(y − x)) − f(x) < 0 for t sufficiently small and in (0, 1). Namely, f((1 − t)x + ty) < f(x). Since f is quasi-concave, we have f(x) > f((1 − t)x + ty) ≥ min {f(x), f(y)}, yielding that f(x) > min {f(x), f(y)} = f(y). It follows that ∇f(x) ∈ ∂ᵒf(x). Since ∂ᵒf(x) is a semicone, we can also conclude that {λ∇f(x) : λ > 0} ⊆ ∂ᵒf(x).

As to the converse inclusion ∂ᵒf(x) ⊆ {λ∇f(x) : λ > 0}, consider ξ ∈ ∂ᵒf(x). We want to show that ξ = λ∇f(x) for some λ > 0. By Proposition 1157 and since f is strongly increasing, we have that ξ > 0. Let z ∈ Rⁿ be such that ξ · z = 0. For t small enough, we have that x + tz ∈ C and ξ · (x + tz) ≤ ξ · x. Since ξ ∈ ∂ᵒf(x), we have that f(x + tz) ≤ f(x) for t sufficiently small. By Theorem 970, this implies that

∇f(x) · z = lim_{t→0} (f(x + tz) − f(x))/t ≤ 0

Since z was arbitrarily chosen, we have that ξ · z = 0 implies ∇f(x) · z ≤ 0. Since ξ · z = 0 if and only if ξ · (−z) = 0, we can conclude that ξ · z = 0 implies ∇f(x) · z = 0. By Theorem 1167, we have that ∇f(x) = αξ for some α ∈ R. Since f is strongly increasing and ∇f(x) ≠ 0, we have that ∇f(x) > 0. Since ξ > 0 and ∇f(x) > 0, we have that α > 0. If we set λ = 1/α > 0, then ξ = λ∇f(x), proving the inclusion.
Example 1162 The conditions ∇f(x) ≠ 0 and of strong increasing monotonicity in Proposition 1160 are needed. For instance, for the quasi-concave function f(x) = x³ we have 0 = f′(0) ∉ ∂ᵒf(0) = (0, ∞). On the other hand, for the function f(x) = −x², the origin is a global maximizer and 0 = f′(0) ∈ ∂ᵒf(0) = R. N
24.7.3 Quasi-concavity criteria

We now turn to differential criteria for quasi-concavity. We begin with the quasi-concave counterpart of Theorem 1118.

Theorem 1163 A differentiable f : C → R is quasi-concave if and only if, for each x, y ∈ C,

∇f(x) · (y − x) < 0 ⟹ f(y) < f(x)   (24.41)

Proof Before starting note that, by contraposition, (24.41) is equivalent to the following property: for each x, y ∈ C,

f(y) ≥ f(x) ⟹ ∇f(x) · (y − x) ≥ 0   (24.42)

We only prove the “only if” part. Consider x, y ∈ C and assume that f(y) ≥ f(x). Since f is quasi-concave, it follows that f((1 − t)x + ty) ≥ f(x) for every t ∈ (0, 1). By Theorem 970, we have that

∇f(x) · (y − x) = lim_{t→0} (f(x + t(y − x)) − f(x))/t = lim_{t→0⁺} (f((1 − t)x + ty) − f(x))/t ≥ 0

yielding that ∇f(x) · (y − x) ≥ 0.
The next result is the quasi-concave counterpart of Theorem 1119, where a suitable notion of quasi-monotonicity is used.

Proposition 1164 A differentiable f : C → R is quasi-concave if and only if the derivative operator ∇f : C → Rⁿ is quasi-monotone, i.e.,

∇f(x) · (y − x) < 0 ⟹ ∇f(y) · (y − x) ≤ 0   ∀x, y ∈ C   (24.43)

Proof “If”. Assume that the derivative operator ∇f : C → Rⁿ is quasi-monotone. Suppose, by contradiction, that f is not quasi-concave. By (24.41), there exists a pair x, y ∈ C for which ∇f(x) · (y − x) < 0 and f(y) ≥ f(x). Define φ : [0, 1] → R by φ(t) = f(ty + (1 − t)x). Since f is differentiable on C, we have that φ is differentiable on (0, 1), continuous on [0, 1], and such that φ(1) = f(y) ≥ f(x) = φ(0). Define also y_t = (1 − t)x + ty for all t ∈ [0, 1]. Note that for each t ∈ (0, 1)

y_t − x = t(y − x)  and  φ′(t) = ∇f(y_t) · (y − x)   (24.44)

Since ∇f(x) · (y − x) < 0, we have that for each t ∈ (0, 1)

∇f(x) · (y_t − x) = t ∇f(x) · (y − x) < 0

So, by (24.43) and (24.44), we have that for each t ∈ (0, 1)

t φ′(t) = t ∇f(y_t) · (y − x) = ∇f(y_t) · (y_t − x) ≤ 0

The function φ is thus decreasing on (0, 1). By continuity, φ is decreasing on [0, 1]. Since φ(1) ≥ φ(0), this implies that φ is constant on [0, 1]. Since φ′₊(0) = ∇f(x) · (y − x) (why?), in turn, this implies that 0 = φ′₊(0) = ∇f(x) · (y − x) < 0, a contradiction.
“Only if”. Let f be quasi-concave and suppose that (24.43) does not hold. Then, there exists a pair x, y ∈ C such that

∇f(x) · (y − x) < 0  and  ∇f(y) · (y − x) > 0

In particular, we have that ∇f(y) · (x − y) < 0. Since f is quasi-concave, by Theorem 1163 these two inequalities imply f(y) < f(x) and f(x) < f(y), a contradiction.
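Condition (24.41) can also be probed numerically on sampled pairs of points. A sketch of ours for the quasi-concave (indeed concave) function f(x₁, x₂) = √(x₁x₂) of Example 1128:

    import numpy as np

    rng = np.random.default_rng(2)

    f = lambda x: np.sqrt(x[0] * x[1])
    grad = lambda x: 0.5 * np.array([np.sqrt(x[1] / x[0]), np.sqrt(x[0] / x[1])])

    # Check (24.41): grad f(x).(y - x) < 0 implies f(y) < f(x), on random pairs
    ok = True
    for _ in range(10_000):
        x, y = rng.uniform(0.1, 5.0, size=(2, 2))
        if grad(x) @ (y - x) < 0 and not f(y) < f(x):
            ok = False
    print(ok)  # True on all sampled pairs

As with the earlier sketches, passing such a sampling test is consistent with, but of course not a proof of, quasi-concavity.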
24.7.4 Optima

We can characterize maximizers via the ordinal superdifferential, as we did in Theorem 1151 for the superdifferential.

Proposition 1165 Given a function f : C → R, a point x̂ ∈ C is a maximizer if and only if f is ordinally superdifferentiable at x̂ and 0 ∈ ∂ᵒf(x̂). In this case, ∂ᵒf(x̂) = Rⁿ.

Proof Let x̂ ∈ C be a maximizer. We have f(y) ≤ f(x̂) for every y ∈ C. Thus, for every ξ ∈ Rⁿ we trivially have that if y ∈ C and ξ · (y − x̂) ≤ 0, then f(y) ≤ f(x̂), yielding that ξ ∈ ∂ᵒf(x̂). Since ξ was arbitrarily chosen, it follows that ∂ᵒf(x̂) = Rⁿ. Vice versa, let 0 ∈ ∂ᵒf(x̂). It follows that if y ∈ C and 0 · (y − x̂) ≤ 0, then f(y) ≤ f(x̂). Since 0 · (y − x̂) ≤ 0 holds for every y ∈ C, we have that f(y) ≤ f(x̂) for all y ∈ C, i.e., x̂ ∈ C is a maximizer.

We thus have the following general first-order condition for quasi-concave functions, the counterpart here of Corollary 1152.

Corollary 1166 Let f : C → R be quasi-concave. Then, x̂ ∈ C is a maximizer if and only if 0 ∈ ∂ᵒf(x̂).
24.8 Infracoda: a linear algebra result

The results of the last section rely on an interesting linear algebra result, which we now state and prove.

Theorem 1167 Let {ξᵢ}ᵢ₌₁ᵏ ⊆ Rⁿ be a finite collection of vectors and ξ ∈ Rⁿ. We have

ξᵢ · x = 0  ∀i = 1, ..., k ⟹ ξ · x = 0   ∀x ∈ Rⁿ   (24.45)

if and only if there exist scalars {αᵢ}ᵢ₌₁ᵏ ⊆ R such that ξ = Σ_{i=1}^k αᵢξᵢ.

So, ξ ∈ span {ξ₁, ..., ξ_k} if and only if condition (24.45) holds.

Proof The “if” part is obvious and therefore left to the reader. “Only if”. Before starting, we introduce some derived objects, since reasoning in terms of linear functions rather than vectors will simplify things quite significantly. Define fᵢ : Rⁿ → R by fᵢ(x) = ξᵢ · x for each i = 1, ..., k. Similarly, define f : Rⁿ → R by f(x) = ξ · x. Next, define the operator F : Rⁿ → Rᵏ to be such that the i-th component of F(x) is F(x)ᵢ = fᵢ(x). Since F is linear (why?), note that Im F is a vector subspace of Rᵏ. Next, we define a function g : Im F → R by the following formula: for each v ∈ Im F,

g(v) = f(x)  where x ∈ Rⁿ is such that F(x) = v

First, we need to show that g is well defined. In other words, we need to check that g assigns one and only one value to each vector of Im F. In fact, by definition, given v ∈ Im F there always exists a vector x ∈ Rⁿ such that F(x) = v. The potential issue is that there might exist a second vector y ∈ Rⁿ such that F(y) = v, but f(x) ≠ f(y). We next show that this latter inequality will never hold. Indeed, since F is linear, if F(y) = v, then F(x) − F(y) = 0 and F(x − y) = 0. By definition of F, we have ξᵢ · (x − y) = fᵢ(x − y) = 0 for every i = 1, ..., k. By (24.45), this yields that ξ · (x − y) = f(x − y) = 0, that is, f(x) = f(y). We just proved that g is well defined. The reader can verify that g is also linear. By the Hahn-Banach Theorem (Theorem 636), g admits a linear extension ḡ to Rᵏ. By Riesz's Theorem, there exists a vector α ∈ Rᵏ such that ḡ(v) = Σ_{i=1}^k αᵢvᵢ for all v ∈ Rᵏ. By definition of fᵢ, f, g, and F, we conclude that for every x ∈ Rⁿ

ξ · x = f(x) = g(F(x)) = ḡ(F(x)) = Σ_{i=1}^k αᵢ fᵢ(x) = Σ_{i=1}^k αᵢ ξᵢ · x

yielding that ξ = Σ_{i=1}^k αᵢξᵢ.²¹

²¹ Readers who struggle with this last step should consult the proof of Riesz's Theorem (in particular, the part dealing with “uniqueness”).
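Theorem 1167 lends itself to a simple numeric test: membership of ξ in span {ξ₁, ..., ξ_k} and condition (24.45) can both be checked with standard linear algebra. A sketch of ours:

    import numpy as np

    def in_span(xi, xis):
        # xi lies in the span of the rows of xis iff appending it leaves the rank unchanged
        return np.linalg.matrix_rank(np.vstack([xis, xi])) == np.linalg.matrix_rank(xis)

    xis = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 1.0]])
    print(in_span(np.array([2.0, -3.0, -1.0]), xis))  # True:  2*xi_1 - 3*xi_2
    print(in_span(np.array([0.0, 0.0, 1.0]), xis))    # False

    # Condition (24.45): any x with xis @ x = 0 must also satisfy xi . x = 0.
    # Here {x : xis @ x = 0} is spanned by x = (1, 1, -1):
    x = np.array([1.0, 1.0, -1.0])
    print(xis @ x, np.array([2.0, -3.0, -1.0]) @ x)   # [0. 0.] and 0.0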
24.9 Coda: representation of superlinear functions

24.9.1 The ultimate Hahn-Banach Theorem

In presenting the Hahn-Banach Theorem (Section 13.10), we remarked that a linear function defined on a vector subspace of Rⁿ admits, in general, many linear extensions. The next more powerful version of the theorem gives some control over them.

Theorem 1168 (Hahn-Banach) Let g : Rⁿ → R be a concave function and V a vector subspace of Rⁿ. If f : V → R is a linear function such that f(x) ≥ g(x) for all x ∈ V, then there exists a linear function f̄ : Rⁿ → R that extends f to Rⁿ with f̄(x) ≥ g(x) for all x ∈ Rⁿ.

The version of the theorem seen in Section 13.10 is a special case. Indeed, let f : V → R be any linear function defined on V. Theorem 729 is easily seen to hold for linear functions defined on vector subspaces, so there is k > 0 such that |f(x)| ≤ k‖x‖ for all x ∈ V. The function g : Rⁿ → R defined by g(x) = −k‖x‖ is concave (Example 652). Since f(x) ≥ g(x) for all x ∈ V, by the last theorem there exists a linear function f̄ : Rⁿ → R that extends f to Rⁿ.
Proof Let dim V = k ≤ n and let {x₁, ..., x_k} be a basis for V. If k = n, there is nothing to prove since V = Rⁿ. Otherwise, by Theorem 87 there are n − k vectors {x_{k+1}, ..., xₙ} such that the overall set {x₁, ..., xₙ} is a basis for Rⁿ. Let V₁ = span {x₁, ..., x_{k+1}}. Clearly, V ⊆ V₁. Given any x ∈ V₁, there exists a unique collection of scalars {αᵢ}ᵢ₌₁^{k+1} ⊆ R such that x = Σ_{i=1}^k αᵢxᵢ + α_{k+1}x_{k+1}. Since Σ_{i=1}^k αᵢxᵢ ∈ V, every element of V₁ can be uniquely written as x + λx_{k+1}, with x ∈ V and λ ∈ R. That is, V₁ = {x + λx_{k+1} : x ∈ V, λ ∈ R}.
Let r be an arbitrary scalar. Define f₁ : V₁ → R by f₁(x + λx_{k+1}) = f(x) + λr for all x ∈ V and all λ ∈ R. The function f₁ is linear, with f₁(x_{k+1}) = r, and is equal to f on V. We need to show that r can be chosen so that f₁(x) ≥ g(x) for all x ∈ V₁.
If λ > 0, we have that, for every x ∈ V,

f₁(x + λx_{k+1}) ≥ g(x + λx_{k+1}) ⟺ f(x) + λr ≥ g(x + λx_{k+1}) ⟺ r ≥ (g(x + λx_{k+1}) − f(x))/λ

So, for all λ > 0 and all x ∈ V, we need

r ≥ (g(x + λx_{k+1}) − f(x))/λ

If λ < 0, writing λ = −μ with μ > 0, we have that, for every y ∈ V,

f₁(y − μx_{k+1}) ≥ g(y − μx_{k+1}) ⟺ f(y) − μr ≥ g(y − μx_{k+1}) ⟺ r ≤ (f(y) − g(y − μx_{k+1}))/μ

So, for all μ > 0 and all y ∈ V, we need

r ≤ (f(y) − g(y − μx_{k+1}))/μ

Summing up, we have f₁(x) ≥ g(x) for all x ∈ V₁ if and only if we choose r ∈ R so that

sup_{x∈V, λ>0} (g(x + λx_{k+1}) − f(x))/λ ≤ r ≤ inf_{y∈V, μ>0} (f(y) − g(y − μx_{k+1}))/μ

It remains to prove that such a choice of r is possible, i.e., that

sup_{x∈V, λ>0} (g(x + λx_{k+1}) − f(x))/λ ≤ inf_{y∈V, μ>0} (f(y) − g(y − μx_{k+1}))/μ   (24.46)

Note that

(g(x + λx_{k+1}) − f(x))/λ ≤ (f(y) − g(y − μx_{k+1}))/μ
⟺ μg(x + λx_{k+1}) − μf(x) ≤ λf(y) − λg(y − μx_{k+1})
⟺ λg(y − μx_{k+1}) + μg(x + λx_{k+1}) ≤ λf(y) + μf(x)
⟺ λg(y − μx_{k+1}) + μg(x + λx_{k+1}) ≤ f(λy + μx)

But, since g is concave and f(x) ≥ g(x) for all x ∈ V, we have

f(λy + μx) = (λ + μ) f( (λ/(λ+μ)) y + (μ/(λ+μ)) x ) ≥ (λ + μ) g( (λ/(λ+μ)) y + (μ/(λ+μ)) x )
= (λ + μ) g( (λ/(λ+μ)) (y − μx_{k+1}) + (μ/(λ+μ)) (x + λx_{k+1}) )
≥ (λ + μ) [ (λ/(λ+μ)) g(y − μx_{k+1}) + (μ/(λ+μ)) g(x + λx_{k+1}) ]
= λ g(y − μx_{k+1}) + μ g(x + λx_{k+1})

Thus, for all x, y ∈ V and all λ, μ > 0, we have

λ g(y − μx_{k+1}) + μ g(x + λx_{k+1}) ≤ f(λy + μx)

In turn, this implies (24.46), as desired. We conclude that there exists a linear function f₁ : V₁ → R that extends f and such that f₁(x) ≥ g(x) for all x ∈ V₁.
Consider now V₂ = span {x₁, ..., x_{k+1}, x_{k+2}}. By proceeding as before, we can show the existence of a linear function f₂ : V₂ → R that extends f₁ and such that f₂(x) ≥ g(x) for all x ∈ V₂. In particular, being V ⊆ V₁ ⊆ V₂, the linear function f₂ is such that f₂(x) = f₁(x) = f(x) for all x ∈ V. So, f₂ extends f to V₂. By iterating, we reach a final extension f_{n−k} : Rⁿ → R that extends f and is such that f_{n−k}(x) ≥ g(x) for all x ∈ V_{n−k} = span {x₁, ..., xₙ} = Rⁿ. This completes the proof.
24.9.2 Representation of superlinear functions

Next we establish a key characterization of superlinear functions. In reading the result, recall that ∂f(0) = {ξ ∈ Rⁿ : ξ · x ≥ f(x) for every x ∈ Rⁿ} is a non-empty compact and convex set in Rⁿ if f is superlinear (Section 24.6), as well as that a translation invariant function f is normalized (Section 13.1.4) provided f(0) = 0 – e.g., f is superlinear – and f(1) = 1 (Section 16.3).

Theorem 1169 A function f : Rⁿ → R is superlinear if and only if there is a non-empty compact and convex set C ⊆ Rⁿ such that

f(x) = min_{ξ∈C} ξ · x   ∀x ∈ Rⁿ   (24.47)

Moreover, C is unique and is given by ∂f(0). In particular,

(i) ∂f(0) ⊆ Rⁿ₊ if and only if f is increasing;

(ii) ∂f(0) ⊆ Rⁿ₊ \ {0} if and only if f is strongly increasing;

(iii) ∂f(0) ⊆ Rⁿ₊₊ if and only if f is strictly increasing;

(iv) ∂f(0) ⊆ Δ_{n−1} if and only if f is increasing and translation invariant with f(1) = 1.
This result, a consequence of the Hahn-Banach Theorem, is a nonlinear version of Riesz's Theorem, which shows that superlinear functions can be represented as lower envelopes of the linear functions l(x) = ξ · x that pointwise dominate them. Together, points (i)-(iii) form a nonlinear version of the monotone Riesz's Theorem stated in Propositions 539 and 641, with stronger conditions of monotonicity – recall (6.25) – that translate into stronger properties of ∂f(0). Finally, point (iv) is a nonlinear version of Proposition 542.
Proof We prove the “only if” part, as the “if” follows from Example 1103. Suppose f is superlinear. By the Hahn-Banach Theorem, ∂f(0) is not empty. Indeed, let x ∈ Rⁿ and consider the vector subspace V_x = {αx : α ∈ R} generated by x (see Example 82). Define l_x : V_x → R by l_x(αx) = αf(x) for all α ∈ R. The function l_x is linear on the vector subspace V_x. Since f is superlinear, recall that 0 = f(0) ≥ f(x) + f(−x), that is, f(x) ≤ −f(−x). We next show that l_x ≥ f on V_x. Since f is superlinear, if α ≥ 0, then l_x(αx) = αf(x) = f(αx). If α < 0, then l_x(αx) = αf(x) = (−α)(−f(x)) ≥ (−α)f(−x) = f(αx), proving that l_x ≥ f on V_x. By the Hahn-Banach Theorem, there exists l ∈ (Rⁿ)′ such that l ≥ f on Rⁿ and l = l_x on V_x.²² By Riesz's Theorem, there exists ξ ∈ Rⁿ such that l(y) = ξ · y for all y ∈ Rⁿ. We have thus showed that this ξ belongs to ∂f(0), with f(x) = ξ · x. The first fact implies that ∂f(0) is not empty, hence min_{ξ∈∂f(0)} ξ · x ≥ f(x), while the second fact implies that

f(x) = ξ · x = min_{ξ∈∂f(0)} ξ · x   (24.48)

Since x was arbitrarily chosen, (24.48) holds for every x ∈ Rⁿ. Next, suppose C, C′ ⊆ Rⁿ are any two non-empty convex and compact sets such that

f(x) = min_{ξ∈C} ξ · x = min_{ξ∈C′} ξ · x   ∀x ∈ Rⁿ

We want to show that C = C′. Suppose, by contradiction, that there is ξ̄ ∈ C such that ξ̄ ∉ C′. Since C′ is a non-empty compact and convex set in Rⁿ, by Proposition 824 there is a “separating” pair (a, b) ∈ Rⁿ × R such that ξ · a ≥ b + ε > b ≥ ξ̄ · a for all ξ ∈ C′ and for some ε > 0. Thus, we reach the contradiction

f(a) = min_{ξ∈C′} ξ · a > ξ̄ · a ≥ min_{ξ∈C} ξ · a = f(a)

We conclude that C = C′. In turn, in view of (24.48) this implies that ∂f(0) is the unique non-empty compact and convex set in Rⁿ for which (24.47) holds.

(i) Let ∂f(0) ⊆ Rⁿ₊. If x, y ∈ Rⁿ are such that x ≥ y, then ξ · x ≥ ξ · y for all ξ ∈ ∂f(0). Let ξ_x ∈ ∂f(0) be such that f(x) = ξ_x · x. Then,

f(y) = min_{ξ∈∂f(0)} ξ · y ≤ ξ_x · y ≤ ξ_x · x = f(x)

as desired. Conversely, assume that f is increasing. Then, for each i = 1, ..., n we have

0 ≤ f(eⁱ) = min_{ξ∈∂f(0)} ξ · eⁱ = min_{ξ∈∂f(0)} ξᵢ

So, 0 ≤ ξᵢ for all ξ ∈ ∂f(0), which implies ξ ≥ 0 for all ξ ∈ ∂f(0).

(ii) The “only if” is similar to that of (i) and left to the reader. As to the converse, assume that f is strongly increasing. Then, f is increasing, yielding that ∂f(0) ⊆ Rⁿ₊. Moreover, we have that

0 < f(1) = min_{ξ∈∂f(0)} ξ · 1 = min_{ξ∈∂f(0)} Σ_{i=1}^n ξᵢ

So, 0 ∉ ∂f(0).
(iii) The proof is similar to (i) and left to the reader.
(iv) Let ∂f(0) ⊆ Δ_{n−1}. By (i), f is increasing. It remains to prove that it is translation invariant. Let x ∈ Rⁿ and k ∈ R. We have ξ · k1 = k because ξ ∈ Δ_{n−1}. So,

f(x + k1) = min_{ξ∈∂f(0)} ξ · (x + k1) = min_{ξ∈∂f(0)} (ξ · x + ξ · k1)
= min_{ξ∈∂f(0)} (ξ · x + k) = k + min_{ξ∈∂f(0)} ξ · x = f(x) + k

as desired. Conversely, assume that f is increasing and translation invariant. By point (i), ∂f(0) ⊆ Rⁿ₊. Moreover, since f(k1) = k for all k ∈ R, we have

Σ_{i=1}^n ξᵢ = ξ · 1 ≥ min_{ξ∈∂f(0)} ξ · 1 = f(1) = 1   ∀ξ ∈ ∂f(0)

and

−Σ_{i=1}^n ξᵢ = ξ · (−1) ≥ min_{ξ∈∂f(0)} ξ · (−1) = f(−1) = −1   ∀ξ ∈ ∂f(0)

So, we have both Σ_{i=1}^n ξᵢ ≥ 1 and Σ_{i=1}^n ξᵢ ≤ 1, which implies Σ_{i=1}^n ξᵢ = 1. We conclude that ∂f(0) ⊆ Δ_{n−1}.

²² Recall that (Rⁿ)′ denotes the dual space of Rⁿ, i.e., the collection of all linear functions on Rⁿ (Section 13.1.2).
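For f(x) = minᵢ xᵢ, whose superdifferential at the origin is the simplex (Example 1146), the representation (24.47) can be seen directly: a linear function ξ ↦ ξ · x is minimized over the simplex at a vertex, i.e., at a versor. A numeric sketch of ours:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5
    x = rng.normal(size=n)

    xis = rng.dirichlet(np.ones(n), size=20_000)   # random points of the simplex
    vals = xis @ x                                  # the linear functions xi . x at x

    # f(x) = min_i x_i is the lower envelope: every xi . x dominates it, and the
    # minimum over the simplex is attained at a vertex (a versor e_i)
    print(bool((vals >= x.min() - 1e-12).all()))                # True
    print(np.isclose(min(e @ x for e in np.eye(n)), x.min()))   # True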
The previous theorem has the following important corollary.

Corollary 1170 A superlinear function f : Rⁿ → R is linear if and only if ∂f(0) is a singleton.

Proof Let f be superlinear. Suppose f is linear. Let l ∈ (Rⁿ)′ be such that l ≥ f. By (24.9), f = l. Conversely, suppose there is a unique l ∈ (Rⁿ)′ such that l ≥ f. Then (24.47) implies f = l.
We can actually say something more about the domain of additivity of a superlinear function. To this end, consider the collection A_f = {x ∈ Rⁿ : f(x) = −f(−x)} of all vectors where the gap −f(−x) − f(x) closes.

Proposition 1171 Let f : Rⁿ → R be a superlinear function. Then, A_f is a vector subspace of Rⁿ, with

f(x + y) = f(x) + f(y)   ∀x ∈ Rⁿ   (24.49)

if and only if y ∈ A_f.

So, A_f is a vector subspace of Rⁿ that describes the domain of additivity of a superlinear function f. In particular, f is linear if and only if A_f = Rⁿ. The dimension of A_f is thus a (rough) indication of the failure of additivity of f. For instance, by Lemma 738 a function f : Rⁿ → R, with f(1) ≠ 0, is translation invariant if and only if 1 ∈ A_f; in this case, the dimension of A_f is at least 1.
Proof We begin with a key observation. If y ∈ A_f, then

ξ · y = f(y)   ∀ξ ∈ ∂f(0)   (24.50)

Indeed, for each ξ ∈ ∂f(0) we have f(y) ≤ ξ · y = −ξ · (−y) ≤ −f(−y) = f(y), so ξ · y = f(y).

We now prove that A_f is a vector subspace. First, by definition of A_f, observe that y ∈ A_f if and only if −y ∈ A_f. Let y ∈ A_f and α ∈ R. If α ≥ 0, we have f(αy) = αf(y) = −αf(−y) = −f(−αy), so αy ∈ A_f. Since −y ∈ A_f and given what we have just proved, if α < 0, then −α > 0 and αy = (−α)(−y) ∈ A_f. We conclude that αy ∈ A_f for all α ∈ R.
Let x, y ∈ A_f. We have that −x, −y ∈ A_f. Let ξ ∈ ∂f(0). By (24.47) and (24.50), we then have

f(x + y) = min_{ξ∈∂f(0)} ξ · (x + y) = min_{ξ∈∂f(0)} (ξ · x + ξ · y) = ξ · x + ξ · y = −ξ · (−x) − ξ · (−y)
= −(f(−x) + f(−y)) ≥ −f(−x − y) ≥ f(x + y)

where the third equality holds because, by (24.50), ξ · x + ξ · y = f(x) + f(y) for every ξ ∈ ∂f(0). So, −f(−x − y) = f(x + y), which implies x + y ∈ A_f. We conclude that A_f is a vector subspace of Rⁿ.
It remains to prove the equivalence stated in the result. “If”. Suppose y ∈ A_f. By (24.47) and (24.50), we have for all x ∈ Rⁿ

f(x + y) = min_{ξ∈∂f(0)} ξ · (x + y) = min_{ξ∈∂f(0)} (ξ · x + ξ · y) = min_{ξ∈∂f(0)} ξ · x + f(y) = f(x) + f(y)

as desired. “Only if”. By taking x = −y in (24.49), we have 0 = f(0) = f(−y + y) = f(−y) + f(y), so f(y) = −f(−y).
24.9.3 Modelling bid-ask spreads

Setup

In Section 19.5 we studied a basic finance framework in which n primary assets L = {y₁, ..., yₙ} ⊆ Rᵏ are traded in a frictionless financial market. In contrast, we now allow for bid-ask spreads, a classic market friction in which primary assets might have different buying and selling prices. Buying one unit of asset j costs pⱼᵃ, the ask price, while selling one unit of the same asset j yields instead pⱼᵇ, the bid price, possibly with pⱼᵃ ≠ pⱼᵇ. In financial markets, this is a fairly common situation. For an everyday example, readers may think of buying and selling one unit of a currency, say euros for dollars, at a bank. The price of such operations – the exchange rate – applied by the bank will be different depending on whether we buy or sell one dollar; in particular, typically the price at which we buy is greater than the one at which we sell, so pⱼᵃ ≥ pⱼᵇ. Differences between ask and bid prices are called bid-ask spreads.
Here we thus assume that each primary asset j has bid and ask prices pⱼᵇ and pⱼᵃ, with pⱼᵃ ≥ pⱼᵇ ≥ 0. Set pᵇ = (p₁ᵇ, ..., pₙᵇ) ∈ Rⁿ₊ and pᵃ = (p₁ᵃ, ..., pₙᵃ) ∈ Rⁿ₊. The triple (L, pᵇ, pᵃ) describes a financial market with bid-ask spreads. If pⱼᵃ = pⱼᵇ for each j, we are back to the frictionless framework of Section 19.5.
Before moving on, a piece of notation based on joins and meets (Section 17.1): given a vector x ∈ Rⁿ, the positive vectors x⁺ = x ∨ 0 and x⁻ = −(x ∧ 0) are called the positive and negative parts of x, respectively. In terms of components, we have

xᵢ⁺ = max {xᵢ, 0}  and  xᵢ⁻ = −min {xᵢ, 0}

In words, the components of x⁺ coincide with the positive ones of x and are 0 otherwise. Similarly, the components of x⁻ coincide, in absolute value, with the negative ones of x and are 0 otherwise. It is immediate to check that x = x⁺ − x⁻. This decomposition can be interpreted as a trading strategy: if x denotes a portfolio, its positive and negative parts x⁺ and x⁻ describe the long and short positions that it involves, respectively – i.e., how much one has to buy and sell, respectively, of each primary asset to form portfolio x.

Example 1172 Let x = (1, 2, −3) ∈ R³ be a portfolio in a market with three primary assets. We have x⁺ = (1, 2, 0) and x⁻ = (0, 0, 3), so to form portfolio x one has to buy one unit of the first asset and two units of the second one, and to sell three units of the third asset. N
Market values

To describe how much it costs to form a portfolio x, we need the ask market value vₐ : Rⁿ → R defined by

vₐ(x) = Σ_{j=1}^n xⱼ⁺ pⱼᵃ − Σ_{j=1}^n xⱼ⁻ pⱼᵇ   ∀x ∈ Rⁿ   (24.51)

So, vₐ(x) is the cost of portfolio x. In particular, since each primary asset yⱼ corresponds to the portfolio eʲ, we have vₐ(eʲ) = pⱼᵃ. Note that we can attain the primary assets' holdings of portfolio x also by buying and selling according to any pair of positive vectors x′ and x″ such that x = x′ − x″. In this case, the cost of x would be

Σ_{j=1}^n x′ⱼ pⱼᵃ − Σ_{j=1}^n x″ⱼ pⱼᵇ   (24.52)

Example 1173 In the last example we noted that to form portfolio x = (1, 2, −3) one has to buy and sell the amounts prescribed by x⁺ = (1, 2, 0) and x⁻ = (0, 0, 3), respectively. At the same time, this portfolio can also be formed by buying an extra unit of the third asset and by selling the same extra unit of that asset. In other words, we have that x = x′ − x″, where x′ = (1, 2, 1) and x″ = (0, 0, 4). The cost of the first trading strategy is (24.51), while the cost of the second one is (24.52). N

A moment's reflection shows that there are actually infinitely many possible decompositions of x as a difference of two positive vectors x′ and x″. Each of them is a possible trading strategy that delivers the assets' holdings that portfolio x features. Of course, one would choose the cheapest among such trading strategies. The next result shows that the cheapest way to form portfolio x is, indeed, the one obtained by buying the amounts in x⁺ and selling those in x⁻. So, we can focus on them and forget about alternative buying and selling pairs x′ and x″.

Proposition 1174 The ask market value $v_a : \mathbb{R}^n \to \mathbb{R}$ is such that, for each $x \in \mathbb{R}^n$,

$$v_a(x) = \min\left\{\sum_{j=1}^n x_j' p_j^a - \sum_{j=1}^n x_j'' p_j^b : x', x'' \geq 0 \text{ and } x = x' - x''\right\}$$

Proof Define $\hat{v}_a : \mathbb{R}^n \to \mathbb{R}$ by

$$\hat{v}_a(x) = \inf\left\{\sum_{j=1}^n x_j' p_j^a - \sum_{j=1}^n x_j'' p_j^b : x', x'' \geq 0 \text{ and } x = x' - x''\right\}$$

On the one hand, since $x^+, x^- \geq 0$ and $x = x^+ - x^-$, it follows that $\hat{v}_a(x) \leq v_a(x)$ for all $x \in \mathbb{R}^n$. On the other hand, consider $x', x'' \geq 0$ such that $x = x' - x''$. It follows that $x' \geq x^+$ and $x'' \geq x^-$. Indeed, note that $x' = x + x''$. Let $i \in \{1, ..., n\}$. Since $x'' \geq 0$, it follows that $x_i' = x_i + x_i'' \geq x_i$. Since $x' \geq 0$, we thus have that $x_i' = \max\{x_i', 0\} \geq \max\{x_i, 0\} = x_i^+$. Since $i$ was arbitrarily chosen, we conclude that $x' \geq x^+$. Finally, since $x'' = x' - x^+ + x^-$, we conclude that $x'' \geq x^-$.

Now, define $w = x' - x^+$ and $v = x'' - x^-$. Clearly, we have $w \geq 0$, $v \geq 0$, and $w - v = 0$. This implies that

$$\sum_{j=1}^n x_j' p_j^a - \sum_{j=1}^n x_j'' p_j^b = \sum_{j=1}^n x_j^+ p_j^a + \sum_{j=1}^n w_j p_j^a - \sum_{j=1}^n v_j p_j^b - \sum_{j=1}^n x_j^- p_j^b$$
$$= v_a(x) + \sum_{j=1}^n w_j p_j^a - \sum_{j=1}^n v_j p_j^b = v_a(x) + \sum_{j=1}^n w_j p_j^a - \sum_{j=1}^n w_j p_j^b$$
$$= v_a(x) + \sum_{j=1}^n w_j \left(p_j^a - p_j^b\right) \geq v_a(x)$$

Since $x'$ and $x''$ were arbitrarily chosen, we conclude that $v_a(x) \leq \hat{v}_a(x)$, so $\hat{v}_a = v_a$. In particular, since the inf is attained at $x^+$ and $x^-$, we can replace it with a min.

The ask market value has a noteworthy property.

Proposition 1175 The ask market value $v_a : \mathbb{R}^n \to \mathbb{R}$ is sublinear.

Proof Consider $x, x' \in \mathbb{R}^n$. Note that $x^+ + (x')^+ \geq (x + x')^+ \geq 0$ and $x^- + (x')^- \geq (x + x')^- \geq 0$ (why?). At the same time, we have that

$$x + x' = x^+ - x^- + (x')^+ - (x')^- = \left(x^+ + (x')^+\right) - \left(x^- + (x')^-\right)$$

By Proposition 1174, we have

$$v_a(x + x') = \hat{v}_a(x + x') \leq \sum_{j=1}^n \left(x_j^+ + x_j'^+\right) p_j^a - \sum_{j=1}^n \left(x_j^- + x_j'^-\right) p_j^b$$
$$= \sum_{j=1}^n x_j^+ p_j^a - \sum_{j=1}^n x_j^- p_j^b + \sum_{j=1}^n x_j'^+ p_j^a - \sum_{j=1}^n x_j'^- p_j^b = v_a(x) + v_a(x')$$

proving subadditivity. Next, consider $x \in \mathbb{R}^n$ and $\alpha \geq 0$. Since $\alpha x^+ = (\alpha x)^+$ and $\alpha x^- = (\alpha x)^-$, we have that

$$v_a(\alpha x) = \sum_{j=1}^n (\alpha x)_j^+ p_j^a - \sum_{j=1}^n (\alpha x)_j^- p_j^b = \alpha \sum_{j=1}^n x_j^+ p_j^a - \alpha \sum_{j=1}^n x_j^- p_j^b = \alpha v_a(x)$$

proving positive homogeneity and the statement.

Let us take an alternative "bid" perspective: now $x \in \mathbb{R}^n$ is no longer a portfolio that we want to form, but rather a portfolio that we already hold and want to liquidate. How much are we going to cash in if we were to sell it on the market? The answer is given by the bid market value $v_b : \mathbb{R}^n \to \mathbb{R}$ defined by

$$v_b(x) = \sum_{j=1}^n x_j^+ p_j^b - \sum_{j=1}^n x_j^- p_j^a$$

In particular, we have $v_b(e^j) = p_j^b$ for each primary asset $j$. There is a tight relationship between bid and ask market values, as we show next.

Proposition 1176 We have $v_b \leq v_a$, with

$$v_b(x) = -v_a(-x) \qquad \forall x \in \mathbb{R}^n \tag{24.53}$$

In particular, $v_b$ is superlinear.

So, ask and bid market values are one the dual of the other. The superlinearity of $v_b$ is a first dividend of this duality.

Proof If $x \in \mathbb{R}^n$, then, since $(-x)^+ = x^-$ and $(-x)^- = x^+$,

$$-v_a(-x) = -\left(\sum_{j=1}^n (-x)_j^+ p_j^a - \sum_{j=1}^n (-x)_j^- p_j^b\right) = -\left(\sum_{j=1}^n x_j^- p_j^a - \sum_{j=1}^n x_j^+ p_j^b\right)$$
$$= \sum_{j=1}^n x_j^+ p_j^b - \sum_{j=1}^n x_j^- p_j^a = v_b(x)$$

proving the first part of the statement. Consider now $x, x' \in \mathbb{R}^n$. Since $v_a$ is sublinear, we have that $v_a(-x - x') \leq v_a(-x) + v_a(-x')$, yielding that

$$v_b(x + x') = -v_a(-x - x') \geq -v_a(-x) - v_a(-x') = v_b(x) + v_b(x')$$

proving that $v_b$ is superadditive. Finally, consider $x \in \mathbb{R}^n$ and $\alpha \geq 0$. Since $v_a$ is sublinear, we have that

$$v_b(\alpha x) = -v_a(-\alpha x) = -v_a(\alpha(-x)) = -\alpha v_a(-x) = \alpha v_b(x)$$

proving positive homogeneity.
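A quick numerical spot-check of the duality (24.53) and of the inequality $v_b \leq v_a$, on randomly generated portfolios and hypothetical prices (a sanity check, of course, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
p_bid = rng.uniform(0.5, 1.0, size=5)
p_ask = p_bid + rng.uniform(0.0, 0.2, size=5)   # enforce p^a >= p^b >= 0

def v_ask(x):  # formula (24.51)
    return np.maximum(x, 0) @ p_ask - np.maximum(-x, 0) @ p_bid

def v_bid(x):  # bid market value
    return np.maximum(x, 0) @ p_bid - np.maximum(-x, 0) @ p_ask

for _ in range(1000):
    x = rng.normal(size=5)
    assert np.isclose(v_bid(x), -v_ask(-x))   # duality (24.53)
    assert v_bid(x) <= v_ask(x) + 1e-12       # v_b <= v_a
```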

By Proposition 1171, the set of portfolios without bid-ask spreads $\{x \in \mathbb{R}^n : v_b(x) = v_a(x)\}$ is a vector subspace of $\mathbb{R}^n$ over which the bid and ask market values are linear.

Law of one price

The Law of one price continues to be key also in the presence of bid-ask spreads. Recall that the payoff operator $R : \mathbb{R}^n \to \mathbb{R}^k$ defined by $R(x) = \sum_{j=1}^n x_j y_j$ is a linear operator that describes the contingent claim determined by each portfolio $x$ (Section 19.5). In particular, its image $W = \operatorname{Im} R$ is the set of replicable claims.

Definition 1177 The financial market $(L, p^b, p^a)$ satisfies the Law of one price (LOP) if, for all portfolios $x, x' \in \mathbb{R}^n$, we have

$$R(x) = R(x') \implies v_a(x) = v_a(x') \tag{24.54}$$

or, equivalently,

$$R(x) = R(x') \implies v_b(x) = v_b(x') \tag{24.55}$$

Conditions (24.54) and (24.55) are equivalent because of the bid-ask duality (24.53), so the definition is well posed. Note that if $p_i^a = p_i^b$ for all $i$, then we get back to the LOP of Section 19.5 since $v_a = v$. The rationale behind this more general version of the LOP is, mutatis mutandis, the same: portfolios that induce the same contingent claims should have the same market value, whether we form or liquidate them.
In a market with bid-ask spreads, the LOP allows us to define a pair of pricing rules. Specifically, the ask pricing rule $f_a : W \to \mathbb{R}$ and the bid pricing rule $f_b : W \to \mathbb{R}$ are the functions that associate to each replicable contingent claim $w \in W$ its ask and bid prices, respectively. That is, for each $w \in W$ we have

$$f_a(w) = v_a(x) \quad \text{and} \quad f_b(w) = v_b(x)$$

where $x \in R^{-1}(w)$. Clearly, we have $f_b \leq f_a$ and, by the bid-ask duality (24.53), the pricing rules are dual as well:

$$f_b(w) = -f_a(-w) \qquad \forall w \in W \tag{24.56}$$

Next we show that they also inherit the shape of their corresponding market values.

Theorem 1178 Suppose the financial market $(L, p^b, p^a)$ satisfies the LOP. Then, the ask pricing rule $f_a : W \to \mathbb{R}$ is sublinear and the bid pricing rule $f_b : W \to \mathbb{R}$ is superlinear.

In sum, the pricing of contingent claims made possible by the LOP inherits the bid and
ask duality of the underlying market values.

Proof First, we verify that $f_a$ is well defined. In other words, we are going to check that to each vector $w$ of $W$ the rule defining $f_a$ assigns one and only one value. Indeed, assume that there exist $x, x' \in \mathbb{R}^n$ such that $R(x) = w = R(x')$. The potential issue could be that $v_a(x) \neq v_a(x')$. But the LOP exactly prevents this from happening. Next, consider $w, w' \in W$. By definition, there exist $x, x' \in \mathbb{R}^n$ such that $R(x) = w$ and $R(x') = w'$. Since $R$ is linear, we also have that $R(x + x') = R(x) + R(x') = w + w'$. Since $v_a$ is sublinear, this yields that

$$f_a(w + w') = v_a(x + x') \leq v_a(x) + v_a(x') = f_a(w) + f_a(w')$$

proving that $f_a$ is subadditive. Consider now $w \in W$ and $\alpha \geq 0$. By definition, there exists $x \in \mathbb{R}^n$ such that $R(x) = w$. Since $R$ is linear, we also have that $R(\alpha x) = \alpha R(x) = \alpha w$. Since $v_a$ is sublinear, this yields that

$$f_a(\alpha w) = v_a(\alpha x) = \alpha v_a(x) = \alpha f_a(w)$$

proving that $f_a$ is positively homogeneous. We conclude that $f_a$ is sublinear. By the same arguments, the function $f_b$ is also well defined. By the bid-ask duality (24.56), $f_b$ turns out to be superlinear.

Pricing kernels

In Theorem 1169 we established a representation result for superlinear functions that we can now use to provide a representation result for ask and bid pricing rules that generalizes Theorem 900. Recall that the financial market is complete when $W = \mathbb{R}^k$.

Theorem 1179 Suppose the financial market $(L, p^b, p^a)$ is complete and satisfies the LOP. Then, there exists a unique non-empty, compact, and convex set $C \subseteq \mathbb{R}^k$ such that

$$f_a(w) = \max_{\pi \in C} \pi \cdot w \quad \text{and} \quad f_b(w) = \min_{\pi \in C} \pi \cdot w$$

for all $w \in \mathbb{R}^k$. In particular, $C = \partial f_b(0)$.

Compared to the linear case of Section 19.5, bid-ask spreads result in a multiplicity of pricing kernels $\pi$, given by the set $C$. In particular, the ask and bid prices of a claim $w$ can be expressed as $f_a(w) = \pi_w^a \cdot w$ and $f_b(w) = \pi_w^b \cdot w$ via pricing kernels $\pi_w^a$ and $\pi_w^b$ in $C$ that, respectively, attain the maximum and the minimum for the linear pricing $\pi \cdot w$.

Proof Consider $f_b : \mathbb{R}^k \to \mathbb{R}$. By Theorem 1169 and since $f_b : \mathbb{R}^k \to \mathbb{R}$ is superlinear, there exists a unique non-empty, compact, and convex set $C \subseteq \mathbb{R}^k$ such that

$$f_b(w) = \min_{\pi \in C} \pi \cdot w \qquad \forall w \in \mathbb{R}^k$$

where $C = \partial f_b(0)$. Since $f_a(w) = -f_b(-w)$ for all $w \in \mathbb{R}^k$, it follows that

$$f_a(w) = -f_b(-w) = -\min_{\pi \in C} \pi \cdot (-w) = \max_{\pi \in C} \pi \cdot w \qquad \forall w \in \mathbb{R}^k$$

proving the statement also for $f_a$.
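When the set $C$ is a polytope described by finitely many candidate pricing kernels, the maximum and the minimum of a linear function over it are attained at its vertices, so the bid and ask pricing rules can be evaluated directly. A minimal sketch, with a hypothetical set of kernels in $\mathbb{R}^2$:

```python
import numpy as np

# Hypothetical vertices of the kernel set C in R^2 (k = 2 states).
C_vertices = np.array([[0.2, 0.8],
                       [0.5, 0.5],
                       [0.7, 0.3]])

def f_ask(w):
    return np.max(C_vertices @ w)   # f_a(w) = max over pi in C of pi . w

def f_bid(w):
    return np.min(C_vertices @ w)   # f_b(w) = min over pi in C of pi . w

w = np.array([1.0, -2.0])
print(f_ask(w), f_bid(w), -f_ask(-w))   # f_b(w) equals -f_a(-w), duality (24.56)
```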

Let us continue to consider a complete market. In such a market there are no arbitrages I if, for all $x, x' \in \mathbb{R}^n$,

$$R(x') \geq R(x) \implies v_a(x') \geq v_a(x) \tag{24.57}$$

or, equivalently,²³ if

$$R(x') \geq R(x) \implies v_b(x') \geq v_b(x) \tag{24.58}$$

²³ To see the equivalence, note that $R(x') \geq R(x) \implies R(-x) \geq R(-x') \implies v_a(-x) \geq v_a(-x') \implies -v_a(-x') \geq -v_a(-x) \implies v_b(x') \geq v_b(x)$.

Without bid-ask spreads, the unique pricing rule is linear, so each of these two conditions reduces to (19.19) because for linear functions positivity and monotonicity are equivalent properties (Proposition 538). Here we need to make explicit the monotonicity assumption that in the linear case was implicitly assumed.
It is easy to see that the no arbitrage conditions (24.57) and (24.58) imply the LOPs (24.54) and (24.55). Under such stronger conditions we can get a stronger version of the last result in which the pricing kernels are positive, thus generalizing Proposition 903.

Proposition 1180 Suppose the financial market $(L, p^b, p^a)$ is complete and has no arbitrages I. Then, there exists a non-empty, compact, and convex set $C \subseteq \mathbb{R}_+^k$ such that

$$f_a(w) = \max_{\pi \in C} \pi \cdot w \quad \text{and} \quad f_b(w) = \min_{\pi \in C} \pi \cdot w$$

for all $w \in \mathbb{R}^k$. If, in addition, the risk-free contingent claim $1$ has no bid-ask spread, with $f_a(1) = f_b(1) = 1$, then $C \subseteq \Delta_{k-1}$.

Since the market is complete, by Proposition 1171 the set of contingent claims without bid-ask spreads $A_f = \{w \in \mathbb{R}^k : f_b(w) = f_a(w)\}$ is a vector subspace of $\mathbb{R}^k$ over which the bid and ask pricing rules are linear. The second part of the result says that if the constant (so, risk free) contingent claim $1$ belongs to such subspace and if its price is normalized to $1$, then the pricing kernels are actually probability measures.²⁴

Proof Under condition (24.57), the superlinear function $f_b$ is easily seen to be increasing. By Theorem 1169-(i), we then have $C = \partial f_b(0) \subseteq \mathbb{R}_+^k$. If $1 \in A_f$, then $f_b$ is translation invariant. By Theorem 1169-(iv), we then have $C = \partial f_b(0) \subseteq \Delta_{k-1}$ provided $f_a(1) = f_b(1) = 1$.

Finally, the absence of arbitrages II is here modelled via strict monotonicity. So, the resulting nonlinear version of the Fundamental Theorem of Finance, in which $C \subseteq \mathbb{R}_{++}^k$, relies on Theorem 1169-(iii). We leave the details to readers.

24.10 Ultracoda: strong concavity

In this final coda section we introduce a strong form of concavity that will turn out to have remarkable optimality properties.

Definition 1181 A function $f : C \to \mathbb{R}$ defined on a convex set of $\mathbb{R}^n$ is said to be strongly concave if there exists $k > 0$ such that the function $g : C \to \mathbb{R}$ defined by

$$g(x) = f(x) + k \|x\|^2$$

is concave.

The next result shows the strength of this notion of concavity.

Proposition 1182 Strongly concave functions are strictly concave.


²⁴ A similar normalization holds in Proposition 903, as the reader can check.

So, strong concavity $\implies$ strict concavity $\implies$ concavity. Intuitively, a strongly concave function is "so concave" that it remains concave even when the quadratic, so strictly convex, function $k\|x\|^2$ is added to it. Note that the sum of a concave function and of a strongly concave function is strongly concave, so there is a simple way to construct strongly concave functions.

Proof Let $f : C \to \mathbb{R}$ be strongly concave. By definition, there exists $k > 0$ such that the function $g : C \to \mathbb{R}$ defined by $g(x) = f(x) + k\|x\|^2$ is concave. Let $x, y \in C$, with $x \neq y$, and $\lambda \in (0, 1)$. Since $\|x\|^2 = \sum_{i=1}^n x_i^2$ is strictly convex, we have

$$f(\lambda x + (1 - \lambda) y) = g(\lambda x + (1 - \lambda) y) - k\|\lambda x + (1 - \lambda) y\|^2$$
$$> \lambda g(x) + (1 - \lambda) g(y) - k\left(\lambda \|x\|^2 + (1 - \lambda) \|y\|^2\right)$$
$$= \lambda f(x) + (1 - \lambda) f(y)$$

as desired.

Strong concavity is, thus, a strong version of strict concavity. The next result shows the great interest of such a stronger version.

Proposition 1183 Let $f : C \to \mathbb{R}$ be strongly concave and upper semicontinuous on a closed convex set of $\mathbb{R}^n$. Then, $f$ is coercive (supercoercive when $C = \mathbb{R}^n$).

In Example 811 we showed that the function $f(x) = 1 - x^2$ is coercive. Since this function is easily seen to be strongly concave, the example can now be seen as an illustration of the proposition just stated.
The proof relies on a lemma of independent interest.

Lemma 1184 An upper semicontinuous and concave function $f : C \to \mathbb{R}$ admits a dominating affine function $r : C \to \mathbb{R}$, i.e., $r \geq f$.

Proof Since $f$ is concave and upper semicontinuous, the convex set $\operatorname{hypo} f$ is closed. For, let $\{(x_n, t_n)\} \subseteq \operatorname{hypo} f$ be such that $(x_n, t_n) \to (x, t) \in \mathbb{R}^{n+1}$. We need to show that $(x, t) \in \operatorname{hypo} f$. By definition, $t_n \leq f(x_n)$ for each $n \geq 1$, so $t = \lim t_n \leq \limsup f(x_n) \leq f(x)$ because $f$ is upper semicontinuous. This shows that $(x, t) \in \operatorname{hypo} f$.
Let $(x_0, t_0) \notin \operatorname{hypo} f$, with $x_0 \in C$ and $t_0 > f(x_0)$. By Proposition 824, there exist $(a, c) \in \mathbb{R}^{n+1}$, $b \in \mathbb{R}$, and $\varepsilon > 0$ such that

$$a \cdot x_0 + c t_0 \geq b + \varepsilon > b \geq a \cdot x + c t \qquad \forall (x, t) \in \operatorname{hypo} f \tag{24.59}$$

We have $c > 0$. For, suppose that $c = 0$. Then, $a \cdot x_0 \geq b + \varepsilon > b \geq a \cdot x$ for all $x \in C$, so in particular $a \cdot x_0 > a \cdot x_0$ by taking $x = x_0$, a contradiction. Next, suppose $c < 0$. Again by taking $x = x_0$ and $t = f(x_0)$, from (24.59) it follows that $a \cdot x_0 + c t_0 > a \cdot x_0 + c f(x_0)$, so $c t_0 > c f(x_0)$ and thus $t_0 < f(x_0)$, which contradicts $t_0 > f(x_0)$.
In sum, $c > 0$. Without loss of generality, set $c = 1$. Define the affine function $r : C \to \mathbb{R}$ by $r(x) = a \cdot (x_0 - x) + t_0$. We then have $r(x) \geq t$ for all $(x, t) \in \operatorname{hypo} f$. In particular, this is the case for $(x, f(x))$ for all $x \in C$, so $r(x) \geq f(x)$ for all $x \in C$. We conclude that $r$ is the sought-after affine function.

Proof of Proposition 1183 We first show that every upper contour set $(f \geq c)$ is bounded. Suppose, by contradiction, that there exists an unbounded sequence $\{x_n\} \subseteq (f \geq c)$, i.e., such that $\|x_n\| \to +\infty$. Since $g(x) = f(x) + k\|x\|^2$ is concave and upper semicontinuous, by the previous lemma there is an affine function $r : C \to \mathbb{R}$, with $r(x) = a \cdot x + b$ for some $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, such that $r \geq g$. So, $a \cdot x_n + b \geq f(x_n) + k\|x_n\|^2$ for all $n$. By the Cauchy-Schwarz inequality we have $a \cdot x_n \leq \|a\| \|x_n\|$, so

$$c \leq f(x_n) \leq a \cdot x_n + b - k\|x_n\|^2 \leq \|a\| \|x_n\| + b - k\|x_n\|^2 = b - \|x_n\| \left(k\|x_n\| - \|a\|\right)$$

Then $f(x_n) \to -\infty$ as $\|x_n\| \to +\infty$ because $\|x_n\|(k\|x_n\| - \|a\|) \to +\infty$ as $\|x_n\| \to +\infty$. But this contradicts $f(x_n) \geq c$ for all $n \geq 1$. We conclude that $(f \geq c)$ is bounded. Since $f$ is upper semicontinuous and $C$ is closed, the set $(f \geq c)$ is also closed (Proposition 871), so compact. This proves that $f$ is coercive. Finally, since we proved that $f(x_n) \to -\infty$ as $\|x_n\| \to +\infty$, when $C = \mathbb{R}^n$ the function $f$ is supercoercive.

By Tonelli's Theorem, we then have the following remarkable existence and uniqueness result that combines the best of the two worlds of coercivity and concavity: strict concavity ensures the existence of at most one maximizer, strong concavity ensures via coercivity that such a maximizer indeed exists.

Theorem 1185 Let $f : C \to \mathbb{R}$ be strongly concave and upper semicontinuous on a closed convex set of $\mathbb{R}^n$. Then, $f$ has a unique maximizer in $C$, that is, there exists a unique $\hat{x} \in C$ such that $f(\hat{x}) = \max_{x \in C} f(x)$.
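As a numerical illustration of Theorem 1185, the following sketch maximizes a simple strongly concave function on $\mathbb{R}^2$ (the function and the starting point are ours, not the book's); the solver recovers the unique maximizer:

```python
import numpy as np
from scipy.optimize import minimize

# f(x) = -(x1 - 1)^2 - 2*(x2 + 0.5)^2 is strongly concave on R^2
# (its Hessian is diag(-2, -4)), so Theorem 1185 gives a unique maximizer.
f = lambda x: -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2

res = minimize(lambda x: -f(x), x0=np.zeros(2))  # maximize f by minimizing -f
print(res.x)  # approximately (1.0, -0.5), the unique maximizer
```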

In view of this remarkable result one may wonder whether there are strong concavity criteria. The next result shows that this is, indeed, the case.

Proposition 1186 A twice differentiable function $f : C \to \mathbb{R}$ defined on an open convex set of $\mathbb{R}^n$ is strongly concave if and only if there exists $c < 0$ such that

$$y \cdot \nabla^2 f(x) y \leq c \|y\|^2 \qquad \forall x \in C, \ \forall y \in \mathbb{R}^n \tag{24.60}$$

that is, if and only if the matrix $\nabla^2 f(x) - cI$ is negative semidefinite for all $x \in C$.

In particular, a twice differentiable scalar function $f$ is strongly concave if and only if there exists $c < 0$ such that $f''(x) \leq c < 0$ for all $x \in C$. In words, strong concavity amounts to a uniformly strictly negative second derivative.

Proof The function $f$ is strongly concave if and only if $g(x) = f(x) + k\|x\|^2$ is concave for some $k > 0$, i.e., if and only if $y \cdot \nabla^2 g(x) y \leq 0$ for all $x \in C$ and all $y \in \mathbb{R}^n$ (Proposition 1120). Some simple algebra shows that $\nabla^2 g(x) = \nabla^2 f(x) + 2kI$, where $I$ is the identity matrix of order $n$ (note that $\|x\|^2 = x \cdot I x$). In turn, this implies the result by setting $c = -2k$.
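In practice, condition (24.60) can be checked by bounding the largest eigenvalue of the Hessian away from zero. A sketch for the hypothetical function $f(x) = -x_1^2 - x_1^4 - x_2^2$, whose Hessian is $\operatorname{diag}(-2 - 12x_1^2, -2)$:

```python
import numpy as np

def max_hessian_eigenvalue(hess, points):
    # Largest Hessian eigenvalue over a grid of sample points.
    return max(np.linalg.eigvalsh(hess(x)).max() for x in points)

# Hessian of f(x) = -x1^2 - x1^4 - x2^2
hess = lambda x: np.array([[-2.0 - 12.0 * x[0] ** 2, 0.0],
                           [0.0, -2.0]])
grid = [np.array([a, b]) for a in np.linspace(-1, 1, 21)
                         for b in np.linspace(-1, 1, 21)]
print(max_hessian_eigenvalue(hess, grid))  # <= -2 < 0: condition (24.60) holds
```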

A nice application of strong concavity is a far-reaching generalization of the Projection Theorem for closed convex sets.

Theorem 1187 (Projection Theorem) Let $C$ be a closed and convex set of $\mathbb{R}^n$. For every $x \in \mathbb{R}^n$, the optimization problem

$$\min_y \|x - y\| \quad \text{sub} \quad y \in C \tag{24.61}$$

has a unique solution $m \in C$, characterized by the condition

$$(x - m) \cdot (m - y) \geq 0 \qquad \forall y \in C \tag{24.62}$$

The solution $m$ of the minimization problem (24.61) is called the projection of $x$ onto $C$. We can define an operator $P_C : \mathbb{R}^n \to \mathbb{R}^n$, called projection, that associates to each vector $x \in \mathbb{R}^n$ its projection $P_C(x) \in C$. This notion of projection generalizes the one studied earlier in the book (Section 19.2) because this version of the Projection Theorem generalizes the earlier one for vector subspaces. Indeed, the next simple result shows that when $C$ is a vector subspace condition (24.62) reduces to the orthogonality of the error – i.e., to the condition $(x - m) \perp C$ – that characterized the solution of the earlier version of the Projection Theorem.

Proposition 1188 If $C$ is a vector subspace, condition (24.62) is equivalent to $(x - m) \perp C$.

Proof Let $C$ be a vector subspace. By taking $y = 0$ and $y = 2m$, condition (24.62) is easily seen to imply $(x - m) \cdot m = 0$. So, $(x - m) \cdot (m - y) = -(x - m) \cdot y \geq 0$ for all $y \in C$. Fix $y \in C$. Then, $-(x - m) \cdot ty \geq 0$ for $t = \pm 1$, so $(x - m) \cdot y = 0$. Since $y$ was arbitrarily chosen, we conclude that $(x - m) \cdot y = 0$ for all $y \in C$, i.e., $(x - m) \perp C$.
Conversely, assume $(x - m) \perp C$. Then, $(x - m) \cdot (m - y) = (x - m) \cdot m$ for all $y \in C$. Since $m \in C$, from $(x - m) \perp C$ it follows in particular that $(x - m) \cdot m = 0$. We conclude that $(x - m) \cdot (m - y) = 0$ for all $y \in C$, so condition (24.62) holds.

To prove this general form of the Projection Theorem, given an $x \in \mathbb{R}^n$ we consider the function $f : \mathbb{R}^n \to \mathbb{R}$ defined by $f(y) = -\|x - y\|^2$. Problem (24.61) can be rewritten as

$$\max_y f(y) \quad \text{sub} \quad y \in C \tag{24.63}$$

Thanks to the following lemma, we can apply Theorem 1185 to this optimization problem.²⁵

Lemma 1189 The function $f$ is strongly concave.

Proof Simple algebra shows that $\nabla^2 f(y) = -2I$ for all $y \in \mathbb{R}^n$, so $z \cdot \nabla^2 f(y) z = -2\|z\|^2$ for all $y, z \in \mathbb{R}^n$. By taking $c = -2$, condition (24.60) is satisfied. This proves that $f$ is strongly concave.

Proof of the Projection Theorem In view of the previous lemma, by Theorem 1185 there exists a unique solution $m \in C$ of the optimization problem (24.61). Clearly,

$$\|x - m\|^2 \leq \|x - y\|^2 \qquad \forall y \in C \tag{24.64}$$

It remains to show that conditions (24.62) and (24.64) are equivalent, so that condition (24.62) characterizes the minimizer $m$.²⁶ Fix any $y \in C$ and let $y_t = ty + (1 - t)m$ for $t \in [0, 1]$. From (24.64) it follows that, for each $t \in (0, 1]$, we have

$$0 \leq \|x - y_t\|^2 - \|x - m\|^2 = \|m - y_t\|^2 + 2(x - m) \cdot (m - y_t)$$
$$= \|ty + (1 - t)m - m\|^2 + 2(x - m) \cdot (m - ty - (1 - t)m)$$
$$= t^2 \|m - y\|^2 + 2t (x - m) \cdot (m - y)$$

In turn, this implies that

$$-t \|m - y\|^2 \leq 2 (x - m) \cdot (m - y) \qquad \forall t \in (0, 1]$$

By letting $t$ go to $0$, we thus have $(x - m) \cdot (m - y) \geq 0$. Since $y$ was arbitrarily chosen, we conclude that (24.62) holds. Conversely, assume (24.62). For all $y \in C$ we have

$$\|x - m\|^2 - \|x - y\|^2 = \|x - m\|^2 - \|(x - m) + (m - y)\|^2$$
$$= \|x - m\|^2 - \left(\|x - m\|^2 + \|m - y\|^2 + 2(x - m) \cdot (m - y)\right)$$
$$= -\|m - y\|^2 - 2(x - m) \cdot (m - y)$$

Thus, (24.62) implies $\|x - m\|^2 - \|x - y\|^2 \leq 0$, so (24.64). Summing up, we proved that conditions (24.62) and (24.64) are equivalent.

²⁵ The reader should compare this result with Lemma 886. In a similar vein, the function of Lemma 855 can be shown to be strongly concave. In these cases, strong concavity combines strict concavity and coercivity, thus confirming its dual role across concavity and coercivity.
²⁶ Here we follow Zarantonello (1971).

Example 1190 Let $C = \{x \in \mathbb{R}^n : Ax = b\}$ be the affine set determined by an $m \times n$ matrix $A$, with $m \leq n$ (cf. Proposition 666). If $A$ has full rank, i.e., $\rho(A) = m$, then

$$P_C(x) = x + A^T \left(A A^T\right)^{-1} (b - Ax) \qquad \forall x \in \mathbb{R}^n \tag{24.65}$$

In particular, if $m = 1$ so that $C = \{x \in \mathbb{R}^n : a \cdot x = b\}$, we have

$$P_C(x) = x + \frac{b - a \cdot x}{\|a\|^2} a \qquad \forall x \in \mathbb{R}^n$$

To prove (24.65), consider the optimization problem

$$\min_y \|x - y\|^2 \quad \text{sub} \quad y \in C$$

The Lagrangian is $L(y, \lambda) = -\|x - y\|^2 + \lambda \cdot (b - Ay)$, so $\nabla_y L(y, \lambda) = 2(x - y) - A^T \lambda$. The first order condition (29.17) is then

$$2(x - y) = A^T \lambda$$
$$Ay = b$$

By multiplying the first equation by $A$, it becomes $2A(x - y) = A A^T \lambda$. Since $\rho(A) = m$, we have $\rho(A A^T) = m$ (cf. Proposition 582, recalling that $\rho(A A^T) = \rho(A^T A) = \rho(A)$). So, the matrix $A A^T$ is invertible and, by solving for $\lambda$, we then get $\lambda = 2 (A A^T)^{-1} A (x - y)$. By replacing this value of $\lambda$ in the first equation, we get

$$x - y = A^T \left(A A^T\right)^{-1} A (x - y) = A^T \left(A A^T\right)^{-1} (Ax - Ay) = A^T \left(A A^T\right)^{-1} (Ax - b)$$

Thus, $y = x + A^T \left(A A^T\right)^{-1} (b - Ax)$ solves the optimization problem (cf. Theorem 1314). N
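Formula (24.65) is immediate to implement. A sketch for a hypothetical one-constraint affine set in $\mathbb{R}^3$, which also spot-checks the orthogonality of the error:

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0]])       # C = {x in R^3 : x1 + x2 + x3 = 1}
b = np.array([1.0])
x = np.array([3.0, -1.0, 2.0])

m = x + A.T @ np.linalg.solve(A @ A.T, b - A @ x)   # P_C(x) via (24.65)
print(m)                         # (2, -2, 1)
print(A @ m)                     # equals b: m lies on C

y = np.array([1.0, 0.0, 0.0])    # any other point of C
print((x - m) @ (m - y))         # 0: the error x - m is orthogonal to C - m
```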

Example 1191 Let $C = \mathbb{R}_+^n$ be the positive orthant. Then,

$$P_C(x) = x^+ \qquad \forall x \in \mathbb{R}^n \tag{24.66}$$

where $x^+ = \max\{x, 0\}$ is the positive part of the vector $x$. For instance, if $n = 3$ we have $P_C(1, -3, 2) = (1, 0, 2)$. To verify the form of this projection, we use the characterization (24.62). So, let $m \geq 0$ be such that

$$(x - m) \cdot (m - y) \geq 0 \qquad \forall y \in \mathbb{R}_+^n$$

We want to show that $m = x^+$. By setting $y = 0$ and $y = 2m$, we get $(x - m) \cdot m \geq 0$ and $-(x - m) \cdot m \geq 0$, respectively. So, $(x - m) \cdot m = 0$. By setting $y = e^i$, we then have $0 \leq (x - m) \cdot m - (x_i - m_i) = -(x_i - m_i)$, so $m \geq x$. In turn, from $0 = \sum_{i=1}^n (x_i - m_i) m_i$, where each summand $(x_i - m_i) m_i \leq 0$, it then follows that $m_i = x_i$ if $x_i > 0$ and $m_i = 0$ if $x_i \leq 0$. That is, $m = x^+$. N
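And a spot-check of (24.66) against the characterization (24.62), on the vector used above and random positive $y$:

```python
import numpy as np

x = np.array([1.0, -3.0, 2.0])
m = np.maximum(x, 0.0)           # P_C(x) = x^+ = (1, 0, 2), formula (24.66)

rng = np.random.default_rng(1)
for _ in range(1000):
    y = rng.uniform(0.0, 5.0, size=3)   # random point of the positive orthant
    assert (x - m) @ (m - y) >= -1e-12  # characterization (24.62)
```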

Finally, there is a dual notion of strong convexity: a function $f : C \to \mathbb{R}$ defined on a convex set of $\mathbb{R}^n$ is said to be strongly convex if there exists $k > 0$ such that the function $g : C \to \mathbb{R}$ defined by $g(x) = f(x) - k\|x\|^2$ is convex. Since $f$ is strongly convex if and only if $-f$ is strongly concave, readers can check that dual versions of the results of this section hold for strongly convex functions.
Chapter 25

Implicit functions

25.1 The problem


So far we have studied scalar functions $f : A \subseteq \mathbb{R} \to \mathbb{R}$ by writing them in explicit form:

$$y = f(x)$$

This form separates the independent variable $x$ from the dependent one $y$, so it permits to determine the values of the latter from those of the former. The same function can be rewritten in implicit form through an equation that keeps all the variables on the same side of the equality sign:

$$g(x, f(x)) = 0$$

where $g$ is a function of two variables defined by

$$g(x, y) = f(x) - y$$

Example 1192 (i) The function $f(x) = x^2 + x - 3$ can be written in implicit form as $g(x, f(x)) = 0$ with $g(x, y) = x^2 + x - 3 - y$. (ii) The function $f(x) = 1 + \log x$ can be written in implicit form as $g(x, f(x)) = 0$ with $g(x, y) = 1 + \log x - y$. N

Note that

$$g^{-1}(0) \cap (A \times \operatorname{Im} f) = \operatorname{Gr} f$$

The graph of the function $f$ thus coincides with the level curve $g^{-1}(0)$ of the function $g$ of two variables.¹

¹ The rectangle $A \times \operatorname{Im} f$ has as its factors – its edges, geometrically – the domain and the image of $f$. Clearly, $\operatorname{Gr} f \subseteq A \times \operatorname{Im} f$. For example, for the function $f(x) = \sqrt{x}$ this rectangle is the first orthant $\mathbb{R}_+^2$ of the plane, while for the function $f(x) = \sqrt{x - x^2}$ it is the rectangle $[0, 1] \times [0, 1/2]$ of the plane.

Example 1193 Consider the function $f : [-1, 1] \to \mathbb{R}$ defined by $f(x) = 1 - x^2$, whose graph is the parabola inscribed in the rectangle $A \times \operatorname{Im} f = [-1, 1] \times [0, 1]$. We can write $f$ in implicit form as $g(x, 1 - x^2) = 0$ with $g : \mathbb{R}^2 \to \mathbb{R}$ defined by $g(x, y) = 1 - x^2 - y$. Since $g^{-1}(0) = \{(x, y) \in \mathbb{R}^2 : 1 - x^2 = y\}$, we then have

$$g^{-1}(0) \cap (A \times \operatorname{Im} f) = \{(x, y) \in [-1, 1] \times [0, 1] : 1 - x^2 = y\} = \operatorname{Gr} f$$
The implicit rewriting of a scalar function $f$ whose explicit form is known is nothing more than a curiosity because the explicit form contains all the relevant information on $f$, in particular on the dependence between the independent variable $x$ and the dependent one $y$. Unfortunately, applications often feature important scalar functions that are not given in "ready to use" explicit form, but only in implicit form through equations $g(x, y) = 0$. For this reason, it is important to consider the inverse problem: does an equation of the type $g(x, y) = 0$ implicitly define a scalar function $f$? In other words, does there exist $f$ such that $g(x, f(x)) = 0$? If so, which properties does it have? For instance, is it unique? Is it convex or concave? Is it differentiable?
This chapter will address these motivating questions by showing that, under suitable regularity conditions, this function $f$ exists and is unique (locally or globally, as it will become clear) and that it may enjoy remarkable properties. As usual, we will emphasize a global viewpoint, the one most relevant for applications.

An important preliminary observation: there is a close connection between implicit functions and level curves that permits to express in functional terms the properties of the level curves, a most useful way to describe such properties (cf. Section 25.3.2 below). Because of its importance, in the next lemma we make this connection rigorous. Note that the role that the sets $A$ and $B$ play in the lemma is to be, respectively, the domain and codomain of the implicit functions considered. In other words, the lemma considers functions $f : A \to B$ that belong to a posited space $B^A$ (cf. Section 6.3.2). It is a purely set theoretic result, so in the statement we consider generic sets $A$, $B$, $C$ and $D$.

Proposition 1194 Let $g : C \to D$ with $A \times B \subseteq C$ and let $k \in D$. For a function $f : A \to B$ the following two properties are equivalent:

(i) $f$ is the unique function in $B^A$ with the property

$$g(x, f(x)) = k \qquad \forall x \in A \tag{25.1}$$

(ii) $f$ satisfies the equality

$$g^{-1}(k) \cap (A \times B) = \operatorname{Gr} f \tag{25.2}$$

Condition (25.2) amounts to saying that

$$g(x, y) = k \iff y = f(x) \qquad \forall (x, y) \in A \times B$$

that is, the level curve $g^{-1}(k)$ of the function $g$ is described on the rectangle $A \times B$ by the function of a single variable $f$. Thus, $f$ provides a "functional description" of this level curve that specifies the relationship existing between the arguments $x$ and $y$ of $g$ when they belong to $g^{-1}(k)$. By the lemma, for a function $f$ to satisfy condition (25.1) thus amounts to providing such a functional description of the level curve.

Proof (i) implies (ii). We first show that $\operatorname{Gr} f \subseteq g^{-1}(k) \cap (A \times B)$. Let $(x, y) \in \operatorname{Gr} f$. By definition, $(x, y) \in A \times B$ and $y = f(x)$, thus $g(x, y) = g(x, f(x)) = k$. This implies $(x, y) \in g^{-1}(k) \cap (A \times B)$, so $\operatorname{Gr} f \subseteq g^{-1}(k) \cap (A \times B)$. As to the converse inclusion, let $(\bar{x}, \bar{y}) \in g^{-1}(k) \cap (A \times B)$. We want to show that $\bar{y} = f(\bar{x})$. Suppose not, i.e., $\bar{y} \neq f(\bar{x})$. Define $\tilde{f} : A \to B$ by $\tilde{f}(x) = f(x)$ if $x \neq \bar{x}$ and $\tilde{f}(\bar{x}) = \bar{y}$. Since $g(\bar{x}, \bar{y}) = k$, we have $g(x, \tilde{f}(x)) = k$ for every $x \in A$. Since $(\bar{x}, \bar{y}) \in A \times B$, we have $\tilde{f} \in B^A$. Being by construction $\tilde{f} \neq f$, this contradicts the uniqueness of $f$. We conclude that (25.2) holds, as desired.
(ii) implies (i). Let $f \in B^A$ be such that (25.2) holds. By definition, $(x, f(x)) \in \operatorname{Gr} f$ for each $x \in A$. By (25.2), we have $(x, f(x)) \in g^{-1}(k)$, so $g(x, f(x)) = k$ for each $x \in A$. It remains to prove the uniqueness of $f$. Let $h \in B^A$ satisfy (25.1). We have $\operatorname{Gr} h \subseteq g^{-1}(k) \cap (A \times B)$ since we can argue as in the first inclusion of the first part of the proof. By (25.2), this inclusion then yields $\operatorname{Gr} h \subseteq \operatorname{Gr} f$. In turn, this implies $h = f$. Indeed, if we consider $x \in A$, then $(x, h(x)) \in \operatorname{Gr} h \subseteq \operatorname{Gr} f$. Since $(x, h(x)) \in \operatorname{Gr} f$, then $(x, h(x)) = (x', f(x'))$ for some $x' \in A$. This implies $x = x'$ and $h(x) = f(x')$, and so $h(x) = f(x)$. Since $x$ was arbitrarily chosen, we conclude that $f = h$, as desired.

N.B. If $C = A \times B$, then (25.2) simplifies to

$$g^{-1}(k) = \operatorname{Gr} f$$

Indeed, in this case $g^{-1}(k) = \{(x, y) \in A \times B : g(x, y) = k\}$ and so $g^{-1}(k) \cap (A \times B) = g^{-1}(k)$. O

25.2 Implicit functions

To address the motivating questions that we posed we need some more structure. For this reason, throughout the section we assume that $A \subseteq \mathbb{R}^n$, $B \subseteq \mathbb{R}$, $C \subseteq \mathbb{R}^{n+1}$ and $D \subseteq \mathbb{R}$. By taking advantage of this added structure, the next result provides a simple answer to a key existence question.

Proposition 1195 Let $g : C \to D$ with $A \times B \subseteq C$ and $g$ continuous in $y$, and let $k \in D$. If

$$\inf_{y \in B} g(x, y) \leq k \leq \sup_{y \in B} g(x, y) \qquad \forall x \in A \tag{25.3}$$

then there exists $f : A \to B$ such that $g(x, f(x)) = k$ for all $x \in A$.

In this case we say that the equation $g(x, y) = k$ implicitly defines $f$ on the rectangle $A \times B$.

Proof For simplicity, let $k = 0$. Let $x_0 \in A$. By condition (25.3), there exist scalars $y', y'' \in B$, say with $y' \leq y''$, such that $g(x_0, y') \leq 0 \leq g(x_0, y'')$. Since $g(x_0, \cdot)$ is continuous, by Bolzano's Theorem there exists $y_0 \in [y', y'']$ such that $g(x_0, y_0) = 0$. Since $x_0$ was arbitrarily chosen, this proves the existence of the implicit function $f$.

Next comes the uniqueness of the implicit function.

Proposition 1196 Let $g : C \to D$ with $A \times B \subseteq C$ and let $k \in D$. If $g$ is strictly monotone in $y$,² then there exists at most one function $f : A \to B$ in $B^A$ such that $g(x, f(x)) = k$ for all $x \in A$.

So, if $g$ is continuous and strictly monotone in $y$ and satisfies condition (25.3), then the equation $g(x, y) = k$ implicitly defines a unique $f$ on the rectangle $A \times B$.

Proof Let $f, h : A \to B$ be such that $g(x, f(x)) = g(x, h(x)) = k$ for all $x \in A$. We want to show that $h = f$. Suppose, by contradiction, that $h \neq f$. So, there is at least some $\bar{x} \in A$ with $h(\bar{x}) \neq f(\bar{x})$, say $h(\bar{x}) > f(\bar{x})$. The function $g$ is strictly monotone in $y$, say increasing. Thus, $k = g(\bar{x}, h(\bar{x})) > g(\bar{x}, f(\bar{x})) = k$, a contradiction. We conclude that $h = f$.

When $g$ is partially derivable in $y$, a convenient differential condition that ensures the strict monotonicity of $g$ in $y$ is that either $\partial g(x, y)/\partial y > 0$ for all $(x, y) \in A \times B$ or that the opposite inequality holds for all $(x, y) \in A \times B$. This type of differential monotonicity condition will play a key role in what follows (in particular, in the local and global versions of the Implicit Function Theorem).

Example 1197 Define $g : \mathbb{R}^2 \to \mathbb{R}$ by $g(x, y) = x^2 - 2y - e^y$. The equation

$$g(x, y) = 0$$

defines on the entire plane a unique implicit function $f : \mathbb{R} \to \mathbb{R}$. Indeed, $g$ is differentiable with

$$\frac{\partial g(x, y)}{\partial y} = -2 - e^y < 0 \qquad \forall y \in \mathbb{R}$$

Therefore, $g$ is strictly decreasing in $y$. Moreover, condition (25.3) holds because

$$\lim_{y \to -\infty} g(x, y) = +\infty \quad \text{and} \quad \lim_{y \to +\infty} g(x, y) = -\infty \qquad \forall x \in \mathbb{R}$$

By Propositions 1195 and 1196, there is a unique implicit function $f : \mathbb{R} \to \mathbb{R}$ such that

$$g(x, f(x)) = x^2 - 2f(x) - e^{f(x)} = 0 \qquad \forall x \in \mathbb{R}$$

Note that we are not able to write $y$ as an explicit function of $x$, that is, we are not able to provide the explicit form of $f$. N

² A function is strictly monotone if it is either strictly increasing or strictly decreasing.
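Although $f$ has no closed form, it can be tabulated numerically: for each $x$, the equation $g(x, y) = 0$ has exactly one root in $y$ because $g(x, \cdot)$ is strictly decreasing. A sketch using scipy's bracketing root finder (the bracket $[-50, 50]$ is our choice and is generous for moderate $x$):

```python
import numpy as np
from scipy.optimize import brentq

g = lambda x, y: x**2 - 2*y - np.exp(y)

def f(x):
    # g(x, .) is strictly decreasing, so the root in y is unique
    return brentq(lambda y: g(x, y), -50.0, 50.0)

for x in [0.0, 1.0, 2.0]:
    y = f(x)
    print(x, y, g(x, y))   # g(x, f(x)) is (numerically) zero
```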

The following example exhibits a discontinuous $g$ which is not strictly monotone in $y$. Nevertheless, there is a unique implicit function, thus showing that the conditions of the last two propositions are only sufficient.

Example 1198 Let $g : (\mathbb{R} \setminus \{0\}) \times \mathbb{R} \to \mathbb{R}$ be defined for each $x \neq 0$ as

$$g(x, y) = \begin{cases} \dfrac{y}{x} - 1 & \text{if } x, y \in \mathbb{Q} \\ -\dfrac{y}{x} - 1 & \text{otherwise} \end{cases}$$

There is a unique implicit function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ on $(\mathbb{R} \setminus \{0\}) \times \mathbb{R}$, given by

$$f(x) = \begin{cases} x & \text{if } 0 \neq x \in \mathbb{Q} \\ -x & \text{if } x \notin \mathbb{Q} \end{cases}$$

as the reader can check. N

Having discussed existence and uniqueness, we can now turn to the properties that the implicit function $f$ inherits from $g$. In short, the continuity of $g$ is passed on to the implicit function, and so are its monotonicity and convexity, although reversed.

Proposition 1199 Let $g : C \to D$ with $A \times B \subseteq C$ and $g$ strictly increasing in $y$, and let $k \in D$. If $f : A \to B$ is such that $g(x, f(x)) = k$ for all $x \in A$, then

(i) $f$ is strictly decreasing if $g$ is separately strictly increasing;³

(ii) $f$ is (strictly) convex if $g$ is (strictly) quasi concave, provided the sets $A$, $B$ and $C$ are convex;

(iii) $f$ is (strictly) concave if $g$ is (strictly) quasi convex, provided the sets $A$, $B$ and $C$ are convex;

(iv) $f$ is continuous if $g$ is continuous, provided the sets $A$ and $B$ are open.

³ That is, both $g(x, \cdot)$ and $g(\cdot, y)$ are strictly increasing. Here $n = 1$.

Proof (i) Let $n = 1$, so that $C \subseteq \mathbb{R}^2$. We begin by showing that assuming that $g$ is strictly increasing both in $x$ and in $y$ is equivalent to directly assuming that $g$ is strictly increasing.

Claim A function $g : C \subseteq \mathbb{R}^2 \to \mathbb{R}$ is strictly increasing if and only if it is strictly increasing in $x$ and in $y$.

Proof Let us only show the "if" part, the converse being trivial. Hence, let $g : C \subseteq \mathbb{R}^2 \to \mathbb{R}$ be strictly increasing both in $x$ and in $y$. Let $(x, y) > (x', y')$. Our aim is to show that $g(x, y) > g(x', y')$. If $x = x'$ or $y = y'$, the result is trivial. Hence, let $x > x'$ and $y > y'$. We have $(x, y) > (x', y) > (x', y')$, so $g(x, y) > g(x', y) > g(x', y')$, which implies $g(x, y) > g(x', y')$.

Since it is strictly increasing in $x$ and in $y$, by the Claim the function $g$ is strictly increasing. Let us show that $f$ is strictly decreasing. Take $x, x' \in A$ with $x > x'$. Suppose, by contradiction, that $f(x) \geq f(x')$. This implies that $(x, f(x)) > (x', f(x'))$ and so $g(x, f(x)) > g(x', f(x'))$, which contradicts $g(x, f(x)) = g(x', f(x'))$.
(ii) Let $g$ be quasi concave. Let us show that $f$ is convex. Let $x, x' \in A$ and $\lambda \in [0, 1]$. From $g(x, f(x)) = g(x', f(x'))$ it follows that

$$g\left(\lambda x + (1 - \lambda) x', \lambda f(x) + (1 - \lambda) f(x')\right) \geq g(x, f(x)) = g\left(\lambda x + (1 - \lambda) x', f(\lambda x + (1 - \lambda) x')\right)$$

Hence, $\lambda f(x) + (1 - \lambda) f(x') \geq f(\lambda x + (1 - \lambda) x')$ as $g$ is strictly increasing in $y$. A similar argument can be used to show the strict version.
(iii) Similar, mutatis mutandis, to point (ii).
(iv) Consider a point $\bar{x}$ and the corresponding value $\bar{y} = f(\bar{x})$. Since $A$ is open, the point $(\bar{x}, \bar{y})$ is interior. Hence, there exists $\varepsilon > 0$ such that $B_\varepsilon(\bar{x}, \bar{y}) \subseteq A \times B$. Let $m \geq 1$ be large enough so that $0 < 1/m < \varepsilon$. Since $g(\bar{x}, \bar{y}) = k$ and $g$ is strictly increasing in $y$, we have $g(\bar{x}, \bar{y} - 1/m) < k < g(\bar{x}, \bar{y} + 1/m)$. By the continuity of $g$, the functions $g(\cdot, \bar{y} - 1/m)$ and $g(\cdot, \bar{y} + 1/m)$ are both continuous in $x$. So, there exists (cf. the Theorem on the permanence of sign) a small enough neighborhood $B_{\tilde{\varepsilon}}(\bar{x}) \subseteq A$ such that

$$g\left(x, \bar{y} - \frac{1}{m}\right) < k < g\left(x, \bar{y} + \frac{1}{m}\right) \qquad \forall x \in B_{\tilde{\varepsilon}}(\bar{x})$$

Since $g$ is strictly increasing in $y$, we then have

$$f(\bar{x}) - \frac{1}{m} < f(x) < f(\bar{x}) + \frac{1}{m} \qquad \forall x \in B_{\tilde{\varepsilon}}(\bar{x}) \tag{25.4}$$

In turn, this guarantees that $f$ is continuous at $\bar{x}$. In fact, let $x_n \to \bar{x}$. Fix any $m \geq 1$ large enough so that $0 < 1/m < \varepsilon$. By what we just proved, there exists $\tilde{\varepsilon} > 0$ such that (25.4) holds. By the definition of convergence, there is $n_{\tilde{\varepsilon}} \geq 1$ such that $x_n \in B_{\tilde{\varepsilon}}(\bar{x})$ for every $n \geq n_{\tilde{\varepsilon}}$, so that

$$f(\bar{x}) - \frac{1}{m} < f(x_n) < f(\bar{x}) + \frac{1}{m} \qquad \forall n \geq n_{\tilde{\varepsilon}}$$

Thus

$$f(\bar{x}) - \frac{1}{m} \leq \liminf f(x_n) \leq \limsup f(x_n) \leq f(\bar{x}) + \frac{1}{m}$$

Since this holds for all $m$ large enough, we have

$$f(\bar{x}) = \lim_{m \to \infty} \left(f(\bar{x}) - \frac{1}{m}\right) \leq \liminf f(x_n) \leq \limsup f(x_n) \leq \lim_{m \to \infty} \left(f(\bar{x}) + \frac{1}{m}\right) = f(\bar{x})$$

We conclude that $\lim f(x_n) = f(\bar{x})$. Since $\bar{x}$ was arbitrarily chosen, the function $f$ is continuous.

We leave to the reader the dual version of this result in which the strict monotonicity of $g$ changes from increasing to decreasing. Instead, we turn to the all-important issue of the differentiability of the implicit function.

Proposition 1200 Let $g : C \to D$ with $A \times B \subseteq C$ and let $k \in D$. Suppose that the sets $A$ and $B$ are open and that $g$ is continuously differentiable on $A \times B$, with either $\partial g(x, y)/\partial y > 0$ for all $(x, y) \in A \times B$ or $\partial g(x, y)/\partial y < 0$ for all $(x, y) \in A \times B$. If $f : A \to B$ is such that $g(x, f(x)) = k$ for all $x \in A$, then it is continuously differentiable, with

$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \tag{25.5}$$

for every $(x, y) \in g^{-1}(k) \cap (A \times B)$.

In the next section we will discuss at length the differential formula (25.5), which plays a fundamental role in applications.

Example 1201 In the last example we learned that the equation

$$g(x, y) = x^2 - 2y - e^y = 0$$

defines on the plane a unique implicit function $f : \mathbb{R} \to \mathbb{R}$. The function $g$ is continuously differentiable, with

$$\frac{\partial g}{\partial y}(x, y) = -2 - e^y < 0 \qquad \forall (x, y) \in \mathbb{R}^2$$

By Proposition 1200, $f$ is then continuously differentiable, with

$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} = \frac{2x}{2 + e^y} \qquad \forall (x, y) \in g^{-1}(0)$$

Though we were not able to provide the explicit form of $f$, we have a formula for its derivative. As we will see in the next section when discussing the Implicit Function Theorem, this is a main feature of formula (25.5). For instance, at every $(x_0, y_0) \in g^{-1}(0)$ we can then write the first-order approximation

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 + \frac{2x_0}{2 + e^{y_0}}(x - x_0) + o(x - x_0)$$

that gives us some precious information on $f$. N
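The derivative formula can be checked against a centered finite difference of the numerically computed implicit function of the previous sketch:

```python
import numpy as np
from scipy.optimize import brentq

g = lambda x, y: x**2 - 2*y - np.exp(y)
f = lambda x: brentq(lambda y: g(x, y), -50.0, 50.0)

x0 = 1.5
y0 = f(x0)
h = 1e-6
fd = (f(x0 + h) - f(x0 - h)) / (2 * h)   # numerical derivative of f
formula = 2 * x0 / (2 + np.exp(y0))      # implicit-derivative formula (25.5)
print(fd, formula)                       # the two values agree
```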

Proof of Proposition 1200 Since either $\partial g(x, y)/\partial y > 0$ for all $(x, y) \in A \times B$ or the opposite inequality holds, $g$ is strictly monotone in $y$. By Proposition 1196, $f$ is then the unique function in $B^A$ such that $g(x, f(x)) = k$ for all $x \in A$. Let us show that $f$ is continuously differentiable. Let $x \in A$ and $y = f(x)$. Set $h_2 = f(x + h_1) - f(x)$. Since $g$ is continuously differentiable, for every $h_1, h_2 \neq 0$ there exists $0 < \vartheta < 1$ such that⁴

$$g(x + h_1, y + h_2) = g(x, y) + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial x} h_1 + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial y} h_2$$

If $h_1$ is small enough so that $x + h_1 \in A$ and $y + h_2 \in B$, we then have

$$0 = \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial x} h_1 + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial y} h_2 \tag{25.6}$$

By Proposition 1199-(iv), the implicit function $f$ is continuous. Hence, if $h_1 \to 0$ then $h_2 \to 0$. So, by (25.6) we have

$$f'(x) = \lim_{h_1 \to 0} \frac{h_2}{h_1} = -\lim_{h_1 \to 0} \frac{\frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial x}}{\frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial y}} = -\frac{\frac{\partial g}{\partial x}(x, y)}{\frac{\partial g}{\partial y}(x, y)} \tag{25.7}$$

because of the continuity of $\partial g/\partial x$ and of $\partial g/\partial y$. In turn, this shows that the continuity of the derivative function $f'$ is a direct consequence of the continuity of $\partial g/\partial x$ and of $\partial g/\partial y$. From (25.7) it follows that

$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, f(x))}{\dfrac{\partial g}{\partial y}(x, f(x))} \qquad \forall x \in A$$

Moreover, the uniqueness of $f$ ensures that $g^{-1}(k) \cap (A \times B) = \operatorname{Gr} f$ (Proposition 1194). In turn, this implies formula (25.5) because $(x, y) \in g^{-1}(k) \cap (A \times B)$ if and only if $y = f(x)$.

⁴ It is a cruder version of approximation (23.24).

25.3 A local perspective

25.3.1 Implicit Function Theorem

We now address the motivating questions from a local perspective, which is particularly well suited for differential calculus, as the next famous result shows.⁵ It is the most important result in the study of implicit functions and is widely used in applications. In particular, we focus on a point $(x_0, y_0)$ that solves the equation $g(x, y) = 0$, i.e., such that $g(x_0, y_0) = 0$ or, equivalently, such that $(x_0, y_0) \in g^{-1}(0)$.

⁵ This theorem first appeared in lecture notes that Ulisse Dini prepared in the 1870s. For this reason, it is sometimes named after him.

Theorem 1202 (Implicit Function Theorem) Let $g : U \to \mathbb{R}$ be defined (at least) on an open set $U$ of $\mathbb{R}^2$ and let $g(x_0, y_0) = 0$. If $g$ is continuously differentiable on a neighborhood of $(x_0, y_0)$ and

$$\frac{\partial g}{\partial y}(x_0, y_0) \neq 0 \tag{25.8}$$

then there exist neighborhoods $B(x_0)$ and $V(y_0)$ and a unique function $f : B(x_0) \to V(y_0)$ such that

$$g(x, f(x)) = 0 \qquad \forall x \in B(x_0) \tag{25.9}$$

The function $f$ is continuously differentiable on $B(x_0)$, with

$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \tag{25.10}$$

for every $(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$.

Along with the continuous differentiability of $g$, the easily checked simple differential condition (25.8) thus ensures that locally, near the point $(x_0, y_0)$, there exists a unique and continuously differentiable implicit function $f : B(x_0) \to V(y_0)$. It is a remarkable achievement: the hypotheses of the global results of the previous section – Propositions 1195, 1196 and 1200 – are definitely clumsier. Yet, the global viewpoint – the most relevant for applications – will be partly vindicated by the Global Implicit Function Theorem of the next chapter and, more important here, the proof of the Implicit Function Theorem will show how this theorem in turn builds on the previous global results.
To emphasize the local perspective of the Implicit Function Theorem, here we say that the equation $g(x, y) = 0$ implicitly defines a unique $f$ at the point $(x_0, y_0) \in g^{-1}(0)$.

Proof Suppose, without loss of generality, that (25.8) takes the positive form

$$\frac{\partial g}{\partial y}(x_0, y_0) > 0 \tag{25.11}$$

Since $g$ is continuously differentiable, by the Theorem on the permanence of sign there exists a neighborhood $\tilde{B}(x_0, y_0) \subseteq U$ for which

$$\frac{\partial g}{\partial y}(x, y) > 0 \qquad \forall (x, y) \in \tilde{B}(x_0, y_0) \tag{25.12}$$

Let $\varepsilon > 0$ be small enough so that

$$[x_0 - \varepsilon, x_0 + \varepsilon] \times [y_0 - \varepsilon, y_0 + \varepsilon] \subseteq \tilde{B}(x_0, y_0)$$

Since $\partial g(x, y)/\partial y > 0$ for every $(x, y) \in [x_0 - \varepsilon, x_0 + \varepsilon] \times [y_0 - \varepsilon, y_0 + \varepsilon]$, the function $g(x, \cdot)$ is strictly increasing in $y$ for every $x \in [x_0 - \varepsilon, x_0 + \varepsilon]$. So, $g(x_0, y_0 - \varepsilon) < 0 = g(x_0, y_0) < g(x_0, y_0 + \varepsilon)$. The functions $g(\cdot, y_0 - \varepsilon)$ and $g(\cdot, y_0 + \varepsilon)$ are both continuous in $x$, so by the Theorem on the permanence of sign there exists a small enough neighborhood $B(x_0) \subseteq [x_0 - \varepsilon, x_0 + \varepsilon]$ so that

$$g(x, y_0 - \varepsilon) < 0 < g(x, y_0 + \varepsilon) \qquad \forall x \in B(x_0) \tag{25.13}$$

By Bolzano's Theorem, for each $x \in B(x_0)$ there exists $y_0 - \varepsilon < y < y_0 + \varepsilon$ such that $g(x, y) = 0$. By the strict monotonicity of $g(x, \cdot)$ on $[y_0 - \varepsilon, y_0 + \varepsilon]$, such $y$ is unique. By setting $V(y_0) = (y_0 - \varepsilon, y_0 + \varepsilon)$, we have thus defined a unique implicit function $f : B(x_0) \to V(y_0)$ on the rectangle $B(x_0) \times V(y_0)$ such that (25.9) holds.⁶
Having established the existence of a unique implicit function, its differential properties now follow from Proposition 1200.

Since the function $f : B(x_0) \to V(y_0)$ defined implicitly by the equation $g(x, y) = 0$ at $(x_0, y_0)$ is unique, in view of Proposition 1194 the relation (25.9) is equivalent to

$$g(x, y) = 0 \iff y = f(x) \qquad \forall (x, y) \in B(x_0) \times V(y_0) \tag{25.14}$$

that is, to

$$g^{-1}(0) \cap (B(x_0) \times V(y_0)) = \operatorname{Gr} f \tag{25.15}$$

Thus, the level curve $g^{-1}(0)$ – so, the set of solutions of the equation $g(x, y) = 0$ – can be represented locally by the graph of the implicit function. This is precisely, in the final analysis, the reason why the theorem is so important in applications (as we will see shortly in Section 25.3.2).

Inspection of the proof of the Implicit Function Theorem shows that on the rectangle $B(x_0) \times V(y_0)$ we have either $\partial g(x, y)/\partial y > 0$ or $\partial g(x, y)/\partial y < 0$. Assume the former, so that $g$ is strictly increasing in $y$. By Proposition 1199, we then have that:

(i) $f$ is strictly decreasing if $\partial g(x, y)/\partial x > 0$ on $B(x_0) \times V(y_0)$;

(ii) $f$ is (strictly) convex if $g$ is (strictly) quasi concave, provided the set $U$ is convex;

(iii) $f$ is (strictly) concave if $g$ is (strictly) quasi convex, provided the set $U$ is convex.

Thus, some basic properties of the implicit function provided by the Implicit Function Theorem can be easily established. Note that formula (25.10) permits the computation of the first derivative of the implicit function even without knowing the function in explicit form. Since the first derivative is often what is really needed for such a function (because, for example, we are interested in solving a first-order condition), this is a most useful feature of the Implicit Function Theorem.

At the point $(x_0, y_0)$ formula (25.10) takes the form

$$f'(x_0) = -\frac{\dfrac{\partial g}{\partial x}(x_0, y_0)}{\dfrac{\partial g}{\partial y}(x_0, y_0)}$$

⁶ Though we gave a simple direct proof, after having established (25.13) we could have just invoked Propositions 1195 and 1196 to conclude that there exists a unique $f$. Indeed, (25.13) implies (25.3), so the existence of $f$ is a consequence of Proposition 1195. In a similar vein, its uniqueness follows from Proposition 1196 because $g$ is strictly increasing in $y$.

Note that the use of formula (25.10) is based on the clause "$(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$" that requires fixing both variables $x$ and $y$. This is the price to pay in implicit derivability – in contrast, in explicit derivability it is sufficient to fix the variable $x$ to compute $f'(x)$. On the other hand, we can rewrite (25.10) as

$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, f(x))}{\dfrac{\partial g}{\partial y}(x, f(x))} \tag{25.16}$$

for each $x \in B(x_0)$, thus emphasizing the role played by the implicit function. Formulations (25.10) and (25.16) are both useful, for different reasons; it is better to keep both of them in mind. As we remarked, formulation (25.10) allows one to compute the first derivative of $f$ even without knowing $f$ itself, thereby yielding a useful first-order local approximation of $f$. For this reason in the examples we will always use (25.10), because the closed form of $f$ will not be available.

We can provide a heuristic derivation of formula (25.10) through the total differential

$$dg = \frac{\partial g}{\partial x} dx + \frac{\partial g}{\partial y} dy$$

of the function $g$. We have $dg = 0$ for variations $(dx, dy)$ that keep us along the level curve $g^{-1}(0)$. Therefore,

$$\frac{\partial g}{\partial x} dx = -\frac{\partial g}{\partial y} dy$$

which "yields" (the power of heuristics!):

$$\frac{dy}{dx} = -\frac{\dfrac{\partial g}{\partial x}}{\dfrac{\partial g}{\partial y}}$$

It is a rather rough (and incorrect) argument, but certainly useful to remember formula (25.10).

Example 1203 In the trivial case of a linear function $g(x, y) = ax + by - c$, the equation $g(x, y) = 0$ becomes $ax + by - c = 0$ and yields

$$y = f(x) = -\frac{a}{b} x + \frac{c}{b}$$

provided $b \neq 0$. Even in this very simple case, the existence of an implicit function requires the condition $b = \partial g(x, y)/\partial y \neq 0$. N

Example 1204 Let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x, y) = x^2 - xy^3 + y^5 - 16$. Let us determine whether the equation $g(x, y) = 0$ implicitly defines a function at the point $(x_0, y_0) = (4, -2) \in g^{-1}(0)$. The function $g$ is continuously differentiable on $\mathbb{R}^2$, with $\partial g(x, y)/\partial y = -3xy^2 + 5y^4$, and therefore

$$\frac{\partial g}{\partial y}(4, -2) = 32 \neq 0$$

By the Implicit Function Theorem, there exists a unique continuously differentiable $f : B(4) \to V(-2)$ such that

$$x^2 - x f^3(x) + f^5(x) = 16 \qquad \forall x \in B(4)$$

Moreover, since $\partial g(x, y)/\partial x = 2x - y^3$, we have

$$f'(4) = -\frac{\dfrac{\partial g}{\partial x}(4, -2)}{\dfrac{\partial g}{\partial y}(4, -2)} = -\frac{2 \cdot 4 - (-2)^3}{-3 \cdot 4 \cdot (-2)^2 + 5 \cdot (-2)^4} = -\frac{16}{32} = -\frac{1}{2}$$

In general, at every point $(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$ at which $\partial g(x, y)/\partial y \neq 0$, we have

$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} = -\frac{2x - y^3}{-3xy^2 + 5y^4} = \frac{y^3 - 2x}{5y^4 - 3xy^2}$$

In particular, the first-order local approximation in a neighborhood of $x_0$ is

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 + \frac{y_0^3 - 2x_0}{5y_0^4 - 3x_0 y_0^2}(x - x_0) + o(x - x_0)$$

for every $x \in B(x_0)$.⁷ N
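Once again the formula can be verified numerically; the bracket below isolates the branch of $g^{-1}(0)$ through $(4, -2)$ (our choice, based on a sign check of $g(4, \cdot)$):

```python
import numpy as np
from scipy.optimize import brentq

g = lambda x, y: x**2 - x * y**3 + y**5 - 16

def f(x):
    # g(x, -2.2) < 0 < g(x, -1.8) for x near 4: unique root on this branch
    return brentq(lambda y: g(x, y), -2.2, -1.8)

h = 1e-6
print(f(4.0))                                # about -2
print((f(4.0 + h) - f(4.0 - h)) / (2 * h))   # about -0.5 = f'(4)
```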

Sometimes it is possible to find stationary points of the implicit function without knowing its explicit form. When this happens, it is a remarkable application of the Implicit Function Theorem. For instance, consider in the previous example the point $(4, 2) \in g^{-1}(0)$. We have $(\partial g/\partial y)(4, 2) = 32 \neq 0$. Let $f : B(4) \to V(2)$ be the unique function then defined implicitly at the point $(4, 2)$.⁸ We have:

$$f'(4) = -\frac{\dfrac{\partial g}{\partial x}(4, 2)}{\dfrac{\partial g}{\partial y}(4, 2)} = -\frac{0}{32} = 0$$

Therefore, $x_0 = 4$ is a stationary point of the implicit function $f$. It is possible to check that it is actually a local maximizer.

Example 1205 (i) Consider the function $g : \mathbb{R}^2 \to \mathbb{R}$ given by $g(x, y) = 7x^2 + 2y - e^y$. The hypotheses of the Implicit Function Theorem are satisfied at every point $(x_0, y_0) \in g^{-1}(0)$ at which $\partial g(x_0, y_0)/\partial y = 2 - e^{y_0} \neq 0$. At any such point, the equation $g(x, y) = 0$ implicitly defines a continuously differentiable function $f : B(x_0) \to V(y_0)$ with

$$f'(x) = -\frac{\frac{\partial g(x, y)}{\partial x}}{\frac{\partial g(x, y)}{\partial y}} = -\frac{14x}{2 - e^y} \tag{25.17}$$

for every $(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$.
Even if we do not know the explicit form of $f$, we have been able to find its derivative function $f'$. The first-order local approximation is

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 - \frac{14 x_0}{2 - e^{y_0}}(x - x_0) + o(x - x_0)$$

at $(x_0, y_0)$. For example, at the point $(1/\sqrt{7}, 0) \in g^{-1}(0)$ we have, as $x \to 1/\sqrt{7}$,

$$f(x) = -2\sqrt{7}\left(x - \frac{1}{\sqrt{7}}\right) + o\left(x - \frac{1}{\sqrt{7}}\right)$$

(ii) Let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x, y) = x^3 + 4ye^x + y^2 + xe^y$. If $g(x_0, y_0) = 0$ and $\partial g(x_0, y_0)/\partial y \neq 0$, then by the Implicit Function Theorem the equation $g(x, y) = 0$ defines at $(x_0, y_0)$ a unique continuously differentiable function $f : B(x_0) \to V(y_0)$ with

$$f'(x) = -\frac{\frac{\partial g(x, y)}{\partial x}}{\frac{\partial g(x, y)}{\partial y}} = -\frac{3x^2 + 4ye^x + e^y}{4e^x + 2y + xe^y}$$

for every $(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$. The first-order local approximation is

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 - \frac{3x_0^2 + 4y_0 e^{x_0} + e^{y_0}}{4e^{x_0} + 2y_0 + x_0 e^{y_0}}(x - x_0) + o(x - x_0)$$

at $(x_0, y_0)$. For example, if $(x_0, y_0) = (0, 0)$ we have $\partial g(0, 0)/\partial y = 4 \neq 0$, so

$$f'(0) = -\frac{\frac{\partial g(0, 0)}{\partial x}}{\frac{\partial g(0, 0)}{\partial y}} = -\frac{1}{4}$$

and, as $x \to 0$,

$$f(x) = y_0 + f'(0) x + o(x) = -\frac{x}{4} + o(x)$$

N

⁷ The reader can verify that also $(-12, -2) \in g^{-1}(0)$ and $\partial g(-12, -2)/\partial y \neq 0$, and calculate $f'(-12)$ for the implicit function defined at $(-12, -2)$.
⁸ This function is different from the previous implicit function defined at the other point $(4, -2)$.

By exchanging the variables in the Implicit Function Theorem, we can say that the continuity of the partial derivatives of $g$ in a neighborhood of $(x_0, y_0)$ and the condition $\partial g(x_0, y_0)/\partial x \neq 0$ ensure the existence of a (unique) implicit function $x = \varphi(y)$ such that locally $g(\varphi(y), y) = 0$. It follows that, if at least one of the two partial derivatives $\partial g(x_0, y_0)/\partial x$ and $\partial g(x_0, y_0)/\partial y$ is not zero, there is locally a univocal tie between the two variables. As a result, the Implicit Function Theorem cannot be applied only when both the partial derivatives $\partial g(x_0, y_0)/\partial y$ and $\partial g(x_0, y_0)/\partial x$ are zero.
For example, if $g(x, y) = x^2 + y^2 - 1$, then for every point $(x_0, y_0)$ that satisfies the equation $g(x, y) = 0$ we have $\partial g(x_0, y_0)/\partial y = 2y_0$, which is zero only for $y_0 = 0$ (and hence $x_0 = \pm 1$). At the two points $(1, 0)$ and $(-1, 0)$ the equation does not define any implicit function of the type $y = f(x)$. But $\partial g(\pm 1, 0)/\partial x = \pm 2 \neq 0$ and, therefore, at such points the equation defines an implicit function of the type $x = \varphi(y)$. Symmetrically, at the two points $(0, 1)$ and $(0, -1)$ the equation defines an implicit function of the type $y = f(x)$ but not one of the type $x = \varphi(y)$.

This last remark suggests a final important observation on the Implicit Function Theorem. Suppose that, as at the beginning of the chapter, $\varphi$ is a standard function defined in explicit form, which can be written in implicit form as

$$g(x, y) = \varphi(x) - y \tag{25.18}$$

Given $(x_0, y_0) \in g^{-1}(0)$, suppose $\partial g(x_0, y_0)/\partial x \neq 0$. The Implicit Function Theorem (in "exchanged" form) then ensures the existence of neighborhoods $B(y_0)$ and $V(x_0)$ and of a unique function $f : B(y_0) \to V(x_0)$ such that

$$g(f(y), y) = 0 \qquad \forall y \in B(y_0)$$

that is, by recalling (25.18),

$$\varphi(f(y)) = y \qquad \forall y \in B(y_0)$$

The function $f$ is, therefore, the inverse of $\varphi$ on the neighborhood $B(y_0)$. The Implicit Function Theorem thus implies the existence – locally, around the point $y_0$ – of the inverse of $\varphi$. In particular, formula (25.10) here becomes

$$f'(y_0) = -\frac{\dfrac{\partial g}{\partial y}(x_0, y_0)}{\dfrac{\partial g}{\partial x}(x_0, y_0)} = \frac{1}{\varphi'(x_0)}$$

which is the classic formula (20.20) of the derivative of the inverse function. In sum, there is a close connection between implicit and inverse functions, which the reader will see later in the book (Section 26.1).

25.3.2 Level curves and marginal rates

Though so far in this section we have considered the equation $g(x, y) = 0$, there is nothing special about $0$ and we can actually consider any scalar $k$. Though mathematically it is an obvious generalization of the Implicit Function Theorem, because of its importance in applications we next state and prove the version of the theorem for a generic scalar $k$, possibly different from $0$.

Proposition 1206 Let $g : U \to \mathbb{R}$ be defined (at least) on an open set $U$ of $\mathbb{R}^2$ and let $g(x_0, y_0) = k$. If $g$ is continuously differentiable on a neighborhood of $(x_0, y_0)$ and

$$\frac{\partial g}{\partial y}(x_0, y_0) \neq 0$$

then there exist neighborhoods $B(x_0)$ and $V(y_0)$ and a unique function $f : B(x_0) \to V(y_0)$ such that

$$g(x, f(x)) = k \qquad \forall x \in B(x_0)$$

The function $f$ is continuously differentiable on $B(x_0)$, with

$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \tag{25.19}$$

for every $(x, y) \in g^{-1}(k) \cap (B(x_0) \times V(y_0))$.

This is the version of the Implicit Function Theorem which we will refer to in the rest of the section when discussing marginal rates.

Proof Define $g_k : U \subseteq \mathbb{R}^2 \to \mathbb{R}$ by $g_k(x, y) = g(x, y) - k$. We have $g(x, y) = k$ if and only if $g_k(x, y) = 0$, that is, $g^{-1}(k) = g_k^{-1}(0)$. Moreover, $\partial g_k(x_0, y_0)/\partial y = \partial g(x_0, y_0)/\partial y \neq 0$. By the Implicit Function Theorem, there exist neighborhoods $B(x_0)$ and $V(y_0)$ and a unique function $f : B(x_0) \to V(y_0)$ such that $g_k(x, f(x)) = 0$ for all $x \in B(x_0)$. In turn, this implies $g(x, f(x)) = k$ for all $x \in B(x_0)$. Since $f$ is continuously differentiable, the result is proved.

In view of Proposition 1194, the implicit function $f : B(x_0) \to V(y_0)$ permits to establish a functional representation of the level curve $g^{-1}(k)$ through the basic relation

$$g^{-1}(k) \cap (B(x_0) \times V(y_0)) = \operatorname{Gr} f \tag{25.20}$$

which is the general form of (25.15) for any $k \in \mathbb{R}$. Implicit functions thus describe the link between the variables $x$ and $y$ that belong to the same level curve, thus making it possible to formulate through them some key properties of these curves. The great effectiveness of this formulation explains the importance of implicit functions, as mentioned right after (25.14).

For example, the isoquant $g^{-1}(k)$ is a level curve of the production function $g : \mathbb{R}_+^2 \to \mathbb{R}$, which features two inputs, $x$ and $y$, and one output. The points $(x, y)$ that belong to the isoquant are all the input combinations that keep the quantity of output produced constant. The implicit function $y = f(x)$ tells us, locally, how the quantity $y$ has to change, when $x$ varies, in order to keep the output produced constant. Therefore, the properties of the function $f : B(x_0) \to V(y_0)$ characterize, locally, the relations between the inputs that guarantee the level $k$ of output. We usually assume that $f$ is:

(i) decreasing, that is, $f'(x) \leq 0$ for every $x \in B(x_0)$: the two inputs are partially substitutable and, in order to keep the quantity produced unchanged at the level $k$, to lower quantities of the input $x$ there have to correspond larger quantities of the input $y$ (and vice versa);

(ii) convex, that is, $f''(x) \geq 0$ for every $x \in B(x_0)$: to greater levels of $x$ there have to correspond larger and larger quantities of $y$ to compensate (negative) infinitesimal variations of $x$ in order to keep production at the level $k$.

Remarkably, as noted after the proof of the Implicit Function Theorem, via Proposition
1199 we can tell which properties of g induce these desirable properties.

Example 1207 Consider a Cobb-Douglas production function $g : \mathbb{R}_{++}^2 \to \mathbb{R}$ given by $g(x, y) = x^\alpha y^{1-\alpha}$, with $0 < \alpha < 1$. Given any $k > 0$, let $(x_0, y_0) \in \mathbb{R}_{++}^2$ be such that $g(x_0, y_0) = k$. Since $g : \mathbb{R}_{++}^2 \to \mathbb{R}$ is continuously differentiable, with $\partial g(x_0, y_0)/\partial y \neq 0$, by the Implicit Function Theorem there exist neighborhoods $B(x_0)$ and $V(y_0)$ and a unique implicit function $f_k : B(x_0) \to V(y_0)$ such that $g(x, f_k(x)) = k$ for all $x \in B(x_0)$. The implicit function $f_k$ is continuously differentiable, as well as strictly decreasing and strictly convex because $g$ is strictly increasing and strictly concave (Proposition 1199).⁹ N

The absolute value $|f'|$ of the derivative of the implicit function is called the marginal rate of transformation because, for infinitesimal variations of the inputs, it describes their degree of substitutability – that is, the variation of $y$ that balances an increase in $x$. Thanks to the functional representation (25.20) of the isoquant, geometrically the marginal rate of transformation can be interpreted as the slope of the isoquant at $(x, y)$. This is the classic interpretation of the rate, which follows from (25.20).
The Implicit Function Theorem implies the classic formula

$$MRT_{x,y} = |f'(x)| = \frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \tag{25.21}$$

This is the usual form in which the notion of marginal rate of transformation $MRT_{x,y}$ appears.

Example 1208 Let $g : \mathbb{R}_+^2 \to \mathbb{R}$ be the Cobb-Douglas production function $g(x, y) = x^\alpha y^{1-\alpha}$, with $0 < \alpha < 1$. The corresponding marginal rate of transformation is

$$MRT_{x,y} = \frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} = \frac{\alpha x^{\alpha - 1} y^{1 - \alpha}}{(1 - \alpha) x^\alpha y^{-\alpha}} = \frac{\alpha}{1 - \alpha} \frac{y}{x}$$

For example, at a point at which we use equal quantities of the two inputs – that is, $x = y$ – if we increase the first input by one unit, the second one must decrease by $\alpha/(1 - \alpha)$ units to leave the quantity of output produced unchanged: in particular, when $\alpha = 1/2$, the decrease of the second one must be of one unit. At a point at which we use a quantity of the second input five times larger than that of the first input – that is, $y = 5x$ – an increase of one unit of the first input is compensated by a decrease of $5\alpha/(1 - \alpha)$ units of the second one. N
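The computation of Example 1208 can be reproduced symbolically (a sketch using sympy; the symbol names are ours):

```python
import sympy as sp

x, y, a = sp.symbols('x y alpha', positive=True)
g = x**a * y**(1 - a)   # Cobb-Douglas production function

# MRT = g_x / g_y, formula (25.21)
MRT = sp.simplify(sp.diff(g, x) / sp.diff(g, y))
print(MRT)   # alpha*y/(x*(1 - alpha)), i.e. (alpha/(1-alpha)) * y/x
```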

Similar considerations hold for the level curves of a utility function $u : \mathbb{R}_+^2 \to \mathbb{R}$, that is, for its indifference curves $u^{-1}(k)$. The implicit functions provided by the Implicit Function Theorem tell us, locally, how one has to vary the quantity $y$ when $x$ varies to keep the overall utility level constant. For them we assume properties of monotonicity and convexity similar to those assumed for the implicit functions defined by isoquants. The monotonicity of the implicit function reflects the partial substitutability of the two goods: it is possible to consume a bit less of one good and a bit more of the other one and yet keep the overall level of utility unchanged. The convexity of the implicit function models the classic hypothesis of decreasing rates of substitution: when the quantity of a good, for example $x$, increases, we then need greater and greater "compensative" variations of the other good $y$ in order to stay on the same indifference curve, i.e., in order to have $u(x, y) = u(x + \Delta x, y + \Delta y)$.

⁹ Later in the chapter we will revisit this example (Example 1223).

Here as well, it is important to note that via Proposition 1199 we can tell which properties of the utility function $u$ induce these desirable properties, thus for instance making rigorous the common expression "convex indifference curves" (cf. Chapter 14). Indeed, they have a functional representation via convex implicit functions.

In the present case the absolute value $|f'|$ of the derivative of the implicit function is called marginal rate of substitution: it measures the (negative) variation in $y$ that balances marginally an increase in $x$. Geometrically, it is the slope of the indifference curve at $(x,y)$. Thanks to the Implicit Function Theorem, we have
$$MRS_{x,y}=-f'(x)=\frac{\frac{\partial u}{\partial x}(x,y)}{\frac{\partial u}{\partial y}(x,y)}$$
which is the classic form of the marginal rate of substitution.

Let $h$ be a scalar function with a strictly positive derivative, so that it is strictly increasing and $h\circ u$ is then a utility function equivalent to $u$. By the chain rule,
$$\frac{\frac{\partial (h\circ u)}{\partial x}(x,y)}{\frac{\partial (h\circ u)}{\partial y}(x,y)}=\frac{h'(u(x,y))\frac{\partial u}{\partial x}(x,y)}{h'(u(x,y))\frac{\partial u}{\partial y}(x,y)}=\frac{\frac{\partial u}{\partial x}(x,y)}{\frac{\partial u}{\partial y}(x,y)} \qquad (25.22)$$
Since we can drop the derivative $h'(u(x,y))$, the marginal rate of substitution is the same for $u$ and for all its increasing transformations $h\circ u$. Thus, the marginal rate of substitution is an ordinal notion, invariant under strictly increasing (differentiable) transformations. It does not depend on which of the two equivalent utility functions, $u$ or $h\circ u$, is considered. This explains the centrality of this ordinal notion in consumer theory, where after Pareto's ordinalist revolution it has replaced the cardinal notion of marginal utility (cf. Section 29.5).

Example 1209 To illustrate (25.22), consider on $\mathbb{R}^2_{++}$ the equivalent Cobb-Douglas utility function $u(x,y)=x^{a}y^{1-a}$ and log-linear utility function $\log u(x,y)=a\log x+(1-a)\log y$. We have
$$MRS_{x,y}=\frac{\frac{\partial u}{\partial x}(x,y)}{\frac{\partial u}{\partial y}(x,y)}=\frac{ax^{a-1}y^{1-a}}{(1-a)x^{a}y^{-a}}=\frac{a}{1-a}\frac{y}{x}=\frac{\frac{\partial \log u}{\partial x}(x,y)}{\frac{\partial \log u}{\partial y}(x,y)}$$
The two utility functions have the same marginal rate of substitution. N
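The ordinal invariance (25.22) can also be verified symbolically; a minimal sketch, again in Python with sympy (an illustrative tool of ours, not of the text):

```python
# Sketch: ordinal invariance of the MRS for u = x**a * y**(1-a) and log(u).
import sympy as sp

x, y, a = sp.symbols('x y a', positive=True)
u = x**a * y**(1 - a)

mrs     = sp.simplify(sp.diff(u, x) / sp.diff(u, y))
mrs_log = sp.simplify(sp.diff(sp.log(u), x) / sp.diff(sp.log(u), y))
assert sp.simplify(mrs - mrs_log) == 0      # both equal a*y/((1-a)*x)
```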

Finally, let us consider a consumer that consumes in two periods, today and tomorrow, with intertemporal utility function $U:\mathbb{R}^2_{+}\to\mathbb{R}$ given by
$$U(c_1,c_2)=u(c_1)+\beta u(c_2)$$
where $\beta>0$ is a discount factor and we assume the same instantaneous utility function $u$ in the two periods. Given a utility level $k$, let
$$U^{-1}(k)=\left\{(c_1,c_2)\in\mathbb{R}^2_{+}:U(c_1,c_2)=k\right\}$$
be the intertemporal indifference curve and let $(c_1,c_2)$ be a point on it. When the hypotheses of the Implicit Function Theorem – with the variables exchanged – are satisfied at $(c_1,c_2)$, there exists an implicit function $f:B(c_2)\to V(c_1)$ such that
$$U(f(c_2),c_2)=k \qquad \forall c_2\in B(c_2)$$

The scalar function $c_1=f(c_2)$ tells us how much consumption today $c_1$ has to vary when consumption tomorrow $c_2$ varies, so as to keep the overall utility $U$ constant. We have:
$$f'(c_2)=-\frac{\frac{\partial U}{\partial c_2}(c_1,c_2)}{\frac{\partial U}{\partial c_1}(c_1,c_2)}=-\frac{\beta u'(c_2)}{u'(c_1)}$$
When the number
$$IMRS_{c_1,c_2}=-f'(c_2)=\frac{\beta u'(c_2)}{u'(c_1)} \qquad (25.23)$$
exists, it is called intertemporal marginal rate of substitution: it measures the (negative) variation in $c_1$ that balances an increase in $c_2$.

Example 1210 Consider the power utility function $u(c)=c^{\gamma}/\gamma$ for $\gamma>0$. We have
$$U(c_1,c_2)=\frac{c_1^{\gamma}}{\gamma}+\beta\frac{c_2^{\gamma}}{\gamma}$$
so that the intertemporal marginal rate of substitution is $\beta(c_2/c_1)^{\gamma-1}$. N
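A small numerical sketch of (25.23) for this power specification; the parameter values below are purely illustrative assumptions of ours:

```python
# Sketch: intertemporal MRS (25.23) for u(c) = c**gamma/gamma, so u'(c) = c**(gamma-1).
def imrs(c1, c2, beta=0.95, gamma=2.0):     # beta, gamma: illustrative values
    return beta * (c2 / c1) ** (gamma - 1.0)

print(imrs(1.0, 1.0))    # with equal consumption in the two periods, IMRS = beta
print(imrs(1.0, 2.0))    # consuming more tomorrow raises the IMRS when gamma > 1
```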

25.3.3 Quadratic expansions


The Implicit Function Theorem says, inter alia, that if the function $g$ is continuously differentiable, then the implicit function $f$ is also continuously differentiable. The next result shows that this important property holds much more generally.

Theorem 1211 If in the Implicit Function Theorem the function $g$ is $n$ times continuously differentiable, then so is the implicit function $f$.$^{10}$ In particular, for $n=2$ we have
$$f''(x)=-\frac{\frac{\partial^2 g}{\partial x^2}\left(\frac{\partial g}{\partial y}\right)^2-2\frac{\partial^2 g}{\partial x\partial y}\frac{\partial g}{\partial x}\frac{\partial g}{\partial y}+\frac{\partial^2 g}{\partial y^2}\left(\frac{\partial g}{\partial x}\right)^2}{\left(\frac{\partial g}{\partial y}\right)^3} \qquad (25.24)$$
for every $x\in U(x_0)$, where the derivatives of $g$ are evaluated at $(x,f(x))$.

This expression can be written in a compact way as
$$f''(x)=-\frac{g''_{xx}\,g'^{\,2}_{y}-2g''_{xy}\,g'_{x}g'_{y}+g''_{yy}\,g'^{\,2}_{x}}{g'^{\,3}_{y}}$$
The numerator is somewhat reminiscent of the square of a binomial, which makes the formula easier to remember.
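Formula (25.24) is easy to sanity-check on a curve whose implicit function is known in closed form. A minimal sketch in Python with sympy, using the unit circle $g(x,y)=x^2+y^2-1$ (an example of our own choosing):

```python
# Sketch: check (25.24) on g(x, y) = x**2 + y**2 - 1, whose upper branch is explicit.
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.sqrt(1 - x**2)                       # implicit function, upper unit semicircle

gx, gy = 2*x, 2*f                           # g_x, g_y evaluated at (x, f(x))
gxx, gxy, gyy = 2, 0, 2
formula = -(gxx*gy**2 - 2*gxy*gx*gy + gyy*gx**2) / gy**3   # right-hand side of (25.24)

assert sp.simplify(sp.diff(f, x, 2) - formula) == 0        # matches the direct f''
```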

Proof We will omit the proof of the first part of the statement. Suppose $f$ is twice differentiable and let us apply the chain rule to (25.10), that is, to
$$f'(x)=-\frac{\frac{\partial g(x,f(x))}{\partial x}}{\frac{\partial g(x,f(x))}{\partial y}}=-\frac{g'_x(x,f(x))}{g'_y(x,f(x))}$$
$^{10}$Analyticity is also preserved: if $g$ is analytic, so is $f$.

For the sake of brevity we do not make the dependence of the derivatives of $g$ on $(x,f(x))$ explicit, so by the quotient and chain rules we can write
$$f''(x)=-\frac{\left(g''_{xx}+g''_{xy}f'(x)\right)g'_{y}-g'_{x}\left(g''_{yx}+g''_{yy}f'(x)\right)}{g'^{\,2}_{y}}$$
Substituting $f'(x)=-g'_x/g'_y$ and recalling that $g''_{xy}=g''_{yx}$ by Schwarz's Theorem, we get
$$f''(x)=-\frac{g''_{xx}\,g'^{\,2}_{y}-2g''_{xy}\,g'_{x}g'_{y}+g''_{yy}\,g'^{\,2}_{x}}{g'^{\,3}_{y}}$$
as desired.

The two previous theorems allow us to give local approximations for an implicitly defined function. As we know, one is rarely able to write the explicit formulation of a function which is implicitly defined by an equation: being able to give approximations is hence of great importance.

If $g$ is of class $C^1$ on an open set $U$, the first-order approximation of the implicitly defined function at a point $(x_0,y_0)\in U$ such that $g(x_0,y_0)=0$ is
$$f(x)=y_0-\frac{\frac{\partial g}{\partial x}(x_0,f(x_0))}{\frac{\partial g}{\partial y}(x_0,f(x_0))}(x-x_0)+o(x-x_0)$$
as $x\to x_0$.

If $g$ is of class $C^2$ on an open set $U$, the second-order (or quadratic) approximation of the implicit function at a point $(x_0,y_0)\in U$ such that $g(x_0,y_0)=0$ is, as $x\to x_0$,
$$f(x)=y_0-\frac{g'_x}{g'_y}(x-x_0)-\frac{g''_{xx}\,g'^{\,2}_{y}-2g''_{xy}\,g'_{x}g'_{y}+g''_{yy}\,g'^{\,2}_{x}}{2g'^{\,3}_{y}}(x-x_0)^2+o\left((x-x_0)^2\right)$$
where we omitted the dependence of the derivatives on the point $(x_0,f(x_0))$.
Example 1212 Given the function in Example 1204 we have
$$f''(x_0)=-\frac{2(3x_0+2y_0)^2-6(2x_0+3y_0)(3x_0+2y_0)+2(2x_0+3y_0)^2}{(3x_0+2y_0)^3}$$
so that the quadratic approximation of $f$ at a generic point $(x_0,y_0)\in g^{-1}(0)$ is, as $x\to x_0$,
$$f(x)=y_0-\frac{2x_0+3y_0}{3x_0+2y_0}(x-x_0)-\frac{2(3x_0+2y_0)^2-6(2x_0+3y_0)(3x_0+2y_0)+2(2x_0+3y_0)^2}{2(3x_0+2y_0)^3}(x-x_0)^2+o\left((x-x_0)^2\right)$$
For example, at $(x_0,y_0)=(0,1)\in g^{-1}(0)$ we have, as $x\to 0$,
$$f(x)=1-\frac{3}{2}x+\frac{5}{8}x^2+o\left(x^2\right)$$
Furthermore, knowing the second derivative allows us to complete the analysis of the critical point $(x_0,y_0)=(1/2,1)$: we have $f''(x_0)=316/1331>0$, so the point is a local minimizer. N

25.3.4 Implicit functions of several variables


From a formal standpoint – abstracting from any possible interpretation – the variables $x$ and $y$ are symmetrical in equation $g(x,y)=0$: we can try to express $y$ in terms of $x$, so to have $g(x,f(x))=0$, or $x$ in terms of $y$, so to have $g(f(y),y)=0$. Though we have concentrated on the first case for convenience, all notions and results are symmetrical in the second case (as we often noted).
In this section we extend the analysis of implicit functions to the case
$$g(x_1,...,x_n,y)=0$$
in which $x=(x_1,...,x_n)$ is a vector, while $y$ remains a scalar. In the $n+1$ arguments of the function $g:A\subseteq\mathbb{R}^{n+1}\to\mathbb{R}$, we thus separate one of them, denoted by $y$, from the other ones. The choice of which argument to label $y$ is, again from a formal standpoint, arbitrary.$^{11}$

In any case, here we regard $x$ as a vector of independent variables and $y$ as a dependent variable, so the function implicitly defined by equation $g(x,y)=0$ is a function $f$ of $n$ variables. Fortunately, the Implicit Function Theorem easily extends to this case, mutatis mutandis: since $f$ is a function of several variables, the partial derivatives $\partial f(x)/\partial x_k$ now take the place of the derivative $f'(x)$ that we had in the scalar case.

Theorem 1213 Let $g:U\to\mathbb{R}$ be defined (at least) on an open set $U$ of $\mathbb{R}^{n+1}$ and let $g(x_0,y_0)=0$. If $g$ is continuously differentiable on a neighborhood of $(x_0,y_0)$, with
$$\frac{\partial g}{\partial y}(x_0,y_0)\neq 0$$
then there exist neighborhoods $B(x_0)\subseteq\mathbb{R}^n$ and $V(y_0)\subseteq\mathbb{R}$ and a unique vector function $f:B(x_0)\to V(y_0)$ such that
$$g(x,f(x))=0 \qquad \forall x\in B(x_0) \qquad (25.25)$$
The function $f$ is continuously differentiable on $B(x_0)$, with
$$\frac{\partial f}{\partial x_k}(x)=-\frac{\frac{\partial g}{\partial x_k}(x,y)}{\frac{\partial g}{\partial y}(x,y)} \qquad (25.26)$$
for every $(x,y)\in g^{-1}(0)\cap(B(x_0)\times V(y_0))$ and every $k=1,...,n$.

By using gradients, formula (25.26) can be written as
$$\nabla f(x)=-\frac{\nabla_x g(x,y)}{\frac{\partial g}{\partial y}(x,y)}$$
where $\nabla_x g$ denotes the partial gradient of $g$ with respect to $x_1,x_2,...,x_n$ only. Moreover, $f$ being unique, also in this more general case (25.25) is equivalent to (25.14) and (25.15).
$^{11}$In applications, a specific separation may stand out in terms of interpretation, thus becoming the one of substantive interest (e.g., $y$ is an output and $x$ is a vector of inputs).

Example 1214 Let $g:\mathbb{R}^3\to\mathbb{R}$ be defined by $g(x_1,x_2,y)=x_1^2-x_2^2+y^3$ and let $(x_1,x_2,y_0)=(6,3,-3)$. We have that $g\in C^1(\mathbb{R}^3)$ and $(\partial g/\partial y)(x,y)=3y^2$, therefore
$$\frac{\partial g}{\partial y}(6,3,-3)=27\neq 0$$
By the Implicit Function Theorem, there exists a unique $y=f(x_1,x_2)$ defined in a neighborhood $U(6,3)$, which is differentiable there and takes values in a neighborhood $V(-3)$. Since
$$\frac{\partial g}{\partial x_1}(x,y)=2x_1 \quad\text{and}\quad \frac{\partial g}{\partial x_2}(x,y)=-2x_2$$
we have
$$\frac{\partial f}{\partial x_1}(x)=-\frac{2x_1}{3y^2} \quad\text{and}\quad \frac{\partial f}{\partial x_2}(x)=\frac{2x_2}{3y^2}$$
In particular,
$$\nabla f(6,3)=\left(-\frac{12}{27},\frac{6}{27}\right)$$
The reader can check that a global implicit function $f:\mathbb{R}^2\to\mathbb{R}$ exists and, after having recovered its explicit expression (which exists because of the simplicity of $g$), can verify that formula (25.26) is correct in computing $\nabla f(x)$. N
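Formula (25.26) lends itself to a direct symbolic check on this example; a minimal sketch in Python with sympy (our own verification, not part of the text):

```python
# Sketch: formula (25.26) on Example 1214, where g(x1, x2, y) = x1**2 - x2**2 + y**3.
import sympy as sp

x1, x2, y = sp.symbols('x1 x2 y')
g = x1**2 - x2**2 + y**3

df_dx1 = -sp.diff(g, x1) / sp.diff(g, y)     # -g_{x1}/g_y
df_dx2 = -sp.diff(g, x2) / sp.diff(g, y)     # -g_{x2}/g_y

point = {x1: 6, x2: 3, y: -3}                # lies on the level set: g = 0 here
assert g.subs(point) == 0
print(df_dx1.subs(point), df_dx2.subs(point))   # -4/9 and 2/9, i.e., (-12/27, 6/27)
```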

If in the previous theorems we assume that $g$ is of class $C^n$ instead of class $C^1$, the implicitly defined function $f$ is also of class $C^n$. This allows us to recover formulas analogous to (25.24) to compute higher-order partial derivatives, up to order $n$ included, for the implicit function $f$. We omit the details for the sake of brevity.

Finally, the convexity and concavity properties of the implicit function $f$ follow from points (ii) and (iii) of Proposition 1199.

N.B. Global versions of Theorems 1211 and 1213, in the spirit of Proposition 1200, can be easily established, as readers can check. O

25.3.5 Implicit operators


A more general case is
$$g(x_1,...,x_n,y_1,...,y_m)=0$$
in which both $x=(x_1,...,x_n)$ and $y=(y_1,...,y_m)$ are vectors. Here $g:A\subseteq\mathbb{R}^{n+m}\to\mathbb{R}$ is a vector function and the equation implicitly defines an operator $f=(f_1,...,f_m)$ between $\mathbb{R}^n$ and $\mathbb{R}^m$ such that
$$g(x_1,...,x_n,f_1(x_1,...,x_n),...,f_m(x_1,...,x_n))=0$$

Even more generally, we can consider the nonlinear system of equations:
$$\begin{cases} g_1(x_1,...,x_n,y_1,...,y_m)=0 \\ g_2(x_1,...,x_n,y_1,...,y_m)=0 \\ \quad\vdots \\ g_m(x_1,...,x_n,y_1,...,y_m)=0 \end{cases}$$

Here also $g=(g_1,...,g_m):A\subseteq\mathbb{R}^{n+m}\to\mathbb{R}^m$ is an operator and the system defines an operator $f=(f_1,...,f_m)$ between $\mathbb{R}^n$ and $\mathbb{R}^m$ such that
$$\begin{cases} g_1(x_1,...,x_n,f_1(x_1,...,x_n),...,f_m(x_1,...,x_n))=0 \\ g_2(x_1,...,x_n,f_1(x_1,...,x_n),...,f_m(x_1,...,x_n))=0 \\ \quad\vdots \\ g_m(x_1,...,x_n,f_1(x_1,...,x_n),...,f_m(x_1,...,x_n))=0 \end{cases} \qquad (25.27)$$
Let us focus directly on this latter general case. Here the following square submatrix of the Jacobian matrix of the operator $g$ plays a key role:
$$D_y g(x,y)=\begin{bmatrix} \frac{\partial g_1}{\partial y_1}(x,y) & \frac{\partial g_1}{\partial y_2}(x,y) & \cdots & \frac{\partial g_1}{\partial y_m}(x,y) \\ \frac{\partial g_2}{\partial y_1}(x,y) & \frac{\partial g_2}{\partial y_2}(x,y) & \cdots & \frac{\partial g_2}{\partial y_m}(x,y) \\ \vdots & \vdots & & \vdots \\ \frac{\partial g_m}{\partial y_1}(x,y) & \frac{\partial g_m}{\partial y_2}(x,y) & \cdots & \frac{\partial g_m}{\partial y_m}(x,y) \end{bmatrix}$$

We can now state, without proof, the operator version of the Implicit Function Theorem, which is the most general form of this result that we consider.

Theorem 1215 Let $g:U\to\mathbb{R}^m$ be defined (at least) on an open set $U$ of $\mathbb{R}^{n+m}$ and let $g(x_0,y_0)=0$. If $g$ is continuously differentiable on a neighborhood of $(x_0,y_0)$, with
$$\det D_y g(x_0,y_0)\neq 0 \qquad (25.28)$$
then there exist neighborhoods $B(x_0)\subseteq\mathbb{R}^n$ and $V(y_0)\subseteq\mathbb{R}^m$ and a unique operator $f=(f_1,...,f_m):B(x_0)\to V(y_0)$ such that (25.27) holds for every $x\in B(x_0)$. The operator $f$ is continuously differentiable on $B(x_0)$, with
$$Df(x)=-(D_y g(x,y))^{-1}D_x g(x,y) \qquad (25.29)$$
for every $(x,y)\in g^{-1}(0)\cap(B(x_0)\times V(y_0))$.

The Jacobian of the implicit operator is thus pinned down by formula (25.29). To better understand this formula, it is convenient to write it as an equality
$$\underbrace{D_y g(x,y)}_{m\times m}\underbrace{Df(x)}_{m\times n}=-\underbrace{D_x g(x,y)}_{m\times n}$$
of two $m\times n$ matrices. In terms of the $(i,j)\in\{1,...,m\}\times\{1,...,n\}$ component of each such matrix, the equality is
$$\sum_{k=1}^{m}\frac{\partial g_i}{\partial y_k}(x)\frac{\partial f_k}{\partial x_j}(x)=-\frac{\partial g_i}{\partial x_j}(x)$$
For each independent variable $x_j$, we can determine the sought-after $m$-dimensional vector
$$\left(\frac{\partial f_1}{\partial x_j}(x),...,\frac{\partial f_m}{\partial x_j}(x)\right)$$

by solving the following linear system of $m$ equations:
$$\begin{cases} \sum_{k=1}^{m}\frac{\partial g_1}{\partial y_k}(x)\frac{\partial f_k}{\partial x_j}(x)=-\frac{\partial g_1}{\partial x_j}(x) \\ \sum_{k=1}^{m}\frac{\partial g_2}{\partial y_k}(x)\frac{\partial f_k}{\partial x_j}(x)=-\frac{\partial g_2}{\partial x_j}(x) \\ \quad\vdots \\ \sum_{k=1}^{m}\frac{\partial g_m}{\partial y_k}(x)\frac{\partial f_k}{\partial x_j}(x)=-\frac{\partial g_m}{\partial x_j}(x) \end{cases}$$
By doing this for each $j$, we can finally determine the Jacobian $Df(x)$ of the implicit operator.
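This column-by-column procedure translates directly into code. A minimal numerical sketch in Python with numpy; the helper name implicit_jacobian and the finite-difference approach are our own illustrative choices, assuming $g$ is supplied as a callable with $g(x,y)=0$ at the point of interest:

```python
# Sketch: Df(x) = -D_y g(x, y)^{-1} D_x g(x, y), formula (25.29), via finite differences.
import numpy as np

def implicit_jacobian(g, x, y, h=1e-6):
    """Approximate the Jacobian Df(x) of the implicit operator at a point with g(x, y) = 0."""
    n, m = len(x), len(y)
    Dx, Dy = np.empty((m, n)), np.empty((m, m))
    for j in range(n):                      # build D_x g column by column
        e = np.zeros(n); e[j] = h
        Dx[:, j] = (g(x + e, y) - g(x - e, y)) / (2 * h)
    for k in range(m):                      # build D_y g column by column
        e = np.zeros(m); e[k] = h
        Dy[:, k] = (g(x, y + e) - g(x, y - e)) / (2 * h)
    # one linear system per independent variable x_j, solved all at once
    return -np.linalg.solve(Dy, Dx)
```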

Example 1216 Define $g=(g_1,g_2):\mathbb{R}^4\to\mathbb{R}^2$ by
$$g_1(x_1,x_2,y_1,y_2)=3x_1-4e^{x_2}+y_1^2-6y_2$$
$$g_2(x_1,x_2,y_1,y_2)=2x_1y_2^2-4x_2e^{y_1}+y_1^2-1$$
and let $(x_0,y_0)=(1,0,1,0)$. The submatrix of the Jacobian matrix of the operator $g$ containing the partial derivatives of $g$ with respect to $y_1$ and $y_2$ is given by
$$D_y g(x,y)=\begin{bmatrix} 2y_1 & -6 \\ -4x_2e^{y_1}+2y_1 & 4x_1y_2 \end{bmatrix}$$
while that reporting the partial derivatives with respect to $x_1$ and $x_2$ is
$$D_x g(x,y)=\begin{bmatrix} 3 & -4e^{x_2} \\ 2y_2^2 & -4e^{y_1} \end{bmatrix}$$
The determinant of $D_y g(x,y)$ is $|D_y g(x,y)|=8x_1y_1y_2-24x_2e^{y_1}+12y_1$, so $|D_y g(x_0,y_0)|=12\neq 0$. Condition (25.28) is thus satisfied. By the last theorem, there exists an implicit operator $f=(f_1,f_2):B(x_0)\to V(y_0)$ which is continuously differentiable on $B(x_0)$.
The partial derivatives
$$\frac{\partial f_1}{\partial x_1}(x),\quad \frac{\partial f_2}{\partial x_1}(x)$$
satisfy the following system
$$\begin{bmatrix} 2y_1 & -6 \\ -4x_2e^{y_1}+2y_1 & 4x_1y_2 \end{bmatrix}\begin{bmatrix} \frac{\partial f_1}{\partial x_1}(x) \\ \frac{\partial f_2}{\partial x_1}(x) \end{bmatrix}=-\begin{bmatrix} 3 \\ 2y_2^2 \end{bmatrix}$$
while the partial derivatives
$$\frac{\partial f_1}{\partial x_2}(x),\quad \frac{\partial f_2}{\partial x_2}(x)$$
satisfy the following system
$$\begin{bmatrix} 2y_1 & -6 \\ -4x_2e^{y_1}+2y_1 & 4x_1y_2 \end{bmatrix}\begin{bmatrix} \frac{\partial f_1}{\partial x_2}(x) \\ \frac{\partial f_2}{\partial x_2}(x) \end{bmatrix}=\begin{bmatrix} 4e^{x_2} \\ 4e^{y_1} \end{bmatrix}$$

Solving the two systems, we find:
$$\frac{\partial f_1}{\partial x_1}(x)=\frac{3x_1y_2+3y_2^2}{6x_2e^{y_1}-2x_1y_1y_2-3y_1}$$
$$\frac{\partial f_2}{\partial x_1}(x)=\frac{6x_2e^{y_1}+2y_1y_2^2-3y_1}{12x_2e^{y_1}-4x_1y_1y_2-6y_1}$$
$$\frac{\partial f_1}{\partial x_2}(x)=\frac{4x_1y_2e^{x_2}+6e^{y_1}}{2x_1y_1y_2-6x_2e^{y_1}+3y_1}$$
$$\frac{\partial f_2}{\partial x_2}(x)=\frac{2y_1e^{y_1}+4x_2e^{y_1+x_2}-2y_1e^{x_2}}{2x_1y_1y_2-6x_2e^{y_1}+3y_1}$$
So, we have found the Jacobian matrix $Df(x)$ of the operator $f$. N
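At the point $(x_0,y_0)=(1,0,1,0)$ these expressions can be checked against formula (25.29) computed directly; a minimal numerical sketch in Python with numpy (the printed values are our own evaluation):

```python
# Sketch: Example 1216 at (x1, x2, y1, y2) = (1, 0, 1, 0), using exact Jacobians.
import numpy as np

x1, x2, y1, y2 = 1.0, 0.0, 1.0, 0.0
Dyg = np.array([[2*y1,                     -6.0],
                [-4*x2*np.exp(y1) + 2*y1,  4*x1*y2]])
Dxg = np.array([[3.0,      -4*np.exp(x2)],
                [2*y2**2,  -4*np.exp(y1)]])

Df = -np.linalg.solve(Dyg, Dxg)     # formula (25.29)
print(Df)                            # [[0, 2e], [1/2, (2e - 2)/3]], as the formulas give
```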

Our previous discussion implies, inter alia, that in the special case $m=1$ formula (25.29) reduces to
$$\frac{\partial g}{\partial y}(x)\frac{\partial f}{\partial x_j}(x)=-\frac{\partial g}{\partial x_j}(x)$$
which is formula (25.26) of the vector function version of the Implicit Function Theorem. Since condition (25.28) reduces to $(\partial g/\partial y)(x_0,y_0)\neq 0$, we conclude that the vector function version is, indeed, the special case $m=1$. Everything fits together.

25.4 A global perspective


We now return to the global perspective of Section 25.2 and take a deeper look at some of the motivating questions that we posed in the first section. For simplicity, we will focus on the basic equation $g(x,y)=0$, where $g:C\subseteq\mathbb{R}^2\to\mathbb{R}$ is a function of two variables $x$ and $y$. But, before starting the analysis, we introduce projections, which will play a key role.

25.4.1 Preamble: projections and shadows


Let $A$ be a subset of the plane $\mathbb{R}^2$, whose points we denote by $(x,y)$. Its projection
$$\pi_1(A)=\{x\in\mathbb{R}:\exists y\in\mathbb{R} \text{ such that } (x,y)\in A\}$$
on the horizontal axis is the set of points $x$ for which there exists a point $y$ on the vertical axis such that the pair $(x,y)$ belongs to $A$.$^{12}$

Likewise, define the projection
$$\pi_2(A)=\{y\in\mathbb{R}:\exists x\in\mathbb{R} \text{ such that } (x,y)\in A\}$$
on the vertical axis, that is, the set of points $y$ on the vertical axis for which there exists (at least) one point $x$ on the horizontal axis such that $(x,y)$ belongs to $A$.

The projections $\pi_1(A)$ and $\pi_2(A)$ are nothing but the "shadows" of the set $A\subseteq\mathbb{R}^2$ on the two axes, as the following figure illustrates:
$^{12}$This notion of projection is not to be confused with the altogether different one seen in Chapter 21.1.

[Figure: a set $A$ in the plane, with its "shadows" $\pi_1(A)$ on the horizontal axis and $\pi_2(A)$ on the vertical axis.]

Example 1217 (i) Let $A=[a,b]\times[c,d]$. In this case,
$$\pi_1(A)=[a,b] \quad\text{and}\quad \pi_2(A)=[c,d]$$
More generally, if $A=A_1\times A_2$, one has
$$\pi_1(A)=A_1 \quad\text{and}\quad \pi_2(A)=A_2$$
The projections of a product set are its factors.

(ii) Let $A=\left\{(x,y)\in\mathbb{R}^2:x^2+y^2=1\right\}$ and $B=[-1,1]\times[-1,1]$. Even though $A\neq B$, we obtain
$$\pi_1(A)=\pi_2(A)=[-1,1]=\pi_1(B)=\pi_2(B)$$
Different sets may share the same projections.

(iii) Let $B_\varepsilon(\bar x,\bar y)=\left\{(x,y)\in\mathbb{R}^2:\sqrt{(x-\bar x)^2+(y-\bar y)^2}<\varepsilon\right\}$ be a neighborhood of a point $(\bar x,\bar y)\in\mathbb{R}^2$. One has
$$\pi_1(B_\varepsilon(\bar x,\bar y))=B_\varepsilon(\bar x)=(\bar x-\varepsilon,\bar x+\varepsilon)$$
and
$$\pi_2(B_\varepsilon(\bar x,\bar y))=B_\varepsilon(\bar y)=(\bar y-\varepsilon,\bar y+\varepsilon)$$
We conclude that the projections of a neighborhood of $(\bar x,\bar y)$ in $\mathbb{R}^2$ are neighborhoods of equal radius of $\bar x$ and $\bar y$ in $\mathbb{R}$.

(iv) Given $f(x)=1/|x|$ defined on $\mathbb{R}\smallsetminus\{0\}$, one has
$$\pi_1(\operatorname{Gr} f)=\mathbb{R}\smallsetminus\{0\} \quad\text{and}\quad \pi_2(\operatorname{Gr} f)=(0,\infty)$$
In particular, $\pi_1(\operatorname{Gr} f)$ is the domain of $f$ and $\pi_2(\operatorname{Gr} f)$ is the image $\operatorname{Im} f$. This holds in general: if $f:A\subseteq\mathbb{R}\to\mathbb{R}$, one has $\pi_1(\operatorname{Gr} f)=A$ and $\pi_2(\operatorname{Gr} f)=\operatorname{Im} f$. N

25.4.2 Implicit functions


Given a function $g:C\subseteq\mathbb{R}^2\to\mathbb{R}$ of two variables, we have
$$g^{-1}(0)\subseteq\pi_1(g^{-1}(0))\times\pi_2(g^{-1}(0)) \qquad (25.30)$$
So, for $g(x,f(x))=0$ to be well posed we need
$$x\in\pi_1(g^{-1}(0)) \quad\text{and}\quad f(x)\in\pi_2(g^{-1}(0))$$
If the implicit function $f$ exists, its domain will be included in $\pi_1(g^{-1}(0))$ and its codomain will be included in $\pi_2(g^{-1}(0))$. This leads us to the following definition.

Definition 1218 The equation $g(x,y)=0$, with $g:C\subseteq\mathbb{R}^2\to\mathbb{R}$, implicitly defines on the rectangle $A\times B\subseteq C$, with $A\subseteq\pi_1(g^{-1}(0))$ and $B\subseteq\pi_2(g^{-1}(0))$, a function $f:A\to B$ if
$$g(x,f(x))=0 \qquad \forall x\in A$$
If such an $f$ is unique, equation $g(x,y)=0$ is said to be explicitable on $A\times B$.

The uniqueness of the implicit function $f$ is crucial in applications as it guarantees a univocal relationship between the variables $x$ and $y$. For this reason, most of the results that we will see deal with equations $g(x,y)=0$ that implicitly define a unique function $f$. In light of Proposition 1194, we have
$$g^{-1}(0)\cap(A\times B)=\operatorname{Gr} f \qquad (25.31)$$
that is,
$$g(x,y)=0 \iff y=f(x) \qquad \forall(x,y)\in A\times B$$
In this significant case, the implicit function $f$ allows us to represent the level curve $g^{-1}(0)$ on $A\times B$ by means of its graph $\operatorname{Gr} f$. In other words, the level curve admits a functional representation.

The following example illustrates these ideas.

Example 1219 Let $g:\mathbb{R}^2\to\mathbb{R}$ be given by $g(x,y)=x^2+y^2-1$. The level curve
$$g^{-1}(0)=\left\{(x,y)\in\mathbb{R}^2:x^2+y^2=1\right\}$$
is the unit circle. Since $\pi_1(g^{-1}(0))=\pi_2(g^{-1}(0))=[-1,1]$, a possible implicit function on a rectangle $A\times B$ takes the form $f:A\to B$ with $A\subseteq[-1,1]$ and $B\subseteq[-1,1]$. Let us fix $x\in[-1,1]$, so as to analyze the set
$$S(x)=\left\{y\in[-1,1]:x^2+y^2=1\right\}$$
of solutions $y$ to the equation $x^2+y^2=1$. We have
$$S(x)=\begin{cases} \{0\} & \text{if } x=-1 \\ \left\{-\sqrt{1-x^2},\sqrt{1-x^2}\right\} & \text{if } -1<x<1 \\ \{0\} & \text{if } x=1 \end{cases}$$

The set has two elements, except for $x=\pm 1$. In other words, for every $-1<x<1$ there are two values $y$ for which $g(x,y)=0$. Let us consider the projections' rectangle
$$A\times B=[-1,1]\times[-1,1]$$
Any function $f:[-1,1]\to[-1,1]$ such that
$$f(x)\in S(x) \qquad \forall x\in[-1,1]$$
entails that
$$g(x,f(x))=0 \qquad \forall x\in[-1,1]$$
and is thus implicitly defined by $g$ on $A\times B$. There are infinitely many such functions; for example, this is the case for the function
$$f(x)=\begin{cases} \sqrt{1-x^2} & \text{if } x\in\mathbb{Q}\cap[-1,1] \\ -\sqrt{1-x^2} & \text{otherwise} \end{cases}$$
as well as for the functions
$$f(x)=\sqrt{1-x^2} \quad\text{and}\quad f(x)=-\sqrt{1-x^2} \qquad \forall x\in[-1,1] \qquad (25.32)$$
Therefore, there are infinitely many functions implicitly defined by $g$ on the rectangle $A\times B=[-1,1]\times[-1,1]$.$^{13}$ The equation $g(x,y)=0$ is therefore not explicitable on this rectangle, which makes this case hardly interesting. Let us consider instead the less ambitious rectangle
$$\tilde A\times\tilde B=[-1,1]\times[0,1]$$
The function $f:[-1,1]\to[0,1]$ defined by $f(x)=\sqrt{1-x^2}$ is the only function such that
$$g(x,f(x))=g\left(x,\sqrt{1-x^2}\right)=0 \qquad \forall x\in[-1,1]$$
that is, $f$ is the only function implicitly defined by $g$ on the rectangle $\tilde A\times\tilde B$. Equation $g(x,y)=0$ is then explicitable on $\tilde A\times\tilde B$, with
$$g^{-1}(0)\cap(\tilde A\times\tilde B)=\operatorname{Gr} f$$
$^{13}$Note that most of them are somewhat irregular; the only continuous ones among them are the two in (25.32).

The level curve $g^{-1}(0)$ can be represented on $\tilde A\times\tilde B$ by means of the graph of $f$.

[Figure: the graph of $f(x)=\sqrt{1-x^2}$ on $[-1,1]$, i.e., the upper unit semicircle.]

In a similar fashion, if we consider the rectangle $\bar A\times\bar B=[-1,1]\times[-1,0]$ and if we define $h:[-1,1]\to[-1,0]$ by $h(x)=-\sqrt{1-x^2}$, we have
$$g(x,h(x))=g\left(x,-\sqrt{1-x^2}\right)=0 \qquad \forall x\in[-1,1]$$
and also that
$$g^{-1}(0)\cap(\bar A\times\bar B)=\operatorname{Gr} h$$
The function $h$ is, thus, the only one implicitly defined by $g$ on the rectangle $\bar A\times\bar B$, and the level curve $g^{-1}(0)$ can be represented by means of its graph. The equation $g(x,y)=0$ is explicitable on $\bar A\times\bar B$.

[Figure: the graph of $h(x)=-\sqrt{1-x^2}$ on $[-1,1]$, i.e., the lower unit semicircle.]

To sum up, there are infinitely many implicit functions on the projections' rectangle $A\times B$, while uniqueness obtains when we restrict ourselves to the smaller rectangles $\tilde A\times\tilde B$ and $\bar A\times\bar B$. The study of implicit functions is of interest on these two rectangles because the unique implicit function $f$ defined thereon describes a univocal relationship between the variables $x$ and $y$ which equation $g(x,y)=0$ implicitly determines. N

O.R. If we draw the graph of the level curve $g^{-1}(0)$, one can note how the rectangle $A\times B$ can be thought of as a sort of "frame" on this graph, isolating a part of it. In some frames the graph is explicitable; in other, less fortunate, ones it is not. By changing the framing we can tell apart different parts of the graph according to their explicitability. H

The last example showed how important it is to study, for each $x\in\pi_1(g^{-1}(0))$, the solution set
$$S(x)=\left\{y\in\pi_2(g^{-1}(0)):g(x,y)=0\right\}$$
The scalar functions $f:\pi_1(g^{-1}(0))\to\pi_2(g^{-1}(0))$, with $f(x)\in S(x)$ for every $x$ in their domain, are the possible implicit functions. In particular, when the rectangle $A\times B$ is such that $S(x)\cap B$ is a singleton for each $x\in A$, we have a unique implicit function $f:A\to B$. In this case, for each $x\in A$ there is a unique solution $y\in B$ to equation $g(x,y)=0$.

Let us see another simple example, warning the reader that – though useful to fix ideas – these are very fortunate cases: usually, constructing $S(x)$ is far from easy (though local, the Implicit Function Theorem is key in this regard).
Example 1220 Let $g:\mathbb{R}^2_{+}\to\mathbb{R}$ be given by $g(x,y)=\sqrt{xy}-1$. We have
$$g^{-1}(0)=\left\{(x,y)\in\mathbb{R}^2_{+}:xy=1\right\}$$
Since $\pi_1(g^{-1}(0))=\pi_2(g^{-1}(0))=(0,\infty)$, we have
$$A\times B\subseteq(0,\infty)\times(0,\infty)=\mathbb{R}^2_{++}$$
Let us fix $x\in(0,\infty)$ and let us analyze the set
$$S(x)=\{y\in(0,\infty):xy=1\}$$
Since
$$S(x)=\left\{\frac{1}{x}\right\} \qquad \forall x\in(0,\infty)$$
we consider $A\times B=\mathbb{R}^2_{++}$ and $f:(0,\infty)\to(0,\infty)$ given by $f(x)=1/x$. We have
$$g(x,f(x))=g\left(x,\frac{1}{x}\right)=0 \qquad \forall x\in(0,\infty)$$
and $f$ is the only function implicitly defined by $g$ on $\mathbb{R}^2_{++}$. Moreover, we have
$$g^{-1}(0)\cap\mathbb{R}^2_{++}=\operatorname{Gr} f$$
The level curve $g^{-1}(0)$ can be represented on $\mathbb{R}^2_{++}$ as the graph of $f$. N
A final remark. When writing $g(x,y)=0$, the variables $x$ and $y$ play symmetric roles, so that we can think of a relationship of type $y=f(x)$ or of type $x=\varphi(y)$ indifferently. In what follows, we will always consider a function $y=f(x)$, as the case $x=\varphi(y)$ can be easily recovered via an analysis parallel to the one we conduct here.

25.4.3 Comparative statics I


The marginal analysis conducted in Section 25.3.2 from a local angle can be carried out globally, as readers can check (cf. Example 1223 below). The study of functions that are implicitly defined by equations
$$g(x,y)=0 \qquad (25.33)$$
occurs in economics in at least two other settings:

(i) equilibrium analysis, where equation (25.33) derives from an equilibrium condition in which $y$ is an equilibrium (endogenous) variable and $x$ is an (exogenous) parameter;

(ii) optimization problems, where equation (25.33) comes from a first-order condition in which $y$ is a choice variable and $x$ is a parameter.

The analysis of the relationship between $x$ and $y$, that is, between the values of the parameter and the resulting choice or equilibrium variable, is a comparative statics exercise that thus consists in studying the function $f$ implicitly defined by the economic relation (25.33). The uniqueness of such an implicit function, and hence the explicitability of equation (25.33), is essential to best conduct comparative statics exercises.

The following two subsections will present these two comparative statics problems.$^{14}$

Equilibrium comparative statics Consider the market for a given good, as seen in Chapter 12. Let $D:[0,b]\to\mathbb{R}$ and $S:[0,b]\to\mathbb{R}$ be the demand and supply functions, respectively. A pair $(p,q)\in[0,b]\times\mathbb{R}_+$ of prices and quantities is said to be a market equilibrium if
$$q=D(p)=S(p) \qquad (25.34)$$
In particular, having found the equilibrium price $\hat p$ by solving the equation $D(p)=S(p)$, the equilibrium quantity is $\hat q=D(\hat p)=S(\hat p)$.

Suppose that the demand for the good (also) depends on an exogenous variable $\tau\geq 0$. For example, $\tau$ may be the level of indirect taxation, which influences the demanded quantity. The demand thus takes the form $D(p,\tau)$ and is a function $D:[0,b]\times\mathbb{R}_+\to\mathbb{R}$, that is, it depends on both the market price $p$ and the value $\tau$ of the exogenous variable. The equilibrium condition (25.34) now becomes
$$q=D(p,\tau)=S(p) \qquad (25.35)$$
and the equilibrium price $\hat p$ varies as $\tau$ changes. What is the relationship between taxation levels and equilibrium prices? Which properties does such a relationship have?

Answering these simple, yet important, economic questions amounts to asking: (i) whether a (unique) function $p=f(\tau)$ which connects taxation and equilibrium prices (i.e., the exogenous and endogenous variables of this simple market model) exists, and (ii) which properties such a function has.

To deal with this problem, we introduce the function $g:[0,b]\times\mathbb{R}_+\to\mathbb{R}$ given by $g(p,\tau)=S(p)-D(p,\tau)$, so that the equilibrium condition (25.35) can be written as
$$g(p,\tau)=0$$
$^{14}$In Chapter 33 we will further study comparative statics exercises in optimization problems.

In particular,
$$g^{-1}(0)=\{(p,\tau)\in[0,b]\times\mathbb{R}_+:g(p,\tau)=0\}$$
is the set of all pairs of equilibrium prices/taxation levels (i.e., of endogenous/exogenous variables).

The two questions asked above are now equivalent to asking whether:

(i) a (unique) implicit function $p=f(\tau)$ such that $g(f(\tau),\tau)=0$ for all $\tau\geq 0$ exists;

(ii) if so, which are the properties of such a function $f$: for example, whether it is decreasing, so that higher indirect taxes correspond to lower equilibrium prices.

Problems such as these, where the relationship between endogenous and exogenous variables is studied – in particular, how changes in the latter impact the former – are of central importance in economic theory and in its empirical tests.

To fix ideas, let us examine the simple linear case, where everything is straightforward.

Example 1221 Consider the linear demand and supply functions:
$$D(p,\tau)=\alpha-\beta(p+\tau)$$
$$S(p)=a+bp$$
where $\beta>0$ and $b>0$. We have
$$g(p,\tau)=a+bp-\alpha+\beta(p+\tau)$$
so that the function $f:\mathbb{R}_+\to\mathbb{R}$ given by
$$f(\tau)=\frac{\alpha-a}{b+\beta}-\frac{\beta}{\beta+b}\tau \qquad (25.36)$$
clearly satisfies (25.35). The equation $g(p,\tau)=0$ thus implicitly defines (and in this case also explicitly) the function $f$ given by (25.36). Its properties are obvious: for example, it is strictly decreasing, so that changes in the taxation level bring about opposite changes in equilibrium prices.

Regarding the equilibrium quantity $\hat q$, for every $\tau$ it is
$$\hat q=D(f(\tau),\tau)=S(f(\tau))$$
In other words, we have a function $\psi:\mathbb{R}_+\to\mathbb{R}$, equivalently defined by $\psi(\tau)=D(f(\tau),\tau)$ or by $\psi(\tau)=S(f(\tau))$, such that $\psi(\tau)$ is the equilibrium quantity corresponding to the taxation level $\tau$. By using $\psi(\tau)=S(f(\tau))$ for the sake of convenience, from (25.36) we get
$$\psi(\tau)=a+\frac{b(\alpha-a)}{b+\beta}-\frac{b\beta}{\beta+b}\tau$$
It is a strictly decreasing function, so that changes in the taxation level bring about opposite changes in the equilibrium quantities as well. N
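The linear case can be reproduced in a few lines of symbolic computation; a minimal sketch in Python with sympy (our own check of (25.36)):

```python
# Sketch: solving g(p, tau) = 0 for the linear specification of Example 1221.
import sympy as sp

p, tau, a, b, alpha, beta = sp.symbols('p tau a b alpha beta', positive=True)
g = a + b*p - alpha + beta*(p + tau)        # S(p) - D(p, tau)

f = sp.solve(g, p)[0]                        # the implicit (here explicit) function f(tau)
print(sp.simplify(f))                        # (alpha - a - beta*tau)/(b + beta)
print(sp.diff(f, tau))                       # -beta/(b + beta) < 0: f is strictly decreasing
```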

Optimum comparative statics Consider the optimization problem
$$\max_{y}\ \pi(p,y) \quad\text{sub } y\geq 0 \qquad (25.37)$$
of a firm with profit function $\pi$ given by $\pi(p,y)=py-c(y)$, where $c:[0,\infty)\to\mathbb{R}$ is a differentiable cost function (cf. Section 18.1.4). The choice variable is the production level $y$ of some good, say potatoes.

If, as one would expect, there is at least one production level $y>0$ such that $\pi(p,y)>0$, the level $y=0$ is not optimal. So, problem (25.37) becomes
$$\max_{y}\ \pi(p,y) \quad\text{sub } y>0 \qquad (25.38)$$
Since the interval $(0,\infty)$ is open, by Fermat's Theorem a necessary condition for $y>0$ to be optimal is that it satisfies the first-order condition
$$\frac{\partial\pi(p,y)}{\partial y}=p-c'(y)=0 \qquad (25.39)$$
The key aspect of the producer's problem is to assess how the optimal production of potatoes varies as the market price of potatoes changes, i.e., how the production of potatoes is affected by their price. Such a relevant relationship between prices and quantities is expressed by the scalar function $f$ such that
$$p-c'(f(p))=0 \qquad \forall p\geq 0$$
that is, by the function implicitly defined by the first-order condition (25.39). The function $f$ is referred to as the producer's supply function (of potatoes). For each price level $p$, it gives the optimal quantity $y=f(p)$. Its existence and properties (for example, whether it is increasing, so that higher prices lead to larger produced quantities of potatoes, hence larger supplied quantities in the market) are of central importance in studying a good's market. In particular, the sum of the supply functions of all producers who are present in the market constitutes the market supply function $S(p)$ which we saw in Chapter 12.

To formalize the derivation of the supply function from the optimization problem (25.38), we define a function $g:[0,\infty)\times(0,\infty)\to\mathbb{R}$ by
$$g(p,y)=p-c'(y)$$
The first-order condition (25.39) can be rewritten as
$$g(p,y)=0$$
If there exists an implicit function $y=f(p)$ such that $g(p,f(p))=0$, it is nothing but the supply function itself. Let us see a simple example where the function $f$ and its properties can be recovered with simple computations.

Example 1222 Consider the quadratic cost function $c(y)=y^2$ for $y\geq 0$. Here $g(p,y)=p-2y$, so the only function $f:[0,\infty)\to[0,\infty)$ implicitly defined by $g$ on $\mathbb{R}^2_+$ is $f(p)=p/2$. In particular, $f$ is strictly increasing, so that higher prices entail a higher production, and hence a larger supply. N
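The derivation of a supply function from a first-order condition can be automated; a minimal sketch in Python with sympy, where the quartic cost in the second check is a hypothetical example of ours:

```python
# Sketch: supply functions from p - c'(y) = 0, as in Example 1222.
import sympy as sp

p, y = sp.symbols('p y', positive=True)
print(sp.solve(p - sp.diff(y**2, y), y))      # [p/2]: Example 1222's supply function

c = y**4 / 4                                   # hypothetical strictly convex cost
print(sp.solve(p - sp.diff(c, y), y))         # [p**(1/3)]: again strictly increasing in p
```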

25.4.4 Properties
The first important problem one faces when analyzing implicit functions is that of determining which conditions on the function $g$ guarantee that equation $g(x,y)=0$ is explicitable on a rectangle, that is, that it defines a unique implicit function there. Later in the book we will establish a Global Implicit Function Theorem (Section 26.3), a deep result. Here we can, however, establish a few simple, yet quite interesting, facts that follow from Propositions 1195 and 1196.

If, for simplicity,$^{15}$ we focus on the rectangle $\pi_1(g^{-1}(0))\times\pi_2(g^{-1}(0))$, for the problem to be well posed it is necessary that
$$S(x)=\left\{y\in\pi_2(g^{-1}(0)):g(x,y)=0\right\}\neq\emptyset \qquad \forall x\in\pi_1(g^{-1}(0)) \qquad (25.40)$$
So, for every possible $x$ at least one solution $(x,y)$ to equation $g(x,y)=0$ exists. As previously noted, every scalar function $f:\pi_1(g^{-1}(0))\to\pi_2(g^{-1}(0))$ with $f(x)\in S(x)$ for all $x\in\pi_1(g^{-1}(0))$ is a possible implicit function.
In view of Proposition 1195, the non-emptiness condition (25.40) holds if
$$\inf_{y\in\pi_2(g^{-1}(0))} g(x,y)\leq 0\leq \sup_{y\in\pi_2(g^{-1}(0))} g(x,y) \qquad \forall x\in\pi_1(g^{-1}(0))$$
Moreover, by Proposition 1196, if $g$ is strictly monotone in $y$ then equation $g(x,y)=0$ defines a unique implicit function $f:\pi_1(g^{-1}(0))\to\pi_2(g^{-1}(0))$ on the rectangle $\pi_1(g^{-1}(0))\times\pi_2(g^{-1}(0))$.

The results of Section 25.2 permit us to ascribe some notable properties to the implicit function. Specifically, let $f:\pi_1(g^{-1}(0))\to\pi_2(g^{-1}(0))$ be the unique function such that $g(x,f(x))=0$ for all $x\in\pi_1(g^{-1}(0))$. By Propositions 1199 and 1200, if $g$ is strictly increasing in $y$, then $f$ is:$^{16}$

(i) strictly decreasing if $g$ is strictly increasing in $x$;

(ii) (strictly) convex if $g$ is (strictly) quasi-concave;

(iii) (strictly) concave if $g$ is (strictly) quasi-convex;

(iv) continuous if $g$ is continuous;

(v) continuously differentiable, with
$$f'(x)=-\frac{\frac{\partial g}{\partial x}(x,y)}{\frac{\partial g}{\partial y}(x,y)} \qquad \forall(x,y)\in g^{-1}(0)$$
if $g$ is continuously differentiable on $A\times B$, with either $\partial g(x,y)/\partial y>0$ for all $(x,y)\in A\times B$ or $\partial g(x,y)/\partial y<0$ for all $(x,y)\in A\times B$.

$^{15}$What we establish here and in the next subsection is easily seen to hold for any rectangle $A\times B$.
$^{16}$In points (ii) and (iii) we tacitly assume that the domain $C$ is convex, while in points (iv) and (v) we assume that it is open.

Point (ii) makes rigorous in a global sense – in contrast to the local one already remarked in Section 25.3.2 – the expression "convex indifference curves" by showing that they are, indeed, represented via convex implicit functions.

Example 1223 Consider the Cobb-Douglas production function $g:\mathbb{R}^2_{++}\to\mathbb{R}$ given by $g(x,y)=x^{\alpha}y^{1-\alpha}$, with $0<\alpha<1$. In Example 1207 we showed via the Implicit Function Theorem that, given any $k>0$, equation $g(x,y)=k$ implicitly defines a unique $f_k:B(x_0)\to V(y_0)$ at the point $(x_0,y_0)\in g^{-1}(k)$. But do we really need the Implicit Function Theorem? Using the results of Section 25.2 we can actually do much better: equation $g(x,y)=k$ implicitly defines a unique $f_k:(0,\infty)\to(0,\infty)$ on the entire $\mathbb{R}^2_{++}$ – so, globally and not just locally at a point $(x_0,y_0)\in g^{-1}(k)$. Indeed, we can invoke Propositions 1195 and 1196 since $g$ is continuous and strictly increasing in $y$, while condition (25.3) holds because
$$\inf_{y>0} g(x,y)=0 \quad\text{and}\quad \sup_{y>0} g(x,y)=+\infty \qquad \forall x>0$$
Thus, the results of Section 25.2 are all we need in this example: there is no need to invoke the Implicit Function Theorem. For instance, the continuous differentiability of $f_k$ follows from Proposition 1200 since $\partial g(x,y)/\partial y>0$ for all $(x,y)\in\mathbb{R}^2_{++}$. In sum, here the Implicit Function Theorem actually delivers an inferior, local rather than global, result. N

25.4.5 Comparative statics II


Let us use the observations just made for the comparative statics problems of Section 25.4.3.

Equilibrium comparative statics: properties We begin with the equilibrium problem with indirect taxation $\tau$. Suppose that:

(i) $D:[0,b]\times\mathbb{R}_+\to\mathbb{R}$ and $S:[0,b]\to\mathbb{R}$ are continuous and such that $D(0,\tau)\geq S(0)$ and $D(b,\tau)\leq S(b)$ for every $\tau$;

(ii) $D$ is strictly decreasing in $p$ and $S$ is strictly increasing.

The function $g:[0,b]\times\mathbb{R}_+\to\mathbb{R}$ given by $g(p,\tau)=S(p)-D(p,\tau)$ is therefore strictly increasing in $p$. Since condition (25.3) holds,$^{17}$ by Propositions 1195 and 1196 the equation $g(p,\tau)=0$ defines a unique function $p=f(\tau)$ such that
$$g(f(\tau),\tau)=0 \qquad \forall\tau\geq 0$$
By Proposition 1199, it is:

(i) continuous because $D$ and $S$ are continuous;

(ii) strictly decreasing if $D$ is strictly decreasing in $\tau$;

(iii) (strictly) convex if $S$ is (strictly) quasi-concave and $D$ is (strictly) quasi-convex.

$^{17}$Indeed, $D$ and $S$ are continuous and, furthermore, $D(0,\tau)\geq S(0)$ and $D(b,\tau)\leq S(b)$ for every $\tau$.

Property (ii) is especially interesting. Under the natural hypothesis that $D$ is strictly decreasing in $\tau$, we have that $f$ is strictly decreasing: changes in taxation bring about opposite changes in equilibrium prices (increases in $\tau$ entail decreases in $p$, and decreases in $\tau$ determine increases in $p$).

In the linear case of Example 1221, the existence and properties of $f$ follow from simple computations. The results in this section allow us to extend the same conclusions to much more general demand and supply functions.

Optimum comparative statics: properties Consider the optimization problem
$$\max_{c}\ F(\theta,c) \quad\text{sub } c\in(a,b)$$
where $c$ is the choice variable and $\theta\geq 0$ parameterizes the objective function $F:(a,b)\times[0,\infty)\to\mathbb{R}$. Assume that $F$ is partially derivable. If the partial derivative $\partial F(\theta,c)/\partial c$ is strictly increasing in $c$ – for example, $\partial^2 F(\theta,c)/\partial c^2>0$ if $F$ is twice differentiable – and if condition (25.3) holds, then by Propositions 1195 and 1196 the first-order condition
$$g(c,\theta)=\frac{\partial F(\theta,c)}{\partial c}=0$$
implicitly defines a unique function $f:[0,\infty)\to(a,b)$ such that
$$\frac{\partial F(\theta,f(\theta))}{\partial c}=0 \qquad \forall\theta\geq 0$$
By Proposition 1199, the function $f$ is:

(i) continuous if $\partial F/\partial c$ is continuous;

(ii) strictly decreasing if $\partial F/\partial c$ is strictly decreasing in $\theta$;

(iii) (strictly) convex if $\partial F/\partial c$ is (strictly) quasi-concave.

In the special case of the producer's problem, market prices $p$ are the parameters and production levels $y$ are the choice variables. So, $F(p,y)=py-c(y)$ is the profit function and
$$g(p,y)=\frac{\partial F(p,y)}{\partial y}=p-c'(y)$$
The strict monotonicity of $g$ in $y$ is equivalent to the strict monotonicity of the derivative function $c'$ (and to the strict convexity or concavity of $c$). In particular, in the standard case when $c'$ is strictly increasing (so, $c$ is strictly convex), the function $g$ is concave, which implies that the supply function $y=f(p)$ is convex. In such a case, since $g$ is strictly increasing in $p$, the supply function is strictly increasing in $p$.
Chapter 26

Inverse functions

26.1 Equations

A general form of an equation is
$$f(x)=y_0 \qquad (26.1)$$
where $f$ is an operator $f:A\subseteq\mathbb{R}^n\to\mathbb{R}^n$ and $y_0$ is a given element of $\mathbb{R}^n$.$^{1}$ The variable $x$ is the unknown of the equation and $y_0$ is the known term. The solutions of the equation are all $x\in A$ such that $f(x)=y_0$.
A basic taxonomy: equation (26.1) is

(i) linear if the operator f is linear and nonlinear otherwise;


(ii) homogeneous if y0 = 0 and nonhomogeneous otherwise.

Earlier in the book we studied the special cases of homogeneous equations (Section 12.8)
and linear equations (Section 13.7).
Three main questions can be asked about the solutions of equation (26.1):

(i) can the equation be solved globally: for every $y_0\in\mathbb{R}^n$, is there $x\in A$ that satisfies (26.1)? If so, is the solution unique?

(ii) can the equation be solved locally: given a $y_0\in\mathbb{R}^n$, is there $x\in A$ that satisfies (26.1)? If so, is the solution unique?

(iii) if the solution is globally unique, does it change continuously as the known term changes?

The set of all solutions of equation (26.1) is given by the counter-image
$$f^{-1}(y_0)=\{x\in A:f(x)=y_0\}$$
So, the questions can be addressed via the inverse correspondence $f^{-1}:\operatorname{Im} f\rightrightarrows\mathbb{R}^n$ defined by$^{2}$
$$f^{-1}(y)=\{x\in A:f(x)=y\} \qquad \forall y\in\operatorname{Im} f$$
$^{1}$We write $y_0$ in place of $y$ to emphasize that $y_0$ should be regarded as a fixed element of $\mathbb{R}^n$ and not as a variable.
$^{2}$Correspondences will be studied later in the book, in Chapter 32.


We say that $f$ is weakly invertible at $y\in\mathbb{R}^n$ if $f^{-1}(y)$ is non-empty, that is, if $y\in\operatorname{Im} f$. If, in addition, $f^{-1}(y)$ is a singleton, we say that $f$ is invertible at $y$. If $f$ is weakly invertible (resp., invertible) at all $y\in\mathbb{R}^n$, we say that $f$ is globally weakly invertible (resp., invertible). In particular, a function $f$ is globally invertible if and only if it is bijective – i.e., $f^{-1}(y)$ is a singleton for all $y\in\mathbb{R}^n$ – and $\operatorname{Im} f=\mathbb{R}^n$.$^{3}$ In this case, we have an inverse function $f^{-1}:\mathbb{R}^n\to\mathbb{R}^n$.
Using this terminology, the above questions can be rephrased in more precise terms as
follows:

(i) is f globally weakly invertible? if so, is it invertible?


(ii) is f weakly invertible at y0 2 Rn , i.e., does y0 belong to Im f ? if so, is it invertible at
y0 ?
(iii) if f is globally invertible, is its inverse f 1 continuous (or di¤erentiable)?

The global question (i) is clearly much more demanding than the local one (ii). In particular, the existence and uniqueness of solutions at each $y_0\in\mathbb{R}^n$ amounts to the existence of the inverse function $f^{-1}:\mathbb{R}^n\to\mathbb{R}^n$, which then describes how solutions vary as the known term varies. Finally, question (iii) is about the "robustness" of the unique solutions: whether they change abruptly, discontinuously, under small changes of the known term. If they did, the equation would have an unpleasant instability, in that small changes in the known term would determine significant changes in its solutions.

Example 1224 Consider $f(x)=x^2-1$ and the equation $f(x)=y$, with $y\in\mathbb{R}$. If $y=0$, the equation becomes
$$x^2-1=0$$
which has the two solutions $x=\pm 1$. If $y=-1$, the equation becomes
$$x^2=0$$
which has the unique solution $x=0$. Finally, if $y=-2$ the equation becomes
$$x^2=-1$$
which has no (real) solutions. In sum, the equation $f(x)=y$ can only be studied locally: as $y$ varies, solutions may exist or not, and may be unique or not. For instance, $f^{-1}(0)=\{-1,1\}$, $f^{-1}(-1)=\{0\}$, and $f^{-1}(-2)=\emptyset$. Since $\operatorname{Im} f=[-1,+\infty)$, the inverse correspondence $f^{-1}:[-1,+\infty)\rightrightarrows\mathbb{R}$,
$$f^{-1}(y)=\begin{cases} \left\{-\sqrt{y+1},\sqrt{y+1}\right\} & \text{if } y\geq -1 \\ \emptyset & \text{if } y<-1 \end{cases}$$
describes the solutions as $y$ varies. N

Ideally, solutions should be unique globally and vary continuously with respect to the known term. Formally, this means that $f$ is globally invertible and its inverse $f^{-1}:\mathbb{R}^n\to\mathbb{R}^n$ is continuous (or, even better, differentiable). In this case, we say that the problem of solving the equation is well posed.
$^{3}$Recall that a function is invertible if it is injective (Section 6.4.1). So, global invertibility is a much stronger notion that requires the function to be a bijection of $\mathbb{R}^n$ onto $\mathbb{R}^n$.

Example 1225 This ideal case may occur for a linear equation $Ax=b$. Indeed, the linear operator $T:\mathbb{R}^n\to\mathbb{R}^n$ defined by $T(x)=Ax$ is globally invertible if and only if the matrix $A$ is invertible, that is, if and only if $\det A\neq 0$ (Cramer's Theorem). Condition $\det A\neq 0$ thus ensures that, for each $b\in\mathbb{R}^n$, there is a unique solution $x\in\mathbb{R}^n$, given by $T^{-1}(b)=A^{-1}b$. The inverse $T^{-1}:\mathbb{R}^n\to\mathbb{R}^n$ is a continuous function that describes how solutions vary as $b$ varies. N
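A minimal numerical sketch of this well-posed case, with an invertible matrix of our own choosing (Python with numpy):

```python
# Sketch: the well-posed linear equation Ax = b of Example 1225.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                   # det A = 5 != 0, so T(x) = Ax is invertible
b = np.array([1.0, 2.0])

x = np.linalg.solve(A, b)                    # the unique solution T^{-1}(b)
print(x, np.allclose(A @ x, b))              # and it varies continuously with b
```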

O.R. Every equation $f(x)=y_0$ can be put in homogeneous form $f_{y_0}(x)=0$ via the auxiliary function $f_{y_0}(x)=f(x)-y_0$. If we are interested in addressing question (ii), that is, what happens at a given $y_0$, it is then without loss of generality to consider homogeneous equations (as we did, for example, in Section 12.8). However, for the global questions (i) and (iii) it is important to keep track of the known term by studying the general form $f(x)=y_0$. H

26.2 Local analysis

Theorem 1226 (Inverse Function Theorem) Let $f:U\to\mathbb{R}^n$ be a $k\geq 1$ times continuously differentiable operator defined (at least) on an open set $U$ of $\mathbb{R}^n$. If
$$\det Df(x_0)\neq 0 \qquad (26.2)$$
at $x_0\in U$, then there exist neighborhoods $B(x_0)$ and $V(y_0)$ such that the restriction $f:B(x_0)\to V(y_0)$ is a bijective operator, with a $k$ times continuously differentiable inverse operator $f^{-1}:V(y_0)\to B(x_0)$ such that
$$Df^{-1}(y)=(Df(x))^{-1} \qquad \forall x\in B(x_0) \qquad (26.3)$$
where $y=f(x)$.

The Inverse Function Theorem thus provides conditions that ensure the local invertibility of a function. This important theorem is a simple consequence of the Implicit Function Theorem.$^{4}$

Proof Assume, for simplicity, that $\operatorname{Im} f$ is an open set, so that the set $U\times\operatorname{Im} f$ is open in $\mathbb{R}^{2n}$. Define $g:\mathbb{R}^{2n}\to\mathbb{R}^n$ by
$$g(x,y)=f(x)-y \qquad (26.4)$$
Given $(x_0,y_0)\in g^{-1}(0)$, by (26.2) we have
$$\det D_x g(x_0,y_0)=\det Df(x_0)\neq 0$$
The operator version of the Implicit Function Theorem (Theorem 1215), in "exchanged" form, then ensures the existence of neighborhoods $B(y_0)$ and $V(x_0)$ and of a unique function $\varphi:B(y_0)\to V(x_0)$ such that
$$g(\varphi(y),y)=0 \qquad \forall y\in B(y_0)$$
$^{4}$Also the converse is true, so one can first prove either theorem and get the other as a simple consequence (cf. Theorem 1239).

that is, by recalling (26.4),


f (' (y)) = y 8y 2 B (y0 )
The function ' is, therefore, the inverse of f on the neighborhood B (y0 ). The Implicit
Function Theorem thus implies the existence –locally, around the point y0 –of the inverse
of f . In particular, formula (25.29) here becomes
1 1 1
Df (y) = (Dx g (x; y)) Dy g (x; y) = (Df (x)) 8x 2 B (x0 )
where y = f (x).

For $n=1$, formula (26.3) has as a special case the basic formula (20.20) on the derivative of the inverse of a scalar function, i.e., $(f^{-1})'(y_0)=1/f'(x_0)$. So, the Inverse Function Theorem vastly generalizes this basic finding. More importantly, this classic result provides an answer to the local question (ii). Indeed, suppose that – by skill or luck – we have been able to find a solution $x_0$ of the equation $f(x)=y_0$. Based on this knowledge, under a differential condition at $x_0$ the Inverse Function Theorem ensures, first, that $x_0$ is the unique solution and, second, that for all known terms $y$ that belong to a neighborhood $V(y_0)$ of the known term $y_0$, the corresponding equations $f(x)=y$ have unique solutions as well, all lying in the neighborhood $B(x_0)=f^{-1}(V(y_0))$.

Recall that the Jacobian matrix is the matrix associated with the differential operator $df(x_0):\mathbb{R}^n\to\mathbb{R}^n$ (Theorem 974), i.e.,
$$df(x_0)(h)=Df(x_0)h \qquad \forall h\in\mathbb{R}^n$$
Condition (26.2) amounts to requiring that the Jacobian matrix be invertible, so that the differential operator is invertible. Its inverse operator $d^{-1}f(x_0):\mathbb{R}^n\to\mathbb{R}^n$ is then given by
$$d^{-1}f(x_0)(h)=(Df(x_0))^{-1}h \qquad \forall h\in\mathbb{R}^n$$
The Inverse Function Theorem shows that the invertibility of the differential at $x_0$, ensured by condition (26.2), is inherited locally at $x_0$ by the function $f$ itself. By formula (26.3), we also have
$$df^{-1}(y_0)(h)=Df^{-1}(y_0)h=(Df(x_0))^{-1}h=d^{-1}f(x_0)(h) \qquad \forall h\in\mathbb{R}^n$$
So, the differential of the inverse coincides with the inverse of the differential. Formula (26.3) thus ensures the mutual consistency of the linear approximations at $x_0$ of the function $f$ and of its inverse $f^{-1}$, a further dividend of the Inverse Function Theorem.

The Inverse Function Theorem may fail if we remove either of its hypotheses – i.e., condition (26.2) and (at least) continuous differentiability. A non-trivial example, which we omit, can be given to show that differentiability alone is not enough for the theorem, so continuous differentiability is needed. A simple example, which we give next, shows that condition (26.2) is needed.

Example 1227 The continuously differentiable quadratic function $f(x)=x^2$ does not satisfy condition (26.2) at the origin. On the other hand, this function is not locally invertible at the origin: there is no neighborhood of the origin on which we can restrict the quadratic function so as to make it injective. N

26.3 Global analysis


We can address the global questions (i) and (iii) via a global version of the Inverse Function
Theorem. To this end, we need some preliminary notions.

26.3.1 Preamble: preimages of continuous functions


Continuous operators have an important characterization in terms of preimages. For simplicity, we consider the case when their domains are the whole space.

Proposition 1228 An operator $f:\mathbb{R}^n\to\mathbb{R}^m$ is continuous if and only if the preimage $f^{-1}(C)$ of each closed set $C$ of $\mathbb{R}^m$ is itself a closed set of $\mathbb{R}^n$.

For instance, level sets $f^{-1}(y)=\{x\in\mathbb{R}^n:f(x)=y\}$ of continuous functions are closed sets since singletons $\{y\}$ are closed sets in $\mathbb{R}^m$.

The proof of the proposition relies on some basic set-theoretic properties of images and preimages, whose proof is left to the reader.

Lemma 1229 Let $f:X\to Y$ be a function between any two sets $X$ and $Y$. We have:

(i) $f(f^{-1}(E))\subseteq E$ for each $E\subseteq Y$;

(ii) $f^{-1}(E^c)=(f^{-1}(E))^c$ for each $E\subseteq Y$.

In view of (ii), there is a dual version of the last proposition for open sets: an operator is continuous if and only if the preimage of each open set is open.

Proof of Proposition 1228 "Only if". Suppose that $f$ is continuous. Let $C$ be a closed set of $\mathbb{R}^m$. Let $\{x_n\}\subseteq f^{-1}(C)$ be such that $x_n\to x_0\in\mathbb{R}^n$. We want to show that $x_0\in f^{-1}(C)$. Since $f$ is continuous, we have $f(x_n)\to f(x_0)$. As $f(x_n)\in C$ for each $n$ and $C$ is closed, $f(x_0)\in C$. In turn, this implies $x_0\in f^{-1}(C)$, as desired.

"If". Suppose that, for each closed set $C$ of $\mathbb{R}^m$, the set $f^{-1}(C)$ is closed in $\mathbb{R}^n$. So, for each open set $V$ of $\mathbb{R}^m$, the set $f^{-1}(V)$ is open in $\mathbb{R}^n$ because $(f^{-1}(V))^c=f^{-1}(V^c)$. Let $x_0\in\mathbb{R}^n$ and let $V$ be any open set containing $f(x_0)$. Since $x_0\in f^{-1}(V)$, there exists a neighborhood $B(x_0)$ such that $B(x_0)\subseteq f^{-1}(V)$. So, $f(B(x_0))\subseteq f(f^{-1}(V))\subseteq V$. We conclude that $f$ is continuous at $x_0$.

There is no counterpart of the last proposition for images: given a continuous function,
in general the image of an open set is not open and the image of a closed set is not closed.

Example 1230 (i) Let $f:\mathbb{R}\to\mathbb{R}$ be the quadratic function $f(x)=x^2$. For the open interval $I=(-1,1)$ we have $f(I)=[0,1)$, which is not open. (ii) Let $f:\mathbb{R}\to\mathbb{R}$ be the exponential function $f(x)=e^x$. The real line $\mathbb{R}$ is a closed set (also open, but here this is not of interest), with $f(\mathbb{R})=(0,\infty)$, which is not closed. N

In view of Lemma 801, it is not surprising that the closed set considered in this example, i.e. $\mathbb{R}$, is unbounded, hence not compact.

26.3.2 Proper functions

Definition 1231 An operator $f:\mathbb{R}^n\to\mathbb{R}^m$ is said to be proper if, for every sequence $\{x_n\}\subseteq\mathbb{R}^n$,
$$\|x_n\|\to+\infty \implies \|f(x_n)\|\to+\infty$$

Properness requires the norms of the images under $f$ to diverge to $+\infty$ along any possible unbounded sequence $\{x_n\}\subseteq\mathbb{R}^n$ – i.e., any sequence such that $\|x_n\|\to+\infty$. In words, the function cannot indefinitely take values of bounded norm on a sequence that "dashes off" to infinity.

Example 1232 If $m=1$, supercoercive functions are proper. Indeed, for them we have
$$\|x_n\|\to+\infty \implies f(x_n)\to-\infty \implies |f(x_n)|\to+\infty$$
The converse is obviously false: the cubic function $f(x)=x^3$ is proper but not supercoercive. N

By now, the next characterization of proper functions should not be that surprising.

Proposition 1233 An operator $f:\mathbb{R}^n\to\mathbb{R}^m$ is proper if and only if the preimages of bounded sets are, in turn, bounded sets.

Proof "Only if". Suppose that $f$ is proper. Let $B$ be a bounded set of $\mathbb{R}^m$. Suppose, by contradiction, that the preimage $f^{-1}(B)$ is not bounded. Then, there is a sequence $\{x_n\}\subseteq f^{-1}(B)$ such that $\|x_n\|\to+\infty$ (Proposition 871). That is, $\{x_n\}\subseteq\mathbb{R}^n$ is such that $\|x_n\|\to+\infty$ and $f(x_n)\in B$ for each $n$. But $\|x_n\|\to+\infty$ implies $\|f(x_n)\|\to+\infty$ because $f$ is proper. This contradicts the boundedness of $B$. We conclude that $f^{-1}(B)$ is bounded.

"If". Suppose that $f$ is such that the preimages of bounded sets of $\mathbb{R}^m$ are bounded sets of $\mathbb{R}^n$. Let $\{x_n\}\subseteq\mathbb{R}^n$ be such that $\|x_n\|\to+\infty$. Suppose, by contradiction, that there are $K>0$ and a subsequence $\{x_{n_k}\}$ such that $\|f(x_{n_k})\|\leq K$ for all $k$. Then, the preimage $f^{-1}(B)$ of the bounded set $B=\{y\in\mathbb{R}^m:\|y\|\leq K\}$ contains the unbounded sequence $\{x_{n_k}\}$, a contradiction. We conclude that $f$ is proper.

In view of Proposition 1228, we have the following simple, yet interesting, corollary.

Corollary 1234 A continuous operator f : Rn ! Rm is proper if and only if the preimages


of compact sets are, in turn, compact sets.

The next result presents an important class of proper operators.

Proposition 1235 Invertible linear operators f : Rn ! Rn are proper.

Proof Invertible linear operators are globally invertible, and the inverse $f^{-1}:\mathbb{R}^n\to\mathbb{R}^n$ is a linear operator (see Chapter 13). By Lemma 730, there exists a constant $k>0$ such that $\|f^{-1}(x)\|\leq k\|x\|$ for every $x\in\mathbb{R}^n$. Let $\{x_n\}\subseteq\mathbb{R}^n$ be such that $\|x_n\|\to+\infty$. Then, $\|x_n\|=\|f^{-1}(f(x_n))\|\leq k\|f(x_n)\|$, so $\|f(x_n)\|\to+\infty$. We conclude that $f$ is proper.

26.3.3 Global Inverse Function Theorem

Proper functions are key for the next remarkable theorem, a far-reaching generalization of Cramer's Theorem.$^{5}$

Theorem 1236 (Caccioppoli-Hadamard) A continuously differentiable operator $f:\mathbb{R}^n\to\mathbb{R}^n$ is bijective, with differentiable inverse $f^{-1}:\mathbb{R}^n\to\mathbb{R}^n$, if and only if it is proper and
$$\det Df(x)\neq 0 \qquad \forall x\in\mathbb{R}^n \qquad (26.5)$$

In view of Proposition 1235, Cramer's Theorem is a special case of the Caccioppoli-Hadamard Theorem because for linear operators $f(x)=Ax$ we have $Df(x)=A$, so condition (26.5) holds when $\det A\neq 0$.

Thus, the problem of solving an equation $f(x)=y_0$ featuring a proper function $f$ that satisfies condition (26.5) is well posed: for every possible known term $y_0\in\mathbb{R}^n$, there exists a unique solution, given by $x=f^{-1}(y_0)$. Since $f^{-1}$ is differentiable, solutions do not change abruptly. At a theoretical level, questions (i) and (iii) are fully answered in this case. The computational implementation, of course, might be nontrivial.

Proof Let $f:\mathbb{R}^n\to\mathbb{R}^n$ be continuously differentiable. We prove the "only if" part, the converse being much more complicated. So, suppose that $f$ is bijective, with differentiable inverse $f^{-1}:\mathbb{R}^n\to\mathbb{R}^n$. Since $f^{-1}$ is continuous, by Lemma 801 the image $f^{-1}(K)$ of each compact set $K$ of $\mathbb{R}^n$ is compact. Since $f$ is continuous, by Corollary 1234 this implies that $f$ is proper. Moreover, since $f^{-1}(f(x))=x$ for all $x\in\mathbb{R}^n$, by the chain rule formula (21.39) we have $Df^{-1}(f(x))\,Df(x)=I$, so $\det\left(Df^{-1}(f(x))\,Df(x)\right)=1$. By Binet's Theorem, $\det Df(x)\neq 0$.

Without the hypothesis that f is proper, the “if” can fail, as the next classic example
shows.

Example 1237 Consider the continuously differentiable operator $f:\mathbb{R}^2\to\mathbb{R}^2$ defined by $f(x_1,x_2)=(e^{x_1}\cos x_2,e^{x_1}\sin x_2)$. Its Jacobian matrix is
$$Df(x)=\begin{bmatrix} \frac{\partial f_1(x_1,x_2)}{\partial x_1} & \frac{\partial f_1(x_1,x_2)}{\partial x_2} \\ \frac{\partial f_2(x_1,x_2)}{\partial x_1} & \frac{\partial f_2(x_1,x_2)}{\partial x_2} \end{bmatrix}=\begin{bmatrix} e^{x_1}\cos x_2 & -e^{x_1}\sin x_2 \\ e^{x_1}\sin x_2 & e^{x_1}\cos x_2 \end{bmatrix}$$
Thus, $\det Df(x)=e^{2x_1}\cos^2 x_2+e^{2x_1}\sin^2 x_2=e^{2x_1}>0$ for all $x\in\mathbb{R}^2$, so condition (26.5) holds. However, this function is not proper. Indeed, if we take $x_n=(0,n)$, then $\|x_n\|=n$ but $\|f(x_n)\|=\|(\cos n,\sin n)\|=\sqrt{\cos^2 n+\sin^2 n}=1$, so $\|x_n\|\to+\infty$ does not imply $\|f(x_n)\|\to+\infty$.

This function is neither injective nor surjective. To see that it is not surjective, note that there is no $x\in\mathbb{R}^2$ such that $f(x)=0$. Indeed, if $f(x)=0$ then $e^{x_1}\cos x_2=0$, so $\cos x_2=0$. In turn, this implies $\sin x_2=\pm 1$, which contradicts $f(x)=0$. As to injectivity, for example we have $f(0,0)=f(0,2\pi)=(1,0)$.

In sum, by the Inverse Function Theorem $f$ is locally invertible at each $x\in\mathbb{R}^2$, but we just showed that it is not globally invertible on $\mathbb{R}^2$. Thus, a function locally invertible at each point of its domain might not be globally invertible. N
$^{5}$A first version of this theorem was proved by Jacques Hadamard in 1906 and then substantially generalized by Renato Caccioppoli in 1932.
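A quick numerical illustration of the failure of properness and injectivity in Example 1237 (Python with numpy, our own check):

```python
# Sketch: Example 1237, f(x1, x2) = (exp(x1)*cos(x2), exp(x1)*sin(x2)).
import numpy as np

def f(x1, x2):
    return np.array([np.exp(x1)*np.cos(x2), np.exp(x1)*np.sin(x2)])

for n in (10, 100, 1000):                        # ||x_n|| -> +infinity along (0, n) ...
    print(n, np.linalg.norm(f(0.0, float(n))))   # ... yet ||f(x_n)|| stays equal to 1

print(f(0.0, 0.0), f(0.0, 2*np.pi))              # equal values: f is not injective
```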

As a consequence of the Caccioppoli-Hadamard Theorem, we have the following global version of the Inverse Function Theorem.

Theorem 1238 (Global Inverse Function Theorem) Let $f:\mathbb{R}^n\to\mathbb{R}^n$ be a proper, continuously differentiable operator. If
$$\det Df(x)\neq 0 \qquad \forall x\in\mathbb{R}^n$$
then $f$ is bijective, with differentiable inverse $f^{-1}:\mathbb{R}^n\to\mathbb{R}^n$ such that
$$Df^{-1}(y)=(Df(x))^{-1} \qquad \forall x\in\mathbb{R}^n \qquad (26.6)$$
where $y=f(x)$.

Proof By the Caccioppoli-Hadamard Theorem, $f^{-1}:\mathbb{R}^n\to\mathbb{R}^n$ exists and is differentiable. Since $f^{-1}(f(x))=x$ for all $x\in\mathbb{R}^n$, by the chain rule formula (cf. the last proof) we have $Df^{-1}(f(x))\,Df(x)=I$ for all $x\in\mathbb{R}^n$, so $\det\left(Df^{-1}(f(x))\,Df(x)\right)=1$ for all $x\in\mathbb{R}^n$. By Binet's Theorem, $\det Df^{-1}(f(x))\neq 0$ for all $x\in\mathbb{R}^n$, so the matrix $Df^{-1}(f(x))$ is invertible for all $x\in\mathbb{R}^n$. From $Df^{-1}(f(x))\,Df(x)=I$, (26.6) then follows.

26.3.4 Global Implicit Function Theorem

The Global Inverse Function Theorem implies a global version of the Implicit Function Theorem, which we state and prove next. Besides its own interest, it shows how an inverse function theorem implies an implicit function one.

Theorem 1239 (Global Implicit Function Theorem) Let $g:\mathbb{R}^{n+m}\to\mathbb{R}^m$ be a proper, continuously differentiable operator, with
$$\det D_y g(x,y)\neq 0 \qquad \forall(x,y)\in\mathbb{R}^n\times\mathbb{R}^m \qquad (26.7)$$
Then, there exists a unique operator $f:\mathbb{R}^n\to\mathbb{R}^m$ such that
$$g(x,f(x))=0 \qquad \forall x\in\mathbb{R}^n$$
The operator $f$ is differentiable, with
$$Df(x)=-(D_y g(x,y))^{-1}D_x g(x,y) \qquad \forall x\in\mathbb{R}^n \qquad (26.8)$$
where $y=f(x)$, i.e., $g(x,y)=0$.

Proof Define the continuously differentiable operator $F:\mathbb{R}^{n+m}\to\mathbb{R}^{n+m}$ by $F(x,y)=(x,g(x,y))$, i.e.,
$$F(x_1,...,x_n,y_1,...,y_m)=(x_1,...,x_n,g(x_1,...,x_n,y_1,...,y_m))$$
Since $g$ is proper, so is $F$. Indeed, if $\|(x,y)\|\to+\infty$, then $\|g(x,y)\|\to+\infty$, so $\|F(x,y)\|\to+\infty$ because $\|g(x,y)\|\leq\|F(x,y)\|$.

Since
$$F_i(x,y)=x_i \quad \forall i=1,...,n \qquad\text{and}\qquad F_{n+j}(x,y)=g_j(x,y) \quad \forall j=1,...,m$$
we have
$$DF(x,y)=\begin{bmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \\ \frac{\partial g_1(x,y)}{\partial x_1} & \cdots & \frac{\partial g_1(x,y)}{\partial x_n} & \frac{\partial g_1(x,y)}{\partial y_1} & \cdots & \frac{\partial g_1(x,y)}{\partial y_m} \\ \vdots & & \vdots & \vdots & & \vdots \\ \frac{\partial g_m(x,y)}{\partial x_1} & \cdots & \frac{\partial g_m(x,y)}{\partial x_n} & \frac{\partial g_m(x,y)}{\partial y_1} & \cdots & \frac{\partial g_m(x,y)}{\partial y_m} \end{bmatrix}$$
So,
$$\det DF(x,y)=\det\begin{bmatrix} \frac{\partial g_1(x,y)}{\partial y_1} & \cdots & \frac{\partial g_1(x,y)}{\partial y_m} \\ \vdots & & \vdots \\ \frac{\partial g_m(x,y)}{\partial y_1} & \cdots & \frac{\partial g_m(x,y)}{\partial y_m} \end{bmatrix}=\det D_y g(x,y)$$
By (26.7), we thus have $\det DF(x,y)\neq 0$ for all $(x,y)\in\mathbb{R}^{n+m}$.
By Caccioppoli-Hadamard's Theorem, F is globally invertible with differentiable inverse F⁻¹ : Rⁿ⁺ᵐ → Rⁿ⁺ᵐ. Fix x ∈ Rⁿ. Since there is y ∈ Rᵐ such that F⁻¹(x, 0) = (x, y), we have g(x, y) = 0. We claim that such y ∈ Rᵐ is unique. Indeed, let y, y′ ∈ Rᵐ be such that g(x, y) = g(x, y′) = 0. Then,

    F(x, y) = (x, g(x, y)) = (x, 0) = (x, g(x, y′)) = F(x, y′)

Since F is bijective, it then follows that y = y′, as desired. So, let f : Rⁿ → Rᵐ be the operator that associates to each x ∈ Rⁿ the unique y ∈ Rᵐ such that g(x, y) = 0. By definition, g(x, f(x)) = 0 for all x ∈ Rⁿ and f is the unique such operator. Moreover, from

    F(x, f(x)) = (x, 0)    ∀x ∈ Rⁿ

it follows that

    F⁻¹(x, 0) = (x, f(x))    ∀x ∈ Rⁿ

Since F⁻¹ is differentiable, it can be proved that this implies that f is differentiable. Since

    g(x, f(x)) = 0    ∀x ∈ Rⁿ

by the chain rule we have

    D_x g(x, f(x)) = −D_y g(x, f(x)) Df(x)    ∀x ∈ Rⁿ

So, formula (26.8) holds because condition (26.7) ensures that the matrix D_y g(x, f(x)) is invertible at all x ∈ Rⁿ.
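Formula (26.8) lends itself to a quick numerical check. The sketch below (ours, not the book's) takes the illustrative choice g(x, y) = x + y + y³ with n = m = 1, for which D_y g = 1 + 3y² > 0 everywhere, so condition (26.7) holds; it compares the slope given by (26.8) with a finite-difference estimate of the implicit function.

```python
# Numerical check of formula (26.8) for the illustrative choice
# g(x, y) = x + y + y^3 (n = m = 1), where D_y g = 1 + 3y^2 > 0.
import numpy as np
from scipy.optimize import brentq

def g(x, y):
    return x + y + y**3

def f(x):
    # the implicit function: the unique y such that g(x, y) = 0
    return brentq(lambda y: g(x, y), -10.0, 10.0)

x0 = 1.5
y0 = f(x0)
slope_formula = -1.0 / (1.0 + 3.0 * y0**2)         # (26.8): -(D_y g)^{-1} D_x g
h = 1e-6
slope_numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)  # finite differences
print(slope_formula, slope_numeric)                # the two values agree closely
```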
26.4 Parametric equations

In applications, equations often have the parametric form

    f(x, θ) = y₀

where f is an operator f : A × Θ ⊆ Rⁿ × Rᵐ → Rⁿ and y₀ is a given element of Rⁿ. The variable θ parameterizes the equation. Given a value of the parameter θ, we are interested in the variables x ∈ Rⁿ that solve the equation under the known term y₀.
Given a value of the parameter, we can ask the same three questions that we posed in Section 26.1. In this parametric setting, however, we can take a different perspective: once we posit a known term (often normalized to 0), do solutions exist for some or all values of the parameter? Are they unique? How do they vary when the value of the parameter varies?
To formalize these questions, define the (equation) solution correspondence S_{y₀} : Θ ⇉ Rⁿ by

    S_{y₀}(θ) = {x ∈ A : f(x, θ) = y₀}

In words, S_{y₀} associates to each parameter value the corresponding solution set of the equation f(x, θ) = y₀. The solution correspondence describes how solutions vary as the parameter varies. Given y₀ ∈ Rⁿ, the previous questions then become:

(i) is the set S_{y₀}(θ) non-empty for some θ ∈ Θ or for all θ ∈ Θ? If so, is it a singleton?

(ii) if S_{y₀} is a function (locally or globally), is it continuous (or differentiable)?

We have

    f(S_{y₀}(θ), θ) = y₀

So, a positive answer to question (i) would amount to saying that S_{y₀} is a function implicitly defined, locally or globally, by the equation f(x, θ) = y₀; that is, S_{y₀} would give the functional representation of the level curve f⁻¹(y₀) = {(x, θ) ∈ A × Θ : f(x, θ) = y₀}. Thus, the study of the solutions of a parametric equation given a known term and the study of the functional representations of a level curve are, mathematically, equivalent exercises.
To answer questions (i) and (ii) we then need to invoke suitable versions of the Implicit Function Theorem: local versions of that theorem give local answers, global versions give global answers. In any case, a déjà vu: in our discussions of implicit functions we already (implicitly) took this angle, which in economics is at the heart of comparative statics analysis (cf. Section 25.4.3). Indeed, conditions that ensure the existence, at least locally, of a solution function S_{y₀} : Θ → Rⁿ make it possible to describe effectively how solutions, the endogenous variables, react to changes in the parameters, the exogenous variables. For brevity, we leave readers to revisit those discussions through the lenses of this section; a small numerical sketch follows.
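The following sketch (an illustration of ours, with the hypothetical parametric equation f(x, θ) = x³ + θx and known term y₀ = 1) tabulates the solution correspondence over a grid of parameter values; since ∂f/∂x = 3x² + θ > 0 for θ > 0, each solution set is a singleton.

```python
# Tabulating the solution correspondence S_{y0}(theta) for the
# illustrative parametric equation x^3 + theta*x = y0, with y0 = 1.
from scipy.optimize import brentq

y0 = 1.0

def f(x, theta):
    return x**3 + theta * x

for theta in [0.5, 1.0, 2.0, 4.0]:
    x_sol = brentq(lambda x: f(x, theta) - y0, -10.0, 10.0)
    print(f"theta = {theta:3.1f}  ->  S_y0(theta) = {{{x_sol:.6f}}}")
```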
26.5 Coda: direct and inverse problems

In a scientific inquiry, be it in the natural or social sciences, we posit a set X of possible causes (or inputs), a set Y of possible effects (or outputs), and a set M of possible models m : X → Y. A cause x determines an effect y = m(x) via model m; this scheme can be diagrammed as

    x → m → y
We can consider four main problems about a scientific inquiry described by a triple (X, Y, M). We formalize them by means of the evaluation function g : X × M → Y defined by g(x, m) = m(x), which relates causes, effects and models through the expression

    y = g(x, m)    (26.9)

The four problems are:

(i) Direct problems: given a model m and a cause x, what is the resulting effect y? Formally, which is the (unique) value y = g(x, m) given x ∈ X and m ∈ M?

(ii) Causation problems: given a model m and an effect y, what is the underlying cause x? Formally, which are the (possibly multiple) values of x that solve equation (26.9) given y ∈ Y and m ∈ M?

(iii) Identification problems: given a cause x and an effect y, what is the underlying model m? Formally, which are the (possibly multiple) values of m ∈ M that solve equation (26.9) given x ∈ X and y ∈ Y?

(iv) Induction problems: given an effect y, what are the underlying cause x and model m? Formally, which are the (possibly multiple) pairs of x ∈ X and m ∈ M that solve equation (26.9) given y ∈ Y?
The latter three problems (causation, identification and induction) are formalized by regarding (26.9) as an equation. For this reason, we call them inverse problems.⁶ We can thus view the study of equations as a way to address such problems. In this regard, note that:
1. In causation and identification problems, equation (26.9) is parametric. In the former problem, x is the unknown, y is the known term and m is a parameter; in the latter problem, m is the unknown, y is the known term and x is a parameter.

2. In an induction problem, y is the known term of equation (26.9), while x and m are the unknowns.
Example 1240 Consider an orchard with several apple trees that produce a quantity of apples according to the summer weather conditions; in particular, the summer could be either cold, hot or mild. Here m is an apple tree that belongs to the collection M of the apple trees of the orchard, y is the apple harvest with Y = [0, ∞), and x is the average summer temperature with X = [0, ∞). We interpret m(x) as the quantity of apples that the tree m produces when the summer weather is x. The trees in the orchard thus differ in their performance under the different weather conditions.
In this example the previous four problems take the form:

(i) Given a tree m and an average summer temperature x, what is the resulting apple harvest y?
6
In this chapter we considered the case X, Y ⊆ Rⁿ, but the study of equations can be carried out more generally, as readers will learn in more advanced courses.
(ii) Given a tree m and an apple harvest y, what is the underlying average summer temperature x?

(iii) Given an average summer temperature x and an apple harvest y, what is the underlying tree m?

(iv) Given an apple harvest y, what are the underlying average summer temperature x and tree m? N
Chapter 27

Study of functions

It is often useful to have, roughly, a sense of what a function looks like. In this chapter we will outline a qualitative study of functions. To this end, we first introduce a couple of classes of points.
27.1 Inflection points

We begin with a local notion of concavity.

Definition 1241 Let f : A ⊆ R → R and x₀ an accumulation point of A. The function f is said to be (strictly) concave at x₀ if there exists a neighborhood of x₀ on which it is (strictly) concave.
A dual definition holds for (strict) convexity at a point. The next result follows immediately from Corollary 1101.

Proposition 1242 Let f : A ⊆ R → R be twice differentiable at x₀ ∈ A. If f is concave at x₀, then f″(x₀) ≤ 0 (with the derivative understood as one-sided when needed). If f″(x₀) < 0, then f is strictly concave at x₀.
A dual characterization holds for (strict) convexity.

Example 1243 (i) The function f : R → R given by f(x) = 2x² − 3 is strictly convex at every point because f″(x) = 4 > 0 at every x. (ii) The function f : R → R given by f(x) = x³ is strictly convex at x₀ = 5 since f″(5) = 30 > 0, and it is strictly concave at x₀ = −1 since f″(−1) = −6 < 0. N
Geometrically, as we know well, for differentiable functions concavity (convexity) means that the tangent line always lies above (below) the graph of the function. Concavity (convexity) at a point means, therefore, that the straight line tangent at that point lies locally, that is, at least on a neighborhood of the point, above (below) the graph of the function.

5 10
y y

0 f(x )
0
6

-5 4 f(x )
0

-10

O x x O x x
0 0
-15 -2
0 1 2 3 4 5 6 -1 0 1 2 3 4 5 6 7

O.R. Just as the first derivative of a function at a point gives information on its increase or decrease, so the second derivative gives information on concavity or convexity at a point. The greater |f″(x₀)|, the more pronounced the curvature (the "belly") of f at x₀; the "belly" is upward if f″(x₀) < 0 and downward if f″(x₀) > 0, as the previous figure shows. Economic applications often consider the ratio

    f″(x₀)/f′(x₀)

which does not depend on the unit of measure of f(x). Indeed, let T and S be the units of measure of the dependent and independent variables, respectively. Then, the units of measure of f′ and of f″ are T/S and T/S², so the unit of measure of f″/f′ is

    (T/S²)/(T/S) = 1/S

Note that f″(x₀)/f′(x₀) is the derivative of log f′(x₀). H
Definition 1244 Let f : A ⊆ R → R and x₀ an accumulation point of A. Then x₀ is said to be an inflection point for f if there exists a neighborhood of x₀ on which f is concave at the points to the right of x₀ and convex at the points to the left of x₀, or vice versa.
In short, at an inflection point the "sign" of the concavity of the function changes. By Proposition 1242, we have the following simple result.

Proposition 1245 Let f : A ⊆ R → R and x₀ an accumulation point of A.

(i) If x₀ is an inflection point for f, then f″(x₀) = 0 (provided f is twice differentiable at x₀).

(ii) If f″(x₀) = 0 and f‴(x₀) ≠ 0, then x₀ is an inflection point for f (provided f is three times continuously differentiable at x₀).
Example 1246 (i) The origin is an inflection point of the cubic function f(x) = x³. (ii) Let f : R → R be the Gaussian function f(x) = e^{−x²}. Then f′(x) = −2x e^{−x²} and f″(x) = (4x² − 2) e^{−x²}, so the function is concave for

    −1/√2 < x < 1/√2

and convex for |x| > 1/√2. The two points ±1/√2 are therefore inflection points. Indeed, f″(±1/√2) = 0. We will continue the study of this function later in the chapter in Example 1258. N
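Proposition 1245 can be put to work symbolically. The sketch below (ours, using sympy) finds the candidate inflection points of the Gaussian function by solving f″(x) = 0 and then applies the third-derivative test of part (ii).

```python
# Inflection points of f(x) = exp(-x^2) via Proposition 1245:
# solve f''(x) = 0, then check f'''(x) != 0 at each candidate.
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.exp(-x**2)
candidates = sp.solve(sp.Eq(sp.diff(f, x, 2), 0), x)
for c in candidates:
    third = sp.diff(f, x, 3).subs(x, c)
    print(c, "inflection point" if third != 0 else "test inconclusive")
# output: -sqrt(2)/2 and sqrt(2)/2, i.e. the points +/- 1/sqrt(2)
```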
For differentiable functions, geometrically at an inflection point x₀ the tangent line cuts the graph: it cannot lie (locally) above or below it. In particular, if f′(x₀) = f″(x₀) = 0 then the tangent line is horizontal and cuts the graph of the function: we talk of an inflection point with horizontal tangent.

Example 1247 The origin is an inflection point with horizontal tangent of the cubic function, as well as of any function f(x) = xⁿ with n odd. N
27.2 Asymptotes

Intuitively, an asymptote is a straight line to which the graph of a function gets arbitrarily close. Such straight lines can be vertical, horizontal, or oblique.

(i) When at least one of the two following conditions is satisfied:

    lim_{x→x₀⁺} f(x) = +∞ or −∞
    lim_{x→x₀⁻} f(x) = +∞ or −∞

the straight line of equation x = x₀ is called a vertical asymptote for f.

(ii) When

    lim_{x→+∞} f(x) = L    (or lim_{x→−∞} f(x) = L)

with L ∈ R, the straight line of equation y = L is called a horizontal asymptote for f at +∞ (or at −∞).

(iii) When

    lim_{x→+∞} (f(x) − ax − b) = 0    (or lim_{x→−∞} (f(x) − ax − b) = 0)

that is, when the distance between the function and the straight line y = ax + b tends to 0 as x → +∞ (or x → −∞), the straight line of equation y = ax + b is an oblique asymptote for f at +∞ (or at −∞).

Horizontal asymptotes are actually the special case of oblique asymptotes with a = 0. Moreover, it is evident that there can be at most one oblique asymptote as x → −∞ and at most one as x → +∞. It is, instead, possible that f has several vertical asymptotes.
Example 1248 Consider the function

    f(x) = 3 − 7/(x² + 1)

with graph

[Figure: graph of f(x) = 3 − 7/(x² + 1), approaching the horizontal line y = 3 at ±∞]

Since lim_{x→+∞} f(x) = lim_{x→−∞} f(x) = 3, the straight line y = 3 is both a right and a left horizontal asymptote for f(x). N
Example 1249 The function f : R \ {−1} → R defined by

    f(x) = 1/(x + 1) + 2

with graph

[Figure: hyperbola-like graph with vertical asymptote x = −1 and horizontal asymptote y = 2]

has horizontal asymptote y = 2 and vertical asymptote x = −1. N
Example 1250 Consider the function

    f(x) = 1/(x² + x − 2)

with graph

[Figure: graph of f with vertical asymptotes at x = −2 and x = 1]

Since lim_{x→1⁺} f(x) = +∞ and lim_{x→1⁻} f(x) = −∞, the straight line x = 1 is a vertical asymptote for f(x). Moreover, since lim_{x→−2⁺} f(x) = −∞ and lim_{x→−2⁻} f(x) = +∞, the straight line x = −2 is also a vertical asymptote for f(x). N
Example 1251 Consider the function

    f(x) = 2x²/(x + 1)

with graph

[Figure: graph of f with vertical asymptote x = −1 and oblique asymptote y = 2x − 2]

Since f(x) = 2x − 2 + 2/(x + 1), we have lim_{x→+∞} (f(x) − 2x + 2) = 0 and lim_{x→−∞} (f(x) − 2x + 2) = 0, so the straight line y = 2x − 2 is both a right and a left oblique asymptote for f(x). N
Vertical and horizontal asymptotes are easily identified. We thus shift our attention to oblique asymptotes. To this end, we provide two simple results.

Proposition 1252 The straight line y = ax + b is an oblique asymptote of f as x → ±∞ if and only if lim_{x→±∞} f(x)/x = a and lim_{x→±∞} [f(x) − ax] = b.
Proof "If". When f(x)/x → a, consider the difference f(x) − ax. If it tends to a finite limit b, then (and only then) f(x) − ax − b → 0. "Only if". From f(x) − ax − b → 0 it follows that f(x) − ax → b and, by dividing by x, that f(x)/x − a → 0.

The next result follows from de l'Hospital's rule.

Proposition 1253 Suppose that f is differentiable and f(x) → ±∞ as x → ±∞. Then y = ax + b is an oblique asymptote of f as x → ±∞ if lim_{x→±∞} f′(x) = a and lim_{x→±∞} [f(x) − ax] = b.

Proposition 1252 gives a necessary and sufficient condition for finding oblique asymptotes, while Proposition 1253 only provides a sufficient condition. To use this latter condition, the limits involved must exist. In this regard, consider the following example.
Example 1254 For the function f : R \ {0} → R given by

    f(x) = x + cos(x²)/x

as x → ±∞ we have

    f(x)/x = 1 + cos(x²)/x² → 1

and

    f(x) − x = cos(x²)/x → 0

Therefore, y = x is an oblique asymptote of f as x → ±∞. Nevertheless, the first derivative of f is

    f′(x) = 1 + (−2x² sin(x²) − cos(x²))/x² = 1 − 2 sin(x²) − cos(x²)/x²

It is immediate to verify that the limit of f′(x) as x → ±∞ does not exist. N
In the following examples we determine the asymptotes of some functions.

Example 1255 For the function f : R → R given by f(x) = 5x + 2e^{−x}, as x → +∞, we have

    f(x)/x = 5 + 2/(x eˣ) → 5

and

    f(x) − 5x = 2e^{−x} → 0

Therefore, y = 5x is an oblique asymptote of f as x → +∞. As x → −∞ the function has no oblique (nor horizontal) asymptotes. N
Example 1256 For the function f : [1, +∞) → R given by f(x) = √(x² − x), as x → +∞, we have

    f(x)/x = √(x² − x)/x = √(1 − 1/x) → 1

and, as x → +∞,

    f(x) − x = √(x² − x) − x = x(√(1 − 1/x) − 1) = ((1 − 1/x)^{1/2} − 1)/(1/x) → −1/2

since ((1 + t)^{1/2} − 1)/t → 1/2 as t → 0 (here t = −1/x). Therefore,

    y = x − 1/2

is an oblique asymptote as x → +∞ for f. N
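The two limits of Proposition 1252 are easily computed with a symbolic engine. A sketch of ours for the function of the last example:

```python
# Oblique asymptote of f(x) = sqrt(x^2 - x) as x -> +oo, via the
# necessary and sufficient limits of Proposition 1252.
import sympy as sp

x = sp.symbols('x', positive=True)
f = sp.sqrt(x**2 - x)
a = sp.limit(f / x, x, sp.oo)       # slope: 1
b = sp.limit(f - a * x, x, sp.oo)   # intercept: -1/2
print(f"y = {a}*x + ({b})")         # the asymptote y = x - 1/2
```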
It is quite simple to realize that:

(i) If f(x) = g(x) + h(x) and h(x) → 0 as x → ±∞, then f and g share the possible oblique asymptotes.

(ii) If pₙ(x) = a₀xⁿ + a₁xⁿ⁻¹ + ··· + aₙ is a polynomial of degree n in x with a₀ > 0 and n odd, then the function defined by f(x) = ⁿ√(pₙ(x)) has, as x → ±∞, the oblique asymptote

    y = ⁿ√a₀ (x + a₁/(n a₀))

If pₙ(x) = a₀xⁿ + a₁xⁿ⁻¹ + ··· + aₙ is a polynomial of degree n in x with a₀ > 0 and n even, then the function defined by f(x) = ⁿ√(pₙ(x)) has, as x → +∞, the oblique asymptote

    y = ⁿ√a₀ (x + a₁/(n a₀))

and, as x → −∞, the oblique asymptote

    y = −ⁿ√a₀ (x + a₁/(n a₀))
Let us verify (ii) only for n odd (for n even the calculations are analogous). If n is odd, as x → ±∞ we have

    f(x)/x = ⁿ√(a₀xⁿ + a₁xⁿ⁻¹ + ··· + aₙ)/x = ⁿ√(a₀ + a₁/x + ··· + aₙ/xⁿ) → ⁿ√a₀

hence the slope of the oblique asymptote is ⁿ√a₀. Moreover,

    f(x) − ⁿ√a₀ x = ⁿ√a₀ x [(1 + u)^{1/n} − 1] = ⁿ√a₀ x u · [(1 + u)^{1/n} − 1]/u

where u = (a₁xⁿ⁻¹ + ··· + aₙ)/(a₀xⁿ) → 0 as x → ±∞. Since, as x → ±∞,

    [(1 + u)^{1/n} − 1]/u → 1/n    and    ⁿ√a₀ x u = ⁿ√a₀ (a₁xⁿ + a₂xⁿ⁻¹ + ··· + aₙx)/(a₀xⁿ) → ⁿ√a₀ (a₁/a₀)

we have, as x → ±∞,

    f(x) − ⁿ√a₀ x → ⁿ√a₀ (a₁/a₀)(1/n)

In the previous example we had n = 2, a₀ = 1, and a₁ = −1. Indeed, as x → +∞, the asymptote had the equation

    y = √1 (x + (−1)/(2 · 1)) = x − 1/2
27.3 Study of functions

The differential calculus results obtained so far allow for a qualitative study of functions. Such a study consists in finding the possible local maximizers and minimizers, the inflection points, and the asymptotic and boundary behavior of the function.
Let us consider a function f : A ⊆ R → R defined on a set A. To apply the results of the chapter, we assume that f is twice differentiable at each interior point of A. The study of f may be articulated in a few steps.

(i) We first calculate the limits of f at the boundary points of the domain, and also as x → ±∞ when A is unbounded.

(ii) We determine the sets on which the function is positive, f(x) ≥ 0, increasing, f′(x) ≥ 0, and concave/convex, f″(x) ⋚ 0. Once we also determine the intersections of the graph with the axes, by finding the value f(0) on the vertical axis and the set f⁻¹(0) on the horizontal axis, we begin to have a first idea of its graph.

(iii) We look for candidate extremal points via first and second-order conditions (or, more generally, via the omnibus procedure of Section 23.3).

(iv) We look, via the condition f″(x) = 0, for candidate inflection points; they are certainly such if f‴ ≠ 0 at them (provided f is three times continuously differentiable at x).

(v) Finally, we look for possible oblique asymptotes of f (a computational sketch of steps (ii)-(iv) follows).
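A possible computational rendering of steps (ii)-(iv), sketched by us with sympy for the function of Example 1259 below:

```python
# Steps (ii)-(iv) of the study of f(x) = x^3 - 7x^2 + 12x (Example 1259).
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3 - 7*x**2 + 12*x
f1, f2 = sp.diff(f, x), sp.diff(f, x, 2)

criticals = sp.solve(sp.Eq(f1, 0), x)
print("zeros of f:        ", sp.solve(sp.Eq(f, 0), x))      # 0, 3, 4
print("critical points:   ", criticals)                     # (7 -+ sqrt(13))/3
print("f'' at criticals:  ", [sp.simplify(f2.subs(x, c)) for c in criticals])
print("inflection points: ", sp.solve(sp.Eq(f2, 0), x))     # 7/3
print("f increasing where:", sp.solve_univariate_inequality(f1 >= 0, x))
```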
Next we study a few functions.
Example 1257 Let f : R → R be given by f(x) = x⁶ − 3x² + 1. We look for possible local extremal points. The first-order condition f′(x) = 0 has the form

    6x⁵ − 6x = 0

therefore x = 0 and x = ±1 are the unique critical points. We have f″(0) = −6, f″(−1) = 24, and f″(1) = 24. Hence, x = 0 is a local maximizer, while x = −1 and x = 1 are local minimizers. From lim_{x→+∞} f(x) = lim_{x→−∞} f(x) = +∞ it follows that the graph of this function is:

[Figure: graph of f(x) = x⁶ − 3x² + 1, with a local maximum at x = 0 and local minima at x = ±1]

N
Example 1258 Let f : R → R be the Gaussian function f(x) = e^{−x²}. Both limits, as x → ±∞, are 0. So, the horizontal axis is a horizontal asymptote. The function is always strictly positive and f(0) = 1. Next, we look for possible local extremal points. The first-order condition f′(x) = 0 has the form −2x e^{−x²} = 0, so the origin x = 0 is the unique critical point. The second derivative is

    f″(x) = −2e^{−x²} + (−2x) e^{−x²} (−2x) = 2e^{−x²} (2x² − 1)

Being f″(0) = −2, the origin is a local maximizer. Since

    x < 0 < y ⟹ f′(x) > 0 > f′(y)

by Proposition 1024 the origin is actually a strong global maximizer. Moreover, we have

    f″(x) < 0 ⟺ 2x² − 1 < 0 ⟺ x ∈ (−1/√2, 1/√2)
    f″(x) = 0 ⟺ 2x² − 1 = 0 ⟺ x = ±1/√2
    f″(x) > 0 ⟺ 2x² − 1 > 0 ⟺ x ∈ (−∞, −1/√2) ∪ (1/√2, +∞)

So, the points x = ±1/√2 are inflection points, with f concave on the open interval (−1/√2, 1/√2) and convex on the open intervals (−∞, −1/√2) and (1/√2, +∞). The graph of the function is the famous Gaussian bell:

[Figure: the Gaussian bell, graph of f(x) = e^{−x²}]

which is the most classical among the graphs of functions. N
Example 1259 Let f : R → R be given by f(x) = x³ − 7x² + 12x. We have

    lim_{x→−∞} f(x) = −∞,    lim_{x→+∞} f(x) = +∞

Therefore, there are no asymptotes. Then we have:

1. f(0) = 0 and f(x) = 0, that is, x(x² − 7x + 12) = 0, for x = 0 and for x = (7 ± √(49 − 48))/2, i.e., x = 3 and x = 4. Given that it is possible to write f(x) = x(x − 3)(x − 4), the function is ≥ 0 when x ∈ [0, 3] ∪ [4, ∞).

2. Since f′(x) = 3x² − 14x + 12, the derivative is zero for

    x = (14 ± √(196 − 144))/6 = (14 ± √52)/6 = (7 ± √13)/3

The derivative is ≥ 0 when x ∈ (−∞, (7 − √13)/3] ∪ [(7 + √13)/3, ∞).

3. Since f″(x) = 6x − 14, it is zero for x = 7/3. The second derivative is ≥ 0 when x ≥ 7/3.

4. Since f″((7 − √13)/3) < 0, the point is a local maximizer; since instead f″((7 + √13)/3) > 0, the point is a local minimizer. Finally, the point 7/3 is an inflection point.

In sum, the graph of the function is:

[Figure: graph of f(x) = x³ − 7x² + 12x, with zeros at 0, 3, 4, a local maximum at (7 − √13)/3, a local minimum at (7 + √13)/3, and an inflection point at 7/3]

N
Example 1260 Let f : R → R be given by f(x) = xeˣ. Its limits are lim_{x→−∞} xeˣ = 0 and lim_{x→+∞} xeˣ = +∞. We then have:

1. f(x) ≥ 0 ⟺ x ≥ 0.

2. f′(x) = (x + 1)eˣ ≥ 0 ⟺ x ≥ −1.

3. f″(x) = (x + 2)eˣ ≥ 0 ⟺ x ≥ −2.

4. f(0) = 0, so the origin is the unique point of intersection with the axes.

Since f′(x) = 0 for x = −1 and f″(−1) = e⁻¹ > 0, the unique minimizer is x = −1. Given that f″(x) = 0 for x = −2, it is an inflection point. In sum, the graph of the function is:

[Figure: graph of f(x) = xeˣ, with global minimum at x = −1 and inflection point at x = −2]

N
Example 1261 Let f : R → R be given by f(x) = x²eˣ. Its limits are

    lim_{x→−∞} x²eˣ = 0⁺,    lim_{x→+∞} x²eˣ = +∞

We then have:

1. f(x) is always ≥ 0 and f(0) = 0, hence x = 0 is a minimizer.

2. f′(x) = x(x + 2)eˣ ≥ 0 ⟺ x ∈ (−∞, −2] ∪ [0, ∞).

3. f″(x) = (x² + 4x + 2)eˣ ≥ 0 ⟺ x ∈ (−∞, −2 − √2] ∪ [−2 + √2, +∞).

4. x = −2 and x = 0 are the unique stationary points. Since f″(−2) = −2e⁻² < 0, x = −2 is a local maximizer. Given that f″(0) = 2e⁰ > 0, this confirms that x = 0 is a minimizer.

5. The two points of abscissae −2 ± √2 are inflection points.

In sum, the graph of the function is:

[Figure: graph of f(x) = x²eˣ, with global minimum at x = 0, local maximum at x = −2, and inflection points at x = −2 ± √2]

N
Example 1262 Let f : R → R be given by f(x) = x³eˣ. Its limits are

    lim_{x→−∞} x³eˣ = 0⁻,    lim_{x→+∞} x³eˣ = +∞

We then have that:

1. f(0) = 0; f(x) ≥ 0 ⟺ x ≥ 0.

2. f′(x) = x²(x + 3)eˣ ≥ 0 ⟺ x ≥ −3; note that f′(0) = 0 as well as f′ > 0 close to x = 0: the function is therefore increasing at the origin.

3. f″(x) = (x³ + 6x² + 6x)eˣ ≥ 0 ⟺ x ∈ [−3 − √3, −3 + √3] ∪ [0, ∞).

4. x = −3 and x = 0 are the unique stationary points. Since f″(−3) = 9e⁻³ > 0, x = −3 is a local minimizer. One has f″(0) = 0, and we already know that the function is increasing at x = 0.

5. The three points of abscissae −3 ± √3 and 0 are inflection points.

In sum, the graph of the function is:

[Figure: graph of f(x) = x³eˣ, with local minimum at x = −3 and inflection points at x = −3 ± √3 and x = 0]

N
Example 1263 Let f : R \ {2} → R be given by

    f(x) = 2x + 3 + 1/(x − 2)

This function is not defined at x = 2. We have

    lim_{x→−∞} f(x) = lim_{x→2⁻} f(x) = −∞,    lim_{x→2⁺} f(x) = lim_{x→+∞} f(x) = +∞

1. f(0) = 3 − 0.5 = 2.5; we have f(x) = 0 when (2x + 3)(x − 2) = −1, that is, when 2x² − x − 5 = 0, i.e., for

    x = (1 ± √41)/4 ≈ −1.35 and 1.85

2. One has that

    f′(x) = 2 − 1/(x − 2)²

which is zero if (x − 2)² = 1/2, i.e., if x = 2 ± 1/√2.

3. Since

    f″(x) = 2/(x − 2)³

is positive for every x > 2 and negative for every x < 2, the two stationary points 2 + 1/√2 and 2 − 1/√2 are, respectively, a local minimizer and a local maximizer.

4. Since f′(x) → 2 as x → ±∞, the function has an oblique asymptote. Further, since

    lim_{x→±∞} [f(x) − 2x] = lim_{x→±∞} (3 + 1/(x − 2)) = 3

the oblique asymptote has equation y = 2x + 3. Clearly, there is also a vertical asymptote of equation x = 2.

In sum, the graph of the function is:

[Figure: graph of f(x) = 2x + 3 + 1/(x − 2), with vertical asymptote x = 2 and oblique asymptote y = 2x + 3]

Note that

    f(x) ~ 1/(x − 2)    as x → 2

(near 2, f(x) behaves like 1/(x − 2), i.e., it diverges) and that f(x) ~ 2x + 3 as x → ±∞ (for |x| sufficiently large it behaves like y = 2x + 3). N
Part VII

Differential optimization
Chapter 28

Unconstrained optimization

28.1 Unconstrained problems
In the last part of the book we learned some remarkable tools that differential calculus provides for the study of local solutions of the optimization problems introduced in Chapter 18, problems that are at the heart of economics (and of our book). In the next few chapters on optimization theory we will show how these tools can be used to find global solutions of such problems, which are the real object of interest in applications, as we already stressed several times. In other words, we will learn how the study of local solutions can be instrumental for the study of global ones. To this end, we will study two main classes of problems: (i) problems with coercive objective functions, in which we can combine local differential results à la Fermat with global existence results à la Weierstrass and Tonelli; (ii) problems with concave objective functions, which can rely on the fundamental optimality properties of concave functions.
In this introductory chapter we illustrate a few classic differential optimization themes via an unconstrained differential optimization problem

    max_x f(x)    sub x ∈ C    (28.1)

with objective function f : A ⊆ Rⁿ → R which is differentiable on an open choice set C ⊆ A. As usual, a point x̂ ∈ C is a (global) solution of this optimization problem if f(x̂) ≥ f(x) for each x ∈ C, while it is a local solution of such a problem if there exists a neighborhood B_{x̂}(ε) of x̂ such that f(x̂) ≥ f(x) for each x ∈ B_{x̂}(ε) ∩ C.¹
28.2 Coercive problems

An unconstrained differential optimization problem is said to be coercive if the objective function f is coercive on C. Since the continuity of f on C is guaranteed by differentiability, Tonelli's Theorem can be used for this class of problems. Along with Fermat's Theorem, it gives rise to the so-called elimination method for solving optimization problems, which in this chapter will be used to deal with unconstrained differential optimization problems.

The elimination method consists in the following two phases:

1
As in the rest of the book, solutions are understood to be global even when not stated explicitly.
1. identify the set S of critical points of f on C, i.e.,

    S = {x ∈ C : ∇f(x) = 0}

2. construct the set f(S) = {f(x) : x ∈ S}; if x̂ ∈ S is such that

    f(x̂) ≥ f(x)    ∀x ∈ S    (28.2)

then x̂ is a solution of the optimization problem (28.1).

In other words, once the conditions for Tonelli's Theorem to be applied are verified, one constructs the set of critical points. A point where f attains its maximum value is a solution of the optimization problem.

N.B. If the function f is twice continuously differentiable, in phase 1 one can consider, instead of S, the subset S₂ ⊆ S of the critical points that satisfy the second-order necessary condition (Sections 22.5.3 and 23.4.4). O
The rationale of the elimination method is simple. By Fermat's Theorem, the set S consists of all points in C which are candidate local solutions of the optimization problem (28.1). On the other hand, if f is continuous and coercive on C, by Tonelli's Theorem there exists at least one solution of this optimization problem. Such a solution must belong to the set S (as long as it is non-empty) because a solution of the optimization problem is, a fortiori, a local solution. Hence, the solutions of the "restricted" optimization problem

    max_x f(x)    sub x ∈ S    (28.3)

are also solutions of the optimization problem (28.1). But the solutions of the restricted problem (28.3) are the points x̂ ∈ S for which condition (28.2) holds, which are then the solutions of optimization problem (28.1), as phase 2 of the elimination method states.

As the following examples show, the elimination method elegantly and effectively combines Tonelli's global result with Fermat's local one. Note how Tonelli's Theorem is crucial since in unconstrained differential optimization problems the choice set C is open, so Weierstrass' Theorem is inapplicable (as it requires C to be compact).
The smaller the set S of critical points, the better the method works, in that phase 2 requires a direct comparison of f at all points of S. For this reason, the method is particularly effective when we can consider, instead of S, its subset S₂ consisting of all critical points which satisfy the second-order necessary condition.
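Before turning to the examples, here is a small sketch of ours in which sympy carries out the two phases for the scalar problem of Example 1265 below.

```python
# The elimination method on the coercive problem max -x^6 + 3x^2 - 1 on R.
import sympy as sp

x = sp.symbols('x', real=True)
f = -x**6 + 3*x**2 - 1

S = sp.solve(sp.Eq(sp.diff(f, x), 0), x)      # phase 1: critical points
values = {c: f.subs(x, c) for c in S}         # phase 2: compare f on S
best = max(values.values())
print(S)                                      # [-1, 0, 1]
print([c for c, v in values.items() if v == best])   # the solutions: [-1, 1]
```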
Example 1264 Let f : Rⁿ → R be given by f(x) = (1 − ‖x‖²)e^{‖x‖²} and let C = Rⁿ. The function f is coercive on Rⁿ. Indeed, it is supercoercive: by taking tₙ = ‖xₙ‖, it follows that

    f(xₙ) = (1 − ‖xₙ‖²)e^{‖xₙ‖²} = (1 − tₙ²)e^{tₙ²} → −∞

for any sequence {xₙ} of vectors such that tₙ = ‖xₙ‖ → +∞. Since it is continuous, f is coercive on Rⁿ by Proposition 820. The unconstrained differential optimization problem

    max_x (1 − ‖x‖²)e^{‖x‖²}    sub x ∈ Rⁿ    (28.4)

is thus coercive. Let us solve it by using the elimination method.

Phase 1: It is easy to see that

    ∇f(x) = 0 ⟺ x = 0

so that S = {0} and x = 0 is the unique critical point.

Phase 2: Since S is a singleton, this phase trivially implies that x̂ = 0 is a solution of optimization problem (28.4). N
Example 1265 Let f : R → R be given by f(x) = −x⁶ + 3x² − 1 and let C = R. By Proposition 820, f is coercive on R because lim_{x→−∞} f(x) = lim_{x→+∞} (−x⁶ + 3x² − 1) = −∞. The unconstrained differential optimization problem

    max_x −x⁶ + 3x² − 1    sub x ∈ R    (28.5)

is thus coercive. Let us solve it with the elimination method.

Phase 1: The first-order condition f′(x) = 0 takes the form −6x⁵ + 6x = 0, so x = 0 and x = ±1 are the only critical points, that is, S = {−1, 0, 1}. We have f″(0) = 6, f″(−1) = −24 and f″(1) = −24, so S₂ = {−1, 1}.

Phase 2: Since f(−1) = f(1) = 1, both points x̂ = ±1 are solutions of the optimization problem (28.5). N
Example 1266 Let us get back to the unconstrained optimization problem

    max_x e^{−x⁴+x²}    sub x ∈ R

of Example 1022. Let us check that this differential problem is coercive. By setting g(x) = eˣ and h(x) = −x⁴ + x², it follows that f = g ∘ h. We have lim_{x→±∞} h(x) = lim_{x→±∞} (−x⁴ + x²) = −∞. So, by Proposition 820 the function h is coercive on R. Since g is strictly increasing, the function f is a strictly increasing transformation of a coercive function. By Proposition 806, f is coercive.
This unconstrained differential optimization problem is thus coercive and can be solved with the elimination method.

Phase 1: From Example 1022 we know that S₂ = {−1/√2, 1/√2}.

Phase 2: We have f(−1/√2) = f(1/√2), so both points x̂ = ±1/√2 are solutions of the unconstrained optimization problem. The elimination method allowed us to identify the nature of such points, something not possible by using solely differential methods as in Example 1022. N
Example 1267 Example 1070 dealt with the optimization problem

    max_x f(x)    sub x ∈ R²₊₊

where f : R² → R is defined by f(x₁, x₂) = −2x₁² − x₂² + 3(x₁ + x₂) − x₁x₂ + 3. The function f is supercoercive: indeed, it is easily seen that

    f(x₁ₖ, x₂ₖ) = −2x₁ₖ² − x₂ₖ² + 3(x₁ₖ + x₂ₖ) − x₁ₖx₂ₖ + 3 → −∞

for any "exploding" sequence {xₖ = (x₁ₖ, x₂ₖ)} ⊆ R²₊₊, that is, such that ‖xₖ‖ = √(x₁ₖ² + x₂ₖ²) → +∞. As f is continuous, it is coercive on R²₊₊ by Proposition 820.
This unconstrained differential optimization problem is coercive as well, so it can be solved with the elimination method.

Phase 1: By Example 1070, S₂ = {(3/7, 9/7)}.

Phase 2: As S₂ is a singleton, this phase trivially implies that x̂ = (3/7, 9/7) is a solution of the optimization problem. The elimination method has allowed us to identify the nature of such a point, thus making it possible to conclude the study of the optimization problem started in Example 1070. N
28.3 Concave problems

Optimization problems with concave objective functions are pervasive in economic applications because concave functions can often be given a plausible (at times, even compelling) economic meaning that makes it possible to take advantage of their remarkable optimality properties.² In particular, the unconstrained differential optimization problem (28.1), i.e.,

    max_x f(x)    sub x ∈ C    (28.6)

is said to be concave if the set C ⊆ A is both open and convex and if the function f : A ⊆ Rⁿ → R is both differentiable and concave on C.
As we learned earlier in the book (Section 24.5.1), in such a problem the first-order condition ∇f(x̂) = 0 becomes necessary and sufficient for a point x̂ ∈ C to be a solution. This remarkable property explains the importance of concavity in optimization problems. But more is true: by Theorem 831, such a solution is unique if f is strictly quasi-concave. Besides existence, also the study of the uniqueness of solutions, key for comparative statics exercises, is best carried out under concavity.

The necessary and sufficient status of the first-order condition leads to the concave (elimination) method to solve the concave problem (28.6). It consists of a single phase:

1. Find the set S = {x ∈ C : ∇f(x) = 0} of the stationary points of f on C; all, and only, the points x̂ ∈ S solve the optimization problem.

In particular, when f is strictly quasi-concave, the set S is a singleton that consists of the unique solution. This is the case when the concave method is most powerful. In general, this method is, at the same time, simpler and more powerful than the method of elimination.

2
Recall the discussion on diversification in Section 14.5.
It requires the concavity of the objective function, a demanding condition that, however, is often assumed in economic applications, as remarked before.³
Example 1268 Let f : (0, ∞) → R be given by f(x) = −x log x and let C = (0, ∞). The function f is strictly concave since f″(x) = −1/x < 0 for all x > 0 (Corollary 1101). Let us solve the concave problem

    max_x −x log x    sub x > 0    (28.7)

We have

    f′(x) = 0 ⟺ log x = −1 ⟺ e^{log x} = e⁻¹ ⟺ x = 1/e

According to the concave method, x̂ = 1/e is the unique solution of problem (28.7). N
Example 1269 Let f : R² → R be given by f(x, y) = −2x² − 3xy − 6y² and let C = R². The function f is strictly concave since the Hessian

    [ −4   −3 ]
    [ −3  −12 ]

is negative definite (Proposition 1120). Let us solve the concave problem

    max −2x² − 3xy − 6y²    sub (x, y) ∈ R²    (28.8)

We have

    ∇f(x, y) = 0 ⟺ { −4x − 3y = 0 and −3x − 12y = 0 } ⟺ (x, y) = (0, 0)

By the concave method, the origin x̂ = (0, 0) is the unique solution of problem (28.8). N
Example 1270 For bundles with two goods, the Cobb-Douglas utility function u : R²₊ → R is u(x₁, x₂) = x₁ᵃ x₂^{1−a}, with a ∈ (0, 1). Consider the consumer problem

    max_x u(x)    sub x ∈ Γ(p, w)    (28.9)

where Γ(p, w) = {x = (x₁, x₂) ∈ R²₊ : p₁x₁ + p₂x₂ = w} is the budget line, with p₁, p₂ > 0 (strictly positive prices). We can easily solve this problem by substitution. Indeed, from the budget constraint we have

    x₂ = (w − p₁x₁)/p₂

In view of this expression, define f : [0, w/p₁] → R by⁴

    f(x₁) = x₁ᵃ ((w − p₁x₁)/p₂)^{1−a}

3
Actually, in these applications strict concavity is often assumed in order to have unique solutions, so as to best carry out comparative statics exercises. For instance, in many works in economics, utility functions u that are defined on monetary outcomes, i.e., on the real line, are assumed to be such that u′ > 0 and u″ < 0, so strictly increasing (Proposition 1005) and strictly concave (Corollary 1101).
4
The condition x₁ ≤ w/p₁ ensures that x₂ ≥ 0.

Problem (28.9) is equivalent to

    max_{x₁} f(x₁)    sub x₁ ∈ [0, w/p₁]

Since f(0) = f(w/p₁) = 0 and f ≥ 0, the maximizers are easily seen to belong to the open interval (0, w/p₁). Therefore, we can consider the nicer unconstrained problem

    max_{x₁} f(x₁)    sub x₁ ∈ (0, w/p₁)

where x₁ is required to belong to an open interval. We can actually do even better by considering the logarithmic transformation g = log f of the objective function f, that is,

    g(x₁) = a log x₁ + (1 − a) log((w − p₁x₁)/p₂)

The problem

    max_{x₁} g(x₁)    sub x₁ ∈ (0, w/p₁)

is equivalent to the last one (Proposition 782), but more tractable because of the log-linear form of the objective function. We have

    g′(x₁) = 0 ⟺ a/x₁ = (1 − a) p₁/(w − p₁x₁) ⟺ a(w − p₁x₁) = (1 − a)p₁x₁

Since g is easily checked to be strictly concave, by the concave method the unique maximizer is

    x̂₁ = a w/p₁

By replacing it in the budget constraint, we conclude that

    x̂ = (a w/p₁, (1 − a) w/p₂)

is the unique solution of the Cobb-Douglas consumer problem (28.9). N
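The first-order condition of the log transform can also be handed to a symbolic solver; a sketch of ours:

```python
# The Cobb-Douglas consumer problem (28.9) via its log transform g:
# the unique stationary point is x1 = a*w/p1, as derived above.
import sympy as sp

x1, a, w, p1, p2 = sp.symbols('x1 a w p1 p2', positive=True)
g = a * sp.log(x1) + (1 - a) * sp.log((w - p1 * x1) / p2)
print(sp.solve(sp.Eq(sp.diff(g, x1), 0), x1))   # [a*w/p1]
```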
28.4 Relationship among problems

In this introductory chapter we introduced the two relevant classes of unconstrained differential optimization problems: coercive and concave ones. A few observations are in order:

1. The two classes are not exhaustive: there are unconstrained differential optimization problems which are neither coercive nor concave. For example, the unconstrained differential optimization problem

    max_x cos x    sub x ∈ R

is neither coercive nor concave: the cosine function is neither coercive on the real line (see Example 805) nor concave. Nonetheless, the problem is trivial: as one can easily infer from the graph of the cosine function, its solutions are the points x = 2kπ with k ∈ Z. As usual, common sense gives the best guidance in solving any problem (in particular, optimization ones), more so than any classification.
2. The two classes are not disjoint: there are unconstrained differential optimization problems which are both coercive and concave. For example, the unconstrained differential optimization problem

    max_x 1 − x²    sub x ∈ R

is both coercive and concave: the function 1 − x² is indeed both coercive (see Example 811) and strictly concave on the real line. In cases such as this one, we use the more powerful concave method.⁵

3. The two classes are distinct: there are unconstrained differential optimization problems which are coercive but not concave, and vice versa.

(a) Let f : R → R be given by

    f(x) = 1 − x²  if x ≤ 0,    f(x) = 1  if x > 0

Since f is differentiable (Example 914), the problem

    max_x f(x)    sub x ∈ R

is an unconstrained differential optimization problem. The graph of the function f

[Figure: graph of f, a downward parabola for x ≤ 0 glued to the constant 1 for x > 0]

shows how it is concave, but not coercive. The optimization problem is thus concave, but not coercive.
(b) The unconstrained differential optimization problem

    max_x e^{−x²}    sub x ∈ R

is coercive but not concave: the Gaussian function e^{−x²} is indeed coercive (Example 807) but not concave, as its famous bell graph shows.

[Figure: the Gaussian bell graph of e^{−x²}]

5
As coda readers may have noted, this objective function is strongly concave. Indeed, it is for such a class of concave functions that the overlap of the two classes of unconstrained differential optimization problems works at its best.
28.5 Relaxation

An optimization problem

    max_x f(x)    sub x ∈ C

with objective function f : A ⊆ Rⁿ → R may be solved by relaxation, that is, by considering an ancillary optimization problem

    max_x f(x)    sub x ∈ B

which is characterized by a larger choice set C ⊆ B ⊆ A which is, however, analytically more convenient (for example, it may be convex or open), so that the relaxed problem becomes coercive or concave. If a solution of the relaxed problem belongs to the original choice set C, it automatically solves the original problem as well. The following examples should clarify this simple yet powerful idea, which can allow us to solve optimization problems that are neither coercive nor concave.
Example 1271 (i) Consider the optimization problem

    max_x (1 − ‖x‖²)e^{‖x‖²}    sub x ∈ Qⁿ₊    (28.10)

where Qⁿ₊ is the set of vectors in Rⁿ whose coordinates are rational and positive. An obvious relaxation of the problem is

    max_x (1 − ‖x‖²)e^{‖x‖²}    sub x ∈ Rⁿ

whose choice set is larger yet analytically more convenient. Indeed, the relaxed problem is coercive and a simple application of the elimination method shows that its solution is the origin x̂ = 0 (Example 1264). Since it belongs to Qⁿ₊, we conclude that the origin is also the unique solution of problem (28.10). It would have been far more complex to reach such a conclusion by studying the original problem directly.
(ii) Consider the consumer problem with log-linear utility

    max_x Σᵢ₌₁ⁿ aᵢ log xᵢ    sub x ∈ C    (28.11)

where C = B(p, w) ∩ Qⁿ is the set of bundles with rational components (a realistic assumption). Consider the relaxed version

    max_x Σᵢ₌₁ⁿ aᵢ log xᵢ    sub x ∈ B(p, w)

with a larger yet convex, thus analytically more convenient, choice set. Indeed, convexity itself allowed us to conclude in Section 18.6 that the unique solution of the problem is the bundle x̂ such that x̂ᵢ = aᵢw/pᵢ for every good i = 1, ..., n. If aᵢ, pᵢ, w ∈ Q for every i, the bundle x̂ belongs to C, so it is the unique solution of problem (28.11). It would have been far more complex to reach such a conclusion by studying problem (28.11) directly. N
In conclusion, it is sometimes convenient to ignore some of the constraints of the choice set when doing so makes the choice set larger yet analytically more tractable, in the hope that some solutions of the relaxed problem belong to the original choice set.
28.6 Optimization and equations: general least squares

Equations play a key role in unconstrained optimization problems via first-order conditions. Interestingly, the converse is also true: equations can be addressed via unconstrained optimization problems. Indeed, consider equation (26.1), i.e.,

    f(x) = y₀    (28.12)

where f is an operator f : A ⊆ Rⁿ → Rⁿ and y₀ is a given element of Rⁿ. Consider the unconstrained optimization problem

    min_x ‖f(x) − y₀‖²    sub x ∈ Rⁿ    (28.13)

If a vector x̂ ∈ A solves equation (28.12), then it solves problem (28.13). Indeed, ‖f(x̂) − y₀‖² = 0. The converse is false because the optimization problem might have solutions even though the equation has no solutions. Even in this case, however, the optimization connection is important because the solutions of the optimization problem are the best approximations, i.e., the best surrogates, of the missing solutions. A classic example is a system of linear equations Ax = b, which takes the form (28.13) via the linear function f(x) = Ax defined on Rⁿ and the known term b ∈ Rᵐ, i.e.,

    min_x ‖Ax − b‖²    sub x ∈ Rⁿ    (28.14)

In this case (28.13) is a least squares problem and, when the system has no solutions, we
have the least squares solutions studied in Section 18.9.
In sum, the solutions of the optimization problem (28.13) are candidate solutions of equa-
tion (28.12). If they turn out not to be solutions, they are nevertheless best approximations.
As to problem (28.13), assume that the image of f is a closed convex set of Rn . Consider
the auxiliary problem
min ky y0 k2 sub y 2 Im f
y

By the general Projection Theorem (Section 24.10), there is a unique solution y^ 2 Im f ,


which is characterized by the condition

(y0 y^) (^
y y) 0 8y 2 Im f

All the vectors x 2 f 1 (^


y ) that belong to the preimage of y^ are, then, the candidate solutions
of equation (28.12). In the linear case f (x) = Ax we get back to the least squares solutions
(19.7).
This simple argument, which generalizes the spirit of the least squares method from
linear to general equations, illustrates the possibility of solving equations via optimization
problems. The problems of …nding solutions of equations and of optimization problems are
closely connected, more than it may appear prima facie. Each of the two problems can be
addressed via the other one, which then plays an ancillary role that becomes relevant when
it features signi…cantly better computational properties than the original problem.
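As a small illustration of this connection, the sketch below (ours) attacks the equation f(x) = y₀ for a made-up nonlinear operator by minimizing ‖f(x) − y₀‖² with a general-purpose numerical optimizer; a (near-)zero optimal value signals that a genuine solution has been found.

```python
# Solving f(x) = y0 through the unconstrained problem (28.13), for the
# illustrative operator f(x1, x2) = (x1 + x2^3, x1^3 - x2) and y0 = (1, 0).
import numpy as np
from scipy.optimize import minimize

y0 = np.array([1.0, 0.0])

def f(x):
    return np.array([x[0] + x[1]**3, x[0]**3 - x[1]])

def objective(x):
    return np.sum((f(x) - y0)**2)          # ||f(x) - y0||^2

res = minimize(objective, x0=np.zeros(2))
print(res.x, objective(res.x))             # near-zero value: a solution
```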
28.7 Coda: computational issues

Motivated by the last section, in this coda we discuss some computational issues for optimization problems.⁶ Throughout we consider an optimization problem

    max_x f(x)    sub x ∈ C    (28.15)

that admits at least one solution, i.e., arg max_{x∈C} f(x) ≠ ∅. To ease notation, we denote the maximum value by f̂ = max_{x∈C} f(x).

28.7.1 Decision procedures

Definition 1272 A sequence {xₙ} ⊆ C is relaxing for problem (28.15) if f(xₙ) ≤ f(xₙ₊₁) for all n.

In words, a sequence {xₙ} in the choice set is relaxing if the objective function assumes larger and larger values, so it gets closer and closer to the maximum value f̂ as n increases. The following notion gives some computational content to problem (28.15).

Definition 1273 Let f : A ⊆ Rⁿ → R be a real-valued function and C a subset of A. A self-map h : C → C is a (homogeneous) optimal decision procedure with speed k > 0 for problem (28.15) if, for each initial condition x₀ ∈ C, the sequence of iterates

    xₙ₊₁ = h(xₙ)
is a relaxing sequence such that, for some constant c > 0,

    f̂ − f(xₙ) ≤ c/nᵏ

The sequence of iterates {xₙ} is defined recursively via h. We consider the convergence of the images f(xₙ) because one should be primarily interested in getting, as fast as possible, to values that are almost optimal. Indeed, solutions have per se only an instrumental role; ultimately what matters is the value that they make it possible to attain. In particular, given a threshold ε > 0, the iterates xₙ are ε-optimal if

    n ≥ (c/ε)^{1/k}

So, if we are willing to accept an ε deviation from the maximum value, it is enough to perform (c/ε)^{1/k} iterates.

6
We refer interested readers to Nesterov (2004) for an authoritative presentation of this topic.
28.7.2 Gradient descent

We can establish the existence of optimal decision procedures for differentiable objective functions that have Lipschitz continuous derivative operators. Specifically, say that a function f : U → R defined on an open set U of Rⁿ is β-smooth, for some constant β > 0, if it is differentiable with

    ‖∇f(x) − ∇f(y)‖ ≤ β‖x − y‖    ∀x, y ∈ U

We consider the following unconstrained version of problem (28.15):

    max_x f(x)    sub x ∈ Rⁿ    (28.16)

Theorem 1274 Let f : Rⁿ → R be β-smooth. If f is concave, then the map h : Rⁿ → Rⁿ defined by

    h(x) = x + (1/β)∇f(x)    (28.17)

is an optimal decision procedure for problem (28.16), with

    f̂ − f(xₙ) ≤ 2β‖x₀ − x̂‖²/n    (28.18)

for the sequence {xₙ} of its iterates.

Thus, objective functions that are β-smooth and concave have an optimal decision procedure (28.17), called gradient descent, with unitary speed. The gradient descent procedure prescribes that, if at x we have ∂f(x)/∂xᵢ > 0 (resp., < 0), in the next iterate we increase (resp., decrease) the component i of the vector x. If one draws the graph of a scalar concave function, the intuition behind this rule should be apparent.⁷ This rule recalls a basic rule of thumb when trying to reach the peak of a mountain: at a crossroad, always take the rising path.
The proof relies on the following lemma of independent interest (it is a first-order approximation with integral remainder).

7
A dual version of this result holds for minimization problems with convex objective functions, with h(x) = x − (1/β)∇f(x).
Lemma 1275 Let f : U → R be a differentiable function defined on an open set U of Rⁿ. Then

    f(y) − f(x) = ∫₀¹ ∇f(x + t(y − x)) · (y − x) dt

for all x, y ∈ U such that the segment joining them lies in U.

Proof Let x, y ∈ U. Define the auxiliary function φ : [0, 1] → R by φ(t) = f((1 − t)x + ty). Since f is differentiable, the function φ is easily seen to be differentiable. By the chain rule, we then have

    φ′(t) = Σᵢ₌₁ⁿ ∂f((1 − t)x + ty)/∂xᵢ · (yᵢ − xᵢ) = ∇f(x + t(y − x)) · (y − x)

By (35.57), we have

    f(y) − f(x) = φ(1) − φ(0) = ∫₀¹ φ′(t) dt = ∫₀¹ ∇f(x + t(y − x)) · (y − x) dt

as desired.
The next lemma reports some important inequalities for β-smooth functions.

Lemma 1276 Let f : U → R be a β-smooth function defined on an open and convex set U of Rⁿ. Then

    f(y) ≤ f(x) + ∇f(x) · (y − x) + (β/2)‖y − x‖²    (28.19)

for all x, y ∈ U. If, in addition, f is convex, then

    (1/β)‖∇f(x) − ∇f(y)‖² ≤ (∇f(x) − ∇f(y)) · (x − y)    (28.20)

for all x, y ∈ U.

Proof By Lemma 1275, we can write

    f(y) − f(x) = ∫₀¹ ∇f(x + t(y − x)) · (y − x) dt
               = ∇f(x) · (y − x) + ∫₀¹ [∇f(x + t(y − x)) − ∇f(x)] · (y − x) dt
               ≤ ∇f(x) · (y − x) + ∫₀¹ ‖∇f(x + t(y − x)) − ∇f(x)‖ ‖y − x‖ dt
               ≤ ∇f(x) · (y − x) + ∫₀¹ βt ‖y − x‖² dt
               = ∇f(x) · (y − x) + β‖y − x‖² ∫₀¹ t dt
               = ∇f(x) · (y − x) + (β/2)‖y − x‖²
where the first inequality follows from the Cauchy-Schwarz inequality. This proves (28.19).
Assume now that f is convex. Then, the subgradient inequality and (28.19) imply

    0 ≤ f(y) − f(x) − ∇f(x) · (y − x) ≤ (β/2)‖y − x‖²

Fix x₀ ∈ U and define the auxiliary function φ : U → R by φ(x) = f(x) − ∇f(x₀) · x. Since ∇φ(x) = ∇f(x) − ∇f(x₀), we have ‖∇φ(x) − ∇φ(y)‖ = ‖∇f(x) − ∇f(y)‖. So, this auxiliary function is also β-smooth. Moreover, ∇φ(x₀) = 0 and φ is convex, so x₀ is a minimizer of φ. Along with (28.19), this implies

    φ(x₀) ≤ φ(x − (1/β)∇φ(x)) ≤ φ(x) − (1/β)∇φ(x) · ∇φ(x) + (β/2)‖(1/β)∇φ(x)‖²
          = φ(x) − (1/β)‖∇φ(x)‖² + (1/2β)‖∇φ(x)‖² = φ(x) − (1/2β)‖∇φ(x)‖²

for all x ∈ U. Thus,

    f(x₀) − ∇f(x₀) · x₀ ≤ f(x) − ∇f(x₀) · x − (1/2β)‖∇f(x) − ∇f(x₀)‖²

that is,

    f(x₀) + ∇f(x₀) · (x − x₀) + (1/2β)‖∇f(x) − ∇f(x₀)‖² ≤ f(x)

Since x₀ was arbitrarily chosen, we conclude that

    f(x) + ∇f(x) · (y − x) + (1/2β)‖∇f(y) − ∇f(x)‖² ≤ f(y)    (28.21)

for all x, y ∈ U. Since x and y play symmetric roles, by interchanging them we have

    f(y) + ∇f(y) · (x − y) + (1/2β)‖∇f(y) − ∇f(x)‖² ≤ f(x)    (28.22)

By adding up (28.21) and (28.22), we get (28.20).
Proof of Theorem 1274 Set g = −f. Clearly, also the function g is β-smooth. Since g is convex, by (28.19) we have

    0 ≤ g(y) − g(x) − ∇g(x) · (y − x) ≤ (β/2)‖y − x‖²

Moreover, xₙ₊₁ = xₙ + (1/β)∇f(xₙ) = xₙ − (1/β)∇g(xₙ). Thus:

    g(xₙ₊₁) ≤ g(xₙ) + ∇g(xₙ) · (xₙ₊₁ − xₙ) + (β/2)‖xₙ₊₁ − xₙ‖²
           = g(xₙ) − (1/β)‖∇g(xₙ)‖² + (1/2β)‖∇g(xₙ)‖² = g(xₙ) − (1/2β)‖∇g(xₙ)‖²

Since ‖∇f(x)‖ = ‖∇g(x)‖ for all x ∈ Rⁿ, we thus have

    f(xₙ₊₁) ≥ f(xₙ) + (1/2β)‖∇f(xₙ)‖²

for all n, so the sequence {xₙ} is relaxing. In particular, we have

    f̂ − f(xₙ₊₁) ≤ f̂ − f(xₙ) − (1/2β)‖∇f(xₙ)‖²    (28.23)

Next we show that

    ‖xₙ₊₁ − x̂‖ ≤ ‖xₙ − x̂‖    ∀n ≥ 0    (28.24)

Indeed, since g is β-smooth and convex, and ∇g(x̂) = 0, inequality (28.20) with y = x̂ yields ∇g(xₙ) · (xₙ − x̂) ≥ (1/β)‖∇g(xₙ)‖². Hence

    ‖xₙ₊₁ − x̂‖² = ‖xₙ − (1/β)∇g(xₙ) − x̂‖²
                = ‖xₙ − x̂‖² + (1/β²)‖∇g(xₙ)‖² − (2/β)∇g(xₙ) · (xₙ − x̂)
                ≤ ‖xₙ − x̂‖² + (1/β²)‖∇g(xₙ)‖² − (2/β²)‖∇g(xₙ)‖²
                = ‖xₙ − x̂‖² − (1/β²)‖∇g(xₙ)‖² ≤ ‖xₙ − x̂‖²

By concavity, we have f̂ ≤ f(xₙ) + ∇f(xₙ) · (x̂ − xₙ), so

    f̂ − f(xₙ) ≤ ∇f(xₙ) · (x̂ − xₙ) ≤ ‖∇f(xₙ)‖ ‖xₙ − x̂‖ ≤ ‖x₀ − x̂‖ ‖∇f(xₙ)‖

where the last inequality follows from (28.24). Then, using (28.23),

    [f̂ − f(xₙ)]² ≤ ‖x₀ − x̂‖² ‖∇f(xₙ)‖² ≤ 2β‖x₀ − x̂‖² [(f̂ − f(xₙ)) − (f̂ − f(xₙ₊₁))]
h i2 h i h i
f^ f (xn ) kx0 x ^k2 krf (xn+1 )k2 2 kx0 x ^k2 f^ f (xn ) f^ f (xn+1 )

Set dn = f^ f (xn ) for each n. We can write the last inequality as


d2n 2 (dn dn+1 ) kx0 ^k2
x
By (28.23), 0 dn+1 dn . Assume dn > 0 for each n, otherwise xn is the maximizer. Then
dn 1 1 1
1 2 (dn dn+1 ) kx0 ^k2
x =2 kx0 ^k2
x
dn+1 dn dn+1 dn+1 dn
that is,
1 1 1
dn+1 dn 2 kx0 ^ k2
x
By iterating we get
1 1 1
2 +
d1 2 kx0 x
^k d0
1 1 1 1 1 1 2 1
d2 2 + d 2 + 2 +
d0
= 2 +
d0
2 kx0 x
^k 1 2 kx0 x
^k 2 kx0 x
^k 2 kx0 x
^k

1 n 1
2 +
dn 2 kx0 x
^k d0
28.7. CODA: COMPUTATIONAL ISSUES 885

Since d0 > 0, we then have 1=dn n=2 kx0 ^k2 , so


x

^k2
2 kx0 x
0 < dn
n

This proves (28.18).

Example 1277 Given an m × n matrix A, with n ≤ m, consider the least squares optimization problem (28.14), rewritten in maximization form as

    max_x g(x)    sub x ∈ Rⁿ

with g : Rⁿ → R defined by g(x) = −‖Ax − b‖²/2. Then, ∇g(x) = −Aᵀ(Ax − b), so for some β > 0 we have

    ‖∇g(x) − ∇g(y)‖ = ‖−Aᵀ(Ax − b) + Aᵀ(Ay − b)‖ = ‖AᵀA(x − y)‖ ≤ β‖x − y‖

where the last inequality holds because the Gram matrix AᵀA induces a linear operator on Rⁿ, which is Lipschitz continuous by Theorem 729. We conclude that g is β-smooth. Since it is also concave, by the last theorem the map h : Rⁿ → Rⁿ defined by

    h(x) = x − (1/β)Aᵀ(Ax − b)

is an optimal decision procedure for the least squares problem. In particular,

    ĝ − g(xₙ) ≤ 2β‖x₀ − x̂‖²/n

for the sequence of iterates

    xₙ₊₁ = xₙ − (1/β)Aᵀ(Axₙ − b)    (28.25)

generated by h. N
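A numerical sketch of ours of the iterates (28.25), with β set to the largest eigenvalue of the Gram matrix AᵀA and made-up data:

```python
# Gradient descent (Theorem 1274) on the least squares problem of
# Example 1277; the iterates (28.25) approach the least squares solution.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

beta = np.linalg.eigvalsh(A.T @ A).max()       # a Lipschitz constant of the gradient
x = np.zeros(3)
for n in range(500):
    x = x - (1.0 / beta) * A.T @ (A @ x - b)   # iterate (28.25)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)  # closed-form comparison
print(np.linalg.norm(x - x_hat))               # should be very small
```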
28.7.3 Maximizing sequences

So far we considered convergence to maximum values. We now turn to convergence to solutions. To this end, we introduce the following notion.

Definition 1278 A sequence {xₙ} ⊆ C is maximizing for problem (28.15) if lim f(xₙ) = f̂.

Next we show that under some standard conditions maximizing sequences converge to solutions.

Proposition 1279 Let f : Rⁿ → R be strictly concave and supercoercive. A sequence {xₙ} is maximizing for problem (28.16) if and only if it converges to the solution x̂.

Proof We prove the “if” because the converse is trivial. Let x ^ be the unique solution of
problem (28.1). Let fxn g be maximizing, i.e., lim f (xn ) = f^. We want to show that xn ! x^.
Suppose, by contradiction, that there exists " > 0 and a subsequence fxnk g such that
kxnk x ^k " for all k (cf. Proposition 1557). Since limk!+1 f (xnk ) = f^, there exists some
scalar t such that eventually all terms of the subsequence fxnk g belong to the upper contour
set (f t). The supercoercive function f is continuous because it is concave (Theorem
669). So, the set (f t) is compact (cf. Proposition 820). By the Bolzano-Weierstrass’
Theorem, there exists a subsubsequence xnks that converges to some x 2 (f t). Since
f is continuous, we have lims!+1 f xnks = f (x ) f^ = lims!+1 f xnks , where the
^ ^
equality follows from lim f (xn ) = f . So, f = f (x ). In turn, this implies x
^ = x . We thus
reached the contradiction:
0<" xnks x
^ xnks x + kx ^k = xnks
x x !0
We conclude that xn ! x
^.

Example 1280 In the last example, assume that ρ(A) = n. By Theorem 854, the function g is strictly concave and supercoercive. So, the iterates

    xₙ₊₁ = xₙ − (1/β)Aᵀ(Axₙ − b)    (28.25)

converge to the least squares solution x̂ = (AᵀA)⁻¹Aᵀb. Notably, the iteration does not require any matrix inversion. N
Thus, for optimization problems featuring strictly concave and supercoercive objective functions, the sequence recursively defined via a decision procedure converges to the solution. If we make the stronger assumption that the objective function is strongly concave,⁸ then we can bound the rate of convergence to solutions of maximizing sequences.

Proposition 1281 If f : Rⁿ → R is strongly concave, then there exists a constant α > 0 such that

    α‖x − x̂‖ ≤ √(f(x̂) − f(x))

for every x ∈ Rⁿ.

Thus, for a sequence {xₙ} recursively defined via a decision procedure with speed k we have

    ‖xₙ − x̂‖ ≤ √c/(α n^{k/2})

provided the objective function is strongly concave.
Example 1282 In the last example, we have ∇²g(x) = −AᵀA, so g is strongly concave if there exists k > 0 such that the matrix 2kI − AᵀA is negative semidefinite. If this is the case, the iterates (28.25) converge to the least squares solution at rate

    ‖xₙ − x̂‖ ≤ √(2β)‖x₀ − x̂‖/(α√n)

because it is easy to check that we can take α = √k. N

8
Recall that strongly concave functions are strictly concave and supercoercive (Section 24.10).
The proof of Proposition 1281 is an easy consequence of the following lemma, which sharpens for strongly concave functions a classic inequality that holds for concave functions (cf. Theorem 1117).

Lemma 1283 Let f : U → R be a strongly concave and β-smooth function defined on an open and convex set U of Rⁿ. Then there exists a constant k > 0 such that

    f(y) ≤ f(x) + ∇f(x) · (y − x) − k‖x − y‖²    (28.26)

for all x, y ∈ U.
Proof By definition, there is k > 0 such that the function g : U → R defined by g(x) = f(x) + k‖x‖² is concave. Then, for all x, y ∈ U we have

    g(y) ≤ g(x) + ∇g(x) · (y − x)

so that

    f(y) + k‖y‖² ≤ f(x) + k‖x‖² + ∇f(x) · (y − x) + 2kx · (y − x)    (28.27)

We have

    k‖x‖² − k‖y‖² + 2kx · (y − x) = k(‖x‖² − ‖y‖² + 2x · y − 2x · x)
                                 = k(−‖x‖² − ‖y‖² + 2x · y) = −k‖x − y‖²

So, by (28.27) we have f(y) ≤ f(x) + ∇f(x) · (y − x) − k‖x − y‖², as desired.
Proof of Proposition 1281 Assume that f is strongly concave with constant k > 0. By (28.26), we have

    f(x) ≤ f(x̂) + ∇f(x̂) · (x − x̂) − k‖x − x̂‖² = f(x̂) − k‖x − x̂‖²

for all x ∈ Rⁿ, since ∇f(x̂) = 0. So, √k ‖x̂ − x‖ ≤ √(f(x̂) − f(x)) for all x ∈ Rⁿ. In turn, by setting α = √k this easily implies the desired result.
28.7.4 Final remarks

For the optimization problem (28.15) with the set C closed and convex, the gradient descent procedure becomes

    h(x) = P_C(x + (1/β)∇f(x))

where P_C : Rⁿ → C is the projection operator (Section 24.10). Indeed, the projection ensures that the next iterate remains an element of the choice set C.
Example 1284 (i) Let C = {x ∈ Rⁿ : Ax = b} be the affine set determined by an m × n matrix A, with m ≤ n. Consider an optimization problem

    max_x f(x)    sub x ∈ C

If ρ(A) = m, by (24.65) we have P_C(x) = x + Aᵀ(AAᵀ)⁻¹(b − Ax) for all x ∈ Rⁿ. So

    h(x) = P_C(x + (1/β)∇f(x))
         = x + (1/β)∇f(x) + Aᵀ(AAᵀ)⁻¹(b − A(x + (1/β)∇f(x)))

provided f is differentiable.
(ii) Let C = Rⁿ₊ be the positive orthant. Consider an optimization problem

    max_x f(x)    sub x ≥ 0

By (24.65), P_C(x) = x⁺ for all x ∈ Rⁿ, so h(x) = (x + (1/β)∇f(x))⁺ provided f is differentiable. N
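A sketch of ours of case (ii), for the concave objective f(x) = −‖x − c‖², whose unconstrained maximizer c has a negative coordinate, so the constraint x ≥ 0 binds:

```python
# Projected gradient on the positive orthant: h(x) = (x + (1/beta) grad f(x))^+.
import numpy as np

c = np.array([1.0, -2.0])
beta = 2.0                           # grad f(x) = -2(x - c) is 2-Lipschitz

def grad_f(x):
    return -2.0 * (x - c)

x = np.ones(2)
for n in range(100):
    x = np.maximum(x + (1.0 / beta) * grad_f(x), 0.0)  # positive part = projection
print(x)   # approx (1, 0), the projection of c onto the orthant
```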
Finally, there exist "accelerated" decision procedures that have speed 2, i.e., for some constant c > 0 we have

    f̂ − f(xₙ) ≤ c/n²

Roughly speaking, they have a bivariate form

    yₙ₊₁ = xₙ + (1/β)∇f(xₙ)
    xₙ₊₁ = αₙyₙ₊₁ + γₙyₙ

for suitable sequences of coefficients αₙ and γₙ, as readers will learn in more advanced courses.

Chapter 29

Equality constraints
29.1 Introduction

The classic necessary condition for local extremal points given by Fermat's Theorem considers interior points of the choice set C, something that greatly limits its use in finding candidate solutions of optimization problems coming from economics. Indeed, in many of them the hypotheses of monotonicity of Proposition 783 hold and, therefore, the possible solutions are on the boundary of the choice set, not in its interior. A classic example is the consumer problem

    max_x u(x)    sub x ∈ B(p, w)    (29.1)

Under a standard hypothesis of monotonicity, by Walras' law the problem can be rewritten as

    max_x u(x)    sub x ∈ Γ(p, w)

where the set Γ(p, w) = {x ∈ A : p · x = w} ⊆ ∂B(p, w) is determined by an equality constraint: the consumer exhausts his budget in the purchase of the optimal bundle. The set Γ(p, w) has no interior points, that is,

    int Γ(p, w) = ∅

Fermat's Theorem is thus useless for finding the candidate solutions of the consumer problem. The equality constraint, with its drastic topological consequences, deprives us of this fundamental result in the study of the consumer problem. Fortunately, there is an equally important result of Lagrange that rescues us, as this chapter will show.
29.2 The problem

The general form of an optimization problem with equality constraints is given by

    max_x f(x)    (29.2)
    sub g₁(x) = b₁, g₂(x) = b₂, ..., g_m(x) = b_m

where f : A ⊆ Rⁿ → R is the objective function, while the functions gᵢ : A ⊆ Rⁿ → R and
the scalars bᵢ represent m equality constraints. Throughout the chapter we assume that all
the functions f and gᵢ are continuously differentiable on a non-empty and open subset D of
their domain A; that is, ∅ ≠ D ⊆ int A.

The set

    C = {x ∈ A : gᵢ(x) = bᵢ  ∀i = 1, ..., m}    (29.3)

is the subset of A identified by the constraints. Therefore, optimization problem (29.2) can
be equivalently formulated in canonical form as

    max_x f(x)   sub x ∈ C

Nevertheless, for this special class of optimization problems we will often use the more
evocative formulation (29.2).

In what follows we will first study in detail the important special case of a single
constraint, which we will then generalize in Section 29.7 to the case of several constraints.

29.3 One constraint

29.3.1 A key lemma

With a single constraint, the optimization problem (29.2) becomes:

    max_x f(x)   sub g(x) = b    (29.4)

where f : A ⊆ Rⁿ → R is the objective function, while the function g : A ⊆ Rⁿ → R and the
scalar b define the unique equality constraint.

The next fundamental lemma gives the key to finding the solutions of problem (29.4).
The hypothesis x̂ ∈ C ∩ D requires that x̂ be a point of the choice set at which f and g are
both continuously differentiable. Moreover, we require that ∇g(x̂) ≠ 0. In this regard, note
that a point x ∈ D is said to be regular (with respect to the constraint) if ∇g(x) ≠ 0, and
singular otherwise. According to this terminology, the condition ∇g(x̂) ≠ 0 requires the point
x̂ to be regular.

Lemma 1285 Let x̂ ∈ C ∩ D be a local solution of the optimization problem (29.4). If
∇g(x̂) ≠ 0, then there exists a scalar λ̂ ∈ R such that

    ∇f(x̂) = λ̂ ∇g(x̂)    (29.5)

By unzipping gradients, the condition can be equivalently written as

    ∂f/∂x_k (x̂) = λ̂ ∂g/∂x_k (x̂)   ∀k = 1, ..., n

Thus, a necessary condition for x̂ to be a local solution of the optimization problem (29.4)
is that the gradients of the functions f and g are proportional. The "hat" above λ reminds
us that this scalar depends on the point x̂ considered.

Next we give a proof of this remarkable fact based on the Implicit Function Theorem.
Proof We prove the lemma for n = 2 (the extension to arbitrary n is routine if one uses a
version of the Implicit Function Theorem for functions of n variables). Since ∇g(x̂) ≠ 0, at
least one of the two partial derivatives ∂g/∂x₁ or ∂g/∂x₂ is non-zero at x̂. Let, for example,
(∂g/∂x₂)(x̂) ≠ 0 (in the case (∂g/∂x₁)(x̂) ≠ 0 the proof is symmetric). As seen in Section
25.3.2, the Implicit Function Theorem can be applied also to study locally points belonging
to the level curves g⁻¹(b) with b ∈ R. Since x̂ = (x̂₁, x̂₂) ∈ g⁻¹(b), this theorem yields
neighborhoods U(x̂₁) and V(x̂₂) and a unique differentiable function h : U(x̂₁) → V(x̂₂)
such that x̂₂ = h(x̂₁) and g(x₁, h(x₁)) = b for each x₁ ∈ U(x̂₁), with

    h′(x₁) = − (∂g/∂x₁)(x₁, x₂) / (∂g/∂x₂)(x₁, x₂)   ∀(x₁, x₂) ∈ g⁻¹(b) ∩ (U(x̂₁) × V(x̂₂))

Consider the auxiliary function φ : U(x̂₁) → R defined by φ(x₁) = f(x₁, h(x₁)). By the
chain rule, the derivative of φ is

    φ′(x₁) = (∂f/∂x₁)(x₁, h(x₁)) + (∂f/∂x₂)(x₁, h(x₁)) h′(x₁)

Since x̂ is a local solution of the optimization problem (29.4), there exists a neighborhood
B_ε(x̂) of x̂ such that

    f(x̂) ≥ f(x)   ∀x ∈ g⁻¹(b) ∩ B_ε(x̂)    (29.6)

Without loss of generality, suppose that ε is sufficiently small so that

    (x̂₁ − ε, x̂₁ + ε) ⊆ U(x̂₁)  and  (x̂₂ − ε, x̂₂ + ε) ⊆ V(x̂₂)

Hence, B_ε(x̂) ⊆ U(x̂₁) × V(x̂₂). This permits us to rewrite (29.6) as

    f(x̂₁, h(x̂₁)) ≥ f(x₁, h(x₁))   ∀x₁ ∈ (x̂₁ − ε, x̂₁ + ε)

that is, φ(x̂₁) ≥ φ(x₁) for every x₁ ∈ (x̂₁ − ε, x̂₁ + ε). The point x̂₁ is, therefore, a local
maximizer for φ. The first-order condition reads

    φ′(x̂₁) = (∂f/∂x₁)(x̂₁, x̂₂) − (∂f/∂x₂)(x̂₁, x̂₂) · (∂g/∂x₁)(x̂₁, x̂₂)/(∂g/∂x₂)(x̂₁, x̂₂) = 0    (29.7)

If (∂g/∂x₁)(x̂₁, x̂₂) ≠ 0, we have

    (∂f/∂x₁)(x̂₁, x̂₂) / (∂g/∂x₁)(x̂₁, x̂₂) = (∂f/∂x₂)(x̂₁, x̂₂) / (∂g/∂x₂)(x̂₁, x̂₂)

the common value of which we denote by λ̂. Then we get

    (∂f/∂x₁)(x̂₁, x̂₂) = λ̂ (∂g/∂x₁)(x̂₁, x̂₂)
    (∂f/∂x₂)(x̂₁, x̂₂) = λ̂ (∂g/∂x₂)(x̂₁, x̂₂)

or, equivalently, ∇f(x̂₁, x̂₂) = λ̂ ∇g(x̂₁, x̂₂), that is, (29.5).

If (∂g/∂x₁)(x̂₁, x̂₂) = 0, then (29.7) yields

    (∂f/∂x₁)(x̂₁, x̂₂) = 0

so that the equality

    (∂f/∂x₁)(x̂₁, x̂₂) = λ̂ (∂g/∂x₁)(x̂₁, x̂₂)

is trivially verified for every scalar λ̂. Setting

    λ̂ = (∂f/∂x₂)(x̂₁, x̂₂) / (∂g/∂x₂)(x̂₁, x̂₂)

we therefore have again ∇f(x̂₁, x̂₂) = λ̂ ∇g(x̂₁, x̂₂), that is, (29.5).

The next example shows that condition (29.5) is necessary, but not sufficient.

Example 1286 The optimization problem:

    max_{x₁,x₂} (x₁³ + x₂³)/2   sub x₁ − x₂ = 0    (29.8)

is of the form (29.4), where f, g : R² → R are given by f(x) = (x₁³ + x₂³)/2 and g(x) =
x₁ − x₂, while b = 0. We have ∇f(0, 0) = (0, 0) and ∇g(0, 0) = (1, −1), so λ̂ = 0 is such
that ∇f(0, 0) = λ̂ ∇g(0, 0). Hence, the origin (0, 0) satisfies condition (29.5) with λ̂ = 0.
But the origin is not a solution of problem (29.8):

    f(t, t) = t³ > 0 = f(0, 0)   ∀t > 0    (29.9)

Note that the origin is not even a constrained (global) minimizer since f(t, t) = t³ < 0 for
every t < 0. N

To understand intuitively condition (29.5), assume that f and g are defined on R², so
that (29.5) has the form:

    (∂f/∂x₁ (x̂), ∂f/∂x₂ (x̂)) = λ̂ (∂g/∂x₁ (x̂), ∂g/∂x₂ (x̂))

that is,

    ∂f/∂x₁ (x̂) = λ̂ ∂g/∂x₁ (x̂)  and  ∂f/∂x₂ (x̂) = λ̂ ∂g/∂x₂ (x̂)    (29.10)

The condition ∇g(x̂) ≠ 0 means that at least one of the partial derivatives (∂g/∂xᵢ)(x̂) is
different from zero. If, for convenience, we suppose that both are non-zero and that λ̂ ≠ 0,
then (29.10) is equivalent to

    (∂f/∂x₁)(x̂) / (∂g/∂x₁)(x̂) = (∂f/∂x₂)(x̂) / (∂g/∂x₂)(x̂)    (29.11)
Let us try now to understand intuitively why (29.11) is necessary for x̂ to be a solution of
the optimization problem (29.4). The differentials of f and g at x̂ are given by

    df(x̂)(h) = ∇f(x̂)·h = (∂f/∂x₁)(x̂) h₁ + (∂f/∂x₂)(x̂) h₂   ∀h ∈ R²
    dg(x̂)(h) = ∇g(x̂)·h = (∂g/∂x₁)(x̂) h₁ + (∂g/∂x₂)(x̂) h₂   ∀h ∈ R²

They linearly approximate the differences f(x̂ + h) − f(x̂) and g(x̂ + h) − g(x̂), that is, the
effect of moving from x̂ to x̂ + h on f and g. As we know well by now, such an approximation is
the better the smaller h is. Suppose, ideally, that h is infinitesimal and that the approximation
is exact, so that f(x̂ + h) − f(x̂) = df(x̂)(h) and g(x̂ + h) − g(x̂) = dg(x̂)(h). This is
clearly incorrect formally, but here we are proceeding heuristically.

Continuing in our heuristic reasoning, let us start now from the point x̂ and let us
consider variations x̂ + h with h infinitesimal. The first issue to worry about is whether they
are legitimate, i.e., whether they satisfy the equality constraint g(x̂ + h) = b. This means
that g(x̂ + h) = g(x̂), so h must be such that dg(x̂)(h) = 0. It follows that

    (∂g/∂x₁)(x̂) h₁ + (∂g/∂x₂)(x̂) h₂ = 0

and so

    h₁ = − [(∂g/∂x₂)(x̂) / (∂g/∂x₁)(x̂)] h₂    (29.12)

The effect of moving from x̂ to x̂ + h on the objective function f is given by df(x̂)(h). When
h is legitimate, by (29.12) this effect is given by

    df(x̂)(h) = − (∂f/∂x₁)(x̂) [(∂g/∂x₂)(x̂) / (∂g/∂x₁)(x̂)] h₂ + (∂f/∂x₂)(x̂) h₂    (29.13)

If x̂ is a solution of the optimization problem, we must necessarily have df(x̂)(h) = 0 for
every legitimate variation h. Otherwise, if, say, df(x̂)(h) > 0, one would have a point x̂ + h
that satisfies the equality constraint, but such that f(x̂ + h) > f(x̂). If instead df(x̂)(h) < 0,
the same observation could be made this time for −h, which is obviously a legitimate
variation, and that would lead to the point x̂ − h with f(x̂ − h) > f(x̂).

The necessary condition df(x̂)(h) = 0 together with (29.13) gives

    − (∂f/∂x₁)(x̂) [(∂g/∂x₂)(x̂) / (∂g/∂x₁)(x̂)] h₂ + (∂f/∂x₂)(x̂) h₂ = 0

If, as it is natural, we assume h₂ ≠ 0, then

    − (∂f/∂x₁)(x̂) [(∂g/∂x₂)(x̂) / (∂g/∂x₁)(x̂)] + (∂f/∂x₂)(x̂) = 0

which is precisely expression (29.11). At an intuitive level, all this explains why (29.5) is
necessary for x̂ to be a solution of the problem.
29.3.2 Lagrange's Theorem

Lemma 1285 gives a rather intuitive necessary condition for optimality. This condition can
be equivalently written as

    ∇f(x̂) − λ̂ ∇g(x̂) = 0

By recalling the algebra of gradients, the expression ∇f(x) − λ ∇g(x) makes it natural to
introduce the function L : A × R ⊆ Rⁿ × R → R defined by

    L(x, λ) = f(x) + λ (b − g(x))   ∀(x, λ) ∈ A × R    (29.14)

This function, called the Lagrangian, plays a key role in optimization problems. Its gradient
is

    ∇L(x, λ) = (∂L/∂x₁ (x, λ), ..., ∂L/∂xₙ (x, λ), ∂L/∂λ (x, λ)) ∈ Rⁿ⁺¹

It is important to distinguish in this gradient the two parts ∇ₓL and ∇_λL given by

    ∇ₓL(x, λ) = (∂L/∂x₁ (x, λ), ..., ∂L/∂xₙ (x, λ)) ∈ Rⁿ

and

    ∇_λL(x, λ) = ∂L/∂λ (x, λ) ∈ R

Using this notation, we have

    ∇ₓL(x, λ) = ∇f(x) − λ ∇g(x)    (29.15)

and

    ∇_λL(x, λ) = b − g(x)    (29.16)

which leads to the following fundamental formulation of the necessary condition of optimality
of Lemma 1285 in terms of the Lagrangian function.

Theorem 1287 (Lagrange) Let x̂ ∈ C ∩ D be a local solution of the optimization problem
(29.4). If ∇g(x̂) ≠ 0, then there exists a scalar λ̂ ∈ R, called Lagrange multiplier, such that
the pair (x̂, λ̂) ∈ Rⁿ⁺¹ is a stationary point of the Lagrangian function.

Proof Let x̂ be a local solution of the optimization problem (29.4). By Lemma 1285, there
exists λ̂ ∈ R such that

    ∇f(x̂) − λ̂ ∇g(x̂) = 0

By (29.15), this condition is equivalent to

    ∇ₓL(x̂, λ̂) = 0

On the other hand, by (29.16) we have ∇_λL(x, λ) = b − g(x), so we also have ∇_λL(x̂, λ̂) = 0
since b − g(x̂) = 0. It follows that (x̂, λ̂) is a stationary point of L.

Thanks to Lagrange's Theorem, the search for local solutions of the constrained
optimization problem (29.4) reduces to the search for the stationary points of a suitable function
of several variables, the Lagrangian function. It is a more complicated function than the
original function f because of the new variable λ, but through it the search for the solutions
of the optimization problem can be done by solving a standard first-order condition, similar
to the ones seen for unconstrained optimization problems.

Needless to say, we are discussing a condition that is only necessary: there is no guarantee
that the stationary points are actually solutions of the problem. It is already a remarkable
achievement, however, to be able to use the simple (first-order) condition

    ∇L(x, λ) = 0    (29.17)

to search for the possible candidate solutions of the constrained optimization problem (29.4).
In the next section we will see that this condition plays a fundamental role in the search for
the local solutions of problem (29.4) with Lagrange's method, which in turn may lead to
the global solutions through a version of the elimination method.
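To see the first-order condition (29.17) at work computationally, here is a small symbolic
sketch (a toy problem, max x₁ + x₂ sub x₁² + x₂² = 2, chosen only for illustration; the
Python library sympy is assumed to be available):

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
    f = x1 + x2
    g, b = x1**2 + x2**2, 2

    L = f + lam * (b - g)                          # the Lagrangian (29.14)
    foc = [sp.diff(L, v) for v in (x1, x2, lam)]   # the condition (29.17)
    print(sp.solve(foc, (x1, x2, lam), dict=True))
    # [{lam: -1/2, x1: -1, x2: -1}, {lam: 1/2, x1: 1, x2: 1}]

As the discussion above warns, the procedure only returns stationary points: here (1, 1) is
the maximizer and (−1, −1) the minimizer, and the code by itself cannot tell them apart.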

We close with two important remarks. First, observe that in general the pair (x̂, λ̂) is not
a maximizer of the Lagrangian function, even when x̂ turns out to solve the optimization
problem. The pair (x̂, λ̂) is just a stationary point of the Lagrangian function, nothing
more. Therefore, it is erroneous to assert that the search for solutions of the constrained
optimization problem reduces to the search for maximizers of the Lagrangian function.

Second, note that problem (29.4) has a symmetric version

    min_x f(x)   sub g(x) = b

in which, instead of looking for maximizers, we look for minimizers. Condition (29.5) is
necessary also for this version of problem (29.4) and, therefore, the stationary points of the
Lagrangian function could be minimizers instead of maximizers. However, it can be the
case that they are neither maximizers nor minimizers. This is the usual ambiguity of first-
order conditions, encountered also in unconstrained optimization: it reflects the fact that
first-order conditions are only necessary conditions.

29.3.3 A heuristic interpretation of the multiplier

Lagrange multipliers have a nice interpretation in terms of marginal effects. To present
it properly we would need some notions that we will only introduce later in the book, in
Chapter 33. We refer, therefore, readers to that chapter for a more complete exposition
of the marginal interpretation of multipliers (Section 33.6). Here we can sketch, however, a
heuristic argument that gives a flavor of this interpretation.

If we change the scalar b that defines the equality constraint, we have a new optimization
problem (29.4), with new solutions x̂. Suppose, for simplicity, that for each possible value of
the scalar b, the resulting optimization problem has a unique solution, denoted x̂(b), with
multiplier λ̂(b). We have, therefore, informally defined two functions x̂(·) and λ̂(·) that
associate to each value of b the solution x̂(b) and the multiplier λ̂(b) of the corresponding
optimization problem. The "optimal" objective function is then f(x̂(b)). As b varies, so
varies the maximum value that the objective function attains at the unique solution.

To ease matters, assume that n = 1 so that the choice variable x is a scalar. The equality
constraint implies that g(x̂(b)) − b = 0 for every scalar b. By a heuristic application of the
chain rule, we then have

    (∂g(x̂(b))/∂x) (dx̂(b)/db) − 1 = 0    (29.18)
On the other hand, again by a heuristic application of the chain rule we have

    df(x̂(b))/db = (df(x̂(b))/dx) (dx̂(b)/db)
                = (∂f(x̂(b))/∂x − λ̂(b) ∂g(x̂(b))/∂x) (dx̂(b)/db) + λ̂(b) (∂g(x̂(b))/∂x) (dx̂(b)/db)
                = λ̂(b) (∂g(x̂(b))/∂x) (dx̂(b)/db) = λ̂(b)

where the term in parentheses vanishes by (29.5) and the last equality follows from (29.18).
Summing up, for every scalar b we have

    df(x̂(b))/db = λ̂(b)

The multiplier is thus the "marginal maximum value" in that it quantifies the marginal
effect on the attained maximum value of (slightly) altering the constraint. For instance, in
the consumer problem the scalar b is the income of the consumer, so the multiplier quantifies
the marginal effect on the attained maximum utility of a (small) variation in income.

N.B. We are using the word "altering" rather than "relaxing" because by changing b the
choice set (29.3) does not get larger. It just becomes different. So, a priori, a change in b
might not be beneficial (indeed, the sign of the multiplier can be positive or negative). In
contrast, the word "relaxing" becomes appropriate in studying variations of the scalars that
define inequality constraints (cf. the discussion in Section 33.6). O
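A quick numerical illustration of the identity df(x̂(b))/db = λ̂(b) (a toy one-variable
problem, f(x) = −x² with constraint g(x) = x³ = b, chosen only for illustration):

    # Here x_hat(b) = b**(1/3) is the unique feasible point, and
    # lambda_hat(b) = f'(x_hat)/g'(x_hat) by condition (29.5).
    def value(b):
        return -(b ** (1 / 3)) ** 2

    b, eps = 2.0, 1e-6
    x_hat = b ** (1 / 3)
    lam_hat = (-2 * x_hat) / (3 * x_hat ** 2)
    print((value(b + eps) - value(b)) / eps)   # approximately -0.529
    print(lam_hat)                             # approximately -0.529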

29.4 The method of elimination

Lagrange's Theorem suggests the following procedure, which we may call Lagrange's method,
for the search of local solutions of the optimization problem (29.4):

1. determine the set D where the functions f and g are continuously differentiable;

2. determine the set C − D of the points of the constraint set where the functions f and
g are not continuously differentiable;

3. setting D₀ = {x ∈ D : ∇g(x) = 0}, determine the set C ∩ D₀ of the singular points
that satisfy the constraint;

4. determine the set S of the regular points x ∈ C ∩ (D − D₀) for which there exists a
Lagrange multiplier λ ∈ R such that the pair (x, λ) ∈ Rⁿ⁺¹ is a stationary point of the
Lagrangian function, that is, it satisfies the first-order condition (29.17);¹

¹ Note that S ⊆ C because the points that satisfy condition (29.17) also satisfy the constraint. It is
therefore not necessary to check whether for a point x ∈ S we also have x ∈ C.
5. the local solutions of the optimization problem (29.4), if they exist, belong to the set

    S ∪ (C ∩ D₀) ∪ (C − D)    (29.19)

Thus, according to Lagrange's method, the possible local solutions of the optimization
problem (29.4) must be searched for among the points of the subset (29.19) of C. Indeed, a
local solution that is a regular point will belong to the set S thanks to Lagrange's Theorem.
However, this theorem does not say anything about possible local solutions that are singular
points, which belong to the set C ∩ D₀, or about possible local solutions at which the
functions do not have a continuous derivative, which belong to the set C − D.

In conclusion, a necessary condition for a point x ∈ C to be a local solution of the
optimization problem (29.4) is that it belongs to the subset S ∪ (C ∩ D₀) ∪ (C − D) ⊆ C.
This is what this procedure, a key dividend of Lagrange's Theorem, establishes. Clearly, the
smaller such a set is, the more effective the application of the theorem is: the search for local
solutions can then be restricted to a significantly smaller set than the original set C.

That said, what about global solutions? If the objective function f is coercive and
continuous on C, the five phases of Lagrange's method plus the following extra sixth
phase provide a version of the elimination method to find global solutions.

6. Compute the set {f(x) : x ∈ S ∪ (C ∩ D₀) ∪ (C − D)}; if a point x̂ ∈ S ∪ (C ∩ D₀) ∪
(C − D) is such that

    f(x̂) ≥ f(x)   ∀x ∈ S ∪ (C ∩ D₀) ∪ (C − D)    (29.20)

then x̂ is a (global) solution of the optimization problem (29.4).

In other words, the points of the set (29.19) at which f attains its maximum value are the
solutions of the optimization problem. Indeed, by Lagrange's method this is the set of the
possible local solutions; global solutions, whose existence is ensured by Tonelli's Theorem,
must then belong to such a set. Hence, the solutions of the "restricted" optimization problem

    max_x f(x)   sub x ∈ S ∪ (C ∩ D₀) ∪ (C − D)    (29.21)

are also the solutions of the optimization problem (29.4). Phase 6 is based on this remarkable
fact. As for Lagrange's method, the smaller the set (29.19) is, the more effective
the application of the elimination method is. In particular, in the lucky case when it is a
singleton, the elimination method determines the unique solution of the optimization
problem, a remarkable achievement.

In sum, the elimination method is an elegant combination of a global existence result,
Tonelli's Theorem, and a local differential result, Lagrange's Theorem. In the rest of the
section we illustrate the procedure with some analytical examples. In the next section we
will consider the classic consumer problem.

Example 1288 The optimization problem:

    max_x e^(−‖x‖²)   sub Σᵢ₌₁ⁿ xᵢ = 1    (29.22)

is of the form (29.4), where f, g : Rⁿ → R are given by f(x) = e^(−‖x‖²) and g(x) = Σᵢ₌₁ⁿ xᵢ,
and b = 1. The functions are both continuously differentiable on the entire space, so D = Rⁿ.
We then trivially have C − D = ∅: at all the points of the constraint set, the functions f
and g are both continuously differentiable. We have therefore completed phases 1 and 2
of Lagrange's method.

Since ∇g(x) = (1, 1, ..., 1), there are no singular points, that is, D₀ = ∅. This completes
phase 3 of Lagrange's method.

The Lagrangian function L : Rⁿ⁺¹ → R is given by

    L(x, λ) = e^(−‖x‖²) + λ (1 − Σᵢ₌₁ⁿ xᵢ)

To find the set of its stationary points, it is necessary to solve the first-order condition (29.17),
given here by the following (nonlinear) system of n + 1 equations:

    ∂L/∂xᵢ = −2xᵢ e^(−‖x‖²) − λ = 0   ∀i = 1, ..., n
    ∂L/∂λ = 1 − Σᵢ₌₁ⁿ xᵢ = 0

We observe that for no solution we can have λ = 0. Indeed, otherwise the first n equations
would imply xᵢ = 0, which contradicts the last equation. It follows that for every solution
we have λ ≠ 0. The first n equations yield

    xᵢ = −(λ/2) e^(‖x‖²)

and, upon substituting these values in the last equation, we get

    1 + (nλ/2) e^(‖x‖²) = 0

that is,

    λ = −(2/n) e^(−‖x‖²)

Substituting this value of λ in any of the first n equations we find xᵢ = 1/n, so the only
point (x, λ) ∈ Rⁿ⁺¹ that satisfies the first-order condition (29.17) is

    (1/n, 1/n, ..., 1/n, −(2/n) e^(−1/n))

That is, S is the singleton

    S = {(1/n, 1/n, ..., 1/n)}

This completes phase 4 of Lagrange's method. Since C − D = ∅ and D₀ = ∅, we have

    S ∪ (C ∩ D₀) ∪ (C − D) = S    (29.23)

Thus, in this example the first-order condition (29.17) turns out to be necessary for any local
solution of the optimization problem (29.22). The unique element of S is, therefore, the only
candidate to be a local solution of the problem. This completes Lagrange's method.

Turn now to the elimination method, which we can use since the continuous function f
is coercive on the (non-compact, being closed but unbounded) set

    C = {x = (x₁, ..., xₙ) ∈ Rⁿ : Σᵢ₌₁ⁿ xᵢ = 1}

Indeed:

    (f ≥ t) = Rⁿ                              if t ≤ 0
              {x ∈ Rⁿ : ‖x‖ ≤ √(−lg t)}       if t ∈ (0, 1]
              ∅                               if t > 1

so the set (f ≥ t) is compact and non-empty for each t ∈ (0, 1]. Since the set in (29.23) is
a singleton, the elimination method allows us to conclude that (1/n, ..., 1/n) is the unique
solution of the optimization problem (29.22). N
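As a hedged check of the computations above (for the case n = 3; sympy assumed available),
one can verify by substitution that the point found indeed satisfies the first-order condition
(29.17):

    import sympy as sp

    n = 3
    x = [sp.Rational(1, n)] * n
    lam = -sp.Rational(2, n) * sp.exp(-sp.Rational(1, n))
    norm2 = sum(xi**2 for xi in x)          # equals 1/n at this point

    # First n equations: -2*x_i*exp(-||x||^2) - lam = 0; last: sum x_i = 1
    print([sp.simplify(-2 * xi * sp.exp(-norm2) - lam) for xi in x])  # [0, 0, 0]
    print(sum(x) - 1)                                                 # 0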

Example 1289 Given p = (p₁, ..., pₙ) ∈ Rⁿ₊₊ ∪ Rⁿ₋₋,² the optimization problem:

    max_{x₁,...,xₙ} Σᵢ₌₁ⁿ pᵢ log xᵢ   sub Σᵢ₌₁ⁿ xᵢ = 1    (29.24)

is of the form (29.4), with f, g : Rⁿ₊₊ → R given by f(x) = Σᵢ₌₁ⁿ pᵢ log xᵢ and g(x) =
Σᵢ₌₁ⁿ xᵢ, and b = 1. The functions f and g are continuously differentiable at all points of the
constraint set, i.e., C − D = ∅, and there are no singular points, i.e., D₀ = ∅. This completes
the first three phases of Lagrange's method.

The Lagrangian function L : Rⁿ₊₊ × R → R is given by

    L(x, λ) = Σᵢ₌₁ⁿ pᵢ log xᵢ + λ (1 − Σᵢ₌₁ⁿ xᵢ)

To find the set of its stationary points we need to solve the first-order condition (29.17),
given here by the following (nonlinear) system of n + 1 equations

    ∂L/∂xᵢ = pᵢ/xᵢ − λ = 0   ∀i = 1, ..., n
    ∂L/∂λ = 1 − Σᵢ₌₁ⁿ xᵢ = 0

Because the coordinates of the vector p are all different from zero, one cannot have λ = 0
for any solution. It follows that for each solution λ ≠ 0. Because x ∈ Rⁿ₊₊, the first n
equations imply pᵢ = λxᵢ, and by substituting these values in the last equation we find
λ = Σᵢ₌₁ⁿ pᵢ. Then, by substituting this value of λ in each of the first n equations we find
xᵢ = pᵢ / Σⱼ₌₁ⁿ pⱼ. Thus, the unique point (x, λ) ∈ Rⁿ⁺¹ that satisfies the first-order condition
(29.17) is

    (p₁/Σᵢ₌₁ⁿ pᵢ, p₂/Σᵢ₌₁ⁿ pᵢ, ..., pₙ/Σᵢ₌₁ⁿ pᵢ, Σᵢ₌₁ⁿ pᵢ)

so that S is the singleton

    S = {(p₁/Σᵢ₌₁ⁿ pᵢ, p₂/Σᵢ₌₁ⁿ pᵢ, ..., pₙ/Σᵢ₌₁ⁿ pᵢ)}

This completes phase 4 of Lagrange's method. Since C − D = ∅ and D₀ = ∅, we have

    S ∪ (C ∩ D₀) ∪ (C − D) = S    (29.25)

Thus, also in this example the first-order condition (29.17) is necessary for each local solution
of the optimization problem (29.24). Again, the unique element of S is the only candidate to
be a local solution of the optimization problem (29.24). This completes Lagrange's method.

We can apply the elimination method because the continuous function f is, by Lemma
847, also coercive on the set C = {x ∈ Rⁿ₊₊ : Σᵢ₌₁ⁿ xᵢ = 1}, which is not compact because
it is not closed. In view of (29.25), the elimination method implies that

    (p₁/Σᵢ₌₁ⁿ pᵢ, ..., pₙ/Σᵢ₌₁ⁿ pᵢ)

is the unique solution of the optimization problem (29.24). N

² That is, all coordinates of p are either strictly positive or strictly negative.

When the elimination method is based on Weierstrass' Theorem, rather than on the
weaker (but more widely applicable) Tonelli's Theorem, as a "by-product" we can also find
the global minimizers, that is, the points x ∈ C that solve the problem min_x f(x) sub x ∈ C.
Indeed, it is easy to see that these are the points that minimize f over S ∪ (C ∩ D₀) ∪ (C − D).
Clearly, this is no longer true with Tonelli's Theorem because it only ensures the existence
of maximizers and remains silent on possible minimizers.

Example 1290 The optimization problem:

    max_{x₁,x₂} 2x₁² + 5x₂²   sub x₁² + x₂² = 1    (29.26)

is of the form (29.4), where f, g : R² → R are given by f(x₁, x₂) = 2x₁² + 5x₂² and g(x₁, x₂) =
x₁² + x₂², while b = 1. Both f and g are continuously differentiable on the entire plane, so
D = R². Hence, C − D = ∅: at all the points of the constraint set the functions f and g are
continuously differentiable. This completes phases 1 and 2 of Lagrange's method.

We have ∇g(x) = (2x₁, 2x₂), so the origin (0, 0) is the unique singular point, that is,
D₀ = {(0, 0)}. This singular point does not satisfy the constraint, so C ∩ D₀ = ∅. This
completes phase 3 of Lagrange's method.

The Lagrangian function L : R³ → R is given by

    L(x₁, x₂, λ) = 2x₁² + 5x₂² + λ (1 − x₁² − x₂²)

To find the set of its stationary points we must solve the first-order condition (29.17), that
is, the following (nonlinear) system of three equations

    4x₁ − 2λx₁ = 0
    10x₂ − 2λx₂ = 0
    1 − x₁² − x₂² = 0

in the three unknowns x₁, x₂, and λ. We verify immediately that x₁ = x₂ = 0 satisfy the
first two equations for every value of λ, but they do not satisfy the third equation. Further,
x₁ = 0 and λ = 5 imply x₂ = ±1, while x₂ = 0 and λ = 2 imply x₁ = ±1. In conclusion,
the triples (x₁, x₂, λ) that satisfy the first-order condition (29.17) are

    {(0, 1, 5), (0, −1, 5), (1, 0, 2), (−1, 0, 2)}

so that

    S = {(0, 1), (0, −1), (1, 0), (−1, 0)}

This completes phase 4 of Lagrange's method.³ Since C − D = ∅ and C ∩ D₀ = ∅, we
conclude that

    S = S ∪ (C ∩ D₀) ∪ (C − D)    (29.27)

As in the last two examples, the first-order condition is necessary for any local solution of
the optimization problem (29.26).

Having completed Lagrange's method, let us turn to the elimination method to find the
global solutions. Since the set C = {(x₁, x₂) ∈ R² : x₁² + x₂² = 1} is compact and the function
f is continuous, we can use this method through Weierstrass' Theorem. In view of (29.27),
in phase 6 we have:

    f(0, 1) = f(0, −1) = 5 > f(1, 0) = f(−1, 0) = 2

The points (0, 1) and (0, −1) are thus the (global) solutions of the optimization problem
(29.26), while the reliance here of the elimination method on Weierstrass' Theorem makes it
possible to say that the points (1, 0) and (−1, 0) are global minimizers. N

³ Note that there are no other points that satisfy ∇L = 0. Indeed, suppose that ∇L(x̂₁, x̂₂, λ̂) = 0, with
x̂₁ ≠ 0 and x̂₂ ≠ 0. Then, from ∂L/∂x₁ = 0 we deduce λ = 2, whereas from ∂L/∂x₂ = 0 we deduce λ = 5.
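The phases above can be reproduced symbolically (a sketch assuming sympy; the polynomial
first-order system is solved exactly):

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
    f = 2*x1**2 + 5*x2**2
    L = f + lam * (1 - x1**2 - x2**2)        # the Lagrangian of (29.26)
    foc = [sp.diff(L, v) for v in (x1, x2, lam)]
    for s in sp.solve(foc, (x1, x2, lam), dict=True):
        print((s[x1], s[x2]), 'f =', f.subs(s))
    # (0, ±1) with f = 5 (the maximizers) and (±1, 0) with f = 2 (the minimizers)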

The next example illustrates the importance of singular points.

Example 1291 The optimization problem:

    max_{x₁,x₂} e^(−x₁)   sub x₁³ − x₂² = 0    (29.28)

is of the form (29.4), with f, g : R² → R given by f(x) = e^(−x₁) and g(x) = x₁³ − x₂², and
b = 0. We have D = R², hence C − D = ∅. Phases 1 and 2 of Lagrange's method have been
completed.

Moreover, we have

    ∇g(x) = (3x₁², −2x₂)

so the origin (0, 0) is the unique singular point and it also satisfies the constraint, i.e.,
D₀ = C ∩ D₀ = {(0, 0)}. This completes phase 3 of Lagrange's method.

The Lagrangian function L : R³ → R is given by

    L(x₁, x₂, λ) = e^(−x₁) + λ (x₂² − x₁³)

To find the set of its stationary points, we need to solve the first-order condition (29.17),
given here by the following (nonlinear) system of three equations:

    ∂L/∂x₁ = −e^(−x₁) − 3λx₁² = 0
    ∂L/∂x₂ = 2λx₂ = 0
    ∂L/∂λ = x₂² − x₁³ = 0

Note that for no solution we can have λ = 0. Indeed, for λ = 0 the first equation becomes
−e^(−x₁) = 0, which has no solution. Let us suppose therefore λ ≠ 0. The second equation
implies x₂ = 0, hence from the third one it follows that x₁ = 0. The first equation then
becomes −1 = 0, and this contradiction shows that the system has no solutions. Therefore,
there are no points that satisfy the first-order condition (29.17), so S = ∅. Phase 4 of
Lagrange's method shows that

    S ∪ (C ∩ D₀) ∪ (C − D) = C ∩ D₀ = {(0, 0)}    (29.29)

By Lagrange's method, the unique possible local solution of the optimization problem (29.28)
is the origin (0, 0).

Turn now to the elimination method. To use it we need to show that the continuous f is
coercive on the (non-compact, being closed but unbounded) set C = {(x₁, x₂) ∈ R² : x₁³ = x₂²}.
Note that:

    (f ≥ t) = R²                     if t ≤ 0
              (−∞, −lg t] × R       if t ∈ (0, 1]
              ∅                      if t > 1

Thus, f is not coercive on the entire plane but it is coercive on C, which is all that matters
here. Indeed, note that x₁ can satisfy the constraint x₁³ = x₂² only if x₁ ≥ 0, so that
C ⊆ R₊ × R and

    (f ≥ t) ∩ C ⊆ ((−∞, −lg t] × R) ∩ (R₊ × R) = [0, −lg t] × R   ∀t ∈ (0, 1]

If x₁ ∈ [0, −lg t], the constraint implies x₂² ∈ [0, (−lg t)³], i.e., x₂ ∈ [−√((−lg t)³), √((−lg t)³)].
It follows that

    (f ≥ t) ∩ C ⊆ [0, −lg t] × [−√((−lg t)³), √((−lg t)³)]   ∀t ∈ (0, 1]

and so (f ≥ t) ∩ C is compact because it is a closed subset of a compact set. We conclude
that f is both continuous and coercive on C. We can thus use the elimination method.
In view of (29.29), it implies that the origin, a singular point, is the only solution of the
optimization problem (29.28). N

29.5 The consumer problem

Assume that a consumer problem satisfies Walras' law, so that we can write it as

    max_x u(x)   sub x ∈ Γ(p, w)

where Γ(p, w) = {x ∈ A : p·x = w}, with strictly positive prices p ≫ 0. To best solve this
problem with the differential methods of this chapter, assume also that the utility function
u : A ⊆ Rⁿ₊ → R is continuously differentiable on int A.⁴

For instance, consumer problems that satisfy such assumptions are those featuring
a log-linear utility function u : Rⁿ₊₊ → R defined by u(x) = Σᵢ₌₁ⁿ aᵢ log xᵢ, with A =
int A = Rⁿ₊₊, or a separable utility function u : Rⁿ₊ → R defined by u(x) = Σᵢ₌₁ⁿ xᵢ, with
int A = Rⁿ₊₊ ⊆ A = Rⁿ₊ (cf. Proposition 796).

Let us first find the local solutions of the consumer problem through Lagrange's method.
The function g(x) = p·x expresses the constraint, so

    D = int A  and  Γ(p, w) − D = ∂A ∩ Γ(p, w)

Hence, the set Γ(p, w) − D consists of the boundary points of A that satisfy the constraint.⁵
Note that when A = int A, as in the log-linear case, we have Γ(p, w) − D = ∅.

From

    ∇g(x) = p   ∀x ∈ Rⁿ

it follows that there are no singular points, that is, D₀ = ∅. Hence,

    Γ(p, w) ∩ D₀ = ∅

All this completes phases 1 to 3 of Lagrange's method.


The Lagrangian function L : A × R → R is given by

    L(x, λ) = u(x) + λ (w − p·x)

so to find the set of its stationary points, it is necessary to solve the first-order condition:

    ∂L/∂xᵢ (x, λ) = ∂u(x)/∂xᵢ − λpᵢ = 0   ∀i = 1, ..., n
    ∂L/∂λ (x, λ) = w − p·x = 0

In a more compact way, we write

    ∂u(x)/∂xᵢ = λpᵢ   ∀i = 1, ..., n    (29.30)
    p·x = w    (29.31)

The fundamental condition (29.30) is read in a different way according to the interpretation,
cardinalist or ordinalist, of the utility function. Let us suppose, for simplicity, that λ ≠ 0.

In the cardinalist interpretation, the condition is recast in the equivalent form

    (∂u(x)/∂x₁)/p₁ = ··· = (∂u(x)/∂xₙ)/pₙ

⁴ Note that A ⊆ Rⁿ₊ implies int A ⊆ Rⁿ₊₊, i.e., the interior points of A have strictly positive coordinates.
⁵ Here the choice set, Γ(p, w), is by definition included in the domain A, so ∂A ∩ A ∩ Γ(p, w) = ∂A ∩ Γ(p, w).
which emphasizes that, at a bundle x which is a (local) solution of the consumer problem,
the marginal utilities of the income spent on the various goods, measured by the ratios

    (∂u(x)/∂xᵢ)/pᵢ

are all equal. Note that 1/pᵢ is the quantity of good i that can be purchased with one unit
of income.

In the ordinalist interpretation, where the notion of marginal utility becomes meaningless,
condition (29.30) is rewritten as

    (∂u(x)/∂xᵢ) / (∂u(x)/∂xⱼ) = pᵢ/pⱼ

for every pair of goods i and j of the solution bundle x. At such a bundle, therefore,
the marginal rate of substitution between each pair of goods must be equal to the ratio
between their prices, that is, MRS_{xᵢ,xⱼ} = pᵢ/pⱼ. For n = 2 we have the classic geometric
interpretation of the optimality condition for a bundle (x₁, x₂) as equality between the slope
of the indifference curve (in the sense of Section 25.3.2) and the slope of the straight line of
the budget constraint.

[Figure: the optimal bundle (x₁, x₂) at the tangency between an indifference curve and the budget line]

The ordinalist interpretation does not require the cardinalist notion of marginal utility, a
notion that, by Occam's razor, thus becomes superfluous for the study of the consumer
problem. The observation dates back to a classic 1900 work of Vilfredo Pareto and represented
a turning point in the history of utility theory, so much so that we talk of an "ordinalist
revolution".

In any case, relations (29.30) and (29.31) are first-order conditions for the consumer
problem and their resolution determines the set S of the stationary points. In conclusion,
Lagrange's method implies that the local solutions of the consumer problem must be looked
for among the points of the set

    S ∪ (∂A ∩ Γ(p, w))    (29.32)
Besides points that satisfy the first-order conditions (29.30) and (29.31), local solutions can
therefore be boundary points ∂A ∩ Γ(p, w) of the set A that satisfy the constraint.⁶

When u is coercive and continuous on Γ(p, w), we can apply the elimination method
to find the (global) solutions of the consumer problem, that is, the optimal bundles (which
are the economically meaningful notion: consumers do not care about bundles that are just
locally optimal). In view of (29.32), the solutions are the bundles x̂ ∈ S ∪ (∂A ∩ Γ(p, w))
such that

    u(x̂) ≥ u(x)   ∀x ∈ S ∪ (∂A ∩ Γ(p, w))

In other words, we have to compare the utility levels attained by the stationary points in S
and by the boundary points that satisfy the constraint in ∂A ∩ Γ(p, w). As this comparison
requires the computation of all these utility levels, the smaller the set S ∪ (∂A ∩ Γ(p, w)),
the more effective the elimination method.
Example 1292 Consider the log-linear utility function in the case n = 2, i.e.,

    u(x₁, x₂) = a log x₁ + (1 − a) log x₂

with a ∈ (0, 1). The first-order condition at every (x₁, x₂) ∈ R²₊₊ takes the form

    a/x₁ = λp₁,  (1 − a)/x₂ = λp₂    (29.33)
    p₁x₁ + p₂x₂ = w    (29.34)

Relation (29.33) implies

    a/(p₁x₁) = (1 − a)/(p₂x₂)

Substituting this in (29.34), we have

    p₁x₁ + ((1 − a)/a) p₁x₁ = w

and hence

    x₁ = a w/p₁,  x₂ = (1 − a) w/p₂

In conclusion,

    S = {(a w/p₁, (1 − a) w/p₂)}    (29.35)

Since the domain R²₊₊ is open, we have ∂A ∩ Γ(p, w) = ∅. By Lagrange's method, the
unique possible local solution of the consumer problem is the bundle

    x = (a w/p₁, (1 − a) w/p₂)    (29.36)

We turn now to the elimination method, which we can use because the continuous function u
is, by Lemma 847, coercive on the set Γ(p, w) = {x ∈ R²₊₊ : p₁x₁ + p₂x₂ = w}, which is not
compact since it is not closed. In view of (29.35), the elimination method implies that the
bundle (29.36) is the unique solution of the log-linear consumer problem, that is, the unique
optimal bundle. Note that this finding confirms what we already proved and discussed in
Section 18.7, in a more general and elegant way, through Jensen's inequality. N

⁶ When A = Rⁿ₊, they lie on the axes and are called corner solutions in the economics jargon (as remarked
earlier in the book). In the case n = 2 and A = R²₊, the corner solutions can be (0, w/p₂) and (w/p₁, 0).
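The log-linear computation can be reproduced symbolically; here is a sketch (assuming
sympy) that recovers the demand functions (29.36) directly from the first-order conditions:

    import sympy as sp

    a, p1, p2, w = sp.symbols('a p1 p2 w', positive=True)
    x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)

    L = a*sp.log(x1) + (1 - a)*sp.log(x2) + lam*(w - p1*x1 - p2*x2)
    print(sp.solve([sp.diff(L, v) for v in (x1, x2, lam)],
                   (x1, x2, lam), dict=True))
    # [{x1: a*w/p1, x2: w*(1 - a)/p2, lam: 1/w}]

Note the multiplier λ̂ = 1/w: consistently with Section 29.3.3, it is the marginal utility of
income for this specification.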

29.6 Cogito ergo solvo

The previous section shows the power of the elimination method: Lagrange's method
allowed us to find the unique candidate in R²₊₊ to be a local solution of the consumer problem,
but it could tell us nothing either about its nature (whether a maximizer, a minimizer
or something else) or about its uniqueness, a fundamental feature for an optimal bundle in
that it permits comparative statics exercises. The elimination method answers all these key
questions by showing that the unique local candidate is, indeed, the unique solution.

That said, the last example also shows the limitations of differential methods. Indeed,
as we remarked at the end of the example, in Section 18.7 we reached a more general
result without using such methods, via Jensen's inequality. The next example will show that
differential methods can actually turn out to be silly. They are not a deus ex machina that
one should always try automatically, without first thinking about the specific optimization
problem at hand, with its peculiar features that may make it possible to address it with a
direct argument.
Example 1293 Consider the separable utility function u : R²₊ → R given by u(x) = x₁ + x₂.
Suppose p₁ ≠ p₂ (as is usually the case). First, observe that C − D = {(0, w/p₂), (w/p₁, 0)}.
The first-order condition at every point (x₁, x₂) ∈ R²₊₊ becomes

    1 = λp₁,  1 = λp₂
    p₁x₁ + p₂x₂ = w

which has no solutions since p₁ ≠ p₂. Hence, S = ∅ and so

    S ∪ (C − D) = C − D = {(0, w/p₂), (w/p₁, 0)}

The unique possible local solutions of the consumer problem are, therefore, the boundary
bundles (0, w/p₂) and (w/p₁, 0). Since u is continuous on the compact set Γ(p, w) = {x ∈
R²₊ : p₁x₁ + p₂x₂ = w}, we can apply the elimination method through Weierstrass' Theorem
and conclude that (0, w/p₂) is the optimal bundle when p₂ < p₁ and (w/p₁, 0) is the optimal
bundle when p₂ > p₁.

The same result can be achieved, however, in a straightforward manner without any
differential machinery. Indeed, if we substitute the constraint x₂ = (w − p₁x₁)/p₂ into the
objective function, the optimal x₁ (and so the optimal x₂ via the budget constraint) can be
found, up to an additive constant and a positive factor, by solving the elementary
optimization problem

    max_{x₁} (p₂ − p₁) x₁   sub x₁ ∈ [0, w/p₁]

It is immediate to check that there are two boundary solutions, x̂₁ = 0 and x̂₁ = w/p₁, if,
respectively, p₁ > p₂ and p₁ < p₂. This shows how silly a mechanical use of differential
arguments can be. N

29.7 Several constraints

Consider now the general optimization problem (29.2) in which there may be multiple
equality constraints. In this section we will show that Lemma 1285 and Lagrange's Theorem can
be easily generalized to this case.

Let us write problem (29.2) as

    max_x f(x)   sub g(x) = b    (29.37)

where g = (g₁, ..., g_m) : A ⊆ Rⁿ → R^m and b = (b₁, ..., b_m) ∈ R^m. All the functions f and gᵢ are
assumed to be continuously differentiable on a non-empty open subset D ⊆ A. Thus, at all
points x ∈ D we can define the Jacobian matrix Dg(x) by

    Dg(x) = ( ∇g₁(x) )
            ( ∇g₂(x) )
            (   ...  )
            ( ∇g_m(x) )

A point x ∈ D is called regular (with respect to the constraints) if Dg(x) has full rank,
otherwise it is called singular. For instance, the Jacobian Dg(x̂) has full rank if the gradients
∇g₁(x̂), ..., ∇g_m(x̂) are linearly independent vectors of Rⁿ. In such a case, the full rank
condition requires m ≤ n, that is, that the number m of constraints be smaller than the
dimension n of the space.

Two observations about regularity: (i) when m = n, the Jacobian has full rank if and
only if it is a non-singular square matrix, that is, det Dg(x) ≠ 0;⁷ (ii) when m = 1, we have
Dg(x) = ∇g(x) and so the full rank condition amounts to requiring ∇g(x) ≠ 0, which brings
us back to the notions of regular and singular points seen in the case of a single constraint.

The following result extends Lemma 1285 to the case with multiple constraints and shows
that the regularity condition ∇g(x̂) ≠ 0 of that lemma can be generalized by requiring
the Jacobian Dg(x̂) to have full rank. In other words, x̂ must not be a singular point here
either.⁸

Lemma 1294 Let x̂ ∈ C ∩ D be a local solution of the optimization problem (29.37). If
Dg(x̂) has full rank, then there is a vector λ̂ ∈ R^m such that

    ∇f(x̂) = Σᵢ₌₁^m λ̂ᵢ ∇gᵢ(x̂)    (29.38)

The Lagrangian is now the function L : A × R^m ⊆ Rⁿ × R^m → R defined by:

    L(x, λ) = f(x) + Σᵢ₌₁^m λᵢ (bᵢ − gᵢ(x)) = f(x) + λ·(b − g(x))    (29.39)

for every (x, λ) ∈ A × R^m, and Lagrange's Theorem takes the following general form.

Theorem 1295 (Lagrange) Let x̂ ∈ C ∩ D be a local solution of the optimization problem
(29.37). If Dg(x̂) has full rank, there is a vector λ̂ ∈ R^m such that the pair (x̂, λ̂) ∈ Rⁿ⁺ᵐ
is a stationary point of the Lagrangian.

The components λ̂ᵢ of the vector λ̂ ∈ R^m are called Lagrange multipliers. The vector λ̂ is
unique whenever the vectors {∇gᵢ(x̂)}ᵢ₌₁^m are linearly independent because, in such a case,
there is a unique representation ∇f(x̂) = Σᵢ₌₁^m λ̂ᵢ ∇gᵢ(x̂).

⁷ So, in this case a point x is singular if its Jacobian matrix Dg(x) is a singular matrix. The notion of
singular point is thus consistent with the notion of singular matrix (Section 13.6.6).
⁸ We omit the proof, which generalizes that of Lemma 1285 by means of a suitable version of the Implicit
Function Theorem. We also omit the simple proof of Theorem 1295, which is similar to that of the
special case of a single constraint.

The comments that we made for Lagrange's Theorem also hold in this more general case.
In particular, the search for local candidate solutions of the constrained problem must still
be conducted following Lagrange's method, while the elimination method can still be used
to check whether such local candidates actually solve the optimum problem. The examples
will momentarily illustrate all this.

From an operational standpoint, note, however, that the first-order condition

    ∇L(x, λ) = 0

is now based on a Lagrangian L that has the more complex form (29.39). Also the form of
the set of singular points D₀ is more complex: the study of the Jacobian's determinant may
be involved, thus making the search for singular points quite hard. The best thing is often
to directly look for the singular points that satisfy the constraints, i.e., for the set C ∩ D₀,
instead of trying to determine the set D₀ first and the intersection C ∩ D₀ afterwards (as
we did in the case with one constraint). The points x ∈ C ∩ D₀ are such that gᵢ(x) = bᵢ and
the gradients ∇gᵢ(x) are linearly dependent. So, we must verify whether the system

    Σᵢ₌₁^m αᵢ ∇gᵢ(x) = 0
    g₁(x) = b₁
    ...
    g_m(x) = b_m

admits solutions (x, α) ∈ Rⁿ × R^m with α = (α₁, ..., α_m) ≠ 0, that is, with αᵢ that are not
all null. Such possible solutions identify the singular points that satisfy the constraints. To
ease calculations, it is useful to note that the system can be written as

    Σᵢ₌₁^m αᵢ ∂gᵢ(x)/∂x₁ = 0
    ...
    Σᵢ₌₁^m αᵢ ∂gᵢ(x)/∂xₙ = 0
    g₁(x) = b₁    (29.40)
    ...
    g_m(x) = b_m
Example 1296 The optimization problem:

    max_{x₁,x₂,x₃} 7x₁ − 3x₃   sub x₁² + x₂² = 1 and x₁ + x₂ − x₃ = 1    (29.41)

has the form (29.37), where f : R³ → R and g = (g₁, g₂) : R³ → R² are given by f(x) =
7x₁ − 3x₃, g₁(x₁, x₂, x₃) = x₁² + x₂² and g₂(x₁, x₂, x₃) = x₁ + x₂ − x₃, while b = (1, 1) ∈ R².

These functions are all continuously differentiable on R³, so D = R³. Hence, C − D = ∅:
at all points of the constraint set, the functions f, g₁ and g₂ are all continuously differentiable.
This completes phases 1 and 2 of Lagrange's method.

Let us find the singular points satisfying the constraints, that is, the set C ∩ D₀. The
system (29.40) becomes

    2α₁x₁ + α₂ = 0
    2α₁x₂ + α₂ = 0
    −α₂ = 0
    x₁² + x₂² = 1
    x₁ + x₂ − x₃ = 1

Since α₂ = 0, α₁ must be different from 0. This implies that x₁ = x₂ = 0, thus contradicting
the fourth equation. Therefore, there are no singular points satisfying the constraints, that is,
C ∩ D₀ = ∅. Phase 3 of Lagrange's method is thus completed.

The Lagrangian L : R⁵ → R is

    L(x₁, x₂, x₃, λ₁, λ₂) = 7x₁ − 3x₃ + λ₁ (1 − x₁² − x₂²) + λ₂ (1 − x₁ − x₂ + x₃)

To find the set of its stationary points we must solve the first-order condition (29.17), which is
given by the following (nonlinear) system of five equations

    ∂L/∂x₁ = 7 − 2λ₁x₁ − λ₂ = 0
    ∂L/∂x₂ = −2λ₁x₂ − λ₂ = 0
    ∂L/∂x₃ = −3 + λ₂ = 0
    ∂L/∂λ₁ = 1 − x₁² − x₂² = 0
    ∂L/∂λ₂ = 1 − x₁ − x₂ + x₃ = 0

in the five unknowns x₁, x₂, x₃, λ₁ and λ₂. The third equation implies λ₂ = 3, so the first
equation implies that λ₁ ≠ 0. Therefore, from the first two equations it follows that x₁ = 2/λ₁
and x₂ = −3/(2λ₁). By substituting into the fourth equation we get λ₁ = ±5/2. If
λ₁ = 5/2, we have x₁ = 4/5, x₂ = −3/5, x₃ = −4/5. If λ₁ = −5/2, we have x₁ = −4/5,
x₂ = 3/5, and x₃ = −6/5. We have thus found the two stationary points of the Lagrangian

    (4/5, −3/5, −4/5, 5/2, 3)  and  (−4/5, 3/5, −6/5, −5/2, 3)

so that

    S = {(4/5, −3/5, −4/5), (−4/5, 3/5, −6/5)}

thus completing all phases of Lagrange's method. Since C − D = ∅ and C ∩ D₀ = ∅, we
conclude that

    S ∪ (C ∩ D₀) ∪ (C − D) = S = {(4/5, −3/5, −4/5), (−4/5, 3/5, −6/5)}    (29.42)

thus proving that in this example the first-order condition (29.17) is necessary for any local
solution of the optimization problem (29.41).
We now turn to the elimination method. Clearly, the set

    C = {x = (x₁, x₂, x₃) ∈ R³ : x₁² + x₂² = 1 and x₁ + x₂ − x₃ = 1}

is closed. It is also bounded (and so compact). For the x₁ and x₂ such that x₁² + x₂² = 1
we have x₁, x₂ ∈ [−1, 1], while for the x₃ such that x₃ = x₁ + x₂ − 1 and x₁, x₂ ∈ [−1, 1] we
have x₃ ∈ [−3, 1]. It follows that C ⊆ [−1, 1] × [−1, 1] × [−3, 1], and so C is bounded. Since f is
continuous, we can thus use the elimination method through Weierstrass' Theorem. In view
of (29.42), in the last phase of the elimination method we have

    f(4/5, −3/5, −4/5) = 8  and  f(−4/5, 3/5, −6/5) = −2

Hence, (4/5, −3/5, −4/5) solves the optimum problem (29.41), while (−4/5, 3/5, −6/5) is a
minimizer. N
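The computations of this example can be double-checked symbolically (a sketch assuming
sympy; the first-order system is polynomial and is solved exactly):

    import sympy as sp

    x1, x2, x3, l1, l2 = sp.symbols('x1 x2 x3 l1 l2', real=True)
    f = 7*x1 - 3*x3
    L = f + l1*(1 - x1**2 - x2**2) + l2*(1 - x1 - x2 + x3)
    foc = [sp.diff(L, v) for v in (x1, x2, x3, l1, l2)]
    for s in sp.solve(foc, (x1, x2, x3, l1, l2), dict=True):
        print((s[x1], s[x2], s[x3]), 'f =', f.subs(s))
    # (4/5, -3/5, -4/5) with f = 8 and (-4/5, 3/5, -6/5) with f = -2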

Example 1297 The optimization problem:

    max_{x₁,x₂,x₃} x₁   sub −x₁² + x₂³ = 0 and x₃² + x₂² − 2x₂ = 0    (29.43)

also has the form (29.37), where f : R³ → R and g = (g₁, g₂) : R³ → R² are given by
f(x) = x₁, g₁(x₁, x₂, x₃) = −x₁² + x₂³, g₂(x₁, x₂, x₃) = x₃² + x₂² − 2x₂, while b = (0, 0) ∈
R².

As before, these functions are all continuously differentiable on R³, so D = R³. Therefore,
C − D = ∅: at all points of the constraint set, the functions f, g₁ and g₂ are all continuously
differentiable. This completes phases 1 and 2 of Lagrange's method.

Let us find the set C ∩ D₀ of the singular points satisfying the constraints. The system
(29.40) becomes

    −2α₁x₁ = 0
    3α₁x₂² + α₂ (2x₂ − 2) = 0
    2α₂x₃ = 0
    −x₁² + x₂³ = 0
    x₃² + x₂² − 2x₂ = 0

In light of the first and the third equations, we must consider three cases:

(i) α₁ = 0, x₃ = 0 and α₂ ≠ 0: in this case the second equation implies x₂ = 1, which
contradicts the last equation.

(ii) α₂ = 0, x₁ = 0 and α₁ ≠ 0: in this case the second equation forces x₂ = 0, and then
the last equation forces x₃ = 0, so we obtain the solution x₁ = x₂ = x₃ = 0.

(iii) x₁ = x₃ = 0: here as well we obtain the solution x₁ = x₂ = x₃ = 0.

In conclusion, the origin (0, 0, 0) is the unique singular point that satisfies the
constraints, so C ∩ D₀ = {(0, 0, 0)}. This completes phase 3 of Lagrange's method.

The Lagrangian L : R⁵ → R is given by

    L(x₁, x₂, x₃, λ₁, λ₂) = x₁ + λ₁ (x₁² − x₂³) + λ₂ (2x₂ − x₂² − x₃²)

The first-order condition (29.17) is given by the following (nonlinear) system of five equations

    ∂L/∂x₁ = 1 + 2λ₁x₁ = 0
    ∂L/∂x₂ = −3λ₁x₂² − 2λ₂ (x₂ − 1) = 0
    ∂L/∂x₃ = −2λ₂x₃ = 0
    ∂L/∂λ₁ = x₁² − x₂³ = 0
    ∂L/∂λ₂ = 2x₂ − x₂² − x₃² = 0

in the five unknowns x₁, x₂, x₃, λ₁ and λ₂. The first equation implies that λ₁ ≠ 0 and x₁ ≠ 0.
From the fourth equation it follows that x₂ ≠ 0 and so, from the second equation, we have
λ₂ ≠ 0.

Since λ₂ ≠ 0, from the third equation we have x₃ = 0, so that the fifth equation implies
that x₂ = 0 or x₂ = 2. Since x₂ = 0 contradicts what we have just established, let us take
x₂ = 2. The fourth equation then implies x₁ = ±√8, so the first equation gives
λ₁ = ∓1/(4√2), and from the second equation we get λ₂ = ±3/(2√2). In conclusion,
the stationary points of the Lagrangian are

    (√8, 2, 0, −1/(4√2), 3/(2√2))  and  (−√8, 2, 0, 1/(4√2), −3/(2√2))

and so

    S = {(√8, 2, 0), (−√8, 2, 0)}

which completes all phases of Lagrange's method. In conclusion, since C − D = ∅ we have

    S ∪ (C ∩ D₀) ∪ (C − D) = S ∪ (C ∩ D₀) = {(√8, 2, 0), (−√8, 2, 0), (0, 0, 0)}    (29.44)

Among these three points one must search for the possible local solutions of the optimization
problem (29.43).

As to the elimination method, here too the set

    C = {x = (x₁, x₂, x₃) ∈ R³ : x₂³ = x₁² and x₃² + x₂² = 2x₂}

is clearly closed. It is also bounded (and so compact). In fact, the second constraint can be
written as x₃² + (x₂ − 1)² = 1, and so the x₂ and x₃ that satisfy it are such that x₂ ∈ [0, 2]
and x₃ ∈ [−1, 1]. Now, the constraint x₂³ = x₁² implies x₁² ∈ [0, 8], and so x₁ ∈ [−√8, √8].
We conclude that C ⊆ [−√8, √8] × [0, 2] × [−1, 1], and so C is bounded. As in the previous
example, we can thus use the elimination method through Weierstrass' Theorem. In view of
(29.44), in the last phase of the elimination method we have

    f(√8, 2, 0) = √8,  f(−√8, 2, 0) = −√8  and  f(0, 0, 0) = 0

Hence, (√8, 2, 0) solves the optimum problem (29.43), while (−√8, 2, 0) is a minimizer; the
singular point (0, 0, 0) is neither.
N

Example 1298 The optimization problem:

    max_{x₁,x₂,x₃} −(x₁² + x₂² + x₃²)   sub x₁² − x₂ = 1 and x₁ + x₃ = 1    (29.45)

has the form (29.37), where f : R³ → R and g = (g₁, g₂) : R³ → R² are given by
f(x₁, x₂, x₃) = −(x₁² + x₂² + x₃²), g₁(x₁, x₂, x₃) = x₁² − x₂ and g₂(x₁, x₂, x₃) = x₁ + x₃,
while b = (1, 1) ∈ R².

As in the previous examples, all these functions are continuously differentiable on R³, so
D = R³. Therefore C − D = ∅, which completes phases 1 and 2 of Lagrange's method.

In this case we will directly study the rank of the Jacobian:

    Dg(x) = ( 2x₁  −1  0 )
            (  1    0  1 )

It is easy to see that for no value of x₁ the two row vectors, that is, the two gradients
∇g₁(x) and ∇g₂(x), are linearly dependent.⁹ Therefore, there are no singular points, that
is, D₀ = ∅. It follows that C ∩ D₀ = ∅, and so we have concluded phase 3 of Lagrange's
method.

Let us now move to the search for the stationary points of the Lagrangian L : R⁵ → R,
which is given by

    L(x₁, x₂, x₃, λ₁, λ₂) = −(x₁² + x₂² + x₃²) + λ₁ (1 − x₁² + x₂) + λ₂ (1 − x₁ − x₃)

To find such points we must solve the following (nonlinear) system of five equations

    ∂L/∂x₁ = −2x₁ − 2λ₁x₁ − λ₂ = 0
    ∂L/∂x₂ = −2x₂ + λ₁ = 0
    ∂L/∂x₃ = −2x₃ − λ₂ = 0
    ∂L/∂λ₁ = 1 − x₁² + x₂ = 0
    ∂L/∂λ₂ = 1 − x₁ − x₃ = 0

We have that λ₁ = 2x₂ and λ₂ = −2x₃, which, substituted in the first equation, lead to
the following (nonlinear) system of three equations:

    x₁ + 2x₁x₂ − x₃ = 0
    1 − x₁² + x₂ = 0
    1 − x₁ − x₃ = 0

From the last two equations it follows that x₂ = x₁² − 1 and x₃ = 1 − x₁, which, substituted
in the first equation, imply that 2x₁³ − 1 = 0, from which x₁ = 1/∛2 follows and so

    x₂ = 1/∛4 − 1  and  x₃ = 1 − 1/∛2

Therefore, the Lagrangian has a unique stationary point

    (1/∛2, 1/∛4 − 1, 1 − 1/∛2, 2/∛4 − 2, −2 + ∛4)

so that

    S = {(1/∛2, 1/∛4 − 1, 1 − 1/∛2)}

This completes all phases of Lagrange's method. In conclusion, since C − D = ∅ and D₀ = ∅
we have

    S ∪ (C ∩ D₀) ∪ (C − D) = S = {(1/∛2, 1/∛4 − 1, 1 − 1/∛2)}    (29.46)

There is a unique candidate local solution of the optimization problem (29.45).

Let us consider the elimination method. The set

    C = {x = (x₁, x₂, x₃) ∈ R³ : x₁² − x₂ = 1 and x₁ + x₃ = 1}

is closed but not bounded (so it is not compact). In fact, consider the sequence {xₙ} given
by xₙ = (√(1 + n), n, 1 − √(1 + n)). The sequence belongs to C, but ‖xₙ‖ → +∞ and so there is
no neighborhood in R³ that may contain it. On the other hand, by Proposition 820 the
function f is coercive and continuous on C. As in the last two examples, we can thus use the
elimination method, but this time via Tonelli's Theorem. In view of (29.46), the elimination
method implies that the point

    (1/∛2, 1/∛4 − 1, 1 − 1/∛2)

is the solution of the optimization problem (29.45). In this case the elimination method
is silent about possible minimizers because it relies on Tonelli's Theorem rather than on
Weierstrass' Theorem. N

⁹ At a "mechanical" level, one can easily verify that for no value of x₁ the matrix Dg(x) fails to have full rank.
Chapter 30

Inequality constraints

30.1 Introduction

Let us go back to the consumer problem seen at the beginning of the previous chapter, in
which we considered a consumer with utility function u : A ⊆ Rⁿ → R and income w ≥ 0.
Given the vector p ∈ Rⁿ₊ of the prices of the goods, because of Walras' law we wrote his budget
constraint as

    C(p, w) = {x ∈ A : p·x = w}

and his optimization problem as:

    max_x u(x)   sub x ∈ C(p, w)    (30.1)

In this formulation we assumed that the consumer exhausts his budget (hence the equality in
the budget constraint) and we did not impose other constraints on the bundle x except that
of satisfying the budget constraint. However, the hypothesis that income is entirely spent
may be too strong, so one may wonder what happens to the consumer optimization problem
if we weaken the constraint to p·x ≤ w, that is, if the constraint is given by an inequality
and no longer by an equality.

As to the bundles of goods x, in many cases it is meaningless to talk of negative quantities.
Think, for example, of the purchase of physical goods, say fruit or vegetables in an open air
market, in which the quantity purchased has to be positive. This suggests imposing the
positivity constraint x ≥ 0 in the optimization problem.

Keeping in mind these observations, the consumer problem becomes:

    max_x u(x)    (30.2)
    sub p·x ≤ w and x ≥ 0

with constraints now given by inequalities. If we write the budget set as

    C(p, w) = {x ∈ A : x ≥ 0 and p·x ≤ w}    (30.3)

the optimization problem still takes the form (30.1), but the budget set C(p, w) is now
different.

The general form of an optimization problem with both equality and inequality constraints
is:

    max_x f(x)    (30.4)
    sub gᵢ(x) = bᵢ  ∀i ∈ I
        hⱼ(x) ≤ cⱼ  ∀j ∈ J

where I and J are finite sets of indices (possibly empty), f : A ⊆ Rⁿ → R is the objective
function, the functions gᵢ : A ⊆ Rⁿ → R and the associated scalars bᵢ characterize |I|
equality constraints, while the functions hⱼ : A ⊆ Rⁿ → R with the associated scalars cⱼ
induce |J| inequality constraints. We continue to assume, as in the previous chapter, that
the functions f, gᵢ and hⱼ are continuously differentiable on a non-empty and open subset
D of their domain A.

The optimization problem (30.4) can be equivalently formulated in canonical form as

    max_x f(x)   sub x ∈ C

where the choice set is

    C = {x ∈ A : gᵢ(x) = bᵢ and hⱼ(x) ≤ cⱼ  ∀i ∈ I, ∀j ∈ J}    (30.5)

The formulation (30.4) is extremely flexible. It encompasses the optimization problem
with only equality constraints, which is the special case I ≠ ∅ and J = ∅. It reduces to
an unconstrained optimization problem when I = J = ∅ and A is open. Moreover, observe
that:

(i) A constraint of the form h(x) ≥ c can be included in the formulation (30.4) by
considering −h(x) ≤ −c. In particular, the constraint x ≥ 0 can be included by considering
−x ≤ 0;

(ii) A constrained minimization problem for f can be written in the formulation (30.4) by
considering −f.

These two observations show the scope and flexibility of formulation (30.4). In particular,
in light of (ii) it should be clear that also the choice of the sign ≤ in expressing the inequality
constraints is just a convention. That said, next we give some discipline to this formulation.

Definition 1299 The problem (30.4) is said to be well posed if, for each j ∈ J, there exists
x ∈ C such that hⱼ(x) < cⱼ.

To understand this definition, observe that an equality constraint g(x) = b can be written
in the form of inequality constraints as g(x) ≤ b and −g(x) ≤ −b. This blurs the
distinction between equality and inequality constraints in (30.4). To avoid this, and so
to have a clear distinction between the two types of constraints, in what follows we will
always consider optimization problems (30.4) that are well posed, so that it is not possible
to express equality constraints in the form of inequality constraints. Naturally, Definition
1299 is automatically satisfied when J = ∅, so that there are no inequality constraints to worry
about.

Example 1300 (i) The optimization problem:
$$\max_{x_1, x_2, x_3} \; x_1^2 + x_2^2 + x_3^3 \quad \text{sub } x_1 + x_2 - x_3 = 1 \text{ and } x_1^2 + x_2^2 \le 1$$
is of the form (30.4) with $|I| = |J| = 1$, $f(x) = x_1^2 + x_2^2 + x_3^3$, $g(x) = x_1 + x_2 - x_3$,
$h(x) = x_1^2 + x_2^2$ and $b = c = 1$.^1 These functions are continuously differentiable, so $D = \mathbb{R}^3$.
Moreover, $C = \{x \in \mathbb{R}^3 : x_1 + x_2 - x_3 = 1 \text{ and } x_1^2 + x_2^2 \le 1\}$.

^1 To be pedantic, here we should have set $I = J = \{1\}$, $g_1(x) = x_1 + x_2 - x_3$, $h_1(x) = x_1^2 + x_2^2$ and $b_1 = c_1 = 1$. But, in this case of a single equality constraint and of a single inequality constraint, the subscripts just make the notation heavy.
(ii) The optimization problem:
$$\max_{x_1, x_2, x_3} \; x_1 \quad \text{sub } x_1^2 + x_2^3 = 0 \text{ and } x_3^2 + x_2^2 - 2x_2 = 0$$
is of the form (30.4) with $I = \{1, 2\}$, $J = \emptyset$, $f(x) = x_1$, $g_1(x) = x_1^2 + x_2^3$, $g_2(x) = x_3^2 + x_2^2 - 2x_2$ and $b_1 = b_2 = 0$. These functions are continuously differentiable, so $D = \mathbb{R}^3$.
Moreover, $C = \{x \in \mathbb{R}^3 : x_2^3 = -x_1^2 \text{ and } x_3^2 + x_2^2 = 2x_2\}$.
(iii) The optimization problem:
$$\max_{x_1, x_2, x_3} \; e^{x_1 + x_2 + x_3} \quad \text{sub } x_1 + x_2 + x_3 = 1, \; x_1^2 + x_2^2 + x_3^2 = \frac{1}{2}, \; x_1 \ge 0 \text{ and } x_2 \le \frac{1}{10}$$
is of the form (30.4) with $I = J = \{1, 2\}$, $f(x) = e^{x_1+x_2+x_3}$, $g_1(x) = x_1 + x_2 + x_3$,
$g_2(x) = x_1^2 + x_2^2 + x_3^2$, $h_1(x) = -x_1$, $h_2(x) = x_2$, $b_1 = 1$, $b_2 = 1/2$, $c_1 = 0$ and $c_2 = 1/10$.
These functions are continuously differentiable, so $D = \mathbb{R}^3$. Moreover,
$$C = \left\{x \in \mathbb{R}^3 : x_1 + x_2 + x_3 = 1, \; x_1^2 + x_2^2 + x_3^2 = \frac{1}{2}, \; x_1 \ge 0 \text{ and } x_2 \le \frac{1}{10}\right\}$$

(iv) The optimization problem:
$$\max_{x_1, x_2} \; x_1^3 - x_2^3 \quad \text{sub } x_1 + x_2 \le 1 \text{ and } -x_1 + x_2 \le 1$$
is of the form (30.4) with $I = \emptyset$, $J = \{1, 2\}$, $f(x) = x_1^3 - x_2^3$, $h_1(x) = x_1 + x_2$, $h_2(x) = x_2 - x_1$ and $c_1 = c_2 = 1$. These functions are continuously differentiable, so $D = \mathbb{R}^2$.
Moreover, $C = \{x \in \mathbb{R}^2 : x_1 + x_2 \le 1 \text{ and } x_2 \le 1 + x_1\}$.
(v) The minimum problem:
$$\min_{x_1, x_2, x_3} \; x_1 + x_2 + x_3 \quad \text{sub } x_1 + x_2 = 1 \text{ and } x_2^2 + x_3^2 \ge \frac{1}{2}$$

can be written in the form (30.4) as
$$\max_{x_1, x_2, x_3} \; -(x_1 + x_2 + x_3) \quad \text{sub } x_1 + x_2 = 1 \text{ and } -x_2^2 - x_3^2 \le -\frac{1}{2}$$
N

O.R. An optimization problem with inequality constraints is often written as
$$\max_x \; f(x) \quad \text{sub } g_1(x) \le b_1, \; g_2(x) \le b_2, \; \dots, \; g_m(x) \le b_m \tag{30.6}$$
where $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is our objective function, while the functions $g_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and
the scalars $b_i \in \mathbb{R}$ induce $m$ inequality constraints. As we already noted, this formulation
may include equality constraints $g(x) = b$ via two inequality constraints $g(x) \le b$ and
$-g(x) \le -b$. Note, however, that this formulation requires the presence of at least one
constraint (it is the case $m \ge 1$) and hence it is less general than (30.4). Moreover, the
indirect way in which (30.6) encompasses the equality constraints may make the formulation
of the results less transparent. This is a further reason why we chose the formulation (30.4),
in which the equality constraints are fully specified. H
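As a small sanity check of this equivalence, the following sketch solves an illustrative toy problem (objective and data of our own choosing, not from the text) once with a genuine equality constraint and once with the two inequalities $g(x) \le b$ and $-g(x) \le -b$; the two solutions coincide.

```python
# A toy check that an equality constraint g(x) = b can be emulated by the
# two inequalities g(x) <= b and -g(x) <= -b, as discussed above.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2)**2 + (x[1] + 1)**2   # minimize; any smooth f works
g = lambda x: x[0] + x[1]                     # constraint g(x) = 1

eq  = [{"type": "eq",   "fun": lambda x: g(x) - 1}]
two = [{"type": "ineq", "fun": lambda x: 1 - g(x)},    # g(x) <= 1
       {"type": "ineq", "fun": lambda x: g(x) - 1}]    # g(x) >= 1

x0 = np.zeros(2)
print(minimize(f, x0, method="SLSQP", constraints=eq).x)    # ~ (2, -1)
print(minimize(f, x0, method="SLSQP", constraints=two).x)   # same point
```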

30.2 Resolution of the problem

In this section we extend to the optimization problem (30.4) the solution methods studied
in the previous chapter for the special case with only equality constraints (29.2). To do this,
we first need to find the general version of Lemma 1294 that also holds for problem (30.4).
To this end, for a given point $x \in A$, set
$$A(x) = I \cup \{j \in J : h_j(x) = c_j\} \tag{30.7}$$

In words, $A(x)$ is the set of the indices of the so-called binding constraints at $x$, that is, of
the constraints that hold as equalities at the given point $x$. For example, in the problem
$$\max_{x_1, x_2, x_3} \; f(x_1, x_2, x_3) \quad \text{sub } x_1 + x_2 - x_3 = 1 \text{ and } x_1^2 + x_2^2 \le 1$$
the first constraint is binding at all the points of the choice set $C$, while the second constraint
is, for instance, binding at the point $(1/\sqrt{2}, 1/\sqrt{2}, \sqrt{2} - 1)$ and is not binding at the point
$(1/2, 1/2, 0)$.^2

Definition 1301 A point $x \in D$ is said to be regular (with respect to the constraints) if
the gradients $\nabla g_i(x)$ and the gradients $\nabla h_j(x)$, with $j \in A(x)$, are linearly independent.
Otherwise, it is singular.

^2 So, $A(1/\sqrt{2}, 1/\sqrt{2}, \sqrt{2} - 1) = \{1, 2\}$ and $A(1/2, 1/2, 0) = \{1\}$.
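A direct transcription of (30.7) is straightforward; the sketch below computes the binding set for the example just discussed (the representation of the constraints and the tolerance are our own illustrative choices).

```python
# Computing the index set A(x) of (30.7): all equality constraints plus the
# inequality constraints that hold with equality at x.
import numpy as np

g_funcs = [lambda x: x[0] + x[1] - x[2]]   # g_1(x) = x1 + x2 - x3 (b_1 = 1)
h_funcs = [lambda x: x[0]**2 + x[1]**2]    # h_1(x) = x1^2 + x2^2, c_1 = 1
c_levels = [1.0]

def binding_set(x, tol=1e-9):
    I = {("g", i + 1) for i in range(len(g_funcs))}     # equalities: always in
    J = {("h", j + 1) for j in range(len(h_funcs))
         if abs(h_funcs[j](x) - c_levels[j]) < tol}
    return I | J

s = 2 ** -0.5
print(binding_set(np.array([s, s, 2 * s - 1])))   # both constraints binding
print(binding_set(np.array([0.5, 0.5, 0.0])))     # only the equality binds
```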

In other words, a point $x \in D$ is regular if the gradients of the functions that induce
constraints binding at such a point are linearly independent. This condition generalizes the
notion of regularity upon which Lemma 1294 was based. Indeed, if we form the matrix whose
rows consist of the gradients of the functions that induce binding constraints at the point
considered, the regularity of the point amounts to requiring that such a matrix has full rank.

Note that, in view of Corollary 89-(ii), a point is regular only if $|A(x)| \le n$, that is, only
if the number of the binding constraints at $x$ does not exceed the dimension of the space on
which the optimization problem is defined.

We can now state the generalization of Lemma 1294 for problem (30.4). In reading it,
note how the vector $\hat{\mu}$ associated to the inequality constraints has positive sign, while there
is no restriction on the sign of the vector $\hat{\lambda}$ associated to the equality constraints.

Lemma 1302 Let $\hat{x} \in C \cap D$ be a local solution of the optimization problem (30.4). If $\hat{x}$ is
regular, then there exist a vector $\hat{\lambda} \in \mathbb{R}^{|I|}$ and a vector $\hat{\mu} \in \mathbb{R}_+^{|J|}$ such that
$$\nabla f(\hat{x}) = \sum_{i \in I} \hat{\lambda}_i \nabla g_i(\hat{x}) + \sum_{j \in J} \hat{\mu}_j \nabla h_j(\hat{x}) \tag{30.8}$$
$$\hat{\mu}_j \left(c_j - h_j(\hat{x})\right) = 0 \quad \forall j \in J \tag{30.9}$$

By unzipping gradients, condition (30.8) can be equivalently written as
$$\frac{\partial f}{\partial x_k}(\hat{x}) = \sum_{i \in I} \hat{\lambda}_i \frac{\partial g_i}{\partial x_k}(\hat{x}) + \sum_{j \in J} \hat{\mu}_j \frac{\partial h_j}{\partial x_k}(\hat{x}) \quad \forall k = 1, \dots, n$$

This lemma generalizes both Fermat's Theorem and Lemma 1294. Indeed:

(i) if $I = J = \emptyset$, condition (30.8) reduces to the condition $\nabla f(\hat{x}) = 0$ of Fermat's Theorem;

(ii) if $I \ne \emptyset$ and $J = \emptyset$, condition (30.8) reduces to the condition $\nabla f(\hat{x}) = \sum_{i \in I} \hat{\lambda}_i \nabla g_i(\hat{x})$ of Lemma 1294.

The novelty of Lemma 1302 relative to these previous results is, besides the positivity of
the vector $\hat{\mu}$ associated to the inequality constraints, the condition (30.9). To understand
the role of this condition, the following characterization is useful.

Lemma 1303 Condition (30.9) holds if and only if $\hat{\mu}_j = 0$ for each $j$ such that $h_j(\hat{x}) < c_j$,
that is, for each $j \notin A(\hat{x})$.

Proof Assume (30.9). If $h_j(\hat{x}) < c_j$, then $c_j - h_j(\hat{x}) > 0$, and so (30.9) implies $\hat{\mu}_j = 0$.
Conversely, if this last property holds we have
$$\hat{\mu}_j \left(c_j - h_j(\hat{x})\right) = 0 \quad \forall j \in J \tag{30.10}$$
because, being $h_j(\hat{x}) \le c_j$ for each $j \in J$, we have either $h_j(\hat{x}) < c_j$ or $h_j(\hat{x}) = c_j$. Condition
(30.10) immediately implies (30.9).

In words, (30.9) is equivalent to requiring the nullity of each $\hat{\mu}_j$ associated to a non-binding
constraint. Hence, we can have $\hat{\mu}_j > 0$ only if the constraint $j$ is binding at the solution $\hat{x}$.

For example, if $\hat{x}$ is such that $h_j(\hat{x}) < c_j$ for each $j \in J$, i.e., if at $\hat{x}$ all the inequality
constraints are non-binding, then we have $\hat{\mu}_j = 0$ for each $j \in J$ and the vector $\hat{\mu}$ does not
play any role in the determination of $\hat{x}$. Naturally, this reflects the fact that for this solution
$\hat{x}$ the inequality constraints themselves do not play any role.

The next example shows that conditions (30.8) and (30.9) are necessary, but not sufficient
(something not surprising, since the same is true for Fermat's Theorem and for Lemma 1294).

Example 1304 Consider the optimization problem:
$$\max_{x_1, x_2} \; -\frac{x_1^3 + x_2^3}{2} \quad \text{sub } x_1 + x_2 \le 0 \tag{30.11}$$
It is a simple modification of Example 1286, and has the form (30.4) with $f, h : \mathbb{R}^2 \to \mathbb{R}$
given by $f(x) = -2^{-1}(x_1^3 + x_2^3)$ and $h(x) = x_1 + x_2$, while $c = 0$. We have:
$$\nabla f(0, 0) = (0, 0) \quad \text{and} \quad \nabla h(0, 0) = (1, 1)$$
and
$$\nabla f(0, 0) = 0 \cdot \nabla h(0, 0), \qquad 0 \cdot (0 - 0) = 0$$
The origin $(0, 0)$ thus satisfies conditions (30.8) and (30.9) with $\hat{\mu} = 0$, but it is not a solution
of the optimization problem (30.11), as (29.9) shows. N
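The failure of sufficiency can also be seen numerically. The sketch below, which assumes the reconstruction $f(x) = -(x_1^3 + x_2^3)/2$ with the constraint $x_1 + x_2 \le 0$, checks that the origin satisfies the Kuhn-Tucker conditions with multiplier zero and then exhibits a feasible point with a strictly larger value.

```python
# Example 1304 in numbers: the origin is a Kuhn-Tucker point (mu = 0) but
# feasible points like (-t, 0) give a strictly larger objective value.
import numpy as np

f = lambda x: -(x[0]**3 + x[1]**3) / 2
h = lambda x: x[0] + x[1]                         # constraint h(x) <= 0

grad_f = lambda x: -1.5 * np.array([x[0]**2, x[1]**2])

x_hat, mu = np.zeros(2), 0.0
print(grad_f(x_hat), mu * (0 - h(x_hat)))         # (0, 0) and 0: KT holds

t = 0.1
x_better = np.array([-t, 0.0])                    # feasible improvement
print(h(x_better) <= 0, f(x_better) > f(x_hat))   # True True: not optimal
```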

We defer the proof of Lemma 1302 to the appendix.^3 It is possible, however, to give a
heuristic proof of this lemma by reducing problem (30.4) to a problem with only equality
constraints, and then by exploiting the results seen in the previous chapter. For simplicity,
we give this argument for the special case
$$\max_x \; f(x) \quad \text{sub } g(x) = b \text{ and } h(x) \le c \tag{30.12}$$
where $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is the objective function, and $g, h : A \subseteq \mathbb{R}^n \to \mathbb{R}$ induce one equality
and one inequality constraint.

Define $H : A \times \mathbb{R} \subseteq \mathbb{R}^{n+1} \to \mathbb{R}$ as $H(x, z) = h(x) + z^2$ for each $x \in A$ and each $z \in \mathbb{R}$.
Given $x \in A$, we have $h(x) \le c$ if and only if there exists $z \in \mathbb{R}$ such that $h(x) + z^2 = c$,
i.e., if and only if $H(x, z) = c$.^4
^3 A noteworthy feature of this proof is that, for a change, it does not rely on the Implicit Function Theorem, unlike the proof that we gave for Lemma 1285 (the special case of Lemma 1294 that we proved).
^4 The positivity of the square $z^2$ preserves the inequality $h(x) \le c$. The auxiliary variable $z$ is often called slack variable.

Define $F : A \times \mathbb{R} \subseteq \mathbb{R}^{n+1} \to \mathbb{R}$ and $G : A \times \mathbb{R} \subseteq \mathbb{R}^{n+1} \to \mathbb{R}$ by $F(x, z) = f(x)$ and
$G(x, z) = g(x)$ for each $x \in A$ and each $z \in \mathbb{R}$. The dependence of $F$ and $G$ on $z$ is only
fictitious, but it allows us to formulate the following optimization problem:
$$\max_{x, z} \; F(x, z) \quad \text{sub } G(x, z) = b \text{ and } H(x, z) = c \tag{30.13}$$
Problems (30.12) and (30.13) are equivalent: $\hat{x}$ is a solution of problem (30.12) if and only if
there exists $\hat{z} \in \mathbb{R}$ such that $(\hat{x}, \hat{z})$ is a solution of problem (30.13).

We have, therefore, reduced problem (30.12) to a problem with only equality constraints.
By Lemma 1294, $(\hat{x}, \hat{z})$ is a solution of such a problem only if there exists a vector $(\hat{\lambda}, \hat{\mu}) \in \mathbb{R}^2$
such that:
$$\nabla F(\hat{x}, \hat{z}) = \hat{\lambda} \nabla G(\hat{x}, \hat{z}) + \hat{\mu} \nabla H(\hat{x}, \hat{z})$$
that is, only if
$$\frac{\partial F}{\partial x_i}(\hat{x}, \hat{z}) = \hat{\lambda} \frac{\partial G}{\partial x_i}(\hat{x}, \hat{z}) + \hat{\mu} \frac{\partial H}{\partial x_i}(\hat{x}, \hat{z}) \quad \forall i = 1, \dots, n$$
$$\frac{\partial F}{\partial z}(\hat{x}, \hat{z}) = \hat{\lambda} \frac{\partial G}{\partial z}(\hat{x}, \hat{z}) + \hat{\mu} \frac{\partial H}{\partial z}(\hat{x}, \hat{z})$$
which is equivalent to:
$$\nabla f(\hat{x}) = \hat{\lambda} \nabla g(\hat{x}) + \hat{\mu} \nabla h(\hat{x}), \qquad 2\hat{\mu}\hat{z} = 0$$
On the other hand, we have $2\hat{\mu}\hat{z} = 0$ if and only if $\hat{\mu}\hat{z}^2 = 0$, that is, since $\hat{z}^2 = c - h(\hat{x})$, if
and only if $\hat{\mu}(c - h(\hat{x})) = 0$. In view of the equivalence between problems (30.12) and (30.13),
we conclude that if $\hat{x}$ is a solution of problem (30.12), then there exists a vector $(\hat{\lambda}, \hat{\mu}) \in \mathbb{R}^2$
such that:
$$\nabla f(\hat{x}) = \hat{\lambda} \nabla g(\hat{x}) + \hat{\mu} \nabla h(\hat{x})$$
$$\hat{\mu}\left(c - h(\hat{x})\right) = 0$$
These are conditions (30.8) and (30.9) of Lemma 1302. What we have not been able to prove
is the positivity of the multiplier $\hat{\mu}$, and for this reason the proof just seen is incomplete.^5
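The slack-variable trick lends itself to a quick numerical illustration. In the sketch below the objective, constraint and level $c$ are toy data of our own choosing; the original inequality-constrained problem and its augmented equality-constrained version return the same $x$-part.

```python
# Slack-variable reformulation: h(x) <= c is replaced by h(x) + z^2 = c in
# an augmented problem, here with toy data f = -(x1^2+x2^2), h = x1+x2, c = -1.
import numpy as np
from scipy.optimize import minimize

f = lambda x: -(x[0]**2 + x[1]**2)
h = lambda x: x[0] + x[1]
c = -1.0

# Original form: maximize f subject to h(x) <= c.
orig = minimize(lambda x: -f(x), np.array([-1.0, -1.0]), method="SLSQP",
                constraints=[{"type": "ineq", "fun": lambda x: c - h(x)}])

# Augmented form: maximize F(x, z) = f(x) subject to H(x, z) = h(x) + z^2 = c.
aug = minimize(lambda v: -f(v[:2]), np.array([-1.0, -1.0, 0.0]), method="SLSQP",
               constraints=[{"type": "eq",
                             "fun": lambda v: h(v[:2]) + v[2]**2 - c}])

print(orig.x, aug.x[:2])   # both ~ (-0.5, -0.5), up to solver tolerance
```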

30.2.1 Kuhn-Tucker's Theorem

In view of Lemma 1302, the Lagrangian function associated to the optimization problem
(30.4) is the function
$$L : A \times \mathbb{R}^{|I|} \times \mathbb{R}_+^{|J|} \subseteq \mathbb{R}^{n+|I|+|J|} \to \mathbb{R}$$
defined by:^6
$$L(x, \lambda, \mu) = f(x) + \sum_{i \in I} \lambda_i (b_i - g_i(x)) + \sum_{j \in J} \mu_j (c_j - h_j(x)) \tag{30.14}$$
$$= f(x) + \lambda \cdot (b - g(x)) + \mu \cdot (c - h(x))$$

^5 Since it is, in any case, a heuristic argument, for simplicity we did not check the rank condition required by Lemma 1294.
^6 The notation $(x, \lambda, \mu)$ underlines the different status of $x$ with respect to $\lambda$ and $\mu$.

for each $(x, \lambda, \mu) \in A \times \mathbb{R}^{|I|} \times \mathbb{R}_+^{|J|}$. Note that the vector $\mu$ is required to be positive.

The next famous result, proved in 1951 by Harold Kuhn and Albert Tucker, generalizes
Lagrange's Theorem to the optimization problem (30.4). We omit the proof because it is
analogous to that of Lagrange's Theorem.

Theorem 1305 (Kuhn-Tucker) Let $\hat{x} \in C \cap D$ be a local solution of the optimization
problem (30.4). If $\hat{x}$ is regular, then there exists a pair of vectors $(\hat{\lambda}, \hat{\mu}) \in \mathbb{R}^{|I|} \times \mathbb{R}_+^{|J|}$ such
that the triple $(\hat{x}, \hat{\lambda}, \hat{\mu})$ satisfies the conditions:
$$\nabla_x L(\hat{x}, \hat{\lambda}, \hat{\mu}) = 0 \tag{30.15}$$
$$\hat{\mu}_j \nabla_{\mu_j} L(\hat{x}, \hat{\lambda}, \hat{\mu}) = 0 \quad \forall j \in J \tag{30.16}$$
$$\nabla_\lambda L(\hat{x}, \hat{\lambda}, \hat{\mu}) = 0 \tag{30.17}$$
$$\nabla_\mu L(\hat{x}, \hat{\lambda}, \hat{\mu}) \ge 0 \tag{30.18}$$

The components $\hat{\lambda}_i$ and $\hat{\mu}_j$ of the vectors $\hat{\lambda}$ and $\hat{\mu}$ are called Lagrange multipliers, while
(30.15)-(30.18) are called Kuhn-Tucker conditions. The points $x \in A$ for which there exists
a pair $(\lambda, \mu) \in \mathbb{R}^{|I|} \times \mathbb{R}_+^{|J|}$ such that the triple $(x, \lambda, \mu)$ satisfies the conditions (30.15)-(30.18)
are called Kuhn-Tucker points.

The Kuhn-Tucker points are, therefore, the solutions of the (typically nonlinear) system
of equations and inequalities given by the Kuhn-Tucker conditions. By Kuhn-Tucker's Theorem,
a necessary condition for a regular point $x$ to be a solution of the optimization problem (30.4)
is that it is a Kuhn-Tucker point.^7 Observe, however, that a Kuhn-Tucker point $(x, \lambda, \mu)$
is not necessarily a stationary point of the Lagrangian function: condition (30.18) only
requires $\nabla_\mu L(x, \lambda, \mu) \ge 0$, not the stronger property $\nabla_\mu L(x, \lambda, \mu) = 0$.
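Conditions (30.15)-(30.18) can be checked mechanically at a candidate triple. The following sketch defines a hypothetical helper `kt_residuals` (our own construction, not from the text) that reports the residuals of stationarity, feasibility, complementary slackness and the sign condition, approximating gradients by central finite differences for simplicity.

```python
# A minimal numerical checker for the Kuhn-Tucker conditions at (x, lam, mu).
import numpy as np

def num_grad(fun, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function."""
    grad = np.zeros_like(x)
    for k in range(len(x)):
        step = np.zeros_like(x)
        step[k] = eps
        grad[k] = (fun(x + step) - fun(x - step)) / (2 * eps)
    return grad

def kt_residuals(f, gs, bs, hs, cs, x, lam, mu):
    # Stationarity residual: grad f - sum(lam*grad g) - sum(mu*grad h).
    grad_L = num_grad(f, x)
    for gi, li in zip(gs, lam):
        grad_L -= li * num_grad(gi, x)
    for hj, mj in zip(hs, mu):
        grad_L -= mj * num_grad(hj, x)
    return {
        "stationarity": float(np.abs(grad_L).max()),
        "eq_feasibility": max((abs(gi(x) - bi) for gi, bi in zip(gs, bs)), default=0.0),
        "ineq_feasibility": max((hj(x) - cj for hj, cj in zip(hs, cs)), default=0.0),
        "compl_slackness": max((abs(mj * (cj - hj(x))) for hj, cj, mj in zip(hs, cs, mu)), default=0.0),
        "mu_nonnegative": all(mj >= 0 for mj in mu),
    }

# At the origin of Example 1304 (with mu = 0) every residual vanishes:
print(kt_residuals(lambda x: -(x[0]**3 + x[1]**3) / 2, [], [],
                   [lambda x: x[0] + x[1]], [0.0],
                   np.zeros(2), [], [0.0]))
```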

Let $(x, \lambda, \mu)$ be a Kuhn-Tucker point. By Lemma 1303, expression (30.16) is equivalent to
requiring $\mu_j = 0$ for each $j$ such that $h_j(x) < c_j$. Hence, $\mu_j > 0$ implies that the corresponding
constraint is binding at the point $x$, that is, $h_j(x) = c_j$. Because of its importance, we state
this observation formally.

Proposition 1306 At a Kuhn-Tucker point $(x, \lambda, \mu)$, we have $\mu_j > 0$ only if $h_j(x) = c_j$.

Later in the book, in Section 33.6, we will present a marginal interpretation of the
multipliers $(\hat{\lambda}, \hat{\mu})$, along the lines sketched in the case of equality constraints (Section 29.3.3).

30.2.2 The method of elimination

Kuhn-Tucker's Theorem suggests a procedure to find local solutions of the optimization problem
(30.4) that generalizes Lagrange's method, as well as a generalization of the method of
elimination to find its global solutions. For brevity, we directly consider this latter generalization.

^7 Note the adjective "regular". Indeed, a Kuhn-Tucker point which is not regular is outside the scope of Kuhn-Tucker's Theorem.

Let $D_0$ be the set of the singular points $x \in D$ where the regularity condition on the
constraints does not hold, and let $D_1$ be, instead, the set of the points $x \in A$ where this
condition holds. The method of elimination consists of six phases:

1. verify if Tonelli's Theorem can be applied, that is, if $f$ is continuous and coercive on $C$;

2. determine the set $D$ where the functions $f$, $g_i$ and $h_j$ are continuously differentiable;

3. determine the set $C - D$ of the points of the constraint set where the functions $f$, $g_i$ and $h_j$ are not continuously differentiable;

4. determine the set $C \cap D_0$ of the singular points that satisfy the constraints;

5. determine the set $S$ of the regular Kuhn-Tucker points, i.e., the points $x \in C \cap (D - D_0)$ for which there exists a pair $(\lambda, \mu) \in \mathbb{R}^{|I|} \times \mathbb{R}_+^{|J|}$ of Lagrange multipliers such that the triple $(x, \lambda, \mu)$ satisfies the Kuhn-Tucker conditions (30.15)-(30.18);^8

6. determine the set $\{f(x) : x \in S \cup (C \cap D_0)\}$; if $\hat{x} \in S \cup (C \cap D_0)$ is such that
$$f(\hat{x}) \ge f(x) \quad \forall x \in S \cup (C \cap D_0) \cup (C - D)$$
then such $\hat{x}$ is a solution of the optimization problem (30.4).

The first phase of the method of elimination is the same as in the previous chapter, while
the other phases are the obvious extension of the method to the case of problem (30.4).

^8 Note that $S \subseteq C$ because the Kuhn-Tucker conditions ensure, inter alia, that the Kuhn-Tucker points satisfy the constraints; it is therefore not necessary to check whether for a point $x \in S$ we also have $x \in C$. A similar observation was made in the previous chapter.
Example 1307 The optimization problem:
$$\max_{x_1, x_2} \; -x_1 - 2x_2^2 \quad \text{sub } x_1^2 + x_2^2 \le 1 \tag{30.19}$$
has the form (30.4), where $f, h : \mathbb{R}^2 \to \mathbb{R}$ are given by $f(x_1, x_2) = -x_1 - 2x_2^2$ and $h(x_1, x_2) = x_1^2 + x_2^2$, while $c = 1$. Since $C$ is compact, the first phase is completed through Weierstrass'
Theorem.

The functions $f$ and $h$ are continuously differentiable, so $D = \mathbb{R}^2$ and $C - D = \emptyset$. We
have $\nabla h(x) = (2x_1, 2x_2)$, which is non-null at the points of the unit circle where the
constraint is binding, so the constraint is regular at each point $x \in C$, that is, $C \cap D_0 = \emptyset$.
This completes the first four phases of the elimination method.

The Lagrangian function $L : \mathbb{R}^3 \to \mathbb{R}$ is given by
$$L(x_1, x_2, \mu) = -x_1 - 2x_2^2 + \mu\left(1 - x_1^2 - x_2^2\right)$$
and to find the set $S$ of its Kuhn-Tucker points it is necessary to solve the system
$$\begin{cases} \dfrac{\partial L}{\partial x_1} = -1 - 2\mu x_1 = 0 \\[1ex] \dfrac{\partial L}{\partial x_2} = -4x_2 - 2\mu x_2 = 0 \\[1ex] \mu \dfrac{\partial L}{\partial \mu} = \mu\left(1 - x_1^2 - x_2^2\right) = 0 \\[1ex] \dfrac{\partial L}{\partial \mu} = 1 - x_1^2 - x_2^2 \ge 0 \\[1ex] \mu \ge 0 \end{cases}$$

We start by observing that $\mu \ne 0$, that is, $\mu > 0$. Indeed, if $\mu = 0$ the first equation
becomes $-1 = 0$, a contradiction. We therefore have $\mu > 0$. The second equation
implies $x_2 = 0$, and in turn the third equation implies $x_1 = \pm 1$. From the first equation it
follows that $\mu = -1/(2x_1) > 0$, so $x_1 = -1$ and $\mu = 1/2$; hence the only solution of the
system is $(-1, 0, 1/2)$. The only Kuhn-Tucker point is therefore $(-1, 0)$, i.e., $S = \{(-1, 0)\}$.

In sum, since the sets $C \cap D_0$ and $C - D$ are both empty, we have
$$S \cup (C \cap D_0) \cup (C - D) = S = \{(-1, 0)\}$$
The method of elimination allows us to conclude that $(-1, 0)$ is the only solution of the
optimization problem (30.19). Note that at this solution the constraint is binding (i.e., it is
satisfied with equality); indeed $\mu = 1/2 > 0$, as required by Proposition 1306. N
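For a numerical cross-check of this example, the sketch below feeds problem (30.19), under the reconstruction $f(x) = -x_1 - 2x_2^2$, to SciPy's SLSQP solver; it recovers the Kuhn-Tucker point $(-1, 0)$ found by hand.

```python
# Numerical cross-check of Example 1307 (problem (30.19)).
import numpy as np
from scipy.optimize import minimize

res = minimize(lambda x: x[0] + 2 * x[1]**2,     # minimize -f, i.e. maximize f
               x0=np.array([0.0, 0.5]),
               method="SLSQP",
               constraints=[{"type": "ineq",
                             "fun": lambda x: 1 - x[0]**2 - x[1]**2}])
print(res.x)   # ~ (-1, 0), the Kuhn-Tucker point found analytically
```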

Example 1308 The optimization problem
$$\max_{x_1, \dots, x_n} \; -\sum_{i=1}^n x_i^2 \quad \text{sub } \sum_{i=1}^n x_i = 1, \; x_1 \ge 0, \dots, x_n \ge 0 \tag{30.20}$$
is of the form (30.4), where $f, g : \mathbb{R}^n \to \mathbb{R}$ are given by $f(x) = -\sum_{i=1}^n x_i^2$ and $g(x) = \sum_{i=1}^n x_i$, the functions $h_j : \mathbb{R}^n \to \mathbb{R}$ are given by $h_j(x) = -x_j$ for $j = 1, \dots, n$, while $b = 1$ and $c_j = 0$
for $j = 1, \dots, n$. The set $C = \left\{x \in \mathbb{R}_+^n : \sum_{i=1}^n x_i = 1\right\}$ is compact, and so also in this case the
first phase is completed thanks to Weierstrass' Theorem.
The functions $f$, $g$ and $h_j$ are continuously differentiable, so $D = \mathbb{R}^n$ and $C - D = \emptyset$. For
each $x \in \mathbb{R}^n$ we have $\nabla g(x) = (1, \dots, 1)$ and $\nabla h_j(x) = -e^j$. Therefore, the value of these
gradients does not depend on the point $x$ considered. To verify regularity, we consider the
collection $(1, \dots, 1), -e^1, \dots, -e^n$ of these gradients. This collection has $n + 1$ elements and it
is obviously linearly dependent (the versors $e^1, \dots, e^n$ form the standard basis of $\mathbb{R}^n$).
On the other hand, it is immediate to see that any subcollection with at most $n$ elements
is, instead, linearly independent. Hence, the only way to violate regularity is that all the
constraints are binding, so that the whole collection of $n + 1$ gradients has to be considered.
Fortunately, there are no points $x \in \mathbb{R}^n$ where all the constraints are binding. Indeed, the
only point that satisfies with equality all the constraints $x_j \ge 0$ is the origin $0$, which however
does not satisfy the equality constraint $\sum_{i=1}^n x_i = 1$.

We conclude that all the points $x \in \mathbb{R}^n$ are regular, i.e., $D_0 = \emptyset$. Hence, $C \cap D_0 = \emptyset$.
This completes the first four phases of the elimination method.
The Lagrangian function $L : \mathbb{R}^{2n+1} \to \mathbb{R}$ is given by
$$L(x, \lambda, \mu) = -\sum_{i=1}^n x_i^2 + \lambda\left(1 - \sum_{i=1}^n x_i\right) + \sum_{i=1}^n \mu_i x_i \qquad \forall (x, \lambda, \mu) \in \mathbb{R}^{2n+1}$$

To find the set $S$ of its Kuhn-Tucker points, it is necessary to solve the system
$$\begin{cases} \dfrac{\partial L}{\partial x_i} = -2x_i - \lambda + \mu_i = 0 & \forall i = 1, \dots, n \\[1ex] \dfrac{\partial L}{\partial \lambda} = 1 - \displaystyle\sum_{i=1}^n x_i = 0 \\[1ex] \mu_i \dfrac{\partial L}{\partial \mu_i} = \mu_i x_i = 0 & \forall i = 1, \dots, n \\[1ex] \dfrac{\partial L}{\partial \mu_i} = x_i \ge 0 & \forall i = 1, \dots, n \\[1ex] \mu_i \ge 0 & \forall i = 1, \dots, n \end{cases}$$

If we multiply the first $n$ equations by $x_i$, we get
$$-2x_i^2 - \lambda x_i + \mu_i x_i = 0, \quad \forall i = 1, \dots, n$$
Adding up these new equations, we have
$$-2\sum_{i=1}^n x_i^2 - \lambda \sum_{i=1}^n x_i + \sum_{i=1}^n \mu_i x_i = 0$$
Therefore, since $\sum_{i=1}^n x_i = 1$ and $\mu_i x_i = 0$ for each $i$,
$$-2\sum_{i=1}^n x_i^2 - \lambda = 0$$
that is, $\lambda = -2\sum_{i=1}^n x_i^2$. We conclude that $\lambda \le 0$.

If $x_i = 0$ for some $i$, from the condition $\partial L / \partial x_i = 0$ it follows that $\lambda = \mu_i$. Since $\mu_i \ge 0$ and $\lambda \le 0$,
it follows that $\mu_i = 0$. In turn, this implies $\lambda = 0$ and hence, since $\lambda = -2\sum_{i=1}^n x_i^2$, we
conclude that $x_i = 0$ for each $i = 1, \dots, n$. But this contradicts the
condition $1 - \sum_{i=1}^n x_i = 0$, and we therefore conclude that $x_i \ne 0$, that is, $x_i > 0$.
Since this holds for each $i = 1, \dots, n$, it follows that $x_i > 0$ for each $i = 1, \dots, n$. From
the condition $\mu_i x_i = 0$ it follows that $\mu_i = 0$ for each $i = 1, \dots, n$, and the first $n$ equations
become:
$$-2x_i - \lambda = 0 \quad \forall i = 1, \dots, n$$
that is, $x_i = -\lambda/2$ for each $i = 1, \dots, n$. The $x_i$ are therefore all equal; from $\sum_{i=1}^n x_i = 1$ it
follows that
$$x_i = \frac{1}{n} \quad \forall i = 1, \dots, n$$
In conclusion,
$$S = \left\{\left(\frac{1}{n}, \dots, \frac{1}{n}\right)\right\}$$
Since $C - D = \emptyset$ and $D_0 = \emptyset$, we have
$$S \cup (C \cap D_0) \cup (C - D) = S = \left\{\left(\frac{1}{n}, \dots, \frac{1}{n}\right)\right\}$$
The method of elimination allows us to conclude that the point $(1/n, \dots, 1/n)$ is the solution
of the optimization problem (30.20). N
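The same cross-check works for problem (30.20); here with $n = 5$ and a deliberately non-uniform starting point, the solver should return the uniform vector $(1/n, \dots, 1/n)$.

```python
# Numerical cross-check of Example 1308 (problem (30.20)) for n = 5.
import numpy as np
from scipy.optimize import minimize

n = 5
res = minimize(lambda x: np.sum(x**2),           # maximize -sum(x^2)
               x0=np.array([0.5, 0.2, 0.1, 0.1, 0.1]),
               method="SLSQP",
               bounds=[(0, None)] * n,           # x_i >= 0
               constraints=[{"type": "eq",
                             "fun": lambda x: np.sum(x) - 1}])
print(res.x)   # ~ (0.2, 0.2, 0.2, 0.2, 0.2)
```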

30.3 Cogito et solvo

The result of the last example, i.e., that $(1/n, \dots, 1/n)$ is the optimal point, can be proved
in a much more general form through a simple application of Jensen's inequality, without
any use of differential methods. This is yet another proof that differential methods might not be
"optimal" (cf. the discussion after Example 1292 in the previous chapter).

Proposition 1309 Let $h : [0, 1] \to \mathbb{R}$ be concave. The optimization problem
$$\max_{x_1, \dots, x_n} \; \sum_{i=1}^n h(x_i) \quad \text{sub } \sum_{i=1}^n x_i = 1, \; x_1 \ge 0, \dots, x_n \ge 0 \tag{30.21}$$
has solution $(1/n, \dots, 1/n)$. It is the unique solution if $h$ is strictly concave.

If $h(x_i) = -x_i \log x_i$, the function $\sum_{i=1}^n h(x_i)$ is called entropy (Examples 219 and 1268).
Proof Let $x_1, x_2, \dots, x_n \in [0, 1]$ satisfy the constraint $\sum_{i=1}^n x_i = 1$. Since $h$ is concave, by
Jensen's inequality we have
$$\frac{1}{n}\sum_{i=1}^n h(x_i) \le h\left(\frac{1}{n}\sum_{i=1}^n x_i\right) = h\left(\frac{1}{n}\right)$$
Namely,
$$\sum_{i=1}^n h(x_i) \le n\, h\left(\frac{1}{n}\right) = h\left(\frac{1}{n}\right) + \cdots + h\left(\frac{1}{n}\right)$$
This shows that $(1/n, \dots, 1/n)$ is a solution. Clearly, $\sum_{i=1}^n h(x_i)$ is strictly concave if $h$ is.
Hence, the uniqueness of the solution is ensured by Theorem 831.
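For the entropy case $h(t) = -t \log t$ mentioned above, the conclusion of Proposition 1309 can be verified numerically; the clipping of the logarithm near zero is a standard numerical precaution, reflecting the convention $0 \log 0 = 0$.

```python
# Entropy maximization over the simplex yields the uniform distribution,
# with maximal value log(n), in line with the Jensen argument above.
import numpy as np
from scipy.optimize import minimize

n = 4
H = lambda x: -np.sum(x * np.log(np.clip(x, 1e-12, 1.0)))   # entropy

res = minimize(lambda x: -H(x), np.array([0.7, 0.1, 0.1, 0.1]),
               method="SLSQP", bounds=[(0, 1)] * n,
               constraints=[{"type": "eq", "fun": lambda x: np.sum(x) - 1}])
print(res.x, H(res.x), np.log(n))   # ~ uniform; max entropy equals log(n)
```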

30.4 Concave optimization

30.4.1 The problem

The remarkable optimality properties of concave functions make them of particular interest
when dealing with the optimization problem (30.4). We start with a simple, but important,
result.

Proposition 1310 Let $A$ be convex. If the functions $g_i$ are affine for each $i \in I$ and the
functions $h_j$ are convex for each $j \in J$, then the choice set $C$ defined in (30.5) is convex.

Proof Set $C_i = \{x \in A : g_i(x) = b_i\}$ for each $i \in I$ and $C_j = \{x \in A : h_j(x) \le c_j\}$ for each
$j \in J$. Clearly, $C_j$ is convex as the sublevel set of a convex function (Proposition 674). A
similar argument shows that each $C_i$ is also convex, and this implies the convexity of the set
$C$ defined in (30.5) because $C = \left(\bigcap_{i \in I} C_i\right) \cap \left(\bigcap_{j \in J} C_j\right)$.

It is easy to give examples where $C$ is no longer convex when the conditions of convexity
and affinity used in this result are not satisfied. Note that the convexity condition on the
$h_j$ is much weaker than the affinity condition on the $g_i$. This shows that the convexity of the
choice set is more natural for inequality constraints than for equality ones. This is a crucial
"structural" difference between the two types of constraints, which are more different than
it may appear prima facie.

Motivated by the last result, we give the following definition.

Definition 1311 The optimization problem (30.4) is said to be concave if the objective
function $f$ is concave, the functions $g_i$ are affine and the functions $h_j$ are convex on the open
and convex set $A$.

A concave optimization problem has therefore the form
$$\max_x \; f(x) \quad \text{sub } g_i(x) = b_i \;\; \forall i \in I, \quad h_j(x) \le c_j \;\; \forall j \in J \tag{30.22}$$
where $I$ and $J$ are finite sets of indexes (possibly empty), $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is a concave
objective function, the affine functions $g_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and the associated scalars $b_i$
characterize $|I|$ equality constraints, while the convex functions $h_j : A \subseteq \mathbb{R}^n \to \mathbb{R}$ with the
associated scalars $c_j$ induce $|J|$ inequality constraints. The convex domain $A$ is assumed to
be open to best exploit the properties of concave functions.

We can represent the affine functions $g_i$ as $g_i(x) = a_i \cdot x + q_i$ (Proposition 656). Hence,
if $M$ is the $|I| \times n$ matrix that has the vectors $a_i \in \mathbb{R}^n$ as its rows, we can write the equality
constraints in the matrix form $Mx + q = b$, where $b \in \mathbb{R}^{|I|}$. Often $q = 0$, so the equality
constraints take the simple matrix form
$$Mx = b \tag{30.23}$$
In a similar vein, when the functions $h_j$ also happen to be affine, say $h_j(x) = d_j \cdot x + q_j$,
we can write the inequality constraints too in the matrix form $Hx \le c$, where $H$ is the
$|J| \times n$ matrix with rows $d_j$ and $c \in \mathbb{R}^{|J|}$. Thus, when all constraints are identified by affine
functions, the choice set is a polyhedron $C = \{x \in \mathbb{R}^n : Mx = b \text{ and } Hx \le c\}$. This case often
arises in applications. Indeed, if the objective function is also affine, we are back to linear
programming, an important class of concave problems that we already studied via convexity
arguments (Section 18.6.4).
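For a concrete polyhedral instance, the sketch below solves a toy linear program (data of our own choosing) with SciPy's `linprog`, which minimizes by convention, so the maximization objective is negated.

```python
# A toy linear program: affine objective over a polyhedral choice set.
# maximize 3*x1 + 2*x2  sub  x1 + x2 <= 4, x1 + 3*x2 <= 6, x >= 0
from scipy.optimize import linprog

res = linprog(c=[-3, -2],                  # negate to maximize
              A_ub=[[1, 1], [1, 3]],
              b_ub=[4, 6],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimum at the vertex (4, 0), with value 12
```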

30.4.2 Kuhn-Tucker points

Recall from Section 28.3 that the search for the solutions of an unconstrained optimization
problem for concave functions was based on a remarkable property: the first-order necessary
condition for the existence of a local maximizer becomes sufficient for the existence of a
global maximizer in the case of concave functions.

The next fundamental result is the "constrained" version of this property. Note that
regularity does not play any role in this result.

Theorem 1312 The Kuhn-Tucker points solve a concave optimization problem in which the
functions $f$, $\{g_i\}_{i \in I}$ and $\{h_j\}_{j \in J}$ are differentiable.

Proof Let $(x^*, \lambda^*, \mu^*)$ be a Kuhn-Tucker point for the optimization problem (30.4), that is,
$(x^*, \lambda^*, \mu^*)$ satisfies the conditions (30.15)-(30.18). In particular, this means that
$$\nabla f(x^*) = \sum_{i \in I} \lambda_i^* \nabla g_i(x^*) + \sum_{j \in A(x^*) \cap J} \mu_j^* \nabla h_j(x^*) \tag{30.24}$$
Since each $g_i$ is affine and each $h_j$ is convex, by (24.18) it follows that:
$$h_j(x) \ge h_j(x^*) + \nabla h_j(x^*) \cdot (x - x^*) \quad \forall j \in J, \; \forall x \in A \tag{30.25}$$
$$g_i(x) = g_i(x^*) + \nabla g_i(x^*) \cdot (x - x^*) \quad \forall i \in I, \; \forall x \in A \tag{30.26}$$
For each $j \in A(x^*)$ we have $h_j(x^*) = c_j$, and hence $h_j(x) \le h_j(x^*)$ for each $x \in C$ and
each $j \in A(x^*) \cap J$. Moreover, $g_i(x^*) = g_i(x)$ for each $i \in I$ and each $x \in C$. By (30.25)
and (30.26) it follows that
$$\nabla h_j(x^*) \cdot (x - x^*) \le 0 \quad \forall j \in A(x^*), \; \forall x \in C$$
$$\nabla g_i(x^*) \cdot (x - x^*) = 0 \quad \forall i \in I, \; \forall x \in C$$
Together with (30.24), and since $\mu_j^* \ge 0$, we therefore have:
$$\nabla f(x^*) \cdot (x - x^*) = \sum_{i \in I} \lambda_i^* \nabla g_i(x^*) \cdot (x - x^*) + \sum_{j \in A(x^*) \cap J} \mu_j^* \nabla h_j(x^*) \cdot (x - x^*) \le 0$$
for each $x \in C$. On the other hand, by (24.18) we have:
$$f(x) \le f(x^*) + \nabla f(x^*) \cdot (x - x^*) \quad \forall x \in A$$
and we conclude that $f(x) \le f(x^*)$ for each $x \in C$, as desired.

This theorem provides a sufficient condition for optimality: if a point is a Kuhn-Tucker point,
then it solves the optimization problem. The condition is, however, not necessary: there can
be solutions of a concave optimization problem that are not Kuhn-Tucker points. In view
of Kuhn-Tucker's Theorem, this can happen only if the solution is not a regular point. The
next example illustrates this situation.

Example 1313 The optimization problem
$$\max_{x_1, x_2, x_3} \; -x_1 + x_2 - x_3^2 \quad \text{sub } x_1^2 + x_2^2 - 2x_1 \le 0 \text{ and } x_1^2 + x_2^2 + 2x_1 \le 0 \tag{30.27}$$
has the form (30.4), where $f, h_1, h_2 : \mathbb{R}^3 \to \mathbb{R}$ are continuously differentiable functions given
by $f(x_1, x_2, x_3) = -x_1 + x_2 - x_3^2$, $h_1(x_1, x_2, x_3) = x_1^2 + x_2^2 - 2x_1$, $h_2(x_1, x_2, x_3) = x_1^2 + x_2^2 + 2x_1$,
while $c_1 = c_2 = 0$.

Clearly, $f$ is concave and $h_1$ and $h_2$ are convex, so (30.27) is a concave optimization
problem. The system of inequalities
$$x_1^2 + x_2^2 - 2x_1 \le 0, \qquad x_1^2 + x_2^2 + 2x_1 \le 0$$
has the pair $(x_1, x_2) = (0, 0)$ as its unique solution. Hence, $C = \{x \in \mathbb{R}^3 : x_1 = x_2 = 0\}$ is a straight
line in $\mathbb{R}^3$ and, since on $C$ the objective reduces to $-x_3^2$, the unique solution of the problem
(30.27) is the origin $(0, 0, 0)$. On the other hand,
$$\nabla h_1(0, 0, 0) = (-2, 0, 0) \quad \text{and} \quad \nabla h_2(0, 0, 0) = (2, 0, 0)$$
and so the origin is a singular point. Since
$$\nabla f(0, 0, 0) = (-1, 1, 0)$$
there are no pairs $(\mu_1, \mu_2) \in \mathbb{R}_+^2$ such that:
$$\nabla f(0, 0, 0) = \mu_1 \nabla h_1(0, 0, 0) + \mu_2 \nabla h_2(0, 0, 0)$$
Therefore, the solution $(0, 0, 0)$ is not a Kuhn-Tucker point. N
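The singularity at the origin can be verified directly: under the reconstruction above, the two constraint gradients at the origin are proportional, so the matrix they form has rank one.

```python
# Example 1313: the constraint gradients at the origin are linearly dependent,
# so the origin is a singular point and Kuhn-Tucker's Theorem does not apply.
import numpy as np

grad_h1 = np.array([-2.0, 0.0, 0.0])    # gradient of h1 at the origin
grad_h2 = np.array([ 2.0, 0.0, 0.0])    # gradient of h2 at the origin
M = np.vstack([grad_h1, grad_h2])
print(np.linalg.matrix_rank(M))         # 1 < 2: the origin is singular
```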

By combining Kuhn-Tucker's Theorem and Theorem 1312 we get the following necessary
and sufficient optimality condition.

Theorem 1314 Consider a concave optimization problem in which the functions $f$, $\{g_i\}_{i \in I}$
and $\{h_j\}_{j \in J}$ are continuously differentiable. A regular point $x \in A$ is a solution of the
problem if and only if it is a Kuhn-Tucker point.

Theorem 1314 is a refinement of Kuhn-Tucker's Theorem and, as such, it allows us
to refine the method of elimination into what we will call the convex method (of elimination). Such
method is based on the following phases:

1. verify if the problem is concave;

2. verify if the functions $f$, $g_i$ and $h_j$ are continuously differentiable, i.e., $A = D$;

3. determine the set $C \cap D_0$ of the singular points that satisfy the constraints;

4. determine the set $S$ of the regular Kuhn-Tucker points;

5. if $S \ne \emptyset$, then all the points of $S$ are solutions of the problem,^9 while a singular point $x \in C \cap D_0$ is also a solution if and only if $f(x) = f(\hat{x})$ for some $\hat{x} \in S$;

6. if $S = \emptyset$, check if Tonelli's Theorem can be applied, i.e., if $f$ is continuous and coercive on $C$; if this is the case, the maximizers of $f$ on $C \cap D_0$ are solutions of the optimization problem (30.4).

^9 The set $S$ is at most a singleton when $f$ is strictly concave because in such a case there is at most one solution of the problem (Theorem 831).

Since either phase 5 or phase 6 applies, depending on whether or not $S$ is empty, the actual
phases of the convex method are five.

The convex method works thanks to Theorems 1312 and 1314. Indeed, if $S \ne \emptyset$ then
by Theorem 1312 all points of $S$ are solutions of the problem. In this case, a singular point
$x \in C \cap D_0$ can in turn be a solution when its value $f(x)$ is equal to that of any point in $S$.
When, instead, we have $S = \emptyset$, then Theorem 1314 guarantees that no regular point in $A$
is a solution of the problem. At this stage, if Tonelli's Theorem is able to ensure the existence
of at least one solution, we can restrict the search to the set $C \cap D_0$ of the singular points that
satisfy the constraints. In other words, it is sufficient to find the maximizers of $f$ on $C \cap D_0$:
they are also solutions of problem (30.4), and vice versa.

Clearly, the convex method becomes especially powerful when $S \ne \emptyset$ because in such a
case there is no need to verify the validity of global existence theorems à la Weierstrass or
Tonelli; it is sufficient to find the Kuhn-Tucker points.

If we content ourselves with solutions that are regular points, without worrying about
the possible existence of singular solutions, we can give a short version of the convex method
that is based only on Theorem 1312. We can call it the short convex method. It is based
on only three phases:

1. verify if the problem is concave;

2. verify if the functions $f$, $g_i$ and $h_j$ are continuously differentiable, i.e., $A = D$;

3. determine the set $S$ of the regular Kuhn-Tucker points: if $S \ne \emptyset$, then all the points of $S$ are solutions of the problem.

Indeed, by Theorem 1312 all regular Kuhn-Tucker points are solutions of the problem.
The short convex method is simpler than the convex method, and it does not require the use
of global existence theorems. The price of this simplification is the possible inaccuracy of
the method: being based on sufficient conditions, it is not able to find the solutions where
these conditions are not satisfied (by Theorem 1314, such solutions would be singular points).
Furthermore, the short method cannot be applied when $S = \emptyset$; in such a case, it is necessary
to apply the complete convex method.

The short convex method is especially powerful when the objective function $f$ is strictly
concave, as often assumed in applications. Indeed, in such a case a solution found with the
short method is necessarily also the unique solution of the concave optimization problem.
The next example illustrates this.

Example 1315 Consider the optimization problem:
$$\max_{x_1, x_2, x_3} \; -\left(x_1^2 + x_2^2 + x_3^2\right) \quad \text{sub } 3x_1 + x_2 + 2x_3 \ge 1 \text{ and } x_1 \ge 0 \tag{30.28}$$
This problem is of the form (30.4), where $f, h_1, h_2 : \mathbb{R}^3 \to \mathbb{R}$ are given by $f(x) = -(x_1^2 + x_2^2 + x_3^2)$,
$h_1(x) = -(3x_1 + x_2 + 2x_3)$ and $h_2(x) = -x_1$, while $c_1 = -1$ and $c_2 = 0$.

Using Theorem 1120 it is easy to verify that $f$ is strictly concave, while it is immediate
to verify that $h_1$ and $h_2$ are convex. Therefore, (30.28) is a concave optimization problem.
Moreover, the functions $f$, $h_1$ and $h_2$ are continuously differentiable. This completes the
first two phases of the short convex method, which we apply here since $f$ is strictly concave.

Let us find the Kuhn-Tucker points. The Lagrangian function $L : \mathbb{R}^5 \to \mathbb{R}$ is given by
$$L(x_1, x_2, x_3, \mu_1, \mu_2) = -\left(x_1^2 + x_2^2 + x_3^2\right) + \mu_1\left(-1 + 3x_1 + x_2 + 2x_3\right) + \mu_2 x_1$$
To find the set $S$ of its Kuhn-Tucker points it is necessary to solve the system of equalities
and inequalities:
$$\begin{cases} \dfrac{\partial L}{\partial x_1} = -2x_1 + 3\mu_1 + \mu_2 = 0 \\[1ex] \dfrac{\partial L}{\partial x_2} = -2x_2 + \mu_1 = 0 \\[1ex] \dfrac{\partial L}{\partial x_3} = -2x_3 + 2\mu_1 = 0 \\[1ex] \mu_1 \dfrac{\partial L}{\partial \mu_1} = \mu_1\left(-1 + 3x_1 + x_2 + 2x_3\right) = 0 \\[1ex] \mu_2 \dfrac{\partial L}{\partial \mu_2} = \mu_2 x_1 = 0 \\[1ex] \dfrac{\partial L}{\partial \mu_1} = -1 + 3x_1 + x_2 + 2x_3 \ge 0 \\[1ex] \dfrac{\partial L}{\partial \mu_2} = x_1 \ge 0 \\[1ex] \mu_1 \ge 0, \; \mu_2 \ge 0 \end{cases} \tag{30.29}$$
We consider four cases, depending on whether the multipliers $\mu_1$ and $\mu_2$ are zero or
not.

Case 1: $\mu_1 > 0$ and $\mu_2 > 0$. The conditions $\mu_2 \,\partial L / \partial \mu_2 = \partial L / \partial x_1 = 0$ imply $x_1 = 0$ and
$3\mu_1 + \mu_2 = 0$. This last equation does not have strictly positive solutions $\mu_1$ and $\mu_2$, and
hence we conclude that we cannot have $\mu_1 > 0$ and $\mu_2 > 0$.

Case 2: $\mu_1 = 0$ and $\mu_2 > 0$. The conditions $\mu_2 \,\partial L / \partial \mu_2 = \partial L / \partial x_1 = 0$ imply $x_1 = 0$ and,
since $\mu_1 = 0$, $\mu_2 = 0$. This contradiction shows that we cannot have $\mu_1 = 0$ and $\mu_2 > 0$.

Case 3: $\mu_1 > 0$ and $\mu_2 = 0$. The conditions $\mu_1 \,\partial L / \partial \mu_1 = \partial L / \partial x_1 = \partial L / \partial x_2 = \partial L / \partial x_3 = 0$ imply:
$$\begin{cases} -2x_1 + 3\mu_1 = 0 \\ -2x_2 + \mu_1 = 0 \\ -2x_3 + 2\mu_1 = 0 \\ 3x_1 + x_2 + 2x_3 = 1 \end{cases}$$
Solving for $\mu_1$, we get $\mu_1 = 1/7$, and hence $x_1 = 3/14$, $x_2 = 1/14$ and $x_3 = 1/7$. The
quintuple $(3/14, 1/14, 1/7, 1/7, 0)$ solves the system (30.29), and hence $(3/14, 1/14, 1/7)$ is a
Kuhn-Tucker point.

Case 4: $\mu_1 = \mu_2 = 0$. The condition $\partial L / \partial x_1 = 0$ implies $x_1 = 0$, while the conditions
$\partial L / \partial x_2 = \partial L / \partial x_3 = 0$ imply $x_2 = x_3 = 0$. It follows that the condition $\partial L / \partial \mu_1 \ge 0$ implies
$-1 \ge 0$, and this contradiction shows that we cannot have $\mu_1 = \mu_2 = 0$.
In conclusion, $S = \{(3/14, 1/14, 1/7)\}$. Since $f$ is strictly concave, the short convex
method allows us to conclude that
$$\left(\frac{3}{14}, \frac{1}{14}, \frac{1}{7}\right)$$
is the unique solution of the optimization problem (30.28).^10 N
^10 The objective function is easily seen to be strongly concave. So, coda readers may note that the existence and uniqueness of the solution would also follow from Theorem 1185.
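A numerical cross-check of this example: minimizing $x_1^2 + x_2^2 + x_3^2$ under the constraints $3x_1 + x_2 + 2x_3 \ge 1$ and $x_1 \ge 0$ should recover the Kuhn-Tucker point $(3/14, 1/14, 1/7)$.

```python
# Numerical cross-check of Example 1315 (problem (30.28)).
import numpy as np
from scipy.optimize import minimize

res = minimize(lambda x: np.sum(x**2),           # maximize -(x1^2+x2^2+x3^2)
               x0=np.array([1.0, 1.0, 1.0]),
               method="SLSQP",
               constraints=[{"type": "ineq",
                             "fun": lambda x: 3*x[0] + x[1] + 2*x[2] - 1},
                            {"type": "ineq", "fun": lambda x: x[0]}])
print(res.x, np.array([3, 1, 2]) / 14)   # both ~ (0.2143, 0.0714, 0.1429)
```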

We close with an important observation. The solution methods seen in this chapter are
based on the search for Kuhn-Tucker points, and therefore they require the resolution of
systems of nonlinear equations. In general, these systems are not easy to solve, and this limits
the computational usefulness of these methods, whose importance is mostly theoretical. At
a numerical level, other methods are used (which the interested reader can find in books on
numerical analysis).

30.5 Appendix: proof of a key lemma

We begin with a calculus delight.

Lemma 1316 (i) The function $y = x|x|$ is continuously differentiable on $\mathbb{R}$ and $D(x|x|) = 2|x|$. (ii) The square $(x^+)^2$ of the function $x^+ = \max\{x, 0\}$ is continuously differentiable on
$\mathbb{R}$, and $D(x^+)^2 = 2x^+$.

Proof (i) Observe that $x|x|$ is infinitely differentiable for $x \ne 0$ and its first derivative is,
by the product rule for differentiation,
$$D(x|x|) = x\, D|x| + |x|\, Dx = x \frac{|x|}{x} + |x| = 2|x|$$
This is true for $x \ne 0$. Now it suffices to invoke a basic calculus result that asserts: let $f : I \to \mathbb{R}$ be continuous on a real interval, and let $f$ be differentiable on $I - \{x_0\}$; if $\lim_{x \to x_0} Df(x) = \ell$,
then $f$ is differentiable at $x_0$ and $Df(x_0) = \ell$. As an immediate consequence, $D(x|x|) = 2|x|$
also at $x = 0$. (ii) We have $x^+ = 2^{-1}(x + |x|)$. Therefore
$$\left(x^+\right)^2 = \frac{1}{4}(x + |x|)^2 = \frac{1}{2}x^2 + \frac{1}{2}x|x|$$
It follows that $(x^+)^2$ is continuously differentiable and $D(x^+)^2 = x + |x| = 2x^+$.

Proof of Lemma 1302 Let $\|\cdot\|$ be the Euclidean norm. We have $h_j(\hat{x}) < c_j$ for each
$j \notin A(\hat{x})$. Since $A$ is open, there exists $\tilde{\varepsilon} > 0$ sufficiently small such that $B_{\tilde{\varepsilon}}(\hat{x}) = \{x \in A : \|x - \hat{x}\| \le \tilde{\varepsilon}\} \subseteq A$. Moreover, since each $h_j$ is continuous, for each $j \notin A(\hat{x})$ there
exists $\varepsilon_j$ sufficiently small such that $h_j(x) < c_j$ for each $x \in B_{\varepsilon_j}(\hat{x}) = \{x \in A : \|x - \hat{x}\| \le \varepsilon_j\}$.
Let $\varepsilon_0 = \min_{j \notin A(\hat{x})} \varepsilon_j$ and $\hat{\varepsilon} = \min\{\tilde{\varepsilon}, \varepsilon_0\}$; in other words, $\hat{\varepsilon}$ is the minimum between $\tilde{\varepsilon}$ and
the $\varepsilon_j$. In this way we have $B_{\hat{\varepsilon}}(\hat{x}) = \{x \in A : \|x - \hat{x}\| \le \hat{\varepsilon}\} \subseteq A$ and $h_j(x) < c_j$ for each
$x \in B_{\hat{\varepsilon}}(\hat{x})$ and each $j \notin A(\hat{x})$.

Given $\varepsilon \in (0, \hat{\varepsilon}]$, the set $S_\varepsilon(\hat{x}) = \{x \in A : \|x - \hat{x}\| = \varepsilon\}$ is compact. Moreover, by what
we have just seen, $h_j(x) < c_j$ for each $x \in S_\varepsilon(\hat{x})$ and each $j \notin A(\hat{x})$, that is, on $S_\varepsilon(\hat{x})$ all the
non-binding constraints are always satisfied.

For each $j \in J$, let $\tilde{h}_j : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be defined by
$$\tilde{h}_j(x) = \max\{h_j(x) - c_j, 0\} = \left(h_j(x) - c_j\right)^+$$
for each $x \in A$. By Lemma 1316, $\tilde{h}_j^2 \in C^1(A)$ and
$$\frac{\partial \tilde{h}_j^2(x)}{\partial x_p} = 2\,\tilde{h}_j(x)\,\frac{\partial h_j(x)}{\partial x_p}, \quad \forall p = 1, \dots, n \tag{30.30}$$

We first prove a property that we will use later.

Fact 1. For each $\varepsilon \in (0, \hat{\varepsilon}]$, there exists $N > 0$ such that
$$f(x) - f(\hat{x}) - \|x - \hat{x}\|^2 - N\left(\sum_{i \in I}\left(g_i(x) - g_i(\hat{x})\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(\tilde{h}_j(x) - \tilde{h}_j(\hat{x})\right)^2\right) < 0 \tag{30.31}$$
for each $x \in S_\varepsilon(\hat{x})$.
Proof of Fact 1 We proceed by contradiction, and assume therefore that there exists
$\varepsilon \in (0, \hat{\varepsilon}]$ for which there is no $N > 0$ such that (30.31) holds. Take an increasing sequence
$\{N_n\}_n$ with $N_n \uparrow +\infty$, and for each of these $N_n$ take $x_n \in S_\varepsilon(\hat{x})$ for which (30.31) does not
hold, that is, $x_n$ such that:
$$f(x_n) - f(\hat{x}) - \|x_n - \hat{x}\|^2 - N_n\left(\sum_{i \in I}\left(g_i(x_n) - g_i(\hat{x})\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(\tilde{h}_j(x_n) - \tilde{h}_j(\hat{x})\right)^2\right) \ge 0$$
Hence, for each $n \ge 1$ we have:
$$\frac{f(x_n) - f(\hat{x}) - \|x_n - \hat{x}\|^2}{N_n} \ge \sum_{i \in I}\left(g_i(x_n) - g_i(\hat{x})\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(\tilde{h}_j(x_n) - \tilde{h}_j(\hat{x})\right)^2 \tag{30.32}$$
Since the sequence $\{x_n\}$ just constructed is contained in the compact set $S_\varepsilon(\hat{x})$, by the
Bolzano-Weierstrass Theorem there exists a subsequence $\{x_{n_k}\}_k$ convergent in $S_\varepsilon(\hat{x})$, i.e.,
there exists $x^* \in S_\varepsilon(\hat{x})$ such that $x_{n_k} \to x^*$. Inequality (30.32) implies that, for each $k \ge 1$,
we have:
$$\frac{f(x_{n_k}) - f(\hat{x}) - \|x_{n_k} - \hat{x}\|^2}{N_{n_k}} \ge \sum_{i \in I}\left(g_i(x_{n_k}) - g_i(\hat{x})\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(\tilde{h}_j(x_{n_k}) - \tilde{h}_j(\hat{x})\right)^2 \tag{30.33}$$
Since $f$ is continuous, we have $\lim_k f(x_{n_k}) = f(x^*)$. Moreover, $\lim_k \|x_{n_k} - \hat{x}\| = \|x^* - \hat{x}\|$.
Since $\lim_k N_{n_k} = +\infty$, we have
$$\lim_k \frac{f(x_{n_k}) - f(\hat{x}) - \|x_{n_k} - \hat{x}\|^2}{N_{n_k}} = 0$$
and hence (30.33) implies, thanks to the continuity of the functions $g_i$ and $\tilde{h}_j$,
$$\sum_{i \in I}\left(g_i(x^*) - g_i(\hat{x})\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(\tilde{h}_j(x^*) - \tilde{h}_j(\hat{x})\right)^2$$
$$= \lim_k \left(\sum_{i \in I}\left(g_i(x_{n_k}) - g_i(\hat{x})\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(\tilde{h}_j(x_{n_k}) - \tilde{h}_j(\hat{x})\right)^2\right) = 0$$

It follows that $(g_i(x^*) - g_i(\hat{x}))^2 = \left(\tilde{h}_j(x^*) - \tilde{h}_j(\hat{x})\right)^2 = 0$ for each $i \in I$ and for each
$j \in J \cap A(\hat{x})$, from which $g_i(x^*) = g_i(\hat{x}) = b_i$ for each $i \in I$ and $\tilde{h}_j(x^*) = \tilde{h}_j(\hat{x}) = 0$, so that
$h_j(x^*) \le c_j$, for each $j \in J \cap A(\hat{x})$.

Since on $S_\varepsilon(\hat{x})$ the non-binding constraints are always satisfied, i.e., $h_j(x) < c_j$ for each
$x \in S_\varepsilon(\hat{x})$ and each $j \notin A(\hat{x})$, we can conclude that $x^*$ satisfies all the constraints. We
therefore have $f(\hat{x}) \ge f(x^*)$, given that $\hat{x}$ solves the optimization problem.

On the other hand, since $x_{n_k} \in S_\varepsilon(\hat{x})$ for each $k \ge 1$, (30.33) implies
$$f(x_{n_k}) - f(\hat{x}) \ge \|x_{n_k} - \hat{x}\|^2 + N_{n_k}\left(\sum_{i \in I}\left(g_i(x_{n_k}) - g_i(\hat{x})\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(\tilde{h}_j(x_{n_k}) - \tilde{h}_j(\hat{x})\right)^2\right) \ge \varepsilon^2$$
for each $k \ge 1$, and hence $f(x_{n_k}) \ge f(\hat{x}) + \varepsilon^2$ for each $k \ge 1$. Thanks to the continuity of
$f$, this leads to
$$f(x^*) = \lim_k f(x_{n_k}) \ge f(\hat{x}) + \varepsilon^2 > f(\hat{x})$$
which contradicts $f(\hat{x}) \ge f(x^*)$. This contradiction proves Fact 1. $\triangle$

Using Fact 1, we now prove a second property that we will need. Here we set $S = S_{\mathbb{R}^{|I|+|J|+1}} = \left\{x \in \mathbb{R}^{|I|+|J|+1} : \|x\| = 1\right\}$.

Fact 2. For each $\varepsilon \in (0, \hat{\varepsilon}]$, there exist $x^\varepsilon \in B_\varepsilon(\hat{x})$ and a vector
$$\left(\mu_0^\varepsilon, \lambda_1^\varepsilon, \dots, \lambda_{|I|}^\varepsilon, \mu_1^\varepsilon, \dots, \mu_{|J|}^\varepsilon\right) \in S$$
with $\mu_j^\varepsilon \ge 0$ for each $j \in J$, such that
$$\mu_0^\varepsilon\left(\frac{\partial f}{\partial x_z}(x^\varepsilon) - 2\left(x_z^\varepsilon - \hat{x}_z\right)\right) - \sum_{i \in I} \lambda_i^\varepsilon \frac{\partial g_i}{\partial x_z}(x^\varepsilon) - \sum_{j \in J \cap A(\hat{x})} \mu_j^\varepsilon \frac{\partial h_j}{\partial x_z}(x^\varepsilon) = 0 \tag{30.34}$$
for each $z = 1, \dots, n$.

Proof of Fact 2 Given $\varepsilon \in (0, \hat{\varepsilon}]$, let $N_\varepsilon > 0$ be the positive constant whose existence is
guaranteed by Fact 1. Define the function $\varphi_\varepsilon : A \subseteq \mathbb{R}^n \to \mathbb{R}$ as:
$$\varphi_\varepsilon(x) = f(x) - f(\hat{x}) - \|x - \hat{x}\|^2 - N_\varepsilon\left(\sum_{i \in I}\left(g_i(x) - g_i(\hat{x})\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(\tilde{h}_j(x) - \tilde{h}_j(\hat{x})\right)^2\right)$$
for each $x \in A$. We have $\varphi_\varepsilon(\hat{x}) = 0$ and, given how $N_\varepsilon$ has been chosen,
$$\varphi_\varepsilon(x) < 0, \quad \forall x \in S_\varepsilon(\hat{x}) \tag{30.35}$$
The function $\varphi_\varepsilon$ is continuous on the compact set $B_\varepsilon(\hat{x}) = \{x \in A : \|x - \hat{x}\| \le \varepsilon\}$ and, by
Weierstrass' Theorem, there exists $x^\varepsilon \in B_\varepsilon(\hat{x})$ such that $\varphi_\varepsilon(x^\varepsilon) \ge \varphi_\varepsilon(x)$ for each $x \in B_\varepsilon(\hat{x})$.
In particular, $\varphi_\varepsilon(x^\varepsilon) \ge \varphi_\varepsilon(\hat{x}) = 0$, and hence (30.35) implies that $\|x^\varepsilon - \hat{x}\| < \varepsilon$, that is,
$x^\varepsilon$ lies in the interior of

$B_\varepsilon(\hat{x})$. The point $x^\varepsilon$ is therefore a maximizer of $\varphi_\varepsilon$ on an open set, and by Fermat's Theorem
we have $\nabla \varphi_\varepsilon(x^\varepsilon) = 0$. Therefore, by (30.30), we have:
$$\frac{\partial f}{\partial x_z}(x^\varepsilon) - 2\left(x_z^\varepsilon - \hat{x}_z\right) - 2N_\varepsilon\left(\sum_{i \in I}\left(g_i(x^\varepsilon) - g_i(\hat{x})\right)\frac{\partial g_i}{\partial x_z}(x^\varepsilon) + \sum_{j \in J \cap A(\hat{x})}\tilde{h}_j(x^\varepsilon)\frac{\partial h_j}{\partial x_z}(x^\varepsilon)\right) = 0 \tag{30.36}$$
for each $z = 1, \dots, n$. Set:
$$c_\varepsilon = \left(1 + \sum_{i \in I}\left(2N_\varepsilon\left(g_i(x^\varepsilon) - g_i(\hat{x})\right)\right)^2 + \sum_{j \in J \cap A(\hat{x})}\left(2N_\varepsilon \tilde{h}_j(x^\varepsilon)\right)^2\right)^{1/2}, \qquad \mu_0^\varepsilon = \frac{1}{c_\varepsilon}$$
$$\lambda_i^\varepsilon = \frac{2N_\varepsilon\left(g_i(x^\varepsilon) - g_i(\hat{x})\right)}{c_\varepsilon} \;\; \forall i \in I, \qquad \mu_j^\varepsilon = \frac{2N_\varepsilon \tilde{h}_j(x^\varepsilon)}{c_\varepsilon} \;\; \forall j \in J \cap A(\hat{x}), \qquad \mu_j^\varepsilon = 0 \;\; \forall j \notin A(\hat{x})$$
so that (30.34) is obtained by dividing (30.36) by $c_\varepsilon$. Observe that $\mu_j^\varepsilon \ge 0$ for each $j \in J$
and that $\left(\mu_0^\varepsilon\right)^2 + \sum_{i \in I}\left(\lambda_i^\varepsilon\right)^2 + \sum_{j \in J}\left(\mu_j^\varepsilon\right)^2 = 1$, i.e., $\left(\mu_0^\varepsilon, \lambda_1^\varepsilon, \dots, \lambda_{|I|}^\varepsilon, \mu_1^\varepsilon, \dots, \mu_{|J|}^\varepsilon\right) \in S$. $\triangle$

Using Fact 2, we can now complete the proof. Take a decreasing sequence $\{\varepsilon_n\}_n \subseteq (0, \hat{\varepsilon}]$
with $\varepsilon_n \downarrow 0$, and consider the associated sequence $\left\{\left(\mu_0^n, \lambda_1^n, \dots, \lambda_{|I|}^n, \mu_1^n, \dots, \mu_{|J|}^n\right)\right\}_n \subseteq S$ whose
existence is guaranteed by Fact 2.

Since the sequence $\left\{\left(\mu_0^n, \lambda_1^n, \dots, \lambda_{|I|}^n, \mu_1^n, \dots, \mu_{|J|}^n\right)\right\}_n$ is contained in the compact set $S$, by
the Bolzano-Weierstrass Theorem there exists a subsequence
$$\left\{\left(\mu_0^{n_k}, \lambda_1^{n_k}, \dots, \lambda_{|I|}^{n_k}, \mu_1^{n_k}, \dots, \mu_{|J|}^{n_k}\right)\right\}_k$$
convergent in $S$, that is, there exists $\left(\mu_0, \lambda_1, \dots, \lambda_{|I|}, \mu_1, \dots, \mu_{|J|}\right) \in S$ such that
$$\left(\mu_0^{n_k}, \lambda_1^{n_k}, \dots, \lambda_{|I|}^{n_k}, \mu_1^{n_k}, \dots, \mu_{|J|}^{n_k}\right) \to \left(\mu_0, \lambda_1, \dots, \lambda_{|I|}, \mu_1, \dots, \mu_{|J|}\right)$$
By Fact 2, for each $\varepsilon_{n_k}$ there exists $x^{n_k} \in B_{\varepsilon_{n_k}}(\hat{x})$ for which (30.34) holds, i.e.,
$$\mu_0^{n_k}\left(\frac{\partial f}{\partial x_z}(x^{n_k}) - 2\left(x_z^{n_k} - \hat{x}_z\right)\right) - \sum_{i \in I}\lambda_i^{n_k}\frac{\partial g_i}{\partial x_z}(x^{n_k}) - \sum_{j \in J \cap A(\hat{x})}\mu_j^{n_k}\frac{\partial h_j}{\partial x_z}(x^{n_k}) = 0$$
for each $z = 1, \dots, n$. Consider the sequence $\{x^{n_k}\}_k$ so constructed. From $x^{n_k} \in B_{\varepsilon_{n_k}}(\hat{x})$ it
follows that $\|x^{n_k} - \hat{x}\| < \varepsilon_{n_k} \to 0$ and hence, for each $z = 1, \dots, n$,
$$\mu_0 \frac{\partial f}{\partial x_z}(\hat{x}) - \sum_{i \in I}\lambda_i\frac{\partial g_i}{\partial x_z}(\hat{x}) - \sum_{j \in J \cap A(\hat{x})}\mu_j\frac{\partial h_j}{\partial x_z}(\hat{x}) \tag{30.37}$$
$$= \lim_k \left(\mu_0^{n_k}\left(\frac{\partial f}{\partial x_z}(x^{n_k}) - 2\left(x_z^{n_k} - \hat{x}_z\right)\right) - \sum_{i \in I}\lambda_i^{n_k}\frac{\partial g_i}{\partial x_z}(x^{n_k}) - \sum_{j \in J \cap A(\hat{x})}\mu_j^{n_k}\frac{\partial h_j}{\partial x_z}(x^{n_k})\right) = 0$$

On the other hand, $\mu_0 \ne 0$. Indeed, if it were $\mu_0 = 0$, then by (30.37) it would follow that
$$\sum_{i \in I}\lambda_i\frac{\partial g_i}{\partial x_z}(\hat{x}) + \sum_{j \in J \cap A(\hat{x})}\mu_j\frac{\partial h_j}{\partial x_z}(\hat{x}) = 0 \quad \forall z = 1, \dots, n$$
The linear independence of the gradients associated to the binding constraints, which holds by
the regularity hypothesis, implies $\lambda_i = 0$ for each $i \in I$ and $\mu_j = 0$ for each $j \in J \cap A(\hat{x})$,
which contradicts $\left(\mu_0, \lambda_1, \dots, \lambda_{|I|}, \mu_1, \dots, \mu_{|J|}\right) \in S$.

In conclusion, if we set $\hat{\lambda}_i = \lambda_i / \mu_0$ for each $i \in I$ and $\hat{\mu}_j = \mu_j / \mu_0$ for each $j \in J$ (note
that $\mu_0 > 0$, being the limit of the positive scalars $\mu_0^{n_k}$, so that $\hat{\mu}_j \ge 0$), (30.37)
implies (30.8).
Chapter 31

General constraints

31.1 A general concave problem

The choice set of the optimization problem (30.4) of the previous chapter is identified by a
finite number of equality and inequality constraints expressed through suitable functions $g$
and $h$. In general, however, we may also require solutions to belong to a set $X$ that is not
necessarily identified through a finite number of functional constraints.^1 We thus have the
following optimization problem:
$$\max_x \; f(x) \quad \text{sub } g_i(x) = b_i \;\; \forall i \in I, \quad h_j(x) \le c_j \;\; \forall j \in J, \quad x \in X \tag{31.1}$$
where $X$ is a subset of $A$ and the other elements are as in the optimization problem (30.4).
This problem includes as special cases the optimization problems that we have seen so far:
we get back to the optimization problem (30.4) when $X = A$ and to an unconstrained
optimization problem when $I = J = \emptyset$ and $C = X$ is open.

Formulation (31.1) may also be useful when there are conditions on the sign or on the
value of the choice variables $x_i$. The classic example is the non-negativity condition on the $x_i$,
which is best expressed as a constraint $x \in \mathbb{R}_+^n$ rather than through $n$ inequalities $x_i \ge 0$.
Here a constraint of the form $x \in X$ simplifies the exposition.

In this chapter we want to address the general optimization problem (31.1). If $X$ is open,
the solution techniques of Section 30.2 can be easily adapted by restricting the analysis to
$X$ itself (which can play the role of the set $A$). Matters are more interesting when $X$ is
not open. Here we focus on the concave case of Section 30.4, widely used in applications.
Consequently, throughout the chapter we assume that $X$ is a closed and convex subset of
an open convex set $A$, as well as that $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is a concave differentiable objective
function, that the $g_i : \mathbb{R}^n \to \mathbb{R}$ are affine functions and that the $h_j : \mathbb{R}^n \to \mathbb{R}$ are convex differentiable
functions.^2

^1 Sometimes this distinction is made by talking of implicit and explicit constraints. Different authors, however, may give an opposite meaning to this terminology (which, in any case, we do not adopt).
^2 To ease matters, we define the functions $g_i$ and $h_j$ on the entire space $\mathbb{R}^n$.


31.2 Analysis of the black box

In canonical form, the optimization problem (31.1) has the form
$$\max_x \; f(x) \quad \text{sub } x \in C$$
where the choice set is
$$C = \{x \in X : g_i(x) = b_i \text{ and } h_j(x) \le c_j, \; \forall i \in I, \forall j \in J\} \tag{31.2}$$
The set $C$ is closed and convex. As is often the case, the best way to proceed is to abstract
from the specific problem at hand, with its potentially distracting details. For this reason,
we will consider the following optimization problem:
$$\max_x \; f(x) \quad \text{sub } x \in C \tag{31.3}$$
where $C$ is a generic closed and convex choice set that, for the moment, we treat as a black
box. Throughout this section we assume that $f$ is continuously differentiable on an open
convex set that contains $C$. The simplest case when this assumption holds is when $f$ is
continuously differentiable on its entire domain $A$.

31.2.1 Variational inequalities

We begin the analysis of the black box problem (31.3) with the simple scalar case
$$\max_x \; f(x) \quad \text{sub } x \in [a, b] \tag{31.4}$$
where $a, b \in \mathbb{R}$. Suppose that $\hat{x} \in [a, b]$ is a solution. It is easy to see that we can have two
cases:

(i) $\hat{x}$ is an interior point, i.e., $\hat{x} \in (a, b)$; in this case, $f'(\hat{x}) = 0$.

(ii) $\hat{x}$ is a boundary point, i.e., $\hat{x} \in \{a, b\}$; in this case, $f'(\hat{x}) \le 0$ if $\hat{x} = a$, while $f'(\hat{x}) \ge 0$ if $\hat{x} = b$.

The next result gives a simple and elegant way to unify these two cases.

Proposition 1317 If $\hat{x} \in [a, b]$ is a solution of the optimization problem (31.4), then
$$f'(\hat{x})(x - \hat{x}) \le 0 \quad \forall x \in [a, b] \tag{31.5}$$
The converse holds if $f$ is concave.

The proof of this result rests on the following lemma.

Lemma 1318 Condition (31.5) is equivalent to $f'(\hat{x}) = 0$ if $\hat{x} \in (a, b)$, to $f'(\hat{x}) \le 0$ if
$\hat{x} = a$, and to $f'(\hat{x}) \ge 0$ if $\hat{x} = b$.

Proof We divide the proof in three parts, one for each of the equivalences to prove.

(i) Let $\hat{x} \in (a, b)$. We prove that (31.5) is equivalent to $f'(\hat{x}) = 0$. If $f'(\hat{x}) = 0$ holds,
then $f'(\hat{x})(x - \hat{x}) = 0$ for each $x \in [a, b]$, and hence (31.5) holds. Vice versa, suppose that
(31.5) holds. Setting $x = a$, we have $(a - \hat{x}) < 0$ and so (31.5) implies $f'(\hat{x}) \ge 0$. On
the other hand, setting $x = b$, we have $(b - \hat{x}) > 0$ and so (31.5) implies $f'(\hat{x}) \le 0$. In
conclusion, $\hat{x} \in (a, b)$ implies $f'(\hat{x}) = 0$.

(ii) Let $\hat{x} = a$. We prove that (31.5) is equivalent to $f'(a) \le 0$. Let $f'(a) \le 0$. Since
$(x - a) \ge 0$ for each $x \in [a, b]$, it follows that $f'(a)(x - a) \le 0$ for each $x \in [a, b]$, and hence
(31.5) holds. Vice versa, suppose that (31.5) holds. By taking $x \in (a, b]$, we have $(x - a) > 0$
and so (31.5) implies $f'(a) \le 0$.

(iii) Let $\hat{x} = b$. We prove that (31.5) is equivalent to $f'(b) \ge 0$. Let $f'(b) \ge 0$. Since
$(x - b) \le 0$ for each $x \in [a, b]$, we have $f'(b)(x - b) \le 0$ for each $x \in [a, b]$ and (31.5) holds.
Vice versa, suppose that (31.5) holds. By taking $x \in [a, b)$, we have $(x - b) < 0$ and so (31.5)
implies $f'(b) \ge 0$.

Proof of Proposition 1317 In view of Lemma 1318, it only remains to prove that (31.5)
becomes a sufficient condition when $f$ is concave. Suppose, therefore, that $f$ is concave and
that $\hat{x} \in [a, b]$ is such that (31.5) holds. We prove that this implies that $\hat{x}$ is a solution of
problem (31.4). Indeed, by (24.7) we have $f(x) \le f(\hat{x}) + f'(\hat{x})(x - \hat{x})$ for each $x \in [a, b]$,
which implies $f(x) - f(\hat{x}) \le f'(\hat{x})(x - \hat{x})$ for each $x \in [a, b]$. Thus, (31.5) implies that
$f(x) - f(\hat{x}) \le 0$, that is, $f(x) \le f(\hat{x})$ for each $x \in [a, b]$. Hence, $\hat{x}$ solves the optimization
problem (31.4).

The inequality (31.5) that $\hat{x}$ satisfies is an example of a variational inequality. Besides
unifying the two cases, this variational inequality is interesting because, when $f$ is concave,
it provides a necessary and sufficient condition for a point to be a solution of the optimization
problem. Even more interesting is the fact that this characterization can be naturally
extended to the multivariable case.

Theorem 1319 (Stampacchia) If $\hat{x} \in C$ is a solution of the optimization problem (31.3),
then it satisfies the variational inequality
$$\nabla f(\hat{x}) \cdot (x - \hat{x}) \le 0 \quad \forall x \in C \tag{31.6}$$
The converse holds if $f$ is concave.

As in the scalar case, the variational inequality unifies the necessary optimality conditions
for interior and boundary points. Indeed, it is easy to check that, when $\hat{x}$ is an interior point
of $C$, (31.6) reduces to the classic first-order condition $\nabla f(\hat{x}) = 0$ of Fermat's Theorem.

Proof Let $\hat{x} \in C$ be a solution of the optimization problem (31.3), i.e., $f(\hat{x}) \ge f(x)$ for each
$x \in C$. Given $x \in C$, set $z_t = \hat{x} + t(x - \hat{x})$ for $t \in [0, 1]$. Since $C$ is convex, $z_t \in C$ for each

$t \in [0, 1]$. Define $\varphi : [0, 1] \to \mathbb{R}$ by $\varphi(t) = f(z_t)$. Since $f$ is differentiable at $\hat{x}$, we have
$$\varphi'_+(0) = \lim_{t \to 0^+} \frac{\varphi(t) - \varphi(0)}{t} = \lim_{t \to 0^+} \frac{f(\hat{x} + t(x - \hat{x})) - f(\hat{x})}{t}$$
$$= \lim_{t \to 0^+} \frac{df(\hat{x})(t(x - \hat{x})) + o(\|t(x - \hat{x})\|)}{t}$$
$$= df(\hat{x})(x - \hat{x}) + \lim_{t \to 0^+} \frac{o(t\|x - \hat{x}\|)}{t} = df(\hat{x})(x - \hat{x}) = \nabla f(\hat{x}) \cdot (x - \hat{x})$$
For each $t \in [0, 1]$ we have $\varphi(0) = f(\hat{x}) \ge f(z_t) = \varphi(t)$, and so $\varphi : [0, 1] \to \mathbb{R}$ has a (global)
maximizer at $t = 0$. It follows that $\varphi'_+(0) \le 0$, which implies $\nabla f(\hat{x}) \cdot (x - \hat{x}) \le 0$, as desired.

As to the converse, assume that $f$ is concave. By (24.18), $f(x) \le f(\hat{x}) + \nabla f(\hat{x}) \cdot (x - \hat{x})$
for each $x \in C$, and therefore (31.6) implies $f(x) \le f(\hat{x})$ for each $x \in C$.

For the dual minimum problems, the variational inequality is easily seen to take the dual
form $\nabla f(\hat{x}) \cdot (x - \hat{x}) \ge 0$. For interior solutions, instead, the condition $\nabla f(\hat{x}) = 0$ is the
same in both maximization and minimization problems.^3
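The variational inequality (31.6) is easy to probe by sampling. In the sketch below the set $C = [0,1]^2$ and the concave objective $f(x) = -(x_1 - 2)^2 - x_2^2$ are illustrative choices of ours; the solution is $\hat{x} = (1, 0)$, and (31.6) holds at every sampled point of $C$.

```python
# Monte Carlo check of the variational inequality (31.6) on C = [0,1]^2
# for the concave objective f(x) = -(x1-2)^2 - x2^2, maximized at (1, 0).
import numpy as np

x_hat = np.array([1.0, 0.0])
grad_f = lambda x: np.array([-2 * (x[0] - 2), -2 * x[1]])

rng = np.random.default_rng(1)
xs = rng.uniform(0, 1, size=(10_000, 2))        # sampled points of C
lhs = (xs - x_hat) @ grad_f(x_hat)              # grad f(x_hat) . (x - x_hat)
print(lhs.max() <= 1e-12)                       # True: (31.6) holds on C
```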

31.2.2 A general first-order condition

The normal cone $N_C(x)$ of a convex set $C$ with respect to a point $x \in C$ is given by
$$N_C(x) = \{y \in \mathbb{R}^n : y \cdot (x' - x) \le 0 \;\; \forall x' \in C\}$$
Next we provide a couple of important properties of $N_C(x)$. In particular, (ii) shows that
$N_C(x)$ is non-trivial only if $x$ is a boundary point.

Lemma 1320 (i) $N_C(x)$ is a closed and convex cone;

(ii) $N_C(x) = \{0\}$ if and only if $x$ is an interior point of $C$.

Proof (i) The set $N_C(x)$ is clearly closed. Moreover, given $y, z \in N_C(x)$ and $\alpha, \beta \ge 0$, we
have
$$(\alpha y + \beta z) \cdot (x' - x) = \alpha\, y \cdot (x' - x) + \beta\, z \cdot (x' - x) \le 0 \quad \forall x' \in C$$
and so $\alpha y + \beta z \in N_C(x)$. By Proposition 699, $N_C(x)$ is a convex cone. (ii) We only prove
the "if" part. Let $x$ be an interior point of $C$. Suppose, by contradiction, that there is a
vector $y \ne 0$ in $N_C(x)$. As $x$ is interior, we have that $x + ty \in C$ for $t > 0$ sufficiently
small. Hence we would have $y \cdot (x + ty - x) = t\, y \cdot y = t\|y\|^2 \le 0$. This implies $y = 0$, a
contradiction. Hence $N_C(x) = \{0\}$.

To see the importance of normal cones, note that condition (31.6) can be written as:
$$\nabla f(\hat{x}) \in N_C(\hat{x}) \tag{31.7}$$

^3 The unifying power of variational inequalities in optimization is the outcome of a few works of Guido Stampacchia in the early 1960s. For an overview, see Kinderlehrer and Stampacchia (1980).

Therefore, $\hat{x}$ solves the optimization problem (31.3) only if the gradient $\nabla f(\hat{x})$ belongs to the
normal cone of $C$ with respect to $\hat{x}$. This way of writing condition (31.6) is useful because,
given a set $C$, if we can describe the form of the normal cone (something that does not
require any knowledge of the objective function $f$) we can then have a sense of which form
the "first-order condition" takes for the optimization problems that have $C$ as a choice set.

In other words, (31.7) can be seen as a general first-order condition that permits us to
distinguish in such condition the part, $N_C(\hat{x})$, determined by the constraint $C$, and the
part, $\nabla f(\hat{x})$, determined by the objective function. This distinction between the roles of the
objective function and of the constraint is illuminating.^4

The next result characterizes the normal cone for convex cones.

Proposition 1321 If $C$ is a convex cone and $x \in C$, then
$$N_C(x) = \{y \in \mathbb{R}^n : y \cdot x = 0 \text{ and } y \cdot x' \le 0 \;\; \forall x' \in C\}$$
If, in addition, $C$ is a vector subspace, then $N_C(x) = C^\perp$ for every $x \in C$.

Proof Let $y \in N_C(x)$. Then $y \cdot (x' - x) \le 0$ for all $x' \in C$. As $0 \in C$, we have $y \cdot (0 - x) \le 0$.
Hence $y \cdot x \ge 0$. On the other hand, we can write $y \cdot x = y \cdot (2x - x) \le 0$. It follows that
$y \cdot x = 0$. In turn, $y \cdot x' = y \cdot (x' - x) \le 0$ for each $x' \in C$. Conversely, if $y$ satisfies the
two conditions $y \cdot x = 0$ and $y \cdot x' \le 0$ for each $x' \in C$, then $y \cdot (x' - x) = y \cdot x' - y \cdot x \le 0$,
and so $y \in N_C(x)$. Suppose now, in addition, that $C$ is a vector subspace. A subspace
$C$ is a cone such that $x' \in C$ implies $-x' \in C$. Hence, the first part of the proof yields
$N_C(x) = \{y \in \mathbb{R}^n : y \cdot x = 0 \text{ and } y \cdot x' = 0 \;\; \forall x' \in C\}$. Since $x \in C$, we then have $N_C(x) = \{y \in \mathbb{R}^n : y \cdot x' = 0 \;\; \forall x' \in C\} = C^\perp$.

Example 1322 If $C = \mathbb{R}_+^n$, we have:
$$N_C(x) = \{y \in \mathbb{R}^n : y_i x_i = 0 \text{ and } y_i \le 0 \;\; \forall i = 1, \dots, n\} \tag{31.8}$$
Indeed, we have $y_i \le 0$ for each $i$ since $y_i = y \cdot e^i \le 0$. Hence, $y_i x_i \le 0$ for each $i$, which in
turn implies $y_i x_i = 0$ for each $i$ because $y \cdot x = 0$. N

This result implies that, given a closed and convex cone $C$, a point $\hat{x}$ satisfies the first-order
condition (31.7) when
$$\nabla f(\hat{x}) \cdot \hat{x} = 0 \tag{31.9}$$
$$\nabla f(\hat{x}) \cdot x \le 0 \quad \forall x \in C \tag{31.10}$$
The first-order condition is thus easier to check on cones. Even more so in the important
special case $C = \mathbb{R}_+^n$, when from (31.8) it follows that conditions (31.9) and (31.10) reduce
to the following $n$ equalities and $n$ inequalities,
$$\frac{\partial f(\hat{x})}{\partial x_i}\,\hat{x}_i = 0 \tag{31.11}$$
$$\frac{\partial f(\hat{x})}{\partial x_i} \le 0 \tag{31.12}$$
for each $i = 1, \dots, n$.

^4 For a thorough account of this important viewpoint, we refer readers to Rockafellar (1993).
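Conditions (31.11)-(31.12) are immediate to check numerically. In the sketch below the concave objective $f(x) = -\|x - a\|^2$ with $a = (1, -1)$ is an illustrative choice of ours; its maximizer on $\mathbb{R}_+^2$ is $\hat{x} = (1, 0)$, which satisfies both conditions.

```python
# Checking conditions (31.11)-(31.12) on C = R^n_+ for f(x) = -||x - a||^2.
import numpy as np

a = np.array([1.0, -1.0])
grad_f = lambda x: -2 * (x - a)

x_hat = np.array([1.0, 0.0])            # maximizer of f on the orthant
g = grad_f(x_hat)                       # = (0, -2)
print(np.allclose(g * x_hat, 0),        # (31.11): grad_i * x_i = 0 for all i
      np.all(g <= 0))                   # (31.12): grad_i <= 0 for all i
```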

We can also characterize the normal cones of the simplex $\Delta_{n-1} = \left\{x \in \mathbb{R}_+^n : \sum_{k=1}^n x_k = 1\right\}$,
another all-important class of closed and convex sets. To this end, given $x \in \Delta_{n-1}$ set
$$I(x) = \{y \in \mathbb{R}^n : y_i = 1 \text{ if } i \in P(x) \text{ and } y_i \le 1 \text{ if } i \notin P(x)\}$$
where $P(x) = \{i : x_i > 0\}$.

Proposition 1323 We have $N_{\Delta_{n-1}}(x) = \{\alpha y \in \mathbb{R}^n : y \in I(x) \text{ and } \alpha \ge 0\}$.

The set $\{\alpha y \in \mathbb{R}^n : y \in I(x) \text{ and } \alpha \ge 0\}$ is easily seen to be the smallest convex cone
that contains $I(x)$. The normal cone is thus such a set.

Example 1324 If $x = (1/3, 0, 2/3) \in \Delta_2$, we have $I(x) = \{(1, y_2, 1) : y_2 \le 1\}$ and $N_{\Delta_2}(x) = \{(\alpha, y_2, \alpha) : y_2 \le \alpha \text{ and } \alpha \ge 0\}$. N

In view of this characterization, a point $\hat{x} \in \Delta_{n-1}$ satisfies the first-order condition (31.7)
if and only if there is a scalar $\hat{\alpha} \ge 0$ such that
$$\frac{\partial f(\hat{x})}{\partial x_i} = \hat{\alpha} \;\text{ if } \hat{x}_i > 0, \qquad \frac{\partial f(\hat{x})}{\partial x_i} \le \hat{\alpha} \;\text{ if } \hat{x}_i = 0$$
that is, when
$$\frac{\partial f(\hat{x})}{\partial x_i} \le \hat{\alpha} \quad \forall i = 1, \dots, n \tag{31.13}$$
$$\left(\frac{\partial f(\hat{x})}{\partial x_i} - \hat{\alpha}\right)\hat{x}_i = 0 \quad \forall i = 1, \dots, n \tag{31.14}$$

Proof of Proposition 1323 Suppose that $P(x)$ is not a singleton and let $i, j \in P(x)$.
Clearly, $0 < x_i, x_j < 1$. Consider the points $x^\varepsilon \in \mathbb{R}^n$ having coordinates $x_i^\varepsilon = x_i + \varepsilon$,
$x_j^\varepsilon = x_j - \varepsilon$, and $x_k^\varepsilon = x_k$ for all $k \ne i$ and $k \ne j$, while the parameter $\varepsilon$ runs over $[-\varepsilon_0, \varepsilon_0]$
with $\varepsilon_0 > 0$ sufficiently small in order that $x^\varepsilon \ge 0$ for $\varepsilon \in [-\varepsilon_0, \varepsilon_0]$. Note that $\sum_{k=1}^n x_k^\varepsilon = 1$
and so $x^\varepsilon \in \Delta_{n-1}$. Let $y \in N_{\Delta_{n-1}}(x)$. By definition, $y \cdot (x^\varepsilon - x) \le 0$ for every $\varepsilon \in [-\varepsilon_0, \varepsilon_0]$.
Namely, $\varepsilon y_i - \varepsilon y_j = \varepsilon(y_i - y_j) \le 0$, which implies $y_i = y_j$. Hence, it must hold that $y_i = \alpha$ for
all $i \in P(x)$. That is, the values of $y$ must be constant on $P(x)$. This is trivially true when
$P(x)$ is a singleton. Let now $j \notin P(x)$. Consider the vector $x^j \in \mathbb{R}^n$, where $x_j^j = 1$ and $x_k^j = 0$
for each $k \ne j$. If $y \in N_{\Delta_{n-1}}(x)$, then $y \cdot (x^j - x) \le 0$. That is,
$$y_j - \sum_{k \ne j} y_k x_k = y_j - \sum_{k \in P(x)} y_k x_k = y_j - \alpha \sum_{k \in P(x)} x_k = y_j - \alpha \le 0$$
Therefore, $N_{\Delta_{n-1}}(x) \subseteq \{\alpha y \in \mathbb{R}^n : y \in I(x) \text{ and } \alpha \ge 0\}$. We now show the converse inclusion.
Let $y \in \mathbb{R}^n$ be such that, for some $\alpha \ge 0$, we have $y_i = \alpha$ for all $i \in P(x)$ and $y_k \le \alpha$

for each $k \notin P(x)$. If $x' \in \Delta_{n-1}$, then
$$y \cdot (x' - x) = \sum_{i=1}^n y_i (x_i' - x_i) = \sum_{i \in P(x)} y_i (x_i' - x_i) + \sum_{i \notin P(x)} y_i x_i'$$
$$= \alpha \sum_{i \in P(x)} (x_i' - x_i) + \sum_{i \notin P(x)} y_i x_i' = \alpha\left(\sum_{i \in P(x)} x_i' - 1\right) + \sum_{i \notin P(x)} y_i x_i'$$
$$\le \alpha\left(\sum_{i \in P(x)} x_i' - 1\right) + \alpha \sum_{i \notin P(x)} x_i' = \alpha\left(\sum_{i=1}^n x_i' - 1\right) = 0$$
Hence $y \in N_{\Delta_{n-1}}(x)$.

31.2.3 Divide et impera

Often the choice set $C$ may be written as an intersection $C = C_1 \cap \cdots \cap C_n$. A natural question
is whether the $n$ relaxed optimization problems that correspond to the larger choice sets $C_i$
can then be combined to inform us about the original optimization problem. The next result is
key, as it provides a condition under which an "intersection rule" for normal cones holds. It
involves the sum
$$\sum_{i=1}^n N_{C_i}(x) = \left\{\sum_{i=1}^n y_i : y_i \in N_{C_i}(x) \;\; \forall i = 1, \dots, n\right\}$$
of the normal cones (cf. Section 32.3).

Proposition 1325 Let $C = C_1 \cap \cdots \cap C_n$, with each $C_i$ closed and convex. Then, for all
$x \in C$,
$$\sum_{i=1}^n N_{C_i}(x) \subseteq N_C(x)$$
Equality holds if $C$ satisfies Slater's condition $\operatorname{int} C_1 \cap \cdots \cap \operatorname{int} C_n \ne \emptyset$, where the set $C_i$ itself
can replace its interior $\operatorname{int} C_i$ if it is affine.

Proof Let $x \in C$. Suppose $y = \sum_{i=1}^n y_i$, with $y_i \in N_{C_i}(x)$ for every $i = 1, \dots, n$. Then,
$y \cdot (x' - x) = \sum_{i=1}^n y_i \cdot (x' - x) \le 0$ for each $x' \in C$, and so $y \in N_C(x)$. This proves the inclusion. We omit
the proof that Slater's condition implies the equality.

Example 1326 Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$. (i) Let $C_1 = \{x \in \mathbb{R}^n : Ax \le b\}$ and
$C_2 = \mathbb{R}_+^n$. We have $\operatorname{int} C_1 = \{x \in \mathbb{R}^n : Ax < b\}$ and $\operatorname{int} C_2 = \mathbb{R}_{++}^n$. The set $C = C_1 \cap C_2$
satisfies Slater's condition when $\operatorname{int} C_1 \cap \operatorname{int} C_2 \ne \emptyset$, that is, if and only if there exists $x \in \mathbb{R}_{++}^n$
such that $Ax < b$. In this case, by the last proposition $N_{C_1}(x) + N_{C_2}(x) = N_C(x)$. (ii)
Let $C_1 = \{x \in \mathbb{R}^n : Ax = b\}$ and $C_2 = \mathbb{R}_+^n$. Since $C_1$ is affine, the set $C = C_1 \cap C_2$ satisfies
Slater's condition when $C_1 \cap \operatorname{int} C_2 \ne \emptyset$, that is, if and only if there exists $x \in \mathbb{R}_{++}^n$ such that
$Ax = b$. Again, in this case by the last proposition we have $N_{C_1}(x) + N_{C_2}(x) = N_C(x)$. N

In words, under Slater's condition the normal cone of an intersection of sets is the sum
of their normal cones. Hence, a point $\hat{x}$ satisfies the first-order condition (31.7) if and only
if there is a vector $(\hat{y}_1, \dots, \hat{y}_n)$ such that
$$\nabla f(\hat{x}) = \sum_{i=1}^n \hat{y}_i, \qquad \hat{y}_i \in N_{C_i}(\hat{x}) \;\; \forall i = 1, \dots, n$$
A familiar "multipliers" format emerges. The next section will show how Kuhn-Tucker's
Theorem fits in this general framework.

31.3 Resolution of the general concave problem

We can now get out of the black box and extend Kuhn-Tucker's Theorem to the general concave optimization problem (31.1). Its choice set (31.2) is

$$C = X \cap \bigcap_{i \in I} C_i \cap \bigcap_{j \in J} C_j$$

where $C_i = (g_i = b_i)$ and $C_j = (h_j \le c_j)$.

Lemma 1327 The set $C$ satisfies Slater's condition if there is $\bar{x} \in \operatorname{int} X$ such that $g_i(\bar{x}) = b_i$ for all $i \in I$ and $h_j(\bar{x}) < c_j$ for all $j \in J$.

Proof The level sets $C_i$ are affine (Proposition 662). Since $\bar{x} \in \operatorname{int} X \cap \bigcap_{i \in I} C_i \cap \bigcap_{j \in J} \operatorname{int} C_j$, such intersection is non-empty and so $C$ satisfies Slater's condition.

In what follows we thus assume the existence of such $\bar{x}$.⁵ In view of Proposition 1325, it now becomes key to characterize the normal cones of the sets $C_i$ and $C_j$.

Lemma 1328 (i) For each $x \in C_i$, we have $N_{C_i}(x) = \{\lambda \nabla g_i(x) : \lambda \in \mathbb{R}\}$; (ii) for each $x \in \mathbb{R}^n$, we have

$$N_{C_j}(x) = \begin{cases} \{\lambda \nabla h_j(x) : \lambda \ge 0\} & \text{if } h_j(x) = c_j \\ \{0\} & \text{if } h_j(x) < c_j \\ \emptyset & \text{if } h_j(x) > c_j \end{cases}$$

Proof We only prove (ii) when $h_j(x) = c_j$. Assume $c_j = 0$ (otherwise, it is enough to consider the convex function $h_j - c_j$). Let $h_j(x) = 0$. We claim that $\{\lambda \nabla h_j(x) : \lambda \ge 0\} = N_{C_j}(x)$. Let $y \in N_{C_j}(x)$. Since $h_j(x) = 0$, we have $h_j(x') \le h_j(x) + y \cdot (x' - x)$ for all $x' \in C_j$, and so $y = \lambda \nabla h_j(x)$ since $h_j$ is differentiable at $x$ (cf. Theorem 1139). Conversely, if $y = \lambda \nabla h_j(x)$ for some $\lambda \ge 0$, then by the convexity of $h_j$ we have $0 \ge \lambda h_j(x') \ge y \cdot (x' - x)$ since $h_j(x) = 0$ and $x' \in C_j$. Hence, $\lambda \nabla h_j(x) \in N_{C_j}(x)$. We omit the cases $h_j(x) < 0$ and $h_j(x) > 0$.

⁵ This also ensures that the problem is well posed in the sense of Definition 1299.

Along with Proposition 1325, this lemma implies

$$N_C(x) = \left\{ \nu + \sum_{i \in I} \lambda_i \nabla g_i(x) + \sum_{j \in A(x)} \mu_j \nabla h_j(x) : \nu \in N_X(x), \ \lambda_i \in \mathbb{R} \ \forall i \in I, \ \mu_j \ge 0 \ \forall j \in A(x) \right\}$$

where $A(x)$ is the collection of the binding inequality constraints defined in (30.7). Since here the first order condition (31.7) is a necessary and sufficient optimality condition, we can say that $\hat{x} \in C$ solves the optimization problem (31.1) if and only if there exists a triple of vectors $(\hat{\lambda}, \hat{\mu}, \hat{\nu}) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+ \times \mathbb{R}^n$, with $\hat{\nu} \in N_X(\hat{x})$, such that

$$\nabla f(\hat{x}) = \hat{\nu} + \sum_{i \in I} \hat{\lambda}_i \nabla g_i(\hat{x}) + \sum_{j \in J} \hat{\mu}_j \nabla h_j(\hat{x}) \tag{31.15}$$

$$\hat{\mu}_j (c_j - h_j(\hat{x})) = 0 \qquad \forall j \in J \tag{31.16}$$

Indeed, as we noted in Lemma 1303, condition (31.16) amounts to requiring $\hat{\mu}_j = 0$ for each $j \notin A(\hat{x})$.

To sum up, under Slater's condition we get back the Kuhn-Tucker conditions (30.8) and (30.9), suitably modified to cope with the new constraint $x \in X$. We leave to the reader the formulation of these conditions via a Lagrangian function.
Example 1329 Let $X = \mathbb{R}^n_+$. By (31.8), $\hat{\nu}_k \hat{x}_k = 0$ and $\hat{\nu}_k \le 0$ for each $k = 1, ..., n$. By (31.15), we have

$$\hat{\nu} = \nabla f(\hat{x}) - \sum_{i \in I} \hat{\lambda}_i \nabla g_i(\hat{x}) - \sum_{j \in J} \hat{\mu}_j \nabla h_j(\hat{x}) \tag{31.17}$$

So, conditions (31.15) and (31.16) can be equivalently written (with gradients unzipped) as:

$$\frac{\partial f(\hat{x})}{\partial x_k} \le \sum_{i \in I} \hat{\lambda}_i \frac{\partial g_i(\hat{x})}{\partial x_k} + \sum_{j \in J} \hat{\mu}_j \frac{\partial h_j(\hat{x})}{\partial x_k} \qquad \forall k = 1, ..., n$$

$$\hat{\mu}_j (c_j - h_j(\hat{x})) = 0 \qquad \forall j \in J$$

$$\left( \frac{\partial f(\hat{x})}{\partial x_k} - \sum_{i \in I} \hat{\lambda}_i \frac{\partial g_i(\hat{x})}{\partial x_k} - \sum_{j \in J} \hat{\mu}_j \frac{\partial h_j(\hat{x})}{\partial x_k} \right) \hat{x}_k = 0 \qquad \forall k = 1, ..., n$$

In this formulation, we can omit $\hat{\nu}$. N
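To see these componentwise conditions at work, here is a minimal Python sketch on a made-up concave problem (SciPy's SLSQP as solver); recovering the multiplier from a strictly positive coordinate of the solution is the point of the exercise:

```python
import numpy as np
from scipy.optimize import minimize

# Example 1329 in action on a made-up problem with X = R^2_+:
#   max f(x) = log(1+x1) + log(1+x2)   sub   h(x) = x1 + 3*x2 <= 1, x >= 0.
f = lambda x: np.log(1 + x[0]) + np.log(1 + x[1])
grad_f = lambda x: np.array([1 / (1 + x[0]), 1 / (1 + x[1])])
grad_h = np.array([1.0, 3.0])

res = minimize(lambda x: -f(x), x0=np.array([0.3, 0.2]), bounds=[(0, None)] * 2,
               constraints=[{'type': 'ineq', 'fun': lambda x: 1 - grad_h @ x}])
x_hat = res.x                                   # approximately (1, 0)

# Recover mu_hat from a coordinate where x_hat > 0, then check the
# componentwise Kuhn-Tucker conditions of Example 1329.
k = np.argmax(x_hat)
mu_hat = grad_f(x_hat)[k] / grad_h[k]           # 0.5 here
slack = grad_f(x_hat) - mu_hat * grad_h         # must be <= 0, with = 0 where x_hat > 0
print(np.round(x_hat, 4), round(mu_hat, 4), np.round(slack, 4))
```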


Example 1330 Let X = n 1 . By (31.13) and (31.14), ^ 2 NX (^ x) if and only if there is
some ^ 0 such that ^ k ^ and (^ ^ ) x^k = 0 for every k = 1; :::; n. In view of (31.17),
we can say that x ^ 2 C solves the optimization problem (31.1) if and only if there exists a
^ jJj
triple ( ; ^ ; ^ ) 2 RjIj R+ R+ such that
@f (^ x) X x) X @hj (^ x)
^ i @gi (^ ^j ^ 8k = 1; :::; n
@xk @xk @xk
i2I j2J

^ j (c hj (^
x)) = 0 8j 2 J
0 1
@ @f (^x) X ^ @gi (^ x) X @hj (^ x)
i ^j ^A x
^k = 0 8k = 1; :::; n
@xk @xk @xk
i2I j2J

In this formulation, we replace the vector ^ with the scalar ^ . N



Variational inequalities provided a third approach to theorems à la Lagrange/Kuhn-Tucker. Indeed, Lagrange's Theorem was proved using the Implicit Function Theorem (Lemma 1285) and Kuhn-Tucker's Theorem using a penalization technique (Lemma 1302). Different techniques may require different regularity conditions. For instance, Slater's condition comes up in using variational inequalities, while a linear independence condition was used in the previous chapter (Definition 1301). In general, they provide different angles on the multipliers format. A final, deep and surprising, game theoretic angle will be discussed later in the book (Section 34.5.2).
Chapter 32

Intermezzo: correspondences

32.1 Definition and basic notions

The notion of correspondence generalizes that of function by permitting that multiple elements of the codomain, not a single one as the notion of function requires, be associated to an element of the domain. Correspondences play an important role in economic applications, which actually provided a main motivation for their study. In this section we introduce them.
Specifically, given any two sets $X$ and $Y$, a correspondence $\varphi : X \rightrightarrows Y$ is a rule that, to each element $x \in X$, associates a non-empty subset $\varphi(x)$ of $Y$, the image of $x$ under $\varphi$. The set $X$ is the domain of $\varphi$ and $Y$ is the codomain.
When $\varphi(x)$ is a singleton for all $x \in X$, the correspondence reduces to a function $\varphi : X \to Y$. In what follows, whenever $\varphi(x)$ is a singleton, say $\{y\}$, with a small abuse of notation we will write either $\varphi(x) = \{y\}$ or $\varphi(x) = y$.

Example 1331 (i) The correspondence $\varphi : \mathbb{R} \rightrightarrows \mathbb{R}$ given by $\varphi(x) = [-|x|, |x|]$ associates to each scalar $x$ the interval $[-|x|, |x|]$. For instance, $\varphi(1) = \varphi(-1) = [-1, 1]$ and $\varphi(0) = \{0\}$.
(ii) Given a consumption set $A = [0, b]$ with $b \in \mathbb{R}^n_{++}$, the budget correspondence $B : \mathbb{R}^n_+ \times \mathbb{R}_+ \rightrightarrows \mathbb{R}^n_+$ defined by $B(p, w) = \{x \in A : p \cdot x \le w\}$ associates to each pair $(p, w)$ of prices and income the corresponding budget set.
(iii) Given a concave function $f : \mathbb{R}^n \to \mathbb{R}$, the superdifferential correspondence $\partial f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ has as image $\partial f(x)$ the superdifferential of $f$ at $x$ (cf. Proposition 1143). The superdifferential correspondence generalizes for concave functions the derivative operator $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ defined in (21.6).
(iv) Let $f : X \to Y$ be a function between any two sets $X$ and $Y$. The inverse correspondence $f^{-1} : \operatorname{Im} f \rightrightarrows X$ is defined by $f^{-1}(y) = \{x \in X : f(x) = y\}$. If $f$ is injective, we get back to the inverse function $f^{-1} : \operatorname{Im} f \to X$. For instance, if $f : \mathbb{R} \to \mathbb{R}$ is the quadratic function $f(x) = x^2$, then $\operatorname{Im} f = [0, \infty)$ and so the inverse correspondence $f^{-1} : [0, \infty) \rightrightarrows \mathbb{R}$ is defined by

$$f^{-1}(y) = \{-\sqrt{y}, \sqrt{y}\}$$

for all $y \ge 0$. Recall that in Example 170 we argued that this rule does not define a function since, to each strictly positive scalar, it associates two elements of the codomain, i.e., its positive and negative square roots. N
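Computationally, a correspondence is just a map returning a set-like object. A minimal Python sketch, with our own crude encodings for two of the examples above:

```python
import numpy as np

def phi(x):
    """phi(x) = [-|x|, |x|] of Example 1331(i), encoded as an endpoint pair."""
    return (-abs(x), abs(x))

def budget_sample(p, w, b, m=50):
    """Finite sample of the budget set B(p, w) = {x in [0, b] : p.x <= w} in R^2,
    a crude finite stand-in for the budget correspondence of Example 1331(ii)."""
    axes = [np.linspace(0, bi, m) for bi in b]
    grid = np.array([[u, v] for u in axes[0] for v in axes[1]])
    return grid[grid @ p <= w]

print(phi(-2))                                                # (-2, 2)
B = budget_sample(np.array([1.0, 2.0]), 1.0, np.array([1.0, 1.0]))
print(len(B), B.max(axis=0))    # number of sampled bundles, componentwise max
```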


The graph $\operatorname{Gr} \varphi$ of a correspondence $\varphi : X \rightrightarrows Y$ is the set

$$\operatorname{Gr} \varphi = \{(x, y) \in X \times Y : y \in \varphi(x)\}$$

Like the graph of a function, the graph of a correspondence is a subset of $X \times Y$. If $\varphi$ is a function, we get back to the notion of graph of a function, $\operatorname{Gr} \varphi = \{(x, y) \in X \times Y : y = \varphi(x)\}$. Indeed, condition $y \in \varphi(x)$ reduces to $y = \varphi(x)$ when each image $\varphi(x)$ is a singleton.

Example 1332 (i) The graph of the correspondence $\varphi : \mathbb{R} \rightrightarrows \mathbb{R}$ given by $\varphi(x) = [-|x|, |x|]$ is $\operatorname{Gr} \varphi = \{(x, y) \in \mathbb{R}^2 : -|x| \le y \le |x|\}$.
(ii) The graph of the budget correspondence $B : \mathbb{R}^n_+ \times \mathbb{R}_+ \rightrightarrows \mathbb{R}^n_+$ is

$$\operatorname{Gr} B = \{(p, w, x) \in \mathbb{R}^n_+ \times \mathbb{R}_+ \times A : x \in B(p, w)\}$$

From now on we consider correspondences $\varphi : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ that have as domain a subset of $\mathbb{R}^n$ and as codomain $\mathbb{R}^m$. We say that such a $\varphi$ is:

(i) closed-valued if $\varphi(x)$ is a closed subset for every $x \in A$;

(ii) compact-valued if $\varphi(x)$ is a compact subset for every $x \in A$;

(iii) convex-valued if $\varphi(x)$ is a convex subset for every $x \in A$.

Functions are, trivially, both compact-valued and convex-valued because singletons are compact convex sets. Let us see an important economic example.

Example 1333 Suppose that the consumption set $A$ is both closed and convex, say it is $\mathbb{R}^n_+$. Then, the budget correspondence is convex-valued, as well as compact-valued if $p \gg 0$ and $w > 0$, that is, when restricted to $\mathbb{R}^n_{++} \times \mathbb{R}_{++}$ (cf. Proposition 792). N

The graph of a correspondence $\varphi : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is the subset of $A \times \mathbb{R}^m$ given by $\operatorname{Gr} \varphi = \{(x, y) \in A \times \mathbb{R}^m : y \in \varphi(x)\}$. It is easy to see that $\varphi$ is:

(i) closed-valued when its graph $\operatorname{Gr} \varphi$ is a closed subset of $A \times \mathbb{R}^m$;

(ii) convex-valued when its graph $\operatorname{Gr} \varphi$ is a convex subset of $A \times \mathbb{R}^m$.

The converse implications are false: closedness and convexity of the graph of $\varphi$ are significantly stronger assumptions than the closedness and convexity of the images $\varphi(x)$. This is best seen by considering scalar functions, as we show next.

Example 1334 (i) Consider $f : \mathbb{R} \to \mathbb{R}$ given by

$$f(x) = \begin{cases} x & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$$

Since $f$ is a function, it is both closed-valued and convex-valued. However, its graph

$$\operatorname{Gr} f = \{(x, x) : x < 0\} \cup \{(x, 1) : x \ge 0\}$$

is neither closed nor convex. The lack of convexity is obvious. To see that $\operatorname{Gr} f$ is not closed, observe that the origin is a boundary point that does not belong to $\operatorname{Gr} f$.
(ii) A scalar function $f : \mathbb{R} \to \mathbb{R}$ has convex graph if and only if it is affine (i.e., its graph is a straight line). The "if" is obvious. As to the "only if," suppose that $\operatorname{Gr} f$ is convex. Given any $x, y \in \mathbb{R}$ and any $\alpha \in [0, 1]$, then $(\alpha x + (1 - \alpha) y, \alpha f(x) + (1 - \alpha) f(y)) \in \operatorname{Gr} f$, that is, $f(\alpha x + (1 - \alpha) y) = \alpha f(x) + (1 - \alpha) f(y)$, proving that $f$ is affine. By Proposition 656, this implies that there exist $m, q \in \mathbb{R}$ such that $f(x) = mx + q$. We conclude that all scalar functions that are not affine are convex-valued but do not have convex graphs. N

In Section 6.4.3 we said that a real-valued function $f : A \to \mathbb{R}$, defined on any set $A$, is bounded if its image is a bounded set of the real line, i.e., if there is $k > 0$ such that $|f(x)| \le k$ for all $x \in A$. This notion extends naturally to functions $f = (f_1, ..., f_m) : A \to \mathbb{R}^m$ by saying that $f$ is bounded if its image is a bounded set of $\mathbb{R}^m$, that is, if there exists $k > 0$ such that

$$\|f(x)\| \le k \qquad \forall x \in A$$

(recall Definition 159 of bounded set in $\mathbb{R}^m$). It is easy to check that $f$ is bounded if and only if its component functions $f_i : A \to \mathbb{R}$ are bounded.
In a similar vein, we say that a correspondence $\varphi : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is bounded if there is a compact subset $K \subseteq \mathbb{R}^m$ such that

$$\varphi(x) \subseteq K \qquad \forall x \in A$$

If needed, we may write $\varphi : A \rightrightarrows K$. In any case, when $\varphi : A \to \mathbb{R}^m$ is a function we get back to the notion of boundedness just introduced. Indeed, in this case $\varphi(x) \subseteq K$ amounts to $\varphi(x) \in K$, and it is easy to see that $\varphi(x) \in K$ for all $x \in A$ if and only if there is a positive scalar $k > 0$ such that $\|\varphi(x)\| \le k$ for all $x \in A$.

Example 1335 The budget correspondence is bounded if the consumption set $A$ is $[0, b]$ with $b \in \mathbb{R}^n_{++}$. Indeed, by definition $B(p, w) \subseteq A$ for all $(p, w) \in \mathbb{R}^n_+ \times \mathbb{R}_+$. N

32.2 Hemicontinuity

There are several notions of continuity for correspondences. For bounded correspondences, the main class of correspondences for which continuity will be needed (cf. Section 33.3), the following notions are adequate.

Definition 1336 A correspondence $\varphi : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is

(i) upper hemicontinuous at $x \in A$ if

$$x_n \to x, \quad y_n \to y \quad \text{and} \quad y_n \in \varphi(x_n)$$

implies $y \in \varphi(x)$;

(ii) lower hemicontinuous at $x \in A$ if

$$x_n \to x \quad \text{and} \quad y \in \varphi(x)$$

implies that there exist elements $y_n \in \varphi(x_n)$ such that $y_n \to y$;

(iii) continuous at $x \in A$ if it is both upper and lower hemicontinuous at $x$.

A correspondence $\varphi$ is upper (lower) hemicontinuous if it is upper (lower) hemicontinuous at all $x \in A$. A correspondence $\varphi$ is continuous if it is upper and lower hemicontinuous.

Intuitively, an upper hemicontinuous correspondence has no abrupt shrinks in its graph: the image of the correspondence at each point $x$ contains all possible limits of sequences $y_n \in \varphi(x_n)$ included in the graph. In contrast, a lower hemicontinuous correspondence has no abrupt dilations in its graph: any element in the image of a point $x$ must be reachable as a limit of a sequence $y_n \in \varphi(x_n)$ included in the graph.
The following examples illustrate these continuity notions.

Example 1337 The correspondence $\varphi : [0, 1] \rightrightarrows \mathbb{R}$ given by

$$\varphi(x) = \begin{cases} [0, 2] & \text{if } 0 \le x < 1 \\ \left\{\frac{1}{2}\right\} & \text{if } x = 1 \end{cases}$$

is lower hemicontinuous at $x = 1$. Formally, let $x_n \to 1$ and $y \in \varphi(1) = \{1/2\}$, that is, $y = 1/2$. If we take, for instance, $y_n = 1/2 \in \varphi(x_n)$ for all $n$, we have $y_n \to y$. In contrast, $\varphi$ is not upper hemicontinuous at $x = 1$ (where an "abrupt shrink" in the graph occurs). For example, consider the sequences $x_n = 1 - 1/n$ and $y_n = 1/4$. It holds $x_n \to 1$ and $y_n \in \varphi(x_n)$, but $y_n$ trivially converges to $1/4 \notin \varphi(1) = \{1/2\}$. Finally, $\varphi$ is easily seen to be continuous on $[0, 1)$. N

Example 1338 The correspondence $\varphi : [0, 1] \rightrightarrows \mathbb{R}$ given by

$$\varphi(x) = \begin{cases} [1, 2] & \text{if } 0 \le x < 1 \\ [1, 3] & \text{if } x = 1 \end{cases}$$

is upper hemicontinuous at $x = 1$. Formally, if $x_n \to 1$, $y_n \to y$ and $y_n \in \varphi(x_n) = [1, 2]$, then $y \in [1, 2] \subseteq \varphi(1)$. In contrast, $\varphi$ is not lower hemicontinuous at $x = 1$ (where an "abrupt dilation" in the graph occurs). For example, consider the sequence $x_n = 1 - 1/n$ and $y = 3$. It holds $x_n \to 1$ and $y \in \varphi(1)$, but there is no sequence $\{y_n\}$ with $y_n \in \varphi(x_n)$ that converges to $y$. Finally, $\varphi$ is easily seen to be continuous on $[0, 1)$. N
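The two failures above can be probed numerically along the very sequences used in the examples. A minimal Python sketch, with the interval images encoded as endpoint pairs:

```python
import numpy as np

# Sequence-based probe of the hemicontinuity failures in Examples 1337-1338.
phi_shrink = lambda x: (0.0, 2.0) if x < 1 else (0.5, 0.5)   # Example 1337
phi_dilate = lambda x: (1.0, 2.0) if x < 1 else (1.0, 3.0)   # Example 1338

x_n = 1 - 1 / np.arange(1, 200)

# Example 1337: y_n = 1/4 lies in phi(x_n) for all n, but its limit 1/4 is
# outside phi(1) = {1/2}: upper hemicontinuity fails at x = 1.
lo, hi = phi_shrink(1.0)
print(all(phi_shrink(x)[0] <= 0.25 <= phi_shrink(x)[1] for x in x_n),
      lo <= 0.25 <= hi)                        # True False

# Example 1338: y = 3 is in phi(1), but every y_n in phi(x_n) stays in [1, 2],
# hence at distance >= 1 from y: lower hemicontinuity fails at x = 1.
dist = min(abs(3 - np.clip(3, *phi_dilate(x))) for x in x_n)
print(dist)                                    # 1.0
```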

The next two results further clarify the nature of upper hemicontinuous correspondences.

Proposition 1339 A correspondence is upper hemicontinuous if its graph is a closed set. The converse is true if its domain is a closed set.

Proof Suppose $\operatorname{Gr} \varphi$ is closed. Let $x_n \to x$, $y_n \to y$ and $y_n \in \varphi(x_n)$. Since $(x_n, y_n) \to (x, y)$ and $\operatorname{Gr} \varphi$ is a closed set, we have $(x, y) \in \operatorname{Gr} \varphi$, yielding that $y \in \varphi(x)$. We conclude that $\varphi$ is upper hemicontinuous. As to the converse, assume that the domain $A$ is closed and $\varphi : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is upper hemicontinuous. Let $\{(x_n, y_n)\} \subseteq \operatorname{Gr} \varphi$ be such that $(x_n, y_n) \to (x, y) \in \mathbb{R}^n \times \mathbb{R}^m$. To show that $\operatorname{Gr} \varphi$ is closed, we need to show that $(x, y) \in \operatorname{Gr} \varphi$. Since $A$ is closed, $x_n \to x \in A$. By construction, we also have that $y_n \to y$ and $y_n \in \varphi(x_n)$ for every $n$. Since $\varphi$ is upper hemicontinuous, $y \in \varphi(x)$, proving that $(x, y) \in \operatorname{Gr} \varphi$ and that $\operatorname{Gr} \varphi$ is closed.

Proposition 1340 An upper hemicontinuous correspondence is closed-valued.

In turn, this implies that upper hemicontinuous correspondences are compact-valued when they are bounded.

Proof Let $x \in A$. We need to show that $\varphi(x)$ is a closed set. Consider $\{y_n\} \subseteq \varphi(x)$ such that $y_n \to y \in \mathbb{R}^m$. Define $\{x_n\} \subseteq A$ by $x_n = x$ for every $n$. It follows that $x_n \to x$, $y_n \to y$ and $y_n \in \varphi(x_n)$ for every $n$. Since $\varphi$ is upper hemicontinuous, we can conclude that $y \in \varphi(x)$, yielding that $\varphi(x)$ is closed.

For bounded functions the two notions of hemicontinuity are equivalent to continuity.

Proposition 1341 For a bounded function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ and a point $x \in A$, the following properties are equivalent:

(i) $f$ is continuous at $x$;

(ii) $f$ is lower hemicontinuous at $x$;

(iii) $f$ is upper hemicontinuous at $x$.

Proof First observe that, $f$ being a function, $y = f(x)$ amounts to $y \in f(x)$ when we look at the function $f$ as a single-valued correspondence. (i) implies (ii). Let $x_n \to x$ and $y = f(x)$. Since $f$ is a function, we can only choose $\{y_n\}$ to be such that $y_n = f(x_n)$. By continuity, $y_n = f(x_n) \to f(x) = y$, so $f$ is lower hemicontinuous at $x$. (ii) implies (iii). Let $x_n \to x$ and $\{y_n\}$ be such that $y_n \in f(x_n)$ and $y_n \to y$. Since $f$ is a function, we can only choose $\{y_n\}$ to be such that $y_n = f(x_n)$. Since $f$ is lower hemicontinuous at $x$, it holds $y_n \to f(x)$, so $y = f(x)$, i.e., $y \in f(x)$. This implies that $f$ is upper hemicontinuous at $x$. (iii) implies (i). Let $x_n \to x$. We want to show that $y_n = f(x_n) \to f(x)$. Suppose not. Then there are $\varepsilon > 0$ and a subsequence $\{y_{n_k}\}$ such that

$$\|y_{n_k} - f(x)\| \ge \varepsilon \qquad \forall k \ge 1 \tag{32.1}$$

Since $\{y_{n_k}\}$ is a bounded sequence of vectors ($f$ being bounded), by the Bolzano-Weierstrass' Theorem (which is easily seen to hold also for bounded sequences of vectors) there is a further subsequence $\{y_{n_{k_s}}\}$ that converges to some $y \in \mathbb{R}^m$. Since $x_{n_{k_s}} \to x$ and $y_{n_{k_s}} = f(x_{n_{k_s}})$, by upper hemicontinuity $y \in f(x)$, that is, $y = f(x)$. Hence, for all $s$ large enough $\|y_{n_{k_s}} - f(x)\| < \varepsilon$, which contradicts (32.1). We conclude that $y_n = f(x_n) \to y = f(x)$, i.e., $f$ is continuous at $x$.

32.3 Addition and scalar multiplication of sets

To complete our study of correspondences, we need to introduce addition and scalar multiplication for sets. We begin with addition.

Definition 1342 Given any two sets $A$ and $B$ in $\mathbb{R}^n$, their sum $A + B$ is the set in $\mathbb{R}^n$ such that

$$A + B = \{x + y : x \in A \text{ and } y \in B\}$$

In words, $A + B$ consists of all the possible sums $x + y$ of elements of $A$ and $B$.

Note that if $0 \in A$, then $B \subseteq A + B$ because $y = 0 + y \in A + B$ for all $y \in B$.

Example 1343 (i) The sum of the unit square $A = [0, 1] \times [0, 1]$ and of the singleton $B = \{(3, 3)\}$ is the square $A + B = [3, 4] \times [3, 4]$. (ii) The sum of the squares $A = [0, 1] \times [0, 1]$ and $B = [2, 3] \times [2, 3]$ is the square $A + B = [2, 4] \times [2, 4]$. Note that $B \subseteq A + B$ since $0 \in A$. (iii) The sum of the sides $A = \{(x_1, x_2) \in [0, 1] \times [0, 1] : x_1 = 0\}$ and $B = \{(x_1, x_2) \in [0, 1] \times [0, 1] : x_2 = 0\}$ of the unit square is the unit square itself, i.e., $A + B = [0, 1] \times [0, 1]$. (iv) The sum of the vertical axis $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 = 0\}$ and the horizontal axis $B = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 = 0\}$ is the entire plane, i.e., $A + B = \mathbb{R}^2$. N
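For finite sets the sum of Definition 1342 (the so-called Minkowski sum) is directly computable. A minimal Python sketch; representing a set by finitely many points is our simplification:

```python
import numpy as np

def set_sum(A, B):
    """Sum of two finite point sets in R^n (arrays of shape (k, n)).

    For infinite sets such as squares, summing finite samples approximates
    A + B; below, corner points suffice to recover the rectangular sums
    of Example 1343.
    """
    return np.unique((A[:, None, :] + B[None, :, :]).reshape(-1, A.shape[1]), axis=0)

square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # corners of [0,1]^2
print(set_sum(square, np.array([[3, 3]])))             # corners of [3,4]^2
print(set_sum(square, square + 2))                     # contains corners of [2,4]^2
```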

Next we give some properties of sums of sets.

Proposition 1344 Let $A$ and $B$ be any two sets in $\mathbb{R}^n$. Then:

(i) if $A$ and $B$ are convex, their sum $A + B$ is convex;

(ii) if $A$ is closed and $B$ is compact, their sum $A + B$ is closed;

(iii) if $A$ and $B$ are compact, their sum $A + B$ is compact.

Proof (i) Let $A$ and $B$ be convex. Let $v, w \in A + B$ and $\alpha \in [0, 1]$. By definition, there exist $x', x'' \in A$ and $y', y'' \in B$ such that $v = x' + y'$ and $w = x'' + y''$. Since $A$ and $B$ are convex, we have that $\alpha x' + (1 - \alpha) x'' \in A$ and $\alpha y' + (1 - \alpha) y'' \in B$. This implies that

$$\alpha v + (1 - \alpha) w = \alpha (x' + y') + (1 - \alpha)(x'' + y'') = \alpha x' + (1 - \alpha) x'' + \alpha y' + (1 - \alpha) y'' \in A + B$$

(ii) Let $A$ be closed and $B$ compact. Let $\{z_n\} \subseteq A + B$ be such that $z_n \to z \in \mathbb{R}^n$. We want to show that $z \in A + B$. By definition, there exist $\{x_n\} \subseteq A$ and $\{y_n\} \subseteq B$ such that $z_n = x_n + y_n$ for all $n \ge 1$. Since $B$ is compact, by the Bolzano-Weierstrass' Theorem there exists a subsequence $\{y_{n_k}\} \subseteq B$ that converges to some $y \in B$. So, by the algebra of limits we have

$$\lim_{k \to \infty} x_{n_k} = \lim_{k \to \infty} (z_{n_k} - y_{n_k}) = \lim_{k \to \infty} z_{n_k} - \lim_{k \to \infty} y_{n_k} = z - y$$

Since $A$ is closed, we have $z - y \in A$. In turn, this implies $z \in A + B$. (iii) Let $A$ and $B$ be compact. By point (ii), $A + B$ is closed. As the reader can check, $A + B$ is also bounded. So, it is compact.

By iterating the sum of two sets, we can define the sum

$$\sum_{i=1}^n A_i \tag{32.2}$$

of $n$ sets $A_i$ in $\mathbb{R}^n$. Properties (i) and (iii) just established for the sum of two sets continue to hold for sums of $n$ sets.

We turn now to scalar multiplication.

Definition 1345 Given a scalar $\alpha \in \mathbb{R}$ and a set $A$ in $\mathbb{R}^n$, their product $\alpha A$ is the set in $\mathbb{R}^n$ such that $\alpha A = \{\alpha x : x \in A\}$.

Example 1346 The product of the unit square $A = [0, 1] \times [0, 1]$ and of $\alpha = 2$ is the square $2A = [0, 2] \times [0, 2]$. N

The sum (32.2) thus generalizes to a linear combination

$$\sum_{i=1}^n \alpha_i A_i$$

of $n$ sets $A_i$ in $\mathbb{R}^n$ and $n$ scalars $\alpha_i \in \mathbb{R}$.

32.4 Combining correspondences

Definition 1347 Given any two correspondences $\varphi, \psi : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, their sum $\varphi + \psi : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is the correspondence such that

$$(\varphi + \psi)(x) = \varphi(x) + \psi(x) \qquad \forall x \in A$$

In view of Proposition 1344, the sum of two convex-valued correspondences is convex-valued, and the sum of two compact-valued correspondences is compact-valued.

Proposition 1348 Let $\varphi, \psi : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ be any two correspondences and $\alpha, \beta \in \mathbb{R}$. Then:

(i) if $\varphi$ and $\psi$ are bounded and upper hemicontinuous at a point, their sum $\alpha \varphi + \beta \psi$ is upper hemicontinuous at that point;

(ii) if $\varphi$ and $\psi$ are lower hemicontinuous at a point, their sum $\alpha \varphi + \beta \psi$ is lower hemicontinuous at that point.

Proof It is enough to consider the case $\alpha = \beta = 1$, as the general case then easily follows. (i) Suppose that at $x$ we have $x_n \to x$, $y_n \to y$ and $y_n \in (\varphi + \psi)(x_n)$. We want to show that $y \in (\varphi + \psi)(x)$. By definition, for each $n$ there exist $y'_n \in \varphi(x_n)$ and $y''_n \in \psi(x_n)$ such that $y_n = y'_n + y''_n$. Since $\varphi$ and $\psi$ are bounded, there exist compact sets $K_\varphi$ and $K_\psi$ such that $\{y'_n\} \subseteq K_\varphi$ and $\{y''_n\} \subseteq K_\psi$. Hence, both sequences are bounded, so by the Bolzano-Weierstrass' Theorem there exist subsequences $\{y'_{n_k}\}$ and $\{y''_{n_k}\}$ that converge to some points $y' \in \mathbb{R}^m$ and $y'' \in \mathbb{R}^m$, respectively. Since $y'_{n_k} \in \varphi(x_{n_k})$ and $y''_{n_k} \in \psi(x_{n_k})$ for every $k$ and $x_{n_k} \to x$, we then have $y' \in \varphi(x)$ and $y'' \in \psi(x)$ because $\varphi$ and $\psi$ are upper hemicontinuous at $x$. We conclude that $y = \lim_{k \to \infty} y_{n_k} = \lim_{k \to \infty} (y'_{n_k} + y''_{n_k}) = y' + y'' \in (\varphi + \psi)(x)$, as desired.
(ii) Suppose that at $x$ we have $x_n \to x$ and $y \in (\varphi + \psi)(x)$. We want to show that there exist elements $y_n \in (\varphi + \psi)(x_n)$ such that $y_n \to y$. By definition, there exist $y' \in \varphi(x)$ and $y'' \in \psi(x)$ such that $y = y' + y''$. Since $\varphi$ and $\psi$ are lower hemicontinuous, there exist elements $y'_n \in \varphi(x_n)$ and $y''_n \in \psi(x_n)$ such that $y'_n \to y'$ and $y''_n \to y''$. Setting $y_n = y'_n + y''_n$, we then have $y_n \in (\varphi + \psi)(x_n)$ and $y_n = y'_n + y''_n \to y' + y'' = y$, as desired.

By iterating the linear combination of two correspondences, we can define the linear combination

$$\sum_{i=1}^n \alpha_i \varphi_i \tag{32.3}$$

of $n$ correspondences and $n$ scalars $\alpha_i \in \mathbb{R}$. The properties of linear combinations of two correspondences just established continue to hold for linear combinations of $n$ correspondences.

32.5 Inclusion equations

32.5.1 Inclusion equations and fixed points

A correspondence $f : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ defines an inclusion equation

$$0 \in f(x)$$

If $f$ is a function, the inclusion equation reduces to a standard equation $f(x) = 0$. The generalized first order condition $0 \in \partial f(\hat{x})$ of Theorem 1151 is a most important example of an inclusion equation. Later we will see that inclusion equations naturally arise in market analysis.
Like equations, inclusion equations too may be solved via fixed point analysis. Indeed, such analysis can be generalized to correspondences. Specifically, a correspondence $f : A \subseteq \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is said to be a self-correspondence if $f(x) \subseteq A$ for all $x \in A$. In words, self-correspondences associate a subset of $A$ to each element of $A$. So, we often write $f : A \rightrightarrows A$.

Example 1349 (i) All correspondences $f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ are, trivially, self-correspondences. (ii) The correspondence $f : [0, 1] \rightrightarrows [0, 1]$ given by $f(x) = [0, x^2]$ is a self-correspondence because $x^2 \in [0, 1]$ for all $x \in [0, 1]$. N

The notion of fixed point naturally extends to self-correspondences.


Definition 1350 Given a self-correspondence $f : A \rightrightarrows A$, a vector $x \in A$ is said to be a fixed point of $f$ if $x \in f(x)$.

For instance, for the self-correspondence $f : [0, 1] \rightrightarrows [0, 1]$ given by $f(x) = [0, x^2]$, the endpoints 0 and 1 are fixed points in that $0 \in f(0) = \{0\}$ and $1 \in f(1) = [0, 1]$.
The next theorem establishes the existence of fixed points by generalizing Brouwer's Theorem (we omit its non-trivial proof).¹

Theorem 1351 (Kakutani) An upper hemicontinuous and convex-valued self-correspondence $f : K \rightrightarrows K$ defined on a convex compact subset $K$ of $\mathbb{R}^n$ has a fixed point.

Clearly, Brouwer's Theorem is a special case of Kakutani's Theorem. Kakutani's Theorem is an important result because, like standard equations, an inclusion equation $0 \in f(x)$ too may be solved by finding a self-correspondence $g : K \rightrightarrows K$, defined on a convex compact subset $K$ of $\mathbb{R}^n$, such that $0 \in f(x)$ if and only if $x \in g(x)$. In this case, the solution of an inclusion equation reduces to the search for the fixed points of a self-correspondence.
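For a one-dimensional self-correspondence, a brute-force scan already locates the fixed points. A minimal Python sketch for $f(x) = [0, x^2]$:

```python
import numpy as np

# Brute-force search for fixed points of the self-correspondence
# f(x) = [0, x^2] on [0, 1]: x is (approximately) fixed when x in f(x),
# i.e., when 0 <= x <= x^2.
xs = np.linspace(0, 1, 100001)
fixed = xs[xs <= xs ** 2 + 1e-12]
print(fixed)   # [0. 1.]: only the endpoints are fixed, as noted above
```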

32.5.2 Aggregate market analysis

The aggregate market analysis of Section 12.8.3 can be generalized to the case of demand and supply correspondences $D, S : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n_+$. As we will see momentarily, such a generalization can be easily motivated within an exchange economy.
Let $E : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n$ be the excess demand correspondence defined by $E(p) = D(p) - S(p)$, with positive part $E^+ : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n_+$ defined by $E^+(p) = \{\max\{z, 0\} : z \in E(p)\}$. A pair $(\bar{p}, \bar{q}) \in \mathbb{R}^n_+ \times \mathbb{R}^n_+$ of prices and quantities is a weak market equilibrium if $\bar{p}$ is such that $E(\bar{p}) \cap \mathbb{R}^n_- \ne \emptyset$, i.e.,

$$0 \in E^+(\bar{p}) \tag{32.4}$$

and $\bar{q} \in D(\bar{p})$. The pair $(\bar{p}, \bar{q})$ is a market equilibrium if $\bar{p}$ is such that

$$0 \in E(\bar{p}) \tag{32.5}$$

and $\bar{q} \in D(\bar{p})$.
The existence of equilibria thus reduces to the solution of some inclusion equations defined by the excess market demand correspondence. To solve these inclusion equations, and thus establish the existence of equilibria, we consider the following assumptions on such correspondence:

E.1 $E$ is upper hemicontinuous, convex-valued, and bounded on $\mathbb{R}^n_{++}$;

A.2 $E(\alpha p) = E(p)$ for each $\alpha > 0$ and all $p \in \mathbb{R}^n_+$;

W.1 $p \cdot E(p) \le 0$ for all $p \in \mathbb{R}^n_+$;²

E.4 $E_i(p) \subseteq \mathbb{R}_+$ if $p_i = 0$;

W.2 $p \cdot E(p) = 0$ for all $p \in \mathbb{R}^n_+$.

¹ It is named after Shizuo Kakutani, who proved it in 1941.
² The inequality $p \cdot E(p) \le 0$ means $\sum_{i \in I} p_i z_i \le 0$ for all $z \in E(p)$.

We denoted the assumptions as in the earlier Section 12.8.3 because they have the same economic interpretation (upon which we already expatiated). We use the letter "E" for the first and fourth assumptions because they have to adapt their mathematical form to the more general setting of correspondences.
We can now state and prove a general version of the Arrow-Debreu Theorem.

Theorem 1352 (Arrow-Debreu) Under assumptions E.1, A.2, and W.1 a weak market equilibrium exists. If, in addition, assumptions E.4 and W.2 hold, then a market equilibrium exists.

Proof We follow Debreu (1959). Since $E$ is bounded, there is a compact set $K$ in $\mathbb{R}^n$ such that $E(p) \subseteq K$ for all $p \in \mathbb{R}^n_+$. Without loss of generality, we can assume that $K$ is convex. By A.2, we can limit ourselves to the upper hemicontinuous restriction $E : \Delta^{n-1} \rightrightarrows K$. Define $g : K \rightrightarrows \Delta^{n-1}$ by

$$g(z) = \arg\max_{p \in \Delta^{n-1}} p \cdot z$$

By the Maximum Theorem (Chapter 33), $g$ is a compact-valued and upper hemicontinuous correspondence. Moreover, it is convex-valued (Proposition 1358). Consider the product correspondence $\varphi : \Delta^{n-1} \times K \rightrightarrows \Delta^{n-1} \times K$ defined by $\varphi(p, z) = g(z) \times E(p)$. The correspondence $\varphi$ is easily seen to be upper hemicontinuous and convex-valued (as readers can check) on the compact and convex set $\Delta^{n-1} \times K$. By Kakutani's Theorem, there exists a fixed point $(\bar{p}, \bar{z}) \in \Delta^{n-1} \times K$ such that $(\bar{p}, \bar{z}) \in \varphi(\bar{p}, \bar{z}) = g(\bar{z}) \times E(\bar{p})$. So, $\bar{z} \in E(\bar{p})$ and $\bar{p} \in g(\bar{z})$, which respectively imply, by W.1, $\bar{p} \cdot \bar{z} \le 0$ and, by definition, $p \cdot \bar{z} \le \bar{p} \cdot \bar{z}$ for all $p \in \Delta^{n-1}$. Thus, we have

$$p \cdot \bar{z} \le 0 \qquad \forall p \in \Delta^{n-1}$$

In particular, by taking the price versors $e^i \in \Delta^{n-1}$, we then get

$$\bar{z}_i = e^i \cdot \bar{z} \le 0 \qquad \forall i \in I$$

We conclude that $\bar{z} \in E(\bar{p}) \cap \mathbb{R}^n_-$, so $0 \in E^+(\bar{p})$.
Assume E.4 and W.2. We want to show that $\bar{z} = 0$. Suppose, by contradiction, that $\bar{z}_i < 0$ for some good $i$. By E.4, $\bar{p}_i > 0$. By W.2 and since prices are positive, there then exists some $j$ such that $\bar{p}_j \bar{z}_j > 0$, which contradicts $\bar{z} \le 0$. We conclude that $\bar{z} = 0$, yielding that $0 \in E(\bar{p})$.

32.5.3 Back to agents: exchange economy

The previous aggregate market analysis with demand and supply correspondences can be understood in terms of the simple exchange economy $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$ of Section 18.8. In what follows, we assume that each agent has a consumption set $A_i = [0, b_i]$ where $b_i \in \mathbb{R}^n_{++}$.
In the optimization problem

$$\max_x u_i(x) \quad \text{sub} \quad x \in B_i(p, p \cdot \omega_i)$$

that, in his consumer role, agent $i$ solves, we no longer assume that the solution is unique, but permit multiple optimal bundles. Consequently, now we have a demand correspondence $D_i : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n_+$ defined by

$$D_i(p) = \arg\max_{x \in B_i(p, p \cdot \omega_i)} u_i(x) \qquad \forall p \in \mathbb{R}^n_+$$

The aggregate demand correspondence $D : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n$ is still defined by

$$D(p) = \sum_{i \in I} D_i(p)$$

where now, though, the sum is in the sense of (32.3). The aggregate demand correspondence still inherits the invariance property of individual demand correspondences, i.e., $D(\alpha p) = D(p)$ for all $\alpha > 0$, since this invariance property is easily seen to continue to hold for each agent.
The aggregate supply function $S : \mathbb{R}^n_+ \to \mathbb{R}^n$ continues to be $S(p) = \{\omega\}$. So, the weak Walras' law still takes the form $p \cdot E(p) \le 0$, where $E : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n$ is the excess demand correspondence defined by $E(p) = D(p) - \{\omega\}$. If Walras' law holds for each agent $i \in I$, i.e., $p \cdot D_i(p) = p \cdot \omega_i$ for each $i \in I$, then its aggregate version $p \cdot E(p) = 0$ holds.
Here a pair $(\bar{p}, \bar{x}) \in \mathbb{R}^n_+ \times \mathbb{R}^{n|I|}_+$ of prices and consumption allocations is a weak Arrow-Debreu (market) equilibrium of the exchange economy $\mathcal{E}$ if

(i) $\bar{x}_i \in D_i(\bar{p})$ for each $i \in I$;

(ii) $\sum_{i \in I} \bar{x}_i \le \omega$.

The pair $(\bar{p}, \bar{x})$ becomes an Arrow-Debreu (market) equilibrium if in the market clearing condition (ii) we have equality, so that optimal bundles exhaust endowments.
The next result, a general version of Lemma 850, connects the Arrow-Debreu and the aggregate market equilibrium notions.
Lemma 1353 Given a pair $(\bar{p}, \bar{x}) \in \mathbb{R}^n_+ \times \mathbb{R}^{n|I|}_+$ of prices and consumption allocations, set $\bar{q} = \sum_{i \in I} \bar{x}_i$. The pair $(\bar{p}, \bar{x})$ is a:

(i) weak Arrow-Debreu equilibrium if and only if $\bar{p}$ solves the inclusion equation (32.4) and $\bar{q} \in D(\bar{p})$;

(ii) Arrow-Debreu equilibrium if and only if $\bar{p}$ solves the inclusion equation (32.5) and $\bar{q} \in D(\bar{p})$.

We can now establish which properties of the utility functions and endowments of the agents of the economy $\mathcal{E}$ imply the properties of the aggregate demand correspondence that the Arrow-Debreu Theorem requires. For simplicity, we consider weak equilibria and prove the desired existence result that generalizes Proposition 851.

Proposition 1354 Let $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$ be an economy in which, for each agent $i \in I$, the endowment $\omega_i$ is strictly positive and the utility function $u_i$ is continuous and quasi-concave on a consumption set $A_i = [0, b_i]$ where $b_i \in \mathbb{R}^n_{++}$. Then, a weak market price equilibrium of the exchange economy $\mathcal{E}$ exists.

This existence result generalizes Proposition 851 in that utility functions are only required to be quasi-concave and not strictly quasi-concave.

Proof Let $i \in I$. Since $u_i$ is continuous on the compact set $A_i$, by the Maximum Theorem (Chapter 33) the individual demand correspondence $D_i$ is bounded and upper hemicontinuous on $\mathbb{R}^n_{++}$. Moreover, since $u_i$ is quasi-concave, $D_i$ is convex-valued (Proposition 1358). The aggregate demand correspondence $D$ inherits these properties, i.e., it is bounded, convex-valued, and upper hemicontinuous on $\mathbb{R}^n_{++}$. So, condition E.1 is satisfied. Since we already noted that conditions A.2 and W.1 hold, we conclude that a weak market price equilibrium exists by the Arrow-Debreu Theorem.
Chapter 33

Parametric optimization problems

33.1 Definition

Given a set $\Theta \subseteq \mathbb{R}^m$ of parameters and an all-inclusive choice space $A \subseteq \mathbb{R}^n$, suppose that each value $\theta$ of the parameter vector determines a choice (or feasible) set $\varphi(\theta) \subseteq A$. Choice sets are thus identified, as the parameter varies, by a feasibility correspondence $\varphi : \Theta \rightrightarrows A$. An objective function $f : A \times \Theta \to \mathbb{R}$, defined over pairs $(a, \theta)$ of choices $a$ and parameters $\theta$, has to be optimized on the feasible sets determined by the correspondence $\varphi : \Theta \rightrightarrows A$. Jointly, $\varphi$ and $f$ thus determine an optimization problem in parametric form:

$$\max_x f(x, \theta) \quad \text{sub} \quad x \in \varphi(\theta) \tag{33.1}$$

When $f(\cdot, \theta)$ is, for every $\theta \in \Theta$, concave (quasi-concave) on the convex set $A$ and $\varphi$ is convex-valued, this problem is called concave (quasi-concave).

A point $\hat{x} \in \varphi(\theta)$ is a solution for $\theta \in \Theta$ if it is an optimal choice given $\theta$, that is,

$$f(\hat{x}, \theta) \ge f(x, \theta) \qquad \forall x \in \varphi(\theta)$$

The solution correspondence $\sigma : S \rightrightarrows A$ of the parametric optimization problem (33.1) is defined by

$$\sigma(\theta) = \arg\max_{x \in \varphi(\theta)} f(x, \theta)$$

That is, the correspondence $\sigma$ associates to each $\theta$ the corresponding solution set, i.e., the set of optimal choices. Its domain $S$ is the solution domain, that is, the collection of all $\theta$s for which problem (33.1) admits a solution. If such solution is unique at all $\theta \in S$, then $\sigma$ is single-valued, that is, it is a function. In this case we say that $\sigma$ is a solution function.

The (optimal) value function $v : S \to \mathbb{R}$ of the parametric optimization problem is defined by

$$v(\theta) = \max \{f(x, \theta) : x \in \varphi(\theta)\} \tag{33.2}$$

for each $\theta \in S$, that is, $v(\theta) = f(\hat{x}, \theta)$ for every $\hat{x} \in \sigma(\theta)$. The value function gives, for each $\theta$, the maximum value of the objective function on the set $\varphi(\theta)$. Since this value is attained at the solutions $\hat{x}$, the value function is well-defined only on the solution domain $S$.

Example 1355 The parametric optimization problem with equality and inequality constraints has the form

$$\max_x f(x, \theta) \tag{33.3}$$
$$\text{sub} \quad \gamma_i(x, \theta) = 0 \quad \forall i \in I$$
$$\qquad \ \ \delta_j(x, \theta) \le 0 \quad \forall j \in J$$

where $\gamma_i : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ for every $i \in I$, $\delta_j : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ for every $j \in J$, and $\theta = (\theta_1, ..., \theta_m) \in \mathbb{R}^m$. Here

$$\varphi(\theta) = \{x \in A : \gamma_i(x, \theta) = 0 \ \forall i \in I, \ \delta_j(x, \theta) \le 0 \ \forall j \in J\}$$

If $f$ does not depend on the parameter, and if $\gamma_i(x, \theta) = g_i(x) - b_i$ for every $i \in I$ and $\delta_j(x, \theta) = h_j(x) - c_j$ for every $j \in J$ (so that $m = |I| + |J|$), we get back to the familiar problem (30.4) studied in Chapter 30, that is,

$$\max_x f(x)$$
$$\text{sub} \quad g_i(x) = b_i \quad \forall i \in I$$
$$\qquad \ \ h_j(x) \le c_j \quad \forall j \in J$$

In this case, if we set $b = (b_1, ..., b_{|I|}) \in \mathbb{R}^{|I|}$ and $c = (c_1, ..., c_{|J|}) \in \mathbb{R}^{|J|}$, the parameter set $\Theta$ consists of all $\theta = (b, c) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}$. N

Example 1356 The consumer problem (Section 18.1.4) is a parametric optimization problem. The set $A$ is the consumption set. The space $\mathbb{R}^{n+1}_+$ of all price and income pairs is the parameter set $\Theta$, with generic element $\theta = (p, w)$. The budget correspondence $B : \mathbb{R}^{n+1}_+ \rightrightarrows \mathbb{R}^n_+$ is the feasibility correspondence and the utility function $u : A \to \mathbb{R}$ is the objective function (interestingly, in this important example the objective function does not depend on the parameter). Let $S \subseteq \Theta$ be the set of all parameters $(p, w)$ for which the consumer problem has a solution (i.e., an optimal bundle). The demand correspondence $D : S \rightrightarrows \mathbb{R}^n_+$ is the solution correspondence, which becomes a demand function $D : S \to \mathbb{R}^n_+$ when optimal bundles are unique. Finally, the indirect utility function $v : S \to \mathbb{R}$ is the value function. N

Parametric optimization problems are pervasive in economics because they make it possible to carry out the all-important comparative statics exercises that study how, within a given optimization problem, changes in the parameters affect optimal choices and their values. The solution correspondence and the value function are key for these exercises because they describe how optimal choices and their values vary as parameters vary. For instance, in the consumer problem the demand correspondence and the indirect utility function describe, respectively, how the optimal bundles and their values are affected by changes in prices and income.
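As a concrete instance of solution and value functions, here is a minimal Python sketch of the consumer problem with a made-up Cobb-Douglas utility (SciPy's SLSQP as solver); the closed-form Cobb-Douglas demand provides a check:

```python
import numpy as np
from scipy.optimize import minimize

# The consumer problem of Example 1356 as a parametric optimization problem:
# u(x) = 0.3*log(x1) + 0.7*log(x2), feasibility B(p, w) = {x >= 0 : p.x <= w}.
def demand_and_value(p, w):
    u = lambda x: 0.3 * np.log(x[0]) + 0.7 * np.log(x[1])
    res = minimize(lambda x: -u(x),
                   x0=np.array([w / (2 * p[0]), w / (2 * p[1])]),
                   bounds=[(1e-9, None)] * 2,
                   constraints=[{'type': 'ineq', 'fun': lambda x: w - p @ x}])
    return res.x, u(res.x)          # solution sigma(theta) and value v(theta)

x_hat, v = demand_and_value(np.array([1.0, 2.0]), w=10.0)
print(np.round(x_hat, 3), round(v, 3))
# Cobb-Douglas demand is known in closed form: x1 = 0.3*w/p1, x2 = 0.7*w/p2,
# so the output should be close to (3.0, 3.5).
```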

33.2 Basic properties

The existence theorems of Weierstrass and Tonelli ensure the existence of solutions. For instance, a straightforward consequence of Weierstrass' Theorem is that $\theta_0 \in S$ if $\varphi(\theta_0)$ is compact and $f(\cdot, \theta_0) : A \to \mathbb{R}$ is continuous. This leads to the following important result.

Proposition 1357 If $f$ is continuous in $x$ and $\varphi$ is compact-valued, then $S = \Theta$ and $\sigma$ is compact-valued.

Proof By Weierstrass' Theorem, we have $S = \Theta$. The set $\sigma(\theta)$ is compact for every $\theta \in \Theta$. Indeed, let $\{\hat{x}_n\} \subseteq \sigma(\theta)$ be such that $\hat{x}_n \to x \in \mathbb{R}^n$. Since $\sigma(\theta) \subseteq \varphi(\theta)$ and the latter set is compact (hence closed), we have that $x \in \varphi(\theta)$. Since $f(\hat{x}_n, \theta) = \max_{x' \in \varphi(\theta)} f(x', \theta)$ for every $n$, the continuity of $f$ in $x$ implies $f(x, \theta) = \lim_{n \to \infty} f(\hat{x}_n, \theta) = \max_{x' \in \varphi(\theta)} f(x', \theta)$, so $x \in \sigma(\theta)$. This proves that $\sigma(\theta)$ is closed. We conclude that $\sigma(\theta)$ is compact because it is a closed subset of the compact set $\varphi(\theta)$.

We now turn to convexity properties. We assume that the set $A$ is convex and, to ease matters, that $S = \Theta$.

Proposition 1358 The solution correspondence is convex-valued if $f$ is quasi-concave in $x$ and $\varphi$ is convex-valued.

Proof Given any $\theta \in \Theta$, let us show that $\sigma(\theta)$ is convex. Let $\hat{x}_1, \hat{x}_2 \in \sigma(\theta)$ and $\alpha \in [0, 1]$. Since $f$ is quasi-concave in $x$,

$$f(\hat{x}_1, \theta) \ge f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2, \theta) \ge \min \{f(\hat{x}_1, \theta), f(\hat{x}_2, \theta)\} = f(\hat{x}_1, \theta) = f(\hat{x}_2, \theta) = v(\theta)$$

and so $f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2, \theta) = v(\theta)$, i.e., $\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2 \in \sigma(\theta)$.

The convexity of the solution set means inter alia that, when non-empty, such a set is either a singleton or an infinite set. That is, either the solution is unique or there are infinitely many of them. Next we give the most important sufficient condition that ensures uniqueness.

Proposition 1359 The solution correspondence is single-valued if $f$ is strictly quasi-concave in $x$ and $\varphi$ is convex-valued.

Proof Let us prove that $\sigma$ is single-valued. Let $\theta \in \Theta$ and $\hat{x}_1, \hat{x}_2 \in \sigma(\theta)$. We want to show that $\hat{x}_1 = \hat{x}_2$. Suppose, by contradiction, that $\hat{x}_1 \ne \hat{x}_2$. By the strict quasi-concavity of $f$ in $x$,

$$f\left(\frac{1}{2}\hat{x}_1 + \frac{1}{2}\hat{x}_2, \theta\right) > \min \{f(\hat{x}_1, \theta), f(\hat{x}_2, \theta)\} = f(\hat{x}_1, \theta) = f(\hat{x}_2, \theta) = v(\theta)$$

a contradiction. Hence, $\hat{x}_1 = \hat{x}_2$, as desired.

By strengthening the hypothesis of Proposition 1358 from quasi-concavity to strict quasi-concavity, the solution set becomes a singleton. In this case we have a solution function and not just a solution correspondence. This greatly simplifies comparative statics exercises that study how solutions change as the values of the parameters vary. For this reason, in applications strict concavity (and so strict quasi-concavity) is often assumed, typically by requiring that the second derivative be decreasing (Corollary 1099). By now, we have remarked several times this key fact: hopefully, repetita iuvant (sed nauseant).

We turn now to value functions. In the following result we assume the convexity of the graph of $\varphi$. As we already remarked, this is a substantially stronger assumption than the convexity of the images $\varphi(\theta)$.

Proposition 1360 The value function $v$ is quasi-concave (resp., concave) if $f$ is quasi-concave (resp., concave) and the graph of $\varphi$ is convex.

Proof Let $\theta_1, \theta_2 \in \Theta$ and $\alpha \in [0, 1]$. Let $\hat{x}_1 \in \sigma(\theta_1)$ and $\hat{x}_2 \in \sigma(\theta_2)$. Since $\varphi$ has convex graph, $\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2 \in \varphi(\alpha \theta_1 + (1 - \alpha) \theta_2)$. Hence, the quasi-concavity of $f$ implies:

$$v(\alpha \theta_1 + (1 - \alpha) \theta_2) \ge f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2, \alpha \theta_1 + (1 - \alpha) \theta_2) = f(\alpha (\hat{x}_1, \theta_1) + (1 - \alpha)(\hat{x}_2, \theta_2))$$
$$\ge \min \{f(\hat{x}_1, \theta_1), f(\hat{x}_2, \theta_2)\} = \min \{v(\theta_1), v(\theta_2)\}$$

So, $v$ is quasi-concave. If $f$ is concave, we have:

$$v(\alpha \theta_1 + (1 - \alpha) \theta_2) \ge f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2, \alpha \theta_1 + (1 - \alpha) \theta_2) = f(\alpha (\hat{x}_1, \theta_1) + (1 - \alpha)(\hat{x}_2, \theta_2))$$
$$\ge \alpha f(\hat{x}_1, \theta_1) + (1 - \alpha) f(\hat{x}_2, \theta_2) = \alpha v(\theta_1) + (1 - \alpha) v(\theta_2)$$

So, $v$ is concave.

A similar argument shows that $v$ is strictly quasi-concave (resp., strictly concave) if $f$ is strictly quasi-concave (resp., strictly concave).

Example 1361 In the consumer problem with prices $p$ held fixed, so that income $w$ is the parameter, the graph of the budget correspondence $B(p, \cdot)$ is convex if the consumption set is convex. Indeed, let $(w, x), (w', x') \in \operatorname{Gr} B(p, \cdot)$ and let $\alpha \in [0, 1]$. Then, $p \cdot (\alpha x + (1 - \alpha) x') = \alpha p \cdot x + (1 - \alpha) p \cdot x' \le \alpha w + (1 - \alpha) w'$, so the set $\operatorname{Gr} B(p, \cdot)$ is convex. By Proposition 1358, the demand correspondence is convex-valued if the utility function is quasi-concave, while by Proposition 1360 the indirect utility is quasi-concave (concave) in income if the utility function is quasi-concave (concave). N

33.3 Maximum Theorem

How do solutions and maximum values vary as parameters change? Are such changes abrupt or gentle? The stability of an optimization problem under parameter changes is a key issue in applications, where it is typically desirable that changes in parameters affect solutions and maximum values nicely, in a "continuous" manner. Formally, this amounts to the upper hemicontinuity of the solution correspondence and the continuity of the value function.
In this section we address this fundamental stability question of parametric optimization problems through the celebrated Berge's Maximum Theorem.¹

Theorem 1362 Consider a parametric optimization problem

$$\max_x f(x, \theta) \quad \text{sub} \quad x \in \varphi(\theta)$$

If $\varphi$ is bounded and continuous and $f$ is continuous, then $S = \Theta$, $v$ is continuous and $\sigma$ is bounded, compact-valued and upper hemicontinuous.

¹ It is named after Claude Berge, who proved it in 1959.

Under the continuity of both the objective function and the feasibility correspondence, the optimization problem is thus stable under changes in parameters: the value function is continuous and the solution correspondence is upper hemicontinuous. The Maximum Theorem is an important result in applications because, as remarked before, the stability that it ensures is often a desirable property of the optimization problems featured there. Natura non facit saltus, as long as the hypotheses of the Maximum Theorem are satisfied.

The proof of the Maximum Theorem relies on a lemma of independent interest.

Lemma 1363 Given any bounded sequence of scalars $\{a_n\}$, if $\limsup_{n \to \infty} a_n = a$, then there exists a subsequence $\{a_{n_k}\}$ such that $\lim_{k \to \infty} a_{n_k} = a$.

A similar property holds for the liminf.

Proof Given $k \ge 1$, define

$$n_1 = \min \{n \ge 1 : |a_n - a| < 1\}$$

and recursively

$$n_{k+1} = \min \left\{n \ge 1 : n > n_k \text{ and } |a_n - a| < \frac{1}{k+1}\right\}$$

In this way, $\{a_{n_k}\}$ is a subsequence of $\{a_n\}$. Indeed, by construction, $n_{k+1} > n_k$ for every $k \ge 1$. At the same time, $\{a_{n_k}\}$ converges to $a$: again by construction, it is sufficient to note that $|a_{n_k} - a| < 1/k$ for every $k \ge 1$. Thus, $\{a_{n_k}\}$ is the subsequence we were looking for. Nevertheless, we are not done. Indeed, to end the proof we have to show that $\{a_{n_k}\}$ is well defined. The careful reader probably noted already that the current proof, despite being correct, is incomplete: we do not know that the sets whose minima we are taking are non-empty, so that these minima are well defined. The rest of the proof is devoted to showing exactly this.
For each $n \ge 1$, set $A_n = \sup_{m \ge n} a_m \in \mathbb{R}$. Recall that $a = \lim_{n \to \infty} A_n = \inf_n A_n$. Fix any $\varepsilon > 0$. On the one hand, since $A_n$ converges to $a$, there exists some $n_\varepsilon \ge 1$ such that $A_{n_\varepsilon} - a < \varepsilon/2$. On the other hand, by the definition of supremum, there is some $m \ge n_\varepsilon$ such that $A_{n_\varepsilon} - \varepsilon/2 \le a_m \le A_{n_\varepsilon}$. In turn, this easily implies that

$$|a_m - a| = |a_m - A_{n_\varepsilon} + A_{n_\varepsilon} - a| \le |a_m - A_{n_\varepsilon}| + |A_{n_\varepsilon} - a| < \varepsilon$$

It follows that, for every $\varepsilon > 0$, the set

$$\{m \ge 1 : |a_m - a| < \varepsilon\}$$

is not empty. If we set $\varepsilon = 1$, the set $\{n \ge 1 : |a_n - a| < 1\}$ is then not empty. At the same time, in view of the trivial inclusion $\{m \ge 1 : |a_m - a| < \varepsilon'\} \subseteq \{m \ge 1 : |a_m - a| < \varepsilon\}$ if $\varepsilon > \varepsilon' > 0$, we conclude that the latter set is infinite. This yields that

$$\left\{n \ge 1 : n > n_k \text{ and } |a_n - a| < \frac{1}{k+1}\right\} = \left\{n \ge 1 : |a_n - a| < \frac{1}{k+1}\right\} \setminus \{1, ..., n_k\}$$

is not empty as well for every $k \ge 2$.


Proof of the Maximum Theorem Since $\varphi$ is bounded, recall that there exists a compact set $K$ such that $\varphi(\theta) \subseteq K \subseteq A$ for all $\theta \in \Theta$. Suppose that $\varphi$ and $f$ are continuous. By Proposition 1339, the set $\varphi(\theta)$ is closed for each $\theta \in \Theta$. Since $\varphi$ is bounded, $\varphi(\theta)$ turns out to be compact as well. By Proposition 1357, $S = \Theta$ and $\sigma$ is compact-valued. Fix any point $\bar{\theta} \in \Theta$ and consider a sequence $\{\theta_n\} \subseteq \Theta$ such that $\lim_{n \to \infty} \theta_n = \bar{\theta}$. We first prove that $\{v(\theta_n)\}$ is bounded. By contradiction, assume that $\sup_n |v(\theta_n)| = +\infty$. It follows that there exists a subsequence $\{\theta_{n_k}\}$ such that $|v(\theta_{n_k})| \ge k$ for every $k \ge 1$. For each $k \ge 1$, let $\hat{x}_{n_k} \in \varphi(\theta_{n_k})$ be such that $v(\theta_{n_k}) = f(\hat{x}_{n_k}, \theta_{n_k})$. By the Bolzano-Weierstrass' Theorem and since $\varphi$ is bounded, there exists a subsequence $\{\hat{x}_{n_{k_s}}\}$ that converges to some $\bar{x} \in K$. Since $\varphi$ is continuous and $\lim_{s \to \infty} \theta_{n_{k_s}} = \bar{\theta}$, we can conclude that $\bar{x} \in \varphi(\bar{\theta})$. Since $f$ is continuous, this implies that

$$+\infty = \lim_{s \to \infty} |v(\theta_{n_{k_s}})| = \lim_{s \to \infty} |f(\hat{x}_{n_{k_s}}, \theta_{n_{k_s}})| = |f(\bar{x}, \bar{\theta})| < +\infty$$

a contradiction. We proceed by showing that $\limsup_{n \to \infty} v(\theta_n) \le v(\bar{\theta})$. In light of the previous part of the proof, we can use Lemma 1363. Set $\limsup_{n \to \infty} v(\theta_n) = \beta \in \mathbb{R}$. By Lemma 1363, there exists a subsequence $\{\theta_{n_k}\}$ such that $\lim_{k \to \infty} v(\theta_{n_k}) = \beta$. Let $\hat{x}_k \in \sigma(\theta_{n_k})$ for each $k \ge 1$, so that $f(\hat{x}_k, \theta_{n_k}) = v(\theta_{n_k})$ for each $k \ge 1$. Since $\varphi$ is bounded, the sequence of vectors $\{\hat{x}_k\}$ is bounded. By the Bolzano-Weierstrass' Theorem, there is a subsequence $\{\hat{x}_{k_s}\}$ that converges to some $\bar{x} \in \mathbb{R}^n$. Since $\lim_{s \to \infty} \theta_{n_{k_s}} = \bar{\theta}$, it follows that $\bar{x} \in \varphi(\bar{\theta})$ because $\varphi$ is upper hemicontinuous. Since $f$ is continuous, this implies

$$\beta = \lim_{s \to \infty} v(\theta_{n_{k_s}}) = \lim_{s \to \infty} f(\hat{x}_{k_s}, \theta_{n_{k_s}}) = f(\bar{x}, \bar{\theta}) \le v(\bar{\theta})$$

We conclude that $\beta \le v(\bar{\theta})$, as desired.
Next, we show that $\liminf_{n \to \infty} v(\theta_n) \ge v(\bar{\theta})$. Set $\liminf_{n \to \infty} v(\theta_n) = \gamma \in \mathbb{R}$. By Lemma 1363, there exists a subsequence $\{\theta_{n_k}\}$ such that $\lim_{k \to \infty} v(\theta_{n_k}) = \gamma$. Since $S = \Theta$, there is $\bar{x} \in \varphi(\bar{\theta})$ such that $v(\bar{\theta}) = f(\bar{x}, \bar{\theta})$. Since $\varphi$ is lower hemicontinuous, there exist elements $x_k \in \varphi(\theta_{n_k})$ such that $\lim_{k \to \infty} x_k = \bar{x}$. Since $v(\theta_{n_k}) \ge f(x_k, \theta_{n_k})$ for each $k \ge 1$, the continuity of $f$ implies

$$\gamma = \lim_{k \to \infty} v(\theta_{n_k}) \ge \lim_{k \to \infty} f(x_k, \theta_{n_k}) = f(\bar{x}, \bar{\theta}) = v(\bar{\theta})$$

Hence, $\liminf_{n \to \infty} v(\theta_n) \ge v(\bar{\theta})$, as desired. We conclude that

$$v(\bar{\theta}) \le \liminf_{n \to \infty} v(\theta_n) \le \limsup_{n \to \infty} v(\theta_n) \le v(\bar{\theta})$$

so $\lim_{n \to \infty} v(\theta_n) = v(\bar{\theta})$.
It remains to show that $\sigma$ is upper hemicontinuous at $\bar{\theta}$. Let $\theta_n \to \bar{\theta}$ and $x_n \to x$ with $x_n \in \sigma(\theta_n)$. We want to show that $x \in \sigma(\bar{\theta})$. Since $\sigma(\theta_n) \subseteq \varphi(\theta_n)$ and $\varphi$ is upper hemicontinuous, clearly $x \in \varphi(\bar{\theta})$. By the continuity of both $f$ and $v$, we then have

$$f(x, \bar{\theta}) = \lim_{n \to \infty} f(x_n, \theta_n) = \lim_{n \to \infty} v(\theta_n) = v(\bar{\theta})$$

and so $x \in \sigma(\bar{\theta})$, as desired.

For instance, the continuity properties of demand correspondences and indirect utility functions follow from the Maximum Theorem. To this end, we need the following continuity property of the budget correspondence.

Proposition 1364 The budget correspondence is continuous at all $(p, w)$ such that $w > 0$. In other words, the budget correspondence $B$ is continuous on $\mathbb{R}^n_+ \times \mathbb{R}_{++}$.

Proof Let $(p, w) \in \mathbb{R}^n_+ \times \mathbb{R}_{++}$. We first show that $B$ is upper hemicontinuous at $(p, w)$. Let $(p_n, w_n) \to (p, w)$, $x_n \to x$ and $x_n \in B(p_n, w_n)$. We want to show that $x \in B(p, w)$. Since $p_n \cdot x_n \le w_n$ for each $n$, it holds $p \cdot x = \lim_{n \to \infty} p_n \cdot x_n \le \lim_{n \to \infty} w_n = w$, that is, $x \in B(p, w)$. We conclude that $B$ is upper hemicontinuous at $(p, w)$.
The correspondence $B$ is also lower hemicontinuous at $(p, w)$. Let $(p_n, w_n) \to (p, w)$ and $x \in B(p, w)$. We want to show that there is a sequence $\{x_n\}$ such that $x_n \in B(p_n, w_n)$ and $x_n \to x$. We consider two cases.

(i) Suppose $p \cdot x < w$. Since $(p_n, w_n) \to (p, w)$, there is $\bar{n}$ large enough so that $p_n \cdot x < w_n$ for all $n \ge \bar{n}$. Hence, the constant sequence $x_n = x$ is such that $x_n \in B(p_n, w_n)$ for all $n \ge \bar{n}$ and $x_n \to x$.

(ii) Suppose $p \cdot x = w$. Since $w > 0$, there is $x' \in \mathbb{R}^n_+$ such that $p \cdot x' < w$. Since $(p_n, w_n) \to (p, w)$, there is $\bar{n}$ large enough so that $p_n \cdot x' < w_n$ for all $n \ge \bar{n}$. Set

$$x_n = \left(1 - \frac{1}{n}\right) x + \frac{1}{n} x'$$

We have $x_n \in B(p_n, w_n)$ for all $n$ large enough and $x_n \to x$.

In both cases, the existence of a sequence $\{x_n\}$ with $x_n \in B(p_n, w_n)$ and $x_n \to x$ then easily follows. We conclude that $B$ is lower hemicontinuous at $(p, w)$.

We can now apply the Maximum Theorem to the consumer problem which, under a mild continuity hypothesis on the utility function, turns out to be stable with respect to changes in prices and wealth.

Proposition 1365 Suppose that $u : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is a continuous utility function defined on a compact consumption set. Let $(p, w) \in \mathbb{R}^n_+ \times \mathbb{R}_{++}$. Then:

(i) the demand correspondence is compact-valued and upper hemicontinuous at $(p, w)$;

(ii) the indirect utility function is continuous at $(p, w)$.

Proof Since the consumption set is compact, the budget correspondence is bounded and continuous on $\mathbb{R}^n_+ \times \mathbb{R}_{++}$. Since the utility function is continuous, the result then follows from the Maximum Theorem.

Observe that (i) implies that demand functions are continuous at $(p, w)$, since upper hemicontinuity and continuity coincide for bounded functions (Proposition 1341).

33.4 Envelope theorems I: fixed constraint

How do value functions react to changes in parameters? In other words, how do the objective functions' optimal levels change when parameters change? The answer to this basic comparative statics exercise depends, clearly, on how solutions react to such changes, as optimal levels are attained at the solutions. Mathematically, under differentiability it amounts to studying the gradient $\nabla v(\theta)$ of the value function. This is the subject matter of the envelope theorems.
We begin by considering in this section the special case

$$\max_x f(x, \theta) \quad \text{sub} \quad x \in C \tag{33.4}$$

where the feasibility correspondence is constant, with $\varphi(\theta) = C \subseteq A$ for all $\theta \in \Theta$. The parameter only affects the objective function. To ease matters, throughout the section we also assume that $S = \Theta$.
We first approach the issue heuristically. To this end, suppose that $n = k = 1$ so that both the parameter $\theta$ and the choice variable $x$ are scalars. Moreover, assume that there is a unique solution for each $\theta$, so that $\sigma : \Theta \to \mathbb{R}$ is the solution function. Then $v(\theta) = f(\sigma(\theta), \theta)$ for every $\theta \in \Theta$. A heuristic application of the chain rule, a "back of the envelope calculation", then suggests that, if it exists, the derivative of $v$ at $\theta_0$ is:

$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x} \sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}$$

Remarkably, the first term is null because by Fermat's Theorem $(\partial f / \partial x)(\sigma(\theta_0), \theta_0) = 0$ (provided the solution is interior). Thus,

$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} \tag{33.5}$$

Next we make this important finding general and rigorous.

Theorem 1366 Suppose $f(x, \cdot)$ is, for every $x \in C$, differentiable at $\theta_0 \in \operatorname{int} \Theta$. If $v$ is differentiable at $\theta_0$, then for every $\hat{x} \in \sigma(\theta_0)$ we have $\nabla v(\theta_0) = \nabla_\theta f(\hat{x}, \theta_0)$, that is,

$$\frac{\partial v(\theta_0)}{\partial \theta_i} = \frac{\partial f(\hat{x}, \theta_0)}{\partial \theta_i} \qquad \forall i = 1, ..., k \tag{33.6}$$

If $f$ is strictly quasi-concave in $x$ and $\varphi$ is convex-valued, then $\sigma$ is a function (Proposition 1359). So, (33.6) can be written as

$$\frac{\partial v(\theta_0)}{\partial \theta_i} = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta_i} \qquad \forall i = 1, ..., k$$

which is the general form of the heuristic formula (33.5).

Proof Let $\theta_0 \in \operatorname{int} \Theta$. Let $x(\theta_0) \in \sigma(\theta_0)$ be an optimal solution at $\theta_0$, so that $v(\theta_0) = f(x(\theta_0), \theta_0)$. Define $w : \Theta \to \mathbb{R}$ by $w(\theta) = f(x(\theta_0), \theta)$. We have $v(\theta_0) = w(\theta_0)$ and, for all $\theta \in \Theta$,

$$w(\theta) = f(x(\theta_0), \theta) \le \max_{x \in C} f(x, \theta) = v(\theta) \tag{33.7}$$

We thus have

$$\frac{w(\theta_0 + tu) - w(\theta_0)}{t} \le \frac{v(\theta_0 + tu) - v(\theta_0)}{t}$$

for all $u \in \mathbb{R}^k$ and $t > 0$ sufficiently small. Hence,

$$\frac{\partial f(x(\theta_0), \theta_0)}{\partial \theta_i} = \lim_{h \to 0^+} \frac{f(x(\theta_0), \theta_0 + h e^i) - f(x(\theta_0), \theta_0)}{h} = \lim_{h \to 0^+} \frac{w(\theta_0 + h e^i) - w(\theta_0)}{h}$$
$$\le \lim_{h \to 0^+} \frac{v(\theta_0 + h e^i) - v(\theta_0)}{h} = \frac{\partial v(\theta_0)}{\partial \theta_i}$$

On the other hand,

$$\frac{w(\theta_0 + tu) - w(\theta_0)}{t} \ge \frac{v(\theta_0 + tu) - v(\theta_0)}{t}$$

for all $u \in \mathbb{R}^k$ and $t < 0$ sufficiently small. By proceeding as before, we then have

$$\frac{\partial f(x(\theta_0), \theta_0)}{\partial \theta_i} \ge \frac{\partial v(\theta_0)}{\partial \theta_i}$$

This proves (33.6).

The hypothesis that $v$ is differentiable is not that appealing because it is not in terms of the primitive elements $f$ and $C$ of problem (33.4). Indeed, to check it we need to know the value function. Remarkably, in concave problems this differentiability hypothesis follows from hypotheses that bear directly on the objective function.

Theorem 1367 Let $C$ and $\Theta$ be convex. Suppose $f(x, \cdot)$ is, for every $x \in C$, differentiable at $\theta_0 \in \operatorname{int} \Theta$. If $f$ is concave on $C \times \Theta$, then $v$ is differentiable at $\theta_0$.

Thus, if $f$ is differentiable in the parameter and is concave, then $\nabla v(\theta_0) = \nabla_\theta f(\hat{x}, \theta_0)$ for all $\hat{x} \in \sigma(\theta_0)$. If, in addition, $f$ is strictly concave in $x$, then we can directly write $\nabla v(\theta_0) = \nabla_\theta f(\sigma(\theta_0), \theta_0)$ because $\sigma$ is a function and $\sigma(\theta_0)$ is the unique solution at $\theta_0$.

Proof By Proposition 1360, $v$ is concave. We begin by proving that $\partial v(\theta_0) \subseteq \bigcap_{x \in \sigma(\theta_0)} \partial_\theta f(x, \theta_0)$. Let $\xi \in \partial v(\theta_0)$, so that $v(\theta) \le v(\theta_0) + \xi \cdot (\theta - \theta_0)$ for all $\theta \in \Theta$. Being $v(\theta_0) = w(\theta_0)$, by (33.7) we have, for all $\theta \in \Theta$,

$$w(\theta) \le v(\theta) \le v(\theta_0) + \xi \cdot (\theta - \theta_0) = w(\theta_0) + \xi \cdot (\theta - \theta_0)$$

Hence, $\xi \in \partial w(\theta_0) = \partial_\theta f(x, \theta_0)$ for all $x \in \sigma(\theta_0)$. Since $v$ is concave at $\theta_0 \in \operatorname{int} \Theta$, by Proposition 1143 we have $\partial v(\theta_0) \ne \emptyset$. Since $f(x, \cdot)$ is, for every $x \in \sigma(\theta_0)$, differentiable at $\theta_0$, we have $\partial_\theta f(x, \theta_0) = \{\nabla_\theta f(x, \theta_0)\}$ by Proposition 1142. We conclude that $\partial v(\theta_0) = \{\nabla_\theta f(x, \theta_0)\}$. By Proposition 1142, $v$ is differentiable at $\theta_0$.
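The envelope formula (33.6) is easy to verify numerically on a made-up instance of problem (33.4). A minimal Python sketch comparing a finite difference of $v$ with the partial derivative of $f$ in $\theta$ at the optimum:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# v(theta) = max_{x in [0, 2]} f(x, theta) with f(x, theta) = -(x - theta)^2 + theta*x.
f = lambda x, th: -(x - th) ** 2 + th * x

def v(th):
    res = minimize_scalar(lambda x: -f(x, th), bounds=(0.0, 2.0), method='bounded')
    return -res.fun, res.x

th0, h = 0.8, 1e-5
v_plus, _ = v(th0 + h)
v_minus, _ = v(th0 - h)
_, x_hat = v(th0)

# Envelope theorem: v'(theta0) equals df/dtheta at the optimum,
# here df/dtheta = 2*(x - theta) + x evaluated at x_hat (which is interior).
print((v_plus - v_minus) / (2 * h))      # central finite difference of v
print(2 * (x_hat - th0) + x_hat)         # envelope prediction; the two agree (= 2.0)
```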

33.5 Envelope theorems II: variable constraint

Matters are less clean when the feasibility correspondence is not constant. We consider a parametric optimization problem with equality constraints

$$\max_x f(x, \theta) \quad \text{sub} \quad \gamma_i(x, \theta) = 0 \quad \forall i = 1, ..., m \tag{33.8}$$

where $\gamma = (\gamma_1, ..., \gamma_m) : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^k \to \mathbb{R}^m$ and $\theta = (\theta_1, ..., \theta_k) \in \mathbb{R}^k$.
Here $\varphi(\theta) = \{x \in A : \gamma_i(x, \theta) = 0 \ \forall i = 1, ..., m\}$, so the constraint varies with the parameter $\theta$. For instance, if $f$ does not depend on $\theta$ and $\gamma_i(x, \theta) = g_i(x) - \theta_i$ for $i = 1, ..., m$ (so that $k = m$), we get back to the familiar problem (29.37) of Chapter 29, that is,

$$\max_x f(x) \quad \text{sub} \quad g_i(x) = b_i \quad \forall i = 1, ..., m$$

Again, we begin with a heuristic argument. Assume that $n = k = m = 1$, so that there is a single constraint and both the parameter $\theta$ and the choice variable $x$ are scalars. Moreover, assume that there is a unique solution for each $\theta$, so that $\sigma : \Theta \to \mathbb{R}$ is the solution function and $\sigma(\theta)$ is the unique solution that corresponds to $\theta$. A heuristic application of the chain rule suggests that, if it exists, the derivative of $v$ at $\theta_0$ is

$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} - \hat{\lambda}(\theta_0) \frac{\partial \gamma(\sigma(\theta_0), \theta_0)}{\partial \theta}$$

where $\hat{\lambda}(\theta_0)$ is the Lagrange multiplier that corresponds to the unique solution $\sigma(\theta_0)$. Indeed, being $\gamma(\sigma(\theta), \theta) = 0$ for every $\theta \in \Theta$, by a heuristic application of the chain rule we have

$$\frac{\partial \gamma(\sigma(\theta_0), \theta_0)}{\partial x} \sigma'(\theta_0) + \frac{\partial \gamma(\sigma(\theta_0), \theta_0)}{\partial \theta} = 0$$

On the other hand, being $v(\theta) = f(\sigma(\theta), \theta)$ for every $\theta \in \Theta$, again by a heuristic application of the chain rule we have

$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x} \sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}$$

Since Lagrange's condition gives $\partial f / \partial x = \hat{\lambda}(\theta_0) \, \partial \gamma / \partial x$ at $(\sigma(\theta_0), \theta_0)$, we then get

$$v'(\theta_0) = \hat{\lambda}(\theta_0) \frac{\partial \gamma(\sigma(\theta_0), \theta_0)}{\partial x} \sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} - \hat{\lambda}(\theta_0) \frac{\partial \gamma(\sigma(\theta_0), \theta_0)}{\partial \theta}$$

as desired. Next we make the result more rigorous and general. We study the case of unique solutions, common in applications.

Theorem 1368 Suppose that problem (33.8) has a unique solution $\sigma(\theta)$ at all $\theta \in \Theta$.² Suppose that the sets $A$ and $\Theta$ are open and that $f$ and $\gamma$ are continuously differentiable on $A \times \Theta$. If the determinant of the Jacobian of the operator $(\nabla_x L, \gamma)$ is non-zero on $\Theta$, then

$$\nabla v(\theta) = \nabla_\theta f(\sigma(\theta), \theta) - \hat{\lambda}(\theta) \cdot \nabla_\theta \gamma(\sigma(\theta), \theta) \qquad \forall \theta \in \Theta$$

where $\hat{\lambda}(\theta)$ is the vector of Lagrange multipliers that corresponds to the unique solution $\sigma(\theta)$.

That is,

$$\frac{\partial v(\theta)}{\partial \theta_s} = \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta_s} - \sum_{i=1}^m \hat{\lambda}_i(\theta) \frac{\partial \gamma_i(\sigma(\theta), \theta)}{\partial \theta_s} \qquad \forall s = 1, ..., k \tag{33.9}$$

for all $\theta \in \Theta$.

Proof As in the heuristic argument, we consider the case $n = k = m = 1$ (the general case being just notationally messier). By hypothesis, there is a solution function $\sigma : \Theta \to A$. By Lagrange's Theorem, $\sigma$ is then the unique function that, along with a "multiplier" function $\hat{\lambda} : \Theta \to \mathbb{R}$, satisfies for all $\theta \in \Theta$ the equations

$$\nabla_x L(\sigma(\theta), \hat{\lambda}(\theta)) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x} - \hat{\lambda}(\theta) \frac{\partial \gamma(\sigma(\theta), \theta)}{\partial x} = 0$$
$$\nabla_\lambda L(\sigma(\theta), \hat{\lambda}(\theta)) = \gamma(\sigma(\theta), \theta) = 0$$

So, the operator $(\sigma, \hat{\lambda}) : \Theta \to A \times \mathbb{R}$ is defined implicitly at each $\theta \in \Theta$ by these equations. Since the Jacobian of the operator $(\nabla_x L, \gamma)$ is non-zero on $\Theta$, the operator version of Proposition 1200 ensures that the operator $(\sigma, \hat{\lambda})$ is continuously differentiable, with

$$\frac{\partial \gamma(\sigma(\theta), \theta)}{\partial \theta} + \frac{\partial \gamma(\sigma(\theta), \theta)}{\partial x} \sigma'(\theta) = 0 \qquad \forall \theta \in \Theta \tag{33.10}$$

We also have $v(\theta) = f(\sigma(\theta), \theta)$ for all $\theta \in \Theta$. By Theorem 957, $v$ is differentiable and, by the chain rule, we have

$$v'(\theta) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x} \sigma'(\theta) + \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta} \qquad \forall \theta \in \Theta \tag{33.11}$$

Putting together (33.10) and (33.11) via the simple algebra seen in the heuristic derivation, we get

$$v'(\theta) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x} \sigma'(\theta) + \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta} = \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta} - \hat{\lambda}(\theta) \frac{\partial \gamma(\sigma(\theta), \theta)}{\partial \theta} \qquad \forall \theta \in \Theta$$

as desired.

² Earlier in the chapter we saw which conditions ensure the existence and uniqueness of solutions.

33.6 Marginal interpretation of multipliers

Formula (33.9) continues to hold for the parametric optimization problem with both equality and inequality constraints (33.3), where it takes the form

$$\frac{\partial v(\theta_0)}{\partial \theta_s} = \frac{\partial f(\hat{x}, \theta_0)}{\partial \theta_s} - \sum_{i \in I} \hat{\lambda}_i(\theta_0) \frac{\partial \gamma_i(\sigma(\theta_0), \theta_0)}{\partial \theta_s} - \sum_{j \in J} \hat{\mu}_j(\theta_0) \frac{\partial \delta_j(\sigma(\theta_0), \theta_0)}{\partial \theta_s} \tag{33.12}$$

for every $s = 1, ..., k$. Here $(\hat{\lambda}(\theta_0), \hat{\mu}(\theta_0)) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+$ are the Lagrange multipliers associated with the solution $\sigma(\theta_0)$, assumed to be unique (for simplicity).
We can derive this formula heuristically with the argument that we just used for the equality case. Indeed, if we denote by $A(\sigma(\theta_0))$ the set of the binding constraints at $\theta_0$, by Lemma 1303 we have $\hat{\mu}_j = 0$ for each $j \notin A(\sigma(\theta_0))$. So, the non-binding constraints at $\theta_0$ do not affect the derivation because their multipliers are null.
That said, let us consider the standard problem (30.4) in which the objective function does not depend on the parameter, $\gamma_i(x, \theta) = g_i(x) - b_i$ for every $i \in I$, and $\delta_j(x, \theta) = h_j(x) - c_j$ for every $j \in J$ (Example 1355). Formula (33.12) then implies

$$\frac{\partial v(b, c)}{\partial b_i} = \hat{\lambda}_i(b, c) \qquad \forall i \in I$$
$$\frac{\partial v(b, c)}{\partial c_j} = \hat{\mu}_j(b, c) \qquad \forall j \in J$$

Interestingly, the multipliers describe the marginal effect on the value function of relaxing the constraints, that is, how valuable it is to relax them. In particular, we have $\partial v(b, c) / \partial c_j = \hat{\mu}_j(b, c) \ge 0$ because it is always beneficial to relax an inequality constraint: more alternatives become available. In contrast, this might not be the case for an equality constraint, so the sign of $\partial v(b, c) / \partial b_i = \hat{\lambda}_i(b, c)$ is ambiguous.
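The marginal interpretation $\partial v / \partial c_j = \hat{\mu}_j$ can be checked on a one-dimensional made-up problem whose solution and multiplier are known in closed form. A minimal Python sketch:

```python
# max f(x) = -(x - 2)^2  sub  h(x) = x <= c, with c < 2 so the constraint binds.
# Analytically x_hat = c, v(c) = -(c - 2)^2 and, from f'(x_hat) = mu_hat * h'(x_hat),
# the Kuhn-Tucker multiplier is mu_hat = -2*(c - 2) >= 0.
v = lambda c: -(min(c, 2.0) - 2.0) ** 2

c0, h = 1.5, 1e-6
dv_dc = (v(c0 + h) - v(c0 - h)) / (2 * h)   # finite difference of the value function
mu_hat = -2 * (c0 - 2.0)                    # multiplier at the solution x_hat = c0
print(round(dv_dc, 4), round(mu_hat, 4))    # both equal 1.0
```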

33.7 Monotone solutions

Given an objective function $f : I \times \Theta \to \mathbb{R}$, with $I \subseteq \mathbb{R}^n$ and $\Theta \subseteq \mathbb{R}^m$, consider a parametric optimization problem

$$\max_x f(x, \theta) \quad \text{sub} \quad x \in \varphi(\theta) \tag{33.13}$$

in which the feasibility correspondence $\varphi : \Theta \rightrightarrows I$ is assumed to be ascending: when $\theta \le \theta'$, if $x \in \varphi(\theta)$ and $y \in \varphi(\theta')$, then $x \wedge y \in \varphi(\theta)$ and $x \vee y \in \varphi(\theta')$. Note that when $\varphi$ is single-valued, this amounts to $\varphi(\theta) \le \varphi(\theta')$ whenever $\theta \le \theta'$, i.e., $\varphi$ is an increasing function.
The question that we address in this section is whether the solution correspondence of this class of parametric optimization problems is itself ascending, and so increasing when single-valued: higher values of the parameters translate into higher values of the solutions, i.e., $\theta \le \theta'$ implies $\sigma(\theta) \le \sigma(\theta')$. It is a monotonicity property of solutions that may be relevant in applications.³
The next class of functions will play a key role in our analysis.

³ We refer to Topkis (2011) for a detailed analysis of this topic. Throughout the section $I = I_1 \times \cdots \times I_n$ denotes a rectangle in $\mathbb{R}^n$, with each interval $I_i$ bounded or not.

Definition 1369 A function $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is parametrically supermodular if, given any $\theta \leq \theta'$, we have

$$f(x, \theta) + f(y, \theta') \leq f(x \vee y, \theta') + f(x \wedge y, \theta)$$

for all $x, y \in I$.

Given any $\theta \in \Theta$, the section $f(\cdot, \theta) : I \to \mathbb{R}$ is supermodular. Indeed, it is enough to set $\theta' = \theta$ in the previous definition. So, parametric supermodularity extends standard supermodularity to a parametric setting.

Example 1370 Given a function $\phi : I \to \mathbb{R}$, define $f : I \times \mathbb{R}^n \subseteq \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ by $f(x, \theta) = \phi(x) + \theta \cdot x$. For each $x \in I$ and $h \in \mathbb{R}^n$, with $x + h \in I$, we have

$$f(x, \theta) - f(x + h, \theta) = \phi(x) - \phi(x + h) - \theta \cdot h \qquad \forall \theta$$

Assume that $\phi$ is supermodular. Let $x, y \in I$ and set $h = y - x \vee y = x \wedge y - x \leq 0$. If $\theta \leq \theta'$, we have $0 \geq \theta \cdot h \geq \theta' \cdot h$ and so

$$f(x, \theta) - f(x \wedge y, \theta) = f(x, \theta) - f(x + h, \theta) = \phi(x) - \phi(x + h) - \theta \cdot h = \phi(x) - \phi(x \wedge y) - \theta \cdot h$$
$$\leq \phi(x \vee y) - \phi(y) - \theta' \cdot h = \phi(x \vee y) - \phi(x \vee y + h) - \theta' \cdot h$$
$$= f(x \vee y, \theta') - f(x \vee y + h, \theta') = f(x \vee y, \theta') - f(y, \theta')$$

We conclude that $f$ is parametrically supermodular. N

Example 1371 Assume that $\Theta \subseteq \mathbb{R}^m$ is a lattice. If $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is jointly supermodular on $I \times \Theta$, then it is easily seen to be parametrically supermodular. So, any condition that ensures such joint supermodularity of $f$, for instance a differential condition like (17.6), implies the parametric supermodularity of $f$. For instance, in the previous example assume that the supermodular function $\phi$ is twice differentiable and that $I$ and $\Theta$ are open intervals in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Then $\partial^2 f(x, \theta) / \partial x_i \partial x_j = \partial^2 \phi(x) / \partial x_i \partial x_j \geq 0$, while

$$\frac{\partial^2 f(x, \theta)}{\partial \theta_i \partial \theta_j} = 0 \quad \forall 1 \leq i \neq j \leq m \qquad \text{and} \qquad \frac{\partial^2 f(x, \theta)}{\partial x_i \partial \theta_j} \in \{0, 1\} \quad \forall 1 \leq i \leq n, \, \forall 1 \leq j \leq m$$

so all cross partial derivatives are positive. Condition (17.6) is satisfied, so $f$ is jointly supermodular. We conclude that $f$ is parametrically supermodular, thus confirming what was established in the previous example. N
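As a quick sanity check of the inequality in Definition 1369 (a sketch of ours, not from the text: the function $\phi(x) = x_1 x_2$, the grid, and the tolerance are illustrative assumptions), the following snippet verifies parametric supermodularity of $f(x, \theta) = \phi(x) + \theta \cdot x$ numerically:

```python
import itertools

def phi(x):            # phi(x) = x1*x2 is supermodular on R^2_+
    return x[0] * x[1]

def f(x, th):          # f(x, theta) = phi(x) + theta . x
    return phi(x) + sum(t * xi for t, xi in zip(th, x))

def meet(x, y): return tuple(min(a, b) for a, b in zip(x, y))
def join(x, y): return tuple(max(a, b) for a, b in zip(x, y))

grid = list(itertools.product([0.0, 0.5, 1.0, 2.0], repeat=2))
thetas = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0), (2.0, 3.0)]

ok = all(
    f(x, th) + f(y, th2) <= f(join(x, y), th2) + f(meet(x, y), th) + 1e-12
    for x in grid for y in grid
    for th in thetas for th2 in thetas
    if all(a <= b for a, b in zip(th, th2))   # only pairs with theta <= theta'
)
print(ok)  # True: the parametric supermodularity inequality holds on the grid
```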

Since we deal with optimization problems, it is natural to turn to ordinal properties.

Definition 1372 A function $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is parametrically semi-supermodular if, given any $\theta \leq \theta'$, for each $x, y \in I$ we have

$$f(x \wedge y, \theta) < f(x, \theta) \implies f(y, \theta') \leq f(x \vee y, \theta') \tag{33.14}$$

It is an ordinal property much weaker than parametric supermodularity.

Example 1373 Functions $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ that are increasing in $x$ for every $\theta$ are easily seen to be parametrically semi-supermodular. In particular, the function $f : \mathbb{R}^2_{++} \times (0, 1) \to \mathbb{R}$ defined by $f(x, \theta) = \theta \log(x_1 + x_2)$ is parametrically semi-supermodular; it is not, however, parametrically supermodular. N

We can now address the question that we posed at the beginning of this section. To ease matters, from now on we assume that problem (33.13) has a solution for every $\theta \in \Theta$ (e.g., $I$ is compact and $f$ is continuous in $x$), so we can write the solution correspondence as $\sigma : \Theta \rightrightarrows \mathbb{R}^n$. In most applications, comparative statics exercises actually feature solution functions $\sigma : \Theta \to \mathbb{R}^n$ rather than correspondences (as we already argued several times). This motivates the next result.

Proposition 1374 Let $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ be parametrically semi-supermodular. If the solution correspondence of the parametric optimization problem (33.13) is single-valued, then it is increasing.

Proof Suppose that $f$ is parametrically semi-supermodular and $\sigma$ is single-valued. By definition, $\sigma(\theta) = \arg\max_{x \in \varphi(\theta)} f(x, \theta)$ for all $\theta \in \Theta$. Let $\theta \leq \theta'$. Since $\varphi$ is ascending, we have $\sigma(\theta) \wedge \sigma(\theta') \in \varphi(\theta)$ and $\sigma(\theta) \vee \sigma(\theta') \in \varphi(\theta')$. So, by the definition of $\sigma(\theta')$ we have $f(\sigma(\theta) \vee \sigma(\theta'), \theta') \leq f(\sigma(\theta'), \theta')$, while by the definition of $\sigma(\theta)$ we have $f(\sigma(\theta) \wedge \sigma(\theta'), \theta) \leq f(\sigma(\theta), \theta)$. Suppose $f(\sigma(\theta), \theta) = f(\sigma(\theta) \wedge \sigma(\theta'), \theta)$. By the uniqueness of the solution, we have $\sigma(\theta) = \sigma(\theta) \wedge \sigma(\theta') \leq \sigma(\theta')$. Suppose, instead, that $f(\sigma(\theta), \theta) > f(\sigma(\theta) \wedge \sigma(\theta'), \theta)$. By (33.14), we have $f(\sigma(\theta'), \theta') \leq f(\sigma(\theta) \vee \sigma(\theta'), \theta')$, so $f(\sigma(\theta) \vee \sigma(\theta'), \theta') = f(\sigma(\theta'), \theta')$. By the uniqueness of the solution, we now have $\sigma(\theta') = \sigma(\theta) \vee \sigma(\theta') \geq \sigma(\theta)$. In both cases, we conclude that $\sigma(\theta) \leq \sigma(\theta')$, as desired.

Example 1375 From the last example we know that, given a supermodular function $\phi : I \to \mathbb{R}$, the function $f : I \times \mathbb{R}^n \subseteq \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by $f(x, \theta) = \phi(x) + \theta \cdot x$ is parametrically supermodular. Consider the parametric problem

$$\max_x \phi(x) + \theta \cdot x \quad \text{sub} \quad x \in \varphi(\theta)$$

where the feasibility correspondence $\varphi$ is ascending. By the last proposition, the solution correspondence of this problem is ascending. For instance, consider a Cobb-Douglas production function $\phi(x_1, x_2) = x_1^{\alpha_1} x_2^{\alpha_2}$, with $\alpha_1, \alpha_2 > 0$. If $q \geq 0$ is the output's price and $p = (p_1, p_2) \gg 0$ are the inputs' prices, the profit function $\pi(x, q) = q x_1^{\alpha_1} x_2^{\alpha_2} - p_1 x_1 - p_2 x_2$ is parametrically supermodular because $\phi$ is supermodular (see Example 760). The producer problem is

$$\max_{x_1, x_2} \pi(x, q) \quad \text{sub} \quad x_1, x_2 \geq 0$$

where the output's price $q$ plays the role of the parameter $\theta$. Since the profit function is strictly concave, solutions are unique (if they exist). In particular, a solution of the producer problem is an optimal amount of inputs that the producer will demand. By the last proposition,⁴ the solution function is increasing: if the output's price increases, the producer's demand for inputs increases. N
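A numerical illustration of this comparative statics result (our own sketch: the parameter values $\alpha_1 = \alpha_2 = 0.3$ and $p_1 = p_2 = 1$ are illustrative assumptions, chosen so the profit function is strictly concave). The first order conditions give the symmetric closed form below, which is visibly increasing in $q$:

```python
# Input demand for phi(x) = x1^a * x2^a with a = 0.3 and p1 = p2 = 1:
# the FOCs give x1 = x2 = x with q*a*x^(2a-1) = 1, i.e. x(q) = (a*q)^(1/(1-2a)).
a = 0.3

def demand(q):
    return (a * q) ** (1 / (1 - 2 * a))

for q in [1.0, 2.0, 4.0, 8.0]:
    print(q, demand(q))   # the input demand increases with the output price q
```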

Next we turn to the ordinal version of parametric supermodularity.

Definition 1376 A function $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is parametrically quasi-supermodular if, given any $\theta \leq \theta'$, for each $x, y \in I$ we have both

$$f(x \wedge y, \theta) < f(x, \theta) \implies f(y, \theta') < f(x \vee y, \theta') \tag{33.15}$$

and

$$f(y, \theta') > f(x \vee y, \theta') \implies f(x \wedge y, \theta) > f(x, \theta) \tag{33.16}$$

The next result motivates the "quasi" terminology.

Proposition 1377 Let $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ be parametrically supermodular. If $\varphi : \operatorname{Im} f \to \mathbb{R}$ is strictly increasing, then $\varphi \circ f$ is parametrically quasi-supermodular.

Clearly, parametric quasi-supermodularity implies parametric semi-supermodularity.

Example 1378 If $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is strictly monotone in $x$ for every $\theta$, then it is parametrically quasi-supermodular (as the reader can check). N

Parametric quasi-supermodularity allows us to extend Proposition 1374 to the multi-valued case.

⁴ The feasibility correspondence $\varphi : [0, +\infty) \rightrightarrows \mathbb{R}^2_+$ is given by $\varphi(q) = \mathbb{R}^2_+$, so it is trivially ascending.

Proposition 1379 If $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is parametrically quasi-supermodular, the solution correspondence of the parametric optimization problem (33.13) is ascending.

Proof Suppose that $f$ is parametrically quasi-supermodular. Let $x(\theta) \in \sigma(\theta) = \arg\max_{x \in \varphi(\theta)} f(x, \theta)$ for all $\theta \in \Theta$. Let $\theta \leq \theta'$. Since $\varphi$ is ascending, we have $x(\theta) \wedge x(\theta') \in \varphi(\theta)$ and $x(\theta) \vee x(\theta') \in \varphi(\theta')$. So, by the definition of $x(\theta')$ we have $f(x(\theta) \vee x(\theta'), \theta') \leq f(x(\theta'), \theta')$, while by the definition of $x(\theta)$ we have $f(x(\theta) \wedge x(\theta'), \theta) \leq f(x(\theta), \theta)$. Suppose $f(x(\theta) \vee x(\theta'), \theta') < f(x(\theta'), \theta')$. By (33.16), $f(x(\theta) \wedge x(\theta'), \theta) > f(x(\theta), \theta)$, a contradiction. We conclude that $f(x(\theta) \vee x(\theta'), \theta') = f(x(\theta'), \theta')$, so $x(\theta) \vee x(\theta') \in \sigma(\theta')$. Suppose $f(x(\theta) \wedge x(\theta'), \theta) < f(x(\theta), \theta)$. By (33.15), $f(x(\theta'), \theta') < f(x(\theta) \vee x(\theta'), \theta')$, a contradiction. We conclude that $f(x(\theta) \wedge x(\theta'), \theta) = f(x(\theta), \theta)$, so $x(\theta) \wedge x(\theta') \in \sigma(\theta)$. This proves that $\sigma$ is ascending.
Chapter 34

Interdependent optimization

So far we have considered individual optimization problems. Many economic and social phenomena, however, are characterized by the interplay of several such problems, in which the outcomes of agents' decisions depend on their own decisions as well as on the decisions of other agents. Market interactions are an obvious example of interdependence among agents' decisions: for instance, in an oligopoly problem the profits that each producer can earn depend both on his production decision and on the production decisions of the other oligopolists.

Interdependent decisions must coexist: the mutual compatibility of agents' decisions is the novel conceptual issue that emerges in the study of interdependent optimization. Equilibrium notions address this issue. In this chapter we present an introductory mathematical analysis of this most important topic, which is the subject matter of game theory and is at the heart of economic analysis. In particular, the theorems of von Neumann and Nash that we will present in this chapter are wonderful examples of deep mathematical results that have been motivated by economic applications.

34.1 Minimax Theorem

Definition 1380 Let $f : A_1 \times A_2 \to \mathbb{R}$ be a real-valued function and $C_1$ and $C_2$ subsets of $A_1$ and $A_2$, respectively. A pair $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ is said to be a saddle point of $f$ on $C_1 \times C_2$ if

$$f(\hat{x}_1, x_2) \geq f(\hat{x}_1, \hat{x}_2) \geq f(x_1, \hat{x}_2) \qquad \forall x_1 \in C_1, \, \forall x_2 \in C_2 \tag{34.1}$$

The value $f(\hat{x}_1, \hat{x}_2)$ of the function at $\hat{x}$ is called the saddle value of $f$ on $C_1 \times C_2$.

In other words, $(\hat{x}_1, \hat{x}_2)$ is a saddle point if the function $f(\hat{x}_1, \cdot) : C_2 \to \mathbb{R}$ has a minimum at $\hat{x}_2$ and the function $f(\cdot, \hat{x}_2) : C_1 \to \mathbb{R}$ has a maximum at $\hat{x}_1$. To visualize these points, think of the centers of horse saddles: these points at the same time maximize $f$ along one dimension and minimize it along the other, perpendicular, one. This motivates their name. Their nature is clarified by the next characterization.

Proposition 1381 Let $f : A_1 \times A_2 \to \mathbb{R}$ be a real-valued function and $C_1$ and $C_2$ subsets of $A_1$ and $A_2$, respectively. A pair $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ is a saddle point of $f$ on $C_1 \times C_2$ if and only if¹

(i) the function $\inf_{x_2 \in C_2} f(\cdot, x_2) : C_1 \to [-\infty, +\infty)$ attains its maximum value at $\hat{x}_1$,

(ii) the function $\sup_{x_1 \in C_1} f(x_1, \cdot) : C_2 \to (-\infty, +\infty]$ attains its minimum value at $\hat{x}_2$,

(iii) the two values are equal, i.e.,

$$\max_{x_1 \in C_1} \inf_{x_2 \in C_2} f(x_1, x_2) = f(\hat{x}_1, \hat{x}_2) = \min_{x_2 \in C_2} \sup_{x_1 \in C_1} f(x_1, x_2) \tag{34.2}$$

¹ Since we have inf and sup, we must allow the values $-\infty$ and $+\infty$, respectively.

This characterization consists of two optimization conditions, (i) and (ii), and a final condition, (iii), that requires their mutual consistency. Let us consider these conditions one by one.

By condition (i), the component $\hat{x}_1$ of a saddle point, called maximinimizer, solves the following optimization problem, called maximinimization (or primal) problem,

$$\max_{x_1} \inf_{x_2 \in C_2} f(x_1, x_2) \quad \text{sub} \quad x_1 \in C_1 \tag{34.3}$$

where $\inf_{x_2 \in C_2} f(\cdot, x_2) : C_1 \to [-\infty, +\infty)$ is the objective function. If $f$ does not depend on $x_2$, this problem reduces to the standard maximization problem

$$\max_{x_1} f(x_1) \quad \text{sub} \quad x_1 \in C_1 \tag{34.4}$$

where the maximinimizer $\hat{x}_1$ becomes a standard maximizer.

By condition (ii), the component $\hat{x}_2$ of a saddle point, called minimaximizer, solves the following optimization problem, called minimaximization (or dual) problem,

$$\min_{x_2} \sup_{x_1 \in C_1} f(x_1, x_2) \quad \text{sub} \quad x_2 \in C_2 \tag{34.5}$$

where $\sup_{x_1 \in C_1} f(x_1, \cdot) : C_2 \to (-\infty, +\infty]$ is the objective function. If $f$ does not depend on $x_1$, this problem reduces to the standard minimization problem

$$\min_{x_2} f(x_2) \quad \text{sub} \quad x_2 \in C_2$$

where the minimaximizer $\hat{x}_2$ becomes a standard minimizer.

The optimization problems (34.3) and (34.5) that underlie conditions (i) and (ii) are dual: in one we first minimize over $x_2$ and then maximize over $x_1$, in the other we do the opposite. The consistency condition (iii) makes these dual optimization problems interchangeable in terms of the value attained, by requiring their values to be equal.

The optimization conditions (i) and (ii) have standard optimization (maximization or minimization) problems as special cases, so conceptually they are generalizations of familiar notions. In contrast, the consistency condition (iii) is the actual novel feature of the characterization in that it introduces a notion of mutual consistency between optimization problems, which are no longer studied in isolation, as we did so far. The scope of this condition will become clearer with the notion of Nash equilibrium.

The proof of Proposition 1381 relies on the following simple but important lemma (inter alia, it shows that the more interesting part in an equality sup inf = inf sup is the inequality sup inf $\geq$ inf sup).

Lemma 1382 For any function $f : A_1 \times A_2 \to \mathbb{R}$, we have

$$\sup_{x_1 \in A_1} \inf_{x_2 \in A_2} f(x_1, x_2) \leq \inf_{x_2 \in A_2} \sup_{x_1 \in A_1} f(x_1, x_2)$$

Proof Clearly, $f(x_1, x_2) \geq \inf_{x_2 \in A_2} f(x_1, x_2)$ for all $(x_1, x_2) \in A_1 \times A_2$, so

$$\sup_{x_1 \in A_1} f(x_1, x_2) \geq \sup_{x_1 \in A_1} \inf_{x_2 \in A_2} f(x_1, x_2) \qquad \forall x_2 \in A_2$$

Then, $\inf_{x_2 \in A_2} \sup_{x_1 \in A_1} f(x_1, x_2) \geq \sup_{x_1 \in A_1} \inf_{x_2 \in A_2} f(x_1, x_2)$.
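For a finite illustration of this inequality (a sketch of ours, with arbitrary payoff matrices), take $f$ given by a matrix, with $x_1$ the row and $x_2$ the column: the max of the row minima never exceeds the min of the column maxima, and the two coincide exactly when the matrix has a saddle point.

```python
def maximin(M):
    return max(min(row) for row in M)        # sup_x1 inf_x2 f

def minimax(M):
    cols = list(zip(*M))
    return min(max(col) for col in cols)     # inf_x2 sup_x1 f

M1 = [[1, 3], [4, 2]]            # no saddle point: strict inequality
M2 = [[2, 3], [0, 1]]            # the entry (row 0, col 0) is a saddle point
print(maximin(M1), minimax(M1))  # 2 3 -> sup inf < inf sup
print(maximin(M2), minimax(M2))  # 2 2 -> equality at the saddle value
```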

Proof of Proposition 1381 "Only if". Let $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ be a saddle point of $f$ on $C_1 \times C_2$. By (34.1),

$$\inf_{x_2 \in C_2} f(\hat{x}_1, x_2) = f(\hat{x}_1, \hat{x}_2) = \sup_{x_1 \in C_1} f(x_1, \hat{x}_2) \tag{34.6}$$

So,

$$\sup_{x_1 \in C_1} \inf_{x_2 \in C_2} f(x_1, x_2) \geq f(\hat{x}_1, \hat{x}_2) \geq \inf_{x_2 \in C_2} \sup_{x_1 \in C_1} f(x_1, x_2)$$

By the previous lemma, the inequalities are actually equalities, that is,

$$\sup_{x_1 \in C_1} \inf_{x_2 \in C_2} f(x_1, x_2) = f(\hat{x}_1, \hat{x}_2) = \inf_{x_2 \in C_2} \sup_{x_1 \in C_1} f(x_1, x_2)$$

From (34.6) it follows that

$$\inf_{x_2 \in C_2} f(\hat{x}_1, x_2) = \sup_{x_1 \in C_1} \inf_{x_2 \in C_2} f(x_1, x_2) \qquad \text{and} \qquad \sup_{x_1 \in C_1} f(x_1, \hat{x}_2) = \inf_{x_2 \in C_2} \sup_{x_1 \in C_1} f(x_1, x_2)$$

which, in turn, implies (34.2). This proves the "only if".

"If". By (i) and (iii) we have $f(\hat{x}_1, \hat{x}_2) = \max_{x_1 \in C_1} \inf_{x_2 \in C_2} f(x_1, x_2) = \inf_{x_2 \in C_2} f(\hat{x}_1, x_2)$. By (ii) and (iii), $f(\hat{x}_1, \hat{x}_2) = \min_{x_2 \in C_2} \sup_{x_1 \in C_1} f(x_1, x_2) = \sup_{x_1 \in C_1} f(x_1, \hat{x}_2)$. Hence,

$$\inf_{x_2 \in C_2} f(\hat{x}_1, x_2) = f(\hat{x}_1, \hat{x}_2) = \sup_{x_1 \in C_1} f(x_1, \hat{x}_2)$$

which, in turn, implies that $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ is a saddle point of $f$.

The last proposition implies the next remarkable interchangeability property of saddle points.

Corollary 1383 Let $f : A_1 \times A_2 \to \mathbb{R}$ be a real-valued function and $C_1$ and $C_2$ subsets of $A_1$ and $A_2$, respectively. If the pairs $(\hat{x}_1, \hat{x}_2), (\hat{x}_1', \hat{x}_2') \in C_1 \times C_2$ are saddle points of $f$ on $C_1 \times C_2$, so are the pairs $(\hat{x}_1, \hat{x}_2'), (\hat{x}_1', \hat{x}_2) \in C_1 \times C_2$.

In words, if we interchange the two components of a saddle point, we get a new saddle point.

Proof It is enough to consider $(\hat{x}_1, \hat{x}_2')$. Since $(\hat{x}_1, \hat{x}_2)$ is a saddle point of $f$ on $C_1 \times C_2$, by Proposition 1381 the function $\inf_{x_2 \in C_2} f(\cdot, x_2) : C_1 \to [-\infty, +\infty)$ attains its maximum value at $\hat{x}_1$. Since $(\hat{x}_1', \hat{x}_2')$ is a saddle point of $f$ on $C_1 \times C_2$, by Proposition 1381 the function $\sup_{x_1 \in C_1} f(x_1, \cdot) : C_2 \to (-\infty, +\infty]$ attains its minimum value at $\hat{x}_2'$. In turn, by the "if" part of Proposition 1381 this implies that $(\hat{x}_1, \hat{x}_2')$ is a saddle point of $f$ on $C_1 \times C_2$.

A function $f : A_1 \times A_2 \to \mathbb{R}$ defined on a Cartesian product $A_1 \times A_2$ induces the functions $f^{x_1} : A_2 \to \mathbb{R}$ defined by $f^{x_1}(x_2) = f(x_1, x_2)$ for each $x_1 \in A_1$, as well as the functions $f^{x_2} : A_1 \to \mathbb{R}$ defined by $f^{x_2}(x_1) = f(x_1, x_2)$ for each $x_2 \in A_2$. These functions are called the sections of $f$ (see Section 17.3.1). Using this terminology, we can say that $(\hat{x}_1, \hat{x}_2)$ is a saddle point of $f$ if and only if the section $f^{\hat{x}_1} : C_2 \to \mathbb{R}$ attains its minimum value at $\hat{x}_2$ and the section $f^{\hat{x}_2} : C_1 \to \mathbb{R}$ attains its maximum value at $\hat{x}_1$.

This remark easily leads, via Stampacchia's Theorem, to a differential characterization of saddle points. To this end, as we did earlier in the book, in the gradient²

$$\nabla f(x_1, x_2) = \left( \frac{\partial f(x_1, x_2)}{\partial x_{11}}, \dots, \frac{\partial f(x_1, x_2)}{\partial x_{1m}}, \frac{\partial f(x_1, x_2)}{\partial x_{21}}, \dots, \frac{\partial f(x_1, x_2)}{\partial x_{2n}} \right)$$

of a function $f : A_1 \times A_2 \subseteq \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ we distinguish the two parts $\nabla_{x_1} f(x_1, x_2)$ and $\nabla_{x_2} f(x_1, x_2)$ defined by:

$$\nabla_{x_1} f(x_1, x_2) = \left( \frac{\partial f(x_1, x_2)}{\partial x_{11}}, \dots, \frac{\partial f(x_1, x_2)}{\partial x_{1m}} \right)$$
$$\nabla_{x_2} f(x_1, x_2) = \left( \frac{\partial f(x_1, x_2)}{\partial x_{21}}, \dots, \frac{\partial f(x_1, x_2)}{\partial x_{2n}} \right)$$

This distinction is key for the next differential characterization of saddle points.

Proposition 1384 Let $f : A_1 \times A_2 \subseteq \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ be a real-valued function and $C_1$ and $C_2$ subsets of $A_1$ and $A_2$, respectively. Suppose that

(i) $C_i$ is a closed and convex subset of the open and convex set $A_i$ for $i = 1, 2$;

(ii) $f$ is continuously differentiable in both $x_1$ and $x_2$.³

If $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ is a saddle point of $f$ on $C_1 \times C_2$, then

$$\nabla_{x_1} f(\hat{x}_1, \hat{x}_2) \cdot (x_1 - \hat{x}_1) \leq 0 \qquad \forall x_1 \in C_1 \tag{34.7}$$
$$\nabla_{x_2} f(\hat{x}_1, \hat{x}_2) \cdot (x_2 - \hat{x}_2) \geq 0 \qquad \forall x_2 \in C_2 \tag{34.8}$$

The converse is true if $f$ is concave in $x_1 \in C_1$ and convex in $x_2 \in C_2$.⁴

Proof It is enough to note that $\hat{x}_1$ is a maximizer of the function $f(\cdot, \hat{x}_2) : A_1 \subseteq \mathbb{R}^m \to \mathbb{R}$ on $C_1$, while $\hat{x}_2$ is a minimizer of the function $f(\hat{x}_1, \cdot) : A_2 \subseteq \mathbb{R}^n \to \mathbb{R}$ on $C_2$. By Stampacchia's Theorem, the result holds.

² Here $x_1 = (x_{11}, \dots, x_{1m}) \in \mathbb{R}^m$ and $x_2 = (x_{21}, \dots, x_{2n}) \in \mathbb{R}^n$ denote generic vectors in $A_1$ and $A_2$, respectively.
³ That is, given any $x_2 \in A_2$ the section $f^{x_2} : A_1 \to \mathbb{R}$ is continuously differentiable, while given any $x_1 \in A_1$ the section $f^{x_1} : A_2 \to \mathbb{R}$ is continuously differentiable.
⁴ That is, given any $x_2 \in C_2$ the section $f^{x_2} : C_1 \to \mathbb{R}$ is concave, while given any $x_1 \in C_1$ the section $f^{x_1} : C_2 \to \mathbb{R}$ is convex.

When $\hat{x}_1$ is an interior point, condition (34.7) takes the simpler Fermat form

$$\nabla_{x_1} f(\hat{x}_1, \hat{x}_2) = 0$$

and the same is true for condition (34.8) if $\hat{x}_2$ is an interior point. Remarkably, conditions (34.7) and (34.8) become necessary and sufficient when $f$ is a saddle function on $C_1 \times C_2$, i.e., when $f$ is concave in $x_1 \in C_1$ and convex in $x_2 \in C_2$. Saddle functions therefore have for saddle points the remarkable status that concave and convex functions have in standard optimization problems for maximizers and minimizers, respectively.

Example 1385 Consider the saddle function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x_1, x_2) = x_2^2 - x_1^2$. Since

$$\frac{\partial f(x_1, x_2)}{\partial x_1} = \frac{\partial f(x_1, x_2)}{\partial x_2} = 0 \iff x_1 = x_2 = 0$$

from the last theorem it follows that the origin $(0, 0)$ is the only saddle point of $f$ on $\mathbb{R}^2$ (cf. Example 987). Graphically:

[Figure: the saddle-shaped graph of $f$, with saddle point at the origin.]

The previous result establishes, inter alia, the existence of saddle points under differentiability and concavity assumptions on the function $f$. Next we give a fundamental existence result, the Minimax Theorem, that relaxes these requirements on $f$; in particular, it drops any differentiability assumption. It requires, however, the sets $C_1$ and $C_2$ to be compact (as usual, there are no free meals).

Theorem 1386 (Minimax) Let $f : A_1 \times A_2 \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ be a real-valued function and $C_1$ and $C_2$ subsets of $A_1$ and $A_2$, respectively. Suppose that:

(i) $C_1$ and $C_2$ are convex and compact subsets of $A_1$ and $A_2$, respectively;

(ii) $f(\cdot, x_2) : A_1 \to \mathbb{R}$ is continuous and quasi-concave on $C_1$;

(iii) $f(x_1, \cdot) : A_2 \to \mathbb{R}$ is continuous and quasi-convex on $C_2$.

Then, $f$ has a saddle point $(\hat{x}_1, \hat{x}_2)$ on $C_1 \times C_2$, with

$$\max_{x_1 \in C_1} \min_{x_2 \in C_2} f(x_1, x_2) = f(\hat{x}_1, \hat{x}_2) = \min_{x_2 \in C_2} \max_{x_1 \in C_1} f(x_1, x_2) \tag{34.9}$$

Proof The existence of the saddle point follows from Nash's Theorem, which will be proved below. Since the sets $C_1$ and $C_2$ are compact and the function $f$ is continuous in $x_1$ and in $x_2$, by Weierstrass' Theorem we can define the functions $\min_{x_2 \in C_2} f(\cdot, x_2) : C_1 \to \mathbb{R}$ and $\max_{x_1 \in C_1} f(x_1, \cdot) : C_2 \to \mathbb{R}$. So, (34.2) implies (34.9).

The Minimax Theorem was proved in 1928 by John von Neumann in his seminal paper on game theory. Interestingly, the choice sets $C_1$ and $C_2$ are required to be convex, so they have to be infinite (unless they are singletons, a trivial case).

A simple, yet useful, corollary of the Minimax Theorem is that continuous saddle functions on a compact convex set $C_1 \times C_2$ have a saddle point on $C_1 \times C_2$. If, in addition, they are differentiable, conditions (34.7) and (34.8) then characterize any such point.

34.2 Nash equilibria

Consider a group of $n$ agents.⁵ Each agent $i$ has a choice set $C_i$ and an objective function $f_i$. Because of the interdependence of agents' decisions, the domain of $f_i$ is the Cartesian product $C_1 \times \cdots \times C_n$, that is,

$$f_i : C_1 \times \cdots \times C_n \to \mathbb{R}$$

For instance, the objective function $f_1$ of agent 1 depends on the agent's own decision $x_1$, as well as on the decisions $x_2, \dots, x_n$ of the other agents. In the oligopoly example below, $x_1$ is the production decision of agent 1, while $x_2, \dots, x_n$ are the production decisions of the other agents.

Decisions are simultaneous, described by a vector $(x_1, \dots, x_n)$. The operator $f = (f_1, \dots, f_n) : C_1 \times \cdots \times C_n \to \mathbb{R}^n$, with

$$f(x_1, \dots, x_n) = (f_1(x_1, \dots, x_n), \dots, f_n(x_1, \dots, x_n)) \in \mathbb{R}^n$$

describes the value $f_i(x_1, \dots, x_n)$ that each agent attains at $(x_1, \dots, x_n)$.

Example 1387 Consider $n$ firms that produce the same output, say potatoes, that they sell in the same market. The market price of the output depends on the total output that all firms together offer. Assume that the output has a strictly decreasing inverse demand function $D^{-1} : [0, \infty) \to [0, \infty)$ in the market. So, $D^{-1}(q)$ is the market price of the output if $q = \sum_{i=1}^n q_i$ is the sum of the individual quantities $q_i \geq 0$ of the output produced by each firm $i = 1, \dots, n$. The profit function $\pi_i : \mathbb{R}^n_+ \to \mathbb{R}$ of firm $i$ is

$$\pi_i(q_1, \dots, q_n) = D^{-1}(q) q_i - c_i(q_i)$$

where $c_i : [0, \infty) \to \mathbb{R}$ is its cost function. Thus, the profit of each firm $i$ depends via $q$ on the production decisions of all firms, not just on its own decision $q_i$. We thus have an interdependent optimization problem, called Cournot oligopoly. Here the choice sets $C_i$ are the positive half-line $[0, \infty)$ and the operator $f$ is given by $\pi = (\pi_1, \dots, \pi_n) : \mathbb{R}^n_+ \to \mathbb{R}^n$. N

⁵ In game theory agents are often called players (or co-players or opponents).

To introduce the next equilibrium notion, to fix ideas we first consider the case $n = 2$ of two agents. Here $f : C_1 \times C_2 \to \mathbb{R}^2$ with $f(x_1, x_2) = (f_1(x_1, x_2), f_2(x_1, x_2))$. Suppose a decision profile $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ is such that

$$f_1(\hat{x}_1, \hat{x}_2) \geq f_1(x_1, \hat{x}_2) \qquad \forall x_1 \in C_1 \tag{34.10}$$
$$f_2(\hat{x}_1, \hat{x}_2) \geq f_2(\hat{x}_1, x_2) \qquad \forall x_2 \in C_2$$

In this case, each agent is doing his best given what the other agent does. Agent $i$ has no incentive to deviate from $\hat{x}_i$ – that is, to select a different decision – as long as he knows that the other agent (his "opponent"), denoted $-i$, is playing $\hat{x}_{-i}$.⁶ In this sense, the decisions $(\hat{x}_1, \hat{x}_2)$ are mutually compatible.

All this motivates the following classic definition proposed in 1950 by John Nash, which is the most important equilibrium notion in economics. Here for each agent $i$ we denote by $x_{-i} \in C_{-i} = \times_{j \neq i} C_j$ the decision profile of his opponents.

Definition 1388 Let $f = (f_1, \dots, f_n) : A = A_1 \times \cdots \times A_n \to \mathbb{R}^n$ be an operator and $C = C_1 \times \cdots \times C_n$ a subset of $A$. An element $\hat{x} = (\hat{x}_1, \dots, \hat{x}_n) \in C$ is a Nash equilibrium of $f$ on $C$ if, for each $i = 1, \dots, n$,

$$f_i(\hat{x}) \geq f_i(x_i, \hat{x}_{-i}) \qquad \forall x_i \in C_i \tag{34.11}$$

In the case $n = 2$, the equilibrium conditions become (34.10). The interpretation is similar: each agent $i$ has no incentive to deviate from $\hat{x}_i$ as long as he knows that his opponents are playing $\hat{x}_{-i}$. Note that the definition of Nash equilibrium does not require any structure on the choice sets $C_i$. The scope of this definition is, therefore, huge. Indeed, it has been widely applied in many disciplines, within and outside the social sciences.

N.B. Nash equilibrium is defined purely in terms of agents' individual decisions $x_i$, unlike the notion of Arrow-Debreu equilibrium (Section 18.8) that involves a variable, the price vector, which is not under the control of agents. In this sense, the Arrow-Debreu equilibrium is a spurious equilibrium notion from a methodological individualism standpoint, though most useful in understanding markets' behavior.⁷ O

Nash equilibrium is based on $n$ interdependent parametric optimization problems, one per agent,

$$\max_{x_i} f_i(x_i, x_{-i}) \quad \text{sub} \quad x_i \in C_i$$

where the opponents' decisions $x_{-i}$ play the role of the parameter. The solution correspondence $\beta_i : C_{-i} \rightrightarrows C_i$ defined by $\beta_i(x_{-i}) = \arg\max_{x_i} f_i(x_i, x_{-i})$ is called the best reply correspondence. We can reformulate the equilibrium condition (34.11) as

$$\hat{x}_i \in \beta_i(\hat{x}_{-i}) \qquad \forall i = 1, \dots, n \tag{34.12}$$

⁶ How such mutual understanding among agents emerges is a non-trivial conceptual issue from which we abstract away, leaving it to game theory courses.
⁷ Methodological principles are important, but a pragmatic attitude should be kept so as not to transform them into dogmas.

In words, in equilibrium all agents are best replying in that each $\hat{x}_i$ solves the optimization problem

$$\max_{x_i} f_i(x_i, \hat{x}_{-i}) \quad \text{sub} \quad x_i \in C_i \tag{34.13}$$

In turn, this easily leads to a differential characterization of Nash equilibria via Stampacchia's Theorem. To ease matters, we assume that each $A_i$ is a subset of the same space $\mathbb{R}^m$, so that both $A$ and $C$ are subsets of $(\mathbb{R}^m)^n$.

Theorem 1389 Let $f = (f_1, \dots, f_n) : A = A_1 \times \cdots \times A_n \subseteq (\mathbb{R}^m)^n \to \mathbb{R}^n$ be an operator and $C = C_1 \times \cdots \times C_n$ a subset of $A$. Suppose that, for each $i = 1, \dots, n$,

(i) $C_i$ is a closed and convex subset of the open and convex set $A_i$;

(ii) $f_i$ is continuously differentiable in $x_i$.

If $\hat{x} = (\hat{x}_1, \dots, \hat{x}_n) \in C$ is a Nash equilibrium of $f$ on $C$, then, for each $i = 1, \dots, n$,

$$\nabla_{x_i} f_i(\hat{x}) \cdot (x_i - \hat{x}_i) \leq 0 \qquad \forall x_i \in C_i \tag{34.14}$$

The converse is true if each $f_i$ is concave in $x_i$.

Proof It is enough to note that $\hat{x}_i$ is a maximizer of the function $f_i(\cdot, \hat{x}_{-i}) : A_i \subseteq \mathbb{R}^m \to \mathbb{R}$ on $C_i$. By Stampacchia's Theorem, the result holds.

When $m = 1$, so that each $A_i$ is a subset of the real line, the condition takes the simpler form:

$$\frac{\partial f_i(\hat{x})}{\partial x_i} (x_i - \hat{x}_i) \leq 0 \qquad \forall x_i \in C_i$$

Moreover, when $\hat{x}_i$ is an interior point of $C_i$, the condition takes the Fermat form

$$\nabla_{x_i} f_i(\hat{x}) = 0 \tag{34.15}$$

Example 1390 In the Cournot oligopoly, assume that both the demand and cost functions are linear, with $D^{-1}(q) = a - bq$ and $c_i(q_i) = cq_i$, where $a > c$ and $b > 0$. Then, the profit function of firm $i$ is $\pi_i(q_1, \dots, q_n) = (a - bq) q_i - cq_i$, which is strictly concave in $q_i$. The choice set of firm $i$ is the set $C_i = [0, +\infty)$. By the last theorem, the first order condition (34.14) is necessary and sufficient for a Nash equilibrium $(\hat{q}_1, \dots, \hat{q}_n)$. This condition is, for every $i$,

$$\frac{\partial \pi_i(\hat{q}_1, \dots, \hat{q}_n)}{\partial q_i} (q_i - \hat{q}_i) = (a - b\hat{q} - b\hat{q}_i - c)(q_i - \hat{q}_i) \leq 0 \qquad \forall q_i \geq 0$$

So, for every $i$ we have $a - b\hat{q} - b\hat{q}_i = c$ if $\hat{q}_i > 0$, and $a - b\hat{q} - b\hat{q}_i \leq c$ if $\hat{q}_i = 0$.

We have $\hat{q}_i > 0$ for every $i$. Indeed, assume by contradiction that $\hat{q}_i = 0$ for some $i$. The first order condition then implies $a - b\hat{q} \leq c$; but if some firm $j$ had $\hat{q}_j > 0$, its first order condition would give $a - b\hat{q} = c + b\hat{q}_j > c$, a contradiction, so $\hat{q} = 0$, which in turn implies $a \leq c$, thus contradicting $a > c$. We conclude that $\hat{q}_i > 0$ for every $i$. Then, the first order condition implies

$$\hat{q}_i = \frac{a - c - b\hat{q}}{b} \qquad \forall i = 1, \dots, n$$

By adding up, one gets

$$\hat{q} = \frac{n}{1 + n} \frac{a - c}{b}$$

So, the unique Nash equilibrium is

$$\hat{q}_i = \frac{1}{1 + n} \frac{a - c}{b} \qquad \forall i = 1, \dots, n$$

As $n$ increases, the (per firm) equilibrium quantity decreases. N
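A quick numeric check of this closed form (our own sketch: the parameter values and the naive best-reply iteration are illustrative choices, not the text's method). Firm $i$'s best reply to the opponents' total $q_{-i}$ solves $a - b q_{-i} - 2b q_i - c = 0$, i.e. $q_i = \max\{0, (a - c - b q_{-i})/2b\}$, and for $n = 2$ iterating best replies converges to the equilibrium:

```python
a, b, c, n = 10.0, 1.0, 2.0, 2

def best_reply(q_others_sum):
    # Solves a - b*q_others - 2b*q_i - c = 0, truncated at 0.
    return max(0.0, (a - c - b * q_others_sum) / (2 * b))

q = [0.0] * n
for _ in range(100):                       # simultaneous best-reply iteration
    total = sum(q)
    q = [best_reply(total - qi) for qi in q]

print(q)                                   # each approx 2.6667
print((a - c) / ((1 + n) * b))             # closed form (a-c)/((1+n)b) = 2.6667
```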

The best reply formulation (34.12) permits us to establish the existence of Nash equilibria via a fixed point argument based on Kakutani's Theorem.

Theorem 1391 (Nash) Let $f = (f_1, \dots, f_n) : A = A_1 \times \cdots \times A_n \to \mathbb{R}^n$ be an operator and $C = C_1 \times \cdots \times C_n$ a subset of $A$. Suppose that, for each $i = 1, \dots, n$, we have

(i) $C_i$ is a convex and compact subset of $A_i$;

(ii) $f_i$ is continuous and quasi-concave in $x_i \in C_i$.

Then, $f$ has a Nash equilibrium on $C$.

Proof Given any $x_{-i}$, the function $f_i(\cdot, x_{-i}) : A_i \to \mathbb{R}$ is by hypothesis continuous on the compact set $C_i$. By the Maximum Theorem, the best reply correspondence $\beta_i : C_{-i} \rightrightarrows C_i$ is compact-valued and upper hemicontinuous because $f_i(\cdot, x_{-i}) : A_i \to \mathbb{R}$ is continuous on the compact set $C_i$. Moreover, it is convex-valued because $f_i(\cdot, x_{-i}) : A_i \to \mathbb{R}$ is, again by hypothesis, quasi-concave on the convex set $C_i$ (Proposition 1358). Consider the product correspondence $\varphi : C \rightrightarrows C$ defined by $\varphi(x_1, \dots, x_n) = \beta_1(x_{-1}) \times \cdots \times \beta_n(x_{-n})$. The correspondence $\varphi$ is easily seen to be upper hemicontinuous and convex-valued (as readers can check) on the compact and convex set $C$. By Kakutani's Theorem, there exists a fixed point $(\hat{x}_1, \dots, \hat{x}_n) \in C$ such that

$$(\hat{x}_1, \dots, \hat{x}_n) \in \varphi(\hat{x}_1, \dots, \hat{x}_n) = \beta_1(\hat{x}_{-1}) \times \cdots \times \beta_n(\hat{x}_{-n})$$

So, $\hat{x}_i \in \beta_i(\hat{x}_{-i})$ for each $i = 1, \dots, n$, as desired.

34.3 Nash equilibria and saddle points

Consider the two-agent case. The operator $f = (f_1, f_2)$ is strictly competitive if there is a strictly decreasing function $\varphi$ such that $f_2 = \varphi \circ f_1$.

Example 1392 When $\varphi(x) = -x$, we have $f_2 = -f_1$. This strictly competitive operator $f$ is called zero-sum. It is the polar case that may arise, for example, in military interactions. This is the case originally studied by von Neumann and Morgenstern in their celebrated (wartime) 1944 opus. N

We have (cf. Proposition 209):

$$(\varphi \circ f_1)(\hat{x}_1, \hat{x}_2) \geq (\varphi \circ f_1)(\hat{x}_1, x_2) \iff f_1(\hat{x}_1, \hat{x}_2) \leq f_1(\hat{x}_1, x_2)$$

So, when $f$ is strictly competitive the equilibrium conditions (34.10) reduce to

$$f_1(\hat{x}_1, \hat{x}_2) \geq f_1(x_1, \hat{x}_2) \qquad \forall x_1 \in C_1$$
$$f_1(\hat{x}_1, \hat{x}_2) \leq f_1(\hat{x}_1, x_2) \qquad \forall x_2 \in C_2$$

that is,

$$f_1(\hat{x}_1, x_2) \geq f_1(\hat{x}_1, \hat{x}_2) \geq f_1(x_1, \hat{x}_2)$$

In this case, a pair $(\hat{x}_1, \hat{x}_2)$ is a Nash equilibrium if and only if it is a saddle point of $f_1$ on $C_1 \times C_2$. We have thus proved the following mathematically simple, yet conceptually important, result.

Theorem 1393 Let $f = (f_1, f_2) : A_1 \times A_2 \to \mathbb{R}^2$ be a strictly competitive operator and $C_1$ and $C_2$ subsets of $A_1$ and $A_2$, respectively. Then, a pair $(\hat{x}_1, \hat{x}_2) \in C_1 \times C_2$ is a Nash equilibrium of $f$ if and only if it is a saddle point of $f_1$.

Saddle points are thus Nash equilibria of strictly competitive operators. In particular, the Minimax Theorem is the special case of Nash's Theorem for strictly competitive operators. This further clarifies the nature of saddle points as a way to model individual optimization problems that are "negatively" interdependent, so agents expect the worst from their opponent and best reply by maximinimizing.

34.4 Nash equilibria on a simplex

As in the Minimax Theorem, in Nash's Theorem the choice sets $C_i$ are required to be convex, so they have to be infinite (unless they are singletons). This raises the question of how to "convexify" the finite choice sets that economic applications often feature. Mixing through randomization is, typically, the way to answer this important question. In Section 34.5.1 we will elaborate. In any case, formally this means that the choice set $C_i$ is the simplex $\Delta_{m-1} = \{ x = (x_1, \dots, x_m) \in \mathbb{R}^m_+ : \sum_{i=1}^m x_i = 1 \}$ of $\mathbb{R}^m$. In this case, the following differential characterization holds.

Proposition 1394 Let $f = (f_1, \dots, f_n) : \Delta_{m-1} \times \cdots \times \Delta_{m-1} \to \mathbb{R}^n$. If $(\hat{x}_1, \dots, \hat{x}_n) \in \Delta_{m-1} \times \cdots \times \Delta_{m-1}$ is a Nash equilibrium of $f$, then there exists $\hat{\lambda} \in \mathbb{R}^n$ such that for each $i = 1, \dots, n$ we have

$$\frac{\partial f_i}{\partial x_{ik}}(\hat{x}) = \hat{\lambda}_i \ \text{ if } \hat{x}_{ik} > 0 \qquad ; \qquad \frac{\partial f_i}{\partial x_{ik}}(\hat{x}) \leq \hat{\lambda}_i \ \text{ if } \hat{x}_{ik} = 0$$

for all $k = 1, \dots, m$. The converse holds if each $f_i$ is concave in $x_i$.

Proof Here condition (34.14) takes the normal cone form⁸

$$\nabla_{x_i} f_i(\hat{x}) \in N_{\Delta_{m-1}}(\hat{x}_i) \qquad \forall i = 1, \dots, n$$

So, the result follows from Proposition 1323 and from Stampacchia's Theorem.

⁸ Recall Section 31.2.2.

The objective function $f_i$ of agent $i$ is often assumed to be affine in $x_i$ because of the expected utility hypothesis (Section 34.5.1). Interestingly, next we show that in this important case, by Bauer's Theorem, equilibrium decisions are convex combinations of extreme points of the simplex.

Proposition 1395 Let $f = (f_1, \dots, f_n) : \Delta_{m-1} \times \cdots \times \Delta_{m-1} \to \mathbb{R}^n$, with each $f_i$ affine in $x_i$. Then, $(\hat{x}_1, \dots, \hat{x}_n) \in \Delta_{m-1} \times \cdots \times \Delta_{m-1}$ is a Nash equilibrium of $f$ if and only if, for each $i = 1, \dots, n$,

$$\max_{x_i \in \Delta_{m-1}} f_i(x_i, \hat{x}_{-i}) = \max_{x_i \in \{e^1, \dots, e^m\}} f_i(x_i, \hat{x}_{-i}) \tag{34.16}$$

and

$$\emptyset \neq \arg\max_{x_i \in \Delta_{m-1}} f_i(x_i, \hat{x}_{-i}) = \operatorname{co} \left( \arg\max_{x_i \in \{e^1, \dots, e^m\}} f_i(x_i, \hat{x}_{-i}) \right) \tag{34.17}$$

Proof By Bauer's Theorem – via Corollary 836 – we have

$$\arg\max_{x_i \in \Delta_{m-1}} f_i(x_i, \hat{x}_{-i}) = \operatorname{co} \left( \arg\max_{x_i \in \operatorname{ext} \Delta_{m-1}} f_i(x_i, \hat{x}_{-i}) \right) = \operatorname{co} \left( \arg\max_{x_i \in \{e^1, \dots, e^m\}} f_i(x_i, \hat{x}_{-i}) \right)$$

because $\operatorname{ext} \Delta_{m-1} = \{e^1, \dots, e^m\}$.

By (34.17), the set of Nash equilibria is a non-empty set that consists of the $n$-tuples $(\hat{x}_1, \dots, \hat{x}_n) \in \Delta_{m-1} \times \cdots \times \Delta_{m-1}$ such that

$$\hat{x}_i \in \operatorname{co} \left( \arg\max_{x_i \in \{e^1, \dots, e^m\}} f_i(x_i, \hat{x}_{-i}) \right)$$

for each $i = 1, \dots, n$. Thus, $\hat{x}_i$ is either a versor that best replies to the opponents' decisions $\hat{x}_{-i}$ or a convex combination of such versors. In particular, we have

$$\hat{x}_{ik} > 0 \implies e^k \in \arg\max_{x_i \in \Delta_{m-1}} f_i(x_i, \hat{x}_{-i}) \qquad \forall k = 1, \dots, m \tag{34.18}$$

Thus, in equilibrium strictly positive weights $\hat{x}_{ik}$ correspond to best replying versors $e^k$. Moreover, by (34.16) in terms of value attainment agent $i$ can solve the optimum problem

$$\max_{x_i} f_i(x_i, \hat{x}_{-i}) \quad \text{sub} \quad x_i \in \{e^1, \dots, e^m\}$$

that only involves the versors. In the next section we will discuss the significance of all this for games and decisions under randomization.

34.5 Applications

34.5.1 Randomization in games and decisions

Suppose that an agent has a set $S = \{s_1, s_2, \dots, s_m\}$ of $m$ pure actions (or strategies), evaluated with a utility function $u : S \to \mathbb{R}$. Since the set $S$ is finite, it is not convex (unless it is a singleton), so we cannot use the powerful results – such as Nash's Theorem – that throughout the book we saw to hold for concave (or convex) functions defined on convex sets. A standard way to embed $S$ in a convex set is via randomization, as readers will learn in game theory courses. Here we just outline the argument to illustrate the results of the chapter.

Specifically, by randomizing via some random device – coin tossing, roulette wheels, and the like – agents can select a mixed (or randomized) action $\sigma$ in which $\sigma(s_k)$ is the probability that the random device assigns to the pure action $s_k$. Denote by $\Delta(S)$ the set of all randomized actions. According to the expected utility criterion, an agent evaluates the randomized action $\sigma$ via the function $U : \Delta(S) \to \mathbb{R}$ defined by

$$U(\sigma) = \sum_{k=1}^m u(s_k) \sigma(s_k)$$

In words, the randomized action $\sigma$ is evaluated by taking the average of the utilities of the pure actions weighted by their probabilities under $\sigma$.⁹ Note that each pure action $s_k$ corresponds to the "degenerate" randomized action $\sigma$ that assigns it probability 1, i.e., $\sigma(s_k) = 1$. Via this identification, we can regard $S$ as a subset of $\Delta(S)$ and thus write, with an abuse of notation, $S \subseteq \Delta(S)$.

Under randomization, agents aim to select the best randomized action by solving the optimization problem

$$\max_\sigma U(\sigma) \quad \text{sub} \quad \sigma \in \Delta(S) \tag{34.19}$$

where $\Delta(S)$ is the choice set and $U$ is the objective function.

We can extract the mathematical essence of this optimization problem by identifying a randomized action $\sigma$ with an element $x$ of the simplex $\Delta_{m-1}$ via the relation

$$\sigma(s_k) \leftrightarrow x_k$$

In particular, a degenerate $\sigma$, with $\sigma(s_k) = 1$, is identified with the versor $e^k$. That is, pure actions can be identified with the versors of the simplex, i.e., with its extreme points. For instance, if $\sigma$ is such that $\sigma(s_2) = 1$, then it corresponds to the versor $e^2$.

Summing up, we have the following identifications and inclusions:

$$S \leftrightarrow \operatorname{ext} \Delta_{m-1}$$
$$\Delta(S) \leftrightarrow \Delta_{m-1}$$

In this sense, we have "convexified" $S$ by identifying it with a subset of the simplex, which is a convex set in $\mathbb{R}^m$.

Example 1396 Let $S = \{s_1, s_2, s_3\}$. Then

$$\sigma(s_1) \leftrightarrow x_1, \quad \sigma(s_2) \leftrightarrow x_2, \quad \sigma(s_3) \leftrightarrow x_3$$

Here we have:

$$S = \{s_1, s_2, s_3\} \leftrightarrow \operatorname{ext} \Delta_2 = \{e^1, e^2, e^3\}$$
$$\Delta(S) \leftrightarrow \Delta_2 = \{ (x_1, x_2, x_3) \in \mathbb{R}^3_+ : x_1 + x_2 + x_3 = 1 \}$$

For instance, if $\sigma \in \Delta(S)$ is such that $\sigma(s_1) = \sigma(s_2) = 1/4$ and $\sigma(s_3) = 1/2$, then it corresponds to $x = (1/4, 1/4, 1/2)$. N

⁹ Weighted averages are discussed in Section 13.1.4.

By setting $u_k = u(s_k)$ for each $k$, the expected utility function $U$ can be identified with the affine function $V : \Delta_{m-1} \to \mathbb{R}$ defined by

$$V(x) = \sum_{k=1}^m u_k x_k = u \cdot x$$

where $u = (u_1, u_2, \dots, u_m) \in \mathbb{R}^m$. The optimization problem (34.19) of the agent becomes

$$\max_x V(x) \quad \text{sub} \quad x \in \Delta_{m-1} \tag{34.20}$$

It is a very nice concave optimization problem in which the objective function $V$ is affine and the choice set $\Delta_{m-1}$ is a convex and compact set of $\mathbb{R}^m$. In particular, by Proposition 1395 we have

$$\max_{x \in \Delta_{m-1}} V(x) = \max_{x \in \{e^1, \dots, e^m\}} V(x) \tag{34.21}$$

and

$$\emptyset \neq \arg\max_{x \in \Delta_{m-1}} V(x) = \operatorname{co} \left( \arg\max_{x \in \{e^1, \dots, e^m\}} V(x) \right) \tag{34.22}$$

By (34.22), agents' optimal mixed actions are convex combinations of pure actions that, in turn, are optimal. So, the optimal $\hat{x}$ is such that

$$\hat{x}_k > 0 \implies e^k \in \arg\max_{x \in \Delta_{m-1}} V(x) \qquad \forall k = 1, \dots, m$$

That is, the pure actions that are assigned a strictly positive weight by an optimal mixed action are, in turn, optimal. By (34.21), in terms of value attainment problem (34.20) is equivalent to the much simpler problem

$$\max_x V(x) \quad \text{sub} \quad x \in \{e^1, \dots, e^m\}$$

that only involves pure actions.
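A numeric sketch of the reduction (34.21) (ours: the utility vector and the use of scipy are illustrative assumptions): maximizing the affine $V(x) = u \cdot x$ over the simplex with a linear programming solver returns the same value as simply scanning the pure actions.

```python
import numpy as np
from scipy.optimize import linprog

u = np.array([1.0, 3.0, 2.0])          # utilities of the pure actions

# max u.x on the simplex = min (-u).x  s.t.  sum(x) = 1, x >= 0
res = linprog(-u, A_eq=np.ones((1, 3)), b_eq=[1.0], bounds=[(0, None)] * 3)

print(-res.fun)        # optimal value over the simplex: 3.0
print(u.max())         # max over the versors e^1, e^2, e^3: also 3.0
```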


Similar identi…cations can be done in a game with n agents. To keep notation simple,
we consider two agents that have a set Si = fsi1 ; :::; sim g of m pure actions, evaluated
with a utility function ui : S1 S2 ! R. By randomizing, they can consider mixed actions
i 2 (Si ). Because of interdependence, agent i evaluates a pro…le f 1 ; 2 g of mixed actions,
one per agent, via an expected utility function Ui : (S1 ) (S2 ) ! R de…ned by
m
X
Ui ( 1; 2) = (s1k ) (s2k0 ) ui (s1k ; s2k0 )
k;k0 =1
990 CHAPTER 34. INTERDEPENDENT OPTIMIZATION

Under randomization, agents choose a mixed actions. In particular, a pair (^ 1 ; ^ 2 ) 2 (S1 )


(S2 ) is a Nash equilibrium if

Ui (^ i ; ^ i ) Ui ( i ; ^ i ) 8 i 2 (Si )

for each i = 1; 2.
The mixed actions (Si ) can be identi…ed with the simplex m 1 , with its extreme points
ei representing the pure actions si . De…ne ui : f1; :::; mg f1; :::; mg ! R by ui (k 0 ; k 00 ) =
ui (s1k0 ; s2k00 ). We can then identify Ui with the function Vi : m 1 m 1 ! R de…ned by
X
Vi (x1 ; x2 ) = x1k0 x2k00 ui k 0 ; k 00 = x1 Ui x2
(k0 ;k00 )2f1;:::;mg f1;:::;mg

where Ui is the square matrix of order m that has the values ui (k 0 ; k 00 ) as entries.
The function Vi is a¢ ne in xi . A pair (^
x1 ; x
^2 ) 2 m 1 m 1 is a Nash equilibrium if

Vi (^
xi ; x
^ i) Vi (xi ; x
^ i) 8xi 2 m 1

for each i = 1; 2. By Proposition 1395,

max Vi (xi ; x
^ i) = max Vi (xi ; x
^ i) (34.23)
xi 2 m 1 xi 2fe1 ;:::;em g

and
;=
6 arg max Vi (xi ; x
^ i ) = co arg max Vi (xi ; x
^ i) (34.24)
xi 2 m 1 xi 2fe1 ;:::;em g

By (34.24), equilibrium mixed actions are convex combinations of pure actions that, in turn,
best reply to the opponent’s mixed action. So, the equilibrium x
^i is such that (34.18) holds,
i.e.,
x^ik > 0 =) ek 2 arg max Vi (xi ; x ^ i)
xi 2 m 1

for each i = 1; 2. That is, the pure actions ek


that are assigned a strictly positive weight x
^ik
by an equilibrium mixed action x ^i of an agent are, in turn, best replies to the opponent’s
equilibrium mixed action x ^ i . Moreover, by (34.23) in terms of value attainment agent i can
solve the optimum problem

max Vi (xi ; x
^ i) sub xi 2 e1 ; :::; em
xi

that only involves pure actions.
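To illustrate the versor characterization (our own sketch: the game is the classic matching pennies example, not taken from the text), in the zero-sum game below the mixed profile $\hat{x}_1 = \hat{x}_2 = (1/2, 1/2)$ makes every pure action of each agent a best reply, which is exactly condition (34.18):

```python
import numpy as np

U1 = np.array([[1, -1], [-1, 1]])   # matching pennies payoffs of agent 1
U2 = -U1                            # zero-sum: U2 = -U1

x1 = np.array([0.5, 0.5])           # candidate equilibrium mixed actions
x2 = np.array([0.5, 0.5])

# Payoff of each versor e^k against the opponent's mixed action:
print(U1 @ x2)     # [0. 0.] -> both pure actions of agent 1 are best replies
print(U2.T @ x1)   # [0. 0.] -> both pure actions of agent 2 are best replies
```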

34.5.2 Kuhn-Tucker's saddles

Saddle points provide an interesting angle on Lagrange multipliers. For simplicity, consider an optimization problem with inequality constraints

$$\max_x f(x) \tag{34.25}$$
$$\text{sub} \quad g_1(x) \leq b_1, \, g_2(x) \leq b_2, \dots, g_m(x) \leq b_m$$

where $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is the objective function, while the functions $g_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and the scalars $b_i \in \mathbb{R}$ induce $m$ inequality constraints.¹⁰

For this problem the Lagrangian function $L : A \times \mathbb{R}^m_+ \to \mathbb{R}$ is defined by

$$L(x, \lambda) = f(x) + \lambda \cdot (b - g(x)) \qquad \forall (x, \lambda) \in A \times \mathbb{R}^m_+$$

A pair $(\hat{x}, \hat{\lambda}) \in A \times \mathbb{R}^m_+$ is a saddle point of $L$ on $A \times \mathbb{R}^m_+$ if

$$L(\hat{x}, \lambda) \geq L(\hat{x}, \hat{\lambda}) \geq L(x, \hat{\lambda}) \qquad \forall x \in A, \, \forall \lambda \geq 0$$

Lemma 1397 A pair $(\hat{x}, \hat{\lambda}) \in A \times \mathbb{R}^m_+$ is a saddle point of the Lagrangian function $L : A \times \mathbb{R}^m_+ \to \mathbb{R}$ if and only if

(i) $f(\hat{x}) \geq f(x) + \hat{\lambda} \cdot (b - g(x))$ for every $x \in A$;

(ii) $g(\hat{x}) \leq b$ and $\hat{\lambda}_i (b_i - g_i(\hat{x})) = 0$ for all $i = 1, \dots, m$.

Proof "Only if". Let $(\hat{x}, \hat{\lambda}) \in A \times \mathbb{R}^m_+$ be a saddle point of the Lagrangian function $L : A \times \mathbb{R}^m_+ \to \mathbb{R}$. Since $L(\hat{x}, \lambda) \geq L(\hat{x}, \hat{\lambda})$ for all $\lambda \geq 0$, it follows that

$$(\lambda - \hat{\lambda}) \cdot (b - g(\hat{x})) \geq 0 \qquad \forall \lambda \geq 0 \tag{34.26}$$

Putting $\lambda = \hat{\lambda} + e^i$, (34.26) implies $b_i - g_i(\hat{x}) \geq 0$. Since this holds for every $i = 1, \dots, m$, we have $g(\hat{x}) \leq b$. Moreover, by taking $\lambda = 0$ from (34.26) it follows that $\hat{\lambda} \cdot (b - g(\hat{x})) \leq 0$, while by taking $\lambda = 2\hat{\lambda}$ from (34.26) it follows that $\hat{\lambda} \cdot (b - g(\hat{x})) \geq 0$. So, $\hat{\lambda} \cdot (b - g(\hat{x})) = 0$. Then, $L(\hat{x}, \hat{\lambda}) = f(\hat{x})$ and

$$f(\hat{x}) = L(\hat{x}, \hat{\lambda}) \geq L(x, \hat{\lambda}) = f(x) + \hat{\lambda} \cdot (b - g(x)) \qquad \forall x \in A \tag{34.27}$$

Since the positivity of $\hat{\lambda}$ implies that, provided $g(\hat{x}) \leq b$, the condition $\hat{\lambda} \cdot (b - g(\hat{x})) = 0$ is equivalent to $\hat{\lambda}_i (b_i - g_i(\hat{x})) = 0$ for all $i = 1, \dots, m$, we conclude that (i) and (ii) hold.

"If". Assume that conditions (i) and (ii) hold. By taking $x = \hat{x}$, from (i) it follows that $f(\hat{x}) \geq f(\hat{x}) + \hat{\lambda} \cdot (b - g(\hat{x}))$. By (ii), $b - g(\hat{x}) \geq 0$, so $f(\hat{x}) + \hat{\lambda} \cdot (b - g(\hat{x})) \geq f(\hat{x})$ since $\hat{\lambda} \geq 0$. We conclude that $f(\hat{x}) + \hat{\lambda} \cdot (b - g(\hat{x})) = f(\hat{x})$, so that

$$\hat{\lambda} \cdot (b - g(\hat{x})) = 0 \tag{34.28}$$

Thus, for every $\lambda \geq 0$ we have:

$$L(\hat{x}, \lambda) - L(\hat{x}, \hat{\lambda}) = (\lambda - \hat{\lambda}) \cdot (b - g(\hat{x})) = \lambda \cdot (b - g(\hat{x})) \geq 0$$

which implies $L(\hat{x}, \lambda) \geq L(\hat{x}, \hat{\lambda})$ for all $\lambda \geq 0$. On the other hand, (i) and (34.28) imply

$$L(\hat{x}, \hat{\lambda}) = f(\hat{x}) \geq f(x) + \hat{\lambda} \cdot (b - g(x)) = L(x, \hat{\lambda}) \qquad \forall x \in A$$

so that $L(\hat{x}, \hat{\lambda}) \geq L(x, \hat{\lambda})$ for all $x \in A$. We conclude that $(\hat{x}, \hat{\lambda})$ is a saddle point of $L$ on $A \times \mathbb{R}^m_+$.

The next result is a first dividend of this lemma.

¹⁰ Later we will invoke Slater's condition: till then, this setup actually also includes equality constraints (cf. the discussion at the end of Section 30.1). For this reason we use the letters $g$ and $\lambda$ (rather than $h$ and $\mu$).

Proposition 1398 A vector $\hat{x} \in A$ solves problem (34.25) if there exists $\hat{\lambda} \geq 0$ such that $(\hat{x}, \hat{\lambda})$ is a saddle point of the Lagrangian function $L$ on $A \times \mathbb{R}^m_+$.

So, the existence of a saddle point for the Lagrangian function implies the existence of a solution for the underlying optimization problem with inequality constraints. No assumptions are made on the functions $f$ and $g_i$. If we make some standard assumptions on them, the converse becomes true, thus establishing the following remarkable "saddle" version of Kuhn-Tucker's Theorem.

Theorem 1399 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and $g_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable on an open and convex set $A$, with $f$ concave and each $g_i$ convex. Assume Slater's condition, i.e., there exists $\bar{x} \in A$ such that $g_i(\bar{x}) < b_i$ for all $i = 1, \dots, m$. Then, the following conditions are equivalent:

(i) $\hat{x} \in A$ solves problem (34.25);

(ii) there exists a vector $\hat{\lambda} \geq 0$ such that $(\hat{x}, \hat{\lambda})$ is a saddle point of the Lagrangian function $L$ on $A \times \mathbb{R}^m_+$;

(iii) there exists a vector $\hat{\lambda} \geq 0$ such that the Kuhn-Tucker conditions hold:

$$\nabla_x L(\hat{x}, \hat{\lambda}) = 0 \tag{34.29}$$
$$\hat{\lambda}_i \nabla_{\lambda_i} L(\hat{x}, \hat{\lambda}) = 0 \qquad \forall i = 1, \dots, m \tag{34.30}$$
$$\nabla_\lambda L(\hat{x}, \hat{\lambda}) \geq 0 \tag{34.31}$$

Proof (ii) implies (i) by the last proposition. (i) implies (iii) by what we learned in Section 31.3. (iii) implies (ii) by Proposition 1384. Indeed, the Kuhn-Tucker conditions are nothing but conditions (34.7) and (34.8) for the Lagrangian function (cf. Example 1322). First, note that condition (34.7) takes the form $\nabla_x L(\hat{x}, \hat{\lambda}) = 0$ because the set $A$ is open. As to condition (34.8), here it becomes

$$\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot (\lambda - \hat{\lambda}) \geq 0 \qquad \forall \lambda \geq 0 \tag{34.32}$$

This condition is equivalent to (34.30) and (34.31). From (34.30) it follows that $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} = 0$, while from (34.31) it follows that $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \lambda \geq 0$ for all $\lambda \geq 0$. So, (34.32) holds. Conversely, by taking $\lambda = 0$ in (34.32) we have $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} \leq 0$, and by taking $\lambda = 2\hat{\lambda}$ we have $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} \geq 0$, so $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} = 0$. Finally, by taking $\lambda = \hat{\lambda} + e^i$ in (34.32), we easily get $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \geq 0$. Since $\nabla_\lambda L(\hat{x}, \hat{\lambda}) = b - g(\hat{x})$, from $b \geq g(\hat{x})$ and the positivity of $\hat{\lambda}$ it follows that $\nabla_\lambda L(\hat{x}, \hat{\lambda}) \cdot \hat{\lambda} = 0$ is equivalent to $\hat{\lambda}_i \nabla_{\lambda_i} L(\hat{x}, \hat{\lambda}) = 0$ for all $i = 1, \dots, m$. In sum, the Kuhn-Tucker conditions are the form that conditions (34.7) and (34.8) take here. Since the Lagrangian function is easily seen to be a saddle function when $f$ is concave and each $g_i$ is convex, this proves that properties (ii) and (iii) are equivalent, thus completing the proof.

By Proposition 1381, $(\hat{x}, \hat{\lambda})$ is a saddle point of the Lagrangian function $L$ on $A \times \mathbb{R}^m_+$ if and only if there exists a vector $\hat{\lambda} \geq 0$ such that:

(i) $\hat{x}$ solves the primal problem

$$\max_x \inf_{\lambda \geq 0} L(x, \lambda) \quad \text{sub} \quad x \in A$$

(ii) $\hat{\lambda}$ solves the dual problem

$$\min_\lambda \sup_{x \in A} L(x, \lambda) \quad \text{sub} \quad \lambda \geq 0 \tag{34.33}$$

(iii) the two values are equal, i.e.,

$$\max_{x \in A} \inf_{\lambda \geq 0} L(x, \lambda) = L(\hat{x}, \hat{\lambda}) = \min_{\lambda \geq 0} \sup_{x \in A} L(x, \lambda)$$
The primal problem is actually equivalent to the original problem (34.25). Indeed, let us write problem (34.25) in canonical form as

$$\max_x f(x) \quad \text{sub} \quad x \in C$$

where the choice set is $C = \{ x \in A : g(x) \leq b \}$. Since

$$\inf_{\lambda \geq 0} L(x, \lambda) = f(x) + \inf_{\lambda \geq 0} \lambda \cdot (b - g(x))$$

we have

$$\inf_{\lambda \geq 0} L(x, \lambda) = \begin{cases} -\infty & \text{if } x \notin C \\ f(x) & \text{if } x \in C \end{cases}$$

because $\inf_{\lambda \geq 0} \lambda \cdot (b - g(x)) = -\infty$ if $x \notin C$ and $\inf_{\lambda \geq 0} \lambda \cdot (b - g(x)) = 0$ if $x \in C$. We conclude that

$$\max_{x \in A} \inf_{\lambda \geq 0} L(x, \lambda) = \max_{x \in C} f(x)$$

and

$$\arg\max_{x \in A} \inf_{\lambda \geq 0} L(x, \lambda) = \arg\max_{x \in C} f(x)$$

so the primal and the original problem are equivalent in terms of both solutions and value attainment. We thus have the following corollary of the last theorem, which relates the original and dual problems.

Corollary 1400 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and $g_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable on an open and convex set $A$, with $f$ concave and each $g_i$ convex. If $\hat{x} \in A$ solves problem (34.25) and Slater's condition holds, then there exists $\hat{\lambda} \geq 0$ that solves the dual problem (34.33), with $\max_{x \in C} f(x) = \min_{\lambda \geq 0} \sup_{x \in A} L(x, \lambda)$.

Summing up, in concave optimization problems with inequality constraints the solution $\hat{x}$ and the multiplier $\hat{\lambda}$ solve dual optimization problems that are mutually consistent. In particular, multipliers admit a dual optimization interpretation in which they can be viewed as (optimally) chosen by some fictitious, yet malevolent, opponent (say, nature). An individual optimization problem is thus solved by embedding it in a fictitious game against nature, a surprising paranoid twist on multipliers.

Under such a game-theoretic interpretation, the Kuhn-Tucker conditions characterize a saddle point of the Lagrangian function in that they are the form that conditions (34.7) and (34.8) take for the Lagrangian function. We can write them explicitly as:

$$\frac{\partial L(\hat{x}, \hat{\lambda})}{\partial x_i} = \frac{\partial f(\hat{x})}{\partial x_i} - \sum_{j=1}^m \hat{\lambda}_j \frac{\partial g_j(\hat{x})}{\partial x_i} = 0 \qquad \forall i = 1, \dots, n$$
$$\frac{\partial L(\hat{x}, \hat{\lambda})}{\partial \lambda_i} \hat{\lambda}_i = 0 \qquad \forall i = 1, \dots, m$$
$$\frac{\partial L(\hat{x}, \hat{\lambda})}{\partial \lambda_i} = b_i - g_i(\hat{x}) \geq 0 \qquad \forall i = 1, \dots, m$$

This is our last angle on Kuhn-Tucker's Theorem, the deepest one.
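A minimal numeric sketch of this saddle point view (the one-dimensional problem below is our own illustration, not from the text). For $\max -(x-2)^2$ sub $x \leq 1$, the Kuhn-Tucker conditions give $\hat{x} = 1$ and $\hat{\lambda} = 2$, and the code checks the saddle inequalities $L(\hat{x}, \lambda) \geq L(\hat{x}, \hat{\lambda}) \geq L(x, \hat{\lambda})$ on a grid:

```python
def L(x, lam):
    # Lagrangian of: max -(x-2)^2  sub  x <= 1   (so g(x) = x, b = 1)
    return -(x - 2) ** 2 + lam * (1 - x)

x_hat, lam_hat = 1.0, 2.0   # Kuhn-Tucker point: -2(x-2) - lam = 0 at x = 1

xs = [i / 10 for i in range(-30, 31)]   # grid on [-3, 3]
lams = [i / 10 for i in range(0, 51)]   # grid on [0, 5]

print(all(L(x, lam_hat) <= L(x_hat, lam_hat) for x in xs))      # True: max in x
print(all(L(x_hat, lam) >= L(x_hat, lam_hat) for lam in lams))  # True: min in lam
```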

34.5.3 Linear programming: duality

An elegant application of the game-theoretic angle on Kuhn-Tucker's Theorem is a duality result for linear programming (Section 18.6). Given an $m \times n$ matrix $A = (a_{ij})$ and vectors $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$, consider the linear programming problem

$$\max_x c \cdot x \quad \text{sub} \quad x \in P = \{ x \in \mathbb{R}^n_+ : Ax \leq b \} \tag{34.34}$$

as well as the minimization problem

$$\min_\lambda \lambda \cdot b \quad \text{sub} \quad \lambda \in \Pi = \{ \lambda \in \mathbb{R}^m_+ : A^T \lambda \geq c \} \tag{34.35}$$

The last corollary implies the following classic duality result.

Theorem 1401 (Duality Theorem of Linear Programming) Suppose Slater's condition holds for both problems (34.34) and (34.35). Then, there exists $\hat{x} \geq 0$ that solves problem (34.34) if and only if there exists $\hat{\lambda} \geq 0$ that solves problem (34.35). In this case, their optimal values are equal:

$$\max_{x \in P} c \cdot x = \min_{\lambda \in \Pi} \lambda \cdot b$$

As the proof clarifies, the two problems (34.34) and (34.35) are each the dual of the other, either one providing the multipliers to the other. In particular, solutions exist if either of the two polyhedra $P$ and $\Pi$ is bounded (Corollary 836).

Proof The Lagrangian function $L : \mathbb{R}^n_+ \times \mathbb{R}^m_+ \to \mathbb{R}$ of problem (34.34) is

$$L(x, \lambda) = c \cdot x + \lambda \cdot (b - Ax)$$

Its dual problem is

$$\min_\lambda \sup_{x \geq 0} L(x, \lambda) \quad \text{sub} \quad \lambda \geq 0 \tag{34.36}$$

We have

$$\sup_{x \geq 0} L(x, \lambda) = \sup_{x \geq 0} \left( c \cdot x + \lambda \cdot (b - Ax) \right) = \lambda \cdot b + \sup_{x \geq 0} \left( c \cdot x - \lambda \cdot Ax \right)$$
$$= \lambda \cdot b + \sup_{x \geq 0} \sum_{j=1}^n \left( c_j - \sum_{i=1}^m a_{ij} \lambda_i \right) x_j = \lambda \cdot b + \sup_{x \geq 0} \left( c - A^T \lambda \right) \cdot x$$

Consider the polyhedron $\Pi = \{ \lambda \geq 0 : A^T \lambda \geq c \}$ in $\mathbb{R}^m$. Then

$$\sup_{x \in \mathbb{R}^n_+} L(x, \lambda) = \begin{cases} +\infty & \text{if } \lambda \notin \Pi \\ \lambda \cdot b & \text{if } \lambda \in \Pi \end{cases}$$

because $\sup_{x \in \mathbb{R}^n_+} (c - A^T \lambda) \cdot x = 0$ if $\lambda \in \Pi$ and $\sup_{x \in \mathbb{R}^n_+} (c - A^T \lambda) \cdot x = +\infty$ if $\lambda \notin \Pi$. We conclude that the dual problem (34.36) reduces to problem (34.35), which can be written in linear programming form as

$$\max_\lambda -\lambda \cdot b \quad \text{sub} \quad \lambda \in \Pi = \{ \lambda \geq 0 : A^T \lambda \geq c \} \tag{34.37}$$

In turn, the Lagrangian function $\tilde{L} : \mathbb{R}^m_+ \times \mathbb{R}^n_+ \to \mathbb{R}$ of this problem is

$$\tilde{L}(\lambda, x) = -\lambda \cdot b + x \cdot \left( -c + A^T \lambda \right) = -\lambda \cdot b + \sum_{j=1}^n \left( -c_j + \sum_{i=1}^m a_{ij} \lambda_i \right) x_j$$
$$= -\left( c \cdot x + \lambda \cdot (b - Ax) \right) = -L(x, \lambda)$$

So, $(\hat{x}, \hat{\lambda})$ is a saddle point of $L$ if and only if $(\hat{\lambda}, \hat{x})$ is a saddle point of $\tilde{L}$. We conclude that the linear programs (34.34) and (34.37) are each the dual of the other, each providing the multipliers to the other. By Corollary 1400 the result then follows.
Example 1402 Let

$$A = \begin{bmatrix} 1 & -2 & 2 & 1 \\ 0 & 2 & -1 & 2 \\ 0 & 1 & -1 & 3 \end{bmatrix}$$

and $b = (1, 3, 2)$ and $c = (-1, 2, 4, -2)$. Consider the linear programming problem

$$\max_{x_1, x_2, x_3, x_4} -x_1 + 2(x_2 - x_4) + 4x_3$$
$$\text{sub} \quad x_1 - 2x_2 + 2x_3 + x_4 \leq 1, \quad 2(x_2 + x_4) - x_3 \leq 3, \quad x_2 - x_3 + 3x_4 \leq 2$$
$$x_1 \geq 0, \, x_2 \geq 0, \, x_3 \geq 0, \, x_4 \geq 0$$

Since

$$A^T = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 2 & 1 \\ 2 & -1 & -1 \\ 1 & 2 & 3 \end{bmatrix}$$

the dual problem is

$$\min_{\lambda_1, \lambda_2, \lambda_3} \lambda_1 + 3\lambda_2 + 2\lambda_3$$
$$\text{sub} \quad \lambda_1 \geq -1, \quad 2(\lambda_2 - \lambda_1) + \lambda_3 \geq 2, \quad 2\lambda_1 - \lambda_2 - \lambda_3 \geq 4, \quad \lambda_1 + 2\lambda_2 + 3\lambda_3 \geq -2$$
$$\lambda_1 \geq 0, \, \lambda_2 \geq 0, \, \lambda_3 \geq 0$$

In view of the Duality Theorem of Linear Programming, if the two problems satisfy Slater's condition (do they?) then either problem has a solution if the other does, with

$$\max_{x \geq 0} \left( -x_1 + 2(x_2 - x_4) + 4x_3 \right) = \min_{\lambda \geq 0} \left( \lambda_1 + 3\lambda_2 + 2\lambda_3 \right)$$
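As a numeric companion to this example (our sketch: the use of scipy's linear programming routine is our own choice, and we simply report what the solver returns), one can solve both problems and compare the two optimal values, which the Duality Theorem says should coincide:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1, -2, 2, 1],
              [0, 2, -1, 2],
              [0, 1, -1, 3]], dtype=float)
b = np.array([1, 3, 2], dtype=float)
c = np.array([-1, 2, 4, -2], dtype=float)

# Primal: max c.x s.t. Ax <= b, x >= 0 (linprog minimizes, so negate c)
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 4)

# Dual: min b.lam s.t. A^T lam >= c, lam >= 0 (flip signs for <= form)
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 3)

print(-primal.fun, dual.fun)   # the two optimal values should coincide
```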
Part VIII

Integration

Chapter 35

The Riemann integral

35.1 The method of exhaustion

Let us consider a positive function $f$ (i.e., taking values $\geq 0$) defined on a closed interval $[a, b]$. Intuitively, the integral of $f$ on $[a, b]$ is the measure, called area, of the plane region

$$A(f_{[a,b]}) = \{ (x, y) \in [a, b] \times \mathbb{R}_+ : 0 \leq y \leq f(x) \} \tag{35.1}$$

under the graph of the function $f$ on the interval. Graphically:

[Figure: the region under the graph of $f$ between $a$ and $b$.]

The problem is how to make this natural intuition rigorous. As the figure shows, the plane region $A(f_{[a,b]})$ is a "curved" trapezoid with three straight sides and a curved one. So, it is not an elementary geometric figure whose area we know how to compute. To our rescue comes a classic procedure known as the method of exhaustion. It consists in approximating from above and below the area of a non-trivial geometric figure (such as our trapezoid) through the areas of simple circumscribed and inscribed elementary geometric figures, typically polygons (in our case, the so-called "plurirectangles"), whose measure can be calculated in an elementary way. If the resulting upper and lower approximations can be made more and more precise via polygons having more and more sides, till in the limit of "infinitely many sides" they reach a common limit value, we then take such a common value as the sought-after area of the non-trivial geometric figure (in our case, the area of the trapezoid, so the integral of $f$ on $[a, b]$).

In the next sections we will make rigorous the procedure just outlined. The method of exhaustion originates in Greek mathematics, where it found wonderful applications in the works of Eudoxus of Cnidus and Archimedes of Syracuse, who with this method were able to compute or approximate the areas of some highly non-trivial geometric figures.¹

35.2 Plurirectangles

We know how to calculate the areas of elementary geometric figures. Among them, the simplest ones are rectangles, whose area is given by the product of the side lengths. A simple, but key for our purposes, generalization of a rectangle is the plurirectangle, that is, a polygon formed by contiguous rectangles. Graphically:

[Figure: a plurirectangle, i.e., a polygon formed by contiguous rectangles.]

Clearly, the area of a plurirectangle is just the sum of the areas of the individual rectangles that compose it.

Let us go back now to the plane region $A(f_{[a,b]})$ under the graph of a positive function $f$ on $[a, b]$. It is easy to see how such a region can be sandwiched between inscribed plurirectangles and circumscribed plurirectangles. For example, the following plurirectangle

¹ For instance, Example 1546 of Appendix C reports the famous Archimedes approximation of $\pi$, the area of the closed unit ball, via the method of exhaustion based on circumscribed and inscribed regular polygons.
[Figure: a plurirectangle inscribed in $A(f_{[a,b]})$.]

is inscribed in $A(f_{[a,b]})$, while the following plurirectangle circumscribes it:

[Figure: a plurirectangle circumscribing $A(f_{[a,b]})$.]

Naturally, the area of $A(f_{[a,b]})$ is larger than the area of any inscribed plurirectangle and smaller than the area of any circumscribed plurirectangle. The area of $A(f_{[a,b]})$ is, therefore, in between the areas of the inscribed and circumscribed plurirectangles.

We thus have a first key observation: the area of $A(f_{[a,b]})$ can always be sandwiched between areas of plurirectangles. This yields simple lower approximations (the areas of the inscribed plurirectangles) and upper approximations (the areas of the circumscribed plurirectangles) of the area of $A(f_{[a,b]})$.

A second key observation is that such a sandwich, and consequently the relative approximations, can be made better and better by considering finer and finer plurirectangles, obtained by subdividing further and further their bases:
[Figure: two finer plurirectangles, inscribed and circumscribed, obtained by subdividing the bases.]

Indeed, by subdividing further and further the bases, the area of the inscribed plurirectangles becomes larger and larger, though it always remains smaller than the area of $A(f_{[a,b]})$. On the other hand, the area of the circumscribed plurirectangles becomes smaller and smaller, though it always remains larger than the area of $A(f_{[a,b]})$. In other words, the two slices of the sandwich that include the region $A(f_{[a,b]})$ – i.e., the lower and the upper approximations – take values that become closer and closer to each other.

If, by considering finer and finer plurirectangles, corresponding to finer and finer subdivisions of the bases, in the limit the lower and upper approximations coincide – so, the two slices of the sandwich merge – such a limit common value can be rightfully taken to be the area of $A(f_{[a,b]})$. In this way, starting with objects, the plurirectangles, that are simple to measure, we are able to measure via better and better approximations a much more complicated object such as the area of the plane region $A(f_{[a,b]})$ under $f$. The method of exhaustion is one of the most powerful ideas in mathematics.

35.3 Definition

We now formalize the method of exhaustion. We first consider positive and bounded functions $f : [a, b] \to \mathbb{R}_+$. In the next section, we will then consider general bounded functions, not necessarily positive.

35.3.1 Positive functions

Definition 1403 A set $\pi = \{x_i\}_{i=0}^n$ of points is a subdivision (or partition) of an interval $[a, b]$ if

$$a = x_0 < x_1 < \cdots < x_n = b$$

The set of all possible subdivisions of an interval $[a, b]$ is denoted by $\Pi$.

Given a bounded function $f : [a, b] \to \mathbb{R}_+$, consider the contiguous bases generated by the points of the subdivision $\pi$:

$$[x_0, x_1], [x_1, x_2], \dots, [x_{n-1}, x_n] \tag{35.2}$$



Let us construct on them the largest plurirectangle inscribed in the plane region under $f$. In particular, for the $i$-th base, the maximum height $m_i$ of an inscribed rectangle with base $[x_{i-1}, x_i]$ is

$$m_i = \inf_{x \in [x_{i-1}, x_i]} f(x)$$

Since $f$ is bounded, by the Least Upper Bound Principle this infimum exists and is finite, that is, $m_i \in \mathbb{R}$. Since the length $\Delta x_i$ of each base $[x_{i-1}, x_i]$ is

$$\Delta x_i = x_i - x_{i-1}$$

the area $I(f, \pi)$ of such a maximal inscribed plurirectangle is

$$I(f, \pi) = \sum_{i=1}^n m_i \Delta x_i \tag{35.3}$$

In a similar way, let us construct, on the contiguous bases (35.2) determined by the subdivision $\pi$, the smallest plurirectangle that circumscribes the plane region under $f$. For the $i$-th base, the minimum height $M_i$ of a circumscribed rectangle with base $[x_{i-1}, x_i]$ is

$$M_i = \sup_{x \in [x_{i-1}, x_i]} f(x)$$

Graphically:

[Figure: the heights $m_i$ and $M_i$ of the inscribed and circumscribed rectangles on the base $[x_{i-1}, x_i]$.]

As before, since $f$ is bounded, by the Least Upper Bound Principle the supremum exists and is finite, that is, $M_i \in \mathbb{R}$. Therefore, the area $S(f, \pi)$ of the minimal circumscribed plurirectangle is

$$S(f, \pi) = \sum_{i=1}^n M_i \Delta x_i \tag{35.4}$$

Since $m_i \leq M_i$ for every $i$, we have

$$I(f, \pi) \leq S(f, \pi) \qquad \forall \pi \in \Pi \tag{35.5}$$



In particular, the area of the plane region under $f$ lies between these two values. Hence, $I(f, \pi)$ gives a lower approximation of this area, while $S(f, \pi)$ gives an upper approximation of it. They are called the lower and upper integral sums of $f$ with respect to $\pi$, respectively.
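A numerical sketch of these integral sums (our own illustration: the function, interval, and uniform subdivisions are arbitrary choices). For the increasing $f(x) = x^2$ on $[0, 1]$, the infimum and supremum on each base are attained at its endpoints, and both sums approach $1/3$ as the subdivision gets finer:

```python
def lower_upper_sums(f, a, b, n):
    # Uniform subdivision pi = {a + i*(b-a)/n}; for an increasing f,
    # m_i = f(x_{i-1}) and M_i = f(x_i) on each base [x_{i-1}, x_i].
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    I = sum(f(xs[i - 1]) * dx for i in range(1, n + 1))   # lower sum I(f, pi)
    S = sum(f(xs[i]) * dx for i in range(1, n + 1))       # upper sum S(f, pi)
    return I, S

for n in [10, 100, 1000]:
    print(n, lower_upper_sums(lambda x: x * x, 0.0, 1.0, n))
    # Both sums squeeze the common value 1/3 as n grows
```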

De…nition 1404 Given two subdivisions and 0 of [a; b], we say that 0 re…nes if 0.

That is, if all the points of are also points of 0 .

In other words, the …ner subdivision 0 is obtained by adding further points to . For
example, the subdivision
0 1 1 3
= 0; ; ; ; 1
4 2 4
of the unit interval [0; 1] re…nes the subdivision = f0; 1=2; 1g.
It is easy to see that if 0 re…nes , then
0 0
I (f; ) I f; S f; S (f; ) (35.6)

In other words, a …ner subdivision 0 yields a better approximation, both lower and upper, of
the area under f .2 By starting from any subdivision, we can always re…ne it, thus improving
(or, at least, not worsening) the approximations given by the corresponding plurirectangles.
The same can be done by starting from any two subdivisions and 0 , not necessarily
nested. Indeed, the subdivision 00 = [ 0 formed by all the points that belong to the two
subdivisions and 0 re…nes both of them. In other words, 00 is a common re…nement of
and 0 .

Example 1405 Consider the two subdivisions

1 1 2 0 1 1 3
= 0; ; ; ; 1 and = 0; ; ; ; 1
3 2 3 4 2 4

of [0; 1]. They are not nested: neither re…nes 0 nor 0 re…nes . However, the subdivision

00 0 1 1 1 2 3
= [ = 0; ; ; ; ; ; 1
4 3 2 3 4

re…nes both and 0. N

Thanks to the inequality (35.6), we have

I(f, π) ≤ I(f, π″) ≤ S(f, π″) ≤ S(f, π)   (35.7)

and

I(f, π′) ≤ I(f, π″) ≤ S(f, π″) ≤ S(f, π′)   (35.8)

The common refinement π″ gives a better approximation, both lower and upper, of the area under f than the original subdivisions π and π′.
All this motivates the next de…nition.
²For the sake of brevity, we write “area under f” instead of the more precise expression “area of the plane region that lies under the graph of f”.

Definition 1406 Let f : [a,b] → R₊ be a bounded function. The value

∫̲_a^b f(x) dx = sup_{π∈Π} I(f, π)   (35.9)

is said to be the lower integral of f on [a,b], while the value

∫̄_a^b f(x) dx = inf_{π∈Π} S(f, π)   (35.10)

is said to be the upper integral of f on [a,b].


Therefore, ∫̲_a^b f(x) dx is the supremum of the areas I(f, π) of the inscribed plurirectangles obtained by considering all the possible subdivisions π of [a,b]. Starting from the inscribed plurirectangles, this is the best possible lower approximation of the area under f on [a,b].
Similarly, ∫̄_a^b f(x) dx is the infimum of the areas S(f, π) of the circumscribed plurirectangles obtained by considering all the possible subdivisions π of [a,b]. Starting from the circumscribed plurirectangles, this is the best possible upper approximation of the area under f.

A first important question is whether the lower and upper integrals of a bounded function exist. Fortunately, this is the case, as we show next.

Lemma 1407 If f : [a,b] → R₊ is a bounded function, then both the lower integral and the upper integral exist and are finite, with

∫̲_a^b f(x) dx ≤ ∫̄_a^b f(x) dx   (35.11)

Proof Since f is positive and bounded, there exists M ≥ 0 such that 0 ≤ f(x) ≤ M for every x ∈ [a,b]. Therefore, for every subdivision π = {x_i}_{i=0}^n we have

0 ≤ inf_{x∈[x_{i−1},x_i]} f(x) ≤ sup_{x∈[x_{i−1},x_i]} f(x) ≤ M   for every i = 1, 2, …, n

and so

0 ≤ I(f, π) ≤ S(f, π) ≤ M(b − a)   for every π ∈ Π

By the Least Upper Bound Principle, the supremum in (35.9) and the infimum in (35.10) exist and are finite and positive, that is, ∫̲_a^b f(x) dx ∈ R₊ and ∫̄_a^b f(x) dx ∈ R₊.
We still need to prove the inequality (35.11). Let us suppose, by contradiction, that

∫̲_a^b f(x) dx − ∫̄_a^b f(x) dx = ε > 0

By Proposition 120, there exist a subdivision π′ such that

I(f, π′) > ∫̲_a^b f(x) dx − ε/2

and a subdivision π″ such that

S(f, π″) < ∫̄_a^b f(x) dx + ε/2

These two inequalities yield

I(f, π′) − S(f, π″) > (∫̲_a^b f(x) dx − ε/2) − (∫̄_a^b f(x) dx + ε/2) = ε − ε = 0

If we take the subdivision π = π′ ∪ π″, then I(f, π) ≥ I(f, π′) and S(f, π) ≤ S(f, π″). We conclude that

I(f, π) − S(f, π) ≥ I(f, π′) − S(f, π″) > 0

that is, I(f, π) > S(f, π), which contradicts (35.5).

By the previous lemma, every bounded function f : [a,b] → R₊ has both a lower integral and an upper integral, with

∫̲_a^b f(x) dx ≤ ∫̄_a^b f(x) dx

The area under f lies between these two values. The last inequality is the most refined version of (35.6). The lower and upper integrals are, respectively, the best lower and upper approximations of the area under f that can be obtained through plurirectangles. In particular, when ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx, the area under f will be taken to be this common value. This motivates the next fundamental definition.

Definition 1408 A bounded function f : [a,b] → R₊ is said to be integrable in the sense of Riemann (or Riemann integrable) if

∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx

This common value, denoted by ∫_a^b f(x) dx, is called the integral in the sense of Riemann (or Riemann integral) of f on [a,b].

For brevity, in the rest of the chapter we will often talk about integrals and integrable functions, omitting the clause “in the sense of Riemann”. Since there are other notions of integral, it is however important to always keep this qualification in mind. In addition, note that the definition applies only to bounded functions. When in the sequel we consider integrable functions, they will be assumed to be bounded (even if not stated explicitly).
Rb
O.R. The notation ∫_a^b f(x) dx reminds us that the integral is obtained as a limit of sums of the type Σ_{i=1}^n c_i Δx_i, in which the symbol Σ is replaced by the integral sign ∫ (“a long letter s”), the length Δx_i by dx, and the values c_i of the function by f(x). H

Let us illustrate the definition of the integral with, first, an example of an integrable function and, then, of a non-integrable one.

Example 1409 Let f : [a,b] → R be defined by f(x) = x. For any subdivision π = {x_i}_{i=0}^n we have

I(f, π) = x_0 Δx_1 + x_1 Δx_2 + ⋯ + x_{n−1} Δx_n = Σ_{i=1}^n x_{i−1} Δx_i

S(f, π) = x_1 Δx_1 + x_2 Δx_2 + ⋯ + x_n Δx_n = Σ_{i=1}^n x_i Δx_i

Therefore,

S(f, π) − I(f, π) = (x_1 − x_0) Δx_1 + (x_2 − x_1) Δx_2 + ⋯ + (x_n − x_{n−1}) Δx_n = Σ_{i=1}^n (Δx_i)²

so that, for every subdivision π,

0 ≤ ∫̄_a^b f(x) dx − ∫̲_a^b f(x) dx ≤ Σ_{i=1}^n (Δx_i)²

By Jensen's inequality, since the quadratic function is convex,³

(1/n) Σ_{i=1}^n (Δx_i)² ≥ ((1/n) Σ_{i=1}^n Δx_i)² = ((b − a)/n)²

with equality when the subdivision is uniform, that is, when Δx_i = (b − a)/n for every i. Taking progressively finer uniform subdivisions, we thus get

0 ≤ ∫̄_a^b f(x) dx − ∫̲_a^b f(x) dx ≤ n ((b − a)/n)² = (b − a)²/n → 0

Thus ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx and we conclude that f(x) = x is integrable. N
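The convergence just seen is easy to observe numerically. The following sketch (an illustration, assuming uniform subdivisions) computes I(f, π) and S(f, π) exactly for f(x) = x, using that f is increasing:

```python
import numpy as np

def sums_for_identity(a, b, n):
    """Exact I(f, pi) and S(f, pi) for f(x) = x on [a, b] with the uniform
    subdivision of n bases: f is increasing, so m_i = x_{i-1} and M_i = x_i."""
    x = np.linspace(a, b, n + 1)
    dx = np.diff(x)
    return np.sum(x[:-1] * dx), np.sum(x[1:] * dx)

for n in (10, 100, 1000):
    I, S = sums_for_identity(0.0, 1.0, n)
    print(n, I, S, S - I)   # S - I = (b - a)^2 / n -> 0; both sums squeeze 1/2
```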
a

Example 1410 Let f : [a,b] → R be the Dirichlet function

f(x) = 1 if x ∈ Q ∩ [a,b] ; 0 if x ∉ Q ∩ [a,b]   (35.12)

restricted to [a,b]. For every a ≤ x < y ≤ b there exists a rational number q such that x < q < y, as well as an irrational number r such that x < r < y (Propositions 18 and 39). Given any subdivision π = {x_i}_{i=0}^n of [a,b], we thus have

m_i = 0 and M_i = 1   for every i = 1, 2, …, n

Therefore,

I(f, π) = 0 · Δx_1 + 0 · Δx_2 + ⋯ + 0 · Δx_n = 0

and

S(f, π) = 1 · Δx_1 + 1 · Δx_2 + ⋯ + 1 · Δx_n = Σ_{i=1}^n Δx_i = b − a

which implies ∫̲_a^b f(x) dx = 0 < b − a = ∫̄_a^b f(x) dx. We conclude that the Dirichlet function is not integrable in the sense of Riemann.⁴ N

³This argument relies on subdivisions getting finer and finer. In this way, on the one hand each Δx_i gets smaller and smaller, on the other hand the number of summands gets larger and larger. A priori the behavior of the resulting sum Σ_{i=1}^n (Δx_i)² is thus ambiguous. In this example, Jensen's inequality comes to the rescue and resolves the ambiguity.
⁴Therefore, it is meaningless (at least in the sense of Riemann) to talk about the “area” of the plane region under such a function.

Finally, let us introduce a useful quantity that measures the “fineness” of a subdivision of [a,b].

Definition 1411 Given a subdivision π of [a,b], the mesh of π, denoted by |π|, is the positive quantity

|π| = max_{i=1,2,…,n} Δx_i

Finer subdivisions have a smaller mesh.

35.3.2 General functions


We now extend the notion of integral to any bounded function f : [a,b] → R, not necessarily positive. For a function f : [a,b] → R that assumes both negative and positive values, the plane region bounded by f on [a,b] has in general a positive part and a negative part:

[Figure: the graph of a function on [a,b]; the region above the horizontal axis is marked + and the region below is marked −.]

Intuitively, the integral is now the difference between the area of the positive part and the area of the negative part. If they have equal value, the integral is zero: this is the case, for example, of the function f(x) = sin x on the interval [0, 2π].
To make this idea rigorous, it is useful to decompose a function into its positive and negative parts.

Definition 1412 Let f : A ⊆ R → R. The function f⁺ : A ⊆ R → R₊ is defined by

f⁺(x) = max{f(x), 0}   for every x ∈ A

while the function f⁻ : A ⊆ R → R₊ is defined by

f⁻(x) = −min{f(x), 0}   for every x ∈ A

The function f⁺ is called the positive part of f, while f⁻ is called the negative part.

Both functions f⁺ and f⁻ are positive.



Example 1413 (i) Let f : R → R be given by f(x) = x. We have

f⁺(x) = 0 if x < 0 ; x if x ≥ 0   and   f⁻(x) = −x if x < 0 ; 0 if x ≥ 0

[Figure: the graphs of f⁺ (zero for x < 0, then the line y = x) and of f⁻ (the line y = −x for x < 0, then zero).]

(ii) Let f : R → R be given by f(x) = sin x. We have

f⁺(x) = sin x if x ∈ ∪_{n∈Z} [2nπ, (2n+1)π] ; 0 otherwise

and

f⁻(x) = 0 if x ∈ ∪_{n∈Z} [2nπ, (2n+1)π] ; −sin x otherwise

[Figure: the graphs of f⁺ and f⁻, each consisting of the positive arches of sin x and of −sin x, respectively.]

N

Since for every real number a ∈ R we trivially have

a = max{a, 0} + min{a, 0}

it follows that, for every x ∈ A,

f(x) = max{f(x), 0} + min{f(x), 0} = max{f(x), 0} − (−min{f(x), 0}) = f⁺(x) − f⁻(x)

Every function f : A ⊆ R → R can therefore be decomposed as the difference

f = f⁺ − f⁻   (35.13)

of its positive and negative parts. Such a decomposition permits us to extend in a natural way the notion of integral to any function, not necessarily positive. Indeed, since both functions f⁺ and f⁻ are positive, the definition of the Riemann integral for positive functions applies to the areas under each of them. The difference between their integrals

∫_a^b f⁺(x) dx − ∫_a^b f⁻(x) dx

is the difference between the areas under f⁺ and f⁻. So, it is the integral we were looking for.
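The decomposition is straightforward to compute. Here is a minimal Python sketch; the helper names positive_part and negative_part are illustrative only, not from the text:

```python
import numpy as np

def positive_part(f):
    return lambda x: np.maximum(f(x), 0.0)    # f+(x) = max{f(x), 0}

def negative_part(f):
    return lambda x: -np.minimum(f(x), 0.0)   # f-(x) = -min{f(x), 0}

f = np.sin
fp, fm = positive_part(f), negative_part(f)

x = np.linspace(0.0, 2.0 * np.pi, 1001)
assert np.all(fp(x) >= 0.0) and np.all(fm(x) >= 0.0)  # both parts are positive
assert np.allclose(f(x), fp(x) - fm(x))               # f = f+ - f-, as in (35.13)

# On [0, 2*pi] the two parts of sin x have equal area, so the integral is 0
mid, dx = 0.5 * (x[:-1] + x[1:]), np.diff(x)
print(np.sum((fp(mid) - fm(mid)) * dx))   # close to 0
```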
All of this motivates the following definition of the Riemann integral for general bounded functions, not necessarily positive.

Definition 1414 A bounded function f : [a,b] → R is said to be integrable in the sense of Riemann if the functions f⁺ and f⁻ are integrable. In this case, the Riemann integral of f on [a,b] is defined by

∫_a^b f(x) dx = ∫_a^b f⁺(x) dx − ∫_a^b f⁻(x) dx

This definition makes rigorous and transparent the idea of counting with different signs the areas of the plane regions bounded by f that lie, respectively, above and below the horizontal axis.

35.3.3 Everything holds together


Is it possible to express the notion of Riemann integral for general bounded functions in terms of the lower and upper approximations I(f, π) and S(f, π) upon which the notion of integral of positive functions so much relied, formally and conceptually? In this section we show that, remarkably, this is indeed the case, thus showing that the method of exhaustion is at the heart of the notion of Riemann integral for any bounded function, positive or not.
To this end, we first note that, given a subdivision π = {x_i}_{i=0}^n, we can still define for any bounded function f : [a,b] → R the sums I(f, π) and S(f, π) as in (35.3) and (35.4), that is,

I(f, π) = Σ_{i=1}^n m_i Δx_i   and   S(f, π) = Σ_{i=1}^n M_i Δx_i

For general functions, too, the sums I(f, π) and S(f, π) are called the lower and upper integral sums of f with respect to the subdivision π, respectively. The reader can easily verify that the properties (35.5), (35.6), (35.7) and (35.8) continue to hold for these sums. In particular,

sup_{π∈Π} I(f, π) ≤ inf_{π∈Π} S(f, π)

Moreover, for any bounded function f : [a,b] → R, positive or not, we can still define the lower and upper integrals

∫̲_a^b f(x) dx = sup_{π∈Π} I(f, π)   and   ∫̄_a^b f(x) dx = inf_{π∈Π} S(f, π)   (35.14)

in perfect analogy with what we did for positive functions. The next result shows that everything fits together: the notion of Riemann integral obtained through the decomposition (35.13) into positive and negative parts is equivalent to the equality of the upper and lower integrals of (35.14).

Proposition 1415 A bounded function f : [a,b] → R is integrable if and only if ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx. In this case,

∫_a^b f(x) dx = ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx

The proof is based on three lemmas. The first one establishes a general property of the suprema and infima of sums of functions, the second one also has a theoretical interest for the theory of integration (as we will explain at the end of the section), while the last one has a more technical nature.

Lemma 1416 For any two bounded functions g, h : A → R, we have sup_{x∈A} (g + h)(x) ≤ sup_{x∈A} g(x) + sup_{x∈A} h(x) and inf_{x∈A} (g + h)(x) ≥ inf_{x∈A} g(x) + inf_{x∈A} h(x).

Proof By contradiction, suppose that sup_{x∈A} (g + h)(x) > sup_{x∈A} g(x) + sup_{x∈A} h(x). Let ε = sup_{x∈A} (g + h)(x) − (sup_{x∈A} g(x) + sup_{x∈A} h(x)) > 0. By a property of the sup of a set, there exists x_0 ∈ A such that (g + h)(x_0) > sup_{x∈A} (g + h)(x) − ε = sup_{x∈A} g(x) + sup_{x∈A} h(x).⁵ At the same time, by the definition of the sup of a function, we have g(x) ≤ sup_{x∈A} g(x) and h(x) ≤ sup_{x∈A} h(x) for every x ∈ A, from which it follows that g(x) + h(x) ≤ sup_{x∈A} g(x) + sup_{x∈A} h(x) for every x ∈ A. In particular, (g + h)(x_0) ≤ sup_{x∈A} g(x) + sup_{x∈A} h(x), a contradiction. The reader can prove, in a similar way, that inf_{x∈A} (g + h)(x) ≥ inf_{x∈A} g(x) + inf_{x∈A} h(x).
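The inequality of Lemma 1416 can be strict, as a quick numerical example suggests (a sketch: the sup over A is approximated here by a max over a fine grid):

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100001)   # a fine grid on A = [0, 2*pi]
g, h = np.sin(x), -np.sin(x)

print(np.max(g + h))           # sup (g + h) = 0 ...
print(np.max(g) + np.max(h))   # ... while sup g + sup h = 2
```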

Lemma 1417 Let f : [a,b] → R be a bounded function. Then, for every subdivision π = {x_i}_{i=0}^n of [a,b], we have

S(f, π) = S(f⁺, π) − I(f⁻, π)   (35.15)

and

I(f, π) = I(f⁺, π) − S(f⁻, π)   (35.16)

⁵Note that sup_{x∈A} (g + h)(x) = sup Im(g + h) = sup (g + h)(A).

Proof Let f : [a,b] → R be a bounded function and let π = {x_i}_{i=0}^n be a subdivision of [a,b]. For a generic interval [x_{i−1}, x_i], put β = sup_{x∈[x_{i−1},x_i]} f(x) and α = inf_{x∈[x_{i−1},x_i]} f(x). Since f is bounded, α and β exist by the Least Upper Bound Principle. We have

β ≥ 0 ⟹ β = sup_{x∈[x_{i−1},x_i]} f⁺(x)

and

β < 0 ⟹ sup_{x∈[x_{i−1},x_i]} f⁺(x) = 0 and β = −inf_{x∈[x_{i−1},x_i]} f⁻(x)

So, since f⁻ ≥ 0, in both cases

β ≥ sup_{x∈[x_{i−1},x_i]} f⁺(x) − inf_{x∈[x_{i−1},x_i]} f⁻(x)

On the other hand, by Lemma 1416 for any pair of bounded functions g, h : A → R we have

sup_{x∈A} (g + h)(x) ≤ sup_{x∈A} g(x) + sup_{x∈A} h(x)   (35.17)

and so

β = sup_{x∈[x_{i−1},x_i]} (f⁺(x) − f⁻(x)) ≤ sup_{x∈[x_{i−1},x_i]} f⁺(x) + sup_{x∈[x_{i−1},x_i]} (−f⁻(x)) = sup_{x∈[x_{i−1},x_i]} f⁺(x) − inf_{x∈[x_{i−1},x_i]} f⁻(x)

In sum,

β = sup_{x∈[x_{i−1},x_i]} f⁺(x) − inf_{x∈[x_{i−1},x_i]} f⁻(x)

Multiplying by Δx_i and summing over i, this implies (35.15). A similar argument proves (35.16).

Lemma 1418 Let f : [a,b] → R be a bounded function. Then

sup_{π∈Π} I(f, π) ≤ sup_{π∈Π} I(f⁺, π) − inf_{π∈Π} S(f⁻, π) ≤ inf_{π∈Π} S(f⁺, π) − sup_{π∈Π} I(f⁻, π) ≤ inf_{π∈Π} S(f, π)   (35.18)

Proof By (35.15) and by the “inf” part of Lemma 1416, we have

inf_{π∈Π} S(f, π) = inf_{π∈Π} (S(f⁺, π) − I(f⁻, π)) ≥ inf_{π∈Π} S(f⁺, π) + inf_{π∈Π} (−I(f⁻, π)) = inf_{π∈Π} S(f⁺, π) − sup_{π∈Π} I(f⁻, π)   (35.19)

Moreover, by (35.16) and by the “sup” part of Lemma 1416, we have

sup_{π∈Π} I(f, π) = sup_{π∈Π} (I(f⁺, π) − S(f⁻, π)) ≤ sup_{π∈Π} I(f⁺, π) + sup_{π∈Π} (−S(f⁻, π)) = sup_{π∈Π} I(f⁺, π) − inf_{π∈Π} S(f⁻, π)   (35.20)

Putting together (35.19), (35.20) and (35.5) applied to both f⁺ and f⁻, we get the inequality (35.18).
Proof of Proposition 1415 We begin with the “if”: suppose ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx. We show that f⁺ and f⁻ are integrable. From (35.18) it follows that

sup_{π∈Π} I(f, π) = sup_{π∈Π} I(f⁺, π) − inf_{π∈Π} S(f⁻, π) = inf_{π∈Π} S(f⁺, π) − sup_{π∈Π} I(f⁻, π) = inf_{π∈Π} S(f, π)   (35.21)

So

sup_{π∈Π} I(f⁺, π) − inf_{π∈Π} S(f⁻, π) = inf_{π∈Π} S(f⁺, π) − sup_{π∈Π} I(f⁻, π)

which implies

sup_{π∈Π} I(f⁺, π) − inf_{π∈Π} S(f⁺, π) = inf_{π∈Π} S(f⁻, π) − sup_{π∈Π} I(f⁻, π)

Using again (35.5) applied to both f⁺ and f⁻, we have

0 ≥ sup_{π∈Π} I(f⁺, π) − inf_{π∈Π} S(f⁺, π) = inf_{π∈Π} S(f⁻, π) − sup_{π∈Π} I(f⁻, π) ≥ 0

which implies

sup_{π∈Π} I(f⁺, π) − inf_{π∈Π} S(f⁺, π) = inf_{π∈Π} S(f⁻, π) − sup_{π∈Π} I(f⁻, π) = 0

We conclude that inf_{π∈Π} S(f⁺, π) = sup_{π∈Π} I(f⁺, π) and inf_{π∈Π} S(f⁻, π) = sup_{π∈Π} I(f⁻, π), so the functions f⁺ and f⁻ are both integrable. Moreover, from (35.21) it follows that

inf_{π∈Π} S(f, π) = sup_{π∈Π} I(f, π) = ∫_a^b f⁺(x) dx − ∫_a^b f⁻(x) dx = ∫_a^b f(x) dx

It remains to prove the “only if”. Suppose that f is integrable, that is, that f⁺ and f⁻ are both integrable. We show that

sup_{π∈Π} I(f, π) = inf_{π∈Π} S(f, π)   (35.22)

By (35.18), we have

sup_{π∈Π} I(f, π) ≤ ∫_a^b f⁺(x) dx − ∫_a^b f⁻(x) dx = ∫_a^b f(x) dx ≤ inf_{π∈Π} S(f, π)   (35.23)
Since f⁺ and f⁻ are both integrable, by the integrability criterion of Proposition 1419 we have that, for every ε > 0, there exist subdivisions π and π′ such that⁶

S(f⁺, π) − I(f⁺, π) < ε   and   S(f⁻, π′) − I(f⁻, π′) < ε

⁶The integrability criterion of Proposition 1419 for positive functions (all we need here) can be proved directly via Definition 1408. Thus, there is no circularity in using this criterion in the current proof.

If π″ is a common refinement of π and π′, a fortiori we have

S(f⁺, π″) − I(f⁺, π″) < ε   and   S(f⁻, π″) − I(f⁻, π″) < ε

So, by (35.15) and (35.16) we have

0 ≤ inf_{π∈Π} S(f, π) − sup_{π∈Π} I(f, π) ≤ S(f, π″) − I(f, π″) = (S(f⁺, π″) − I(f⁺, π″)) + (S(f⁻, π″) − I(f⁻, π″)) < 2ε

which implies (35.22). Together with (35.23), this proves that

sup_{π∈Π} I(f, π) = ∫_a^b f(x) dx = inf_{π∈Π} S(f, π)

as desired.

N.B. The Riemann integral is often defined directly for general functions, not necessarily positive, through the lower and upper sums. What is lost in defining these sums for not necessarily positive functions is the geometric intuition: while for positive functions I(f, π) is the area of the inscribed plurirectangle and S(f, π) the area of the circumscribed plurirectangle, this is no longer true for a generic function that takes positive and negative values, as (35.15) and (35.16) show. The formulation we adopt with Definition 1414 is suggested by pedagogical motivations and is equivalent to the usual formulation, as Proposition 1415 shows. O

35.4 Integrability criteria


In the next section we will study some important classes of integrable functions. To this end, we establish here some integrability criteria.
We begin with a simple, yet useful, criterion.

Proposition 1419 A bounded function f : [a,b] → R is Riemann integrable if and only if for every ε > 0 there exists a subdivision π such that S(f, π) − I(f, π) < ε.

Proof “If”. Suppose that, for every ε > 0, there exists a subdivision π such that S(f, π) − I(f, π) < ε. Then

0 ≤ ∫̄_a^b f(x) dx − ∫̲_a^b f(x) dx ≤ S(f, π) − I(f, π) < ε

and therefore, since ε > 0 is arbitrary, we have ∫̄_a^b f(x) dx = ∫̲_a^b f(x) dx.
“Only if”. Suppose that ∫̄_a^b f(x) dx = ∫̲_a^b f(x) dx. By Proposition 120, for every ε > 0 there exist a subdivision π′ such that S(f, π′) − ∫̄_a^b f(x) dx < ε and a subdivision π″ such that ∫̲_a^b f(x) dx − I(f, π″) < ε. Let π be a subdivision that refines both π′ and π″. Thanks to (35.6), we have I(f, π″) ≤ I(f, π) ≤ S(f, π) ≤ S(f, π′), so

S(f, π) − I(f, π) ≤ S(f, π′) − I(f, π″) < (∫̄_a^b f(x) dx + ε) − (∫̲_a^b f(x) dx − ε) = 2ε

as desired.
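In computational terms, Proposition 1419 suggests refining a subdivision until the gap S(f, π) − I(f, π) falls below a tolerance. Here is a sketch under simplifying assumptions (uniform subdivisions, and a monotone f so that the inf and sup on each base are attained at the endpoints):

```python
import numpy as np

def gap(f, a, b, n):
    """S(f, pi) - I(f, pi) on the uniform subdivision with n bases; the
    inf and sup on each base are taken at the endpoints, which is exact
    for a monotone f such as x**2 on [0, 1]."""
    x = np.linspace(a, b, n + 1)
    lo = np.minimum(f(x[:-1]), f(x[1:]))
    hi = np.maximum(f(x[:-1]), f(x[1:]))
    return float(np.sum((hi - lo) * np.diff(x)))

eps, n = 1e-3, 1
while gap(lambda x: x**2, 0.0, 1.0, n) >= eps:
    n *= 2
print(n)   # a number of bases for which S - I < eps
```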

The next result shows that if two functions are equal except at a finite number of points, then their integrals (if they exist) are equal. It is an important stability property of the integral: its value does not change if we modify a function f : [a,b] → R at a finite number of points.

Proposition 1420 Let f : [a,b] → R be an integrable function. If g : [a,b] → R is equal to f except at most at a finite number of points, then g is also integrable, with ∫_a^b f(x) dx = ∫_a^b g(x) dx.

Proof It is sufficient to prove the statement for the case in which g differs from f at only one point x̂ ∈ [a,b]. The case of n points is then proved by (finite) induction, adding one point at a time.
Suppose, therefore, that f(x̂) ≠ g(x̂) with x̂ ∈ [a,b]. Without loss of generality, suppose that f(x̂) > g(x̂). Setting k = f(x̂) − g(x̂) > 0, let h : [a,b] → R be the function h = f − g. Then

h(x) = 0 if x ≠ x̂ ; k if x = x̂

Let us prove that h is integrable and that ∫_a^b h(x) dx = 0. Let ε > 0. Consider an arbitrary subdivision π = {x_0, x_1, …, x_n} of [a,b] such that |π| < ε/(2k). Since x̂ ∈ [a,b], there are two possibilities: (i) x̂ is not an intermediate point of the subdivision, that is, either x̂ ∈ {x_0, x_n} or x̂ ∈ (x_{i−1}, x_i) for some i = 1, …, n; (ii) x̂ is a point of the subdivision other than the endpoints, that is, x̂ = x_i for some i = 1, …, n − 1. Since h(x) = 0 for every x ≠ x̂, we have

I(h, π) = 0

In case (i), with either x̂ ∈ {x_0, x_n} or x̂ ∈ (x_{i−1}, x_i) for some i = 1, …, n, we have⁷

S(h, π) = k Δx_i < k · ε/(2k) = ε/2 < ε

In case (ii), with x̂ = x_i for some i = 1, …, n − 1, we have

S(h, π) = k (Δx_i + Δx_{i+1}) < 2k · ε/(2k) = ε

Therefore, in both cases (i) and (ii) we have S(h, π) − I(h, π) < ε. Since ε > 0 is arbitrary, by Proposition 1419 h is integrable on [a,b]. Hence

∫_a^b h(x) dx = sup_{π∈Π} I(h, π) = inf_{π∈Π} S(h, π)   (35.24)

But, since h(x) = 0 for every x ≠ x̂, one has I(h, π) = 0 for every subdivision π ∈ Π, and so

sup_{π∈Π} I(h, π) = 0

⁷If x̂ = x_0, we have S(h, π) = k Δx_1, while if x̂ = x_n, we have S(h, π) = k Δx_n. In both cases, S(h, π) < ε.

Thanks to (35.24), we conclude that

∫_a^b h(x) dx = sup_{π∈Π} I(h, π) = 0

By the linearity of the integral (Theorem 1429), g = f − h is integrable because f and h are so, with

∫_a^b g(x) dx = ∫_a^b f(x) dx − ∫_a^b h(x) dx = ∫_a^b f(x) dx

as desired.

O.R. Even if a function f is not defined at a finite number of points of the interval [a,b], we can still talk about its integral: it coincides with that of any function defined also at the missing points and equal to f at the points where f is defined. In particular, the integrals of f on [a,b], (a,b], [a,b) and (a,b) always coincide: this makes the notation ∫_a^b f(x) dx unambiguous. H

Finally, let us show that integrability is preserved by continuous transformations.

Proposition 1421 Let f : [a,b] → R be an integrable and bounded function, with m ≤ f ≤ M. If g : [m,M] → R is continuous, then the composite function g ∘ f : [a,b] → R is integrable.

 0. Since g">
Proof Let ε > 0. Since g is continuous on [m,M], by Theorem 526 the function g is uniformly continuous on [m,M], that is, there exists δ_ε > 0 such that

|x − y| < δ_ε ⟹ |g(x) − g(y)| < ε   for all x, y ∈ [m,M]   (35.25)

Without loss of generality, we can assume that δ_ε < ε.
Since f is integrable, Proposition 1419 provides a subdivision π = {x_i}_{i=0}^n of [a,b] such that S(f, π) − I(f, π) < δ_ε². Let I ⊆ {1, 2, …, n} be the set of the indexes i of the subdivision such that

sup_{x∈[x_{i−1},x_i]} f(x) − inf_{x∈[x_{i−1},x_i]} f(x) < δ_ε

so that, for i ∈ I, we have

|f(x) − f(x′)| < δ_ε   for all x, x′ ∈ [x_{i−1}, x_i]

From (35.25) it follows that, for every i ∈ I,

|(g ∘ f)(x) − (g ∘ f)(x′)| < ε   for all x, x′ ∈ [x_{i−1}, x_i]

and therefore

sup_{x∈[x_{i−1},x_i]} (g ∘ f)(x) − inf_{x∈[x_{i−1},x_i]} (g ∘ f)(x) ≤ ε   for every i ∈ I

On the other hand,⁸

δ_ε Σ_{i∉I} Δx_i ≤ Σ_{i∉I} (sup_{x∈[x_{i−1},x_i]} f(x) − inf_{x∈[x_{i−1},x_i]} f(x)) Δx_i < δ_ε²

and therefore Σ_{i∉I} Δx_i < δ_ε < ε. Hence,

S(g ∘ f, π) − I(g ∘ f, π) = Σ_{i=1}^n (sup_{x∈[x_{i−1},x_i]} (g ∘ f)(x) − inf_{x∈[x_{i−1},x_i]} (g ∘ f)(x)) Δx_i
= Σ_{i∈I} (sup_{x∈[x_{i−1},x_i]} (g ∘ f)(x) − inf_{x∈[x_{i−1},x_i]} (g ∘ f)(x)) Δx_i + Σ_{i∉I} (sup_{x∈[x_{i−1},x_i]} (g ∘ f)(x) − inf_{x∈[x_{i−1},x_i]} (g ∘ f)(x)) Δx_i
≤ ε Σ_{i∈I} Δx_i + 2 max_{y∈[m,M]} |g(y)| Σ_{i∉I} Δx_i < ε (b − a) + 2 max_{y∈[m,M]} |g(y)| ε = (b − a + 2 max_{y∈[m,M]} |g(y)|) ε

By Proposition 1419, g ∘ f is integrable.

Since the function g(x) = |x| is continuous, a simple but important consequence of Proposition 1421 is that the integrability of a bounded function f : [a,b] → R implies the integrability of its absolute value |f| : [a,b] → R. Note that the converse is false: the function

f(x) = 1 if x ∈ Q ∩ [0,1] ; −1 if x ∉ Q ∩ [0,1]   (35.26)

is a simple modification of the Dirichlet function and hence is not integrable, contrary to its absolute value |f|, which is the constant function equal to 1 on the interval [0,1].

Finally, observe that the first integrability criterion of this section, Proposition 1419, opens an interesting perspective on the Riemann integral. Given any subdivision π = {x_i}_{i=0}^n, by definition we have m_i ≤ f(x_i′) ≤ M_i for every x_i′ ∈ [x_{i−1}, x_i], so that

I(f, π) ≤ Σ_{i=1}^n f(x_i′) Δx_i ≤ S(f, π)

Hence, since

I(f, π) ≤ ∫_a^b f(x) dx ≤ S(f, π)

we have

I(f, π) − S(f, π) ≤ Σ_{i=1}^n f(x_i′) Δx_i − ∫_a^b f(x) dx ≤ S(f, π) − I(f, π)

⁸Here i ∉ I stands for i ∈ {1, 2, …, n} \ I.

which is equivalent to

|Σ_{i=1}^n f(x_i′) Δx_i − ∫_a^b f(x) dx| ≤ S(f, π) − I(f, π)

By Proposition 1419, for every ε > 0 there exists a sufficiently fine subdivision π for which

|Σ_{i=1}^n f(x_i′) Δx_i − ∫_a^b f(x) dx| ≤ S(f, π) − I(f, π) < ε

In a suggestive way we can, therefore, write

lim_{|π|→0} Σ_{i=1}^n f(x_i′) Δx_i = ∫_a^b f(x) dx   (35.27)

That is, the Riemann integral ∫_a^b f(x) dx can be seen as a limit, for smaller and smaller meshes |π| of the subdivisions π, of the sums Σ_{i=1}^n f(x_i′) Δx_i.⁹ This is an equivalent way to see the Riemann integral, which is indeed sometimes defined directly in these terms through (35.27). Even if evocative, the limit lim_{|π|→0} is not among the notions of limit, for sequences or functions, discussed in this book (indeed, it requires a more subtle definition). Moreover, the definition we have adopted is particularly well suited for generalizations of the Riemann integral, as the reader will see in more advanced courses on integration.

⁹Such sums are often called Riemann sums (or, sometimes, Cauchy sums).
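Formula (35.27) is also the starting point of elementary numerical integration. A hedged sketch (the choice of the points x_i′ inside each base is arbitrary; midpoints or left endpoints are taken here as examples):

```python
import numpy as np

def riemann_sum(f, a, b, n, rule="midpoint"):
    """Sum of f(x_i') * Delta x_i on the uniform subdivision with n bases,
    with x_i' the midpoint (or the left endpoint) of each base."""
    x = np.linspace(a, b, n + 1)
    xp = 0.5 * (x[:-1] + x[1:]) if rule == "midpoint" else x[:-1]
    return float(np.sum(f(xp) * np.diff(x)))

# As the mesh (b - a)/n shrinks, the sums approach the integral, as in (35.27)
for n in (10, 100, 1000):
    print(n, riemann_sum(np.exp, 0.0, 1.0, n))   # tends to e - 1 = 1.71828...
```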

35.5 Classes of integrable functions


Armed with the integrability criteria of the previous section, we now study some important
classes of integrable functions.

35.5.1 Step functions


There is a class of functions closely related to plurirectangles that plays a central role in the
theory of integration.

Definition 1422 A function f : [a,b] → R is called a step function if there exist a subdivision π = {x_i}_{i=0}^n and a set {c_i}_{i=1}^n of constants such that

f(x) = c_i   for every x ∈ (x_{i−1}, x_i)   (35.28)

For example, the functions f, g : [a,b] → R given by

f(x) = Σ_{i=1}^{n−1} c_i 1_{[x_{i−1},x_i)}(x) + c_n 1_{[x_{n−1},x_n]}(x)   (35.29)

and

g(x) = c_1 1_{[x_0,x_1]}(x) + Σ_{i=2}^n c_i 1_{(x_{i−1},x_i]}(x)   (35.30)

are step functions where, for every set A in R, we denote by 1_A : R → R the indicator function

1_A(x) = 1 if x ∈ A ; 0 if x ∉ A   (35.31)

The two following figures give, for n = 4, examples of functions f and g described by (35.29) and (35.30). Note that f and g are, respectively, continuous from the right and from the left, that is, lim_{x→x_0⁺} f(x) = f(x_0) and lim_{x→x_0⁻} g(x) = g(x_0).

[Figure: the step functions f and g of (35.29) and (35.30) for n = 4, with heights c_1, c_2, c_3, c_4 on the four bases.]

On the intervals
[x_0, x_1) ∪ (x_1, x_2) ∪ (x_2, x_3) ∪ (x_3, x_4]

the two step functions generate the same plurirectangle

[Figure: the common plurirectangle, with heights c_1, c_2, c_3, c_4 on the bases determined by x_0 < x_1 < x_2 < x_3 < x_4.]

determined by the subdivision {x_i}_{i=0}^4 and by the constants {c_i}_{i=1}^4. Nevertheless, at the points x_1 < x_2 < x_3 the functions f and g differ, and it is easy to verify that on the entire interval [x_0, x_4] they do not generate this plurirectangle, as the next figure shows. Indeed, the dashed segment at x_2 is not under f and the dashed segments at x_1 and x_3 are not under g.

[Figure: the graphs of f and g again; the dashed segments at x_1, x_2, x_3 mark where each function falls short of the plurirectangle.]

But, thanks to Proposition 1420, such a discrepancy at a finite number of points is irrelevant for the integral. The next result shows that the area under the step functions f and g is, in fact, equal to that of the corresponding plurirectangle (independently of the values of the function at the points x_1 < x_2 < x_3).

Proposition 1423 A step function f : [a,b] → R, determined by a subdivision {x_i}_{i=0}^n and constants {c_i}_{i=1}^n according to (35.28), is integrable, with

∫_a^b f(x) dx = Σ_{i=1}^n c_i Δx_i   (35.32)

All the step functions that are determined by a subdivision {x_i}_{i=0}^n and a set of constants {c_i}_{i=1}^n according to (35.28) therefore share the same integral (35.32). In particular, this holds for the step functions (35.29) and (35.30).

Proof Since f is bounded, Lemma 1407 shows that ∫̲_a^b f(x) dx, ∫̄_a^b f(x) dx ∈ R. Let m = inf_{x∈[a,b]} f(x) and M = sup_{x∈[a,b]} f(x). Fix ε > 0 sufficiently small, and consider the subdivision π_ε given by

x_0 < x_1 − ε < x_1 + ε < x_2 − ε < x_2 + ε < ⋯ < x_{n−1} − ε < x_{n−1} + ε < x_n

We have

I(f, π_ε) = c_1 (x_1 − ε − x_0) + 2ε inf_{x∈[x_1−ε,x_1+ε]} f(x)
+ c_2 (x_2 − ε − x_1 − ε) + 2ε inf_{x∈[x_2−ε,x_2+ε]} f(x) + ⋯
+ 2ε inf_{x∈[x_{n−1}−ε,x_{n−1}+ε]} f(x) + c_n (x_n − x_{n−1} − ε)

= c_1 (Δx_1 − ε) + Σ_{i=1}^{n−1} 2ε inf_{x∈[x_i−ε,x_i+ε]} f(x) + Σ_{i=2}^{n−1} c_i (Δx_i − 2ε) + c_n (Δx_n − ε)

= Σ_{i=1}^n c_i Δx_i − ε(c_1 + c_n) + 2ε Σ_{i=1}^{n−1} inf_{x∈[x_i−ε,x_i+ε]} f(x) − 2ε Σ_{i=2}^{n−1} c_i

≥ Σ_{i=1}^n c_i Δx_i − 2εM + 2ε(n − 1)m − 2εM(n − 2) = Σ_{i=1}^n c_i Δx_i − 2ε(n − 1)(M − m)

In a similar way we show that

S(f, π_ε) ≤ Σ_{i=1}^n c_i Δx_i + 2ε(n − 1)(M − m)

Therefore, setting K = 2(n − 1)(M − m) ≥ 0, we have

S(f, π_ε) − I(f, π_ε) ≤ 2Kε

Since ε > 0 is arbitrary, Proposition 1419 shows that f is integrable. Moreover, since

I(f, π_ε) ≤ ∫_a^b f(x) dx ≤ S(f, π_ε)

we have

Σ_{i=1}^n c_i Δx_i − Kε ≤ ∫_a^b f(x) dx ≤ Σ_{i=1}^n c_i Δx_i + Kε

which, given the arbitrariness of ε > 0, guarantees that ∫_a^b f(x) dx = Σ_{i=1}^n c_i Δx_i.
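Formula (35.32) translates directly into code. A minimal sketch (the function name step_integral is illustrative only):

```python
import numpy as np

def step_integral(points, heights):
    """Integral of a step function, as in (35.32): the sum of c_i * Delta x_i.
    points = x_0 < x_1 < ... < x_n; heights = c_1, ..., c_n."""
    return float(np.sum(np.asarray(heights, dtype=float) * np.diff(points)))

# A step function on [0, 4] taking the values 1, 3, 2, 4 on four unit bases
print(step_integral([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))   # 10.0
```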

35.5.2 Analytic and geometric approaches


Step functions can be seen as the functional version of plurirectangles. They are, therefore, the simplest functions that one can integrate. In particular, thanks to formula (35.32), the lower and upper integrals can be expressed in terms of integrals of step functions. Let S([a,b]) be the set of all step functions defined on [a,b].

Proposition 1424 Given a bounded function f : [a,b] → R, we have

∫̲_a^b f(x) dx = sup { ∫_a^b h(x) dx : h ≤ f and h ∈ S([a,b]) }   (35.33)

and

∫̄_a^b f(x) dx = inf { ∫_a^b h(x) dx : h ≥ f and h ∈ S([a,b]) }   (35.34)

Thus, a bounded function f : [a,b] → R is Riemann integrable if and only if

sup { ∫_a^b h(x) dx : h ≤ f and h ∈ S([a,b]) } = inf { ∫_a^b h(x) dx : f ≤ h and h ∈ S([a,b]) }

That is, if and only if the lower approximation given by the integrals of step functions smaller than f coincides, at the limit, with the upper approximation given by the integrals of step functions larger than f. In this case the method of exhaustion assumes a more analytic and less geometric aspect,¹⁰ with the approximation by elementary polygons (the plurirectangles) replaced by the one given by elementary functions (the step functions).
This suggests a different approach to the Riemann integral, more analytic and less geometric. In such an approach, we first define the integrals of step functions (that is, the areas under them), which can be determined on the basis of elementary geometric considerations based on plurirectangles. We then use these “elementary” integrals to suitably approximate the areas under more complicated functions. In particular, we define the lower integral of a bounded function f : [a,b] → R as the best approximation “from below” obtained by means of step functions h ≤ f and, analogously, the upper integral of a bounded function f : [a,b] → R as the best approximation “from above” obtained by means of step functions h ≥ f.
Thanks to (35.33) and (35.34), this more analytic interpretation of the method of exhaustion is equivalent to the geometric one previously adopted. The analytic approach is quite fruitful, as readers will learn in more advanced courses.

35.5.3 Continuous functions and monotonic functions


We now consider two important classes of integrable functions: the continuous functions and the monotonic ones.

Proposition 1425 Every continuous function f : [a,b] → R is integrable.

Proof Since f is continuous on [a,b], by Weierstrass' Theorem f is bounded. Let ε > 0. By Theorem 526, f is uniformly continuous, that is, there exists δ_ε > 0 such that

|x − y| < δ_ε ⟹ |f(x) − f(y)| < ε   for all x, y ∈ [a,b]   (35.35)

Let π = {x_i}_{i=0}^n be a subdivision of [a,b] such that |π| < δ_ε. By (35.35), for every i = 1, 2, …, n we therefore have

max_{x∈[x_{i−1},x_i]} f(x) − min_{x∈[x_{i−1},x_i]} f(x) < ε

¹⁰That is, based also on the use of notions of analysis, such as functions, and not only on that of geometric figures, such as plurirectangles.

where max and min exist thanks to Weierstrass' Theorem. It follows that

S(f, π) − I(f, π) = Σ_{i=1}^n max_{x∈[x_{i−1},x_i]} f(x) Δx_i − Σ_{i=1}^n min_{x∈[x_{i−1},x_i]} f(x) Δx_i
= Σ_{i=1}^n (max_{x∈[x_{i−1},x_i]} f(x) − min_{x∈[x_{i−1},x_i]} f(x)) Δx_i
< ε Σ_{i=1}^n Δx_i = ε (b − a)

By Proposition 1419, f is integrable.

Because of the stability of the integral seen in Proposition 1420, we have the following immediate generalization of the last result: every bounded function f : [a,b] → R that has at most a finite number of removable discontinuities is integrable. Indeed, recalling (12.7) of Chapter 12, if S = {x_i}_{i=1}^n is the set of points where f has removable discontinuities, the function

f̃(x) = f(x) if x ∉ S ; lim_{y→x} f(y) if x ∈ S

is continuous (so, integrable) and is equal to f except at the points of S.
More is true: the hypothesis that the discontinuities are removable is actually superfluous, and we can allow for countably many points of discontinuity (but not more than that).

Theorem 1426 Every bounded function f : [a,b] → R with at most countably many discontinuities is integrable.

Therefore, a function is integrable if its points of discontinuity form a finite or a countable set. We omit the proof (which is less easy than that of the special case, just seen, of finitely many removable discontinuities).
This important integrability result generalizes Proposition 1425 as well as Proposition 1423 on the integrability of step functions (which obviously are continuous, except at the points of the subdivisions that define them). Let us see a couple of examples.

Example 1427 (i) The function f : [0,1] → R given by

f(x) = x if x ∈ (0,1) ; 1/2 if x ∈ {0,1}

is continuous at all the points of [0,1], except at the two endpoints 0 and 1. By Theorem 1426, the function f is integrable.
(ii) Consider the countable set

E = {1/n : n ≥ 1} ⊆ [0,1]

The function f : [0,1] → R defined by

f(x) = x² if x ∉ E ; 0 if x ∈ E

is continuous at all the points of [0,1], except at the points of E.¹¹ Since E is a countable set, by Theorem 1426 the function f is integrable. N

¹¹Note that f is continuous at the origin, as the reader can verify.

Note that the Dirichlet function f : [0,1] → R

f(x) = 1 if x ∈ Q ∩ [0,1] ; 0 if x ∉ Q ∩ [0,1]

which we know to be non-integrable, does not satisfy the hypotheses of Theorem 1426. Indeed, even if it is bounded, f is discontinuous at each point of [0,1] – not only at the points x ∈ Q ∩ [0,1], which form a countable set.

Let us now consider monotonic functions.

Proposition 1428 Every monotonic function f : [a,b] → R is integrable.

The result follows immediately from Theorem 1426 because monotonic functions have at most countably many points of discontinuity (Proposition 483). Next we give, however, a simple direct proof of the result.

 0. Let">
Proof Let ε > 0 and let π = {x_i}_{i=0}^n be a subdivision of [a,b] such that |π| < ε. Let us suppose that f is increasing (the argument for f decreasing is analogous). We have

inf_{x∈[x_{i−1},x_i]} f(x) = f(x_{i−1}) ≥ f(a)   and   sup_{x∈[x_{i−1},x_i]} f(x) = f(x_i) ≤ f(b)

and therefore

S(f, π) − I(f, π) = Σ_{i=1}^n sup_{x∈[x_{i−1},x_i]} f(x) Δx_i − Σ_{i=1}^n inf_{x∈[x_{i−1},x_i]} f(x) Δx_i
= Σ_{i=1}^n f(x_i) Δx_i − Σ_{i=1}^n f(x_{i−1}) Δx_i = Σ_{i=1}^n (f(x_i) − f(x_{i−1})) Δx_i
≤ |π| Σ_{i=1}^n (f(x_i) − f(x_{i−1})) < ε (f(b) − f(a))

By Proposition 1419, the function f is integrable.
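For an increasing f and a uniform subdivision, the telescoping bound in the proof is exact: S(f, π) − I(f, π) = ((b − a)/n)(f(b) − f(a)). A quick numerical confirmation (a sketch, assuming f increasing):

```python
import numpy as np

def monotone_gap(f, a, b, n):
    """S(f, pi) - I(f, pi) for an increasing f on the uniform subdivision:
    sup and inf on each base are the endpoint values, so the sum telescopes."""
    x = np.linspace(a, b, n + 1)
    return float(np.sum((f(x[1:]) - f(x[:-1])) * np.diff(x)))

f = np.exp   # increasing on [0, 1]
for n in (10, 100, 1000):
    print(n, monotone_gap(f, 0.0, 1.0, n), (np.e - 1.0) / n)   # the two agree
```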

35.6 Properties of the integral


The first important property of the integral is its linearity: the integral of a linear combination of functions is equal to the linear combination of their integrals.

Theorem 1429 Let f, g : [a,b] → R be two bounded and integrable functions. Then, for every α, β ∈ R the function αf + βg : [a,b] → R is integrable, with

∫_a^b (αf + βg)(x) dx = α ∫_a^b f(x) dx + β ∫_a^b g(x) dx   (35.36)

Proof The proof is divided into two parts. First we prove homogeneity, that is,

∫_a^b αf(x) dx = α ∫_a^b f(x) dx   for every α ∈ R   (35.37)

Then we prove additivity, that is,

∫_a^b (f + g)(x) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx   (35.38)

whenever f and g are integrable. Together, relations (35.37) and (35.38) are equivalent to (35.36).
(i) Homogeneity. Let π = {x_i}_{i=0}^n be a subdivision of [a,b]. If α ≥ 0 we have I(αf, π) = αI(f, π) and S(αf, π) = αS(f, π). Therefore, αf is integrable, with

∫_a^b αf(x) dx = α ∫_a^b f(x) dx   for every α ≥ 0   (35.39)

Let now α < 0. Let us start by considering the case α = −1. We have

I(−f, π) = Σ_{i=1}^n inf_{x∈[x_{i−1},x_i]} (−f)(x) Δx_i = Σ_{i=1}^n (−sup_{x∈[x_{i−1},x_i]} f(x)) Δx_i = −Σ_{i=1}^n sup_{x∈[x_{i−1},x_i]} f(x) Δx_i = −S(f, π)

In a similar way, we have S(−f, π) = −I(f, π). Let ε > 0. Since f is integrable, by Proposition 1419 there exists π such that S(f, π) − I(f, π) < ε. Therefore, S(−f, π) − I(−f, π) = S(f, π) − I(f, π) < ε, which implies, by Proposition 1419, that −f is integrable. Moreover,

∫_a^b (−f)(x) dx = sup_{π∈Π} I(−f, π) = sup_{π∈Π} (−S(f, π)) = −inf_{π∈Π} S(f, π) = −∫_a^b f(x) dx

Now let α < 0. We have αf = (−α)(−f) with −α > 0. Then, by applying (35.39) we obtain

∫_a^b αf(x) dx = ∫_a^b (−α)(−f)(x) dx = (−α) ∫_a^b (−f)(x) dx = (−α)(−∫_a^b f(x) dx) = α ∫_a^b f(x) dx

Therefore,

∫_a^b αf(x) dx = α ∫_a^b f(x) dx   for every α ∈ R   (35.40)

that is, (35.37).
(ii) Additivity. Let us prove (35.38). Let ε > 0. Since f and g are integrable, by Proposition 1419 there exists a subdivision π of [a,b] such that S(f, π) − I(f, π) < ε and there exists π′ such that S(g, π′) − I(g, π′) < ε. Let π″ be a subdivision of [a,b] that refines

both π and π′. Thanks to (35.6), we have S(f, π″) − I(f, π″) < ε and S(g, π″) − I(g, π″) < ε. Moreover, by applying the inequalities of Lemma 1416 on each base of π″,

I(f, π″) + I(g, π″) ≤ I(f + g, π″) ≤ S(f + g, π″) ≤ S(f, π″) + S(g, π″)   (35.41)

and therefore

S(f + g, π″) − I(f + g, π″) ≤ (S(f, π″) − I(f, π″)) + (S(g, π″) − I(g, π″)) < 2ε

By Proposition 1419, f + g is integrable. Hence, (35.41) becomes

I(f, π) + I(g, π) ≤ ∫_a^b (f + g)(x) dx ≤ S(f, π) + S(g, π)

for every subdivision π ∈ Π. By subtracting ∫_a^b f(x) dx + ∫_a^b g(x) dx from all three members of the inequality, we obtain

(I(f, π) − ∫_a^b f(x) dx) + (I(g, π) − ∫_a^b g(x) dx) ≤ ∫_a^b (f + g)(x) dx − (∫_a^b f(x) dx + ∫_a^b g(x) dx) ≤ (S(f, π) − ∫_a^b f(x) dx) + (S(g, π) − ∫_a^b g(x) dx)

Since f and g are integrable, given any ε > 0 we can find a subdivision π_ε such that, for h = f, g, we have

I(h, π_ε) − ∫_a^b h(x) dx > −ε/2   and   S(h, π_ε) − ∫_a^b h(x) dx < ε/2

Therefore,

−ε < ∫_a^b (f + g)(x) dx − (∫_a^b f(x) dx + ∫_a^b g(x) dx) < ε

and, given the arbitrariness of ε > 0, one necessarily has

∫_a^b (f + g)(x) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx   (35.42)

that is, (35.38).

An important consequence of the linearity of the integral is that the product of two
integrable functions is integrable.

Corollary 1430 If f, g : [a,b] → R are two bounded and integrable functions, then their product fg : [a,b] → R is integrable.

Proof If f = g, the integrability of f² follows from Proposition 1421 by considering the continuous function g(x) = x². If f ≠ g, then fg can be rewritten as

fg = (1/4) [(f + g)² − (f − g)²]

By Theorem 1429, f + g and f − g are integrable. By what has just been proved, their squares are also integrable. By applying Theorem 1429 again, we conclude that fg is integrable.

O.R. Thanks to the linearity of the integral, knowing the integrals of f and g allows one to calculate the integral of f + g. It is not so for the product or for the composition of integrable functions: the integrability of f guarantees the integrability of f², but knowing the value of the integral of f does not help in the calculation of the integral of f² – indeed, in general ∫_a^b f²(x) dx ≠ (∫_a^b f(x) dx)². More generally, knowing that g ∘ f is integrable does not give any useful indication for the computation of the integral of the composite function. H

Finally, the linearity of the integral implies that it is possible to freely subdivide the domain of integration [a,b] into subintervals.

Corollary 1431 Let f : [a,b] → R be a bounded and integrable function. If a < c < b, then

∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx   (35.43)

Vice versa, if f_1 : [a,c] → R and f_2 : [c,b] → R are bounded and integrable, then the function f : [a,b] → R defined by

f(x) = f_1(x) if x ∈ [a,c] ; f_2(x) if x ∈ (c,b]

is also bounded and integrable, with

∫_a^b f(x) dx = ∫_a^c f_1(x) dx + ∫_c^b f_2(x) dx

Proof Let us prove the first part. Since (recall the definition (35.31) of the indicator function)

f = 1_{[a,c]} f + 1_{(c,b]} f

the linearity of the integral implies that

∫_a^b f(x) dx = ∫_a^b (1_{[a,c]} f + 1_{(c,b]} f)(x) dx = ∫_a^b (1_{[a,c]} f)(x) dx + ∫_a^b (1_{(c,b]} f)(x) dx

Let us show that

∫_a^b (1_{[a,c]} f)(x) dx = ∫_a^c f(x) dx

where

(1_{[a,c]} f)(x) = f(x) if x ∈ [a,c] ; 0 if x ∈ (c,b]

Let ε > 0. Since 1_{[a,c]} f is integrable (being a product of integrable functions),¹² by Proposition 1419 there exists a subdivision π of [a,b] such that

S(1_{[a,c]} f, π) − I(1_{[a,c]} f, π) < ε

Let π′ = {x_i}_{i=0,1,…,n} be a refinement of π that has c as a point of subdivision, say c = x_j. Then we have

S(1_{[a,c]} f, π′) − I(1_{[a,c]} f, π′) < ε

Let π″ = π′ ∩ [a,c]. In other words, π″ = {x_0, x_1, …, x_j} is the restriction of the subdivision π′ to the interval [a,c]. Employing the usual notation m_i and M_i for every i = 1, 2, …, n, since m_i = M_i = 0 for i > j, we have

I(1_{[a,c]} f, π′) = Σ_{i=1}^n m_i Δx_i = Σ_{i≤j} m_i Δx_i = I(f|_{[a,c]}, π″)   (35.44)

and

S(1_{[a,c]} f, π′) = Σ_{i=1}^n M_i Δx_i = Σ_{i≤j} M_i Δx_i = S(f|_{[a,c]}, π″)   (35.45)

Therefore,

S(f|_{[a,c]}, π″) − I(f|_{[a,c]}, π″) < ε

By Proposition 1419 we conclude that f|_{[a,c]} : [a,c] → R is integrable. Moreover, from (35.44) and (35.45) we deduce that

∫_a^b (1_{[a,c]} f)(x) dx = ∫_a^c f|_{[a,c]}(x) dx = ∫_a^c f(x) dx

In a similar way we prove that

∫_a^b (1_{(c,b]} f)(x) dx = ∫_c^b f(x) dx

and therefore (35.43) follows.
Let us prove the second part. Let ε > 0. Since f_1 is integrable, there exists a subdivision π′ of [a,c] such that

S(f_1, π′) − I(f_1, π′) < ε

Since f_2 is integrable, there exists a subdivision π″ of [c,b] such that

S(f_2, π″) − I(f_2, π″) < ε

¹²The indicator function, being a step function, is integrable.

Therefore, taking the subdivision of [a,b] given by π = π′ ∪ π″, we get

S(f, π) − I(f, π) = (S(f_1, π′) − I(f_1, π′)) + (S(f_2, π″) − I(f_2, π″)) < 2ε

which shows that f is integrable. Moreover,

f|_{[a,c]} = f_1   and   f|_{(c,b]} = f_2

So f = 1_{[a,c]} f_1 + 1_{(c,b]} f_2 and, by the linearity of the integral and the first part of the proof,

∫_a^b f(x) dx = ∫_a^b (1_{[a,c]} f_1)(x) dx + ∫_a^b (1_{(c,b]} f_2)(x) dx = ∫_a^c f_1(x) dx + ∫_c^b f_2(x) dx

as desired.

The next property, the monotonicity of the integral, shows that to larger functions there correspond larger integrals. The writing f ≤ g means f(x) ≤ g(x) for every x ∈ [a,b], i.e., the function f is pointwise smaller than the function g.

Theorem 1432 Let f, g : [a,b] → R be two bounded and integrable functions. If f ≤ g, then ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx.

Proof From f ≤ g it follows that

I(f, π) ≤ I(g, π)   and   S(f, π) ≤ S(g, π)   for every π ∈ Π

which in turn implies ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx.

From the monotonicity of the integral we obtain an important inequality between “absolute values of integrals” and “integrals of absolute values”, the latter being larger. In reading the result, keep in mind that, as observed after Proposition 1421, the integrability of |f| follows from that of f.

Corollary 1433 Let f : [a,b] → R be a bounded and integrable function. We have

|∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx   (35.46)

Proof Since f ≤ |f| and −f ≤ |f|, from Theorem 1432 it follows that ∫_a^b f(x) dx ≤ ∫_a^b |f(x)| dx and −∫_a^b f(x) dx ≤ ∫_a^b |f(x)| dx. So, |∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx.

The monotonicity of the integral allows us to establish an interesting sandwich property for integrals.

Proposition 1434 Let f : [a,b] → R be a bounded and integrable function. Then, setting m = inf_{[a,b]} f(x) and M = sup_{[a,b]} f(x), we have

m (b − a) ≤ ∫_a^b f(x) dx ≤ M (b − a)   (35.47)

Proof We have

m ≤ f(x) ≤ M   for every x ∈ [a,b]

whence, by the monotonicity of the integral,

∫_a^b m dx ≤ ∫_a^b f(x) dx ≤ ∫_a^b M dx

Clearly, ∫_a^b m dx = m(b − a) (it is the area of a rectangle of base b − a and height m) and ∫_a^b M dx = M(b − a).¹³ This shows that (35.47) holds.

¹³Note that m(b − a) and M(b − a) are the areas of the rectangles of base b − a and height m and M, respectively.

We end with the classic Integral Mean Value Theorem, which follows from the previous sandwich property.

Theorem 1435 (Integral Mean Value) Let f : [a,b] → R be a bounded and integrable function. Then, setting m = inf_{[a,b]} f(x) and M = sup_{[a,b]} f(x), there exists a scalar γ ∈ [m,M] such that

∫_a^b f(x) dx = γ (b − a)   (35.48)

In particular, if f is continuous, there exists c ∈ [a,b] such that f(c) = γ, that is,

∫_a^b f(x) dx = f(c)(b − a)

Expression (35.48) can be rewritten as

γ = (1/(b − a)) ∫_a^b f(x) dx

For this reason, γ is called the mean value (of the ordinates) of f: the value of the integral does not change if we replace all the ordinates of the function by the constant value γ.

Proof By (35.47), we have

m ≤ (∫_a^b f(x) dx) / (b − a) ≤ M

By setting

γ = (∫_a^b f(x) dx) / (b − a)

we obtain the first part of the statement. To prove the second part, assume that f is continuous. By the Intermediate Value Theorem, f assumes all the values between its minimum m and its maximum M. Therefore, there exists c ∈ [a,b] such that f(c) = γ, which completes the proof.
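For a continuous, increasing f, the point c can be located by bisection, itself an application of the Intermediate Value Theorem. Here is a sketch with f(x) = eˣ on [0,1], so that γ = e − 1 (an illustrative choice, not from the text):

```python
import numpy as np

f, a, b = np.exp, 0.0, 1.0
gamma = (np.e - 1.0) / (b - a)   # the mean value of f on [a, b]

# f is continuous and gamma lies between f(a) = 1 and f(b) = e, so some
# c in [a, b] satisfies f(c) = gamma; bisection locates it (f is increasing)
lo, hi = a, b
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < gamma else (lo, mid)
print(lo, f(lo), gamma)   # c = log(e - 1) = 0.5413..., and f(c) = gamma
```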

O.R. The Integral Mean Value Theorem is quite intuitive: there exists a rectangle with base [a,b] and height γ whose area equals the one under f on [a,b]:

[Figure: the graph of f on [a,b] and the rectangle with base [a,b] and height γ having the same area as the region under f.]

If, moreover, the function f is continuous, the height of such a rectangle coincides with
one of the ordinates of f. H

N.B. Given a function f : [a,b] → R, until now we have considered the definite integral of f from a to b, that is, ∫_a^b f(x) dx. Sometimes it is useful to consider the integral of f from b to a, that is, ∫_b^a f(x) dx, as well as the integral of f from a to a, that is, ∫_a^a f(x) dx.¹⁴ What do we mean by such expressions? By convention, we set, for a < b,

∫_b^a f(x) dx = −∫_a^b f(x) dx   (35.49)

and

∫_a^a f(x) dx = 0   (35.50)

Thanks to these conventions, it is no longer essential that in ∫_a^b we have a < b: in the case in which a ≥ b the integral assumes the meaning given to it by (35.49) and (35.50). Moreover, it is possible to prove that the properties established for the integral ∫_a^b f(x) dx also hold in the case a ≥ b. O

35.7 Integral calculus


After having introduced the Riemann integral and studied its main properties, we turn our attention to its actual calculation, for which the definition is of little help (even if it is, obviously, essential to understand its nature).
In this section we study the central results of integral calculus, termed “fundamental” to emphasize their importance. Inter alia, we will show how integration can be seen as the inverse of the operation of differentiation, something that greatly simplifies the computation of integrals.

¹⁴This happens, for example, if f is integrable on an interval [a,b] and we take two generic points x, y ∈ [a,b], without specifying whether x < y or x ≥ y, and then consider the integral of f between x and y.

In the study of differentiability, we have considered functions differentiable on an open interval (a,b), or at least at the interior points of their domain. In this section we will consider functions f : [a,b] → R that are differentiable on [a,b], where the derivatives at the endpoints a and b are taken as one-sided. In a similar way we talk of differentiability on the half-open intervals (a,b] and [a,b).

35.7.1 Primitive functions


Even if we will be mainly interested in functions defined on closed and bounded intervals [a,b], in this section we consider, more generally, any interval I of the real line, be it open, closed, or half-open, bounded or unbounded (for example, I can be the entire real line R).

Definition 1436 Let f : I → R. A function P : I → R is called a primitive of f on I if it is differentiable on I and

P′(x) = f(x)   for every x ∈ I

In other words, moving from the function f to its primitive P can be seen as the inverse procedure with respect to moving from P to f through differentiation. In this sense, the primitive function is the inverse of the derivative function (indeed, it is sometimes called the antiderivative).
Let us provide a couple of examples. Here it is important to keep in mind that, as Example 1442 will show, a function might not have a primitive, so the search for a primitive might be in vain. In any case, by Corollary 999 a necessary condition for a function f to have a primitive is that it has no removable or jump discontinuities.

Example 1437 Let f : [0,1] → R be given by f(x) = x. The function P : [0,1] → R given by P(x) = x²/2 is a primitive of f. Indeed, P′(x) = 2x/2 = x. N

Example 1438 Let f : R → R be given by f(x) = x/(1 + x²). The function P : R → R given by

P(x) = (1/2) log(1 + x²)

is a primitive of f. Indeed,

P′(x) = (1/2) · (1/(1 + x²)) · 2x = f(x)

for every x ∈ R. N

N.B. If I_1 and I_2 are two nested intervals, with I_1 ⊆ I_2, then a primitive of f on I_2 is also a primitive on I_1. For example, if we consider the restriction of f(x) = x/(1 + x²) to [0,1], that is, the function f̃ : [0,1] → R given by f̃(x) = x/(1 + x²), then the primitive on [0,1] remains P(x) = (1/2) log(1 + x²). O

If P is a primitive of f, then the function P + k obtained by adding a constant k to P is also a primitive of f. Indeed, (P + k)′(x) = P′(x) = f(x) for every x ∈ I. The next result shows that, up to such translations, the primitive function is unique.

Proposition 1439 Let f : I → R and let P_1 : I → R be a primitive function of f. A function P_2 : I → R is a primitive of f on I if and only if there exists a constant k ∈ R such that

P_1 = P_2 + k

Proof The “if” is obvious. Let us prove the “only if”. Let I = [a,b] and let P_1, P_2 : [a,b] → R be two primitive functions of f on [a,b]. Since P_1′(x) = f(x) and P_2′(x) = f(x) for every x ∈ [a,b], we have

(P_1 − P_2)′(x) = P_1′(x) − P_2′(x) = 0   for every x ∈ [a,b]

Therefore, the function P_1 − P_2 has zero derivative on [a,b]. The Mean Value Theorem, via Corollary 995, implies that the function P_1 − P_2 is constant, that is, there exists k ∈ R such that P_1 = P_2 + k.
Let now I be an open and bounded interval (a,b). Let ε > 0 be sufficiently small so that a + ε < b − ε. We have

(a,b) = ∪_{n=1}^∞ [a + ε/n, b − ε/n]

By what has just been proved, for every n ≥ 1 there exists a constant k_n ∈ R such that

P_1(x) = P_2(x) + k_n   for every x ∈ [a + ε/n, b − ε/n]   (35.51)

Let x_0 ∈ (a,b) be such that a + ε < x_0 < b − ε, so that x_0 ∈ [a + ε/n, b − ε/n] for every n ≥ 1. From (35.51) it follows that P_1(x_0) = P_2(x_0) + k_n for every n ≥ 1. Therefore, k_n = P_1(x_0) − P_2(x_0) for every n ≥ 1, that is, the constants k_n are all equal. There exists, therefore, k ∈ R such that P_1(x) = P_2(x) + k for every x ∈ (a,b).
In a similar way one can show the result when I is a half-open and bounded interval (a,b] or [a,b). If I = R, we proceed as in the case (a,b), observing that R = ∪_{n=1}^∞ [−n, n]. A similar argument, which we leave to the reader, holds also for unbounded intervals.

This proposition is another important application of the Mean Value Theorem (of differential calculus). Thanks to it, once a primitive P of a function f is identified, we can write the family of all its primitives as {P + k}_{k∈R}. This important family deserves a name.

Definition 1440 Given a function f : I → R, the family of all its primitives is called the indefinite integral of f and is denoted by

∫ f(x) dx

Example 1441 Let us go back to Examples 1437 and 1438. For the function f : [0,1] → R given by f(x) = x, we have

∫ f(x) dx = x²/2 + k

For the function f : R → R given by f(x) = x/(1 + x²) we have

∫ f(x) dx = (1/2) log(1 + x²) + k

N

We close the section by showing that not all functions admit a primitive, and hence an indefinite integral.

Example 1442 The signum function sgn : R → R given by

sgn(x) = 1 if x > 0 ; 0 if x = 0 ; −1 if x < 0

does not admit a primitive. Let us suppose, by contradiction, that there exists a primitive P : R → R, i.e., a differentiable function such that P′(x) = sgn x. By Proposition 1439 applied on each of the intervals (0, +∞) and (−∞, 0), and since the continuity of P at 0 forces the two constants to coincide, there exists k ∈ R such that

P(x) = x + k if x > 0 ; −x + k if x < 0

Since P is differentiable, by continuity we also have P(0) = k. Therefore, P(x) = |x| + k for every x ∈ R, but this function is not differentiable at the origin, which contradicts what has been assumed on P. Note that the signum function, being a step function, is integrable by Proposition 1423. N

The Riemann integral ∫_a^b f(x) dx is often called a definite integral to distinguish it from the indefinite integral just introduced. Note that the indefinite integral is a differential calculus notion. The Riemann integral, with its connection to the method of exhaustion, is a conceptually much deeper notion.

35.7.2 Formulary
The next table, obtained by “reversing” the corresponding table of basic derivatives, records some fundamental indefinite integrals.

f(x)              ∫ f(x) dx
x^a               x^{a+1}/(a+1) + k     (−1 ≠ a ∈ R and x > 0)
x^n               x^{n+1}/(n+1) + k     (x ∈ R)
1/x               log x + k             (x > 0)
1/x               log(−x) + k           (x < 0)
cos x             sin x + k             (x ∈ R)
sin x             −cos x + k            (x ∈ R)
e^x               e^x + k               (x ∈ R)
α^x               α^x / log α + k       (1 ≠ α > 0 and x ∈ R)
1/√(1 − x²)       arcsin x + k          (x ∈ (−1, 1))
1/(1 + x²)        arctan x + k          (x ∈ R)
1/(cos x)²        tan x + k             (x ≠ π/2 + nπ, n ∈ Z)
We make three observations:

(i) For powers, we have

∫ x^a dx = x^{a+1}/(a+1) + k   for every a ≠ −1

on the entire real line R when a is such that the power function x^a has R as its domain: for example, if a ∈ N. In general, if a ∈ R we might need to require x > 0 (e.g., if a = 1/2).

(ii) The case a = −1 for powers is covered by f(x) = 1/x.

(iii) Note that

∫ f(x) dx = log x + k if x > 0 ; log(−x) + k′ if x < 0

summarizes the cases x < 0 and x > 0 for f(x) = 1/x. In this regard, note that for x < 0 and g(x) = log(−x) one has

g′(x) = (1/(−x)) · (−1) = 1/x
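Each row of the table can be sanity-checked numerically: a primitive P of f must satisfy P′ ≈ f, and the derivative can be approximated by a central difference quotient. A sketch (the step h and the test points are arbitrary choices, and the helper name is illustrative):

```python
import numpy as np

def max_error(P, f, xs, h=1e-6):
    """Largest gap between the central difference quotient of P and f."""
    xs = np.asarray(xs, dtype=float)
    dP = (P(xs + h) - P(xs - h)) / (2.0 * h)
    return float(np.max(np.abs(dP - f(xs))))

xs = np.linspace(-3.0, 3.0, 61)
print(max_error(np.sin, np.cos, xs))                           # primitive of cos x
print(max_error(np.arctan, lambda x: 1.0 / (1.0 + x**2), xs))  # of 1/(1 + x^2)
# both errors are tiny, consistent with P' = f for these rows of the table
```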

35.7.3 The First Fundamental Theorem of Calculus

The next theorem, called the First Fundamental Theorem of Calculus, is a central result in the theory of integration. Conceptually, it shows that integration can be seen as the inverse operation of differentiation. This, in turn, offers a powerful method of computation of integrals based on the use of primitive functions.

Theorem 1443 (First Fundamental Theorem of Calculus) Let P : [a,b] → R be a primitive function of f : [a,b] → R. If f is integrable, then

∫_a^b f(x) dx = P(b) − P(a)   (35.52)

In view of formula (35.52), the computation of the Riemann integral ∫_a^b f(x) dx reduces to the computation of a primitive P of f, that is, to the computation of the indefinite integral. As we saw in the last section, this can be carried out by using the differentiation rules studied in Chapter 20. In a sense, formula (35.52) reduces integral calculus to differential calculus.

Proof Let π = {x_i}_{i=0}^n be a subdivision of [a,b]. Adding and subtracting P(x_i) for every i = 1, 2, …, n − 1, we have

P(b) − P(a) = P(x_n) − P(x_{n−1}) + P(x_{n−1}) − ⋯ − P(x_1) + P(x_1) − P(x_0) = Σ_{i=1}^n (P(x_i) − P(x_{i−1}))

Let us consider P on [x_{i−1}, x_i]. Since P is continuous on [x_{i−1}, x_i] and differentiable on (x_{i−1}, x_i), by the Mean Value Theorem there exists x̂_i ∈ (x_{i−1}, x_i) such that

P′(x̂_i) = (P(x_i) − P(x_{i−1})) / (x_i − x_{i−1})

Since P is a primitive, we have

f(x̂_i) = P′(x̂_i) = (P(x_i) − P(x_{i−1})) / (x_i − x_{i−1})

and hence

P(b) − P(a) = Σ_{i=1}^n (P(x_i) − P(x_{i−1})) = Σ_{i=1}^n f(x̂_i)(x_i − x_{i−1}) = Σ_{i=1}^n f(x̂_i) Δx_i

which implies

I(f, π) ≤ P(b) − P(a) ≤ S(f, π)   (35.53)

Since π is an arbitrary subdivision, (35.53) holds for every π ∈ Π and therefore

sup_{π∈Π} I(f, π) ≤ P(b) − P(a) ≤ inf_{π∈Π} S(f, π)

from which, since f is integrable, we obtain (35.52).

Let us illustrate the theorem with some examples, which use again the primitives computed in Examples 1437 and 1438.

Example 1444 Let f : R → R be given by f(x) = x. We have P(x) = x²/2 and therefore, thanks to (35.52),

∫_a^b x dx = b²/2 − a²/2

For example, ∫_0^1 x dx = 1/2. More generally, let f : R → R be given by a power f(x) = x^n. Clearly, we have P(x) = x^{n+1}/(n+1). So, by (35.52),

∫_a^b x^n dx = b^{n+1}/(n+1) − a^{n+1}/(n+1)

In particular, ∫_0^1 x^n dx = 1/(n+1). N

Example 1445 Let f : R → R be given by f(x) = x/(1 + x²). As we saw in Example 1438, a primitive P : R → R is given by P(x) = (1/2) log(1 + x²). Therefore, thanks to (35.52),

∫_a^b x/(1 + x²) dx = (1/2) log(1 + b²) − (1/2) log(1 + a²)

For example,

∫_0^1 x/(1 + x²) dx = (1/2) log 2 − 0 = (log 2)/2

N
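Formula (35.52) is easy to test numerically: a Riemann sum of f should match P(b) − P(a). A sketch (using the midpoint sum as a crude stand-in for the integral; the helper name is illustrative):

```python
import numpy as np

def midpoint_sum(f, a, b, n=100000):
    x = np.linspace(a, b, n + 1)
    return float(np.sum(f(0.5 * (x[:-1] + x[1:])) * np.diff(x)))

f = lambda x: x / (1.0 + x**2)
P = lambda x: 0.5 * np.log(1.0 + x**2)

a, b = 0.0, 1.0
print(midpoint_sum(f, a, b))   # 0.34657..., a numerical stand-in for the integral
print(P(b) - P(a))             # (log 2)/2 = 0.34657..., as (35.52) predicts
```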

For integrable functions without primitives, such as the signum function, the last theorem cannot be applied and the calculation of integrals cannot be done through formula (35.52). In some simple cases it is, however, possible to calculate the integral using the definition directly. For example, the signum function is a step function, and therefore we can apply Proposition 1423, in which, using the definition of the integral, we determined the value of the integral for this class of functions. In particular, we have

∫_a^b sgn x dx = b − a if a ≥ 0 ; a + b if a < 0 < b ; a − b if b ≤ 0

The cases a ≥ 0 and b ≤ 0 are obvious using (35.32). Let us consider the case a < 0 < b. Using (35.32) and (35.43), we have

∫_a^b sgn x dx = ∫_a^0 sgn x dx + ∫_0^b sgn x dx = ∫_a^0 (−1) dx + ∫_0^b 1 dx = (−1)(0 − a) + (1)(b − 0) = a + b

35.7.4 The Second Fundamental Theorem of Calculus


In light of the First Fundamental Theorem, it is natural to look for conditions that guarantee that an integrable function f : [a,b] → R indeed has a primitive. To this end, we introduce an important notion.

Definition 1446 Let f : [a, b] → R be an integrable function. The function F : [a, b] → R given by
$$F(x) = \int_a^x f(t)\,dt \qquad \forall x \in [a,b]$$
is called the integral function of f.

In other words, the value F(x) of the integral function is the (signed) area under f on the interval [a, x], as x varies.¹⁵

N.B. The integral function F(x) = ∫_a^x f(t) dt is a function F : [a, b] → R whose variable is the upper limit of integration x: as x varies, it determines a different Riemann integral ∫_a^x f(t) dt. The value of this integral (which is a scalar) is the image F(x) of the integral function. In this regard, note that F is defined on all of [a, b] since, f being integrable on this interval, it is integrable on all the subintervals [a, x] ⊆ [a, b]. O

Let us establish a first property of integral functions.

Proposition 1447 The integral function F : [a, b] → R of an integrable bounded function f : [a, b] → R is (uniformly) continuous.

Proof Since f is bounded, there exists M > 0 such that |f(x)| ≤ M for every x ∈ [a, b]. Let x, y ∈ [a, b]. By the definition of the integral function, we have F(x) − F(y) = ∫_y^x f(t) dt. Thanks to (35.46), we have
$$|F(x) - F(y)| = \left|\int_y^x f(t)\,dt\right| \le \left|\int_y^x |f(t)|\,dt\right| \le \left|\int_y^x M\,dt\right| = M|x-y|$$
Therefore, for every ε > 0, by setting δ_ε = ε/M we have
$$|x-y| < \delta_\varepsilon \implies |F(x) - F(y)| < \varepsilon \qquad \forall x, y \in [a,b]$$
By Theorem 526, F is uniformly continuous on [a, b]. □

Armed with the notion of integral function, we can address the problem that opened the section: the next important result, the Second Fundamental Theorem of Calculus, shows that the integral function is a primitive of a continuous f. Continuity is, therefore, a simple condition that guarantees the existence of a primitive.

Theorem 1448 (Second Fundamental Theorem of Calculus) Let f : [a, b] → R be a continuous (so, integrable) function. Its integral function F : [a, b] → R is a primitive of f, that is, it is differentiable at every x ∈ [a, b], with
$$F'(x) = f(x) \qquad \forall x \in [a,b] \tag{35.54}$$

¹⁵ Note that in the definition of the integral function the (mute) variable of integration is no longer x, but any other letter (here t, but it could have been z, u, or any other letter different from x). Such a choice is dictated by the necessity of avoiding any confusion in the use of the variable x, which here becomes the independent variable of the integral function.

By the “pasting” property (35.43), for all a ≤ y ≤ x ≤ b we have
$$F(x) - F(y) = \int_y^x f(t)\,dt \tag{35.55}$$
In view of (35.55), the fact that the integral function may be a primitive is then not that surprising: the difference quotient of F is the average of f over a small interval. Next we give a rigorous argument.

Proof Let x₀ ∈ (a, b). First of all, let us see which form the difference quotient of F at x₀ assumes. Take h > 0 such that x₀ + h ∈ [a, b]. By Corollary 1431,
$$F(x_0 + h) - F(x_0) = \int_a^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_{x_0}^{x_0+h} f(t)\,dt$$
Therefore, by the Mean Value Theorem, letting x₀ + ϑh, with 0 ≤ ϑ ≤ 1, denote a point of the interval [x₀, x₀ + h], we have:
$$\frac{F(x_0+h) - F(x_0)}{h} - f(x_0) = \frac{\int_{x_0}^{x_0+h} f(t)\,dt}{h} - f(x_0) = \frac{h f(x_0 + \vartheta h) - h f(x_0)}{h} = f(x_0 + \vartheta h) - f(x_0) \to 0$$
by the continuity of f. A similar argument holds if h < 0.¹⁶ Therefore,
$$F'(x_0) = \lim_{h\to 0} \frac{F(x_0+h) - F(x_0)}{h} = f(x_0)$$
completing in this way the proof when x₀ ∈ (a, b). The cases x₀ = a and x₀ = b are proved in a similar way, as the reader can easily verify. We conclude that F'(x₀) exists and is equal to f(x₀). □

¹⁶ Observe that in this case we have
$$\int_a^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_a^{x_0+h} f(t)\,dt - \left(\int_a^{x_0+h} f(t)\,dt + \int_{x_0+h}^{x_0} f(t)\,dt\right) = -\int_{x_0+h}^{x_0} f(t)\,dt$$

The Second Fundamental Theorem gives a sufficient condition, continuity, for an integrable function to have a primitive (so, an indefinite integral). More importantly, however, in so doing it shows that differentiation can be seen as the inverse operation of integration: condition (35.54) can, indeed, be written as
$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x) \tag{35.56}$$
On the other hand, a differentiable function f : [a, b] → R is, obviously, a primitive of its derivative function f' : [a, b] → R. By the First Fundamental Theorem of Calculus, if f' is integrable – e.g., if f is continuously differentiable (cf. Proposition 1425) – then formula (35.52) takes the form
$$\int_a^x f'(t)\,dt = f(x) - f(a)$$
for all a ≤ x ≤ b, that is,
$$\int_a^x \frac{df}{dt}(t)\,dt = f(x) - f(a) \tag{35.57}$$
Integration can thus be seen as the inverse operation of differentiation. Jointly, (35.56) and (35.57) show that differentiation and integration can be viewed as inverse operations. The two fundamental theorems form the backbone of integral calculus by clarifying its dual relation with differential calculus and, in this way, by making it operational. The importance of all this in both mathematics and applications is just enormous.
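The duality (35.56)–(35.57) can also be illustrated numerically: building the integral function of f by cumulative trapezoidal sums, its difference quotient should return f. A minimal Python sketch, with the arbitrary choices f = cos and the interval [0, 2]:

import numpy as np

f = np.cos
x = np.linspace(0.0, 2.0, 200001)
# integral function F(x) = \int_0^x f(t) dt via cumulative trapezoidal sums
F = np.concatenate(([0.0], np.cumsum((f(x[1:]) + f(x[:-1])) / 2 * np.diff(x))))
i = 100000                                          # the grid point x[i] = 1.0
dF = (F[i + 1] - F[i - 1]) / (x[i + 1] - x[i - 1])  # central difference of F
print(dF, f(x[i]))                                  # both approximately cos(1) = 0.5403...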

The next example shows that continuity is only a sufficient, but not necessary, condition for an integrable function to admit a primitive.

Example 1449 The function f : R → R given by
$$f(x) = \begin{cases} 2x\sin\dfrac{1}{x} - \cos\dfrac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
is discontinuous at 0. Nevertheless, a primitive P : R → R of this function is
$$P(x) = \begin{cases} x^2\sin\dfrac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
Indeed, for x ≠ 0 this can be verified by differentiating x² sin(1/x), while for x = 0 one observes that
$$P'(0) = \lim_{h\to 0}\frac{P(h) - P(0)}{h} = \lim_{h\to 0}\frac{h^2\sin\frac{1}{h}}{h} = \lim_{h\to 0} h\sin\frac{1}{h} = 0 = f(0)$$
So, there exist discontinuous integrable functions that have primitives (for which the First Fundamental Theorem can therefore be applied). N

The signum function, which has no primitive (Example 1442), is an example of a discontinuous function for which the last theorem altogether fails. Next we present another example of such failure, yet more subtle in that it features a differentiable integral function.

Example 1450 Define f : [0, 1] → R by
$$f(x) = \begin{cases} \dfrac{1}{n} & \text{if } x = \dfrac{m}{n} \text{ (in its lowest terms)} \\ 0 & \text{otherwise} \end{cases}$$
The function f, a well-behaved modification of the Dirichlet function, is continuous at every irrational point and discontinuous at every rational point of the unit interval. By Theorem 1426, f is integrable. In particular, ∫_0^1 f(t) dt = 0. It is a useful (non-trivial) exercise to check all this.

That said, if F(x) = ∫_0^x f(t) dt for every x ∈ [0, 1], we then have F(x) = 0 for every x ∈ [0, 1]. Hence, F is trivially differentiable, with F'(x) = 0 for every x ∈ [0, 1], but F' ≠ f because F'(x) = f(x) if and only if x is irrational. We conclude that (35.54) does not hold, and so the last theorem fails because F is not a primitive of f. Nevertheless, we have F(x) = ∫_0^x F'(t) dt for every x ∈ [0, 1]. N
O.R. The operation of integration makes a function more regular: the integral function F of f is always continuous and, if f is continuous, it is differentiable. In contrast, the operation of differentiation makes a function more irregular. Specifically, integration scales the regularity up by one degree: F is always continuous; if f is continuous, F is differentiable and, continuing in this way, if f is differentiable, F is twice differentiable, and so on and so forth. Differentiation, instead, scales down the regularity of a function. H

35.8 Properties of the indefinite integral

The First Fundamental Theorem of Calculus gives, through formula (35.52), a powerful method to compute Riemann integrals. It relies on the calculation of primitives, that is, of the indefinite integral. Indeed, to calculate the Riemann integral ∫_a^b f(x) dx of a function f : [a, b] → R that has a primitive, we proceed in two steps:

(i) we calculate the primitive P : [a, b] → R of f, that is, the indefinite integral ∫ f(x) dx;

(ii) we calculate the difference P(b) − P(a): this difference is often denoted by P(x)|_a^b or [P(x)]_a^b.

Next we present some properties of the indefinite integral that simplify its calculation. A first observation is that the linearity of derivatives, established in (20.12), implies the linearity of the indefinite integral.¹⁷
Proposition 1451 Let f, g : I → R be two functions that admit primitives. Then, for every α, β ∈ R, the function αf + βg : I → R admits a primitive and
$$\int (\alpha f + \beta g)(x)\,dx = \alpha\int f(x)\,dx + \beta\int g(x)\,dx \tag{35.58}$$

Proof Let P_f, P_g : I → R be primitives of f and g. By (20.12), we have
$$(\alpha P_f + \beta P_g)'(x) = \alpha P_f'(x) + \beta P_g'(x) = \alpha f(x) + \beta g(x) \qquad \forall x \in I$$
So, αP_f + βP_g is a primitive of αf + βg, which implies (35.58) by Proposition 1439. □

A simple application of the result is the calculation of the indefinite integral of a polynomial. Namely, given a polynomial f(x) = α₀ + α₁x + ⋯ + αₙxⁿ, it follows from (35.58) that
$$\int f(x)\,dx = \int\left(\sum_{i=0}^{n}\alpha_i x^i\right)dx = \sum_{i=0}^{n}\alpha_i\int x^i\,dx = \sum_{i=0}^{n}\alpha_i\,\frac{x^{i+1}}{i+1} + k$$

¹⁷ As in Section 35.7.1, in this section we denote by I a generic interval, bounded or unbounded, of the real line.
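In code, this coefficient rule is a one-line transformation of the coefficient vector of the polynomial. A minimal Python sketch (the function name is ours):

def poly_antiderivative(coeffs):
    """Coefficients of a primitive of a0 + a1*x + ... + an*x^n,
    with the arbitrary constant k set to 0: each a_i goes to
    a_i/(i+1) on the power x^(i+1)."""
    return [0.0] + [a / (i + 1) for i, a in enumerate(coeffs)]

# f(x) = 1 + 3x^2 has primitive x + x^3 (+ k)
print(poly_antiderivative([1, 0, 3]))  # [0.0, 1.0, 0.0, 1.0]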

The product rule for differentiation leads to an important formula for the calculation of the indefinite integral, called integration by parts.

Proposition 1452 (Integration by parts) Let f, g : I → R be two differentiable functions. Then
$$\int f'(x)\,g(x)\,dx + \int f(x)\,g'(x)\,dx = f(x)\,g(x) + k \tag{35.59}$$

Proof By the product rule (20.13), (fg)' = f'g + fg'. Hence, fg = P_{f'g+fg'}, and thanks to (35.58) we have
$$f(x)\,g(x) + k = \int\left(f'(x)\,g(x) + f(x)\,g'(x)\right)dx = \int f'(x)\,g(x)\,dx + \int f(x)\,g'(x)\,dx$$
as claimed. □

Formula (35.59) is useful because sometimes there is a strong asymmetry in the computability of the indefinite integrals ∫ f'(x) g(x) dx and ∫ f(x) g'(x) dx: one of them may be much simpler to calculate than the other. By exploiting this asymmetry, thanks to (35.59) we may be able to calculate the more complicated integral as the difference between f(x) g(x) and the simpler integral.
Example 1453 Let us calculate the indefinite integral ∫ log x dx. Let f, g : (0, ∞) → R be defined by f(x) = log x and g(x) = x, so that ∫ log x dx can be rewritten as ∫ f(x) g'(x) dx. By formula (35.59), we have
$$\int x f'(x)\,dx + \int \log x\,dx = x\log x + k$$
that is,
$$\int x\,\frac{1}{x}\,dx + \int \log x\,dx = x\log x + k$$
So,
$$\int \log x\,dx = x\left(\log x - 1\right) + k$$
N
Example 1454 Let us calculate the indefinite integral ∫ x sin x dx. Let f, g : (0, ∞) → R be given by f(x) = x and g(x) = −cos x, so that ∫ x sin x dx can be rewritten as ∫ f(x) g'(x) dx. By formula (35.59),
$$\int f'(x)\,g(x)\,dx + \int x\sin x\,dx = -x\cos x + k$$
that is,
$$\int x\sin x\,dx = \int \cos x\,dx - x\cos x + k = \sin x - x\cos x + k$$
N
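Both primitives just found by parts can be double-checked with a computer algebra system. A minimal sketch, assuming the SymPy library is available (its output omits the arbitrary constant k):

import sympy as sp

x = sp.symbols('x', positive=True)
print(sp.integrate(sp.log(x), x))      # x*log(x) - x, i.e. x*(log(x) - 1)
print(sp.integrate(x * sp.sin(x), x))  # -x*cos(x) + sin(x)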

Note that in the last example, if instead we set f(x) = sin x and g(x) = x²/2, formula (35.59) becomes useless. Also with such a choice of f and g, it is still possible to rewrite ∫ x sin x dx as ∫ f(x) g'(x) dx. Yet, here (35.59) implies
$$\int f'(x)\,g(x)\,dx + \int x\sin x\,dx = \frac{x^2}{2}\sin x + k$$
that is,
$$\int x\sin x\,dx = \frac{x^2}{2}\sin x - \frac{1}{2}\int x^2\cos x\,dx + k$$
which actually complicates things because the integral ∫ x² cos x dx is more difficult to compute than the original integral ∫ x sin x dx. This shows that integration by parts cannot proceed in a mechanical way: it requires a bit of imagination and experience.
O.R. Example 1454 shows that to calculate an integral ∫ xⁿ h(x) dx, where h is a function whose primitive has a similar “complexity” (e.g., h is sin x, cos x or eˣ), a good choice is to set f(x) = xⁿ and g'(x) = h(x). Indeed, after having differentiated f(x) n times, the polynomial form disappears and one is left with a factor that is immediately integrable. Such a choice has been used in Example 1454. H

The formula of integration by parts is usually written as
$$\int f(x)\,g'(x)\,dx = f(x)\,g(x) - \int f'(x)\,g(x)\,dx + k$$
The two factors of the product f(x) g'(x) dx are called, respectively, the finite factor, f(x), and the differential factor, g'(x) dx. So, the formula says that “the integral of the product between the finite factor and a differential factor is equal to the product between the finite factor and the integral of the differential factor, minus the integral of the product between the derivative of the finite factor and the integral just found”. We repeat that it is important to carefully choose which of the two factors to take as finite factor and which as differential factor.

Finally, in terms of Riemann integrals the formula obviously becomes
$$\int_a^b f(x)\,g'(x)\,dx = f(x)\,g(x)\Big|_a^b - \int_a^b f'(x)\,g(x)\,dx = f(b)\,g(b) - f(a)\,g(a) - \int_a^b f'(x)\,g(x)\,dx \tag{35.60}$$

35.9 Change of variable

The next result shows how the integral of a function f changes when we compose it with another function φ.

Theorem 1455 Let φ : [c, d] → [a, b] be a differentiable and strictly increasing function such that φ' : [c, d] → R is integrable. If f : [a, b] → R is continuous, then the function (f ∘ φ)φ' : [c, d] → R is integrable and
$$\int_c^d f(\varphi(t))\,\varphi'(t)\,dt = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dx \tag{35.61}$$

If φ is surjective, we have a = φ(c) and b = φ(d). Formula (35.61) can therefore be rewritten as
$$\int_a^b f(x)\,dx = \int_c^d f(\varphi(t))\,\varphi'(t)\,dt \tag{35.62}$$
Heuristically, (35.61) can be seen as the result of the change of variable x = φ(t) and of the corresponding change
$$dx = \varphi'(t)\,dt = d\varphi(t) \tag{35.63}$$
in dx. At a mnemonic and calculation level, this observation can be useful, even if the writing (35.63) is per se meaningless.

Proof Since f is continuous, (35.55) yields
$$\int_{\varphi(c)}^{\varphi(d)} f(x)\,dx = F(\varphi(d)) - F(\varphi(c)) \tag{35.64}$$
Moreover, the chain rule implies
$$(F \circ \varphi)'(t) = F'(\varphi(t))\,\varphi'(t) = (f \circ \varphi)(t)\,\varphi'(t)$$
that is, F ∘ φ is a primitive of (f ∘ φ)φ' : [c, d] → R. By Proposition 1421, the composite function f ∘ φ : [c, d] → R is integrable. Since, by hypothesis, φ' : [c, d] → R is integrable, so is the product function (f ∘ φ)φ' : [c, d] → R (recall what we saw at the end of Section 35.6). By the First Fundamental Theorem, we have
$$\int_c^d (f \circ \varphi)(t)\,\varphi'(t)\,dt = (F \circ \varphi)(d) - (F \circ \varphi)(c) \tag{35.65}$$
Since φ is bijective (being strictly increasing and surjective), we have φ(c) = a and φ(d) = b. Therefore, (35.65) and (35.64) imply
$$\int_c^d (f \circ \varphi)(t)\,\varphi'(t)\,dt = F(\varphi(d)) - F(\varphi(c)) = \int_a^b f(x)\,dx$$
as desired. □

Theorem 1455, besides having a theoretical interest, can be useful in the calculation of integrals. Formula (35.61), and its rewriting (35.62), can be used both from “right to left” and from “left to right”. In the first case, from right to left, the objective is to calculate the integral ∫_a^b f(x) dx by finding a suitable change of variable x = φ(t) that leads to an integral ∫_{φ⁻¹(a)}^{φ⁻¹(b)} f(φ(t)) φ'(t) dt that is easier to calculate. The difficulty is in finding a suitable change of variable x = φ(t): indeed, nothing guarantees that there exists a “simplifying” change and, even if it existed, it might not be obvious how to find it.

On the other hand, the application from left to right of formula (35.61) is useful to calculate an integral that can be written as ∫_c^d f(φ(t)) φ'(t) dt for some function f of which we know a primitive F. In such a case, the corresponding integral ∫_{φ(c)}^{φ(d)} f(x) dx, obtained by setting x = φ(t), is easier to calculate since
$$\int f(\varphi(x))\,\varphi'(x)\,dx = F(\varphi(x)) + k$$
In such a case the difficulty is in recognizing the composite form ∫_c^d f(φ(t)) φ'(t) dt in the integral that we want to calculate. Also here, nothing guarantees that the integral can be rewritten in this form, nor that, when this is possible, it is easy to recognize. Only experience (and exercise) can be of help. The next example presents some classic integrals that can be calculated with this technique.

Example 1456 (i) If α ≠ −1, we have
$$\int \varphi(x)^{\alpha}\,\varphi'(x)\,dx = \frac{\varphi(x)^{\alpha+1}}{\alpha+1} + k$$
For example,
$$\int \sin^4 x\cos x\,dx = \frac{1}{5}\sin^5 x + k$$
(ii) We have
$$\int \frac{\varphi'(x)}{\varphi(x)}\,dx = \log|\varphi(x)| + k$$
For example,
$$\int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx = -\int \frac{-\sin x}{\cos x}\,dx = -\log|\cos x| + k$$
(iii) We have
$$\int \sin(\varphi(x))\,\varphi'(x)\,dx = -\cos\varphi(x) + k \quad\text{and}\quad \int \cos(\varphi(x))\,\varphi'(x)\,dx = \sin\varphi(x) + k$$
For example,
$$\int \cos\left(3x^3 - 2x^2\right)\left(9x^2 - 4x\right)dx = \sin\left(3x^3 - 2x^2\right) + k$$
(iv) We have
$$\int e^{\varphi(x)}\,\varphi'(x)\,dx = e^{\varphi(x)} + k$$
For example,
$$\int x e^{x^2}\,dx = \frac{1}{2}\int 2x e^{x^2}\,dx = \frac{1}{2}e^{x^2} + k$$
N

We now present three examples that illustrate the two possible applications of formula (35.61). The first example considers the case right to left, the second example can be solved both going right to left and left to right, while the last example considers the case left to right. For simplicity we use the variables x and t as they appear in (35.61), even if this is obviously a mere convention, without substantial value.

Example 1457 Consider the integral
$$\int_a^b \sin\sqrt{x}\,dx$$
with [a, b] ⊆ [0, ∞). Set t = √x, so that x = t². Here we have φ(t) = t² and, thanks to (35.62),
$$\int_a^b \sin\sqrt{x}\,dx = \int_{\sqrt{a}}^{\sqrt{b}} 2t\sin t\,dt = 2\int_{\sqrt{a}}^{\sqrt{b}} t\sin t\,dt$$
In Example 1454 we solved by parts the indefinite integral ∫ t sin t dt. In light of that example, we have
$$\int_{\sqrt{a}}^{\sqrt{b}} t\sin t\,dt = \left(\sin t - t\cos t\right)\Big|_{\sqrt{a}}^{\sqrt{b}} = \sin\sqrt{b} - \sin\sqrt{a} + \sqrt{a}\cos\sqrt{a} - \sqrt{b}\cos\sqrt{b}$$
and so
$$\int_a^b \sin\sqrt{x}\,dx = 2\left(\sin\sqrt{b} - \sin\sqrt{a} + \sqrt{a}\cos\sqrt{a} - \sqrt{b}\cos\sqrt{b}\right)$$
Note how the starting point has been to set t = √x, that is, to specify the inverse function t = φ⁻¹(x) = √x. This is often the case because it is simpler to think of which transformation of x may simplify the integration. N
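The computation can be verified numerically. A minimal Python sketch, with the arbitrary choice [a, b] = [0, 4], compares the closed form above with a trapezoidal approximation of the transformed integral 2 ∫_0^2 t sin t dt:

import numpy as np

t = np.linspace(0.0, 2.0, 100001)
closed_form = 2 * (np.sin(2) - 2 * np.cos(2))  # the formula above with a = 0, b = 4
transformed = 2 * np.trapz(t * np.sin(t), t)   # 2 * \int_0^2 t sin(t) dt
print(closed_form, transformed)                # both approximately 3.483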

Example 1458 Consider the integral
$$\int_0^{\frac{\pi}{2}} \frac{\cos x}{(1+\sin x)^3}\,dx$$
“Right to left”. Set t = sin x, so that x = φ(t) = sin⁻¹ t, with t ∈ [0, 1]. From (20.20) it follows that
$$\varphi'(t) = \frac{1}{\cos\left(\sin^{-1} t\right)}$$
Thanks to (35.62), we have
$$\int_0^{\frac{\pi}{2}} \frac{\cos x}{(1+\sin x)^3}\,dx = \int_0^1 \frac{\cos\left(\sin^{-1} t\right)}{(1+t)^3}\,\frac{1}{\cos\left(\sin^{-1} t\right)}\,dt = \int_0^1 \frac{1}{(1+t)^3}\,dt = -\frac{1}{2(1+t)^2}\Big|_0^1 = \frac{3}{8}$$
“Left to right”. In the integral we recognize a form of type (i) of Example 1456, an integral of the type
$$\int \varphi(x)^{\alpha}\,\varphi'(x)\,dx$$
with φ(x) = 1 + sin x and α = −3. Since ∫ φ(x)^α φ'(x) dx = φ(x)^{α+1}/(α+1) + k, we have
$$\int_0^{\frac{\pi}{2}} \frac{\cos x}{(1+\sin x)^3}\,dx = -\frac{1}{2(1+\sin x)^2}\Big|_0^{\frac{\pi}{2}} = -\frac{1}{8} + \frac{1}{2} = \frac{3}{8}$$
N

Example 1459 Consider the integral
$$\int_c^d \frac{\log t}{t}\,dt \tag{35.66}$$
with [c, d] ⊆ (0, ∞). Here we recognize again a form of type (i) of Example 1456, an integral of the type
$$\int \varphi(t)^{\alpha}\,\varphi'(t)\,dt$$
with φ(t) = log t and α = 1. Since again ∫ φ(t)^α φ'(t) dt = φ(t)^{α+1}/(α+1) + k, we have
$$\int_c^d \frac{\log t}{t}\,dt = \frac{\log^2 t}{2}\Big|_c^d = \frac{1}{2}\left(\log^2 d - \log^2 c\right)$$
N

35.10 Closed forms

Both theoretically and operationally, it is important to know when the primitive of an elementary function is itself an elementary function. To this end it is necessary, first of all, to make rigorous the notion of elementary function informally introduced in Section 6.5 of Chapter 6. To do this, we rely on two important classes of functions, the rational and the algebraic ones. A function f : A ⊆ R → R is called:

(i) rational if it can be expressed as a ratio of polynomials (Section 10.5.1), that is,
$$f(x) = \frac{a_0 + a_1 x + \cdots + a_n x^n}{b_0 + b_1 x + \cdots + b_m x^m} \tag{35.67}$$

(ii) algebraic if it is defined through finite combinations of the four elementary operations and root extraction.

Example 1460 The functions
$$f(x) = \frac{\sqrt{x}\,\sqrt[3]{1-x^3}}{\sqrt{x-2e}} \qquad\text{and}\qquad g(x) = \sqrt[5]{1+\sqrt{1-\sqrt{x-1}}}$$
are algebraic. N

We can now define the elementary functions.

Definition 1461 A function f : A ⊆ R → R is called elementary if it belongs to one of the following classes:

(i) rational functions,

(ii) algebraic functions,

(iii) exponential functions,

(iv) logarithmic functions,

(v) trigonometric functions,¹⁸

(vi) the functions obtained through both finite combinations and finite compositions of functions that belong to the previous classes.

The elementary functions that are neither rational nor algebraic are called transcendental. For example, such are the exponential functions, the logarithmic functions, and the trigonometric functions.

The elementary functions can be written in finite terms (that is, in closed form), which gives them simplicity and tractability. However, the relevant question for integral calculus is whether their primitives are themselves elementary functions, so that they keep enjoying the tractability of the original functions. This motivates the following definition:

Definition 1462 An elementary function is said to be integrable in finite terms if its primitive is an elementary function.

In this case, we will also say that f is explicitly integrable or integrable in closed form. For example, f(x) = 2x is explicitly integrable since its primitive F(x) = x² is an elementary function. Also the functions f(x) = sin x and f(x) = cos x, as well as all the polynomials and the exponential functions f(x) = e^{kx}, with k ∈ R, are explicitly integrable.

Nevertheless, and this is what makes the topic of this section interesting, not all elementary functions are explicitly integrable. The next result reports the remarkable example of the Gaussian function.

Proposition 1463 The elementary functions e^{−x²} and eˣ/x are not integrable in finite terms.

The proof of the proposition is based on results of complex analysis. The non-integrability in finite terms of these functions implies that of other important functions.

Example 1464 The function 1/log x is not integrable in finite terms. Indeed, with the change of variable x = eᵗ, we get dx = eᵗ dt and therefore, by substitution,
$$\int \frac{1}{\log x}\,dx = \int \frac{e^t}{t}\,dt$$
Since eˣ/x is not integrable in finite terms, the same holds for 1/log x. In particular, the integral function
$$\operatorname{Li}(x) = \int_2^x \frac{1}{\log t}\,dt$$
which plays a key role in the study of prime numbers, is not an elementary function. N

¹⁸ Through complex numbers, it is possible to express trigonometric functions as linear combinations of exponential functions, as the reader will learn in more advanced courses.

In view of this example, it becomes important to have criteria that guarantee the integrability, or the non-integrability, in finite terms of a given elementary function. For rational functions everything is simple, as the next result shows (we omit its proof).

Proposition 1465 Rational functions are integrable in finite terms. In particular, the primitive of a rational function f(x) is an elementary function given by a linear combination of functions of the following types:
$$\log\left(ax^2 + bx + c\right), \qquad \arctan(dx + k) \qquad\text{and}\qquad r(x)$$
where a, b, c, d, k ∈ R and r(x) is a rational function.

Example 1466 Let us calculate the integral
$$\int \frac{x-1}{x^2 + 3x + 2}\,dx$$
In view of Example 415, the partial fraction expansion of f is
$$f(x) = -\frac{2}{x+1} + \frac{3}{x+2}$$
So, we have
$$\int \frac{x-1}{x^2 + 3x + 2}\,dx = \int\left(-\frac{2}{x+1} + \frac{3}{x+2}\right)dx = -2\log|x+1| + 3\log|x+2| + k$$
N

Example 1467 Let us calculate the integral
$$\int \frac{dx}{x^2 - 6x + 13}$$
We write
$$\frac{1}{x^2 - 6x + 13} = \frac{1}{(x-3)^2 + 4} = \frac{1}{4}\,\frac{1}{\left(\frac{x-3}{2}\right)^2 + 1}$$
Let us make the change of variable u = (x − 3)/2, so that
$$du = \frac{dx}{2}$$
Then
$$\int \frac{dx}{x^2 - 6x + 13} = \frac{1}{4}\int \frac{2\,du}{u^2 + 1} = \frac{1}{2}\int \frac{du}{u^2 + 1} = \frac{1}{2}\arctan u + k = \frac{1}{2}\arctan\frac{x-3}{2} + k$$
N
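Both examples can be double-checked with a computer algebra system. A minimal sketch, assuming the SymPy library (its output omits the arbitrary constant k and, on suitable domains, the absolute values):

import sympy as sp

x = sp.symbols('x')
f = (x - 1) / (x**2 + 3*x + 2)
print(sp.apart(f))                             # -2/(x + 1) + 3/(x + 2)
print(sp.integrate(f, x))                      # -2*log(x + 1) + 3*log(x + 2)
print(sp.integrate(1 / (x**2 - 6*x + 13), x))  # atan(x/2 - 3/2)/2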

Things are more complicated for algebraic and transcendental functions: some of them are integrable in finite terms, others are not. A full analysis of the topic is well beyond the scope of this book.¹⁹ We just mention that Liouville proved an important result that establishes a necessary and sufficient condition for the integrability in finite terms of functions of the form f(x)e^{g(x)}. Inter alia, this result permits us to prove Proposition 1463, that is, the non-integrability in finite terms of the functions e^{−x²} and eˣ/x.

This said, in some (lucky) cases the integrability in finite terms of non-rational elementary functions can be reduced, through suitable substitutions, to that of rational functions. This is the case, for example, for functions of the type r(eˣ), where r(·) is a rational function. Indeed, by setting x = log t and by recalling what we saw in Section 35.9 on integration by substitution, we get
$$\int r(e^x)\,dx = \int \frac{r(t)}{t}\,dt$$
Thanks to Proposition 1465, the rational function r(t)/t is integrable in finite terms.
Another example is the transcendental function
$$f(x) = \frac{a\sin^{\alpha} x + b\cos^{\beta} x}{c\sin^{\gamma} x + d\cos^{\delta} x}$$
with a, b, c, d ∈ R and α, β, γ, δ ∈ Z. By setting x = 2 arctan t, that is,
$$\tan\frac{x}{2} = t$$
simple trigonometric arguments yield:
$$\sin x = \frac{2t}{1+t^2} \qquad\text{and}\qquad \cos x = \frac{1-t^2}{1+t^2} \tag{35.68}$$
Indeed, we have sin x = 2 sin(x/2) cos(x/2) and cos x = cos²(x/2) − sin²(x/2). Since 1 + tan²(x/2) = cos⁻²(x/2), we have
$$\cos\frac{x}{2} = \frac{1}{\sqrt{1+\tan^2\frac{x}{2}}}$$
Moreover,
$$\sin\frac{x}{2} = \tan\frac{x}{2}\cos\frac{x}{2} = \frac{\tan\frac{x}{2}}{\sqrt{1+\tan^2\frac{x}{2}}}$$
By substituting sin(x/2) and cos(x/2) in sin x and cos x, we get (35.68).

With this substitution we transform f(x) into the rational function
$$\frac{a\left(\frac{2t}{1+t^2}\right)^{\alpha} + b\left(\frac{1-t^2}{1+t^2}\right)^{\beta}}{c\left(\frac{2t}{1+t^2}\right)^{\gamma} + d\left(\frac{1-t^2}{1+t^2}\right)^{\delta}}$$
and we proceed to the explicit integration (always proceeding by substitution).
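The identities (35.68) underlying this substitution are easy to spot-check numerically. A minimal Python sketch at a few arbitrarily chosen points of (−π, π):

import numpy as np

for x in (0.5, 1.3, -2.0):
    t = np.tan(x / 2)
    print(np.isclose(np.sin(x), 2 * t / (1 + t**2)),
          np.isclose(np.cos(x), (1 - t**2) / (1 + t**2)))
# prints "True True" three times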


¹⁹ See Ritt (1948) and the comprehensive Gradshteyn and Ryzhik (2014).

O.R. The question of determining whether or not the indefinite integral of a function belongs to a given class of functions was tackled already by Newton and Leibniz. While Newton, to avoid resorting to transcendental functions, preferred to express the primitive through algebraic functions (also through infinite series of algebraic functions), Leibniz gave priority to formulations in finite terms and considered acceptable also non-algebraic primitives. The vision of Leibniz prevailed and in the nineteenth century the problem of integrability in finite terms became an important area of research, with major contributions by Joseph Liouville in the 1830s. H

35.11 Improper integrals


We talk about improper integrals in two cases: when the interval of integration is unbounded,
or when the interval of integration is bounded but the function being integrated is unbounded
near some point of the interval.

35.11.1 Unbounded intervals of integration: generalities

Until now we have considered integrals on closed and bounded intervals [a, b]. In applications, integrals on unbounded intervals are also very important. A famous example is the Gaussian bell centered at the origin,

[Figure: the Gaussian bell curve]

seen in Example 1258 and whose area is given by the Gauss integral
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx \tag{35.69}$$
In this case the domain of integration is the whole real line (−∞, +∞).

Let us begin with domains of integration of the form [a, +∞). Given a function f : [a, +∞) → R, consider the integral function F : [a, +∞) → R given by
$$F(x) = \int_a^x f(t)\,dt$$

The definition of the improper integral ∫_a^{+∞} f(x) dx is based on the limit lim_{x→+∞} F(x), that is, on the asymptotic behavior of the integral function. For such behavior, we can have three cases:

(i) lim_{x→+∞} F(x) = L ∈ R;

(ii) lim_{x→+∞} F(x) = ±∞;

(iii) lim_{x→+∞} F(x) does not exist.

Cases (i) and (ii) are considered by the next definition.

Definition 1468 Let f : [a, +∞) → R be a function integrable on every interval [a, b] ⊆ [a, +∞), with integral function F. If lim_{x→+∞} F(x) exists, finite or infinite, we set
$$\int_a^{+\infty} f(x)\,dx = \lim_{x\to+\infty} F(x)$$
and the function f is said to be integrable in the improper sense on [a, +∞). The value ∫_a^{+∞} f(x) dx is called the improper (or generalized) Riemann integral.

For brevity, in the sequel we will say that a function f is integrable on [a, +∞), omitting “in an improper sense”. We have the following terminology:

(i) the integral ∫_a^{+∞} f(x) dx converges if lim_{x→+∞} F(x) ∈ R;

(ii) the integral ∫_a^{+∞} f(x) dx diverges positively (resp., negatively) if lim_{x→+∞} F(x) = +∞ (resp., −∞);

(iii) finally, if lim_{x→+∞} F(x) does not exist, we say that the integral ∫_a^{+∞} f(x) dx does not exist (or that it is oscillating).

Example 1469 Fix α > 0 and let f : [1, +∞) → R be given by f(x) = x^{−α}. The integral function F : [1, +∞) → R is
$$F(x) = \int_1^x t^{-\alpha}\,dt = \begin{cases} \dfrac{1}{1-\alpha}\left(x^{1-\alpha} - 1\right) & \text{if } \alpha \ne 1 \\[1ex] \log x & \text{if } \alpha = 1 \end{cases}$$
So,
$$\lim_{x\to+\infty} F(x) = \begin{cases} +\infty & \text{if } \alpha \le 1 \\[1ex] \dfrac{1}{\alpha-1} & \text{if } \alpha > 1 \end{cases}$$
It follows that the improper integral
$$\int_1^{+\infty} \frac{1}{x^{\alpha}}\,dx$$
exists for every α > 0: it converges if α > 1 and diverges positively if α ≤ 1. N
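The dichotomy of this example is easy to visualize numerically, by tabulating F(x) for growing x. A minimal Python sketch (the values of α and x are arbitrary choices):

import numpy as np

for alpha in (0.5, 1.0, 2.0):
    for x in (1e2, 1e4, 1e6):
        F = np.log(x) if alpha == 1.0 else (x**(1 - alpha) - 1) / (1 - alpha)
        print(alpha, x, F)
# for alpha = 2 the values stabilize near 1/(alpha - 1) = 1;
# for alpha <= 1 they keep growing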



Example 1470 A continuous time version of the discrete time intertemporal problem of Section 9.1.2 features an infinitely lived consumer who chooses over consumption streams f : [0, ∞) → [0, ∞) of a single good. Such streams are evaluated by a continuous time intertemporal utility function U : A ⊆ R^{[0,∞)} → R, often defined by the improper integral
$$U(f) = \int_0^{\infty} u(f(t))\,e^{-\delta t}\,dt$$
with instantaneous utility function u : [0, ∞) → R and exponential discounting e^{−δt} with subjective discount rate δ ∈ (0, ∞). The domain A is formed by the streams f for which this improper integral converges. N

The integral ∫_{−∞}^{a} f(x) dx on the domain of integration (−∞, a] is defined in a similar way to ∫_a^{+∞} f(x) dx by considering the limit lim_{x→−∞} ∫_x^a f(t) dt, that is, the limit lim_{x→−∞} F(x) with F(x) = ∫_x^a f(t) dt.

Example 1471 Let f : (−∞, 0] → R be given by f(x) = x e^{−x²}. We have
$$\int_{-\infty}^{0} f(x)\,dx = \lim_{x\to-\infty} \int_x^0 t e^{-t^2}\,dt = \lim_{x\to-\infty}\left(-\frac{1}{2}\left(1 - e^{-x^2}\right)\right) = -\frac{1}{2}$$
Therefore, the improper integral
$$\int_{-\infty}^{0} x e^{-x^2}\,dx$$
exists and converges. N

Let us now consider the improper integral on the domain of integration (−∞, +∞).

Definition 1472 Let f : R → R be a function integrable on every interval [a, b]. If the integrals ∫_a^{+∞} f(x) dx and ∫_{−∞}^{a} f(x) dx both exist, the function f is said to be integrable (in an improper sense) on R and we set
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_a^{+\infty} f(x)\,dx + \int_{-\infty}^{a} f(x)\,dx \tag{35.70}$$
provided we do not have an indeterminate form ∞ − ∞. The value ∫_{−∞}^{+∞} f(x) dx is called the improper (or generalized) Riemann integral of f on R.

It is easy to see that this definition does not depend on the choice of the point a ∈ R. Often, for convenience, we take a = 0. Also the improper integral ∫_{−∞}^{+∞} f(x) dx is called convergent or divergent according to whether its value is finite or is equal to ±∞.

Next we illustrate this notion with a couple of examples. Note that it is necessary to compute separately the two integrals ∫_a^{+∞} f(x) dx and ∫_{−∞}^{a} f(x) dx, whose values must then be summed (unless the indeterminate form ∞ − ∞ arises).

Example 1473 Let f : R → R be the constant function f(x) = k. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^{0} f(x)\,dx = \lim_{x\to+\infty} kx + \lim_{x\to-\infty}(-kx) = \begin{cases} +\infty & \text{if } k > 0 \\ 0 & \text{if } k = 0 \\ -\infty & \text{if } k < 0 \end{cases}$$
In other words, ∫_{−∞}^{+∞} k dx = k · ∞ unless k = 0. N

The value of the integral in the previous example is consistent with the geometric interpretation of the integral as the area (with sign) of the region under f. Indeed, such a figure is a big rectangle with infinite base and height k. Its area is +∞ if k > 0, zero if k = 0, and −∞ if k < 0.

Example 1474 Let f : R → R be given by f(x) = x e^{−x²}. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^{0} f(x)\,dx = \lim_{x\to+\infty} \int_0^x t e^{-t^2}\,dt + \lim_{x\to-\infty} \int_x^0 t e^{-t^2}\,dt$$
$$= \lim_{x\to+\infty} \frac{1}{2}\left(1 - e^{-x^2}\right) + \lim_{x\to-\infty}\left(-\frac{1}{2}\left(1 - e^{-x^2}\right)\right) = \frac{1}{2} - \frac{1}{2} = 0$$
Therefore, the improper integral
$$\int_{-\infty}^{+\infty} x e^{-x^2}\,dx$$
exists and is equal to 0. N

Example 1475 Let f : R → R be given by f(x) = x. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^{0} f(x)\,dx = \lim_{x\to+\infty} \int_0^x t\,dt + \lim_{x\to-\infty} \int_x^0 t\,dt = \lim_{x\to+\infty} \frac{x^2}{2} + \lim_{x\to-\infty}\left(-\frac{x^2}{2}\right) = \infty - \infty$$
So, the improper integral
$$\int_{-\infty}^{+\infty} x\,dx$$
does not exist because we have the indeterminate form ∞ − ∞. N

Differently from Example 1473, the value of the integral in this last example is not consistent with the geometric interpretation of the integral. Indeed, look at the following picture:

[Figure: the graph of f(x) = x, with the region between the graph and the horizontal axis marked (+) for x > 0 and (−) for x < 0]

The areas of the two regions under f for x < 0 and x > 0 are two “big triangles” of infinite base and height. They are intuitively equal because they are perfectly symmetric with respect to the vertical axis, but of opposite sign – as indicated by the signs (+) and (−) in the figure. It is then natural to think that they compensate each other, resulting in an integral equal to 0. Nevertheless, the definition requires the separate calculation of the two integrals as x → +∞ and as x → −∞, which in this case generates the indeterminate form ∞ − ∞.

To try to reconcile improper integration on (−∞, +∞) with the geometric intuition, we can follow an alternative route by considering the single limit
$$\lim_{k\to+\infty} \int_{-k}^{k} f(x)\,dx$$
instead of the two separate limits in (35.70). This motivates the following definition.

Definition 1476 Let f : R → R be a function integrable on each interval [a, b]. The Cauchy principal value, denoted by PV ∫_{−∞}^{+∞} f(x) dx, of the integral ∫_{−∞}^{+∞} f(x) dx is given by
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty} \int_{-k}^{k} f(x)\,dx$$
whenever the limit exists, finite or infinite.

In place of the two limits upon which the definition of the improper integral is based, the principal value considers only the limit of ∫_{−k}^{k} f(x) dx. We will see in examples below that, with this definition, the geometric intuition of the integral as the area (with sign) of the region under f is preserved. It is, however, a weaker notion than the improper integral. Indeed:

(i) when the improper integral exists, the principal value also exists and one has
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx$$
because by Proposition 459-(i) we have
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty} \int_{-k}^{k} f(x)\,dx = \lim_{k\to+\infty}\left(\int_0^{k} f(x)\,dx + \int_{-k}^{0} f(x)\,dx\right)$$
$$= \lim_{k\to+\infty} \int_0^{k} f(x)\,dx + \lim_{k\to+\infty} \int_{-k}^{0} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^{0} f(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx$$

(ii) the principal value may exist also when the improper integral does not exist: in Example 1475 the improper integral ∫_{−∞}^{+∞} x dx does not exist, yet
$$\mathrm{PV}\int_{-\infty}^{+\infty} x\,dx = \lim_{k\to+\infty} \int_{-k}^{k} x\,dx = 0$$
and therefore PV ∫_{−∞}^{+∞} x dx exists and is finite.

In sum, the principal value may exist even when the improper integral does not exist. To better illustrate this key relation between the two notions of integral on (−∞, +∞), let us consider a more general version of Example 1475.

Example 1477 Let f : R → R be given by f(x) = x + β, with β ∈ R. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^{0} f(x)\,dx = \lim_{x\to+\infty} \int_0^x (t+\beta)\,dt + \lim_{x\to-\infty} \int_x^0 (t+\beta)\,dt$$
$$= \lim_{x\to+\infty}\left(\frac{x^2}{2} + \beta x\right) + \lim_{x\to-\infty}\left(-\frac{x^2}{2} - \beta x\right) = \infty - \infty$$
So the improper integral
$$\int_{-\infty}^{+\infty} (x+\beta)\,dx$$
does not exist because we have the indeterminate form ∞ − ∞. By taking the principal value, we have
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty} \int_{-k}^{k} (x+\beta)\,dx = \lim_{k\to+\infty}\left(\int_{-k}^{k} x\,dx + 2\beta k\right) = 2\beta\lim_{k\to+\infty} k = \begin{cases} +\infty & \text{if } \beta > 0 \\ 0 & \text{if } \beta = 0 \\ -\infty & \text{if } \beta < 0 \end{cases}$$
So, the principal value exists: PV ∫_{−∞}^{+∞} (x + β) dx = β · ∞ unless β is zero. N

In the last example the principal value agrees with the geometric intuition of the integral as area with sign. Indeed, when β = 0 the intuition is obvious (see the figure and the comment after Example 1475). In the case β > 0, look at the figure

[Figure: the graph of f(x) = x + β with β > 0, with the regions between the graph and the horizontal axis marked (−) on the negative part and (+) on the positive part]

The negative area of the “big triangle” indicated by (−) in the negative part of the horizontal axis is equal and opposite to the positive area of the big triangle indicated by (+) in the positive part of the horizontal axis. If we imagine that such areas cancel each other, what “is left” is the area of the dotted figure, which is clearly infinite and with + sign (lying above the horizontal axis). For β < 0 similar considerations hold:

[Figure: the graph of f(x) = x + β with β < 0, with the regions marked (+) and (−)]

The negative area of the “big triangle” indicated by (−) in the negative part of the horizontal axis is equal and opposite to the positive area of the big triangle indicated by (+) in the positive part of the horizontal axis. If we imagine that such areas cancel each other out, “what is left” is here again the area of the dotted figure, which is clearly infinite and with negative sign (lying below the horizontal axis).

Example 1478 Let f : R → R be given by f(x) = x/(1 + x²). We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^{0} f(x)\,dx = \lim_{x\to+\infty} \int_0^x \frac{t}{1+t^2}\,dt + \lim_{x\to-\infty} \int_x^0 \frac{t}{1+t^2}\,dt$$
$$= \lim_{x\to+\infty} \frac{1}{2}\log\left(1+x^2\right) + \lim_{x\to-\infty}\left(-\frac{1}{2}\log\left(1+x^2\right)\right) = \infty - \infty$$
Therefore, the improper integral does not exist because we have the indeterminate form ∞ − ∞. By calculating the principal value, we have instead
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty} \int_{-k}^{k} \frac{x}{1+x^2}\,dx = \lim_{k\to+\infty}\left(\frac{1}{2}\log\left(1+k^2\right) - \frac{1}{2}\log\left(1+k^2\right)\right) = 0$$
and so
$$\mathrm{PV}\int_{-\infty}^{+\infty} \frac{x}{1+x^2}\,dx = 0$$
N
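The principal value of this example can be illustrated numerically: the symmetric truncations ∫_{−k}^{k} x/(1 + x²) dx vanish for every k. A minimal Python sketch (the values of k are arbitrary choices):

import numpy as np

for k in (10.0, 100.0, 1000.0):
    x = np.linspace(-k, k, 2000001)
    print(k, np.trapz(x / (1 + x**2), x))  # approximately 0 for every k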

35.11.2 Unbounded integration intervals: properties and criteria

We now give some properties of improper integrals, as well as some criteria of improper integrability, i.e., sufficient conditions for a function f defined on an unbounded domain to have an improper integral. For simplicity, we limit ourselves to the domain [a, +∞), leaving to the reader the analogous versions of these criteria for (−∞, a] and (−∞, +∞).

Properties

Being defined as limits, the properties of improper integrals follow from the properties of limits of functions (Section 11.4). In particular, the improper integral retains the linearity and monotonicity properties of the Riemann integral.

Let us begin with linearity, which follows from the algebra of limits established in Proposition 459.

Proposition 1479 Let f, g : [a, +∞) → R be two functions integrable on [a, +∞). Then, for every α, β ∈ R, the function αf + βg : [a, +∞) → R is integrable on [a, +∞) and
$$\int_a^{+\infty} (\alpha f + \beta g)(x)\,dx = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx \tag{35.71}$$
provided the right-hand side is not an indeterminate form ∞ − ∞.



Proof By the linearity of the Riemann integral, and by points (i) and (ii) of Proposition 459, we have
$$\lim_{x\to+\infty} \int_a^x (\alpha f + \beta g)(t)\,dt = \lim_{x\to+\infty}\left(\alpha F(x) + \beta G(x)\right) = \alpha\lim_{x\to+\infty} F(x) + \beta\lim_{x\to+\infty} G(x) = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx$$
which implies the improper integrability of the function αf + βg and (35.71). □

The monotonicity property of limits of functions (see Proposition 458 and its scalar variant) yields the monotonicity property of the improper integral.

Proposition 1480 Let f, g : [a, +∞) → R be two functions integrable on [a, +∞). If f ≤ g, then ∫_a^{+∞} f(x) dx ≤ ∫_a^{+∞} g(x) dx.

Proof Thanks to the monotonicity of the Riemann integral, F(x) ≤ G(x) for every x ∈ [a, +∞). By the monotonicity of the limits of functions, we therefore have lim_{x→+∞} F(x) ≤ lim_{x→+∞} G(x). □

As we have seen in Example 1473, ∫_a^{+∞} 0 dx = 0. So, a simple consequence of Proposition 1480 is that ∫_a^{+∞} f(x) dx ≥ 0 whenever f is positive and integrable on [a, +∞).

Integrability criteria

We now give some integrability criteria, limiting ourselves for simplicity to positive functions f : [a, +∞) → R. In this case, the integral function F : [a, +∞) → R is increasing. Indeed, for every x₂ ≥ x₁ ≥ a,
$$F(x_2) = \int_a^{x_2} f(t)\,dt = \int_a^{x_1} f(t)\,dt + \int_{x_1}^{x_2} f(t)\,dt \ge \int_a^{x_1} f(t)\,dt = F(x_1)$$
since ∫_{x₁}^{x₂} f(t) dt ≥ 0. Thanks to the monotonicity of the integral function, we have the following characterization of improper integrals of positive functions.

Proposition 1481 Let f : [a, +∞) → R be a function positive and integrable on every interval [a, b] ⊆ [a, +∞). Then, f is integrable on [a, +∞) and
$$\int_a^{+\infty} f(t)\,dt = \sup_{x\in[a,+\infty)} F(x) \tag{35.72}$$
In particular, ∫_a^{+∞} f(t) dt converges only if lim_{x→+∞} f(x) = 0 (provided this limit exists).

Positive functions f : [a, +∞) → R are therefore integrable in an improper sense, that is, ∫_a^{+∞} f(t) dt ∈ [0, +∞]. In particular, their integral ∫_a^{+∞} f(t) dt either converges or diverges positively: tertium non datur. We have convergence if and only if sup_{x∈[a,+∞)} F(x) < +∞, and only if f is infinitesimal as x → +∞ (provided lim_{x→+∞} f(x) exists). Otherwise, ∫_a^{+∞} f(t) dt diverges positively.

The condition lim_{x→+∞} f(x) = 0 is only necessary for convergence, as Example 1469 with 0 < α ≤ 1 shows. For instance, if α = 1 we have lim_{x→+∞} 1/x = 0, but for every a > 0 we have
$$\int_a^{+\infty} \frac{1}{t}\,dt = \lim_{x\to+\infty} \int_a^x \frac{1}{t}\,dt = \lim_{x\to+\infty} \log\frac{x}{a} = +\infty$$
and therefore ∫_a^{+∞} (1/t) dt diverges positively.

In stating the necessary condition lim_{x→+∞} f(x) = 0 we put the clause “provided this limit exists”. The next simple example shows that the clause is important because the limit may not exist even if the integral ∫_a^{+∞} f(t) dt converges.

Example 1482 Let f : [0, ∞) → R be given by
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{N} \\ 0 & \text{otherwise} \end{cases}$$
By Proposition 1420, it is easy to see that ∫_0^x f(t) dt = 0 for every x > 0 and, therefore, ∫_0^{+∞} f(x) dx = 0. Nevertheless, lim_{x→+∞} f(x) does not exist. N
The proof of Proposition 1481 rests on the following simple property of limits of monotonic functions, which is the version for functions of Theorem 299 for monotonic sequences.

Lemma 1483 Let φ : [a, +∞) → R be an increasing function. Then, lim_{x→+∞} φ(x) = sup_{x∈[a,+∞)} φ(x).

Proof Let us first consider the case sup_{x∈[a,+∞)} φ(x) ∈ R. Let ε > 0. Since sup_{x∈[a,+∞)} φ(x) = sup φ([a, +∞)), thanks to Proposition 120 there exists x_ε ∈ [a, +∞) such that φ(x_ε) > sup_{x∈[a,+∞)} φ(x) − ε. Since φ is increasing, we have
$$\sup_{x\in[a,+\infty)} \varphi(x) - \varepsilon < \varphi(x_\varepsilon) \le \varphi(x) \le \sup_{x\in[a,+\infty)} \varphi(x) \qquad \forall x \ge x_\varepsilon$$
So, lim_{x→+∞} φ(x) = sup_{x∈[a,+∞)} φ(x).

Suppose now that sup_{x∈[a,+∞)} φ(x) = +∞. For every M > 0 there exists x_M ∈ [a, +∞) such that φ(x_M) ≥ M. The increasing monotonicity implies φ(x) ≥ φ(x_M) ≥ M for every x ≥ x_M, and therefore lim_{x→+∞} φ(x) = +∞. □

Proof of Proposition 1481 Since f is positive, its integral function F : [a, +∞) → R is increasing and therefore, by Lemma 1483,
$$\lim_{x\to+\infty} F(x) = \sup_{x\in[a,+\infty)} F(x)$$
Suppose that lim_{x→+∞} f(x) exists. Let us show that the integral converges only if lim_{x→+∞} f(x) = 0. Suppose, by contradiction, that lim_{x→+∞} f(x) = L ∈ (0, +∞]. Given 0 < ε < L, there exists x_ε > a such that f(x) ≥ L − ε > 0 for every x ≥ x_ε. Therefore
$$\int_a^{+\infty} f(t)\,dt = \int_a^{x_\varepsilon} f(t)\,dt + \int_{x_\varepsilon}^{+\infty} f(t)\,dt \ge \int_{x_\varepsilon}^{+\infty} f(t)\,dt = \lim_{x\to+\infty} \int_{x_\varepsilon}^{x} f(t)\,dt \ge \lim_{x\to+\infty} \int_{x_\varepsilon}^{x} (L-\varepsilon)\,dt = (L-\varepsilon)\lim_{x\to+\infty}(x - x_\varepsilon) = +\infty$$
i.e., ∫_a^{+∞} f(t) dt diverges positively. □

The next result is a simple comparison criterion to determine whether the improper integral of a positive function is convergent or divergent.

Corollary 1484 Let f, g : [a, +∞) → R be two positive functions integrable on every [a, b] ⊆ [a, +∞), with f ≤ g. Then
$$\int_a^{+\infty} g(x)\,dx \in [0,\infty) \implies \int_a^{+\infty} f(x)\,dx \in [0,\infty) \tag{35.73}$$
and
$$\int_a^{+\infty} f(x)\,dx = +\infty \implies \int_a^{+\infty} g(x)\,dx = +\infty \tag{35.74}$$

The study of the integral (35.69) of the Gaussian function f(x) = e^{−x²}, to which we will devote the next section, is a remarkable application of this corollary.

Proof By Proposition 1480, ∫_a^{+∞} f(x) dx ≤ ∫_a^{+∞} g(x) dx, while thanks to Proposition 1481 we have ∫_a^{+∞} f(x) dx ∈ [0, +∞] and ∫_a^{+∞} g(x) dx ∈ [0, +∞]. Therefore, ∫_a^{+∞} f(x) dx converges if ∫_a^{+∞} g(x) dx converges, while ∫_a^{+∞} g(x) dx diverges positively if ∫_a^{+∞} f(x) dx diverges positively. □

Finally, we report an important asymptotic criterion of integrability based on the asymptotic behavior of the integrand. We omit the proof.

Proposition 1485 Let f, g : [a, +∞) → R be positive functions integrable on every interval [a, b] ⊆ [a, +∞).

(i) If f ∼ g as x → +∞, then ∫_a^{+∞} g(x) dx converges (diverges positively) if and only if ∫_a^{+∞} f(x) dx converges (diverges positively).

(ii) If f = o(g) as x → +∞ and ∫_a^{+∞} g(x) dx converges, then so does ∫_a^{+∞} f(x) dx.

(iii) If f = o(g) as x → +∞ and ∫_a^{+∞} f(x) dx diverges positively, then so does ∫_a^{+∞} g(x) dx.

In light of Example 1469, Proposition 1485 implies that ∫_a^{+∞} f(x) dx converges if there exists α > 1 such that
$$f \sim \frac{1}{x^{\alpha}} \quad\text{or}\quad f = o\left(\frac{1}{x^{\alpha}}\right) \quad\text{as } x \to +\infty$$
The comparison with the powers 1/x^α is an important convergence criterion for improper integrals, as the next two examples show.

Example 1486 Let f : [0, ∞) → R be the positive function given by
$$f(x) = \frac{\sin^3\frac{1}{x} + \frac{1}{x^2}}{\frac{1}{x} + \frac{1}{x^3}}$$
As x → +∞, we have
$$f \sim \frac{1}{x}$$
By Proposition 1485, ∫_0^{+∞} f(x) dx = +∞, i.e., the integral diverges positively. N

Example 1487 Let f : [0, ∞) → R be the positive function given by
$$f(x) = x^{\alpha}\sin\frac{1}{x}$$
with α < 0. As x → +∞, we have
$$f \sim \frac{1}{x^{1-\alpha}}$$
By Proposition 1485, ∫_0^{+∞} f(x) dx ∈ [0, ∞), i.e., the integral converges. N

N.B. As the reader can check, what has been proved for positive functions extends easily to functions f : [a, +∞) → R that are eventually positive, that is, such that there exists c > a for which f(x) ≥ 0 for every x ≥ c. O

35.11.3 Gauss integral

Consider the Gaussian function f : R → R given by f(x) = e^{−x²}. Since it is positive, Proposition 1481 guarantees that the improper integral ∫_a^{+∞} f(x) dx exists for every a ∈ R. Let us show that it converges. Define g : R → R by
$$g(x) = e^{-x}$$
If x > 0, we have
$$f(x) \le g(x) \iff e^{-x^2} \le e^{-x} \iff x \le x^2 \iff x \ge 1$$
By (35.73) of Corollary 1484, if ∫_1^{+∞} g(x) dx converges, then ∫_1^{+∞} f(x) dx also converges. In turn, this implies that ∫_a^{+∞} f(x) dx converges for every a ∈ R. This is obvious if a ≥ 1. If a < 1, we have
$$\int_a^{+\infty} f(x)\,dx = \int_a^{1} f(x)\,dx + \int_1^{+\infty} f(x)\,dx$$
Since ∫_a^1 f(x) dx exists because of the continuity of f on [a, 1], the convergence of ∫_1^{+∞} f(x) dx then implies that of ∫_a^{+∞} f(x) dx.

Thus, it remains to show that ∫_1^{+∞} g(x) dx converges. We have
$$G(x) = \int_1^x g(t)\,dt = e^{-1} - e^{-x}$$
Hence, (35.72) implies
$$\int_1^{+\infty} g(x)\,dx = \sup_{x\in[1,\infty)} G(x) = e^{-1} < +\infty$$
It follows that ∫_1^{+∞} f(x) dx converges, as desired.

In conclusion, the integral
$$\int_a^{+\infty} e^{-x^2}\,dx$$
is convergent for every a ∈ R. By Proposition 1463, this integral cannot be computed in closed form. Indeed, its computation is not simple at all and, although we omit the proof, we report a beautiful result for a = 0 due to Gauss (here as never princeps mathematicorum).

Theorem 1488 (Gauss) It holds that
$$\int_0^{+\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \tag{35.75}$$

It is possible to prove in a similar way that
$$\int_{-\infty}^{0} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \tag{35.76}$$
The equality between the integrals (35.75) and (35.76) is quite intuitive in light of the symmetry of the Gaussian bell with respect to the vertical axis.

Thanks to Definition 1472, the Gauss integral – i.e., the integral of the Gaussian function – therefore has value
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \int_0^{+\infty} e^{-x^2}\,dx + \int_{-\infty}^{0} e^{-x^2}\,dx = \sqrt{\pi} \tag{35.77}$$

The Gauss integral is central in probability theory, where it is usually presented in the form:
$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx$$
By proceeding by substitution, it is easy to verify that, for every a ∈ R and b > 0, one has
$$\int_{-\infty}^{+\infty} e^{-\frac{(x+a)^2}{b^2}}\,dx = b\sqrt{\pi} \tag{35.78}$$
By setting b = √2 and a = 0, we then have
$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = 1$$
The improper integral on R of the function
$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$$
therefore has unit value and, thus, it is a density function (as will be seen in Section 38.1). This explains the importance of this particular form of the Gaussian function.
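The value (35.77) can be checked numerically: since the Gaussian tails vanish extremely fast, truncating the domain at |x| = 10 already matches √π to many digits. A minimal Python sketch:

import numpy as np

x = np.linspace(-10.0, 10.0, 200001)
print(np.trapz(np.exp(-x**2), x))  # approximately 1.7724538...
print(np.sqrt(np.pi))              # 1.7724538...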

35.11.4 Unbounded functions

Another case of improper integral involves a function continuous on a bounded interval [a, b] except at some points in whose neighborhood it is unbounded (that is, the limit of the function at such points is ±∞). It is enough to consider the case of only one such point (when there are a few of them, it is enough to examine them one by one). Next we consider the case in which this point is the supremum b of the interval.

Definition 1489 Let f : [a, b) → R be a continuous function such that lim_{x→b⁻} f(x) = ±∞. If
$$\lim_{z\to b^-} \int_a^z f(x)\,dx = \lim_{z\to b^-}\left[F(z) - F(a)\right]$$
exists (finite or infinite), the function f is said to be integrable in an improper sense on [a, b] and this limit is taken as ∫_a^b f(x) dx. The value ∫_a^b f(x) dx is called the improper (or generalized) Riemann integral.

If the unboundedness of the function concerns the other endpoint a, or both endpoints, we can give a similar definition. If the unboundedness concerns an interior point c ∈ (a, b), it is enough to consider separately the two intervals [a, c] and [c, b].

Example 1490 Let f : [a, b) → R be given by
$$f(x) = \frac{1}{(b-x)^{1/\alpha}} \qquad\text{with } \alpha > 0$$
A primitive of f is
$$F(x) = \begin{cases} -\dfrac{(b-x)^{-\frac{1}{\alpha}+1}}{-\frac{1}{\alpha}+1} & \text{for } 0 < \alpha \ne 1 \\[2ex] -\log|b-x| & \text{for } \alpha = 1 \end{cases}$$
So,
$$\lim_{x\to b^-} F(x) = \begin{cases} 0 & \text{if } \alpha > 1 \\ +\infty & \text{if } 0 < \alpha \le 1 \end{cases}$$
It follows that the improper integral
$$\int_a^b \frac{1}{(b-x)^{1/\alpha}}\,dx$$
exists for every α > 0: it converges if α > 1 and diverges positively if 0 < α ≤ 1. N

Proposition 1485 holds also for these improper integrals and allows us to state that ∫_a^b f(x) dx converges if there exists α > 1 such that
$$f \sim \frac{1}{(b-x)^{1/\alpha}} \quad\text{or}\quad f = o\left(\frac{1}{(b-x)^{1/\alpha}}\right) \quad\text{as } x \to b^-$$
The comparison with the powers (b − x)^{−1/α} is an important convergence criterion for these improper integrals.

O.R. When the interval is unbounded, for the improper integral to converge the function must tend to zero quite rapidly (as x^{−α} with α > 1). When the function is unbounded, for the improper integral to converge the function must tend to infinity fairly slowly – as (b − x)^{−1/α} with α > 1. Both things are quite intuitive: for the area of an unbounded surface to be finite, its portion “that escapes to infinity” must be very narrow.

For example, the function f : R₊ → R₊ defined by f(x) = 1/x is not integrable either on intervals of the type [a, +∞), with a > 0, or on intervals of the type [0, a]: indeed the integral function of f is F(x) = log x, which diverges when x → +∞ as well as when x → 0⁺. The functions (asymptotic to) 1/x^{1+ε}, with ε > 0, are instead integrable on intervals of the type [b, +∞), b > 0, while the functions (asymptotic to) 1/x^{1−ε} are integrable on intervals of the type [0, b]. H
Chapter 36

Parameter-dependent integrals

Consider a function of two variables f : [a, b] × [c, d] → R defined on a rectangle [a, b] × [c, d] in R². If for every y ∈ [c, d] the scalar function f(·, y) : [a, b] → R is integrable on [a, b], then to every such y we can associate the scalar
$$\int_a^b f(x,y)\,dx \tag{36.1}$$
Unlike the integrals seen so far, the value of the definite integral (36.1) depends on the value of the variable y, which is usually interpreted as a parameter. Such an integral, referred to as a parameter-dependent integral, therefore defines a scalar function F : [c, d] → R in the following way:
$$F(y) = \int_a^b f(x,y)\,dx \tag{36.2}$$
Note that, although the function f is of two variables, the function F is scalar. Indeed, it does not depend in any way on the variable x, which here plays the role of a mute variable of integration.

Functions of type (36.2) appear in applications more frequently than one may initially think. Therefore, having the appropriate instruments to study them is important.

36.1 Properties

We will study two properties of the function F, namely continuity and differentiability. Let us start with continuity.

Proposition 1491 If f : [a, b] × [c, d] → R is continuous, then the function F : [c, d] → R is continuous, that is,
$$\lim_{y\to y_0} F(y) = \int_a^b \lim_{y\to y_0} f(x,y)\,dx \qquad \forall y_0 \in [c,d] \tag{36.3}$$

Formula (36.3) is referred to as the “passage of the limit under the integral sign”.

Proof Take ε > 0. We must show that there exists a δ > 0 such that
$$y \in [c,d] \cap (y_0 - \delta, y_0 + \delta) \implies |F(y) - F(y_0)| < \varepsilon$$
By using the properties of integrals, we have
$$|F(y) - F(y_0)| = \left|\int_a^b \left(f(x,y) - f(x,y_0)\right)dx\right| \le \int_a^b |f(x,y) - f(x,y_0)|\,dx$$
By hypothesis, f is continuous on the compact set [a, b] × [c, d]. By Theorem 526, it is therefore uniformly continuous on [a, b] × [c, d], so there is a δ > 0 such that
$$\|(x,y) - (x_0,y_0)\| < \delta \implies |f(x,y) - f(x_0,y_0)| < \frac{\varepsilon}{b-a} \tag{36.4}$$
for every (x, y), (x₀, y₀) ∈ [a, b] × [c, d]. Therefore, for every y ∈ [c, d] ∩ (y₀ − δ, y₀ + δ) we have
$$\|(x,y) - (x,y_0)\| = |y - y_0| < \delta$$
which, thanks to (36.4), implies that
$$|F(y) - F(y_0)| \le \int_a^b |f(x,y) - f(x,y_0)|\,dx < \frac{\varepsilon}{b-a}(b-a) = \varepsilon$$
as desired. □

The second result analyzes the differentiability of the function F.

Proposition 1492 Suppose that f : [a, b] × [c, d] → R and its partial derivative ∂f/∂y are both continuous on [a, b] × [c, d]. Then, the function F : [c, d] → R is differentiable on (c, d), with
$$F'(y) = \int_a^b \frac{\partial}{\partial y} f(x,y)\,dx \tag{36.5}$$

Formula (36.5) is referred to as “differentiation under the integral sign”. Since
$$F'(y) = \lim_{h\to 0} \frac{F(y+h) - F(y)}{h} = \lim_{h\to 0} \int_a^b \frac{f(x,y+h) - f(x,y)}{h}\,dx$$
and
$$\int_a^b \frac{\partial}{\partial y} f(x,y)\,dx = \int_a^b \lim_{h\to 0} \frac{f(x,y+h) - f(x,y)}{h}\,dx$$
formula (36.5) is then equivalent to
$$\lim_{h\to 0} \int_a^b \frac{f(x,y+h) - f(x,y)}{h}\,dx = \int_a^b \lim_{h\to 0} \frac{f(x,y+h) - f(x,y)}{h}\,dx$$
that is, to exchanging the order of limits and integrals.

Proof Let y₀ ∈ (c, d). For every x ∈ [a, b] the function f(x, ·) : [c, d] → R is by hypothesis differentiable. By the Mean Value Theorem, there then exists ϑₓ ∈ [0, 1] such that
$$\frac{f(x, y_0 + h) - f(x, y_0)}{h} = \frac{\partial f}{\partial y}(x, y_0 + \vartheta_x h)$$
Note that ϑₓ depends on x. Let us write the difference quotient of the function F at y₀:
$$\left|\frac{F(y_0+h) - F(y_0)}{h} - \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx\right| = \left|\int_a^b \frac{f(x,y_0+h) - f(x,y_0)}{h}\,dx - \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx\right|$$
$$= \left|\int_a^b \left(\frac{\partial f}{\partial y}(x, y_0 + \vartheta_x h) - \frac{\partial f}{\partial y}(x, y_0)\right)dx\right| \le \int_a^b \left|\frac{\partial f}{\partial y}(x, y_0 + \vartheta_x h) - \frac{\partial f}{\partial y}(x, y_0)\right| dx \tag{36.6}$$
The partial derivative ∂f/∂y is continuous on the compact set [a, b] × [c, d], so it is also uniformly continuous. Thus, given any ε > 0, there exists a δ > 0 such that
$$\|(x,y) - (x,y_0)\| < \delta \implies \left|\frac{\partial f}{\partial y}(x,y) - \frac{\partial f}{\partial y}(x,y_0)\right| < \frac{\varepsilon}{b-a} \tag{36.7}$$
for every y ∈ [c, d]. Therefore, for |h| < δ we have that
$$\|(x, y_0 + \vartheta_x h) - (x, y_0)\| = \vartheta_x |h| \le |h| < \delta \qquad \forall x \in [a,b]$$
Thanks to conditions (36.6) and (36.7), this implies that
$$\left|\frac{F(y_0+h) - F(y_0)}{h} - \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx\right| < \varepsilon \qquad \forall\, 0 < |h| < \delta$$
Since ε > 0 is arbitrary, it follows that
$$\lim_{h\to 0} \frac{F(y_0+h) - F(y_0)}{h} = \int_a^b \frac{\partial f}{\partial y}(x, y_0)\,dx$$
as desired. □

Example 1493 Set f(x, y) = x² + xy² and
$$F(y) = \int_a^b \left(x^2 + xy^2\right)dx$$
As the hypotheses of Proposition 1492 are satisfied, we can differentiate under the integral sign:
$$F'(y) = \int_a^b 2xy\,dx = 2y\int_a^b x\,dx = y\left(b^2 - a^2\right)$$
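Formula (36.5) can be verified numerically on this example, comparing a central difference quotient of F with the closed form y(b² − a²). A minimal Python sketch (the values of a, b, y and the step h are arbitrary choices):

import numpy as np

a, b, y, h = 0.0, 2.0, 1.5, 1e-6
x = np.linspace(a, b, 100001)
F = lambda y: np.trapz(x**2 + x * y**2, x)  # trapezoidal approximation of F(y)
print((F(y + h) - F(y - h)) / (2 * h))      # approximately 6.0
print(y * (b**2 - a**2))                    # 6.0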

36.2 Variability: Leibniz's rule

Consider the general case in which also the limits of the integral are functions of the variable y. Specifically, let
$$\alpha, \beta : [c,d] \to [a,b]$$
be two functions defined on [c, d] taking values in [a, b]. Given f : [a, b] × [c, d] ⊆ R² → R, define G : [c, d] → R by
$$G(y) = \int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx \tag{36.8}$$
The following result extends Proposition 1492 to the case of variable limits of integration.

Proposition 1494 Suppose that f : [a, b] × [c, d] ⊆ R² → R and its partial derivative ∂f/∂y are both continuous on [a, b] × [c, d]. If α, β : [c, d] → [a, b] are differentiable, then the function G : [c, d] → R is differentiable on (c, d), with
$$G'(y) = \int_{\alpha(y)}^{\beta(y)} \frac{\partial f}{\partial y}(x,y)\,dx + \beta'(y)\,f(\beta(y), y) - \alpha'(y)\,f(\alpha(y), y) \tag{36.9}$$

Formula (36.9) is referred to as Leibniz's rule.



Proof Let H : [a, b] × [a, b] × [c, d] → R be given by
$$H(v, z, y) = \int_v^z f(x,y)\,dx$$
Since
$$G(y) = H(\alpha(y), \beta(y), y)$$
the derivative of G with respect to y at a point y₀ ∈ (c, d) can be calculated via the chain rule:
$$G'(y_0) = \frac{\partial H}{\partial v}(a_0, b_0, y_0)\,\alpha'(y_0) + \frac{\partial H}{\partial z}(a_0, b_0, y_0)\,\beta'(y_0) + \frac{\partial H}{\partial y}(a_0, b_0, y_0) \tag{36.10}$$
where a₀ = α(y₀) and b₀ = β(y₀). By Proposition 1492, we have
$$\frac{\partial H}{\partial y}(a_0, b_0, y_0) = \int_{a_0}^{b_0} \frac{\partial}{\partial y} f(x,y_0)\,dx \tag{36.11}$$
and, by the Second Fundamental Theorem of Calculus, we have
$$\frac{\partial H}{\partial z}(a_0, b_0, y_0) = f(b_0, y_0) \quad\text{and}\quad \frac{\partial H}{\partial v}(a_0, b_0, y_0) = -f(a_0, y_0) \tag{36.12}$$
In conclusion,
$$G'(y_0) = -f(\alpha(y_0), y_0)\,\alpha'(y_0) + f(\beta(y_0), y_0)\,\beta'(y_0) + \int_{\alpha(y_0)}^{\beta(y_0)} \frac{\partial}{\partial y} f(x,y_0)\,dx$$
as desired. □

Example 1495 Let f(x, y) = x² + y², α(y) = sin y and β(y) = cos y. Set
$$G(y) = \int_{\sin y}^{\cos y} \left(x^2 + y^2\right)dx$$
The hypotheses of Proposition 1494 are satisfied, so by Leibniz's rule we have:
$$G'(y) = \int_{\sin y}^{\cos y} 2y\,dx - \sin y\left(\cos^2 y + y^2\right) - \cos y\left(\sin^2 y + y^2\right)$$
$$= 2y\left(\cos y - \sin y\right) - \sin y\left(\cos^2 y + y^2\right) - \cos y\left(\sin^2 y + y^2\right)$$
N
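Leibniz's rule can be verified numerically on this example as well. A minimal Python sketch (the point y = 0.7 and the step h are arbitrary choices):

import numpy as np

def G(y):
    # trapezoidal approximation of G(y) = \int_{sin y}^{cos y} (x^2 + y^2) dx
    x = np.linspace(np.sin(y), np.cos(y), 100001)
    return np.trapz(x**2 + y**2, x)

y, h = 0.7, 1e-6
leibniz = (2 * y * (np.cos(y) - np.sin(y))
           - np.sin(y) * (np.cos(y)**2 + y**2)
           - np.cos(y) * (np.sin(y)**2 + y**2))
print((G(y + h) - G(y - h)) / (2 * h), leibniz)  # approximately equal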

36.3 Improper integrals

In applications the parameter-dependent integral (36.1) is often improper. Let f : I × J ⊆ R² → R be a function defined on the rectangle I × J in R² whose “sides” I and J are any two closed, bounded or unbounded, intervals of the real line. For example, if I = R and if the improper integral ∫_{−∞}^{+∞} f(x, y) dx converges for every y ∈ J, then the function F : J → R is defined by
$$F(y) = \int_{-\infty}^{+\infty} f(x,y)\,dx \tag{36.13}$$
The extension of Proposition 1492 to the improper case is a delicate issue that requires a dominance condition. For simplicity, in the statement we make the assumption that I is the real line and J a closed and bounded interval. An analogous result, which we omit for brevity, holds when I is a half-line and J an unbounded interval.

Proposition 1496 Let f : R × [c, d] → R be continuous on R × [c, d] and differentiable in y for every x ∈ R. If there exists a positive function g : R → R such that ∫_{−∞}^{+∞} g(x) dx < +∞ and, for every y ∈ [c, d],
$$|f(x,y)| \le g(x) \qquad \forall x \in \mathbb{R} \tag{36.14}$$
then the function F : [c, d] → R is differentiable on (c, d), with
$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y} f(x,y)\,dx \tag{36.15}$$

The proof of this result is not simple, so we omit it. Note that the dominance condition (36.14), which is based on the auxiliary function g, guarantees inter alia that the integral ∫_{−∞}^{+∞} f(x, y) dx converges (thanks to the comparison convergence criterion stated in Corollary 1484).

Example 1497 Let $F : [c,d] \to \mathbb{R}$ be given by

$$F(y) = \int_{-\infty}^{+\infty} \sin x \, e^{-y^2 x^2}\,dx$$

with $c \geq 1$ or $d \leq -1$. Let g be the Gaussian function, that is, $g(x) = e^{-x^2}$. For every $y \in [c,d]$ we have $y^2 \geq 1$, so

$$\left|\sin x \, e^{-y^2 x^2}\right| = |\sin x|\,e^{-y^2 x^2} \leq e^{-y^2 x^2} \leq e^{-x^2} = g(x)$$

Moreover, $\int_{-\infty}^{+\infty} e^{-x^2}\,dx < +\infty$. The hypotheses of Proposition 1496 are satisfied, so formula (36.15) takes the form

$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y}\left(\sin x \, e^{-y^2 x^2}\right) dx = -2y \int_{-\infty}^{+\infty} x^2 \sin x \, e^{-y^2 x^2}\,dx$$

(in fact, both integrals vanish, since the integrands are odd functions of x). N
Chapter 37

Stieltjes' integral

Stieltjes' integral is an important generalization of Riemann's integral often used in applications. It can be thought of in the following way: while Riemann's integral is based on sums such as

$$\sum_{k=1}^n m_k (x_k - x_{k-1}) \quad\text{and}\quad \sum_{k=1}^n M_k (x_k - x_{k-1}) \tag{37.1}$$

the Stieltjes' integral is based on sums such as

$$\sum_{k=1}^n m_k \left(g(x_k) - g(x_{k-1})\right) \quad\text{and}\quad \sum_{k=1}^n M_k \left(g(x_k) - g(x_{k-1})\right) \tag{37.2}$$

where g is a scalar function. Clearly, (37.1) is the special case of (37.2) that corresponds to the identity function $g(x) = x$.
But why are the more general sums (37.2) relevant? Recall that the sums (37.1) arise in Riemann integration because every interval $[x_{i-1}, x_i]$ obtained by subdividing $[a,b]$ is measured according to its length $\Delta x_i = x_i - x_{i-1}$. Clearly, the length is a most natural way to measure an interval. However, it is not the only way: in some problems it might be more natural to measure an interval differently. For example, if $[x_{i-1}, x_i]$ represents levels of production between $x_{i-1}$ and $x_i$, the most appropriate economic measure for such an interval may be the additional cost that a higher production level entails: if $C(x)$ is the total cost of producing x, the measure assigned to $[x_{i-1}, x_i]$ is then the difference $C(x_i) - C(x_{i-1})$. If $[x_{i-1}, x_i]$ represents, instead, an interval in which a random variable may take values and $F(x)$ is the probability that such value is $\leq x$, then the most natural way to measure $[x_{i-1}, x_i]$ is the difference $F(x_i) - F(x_{i-1})$. In such cases, which are quite common in economic applications (see, e.g., Section 37.8), the Stieltjes' integral is the natural notion of integral to use.

Besides its interest for applications, Stieltjes integration also sheds further light on Riemann integration. Indeed, we will see in this chapter that some results that we established for Riemann's integral are actually best understood in terms of the more general Stieltjes' integral.


37.1 Definition

Consider two functions $f, g : [a,b] \subseteq \mathbb{R} \to \mathbb{R}$, with f bounded and g increasing.¹ For every subdivision $\pi = \{a = x_0, x_1, \dots, x_n = b\}$ of $[a,b]$ and for every interval $I_i = [x_{i-1}, x_i]$, we can define the following quantities

$$m_i = \inf_{x \in I_i} f(x) \quad\text{and}\quad M_i = \sup_{x \in I_i} f(x)$$

Since f is bounded, such quantities are finite. The sum

$$I(\pi, f, g) = \sum_{i=1}^n m_i \left(g(x_i) - g(x_{i-1})\right)$$

is referred to as the lower Stieltjes sum, while

$$S(\pi, f, g) = \sum_{i=1}^n M_i \left(g(x_i) - g(x_{i-1})\right)$$

is referred to as the upper Stieltjes sum. It can be easily shown that, for every subdivision $\pi$ of $[a,b]$, we have

$$I(\pi, f, g) \leq S(\pi, f, g)$$
When the supremum of the lower sums and the infimum of the upper sums coincide, we get Stieltjes' integral.

Definition 1498 A bounded function $f : [a,b] \to \mathbb{R}$ is said to be integrable in the sense of Stieltjes (or Stieltjes integrable) with respect to an increasing function g if

$$\sup_{\pi \in \Pi([a,b])} I(\pi, f, g) = \inf_{\pi \in \Pi([a,b])} S(\pi, f, g)$$

where $\Pi([a,b])$ denotes the collection of all subdivisions of $[a,b]$. The common value, denoted by $\int_a^b f(x)\,dg(x)$, is called the integral in the sense of Stieltjes (or Stieltjes' integral) of f with respect to g on $[a,b]$.

When $g(x) = x$, we get back to Riemann's integral. The functions f and g are called integrand function and integrator function, respectively. For brevity, we will often write $\int_a^b f\,dg$, thus omitting the arguments of such functions.

N.B. In the rest of the chapter we will tacitly assume f and g to be any two scalar functions defined on $[a,b]$, with f bounded and g increasing. O
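Definition 1498 is directly computable. The sketch below is our own illustration, with an arbitrarily chosen test pair: it evaluates the lower and upper Stieltjes sums on a uniform subdivision and shows them squeezing the value of the integral.

```python
def stieltjes_sums(f, g, a, b, n):
    """Lower and upper Stieltjes sums I(pi, f, g) and S(pi, f, g) on the
    uniform subdivision pi = {a = x_0 < x_1 < ... < x_n = b}. Endpoint
    min/max give the exact inf/sup here because f is monotone."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = sum(min(f(u), f(v)) * (g(v) - g(u)) for u, v in zip(xs, xs[1:]))
    upper = sum(max(f(u), f(v)) * (g(v) - g(u)) for u, v in zip(xs, xs[1:]))
    return lower, upper

# test pair of ours: f(x) = x^2 with integrator g(x) = x^3 on [0, 1];
# the integral is 3/5, since x^2 dg = x^2 (3x^2) dx (cf. Section 37.3)
print(stieltjes_sums(lambda x: x * x, lambda x: x ** 3, 0.0, 1.0, 2000))
# -> two values squeezing 0.6
```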

37.2 Integrability criteria

There exist a few integrability criteria ensuring that a function f is Stieltjes integrable with respect to a function g. Needless to say, when g is the identity function we get back to integrability criteria for the Riemann's integral.

We begin with a criterion that extends the criterion established in Proposition 1419 for Riemann's integral (the proof is analogous, so it is omitted).
¹ If g were decreasing, we could consider $h = -g$ instead, which is clearly increasing.

Proposition 1499 The function f is Stieltjes integrable with respect to g if, for every $\varepsilon > 0$, there exists a subdivision $\pi \in \Pi([a,b])$ such that $S(\pi, f, g) - I(\pi, f, g) < \varepsilon$.

As for Riemann's integral, it is important to know which classes of functions are integrable. As one may expect, the answer depends on the regularity of both functions f and g (recall that we assumed g to be increasing).

Proposition 1500 The integral $\int_a^b f\,dg$ exists if at least one of the following two conditions is satisfied:

(i) f is continuous;

(ii) f is monotone and g is continuous.

Note that (i) and (ii) generalize, respectively, Propositions 1425 and 1428 for Riemann's integral.

Proof (i) The proof relies on the same steps as that of Proposition 1425. Since f is continuous on $[a,b]$, it is also bounded (Weierstrass' Theorem) and uniformly continuous (Theorem 526). Take $\varepsilon > 0$. There exists a $\delta_\varepsilon > 0$ such that

$$|x - y| < \delta_\varepsilon \implies |f(x) - f(y)| < \varepsilon \qquad \forall x, y \in [a,b] \tag{37.3}$$

Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. By condition (37.3), for every $i = 1, 2, \dots, n$ we have

$$\max_{x \in [x_{i-1}, x_i]} f(x) - \min_{x \in [x_{i-1}, x_i]} f(x) < \varepsilon$$

where max and min exist by Weierstrass' Theorem. It follows that

$$S(\pi, f, g) - I(\pi, f, g) = \sum_{i=1}^n \max_{x \in [x_{i-1}, x_i]} f(x)\left(g(x_i) - g(x_{i-1})\right) - \sum_{i=1}^n \min_{x \in [x_{i-1}, x_i]} f(x)\left(g(x_i) - g(x_{i-1})\right)$$
$$= \sum_{i=1}^n \left(\max_{x \in [x_{i-1}, x_i]} f(x) - \min_{x \in [x_{i-1}, x_i]} f(x)\right)\left(g(x_i) - g(x_{i-1})\right)$$
$$< \varepsilon \sum_{i=1}^n \left(g(x_i) - g(x_{i-1})\right) = \varepsilon\left(g(b) - g(a)\right)$$

By Proposition 1499, f is integrable.

(ii) Since g is continuous on $[a,b]$, it is also bounded and uniformly continuous. Let $\varepsilon > 0$. There is a $\delta_\varepsilon > 0$ such that

$$|x - y| < \delta_\varepsilon \implies |g(x) - g(y)| < \varepsilon \qquad \forall x, y \in [a,b]$$

Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. For every pair of consecutive points of such a subdivision, we have $g(x_i) - g(x_{i-1}) = |g(x_i) - g(x_{i-1})| < \varepsilon$. The proof now follows the same steps as that of Proposition 1428. Suppose that f is increasing (if f is decreasing the argument is analogous). We have

$$\inf_{x \in [x_{i-1}, x_i]} f(x) = f(x_{i-1}) \geq f(a) \quad\text{and}\quad \sup_{x \in [x_{i-1}, x_i]} f(x) = f(x_i) \leq f(b)$$

so that

$$S(\pi, f, g) - I(\pi, f, g) = \sum_{i=1}^n \sup_{x \in [x_{i-1}, x_i]} f(x)\left(g(x_i) - g(x_{i-1})\right) - \sum_{i=1}^n \inf_{x \in [x_{i-1}, x_i]} f(x)\left(g(x_i) - g(x_{i-1})\right)$$
$$= \sum_{i=1}^n f(x_i)\left(g(x_i) - g(x_{i-1})\right) - \sum_{i=1}^n f(x_{i-1})\left(g(x_i) - g(x_{i-1})\right)$$
$$= \sum_{i=1}^n \left(f(x_i) - f(x_{i-1})\right)\left(g(x_i) - g(x_{i-1})\right)$$
$$< \varepsilon \sum_{i=1}^n \left(f(x_i) - f(x_{i-1})\right) = \varepsilon\left(f(b) - f(a)\right)$$

By Proposition 1499, the function f is integrable.

Lastly, we extend Proposition 1426 to Stieltjes' integral by requiring that g does not share discontinuities with f.

Proposition 1501 If f has finitely many discontinuities and g is continuous at such points,² then f is Stieltjes integrable with respect to g.

We omit the proof of this remarkable result which, inter alia, generalizes Proposition 1500-(i). However, while Proposition 1426 allowed for infinitely many discontinuities, in this more general setting we restrict ourselves to finitely many ones.

37.3 Calculus

When g is differentiable, the Stieltjes' integral can be written as a Riemann's integral.

Proposition 1502 Let g be differentiable and $g'$ Riemann integrable. Then f is Stieltjes integrable with respect to g if and only if $fg'$ is Riemann integrable. In such a case, we have

$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\,g'(x)\,dx \tag{37.4}$$

Proof Since $g'$ is Riemann integrable, for any given $\varepsilon > 0$ there exists a subdivision $\pi$ such that

$$S(g', \pi) - I(g', \pi) < \varepsilon$$

² In other words, we require the two functions f and g not to be discontinuous at the same points.

That is, denoting by $I_i = [x_{i-1}, x_i]$ the generic i-th interval of the subdivision $\pi$,

$$\sum_{i=1}^n \left(\sup_{x \in I_i} g'(x) - \inf_{x \in I_i} g'(x)\right) \Delta x_i < \varepsilon \tag{37.5}$$

From (37.5) we also deduce that, for any pair of points $s_i, t_i \in I_i$, we have

$$\sum_{i=1}^n \left|g'(s_i) - g'(t_i)\right| \Delta x_i < \varepsilon \tag{37.6}$$

Always referring to the generic interval $I_i$ of the subdivision, we can observe that, thanks to the differentiability of g (via the Mean Value Theorem), there is a point $t_i \in [x_{i-1}, x_i]$ such that

$$\Delta g_i = g(x_i) - g(x_{i-1}) = g'(t_i)\,\Delta x_i$$

So $\sum_{i=1}^n f(s_i)\,\Delta g_i = \sum_{i=1}^n f(s_i)\,g'(t_i)\,\Delta x_i$. By denoting $M = \sup_{[a,b]} |f(x)|$ and using inequality (37.6), we have

$$\left|\sum_{i=1}^n f(s_i)\,\Delta g_i - \sum_{i=1}^n f(s_i)\,g'(s_i)\,\Delta x_i\right| = \left|\sum_{i=1}^n f(s_i)\left(g'(t_i) - g'(s_i)\right)\Delta x_i\right| \leq M \sum_{i=1}^n \left|g'(s_i) - g'(t_i)\right| \Delta x_i \leq M\varepsilon$$

So,

$$-M\varepsilon \leq \sum_{i=1}^n f(s_i)\,\Delta g_i - \sum_{i=1}^n f(s_i)\,g'(s_i)\,\Delta x_i \leq M\varepsilon$$

Note that $S(fg', \pi) \geq \sum_{i=1}^n f(s_i)\,g'(s_i)\,\Delta x_i$, from which $\sum_{i=1}^n f(s_i)\,\Delta g_i \leq S(fg', \pi) + M\varepsilon$, and so also

$$S(\pi, f, g) \leq S(fg', \pi) + M\varepsilon \tag{37.7}$$

One can symmetrically prove that

$$S(fg', \pi) \leq S(\pi, f, g) + M\varepsilon \tag{37.8}$$

So, by combining (37.7) and (37.8), we get that

$$\left|S(\pi, f, g) - S(fg', \pi)\right| \leq M\varepsilon \tag{37.9}$$

Inequality (37.9) holds for any subdivision $\pi$ of the interval $[a,b]$ and for every $\varepsilon > 0$. So

$$\overline{\int_a^b} f(x)\,dg(x) = \overline{\int_a^b} f(x)\,g'(x)\,dx \tag{37.10}$$

One can analogously show that

$$\underline{\int_a^b} f(x)\,dg(x) = \underline{\int_a^b} f(x)\,g'(x)\,dx \tag{37.11}$$

From (37.10) and (37.11) one can see that $fg'$ is Riemann integrable if and only if f is Stieltjes integrable with respect to g, in which case we get (37.4).

When f is continuous and g is differentiable, thanks to equation (37.4) a Stieltjes' integral can be transformed into a Riemann's integral with integrand function

$$h(x) = f(x)\,g'(x)$$

This greatly simplifies computations, because the techniques developed to solve Riemann's integrals can then be used for Stieltjes' integrals.³

From a theoretical standpoint, Stieltjes' integral substantially extends the scope of Riemann's integral, while keeping – also thanks to (37.4) – its remarkable analytical properties. Such a remarkable balance between generality and tractability explains the importance of Stieltjes' integral.

Let us conclude with a useful variation on this theme.

Proposition 1503 Let g be the integral function of a Riemann integrable function $\delta$, that is, $g(x) = \int_a^x \delta(t)\,dt$ for every $x \in [a,b]$. If f is continuous, we have

$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\,\delta(x)\,dx$$

We omit the proof of this result. However, when $\delta$ is continuous (so, Riemann integrable) it follows from the previous result because, by the Second Fundamental Theorem of Calculus, the function g is differentiable with $g' = \delta$.
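Propositions 1502 and 1503 reduce Stieltjes' integration to Riemann integration. As a sketch of ours with an arbitrary test pair, the code below compares a direct Stieltjes-sum approximation of $\int_0^1 x\,d(x^2)$ with the Riemann integral $\int_0^1 x \cdot 2x\,dx = 2/3$ prescribed by formula (37.4).

```python
def stieltjes(f, g, a, b, n=10_000):
    # Stieltjes sum with left-endpoint tags: sum of f(x_{i-1}) (g(x_i) - g(x_{i-1}))
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(u) * (g(v) - g(u)) for u, v in zip(xs, xs[1:]))

def riemann(h, a, b, n=10_000):
    # midpoint rule for the Riemann integral of h on [a, b]
    w = (b - a) / n
    return sum(h(a + (i + 0.5) * w) for i in range(n)) * w

f = lambda x: x
g = lambda x: x * x                            # integrator, with g'(x) = 2x
print(stieltjes(f, g, 0, 1))                   # ~ 2/3
print(riemann(lambda x: f(x) * 2 * x, 0, 1))   # formula (37.4): ~ 2/3
```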

37.4 Properties

Properties similar to those for Riemann's integral hold for Stieltjes' integral. The only substantial novelty lies in a linearity property that now holds with respect to both the integrand function f and the integrator function g. Next we list the properties without proving them (the proofs being similar to those of Section 35.6).

(i) Linearity with respect to the integrand function:

$$\int_a^b (\alpha f_1 + \beta f_2)\,dg = \alpha \int_a^b f_1\,dg + \beta \int_a^b f_2\,dg \qquad \forall \alpha, \beta \in \mathbb{R}$$

(ii) Positive linearity with respect to the integrator function:⁴

$$\int_a^b f\,d(\alpha g_1 + \beta g_2) = \alpha \int_a^b f\,dg_1 + \beta \int_a^b f\,dg_2 \qquad \forall \alpha, \beta \geq 0$$
³ Riemann's integral is the simplest example of (37.4), with $g'(x) = 1$.
⁴ The positivity of $\alpha$ and $\beta$ ensures that the integrator function $\alpha g_1 + \beta g_2$ is increasing.

(iii) Additivity with respect to the integration interval: for $c \in (a,b)$,

$$\int_a^b f\,dg = \int_a^c f\,dg + \int_c^b f\,dg \tag{37.12}$$

(iv) Monotonicity:

$$f_1 \leq f_2 \implies \int_a^b f_1\,dg \leq \int_a^b f_2\,dg$$

(v) Absolute value:

$$\left|\int_a^b f\,dg\right| \leq \int_a^b |f|\,dg$$

37.5 Step integrators

Riemann's integral is the special case of Stieltjes' integral in which the integrator function is the identity function $g(x) = x$. The scope of Stieltjes' integral becomes clear when we consider integrator functions that are substantially different from the identity, such as step functions.

For simplicity, in the next statement we denote the unilateral left and right limits of the integrator $g : [a,b] \to \mathbb{R}$ at a point $x_0$ by $g(x_0^-)$ and $g(x_0^+)$.⁵ The difference

$$g(x_0^+) - g(x_0^-)$$

is therefore the potential jump of g at $x_0$.
Proposition 1504 Let $f : [a,b] \to \mathbb{R}$ be continuous and $g : [a,b] \to \mathbb{R}$ be a monotone step function, with discontinuities at the points $\{c_1, \dots, c_n\}$ of the interval $[a,b]$. We have

$$\int_a^b f\,dg = \sum_{i=1}^n f(c_i)\left(g(c_i^+) - g(c_i^-)\right) \tag{37.13}$$

In other words, Stieltjes' integral is the sum of all the jumps of the integrator at the points of discontinuity, each multiplied by the value of the integrand at such points. Note that, as the integrator g is monotone, the jumps are either all positive (increasing monotonicity) or all negative (decreasing monotonicity).
Proof By Proposition 1500, the integral $\int_a^b f\,dg$ exists. We must show that its value is (37.13). Let us consider a subdivision $\pi$ of $[a,b]$ which is fine enough so that every interval $I_i = [x_{i-1}, x_i]$ contains at most one point of discontinuity $c_j$ (otherwise, it would be enough to add at most n points to obtain the desired subdivision). Therefore, we have $\pi = \{x_0, x_1, \dots, x_m\}$ with $m \geq n$. For such a subdivision, it holds that

$$I(\pi, f, g) = \sum_{i=1}^m m_i \left(g(x_i) - g(x_{i-1})\right) \tag{37.14}$$

where $m_i = \inf_{I_i} f(x)$. Consider the generic i-th term of the sum in (37.14), which refers to the interval $I_i$. There are two cases:
⁵ That is, $g(x_0^+) = \lim_{x \to x_0^+} g(x)$ and $g(x_0^-) = \lim_{x \to x_0^-} g(x)$. We also set $g(a^-) = g(a)$ and $g(b^+) = g(b)$.

1. There exists $j \in \{1, 2, \dots, n\}$ such that $c_j \in I_i$. If so, since $I_i$ does not contain any other point of discontinuity of g besides $c_j$, we have

$$g(x_{i-1}) = g(c_j^-) \quad\text{and}\quad g(x_i) = g(c_j^+)$$

and furthermore

$$f(c_j) \geq \inf_{I_i} f(x) = m_i$$

In this case it thus holds that

$$m_i \left(g(x_i) - g(x_{i-1})\right) \leq f(c_j)\left[g(c_j^+) - g(c_j^-)\right] \tag{37.15}$$

Denote by J the set of indexes $i \in \{1, 2, \dots, m\}$ such that $c_j \in I_i$ for some $j \in \{1, 2, \dots, n\}$. Clearly, $|J| = n$.

2. $I_i$ does not contain any $c_j$. In such a case, $g(x_i) = g(x_{i-1})$ and so

$$m_i \left(g(x_i) - g(x_{i-1})\right) = 0 \tag{37.16}$$

Let us denote by $J^c$ the set of indexes $i \in \{1, 2, \dots, m\}$ such that $c_j \notin I_i$ for every $j = 1, 2, \dots, n$. Clearly, $|J^c| = m - n$.

Obviously, we have $J \cup J^c = \{1, 2, \dots, m\}$. Hence

$$I(\pi, f, g) = \sum_{i=1}^m m_i \left(g(x_i) - g(x_{i-1})\right) = \sum_{i \in J} m_i \left(g(x_i) - g(x_{i-1})\right) + \sum_{i \in J^c} m_i \left(g(x_i) - g(x_{i-1})\right)$$

By using (37.15) and (37.16) it is now evident that

$$I(\pi, f, g) = \sum_{i \in J} m_i \left(g(x_i) - g(x_{i-1})\right) \leq \sum_{j=1}^n f(c_j)\left[g(c_j^+) - g(c_j^-)\right]$$

We can similarly show that

$$S(\pi, f, g) \geq \sum_{j=1}^n f(c_j)\left[g(c_j^+) - g(c_j^-)\right]$$

So,

$$I(\pi, f, g) \leq \sum_{i=1}^n f(c_i)\left(g(c_i^+) - g(c_i^-)\right) \leq S(\pi, f, g)$$

Since these inequalities also hold for subdivisions finer than the one considered, we have

$$\sup_{\pi \in \Pi} I(\pi, f, g) \leq \sum_{i=1}^n f(c_i)\left(g(c_i^+) - g(c_i^-)\right) \leq \inf_{\pi \in \Pi} S(\pi, f, g)$$

This implies, since the integral $\int_a^b f\,dg$ exists, that

$$\int_a^b f\,dg = \sup_{\pi \in \Pi} I(\pi, f, g) = \inf_{\pi \in \Pi} S(\pi, f, g) = \sum_{i=1}^n f(c_i)\left(g(c_i^+) - g(c_i^-)\right)$$

thus proving the desired result.



Example 1505 Let $f, g : [0,1] \to \mathbb{R}$ be given by $f(x) = x^2$ and

$$g(x) = \begin{cases} 0 & \text{if } 0 \leq x < \frac{1}{2} \\ \frac{3}{4} & \text{if } \frac{1}{2} \leq x < \frac{2}{3} \\ 1 & \text{if } \frac{2}{3} \leq x \leq 1 \end{cases}$$

The discontinuities are at 1/2 and 2/3, where we have

$$g\left(\tfrac{1}{2}^+\right) = \tfrac{3}{4}, \quad g\left(\tfrac{1}{2}^-\right) = 0, \quad g\left(\tfrac{2}{3}^+\right) = 1, \quad g\left(\tfrac{2}{3}^-\right) = \tfrac{3}{4}$$

Equality (37.13) thus becomes

$$\int_0^1 f\,dg = f\left(\tfrac{1}{2}\right)\left[g\left(\tfrac{1}{2}^+\right) - g\left(\tfrac{1}{2}^-\right)\right] + f\left(\tfrac{2}{3}\right)\left[g\left(\tfrac{2}{3}^+\right) - g\left(\tfrac{2}{3}^-\right)\right]$$
$$= \frac{1}{4} \cdot \frac{3}{4} + \frac{4}{9}\left(1 - \frac{3}{4}\right) = \frac{3}{16} + \frac{1}{9} = \frac{43}{144}$$

N

Consider an integrator step function with unitary jumps, that is, for every i we have

$$g(c_i^+) - g(c_i^-) = 1$$

Equation (37.13) then becomes

$$\int_a^b f\,dg = \sum_{i=1}^n f(c_i)$$

In particular, if f is the identity we get

$$\int_a^b f\,dg = \sum_{i=1}^n c_i$$

Stieltjes' integral thus includes addition as a particular case. More generally, we will soon see that the moments of a random variable are represented by a Stieltjes' integral.
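Formula (37.13) is easy to verify computationally. The sketch below, our own check of Example 1505, evaluates the jump formula exactly with rational arithmetic and confirms it with lower and upper Stieltjes sums on a fine grid.

```python
from fractions import Fraction

def g(x):
    # the step integrator of Example 1505
    if x < 0.5:
        return 0.0
    return 0.75 if x < 2 / 3 else 1.0

f = lambda x: x ** 2

# jump formula (37.13): f(1/2)(3/4 - 0) + f(2/3)(1 - 3/4)
exact = Fraction(1, 4) * Fraction(3, 4) + Fraction(4, 9) * Fraction(1, 4)
print(exact)  # 43/144

# lower/upper Stieltjes sums; f is increasing on [0, 1], so endpoint
# values give the exact inf/sup on each subinterval
n = 30_000
xs = [i / n for i in range(n + 1)]
lower = sum(f(u) * (g(v) - g(u)) for u, v in zip(xs, xs[1:]))
upper = sum(f(v) * (g(v) - g(u)) for u, v in zip(xs, xs[1:]))
print(lower, upper)  # both close to 43/144 = 0.29861...
```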

37.6 Integration by parts

For Stieltjes' integral, the integration by parts formula takes the elegant form of a role reversal between f and g.

Proposition 1506 Given any two increasing functions $f, g : [a,b] \to \mathbb{R}$, it holds that

$$\int_a^b f\,dg + \int_a^b g\,df = f(b)\,g(b) - f(a)\,g(a) \tag{37.17}$$

Proof For every $\varepsilon > 0$ there are two subdivisions, $\pi = \{x_i\}_{i=0}^n$ and $\pi' = \{y_i\}_{i=0}^n$, of $[a,b]$ such that

$$\left|\int_a^b f\,dg - \sum_{i=1}^n f(x_{i-1})\left(g(x_i) - g(x_{i-1})\right)\right| < \frac{\varepsilon}{2}$$

and

$$\left|\int_a^b g\,df - \sum_{i=1}^n g(y_i)\left(f(y_i) - f(y_{i-1})\right)\right| < \frac{\varepsilon}{2}$$

Let $\pi'' = \{z_i\}_{i=0}^n$ be the subdivision $\pi'' = \pi \cup \pi'$. The two inequalities still hold for the subdivision $\pi''$. Moreover, note that

$$\sum_{i=1}^n f(z_{i-1})\left(g(z_i) - g(z_{i-1})\right) + \sum_{i=1}^n g(z_i)\left(f(z_i) - f(z_{i-1})\right) = f(b)\,g(b) - f(a)\,g(a)$$

since the left-hand side telescopes to $\sum_{i=1}^n \left(f(z_i)\,g(z_i) - f(z_{i-1})\,g(z_{i-1})\right)$. This implies

$$\left|\int_a^b f\,dg + \int_a^b g\,df - f(b)\,g(b) + f(a)\,g(a)\right| < \varepsilon$$

Since $\varepsilon$ was arbitrarily chosen, we reach the desired conclusion.

Thanks to Proposition 1502, whenever f and g are differentiable we get

$$\int_a^b f\,g'\,dx + \int_a^b g\,f'\,dx = f(b)\,g(b) - f(a)\,g(a)$$

thus obtaining the integration by parts formula (35.60) for Riemann's integral.
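As a numerical illustration of ours, with arbitrary increasing test functions, the sketch below checks (37.17) for $f(x) = x$ and $g(x) = x^2$ on $[0,1]$, where $\int f\,dg = 2/3$, $\int g\,df = 1/3$, and $f(1)g(1) - f(0)g(0) = 1$.

```python
def stieltjes(f, g, a, b, n=10_000):
    # Stieltjes sum with left-endpoint tags; since both integrals exist
    # (f, g continuous), any choice of tags converges to the integral
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(u) * (g(v) - g(u)) for u, v in zip(xs, xs[1:]))

f = lambda x: x
g = lambda x: x * x
lhs = stieltjes(f, g, 0, 1) + stieltjes(g, f, 0, 1)
rhs = f(1) * g(1) - f(0) * g(0)
print(lhs, rhs)  # ~1.0 and 1.0, as in formula (37.17)
```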

37.7 Change of variable

The next theorem, whose simple yet tedious proof we omit, establishes the change of variable formula for Stieltjes' integral.

Theorem 1507 Let f be continuous and g increasing. If $\varphi : [c,d] \to [a,b]$ is a strictly increasing function, then $f \circ \varphi$ is Stieltjes integrable with respect to $g \circ \varphi$, with

$$\int_c^d f(\varphi(t))\,d(g \circ \varphi)(t) = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dg(x) \tag{37.18}$$

If $\varphi$ is surjective, we can just write

$$\int_c^d f(\varphi(t))\,d(g \circ \varphi)(t) = \int_a^b f(x)\,dg(x)$$

If both $\varphi$ and g are differentiable, by Proposition 1502 we then have

$$\int_c^d f(\varphi(t))\,g'(\varphi(t))\,\varphi'(t)\,dt = \int_a^b f(x)\,dg(x)$$

In particular, if $g(x) = x$ we get back to the Riemann formula (35.62), that is,

$$\int_c^d f(\varphi(t))\,\varphi'(t)\,dt = \int_a^b f(x)\,dx$$

The more general Stieltjes formula thus clarifies the nature of this earlier formula, besides extending its scope. After integration by parts, the change of variable formula is thus another result that is best understood in terms of the Stieltjes' integral.

If g is strictly increasing (so, invertible), by setting $\varphi = g^{-1}$ in (37.18) we get the noteworthy formula

$$\int_{g(a)}^{g(b)} f\left(g^{-1}(t)\right) dt = \int_a^b f(x)\,dg(x)$$

When g is strictly increasing, the Stieltjes integral can thus be computed via a Riemann integral. This result complements Proposition 1502, which showed that the same is true, but with a different formula, when g is differentiable.

37.8 Modelling assets' gains

In this section we show that Stieltjes' integration naturally arises in modelling the performance of a portfolio over a time interval $[0,T]$. Specifically, suppose for simplicity that there is a single financial asset that can be traded at price $p(t)$ in frictionless financial markets that open at each point of time $t \in [0,T]$. The function $p : [0,T] \to \mathbb{R}_+$ thus represents the asset's price at the different points of time.

In this temporal setting, a portfolio is described by a function $x : [0,T] \to \mathbb{R}$, where $x(t)$ is the number of units of the asset held at t. The positive and negative parts $x^+(t)$ and $x^-(t)$ are the portfolio's long and short positions on the asset at time t, respectively (cf. Section 24.9.3).

These positions are the outcome of some trading on the asset performed on the open markets. Suppose that for some reason we change the portfolio only a finite number of times. Accordingly, though markets are open at each t, we trade on them only finitely many times. Thus, define the step function $x : [0,T] \to \mathbb{R}$ by

$$x(t) = \sum_{i=1}^{n-1} c_i\,\mathbf{1}_{[t_{i-1}, t_i)}(t) + c_n\,\mathbf{1}_{[t_{n-1}, t_n]}(t)$$

where $\pi = \{t_i\}_{i=0}^n$ is a subdivision $0 = t_0 < t_1 < \dots < t_{n-1} < t_n = T$ of $[0,T]$. At each time $t \in [t_{k-1}, t_k)$ the portfolio $x(t)$ thus features $c_k$ units of the asset, the outcome of trading at the market open at time $t_{k-1}$. Till time $t_k$ the portfolio does not change, so no trading is made. The last trading occurs at $t_{n-1}$, so at T the position does not change.⁶
How do a portfolio's gains/losses cumulate over time? This is a most basic bookkeeping question that we need to answer to assess a portfolio's performance. To this end, define the integral function $G_x : [0,T] \to \mathbb{R}$, called the gains' process, by the Stieltjes' integral

$$G_x(t) = \int_0^t x(s)\,dp(s) \tag{37.19}$$

where x is the integrand and p is the integrator. Since x is a step function, it is easy to see that

$$G_x(t) = \begin{cases} \sum_{i=1}^{k-1} c_i \left(p(t_i) - p(t_{i-1})\right) + c_k \left(p(t) - p(t_{k-1})\right) & \text{if } t \in [t_{k-1}, t_k),\; k = 1, \dots, n \\ \sum_{i=1}^n c_i \left(p(t_i) - p(t_{i-1})\right) & \text{if } t = T \end{cases}$$

(for $k = 1$ the first sum is empty).

⁶ For simplicity, we do not consider any dividend, so the cumulated gains/losses only come from trading ("capital gains" in the finance jargon).

The gains' process describes how a portfolio's gains/losses cumulate over time, thus answering the previous question. To fix ideas, suppose that each $c_i$ is positive – i.e., $x \geq 0$ – and consider $t \in [t_0, t_1)$. Throughout the time interval $[t_0, t_1)$, the portfolio x features $c_1$ units of the asset. These units were traded at time 0 at a price $p(0)$ and at time t their price is $p(t)$. The change in price is $p(t) - p(t_0)$, so the portfolio's gains/losses up to time t are

$$G_x(t) = c_1 \left(p(t) - p(t_0)\right) \tag{37.20}$$

At time $t_1$, our position changes from $c_1$ to $c_2$ and then remains constant throughout the time interval $[t_1, t_2)$. To obtain this new position, we could have, for example, sold $c_1$ at time $t_1$ and simultaneously bought $c_2$, or just directly acquired the difference $c_2 - c_1$. If markets are frictionless, these possible trading strategies are equivalent. So, let us focus on the former. It yields that, up to time $t \in [t_1, t_2)$, the portfolio's cumulated gain is

$$G_x(t) = c_1 \left(p(t_1) - p(t_0)\right) + c_2 \left(p(t) - p(t_1)\right) \tag{37.21}$$

Indeed, $c_1(p(t_1) - p(t_0))$ are the gains/losses matured in the period $[0, t_1]$, coming from buying $c_1$ units at 0 and selling them at time $t_1$, while $c_2(p(t) - p(t_1))$ are the gains/losses occurred over $[t_1, t)$, given by the new position $c_2$. By iterating this reasoning, the Stieltjes' integral (37.19) follows immediately – indeed, (37.20) and (37.21) correspond to $t \in [t_0, t_1)$ and $t \in [t_1, t_2)$ in such integral. In particular, if one operates in the markets throughout, from time 0 through time T, so as to keep the long and short positions of portfolio x, then one ends up with the gain/loss $G_x(T)$.

Finally, we can relax the assumption that portfolios are adjusted only finitely many times: as long as the functions x and p satisfy, for example, the hypotheses of Proposition 1501, the gains' process defined via the Stieltjes' integral (37.19) is well defined and can be interpreted in terms of gains/losses.
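A small sketch of ours, with made-up trading dates, positions, and prices, makes the bookkeeping in (37.19) concrete: the cumulated gain is just the sum of position-weighted price changes.

```python
# hypothetical data: trading dates t_0 < ... < t_n, position c_k held on
# [t_{k-1}, t_k), and asset prices observed at the trading dates
t = [0.0, 0.25, 0.5, 0.75, 1.0]   # subdivision of [0, T] with T = 1
c = [10, -5, 8, 3]                # long (positive) and short (negative)
p = {0.0: 100.0, 0.25: 103.0, 0.5: 101.0, 0.75: 104.0, 1.0: 102.0}

def gains(up_to):
    """G_x(t): sum of c_k (p(t_k) - p(t_{k-1})) over completed intervals,
    plus the accrual on the current one (up_to must be a grid point)."""
    total = 0.0
    for k in range(1, len(t)):
        lo, hi = t[k - 1], t[k]
        if hi <= up_to:
            total += c[k - 1] * (p[hi] - p[lo])
        elif lo < up_to:
            total += c[k - 1] * (p[up_to] - p[lo])
    return total

print(gains(1.0))  # G_x(T) = 10*3 + (-5)*(-2) + 8*3 + 3*(-2) = 58.0
```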
Chapter 38

Moments

In this final chapter we outline a study of moments, a notion that plays a fundamental role in probability theory and, through it, in a number of applications. For us, it is also a way to illustrate what we learned in the last two chapters.

38.1 Densities

We say that an increasing function $g : \mathbb{R} \to \mathbb{R}$ is a probability integrator if:

(i) $\lim_{x \to -\infty} g(x) = 0$ and $\lim_{x \to +\infty} g(x) = 1$;

(ii) g is right continuous.

This class of integrators is pervasive in probability theory (in the form of cumulative distribution functions), and this justifies their name. If all the variation of g takes place on a bounded interval, say the unit interval $[0,1]$ for concreteness, condition (i) reduces to $g(0) = 0$ and $g(1) = 1$.

If g is the integral function of a positive function $\delta : \mathbb{R} \to \mathbb{R}_+$, that is,

$$g(x) = \int_{-\infty}^x \delta(t)\,dt \qquad \forall x \in \mathbb{R}$$

we say that $\delta$ is a probability density of g. By condition (i), $\int_{-\infty}^{+\infty} \delta(x)\,dx = 1$. When g is continuously differentiable, the Second Fundamental Theorem of Calculus implies $g' = \delta$.

Example 1508 (i) Given any two scalars $a < b$, consider the probability integrator

$$g(x) = \begin{cases} 0 & \text{if } x < a \\ \frac{x-a}{b-a} & \text{if } a \leq x \leq b \\ 1 & \text{if } x > b \end{cases}$$

Its probability density, called uniform, is

$$\delta(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \leq x \leq b \\ 0 & \text{else} \end{cases}$$

because

$$\int_{-\infty}^x \delta(t)\,dt = \int_a^x \frac{1}{b-a}\,dt = g(x) \qquad \forall x \in [a,b]$$

and $\int_{-\infty}^{+\infty} \delta(x)\,dx = 1$.

(ii) The Gaussian integrator is

$$g(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}}\,dt$$

The Gaussian probability density is

$$\delta(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$$

because $\int_{-\infty}^{+\infty} \delta(t)\,dt = 1$ (see Section 35.11.3). N

38.2 Moments

The improper Stieltjes integral, denoted $\int_{-\infty}^{+\infty} f(x)\,dg(x)$, can be defined in a way similar to the improper Riemann integral. For it, the properties (i)-(v) of Section 37.4 continue to hold. The next important definition rests upon this notion.

Definition 1509 The n-th moment of an integrator function g is given by the Stieltjes integral

$$\mu_n = \int_{-\infty}^{+\infty} x^n\,dg(x) \tag{38.1}$$

For instance, $\mu_1$ is the first moment (often called average or mean) of g, $\mu_2$ is its second moment, $\mu_3$ is its third moment, and so on.

Proposition 1510 If the moment $\mu_n$ exists, then all lower moments $\mu_k$, with $k \leq n$, exist.

To assume the existence of higher and higher moments is, therefore, a more and more demanding requirement. For instance, to assume the existence of the second moment is a stronger hypothesis than to assume the existence of the first moment.

Proof To ease matters, assume that there is a scalar a such that $g(a) = 0$, so that $\mu_n = \int_a^{+\infty} x^n\,dg(x)$. Since $x^k = o(x^n)$ if $k < n$, the version for improper Stieltjes integrals of Proposition 1485-(ii) ensures the convergence of $\int_a^{+\infty} x^k\,dg(x)$, that is, the existence of $\mu_k$.

If g has a probability density $\delta$, by Proposition 1503 we have

$$\int_{-\infty}^{+\infty} x^n\,dg = \int_{-\infty}^{+\infty} x^n\,\delta(x)\,dx \tag{38.2}$$

In this case, we are back to Riemann integration and we directly say that $\mu_n$ is the n-th moment of the density $\delta$.

Example 1511 (i) For the uniform density we have

$$\mu_1 = \int_{-\infty}^{+\infty} x\,\delta(x)\,dx = \int_a^b x\,\frac{1}{b-a}\,dx = \frac{1}{b-a}\,\frac{b^2 - a^2}{2} = \frac{a+b}{2}$$

$$\mu_2 = \int_{-\infty}^{+\infty} x^2\,\delta(x)\,dx = \int_a^b x^2\,\frac{1}{b-a}\,dx = \frac{1}{b-a}\,\frac{b^3 - a^3}{3} = \frac{1}{3}\left(a^2 + ab + b^2\right)$$

(ii) For the Gaussian density we have:

$$\mu_1 = \int_{-\infty}^{+\infty} x\,\delta(x)\,dx = \int_{-\infty}^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx + \int_{-\infty}^0 x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx$$
$$= \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx - \int_{-\infty}^0 (-x)\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx - \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = 0$$

By integrating by parts,

$$\mu_2 = \int_{-\infty}^{+\infty} x^2\,\delta(x)\,dx = \int_{-\infty}^{+\infty} x^2\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = \int_{-\infty}^{+\infty} \frac{x}{\sqrt{2\pi}}\,x\,e^{-\frac{x^2}{2}}\,dx$$
$$= \left[-\frac{1}{\sqrt{2\pi}}\,x\,e^{-\frac{x^2}{2}}\right]_{-\infty}^{+\infty} + \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = 0 + 1 = 1$$

where we adapted (35.60) to the improper case, with $g(x) = x/\sqrt{2\pi}$ and $f'(x) = x\,e^{-\frac{x^2}{2}}$, so that $g'(x) = 1/\sqrt{2\pi}$ and $f(x) = -e^{-\frac{x^2}{2}}$. N
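These closed forms are easy to confirm numerically. Below is a quick check of ours, using a midpoint rule for the uniform density on an arbitrarily chosen interval and for the Gaussian density, with the improper integrals truncated at $\pm 10$ (where the tails are negligible).

```python
import math

def riemann(h, a, b, n=50_000):
    # midpoint rule for the Riemann integral of h on [a, b]
    w = (b - a) / n
    return sum(h(a + (i + 0.5) * w) for i in range(n)) * w

a, b = 2.0, 5.0
uniform = lambda x: 1.0 / (b - a)
print(riemann(lambda x: x * uniform(x), a, b), (a + b) / 2)               # mu_1
print(riemann(lambda x: x**2 * uniform(x), a, b), (a*a + a*b + b*b) / 3)  # mu_2

gauss = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
print(riemann(lambda x: x * gauss(x), -10, 10))     # ~ 0 = mu_1
print(riemann(lambda x: x**2 * gauss(x), -10, 10))  # ~ 1 = mu_2
```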

38.3 The problem of moments

Consider a probability integrator g whose variation is concentrated on the unit interval $[0,1]$. The n-th moment of g is then

$$\mu_n = \int_0^1 x^n\,dg \tag{38.3}$$

If all moments exist, they form a sequence $\{\mu_n\}$ of scalars in $[0,1]$. For instance, if $g(x) = x$ we have $\mu_n = 1/(n+1)$.

In this unit interval setting, the problem of moments takes the following form:

Given a sequence $\{m_n\}$ of scalars in $[0,1]$, is there an integrator g such that, for each n, the term $m_n$ is exactly its n-th moment $\mu_n$?

The question amounts to asking whether sequences of moments have a characterizing property, which $\{m_n\}$ should then satisfy in order for the desired integrator to exist. This question was first posed by Thomas Stieltjes in the same 1894-95 articles where he developed his notion of integral. Indeed, providing a setting in which the problem of moments could be properly addressed was a main motivation for his integral (which, as we just remarked, is indeed the natural setting where moments are defined).

Next we present a most beautiful answer given by Felix Hausdorff in the early 1920s. To state it, we need to go back to the finite differences of Chapter 10.

Definition 1512 A sequence $\{x_n\}_{n=0}^\infty$ is totally monotone if, for every $n \geq 0$, we have $(-1)^k \Delta^k x_n \geq 0$ for every $k \geq 0$.

In words, a sequence is totally monotone if its finite differences keep alternating sign across their orders. A totally monotone sequence is positive because $\Delta^0 x_n = x_n$, as well as decreasing because $\Delta x_n \leq 0$ (Lemma 386).

We can now answer the question we posed.

Theorem 1513 (Hausdorff) A sequence $\{m_n\} \subseteq [0,1]$ is such that $m_n = \int_0^1 x^n\,dg$ for a probability integrator g if and only if it is totally monotone.

Proof We prove the "only if" part, the converse being significantly more complicated. So, let $\{m_n\}$ be a sequence of moments (38.3). It suffices to show that

$$(-1)^k \Delta^k m_n = \int_0^1 t^n (1-t)^k\,dg(t) \geq 0$$

We proceed by induction on k. For $k = 0$ we trivially have $(-1)^0 \Delta^0 m_n = m_n = \int_0^1 t^n\,dg(t)$ for all n. Assume $(-1)^{k-1} \Delta^{k-1} m_n = \int_0^1 t^n (1-t)^{k-1}\,dg(t)$ for all n (induction hypothesis). Then,

$$\Delta^k m_n = \Delta\left(\Delta^{k-1} m_n\right) = \Delta^{k-1} m_{n+1} - \Delta^{k-1} m_n$$
$$= (-1)^{k-1}\left[\int_0^1 t^{n+1} (1-t)^{k-1}\,dg(t) - \int_0^1 t^n (1-t)^{k-1}\,dg(t)\right]$$
$$= (-1)^k \int_0^1 t^n (1-t)^{k-1} (1-t)\,dg(t) = (-1)^k \int_0^1 t^n (1-t)^k\,dg(t)$$

as desired.

The characterizing property of moment sequences is, thus, total monotonicity. It is truly remarkable that a property of finite differences is able to pin down moment sequences. Note that this result requires the Stieltjes integral: in the "if" part the integrator, whose moments turn out to be the terms of the given totally monotone sequence, might well be non-differentiable (so, the Riemann version (38.2) might not hold).
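A finite portion of total monotonicity can be checked mechanically. The sketch below, our own illustration, computes iterated finite differences of the moment sequence $m_n = 1/(n+1)$ of $g(x) = x$ and verifies that $(-1)^k \Delta^k m_n \geq 0$ over the computed range.

```python
from fractions import Fraction

N = 12
m = [Fraction(1, n + 1) for n in range(N)]  # moments of g(x) = x on [0, 1]

diff = m[:]
for k in range(N):
    # here diff[j] equals Delta^k m_j for j = 0, ..., N - 1 - k
    assert all((-1) ** k * d >= 0 for d in diff), "total monotonicity fails"
    diff = [b - a for a, b in zip(diff, diff[1:])]  # pass to Delta^{k+1}

print("(-1)^k Delta^k m_n >= 0 for all computed k, n up to", N - 1)
```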

38.4 Moment generating function

Definition 1514 Let g be a probability integrator for which there exists $\varepsilon > 0$ such that

$$\int_{-\infty}^{+\infty} e^{yx}\,dg(x) < +\infty \qquad \forall y \in (-\varepsilon, \varepsilon) \tag{38.4}$$

The function $F : (-\varepsilon, \varepsilon) \to \mathbb{R}$ defined by

$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,dg(x)$$

is said to be the moment generating function of g.



Assume that g has a probability density $\delta$, so that

$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,\delta(x)\,dx$$

In this case, the function F is of the form (36.13), with

$$f(x,y) = e^{yx}\,\delta(x)$$

We can then use Proposition 1496 to establish the existence and differentiability of the moment generating function. In particular, if there exist $\varepsilon > 0$ and a positive function $h : \mathbb{R} \to \mathbb{R}$ such that $\int_{-\infty}^{+\infty} h(x)\,dx < +\infty$ and, for every $y \in [-\varepsilon, \varepsilon]$,

$$e^{yx}\,\delta(x) \leq h(x) \qquad \forall x \in \mathbb{R}$$

then $F : (-\varepsilon, \varepsilon) \to \mathbb{R}$ is differentiable, with

$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y}\left(e^{yx}\,\delta(x)\right) dx = \int_{-\infty}^{+\infty} x\,e^{yx}\,\delta(x)\,dx$$

At $y = 0$ we get

$$F'(0) = \mu_1$$

The derivative at 0 of the moment generating function is, thus, the first moment of the density. If there exists a positive function $k : \mathbb{R} \to \mathbb{R}$ such that $\int_{-\infty}^{+\infty} k(x)\,dx < +\infty$ and, for every $y \in [-\varepsilon, \varepsilon]$,

$$\left|x\,e^{yx}\,\delta(x)\right| = |x|\,e^{yx}\,\delta(x) \leq k(x) \qquad \forall x \in \mathbb{R}$$

then, by Proposition 1496, $F : (-\varepsilon, \varepsilon) \to \mathbb{R}$ is twice differentiable, with

$$F''(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y}\left(x\,e^{yx}\,\delta(x)\right) dx = \int_{-\infty}^{+\infty} x^2\,e^{yx}\,\delta(x)\,dx$$

At $y = 0$ we get

$$F''(0) = \mu_2$$

By proceeding in this way (if possible), with higher order derivatives we get:

$$F'''(0) = \mu_3, \qquad F^{(iv)}(0) = \mu_4, \qquad \dots, \qquad F^{(n)}(0) = \mu_n$$

The derivative of order n at 0 of the moment generating function is, thus, the n-th moment of the density. This fundamental property justifies the name of this function.

Example 1515 For the Gaussian density $\delta(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$ we have

$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,\delta(x)\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{yx}\,e^{-\frac{x^2}{2}}\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(x^2 - 2yx\right)}\,dx$$
$$= \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(x^2 - 2yx + y^2 - y^2\right)}\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(x^2 - 2yx + y^2\right) + \frac{y^2}{2}}\,dx = e^{\frac{y^2}{2}} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-y)^2}\,dx$$

where in the fourth equality we have added and subtracted $y^2$. But (35.78) of Chapter 35 implies $\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-y)^2}\,dx = 1$, so $F(y) = e^{\frac{y^2}{2}}$. We have $F'(y) = y\,e^{\frac{y^2}{2}}$ and $F''(y) = e^{\frac{y^2}{2}}\left(1 + y^2\right)$, so $\mu_1 = F'(0) = 0$ and $\mu_2 = F''(0) = 1$. N
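The closed form $F(y) = e^{y^2/2}$ can be confirmed numerically; the following sketch of ours truncates the improper integral at $\pm 10$, where the Gaussian tail is negligible.

```python
import math

def mgf(y, lo=-10.0, hi=10.0, n=50_000):
    # numerical F(y) = integral of e^{yx} (1/sqrt(2 pi)) e^{-x^2/2}, truncated
    w = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * w
        total += math.exp(y * x - x * x / 2)
    return total * w / math.sqrt(2 * math.pi)

for y in (0.0, 0.5, 1.0):
    print(y, mgf(y), math.exp(y * y / 2))  # numeric vs closed form agree
```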

The next example shows that not all densities have a moment generating function; in this case there is no $\varepsilon > 0$ such that the integral (38.4) is finite.

Example 1516 Let

$$\delta(x) = \begin{cases} \frac{1}{x^2} & \text{if } x > 1 \\ 0 & \text{else} \end{cases}$$

This is the so-called Pareto probability density (recall from Example 1469 that $\int_1^{+\infty} x^{-2}\,dx = 1$). For every $y > 0$ we have

$$\int_{-\infty}^{+\infty} e^{yx}\,\delta(x)\,dx = \int_1^{+\infty} \frac{e^{yx}}{x^2}\,dx = +\infty$$

Therefore, the moment generating function does not exist. Since

$$\mu_1 = \int_1^{+\infty} x\,\frac{1}{x^2}\,dx = \int_1^{+\infty} \frac{1}{x}\,dx = +\infty$$

the first moment does not exist either. By the comparison criterion for improper Riemann integrals, this implies $\mu_n = +\infty$ for every $n \geq 1$. This density has no moments of any order. N

Suppose that the moment generating function has derivatives of all orders. By Theorem 367,

$$e^{yx} = 1 + yx + \frac{y^2 x^2}{2} + \frac{y^3 x^3}{3!} + \dots + \frac{y^n x^n}{n!} + \dots = \sum_{n=0}^\infty \frac{y^n x^n}{n!}$$

So, it is tempting to write:

$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,\delta(x)\,dx = \int_{-\infty}^{+\infty} \sum_{n=0}^\infty \frac{y^n x^n}{n!}\,\delta(x)\,dx = \sum_{n=0}^\infty \frac{y^n}{n!} \int_{-\infty}^{+\infty} x^n\,\delta(x)\,dx = \sum_{n=0}^\infty \frac{y^n}{n!}\,\mu_n$$

Under suitable hypotheses, spelled out in more advanced courses, it is legitimate to give in to this temptation. Moment generating functions can then be expressed as a power series with coefficients given by the moments of the density (divided by factorials).
Part IX

Appendices

Appendix A

Binary Relations

A.1 Definition

Throughout the book we have already encountered binary relations a few times, but we never formally introduced them. In a nutshell, the notion of binary relation formalizes the idea that an element x is in a relation with an element y. It is an abstract notion that is best understood after having seen a few concrete examples that make it possible to appreciate its unifying power. We discuss it in an Appendix, so that readers can decide if and when to go through it.

A first example of a binary relation is the relation "being greater than or equal to" among natural numbers: given any two natural numbers x and y, we can always say whether x is greater than or equal to y. For instance, 6 is greater than or equal to 4. In this example, x and y are natural numbers and "being in relation with" is equivalent to saying "being greater than or equal to".

The imagination is the only limit to the number of binary relations one can think of. Set theory is the language that we can use to formalize the idea that two objects are related to each other. For example, given the set of citizens C of a country, we could say that x is in relation with y if x is the mother of y. In this case, "being in relation with" amounts to "being the mother of".

Economics is a source of examples of binary relations. For instance, consider an agent and a set of alternatives X. The preference relation $\succsim$ is a binary relation. In this case, "x is in relation with y" is equivalent to saying "x is at least as good as y".

What do all these examples have in common? First, in all of them we considered two elements x and y of a set X. Second, these elements x and y were in a specific order: saying that x is in relation with y is one thing, saying that y is in relation with x is another. So, the pair formed by x and y is an ordered pair $(x,y)$ that belongs to the Cartesian product $X \times X$. Finally, in all three examples it might well happen that a generic pair of elements x and y is actually unrelated. For instance, if in our second example x and y are siblings, neither is the mother of the other. In other words, a given notion of "being in relation with" might not include all pairs of elements of X.

We are now ready to give a (set-theoretic) definition of binary relations.

Definition 1517 Given a non-empty set X, a binary relation is a subset R of $X \times X$.


In terms of notation, we write xRy in place of $(x,y) \in R$. Indeed, the notation xRy, which reads "x is in the relation R with y", is more evocative of what the concept of binary relation is trying to capture. So, in what follows we will adopt it.

To get acquainted with this new mathematical notion, let us now formalize our first three examples.

Example 1518 (i) Let X be the set of natural numbers N. The binary relation $\geq$ can be viewed as the subset of $\mathbb{N} \times \mathbb{N}$ given by

$$R = \{(x,y) \in \mathbb{N} \times \mathbb{N} : x \text{ is greater than or equal to } y\}$$

Indeed, it contains all pairs in which the first element x is greater than or equal to the second element y.

(ii) Let X be the set of all citizens C of a country. The binary relation "being the mother of" can be viewed as the subset of $C \times C$ given by

$$R = \{(x,y) \in C \times C : x \text{ is the mother of } y\}$$

Indeed, it contains all pairs in which the first element is the mother of the second element.

(iii) Let X be the set of all consumption bundles $\mathbb{R}^n_+$. The binary relation $\succsim$ can be seen as the subset of $\mathbb{R}^n_+ \times \mathbb{R}^n_+$ given by

$$R = \left\{(x,y) \in \mathbb{R}^n_+ \times \mathbb{R}^n_+ : x \succsim y\right\}$$

Indeed, it contains all pairs of bundles in which the first bundle is at least as good as the second one. N

A binary relation associates to each element x of X some elements y of the same set (possibly x itself, i.e., $x = y$). We denote by $R(x) = \{y \in X : xRy\}$ the image of x through R, i.e., the collection of all y that stand in the relation R with a given x.

Example 1519 (i) For the binary relation $\geq$ on N, the image $R(x) = \{y \in \mathbb{N} : x \geq y\}$ of $x \in \mathbb{N}$ consists of all natural numbers that are less than or equal to x. (ii) For the binary relation "being the mother of" on C, the image $R(x)$ of a citizen x consists of all of x's children (and is empty if x has none). (iii) For the binary relation $\succsim$ on $\mathbb{R}^n_+$, the image $R(x) = \left\{y \in \mathbb{R}^n_+ : x \succsim y\right\}$ of $x \in \mathbb{R}^n_+$ consists of all bundles that x is at least as good as. N

Any binary relation R induces a self-correspondence $\Gamma : X \rightrightarrows X$ defined by $\Gamma(x) = R(x)$. Vice versa, any self-correspondence $\Gamma : X \rightrightarrows X$ induces a binary relation R on X defined by xRy if $y \in \Gamma(x)$. So, binary relations and self-correspondences are two sides of the same coin. Depending on the application, one side may turn out to be more interesting than the other.

Example 1520 A self-map $f : X \to X$ can be viewed as a binary relation

$$R_f = \{(x, f(x)) : x \in X\}$$

on X consisting of all pairs $(x, f(x))$. The image $R_f(x) = \{f(x)\}$ is a singleton consisting of the image $f(x)$. Indeed, functions can be regarded as the binary relations on X that have singleton images, i.e., that associate to each element of X a unique element of X. N

A.2 Properties

A binary relation R can satisfy several properties. In particular, a binary relation R on a set X is:

(i) reflexive if xRx for every $x \in X$;

(ii) transitive if xRy and yRz imply xRz for every $x, y, z \in X$;

(iii) complete if, for every $x, y \in X$, either xRy or yRx or both;

(iv) symmetric if xRy implies yRx for all $x, y \in X$;

(v) asymmetric if xRy implies not yRx for all $x, y \in X$;

(vi) antisymmetric if xRy and yRx imply $x = y$ for all $x, y \in X$.

Often we will consider binary relations that satisfy more than one of these properties. However, some of them are incompatible, for example asymmetry and symmetry, while others are related, for example completeness implies reflexivity.¹ A checker for these properties is sketched below.
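Since a binary relation on a finite set is just a set of ordered pairs, properties (i)-(vi) can be tested mechanically. The following sketch of ours checks them for a relation given in extension; the relation $\geq$ on a small set is an arbitrary test case.

```python
def properties(R, X):
    """R is a set of ordered pairs, a subset of X x X; returns which of
    the properties (i)-(vi) hold."""
    return {
        "reflexive":     all((x, x) in R for x in X),
        "transitive":    all((x, z) in R for (x, y) in R for (w, z) in R if y == w),
        "complete":      all((x, y) in R or (y, x) in R for x in X for y in X),
        "symmetric":     all((y, x) in R for (x, y) in R),
        "asymmetric":    all((y, x) not in R for (x, y) in R),
        "antisymmetric": all(x == y for (x, y) in R if (y, x) in R),
    }

X = {1, 2, 3}
geq = {(x, y) for x in X for y in X if x >= y}  # the relation >= on X
print(properties(geq, X))
# reflexive, transitive, complete, antisymmetric: True; the rest: False
```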

Example 1521 (i) Consider the binary relation $\geq$ on N. Clearly, $\geq$ is complete (so, it is reflexive). Indeed, given any two natural numbers x and y, either one is greater than or equal to the other. Moreover, if both $x \geq y$ and $y \geq x$, then $x = y$. Thus, $\geq$ is antisymmetric. Finally, $\geq$ is transitive, but it is neither symmetric nor asymmetric.

(ii) Let R be the binary relation "being the mother of" on C. An individual cannot be his/her own mother, so R is not reflexive (thus, it is not complete either). Similarly, R is not symmetric since if x is the mother of y, then y cannot be the mother of x. A similar argument shows that, instead, R is antisymmetric. We leave it to the reader to verify that R is not transitive. N

Example 1522 Let R be the binary relation "being married to" on C. This relation consists of all pairs of citizens $(x,y) \in C \times C$ such that x is the spouse of y. That is, xRy means that x is married to y. The image $R(x)$ is a singleton consisting of the spouse of x. The "married to" relation is neither reflexive (individuals cannot be married to themselves) nor antisymmetric (married couples do not merge into single individuals). It is symmetric, since spouses are married to each other, while transitivity does not hold: xRy and yRz imply $x = z$ (spouses being unique), so transitivity would require xRx, which is impossible. Finally, this relation is not complete if $|C| \geq 3$. In fact, suppose that R is complete and that there exist three distinct elements $x, y, z \in X$. By completeness, we have xRy, xRz and yRz. By symmetry, zRx. Since xRy and xRz imply $z = y$, we then contradict $z \neq y$. N

The relation $\geq$ on N is the prototype of the following important class of binary relations.

Definition 1523 A binary relation R on a set X is said to be a partial order if it satisfies reflexivity, antisymmetry, and transitivity. If reflexivity is replaced by completeness, R is a complete order.

¹ Indeed, if R is a complete binary relation on X, we can consider $x \in X$ and define $y = x$. Since R is complete, we either have xRy or yRx or both. In any case, since $x = y$, we obtain that xRx, which yields reflexivity.

For example, the binary relation $\geq$ on $\mathbb{R}^n$ satisfies reflexivity, transitivity, and antisymmetry, so it is a partial order (cf. Section 2.3). If $n = 1$, this binary relation is complete, thus $\geq$ is a complete order. If $n > 1$, this is no longer the case, as we emphasized several times in the text – for instance, the vectors $(1,2)$ and $(2,1)$ cannot be ordered by the relation $\geq$.

Example 1524 (i) Consider the space of sequences $\mathbb{R}^\infty = \{x = (x_1, \dots, x_n, \dots) : x_n \in \mathbb{R} \text{ for each } n\}$. The componentwise order $\geq$ on $\mathbb{R}^\infty$, defined by $x \geq y$ if $x_n \geq y_n$ for each $n \geq 1$, is easily seen to be a partial order. (ii) Given any set A, consider the space $\mathbb{R}^A$ of real-valued functions $f : A \to \mathbb{R}$. The pointwise order $\geq$ on $\mathbb{R}^A$, defined by $f \geq g$ if $f(x) \geq g(x)$ for all $x \in A$, is also easily seen to be a partial order (the componentwise order on $\mathbb{R}^\infty$ is the special case $A = \mathbb{N}$). (iii) Consider the power set $2^X = \{A : A \subseteq X\}$ of a set X, i.e., the collection of all its subsets (cf. Section 7.3). The inclusion relation $\subseteq$ on $2^X$ is a partial order. If X contains at least two elements, $\subseteq$ is not complete – e.g., if $X = \{a, b, c\}$, the sets $\{a, b\}$ and $\{b, c\}$ cannot be ordered by the inclusion relation. N

The preference relation $\succsim$ is typically assumed to be reflexive and transitive (Section 6.8). It is also often assumed to be complete. In contrast, antisymmetry is too strong a property for a preference relation, in that it rules out the possibility that two different alternatives be indifferent. For example, if X is a set of sports cars, an agent could rightfully declare a Ferrari as good as a Lamborghini, and obviously these two objects are quite different cars. This important example motivates the next definition.

Definition 1525 A binary relation R on a set X is said to be a preorder if it satisfies reflexivity and transitivity. If reflexivity is replaced by completeness, R is a complete preorder (or a weak order).

So, the preference relations that one usually encounters in economics are an important example of complete preorders. Interestingly, we also encountered a preorder when we discussed the notion of "having cardinality less than or equal to" (Section 7.3).

Example 1526 Let $2^{\mathbb{R}}$ be the collection of all subsets of the real line. Define the binary relation $\succsim$ on $2^{\mathbb{R}}$ by $A \succsim B$ if $|A| \geq |B|$, i.e., if A has cardinality greater than or equal to that of B (Section 7.3). By Proposition 259, $\succsim$ is reflexive and transitive, so it is a preorder. It is not, however, a partial order because antisymmetry is clearly violated: for example, the sets $A = \{1, \pi\}$ and $B = \{2, 5\}$ have the same cardinality – i.e., both $A \succsim B$ and $B \succsim A$ – yet they are different, i.e., $A \neq B$. N

Clearly, a partial order is a preorder, while this example shows that the converse is false.

A.3 Equivalence relations

In analogy with how a preference relation induces an indifference relation (Section 6.8), any binary relation R on X induces a binary relation I on X by saying that xIy if both xRy and yRx. This induced relation is especially well behaved when R is a preorder, as we next show.

Proposition 1527 Let R be a preorder on a set X. The induced binary relation I is reflexive, symmetric, and transitive.

This result is the general abstract version of what Lemma 239 established for a preference relation.

Proof Consider $x \in X$ and $y = x$. Since R is reflexive and $y = x$, we have both xRy and yRx. So, by definition xIx, proving the reflexivity of I. Next assume that xIy. By definition, we have that xRy and yRx, which means that yRx and xRy, yielding yIx and proving symmetry. Finally, assume that xIy and yIz. It follows that xRy and yRx, as well as yRz and zRy. From xRy and yRz and the transitivity of R, we conclude that xRz. From yRx and zRy and the transitivity of R, we conclude that zRx. So, we have both xRz and zRx, yielding xIz and proving the transitivity of I.

This result motivates the following definition.

Definition 1528 A binary relation R on a set X is an equivalence relation if it satisfies reflexivity, symmetry, and transitivity.

The indifference relation is, of course, an important economic example of an equivalence relation. More generally, the induced relation I is an equivalence relation by Proposition 1527. Equivalence relations play an important role in both mathematics and applications because they formalize a notion of similarity. Reflexivity captures the idea that an object must be similar to itself, while symmetry amounts to saying that if x is similar to y, then y is similar to x. As for transitivity, an analogous argument holds.

Let R be an equivalence relation. Given any element $x \in X$, we write

$$[x] = \{y \in X : yRx\}$$

The collection $[x]$, which (by symmetry) is nothing but the image $R(x)$ of x, is called the equivalence class of x.

Lemma 1529 If $y \in [x]$, then $[y] = [x]$.

Thus, the choice of the representative x in defining the equivalence class is immaterial: any element of the equivalence class can play that role.

Proof Let $y \in [x]$. Then $[y] \subseteq [x]$. In fact, if $y' \in [y]$, then $y'Ry$ and so, by transitivity, $y'Rx$, i.e., $y' \in [x]$. On the other hand, $y \in [x]$ implies $x \in [y]$ by symmetry, so the same argument gives $[x] \subseteq [y]$. We conclude that $[y] = [x]$.

For a preference relation, the equivalence classes are the indifference classes, i.e., $[x]$ is the collection of all alternatives indifferent to x. Let us see another classic example.

Example 1530 The preorder $\succsim$ on $2^{\mathbb{R}}$ of Example 1526 induces the equivalence relation $\sim$ on $2^{\mathbb{R}}$ defined by $A \sim B$ if and only if $|A| = |B|$, i.e., if A has the same cardinality as B. If we consider the set Q, the equivalence class $[\mathbb{Q}]$ is the class of all sets that are countable, for example N and Z. Intuitively, this binary relation declares two sets similar if they share the same number of elements. N

At this point the reader might think that equivalence relations always arise, as above, from an underlying preorder via the induced relation I. The next classic example presents an equivalence relation that is defined directly, with no reference to any preorder.

Example 1531 Let $n \in \mathbb{Z}$ be such that $n \geq 2$. Consider the binary relation R on the set of integers Z such that xRy if and only if n divides $x - y$, that is, there exists $k \in \mathbb{Z}$ such that $x - y = kn$. Clearly, for any $x \in \mathbb{Z}$, we have xRx since $x - x = kn$ with $k = 0$. At the same time, if x and y in Z are such that xRy, then $x - y = kn$ for some $k \in \mathbb{Z}$, yielding that $y - x = (-k)n$. It follows that yRx, proving that R is symmetric. Finally, if x, y, and z in Z are such that xRy and yRz, then $x - y = kn$ and $y - z = k'n$ for some $k, k' \in \mathbb{Z}$, yielding that $x - z = (k + k')n$. It follows that xRz, proving that R is transitive. We conclude that R is an equivalence relation. It is often denoted by $x \equiv y \pmod{n}$. N
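The equivalence classes of this relation are the n residue classes. A tiny sketch of ours computes them on a finite window of Z, with n = 3 chosen arbitrarily.

```python
n = 3
window = range(-6, 7)  # a finite window of Z, for illustration only

# group x and y together whenever n divides x - y; in Python, x % n is
# always in {0, ..., n-1}, also for negative x
classes = {}
for x in window:
    classes.setdefault(x % n, []).append(x)

for r, cls in sorted(classes.items()):
    print(f"[{r}] =", cls)
# [0] = [-6, -3, 0, 3, 6], [1] = [-5, -2, 1, 4], [2] = [-4, -1, 2, 5]
```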

The next result shows that equivalence relations are closely connected to partitions of X, that is, to subdivisions of the set of interest X into mutually exclusive classes. It generalizes the basic property that indifference curves are disjoint (Lemma 240).

Lemma 1532 If R is an equivalence relation on a set X, the collection of its equivalence classes $\{[x] : x \in X\}$ is a partition of X. Vice versa, any partition $\mathcal{A} = \{A_i\}_{i \in I}$ of X is the collection of equivalence classes of the equivalence relation R defined by xRy if there exists $A \in \mathcal{A}$ such that $x, y \in A$.

Proof The collection $\{[x] : x \in X\}$ covers X, since $x \in [x]$ by reflexivity; so it is a partition of X provided distinct classes are disjoint. Given any $x, y \in X$, suppose $[x] \cap [y] \neq \emptyset$. We want to show that $[x] = [y]$. We first prove $[y] \subseteq [x]$. Let $y' \in [y]$ and let $z \in [x] \cap [y]$. Since $y' \in [y]$, we have $y'Ry$. Since $z \in [x] \cap [y]$, we have zRx and zRy. By symmetry, yRz and, by transitivity, we conclude yRx. By transitivity again, since $y'Ry$ and yRx, we finally obtain $y'Rx$, that is, $y' \in [x]$, proving the inclusion. A dual argument yields the opposite inclusion $[x] \subseteq [y]$. Hence, $[x] = [y]$, as desired. We leave the rest of the statement to the reader.

The collection $\{[x] : x \in X\}$ of all equivalence classes determined by an equivalence relation R is called the quotient space and is denoted by $X/R$. In other words, the points of the quotient space are the equivalence classes.

Example 1533 (i) The relation "having the same age" is an equivalence relation on C, whose equivalence classes consist of all citizens that have the same age, that is, who belong to the same age cohort. The quotient space has, as points, the age cohorts. (ii) For the indifference relation on $\mathbb{R}^n_+$, the quotient space has, as points, the indifference curves. N
Appendix B

Permutations

B.1 Generalities

Combinatorics is an important area of discrete mathematics, useful in many applications. Here we focus on permutations, a fundamental combinatorial notion that is important for understanding some of the topics of the book.

We start with a simple problem. We have at our disposal three pairs of pants and five T-shirts. If there are no chromatic pairings that hurt our aesthetic sense, in how many possible ways can we dress? The answer is very simple: in $3 \cdot 5 = 15$ ways. Indeed, let us call the pairs of pants a, b, c and the T-shirts 1, 2, 3, 4, 5: since the choice of a certain T-shirt does not impose any (aesthetic) restriction on the choice of the pants, the possible pairings are

a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5

We can therefore conclude that if we have to make two independent choices, one among n different alternatives and the other among m different alternatives, the total number of possible choices is $n \cdot m$. In particular, suppose that A and B are two sets with n and m elements, respectively. Their Cartesian product $A \times B$, which is the set of ordered pairs $(a,b)$ with $a \in A$ and $b \in B$, has $n \cdot m$ elements. That is:

Proposition 1534 $|A \times B| = |A| \cdot |B|$.

What has been said can be easily extended to the case of more than two choices: if we have to make multiple choices, none of which imposes restrictions on the others, the total number of possible choices is the product of the numbers of alternatives for each choice. Formally:

Proposition 1535 $|A_1 \times A_2 \times \dots \times A_n| = |A_1| \cdot |A_2| \cdots |A_n|$.

Example 1536 (i) How many Italian licence plates are possible? They have the form AA 000 AA, with two letters, three digits, and again two letters. There are 22 letters that can be used and, obviously, 10 digits. The number of (different) plates is, therefore, $22 \cdot 22 \cdot 10 \cdot 10 \cdot 10 \cdot 22 \cdot 22 = 234{,}256{,}000$. (ii) In a multiple choice test, each question requires students to select one of three possible answers. If there are 13 questions, then the overall number of possible selections is $3^{13} = 1{,}594{,}323$. N


B.2 Permutations

Intuitively, a permutation of n distinct objects is a possible arrangement of these objects. For instance, with three objects a, b, c there are 6 permutations:

abc, acb, bac, bca, cab, cba     (B.1)

We can formalize this notion through bijective functions.

Definition 1537 Let X be any collection. A permutation on X is a bijective function $f : X \to X$.

Permutations are thus nothing but the bijective functions $f : X \to X$. Though combinatorics typically considers finite sets X, the definition is fully general.

For instance, if $X = \{a, b, c\}$ the permutations $f : \{a,b,c\} \to \{a,b,c\}$ that correspond to the arrangements (B.1) are:

(i) abc corresponds to the permutation $f(x) = x$ for all $x \in X$;

(ii) acb corresponds to the permutation $f(a) = a$, $f(b) = c$ and $f(c) = b$;

(iii) bac corresponds to the permutation $f(a) = b$, $f(b) = a$ and $f(c) = c$;

(iv) bca corresponds to the permutation $f(a) = b$, $f(b) = c$ and $f(c) = a$;

(v) cab corresponds to the permutation $f(a) = c$, $f(b) = a$ and $f(c) = b$;

(vi) cba corresponds to the permutation $f(a) = c$, $f(b) = b$ and $f(c) = a$.

We have a first important result.

Proposition 1538 The number of permutations on a set with n elements is $n! = 1 \cdot 2 \cdots n$.

The number n! is called the factorial of n. We set conventionally $0! = 1$.

To understand the result heuristically, consider any arrangement of the n elements. In the first place we can put any element: the first place can therefore be occupied in n different ways. In the second place we can put any of the remaining elements: the second place can be occupied in $n - 1$ different ways. Proceeding in this way, we see that the third position can be occupied in $n - 2$ different ways, and so on and so forth, until at the end of the process we have no choice because only one element is left. The number of permutations is, therefore, $n(n-1)(n-2) \cdots 2 \cdot 1 = n!$.

Example 1539 (i) A deck of 52 cards can be reshuffled in 52! different ways. (ii) Six passengers can occupy a six-passenger car in $6! = 720$ different ways. N
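Both the definition and the counting result can be illustrated with the standard library. The sketch below, ours, enumerates the permutations of {a, b, c} and checks Proposition 1538 for small n.

```python
from itertools import permutations
from math import factorial

# the 6 arrangements (B.1) of three distinct objects
for p in permutations("abc"):
    print("".join(p), end=" ")
print()

# Proposition 1538: the count is n! (checked here for small n)
for n in range(1, 7):
    assert sum(1 for _ in permutations(range(n))) == factorial(n)
print("counts match n! for n = 1, ..., 6")
```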

The recursive formula

$$n! = n \cdot (n-1)!$$

permits to define the sequence of factorials $x_n = n!$ also by recurrence as $x_n = n x_{n-1}$, with first term $x_1 = 1$. The rate of growth of this sequence is impressive, as the following table shows:

n   0  1  2  3  4   5    6    7      8       9        10
n!  1  1  2  6  24  120  720  5,040  40,320  362,880  3,628,800

Indeed, Lemma 337 showed that $\alpha^n = o(n!)$. The already very fast exponentials are actually slower than factorials, which definitely deserve their exclamation mark.

B.3 Anagrams

We now drop the requirement that the objects be distinct and allow for repetitions. Specifically, in this section we consider n objects of $h \leq n$ different types, each type i with multiplicity $k_i$, with $i = 1, \dots, h$, and $\sum_{i=1}^h k_i = n$.¹ For instance, consider the 6 objects

a, a, b, b, b, c

There are 3 types a, b, and c, with multiplicity 2, 3, and 1, respectively. Indeed, $2 + 3 + 1 = 6$.

How many distinguishable arrangements are there? If in this example we distinguished all the objects by using a different index for the identical objects, $a_1, a_2, b_1, b_2, b_3, c$, there would be $6! = 720$ permutations. If we now remove the distinctive index from the three letters b, they can be permuted in 3! different ways within the triple of places they occupy. Such 3! different permutations (when we write $b_1, b_2, b_3$) are no longer distinguishable (when we write $b, b, b$). Therefore, the different permutations of $a_1, a_2, b, b, b, c$ are $6!/3!$. A similar argument shows that, by removing the distinctive index from the two letters a, the distinguishable permutations reduce to $6!/(3!\,2!) = 60$.

In general, one can prove the following result.

Proposition 1540 The number of distinct arrangements, called permutations with repetitions (or anagrams), is

$$\frac{n!}{k_1!\,k_2! \cdots k_h!} \tag{B.2}$$

The integers (B.2) are called multinomial coefficients.

Example 1541 (i) The possible anagrams of the word ABA are $3!/(2!\,1!) = 3$. They are ABA, AAB, BAA. (ii) The possible anagrams of the word MAMMA are $5!/(3!\,2!) = 120/(6 \cdot 2) = 10$. N

In the important two-type case, $h = 2$, we have k objects of one type and $n - k$ of the other type. By (B.2), the number of distinct arrangements is

$$\frac{n!}{k!\,(n-k)!} \tag{B.3}$$

¹ Note that, because of repetitions, these n objects do not form a set X. The notion of "multiset" is sometimes used for collections in which repetitions are permitted.

This number is usually denoted by

$$\binom{n}{k}$$

and is called the binomial coefficient. In particular,

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k!}$$

with

$$\binom{n}{0} = \frac{n!}{0!\,n!} = 1$$

The following identity can be easily proved, for $0 \leq k \leq n$:

$$\binom{n}{k} = \binom{n}{n-k}$$

It captures a natural symmetry: the number of distinct arrangements remains the same, regardless of which of the two types we focus on.

Example 1542 (i) In a parking lot, spots can be either free or busy. Suppose that 15 out of the 20 available spots are busy. The possible arrangements of the 5 free spots (or, symmetrically, of the 15 busy spots) are:

$$\binom{20}{5} = \binom{20}{15} = 15{,}504$$

(ii) We repeat an experiment 100 times: each time we can record either a "success" or a "failure", so a string of 100 outcomes like FSFF...S results. Suppose that we have recorded 92 "successes" and 8 "failures". The number of different strings that may result is:

$$\binom{100}{92} = \binom{100}{8} = 186{,}087{,}894{,}300$$

N

We close with the nice and easily proved formula, for $1 \leq k \leq n$,

$$\binom{n}{k} = \frac{n}{k}\binom{n-1}{k-1}$$

which relates binomial coefficients with the corresponding ratios and establishes a recurrence for binomial coefficients.
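These counts are easy to verify by brute force. The sketch below, ours, checks the MAMMA anagram count of Example 1541 and the binomial values of Example 1542 with the standard library.

```python
from itertools import permutations
from math import comb, factorial

# anagrams of MAMMA: distinct arrangements of a multiset
anagrams = {"".join(p) for p in permutations("MAMMA")}
print(len(anagrams), factorial(5) // (factorial(3) * factorial(2)))  # 10 10

# binomial coefficients of Example 1542, including the symmetry (B.3)
print(comb(20, 5), comb(20, 15))    # 15504 15504
print(comb(100, 92), comb(100, 8))  # 186087894300, twice
```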

B.4 Newton's binomial formula

From high school we know that

$$(a+b)^1 = a + b$$
$$(a+b)^2 = a^2 + 2ab + b^2$$
$$(a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$$

More generally, one has the following result.

Theorem 1543 (Tartaglia-Newton) It holds that

$$(a+b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \dots + \binom{n}{n-1} a b^{n-1} + b^n = \sum_{k=0}^n \binom{n}{k} a^{n-k} b^k \tag{B.4}$$

Proof We proceed by induction. The initial step, that is, the veracity of the statement for $n = 1$, is trivially verified. Indeed:

$$(a+b)^1 = a + b = \binom{1}{0} a^1 b^0 + \binom{1}{1} a^0 b^1 = \sum_{k=0}^1 \binom{1}{k} a^{1-k} b^k$$

We next prove the inductive step. We assume the statement holds for n, that is,

$$(a+b)^n = \sum_{k=0}^n \binom{n}{k} a^{n-k} b^k$$

and we show it holds for $n+1$ as well. In doing so, we will use the combinatorial identity (10.5), that is,

$$\binom{n+1}{i} = \binom{n}{i-1} + \binom{n}{i} \qquad \forall i = 1, \dots, n$$

Note that

$$(a+b)^{n+1} = (a+b)(a+b)^n = (a+b) \sum_{k=0}^n \binom{n}{k} a^{n-k} b^k$$
$$= \sum_{k=0}^n \binom{n}{k} a^{n+1-k} b^k + \sum_{k=0}^n \binom{n}{k} a^{n-k} b^{k+1}$$
$$= \sum_{i=0}^n \binom{n}{i} a^{n+1-i} b^i + \sum_{i=1}^{n+1} \binom{n}{i-1} a^{n+1-i} b^i$$
$$= a^{n+1} + \sum_{i=1}^n \binom{n}{i} a^{n+1-i} b^i + \sum_{i=1}^n \binom{n}{i-1} a^{n+1-i} b^i + b^{n+1}$$
$$= a^{n+1} + \sum_{i=1}^n \left[\binom{n}{i-1} + \binom{n}{i}\right] a^{n+1-i} b^i + b^{n+1}$$
$$= a^{n+1} + \sum_{i=1}^n \binom{n+1}{i} a^{n+1-i} b^i + b^{n+1} = \sum_{i=0}^{n+1} \binom{n+1}{i} a^{n+1-i} b^i$$

So, the statement holds for $n+1$, thus proving the induction step and the main statement.

Formula (B.4) is called the Newton binomial formula. It motivates the name of binomial coefficients for the integers $\binom{n}{k}$. In particular,
$$(1+x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k$$

If we take x = 1 we obtain the remarkable relation
$$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n$$

which can be used to prove that if a finite set has cardinality n, then its power set has cardinality $2^n$ (cf. Proposition 257). Indeed, there is only one, $1 = \binom{n}{0}$, subset with 0 elements (the empty set), $n = \binom{n}{1}$ subsets with only one element, $\binom{n}{2}$ subsets with two elements, ..., and finally only one, $1 = \binom{n}{n}$, subset – the set itself – with all the n elements.

More generally, one can prove the multinomial formula:
$$(a_1 + a_2 + \cdots + a_h)^n = \sum \frac{n!}{k_1!k_2!\cdots k_h!}\, a_1^{k_1} a_2^{k_2} \cdots a_h^{k_h}$$
where the sum is over all the choices of natural numbers $k_1, k_2, \dots, k_h$ such that $\sum_{i=1}^{h} k_i = n$. This formula motivates the name of multinomial coefficients for the integers (B.2).
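Both the binomial formula (B.4) and the $2^n$ identity above are easy to confirm numerically. A minimal sketch, with arbitrarily chosen values of a, b and n:

```python
# Numeric checks of (B.4) and of the sum-of-coefficients identity.
from math import comb

a, b, n = 3, 5, 7
assert (a + b) ** n == sum(comb(n, k) * a ** (n - k) * b ** k
                           for k in range(n + 1))
assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n
print("binomial formula and 2^n identity confirmed for n =", n)
```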
Appendix C

Notions of trigonometry

C.1 Generalities

We call trigonometric circle the unit circle with center at the origin and radius 1, oriented
counterclockwise, and on which one moves starting from the point of coordinates (1; 0).

[Figure: the trigonometric circle, oriented counterclockwise, with starting point (1, 0).]

Clearly, each point on the circle determines an angle between the positive horizontal axis and the straight line joining the point with the origin; vice versa, each angle determines a point on the circle. This correspondence between points and angles can be, equivalently, viewed as a correspondence between points and arcs of circle. In the following figure the point P determines the angle α, as well as the arc α′.

[Figure: the point P = (P₁, P₂) on the trigonometric circle, with the angle α and the corresponding arc α′.]

Angles are usually measured in either degrees or radians. A degree is the 360th part of a round angle (corresponding to a complete round of the circle); a radian is an, apparently strange, unit of measure that assigns measure 2π to a round angle; a radian is therefore its 2π-th part. We will use the radian as unit of measure of angles because it presents some advantages over the degree. In any case, the next table lists some equivalent values of degrees and radians.

degrees   0    30    45    60    90    180    270    360
radians   0    π/6   π/4   π/3   π/2    π     3π/2    2π
Angles that differ by one or more complete rounds of the circle are identical: to write α or α + 2kπ, with k ∈ ℤ, is the same. We will therefore always take 0 ≤ α < 2π.

Fix a point P = (P₁, P₂) on the trigonometric circle, as in the previous figure. The sine of the angle α determined by the point P is the ordinate P₂ of such point, while the cosine of α is the abscissa P₁.
The sine and the cosine of the angle α are denoted, respectively, by sin α and cos α. The sine is positive in the quadrants I and II, and negative in the quadrants III and IV. The cosine is positive in the quadrants I and IV, and negative in the quadrants II and III. For example,

α       0    π/4     π/2    π    3π/2    2π
sin α   0    √2/2    1      0    −1      0
cos α   1    √2/2    0     −1     0      1

In view of the previous discussion, for every k ∈ ℤ we have
$$\sin(\alpha + 2k\pi) = \sin\alpha \quad\text{and}\quad \cos(\alpha + 2k\pi) = \cos\alpha \tag{C.1}$$

Note that Pythagoras’ Theorem guarantees that, for every α ∈ ℝ,
$$\sin^2\alpha + \cos^2\alpha = 1 \tag{C.2}$$



This classic identity is sometimes called the Pythagorean trigonometric identity.


Fixing again a point P on the circle, we call tangent of the angle α determined by P, written tan α, the ratio between its ordinate and its abscissa, i.e.,
$$\tan\alpha = \frac{\sin\alpha}{\cos\alpha}$$
The tangent is positive in the quadrants I and III, and negative in the quadrants II and IV. For example,

α       0    π/4    π/2    π    3π/2    2π
tan α   0    1      →∞     0    →∞      0

Again, for every k ∈ ℤ,
$$\tan(\alpha + k\pi) = \tan\alpha \tag{C.3}$$
Since tan α = sin α / cos α, from the Pythagorean trigonometric identity it follows that
$$\sin^2\alpha = \frac{\tan^2\alpha}{1+\tan^2\alpha}$$
Finally, the reciprocals of sine, cosine, and tangent are called cosecant, secant, and cotangent, respectively.

C.2 Concerto d’archi (string concert)


We list, just for sine and cosine, some simple relations between angles (arcs).

(i) Angles α and −α:
$$\sin(-\alpha) = -\sin\alpha, \qquad \cos(-\alpha) = \cos\alpha$$

(ii) Angles α and π/2 − α:
$$\sin\left(\frac{\pi}{2}-\alpha\right) = \cos\alpha, \qquad \cos\left(\frac{\pi}{2}-\alpha\right) = \sin\alpha$$

(iii) Angles α and π/2 + α:
$$\sin\left(\frac{\pi}{2}+\alpha\right) = \cos\alpha, \qquad \cos\left(\frac{\pi}{2}+\alpha\right) = -\sin\alpha$$

(iv) Angles α and π − α:
$$\sin(\pi-\alpha) = \sin\alpha, \qquad \cos(\pi-\alpha) = -\cos\alpha$$

(v) Angles α and π + α:
$$\sin(\pi+\alpha) = -\sin\alpha, \qquad \cos(\pi+\alpha) = -\cos\alpha$$

Next we list some formulas that we do not prove (in any case, it would be enough to prove the first two because the other ones are simple consequences).

Addition and subtraction formulas:
$$\sin(\alpha+\beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta, \qquad \cos(\alpha+\beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$$
and
$$\sin(\alpha-\beta) = \sin\alpha\cos\beta - \cos\alpha\sin\beta, \qquad \cos(\alpha-\beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta \tag{C.4}$$

Doubling and bisection formulas:
$$\sin 2\alpha = 2\sin\alpha\cos\alpha, \qquad \cos 2\alpha = \cos^2\alpha - \sin^2\alpha$$
and
$$\sin\frac{\alpha}{2} = \sqrt{\frac{1-\cos\alpha}{2}}, \qquad \cos\frac{\alpha}{2} = \sqrt{\frac{1+\cos\alpha}{2}}$$

Prostaphaeresis formulas (addition and subtraction):
$$\sin(\alpha+\beta) + \sin(\alpha-\beta) = 2\sin\alpha\cos\beta, \qquad \sin(\alpha+\beta) - \sin(\alpha-\beta) = 2\cos\alpha\sin\beta$$
and
$$\cos(\alpha+\beta) + \cos(\alpha-\beta) = 2\cos\alpha\cos\beta, \qquad \cos(\alpha+\beta) - \cos(\alpha-\beta) = -2\sin\alpha\sin\beta$$
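These identities are easy to spot-check numerically. A minimal sketch at random angles, where math.isclose replaces exact equality because of floating-point rounding:

```python
# Numeric spot-checks of (C.2), an addition formula, and a prostaphaeresis
# formula at random angles.
import math
import random

for _ in range(5):
    a = random.uniform(0, 2 * math.pi)
    b = random.uniform(0, 2 * math.pi)
    assert math.isclose(math.sin(a) ** 2 + math.cos(a) ** 2, 1.0)
    assert math.isclose(math.sin(a + b),
                        math.sin(a) * math.cos(b) + math.cos(a) * math.sin(b),
                        abs_tol=1e-12)
    assert math.isclose(math.sin(a + b) + math.sin(a - b),
                        2 * math.sin(a) * math.cos(b), abs_tol=1e-12)
print("identities verified at random angles")
```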

We close with a few classic theorems that show how trigonometry is intimately linked to the study of triangles. In these theorems a, b, c denote the lengths of the three sides of a triangle and α, β, γ the angles opposite to them.

Theorem 1544 (Law of Sines) Sides are proportional to the sines of their opposite angles, that is,
$$\frac{a}{\sin\alpha} = \frac{b}{\sin\beta} = \frac{c}{\sin\gamma}$$
An interesting consequence of the law of sines is that the area of a triangle can be expressed in trigonometric form via the lengths of two sides and the angle opposite to the third side. Specifically, if the two sides are b and c, the area is
$$\frac{1}{2}\,bc\sin\alpha \tag{C.5}$$
Indeed, draw in the last figure a perpendicular from the top vertex to the side of length c, and denote its length by h. From, at least, high school we know that the area of the triangle

is ch/2 (it is the classic “half the base times the height” formula). Consider the right triangle that has the side of length b as hypotenuse and the perpendicular of length h as a cathetus. By the law of sines,
$$\frac{h}{\sin\alpha} = \frac{b}{\sin\frac{\pi}{2}}$$
So, h = b sin α. From the high school formula ch/2 the trigonometric formula (C.5) then follows.

Example 1545 Some important geometric figures in the plane can be subdivided into triangles, so their area can be recovered by adding up the areas of such triangles. For instance, consider a regular polygon with n sides of equal length, whose n central angles have equal measure 2π/n. For example, in the following figure we have a hexagon with six sides of equal length and six central angles of equal measure π/3 (i.e., 60 degrees):

[Figure: a regular hexagon of radius r partitioned into six isosceles triangles.]

Denote by r the radius of this regular polygon. The area of each regular polygon is partitioned into n identical isosceles triangles with two sides of equal length r. For instance, in the hexagon there are six such triangles. By formula (C.5), the area of each of these identical isosceles triangles is $\frac{1}{2}r^2\sin\frac{2\pi}{n}$, so the area of the polygon is
$$\frac{n}{2}\,r^2\sin\frac{2\pi}{n} \tag{C.6}$$

For example, the area of the hexagon is $3r^2\sqrt{3}/2$ since $\sin(\pi/3) = \sqrt{3}/2$.
The subdivision of geometric figures of the plane into triangles is called triangulation, an important technique that may permit to reduce the study of geometric figures to that of triangles (by taking limits via arbitrarily small triangles, the technique becomes especially powerful). N

Example 1546 The famous number π can be defined as the area of the closed unit ball.

[Figure: the closed unit ball of radius 1 centered at the origin O.]

To compute π amounts to computing this area, a problem that Archimedes famously approached via the method of exhaustion. This method considers the areas of inscribed and circumscribed polygons, which provide lower and upper approximations for π, respectively. Indeed, the area of any inscribed polygon is always ≤ π, while the area of any circumscribed polygon is always ≥ π. For instance, consider regular polygons inscribed in the closed unit ball, like the hexagon:

[Figure: a regular hexagon inscribed in the unit circle.]

By increasing the number of sides, we get larger and larger inscribed regular polygons that provide better and better lower approximations of π. The area of each such polygon is given by formula (C.6). Since their radius r is 1, we thus have the lower approximations
$$\frac{n}{2}\sin\frac{2\pi}{n} \le \pi \qquad \forall n \ge 1$$
that are better and better as n increases. At the limit, we have:
$$\lim_{n\to\infty}\frac{n}{2}\sin\frac{2\pi}{n} = \pi$$

Indeed, by setting x = 2π/n we have
$$\lim_{n\to\infty}\frac{n}{2}\sin\frac{2\pi}{n} = \lim_{n\to\infty}\frac{\sin\frac{2\pi}{n}}{\frac{2}{n}} = \lim_{x\to 0}\frac{\sin x}{\frac{x}{\pi}} = \pi\lim_{x\to 0}\frac{\sin x}{x} = \pi$$

Similarly, by increasing the number of sides we get smaller and smaller circumscribed regular polygons that provide better and better upper approximations of π. The radius r of the circumscribed regular polygon with n sides is the length of the equal sides of the isosceles triangles in which it can be partitioned. So, r = 1/cos(π/n) > 1, as the reader can check with the help of the next figure:

[Figure: a regular polygon circumscribed about the unit circle, with apothem 1 and radius 1/cos(π/n).]

By formula (C.6), we thus have the upper approximations
$$\frac{n}{2}\,\frac{1}{\cos^2\frac{\pi}{n}}\sin\frac{2\pi}{n} \ge \pi \qquad \forall n \ge 1$$
that are better and better as n increases. At the limit, by setting again x = 2π/n we have:
$$\lim_{n\to\infty}\frac{n}{2}\,\frac{1}{\cos^2\frac{\pi}{n}}\sin\frac{2\pi}{n} = \lim_{x\to 0}\frac{1}{\cos^2\frac{x}{2}}\,\pi\,\frac{\sin x}{x} = \pi$$

Summing up,
$$\frac{n}{2}\sin\frac{2\pi}{n} \;\uparrow\; \pi \;\downarrow\; \frac{n}{2}\,\frac{1}{\cos^2\frac{\pi}{n}}\sin\frac{2\pi}{n} \qquad \forall n \ge 1 \tag{C.7}$$
that is, the inscribed areas increase to π while the circumscribed areas decrease to π. Via a trigonometric argument, we thus showed that the areas of the inscribed and circumscribed regular polygons provide lower and upper approximations of π that, as the number of sides increases, better and better sandwich π till, in the limit of “infinitely many sides”, they reach π as their common limit value.¹
The trigonometric approximations (C.7) thus justify the use of the method of exhaustion to compute π. Archimedes was able to compute the area of the inscribed and circumscribed regular polygons till n = 96, getting the remarkable approximation
$$3.1408 \approx 3 + \frac{10}{71} \le \pi \le 3 + \frac{1}{7} \approx 3.1429$$

¹ The role of π in the approximations is to identify radians, so the actual knowledge of π is not needed (thus, there is no circularity in using these approximations for π).

By computing the areas of the inscribed and circumscribed regular polygons for larger and larger n, we get better and better approximations of π. N
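These bounds are immediate to tabulate. A sketch that reproduces Archimedes’ program up to n = 96 via the bounds of (C.7); as the footnote above observes, math.pi enters only to mark radians, so no circularity is involved:

```python
# Inscribed and circumscribed regular n-gon areas around the unit circle,
# i.e., the lower and upper bounds of (C.7).
import math

for n in (6, 12, 24, 48, 96):
    lower = n / 2 * math.sin(2 * math.pi / n)
    upper = lower / math.cos(math.pi / n) ** 2
    print(f"n = {n:2d}:  {lower:.6f} <= pi <= {upper:.6f}")
# n = 96:  3.139350 <= pi <= 3.142715
```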

We close with a result that generalizes Pythagoras’ Theorem, which is the special case when the triangle is right and side a is the hypotenuse (indeed, cos α = cos(π/2) = 0).

Theorem 1547 (Carnot) We have $a^2 = b^2 + c^2 - 2bc\cos\alpha$.

C.3 Perpendicularity
The trigonometric circle consists of the points x ∈ ℝ² of unit norm, that is, ∥x∥ = 1. Hence, any point x = (x₁, x₂) ∈ ℝ² can be moved back on the unit circle by dividing it by its norm ∥x∥, since
$$\left\|\frac{x}{\|x\|}\right\| = 1$$
The following picture illustrates:

[Figure: a vector x and its normalization x/∥x∥ on the unit circle.]

It follows that
$$\sin\alpha = \frac{x_2}{\|x\|} \quad\text{and}\quad \cos\alpha = \frac{x_1}{\|x\|} \tag{C.8}$$
that is,
$$x = (\|x\|\cos\alpha, \|x\|\sin\alpha)$$
This trigonometric representation of the vector x is called polar. The components ∥x∥ cos α and ∥x∥ sin α are called polar coordinates.

The angle α can be expressed through the inverse trigonometric functions arcsin x, arccos x, and arctan x. To this end, observe that
$$\tan\alpha = \frac{\sin\alpha}{\cos\alpha} = \frac{x_2/\|x\|}{x_1/\|x\|} = \frac{x_2}{x_1}$$

Together with (C.8), this implies that
$$\alpha = \arctan\frac{x_2}{x_1} = \arccos\frac{x_1}{\|x\|} = \arcsin\frac{x_2}{\|x\|}$$

The equality α = arctan(x₂/x₁) is especially important because it permits to express the angle α as a function of the coordinates of the point x = (x₁, x₂).

Let x and y be two vectors in the plane ℝ² that determine the angles α and β:

[Figure: vectors x and y with the angles α and β that they determine.]

By (C.4), we have
$$x \cdot y = (\|x\|\cos\alpha, \|x\|\sin\alpha)\cdot(\|y\|\cos\beta, \|y\|\sin\beta) = \|x\|\|y\|(\cos\alpha\cos\beta + \sin\alpha\sin\beta) = \|x\|\|y\|\cos(\alpha-\beta)$$
that is,
$$\frac{x\cdot y}{\|x\|\|y\|} = \cos(\alpha-\beta)$$
where α − β is the angle that is the difference of the angles determined by the two points.

This angle is a right one, i.e., the vectors x and y are “perpendicular”, when
$$\frac{x\cdot y}{\|x\|\|y\|} = \cos\frac{\pi}{2} = 0$$
that is, if and only if x · y = 0. In other words, two vectors in the plane ℝ² are perpendicular when their inner product is zero.
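A small computational sketch of the polar representation and of the perpendicularity test (function names are ours; math.atan2 is used in place of arctan(x₂/x₁) so that all quadrants are handled automatically):

```python
import math

def polar(x):
    """Return (norm, angle) of a vector x in R^2."""
    return math.hypot(x[0], x[1]), math.atan2(x[1], x[0])

def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]

x, y = (3.0, 4.0), (-4.0, 3.0)
r, alpha = polar(x)
# Recover x from its polar coordinates (up to floating-point rounding).
print(r * math.cos(alpha), r * math.sin(alpha))   # 3.0 4.0 (approximately)
print(dot(x, y) == 0)                             # True: x and y perpendicular
```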
Appendix D

Elements of intuitive logic

In this chapter we will introduce some basic notions of logic. Though, “logically”, these
notions should actually be placed at the beginning of a textbook, they can be best appreciated
after having learned some mathematics (even if in a logically disordered way). This is why
this chapter is an Appendix, leaving to the reader to judge when it is best to read it.

D.1 Propositions
We call proposition a statement that can be either true or false. For example, “ravens are
black” and “in the year 1965 it rained in Milan” are propositions. On the contrary, the
statement “in the year 1965 it has been cold in Milan”is not a proposition, unless we specify
the meaning of cold, for example with the proposition “in the year 1965 the temperature
went below zero in Milan”.
We will denote propositions by letters such as p, q, .... Moreover, for the sake of brevity, we will denote by 1 and 0, respectively, the truth and the falsity of a proposition: these are called truth values.

D.2 Operations
Let us list some operations on propositions.

(i) Negation. Let p be a proposition; the negation of p, denoted by ¬p, is the proposition that is true when p is false and that is false when p is true. We can summarize the definition in the following truth table

p  ¬p
1  0
0  1

which reports the truth values of p and ¬p. For instance, if p is “in the year 1965 it rained in Milan”, then ¬p is “in the year 1965 it did not rain in Milan”.

(ii) Conjunction. Let p and q be two propositions; the conjunction of p and q, denoted by p ∧ q, is the proposition that is true when p and q are both true and is false when at

least one of the two is false. The truth table is:

p  q  p ∧ q
1  1  1
1  0  0
0  1  0
0  0  0

For instance, if p is “in the year 1965 it rained in Milan” and q is “in the year 1965 the temperature went below zero in Milan”, then p ∧ q is “in the year 1965 it rained in Milan and the temperature went below zero”.

(iii) Disjunction. Let p and q be two propositions; the disjunction of p and q, denoted by p ∨ q, is the proposition that is true when at least one between p and q is true and is false when both of them are false.¹ The truth table is:

p  q  p ∨ q
1  1  1
1  0  1
0  1  1
0  0  0

For instance, with the previous examples of p and q, p ∨ q is “in the year 1965 it rained in Milan or the temperature went below zero”.

(iv) Conditional. Let p and q be two propositions; the conditional, denoted by p ⟹ q, is the proposition with truth table:

p  q  p ⟹ q
1  1  1
1  0  0
0  1  1
0  0  1        (D.1)

The conditional is therefore true if, when p is true, also q is true, or if p is false (in which case the truth value of q is irrelevant). The proposition p is called the antecedent and q the consequent. For instance, suppose the antecedent p is “I go on vacation” and the consequent q is “I go to the sea”; the conditional p ⟹ q is “If I go on vacation, then I go to the sea”.

(v) Biconditional. Let p and q be two propositions; the biconditional, denoted by p ⟺ q, is the proposition (p ⟹ q) ∧ (q ⟹ p) that involves the implication p ⟹ q and
¹ As with the union symbol ∪, the disjunction symbol ∨ recalls the Latin “vel”, an inclusive “or”, as opposed to the exclusive “aut”.

its converse q ⟹ p, with truth table:

p  q  p ⟹ q  q ⟹ p  p ⟺ q
1  1  1       1       1
1  0  0       1       0
0  1  1       0       0
0  0  1       1       1

The biconditional is, therefore, true when p and q are both true or both false. With the last example of p and q, the biconditional p ⟺ q is “I go on vacation if and only if I go to the sea”.

These five logical operations allow us to build new propositions from old ones. Starting from three propositions p, q, and r, through negation, disjunction and conditional we can build, for example, the proposition

¬((p ∨ ¬q) ⟹ r)

Its truth table is:

p  q  r  ¬q  p ∨ ¬q  (p ∨ ¬q) ⟹ r  ¬((p ∨ ¬q) ⟹ r)
1  1  1  0   1        1               0
0  1  1  0   0        1               0
1  0  1  1   1        1               0
0  0  1  1   1        1               0
1  1  0  0   1        0               1
0  1  0  0   0        1               0
1  0  0  1   1        0               1
0  0  0  1   1        0               1
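Truth tables like the one above lend themselves to mechanical computation. A minimal sketch (helper names are ours) that regenerates the table, with 1 and 0 for true and false as in the text:

```python
from itertools import product

def implies(a, b):
    """Material conditional on 0/1 values: false only when a = 1, b = 0."""
    return 0 if (a == 1 and b == 0) else 1

print("p q r | not((p or not q) => r)")
for r, q, p in product((1, 0), repeat=3):   # same row order as the table
    value = 1 - implies(max(p, 1 - q), r)   # max(p, 1-q) encodes p or not-q
    print(p, q, r, "|", value)
```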

O.R. The true-false dichotomy originates in the Eleatic school, which based its dialectics upon it (Section 1.8). Apparently, it first appears as “[a thing] is or it is not” in the poem of Parmenides (trans. Raven). A serious challenge to the universal validity of the true-false dichotomy has been posed by some, old and new, paradoxes. We already encountered the set-theoretic paradox of Russell (Section 1.1.4). A simpler, much older, paradox is that of the liar: consider the self-referential proposition “this proposition is false”. Is it true or false? Maybe it is both.² Be that as it may, in many matters – in mathematics, let alone in the empirical sciences – the dichotomy can be safely assumed.

D.3 Logical equivalence


Two classes of propositions are central: contradictions and tautologies. A proposition is called a contradiction if it is always false, while it is called a tautology if it is always true. Obviously, contradictions and tautologies have, respectively, truth tables with only values 0 and only values 1. For this reason, we write p ≡ 0 if p is a contradiction and p ≡ 1 if p is a tautology.
² A proposition such that both it and its negation are true has been called a dialetheia.

In other words, the symbol 0 denotes a generic contradiction and the symbol 1 a generic tautology.
Two propositions p and q are said to be (logically) equivalent, written p ≡ q, when they have the same truth values, i.e., they are always both true or both false. In other words, two propositions p and q are equivalent when the co-implication p ⟺ q is a tautology, i.e., it is always true. The relation ≡ is called logical equivalence.
The following properties are evident:

(i) p ∧ p ≡ p and p ∨ p ≡ p (idempotence);

(ii) ¬(¬p) ≡ p (double negation);

(iii) p ∧ q ≡ q ∧ p and p ∨ q ≡ q ∨ p (commutativity);

(iv) (p ∧ q) ∧ r ≡ p ∧ (q ∧ r) and (p ∨ q) ∨ r ≡ p ∨ (q ∨ r) (associativity).

Moreover, one has that:

(v) p ∧ ¬p ≡ 0 (law of non-contradiction);

(vi) p ∨ ¬p ≡ 1 (law of excluded middle).

In words, proposition p ∧ ¬p is a contradiction: a proposition and its negation cannot be both true. In contrast, proposition p ∨ ¬p is a tautology: a proposition is either true or false, tertium non datur. Indeed:

p  ¬p  p ∧ ¬p  p ∨ ¬p
1  0   0       1
0  1   0       1

If p is the proposition “all ravens are black”, the contradiction p ∧ ¬p is “all ravens are both black and non-black” and the tautology p ∨ ¬p is “all ravens are either black or non-black”.

De Morgan’s laws are:

¬(p ∧ q) ≡ ¬p ∨ ¬q    and    ¬(p ∨ q) ≡ ¬p ∧ ¬q

They can be proved through the truth tables; we confine ourselves to the first law:

p  q  p ∧ q  ¬(p ∧ q)  ¬p  ¬q  ¬p ∨ ¬q
1  1  1      0          0   0   0
1  0  0      1          0   1   1
0  1  0      1          1   0   1
0  0  0      1          1   1   1

The table shows that the truth values of ¬(p ∧ q) and of ¬p ∨ ¬q are identical, as claimed. Note an interesting duality: the laws of non-contradiction and of the excluded middle can be derived one from the other via de Morgan’s laws.

It is easily seen that p ⟹ q is equivalent to ¬q ⟹ ¬p, that is,

(p ⟹ q) ≡ (¬q ⟹ ¬p)        (D.2)

Indeed:

p  q  p ⟹ q  ¬p  ¬q  ¬q ⟹ ¬p
1  1  1       0   0   1
1  0  0       0   1   0
0  1  1       1   0   1
0  0  1       1   1   1

The proposition ¬q ⟹ ¬p is called the contrapositive of p ⟹ q. Each conditional is, therefore, equivalent to its contrapositive.

Finally, another remarkable equivalence for the conditional is

¬(p ⟹ q) ≡ (p ∧ ¬q)        (D.3)

That is, the negation of a conditional p ⟹ q is equivalent to the conjunction of p and the negation of q. Indeed:

p  q  p ⟹ q  ¬(p ⟹ q)  p ∧ ¬q
1  1  1       0           0
1  0  0       1           1
0  1  1       0           0
0  0  1       0           0

N.B. Given two equivalent propositions, one of them is a tautology if and only if the other
one is so. O

D.4 Deduction
D.4.1 Theorems and proofs
An equivalence is a biconditional which is a tautology, i.e., which is always true. In a similar vein, we call implication a conditional which is a tautology, that is, (p ⟹ q) ≡ 1. In this case, if p is true then also q is true.³ We say that q is a logical consequence of p, written p ⊨ q.
The antecedent p is now called hypothesis and the consequent q thesis. Naturally, we have p ≡ q when simultaneously p ⊨ q and q ⊨ p.
In our naive setup, a theorem is a proposition of the form p ⊨ q, that is, an implication. The proof is a logical argument that proves that the conditional p ⟹ q is actually an implication.⁴ To do this it is necessary to establish that, if the hypothesis p is true, then also the thesis q is true. Usually we choose one among the following three different types of proof:
³ When p is false the implication is automatically true, as the truth table (D.1) shows.
⁴ In these introductory notes we remain vague about what a “logical argument” is, leaving a more detailed analysis to more advanced courses. We expect, however, that readers can (intuitively) recognize, and elaborate, such arguments.

(a) direct proof: p ⊨ q, i.e., to establish directly that, if p is true, so is q;

(b) proof by contraposition: ¬q ⊨ ¬p, i.e., to establish that the contrapositive ¬q ⟹ ¬p is a tautology (i.e., that if q is false, so is p);

(c) proof by contradiction (reductio ad absurdum): p ∧ ¬q ⊨ r ∧ ¬r, i.e., to establish that the conditional p ∧ ¬q ⟹ r ∧ ¬r is a tautology (i.e., that, if p is true and q is false, we reach a contradiction r ∧ ¬r).

The proof by contraposition relies on the equivalence (D.2) and is, basically, an upside-down direct proof (for instance, Theorem 1554 will be proved by contraposition). For this reason, in what follows we will focus on the two main types of proofs: direct and by contradiction.

N.B. (i) When both p ⊨ q and q ⊨ p hold, the theorem takes the form of the equivalence p ≡ q. The implications p ⊨ q and q ⊨ p are independent and each of them requires its own proof (this is why in the book we studied separately the “if” and the “only if”). (ii) When, as is often the case, the hypothesis is the conjunction of several propositions, we write

p₁ ∧ ⋯ ∧ pₙ ⊨ q        (D.4)

So, the scope of the implication p ⊨ q is broader than it may appear prima facie. O

D.4.2 Direct proofs

Sometimes p ⊨ q can be proved with a direct argument.

Theorem 1548 If n is odd, then n² is odd.

Proof Since n is odd, there is a natural number k such that n = 2k + 1. Then n² = (2k + 1)² = 2(2k² + 2k) + 1, so n² is odd.

Direct proofs are, however, often articulated in several steps, in a divide et impera spirit. In this regard, the next result is key.

Proposition 1549 ⊨ is transitive.

Proof Assume p ⊨ r and r ⊨ q. We have to show that p ⟹ q is a tautology, that is, that if p is true, then q is true. Assume that p is true. Then, r is true because p ⊨ r. In turn, this implies that q is true because r ⊨ q.

By iterating transitivity, we then get the following deduction scheme: p ⊨ q if

p ⊨ r₁
r₁ ⊨ r₂
⋮
rₙ ⊨ q        (D.5)

The n auxiliary propositions rᵢ break up the direct argument into n steps, thus forming a chain of reasoning. We can write the scheme horizontally as:

p ⊨ r₁ ⊨ r₂ ⊨ ⋯ ⊨ rₙ ⊨ q

Example 1550 (i) Assume that p is “n² + 1 is odd” and q is “n is even”. To prove p ⊨ q, let us consider the auxiliary proposition r = “n² is even”. The implication p ⊨ r is obvious, while the implication r ⊨ q will be proved momentarily (Theorem 1553). Jointly, these two implications provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition “if n² + 1 is odd, then n is even”. (ii) Assume that p is “the scalar function f is differentiable” and q is “the scalar function f is integrable”. To prove p ⊨ q it is natural to consider the auxiliary proposition r = “the scalar function f is continuous”. The implications p ⊨ r and r ⊨ q are basic calculus results that, jointly, provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition “if the scalar function f is differentiable, then it is integrable”. N

When p ≡ p₁ ∨ ⋯ ∨ pₙ, we have the (easily checked) equivalence

(p₁ ∨ ⋯ ∨ pₙ) ⟹ q ≡ (p₁ ⟹ q) ∧ ⋯ ∧ (pₙ ⟹ q)

Consequently, to establish pᵢ ⊨ q for each i = 1, ..., n amounts to establishing p ⊨ q. This is the so-called proof by cases, where each pᵢ ⊨ q is a case. Needless to say, the proof of each case may require its own deduction scheme (D.5).

Theorem 1551 If n is any natural number, then n² + n is even.

Proof Assume that p is “n is any natural number”, p₁ is “n is an odd number”, p₂ is “n is an even number”, and q is “n² + n is even”. Since p ≡ p₁ ∨ p₂, we prove the two cases p₁ ⊨ q and p₂ ⊨ q.
Case 1: p₁ ⊨ q. We have n = 2k + 1 for some natural number k, so n² + n = (2k + 1)² + 2k + 1 = 2(2k² + 3k + 1), which is even.
Case 2: p₂ ⊨ q. We have n = 2k for some natural number k, so n² + n = (2k)² + 2k = 2(2k² + k), which is even.

D.4.3 Reductio ad absurdum

To understand the rationale of the proof by contradiction, note that the truth table

p  q  p ∧ ¬q  r ∧ ¬r  p ⟹ q  p ∧ ¬q ⟹ r ∧ ¬r
1  1  0       0        1       1
1  0  1       0        0       0
0  1  0       0        1       1
0  0  0       0        1       1

proves the logical equivalence

(p ⟹ q) ≡ (p ∧ ¬q ⟹ r ∧ ¬r)        (D.6)

Hence, p ⟹ q is true if and only if p ∧ ¬q ⟹ r ∧ ¬r is true. Consequently, to establish p ∧ ¬q ⊨ r ∧ ¬r amounts to establishing p ⊨ q.
It does not matter what the proposition r is because, in any case, r ∧ ¬r is a contradiction. In a more compact way, we can rewrite the previous equivalence as

(p ⟹ q) ≡ (p ∧ ¬q ⟹ 0)

The proof by contradiction is the most intriguing (recall Section 1.8 on the birth of the deductive method). We illustrate it with one of the gems of Greek mathematics that we saw in the first chapter. For brevity, we do not repeat the proof of the first chapter and just present its logical analysis.

Theorem 1552 √2 ∉ ℚ.

Logical analysis In this, as in other theorems, it might seem that there is no hypothesis, but it is not so: the hypothesis is simply concealed. For example, here the concealed hypothesis is “the axioms of arithmetic, in particular those about arithmetical operations, hold”. Let a be this concealed hypothesis,⁵ let q be the thesis “√2 ∉ ℚ”, and let r be the proposition “m/n is reduced to its lowest terms”. The scheme of the proof is a ∧ ¬q ⊨ r ∧ ¬r, i.e., if arithmetical operations apply, the negation of the thesis leads to a contradiction.

An important special case of the equivalence (D.6) is when the role of r is played by the hypothesis p itself. In this case, (D.6) becomes

(p ⟹ q) ≡ (p ∧ ¬q ⟹ p ∧ ¬p)

The following truth table

p  q  p ⟹ q  p ∧ ¬q  ¬p  p ∧ ¬q ⟹ ¬p  p ∧ ¬q ⟹ p ∧ ¬p
1  1  1       0        0   1              1
1  0  0       1        0   0              0
0  1  1       0        1   1              1
0  0  1       0        1   1              1

proves the equivalence (p ∧ ¬q ⟹ p ∧ ¬p) ≡ (p ∧ ¬q ⟹ ¬p). In the special case r = p the reductio ad absurdum is, therefore, based on the equivalence

(p ⟹ q) ≡ (p ∧ ¬q ⟹ ¬p)

In words, it is necessary to show that the hypothesis and the negation of the thesis imply, jointly, the negation of the hypothesis. Let us see an example.

Theorem 1553 If n² is even, then n is even.

Proof Let us assume, by contradiction, that n is odd. Then n² is odd (Theorem 1548), which contradicts the hypothesis.

Logical analysis Let p be the hypothesis “n² is even” and q the thesis “n is even”. The scheme of the proof is p ∧ ¬q ⊨ ¬p.
⁵ This discussion will become clearer after the next section on the deductive method. In any case, we can think of a = a₁ ∧ ⋯ ∧ aₙ as the conjunction of a collection A = {a₁, ..., aₙ} of axioms of arithmetic (in our naive setup, we do not worry whether all such axioms can be expressed via propositional calculus, an issue that readers will study in more advanced courses). In terms of (D.7), in this theorem there is no specific hypothesis p.

D.4.4 Summing up
Proofs require, in general, some inspiration: there are no recipes or mechanical rules that can help us find, in a proof by contradiction, an auxiliary proposition r that determines the contradiction or, in a direct proof, the auxiliary propositions rᵢ that permit to articulate a direct argument.
As to terminology, the implication p ⊨ q can be read in different, but equivalent, ways:

(i) p implies q;

(ii) if p, then q;

(iii) p only if q;

(iv) q if p;

(v) p is a sufficient (condition) for q;

(vi) q is a necessary (condition) for p.

The choice among these versions is a matter of expositional convenience. Similarly, the equivalence p ≡ q can be read as:

(i) p if and only if q;

(ii) p is a necessary and sufficient (condition) for q.

For example, the next simple result shows that the implication “a > 1 ⊨ a² > 1” is true, i.e., that “a > 1 is a sufficient condition for a² > 1”, i.e., that “a² > 1 is a necessary condition for a > 1”.

Theorem 1554 If a > 1, then a² > 1.

Proof Let us proceed by contraposition. Let a² ≤ 1. We want to show that a ≤ 1. This follows by observing that a ≤ |a| = √(a²) ≤ 1.

D.5 Deductive method


D.5.1 Collections
Let P be a collection of propositions that is closed under the logical operations ∨, ∧, ¬, ⟹, and ⟺. For instance, if the propositions a, b and c belong to P, then also the proposition ¬((a ∨ ¬b) ⟹ c) belongs to P.
If Γ = {p₁, ..., pₙ} is a collection of propositions in P, we say that q is a logical consequence of Γ, and we denote the implication (D.4) by Γ ⊨ q. Logical consequences are established via deductive reasoning. Such reasoning might well be sequential, according for example to the deduction scheme (D.5).
If all propositions in Γ are true, so are their logical consequences. We say that Γ is (logically):

(i) consistent if there is no q ∈ P such that both Γ ⊨ q and Γ ⊨ ¬q;

(ii) independent if there is no p ∈ Γ such that Γ − {p} ⊨ p;

(iii) complete if, for all q ∈ P, either Γ ⊨ q or Γ ⊨ ¬q.

In words, consistency requires that the conjunction p = p₁ ∧ ⋯ ∧ pₙ of the propositions in Γ not be a contradiction, while independence requires that no proposition in Γ be a logical consequence of the other ones in Γ (such a proposition would be superfluous). Finally, completeness requires that each proposition in P, or its negation, be a logical consequence of the propositions in Γ.

D.5.2 Deductive method

Using the few notions of propositional logic that we learned, we can now outline a (highly stylized) description of the deductive (or axiomatic) method, which is a central canon of Western thought after Greek geometry (cf. Section 1.8).
In a mathematical theory, the propositions in P are written through primitive terms, whose meaning is regarded as self-evident (so not explained, famous examples being “points” and “lines” in Euclidean geometry and “sets” in set theory), and through defined terms, whose meaning is expressed in terms either of primitive terms or of previously defined terms.
The theory then posits a set of propositions A = {a₁, ..., aₙ} in P, called axioms, that are assumed to be true “without establishing them in any way” (e.g., the parallel axiom in Euclidean geometry).⁶ The set A, called an axiomatic system, is assumed to be consistent, so the conjunction a = a₁ ∧ ⋯ ∧ aₙ of the axioms is not a contradiction. Ideally, the axiomatic system should be independent, so that there are no redundant axioms. The axiomatic system is complete when the truth or falsehood of every proposition in P can, in principle, be deduced from the axioms.
Theorems in the theory take the form

Γ = A ∪ {p} ⊨ q        (D.7)

That is, Γ consists of the axioms as well as of a specific hypothesis (which, of course, can in turn be the conjunction of several propositions). Note that here Γ ⊨ q stands for a ∧ p ⊨ q, where a = a₁ ∧ ⋯ ∧ aₙ is the conjunction of the axioms A = {a₁, ..., aₙ}.
Normally, to ease exposition, axioms are omitted in theorems’ statements because they are taken for granted within the mathematical theory at hand. So, we just write p ⊨ q in place of A ∪ {p} ⊨ q. For instance, in Euclidean geometry theorems do not mention the axioms which they rely upon, for instance the parallel axiom, but only the specific hypothesis of the theorem.

The scope of a mathematical theory is given by the propositions that, via theorems (D.7), can be established to be true from the axioms in A and from specific hypotheses p (which are required not to contradict the axioms, i.e., a ∧ p is not a contradiction). If these hypotheses follow from the axioms, (D.7) is actually A ⊨ q. If the axiomatic system is complete, all theorems then take the form A ⊨ q.
⁶ As Tarski (1994) writes on p. 110. Alfred Tarski has been, along with David Hilbert and Giuseppe Peano, a central figure in the modern analysis of the deductive method in mathematics. We refer readers to his book for a masterly introduction to the subject.

D.5.3 A miniature theory

Following Tarski (1994), consider a miniature mathematical theory that has two primitive terms I and ≅. The symbol I indicates the set of all segments (denoted by the letters x, y, z, ...) of the real line. The symbol ≅ indicates the congruence relation between segments, so that x ≅ y reads as “the segment x is congruent with the segment y”. Two axioms are considered.

A.1 The proposition a₁ = “x ≅ x for all x ∈ I” is true (i.e., ≅ is reflexive).

A.2 The proposition a₂ = “x ≅ z and y ≅ z imply x ≅ y for all x, y, z ∈ I” is true.

Let q = “x ≅ y if and only if y ≅ x for all x, y ∈ I” (i.e., ≅ is symmetric).

Theorem 1555 We have A ⊨ q.

Proof We have a₂ ⊨ r, where r = “z ≅ z and y ≅ z imply z ≅ y for all y, z ∈ I”. So, the proof relies on the deduction scheme a₁ ∧ a₂ ⊨ a₁ ∧ r ⊨ q.⁷

Thus, under the axioms – i.e., A.1 and A.2 – the binary relation ≅ is symmetric. It is easily checked to be also transitive.

D.5.4 Interpretations
The specific meaning attached to the primitive terms is irrelevant for the formal deductions carried out via (D.7). For instance, following again Tarski (1994), consider an alternative interpretation of the primitive terms of the previous theory in which I now indicates a set of numbers and the symbol ≅ indicates a congruence relation in which x ≅ y reads as “there is an integer z such that x − y = z”. Axioms A.1 and A.2 and the resulting Theorem 1555 still apply.
So, the same mathematical theory may admit different interpretations, whose meaning is understood outside the theory – which thus takes it for granted. The expression “self-evident” is now replaced by this more general principle. For this reason, in modern mathematics the emphasis is on the consistency of the axioms rather than on their self-evidence (as it was in Greek geometry), a notion that implicitly refers to a specific interpretation. As readers will learn in logic courses, axioms have their own syntactic life that abstracts from any specific interpretation (semantics). For instance, in Tarski’s miniature example the underlying general abstract structure consists of a set X and a binary relation R on it. Any interpretation of X and R provides a model for such an abstract structure. The abstract axioms are:

A.1 the proposition a₁ = “R is reflexive” is true;

A.2 the proposition a₂ = “xRz and yRz imply xRy for all x, y, z ∈ X” is true.
⁷ It is easy to check using truth tables that from q ⊨ r it follows that p ∧ q ⊨ p ∧ r, for all propositions p, q and r.

If we set q = “R is symmetric”, we have the abstract version of Theorem 1555.

All this is a bit pedantic, however. In a more imprecise, yet much more suggestive, way these two abstract axioms can be stated as:

A.1 R is reflexive;

A.2 If xRz and yRz, then xRy, for all x, y, z ∈ X.

If we call Tarskian the property in A.2, we can state the abstract version of Theorem 1555 in a legible way.

Theorem 1556 If a binary relation is reflexive and Tarskian, then it is symmetric.

In all models of the abstract structure (X, R) this theorem holds and will be suitably interpreted.
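Since Theorem 1556 concerns an arbitrary structure (X, R), it can even be verified mechanically on small models. The following sketch (entirely ours) enumerates all 2⁹ = 512 binary relations on a three-element set and confirms that every reflexive and Tarskian one is symmetric:

```python
from itertools import combinations, product

X = range(3)
pairs = list(product(X, repeat=2))

def reflexive(R):
    return all((x, x) in R for x in X)

def tarskian(R):
    # xRz and yRz imply xRy, for all x, y, z
    return all(not ((x, z) in R and (y, z) in R) or (x, y) in R
               for x in X for y in X for z in X)

def symmetric(R):
    return all((y, x) in R for (x, y) in R)

relations = (set(S) for k in range(len(pairs) + 1)
             for S in combinations(pairs, k))
assert all(symmetric(R) for R in relations if reflexive(R) and tarskian(R))
print("reflexive + Tarskian => symmetric: verified on a 3-element set")
```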

D.6 Predicates and quantifiers


D.6.1 Generalities
The symbols ∀ and ∃ mean, respectively, “for every” and “there exists (at least one)” and are called the universal quantifier and the existential quantifier. Their role is fundamental in mathematics. For example, the statement x² = 1 is, per se, meaningless. By completing it by writing

∀x ∈ ℝ, x² = 1        (D.8)

we would make a big mistake; by writing, instead,

∃x ∈ ℝ, x² = 1        (D.9)

we would assert a (simple) truth: there is some real number (there are actually two of them: x = ±1) whose square is 1.

To understand the role of quantifiers, we consider expressions – called (logical) predicates and denoted by p(x) – that contain an argument x that varies in a given set X, the domain (or universe of discourse). For example, the predicate p(x) can be “x² = 1” or “in the year x it rained in Milan”. Once a specific value x of the domain is considered, we have a proposition p(x) that may be either true or false. For instance, if X is the real line and x = 3, the proposition “x² = 1” is false; it becomes true if and only if x = ±1.
The propositions

∃x ∈ X, p(x)        (D.10)

and

∀x ∈ X, p(x)        (D.11)

mean that p(x) is true at least for some x in the domain and that p(x) is true for every such x, respectively. For example, when p(x) is “x² = 1”, propositions (D.10) and (D.11) reduce, respectively, to propositions (D.9) and (D.8), while for the weather predicate they become the propositions “there exists a year in which it rained in Milan” and “every year it rained in Milan”. Note that when the domain is finite, say X = {x₁, ..., xₙ}, the propositions (D.10) and (D.11) can be written as p(x₁) ∨ ⋯ ∨ p(xₙ) and p(x₁) ∧ ⋯ ∧ p(xₙ), respectively.
Quantifiers transform, therefore, predicates into propositions, that is, into statements that are either true or false. That said, if X is infinite, to verify whether proposition (D.11) is true requires an infinite number of checks, i.e., whether p(x) is true for each x ∈ X. Operationally, such a truth value cannot be determined. In contrast, to verify that (D.11) is false it is enough to exhibit one x ∈ X such that p(x) is false. There is, therefore, a clear asymmetry between the operational content of the two truth values of (D.11). A large X reinforces the asymmetry between verification and falsification that a large n already causes, as we remarked in the Coda (a proposition “∀x ∈ X, p₁(x) ∧ ⋯ ∧ pₙ(x)” would combine, so magnify, these two sources of asymmetry).
In contrast, the existential proposition (D.10) can be verified via an element x ∈ X such that p(x) is true. Of course, if X is large (let alone if it is infinite), it may be operationally not obvious how to find such an element. Be that as it may, falsification is in much bigger trouble: to verify that proposition (D.10) is false we should check that, for all x ∈ X, the proposition p(x) is false. Operationally, existential propositions are typically not falsifiable.

N.B. (i) In the book we often write “p(x) for every x ∈ X” in the form

p(x)    ∀x ∈ X

instead of ∀x ∈ X, p(x). It is a common way to handle universal quantifiers. (ii) If X = X₁ × ⋯ × Xₙ is a Cartesian product, the predicate takes the form p(x₁, ..., xₙ) because x = (x₁, ..., xₙ). O

D.6.2 Algebra
In a sense, ∀ and ∃ represent the negation of one another. So⁸

¬(∃x, p(x)) ≡ ∀x, ¬p(x)

and, symmetrically,

¬(∀x, p(x)) ≡ ∃x, ¬p(x)

In the example where p(x) is “x² = 1”, we can equally well write:

¬(∀x, x² = 1)    or    ∃x, x² ≠ 1

(respectively: it is not true that x² = 1 for every x, and it is true that for some x one has x² ≠ 1).
More generally,

¬(∀x, ∃y, p(x, y)) ≡ ∃x, ∀y, ¬p(x, y)

For example, let p(x, y) be the proposition “x + y² = 0”. We can equally assert that

¬(∀x, ∃y, x + y² = 0)

⁸ To ease notation, in the quantifiers we omit the clause “∈ X”.

(it is not true that, for every x ∈ ℝ, we can find a value of y ∈ ℝ such that the sum x + y² is zero: it is sufficient to take x = 5) or

∃x, ∀y, x + y² ≠ 0

(it is true that there exists some value of x ∈ ℝ such that, for every choice of y ∈ ℝ, x + y² ≠ 0: again, it is sufficient to take x = 5, since y² ≥ 0).

D.6.3 Example: linear dependence

In Chapter 3 a finite set of vectors {x¹, ..., xᵐ} of ℝⁿ has been called linearly independent if, for every set {α₁, ..., αₘ} of real numbers,

α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0 ⟹ α₁ = α₂ = ⋯ = αₘ = 0

The set {x¹, ..., xᵐ} has been, instead, called linearly dependent if it is not linearly independent, i.e., if there exists a set {α₁, ..., αₘ} of real numbers, not all equal to zero, such that α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0.
We can write these notions by making the role of predicates explicit. Let p(α₁, ..., αₘ) and q(α₁, ..., αₘ) be the predicates “α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0” and “α₁ = α₂ = ⋯ = αₘ = 0”, respectively. The set {x¹, ..., xᵐ} is linearly independent when

∀{α₁, ..., αₘ}, p(α₁, ..., αₘ) ⟹ q(α₁, ..., αₘ)

In words, for every set {α₁, ..., αₘ} of real numbers, if α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0, then α₁ = α₂ = ⋯ = αₘ = 0.
The negation is

∃{α₁, ..., αₘ}, ¬(p(α₁, ..., αₘ) ⟹ q(α₁, ..., αₘ))

that is, thanks to the equivalence (D.3),

∃{α₁, ..., αₘ}, p(α₁, ..., αₘ) ∧ ¬q(α₁, ..., αₘ)

In words, there exists a set {α₁, ..., αₘ} of real numbers, not all simultaneously null, such that α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0.
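Computationally, this existential definition is usually checked through the rank of the vectors: {x¹, ..., xᵐ} is linearly dependent exactly when the rank is smaller than m. A standard-library sketch (the rank routine and the sample vectors are ours):

```python
def rank(vectors):
    """Gaussian elimination on copies of the vectors; returns the rank."""
    rows = [list(map(float, v)) for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows))
                      if abs(rows[i][col]) > 1e-12), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

x1, x2, x3 = (1, 0, 1), (0, 1, 1), (1, 1, 2)        # x3 = x1 + x2
print(rank([x1, x2, x3]) < 3)                        # True: dependent
print(rank([(1, 0, 0), (0, 1, 0), (0, 0, 1)]) < 3)   # False: independent
```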

D.6.4 Example: negation of convergence

What is the correct negation of the definition of convergence? Recall that a sequence {xₙ} converges to a point L ∈ ℝ if for every ε > 0 there exists n_ε ≥ 1 such that

n ≥ n_ε ⟹ |xₙ − L| < ε        (D.12)

By making all quantifiers explicit, we can succinctly write

∀ε > 0, ∃n_ε ≥ 1, ∀n ≥ n_ε, |xₙ − L| < ε

The negation is then

∃ε > 0, ∀k ≥ 1, ∃n ≥ k, |xₙ − L| ≥ ε

In other words, a sequence {xₙ} does not converge to a point L ∈ ℝ if there exists ε > 0 such that for each k ≥ 1 there is n ≥ k such that

|xₙ − L| ≥ ε

By denoting by nₖ any such n ≥ k,⁹ we define a subsequence {xₙₖ} such that |xₙₖ − L| ≥ ε for all k ≥ 1. So, we have the following useful characterization of non-convergence to a given point.

Proposition 1557 A sequence {xₙ} does not converge to a point L ∈ ℝ if and only if there exist ε > 0 and a subsequence {xₙₖ} such that |xₙₖ − L| ≥ ε for all k ≥ 1.
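For a concrete instance of Proposition 1557, take xₙ = (−1)ⁿ and L = 1: with ε = 1, every odd index is a witness, and these indices define the required subsequence. A toy sketch:

```python
# The odd-indexed subsequence of x_n = (-1)^n stays at distance 2 >= eps
# from L = 1, so the sequence does not converge to 1.
L, eps = 1, 1.0
x = lambda n: (-1) ** n
witnesses = [n for n in range(1, 16) if abs(x(n) - L) >= eps]
print(witnesses)   # [1, 3, 5, 7, 9, 11, 13, 15]: a subsequence of witnesses
```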

D.6.5 A set-theoretic twist

There is a close connection between predicates and sets. Indeed, any predicate p(x) can be identified with the set A of all elements x of X such that the proposition p(x) is true, i.e., A = {x ∈ X : p(x) is true}. Clearly,

p(x) is true ⟺ x ∈ A

So, predicates and sets are two sides of the same coin. Indeed, predicates formalize the specification of sets via a property that their elements have in common, as we mentioned at the very beginning of the book.
In a similar vein, a binary predicate p(x, y), with two arguments that belong to the same set X, can be identified with the binary relation R on X consisting of all pairs (x, y) such that the proposition p(x, y) is true, i.e., R = {(x, y) ∈ X × X : p(x, y) is true}. Clearly,

p(x, y) is true ⟺ xRy

We conclude that binary predicates and binary relations are also two sides of the same coin. In general, predicates with n arguments can be identified with n-ary relations, as readers will learn in more advanced courses. In any case, the set-theoretic translation of some key logical notions is a further wonder of Cantor’s paradise.

D.7 Coda: the logic of empirical scientific theories

Inspired by the deductive method outlined before, we can sketch a description of a deductive and realist scientific theory about a physical or social empirical reality.¹⁰
Let P be a collection of propositions closed with respect to the logical operations. Propositions are written through primitive terms, whose empirical meaning is taken for granted

⁹ The construction of this subsequence is, actually, a bit delicate. Indeed, for {xₙₖ} to be a subsequence we need to construct the nₖ so that n₁ < n₂ < ⋯ < nₖ < nₖ₊₁ < ⋯. To start with, note that if {xₙ} does not converge to L, then for each m ≥ 1 the set N(m) = {n ≥ 1 : n ≥ m and |xₙ − L| ≥ ε} is non-empty. Define then n₁ = min N(1) and, recursively, nₖ₊₁ = min N(nₖ + 1) for every k. Since each N(m) is non-empty, nₖ is well defined.
¹⁰ Realism is a methodological position, widely held in the practice of natural and social science, that asserts the existence of an external, objective reality that it is the purpose of scientific inquiry to investigate.

by the theory, and of de…ned terms. So written, the propositions in P are either true or
false, and the collection P describes the empirical phenomenon under investigation.11
A function v : P ! f0; 1g assigns a truth value to all propositions in P . Each truth
assignment v corresponds to a possible con…guration of the empirical reality in which the
propositions in P are either true or false. Each truth assignment is, thus, a possible inter-
pretation that reality may give P . There is a unique true v because there is a unique true
empirical reality.
Let V be the collection of all truth assignments. A proposition p 2 P is a tautology if
v (p) = 1 for all v 2 V and is a contradiction if v (p) = 0 for all v 2 V . In words, a tautology
is a proposition that is true under all interpretations, while a contradiction is a proposition
that is false under all them. The truth value of tautologies and contradictions thus only
depend on their own form, regardless of any interpretation that they can take.12

Lemma 1558 p j= q if and only if v (p) v (q) for all v 2 V .

Proof Let p j= q. If p is true also q is true (both values equal to 1); if p is false (value 0), q
can be true or false (value either 0 or 1). Thus, v (p) v (q) for all v 2 V . The converse is
easily checked.

Let v be the true con…guration of the empirical reality under investigation. A scienti…c
theory takes a stance about the empirical reality that it is studying by positing a consistent
collection A = fa1 ; :::; an g of propositions, called axioms, that are assumed to be true under
the (unknown) true con…guration v , i.e., it is assumed that v (ai ) = 1 for each i = 1; :::; n.
All propositions that are logical consequences of the axioms are then assumed to be true
under v .13 In particular, if A is complete the truth value of all propositions in P can be, in
principle, decided. So, the function v is identi…ed.

Example 1559 (i) A choice theory studies the behavior of a consumer who faces different bundles of goods. Consider a choice theory that has two primitive terms I and ∼ (cf. Section D.5.3). The symbol I indicates the set of all bundles of goods available to the consumer. The symbol ∼ indicates the consumer’s indifference relation between the bundles, so that x ∼ y reads as “for the consumer, bundle x is indifferent to bundle y”.¹⁴ If the theory assumes axioms A.1 and A.2, so the truth of propositions a₁ and a₂, then ∼ is symmetric (Theorem 1555) and transitive. By assuming these two axioms, the theory takes a stance about the consumer’s behavior, which is the empirical reality that it is studying. The theory is correct as long as these axioms are true, i.e., v*(a₁) = v*(a₂) = 1.¹⁵ (ii) Special relativity is based
¹¹ Of course, behind this sentence there are a number of highly non-trivial conceptual issues about meaning, truth, reality, etc. (an early classical analysis of these issues can be found in Carnap, 1936).
¹² The importance of propositions whose truth value is independent of any interpretation was pointed out by Ludwig Wittgenstein in his famous Tractatus (the use of the term tautology in logic is due to him; he also popularized the use of truth tables to handle truth assignments).
¹³ In the words of Wittgenstein: “If a god creates a world in which certain propositions are true, he creates thereby also a world in which all propositions consequent on them are true.” (Tractatus, proposition 5.123)
¹⁴ Needless to say, after congruence relations on segments and integers, the indifference relation on bundles of goods is yet another model of the abstract structure (X, R) of Section D.5.4.
¹⁵ Different interpretations of this theory are, of course, possible. Debreu (1959) is a classic axiomatic work in economics; in the preface of his book, Debreu writes that “Allegiance to rigor dictates the axiomatic form of the analysis where the theory, in the strict sense, is logically entirely disconnected from its interpretations.”

on two axioms: a1 =“invariance of the laws of physics in all inertial frames of reference”,
a2 =“the velocity of light in vacuum is the same in all inertial frames of reference”. If v is
the true physical con…guration, the theory is true if v (a1 ) = v (a2 ) = 1. N

To decide whether a scienti…c theory is true we thus have to check whether v (ai ) = 1
for each i = 1; :::; n. If n is large, operationally this might be complicated (infeasible if is
in…nite). In contrast, to falsify the theory it is enough to exhibit, directly, a proposition of
that is false or, indirectly, a consequence of that is false. This operational asymmetry
between veri…cation and falsi…cation (emphasized by Karl Popper in the 1930s) is an im-
portant methodological aspect. Indirect falsi…cation is, in general, the kind of falsi…cation
that one might hope for. It is the so-called testing of the implications of a scienti…c theory.
In this indirect case, however, it is unclear which one of the posited axioms actually fails: in
fact, : (p1 ^ ^ pn ) :p1 _ _:pn . If not all the posited axioms have the same status,
only some of them being “core”axioms (as opposed to auxiliary ones), it is then unclear how
serious is the falsi…cation. Indeed, falsi…cation is often a chimera (especially in the social
sciences), as even the highly stylized setup of this section should suggest.
Appendix E

Mathematical induction

E.1 Generalities
Suppose that we want to prove that a proposition p(n), formulated for every natural number n, is true for every such n. Intuitively, it is sufficient to show that the “initial” proposition p(1) is true and that the truth of each proposition p(n) implies that of the “subsequent” one p(n + 1). Next we formalize this domino argument:¹

Theorem 1560 (Induction principle) Let p (n) be a proposition stated in terms of each
natural number n. Suppose that:

(i) p (1) is true;

(ii) for each n, if p(n) is true, then p(n + 1) is true.


Then, proposition p (n) is true for each n.

Proof Suppose, by contradiction, that proposition p(n) is false for some n. Denote by n₀ the smallest such n, which exists since every non-empty collection of natural numbers has a smallest element.² By (i), n₀ > 1. Moreover, by the definition of n₀, the proposition p(n₀ − 1) is true. By (ii), p(n₀) is then true, a contradiction.

A proof by induction thus consists of two steps:

(i) Initial step: prove that the proposition p (1) is true.

(ii) Induction step: prove that, for each n, if p(n) is true (induction hypothesis), then
p(n + 1) is true.

We illustrate this important type of proof by determining the sum of some important
series.
¹ There are many soldiers, one next to the other. The first has the “right scarlet fever”, a rare form of scarlet fever that instantaneously contaminates whoever is at the right of the sick person. All the soldiers catch it because the first one infects the second one, the second one infects the third one, and so on.
² In the set-theoretic jargon, we say that ℕ is a well-ordered set.


(i) We have
$$1 + 2 + \cdots + n = \sum_{s=1}^{n} s = \frac{n(n+1)}{2}$$
Initial step. For n = 1 the property is trivially true:
$$1 = \frac{1(1+1)}{2}$$
Induction step. Assume it is true for n = k (induction hypothesis), that is,
$$\sum_{s=1}^{k} s = \frac{k(k+1)}{2}$$

We must prove that it is true also for n = k + 1, i.e., that
$$\sum_{s=1}^{k+1} s = \frac{(k+1)(k+2)}{2}$$

Indeed,³
$$\sum_{s=1}^{k+1} s = \sum_{s=1}^{k} s + (k+1) = \frac{k(k+1)}{2} + k + 1 = \frac{(k+1)(k+2)}{2}$$

In particular, the sum of the first n odd numbers is $n^2$:
$$\sum_{s=1}^{n} (2s-1) = 2\sum_{s=1}^{n} s - \sum_{s=1}^{n} 1 = 2\,\frac{n(n+1)}{2} - n = n^2$$

(ii) We have
$$1^2 + 2^2 + \cdots + n^2 = \sum_{s=1}^{n} s^2 = \frac{n(n+1)(2n+1)}{6}$$
Initial step. For n = 1 the property is trivially true:
$$1^2 = \frac{1(1+1)(2+1)}{6}$$
Induction step. By proceeding as above, we get:
$$\sum_{s=1}^{k+1} s^2 = \sum_{s=1}^{k} s^2 + (k+1)^2 = \frac{k(k+1)(2k+1)}{6} + (k+1)^2$$
$$= \frac{(k+1)\left[k(2k+1) + 6(k+1)\right]}{6} = \frac{(k+1)(2k^2+7k+6)}{6} = \frac{(k+1)(k+2)(2k+3)}{6}$$
as claimed.
³ Alternatively, this sum can be derived by observing that the sum of the first and of the last addend is n + 1, the sum of the second one and of the second-last one is still n + 1, and so on. There are n/2 such pairs and therefore the sum is (n + 1)n/2.

(iii) We have
$$1^3 + 2^3 + \cdots + n^3 = \sum_{s=1}^{n} s^3 = \left(\sum_{s=1}^{n} s\right)^2 = \frac{n^2(n+1)^2}{4}$$

Initial step. For n = 1 the property is trivially true:
$$1^3 = \frac{1^2(1+1)^2}{4}$$
Induction step. By proceeding as above, we get:
$$\sum_{s=1}^{k+1} s^3 = \sum_{s=1}^{k} s^3 + (k+1)^3 = \frac{k^2(k+1)^2}{4} + (k+1)^3$$
$$= \frac{(k+1)^2\left[k^2 + 4(k+1)\right]}{4} = \frac{(k+1)^2(k+2)^2}{4}$$

(iv) Consider the sum
$$a + aq + aq^2 + \cdots + aq^{n-1} = \sum_{s=1}^{n} aq^{s-1} = a\,\frac{1-q^n}{1-q}$$
of n terms in the geometric progression with first term a and common ratio q ≠ 1.
Initial step. For n = 1 the formula is trivially true:
$$a = a\,\frac{1-q}{1-q}$$
Induction step. By proceeding as above, we get
$$\sum_{s=1}^{k+1} aq^{s-1} = \sum_{s=1}^{k} aq^{s-1} + aq^k = a\,\frac{1-q^k}{1-q} + aq^k = a\,\frac{1-q^k+(1-q)q^k}{1-q} = a\,\frac{1-q^{k+1}}{1-q}$$
as claimed.
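All four closed forms proved above are easy to confirm numerically; a quick sketch (the sampled values of n, a and q are arbitrary choices of ours):

```python
for n in (1, 5, 10, 37):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    assert sum(s * s for s in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
    assert sum(s ** 3 for s in range(1, n + 1)) == (n * (n + 1) // 2) ** 2
    a, q = 2.0, 0.5   # geometric progression: first term a, common ratio q
    assert abs(sum(a * q ** (s - 1) for s in range(1, n + 1))
               - a * (1 - q ** n) / (1 - q)) < 1e-12
print("all four formulas confirmed")
```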

E.2 The harmonic Mengoli

As a last illustration of the induction principle, we report a modern version of the classic proof by Pietro Mengoli of the divergence of the harmonic series, presented in his 1650 essay Novae quadraturae arithmeticae seu de additione fractionum.

Theorem 1561 The harmonic series is divergent.

The proof is based on a couple of lemmas, the second of which is proved by induction.

Lemma 1562 We have, for every k ≥ 2,
$$\frac{1}{k-1} + \frac{1}{k} + \frac{1}{k+1} \ge \frac{3}{k}$$

Proof Consider the convex function $f : (0, \infty) \to (0, \infty)$ defined by f(x) = 1/x. Since
$$k = \frac{1}{3}(k-1) + \frac{1}{3}k + \frac{1}{3}(k+1)$$
Jensen’s inequality implies
$$\frac{1}{k} = f(k) \le \frac{1}{3}\left(f(k-1) + f(k) + f(k+1)\right) = \frac{1}{3}\left(\frac{1}{k-1} + \frac{1}{k} + \frac{1}{k+1}\right)$$
as claimed.
Let $s_n = \sum_{k=1}^{n} x_k$ be the partial sum of the harmonic series, where $x_k = 1/k$.

Lemma 1563 We have $s_{3n+1} \ge s_n + 1$ for every $n \ge 1$.

Proof We proceed by induction. Initial step: n = 1. We apply the previous lemma with k = 3:
$$s_{3\cdot 1+1} = s_4 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \ge 1 + \frac{3}{3} = 1 + s_1$$

Induction step: let us assume that the statement holds for n ≥ 1. We prove that it holds for n + 1. We apply the previous lemma with k = 3n + 3:
$$s_{3(n+1)+1} = s_{3n+4} = s_{3n+1} + \frac{1}{3n+2} + \frac{1}{3n+3} + \frac{1}{3n+4} \ge s_n + 1 + \frac{1}{3n+2} + \frac{1}{3n+3} + \frac{1}{3n+4}$$
$$\ge s_n + 1 + \frac{3}{3n+3} = s_n + 1 + \frac{1}{n+1} = s_{n+1} + 1$$
which completes the induction step. In conclusion, the result holds thanks to the induction principle.

Proof of the theorem Since the harmonic series has positive terms, the sequence of its partial sums {sₙ} is monotonically increasing. Therefore, it either converges or diverges. By contradiction, let us assume that it converges, i.e., sₙ ↑ L < ∞. From the last lemma it follows that
$$L = \lim_n s_{3n+1} \ge \lim_n (1 + s_n) = 1 + \lim_n s_n = 1 + L$$

which is a contradiction.
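Lemma 1563 can be watched at work numerically: along the indices 1, 4, 13, 40, 121 (each obtained from the previous one via n ↦ 3n + 1) the partial sums gain at least 1 at every step, so they cannot stay bounded. A small sketch:

```python
def s(n):
    """Partial sum of the harmonic series up to 1/n."""
    return sum(1 / k for k in range(1, n + 1))

n = 1
for _ in range(5):
    print(n, round(s(n), 4))   # 1 1.0, 4 2.0833, 13 3.1801, 40 4.2785, ...
    n = 3 * n + 1
```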
Appendix F

Cast of characters

Archimedes (Syracuse 287 BC ca. –212 BC), mathematician.


Aristotle (Stagira 384 BC –Euboea 322 BC), philosopher and physicist.
Kenneth Arrow (New York 1921 –Palo Alto 2017), economist.
Emil Artin (Vienna 1898 –Hamburg 1962), mathematician.
René Baire (Paris 1874 - Chambéry 1932), mathematician.
Stefan Banach (Kraków 1892 –Lviv 1945), mathematician.
Heinz Bauer (Nuremberg 1928 –Erlangen 2002), mathematician.
Jeremy Bentham (London 1748 –1832), philosopher.
Daniel Bernoulli (Groningen 1700 –Basel 1782), mathematician.
Jakob Bernoulli (Basel 1654 –1705), mathematician.
Johann Bernoulli (Basel 1667 –1748), mathematician.
Sergei Bernstein (Odessa 1880 –Moscow 1968), mathematician.
Jacques Binet (Rennes 1786 – Paris 1856), mathematician.
David Blackwell (Centralia 1919 –Berkeley 2010), mathematician and statistician.
Bernard Bolzano (Prague 1781 –1848), mathematician and philosopher.
Émile Borel (Saint-Affrique 1871 – Paris 1956), mathematician.
Luitzen Brouwer (Overschie, 1881 –Blaricum 1966), mathematician and philosopher.
Cesare Burali-Forti (Arezzo 1861 –Turin 1931), mathematician.
Renato Caccioppoli (Naples 1904 –1959), mathematician.
Georg Cantor (Saint Petersburg 1845 –Halle 1918), mathematician.
Alfredo Capelli (Milan 1855 –Naples 1910), mathematician.
Gerolamo Cardano (Pavia 1501 –Rome 1576), mathematician.
Augustin-Louis Cauchy (Paris 1789 –Sceaux 1857), mathematician.
Ernesto Cesàro (Naples 1859 –Torre Annunziata 1906), mathematician.
Gustave Choquet (Solesmes 1915 - Lyon 2006), mathematician.
Gabriel Cramer (Geneva 1704 –Bagnols-sur-Cèze 1752), mathematician.
Jean Darboux (Nimes, 1842 –Paris 1917), mathematician.


Gerard Debreu (Calais 1921 –Paris 2004), economist.


Richard Dedekind (Braunschweig 1831 –1916), mathematician.
Democritus (Abdera 460 BC ca. –370 BC ca.), philosopher.
René Descartes (Cartesius) (La Haye 1596 – Stockholm 1650), mathematician and philosopher.
Diophantus (Alexandria, III century AD), mathematician.
Ulisse Dini (Pisa 1845 –1918), mathematician.
Peter Lejeune Dirichlet (Düren 1805 –Göttingen 1859), mathematician.
Francis Edgeworth (Edgeworthstown 1845 –Oxford 1926), economist.
Epicurus (Samos 341 BC –Athens 270 BC), philosopher.
Euclid (Alexandria, IV - III century BC), mathematician.
Eudoxus (Cnidus, IV century BC), mathematician.
Leonhard Euler (Basel 1707 –Saint Petersburg 1783), mathematician.
Leonardo da Pisa (Fibonacci) (Pisa ca. 1170 –ca. 1240), mathematician.
Werner Fenchel (Berlin 1905 –Copenhagen 1988), mathematician.
Pierre de Fermat (Beaumont-de-Lomagne 1601 – Castres 1665), lawyer and mathematician.
Bruno de Finetti (Innsbruck 1906 –Rome 1985), mathematician.
Nicolò Fontana (Tartaglia) (Brescia 1499 –Venice 1557), mathematician.
Ferdinand Frobenius (Charlottenburg 1849 –Berlin 1917), mathematician.
Galileo Galilei (Pisa 1564 –Arcetri 1642), astronomer and physicist.
Carl Gauss (Brunswick 1777 –Göttingen 1855), mathematician.
Guido Grandi (Cremona 1671 –Pisa 1742), mathematician.
Jacques Hadamard (Versailles 1865 –Paris 1963), mathematician.
Felix Hausdor¤ (Breslau 1868 –Bonn 1942), mathematician.
Heinrich Heine (Berlin 1821 –Halle 1881), mathematician.
Heron (Alexandria I century AD), mathematician.
John Hicks (Warwick 1904 –Blockley 1989), economist.
David Hilbert (Königsberg 1862 –Göttingen 1943), mathematician.
Einar Hille (New York 1894 –La Jolla 1980), mathematician.
Guillaume de l’Hôpital (Paris 1661 –1704), mathematician.
Hippocrates (Chios, V century BC), mathematician.
Carl Jacobi (Potsdam 1804 –Berlin 1851), mathematician.
Johan Jensen (Nakskov 1859 –Copenhagen 1925), mathematician.
William Jevons (Liverpool 1835 –Bexhill 1882), economist.
Shizuo Kakutani (Osaka, 1911 –New Haven 2004), mathematician.
Leopold Kronecker (Liegnitz 1823 –Berlin 1891), mathematician.
Harold Kuhn (Santa Monica 1925 –New York 2014), mathematician.
Muḥammad ibn Mūsā al-Khuwārizmī (750 ca. – Baghdad 850 ca.), astronomer and mathematician.
Giuseppe Lagrange (Turin 1736 –Paris 1813), mathematician.
Gabriel Lamé (Tours 1795 –Paris 1870), mathematician.
Edmund Landau (Berlin 1877 –1938), mathematician.
Pierre-Simon de Laplace (Beaumont-en-Auge 1749 – Paris 1827), mathematician and
physicist.
Adrien-Marie Legendre (Paris 1752 –1833), mathematician.
Gottfried Leibniz (Leipzig 1646 –Hannover 1716), mathematician and philosopher.
Wassily Leontief (Saint Petersburg 1905 –New York 1999), economist.
Joseph Liouville (Saint-Omer 1809 –Paris 1882), mathematician.
Rudolph Lipschitz (Königsberg 1832 –Bonn 1903), mathematician.
John Littlewood (Rochester 1885 –Cambridge 1977), mathematician.
Colin Maclaurin (Kilmodan 1698 –Edinburgh 1746), mathematician.
Lorenzo Mascheroni (Bergamo, 1750 –Paris, 1800), mathematician.
Melissus (Samos V century BC), philosopher.
Carl Menger (Nowy Sącz 1840 –Vienna 1921), economist.
Pietro Mengoli (Bologna 1626 –1686), mathematician.
Marin Mersenne (Oizé 1588 –Paris 1648), mathematician and physicist.
Hermann Minkowski (Aleksotas 1864 –Göttingen 1909), mathematician.
Carlo Miranda (Naples 1912 –1982), mathematician.
Abraham de Moivre (Vitry-le-François 1667 –London 1754), mathematician.
John Napier (Edinburgh 1550 –1617), mathematician.
John Nash (Blue…eld 1928 –Monroe 2015), mathematician.
Isaac Newton (Woolsthorpe 1642 –London 1727), mathematician and physicist.
Vilfredo Pareto (Paris 1848 –Céligny 1923), economist and sociologist.
Parmenides (Elea VI century BC), philosopher.
Giuseppe Peano (Spinetta di Cuneo 1858 –Turin 1932), mathematician.
Plato (Athens 428 BC ca. –348 BC ca.), philosopher.
Henri Poincaré (Nancy 1854 –Paris 1912), mathematician.
Alfred Pringsheim (Oława 1850 –Zurich 1941), mathematician.
Pythagoras (Samos 570 BC ca. – Metapontum 495 BC ca.), mathematician and philosopher.
Hudalricus Regius (Ulrich Rieger) (XVI century), mathematician.
Bernhard Riemann (Breselenz 1826 –Selasca 1866), mathematician.
Michel Rolle (Ambert 1652 –Paris 1719), mathematician.
Bertrand Russell (Trellech 1872 –Penrhyndeudraeth 1970), philosopher.
Karl Schwarz (Hermsdorf, 1843 –Berlin 1921), mathematician.
Eugen Slutsky (Yaroslavl 1880 –Moscow 1948), economist and mathematician.
Guido Stampacchia (Naples, 1922 –Paris, 1978), mathematician.
Thomas Stieltjes (Zwolle 1856 –Toulouse 1894), mathematician.
James Stirling (Garden 1692 –Edinburgh 1770), mathematician.
Alfred Tarski (Warsaw 1902 –Berkeley 1983), mathematician.
Brook Taylor (Edmonton 1685 –London 1731), mathematician.
Leonida Tonelli (Gallipoli 1885 –Pisa 1946), mathematician.
Albert Tucker (Oshawa 1905 –Hightstown 1995), mathematician.
Charles-Jean de la Vallée Poussin (Leuven 1866 –1962), mathematician.
John von Neumann (Budapest 1903 –Washington 1957), mathematician.
Leon Walras (Évreux 1834 –Clarens-Montreux 1910), economist.
Karl Weierstrass (Ostenfelde 1815 –Berlin 1897), mathematician.
Ludwig Wittgenstein (Vienna 1889 –Cambridge 1951), philosopher.
Zeno (Elea V century BC), philosopher.
Index

Absolute value, 75 Riemann, 1043
Addition Stieltjes, 1080
among matrices, 388 Closure
Algorithm of set, 96
notion, 17 Codomain, 107
of Euclid, 17 Coefficient
of Gauss, 409 binomial, 1100
of Hero, 372 Fourier, 82
of Kronecker, 431 multinomial, 1099
Approximation Cofactor, 423
linear, 633, 713 Combination
polynomial, 713 affine, 465
quadratic, 714 convex, 451, 454
Arbitrage, 482, 605 Comparative statics, 525, 836, 874, 962, 963,
Archimedean property, 27 968
Argmax, 523 Complement
Arithmetic average, 384 algebraic, 423
Asset, 599 Completeness
Asymptote, 857 of the order, 22
horizontal, 857 Components
oblique, 857 of a matrix, 387
vertical, 857 of a vector, 44
Axis Compound factor, 481
horizontal/abscissae, 42 Condition
vertical/ordinates, 42 first order, 684
first-order, 689
Basis, 68, 72
second-order, 702
orthonormal, 82
Conditional, 1114
Biconditional, 1114
Cone, 489
Bits, 34
Constant
Border, 91
Euler-Mascheroni, 251
C(E), 341 Napier, 223
C^1(E), 637, 667 Constraints
C^n(E), 639, 667 equality, 889
Cardinality, 164 inequality, 916
of the continuum, 169 Contingent claim, 600
Cauchy condition, 222 Continuity, 339
Change of variable uniform, 375

Contrapositive, 1117 Density, 28
Convergence Derivative, 611
absolute (for series), 261 higher order, 638
in mean (Cesàro), 280 left, 617
negation, see Principle by induction of compounded function, 627
of improper integrals, 1052 of the inverse function, 629
of sequences, 195, 203, 216 of the product, 624
of series, 244, 256 of the quotient, 625
radius, 291 of the sum, 624
Converse, 1115 partial, 646, 649
Correspondence, 947 right, 617
ascending, 972 second, 638
budget, 947 third, 638
demand, 962 unilateral, 617
feasibility, 961 Determinant, 415
hemicontinuous, 950 Diagonal
inverse, 947 principal, 389
solution, 961 Difference, 7
Cosecant, 1105 Difference quotient, 609, 611
Cosine, 1104 Differentiability with continuity, 637
Cost Differential, 634
marginal, 611 total, 659
Cotangent, 1105 Differentiation under the integral sign, 1066
Countable, 164 Direct sum, 595
Cramer’s rule, 434 Discontinuity
Criterion essential, 345
comparison, 216 jump, 345
differential of concavity, 756, 758, 768, non-removable, 345
769 removable, 345
differential of monotonicity, 697 Distance (Euclidean), 86
differential of strict monotonicity, 698 Divergence
of comparison for series, 249 of improper integrals, 1052
of the ratio for sequences, 217 of sequences, 198
of the root for sequences, 219 of series, 244
of the root for series, 284 Domain, 107
ratio, 256, 283 natural, 153
Sylvester-Jacobi, 728 of derivability, 616, 651
Cryptography, 130 Dual space, 381
Curve, 108
indifference, 122, 159 Edgeworth box, see Pareto optimum
level, 118 Element
Cusp, 618 of a sequence, see Term of a sequence
of a vector, see Component of a vector
De Morgan’s laws, 10, 1116 Envelope
Decay concave of a function, 774
exponential, 235 convex of a function, 776
convex of a set, 483 arctan, 150
Equation, 358, 843 asymptotic to another, 333
characteristic, 297 bijective, 127
inclusion, 956 Blackwell, 504
parametric, 852 bounded, 132
polynomial, 358 bounded from above, 132
well posed, 844 bounded from below, 132
Equilibrium CES, 491
Arrow-Debreu, 570 Cobb-Douglas, 112
market, 188, 350, 360, 362, 368, 570, 836, coercive, 544
957 comparable with another, 333
Nash, 983 composite, 627
Equivalence, 1116 composite (compound), 125
Expansion concave, 139, 457
asymptotic, 740 concave at a point, 855
partial fraction, 288 concavifiable, 774
polynomial, 713 constant, 133, 135
polynomial of Maclaurin, 716 continuous, 341
polynomial of Taylor, 716 continuous at a point, 339
Expectations continuously differentiable, 637
classic, 368 convex, 139, 457
extrapolative, 191 convex at a point, 855
rational, 371 cosine, 147
Extended real line, 36, 199 CRRA, 333
cubic, 108
Factorial, 1098 cuneiform, 581
FOC, 684 decreasing, 133, 135
Forms of indetermination, 37, 212 demand, 567
Formula derivable, 611, 668, 761
binomial of Newton, 1101 derivative, 616
compound interest, 184 differentiable, 634, 635, 652
multinomial, 1102 discontinuous, 344
of Euler, 680 elementary, 143, 1048
of Hille, 745 exponential, 143
of Maclaurin, 716 gamma, 517, 739
of Taylor, 716 Gaussian, 546, 863, 1051
Frontier, 91 generating, 292
Function, 105 homothetic, 497
absolute value, 110 implicit, 810, 815, 832
additive, 482 increasing, 133, 135
affine, 460 indicator, 1019
algebraic, 1047 infimum of, 133
alpha-smooth, 881 infinite, 338
analytic, 742 infinitesimal, 338
arccosine, 149 inframodular, 770
arcsine, 148 injective, 126
instantaneous utility, 117, 187 rational, 288
integrable in an improper sense, 1052 Riemann integrable, 1006, 1010
integral, 1038 scalar, see Function of one variable
integrand, 1072 semicontinuous, 583
integrator, 1072 separable, 141
intertemporal utility, 117 signum, 355, 1034
inverse, 128, 629 sine, 146
invertible, 128 solution, 961
Lagrangian, 894 square root, 109
linear, 379, 443 step, 1018
locally decreasing, 695 strictly concave, 459
locally increasing, 695 strictly concave at a point, 855
locally strictly decreasing, 695 strictly convex, 459
locally strictly increasing, 695 strictly convex at a point, 855
log-concave, 516 strictly decreasing, 133
log-convex, 516 strictly increasing, 133, 136
logarithmic, 110, 143 strongly concave, 801
mantissa, 151 strongly convex, 806
modular, 508 strongly increasing, 136
moment generating, 1086 submodular, 508
monotonic (or monotone), 134 superlinear, 758
n-times continuously differentiable, 639, supermodular, 508
667 supremum of, 132
negligible with respect to another, 333 surjective, 126
objective, 523 tangent, 147
of a single variable, 108 translation invariant, 504
of Dirichlet, 308 transcendental, 1048
of Kronecker, 427 trigonometric, 1048
of Leontief, 137 ultramodular, 770
of n variables, 111 uniformly continuous, 375
of several variables, 108 utility, 115, 137, 158
of vector, see Function of n variables value, 961
one-to-one, see Function injective vector, 108
one-way, 130 with increasing (cross) differences, 511
partially derivable, 648 Functional
periodic, 150 linear, 379
polynomial, 143 Functional equation
positive homogeneous, 490 Cauchy, 478
primitive, 1032 for the exponential, 479
production, 116 for the logarithm, 480
proper, 848 for the power, 480
quadratic, 109
quasi-affine, 470 Goods
quasi-concave, 470 complements, 511
quasi-continuous, 589 perfect complements, 138
quasi-convex, 470 perfect substitutes, 142
substitutes, 512 upper, 1005
Gradient, 649 Integral sum
Gradient descent, 881 lower, 1004, 1011
Graph upper, 1004, 1011
of a correspondence, 948 Integration
of a function, 113 by change of variable, 1043
by parts (Riemann), 1042
Half-spaces, 551 by parts (Stieltjes), 1079
Hyperplane, 551 by trigonometric substitution, 1050
Hypograph, 462 Interior
of set, 90
Image, 107 Intersection, 5, 1113
of a sequence, 192 Interval, 23
of function, 107 bounded, 23, 49
of operator, 400 closed, 23, 49
Implication, 1117 half-closed, 23, 49
Indeterminacies, 328 half-open, see Interval half-closed
Indifference open, 23, 49
class, 156 unbounded, 23, 50
curve, 122, 159 Isocosts, 123
map, 157 Isoquants, 123
relation, 155
Induction, see Principle by induction Kernel, 400
Inequality pricing, 604
Jensen, 467
of Cauchy-Schwarz, 77 L(R^n), 396
triangle, 76, 78, 87 L(R^n,R^m), 396
Infimum, 26, 89, 92 Law of one price, 602, 799
Infinite, 338 Least Upper Bound Principle, 27
actual, 163 Limit, 308
potential, 163, 247 from above, 197
Infinitesimal, 338 from below, 197
Integrability, 1006, 1011 inferior, 267
in finite terms, 1048 left, 314
of continuous functions, 1022 of function, 303
of monotonic functions, 1024 of operators, 357
of rational functions, 1049 of scalar function, 308, 310
Integral of sequence, 194
definite, 1034 one-sided, 314
generalized, see Improper integral right, 314
improper, 1052, 1053, 1064 superior, 267
indefinite, 1033 unilateral, 314
lower, 1005 vector function, 318
of Gauss, 1051, 1063 Linear combination, 64
of Stieltjes, 1072 convex, 451
Riemann, 1006 Linear system
determined, 436 local, 553
homogeneous, 434 strong global, 524
solvability, 439 strong local, 553
solvable, 436 Maximum of a function, 151
square, 432 global, 151, 523
undetermined, 436 global maximum value, 523
unsolvable, 436 local maximizer, 553
little-o of, 228, 334 local maximum value, 553
Lower bound, 24 maximizer, 151, 523
maximum value, 151
M(m,n), 388 strong global, 524
M(n), 415 strong maximizer, 524
Marginal rate Maximum of a set
of intertemporal substitution, 824 in R, 25
of substitution, 823
in R^n, 51
of transformation, 822
Maxminimizer, 978
Matrix
Mesh of a subdivision, 1008
adjoint, 423
Method
augmented, 437
elimination, 871
cofactor, see Matrix of algebraic components
Gaussian elimination, 409
Lagrange’s, 896
complete, 437
least squares, 574
diagonal, 390
Methodology
echelon, 409
cardinal properties, 473
elementary, 410
full rank, 406 ceteris paribus, 651
Gram, 407 diversification principle, 475
Hessian, 663, 729 homo oeconomicus, 521
identity, 388 methodological individualism, 521
inverse, 414 minimum action principle, 540
invertible, 414, 428 ordinal properties, 473, 540
Jacobian, 674, 907 rationality, 521, 541
lower triangular, 390 Minimal of a set, see Pareto optimum
maximum rank, 406 Minimaximizer, 978
non-singular, 428 Minimizer
null, 388 global, 525
of algebraic complements, 423 local, 553
rectangular, 387 Minimum of a function
symmetric, 389 local minimum value, 553
singular, 428 Minor
square, 387 principal, 430
transpose, 390 principal of NW, 430
upper triangular, 390 Moments, 1084
Maximal of a set, see Pareto optimum Multiplier
Maximizer marginal interpretation, 972
global, 151, 523 Multiplier of Lagrange, 894, 908, 922
Napier’s constant, 258 Ordered pairs, 41
Negation, 1113 Orthogonal
Neighbourhood, 88 subspace, 595
left, 89 vectors, 80
of infinite, 199
right, 89 Parabola, 113
Norm, 76 Paradox
Nullity, 401 of Burali-Forti, 10
Number of Russell, 10
cardinal, 171, 174 of the liar, 1115
e, 14, 223, 258 Pareto optimum, 52
pi, 14, 1108 Part
Numbers integer, 28
algebraic, 225 negative, 795, 1008
Fibonacci, 295 positive, 795, 1008
irrational, 14 Partial sums, 244
natural, 11 Partition, 9
prime, 18 Permutation
prime of Mersenne, 185 simple, 1098
rational, 11 with repetitions, 1099
real, 14 Plurirectangle, 1000
relative integer, 11 Point
transcendental, 225 accumulation, 92
numeraire, 568 boundary, 90
corner, 618
Operations critical, 688
elementary (by row), 409 cuspidal, 618
Operator, 108, 112, 394 exterior, 90
continuous, 357 extremal, 525
contraction, 499 inflection, 856
derivative, 651 interior, 90
identity, 395 isolated, 92
invertible, 412 limit, 92, 269
linear, 394 of inflection with horizontal tangent, 857
Lipschitz, 499 of Kuhn-Tucker, 922
monotone, 760 regular, 890, 907, 918
null, 395 saddle, 688, 977
projection, 595, 804 singular, 890, 907, 918
strictly competitive, 985 stationary, 688
zero-sum, 985 Polyhedron, 562
Optimizer Polynomial, 143
global, 577 of Maclaurin, 715
Order of Taylor, 715
complete, 1093 Polytope, 454
partial, 47, 1093 Portfolio, 600
weak, 1094 Positive orthant, 42
Postulate of continuity of the real line, 14 Archimedean, 27
Power associative, 8, 46, 124
of set, 164 commutative, 8, 45, 47, 124
set, 171 distributive, 9, 46, 47
Predicate, 1124 satisfied eventually, 193
Preference Proposition, 1113
complete, 157 Pythagorean trigonometric identity, 1105
definition, 115
lexicographic, 160 Quadratic form, 725
monotonic, 158 indefinite, 727
reflexive, 156 negative definite, 727
strict, 155 negative semi-definite, 727
strictly monotonic, 158 positive definite, 727
strongly monotonic, 158 positive semi-definite, 727
transitive, 156 Quantifier
Preimage, 117 existential, 1124
Preorder, 1094 universal, 1124
Price
ask, 795 Rank, 401, 403
bid, 795 full, 406
Primitive, 1032 maximum, 406
Problem Recurrence, 180
constrained optimization, 526 linear of order k, 183
consumer, 535 of order k, 365
maximum, 526 orbit, 366
minimum, 526 phase portrait, 366
optimization, 525 random walk, 181
parametric optimization, 961 Recursion, 180
unconstrained differential optimization, 871 Relation
unconstrained optimization, 526 binary, 1091
vector maximum, 578 equivalence, 1095
with equality constraints, 889 Remainder
with inequality constraints, 916, 991 Lagrange’s, 720
Procedure Peano’s, 720
Gaussian elimination, 409 Representation
Product of linear function, 382
Cartesian, 41, 44 of linear operator, 397
inner, 46, 75 Restriction, 154
of matrices, 391 Root
Projection, 595, 804 algebraic, 29
Projections, 646 arithmetical, 29, 76
Proof Rule
by contradiction, 1118 chain, 627, 660, 676
by contraposition, 1118 of Cramer, 434
direct, 1118 of de l’Hospital, 707
Property of Leibniz, 1068
pricing, 603 Maclaurin, 742
Mengoli, 245
Scalar, 45 negatively divergent, 244
Scalar multiplication, 388 of Grandi, 281
Secant, 1105 oscillating, see Irregular series
Semicone, 495 positively divergent, 244
Separating element, 23 power, 290
Sequence, 179 Taylor, 742
arithmetic, 181 with positive terms, 249
asymptotic to another, 228 Set, 3
bounded, 192 bounded, 24, 101
bounded from above, 192 bounded from above, 24
bounded from below, 192 bounded from below, 24
Cauchy, 222 budget, 535
comparable with another, 228 choice, 523
constant, 193 closed, 96
convergent, 195 compact, 101
decreasing, 193 complement, 8
divergent, 198 consumption, 155, 535
Fibonacci, 180 convex, 451
geometric, 180 countable, 164
harmonic, 180 derived, 92
increasing, 192 directed, 476
infinitesimal, 196 empty, 5
irregular, 194 finite, 164
maximizing, 885 image, 107
monotonic, 193 lattice, 507
negligible with respect to another, 228 linearly dependent, 62
null, see Infinitesimal sequence linearly independent, 62
of differences, 270 maximum, 25, 51
of second differences, 272 minimum, 25, 51
of the partial sums of a series, 244 open, 94
of the same order of another, 228 orthogonal, 81
oscillating, see Irregular sequence orthonormal, 81
regular, 194 power of, 164
relaxing, 880 unbounded, 24
totally monotone, 1086 universal, 8
unbounded, 192 Sets
Series, 244 disjoint, 5
absolutely convergent, 261 lower contour, 463
alternating harmonic series, 263 upper contour, 463
convergent, 244 Sine, 1104
generalized harmonic, 250 Singleton, 4
geometric, 246 Solution
harmonic, 245, 1133 corner, 533, 904
irregular, 244 of an optimization problem, 525
set, 523 of Bernstein, 743
Space, 8 of Binet, 422
column, 405 of Bolzano, 348
complete, 223 of Bolzano-Weierstrass, 206
dual, 381 of Borel-Peano, 746
Euclidean, 44 of Brouwer, 361
incomplete, 223 of Caccioppoli-Hadamard, 849
R^n, 44 of Cantor, 170
row, 405 of Carnot, 1110
vector, 59 of Cauchy, 221, 478
Span of a set, 66 of Cauchy-Hadamard, 291
Subdivision, 1002 of Cesàro, 277
Submatrix, 414 of Choquet, 515
Subsequence, 204 of Darboux, 694
Subset, 3 of de l’Hospital, 707
proper, 4 of De Moivre-Stirling, 237
Superdifferential, 778 of Euclid, 16, 21
ordinal, 784 of Fermat, 684
Supremum, 26, 89, 92 of Frobenius-Littlewood, 300
of Hahn-Banach, 443, 790
Tangent of Hausdorff, 1086
(trigonometric), 1105 of Hille, 745
Tangent line, 613 of Kakutani, 957
Tangent plane, 655 of Kronecker, 431
Theorem of Kronecker-Capelli, 437
of Riemann, 265 of Kuhn-Tucker, 922
Term of a sequence, 179 of Lagrange (mean value), see Mean Value
Theorem Theorem
Berge’s Maximum, 964 of Lagrange (optimization), 894
duality of linear programming, 994 of Landau, 280
extreme value, 351, 541 of Laplace, 427
first welfare, 572 of Minkowski, 486
fundamental of arithmetic, 19 of Nash, 985
fundamental of finance, 606 of permanence of sign, 202, 325
fundamental of integral calculus (first), of Poincaré-Miranda, 359
1036 of Pringsheim, 743
fundamental of integral calculus (second), of Pythagoras, 80, 1110
1038 of Rolle, 689
fundamental of linear programming, 565 of Schwarz, 664
integral mean value, 1030 of Stampacchia, 939
intermediate value, 354 of Tartaglia-Newton, 1101
mean value, 690 of Taylor, 715
minimax, 981 of the comparison, 216, 325
of Arrow-Debreu, 364, 958 of the envelope, 968, 971
of Artin, 517 of the implicit function, 814, 826, 828,
of Bauer, 559 850
of the inverse function, 845, 850
of Tonelli, 548, 588
of Tonelli (ordinal), 590
of uniqueness of the limit, 201, 323
of Weierstrass, 351, 541
Projection, 594, 803
Riesz, 382, 597
Triangulation, 1107
Truth
table of, 1113
value, 1113

Union, 6, 1114
Unit ball, 42
Unit circle, 43
Upper bound, 23

Value
absolute, 75
maximum, 151
principal, according to Cauchy, 1055
saddle, 977
Variable
dependent, 107
independent, 107
of choice, 526
Vector, 42, 44
unit, 79
zero, 45
Vector subspace, 60
generated, 66
Vectors
addition, 45
collinear, 62
column, 387
linearly dependent, 62
linearly independent, 62
orthogonal, 80
product, 45
row, 387
scalar multiplication, 45
sum, 45
Venn diagrams, 4
Versors, 62
fundamental of R^n, 79

Walras’ Law, 538
