by Takashi Kunimoto
First Version: August 9, 2007
This Version: May 18, 2010
Summer 2010, Department of Economics, McGill University
August 16-27 (tentative): Monday-Friday, 10:00am - 1:00pm; location TBA
Instructor: Takashi Kunimoto
Email: takashi.kunimoto@mcgill.ca
Class Web: http://people.mcgill.ca/takashi.kunimoto/?View=Publications
Office: Leacock 438
COURSE DESCRIPTION: This course is designed to provide you with
mathematical tools that are extensively used in graduate economics courses. The topics
to be covered are:
Sets and Functions;
Topology in the Euclidean Space;
Linear Algebra;
Multivariate Calculus;
Static Optimization;
(Optional) Correspondences and Fixed Points; and
(Optional) First-Order Differential Equations in One Variable.
A good comprehension of the material covered in the notes is essential for successful
graduate studies in economics. Since we are seriously time constrained (which you
might not believe), it would be very useful for you to keep one of the books listed
below as a reference after you start graduate school in September.
READING:
The main textbook of this course is Further Mathematics for Economic Analysis, and I
mostly use this book for the course. However, if you don't find the main textbook helpful
enough, I strongly recommend that you buy at least one of the other books
listed below in addition to Further Mathematics for Economic Analysis. Of course, you
can buy any math book that you find useful.
I am thankful to the students for their comments, questions, and suggestions. Yet I believe that
there are still many errors in this manuscript; of course, all remaining ones are my own.
Further Mathematics for Economic Analysis, by Knut Sydsaeter, Peter Hammond, Atle Seierstad, and Atle Strom, Prentice Hall, 2005. (Main textbook. If
you don't have any math book or are not confident about your math skills, this
book will help you a lot.)
Mathematical Appendix, in Advanced Microeconomic Theory, Second Edition,
by Geoffrey A. Jehle and Philip J. Reny, Addison Wesley, 2000. (Supplementary.
This is the main textbook for Econ 610, but the mathematical appendix of this
book is often too concise.)
Mathematics for Economists, by Simon and Blume, Norton, 1994. (Supplementary. This book is a popular math book in many Ph.D. programs in economics.
There has to be a reason for that, although I don't know the true one.)
Fundamental Methods of Mathematical Economics, by A. Chiang, McGraw-Hill.
(More elementary and supplementary.)
Introductory Real Analysis, by A. N. Kolmogorov and S. V. Fomin, Dover Publications. (Very, very advanced and supplementary. If you really like math, this is the
book for you.)
OFFICE HOURS: Wednesday and Friday, 2:00pm - 3:00pm
PROBLEM SETS: There will be several problem sets. Problem sets are essential to
help you understand the course and to develop your skill in analyzing economic problems.
ASSESSMENT: No grade will be assigned. However, you are expected to do the
problem sets assigned.
Contents

1 Introduction
2 Preliminaries
2.1 Logic
2.1.1 Necessity and Sufficiency
2.1.2 Theorems and Proofs
2.2 Set Theory
2.3 Relations
2.3.1 Preference Relations
2.4 Functions
2.4.1 Least Upper Bound Principle
3 Topology in Rn
3.1 Sequences on R
3.1.1 Subsequences
3.1.2 Cauchy Sequences
3.1.3 Upper and Lower Limits
3.1.4 Infimum and Supremum of Functions
3.1.5 Indexed Sets
3.2 Point Set Topology in Rn
3.3 Topology and Convergence
3.4 Properties of Sequences in Rn
3.5 Continuous Functions
4 Linear Algebra
4.1 Basic Concepts in Linear Algebra
4.2 Determinants and Matrix Inverses
4.2.1 Determinants
4.2.2 Matrix Inverses
4.2.3 Cramer's Rule
4.3 Vectors
4.4 Linear Independence
4.4.1 Linear Dependence and Systems of Linear Equations
4.5 Eigenvalues
4.5.1 Motivations
4.5.2 How to Find Eigenvalues
4.6 Diagonalization
4.7 Quadratic Forms
4.8 Appendix 1: Farkas' Lemma
4.8.1 Preliminaries
4.8.2 Fundamental Theorem of Linear Algebra
4.8.3 Linear Inequalities
4.8.4 Non-Negative Solutions
4.8.5 The General Case
4.9 Appendix 2: Linear Spaces
4.9.1 Number Fields
4.9.2 Definitions
4.9.3 Bases, Components, Dimension
4.9.4 Subspaces
4.9.5 Morphisms of Linear Spaces
5 Calculus
5.1 Functions of a Single Variable
5.2 Real-Valued Functions of Several Variables
5.3 Gradients
5.4 The Directional Derivative
5.5 Convex Sets
5.5.1 Upper Contour Sets
5.6 Concave and Convex Functions
5.7 Concavity/Convexity for C2 Functions
5.7.1 Jensen's Inequality
5.8 Quasiconcave and Quasiconvex Functions
5.9 Total Differentiation
5.9.1 Linear Approximations and Differentiability
5.10 The Inverse of a Transformation
5.11 Implicit Function Theorems
6 Static Optimization
6.1 Unconstrained Optimization
6.1.1 Extreme Points
6.1.2 Envelope Theorems for Unconstrained Maxima
6.1.3 Local Extreme Points
6.1.4 Necessary Conditions for Local Extreme Points
6.2 Constrained Optimization
6.2.1 Equality Constraints: The Lagrange Problem
6.2.2 Lagrange Multipliers as Shadow Prices
6.2.3 Tangent Hyperplane
6.2.4 Local First-Order Necessary Conditions
6.2.5 Sufficient Conditions for Local Extreme Points
7 Differential Equations
7.1 Introduction
8 Fixed Point Theorems
8.1 Banach Fixed Point Theorem
8.2 Brouwer Fixed Point Theorem
9 Topics on Convex Sets
9.1 Separation Theorems
9.2 Polyhedrons and Polytopes
9.3 Dimension of a Set
9.4 Properties of Convex Sets
Chapter 1
Introduction
I start my lecture with Rakesh Vohra's message about what economic theory is. He is a
professor at Northwestern University.
All of economic theorizing reduces, in the end, to the solution of one of
three problems.
Given a function f and a set S:
1. Find an x such that f(x) is in S. This is the feasibility question.
2. Find an x in S that optimizes f(x). This is the problem of optimality.
3. Find an x in S such that f(x) = x. This is the fixed point problem.
These three problems are, in general, quite difficult. However, if one is
prepared to make assumptions about the nature of the underlying function
(say, it is linear, convex, or continuous) and the nature of the set S (convex,
compact, etc.), it is possible to provide answers, and very nice ones at that.
I think this is the biggest picture of economic theory you could have as you go through
this course. Whenever you are at a loss, please come back to this message.
We build our theory on individuals. Assume that all commodities are traded in
centralized markets. Throughout Econ 610 and 620, we assume that each individual
(consumer and firm) takes prices as given. We call this the price-taking behavior assumption. You might ask why individuals are price takers. My answer would be: why not?
Let us go as far as we can with this behavioral assumption and thereafter try to see the
limitations of the assumption. However, you will have to wait until Econ 611 and 621 to see how
to relax this assumption. So, stick with this assumption. For each consumer, we want
to know:
1. What is the set of physically feasible bundles? Is there any such bundle at all
(feasibility)? We call this set the consumption set.
2. What is the set of financially feasible bundles? Is there any such bundle at all
(feasibility)? We call this set the budget set.
3. What is the best bundle for the consumer among all feasible bundles (optimality)?
We call this bundle the consumer's demand.
We can make the exact parallel argument for the firm. What is the set of technically
feasible inputs (feasibility)? We call this the production set of the firm. What is the
best combination of inputs to maximize its profit (optimality)? We call this the firm's
supply. Once we figure out the feasible and best choices for each consumer and each
firm under any possible circumstance, we want to know if there is any coherent state of
affairs where everybody makes her best choice. In particular, all markets must clear. We
call this coherent state a competitive (Walrasian) equilibrium (a fixed point).
If we move from microeconomics to macroeconomics, we must pay special attention
to time. Now, each individual's budget set does depend upon time. At each point in
time, he can change his asset portfolio so that he smooths out his consumption plan
and/or production plan over time. If you know exactly when you will die, there is no problem,
because you can simply leave no money when you die, unless you want to leave some money
to your kids (i.e., altruistic preferences). This is called the finite time horizon problem.
What if you might live longer than you expected, with no money left? Then what do
you do? So, in reality, you don't know exactly when you will die. This situation can be
formulated as the infinite time horizon problem. Do you see why? To deal with the
infinite horizon problem, we use the transversality condition as the terminal condition of
the feasible set. Moreover, his optimization must take time into account. You can
also analogously define a sequence of competitive equilibria of the economy.
How can we summarize what we discussed above? Given a (per capita) consumption stream {c_t}_{t=0}^∞, a (per capita) capital accumulation path {k_t}_{t=0}^∞, a (per capita) GDP stream
{f(k_t)}_{t=0}^∞, the capital depreciation rate δ, the population growth rate n, the (per capita) consumption growth rate g, the instantaneous utility function u(·) of the representative consumer,
the effective discount rate ρ > 0 of the representative consumer, a (per capita) wage profile
{w_t}_{t=0}^∞, and a capital interest rate profile {r_t}_{t=0}^∞:

1. Find a {c_t}_{t=0}^∞ such that k̇_t = f(k_t) − c_t − (δ + g + n)k_t holds at each t ≥ 1, where
k_0 > 0 is exogenously given. This is the feasibility question. Any such {c_t} is called
a feasible consumption stream.

2. Find a feasible consumption stream {c_t}_{t=0}^∞ that maximizes V_0 = ∫_0^∞ e^{−ρt} u(c_t) dt.
This is the problem of optimality. I assume that V_0 < ∞.

3. Find a {r_t, w_t}_{t=0}^∞ such that V_0 (the planner's optimum) is sustained through
market economies where k̇_t = (r_t − n − g)k_t + w_t − c_t holds at each t ≥ 1 and
the condition lim_{t→∞} λ_t e^{−ρt} k_t = 0 holds. This latter condition is sometimes
called the transversality condition. This is the fixed point problem. This, in fact,
can be done by choosing r_t = f′(k_t) − δ and w_t = f(k_t) − f′(k_t)k_t at each t ≥ 1.
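One standard way to see where the last two prices come from, under the usual assumption (not stated explicitly above) that a competitive firm operates the constant-returns per capita technology f: the firm rents capital at the gross rental rate r_t + δ and maximizes f(k) − (r_t + δ)k, so the first-order condition gives

f′(k_t) = r_t + δ, i.e., r_t = f′(k_t) − δ,

and under constant returns the zero-profit condition leaves labor with the residual

w_t = f(k_t) − f′(k_t) k_t.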
With appropriate re-interpretations, the above is exactly what we had at the beginning, except for the transversality condition, which is a genuine feature of macroeconomics.
Chapter 2
Preliminaries
2.1 Logic
Theorems provide a compact and precise format for presenting the assumptions and
important conclusions of sometimes lengthy arguments, and so help identify immediately
the scope and limitations of the result presented. Theorems must be proved and a proof
consists of establishing the validity of the statement in the theorem in a way that is
consistent with the rules of logic.
2.1.1 Necessity and Sufficiency
Consider any two statements, p and q. When we say "p is necessary for q", we mean
that p must be true for q to be true. For q to be true requires p to be true, so whenever
q is true, we know that p must also be true. So we might have said, instead, that "p is
true if q is true", or simply that p is implied by q (q ⇒ p).
Suppose we know that q ⇒ p is a true statement. What if p is not true? Because
p is necessary for q, when p is not true, then q cannot be true, either. But doesn't this
just say that "q not true" is necessary for "p not true"? Or that not-q is implied by
not-p (¬p ⇒ ¬q)? This latter form of the original statement is called the contrapositive
form.
Let's consider a simple illustration of these ideas. Let p be the statement "x is an
integer less than 10". Let q be the statement "x is an integer less than 8". Clearly,
p is necessary for q (q ⇒ p). If we form the contrapositive of these two statements, the
statement ¬p becomes "x is not an integer less than 10", and ¬q becomes "x is not an integer less
than 8". Then, observe that ¬p ⇒ ¬q. However, p ⇒ q is false: the value of x could
well be 9.
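A small truth table makes the equivalence between an implication and its contrapositive transparent (T = true, F = false):

p q | q ⇒ p | ¬p ⇒ ¬q
T T |   T   |    T
T F |   T   |    T
F T |   F   |    F
F F |   T   |    T

The last two columns agree row by row, which is exactly why a statement and its contrapositive can be proved interchangeably.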
The notion of necessity is distinct from that of sufficiency. When we say "p is
sufficient for q", we mean that whenever p holds, q must hold. We can say "p is true
only if q is true", or that p implies q (p ⇒ q). Once again, whenever the statement
p ⇒ q is true, its contrapositive ¬q ⇒ ¬p is true as well.
2.1.2 Theorems and Proofs
It is important to keep in mind the old saying that goes, "Proof by example is no
proof." Suppose the following two statements are given:
p: "x is a student";
q: "x has red hair."
Assume further that we make the assertion p ⇒ q. Then clearly finding one student
with red hair and pointing him out to you is not going to convince you of anything.
Examples are good for illustrating but typically not for proving.
Finally, a sort of converse to the old saying about examples and proofs should be
noted. Whereas citing a hundred examples can never prove that a certain property
always holds, citing one solitary counterexample can disprove that the property always
holds. For instance, to disprove the assertion about the color of students' hair, you need
simply point out one student with brown hair. A counterexample proves that the claim
cannot always be true because you have found at least one case where it is not.
2.2 Set Theory
A set is any collection of elements. Sets of objects will usually be denoted by capital
letters, A, S, T for example, while their members by lower case letters, a, s, t for example (English
or Greek). A set S is a subset of another set T if every element of S is also an element
of T. We write S ⊆ T. If S ⊆ T, then x ∈ S ⇒ x ∈ T. The set S is a proper subset of T
if S ⊆ T and S ≠ T; sometimes one writes S ⊂ T in this case. Two sets are equal sets if
they each contain exactly the same elements. We write S = T whenever x ∈ S ⇒ x ∈ T
and x ∈ T ⇒ x ∈ S. The number of elements in a set S, its cardinality, is denoted
|S|. The upside-down A, ∀, means "for all", while the backward E, ∃, means "there
exists".
A set S is empty, or is an empty set, if it contains no elements at all. It is a subset
of every set. For example, if A = {x | x² = 0, x > 1}, then A is empty. We denote
the empty set by the symbol ∅. The complement of a set S in a universal set U is the
set of all elements in U that are not in S and is denoted S^c. For any two sets S and T
in a universal set U, we define the set difference, denoted S\T, as all elements in the set
S that are not elements of T. Thus, we can think of S^c as U\S. The symmetric difference
SΔT = (S\T) ∪ (T\S) is the set of all elements that belong to exactly one of the sets S
and T. Note that if S = T, then SΔT = ∅.
For two sets S and T, we define the union of S and T as the set S ∪ T ≡ {x | x ∈ S or x ∈ T}.
We define the intersection of S and T as the set S ∩ T ≡ {x | x ∈ S and x ∈ T}. Let
Λ ≡ {1, 2, 3, ...} be an index set. Instead of writing {S1, S2, S3, ...}, we can write
{S_λ}_{λ∈Λ}. We denote the union of all sets in the collection by ∪_{λ∈Λ} S_λ, and the
intersection of all sets in the collection by ∩_{λ∈Λ} S_λ.
The following are some important identities involving the operations defined above:
A ∪ B = B ∪ A, (A ∪ B) ∪ C = A ∪ (B ∪ C), A ∪ ∅ = A
A ∩ B = B ∩ A, (A ∩ B) ∩ C = A ∩ (B ∩ C), A ∩ ∅ = ∅
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (distributive laws)
A\(B ∪ C) = (A\B) ∩ (A\C), A\(B ∩ C) = (A\B) ∪ (A\C) (De Morgan's laws)
AΔB = BΔA, (AΔB)ΔC = AΔ(BΔC), AΔ∅ = A
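As a quick check of De Morgan's laws, take A = {1, 2, 3}, B = {2, 3}, and C = {3, 4}. Then
A\(B ∪ C) = {1, 2, 3}\{2, 3, 4} = {1} and (A\B) ∩ (A\C) = {1} ∩ {1, 2} = {1}, while
A\(B ∩ C) = {1, 2, 3}\{3} = {1, 2} and (A\B) ∪ (A\C) = {1} ∪ {1, 2} = {1, 2}.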
The collection of all subsets of a set A is also a set, called the power set of A and
denoted by P(A). Thus, B ∈ P(A) ⇔ B ⊆ A.
Example 2.1 Let A = {a, b, c}. Then P(A) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.
The previous example reveals that the order of the elements in a set
specification does not matter. In particular, {a, b} = {b, a}. However, on many occasions,
one is interested in distinguishing between the first and the second elements of a pair.
One such example is the coordinates of a point in the x-y plane. These coordinates
are given as an ordered pair (a, b) of real numbers. The important property of ordered
pairs is that (a, b) = (c, d) if and only if a = c and b = d. The product of two sets S and
T is the set of ordered pairs of the form (s, t), where the first element in the pair is a
member of S and the second is a member of T. The product of S and T is denoted
S × T ≡ {(s, t) | s ∈ S, t ∈ T}.
The set of real numbers is denoted by the special symbol R and is defined as
R ≡ {x | −∞ < x < ∞}.
Any n-tuple, or vector, is just an n-dimensional ordered tuple (x1, ..., xn) and can
be thought of as a point in n-dimensional Euclidean space. This space is defined as
the set product
Rn ≡ R × ··· × R (n times) ≡ {(x1, ..., xn) | xi ∈ R, i = 1, ..., n}.
Often, we want to restrict our attention to a subset of Rn, called the nonnegative
orthant and denoted Rn+, where
Rn+ ≡ {(x1, ..., xn) | xi ≥ 0, i = 1, ..., n} ⊆ Rn.
Furthermore, we sometimes talk about the strictly positive orthant of Rn,
Rn++ ≡ {(x1, ..., xn) | xi > 0, i = 1, ..., n} ⊆ Rn+.
2.3 Relations
2.3.1 Preference Relations
I now talk a little bit about economics. Here I apply the concept of relations to the
consumer choice problem. The number of commodities is finite and equal to n. Each
commodity is measured in some infinitely divisible units. Let x = (x1, ..., xn) ∈ Rn+
be a consumption bundle. Let the consumption set, the set of bundles the
consumer can conceive of, be Rn+. We represent the consumer's preferences by a binary relation
≿ defined on the consumption set, where x ≿ y reads "x is at least as good as y".
2.4 Functions
A function is a relation that associates each element of one set with a single, unique
element of another set. We say that the function f is a mapping, map, or transformation
from one set D to another set R and write f : D → R. We call the set D the domain
and the set R the range of the mapping. If y is the point in the range mapped into by
the point x in the domain, we write y = f(x). In set-theoretic terms, f is a relation from
D to R with the property that for each x ∈ D, there is exactly one y ∈ R such that x f y
(x is related to y via f).
The image of f is the set of points in the range into which some point in the domain
is mapped, i.e.,
I ≡ {y | y = f(x) for some x ∈ D} ⊆ R.
The inverse image of a set of points S ⊆ I is defined as
f⁻¹(S) ≡ {x | x ∈ D, f(x) ∈ S}.
The graph of the function f is the set of ordered pairs
G ≡ {(x, y) | x ∈ D, y = f(x)}.
If f(x) = y, one also writes x ↦ y. The squaring function s : R → R, for example,
can then be written as s : x ↦ x². Thus, ↦ indicates the effect of the function on an
element of the domain. If f : A → B is a function and S ⊆ A, the restriction of f to S
is the function f|_S defined by f|_S(x) = f(x) for every x ∈ S. There is nothing in the
definition of a function that prohibits more than one element in the domain from being
mapped into the same element in the range. If, however, every point in the range is
assigned to at most a single point in the domain, the function is said to be one-to-one;
that is, for all x, x′ ∈ D, whenever f(x) = f(x′), then x = x′. If the image is equal to
the range (if for every y ∈ R, there is x ∈ D such that f(x) = y), the function is said
to be onto. If a function is one-to-one and onto (sometimes called bijective), then an
inverse function f⁻¹ : R → D exists that is also one-to-one and onto. The composition
of a function f : A → B and a function g : B → C is the function g ∘ f : A → C given
by (g ∘ f)(a) = g(f(a)) for all a ∈ A.
Exercise 2.2 Show that f(x) = x² is not a one-to-one mapping.
2.4.1 Least Upper Bound Principle
A set S of real numbers is bounded above if there exists a real number b such that b ≥ x
for all x ∈ S. This number b is called an upper bound for S. A set that is bounded above
has many upper bounds. A least upper bound for the set S is a number b* that is an
upper bound for S and is such that b* ≤ b for every upper bound b. The existence of a
least upper bound is a basic and non-trivial property of the real number system.
Fact 2.1 (Least Upper Bound Principle) Any nonempty set of real numbers that is
bounded above has a least upper bound.
This principle is really an axiom of the real numbers. A set S can have at most one least
upper bound, because if b1 and b2 are both least upper bounds for S, then b1 ≤ b2 and
b2 ≤ b1, which implies that b1 = b2. The least upper bound b* of S is often called
the supremum of S. We write b* = sup S or b* = sup_{x∈S} x.
Example 2.2 The set S = (0, 5), consisting of all x such that 0 < x < 5, has many
upper bounds, some of which are 100, 6.73, and 5. Clearly no number smaller than 5
can be an upper bound, so 5 is the least upper bound. Thus, sup S = 5.
A set S is bounded below if there exists a real number a such that x ≥ a for all x ∈ S.
The number a is a lower bound for S. A set S that is bounded below has a greatest
lower bound a*, with the property a* ≤ x for all x ∈ S, and a* ≥ a for all lower bounds
a. The number a* is called the infimum of S and we write a* = inf S or a* = inf_{x∈S} x.
Thus, we summarize:
sup S = the least number greater than or equal to all numbers in S; and
inf S = the greatest number less than or equal to all numbers in S.
Theorem 2.1 Let S be a set of real numbers and b a real number. Then sup S = b if
and only if the following two conditions are satisfied:
1. x ≤ b for all x ∈ S.
2. For each ε > 0, there exists an x ∈ S such that x > b − ε.
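To see the two conditions at work, take S = (0, 5) from Example 2.2 and b = 5: every x ∈ S satisfies x ≤ 5, and for each ε > 0 the point x = max{5 − ε/2, 4} lies in S and satisfies x > 5 − ε. Hence sup S = 5, even though 5 ∉ S.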
Chapter 3
Topology in Rn
3.1 Sequences on R
A sequence is a function k ↦ x(k) whose domain is the set {1, 2, 3, ...} of all positive integers. I denote the set of natural numbers by N = {1, 2, ...}. The terms
x(1), x(2), ..., x(k), ... of the sequence are usually written using subscripts: x1, x2, ..., xk, ....
We shall use the notation {xk}_{k=1}^∞, or simply {xk}, to indicate an arbitrary sequence of
real numbers. A sequence {xk} of real numbers is said to be
1. nondecreasing if xk ≤ xk+1 for k = 1, 2, ...;
2. strictly increasing if xk < xk+1 for k = 1, 2, ...;
3. nonincreasing if xk ≥ xk+1 for k = 1, 2, ...;
4. strictly decreasing if xk > xk+1 for k = 1, 2, ....
A sequence that is nondecreasing or nonincreasing is called monotone. A sequence
{xk} is said to converge to a number x if xk becomes arbitrarily close to x for all
sufficiently large k. We write lim_{k→∞} xk = x, or xk → x as k → ∞. The precise definition
of convergence is as follows:
Definition 3.1 The sequence {xk} converges to x if for every ε > 0, there exists a
natural number N such that |xk − x| < ε for all k > N. The number x is called
the limit of the sequence {xk}. A convergent sequence is one that converges to some
number.
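For instance, the sequence xk = 1/k converges to 0: given any ε > 0, choose a natural number N > 1/ε; then for every k > N we have |xk − 0| = 1/k < 1/N < ε.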
Note that the limit of a convergent sequence is unique. A sequence that does not
converge to any real number is said to diverge. In some cases we use the notation
lim_{k→∞} xk even if the sequence {xk} is divergent. For example, we say that xk → ∞ as
k → ∞. A sequence {xk} is bounded if there exists a number M such that |xk| ≤ M for
all k = 1, 2, .... It is easy to see that every convergent sequence is bounded: if xk → x,
by the definition of convergence, only finitely many terms of the sequence can lie outside
the interval I = (x − 1, x + 1). The set I is bounded, and the finite set of points from the
sequence that are not in I is bounded, so {xk} must be bounded. On the other hand, is
every bounded sequence convergent? No. For example, the sequence {yk} = {(−1)^k} is
bounded but not convergent.
Theorem 3.1 Every bounded monotone sequence is convergent.
Proof of Theorem 3.1: Suppose, without loss of generality, that {xk} is nondecreasing and bounded. Let b* be the least upper bound of the set X = {xk | k = 1, 2, ...},
and let ε > 0 be an arbitrary number. Theorem 2.1 already showed that there must be a
term xN of the sequence for which xN > b* − ε. Because the sequence is nondecreasing,
b* − ε < xN ≤ xk for all k > N. But the xk are all less than or equal to b* because b* is
an upper bound of X, so we have b* − ε < xk ≤ b*. Thus, for any ε > 0, there exists a number
N such that |xk − b*| < ε for all k > N. Hence, {xk} converges to b*.
Theorem 3.2 Suppose that the sequences {xk} and {yk} converge to x and y, respectively. Then:
1. lim_{k→∞} (xk ± yk) = x ± y;
2. lim_{k→∞} (xk yk) = xy;
3. lim_{k→∞} (xk/yk) = x/y, assuming that yk ≠ 0 for all k and y ≠ 0.
Exercise 3.1 Prove Theorem 3.2.
3.1.1 Subsequences

...by the definition of yn, we can choose a term x_{kn} from the original sequence {xk} (with kn ≥ n)
satisfying |yn − x_{kn}| < 1/n. Then
|x − x_{kn}| = |x − yn + yn − x_{kn}| ≤ |x − yn| + |yn − x_{kn}| < |x − yn| + 1/n.
This shows that x_{kn} → x as n → ∞.
3.1.2 Cauchy Sequences
I have defined the concept of convergence of sequences. A natural question then arises as
to how we can check whether a given sequence is convergent. The concept of a Cauchy sequence,
indeed, enables us to do so.
Definition 3.2 A sequence {xk} of real numbers is called a Cauchy sequence if for
every ε > 0, there exists a natural number N such that |xm − xn| < ε for all m, n > N.
The theorem below is a characterization of convergent sequences.
Theorem 3.5 A sequence is convergent if and only if it is a Cauchy sequence.
Proof of Theorem 3.5: (⇒) Suppose that {xk} converges to x. Given ε > 0,
we can choose a natural number N such that |xn − x| < ε/2 for all n > N. Then, for
m, n > N,
|xm − xn| = |xm − x + x − xn| ≤ |xm − x| + |x − xn| < ε/2 + ε/2 = ε.
Therefore, {xk} is a Cauchy sequence. (⇐) Suppose that {xk} is a Cauchy sequence.
First, we shall show that the sequence is bounded. By the Cauchy property, there is a
number M such that |xk − xM| < 1 for k > M. Moreover, the finite set {x1, x2, ..., x_{M−1}}
is clearly bounded. Hence, {xk} is bounded. Theorem 3.4 showed that the bounded
sequence {xk} has a convergent subsequence {x_{kj}}. Let x = lim_{j→∞} x_{kj}. Because {xk} is a
Cauchy sequence, for every ε > 0, there is a natural number N such that |xm − xn| < ε/2
for all m, n > N. If we take J sufficiently large, we have |x_{kj} − x| < ε/2 for all j > J.
Then for k > N and j > max{N, J},
|xk − x| = |xk − x_{kj} + x_{kj} − x| ≤ |xk − x_{kj}| + |x_{kj} − x| < ε/2 + ε/2 = ε.
Hence xk → x as k → ∞.
Exercise 3.2 Consider the sequence {xk} with the generic term
xk = 1/1² + 1/2² + ··· + 1/k² = Σ_{i=1}^k 1/i².
Show that {xk} is a Cauchy sequence (and hence convergent). Hint: use the telescoping identity
(1/n − 1/(n+1)) + (1/(n+1) − 1/(n+2)) + ··· + (1/(n+k−1) − 1/(n+k)) = 1/n − 1/(n+k).
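A sketch of how the hint applies: since 1/i² ≤ 1/(i(i−1)) = 1/(i−1) − 1/i for i ≥ 2, for m > n we get

|xm − xn| = Σ_{i=n+1}^m 1/i² ≤ Σ_{i=n+1}^m (1/(i−1) − 1/i) = 1/n − 1/m < 1/n,

which is smaller than any given ε > 0 once n > 1/ε.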
Exercise 3.3 Prove that a sequence can have at most one limit. Use proof by contradiction. Namely, first suppose, by way of contradiction, that there are two limit
points.
3.1.3 Upper and Lower Limits
Let {xk} be a sequence that is bounded above, and define yn = sup{xk | k ≥ n} for
n = 1, 2, .... Each yn is a finite number and {yn} is a nonincreasing sequence. Then
lim_{n→∞} yn either exists or is −∞. We call this limit the upper limit (or lim sup) of the
sequence {xk}, and we introduce the following notation:
lim sup_{k→∞} xk = lim_{n→∞} (sup{xk | k ≥ n}).
The lower limit (lim inf) is defined analogously, with inf in place of sup.
On the other hand, if lim sup_{k→∞} xk = lim inf_{k→∞} xk, then {xk} is convergent.
I omit the proof of Theorem 3.6.
Exercise 3.4 Determine the lim sup and lim inf of the following sequences.
1. {xk} = {(−1)^k}
2. {xk} = {(−1)^k (2 + 1/k) + 1}
3.1.4 Infimum and Supremum of Functions
Suppose that f(x) is defined for all x ∈ B, where B ⊆ Rn. We define the infimum and
supremum of the function f over B by
inf_{x∈B} f(x) = inf{f(x) | x ∈ B}, sup_{x∈B} f(x) = sup{f(x) | x ∈ B}.
3.1.5 Indexed Sets
Suppose that, for each λ ∈ Λ, we specify an object a_λ. Then these objects form an
indexed set {a_λ}_{λ∈Λ} with Λ as its index set. In formal terms, an indexed set is a function
whose domain is the index set. For example, a sequence is an indexed set {ak}_{k∈N} with
the set N of natural numbers as its index set. Instead of {ak}_{k∈N} one often writes
{ak}_{k=1}^∞.
A set whose elements are sets is often called a family of sets, and so an indexed set
of sets is also called an indexed family of sets. Consider a nonempty indexed family
{A_λ}_{λ∈Λ} of sets. The union and the intersection of this family are the sets
∪_{λ∈Λ} A_λ = the set consisting of all x that belong to A_λ for at least one λ ∈ Λ;
∩_{λ∈Λ} A_λ = the set consisting of all x that belong to A_λ for all λ ∈ Λ.
The distributive laws can be generalized to
(∩_{λ∈Λ} A_λ) ∪ B = ∩_{λ∈Λ} (A_λ ∪ B), (∪_{λ∈Λ} A_λ) ∩ B = ∪_{λ∈Λ} (A_λ ∩ B).
The union and the intersection of a sequence {An}_{n∈N} = {An}_{n=1}^∞ of sets are often
written as ∪_{n=1}^∞ An and ∩_{n=1}^∞ An.
3.2 Point Set Topology in Rn

Consider the n-dimensional Euclidean space Rn, whose elements, or points, are n-vectors x = (x1, ..., xn). The Euclidean distance d(x, y) between any two points x = (x1, ..., xn) and y = (y1, ..., yn) is
d(x, y) = √((x1 − y1)² + ··· + (xn − yn)²).

1. The entire space Rn and the empty set ∅ are both open.
Exercise 3.5 There are two questions. First, draw the graph of S = {(x, y) ∈ R² | 2x − y < 2 and x − 3y < 5}. Second, prove that S is open in R².
Definition 3.4 A set S is closed if its complement Rn\S is open.
A point x0 ∈ Rn is said to be a boundary point of the set S ⊆ Rn if B_ε(x0) ∩ S^c ≠ ∅
and B_ε(x0) ∩ S ≠ ∅ for every ε > 0. Here, S^c = Rn\S. In general, a set may include
none, some, or all of its boundary points. An open set, for instance, contains none of its
boundary points.
Each point in a set is either an interior point or a boundary point of the set. The set
of all boundary points of a set S is said to be the boundary of S and is denoted ∂S or
bd(S). Note that, given any set S ⊆ Rn, there is a corresponding partition of Rn into
three mutually disjoint sets (some of which may be empty), namely:
1. the interior of S, which consists of all points x ∈ Rn such that N ⊆ S for some
neighborhood N of x;
2. the exterior of S, which consists of all points x ∈ Rn for which there exists some
neighborhood N of x such that N ⊆ Rn\S;
3. the boundary of S, which consists of all points x ∈ Rn with the property that every
neighborhood N of x intersects both S and its complement Rn\S.
A set S ⊆ Rn is said to be closed if it contains all its boundary points. The union of
S and its boundary (S ∪ ∂S) is called the closure of S, denoted by S̄. A point x belongs
to S̄ if and only if B_ε(x) ∩ S ≠ ∅ for every ε > 0.
1. The whole space Rn and the empty set ∅ are both closed.
3.3 Topology and Convergence

I want to generalize the argument in Section 3.1 to Rn. The basic idea is to
apply the previous argument coordinate-wise. A sequence {xk}_{k=1}^∞ in Rn is a function
that for each natural number k yields a corresponding point xk in Rn.
...for each j = 1, ..., n, there exists a number Nj such that |xk^(j) − x^(j)| < ε/√n for all k > Nj. It follows that
d(xk, x) = √(|xk^(1) − x^(1)|² + ··· + |xk^(n) − x^(n)|²) < √(ε²/n + ··· + ε²/n) = ε
for all k > max{N1, ..., Nn}. This is well defined because n is finite. Therefore, xk → x as k → ∞.
Definition 3.6 A sequence {xk} in Rn is said to be a Cauchy sequence if for every
ε > 0, there exists a number N such that d(xm, xn) < ε for all m, n > N.
Theorem 3.10 A sequence {xk} in Rn is convergent if and only if it is a Cauchy sequence.
Exercise 3.7 Prove Theorem 3.10. Apply the same argument as in Theorem 3.5 to each
coordinate.
3.4 Properties of Sequences in Rn
Theorem 3.11
1. For any set S ⊆ Rn, a point x ∈ Rn belongs to S̄ if and only if
there exists a sequence {xk} in S such that xk → x as k → ∞.
2. A set S ⊆ Rn is closed if and only if every convergent sequence of points in S has
its limit in S.
Proof of Theorem 3.11: (⇒ of Property 1) Let x ∈ S̄. Regardless of whether
x ∈ int S or x ∈ ∂S, for each k ∈ N we can construct xk such that xk ∈ B_{1/k}(x) ∩ S (in
particular, if x ∈ S, take xk = x for each k). Then xk → x as k → ∞. (⇐ of Property 1) Suppose
that {xk} is a convergent sequence for which xk ∈ S for each k and x = lim_{k→∞} xk. We
claim that x ∈ S̄. For any ε > 0, there is a number N ∈ N such that xk ∈ B_ε(x) for
all k > N. Since xk ∈ S for each k, it follows that B_ε(x) ∩ S ≠ ∅. Suppose, on the
other hand, that x ∉ S̄. Since S̄ is closed, there is some ε > 0 such that B_ε(x) ∩ S = ∅. This
contradicts the fact that B_ε(x) ∩ S ≠ ∅ for every ε > 0. Hence x ∈ S̄.
Exercise 3.8 Let the number of commodities in the competitive market be n. Let pi > 0
be the price of commodity i for each i = 1, ..., n. Let y > 0 be the consumer's income.
Define the consumer's budget set B(p, y) as
B(p, y) ≡ {x = (x1, ..., xn) ∈ Rn+ | Σ_{i=1}^n pi xi ≤ y}.
Show that B(p, y) is a compact set.
3.5 Continuous Functions
...for each j = 1, ..., m, there exists δj > 0 such that |fj(x) − fj(x0)| < ε/√m for every point x ∈ S
with d(x, x0) < δj. Let δ = min{δ1, ..., δm}. Then x ∈ B_δ(x0) ∩ S implies that
d(f(x), f(x0)) = √(|f1(x) − f1(x0)|² + ··· + |fm(x) − fm(x0)|²) < √(ε²/m + ··· + ε²/m) = ε.
This proves that f is continuous at x0.
Here, I want to characterize the continuity of functions in terms of sequences.
Theorem 3.15 A function f from S ⊆ Rn into Rm is continuous at a point x0 in S if
and only if f(xk) → f(x0) for every sequence {xk} of points in S that converges to x0.
Proof of Theorem 3.15: (⇒) Suppose that f is continuous at x0 and let {xk} be
a sequence for which xk ∈ S and lim_{k→∞} xk = x0. Let ε > 0 be given. Because of the
continuity of f, there exists δ > 0 such that d(f(x), f(x0)) < ε whenever x ∈ B_δ(x0) ∩ S.
Since xk → x0, there exists a number N ∈ N such that d(xk, x0) < δ for all k > N.
But then xk ∈ B_δ(x0) ∩ S and so d(f(xk), f(x0)) < ε for all k > N. This implies that
f(xk) → f(x0). (⇐) Suppose, by way of contradiction, that f(xk) → f(x0) for every
sequence {xk} in S converging to x0, but that f is not continuous at x0. Then there exists
some ε > 0 such that for every δ > 0 there is a point x ∈ B_δ(x0) ∩ S with d(f(x), f(x0)) ≥ ε.
Taking δ = 1/k for k = 1, 2, ..., we obtain a sequence {xk} in S with d(xk, x0) < 1/k, so
xk → x0, yet d(f(xk), f(x0)) ≥ ε for all k. This contradicts f(xk) → f(x0). Hence f
is continuous at x0.
The theorem below shows that continuous mappings preserve the compactness of the
set.
Theorem 3.16 Let S ⊆ Rn and let f : S → Rm be continuous. Then f(K) = {f(x) | x ∈ K} is compact for every compact subset K of S.
Proof of Theorem 3.16: Let {yk} be any sequence in f(K). By definition, for each
k, there is a point xk ∈ K such that yk = f(xk). Because K is compact, by the Bolzano-Weierstrass theorem, the sequence {xk} has a subsequence {x_{kj}} with the property that
x_{kj} ∈ K for each j and lim_{j→∞} x_{kj} = x0 ∈ K. Because f is continuous, by the previous
theorem (Theorem 3.15), f(x_{kj}) → f(x0) as j → ∞, where f(x0) ∈ f(K) because
x0 ∈ K. But then {y_{kj}} is a subsequence of {yk} that converges to the point f(x0) ∈ f(K).
So, we have proved that any sequence in f(K) has a subsequence converging to a point
of f(K); that is, f(K) is compact.
Chapter 4
Linear Algebra
4.1 Basic Concepts in Linear Algebra
A = (aij)m×n =
[ a11 a12 ··· a1n ]
[ a21 a22 ··· a2n ]
[  ···            ]
[ am1 am2 ··· amn ]

Here aij denotes the element in the ith row and the jth column. We can express f(x) = Ax componentwise as

[ f^(1)(x) ]   [ a11 a12 ··· a1n ] [ x1 ]
[ f^(2)(x) ]   [ a21 a22 ··· a2n ] [ x2 ]
[   ···    ] = [       ···       ] [ ···]
[ f^(m)(x) ]   [ am1 am2 ··· amn ] [ xn ]
If A = (aij)_{m×n} and B = (bij)_{n×p}, the (i, j) element of the product C = AB is
cij = Σ_{r=1}^n air brj = ai1 b1j + ai2 b2j + ··· + ain bnj (n terms).
It is important to note that the product BA is well defined only if the number of
columns in B is equal to the number of rows in A.
If A, B, and C are matrices whose dimensions are such that the given operations are
well defined, then the basic properties of matrix multiplication are:
(AB)C = A(BC) (associative law)
A(B + C) = AB + AC (left distributive law)
(A + B)C = AC + BC (right distributive law)
Exercise 4.2 Show the above three properties for 2 × 2 matrices.
However, matrix multiplication is not commutative. In fact,
AB ≠ BA, except in special cases;
AB = 0 does not imply that A or B is 0;
AB = AC and A ≠ 0 do not imply that B = C.
Exercise 4.3 Confirm the above three points by example.
By using matrix multiplication, one can write a general system of linear equations in
a very concise way. Specifically, the system

a11 x1 + a12 x2 + ··· + a1n xn = b1
a21 x1 + a22 x2 + ··· + a2n xn = b2
······
am1 x1 + am2 x2 + ··· + amn xn = bm

can be written as Ax = b if we define

A =
[ a11 a12 ··· a1n ]
[ a21 a22 ··· a2n ]
[  ···            ]
[ am1 am2 ··· amn ]

x = (x1, x2, ..., xn)ᵀ, and b = (b1, b2, ..., bm)ᵀ.
A diagonal matrix is a square matrix with zeros everywhere off the main diagonal:

D = diag(d1, d2, ..., dn) =
[ d1 0 ··· 0 ]
[ 0 d2 ··· 0 ]
[ ···         ]
[ 0 0 ··· dn ]

and its mth power is taken entrywise along the diagonal: D^m = diag(d1^m, d2^m, ..., dn^m).

The identity matrix of order n, denoted by In, is the n × n matrix having ones along
the main diagonal and zeros elsewhere:

In =
[ 1 0 ··· 0 ]
[ 0 1 ··· 0 ]
[ ···        ]
[ 0 0 ··· 1 ]  (identity matrix)
4.2 Determinants and Matrix Inverses
4.2.1 Determinants
For a general n × n matrix A = (aij), the determinant |A| can be defined recursively.
In fact, expanding along the ith row,

|A| = ai1 Ai1 + ai2 Ai2 + ··· + aij Aij + ··· + ain Ain,

where the cofactors Aij are the signed determinants of (n−1) × (n−1) matrices:

Aij = (−1)^{i+j} × (the determinant of the matrix obtained from A by deleting row i and column j).

Here row i and column j are to be deleted from the matrix A to produce Aij.
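For instance, expanding along the first row of a 3 × 3 matrix:

| 1 2 0 |
| 3 1 4 | = 1·(1·1 − 4·2) − 2·(3·1 − 4·0) + 0·(3·2 − 1·0) = −7 − 6 + 0 = −13.
| 0 2 1 |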
Proposition 4.1 Let A and B be n × n matrices. Then |AB| = |A||B|.
Exercise 4.5 Prove Proposition 4.1 when n = 2.
4.2.2 Matrix Inverses

When |A| ≠ 0, the inverse is

A⁻¹ = (1/|A|) adj(A), where adj(A) =
[ A11 A21 ··· An1 ]
[ A12 A22 ··· An2 ]
[  ···            ]
[ A1n A2n ··· Ann ]

with Aij the cofactor of the element aij. Note carefully the order of the indices in the
adjoint matrix adj(A), with the column number preceding the row number. The matrix
(Aij)n×n is called the cofactor matrix, whose transpose is the adjoint matrix.
4.2.3 Cramer's Rule

Consider a system of n linear equations in n unknowns:

a11 x1 + a12 x2 + ··· + a1n xn = b1
a21 x1 + a22 x2 + ··· + a2n xn = b2
······
an1 x1 + an2 x2 + ··· + ann xn = bn   (∗)

If |A| ≠ 0, the system has the unique solution

xj = |Aj| / |A|, j = 1, ..., n,

where

|Aj| =
| a11 ··· a1,j−1 b1 a1,j+1 ··· a1n |
| a21 ··· a2,j−1 b2 a2,j+1 ··· a2n |
|  ···                            |
| an1 ··· an,j−1 bn an,j+1 ··· ann |

is obtained by replacing the jth column of |A| by the column whose components are
b1, b2, ..., bn. If the right-hand side of the equation system (∗) consists only of zeros, so
that it can be written in matrix form as Ax = 0, the system is called homogeneous. A
homogeneous system will always have the trivial solution x1 = x2 = ··· = xn = 0.
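As a small illustration of Cramer's rule, consider the system

2x1 + x2 = 5
x1 + 3x2 = 5

Here |A| = | 2 1; 1 3 | = 5, |A1| = | 5 1; 5 3 | = 10, and |A2| = | 2 5; 1 5 | = 5, so x1 = 10/5 = 2 and x2 = 5/5 = 1, which indeed solves both equations.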
4.3 Vectors

The inner product (dot product) of two n-vectors a = (a1, ..., an) and b = (b1, ..., bn) is the number

a · b = Σ_{i=1}^n ai bi.
The angle θ between two nonzero vectors a and b is defined by

cos θ = (a · b) / (‖a‖ ‖b‖), θ ∈ [0, π].

This definition reveals that cos θ = 0 if and only if a · b = 0, in which case θ = π/2. In symbols,

a ⊥ b ⇔ a · b = 0.

The hyperplane in Rn that passes through the point a = (a1, ..., an) and is orthogonal to the nonzero vector p = (p1, ..., pn) is the set of all points x = (x1, ..., xn) such
that

p · (x − a) = 0.
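For instance, in R² the hyperplane (here, a line) through a = (1, 1) orthogonal to p = (2, 3) is the set of points (x1, x2) with 2(x1 − 1) + 3(x2 − 1) = 0, i.e., 2x1 + 3x2 = 5.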
4.4 Linear Independence
Exercise 4.12 Let a1 = (1, 2), a2 = (1, 1), and a3 = (5, 1) ∈ R². Show that a1, a2, a3 are
linearly dependent.
Let a1, a2, ..., an ∈ Rn\{0}. Suppose that, for each i = 1, ..., n, it follows that
ai ≠ Σ_{j≠i} αj aj for all α1, ..., α_{i−1}, α_{i+1}, ..., αn ∈ R. Then the entire space Rn is
spanned by the set of all linear combinations of a1, ..., an.
Lemma 4.4 A set of n vectors a1, a2, ..., an in Rm is linearly dependent if and only if
at least one of them can be written as a linear combination of the others. Or, equivalently:
a set of vectors a1, a2, ..., an in Rm is linearly independent if and only if none of them
can be written as a linear combination of the others.
Proof of Lemma 4.4: Suppose that a1, a2, ..., an are linearly dependent. Then
the equation c1 a1 + ··· + cn an = 0 holds with at least one of the coefficients ci different
from 0. We can, without loss of generality, assume that c1 ≠ 0. Solving the equation for
a1 yields
a1 = −(c2/c1) a2 − ··· − (cn/c1) an.
Thus, a1 is a linear combination of the other vectors.
4.4.1 Linear Dependence and Systems of Linear Equations

Consider the linear system

a11 x1 + a12 x2 + ··· + a1n xn = b1
······
am1 x1 + am2 x2 + ··· + amn xn = bm,

which can be written in vector form as

x1 a1 + ··· + xn an = b.   (∗∗)

Here a1, ..., an are the column vectors of coefficients, and b is the column vector with
components b1, ..., bm.
Suppose that (∗∗) has two solutions (u1, ..., un) and (v1, ..., vn). Then
u1 a1 + ··· + un an = b and v1 a1 + ··· + vn an = b.
Subtracting the second equation from the first yields
(u1 − v1) a1 + ··· + (un − vn) an = 0.
Let c1 = u1 − v1, ..., cn = un − vn. The two solutions are different if and only if c1, ..., cn
are not all equal to 0. We conclude that if system (∗∗) has more than one solution, then
the column vectors a1, ..., an are linearly dependent. Equivalently, if the column
vectors a1, ..., an are linearly independent, then system (∗∗) has at most one solution.
Write an n × n matrix A in terms of its columns:

A =
[ a11 a12 ··· a1n ]
[ a21 a22 ··· a2n ]
[  ···            ]
[ an1 an2 ··· ann ]

where aj = (a1j, a2j, ..., anj)ᵀ for j = 1, ..., n.
4.5 Eigenvalues
4.5.1 Motivations
Consider the matrix

A = [ 2 0; 0 3 ]

and the linear transformation

y = Ax: (y1, y2)ᵀ = [ 2 0; 0 3 ] (x1, x2)ᵀ = (2x1, 3x2)ᵀ.

The linear transformation (matrix) A stretches x1 into 2x1 along the x1 axis and x2
into 3x2 along the x2 axis. Importantly, there is no interaction between x1 and x2 through
the linear transformation A. This is, I believe, a straightforward extension of the linear
transformation in R to Rn. Define e1 = (1, 0) and e2 = (0, 1) as the unit vectors in R².
Then x = x1 e1 + x2 e2 and y = 2x1 e1 + 3x2 e2. In other words, (e1, e2) are the unit vectors
in the original space and (2e1, 3e2) are the unit vectors in the space transformed through
A. Next, consider the matrix B as follows:

B = [ 1 1; −2 4 ]

Now we don't have a clear image of what is going on through the linear transformation B. However, consider the following different unit vectors f1 = (1, 1) and
f2 = (1, 2). Then

B f1 = [ 1 1; −2 4 ] (1, 1)ᵀ = (2, 2)ᵀ = 2 f1, and B f2 = [ 1 1; −2 4 ] (1, 2)ᵀ = (3, 6)ᵀ = 3 f2.

This shows that once we take f1 and f2 as the new coordinate system, the linear
transformation B is the same as A but now along the f1 and f2 axes, respectively.
(Check Section 4.2.3 for this argument. Recall that a system of linear equations is homogeneous if
it is expressed by Ax = 0.)
Finally, consider the matrix C below:

C = [ 2 3; −4 2 ]

It turns out that there is no way of finding a new coordinate system in which
the linear transformation C can be seen as either stretching or shrinking the vectors along
each new axis. The reason why we don't find such a new coordinate system is that we
restrict our attention to Rn. Once we allow the unit vectors in the new system to have
complex components, we will again be successful in finding the new coordinate system in which
everything is easy to understand. (Those who are interested in the definition of complex numbers
should be referred to Appendix 2 in this chapter.)
Consider the different unit vectors

f1 = (1, −(2√3/3) i) and f2 = (1, (2√3/3) i).

Then

C f1 = (2 − 2√3 i) f1, and C f2 = (2 + 2√3 i) f2.
4.5.2 How to Find Eigenvalues

The eigenvalues of A are the solutions λ of

p(λ) = |A − λI| =
| a11−λ  a12  ···  a1n |
| a21  a22−λ  ···  a2n |
|  ···                 |
| an1   an2  ···  ann−λ |
= 0.

This is called the characteristic equation of A. From the definition of the determinant,
it follows that p(λ) is a polynomial of degree n in λ. According to the fundamental
theorem of algebra, it has exactly n roots (real or complex), provided that any multiple
roots are counted appropriately.
Theorem 4.2 (The Fundamental Theorem of Algebra, Gauss (1799)) Consider
a polynomial equation of degree n in z:
z^n + a_{n−1} z^{n−1} + ··· + a1 z + a0 = 0,   (∗)
where a0, ..., a_{n−1} ∈ C. Then (∗) has n solutions z1, ..., zn with the property that zi ∈ C
for each i = 1, ..., n. This includes the case in which zi = zj for some i ≠ j.
Exercise 4.13 Find the eigenvalues and the associated eigenvectors of the matrices A
and B:

A = [ 1 2; 3 0 ], B = [ 0 1; −1 0 ].
In fact, it is convenient to write the characteristic function as a polynomial in λ:

p(λ) = (−λ)^n + b_{n−1} (−λ)^{n−1} + ··· + b1 (−λ) + b0.

The zeros of this characteristic polynomial are precisely the eigenvalues of A. Denoting
the eigenvalues by λ1, λ2, ..., λn ∈ C, we have

p(λ) = (−1)^n (λ − λ1)(λ − λ2) ··· (λ − λn).
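To make this concrete, take the 2 × 2 matrix M = [ 4 1; 2 3 ] (an illustrative matrix, not part of Exercise 4.13). Then p(λ) = (4 − λ)(3 − λ) − 2 = λ² − 7λ + 10 = (λ − 2)(λ − 5), so the eigenvalues are λ1 = 2 and λ2 = 5. Note that |M| = 10 = λ1 λ2 and tr(M) = 7 = λ1 + λ2, anticipating Theorem 4.3 below.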
Theorem 4.3 If A is an n × n matrix with eigenvalues λ1, λ2, ..., λn, then
1. |A| = λ1 λ2 ··· λn;
2. tr(A) = a11 + a22 + ··· + ann = λ1 + λ2 + ··· + λn.
Proof of Theorem 4.3: Putting λ = 0, we see that p(0) = b0 = |A|. Specifically,
λ = 0 gives p(0) = (−1)^n (−1)^n λ1 λ2 ··· λn. Since (−1)^n (−1)^n = ((−1)^n)² = 1, we have
b0 = |A| = λ1 λ2 ··· λn. The product of the elements on the main diagonal of |A − λI| is
(a11 − λ)(a22 − λ) ··· (ann − λ).
If we choose ajj from one of these parentheses and −λ from the remaining n − 1, and then
add over j = 1, ..., n, we obtain the term
(a11 + a22 + ··· + ann)(−λ)^{n−1}.
Since we cannot obtain other terms with (−λ)^{n−1} except the above terms, we conclude
that b_{n−1} = a11 + a22 + ··· + ann, the trace of A.
4.6 Diagonalization

Let A and P be n × n matrices with P invertible. Then A and P⁻¹AP have the same
eigenvalues. This is true because the two matrices have the same characteristic polynomial:
|P⁻¹AP − λI| = |P⁻¹AP − P⁻¹(λI)P| = |P⁻¹(A − λI)P| = |P⁻¹| |A − λI| |P| = |A − λI|,
where we use the facts that |P⁻¹| = 1/|P| and |AB| = |A||B| (see Propositions 4.1 and
4.2).
An n × n matrix A is diagonalizable if there exist an invertible n × n matrix P and
a diagonal matrix D such that P⁻¹AP = D.
Theorem 4.4 (Diagonalization Theorem) An n × n matrix A is diagonalizable if
and only if it has a set of n linearly independent eigenvectors x1, ..., xn ∈ Cⁿ. In this
case,
P⁻¹AP = diag(λ1, ..., λn),
where P is the matrix with x1, ..., xn ∈ Cⁿ as its columns, and λ1, ..., λn ∈ C are the
corresponding eigenvalues.
Proof of the Diagonalization Theorem: (⇐) Suppose that A has n linearly independent eigenvectors x1, ..., xn, with corresponding eigenvalues λ1, ..., λn. Let P denote
the matrix whose columns are x1, ..., xn. Then AP = PD, where D = diag(λ1, ..., λn)
(note that it is PD, not DP, even though D is diagonal). Because the eigenvectors are linearly independent, P is invertible, so P⁻¹AP = D. (⇒) If A is diagonalizable, there exists an invertible
n × n matrix P such that P⁻¹AP = D. Then AP = PD. Since D is a diagonal matrix
by our hypothesis, the columns of P must be eigenvectors of A,
and the diagonal elements of D must be the corresponding eigenvalues.
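Continuing the illustrative matrix M = [ 4 1; 2 3 ] with eigenvalues 2 and 5: solving (M − 2I)x = 0 gives the eigenvector x1 = (1, −2)ᵀ, and (M − 5I)x = 0 gives x2 = (1, 1)ᵀ. With P = [ 1 1; −2 1 ], a direct computation confirms P⁻¹MP = diag(2, 5).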
A matrix P is said to be orthogonal if Pᵀ = P⁻¹, i.e., PᵀP = I. If x1, ..., xn are the
n column vectors of P, then x1ᵀ, ..., xnᵀ are the row vectors of the transposed matrix
Pᵀ. The condition PᵀP = I then reduces to the n² equations xiᵀxj = 1 if i = j and
xiᵀxj = 0 if i ≠ j.
Theorem 4.5 If the matrix A = (aij)n×n is symmetric, then:
1. All the n eigenvalues λ1, ..., λn are real.
2. Eigenvectors that correspond to different eigenvalues are orthogonal.
3. There exists an orthogonal matrix P (i.e., Pᵀ = P⁻¹) such that
P⁻¹AP = diag(λ1, λ2, ..., λn).
The columns v1, v2, ..., vn of the matrix P are eigenvectors of unit length corresponding to the eigenvalues λ1, λ2, ..., λn.
Proof of Theorem 4.5: (1) We will show this for n = 2. The eigenvalues of a 2 × 2
matrix A are given by the quadratic equation

|A − λI| = | a11−λ a12; a21 a22−λ | = λ² − (a11 + a22)λ + (a11 a22 − a12 a21) = 0.   (∗)

The roots of the quadratic equation (∗) are

λ = [(a11 + a22) ± √((a11 + a22)² − 4(a11 a22 − a12 a21))] / 2.

These roots are real if and only if (a11 + a22)² ≥ 4(a11 a22 − a12 a21), which is equivalent
to

(a11 − a22)² + 4 a12 a21 ≥ 0.

This is indeed the case, since symmetry (a12 = a21) makes the left-hand side (a11 − a22)² + 4a12² ≥ 0.
(2) Suppose that Axi = λi xi and Axj = λj xj with λi ≠ λj.
Multiplying these equalities from the left by xjᵀ and xiᵀ, respectively, ...

Exercise Let

A = [ 2 1; 1 2 ]

Compute the matrix P described in Theorem 4.5.
4.7 Quadratic Forms
A quadratic form in n variables x1, ..., xn is a function

Q(x1, ..., xn) = Σ_{i=1}^n Σ_{j=1}^n aij xi xj,

where the aij are constants. Suppose we put x = (x1, ..., xn)ᵀ and A = (aij). Then it
follows from the definition of matrix multiplication that

Q(x1, ..., xn) = Q(x) = xᵀAx.

Of course, xi xj = xj xi, so we can write aij xi xj + aji xj xi = (aij + aji) xi xj. If we replace
aij and aji by (aij + aji)/2, then the new numbers aij and aji become equal without
changing Q(x). Thus, we can assume that aij = aji for all i and j, which means that
the matrix A is symmetric. Then A is called the symmetric matrix associated with Q,
and Q is called a symmetric quadratic form.
Definition 4.4 A quadratic form Q(x) = xᵀAx, as well as its associated symmetric
matrix A, is said to be positive definite, positive semidefinite, negative definite,
or negative semidefinite according as
Q(x) > 0, Q(x) ≥ 0, Q(x) < 0, Q(x) ≤ 0,
for all x ∈ Rn\{0}. The quadratic form Q(x) is indefinite if there exist vectors x* and
y* such that Q(x*) < 0 and Q(y*) > 0.
Let A = (aij) be any n × n matrix. An arbitrary principal minor of order r is the
determinant of the matrix obtained by deleting all but r rows and r columns in A with
the same numbers. In particular, a principal minor of order r always includes exactly r
elements of the main (principal) diagonal. We call the determinant |A| itself a principal
minor (no rows and columns are deleted). A principal minor is said to be a leading
principal minor of order r (1 ≤ r ≤ n) if it consists of the first r rows and
columns of |A|.
Suppose A is an arbitrary n × n matrix. Its leading principal minor of order k is

Dk =
| a11 a12 ··· a1k |
| a21 a22 ··· a2k |
|  ···            |
| ak1 ak2 ··· akk |

Consider the quadratic form

Q(x) = Σ_{i=1}^n Σ_{j=1}^n aij xi xj   (aij = aji)

with the associated symmetric matrix A = (aij)n×n. Let Dk be the leading principal
minor of A of order k and let Δk denote an arbitrary principal minor of order k. Then
we have:

1. Q is positive definite ⇔ Dk > 0 for k = 1, ..., n.
2. Q is positive semidefinite ⇔ Δk ≥ 0 for all principal minors of order k = 1, ..., n.
3. Q is negative definite ⇔ (−1)^k Dk > 0 for k = 1, ..., n.
4. Q is negative semidefinite ⇔ (−1)^k Δk ≥ 0 for all principal minors of order k = 1, ..., n.
For n = 2, where Q(x1, x2) = a11 x1² + 2a12 x1 x2 + a22 x2² and a11 ≠ 0, completing the square gives

Q(x1, x2) = a11 (x1 + (a12/a11) x2)² + ((a11 a22 − a12²)/a11) x2².

Thus, we obtain

Q(x1, x2) > 0 for all (x1, x2) ≠ (0, 0) ⇔ a11 > 0 and a11 a22 − a12² > 0;
Q(x1, x2) < 0 for all (x1, x2) ≠ (0, 0) ⇔ a11 < 0 and a11 a22 − a12² > 0.
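For example, Q(x1, x2) = 2x1² + 2x1x2 + 2x2² has the associated symmetric matrix A = [ 2 1; 1 2 ], with D1 = 2 > 0 and D2 = |A| = 3 > 0, so Q is positive definite. Indeed, completing the square gives Q(x1, x2) = 2(x1 + x2/2)² + (3/2)x2² > 0 for all (x1, x2) ≠ (0, 0).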
4.8 Appendix 1: Farkas' Lemma
4.8.1 Preliminaries

The set of all vectors that can be expressed as a linear combination of vectors in S is
called the span of S and denoted span(S).
Definition 4.6 The rank of a (not necessarily finite) set S of vectors is the size of the
largest subset of linearly independent vectors in S.
Definition 4.7 Let S be a set of vectors and B ⊆ S be finite and linearly independent.
The set B of vectors is said to be a maximal linearly independent set if the set B ∪ {x}
is linearly dependent for all vectors x ∈ S\B. A maximal linearly independent subset of
S is called a basis of S.
Theorem 4.8 Every S ⊆ Rn has a basis. If B is a basis of S, then span(S) = span(B).
4.8.2 Fundamental Theorem of Linear Algebra

4.8.3 Linear Inequalities

4.8.4 Non-Negative Solutions
Let Dp = {a_{i1}, ..., a_{ir}}, b = λ_{i1} a_{i1} + ··· + λ_{ir} a_{ir}, and let y be the vector found in step two
of iteration q. Then:

0 > y · b = y · (λ_{i1} a_{i1} + ··· + λ_{ir} a_{ir}) = λ_{i1} (y · a_{i1}) + ··· + λ_{ir} (y · a_{ir}) > 0,

which is a contradiction. The first inequality comes from the previous theorem. To see why the last inequality must be true:

When ij < s, we have from Step 1 of iteration p that λ_{ij} ≥ 0, and from Step 4 of
iteration q that y · a_{ij} ≥ 0.

When ij = s, we have from Step 1 of iteration p that λ_{ij} < 0, and from Step 4 of
iteration q that y · a_{ij} < 0.

When ij > s, we have from Dp ∩ {a_{s+1}, ..., a_n} = Dq ∩ {a_{r+1}, ..., a_n} and Step 2
of iteration q that y · a_{ij} = 0.

This completes the proof.
4.8.5 The General Case

Similarly, an inequality of the form Σ_j aij xj ≤ bi can be converted into an equation by
the addition of a slack variable si ≥ 0, as follows:

Σ_j aij xj + si = bi.
4.9 Appendix 2: Linear Spaces
4.9.1 Number Fields
Linear algebra makes use of number systems (number fields). By a number field I mean
any set K of objects, called "numbers", which, when subjected to the four arithmetic operations, again give elements of K. More exactly, these operations have the following
properties F1, F2, and F3 (field axioms):
F1: To every pair of numbers α and β in K, there corresponds a (unique) number
α + β in K, called the sum of α and β, where
1. α + β = β + α for all α, β ∈ K (addition is commutative);
2. (α + β) + γ = α + (β + γ) for all α, β, γ ∈ K (addition is associative);
3. there exists a number 0 (zero) in K such that 0 + α = α for all α ∈ K;
4. for every α ∈ K, there exists a number (negative element) −α ∈ K such that
α + (−α) = 0.
The solvability of the equation α + x = 0 allows us to carry out the operation of
subtraction, by defining the difference β − α as the sum of the number β and the solution x
of the equation α + x = 0.
F2: To every pair of numbers α, β ∈ K, there corresponds a (unique) number α·β (or
αβ) in K, called the product of α and β, where
1. αβ = βα for all α, β ∈ K (multiplication is commutative);
2. (αβ)γ = α(βγ) for all α, β, γ ∈ K (multiplication is associative);
3. there exists a number 1 (≠ 0) in K such that 1·α = α for all α ∈ K;
4. for every α ≠ 0 in K, there exists a number (reciprocal element) α⁻¹ ∈ K such that
α·α⁻¹ = 1.
F3: Multiplication is distributive over addition, i.e., for every α, β, γ ∈ K,
α(β + γ) = αβ + αγ.
The solvability of the equation α·x = 1 for every α ≠ 0 allows us to carry out the
operation of division, by defining the quotient β/α as the product of the number β and
the solution x of the equation α·x = 1.
The numbers 1, 1 + 1 = 2, 2 + 1 = 3, etc. are said to be natural; it is assumed
that none of these numbers is zero. The set of natural numbers is denoted by N. By
the integers in a field K, we mean the set of all natural numbers together with their
negatives and the number zero. The set of integers is denoted by Z. By the rational
numbers in a field K, we mean the set of all quotients p/q, where p and q are integers
and q ≠ 0. The set of rational numbers is denoted by Q.
Two fields K and K′ are said to be isomorphic if we can set up a one-to-one correspondence between K and K′ such that the number associated with every sum (or
product) of numbers in K is the sum (or product) of the corresponding numbers in K′.
The number associated with every difference (or quotient) of numbers in K will then be
the difference (or quotient) of the corresponding numbers in K′.
The most commonly encountered concrete examples of number fields are the field Q
of rational numbers, the field R of real numbers, and the field C of complex numbers.
4.9.2 Definitions
The concept of a linear space generalizes that of the set of all vectors. The generalization
consists, first, in getting away from the concrete nature of the objects involved (directed
line segments) without changing the properties of the operations on the objects, and,
secondly, in getting away from the concrete nature of the admissible numerical factors
(real numbers). This leads to the following definition.
Definition 4.12 A set V is called a linear (or affine) space over a field K if
1. given any two elements x, y ∈ V, there is a rule (the addition rule) leading to a
(unique) element x + y ∈ V, called the sum of x and y;
2. given any element x ∈ V and any number α ∈ K, there is a rule (the rule for multiplication
by a number) leading to a (unique) element αx ∈ V, called the product of the
element x and the number α;
3. these two rules obey the axioms listed below, VS1 and VS2.
VS1: The addition rule has the following properties:
1. x + y = y + x for every x, y ∈ V;
2. (x + y) + z = x + (y + z) for every x, y, z ∈ V;
3. there exists an element 0 ∈ V (the zero vector) such that x + 0 = x for every
x ∈ V;
4. for every x ∈ V, there exists an element y ∈ V (the negative element) such that
x + y = 0.
VS2: The rule for multiplication by a number has the following properties:
1. 1·x = x for every x ∈ V;
2. α(βx) = (αβ)x for every x ∈ V and α, β ∈ K;
3. (α + β)x = αx + βx for every x ∈ V and α, β ∈ K;
4. α(x + y) = αx + αy for every x, y ∈ V and every α ∈ K.
4.9.3 Bases and Dimension
Theorem 4.14 When two vectors of a linear space V are added, their components (with respect to any basis) are added. When a vector is multiplied by a number α, all its components are multiplied by α.
If, in a linear space V, we can find n linearly independent vectors while every n + 1 vectors of the space are linearly dependent, then the number n is called the dimension of the space V, and the space V itself is called n-dimensional. A linear space in which we can find an arbitrarily large number of linearly independent vectors is called infinite-dimensional.
Theorem 4.15 In a space V of dimension n, there exists a basis consisting of n vectors.
Moreover, any set of n linearly independent vectors of the space V is a basis for the space.
Theorem 4.16 If there is a basis in the space V , then the dimension of V equals the
number of basis vectors.
4.9.4 Subspaces

4.9.5 Morphisms of Linear Spaces
Definition 4.16 Let φ be a rule which assigns, to every given vector x of a linear space V, a vector φ(x) in a linear space V′. Then, φ is called a morphism (or linear operator) if the following two conditions hold:

1. φ(x + y) = φ(x) + φ(y) for every x, y ∈ V;
2. φ(αx) = αφ(x) for every x ∈ V and every α ∈ K.
Theorem 4.18 Any two n-dimensional spaces V and V′ (over the same field K) are K-isomorphic.

Corollary 4.1 Every n-dimensional linear space over a field K is K-isomorphic to the space Kⁿ. In particular, every n-dimensional complex space is ℂ-isomorphic to the space ℂⁿ, and every n-dimensional real space is ℝ-isomorphic to the space ℝⁿ.
Chapter 5
Calculus
5.1 Derivatives in One Variable

Let y = f(x). We write

dy/dx = f′(x)

to indicate that f′(x) gives us the (instantaneous) amount, dy, by which y changes per unit change, dx, in x. If the first derivative is a differentiable function, we can take its derivative, which gives the second derivative of the original function:

d²y/dx² = f″(x).
If a function possesses continuous derivatives f′, f″, ..., f⁽ⁿ⁾, it is called n-times continuously differentiable, or a Cⁿ function. Some rules of differentiation are provided below:

For constants α: (d/dx)(α) = 0.
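As a quick illustration of these rules, one can differentiate symbolically; the cubic below is an arbitrary example of mine (sympy is assumed to be available), not a function from these notes.

import sympy as sp

x = sp.symbols('x')
f = x**3 - 2*x                      # an arbitrary illustrative function
print(sp.diff(f, x))                # first derivative: 3*x**2 - 2
print(sp.diff(f, x, 2))             # second derivative: 6*x
print(sp.diff(sp.Integer(5), x))    # derivative of a constant is 0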
5.2
5.3 Gradients
If z = F(x, y) and C is any number, we call the graph of the equation F(x, y) = C a level curve for F. The slope of the level curve F(x, y) = C at a point (x, y) is given by the formula

F(x, y) = C  ⟹  dy/dx = −(∂F(x, y)/∂x) / (∂F(x, y)/∂y) = −F₁′(x, y)/F₂′(x, y).
If (x₀, y₀) is a particular point on the level curve F(x, y) = C, the slope at (x₀, y₀) is −F₁′(x₀, y₀)/F₂′(x₀, y₀). The equation for the tangent hyperplane T is

y − y₀ = −[F₁′(x₀, y₀)/F₂′(x₀, y₀)](x − x₀)

or, rearranging,

F₁′(x₀, y₀)(x − x₀) + F₂′(x₀, y₀)(y − y₀) = 0.

Recalling the inner product, the equation can be written as

(F₁′(x₀, y₀), F₂′(x₀, y₀)) · (x − x₀, y − y₀) = 0.
The vector (F₁′(x₀, y₀), F₂′(x₀, y₀)) is said to be the gradient of F at (x₀, y₀) and is often denoted by ∇F(x₀, y₀) (∇ is pronounced "nabla"). The vector (x − x₀, y − y₀) is a vector on the tangent hyperplane T, which implies that ∇F(x₀, y₀) is orthogonal to the tangent hyperplane T at (x₀, y₀).
Suppose, more generally, that F(x) = F(x₁, ..., xₙ) is a function of n variables defined on an open set A in ℝⁿ, and let x⁰ = (x₁⁰, ..., xₙ⁰) be a point in A. The gradient of F at x⁰ is the vector

∇F(x⁰) = (∂F(x⁰)/∂x₁, ..., ∂F(x⁰)/∂xₙ)

of first-order partial derivatives.
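As a small illustration (the function F below is a hypothetical choice of mine, assuming sympy), the gradient can be computed symbolically and evaluated at a point:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
F = x1**2 * x2 + sp.exp(x2)                    # a hypothetical F(x1, x2)
grad = [sp.diff(F, v) for v in (x1, x2)]       # (dF/dx1, dF/dx2)
print(grad)                                    # [2*x1*x2, x1**2 + exp(x2)]
print([g.subs({x1: 1, x2: 0}) for g in grad])  # gradient at x0 = (1, 0): [0, 2]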
5.4 Directional Derivatives

The derivative of f at x along the vector a is defined by

fₐ′(x) = lim_{h→0} [f(x + ha) − f(x)] / h

or, with components,

fₐ′(x₁, ..., xₙ) = lim_{h→0} [f(x₁ + ha₁, ..., xₙ + haₙ) − f(x₁, ..., xₙ)] / h.

We assume that x + ha lies in the domain of f for all sufficiently small h. This is one reason why the domain is generally assumed to be open. In particular, with aᵢ = 1 and aⱼ = 0 for all j ≠ i, this derivative is the partial derivative of f with respect to xᵢ.
Suppose f is C¹ in a set A,¹ and let x be an interior point of A. For an arbitrary vector a, define the function g by

g(h) = f(x + ha) = f(x₁ + ha₁, ..., xₙ + haₙ).

¹ A function f : ℝⁿ → ℝ is continuously differentiable (or C¹) on an open set A ⊆ ℝⁿ if, for each i = 1, ..., n, (∂f/∂xᵢ)(x) exists for all x ∈ A and is continuous in x. f is k-times continuously differentiable, or Cᵏ, on A if all the partial derivatives of f of order less than or equal to k (≥ 1) exist and are continuous on A.
Then, (g(h) − g(0))/h = (f(x + ha) − f(x))/h. Letting h tend to 0, we have g′(0) = fₐ′(x). Since g′(h) = Σᵢ₌₁ⁿ fᵢ′(x + ha)aᵢ, we get g′(0) = Σᵢ₌₁ⁿ fᵢ′(x)aᵢ. Hence,

fₐ′(x) = Σᵢ₌₁ⁿ fᵢ′(x)aᵢ = ∇f(x) · a.

This equation shows that the derivative of f along the vector a is equal to the inner product of the gradient of f and a. If ‖a‖ = 1, the number fₐ′(x) is called the directional derivative of f at x, in the direction a.
Theorem 5.1 Suppose that f(x) = f(x₁, ..., xₙ) is C¹ in an open set A. Then, at points x where ∇f(x) ∈ ℝⁿ∖{0}, the gradient ∇f(x) = (f₁′(x), ..., fₙ′(x)) satisfies:

1. ∇f(x) is orthogonal to the level surface through x.
2. ∇f(x) points in the direction of maximal increase of f.
3. ‖∇f(x)‖ measures how fast the function increases in the direction of maximal increase.

Proof of Theorem 5.1: By introducing θ as the angle between the vectors ∇f(x) and a with ‖a‖ = 1, we have fₐ′(x) = ∇f(x) · a = ‖∇f(x)‖ cos θ, which is maximal when θ = 0, i.e., when a points in the direction of ∇f(x), and the maximal value is then ‖∇f(x)‖.
5.5
Convex Sets
Convex sets are basic building blocks in virtually every area of microeconomic theory. Convexity is most often assumed to guarantee that the analysis is mathematically
tractable and that the results are clear-cut and well-behaved.
5.5.1
Let u() : Rn R be a utility function. Dene U C(x0 ) = {x Rn+ |u(x) u(x0 )}. This
U C(x0 ) is called the upper contour set which consists of all commodity vectors x that
the individual values at least as good as x0 . In consumer theory, we usually assume that
U C(x0 ) is convex for every x0 Rn+ .
5.6 Concave and Convex Functions

A function f defined on a convex set S is concave (convex) if

f(λx + (1 − λ)x′) ≥ (≤) λf(x) + (1 − λ)f(x′)

for all x, x′ ∈ S and all λ ∈ [0, 1].
5.7
For a C² function f(x) = f(x₁, ..., xₙ), the n determinants

D²₍ᵣ₎f(x) = det [ fᵢⱼ″(x) ]_{i,j=1,...,r},  r = 1, ..., n,

are the leading principal minors of D²f(x) of order r. Here fᵢⱼ″(x) = ∂²f(x)/∂xᵢ∂xⱼ for any i, j = 1, ..., r.
Theorem 5.4 (Second-Order Characterization of Concave (Convex) Functions) Suppose that f(x) = f(x₁, ..., xₙ) is a C² function defined on an open, convex set S in ℝⁿ. Let Δ₍ᵣ₎f(x) denote a generic principal minor of order r of the Hessian matrix. Then

1. f is convex in S ⟺ Δ₍ᵣ₎f(x) ≥ 0 for all x ∈ S and all Δ₍ᵣ₎f(x), r = 1, ..., n.
2. f is concave in S ⟺ (−1)ʳ Δ₍ᵣ₎f(x) ≥ 0 for all x ∈ S and all Δ₍ᵣ₎f(x), r = 1, ..., n.
Proof of Theorem 5.4: (⟸) The proof relies on the chain rule (Theorem 5.15), which we are going to cover later in this course; just take it for granted until then. Take two points x, x⁰ ∈ S and let t ∈ [0, 1]. Define

g(t) = f(x⁰ + t(x − x⁰)) = f(tx + (1 − t)x⁰).

The chain rule for functions of several variables gives

g′(t) = Σᵢ₌₁ⁿ fᵢ′(x⁰ + t(x − x⁰))(xᵢ − xᵢ⁰),

g″(t) = (x − x⁰)ᵀ D²f(x⁰ + t(x − x⁰)) (x − x⁰).

By our hypothesis together with Theorem 4.6 on quadratic forms, g″(t) ≥ 0 for any t ∈ [0, 1]. This shows that g(·) is convex. In particular, we have

g(t) = g(t · 1 + (1 − t) · 0) ≤ t g(1) + (1 − t) g(0) = t f(x) + (1 − t) f(x⁰).
But this shows that f(·) is convex. The concavity counterpart follows easily by replacing f with −f. (⟹) Suppose f(·) is convex. According to Theorem 4.6 on quadratic forms, it suffices to show that for all x ∈ S and all h₁, ..., hₙ, we have
Q = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ fᵢⱼ″(x) hᵢ hⱼ ≥ 0.

Fix x ∈ S and h = (h₁, ..., hₙ), and define p(t) = f(x + th) on an interval I around 0 on which x + th ∈ S. Since f is convex, p(·) is convex, so

p″(t) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ fᵢⱼ″(x + th) hᵢ hⱼ ≥ 0

for all t ∈ I. Putting t = 0, it follows that Q ≥ 0. This completes the proof.
Corollary 5.1 Let z = f(x, y) be a C² function defined on an open convex set S ⊆ ℝ². Then,

1. f is convex ⟺ f₁₁″ ≥ 0, f₂₂″ ≥ 0, and f₁₁″f₂₂″ − (f₁₂″)² ≥ 0.
2. f is concave ⟺ f₁₁″ ≤ 0, f₂₂″ ≤ 0, and f₁₁″f₂₂″ − (f₁₂″)² ≥ 0.
Exercise 5.3 Let f(x, y) = 2x − y − x² + 2xy − y² for all (x, y) ∈ ℝ². Check whether f is concave, convex, or neither.
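One way to attack such an exercise is via Corollary 5.1; the sketch below (assuming sympy) computes the Hessian of the function in Exercise 5.3 and checks the sign conditions. It is only a numerical/symbolic check, not a substitute for the argument itself.

import sympy as sp

x, y = sp.symbols('x y')
f = 2*x - y - x**2 + 2*x*y - y**2
H = sp.hessian(f, (x, y))
print(H)                                          # Matrix([[-2, 2], [2, -2]])
# f11 <= 0, f22 <= 0, f11*f22 - f12^2 >= 0  ->  f is concave (not strictly)
print(H[0, 0] <= 0, H[1, 1] <= 0, sp.det(H) >= 0)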
Exercise 5.4 The CES (Constant Elasticity of Substitution) function f is defined for K > 0, L > 0 by

f(K, L) = A [δK^(−ρ) + (1 − δ)L^(−ρ)]^(−1/ρ),

where A > 0, ρ ≠ 0, and 0 ≤ δ ≤ 1. Show that f is concave if ρ ≥ −1 and convex if ρ ≤ −1.
Theorem 5.5 (Second-Order (Partial) Characterization of Strict Concavity) Suppose that f(x) = f(x₁, ..., xₙ) is a C² function defined on an open, convex set S in ℝⁿ. Let D²₍ᵣ₎f(x) be defined as above. Then

1. D²₍ᵣ₎f(x) > 0 for all x ∈ S and all r = 1, ..., n ⟹ f is strictly convex.
2. (−1)ʳ D²₍ᵣ₎f(x) > 0 for all x ∈ S and all r = 1, ..., n ⟹ f is strictly concave.
Proof of Theorem 5.5: Define the function g(·) as in the proof of Theorem 5.4 above. If the conditions in (1) are satisfied, the Hessian matrix D²f(x) is positive definite by Theorem 4.6 on quadratic forms. So, for x ≠ x⁰, g″(t) > 0 for all t ∈ [0, 1]. It follows that g(·) is strictly convex. Then, we have

g(t) = g(t · 1 + (1 − t) · 0) < t g(1) + (1 − t) g(0) = t f(x) + (1 − t) f(x⁰)

for all t ∈ (0, 1), which shows that f is strictly convex. The strict concavity part is obtained by replacing f with −f.
Theorem 5.6 (First-Order Characterization of Concave Functions) Suppose that f is a C¹ function defined on an open, convex set S in ℝⁿ. Then:

1. f(·) is concave if and only if

f(x) − f(x⁰) ≤ ∇f(x⁰) · (x − x⁰) = Σᵢ₌₁ⁿ (∂f(x⁰)/∂xᵢ)(xᵢ − xᵢ⁰)

for all x, x⁰ ∈ S.
2. f(·) is strictly concave iff the above inequality is always strict when x ≠ x⁰.
3. The corresponding result for convex (strictly convex) functions is obtained by changing ≤ to ≥ (< to >) in the above inequality.
Proof of Theorem 5.6: (1) (⟹) Let x, x⁰ ∈ S. Since f is concave,

λf(x) + (1 − λ)f(x⁰) ≤ f(λx + (1 − λ)x⁰)

for all λ ∈ (0, 1). Rearranging the above inequality, for all λ ∈ (0, 1), we obtain

f(x) − f(x⁰) ≤ [f(x⁰ + λ(x − x⁰)) − f(x⁰)] / λ.  (∗)

Let λ → 0. The right-hand side of (∗) then approaches ∇f(x⁰) · (x − x⁰). (⟸) Let x, x⁰ ∈ S and λ ∈ (0, 1). Define z = λx + (1 − λ)x⁰. Notice that z ∈ S because S is convex. By our hypothesis, we have

f(x) − f(z) ≤ ∇f(z) · (x − z)  (i)
f(x⁰) − f(z) ≤ ∇f(z) · (x⁰ − z)  (ii)

Multiplying the inequality in (i) by λ > 0 and the inequality in (ii) by 1 − λ > 0, we obtain

λ(f(x) − f(z)) + (1 − λ)(f(x⁰) − f(z)) ≤ ∇f(z) · [λ(x − z) + (1 − λ)(x⁰ − z)].  (iii)

Here λ(x − z) + (1 − λ)(x⁰ − z) = λx + (1 − λ)x⁰ − z = 0, so the right-hand side of (iii) is 0. Thus, rearranging (iii) gives

λf(x) + (1 − λ)f(x⁰) ≤ f(z) = f(λx + (1 − λ)x⁰)
because z = λx + (1 − λ)x⁰. This shows that f is concave. (2) (⟹) Suppose that f is strictly concave in S. Then, inequality (∗) is strict for x ≠ x⁰. With z = x⁰ + λ(x − x⁰), we have

f(x) − f(x⁰) < [f(z) − f(x⁰)] / λ ≤ ∇f(x⁰) · (z − x⁰) / λ = ∇f(x⁰) · (x − x⁰),

where we used the inequality in (1), which we have already proved, and the fact that z − x⁰ = λ(x − x⁰). This shows that the inequality in (1) holds with strict inequality. (3) This part is trivial. Do you agree with me?
5.7.1 Jensen's Inequality

Jensen's inequality states that a function f defined on a convex set S is concave if and only if

f(Σₕ₌₁ⁿ λₕxₕ) ≥ Σₕ₌₁ⁿ λₕf(xₕ)

for all x₁, ..., xₙ ∈ S and all λ₁, ..., λₙ ≥ 0 with Σᵢ₌₁ⁿ λᵢ = 1. The proof is by induction; let H(k) denote the claim for convex combinations of k points.
H(2) is true because it is precisely the definition of concavity of f. Now, we show that H(k) ⟹ H(k + 1). We execute the series of computations below:

f(Σₕ₌₁ᵏ⁺¹ λₕxₕ) = f( (Σₕ₌₁ᵏ λₕ) Σₕ₌₁ᵏ [λₕ / Σₕ₌₁ᵏ λₕ] xₕ + λₖ₊₁xₖ₊₁ )
  ≥ (Σₕ₌₁ᵏ λₕ) f( Σₕ₌₁ᵏ [λₕ / Σₕ₌₁ᵏ λₕ] xₕ ) + λₖ₊₁ f(xₖ₊₁)  (because of H(2))
  ≥ (Σₕ₌₁ᵏ λₕ) Σₕ₌₁ᵏ [λₕ / Σₕ₌₁ᵏ λₕ] f(xₕ) + λₖ₊₁ f(xₖ₊₁)  (because H(k) is true under the inductive hypothesis)
  = Σₕ₌₁ᵏ⁺¹ λₕ f(xₕ).
One can extend Jensen's inequality to the continuum. Let X be a random variable which takes values on the real line ℝ, and let g : ℝ → ℝ be its probability density function. Then, the continuous version of Jensen's inequality for a concave function f is:

f( ∫ x g(x) dx ) ≥ ∫ f(x) g(x) dx.
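A crude Monte Carlo illustration (my own example, assuming numpy) with the concave function f = log: the sample analogue of f(E[X]) dominates that of E[f(X)].

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=100_000)   # draws of X from a uniform density g
print(np.log(x.mean()))                   # f(E[X]) -- the larger number
print(np.log(x).mean())                   # E[f(X)] -- smaller, since log is concave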
5.8 Quasiconcave and Quasiconvex Functions

A function f defined on a convex set S ⊆ ℝⁿ is quasiconcave if the upper level set Pₐ = {x ∈ S | f(x) ≥ a} is convex for every number a.

Theorem 5.8 Let f be defined on a convex set S ⊆ ℝⁿ. Then f is quasiconcave if and only if, for all x, x′ ∈ S and all λ ∈ [0, 1],

1. f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)},

or, equivalently,

2. f(x′) ≥ f(x) ⟹ f(λx + (1 − λ)x′) ≥ f(x).

Proof of Theorem 5.8: (⟹) Take any x, x′ ∈ S and set a = min{f(x), f(x′)}, so that x, x′ ∈ Pₐ = {x ∈ S | f(x) ≥ a}. Since Pₐ is convex by our hypothesis, λx + (1 − λ)x′ ∈ Pₐ for any λ ∈ [0, 1]. This implies that f(λx + (1 − λ)x′) ≥ a = min{f(x), f(x′)}. (⟸) Suppose that the inequality in (1) is valid and let a be an arbitrary number. We must show that Pₐ is convex. Take any points x, x′ ∈ Pₐ. Then, f(x) ≥ a and f(x′) ≥ a. Also, for all λ ∈ (0, 1), the inequality in (1) implies that

f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)} ≥ a.

Thus, λx + (1 − λ)x′ ∈ Pₐ. This proves that Pₐ is convex. We leave the rest of the proof as an exercise.
Exercise 5.6 Prove the second part of Theorem 5.8.
A function F : ℝ → ℝ is said to be strictly increasing if F(x) > F(y) whenever x > y.
Exercise 5.7 Consider the Cobb-Douglas function f(x) = A x₁^{a₁} ⋯ xₙ^{aₙ}, defined for x₁ > 0, ..., xₙ > 0, with A > 0 and aᵢ > 0 for all i = 1, ..., n. Show the following.

1. f(·) is quasiconcave for all a₁, ..., aₙ.
2. f(·) is concave for a₁ + ⋯ + aₙ ≤ 1.
3. f(·) is strictly concave for a₁ + ⋯ + aₙ < 1.
Theorem 5.10 (First-Order Characterization of Quasiconcavity) Let f(·) be a C¹ function of n variables defined on an open convex set S in ℝⁿ. Then f(·) is quasiconcave on S if and only if, for all x, x⁰ ∈ S,

f(x) ≥ f(x⁰) ⟹ ∇f(x⁰) · (x − x⁰) = Σᵢ₌₁ⁿ (∂f(x⁰)/∂xᵢ)(xᵢ − xᵢ⁰) ≥ 0.
Proof of Theorem 5.10: (⟹) Define g(t) = f(x⁰ + t(x − x⁰)) for t ∈ [0, 1]. Suppose f(x) ≥ f(x⁰). By Theorem 5.8, g(t) ≥ g(0) for all t ∈ [0, 1]. For any t ∈ (0, 1], we have

[g(t) − g(0)] / t ≥ 0.

Letting t → 0, we obtain

lim_{t→0} [g(t) − g(0)] / t = g′(0) ≥ 0.

This implies

g′(0) = ∇f(x⁰) · (x − x⁰) ≥ 0.

(⟸) We will be satisfied with the figure for this part.
The content of Theorem 5.10 is that, for any quasiconcave function f(·) and any pair of points x and x⁰ with f(x) ≥ f(x⁰), the gradient vector ∇f(x⁰) and the vector (x − x⁰) must form an acute angle.
Theorem 5.11 (Second-Order Characterization of Quasiconcavity) Let f(·) be a C² function defined on an open, convex set S in ℝⁿ. Then, f(·) is quasiconcave if and only if, for every x ∈ S, the Hessian matrix D²f(x) is negative semidefinite in the subspace {z ∈ ℝⁿ | ∇f(x) · z = 0}, that is,

zᵀ D²f(x) z ≤ 0 whenever ∇f(x) · z = 0,

for every x ∈ S. If the Hessian matrix D²f(x) is negative definite in the subspace {z ∈ ℝⁿ | ∇f(x) · z = 0} for every x ∈ S, then f(·) is strictly quasiconcave.
Proof of Theorem 5.11: (⟹) Suppose f(·) is quasiconcave. Let x ∈ S. Choose x′ ∈ S such that ∇f(x) · (x′ − x) = 0 and f(x′) ≥ f(x); to see this, draw the figure. Then,

f(x′) − f(x) ≥ ∇f(x) · (x′ − x) = 0.

Define

g(λ) = f(x + λ(x′ − x)).

Note that g(0) = f(x), g(1) = f(x′), and g(1) ≥ g(0) because f(x′) ≥ f(x) by our hypothesis. What we want to show is that g(λ) ≥ g(0) for any λ ∈ [0, 1]. By the mean-value theorem (Theorem 5.2), there exists λ₀ ∈ (0, 1) such that g′(λ₀) = 0. Let x⁰ = x + λ₀(x′ − x), pick p ∈ ℝⁿ∖{0} with ∇f(x⁰)ᵀp > 0, and choose α(·), with α(0) = 0, so that the path z(·) below stays on the level set of f through x⁰.
66
Again, for notational simplicity, we denote α(λ)p + λ(x′ − x) + x⁰ by z(λ). By differentiating f(z(λ)) = f(x⁰), we have

∇f(z(λ)) · [α′(λ)p + (x′ − x)] = 0  (∗)

and by further differentiating, we have

[α′(λ)p + (x′ − x)]ᵀ D²f(z(λ)) [α′(λ)p + (x′ − x)] + ∇f(z(λ)) · α″(λ)p = 0.  (∗∗)

Then, we must have α″(λ)∇f(z(λ)) · p ≥ 0. When λ is sufficiently close to zero, z(λ) is very close to x⁰ and so ∇f(z(λ)) · p > 0 because p ∈ ℝⁿ∖{0}. Then, α″(λ) ≥ 0 for λ with |λ| sufficiently small.

For sufficiently small |λ − λ₀|, we have

∇f((λ − λ₀)(x′ − x) + x⁰) · p > 0

because ∇f(x⁰)ᵀp > 0 and (λ − λ₀)(x′ − x) + x⁰ is very close to x⁰. Hence, for λ sufficiently close to λ₀, we have

g(λ) = f(x + λ(x′ − x)) = f((λ − λ₀)(x′ − x) + x⁰)
  ≥ f(α(λ − λ₀)p + (λ − λ₀)(x′ − x) + x⁰) = f(x⁰) = g(λ₀)

because α(λ − λ₀) ≤ 0 for λ sufficiently close to λ₀. Accordingly, g(·) does not attain a semistrict interior minimum in [0, 1], unless it is constant. Hence, g(λ) ≥ g(0) for any λ ∈ [0, 1].
The last step is based on Corollary 4.3 in "Nine Kinds of Quasiconcavity and Concavity," by Diewert, Avriel, and Zang, Journal of Economic Theory, vol. 25 (1981), 397-420.

Corollary 4.3 (Diewert, Avriel, and Zang (1981)): A differentiable function f defined over an open set S is quasiconcave if and only if, for any x⁰ ∈ S and any v ∈ ℝⁿ with vᵀv = 1, vᵀ∇f(x⁰) = 0 implies that g(t) ≡ f(x⁰ + tv) does not attain a (semistrict) local minimum at t = 0.
This completes the proof.
Theorem 5.12 (A Characterization through Bordered Hessians) Let f be a C² function defined on an open, convex set S in ℝⁿ. Define the bordered Hessian determinants Bᵣ(x) as follows: for each r = 1, ..., n,

Bᵣ(x) = det [ 0        f₁′(x)   ⋯  fᵣ′(x)  ]
            [ f₁′(x)   f₁₁″(x)  ⋯  f₁ᵣ″(x) ]
            [ ⋮        ⋮        ⋱  ⋮       ]
            [ fᵣ′(x)   fᵣ₁″(x)  ⋯  fᵣᵣ″(x) ]
Then,

1. A necessary condition for f to be quasiconcave is that (−1)ʳBᵣ(x) ≥ 0 for all x ∈ S and all r = 1, ..., n.
2. A sufficient condition for f to be strictly quasiconcave is that (−1)ʳBᵣ(x) > 0 for all x ∈ S and all r = 1, ..., n.
5.9 Total Differentiation

A linear transformation f : ℝⁿ → ℝᵐ can be written in matrix form as

[ f⁽¹⁾(x) ]   [ a₁₁ a₁₂ ⋯ a₁ₙ ] [ x₁ ]
[   ⋮    ] = [  ⋮   ⋮  ⋱  ⋮  ] [ ⋮  ]
[ f⁽ᵐ⁾(x) ]   [ aₘ₁ aₘ₂ ⋯ aₘₙ ] [ xₙ ]

In particular,

f⁽ʲ⁾(x) = aⱼ₁x₁ + aⱼ₂x₂ + ⋯ + aⱼₙxₙ = Σᵢ₌₁ⁿ aⱼᵢxᵢ.
5.9.1 Differentiability

For a function f of one variable, write O(h) = f(x⁰ + h) − f(x⁰) − f′(x⁰)h. Then f(·) is differentiable at x⁰ exactly when

lim_{h→0} [ (f(x⁰ + h) − f(x⁰)) / h − f′(x⁰) ] = lim_{h→0} O(h)/h = 0.

Moreover, f(·) is differentiable at x⁰ if and only if there exists a number c ∈ ℝ such that

lim_{h→0} [f(x⁰ + h) − f(x⁰) − ch] / h = 0.
If such a number c ∈ ℝ exists, it is unique and c = f′(x⁰). These arguments can be generalized straightforwardly to higher-dimensional spaces. In particular, a transformation f(·) is differentiable at a point x⁰ if it admits a linear approximation around x⁰:
Definition 5.8 If f : A → ℝᵐ is a transformation defined on a subset A of ℝⁿ and x⁰ is an interior point of A, then f is said to be differentiable at x⁰ if there exists an m × n matrix C such that

lim_{h→0ₙ} ‖f(x⁰ + h) − f(x⁰) − Ch‖ / ‖h‖ = 0.
In particular, if eⱼ = (0, ..., 1, ..., 0) is the jth standard unit vector in ℝⁿ, then ∇f⁽ⁱ⁾(x) · eⱼ is the partial derivative ∂f⁽ⁱ⁾(x)/∂xⱼ of the ith component function with respect to the jth variable. Hence, the derivative can be written as the m × n matrix

Df(x) = [ ∇f⁽¹⁾(x) ]   [ ∂f⁽¹⁾(x)/∂x₁ ⋯ ∂f⁽¹⁾(x)/∂xₙ ]
        [    ⋮    ] = [      ⋮       ⋱      ⋮       ]
        [ ∇f⁽ᵐ⁾(x) ]   [ ∂f⁽ᵐ⁾(x)/∂x₁ ⋯ ∂f⁽ᵐ⁾(x)/∂xₙ ]

This is called the Jacobian matrix of f(·) at x. Its rows are the gradients of the component functions of f(·).
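For concreteness (a sketch with a transformation of my own choosing, assuming sympy), the Jacobian matrix can be computed symbolically:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.Matrix([x1**2 + x2, sp.sin(x1) * x2])   # a hypothetical f : R^2 -> R^2
J = f.jacobian([x1, x2])
print(J)   # Matrix([[2*x1, 1], [x2*cos(x1), sin(x1)]])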
Proof of Theorem 5.13: Let C be an m × n matrix and let O(h) = f(x + h) − f(x) − Ch, where h ∈ ℝⁿ. Componentwise,

[ O₁(h) ]   [ f⁽¹⁾(x + h) − f⁽¹⁾(x) ]   [ c₁₁ ⋯ c₁ₙ ] [ h₁ ]
[   ⋮   ] = [          ⋮           ] − [  ⋮  ⋱  ⋮  ] [ ⋮  ]
[ Oₘ(h) ]   [ f⁽ᵐ⁾(x + h) − f⁽ᵐ⁾(x) ]   [ cₘ₁ ⋯ cₘₙ ] [ hₙ ]

The jth component of O(h), j = 1, ..., m, is Oⱼ(h) ≡ f⁽ʲ⁾(x + h) − f⁽ʲ⁾(x) − Cⱼh, where Cⱼ is the jth row of C. For each j,

|Oⱼ(h)| ≤ ‖O(h)‖ ≤ |O₁(h)| + ⋯ + |Oₘ(h)|.

It follows that

lim_{h→0} ‖O(h)‖/‖h‖ = 0 ⟺ lim_{h→0} |Oᵢ(h)|/‖h‖ = 0 for all i = 1, ..., m.
Now let g be differentiable at f(x) and f differentiable at x, with Dg(f(x)) a p × m matrix and Df(x) an m × n matrix. Write k(h) = f(x + h) − f(x), and define the error terms e_f and e_g by f(x + h) = f(x) + Df(x)h + e_f(h) and g(y + k) = g(y) + Dg(y)k + e_g(k). Differentiability means

‖e_f(h)‖/‖h‖ → 0 as h → 0

and

‖e_g(k)‖/‖k‖ → 0 as k → 0.
Also, note that there exists some fixed constant K such that ‖k(h)‖ ≤ K‖h‖ for all small h; otherwise, f would not be differentiable. Note also that for every ε > 0, ‖e_g(k)‖ < ε‖k‖ for ‖k‖ small, because g is differentiable. Thus, when ‖h‖ is small, we can summarize:

‖e_g(k(h))‖ < ε‖k(h)‖ ≤ εK‖h‖.
Hence,

‖e_g(k(h))‖/‖h‖ → 0 as h → 0.

Then, we execute the series of computations below, where e(h) ≡ g(f(x + h)) − g(f(x)) − Dg(f(x))Df(x)h:

e(h) = g(f(x) + k(h)) − g(f(x)) − Dg(f(x))Df(x)h
     = Dg(f(x))k(h) + e_g(k(h)) − Dg(f(x))Df(x)h
     = Dg(f(x))(k(h) − Df(x)h) + e_g(k(h))
     = Dg(f(x))e_f(h) + e_g(k(h))   (∵ k(h) = Df(x)h + e_f(h))
And, moreover,

‖e(h)‖/‖h‖ = (1/‖h‖) ‖Dg(f(x))e_f(h) + e_g(k(h))‖
           ≤ ‖Dg(f(x))e_f(h)‖/‖h‖ + ‖e_g(k(h))‖/‖h‖   (∵ triangle inequality)
           ≤ ‖Dg(f(x))‖ · ‖e_f(h)‖/‖h‖ + ‖e_g(k(h))‖/‖h‖   (∵ Cauchy-Schwarz inequality)
           → 0 as h → 0.

Hence D(g ∘ f)(x) = Dg(f(x))Df(x).
5.10
5.11
Consider a system f(x, y) = 0 of m equations in the variables x ∈ ℝⁿ and y ∈ ℝᵐ, and write

Dₓf(x, y) = [ ∂f₁/∂x₁ ⋯ ∂f₁/∂xₙ ]
            [    ⋮    ⋱    ⋮    ]
            [ ∂fₘ/∂x₁ ⋯ ∂fₘ/∂xₙ ]

and similarly for D_yf(x, y). If y = g(x) solves the system locally, then

dy/dx = [ ∂y₁/∂x₁ ⋯ ∂y₁/∂xₙ ]
        [    ⋮    ⋱    ⋮    ]  = Dg(x) = −[D_yf(x, y)]⁻¹ Dₓf(x, y).
        [ ∂yₘ/∂x₁ ⋯ ∂yₘ/∂xₙ ]
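For a single equation, the formula reduces to dy/dx = −(∂f/∂x)/(∂f/∂y). A one-line check on the circle x² + y² = 1 (my example, assuming sympy):

import sympy as sp

x, y = sp.symbols('x y')
eq = x**2 + y**2 - 1
print(sp.idiff(eq, y, x))                             # -x/y
print(sp.simplify(-sp.diff(eq, x) / sp.diff(eq, y)))  # -x/y, the same answer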
Proof of the Implicit Function Theorem: We define the norm of vectors in ℝⁿ as follows:

‖x‖ ≡ max_{1≤i≤n} |xᵢ|.

This is the norm we use for the implicit function theorem. The proof relies on the following three lemmas (Lemmas 5.1, 5.2, and 5.3). We will not provide their proofs here.
Lemma 5.1 Let K be a compact set in ℝⁿ. Let {hₖ(x)}ₖ₌₁^∞ be a sequence of continuous functions K → ℝᵐ. Suppose that for any ε > 0, there exists a number N ∈ ℕ such that

max_{x∈K} ‖hₘ(x) − hₙ(x)‖ < ε

for all m, n > N. Then, there exists a unique continuous function h : K → ℝᵐ such that

lim_{k→∞} max_{x∈K} ‖hₖ(x) − h(x)‖ = 0.
With Weierstrass's Theorem (Theorem 3.19) and the concept of a Cauchy sequence in ℝⁿ (Definition 3.6), the above lemma should be easy to establish. For the next lemma, define the following sets:

D = {x ∈ ℝⁿ : ‖x − x⁰‖ ≤ δ},  D′ = {y ∈ ℝᵐ : ‖y − y⁰‖ ≤ δ′}.

Let f(x, y) be a continuous mapping from D × D′ to ℝᵐ with the property that f(x⁰, y⁰) = 0. Notice that D × D′ is a compact set by construction.
Lemma 5.2 (Lipschitz Continuity) Write

f(x, y) = D_yf(x⁰, y⁰)(y − y⁰) + g(x, y),

where D_yf(x⁰, y⁰) is the m × m matrix of partial derivatives of f with respect to y. There exists a number K ∈ (0, 1) such that, for all y, y′ ∈ D′ and all x ∈ D,

‖g(x, y) − g(x, y′)‖ ≤ K‖y − y′‖.
Chapter 6
Static Optimization
6.1 Unconstrained Optimization

6.1.1 Extreme Points

Theorem 6.1 Suppose that f(·) is defined on a set S ⊆ ℝⁿ and that x is an interior point of S at which f is differentiable. If x is a maximum (or minimum) point for f(·) in S, then x is a stationary point for f(·), i.e.,

∂f(x)/∂xᵢ = 0, for i = 1, ..., n.
Proof of Theorem 6.1: Suppose, on the contrary, that x is a maximum point but not a stationary point for f(·). Then, there is no loss of generality in assuming that there exists at least one i such that fᵢ′(x) > 0. Define x′ = (x₁, ..., xᵢ + ε, ..., xₙ). Since x is an interior point of S, one can make sure that x′ ∈ S by choosing ε > 0 sufficiently small. Then, for ε small,

f(x′) ≈ f(x) + ∇f(x) · (0, ..., 0, ε, 0, ..., 0) = f(x) + ε fᵢ′(x) > f(x),

where the ε sits in the ith coordinate. However, this contradicts the hypothesis that x is a maximum point for f(·).
The next theorem clarifies under what conditions the converse of the previous theorem (Theorem 6.1) holds.
Theorem 6.2 Suppose that the function f(·) is defined on a convex set S ⊆ ℝⁿ and let x∗ be an interior point of S. Assume that f(·) is C¹ in a ball around x∗.

1. If f(·) is concave in S, then x∗ is a (global) maximum point for f(·) in S if and only if x∗ is a stationary point for f(·).
2. If f(·) is convex in S, then x∗ is a (global) minimum point for f(·) in S if and only if x∗ is a stationary point for f(·).
Proof of Theorem 6.2: We focus on the first part of the theorem; the second part follows once we take into account that −f is concave. (⟹) This follows from Theorem 6.1 above. (⟸) Suppose that x∗ is a stationary point for f(·) and that f(·) is concave. Recall the inequality in Theorem 5.6 (first-order characterization of concave functions): for any x ∈ S,

f(x) − f(x∗) ≤ ∇f(x∗) · (x − x∗) = 0  (∵ ∇f(x∗) = 0).

Thus, we have f(x) ≤ f(x∗) for any x ∈ S, as desired.
6.1.2 The Envelope Theorem

Consider the problem max_{x∈S} f(x, r), where r is a vector of parameters. The vector x that maximizes f(x, r) depends on r and is therefore denoted by x∗(r). Then, the value function is f∗(r) = f(x∗(r), r).
Theorem 6.3 (Envelope Theorem) In the maximization problem max_{x∈S} f(x, r), where S ⊆ ℝⁿ and r ∈ ℝᵏ, suppose that there is a maximum point x∗(r) ∈ S for every r ∈ B_δ(r∗) with some δ > 0. Furthermore, assume that the mappings r ↦ f(x∗(r∗), r) and r ↦ f∗(r) are differentiable at r∗. Then

∇_r f∗(r∗) = ( ∂f(x, r)/∂r₁, ..., ∂f(x, r)/∂rₖ ) |_{x=x∗(r∗), r=r∗}.
A change in r affects the value function f∗ through two channels: directly, and indirectly through x∗(r). The envelope theorem says that we can ignore the indirect effects.
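A minimal symbolic check of the envelope theorem, with an objective of my own choosing (assuming sympy):

import sympy as sp

x, r = sp.symbols('x r')
f = -x**2 + r*x                              # my illustrative f(x, r)
xstar = sp.solve(sp.diff(f, x), x)[0]        # x*(r) = r/2
fstar = f.subs(x, xstar)                     # value function f*(r) = r^2/4
print(sp.diff(fstar, r))                     # total derivative: r/2
print(sp.diff(f, r).subs(x, xstar))          # partial at x = x*(r): r/2, equal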
6.1.3 Local Extreme Points
The point x∗ is a local maximum point of f(·) in S if there exists an ε > 0 such that f(x) ≤ f(x∗) for all x ∈ B_ε(x∗) ∩ S. If x∗ is the unique local maximum point in such a ball, then it is a strict local maximum point for f(·) in S. A (strict) local minimum point is defined in the obvious way, and it should be clear what we mean by local maximum and minimum values, local extreme points, and local extreme values. A stationary point x∗ of f(·) that is neither a local maximum point nor a local minimum point is called a saddle point of f(·).
Before stating the next result, recall the leading principal minors of the Hessian matrix D²f(x):

|D²₍ₖ₎f(x)| = det [ fᵢⱼ″(x) ]_{i,j=1,...,k},  k = 1, ..., n.
Theorem 6.4 (Sufficient Conditions for Local Extreme Points) Suppose that f(x) = f(x₁, ..., xₙ) is defined on a set S ⊆ ℝⁿ and that x∗ is an interior stationary point. Assume also that f(·) is C² in an open ball around x∗. Then,

1. D²f(x∗) is positive definite ⟹ x∗ is a local minimum point.
2. D²f(x∗) is negative definite ⟹ x∗ is a local maximum point.
Proof of Theorem 6.4: We only focus on the first part of the theorem; the second part follows by replacing f(·) with −f(·). Since each fᵢⱼ″(x) is continuous in x (because f(·) is C²), each leading principal minor is a continuous function of x. Therefore, if |D²₍ₖ₎f(x∗)| > 0 for all k, it is possible to find a ball B_ε(x∗) with ε > 0 such that the corresponding quadratic form is positive definite for all x ∈ B_ε(x∗). It follows from Theorem 5.5 that f(·) is strictly convex in B_ε(x∗). Then, Theorem 6.2 shows that the stationary point x∗ is a minimum point for f in B_ε(x∗). Hence, x∗ is a local minimum point for f(·).
Lemma 6.1 If x∗ is an interior stationary point of f(·) such that |D²f(x∗)| ≠ 0 and D²f(x∗) is neither positive definite nor negative definite, then x∗ is a saddle point.
6.1.4 Necessary Conditions for Local Extreme Points

Suppose x∗ is an interior local maximum point for f(·), and define g(t) = f(x∗ + th) for an arbitrary vector h. Then

g′(t) = Σᵢ₌₁ⁿ fᵢ′(x∗ + th)hᵢ,  g″(t) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ fᵢⱼ″(x∗ + th)hᵢhⱼ.

Since g has a local maximum at t = 0, we get g″(0) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ fᵢⱼ″(x∗)hᵢhⱼ ≤ 0. This implies that the Hessian matrix D²f(x∗) is negative semidefinite. Theorem 4.6 shows that this is equivalent to checking all principal minors. The same argument can be used to establish the necessary condition for x∗ to be a local minimum point for f(·).
Exercise 6.1 Find the local extreme values and classify the stationary points as maxima, minima, or neither. (A numerical sketch for part 1 follows the list.)

1. f(x₁, x₂) = 2x₁ − x₁² − x₂².
2. f(x₁, x₂) = x₁² + 2x₂² − 4x₂.
3. f(x₁, x₂) = x₁³ − x₂² + 2x₂.
4. f(x₁, x₂) = 4x₁ + 2x₂ − x₁² + x₁x₂ − x₂².
5. f(x₁, x₂) = x₁³ − 6x₁x₂ + x₂³.
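A sketch (assuming sympy) of how one might check part 1: find the stationary points and inspect the eigenvalues of the Hessian there.

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 2*x1 - x1**2 - x2**2                      # part 1 of the exercise
stationary = sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2], dict=True)
H = sp.hessian(f, (x1, x2))
for s in stationary:
    print(s, H.subs(s).eigenvals())           # {x1: 1, x2: 0}: eigenvalue -2
                                              # (twice) -> negative definite,
                                              # so (1, 0) is a local maximum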
6.2 Constrained Optimization

6.2.1 Equality Constraints: The Lagrange Problem

Consider the problem

max_{x∈S} f(x) subject to gⱼ(x) = 0, j = 1, ..., m,  (∗)

with m < n. Define the Lagrangian

L(x) = f(x) − Σⱼ₌₁ᵐ λⱼgⱼ(x).

The first-order conditions are

∂L(x)/∂xᵢ = ∂f(x)/∂xᵢ − Σⱼ₌₁ᵐ λⱼ ∂gⱼ(x)/∂xᵢ = 0, i = 1, ..., n,  (∗∗)

or, compactly, ∇L(x) = ∇f(x) − λDg(x) = 0.
Theorem 6.6 (N&S Conditions for Extreme Points with Equality Constraints) The following establishes the necessary and sufficient conditions for the Lagrangian method.

1. (Necessity) Suppose that x∗ solves the maximization problem (∗) and that the m × n Jacobian matrix

Dg(x∗) = [ ∂g¹(x∗)/∂x₁ ⋯ ∂g¹(x∗)/∂xₙ ]
         [      ⋮      ⋱      ⋮      ]
         [ ∂gᵐ(x∗)/∂x₁ ⋯ ∂gᵐ(x∗)/∂xₙ ]

has rank m. Then, there exist unique numbers λ₁, ..., λₘ such that the first-order conditions (∗∗) are valid.
2. (Sufficiency) If there exist numbers λ₁, ..., λₘ and a feasible x∗ which together satisfy the first-order conditions (∗∗), and if the Lagrangian L(x) is concave in x, then x∗ solves the maximization problem (∗).
Proof of Theorem 6.6: (Necessity) The proof for the necessity part consists of three steps.

Step 1: Construction of an unconstrained maximization problem. Since the m × n matrix Dg(x∗) is assumed to have rank m, it has an invertible (nonsingular) m × m submatrix. After renumbering the variables, if necessary, we can assume that it consists of the first m columns. By the implicit function theorem (Theorem 5.17), the m constraints g¹, ..., gᵐ define x₁, ..., xₘ as C¹ functions of x̃ = (xₘ₊₁, ..., xₙ) in some open ball around x∗, i.e., B_ε(x∗) with ε > 0 sufficiently small, so we can write

xⱼ = hⱼ(xₘ₊₁, ..., xₙ) = hⱼ(x̃), j = 1, ..., m.

Then, f(x₁, ..., xₙ) reduces to a composite function

ψ(x̃) = f(h¹(x̃), ..., hᵐ(x̃), x̃)

of x̃ only. Now, the maximization problem with equality constraints is translated into the unconstrained maximization problem

max_{x̃} ψ(x̃) over those x̃ with (h(x̃), x̃) ∈ B_ε(x∗).

Since x∗ is a local extreme point for f subject to the given constraints, ψ must have an unconstrained local extreme point at x̃∗ = (xₘ₊₁∗, ..., xₙ∗). Hence, the partial derivatives of ψ with respect to xₘ₊₁, ..., xₙ must be 0:
∂ψ(x̃∗)/∂xₖ = [ ∂f(x∗)/∂x₁ · ∂h¹/∂xₖ + ⋯ + ∂f(x∗)/∂xₘ · ∂hᵐ/∂xₖ ] + ∂f(x∗)/∂xₖ = 0,  (1)

where the bracketed terms are the indirect effects, the last term is the direct effect, and k = m + 1, ..., n.
Step 2: Express ∂h¹/∂xₖ, ..., ∂hᵐ/∂xₖ in terms of the gⱼ. By construction,

gⱼ(h¹(x̃), ..., hᵐ(x̃), x̃) = 0, j = 1, ..., m,

for all x̃ near x̃∗. Differentiating this with respect to xₖ gives

Σₛ₌₁ᵐ (∂gⱼ/∂xₛ)(∂hₛ/∂xₖ) + ∂gⱼ/∂xₖ = 0.  (2)

Now choose λ₁, ..., λₘ to satisfy the m equations

∂f(x∗)/∂xₛ − Σⱼ₌₁ᵐ λⱼ ∂gⱼ(x∗)/∂xₛ = 0, s = 1, ..., m;  (3)

Step 3 below shows that such multipliers exist and are unique. Multiplying (2) by λⱼ, summing over j, and combining with (1) and (3), we obtain

∂f(x∗)/∂xₖ − Σⱼ₌₁ᵐ λⱼ ∂gⱼ(x∗)/∂xₖ = 0, k = m + 1, ..., n.

Thus, the first-order necessary conditions (∗∗) for the Lagrangian are satisfied.
Step 3: Existence of λ₁, ..., λₘ satisfying (3). The equations (3) form the linear system

[ ∂g¹/∂x₁ ⋯ ∂gᵐ/∂x₁ ] [ λ₁ ]   [ ∂f(x∗)/∂x₁ ]
[    ⋮    ⋱    ⋮    ] [ ⋮  ] = [     ⋮      ]
[ ∂g¹/∂xₘ ⋯ ∂gᵐ/∂xₘ ] [ λₘ ]   [ ∂f(x∗)/∂xₘ ]

Since the coefficient matrix above is invertible (nonsingular), the system has a unique solution λ₁, ..., λₘ. This completes the proof of the necessity part.
(Sufficiency) Suppose that the Lagrangian L(x) is concave. The first-order necessary conditions imply that the Lagrangian is stationary at x∗. Then, by Theorem 6.2 (sufficiency for a global maximum),

L(x∗) = f(x∗) − Σⱼ₌₁ᵐ λⱼgⱼ(x∗) ≥ f(x) − Σⱼ₌₁ᵐ λⱼgⱼ(x) = L(x) for all x ∈ S.

But for all feasible x, we have gⱼ(x) = 0 and, of course, gⱼ(x∗) = 0 for all j = 1, ..., m. Hence, this implies that f(x∗) ≥ f(x). Thus, x∗ solves the maximization problem (∗).
6.2.2 The Value Function and Lagrange Multipliers

The optimal values of x₁, ..., xₙ in the maximization problem (∗) will, in general, depend upon the parameter vector r = (r₁, ..., rₖ) ∈ ℝᵏ. If x∗(r) = (x₁∗(r), ..., xₙ∗(r)) denotes the vector of optimal values of the choice variables, then the corresponding value

f∗(r) = f(x₁∗(r), ..., xₙ∗(r))

of f(·) is called the (optimal) value function for the maximization problem (∗). The values of the Lagrange multipliers will also depend on r; we write λⱼ = λⱼ(r) for j = 1, ..., m. Let L(x, r) = f(x, r) − Σⱼ₌₁ᵐ λⱼgⱼ(x, r) be the Lagrangian. Under certain conditions, we have

∂f∗(r)/∂rᵢ = [∂L(x, r)/∂rᵢ]|_{x=x∗(r)}, i = 1, ..., k.
6.2.3 Tangent Hyperplane
6.2.4
Lemma 6.2 Let x∗ be a regular point of the constraints g(x) = 0 and a local extreme point of f(·) subject to these constraints. Then, for any y ∈ ℝⁿ,

∇f(x∗)y = 0 whenever Dg(x∗)y = 0.

Proof of Lemma 6.2: Let y = (y₁, ..., yₙ) with ‖y‖ = 1 and Dg(x∗)y = 0. Let x(t) be any smooth curve on the constraint surface g(x(t)) = 0 passing through x∗ with x(0) = x∗ and derivative x′(0) = y. There exists some ε > 0 such that g(x(t)) = 0 for any t ∈ (−ε, ε).

Since x∗ is a regular point, the tangent hyperplane is identical with the set of y's satisfying Dg(x∗)y = 0. Then, since x∗ is a constrained local extreme point of f(·), we have

(d/dt) f(x(t))|_{t=0} = 0, i.e., ∇f(x∗)x′(0) = 0,

equivalently, ∇f(x∗)y = 0.
The above lemma says that ∇f(x∗) is orthogonal to the tangent hyperplane.
6.2.5
Theorem 6.7 (Necessity for a Local Maximum) Suppose that x∗ is a local maximum of f(·) subject to g(x) = 0 and that x∗ is a regular point of these constraints. Then there is a λ ∈ ℝᵐ such that

∇f(x∗) − λDg(x∗) = 0.

If we denote by M the tangent hyperplane M = {h ∈ ℝⁿ | Dg(x∗)h = 0}, then the matrix

D²L(x∗) = D²f(x∗) − λD²g(x∗)

is negative semidefinite on M, that is,

hᵀD²L(x∗)h ≤ 0 for all h ∈ M.
Proof of Theorem 6.7: The first part follows from Theorem 6.6. We only focus on the second part. Let h = (h₁, ..., hₙ) ∈ M with ‖h‖ = 1, and consider the points x∗ + th for t near 0. Since x∗ is an interior local maximum point for f subject to g(x) = 0, if ε > 0 is small enough,

L(x∗ + th) − L(x∗) = f(x∗ + th) − λg(x∗ + th) − [f(x∗) − λg(x∗)] ≤ 0

for all t ∈ (−ε, ε), because ‖(x∗ + th) − x∗‖ = ‖th‖ = |t| < ε. Define the function ψ(t) = L(x∗ + th). Then, for all t ∈ (−ε, ε), we have

L(x∗ + th) − L(x∗) = ψ(t) − ψ(0) ≤ 0.

Thus, the function ψ has an interior maximum at t = 0. Using the chain rule (Theorem 5.15), we obtain ψ″(t) = hᵀD²L(x∗ + th)h. The hypothesis that ψ has an interior local maximum at t = 0 means ψ″(0) ≤ 0. Thus,

hᵀD²L(x∗)h = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ Lᵢⱼ″(x∗)hᵢhⱼ ≤ 0.
Theorem 6.8 (Sufficiency for a Strict Local Maximum) Suppose that x∗ satisfies g(x∗) = 0 and that there is a λ ∈ ℝᵐ such that

∇f(x∗) − λDg(x∗) = 0.

Suppose also that the matrix D²L(x∗) = D²f(x∗) − λD²g(x∗) is negative definite on M = {y ∈ ℝⁿ | Dg(x∗)y = 0}, that is, for y ∈ M with y ≠ 0, yᵀD²L(x∗)y < 0. Then, x∗ is a strict local maximum of f(·) subject to g(x) = 0.
Proof of Theorem 6.8: The first part follows from Theorem 6.7. Define the Lagrangian as follows:

L(x) = f(x) − λg(x).

Differentiating this with respect to x and evaluating it at x∗, we obtain

∇L(x∗) = ∇f(x∗) − λDg(x∗) = 0.

This implies that ∇L(x∗)y = 0 for any y ∈ ℝⁿ. By our hypothesis, D²L(x∗) is negative definite on M, and therefore x∗ is a local maximum point of L(x) by Theorem 6.4. This implies that x∗ is a strict local maximum of f(·) subject to g(x) = 0.
Exercise 6.2 Solve the problem

max (x + 4y + z) subject to x² + y² + z² = 216 and x + 2y + 3z = 0.
Exercise 6.3 Consider the problem (assuming m ≥ 4):

max U(x₁, x₂) = (1/2)ln(1 + x₁) + (1/4)ln(1 + x₂) subject to 2x₁ + 3x₂ = m.
6.2.6 Comparative Statics for the Lagrange Problem

In economic optimization problems, the objective function as well as the constraint functions (such as the budget set) will often depend on parameters. These parameters are held constant when optimizing (remember the price-taking behavior assumption), but can vary with the economic situation. We might want to know what happens to the optimal value function when the parameters change.

Consider the following general Lagrange problem:

max_{x∈S} f(x, r) subject to gⱼ(x, r) = 0, j = 1, ..., m.
6.3 Inequality Constraints: Nonlinear Programming

Consider the problem

max_{x∈S} f(x) subject to g¹(x₁, ..., xₙ) ≤ 0, ..., gᵐ(x₁, ..., xₙ) ≤ 0.

A vector x = (x₁, ..., xₙ) that satisfies all the constraints is called feasible. The set of all feasible vectors is said to be the feasible set. We assume that f(·) and all the gⱼ functions are C¹. In the case of equality constraints, the number of constraints was assumed to be strictly less than the number of variables; this is not necessary in the case of inequality constraints. An inequality constraint gⱼ(x) ≤ 0 is said to be active (binding) at x if gⱼ(x) = 0 and inactive (non-binding) at x if gⱼ(x) < 0.

Note that minimizing f(x) is equivalent to maximizing −f(x). Moreover, an inequality constraint of the form gⱼ(x) ≥ 0 can be rewritten as −gⱼ(x) ≤ 0. In this way, most constrained optimization problems can be expressed in the above form.
We define the Lagrangian exactly as before:

L(x) = f(x) − λg(x) = f(x) − Σⱼ₌₁ᵐ λⱼgⱼ(x).

The Kuhn-Tucker conditions consist of the first-order conditions

∂L(x)/∂xᵢ = ∂f(x)/∂xᵢ − Σⱼ₌₁ᵐ λⱼ ∂gⱼ(x)/∂xᵢ = 0, i = 1, ..., n,  (∗∗)

together with the complementary slackness conditions

λⱼ ≥ 0, and λⱼ = 0 if gⱼ(x) < 0, j = 1, ..., m.  (∗∗∗)
Theorem 6.9 (Sufficiency of the Kuhn-Tucker Conditions I) Consider the maximization problem and suppose that x∗ is feasible and satisfies conditions (∗∗) and (∗∗∗). If the Lagrangian L(x) = f(x) − λg(x) (with the λⱼ values obtained from the recipe) is concave, then x∗ is optimal.
Proof of Theorem 6.9: This is very much the same as the sufficiency part of the Lagrangian problem in Theorem 6.6. Since L(x) is concave by assumption and ∇L(x∗) = 0 from (∗∗), by Theorem 6.2, x∗ is a global maximum point of L(x). Hence, for all x ∈ S,

f(x∗) − Σⱼ₌₁ᵐ λⱼgⱼ(x∗) ≥ f(x) − Σⱼ₌₁ᵐ λⱼgⱼ(x),

that is,

f(x∗) − f(x) ≥ Σⱼ₌₁ᵐ λⱼ[gⱼ(x∗) − gⱼ(x)].

It suffices to show that

Σⱼ₌₁ᵐ λⱼ[gⱼ(x∗) − gⱼ(x)] ≥ 0

for all feasible x, because this will imply that x∗ solves the maximization problem. Suppose that gⱼ(x∗) < 0. Then (∗∗∗) shows that λⱼ = 0, so λⱼ[gⱼ(x∗) − gⱼ(x)] = 0. Suppose that gⱼ(x∗) = 0. Then λⱼ[gⱼ(x∗) − gⱼ(x)] = −λⱼgⱼ(x) ≥ 0 because x is feasible, i.e., gⱼ(x) ≤ 0, and λⱼ ≥ 0. Hence, Σⱼ₌₁ᵐ λⱼ[gⱼ(x∗) − gⱼ(x)] ≥ 0, as desired.
Theorem 6.10 (Sufficiency of the Kuhn-Tucker Conditions II) Consider the maximization problem and suppose that x∗ is feasible and satisfies conditions (∗∗) and (∗∗∗). If f(·) is concave and each λⱼgⱼ(x) (with the λⱼ values obtained from the recipe) is quasiconvex, then x∗ is optimal.

Proof of Theorem 6.10: We want to show that f(x) − f(x∗) ≤ 0 for all feasible x. Since f(·) is concave, according to Theorem 5.6 (first-order characterization of concavity),

f(x) − f(x∗) ≤ ∇f(x∗) · (x − x∗) = Σⱼ₌₁ᵐ λⱼ∇gⱼ(x∗) · (x − x∗),

where we use the first-order condition (∗∗). It therefore suffices to show that, for all j = 1, ..., m and all feasible x,

λⱼ∇gⱼ(x∗) · (x − x∗) ≤ 0.

The above inequality is satisfied for those j such that gⱼ(x∗) < 0, because then λⱼ = 0 from the complementary slackness condition (∗∗∗). For those j such that gⱼ(x∗) = 0, we have gⱼ(x) ≤ 0 = gⱼ(x∗) for feasible x, so quasiconvexity of λⱼgⱼ together with Theorem 5.10 yields λⱼ∇gⱼ(x∗) · (x − x∗) ≤ 0. This completes the proof.
6.4 Properties of the Value Function

Consider the problem max f(x) subject to gⱼ(x) ≤ bⱼ, j = 1, ..., m, where f is concave and each gⱼ is convex, and let f∗(b) denote its value function.

Proposition 6.1 The value function f∗(b) is concave in b.

Proof of Proposition 6.1: Suppose that b′ and b″ are two arbitrary parameter vectors in the constraint set, and let f∗(b′) = f(x∗(b′)) and f∗(b″) = f(x∗(b″)), with gⱼ(x∗(b′)) ≤ bⱼ′ and gⱼ(x∗(b″)) ≤ bⱼ″ for j = 1, ..., m. Let λ ∈ [0, 1]. Corresponding to the vector λb′ + (1 − λ)b″, there exists an optimal solution x∗(λb′ + (1 − λ)b″). Let x̄ = λx∗(b′) + (1 − λ)x∗(b″). By the convexity of each gⱼ,

gⱼ(x̄) ≤ λgⱼ(x∗(b′)) + (1 − λ)gⱼ(x∗(b″)) ≤ λbⱼ′ + (1 − λ)bⱼ″,

so x̄ is feasible for the parameter vector λb′ + (1 − λ)b″. Hence,

f(x̄) ≤ f(x∗(λb′ + (1 − λ)b″)) = f∗(λb′ + (1 − λ)b″).

On the other hand, concavity of f implies that

f(x̄) ≥ λf(x∗(b′)) + (1 − λ)f(x∗(b″)) = λf∗(b′) + (1 − λ)f∗(b″).

In sum,

f∗(λb′ + (1 − λ)b″) ≥ λf∗(b′) + (1 − λ)f∗(b″).

This shows that f∗(b) is concave.
6.5 Constraint Qualifications

Let J = {j | gⱼ(x∗) = 0} be the set of constraints active at x∗, with k members g¹, ..., gᵏ after relabeling. The constraint qualification requires that the corresponding Jacobian have full rank:

k = Rank[Dg_k(x∗)] = Rank [ ∂g¹(x∗)/∂x₁ ⋯ ∂g¹(x∗)/∂xₙ ]
                          [      ⋮      ⋱      ⋮      ]
                          [ ∂gᵏ(x∗)/∂x₁ ⋯ ∂gᵏ(x∗)/∂xₙ ]
Consider the Lagrangian L(x) = f(x) − Σⱼ₌₁ᵏ λⱼgⱼ(x). Under the rank condition, one can choose y with ∇gⱼ(x∗)y = 0 for j ≠ 1 and ∇g¹(x∗)y = −1, so that

0 = ∇f(x∗)y − Σⱼ₌₁ᵏ λⱼ∇gⱼ(x∗)y = ∇f(x∗)y + λ₁.

Since ∇f(x∗)y ≤ 0, we conclude that λ₁ ≥ 0. A similar argument shows that λⱼ ≥ 0 for j = 1, ..., k. This completes the proof.
Theorem 6.12 (Kuhn-Tucker N&S Conditions) Assume that a feasible vector x∗ and a set of multipliers λ₁, ..., λₘ satisfy the Kuhn-Tucker necessary conditions (∗∗) and (∗∗∗) for the constrained maximization problem. Define J = {j | gⱼ(x∗) = 0}, the set of active (binding) constraints, and assume that λⱼ > 0 for all j ∈ J. Consider the Lagrangian problem

max f(x) subject to gⱼ(x) = 0, j ∈ J.

Then, x∗ satisfies

∇L(x∗) = ∇f(x∗) − Σ_{j∈J} λⱼ∇gⱼ(x∗) = 0.
For feasible points yₖ → x∗ with directions hₖ → h, we must have Dgⱼ(x∗)h ≤ 0 for each j ∈ J, because Dgⱼ(x∗)hₖ is linear and continuous in hₖ and gⱼ(yₖ) ≤ gⱼ(x∗) = 0 for each k large enough.

If ∇gⱼ(x∗)h = 0 for all j ∈ J, then the proof goes through just as in Theorem 6.8. Therefore, suppose there exists at least one j ∈ J such that ∇gⱼ(x∗)h < 0. Then, since λⱼ > 0 for all j ∈ J, we obtain −Σ_{j∈J} λⱼDgⱼ(x∗)h > 0, and hence

∇f(x∗)h − Σ_{j∈J} λⱼDgⱼ(x∗)h > ∇f(x∗)h,

which is impossible because the left-hand side equals [∇f(x∗) − Σ_{j∈J} λⱼ∇gⱼ(x∗)]h = 0 while ∇f(x∗)h ≥ 0 along such directions. Hence ∇f(x∗) − Σ_{j∈J} λⱼ∇gⱼ(x∗) = 0.
6.6 Nonnegativity Constraints

Suppose that, in addition to gⱼ(x) ≤ 0, j = 1, ..., m, we impose the nonnegativity constraints x₁ ≥ 0, ..., xₙ ≥ 0, written as

g^{m+1}(x) = −x₁ ≤ 0, ..., g^{m+n}(x) = −xₙ ≤ 0.

We introduce the Lagrange multipliers μ₁, ..., μₙ to go with the new constraints and form the extended Lagrangian

L₁(x) = f(x) − Σⱼ₌₁ᵐ λⱼgⱼ(x) − Σᵢ₌₁ⁿ μᵢ(−xᵢ).

The Kuhn-Tucker conditions for the extended problem are

∂f(x∗)/∂xᵢ − Σⱼ₌₁ᵐ λⱼ ∂gⱼ(x∗)/∂xᵢ + μᵢ = 0, i = 1, ..., n,
λⱼ ≥ 0, and λⱼ = 0 if gⱼ(x∗) < 0, j = 1, ..., m,
μᵢ ≥ 0, and μᵢ = 0 if xᵢ∗ > 0, i = 1, ..., n.

Eliminating the μᵢ, these conditions can be written compactly as

∂f(x∗)/∂xᵢ − Σⱼ₌₁ᵐ λⱼ ∂gⱼ(x∗)/∂xᵢ ≤ 0 (= 0 if xᵢ∗ > 0), i = 1, ..., n,

since μᵢ = −[∂f(x∗)/∂xᵢ − Σⱼ₌₁ᵐ λⱼ ∂gⱼ(x∗)/∂xᵢ] for each i = 1, ..., n.
6.7
6.8 Quasiconcave Programming
The following theorem is important for economists because, in many economic optimization problems, the objective function is assumed to be quasiconcave rather than concave.
Theorem 6.15 (Arrow and Enthoven (1961), Econometrica) (Sufficient Conditions for Quasiconcave Programming): Consider the constrained optimization problem where the objective function f(·) is C¹ and quasiconcave. Assume that there exist numbers λ₁, ..., λₘ and a vector x∗ such that

1. x∗ is feasible and satisfies the Kuhn-Tucker conditions;
2. ∇f(x∗) ≠ 0;
3. λⱼgⱼ(x) is quasiconvex for each j = 1, ..., m.

Then, x∗ is optimal.
6.9
Chapter 7
Differential Equations
7.1 Introduction

Consider the first-order differential equation

x′(t) = ax(t),  (∗)

where a is some constant. I propose here x(t) = Keᵃᵗ as a solution to the differential equation; indeed, x′(t) = aKeᵃᵗ = ax(t).
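One can confirm this proposal symbolically (a sketch assuming sympy):

import sympy as sp

t, a, K = sp.symbols('t a K')
x = K * sp.exp(a*t)
print(sp.simplify(x.diff(t) - a*x))                # 0: x(t) = K e^{at} solves x' = ax

xf = sp.Function('x')
print(sp.dsolve(xf(t).diff(t) - a*xf(t), xf(t)))   # x(t) = C1*exp(a*t)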
Chapter 8

Fixed Point Theorems

8.1 The Contraction Mapping Theorem

We use this norm in the proof of the implicit function theorem (Theorem 5.17). Let S ⊆ ℝⁿ be closed and let f : S → S be a contraction: there is a β ∈ (0, 1) such that ‖f(x) − f(y)‖ ≤ β‖x − y‖ for all x, y ∈ S. Choose any x₀ ∈ S and let xₖ = f(xₖ₋₁). If the sequence {xₖ} has a limit x∗, then x∗ ∈ S because S is closed, and f(x∗) = x∗. Therefore, it suffices to prove that {xₖ} has a limit. We use the Cauchy criterion. Pick q > p. Then,
‖x_q − x_p‖ = ‖Σ_{k=p}^{q−1} (x_{k+1} − x_k)‖ ≤ Σ_{k=p}^{q−1} ‖x_{k+1} − x_k‖   (Minkowski inequality)

But

‖x_{k+1} − x_k‖ = ‖f(x_k) − f(x_{k−1})‖ ≤ β‖x_k − x_{k−1}‖.

Repeated application of the above yields

‖x_{k+1} − x_k‖ ≤ βᵏ‖x₁ − x₀‖.

Hence

‖x_q − x_p‖ ≤ Σ_{k=p}^{q−1} βᵏ‖x₁ − x₀‖ ≤ ‖x₁ − x₀‖(βᵖ + βᵖ⁺¹ + ⋯) = ‖x₁ − x₀‖ βᵖ/(1 − β) → 0 as p, q → ∞,

because βᵖ → 0 as p → ∞ due to β < 1.
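The argument is constructive: iterating f from any starting point converges to the fixed point. A minimal numerical sketch, with a map of my own choosing (|f′| ≤ 1/2 < 1 on ℝ):

import numpy as np

f = lambda x: 0.5 * np.cos(x)     # a contraction with beta = 1/2
x = 0.0
for _ in range(60):               # the iteration x_k = f(x_{k-1})
    x = f(x)
print(x, f(x))                    # x and f(x) agree: an approximate fixed point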
8.2 Brouwer's Fixed Point Theorem
Lemma 8.1 If f : [0, 1] → [0, 1] is continuous, there exists x ∈ [0, 1] such that f(x) = x.
Proof of Lemma 8.1: Each x ∈ [0, 1] can be represented as a convex combination of the end points of the interval:

x = (1 − x) · 0 + x · 1.

The same will be true for f(x). So we express each x ∈ [0, 1] as a pair of nonnegative numbers (x₁, x₂) = (1 − x, x) that add to one. When expressing f(x) in this way, we will write it as (f₁(x), f₂(x)) = (1 − f(x), f(x)). Suppose, for a contradiction, that f has no fixed point.
Since f : [0, 1] → [0, 1], we can think of the function f(·) as moving each point x ∈ [0, 1] either to the right (if f(x) > x) or to the left (if f(x) < x). The assumption that f(·) has no fixed point eliminates the possibility that f(·) leaves the position of x unchanged.
Given any x ∈ [0, 1], we label it with a (+) if f₁(x) < x₁ (move to the right) and label it with a (−) if f₁(x) > x₁ (move to the left). The assumption of no fixed point implies f₁(x) ≠ 1 − x for all x ∈ [0, 1]. Thus, the labeling scheme is well defined. Notice that the point 0 will be labeled (+) and the point 1 will be labeled (−).
Choose any finite partition, Π₀, of the interval [0, 1] into smaller intervals.

Claim 8.1 The partition Π₀ must contain a subinterval [x⁰, y⁰] whose endpoints have different labels.
Proof of Claim 8.1: Every endpoint of these subintervals is labeled either (+) or (−). The point 0, which must be an endpoint of a subinterval of Π₀, has label (+). The point 1 has label (−). As we travel from 0 to 1 (left to right), we leave a point labeled (+) and arrive at a point labeled (−). At some point, we must pass through a subinterval whose endpoints have different labels.

Now take the partition Π₀ and form a new partition Π₁, finer than the first, by taking all the subintervals in Π₀ whose endpoints have different labels and cutting them in half. In Π₁, there must exist at least one subinterval, [x¹, y¹], with endpoints having different labels. Repeat this procedure indefinitely.
This produces an infinite sequence of subintervals {(xᵏ, yᵏ)} shrinking in size with different labels at the endpoints. Furthermore, we can choose a subsequence of them so that the left-hand endpoint, xᵏ, is labeled (+) and the right-hand endpoint, yᵏ, is labeled (−). Since these intervals live in [0, 1], their lengths are bounded. Therefore, by the Bolzano-Weierstrass theorem (Theorem 3.13), there is a convergent subsequence of them, with |xᵏ − yᵏ| → 0 as k → ∞. By continuity of f(·), |f(xᵏ) − f(yᵏ)| → 0 as k → ∞.

Let z be the limit point of {xᵏ} and {yᵏ}. By continuity, f(xᵏ) and f(yᵏ) both converge to f(z). Since each xᵏ is labeled (+) and each yᵏ is labeled (−), for each k we have f₁(xᵏ) < x₁ᵏ, and in the limit f₁(z) ≤ z₁; for each k we have f₁(yᵏ) > y₁ᵏ, and in the limit f₁(z) ≥ z₁. Thus, f₁(z) = z₁, i.e., f(z) = z, a fixed point. This is a contradiction.
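The halving argument in the proof is effectively an algorithm. A sketch for a continuous map with f(0) > 0 and f(1) < 1 (the situation of the lemma; the test function is my own):

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    # keep a bracket whose left end is labeled (+) (f moves it right) and
    # whose right end is labeled (-) (f moves it left); halve repeatedly
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(fixed_point(lambda x: (x + 1) / 3))   # 0.5, since x = (x+1)/3 at x = 1/2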
Definition 8.2 The n-simplex is the set Δⁿ = {x ∈ ℝⁿ | Σᵢ₌₁ⁿ xᵢ = 1 and xᵢ ≥ 0 for all i = 1, ..., n}.
From the definition, Δⁿ is convex and compact. We also see that this is an (n − 1)-dimensional object.
Lemma 8.2 If f : Δⁿ → Δⁿ is a continuous function, then there exists x ∈ Δⁿ such that f(x) = x.
We skip the proof of Lemma 8.2. We should note that Lemma 8.1 is a special case of Lemma 8.2. Before proving Brouwer's fixed point theorem, we need some preliminaries.
Definition 8.3 A set A is topologically equivalent to a set B if there exists a continuous function g with continuous inverse such that g(A) = B and g⁻¹(B) = A.
Observe that topological equivalence is a weaker requirement than that for the inverse function theorem. Do you see why? The closed n-ball of center x⁰ in ℝⁿ is the set {x ∈ ℝⁿ | d(x, x⁰) ≤ 1}. Note that a closed n-ball is of dimension n.
Theorem 8.2 A nonempty, compact, convex set S ⊆ ℝⁿ of dimension m ≤ n is topologically equivalent to a closed ball in ℝᵐ.
We skip the proof of Theorem 8.2. Now, it is time to prove Brouwer's fixed point theorem.

Theorem 8.3 (Brouwer Fixed Point Theorem (1912)) If S ⊆ ℝⁿ is compact and convex and f : S → S is continuous, there exists x ∈ S such that f(x) = x.
Proof of the Brouwer Fixed Point Theorem: Lemma 8.2 shows that a continuous function f : Δⁿ → Δⁿ must have a fixed point. Then, it only remains to prove that there is no loss of generality in taking S = Δⁿ rather than an arbitrary compact convex set of dimension n ≥ 1 in ℝⁿ. To do so, we make use of the topological equivalence of compact convex sets.

If S is a compact convex set of dimension n ≥ 1, we know from Theorem 8.2 that there are continuous maps g : S → Δⁿ and g⁻¹ : Δⁿ → S. Define h : Δⁿ → Δⁿ as follows:

h(x) = g(f(g⁻¹(x))).

Since h(·) is continuous, by Lemma 8.2 it has a fixed point x∗. Therefore, h(x∗) = g(f(g⁻¹(x∗))) = x∗, which gives f(g⁻¹(x∗)) = g⁻¹(x∗). Thus, g⁻¹(x∗) ∈ S is a fixed point of f.
Chapter 9
Separation Theorems
Let S be a convex set, let y be a point not in S, let w be the point of S closest to y, and set a = y − w. We claim that

a · x ≤ a · w for every x ∈ S.  (∗)

Then, the theorem is true for every number α ∈ (a · w, a · y). Now, it remains to show (∗). Let x be any point in S. Since S is convex, λx + (1 − λ)w ∈ S for each λ ∈ [0, 1]. Now define g(λ) as the square of the distance from λx + (1 − λ)w to the point y:

g(λ) = ‖y − (λx + (1 − λ)w)‖² = ‖y − w + λ(w − x)‖².
Proof of Theorem 9.2: Let S̄ be the closure of S. Because S is convex, so is S̄. Do you see why? Because y is not an interior point of S and S is convex, y is not an interior point of S̄. Hence, there is a sequence {yₖ} of points for which yₖ ∉ S̄ for each k and yₖ → y as k → ∞.
λ(s − t) + (1 − λ)(s′ − t′) = λs − λt + (1 − λ)s′ − (1 − λ)t′
                            = [λs + (1 − λ)s′] − [λt + (1 − λ)t′].  (∗)
From (∗) it follows that the set A = {a · x | x ∈ S} is bounded above by a · y for any y ∈ T. By Fact 2.1 (Least Upper Bound Principle), A has a supremum α. Since α is the least upper bound of A, it follows that α ≤ a · y for every y ∈ T. Therefore,

a · x ≤ α ≤ a · y for all x ∈ S and all y ∈ T.

Thus, S and T are separated by the hyperplane {z ∈ ℝⁿ | a · z = α}.

Theorem 9.4 Let S and T be two disjoint, nonempty, closed, convex sets in ℝⁿ with S being bounded. Then, there exist a nonzero vector a ∈ ℝⁿ and a scalar α ∈ ℝ such that

a · x > α > a · y for all x ∈ S and all y ∈ T.
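For a point and a closed convex set, the separating vector can be taken as a = y − w, where w is the projection of y onto the set. A numerical sketch with the closed unit ball (my own example, assuming numpy):

import numpy as np

y = np.array([3.0, 1.0])                 # a point outside the closed unit ball S
w = y / np.linalg.norm(y)                # projection of y onto S
a = y - w                                # normal of a separating hyperplane
alpha = 0.5 * (a @ w + a @ y)            # any alpha in (a.w, a.y) works
print(a @ w < alpha < a @ y)             # True; and a.x <= a.w for all x in S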
9.2
9.3 Dimension of a Set
9.4