
by Takashi Kunimoto

First Version: August 9, 2007

This Version: May 18, 2010

Summer 2010, Department of Economics, McGill University

August 16 - 27 (tentative): Monday - Friday, 10:00am - 1:00pm; at TBA

Instructor: Takashi Kunimoto

Email: takashi.kunimoto@mcgill.ca

Class Web: http://people.mcgill.ca/takashi.kunimoto/?View=Publications

Office: Leacock 438

COURSE DESCRIPTION: This course is designed to provide you with mathematical tools that are extensively used in graduate economics courses. The topics to be covered are:

- Sets and Functions;
- Topology in the Euclidean Space;
- Linear Algebra;
- Multivariate Calculus;
- Static Optimization;
- (Optional) Correspondences and Fixed Points; and
- (Optional) First-Order Differential Equations in One Variable.

A good comprehension of the material covered in these notes is essential for successful graduate studies in economics. Since we are seriously time constrained (which you might not believe), it would be very useful for you to keep one of the books listed below as a reference after you start graduate school in September.

READING:

The main textbook for this course is Further Mathematics for Economic Analysis, and I mostly use this book for the course. However, if you don't find the main textbook helpful enough, I strongly recommend that you buy at least one of the other books listed below in addition to Further Mathematics for Economic Analysis. Of course, you can buy any math book you find useful.

¹ I am thankful to the students for their comments, questions, and suggestions. Yet, I believe that there are still many errors in this manuscript. Of course, all remaining ones are my own.

- Further Mathematics for Economic Analysis, by Knut Sydsaeter, Peter Hammond, Atle Seierstad, and Atle Strom, Prentice Hall, 2005. (Main textbook. If you don't have any math book or are not confident about your math skills, this book will help you a lot.)
- Mathematical Appendix, in Advanced Microeconomic Theory, Second Edition, by Geoffrey A. Jehle and Philip J. Reny, Addison Wesley, 2000. (Supplementary. This is the main textbook for Econ 610, but its mathematical appendix is often too concise.)
- Mathematics for Economists, by Simon and Blume, Norton, 1994. (Supplementary. This book is a popular math book in many Ph.D. programs in economics. There has to be a reason for that, although I don't know the true one.)
- Fundamental Methods of Mathematical Economics, by A. Chiang, McGraw-Hill. (More elementary and supplementary.)
- Introductory Real Analysis, by A. N. Kolmogorov and S. V. Fomin, Dover Publications. (Very advanced and supplementary. If you really like math, this is the book for you.)

OFFICE HOURS: Wednesday and Friday, 2:00pm - 3:00pm

PROBLEM SETS: There will be several problem sets. Problem sets are essential to help you understand the course and to develop your skills in analyzing economic problems.

ASSESSMENT: No grade will be assigned. However, you are expected to do the assigned problem sets.

Contents

1 Introduction

2 Preliminaries
  2.1 Logic
    2.1.1 Necessity and Sufficiency
    2.1.2 Theorems and Proofs
  2.2 Set Theory
  2.3 Relations
    2.3.1 Preference Relations
  2.4 Functions
    2.4.1 Least Upper Bound Principle

3 Topology in R^n
  3.1 Sequences on R
    3.1.1 Subsequences
    3.1.2 Cauchy Sequences
    3.1.3 Upper and Lower Limits
    3.1.4 Infimum and Supremum of Functions
    3.1.5 Indexed Sets
  3.2 Point Set Topology in R^n
  3.3 Topology and Convergence
  3.4 Properties of Sequences in R^n
  3.5 Continuous Functions

4 Linear Algebra
  4.1 Basic Concepts in Linear Algebra
  4.2 Determinants and Matrix Inverses
    4.2.1 Determinants
    4.2.2 Matrix Inverses
    4.2.3 Cramer's Rule
  4.3 Vectors
  4.4 Linear Independence
    4.4.1 Linear Dependence and Systems of Linear Equations
  4.5 Eigenvalues
    4.5.1 Motivations
    4.5.2 How to Find Eigenvalues
  4.6 Diagonalization
  4.7 Quadratic Forms
  4.8 Appendix 1: Farkas' Lemma
    4.8.1 Preliminaries
    4.8.2 Fundamental Theorem of Linear Algebra
    4.8.3 Linear Inequalities
    4.8.4 Non-Negative Solutions
    4.8.5 The General Case
  4.9 Appendix 2: Linear Spaces
    4.9.1 Number Fields
    4.9.2 Definitions
    4.9.3 Bases, Components, Dimension
    4.9.4 Subspaces
    4.9.5 Morphisms of Linear Spaces

5 Calculus
  5.1 Functions of a Single Variable
  5.2 Real-Valued Functions of Several Variables
  5.3 Gradients
  5.4 The Directional Derivative
  5.5 Convex Sets
    5.5.1 Upper Contour Sets
  5.6 Concave and Convex Functions
  5.7 Concavity/Convexity for C^2 Functions
    5.7.1 Jensen's Inequality
  5.8 Quasiconcave and Quasiconvex Functions
  5.9 Total Differentiation
    5.9.1 Linear Approximations and Differentiability
  5.10 The Inverse of a Transformation
  5.11 Implicit Function Theorems

6 Static Optimization
  6.1 Unconstrained Optimization
    6.1.1 Extreme Points
    6.1.2 Envelope Theorems for Unconstrained Maxima
    6.1.3 Local Extreme Points
    6.1.4 Necessary Conditions for Local Extreme Points
    6.2.5 … for Local Extreme Points
  6.2 Constrained Optimization
    6.2.1 Equality Constraints: The Lagrange Problem
    6.2.2 Lagrange Multipliers as Shadow Prices
    6.2.3 Tangent Hyperplane
    6.2.4 Local First-Order Necessary Conditions
    6.2.6 Envelope Result for Lagrange Problems
  6.3 Inequality Constraints: Nonlinear Programming
  6.4 Properties of the Value Function
  6.5 Constraint Qualifications
  6.6 Nonnegativity Constraints
  6.7 Concave Programming Problems
  6.8 Quasiconcave Programming
  6.9 Appendix: Linear Programming

7 Differential Equations
  7.1 Introduction

8 Fixed Point Theorems
  8.1 Banach Fixed Point Theorem
  8.2 Brouwer Fixed Point Theorem

9 Topics on Convex Sets
  9.1 Separation Theorems
  9.2 Polyhedrons and Polytopes
  9.3 Dimension of a Set
  9.4 Properties of Convex Sets

Chapter 1

Introduction

I start my lecture with Rakesh Vohra's message about what economic theory is. He is a professor at Northwestern University.¹

All of economic theorizing reduces, in the end, to the solution of one of three problems. Given a function f and a set S:

1. Find an x such that f(x) is in S. This is the feasibility question.

2. Find an x in S that optimizes f(x). This is the problem of optimality.

3. Find an x in S such that f(x) = x. This is the fixed point problem.

These three problems are, in general, quite difficult. However, if one is prepared to make assumptions about the nature of the underlying function (say it is linear, convex, or continuous) and the nature of the set S (convex, compact, etc.), it is possible to provide answers, and very nice ones at that.

I think this is the biggest picture of economic theory you could have as you go through this course. Whenever you are at a loss, please come back to this message.

We build our theory on individuals. Assume that all commodities are traded in centralized markets. Throughout Econ 610 and 620, we assume that each individual (consumer and firm) takes prices as given. We call this the price-taking behavior assumption. You might ask why individuals are price takers. My answer would be: why not? Let us go as far as we can with this behavioral assumption and thereafter try to see its limitations. However, you will have to wait for Econ 611 and 621 to learn how to relax this assumption. So, stick with it. For each consumer, we want to know:

1. What is the set of physically feasible bundles? Is there any such bundle at all (feasibility)? We call this set the consumption set.

2. What is the set of financially feasible bundles? Is there any such bundle at all (feasibility)? We call this set the budget set.

3. What is the best bundle for the consumer among all feasible bundles (optimality)? We call this bundle the consumer's demand.

We can make an exactly parallel argument for the firm. What is the set of technically feasible inputs (feasibility)? We call this the production set of the firm. What is the best combination of inputs to maximize its profit (optimality)? We call this the firm's supply. Once we figure out the feasible and best choices of each consumer and each firm under any possible circumstance, we want to know whether there is any coherent state of affairs in which everybody makes her best choice. In particular, all markets must clear. We call this coherent state a competitive (Walrasian) equilibrium (a fixed point).

If we move from microeconomics to macroeconomics, we must pay special attention to time. Now, each individual's budget set does depend upon time. At each point in time, he can change his asset portfolio so as to smooth out his consumption plan and/or production plan over time. If you know exactly when you will die, there is no problem: you just leave no money when you die, unless you want to leave some money to your kids (i.e., altruistic preferences). This is called the finite time horizon problem. What if you might live longer than you expected, with no money left? Then what do you do? In reality, you don't know exactly when you will die. This situation can be formulated as the infinite time horizon problem. Do you see why? To deal with the infinite horizon problem, we use the transversality condition as the terminal condition of the feasible set. Moreover, the consumer's optimization must take time into account. One can also analogously define a sequence of competitive equilibria of the economy.

How can we summarize what we discussed above? Given a (per capita) consumption stream {c_t}_{t=0}^∞, (per capita) capital accumulation {k_t}_{t=0}^∞, (per capita) GDP stream {f(k_t)}_{t=0}^∞, capital depreciation rate δ, population growth rate n, (per capita) consumption growth g, instantaneous utility function of the representative consumer u(·), effective discount rate of the representative consumer ρ > 0, (per capita) wage profile {w_t}_{t=0}^∞, and capital interest rate profile {r_t}_{t=0}^∞:

1. Find a {c_t}_{t=0}^∞ such that k̇_t = f(k_t) − c_t − (δ + g + n)k_t holds at each t ≥ 1, where k_0 > 0 is exogenously given. This is the feasibility question. Any such {c_t} is called a feasible consumption stream.

2. Find a feasible consumption stream {c_t}_{t=0}^∞ that maximizes V_0 = ∫_0^∞ e^(−ρt) u(c_t) dt. This is the problem of optimality. I assume that V_0 < ∞.

3. Find a {r_t, w_t}_{t=0}^∞ such that V_0 (the planner's optimum) is sustained through market economies in which k̇_t = (r_t − n − g)k_t + w_t − c_t holds at each t ≥ 1 and the condition lim_{t→∞} λ_t e^(−ρt) k_t = 0 holds. This latter condition is sometimes called the transversality condition. This is the fixed point problem. This, in fact, can be done by choosing r_t = f′(k_t) − δ and w_t = f(k_t) − f′(k_t)k_t at each t ≥ 1.
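To make the feasibility check in item 1 concrete, here is a minimal discrete-time sketch in Python (my own illustration, not part of the notes; the production function f(k) = k^0.3 and all parameter values are hypothetical):

    # Discrete-time version of the feasibility condition in item 1:
    # k_{t+1} = k_t + f(k_t) - c_t - (delta + g + n) * k_t,
    # with a candidate constant consumption level c.
    def f(k):
        return k ** 0.3  # hypothetical per-capita production function

    delta, g, n = 0.05, 0.02, 0.01
    k, c = 1.0, 0.5      # exogenous k_0 > 0 and a candidate constant c

    feasible = True
    for t in range(200):
        k = k + f(k) - c - (delta + g + n) * k
        if k <= 0:       # capital exhausted: {c_t} is not feasible
            feasible = False
            break

    print("feasible" if feasible else "not feasible")

If c is set too high (say c = 2.0), capital is run down to zero in finite time and the constant stream fails the feasibility test.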


With appropriate re-interpretations, the above is exactly what we had in the beginning, except for the transversality condition, which is a genuine feature of macroeconomics.

Chapter 2

Preliminaries

2.1 Logic

Theorems provide a compact and precise format for presenting the assumptions and

important conclusions of sometimes lengthy arguments, and so help identify immediately

the scope and limitations of the result presented. Theorems must be proved and a proof

consists of establishing the validity of the statement in the theorem in a way that is

consistent with the rules of logic.

2.1.1 Necessity and Sufficiency

Consider any two statements, p and q. When we say "p is necessary for q", we mean that p must be true for q to be true. For q to be true requires p to be true, so whenever q is true, we know that p must also be true. So we might have said, instead, that p is true if q is true, or simply that p is implied by q (q ⇒ p).

Suppose we know that q ⇒ p is a true statement. What if p is not true? Because p is necessary for q, when p is not true, then q cannot be true either. But doesn't this just say that "q not true" is necessary for "p not true"? Or that not-q is implied by not-p (¬p ⇒ ¬q)? This latter form of the original statement is called the contrapositive form.

Let's consider a simple illustration of these ideas. Let p be the statement "x is an integer less than 10". Let q be the statement "x is an integer less than 8". Clearly, p is necessary for q (q ⇒ p). If we form the contrapositive of these two statements, ¬p becomes "x is not an integer less than 10" and ¬q becomes "x is not an integer less than 8". Then, observe that ¬p ⇒ ¬q. However, p ⇒ q is false: the value of x could well be 9.

The notion of necessity is distinct from that of sufficiency. When we say "p is sufficient for q", we mean that whenever p holds, q must hold. We can say that p is true only if q is true, or that p implies q (p ⇒ q). Once again: whenever the statement p ⇒ q is true, p cannot hold without q holding as well.

Two implications, p ⇒ q and q ⇒ p, can both be true. When this is so, I say that p is necessary and sufficient for q, or "p is true if and only if q is true", or "p iff q". When p is necessary and sufficient for q, we say that the statements p and q are equivalent and write p ⇔ q.

To illustrate briefly, suppose that p and q are the following statements:

p: "X is yellow";
q: "X is a lemon".

Certainly, if X is a lemon, then X is yellow. Here, p is necessary for q. At the same time, just because X is yellow does not mean that it must be a lemon. It could be a banana. So p is not sufficient for q.

2.1.2 Theorems and Proofs

A theorem is a statement in which one or more statements are alleged to be related in particular ways. Suppose we have the theorem p ⇒ q. Here, p is the assumption and q is the conclusion. To prove a theorem is to establish the validity of its conclusion given the truth of its assumption, and several methods can be used to do that.

1. In a constructive proof, we assume that p is true, deduce various consequences of that, and use them to show that q must also hold. This is also sometimes called a direct proof, for obvious reasons.

2. In a contrapositive proof, we assume that q does not hold, then show that p cannot hold. This approach takes advantage of the logical equivalence between the claims p ⇒ q and ¬q ⇒ ¬p noted earlier, and essentially involves a constructive proof of the contrapositive of the original statement.

3. In a proof by contradiction, the strategy is to assume that p is true, assume that q is not true, and attempt to derive a logical contradiction. This approach relies on the fact that either "p ⇒ q" or "¬(p ⇒ q)" is true, and if "¬(p ⇒ q)" is false, then "p ⇒ q" must be true.

4. In a proof by mathematical induction, I have a statement H(k) which depends upon a natural number k. What I want to show is that H(k) is true for each k = 1, 2, .... First, I show that H(1) is true. This step is usually easy to establish. Next, I show that H(k) ⇒ H(k+1), i.e., if H(k) is true, then H(k+1) is also true. These two steps allow me to claim that I am done.
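As a brief worked illustration of induction (my own toy example, not from the notes), take H(k) to be the claim that 1 + 2 + ··· + k = k(k+1)/2:

    % Base step:
    H(1):\quad 1 = \tfrac{1 \cdot 2}{2}.
    % Induction step, using H(k) on the bracketed sum:
    H(k) \Rightarrow H(k+1):\quad
    (1 + 2 + \cdots + k) + (k+1)
      = \tfrac{k(k+1)}{2} + (k+1)
      = \tfrac{(k+1)(k+2)}{2}.

By the two steps, H(k) holds for every k = 1, 2, ....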

If I assert that p is necessary and sufficient for q, or that p ⇔ q, we must give a proof in both directions. That is, both p ⇒ q and q ⇒ p must be established before a complete proof of the assertion has been achieved.

It is important to keep in mind the old saying that goes, "Proof by example is no proof." Suppose the following two statements are given:

p: "x is a student";
q: "x has red hair".

Assume further that we make the assertion p ⇒ q. Then clearly finding one student with red hair and pointing him out to you is not going to convince you of anything. Examples are good for illustrating but typically not for proving.

Finally, a sort of converse to the old saying about examples and proofs should be noted. Whereas citing a hundred examples can never prove that a certain property always holds, citing one solitary counterexample can disprove that the property always holds. For instance, to disprove the assertion about the color of students' hair, you need simply point out one student with brown hair. A counterexample proves that the claim cannot always be true because you have found at least one case where it is not.

2.2 Set Theory

A set is any collection of elements. Sets of objects will usually be denoted by capital letters, A, S, T for example, and their members by lowercase letters, a, s, t for example (English or Greek). A set S is a subset of another set T if every element of S is also an element of T. We write S ⊆ T. If S ⊆ T, then x ∈ S ⇒ x ∈ T. The set S is a proper subset of T if S ⊆ T and S ≠ T; sometimes one writes S ⊂ T in this case. Two sets are equal if they each contain exactly the same elements. We write S = T whenever x ∈ S ⇒ x ∈ T and x ∈ T ⇒ x ∈ S. The number of elements in a set S, its cardinality, is denoted |S|. The upside-down A, ∀, means "for all", while the backward E, ∃, means "there exists".

A set S is empty, or is an empty set, if it contains no elements at all. It is a subset of every set. For example, if A = {x | x² = 0, x > 1}, then A is empty. We denote the empty set by the symbol ∅. The complement of a set S in a universal set U is the set of all elements in U that are not in S and is denoted S^c. For any two sets S and T in a universal set U, we define the set difference, denoted S\T, as all elements in the set S that are not elements of T. Thus, we can think of S^c = U\S. The symmetric difference SΔT = (S\T) ∪ (T\S) is the set of all elements that belong to exactly one of the sets S and T. Note that if S = T, then SΔT = ∅.

For two sets S and T, we define the union of S and T as the set S ∪ T ≡ {x | x ∈ S or x ∈ T}. We define the intersection of S and T as the set S ∩ T ≡ {x | x ∈ S and x ∈ T}. Let Λ ≡ {1, 2, 3, ...} be an index set. Instead of writing {S₁, S₂, S₃, ...}, we can write the sets in the collection as {S_λ}_{λ∈Λ}. We denote the union of all sets in the collection by ∪_{λ∈Λ} S_λ, and the intersection of all sets in the collection by ∩_{λ∈Λ} S_λ.

The following are some important identities involving the operations defined above.

A ∪ B = B ∪ A,  (A ∪ B) ∪ C = A ∪ (B ∪ C),  A ∪ ∅ = A
A ∩ B = B ∩ A,  (A ∩ B) ∩ C = A ∩ (B ∩ C),  A ∩ ∅ = ∅
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),  A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)  (distributive laws)
A\(B ∪ C) = (A\B) ∩ (A\C),  A\(B ∩ C) = (A\B) ∪ (A\C)  (De Morgan's laws)
AΔB = BΔA,  (AΔB)ΔC = AΔ(BΔC),  AΔ∅ = A
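The finite-set versions of these identities are easy to verify by machine; a quick Python sketch (mine, for illustration):

    # Verify De Morgan's laws on small finite sets
    A, B, C = {1, 2, 3, 4}, {2, 3}, {3, 5}
    assert A - (B | C) == (A - B) & (A - C)   # A\(B u C) = (A\B) n (A\C)
    assert A - (B & C) == (A - B) | (A - C)   # A\(B n C) = (A\B) u (A\C)
    print("De Morgan's laws hold for this example")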

The collection of all subsets of a set A is also a set, called the power set of A and denoted by P(A). Thus, B ∈ P(A) ⇔ B ⊆ A.

Example 2.1 Let A = {a, b, c}. Then P(A) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.
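A short sketch generating the power set of Example 2.1 (my illustration):

    from itertools import combinations

    def power_set(s):
        """All subsets of s, from the empty set up to s itself."""
        items = list(s)
        return [set(c) for r in range(len(items) + 1)
                for c in combinations(items, r)]

    print(power_set({'a', 'b', 'c'}))   # 2^3 = 8 subsets, as in Example 2.1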

The previous example reveals that the order of the elements in a set specification does not matter. In particular, {a, b} = {b, a}. However, on many occasions one is interested in distinguishing between the first and the second elements of a pair. One such example is the coordinates of a point in the xy-plane. These coordinates are given as an ordered pair (a, b) of real numbers. The important property of ordered pairs is that (a, b) = (c, d) if and only if a = c and b = d. The product of two sets S and T is the set of ordered pairs of the form (s, t), where the first element in the pair is a member of S and the second is a member of T. The product of S and T is denoted

S × T ≡ {(s, t) | s ∈ S, t ∈ T}.

The set of real numbers is denoted by the special symbol R and is defined as

R ≡ {x | −∞ < x < ∞}.

Any n-tuple, or vector, is just an n-dimensional ordered tuple (x₁, ..., x_n) and can be thought of as a point in n-dimensional Euclidean space. This space is defined as the set product

R^n ≡ R × ··· × R (n times) ≡ {(x₁, ..., x_n) | x_i ∈ R, i = 1, ..., n}.

Often we want to restrict our attention to a subset of R^n, called the nonnegative orthant and denoted R^n_+, where

R^n_+ ≡ {(x₁, ..., x_n) | x_i ≥ 0, i = 1, ..., n} ⊆ R^n.

Furthermore, we sometimes talk about the strictly positive orthant of R^n,

R^n_++ ≡ {(x₁, ..., x_n) | x_i > 0, i = 1, ..., n} ⊆ R^n_+.

2.3 Relations

Let S and T be sets. Any collection of ordered pairs (s, t), with s ∈ S and t ∈ T, is said to constitute a binary relation between the sets S and T. Many familiar binary relations are contained in the product of one set with itself. For example, let X be the closed unit interval, X = [0, 1]. Then the binary relation ≥ consists of all ordered pairs of numbers in X where the first one in the pair is greater than or equal to the second one. When, as here, a binary relation is a subset of the product of one set X with itself, we say that it is a relation on the set X. A binary relation R on X is represented by a subset of X × X, i.e., R ⊆ X × X. We can build more structure for a binary relation on some set by requiring that it possess certain properties.

Definition 2.1 A relation R on X is reflexive if xRx for all x ∈ X.

For example, ≥ and = on R are reflexive, while > is not.

Definition 2.2 A relation R on X is complete if, for all elements x and y in X, xRy or yRx.

For example, ≥ on R is complete, while > and = are not. Note that a relation R on X is reflexive if it is complete.

Definition 2.3 A relation R on X is transitive if, for any three elements x, y, and z ∈ X, xRy and yRz implies xRz.

For instance, ≥, =, and > on R are all transitive.

Definition 2.4 A relation R on X is symmetric if xRy implies yRx, and it is anti-symmetric if xRy and yRx imply x = y.

For example, = is symmetric, while ≥ and > are not. However, ≥ is anti-symmetric. A relation R is said to be a partial ordering on X if it is reflexive, transitive, and anti-symmetric. If a partial ordering is complete, it is called a linear ordering. For instance, the relation ≥ on R is a linear ordering.

For n ≥ 2, the less-than-or-equal-to relation ≤ on R^n is defined by (x₁, ..., x_n) ≤ (y₁, ..., y_n) if and only if x_k ≤ y_k for k = 1, ..., n. There is also a strict inequality relation ≪, which is given by (x₁, ..., x_n) ≪ (y₁, ..., y_n) if and only if x_k < y_k for all k = 1, ..., n.

2.3.1 Preference Relations

I now talk a little bit about economics. Here I apply the concept of relations to the consumer choice problem. The number of commodities is finite and equal to n. Each commodity is measured in some infinitely divisible units. Let x = (x₁, ..., x_n) ∈ R^n_+ be a consumption bundle. Let X ⊆ R^n_+ be the consumption set, that is, the set of bundles the consumer can conceive of. We represent the consumer's preferences by a binary relation ≽ on X: x ≽ x′ means that x is at least as good as x′ for this consumer.

Definition 2.5 The binary relation ≻ on X is said to be a strict preference relation if x ≻ x′ if and only if x ≽ x′ but not x′ ≽ x.

Definition 2.6 The binary relation ∼ on X is said to be an indifference relation if x ∼ x′ if and only if x ≽ x′ and x′ ≽ x.

Exercise 2.1 Show the following:

1. ≽ on R^n_+ is reflexive if it is complete.
2. ≻ on R^n_+ is not symmetric.
3. ∼ on R^n_+ is symmetric.

2.4 Functions

A function is a relation that associates each element of one set with a single, unique element of another set. We say that the function f is a mapping, map, or transformation from one set D to another set R and write f : D → R. We call the set D the domain and the set R the range of the mapping. If y is the point in the range mapped into by the point x in the domain, we write y = f(x). In set-theoretic terms, f is a relation from D to R with the property that for each x ∈ D, there is exactly one y ∈ R such that xfy (x is related to y via f).

The image of f is the set of points in the range into which some point in the domain is mapped, i.e.,

I ≡ {y | y = f(x) for some x ∈ D} ⊆ R.

The inverse image of a set of points S ⊆ I is defined as

f⁻¹(S) ≡ {x | x ∈ D, f(x) ∈ S}.

The graph of the function f is the set of ordered pairs

G ≡ {(x, y) | x ∈ D, y = f(x)}.

If f(x) = y, one also writes x ↦ y. The squaring function s : R → R, for example, can then be written as s : x ↦ x². Thus, ↦ indicates the effect of the function on an element of the domain. If f : A → B is a function and S ⊆ A, the restriction of f to S is the function f|_S defined by f|_S(x) = f(x) for every x ∈ S. There is nothing in the definition of a function that prohibits more than one element in the domain from being mapped into the same element in the range. If, however, every point in the range is assigned to at most a single point in the domain, the function is said to be one-to-one; that is, for all x, x′ ∈ D, whenever f(x) = f(x′), then x = x′. If the image is equal to the range (if for every y ∈ R, there is x ∈ D such that f(x) = y), the function is said to be onto. If a function is one-to-one and onto (sometimes called bijective), then an inverse function f⁻¹ : R → D exists that is also one-to-one and onto. The composition of a function f : A → B and a function g : B → C is the function g∘f : A → C given by (g∘f)(a) = g(f(a)) for all a ∈ A.

Exercise 2.2 Show that f(x) = x² is not a one-to-one mapping.

2.4.1 Least Upper Bound Principle

A set S of real numbers is bounded above if there exists a real number b such that b ≥ x for all x ∈ S. This number b is called an upper bound for S. A set that is bounded above has many upper bounds. A least upper bound for the set S is a number b* that is an upper bound for S and is such that b* ≤ b for every upper bound b. The existence of a least upper bound is a basic and non-trivial property of the real number system.

Fact 2.1 (Least Upper Bound Principle) Any nonempty set of real numbers that is bounded above has a least upper bound.

This principle is really an axiom of the real number system. A set S can have at most one least upper bound, because if b₁ and b₂ are both least upper bounds for S, then b₁ ≤ b₂ and b₂ ≤ b₁, which implies that b₁ = b₂. The least upper bound b* of S is often called the supremum of S. We write b* = sup S or b* = sup_{x∈S} x.

Example 2.2 The set S = (0, 5), consisting of all x such that 0 < x < 5, has many

upper bounds, some of which are 100, 6.73, and 5. Clearly no number smaller than 5

can be an upper bound, so 5 is the least upper bound. Thus, sup S = 5.

A set S is bounded below if there exists a real number a such that x ≥ a for all x ∈ S. The number a is a lower bound for S. A set S that is bounded below has a greatest lower bound a*, with the property a* ≤ x for all x ∈ S and a* ≥ a for all lower bounds a. The number a* is called the infimum of S, and we write a* = inf S or a* = inf_{x∈S} x. Thus, we summarize:

sup S = the least number greater than or equal to all numbers in S; and
inf S = the greatest number less than or equal to all numbers in S.

Theorem 2.1 Let S be a set of real numbers and b a real number. Then sup S = b if and only if the following two conditions are satisfied:

1. x ≤ b for all x ∈ S.
2. For each ε > 0, there exists an x ∈ S such that x > b − ε.

Proof of Theorem 2.1: (⇒) Suppose that sup S = b. Since b is an upper bound for S, property 1 holds, that is, x ≤ b for all x ∈ S. Suppose, on the other hand, that there is some ε > 0 such that x ≤ b − ε for all x ∈ S. Define b′ = b − ε. This implies that b′ is also an upper bound for S and b′ < b. This contradicts our hypothesis that b is a least upper bound for S. (⇐) Property 1 says that b is an upper bound for S. Suppose, on the contrary, that b is not a least upper bound. That is, there is some other b′ such that x ≤ b′ < b for all x ∈ S. Define ε = b − b′. Then we obtain that x ≤ b − ε for all x ∈ S. This contradicts property 2.
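A numerical illustration of the two conditions in Theorem 2.1 (my own sketch) for S = {1 − 1/n : n ∈ N}, whose supremum is 1 even though 1 ∉ S:

    # Theorem 2.1 for S = {1 - 1/n} with b = sup S = 1:
    # (1) every element is <= b; (2) for each eps > 0 some element exceeds b - eps
    b = 1.0
    S = [1 - 1 / n for n in range(1, 10_000)]
    assert all(x <= b for x in S)
    for eps in (0.5, 0.1, 0.001):
        assert any(x > b - eps for x in S)
    print("both conditions of Theorem 2.1 hold on this finite sample")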

Chapter 3

Topology in R^n

3.1 Sequences on R

A sequence is a function k ↦ x(k) whose domain is the set {1, 2, 3, ...} of all positive integers. I denote the set of natural numbers by N = {1, 2, ...}. The terms x(1), x(2), ..., x(k), ... of the sequence are usually denoted by using subscripts: x₁, x₂, ..., x_k, .... We shall use the notation {x_k}_{k=1}^∞, or simply {x_k}, to indicate an arbitrary sequence of real numbers. A sequence {x_k} of real numbers is said to be

1. nondecreasing if x_k ≤ x_{k+1} for k = 1, 2, ...;
2. strictly increasing if x_k < x_{k+1} for k = 1, 2, ...;
3. nonincreasing if x_k ≥ x_{k+1} for k = 1, 2, ...;
4. strictly decreasing if x_k > x_{k+1} for k = 1, 2, ....

A sequence that is nondecreasing or nonincreasing is called monotone. A sequence {x_k} is said to converge to a number x if x_k becomes arbitrarily close to x for all sufficiently large k. We write lim_{k→∞} x_k = x, or x_k → x as k → ∞. The precise definition of convergence is as follows:

Definition 3.1 The sequence {x_k} converges to x if for every ε > 0 there exists a natural number N such that |x_k − x| < ε for all k > N. The number x is called the limit of the sequence {x_k}. A convergent sequence is one that converges to some number.
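For instance, for x_k = 1/k → 0, the definition can be checked directly: given ε, any N ≥ 1/ε works. A small sketch (mine):

    # epsilon-N verification that x_k = 1/k converges to 0
    import math

    for eps in (0.1, 0.01, 0.001):
        N = math.ceil(1 / eps)                      # candidate N for this eps
        tail_ok = all(abs(1 / k - 0) < eps for k in range(N + 1, N + 1000))
        print(f"eps={eps}: N={N}, tail within eps: {tail_ok}")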

Note that the limit of a convergent sequence is unique. A sequence that does not converge to any real number is said to diverge. In some cases we use the notation lim_{k→∞} x_k even if the sequence {x_k} is divergent. For example, we say that x_k → ∞ as k → ∞. A sequence {x_k} is bounded if there exists a number M such that |x_k| ≤ M for all k = 1, 2, .... It is easy to see that every convergent sequence is bounded: if x_k → x, by the definition of convergence only finitely many terms of the sequence can lie outside the interval I = (x − 1, x + 1). The set I is bounded and the finite set of points from the sequence that are not in I is bounded, so {x_k} must be bounded. On the other hand, is every bounded sequence convergent? No. For example, the sequence {y_k} = {(−1)^k} is bounded but not convergent.

Theorem 3.1 Every bounded monotone sequence is convergent.

Proof of Theorem 3.1: Suppose, without loss of generality, that {x_k} is nondecreasing and bounded. Let b* be the least upper bound of the set X = {x_k | k = 1, 2, ...}, and let ε > 0 be an arbitrary number. Theorem 2.1 shows that there must be a term x_N of the sequence for which x_N > b* − ε. Because the sequence is nondecreasing, b* − ε < x_N ≤ x_k for all k > N. But the x_k are all less than or equal to b* because of boundedness, so we have b* − ε < x_k ≤ b*. Thus, for any ε > 0, there exists a number N such that |x_k − b*| < ε for all k > N. Hence, {x_k} converges to b*.

Theorem 3.2 Suppose that the sequences {x_k} and {y_k} converge to x and y, respectively. Then:

1. lim_{k→∞} (x_k ± y_k) = x ± y;
2. lim_{k→∞} (x_k · y_k) = x · y;
3. lim_{k→∞} (x_k / y_k) = x/y, assuming that y_k ≠ 0 for all k and y ≠ 0.

Exercise 3.1 Prove Theorem 3.2.

3.1.1 Subsequences

Let {x_k} be a sequence. Take a strictly increasing sequence of natural numbers k₁ < k₂ < k₃ < ··· and form a new sequence {y_j}_{j=1}^∞, where y_j = x_{k_j} for j = 1, 2, .... The sequence {y_j}_j = {x_{k_j}}_j is called a subsequence of {x_k}.

Theorem 3.3 Every subsequence of a convergent sequence is itself convergent and has the same limit as the original sequence.

Proof of Theorem 3.3: It is trivial.

Theorem 3.4 If the sequence {x_k} is bounded, then it contains a convergent subsequence.

Proof of Theorem 3.4: Since {x_k} is bounded, there exists some M ∈ R such that |x_k| ≤ M for all k ∈ N. Let y_n = sup{x_k | k ≥ n} for n ∈ N. By construction, {y_n} is a nonincreasing sequence because the set {x_k | k ≥ n} shrinks as n increases. The sequence {y_n} is also bounded because −M ≤ y_n ≤ M. Theorem 3.1 shows that the sequence {y_n} is convergent. Let x = lim_{n→∞} y_n. By the definition of y_n, we can choose a term x_{k_n} from the original sequence {x_k} (with k_n ≥ n) satisfying |y_n − x_{k_n}| < 1/n. Then

|x − x_{k_n}| = |x − y_n + y_n − x_{k_n}| ≤ |x − y_n| + |y_n − x_{k_n}| < |x − y_n| + 1/n.

This shows that x_{k_n} → x as n → ∞.

3.1.2 Cauchy Sequences

I have defined the concept of convergence of sequences. A natural question then arises as to how we can check whether a given sequence is convergent. The concept of a Cauchy sequence enables us to do so.

Definition 3.2 A sequence {x_k} of real numbers is called a Cauchy sequence if for every ε > 0 there exists a natural number N such that |x_m − x_n| < ε for all m, n > N.

The theorem below is a characterization of convergent sequences.

Theorem 3.5 A sequence is convergent if and only if it is a Cauchy sequence.

Proof of Theorem 3.5: (⇒) Suppose that {x_k} converges to x. Given ε > 0, we can choose a natural number N such that |x_n − x| < ε/2 for all n > N. Then, for m, n > N,

|x_m − x_n| = |x_m − x + x − x_n| ≤ |x_m − x| + |x − x_n| < ε/2 + ε/2 = ε.

Therefore, {x_k} is a Cauchy sequence. (⇐) Suppose that {x_k} is a Cauchy sequence. First, we shall show that the sequence is bounded. By the Cauchy property, there is a number M such that |x_k − x_M| < 1 for all k > M. Moreover, the finite set {x₁, x₂, ..., x_M} is clearly bounded. Hence, {x_k} is bounded. Theorem 3.4 shows that the bounded sequence {x_k} has a convergent subsequence {x_{k_j}}. Let x = lim_j x_{k_j}. Because {x_k} is a Cauchy sequence, for every ε > 0 there is a natural number N such that |x_m − x_n| < ε/2 for all m, n > N. If we take J sufficiently large, we have |x_{k_j} − x| < ε/2 for all j > J. Then for k > N and j > max{N, J},

|x_k − x| = |x_k − x_{k_j} + x_{k_j} − x| ≤ |x_k − x_{k_j}| + |x_{k_j} − x| < ε/2 + ε/2 = ε.

Hence x_k → x as k → ∞.

Exercise 3.2 Consider the sequence {x_k} with the generic term

x_k = 1/1² + 1/2² + ··· + 1/k² = Σ_{i=1}^k 1/i².

Show that {x_k} is a Cauchy sequence. Hint: for any n and k,

x_{n+k} − x_n = 1/(n+1)² + 1/(n+2)² + ··· + 1/(n+k)²
  < 1/(n(n+1)) + 1/((n+1)(n+2)) + ··· + 1/((n+k−1)(n+k))
  = (1/n − 1/(n+1)) + (1/(n+1) − 1/(n+2)) + ··· + (1/(n+k−1) − 1/(n+k))
  = 1/n − 1/(n+k) < 1/n.

Exercise 3.3 Prove that a sequence can have at most one limit. Use proof by contradiction: first suppose, by way of contradiction, that there are two limit points.

3.1.3 Upper and Lower Limits

Let {x_k} be a sequence that is bounded above, and define y_n = sup{x_k | k ≥ n} for n = 1, 2, .... Each y_n is a finite number and {y_n} is a nonincreasing sequence. Then lim_{n→∞} y_n either exists or is −∞. We call this limit the upper limit (or lim sup) of the sequence {x_k}, and we introduce the notation

lim sup_{k→∞} x_k = lim_{n→∞} (sup{x_k | k ≥ n}).

Similarly, if {x_k} is bounded below, its lower limit (or lim inf) is defined as

lim inf_{k→∞} x_k = lim_{n→∞} (inf{x_k | k ≥ n}).

Theorem 3.6 If the sequence {x_k} is convergent, then

lim sup_{k→∞} x_k = lim inf_{k→∞} x_k = lim_{k→∞} x_k.

On the other hand, if lim sup_{k→∞} x_k = lim inf_{k→∞} x_k (and this common value is finite), then {x_k} is convergent.

I omit the proof of Theorem 3.6.

Exercise 3.4 Determine the lim sup and lim inf of the following sequences:

1. {x_k} = {(−1)^k}
2. {x_k} = {(−1)^k (2 + 1/k) + 1}
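For item 2, a numerical look at the tail (my sketch) suggests the answers lim sup = 3 and lim inf = −1:

    # Tail maxima/minima of x_k = (-1)^k (2 + 1/k) + 1
    xs = [(-1) ** k * (2 + 1 / k) + 1 for k in range(1, 100_000)]
    tail = xs[50_000:]
    print(max(tail), min(tail))   # close to lim sup = 3 and lim inf = -1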

3.1.4 Infimum and Supremum of Functions

Suppose that f(x) is defined for all x ∈ B, where B ⊆ R^n. We define the infimum and supremum of the function f over B by

inf_{x∈B} f(x) = inf{f(x) | x ∈ B},  sup_{x∈B} f(x) = sup{f(x) | x ∈ B}.

If there exists a point c ∈ B such that f(c) = y*, where y* = inf_{x∈B} f(x), then we say that the infimum is attained (at the point c) in B. In this case the infimum y* is called the minimum of f over B, and we often write min instead of inf. In the same way, we write max instead of sup when the supremum of f over B is attained in B, and so becomes the maximum.

3.1.5 Indexed Sets

Suppose that, for each λ ∈ Λ, we specify an object a_λ. Then these objects form an indexed set {a_λ}_{λ∈Λ} with Λ as its index set. In formal terms, an indexed set is a function whose domain is the index set. For example, a sequence is an indexed set {a_k}_{k∈N} with the set N of natural numbers as its index set. Instead of {a_k}_{k∈N} one often writes {a_k}_{k=1}^∞.

A set whose elements are sets is often called a family of sets, and so an indexed set of sets is also called an indexed family of sets. Consider a nonempty indexed family {A_λ}_{λ∈Λ} of sets. The union and the intersection of this family are the sets

∪_{λ∈Λ} A_λ = the set consisting of all x that belong to A_λ for at least one λ ∈ Λ;
∩_{λ∈Λ} A_λ = the set consisting of all x that belong to A_λ for all λ ∈ Λ.

The distributive laws can be generalized to

A ∩ (∪_{λ∈Λ} B_λ) = ∪_{λ∈Λ} (A ∩ B_λ),  A ∪ (∩_{λ∈Λ} B_λ) = ∩_{λ∈Λ} (A ∪ B_λ),

and De Morgan's laws to

A \ (∪_{λ∈Λ} B_λ) = ∩_{λ∈Λ} (A \ B_λ),  A \ (∩_{λ∈Λ} B_λ) = ∪_{λ∈Λ} (A \ B_λ).

The union and the intersection of a sequence {A_n}_n = {A_n}_{n=1}^∞ of sets are often written as ∪_{n=1}^∞ A_n and ∩_{n=1}^∞ A_n.

3.2 Point Set Topology in R^n

Consider the n-dimensional Euclidean space R^n, whose elements, or points, are n-vectors x = (x₁, ..., x_n). The Euclidean distance d(x, y) between any two points x = (x₁, ..., x_n) and y = (y₁, ..., y_n) is the norm of the vector difference between x and y. Thus,

d(x, y) = ‖x − y‖ = √((x₁ − y₁)² + ··· + (x_n − y_n)²).

If x, y, and z are points in R^n, then

d(x, z) ≤ d(x, y) + d(y, z)  (triangle inequality).

If x₀ is a point in R^n and r is a positive real number, then the set of all points x ∈ R^n whose distance from x₀ is less than r is called the open ball around x₀ with radius r. This open ball is denoted by B_r(x₀). Thus,

B_r(x₀) = {x ∈ R^n | d(x₀, x) < r}.

Definition 3.3 A set S ⊆ R^n is open if, for all x₀ ∈ S, there exists some ε > 0 such that B_ε(x₀) ⊆ S.

On the real line R, the simplest type of open set is an open interval. Let S be any subset of R^n. A point x₀ ∈ S is called an interior point of S if there is some ε > 0 such that B_ε(x₀) ⊆ S. The set of all interior points of S is called the interior of S and is denoted int(S). A set S is said to be a neighborhood of x₀ if x₀ is an interior point of S, that is, if S contains some open ball B_ε(x₀) (i.e., B_ε(x₀) ⊆ S) for some ε > 0.

Theorem 3.7

1. The entire space R^n and the empty set ∅ are both open.
2. The union of an arbitrary collection of open sets is open: let Λ be an arbitrary index set. If A_λ is open for each λ ∈ Λ, then ∪_{λ∈Λ} A_λ is also open.
3. The intersection of finitely many open sets is open: let Λ be a finite set. If A_λ is open for each λ ∈ Λ, then ∩_{λ∈Λ} A_λ is open.

Proof of Theorem 3.7: (1) It is clear that B₁(x) ⊆ R^n for all x ∈ R^n, so R^n is open. The empty set ∅ is open because it has no elements, so every member is (vacuously) an interior point.

(2) Let {U_λ}_{λ∈Λ} be an arbitrary family of open sets in R^n, and let U = ∪_{λ∈Λ} U_λ be the union of the whole family. For each x ∈ U, there is at least one λ ∈ Λ such that x ∈ U_λ. Since U_λ is open by our hypothesis, there exists ε > 0 such that B_ε(x) ⊆ U_λ ⊆ U. Hence, U is open.

(3) Let {U_k}_{k=1}^K be a finite collection of open sets in R^n, and let U = ∩_{k=1}^K U_k be the intersection of all these sets. Let x be any point in U. Since U_k is open for each k, there is ε_k > 0 such that B_{ε_k}(x) ⊆ U_k for each k. Let ε = min{ε₁, ..., ε_K}. This is well defined because of finiteness. Then B_ε(x) ⊆ U_k for each k, which implies that B_ε(x) ⊆ U. Hence, U is open.

Exercise 3.5 There are two questions. First, draw the graph of S = {(x, y) ∈ R² | 2x − y < 2 and x − 3y < 5}. Second, prove that S is open in R².
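A membership test for S, assuming the two inequalities as reconstructed above (my sketch; handy for plotting the region point by point):

    # S = {(x, y) : 2x - y < 2 and x - 3y < 5}
    def in_S(x, y):
        return 2 * x - y < 2 and x - 3 * y < 5

    print(in_S(0.0, 0.0))   # True: the origin lies in S
    print(in_S(2.0, 0.0))   # False: 2*2 - 0 = 4, which is not < 2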

Definition 3.4 A set S is closed if its complement R^n\S is open.

A point x₀ ∈ R^n is said to be a boundary point of the set S ⊆ R^n if B_ε(x₀) ∩ S^c ≠ ∅ and B_ε(x₀) ∩ S ≠ ∅ for every ε > 0. Here, S^c = R^n\S. In general, a set may include none, some, or all of its boundary points. An open set, for instance, contains none of its boundary points.

Each point in a set is either an interior point or a boundary point of the set. The set of all boundary points of a set S is called the boundary of S and is denoted ∂S or bd(S). Note that, given any set S ⊆ R^n, there is a corresponding partition of R^n into three mutually disjoint sets (some of which may be empty), namely:

1. the interior of S, which consists of all points x ∈ R^n such that N ⊆ S for some neighborhood N of x;
2. the exterior of S, which consists of all points x ∈ R^n for which there exists some neighborhood N of x such that N ⊆ R^n\S;
3. the boundary of S, which consists of all points x ∈ R^n with the property that every neighborhood N of x intersects both S and its complement R^n\S.

A set S ⊆ R^n is said to be closed if it contains all its boundary points. The union of S and its boundary, S ∪ ∂S, is called the closure of S, denoted by S̄. A point x belongs to S̄ if and only if B_ε(x) ∩ S ≠ ∅ for every ε > 0. The closure S̄ of any set S is indeed closed. In fact, S̄ is the smallest closed set containing S.

Theorem 3.8

1. The whole space R^n and the empty set ∅ are both closed.
2. The intersection of an arbitrary collection of closed sets is closed.
3. The union of finitely many closed sets is closed.

Exercise 3.6 Prove Theorem 3.8. Use the fact that the complement of an open set is closed, together with Theorem 3.7.

In topology, any set containing some of its boundary points but not all of them is neither open nor closed. The half-open intervals [a, b) and (a, b], for example, are neither open nor closed. Hence, openness and closedness are neither mutually exclusive nor exhaustive: R^n and ∅ are both open and closed, while [a, b) is neither.

3.3 Topology and Convergence

I want to generalize the argument of Section 3.1 to R^n. The basic idea is to apply the previous argument coordinate-wise. A sequence {x_k}_{k=1}^∞ in R^n is a function that for each natural number k yields a corresponding point x_k in R^n.

Definition 3.5 The sequence {x_k} in R^n converges to x ∈ R^n if for every ε > 0 there exists a natural number N such that x_k ∈ B_ε(x) for all k ≥ N, or equivalently, if d(x_k, x) → 0 as k → ∞.

Theorem 3.9 Let {x_k} be a sequence in R^n. Then {x_k} converges to the vector x ∈ R^n if and only if for each j = 1, ..., n, the real number sequence {x_k^(j)}_{k=1}^∞, consisting of the jth component of each vector x_k, converges to x^(j) ∈ R, the jth component of x.

Proof of Theorem 3.9: (⇒) For every k and every j, one has |x_k^(j) − x^(j)| ≤ d(x_k, x). It follows that if x_k → x, then x_k^(j) → x^(j) for each j. (⇐) Suppose that x_k^(j) → x^(j) as k → ∞ for each j. Then, given any ε > 0, for each j = 1, ..., n there exists a number N_j such that |x_k^(j) − x^(j)| < ε/√n for all k > N_j. It follows that

d(x_k, x) = √(|x_k^(1) − x^(1)|² + ··· + |x_k^(n) − x^(n)|²) < √(ε²/n + ··· + ε²/n) = ε

for all k > max{N₁, ..., N_n}. This is well defined because of the finiteness of n. Therefore, x_k → x as k → ∞.
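A quick numerical illustration of Theorem 3.9 (mine): x_k = (1/k, 1 + 1/k²) converges to (0, 1) in R² precisely because each coordinate converges:

    # Coordinate-wise convergence implies convergence in R^n
    import math

    def d(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    x = (0.0, 1.0)
    for k in (10, 100, 1000):
        xk = (1 / k, 1 + 1 / k ** 2)
        print(k, d(xk, x))   # the distance shrinks toward 0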

Definition 3.6 A sequence {x_k} in R^n is said to be a Cauchy sequence if for every ε > 0 there exists a number N such that d(x_m, x_n) < ε for all m, n > N.

Theorem 3.10 A sequence {x_k} in R^n is convergent if and only if it is a Cauchy sequence.

Exercise 3.7 Prove Theorem 3.10. Apply the argument of Theorem 3.5 to each coordinate.

3.4 Properties of Sequences in R^n

Theorem 3.11

1. For any set S ⊆ R^n, a point x ∈ R^n belongs to S̄ if and only if there exists a sequence {x_k} in S such that x_k → x as k → ∞.
2. A set S ⊆ R^n is closed if and only if every convergent sequence of points in S has its limit in S.

Proof of Theorem 3.11: (⇒ of Property 1) Let x ∈ S̄. Regardless of whether x ∈ int S or x ∈ ∂S, for each k ∈ N we can construct x_k such that x_k ∈ B_{1/k}(x) ∩ S (in particular, take x_k = x for each k if x ∈ S). Then x_k → x as k → ∞. (⇐ of Property 1) Suppose that {x_k} is a convergent sequence for which x_k ∈ S for each k and x = lim_k x_k. We claim that x ∈ S̄. For any ε > 0, there is a number N ∈ N such that x_k ∈ B_ε(x) for all k > N. Since x_k ∈ S for each k, it follows that B_ε(x) ∩ S ≠ ∅. Suppose, on the other hand, that x ∉ S̄. Since S̄ is closed, there is some ε > 0 such that B_ε(x) ∩ S = ∅. This contradicts the fact that B_ε(x) ∩ S ≠ ∅ for every ε > 0. Hence, x ∈ S̄.

(⇒ of Property 2) Suppose that S is closed, and let {x_k} be a convergent sequence with limit x such that x_k ∈ S for each k. Note that x ∈ S̄ by property 1. Since S̄ = S if S is closed, it follows that x ∈ S. (⇐ of Property 2) By property 1, for any point x ∈ S̄ there is some sequence {x_k} for which x_k ∈ S for each k and lim_k x_k = x. By our hypothesis, x ∈ S. This shows that x ∈ S̄ implies x ∈ S, i.e., S̄ ⊆ S. By definition, S ⊆ S̄ for any S. Hence S = S̄, that is, S is closed.

Definition 3.7 A set S in R^n is bounded if there exists a number M ∈ R such that ‖x‖ ≤ M for all x ∈ S. A set that is not bounded is called unbounded. Here ‖x‖ = d(x, 0) = √(x₁² + ··· + x_n²) is called the Euclidean norm.

Similarly, a sequence {x_k} in R^n is bounded if the set {x_k | k = 1, 2, ...} is bounded.

Lemma 3.1 Any convergent sequence {x_k} in R^n is bounded.

Proof of Lemma 3.1: If x_k → x, then only finitely many terms of the sequence can lie outside the ball B₁(x). The ball B₁(x) is bounded and any finite set of points is bounded, so {x_k} must be bounded.

On the other hand, a bounded sequence {x_k} in R^n is not necessarily convergent, just as for sequences in R. The theorem below gives a characterization of boundedness of a set in terms of sequences.

Theorem 3.12 A subset S of R^n is bounded if and only if every sequence of points in S has a convergent subsequence.

I omit the proof of this theorem. Although it is not difficult, it is tedious. The next concept, that of compact sets, is used extensively in both mathematics and economics.

Definition 3.8 A set S in R^n is compact if it is closed and bounded.

The theorem below is a characterization of compact sets in terms of sequences.

Theorem 3.13 (Bolzano-Weierstrass) A subset S of R^n is compact if and only if every sequence of points in S has a subsequence that converges to a point in S.

Proof of the Bolzano-Weierstrass theorem: (⇒) Suppose that S is compact and let {x_k} be a sequence such that x_k ∈ S for each k. Since S is bounded, there is a convergent subsequence {y_n} = {x_{k_n}} by Theorem 3.12. Furthermore, lim_n y_n = y ∈ S because S is closed. (⇐) Suppose that every sequence {x_k} with x_k ∈ S for each k has a subsequence {y_n} = {x_{k_n}} with lim_n y_n = y ∈ S. This, with the previous theorem (Theorem 3.12), shows that S is bounded. To see that S is closed, let {x_k} be any convergent sequence for which x_k ∈ S for each k and x = lim_k x_k; by Theorem 3.11 it suffices to show that x ∈ S. By assumption, {x_k} has a subsequence {x_{k_j}} that converges to a point x′ = lim_j x_{k_j} ∈ S. But {x_{k_j}} also converges to x. Hence, the limit points must be the same, that is, x = x′ ∈ S.

Exercise 3.8 Let the number of commodities in the competitive market be n. Let p_i > 0 be the price of commodity i for each i = 1, ..., n. Let y > 0 be the consumer's income. Define the consumer's budget set B(p, y) as

B(p, y) ≡ {x = (x₁, ..., x_n) ∈ R^n_+ | Σ_{i=1}^n p_i x_i ≤ y}.

Show that B(p, y) is a compact set.

3.5 Continuous Functions

Roughly speaking, f is continuous if small changes in the independent variables cause only small changes in the function value.

Definition 3.9 A function f : S → R with domain S ⊆ R^n is continuous at a point x₀ in S if for every ε > 0 there exists a δ > 0 such that

|f(x) − f(x₀)| < ε for all x ∈ S with ‖x − x₀‖ < δ.

If f is continuous at every point in a set S, we simply say that f is continuous on S.

Exercise 3.9 Let

f(x) = 1 if x ≥ 1;  f(x) = 0 if x < 1.

Show that f is not a continuous function.
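A sequential look at this function (my sketch; the sequence test is formalized in Theorem 3.15 below): x_k = 1 − 1/k converges to 1, yet f(x_k) stays at 0 while f(1) = 1.

    # The step function fails the sequential continuity test at x0 = 1
    def f(x):
        return 1 if x >= 1 else 0

    xs = [1 - 1 / k for k in range(1, 6)]       # 0, 0.5, 0.667, 0.75, 0.8
    print([f(x) for x in xs], "but f(1) =", f(1))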

Consider next the general case of vector-valued functions.

Definition 3.10 A function f = (f₁, ..., f_m) from a subset S of R^n to R^m is said to be continuous at x₀ in S if for every ε > 0 there exists a δ > 0 such that d(f(x), f(x₀)) < ε for all x ∈ S with d(x, x₀) < δ, or equivalently, such that f(B_δ(x₀) ∩ S) ⊆ B_ε(f(x₀)).

The next theorem shows that the continuity of a vector-valued function reduces to the continuity of each component (coordinate) function, and vice versa.

Theorem 3.14 A function f = (f₁, ..., f_m) from S ⊆ R^n to R^m is continuous at a point x₀ in S if and only if each component function f_j : S → R, j = 1, ..., m, is continuous at x₀.

Proof of Theorem 3.14: (⇒) Suppose that f is continuous at x₀. Then for every ε > 0 there exists a δ > 0 such that

|f_j(x) − f_j(x₀)| ≤ d(f(x), f(x₀)) < ε

for every x ∈ S with d(x, x₀) < δ. Hence, f_j is continuous at x₀ for j = 1, ..., m. (⇐) Suppose that each component f_j is continuous at x₀. Then, for every ε > 0 and every j = 1, ..., m, there exists δ_j > 0 such that |f_j(x) − f_j(x₀)| < ε/√m for every point x ∈ S with d(x, x₀) < δ_j. Let δ = min{δ₁, ..., δ_m}. Then x ∈ B_δ(x₀) ∩ S implies that

d(f(x), f(x₀)) = √(|f₁(x) − f₁(x₀)|² + ··· + |f_m(x) − f_m(x₀)|²) < √(ε²/m + ··· + ε²/m) = ε.

This proves that f is continuous at x₀.

Here, I want to characterize the continuity of functions in terms of sequences.

Theorem 3.15 A function f from S ⊆ R^n into R^m is continuous at a point x₀ in S if and only if f(x_k) → f(x₀) for every sequence {x_k} of points in S that converges to x₀.

Proof of Theorem 3.15: (⇒) Suppose that f is continuous at x₀ and let {x_k} be a sequence for which x_k ∈ S and lim_k x_k = x₀. Let ε > 0 be given. Because of the continuity of f, there exists δ > 0 such that d(f(x), f(x₀)) < ε whenever x ∈ B_δ(x₀) ∩ S. Since x_k → x₀, there exists a number N ∈ N such that d(x_k, x₀) < δ for all k > N. But then x_k ∈ B_δ(x₀) ∩ S, and so d(f(x_k), f(x₀)) < ε for all k > N. This implies that f(x_k) → f(x₀). (⇐) Suppose that f(x_k) → f(x₀) for every sequence {x_k} in S converging to x₀, and suppose, by way of contradiction, that f is not continuous at x₀. Then there exists some ε > 0 such that for every δ > 0 there is a point x ∈ B_δ(x₀) ∩ S with d(f(x), f(x₀)) ≥ ε. Taking δ = 1/k for k = 1, 2, ..., we obtain a sequence {x_k} in S with d(x_k, x₀) < 1/k, hence x_k → x₀, but d(f(x_k), f(x₀)) ≥ ε for every k, so f(x_k) does not converge to f(x₀). This contradicts our hypothesis. Hence, f is continuous at x₀.

The theorem below shows that continuous mappings preserve compactness.

Theorem 3.16 Let S ⊆ R^n and let f : S → R^m be continuous. Then f(K) = {f(x) | x ∈ K} is compact for every compact subset K of S.

Proof of Theorem 3.16: Let {y_k} be any sequence in f(K). By definition, for each k there is a point x_k ∈ K such that y_k = f(x_k). Because K is compact, by the Bolzano-Weierstrass theorem the sequence {x_k} has a subsequence {x_{k_j}} with the property that x_{k_j} ∈ K for each j and lim_j x_{k_j} = x₀ ∈ K. Because f is continuous, by the previous theorem (Theorem 3.15), f(x_{k_j}) → f(x₀) as j → ∞, where f(x₀) ∈ f(K) because x₀ ∈ K. But then {y_{k_j}} is a subsequence of {y_k} that converges to the point f(x₀) ∈ f(K). So we have proved that any sequence in f(K) has a subsequence converging to a point of f(K); by the Bolzano-Weierstrass theorem, f(K) is compact.

Even if f is continuous and the set V ⊆ R^n is open, the image f(V) = {f(x) | x ∈ V} need not be open in R^m. Nor need f(C) be closed if C is closed. Nevertheless, the inverse image f⁻¹(U) = {x | f(x) ∈ U} of an open set U under a continuous function f is always open. Similarly, the inverse image of any closed set must be closed.

Theorem 3.17 Let f be any function from R^n to R^m. Then f is continuous if and only if either of the following equivalent conditions is satisfied:

1. f⁻¹(U) is open for each open set U in R^m.
2. f⁻¹(F) is closed for each closed set F in R^m.

I omit the proof of Theorem 3.17 because it is conceptually involved. So, just accept the result.

Theorem 3.18 Let S be a compact set in R, let x̲ be the greatest lower bound of S, and let x̄ be the least upper bound of S. Then x̲ ∈ S and x̄ ∈ S.

Proof of Theorem 3.18: Let S ⊆ R be closed and bounded and let x̄ be the least upper bound of S. Then, by the definition of an upper bound, we have x ≤ x̄ for all x ∈ S. If x̄ = x for some x ∈ S, we are done. Suppose, therefore, that x̄ is strictly greater than every point in S. If x̄ > x for all x ∈ S, then x̄ ∉ S, so x̄ ∈ R\S. Since S is closed, R\S is open. Then, by the definition of open sets, there exists some ε > 0 such that B_ε(x̄) = (x̄ − ε, x̄ + ε) ⊆ R\S. Since x̄ > x for all x ∈ S and B_ε(x̄) ⊆ R\S, we must have x̄ − ε ≥ x for all x ∈ S, so that x̄ − ε is also an upper bound for S. This contradicts our hypothesis that x̄ is the least upper bound of S. Thus, we must conclude that x̄ ∈ S. The same argument can be constructed for the greatest lower bound x̲ of S.

Theorem 3.19 (Weierstrass's Theorem) Let f : S → R be a continuous real-valued mapping, where S is a nonempty compact subset of R^n. Then there exist vectors x̄, x̲ ∈ S such that for all x ∈ S,

f(x̲) ≤ f(x) ≤ f(x̄).

Proof of Weierstrass's Theorem: It follows from Theorems 3.16 and 3.18.
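Weierstrass's theorem is what guarantees that the consumer's problem over the budget set of Exercise 3.8 has a solution: a continuous utility function attains a maximum on the compact budget set. A crude brute-force sketch (mine; the utility u(x₁, x₂) = x₁x₂ and the prices and income are hypothetical):

    # Grid search for max of u(x1, x2) = x1 * x2 on {p.x <= y, x >= 0}
    p1, p2, y = 1.0, 2.0, 10.0
    best, grid = None, 200
    for i in range(grid + 1):
        x1 = (y / p1) * i / grid
        x2 = (y - p1 * x1) / p2        # spend the remaining income on good 2
        u = x1 * x2
        if best is None or u > best[0]:
            best = (u, x1, x2)
    print(best)   # near the true maximum u = 12.5 at (x1, x2) = (5, 2.5)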

Chapter 4

Linear Algebra

4.1 Basic Concepts in Linear Algebra

A mapping f : R^n → R^m is said to be linear if for any x, y ∈ R^n and any α ∈ R, the following two conditions are satisfied: (1) f(x + y) = f(x) + f(y) and (2) f(αx) = αf(x). For any linear mapping f : R^n → R^m, there exists a unique m × n matrix A such that f(x) = Ax for all x ∈ R^n.¹ An m × n matrix is a rectangular array with m rows and n columns:

A = (a_ij)_{m×n} =
    [ a_11  a_12  ...  a_1n ]
    [ a_21  a_22  ...  a_2n ]
    [  ...   ...  ...   ... ]
    [ a_m1  a_m2  ...  a_mn ]

Here a_ij denotes the element in the ith row and the jth column. Writing f(x) = (f^(1)(x), ..., f^(m)(x)), we can express f(x) = Ax as below:

    [ f^(1)(x) ]   [ a_11  a_12  ...  a_1n ] [ x_1 ]
    [ f^(2)(x) ]   [ a_21  a_22  ...  a_2n ] [ x_2 ]
    [    ...   ] = [  ...   ...  ...   ... ] [ ... ]
    [ f^(m)(x) ]   [ a_m1  a_m2  ...  a_mn ] [ x_n ]

If A = (a_ij)_{m×n}, B = (b_ij)_{m×n}, and α is a scalar, we define

A + B = (a_ij + b_ij)_{m×n},  αA = (α a_ij)_{m×n}.

¹ This is a non-trivial statement, but I take this one-to-one correspondence between linear mappings and matrix representations as a fact, with no proof provided.

Let f : R^n → R^m and g : R^m → R^p be linear mappings. Then we can associate an m × n matrix A = (a_ij)_{m×n} with f and a p × m matrix B = (b_ij)_{p×m} with g. Consider the composite mapping g∘f(x) = g(f(x)). What I would like to have is the requirement on the product of matrices that g∘f corresponds to BA. The product C = BA is then defined as the p × n matrix C = (c_ij)_{p×n} whose element in the ith row and the jth column is the inner product of the ith row of B and the jth column of A. That is,

c_ij = Σ_{r=1}^m b_ir a_rj = b_i1 a_1j + b_i2 a_2j + ··· + b_im a_mj  (m terms).

It is important to note that the product BA is well defined only if the number of columns in B is equal to the number of rows in A.

If A, B, and C are matrices whose dimensions are such that the given operations are

well dened, then the basic properties of matrix of multiplication are:

(AB)C = A(BC) (associative law)

A(B + C) = AB + AC (left distributive law)

(A + B)C = AC + BC (right distributive law)

Exercise 4.2 Show the above three properties when we consider 2 × 2 matrices.

However, matrix multiplication is not commutative. In fact,

AB ≠ BA, except in special cases;

AB = 0 does not imply that A = 0 or B = 0;

AB = AC and A ≠ 0 do not imply that B = C.

Exercise 4.3 Confirm the above three points by example. (One possible set of examples is sketched below.)
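The following Python sketch checks one possible set of examples numerically; the matrices are my own choices, and any similar ones work.

```python
import numpy as np

# AB != BA, except in special cases
A = np.array([[1.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [1.0, 0.0]])
print(A @ B)                      # [[2. 0.] [0. 0.]]
print(B @ A)                      # [[1. 1.] [1. 1.]]

# AB = 0 does not imply that A or B is 0
N = np.array([[0.0, 1.0], [0.0, 0.0]])
print(N @ N)                      # the zero matrix, although N != 0

# AB = AC and A != 0 do not imply B = C
B2 = np.array([[1.0, 2.0], [3.0, 4.0]])
C2 = np.array([[9.0, 9.0], [3.0, 4.0]])   # same second row as B2
print(np.allclose(N @ B2, N @ C2))        # True, yet B2 != C2
```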

By using matrix multiplication, one can write a general system of linear equations in a very concise way. Specifically, the system

a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2
\vdots
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m

can be written as Ax = b if we define

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.

If A is a square matrix and n is a positive integer, we define the nth power of A in the obvious way:

Aⁿ = AA⋯A  (n factors).

For diagonal matrices it is particularly simple:

D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_m \end{pmatrix} \implies D^n = \begin{pmatrix} d_1^n & 0 & \cdots & 0 \\ 0 & d_2^n & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_m^n \end{pmatrix}

The identity matrix of order n, denoted by Iₙ, is the n × n matrix having ones along the main diagonal and zeros elsewhere:

Iₙ = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}  (identity matrix)

AIₙ = IₙA = A for every n × n matrix A.

If A = (a_{ij})_{m×n} is any matrix, the transpose of A is defined as Aᵀ = (a_{ji})_{n×m}. The subscripts i and j are interchanged because every row of A becomes a column of Aᵀ, and every column of A becomes a row of Aᵀ. The following rules apply to matrix transposition:

1. (Aᵀ)ᵀ = A

2. (A + B)ᵀ = Aᵀ + Bᵀ

3. (αA)ᵀ = αAᵀ

4. (AB)ᵀ = BᵀAᵀ

Exercise 4.4 Prove the above four properties when we consider 2 × 2 matrices.

A square matrix is said to be symmetric if A = Aᵀ.


4.2 Determinants and Matrix Inverses

4.2.1 Determinants

For 2 × 2 and 3 × 3 matrices the determinant is given by

|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} − a_{12}a_{21}

|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} − a_{11}a_{23}a_{32} − a_{12}a_{21}a_{33} − a_{13}a_{22}a_{31}

For a general n × n matrix A = (a_{ij}), the determinant |A| can be defined recursively. In fact, expanding along the ith row,

|A| = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{ij}A_{ij} + \cdots + a_{in}A_{in},

where the cofactors A_{ij} are (up to sign) determinants of (n − 1) × (n − 1) matrices given by

A_{ij} = (−1)^{i+j} \begin{vmatrix} a_{11} & \cdots & a_{1,j−1} & a_{1,j+1} & \cdots & a_{1n} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{i−1,1} & \cdots & a_{i−1,j−1} & a_{i−1,j+1} & \cdots & a_{i−1,n} \\ a_{i+1,1} & \cdots & a_{i+1,j−1} & a_{i+1,j+1} & \cdots & a_{i+1,n} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,j−1} & a_{n,j+1} & \cdots & a_{nn} \end{vmatrix}

Here row i and column j are deleted from the matrix A to produce A_{ij}.
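The recursive definition translates directly into code. Below is a minimal Python sketch of the cofactor expansion along the first row (my own illustration; it runs in O(n!) time and is meant only to mirror the definition, not for serious computation).

```python
import numpy as np

def det_cofactor(A):
    """Determinant via cofactor expansion along row 0 (illustration only)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # delete row 0 and column j to form the minor
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [1.0, 0.0, 6.0]])
print(det_cofactor(A), np.linalg.det(A))   # both ~ 22.0
```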

Proposition 4.1 Let A and B be n × n matrices. Then |AB| = |A||B|.

Exercise 4.5 Prove Proposition 4.1 when n = 2.

4.2.2 Matrix Inverses

The inverse of an n × n matrix A, when it exists, is the n × n matrix A⁻¹ characterized by

B = A⁻¹ ⟺ AB = Iₙ ⟺ BA = Iₙ,

and

A⁻¹ exists ⟺ |A| ≠ 0.

If A = (a_{ij})_{n×n} and |A| ≠ 0, the unique inverse of A is given by

A⁻¹ = \frac{1}{|A|}\,adj(A), where adj(A) = \begin{pmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{pmatrix}

with A_{ij} the cofactor of the element a_{ij}. Note carefully the order of the indices in the adjoint matrix adj(A), with the column number preceding the row number. The matrix (A_{ij})_{n×n} is called the cofactor matrix, whose transpose is the adjoint matrix. For example, for a 3 × 3 matrix

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}

with |A| ≠ 0, the formula yields A⁻¹ from the nine cofactors.
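As a sanity check on the adjugate formula, here is a short Python sketch (the 3 × 3 matrix is my own example) that builds the cofactor matrix, transposes it, divides by |A|, and compares with a library inverse.

```python
import numpy as np

def adjugate(A):
    n = A.shape[0]
    C = np.empty_like(A)                    # cofactor matrix (A_ij)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T                              # adjugate = transpose of cofactor matrix

A = np.array([[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]])
A_inv = adjugate(A) / np.linalg.det(A)
print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```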

Lemma 4.1 The following rules for inverses can be established:

(A⁻¹)⁻¹ = A,

(AB)⁻¹ = B⁻¹A⁻¹,

(Aᵀ)⁻¹ = (A⁻¹)ᵀ,

(αA)⁻¹ = α⁻¹A⁻¹, where α ∈ R\{0}.

Exercise 4.7 Prove Lemma 4.1 when n = 2.

Proposition 4.2 Let A be an invertible n × n matrix. Then |A⁻¹| = 1/|A|.

Exercise 4.8 Prove Proposition 4.2 when n = 2.

4.2.3 Cramer's Rule

Consider the n × n linear system

a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1
\vdots                                          (∗)
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n

and suppose |A| ≠ 0. The solution is then

x_j = \frac{|A_j|}{|A|},  j = 1, \ldots, n,

where

|A_j| = \begin{vmatrix} a_{11} & \cdots & a_{1,j−1} & b_1 & a_{1,j+1} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2,j−1} & b_2 & a_{2,j+1} & \cdots & a_{2n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,j−1} & b_n & a_{n,j+1} & \cdots & a_{nn} \end{vmatrix}

is obtained by replacing the jth column of |A| by the column whose components are b_1, b_2, \ldots, b_n. If the right-hand side of the equation system (∗) consists only of zeros, so that it can be written in matrix form as Ax = 0, the system is called homogeneous. A homogeneous system will always have the trivial solution x_1 = x_2 = \cdots = x_n = 0.

Lemma 4.2 The homogeneous system Ax = 0 has a nontrivial solution x ≠ 0 if and only if |A| = 0.

I omit the proof of Lemma 4.2.

Exercise 4.9 Use Cramer's rule to solve the following system of equations:

2x_1 − 3x_2 = 2
4x_1 − 6x_2 + x_3 = 7
x_1 + 10x_2 = 1.
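A small Python sketch of Cramer's rule, applied to a 3 × 3 system of my own (so as not to spoil Exercise 4.9):

```python
import numpy as np

def cramer(A, b):
    detA = np.linalg.det(A)
    assert abs(detA) > 1e-12, "Cramer's rule needs |A| != 0"
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                        # replace column j by b
        x[j] = np.linalg.det(Aj) / detA
    return x

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([3.0, 5.0, 6.0])
print(cramer(A, b), np.linalg.solve(A, b))  # the two answers agree
```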

4.3 Vectors

One can think of the rows and columns of a matrix as vectors, and an n-vector can be understood either as a 1 × n matrix a = (a_1, a_2, \ldots, a_n) (a row vector) or as an n × 1 matrix aᵀ = (a_1, a_2, \ldots, a_n)ᵀ (a column vector). The operations of addition, subtraction, and multiplication by scalars of vectors are defined in the obvious way. The dot product (or inner product) of the n-vectors a = (a_1, \ldots, a_n) and b = (b_1, \ldots, b_n) is defined as

a · b = a_1b_1 + a_2b_2 + \cdots + a_nb_n = \sum_{i=1}^{n} a_ib_i.

The dot product has the following properties:

1. a · b = b · a;

2. a · (b + c) = a · b + a · c;

3. (αa) · b = a · (αb) = α(a · b);

4. a · a = 0 ⟺ a = 0;

5. (a + b) · (a + b) = a · a + 2(a · b) + b · b.

Exercise 4.10 Prove the five properties of the dot product listed above. If you find it difficult to do so, focus on vectors in R².

The Euclidean norm or length of the vector a = (a_1, \ldots, a_n) is ‖a‖ = \sqrt{a · a} = \sqrt{a_1² + \cdots + a_n²}. Note that ‖αa‖ = |α|‖a‖ for all scalars α and vectors a.

Lemma 4.3 The following useful inequalities hold:

1. |a · b| ≤ ‖a‖‖b‖ (Cauchy-Schwarz inequality);

2. ‖a + b‖ ≤ ‖a‖ + ‖b‖ (triangle inequality).

Proof of the Cauchy-Schwarz inequality: Define f(t) as

f(t) = (ta + b) · (ta + b), where t ∈ R.

By the definition of dot products, we have f(t) ≥ 0 for any t ∈ R. Expanding,

f(t) = t²‖a‖² + 2t(a · b) + ‖b‖².

Since this quadratic in t is nonnegative for every t ∈ R, it can have at most one real root, so its discriminant must satisfy

4(a · b)² − 4‖a‖²‖b‖² ≤ 0,

and therefore |a · b| ≤ ‖a‖‖b‖.

Exercise 4.11 Prove property 2 in Lemma 4.3. (Hint: It suffices to show that ‖a + b‖² ≤ (‖a‖ + ‖b‖)².)

The Cauchy-Schwarz inequality implies that, for any a, b ∈ Rⁿ\{0},

−1 ≤ \frac{a · b}{‖a‖‖b‖} ≤ 1.

We can therefore define the angle θ between a and b by

cos θ = \frac{a · b}{‖a‖‖b‖},  θ ∈ [0, π].

This definition reveals that cos θ = 0 if and only if a · b = 0, in which case θ = π/2. In symbols,

a ⊥ b ⟺ a · b = 0.

The hyperplane in Rⁿ that passes through the point a = (a_1, \ldots, a_n) and is orthogonal to the nonzero vector p = (p_1, \ldots, p_n) is the set of all points x = (x_1, \ldots, x_n) such that

p · (x − a) = 0.

4.4 Linear Independence

The n vectors a_1, a_2, \ldots, a_n in Rᵐ are linearly dependent if there exist numbers c_1, c_2, \ldots, c_n, not all zero, such that

c_1a_1 + c_2a_2 + \cdots + c_na_n = 0.

If this equation holds only when c_1 = c_2 = \cdots = c_n = 0, then the vectors are linearly independent.

Exercise 4.12 Let a_1 = (1, 2), a_2 = (1, 1), and a_3 = (5, 1) ∈ R². Show that a_1, a_2, a_3 are linearly dependent.

Let a_1, a_2, \ldots, a_n ∈ Rⁿ\{0}. Suppose that, for each i = 1, \ldots, n, we have a_i ≠ \sum_{j≠i} λ_ja_j for all λ_1, \ldots, λ_{i−1}, λ_{i+1}, \ldots, λ_n ∈ R. Then the entire space Rⁿ is spanned by the set of all linear combinations of a_1, \ldots, a_n.

Lemma 4.4 A set of n vectors a1 , a2 , . . . , an in Rm is linearly dependent if and only if

at least one of them can be written as a linear combination of the others. Or equivalently:

A set of vectors a1 , a2 , . . . , an in Rm is linearly independent if and only if none of them

can be written as a linear combination of the others.

Proof of Lemma 4.4: Suppose that a_1, a_2, \ldots, a_n are linearly dependent. Then the equation c_1a_1 + \cdots + c_na_n = 0 holds with at least one of the coefficients c_i different from 0. We can, without loss of generality, assume that c_1 ≠ 0. Solving the equation for a_1 yields

a_1 = −\frac{c_2}{c_1}a_2 − \cdots − \frac{c_n}{c_1}a_n.

Thus a_1 is a linear combination of the other vectors. Conversely, if some a_i is a linear combination of the others, moving every term to one side yields a vanishing linear combination in which the coefficient of a_i is −1 ≠ 0, so the vectors are linearly dependent.

4.4.1 Linear Systems in Vector Form

The system

a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1
\vdots                                          (∗∗)
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m

can be written compactly as

x_1a_1 + \cdots + x_na_n = b.

Here a_1, \ldots, a_n are the column vectors of coefficients, and b is the column vector with components b_1, \ldots, b_m.

Suppose that (∗∗) has two solutions (u_1, \ldots, u_n) and (v_1, \ldots, v_n). Then

u_1a_1 + \cdots + u_na_n = b and v_1a_1 + \cdots + v_na_n = b.

Subtracting the second equation from the first yields

(u_1 − v_1)a_1 + \cdots + (u_n − v_n)a_n = 0.

Let c_1 = u_1 − v_1, \ldots, c_n = u_n − v_n. The two solutions are different if and only if c_1, \ldots, c_n are not all equal to 0. We conclude that if system (∗∗) has more than one solution, then the column vectors a_1, \ldots, a_n are linearly dependent.² Equivalently, if the column vectors a_1, \ldots, a_n are linearly independent, then system (∗∗) has at most one solution.

² Is there anything to say when there is no solution? The answer is yes: one can use Farkas' Lemma to check whether there is any solution to the system. See Appendix 1 in this chapter for Farkas' Lemma.


Theorem 4.1 The n column vectors a_1, \ldots, a_n of the n × n matrix

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, where a_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix}, j = 1, \ldots, n,

are linearly independent if and only if |A| ≠ 0.

Proof of Theorem 4.1: The vectors a_1, \ldots, a_n are linearly independent iff the vector equation x_1a_1 + \cdots + x_na_n = 0 has only the trivial solution x_1 = \cdots = x_n = 0. This vector equation is equivalent to a homogeneous system of equations, and therefore it has only the trivial solution iff |A| ≠ 0.⁴

⁴ Check Section 4.2.3 for this argument. Recall that a system of linear equations is homogeneous if it is expressed as Ax = 0.

Definition 4.2 The rank of a matrix A, written r(A), is the maximum number of linearly independent column vectors in A. If A is the 0 matrix, we put r(A) = 0.
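Numerically, linear independence of a set of column vectors can be checked through the rank. A minimal Python sketch, using the vectors of Exercise 4.12:

```python
import numpy as np

# columns are a1 = (1, 2), a2 = (1, 1), a3 = (5, 1)
A = np.column_stack([(1.0, 2.0), (1.0, 1.0), (5.0, 1.0)])
print(np.linalg.matrix_rank(A))   # 2 < 3 columns, so a1, a2, a3 are dependent
```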

4.5 Eigenvalues

4.5.1 Motivations

Consider the matrix

A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}

and the linear transformation

y = Ax:  \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 \\ 3x_2 \end{pmatrix}

The linear transformation (matrix) A extends x_1 into 2x_1 along the x_1 axis and x_2 into 3x_2 along the x_2 axis. Importantly, there is no interaction between x_1 and x_2 through the linear transformation A. This is, I believe, a straightforward extension of linear transformations of R to Rⁿ. Define e_1 = (1, 0) and e_2 = (0, 1) as the unit vectors in R². Then x = x_1e_1 + x_2e_2 and y = 2x_1e_1 + 3x_2e_2. In other words, (e_1, e_2) are the unit vectors in the original space and (2e_1, 3e_2) are their images in the space transformed through A. Next, consider the matrix B as follows:

B = \begin{pmatrix} 1 & −1 \\ 2 & 4 \end{pmatrix}

Now we do not have a clear image of what is going on through the linear transformation B. However, consider the following different vectors f_1 = (1, −1) and

f_2 = (1, −2). Then

Bf_1 = \begin{pmatrix} 1 & −1 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ −1 \end{pmatrix} = \begin{pmatrix} 2 \\ −2 \end{pmatrix} = 2f_1, and Bf_2 = \begin{pmatrix} 1 & −1 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ −2 \end{pmatrix} = \begin{pmatrix} 3 \\ −6 \end{pmatrix} = 3f_2.

This shows that once we take f_1 and f_2 as the new coordinate system, the linear transformation B is the same as A but now along the f_1 and f_2 axes, respectively.

Finally, consider the matrix C below:

C = \begin{pmatrix} 2 & −3 \\ 4 & 2 \end{pmatrix}

It turns out that there is no way of finding a new coordinate system in which the linear transformation C can be seen as either extending or shrinking the vectors along each new axis. The reason we do not find such a new coordinate system is that we restrict our attention to Rⁿ. Once we allow the unit vectors in the new system to have complex-number components, we will again be successful in finding a new coordinate system in which everything is easy to understand.⁵ Consider the different vectors

f_1 = \left(1, \frac{2\sqrt{3}}{3}i\right) and f_2 = \left(1, −\frac{2\sqrt{3}}{3}i\right).

Then

Cf_1 = (2 − 2\sqrt{3}i)f_1, and Cf_2 = (2 + 2\sqrt{3}i)f_2.

In general, then, given a square matrix A, we look for a number λ and a nonzero vector x with the special property that

Ax = λx.  (∗)

In this case, we would have A²x = A(Ax) = A(λx) = λAx = λλx = λ²x and, in general, Aⁿx = λⁿx.

⁵ Those who are interested in the definition of complex numbers should refer to Appendix 2 in this chapter.


Definition 4.3 An eigenvalue of A is a number λ such that there is a nonzero vector x ∈ Rⁿ with

Ax = λx.

Then x is an eigenvector of A (associated with λ).

4.5.2 The Characteristic Equation

To find the eigenvalues of A, rewrite Ax = λx as

(A − λI)x = 0,

where I denotes the identity matrix of order n. Note that this linear system of equations has a solution x ≠ 0 if and only if the coefficient matrix has determinant equal to 0, that is, iff |A − λI| = 0. Letting p(λ) = |A − λI|, where A = (a_{ij})_{n×n}, we have the equation

p(λ) = |A − λI| = \begin{vmatrix} a_{11} − λ & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} − λ & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} − λ \end{vmatrix} = 0.

This is called the characteristic equation of A. From the definition of the determinant, it follows that p(λ) is a polynomial of degree n in λ. According to the fundamental theorem of algebra, it has exactly n roots (real or complex), provided that any multiple roots are counted appropriately.

Theorem 4.2 (The Fundamental Theorem of Algebra, Gauss (1799)) Consider a polynomial equation of degree n in z:

zⁿ + a_{n−1}z^{n−1} + \cdots + a_1z + a_0 = 0,  (∗)

where a_0, \ldots, a_{n−1} ∈ C. Then (∗) has n solutions z_1, \ldots, z_n with the property that z_i ∈ C for each i = 1, \ldots, n. This includes the case in which z_i = z_j for some i ≠ j.

Exercise 4.13 Find the eigenvalues and the associated eigenvectors of the matrices A and B:

A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}, B = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}

In fact, it is convenient to write the characteristic polynomial as a polynomial in −λ:

p(λ) = (−λ)ⁿ + b_{n−1}(−λ)^{n−1} + \cdots + b_1(−λ) + b_0.

The zeros of this characteristic polynomial are precisely the eigenvalues of A. Denoting the eigenvalues by λ_1, λ_2, \ldots, λ_n ∈ C, we have

p(λ) = (−1)ⁿ(λ − λ_1)(λ − λ_2) \cdots (λ − λ_n).

Theorem 4.3 If A is an n × n matrix with eigenvalues λ_1, λ_2, \ldots, λ_n, then

1. |A| = λ_1λ_2 \cdots λ_n;

2. tr(A) = a_{11} + a_{22} + \cdots + a_{nn} = λ_1 + λ_2 + \cdots + λ_n.

Proof of Theorem 4.3: Putting λ = 0, we see that p(0) = b_0 = |A|. On the other hand, λ = 0 in the factored form gives p(0) = (−1)ⁿ(−1)ⁿλ_1λ_2 \cdots λ_n. Since (−1)ⁿ(−1)ⁿ = ((−1)ⁿ)² = 1, we have b_0 = |A| = λ_1λ_2 \cdots λ_n. The product of the elements on the main diagonal of |A − λI| is

(a_{11} − λ)(a_{22} − λ) \cdots (a_{nn} − λ).

If we choose a_{jj} from one of these parentheses and −λ from the remaining n − 1, then add over j = 1, \ldots, n, we obtain the term

(a_{11} + a_{22} + \cdots + a_{nn})(−λ)^{n−1}.

Since we cannot obtain any other terms with (−λ)^{n−1}, we conclude that b_{n−1} = a_{11} + a_{22} + \cdots + a_{nn}, the trace of A. Expanding the factored form (−1)ⁿ(λ − λ_1) \cdots (λ − λ_n) in the same way shows that the coefficient of (−λ)^{n−1} is λ_1 + \cdots + λ_n, which proves 2.

4.6 Diagonalization

Let A and P be n × n matrices with P invertible. Then A and P⁻¹AP have the same eigenvalues. This is true because the two matrices have the same characteristic polynomial:

|P⁻¹AP − λI| = |P⁻¹AP − P⁻¹(λI)P| = |P⁻¹(A − λI)P| = |P⁻¹||A − λI||P| = |A − λI|,

where we use the facts that |P⁻¹| = 1/|P| and |AB| = |A||B| (see Propositions 4.1 and 4.2).

An n × n matrix A is diagonalizable if there exist an invertible n × n matrix P and a diagonal matrix D such that P⁻¹AP = D.

Theorem 4.4 (Diagonalization Theorem) An n × n matrix A is diagonalizable if and only if it has a set of n linearly independent eigenvectors x_1, \ldots, x_n ∈ Cⁿ. In this case,

P⁻¹AP = diag(λ_1, \ldots, λ_n),

where P is the matrix with x_1, \ldots, x_n ∈ Cⁿ as its columns, and λ_1, \ldots, λ_n ∈ C are the corresponding eigenvalues.


Proof of the Diagonalization Theorem: (⟸) Suppose that A has n linearly independent eigenvectors x_1, \ldots, x_n, with corresponding eigenvalues λ_1, \ldots, λ_n. Let P denote the matrix whose columns are x_1, \ldots, x_n. Then AP = PD (note: not DP in general, even though D is diagonal), where D = diag(λ_1, \ldots, λ_n). Because the eigenvectors are linearly independent, P is invertible, so P⁻¹AP = D. (⟹) If A is diagonalizable, there exists an invertible n × n matrix P such that P⁻¹AP = D, i.e., AP = PD. Reading this equation column by column, Ax_j = d_jx_j for each column x_j of P. The columns of P must therefore be eigenvectors of A, and the diagonal elements of D must be the corresponding eigenvalues.
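A quick numerical sketch of the Diagonalization Theorem in Python (the matrix is my own example): numpy.linalg.eig returns the eigenvalues together with a matrix P whose columns are eigenvectors, and P⁻¹AP comes out diagonal.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])       # eigenvalues 5 and 2
eigvals, P = np.linalg.eig(A)                # columns of P are eigenvectors
D = np.linalg.inv(P) @ A @ P
print(eigvals)
print(np.round(D, 10))                       # diagonal, matching eigvals
```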

A matrix P is said to be orthogonal if Pᵀ = P⁻¹, i.e., PᵀP = I. If x_1, \ldots, x_n are the n column vectors of P, then x_1, \ldots, x_n are the row vectors of the transpose Pᵀ. The condition PᵀP = I then reduces to the n² equations x_iᵀx_j = 1 if i = j and x_iᵀx_j = 0 if i ≠ j.

Theorem 4.5 If the matrix A = (a_{ij})_{n×n} is symmetric, then:

1. All the n eigenvalues λ_1, \ldots, λ_n are real.

2. Eigenvectors that correspond to different eigenvalues are orthogonal.

3. There exists an orthogonal matrix P (i.e., Pᵀ = P⁻¹) such that

P⁻¹AP = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix}

The columns v_1, v_2, \ldots, v_n of the matrix P are eigenvectors of unit length corresponding to the eigenvalues λ_1, λ_2, \ldots, λ_n.

Proof of Theorem 4.5: (1) We will show this for n = 2. The eigenvalues of the 2 × 2 matrix A are given by the quadratic equation

|A − λI| = \begin{vmatrix} a_{11} − λ & a_{12} \\ a_{21} & a_{22} − λ \end{vmatrix} = λ² − (a_{11} + a_{22})λ + (a_{11}a_{22} − a_{12}a_{21}) = 0.  (∗)

The roots of the quadratic equation (∗) are

λ = \frac{(a_{11} + a_{22}) ± \sqrt{(a_{11} + a_{22})² − 4(a_{11}a_{22} − a_{12}a_{21})}}{2}.

These roots are real if and only if (a_{11} + a_{22})² ≥ 4(a_{11}a_{22} − a_{12}a_{21}), which is equivalent to

(a_{11} − a_{22})² + 4a_{12}a_{21} ≥ 0.

This is indeed the case, because symmetry gives a_{12} = a_{21}, so that 4a_{12}a_{21} = 4a_{12}² ≥ 0. (2) Suppose that Ax_i = λ_ix_i and Ax_j = λ_jx_j with λ_i ≠ λ_j. Multiplying these equalities from the left by x_jᵀ and x_iᵀ, respectively,

x_jᵀAx_i = λ_ix_jᵀx_i and x_iᵀAx_j = λ_jx_iᵀx_j.

Since A is symmetric, so that A = Aᵀ, the two left-hand sides are equal, and therefore

(λ_i − λ_j)x_iᵀx_j = 0.

Since λ_i − λ_j ≠ 0 by our hypothesis, we must have x_i · x_j = 0, and thus x_i and x_j are orthogonal. (3) Suppose all the (real, as we know from (1)) eigenvalues are different.⁶ Then, according to (2), which we have just shown, the associated eigenvectors are mutually orthogonal. Hence the eigenvectors are linearly independent. When we define P as the collection of the eigenvectors x_1, \ldots, x_n as its columns, it follows that P⁻¹ = Pᵀ. By the Diagonalization Theorem (Theorem 4.4), A is diagonalizable. We can choose the eigenvectors so that they all have length 1, by replacing each x_j with x_j/‖x_j‖.

⁶ If some eigenvalues coincide, the result still holds, but a more careful argument is needed; we omit it.

Exercise 4.14 Let a 2 × 2 symmetric matrix A be given by

A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.

Compute the matrix P described in Theorem 4.5.
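One way to check a hand computation for Exercise 4.14 (a sketch, not a substitute for doing it): for a symmetric matrix, numpy.linalg.eigh returns real eigenvalues and an orthogonal matrix of eigenvectors.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])      # the matrix of Exercise 4.14
eigvals, P = np.linalg.eigh(A)
print(eigvals)                              # [1., 3.]
print(np.round(P.T @ P, 10))                # identity: columns are orthonormal
print(np.round(P.T @ A @ P, 10))            # diag(1, 3)
```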

4.7 Quadratic Forms

A quadratic form in n variables is a function of the form

Q(x_1, \ldots, x_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j,

where the a_{ij} are constants. Suppose we put x = (x_1, \ldots, x_n)ᵀ and A = (a_{ij}). Then it follows from the definition of matrix multiplication that

Q(x_1, \ldots, x_n) = Q(x) = xᵀAx.

Of course, x_ix_j = x_jx_i, so we can write a_{ij}x_ix_j + a_{ji}x_jx_i = (a_{ij} + a_{ji})x_ix_j. If we replace a_{ij} and a_{ji} by (a_{ij} + a_{ji})/2, then the new numbers a_{ij} and a_{ji} become equal without changing Q(x). Thus we can assume that a_{ij} = a_{ji} for all i and j, which means that the matrix A is symmetric. Then A is called the symmetric matrix associated with Q, and Q is called a symmetric quadratic form.

Definition 4.4 A quadratic form Q(x) = xᵀAx, as well as its associated symmetric matrix A, is said to be positive definite, positive semidefinite, negative definite, or negative semidefinite according as

Q(x) > 0, Q(x) ≥ 0, Q(x) < 0, Q(x) ≤ 0

for all x ∈ Rⁿ\{0}. The quadratic form Q(x) is indefinite if there exist vectors x* and y* such that Q(x*) < 0 and Q(y*) > 0.

Let A = (a_{ij}) be any n × n matrix. An arbitrary principal minor of order r is the determinant of the matrix obtained by deleting all but r rows and r columns in A with the same numbers. In particular, a principal minor of order r always includes exactly r elements of the main (principal) diagonal. We call the determinant |A| itself a principal minor (no rows and columns are deleted). A principal minor is said to be a leading principal minor of order r (1 ≤ r ≤ n) if it consists of the first r rows and columns of |A|.

Suppose A is an arbitrary n × n matrix. Its leading principal minors are

D_k = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{vmatrix}, k = 1, \ldots, n.

Exercise 4.15 Let

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.

Compute all the principal minors of A.

Theorem 4.6 Consider the quadratic form

Q(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j  (a_{ij} = a_{ji})

with the associated symmetric matrix A = (a_{ij})_{n×n}. Let D_k be the leading principal minor of A of order k and let Δ_k denote an arbitrary principal minor of order k. Then we have

1. Q is positive definite ⟺ D_k > 0 for k = 1, \ldots, n;

2. Q is positive semidefinite ⟺ Δ_k ≥ 0 for all principal minors of order k = 1, \ldots, n;

3. Q is negative definite ⟺ (−1)ᵏD_k > 0 for k = 1, \ldots, n;

4. Q is negative semidefinite ⟺ (−1)ᵏΔ_k ≥ 0 for all principal minors of order k = 1, \ldots, n.

Proof of Theorem 4.6: We only prove this for n = 2, and only for the definite cases. The quadratic form is

Q(x_1, x_2) = a_{11}x_1² + 2a_{12}x_1x_2 + a_{22}x_2².

After completing the square (assuming a_{11} ≠ 0), we obtain

Q(x_1, x_2) = a_{11}\left(x_1 + \frac{a_{12}}{a_{11}}x_2\right)² + \frac{a_{11}a_{22} − a_{12}²}{a_{11}}x_2².

Thus we obtain

Q(x_1, x_2) > 0 for all (x_1, x_2) ≠ 0 ⟺ a_{11} > 0 and a_{11}a_{22} − a_{12}² > 0;

Q(x_1, x_2) < 0 for all (x_1, x_2) ≠ 0 ⟺ a_{11} < 0 and a_{11}a_{22} − a_{12}² > 0.

Theorem 4.7 Let Q(x) = xᵀAx be a quadratic form with symmetric matrix A, and let λ_1, \ldots, λ_n be the (real) eigenvalues of A. Then,

1. Q is positive definite ⟺ λ_1 > 0, \ldots, λ_n > 0;

2. Q is positive semidefinite ⟺ λ_1 ≥ 0, \ldots, λ_n ≥ 0;

3. Q is negative definite ⟺ λ_1 < 0, \ldots, λ_n < 0;

4. Q is negative semidefinite ⟺ λ_1 ≤ 0, \ldots, λ_n ≤ 0;

5. Q is indefinite ⟺ A has eigenvalues with opposite signs.

Proof of Theorem 4.7: According to Theorem 4.5, there exists an orthogonal matrix P such that PᵀAP = diag(λ_1, \ldots, λ_n). Let y = (y_1, \ldots, y_n)ᵀ be the n × 1 matrix defined by y = Pᵀx. Then x = Py, so that

xᵀAx = (Py)ᵀA(Py) = yᵀPᵀAPy = yᵀdiag(λ_1, \ldots, λ_n)y = λ_1y_1² + λ_2y_2² + \cdots + λ_ny_n².

Also, x = 0 iff y = 0. This completes the proof.
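The two characterizations (Theorem 4.6 via leading principal minors, Theorem 4.7 via eigenvalues) can be cross-checked numerically. A minimal Python sketch on a symmetric matrix of my own choosing:

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])

eigvals = np.linalg.eigvalsh(A)             # real eigenvalues of a symmetric matrix
print(eigvals, "positive definite:", bool(np.all(eigvals > 0)))

leading_minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(leading_minors, "all positive:", all(d > 0 for d in leading_minors))
```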

4.8 Systems of Linear Equations and Inequalities

4.8.1 Preliminaries

A vector y is a linear combination of the vectors in a set S = {x_1, x_2, \ldots} if there are real numbers {λ_j}_{j∈S}, only finitely many of them nonzero, such that

y = \sum_{j∈S} λ_jx_j.

The set of all vectors that can be expressed as a linear combination of vectors in S is

called the span of S and denoted span(S).

Definition 4.6 The rank of a (not necessarily finite) set S of vectors is the size of the largest subset of linearly independent vectors in S.

Definition 4.7 Let S be a set of vectors and let B ⊆ S be finite and linearly independent. The set B of vectors is said to be a maximal linearly independent set if the set B ∪ {x} is linearly dependent for all vectors x ∈ S\B. A maximal linearly independent subset of S is called a basis of S.

Theorem 4.8 Every S ⊆ Rⁿ has a basis. If B is a basis of S, then span(S) = span(B).

From this theorem, one can see that if S has a basis B, then the rank of S equals the rank of B.

Definition 4.8 Let S be a set of vectors. The dimension of span(S) is the rank of S.

Definition 4.9 The kernel or null space of A is the set {x ∈ Rⁿ | Ax = 0}.

The following theorem summarizes the relationship between the span of A and its

kernel.

Theorem 4.10 If A is an m × n matrix, then the dimension of span(A) plus the dimension of the kernel of A is n.

This is sometimes written as

dim[span(A)] + dim[ker(A)] = rank(A) + dim[ker(A)] = n.

The column rank of a matrix is the dimension of the span of its columns. Similarly, the row rank is the dimension of the span of its rows.

Theorem 4.11 Let A be an m × n matrix. Then the column ranks of A and Aᵀ (the transpose of A) are the same.

Thus, the column and row ranks of A are equal. This allows us to define the rank of a matrix A to be the dimension of span(A).


4.8.2 Linear Equations

The fundamental problem of linear algebra is of the following kind: Given b ∈ Rᵐ, find an x ∈ Rⁿ such that Ax = b, or prove that no such x exists.

Convincing another person that Ax = b has a solution (when it does) is easy: one merely exhibits a solution, and they can verify that it does indeed satisfy the equations. What if the system Ax = b does not admit a solution? By framing the problem in the right way, we can bring to bear the machinery of linear algebra. Specifically, given b ∈ Rᵐ, the problem of finding an x ∈ Rⁿ such that Ax = b can be stated as: is b ∈ span(A)?

Theorem 4.12 (Gauss (??)) Let A be an m × n matrix, b ∈ Rᵐ, and F = {x ∈ Rⁿ | Ax = b}. Then either F ≠ ∅ or there exists y ∈ Rᵐ such that yA = 0 and yb ≠ 0, but not both.

but not both.

Suppose F = ∅. Then b is not in the span of the columns of A. If I think of the span of the columns of A as a plane, then b is a vector pointing out of the plane. Thus, some vector y orthogonal to this plane (and so to every column of A) must have a non-zero dot product with b. Now for an algebraic interpretation. Take any linear combination of the equations in the system Ax = b. This linear combination can be obtained by pre-multiplying each side of the equation by a suitable vector y, i.e., yAx = yb. Suppose there is a solution x* to the system, i.e., Ax* = b. Any linear combination of these equations results in an equation that x* satisfies as well. In particular, x* must also be a solution to the resulting equation: yAx* = yb. So if I find a vector y such that yA = 0 but yb ≠ 0, then clearly the original system Ax = b could not have a solution.

Proof: First we prove the 'not both' part. Suppose that F ≠ ∅, and choose any x* ∈ F. Then, for any y ∈ Rᵐ with yA = 0, we have

yb = yAx* = (yA)x* = 0,

which contradicts the existence of a y with yA = 0 and yb ≠ 0.

If F ≠ ∅, we are done. Suppose, then, that F = ∅. Hence b cannot be in the span of the columns of A. Thus the rank r′ of C = [A, b] is one larger than the rank r of A; that is, r′ = r + 1. Since C is an m × (n + 1) matrix,

rank(Cᵀ) + dim[ker(Cᵀ)] = m = rank(Aᵀ) + dim[ker(Aᵀ)].

Using the fact that the rank of a matrix and that of its transpose coincide, rank(Cᵀ) = rank(Aᵀ) + 1, i.e., dim[ker(Cᵀ)] = dim[ker(Aᵀ)] − 1. Since the dimension of ker(Cᵀ) is one smaller than the dimension of ker(Aᵀ), we can find a y ∈ ker(Aᵀ) that is not in ker(Cᵀ). Hence yA = 0 but yb ≠ 0.


4.8.3 Linear Inequalities

The next problem is: given b ∈ Rᵐ, find an x ∈ Rⁿ such that Ax ≤ b, or show that no such x exists. The problem differs from the earlier one in that '=' has been replaced by '≤'.

4.8.4 Non-Negative Solutions

Now we ask for non-negative solutions: given b ∈ Rᵐ, find an x ∈ Rⁿ such that Ax = b and x ≥ 0, or show that no such x exists. Observe that if b = 0, the problem is trivial, so I assume that b ≠ 0.

Definition 4.10 A set C of vectors is called a cone if λx ∈ C whenever x ∈ C and λ > 0.

Definition 4.11 The set of all non-negative linear combinations of the columns of A is called the finite cone generated by the columns of A. It is denoted cone(A).

Note the difference between span(A) and cone(A) below:

span(A) = {y ∈ Rᵐ | y = Ax for some x ∈ Rⁿ}

and

cone(A) = {y ∈ Rᵐ | y = Ax for some x ∈ Rⁿ₊}.

Theorem 4.13 (Farkas' Lemma (1902)) Let A be an m × n matrix, b ∈ Rᵐ, and F = {x ∈ Rⁿ | Ax = b, x ≥ 0}. Then either F ≠ ∅ or there exists y ∈ Rᵐ such that yA ≥ 0 and y · b < 0, but not both.

Take any linear combination of the equations in Ax = b to get yAx = yb. A non-negative solution to the first system is a solution to the second. If we can choose y so that yA ≥ 0 and y · b < 0, we find that the left-hand side of the single equation yAx = yb is at least zero while the right-hand side is negative, a contradiction. Thus the first system cannot have a non-negative solution.

Proof: First we prove that both statements cannot hold simultaneously. Suppose not. Let x* ≥ 0 be a solution to Ax = b and y* a solution to yA ≥ 0 such that y* · b < 0. Notice that x* must be a solution to y*Ax = y* · b. Thus y*Ax* = y* · b. But then

0 ≤ y*Ax* = y* · b < 0,

which is a contradiction.

If b ∉ span(A) (i.e., there is no x such that Ax = b), then by the previous theorem (Theorem 4.12) there is a y ∈ Rᵐ such that yA = 0 and yb ≠ 0. If it so happens that the given y has the property that yb < 0, we are done. If yb > 0, then negate y and again we are done. So we may suppose that b ∈ span(A) but b ∉ cone(A), i.e., F = ∅.


Since b ∈ span(A), we can express b as a linear combination of an r-subset D of linearly independent columns of A. Let D = {a^{i_1}, \ldots, a^{i_r}} and b = \sum_{t=1}^{r} λ_{i_t}a^{i_t}. Note that D is linearly independent. Since b ∉ cone(A), at least one of the λ_{i_t} is negative.

Now apply the following four-step procedure repeatedly. Subsequently, we show that the procedure must terminate.

1. Choose the smallest index h amongst {i_1, \ldots, i_r} with λ_h < 0.

2. Choose y so that y · a = 0 for all a ∈ D\{a^h} and y · a^h ≠ 0. This can be done by the previous theorem (Theorem 4.12) because a^h ∉ span(D\{a^h}). Normalize y so that y · a^h = 1. Observe that y · b = λ_h < 0.

3. If y · a^j ≥ 0 for all columns a^j of A, stop; the proof is complete.

4. Otherwise, choose the smallest index w amongst {1, \ldots, n} such that y · a^w < 0. Note that a^w ∉ D\{a^h}. Replace D by (D\{a^h}) ∪ {a^w}, i.e., exchange a^h for a^w.

To complete the proof, we must show that the procedure terminates. Let D^k denote the set D at the start of the kth iteration of the four-step procedure described above. If the procedure does not terminate, there is a pair (k, ℓ) with k < ℓ such that D^k = D^ℓ, i.e., the procedure cycles.

Let s be the largest index for which a^s has been removed from D at the end of one of the iterations k, k + 1, \ldots, ℓ − 1, say iteration p. Since D^ℓ = D^k, there is a q such that a^s is inserted into D^q at the end of iteration q, where k ≤ q < ℓ. No assumption is made about whether p < q or p > q. Notice that

D^p ∩ {a^{s+1}, \ldots, a^n} = D^q ∩ {a^{s+1}, \ldots, a^n}.

Let D^p = {a^{i_1}, \ldots, a^{i_r}}, b = λ_{i_1}a^{i_1} + \cdots + λ_{i_r}a^{i_r}, and let y be the vector found in step 2 of iteration q. Then

0 > y · b = y · (λ_{i_1}a^{i_1} + \cdots + λ_{i_r}a^{i_r}) = λ_{i_1}(y · a^{i_1}) + \cdots + λ_{i_r}(y · a^{i_r}) > 0,

which is a contradiction. The first inequality comes from step 2 of iteration q. To see why the last inequality must be true:

When i_j < s, we have from Step 1 of iteration p that λ_{i_j} ≥ 0, and from Step 4 of iteration q that y · a^{i_j} ≥ 0.

When i_j = s, we have from Step 1 of iteration p that λ_{i_j} < 0, and from Step 4 of iteration q that y · a^{i_j} < 0.

When i_j > s, we have from D^p ∩ {a^{s+1}, \ldots, a^n} = D^q ∩ {a^{s+1}, \ldots, a^n} and Step 2 of iteration q that y · a^{i_j} = 0.

This completes the proof.
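Here is a small numerical illustration of the Farkas alternative on a toy example of my own (scipy's non-negative least squares routine is assumed to be available): the system Ax = b, x ≥ 0 is infeasible, and an explicit y certifies this.

```python
import numpy as np
from scipy.optimize import nnls             # non-negative least squares

A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])                   # negative component, so Ax = b, x >= 0 fails

x, residual = nnls(A, b)                    # residual > 0 means infeasible over x >= 0
print("residual of Ax = b over x >= 0:", residual)

y = np.array([0.0, 1.0])                    # a Farkas certificate: yA >= 0 and y.b < 0
print("yA =", y @ A, "  y.b =", y @ b)
```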


4.8.5 Converting to Standard Form

The problem of deciding the solvability of a general system of linear equations and inequalities can be reduced to the problem of deciding if Bz = b, z ≥ 0 has a solution for a suitable matrix B.

First observe that any inequality of the form \sum_j a_{ij}x_j ≥ b_i can be turned into an equation by the subtraction of a surplus variable s_i. That is, define a new variable s_i ≥ 0 such that

\sum_j a_{ij}x_j − s_i = b_i.

Similarly, an inequality of the form \sum_j a_{ij}x_j ≤ b_i can be converted into an equation by the addition of a slack variable s_i ≥ 0 as follows:

\sum_j a_{ij}x_j + s_i = b_i.

A variable x_j that is unrestricted in sign can be replaced by two non-negative variables z_j′ and z_j″ by setting x_j = z_j′ − z_j″. In this way any inequality system can be converted into an equality system with non-negative variables. We will refer to this as converting into standard form.

As an example, we derive the Farkas alternative for the system {x | Ax ≤ b, x ≥ 0}. Deciding the solvability of Ax ≤ b for x ≥ 0 is equivalent to the solvability of Ax + Is = b, where x, s ≥ 0. Set B = [A | I] and z = (x, s)ᵀ, and we can write the system as Bz = b, z ≥ 0. Now apply Farkas' Lemma to this system: the alternative is

yB ≥ 0, yb < 0.

Since 0 ≤ yB = y[A | I] implies yA ≥ 0 and y ≥ 0, the Farkas alternative is {y | yA ≥ 0, y ≥ 0, yb < 0}. The principle here is that, by a judicious use of auxiliary variables, one can convert almost anything into standard form.

4.9 Linear Spaces

4.9.1 Number Fields

Linear algebra makes use of number systems (number fields). By a number field I mean any set K of objects, called numbers, which, when subjected to the four arithmetic operations, again give elements of K. More exactly, these operations have the following properties F1, F2, and F3 (the field axioms):

F1: To every pair of numbers α and β in K, there corresponds a (unique) number α + β in K, called the sum of α and β, where

1. α + β = β + α for all α, β ∈ K (addition is commutative);


2. (α + β) + γ = α + (β + γ) for all α, β, γ ∈ K (addition is associative);

3. There exists a number 0 (zero) in K such that 0 + α = α for all α ∈ K;

4. For every α ∈ K, there exists a number −α (the negative element) in K such that α + (−α) = 0.

The solvability of the equation α + x = 0 allows us to carry out the operation of subtraction, by defining the difference β − α as the sum of the number β and the solution x of the equation α + x = 0.

F2: To every pair of numbers α, β ∈ K, there corresponds a (unique) number αβ (or α·β) in K, called the product of α and β, where

1. αβ = βα for all α, β ∈ K (multiplication is commutative);

2. α(βγ) = (αβ)γ for all α, β, γ ∈ K (multiplication is associative);

3. There exists a number 1 (≠ 0) in K such that 1·α = α for all α ∈ K;

4. For every α ≠ 0 in K, there exists a number α⁻¹ (the reciprocal element) in K such that αα⁻¹ = 1.

F3: Multiplication is distributive over addition, i.e., for every α, β, γ ∈ K,

(α + β)γ = αγ + βγ.

The solvability of the equation αx = 1 for every α ≠ 0 allows us to carry out the operation of division, by defining the quotient β/α as the product of the number β and the solution x of the equation αx = 1.

The numbers 1, 1 + 1 = 2, 2 + 1 = 3, etc. are said to be natural; it is assumed that none of these numbers is zero. The set of natural numbers is denoted by N. By the integers in a field K, we mean the set of all natural numbers together with their negatives and the number zero. The set of integers is denoted by Z. By the rational numbers in a field K, we mean the set of all quotients p/q, where p and q are integers and q ≠ 0. The set of rational numbers is denoted by Q.

Two fields K and K′ are said to be isomorphic if we can set up a one-to-one correspondence between K and K′ such that the number associated with every sum (or product) of numbers in K is the sum (or product) of the corresponding numbers in K′. The number associated with every difference (or quotient) of numbers in K will then be the difference (or quotient) of the corresponding numbers in K′.

The most commonly encountered concrete examples of number fields are the following:

1. The field of rational numbers, whose elements p/q with q ≠ 0 are built from the ordinary integers, subject to the ordinary operations of arithmetic. It should be noted that the integers by themselves do not form a field, since they do not satisfy axiom F2-4. It follows that every field K has a subset isomorphic to the field of rational numbers.

2. The field of real numbers, having the set of all points of the real line as its geometric counterpart. The set of real numbers is denoted by R. An axiomatic treatment of the field of real numbers is achieved by supplementing axioms F1, F2, F3 with the axioms of order and the least upper bound principle.

3. The field of complex numbers of the form a + ib, where a and b are real numbers (i is not a real number), equipped with the following operations of addition and multiplication:

(a_1 + ib_1) + (a_2 + ib_2) = (a_1 + a_2) + i(b_1 + b_2),

(a_1 + ib_1)(a_2 + ib_2) = (a_1a_2 − b_1b_2) + i(a_1b_2 + a_2b_1).

The set of complex numbers is denoted by C. For numbers of the form a + i0, these operations reduce to the corresponding operations for real numbers; briefly, I write a + i0 = a and call complex numbers of this form real. Thus it can be said that the field of complex numbers has a subset isomorphic to the field of real numbers. Complex numbers of the form 0 + ib are said to be (purely) imaginary and are designated briefly by ib. It follows from the multiplication rule that

i² = i·i = (0 + i1)(0 + i1) = −1.

4.9.2 Definitions

The concept of a linear space generalizes that of the set of all vectors. The generalization consists, first, in getting away from the concrete nature of the objects involved (directed line segments) without changing the properties of the operations on the objects, and, secondly, in getting away from the concrete nature of the admissible numerical factors (real numbers). This leads to the following definition.

Definition 4.12 A set V is called a linear (or affine) space over a field K if

1. Given any two elements x, y ∈ V, there is a rule (the addition rule) leading to a (unique) element x + y ∈ V, called the sum of x and y;

2. Given any element x ∈ V and any number λ ∈ K, there is a rule (multiplication by a number) leading to a (unique) element λx ∈ V, called the product of the element x and the number λ;

3. These two rules obey the axioms listed below, VS1 and VS2.

VS1: The addition rule has the following properties:


1. x + y = y + x for every x, y ∈ V;

2. (x + y) + z = x + (y + z) for every x, y, z ∈ V;

3. There exists an element 0 ∈ V (the zero vector) such that x + 0 = x for every x ∈ V;

4. For every x ∈ V, there exists an element y ∈ V (the negative element) such that x + y = 0.

VS2: The rule for multiplication by a number has the following properties:

1. 1·x = x for every x ∈ V;

2. α(βx) = (αβ)x for every x ∈ V and α, β ∈ K;

3. (α + β)x = αx + βx for every x ∈ V and α, β ∈ K;

4. α(x + y) = αx + αy for every x, y ∈ V and every α ∈ K.

4.9.3 Bases and Dimension

A system of linearly independent vectors e_1, e_2, \ldots, e_n in a linear space V over a field K is called a basis for V if, given any x ∈ V, there exists an expansion

x = ξ_1e_1 + ξ_2e_2 + \cdots + ξ_ne_n  (∗)

where ξ_j ∈ K for every j = 1, \ldots, n.

It is easy to see that under these conditions, the coefficients in the expansion (∗) are uniquely determined. In fact, if we can write two expansions

x = ξ_1e_1 + ξ_2e_2 + \cdots + ξ_ne_n,  x = ξ′_1e_1 + ξ′_2e_2 + \cdots + ξ′_ne_n

for a vector x, then, subtracting them term by term, we obtain the relation

0 = (ξ_1 − ξ′_1)e_1 + (ξ_2 − ξ′_2)e_2 + \cdots + (ξ_n − ξ′_n)e_n,

from which, by the assumption that the vectors e_1, e_2, \ldots, e_n are linearly independent, we find that

ξ_1 = ξ′_1, ξ_2 = ξ′_2, \ldots, ξ_n = ξ′_n.

The uniquely defined numbers ξ_1, \ldots, ξ_n are called the components of the vector x with respect to the basis e_1, \ldots, e_n.

The fundamental significance of the concept of a basis for a linear space consists in the fact that, when a basis is specified, the originally abstract linear operations in the space become ordinary linear operations with numbers, i.e., with the components of the vectors with respect to the given basis. In fact, we have the following.

Theorem 4.14 When two vectors of a linear space V are added, their components (with respect to any basis) are added. When a vector is multiplied by a number λ, all its components are multiplied by λ.

If, in a linear space V, we can find n linearly independent vectors while every n + 1 vectors of the space are linearly dependent, then the number n is called the dimension of the space V, and the space V itself is called n-dimensional. A linear space in which we can find an arbitrarily large number of linearly independent vectors is called infinite-dimensional.

Theorem 4.15 In a space V of dimension n, there exists a basis consisting of n vectors.

Moreover, any set of n linearly independent vectors of the space V is a basis for the space.

Theorem 4.16 If there is a basis in the space V , then the dimension of V equals the

number of basis vectors.

4.9.4 Subspaces

Suppose that a set W of elements of a linear space V has the following properties:

1. If x, y ∈ W, then x + y ∈ W;

2. If x ∈ W and λ is an element of the field K, then λx ∈ W.

Every set W ⊆ V with properties 1 and 2 above is called a linear subspace (or simply a subspace) of the space V.

Definition 4.15 (The Direct Sum) A linear space W is the direct sum of given subspaces W_1, \ldots, W_m ⊆ W if the following two conditions are satisfied:

1. For every x ∈ W, there exists an expansion

x = x_1 + \cdots + x_m,

where x_1 ∈ W_1, \ldots, x_m ∈ W_m;

2. This expansion is unique, i.e., if

x = x_1 + \cdots + x_m = y_1 + \cdots + y_m,

where x_j, y_j ∈ W_j (j = 1, \ldots, m), then x_1 = y_1, \ldots, x_m = y_m.

Theorem 4.17 Let W_1 be a fixed subspace of an n-dimensional space V_n. Then there always exists a subspace W_2 ⊆ V_n such that the whole space V_n is the direct sum of W_1 and W_2.


4.9.5 Morphisms and Isomorphisms

Definition 4.16 Let ω be a rule which assigns, to every given vector x of a linear space V, a vector ω(x) in a linear space V′. Then ω is called a morphism (or linear operator) if the following two conditions hold:

1. ω(x + y) = ω(x) + ω(y) for every x, y ∈ V;

2. ω(λx) = λω(x) for every x ∈ V and every λ ∈ K.

A morphism that is a one-to-one correspondence between V and V′ is called an isomorphism, and the spaces V and V′ themselves are said to be isomorphic (more exactly, K-isomorphic).

Theorem 4.18 Any two n-dimensional spaces V and V′ (over the same field K) are K-isomorphic.

Corollary 4.1 Every n-dimensional linear space over a field K is K-isomorphic to the space Kⁿ. In particular, every n-dimensional complex space is C-isomorphic to the space Cⁿ, and every n-dimensional real space is R-isomorphic to the space Rⁿ.


Chapter 5

Calculus

5.1 Functions of One Variable

Consider a function f of one real variable whose graph is smooth, with no breaks or kinks. The derivative of f is a function giving, at each value of x, the slope of the graph of f at x. We sometimes write

\frac{dy}{dx} = f′(x)

to indicate that f′(x) gives us the (instantaneous) amount dy by which y changes per unit change dx in x. If the first derivative is a differentiable function, we can take its derivative, which gives the second derivative of the original function:

\frac{d²y}{dx²} = f″(x).

If a function possesses continuous derivatives f′, f″, \ldots, f⁽ⁿ⁾, it is called n-times continuously differentiable, or a Cⁿ function. Some rules of differentiation are provided below:

For constants α: \frac{d}{dx}(α) = 0.

Power rule: \frac{d}{dx}(xⁿ) = nx^{n−1}.

Later in these notes, on multivariate calculus, we are going to discuss some of the above properties in detail from a more general perspective. Until then, just remember them so that you can use them anytime.


5.2 Functions of Several Variables

For vectors x, y ∈ Rⁿ, define the following: x ≥ y if x_i ≥ y_i for every i = 1, \ldots, n; and x ≫ y if x_i > y_i for every i = 1, \ldots, n.

Definition 5.1 Let f : D → R, where D is a subset of Rⁿ. Then f is nondecreasing if f(x) ≥ f(y) whenever x ≥ y. If, in addition, the inequality is strict whenever x ≫ y, then we say that f is increasing. If, instead, f(x) > f(y) whenever x ≥ y and x ≠ y, then we say that f is strongly increasing.

Rather than having a single slope, a function of n variables can be thought of as having n partial slopes, each giving only the rate at which y would change if one x_i alone were to change. Each of these partial slopes is called a partial derivative.

Definition 5.2 Let y = f(x_1, \ldots, x_n). The partial derivative of f with respect to x_i is defined as

\frac{∂f(x)}{∂x_i} ≡ \lim_{h→0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) − f(x_1, \ldots, x_i, \ldots, x_n)}{h}.

The notations ∂y/∂x_i or f_i(x) are also used to denote partial derivatives.

5.3 Gradients

If z = F(x, y) and C is any number, we call the graph of the equation F(x, y) = C a level curve for F. The slope of the level curve F(x, y) = C at a point (x, y) is given by the formula

F(x, y) = C ⟹ \frac{dy}{dx} = −\frac{∂F(x, y)/∂x}{∂F(x, y)/∂y} = −\frac{F_1(x, y)}{F_2(x, y)}.

If (x_0, y_0) is a particular point on the level curve F(x, y) = C, the slope at (x_0, y_0) is −F_1(x_0, y_0)/F_2(x_0, y_0). The equation for the tangent hyperplane T is

y − y_0 = −[F_1(x_0, y_0)/F_2(x_0, y_0)](x − x_0)

or, rearranging,

F_1(x_0, y_0)(x − x_0) + F_2(x_0, y_0)(y − y_0) = 0.

Recalling the inner product, the equation can be written as

(F_1(x_0, y_0), F_2(x_0, y_0)) · (x − x_0, y − y_0) = 0.

The vector (F_1(x_0, y_0), F_2(x_0, y_0)) is said to be the gradient of F at (x_0, y_0) and is often denoted by ∇F(x_0, y_0) (∇ is pronounced 'nabla'). The vector (x − x_0, y − y_0) is a vector on the tangent hyperplane T, which implies that ∇F(x_0, y_0) is orthogonal to the tangent hyperplane T at (x_0, y_0).

Suppose more generally that F(x) = F(x_1, \ldots, x_n) is a function of n variables defined on an open set A in Rⁿ, and let x⁰ = (x⁰_1, \ldots, x⁰_n) be a point in A. The gradient of F at x⁰ is the vector

∇F(x⁰) = \left(\frac{∂F(x⁰)}{∂x_1}, \ldots, \frac{∂F(x⁰)}{∂x_n}\right)

of first-order partial derivatives.

5.4 Directional Derivatives

The partial derivative ∂f/∂x_i measures the rate of change of f(x) in the direction parallel to the ith coordinate axis. Each partial derivative says nothing about the behavior of f in other directions. We introduce the concept of the directional derivative in order to measure the rate of change of f in an arbitrary direction.

Consider the vector x = (x_1, \ldots, x_n) and let a = (a_1, \ldots, a_n) ∈ Rⁿ\{0} be a given vector. If we move a distance h‖a‖ > 0 from x in the direction given by a, we arrive at x + ha. The average rate of change of f from x to x + ha is then (f(x + ha) − f(x))/h. We define the derivative of f along the vector a by

f′_a(x) = \lim_{h→0} \frac{f(x + ha) − f(x)}{h}

or, with components,

f′_a(x_1, \ldots, x_n) = \lim_{h→0} \frac{f(x_1 + ha_1, \ldots, x_n + ha_n) − f(x_1, \ldots, x_n)}{h}.

We assume that x + ha lies in the domain of f for all sufficiently small h. This is one reason why the domain is generally assumed to be open. In particular, with a_i = 1 and a_j = 0 for all j ≠ i, this derivative is the partial derivative of f with respect to x_i.

Suppose f is C¹ in a set A,¹ and let x be an interior point of A. For an arbitrary vector a, define the function g by

g(h) = f(x + ha) = f(x_1 + ha_1, \ldots, x_n + ha_n).

¹ A function f : Rⁿ → R is continuously differentiable (or C¹) on an open set A ⊆ Rⁿ if, for each i = 1, \ldots, n, (∂f/∂x_i)(x) exists for all x ∈ A and is continuous in x. f is k-times continuously differentiable, or Cᵏ, on A if all the derivatives of f of order less than or equal to k (≥ 1) exist and are continuous on A.


Then (g(h) − g(0))/h = (f(x + ha) − f(x))/h. Letting h tend to 0, we have g′(0) = f′_a(x). Since g′(h) = \sum_{i=1}^{n} f_i(x + ha)a_i, we get g′(0) = \sum_{i=1}^{n} f_i(x)a_i. Hence,

f′_a(x) = \sum_{i=1}^{n} f_i(x)a_i = ∇f(x) · a.

This equation shows that the derivative of f along the vector a is equal to the inner product of the gradient of f and a. If ‖a‖ = 1, the number f′_a(x) is called the directional derivative of f at x in the direction a.
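The identity f′_a(x) = ∇f(x) · a is easy to verify numerically. A minimal Python sketch, with a sample function and direction of my own choosing:

```python
import numpy as np

f = lambda x: x[0] ** 2 + 3.0 * x[0] * x[1]          # f(x1, x2) = x1^2 + 3 x1 x2
grad = lambda x: np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([1.0, 2.0])
a = np.array([3.0, 4.0]) / 5.0                        # unit vector, ||a|| = 1

h = 1e-6
finite_diff = (f(x + h * a) - f(x)) / h               # definition of f'_a(x)
print(finite_diff, grad(x) @ a)                       # both ~ 7.2
```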

Theorem 5.1 Suppose that f(x) = f(x_1, \ldots, x_n) is C¹ in an open set A. Then, at points x where ∇f(x) ∈ Rⁿ\{0}, the gradient ∇f(x) = (f_1(x), \ldots, f_n(x)) satisfies:

1. ∇f(x) is orthogonal to the level surface through x.

2. ∇f(x) points in the direction of maximal increase of f.

3. ‖∇f(x)‖ measures how fast the function increases in the direction of maximal increase.

Proof of Theorem 5.1: Introducing θ as the angle between the vectors ∇f(x) and a, we have

f′_a(x) = ∇f(x) · a = ‖∇f(x)‖‖a‖ cos θ.

Note that cos θ ≤ 1 for all θ and cos 0 = 1. So when ‖a‖ = 1, it follows that at points where ∇f(x) ≠ 0, the number f′_a(x) is largest when θ = 0, i.e., when a points in the same direction as ∇f(x), while f′_a(x) is smallest when θ = π, that is, cos θ = −1, i.e., when a points in the opposite direction to ∇f(x). Moreover, it follows that the length of ∇f(x) equals the magnitude of the maximum directional derivative.

Theorem 5.2 (The Mean-Value Theorem) Suppose that f : Rⁿ → R is C¹ in an open set containing the segment [x, y]. Then there exists a point w ∈ (x, y) such that

f(x) − f(y) = ∇f(w) · (x − y).

Proof of the Mean-Value Theorem: We take the mean-value theorem for functions of one variable as given. Define φ(λ) = f(λx + (1 − λ)y). Then, using the chain rule, which we will cover later, φ′(λ) = ∇f(λx + (1 − λ)y) · (x − y). According to the mean-value theorem for functions of one variable, there exists a number λ_0 ∈ (0, 1) such that φ(1) − φ(0) = φ′(λ_0). Putting w = λ_0x + (1 − λ_0)y, the theorem follows.

5.5 Convex Sets

Convex sets are basic building blocks in virtually every area of microeconomic theory. Convexity is most often assumed to guarantee that the analysis is mathematically tractable and that the results are clear-cut and well-behaved.


Definition 5.3 A set S ⊆ Rⁿ is convex if for all x, y ∈ S,

λx + (1 − λ)y ∈ S for all λ ∈ [0, 1].

We say that z is a convex combination of x and y if z = λx + (1 − λ)y for some λ ∈ [0, 1]. We have a very simple and intuitive rule defining convex sets: a set is convex if and only if we can connect any two points in the set by a straight line that lies entirely within the set.

Exercise 5.1 Suppose that p ≫ 0 and y ≥ 0. Let B(p, y) = {x ∈ Rⁿ₊ | p · x ≤ y} be the budget set of the consumer. Show that B(p, y) is convex.

Theorem 5.3 Let S and T be convex sets in Rⁿ. Then S ∩ T is a convex set.

Proof of Theorem 5.3: Let x and y be any two points in S ∩ T. Because x ∈ S ∩ T, we have x ∈ S and x ∈ T. Similarly, we have y ∈ S and y ∈ T. Let z = λx + (1 − λ)y for some λ ∈ [0, 1] be any convex combination of x and y. Then z ∈ S because S is convex, and z ∈ T because T is convex. Thus z ∈ S ∩ T.

Exercise 5.2 Construct an example in which two sets S and T are convex but S ∪ T is not convex.

5.5.1 Upper Contour Sets

Let u(·) : Rⁿ₊ → R be a utility function. Define UC(x⁰) = {x ∈ Rⁿ₊ | u(x) ≥ u(x⁰)}. This UC(x⁰) is called the upper contour set, which consists of all commodity vectors x that the individual values at least as highly as x⁰. In consumer theory, we usually assume that UC(x⁰) is convex for every x⁰ ∈ Rⁿ₊.

5.6 Concave and Convex Functions

A C² function f of one variable is concave (convex) on an interval I if and only if f″(x) ≤ (≥) 0 for all x ∈ I.

Definition 5.4 A function f(x) = f(x_1, \ldots, x_n) defined on a convex set S is concave (convex) on S if

f(λx + (1 − λ)x′) ≥ (≤) λf(x) + (1 − λ)f(x′)

for all x, x′ ∈ S and all λ ∈ [0, 1].

Definition 5.5 A function f(x) = f(x_1, \ldots, x_n) defined on a convex set S is strictly concave (convex) on S if

f(λx + (1 − λ)x′) > (<) λf(x) + (1 − λ)f(x′)

for all x, x′ ∈ S with x ≠ x′ and all λ ∈ (0, 1).


5.7 Characterizations of Concavity and Convexity

Let f(x) = f(x_1, \ldots, x_n) be a C² function. The matrix

D²f(x) = (f_{ij}(x))_{n×n}

is called the Hessian (matrix) of f at x, and the n determinants

|D²_{(r)}f(x)| = \begin{vmatrix} f_{11}(x) & f_{12}(x) & \cdots & f_{1r}(x) \\ f_{21}(x) & f_{22}(x) & \cdots & f_{2r}(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_{r1}(x) & f_{r2}(x) & \cdots & f_{rr}(x) \end{vmatrix}, r = 1, \ldots, n,

are the leading principal minors of D²f(x) of order r. Here f_{ij}(x) = ∂²f(x)/∂x_i∂x_j for any i, j = 1, \ldots, r.

Theorem 5.4 (Second-Order Characterization of Concave (Convex) Functions) Suppose that f(x) = f(x_1, \ldots, x_n) is a C² function defined on an open, convex set S in Rⁿ. Let Δ²_{(r)}f(x) denote a generic principal minor of order r of the Hessian matrix. Then

1. f is convex in S ⟺ Δ²_{(r)}f(x) ≥ 0 for all x ∈ S and all Δ²_{(r)}f(x), r = 1, \ldots, n.

2. f is concave in S ⟺ (−1)ʳΔ²_{(r)}f(x) ≥ 0 for all x ∈ S and all Δ²_{(r)}f(x), r = 1, \ldots, n.

Proof of Theorem 5.4: (⟸) The proof relies on the chain rule (Theorem 5.15), which we are going to cover later in this course; just take it for granted until then. Take two points x, x⁰ ∈ S and let t ∈ [0, 1]. Define

g(t) = f(x⁰ + t(x − x⁰)) = f(tx + (1 − t)x⁰).

The chain rule for functions of several variables gives

g′(t) = (x − x⁰)ᵀ∇f(x⁰ + t(x − x⁰)) = \sum_{i=1}^{n} f_i(x⁰ + t(x − x⁰))(x_i − x⁰_i)

and

g″(t) = (x − x⁰)ᵀ[D²f(x⁰ + t(x − x⁰))](x − x⁰) = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}(x⁰ + t(x − x⁰))(x_i − x⁰_i)(x_j − x⁰_j).

By our hypothesis, together with Theorem 4.6 on quadratic forms, g″(t) ≥ 0 for any t ∈ [0, 1]. This shows that g(·) is convex. In particular, we have

g(t) = g(t·1 + (1 − t)·0) ≤ tg(1) + (1 − t)g(0) = tf(x) + (1 − t)f(x⁰).

But this shows that f(·) is convex. The concavity claim follows easily by replacing f with −f. (⟹) Suppose f(·) is convex. According to Theorem 4.6 on quadratic forms, it suffices to show that for all x ∈ S and all h_1, \ldots, h_n, we have

Q = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}(x)h_ih_j ≥ 0.

Since S is open, there exists a positive number a such that x + th ∈ S for all t with |t| < a. Let I = (−a, a). Define the function p on I by p(t) = f(x + th). Since p(·) is convex on I,

p″(t) = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}(x + th)h_ih_j ≥ 0

for all t ∈ I. Putting t = 0, it follows that Q ≥ 0. This completes the proof.

Corollary 5.1 Let z = f(x, y) be a C² function defined on an open convex set S ⊆ R². Then,

1. f is convex ⟺ f_{11} ≥ 0, f_{22} ≥ 0, and f_{11}f_{22} − (f_{12})² ≥ 0;

2. f is concave ⟺ f_{11} ≤ 0, f_{22} ≤ 0, and f_{11}f_{22} − (f_{12})² ≥ 0.

Exercise 5.3 Let f(x, y) = 2x − y − x² + 2xy − y² for all (x, y) ∈ R². Check whether f is concave, convex, or neither.

Exercise 5.4 The CES (Constant Elasticity of Substitution) function f is defined for K > 0, L > 0 by

f(K, L) = A[δK^{−ρ} + (1 − δ)L^{−ρ}]^{−1/ρ},

where A > 0, ρ ≠ 0, and 0 ≤ δ ≤ 1. Show that f is concave if ρ ≥ −1 and convex if ρ ≤ −1.

Theorem 5.5 (Second-Order (Partial) Characterization of Strict Concavity) Suppose that f(x) = f(x_1, \ldots, x_n) is a C² function defined on an open, convex set S in Rⁿ. Let D²_{(r)}f(x) be defined as above. Then

1. D²_{(r)}f(x) > 0 for all x ∈ S and all r = 1, \ldots, n ⟹ f is strictly convex.

2. (−1)ʳD²_{(r)}f(x) > 0 for all x ∈ S and all r = 1, \ldots, n ⟹ f is strictly concave.

Proof of Theorem 5.5: Define the function g(·) as in the proof of Theorem 5.4 above. If the conditions in 1 are satisfied, the Hessian matrix D²f(x) is positive definite by Theorem 4.6 on quadratic forms. So, for x ≠ x⁰, g″(t) > 0 for all t ∈ [0, 1]. It follows that g(·) is strictly convex. Then we have

g(t) = g(t·1 + (1 − t)·0) < tg(1) + (1 − t)g(0) = tf(x) + (1 − t)f(x⁰)

for all t ∈ (0, 1). The strict concavity claim is obtained by replacing f with −f.


Corollary 5.2 Let z = f(x, y) be a C² function defined on an open convex set S ⊆ R². Then,

1. f_{11} > 0 and f_{11}f_{22} − (f_{12})² > 0 ⟹ f is strictly convex.

2. f_{11} < 0 and f_{11}f_{22} − (f_{12})² > 0 ⟹ f is strictly concave.
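A tiny numerical companion to the corollary: for f(x, y) = −x² + xy − y² (my own example), the Hessian is constant, so the sign conditions for strict concavity can be checked once and hold everywhere.

```python
import numpy as np

H = np.array([[-2.0, 1.0], [1.0, -2.0]])     # Hessian of f(x, y) = -x^2 + xy - y^2
D1 = H[0, 0]                                 # leading principal minor of order 1
D2 = np.linalg.det(H)                        # leading principal minor of order 2
print(D1, D2)                                # -2.0, 3.0
print("strictly concave:", D1 < 0 and D2 > 0)   # (-1)^r D_r > 0 for r = 1, 2
```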

Theorem 5.6 (First-Order Characterization of Concavity) Suppose that f(·) is a C¹ function defined on an open, convex set S in Rⁿ. Then

1. f is concave in S if and only if

f(x) − f(x⁰) ≤ ∇f(x⁰) · (x − x⁰) = \sum_{i=1}^{n} \frac{∂f(x⁰)}{∂x_i}(x_i − x⁰_i)

for all x, x⁰ ∈ S.

2. f(·) is strictly concave iff the above inequality is always strict when x ≠ x⁰.

3. The corresponding result for convex (strictly convex) functions is obtained by changing ≤ to ≥ (< to >) in the above inequality.

Proof of Theorem 5.6: (1) (⟹) Let x, x⁰ ∈ S. Since f is concave,

λf(x) + (1 − λ)f(x⁰) ≤ f(λx + (1 − λ)x⁰)

for all λ ∈ (0, 1). Rearranging the above inequality, for all λ ∈ (0, 1) we obtain

f(x) − f(x⁰) ≤ \frac{f(x⁰ + λ(x − x⁰)) − f(x⁰)}{λ}.  (∗)

Let λ → 0. The right-hand side of (∗) then approaches ∇f(x⁰) · (x − x⁰). (⟸) Let x, x⁰ ∈ S and λ ∈ (0, 1). Define z = λx + (1 − λ)x⁰. Notice that z ∈ S because S is convex. By our hypothesis, we have

f(x) − f(z) ≤ ∇f(z) · (x − z)  (i)

f(x⁰) − f(z) ≤ ∇f(z) · (x⁰ − z)  (ii)

Multiplying the inequality in (i) by λ > 0 and the inequality in (ii) by 1 − λ > 0, we obtain

λ(f(x) − f(z)) + (1 − λ)(f(x⁰) − f(z)) ≤ ∇f(z) · [λ(x − z) + (1 − λ)(x⁰ − z)].  (iii)

Here λ(x − z) + (1 − λ)(x⁰ − z) = λx + (1 − λ)x⁰ − z = 0, so the right-hand side of (iii) is 0. Thus, rearranging (iii) gives

λf(x) + (1 − λ)f(x⁰) ≤ f(z) = f(λx + (1 − λ)x⁰)

because z = λx + (1 − λ)x⁰. This shows that f is concave. (2) (⟹) Suppose that f is strictly concave in S and let x ≠ x⁰. With z = x⁰ + λ(x − x⁰) for λ ∈ (0, 1), strict concavity makes inequality (∗) strict, so

f(x) − f(x⁰) < \frac{f(z) − f(x⁰)}{λ} ≤ \frac{∇f(x⁰) · (z − x⁰)}{λ} = ∇f(x⁰) · (x − x⁰),

where we used the inequality in (1), which we have already proved, and the fact that z − x⁰ = λ(x − x⁰). This shows that the inequality in (1) holds with strict inequality. (⟸) follows from the argument in part (1), carried out with strict inequalities. (3) This part is trivial. Do you agree with me?

5.7.1 Jensen's Inequality

Theorem 5.7 (Jensen's Inequality) A function f is concave on a convex set S ⊆ Rⁿ if and only if

f(λ_1x_1 + \cdots + λ_nx_n) ≥ λ_1f(x_1) + \cdots + λ_nf(x_n)

holds for all x_1, \ldots, x_n ∈ S and for all λ_i ≥ 0, i = 1, \ldots, n, with \sum_{i=1}^{n} λ_i = 1.

Proof: The 'if' part is immediate, since the case of two points is exactly the definition of concavity. For the 'only if' part, we argue by induction. Let H(k) denote the statement

H(k): f\left(\sum_{h=1}^{k} λ_hx_h\right) ≥ \sum_{h=1}^{k} λ_hf(x_h) whenever λ_h ≥ 0 and \sum_{h=1}^{k} λ_h = 1.

H(2) is true because it is indeed the definition of concavity of f. Now we show that H(k) ⟹ H(k + 1). Let Λ = \sum_{h=1}^{k} λ_h and assume Λ > 0 (otherwise the claim is trivial). We execute a series of computations below:

f\left(\sum_{h=1}^{k+1} λ_hx_h\right) = f\left(Λ\sum_{h=1}^{k} \frac{λ_h}{Λ}x_h + λ_{k+1}x_{k+1}\right)

≥ Λf\left(\sum_{h=1}^{k} \frac{λ_h}{Λ}x_h\right) + λ_{k+1}f(x_{k+1})  (because of H(2))

≥ Λ\sum_{h=1}^{k} \frac{λ_h}{Λ}f(x_h) + λ_{k+1}f(x_{k+1})  (because H(k) is true under the inductive hypothesis)

= \sum_{h=1}^{k+1} λ_hf(x_h).

One can extend Jensen's inequality to the continuum. Let X be a random variable which takes values on the real line R, and let g : R → R be a probability density function. Then the continuous version of Jensen's inequality is given as follows: f is concave if and only if

f\left(\int xg(x)\,dx\right) ≥ \int f(x)g(x)\,dx,

that is, f(E[X]) ≥ E[f(X)].
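A Monte Carlo sanity check of f(E[X]) ≥ E[f(X)] for a concave f; the distribution and the function below are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=1_000_000)   # a random variable on R_+
f = np.log1p                                     # f(x) = log(1 + x), concave

print(f(X.mean()))        # f(E[X]) ~ log(3) ~ 1.0986
print(f(X).mean())        # E[f(X)], strictly smaller
```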

5.8 Quasiconcave and Quasiconvex Functions

Definition 5.6 A function f defined on a convex set S ⊆ Rⁿ is quasiconcave if the upper level set P_α = {x ∈ S | f(x) ≥ α} is convex for each α ∈ R. We say that f is quasiconvex if −f is quasiconcave. So f is quasiconvex iff the lower level set P^α = {x ∈ S | f(x) ≤ α} is convex for each α ∈ R.

Proposition 5.1 If f(·) is concave, then it is quasiconcave. Similarly, if f(·) is convex, then it is quasiconvex.

Exercise 5.5 Prove Proposition 5.1.

Theorem 5.8 Let f(·) be a function of n variables defined on a convex set S in Rⁿ. Then f is quasiconcave if and only if either of the following conditions is satisfied for all x, x′ ∈ S and all λ ∈ [0, 1]:

1. f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)};

2. f(x′) ≥ f(x) ⟹ f(λx + (1 − λ)x′) ≥ f(x).

Proof of Theorem 5.8: (⟹) Suppose f is quasiconcave. Take any x, x′ ∈ S and λ ∈ [0, 1], and define a = min{f(x), f(x′)}. Then

x, x′ ∈ P_a = {x ∈ S | f(x) ≥ a}.

Since P_a is convex by our hypothesis, λx + (1 − λ)x′ ∈ P_a for any λ ∈ [0, 1]. This implies that f(λx + (1 − λ)x′) ≥ a = min{f(x), f(x′)}. (⟸) Suppose that the inequality in (1) is valid and let a be an arbitrary number. We must show that P_a is convex. Take any arbitrary points x, x′ ∈ P_a. Then f(x) ≥ a and f(x′) ≥ a. Also, for all λ ∈ (0, 1), the inequality in (1) implies that

f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)} ≥ a.

Thus λx + (1 − λ)x′ ∈ P_a. This proves that P_a is convex. We leave the rest of the proof as an exercise.

Exercise 5.6 Prove the second part of Theorem 5.8.

A function F : R → R is said to be strictly increasing if F(x) > F(y) whenever x > y.


Theorem 5.9 Let f(·) be defined on a convex set S in Rⁿ and let F be a function of one variable whose domain includes f(S). If f(·) is quasiconcave (quasiconvex) and F is strictly increasing, then F(f(·)) is quasiconcave (quasiconvex).

Proof of Theorem 5.9: Suppose f(·) is quasiconcave. Using the previous theorem (Theorem 5.8), we must have

f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)}.

Since F(·) is strictly increasing,

F(f(λx + (1 − λ)x′)) ≥ F(min{f(x), f(x′)}) = min{F(f(x)), F(f(x′))}.

It follows that F ∘ f is quasiconcave. The argument in the quasiconvex case is entirely similar, replacing ≥ with ≤ and min with max.

Definition 5.7 A function f(·) defined on a convex set S ⊆ Rⁿ is said to be strictly quasiconcave if

f(λx + (1 − λ)x′) > min{f(x), f(x′)}

for all x, x′ ∈ S with x ≠ x′ and all λ ∈ (0, 1). The function f is strictly quasiconvex if −f is strictly quasiconcave.

Exercise 5.7 Consider the Cobb-Douglas function f(x) = Ax_1^{α_1} \cdots x_n^{α_n}, defined for x_1 > 0, \ldots, x_n > 0, with A > 0 and α_i > 0 for all i = 1, \ldots, n. Show the following:

1. f(·) is quasiconcave for all α_1, \ldots, α_n;

2. f(·) is concave for α_1 + \cdots + α_n ≤ 1;

3. f(·) is strictly concave for α_1 + \cdots + α_n < 1.

Theorem 5.10 (First-Order Characterization of Quasiconcavity) Let f(·) be a C¹ function of n variables defined on an open convex set S in Rⁿ. Then f(·) is quasiconcave on S if and only if, for all x, x⁰ ∈ S,

f(x) ≥ f(x⁰) ⟹ ∇f(x⁰) · (x − x⁰) = \sum_{i=1}^{n} \frac{∂f(x⁰)}{∂x_i}(x_i − x⁰_i) ≥ 0.

Proof of Theorem 5.10: (⟹) Suppose f is quasiconcave. Take x, x⁰ ∈ S and define the function g(·) on [0, 1] by

g(t) = f((1 − t)x⁰ + tx) = f(x⁰ + t(x − x⁰)).

Then, using the chain rule (Theorem 5.15), which will be shown later, we have

g′(t) = ∇f(x⁰ + t(x − x⁰)) · (x − x⁰).

Suppose f(x) ≥ f(x⁰). By Theorem 5.8, g(t) ≥ g(0) for all t ∈ [0, 1]. For any t ∈ (0, 1], we have

\frac{g(t) − g(0)}{t} ≥ 0.

Letting t → 0, we obtain

\lim_{t→0} \frac{g(t) − g(0)}{t} = g′(0) ≥ 0.

This implies

g′(0) = ∇f(x⁰) · (x − x⁰) ≥ 0.

(⟸) We will be satisfied with a figure for this part.

The content of Theorem 5.10 is that, for any quasiconcave function f(·) and any pair of points x and x⁰ with f(x) ≥ f(x⁰), the gradient vector ∇f(x⁰) and the vector (x − x⁰) must form a non-obtuse angle (an angle of at most π/2).

Theorem 5.11 (Second-Order Characterization of Quasiconcavity) Let f(·) be a C² function defined on an open, convex set S in Rⁿ. Then f(·) is quasiconcave if and only if, for every x ∈ S, the Hessian matrix D²f(x) is negative semidefinite on the subspace {z ∈ Rⁿ | ∇f(x) · z = 0}; that is,

zᵀD²f(x)z ≤ 0 whenever ∇f(x) · z = 0,

for every x ∈ S. If the Hessian matrix D²f(x) is negative definite on the subspace {z ∈ Rⁿ | ∇f(x) · z = 0} for every x ∈ S, then f(·) is strictly quasiconcave.

Proof of Theorem 5.11: (⇒) Suppose f(·) is quasiconcave. Let x ∈ S and choose x′ ∈ S such that ∇f(x)·(x′ − x) = 0. Since f(·) is quasiconcave, f(x′) ≤ f(x); to see this, draw the figure. Then,

f(x′) − f(x) ≤ ∇f(x)·(x′ − x) = 0.

By Theorem 5.4 and Theorem 4.6, concavity of f(·) is equivalent to negative semidefiniteness of the Hessian matrix. Then, the conclusion follows.

(⇐) This proof is based on "A Characterization of Quasi-Concave Functions," by Kiyoshi Otani in Journal of Economic Theory, vol 31, (1983), 194-196. Let x, x′ ∈ S be such that f(x′) ≥ f(x). This choice entails no loss of generality. For λ ∈ [0, 1], define

g(λ) = f(x + λ(x′ − x)).

Note that g(0) = f(x), g(1) = f(x′), and g(1) ≥ g(0) because f(x′) ≥ f(x) by our hypothesis. What we want to show is that g(λ) ≥ g(0) for any λ ∈ [0, 1]. By the mean-value theorem (Theorem 5.2), there exists λ0 ∈ (0, 1) such that g′(λ0) = 0. Set x0 = x + λ0(x′ − x). We assume that ∇f(x) ≠ 0 for any x ∈ S. This strikes us as being innocuous. Let p denote ∇f(x0) for notational simplicity.

By pᵀp > 0, there exists a C² function φ : R → R, defined for sufficiently small |τ|, such that φ(0) = 0 and

f(φ(τ)p + τ(x′ − x) + x0) = f(x0) for any small τ.

Again, for notational simplicity, we denote φ(τ)p + τ(x′ − x) + x0 by z(τ). By differentiating f(z(τ)) = f(x0), we have

∇f(z(τ))·(φ′(τ)p + (x′ − x)) = 0,  (∗)

and by further differentiating, we have

(φ′(τ)p + (x′ − x))ᵀ D²f(z(τ)) (φ′(τ)p + (x′ − x)) + ∇f(z(τ))·(φ″(τ)p) = 0.  (∗∗)

By (∗), the vector φ′(τ)p + (x′ − x) lies in the subspace on which, by hypothesis, the Hessian is negative semidefinite, so

(φ′(τ)p + (x′ − x))ᵀ D²f(z(τ)) (φ′(τ)p + (x′ − x)) ≤ 0.

Then, we must have φ″(τ)∇f(z(τ))·p ≥ 0. When τ is sufficiently close to zero, z(τ) is very close to x0 and so ∇f(z(τ))·p > 0 because p ∈ Rⁿ\{0}. Then, φ″(τ) ≥ 0 for τ with |τ| sufficiently small; together with φ(0) = 0 and φ′(0) = 0 (which follows from (∗) at τ = 0, since ∇f(x0)·(x′ − x) = g′(λ0) = 0), this gives φ(τ) ≥ 0 near τ = 0.

For sufficiently small |λ − λ0|, we have

∇f((λ − λ0)(x′ − x) + x0)·p > 0

because ∇f(x0)ᵀp > 0 and (λ − λ0)(x′ − x) + x0 is very close to x0. Hence, for λ sufficiently close to λ0, we have

g(λ) = f(x + λ(x′ − x)) = f((λ − λ0)(x′ − x) + x0) ≤ f(φ(λ − λ0)p + (λ − λ0)(x′ − x) + x0) = f(x0) = g(λ0)

because φ(λ − λ0) ≥ 0 for λ sufficiently close to λ0. Accordingly, g(·) does not attain an interior minimum in [0, 1], unless it is constant; since g(1) ≥ g(0), it follows that g(λ) ≥ g(0) for any λ ∈ [0, 1]. The last step is based on Corollary 4.3 in "Nine Kinds of Quasiconcavity and Concavity," by Diewert, Avriel, and Zang in Journal of Economic Theory, vol 25, (1981), 397-420.

Corollary 4.3 (Diewert, Avriel, and Zang (1981)): A differentiable function f defined over an open set S is quasiconcave if and only if, for any x0 ∈ S and any v ∈ Rⁿ with vᵀv = 1, vᵀ∇f(x0) = 0 implies that g(t) ≡ f(x0 + tv) does not attain a (semistrict) local minimum at t = 0.

This completes the proof.

Theorem 5.12 (A Characterization through Bordered Hessians) Let f be a C² function defined in an open, convex set S in Rⁿ. Define the bordered Hessian determinants Br(x) as follows: for each r = 1, ..., n,

Br(x) = | 0       f1(x)   ···  fr(x)  |
        | f1(x)   f11(x)  ···  f1r(x) |
        |  ⋮        ⋮            ⋮    |
        | fr(x)   fr1(x)  ···  frr(x) |

Then,

1. A necessary condition for f to be quasiconcave is that (−1)^r Br(x) ≥ 0 for all x ∈ S and all r = 1, ..., n.

2. A sufficient condition for f to be strictly quasiconcave is that (−1)^r Br(x) > 0 for all x ∈ S and all r = 1, ..., n.
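Theorem 5.12 lends itself to a direct numerical check. The sketch below (the finite-difference step sizes and the test function are illustrative choices) builds the bordered Hessian determinants Br(x) from numerically approximated first and second derivatives and verifies the sign condition for a quasiconcave Cobb-Douglas function.

import numpy as np

def grad(f, x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hess(f, x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

def bordered_minors(f, x):
    g, H = grad(f, x), hess(f, x)
    out = []
    for r in range(1, x.size + 1):
        B = np.zeros((r + 1, r + 1))
        B[0, 1:], B[1:, 0] = g[:r], g[:r]
        B[1:, 1:] = H[:r, :r]
        out.append(np.linalg.det(B))
    return out

f = lambda x: x[0] ** 0.5 * x[1] ** 0.5    # quasiconcave on the positive orthant
for r, Br in enumerate(bordered_minors(f, np.array([1.0, 2.0])), start=1):
    print(r, (-1) ** r * Br >= 0)          # expect True for r = 1, 2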

5.9 Total Differentiation

Functions mapping Rⁿ into Rᵐ are often called transformations (operators). A transformation f : Rⁿ → Rᵐ is said to be linear if

f(x1 + x2) = f(x1) + f(x2) and f(λx1) = λf(x1)

for all x1, x2 ∈ Rⁿ and all scalars λ ∈ R. Our knowledge of linear algebra tells us that for every linear transformation f : Rⁿ → Rᵐ, there is a unique m × n matrix A such that f(x) = Ax for all x ∈ Rⁿ:

[ f(1)(x) ]   [ a11  a12  ···  a1n ] [ x1 ]
[    ⋮    ] = [  ⋮    ⋮          ⋮ ] [  ⋮ ]
[ f(m)(x) ]   [ am1  am2  ···  amn ] [ xn ]

In particular,

f(j)(x) = aj1 x1 + aj2 x2 + ··· + ajn xn = Σ_{i=1}^n aji xi.

5.9.1 Differentiability

For a function f of one variable, the linear approximation to f around x0 is given by

f(x0 + h) ≈ f(x0) + f′(x0)h

for small values of h. Here ≈ stands for approximation. This is useful because the approximation error, defined by

O(h) = f(x0 + h) − f(x0) − f′(x0)h,

becomes negligible for sufficiently small h. Namely, O(h) → 0 as h → 0. More importantly, however, O(h) also becomes small in comparison with h, that is,

lim_{h→0} O(h)/h = lim_{h→0} [ (f(x0 + h) − f(x0))/h − f′(x0) ] = 0.

Moreover, f(·) is differentiable at x0 if and only if there exists a number c ∈ R such that

lim_{h→0} (f(x0 + h) − f(x0) − ch)/h = 0.

If such a number c ∈ R exists, it is unique and c = f′(x0). These arguments can be generalized straightforwardly to higher dimensional spaces. In particular, a transformation f(·) is differentiable at a point x0 if it admits a linear approximation around x0:

Definition 5.8 If f : A → Rᵐ is a transformation defined on a subset A of Rⁿ and x0 is an interior point of A, then f is said to be differentiable at x0 if there exists an m × n matrix C such that

lim_{h→0} ‖f(x0 + h) − f(x0) − Ch‖ / ‖h‖ = 0.

If such a matrix exists, it is unique; it is called the derivative of f at x0 and is denoted by Df(x0).

If I restrict attention to real-valued functions, the following theorem establishes an equivalence between the directional derivative along every vector and total differentiation.

Theorem 5.13 If f : A → R is defined on a subset A of Rⁿ and f is differentiable at an interior point x ∈ A, then f has a derivative fa′(x) along every n-vector a, and fa′(x) = ∇f(x)·a.

Proof of Theorem 5.13: The derivative along a is

fa′(x) = lim_{h→0} (f(x + ha) − f(x))/h = lim_{h→0} [ (f(x + ha) − f(x) − ∇f(x)·(ha))/h ] + ∇f(x)·a = 0 + ∇f(x)·a,

where the first limit vanishes by differentiability of f at x.

In particular, if ej = (0, ..., 1, ..., 0) is the j-th standard unit vector in Rⁿ, then ∇f(x)·ej is the partial derivative ∂f(x)/∂xj = fj′(x) with respect to the j-th variable. On the other hand, ∇f(x)·ej is the j-th component of ∇f(x). Hence, ∇f(x) is the row vector

∇f(x) = (∂f(x)/∂x1, ..., ∂f(x)/∂xn).

Suppose that I am interested in checking if a given transformation f : Rⁿ → Rᵐ is differentiable. Then, the next theorem shows that it suffices to check if each component real-valued function f(j) : Rⁿ → R is differentiable (j = 1, ..., m).²

Theorem 5.14 A transformation f = (f(1), ..., f(m)) from a subset A of Rⁿ into Rᵐ is differentiable at an interior point x ∈ A if and only if each component function f(j) : A → R, j = 1, ..., m, is differentiable at x. Moreover,

Df(x) = [ ∇f(1)(x) ]   [ ∂f(1)/∂x1 (x)  ···  ∂f(1)/∂xn (x) ]
        [     ⋮    ] = [       ⋮                    ⋮      ]
        [ ∇f(m)(x) ]   [ ∂f(m)/∂x1 (x)  ···  ∂f(m)/∂xn (x) ]

This is called the Jacobian matrix of f(·) at x. Its rows are the gradients of the component functions of f(·).

Proof of Theorem 5.14: Let C be an m × n matrix and let O(h) = f(x + h) − f(x) − Ch, where h ∈ Rⁿ:

[ O1(h) ]   [ f(1)(x + h) − f(1)(x) ]   [ c11  ···  c1n ] [ h1 ]
[   ⋮   ] = [           ⋮           ] − [  ⋮          ⋮ ] [  ⋮ ]
[ Om(h) ]   [ f(m)(x + h) − f(m)(x) ]   [ cm1  ···  cmn ] [ hn ]

The j-th component of O(h), j = 1, ..., m, is Oj(h) ≡ f(j)(x + h) − f(j)(x) − Cj h, where Cj is the j-th row of C. For each j,

|Oj(h)| ≤ ‖O(h)‖ ≤ |O1(h)| + ··· + |Om(h)|.

It follows that

lim_{h→0} ‖O(h)‖/‖h‖ = 0 ⟺ lim_{h→0} |Oj(h)|/‖h‖ = 0 for all j = 1, ..., m.

Hence, f is differentiable at x if and only if each f(j) is, and in that case the j-th row of the matrix C = Df(x) is the derivative of f(j), that is, Cj = ∇f(j)(x).
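Theorem 5.14 suggests a simple recipe for approximating Df(x): stack finite-difference gradients of the component functions row by row. The following sketch (the particular map f and the step size are illustrative choices) does exactly that.

import numpy as np

def jacobian(f, x, h=1e-6):
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        J[:, i] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h)
    return J

f = lambda x: np.array([x[0] ** 2 * x[1], np.sin(x[0]) + x[1]])
print(jacobian(f, np.array([1.0, 2.0])))
# analytically: [[2*x1*x2, x1**2], [cos(x1), 1]] = [[4, 1], [0.5403, 1]]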

The next theorem confirms our intuition about differentiation: if a transformation (or function) is differentiable, then it is, in particular, continuous.

² See the similar argument in Theorem 3.14, in which we show that a transformation f : Rⁿ → Rᵐ is continuous if and only if each component function f(j) : Rⁿ → R is continuous (j = 1, ..., m).

Theorem 5.15 If a transformation f : A → Rᵐ defined on a subset A of Rⁿ is differentiable at an interior point x0 ∈ A, then f is continuous at x0.

Proof of Theorem 5.15: Let C = Df(x0). Then, for small but nonzero h ∈ Rⁿ,

‖f(x0 + h) − f(x0)‖ = ‖f(x0 + h) − f(x0) − Ch + Ch‖
≤ ‖f(x0 + h) − f(x0) − Ch‖ + ‖Ch‖  (Minkowski inequality)
= ‖h‖ · (‖f(x0 + h) − f(x0) − Ch‖ / ‖h‖) + ‖Ch‖.

Since f is differentiable at x0,

‖f(x0 + h) − f(x0) − Ch‖ / ‖h‖ → 0 as h → 0,

and ‖Ch‖ → 0 as h → 0. Hence, f(x0 + h) → f(x0) as h → 0.

The next theorem shows that the order of two operations does not matter for the final product: (1) construct a composite mapping and differentiate it; or (2) differentiate each mapping and construct a composite of the two derivatives.

Theorem 5.16 (The Chain Rule) Suppose f : A → Rᵐ and g : B → Rᵖ are defined on A ⊆ Rⁿ and B ⊆ Rᵐ, with f(A) ⊆ B, and suppose that f and g are differentiable at x and f(x), respectively. Then, the composite transformation g ∘ f : A → Rᵖ defined by (g ∘ f)(x) = g(f(x)) is differentiable at x, and

D(g ∘ f)(x) = Dg(f(x)) · Df(x),
   (p × n)      (p × m)   (m × n)

Proof of Theorem 5.16: Define

k(h) = f(x + h) − f(x) = Df(x)h + ef(h), where ef(h)/‖h‖ → 0 as h → 0.

Also,

g(y + k) − g(y) = Dg(y)k + eg(k), where eg(k)/‖k‖ → 0 as k → 0.

Note that there exists some fixed constant K such that ‖k(h)‖ ≤ K‖h‖ for all small h; otherwise f would not be differentiable at x. Note also that for all ε > 0, ‖eg(k)‖ < ε‖k‖ for ‖k‖ small, because g is differentiable. Thus, when ‖h‖ is small, we can summarize

‖eg(k(h))‖ < ε‖k(h)‖ ≤ εK‖h‖.

Hence,

‖eg(k(h))‖ / ‖h‖ → 0 as h → 0.

Then, we execute a series of computations below, with e(h) ≡ g(f(x + h)) − g(f(x)) − Dg(f(x))Df(x)h:

e(h) = g(f(x) + k(h)) − g(f(x)) − Dg(f(x))Df(x)h
     = Dg(f(x))k(h) + eg(k(h)) − Dg(f(x))Df(x)h
     = Dg(f(x))(k(h) − Df(x)h) + eg(k(h))
     = Dg(f(x))ef(h) + eg(k(h))   (∵ k(h) = Df(x)h + ef(h))

And, moreover,

‖e(h)‖/‖h‖ = ‖Dg(f(x))ef(h) + eg(k(h))‖ / ‖h‖
≤ ‖Dg(f(x))ef(h)‖/‖h‖ + ‖eg(k(h))‖/‖h‖   (∵ triangle inequality)
≤ ‖Dg(f(x))‖·‖ef(h)‖/‖h‖ + ‖eg(k(h))‖/‖h‖   (∵ Cauchy-Schwarz inequality)
→ 0 as h → 0.

Hence g ∘ f is differentiable at x with derivative Dg(f(x))Df(x).
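The chain rule can be verified numerically by comparing both sides of D(g ∘ f)(x) = Dg(f(x))Df(x). This sketch reuses the finite-difference Jacobian idea above; the particular maps f and g are illustrative.

import numpy as np

def jacobian(fun, x, h=1e-6):
    out = np.asarray(fun(x))
    J = np.zeros((out.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        J[:, i] = (np.asarray(fun(x + e)) - np.asarray(fun(x - e))) / (2 * h)
    return J

f = lambda x: np.array([x[0] * x[1], x[0] + x[1] ** 2])      # R^2 -> R^2
g = lambda y: np.array([np.exp(y[0]), y[0] * y[1], y[1]])    # R^2 -> R^3

x = np.array([0.5, 1.5])
lhs = jacobian(lambda z: g(f(z)), x)        # D(g∘f)(x), a 3 x 2 matrix
rhs = jacobian(g, f(x)) @ jacobian(f, x)    # (3 x 2) = (3 x 2)(2 x 2)
print(np.allclose(lhs, rhs, atol=1e-5))     # True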

5.10 The Inverse Function Theorem

Let f : A → B. Recall that f is onto if the range of f is the whole of B, i.e., f(A) = B, and that f is one-to-one if f(x) = f(x′) ⇒ x = x′. In sum, f is bijective. In this case, for each point y ∈ B there is exactly one point x ∈ A such that f(x) = y, and the inverse of f is the transformation f⁻¹ : B → A which maps each y ∈ B to precisely that point x ∈ A for which f(x) = y.

If f : U → V and g : V → U are differentiable and mutually inverse transformations between open sets U and V in Rⁿ, then g ∘ f is the identity transformation on U, and therefore D(g ∘ f)(x) = In for all x ∈ U. The chain rule then gives Dg(f(x))Df(x) = In. This means that the Jacobian matrix Df(x) must be nonsingular, so |Df(x)| ≠ 0. Also, Dg(f(x)) is the inverse matrix Df(x)⁻¹.

Theorem 5.17 (Inverse Function Theorem) Consider a transformation f = (f1, ..., fn) from A ⊆ Rⁿ into Rⁿ and assume that f is C^k (k ≥ 1) in an open set containing x0 = (x01, ..., x0n). Furthermore, suppose that |Df(x)| ≠ 0 for x = x0. Let y0 = f(x0). Then, there exists an open set U around x0 such that f maps U one-to-one onto an open set V around y0, and there is an inverse mapping g = f⁻¹ : V → U which is also C^k. Moreover, for all y ∈ V, we have

Dg(y) = Df(x)⁻¹, where x = g(y) ∈ U.

The proof consists of 5 steps. However, the proof of each step will be either briefly sketched or completely skipped due to its technical difficulty. For simplicity, we assume that x0 = 0 and f(x0) = y0 = 0.

Step 1: There is no loss of generality in assuming that Df(x0) = In.

Proof of Step 1: Let Df(0) = A. Since A is non-singular, A⁻¹ exists. Let g : Rⁿ → Rⁿ be the linear mapping associated with A⁻¹. That is, g(x) = A⁻¹x for any x ∈ Rⁿ. Note that g(0) = 0 and Dg(0) = A⁻¹. The Jacobian matrix of f ∘ g : Rⁿ → Rⁿ is given as D(f ∘ g)(0) = Df(0)Dg(0) = AA⁻¹ = In. If we can prove the theorem for f ∘ g, the same conclusion follows for f, because g is a linear map associated with A⁻¹ and so is C^k. Therefore, we can work with f ∘ g instead of f, so that Df(0) = In can be assumed with no loss of generality.

Step 2: There exists an open set U around 0 such that f restricted to U is one-to-one.

Proof of Step 2: Skipped.

Step 3: There exists an open set V such that f|U : U → V is onto. That is, for any ỹ ∈ V, there exists x̃ ∈ U such that f(x̃) = ỹ.

Step 4: f⁻¹|V : V → U is one-to-one and onto.

Step 5: f⁻¹|V : V → U is C^k.
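A quick numerical illustration of the formula Dg(y) = [Df(x)]⁻¹: take the polar-coordinates map, which is locally invertible for r > 0 and has an explicit local inverse. (The map, the evaluation point, and the tolerance are illustrative choices.)

import numpy as np

def f(x):                                  # f(r, θ) = (r cos θ, r sin θ)
    r, th = x
    return np.array([r * np.cos(th), r * np.sin(th)])

def g(y):                                  # explicit local inverse for r > 0
    return np.array([np.hypot(y[0], y[1]), np.arctan2(y[1], y[0])])

def jacobian(fun, x, h=1e-6):
    out = np.asarray(fun(x))
    J = np.zeros((out.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        J[:, i] = (np.asarray(fun(x + e)) - np.asarray(fun(x - e))) / (2 * h)
    return J

x0 = np.array([2.0, 0.7])
print(np.allclose(jacobian(g, f(x0)), np.linalg.inv(jacobian(f, x0)), atol=1e-5))  # True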

5.11 The Implicit Function Theorem

Consider a system of m equations in the n + m variables (x, y) = (x1, ..., xn, y1, ..., ym):

f1(x1, x2, ..., xn, y1, y2, ..., ym) = 0
   ⋮
fm(x1, x2, ..., xn, y1, y2, ..., ym) = 0,

written compactly as f(x, y) = 0. Define the partial Jacobian matrix

Dx f(x, y) = [ ∂f1/∂x1  ···  ∂f1/∂xn ]
             [    ⋮              ⋮   ]
             [ ∂fm/∂x1  ···  ∂fm/∂xn ]

and define Dy f(x, y) analogously with respect to y1, ..., ym.

Theorem 5.18 (Implicit Function Theorem) Suppose f = (f1, ..., fm) is C¹ on an open set A ⊆ Rⁿ⁺ᵐ, and consider the vector equation f(x, y) = 0, where x ∈ Rⁿ and y ∈ Rᵐ. Let (x0, y0) be an interior point of A satisfying f(x0, y0) = 0. Suppose that |Dy f(x, y)| ≠ 0 at (x, y) = (x0, y0). Then, there exist open balls B1 and B2 around x0 and y0, respectively, with B1 × B2 ⊆ A, such that |Dy f(x, y)| ≠ 0 in B1 × B2, and such that for each x ∈ B1, there is a unique y ∈ B2 with f(x, y) = 0. In this way, y is defined on B1 as a C¹ function g(·) of x. The Jacobian matrix Dg(x) can be found by implicit differentiation of f(x, y) = 0, and

dy/dx = [ ∂y1/∂x1  ···  ∂y1/∂xn ]
        [    ⋮              ⋮   ]  = Dg(x) = −[Dy f(x, y)]⁻¹ Dx f(x, y).
        [ ∂ym/∂x1  ···  ∂ym/∂xn ]

Proof of the Implicit Function Theorem: We define the norm of vectors in Rⁿ as follows:

‖x‖ ≡ max_{1≤i≤n} |xi|.

This is the norm we use for the implicit function theorem. The proof relies on the following three lemmas (Lemmas 5.1, 5.2, and 5.3). We will not provide their proofs here.

Lemma 5.1 Let K be a compact set in Rⁿ. Let {hk(x)}_{k=1}^∞ be a sequence of continuous functions K → Rᵐ. Suppose that for any ε > 0, there exists a number N ∈ N such that

max_{x∈K} ‖hm(x) − hn(x)‖ < ε

for all m, n > N. Then, there exists a unique continuous function h : K → Rᵐ such that

lim_{k→∞} max_{x∈K} ‖hk(x) − h(x)‖ = 0.

With Weierstrass's Theorem (Theorem 3.19) and the concept of a Cauchy sequence in Rⁿ (Definition 3.6), the above lemma should be easy to establish. For the next lemma, define the following:

D = {x ∈ Rⁿ : ‖x − x0‖ ≤ δ},  D′ = {y ∈ Rᵐ : ‖y − y0‖ ≤ ε}.

Let φ(x, y) be a continuous mapping from D × D′ to Rᵐ with the property that φ(x0, y0) = 0. Notice that D × D′ is a compact set by construction.

Lemma 5.2 (Lipschitz Continuity) There exists a number K ∈ (0, 1) such that, for all y, y′ ∈ D′,

‖φ(x, y) − φ(x, y′)‖ ≤ K‖y − y′‖.

This Lipschitz property delivers the uniform Cauchy condition needed for Lemma 5.1. Again, we take Lemma 5.2 for granted. Let y0(x) = y0. Define y_{k+1}(x) = y0 + φ(x, yk(x)) for k ≥ 0. Since φ(x0, y0) = 0, we have

‖φ(x, y0)‖ ≤ (1 − K)ε

for x ∈ D with δ > 0 sufficiently small. We execute a series of computations below.

‖y_{k+1}(x) − y0‖ = ‖φ(x, yk(x))‖
= ‖φ(x, yk(x)) − φ(x, y0) + φ(x, y0)‖
≤ ‖φ(x, yk(x)) − φ(x, y0)‖ + ‖φ(x, y0)‖
< K‖yk(x) − y0‖ + (1 − K)ε,

so by induction each yk(x) stays in D′. Fix m ∈ N. Then

‖y_{k+m}(x) − yk(x)‖ = ‖φ(x, y_{k+m−1}(x)) − φ(x, y_{k−1}(x))‖
≤ K‖y_{k+m−1}(x) − y_{k−1}(x)‖
≤ ··· ≤ K^k ‖ym(x) − y0‖ → 0 as k → ∞.

This means that max_{x∈D} ‖ym(x) − yn(x)‖ → 0 as m, n → ∞.

Lemma 5.3 Let φ(x, y) be a continuous mapping from D × D′ to Rᵐ with the property that φ(x0, y0) = 0. Furthermore, suppose there exists a number K ∈ (0, 1) such that, for all y, y′ ∈ D′,

‖φ(x, y) − φ(x, y′)‖ ≤ K‖y − y′‖.

Then, there exists a unique continuous mapping ψ : D → Rᵐ for which

ψ(x) − y0 = φ(x, ψ(x))

for x ∈ D with δ > 0 sufficiently small.

With the help of all three lemmas above, we will complete the proof of the Implicit Function Theorem. Define a function g(x, y) : D × D′ → Rᵐ through the identity

f(x, y) = Dy f(x0, y0)(y − y0) + g(x, y),

where f(x, y), (y − y0), and g(x, y) are m × 1 and Dy f(x0, y0) is m × m. Since |Dy f(x0, y0)| ≠ 0, the equation f(x, y) = 0 holds if and only if

y − y0 = −[Dy f(x0, y0)]⁻¹ g(x, y).

Define

φ(x, y) = −[Dy f(x0, y0)]⁻¹ g(x, y).

Note also that φ(x0, y0) = 0. By the mean-value theorem (Theorem 5.2), for some ȳ between y and y′,

φ(x, y) − φ(x, y′) = −[Dy f(x0, y0)]⁻¹ [Dy g(x, ȳ)](y − y′)
= −[Dy f(x0, y0)]⁻¹ [Dy f(x, ȳ) − Dy f(x0, y0)](y − y′)
= (Im − [Dy f(x0, y0)]⁻¹ Dy f(x, ȳ))(y − y′).

If we choose δ > 0 and ε > 0 small enough that (x, ȳ) is very close to (x0, y0), there exists K ∈ (0, 1) such that

‖φ(x, y) − φ(x, y′)‖ ≤ K‖y − y′‖.

Now, we can take two open balls B1 and B2 small enough that B1 ⊆ D and B2 ⊆ D′, as needed for the theorem. Then, we can apply Lemma 5.3, which completes the proof.

Corollary 5.3 (A Version of the Implicit Function Theorem) Suppose f : R² → R is C¹ in an open set A containing (x0, y0), with f(x0, y0) = 0 and ∂f(x0, y0)/∂y ≠ 0. Then, there exist an interval Ix = (x0 − δ, x0 + δ) and an interval Iy = (y0 − ε, y0 + ε) (with δ > 0 and ε > 0) such that Ix × Iy ⊆ A and:

1. for every x ∈ Ix, the equation f(x, y) = 0 has a unique solution in Iy, which defines y as a function y = φ(x) on Ix;

2. φ is C¹ in Ix = (x0 − δ, x0 + δ), with derivative

dy/dx = φ′(x) = − (∂f(x, φ(x))/∂x) / (∂f(x, φ(x))/∂y).
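The slope formula in Corollary 5.3 is easy to confirm on an example where the implicit function is available in closed form. Take f(x, y) = x² + y² − 2 near (x0, y0) = (1, 1) (an illustrative choice): the formula gives dy/dx = −x/y = −1, and the explicit branch y = √(2 − x²) agrees.

import numpy as np

slope_formula = -(2 * 1.0) / (2 * 1.0)            # -(∂f/∂x)/(∂f/∂y) at (1, 1)

phi = lambda x: np.sqrt(2 - x ** 2)               # explicit solution branch
h = 1e-6
slope_numeric = (phi(1.0 + h) - phi(1.0 - h)) / (2 * h)

print(slope_formula, slope_numeric)               # both ≈ -1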

Exercise 5.8 The point P = (x, y, z, u, v, w) = (1, 1, 0, 1, 0, 1) satisfies all the equations

y²z + u − vw³ = 1
2x + y − z² + u + v³ − w = 3
x² + z + u − v + w³ = 3

Using the implicit function theorem, find du/dx, dv/dx, and dw/dx at P.
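Since the operators in the three equations above had to be reconstructed (the extraction lost them), treat the system itself as an assumption. Under that reading, the implicit function theorem reduces the exercise to one linear solve, (du/dx, dv/dx, dw/dx) = −[D(u,v,w) f]⁻¹ Dx f, evaluated at P:

import numpy as np

# Partial derivatives at P = (1, 1, 0, 1, 0, 1) for the reconstructed system:
A = np.array([[1.0, -1.0,  0.0],    # ∂f1/∂(u,v,w) = (1, -w^3, -3 v w^2)
              [1.0,  0.0, -1.0],    # ∂f2/∂(u,v,w) = (1, 3 v^2, -1)
              [1.0, -1.0,  3.0]])   # ∂f3/∂(u,v,w) = (1, -1, 3 w^2)
b = np.array([0.0, 2.0, 2.0])       # ∂(f1, f2, f3)/∂x at P

print(np.linalg.solve(A, -b))       # [du/dx, dv/dx, dw/dx] = [-8/3, -8/3, -2/3]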


Chapter 6 Static Optimization

6.1 Unconstrained Optimization

6.1.1 Extreme Points

Let f(·) be a function of n variables defined on a set S in Rⁿ. Suppose that the point x* = (x1*, ..., xn*) belongs to S and that the value of f at x* is greater than or equal to the values attained by f at all other points x = (x1, ..., xn) ∈ S. Thus,

f(x*) ≥ f(x) for all x ∈ S.  (∗)

Here x* is called a (global) maximum point for f in S and f(x*) is called the maximum value. If the inequality (∗) is strict for all x ≠ x*, then x* is a strict maximum point for f(·) in S. We define (strict) minimum point and minimum value by reversing the inequality sign in (∗). As collective names, we use extreme points and extreme values to indicate both maxima and minima.

Theorem 6.1 Let f(·) be defined on a set S in Rⁿ and let x = (x1, ..., xn) be an interior point in S at which f(·) has partial derivatives. A necessary condition for x to be an extreme point for f is that x is a stationary point for f(·), that is, it satisfies the equations

∇f(x) = 0, i.e., ∂f(x)/∂xi = 0 for i = 1, ..., n.

Proof of Theorem 6.1: Suppose, on the contrary, that x* is a maximum point but not a stationary point for f(·). Then, there is no loss of generality in assuming that there exists at least one i such that fi′(x*) > 0. Define x′ = (x1*, ..., xi* + ε, ..., xn*). Since x* is an interior point in S, one can make sure that x′ ∈ S by choosing ε > 0 sufficiently small. Then, to a first-order approximation,

f(x′) ≈ f(x*) + ∇f(x*)·(0, ..., 0, ε, 0, ..., 0) = f(x*) + ε fi′(x*) > f(x*)

for ε small enough. However, this contradicts the hypothesis that x* is a maximum point for f(·).

The next theorem clarifies under what conditions the converse of the previous theorem (Theorem 6.1) is established.

Theorem 6.2 Suppose that the function f(·) is defined in a convex set S ⊆ Rⁿ and let x* be an interior point of S. Assume that f(·) is C¹ in a ball around x*.

1. If f(·) is concave in S, then x* is a (global) maximum point for f(·) in S if and only if x* is a stationary point for f(·).

2. If f(·) is convex in S, then x* is a (global) minimum point for f(·) in S if and only if x* is a stationary point for f(·).

Proof of Theorem 6.2: We focus on the first part of the theorem. The second part follows once we take into account that −f is concave. (⇒) This follows from Theorem 6.1 above. (⇐) Suppose that x* is a stationary point for f(·) and that f(·) is concave. Recall the inequality in Theorem 5.6 (first-order characterization of concave functions). For any x ∈ S,

f(x) − f(x*) ≤ ∇f(x*)·(x − x*) = 0  (∵ ∇f(x*) = 0).

Thus, we have f(x) ≤ f(x*) for any x ∈ S, as desired.
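Theorem 6.2 is what makes simple hill-climbing reliable for concave objectives: any stationary point found is the global maximum. A minimal gradient-ascent sketch (the objective, step size, and iteration count are illustrative choices):

import numpy as np

# f(x) = -(x1 - 1)^2 - 2 (x2 + 0.5)^2, concave with stationary point (1, -0.5)
grad = lambda x: np.array([-2 * (x[0] - 1), -4 * (x[1] + 0.5)])

x = np.zeros(2)
for _ in range(2000):
    x = x + 0.1 * grad(x)
print(x)                            # ≈ [1.0, -0.5], the global maximum point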

6.1.2 The Envelope Theorem

Consider a function f(x1, ..., xn, r1, ..., rk), where x ∈ S ⊆ Rⁿ and r ∈ Rᵏ. For each fixed r, suppose we have found the maximum of f(x, r) when x varies in S. The maximum value of f(x, r) usually depends on r. We denote this value by f*(r) and call f* the value function. Thus,

f*(r) = max_{x∈S} f(x, r).

The vector x that maximizes f(x, r) depends on r and is therefore denoted by x*(r). Then, f*(r) = f(x*(r), r).

Theorem 6.3 (Envelope Theorem) In the maximization problem max_{x∈S} f(x, r), where S ⊆ Rⁿ and r ∈ Rᵏ, suppose that there is a maximum point x*(r) ∈ S for every r ∈ Bδ(r*) with some δ > 0. Furthermore, assume that the mappings r ↦ f(x*(r*), r) and r ↦ f*(r) are differentiable at r*. Then

∇r f*(r*) = ( ∂f(x, r)/∂r1, ∂f(x, r)/∂r2, ..., ∂f(x, r)/∂rk ) evaluated at x = x*(r*), r = r*.

There are two effects of r on the value function f*: a direct effect and an indirect effect through x*(r). The Envelope theorem says that we can ignore the indirect effects.

Proof of Theorem 6.3: Define φ(r) = f(x*(r*), r) − f*(r). Because x*(r*) is a maximum point of f(x, r) when r = r*, one has φ(r*) = 0 and φ(r) ≤ 0 for all r ∈ Bδ(r*). Since φ(r*) is a maximum, the following first order condition is satisfied (because of Theorem 6.1):

∂φ(r)/∂rj |_{r=r*} = 0, j = 1, ..., k, i.e., ∇r φ(r)|_{r=r*} = 0.

That is,

∂f(x*(r*), r)/∂rj |_{r=r*} − ∂f*(r)/∂rj |_{r=r*} = 0, j = 1, ..., k.
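A one-line numerical check of the envelope theorem: for f(x, r) = −x² + rx (an illustrative choice), the maximizer is x*(r) = r/2 and f*(r) = r²/4, so df*/dr should equal the direct partial ∂f/∂r = x evaluated at x*(r).

r, h = 1.6, 1e-6
f_star = lambda r: r ** 2 / 4
total_derivative = (f_star(r + h) - f_star(r - h)) / (2 * h)
direct_partial = r / 2                      # ∂f/∂r = x at x = x*(r) = r/2
print(total_derivative, direct_partial)     # both ≈ 0.8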

6.1.3 Local Extreme Points

The point x* is a local maximum point of f(·) in S if there exists an ε > 0 such that f(x) ≤ f(x*) for all x ∈ Bε(x*) ∩ S. If x* is the unique local maximum point for f(·) in such a ball, then it is a strict local maximum point for f(·) in S. A (strict) local minimum point is defined in the obvious way, and it should be clear what we mean by local maximum and minimum values, local extreme points, and local extreme values. A stationary point x of f(·) that is neither a local maximum point nor a local minimum point is called a saddle point of f(·).

Before stating the next result, recall the leading principal minors of the Hessian matrix D²f(x):

|D²(k) f(x)| = | f11(x)  f12(x)  ···  f1k(x) |
               | f21(x)  f22(x)  ···  f2k(x) |
               |   ⋮       ⋮            ⋮   |
               | fk1(x)  fk2(x)  ···  fkk(x) | , k = 1, ..., n.

Theorem 6.4 (Sufficient Conditions for Local Extreme Points) Suppose that f(x) = f(x1, ..., xn) is defined on a set S ⊆ Rⁿ and that x* is an interior stationary point. Assume also that f(·) is C² in an open ball around x*. Then,

1. D²f(x*) is positive definite ⇒ x* is a local minimum point.

2. D²f(x*) is negative definite ⇒ x* is a local maximum point.

Proof of Theorem 6.4: We only focus on the first part of the theorem; the second part follows by replacing f(·) with −f(·). Since each fij(x) is continuous in x (because f(·) is C²), each leading principal minor is a continuous function of x. Therefore, if |D²(k) f(x*)| > 0 for all k, it is possible to find a ball Bε(x*) with ε > 0 so small that |D²(k) f(x)| > 0 for all k and all x ∈ Bε(x*); that is, the corresponding quadratic form is positive definite for all x ∈ Bε(x*). It follows from Theorem 5.5 that f(·) is strictly convex in Bε(x*). Then, Theorem 6.2 shows that the stationary point x* is a minimum point for f in Bε(x*). Hence, x* is a local minimum point for f(·).

Lemma 6.1 If x* is an interior stationary point of f(·) such that |D²f(x*)| ≠ 0 and D²f(x*) is neither positive definite nor negative definite, then x* is a saddle point.

6.1.4 Necessary Second-Order Conditions

For fixed x* ∈ S and h ∈ Rⁿ, define

g(t) = f(x* + th) = f(x1* + th1, ..., xn* + thn).

The function g(·) describes the behavior of f(·) along the straight line through x* parallel to the vector h ∈ Rⁿ. We have the following characterization of local extreme points.

Theorem 6.5 (Necessary Conditions for Local Extreme Points) Suppose that f(x) = f(x1, ..., xn) is defined on a set S ⊆ Rⁿ, and x* is an interior stationary point in S. Assume that f is C² in a ball around x*. Then,

1. x* is a local minimum point ⇒ D²f(x*) is positive semidefinite.

2. x* is a local maximum point ⇒ D²f(x*) is negative semidefinite.

Proof of Theorem 6.5: Suppose that x* is an interior local maximum point for f(·). Then, if ε > 0 is small enough, Bε(x*) ⊆ S and f(x) ≤ f(x*) for all x ∈ Bε(x*). If t ∈ (−ε, ε) and ‖h‖ = 1, then x* + th ∈ Bε(x*) because ‖(x* + th) − x*‖ = ‖th‖ = |t| < ε. Then, for all t ∈ (−ε, ε), we have

f(x* + th) ≤ f(x*), i.e., g(t) ≤ g(0).

Thus, the function g(·) has an interior maximum at t = 0. Using the chain rule, we obtain

g′(t) = Σ_{i=1}^n fi′(x* + th)hi,   g″(t) = Σ_{i=1}^n Σ_{j=1}^n fij″(x* + th)hi hj.

Since g has an interior maximum at t = 0, we must have

g″(0) = Σ_{i=1}^n Σ_{j=1}^n fij″(x*)hi hj ≤ 0 for every h.

This implies that the Hessian matrix D²f(x*) is negative semidefinite. Theorem 4.6 shows that this is equivalent to checking all principal minors. The same argument can be used to establish the necessary condition for x* to be a local minimum point for f(·).

Exercise 6.1 Find the local extreme values and classify the stationary points as maxima, minima, or neither.

1. f(x1, x2) = 2x1 − x1² − x2².

2. f(x1, x2) = x1² + 2x2² − 4x2.

3. f(x1, x2) = x1³ − x2² + 2x2.

4. f(x1, x2) = 4x1 + 2x2 − x1² + x1x2 − x2².

5. f(x1, x2) = x1³ − 6x1x2 + x2³.
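For the second example (reading it as f(x1, x2) = x1² + 2x2² − 4x2, with stationary point (0, 1)), Theorem 6.4 can be applied numerically by checking the eigenvalues of the Hessian: positive eigenvalues mean positive definiteness. The finite-difference step is an illustrative choice.

import numpy as np

f = lambda x: x[0] ** 2 + 2 * x[1] ** 2 - 4 * x[1]

def hess(f, x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

print(np.linalg.eigvalsh(hess(f, np.array([0.0, 1.0]))))
# ≈ [2, 4]: positive definite, so (0, 1) is a local (in fact global) minimum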

6.2 Constrained Optimization

6.2.1 Equality Constraints: The Lagrange Problem

Consider the problem

max_{x∈S} f(x) subject to gj(x) = 0, j = 1, ..., m.  (∗)

Define the Lagrangian

L(x) = f(x) − λ1 g1(x) − ··· − λm gm(x),

where λ1, ..., λm are called Lagrange multipliers. The necessary first-order conditions for optimality are then:

∂L(x)/∂xi = ∂f(x)/∂xi − Σ_{j=1}^m λj ∂gj(x)/∂xi = 0, i = 1, ..., n, i.e., ∇L(x) = ∇f(x) − λᵀDg(x) = 0.  (∗∗)

Theorem 6.6 (Necessary and Sufficient Conditions for Extreme Points with Equality Constraints) The following establishes the necessary and sufficient conditions for the Lagrangian method.

1. (Necessity) Suppose that f and g1, ..., gm are defined on a set S in Rⁿ and x* = (x1*, ..., xn*) is an interior point of S that solves the maximization problem (∗). Assume further that f and g1, ..., gm are C¹ in a ball around x*, and that the m × n Jacobian matrix

Dg(x*) = [ ∂g1(x*)/∂x1  ···  ∂g1(x*)/∂xn ]
         [      ⋮                 ⋮      ]
         [ ∂gm(x*)/∂x1  ···  ∂gm(x*)/∂xn ]

has rank m. Then, there exist unique numbers λ1, ..., λm such that the first-order conditions (∗∗) are valid.

2. (Sufficiency) If there exist numbers λ1, ..., λm and a feasible x* which together satisfy the first-order conditions (∗∗), and if the Lagrangian L(x) is concave in x, then x* solves the maximization problem (∗).

Proof of Theorem 6.6: (Necessity) The proof of the necessity part consists of three steps.

Step 1: Construction of the unconstrained maximization problem.

Since the m × n matrix Dg(x*) is assumed to have rank m, there exists an invertible (nonsingular) m × m submatrix. After renumbering the variables, if necessary, we can assume that it consists of the first m columns. By the implicit function theorem (Theorem 5.18), the m constraints g1, ..., gm define x1, ..., xm as C¹ functions of x̃ = (x_{m+1}, ..., xn) in some open ball around x*, i.e., Bε(x*) with ε > 0 sufficiently small, so we can write

xj = hj(x_{m+1}, ..., xn) = hj(x̃), j = 1, ..., m.

Then, f(x1, ..., xn) reduces to a composite function

ψ(x̃) = f(h1(x̃), ..., hm(x̃), x̃)

of x̃ only. Now, the maximization problem with equality constraints is translated into the unconstrained maximization problem

max_{x̃} ψ(x̃) over Bε(x*).

Since x* is a local extreme point for f subject to the given constraints, ψ must have an unconstrained local extreme point at x̃* = (x*_{m+1}, ..., xn*). Hence, the partial derivatives of ψ with respect to x_{m+1}, ..., xn must be 0: for k = m + 1, ..., n,

∂ψ(x̃*)/∂xk = [ ∂f(x*)/∂x1 · ∂h1/∂xk + ··· + ∂f(x*)/∂xm · ∂hm/∂xk ] + ∂f(x*)/∂xk = 0,  (1)

where the bracketed terms are the indirect effects and the last term is the direct effect.

Step 2: Expressing ∂h1/∂xk, ..., ∂hm/∂xk in terms of the gj(x).

Since the constraints hold identically,

gj(h1(x̃), ..., hm(x̃), x̃) = 0, j = 1, ..., m,

for all x̃ ∈ B. Differentiating this with respect to xk gives

Σ_{s=1}^m (∂gj/∂xs)(∂hs/∂xk) + ∂gj/∂xk = 0, j = 1, ..., m.

Multiplying each of the m equations above by a scalar λj, then adding these equations over j, we obtain

Σ_{j=1}^m λj Σ_{s=1}^m (∂gj/∂xs)(∂hs/∂xk) + Σ_{j=1}^m λj ∂gj/∂xk = 0.  (2)

Subtracting (2) from (1) yields

Σ_{s=1}^m [ ∂f(x*)/∂xs − Σ_{j=1}^m λj ∂gj/∂xs ] (∂hs/∂xk) + ∂f(x*)/∂xk − Σ_{j=1}^m λj ∂gj/∂xk = 0,

where the partial derivatives are evaluated at x*. Suppose that we can prove the existence of numbers λ1, ..., λm such that

∂f(x*)/∂xs − Σ_{j=1}^m λj ∂gj/∂xs = 0, s = 1, ..., m.  (3)

Then the display above reduces to

∂f(x*)/∂xk − Σ_{j=1}^m λj ∂gj/∂xk = 0, k = m + 1, ..., n.

Thus, the first-order necessary conditions (∗∗) for the Lagrangian are also satisfied.

Step 3: Existence of λ1, ..., λm satisfying (3).

The system (3) reads

λ1 ∂g1/∂x1 + λ2 ∂g2/∂x1 + ··· + λm ∂gm/∂x1 = ∂f/∂x1
λ1 ∂g1/∂x2 + λ2 ∂g2/∂x2 + ··· + λm ∂gm/∂x2 = ∂f/∂x2
   ⋮
λ1 ∂g1/∂xm + λ2 ∂g2/∂xm + ··· + λm ∂gm/∂xm = ∂f/∂xm,

with all derivatives evaluated at x*. Since the coefficient matrix (the transpose of the first m columns of Dg(x*)) is invertible (nonsingular), the system has a unique solution λ1, ..., λm. This completes the proof of necessity.

(Sufficiency) Suppose that the Lagrangian L(x) is concave. The first-order necessary conditions imply that the Lagrangian is stationary at x*. Then, by Theorem 6.2 (sufficiency for a global maximum),

L(x*) = f(x*) − Σ_{j=1}^m λj gj(x*) ≥ f(x) − Σ_{j=1}^m λj gj(x) = L(x) for all x ∈ S.

But for all feasible x, we have gj(x) = 0 and, of course, gj(x*) = 0 for all j = 1, ..., m. Hence, this implies that f(x*) ≥ f(x). Thus, x* solves the maximization problem (∗).

6.2.2 The Value Function and the Lagrange Multipliers

The optimal values of x1, ..., xn in the maximization problem (∗) will, in general, depend upon the parameter vector r = (r1, ..., rk) ∈ Rᵏ. If x*(r) = (x1*(r), ..., xn*(r)) denotes the vector of optimal values of the choice variables, then the corresponding value

f*(r) = f(x1*(r), ..., xn*(r))

of f(·) is called the (optimal) value function for the maximization problem (∗). The values of the Lagrange multipliers will also depend on r; we write λj = λj(r) for j = 1, ..., m.

Let L(x, r) = f(x, r) − Σ_{j=1}^m λj gj(x, r) be the Lagrangian. Under certain conditions, we have

∂f*(r)/∂ri = ∂L(x, r)/∂ri |_{x=x*(r)}, i = 1, ..., k.

6.2.3 Tangent Hyperplane

Consider the surface defined by the system of constraints

g1(x) = 0
g2(x) = 0
  ⋮
gm(x) = 0.

If, as in this section, the functions gj, j = 1, ..., m belong to C¹, the surface defined by them is said to be smooth. We introduce the tangent hyperplane M at x* below:

M = {y ∈ Rⁿ | Dg(x*)y = 0}.

Note that the tangent hyperplane is a subspace of Rⁿ.

Definition 6.1 A point x* satisfying the constraints g(x*) = 0 is said to be a regular point of the constraints if the gradient vectors ∇g1(x*), ..., ∇gm(x*) are linearly independent. That is, Rank(Dg(x*)) = m.

6.2.4 The Gradient and the Tangent Hyperplane

Lemma 6.2 Let x* be a regular point of the constraints g(x) = 0 and a local extreme point of f(·) subject to these constraints. Then, for any y ∈ Rⁿ,

∇f(x*)·y = 0 whenever Dg(x*)y = 0.

Proof of Lemma 6.2: Let y = (y1, ..., yn) with ‖y‖ = 1, and let x(t) be any smooth curve on the constraint surface g(x(t)) = 0 passing through x* with x(0) = x* and derivative x′(0) = y. There exists some ε > 0 such that g(x(t)) = 0 for any t ∈ (−ε, ε).

Since x* is a regular point, the tangent hyperplane is identical with the set of y's satisfying Dg(x*)y = 0. Then, since x* is a constrained local extreme point of f(·), we have

(d/dt) f(x(t)) |_{t=0} = 0 ⇒ ∇f(x*)·x′(0) = 0,

equivalently, ∇f(x*)·y = 0.

The above lemma says that ∇f(x*) is orthogonal to the tangent hyperplane.

6.2.5 Second-Order Conditions

Theorem 6.7 (Necessity for a Local Maximum) Suppose that x* is a local maximum of f(·) subject to g(x) = 0 and that x* is a regular point of these constraints. Then there is a λ ∈ Rᵐ such that

∇f(x*) − λᵀDg(x*) = 0.

If we denote by M the tangent hyperplane M = {h ∈ Rⁿ | Dg(x*)h = 0}, then the matrix

D²L(x*) = D²f(x*) − Σ_{j=1}^m λj D²gj(x*)

is negative semidefinite on M, that is,

hᵀD²L(x*)h ≤ 0 for all h ∈ M.

Proof of Theorem 6.7: The first part follows from Theorem 6.6. We only focus on the second part. Let h = (h1, ..., hn) ∈ M with ‖h‖ = 1, and let x(t) be any smooth curve on the constraint surface g(x(t)) = 0 passing through x* with x(0) = x* and derivative x′(0) = h. Since g(x(t)) = 0 along the curve,

L(x(t)) = f(x(t)) − λᵀg(x(t)) = f(x(t)),

and since x* is an interior local maximum point for f subject to g(x) = 0, if ε > 0 is small enough,

L(x(t)) ≤ L(x*) for all t ∈ (−ε, ε).

Define the function ψ(t) = L(x(t)). Then, for all t ∈ (−ε, ε), we have ψ(t) ≤ ψ(0), so ψ has an interior maximum at t = 0. Using the chain rule (Theorem 5.16), we obtain

ψ′(0) = ∇L(x*)·x′(0) = 0,

because h ∈ M, so that ∇f(x*)·h = 0 and Dg(x*)h = 0. Furthermore, differentiating once more and using ∇L(x*) = 0,

ψ″(0) = hᵀD²L(x*)h.

The hypothesis that ψ has an interior local maximum at t = 0 means ψ″(0) ≤ 0. Thus,

hᵀD²L(x*)h = Σ_i Σ_j Lij″(x*) hi hj ≤ 0.

Theorem 6.8 (Sufficiency for a Local Maximum) Suppose there is a point x* ∈ Rⁿ satisfying g(x*) = 0, and a λ ∈ Rᵐ such that

∇f(x*) − λᵀDg(x*) = 0.

Suppose also that the matrix D²L(x*) = D²f(x*) − Σ_{j=1}^m λj D²gj(x*) is negative definite on M = {y ∈ Rⁿ | Dg(x*)y = 0}, that is, for y ∈ M with y ≠ 0, yᵀD²L(x*)y < 0. Then, x* is a strict local maximum of f(·) subject to g(x) = 0.

Proof of Theorem 6.8: Define the Lagrangian as follows:

L(x) = f(x) − λᵀg(x).

Differentiating this with respect to x, and evaluating it at x*, we obtain

∇L(x*) = ∇f(x*) − λᵀDg(x*) = 0.

By our hypothesis, D²L(x*) is negative definite on M, and therefore, by the argument of Theorem 6.4, x* is a strict local maximum point of L(x) along the constraint surface. Since L(x) = f(x) for every feasible x, this implies that x* is a strict local maximum of f(·) subject to g(x) = 0.

Exercise 6.2 Solve the problem

max {x + 4y + z} subject to x² + y² + z² = 216 and x + 2y + 3z = 0.
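For orientation, here is a numerical solution of this exercise (a sketch assuming SciPy is available; the starting point is arbitrary). The exercise itself, of course, asks for the Lagrangian analysis by hand.

import numpy as np
from scipy.optimize import minimize

obj = lambda v: -(v[0] + 4 * v[1] + v[2])     # maximize by minimizing the negative
cons = ({'type': 'eq', 'fun': lambda v: v[0]**2 + v[1]**2 + v[2]**2 - 216},
        {'type': 'eq', 'fun': lambda v: v[0] + 2*v[1] + 3*v[2]})

res = minimize(obj, x0=np.array([1.0, 1.0, -1.0]), constraints=cons, method='SLSQP')
print(res.x, -res.fun)    # maximizer ≈ (0.76, 12.10, -8.32), value ≈ 40.8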

Exercise 6.3 Consider the problem (assuming m ≥ 4)

max U(x1, x2) = (1/2) ln(1 + x1) + (1/4) ln(1 + x2) subject to 2x1 + 3x2 = m.

1. Let x1(m) and x2(m) denote the values of x1 and x2 that solve the above maximization problem. Find these functions and the corresponding Lagrange multiplier λ.

2. The optimal value U* of U(x1, x2) is a function of m. Find an explicit expression for U*(m), and show that dU*/dm = λ.

6.2.6 Comparative Statics

In economic optimization problems, the objective function as well as the constraint functions (such as the budget set) will often depend on parameters. These parameters are held constant when optimizing (remember the price-taking behavior assumption), but can vary with the economic situation. We might want to know what happens to the optimal value function when the parameters change.

Consider the following general Lagrange problem:

max_{x∈S} f(x, r) subject to gj(x, r) = 0, j = 1, ..., m.

The solutions of the maximization problem will be functions of r. If we denote them by x1(r), ..., xn(r), then

f*(r) = f(x1(r), ..., xn(r))

is called the value function. Suppose that λi = λi(r) for all i = 1, ..., m are the Lagrange multipliers in the first-order conditions for the maximization problem and let

L(x, r) = f(x, r) + λ·g(x, r)

be the Lagrangian. Here λ = (λ1, ..., λm) and g(x, r) = (g1(x, r), ..., gm(x, r)).

6.3 Inequality Constraints: Nonlinear Programming

Consider the problem

max_{x∈S} f(x) subject to g1(x1, ..., xn) ≤ 0, g2(x1, ..., xn) ≤ 0, ..., gm(x1, ..., xn) ≤ 0.

A vector x = (x1, ..., xn) that satisfies all the constraints is called feasible. The set of all feasible vectors is said to be the feasible set. We assume that f(·) and all the gj functions are C¹. In the case of equality constraints, the number of constraints was assumed to be strictly less than the number of variables. This is not necessary in the case of inequality constraints. An inequality constraint gj(x) ≤ 0 is said to be active (binding) at x if gj(x) = 0 and inactive (non-binding) at x if gj(x) < 0.

Note that minimizing f(x) is equivalent to maximizing −f(x). Moreover, an inequality constraint of the form gj(x) ≥ 0 can be rewritten as −gj(x) ≤ 0. In this way, most constrained optimization problems can be expressed in the above form.

We define the Lagrangian exactly as before:

L(x) = f(x) − λ·g(x) = f(x) − Σ_{j=1}^m λj gj(x),

and the partial derivatives of the Lagrangian are equated to 0:

∂L(x)/∂xi = ∂f(x)/∂xi − Σ_{j=1}^m λj ∂gj(x)/∂xi = 0, i = 1, ..., n.  (∗∗)

Moreover, we require

λj ≥ 0, and λj = 0 if gj(x) < 0.  (∗∗∗)

An alternative formulation of this condition is that, for any j = 1, ..., m,

λj ≥ 0 and λj gj(x) = 0.

In particular, if λj > 0, we must have gj(x) = 0. However, it is perfectly possible to have both λj = 0 and gj(x) = 0.

Conditions (∗∗) and (∗∗∗) are often called the Kuhn-Tucker conditions. They are (essentially, but not quite) necessary conditions for a feasible vector to solve the maximization problem. In general, they are definitely not sufficient on their own. Suppose one can find a point x* at which f(·) is stationary and gj(x*) < 0 for all j = 1, ..., m. Then, the Kuhn-Tucker conditions will automatically be satisfied by x* together with all the Lagrange multipliers λj = 0 for all j = 1, ..., m.

Theorem 6.9 (Sufficiency of the Kuhn-Tucker Conditions I) Consider the maximization problem and suppose that x* is feasible and satisfies conditions (∗∗) and (∗∗∗). If the Lagrangian L(x) = f(x) − λ·g(x) (with the multiplier values obtained from the recipe) is concave, then x* is optimal.

Proof of Theorem 6.9: This is very much the same as the sufficiency part of the Lagrangian problem in Theorem 6.6. Since L(x) is concave by assumption and ∇L(x*) = 0 from (∗∗), by Theorem 6.2, x* is a global maximum point of L(x). Hence, for all x ∈ S,

f(x*) − Σ_{j=1}^m λj gj(x*) ≥ f(x) − Σ_{j=1}^m λj gj(x),

that is,

f(x*) − f(x) ≥ Σ_{j=1}^m λj (gj(x*) − gj(x)).

It suffices to show that

Σ_{j=1}^m λj (gj(x*) − gj(x)) ≥ 0

for all feasible x, because this will imply that x* solves the maximization problem. Suppose that gj(x*) < 0. Then (∗∗∗) shows that λj = 0, so λj(gj(x*) − gj(x)) = 0. Suppose that gj(x*) = 0. Then λj(gj(x*) − gj(x)) = −λj gj(x) ≥ 0, because x is feasible, i.e., gj(x) ≤ 0, and λj ≥ 0. Then, we have Σ_{j=1}^m λj(gj(x*) − gj(x)) ≥ 0, as desired.

Theorem 6.10 (Sufficiency of the Kuhn-Tucker Conditions II) Consider the maximization problem and suppose that x* is feasible and satisfies conditions (∗∗) and (∗∗∗). If f(·) is concave and each λj gj(x) (with the multiplier values obtained from the recipe) is quasiconvex, then x* is optimal.

Proof of Theorem 6.10: We want to show that f(x) − f(x*) ≤ 0 for all feasible x. Since f(·) is concave, then according to Theorem 5.6 (first-order characterization of concavity of f(·)),

f(x) − f(x*) ≤ ∇f(x*)·(x − x*) = Σ_{j=1}^m λj ∇gj(x*)·(x − x*),

where we use the first order condition (∗∗). It therefore suffices to show that for all j = 1, ..., m, and all feasible x,

λj ∇gj(x*)·(x − x*) ≤ 0.

The above inequality is satisfied for those j such that gj(x*) < 0, because then λj = 0 from the complementary slackness condition (∗∗∗). For those j such that gj(x*) = 0, we have λj gj(x) ≤ 0 = λj gj(x*) for feasible x, together with λj ≥ 0. Since the function −λj gj(x) is quasiconcave (because λj gj(x) is quasiconvex), it follows from Theorem 5.10 (a characterization of quasiconcavity) that ∇(−λj gj(x*))·(x − x*) ≥ 0, and thus λj ∇gj(x*)·(x − x*) ≤ 0.

Exercise 6.4 Reformulate the problem

min 4 ln(x² + 2) + y² subject to x² + y ≥ 2, x ≥ 1

as a standard Kuhn-Tucker maximization problem and write down the necessary Kuhn-Tucker conditions. Moreover, find the solution of the problem (take it for granted that there is a solution).
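Once a candidate point and multipliers are in hand, the Kuhn-Tucker conditions (∗∗) and (∗∗∗) are mechanical to verify. A sketch on a made-up problem (the objective, constraint, candidate, and multiplier below are all illustrative): max −((x − 2)² + (y − 2)²) subject to g(x, y) = x + y − 2 ≤ 0, with candidate (1, 1) and λ = 2.

import numpy as np

x = np.array([1.0, 1.0])
lam = 2.0
grad_f = np.array([-2 * (x[0] - 2), -2 * (x[1] - 2)])   # gradient of the objective
grad_g = np.array([1.0, 1.0])                           # gradient of the constraint
g_val = x[0] + x[1] - 2

print(np.allclose(grad_f - lam * grad_g, 0))            # stationarity (∗∗)
print(lam >= 0 and np.isclose(lam * g_val, 0))          # complementary slackness (∗∗∗)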

6.4 The Value Function

Consider the problem

max f(x) subject to gj(x) ≤ bj, j = 1, ..., m.

The optimal value of the objective f(x) obviously depends upon b ∈ Rᵐ, which is a parameter vector in the constraint set. The function defined by

f*(b) = max { f(x) | gj(x) ≤ bj, j = 1, ..., m }

assigns to each b = (b1, ..., bm) the optimal value f*(b) of f(·). It is called the value function for the problem. Let the optimal choice for x in the constrained optimization problem be denoted by x*(b), and assume that it is unique. Let λj(b) for j = 1, ..., m be the corresponding Lagrange multipliers. Then, if ∂f*(b)/∂bj exists,

∂f*(b)/∂bj = λj(b), j = 1, ..., m.

The value function f* is not necessarily C¹. The next proposition characterizes a geometric structure of the value function.

Proposition 6.1 If f(x) is concave and gj(x) is convex for each j = 1, ..., m, then f*(b) is concave.

Proof of Proposition 6.1: Suppose that b′ and b″ are two arbitrary parameter vectors in the constraint set, and let f*(b′) = f(x*(b′)) and f*(b″) = f(x*(b″)), with gj(x*(b′)) ≤ b′j and gj(x*(b″)) ≤ b″j for j = 1, ..., m. Let λ ∈ [0, 1] and define x̄ = λx*(b′) + (1 − λ)x*(b″). Corresponding to the vector λb′ + (1 − λ)b″, there exists an optimal solution x*(λb′ + (1 − λ)b″), and

f*(λb′ + (1 − λ)b″) = f(x*(λb′ + (1 − λ)b″)).

Convexity of each gj gives

gj(x̄) ≤ λgj(x*(b′)) + (1 − λ)gj(x*(b″)) ≤ λb′j + (1 − λ)b″j,

so x̄ is feasible for the parameter vector λb′ + (1 − λ)b″, at which x*(λb′ + (1 − λ)b″) is optimal. It follows that

f(x̄) ≤ f(x*(λb′ + (1 − λ)b″)) = f*(λb′ + (1 − λ)b″).

On the other hand, concavity of f implies that

f(x̄) ≥ λf(x*(b′)) + (1 − λ)f(x*(b″)) = λf*(b′) + (1 − λ)f*(b″).

In sum,

f*(λb′ + (1 − λ)b″) ≥ λf*(b′) + (1 − λ)f*(b″).

This shows that f*(b) is concave.

6.5 Constraint Qualifications

Consider again the problem

max f(x) subject to gj(x) ≤ 0, j = 1, ..., m.

Definition 6.2 The constrained maximization problem satisfies the constraint qualification if the gradient vectors ∇gj(x*) (1 ≤ j ≤ m) corresponding to those constraints that are active (binding) at x* are linearly independent.

An alternative formulation of this condition is: delete all rows in the Jacobian matrix Dg(x*) that correspond to constraints that are inactive (not binding) at x*. Then, the remaining matrix should have rank equal to its number of rows.

Theorem 6.11 (Kuhn-Tucker Necessary Conditions) Suppose that x* = (x1*, ..., xn*) solves the constrained maximization problem, where f(·) and g1(·), ..., gm(·) are C¹ functions. Suppose furthermore that the maximization problem satisfies the constraint qualification. Then, there exist unique numbers λ1, ..., λm such that the Kuhn-Tucker conditions (∗∗) and (∗∗∗) hold at x = x*.

Proof of the Kuhn-Tucker Necessary Conditions: We assume the following.

1. x* ∈ Rⁿ maximizes f on the constraint set gj(x) ≤ 0 for all j = 1, ..., m;

2. only g1, ..., gk are binding at x*, where k ≤ m;

3. the k × n Jacobian matrix Dgk(x*) has maximal rank k. That is,

k = Rank[Dgk(x*)] = Rank [ ∂g1(x*)/∂x1  ···  ∂g1(x*)/∂xn ]
                         [       ⋮                ⋮      ]
                         [ ∂gk(x*)/∂x1  ···  ∂gk(x*)/∂xn ]

Step 1: ∇L(x*, λ) = 0 and λj gj(x*) = 0 for all j.

Since each gj(·) is a continuous function, there is an open ball Bε(x*) such that gj(x) < 0 for all x ∈ Bε(x*) and for j = k + 1, ..., m. We will work in the open ball Bε(x*) for the rest of the proof.

Note that x* maximizes f(·) in Bε(x*) over the constraint set gj(x) = 0 for j = 1, ..., k. By assumption, Theorem 6.7 (necessity for optimization with equality constraints) applies, and therefore there exist λ1, ..., λk such that

∇L̃(x*, λ) = 0 and gj(x*) = 0, j = 1, ..., k,

where L̃(x, λ) ≡ f(x) − Σ_{j=1}^k λj gj(x) is the restricted Lagrangian. Consider the usual Lagrangian

L(x, λ1, ..., λm) ≡ f(x) − Σ_{j=1}^m λj gj(x).

Setting λ_{k+1} = ··· = λm = 0, the pair (x*, λ) is a solution of the n + m equations in n + m unknowns:

∂L(x*, λ)/∂xi = 0, i = 1, ..., n,
λj gj(x*) = 0, j = 1, ..., m.

Step 2: λj ≥ 0 for all j.

There is a C¹ curve x(t) defined for t ∈ [0, ε) such that x(0) = x* and, for all t ∈ [0, ε),

g1(x(t)) = −t and gj(x(t)) = 0 for j = 2, ..., k.

By the implicit function theorem (Theorem 5.18), we can still solve the constrained optimization problem in Bε(x*) even if we slightly perturb the constraint set. Let h = x′(0). Using the chain rule (Theorem 5.16), we conclude that

∇g1(x*)·h = −1 and ∇gj(x*)·h = 0 for j = 2, ..., k.

Since x(t) lies in the constraint set for all t and x* maximizes f(·) in the constraint set, f(·) must be nonincreasing along x(t). Therefore,

(d/dt) f(x(t)) |_{t=0} = ∇f(x*)·h ≤ 0.

Then,

0 = ∇L̃(x*)·h = ∇f(x*)·h − Σ_{j=1}^k λj ∇gj(x*)·h = ∇f(x*)·h − λ1 ∇g1(x*)·h = ∇f(x*)·h + λ1.

Since ∇f(x*)·h ≤ 0, we conclude that λ1 ≥ 0. A similar argument shows that λj ≥ 0 for j = 1, ..., k. This completes the proof.

Theorem 6.12 (Kuhn-Tucker Necessary and Sufficient Conditions) Assume that a feasible vector x* and a set of multipliers λ1, ..., λm satisfy the Kuhn-Tucker conditions (∗∗) and (∗∗∗) for the constrained maximization problem. Define J = {j | gj(x*) = 0}, the set of active (binding) constraints, and assume that λj > 0 for all j ∈ J. Consider the Lagrangian problem

max f(x) subject to gj(x) = 0, j ∈ J.

Then, x* satisfies

∇L̃(x*) = ∇f(x*) − Σ_{j∈J} λj ∇gj(x*) = 0

for the given multipliers λj, j ∈ J. If D²L̃(x*) is negative definite on M, then x* is a strict local maximum point for the original constrained maximization problem. Here

M = {h ∈ Rⁿ | ∇gj(x*)·h = 0 for all j ∈ J}.

Proof of Theorem 6.12: Suppose, on the contrary, that x* is not a local maximum point of the constrained optimization problem. Then, we can consider {yk} as a sequence of feasible points converging to x* such that f(yk) ≥ f(x*) for each k. More specifically, for each k, define yk = x* + δk hk with ‖hk‖ = 1 and δk > 0. We may assume that δk → 0 and hk → h as k → ∞. Using the linear approximation through differentiability,

f(yk) ≈ f(x*) + ∇f(x*)·(yk − x*) = f(x*) + δk ∇f(x*)·hk

for k large enough. Letting k → ∞, because ∇f(x*)·hk is linear (hence continuous) in hk, we must have ∇f(x*)·h ≥ 0 from f(yk) ≥ f(x*). Also, for each binding (active) constraint gj, we have

gj(yk) ≤ gj(x*).

Again, using the linear approximation through differentiability,

gj(yk) ≈ gj(x*) + ∇gj(x*)·(yk − x*) = gj(x*) + δk ∇gj(x*)·hk

for k large enough. Then, we must have ∇gj(x*)·h ≤ 0, because ∇gj(x*)·hk is linear and continuous in hk and gj(yk) ≤ gj(x*) for each k.

If ∇gj(x*)·h = 0 for all j ∈ J, then the proof goes through just as in Theorem 6.8. Therefore, there must exist at least one j ∈ J such that ∇gj(x*)·h < 0. Then, since λj > 0 for all j ∈ J, we obtain

−Σ_{j∈J} λj ∇gj(x*)·h > 0,

and hence, using ∇f(x*)·h ≥ 0,

( ∇f(x*) − Σ_{j∈J} λj ∇gj(x*) )·h > 0.

But the Kuhn-Tucker conditions give ∇f(x*) − Σ_{j∈J} λj ∇gj(x*) = 0, so the left-hand side is zero, a contradiction. This completes the proof.

Exercise 6.5 Consider the problem

max f(x, y) = x subject to g(x, y) = x³ + y² = 0.

Show that this problem does not satisfy the constraint qualification.

6.6 Nonnegativity Constraints

Consider the problem

max f(x) subject to gj(x) ≤ 0, j = 1, ..., m, and xi ≥ 0 for all i = 1, ..., n.

We introduce n new constraints in addition to the m original ones:

g_{m+1}(x) = −x1 ≤ 0
g_{m+2}(x) = −x2 ≤ 0
    ⋮
g_{m+n}(x) = −xn ≤ 0.

We introduce the Lagrange multipliers μ1, ..., μn to go with the new constraints and form the extended Lagrangian

L1(x) = f(x) − Σ_{j=1}^m λj gj(x) − Σ_{i=1}^n μi (−xi).

The corresponding Kuhn-Tucker conditions are

∂f(x*)/∂xi − Σ_{j=1}^m λj ∂gj(x*)/∂xi + μi = 0, i = 1, ..., n,
λj ≥ 0 and λj = 0 if gj(x*) < 0, j = 1, ..., m,
μi ≥ 0 and μi = 0 if xi* > 0, i = 1, ..., n.

Eliminating the μi, the necessary conditions for the optimization problem are sometimes formulated slightly differently, as below:

∂f(x*)/∂xi − Σ_{j=1}^m λj ∂gj(x*)/∂xi ≤ 0 (= 0 if xi* > 0), i = 1, ..., n,

since

∂f(x*)/∂xi − Σ_{j=1}^m λj ∂gj(x*)/∂xi = −μi, i = 1, ..., n.

6.7 Concave Programming

We now consider the case when f(·) is concave and each gj is a convex function. In this case, the set of feasible vectors satisfying the m constraints is convex. We write the concave program as follows:

max f(x) subject to g(x) ≤ 0,

where g(x) = (g1(x), ..., gm(x)) and 0 = (0, ..., 0).

Definition 6.3 Let f(·) be concave on a convex set S ⊆ Rⁿ, and let x0 be an interior point in S. Then, there exists a vector p ∈ Rⁿ such that for all x ∈ S,

f(x) − f(x0) ≤ p·(x − x0).

A vector p that satisfies the above inequality is called a supergradient for f at x0.

Definition 6.4 The nonlinear programming problem satisfies the Slater qualification if there exists a vector z ∈ Rⁿ such that g(z) ≪ 0, i.e., gj(z) < 0 for all j.

Theorem 6.13 (Necessary Conditions for Concave Programming) Suppose that the nonlinear program is a concave program satisfying the Slater constraint qualification. Then, the optimal value function f*(c) is defined for (at least) all c ≥ g(z), and has a supergradient λ at 0. Furthermore, if λ is any supergradient of f* at 0, then λ ≥ 0, and any solution x* of the concave programming problem is an unconstrained maximum point of the Lagrangian L(x, λ) = f(x) − λ·g(x), which also satisfies λ·g(x*) = 0 (the complementary slackness condition).

Proof of the Necessity for Concave Programming: We consider only the special but usual case where, for all c ∈ Rᵐ, the feasible set of points x that satisfy g(x) ≤ c is bounded, and thus compact, because the functions gj are C¹, i.e., continuous (see Theorem 3.16). In this case, f*(c) is defined as a maximum value whenever there exists at least one x satisfying g(x) ≤ c, which is certainly true when c ≥ g(z). Then, f* is defined for all c ≥ g(z).

Theorem 6.14 (Sufficient Conditions for Concave Programming) Consider the nonlinear programming problem with f(·) concave and g(·) convex, and assume that there exist a vector λ ≥ 0 and a feasible vector x* which together have the property that x* maximizes f(x) − λ·g(x) among all x ∈ Rⁿ, and λ·g(x*) = 0. Then, x* solves the original concave problem and λ is a supergradient for f* at 0.

6.8 Quasiconcave Programming

The following theorem is important for economists, because in many economic optimization problems the objective function is assumed to be quasiconcave, rather than concave.

Theorem 6.15 (Arrow and Enthoven (1961), Econometrica) (Sufficient Conditions for Quasiconcave Programming): Consider the constrained optimization problem where the objective function f(·) is C¹ and quasiconcave. Assume that there exist numbers λ1, ..., λm and a vector x* such that

1. x* is feasible and satisfies the Kuhn-Tucker conditions;

2. ∇f(x*) ≠ 0;

3. λj gj(x) is quasiconvex for each j = 1, ..., m.

Then, x* is optimal.

6.9


Chapter 7 Differential Equations

7.1 Introduction

Unlike in algebraic equations, in a differential equation:

- The unknown is a function, not a number.
- The equation includes one or more of the derivatives of the function.

An ordinary differential equation is one for which the unknown is a function of only one variable. Partial differential equations are equations where the unknown is a function of two or more variables, and one or more of the partial derivatives of the function are included. In this chapter, I restrict attention to first-order ordinary differential equations, that is, equations where the first-order derivatives of the unknown functions of one variable are included.

Consider the following differential equation:

ẋ = dx/dt = ax,  (∗)

where ẋ denotes the time derivative of x(t). The above equation says that for any t ∈ R,

x′(t) = ax(t),

where a is some constant. I propose here x(t) = Ke^{at} as a solution to the differential equation.
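The proposed solution can be cross-checked against a crude forward-Euler integration of (∗). All numerical values below are illustrative choices.

import numpy as np

a, K, dt, T = 0.8, 2.0, 1e-4, 1.0
x = K                                # x(0) = K
for _ in range(int(T / dt)):
    x += dt * a * x                  # one Euler step of x' = a x
print(x, K * np.exp(a * T))          # Euler path ≈ closed form ≈ 4.451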

Chapter 8 Fixed Point Theorems

The fixed point problem is:

Given a set S ⊆ Rⁿ and a function f : S → S, is there an x ∈ S such that f(x) = x?

The problem of finding the zeros of a function f(·), an x ∈ S such that f(x) = 0, can be converted into a fixed point problem. To see this, observe that f(x) = 0 iff g(x) = x, where g(x) = f(x) + x. The unconstrained optimization problem with a concave objective function is a special case of the fixed point problem. In this case, the optimal solution is found by solving ∇f(x) = 0.

8.1 Contraction Mappings

Definition 8.1 A function f : S → S is called a contraction mapping if

d(f(x), f(y)) ≤ βd(x, y) for all x, y ∈ S,

where 0 ≤ β < 1 is a fixed constant.

Theorem 8.1 (Banach Fixed Point Theorem) Let S ⊆ Rⁿ be closed and f : S → S a contraction mapping. Then, there exists a unique x ∈ S such that f(x) = x.

Proof of the Banach Fixed Point Theorem: We define the norm of vectors in Rⁿ as follows:

‖x‖ ≡ max_{1≤i≤n} |xi|.

We used this norm in the proof of the implicit function theorem (Theorem 5.18). Choose any x0 ∈ S and let xk = f(x_{k−1}). If the sequence {xk} has a limit x*, then x* ∈ S because S is closed, and f(x*) = x* because a contraction is continuous. Therefore, it suffices to prove that {xk} has a limit. We use the Cauchy criterion. Pick q > p. Then,

‖xq − xp‖ = ‖Σ_{k=p}^{q−1} (x_{k+1} − xk)‖ ≤ Σ_{k=p}^{q−1} ‖x_{k+1} − xk‖  (Minkowski inequality)

But

‖x_{k+1} − xk‖ = ‖f(xk) − f(x_{k−1})‖ ≤ β‖xk − x_{k−1}‖.

Repeated application of the above yields

‖x_{k+1} − xk‖ ≤ β^k ‖x1 − x0‖.

Hence

‖xq − xp‖ ≤ Σ_{k=p}^{q−1} β^k ‖x1 − x0‖ ≤ ‖x1 − x0‖ (β^p + β^{p+1} + ···) = ‖x1 − x0‖ β^p / (1 − β) → 0 as p, q → ∞,

because β^p → 0 as p → ∞ due to β < 1. Thus {xk} is Cauchy and has a limit x*. For uniqueness, note that if f(x) = x and f(y) = y, then d(x, y) = d(f(x), f(y)) ≤ βd(x, y), which forces d(x, y) = 0, i.e., x = y.
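The proof is constructive: iterating f from any starting point converges to the fixed point. A one-dimensional sketch with the contraction f(x) = 0.5x + 1 (so β = 0.5 and the fixed point is x* = 2; the starting point is arbitrary):

f = lambda x: 0.5 * x + 1.0
x = 37.0
for _ in range(60):
    x = f(x)                # the Banach iteration x_{k+1} = f(x_k)
print(x)                    # ≈ 2.0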

8.2 Brouwer Fixed Point Theorem

Lemma 8.1 If f : [0, 1] → [0, 1] is continuous, there exists x ∈ [0, 1] such that f(x) = x.

Proof of Lemma 8.1: Each x ∈ [0, 1] can be represented as a convex combination of the end points of the interval:

x = (1 − x)·0 + x·1.

The same will be true for f(x). So we express each x ∈ [0, 1] as a pair of nonnegative numbers (x1, x2) = (1 − x, x) that add to one. When expressing f(x) in this way, we will write it as (f1(x), f2(x)) = (1 − f(x), f(x)). Suppose, for a contradiction, that f has no fixed point.

Since f : [0, 1] → [0, 1], we can think of the function f(·) as moving each point x ∈ [0, 1] either to the right (if f(x) > x) or to the left (if f(x) < x). The assumption that f(·) has no fixed point eliminates the possibility that f(·) leaves the position of x unchanged.

Given any x ∈ [0, 1], we label it with a (+) if f1(x) < x1 (move to the right) and label it (−) if f1(x) > x1 (move to the left). The assumption of no fixed point implies f1(x) ≠ 1 − x for all x ∈ [0, 1]. Thus, the labeling scheme is well defined. Notice that the point 0 will be labeled (+) and the point 1 will be labeled (−).

Choose any finite partition, Π0, of the interval [0, 1] into smaller intervals.

Claim 8.1 The partition Π0 must contain a subinterval [x0, y0] whose endpoints have different labels.

Proof of Claim 8.1: Every endpoint of these subintervals is labeled either (+) or (−). The point 0, which must be an endpoint of some subinterval of Π0, has label (+). The point 1 has label (−). As we travel from 0 to 1 (left to right), we leave a point labeled (+) and arrive at a point labeled (−). At some point, we must pass through a subinterval whose endpoints have different labels.

Now take the partition Π0 and form a new partition Π1, finer than the first, by taking all the subintervals in Π0 whose endpoints have different labels and cutting them in half. In Π1, there must exist at least one subinterval, [x1, y1], with endpoints having different labels. Repeat this procedure indefinitely.

This produces an infinite sequence of subintervals {(xk, yk)} shrinking in size with different labels at the endpoints. Furthermore, we can choose a subsequence of them so that the left-hand endpoint, xk, is labeled (+) and the right-hand endpoint, yk, is labeled (−). Since these intervals live in [0, 1], their lengths are bounded. Therefore, by the Bolzano-Weierstrass theorem (Theorem 3.13), there is a convergent subsequence of them, with |xk − yk| → 0 as k → ∞. By continuity of f(·), |f(xk) − f(yk)| → 0 as k → ∞.

Let z be the common limit point of {xk} and {yk}. By continuity, f(xk) and f(yk) both converge to f(z). Since each xk is labeled (+) and each yk is labeled (−), for each k we have f1(xk) < xk1, and in the limit f1(z) ≤ z1. For each k, we have f1(yk) > yk1, and in the limit f1(z) ≥ z1. Thus, f1(z) ≤ z1 and f1(z) ≥ z1. This implies that f1(z) = z1, i.e., f(z) = z, a fixed point. This is a contradiction.
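The labeling argument is effectively a bisection: keep halving an interval whose endpoints carry the labels (+) and (−). The sketch below runs it for f(x) = cos x, which maps [0, 1] into itself (an illustrative choice of f).

import math

f = lambda x: math.cos(x)
lo, hi = 0.0, 1.0              # 0 is labeled (+), 1 is labeled (-)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if f(mid) >= mid:          # label (+): the point moves right
        lo = mid
    else:                      # label (-): the point moves left
        hi = mid
print(lo)                      # ≈ 0.739085, the fixed point of cos on [0, 1]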

Definition 8.2 The n-simplex is the set Δⁿ = {x ∈ Rⁿ | Σ_{i=1}^n xi = 1 and xi ≥ 0 for all i = 1, ..., n}.

From the definition, Δⁿ is convex and compact. We also see that this is an (n − 1)-dimensional object.

Lemma 8.2 If f : Δⁿ → Δⁿ is a continuous function, then there exists x ∈ Δⁿ such that f(x) = x.

We skip the proof of Lemma 8.2. We should note that Lemma 8.1 is a special case of Lemma 8.2. Before showing Brouwer's fixed point theorem, we need some preliminaries.

Definition 8.3 A set A is topologically equivalent to a set B if there exists a continuous function g with continuous inverse such that g(A) = B and g⁻¹(B) = A.

Observe that topological equivalence is a weaker requirement than that for the inverse function theorem. Do you see why? The closed n-ball of center x0 in Rⁿ is the set {x ∈ Rⁿ | d(x, x0) ≤ 1}. Note that a closed n-ball is of dimension n.

Theorem 8.2 A nonempty compact convex set S ⊆ Rⁿ of dimension m ≤ n is topologically equivalent to a closed ball in Rᵐ.

We skip the proof of Theorem 8.2. Now, it is time to prove Brouwer's fixed point theorem.

Theorem 8.3 (Brouwer Fixed Point Theorem (1912)) If S ⊆ Rⁿ is compact and convex and f : S → S is continuous, there exists x ∈ S such that f(x) = x.

Proof of the Brouwer Fixed Point Theorem: Lemma 8.2 shows that a continuous function f : Δⁿ → Δⁿ must have a fixed point. Then, it only remains to prove that there is no loss of generality in assuming that S = Δⁿ, so that any compact convex set of dimension n − 1 in Rⁿ can stand in for Δⁿ. To do so, we make use of the topological equivalence of compact convex sets.

If S is a compact convex set of dimension n − 1, we know from Theorem 8.2 that there are g : S → Δⁿ and g⁻¹ : Δⁿ → S such that g and g⁻¹ are continuous. Define h : Δⁿ → Δⁿ as follows:

h(x) = g( f( g⁻¹(x) ) ).

Since h(·) is continuous, by Lemma 8.2 it has a fixed point x*. Therefore, h(x*) = g(f(g⁻¹(x*))) = x*, so applying g⁻¹ to both sides gives f(g⁻¹(x*)) = g⁻¹(x*); that is, g⁻¹(x*) is a fixed point of f.

Chapter 9 Separation Theorems

9.1 Separation Theorems

Let a ∈ Rⁿ with a ≠ 0 and let α ∈ R. Then

H = {x ∈ Rⁿ | a·x = α}

is a hyperplane in Rⁿ, with a as its normal. Moreover, the hyperplane H separates Rⁿ into two closed half-spaces.

If S and T are subsets of Rⁿ, then H is said to separate S and T if S is contained in one of the closed half-spaces determined by H and T is contained in the other. In other words, S and T can be separated by a hyperplane if there exist a vector a ≠ 0 and a scalar α such that, for all x ∈ S and y ∈ T,

a·x ≤ α ≤ a·y.

If both inequalities are strict, then the hyperplane H = {x ∈ Rⁿ | a·x = α} strictly separates S and T.

Theorem 9.1 (Strict Separation Theorem) Let S be a closed convex set in Rⁿ, and let y be a point in Rⁿ that does not belong to S. Then there exist a nonzero vector a ∈ Rⁿ\{0} and a number α ∈ R such that

a·x < α < a·y

for all x ∈ S. For every such α, the hyperplane H = {x ∈ Rⁿ | a·x = α} strictly separates S and y.

Proof of Theorem 9.1: Because S is a closed set, among all the points of S there is one, w = (w1, ..., wn), that is closest to y. For this, suppose that there is no such closest point; then closedness of S gives us a contradiction. (You should fill the gap in this argument.) Let a = y − w. Since w ∈ S and y ∉ S, it follows that a ≠ 0. Note that a·(y − w) = a·a > 0, and so a·w < a·y. Suppose we prove that

a·x ≤ a·w for all x ∈ S.  (∗)

Then the theorem is true for every number α ∈ (a·w, a·y). Now, it remains to show (∗). Let x be any point in S. Since S is convex, λx + (1 − λ)w ∈ S for each λ ∈ [0, 1]. Now define g(λ) as the square of the distance from λx + (1 − λ)w to the point y:

g(λ) = ‖y − (λx + (1 − λ)w)‖² = ‖(y − w) + λ(w − x)‖².

Note g(0) = ‖y − w‖², the square of the distance between y and w. But w is the point in S that is closest to y, so g(λ) ≥ g(0) for all λ ∈ [0, 1]. It follows that 0 ≤ g′(0) = 2(y − w)·(w − x) = 2a·(w − x), which proves (∗).
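The construction in the proof (closest point w, normal a = y − w) is directly computable whenever the projection onto S is known. A sketch with S the closed unit disk in R² and y = (3, 4) outside it (illustrative choices):

import numpy as np

y = np.array([3.0, 4.0])
w = y / np.linalg.norm(y)          # closest point of the unit disk to y
a = y - w                          # normal of the separating hyperplane

print(a @ w, a @ y)                # 4.0 < 20.0, so any α in (4, 20) separates

rng = np.random.default_rng(2)
xs = rng.normal(size=(1000, 2))
xs /= np.maximum(1.0, np.linalg.norm(xs, axis=1, keepdims=True))   # points of S
print(bool(np.all(xs @ a <= a @ w + 1e-9)))                        # True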

In the proof of the above theorem, it was essential that y did not belong to S. If S is an arbitrary convex set (not necessarily closed), and if y is not an interior point of S, then it seems plausible that y can still be separated from S by a hyperplane. If y is a boundary point of S, such a hyperplane is called a supporting hyperplane to S at y.

Theorem 9.2 (Separating Hyperplane) Let S be a convex set in Rⁿ and suppose y = (y1, ..., yn) is not an interior point of S. Then, there exists a nonzero vector a ∈ Rⁿ such that

a·x ≤ a·y for every x ∈ S.

Proof of Theorem 9.2: Let S̄ be the closure of S. Because S is convex, so is S̄. (Do you see why?) Because y is not an interior point of S and S is convex, y is not an interior point of S̄. Hence, there is a sequence {yk} of points for which yk ∉ S̄ for each k and yk → y as k → ∞. Now, yk ∉ S̄, so by the strict separation theorem (Theorem 9.1), for each k there exists a vector ak ≠ 0 such that ak · x < ak · yk for all x ∈ S̄. Without loss of generality, we can assume that ‖ak‖ = 1 for each k. Then, {ak} is a sequence of vectors in {x ∈ R^n | ‖x‖ = 1}, which is a compact set. The Bolzano-Weierstrass theorem (Theorem 3.13) shows that {ak} has a convergent subsequence {akj}, j = 1, 2, . . .. Let a = limj→∞ akj. Then, a · x = limj→∞ akj · x ≤ limj→∞ akj · ykj = a · y for every x ∈ S̄, and hence for every x ∈ S, as required. Here we make use of the continuity of linear functions. Moreover, we can confirm that a ≠ 0 because ‖a‖ = limj→∞ ‖akj‖ = 1.
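For example, let S be the closed unit disk in R² and let y = (1, 0), a boundary point of S. Taking a = (1, 0) gives a · x = x1 ≤ 1 = a · y for every x ∈ S, so the vertical line {x ∈ R² | x1 = 1} is a supporting hyperplane to S at y.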

Theorem 9.3 (Separating Hyperplane Theorem) Let S and T be two disjoint nonempty convex sets in R^n. Then, there exist a nonzero vector a ∈ R^n and a scalar α ∈ R such that

a · x ≤ α ≤ a · y

for all x ∈ S and all y ∈ T.

Proof of Separating Hyperplane Theorem: Let W = S − T = {s − t | s ∈ S, t ∈ T} be the vector difference of the two convex sets S and T. Since S and T are disjoint, 0 ∉ W. First, I claim that W is convex.

Claim 9.1 W is convex.

Proof of Claim 9.1: Let w, w′ ∈ W. Then, there are s, s′ ∈ S and t, t′ ∈ T such that w = s − t and w′ = s′ − t′. Let λ ∈ [0, 1]. What we want to show is that λw + (1 − λ)w′ ∈ W. We compute the convex combination below:

λw + (1 − λ)w′ = λs − λt + (1 − λ)s′ − (1 − λ)t′ = [λs + (1 − λ)s′] − [λt + (1 − λ)t′].

Since S and T are convex, λs + (1 − λ)s′ ∈ S and λt + (1 − λ)t′ ∈ T. Hence, λw + (1 − λ)w′ ∈ S − T = W.

Since 0 ∉ W, 0 is not an interior point of W. Hence, by the previous theorem (Theorem 9.2), there exists an a ≠ 0 such that a · w ≤ a · 0 = 0 for all w ∈ W. Let x ∈ S and y ∈ T be any two points of these sets. Then w = x − y ∈ W by definition, so a · (x − y) ≤ 0. Hence

a · x ≤ a · y.   (∗∗)

From (∗∗) it follows that the set A = {a · x | x ∈ S} is bounded above by a · y for any y ∈ T. By Fact 2.1 (Least Upper Bound Principle), A has a supremum α, so a · x ≤ α for all x ∈ S. Since α is the least upper bound of A and each a · y with y ∈ T is an upper bound of A, it follows that α ≤ a · y for every y ∈ T. Therefore, a · x ≤ α ≤ a · y for all x ∈ S and all y ∈ T. Thus, S and T are separated by the hyperplane {z ∈ R^n | a · z = α}.

Theorem 9.4 Let S and T be two disjoint, nonempty, closed, convex sets in R^n with S being bounded. Then, there exist a nonzero vector a ∈ R^n and a scalar α ∈ R such that

a · x > α > a · y

for all x ∈ S and all y ∈ T.
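The boundedness of S cannot be dropped. For example, S = {(x1, x2) ∈ R² | x2 ≤ 0} and T = {(x1, x2) ∈ R² | x1 > 0, x1 x2 ≥ 1} are disjoint, nonempty, closed, and convex, and the hyperplane {x ∈ R² | x2 = 0} separates them; but since points of T come arbitrarily close to that hyperplane as x1 → ∞, the sets S and T cannot be strictly separated.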

Lemma 9.2 Let A be an m × n matrix, and let cone(A) = {Ax | x ∈ R^n, x ≥ 0} be the cone generated by the columns of A. Then cone(A) is a closed set.

Theorem 9.5 (Farkas' Lemma) Let A be an m × n matrix, b ∈ R^m, and F = {x ∈ R^n | Ax = b, x ≥ 0}. Then either F ≠ ∅ or there exists y ∈ R^m such that yA ≥ 0 and y · b < 0, but not both.

Proof of Farkas' Lemma: First, both alternatives cannot hold at once: if x ∈ F and yA ≥ 0, then y · b = y · (Ax) = (yA) · x ≥ 0 because x ≥ 0. Next, note that F ≠ ∅ if and only if b ∈ cone(A). So suppose F = ∅, that is, b ∉ cone(A). By Lemma 9.2, cone(A) is a closed set, and it is clearly convex. Hence, by Theorem 9.1, there exist c ≠ 0 and α ∈ R such that c · z < α < c · b for all z ∈ cone(A). Since 0 ∈ cone(A), we have α > 0. Moreover, c · z ≤ 0 for every z ∈ cone(A): if c · z > 0 for some z, then c · (λz) = λ(c · z) would exceed α for λ > 0 large enough, even though λz ∈ cone(A). In particular, c · aj ≤ 0 for every column aj of A. Setting y = −c, we obtain yA ≥ 0 and y · b = −c · b < −α < 0.
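As a computational aside, each alternative in Farkas' Lemma can be checked by a linear program. A minimal Python sketch using SciPy's linprog (the helper name farkas_alternative is mine, and the box bounds on y are only there to keep the second LP bounded):

import numpy as np
from scipy.optimize import linprog

def farkas_alternative(A, b):
    m, n = A.shape
    # Alternative 1: is F = {x | Ax = b, x >= 0} nonempty?
    res = linprog(np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)
    if res.success:
        return "feasible", res.x
    # Alternative 2: minimize y.b subject to yA >= 0 (i.e., -A'y <= 0).
    # When F is empty, Farkas guarantees some y in the box achieves y.b < 0.
    res = linprog(b, A_ub=-A.T, b_ub=np.zeros(n), bounds=[(-1, 1)] * m)
    return "infeasible", res.x   # y with yA >= 0 and y.b < 0

A = np.eye(2)
print(farkas_alternative(A, np.array([1.0, 2.0])))    # feasible, x = (1, 2)
print(farkas_alternative(A, np.array([-1.0, 2.0])))   # infeasible, y = (1, 0)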

9.2 Polyhedral Cones

Definition 9.1 A cone C ⊆ R^n is polyhedral if there is a matrix A such that C = {x ∈ R^n | Ax ≤ 0}.

Geometrically, a polyhedral cone is the intersection of a finite number of half-spaces through the origin.

Theorem 9.6 (Farkas-Minkowski-Weyl) A cone C is polyhedral if and only if it is finitely generated, that is, if and only if there is a finite matrix A such that C = cone(A).
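For example, the nonnegative orthant C = {x ∈ R² | x1 ≥ 0, x2 ≥ 0} is polyhedral, with A = −I (the intersection of two half-spaces through the origin), and it is also finitely generated: C = cone(I), the set of all nonnegative combinations of the unit vectors e1 and e2, exactly as the theorem requires.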


9.3 Dimension of a Set

9.4

