Kuttler
I Prerequisite Material
1 Some Fundamental Concepts
1.1 Set Theory
1.1.1 Basic Definitions
1.1.2 The Schroder Bernstein Theorem
1.1.3 Equivalence Relations
1.2 lim sup And lim inf
1.3 Double Series
3 Sequences
3.1 The Inner Product And Dot Product
3.1.1 The Dot Product
3.2 Vector Valued Sequences And Their Limits
3.3 Sequential Compactness
3.4 Closed And Open Sets
3.5 Cauchy Sequences And Completeness
3.6 Shrinking Diameters
3.7 Exercises
4 Continuous Functions
4.1 Continuity And The Limit Of A Sequence
4.2 The Extreme Values Theorem
4.3 Connected Sets
4.4 Uniform Continuity
4.5 Sequences And Series Of Functions
4.6 Polynomials
4.7 Sequences Of Polynomials, Weierstrass Approximation
4.7.1 The Tietze Extension Theorem
Prerequisite Material
Some Fundamental Concepts
1. Two sets are equal if and only if they have the same elements.
2. To every set A, and to every condition S (x) there corresponds a set B, whose
elements are exactly those elements x of A for which S (x) holds.
3. For every collection of sets there exists a set that contains all the elements
that belong to at least one set of the given collection.
5. If A is a set there exists a set P (A) such that P (A) is the set of all subsets
of A. This is called the power set.
determine whether the element in question is in the set. For example, the set of all
integers which are multiples of 2. This set could be specified as follows.
$$\{x \in \mathbb{Z} : x = 2y \text{ for some } y \in \mathbb{Z}\}.$$
In this notation, the colon is read as "such that" and in this case the condition is
being a multiple of 2.
Another example, of political interest, could be the set of all judges who are not
judicial activists. This last is not a very precise condition, since
there is no way to determine to everyone's satisfaction whether a given judge is an
activist. Also, just because something is grammatically correct does not
mean it makes any sense. For example consider the following nonsense.
So what is a condition?
We will leave these sorts of considerations and assume our conditions make sense.
The axiom of unions states that for any collection of sets, there is a set consisting
of all the elements in each of the sets in the collection. Of course this is also open to
further consideration. What is a collection? Maybe it would be better to say "set
of sets" or, given a set whose elements are sets, there exists a set whose elements
consist of exactly those things which are elements of at least one of these sets. If $S$
is such a set whose elements are sets
$$\cup\{A : A \in S\} \text{ or } \cup S$$
are also elements of $B$. The axiom of specification shows this is a set. The empty
set is the set which has no elements in it, denoted as $\emptyset$. $A \cup B$ denotes the union
of the two sets $A$ and $B$ and it means the set of all elements which are in either of
the sets. It is a set because of the axiom of unions.
The complement of a set (the set of things which are not in the given set) must
be taken with respect to a given set called the universal set, which is a set which
contains the one whose complement is being taken. Thus, the complement of $A$,
denoted as $A^C$ (or more precisely as $X \setminus A$), is a set obtained from using the axiom
of specification to write
$$A^C \equiv \{x \in X : x \notin A\}$$
The symbol $\notin$ means: "is not an element of". Note the axiom of specification takes
place relative to a given set. Without this universal set it makes no sense to use
the axiom of specification to obtain the complement.
Words such as "all" or "there exists" are called quantifiers and they must be
understood relative to some given set. For example, the set of all integers larger
than 3. Or there exists an integer larger than 7. Such statements have to do with a
given set, in this case the integers. Failure to have a reference set when quantifiers
are used turns out to be illogical even though such usage may be grammatically
correct. Quantifiers are used often enough that there are symbols for them. The
symbol $\forall$ is read as "for all" or "for every" and the symbol $\exists$ is read as "there
exists". Thus $\forall \forall \exists \exists$ could mean for every upside down A there exists a backwards
E.
De Morgan's laws are very useful in mathematics. Let $S$ be a set of sets each of
which is contained in some universal set $U$. Then
$$\cup\{A^C : A \in S\} = \left(\cap\{A : A \in S\}\right)^C$$
and
$$\cap\{A^C : A \in S\} = \left(\cup\{A : A \in S\}\right)^C.$$
These laws follow directly from the definitions. Also following directly from the
definitions are:
Let $S$ be a set of sets then
$$B \cup \cup\{A : A \in S\} = \cup\{B \cup A : A \in S\}.$$
$$B \cap \cup\{A : A \in S\} = \cup\{B \cap A : A \in S\}.$$
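De Morgan's laws can be checked mechanically on a small finite example. The following is only an illustrative sketch: the universal set `U` and the collection `S` are made-up values, and a check on one example is of course not a proof.

```python
# Hypothetical finite example: U plays the role of the universal set,
# S is a collection of subsets of U.
U = set(range(10))
S = [{0, 1, 2}, {2, 3, 4}, {2, 4, 6, 8}]


def complement(A, universe=U):
    """Complement of A relative to the given universal set."""
    return universe - A


union_S = set().union(*S)                 # union of all sets in S
intersection_S = set(U).intersection(*S)  # intersection of all sets in S

# Union of the complements equals the complement of the intersection.
de_morgan_1 = set().union(*[complement(A) for A in S]) == complement(intersection_S)

# Intersection of the complements equals the complement of the union.
de_morgan_2 = set(U).intersection(*[complement(A) for A in S]) == complement(union_S)

print(de_morgan_1, de_morgan_2)
```

Note that the complement is always taken relative to `U`, in line with the remark that a complement makes no sense without a reference set.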
Unfortunately, there is no single universal set which can be used for all sets.
Here is why: Suppose there were. Call it $S$. Then you could consider $A$, the set
of all elements of $S$ which are not elements of themselves, this from the axiom of
specification. If $A$ is an element of itself, then it fails to qualify for inclusion in $A$.
Therefore, it must not be an element of itself. However, if this is so, it qualifies for
inclusion in $A$ so it is an element of itself and so this can't be true either. Thus
the most basic of conditions you could imagine, that of being an element of, is
meaningless and so allowing such a set causes the whole theory to be meaningless.
The solution is to not allow a universal set. As mentioned by Halmos in Naive
Set Theory, "Nothing contains everything". Always beware of statements involving
quantifiers wherever they occur, even this one. This little observation described
above is due to Bertrand Russell and is called Russell's paradox.
$$X \times Y \equiv \{(x, y) : x \in X \text{ and } y \in Y\}$$
$$D(f) \equiv \{x : (x, y) \in f\},$$
written as $f : D(f) \to Y$.
It is probably safe to say that most people do not think of functions as a type
of relation which is a subset of the Cartesian product of two sets. A function is like
a machine which takes inputs, $x$, and makes them into a unique output, $f(x)$. Of
course, that is what the above definition says with more precision. An ordered pair
$(x, y)$ which is an element of the function or mapping has an input, $x$, and a unique
output, $y$, denoted as $f(x)$, while the name of the function is $f$. "Mapping" is often
a noun meaning function. However, it also is a verb as in "$f$ is mapping $A$ to $B$".
That which a function is thought of as doing is also referred to using the word
"maps" as in: $f$ maps $X$ to $Y$. However, a set of functions may be called a set of
maps so this word might also be used as the plural of a noun. There is no help for
it. You just have to suffer with this nonsense.
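The view of a function as a set of ordered pairs in which each input has a unique output can be made concrete. This is a minimal sketch; the helper names `is_function` and `domain` and the sample relations are hypothetical and chosen only for illustration.

```python
def is_function(pairs):
    """Return True if no input x is paired with two different outputs."""
    outputs = {}
    for x, y in pairs:
        if x in outputs and outputs[x] != y:
            return False  # x would have two distinct outputs
        outputs[x] = y
    return True


def domain(pairs):
    """D(f): the set of all first components of the pairs."""
    return {x for x, _ in pairs}


f = {(1, "a"), (2, "b"), (3, "a")}  # each input has a single output
r = {(1, "a"), (1, "b")}            # the input 1 has two outputs

print(is_function(f), is_function(r), domain(f))
```

Here `f` is a function in the sense of the definition while `r` is only a relation.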
The following theorem which is interesting for its own sake will be used to prove
the Schroder Bernstein theorem.
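$$A \cup B = X, \quad C \cup D = Y, \quad A \cap B = \emptyset, \quad C \cap D = \emptyset,$$
$$f(A) = C, \quad g(D) = B.$$
[Figure: $X$ is split into $A$ and $B = g(D)$; $Y$ is split into $C = f(A)$ and $D$; $f$ maps $A$ to $C$ and $g$ maps $D$ to $B$.]
$$C \equiv f(A), \quad D \equiv Y \setminus C, \quad B \equiv X \setminus A.$$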
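Definition 1.1.4 Let $I$ be a set and let $X_i$ be a set for each $i \in I$. $f$ is a choice
function, written as
$$f \in \prod_{i \in I} X_i,$$
if $f(i) \in X_i$ for each $i \in I$.
The axiom of choice says that if $X_i \ne \emptyset$ for each $i \in I$, for $I$ a set, then
$$\prod_{i \in I} X_i \ne \emptyset.$$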
Sometimes the two functions, f and g are onto but not one to one. It turns out
that with the axiom of choice, a similar conclusion to the above may be obtained.
Similarly $g_0^{-1}$ is one to one. Therefore, by the Schroder Bernstein theorem, there
exists $h : X \to Y$ which is one to one and onto.
Definition 1.1.6 A set $S$ is finite if there exists a natural number $n$ and a map $\theta$
which maps $\{1, \cdots, n\}$ one to one and onto $S$. $S$ is infinite if it is not finite. A
set $S$ is called countable if there exists a map $\theta$ mapping $\mathbb{N}$ one to one and onto
$S$. (When $\theta$ maps a set $A$ to a set $B$, this will be written as $\theta : A \to B$ in the future.)
Here $\mathbb{N} \equiv \{1, 2, \cdots\}$, the natural numbers. $S$ is at most countable if there exists a
map $\theta : \mathbb{N} \to S$ which is onto.
Theorem 1.1.7 If $X$ and $Y$ are both at most countable, then $X \cup Y$ is also at most
countable. If either $X$ or $Y$ is countable, then $X \cup Y$ is also countable.
$$X = \{x_1, x_2, x_3, \cdots\}$$
and
$$Y = \{y_1, y_2, y_3, \cdots\}.$$
Consider the following array consisting of $X \cup Y$ and path through it.
$$\begin{array}{cccc} x_1 & x_2 & x_3 & \cdots \\ y_1 & y_2 & \cdots & \end{array}$$
Thus the first element of $X \cup Y$ is $x_1$, the second is $x_2$, the third is $y_1$, the fourth is
$y_2$, etc.
Consider the second claim. By the first part, there is a map from $\mathbb{N}$ onto $X \cup Y$.
Suppose without loss of generality that $X$ is countable and $\phi : \mathbb{N} \to X$ is one to one
and onto. Then define $\eta(y) \equiv 1$ for all $y \in Y$, and $\eta(x) \equiv \phi^{-1}(x)$. Thus, $\eta$ maps
$X \cup Y$ onto $\mathbb{N}$ and this shows there exist two onto maps, one mapping $X \cup Y$ onto $\mathbb{N}$
and the other mapping $\mathbb{N}$ onto $X \cup Y$. Then Corollary 1.1.5 yields the conclusion.
person, just that they weighed the same. Often such relations involve considering
one characteristic of the elements of a set and then saying the two elements are
equivalent if they are the same as far as the given characteristic is concerned.
2. If $x \sim y$ then $y \sim x$. (Symmetric)
Definition 1.1.10 $[x]$ denotes the set of all elements of $S$ which are equivalent to
$x$ and $[x]$ is called the equivalence class determined by $x$ or just the equivalence class
of $x$.
With the above definition one can prove the following simple theorem.
Theorem 1.1.11 Let $\sim$ be an equivalence relation defined on a set $S$ and let $H$ denote
the set of equivalence classes. Then if $[x]$ and $[y]$ are two of these equivalence classes,
either $x \sim y$ and $[x] = [y]$, or it is not true that $x \sim y$ and $[x] \cap [y] = \emptyset$.
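The statement that equivalence classes are either equal or disjoint means that they partition the set. A small sketch of this, assuming as an illustrative example the relation "same remainder on division by 3" on the integers 0 through 9 (the function name `equivalence_classes` is hypothetical):

```python
def equivalence_classes(elements, related):
    """Group elements into classes; `related` is assumed to be an equivalence relation."""
    classes = []
    for x in elements:
        for cls in classes:
            # By symmetry and transitivity it suffices to compare with one representative.
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])  # x starts a new class
    return classes


def same_remainder(x, y):
    """x ~ y when x and y leave the same remainder on division by 3."""
    return (x - y) % 3 == 0


classes = equivalence_classes(range(10), same_remainder)
print(classes)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Any two of the resulting classes are disjoint and together they exhaust the set, as the theorem predicts.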
Proof: Let $\sup(\{A_n : n \in \mathbb{N}\}) = r$. In the first case, suppose $r < \infty$. Then
letting $\varepsilon > 0$ be given, there exists $n$ such that $A_n \in (r - \varepsilon, r]$. Since $\{A_n\}$ is
increasing, it follows if $m > n$, then $r - \varepsilon < A_n \le A_m \le r$ and so $\lim_{n \to \infty} A_n = r$
as claimed. In the case where $r = \infty$, then if $a$ is a real number, there exists $n$
such that $A_n > a$. Since $\{A_k\}$ is increasing, it follows that if $m > n$, $A_m > a$. But
this is what is meant by $\lim_{n \to \infty} A_n = \infty$. The other case is that $r = -\infty$. But
in this case, $A_n = -\infty$ for all $n$ and so $\lim_{n \to \infty} A_n = -\infty$. The case where $A_n$ is
decreasing is entirely similar.
Sometimes the limit of a sequence does not exist. For example, if $a_n = (-1)^n$,
then $\lim_{n \to \infty} a_n$ does not exist. This is because successive terms of the sequence are a
distance of 2 apart. Therefore there can't exist a single number such that all the
terms of the sequence are ultimately within 1/4 of that number. The nice thing
about lim sup and lim inf is that they always exist. First here is a simple lemma
and definition.
Definition 1.2.3 Denote by $[-\infty, \infty]$ the real line along with symbols $\infty$ and $-\infty$.
It is understood that $\infty$ is larger than every real number and $-\infty$ is smaller than
every real number. Then if $\{A_n\}$ is an increasing sequence of points of $[-\infty, \infty]$,
$\lim_{n \to \infty} A_n$ equals $\infty$ if the only upper bound of the set $\{A_n\}$ is $\infty$. If $\{A_n\}$ is
bounded above by a real number, then $\lim_{n \to \infty} A_n$ is defined in the usual way and
equals the least upper bound of $\{A_n\}$. If $\{A_n\}$ is a decreasing sequence of points of
$[-\infty, \infty]$, $\lim_{n \to \infty} A_n$ equals $-\infty$ if the only lower bound of the sequence $\{A_n\}$ is
$-\infty$. If $\{A_n\}$ is bounded below by a real number, then $\lim_{n \to \infty} A_n$ is defined in the
usual way and equals the greatest lower bound of $\{A_n\}$. More simply, if $\{A_n\}$ is
increasing,
$$\lim_{n \to \infty} A_n = \sup\{A_n\}$$
and if $\{A_n\}$ is decreasing then
$$\lim_{n \to \infty} A_n = \inf\{A_n\}.$$
Lemma 1.2.4 Let $\{a_n\}$ be a sequence of real numbers and let $U_n \equiv \sup\{a_k : k \ge n\}$.
Then $\{U_n\}$ is a decreasing sequence. Also if $L_n \equiv \inf\{a_k : k \ge n\}$, then $\{L_n\}$ is
an increasing sequence. Therefore, $\lim_{n \to \infty} L_n$ and $\lim_{n \to \infty} U_n$ both exist.
Proof: Let $W_n$ be an upper bound for $\{a_k : k \ge n\}$. Then since these sets are
getting smaller, it follows that for $m < n$, $W_m$ is an upper bound for $\{a_k : k \ge n\}$.
In particular if $W_m = U_m$, then $U_m$ is an upper bound for $\{a_k : k \ge n\}$ and so $U_m$
is at least as large as $U_n$, the least upper bound for $\{a_k : k \ge n\}$. The claim that
$\{L_n\}$ is increasing is similar.
From the lemma, the following definition makes sense.
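Definition 1.2.5 Let $\{a_n\}$ be any sequence of points of $[-\infty, \infty]$.
$$\limsup_{n \to \infty} a_n \equiv \lim_{n \to \infty} \sup\{a_k : k \ge n\}$$
$$\liminf_{n \to \infty} a_n \equiv \lim_{n \to \infty} \inf\{a_k : k \ge n\}$$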
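The monotonicity of the tail suprema $U_n$ and tail infima $L_n$ can be observed numerically. The sketch below is only a finite approximation: it works with the first 200 terms of the illustrative sequence $a_n = (-1)^n (1 + 1/n)$, so the suprema and infima are taken over a truncated tail rather than the full one. The helper names `tail_sups` and `tail_infs` are hypothetical.

```python
def tail_sups(a):
    """U_n = max of a[n:], computed in one pass from right to left."""
    out, running = [], float("-inf")
    for x in reversed(a):
        running = max(running, x)
        out.append(running)
    return out[::-1]


def tail_infs(a):
    """L_n = min of a[n:], computed in one pass from right to left."""
    out, running = [], float("inf")
    for x in reversed(a):
        running = min(running, x)
        out.append(running)
    return out[::-1]


# Illustrative sequence; its lim sup is 1 and its lim inf is -1.
a = [(-1) ** n * (1 + 1 / n) for n in range(1, 201)]
U = tail_sups(a)
L = tail_infs(a)

print(U[0], U[100])  # tail suprema decrease toward 1
print(L[0], L[100])  # tail infima increase toward -1
```

The values near the end of the truncated lists should not be trusted, since there the "tail" contains only a few terms.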
$$\limsup_{n \to \infty} a_n$$
and
$$\liminf_{n \to \infty} a_n$$
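Suppose first that $\lim_{n \to \infty} a_n$ exists and is a real number. Then by Theorem 3.5.3
$\{a_n\}$ is a Cauchy sequence. Therefore, if $\varepsilon > 0$ is given, there exists $N$ such that if
$m, n \ge N$, then
$$|a_n - a_m| < \varepsilon/3.$$
From the definition of $\sup\{a_k : k \ge N\}$, there exists $n_1 \ge N$ such that
$$\sup\{a_k : k \ge N\} \le a_{n_1} + \varepsilon/3,$$
and similarly there exists $n_2 \ge N$ such that $\inf\{a_k : k \ge N\} \ge a_{n_2} - \varepsilon/3$.
It follows that
$$\sup\{a_k : k \ge N\} - \inf\{a_k : k \ge N\} \le |a_{n_1} - a_{n_2}| + \frac{2\varepsilon}{3} < \varepsilon.$$
Since the sequence $\{\sup\{a_k : k \ge N\}\}_{N=1}^{\infty}$ is decreasing and $\{\inf\{a_k : k \ge N\}\}_{N=1}^{\infty}$
is increasing, it follows from Theorem 3.2.7 that
$$0 \le \limsup_{n \to \infty} a_n - \liminf_{n \to \infty} a_n \le \varepsilon,$$
and since $\varepsilon$ is arbitrary, the two are equal.
Conversely, suppose $\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n$ is a real number.
Since $\sup\{a_k : k \ge N\} \ge \inf\{a_k : k \ge N\}$ it follows that for every $\varepsilon > 0$, there
exists $N$ such that
$$\sup\{a_k : k \ge N\} - \inf\{a_k : k \ge N\} < \varepsilon$$
Thus if $m, n > N$, then
$$|a_m - a_n| < \varepsilon$$
which means $\{a_n\}$ is a Cauchy sequence. Since $\mathbb{R}$ is complete, it follows that
$\lim_{n \to \infty} a_n \equiv a$ exists. By the squeezing theorem, it follows
$$a = \liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n$$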
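With the above theorem, here is how to define the limit of a sequence of points
in $[-\infty, \infty]$.
Definition 1.2.7 Let $\{a_n\}$ be a sequence of points of $[-\infty, \infty]$. Then $\lim_{n \to \infty} a_n$
exists exactly when
$$\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n$$
and in this case
$$\lim_{n \to \infty} a_n \equiv \liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n.$$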
The significance of lim sup and lim inf, in addition to what was just discussed,
is contained in the following theorem which follows quickly from the definition.
Theorem 1.2.8 Suppose $\{a_n\}$ is a sequence of points of $[-\infty, \infty]$. Let
$$\lambda = \limsup_{n \to \infty} a_n.$$
The proof of this theorem is left as an exercise for you. It follows directly from
the definition and it is the sort of thing you must do yourself. Here is one other
simple proposition.
Proof: This follows from the definition. Let $\lambda_n = \sup\{a_k b_k : k \ge n\}$. For all $n$
large enough, $a_n > a - \varepsilon$ where $\varepsilon$ is small enough that $a - \varepsilon > 0$. Therefore,
$$\lambda_n \ge \sup\{b_k : k \ge n\}(a - \varepsilon)$$
In other words, first sum on $j$ yielding something which depends on $k$ and then sum
these. The major consideration for these double series is the question of when
$$\sum_{k=m}^{\infty} \sum_{j=m}^{\infty} a_{jk} = \sum_{j=m}^{\infty} \sum_{k=m}^{\infty} a_{jk}.$$
In other words, when does it make no difference which subscript is summed over
first? In the case of finite sums there is no issue here. You can always write
$$\sum_{k=m}^{M} \sum_{j=m}^{N} a_{jk} = \sum_{j=m}^{N} \sum_{k=m}^{M} a_{jk}$$
because addition is commutative. However, there are limits involved with infinite
sums and the interchange in order of summation involves taking limits in a different
order. Therefore, it is not always true that it is permissible to interchange the
two sums. A general rule of thumb is this: If something involves changing the
order in which two limits are taken, you may not do it without agonizing over
the question. In general, limits foul up algebra and also introduce things which are
counterintuitive. Here is an example. This example is a little technical. It is placed
here just to prove conclusively there is a question which needs to be considered.
Example 1.3.1 Consider the following picture which depicts some of the ordered
pairs (m, n) where m, n are positive integers.
0 0 0 0 0 c 0 -c
0 0 0 0 c 0 -c 0
0 0 0 c 0 -c 0 0
0 0 c 0 -c 0 0 0
0 c 0 -c 0 0 0 0
b 0 -c 0 0 0 0 0
0 a 0 0 0 0 0 0
The numbers next to the point are the values of $a_{mn}$. You see $a_{nn} = 0$ for all
$n$, $a_{21} = a$, $a_{12} = b$, $a_{mn} = c$ for $(m, n)$ on the line $y = 1 + x$ whenever $m > 1$, and
$a_{mn} = -c$ for all $(m, n)$ on the line $y = x - 1$ whenever $m > 2$.
Then $\sum_{m=1}^{\infty} a_{mn} = a$ if $n = 1$, $\sum_{m=1}^{\infty} a_{mn} = b - c$ if $n = 2$, and if $n > 2$,
$\sum_{m=1}^{\infty} a_{mn} = 0$. Therefore,
$$\sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{mn} = a + b - c.$$
Next observe that $\sum_{n=1}^{\infty} a_{mn} = b$ if $m = 1$, $\sum_{n=1}^{\infty} a_{mn} = a + c$ if $m = 2$, and
$\sum_{n=1}^{\infty} a_{mn} = 0$ if $m > 2$. Therefore,
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{mn} = b + a + c$$
and so the two sums are different. Moreover, you can see that by assigning different
values of $a$, $b$, and $c$, you can get an example for any two different numbers desired.
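The two iterated sums in this example can be computed exactly, because for each fixed index only finitely many terms are nonzero. The following sketch encodes the array $a_{mn}$ described above; the concrete values `a=1, b=2, c=3`, the function names, and the outer cutoff `N` are assumptions chosen for illustration.

```python
def a_mn(m, n, a=1, b=2, c=3):
    """Entries of the double array from the example (m, n are positive integers)."""
    if (m, n) == (2, 1):
        return a
    if (m, n) == (1, 2):
        return b
    if m > 1 and n == m + 1:   # on the line y = 1 + x
        return c
    if m > 2 and n == m - 1:   # on the line y = x - 1
        return -c
    return 0


def sum_over_m(n):
    """Inner sum over m for fixed n; entries vanish for m > n + 1, so this is exact."""
    return sum(a_mn(m, n) for m in range(1, n + 2))


def sum_over_n(m):
    """Inner sum over n for fixed m; entries vanish for n > m + 1, so this is exact."""
    return sum(a_mn(m, n) for n in range(1, m + 2))


N = 50  # outer cutoff; the inner sums are already zero beyond index 2
m_summed_first = sum(sum_over_m(n) for n in range(1, N + 1))  # a + b - c
n_summed_first = sum(sum_over_n(m) for m in range(1, N + 1))  # b + a + c

print(m_summed_first, n_summed_first)  # 0 6
```

With these values the two orders of summation give 0 and 6, matching $a + b - c$ and $b + a + c$.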
It turns out that if $a_{ij} \ge 0$ for all $i, j$, then you can always interchange the order
of summation. This is shown next and is based on the following lemma. First, some
notation should be discussed.
Proof: Note that for all $a, b$, $f(a, b) \le \sup_{b \in B} \sup_{a \in A} f(a, b)$ and therefore, for
all $a$, $\sup_{b \in B} f(a, b) \le \sup_{b \in B} \sup_{a \in A} f(a, b)$. Therefore,
$$\sup_{a \in A} \sup_{b \in B} f(a, b) \le \sup_{b \in B} \sup_{a \in A} f(a, b).$$
Repeat the same argument interchanging $a$ and $b$, to get the conclusion of the
lemma.
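Theorem 1.3.4 Let $a_{ij} \ge 0$. Then
$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}.$$
Proof: First note there is no trouble in defining these sums because the $a_{ij}$ are
all nonnegative. If a sum diverges, it only diverges to $\infty$ and so $\infty$ is the value of
the sum. Next note that
$$\sum_{j=r}^{\infty} \sum_{i=r}^{\infty} a_{ij} \ge \sup_n \sum_{j=r}^{\infty} \sum_{i=r}^{n} a_{ij}$$
$$= \sup_n \lim_{m \to \infty} \sum_{i=r}^{n} \sum_{j=r}^{m} a_{ij} = \sup_n \sum_{i=r}^{n} \lim_{m \to \infty} \sum_{j=r}^{m} a_{ij}$$
$$= \sup_n \sum_{i=r}^{n} \sum_{j=r}^{\infty} a_{ij} = \lim_{n \to \infty} \sum_{i=r}^{n} \sum_{j=r}^{\infty} a_{ij} = \sum_{i=r}^{\infty} \sum_{j=r}^{\infty} a_{ij}$$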
Definition 2.1.2 The elementary matrices consist of those matrices which result
by applying a row operation to an identity matrix. Those which involve switching
rows of the identity are called permutation matrices.
28 ROW REDUCED ECHELON FORM
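Now consider what these elementary matrices look like. First $P^{ij}$, which involves
switching row $i$ and row $j$ of the identity where $i < j$. We write
$$I = \begin{pmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ r_j \\ \vdots \\ r_n \end{pmatrix}$$
where
$$r_j = (0 \ \cdots \ 1 \ \cdots \ 0)$$
with the 1 in the $j$th position from the left. Then $P^{ij}$ is the matrix
$$\begin{pmatrix} r_1 \\ \vdots \\ r_j \\ \vdots \\ r_i \\ \vdots \\ r_n \end{pmatrix}$$
and its effect on a column vector is
$$\begin{pmatrix} r_1 \\ \vdots \\ r_j \\ \vdots \\ r_i \\ \vdots \\ r_n \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_j \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} v_1 \\ \vdots \\ v_j \\ \vdots \\ v_i \\ \vdots \\ v_n \end{pmatrix}$$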
2.1. ELEMENTARY MATRICES 29
Lemma 2.1.3 Let $P^{ij}$ denote the elementary matrix which involves switching the
$i$th and the $j$th rows of $I$. Then if $P^{ij}$, $A$ are conformable, we have
$$P^{ij} A = B$$
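Denote by $E(c, i)$ this elementary matrix which multiplies the $i$th row of the identity
by the nonzero constant $c$. Then from what was just discussed and the way matrices
are multiplied,
$$E(c, i) \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ip} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{pmatrix}$$
equals a matrix having the columns indicated below.
$$= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ \vdots & \vdots & & \vdots \\ c a_{i1} & c a_{i2} & \cdots & c a_{ip} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{pmatrix}$$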
Lemma 2.1.4 Let $E(c, i)$ denote the elementary matrix corresponding to the row
operation in which the $i$th row is multiplied by the nonzero constant $c$. Thus $E(c, i)$
involves multiplying the $i$th row of the identity matrix by $c$. Then
$$E(c, i) A = B$$
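Finally consider the third of these row operations. Letting $r_j$ be the $j$th row of
the identity matrix, denote by $E(c \times i + j)$ the elementary matrix obtained from
the identity matrix by replacing $r_j$ with $r_j + c r_i$. In case $i < j$ this will be of the
form
$$\begin{pmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ c r_i + r_j \\ \vdots \\ r_n \end{pmatrix}$$
Now consider what this does to a column vector.
$$\begin{pmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ c r_i + r_j \\ \vdots \\ r_n \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_j \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ c v_i + v_j \\ \vdots \\ v_n \end{pmatrix}$$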
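The three kinds of elementary matrices and their effect under left multiplication can be tried out directly. This is a minimal sketch in plain Python using nested lists; the function names are hypothetical, and indices are 0-based here, whereas the text counts rows from 1.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def matmul(A, B):
    """Ordinary matrix product of two nested-list matrices."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def switch(n, i, j):
    """Identity with rows i and j switched (a permutation matrix)."""
    P = identity(n)
    P[i], P[j] = P[j], P[i]
    return P


def scale(n, c, i):
    """Identity with row i multiplied by the nonzero constant c."""
    E = identity(n)
    E[i][i] = c
    return E


def add_multiple(n, c, i, j):
    """Identity with row j replaced by (row j) + c * (row i), where i != j."""
    E = identity(n)
    E[j][i] = c
    return E


A = [[1, 2], [3, 4], [5, 6]]
switched = matmul(switch(3, 0, 2), A)            # rows 0 and 2 of A switched
scaled = matmul(scale(3, 10, 1), A)              # row 1 of A multiplied by 10
combined = matmul(add_multiple(3, 2, 0, 2), A)   # row 2 replaced by row 2 + 2 * row 0

print(switched, scaled, combined)
```

In each case multiplying on the left by the elementary matrix performs on `A` the same row operation that produced the elementary matrix from the identity.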
We will discuss methods for finding the inverse later. For now, observe that
Theorem 2.1.6 says that elementary matrices are invertible and that the inverse of
such a matrix is also an elementary matrix. The major conclusion of the above
Lemma and Theorem is the following lemma about linear relationships.
Definition 2.1.9 Let $v_1, \cdots, v_k, u$ be vectors. Then $u$ is said to be a linear
combination of the vectors $\{v_1, \cdots, v_k\}$ if there exist scalars $c_1, \cdots, c_k$ such that
$$u = \sum_{i=1}^{k} c_i v_i.$$
We also say that when the above holds for some scalars $c_1, \cdots, c_k$, there exists a
linear relationship between the vector $u$ and the vectors $\{v_1, \cdots, v_k\}$.
We will discuss this more later, but the following picture illustrates the geometric
significance of the vectors which have a linear relationship with two vectors $u, v$
pointing in different directions.
[Figure: two vectors $u$ and $v$ pointing in different directions, with labels $x$ and $y$.]
The following lemma states that linear relationships between columns in a
matrix are preserved by row operations. This simple lemma is the main result in
understanding all the major questions related to the row reduced echelon form as
well as many other topics.
Lemma 2.1.10 Let $A$ and $B$ be two $m \times n$ matrices and suppose $B$ results from a
row operation applied to $A$. Then the $k$th column of $B$ is a linear combination of the
$i_1, \cdots, i_r$ columns of $B$ if and only if the $k$th column of $A$ is a linear combination of
the $i_1, \cdots, i_r$ columns of $A$. Furthermore, the scalars in the linear combinations are
the same. (The linear relationship between the $k$th column of $A$ and the $i_1, \cdots, i_r$
columns of $A$ is the same as the linear relationship between the $k$th column of $B$
and the $i_1, \cdots, i_r$ columns of $B$.)
Proof. Let $A$ be the following matrix in which the $a_k$ are the columns
$$\begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}$$
and let $B$ be the following matrix in which the columns are given by the $b_k$
$$\begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix}$$
Then by Theorem 2.1.6 on Page 32, $b_k = E a_k$ where $E$ is an elementary matrix.
Suppose then that one of the columns of $A$ is a linear combination of some other
columns of $A$. Say
$$a_k = c_1 a_{i_1} + \cdots + c_r a_{i_r}$$
Then multiplying by $E$,
$$b_k = E a_k = c_1 E a_{i_1} + \cdots + c_r E a_{i_r} = c_1 b_{i_1} + \cdots + c_r b_{i_r}.$$
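A quick numerical illustration of the lemma: apply a row operation to a matrix whose third column is the sum of the first two, and observe that the same relation holds afterwards. The matrix and the helper names are made up for this sketch.

```python
def column(M, k):
    """The k-th column of a nested-list matrix (0-based)."""
    return [row[k] for row in M]


def add_row_multiple(M, c, i, j):
    """Return a copy of M in which row j is replaced by (row j) + c * (row i)."""
    new = [row[:] for row in M]
    new[j] = [c * x + y for x, y in zip(M[i], M[j])]
    return new


def third_is_sum_of_first_two(M):
    return column(M, 2) == [x + y for x, y in zip(column(M, 0), column(M, 1))]


A = [[2, 1, 3],
     [1, 7, 8],
     [1, 3, 4]]                       # column 2 = column 0 + column 1
B = add_row_multiple(A, -2, 1, 0)     # row 0 replaced by row 0 - 2 * row 1

print(B[0])                           # [0, -13, -13]
print(third_is_sum_of_first_two(A), third_is_sum_of_first_two(B))
```

The scalars in the relation (here both equal to 1) are unchanged by the row operation, which is exactly what the proof shows by multiplying the relation by $E$.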
This is really just an extension of the technique for finding solutions to a linear
system of equations. In solving a system of equations earlier, row operations were
used to exhibit the last column of an augmented matrix as a linear combination of
the preceding columns. The row reduced echelon form makes obvious all linear
relationships between all columns, not just the last column and those preceding it.
2.2. THE ROW REDUCED ECHELON FORM OF A MATRIX 35
Thus $e_i$ is the column vector which has all zero entries except for a 1 in the $i$th
position down from the top.
The $n \times n$ matrix
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \quad \text{(the identity matrix)}$$
Definition 2.2.3 Given a matrix $A$, row reduction produces one and only one row
reduced matrix $B$ with $A \sim B$. See Corollary 2.2.9.
Proof. Viewing the columns of $A$ from left to right, take the first nonzero
column. Pick a nonzero entry in this column and switch the row containing this
entry with the top row of $A$. Now divide this new top row by the value of this
nonzero entry to get a 1 in this position and then use row operations to make all
entries below this equal to zero. Thus the first nonzero column is now $e_1$. Denote
the resulting matrix by $A_1$. Consider the sub-matrix of $A_1$ to the right of this
column and below the first row. Do exactly the same thing for this sub-matrix that
was done for $A$. This time the $e_1$ will refer to $\mathbb{F}^{m-1}$. Use the first 1 obtained by the
above process which is in the top row of this sub-matrix and row operations, to zero
out every entry above it in the rows of $A_1$. Call the resulting matrix $A_2$. Thus $A_2$
satisfies the conditions of the above definition up to the column just encountered.
Continue this way till every column has been dealt with and the result must be in
row reduced echelon form.
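The procedure in the proof translates into a short program. The sketch below is one possible rendering, not the text's own algorithm verbatim: it clears the entries above and below each pivot in a single pass rather than in two stages, and it uses exact rational arithmetic from Python's `fractions` module to avoid rounding. The function name `rref` and the test matrix are assumptions for illustration.

```python
from fractions import Fraction


def rref(rows):
    """Return (row reduced echelon form, list of 0-based pivot column indices)."""
    A = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(A), len(A[0])
    pivot_cols = []
    r = 0  # index of the row where the next pivot will be placed
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]   # switch rows
        p = A[r][c]
        A[r] = [x / p for x in A[r]]      # scale to get a leading 1
        for i in range(n_rows):           # zero out the rest of the column
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    return A, pivot_cols


R, pivots = rref([[1, 2, 3],
                  [2, 4, 7],
                  [1, 2, 4]])
print(R)       # rows (1, 2, 0), (0, 0, 1), (0, 0, 0) as Fractions
print(pivots)  # [0, 2]
```

In this example the second column is twice the first, so it is not a pivot column, and the reduced form displays that relationship directly.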
Now here is some terminology which is often used.
Definition 2.2.5 The first pivot column of $A$ is the first nonzero column of $A$
which becomes $e_1$ in the row reduced echelon form. The next pivot column is the
first column after this which becomes $e_2$ in the row reduced echelon form. The third
is the next column which becomes $e_3$ in the row reduced echelon form and so forth.
The algorithm just described for obtaining a row reduced echelon form shows
that these columns are well defined, but we will deal with this issue more carefully
in Corollary 2.2.9 where we show that every matrix corresponds to exactly one row
reduced echelon form.
Example 2.2.6 Determine the pivot columns for the matrix
$$A = \begin{pmatrix} 2 & 1 & 3 & 6 & 2 \\ 1 & 7 & 8 & 4 & 0 \\ 1 & 3 & 4 & 2 & 2 \end{pmatrix} \qquad (2.2.4)$$
$$B = E_1 E_2 \cdots E_m A$$
It follows from Lemma 2.1.8 that $(E_1 E_2 \cdots E_m)^{-1}$ exists and equals the product of
the inverses of these matrices in the reverse order. Thus
$$E_m^{-1} E_{m-1}^{-1} \cdots E_1^{-1} B = (E_1 E_2 \cdots E_m)^{-1} B$$
$$= (E_1 E_2 \cdots E_m)^{-1} (E_1 E_2 \cdots E_m) A = A$$
By Theorem 2.1.6, each $E_k^{-1}$ is an elementary matrix. By Theorem 2.1.6 again, the
above shows that $A$ results from a sequence of row operations applied to $B$. The
last claim is left for an exercise.
There are three choices for row operations at each step in Theorem 2.2.4. A
natural question is whether the same row reduced echelon matrix always results in
the end from following any sequence of row operations.
We have already made use of the following observation in finding a linear
relationship between the columns of the matrix $A$ in 2.2.4, but here it is stated more
formally. Now
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 e_1 + \cdots + x_n e_n,$$
so to say two column vectors are equal is to say the column vectors are the same
linear combination of the special vectors $e_j$.
Corollary 2.2.9 The row reduced echelon form is unique. That is if B, C are two
matrices in row reduced echelon form and both are obtained from A by a sequence
of row operations, then B = C.
Proof: Suppose $B$ and $C$ are both row reduced echelon forms for the matrix
$A$. It follows that $B$ and $C$ have zero columns in the same position because row
operations do not affect zero columns. By Proposition 2.2.8, $B$ and $C$ are row
equivalent. Suppose $e_1, \cdots, e_r$ occur in $B$ for the first time, reading from left to
right, in positions $i_1, \cdots, i_r$ respectively. Then from the description of the row
reduced echelon form, each of these columns of $B$, in positions $i_1, \cdots, i_r$, is not a
linear combination of the preceding columns. Since $C$ is row equivalent to $B$, it
follows from Lemma 2.1.10, that each column of $C$ in positions $i_1, \cdots, i_r$ is not a
Corollary 2.2.10 Suppose $A$ is an $m \times n$ matrix and that $m < n$. That is, the
number of rows is less than the number of columns. Then one of the columns of $A$
is a linear combination of the preceding columns of $A$. Also, there exists $x \in \mathbb{F}^n$
such that $x \ne 0$ and $Ax = 0$.
Proof: Since $m < n$, not all the columns of $A$ can be pivot columns. In reading
from left to right, pick the first one which is not a pivot column. Then from the
description of the row reduced echelon form, this column is a linear combination of
the preceding columns. Say
$$a_j = x_1 a_1 + \cdots + x_{j-1} a_{j-1}.$$
2.3. FINDING THE INVERSE OF A MATRIX 39
$$AB = BA = I$$
where $I$ is the identity matrix discussed earlier. In Theorem 2.1.6, it was shown that an
elementary matrix is invertible and that its inverse is also an elementary matrix.
We also showed in Lemma 2.1.8 that the product of invertible matrices is invertible
and that the inverse of this product is the product of the inverses in the reverse
order. In this section, we consider the problem of finding an inverse for a given
$n \times n$ matrix. Recall that $A$ has an inverse, denoted by $A^{-1}$, if $A A^{-1} = A^{-1} A = I$.
The procedure for finding the inverse is called the Gauss-Jordan procedure.
The procedure just described actually yields a right inverse. This is a matrix
B such that AB = I.
As mentioned earlier, what you have really found in the above algorithm is a
right inverse. Is this right inverse matrix, which we have called the inverse, really
the inverse, the matrix which when multiplied on both sides gives the identity?
Bx = y
would not be unique, contrary to what was just shown must happen if $AB = I$. It
follows that a right inverse $B^{-1}$ for $B$ exists. The above procedure yields
$$\begin{pmatrix} B & I \end{pmatrix} \to \begin{pmatrix} I & B^{-1} \end{pmatrix}$$
Corollary 2.3.3 An $n \times n$ matrix $A$ has an inverse if and only if the row reduced
echelon form of $A$ is $I$.
Proof. First suppose the row reduced echelon form of A is I. Then Procedure
2.3.1 yields a right inverse for A. By Theorem 2.3.2 this is the inverse. Next suppose
A has an inverse. Then there exists a unique solution x to the equation
Ax = y
(A|0)
there are no free variables and so every column to the left of | is a pivot column.
Therefore, the row reduced echelon form of A is I.
This also shows the following major theorem.
$$E_1 \cdots E_{s-1} E_s A = I$$
Proposition 2.4.2 Let $A$ be an $m \times n$ matrix which has rank $r$. Then there exists
a set of $r$ columns of $A$ such that every other column is a linear combination of
these columns. Furthermore, none of these columns is a linear combination of the
other $r - 1$ columns in the set. The rank of $A$ is no larger than the minimum of $m$
and $n$. Also the rank added to the nullity equals $n$.
Proof. Since the rank of $A$ is $r$ it follows that $A$ has exactly $r$ pivot columns.
Thus, in the row reduced echelon form, every column is a linear combination of these
pivot columns and none of the pivot columns is a linear combination of the other
pivot columns. By Lemma 2.1.10 the same is true of the columns in the original
matrix $A$. There are at most $\min(m, n)$ pivot columns (nonzero rows). Therefore,
the rank of $A$ is no larger than $\min(m, n)$ as claimed. Since every column is either
a pivot column or is not a pivot column, this shows that the rank added to nullity
equals $n$.
Sequences
$$(a, b) \equiv \sum_{k=1}^{n} a_k \overline{b_k}.$$
With this definition, there are several important properties satisfied by the inner
product. In the statement of these properties, $\alpha$ and $\beta$ will denote scalars and $a, b, c$
will denote vectors or in other words, points in $\mathbb{F}^n$. The following proposition comes
directly from the definition of the inner product.
Then by 3.1.2, $f(t) \ge 0$ for all $t \in \mathbb{R}$. Also from 3.1.3, 3.1.4, 3.1.1, and 3.1.5
since otherwise the function, f (t) would have two real zeros and would necessarily
have a graph which dips below the t axis. This proves 3.1.6.
It is clear from the axioms of the inner product that equality holds in 3.1.6
whenever one of the vectors is a scalar multiple of the other. It only remains to
verify that this is the only way equality can occur. If either vector equals zero,
then equality is obtained in 3.1.6 so it can be assumed both vectors are non zero.
Then if equality is achieved, it follows f (t) has exactly one real zero because the
discriminant vanishes. Therefore, for some value of t, a + tb = 0 showing that a is
a multiple of b.
You should note that the entire argument was based only on the properties
of the inner product listed in 3.1.1 - 3.1.5. This means that whenever something
satisfies these properties, the Cauchy Schwarz inequality holds. There are many
other instances of these properties besides vectors in $\mathbb{F}^n$.
The Cauchy Schwarz inequality allows a proof of the triangle inequality for
distances in $\mathbb{F}^n$ in much the same way as the triangle inequality for the absolute
value.
Theorem 3.1.5 (Triangle inequality) For $a, b \in \mathbb{F}^n$
$$|a + b| \le |a| + |b| \qquad (3.1.7)$$
and equality holds if and only if one of the vectors is a nonnegative scalar multiple
of the other. Also
$$||a| - |b|| \le |a - b| \qquad (3.1.8)$$
Proof: By properties of the inner product and the Cauchy Schwarz inequality,
$$|a + b|^2 = ((a + b), (a + b))$$
$$= (a, a) + (a, b) + (b, a) + (b, b)$$
$$= |a|^2 + 2 \operatorname{Re}(a, b) + |b|^2$$
$$\le |a|^2 + 2 |(a, b)| + |b|^2$$
$$\le |a|^2 + 2 |a| |b| + |b|^2$$
$$= (|a| + |b|)^2.$$
Taking square roots of both sides you obtain 3.1.7.
It remains to consider when equality occurs. If either vector equals zero, then
that vector equals zero times the other vector and the claim about when equality
occurs is verified. Therefore, it can be assumed both vectors are nonzero. To get
equality in the second inequality above, Theorem 3.1.4 implies one of the vectors
must be a multiple of the other. Say $b = \alpha a$. Also, to get equality in the first
inequality, $(a, b)$ must be a nonnegative real number. Thus
$$0 \le (a, b) = (a, \alpha a) = \overline{\alpha} |a|^2.$$
Therefore, $\alpha$ must be a real number which is nonnegative.
To get the other form of the triangle inequality,
$$a = a - b + b$$
so
$$|a| = |a - b + b| \le |a - b| + |b|.$$
Therefore,
$$|a| - |b| \le |a - b| \qquad (3.1.9)$$
Similarly,
$$|b| - |a| \le |b - a| = |a - b|. \qquad (3.1.10)$$
It follows from 3.1.9 and 3.1.10 that 3.1.8 holds. This is because $||a| - |b||$ equals
the left side of either 3.1.9 or 3.1.10 and either way, $||a| - |b|| \le |a - b|$.
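The Cauchy Schwarz inequality and both forms of the triangle inequality can be spot-checked numerically for vectors with complex entries. This is only a sanity check on made-up vectors, not a proof; the helper names are hypothetical, and the inner product is taken to be conjugate linear in the second argument, in agreement with the computation $(a, \alpha a) = \overline{\alpha} |a|^2$.

```python
import math


def inner(a, b):
    """Inner product: sum of a_k times the complex conjugate of b_k."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def norm(a):
    """|a| = sqrt((a, a)); (a, a) is real and nonnegative."""
    return math.sqrt(inner(a, a).real)


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


# Illustrative vectors in C^3.
a = (1 + 2j, -1j, 3 + 0j)
b = (2 - 1j, 4 + 0j, 1j)

cauchy_schwarz_holds = abs(inner(a, b)) <= norm(a) * norm(b)
triangle_holds = norm(add(a, b)) <= norm(a) + norm(b)
reverse_triangle_holds = abs(norm(a) - norm(b)) <= norm(sub(a, b))

# Equality case: one vector is a nonnegative multiple of the other.
doubled = tuple(2 * x for x in a)
equality_case = math.isclose(norm(add(a, doubled)), norm(a) + norm(doubled))

print(cauchy_schwarz_holds, triangle_holds, reverse_triangle_holds, equality_case)
```

Because of floating point rounding, the equality case is tested with `math.isclose` rather than `==`.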
$$a \cdot b = b \cdot a \qquad (3.1.11)$$
$$((\alpha a + \beta b) \cdot c) = \alpha (a \cdot c) + \beta (b \cdot c) \qquad (3.1.12)$$
$$(c \cdot (\alpha a + \beta b)) = \alpha (c \cdot a) + \beta (c \cdot b) \qquad (3.1.13)$$
Note that it is the same thing as the inner product if the vectors are in $\mathbb{R}^n$. However,
in case the vectors are in $\mathbb{C}^n$, this dot product will not satisfy $a \cdot a \ge 0$. For example,
$(1, 2i) \cdot (1, 2i) = 1 - 4 = -3$. However, $((1, 2i), (1, 2i)) = 1 + 4 = 5$. Usually we are
considering $\mathbb{R}^n$ so it makes no difference.
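The difference between the dot product and the inner product on complex vectors is easy to reproduce with Python's built-in complex numbers. The function names `dot` and `inner` are chosen here for illustration.

```python
def dot(a, b):
    """Dot product: sum of a_k * b_k, with no conjugation."""
    return sum(x * y for x, y in zip(a, b))


def inner(a, b):
    """Inner product: sum of a_k times the complex conjugate of b_k."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


v = (1 + 0j, 2j)       # the vector (1, 2i)

print(dot(v, v))       # (-3+0j): not a nonnegative number
print(inner(v, v))     # (5+0j): the squared length of v

w = (1.0, 2.0)         # for real vectors the two products agree
print(dot(w, w), inner(w, w))
```

Only the inner product gives a nonnegative value for $(v, v)$, which is why it, and not the dot product, is used to define length in $\mathbb{C}^n$.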
Definition 3.2.1 Let $\{a_n\}$ be a sequence and let $n_1 < n_2 < n_3, \cdots$ be any strictly
increasing list of integers such that $n_1$ is at least as large as the first number in the
domain of the function. Then if $b_k \equiv a_{n_k}$, $\{b_k\}$ is called a subsequence of $\{a_n\}$.
Example 3.2.2 Let $a_n = \left(n + 1, \sin\left(\frac{1}{n}\right)\right)$. Then $\{a_n\}_{n=1}^{\infty}$ is a vector valued
sequence.
The definition of a limit of a vector valued sequence is given next. It is just like
the definition given for sequences of scalars. However, here the symbol $|\cdot|$ refers to
the usual norm in $\mathbb{F}^n$. In a general normed vector space, it will be denoted by $\|\cdot\|$.
Definition 3.2.3 A vector valued sequence $\{a_n\}_{n=1}^{\infty}$ converges to $a$ in a normed
vector space $V$, written as
$$\lim_{n \to \infty} a_n = a \quad \text{or} \quad a_n \to a$$
if and only if for every $\varepsilon > 0$ there exists $n_\varepsilon$ such that whenever $n \ge n_\varepsilon$,
$$\|a_n - a\| < \varepsilon .$$
In words the definition says that given any measure of closeness $\varepsilon$, the terms of
the sequence are eventually this close to $a$. Here, the word "eventually" refers to $n$
being sufficiently large.
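To make the role of $n_\varepsilon$ concrete, here is a small sketch under stated assumptions. The sequence $a_n = (1/n, \sin(1/n))$ with limit $(0, 0)$ is a hypothetical example chosen for illustration (it is not the sequence of Example 3.2.2), and the formula for `n_eps` uses the elementary bound $|\sin t| \le |t|$.

```python
import math

def a(n):
    """Hypothetical vector valued sequence a_n = (1/n, sin(1/n))."""
    return (1.0 / n, math.sin(1.0 / n))

def dist(u, v):
    """Euclidean distance in R^2."""
    return math.hypot(u[0] - v[0], u[1] - v[1])

def n_eps(eps):
    """Since |sin t| <= |t|, |a_n - 0| <= sqrt(2)/n, so any n > sqrt(2)/eps works."""
    return math.floor(math.sqrt(2) / eps) + 1

limit = (0.0, 0.0)
eps = 1e-3
N = n_eps(eps)
# Check finitely many terms beyond N; the bound above covers all the rest.
close = all(dist(a(n), limit) < eps for n in range(N, N + 5000))
```

The value returned by `n_eps` is one admissible choice, not the smallest one; the definition only asks that some such index exist.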
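Proof: Suppose $a_1 \ne a$. Then let $0 < \varepsilon < \|a_1 - a\| / 2$ in the definition of
the limit. It follows there exists $n_\varepsilon$ such that if $n \ge n_\varepsilon$, then $\|a_n - a\| < \varepsilon$ and
$\|a_n - a_1\| < \varepsilon$. Therefore, for such $n$,
$$\|a_1 - a\| \le \|a_1 - a_n\| + \|a_n - a\| < \varepsilon + \varepsilon < \|a_1 - a\| ,$$
a contradiction.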
Theorem 3.2.5 Suppose $\{a_n\}$ and $\{b_n\}$ are vector valued sequences and that
Also,
$$\lim_{n \to \infty} (a_n \cdot b_n) = (a \cdot b) \tag{3.2.15}$$
where here
$$x_k = \left( x_k^1, \cdots, x_k^n \right), \quad x = \left( x^1, \cdots, x^n \right) .$$
Now consider the second. Let $\varepsilon > 0$ be given and choose $n_1$ such that if $n \ge n_1$
then
$$|a_n - a| < 1 .$$
For such $n$, it follows from the Cauchy Schwarz inequality and properties of the
inner product that
For $n \ge n_\varepsilon$,
This proves 3.2.15. The claim, 3.2.16 is left for you to do.
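Finally consider the last claim. If 3.2.17 holds, then from the definition of
distance in $\mathbb{F}^n$,
$$\lim_{k \to \infty} |x - x_k| \equiv \lim_{k \to \infty} \sqrt{\sum_{j=1}^{n} \left| x^j - x_k^j \right|^2} = 0 .$$
On the other hand, if $\lim_{k \to \infty} |x - x_k| = 0$, then since $\left| x_k^j - x^j \right| \le |x - x_k|$, it
follows from the squeezing theorem that
$$\lim_{k \to \infty} \left| x_k^j - x^j \right| = 0 .$$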
An important theorem is the one which states that if a sequence converges, so
does every subsequence. You should review Definition 3.2.1 at this point. The proof
is identical to the one involving sequences of numbers.
Theorem 3.2.6 Let $\{x_n\}$ be a vector valued sequence with $\lim_{n \to \infty} x_n = x$ and let
$\{x_{n_k}\}$ be a subsequence. Then $\lim_{k \to \infty} x_{n_k} = x$.
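Proof: Let $\varepsilon > 0$ be given. Then there exists $n_\varepsilon$ such that if $n > n_\varepsilon$, then
$\|x_n - x\| < \varepsilon$. Suppose $k > n_\varepsilon$. Then $n_k \ge k > n_\varepsilon$ and so
$$\|x_{n_k} - x\| < \varepsilon .$$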
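Theorem 3.2.7 Let $\{x_n\}$ be a sequence of real numbers and suppose each $x_n \le l$
$(\ge l)$ and $\lim_{n \to \infty} x_n = x$. Then $x \le l$ $(\ge l)$. More generally, suppose $\{x_n\}$ and $\{y_n\}$
are two sequences such that $\lim_{n \to \infty} x_n = x$ and $\lim_{n \to \infty} y_n = y$. Then if $x_n \le y_n$
for all $n$ sufficiently large, then $x \le y$.
$$l \ge x_n > x - \varepsilon$$
and so
$$l + \varepsilon \ge x .$$
Since $\varepsilon > 0$ is arbitrary, this requires $l \ge x$. The other case is entirely similar or else
you could consider $-l$ and $\{-x_n\}$ and apply the case just considered.
Consider the last claim. There exists $N$ such that if $n \ge N$ then $x_n \le y_n$ and
$$|x - x_n| + |y - y_n| < \varepsilon / 2 .$$
Then for such $n$,
$$x - y \le x_n + \varepsilon / 2 - (y_n - \varepsilon / 2) = x_n - y_n + \varepsilon \le \varepsilon ,$$
and since $\varepsilon > 0$ is arbitrary, $x \le y$.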
Theorem 3.2.8 Let $\{x_n\}$ be a sequence of vectors and suppose each $\|x_n\| \le l$ $(\ge l)$ and
$\lim_{n \to \infty} x_n = x$. Then $\|x\| \le l$ $(\ge l)$. More generally, suppose $\{x_n\}$ and $\{y_n\}$ are two
sequences such that $\lim_{n \to \infty} x_n = x$ and $\lim_{n \to \infty} y_n = y$. Then if $\|x_n\| \le \|y_n\|$ for
all $n$ sufficiently large, then $\|x\| \le \|y\|$.
Proof: It suffices to just prove the second part since the first part is similar.
By the triangle inequality,
$$\big| \|x_n\| - \|x\| \big| \le \|x_n - x\|$$
and for large $n$ this is given to be small. Thus $\{\|x_n\|\}$ converges to $\|x\|$. Similarly
$\{\|y_n\|\}$ converges to $\|y\|$. Now the desired result follows from Theorem 3.2.7.
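$$I_k \supseteq I_{k+1} .$$
Consequently, if $k \le l$,
$$a^k \le a^l \le b^l \le b^k . \tag{3.3.19}$$
Now define
$$c \equiv \sup \left\{ a^l : l = 1, 2, \cdots \right\}$$
By the first inequality in 3.3.18, and 3.3.19
$$a^k \le c = \sup \left\{ a^l : l = k, k + 1, \cdots \right\} \le b^k \tag{3.3.20}$$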
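The nested interval argument can be illustrated on a finite truncation. The sketch below makes several assumptions that are not in the text: the intervals are taken to be $I_k = [1 - 1/k, 1 + 1/k]$, the endpoints are written as pairs `(a, b)`, and the supremum is replaced by the maximum of the first `K` left endpoints.

```python
from fractions import Fraction

def interval(k):
    """Hypothetical nested interval I_k = [1 - 1/k, 1 + 1/k], with exact endpoints."""
    return Fraction(k - 1, k), Fraction(k + 1, k)

K = 200
ivals = [interval(k) for k in range(1, K + 1)]

# I_k contains I_{k+1}: left endpoints increase and right endpoints decrease.
nested = all(
    ivals[i][0] <= ivals[i + 1][0] and ivals[i + 1][1] <= ivals[i][1]
    for i in range(K - 1)
)

# Truncated version of c = sup of the left endpoints.
c_K = max(a for a, _ in ivals)

# The analogue of 3.3.20: the candidate point lies in every interval considered.
in_all = all(a <= c_K <= b for a, b in ivals)
```

For the full sequence the supremum is $1$, which lies in every $I_k$; the truncation only shows the mechanism by which the left endpoints are trapped below every right endpoint.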
Theorem 3.4.2 The intersection of any finite collection of open sets is open. The
union of any collection of open sets is open. The intersection of any collection of
closed sets is closed and the union of any finite collection of closed sets is closed.
Proof: To see that any union of open sets is open, note that every point of the
union is in at least one of the open sets. Therefore, it is an interior point of that
set and hence an interior point of the entire union.
Now let $\{U_1, \cdots, U_m\}$ be some open sets and suppose $p \in \cap_{k=1}^m U_k$. Then there
exists $r_k > 0$ such that $B(p, r_k) \subseteq U_k$. Let $0 < r \le \min(r_1, r_2, \cdots, r_m)$. Then
$B(p, r) \subseteq \cap_{k=1}^m U_k$ and so the finite intersection is open. Note that if the finite
intersection is empty, there is nothing to prove because it is certainly true in this
case that every point in the intersection is an interior point because there aren't
any such points.
Suppose $\{H_1, \cdots, H_m\}$ is a finite set of closed sets. Then $\cup_{k=1}^m H_k$ is closed if
its complement is open. However, from DeMorgan's laws,
$$\left( \cup_{k=1}^m H_k \right)^C = \cap_{k=1}^m H_k^C ,$$
a finite intersection of open sets which is open by what was just shown.
Next let $\mathcal{C}$ be some collection of closed sets. Then
$$\left( \cap \mathcal{C} \right)^C = \cup \left\{ H^C : H \in \mathcal{C} \right\} ,$$
a union of open sets which is therefore open by the first part of the proof. Thus $\cap \mathcal{C}$
is closed.
Next there is the concept of a limit point which gives another way of character-
izing closed sets.
Definition 3.4.3 Let $A$ be any nonempty set and let $x$ be a point. Then $x$ is said
to be a limit point of $A$ if for every $r > 0$, $B(x, r)$ contains a point of $A$ which is
not equal to $x$.
Theorem 3.4.8 Every closed and bounded set in $\mathbb{F}^n$ is sequentially compact.
Conversely, every sequentially compact set in $\mathbb{F}^n$ is closed and bounded.
Proof: Let $H$ be a closed and bounded set in $\mathbb{F}^n$. Then $H \subseteq B(0, r)$ for some $r$.
Therefore, if $x \in H$, $x = (x_1, \cdots, x_n)$, it must be that $\sqrt{\sum_{i=1}^n |x_i|^2} < r$ and so each
$x_i \in [-r, r] + i[-r, r] \equiv R_r$, a sequentially compact set by Corollary 3.3.5. Thus $H$
is a closed subset of $\prod_{i=1}^n R_r$ which is a sequentially compact set by Theorem 3.3.4.
Therefore, by Theorem 3.4.6 it follows $H$ is sequentially compact.
Conversely, suppose $K$ is a sequentially compact set in $\mathbb{F}^n$. If it is not bounded,
then there exists a sequence, $\{k_m\}$ such that $k_m \in K$ but $k_m \notin B(0, m)$ for $m =
1, 2, \cdots$. However, this sequence cannot have any convergent subsequence because if
$k_{m_k} \to k$, then for large enough $m$, $k \in B(0, m) \subseteq D(0, m)$ and $k_{m_k} \in B(0, m)^C$
for all $k$ large enough and this is a contradiction because there can only be finitely
many points of the sequence in $B(0, m)$. If $K$ is not closed, then it is missing a
limit point. Say $k_\infty$ is a limit point of $K$ which is not in $K$. Pick $k_m \in B\left( k_\infty, \frac{1}{m} \right) \cap K$.
Then $\{k_m\}$ converges to $k_\infty$ and so every subsequence also converges to $k_\infty$ by
Theorem 3.2.6. Thus there is no point of $K$ which is a limit of some subsequence
of $\{k_m\}$, a contradiction.
What are some examples of closed and bounded sets in a general normed vector
space and more specifically $\mathbb{F}^n$?
Proposition 3.4.9 Let $D(z, r)$ denote the set of points,
$$\{w \in V : \|w - z\| \le r\}$$
Then $D(z, r)$ is closed and bounded. Also, let $S(z, r)$ denote the set of points
$$\{w \in V : \|w - z\| = r\}$$
Then $S(z, r)$ is closed and bounded. It follows that if $V = \mathbb{F}^n$, then these sets are
sequentially compact.
Proof: First note $D(z, r)$ is bounded because
$$D(z, r) \subseteq B(0, \|z\| + 2r)$$
Here is why. Let $x \in D(z, r)$. Then $\|x - z\| \le r$ and so
$$\|x\| \le \|x - z\| + \|z\| \le r + \|z\| < 2r + \|z\| .$$
It remains to verify it is closed. Suppose then that $y \notin D(z, r)$. This means
$\|y - z\| > r$. Consider the open ball $B(y, \|y - z\| - r)$. If $x \in B(y, \|y - z\| - r)$,
then
$$\|x - y\| < \|y - z\| - r$$
and so by the triangle inequality,
$$\|z - x\| \ge \|z - y\| - \|y - x\| > \|x - y\| + r - \|x - y\| = r$$
Thus the complement of $D(z, r)$ is open and so $D(z, r)$ is closed.
For the second type of set, note $S(z, r)^C = B(z, r) \cup D(z, r)^C$, the union of two
open sets which by Theorem 3.4.2 is open. Therefore, $S(z, r)$ is a closed set which
is clearly bounded because $S(z, r) \subseteq D(z, r)$.
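Definition 3.5.1 $\{a_n\}$ is a Cauchy sequence in a normed vector space $V$ if for all
$\varepsilon > 0$, there exists $n_\varepsilon$ such that whenever $n, m \ge n_\varepsilon$,
$$\|a_n - a_m\| < \varepsilon .$$
Theorem 3.5.2 The set of terms (values) of a Cauchy sequence in a normed vector
space $V$ is bounded.
Proof: Let $\varepsilon = 1$ in the definition of a Cauchy sequence and let $n > n_1$. Then
from the definition,
$$\|a_n - a_{n_1}\| < 1 .$$
It follows that for all $n > n_1$,
$$\|a_n\| < 1 + \|a_{n_1}\| .$$
Since only finitely many terms have $n \le n_1$, the whole set of terms is bounded.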
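Proof: Let $\varepsilon > 0$ be given and suppose $a_n \to a$. Then from the definition of
convergence, there exists $n_\varepsilon$ such that if $n > n_\varepsilon$, it follows that
$$\|a_n - a\| < \frac{\varepsilon}{2}$$
Therefore, if $m, n \ge n_\varepsilon + 1$, it follows that
$$\|a_n - a_m\| \le \|a_n - a\| + \|a - a_m\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$
showing that, since $\varepsilon > 0$ is arbitrary, $\{a_n\}$ is a Cauchy sequence.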
The following theorem is very useful. It is identical to an earlier theorem. All
that is required is to put things in bold face to indicate they are vectors.
Theorem 3.5.4 Suppose $\{a_n\}$ is a Cauchy sequence in any normed vector space
and there exists a subsequence, $\{a_{n_k}\}$ which converges to $a$. Then $\{a_n\}$ also
converges to $a$.
Proof: Let $\varepsilon > 0$ be given. There exists $N$ such that if $m, n > N$, then
Definition 3.5.5 If $V$ is a normed vector space having the property that every
Cauchy sequence converges, then $V$ is called complete. It is also referred to as a
Banach space.
$$|p - q| \le \operatorname{diam}(F_k) .$$
3.7 Exercises
1. For a nonempty set $S$ in a normed vector space $V$, define a function
$$x \to \operatorname{dist}(x, S) \equiv \inf \{ \|x - y\| : y \in S \} .$$
Show
$$|\operatorname{dist}(x, S) - \operatorname{dist}(y, S)| \le \|x - y\| .$$
3. The interior of a set was defined above. Tell why the interior of a set is always
an open set. The interior of a set $A$ is sometimes denoted by $A^0$.
4. Give an example of a set $A$ whose interior is empty but whose closure is all
of $\mathbb{R}^n$.
6. Give an example of a finite dimensional normed vector space where the field
of scalars is the rational numbers which is not complete.
7. Explain why as far as the theorems of this chapter are concerned, $\mathbb{C}^n$ is
essentially the same as $\mathbb{R}^{2n}$.
11. Show that any uncountable set of points in $\mathbb{F}^n$ must have a limit point.
12. Let $V$ be any finite dimensional vector space having a basis $\{v_1, \cdots, v_n\}$. For
$x \in V$, let
$$x = \sum_{k=1}^n x_k v_k$$
so that the scalars, $x_k$ are the components of $x$ with respect to the given basis.
Define for $x, y \in V$
$$(x \cdot y) \equiv \sum_{i=1}^n x_i \overline{y_i}$$
Show this is a dot product for $V$ satisfying all the axioms of a dot product
presented earlier.
13. In the context of Problem 12 let $|x|$ denote the norm of $x$ which is produced
by this inner product and suppose $\|\cdot\|$ is some other norm on $V$. Thus
$$|x| \equiv \left( \sum_i |x_i|^2 \right)^{1/2}$$
where
$$x = \sum_k x_k v_k . \tag{3.7.21}$$
Show there exist positive numbers $\delta < \Delta$ independent of $x$ such that
$$\delta |x| \le \|x\| \le \Delta |x|$$
This is referred to by saying the two norms are equivalent. Hint: The top half
is easy using the Cauchy Schwarz inequality. The bottom half is somewhat
harder. Argue that if it is not so, there exists a sequence $\{x^k\}$ such that
$|x^k| = 1$ but $k^{-1} |x^k| = k^{-1} \ge \|x^k\|$ and then note the vector of components
of $x^k$ is on $S(0, 1)$ which was shown to be sequentially compact. Pass to
a limit in 3.7.21 and use the assumed inequality to get a contradiction to
$\{v_1, \cdots, v_n\}$ being a basis.
14. It was shown above that in $\mathbb{F}^n$, the sequentially compact sets are exactly those
which are closed and bounded. Show that in any finite dimensional normed
vector space $V$, the closed and bounded sets are those which are sequentially
compact.
15. Two norms on a finite dimensional vector space, $\|\cdot\|_1$ and $\|\cdot\|_2$ are said to be
equivalent if there exist positive numbers $\delta < \Delta$ such that
$$\delta \|x\|_1 \le \|x\|_2 \le \Delta \|x\|_1 .$$
Show the statement that two norms are equivalent is an equivalence relation.
Explain using the result of Problem 13 why any two norms on a finite
dimensional vector space are equivalent.
16. A normed vector space $V$ is separable if there is a countable set $\{w_k\}_{k=1}^\infty$
such that whenever $B(x, \delta)$ is an open ball in $V$, there exists some $w_k$ in this
open ball. Show that $\mathbb{F}^n$ is separable. This set of points is called a countable
dense set.
17. Let $V$ be any finite dimensional normed vector space with norm $\|\cdot\|$. Using Problem 13 show
that $V$ is separable.
18. Suppose $V$ is a finite dimensional normed vector space. Show there exists a countable set of open
balls $\mathcal{B} \equiv \{B(x_k, r_k)\}_{k=1}^\infty$ having the remarkable property that any open set $U$
is the union of some subset of $\mathcal{B}$. This collection of balls is called a countable
basis. Hint: Use Problem 17 to get a countable dense set of points,
$\{x_k\}_{k=1}^\infty$ and then consider balls of the form $B\left( x_k, \frac{1}{r} \right)$ where $r \in \mathbb{N}$. Show this
collection of balls is countable and then show it has the remarkable property
mentioned.
19. Suppose $S$ is any nonempty set in $V$, a finite dimensional normed vector space.
Suppose $\mathcal{C}$ is a set of open sets such that $\cup \mathcal{C} \supseteq S$. (Such a collection of sets is
called an open cover.) Show using Problem 18 that there are countably many
sets from $\mathcal{C}$, $\{U_k\}_{k=1}^\infty$ such that $S \subseteq \cup_{k=1}^\infty U_k$. This is called the Lindelöf
property when every open cover can be reduced to a countable sub cover.
20. A set $H$ in a normed vector space is said to be compact if whenever $\mathcal{C}$ is a set
of open sets such that $\cup \mathcal{C} \supseteq H$, there are finitely many sets of $\mathcal{C}$, $\{U_1, \cdots, U_p\}$
such that
$$H \subseteq \cup_{i=1}^p U_i .$$
Show using Problem 19 that if a set in a normed vector space is sequentially
compact, then it must be compact. Next show using Problem 14 that a set
in a normed vector space is compact if and only if it is closed and bounded.
Explain why the sets which are compact, closed and bounded, and sequentially
compact are the same sets in any finite dimensional normed vector space.
Continuous Functions
Continuous functions are defined as they are for a function of one variable.
There is a theorem which makes it easier to verify certain functions are continuous
without having to always go to the above definition. The statement of this
theorem is purposely just a little vague. Some of these things tend to hold in almost
any context, certainly for any normed vector space.
2. If $f$ and $g$ have values in $\mathbb{F}^n$ and they are each continuous at $x$, then $f \cdot g$
is continuous at $x$. If $g$ has values in $\mathbb{F}$ and $g(x) \ne 0$ with $g$ continuous, then
$f / g$ is continuous at $x$.
Proof: First consider 1.) Let $\varepsilon > 0$ be given. By assumption, there exists $\delta_1 > 0$
such that whenever $|x - y| < \delta_1$, it follows $|f(x) - f(y)| < \frac{\varepsilon}{2(|a| + |b| + 1)}$ and there
exists $\delta_2 > 0$ such that whenever $|x - y| < \delta_2$, it follows that $|g(x) - g(y)| <
\frac{\varepsilon}{2(|a| + |b| + 1)}$. Then let $0 < \delta \le \min(\delta_1, \delta_2)$. If $|x - y| < \delta$, then everything happens
Now consider 2.) There exists $\delta_1 > 0$ such that if $|y - x| < \delta_1$, then $|f(x) - f(y)| <
1$. Therefore, for such $y$,
$$|f(y)| < 1 + |f(x)| .$$
It follows that for such $y$,
$$|f \cdot g(x) - f \cdot g(y)| \le |f(x) \cdot g(x) - g(x) \cdot f(y)| + |g(x) \cdot f(y) - f(y) \cdot g(y)|$$
Now let $\varepsilon > 0$ be given. There exists $\delta_2$ such that if $|x - y| < \delta_2$, then
$$|g(x) - g(y)| < \frac{\varepsilon}{2 (2 + |g(x)| + |f(x)|)} ,$$
Now let $0 < \delta \le \min(\delta_1, \delta_2, \delta_3)$. Then if $|x - y| < \delta$, all the above hold at once
and so
$$|f \cdot g(x) - f \cdot g(y)|$$
This proves the first part of 2.) To obtain the second part, let $\delta_1$ be as described
above and let $\delta_0 > 0$ be such that for $|x - y| < \delta_0$,
$$|g(x) - g(y)| < |g(x)| / 2$$
which implies $|g(y)| \ge |g(x)| / 2$, and $|g(y)| < 3 |g(x)| / 2$.
Then if $|x - y| < \min(\delta_0, \delta_1)$,
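$$\begin{aligned}
\left| \frac{f(x)}{g(x)} - \frac{f(y)}{g(y)} \right| &= \left| \frac{f(x) g(y) - f(y) g(x)}{g(x) g(y)} \right| \\
&\le \frac{|f(x) g(y) - f(y) g(x)|}{\left( \frac{|g(x)|^2}{2} \right)} \\
&= \frac{2}{|g(x)|^2} \left[ |f(x) g(y) - f(y) g(y) + f(y) g(y) - f(y) g(x)| \right] \\
&\le \frac{2}{|g(x)|^2} \left[ |g(y)| |f(x) - f(y)| + |f(y)| |g(y) - g(x)| \right] \\
&\le \frac{2}{|g(x)|^2} \left[ \frac{3}{2} |g(x)| |f(x) - f(y)| + (1 + |f(x)|) |g(y) - g(x)| \right] \\
&\le \frac{2}{|g(x)|^2} (1 + 2 |f(x)| + 2 |g(x)|) \left[ |f(x) - f(y)| + |g(y) - g(x)| \right] \\
&\equiv M \left[ |f(x) - f(y)| + |g(y) - g(x)| \right]
\end{aligned}$$
where $M$ is defined by
$$M \equiv \frac{2}{|g(x)|^2} (1 + 2 |f(x)| + 2 |g(x)|)$$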
$$x \in f^{-1}(B(f(x), \varepsilon)) ,$$
Proof: Suppose first that $f$ is continuous at $x$ and let $x_n \to x$. Let $\varepsilon > 0$ be given.
By continuity, there exists $\delta > 0$ such that if $\|y - x\| < \delta$, then $\|f(x) - f(y)\| < \varepsilon$.
However, there exists $n_\delta$ such that if $n \ge n_\delta$, then $\|x_n - x\| < \delta$ and so for all $n$
this large,
$$\|f(x) - f(x_n)\| < \varepsilon$$
which shows $f(x_n) \to f(x)$.
Now suppose the condition about taking convergent sequences to convergent
sequences holds at $x$. Suppose $f$ fails to be continuous at $x$. Then there exists $\varepsilon > 0$
and $x_n \in D(f)$ such that $\|x - x_n\| < \frac{1}{n}$, yet
$$\|f(x) - f(x_n)\| \ge \varepsilon .$$
$$\|f(x)\| \le l \ (\ge l) .$$
Proof: Since $\|f(x_n)\| \le l$ and $f$ is continuous at $x$, it follows from the triangle
inequality, Theorem 3.2.8 and Theorem 4.1.1,
Proof: Suppose $f(k_n) \to f(k)$. Does it follow $k_n \to k$? If this does not happen,
then there exists $\varepsilon > 0$ and a subsequence still denoted as $\{k_n\}$ such that
$$|k_n - k| \ge \varepsilon \tag{4.1.1}$$
Now since $K$ is compact, there exists a further subsequence, still denoted as $\{k_n\}$
such that
$$k_n \to k' \in K$$
However, the continuity of $f$ requires
$$f(k_n) \to f(k')$$
$$\lim_{l \to \infty} x_{k_l} = x \in K .$$
Then by continuity of $f$,
which shows $f$ achieves its maximum on $K$. To see it achieves its minimum, you
could repeat the argument with a minimizing sequence or else you could consider
$-f$ and apply what was just shown to $-f$, $-f$ having its minimum when $f$ has its
maximum.
Proof: First of all, denote by $\mathcal{C}$ the set of closed sets which contain $A$. Then
$$\overline{A} = \cap \mathcal{C}$$
and its complement is $\cup \left\{ H^C : H \in \mathcal{C} \right\}$.
Each $H^C$ is open and so the union of all these open sets must also be open. This
is because if $x$ is in this union, then it is in at least one of them. Hence it is an
interior point of that one. But this implies it is an interior point of the union of
them all which is an even larger set. Thus $\overline{A}$ is closed.
The interesting part is the next claim. First note that from the definition, $A \subseteq \overline{A}$
so if $x \in A$, then $x \in \overline{A}$. Now consider $y \in A'$ but $y \notin A$. If $y \notin \overline{A}$, a closed set, then
there exists $B(y, r) \subseteq \overline{A}^C$. Thus $y$ cannot be a limit point of $A$, a contradiction.
Therefore,
$$A \cup A' \subseteq \overline{A}$$
Next suppose $x \in \overline{A}$ and suppose $x \notin A$. Then if $B(x, r)$ contains no points of
$A$ different than $x$, since $x$ itself is not in $A$, it would follow that $B(x, r) \cap A = \emptyset$
and so recalling that open balls are open, $B(x, r)^C$ is a closed set containing $A$ so
from the definition, it also contains $\overline{A}$ which is contrary to the assertion that $x \in \overline{A}$.
Hence if $x \notin A$, then $x \in A'$ and so
$$A \cup A' \supseteq \overline{A}$$
Now that the closure of a set has been defined it is possible to define what is
meant by a set being separated.
Definition 4.3.3 A set $S$ in a normed vector space is separated if there exist sets
$A, B$ such that
$$S = A \cup B, \quad A, B \ne \emptyset, \quad \text{and} \quad \overline{A} \cap B = \overline{B} \cap A = \emptyset .$$
In this case, the sets $A$ and $B$ are said to separate $S$. A set is connected if it is not
separated. Remember $\overline{A}$ denotes the closure of the set $A$.
Note that the concept of connected sets is defined in terms of what it is not. This
makes it somewhat difficult to understand. One of the most important theorems
about connected sets is the following.
Theorem 4.3.4 Suppose $U$ and $V$ are connected sets having nonempty intersection.
Then $U \cup V$ is also connected.
It follows one of these sets must be empty since otherwise, U would be separated.
It follows that U is contained in either A or B. Similarly, V must be contained in
either A or B. Since U and V have nonempty intersection, it follows that both V
and U are contained in one of the sets A, B. Therefore, the other must be empty
and this shows $U \cup V$ cannot be separated and is therefore, connected.
The intersection of connected sets is not necessarily connected as is shown by
the following picture.
$$S \equiv \{t \in [x, y] : [x, t] \subseteq A\}$$
$$(l, l + \delta) \cap B = \emptyset$$
Corollary 4.3.9 Let $E$ be a connected set in a normed vector space and suppose
$f : E \to \mathbb{R}$ is continuous and that $y \in (f(e_1), f(e_2))$ where $e_i \in E$. Then there exists $e \in E$ such
that $f(e) = y$.
Theorem 4.3.10 Let $U$ be an open set in $\mathbb{R}$. Then there exist countably many
disjoint open sets $\{(a_i, b_i)\}_{i=1}^\infty$ such that $U = \cup_{i=1}^\infty (a_i, b_i)$.
This shows $C_p$ is open. By Theorem 4.3.8, this shows $C_p$ is an open interval, $(a, b)$
where $a, b \in [-\infty, \infty]$. There are therefore at most countably many of these
connected components because each must contain a rational number and the rational
numbers are countable. Denote by $\{(a_i, b_i)\}_{i=1}^\infty$ the set of these connected
components.
You can verify that this set of points in the normed vector space $\mathbb{R}^2$ is not arcwise
connected but is connected.
Proof: This is easy from the convexity of the set. If $x, y \in B(z, r)$, then let
$\gamma(t) = x + t(y - x)$ for $t \in [0, 1]$.
Thus you consider for each $x \in D$ the sequence of numbers $\{f_n(x)\}$ and if this
sequence converges for each $x \in D$, the thing it converges to is called $f(x)$.
for all $x \in D$.
Proof: Let $\varepsilon > 0$ be given and pick $z \in D$. By uniform convergence, there exists
$N$ such that if $n > N$, then for all $x \in D$,
$$\begin{aligned}
\|f(x) - f(z)\| &\le \|f(x) - f_n(x)\| + \|f_n(x) - f_n(z)\| + \|f_n(z) - f(z)\| \\
&< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon
\end{aligned}$$
$$\begin{aligned}
\|f(y) - f(z)\| &\le \|f(y) - f_n(y)\| + \|f_n(y) - f_n(z)\| + \|f_n(z) - f(z)\| \\
&< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon
\end{aligned}$$
for all $x \in D$. Then for any $x \in D$, pick $n \ge N$ and it follows from Theorem 3.2.8
Proof: If the sequence converges pointwise, then by Theorem 3.5.3 the sequence
$\{f_n(x)\}$ is a Cauchy sequence for each $x \in D$. Conversely, if $\{f_n(x)\}$ is a
Cauchy sequence for each $x \in D$, then $\{f_n(x)\}$ converges for each $x \in D$ because
of completeness.
Now suppose $\{f_n\}$ is uniformly Cauchy. Then from Theorem 4.5.6 there exists
$f$ such that $\{f_n\}$ converges uniformly on $D$ to $f$. Conversely, if $\{f_n\}$ converges
uniformly to $f$ on $D$, then if $\varepsilon > 0$ is given, there exists $N$ such that if $n \ge N$,
and its value at $x$ is given by the limit of the sequence of partial sums in 4.5.4. If
for all $x \in D$, the limit in 4.5.4 exists, then 4.5.5 is said to converge pointwise.
$\sum_{k=1}^\infty f_k$ is said to converge uniformly on $D$ if the sequence of partial sums,
$$\left\{ \sum_{k=1}^n f_k \right\}$$
converges uniformly. If the indices for the functions start at some other value than
1, you make the obvious modification to the above definition.
Proof: The first part follows from Theorem 4.5.8. The second part follows
from observing the condition is equivalent to the sequence of partial sums forming a
uniformly Cauchy sequence and then by Theorem 4.5.6, these partial sums converge
uniformly to a function which is the definition of $\sum_{k=1}^\infty f_k$.
Is there an easy way to recognize when 4.5.6 happens? Yes, there is. It is called
the Weierstrass M test.
Proof: Let $z \in D$. Then letting $m < n$ and using the triangle inequality
$$\left\| \sum_{k=1}^n f_k(z) - \sum_{k=1}^m f_k(z) \right\| \le \sum_{k=m+1}^n \|f_k(z)\| \le \sum_{k=m+1}^\infty M_k < \varepsilon$$
whenever $m$ is large enough because of the assumption that $\sum_{n=1}^\infty M_n$ converges.
Therefore, the sequence of partial sums is uniformly Cauchy on $D$ and therefore,
converges uniformly to $\sum_{k=1}^\infty f_k$ on $D$.
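The estimate in this proof can be observed numerically. The sketch below rests on assumptions that are not in the text: the functions $f_k(x) = \sin(kx)/k^2$ with majorants $M_k = 1/k^2$ are a hypothetical example, and the supremum over $D$ is approximated by a maximum over a finite grid.

```python
import math

def f(k, x):
    """Hypothetical term f_k(x) = sin(kx) / k^2, so |f_k(x)| <= M_k = 1/k^2."""
    return math.sin(k * x) / k ** 2

def M(k):
    return 1.0 / k ** 2

def partial_sum(n, x):
    return sum(f(k, x) for k in range(1, n + 1))

grid = [i * 0.01 for i in range(-314, 315)]  # finite sample of D = [-3.14, 3.14]
m, n = 50, 400

# Right side of the estimate, truncated at n: sum of M_k for m < k <= n.
tail_bound = sum(M(k) for k in range(m + 1, n + 1))

# Left side: the largest gap between the two partial sums over the grid.
worst_gap = max(abs(partial_sum(n, x) - partial_sum(m, x)) for x in grid)
```

The bound `tail_bound` does not depend on the point of the grid, which is exactly why the partial sums are uniformly Cauchy.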
Theorem 4.5.12 If $\{f_n\}$ is a sequence of continuous functions defined on $D$ and
$\sum_{k=1}^\infty f_k$ converges uniformly, then the function, $\sum_{k=1}^\infty f_k$ must also be continuous.
Proof: This follows from Theorem 4.5.4 applied to the sequence of partial sums
of the above series which is assumed to converge uniformly to the function, $\sum_{k=1}^\infty f_k$.
4.6 Polynomials
General considerations about what a function is have already been considered earlier.
For functions of one variable, the special kind of functions known as a polynomial
has a corresponding version when one considers a function of many variables.
This is found in the next definition.
$$\alpha = (\alpha_1, \cdots, \alpha_n)$$
Then $x^\alpha$ means
$$x^\alpha \equiv x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$$
where the $d_\alpha$ are complex or real numbers. Rational functions are defined as the
quotient of two polynomials. Thus these functions are defined on $\mathbb{F}^n$.
is a rational function.
Note that in the case of a rational function, the domain of the function might
not be all of $\mathbb{F}^n$. For example, if
the domain of $f$ would be all complex numbers such that $x_2^2 + 3 x_1^2 \ne 4$.
By Theorem 4.0.2 all polynomials are continuous. To see this, note that the
function,
$$\pi_k(x) \equiv x_k$$
is a continuous function because of the inequality
Polynomials are simple sums of scalars times products of these functions. Similarly,
by this theorem, rational functions, quotients of polynomials, are continuous at
points where the denominator is non zero. More generally, if $V$ is a normed vector
space, consider a $V$ valued function of the form
$$f(x) \equiv \sum_{|\alpha| \le m} d_\alpha x^\alpha$$
and so, taking the derivative on both sides with respect to $t$ and then letting $t = 0$,
$$mx = \sum_{k=0}^m \binom{m}{k} k x^k (1 - x)^{m-k}$$
Next take the second derivative of both sides with respect to $t$ and then let $t = 0$.
Thus after doing the computations,
$$x^2 m^2 - x^2 m + mx = \sum_{k=0}^m \binom{m}{k} k^2 x^k (1 - x)^{m-k}$$
and this achieves its maximum value when $x = 1/2$. Plugging this in, the above is
no larger than $\frac{1}{4} m$.
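The two displayed identities, and the bound by $\frac{1}{4} m$, can be checked in exact arithmetic. This is a sketch under stated assumptions: the helper `moment` and the sample values $m = 12$, $x = 3/10$ are illustrative, and the quantity called `variance` is taken to be $\sum_k \binom{m}{k} (k - mx)^2 x^k (1 - x)^{m-k}$, which is what the two identities combine to give.

```python
from fractions import Fraction
from math import comb

def moment(m, x, power):
    """Sum over k of C(m, k) * k^power * x^k * (1 - x)^(m - k)."""
    return sum(
        comb(m, k) * k ** power * x ** k * (1 - x) ** (m - k)
        for k in range(m + 1)
    )

m, x = 12, Fraction(3, 10)  # sample values, exact rationals
first = moment(m, x, 1)     # should equal m x
second = moment(m, x, 2)    # should equal x^2 m^2 - x^2 m + m x

# Combining the two identities: sum of C(m,k) (k - m x)^2 x^k (1-x)^(m-k).
variance = sum(
    comb(m, k) * (k - m * x) ** 2 * x ** k * (1 - x) ** (m - k)
    for k in range(m + 1)
)
```

Since $x(1 - x)$ is largest at $x = 1/2$, the value of `variance` never exceeds $m/4$, whatever $x \in [0, 1]$ is used.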
Now let $f$ be a continuous function defined on $[0, 1]$. Let $p_n$ be the polynomial
defined by
$$p_n(x) \equiv \sum_{k=0}^n \binom{n}{k} f\left( \frac{k}{n} \right) x^k (1 - x)^{n-k} . \tag{4.7.7}$$
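Formula 4.7.7 translates directly into code. The following is a minimal sketch, assuming the test function $f(x) = |x - 1/2|$ (a hypothetical choice, continuous but not differentiable) and approximating the supremum of the error by a maximum over a uniform grid.

```python
from math import comb

def bernstein(f, n):
    """Return the polynomial p_n of 4.7.7 as a Python function."""
    def p(x):
        return sum(
            comb(n, k) * f(k / n) * x ** k * (1 - x) ** (n - k)
            for k in range(n + 1)
        )
    return p

def sup_error(f, p, points=201):
    """Largest |f - p| over a uniform grid in [0, 1] (a proxy for the sup norm)."""
    grid = [i / (points - 1) for i in range(points)]
    return max(abs(f(x) - p(x)) for x in grid)

def f(x):
    return abs(x - 0.5)  # hypothetical test function with a corner at 1/2

errors = [sup_error(f, bernstein(f, n)) for n in (10, 40, 160)]
```

The errors shrink as $n$ grows, but slowly; the construction is valued for giving uniform convergence for every continuous $f$, not for its speed.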
Now for $f$ a continuous function defined on $[0, 1]^n$ and for $x = (x_1, \cdots, x_n)$, consider
the polynomial,
$$p_m(x) \equiv \sum_{k_1=0}^m \cdots \sum_{k_n=0}^m \binom{m}{k_1} \binom{m}{k_2} \cdots \binom{m}{k_n} x_1^{k_1} (1 - x_1)^{m-k_1} x_2^{k_2} (1 - x_2)^{m-k_2} \cdots x_n^{k_n} (1 - x_n)^{m-k_n} f\left( \frac{k_1}{m}, \cdots, \frac{k_n}{m} \right) . \tag{4.7.8}$$
Also define if $I$ is a set in $\mathbb{R}^n$
This is the $n$ dimensional version of the Bernstein polynomials which is what results
in the case where $n = 1$.
Lemma 4.7.2 For $x \in [0, 1]^n$, $f$ a continuous $\mathbb{F}$ valued function defined on $[0, 1]^n$,
and $p_m$ given in 4.7.8, $p_m$ converges uniformly to $f$ on $[0, 1]^n$ as $m \to \infty$.
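and so for $x \in [0, 1]^n$,
$$\begin{aligned}
|p_m(x) - f(x)| &\le \sum_{\|k\|_\infty \le m} \binom{m}{k} x^k (1 - x)^{m-k} \left| f\left( \frac{k}{m} \right) - f(x) \right| \\
&\le \sum_{k \in G} \binom{m}{k} x^k (1 - x)^{m-k} \left| f\left( \frac{k}{m} \right) - f(x) \right| \\
&\quad + \sum_{k \in G^C} \binom{m}{k} x^k (1 - x)^{m-k} \left| f\left( \frac{k}{m} \right) - f(x) \right|
\end{aligned} \tag{4.7.9}$$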
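and so $\left| f\left( \frac{k}{m} \right) - f(x) \right| < \varepsilon$ because the above implies $\left| \frac{k}{m} - x \right| < \delta$. Therefore, the
first sum on the right in 4.7.9 is no larger than
$$\sum_{k \in G} \binom{m}{k} x^k (1 - x)^{m-k} \varepsilon \le \sum_{\|k\|_\infty \le m} \binom{m}{k} x^k (1 - x)^{m-k} \varepsilon = \varepsilon .$$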
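Letting $M \ge \max \{ |f(x)| : x \in [0, 1]^n \}$ it follows
because on $G^C$,
$$\frac{(k_j - m x_j)^2}{\eta^2 m^2} < 1, \quad j = 1, \cdots, n .$$
Now by Lemma 4.7.1,
$$|p_m(x) - f(x)| \le \varepsilon + 2M \left( \frac{1}{\eta^2 m^2} \right)^n \left( \frac{m}{4} \right)^n .$$
Therefore, since the right side does not depend on $x$, it follows that for all $m$
sufficiently large,
$$\|p_m - f\|_{[0,1]^n} \le 2\varepsilon$$
and since $\varepsilon$ is arbitrary, this shows $\lim_{m \to \infty} \|p_m - f\|_{[0,1]^n} = 0$.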
Proof: Let $g_k : [0, 1] \to [a_k, b_k]$ be linear, one to one, and onto and let
Lemma 4.7.5 Let $H, K$ be two nonempty disjoint closed subsets of $\mathbb{R}^n$. Then there
exists a continuous function, $g : \mathbb{R}^n \to [-1, 1]$ such that $g(H) = -1/3$, $g(K) =
1/3$, $g(\mathbb{R}^n) \subseteq [-1/3, 1/3]$.
Proof: Let
$$f(x) \equiv \frac{\operatorname{dist}(x, H)}{\operatorname{dist}(x, H) + \operatorname{dist}(x, K)} .$$
The denominator is never equal to zero because if $\operatorname{dist}(x, H) = 0$, then $x \in H$
because $H$ is closed. (To see this, pick $h_k \in B(x, 1/k) \cap H$. Then $h_k \to x$ and since
$H$ is closed, $x \in H$.) Similarly, if $\operatorname{dist}(x, K) = 0$, then $x \in K$ and so the denominator
is never zero as claimed. Hence $f$ is continuous and from its definition, $f = 0$ on $H$
and $f = 1$ on $K$. Now let $g(x) \equiv \frac{2}{3} \left( f(x) - \frac{1}{2} \right)$. Then $g$ has the desired properties.
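The construction of $g$ from the two distance functions is easy to try out. The sketch below assumes finite sets $H$ and $K$ in $\mathbb{R}^2$ (hypothetical examples; finite sets are closed, and these two are disjoint), so that each distance is a minimum over finitely many points.

```python
import math

def dist_to_set(x, S):
    """dist(x, S) for a finite set S: the smallest distance from x to a point of S."""
    return min(math.dist(x, s) for s in S)

def make_g(H, K):
    """Build g = (2/3)(f - 1/2) with f = dist(., H) / (dist(., H) + dist(., K))."""
    def g(x):
        d_h = dist_to_set(x, H)
        d_k = dist_to_set(x, K)
        f = d_h / (d_h + d_k)  # denominator is nonzero because H and K are disjoint
        return (2.0 / 3.0) * (f - 0.5)
    return g

H = [(0.0, 0.0), (0.0, 1.0)]  # hypothetical closed set
K = [(2.0, 0.0), (3.0, 1.0)]  # hypothetical closed set, disjoint from H
g = make_g(H, K)
```

Because $0 \le f \le 1$ everywhere, every value of `g` lies in $[-1/3, 1/3]$, with the endpoints attained exactly on $H$ and on $K$.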
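Definition 4.7.6 For $f$ a real or complex valued bounded continuous function defined
on $M \subseteq \mathbb{R}^n$,
$$\|f\|_M \equiv \sup \{ |f(x)| : x \in M \} .$$
Proof: Using Lemma 4.7.7, let $g_1$ be such that $g_1(\mathbb{R}^n) \subseteq [-1/3, 1/3]$ and
$$\|f - g_1\|_M \le \frac{2}{3} .$$
Suppose $g_1, \cdots, g_m$ have been chosen such that $g_j(\mathbb{R}^n) \subseteq [-1/3, 1/3]$ and
$$\left\| f - \sum_{i=1}^m \left( \frac{2}{3} \right)^{i-1} g_i \right\|_M \le \left( \frac{2}{3} \right)^m . \tag{4.7.12}$$
and so $\left( \frac{3}{2} \right)^m \left( f - \sum_{i=1}^m \left( \frac{2}{3} \right)^{i-1} g_i \right)$ can play the role of $f$ in the first step of the
proof. Therefore, there exists $g_{m+1}$ defined and continuous on all of $\mathbb{R}^n$ such that
its values are in $[-1/3, 1/3]$ and
$$\left\| \left( \frac{3}{2} \right)^m \left( f - \sum_{i=1}^m \left( \frac{2}{3} \right)^{i-1} g_i \right) - g_{m+1} \right\|_M \le \frac{2}{3} .$$
Hence
$$\left\| \left( f - \sum_{i=1}^m \left( \frac{2}{3} \right)^{i-1} g_i \right) - \left( \frac{2}{3} \right)^m g_{m+1} \right\|_M \le \left( \frac{2}{3} \right)^{m+1} .$$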
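It follows there exists a sequence, $\{g_i\}$ such that each has its values in $[-1/3, 1/3]$
and for every $m$ 4.7.12 holds. Then let
$$g(x) \equiv \sum_{i=1}^\infty \left( \frac{2}{3} \right)^{i-1} g_i(x) .$$
It follows
$$|g(x)| \le \left| \sum_{i=1}^\infty \left( \frac{2}{3} \right)^{i-1} g_i(x) \right| \le \sum_{i=1}^\infty \left( \frac{2}{3} \right)^{i-1} \frac{1}{3} \le 1$$
and
$$\left| \left( \frac{2}{3} \right)^{i-1} g_i(x) \right| \le \left( \frac{2}{3} \right)^{i-1} \frac{1}{3}$$
so the Weierstrass M test applies and shows convergence is uniform. Therefore $g$
must be continuous. The estimate 4.7.12 implies $f = g$ on $M$.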
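The majorant series used for the M test here is geometric, and its partial sums can be computed exactly. This is only a sketch; the function name `majorant_partial_sum` is illustrative.

```python
from fractions import Fraction

def majorant_partial_sum(m):
    """Sum over i = 1..m of (2/3)^(i-1) * (1/3), in exact rational arithmetic."""
    return sum(Fraction(2, 3) ** (i - 1) * Fraction(1, 3) for i in range(1, m + 1))

partial = [majorant_partial_sum(m) for m in range(1, 30)]
```

Each partial sum equals $1 - (2/3)^m$, so the series sums to $1$ and the gap to the limit is $(2/3)^m$, the same quantity that appears on the right of 4.7.12.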
The following is the Tietze extension theorem.
Then by the Weierstrass approximation theorem, Theorem 4.7.3 there exists a se-
quence of polynomials {pm } converging uniformly to g on R. Therefore, this se-
quence of polynomials converges uniformly to g = f on K as well.
By considering the real and imaginary parts of a function which has values in C
one can generalize the above theorem.
Definition 4.8.1 Let $V, W$ be two finite dimensional normed vector spaces having
norms $\|\cdot\|_V$ and $\|\cdot\|_W$ respectively. Let $L \in \mathcal{L}(V, W)$. Then the operator norm of
$L$, denoted by $\|L\|$ is defined as
$$\|L\| \equiv \sup \{ \|Lx\|_W : \|x\|_V \le 1 \} .$$
Then the following theorem discusses the main properties of this norm. In the
future, I will dispense with the subscript on the symbols for the norm because it is
clear from the context which norm is meant. Here is a useful lemma.
Lemma 4.8.2 Let $V$ be a normed vector space having a basis $\{v_1, \cdots, v_n\}$. Let
$$A = \left\{ a \in \mathbb{F}^n : \left\| \sum_{k=1}^n a_k v_k \right\| \le 1 \right\}$$
Next consider the claim that $A$ is bounded. Suppose this is not so. Then there
exists a sequence $\{a_k\}$ of points of $A$,
$$a_k = \left( a_k^1, \cdots, a_k^n \right) ,$$
Let
$$b_k = \left( \frac{a_k^1}{|a_k|}, \cdots, \frac{a_k^n}{|a_k|} \right)$$
Then $|b_k| = 1$ so $b_k$ is contained in the closed and bounded set $S(0, 1)$ which is
sequentially compact in $\mathbb{F}^n$. It follows there exists a subsequence, still denoted by
$\{b_k\}$ such that it converges to $b \in S(0, 1)$. Passing to the limit in 4.8.13 using the
following inequality,
$$\left\| \sum_{j=1}^n \frac{a_k^j}{|a_k|} v_j - \sum_{j=1}^n b^j v_j \right\| \le \sum_{j=1}^n \left| \frac{a_k^j}{|a_k|} - b^j \right| \|v_j\|$$
to see that the sum converges to $\sum_{j=1}^n b^j v_j$, it follows
$$\sum_{j=1}^n b^j v_j = 0$$
and this is a contradiction because $\{v_1, \cdots, v_n\}$ is a basis and not all the $b^j$ can
equal zero. Therefore, $A$ must be bounded after all.
1. $\|L\| < \infty$
2. For all $x \in V$, $\|Lx\| \le \|L\| \|x\|$ and if $L \in \mathcal{L}(V, W)$ while $M \in \mathcal{L}(W, Z)$,
then $\|ML\| \le \|M\| \|L\|$.
3. $\|\cdot\|$ is a norm. In particular,
Then if $w \in V$,
and so if $\|v - w\|$ is sufficiently small, $\|v - w\| < \delta / \|L\|$, then $L(w) \in B(L(v), \delta)$
which shows $B(v, \delta / \|L\|) \subseteq L^{-1}(U)$ and since $v \in L^{-1}(U)$ was arbitrary, this
shows $L^{-1}(U)$ is open.
The operator norm will be very important in the chapter on the derivative.
Part 1.) of Theorem 4.8.3 says that if $L \in \mathcal{L}(V, W)$ where $V$ and $W$ are two
normed vector spaces, then there exists $K$ such that for all $v \in V$,
$$\|Lv\|_W \le K \|v\|_V$$
An obvious case is to let $L = \operatorname{id}$, the identity map on $V$ and let there be two
different norms on $V$, $\|\cdot\|_1$ and $\|\cdot\|_2$. Thus $(V, \|\cdot\|_1)$ is a normed vector space and
so is $(V, \|\cdot\|_2)$. Then Theorem 4.8.3 implies that
Theorem 4.8.4 Let $V$ be a finite dimensional vector space and let $\|\cdot\|_1$ and $\|\cdot\|_2$ be
two norms for $V$. Then these norms are equivalent which means there exist constants,
$\delta, \Delta$ such that for all $v \in V$
$$\delta \|v\|_1 \le \|v\|_2 \le \Delta \|v\|_1 .$$
A set $K$ is sequentially compact if and only if it is closed and bounded. Also every
finite dimensional normed vector space is complete. Also any closed and bounded
subset of a finite dimensional normed vector space is sequentially compact.
and so
$$\frac{1}{K_1} \|v\|_1 \le \|v\|_2 \le K_2 \|v\|_1 .$$
Next consider the claim that all closed and bounded sets in a normed vector
space are sequentially compact. Let $L : \mathbb{F}^n \to V$ be defined by
$$L(a) \equiv \sum_{k=1}^n a_k v_k$$
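Also $L^{-1}(K)$ is bounded. To see this, note that $L^{-1}$ is one to one onto $V$ and so
$L^{-1} \in \mathcal{L}(V, \mathbb{F}^n)$. Therefore,
$$\left| L^{-1}(v) \right| \le \left\| L^{-1} \right\| \|v\| \le \left\| L^{-1} \right\| r$$
$$x_k = \sum_{j=1}^n x_k^j v_j$$
It is clear most axioms of a norm hold. The triangle inequality also holds because
by the triangle inequality for $\mathbb{F}^n$,
$$\begin{aligned}
\|x + y\| &\equiv \left( \sum_{j=1}^n \left| x^j + y^j \right|^2 \right)^{1/2} \\
&\le \left( \sum_{j=1}^n \left| x^j \right|^2 \right)^{1/2} + \left( \sum_{j=1}^n \left| y^j \right|^2 \right)^{1/2} \equiv \|x\| + \|y\| .
\end{aligned}$$
By the first part of this theorem, this norm is equivalent to the norm on $V$. Thus
$K$ is closed and bounded with respect to this new norm. It follows that for each
$j$, $\left\{ x_k^j \right\}_{k=1}^\infty$ is a bounded sequence in $\mathbb{F}$ and so by the theorems about sequential
compactness in $\mathbb{F}$ it follows upon taking subsequences $n$ times, there exists a
subsequence $x_{k_l}$ such that for each $j$,
$$\lim_{l \to \infty} x_{k_l}^j = x^j$$
because $K$ is closed.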
In the above example, this is a norm on the vector space, $V$. It is clear $\|av\| =
|a| \|v\|$ and that $\|v\| \ge 0$ and equals 0 if and only if $v = 0$. The hard part is the
triangle inequality. Let $v = \sum_{k=1}^n a_k v_k$ and $w = \sum_{k=1}^n b_k v_k$.
$$B(x, \varepsilon) \cap D \ne \emptyset .$$
Proof: Let $n \in \mathbb{N}$. Pick $k_1^n \in K$. If $B\left( k_1^n, \frac{1}{n} \right) \supseteq K$, stop. Otherwise pick
$$k_2^n \in K \setminus B\left( k_1^n, \frac{1}{n} \right)$$
Continue this way till the process ends. It must end because if it didn't, there would
exist a convergent subsequence which would imply two of the $k_j^n$ would have to be
closer than $1/n$ which is impossible from the construction. Denote this collection
of points by $D_n$. Then $D \equiv \cup_{n=1}^\infty D_n$. This must work because if $\varepsilon > 0$ is given
and $x \in K$, let $1/n < \varepsilon/3$ and the construction implies $x \in B(k_i^n, 1/n)$ for some
$k_i^n \in D_n \subseteq D$. Then
$$k_i^n \in B(x, \varepsilon) .$$
$D$ is countable because it is the countable union of finite sets.
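The construction of $D_n$ is a greedy procedure, and it can be sketched on a finite sample. The assumptions here are not from the text: $K$ is replaced by a finite grid in $[0, 1]^2$, the radius $1/4$ plays the role of $1/n$, and the name `greedy_net` is illustrative.

```python
import math

def greedy_net(points, radius):
    """Keep a point only if it lies outside every ball B(c, radius) chosen so far."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= radius for c in centers):
            centers.append(p)
    return centers

# Finite sample standing in for a sequentially compact set K (the unit square).
sample = [(i / 20, j / 20) for i in range(21) for j in range(21)]
radius = 1 / 4
net = greedy_net(sample, radius)

# Every sample point lies in some ball B(c, radius) with c in the net.
covered = all(any(math.dist(p, c) < radius for c in net) for p in sample)

# No two chosen centers are closer than radius, as in the construction.
separated = all(
    math.dist(c1, c2) >= radius
    for i, c1 in enumerate(net)
    for c2 in net[:i]
)
```

On a finite sample the process obviously stops; in the proof it is sequential compactness that forces it to stop, since infinitely many points pairwise at distance at least $1/n$ could have no convergent subsequence.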
Definition 4.9.5 More generally, if $K$ is any subset of a normed vector space and
there exists $D$ such that $D$ is countable and for all $x \in K$,
$$B(x, \varepsilon) \cap D \ne \emptyset$$
$$\lim_{k \to \infty} \|f_k - f\| = 0 .$$
Proof: Uniform convergence would say that for every $\varepsilon > 0$, there exists $n_\varepsilon$
such that if $k, l \ge n_\varepsilon$, then for all $x \in K$,
Thus if the given sequence does not converge uniformly, there exists $\varepsilon > 0$ such that
for all $n$, there exists $k, l \ge n$ and $x_n \in K$ such that
$$\lim_{k \to \infty} \|f - f_k\| = 0 .$$
The Ascoli Arzela theorem is the following.
Thus for $k, l$ large enough, the right side is less than $\varepsilon$. This shows that for each $x \in
K$, $\{f_k(x)\}_{k=1}^\infty$ is a Cauchy sequence and so by completeness of $V$ this converges. Let
$f(x)$ be the thing to which it converges. Then $f$ is continuous and the convergence
is uniform by Lemma 4.9.6.
4.10 Exercises
1. In Theorem 4.7.3 it is assumed $f$ has values in $\mathbb{F}$. Show there is no change if
$f$ has values in $V$, a normed vector space provided you redefine the definition
of a polynomial to be something of the form $\sum_{|\alpha| \le m} a_\alpha x^\alpha$ where $a_\alpha \in V$.
2. How would you generalize the conclusion of Corollary 4.7.11 to include the
situation where $f$ has values in a finite dimensional normed vector space?
3. If $\{f_n\}$ and $\{g_n\}$ are sequences of $\mathbb{F}^n$ valued functions defined on $D$ which
converge uniformly, show that if $a, b$ are constants, then $a f_n + b g_n$ also converges
uniformly. If there exists a constant, $M$ such that $|f_n(x)|, |g_n(x)| < M$ for all
$n$ and for all $x \in D$, show $\{f_n \cdot g_n\}$ converges uniformly. Let $f_n(x) \equiv 1 / |x|$
for $x \in B(0, 1)$ and let $g_n(x) \equiv (n - 1) / n$. Show $\{f_n\}$ converges uniformly on
$B(0, 1)$ and $\{g_n\}$ converges uniformly but $\{f_n g_n\}$ fails to converge uniformly.
4. Formulate a theorem for series of functions of $n$ variables which will allow you
to conclude the infinite series is uniformly continuous based on reasonable
assumptions about the functions in the sum.
5. If $f$ and $g$ are real valued functions which are continuous on some set $D$, show
that
$$\min(f, g), \quad \max(f, g)$$
are also continuous. Generalize this to any finite collection of continuous functions.
Hint: Note $\max(f, g) = \frac{|f - g| + f + g}{2}$. Now recall the triangle inequality
which can be used to show $|\cdot|$ is a continuous function.
for some $\alpha \le 1$ for all $x, y$. Show every Hölder continuous function is uniformly
continuous.
13. Consider $f(x) \equiv \operatorname{dist}(x, S)$ where $S$ is a nonempty subset of $\mathbb{R}^n$. Show $f$ is
uniformly continuous.
14. Let $K$ be a sequentially compact set in a normed vector space $V$ and let
$f : V \to W$ be continuous where $W$ is also a normed vector space. Show $f(K)$
is also sequentially compact.
15. If f is uniformly continuous, does it follow that |f | is also uniformly continu-
ous? If |f | is uniformly continuous does it follow that f is uniformly continu-
ous? Answer the same questions with uniformly continuous replaced with
continuous. Explain why.
18. Show that a real valued function defined on $D \subseteq \mathbb{R}^n$ is continuous if and only
if it is both upper and lower semicontinuous.
19. Show that a real valued lower semicontinuous function defined on a sequentially
compact set achieves its minimum and that an upper semicontinuous
function defined on a sequentially compact set achieves its maximum.
\[ \{(x, y) : y \ge f(x)\}. \]
dimensional sets. In more general settings, one formulates the concept differently.
23. The operator norm was defined for $\mathcal{L}(V, W)$ above. This is the usual norm
used for this vector space of linear transformations. Show that any other norm
used on $\mathcal{L}(V, W)$ is equivalent to the operator norm. That is, show that if
$\|\cdot\|_1$ is another norm, there exist scalars $\delta, \Delta$ such that
\[ \delta \|L\| \le \|L\|_1 \le \Delta \|L\| \]
for all $L \in \mathcal{L}(V, W)$ where here $\|\cdot\|$ denotes the operator norm.
24. One alternative norm which is very popular is as follows. Let $L \in \mathcal{L}(V, W)$
and let $(l_{ij})$ denote the matrix of $L$ with respect to some bases. Then the
Frobenius norm is defined by
\[ \|L\|_F \equiv \left( \sum_{ij} |l_{ij}|^2 \right)^{1/2}. \]
ij
where $p \ge 1$ or even
\[ \|L\|_\infty = \max_{ij} |l_{ij}|. \]
25. Explain why $\mathcal{L}(V, W)$ is always a complete normed vector space whenever
$V, W$ are finite dimensional normed vector spaces for any choice of norm for
$\mathcal{L}(V, W)$. Also explain why every closed and bounded subset of $\mathcal{L}(V, W)$ is
sequentially compact for any choice of norm on this space.
26. Let $L \in \mathcal{L}(V, V)$ where $V$ is a finite dimensional normed vector space. Define
\[ e^L \equiv \sum_{k=0}^{\infty} \frac{L^k}{k!} \]
where $L^0$ is the identity. Explain the meaning of this infinite sum and show it converges in $\mathcal{L}(V, V)$ for
any choice of norm on this space. Now tell how to define $\sin(L)$.
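The partial sums in Problem 26 can be explored numerically. The following is a minimal sketch, not part of the exercise: it assumes the series is taken to start at $k = 0$ with $L^0$ the identity, represents $L$ as a square list of lists, and uses the hypothetical helper names `mat_mul` and `exp_partial_sum`. For the matrix $L = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ the limit is the rotation through one radian, which gives something to compare against.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_partial_sum(L, terms):
    """Return sum_{k=0}^{terms} L^k / k! (assumed convention: L^0 = I)."""
    n = len(L)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]          # current term L^k / k!
    for k in range(1, terms + 1):
        term = mat_mul(term, L)
        term = [[entry / k for entry in row] for row in term]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


# Generator of rotations: exp(L) should be rotation through 1 radian.
L = [[0.0, -1.0], [1.0, 0.0]]
approx = exp_partial_sum(L, 25)
exact = [[math.cos(1.0), -math.sin(1.0)], [math.sin(1.0), math.cos(1.0)]]
max_error = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
```

Replacing the coefficients $1/k!$ by those of the sine series is one way to experiment with the last part of the problem.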
27. Let $X$ be a finite dimensional normed vector space, real or complex. Show
that $X$ is separable. Hint: Let $\{v_i\}_{i=1}^{n}$ be a basis and define a map from $\mathbb{F}^n$ to
$X$, $\theta$, as follows. $\theta\left(\sum_{k=1}^{n} x_k e_k\right) \equiv \sum_{k=1}^{n} x_k v_k$. Show $\theta$ is continuous and has
a continuous inverse. Now let $D$ be a countable dense set in $\mathbb{F}^n$ and consider
$\theta(D)$.
where
\[ \|f\|_\infty \equiv \sup\{|f(x)| : x \in X\} \]
and
\[ \rho_\alpha(f) \equiv \sup\left\{ \frac{|f(x) - f(y)|}{|x - y|^\alpha} : x, y \in X,\ x \neq y \right\}. \]
Show that $(C^\alpha(X; \mathbb{R}^n), \|\cdot\|_\alpha)$ is a complete normed linear space. This is
called a Hölder space. What would this space consist of if $\alpha > 1$?
30. Let $\{f_n\}_{n=1}^{\infty} \subseteq C^\alpha(X; \mathbb{R}^n)$ where $X$ is a compact subset of $\mathbb{R}^p$ and suppose
\[ \|f_n\|_\alpha \le M \]
for all $n$. Show there exists a subsequence, $n_k$, such that $f_{n_k}$ converges in
$C(X; \mathbb{R}^n)$. The given sequence is precompact when this happens. (This also
shows the embedding of $C^\alpha(X; \mathbb{R}^n)$ into $C(X; \mathbb{R}^n)$ is a compact embedding.)
Hint: You might want to use the Ascoli Arzela theorem.
31. This problem is for those who know about the derivative and the integral of
a function of one variable. Let $f : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ be continuous and bounded
and let $x_0 \in \mathbb{R}^n$. If
\[ x : [0, T] \to \mathbb{R}^n \]
and $h > 0$, let
\[ \tau_h x(s) \equiv \begin{cases} x_0 & \text{if } s \le h, \\ x(s - h), & \text{if } s > h. \end{cases} \]
For $t \in [0, T]$, let
\[ x_h(t) = x_0 + \int_0^t f(s, \tau_h x_h(s))\, ds. \]
Show using the Ascoli Arzela theorem that there exists a sequence $h \to 0$ such
that
\[ x_h \to x \]
in $C([0, T]; \mathbb{R}^n)$. Next argue
\[ x(t) = x_0 + \int_0^t f(s, x(s))\, ds \]
\[ x' = f(t, x), \quad t \in [0, T] \]
\[ x(0) = x_0. \]
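The delayed approximation in Problem 31 can be tried out on a scalar example. The sketch below is only an illustration under stated assumptions: the integral is replaced by a left Riemann sum on a grid of step $h/m$ (the function name `delayed_solution` and the parameter `steps_per_delay` are hypothetical), and the test problem $x' = \cos^2 x$, $x(0) = 0$ is chosen because $f$ is bounded and continuous and the exact solution is $x(t) = \arctan t$.

```python
import math


def delayed_solution(f, x0, T, h, steps_per_delay=10):
    """Approximate x_h(t) = x0 + int_0^t f(s, x_h(s - h)) ds on [0, T].

    The delayed value is x0 while s <= h, as in the definition of the
    shifted function in Problem 31. The integral is discretized by a left
    Riemann sum with step dt = h / steps_per_delay (an assumption of this
    sketch, not part of the problem).
    """
    dt = h / steps_per_delay
    n = int(round(T / dt))
    xs = [x0]
    for i in range(n):
        s = i * dt
        # i <= steps_per_delay means s <= h, so the delayed value is x0.
        delayed = x0 if i <= steps_per_delay else xs[i - steps_per_delay]
        xs.append(xs[i] + dt * f(s, delayed))
    return xs


def rhs(s, x):
    """Bounded, continuous right side; exact solution from 0 is atan(t)."""
    return math.cos(x) ** 2


coarse = delayed_solution(rhs, 0.0, 1.0, 0.1)
fine = delayed_solution(rhs, 0.0, 1.0, 0.01)
error_coarse = abs(coarse[-1] - math.atan(1.0))
error_fine = abs(fine[-1] - math.atan(1.0))
```

The point of the delay is that each value of $x_h$ depends only on values already computed, so no equation has to be solved at any step.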
\[ \{x : |x - x_0| \le r\} \]
where this is the usual norm coming from the dot product. Let $P : \mathbb{R}^n \to D(x_0, r)$ be defined by
\[ P(x) \equiv \begin{cases} x & \text{if } x \in D(x_0, r) \\ x_0 + r\dfrac{x - x_0}{|x - x_0|} & \text{if } x \notin D(x_0, r) \end{cases} \]
33. Use Problem 31 to obtain local solutions to the initial value problem where
$f$ is not assumed to be bounded. It is only assumed to be continuous. This
means there is a small interval whose length is perhaps not $T$ such that the
solution to the differential equation exists on this small interval.
The Mathematical Theory Of
Determinants, Basic Linear
Algebra
Lemma 5.1.1 There exists a unique function $\operatorname{sgn}_n$ which maps each list of numbers
from $\{1, \cdots, n\}$ to one of the three numbers, $0$, $1$, or $-1$ which also has the following
properties.
\[ \operatorname{sgn}_n(1, \cdots, n) = 1 \quad (5.1.1) \]
\[ \operatorname{sgn}_n(i_1, \cdots, p, \cdots, q, \cdots, i_n) = -\operatorname{sgn}_n(i_1, \cdots, q, \cdots, p, \cdots, i_n) \quad (5.1.2) \]
In words, the second property states that if two of the numbers are switched, the
value of the function is multiplied by $-1$. Also, in the case where $n > 1$ and
$\{i_1, \cdots, i_n\} = \{1, \cdots, n\}$ so that every number from $\{1, \cdots, n\}$ appears in the
ordered list, $(i_1, \cdots, i_n)$,
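If there are repeated numbers in $(i_1, \cdots, i_{n+1})$, then it is obvious 5.1.2 holds because
both sides would equal zero from the above definition. It remains to verify 5.1.2 in
the case where there are no numbers repeated in $(i_1, \cdots, i_{n+1})$. Consider
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right), \]
where the $r$ above the $p$ indicates the number, $p$ is in the $r$th position and the $s$
above the $q$ indicates that the number, $q$ is in the $s$th position. Suppose first that
$r < \theta < s$. Then
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{\theta}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) \equiv (-1)^{n+1-\theta} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) \]
while
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{\theta}{n+1}, \cdots, \overset{s}{p}, \cdots, i_{n+1}\right) = (-1)^{n+1-\theta} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s-1}{p}, \cdots, i_{n+1}\right) \]
and so, by induction, a switch of $p$ and $q$ introduces a minus sign in the result.
Similarly, if $\theta > s$ or if $\theta < r$ it also follows that 5.1.2 holds. The interesting case
is when $\theta = r$ or $\theta = s$. Consider the case where $\theta = r$ and note the other case is
entirely similar.
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) = (-1)^{n+1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) \quad (5.1.4) \]
while
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s}{n+1}, \cdots, i_{n+1}\right) = (-1)^{n+1-s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right). \quad (5.1.5) \]
Therefore,
\[ \begin{aligned}
\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right)
&= (-1)^{n+1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) \\
&= (-1)^{n+1-r} (-1)^{s-1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right) \\
&= (-1)^{n+s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right) \\
&= (-1)^{2s-1} (-1)^{n+1-s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right) \\
&= -\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s}{n+1}, \cdots, i_{n+1}\right).
\end{aligned} \]
5.2 The Determinant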
Definition 5.2.1 Let $f$ be a real valued function which has the set of ordered lists
of numbers from $\{1, \cdots, n\}$ as its domain. Define
\[ \sum_{(k_1, \cdots, k_n)} f(k_1 \cdots k_n) \]
to be the sum of all the $f(k_1 \cdots k_n)$ for all possible choices of ordered lists $(k_1, \cdots, k_n)$
of numbers of $\{1, \cdots, n\}$. For example,
\[ \sum_{(k_1, k_2)} f(k_1, k_2) = f(1, 2) + f(2, 1) + f(1, 1) + f(2, 2). \]
where the sum is taken over all ordered lists of numbers from $\{1, \cdots, n\}$. Note it
suffices to take the sum over only those ordered lists in which there are no repeats
because if there are, $\operatorname{sgn}(k_1, \cdots, k_n) = 0$ and so that term contributes 0 to the sum.
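The sum over ordered lists can be written out directly for small matrices. The sketch below is an illustration only: it takes the determinant to be $\sum_{(k_1,\cdots,k_n)} \operatorname{sgn}(k_1,\cdots,k_n)\, a_{1k_1}\cdots a_{nk_n}$, realizes `sgn` by counting inversions (one standard way to obtain a function with the properties listed in Lemma 5.1.1, not necessarily the construction used in its proof), and uses 0-based indices. The names `sgn` and `det` are chosen for this example.

```python
from itertools import product


def sgn(indices):
    """0 if some entry repeats, otherwise (-1)^(number of inversions)."""
    if len(set(indices)) < len(indices):
        return 0
    inversions = sum(1
                     for i in range(len(indices))
                     for j in range(i + 1, len(indices))
                     if indices[i] > indices[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Sum of sgn(k_1..k_n) * a_{1 k_1} ... a_{n k_n} over all ordered lists."""
    n = len(A)
    total = 0
    for ks in product(range(n), repeat=n):
        s = sgn(ks)
        if s == 0:
            continue  # lists with repeats contribute nothing to the sum
        term = s
        for row, col in enumerate(ks):
            term *= A[row][col]
        total += term
    return total


example_2x2 = det([[1, 2], [3, 4]])
example_3x3 = det([[2, 0, 1], [1, 3, 2], [0, 1, 1]])
```

The loop visits all $n^n$ ordered lists, so this is only practical for very small $n$; it is meant to mirror the definition, not to be efficient.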
and
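\[ A(1, \cdots, n) = A. \]
\[ = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n} \quad (5.2.7) \]
\[ \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_r, \cdots, k_s, \cdots, k_n)\, a_{1k_1} \cdots a_{rk_r} \cdots a_{sk_s} \cdots a_{nk_n}, \]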
Summing over all ordered lists, $(r_1, \cdots, r_n)$ where the $r_i$ are distinct, (If the $r_i$ are
not distinct, $\operatorname{sgn}(r_1, \cdots, r_n) = 0$ and so there is no contribution to the sum.)
\[ n! \det(A) = \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}. \]
This proves the corollary since the formula gives the same number for $A$ as it does
for $A^T$.
Proof: By Proposition 5.2.3 when two rows are switched, the determinant of the
resulting matrix is $(-1)$ times the determinant of the original matrix. By Corollary
5.2.5 the same holds for columns because the columns of the matrix equal the rows
of the transposed matrix. Thus if $A_1$ is the matrix obtained from $A$ by switching
two columns,
\[ \det(A) = \det(A^T) = -\det(A_1^T) = -\det(A_1). \]
If $A$ has two equal columns or two equal rows, then switching them results in the
same matrix. Therefore, $\det(A) = -\det(A)$ and so $\det(A) = 0$.
It remains to verify the last assertion.
\[ \begin{aligned}
\det(A) &\equiv \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots (x a_{k_i} + y b_{k_i}) \cdots a_{nk_n} \\
&= x \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots a_{k_i} \cdots a_{nk_n} \\
&\quad + y \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots b_{k_i} \cdots a_{nk_n}
\end{aligned} \]
By Corollary 5.2.6
\[ \det(A) = \sum_{k=1}^{r} c_k \det\begin{pmatrix} a_1 & \cdots & a_r & \cdots & a_{n-1} & a_k \end{pmatrix} = 0. \]
The case for rows follows from the fact that $\det(A) = \det(A^T)$.
One of the most important rules about determinants is that the determinant of
a product equals the product of the determinants.
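\[ \begin{aligned}
\det(AB) &= \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, c_{1k_1} \cdots c_{nk_n} \\
&= \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \left( \sum_{r_1} a_{1r_1} b_{r_1 k_1} \right) \cdots \left( \sum_{r_n} a_{nr_n} b_{r_n k_n} \right) \\
&= \sum_{(r_1 \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, b_{r_1 k_1} \cdots b_{r_n k_n} \left( a_{1r_1} \cdots a_{nr_n} \right) \\
&= \sum_{(r_1 \cdots, r_n)} \operatorname{sgn}(r_1 \cdots r_n)\, a_{1r_1} \cdots a_{nr_n} \det(B) = \det(A) \det(B).
\end{aligned} \]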
Proof: Denote $M$ by $(m_{ij})$. Thus in the first case, $m_{nn} = a$ and $m_{ni} = 0$ if
$i \neq n$ while in the second case, $m_{nn} = a$ and $m_{in} = 0$ if $i \neq n$. From the definition
of the determinant,
\[ \det(M) \equiv \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}_n(k_1, \cdots, k_n)\, m_{1k_1} \cdots m_{nk_n} \]
Letting $\theta$ denote the position of $n$ in the ordered list, $(k_1, \cdots, k_n)$ then using the
earlier conventions used to prove Lemma 5.1.1, $\det(M)$ equals
\[ \sum_{(k_1, \cdots, k_n)} (-1)^{n-\theta} \operatorname{sgn}_{n-1}\left(k_1, \cdots, k_{\theta-1}, \overset{\theta}{k_{\theta+1}}, \cdots, \overset{n-1}{k_n}\right) m_{1k_1} \cdots m_{nk_n} \]
Now suppose 5.2.13. Then if $k_n \neq n$, the term involving $m_{nk_n}$ in the above expression
equals zero. Therefore, the only terms which survive are those for which $\theta = n$
or in other words, those for which $k_n = n$. Therefore, the above expression reduces
to
\[ a \sum_{(k_1, \cdots, k_{n-1})} \operatorname{sgn}_{n-1}(k_1, \cdots k_{n-1})\, m_{1k_1} \cdots m_{(n-1)k_{n-1}} = a \det(A). \]
To get the assertion in the situation of 5.2.12 use Corollary 5.2.5 and 5.2.13 to write
\[ \det(M) = \det(M^T) = \det\begin{pmatrix} A^T & 0 \\ * & a \end{pmatrix} = a \det(A^T) = a \det(A). \]
In terms of the theory of determinants, arguably the most important idea is
that of Laplace expansion along a row or a column. This will follow from the above
definition of a determinant.
The following is the main result. Earlier this was given as a definition and the
outrageous totally unjustified assertion was made that the same number would be
obtained by expanding the determinant along any row or column. The following
theorem proves this assertion.
The first formula consists of expanding the determinant along the $i$th row and the
second expands the determinant along the $j$th column.
Proof: Let $(a_{i1}, \cdots, a_{in})$ be the $i$th row of $A$. Let $B_j$ be the matrix obtained
from $A$ by leaving every row the same except the $i$th row which in $B_j$ equals
$(0, \cdots, 0, a_{ij}, 0, \cdots, 0)$. Then by Corollary 5.2.6,
\[ \det(A) = \sum_{j=1}^{n} \det(B_j) \]
Denote by $A^{ij}$ the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$th row and
the $j$th column of $A$. Thus $\operatorname{cof}(A)_{ij} \equiv (-1)^{i+j} \det(A^{ij})$. At this point, recall
that from Proposition 5.2.3, when two rows or two columns in a matrix, $M$, are
switched, this results in multiplying the determinant of the old matrix by $-1$ to get
the determinant of the new matrix. Therefore, by Lemma 5.2.11,
\[ \begin{aligned}
\det(B_j) &= (-1)^{n-j} (-1)^{n-i} \det\begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} \\
&= (-1)^{i+j} \det\begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} = a_{ij} \operatorname{cof}(A)_{ij}.
\end{aligned} \]
Therefore,
\[ \det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} \]
which is the formula for expanding $\det(A)$ along the $i$th row. Also,
\[ \det(A) = \det(A^T) = \sum_{j=1}^{n} a^T_{ij} \operatorname{cof}(A^T)_{ij} = \sum_{j=1}^{n} a_{ji} \operatorname{cof}(A)_{ji} \]
which is the formula for expanding $\det(A)$ along the $i$th column.
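The statement that every row and every column expansion gives the same number is easy to test on an example. The sketch below is illustrative only: the helper names `minor`, `cofactor`, `expand_row` and `expand_col` are chosen here, indices are 0-based (which leaves the sign $(-1)^{i+j}$ unchanged, since both indices shift by one), and the determinant itself is computed recursively by expansion along the first row.

```python
def minor(A, i, j):
    """Matrix obtained by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by recursive expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * cofactor(A, 0, j) for j in range(len(A)))


def cofactor(A, i, j):
    """cof(A)_ij = (-1)^(i+j) det(A^ij)."""
    return (-1) ** (i + j) * det(minor(A, i, j))


def expand_row(A, i):
    """Sum over j of a_ij cof(A)_ij."""
    return sum(A[i][j] * cofactor(A, i, j) for j in range(len(A)))


def expand_col(A, j):
    """Sum over i of a_ij cof(A)_ij."""
    return sum(A[i][j] * cofactor(A, i, j) for i in range(len(A)))


A = [[2, 0, 1], [1, 3, 2], [0, 1, 1]]
row_values = [expand_row(A, i) for i in range(3)]
col_values = [expand_col(A, j) for j in range(3)]
```

For this matrix all six expansions agree, as the theorem asserts.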
Note that this gives an easy way to write a formula for the inverse of an $n \times n$
matrix.
Now consider
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} \]
when $k \neq r$. Replace the $k$th column with the $r$th column to obtain a matrix, $B_k$
whose determinant equals zero by Corollary 5.2.6. However, expanding this matrix
along the $k$th column yields
\[ 0 = \det(B_k) \det(A)^{-1} = \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} \]
Summarizing,
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} = \delta_{rk}. \]
\[ \sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} \det(A)^{-1} = \delta_{rk} \]
This proves that if $\det(A) \neq 0$, then $A^{-1}$ exists with $A^{-1} = \left(a^{-1}_{ij}\right)$, where
\[ a^{-1}_{ij} = \operatorname{cof}(A)_{ji} \det(A)^{-1}. \]
so $\det(A) \neq 0$.
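The cofactor formula for the inverse, $a^{-1}_{ij} = \operatorname{cof}(A)_{ji} \det(A)^{-1}$, can be checked in exact arithmetic. The following sketch is an illustration with names chosen for this example (`minor`, `det`, `cofactor`, `inverse`); it uses Python's `fractions.Fraction` so that the product $A A^{-1}$ comes out exactly equal to the identity, and it assumes integer or rational entries.

```python
from fractions import Fraction


def minor(A, i, j):
    """Matrix obtained by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by recursive expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def cofactor(A, i, j):
    return (-1) ** (i + j) * det(minor(A, i, j))


def inverse(A):
    """Entry (i, j) of the inverse is cof(A)_ji / det(A); note the transpose."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0, so no inverse exists")
    n = len(A)
    return [[Fraction(cofactor(A, j, i), d) for j in range(n)] for i in range(n)]


A = [[2, 0, 1], [1, 3, 2], [0, 1, 1]]
A_inv = inverse(A)
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

The transposed indices in `cofactor(A, j, i)` are the easiest part of the formula to get wrong, which is why the check multiplies back against $A$.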
The next corollary points out that if an $n \times n$ matrix, $A$ has a right or a left
inverse, then it has an inverse.
det B det A = 1
thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for
$A^{-1}$ given above. Using this formula,
\[ x_i = \sum_{j=1}^{n} a^{-1}_{ij} y_j = \sum_{j=1}^{n} \frac{1}{\det(A)} \operatorname{cof}(A)_{ji}\, y_j. \]
where here the $i$th column of $A$ is replaced with the column vector $(y_1 \cdots, y_n)^T$,
and the determinant of this modified matrix is taken and divided by $\det(A)$. This
formula is known as Cramer's rule.
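Cramer's rule as just described translates almost word for word into code. The sketch below is only an illustration under the assumption of integer or rational entries; the function names `det` and `cramer` are chosen for this example, and the determinant is computed by recursive expansion along the first row.

```python
from fractions import Fraction


def det(A):
    """Determinant by recursive expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j in range(len(A)):
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(sub)
    return total


def cramer(A, y):
    """Solve Ax = y: x_i = det(A with column i replaced by y) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule needs det(A) != 0")
    xs = []
    for i in range(len(A)):
        replaced = [row[:i] + [y[r]] + row[i + 1:] for r, row in enumerate(A)]
        xs.append(Fraction(det(replaced), d))
    return xs


A = [[2, 0, 1], [1, 3, 2], [0, 1, 1]]
y = [1, 2, 3]
x = cramer(A, y)
residual = [sum(A[i][j] * x[j] for j in range(3)) - y[i] for i in range(3)]
```

Each unknown costs one extra determinant, so this is a formula to reason with rather than an efficient solver.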
\[ i_1 < \cdots < i_r \]
I want to show that every row is a linear combination of these rows. Consider the
$l$th row and let $p$ be an index between 1 and $n$. Form the following $(r+1) \times (r+1)$
matrix
\[ \begin{pmatrix}
a_{i_1 j_1} & \cdots & a_{i_1 j_r} & a_{i_1 p} \\
\vdots & & \vdots & \vdots \\
a_{i_r j_1} & \cdots & a_{i_r j_r} & a_{i_r p} \\
a_{l j_1} & \cdots & a_{l j_r} & a_{lp}
\end{pmatrix} \]
Of course you can assume $l \notin \{i_1, \cdots, i_r\}$ because there is nothing to prove if the
$l$th row is one of the chosen ones. The above matrix has determinant 0. This is
because if $p \notin \{j_1, \cdots, j_r\}$ then the above would be a submatrix of $A$ which is too
large to have non zero determinant. On the other hand, if $p \in \{j_1, \cdots, j_r\}$ then the
above matrix has two columns which are equal so its determinant is still 0.
Expand the determinant of the above matrix along the last column. Let $C_k$
denote the cofactor associated with the entry $a_{i_k p}$. This is not dependent on the
choice of $p$. Remember, you delete the column and the row the entry is in and take
the determinant of what is left and multiply by $-1$ raised to an appropriate power.
Let $C$ denote the cofactor associated with $a_{lp}$. This is given to be nonzero, it being
the determinant of the matrix
\[ \begin{pmatrix}
a_{i_1 j_1} & \cdots & a_{i_1 j_r} \\
\vdots & & \vdots \\
a_{i_r j_1} & \cdots & a_{i_r j_r}
\end{pmatrix} \]
Thus
\[ 0 = a_{lp} C + \sum_{k=1}^{r} C_k a_{i_k p} \]
which implies
\[ a_{lp} = \sum_{k=1}^{r} \frac{-C_k}{C} a_{i_k p} \equiv \sum_{k=1}^{r} m_k a_{i_k p} \]
Since this is true for every $p$ and since $m_k$ does not depend on $p$, this has shown the
$l$th row is a linear combination of the $i_1, i_2, \cdots, i_r$ rows. The determinant rank does
not change when you replace $A$ with $A^T$. Therefore, the same conclusion holds for
the columns.
1. det (A) = 0.
2. A, AT are not one to one.
3. A is not onto.
Since also $A0 = 0$, it follows $A$ is not one to one. Similarly, $A^T$ is not one to one
by the same argument applied to $A^T$. This verifies that 1.) implies 2.).
Now suppose 2.). Then since $A^T$ is not one to one, it follows there exists $x \neq 0$
such that
\[ A^T x = 0. \]
Taking the transpose of both sides yields
\[ x^T A = 0^T \]
1. $\det(A) \neq 0$.
2. $A$ and $A^T$ are one to one.
3. $A$ is onto.
Theorem 5.2.22 Let the system of equations be as just described in 5.2.14 where
$m < n$. Then letting
\[ x^T \equiv (x_1, x_2, \cdots, x_n) \in \mathbb{F}^n, \]
there exists $x \neq 0$ such that the components satisfy each of the equations of 5.2.14.
Here $\mathbb{F}$ is a field of scalars. Think $\mathbb{R}$ or $\mathbb{C}$ for example.
\[ Ax = 0 \]
an $n \times n$ matrix having $n - m$ rows of zeros on the bottom, it follows this matrix has
determinant equal to 0. Therefore, from Theorem 5.2.20, there exists $x \neq 0$ such
that $Ax = 0$.
Theorem 5.2.24 Let $v_1$ be a unit vector ($|v_1| = 1$) in $\mathbb{F}^n$, $n > 1$. Then there exist
vectors $\{v_2, \cdots, v_n\}$ such that
\[ \{v_1, \cdots, v_n\} \]
\[ v_j \cdot x = 0, \quad j = 1, 2, \cdots, k \]
This amounts to the situation of Theorem 5.2.22 in which there are more variables
than equations. Therefore, by this theorem, there exists a nonzero $x$ solving all
these equations. Divide by its magnitude and this gives $v_{k+1}$.
The argument gives the following generalization.
U T U = U U T = I.
Two matrices $A$ and $B$ are similar if there is some invertible matrix $S$ such
that $A = S^{-1}BS$. Note that similar matrices have the same characteristic equation
because by Theorem 5.2.10 which says the determinant of a product is the product
of the determinants,
\[ \begin{aligned}
\det(\lambda I - A) &= \det(\lambda I - S^{-1}BS) = \det\left(S^{-1}(\lambda I - B)S\right) \\
&= \det(S^{-1}) \det(\lambda I - B) \det(S) = \det(S^{-1}S) \det(\lambda I - B) = \det(\lambda I - B)
\end{aligned} \]
\[ Av_1 = \lambda_1 v_1, \quad |v_1| = 1. \]
e1 A1 U
U e1 = Tn1 ,
\[ \det(\lambda I - A) = 0 \]
\[ Ax = \lambda x, \quad x \neq 0 \]
Of course if $A$ is real, it is still possible that the eigenvalue could be complex and if
this is the case, then the vector $x$ will also end up being complex. I wish to show
that the eigenvalues are all real. Suppose then that $\lambda$ is an eigenvalue and let $x$ be
the corresponding eigenvector described above. Then letting $\overline{x}$ denote the complex
conjugate of $x$,
\[ \overline{\lambda}\, \overline{x}^T x = \left(\overline{Ax}\right)^T x = \overline{x}^T A^T x = \overline{x}^T A x = \overline{x}^T (\lambda x) = \lambda\, \overline{x}^T x \]
Theorem 5.2.29 Let A be a real symmetric matrix. Then there exists a diago-
nal matrix D consisting of the eigenvalues of A down the main diagonal and an
orthogonal matrix U such that
\[ U^T A U = D. \]
Proof: Since $A$ has all real eigenvalues, it follows from Theorem 5.2.27, there
exists an orthogonal matrix $U$ such that
\[ U^T A U = T \]
\[ T^T = U^T A^T U = U^T A U = T \]
and so in fact T is a diagonal matrix having the eigenvalues of A down the diagonal.
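For a $2 \times 2$ real symmetric matrix the orthogonal matrix $U$ can be written down explicitly as a rotation, which gives a small concrete instance of $U^T A U = D$. The sketch below is an illustration, not the construction used in the proof: it assumes the matrix is $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$, picks the rotation angle $\theta$ with $\tan 2\theta = 2b/(a-d)$, and uses the hypothetical function names `matmul2` and `diagonalize_2x2`.

```python
import math


def matmul2(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def diagonalize_2x2(a, b, d):
    """Return (U, D) with U a rotation and D = U^T A U for A = [[a, b], [b, d]]."""
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]
    U_transpose = [[c, s], [-s, c]]
    A = [[a, b], [b, d]]
    D = matmul2(matmul2(U_transpose, A), U)
    return U, D


U, D = diagonalize_2x2(2.0, 1.0, 2.0)   # eigenvalues of this matrix are 3 and 1
U2, D2 = diagonalize_2x2(1.0, -2.0, 4.0)
```

The off diagonal entries of `D` vanish up to rounding, and the diagonal entries are the eigenvalues, whose sum and product agree with the trace and determinant of $A$.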
Theorem 5.2.30 Let $A$ be a real symmetric matrix which has all positive
eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Then
\[ (Ax \cdot x) \equiv x^T A x \ge \lambda_1 |x|^2 \]
5.3 The Right Polar Factorization
Lemma 5.3.1 Let $A$ be a symmetric matrix such that all its eigenvalues are
nonnegative. Then there exists a symmetric matrix, $A^{1/2}$ such that $A^{1/2}$ has all
nonnegative eigenvalues and $\left(A^{1/2}\right)^2 = A$.
so $A^{1/2}$ is symmetric.
There is also a useful observation about orthonormal sets of vectors which is
stated in the next lemma.
Proof: This follows from the definition. From the properties of the dot product
and using the fact that the given set of vectors is orthonormal,
\[ \left| \sum_{k=1}^{r} c_k x_k \right|^2 = \left( \sum_{k=1}^{r} c_k x_k, \sum_{j=1}^{r} c_j x_j \right) = \sum_{k,j} c_k \overline{c_j} (x_k, x_j) = \sum_{k=1}^{r} |c_k|^2. \]
Here is another lemma about preserving distance.
Proof: Since $R$ preserves distances, $|Rx| = |x|$ for every $x$. Therefore from the
axioms of the dot product,
\[ \begin{aligned}
|x|^2 + |y|^2 + (x, y) + (y, x) &= |x + y|^2 \\
&= (R(x + y), R(x + y)) \\
&= (Rx, Rx) + (Ry, Ry) + (Rx, Ry) + (Ry, Rx) \\
&= |x|^2 + |y|^2 + \left(R^T R x, y\right) + \left(y, R^T R x\right)
\end{aligned} \]
Then
( ) ( )
0 = RT Rx x,y = RT Rx x, y
( )
= RT Rx x, y
( )
Thus $\left(R^T R x - x, y\right) = 0$ for all $x, y$ because the given $x, y$ were arbitrary. Let
$y = R^T R x - x$ to conclude that for all $x$,
\[ R^T R x - x = 0 \]
F = RU.
Also the eigenvalues of the $n \times n$ matrix $F^T F$ are all nonnegative. This is because
if $x$ is an eigenvector with eigenvalue $\lambda$,
\[ \lambda (x, x) = \left(F^T F x, x\right) = (Fx, Fx) \ge 0. \]
\[ U^2 = F^T F. \]
\[ \{U x_1, \cdots, U x_r, y_{r+1}, \cdots, y_n\}. \]
Therefore, from Corollary 5.2.25 again, this orthonormal set of vectors can be
extended to an orthonormal basis for $\mathbb{F}^m$,
\[ \{F x_1, \cdots, F x_r, z_{r+1}, \cdots, z_m\} \]
Thus there are at least as many $z_k$ as there are $y_j$. Now for $x \in \mathbb{F}^n$, since
\[ \{U x_1, \cdots, U x_r, y_{r+1}, \cdots, y_n\} \]
\[ c_1 \cdots, c_r, d_{r+1}, \cdots, d_n \]
such that
\[ x = \sum_{k=1}^{r} c_k U x_k + \sum_{k=r+1}^{n} d_k y_k \]
Define
\[ Rx \equiv \sum_{k=1}^{r} c_k F x_k + \sum_{k=r+1}^{n} d_k z_k \quad (5.3.16) \]
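\[ \begin{aligned}
&= \left( F^T F \left( \sum_{k=1}^{r} b_k x_k - x \right), \left( \sum_{k=1}^{r} b_k x_k - x \right) \right) \\
&= \left( U^2 \left( \sum_{k=1}^{r} b_k x_k - x \right), \left( \sum_{k=1}^{r} b_k x_k - x \right) \right) \\
&= \left( U \left( \sum_{k=1}^{r} b_k x_k - x \right), U \left( \sum_{k=1}^{r} b_k x_k - x \right) \right) \\
&= \left( \sum_{k=1}^{r} b_k U x_k - U x, \sum_{k=1}^{r} b_k U x_k - U x \right) = 0
\end{aligned} \]
Therefore, $F\left(\sum_{k=1}^{r} b_k x_k\right) = F(x)$ and this shows
\[ R U x = F x. \]
From 5.3.16 and Lemma 5.3.2 $R$ preserves distances. Therefore, by Lemma 5.3.3
\[ R^T R = I. \]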
The Derivative
f (x + v) = f (x) + Lv + o (v)
Note that from Theorem 4.8.4 the question whether a given function is
differentiable is independent of the norm used on the finite dimensional vector space. That
is, a function is differentiable with one norm if and only if it is differentiable with
another norm.
The definition 6.1.1 means the error,
\[ f(x + v) - f(x) - Lv \]
converges to 0 faster than $\|v\|$. Thus the above definition is equivalent to saying
\[ \lim_{\|v\| \to 0} \frac{\|f(x + v) - f(x) - Lv\|}{\|v\|} = 0 \quad (6.1.2) \]
or equivalently,
\[ \lim_{y \to x} \frac{\|f(y) - f(x) - Df(x)(y - x)\|}{\|y - x\|} = 0. \quad (6.1.3) \]
The symbol, o (v) should be thought of as an adjective. Thus, if t and k are
constants,
o (v) = o (v) + o (v) , o (tv) = o (v) , ko (v) = o (v)
and other similar observations hold.
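The statement that the error $f(x+v) - f(x) - Lv$ is $o(v)$ can be watched numerically. The sketch below is an illustration only: the map $f(x, y) = (x^2 y,\ x + \sin y)$ and its matrix of partial derivatives are chosen for this example, the Euclidean norm is used on both sides, and the names `f`, `Df` and `error_ratio` are not from the text.

```python
import math


def f(x, y):
    """Example map from R^2 to R^2 (chosen for this illustration)."""
    return (x * x * y, x + math.sin(y))


def Df(x, y):
    """Matrix of the candidate derivative L at (x, y)."""
    return ((2.0 * x * y, x * x), (1.0, math.cos(y)))


def error_ratio(x, y, v1, v2):
    """||f(p + v) - f(p) - L v|| / ||v|| for p = (x, y), v = (v1, v2)."""
    base = f(x, y)
    moved = f(x + v1, y + v2)
    L = Df(x, y)
    Lv = (L[0][0] * v1 + L[0][1] * v2, L[1][0] * v1 + L[1][1] * v2)
    err = math.hypot(moved[0] - base[0] - Lv[0], moved[1] - base[1] - Lv[1])
    return err / math.hypot(v1, v2)


# Shrink v along a fixed direction; the ratio should tend to 0.
ratios = [error_ratio(1.0, 2.0, 0.6 * t, 0.8 * t) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
```

For a smooth map such as this one the ratio shrinks roughly in proportion to $\|v\|$, which is stronger than the definition requires.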
Theorem 6.1.2 The derivative is well defined.
Proof: First note that for a fixed vector, $v$, $o(tv) = o(t)$. This is because
\[ \lim_{t \to 0} \frac{o(tv)}{|t|} = \lim_{t \to 0} \|v\| \frac{o(tv)}{\|tv\|} = 0 \]
Now suppose both $L_1$ and $L_2$ work in the above definition. Then let $v$ be any vector
and let $t$ be a real scalar which is chosen small enough that $tv + x \in U$. Then
\[ f(x + tv) = f(x) + L_1 tv + o(tv), \quad f(x + tv) = f(x) + L_2 tv + o(tv). \]
Therefore, subtracting these two yields $(L_2 - L_1)(tv) = o(tv) = o(t)$. Therefore,
dividing by $t$ yields $(L_2 - L_1)(v) = \frac{o(t)}{t}$. Now let $t \to 0$ to conclude that
$(L_2 - L_1)(v) = 0$. Since this is true for all $v$, it follows $L_2 = L_1$.
Lemma 6.1.3 Let $f$ be differentiable at $x$. Then $f$ is continuous at $x$ and in fact,
there exists $K > 0$ such that whenever $\|v\|$ is small enough,
\[ \|f(x + v) - f(x)\| \le K \|v\| \]
Proof: From the definition of the derivative,
\[ f(x + v) - f(x) = Df(x)v + o(v). \]
Let $\|v\|$ be small enough that $\frac{\|o(v)\|}{\|v\|} < 1$ so that $\|o(v)\| \le \|v\|$. Then for such
$v$,
\[ \|f(x + v) - f(x)\| \le \|Df(x)v\| + \|v\| \le (\|Df(x)\| + 1) \|v\| \]
This proves the lemma with $K = \|Df(x)\| + 1$.
Here $\|Df(x)\|$ is the operator norm of the linear transformation, $Df(x)$.
6.2 The Chain Rule
Theorem 6.2.1 (The chain rule) Let $U$ and $V$ be open sets $U \subseteq X$ and $V \subseteq Y$.
Suppose $f : U \to V$ is differentiable at $x \in U$ and suppose $g : V \to \mathbb{F}^q$ is
differentiable at $f(x) \in V$. Then $g \circ f$ is differentiable at $x$ and
\[ D(g \circ f)(x) = Dg(f(x))\, Df(x). \]
Proof: This follows from a computation. Let $B(x, r) \subseteq U$ and let $r$ also be
small enough that for $\|v\| \le r$, it follows that $f(x + v) \in V$. Such an $r$ exists
because $f$ is continuous at $x$. For $\|v\| < r$, the definition of differentiability of $g$
and $f$ implies
\[ g(f(x + v)) - g(f(x)) = \]
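It follows
\[ \begin{aligned}
\lim_{t \to 0} \pi_j \left( \frac{f(x + tv_k) - f(x)}{t} \right) &\equiv \lim_{t \to 0} \frac{f_j(x + tv_k) - f_j(x)}{t} \equiv D_{v_k} f_j(x) \\
&= \pi_j \left( \sum_i J_{ik}(x) w_i \right) = J_{jk}(x)
\end{aligned} \]
In other words, the matrix of $Df(x)$ is nothing more than the matrix of partial
derivatives. The $k$th column of the matrix $(J_{ij})$ is
\[ \frac{\partial f}{\partial x_k}(x) = \lim_{t \to 0} \frac{f(x + te_k) - f(x)}{t} \equiv D_{e_k} f(x). \]
Thus the matrix of $Df(x)$ with respect to the usual basis vectors is the matrix
of the form
\[ \begin{pmatrix}
f_{1,x_1}(x) & f_{1,x_2}(x) & \cdots & f_{1,x_n}(x) \\
\vdots & \vdots & & \vdots \\
f_{m,x_1}(x) & f_{m,x_2}(x) & \cdots & f_{m,x_n}(x)
\end{pmatrix} \]
where the notation $g_{,x_k}$ denotes the $k$th partial derivative given by the limit,
\[ \lim_{t \to 0} \frac{g(x + te_k) - g(x)}{t} \equiv \frac{\partial g}{\partial x_k}. \]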
The above discussion is summarized in the following theorem.
Theorem 6.3.1 Let $f : \mathbb{F}^n \to \mathbb{F}^m$ and suppose $f$ is differentiable at $x$. Then all the
partial derivatives $\frac{\partial f_i(x)}{\partial x_j}$ exist and if $Jf(x)$ is the matrix of the linear
transformation, $Df(x)$ with respect to the standard basis vectors, then the $ij$th entry is given
by $\frac{\partial f_i}{\partial x_j}(x)$ also denoted as $f_{i,j}$ or $f_{i,x_j}$.
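The matrix in Theorem 6.3.1 can be approximated column by column with difference quotients. The sketch below is an illustration for real variables only: it swaps in a central difference $(f(x + he_k) - f(x - he_k))/(2h)$ in place of the one sided quotient in the definition, fixes a step `h` rather than taking a limit, and uses the hypothetical names `jacobian` and `example`.

```python
import math


def jacobian(f, x, h=1e-6):
    """Approximate the m x n matrix of partial derivatives of f at x.

    Column k is a central difference quotient in the direction e_k.
    This is a numerical stand-in for the limit, not the limit itself.
    """
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for k in range(n):
        forward = list(x)
        backward = list(x)
        forward[k] += h
        backward[k] -= h
        f_forward, f_backward = f(forward), f(backward)
        for i in range(m):
            J[i][k] = (f_forward[i] - f_backward[i]) / (2.0 * h)
    return J


def example(x):
    """Example map from R^2 to R^3 (chosen for this illustration)."""
    return [x[0] * x[1], x[0] + x[1] ** 2, math.sin(x[0])]


J = jacobian(example, [1.0, 2.0])
exact = [[2.0, 1.0], [1.0, 4.0], [math.cos(1.0), 0.0]]
```

Here the entry in row $i$ and column $j$ approximates $\partial f_i / \partial x_j$, matching the index convention of the theorem.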
Definition 6.3.2 In general, the symbol
\[ D_v f(x) \]
is defined by
\[ \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t} \]
where $t \in \mathbb{F}$. This is often called the Gateaux derivative.
What if all the partial derivatives of $f$ exist? Does it follow that $f$ is
differentiable? Consider the following function, $f : \mathbb{R}^2 \to \mathbb{R}$,
\[ f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0) \\ 0 & \text{if } (x, y) = (0, 0) \end{cases} \]
Then from the definition of partial derivatives,
\[ \lim_{h \to 0} \frac{f(h, 0) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0 \]
and
\[ \lim_{h \to 0} \frac{f(0, h) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0 \]
However $f$ is not even continuous at $(0, 0)$ which may be seen by considering the
behavior of the function along the line $y = x$ and along the line $x = 0$. By Lemma
6.1.3 this implies $f$ is not differentiable. Therefore, it is necessary to consider the
correct definition of the derivative given above if you want to get a notion which
generalizes the concept of the derivative of a function of one variable in such a way
as to preserve continuity whenever the function is differentiable.
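The behavior of this example is easy to see by evaluating it. The short sketch below is only an illustration (the names `f`, `quotient_x` and `quotient_y` are chosen here): both difference quotients at the origin are identically zero, yet along the line $y = x$ the function is constantly $1/2$, so it cannot be continuous at $(0, 0)$.

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2) away from the origin, and 0 at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x * x + y * y)


def quotient_x(h):
    """Difference quotient defining the partial derivative in x at (0, 0)."""
    return (f(h, 0.0) - f(0.0, 0.0)) / h


def quotient_y(h):
    """Difference quotient defining the partial derivative in y at (0, 0)."""
    return (f(0.0, h) - f(0.0, 0.0)) / h


steps = (1e-1, 1e-3, 1e-6)
partials_x = [quotient_x(h) for h in steps]
partials_y = [quotient_y(h) for h in steps]
along_diagonal = [f(t, t) for t in steps]   # values on the line y = x
along_axis = [f(0.0, t) for t in steps]     # values on the line x = 0
```

The two lists of function values approach different limits as the points approach the origin, which is the discontinuity described in the text.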
Lemma 6.4.1 Let $Y$ be a normed vector space and suppose $h : [0, 1] \to Y$ is
differentiable and satisfies
\[ \|h'(t)\| \le M. \]
Then
\[ \|h(1) - h(0)\| \le M. \]
Suppose $t < 1$. Then there exist positive numbers, $h_k$ decreasing to 0 such that
and now it follows from 6.4.5 and the triangle inequality that
and so
\[ \|h(t + h_k) - h(t)\| > (M + \varepsilon) h_k \]
Now dividing by $h_k$ and letting $k \to \infty$
\[ \|h'(t)\| \ge M + \varepsilon, \]
a contradiction.
Proof: Let
\[ h(t) \equiv f(x + t(y - x)). \]
Then by the chain rule,
\[ h'(t) = Df(x + t(y - x))(y - x) \]
and so
by Lemma 6.4.1
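6.5 Existence Of The Derivative, $C^1$ Functions
Theorem 6.5.1 Let $X$ be a normed vector space having basis $\{v_1, \cdots, v_n\}$ and let
$Y$ be another normed vector space having basis $\{w_1, \cdots, w_m\}$. Let $U$ be an open
set in $X$ and let $f : U \to Y$ have the property that the Gateaux derivatives,
\[ D_{v_k} f(x) \equiv \lim_{t \to 0} \frac{f(x + tv_k) - f(x)}{t} \]
exist and are continuous functions of $x$. Then $Df(x)$ exists and
\[ Df(x)v = \sum_{k=1}^{n} D_{v_k} f(x)\, a_k \]
where
\[ v = \sum_{k=1}^{n} a_k v_k. \]
Proof: Let $v = \sum_{k=1}^{n} a_k v_k$. Then
\[ f(x + v) - f(x) = f\left( x + \sum_{k=1}^{n} a_k v_k \right) - f(x). \]
Then letting $\sum_{k=1}^{0} \equiv 0$, $f(x + v) - f(x)$ is given by
\[ \begin{aligned}
&\sum_{k=1}^{n} \left[ f\left( x + \sum_{j=1}^{k} a_j v_j \right) - f\left( x + \sum_{j=1}^{k-1} a_j v_j \right) \right] \\
&= \sum_{k=1}^{n} \left[ f(x + a_k v_k) - f(x) \right] \\
&\quad + \sum_{k=1}^{n} \left[ \left( f\left( x + \sum_{j=1}^{k} a_j v_j \right) - f(x + a_k v_k) \right) - \left( f\left( x + \sum_{j=1}^{k-1} a_j v_j \right) - f(x) \right) \right]
\end{aligned} \quad (6.5.6) \]
Consider the $k$th term in 6.5.6. Let
\[ h(t) \equiv f\left( x + \sum_{j=1}^{k-1} a_j v_j + t a_k v_k \right) - f(x + t a_k v_k) \]
Now without loss of generality, it can be assumed the norm on $X$ is given by that
of Example 4.8.5,
\[ \|v\| \equiv \max\left\{ |a_k| : v = \sum_{k=1}^{n} a_k v_k \right\} \]
because by Theorem 4.8.4 all norms on $X$ are equivalent. Therefore, from 6.5.7 and
the assumption that the Gateaux derivatives are continuous,
\[ \|h'(t)\| = \left\| \left( D_{v_k} f\left( x + \sum_{j=1}^{k-1} a_j v_j + t a_k v_k \right) - D_{v_k} f(x + t a_k v_k) \right) a_k \right\| \le \varepsilon |a_k| \le \varepsilon \|v\| \]
provided $\|v\|$ is sufficiently small. Since $\varepsilon$ is arbitrary, it follows from Lemma 6.4.1
the expression in 6.5.6 is $o(v)$ because this expression equals a finite sum of terms
of the form $h(1) - h(0)$ where $\|h'(t)\| \le \varepsilon \|v\|$. Thus
\[ \begin{aligned}
f(x + v) - f(x) &= \sum_{k=1}^{n} \left[ f(x + a_k v_k) - f(x) \right] + o(v) \\
&= \sum_{k=1}^{n} D_{v_k} f(x)\, a_k + \sum_{k=1}^{n} \left[ f(x + a_k v_k) - f(x) - D_{v_k} f(x)\, a_k \right] + o(v).
\end{aligned} \]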
Defining
\[ Df(x)v \equiv \sum_{k=1}^{n} D_{v_k} f(x)\, a_k \]
where $v = \sum_k a_k v_k$, it follows $Df(x) \in \mathcal{L}(X, Y)$ and is given by the above formula.
It remains to verify $x \to Df(x)$ is continuous.
\[ \begin{aligned}
\|(Df(x) - Df(y))v\| &\le \sum_{k=1}^{n} \|(D_{v_k} f(x) - D_{v_k} f(y))\, a_k\| \\
&\le \max\{|a_k|, k = 1, \cdots, n\} \sum_{k=1}^{n} \|D_{v_k} f(x) - D_{v_k} f(y)\| \\
&= \|v\| \sum_{k=1}^{n} \|D_{v_k} f(x) - D_{v_k} f(y)\|
\end{aligned} \]
and so
\[ \|Df(x) - Df(y)\| \le \sum_{k=1}^{n} \|D_{v_k} f(x) - D_{v_k} f(y)\| \]
which proves the continuity of $Df$ because of the assumption the Gateaux derivatives
are continuous.
This motivates the following definition of what it means for a function to be $C^1$.
Now the following major theorem states these two definitions are equivalent.
Proof: It was shown in Theorem 6.5.1 that Definition 6.5.2 implies 6.5.3.
Suppose then that Definition 6.5.3 holds. Then if $v$ is any vector,
\[ x \to Df(x) \]
Thus,
\[ Df(x + v) - Df(x) = D^2 f(x)v + o(v). \]
This implies
\[ D^2 f(x) \in \mathcal{L}(X, \mathcal{L}(X, Y)), \quad D^2 f(x)(u)(v) \in Y, \]
and the map
\[ (u, v) \to D^2 f(x)(u)(v) \]
is a bilinear map having values in $Y$. In other words, the two functions,
with similar conventions for higher derivatives than 3. Another convention which
is often used is the notation
\[ D^k f(x) v^k \]
instead of
\[ D^k f(x)(v, \cdots, v). \]
Note that for every $k$, $D^k f$ maps $U$ to a normed vector space. As mentioned
above, $Df(x)$ has values in $\mathcal{L}(X, Y)$, $D^2 f(x)$ has values in $\mathcal{L}(X, \mathcal{L}(X, Y))$, etc.
Thus it makes sense to consider whether $D^k f$ is continuous. This is described in
the following definition.
Definition 6.6.2 Let $U$ be an open subset of $X$, a normed vector space and let
$f : U \to Y$. Then $f$ is $C^k(U)$ if $f$ and its first $k$ derivatives are all continuous. Also,
$D^k f(x)$ when it exists can be considered a $Y$ valued multilinear function.
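6.7 $C^k$ Functions
Recall that for a $C^1$ function, $f$
\[ \begin{aligned}
Df(x)v &= \sum_j D_{v_j} f(x)\, a_j = \sum_{ij} D_{v_j} f_i(x)\, w_i a_j \\
&= \sum_{ij} D_{v_j} f_i(x)\, w_i v_j \left( \sum_k a_k v_k \right) = \sum_{ij} D_{v_j} f_i(x)\, w_i v_j (v)
\end{aligned} \]
where $\sum_k a_k v_k = v$ and
\[ f(x) = \sum_i f_i(x) w_i. \quad (6.7.8) \]
This is because
\[ w_i v_j \left( \sum_k a_k v_k \right) \equiv \sum_k a_k w_i \delta_{jk} = w_i a_j. \]
Thus
\[ Df(x) = \sum_{ij} D_{v_j} f_i(x)\, w_i v_j \]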
I propose to iterate this observation, starting with f and then going to Df and
then D2 f and so forth. Hopefully it will yield a rational way to understand higher
order derivatives in the same way that matrices can be used to understand linear
transformations. Thus beginning with the derivative,
\[ Df(x) = \sum_{i j_1} D_{v_{j_1}} f_i(x)\, w_i v_{j_1}. \]
for any choice of $v_{j_1} v_{j_2} \cdots v_{j_k}$ and for any $k \le p$ exist and are continuous.
Proof: This follows from a repeated application of Theorems 6.5.1 and 6.5.4 at
each new differentiation.
Definition 6.7.2 Let $X, Y$ be finite dimensional normed vector spaces and let $U$
be an open set in $X$ and $f : U \to Y$ be a function,
\[ f(x) = \sum_i f_i(x) w_i \]
for any choice of $v_{j_1} v_{j_2} \cdots v_{j_k}$ where $\{v_1, \cdots, v_n\}$ is a basis for $X$ and for any
$k \le n$ exist and are continuous.
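A convenient notation which is often used which helps to make sense of higher
order partial derivatives is presented in the following definition.
\[ x = (x_1, \cdots, x_n), \]
\[ x^\alpha \equiv x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}, \quad D^\alpha f(x) \equiv \frac{\partial^{|\alpha|} f(x)}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}. \]
Then in this special case, the following definition is equivalent to the above as
a definition of what is meant by a $C^k$ function.
\[ U_y \equiv \{x \in X : (x, y) \in U\} \]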
Proof: Recall by Theorem 4.8.4 it does not matter how this norm is defined
and the definition above is convenient. It obviously satisfies most axioms of a norm.
The only one which is not obvious is the triangle inequality. I will show this now.
suppose then that $\|x\| + \|x_1\| \ge \|y\| + \|y_1\|$. Then the above equals
\[ \|x\| + \|x_1\| \le \max(\|x\|, \|y\|) + \max(\|x_1\|, \|y_1\|) \equiv \|(x, y)\| + \|(x_1, y_1)\| \]
\[ B((x, y), r) \subseteq U. \]
This says that if $(u, v) \in X \times Y$ such that $\|(u, v) - (x, y)\| < r$, then $(u, v) \in U$.
Thus if
\[ \|(u, y) - (x, y)\| = \|u - x\| < r, \]
then $(u, y) \in U$. This has just said that $B(x, r)$, the ball taken in $X$ is contained in
$U_y$.
Of course one could also consider
\[ U_x \equiv \{y : (x, y) \in U\} \]
in the same way and conclude this set is open in $Y$. Also, the generalization to
many factors yields the same conclusion. In this case, for $x \in \prod_{i=1}^{n} X_i$, let
\[ \|x\| \equiv \max\left( \|x_i\|_{X_i} : x = (x_1, \cdots, x_n) \right) \]
Then a similar argument to the above shows this is a norm on $\prod_{i=1}^{n} X_i$.
Corollary 6.8.2 Let $U \subseteq \prod_{i=1}^{n} X_i$ and let
\[ U_{(x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n)} \equiv \left\{ x \in \mathbb{F}^{r_i} : (x_1, \cdots, x_{i-1}, x, x_{i+1}, \cdots, x_n) \in U \right\}. \]
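of $v$. Thus, by saying
\[ z \to g(x_1, \cdots, x_{i-1}, z, x_{i+1}, \cdots, x_n) \]
where $v = (v_1, \cdots, v_n)$.
Proof: Suppose then that $D_i g$ exists and is continuous for each $i$. Note that
\[ \sum_{j=1}^{k} \theta_j v_j = (v_1, \cdots, v_k, 0, \cdots, 0). \]
Thus $\sum_{j=1}^{n} \theta_j v_j = v$ and define $\sum_{j=1}^{0} \theta_j v_j \equiv 0$. Therefore,
\[ g(x + v) - g(x) = \sum_{k=1}^{n} \left[ g\left( x + \sum_{j=1}^{k} \theta_j v_j \right) - g\left( x + \sum_{j=1}^{k-1} \theta_j v_j \right) \right] \quad (6.8.10) \]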
6.8. THE DERIVATIVE AND THE CARTESIAN PRODUCT 137
and the expression in 6.8.12 is of the form $h(v_k) - h(0)$ where for small $w \in X_k$,
\[ h(w) \equiv g\left( x + \sum_{j=1}^{k-1} \theta_j v_j + \theta_k w \right) - g(x + \theta_k w). \]
Therefore,
\[ Dh(w) = D_k g\left( x + \sum_{j=1}^{k-1} \theta_j v_j + \theta_k w \right) - D_k g(x + \theta_k w) \]
and by continuity, $\|Dh(w)\| < \varepsilon$ provided $\|v\|$ is small enough. Therefore, by
Theorem 6.4.2, whenever $\|v\|$ is small enough,
\[ \|h(v_k) - h(0)\| \le \varepsilon \|v_k\| \le \varepsilon \|v\| \]
which shows that since $\varepsilon$ is arbitrary, the expression in 6.8.12 is $o(v)$. Now in 6.8.11
\[ g(x + \theta_k v_k) - g(x) = D_k g(x) v_k + o(v_k) = D_k g(x) v_k + o(v). \]
Therefore, referring to 6.8.10,
\[ g(x + v) - g(x) = \sum_{k=1}^{n} D_k g(x) v_k + o(v) \]
which shows $Dg(x)$ exists and equals the formula given in 6.8.9.
Next suppose $g$ is $C^1$. I need to verify that $D_k g(x)$ exists and is continuous.
Let $v \in X_k$ sufficiently small. Then
\[ g(x + \theta_k v) - g(x) = Dg(x) \theta_k v + o(\theta_k v) = Dg(x) \theta_k v + o(v) \]
since $\|\theta_k v\| = \|v\|$. Then $D_k g(x)$ exists and equals
\[ Dg(x) \circ \theta_k \]
Now $x \to Dg(x)$ is continuous. Since $\theta_k$ is linear, it follows from Theorem 4.8.3
that $\theta_k : X_k \to \prod_{i=1}^{n} X_i$ is also continuous,
The way this is usually used is in the following corollary, a case of Theorem 6.8.5
obtained by letting Xi = F in the above theorem.
Proof: Since $U$ is open, there exists $r > 0$ such that $B((x, y), r) \subseteq U$. Now let
$|t|, |s| < r/2$, $t, s$ real numbers and consider
\[ \Delta(s, t) \equiv \frac{1}{st} \Big\{ \overbrace{f(x + t, y + s) - f(x + t, y)}^{h(t)} - \overbrace{(f(x, y + s) - f(x, y))}^{h(0)} \Big\}. \quad (6.9.13) \]
Note that $(x + t, y + s) \in U$ because
\[ |(x + t, y + s) - (x, y)| = |(t, s)| = \left( t^2 + s^2 \right)^{1/2} \le \left( \frac{r^2}{4} + \frac{r^2}{4} \right)^{1/2} = \frac{r}{\sqrt{2}} < r. \]
As implied above, $h(t) \equiv f(x + t, y + s) - f(x + t, y)$. Therefore, by the mean
value theorem and the (one variable) chain rule,
\[ \begin{aligned}
\Delta(s, t) &= \frac{1}{st} (h(t) - h(0)) = \frac{1}{st} h'(\alpha t)\, t \\
&= \frac{1}{s} \left( f_x(x + \alpha t, y + s) - f_x(x + \alpha t, y) \right)
\end{aligned} \]
for some $\alpha \in (0, 1)$. Applying the mean value theorem again,
\[ \Delta(s, t) = f_{xy}(x + \alpha t, y + \beta s) \]
6.9. MIXED PARTIAL DERIVATIVES 139
The following is obtained from the above by simply fixing all the variables except for the two of interest.
Corollary 6.9.2 Suppose $U$ is an open subset of $X$ and $f : U \to \mathbb{R}$ has the property that for two indices, $k, l$, $f_{x_k}$, $f_{x_l}$, $f_{x_l x_k}$, and $f_{x_k x_l}$ exist on $U$ and $f_{x_k x_l}$ and $f_{x_l x_k}$ are both continuous at $x \in U$. Then $f_{x_k x_l}(x) = f_{x_l x_k}(x)$.
By considering the real and imaginary parts of $f$ in the case where $f$ has values in $\mathbb{C}$ you obtain the following corollary.
Corollary 6.9.3 Suppose $U$ is an open subset of $\mathbb{F}^n$ and $f : U \to \mathbb{F}$ has the property that for two indices, $k, l$, $f_{x_k}$, $f_{x_l}$, $f_{x_l x_k}$, and $f_{x_k x_l}$ exist on $U$ and $f_{x_k x_l}$ and $f_{x_l x_k}$ are both continuous at $x \in U$. Then $f_{x_k x_l}(x) = f_{x_l x_k}(x)$.
Finally, by considering the components of $f$ you get the following generalization.
Corollary 6.9.4 Suppose $U$ is an open subset of $\mathbb{F}^n$ and $f : U \to \mathbb{F}^m$ has the property that for two indices, $k, l$, $f_{x_k}$, $f_{x_l}$, $f_{x_l x_k}$, and $f_{x_k x_l}$ exist on $U$ and $f_{x_k x_l}$ and $f_{x_l x_k}$ are both continuous at $x \in U$. Then $f_{x_k x_l}(x) = f_{x_l x_k}(x)$.
It is necessary to assume the mixed partial derivatives are continuous in order
to assert they are equal. The following is a well known example [3].
Example 6.9.5 Let
$$f(x,y) = \begin{cases} \dfrac{xy\left(x^2 - y^2\right)}{x^2 + y^2} & \text{if } (x,y) \ne (0,0) \\ 0 & \text{if } (x,y) = (0,0) \end{cases}$$
From the definition of partial derivatives it follows immediately that $f_x(0,0) = f_y(0,0) = 0$. Using the standard rules of differentiation, for $(x,y) \ne (0,0)$,
$$f_x = y\, \frac{x^4 - y^4 + 4x^2 y^2}{(x^2 + y^2)^2}, \qquad f_y = x\, \frac{x^4 - y^4 - 4x^2 y^2}{(x^2 + y^2)^2}$$
Now
$$f_{xy}(0,0) \equiv \lim_{y \to 0} \frac{f_x(0,y) - f_x(0,0)}{y} = \lim_{y \to 0} \frac{-y^4}{(y^2)^2} = -1$$
while
$$f_{yx}(0,0) \equiv \lim_{x \to 0} \frac{f_y(x,0) - f_y(0,0)}{x} = \lim_{x \to 0} \frac{x^4}{(x^2)^2} = 1$$
showing that although the mixed partial derivatives do exist at $(0,0)$, they are not equal there.
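The two limits in this example can also be checked numerically. The following is only a rough Python sketch, not part of the argument: it approximates the first partial derivatives by central differences and then forms the difference quotients at the origin. The step sizes h and k are assumed values chosen for illustration.

```python
def f(x, y):
    """The function of Example 6.9.5; it is defined to be 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)


def f_x(x, y, h=1e-6):
    """Central-difference approximation of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)


def f_y(x, y, h=1e-6):
    """Central-difference approximation of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)


def f_xy_at_origin(k=1e-2):
    """Difference quotient (f_x(0, k) - f_x(0, 0)) / k, which should be near -1."""
    return (f_x(0.0, k) - f_x(0.0, 0.0)) / k


def f_yx_at_origin(k=1e-2):
    """Difference quotient (f_y(k, 0) - f_y(0, 0)) / k, which should be near 1."""
    return (f_y(k, 0.0) - f_y(0.0, 0.0)) / k
```

The two quotients come out close to $-1$ and $1$ respectively, in agreement with the limits computed by hand.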
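and
$$\left\| (I - A)^{-1} \right\| \le (1 - r)^{-1}. \qquad (6.10.15)$$
Furthermore, if
$$\mathcal{I} \equiv \left\{ A \in \mathcal{L}(X,X) : A^{-1} \text{ exists} \right\}$$
the map $A \to A^{-1}$ is continuous on $\mathcal{I}$ and $\mathcal{I}$ is an open subset of $\mathcal{L}(X,X)$.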
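The estimate 6.10.15 is easy to test on a small matrix. The sketch below is only an illustration under stated assumptions: it assumes the usual hypothesis $\|A\| = r < 1$, it uses a hypothetical $2 \times 2$ matrix A, and it measures operators with the maximum absolute row sum, which is the operator norm induced by the max norm on $\mathbb{R}^2$. It also compares the inverse with a partial sum of the series $I + A + A^2 + \cdots$, a standard device which is not necessarily the one used in the proof.

```python
def mat_mul(P, Q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def op_norm_inf(P):
    """Maximum absolute row sum (operator norm induced by the max norm)."""
    return max(sum(abs(v) for v in row) for row in P)


def inverse_2x2(P):
    """Inverse of a 2 x 2 matrix by the cofactor formula."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def neumann_partial_sum(A, terms):
    """I + A + A^2 + ... + A^terms."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(terms):
        power = mat_mul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(2)] for i in range(2)]
    return total


A = [[0.2, -0.3], [0.1, 0.4]]          # hypothetical example with norm 1/2
r = op_norm_inf(A)
I_minus_A = [[1.0 - A[0][0], -A[0][1]], [-A[1][0], 1.0 - A[1][1]]]
inv = inverse_2x2(I_minus_A)
bound = 1.0 / (1.0 - r)                # right side of 6.10.15
series = neumann_partial_sum(A, 60)
```

For this matrix the norm of the inverse is below the bound $(1-r)^{-1} = 2$, and the partial sum agrees with the inverse to within rounding error.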
Then
$$(I - A)\left(\sum_{k=1}^{n} c_k v_k\right) = 0$$
and so, since $I - A$ is one to one,
$$\sum_{k=1}^{n} c_k v_k = 0$$
6.10. IMPLICIT FUNCTION THEOREM 141
which requires each $c_k = 0$ because the $\{v_k\}$ are independent. Hence $\{(I - A) v_k\}_{k=1}^{n}$ is a basis for $X$ because there are $n$ of these vectors and every basis has the same size. Therefore, if $y \in X$, there exist scalars, $c_k$ such that
$$y = \sum_{k=1}^{n} c_k (I - A) v_k = (I - A)\left(\sum_{k=1}^{n} c_k v_k\right)$$
so $(I - A)$ is onto as claimed. Thus $(I - A)^{-1} \in \mathcal{L}(X,X)$ and it remains to estimate its norm.
which shows the map which takes a linear transformation in I to its inverse is
continuous.
The next theorem is a very useful result in many areas. It will be used in this
section to give a short proof of the implicit function theorem but it is also useful
in studying differential equations and integral equations. It is sometimes called the
uniform contraction principle.
Theorem 6.10.2 Let $X, Y$ be finite dimensional normed vector spaces. Also let $E$ be a closed subset of $X$ and $F$ a closed subset of $Y$. Suppose for each $(x,y) \in E \times F$, $T(x,y) \in E$ and satisfies
Then for each $y \in F$ there exists a unique fixed point for $T(\cdot, y)$, $x \in E$, satisfying
$$T(x,y) = x \qquad (6.10.19)$$
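To get a feel for the uniform contraction principle, here is a minimal Python sketch of the fixed point iteration. Everything specific in it is an assumption made for illustration: the map $T(x,y) = \frac{1}{2}\cos x + y$ is a hypothetical example on $E = F = \mathbb{R}$, which contracts in $x$ with constant $r = 1/2$ and is Lipschitz in $y$ with constant $M = 1$. A standard consequence of such hypotheses is that the fixed point $x(y)$ satisfies $|x(y) - x(y')| \le \frac{M}{1-r}|y - y'|$, and the sketch lets you test this on sample values.

```python
import math


def T(x, y):
    """Hypothetical map: |T(x,y) - T(x',y)| <= |x - x'| / 2 and M = 1 in y."""
    return 0.5 * math.cos(x) + y


def fixed_point(y, x0=0.0, tol=1e-14, max_iter=200):
    """Iterate x_{k+1} = T(x_k, y) until successive iterates agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x, y)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x
```

For example, `fixed_point(0.3)` returns a number $x$ with $T(x, 0.3) = x$ up to rounding error, and moving $y$ by $0.01$ moves the fixed point by no more than $0.02$.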
$$\|x_{k+p} - x_k\| \le \sum_{i=1}^{p} \|x_{k+i} - x_{k+i-1}\| \le \sum_{i=1}^{p} r^{k+i-1} \|g(x_0) - x_0\| \le \frac{r^k \|g(x_0) - x_0\|}{1 - r}.$$
Since $0 < r < 1$, this shows that $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence. Therefore, by completeness of $E$ it converges to a point $x \in E$. To see $x$ is a fixed point, use the continuity of $g$ to obtain
f (x (y) , y) = 0. (6.10.22)
By Lemma 6.10.1 and the assumption that $D_1 f(x_0, y_0)^{-1}$ exists, it follows, $D_1 f(x,y)^{-1}$ exists and equals
$$\left(I - D_1 T(x,y)\right)^{-1} D_1 f(x_0, y_0)^{-1}$$
By the estimate of Lemma 6.10.1 and 6.10.24,
$$\left\| D_1 f(x,y)^{-1} \right\| \le 2 \left\| D_1 f(x_0, y_0)^{-1} \right\|. \qquad (6.10.27)$$
$$M \|y_2 - y_1\|. \qquad (6.10.29)$$
From now on assume $\|x - x_0\| < \delta$ and $\|y - y_0\| < \eta$ so that 6.10.29, 6.10.27, 6.10.28, 6.10.26, and 6.10.25 all hold. By 6.10.29, 6.10.26, 6.10.28, and the uniform contraction principle, Theorem 6.10.2 applied to $E \equiv \overline{B\left(x_0, \frac{5\delta}{6}\right)}$ and $F \equiv \overline{B(y_0, \eta)}$ implies that for each $y \in B(y_0, \eta)$, there exists a unique $x(y) \in B(x_0, \delta)$ (actually in $\overline{B\left(x_0, \frac{5\delta}{6}\right)}$) such that $T(x(y), y) = x(y)$ which is equivalent to
$$f(x(y), y) = 0.$$
Furthermore,
$$\|x(y) - x(y')\| \le 2M \|y - y'\|. \qquad (6.10.30)$$
This proves the implicit function theorem except for the verification that $y \to x(y)$ is $C^1$. This is shown next. Letting $v$ be sufficiently small, Theorem 6.8.5 and Theorem 6.4.2 imply
$$0 = f(x(y+v), y+v) - f(x(y), y) = D_1 f(x(y), y)\left(x(y+v) - x(y)\right) + D_2 f(x(y), y) v + o\left((x(y+v) - x(y), v)\right).$$
The last term in the above is $o(v)$ because of 6.10.30. Therefore, using 6.10.27, solve the above equation for $x(y+v) - x(y)$ and obtain
$$x(y+v) - x(y) = -D_1 f(x(y), y)^{-1} D_2 f(x(y), y) v + o(v)$$
Now it follows from the continuity of $D_2 f$, $D_1 f$, the inverse map, 6.10.30, and this formula for $Dx(y)$ that $x(\cdot)$ is $C^1(B(y_0, \eta))$.
The next theorem is a very important special case of the implicit function the-
orem known as the inverse function theorem. Actually one can also obtain the
implicit function theorem from the inverse function theorem. It is done this way in
[33] and in [2].
$$x_0 \in W \subseteq U, \qquad (6.10.33)$$
$$F(x,y) \equiv f(x) - y$$
The scalar valued entries of the matrix of $D_2 f(x(y), y)$ have the same differentiability as the function $y \to D_2 f(x(y), y)$. This is because the linear projection map $\pi_{ij}$ mapping $\mathcal{L}(Y,Z)$ to $\mathbb{F}$ given by $\pi_{ij} L \equiv L_{ij}$, the $ij^{th}$ entry of the matrix of $L$ with respect to the given bases, is continuous thanks to Theorem 4.8.3. Similar considerations apply to $D_1 f(x(y), y)$ and the entries of its matrix, $D_1 f(x(y), y)_{ij}$, taken with respect to suitable bases. From the formula for the inverse of a matrix, Theorem 5.2.14, the $ij^{th}$ entries of the matrix of $D_1 f(x(y), y)^{-1}$, $D_1 f(x(y), y)^{-1}_{ij}$, also have the same differentiability as $y \to D_1 f(x(y), y)$.
Now consider the formula for the derivative of the implicitly defined function in 6.10.31,
$$Dx(y) = -D_1 f(x(y), y)^{-1} D_2 f(x(y), y). \qquad (6.10.36)$$
The above derivative is in $\mathcal{L}(Y,X)$. Let $\{w_1, \cdots, w_m\}$ be a basis for $Y$ and let $\{v_1, \cdots, v_n\}$ be a basis for $X$. Letting $x_i$ be the $i^{th}$ component of $x$ with respect to the basis for $X$, it follows from Theorem 6.7.1, $y \to x(y)$ will be $C^k$ if all such Gateaux derivatives, $D_{w_{j_1} w_{j_2} \cdots w_{j_r}} x_i(y)$ exist and are continuous for $r \le k$ and for any $i$. Consider what is required for this to happen. By 6.10.36,
$$D_{w_j} x_i(y) = \sum_{k} \left(-D_1 f(x(y), y)^{-1}\right)_{ik} \left(D_2 f(x(y), y)\right)_{kj} \equiv G_1(x(y), y) \qquad (6.10.37)$$
Since a similar result holds for all $i$ and any choice of $w_j, w_k$, this shows $x$ is at least $C^2$. If $k \ge 3$, then another Gateaux derivative can be taken because then $(x, y, z) \to G_2(x, y, z)$ is $C^1$ and it has been established $Dx$ is $C^1$. Continuing this
6.11. TAYLORS FORMULA 147
way shows $D_{w_{j_1} w_{j_2} \cdots w_{j_r}} x_i(y)$ exists and is continuous for $r \le k$. This proves the following corollary to the implicit and inverse function theorems.
Corollary 6.10.5 In the implicit and inverse function theorems, you can replace $C^1$ with $C^k$ in the statements of the theorems for any $k \in \mathbb{N}$.
$$f : U \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$$
and $f(x_0, y_0) = 0$ while $f$ is $C^1$. How can you recognize the condition of the implicit function theorem which says $D_1 f(x_0, y_0)^{-1}$ exists? This is really not hard. You recall the matrix of the transformation $D_1 f(x_0, y_0)$ with respect to the usual basis vectors is
$$\begin{pmatrix} f_{1,x_1}(x_0, y_0) & \cdots & f_{1,x_n}(x_0, y_0) \\ \vdots & & \vdots \\ f_{n,x_1}(x_0, y_0) & \cdots & f_{n,x_n}(x_0, y_0) \end{pmatrix}$$
and so $D_1 f(x_0, y_0)^{-1}$ exists exactly when the determinant of the above matrix is nonzero. This is the condition to check. In the general case, you just need to verify $D_1 f(x_0, y_0)$ is one to one and this can also be accomplished by looking at the matrix of the transformation with respect to some bases on $X$ and $Z$.
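Here is a small Python sketch of this check in the simplest case $n = m = 1$. The function $f(x,y) = x^2 + y^2 - 1$ with $(x_0, y_0) = (1, 0)$ is a hypothetical example: the $1 \times 1$ matrix of $D_1 f(x_0, y_0)$ is $2x_0 = 2$, which has nonzero determinant, so $x$ is defined implicitly as a function of $y$ near $y_0$. The sketch solves $f(x,y) = 0$ for $x$ by Newton's method (an assumption of this sketch, not the method of the proof) and compares a difference quotient of $x(y)$ with the formula $Dx(y) = -D_1 f(x(y),y)^{-1} D_2 f(x(y),y)$.

```python
def f(x, y):
    """Hypothetical example: the unit circle written as f(x, y) = 0."""
    return x * x + y * y - 1.0


def d1f(x, y):
    """Partial derivative of f in x (the 1 x 1 matrix of D_1 f)."""
    return 2.0 * x


def d2f(x, y):
    """Partial derivative of f in y."""
    return 2.0 * y


def solve_x(y, x_start=1.0, steps=50):
    """Newton's method in x for fixed y, started near x_0 = 1."""
    x = x_start
    for _ in range(steps):
        x = x - f(x, y) / d1f(x, y)
    return x


def dx_formula(y):
    """Derivative of the implicit function: -D_1 f^{-1} D_2 f at (x(y), y)."""
    x = solve_x(y)
    return -d2f(x, y) / d1f(x, y)
```

At $y = 0.3$ the central difference quotient of `solve_x` and the value of `dx_formula` agree to many digits, as the formula predicts.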
$$h(1) = h(0) + \sum_{k=1}^{m} \frac{h^{(k)}(0)}{k!} + \frac{h^{(m+1)}(t)}{(m+1)!}.$$
Then $F(1) = F(0) = 0$. Therefore, by Rolle's theorem there exists $t$ between 0 and 1 such that $F'(t) = 0$. Thus,
$$0 = -F'(t) = h'(t) + \sum_{k=1}^{m} \frac{h^{(k+1)}(t)}{k!} (1-t)^k - \sum_{k=1}^{m} \frac{h^{(k)}(t)}{k!} k (1-t)^{k-1} - K(m+1)(1-t)^m$$
And so
$$= h'(t) + \sum_{k=1}^{m} \frac{h^{(k+1)}(t)}{k!} (1-t)^k - \sum_{k=0}^{m-1} \frac{h^{(k+1)}(t)}{k!} (1-t)^k - K(m+1)(1-t)^m$$
$$= h'(t) + \frac{h^{(m+1)}(t)}{m!} (1-t)^m - h'(t) - K(m+1)(1-t)^m$$
and so
$$K = \frac{h^{(m+1)}(t)}{(m+1)!}.$$
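The intermediate point $t$ in Taylor's formula can be located explicitly in a simple case. The Python sketch below takes $h(t) = e^t$, which is an assumed example and not one from the text. Since every derivative of $h$ is $e^t$, the formula reads $e = \sum_{k=0}^{m} \frac{1}{k!} + \frac{e^{t}}{(m+1)!}$, and solving for $t$ should give a number between 0 and 1.

```python
import math


def taylor_polynomial_at_1(m):
    """Sum of h^(k)(0) / k! for k = 0, ..., m when h(t) = exp(t)."""
    return sum(1.0 / math.factorial(k) for k in range(m + 1))


def intermediate_point(m):
    """Solve e - (Taylor polynomial) = exp(t) / (m+1)! for t."""
    remainder = math.e - taylor_polynomial_at_1(m)
    return math.log(remainder * math.factorial(m + 1))
```

For each small $m$ the value returned lies strictly between 0 and 1, as the theorem asserts it must.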
Now let $f : U \to \mathbb{R}$ where $U \subseteq X$ a normed vector space and suppose $f \in C^m(U)$. Let $x \in U$ and let $r > 0$ be such that
$$B(x, r) \subseteq U.$$
Then for $\|v\| < r$ consider
$$f(x + tv) - f(x) \equiv h(t)$$
for $t \in [0,1]$. Then by the chain rule,
$$h'(t) = Df(x + tv)(v), \quad h''(t) = D^2 f(x + tv)(v)(v)$$
and continuing in this way,
$$h^{(k)}(t) = D^{(k)} f(x + tv)(v)(v) \cdots (v) \equiv D^{(k)} f(x + tv) v^k.$$
It follows from Taylor's formula for a function of one variable given above that
$$f(x + v) = f(x) + \sum_{k=1}^{m} \frac{D^{(k)} f(x) v^k}{k!} + \frac{D^{(m+1)} f(x + tv) v^{m+1}}{(m+1)!}. \qquad (6.11.38)$$
$$f(x + v) = f(x) + Df(x) v + \frac{D^2 f(x + tv) v^2}{2}. \qquad (6.11.39)$$
Consider
$$v = \sum_{i=1}^{n} v_i e_i,$$
where
$$H_{ij}(x + tv) = D^2 f(x + tv)(e_i)(e_j) = \frac{\partial^2 f(x + tv)}{\partial x_j \partial x_i}.$$
Definition 6.11.3 The matrix whose $ij^{th}$ entry is $\frac{\partial^2 f(x)}{\partial x_j \partial x_i}$ is called the Hessian matrix, denoted as $H(x)$.
From Theorem 6.9.1, this is a symmetric real matrix, thus self adjoint. By the continuity of the second partial derivative,
$$f(x + v) = f(x) + Df(x) v + \frac{1}{2} v^T H(x) v + \frac{1}{2}\left( v^T \left(H(x + tv) - H(x)\right) v \right). \qquad (6.11.40)$$
where the last two terms involve ordinary matrix multiplication and
$$v^T = (v_1 \cdots v_n)$$
Then $Df_i(0,0) = 0$ and for both functions, the Hessian matrix evaluated at $(0,0)$ equals
$$\begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}$$
but the behavior of the two functions is very different near the origin. The second has a saddle point while the first has a minimum there.
f (x) = a
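$$g_i(x) = 0, \quad i = 1, \cdots, m.$$
$x_0$ is a local maximum if $f(x_0) \ge f(x)$ for all $x$ near $x_0$ which also satisfies the constraints 6.12.41. A local minimum is defined similarly. Let $F : U \times \mathbb{R} \to \mathbb{R}^{m+1}$ be defined by
$$F(x, a) \equiv \begin{pmatrix} f(x) - a \\ g_1(x) \\ \vdots \\ g_m(x) \end{pmatrix}. \qquad (6.12.42)$$
Now consider the $(m+1) \times n$ Jacobian matrix, the matrix of the linear transformation, $D_1 F(x, a)$ with respect to the usual basis for $\mathbb{R}^n$ and $\mathbb{R}^{m+1}$.
$$\begin{pmatrix} f_{x_1}(x_0) & \cdots & f_{x_n}(x_0) \\ g_{1x_1}(x_0) & \cdots & g_{1x_n}(x_0) \\ \vdots & & \vdots \\ g_{mx_1}(x_0) & \cdots & g_{mx_n}(x_0) \end{pmatrix}.$$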
F (x,a) = 0 (6.12.43)
$$\lambda_1, \cdots, \lambda_m,$$
at every point x0 which is either a local maximum or a local minimum. This proves
the following theorem.
6.13 Exercises
1. Suppose $L \in \mathcal{L}(X, Y)$ and suppose $L$ is one to one. Show there exists $r > 0$ such that for all $x \in X$,
$$\|Lx\| \ge r \|x\|.$$
Hint: You might argue that $|||x||| \equiv \|Lx\|$ is a norm.
2. Show every polynomial, $\sum_{|\alpha| \le k} d_\alpha x^\alpha$ is $C^k$ for every $k$.
5. The existence of partial derivatives does not imply continuity as was shown in an example. However, much more can be said than this. Consider
$$f(x,y) = \begin{cases} \dfrac{\left(x^2 - y^4\right)^2}{\left(x^2 + y^4\right)^2} & \text{if } (x,y) \ne (0,0), \\ 1 & \text{if } (x,y) = (0,0). \end{cases}$$
Show each Gateaux derivative, $D_v f(0)$ exists and equals 0 for every $v$. Also show each Gateaux derivative exists at every other point in $\mathbb{R}^2$. Now consider the curve $x^2 = y^4$ and the curve $y = 0$ to verify the function fails to be continuous at $(0,0)$. This is an example of an everywhere Gateaux differentiable function which is not differentiable and not continuous.
Determine whether $f$ is continuous at $(0,0)$. Find $f_x(0,0)$ and $f_y(0,0)$. Are the partial derivatives of $f$ continuous at $(0,0)$? Find
$$D_{(u,v)} f((0,0)), \quad \lim_{t \to 0} \frac{f(t(u,v))}{t}.$$
Is the mapping $(u,v) \to D_{(u,v)} f((0,0))$ linear? Is $f$ differentiable at $(0,0)$?
whenever $t \in [0,1]$. Suppose also that $f$ is differentiable. Show then that for every $x, y \in V$,
$$\left(Df(x) - Df(y)\right)(x - y) \ge 0.$$
$$(u \cdot v(x)) = Df(x) u.$$
This special vector is called the gradient and is usually denoted by $\nabla f(x)$.
Hint: You might review the Riesz representation theorem presented earlier.
Hint: Consider $Tx = f(x) - Lx$ and argue $\|DT(x)\| < k$. Then consider Theorem 6.4.2.
11. Let $U$ be an open subset of $X$, $f : U \to Y$ where $X, Y$ are finite dimensional normed vector spaces and suppose $f \in C^1(U)$ and $Df(x_0)$ is one to one. Then show $f$ is one to one near $x_0$. Hint: Show using the assumption that $f$ is $C^1$ that there exists $\delta > 0$ such that if
$$x_1, x_2 \in B(x_0, \delta),$$
then
$$|f(x_1) - f(x_2) - Df(x_0)(x_1 - x_2)| \le \frac{r}{2} |x_1 - x_2| \qquad (6.13.47)$$
then use Problem 1.
12. Suppose $M \in \mathcal{L}(X, Y)$ and suppose $M$ is onto. Show there exists $L \in \mathcal{L}(Y, X)$ such that
$$LMx = Px$$
where $P \in \mathcal{L}(X, X)$, and $P^2 = P$. Also show $L$ is one to one and onto. Hint: Let $\{y_1, \cdots, y_m\}$ be a basis of $Y$ and let $M x_i = y_i$. Then define
$$Ly = \sum_{i=1}^{m} \alpha_i x_i \text{ where } y = \sum_{i=1}^{m} \alpha_i y_i.$$
Show $\{x_1, \cdots, x_m\}$ is a linearly independent set and show you can obtain $\{x_1, \cdots, x_m, \cdots, x_n\}$, a basis for $X$ in which $M x_j = 0$ for $j > m$. Then let
$$Px \equiv \sum_{i=1}^{m} \alpha_i x_i$$
where
$$x = \sum_{i=1}^{m} \alpha_i x_i.$$
15. Let $f : U \to Y$, $Df(x)$ exists for all $x \in U$, $B(x_0, \delta) \subseteq U$, and there exists $L \in \mathcal{L}(X, Y)$, such that $L^{-1} \in \mathcal{L}(Y, X)$, and for all $x \in B(x_0, \delta)$
$$\|Df(x) - L\| < \frac{r}{\|L^{-1}\|}, \quad r < 1.$$
Show that there exists $\varepsilon > 0$ and an open subset of $B(x_0, \delta)$, $V$, such that $f : V \to B(f(x_0), \varepsilon)$ is one to one and onto. Also $Df^{-1}(y)$ exists for each $y \in B(f(x_0), \varepsilon)$ and is given by the formula
$$Df^{-1}(y) = \left[ Df\left(f^{-1}(y)\right) \right]^{-1}.$$
Hint: Let
$$T_y(x) \equiv T(x, y) \equiv x - L^{-1}(f(x) - y)$$
for $|y - f(x_0)| < \frac{(1-r)\delta}{2\|L^{-1}\|}$, consider $\{T_y^n(x_0)\}$. This is a version of the inverse
16. Recall the $n^{th}$ derivative can be considered a multilinear function defined on $X^n$ with values in some normed vector space. Now define a function denoted as $w_i v_{j_1} \cdots v_{j_n}$ which maps $X^n \to Y$ in the following way
$$w_i \sum_{k_1 k_2 \cdots k_n} \left(a_{k_1} a_{k_2} \cdots a_{k_n}\right) \delta_{j_1 k_1} \delta_{j_2 k_2} \cdots \delta_{j_n k_n} = w_i a_{j_1} a_{j_2} \cdots a_{j_n} \qquad (6.13.49)$$
Show each $w_i v_{j_1} \cdots v_{j_n}$ is an $n$ linear $Y$ valued function. Next show the set of $n$ linear $Y$ valued functions is a vector space and these special functions, $w_i v_{j_1} \cdots v_{j_n}$ for all choices of $i$ and the $j_k$ is a basis of this vector space. Find the dimension of the vector space.
17. Minimize $\sum_{j=1}^{n} x_j$ subject to the constraint $\sum_{j=1}^{n} x_j^2 = a^2$. Your answer should be some function of $a$ which you may assume is a positive number.
18. Find the point, $(x, y, z)$ on the level surface, $4x^2 + y^2 - z^2 = 1$ which is closest to $(0,0,0)$.
19. A curve is formed from the intersection of the plane, $2x + 3y + z = 3$ and the cylinder $x^2 + y^2 = 4$. Find the point on this curve which is closest to $(0,0,0)$.
20. A curve is formed from the intersection of the plane, $2x + 3y + z = 3$ and the sphere $x^2 + y^2 + z^2 = 16$. Find the point on this curve which is closest to $(0,0,0)$.
21. Find the point on the plane, $2x + 3y + z = 4$ which is closest to the point $(1, 2, 3)$.
22. Let $A = (A_{ij})$ be an $n \times n$ matrix which is symmetric. Thus $A_{ij} = A_{ji}$ and recall $(Ax)_i = A_{ij} x_j$ where as usual sum over the repeated index. Show $\frac{\partial}{\partial x_i}(A_{ij} x_j x_i) = 2 A_{ij} x_j$. Show that when you use the method of Lagrange multipliers to maximize the function, $A_{ij} x_j x_i$ subject to the constraint, $\sum_{j=1}^{n} x_j^2 = 1$, the value of $\lambda$ which corresponds to the maximum value of this function is such that $A_{ij} x_j = \lambda x_i$. Thus $Ax = \lambda x$. Thus $\lambda$ is an eigenvalue of the matrix, $A$.
24. Let $f(x_1, \cdots, x_n) = x_1^n x_2^{n-1} \cdots x_n^1$. Then $f$ achieves a maximum on the set
$$S \equiv \left\{ x \in \mathbb{R}^n : \sum_{i=1}^{n} i x_i = 1 \text{ and each } x_i \ge 0 \right\}.$$
and there exist values of the $x_i$ for which equality holds. This says the geometric mean is always smaller than the arithmetic mean.
27. Maximize $x^2 y^2$ subject to the constraint
$$\frac{x^{2p}}{p} + \frac{y^{2q}}{q} = r^2$$
where $p, q$ are real numbers larger than 1 which have the property that
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Show the maximum is achieved when $x^{2p} = y^{2q}$ and equals $r^2$. Now conclude that if $x, y > 0$, then
$$xy \le \frac{x^p}{p} + \frac{y^q}{q}$$
and there are values of $x$ and $y$ where this inequality is an equation.
Part II
Measures And Measurable
Functions
The integral to be discussed next is the Lebesgue integral. This integral is more
general than the Riemann integral of beginning calculus. It is not as easy to define
as this integral but it is vastly superior in every application. In fact, the Riemann
integral has been obsolete for over 100 years. There exist convergence theorems
for this integral which are not available for the Riemann integral and unlike the
Riemann integral, the Lebesgue integral generalizes readily to abstract settings used
in probability theory. Much of the analysis done in the last 100 years applies to the
Lebesgue integral. For these reasons, and because it is very easy to generalize the
Lebesgue integral to functions of many variables I will present the Lebesgue integral
here. First it is convenient to discuss outer measures, measures, and measurable
functions in a general setting.
First here is a very interesting lemma about the existence of something called a
Lebesgue number, the number r in the next lemma.
Lemma 7.1.2 Let $K$ be a sequentially compact set in a normed vector space and let $\mathcal{U}$ be an open cover of $K$. Then there exists $r > 0$ such that if $x \in K$, then $B(x, r)$ is a subset of some set of $\mathcal{U}$.
contrary to the choice of $x_{n_k}$ which required $B(x_{n_k}, 1/n_k)$ is not contained in any set of $\mathcal{U}$.
$$B(x_i, r) \subseteq U_i \in \mathcal{U}.$$
If their union contains $K$ then $\{U_i\}_{i=1}^{m}$ is a finite subcover of $\mathcal{U}$. If $\{B(x_i, r)\}_{i=1}^{m}$ does not cover $K$, then there exists $x_{m+1} \notin \cup_{i=1}^{m} B(x_i, r)$ and so $B(x_{m+1}, r) \subseteq U_{m+1} \in \mathcal{U}$. This process must stop after finitely many choices of $B(x_i, r)$ because if not, $\{x_k\}_{k=1}^{\infty}$ would have a subsequence which converges to a point of $K$ which cannot occur because whenever $i \ne j$,
$$\|x_i - x_j\| > r$$
Therefore, eventually
$$K \subseteq \cup_{k=1}^{m} B(x_k, r) \subseteq \cup_{k=1}^{m} U_k.$$
$\cup_{k=m}^{\infty} \{x_k\} \equiv C_m$ is a closed set, closed because it contains all its limit points. (It has no limit points so it contains them all.) Then letting $U_m = C_m^C$, it follows $\{U_m\}$ is an open cover of $K$ which has no finite subcover. Thus $K$ must be sequentially compact after all.
If $K$ is a closed and bounded set in a finite dimensional normed vector space, then $K$ is sequentially compact by Theorem 4.8.4. Therefore, by the first part of this theorem, it is compact.
Summarizing the above theorem along with Theorem 4.8.4 yields the following
corollary which is often called the Heine Borel theorem.
Corollary 7.1.4 Let $X$ be a finite dimensional normed vector space and let $K \subseteq X$. Then the following are equivalent.
Lemma 7.1.5 Let $\|x\| \equiv \max\{|x_i|, i = 1, 2, \cdots, n\}$ for $x \in \mathbb{F}^n$. Then every set $U$ which is open in $\mathbb{F}^n$ is the countable union of balls of the form $B(x, r)$ where the open ball is defined in terms of the above norm. Also, if $\mathcal{C}$ is any collection of open sets, then there exists a countable subset $\mathcal{C}' \subseteq \mathcal{C}$ such that
$$\cup \mathcal{C}' = \cup \mathcal{C}.$$
Proof: By Theorem 4.8.3 if you consider the two normed vector spaces $(\mathbb{F}^n, |\cdot|)$ and $(\mathbb{F}^n, \|\cdot\|)$, the identity map is continuous in both directions. Therefore, if a set $U$ is open with respect to $|\cdot|$, it follows it is open with respect to $\|\cdot\|$ and the other way around. The other thing to notice is that there exists a countable dense subset of $\mathbb{F}$. The rationals will work if $\mathbb{F} = \mathbb{R}$ and if $\mathbb{F} = \mathbb{C}$, then you use $\mathbb{Q} + i\mathbb{Q}$. Letting $D$ be a countable dense subset of $\mathbb{F}$, $D^n$ is a countable dense subset of $\mathbb{F}^n$. It is countable because it is a finite Cartesian product of countable sets and you can use Theorem 1.1.7 of Page 18 repeatedly. It is dense because if $x \in \mathbb{F}^n$, then by density of $D$, there exists $d_j \in D$ such that
$$|d_j - x_j| < \varepsilon$$
$$\mathcal{B} \equiv \{B(d, r) : d \in D^n, r \in \mathbb{Q}\}.$$
This collection of open balls is countable by Theorem 1.1.7 of Page 18. I claim every open set is the union of balls from $\mathcal{B}$. Let $U$ be an open set in $\mathbb{F}^n$ and $x \in U$. Then there exists $\delta > 0$ such that $B(x, \delta) \subseteq U$. There exists $d \in D^n \cap B(x, \delta/5)$. Then pick rational number $\delta/5 < r < 2\delta/5$. Consider the set of $\mathcal{B}$, $B(d, r)$. Then $x \in B(d, r)$ because $r > \delta/5$. However, it is also the case that $B(d, r) \subseteq B(x, \delta)$ because if $y \in B(d, r)$ then
C C C
The last condition which says you can reduce to a countable sub covering is
called the Lindelöf property.
Thus one of these is the limit from the left and the other is the limit from the right.
In words, you look at all coverings of $A$ with open intervals. For each of these
open coverings, you add the lengths of the individual open intervals and you take
the infimum of all such numbers obtained.
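The recipe in words can be tried out on a very small example. The Python sketch below makes two assumptions which are not forced by the text: the integrator is taken to be $F(x) = x$, so the number attached to an interval $(a,b)$ is just its length $b - a$, and the set to be covered is a hypothetical finite list of rational points. Covering the $k^{th}$ point with an interval of length $\varepsilon / 2^{k+1}$ gives a total below $\varepsilon$, so the infimum over all coverings is 0. The same covering works for any countable set.

```python
from fractions import Fraction


def cover_of_points(points, eps):
    """Open interval of length eps / 2^(k+1) centered at the k-th point."""
    intervals = []
    for k, p in enumerate(points):
        half = Fraction(eps) / 2 ** (k + 2)
        intervals.append((p - half, p + half))
    return intervals


def total_length(intervals):
    """Sum of b - a over the intervals (a, b), assuming F(x) = x."""
    return sum(b - a for a, b in intervals)
```

Exact rational arithmetic is used so that the comparison with $\varepsilon$ is not blurred by rounding.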
Then 1.) is obvious because if a countable collection of open intervals covers $B$, then it also covers $A$. Thus the set of numbers obtained for $B$ is smaller than the set of numbers for $A$. Why is $\mu(\emptyset) = 0$? Pick a point of continuity of $F$. Such points exist because $F$ is increasing and so it has only countably many points of discontinuity. Let $a$ be this point. Then $\emptyset \subseteq (a - \delta, a + \delta)$ and so $\mu(\emptyset) \le F(a + \delta) - F(a - \delta)$ for every $\delta > 0$. Letting $\delta \to 0$, it follows that $\mu(\emptyset) = 0$.
Consider 2.). If any $\mu(A_i) = \infty$, there is nothing to prove. The assertion simply is $\infty \le \infty$. Assume then that $\mu(A_i) < \infty$ for all $i$. Then for each $m \in \mathbb{N}$ there exists a countable set of open intervals, $\{(a_i^m, b_i^m)\}_{i=1}^{\infty}$ such that
$$\mu(A_m) + \frac{\varepsilon}{2^m} > \sum_{i=1}^{\infty} \left(F(b_i^m -) - F(a_i^m +)\right).$$
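$$\mu\left(\cup_{m=1}^{\infty} A_m\right) \le \sum_{i,m} \left(F(b_i^m -) - F(a_i^m +)\right) = \sum_{m=1}^{\infty} \sum_{i=1}^{\infty} \left(F(b_i^m -) - F(a_i^m +)\right)$$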
By Theorem 7.1.3, finitely many of these intervals also cover $[a, b]$. It follows there exist finitely many of these intervals, denoted as $\{(a_i, b_i)\}_{i=1}^{n}$, which overlap, such
It follows
n
(F (bi ) F (ai +)) ([a, b])
i=1
n
(F (bi ) F (ai +))
i=1
F (b+) F (a)
Therefore,
Letting 0,
F (b+) F ((a + ))
F (b+) F ((a + ) )
= ([a + , b]) ((a, b])
((a, b + )) = F ((b + ) ) F (a+)
F (b + ) F (a+) .
7.3. MEASURES AND MEASURE SPACES 167
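$$\emptyset, \Omega \in \mathcal{S},$$
If $E \in \mathcal{S}$ then $E^C \in \mathcal{S}$
and
If $E_i \in \mathcal{S}$, for $i = 1, 2, \cdots$, then $\cup_{i=1}^{\infty} E_i \in \mathcal{S}$.
The following lemma is obvious and its proof is left for you.
Lemma 7.3.2 Let $\mathcal{G}$ denote a set whose elements are $\sigma$ algebras. Then $\cap \mathcal{G}$ is also a $\sigma$ algebra.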
Definition 7.3.3 In any topological space, the Borel sets consist of the smallest $\sigma$ algebra which contains the open sets. Thus denoting by $\tau$ the collection of open sets, the Borel sets, written $\mathcal{B}(\tau)$.
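Theorem 7.3.4 Let $\{E_m\}_{m=1}^{\infty}$ be a sequence of measurable sets in a measure space $(\Omega, \mathcal{F}, \mu)$. Then if $\cdots E_n \subseteq E_{n+1} \subseteq E_{n+2} \subseteq \cdots$,
$$\mu\left(\cup_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} \mu(E_n) \qquad (7.3.1)$$
and if $\cdots E_n \supseteq E_{n+1} \supseteq E_{n+2} \supseteq \cdots$ and $\mu(E_1) < \infty$, then
$$\mu\left(\cap_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} \mu(E_n). \qquad (7.3.2)$$
Stated more succinctly, $E_k \uparrow E$ implies $\mu(E_k) \uparrow \mu(E)$ and $E_k \downarrow E$ with $\mu(E_1) < \infty$ implies $\mu(E_k) \downarrow \mu(E)$.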
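$$+ \sum_{k=1}^{\infty} \left(\mu(E_{k+1}) - \mu(E_k)\right) = \mu(E_1) + \lim_{n \to \infty} \sum_{k=1}^{n} \left(\mu(E_{k+1}) - \mu(E_k)\right) = \lim_{n \to \infty} \mu(E_{n+1}).$$
This shows part 7.3.1.
To verify 7.3.2,
$$\mu(E_1) = \mu\left(\cap_{i=1}^{\infty} E_i\right) + \mu\left(E_1 \setminus \cap_{i=1}^{\infty} E_i\right)$$
so by 7.3.1,
$$\mu(E_1) - \mu\left(\cap_{i=1}^{\infty} E_i\right) = \mu\left(E_1 \setminus \cap_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} \mu\left(E_1 \setminus \cap_{i=1}^{n} E_i\right)$$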
7.4. MEASURES FROM OUTER MEASURES 169
$$\lim_{n \to \infty} \mu(E_n) = \mu\left(\cap_{i=1}^{\infty} E_i\right).$$
Definition 7.3.5 If something happens except for on a set of measure zero, then it is said to happen a.e. "almost everywhere". For example, $\{f_k(x)\}$ is said to converge to $f(x)$ a.e. if there is a set of measure zero, $N$ such that if $x \notin N$, then $f_k(x) \to f(x)$.
Definition 7.4.1 Let $\Omega$ be a nonempty set and let $\mu : \mathcal{P}(\Omega) \to [0, \infty]$ be an outer measure. For $E \subseteq \Omega$, $E$ is $\mu$ measurable if for all $S \subseteq \Omega$,
The next lemma indicates that the property of measurability is not lost by
considering this restricted measure.
involved was either oil, bread, flour or fish. In mathematics such things have also been done with
sets. In the book by Bruckner, Bruckner and Thompson there is an interesting discussion of the
Banach Tarski paradox which says it is possible to divide a ball in $\mathbb{R}^3$ into five disjoint pieces and
assemble the pieces to form two disjoint balls of the same size as the first. The details can be
found in: The Banach Tarski Paradox by Wagon, Cambridge University Press, 1985. It is known
that all such examples must involve the axiom of choice.
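If $\cdots F_n \subseteq F_{n+1} \subseteq \cdots$, then if $F = \cup_{n=1}^{\infty} F_n$ and $F_n \in \mathcal{S}$, it follows that
$$\mu(F) = \lim_{n \to \infty} \mu(F_n). \qquad (7.4.6)$$
If $\cdots F_n \supseteq F_{n+1} \supseteq \cdots$, and if $F = \cap_{n=1}^{\infty} F_n$ for $F_n \in \mathcal{S}$ then if $\mu(F_1) < \infty$,
$$\mu(F) = \lim_{n \to \infty} \mu(F_n). \qquad (7.4.7)$$
This measure space is also complete which means that if $\mu(F) = 0$ for some $F \in \mathcal{S}$ then if $G \subseteq F$, it follows $G \in \mathcal{S}$ also.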
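Proof: First note that $\emptyset$ and $\Omega$ are obviously in $\mathcal{S}$. Now suppose $A, B \in \mathcal{S}$. I will show $A \setminus B \equiv A \cap B^C$ is in $\mathcal{S}$. To do so, consider the following picture.
[Figure: the sets $A$ and $B$ overlapping inside $S$, with the four regions labeled $S \cap A^C \cap B^C$, $S \cap A^C \cap B$, $S \cap A \cap B^C$, and $S \cap A \cap B$.]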
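$$S \setminus (A \setminus B) = (S \setminus A) \cup (S \cap B) = (S \setminus A) \cup (A \cap B \cap S).$$
By induction, if $A_i \cap A_j = \emptyset$ and $A_i \in \mathcal{S}$,
$$\mu\left(\cup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \mu(A_i). \qquad (7.4.8)$$
Now let $A = \cup_{i=1}^{\infty} A_i$ where $A_i \cap A_j = \emptyset$ for $i \ne j$.
$$\sum_{i=1}^{\infty} \mu(A_i) \ge \mu(A) \ge \mu\left(\cup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \mu(A_i).$$
Since this holds for all $n$, you can take the limit as $n \to \infty$ and conclude,
$$\sum_{i=1}^{\infty} \mu(A_i) = \mu(A)$$
Therefore, letting
$$F \equiv \cup_{k=1}^{\infty} F_k$$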
In order to establish 7.4.7, let the $F_n$ be as given there. Then, since $(F_1 \setminus F_n)$ increases to $(F_1 \setminus F)$, 7.4.6 implies
which implies
$$\lim_{n \to \infty} \mu(F_n) \le \mu(F).$$
But since $F \subseteq F_n$,
$$\mu(F) \le \lim_{n \to \infty} \mu(F_n)$$
and this establishes 7.4.7. Note that it was assumed $\mu(F_1) < \infty$ because $\mu(F_1)$ was subtracted from both sides.
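It remains to show $\mathcal{S}$ is closed under countable unions. Recall that if $A \in \mathcal{S}$, then $A^C \in \mathcal{S}$ and $\mathcal{S}$ is closed under finite unions. Let $A_i \in \mathcal{S}$, $A = \cup_{i=1}^{\infty} A_i$, $B_n = \cup_{i=1}^{n} A_i$. Then
$$\mu(S) = \mu(S \cap B_n) + \mu(S \setminus B_n) \qquad (7.4.9)$$
$$= (\mu \lfloor S)(B_n) + (\mu \lfloor S)(B_n^C).$$
$$\mu(S) \ge \mu(S \cap G) + \mu(S \setminus G)$$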
7.5. MEASURABLE FUNCTIONS 173
However,
$$\mu(S \cap G) + \mu(S \setminus G) \le \mu(S \cap F) + \mu(S \setminus F) + \mu(F \setminus G) = \mu(S \cap F) + \mu(S \setminus F) = \mu(S)$$
because by assumption, $\mu(F \setminus G) \le \mu(F) = 0$.
(E \ E ) (E \ E) N F
(E \ E ) , (E \ E) F
E = (E E ) (E \ E) F
The measure which results from the outer measure of Theorem 7.2.3 is called
the Lebesgue Stieltjes measure associated with the integrator function F . Its prop-
erties will be discussed more later.
Proof: First note that the first and the third are equivalent. To see this, observe
$$f^{-1}([d, \infty]) = \cap_{n=1}^{\infty} f^{-1}((d - 1/n, \infty]),$$
$$f^{-1}((d, \infty]) = \cup_{n=1}^{\infty} f^{-1}([d + 1/n, \infty]),$$
so the first and fourth conditions are equivalent. Thus the first four conditions are equivalent and if any of them hold, then for $-\infty < a < b < \infty$,
and so the third condition holds. Therefore, all five conditions are equivalent.
This lemma allows for the following definition of a measurable function having values in $(-\infty, \infty]$.
In the case of $(-\infty, \infty]$, you can start with the general definition of measurability just given and conclude that all of the equivalent conclusions in the above lemma hold. This was contained in the above proof.
the $c_k$ being the distinct values of $s$, $E_k$ being the set on which $s = c_k$. A nonnegative simple function is one in which the $c_k \in [0, \infty]$.
Lemma 7.5.5 Let $s(\omega) = \sum_{k=1}^{m} c_k \mathcal{X}_{E_k}(\omega)$ where the $c_k$ are the distinct values of $s$, the $E_k$ being disjoint. Then $s$ is measurable if and only if each $E_k$ is measurable.
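$$s_n(\omega) \le s_{n+1}(\omega)$$
$$f(\omega) = \lim_{n \to \infty} s_n(\omega) \text{ for all } \omega \in \Omega. \qquad (7.5.11)$$
Letting $I \equiv \{\omega : f(\omega) = \infty\}$, define
$$t_n(\omega) = \sum_{k=0}^{2^n} \frac{k}{n} \mathcal{X}_{f^{-1}([k/n, (k+1)/n))}(\omega) + n \mathcal{X}_I(\omega).$$
$$0 \le f(\omega) - t_n(\omega) \le \frac{1}{n}. \qquad (7.5.12)$$
Thus whenever $\omega \notin I$, the above inequality will hold for all $n$ large enough. Let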
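The approximating functions $t_n$ are easy to evaluate pointwise, and doing so makes the estimate 7.5.12 concrete. The following Python sketch is only an illustration: it works with the value of $f$ at a single point rather than with a function on $\Omega$, it assumes that value is a nonnegative rational number or infinity, and it uses exact fractions so the inequality can be tested without rounding. The estimate is expected to hold as soon as $f(\omega) < (2^n + 1)/n$, which is what "for all $n$ large enough" amounts to.

```python
import math
from fractions import Fraction


def t_n(value, n):
    """Value of t_n at a point where f equals `value` (a Fraction or math.inf)."""
    if value == math.inf:
        return Fraction(n)            # the term n * X_I
    k = math.floor(value * n)         # value lies in [k/n, (k+1)/n)
    if k <= 2 ** n:
        return Fraction(k, n)
    return Fraction(0)                # value is beyond the range of the sum
```

For instance, with the value $22/7$ and $n = 4$ the sketch returns $12/4 = 3$, and the difference $22/7 - 3 = 1/7$ is no larger than $1/4$.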
$$f^+ \equiv \max(f, 0) = \frac{|f| + f}{2},$$
$$f^- \equiv f^+ - f = -\min(f, 0) = \frac{|f| - f}{2}.$$
Thus $f^+ - f^- = f$ and both $f^+$ and $f^-$ are nonnegative.
Observation 7.5.8 Note that $f^- < \infty$ since $f > -\infty$. Also note that $f = f^+$ when $f^+ > 0$ and $f = -f^-$ when $f^- > 0$.
Proof: Suppose first that $f$ is measurable and $a \ge 0$. Then from the definition and the above observation,
$$\left(f^+\right)^{-1}((a, \infty]) = f^{-1}((a, \infty]) \in \mathcal{F}$$
If $a < 0$, then
$$\left(f^+\right)^{-1}((a, \infty]) = \left(f^+\right)^{-1}([0, \infty]) = \Omega \in \mathcal{F}$$
and so $f^+, f^-$ are both measurable. The rest involving the approximating sequences of simple functions follows from Theorem 7.5.6.
Conversely, if both $f^+$ and $f^-$ are measurable, then consider the following picture.
[Figure: the line $y = a + x$ with an open rectangle $R_i$ lying in the half plane above it.]
The top half can be filled with a countable union of open rectangles of the sort shown, $R_i = (a_i, b_i) \times (c_i, d_i)$. See Lemma 7.1.5. Then clearly $f(\omega) > a$, if and only if $(f^-(\omega), f^+(\omega))$ is in the top half plane determined in the above picture, which is the union of countably many open rectangles. Hence
$$f^{-1}((a, \infty]) = \cup_{i=1}^{\infty} \left(f^+\right)^{-1}(c_i, d_i) \cap \left(f^-\right)^{-1}(a_i, b_i) \in \mathcal{F}$$
You could also consider the following equation in which the right side is a countable union of measurable sets.
$$[f > a] = \cup_{r \in \mathbb{Q}} \left[f^+ > a + r\right] \cap \left[f^- < r\right].$$
where the $c_k$ are the distinct values of $s$, and the $E_k$ are disjoint sets.
Lemma 7.5.11 Let $s(\omega) = \sum_{k=1}^{n} (a_k + i b_k) \mathcal{X}_{E_k}(\omega)$ where the $a_k + i b_k$ are the distinct values of $s$ and the $E_k$ are the disjoint sets on which $s$ achieves these values. Then $s$ is measurable if and only if each $E_k$ is measurable.
Proof: First suppose $s$ is measurable. Then both $\sum_{k=1}^{n} a_k \mathcal{X}_{E_k}(\omega)$ and $\sum_{k=1}^{n} b_k \mathcal{X}_{E_k}(\omega)$ are measurable. Then $E_k = [\operatorname{Re} s = a_k] \cap [\operatorname{Im} s = b_k] \in \mathcal{F}$. Conversely, assume each $E_k$ is measurable. Then $\sum_k a_k \mathcal{X}_{E_k}$ and $\sum_k b_k \mathcal{X}_{E_k}$ are both measurable because if $h$ is the first, then
$$[h > \alpha] = \cup \{E_k : a_k > \alpha\} \in \mathcal{F}.$$
Similar considerations hold for the second.
Now it is not hard to generalize the above to complex valued functions.
Proof: If $f$ is measurable, the conclusion follows right away from Theorem 7.5.9 applied to the real and imaginary parts of $f$. Letting $h$ denote one of these real or imaginary parts, there exist increasing sequences of simple functions $\{s_k\}$, $\{t_k\}$ converging pointwise to $h^+$ and $h^-$ respectively. If $h(\omega) \ge 0$, then $h^-(\omega) = 0$ so $t_k(\omega) = 0$. Hence
part of f and their imaginary parts are simple functions converging to the imaginary
part of f . Then it follows from Theorem 7.5.3 that f is measurable because its real
and imaginary parts are.
Now the following is a major result about continuous functions of complex valued
measurable functions.
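$$\lim_{j \to \infty} g \circ s_j = g \circ f$$
$$(\mu(\Omega) < \infty)$$
and let $f_n, f$ be complex valued functions such that $\operatorname{Re} f_n$, $\operatorname{Im} f_n$ are all measurable and
$$\lim_{n \to \infty} f_n(\omega) = f(\omega)$$
for all $\omega \notin E$ where $\mu(E) = 0$. Then for every $\varepsilon > 0$, there exists a set
$$F \supseteq E, \quad \mu(F) < \varepsilon,$$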
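$$0 = \mu\left(\cap_{k=1}^{\infty} E_{km}\right) = \lim_{k \to \infty} \mu(E_{km}).$$
$$F = \cup_{m=1}^{\infty} E_{k(m)m}.$$
$$\cap_{m=1}^{\infty} E_{k(m)m}^{C}.$$
Hence $\omega \in E_{k(m_0)m_0}^{C}$ so
$$|f_n(\omega) - f(\omega)| < 1/m_0 < \eta$$
for all $n > k(m_0)$. This holds for all $\omega \in F^C$ and so $f_n$ converges uniformly to $f$ on $F^C$.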
Now if $E \ne \emptyset$, consider $\{\mathcal{X}_{E^C} f_n\}_{n=1}^{\infty}$. Each $\mathcal{X}_{E^C} f_n$ has real and imaginary parts measurable and the sequence converges pointwise to $\mathcal{X}_{E^C} f$ everywhere. Therefore, from the first part, there exists a set of measure less than $\varepsilon$, $F$ such that on $F^C$, $\{\mathcal{X}_{E^C} f_n\}$ converges uniformly to $\mathcal{X}_{E^C} f$. Therefore, on $(E \cup F)^C$, $\{f_n\}$ converges uniformly to $f$.
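A standard example shows what this theorem is saying. The Python sketch below takes $f_n(x) = x^n$ on $[0,1]$ with Lebesgue measure, which is an assumed illustration and not an example from the text. Here $f_n \to 0$ except at $x = 1$, the convergence is not uniform on $[0,1)$, but after removing the small set $F = (1 - \varepsilon/2, 1]$ of measure less than $\varepsilon$ the convergence is uniform, since the supremum of $x^n$ on what remains is $(1 - \varepsilon/2)^n$.

```python
def sup_on_complement(n, eps):
    """sup of |x**n - 0| over [0, 1 - eps/2], attained at the right endpoint."""
    return (1.0 - eps / 2.0) ** n


def first_index_below(eta, eps):
    """Smallest n with sup_on_complement(n, eps) < eta."""
    n = 1
    while sup_on_complement(n, eps) >= eta:
        n += 1
    return n
```

Given any tolerance $\eta$, `first_index_below(eta, eps)` returns an index past which $f_n$ stays within $\eta$ of the limit on all of $[0, 1 - \varepsilon/2]$ at once, because the supremum decreases in $n$.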
7.6. ONE DIMENSIONAL LEBESGUE STIELTJES MEASURE 181
Theorem 7.6.1 Let $\mathcal{S}$ denote the $\sigma$ algebra of Theorem 7.4.4, associated with the outer measure $\mu$ in Theorem 7.2.3, on which $\mu$ is a measure. Then every open interval is in $\mathcal{S}$. So are all open and closed sets. Furthermore, if $E$ is any set in $\mathcal{S}$
Proof: The first task is to show $(a, b) \in \mathcal{S}$. I need to show that for every $S \subseteq \mathbb{R}$,
$$\mu(S) \ge \mu(S \cap (a, b)) + \mu\left(S \cap (a, b)^C\right) \qquad (7.6.15)$$
Suppose first $S$ is an open interval, $(c, d)$. If $(c, d)$ has empty intersection with $(a, b)$ or is contained in $(a, b)$ there is nothing to prove. The above expression reduces to nothing more than $\mu(S) = \mu(S)$. Suppose next that $(c, d) \supseteq (a, b)$. In this case, the right side of the above reduces to
The only other cases are $c \le a < d \le b$ or $a \le c < d \le b$. Consider the first of these cases. Then the right side of 7.6.15 for $S = (c, d)$ is
The last case is entirely similar. Thus 7.6.15 holds whenever $S$ is an open interval. Now it is clear 7.6.15 also holds if $\mu(S) = \infty$. Suppose then that $\mu(S) < \infty$ and let
$$S \subseteq \cup_{k=1}^{\infty} (a_k, b_k)$$
such that
$$\mu(S) + \varepsilon > \sum_{k=1}^{\infty} \left(F(b_k -) - F(a_k +)\right) = \sum_{k=1}^{\infty} \mu((a_k, b_k)).$$
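Then since $\mu$ is an outer measure, and using what was just shown,
$$\mu(S \cap (a, b)) + \mu\left(S \cap (a, b)^C\right)$$
$$\le \mu\left(\cup_{k=1}^{\infty} (a_k, b_k) \cap (a, b)\right) + \mu\left(\cup_{k=1}^{\infty} (a_k, b_k) \cap (a, b)^C\right)$$
$$\le \sum_{k=1}^{\infty} \left[ \mu((a_k, b_k) \cap (a, b)) + \mu\left((a_k, b_k) \cap (a, b)^C\right) \right]$$
$$\le \sum_{k=1}^{\infty} \mu((a_k, b_k)) \le \mu(S) + \varepsilon.$$
Since $\varepsilon$ is arbitrary, this shows 7.6.15 holds for any $S$ and so any open interval is in $\mathcal{S}$.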
It follows any open set is in $\mathcal{S}$. This follows from Theorem 4.3.10 which implies that if $U$ is open, it is the countable union of disjoint open intervals. Since each of these open intervals is in $\mathcal{S}$ and $\mathcal{S}$ is a $\sigma$ algebra, their union is also in $\mathcal{S}$. It follows every closed set is in $\mathcal{S}$ also. This is because $\mathcal{S}$ is a $\sigma$ algebra and if a set is in $\mathcal{S}$ then so is its complement. The closed sets are those which are complements of open sets.
The assertion of outer regularity is not hard to get. Letting $E$ be any set with $\mu(E) < \infty$, there exist open intervals covering $E$ denoted by $\{(a_i, b_i)\}_{i=1}^{\infty}$ such that
$$\mu(E) + \varepsilon > \sum_{i=1}^{\infty} \left(F(b_i -) - F(a_i +)\right) = \sum_{i=1}^{\infty} \mu((a_i, b_i)) \ge \mu(V)$$
$$\mu(E) \le \mu(V) \le \mu(E) + \varepsilon.$$
Now suppose $E$ is arbitrary and let $l < \mu(E)$. Then choosing $\varepsilon$ small enough, $l + \varepsilon < \mu(E)$ also. Letting $E_n \equiv E \cap [-n, n]$, it follows from Theorem 7.3.4 that for $n$ large enough, $\mu(E_n) > l + \varepsilon$. Now from what was just shown, there exists $K \subseteq E_n$ such that $\mu(K) + \varepsilon > \mu(E_n)$. Hence $\mu(K) > l$. This shows 7.6.13.
Definition 7.6.2 When the integrator function is $F(x) = x$, the Lebesgue Stieltjes measure just discussed is known as one dimensional Lebesgue measure and is denoted as $m$.
Proposition 7.6.3 For $m$ Lebesgue measure, $m([a, b]) = m((a, b)) = b - a$. Also $m$ is translation invariant in the sense that if $E$ is any Lebesgue measurable set, then $m(x + E) = m(E)$.
Proof: The formula for the measure of an interval comes right away from
Theorem 7.2.3. From this, it follows right away that whenever $E$ is an interval,
$m(x + E) = m(E)$. Every open set is the countable disjoint union of open intervals,
so if $E$ is an open set, then $m(x + E) = m(E)$. What about closed sets? First
suppose $H$ is a closed and bounded set. Then letting $(-n, n) \supseteq H$,
\[ \mu(((-n, n) \setminus H) + x) + \mu(H + x) = \mu((-n, n) + x) \]
Therefore, the translation invariance holds for closed and bounded sets. If $H$ is an
arbitrary closed set, then
It follows right away that if $G$ is the countable intersection of open sets, ($G_\delta$ set,
pronounced g delta set) then
\[ m(F) = m(x + F) \le m(x + E) \le m(x + G) = m(G) = m(E) = m(F). \]
7.7 Exercises
1. Let $\mathcal{C}$ be a set whose elements are $\sigma$ algebras of subsets of $\Omega$. Show $\cap \mathcal{C}$ is a
$\sigma$ algebra also.
2. Let $\Omega$ be any set. Show $\mathcal{P}(\Omega)$, the set of all subsets of $\Omega$, is a $\sigma$ algebra. Now
let $\mathcal{L}$ denote some subset of $\mathcal{P}(\Omega)$. Consider all $\sigma$ algebras which contain $\mathcal{L}$.
Show the intersection of all these $\sigma$ algebras which contain $\mathcal{L}$ is a $\sigma$ algebra
containing $\mathcal{L}$ and it is the smallest $\sigma$ algebra containing $\mathcal{L}$, denoted by $\sigma(\mathcal{L})$.
When $\Omega$ is a normed vector space, and $\mathcal{L}$ consists of the open sets, $\sigma(\mathcal{L})$ is
called the $\sigma$ algebra of Borel sets.
3. Show that for $(\Omega, \mathcal{F})$ a measurable space, $f : \Omega \to \mathbb{C}$ is measurable if and
only if $f^{-1}(\text{open set}) \in \mathcal{F}$. Hint: Note that any open set is composed of
countably many rectangles of the form $(a, b) + i(c, d)$.
4. Consider $\Omega = [0, 1]$ and let $\mathcal{S}$ denote all subsets $F$ of $[0, 1]$ such that either
$F^C$ or $F$ is countable. Note the empty set must be countable. Show $\mathcal{S}$ is a
$\sigma$ algebra. (This is a sick $\sigma$ algebra.) Now let $\mu : \mathcal{S} \to [0, \infty]$ be defined by
$\mu(F) = 1$ if $F^C$ is countable and $\mu(F) = 0$ if $F$ is countable. Show $\mu$ is a
measure on $\mathcal{S}$.
5. Let $\Omega = \mathbb{N}$, the positive integers, and let a $\sigma$ algebra be given by $\mathcal{F} = \mathcal{P}(\mathbb{N})$,
the set of all subsets of $\mathbb{N}$. What are the measurable functions having values
in $\mathbb{C}$? Let $\mu(E)$ be the number of elements of $E$ where $E$ is a subset of $\mathbb{N}$.
Show $\mu$ is a measure.
6. Let $\mathcal{F}$ be a $\sigma$ algebra of subsets of $\Omega$ and suppose $\mathcal{F}$ has infinitely many
elements. Show that $\mathcal{F}$ is uncountable. Hint: You might try to show there
exists a countable sequence of disjoint sets of $\mathcal{F}$, $\{A_i\}$. It might be easiest to
verify this by contradiction if it doesn't exist rather than a direct construction
however, I have seen this done several ways. Once this has been done, you
can define a map $\theta$, from $\mathcal{P}(\mathbb{N})$ into $\mathcal{F}$ which is one to one by $\theta(S) = \cup_{i \in S} A_i$.
Then argue $\mathcal{P}(\mathbb{N})$ is uncountable and so $\mathcal{F}$ is also uncountable.
7. A probability space is a measure space, $(\Omega, \mathcal{F}, P)$ where the measure $P$ has
the property that $P(\Omega) = 1$. Such a measure is called a probability measure.
Random vectors are measurable functions, $X$, mapping a probability space,
$(\Omega, \mathcal{F}, P)$ to $\mathbb{R}^n$. Thus $X(\omega) \in \mathbb{R}^n$ for each $\omega \in \Omega$ and $P$ is a probability
measure defined on the sets of $\mathcal{F}$, a $\sigma$ algebra of subsets of $\Omega$. For $E$ a Borel
set in $\mathbb{R}^n$, define
\[ \mu(E) \equiv P\left(X^{-1}(E)\right) \equiv \text{probability that } X \in E. \]
Show this is a well defined probability measure on the Borel sets of $\mathbb{R}^n$. Thus
$\mu(E) = P(X(\omega) \in E)$. It is called the distribution.
8. Let $E$ be a countable subset of $\mathbb{R}$. Show $m(E) = 0$. Hint: Let the set be
$\{e_i\}_{i=1}^{\infty}$ and let $e_i$ be the center of an open interval of length $\varepsilon/2^i$.
9. If $S$ is an uncountable set of irrational numbers, is it necessary that $S$ has
a rational number as a limit point? Hint: Consider the proof of Problem 8
when applied to the rational numbers. (This problem was shown to me by
Lee Erlebach.)
10. Let $\mu$ be a finite measure on the Borel sets of $\mathbb{R}^n$. Show that $\mu$ must be
regular. Hint: You might let $\mathcal{F}$ denote those Borel sets for which $\mu$ is regular,
show the open sets are in $\mathcal{F}$ and that $\mathcal{F}$ is a $\sigma$ algebra.
11. If $K$ is a compact subset of an open set $V$ where $K, V$ are in $\mathbb{R}^n$, show there
exists a continuous function $f : \mathbb{R}^n \to [0, 1]$ such that $f(x) = 1$ on $K$ and
$\operatorname{spt}(f) \equiv \overline{\{x : f(x) \ne 0\}}$, referred to as the support of $f$, is a compact subset
of $V$.
12. Suppose $(\Omega, \mathcal{F}, \mu)$ is a measure space and $\{E_i\} \subseteq \mathcal{F}$. Suppose also that
$\sum_{i=1}^{\infty} \mu(E_i) < \infty$. Let $N$ consist of all the points of $\Omega$ which are in infinitely
many of the sets $E_i$. Show that $\mu(N) = 0$.
13. Suppose $\mu$ is a finite regular measure defined on $\mathcal{B}(\mathbb{R}^n)$ and $E$ is a Borel
set. Let $X_E$ denote the indicator function of $E$ defined as
\[ X_E(x) \equiv \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases} \]
Show there exists a sequence of continuous functions having compact supports
$\{f_k\}$ such that $f_k(x) \to X_E(x)$ for a.e. $x$. Recall from the above problems,
the support of $f$ is defined as
$\operatorname{spt}(f) \equiv \overline{\{x : f(x) \ne 0\}}$.
Show $\{f_n\}$ converges uniformly on $[0, 1]$. If $f(x) = \lim_{n \to \infty} f_n(x)$, show that
$f(0) = 0$, $f(1) = 1$, $f$ is continuous, and $f'(x) = 0$ for all $x \notin P$ where $P$ is the
Cantor set. This function is called the Cantor function. It is a very important
example to remember. Note it has derivative equal to zero a.e. and yet it
succeeds in climbing from 0 to 1. Thus it would seem that $\int_0^1 f'(t)\,dt = 0 \ne
f(1) - f(0)$. Is this somehow contradictory to the fundamental theorem of
calculus? Hint: This isn't too hard if you focus on getting a careful estimate
on the difference between two successive functions in the list, considering only
a typical small interval in which the change takes place. The above picture
should be helpful.
16. Let $m(W) > 0$, $W$ is measurable, $W \subseteq [a, b]$. Show there exists a nonmeasurable
subset of $W$. Hint: Let $x \sim y$ if $x - y \in \mathbb{Q}$. Observe that $\sim$ is an
equivalence relation on $\mathbb{R}$. See Definition 1.1.9 on Page 20 for a review of this
terminology. Let $\mathcal{C}$ be the set of equivalence classes and let $\mathcal{D} \equiv \{C \cap W : C \in \mathcal{C}$
and $C \cap W \ne \emptyset\}$. By the axiom of choice, there exists a set $A$, consisting of
exactly one point from each of the nonempty sets which are the elements of
$\mathcal{D}$. Show
\[ W \subseteq \cup_{r \in \mathbb{Q}} A + r \tag{a.} \]
\[ A + r_1 \cap A + r_2 = \emptyset \text{ if } r_1 \ne r_2,\ r_i \in \mathbb{Q}. \tag{b.} \]
Observe that since $A \subseteq [a, b]$, then $A + r \subseteq [a - 1, b + 1]$ whenever $|r| < 1$. Use
this to show that if $m(A) = 0$, or if $m(A) > 0$, a contradiction results. Show
there exists some set $S$ such that $\overline{m}(S) < \overline{m}(S \cap A) + \overline{m}(S \setminus A)$ where $\overline{m}$ is
the outer measure determined by $m$.
17. This problem gives a very interesting example found in the book by
McShane [36]. Let $g(x) = x + f(x)$ where $f$ is the strange function of Problem
15. Let $P$ be the Cantor set of Problem 14. Let $[0, 1] \setminus P = \cup_{j=1}^{\infty} I_j$ where
$I_j$ is open and $I_j \cap I_k = \emptyset$ if $j \ne k$. These intervals are the connected
components of the complement of the Cantor set. Show $m(g(I_j)) = m(I_j)$
so $m(g(\cup_{j=1}^{\infty} I_j)) = \sum_{j=1}^{\infty} m(g(I_j)) = \sum_{j=1}^{\infty} m(I_j) = 1$. Thus $m(g(P)) = 1$
because $g([0, 1]) = [0, 2]$. By Problem 16 there exists a set $A \subseteq g(P)$ which is
non measurable. Define $\phi(x) = X_A(g(x))$. Thus $\phi(x) = 0$ unless $x \in P$. Tell
why $\phi$ is measurable. (Recall $m(P) = 0$ and Lebesgue measure is complete.)
Now show that $X_A(y) = \phi(g^{-1}(y))$ for $y \in [0, 2]$. Tell why $g^{-1}$ is continuous
but $\phi \circ g^{-1}$ is not measurable. (This is an example of measurable $\circ$ continuous
$\ne$ measurable.) Show there exist Lebesgue measurable sets which are not
Borel measurable. Hint: The function $\phi$ is Lebesgue measurable. Now show
that Borel $\circ$ measurable $=$ measurable.
18. Let $K$ be a compact subset of $\mathbb{R}$ having no isolated points. Show that there
exists an increasing continuous function $g$ such that $g$ is constant on every
connected component of $K^C$ and has values between 0 and 1. If $J, L$ are two
components, $J < L$, then the value of $g$ on $J$ is strictly less than its value on
$L$. Hint: Let the components be $\{(a_k, b_k)\}$. Let $a$ be the first point of $K$ and
$b$ be the last. Let $g_0$ be piecewise linear, increasing and continuous going from
0 to the left of $a$ to 1 to the right of $b$. Let $g_1$ equal $\frac{1}{2}(g_0(a_1) + g_0(b_1))$ on
$(a_1, b_1)$ and adjust to make piecewise linear and increasing going from 0 to 1.
Next adjust $g_1$ in a similar way to make it constant on $(a_2, b_2)$. Continue this
way. Estimate $\|g_k - g_{k-1}\|_\infty$ in terms of $g_{k-1}(b_k) - g_{k-1}(a_k)$ and observe
and use that the intervals $(g_{k-1}(a_k), g_{k-1}(b_k))$ are disjoint.
19. Show that if $K$ is any compact subset of $\mathbb{R}$ which has no isolated points,
there exists a Borel measure $\mu$ which has the properties $\mu(K) = 1$, $\mu(E) =
\mu(E \cap K)$, and if $H$ is a proper compact subset of $K$, then $\mu(H) < 1$. Also,
$\mu(p) = 0$ whenever $p$ is a point.
21. Suppose you consider the closed upper half plane determined by the line $y =
x$. Can it be covered with countably many rectangles of the form $[a, b] \times [c, d]$
each of which is contained in the upper half plane? Hint: You must cover
the points on the line $y = x$.
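The hint to Problem 8 rests on a geometric series: intervals of length $\varepsilon/2^i$ centered at the points $e_i$ have total length less than $\varepsilon$. The following is only a sketch of that bookkeeping with exact rational arithmetic; the function name `cover_lengths` is an illustrative choice and not notation from the text.

```python
from fractions import Fraction

def cover_lengths(num_points, eps):
    """Lengths eps/2**i, i = 1..num_points, of the intervals centered at e_1, e_2, ..."""
    return [eps / 2**i for i in range(1, num_points + 1)]

eps = Fraction(1, 100)
for n in (1, 5, 50):
    total = sum(cover_lengths(n, eps))
    # Each partial sum equals eps * (1 - 2**-n), which stays strictly below eps.
    assert total == eps * (1 - Fraction(1, 2**n))
    assert total < eps
```

Since every finite subfamily has total length below $\varepsilon$, so does the whole countable cover, which is the estimate the hint is pointing toward.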
The Abstract Lebesgue
Integral
The general Lebesgue integral requires a measure space, $(\Omega, \mathcal{F}, \mu)$ and, to begin with,
a nonnegative measurable function. I will use Lemma 1.3.3 about interchanging two
supremums frequently. Also, I will use the observation that if $\{a_n\}$ is an increasing
sequence of points of $[0, \infty]$, then $\sup_n a_n = \lim_{n \to \infty} a_n$ which is obvious from the
definition of sup.
is short for
\[ \{\omega \in \Omega : g(\omega) < f(\omega)\} \]
with other variants of this notation being similar. Also, the convention, $0 \cdot \infty = 0$
will be used to simplify the presentation whenever it is convenient to do so.
where the integral on the right is the usual Riemann integral because eventually
$M > f$. For $f$ a nonnegative decreasing function defined on $[0, \infty)$,
\[ \int_0^{\infty} f\,d\lambda \equiv \lim_{R \to \infty} \int_0^R f\,d\lambda = \sup_{R > 1} \int_0^R f\,d\lambda = \sup_R \sup_{M > 0} \int_0^R f \wedge M\,d\lambda \]
Since decreasing bounded functions are Riemann integrable, the above definition
is well defined. Now here are some obvious properties.
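\[ = \sup_M \sup_{R > 0} \sup_{h > 0} \sum_{k=1}^{m(h,R)} \left( \mu([f > kh]) \wedge M \right) h \]
The sum is just a lower sum for the integral $\int_0^{h m(h,R)} \mu([f > \lambda]) \wedge M\,d\lambda$. Hence,
switching the order of the sups, this equals
\[ = \sup_{h > 0} \sup_R \sum_{k=1}^{m(R,h)} \left( \mu([f > kh]) \right) h = \sup_{h > 0} \sum_{k=1}^{\infty} \left( \mu([f > kh]) \right) h. \]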
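The formula $\sup_{h>0} \sum_{k} \mu([f > kh])\,h$ can be checked by hand in a small case. The sketch below assumes a finite set with counting measure (or optional point weights), where the integral is simply the weighted sum of the values; `layer_sum` is a hypothetical helper name. The sums are lower sums, so they stay below the integral and approach it as $h$ shrinks.

```python
def layer_sum(values, h, weights=None):
    """Return sum_{k>=1} mu([f > k*h]) * h for f given by `values` on a finite set.

    mu is counting measure unless point weights are supplied.
    Assumes `values` is nonempty and nonnegative.
    """
    if weights is None:
        weights = [1.0] * len(values)
    total, k = 0.0, 1
    while k * h < max(values):
        # mu([f > k*h]) is the total weight of the points where f exceeds k*h
        total += h * sum(w for v, w in zip(values, weights) if v > k * h)
        k += 1
    return total

f_values = [0.5, 2.0, 3.25]
exact = sum(f_values)          # integral with respect to counting measure
for h in (1.0, 0.25, 2**-10):
    approx = layer_sum(f_values, h)
    assert approx <= exact     # every sum is a lower sum
    print(h, approx, exact)
```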
Now that the Lebesgue integral for a nonnegative function has been defined, what
does it do to a nonnegative simple function? Recall a nonnegative simple function is
one which has finitely many nonnegative real values which it assumes on measurable
sets. Thus a simple function can be written in the form
\[ s(\omega) = \sum_{i=1}^{n} c_i X_{E_i}(\omega) \]
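and so both sides are equal to $\infty$. Thus it can be assumed for each $i$, $\mu(E_i) < \infty$.
Then it follows from Lemma 8.2.1 and Lemma 8.1.2,
\begin{align*}
\int_0^{\infty} \mu([s > \lambda])\,d\lambda &= \int_0^{a_p} \mu([s > \lambda])\,d\lambda = \sum_{k=1}^{p} \int_{a_{k-1}}^{a_k} \mu([s > \lambda])\,d\lambda \\
&= \sum_{k=1}^{p} (a_k - a_{k-1}) \sum_{i=k}^{p} \mu(E_i) = \sum_{i=1}^{p} \mu(E_i) \sum_{k=1}^{i} (a_k - a_{k-1}) = \sum_{i=1}^{p} a_i \mu(E_i)
\end{align*}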
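The interchange of the two finite sums in the last line can be verified on a concrete simple function. This is a minimal sketch under the assumption that the values are listed in increasing order $0 = a_0 < a_1 < \cdots < a_p$ and that `mu[i]` stands for $\mu(E_{i+1})$; both function names are illustrative only.

```python
from fractions import Fraction

def layer_cake(a, mu):
    """sum_k (a_k - a_{k-1}) * sum_{i>=k} mu(E_i), with a_0 = 0 and a_1 < ... < a_p."""
    total, prev = Fraction(0), Fraction(0)
    for k in range(len(a)):
        # on (a_{k-1}, a_k] the set [s > lambda] is the union of E_k, ..., E_p
        total += (a[k] - prev) * sum(mu[k:])
        prev = a[k]
    return total

def direct(a, mu):
    """sum_i a_i * mu(E_i)."""
    return sum(ai * mi for ai, mi in zip(a, mu))

a = [Fraction(1), Fraction(3), Fraction(7)]
mu = [Fraction(1, 2), Fraction(2), Fraction(1, 3)]
assert layer_cake(a, mu) == direct(a, mu)
```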
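Proof: Let
\[ s(\omega) = \sum_{i=1}^{n} \alpha_i X_{A_i}(\omega), \quad t(\omega) = \sum_{j=1}^{m} \beta_j X_{B_j}(\omega) \]
where $\alpha_i$ are the distinct values of $s$ and the $\beta_j$ are the distinct values of $t$. Clearly
$as + bt$ is a nonnegative simple function because it has finitely many values on
measurable sets. In fact,
\[ (as + bt)(\omega) = \sum_{j=1}^{m} \sum_{i=1}^{n} (a\alpha_i + b\beta_j) X_{A_i \cap B_j}(\omega) \]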
8.3. THE MONOTONE CONVERGENCE THEOREM 193
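Theorem 8.3.1 (Monotone Convergence theorem) Let $f$ have values in $[0, \infty]$ and
suppose $\{f_n\}$ is a sequence of nonnegative measurable functions having values in
$[0, \infty]$ and satisfying
\[ \lim_{n \to \infty} f_n(\omega) = f(\omega) \text{ for each } \omega, \]
\[ \cdots f_n(\omega) \le f_{n+1}(\omega) \cdots \]
\begin{align*}
&= \sup_n \sup_{h > 0} \sum_{k=1}^{\infty} \mu([f_n > kh])\,h = \sup_{h > 0} \sup_N \sup_n \sum_{k=1}^{N} \mu([f_n > kh])\,h \\
&= \sup_{h > 0} \sup_N \sum_{k=1}^{N} \mu([f > kh])\,h = \sup_{h > 0} \sum_{k=1}^{\infty} \mu([f > kh])\,h = \int f\,d\mu
\end{align*}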
To illustrate what goes wrong without the Lebesgue integral, consider the following
example.
Example 8.3.2 Let $\{r_n\}$ denote the rational numbers in $[0, 1]$ and let
\[ f_n(t) \equiv \begin{cases} 1 & \text{if } t \notin \{r_1, \cdots, r_n\} \\ 0 & \text{otherwise} \end{cases} \]
Then $f_n(t) \downarrow f(t)$ where $f$ is the function which is zero on the rationals and one
on the irrationals. Each $f_n$ is Riemann integrable (why?) but $f$ is not Riemann
integrable. Therefore, you can't write $\int f\,dx = \lim_{n \to \infty} \int f_n\,dx$.
A meta-mathematical observation related to this type of example is this. If you
can choose your functions, you don't need the Lebesgue integral. The Riemann Darboux
integral is just fine. It is when you can't choose your functions and they come
to you as pointwise limits that you really need the superior Lebesgue integral or
at least something more general than the Riemann integral. The Riemann integral
is entirely adequate for evaluating the seemingly endless lists of boring problems
found in calculus books. It is shown later that the two integrals coincide when the
Lebesgue integral is taken with respect to Lebesgue measure and the function being
integrated is Riemann integrable.
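Similarly this also shows that for such a nonnegative measurable function,
\[ \int f\,d\mu = \sup \left\{ \int s : 0 \le s \le f,\ s \text{ simple} \right\} \]
In other words,
\[ \int \left( \liminf_{n \to \infty} f_n \right) d\mu \le \liminf_{n \to \infty} \int f_n\,d\mu \]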
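The inequality in Fatou's lemma can be strict. A standard illustration, which is not taken from the text above, uses Lebesgue measure $m$ on $[0,1]$ and the functions $f_n = n X_{(0,1/n)}$:

```latex
\[
\liminf_{n\to\infty} f_n(x) = 0 \ \text{ for every } x \in [0,1],
\qquad
\int f_n \, dm = n \cdot m\big((0,1/n)\big) = 1 ,
\]
\[
\int \Big( \liminf_{n\to\infty} f_n \Big)\, dm = 0 \;<\; 1 = \liminf_{n\to\infty} \int f_n \, dm .
\]
```

Here each fixed $x > 0$ satisfies $f_n(x) = 0$ as soon as $1/n \le x$, so the pointwise limit is zero even though every integral equals one.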
Proof: By Theorem 7.5.6 on Page 175 there exist increasing sequences of nonnegative
simple functions, $s_n \to f$ and $t_n \to g$. Then $af + bg$, being the pointwise
limit of the simple functions $as_n + bt_n$, is measurable. Now by the monotone
convergence theorem and Lemma 8.2.3,
\begin{align*}
\int (af + bg)\,d\mu &= \lim_{n \to \infty} \int as_n + bt_n\,d\mu \\
&= \lim_{n \to \infty} \left( a \int s_n\,d\mu + b \int t_n\,d\mu \right) \\
&= a \int f\,d\mu + b \int g\,d\mu.
\end{align*}
As long as you are allowing functions to take the value $+\infty$, you cannot consider
something like $f + (-g)$ and so you can't very well expect a satisfactory statement
about the integral being linear until you restrict yourself to functions which have
values in a vector space. This is discussed next.
\[ \operatorname{Re} f = \frac{f + \overline{f}}{2}, \quad \operatorname{Im} f = \frac{f - \overline{f}}{2i}. \]
Definition 8.7.1 Let $(\Omega, \mathcal{S}, \mu)$ be a measure space and suppose $f : \Omega \to \mathbb{C}$. Then $f$
is said to be measurable if both $\operatorname{Re} f$ and $\operatorname{Im} f$ are measurable real valued functions.
As is always the case for complex numbers, $|z|^2 = (\operatorname{Re} z)^2 + (\operatorname{Im} z)^2$. Also, for
$g$ a real valued function, one can consider its positive and negative parts defined
respectively as
I will show that with this definition, the integral is linear and well defined. First
note that it is clearly well defined because all the above integrals are of nonnegative
functions and are each equal to a nonnegative real number because for $h$ equal to
any of the functions, $|h| \le |f|$ and $\int |f|\,d\mu < \infty$.
Here is a lemma which will make it possible to show the integral is linear.
Then
\[ \int g\,d\mu - \int h\,d\mu = \int g'\,d\mu - \int h'\,d\mu. \]
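Proof: First observe that from the definition of the positive and negative parts
of a function,
\[ (f + g)^+ - (f + g)^- = f^+ + g^+ - \left( f^- + g^- \right) \]
because both sides equal $f + g$. Therefore from Lemma 8.7.3 and the definition, it
follows from Theorem 8.6.1 that
\begin{align*}
\int f + g\,d\mu &\equiv \int (f + g)^+ - (f + g)^-\,d\mu = \int f^+ + g^+\,d\mu - \int f^- + g^-\,d\mu \\
&= \int f^+\,d\mu + \int g^+\,d\mu - \left( \int f^-\,d\mu + \int g^-\,d\mu \right) = \int f\,d\mu + \int g\,d\mu.
\end{align*}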
What about taking out scalars? First note that if $a$ is real and nonnegative, then
$(af)^+ = af^+$ and $(af)^- = af^-$ while if $a < 0$, then $(af)^+ = -af^-$ and $(af)^- =
-af^+$. These claims follow immediately from the above definitions of positive and
negative parts of a function.
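The sign rules for positive and negative parts under multiplication by a real scalar are easy to confirm pointwise. The following sketch checks them on a few integer sample values; `pos` and `neg` are illustrative names for $x \mapsto x^+$ and $x \mapsto x^-$, with the usual assumption $x^+ = \max(x, 0)$ and $x^- = \max(-x, 0)$.

```python
def pos(x):
    """Positive part x^+ = max(x, 0)."""
    return max(x, 0)

def neg(x):
    """Negative part x^- = max(-x, 0), so that x = x^+ - x^-."""
    return max(-x, 0)

samples = [-3, -1, 0, 2, 5]
for x in samples:
    assert x == pos(x) - neg(x)
    for a in (0, 2, 7):          # nonnegative scalars
        assert pos(a * x) == a * pos(x)
        assert neg(a * x) == a * neg(x)
    for a in (-1, -4):           # negative scalars swap the two parts
        assert pos(a * x) == -a * neg(x)
        assert neg(a * x) == -a * pos(x)
```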
Also, for every $f \in L^1(\Omega)$ it follows that for every $\varepsilon > 0$ there exists a simple
function $s$ such that $|s| \le |f|$ and
\[ \int |f - s|\,d\mu < \varepsilon. \]
+i (b Re (f ) + c Im (g) + a Im (f ) + d Re (g)) (8.7.5)
8.7. THE LEBESGUE INTEGRAL, L1 199
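which equals
\begin{align*}
&= a \int \operatorname{Re} f\,d\mu - b \int \operatorname{Im} f\,d\mu + ib \int \operatorname{Re} f\,d\mu + ia \int \operatorname{Im} f\,d\mu \\
&\quad + c \int \operatorname{Re} g\,d\mu - d \int \operatorname{Im} g\,d\mu + id \int \operatorname{Re} g\,d\mu + ic \int \operatorname{Im} g\,d\mu.
\end{align*}
Using Lemma 8.7.4 and collecting terms, it follows that this reduces to 8.7.5. Thus
the integral is linear as claimed.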
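Consider the claim about approximation with a simple function. Letting $h$ equal
any of
\[ (\operatorname{Re} f)^+,\ (\operatorname{Re} f)^-,\ (\operatorname{Im} f)^+,\ (\operatorname{Im} f)^-, \tag{8.7.6} \]
it follows from the monotone convergence theorem and Theorem 7.5.6 on Page 175
there exists a nonnegative simple function $s \le h$ such that
\[ \int |h - s|\,d\mu < \frac{\varepsilon}{4}. \]
Therefore, letting $s_1, s_2, s_3, s_4$ be such simple functions, approximating respectively
the functions listed in 8.7.6, and $s \equiv s_1 - s_2 + i(s_3 - s_4)$,
\begin{align*}
\int |f - s|\,d\mu &\le \int \left| (\operatorname{Re} f)^+ - s_1 \right| d\mu + \int \left| (\operatorname{Re} f)^- - s_2 \right| d\mu \\
&\quad + \int \left| (\operatorname{Im} f)^+ - s_3 \right| d\mu + \int \left| (\operatorname{Im} f)^- - s_4 \right| d\mu < \varepsilon
\end{align*}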
The following corollary follows from this. The conditions of this corollary are
sometimes taken as a definition of what it means for a function $f$ to be in $L^1(\Omega)$.
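\[ s_n(\omega) \to f(\omega) \text{ for all } \omega \in \Omega, \qquad \lim_{m,n \to \infty} \int |s_n - s_m|\,d\mu = 0 \tag{8.7.7} \]
When $f \in L^1(\Omega)$,
\[ \int f\,d\mu \equiv \lim_{n \to \infty} \int s_n. \tag{8.7.8} \]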
Then
\[ \int |s_n - s_m|\,d\mu \le \int |s_n - f|\,d\mu + \int |f - s_m|\,d\mu \le \frac{1}{n} + \frac{1}{m}. \]
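Next suppose the existence of the approximating sequence of simple functions.
Then $f$ is measurable because its real and imaginary parts are the limit of measurable
functions. By Fatou's lemma,
\[ \int |f|\,d\mu \le \liminf_{n \to \infty} \int |s_n|\,d\mu < \infty \]
because
\[ \left| \int |s_n|\,d\mu - \int |s_m|\,d\mu \right| \le \int |s_n - s_m|\,d\mu \]
which is given to converge to 0. Hence $\left\{ \int |s_n|\,d\mu \right\}$ is a Cauchy sequence and is
therefore bounded.
In case $f \in L^1(\Omega)$, letting $\{s_n\}$ be the approximating sequence, Fatou's lemma
implies
\[ \left| \int f\,d\mu - \int s_n\,d\mu \right| \le \int |f - s_n|\,d\mu \le \liminf_{m \to \infty} \int |s_m - s_n|\,d\mu < \varepsilon \]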
In other words, wants to be linear. Then has a unique linear extension to the
complex valued measurable functions.
8.8. THE DOMINATED CONVERGENCE THEOREM 201
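Since $\varepsilon$ is arbitrary, the two must be equal and they both must equal $a$. Next suppose
$\lim_{n \to \infty} a_n = \infty$. Then if $l \in \mathbb{R}$, there exists $N$ such that for $n \ge N$,
\[ l \le a_n \]
and therefore, for such $n$,
\[ l \le \inf \{a_k : k \ge n\} \le \sup \{a_k : k \ge n\} \]
and this shows, since $l$ is arbitrary, that
\[ \liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n = \infty. \]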
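The quantities $\inf\{a_k : k \ge n\}$ and $\sup\{a_k : k \ge n\}$ used in this argument can be tabulated for a finite stretch of a sequence. The sketch below is only an illustration: on a finite list the "tails" are truncated, so the numbers approximate the true tail infima and suprema rather than equal them, and `tail_inf_sup` is a hypothetical helper name.

```python
def tail_inf_sup(a):
    """For a finite list a, return (infs, sups) where
    infs[n] = min(a[n:]) and sups[n] = max(a[n:])."""
    infs, sups = [], []
    lo, hi = float("inf"), float("-inf")
    for x in reversed(a):
        lo, hi = min(lo, x), max(hi, x)
        infs.append(lo)
        sups.append(hi)
    return infs[::-1], sups[::-1]

# a_n = (-1)^n + 1/n oscillates: lim inf is -1 and lim sup is 1
a = [(-1) ** n + 1.0 / n for n in range(1, 2001)]
infs, sups = tail_inf_sup(a)
# tail infima increase with n and tail suprema decrease with n
assert all(infs[i] <= infs[i + 1] for i in range(len(infs) - 1))
assert all(sups[i] >= sups[i + 1] for i in range(len(sups) - 1))
print(infs[0], sups[0], infs[1000], sups[1000])
```

The monotonicity of the two lists is what makes $\liminf$ and $\limsup$ well defined as limits, and a sequence converges exactly when the two limits agree.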
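\[ f(\omega) = \lim_{n \to \infty} f_n(\omega), \]
and there exists a measurable function $g$, with values in $[0, \infty]$,$^1$ such that
\[ |f_n(\omega)| \le g(\omega) \text{ and } \int g(\omega)\,d\mu < \infty. \]
Then $f \in L^1(\Omega)$ and
\[ 0 = \lim_{n \to \infty} \int |f_n - f|\,d\mu = \lim_{n \to \infty} \left| \int f\,d\mu - \int f_n\,d\mu \right| \]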
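$f \in L^1(\Omega)$ and $|f - f_n| \le 2g$.
Hence
\begin{align*}
0 &\ge \limsup_{n \to \infty} \left( \int |f - f_n|\,d\mu \right) \\
&\ge \liminf_{n \to \infty} \left( \int |f - f_n|\,d\mu \right) \ge \liminf_{n \to \infty} \left| \int f\,d\mu - \int f_n\,d\mu \right| \ge 0.
\end{align*}
This proves the theorem by Lemma 8.8.1 because the lim sup and lim inf are equal.
$^1$ Note that, since $g$ is allowed to have the value $\infty$, it is not known that $g \in L^1(\Omega)$.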
8.9. THE ONE DIMENSIONAL LEBESGUE STIELTJES INTEGRAL 203
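\begin{align*}
\liminf_{n \to \infty} \int (g_n + g)\,d\mu - \limsup_{n \to \infty} \int |f - f_n|\,d\mu
&= \liminf_{n \to \infty} \int \left( (g_n + g) - |f - f_n| \right) d\mu \ge \int 2g\,d\mu
\end{align*}
and so $-\limsup_{n \to \infty} \int |f - f_n|\,d\mu \ge 0$. Thus
\begin{align*}
0 &\ge \limsup_{n \to \infty} \left( \int |f - f_n|\,d\mu \right) \\
&\ge \liminf_{n \to \infty} \left( \int |f - f_n|\,d\mu \right) \ge \liminf_{n \to \infty} \left| \int f\,d\mu - \int f_n\,d\mu \right| \ge 0.
\end{align*}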
{E A : A F}
2. $\mu(\cup_{i=1}^{\infty} A_i) \le \sum_{i=1}^{\infty} \mu(A_i)$
The Lebesgue integral taken with respect to this measure is called the Lebesgue
Stieltjes integral. Note that any real valued continuous function is measurable with
respect to $\mathcal{S}$. This is because if $f$ is continuous, inverse images of open sets are open
and open sets are in $\mathcal{S}$. Thus $f$ is measurable because $f^{-1}((a, b)) \in \mathcal{S}$. Similarly if
$f$ has complex values this argument applied to its real and imaginary parts yields
the conclusion that $f$ is measurable.
For $f$ a continuous function, how does the Lebesgue Stieltjes integral compare
with the Darboux Stieltjes integral? To answer this question, here is a technical
lemma.
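Now let $\{x_0, \cdots, x_{m_n}\}$ be a partition of $[a, b]$ such that $|x_i - x_{i-1}| < \delta$ for each $i$.
For $k = 1, 2, \cdots, m_n - 1$, let $z_k^n \notin D$ and $|z_k^n - x_k| < \delta$. Then
\[ \left| z_k^n - z_{k-1}^n \right| \le |z_k^n - x_k| + |x_k - x_{k-1}| + \left| x_{k-1} - z_{k-1}^n \right| < 3\delta. \]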
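The step functions in this lemma take the value of $f$ at the left endpoint of each partition interval. The sketch below builds such a step function on a uniform partition and measures the error on a sample grid. It is an illustration only: it ignores the requirement that the nodes avoid the countable set $D$, and the names `step_approx`, `nodes`, and the choice $f = \sin$ on $[0, 3]$ are assumptions made for the example.

```python
import bisect
import math

def step_approx(f, nodes):
    """Return s with s(x) = f(nodes[k]) for x in [nodes[k], nodes[k+1])."""
    vals = [f(z) for z in nodes[:-1]]
    def s(x):
        k = bisect.bisect_right(nodes, x) - 1
        k = min(max(k, 0), len(vals) - 1)   # clamp so the right endpoint is covered
        return vals[k]
    return s

a, b, m = 0.0, 3.0, 300
nodes = [a + (b - a) * k / m for k in range(m + 1)]
s = step_approx(math.sin, nodes)

# sin has derivative bounded by 1, so |sin(x) - s(x)| is at most the mesh size
mesh = (b - a) / m
samples = [a + (b - a) * j / 5000 for j in range(5001)]
max_err = max(abs(math.sin(x) - s(x)) for x in samples)
assert max_err <= mesh + 1e-9
```

Refining the partition shrinks the mesh and therefore the uniform error, which is the mechanism behind the bound $1/n$ in the lemma.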
Proof: Since $F$ is an increasing function it can have only countably many
discontinuities. The reason for this is that the only kind of discontinuity it can have is
where $F(x+) > F(x-)$. Now since $F$ is increasing, the intervals $(F(x-), F(x+))$
for $x$ a point of discontinuity are disjoint, and since each must contain a rational
number and the rational numbers are countable, these intervals are countable also.
Let $D$ denote this countable set of discontinuities of $F$. Then if $l, r \notin D$, $[l, r] \subseteq
[a, b]$, it follows quickly from the definition of the Darboux Stieltjes integral that
\begin{align*}
\int_a^b X_{[l,r)}\,dF &= F(r+) - F(l+) = F(r-) - F(l-) \\
&= \mu([l, r)) = \int X_{[l,r)}\,d\mu.
\end{align*}
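Now let $\{s_n\}$ be the sequence of step functions of Lemma 8.9.2 such that these step
functions converge uniformly to $f$ on $[c, d]$, say $\max_x |f(x) - s_n(x)| < 1/n$. Then
\[ \left| \int \left( X_{[c,d]} f - X_{[c,d]} s_n \right) d\mu \right| \le \int \left| X_{[c,d]} (f - s_n) \right| d\mu \le \frac{1}{n} \mu([c, d]) \]
and
\[ \left| \int_a^b \left( X_{[c,d]} f - X_{[c,d]} s_n \right) dF \right| \le \int_a^b X_{[c,d]} |f - s_n|\,dF < \frac{1}{n} (F(b) - F(a)). \]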
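\begin{align*}
&= \sum_{k=1}^{m_n} f\left(z_{k-1}^n\right) \mu\left([z_{k-1}^n, z_k^n)\right) = \sum_{k=1}^{m_n} f\left(z_{k-1}^n\right) \left( F(z_k^n-) - F\left(z_{k-1}^n-\right) \right) \\
&= \sum_{k=1}^{m_n} f\left(z_{k-1}^n\right) \left( F(z_k^n) - F\left(z_{k-1}^n\right) \right) \\
&= \sum_{k=1}^{m_n} f\left(z_{k-1}^n\right) \int_a^b X_{[z_{k-1}^n, z_k^n)}\,dF = \int_a^b s_n\,dF.
\end{align*}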
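Therefore,
\begin{align*}
\left| \int X_{[c,d]} f\,d\mu - \int_a^b X_{[c,d]} f\,dF \right|
&\le \left| \int X_{[c,d]} f\,d\mu - \int X_{[c,d]} s_n\,d\mu \right| \\
&\quad + \left| \int X_{[c,d]} s_n\,d\mu - \int_a^b s_n\,dF \right| + \left| \int_a^b s_n\,dF - \int_a^b X_{[c,d]} f\,dF \right| \\
&\le \frac{1}{n} \mu([c, d]) + \frac{1}{n} (F(b) - F(a))
\end{align*}
and since $n$ is arbitrary, this shows
\[ \int f\,d\mu - \int_a^b f\,dF = 0. \]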
Thus, if $F(x) = x$ so the Darboux Stieltjes integral is the usual integral from
calculus,
\[ \int_a^b f(t)\,dt = \int X_{[a,b]} f\,d\mu \]
where $\mu$ is the measure which comes from $F(x) = x$ as described above. This
measure is often denoted by $m$. Thus when $f$ is continuous
\[ \int_a^b f(t)\,dt = \int X_{[a,b]} f\,dm \]
for either the Lebesgue or the Riemann integral. Furthermore, when $f$ is continuous,
you can compute the Lebesgue integral by using the fundamental theorem of calculus
because in this case, the two integrals are equal.
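As a numerical sanity check of this agreement, one can compare the calculus value $\int_0^1 t^2\,dt = 1/3$ with the definition of the Lebesgue integral through the distribution function, $\int_0^\infty m([f > \lambda])\,d\lambda$. The sketch below is a rough approximation under stated assumptions: `level_measure` estimates $m([f > \lambda])$ by counting midpoints of a uniform grid on $[0,1]$, and both function names and the grid sizes are choices made for this example.

```python
def level_measure(f, lam, n=2000):
    """Estimate m({t in [0,1] : f(t) > lam}) by sampling n midpoints."""
    return sum(1 for i in range(n) if f((i + 0.5) / n) > lam) / n

def layer_cake(f, top, n_lam=500):
    """Midpoint-rule estimate of the integral of lam -> m([f > lam]) over [0, top]."""
    h = top / n_lam
    return h * sum(level_measure(f, (k + 0.5) * h) for k in range(n_lam))

f = lambda t: t * t
lebesgue_estimate = layer_cake(f, top=1.0)   # f <= 1 on [0,1], so levels above 1 contribute 0
calculus_value = 1.0 / 3.0                   # from the fundamental theorem of calculus
assert abs(lebesgue_estimate - calculus_value) < 5e-3
```

The two numbers agree up to the discretization error, as the equality of the two integrals for continuous $f$ predicts.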
8.10 Exercises
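1. Let $\Omega = \mathbb{N} = \{1, 2, \cdots\}$. Let $\mathcal{F} = \mathcal{P}(\mathbb{N})$, the set of all subsets of $\mathbb{N}$, and let
$\mu(S) =$ number of elements in $S$. Thus $\mu(\{1\}) = 1 = \mu(\{2\})$, $\mu(\{1, 2\}) = 2$,
etc. Show $(\Omega, \mathcal{F}, \mu)$ is a measure space. It is called counting measure. What
functions are measurable in this case? For a nonnegative function $f$ defined
on $\mathbb{N}$, show
\[ \int_{\mathbb{N}} f\,d\mu = \sum_{k=1}^{\infty} f(k) \]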
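3. If $(\Omega, \mathcal{F}, \mu)$ is a measure space and $f \ge 0$ is measurable, show that if $g(\omega) =
f(\omega)$ a.e. and $g \ge 0$, then $\int g\,d\mu = \int f\,d\mu$. Show that if $f, g \in L^1(\Omega)$ and
$g(\omega) = f(\omega)$ a.e. then $\int g\,d\mu = \int f\,d\mu$.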
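Show $\overline{\mu}$ satisfies
\[ \overline{\mu}(\emptyset) = 0, \quad \text{if } A \subseteq B,\ \overline{\mu}(A) \le \overline{\mu}(B), \]
\[ \overline{\mu}(\cup_{i=1}^{\infty} A_i) \le \sum_{i=1}^{\infty} \overline{\mu}(A_i), \quad \overline{\mu}(A) = \mu(A) \text{ if } A \in \mathcal{F}. \]
etc. Now consider what it means for $f_{n_k}(x)$ to fail to converge to $f(x)$. Then
use Problem 9.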
11. Suppose $(\Omega, \mu)$ is a finite measure space ($\mu(\Omega) < \infty$) and $S \subseteq L^1(\Omega)$. Then
$S$ is said to be uniformly integrable if for every $\varepsilon > 0$ there exists $\delta > 0$ such
that if $E$ is a measurable set satisfying $\mu(E) < \delta$, then
\[ \int_E |f|\,d\mu < \varepsilon \]
for all $f \in S$.
12. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and suppose $f, g : \Omega \to (-\infty, \infty]$ are
measurable. Prove the sets
Show $S$ is measurable. Hint: You might try to exhibit the set where $f_n$
converges in terms of countable unions and intersections using the definition
of a Cauchy sequence.
14. Suppose $u_n(t)$ is a differentiable function for $t \in (a, b)$ and suppose that for
$t \in (a, b)$,
\[ |u_n(t)|,\ |u_n'(t)| < K_n \]
where $\sum_{n=1}^{\infty} K_n < \infty$. Show
\[ \left( \sum_{n=1}^{\infty} u_n(t) \right)' = \sum_{n=1}^{\infty} u_n'(t). \]
Hint: Use the monotone convergence theorem along with the fact the integral
is linear.
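16. The integral $\int_{-\infty}^{\infty} f(t)\,dt$ will denote the Lebesgue integral taken with
respect to one dimensional Lebesgue measure as discussed earlier. Show that
for $\alpha > 0$, $t \to e^{-\alpha t^2}$ is in $L^1(\mathbb{R})$. The gamma function is defined for $x > 0$ as
\[ \Gamma(x) \equiv \int_0^{\infty} e^{-t} t^{x-1}\,dt \]
\[ \Gamma(x + 1) = x\Gamma(x), \quad \Gamma(1) = 1. \]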
17. This problem outlines a treatment of Stirling's formula which is a very useful
approximation to $n!$ based on a section in [39]. It is an excellent application
of the monotone convergence theorem. Follow and justify the following steps
using the convergence theorems for the Lebesgue integral as needed. Here
$x > 0$.
\[ \Gamma(x + 1) = \int_0^{\infty} e^{-t} t^x\,dt \]
isn't too bad if you take ln and then use L'Hospital's rule. Consider the
integral. Explain why it must be increasing in $x$. Next justify the following
assertion. Remember the monotone convergence theorem applies to a sequence
of functions.
\[ \lim_{x \to \infty} \int_{-\sqrt{x}}^{\infty} \left( e^{-s/\sqrt{x}} \left( 1 + \frac{s}{\sqrt{x}} \right) \right)^x ds = \int_{-\infty}^{\infty} e^{-s^2/2}\,ds \]
where this last improper integral equals a well defined constant (why?). It
is very easy, when you know something about multiple integrals of functions
of more than one variable, to verify this constant is $\sqrt{2\pi}$ but the necessary
mathematical machinery has not yet been presented. It can also be done
through much more difficult arguments in the context of functions of only one
variable. See [39] for these clever arguments.
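18. To show you the power of Stirling's formula, find whether the series
\[ \sum_{n=1}^{\infty} \frac{n!\,e^n}{n^n} \]
converges. The ratio test falls flat but you can try it if you like. Now explain
why, if $n$ is large enough
\[ n! \ge \frac{1}{2} \left( \int_{-\infty}^{\infty} e^{-s^2/2}\,ds \right) e^{-n} n^{n+(1/2)} \equiv c\sqrt{2}\,e^{-n} n^{n+(1/2)}. \]
Use this.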
19. Give a theorem in which the improper Riemann integral coincides with a
suitable Lebesgue integral. (There are many such situations; just find one.)
20. Note that $\int_0^{\infty} \frac{\sin x}{x}\,dx$ is a valid improper Riemann integral defined by
\[ \lim_{R \to \infty} \int_0^R \frac{\sin x}{x}\,dx \]
but this function, $\sin x / x$, is not in $L^1([0, \infty))$. Why?
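For Problems 16 through 18, a quick numerical look can suggest what to expect before proving anything. The sketch below uses only Python's standard `math.gamma` and `math.lgamma`; the helper names `stirling_ratio` and `series_term` are illustrative. It checks the recursion $\Gamma(x+1) = x\Gamma(x)$ at one point, compares $n!$ with $\sqrt{2\pi}\,e^{-n} n^{n+1/2}$, and evaluates the terms $n!\,e^n/n^n$ of the series in Problem 18.

```python
import math

def stirling_ratio(n):
    """n! divided by sqrt(2*pi) * exp(-n) * n**(n + 1/2), computed via logarithms."""
    log_ratio = math.lgamma(n + 1) - ((n + 0.5) * math.log(n) - n)
    return math.exp(log_ratio) / math.sqrt(2 * math.pi)

def series_term(n):
    """The term n! * e**n / n**n of the series in Problem 18."""
    return math.exp(math.lgamma(n + 1) + n - n * math.log(n))

# Gamma(x + 1) = x * Gamma(x), checked at x = 4.5
assert math.isclose(math.gamma(5.5), 4.5 * math.gamma(4.5), rel_tol=1e-12)

for n in (10, 100, 1000):
    print(n, stirling_ratio(n), series_term(n))
```

The ratio stays close to 1 while the terms of the series keep growing, roughly like a constant times $\sqrt{n}$, which is consistent with the lower bound on $n!$ that the problem asks you to justify.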
(E \ E ) (E \ E) N
2. $\lambda = \mu$ on $\mathcal{F}$
3. $\mathcal{G} \supseteq \mathcal{F}$
214 THE LEBESGUE INTEGRAL FOR FUNCTIONS OF N VARIABLES
\[ \mu(G \setminus F) = \lambda(G \setminus F) = 0 \tag{9.1.1} \]
Thus F G..
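Finally suppose $\mu$ is $\sigma$ finite. Let $\Omega = \cup_{n=1}^{\infty} \Omega_n$ where the $\Omega_n$ are disjoint sets of
$\mathcal{F}$ and $\mu(\Omega_n) < \infty$. Letting $A \in \mathcal{G}$, consider $A_n \equiv A \cap \Omega_n$. From what was just
shown, there exists $G_n \supseteq A^C \cap \Omega_n$, $G_n \subseteq \Omega_n$ such that $\mu(G_n) = \lambda\left(A^C \cap \Omega_n\right)$.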
Gn
n AC
n A
( ( )) ( )
A n \ GC
n n = (A n Gn ) = (A Gn ) = Gn \ AC
( ( ))
Gn \ AC n = 0.
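Also, there exists $G_n \supseteq A_n$ such that $\mu(G_n) = \lambda(G_n) = \lambda(A_n)$. Since the measures
are finite, it follows that $\lambda(G_n \setminus A_n) = 0$. Then letting $G = \cup_{n=1}^{\infty} G_n$, it follows that
$G \supseteq A$ and
\begin{align*}
\lambda(G \setminus A) &= \lambda\left( \cup_{n=1}^{\infty} G_n \setminus \cup_{n=1}^{\infty} A_n \right) \\
&\le \lambda\left( \cup_{n=1}^{\infty} (G_n \setminus A_n) \right) \le \sum_{n=1}^{\infty} \lambda(G_n \setminus A_n) = 0.
\end{align*}
Thus $\mu(G \setminus F) = \lambda(G \setminus F) = \lambda(G \setminus A) + \lambda(A \setminus F) = 0$.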
If you have , G complete and satisfying 9.1.1, then letting E G , it follows
from 5, there exist F, G F such that
F E G, (G \ F ) = 0 = (G \ F ) .
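Now consider the last assertion. By Theorem 7.5.6 on Page 175 there exists an
increasing sequence of nonnegative simple functions, $\{s_n\}$, measurable with respect
to $\mathcal{G}$, which converges pointwise to $f$. Letting
\[ s_n(\omega) = \sum_{k=1}^{m_n} c_k^n X_{E_k^n}(\omega) \tag{9.1.2} \]
be one of these simple functions, it follows from Theorem 9.1.2, there exist sets,
$F_k^n \in \mathcal{F}$ such that $F_k^n \subseteq E_k^n \subseteq G_k^n$ and $\mu(G_k^n \setminus F_k^n) = 0$. Then let
\[ b_n(\omega) \equiv \sum_{k=1}^{m_n} c_k^n X_{F_k^n}(\omega), \quad t_n(\omega) = \sum_{k=1}^{m_n} c_k^n X_{G_k^n}(\omega). \]
\[ g(\omega) \equiv \limsup_{n \to \infty} t_n(\omega) \]
and let
\[ h(\omega) = \liminf_{n \to \infty} b_n(\omega). \]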
Definition 9.1.4 Let $(X, \mathcal{F}, \mu)$ be a measure space where the $\sigma$ algebra $\mathcal{F}$ contains
$\mathcal{B}(X)$, the Borel sets. This is called a regular measure space if for every $E \in \mathcal{F}$,
\[ \mu(E) = \sup \{ \mu(K) : K \subseteq E,\ K \text{ compact} \} \]
and
\[ \mu(E) = \inf \{ \mu(V) : V \supseteq E,\ V \text{ open} \} \]
Note that this is equivalent to the existence of a set $F \subseteq E$ which is the countable
union of compact sets and a set $G \supseteq E$ which is the countable intersection of open
sets such that
\[ \mu(E) = \mu(F) = \mu(G). \]
Sets which are the countable intersection of open sets are called $G_\delta$ sets and those
which are the countable union of compact sets are called $F_\sigma$ sets.$^1$
In the context of a finite measure space, this can be strengthened to mean the
same as $\mu(G \setminus F) = 0$ and it is in this form that regularity will usually be referred
to. The following theorem is very interesting in this regard.
$^1$ Actually, $F_\sigma$ usually means that the set is a countable union of closed sets but in the situations
considered in this book it is usually the same thing.
9.1. COMPLETION OF MEASURE SPACES, APPROXIMATION 217
Theorem 9.1.5 Let $\mu$ be a measure defined on the Borel sets of $\mathbb{R}^n$ which is finite
on compact sets. Then if $E$ is any Borel set, there exists $F$ a countable union of
compact sets and $G$, a countable intersection of open sets, such that $F \subseteq E \subseteq G$
and $\mu(G \setminus F) = 0$.
Proof: First suppose that $\mu$ is a finite measure. Let $\mathcal{F}$ denote those Borel sets $E$ such that the
following two conditions hold. These will be referred to as inner regular on $E$ and
outer regular on $E$.
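\[ V^C = \cup_{n=1}^{\infty} V^C \cap \overline{B(0, n)} \equiv \cup_{n=1}^{\infty} K_n, \]
the union of an increasing sequence of compact sets. Therefore, for all $n$ large
enough,
\[ \mu\left(E^C \setminus K_n\right) \le \mu\left(E^C \setminus V^C\right) + \mu\left(V^C \setminus K_n\right) < \varepsilon. \]
This has shown that $\mathcal{F}$ is closed with respect to complements.
Next consider $E_i \in \mathcal{F}$. There exist open sets $V_i \supseteq E_i$ such that $\mu(V_i \setminus E_i) <
\varepsilon/2^i$. Then
\[ \mu((\cup_i V_i) \setminus (\cup_i E_i)) \le \mu(\cup_i (V_i \setminus E_i)) \le \sum_{i=1}^{\infty} \mu(V_i \setminus E_i) < \varepsilon. \]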
It follows that
( n
i=1 Ei \ i=1 Ki ) (i=1 Ei \ i=1 Ki ) + (i=1 Ki \ i=1 Ki ) <
n
The following little theorem shows that if you have a measure which is regular
on $\mathcal{B}(X)$, then the completion is also regular on the enlarged $\sigma$ algebra coming
from the completion.
Theorem 9.1.6 Suppose $\mu$ is a regular measure defined on $\mathcal{B}(\Omega)$, the Borel sets of
$\Omega$, a topological space. Also suppose $\mu$ is $\sigma$ finite. Then denoting by $\left(\Omega, \overline{\mathcal{B}(\Omega)}, \overline{\mu}\right)$
the completion of $(\Omega, \mathcal{B}(\Omega), \mu)$, it follows $\overline{\mu}$ is also regular.
(G \ H ) = (G \ H )
(G \ G) + (G \ H) + (H \ H ) = 0.
Corollary 9.1.7 The conclusion of the above theorem holds for $\Omega$ replaced with $Y$
where $Y$ is a closed subset of $\Omega$.
9.2. DYNKIN SYSTEMS 219
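1. $\mathcal{K} \subseteq \mathcal{G}$
2. If $A \in \mathcal{G}$, then $A^C \in \mathcal{G}$
3. If $\{A_i\}_{i=1}^{\infty}$ is a sequence of disjoint sets from $\mathcal{G}$ then
$\cup_{i=1}^{\infty} A_i \in \mathcal{G}$.
\[ \mathcal{H} \equiv \{\mathcal{G} : \text{1 - 3 all hold}\} \]
\[ \mathcal{G}_A \equiv \{B \in \mathcal{G} : A \cap B \in \mathcal{G}\}. \]
I want to show $\mathcal{G}_A$ satisfies 1 - 3 because then it must equal $\mathcal{G}$ since $\mathcal{G}$ is the smallest
collection of subsets of $\Omega$ which satisfies 1 - 3. This will give the conclusion that for
$A \in \mathcal{K}$ and $B \in \mathcal{G}$, $A \cap B \in \mathcal{G}$. This information will then be used to show that if
$A, B \in \mathcal{G}$ then $A \cap B \in \mathcal{G}$. From this it will follow very easily that $\mathcal{G}$ is a $\sigma$ algebra
which will imply it contains $\sigma(\mathcal{K})$. Now here are the details of the argument.
Since $\mathcal{K}$ is given to be a $\pi$ system, $\mathcal{K} \subseteq \mathcal{G}_A$. Property 3 is obvious because if
$\{B_i\}$ is a sequence of disjoint sets in $\mathcal{G}_A$, then
\[ A \cap \cup_{i=1}^{\infty} B_i = \cup_{i=1}^{\infty} A \cap B_i \in \mathcal{G} \]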
A B
because it was just shown that nite intersections of sets of G are in G. Since the
Ai are disjoint, it follows
i=1 Ai = i=1 Ai G
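On a finite set the statement being proved can be tested by brute force: starting from a $\pi$ system $\mathcal{K}$, closing under complements and disjoint unions (properties 1 - 3) should produce the same family as closing under complements and arbitrary unions, namely $\sigma(\mathcal{K})$. The sketch below is only such a finite experiment; the function `close`, the set `omega`, and the particular family `K` are choices made for this example, and a finite check is of course no substitute for the argument in the text.

```python
from itertools import combinations

def close(family, omega, disjoint_only):
    """Smallest family containing `family` that is closed under complements and
    under pairwise unions (only unions of disjoint sets if `disjoint_only`)."""
    fam = set(family)
    changed = True
    while changed:
        changed = False
        new = {omega - a for a in fam}
        for a, b in combinations(fam, 2):
            if not disjoint_only or not (a & b):
                new.add(a | b)
        if not new <= fam:
            fam |= new
            changed = True
    return fam

omega = frozenset(range(4))
# A pi system: contains the empty set and is closed under finite intersections.
K = {frozenset(), frozenset({0, 1}), frozenset({1, 2}), frozenset({1})}
assert all((a & b) in K for a in K for b in K)

dynkin = close(K, omega, disjoint_only=True)    # properties 1 - 3
sigma = close(K, omega, disjoint_only=False)    # the generated sigma algebra
assert dynkin == sigma
print(len(dynkin))
```

If the family `K` is replaced by one that is not closed under intersections, the two closures can differ, which is why the $\pi$ system hypothesis matters.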
and continue this way. The iterated integral is said to make sense if the process just
described makes sense at each step. Thus, to make sense, it is required that
\[ x_{i_1} \to f(x_1, \cdots, x_n) \]
can be integrated. Either the function has values in $[0, \infty]$ and is measurable or it
is a function in $L^1$. Then it is required that
\[ x_{i_2} \to \int f(x_1, \cdots, x_n)\,dx_{i_1} \]
can be integrated and so forth. The symbol in 9.3.3 is called an iterated integral.
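Also for every Borel set $E$, there exist $F$ an $F_\sigma$ set and $G$ a $G_\delta$ set such that
\[ F \subseteq E \subseteq G \]
and $m_n(G \setminus F) = 0$.
Also let $R_p \equiv [-p, p]^n$, the $n$ dimensional rectangle having sides $[-p, p]$. A set $F \subseteq
\mathbb{R}^n$ will be said to satisfy property P if for every $p \in \mathbb{N}$ and any two permutations
of $\{1, 2, \cdots, n\}$, $(i_1, \cdots, i_n)$ and $(j_1, \cdots, j_n)$, the two iterated integrals
\[ \int \cdots \int X_{R_p \cap F}\,dx_{i_1} \cdots dx_{i_n}, \quad \int \cdots \int X_{R_p \cap F}\,dx_{j_1} \cdots dx_{j_n} \]
make sense and are equal. Now define $\mathcal{G}$ to be those subsets of $\mathbb{R}^n$ which have
property P.
Thus $\mathcal{K} \subseteq \mathcal{G}$ because if $(i_1, \cdots, i_n)$ is any permutation of $\{1, 2, \cdots, n\}$ and
\[ A = \prod_{i=1}^{n} A_i \in \mathcal{K} \]
then
\[ \int \cdots \int X_{R_p \cap A}\,dx_{i_1} \cdots dx_{i_n} = \prod_{i=1}^{n} m([-p, p] \cap A_i). \]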
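Now suppose $F \in \mathcal{G}$ and let $(i_1, \cdots, i_n)$ and $(j_1, \cdots, j_n)$ be two permutations. Then
\[ R_p = \left( R_p \cap F^C \right) \cup (R_p \cap F) \]
and so
\[ \int \cdots \int X_{R_p \cap F^C}\,dx_{i_1} \cdots dx_{i_n} = \int \cdots \int \left( X_{R_p} - X_{R_p \cap F} \right) dx_{i_1} \cdots dx_{i_n}. \]
Since $R_p \in \mathcal{G}$, the iterated integrals on the right and hence on the left make
sense. Then continuing with the expression on the right and using that $F \in \mathcal{G}$,
\[ \int \cdots \int \left( X_{R_p} - X_{R_p \cap F} \right) dx_{i_1} \cdots dx_{i_n} = \]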
9.3. N DIMENSIONAL LEBESGUE MEASURE AND INTEGRALS 223
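\begin{align*}
&(2p)^n - \int \cdots \int X_{R_p \cap F}\,dx_{i_1} \cdots dx_{i_n} = (2p)^n - \int \cdots \int X_{R_p \cap F}\,dx_{j_1} \cdots dx_{j_n} \\
&= \int \cdots \int \left( X_{R_p} - X_{R_p \cap F} \right) dx_{j_1} \cdots dx_{j_n} = \int \cdots \int X_{R_p \cap F^C}\,dx_{j_1} \cdots dx_{j_n}
\end{align*}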
N
= lim XRp Fk dxi1 dxin
N
k=1
Do the iterated integrals make sense? Note that the iterated integral makes sense
for $\sum_{k=1}^{N} X_{R_p \cap F_k}$ as the integrand because it is just a finite sum of functions for
which the iterated integral makes sense. Therefore,
\[ x_{i_1} \to \sum_{k=1}^{\infty} X_{R_p \cap F_k}(x) \]
is also measurable. Therefore, one can do another integral to this function.
Continuing this way using the monotone convergence theorem, it follows the iterated
integral makes sense. The same reasoning shows the iterated integral makes sense
for any other permutation.
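Now applying the monotone convergence theorem as needed,
\begin{align*}
\int \cdots \int X_{R_p \cap F}\,dx_{i_1} \cdots dx_{i_n} &= \int \cdots \int \sum_{k=1}^{\infty} X_{R_p \cap F_k}\,dx_{i_1} \cdots dx_{i_n} \\
&= \int \cdots \int \lim_{N \to \infty} \sum_{k=1}^{N} X_{R_p \cap F_k}\,dx_{i_1} \cdots dx_{i_n} \\
&= \int \cdots \int \lim_{N \to \infty} \int \sum_{k=1}^{N} X_{R_p \cap F_k}\,dx_{i_1} \cdots dx_{i_n} \\
&= \lim_{N \to \infty} \int \cdots \int \sum_{k=1}^{N} X_{R_p \cap F_k}\,dx_{i_1} \cdots dx_{i_n} \\
&= \lim_{N \to \infty} \sum_{k=1}^{N} \int \cdots \int X_{R_p \cap F_k}\,dx_{i_1} \cdots dx_{i_n} \\
&= \lim_{N \to \infty} \sum_{k=1}^{N} \int \cdots \int X_{R_p \cap F_k}\,dx_{j_1} \cdots dx_{j_n}
\end{align*}
the last step holding because each $F_k \in \mathcal{G}$. Then repeating the steps above in the
opposite order, this equals
\[ \int \cdots \int \sum_{k=1}^{\infty} X_{R_p \cap F_k}\,dx_{j_1} \cdots dx_{j_n} = \int \cdots \int X_{R_p \cap F}\,dx_{j_1} \cdots dx_{j_n} \]
Using the monotone convergence theorem repeatedly as in the first part of the
argument, this equals
\[ \sum_{k=1}^{\infty} \lim_{p \to \infty} \int \cdots \int X_{R_p \cap F_k}\,dx_{j_1} \cdots dx_{j_n} \equiv \sum_{k=1}^{\infty} m_n(F_k). \]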
Applying the monotone convergence theorem repeatedly on the right, this yields
that the iterated integral makes sense and
\[ \int_{\mathbb{R}^n} X_F\,dm_n = \int \cdots \int X_F\,dx_{j_1} \cdots dx_{j_n} \]
It follows 9.3.4 holds for every nonnegative simple function in place of $f$ because
these are just linear combinations of functions, $X_F$. Now taking an increasing
sequence of nonnegative simple functions, $\{s_k\}$, which converges to a measurable
nonnegative function $f$,
\begin{align*}
\int_{\mathbb{R}^n} f\,dm_n &= \lim_{k \to \infty} \int_{\mathbb{R}^n} s_k\,dm_n = \lim_{k \to \infty} \int \cdots \int s_k\,dx_{j_1} \cdots dx_{j_n} \\
&= \int \cdots \int f\,dx_{j_1} \cdots dx_{j_n}
\end{align*}
The assertion about regularity on the Borel sets follows from Theorem 9.1.5
because the measure is finite on compact sets.
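Definition 9.3.4 The measure space for Lebesgue measure is $(\mathbb{R}^n, \mathcal{F}_n, m_n)$ where
$\mathcal{F}_n$ is the completion of $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), m_n)$, where $\mathcal{B}(\mathbb{R}^n)$ denotes the Borel sets and $m_n$ is $n$
dimensional measure defined on $\mathcal{B}(\mathbb{R}^n)$.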
The important thing about Lebesgue measure is that the measure space is complete
and it is a regular measure space.
Theorem 9.3.5 The measure space $(\mathbb{R}^n, \mathcal{F}_n, m_n)$ is complete and regular. This
means that for every $E \in \mathcal{F}_n$, there exists an $F_\sigma$ set $F$ and a $G_\delta$ set $G$ such that
$G \supseteq E \supseteq F$ and $m_n(G \setminus F) = 0$. Also if $f \ge 0$ is $\mathcal{F}_n$ measurable, then there exists
$g \le f$ such that $g$ is Borel measurable and $g = f$ a.e.
Proof: Let $E \in \mathcal{F}_n$. From Theorem 9.1.2 there exist Borel sets $A, B$ such that
$A \subseteq E \subseteq B$ and $m_n(B \setminus A) = 0$. Now from Theorem 9.1.5, there exists a $G_\delta$ set $G$
containing $B$ and an $F_\sigma$ set $F$ contained in $A$ such that $m_n(A \setminus F) = m_n(G \setminus B) =
0$. Then
\[ m_n(G \setminus F) = m_n(G \setminus B) + m_n(B \setminus A) + m_n(A \setminus F) = 0. \]
Now consider the last claim. Let $s_k$ be an increasing sequence of simple functions
which converges pointwise to $f$. Say $s_k(x) = \sum_{i=1}^{m_k} c_i X_{E_i}(x)$ where $E_i \in \mathcal{F}_n$.
Then let $F_i \subseteq E_i$ such that $F_i$ is Borel measurable and $m_n(E_i \setminus F_i) = 0$. Letting
$N_k = \cup_{i=1}^{m_k} (E_i \setminus F_i)$, and $N = \cup_{k=1}^{\infty} N_k$, it follows $m_n(N) = 0$. Now let $N'$ be
a Borel measurable set such that $N'$ has measure zero and $N \subseteq N'$. Then each
function in the sequence of simple functions given by $X_{(N')^C} \sum_{i=1}^{m_k} c_i X_{F_i}$ is Borel
measurable, and the sequence converges to a Borel measurable function which equals
$f$ off the exceptional set of measure zero $N'$ and equals 0 on $N'$.
I defined $(\mathcal{F}_n, m_n)$ as the completion of $(\mathcal{B}(\mathbb{R}^n), m_n)$ but the same thing would
have been obtained if I had used $(\mathcal{F}^n, m_n)$. This can be shown using the
regularity of one dimensional Lebesgue measure. You might try to show this.
In particular, iterated integrals for any permutation of $\{1, \cdots, n\}$ are all equal.
Proof: It suffices to prove this for $f$ having real values because if this is shown
the general case is obtained by taking real and imaginary parts. Since $f \in L^1(\mathbb{R}^n)$,
\[ \int_{\mathbb{R}^n} |f| \, dm_n < \infty \]
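and so both $\frac{1}{2}(|f| + f)$ and $\frac{1}{2}(|f| - f)$ are in $L^1(\mathbb{R}^n)$ and are each nonnegative.
Hence from Proposition 9.3.3,
\[ \begin{aligned}
\int_{\mathbb{R}^n} f \, dm_n &= \int_{\mathbb{R}^n} \left[ \frac{1}{2}(|f| + f) - \frac{1}{2}(|f| - f) \right] dm_n \\
&= \int_{\mathbb{R}^n} \frac{1}{2}(|f| + f) \, dm_n - \int_{\mathbb{R}^n} \frac{1}{2}(|f| - f) \, dm_n \\
&= \int \cdots \int \frac{1}{2}(|f(x)| + f(x)) \, dx_{i_1} \cdots dx_{i_n} - \int \cdots \int \frac{1}{2}(|f(x)| - f(x)) \, dx_{i_1} \cdots dx_{i_n} \\
&= \int \cdots \int \left[ \frac{1}{2}(|f(x)| + f(x)) - \frac{1}{2}(|f(x)| - f(x)) \right] dx_{i_1} \cdots dx_{i_n} \\
&= \int \cdots \int f(x) \, dx_{i_1} \cdots dx_{i_n}.
\end{aligned} \]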
The following corollary is a convenient way to verify the hypotheses of the above
theorem.
Corollary 9.3.7 Suppose $f$ is measurable with respect to $\mathcal{F}^n$ and suppose for some
permutation $(i_1, \cdots, i_n)$
\[ \int \cdots \int |f(x)| \, dx_{i_1} \cdots dx_{i_n} < \infty. \]
Then $f \in L^1(\mathbb{R}^n)$.
and so $f$ is in $L^1(\mathbb{R}^n)$.
In using Proposition 9.3.3 or Corollary 9.3.7 or Theorem 9.3.6 when $f$ is only
known to be $\mathcal{F}_n$ measurable, one typically uses Theorem 9.3.5 to get a function $g$
which is equal to $f$ $m_n$ a.e., but $g$ is Borel and hence $\mathcal{F}^n$ measurable. (You use
this theorem on the positive and negative parts of the real and imaginary parts of
$f$.) Then every Lebesgue integral mentioned above can be computed
by using the iterated integral of $g$. However, if you are interested in a much fussier
result, read the section on completion of product measure spaces below.
Since $\mathcal{F}^n$ contains the Borel sets, all the above formulas pertain to the case
where $f$ is Borel measurable.
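where
\[ A = \{(x, y) : x \leq y \leq 1 \text{ where } x \in [0, 1]\}. \]
Fubini's theorem can be applied because the function $(x, y) \to \sin(y)/y$ is continuous
except at $y = 0$ and can be redefined to be continuous there. The function is
also bounded so
\[ (x, y) \to \mathcal{X}_A(x, y) \frac{\sin(y)}{y} \]
clearly is in $L^1(\mathbb{R}^2)$. Therefore,
\[ \begin{aligned}
\int_{\mathbb{R}^2} \mathcal{X}_A(x, y) \frac{\sin(y)}{y} \, dm_2 &= \int \int \mathcal{X}_A(x, y) \frac{\sin(y)}{y} \, dx \, dy \\
&= \int_0^1 \int_0^y \frac{\sin(y)}{y} \, dx \, dy \\
&= \int_0^1 \sin(y) \, dy = 1 - \cos(1).
\end{aligned} \]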
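The value $1 - \cos(1)$ can be sanity-checked numerically. The following is only a sketch: it approximates the iterated integral in the original order, $\int_0^1 \int_x^1 \sin(y)/y \, dy \, dx$, with a midpoint rule. The function name `iterated_integral` and the grid size are illustrative choices, not part of the text.

```python
import math

def iterated_integral(n=400):
    """Midpoint-rule approximation of int_0^1 int_x^1 sin(y)/y dy dx."""
    hx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx              # midpoint in the outer variable
        hy = (1.0 - x) / n              # inner interval is [x, 1]
        inner = 0.0
        for j in range(n):
            y = x + (j + 0.5) * hy      # midpoints are strictly positive
            inner += math.sin(y) / y
        total += inner * hy * hx
    return total

approx = iterated_integral()
exact = 1.0 - math.cos(1.0)             # value obtained after swapping the order
print(approx, exact)
```

Swapping the order, as Fubini's theorem permits here, is what removes the awkward factor $1/y$ and makes the integral computable by hand.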
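\[ (\mu \times \lambda)(A \times B) = \mu(A) \lambda(B) \]
whenever $A \in \mathcal{F}$ and $B \in \mathcal{S}$.
Definition 9.4.1 Let $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{S}, \lambda)$ be two measure spaces. A measurable
rectangle is a set of the form $A \times B$ where $A \in \mathcal{F}$ and $B \in \mathcal{S}$.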
9.4. PRODUCT MEASURES 229
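where in the above, part of the requirement is for all integrals to make sense.
Then $\mathcal{K} \subseteq \mathcal{G}$. This is obvious.
Next I want to show that if $E \in \mathcal{G}$ then $E^C \in \mathcal{G}$. Observe $\mathcal{X}_{E^C} = 1 - \mathcal{X}_E$ and so
\[ \begin{aligned}
\int_Y \int_X \mathcal{X}_{E^C} \, d\mu \, d\lambda &= \int_Y \int_X (1 - \mathcal{X}_E) \, d\mu \, d\lambda \\
&= \int_X \int_Y (1 - \mathcal{X}_E) \, d\lambda \, d\mu \\
&= \int_X \int_Y \mathcal{X}_{E^C} \, d\lambda \, d\mu.
\end{aligned} \]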
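Next I want to show $\mathcal{G}$ is closed under countable unions of disjoint sets of $\mathcal{G}$. Let
$\{A_i\}$ be a sequence of disjoint sets from $\mathcal{G}$. Then
\[ \begin{aligned}
\int_Y \int_X \mathcal{X}_{\cup_{i=1}^{\infty} A_i} \, d\mu \, d\lambda &= \int_Y \int_X \sum_{i=1}^{\infty} \mathcal{X}_{A_i} \, d\mu \, d\lambda = \int_Y \sum_{i=1}^{\infty} \int_X \mathcal{X}_{A_i} \, d\mu \, d\lambda \\
&= \sum_{i=1}^{\infty} \int_Y \int_X \mathcal{X}_{A_i} \, d\mu \, d\lambda = \sum_{i=1}^{\infty} \int_X \int_Y \mathcal{X}_{A_i} \, d\lambda \, d\mu \\
&= \int_X \sum_{i=1}^{\infty} \int_Y \mathcal{X}_{A_i} \, d\lambda \, d\mu = \int_X \int_Y \sum_{i=1}^{\infty} \mathcal{X}_{A_i} \, d\lambda \, d\mu \\
&= \int_X \int_Y \mathcal{X}_{\cup_{i=1}^{\infty} A_i} \, d\lambda \, d\mu,
\end{aligned} \tag{9.4.6} \]
the interchanges between the summation and the integral depending on the monotone
convergence theorem. Thus $\mathcal{G}$ is closed with respect to countable disjoint
unions.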
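From Lemma 9.2.2, $\mathcal{G} \supseteq \sigma(\mathcal{K})$. Also the computation in 9.4.6 implies that on
$\sigma(\mathcal{K})$ one can define a measure, denoted by $\mu \times \lambda$, and that for every $E \in \sigma(\mathcal{K})$,
\[ (\mu \times \lambda)(E) = \int_Y \int_X \mathcal{X}_E \, d\mu \, d\lambda = \int_X \int_Y \mathcal{X}_E \, d\lambda \, d\mu. \tag{9.4.7} \]
Now here is Fubini's theorem.
Theorem 9.4.2 Let $f : X \times Y \to [0, \infty]$ be measurable with respect to the $\sigma$ algebra
$\sigma(\mathcal{K})$ just defined and let $\mu \times \lambda$ be the product measure of 9.4.7 where $\mu$ and $\lambda$ are
finite measures on $(X, \mathcal{F})$ and $(Y, \mathcal{S})$ respectively. Then
\[ \int_{X \times Y} f \, d(\mu \times \lambda) = \int_Y \int_X f \, d\mu \, d\lambda = \int_X \int_Y f \, d\lambda \, d\mu. \]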
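To see the statement in the simplest possible setting, here is a small sketch with two finite sets carrying point masses, so that every integral is a finite weighted sum. The weights `mu`, `lam` and the function `f` are made-up illustrative data, not taken from the text; exact rational arithmetic is used so the three numbers can be compared with equality.

```python
from fractions import Fraction

# Hypothetical finite measure spaces: each point carries a nonnegative mass.
mu = {"a": Fraction(1, 2), "b": Fraction(3, 2)}             # measure on X
lam = {0: Fraction(1), 1: Fraction(1, 3), 2: Fraction(2)}   # measure on Y

def f(x, y):
    """A nonnegative function on X x Y (illustrative choice)."""
    return (1 if x == "a" else 2) * (y + 1)

# Integral against the product measure, which gives {(x, y)} mass mu[x] * lam[y].
product_integral = sum(f(x, y) * mu[x] * lam[y] for x in mu for y in lam)

# Iterated integral: inner integral over X (d mu), outer over Y (d lam).
x_then_y = sum(sum(f(x, y) * mu[x] for x in mu) * lam[y] for y in lam)

# Iterated integral in the other order.
y_then_x = sum(sum(f(x, y) * lam[y] for y in lam) * mu[x] for x in mu)

print(product_integral, x_then_y, y_then_x)
```

In this discrete case the equality is just a rearrangement of a finite sum; the content of the theorem is that the same conclusion survives for general finite measures.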
Proof: Since the measures are $\sigma$ finite, there exist increasing sequences of sets
$\{X_n\}$ and $\{Y_n\}$ such that $\mu(X_n) < \infty$ and $\lambda(Y_n) < \infty$. Then $\mu$ and $\lambda$ restricted to
$X_n$ and $Y_n$ respectively are finite. Then from Theorem 9.4.2,
\[ \int_{Y_n} \int_{X_n} f \, d\mu \, d\lambda = \int_{X_n} \int_{Y_n} f \, d\lambda \, d\mu. \]
Then just as in the proof of Theorem 9.4.2, the conclusion of this theorem is obtained.
It is also useful to note that all the above holds for $\prod_{i=1}^n X_i$ in place of $X \times Y$.
You would simply modify the definition of $\mathcal{G}$ in 9.4.5 including all permutations for
the iterated integrals and for $\mathcal{K}$ you would use sets of the form $\prod_{i=1}^n A_i$ where $A_i$
is measurable. Everything goes through exactly as above. Thus the following is
obtained.
Theorem 9.4.4 Let $\{(X_i, \mathcal{F}_i, \mu_i)\}_{i=1}^n$ be $\sigma$ finite measure spaces and let $\prod_{i=1}^n \mathcal{F}_i$
denote the smallest $\sigma$ algebra which contains the measurable boxes of the form
$\prod_{i=1}^n A_i$ where $A_i \in \mathcal{F}_i$. Then there exists a measure $\lambda$ defined on $\prod_{i=1}^n \mathcal{F}_i$ such that
if $f : \prod_{i=1}^n X_i \to [0, \infty]$ is $\prod_{i=1}^n \mathcal{F}_i$ measurable, and $(i_1, \cdots, i_n)$ is any permutation
of $(1, \cdots, n)$, then
\[ \int f \, d\lambda = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f \, d\mu_{i_1} \cdots d\mu_{i_n}. \]
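Since $f_1 = f$ off a set of measure zero, I will dispense with the subscript. Also
it is customary to write
\[ \lambda = \mu_1 \times \cdots \times \mu_n \]
and
\[ \overline{\lambda} = \overline{\mu_1 \times \cdots \times \mu_n}. \]
Thus in more standard notation, one writes
\[ \int f \, d\left(\overline{\mu_1 \times \cdots \times \mu_n}\right) = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f \, d\mu_{i_1} \cdots d\mu_{i_n}. \]
This theorem is often referred to as Fubini's theorem. The next theorem is also
called this.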
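Corollary 9.4.6 Suppose $f \in L^1\left(\prod_{i=1}^n X_i, \overline{\prod_{i=1}^n \mathcal{F}_i}, \overline{\mu_1 \times \cdots \times \mu_n}\right)$ where each $X_i$
is a $\sigma$ finite measure space. Then if $(i_1, \cdots, i_n)$ is any permutation of $(1, \cdots, n)$,
it follows
\[ \int f \, d\left(\overline{\mu_1 \times \cdots \times \mu_n}\right) = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f \, d\mu_{i_1} \cdots d\mu_{i_n}. \]
Proof: Just apply Theorem 9.4.5 to the positive and negative parts of the real
and imaginary parts of $f$.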
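Here is another easy corollary.
Corollary 9.4.7 Suppose in the situation of Corollary 9.4.6, $f = f_1$ off $N$, a set of
$\overline{\prod_{i=1}^n \mathcal{F}_i}$ having $\overline{\mu_1 \times \cdots \times \mu_n}$ measure zero, and that $f_1$ is a complex valued function
measurable with respect to $\prod_{i=1}^n \mathcal{F}_i$. Suppose also that for some permutation of
$(1, 2, \cdots, n)$, $(j_1, \cdots, j_n)$,
\[ \int_{X_{j_n}} \cdots \int_{X_{j_1}} |f_1| \, d\mu_{j_1} \cdots d\mu_{j_n} < \infty. \]
Then
\[ f \in L^1\left(\prod_{i=1}^n X_i, \overline{\prod_{i=1}^n \mathcal{F}_i}, \overline{\mu_1 \times \cdots \times \mu_n}\right) \]
and the conclusion of Corollary 9.4.6 holds.
Proof: Since $|f_1|$ is $\prod_{i=1}^n \mathcal{F}_i$ measurable, it follows from Theorem 9.4.4 that
\[ \begin{aligned}
\infty &> \int_{X_{j_n}} \cdots \int_{X_{j_1}} |f_1| \, d\mu_{j_1} \cdots d\mu_{j_n} \\
&= \int |f_1| \, d(\mu_1 \times \cdots \times \mu_n) \\
&= \int |f_1| \, d\left(\overline{\mu_1 \times \cdots \times \mu_n}\right) \\
&= \int |f| \, d\left(\overline{\mu_1 \times \cdots \times \mu_n}\right).
\end{aligned} \]
Thus $f \in L^1\left(\prod_{i=1}^n X_i, \overline{\prod_{i=1}^n \mathcal{F}_i}, \overline{\mu_1 \times \cdots \times \mu_n}\right)$ as claimed and the rest follows from
Corollary 9.4.6.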
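The following lemma is also useful.
Lemma 9.4.8 Let $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{S}, \lambda)$ be $\sigma$ finite complete measure spaces and
suppose $f \geq 0$ is $\overline{\mathcal{F} \times \mathcal{S}}$ measurable. Then for $\mu$ a.e. $x$,
\[ y \to f(x, y) \]
is $\mathcal{S}$ measurable. Similarly for $\lambda$ a.e. $y$,
\[ x \to f(x, y) \]
is $\mathcal{F}$ measurable.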
Then it follows that for these values of $x$, $g(x, y) = h(x, y)$ and so by Theorem 9.1.3
again and the assumption that $(Y, \mathcal{S}, \lambda)$ is complete, $y \to f(x, y)$ is $\mathcal{S}$ measurable.
The other claim is similar.
9.5 Exercises
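1. Find $\int_0^2 \int_0^{6-2z} \int_{\frac{1}{2}x}^{3-z} (3 - z) \cos(y^2) \, dy \, dx \, dz$.
2. Find $\int_0^1 \int_0^{18-3z} \int_{\frac{1}{3}x}^{6-z} (6 - z) \exp(y^2) \, dy \, dx \, dz$.
3. Find $\int_0^2 \int_0^{24-4z} \int_{\frac{1}{4}y}^{6-z} (6 - z) \exp(x^2) \, dx \, dy \, dz$.
4. Find $\int_0^1 \int_0^{12-4z} \int_{\frac{1}{4}y}^{3-z} \frac{\sin x}{x} \, dx \, dy \, dz$.
5. Find $\int_0^{20} \int_0^1 \int_{\frac{1}{5}y}^{5-z} \frac{\sin x}{x} \, dx \, dz \, dy + \int_{20}^{25} \int_0^{5-\frac{1}{5}y} \int_{\frac{1}{5}y}^{5-z} \frac{\sin x}{x} \, dx \, dz \, dy$. Hint: You might
try doing it in the order $dy \, dx \, dz$.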
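Thus
\[ \int_0^R \frac{\sin(t)}{t} \, dt = \int_0^R \int_0^{\infty} \sin(t) e^{-tx} \, dx \, dt. \]
Now explain why you can change the order of integration in the above iterated
integral. Then compute what you get. Next pass to a limit as $R \to \infty$ and
show
\[ \int_0^{\infty} \frac{\sin(t)}{t} \, dt = \frac{1}{2} \pi. \]
7. Explain why $\int_a^{\infty} f(t) \, dt \equiv \lim_{r \to \infty} \int_a^r f(t) \, dt$ whenever $f \in L^1(a, \infty)$; that
is, $f \mathcal{X}_{[a, \infty)} \in L^1(\mathbb{R})$.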
8. Let $f(y) = g(y) = |y|^{-1/2}$ if $y \in (-1, 0) \cup (0, 1)$ and $f(y) = g(y) = 0$ if
$y \notin (-1, 0) \cup (0, 1)$. For which values of $x$ does it make sense to write the
integral $\int_{\mathbb{R}} f(x - y) g(y) \, dy$?
9. Let $E_i$ be a Borel set in $\mathbb{R}$. Show that $\prod_{i=1}^n E_i$ is a Borel set in $\mathbb{R}^n$.
10. Let $\{a_n\}$ be an increasing sequence of numbers in $(0, 1)$ which converges to
1. Let $g_n$ be a nonnegative function which equals zero outside $(a_n, a_{n+1})$ such
that $\int g_n \, dx = 1$. Now for $(x, y) \in [0, 1) \times [0, 1)$ define
\[ f(x, y) \equiv \sum_{n=1}^{\infty} g_n(y) \left( g_n(x) - g_{n+1}(x) \right). \]
Explain why this is actually a finite sum for each such $(x, y)$ so there are
no convergence questions in the infinite sum. Explain why $f$ is a continuous
function on $[0, 1) \times [0, 1)$. You can extend $f$ to equal zero off $[0, 1) \times [0, 1)$ if
you like. Show the iterated integrals exist but are not equal. In fact, show
\[ \int_0^1 \int_0^1 f(x, y) \, dy \, dx = 1 \neq 0 = \int_0^1 \int_0^1 f(x, y) \, dx \, dy. \]
Does this example contradict the Fubini theorem? Explain why or why not.
11. Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable. Thus $f$ is a bounded function and
by Darboux's theorem, there exists a unique number between all the upper
sums and lower sums of $f$, this number being the Riemann integral. Show
that $f$ is Lebesgue measurable and
\[ \int_a^b f(x) \, dx = \int_{[a, b]} f \, dm \]
where the second integral in the above is the Lebesgue integral taken with
respect to one dimensional Lebesgue measure and the first is the ordinary
Riemann integral.
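12. Let $(\Omega, \mathcal{F}, \mu)$ be a $\sigma$ finite measure space and let $f : \Omega \to [0, \infty)$ be measurable.
Also let $\phi : [0, \infty) \to \mathbb{R}$ be increasing with $\phi(0) = 0$ and $\phi$ a $C^1$
function. Show that
\[ \int_{\Omega} \phi \circ f \, d\mu = \int_0^{\infty} \phi'(t) \mu([f > t]) \, dt. \]
Hint: This can be done using the following steps. Let $t_i^n = i 2^{-n}$. Show that
\[ \mathcal{X}_{[f > t]}(\omega) = \lim_{n \to \infty} \sum_{i=0}^{\infty} \mathcal{X}_{[f > t_{i+1}^n]}(\omega) \mathcal{X}_{[t_i^n, t_{i+1}^n)}(t) \]
so is $\mathcal{X}_{[f > t]}(\omega) \phi'(t)$. Note that it is important in the argument to have $f > t$.
Now observe
\[ \int_{\Omega} \phi \circ f \, d\mu = \int_{\Omega} \int_0^{f(\omega)} \phi'(t) \, dt \, d\mu = \int_{\Omega} \int_0^{\infty} \mathcal{X}_{[f > t]}(\omega) \phi'(t) \, dt \, d\mu. \]
Use Fubini's theorem. For your information, this does not require the measure
space to be $\sigma$ finite. You can use a different argument which ties in to the
first definition of the Lebesgue integral. The function $t \to \mu([f > t])$ is called
the distribution function.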
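13. Give a different proof of the above as follows. First suppose $f$ is a simple
function,
\[ f(\omega) = \sum_{k=1}^n a_k \mathcal{X}_{E_k}(\omega) \]
where the $a_k$ are strictly increasing, $\phi(a_0) = a_0 \equiv 0$. Then explain carefully
the steps to the following argument.
\[ \begin{aligned}
\int_{\Omega} \phi \circ f \, d\mu &= \sum_{i=1}^n \int_{\phi(a_{i-1})}^{\phi(a_i)} \mu([\phi \circ f > t]) \, dt = \sum_{i=1}^n \int_{\phi(a_{i-1})}^{\phi(a_i)} \sum_{k=i}^n \mu(E_k) \, dt \\
&= \sum_{i=1}^n \sum_{k=i}^n \mu(E_k) \int_{a_{i-1}}^{a_i} \phi'(t) \, dt = \sum_{i=1}^n \int_{a_{i-1}}^{a_i} \phi'(t) \sum_{k=i}^n \mu(E_k) \, dt \\
&= \sum_{i=1}^n \int_{a_{i-1}}^{a_i} \phi'(t) \mu([f > t]) \, dt = \int_0^{\infty} \phi'(t) \mu([f > t]) \, dt.
\end{aligned} \]
Note that this did not require the measure space to be $\sigma$ finite and comes
directly from the definition of the integral.
14. Give another argument for the above result as follows.
\[ \int_{\Omega} \phi \circ f \, d\mu = \int_0^{\infty} \mu([\phi \circ f > t]) \, dt = \int_0^{\infty} \mu\left(\left[ f > \phi^{-1}(t) \right]\right) dt \]
and now change the variable in the last integral, letting $\phi(s) = t$. Justify the
easy manipulations.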
15. Let (x) C (x) for all > 1, (0) = 0, is strictly increasing on
[0, ), is C 1 , and suppose (, F, ) is a nite measure space. Also suppose
f, g are nonnegative measurable functions Suppose there exists > 1 such
that for all > 0 and 1 > > 0,
([f > ] [g ]) () ([f > ])
where lim0+ () = 0 where is increasing. Show there exists a constant
C depending only on , such that
f d C gd
This is called the good lambda inequality2 . Hint: Use the above problems.
Fill in the details.
(f ) d = (t) ([f > t]) dt = () ([f > ]) d
0 0
= () ([f > ] [g ]) d
0
+ () ([f > ] [g > ]) d
0
() () ([f > ]) d + () ([g > ]) d
0 0
( )
t
= () () ([f > ]) d + ([g > t]) dt
0 0
) (
= () (f ) d + g d
() C (f ) d + C/ (g) d
Now adjust . This yields the desired result in the case that (f ) d < .
What about the case where (f ) d = ? Does the good lambda estimate
hold if f is replaced with f m for m a positive constant? Recall () < .
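The distribution function identity of Problem 12, $\int \phi \circ f \, d\mu = \int_0^\infty \phi'(t) \mu([f > t]) \, dt$, can be checked numerically on a simple function. This is only a sketch under stated assumptions: the values `a_k`, the masses `mu(E_k)` and the choice $\phi(t) = t^3$ are hypothetical, and the right side is approximated with a midpoint rule.

```python
# Simple function f = sum a_k * X_{E_k} on disjoint sets E_k (hypothetical data).
values = [0.5, 1.0, 2.5]       # a_k, strictly increasing
measures = [2.0, 0.25, 1.5]    # mu(E_k)

def phi(t):
    return t ** 3              # increasing, C^1, phi(0) = 0

def dphi(t):
    return 3 * t ** 2          # phi'

# Left side: integral of phi(f) with respect to mu.
lhs = sum(phi(a) * m for a, m in zip(values, measures))

def distribution(t):
    """mu([f > t]) for the simple function above."""
    return sum(m for a, m in zip(values, measures) if a > t)

def rhs(n=20000):
    """Midpoint rule for int_0^max(f) phi'(t) mu([f > t]) dt."""
    top = max(values)          # mu([f > t]) = 0 for t >= max(f)
    h = top / n
    return sum(dphi((i + 0.5) * h) * distribution((i + 0.5) * h)
               for i in range(n)) * h

print(lhs, rhs())
```

The strict inequality in `a > t` mirrors the remark in the hint that it matters to use $[f > t]$.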
2 I have no idea why it is called the good lambda inequality. I am also not sure if there is a bad
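The limit in Problem 6, $\int_0^\infty \sin(t)/t \, dt = \pi/2$, can also be observed numerically. The sketch below uses a composite Simpson rule on $[0, R]$ for a few values of $R$; the helper names `sinc` and `simpson` are illustrative and the step size is an arbitrary choice.

```python
import math

def sinc(t):
    """sin(t)/t, extended continuously by the value 1 at t = 0."""
    return 1.0 if t == 0.0 else math.sin(t) / t

def simpson(func, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3.0

# Partial integrals int_0^R sin(t)/t dt for increasing R.
partial = {R: simpson(sinc, 0.0, float(R), 200 * R) for R in (10, 100, 1000)}
for R, value in partial.items():
    print(R, value, abs(value - math.pi / 2))
```

The convergence is slow (the error behaves like $1/R$), which is consistent with $\sin(t)/t$ not being in $L^1(0, \infty)$: the limit exists only as an improper integral.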
238 LEBESGUE MEASURABLE SETS
Proof: The regularity of mp on the Borel sets follows from Theorem 9.1.5.
Then Theorem 9.1.6 implies mp is regular. The assertion about the measure of the
Cartesian product in the case where each Ak is Borel follows from the fact that mp
is the extension of a measure for which the assertion does hold. See Theorem 9.1.2
on the completion of a measure space. In case the Ak are only Lebesgue measurable,
the equation can be obtained from regularity considerations.
It only remains to consider the claim about translation invariance. Let $\mathcal{K}$ denote
all sets of the form
\[ \prod_{k=1}^p U_k \]
\[ x + \prod_{k=1}^p U_k = \prod_{k=1}^p (x_k + U_k) \]
which is also a finite Cartesian product of finitely many open sets. Also,
\[ \begin{aligned}
m_p\left( x + \prod_{k=1}^p U_k \right) &= m_p\left( \prod_{k=1}^p (x_k + U_k) \right) \\
&= \prod_{k=1}^p m(x_k + U_k) \\
&= \prod_{k=1}^p m(U_k) = m_p\left( \prod_{k=1}^p U_k \right).
\end{aligned} \]
The step to the last line is obvious because an arbitrary open set in $\mathbb{R}$ is the disjoint
union of open intervals, and the lengths of these intervals are unchanged when they
are slid to another location.
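Now let $\mathcal{G}$ denote those sets $E$ of $\mathcal{F}^p$ (recall that $\mathcal{F}^p$ was the smallest $\sigma$ algebra
which contains all the measurable rectangles $\prod_{i=1}^p E_i$, $E_i$ Lebesgue measurable)
with the property that for each $n \in \mathbb{N}$
\[ m_p(x + E \cap (-n, n)^p) = m_p(E \cap (-n, n)^p) \]
and the set $x + E \cap (-n, n)^p$ is in $\mathcal{F}^p$. Thus $\mathcal{K} \subseteq \mathcal{G}$. If $E \in \mathcal{G}$, then
\[ \left( x + E^C \cap (-n, n)^p \right) \cup \left( x + E \cap (-n, n)^p \right) = x + (-n, n)^p \]
which implies $x + E^C \cap (-n, n)^p$ is in $\mathcal{F}^p$ since it equals a difference of two sets in
$\mathcal{F}^p$. Now consider the following.
\[ \begin{aligned}
m_p\left( x + E^C \cap (-n, n)^p \right) + m_p(E \cap (-n, n)^p)
&= m_p\left( x + E^C \cap (-n, n)^p \right) + m_p(x + E \cap (-n, n)^p) \\
&= m_p(x + (-n, n)^p) = m_p((-n, n)^p)
\end{aligned} \]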
10.1. THE ALGEBRA OF LEBESGUE MEASURABLE SETS 239
\[ = m_p\left( E^C \cap (-n, n)^p \right) + m_p(E \cap (-n, n)^p) \]
which shows
\[ m_p\left( x + E^C \cap (-n, n)^p \right) = m_p\left( E^C \cap (-n, n)^p \right) \]
showing that $E^C \in \mathcal{G}$.
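If $\{E_k\}$ is a sequence of disjoint sets of $\mathcal{G}$,
\[ m_p\left( x + \left( \cup_{k=1}^{\infty} E_k \right) \cap (-n, n)^p \right) = m_p\left( \cup_{k=1}^{\infty} \left( x + E_k \cap (-n, n)^p \right) \right). \]
Now the sets $\{x + E_k \cap (-n, n)^p\}$ are also disjoint and so the above equals
\[ \sum_k m_p(x + E_k \cap (-n, n)^p) = \sum_k m_p(E_k \cap (-n, n)^p) = m_p\left( \cup_{k=1}^{\infty} E_k \cap (-n, n)^p \right). \]
Thus $\mathcal{G}$ is also closed with respect to countable disjoint unions. It follows from the
lemma on $\pi$ systems that $\mathcal{G} \supseteq \sigma(\mathcal{K}) = \mathcal{B}(\mathbb{R}^p)$.
I have just shown that for every $E \in \mathcal{B}(\mathbb{R}^p)$, and any $n \in \mathbb{N}$,
\[ m_p(x + E \cap (-n, n)^p) = m_p(E \cap (-n, n)^p). \]
Letting $n \to \infty$ gives
\[ m_p(x + E) = m_p(E). \]
If $E$ is only Lebesgue measurable, regularity gives Borel sets $F \subseteq E \subseteq G$ with
$m_p(G \setminus F) = 0$, and then
\[ \begin{aligned}
m_p(E + x) &\leq m_p(G + x) = m_p(G) = m_p(E) \\
&= m_p(F) = m_p(F + x) \leq m_p(E + x).
\end{aligned} \]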
Lemma 10.1.3 Let f be measurable with respect to Fp . Then there exists a Borel
measurable function g, such that g = f mp a.e.
Proof: First note that every open set $V$ is the countable union of compact sets.
In fact,
\[ V = \cup_{k=1}^{\infty} \overline{B(0, k)} \cap \left\{ x \in V : \operatorname{dist}\left( x, V^C \right) \geq 1/k \right\}. \]
Now if $K \subseteq V$ where $V$ is an open set and $K$ is a compact set, it follows from the
finite intersection property of compact sets that for all $k$ large enough,
\[ K \subseteq B(0, k) \cap \left\{ x \in V : \operatorname{dist}\left( x, V^C \right) > 1/k \right\} \equiv W
\subseteq \overline{B(0, k)} \cap \left\{ x \in V : \operatorname{dist}\left( x, V^C \right) \geq 1/k \right\}, \]
a compact subset of $V$. Let
\[ h(x) = \frac{\operatorname{dist}(x, W^C)}{\operatorname{dist}(x, W^C) + \operatorname{dist}(x, K)}. \]
Then since $W$ is an open set containing the compact set $K$, it follows that the denominator
is never equal to 0. Therefore, $h$ is continuous. Also, if $x \in K$, then $h = 1$ and if $x \notin W$, then
$h = 0$. It follows that $h$ is in $C_c(\mathbb{R}^n)$, equals 1 on $K$ and vanishes off a compact
set which is contained in $V$. We denote this situation with the notation of Rudin,
$K \prec h \prec V$.
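The quotient of distances used to build $h$ is easy to experiment with in one dimension. The sketch below is only an illustration: it takes the hypothetical sets $K = [0, 1]$ and $W = (-0.5, 1.5)$, for which both distances have closed forms, and the helper names are invented for this example.

```python
def dist_to_interval(x, a, b):
    """Distance from x to the closed interval [a, b]."""
    return max(a - x, 0.0, x - b)

def dist_to_complement(x, c, d):
    """Distance from x to the complement of the open interval (c, d)."""
    return max(min(x - c, d - x), 0.0)

def h(x, K=(0.0, 1.0), W=(-0.5, 1.5)):
    """h(x) = dist(x, W^C) / (dist(x, W^C) + dist(x, K))."""
    num = dist_to_complement(x, *W)
    # The denominator is positive: on K the first term is positive,
    # and off K the second term is positive.
    return num / (num + dist_to_interval(x, *K))

for x in (-1.0, -0.5, -0.25, 0.0, 0.5, 1.0, 1.25, 1.5, 2.0):
    print(x, h(x))
```

The printed values equal 1 on $K$, equal 0 off $W$, and move continuously between the two in the gap, which is the behaviour the proof relies on.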
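Now let $f \geq 0$ and $\int f \, dm_n < \infty$. By Theorem 8.7.5, there exists a nonnegative
simple function $s(x) = \sum_{k=1}^m c_k \mathcal{X}_{E_k}(x)$, $m_n(E_k) < \infty$, such that
\[ \int |f - s| \, dm_n < \frac{\varepsilon}{2}. \]
By regularity of the measure, there exists a compact set $K_k$ and an open set $V_k$
such that
\[ K_k \subseteq E_k \subseteq V_k, \quad m_n(V_k \setminus K_k) < \frac{\varepsilon}{2 \left( \sum_{k=1}^m c_k \right)}. \]
Then from what was just shown, there exists $h_k$ such that $K_k \prec h_k \prec V_k$. Then let
$h(x) = \sum_{k=1}^m c_k h_k(x)$. Thus $h \in C_c(\mathbb{R}^n)$.
\[ \int |s - h| \, dm_n \leq \sum_{k=1}^m c_k m_n(V_k \setminus K_k) < \frac{\varepsilon}{2}. \]
Therefore,
\[ \int |f - h| \, dm_n \leq \int |f - s| \, dm_n + \int |s - h| \, dm_n < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
For an arbitrary $f \in L^1(\mathbb{R}^n)$, you simply apply the above result to positive and
negative parts of real and imaginary parts.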
10.2. CHANGE OF VARIABLES, LINEAR MAPS 241
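\[ = \prod_{i=1}^p |b_i - a_i| = m_p(Q) = |\det(A)| \, m_p(Q) \]
and so
\[ \begin{aligned}
m_p\left( A\left( E^C \cap R_n \right) \right) &= |\det(A)| \, m_p(R_n) - m_p(A(E \cap R_n)) \\
&= |\det(A)| \left( m_p(R_n) - m_p(E \cap R_n) \right) \\
&= |\det(A)| \, m_p\left( E^C \cap R_n \right).
\end{aligned} \]
It was shown above that $\mathcal{G}$ contains $\mathcal{K}$. By Lemma 9.2.2, it follows that $\mathcal{G} \supseteq \sigma(\mathcal{K}) =
\mathcal{B}(\mathbb{R}^p)$. Therefore, for any $A$ elementary and $E$ a Borel set,
\[ \begin{aligned}
|\det(A)| \, m_p(E) &= \prod_{i=1}^m |\det(A_i)| \, m_p(E) = \prod_{i=1}^{m-1} |\det(A_i)| \, m_p(A_m E) \\
&= \prod_{i=1}^{m-2} |\det(A_i)| \, m_p(A_{m-1} A_m E) = \cdots = m_p(A_1 \cdots A_m E) = m_p(AE).
\end{aligned} \]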
In case $A$ is an arbitrary matrix which has rank less than $p$, there exists a
sequence of elementary matrices such that
\[ A = E_1 E_2 \cdots E_s B \]
where $B$ is in row reduced echelon form and has at least one row of zeros. Thus if
$E$ is any Lebesgue measurable set,
\[ AE \subseteq E_1 E_2 \cdots E_s (B \mathbb{R}^p). \]
However, $B \mathbb{R}^p$ is a Borel set of measure zero because it is contained in a set of the
form
\[ F \equiv \{ x \in \mathbb{R}^p : x_k, \cdots, x_p = 0 \} \]
and this has measure zero. Therefore, $E_1 E_2 \cdots E_s (B \mathbb{R}^p)$ is a Borel set of measure
zero because
It has now been shown that for invertible $A$, and $E$ any Borel set,
and for any Lebesgue measurable set $E$ and $A$ not invertible, the above formula
holds. It only remains to verify the formula holds for $A$ invertible and $E$ only
Lebesgue measurable. However, in this case, $A$ maps open sets to open sets because
its inverse is continuous and maps compact sets to compact sets because $x \to Ax$
is continuous. Hence $A$ takes $G_\delta$ sets to $G_\delta$ sets and $F_\sigma$ sets to $F_\sigma$ sets. Let $E$ be
Lebesgue measurable. By regularity of the measure, there exist $G$ and $F$, $G_\delta$ and $F_\sigma$
sets respectively, such that $F \subseteq E \subseteq G$ and $m_p(G \setminus F) = 0$. Then $AF \subseteq AE \subseteq AG$
and
\[ m_p(AG \setminus AF) = m_p(A(G \setminus F)) = |\det(A)| \, m_p(G \setminus F) = 0. \]
By completeness, $AE$ is Lebesgue measurable. Also
The above theorem also implies easily the following version of the change of variables
formula for linear mappings.
Proof: From Theorem 10.2.1, the equation is true if $\det(A) = 0$. It follows that
it suffices to consider only the case where $A^{-1}$ exists. First suppose $f(y) = \mathcal{X}_E(y)$
where $E$ is a Lebesgue measurable set. In this case, $A(\mathbb{R}^p) = \mathbb{R}^p$. Then from
Theorem 10.2.1
\[ \begin{aligned}
\int \mathcal{X}_{A(\mathbb{R}^p)}(y) f(y) \, dy &= m_p(E) = |\det(A)| \, m_p\left( A^{-1} E \right) \\
&= |\det(A)| \int_{\mathbb{R}^p} \mathcal{X}_{A^{-1} E}(x) \, dx \\
&= |\det(A)| \int_{\mathbb{R}^p} \mathcal{X}_E(Ax) \, dx = \int_{\mathbb{R}^p} f(A(x)) \, |\det(A)| \, dx.
\end{aligned} \]
It follows from this that 10.2.1 holds whenever $f$ is a nonnegative simple function.
Finally, the general result follows from approximating the Lebesgue measurable
function with nonnegative simple functions using Theorem 7.5.6 and then applying
the monotone convergence theorem.
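The scaling $m_p(AE) = |\det(A)| \, m_p(E)$ used in the proof above can be checked by hand in the plane, where the image of the unit square under a linear map is a parallelogram whose area the shoelace formula computes exactly. The matrix `A` below is a hypothetical example, and the helper names are illustrative.

```python
def shoelace_area(pts):
    """Area of a simple polygon from its vertices listed in order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def apply(A, v):
    """Matrix-vector product for a 2 x 2 matrix given as nested lists."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

A = [[2.0, 1.0], [0.5, 3.0]]                                 # hypothetical invertible matrix
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]    # E with m_2(E) = 1
image = [apply(A, v) for v in square]                        # vertices of A(E)
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print(shoelace_area(image), abs(det_A) * shoelace_area(square))
```

Both printed numbers agree, as the theorem predicts for a Borel set as simple as a square.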
or any other. The balls can be either open or closed or neither. The proof given
here is from Basic Analysis [32].
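\[ \text{If } B_1, B_2 \in \mathcal{G} \text{ then } B_1 \cap B_2 = \emptyset, \tag{10.3.3} \]
\[ A \subseteq \cup \{ B : B \in \mathcal{F} \}. \]
Suppose
\[ \infty > M \equiv \sup \{ r : B(p, r) \in \mathcal{F} \} > 0. \]
Then there exists $\mathcal{G} \subseteq \mathcal{F}$ such that $\mathcal{G}$ consists of disjoint balls and
\[ A \subseteq \cup \{ \widetilde{B} : B \in \mathcal{G} \}. \]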
10.3. COVERING THEOREMS 245
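[Figure: the balls $B(p, r)$ and $B_0(p_0, r_0)$ with the points $x$ and $w$ marked.]
Then
\[ \begin{aligned}
|x - p_0| &\leq |x - p| + |p - w| + \overbrace{|w - p_0|}^{\leq r_0} \\
&\leq r + r + r_0 \leq 2 \overbrace{\left( \frac{2}{3} \right)^{m-1} M}^{< \frac{3}{2} r_0} + r_0 \\
&< 2 \left( \frac{2}{3} \right)^{m-1} \left( \frac{3}{2} \right)^m r_0 + r_0 = 4 r_0.
\end{aligned} \]
This proves the lemma since it shows $B(p, r) \subseteq B_0(p_0, 4 r_0)$.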
You don't need to assume the set of balls is countable. Let $\mathcal{F}$ be any collection of
balls having bounded radii. Let $\mathcal{F}'$ result from replacing each ball $B(x, r) \in \mathcal{F}$ with
the open ball $B_0(x, r(1 + \delta))$. Thus, letting $A'$ denote the union of these slightly
enlarged balls, it follows from Problem 19 on Page 60 or Lemma 7.1.5 on Page 163
that a countable subset of $\mathcal{F}'$ denoted by $\mathcal{F}''$ has the property that $\cup \mathcal{F}'' = A'$.
Thus, letting $\delta = 1/4$, the above conclusion follows for these enlarged balls and
the enlarged balls are of the form $B_0(x, 5r)$ where $B(x, r) \in \mathcal{F}$. Note that if
$B_0(x, r(1 + \delta)) \cap B_0(x_1, r_1(1 + \delta)) = \emptyset$, then $B(x, r) \cap B(x_1, r_1) = \emptyset$. This proves
the following proposition.
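\[ A \subseteq \cup \{ B : B \in \mathcal{F} \}. \]
Suppose
\[ \infty > M \equiv \sup \{ r : B(p, r) \in \mathcal{F} \} > 0. \]
Then there exists $\mathcal{G} \subseteq \mathcal{F}$ such that $\mathcal{G}$ consists of disjoint balls whose closures are
also disjoint and
\[ A \subseteq \cup \{ \widehat{B} : B \in \mathcal{G} \}. \]
Definition 10.3.4 Let $S$ be a set and let $\mathcal{C}$ be a covering of $S$ meaning that every
point of $S$ is contained in a set of $\mathcal{C}$. This covering is said to be a Vitali covering if
for each $\varepsilon > 0$ and $x \in S$, there exists a set $B \in \mathcal{C}$ containing $x$, the diameter of $B$
is less than $\varepsilon$, and there exists an upper bound to the set of diameters of sets of $\mathcal{C}$.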
\[ m_p(B(x, \alpha r)) = m_p(B(0, \alpha r)) = \alpha^p m_p(B(0, r)) = \alpha^p m_p(B(x, r)), \]
E1
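Therefore,
\[ \begin{aligned}
m_p\left( E \setminus \cup_{j=1}^{\infty} B_j \right) &\leq m_p(U) - m_p\left( \cup_{j=1}^{\infty} B_j \right) \\
&< \left( 1 - 10^{-p} \right)^{-1} m_p(E) - \sum_{j=1}^{\infty} m_p(B_j) \\
&= \left( 1 - 10^{-p} \right)^{-1} m_p(E) - 5^{-p} \sum_{j=1}^{\infty} m_p\left( \widehat{B_j} \right) \\
&\leq \left( 1 - 10^{-p} \right)^{-1} m_p(E) - 5^{-p} m_p(E) \\
&= m_p(E) \, \theta_p
\end{aligned} \]
where
\[ \theta_p \equiv \left( 1 - 10^{-p} \right)^{-1} - 5^{-p} < 1. \]
Thus, there exists $m_1$ large enough that
\[ m_p\left( E \setminus \cup_{j=1}^{m_1} B_j \right) \leq \theta_p m_p(E). \]
Now consider $E \setminus \cup_{j=1}^{m_1} B_j$ and apply the same reasoning to it that was done to $E$.
and since $\theta_p < 1$, and $m_p(E) < \infty$, this implies $m_p\left( E \setminus \cup_{j=1}^{\infty} B_j \right) = 0$.
You don't need to assume that $E$ has finite measure in order to draw the above
conclusion.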
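The whole argument hinges on the constant $(1 - 10^{-p})^{-1} - 5^{-p}$ being strictly less than 1 in every dimension $p$, so that repeating the construction shrinks the uncovered part geometrically. A quick numerical sketch of that inequality follows; the name `theta` and the range of $p$ are illustrative choices, and the symbol $\theta_p$ itself is an assumed name for the constant.

```python
def theta(p):
    """The contraction constant (1 - 10^-p)^-1 - 5^-p from the estimate above."""
    return 1.0 / (1.0 - 10.0 ** (-p)) - 5.0 ** (-p)

thetas = {p: theta(p) for p in range(1, 11)}
for p, t in thetas.items():
    # After k rounds at most theta(p)**k of m_p(E) remains uncovered.
    print(p, t, t ** 50)
```

The constant approaches 1 as $p$ grows, so the geometric decay gets slower in high dimensions, but it never fails.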
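Note that if
\[ h(x) = Lx \]
where $L \in \mathcal{L}(\mathbb{R}^p, \mathbb{R}^p)$, then $L$ is included in 10.4.5 because
\[ \prod_{k=1}^p (x_k - r, x_k + r) \]
and so $m_p(B(x, r)) = (2r)^p$. Also for a linear transformation $A \in \mathcal{L}(\mathbb{R}^p, \mathbb{R}^p)$,
\[ \lim_{t \to 0} \frac{h(x + t e_i) - h(x)}{t} \]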
10.4. DIFFERENTIABLE FUNCTIONS AND MEASURABILITY 249
where $D$ is a dense countable subset of the unit ball. Since it is the sup of countably
many Borel measurable functions, it must also be Borel measurable. It follows the
set
\[ T_k \equiv \{ x \in T : \|Dh(x)\| < k \} \]
for $T$ a measurable set, must be measurable as well.
Proof: Let
\[ T_k \equiv \{ x \in T : \|Dh(x)\| < k \} \]
and let $\varepsilon > 0$ be given. Now by outer regularity, there exists an open set $V$,
containing $T_k$ which is contained in $U$ such that $m_p(V) < \varepsilon$. Let $x \in T_k$. Then by
differentiability,
\[ h(x + v) = h(x) + Dh(x) v + o(v) \]
and so there exist arbitrarily small $r_x < 1$ such that $B(x, 5 r_x) \subseteq V$ and whenever
$\|v\| \leq 5 r_x$, $\|o(v)\| < k \|v\|$. Thus
From the Vitali covering theorem, there exists a countable disjoint sequence of
these balls, $\{B(x_i, r_i)\}_{i=1}^{\infty}$, such that $\{B(x_i, 5 r_i)\}_{i=1}^{\infty} = \{\widehat{B_i}\}_{i=1}^{\infty}$ covers $T_k$. Then
letting $\overline{m_p}$ denote the outer measure determined by $m_p$,
\[ \begin{aligned}
\overline{m_p}(h(T_k)) &\leq \overline{m_p}\left( h\left( \cup_{i=1}^{\infty} \widehat{B_i} \right) \right) \\
&\leq \sum_{i=1}^{\infty} \overline{m_p}\left( h\left( \widehat{B_i} \right) \right) \leq \sum_{i=1}^{\infty} m_p(B(h(x_i), 6 k r_{x_i})) \\
&= \sum_{i=1}^{\infty} m_p(B(x_i, 6 k r_{x_i})) = (6k)^p \sum_{i=1}^{\infty} m_p(B(x_i, r_{x_i})) \\
&\leq (6k)^p m_p(V) \leq (6k)^p \varepsilon.
\end{aligned} \]
\[ m_p(h(T)) = \lim_{k \to \infty} m_p(h(T_k)) = 0. \]
\[ F \subseteq S, \quad m_p(S \setminus F) = 0. \]
Then since $h$ is continuous
\[ h(F) = \cup_k h(K_k) \in \mathcal{B}(\mathbb{R}^p) \]
because the continuous image of a compact set is compact. Also, $h(S \setminus F)$ is a set
of measure zero by Lemma 10.4.1 and so
\[ h(S) = h(F) \cup h(S \setminus F) \in \mathcal{F}_p \]
because it is the union of two sets which are in $\mathcal{F}_p$.
In particular, this proves most of the following theorem from a different point
of view to that done before.
Theorem 10.4.3 Let $A$ be a $p \times p$ matrix. Then if $E$ is a Lebesgue measurable set,
it follows that $A(E)$ is also a Lebesgue measurable set.
Proof: By Theorem 10.1.2, there exists $F$, a countable union of compact sets
$\{K_i\}$, and a set of measure zero $N$ disjoint from $F$ such that $E = F \cup N$, with
$m_p(N) = 0$. Since $x \to Ax$ is a continuous map, each $A K_i$ is compact and so $AF$
is a countable union of compact sets. Therefore, it is a Borel set and is therefore
Lebesgue measurable. Thus $A(F \cup N) = A(F) \cup A(N)$ and from what was just
shown, $A(N)$ is of measure zero and so it is measurable.
Proof: First note $h^{-1}(\operatorname{spt}(f))$ is a closed subset of the bounded set $U$ and so
it is compact. Thus $x \to f(h(x)) \, |\det(Dh(x))|$ is bounded and continuous.
Let $x \in U$. By the assumption that $h$ and $h^{-1}$ are $C^1$,
\[ \left| f(h(x_1)) \, |\det(Dh(x_1))| - f(h(x)) \, |\det(Dh(x))| \right| < \varepsilon \tag{10.5.8} \]
\[ K_k \subseteq E \subseteq G_k \]
\[ m_p(N) \leq m_p\left( \cup_{k=m}^{\infty} G_k \setminus K_k \right) \leq \sum_{k=m}^{\infty} m_p(G_k \setminus K_k) < \sum_{k=m}^{\infty} 2^{-k} = 2^{-(m-1)} \]
showing $m_p(N) = 0$.
Then $f_k(h(x))$ must converge to $\mathcal{X}_E(h(x))$ for all $x \notin h^{-1}(N)$, a set of measure
zero by Lemma 10.4.1. Thus $\mathcal{X}_E(h(x)) = \lim_{k \to \infty} f_k(h(x))$ off $h^{-1}(N)$ and so by
completeness of Lebesgue measure, $x \to \mathcal{X}_E(h(x))$ is measurable. Then
\[ \int_V f_k(y) \, dm_p = \int_U f_k(h(x)) \, |\det(Dh(x))| \, dm_p. \]
Proof: Since both $h$, $h^{-1}$ are continuous, $h$ maps open sets to open sets. Let
$U_n = B(0, n) \cap U$ where $n$ is large enough that this intersection is nonempty. Let
$V_n = h(U_n)$, an open set. Then if $E$ is a Lebesgue measurable set, the above
implies
\[ \int_{V_n} \mathcal{X}_{E \cap V_n}(y) \, dm_p = \int_{U_n} \mathcal{X}_{E \cap V_n}(h(x)) \, |\det Dh(x)| \, dm_p. \]
Hence
\[ \int_V \mathcal{X}_{E \cap V_n}(y) \, dm_p = \int_U \mathcal{X}_{E \cap V_n}(h(x)) \, |\det Dh(x)| \, dm_p. \]
Now let $n \to \infty$ and use the monotone convergence theorem.
With this corollary, the main theorem follows.
Proof: From Corollary 10.5.4, 10.5.10 holds for any nonnegative simple function
in place of $g$. In general, let $\{s_k\}$ be an increasing sequence of simple functions which
converges to $g$ pointwise. Then from the monotone convergence theorem
\[ \begin{aligned}
\int_V g(y) \, dm_p &= \lim_{k \to \infty} \int_V s_k \, dm_p = \lim_{k \to \infty} \int_U s_k(h(x)) \, |\det(Dh(x))| \, dm_p \\
&= \int_U g(h(x)) \, |\det(Dh(x))| \, dm_p.
\end{aligned} \]
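The formula just proved can be tried out in the simplest case $p = 1$. The sketch below assumes the hypothetical map $h(x) = x^3$ on $U = (1, 2)$, so that $V = h(U) = (1, 8)$ and $|\det Dh(x)| = 3x^2$, and takes $g(y) = 1/y$; both sides should then equal $\ln 8$. The midpoint rule and all names are illustrative choices.

```python
import math

def midpoint(func, a, b, n=100000):
    """Midpoint-rule approximation of the integral of func over (a, b)."""
    step = (b - a) / n
    return sum(func(a + (i + 0.5) * step) for i in range(n)) * step

def h_map(x):
    return x ** 3            # C^1 and one to one on U = (1, 2)

def dh(x):
    return 3 * x ** 2        # the 1 x 1 "Jacobian" of h_map

def g(y):
    return 1.0 / y           # nonnegative on V = h(U) = (1, 8)

lhs = midpoint(g, 1.0, 8.0)                                   # integral over V
rhs = midpoint(lambda x: g(h_map(x)) * abs(dh(x)), 1.0, 2.0)  # integral over U
print(lhs, rhs, math.log(8.0))
```

In one dimension this is ordinary substitution; the theorem says the same bookkeeping with $|\det(Dh)|$ works in $\mathbb{R}^p$.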
Of course this theorem implies the following corollary by splitting up the function
into the positive and negative parts of the real and imaginary parts.
This is a pretty good theorem but it isn't too hard to generalize it. In particular,
it is not necessary to assume $h^{-1}$ is $C^1$.
In what follows, it may be convenient to take
Proof: For convenience, assume the balls in the following argument come from
$\|\cdot\|_\infty$. First note that $Z$ is a Borel set because $h$ is continuous and so the component
functions of the Jacobian matrix are each Borel measurable. Hence the determinant
is also Borel measurable.
Suppose that $U$ is a bounded open set. Let $\varepsilon > 0$ be given. Also let $V \supseteq Z$ with
$V \subseteq U$ open, and
\[ m_p(Z) + \varepsilon > m_p(V). \]
Now let $x \in Z$. Then since $h$ is differentiable at $x$, there exists $\delta_x > 0$ such that if
$r < \delta_x$, then $B(x, r) \subseteq V$ and also,
The diameter of $A Dh(x)(B(0, r))$ is no larger than $\|A\| \, \|Dh(x)\| \, 2r$ and it lies in
$\mathbb{R}^{p-1} \times \{0\}$. The diameter of $A B(0, r\eta)$ is no more than $\|A\| (2 r \eta)$. Therefore, the
measure of the right side in 10.6.11 is no more than
\[ \left[ \left( \|A\| \, \|Dh(x)\| \, 2r + \|A\| (2\eta) \right) r \right]^{p-1} (r \eta)
\leq C(\|A\|, \|Dh(x)\|) (2r)^p \eta. \]
The balls of this form constitute a Vitali cover of $Z$. Hence, by the Vitali covering
theorem, there exists $\{B_i\}_{i=1}^{\infty}$, $B_i = B_i(x_i, r_i)$, a collection of disjoint balls, each of
which is contained in $V$, such that $m_p(h(B_i)) \leq \varepsilon m_p(B_i)$ and $m_p(Z \setminus \cup_i B_i) = 0$.
Hence from Lemma 10.4.1,
\[ m_p(h(Z) \setminus \cup_i h(B_i)) \leq m_p(h(Z \setminus \cup_i B_i)) = 0. \]
Therefore,
\[ \begin{aligned}
m_p(h(Z)) &\leq \sum_i m_p(h(B_i)) \leq \varepsilon \sum_i m_p(B_i) \\
&\leq \varepsilon \left( m_p(V) \right) \leq \varepsilon \left( m_p(Z) + \varepsilon \right).
\end{aligned} \]
Proof: Let $Z = \{x : \det(Dh(x)) = 0\}$, a closed set. Then by the inverse
function theorem, $h^{-1}$ is $C^1$ on $h(U \setminus Z)$ and $h(U \setminus Z)$ is an open set. Therefore,
from Lemma 10.6.1, $h(Z)$ has measure zero and so by Theorem 10.5.5,
\[ \begin{aligned}
\int_{h(U)} g(y) \, dm_p &= \int_{h(U \setminus Z)} g(y) \, dm_p = \int_{U \setminus Z} g(h(x)) \, |\det(Dh(x))| \, dm_p \\
&= \int_U g(h(x)) \, |\det(Dh(x))| \, dm_p.
\end{aligned} \]
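and $Z$ the set where $|\det Dh(x)| = 0$, Lemma 10.6.1 implies $m_p(h(Z)) = 0$. For
$x \in U_+$, the inverse function theorem implies there exists an open set $B_x \subseteq U_+$
such that $h$ is one to one on $B_x$.
Let $\{B_i\}$ be a countable subset of $\{B_x\}_{x \in U_+}$ such that $U_+ = \cup_{i=1}^{\infty} B_i$. Let
$E_1 = B_1$. If $E_1, \cdots, E_k$ have been chosen, $E_{k+1} = B_{k+1} \setminus \cup_{i=1}^k E_i$. Thus
\[ \cup_{i=1}^{\infty} E_i = U_+, \quad h \text{ is one to one on } E_i, \quad E_i \cap E_j = \emptyset, \]
and each $E_i$ is a Borel set contained in the open set $B_i$. Now define
\[ n(y) \equiv \sum_{i=1}^{\infty} \mathcal{X}_{h(E_i)}(y) + \mathcal{X}_{h(Z)}(y). \]
The sets $h(E_i)$, $h(Z)$ are measurable by Lemma 10.4.2. Thus $n(\cdot)$ is measurable.
\[ \begin{aligned}
&= \sum_{i=1}^{\infty} \int_{h(U)} \mathcal{X}_{h(E_i)}(y) \mathcal{X}_F(y) \, dm_p \\
&= \sum_{i=1}^{\infty} \int_{h(B_i)} \mathcal{X}_{h(E_i)}(y) \mathcal{X}_F(y) \, dm_p \\
&= \sum_{i=1}^{\infty} \int_{B_i} \mathcal{X}_{E_i}(x) \mathcal{X}_F(h(x)) \, |\det Dh(x)| \, dm_p \\
&= \sum_{i=1}^{\infty} \int_U \mathcal{X}_{E_i}(x) \mathcal{X}_F(h(x)) \, |\det Dh(x)| \, dm_p \\
&= \int_U \sum_{i=1}^{\infty} \mathcal{X}_{E_i}(x) \mathcal{X}_F(h(x)) \, |\det Dh(x)| \, dm_p \\
&= \int_{U_+} \mathcal{X}_F(h(x)) \, |\det Dh(x)| \, dm_p = \int_U \mathcal{X}_F(h(x)) \, |\det Dh(x)| \, dm_p.
\end{aligned} \]
Observe that
\[ \#(y) = n(y) \text{ a.e.} \tag{10.7.13} \]
because $n(y) = \#(y)$ if $y \notin h(Z)$, a set of measure 0. Therefore, $\#$ is a measurable
function because of completeness of Lebesgue measure.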
Proof: From 10.7.13 and Lemma 10.7.1, 10.7.14 holds for all g, a nonnegative
simple function. Approximating an arbitrary measurable nonnegative function, g,
with an increasing pointwise convergent sequence of simple functions and using
the monotone convergence theorem, yields 10.7.14 for an arbitrary nonnegative
measurable function, g.
\[ y_1 = \rho \cos \theta, \quad y_2 = \rho \sin \theta \]
where $\rho > 0$ and $\theta \in \mathbb{R}$. Thus these transformation equations are not one to one
but they are one to one on $(0, \infty) \times [0, 2\pi)$. Here I am writing $\rho$ in place of $r$ to
emphasize a pattern which is about to emerge. I will consider polar coordinates as
spherical coordinates in two dimensions. I will also simply refer to such coordinate
systems as polar coordinates regardless of the dimension. This is also the reason I
am writing $y_1$ and $y_2$ instead of the more usual $x$ and $y$. Now consider what happens
when you go to three dimensions. The situation is depicted in the following picture.
[Figure: the point $(x_1, x_2, x_3)$, the angle $\phi_1$, and the plane $\mathbb{R}^2$.]
From this picture, you see that $y_3 = \rho \cos \phi_1$. Also the distance between $(y_1, y_2)$
and $(0, 0)$ is $\rho \sin(\phi_1)$. Therefore, using polar coordinates to write $(y_1, y_2)$ in terms
of $\theta$ and this distance,
\[ \begin{aligned}
y_1 &= \rho \sin \phi_1 \cos \theta, \\
y_2 &= \rho \sin \phi_1 \sin \theta, \\
y_3 &= \rho \cos \phi_1.
\end{aligned} \]
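The step from two to three dimensions suggests a recursion: append $\rho \cos \phi$ as the new last coordinate and shrink the earlier ones by $\sin \phi$. The sketch below implements that recursion for a general dimension $p$; extending it beyond the cases written out in the text is an assumption of this example, and the function name and argument layout are hypothetical.

```python
import math

def spherical_to_cartesian(rho, phis, theta):
    """Map (rho, [phi_1, ..., phi_{p-2}], theta) to (y_1, ..., y_p).

    Assumed recursion: going up one dimension scales the existing
    coordinates by sin(phi) and appends cos(phi) as the new last coordinate.
    """
    unit = [math.cos(theta), math.sin(theta)]   # p = 2 case on the unit circle
    for phi in phis:
        unit = [c * math.sin(phi) for c in unit] + [math.cos(phi)]
    return [rho * c for c in unit]

y3 = spherical_to_cartesian(2.0, [0.7], 1.1)        # p = 3
y4 = spherical_to_cartesian(2.0, [0.7, 0.4], 1.1)   # p = 4
print(y3, math.sqrt(sum(c * c for c in y3)))
print(y4, math.sqrt(sum(c * c for c in y4)))
```

In every dimension the resulting point has Euclidean norm $\rho$, which is the property that makes these coordinates useful for radial integrals.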
[Figure: the point $(x_1, x_2, x_3, x_4)$, the angle $\phi_2$, and the hyperplane $\mathbb{R}^3$.]
From this picture, you see that $y_4 = \rho \cos \phi_2$. Also the distance between $(y_1, y_2, y_3)$
and $(0, 0, 0)$ is $\rho \sin(\phi_2)$. Therefore, using polar coordinates to write $(y_1, y_2, y_3)$ in
10.8. SPHERICAL COORDINATES IN P DIMENSIONS 259
( ( )) ( )
= , p1 ,
f hp , dmp1 dm
(0,) A
( ( )) ( )
= p1
f hp , , , dmp1 dm
(0,) A
Now the claim about $f \in L^1$ follows routinely from considering the positive and
negative parts of the real and imaginary parts of $f$ in the usual way.
Note that the above equals
( ( )) ( )
f hp , , p1 , dmp
A[0,)
Notation 10.8.2 Often this is written differently. Note that from the spherical
coordinate formulas, $f(h(\rho, \vec{\phi}, \theta)) = f(\rho \omega)$ where $|\omega| = 1$. Letting $S^{p-1}$ denote the
unit sphere, $\{\omega \in \mathbb{R}^p : |\omega| = 1\}$, the inside integral in the above formula is sometimes
written as
\[ \int_{S^{p-1}} f(\rho \omega) \, d\sigma \]
will be referred to as polar coordinates and is very useful in establishing estimates.
Here $\sigma(S^{p-1}) \equiv \int_A \Phi(\vec{\phi}, \theta) \, dm_{p-1}$.
Example 10.8.3 For what values of $s$ is the integral $\int_{B(0, R)} \left( 1 + |x|^2 \right)^s dx$ bounded
independent of $R$? Here $B(0, R)$ is the ball $\{x \in \mathbb{R}^p : |x| \leq R\}$.
I think you can see immediately that $s$ must be negative but exactly how negative?
It turns out it depends on $p$ and using polar coordinates, you can find just
exactly what is needed. From the polar coordinates formula above,
\[ \begin{aligned}
\int_{B(0, R)} \left( 1 + |x|^2 \right)^s dx &= \int_0^R \int_{S^{p-1}} \left( 1 + \rho^2 \right)^s \rho^{p-1} \, d\sigma \, d\rho \\
&= C_p \int_0^R \left( 1 + \rho^2 \right)^s \rho^{p-1} \, d\rho.
\end{aligned} \]
Now the very hard problem has been reduced to considering an easy one variable
problem of finding when
\[ \int_0^R \rho^{p-1} \left( 1 + \rho^2 \right)^s d\rho \]
is bounded independent of $R$.
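For large $\rho$ the integrand behaves like $\rho^{2s + p - 1}$, which suggests the integral stays bounded exactly when $2s + p - 1 < -1$, that is $s < -p/2$. The following sketch illustrates this numerically for $p = 3$ with one exponent on each side of the threshold; the function name, the midpoint rule, and the sample values of $s$ and $R$ are illustrative assumptions.

```python
import math

def radial_integral(p, s, R, n=200000):
    """Midpoint rule for int_0^R rho^(p-1) (1 + rho^2)^s d rho."""
    step = R / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * step
        total += rho ** (p - 1) * (1.0 + rho * rho) ** s
    return total * step

# p = 3: the threshold is s = -3/2.
bounded_100 = radial_integral(3, -2.0, 100.0)     # s below the threshold
bounded_1000 = radial_integral(3, -2.0, 1000.0)
growing_1000 = radial_integral(3, -1.0, 1000.0)   # s above the threshold
limit = math.pi / 4   # value of int_0^infty rho^2 / (1 + rho^2)^2 d rho

print(bounded_100, bounded_1000, limit)
print(growing_1000)
```

With $s = -2$ the values settle near $\pi/4$, while with $s = -1$ the integral grows roughly like $R$.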
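$(-1)^{i+j}$. In the proof and in what follows, I am using $Dg$ to equal the matrix of
the linear transformation $Dg$ taken with respect to the usual basis on $\mathbb{R}^p$. Thus
\[ Dg(x) = \sum_{ij} (Dg)_{ij} e_i e_j \]
and recall that $(Dg)_{ij} = \partial g_i / \partial x_j$ where $g = \sum_i g_i e_i$.
\[ \sum_{j=1}^p \operatorname{cof}(Dg)_{ij,j} = 0, \]
where here $(Dg)_{ij} \equiv g_{i,j} \equiv \frac{\partial g_i}{\partial x_j}$. Also, $\operatorname{cof}(Dg)_{ij} = \frac{\partial \det(Dg)}{\partial g_{i,j}}$.
\[ \det(Dg) = \sum_{i=1}^p g_{i,j} \operatorname{cof}(Dg)_{ij} \]
and so
\[ \frac{\partial \det(Dg)}{\partial g_{i,j}} = \operatorname{cof}(Dg)_{ij} \tag{10.9.18} \]
which shows the last claim of the lemma. Also
\[ \delta_{kj} \det(Dg) = \sum_i g_{i,k} \left( \operatorname{cof}(Dg) \right)_{ij} \tag{10.9.19} \]
Subtracting the first sum on the right from both sides and using the equality of
mixed partials,
\[ \sum_i g_{i,k} \left( \sum_j \left( \operatorname{cof}(Dg) \right)_{ij,j} \right) = 0. \]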
10.9. BROUWER FIXED POINT THEOREM 263
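If $\det(g_{i,k}) \neq 0$ so that $(g_{i,k})$ is invertible, this shows $\sum_j (\operatorname{cof}(Dg))_{ij,j} = 0$. If
$\det(Dg) = 0$, let
\[ g_k(x) = g(x) + \varepsilon_k x \]
where $\varepsilon_k \to 0$ and $\det(Dg + \varepsilon_k I) \equiv \det(Dg_k) \neq 0$. Then
\[ \sum_j \left( \operatorname{cof}(Dg) \right)_{ij,j} = \lim_{k \to \infty} \sum_j \left( \operatorname{cof}(Dg_k) \right)_{ij,j} = 0. \]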
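As a sanity check on the identity $\sum_j \operatorname{cof}(Dg)_{ij,j} = 0$, here is the case $p = 2$ worked out by hand. It assumes $g = (g_1, g_2)$ is $C^2$ so that mixed partials commute, and it uses the usual $(-1)^{i+j}$ sign convention for cofactors.

```latex
\[
\operatorname{cof}(Dg) =
\begin{pmatrix} g_{2,2} & -g_{2,1} \\ -g_{1,2} & g_{1,1} \end{pmatrix},
\qquad
\begin{aligned}
\sum_{j=1}^{2} \operatorname{cof}(Dg)_{1j,j} &= g_{2,21} - g_{2,12} = 0, \\
\sum_{j=1}^{2} \operatorname{cof}(Dg)_{2j,j} &= -g_{1,21} + g_{1,12} = 0.
\end{aligned}
\]
```

Each row of the cofactor matrix is divergence free purely because second partial derivatives can be taken in either order, which is the mechanism behind the general case.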
that g = h on U and g is C k (W ) .
In the following lemma, you could use any norm in defining the balls and everything
would work the same but I have in mind the usual norm.
Lemma 10.9.3 There does not exist $h \in C^2\left( \overline{B(0, R)} \right)$ such that $h : \overline{B(0, R)} \to
\partial B(0, R)$ which also has the property that $h(x) = x$ for all $x \in \partial B(0, R)$. Such a
function is called a retraction.
Proof: Suppose such an $h$ exists. Let $\lambda \in [0, 1]$ and let $p_\lambda(x) \equiv x + \lambda (h(x) - x)$.
This function $p_\lambda$ is called a homotopy of the identity map and the retraction $h$.
Let
\[ I(\lambda) \equiv \int_{B(0, R)} \det(Dp_\lambda(x)) \, dx. \]
Now by assumption, $h_i(x) = x_i$ on $\partial B(0, R)$ and so one can form iterated integrals
and integrate by parts in each of the one dimensional integrals to obtain
\[ I'(\lambda) = -\sum_i \int_{B(0, R)} \sum_j \operatorname{cof}(Dp_\lambda(x))_{ij,j} (h_i(x) - x_i) \, dx = 0. \]
but
\[ I(1) = \int_{B(0, 1)} \det(Dh(x)) \, dm_p = \int_{\partial B(0, 1)} \#(y) \, dm_p = 0 \]
Proof: Suppose the lemma is not true. Then for all $x$, $|x - h(x)| \neq 0$. Then
define
\[ g(x) = h(x) + \frac{x - h(x)}{|x - h(x)|} t(x) \]
where $t(x)$ is nonnegative and is chosen such that $g(x) \in \partial B(0, R)$. This mapping
is illustrated in the following picture.
[Figure: the points $f(x)$, $x$, and $g(x)$.]
Then
\[ H_t(x, t) = 2 \left( h(x), \frac{x - h(x)}{|x - h(x)|} \right) + 2t. \]
If this is nonzero for all $x$ near $\overline{B(0, R)}$, it follows from the implicit function theorem
that $t$ is a $C^2$ function of $x$. From 10.9.20
\[ 2t = -2 \left( h(x), \frac{x - h(x)}{|x - h(x)|} \right)
\pm \sqrt{ 4 \left( h(x), \frac{x - h(x)}{|x - h(x)|} \right)^2 - 4 \left( |h(x)|^2 - R^2 \right) } \]
and so
\[ \begin{aligned}
H_t(x, t) &= 2t + 2 \left( h(x), \frac{x - h(x)}{|x - h(x)|} \right) \\
&= \pm \sqrt{ 4 \left( R^2 - |h(x)|^2 \right) + 4 \left( h(x), \frac{x - h(x)}{|x - h(x)|} \right)^2 }.
\end{aligned} \]
10.10 Exercises
1. Recall the definition of $f_y$. Prove that if $f \in L^1(\mathbb{R}^p)$, then
$$\lim_{y\to 0} \int_{\mathbb{R}^p} |f - f_y|\,dm_p = 0$$
Hint: If either of the factors on the right equals 0, explain why there is nothing to show. Now let $a = |f| / \left(\int |f|^p\,d\mu\right)^{1/p}$ and $b = |g| / \left(\int |g|^q\,d\mu\right)^{1/q}$. Apply the inequality of the previous problem.
4. If $f \in L^1(\mathbb{R}^p)$, show there exists $g \in L^1(\mathbb{R}^p)$ such that $g$ is also Borel measurable and $g(x) = f(x)$ for a.e. $x$.
5. Suppose $f, g \in L^1(\mathbb{R}^p)$. Define $f * g(x)$ by
$$\int f(x - y)\,g(y)\,dm_p(y).$$
Show this makes sense for a.e. $x$ and that in fact for a.e. $x$
$$\int |f(x - y)|\,|g(y)|\,dm_p(y) < \infty$$
Next show
$$\int |f * g(x)|\,dm_p(x) \le \int |f|\,dm_p \int |g|\,dm_p.$$
Hint: Use Problem 4. Show first there is no problem if $f, g$ are Borel measurable. The reason for this is that you can use Fubini's theorem to write
$$\int\!\!\int |f(x - y)|\,|g(y)|\,dm_p(y)\,dm_p(x) = \int\!\!\int |f(x - y)|\,|g(y)|\,dm_p(x)\,dm_p(y) = \int |f(z)|\,dm_p \int |g(y)|\,dm_p.$$
Explain. Then explain why if $f$ and $g$ are replaced by functions which are equal to $f$ and $g$ a.e. but are Borel measurable, the convolution is unchanged.
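The inequality in Problem 5 has an exact analogue for finite sequences, where the integrals become sums. The following is only a numerical sanity check of that discrete analogue, not a proof; the helper names `convolve` and `l1_norm` are made up for this illustration.

```python
import random

def convolve(f, g):
    """Full discrete convolution: out[n] = sum over k of f[n - k] * g[k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def l1_norm(seq):
    """Discrete L^1 norm (counting measure in place of Lebesgue measure)."""
    return sum(abs(v) for v in seq)

random.seed(0)  # fixed seed so the run is reproducible
f = [random.uniform(-1.0, 1.0) for _ in range(50)]
g = [random.uniform(-1.0, 1.0) for _ in range(30)]

lhs = l1_norm(convolve(f, g))      # ||f * g||_1
rhs = l1_norm(f) * l1_norm(g)      # ||f||_1 ||g||_1
print(lhs <= rhs)
```

When both sequences are nonnegative no cancellation occurs and the two sides agree, which mirrors the Fubini computation in the hint.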
6. In the situation of Problem 5 show $x \to f * g(x)$ is continuous whenever $g$ is also bounded. Hint: Use Problem 1.
7. Let $f : [0,\infty) \to \mathbb{R}$ be in $L^1(\mathbb{R}, m)$. The Laplace transform is given by $\hat f(x) = \int_0^\infty e^{-xt} f(t)\,dt$. Let $f, g$ be in $L^1(\mathbb{R}, m)$, and let $h(x) = \int_0^x f(x - t)\,g(t)\,dt$. Show $h \in L^1$, and $\hat h = \hat f \hat g$.
8. Suppose $A$ is covered by a finite collection of balls, $\mathcal{F}$. Show that then there exists a disjoint collection of these balls, $\{B_i\}_{i=1}^p$, such that $A \subseteq \cup_{i=1}^p \hat B_i$ where $\hat B_i$ has the same center as $B_i$ but 3 times the radius. Hint: Since the collection of balls is finite, they can be arranged in order of decreasing radius.
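The hint in Problem 8 suggests a greedy selection. Here is a minimal sketch of that idea for balls in the plane, assuming open balls stored as `(center, radius)` pairs; the function names are hypothetical and the random data is only for illustration.

```python
import math
import random

def greedy_disjoint(balls):
    """Scan balls in order of decreasing radius, keeping each ball
    that is disjoint from every ball kept so far."""
    chosen = []
    for center, radius in sorted(balls, key=lambda b: -b[1]):
        if all(math.dist(center, c) >= radius + r for c, r in chosen):
            chosen.append((center, radius))
    return chosen

def inside_some_tripled_ball(ball, chosen):
    """True if ball lies inside a chosen ball with its radius tripled."""
    center, radius = ball
    return any(math.dist(center, c) + radius <= 3 * r for c, r in chosen)

random.seed(1)
balls = [((random.uniform(0, 5), random.uniform(0, 5)), random.uniform(0.1, 1.0))
         for _ in range(200)]
chosen = greedy_disjoint(balls)
print(len(chosen), all(inside_some_tripled_ball(b, chosen) for b in balls))
```

A rejected ball meets an earlier chosen ball of at least its own radius, so it sits inside the tripled ball. That is the whole argument, and the check above only confirms it on one random sample.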
9. Let $f$ be a function defined on an interval, $(a,b)$. The Dini derivates are defined as
$$D_+ f(x) \equiv \liminf_{h\to 0+} \frac{f(x+h) - f(x)}{h}, \qquad D^+ f(x) \equiv \limsup_{h\to 0+} \frac{f(x+h) - f(x)}{h},$$
$$D_- f(x) \equiv \liminf_{h\to 0+} \frac{f(x) - f(x-h)}{h}, \qquad D^- f(x) \equiv \limsup_{h\to 0+} \frac{f(x) - f(x-h)}{h}.$$
Suppose $f$ is continuous on $(a,b)$ and for all $x \in (a,b)$, $D_+ f(x) \ge 0$. Show that then $f$ is increasing on $(a,b)$. Hint: Consider the function, $H(x) \equiv f(x)(d - c) - x(f(d) - f(c))$ where $a < c < d < b$. Thus $H(c) = H(d)$. Also it is easy to see that $H$ cannot be constant if $f(d) < f(c)$ due to the assumption that $D_+ f(x) \ge 0$. If there exists $x_1 \in (a,b)$ where $H(x_1) > H(c)$, then let $x_0 \in (c,d)$ be the point where the maximum of $f$ occurs. Consider $D_+ f(x_0)$. If, on the other hand, $H(x) < H(c)$ for all $x \in (c,d)$, then consider $D_+ H(c)$.
10. Suppose in the situation of the above problem we only know
$$D_+ f(x) \ge 0 \text{ a.e.}$$
Does the conclusion still follow? What if we only know $D_+ f(x) \ge 0$ for every $x$ outside a countable set? Hint: In the case of $D_+ f(x) \ge 0$ a.e., consider the bad function in the exercises for the chapter on the construction of measures which was based on the Cantor set. In the case where $D_+ f(x) \ge 0$ for all but countably many $x$, by replacing $f(x)$ with $\tilde f(x) \equiv f(x) + \varepsilon x$, consider the situation where $D_+ \tilde f(x) > 0$ for all but countably many $x$. If in this situation, $\tilde f(c) > \tilde f(d)$ for some $c < d$, and $y \in \left(\tilde f(d), \tilde f(c)\right)$, let
$$z \equiv \sup\left\{x \in [c,d] : \tilde f(x) > y_0\right\}.$$
and conclude that aside from a set of measure zero, $D^+ f(x) = D_+ f(x)$. Similar reasoning will show $D^- f(x) = D_- f(x)$ a.e. and $D^+ f(x) = D^- f(x)$ a.e. and so off some set of measure zero, we have
which implies the derivative exists and equals this common value. Hint: To show 10.10.21, let $U$ be an open set containing $N_{pq}$ such that $m(N_{pq}) + \varepsilon > m(U)$. For each $x \in N_{pq}$ there exist $y > x$ arbitrarily close to $x$ such that
Thus the set of such intervals, $\{[x,y]\}$ which are contained in $U$ constitutes a Vitali cover of $N_{pq}$. Let $\{[x_i, y_i]\}$ be disjoint and
$$m\left(N_{pq} \setminus \cup_i [x_i, y_i]\right) = 0.$$
Thus the set of such intervals, $\{[x', y']\}$ which are contained in $V$ is a Vitali cover of $N_{pq} \cap V$. Let $\{[x'_i, y'_i]\}$ be disjoint and
$$m\left(N_{pq} \cap V \setminus \cup_i [x'_i, y'_i]\right) = 0.$$
13. Why is it that if $f \in L^1(\mathbb{R})$, then there exists $g \in C^1(\mathbb{R})$ which vanishes off some finite interval? Consider $g \in C_c(\mathbb{R})$ which is close to $f$ in $L^1(\mathbb{R})$ and then consider $g_h(x) \equiv \frac{1}{2h} \int_{x-h}^{x+h} g(t)\,dt$.
14. Prove Lemma 10.4.1 which says a $C^1$ function maps a set of measure zero to a set of measure zero using Theorem 10.7.3.
15. For this problem define $\int_a^\infty f(t)\,dt \equiv \lim_{r\to\infty} \int_a^r f(t)\,dt$. Note this coincides with the Lebesgue integral when $f \in L^1(a,\infty)$. Show
(a) $\int_0^\infty \frac{\sin(u)}{u}\,du = \frac{\pi}{2}$
(b) $\lim_{r\to\infty} \int_\delta^\infty \frac{\sin(ru)}{u}\,du = 0$ whenever $\delta > 0$.
(c) If $f \in L^1(\mathbb{R})$, then $\lim_{r\to\infty} \int_{\mathbb{R}} \sin(ru)\,f(u)\,du = 0$.
Hint: For the first two, use $\frac{1}{u} = \int_0^\infty e^{-ut}\,dt$ and apply Fubini's theorem to $\int_0^R \sin u \int_0^\infty e^{-ut}\,dt\,du$. For the last part, first establish it for $f$ a $C^1$ function which vanishes off a finite interval and then use the density of this set in $L^1(\mathbb{R})$ to obtain the result. This is called the Riemann Lebesgue lemma.
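Parts (a) and (c) of Problem 15 can be explored numerically before proving them. The sketch below uses a hand-written composite Simpson rule; the cutoff $R = 200$, the step counts, and the choice of $f$ as the indicator function of $[0,1]$ are assumptions made only for this illustration.

```python
import math

def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def sinc(u):
    """sin(u)/u, extended continuously by the value 1 at u = 0."""
    return 1.0 if u == 0 else math.sin(u) / u

def dirichlet(R):
    """Approximates the truncated integral of sin(u)/u over [0, R]."""
    return simpson(sinc, 0.0, R, int(100 * R))

def oscillation(r):
    """Approximates the integral of sin(r u) f(u) du for f the indicator of [0, 1]."""
    return simpson(lambda u: math.sin(r * u), 0.0, 1.0, 20000)

print(dirichlet(200.0), math.pi / 2)
for r in (10.0, 100.0, 1000.0):
    print(r, oscillation(r))
```

For this particular $f$ the integral in (c) equals $(1 - \cos r)/r$, so the decay like $1/r$ is visible directly. The tail of the integral in (a) beyond $R$ is at most $2/R$ in absolute value, which is why the truncated value is already close to $\pi/2$.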
16. Suppose that $g \in L^1(\mathbb{R})$ and that at some $x > 0$, $g$ is locally Hölder continuous from the right and from the left. This means
$$\lim_{r\to 0+} g(x + r) \equiv g(x+)$$
exists,
$$\lim_{r\to 0+} g(x - r) \equiv g(x-)$$
exists and there exist constants $K, \delta > 0$ and $r \in (0,1]$ such that for $|x - y| < \delta$,
$$|g(x+) - g(y)| < K|x - y|^r$$
for $y > x$ and
$$|g(x-) - g(y)| < K|x - y|^r$$
for $y < x$. Show that under these conditions,
$$\lim_{r\to\infty} \frac{2}{\pi} \int_0^\infty \frac{\sin(ur)}{u} \left(\frac{g(x - u) + g(x + u)}{2}\right) du = \frac{g(x+) + g(x-)}{2}.$$
17. Let $g \in L^1(\mathbb{R})$ and suppose $g$ is locally Hölder continuous from the right and from the left at $x$. Show that then
$$\lim_{R\to\infty} \frac{1}{2\pi} \int_{-R}^{R} e^{ixt} \int_{-\infty}^{\infty} e^{-ity} g(y)\,dy\,dt = \frac{g(x+) + g(x-)}{2}.$$
This is very interesting. Hint: Show the left side of the above equation reduces to
$$\frac{2}{\pi} \int_0^\infty \frac{\sin(ur)}{u} \left(\frac{g(x - u) + g(x + u)}{2}\right) du$$
and then use Problem 16 to obtain the result.
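18. A measurable function $g$ defined on $(0,\infty)$ has exponential growth if $|g(t)| \le Ce^{\eta t}$ for some $\eta$. For $\operatorname{Re}(s) > \eta$, define the Laplace Transform by
$$Lg(s) \equiv \int_0^\infty e^{-su} g(u)\,du.$$
Assume that $g$ has exponential growth as above and is Hölder continuous from the right and from the left at $t$. Pick $\gamma > \eta$. Show that
$$\lim_{R\to\infty} \frac{1}{2\pi} \int_{-R}^{R} e^{\gamma t} e^{iyt} Lg(\gamma + iy)\,dy = \frac{g(t+) + g(t-)}{2}.$$
This formula is sometimes written in the form
$$\frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} e^{st} Lg(s)\,ds$$
and is called the complex inversion integral for Laplace transforms. It can be used to find inverse Laplace transforms. Hint:
$$\frac{1}{2\pi} \int_{-R}^{R} e^{\gamma t} e^{iyt} Lg(\gamma + iy)\,dy = \frac{1}{2\pi} \int_{-R}^{R} e^{\gamma t} e^{iyt} \int_0^\infty e^{-(\gamma + iy)u} g(u)\,du\,dy.$$
Now use Fubini's theorem and do the integral from $-R$ to $R$ to get this equal to
$$\frac{e^{\gamma t}}{\pi} \int_{-\infty}^{\infty} e^{-\gamma u}\,\bar g(u)\,\frac{\sin(R(t - u))}{t - u}\,du$$
where $\bar g$ is the zero extension of $g$ off $[0,\infty)$. Then this equals
$$\frac{e^{\gamma t}}{\pi} \int_{-\infty}^{\infty} e^{-\gamma(t - u)}\,\bar g(t - u)\,\frac{\sin(Ru)}{u}\,du$$
which equals
$$\frac{2e^{\gamma t}}{\pi} \int_0^\infty \frac{\bar g(t - u)\,e^{-\gamma(t - u)} + \bar g(t + u)\,e^{-\gamma(t + u)}}{2}\,\frac{\sin(Ru)}{u}\,du$$
and then apply the result of Problem 16.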
and then use this to argue $\{z_k\}$ is a Cauchy sequence. Then if $z_i$ works for $i = 1, 2$, consider $(z_1 + z_2)/2$ to get a contradiction.
20. In Problem 19 show that $Px$ satisfies the following variational inequality.
$$(x - Px) \cdot (y - Px) \le 0$$
for all $y \in K$. Then show that $|Px_1 - Px_2| \le |x_1 - x_2|$. Hint: For the first part note that if $y \in K$, the function $t \to |x - (Px + t(y - Px))|^2$ achieves its minimum on $[0,1]$ at $t = 0$. For the second part,
$$(x_1 - Px_1) \cdot (Px_2 - Px_1) \le 0, \qquad (x_2 - Px_2) \cdot (Px_1 - Px_2) \le 0.$$
Explain why
$$(x_2 - Px_2 - (x_1 - Px_1)) \cdot (Px_2 - Px_1) \ge 0$$
and then use some manipulations and the Cauchy Schwarz inequality to get the desired inequality.
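Both conclusions of Problem 20 are easy to test when the projection has a closed form. The sketch below assumes that $Px$ denotes the closest point of $K$ to $x$, as Problem 19 appears to define it, and takes $K$ to be a box $[lo, hi]^n$ so that projecting is just clamping each coordinate. The helper names are hypothetical.

```python
import random

def project_box(x, lo, hi):
    """Closest point of the box [lo, hi]^n to x: clamp each coordinate."""
    return [min(max(xi, lo), hi) for xi in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(u):
    return dot(u, u) ** 0.5

random.seed(2)
n, lo, hi = 4, -1.0, 1.0
worst_variational = 0.0   # largest observed (x - Px).(y - Px); should stay <= 0
worst_lipschitz = 0.0     # largest observed |Px1 - Px2| - |x1 - x2|; should stay <= 0
for _ in range(1000):
    x1 = [random.uniform(-3, 3) for _ in range(n)]
    x2 = [random.uniform(-3, 3) for _ in range(n)]
    y = [random.uniform(lo, hi) for _ in range(n)]   # a point of K
    p1, p2 = project_box(x1, lo, hi), project_box(x2, lo, hi)
    worst_variational = max(worst_variational, dot(sub(x1, p1), sub(y, p1)))
    worst_lipschitz = max(worst_lipschitz, norm(sub(p1, p2)) - norm(sub(x1, x2)))
print(worst_variational, worst_lipschitz)
```

A box is a convenient test case only because its projection is explicit; the exercise itself concerns an arbitrary closed convex set.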
21. Establish the Brouwer fixed point theorem for any convex compact set in $\mathbb{R}^p$. Hint: If $K$ is a compact and convex set, let $R$ be large enough that the closed ball, $D(0,R) \supseteq K$. Let $P$ be the projection onto $K$ as in Problem 20 above. If $f$ is a continuous map from $K$ to $K$, consider $f \circ P$. You want to show $f$ has a fixed point in $K$.
22. In the situation of the implicit function theorem, suppose $f(x_0, y_0) = 0$ and assume $f$ is $C^1$. Show that for $(x,y) \in B(x_0, \delta) \times B(y_0, r)$ where $\delta, r$ are small enough, the mapping
$$x \to T_y(x) \equiv x - D_1 f(x_0, y_0)^{-1} f(x, y)$$
is continuous and maps $B(x_0, \delta)$ to $B(x_0, \delta/2) \subseteq B(x_0, \delta)$. Apply the Brouwer fixed point theorem to obtain a shorter proof of the implicit function theorem.
23. Here is a really interesting little theorem which depends on the Brouwer fixed point theorem. It plays a prominent role in the treatment of the change of variables formula in Rudin's book, [40] and is useful in other contexts as well. The idea is that if a continuous function mapping a ball in $\mathbb{R}^k$ to $\mathbb{R}^k$ doesn't move any point very much, then the image of the ball must contain a slightly smaller ball.
Lemma: Let $B = B(0,r)$, a ball in $\mathbb{R}^k$ and let $F : \overline{B} \to \mathbb{R}^k$ be continuous and suppose for some $\varepsilon < 1$,
$$|F(v) - v| < \varepsilon r \qquad (10.10.22)$$
for all $v \in \overline{B}$. Then
$$F\left(\overline{B}\right) \supseteq B(0, r(1 - \varepsilon)).$$
Hint: Suppose $a \in B(0, r(1 - \varepsilon)) \setminus F\left(\overline{B}\right)$ so it didn't work. First explain why $a \neq F(v)$ for all $v \in \overline{B}$. Now letting $G : \overline{B} \to \overline{B}$ be defined by $G(v) \equiv \frac{r(a - F(v))}{|a - F(v)|}$, it follows $G$ is continuous. Then by the Brouwer fixed point theorem, $G(v) = v$ for some $v \in \overline{B}$. Explain why $|v| = r$. Then take the inner product with $v$ and explain the following steps.
$$(G(v), v) = |v|^2 = r^2 = \frac{r}{|a - F(v)|}\,(a - F(v), v)$$
$$= \frac{r}{|a - F(v)|}\,(a - v + v - F(v), v)$$
$$= \frac{r}{|a - F(v)|}\,\left[(a - v, v) + (v - F(v), v)\right]$$
$$= \frac{r}{|a - F(v)|}\,\left[(a, v) - |v|^2 + (v - F(v), v)\right]$$
$$\le \frac{r}{|a - F(v)|}\,\left[r^2(1 - \varepsilon) - r^2 + r^2 \varepsilon\right] = 0.$$
24. Using Problem 23 establish the following interesting result. Suppose $f : U \to \mathbb{R}^p$ is differentiable. Let
$$S = \{x \in U : \det Df(x) = 0\}.$$
Show $f(U \setminus S)$ is an open set.
25. Let $K$ be a closed, bounded and convex set in $\mathbb{R}^p$ and let $f : K \to \mathbb{R}^p$ be continuous and let $y \in \mathbb{R}^p$. Show using the Brouwer fixed point theorem there exists a point $x \in K$ such that $P(y - f(x) + x) = x$. Next show that $(y - f(x), z - x) \le 0$ for all $z \in K$. The existence of this $x$ is known as Browder's lemma and it has great significance in the study of certain types of nonlinear operators. Now suppose $f : \mathbb{R}^p \to \mathbb{R}^p$ is continuous and satisfies
$$\lim_{|x|\to\infty} \frac{(f(x), x)}{|x|} = \infty.$$
Show using Browder's lemma that $f$ is onto.
Approximation Theorems
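Proof: Let
$$\phi(t) \equiv \left(1 - x + e^t x\right)^m = \sum_{k=0}^m \binom{m}{k} e^{kt} x^k (1 - x)^{m-k}$$
Then
$$\phi'(t) = mxe^t \left(xe^t - x + 1\right)^{m-1}$$
$$\phi''(t) = mxe^t \left(xe^t - x + 1\right)^{m-2} \left(mxe^t - x + 1\right)$$
Then
$$\phi'(0) = mx = \sum_{k=0}^m \binom{m}{k} k\,x^k (1 - x)^{m-k},$$
$$\phi''(0) = mx(mx - x + 1) = \sum_{k=0}^m \binom{m}{k} k^2 x^k (1 - x)^{m-k}$$
Therefore,
$$\sum_{k=0}^m \binom{m}{k} (k - mx)^2 x^k (1 - x)^{m-k} = \sum_{k=0}^m \binom{m}{k} k^2 x^k (1 - x)^{m-k} - 2mx \sum_{k=0}^m \binom{m}{k} k\,x^k (1 - x)^{m-k} + m^2 x^2$$
$$= mx(mx - x + 1) - 2m^2x^2 + m^2x^2 = mx(1 - x) \le \frac{1}{4}m.$$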
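$$\left|\sum_{k=0}^m \binom{m}{k} x^k (1 - x)^{m-k} f\left(\frac{k}{m}\right) - f(x)\right| \le \sum_{\left|\frac{k}{m} - x\right| < \delta} \binom{m}{k} x^k (1 - x)^{m-k} \left|f\left(\frac{k}{m}\right) - f(x)\right| + 2\|f\| \sum_{\left|\frac{k}{m} - x\right| \ge \delta} \binom{m}{k} x^k (1 - x)^{m-k}$$
Therefore, the left side is no larger than
$$\sum_{k=0}^m \binom{m}{k} x^k (1 - x)^{m-k}\,\frac{\varepsilon}{2} + 2\|f\| \sum_{(k - mx)^2 \ge m^2\delta^2} \binom{m}{k} x^k (1 - x)^{m-k}$$
$$\le \frac{\varepsilon}{2} + 2\|f\| \frac{1}{\delta^2 m^2} \sum_{k=0}^m \binom{m}{k} (k - mx)^2 x^k (1 - x)^{m-k}$$
$$\le \frac{\varepsilon}{2} + 2\|f\| \frac{1}{4} m \frac{1}{\delta^2 m^2} < \varepsilon$$
provided $m$ is large enough.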
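The two ingredients of the argument above can be checked numerically: the identity $\sum_k \binom{m}{k}(k - mx)^2 x^k(1-x)^{m-k} = mx(1-x)$ and the uniform convergence of the polynomials $\sum_k \binom{m}{k} x^k (1-x)^{m-k} f(k/m)$. This is a small sketch only; the test function $|x - 1/2|$, the grid of 201 sample points, and the function names are choices made for the illustration.

```python
from math import comb

def bernstein(f, m, x):
    """Value at x of the polynomial sum_k C(m,k) x^k (1-x)^(m-k) f(k/m)."""
    return sum(comb(m, k) * x**k * (1 - x)**(m - k) * f(k / m)
               for k in range(m + 1))

def variance_sum(m, x):
    """sum_k C(m,k) (k - m x)^2 x^k (1-x)^(m-k); the proof shows it equals m x (1 - x)."""
    return sum(comb(m, k) * (k - m * x)**2 * x**k * (1 - x)**(m - k)
               for k in range(m + 1))

def sup_error(f, m, grid=200):
    """Largest error |p_m(x) - f(x)| over equally spaced sample points of [0, 1]."""
    return max(abs(bernstein(f, m, i / grid) - f(i / grid))
               for i in range(grid + 1))

f = lambda x: abs(x - 0.5)          # continuous but not differentiable at 1/2
print(variance_sum(20, 0.3), 20 * 0.3 * 0.7)
errors = [sup_error(f, m) for m in (10, 40, 160)]
print(errors)
```

The errors shrink slowly, roughly like $1/\sqrt{m}$ for this $f$, which is consistent with the bound $\frac{1}{4}m\frac{1}{\delta^2 m^2}$ used in the proof.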
Thus the above theorem can be stated as follows. There exists a sequence of polynomials $\{p_m\}$ such that
$$\lim_{m\to\infty} \|p_m - f\| = 0.$$
Proof: Let $l : [0,1] \to [a,b]$ be one to one, linear and onto. Then $f \circ l$ is continuous on $[0,1]$ and so if $\varepsilon > 0$ is given, there exists a polynomial $p$ such that for all $x \in [0,1]$,
$$|p(x) - f \circ l(x)| < \varepsilon$$
Therefore, letting $y = l(x)$, it follows that for all $y \in [a,b]$,
As another corollary, here is the version which will be used in Stone's generalization.
Note that if $A$ is an algebra which annihilates no point and separates the points, then $\overline{A}$ is also an algebra which has these same properties.
Proof: Say $\|f\| \le M$. Let $p_m(t) \to |t|$ uniformly on $[-M, M]$ with $p_m(0) = 0$. Thus $p_m(f) \in \overline{A}$ and $\lim_{m\to\infty} \| |f| - p_m(f) \| = 0$. Hence $|f| \in \overline{A}$. Now note that
$$\max(f,g) = \frac{|g - f| + (f + g)}{2}, \qquad \min(f,g) = \frac{(f + g) - |g - f|}{2}$$
This clearly works in the sense that it gives the right values at $p, q$. Why is it in $A$? This is obvious when you expand it and see that you are adding products of functions in $A$ and scalar multiples of functions in $A$.
Now here is the first step to Stone's generalization.
Theorem 11.2.4 Let $K$ be a compact set and let $A$ be a real algebra of continuous functions defined on $K$ such that $A$ separates the points and annihilates no point. Then for every $f \in C(K)$ and $\varepsilon > 0$, there exists $\phi \in A$ such that $\|f - \phi\| \le \varepsilon$.
Proof: Let $f \in C(K)$. I will show there exists $\phi \in \overline{A}$ such that $\|f - \phi\| \le \varepsilon$. This will prove the desired result thanks to the definition of $\overline{A}$.
Pick $p \in K$ and let $\psi_{pq} \in A$ be such that $\psi_{pq}(p) = f(p)$ and $\psi_{pq}(q) = f(q)$. Then for each $q \in K$, there exists an open set $U_q$ containing $q$ such that $\psi_{pq}(z) + \varepsilon > f(z)$ for all $z \in U_q$. Since $K$ is compact, there are finitely many, $U_{q_1}, \cdots, U_{q_m}$, which cover $K$. Therefore, letting $\psi_p(x) \equiv \min\{\psi_{pq_i}\}_{i=1}^m$, it follows that $\psi_p(p) = f(p)$ and also $\psi_p(x) + \varepsilon > f(x)$ for all $x$. Now of course $p$ was arbitrary and for each $p$, there exists an open set $V_p$ containing $p$ such that $f(z) > \psi_p(z) - \varepsilon$ for all $z \in V_p$. Since $K$ is compact, there exist finitely many of these $V_p$ which cover $K$, $\{V_{p_i}\}_{i=1}^n$. Then it follows that for all $x$,
$$f(x) > \max\{\psi_{p_i}\}_{i=1}^n - \varepsilon$$
11.2. STONES GENERALIZATION 277
Let $\phi = \max\{\psi_{p_i}\}_{i=1}^n$. Then $\phi(x) + \varepsilon > f(x) > \phi(x) - \varepsilon$ and so $\|f - \phi\| < \varepsilon$.
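The next step is to generalize this to algebras in $C_0(\mathbb{R}^n)$ or more generally to $C_0(X)$ where $X$ is a locally compact Hausdorff space. I will ignore the more abstract case at this time.
The generalization to this case depends on the ideas illustrated in the following picture.
[Figure: the sphere $K$ with top point $P$ sitting over $\mathbb{R}^n$, vertical axis $x_{n+1}$; a point $y$ on $K$ is sent to $x = \theta(y)$ in $\mathbb{R}^n$.]
The sphere is labeled $K$ and there is a mapping, denoted by $\theta$, which takes the surface of $K \setminus \{P\}$ to $\mathbb{R}^n$ as implied by the picture. Obviously this mapping and its inverse are continuous. Now make the following definition for $f \in C_0(\mathbb{R}^n)$.
$$\tilde f(y) \equiv \begin{cases} f(\theta(y)) & \text{if } y \neq P \\ 0 & \text{if } y = P \end{cases}$$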
$$\lim_{k\to\infty} \|f - h_k\| = 0.$$
This proves the following theorem which is the real form of the Stone Weierstrass approximation theorem.
$$\lim_{k\to\infty} \|h_k - f\| = 0.$$
11.4 Exercises
1. Consider polynomials in $x^3$ on $[0,b]$. Such a polynomial is of the form $p(x) = a_0 + a_1 x^3 + a_2 x^6 + \cdots + a_n x^{3n}$. Show these polynomials are dense in the space of continuous functions defined on $[0,b]$.
4. Let $S^1$ denote the unit circle centered at $(0,0)$. For convenience, consider $S^1$ to be in the complex plane. Thus the points of $S^1$ are of the form $e^{ix}$ for $x \in \mathbb{R}$. Show that the set of linear combinations of the functions $e^{inx}$ for $n \in \mathbb{Z}$ is an algebra and is dense in $C(S^1)$. That is, show that if $f$ is continuous on $S^1$
Then show $\lim_{n\to\infty} \|p_n - f\| = 0$, where $\|f\| = \max\{|f(x)| : x \in [-1,1]\}$.
7. Suppose $f \in C_0([0,\infty))$ and also $|f(t)| \le Ce^{-rt}$. Let $A$ denote the algebra of linear combinations of functions of the form $e^{-st}$ for $s$ sufficiently large. Thus $A$ is dense in $C_0([0,\infty))$. Show that if
$$\int_0^\infty e^{-st} f(t)\,dt = 0$$
for each $s$ sufficiently large, then $f(t) = 0$. Next consider only $|f(t)| \le Ce^{rt}$ for some $r$. That is, $f$ has exponential growth. Show the same conclusion holds for $f$ if
$$\int_0^\infty e^{-st} f(t)\,dt = 0$$
for all $s$ sufficiently large. This justifies the Laplace transform procedure of differential equations where if the Laplace transforms of two functions are equal, then the two functions are considered to be equal. More can be said about this. Hint: For the last part, consider $g(t) \equiv e^{-2rt} f(t)$ and apply the first part to $g$. If $g(t) = 0$ then so is $f(t)$.
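8. Consider linear combinations of functions of the form $\prod_{i=1}^n \phi_i(x_i)$ where $\phi_i \in C_c(\mathbb{R})$. Show that this is an algebra $A$ of functions of $C_c(\mathbb{R}^n)$. Explain why $A$ is dense in $C_0(\mathbb{R}^n)$.
9. In the context of the above problem, if $f \in C_c(\mathbb{R}^n)$, define
$$\int f\,dx \equiv \lim_{k\to\infty} \int \cdots \int p_k\,dx_1 \cdots dx_n$$
Lemma 12.1.3 If $p > 1$, and $0 \le a, b$ then $ab \le \frac{a^p}{p} + \frac{b^q}{q}$.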
284 THE LP SPACES
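[Figure: the curve $x = t^{p-1}$, equivalently $t = x^{q-1}$, drawn with $t$ on the horizontal axis and $x$ on the vertical axis; $a$ is marked on the $t$ axis and $b$ on the $x$ axis.]
From this picture, the sum of the area between the $x$ axis and the curve added to the area between the $t$ axis and the curve is at least as large as $ab$. Using beginning calculus, this is equivalent to the following inequality.
$$ab \le \int_0^a t^{p-1}\,dt + \int_0^b x^{q-1}\,dx = \frac{a^p}{p} + \frac{b^q}{q}.$$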
The above picture represents the situation which occurs when $p > 2$ because the graph of the function is concave up. If $2 \ge p > 1$ the graph would be concave down or a straight line. You should verify that the same argument holds in these cases just as well. In fact, the only thing which matters in the above inequality is that the function $x = t^{p-1}$ be strictly increasing.
Note equality occurs when $a^p = b^q$.
Here is an alternate proof.
For fixed $b$, let $f(a) \equiv \frac{a^p}{p} + \frac{b^q}{q} - ab$. Then $f'(a) = a^{p-1} - b$. This is negative when $a < b^{1/(p-1)}$ and is positive when $a > b^{1/(p-1)}$. Therefore, $f$ has a minimum when $a = b^{1/(p-1)}$. In other words, when $a^p = b^{p/(p-1)} = b^q$ since $1/p + 1/q = 1$. Thus the minimum value of $f$ is
$$\frac{b^q}{p} + \frac{b^q}{q} - b^{1/(p-1)}\,b = b^q - b^q = 0.$$
It follows $f \ge 0$ and this yields the desired inequality.
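The inequality $ab \le \frac{a^p}{p} + \frac{b^q}{q}$ and its equality case can be spot-checked numerically. The sketch below evaluates the gap between the two sides; the function name `young_gap` and the sampling ranges are choices made for the illustration.

```python
import random

def young_gap(a, b, p):
    """a^p/p + b^q/q - a*b, where q is the conjugate exponent of p (1/p + 1/q = 1)."""
    q = p / (p - 1.0)
    return a**p / p + b**q / q - a * b

random.seed(3)
smallest_gap = min(
    young_gap(random.uniform(0, 2), random.uniform(0, 2), random.uniform(1.5, 4))
    for _ in range(10000)
)
print(smallest_gap)

# Equality case from the alternate proof: a = b^(1/(p-1)), i.e. a^p = b^q.
p, b = 3.0, 2.0
a = b ** (1.0 / (p - 1.0))
print(young_gap(a, b, p))
```

The gap stays nonnegative on the random sample and vanishes, up to rounding, at the minimizing value of $a$ found in the alternate proof.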
Proof of Hölder's inequality: If either $\int |f|^p\,d\mu$ or $\int |g|^q\,d\mu$ equals $\infty$, the inequality 12.1.1 is obviously valid because $\infty \ge$ anything. If either $\int |f|^p\,d\mu$ or $\int |g|^q\,d\mu$ equals 0, then $f = 0$ a.e. or $g = 0$ a.e. and so in this case the left side of the inequality equals 0 and so the inequality is therefore true. Therefore assume both $\int |f|^p\,d\mu$ and $\int |g|^q\,d\mu$ are less than $\infty$ and not equal to 0. Let $\left(\int |f|^p\,d\mu\right)^{1/p} = I(f)$ and let $\left(\int |g|^q\,d\mu\right)^{1/q} = I(g)$. Then using the lemma,
$$\int \frac{|f|}{I(f)} \frac{|g|}{I(g)}\,d\mu \le \frac{1}{p} \int \frac{|f|^p}{I(f)^p}\,d\mu + \frac{1}{q} \int \frac{|g|^q}{I(g)^q}\,d\mu = 1.$$
12.1. BASIC INEQUALITIES AND PROPERTIES 285
Hence,
$$\int |f|\,|g|\,d\mu \le I(f)\,I(g) = \left(\int |f|^p\,d\mu\right)^{1/p} \left(\int |g|^q\,d\mu\right)^{1/q}.$$
[Figure: the points $|x|$, $m$, $|y|$ marked on an axis, where $(|x| + |y|)/2 = m$.]
Now as shown above,
$$\left(\frac{|x| + |y|}{2}\right)^p \le \frac{|x|^p + |y|^p}{2}$$
which implies
$$|x + y|^p \le (|x| + |y|)^p \le 2^{p-1}\left(|x|^p + |y|^p\right)$$
Note that if $y = \phi(x)$ is any function for which the graph of $\phi$ is concave up, you could get a similar inequality by the same argument.
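Corollary 12.1.6 (Minkowski inequality) Let $1 \le p < \infty$. Then
$$\left(\int |f + g|^p\,d\mu\right)^{1/p} \le \left(\int |f|^p\,d\mu\right)^{1/p} + \left(\int |g|^p\,d\mu\right)^{1/p}. \qquad (12.1.2)$$
and $\left(\int |f + g|^p\,d\mu\right)^{1/p} \neq 0$ or there is nothing to prove. Therefore, using the above lemma,
$$\int |f + g|^p\,d\mu \le 2^{p-1} \left(\int |f|^p + |g|^p\,d\mu\right) < \infty.$$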
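Now $|f(\omega) + g(\omega)|^p \le |f(\omega) + g(\omega)|^{p-1} \left(|f(\omega)| + |g(\omega)|\right)$. Also, it follows from the definition of $p$ and $q$ that $p - 1 = \frac{p}{q}$. Therefore, using this and Hölder's inequality,
$$\int |f + g|^p\,d\mu \le \int |f + g|^{p-1} |f|\,d\mu + \int |f + g|^{p-1} |g|\,d\mu$$
$$= \int |f + g|^{\frac{p}{q}} |f|\,d\mu + \int |f + g|^{\frac{p}{q}} |g|\,d\mu$$
$$\le \left(\int |f + g|^p\,d\mu\right)^{\frac{1}{q}} \left(\int |f|^p\,d\mu\right)^{\frac{1}{p}} + \left(\int |f + g|^p\,d\mu\right)^{\frac{1}{q}} \left(\int |g|^p\,d\mu\right)^{\frac{1}{p}}.$$
Dividing both sides by $\left(\int |f + g|^p\,d\mu\right)^{\frac{1}{q}}$ yields 12.1.2.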
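With counting measure on a finite set the integrals above are finite sums, so Hölder's inequality and the Minkowski inequality can be tested on random vectors. This is a sanity check under that assumption only; the function names are hypothetical.

```python
import random

def p_norm(v, p):
    """(sum |v_i|^p)^(1/p), the L^p norm for counting measure on a finite set."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def holder_gap(f, g, p):
    """||f||_p ||g||_q - sum |f_i g_i|, with q the conjugate exponent of p."""
    q = p / (p - 1.0)
    return p_norm(f, p) * p_norm(g, q) - sum(abs(a * b) for a, b in zip(f, g))

def minkowski_gap(f, g, p):
    """||f||_p + ||g||_p - ||f + g||_p."""
    return p_norm(f, p) + p_norm(g, p) - p_norm([a + b for a, b in zip(f, g)], p)

random.seed(4)
gaps = []
for p in (1.5, 2.0, 3.0, 7.0):
    for _ in range(500):
        f = [random.uniform(-1, 1) for _ in range(20)]
        g = [random.uniform(-1, 1) for _ in range(20)]
        gaps.append(holder_gap(f, g, p))
        gaps.append(minkowski_gap(f, g, p))
print(min(gaps))
```

Both gaps are nonnegative on every sample, as the two inequalities predict.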
The following follows immediately from the above.
Then with this definition and using the convention that elements in $L^p$ are considered to be the same if they differ only on a set of measure zero, $\|\cdot\|_p$ is a norm on $L^p(\Omega)$ because if $\|f\|_p = 0$ then $f = 0$ a.e. and so $f$ is considered to be the zero function because it differs from 0 only on a set of measure zero.
12.2 Completeness
The following is an important definition.
Definition 12.2.1 A complete normed linear space is called a Banach¹ space.
$L^p$ is a Banach space. This is the next big theorem.
Theorem 12.2.2 The following hold for $L^p(\Omega)$
a.) $L^p(\Omega)$ is complete.
b.) If $\{f_n\}$ is a Cauchy sequence in $L^p(\Omega)$, then there exists $f \in L^p(\Omega)$ and a subsequence which converges a.e. to $f \in L^p(\Omega)$, and $\|f_n - f\|_p \to 0$.
Proof: Let $\{f_n\}$ be a Cauchy sequence in $L^p(\Omega)$. This means that for every $\varepsilon > 0$ there exists $N$ such that if $n, m \ge N$, then $\|f_n - f_m\|_p < \varepsilon$. Now select a subsequence as follows. Let $n_1$ be such that $\|f_n - f_m\|_p < 2^{-1}$ whenever $n, m \ge n_1$. Let $n_2$ be such that $n_2 > n_1$ and $\|f_n - f_m\|_p < 2^{-2}$ whenever $n, m \ge n_2$. If $n_1, \cdots, n_k$ have been chosen, let $n_{k+1} > n_k$ and whenever $n, m \ge n_{k+1}$, $\|f_n - f_m\|_p < 2^{-(k+1)}$. The subsequence just mentioned is $\{f_{n_k}\}$. Thus, $\|f_{n_k} - f_{n_{k+1}}\|_p < 2^{-k}$. Let
$$g_{k+1} = f_{n_{k+1}} - f_{n_k}.$$
Then by the corollary to Minkowski's inequality,
$$\infty > \sum_{k=1}^\infty \|g_{k+1}\|_p \ge \sum_{k=1}^m \|g_{k+1}\|_p \ge \left\| \sum_{k=1}^m |g_{k+1}| \right\|_p$$
for all $m$. It follows that
$$\int \left(\sum_{k=1}^m |g_{k+1}|\right)^p d\mu \le \left(\sum_{k=1}^\infty \|g_{k+1}\|_p\right)^p < \infty \qquad (12.2.3)$$
1 These spaces are named after Stefan Banach, 1892-1945. Banach spaces are the basic item of study in the subject of functional analysis and will be considered later in this book.
There is a recent biography of Banach, R. Kałuża, The Life of Stefan Banach, (A. Kostant and W. Woyczyński, translators and editors) Birkhäuser, Boston (1996). More information on Banach can also be found in a recent short article written by Douglas Henderson who is in the department of chemistry and biochemistry at BYU.
Banach was born in Austria, worked in Poland and died in the Ukraine but never moved. This is because borders kept changing. There is a rumor that he died in a German concentration camp which is apparently not true. It seems he died after the war of lung cancer.
He was an interesting character. He hated taking examinations so much that he did not receive his undergraduate university degree. Nevertheless, he did become a professor of mathematics due to his important research. He and some friends would meet in a cafe called the Scottish cafe where they wrote on the marble table tops until Banach's wife supplied them with a notebook which became the Scottish notebook and was eventually published.
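for all $m$ and so the monotone convergence theorem implies that the sum up to $m$ in 12.2.3 can be replaced by a sum up to $\infty$. Thus,
$$\int \left(\sum_{k=1}^\infty |g_{k+1}|\right)^p d\mu < \infty$$
which requires
$$\sum_{k=1}^\infty |g_{k+1}(x)| < \infty \text{ a.e. } x.$$
Therefore, $\sum_{k=1}^\infty g_{k+1}(x)$ converges for a.e. $x$ because the functions have values in a complete space, $\mathbb{C}$, and this shows the partial sums form a Cauchy sequence. Now let $x$ be such that this sum is finite. Then define
$$f(x) \equiv f_{n_1}(x) + \sum_{k=1}^\infty g_{k+1}(x) = \lim_{m\to\infty} f_{n_m}(x)$$
since $\sum_{k=1}^m g_{k+1}(x) = f_{n_{m+1}}(x) - f_{n_1}(x)$. Therefore there exists a set $E$ having measure zero such that
$$\lim_{k\to\infty} f_{n_k}(x) = f(x)$$
for all $x \notin E$. Redefine $f_{n_k}$ to equal 0 on $E$ and let $f(x) = 0$ for $x \in E$. It then follows that $\lim_{k\to\infty} f_{n_k}(x) = f(x)$ for all $x$. By Fatou's lemma, and the Minkowski inequality,
$$\|f - f_{n_k}\|_p = \left(\int |f - f_{n_k}|^p\,d\mu\right)^{1/p} \le \liminf_{m\to\infty} \left(\int |f_{n_m} - f_{n_k}|^p\,d\mu\right)^{1/p} = \liminf_{m\to\infty} \|f_{n_m} - f_{n_k}\|_p$$
$$\le \liminf_{m\to\infty} \sum_{j=k}^{m-1} \left\|f_{n_{j+1}} - f_{n_j}\right\|_p \le \sum_{i=k}^\infty \left\|f_{n_{i+1}} - f_{n_i}\right\|_p \le 2^{-(k-1)}. \qquad (12.2.4)$$
Therefore, $f \in L^p(\Omega)$ because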
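Lemma 12.3.1 Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{F}, \lambda)$ be finite complete measure spaces and let $f$ be $\mu \times \lambda$ measurable and uniformly bounded. Then the following inequality is valid for $p \ge 1$.
$$\int_X \left(\int_Y |f(x,y)|^p\,d\lambda\right)^{\frac{1}{p}} d\mu \ge \left(\int_Y \left(\int_X |f(x,y)|\,d\mu\right)^p d\lambda\right)^{\frac{1}{p}}. \qquad (12.3.5)$$
Let
$$J(y) = \int_X |f(x,y)|\,d\mu.$$
Note there is no problem in writing this for a.e. $y$ because $f$ is product measurable and Lemma 9.4.8 on Page 232. Then by Fubini's theorem,
$$\int_Y \left(\int_X |f(x,y)|\,d\mu\right)^p d\lambda = \int_Y J(y)^{p-1} \int_X |f(x,y)|\,d\mu\,d\lambda = \int_X \int_Y J(y)^{p-1} |f(x,y)|\,d\lambda\,d\mu$$
Now apply Hölder's inequality in the last integral above and recall $p - 1 = \frac{p}{q}$. This yields
$$\int_Y \left(\int_X |f(x,y)|\,d\mu\right)^p d\lambda \le \int_X \left(\int_Y J(y)^p\,d\lambda\right)^{\frac{1}{q}} \left(\int_Y |f(x,y)|^p\,d\lambda\right)^{\frac{1}{p}} d\mu$$
$$= \left(\int_Y J(y)^p\,d\lambda\right)^{\frac{1}{q}} \int_X \left(\int_Y |f(x,y)|^p\,d\lambda\right)^{\frac{1}{p}} d\mu$$
$$= \left(\int_Y \left(\int_X |f(x,y)|\,d\mu\right)^p d\lambda\right)^{\frac{1}{q}} \int_X \left(\int_Y |f(x,y)|^p\,d\lambda\right)^{\frac{1}{p}} d\mu. \qquad (12.3.6)$$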
Therefore, dividing both sides by the first factor in the above expression,
$$\left(\int_Y \left(\int_X |f(x,y)|\,d\mu\right)^p d\lambda\right)^{\frac{1}{p}} \le \int_X \left(\int_Y |f(x,y)|^p\,d\lambda\right)^{\frac{1}{p}} d\mu. \qquad (12.3.7)$$
Note that 12.3.7 holds even if the first factor of 12.3.6 equals zero.
Now consider the case where $f$ is not assumed to be bounded and where the measure spaces are $\sigma$ finite.
Theorem 12.3.2 Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{F}, \lambda)$ be $\sigma$-finite measure spaces and let $f$ be product measurable. Then the following inequality is valid for $p \ge 1$.
$$\int_X \left(\int_Y |f(x,y)|^p\,d\lambda\right)^{\frac{1}{p}} d\mu \ge \left(\int_Y \left(\int_X |f(x,y)|\,d\mu\right)^p d\lambda\right)^{\frac{1}{p}}. \qquad (12.3.8)$$
Proof: Since the two measure spaces are $\sigma$ finite, there exist measurable sets, $X_m$ and $Y_k$ such that $X_m \subseteq X_{m+1}$ for all $m$, $Y_k \subseteq Y_{k+1}$ for all $k$, and $\mu(X_m), \lambda(Y_k) < \infty$. Now define
$$f_n(x,y) \equiv \begin{cases} f(x,y) & \text{if } |f(x,y)| \le n \\ n & \text{if } |f(x,y)| > n. \end{cases}$$
Thus $f_n$ is uniformly bounded and product measurable. By the above lemma,
$$\int_{X_m} \left(\int_{Y_k} |f_n(x,y)|^p\,d\lambda\right)^{\frac{1}{p}} d\mu \ge \left(\int_{Y_k} \left(\int_{X_m} |f_n(x,y)|\,d\mu\right)^p d\lambda\right)^{\frac{1}{p}}. \qquad (12.3.9)$$
Now observe that $|f_n(x,y)|$ increases in $n$ and the pointwise limit is $|f(x,y)|$. Therefore, using the monotone convergence theorem in 12.3.9 yields the same inequality with $f$ replacing $f_n$. Next let $k \to \infty$ and use the monotone convergence theorem again to replace $Y_k$ with $Y$. Finally let $m \to \infty$ in what is left to obtain 12.3.8.
Note that the proof of this theorem depends on two manipulations, the interchange of the order of integration and Hölder's inequality. Note that there is nothing to check in the case of double sums. Thus if $a_{ij} \ge 0$, it is always the case that
$$\left(\sum_j \left(\sum_i a_{ij}\right)^p\right)^{1/p} \le \sum_i \left(\sum_j a_{ij}^p\right)^{1/p}$$
because the integrals in this case are just sums and $(i,j) \to a_{ij}$ is measurable.
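The double sum form of the inequality is simple to test on a random nonnegative array. The sketch below is only a numerical check; the array shape, the sampled exponents, and the function names are assumptions of the illustration.

```python
import random

def norm_of_sums(a, p):
    """( sum_j ( sum_i a[i][j] )^p )^(1/p): the left side of the inequality."""
    cols = range(len(a[0]))
    return sum(sum(row[j] for row in a) ** p for j in cols) ** (1.0 / p)

def sum_of_norms(a, p):
    """sum_i ( sum_j a[i][j]^p )^(1/p): the right side of the inequality."""
    return sum(sum(v ** p for v in row) ** (1.0 / p) for row in a)

random.seed(5)
worst = float("-inf")   # largest observed value of (left side) - (right side)
for p in (1.0, 1.5, 2.0, 5.0):
    for _ in range(500):
        a = [[random.uniform(0, 1) for _ in range(5)] for _ in range(4)]
        worst = max(worst, norm_of_sums(a, p) - sum_of_norms(a, p))
print(worst)
```

For $p = 1$ both sides are the same double sum, so the difference is zero up to rounding; for $p > 1$ it is typically strictly negative.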
The $L^p$ spaces have many important properties.
Proof: Recall that a function, $f$, having values in $\mathbb{R}$ can be written in the form $f = f^+ - f^-$ where
12.5 Density Of $C_c(\Omega)$
For $\Omega$ a topological space, $C_c(\Omega)$ is the space of continuous functions with compact support in $\Omega$. If you have never heard of a topological space, think $\mathbb{R}^p$. Also recall the following definition.
Lemma 12.5.2 Let $\Omega$ be a metric space in which the closed balls are compact and let $K$ be a compact subset of $V$, an open set. Then there exists a continuous function $f : \Omega \to [0,1]$ such that $f(x) = 1$ for all $x \in K$ and $\operatorname{spt}(f)$ is a compact subset of $V$. That is, $K \prec f \prec V$.
Proof: First note that $V$ is the increasing union of open sets $\{W_m\}_{m=1}^\infty$ having compact closures. Therefore, there exists an $m$ such that $K \subseteq W_m$ since otherwise, you could obtain a nested sequence of nonempty compact sets of the form $K \setminus W_m$ which would have a point in common contrary to the assertion that $K \subseteq V$. Pick such an $m$. Then let
$$f(x) = \frac{\operatorname{dist}\left(x, W_m^C\right)}{\operatorname{dist}\left(x, W_m^C\right) + \operatorname{dist}(x, K)}$$
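To see what this formula produces, here is a one dimensional sketch. The sets $K = [0,1]$ and $W = (-1,2)$ are hypothetical choices standing in for $K$ and $W_m$, picked so that both distance functions have closed forms.

```python
def dist_to_complement_of_W(x):
    """Distance from x to the complement of W = (-1, 2); zero outside W."""
    return max(0.0, min(x + 1.0, 2.0 - x))

def dist_to_K(x):
    """Distance from x to K = [0, 1]; zero on K."""
    return max(0.0, -x, x - 1.0)

def f(x):
    """dist(x, W^C) / (dist(x, W^C) + dist(x, K)).
    The denominator never vanishes because K is contained in W."""
    d_w = dist_to_complement_of_W(x)
    return d_w / (d_w + dist_to_K(x))

for x in (-2.0, -0.5, 0.0, 0.5, 1.0, 1.5, 3.0):
    print(x, f(x))
```

The function equals 1 on $K$, vanishes off $W$, and interpolates continuously in between, so its support is contained in the closure of $W$.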
$$s(x) = \sum_{i=1}^m c_i \mathcal{X}_{E_i}(x)$$
is a simple function in $L^p$ where the $c_i$ are the distinct nonzero values of $s$, each $\mu(E_i) < \infty$ since otherwise $s \notin L^p$ due to the inequality
$$\int |s|^p\,d\mu \ge |c_i|^p \mu(E_i).$$
By Theorem 12.4.1, simple functions are dense in $L^p(\Omega)$. Therefore, the set of functions $C_c(\Omega)$ is also dense in $L^p(\Omega)$.
$$f_w(x) = f(x - w).$$
12.7. SEPARABILITY, SOME SPECIAL FUNCTIONS 293
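Proof: Let $\varepsilon > 0$ be given and let $g \in C_c(\mathbb{R}^n)$ with $\|g - f\|_p < \frac{\varepsilon}{3}$. Since Lebesgue measure is translation invariant ($m_n(w + E) = m_n(E)$),
$$\|g_w - f_w\|_p = \|g - f\|_p < \frac{\varepsilon}{3}.$$
You can see this from looking at simple functions and passing to the limit or you could use the change of variables formula to verify it.
Therefore
$$\|g - g_w\|_p = \left(\int_B |g(x) - g(x - w)|^p\,dm_n\right)^{1/p} \le \varepsilon\,\frac{m_n(B)^{1/p}}{3\left(1 + m_n(B)^{1/p}\right)} < \frac{\varepsilon}{3}.$$
$$x = (x_1, \cdots, x_n), \qquad D^\alpha f(x) \equiv \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}.$$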
Definition 12.7.2 Define $G_1$ to be the functions of the form $p(x)\,e^{-a|x|^2}$ where $a > 0$ is rational and $p(x)$ is a polynomial having all rational coefficients, $a_\alpha$ being "rational" if it is of the form $a + ib$ for $a, b \in \mathbb{Q}$. Let $G$ be all finite sums of functions in $G_1$. Thus $G$ is an algebra of functions which has the property that if $f \in G$ then $\overline{f} \in G$.
Thus there are countably many functions in $G_1$. This is because, for each $m$, there are countably many choices for $a_\alpha$ for $|\alpha| \le m$ since there are finitely many $\alpha$ for $|\alpha| \le m$ and for each such $\alpha$, there are countably many choices for $a_\alpha$ since $\mathbb{Q} + i\mathbb{Q}$ is countable. (Why?) Thus there are countably many polynomials having degree no more than $m$. This is true for each $m$ and so the number of different polynomials is a countable union of countable sets which is countable. Now there are countably many choices of $e^{-a|x|^2}$ and so there are countably many in $G_1$ because
$|y_1| = |y_2|$, then suppose $y_{1k} \neq y_{2k}$. This must happen for some $k$ because $y_1 \neq y_2$. Then let $f(x) \equiv x_k e^{-|x|^2}$. Thus $G$ separates points. Now $e^{-|x|^2}$ is never equal to
These functions are clearly quite specialized. Therefore, the following theorem is somewhat surprising.
Proof: Let $f \in L^p(\mathbb{R}^n)$. Then there exists $g \in C_c(\mathbb{R}^n)$ such that $\|f - g\|_p < \varepsilon$. Now let $b > 0$ be large enough that
$$\int_{\mathbb{R}^n} \left(e^{-b|x|^2}\right)^p dx < \varepsilon^p.$$
Then $x \to g(x)\,e^{b|x|^2}$ is in $C_c(\mathbb{R}^n) \subseteq C_0(\mathbb{R}^n)$. Therefore, from Lemma 12.7.3 there exists $\psi \in G$ such that
$$\left\|g\,e^{b|\cdot|^2} - \psi\right\|_\infty < 1$$
Therefore, letting $\phi(x) \equiv e^{-b|x|^2} \psi(x)$ it follows that $\phi \in G$ and for all $x \in \mathbb{R}^n$,
Therefore,
$$\left(\int_{\mathbb{R}^n} |g(x) - \phi(x)|^p\,dx\right)^{1/p} \le \left(\int_{\mathbb{R}^n} \left(e^{-b|x|^2}\right)^p dx\right)^{1/p} < \varepsilon.$$
It follows
$$\|f - \phi\|_p \le \|f - g\|_p + \|g - \phi\|_p < 2\varepsilon.$$
From now on, we can drop the restriction that the coefficients of the polynomials in $G$ are rational. We also drop the restriction that $a$ is rational. Thus $G$ will be finite sums of functions which are of the form $p(x)\,e^{-a|x|^2}$ where the coefficients of
Thus these special functions are infinitely differentiable (smooth). They also have the property that they and all their derivatives vanish as $|x| \to \infty$.
12.8 Convolutions
An important construction is the convolution of two functions. This is defined as follows.
the last step following from translation invariance of Lebesgue measure, Theorem 10.1.2 or from the change of variables formulas. It follows from this inequality that for a.e. $x$,
$$\int |f(x - y)|\,|g(y)|\,dy < \infty$$
Proof: Pick $z \in U$ and let $r$ be small enough that $B(z, 2r) \subseteq U$. Then let $\psi \in C_c^\infty(B(z, 2r)) \subseteq C_c^\infty(U)$ be the function of the above example.
Note that here, a specific order is mandated, unlike the earlier treatment given above for convolutions in the context of Lebesgue measure where the order does not matter.
The following lemma will be useful in what follows. It says that one of these very irregular functions in $L^1_{loc}(\mathbb{R}^n, \mu)$ is smoothed out by convolving with a mollifier.
$$\frac{g(x + te_j - y) - g(x - y)}{t},$$
is uniformly bounded. To see this easily, use Theorem 6.4.2 on Page 126 to get the
existence of a constant, M depending on
such that
$$|g(x + te_j - y) - g(x - y)| \le M|t|$$
for any choice of $x$ and $y$. Therefore, there exists a dominating function for the integrand of the above integral which is of the form $C|f(y)|\,\mathcal{X}_K$ where $K$ is a compact set depending on the support of $g$. It follows the limit of the difference quotient above passes inside the integral as $t \to 0$ and
$$\frac{\partial}{\partial x_j}(f * g)(x) = \int f(y)\,\frac{\partial}{\partial x_j} g(x - y)\,d\mu(y).$$
Now letting $\frac{\partial}{\partial x_j} g$ play the role of $g$ in the above argument, partial derivatives of all orders exist. A similar use of the dominated convergence theorem shows all these partial derivatives are also continuous.
The following theorem is a lot like Lemma 10.5.2 except the function is infinitely differentiable.
Theorem 12.9.8 Let $K$ be a compact subset of an open set $U$. Then there exists a function, $h \in C_c^\infty(U)$, such that $h(x) = 1$ for all $x \in K$ and $h(x) \in [0,1]$ for all $x$.
Proof: Let $r > 0$ be small enough that $K + B(0, 3r) \subseteq U$. The symbol, $K + B(0, 3r)$ means
$$\cup \{B(k, 3r) : k \in K\}.$$
12.9. MOLLIFIERS AND DENSITY OF $C_c^\infty(\mathbb{R}^n)$ 299
[Figure: the sets $K$, $K_r$, and $U$.]
Theorem 12.9.9 For each $p \ge 1$, $C_c^\infty(\mathbb{R}^n)$ is dense in $L^p(\mathbb{R}^n)$. Here the measure is Lebesgue measure.
Proof: Let $f \in L^p(\mathbb{R}^n)$ and let $\varepsilon > 0$ be given. Choose $g \in C_c(\mathbb{R}^n)$ such that $\|f - g\|_p < \frac{\varepsilon}{2}$. This can be done by using Theorem 12.5.3. Now let
$$g_m(x) = g * \psi_m(x) \equiv \int g(x - y)\,\psi_m(y)\,dm_n(y) = \int g(y)\,\psi_m(x - y)\,dm_n(y)$$
This is a very remarkable result. Functions in $L^p(\mathbb{R}^n)$ don't need to be continuous anywhere and yet every such function is very close in the $L^p$ norm to one which is infinitely differentiable having compact support.
$$\le \left(\int \left(\int |g(x) - g(x - y)|\,\psi_m(y)\,dm_n(y)\right)^p dm_n(x)\right)^{\frac{1}{p}}$$
$$\le \int \left(\int |g(x) - g(x - y)|^p\,dm_n(x)\right)^{\frac{1}{p}} \psi_m(y)\,dm_n(y)$$
$$= \int_{B\left(0, \frac{1}{m}\right)} \|g - g_y\|_p\,\psi_m(y)\,dm_n(y) < \frac{\varepsilon}{2}$$
whenever $m$ is large enough. This follows from the uniform continuity of $g$. Theorem 12.3.2 was used to obtain the third inequality. There is no measurability problem because the function
12.10 $L^\infty$
Formally the conjugate index to 1 would be $\infty$, regarding $1/\infty$ as 0. This is also an important space. Sometimes we call it the space of essentially bounded functions meaning that they are bounded off a set of measure zero.
a.e. and so $\|f\|_\infty + \|g\|_\infty$ serves as one of the constants, $M$, in the definition of $\|f + g\|_\infty$. Therefore,
$$\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty.$$
Next let $c$ be a number. Then $|cf(x)| = |c|\,|f(x)| \le |c|\,\|f\|_\infty$ and so $\|cf\|_\infty \le |c|\,\|f\|_\infty$. Therefore since $c$ is arbitrary, $\|f\|_\infty = \|c(1/c)f\|_\infty \le \frac{1}{|c|} \|cf\|_\infty$ which implies $|c|\,\|f\|_\infty \le \|cf\|_\infty$. Thus $\|\cdot\|_\infty$ is a norm as claimed.
To verify completeness, let $\{f_n\}$ be a Cauchy sequence in $L^\infty(\Omega)$ and use the above claim to get the existence of a set of measure zero, $E_{nm}$, such that for all $x \notin E_{nm}$,
$$|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty$$
Let $E = \cup_{n,m} E_{nm}$. Thus $\mu(E) = 0$ and for each $x \notin E$, $\{f_n(x)\}_{n=1}^\infty$ is a Cauchy sequence in $\mathbb{C}$. Let
$$f(x) = \begin{cases} 0 & \text{if } x \in E \\ \lim_{n\to\infty} f_n(x) & \text{if } x \notin E \end{cases} = \lim_{n\to\infty} \mathcal{X}_{E^C}(x)\,f_n(x).$$
and $F = \cup_{n=1}^\infty F_n$, it follows $\mu(F) = 0$ and that for $x \notin F \cup E$,
12.11 Exercises
1. Let $E$ be a Lebesgue measurable set in $\mathbb{R}$. Suppose $m(E) > 0$. Consider the set
$$E - E = \{x - y : x \in E, y \in E\}.$$
Show that $E - E$ contains an interval. Hint: Let
$$f(x) = \int \mathcal{X}_E(t)\,\mathcal{X}_E(x + t)\,dt.$$
Thus, the partial sums of the Fourier series give the optimal approximation of $g$ in $L^2(0, 2\pi)$!
Thus the Fourier series converges in $L^2(0, 2\pi)$ to the function it came from. Hint: This is not too hard if you use Problem 5 on Page 280. This does not show anything about pointwise convergence of the partial sums of the Fourier series! This will be done later. In fact the question of pointwise convergence of Fourier series to such a function in $L^2(0, 2\pi)$ was an unsolved problem till the middle 1960's.
5. Now suppose that $f$ is a continuous $2\pi$ periodic function. Suppose also that for $t \in [0, 2\pi]$,
$$f(t) = f(0) + \int_0^t g(s)\,ds$$
where $g \in L^2(0, 2\pi)$. Letting $S_n g(x) \equiv \sum_{k=-n}^n g_k e^{ikx}$ for
$$g_k = \frac{1}{2\pi} \int_0^{2\pi} g(s)\,e^{-iks}\,ds$$
(a) Show that this convergence implies that for each $t \in [0, 2\pi]$,
$$f(t) = f(0) + \lim_{n\to\infty} \int_0^t S_n g(s)\,ds$$
Now use the integral of the sum equals the sum of the integrals on the right to find that
$$f(t) = f(0) + \lim_{n\to\infty} \sum_{k=-n,\,k\neq 0}^n \frac{g_k}{ik} \left(e^{ikt} - 1\right)$$
$$f(t) = f(0) - C + \lim_{n\to\infty} \sum_{k=-n,\,k\neq 0}^n f_k e^{ikt}$$
304 THE LP SPACES
(c) Finally, use the proof of the completeness of $L^2$ to argue that for some subsequence $n_m$, $\lim_{m\to\infty}n_m = \infty$,
\[ f(t) = \lim_{m\to\infty}\sum_{k=-n_m,\,k\neq0}^{n_m}f_ke^{ikt} + f_0 \]
for a.e. $t \in [0,2\pi]$, where $f_0$ is the Fourier coefficient of $f$, $f_0 = \frac{1}{2\pi}\int_0^{2\pi}f(t)\,dt$. Explain why $f(0) - C = f_0$. Explain why this gives a result for pointwise convergence of the Fourier series of $f$ and describe this result carefully.
Thus f = f1 + f2 .
(e) Note how Fubini's theorem was used. Why were the functions of interest
product measurable? After all, fi is a function of although this has
been suppressed. You might consider the proof of measurability which
led to Problem 12 on Page 234.
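7. Let $K$ be a bounded subset of $L^p(\mathbb{R}^n)$ and suppose that for each $\varepsilon > 0$ there exists $G$ such that $G$ is compact with
\[ \int_{\mathbb{R}^n\setminus G}|u(x)|^p\,dx < \varepsilon^p \]
and for all $\varepsilon > 0$, there exists a $\delta > 0$ such that if $|h| < \delta$, then
\[ \int|u(x+h) - u(x)|^p\,dx < \varepsilon^p \]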
\[ \int_0^\infty F^p\,dx = \overbrace{xF^p\big|_0^\infty}^{\text{show } = 0} - p\int_0^\infty xF^{p-1}F'\,dx. \]
Now show $xF' = f - F$ and use this in the last integral. Complete the argument by using Holder's inequality and $p - 1 = p/q$.
14. Now suppose $f \in L^p(0,\infty)$, $p > 1$, and $f$ not necessarily in $C_c(0,\infty)$. Show that $F(x) = \frac{1}{x}\int_0^xf(t)\,dt$ still makes sense for each $x > 0$. Show the inequality of Problem 13 is still valid. This inequality is called Hardy's inequality. Hint: To show this, use the above inequality along with the density of $C_c(0,\infty)$ in $L^p(0,\infty)$.
16. Prove Vitali's Convergence theorem: Let $\{f_n\}$ be uniformly integrable and complex valued, $\mu(\Omega) < \infty$, $f_n(x) \to f(x)$ a.e. where $f$ is measurable. Then $f \in L^1$ and $\lim_{n\to\infty}\int_\Omega|f_n - f|\,d\mu = 0$. Hint: Use Egoroff's theorem to show $\{f_n\}$ is a Cauchy sequence in $L^1(\Omega)$.
17. Show the Vitali Convergence theorem implies the Dominated Convergence theorem for finite measure spaces but there exist examples where the Vitali convergence theorem works and the dominated convergence theorem does not.
Now raise both ends to the $1/p$ power and take $\liminf$ and $\limsup$ as $p \to \infty$. You should get $\|f\|_\infty - \varepsilon \le \liminf\|f\|_p \le \limsup\|f\|_p \le \|f\|_\infty$
20. Show $L^1(\mathbb{R}) \nsubseteq L^2(\mathbb{R})$ and $L^2(\mathbb{R}) \nsubseteq L^1(\mathbb{R})$ if Lebesgue measure is used. Hint: Consider $1/\sqrt{x}$ and $1/x$.
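\[ \frac{1}{q} = \frac{\theta}{r} + \frac{1-\theta}{s}. \]
show that
\[ \left(\int|f|^q\,d\mu\right)^{1/q} \le \left(\left(\int|f|^r\,d\mu\right)^{1/r}\right)^{\theta}\left(\left(\int|f|^s\,d\mu\right)^{1/s}\right)^{1-\theta}. \]
Hint:
\[ \int|f|^q\,d\mu = \int|f|^{q\theta}|f|^{q(1-\theta)}\,d\mu. \]
Now note that $1 = \frac{q\theta}{r} + \frac{q(1-\theta)}{s}$ and use Holder's inequality.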
22. Suppose $f$ is a function in $L^1(\mathbb{R})$ and $f$ is infinitely differentiable. Does it follow that $f' \in L^1(\mathbb{R})$? Is $f'$ measurable? Hint: What if $\phi \in C_c^\infty(0,1)$ and $f(x) = \phi(2^n(x-n))$ for $x \in (n,n+1)$, $f(x) = 0$ if $x < 0$?
Show that if $f \in L^1(\mathbb{R}^p)$, then $\lim_{|x|\to\infty}Ff(x) = 0$. Hint: You might try to show this first for $f \in C_c^\infty(\mathbb{R}^p)$.
Fourier Transforms
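Definition 13.1.1 For $\psi \in \mathcal{G}$ define the Fourier transform, $F$ and the inverse Fourier transform, $F^{-1}$ by
\[ F\psi(t) \equiv (2\pi)^{-n/2}\int_{\mathbb{R}^n}e^{-it\cdot x}\psi(x)\,dx, \]
\[ F^{-1}\psi(t) \equiv (2\pi)^{-n/2}\int_{\mathbb{R}^n}e^{it\cdot x}\psi(x)\,dx. \]
where $t\cdot x \equiv \sum_{i=1}^nt_ix_i$. Note there is no problem with this definition because $\psi$ is in $L^1(\mathbb{R}^n)$ and therefore,
\[ \left|e^{it\cdot x}\psi(x)\right| \le |\psi(x)|, \]
an integrable function.
One reason for using the functions $\mathcal{G}$ is that it is very easy to compute the Fourier transform of these functions. The first thing to do is to verify $F$ and $F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$ and that $F^{-1}\circ F(\psi) = \psi$.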
\[ = \left(\frac{1}{2\pi}\right)^{n/2}\left(\frac{\sqrt{\pi}}{\sqrt{c}}\right)^{n}e^{-\frac{1}{4c}|s|^2} = \left(\frac{1}{2c}\right)^{n/2}e^{-\frac{1}{4c}|s|^2}. \quad (13.1.1) \]
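Proof: Consider first the case of one dimension. Let $H(s)$ be given by
\[ H(s) \equiv \int_{\mathbb{R}}e^{-ct^2}e^{-ist}\,dt = \int_{\mathbb{R}}e^{-ct^2}\cos(st)\,dt \]
Then using the dominated convergence theorem to differentiate,
\[ H'(s) = \int_{\mathbb{R}}\left(-e^{-ct^2}\right)t\sin(st)\,dt = \left.\frac{e^{-ct^2}}{2c}\sin(st)\right|_{-\infty}^{\infty} - \frac{s}{2c}\int_{\mathbb{R}}e^{-ct^2}\cos(st)\,dt = -\frac{s}{2c}H(s). \]
Also $H(0) = \int_{\mathbb{R}}e^{-ct^2}\,dt$. Thus $H(0) = \int_{\mathbb{R}}e^{-cx^2}\,dx \equiv I$ and so
\[ I^2 = \int_{\mathbb{R}^2}e^{-c(x^2+y^2)}\,dx\,dy = \int_0^\infty\int_0^{2\pi}e^{-cr^2}r\,d\theta\,dr = \frac{\pi}{c}. \]
For another proof of this which does not use change of variables and polar coordinates, see Problem 14 below. Hence
\[ H'(s) + \frac{s}{2c}H(s) = 0,\quad H(0) = \sqrt{\frac{\pi}{c}}. \]
It follows that $H(s) = \sqrt{\frac{\pi}{c}}\,e^{-\frac{s^2}{4c}}$. Hence
\[ \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}}e^{-ct^2}e^{-ist}\,dt = \sqrt{\frac{\pi}{c}}\frac{1}{\sqrt{2\pi}}e^{-\frac{s^2}{4c}} = \left(\frac{1}{2c}\right)^{1/2}e^{-\frac{s^2}{4c}}. \]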
This proves the formula in the case of one dimension. The case of the inverse Fourier transform is similar. The $n$ dimensional formula follows from Fubini's theorem.
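The one dimensional formula can be checked numerically. The following is only a sketch: the function names, the truncation of the real line to $[-12,12]$, and the midpoint rule are choices made here for illustration and are not part of the text.

```python
import math

def fourier_transform_gaussian(s, c, half_width=12.0, steps=20000):
    """Midpoint-rule value of (2*pi)^(-1/2) * integral of exp(-c t^2) exp(-i s t) dt.

    The sine part integrates to zero because it is odd, so only the
    cosine part is summed (this is the function H(s) of the proof).
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        t = -half_width + (j + 0.5) * h
        total += math.exp(-c * t * t) * math.cos(s * t)
    return total * h / math.sqrt(2.0 * math.pi)

def closed_form(s, c):
    """Right side of the one dimensional formula: (2c)^(-1/2) exp(-s^2/(4c))."""
    return (1.0 / (2.0 * c)) ** 0.5 * math.exp(-s * s / (4.0 * c))

for c, s in [(0.5, 0.0), (1.0, 1.5), (2.0, 3.0)]:
    print(c, s, fourier_transform_gaussian(s, c), closed_form(s, c))
```

The two printed columns should agree to many digits, since the integrand is smooth and decays rapidly.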
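With these formulas, it is easy to verify $F, F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$ and $F\circ F^{-1} = F^{-1}\circ F = \mathrm{id}$.
Theorem 13.1.3 Each of $F$ and $F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$. Also $F^{-1}\circ F(\psi) = \psi$ and $F\circ F^{-1}(\psi) = \psi$.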
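Proof: To make the notation simpler, $\int$ will symbolize $(2\pi)^{-n/2}\int_{\mathbb{R}^n}$. Also, $f_b(x) \equiv e^{-b|x|^2}$. Then from the above
\[ Ff_b = (2b)^{-n/2}f_{(4b)^{-1}} \]
The first claim will be shown if it is shown that $F\psi \in \mathcal{G}$ for $\psi(x) = x^\alpha e^{-b|x|^2}$ and this is clearly in $\mathcal{G}$ because it equals a polynomial times $e^{-\frac{|t|^2}{4b}}$. Similarly, $F^{-1} : \mathcal{G} \to \mathcal{G}$. Now consider $F^{-1}\circ F(\psi)(s)$. From the above, and integrating by parts,
\[ F^{-1}\circ F(\psi)(s) = (i)^{|\alpha|}\int e^{is\cdot t}D_t^\alpha\left(\int e^{-it\cdot x}e^{-b|x|^2}\,dx\right)dt \]
\[ = (i)^{|\alpha|}(-i)^{|\alpha|}s^\alpha\int e^{is\cdot t}\left(\int e^{-it\cdot x}e^{-b|x|^2}\,dx\right)dt \]
\[ = s^\alpha F^{-1}(F(f_b))(s) \]
\[ F^{-1}(F(f_b))(s) = F^{-1}\left((2b)^{-n/2}f_{(4b)^{-1}}\right)(s) = (2b)^{-n/2}F^{-1}\left(f_{(4b)^{-1}}\right)(s) \]
\[ = (2b)^{-n/2}\left(2(4b)^{-1}\right)^{-n/2}f_{(4(4b)^{-1})^{-1}}(s) = f_b(s) \]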
Lemma 13.2.4 $F$ and $F^{-1}$ are both one to one, onto, and are inverses of each other.
Proof: First note $F$ and $F^{-1}$ are both linear. This follows directly from the definition. Suppose now $FT = 0$. Then $FT(\phi) = T(F\phi) = 0$ for all $\phi \in \mathcal{G}$. But $F$ and $F^{-1}$ map $\mathcal{G}$ onto $\mathcal{G}$ because if $\psi \in \mathcal{G}$, then as shown above, $\psi = F\left(F^{-1}(\psi)\right)$. Therefore, $T = 0$ and so $F$ is one to one. Similarly $F^{-1}$ is one to one. Now
\[ F^{-1}(FT)(\psi) \equiv (FT)\left(F^{-1}\psi\right) \equiv T\left(F\left(F^{-1}(\psi)\right)\right) = T\psi. \]
Let
\[ \operatorname{sgn}f = \begin{cases} \dfrac{\overline{f}}{|f|} & \text{if } |f| \neq 0 \\ 0 & \text{if } |f| = 0 \end{cases} \]
13.2. FOURIER TRANSFORMS OF JUST ABOUT ANYTHING 313
By Lemma 13.2.5 f = 0.
The next theorem is the main result of this sort.
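Proof: First note that if $f \in L^p(\mathbb{R}^n)$ or has polynomial growth, then it makes sense to write the integral $\int f\psi\,dx$ described above. This is obvious in the case of polynomial growth. In the case where $f \in L^p(\mathbb{R}^n)$ it also makes sense because
\[ \int|f||\psi|\,dx \le \left(\int|f|^p\,dx\right)^{1/p}\left(\int|\psi|^{p'}\,dx\right)^{1/p'} < \infty \]
due to the fact mentioned above that all these functions in $\mathcal{G}$ are in $L^p(\mathbb{R}^n)$ for every $p \ge 1$. Suppose now that $f \in L^p$, $p \ge 1$. The case where $f \in L^1(\mathbb{R}^n)$ was dealt with in Corollary 13.2.6. Suppose $f \in L^p(\mathbb{R}^n)$ for $p > 1$. Then
\[ |f|^{p-2}\overline{f} \in L^{p'}(\mathbb{R}^n),\quad \left(p' = q,\ \frac{1}{p} + \frac{1}{q} = 1\right) \]
and by density of $\mathcal{G}$ in $L^{p'}(\mathbb{R}^n)$ (Theorem 12.7.4), there exists a sequence $\{g_k\} \subseteq \mathcal{G}$ such that
\[ \left\|g_k - |f|^{p-2}\overline{f}\right\|_{p'} \to 0. \]
Then
\[ \int_{\mathbb{R}^n}|f|^p\,dx = \int_{\mathbb{R}^n}f\left(|f|^{p-2}\overline{f} - g_k\right)dx + \int_{\mathbb{R}^n}fg_k\,dx \]
\[ = \int_{\mathbb{R}^n}f\left(|f|^{p-2}\overline{f} - g_k\right)dx \le \|f\|_{L^p}\left\|g_k - |f|^{p-2}\overline{f}\right\|_{p'} \]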
\[ 0 = \int f(x)e^{-|x|^2}\psi(x)\,dx \]
because $e^{-|x|^2}\psi(x) \in \mathcal{G}$. Therefore, by the first part, $f(x)e^{-|x|^2} = 0$ a.e.
Note that polynomial growth could be replaced with a condition of the form
\[ |f(x)| \le K\left(1 + |x|^2\right)^me^{k|x|^\alpha},\quad \alpha < 2 \]
and the same proof would yield that these functions are in $\mathcal{G}^*$. The main thing to observe is that almost all functions of interest are in $\mathcal{G}^*$.
Proof: Let $f$ have polynomial growth first. Then the above integral is clearly well defined and so in this case, $f \in \mathcal{G}^*$.
Next suppose $f \in L^p(\mathbb{R}^n)$ with $\infty > p \ge 1$. Then it is clear again that the above integral is well defined because of the fact that $\psi$ is a sum of polynomials times exponentials of the form $e^{-c|x|^2}$ and these are in $L^{p'}(\mathbb{R}^n)$. Also $\psi \to f(\psi)$ is
Since $\psi \in \mathcal{G}$ is arbitrary, it follows from Theorem 13.2.7 that $Ff(x)$ is given by the claimed formula. The case of $F^{-1}$ is identical.
Here are interesting properties of these Fourier transforms of functions in L1 .
\[ = (2\pi)^{-n/2}\int_{\mathbb{R}^n}|f_k(x) - f(x)|\,dx = (2\pi)^{-n/2}\|f - f_k\|_1 . \]
Now integrating by parts, it follows that for $\|t\|_\infty \equiv \max\{|t_j| : j = 1,\cdots,n\} > 0$
\[ |Ff(t)| \le \varepsilon/2 + (2\pi)^{-n/2}\frac{1}{\|t\|_\infty}\int_{\mathbb{R}^n}\sum_{j=1}^n\left|\frac{\partial g(x)}{\partial x_j}\right|dx \quad (13.2.3) \]
and this last expression converges to zero as $\|t\|_\infty \to \infty$. The reason for this is that if $t_j \neq 0$, integration by parts with respect to $x_j$ gives
\[ (2\pi)^{-n/2}\int_{\mathbb{R}^n}e^{-it\cdot x}g(x)\,dx = (2\pi)^{-n/2}\frac{1}{it_j}\int_{\mathbb{R}^n}e^{-it\cdot x}\frac{\partial g(x)}{\partial x_j}\,dx. \]
Therefore, choose the $j$ for which $\|t\|_\infty = |t_j|$ and the result of 13.2.3 holds. Therefore, from 13.2.3, if $\|t\|_\infty$ is large enough, $|Ff(t)| < \varepsilon$. Similarly, $\lim_{\|t\|\to\infty}F^{-1}f(t) = 0$. Consider the claim about uniform continuity. Let $\varepsilon > 0$ be given. Then there exists $R$ such that if $\|t\|_\infty > R$, then $|Ff(t)| < \frac{\varepsilon}{2}$. Since $Ff$ is continuous, it is uniformly continuous on the compact set $[-R-1,R+1]^n$. Therefore, there exists $\delta_1$ such that if $\|t - t'\|_\infty < \delta_1$ for $t, t' \in [-R-1,R+1]^n$, then
\[ |Ff(t) - Ff(t')| < \varepsilon/2. \quad (13.2.4) \]
Now let $0 < \delta < \min(\delta_1, 1)$ and suppose $\|t - t'\|_\infty < \delta$. If both $t, t'$ are contained in $[-R,R]^n$, then 13.2.4 holds. If $t \in [-R,R]^n$ and $t' \notin [-R,R]^n$, then both are contained in $[-R-1,R+1]^n$ and so this verifies 13.2.4 in this case. The other case is that neither point is in $[-R,R]^n$ and in this case,
\[ |Ff(t) - Ff(t')| \le |Ff(t)| + |Ff(t')| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
There is a very interesting relation between the Fourier transform and convolutions.
Theorem 13.2.11 Let $f, g \in L^1(\mathbb{R}^n)$. Then $f*g \in L^1$ and $F(f*g) = (2\pi)^{n/2}Ff\,Fg$.
Proof: Consider
\[ \int_{\mathbb{R}^n}\int_{\mathbb{R}^n}|f(x-y)g(y)|\,dy\,dx. \]
\[ = (2\pi)^{-n/2}\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}e^{-it\cdot y}g(y)e^{-it\cdot(x-y)}f(x-y)\,dx\,dy \]
\[ = (2\pi)^{n/2}Ff(t)Fg(t). \]
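The identity $F(f*g) = (2\pi)^{n/2}Ff\,Fg$ can be tested numerically in one dimension. This is a rough sketch under assumptions made here: the sample functions $f(x) = e^{-x^2}$ and $g(x) = e^{-2x^2}$, the truncation to $[-8,8]$, and all function names are illustrative choices.

```python
import cmath
import math

def f(x):
    return math.exp(-x * x)          # sample function, chosen for illustration

def g(x):
    return math.exp(-2.0 * x * x)    # second sample function

def midpoints(half_width, steps):
    """Midpoints of a uniform partition of [-half_width, half_width] and the step."""
    h = 2.0 * half_width / steps
    return [-half_width + (j + 0.5) * h for j in range(steps)], h

def transform(values, grid, h, t):
    """Midpoint-rule value of (2*pi)^(-1/2) * integral of values(x) exp(-i t x) dx."""
    total = sum(v * cmath.exp(-1j * t * x) for v, x in zip(values, grid))
    return total * h / math.sqrt(2.0 * math.pi)

def check_convolution_theorem(t, half_width=8.0, steps=400):
    """Return (F(f*g)(t), (2*pi)^(1/2) Ff(t) Fg(t)) computed by quadrature."""
    grid, h = midpoints(half_width, steps)
    f_vals = [f(x) for x in grid]
    g_vals = [g(y) for y in grid]
    # (f*g)(x) = integral of f(x - y) g(y) dy, evaluated at each grid point x
    conv = [sum(f(x - y) * gy for y, gy in zip(grid, g_vals)) * h for x in grid]
    lhs = transform(conv, grid, h, t)
    rhs = math.sqrt(2.0 * math.pi) * transform(f_vals, grid, h, t) * transform(g_vals, grid, h, t)
    return lhs, rhs

print(check_convolution_theorem(0.7))
```

For these two Gaussians both sides can also be computed by hand from formula 13.1.1, and both equal $\frac{\sqrt{\pi}}{2}e^{-3t^2/8}$.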
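Similarly,
\[ \int_{\mathbb{R}^n}\psi\left(F^{-1}\phi\right)dx = \int_{\mathbb{R}^n}\left(F^{-1}\psi\right)\phi\,dt. \quad (13.2.6) \]
Now, 13.2.5 - 13.2.6 imply
\[ \int_{\mathbb{R}^n}|\phi|^2\,dx = \int_{\mathbb{R}^n}\phi\overline{\phi}\,dx = \int_{\mathbb{R}^n}\phi\,\overline{F^{-1}(F\phi)}\,dx = \int_{\mathbb{R}^n}\phi\,F\left(\overline{F\phi}\right)dx \]
\[ = \int_{\mathbb{R}^n}F\phi\,\overline{F\phi}\,dx = \int_{\mathbb{R}^n}|F\phi|^2\,dx. \]
Similarly
\[ \|\phi\|_2 = \|F^{-1}\phi\|_2 . \]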
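Lemma 13.2.13 Let $f \in L^2(\mathbb{R}^n)$ and let $\phi_k \to f$ in $L^2(\mathbb{R}^n)$ where $\phi_k \in \mathcal{G}$. (Such a sequence exists because of density of $\mathcal{G}$ in $L^2(\mathbb{R}^n)$.) Then $Ff$ and $F^{-1}f$ are both in $L^2(\mathbb{R}^n)$ and the following limits take place in $L^2$.
\[ \lim_{k\to\infty}F(\phi_k) = F(f),\quad \lim_{k\to\infty}F^{-1}(\phi_k) = F^{-1}(f). \]
\[ \|Ff\|_2 = \lim_{k\to\infty}\|F\phi_k\|_2 = \lim_{k\to\infty}\|\phi_k\|_2 = \|f\|_2 . \]
Similarly,
\[ \|f\|_2 = \|F^{-1}f\|_2 . \]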
The following corollary is a simple generalization of this. To prove this corollary,
use the following simple lemma which comes as a consequence of the Cauchy Schwarz
inequality.
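Proof:
\[ \left|\int_{\mathbb{R}^n}f_kg_k\,dx - \int_{\mathbb{R}^n}fg\,dx\right| \le \left|\int_{\mathbb{R}^n}f_kg_k\,dx - \int_{\mathbb{R}^n}f_kg\,dx\right| + \left|\int_{\mathbb{R}^n}f_kg\,dx - \int_{\mathbb{R}^n}fg\,dx\right| \]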
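Proof: First note the above formula is obvious if $f, g \in \mathcal{G}$. To see this, note
\[ \int_{\mathbb{R}^n}Ff\,\overline{Fg}\,dx = \int_{\mathbb{R}^n}Ff(x)\overline{\frac{1}{(2\pi)^{n/2}}\int_{\mathbb{R}^n}e^{-ix\cdot t}g(t)\,dt}\,dx \]
\[ = \int_{\mathbb{R}^n}\frac{1}{(2\pi)^{n/2}}\int_{\mathbb{R}^n}e^{ix\cdot t}Ff(x)\,dx\,\overline{g(t)}\,dt = \int_{\mathbb{R}^n}\left(F^{-1}\circ F\right)f(t)\overline{g(t)}\,dt \]
\[ = \int_{\mathbb{R}^n}f(t)\overline{g(t)}\,dt. \]
Since this holds for all $\psi \in \mathcal{G}$, a dense subset of $L^2(\mathbb{R}^n)$, it follows that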
\[ Ff_r(y) = (2\pi)^{-\frac{n}{2}}\int_{\mathbb{R}^n}f_r(x)e^{-ix\cdot y}\,dx. \]
Similarly
\[ F^{-1}f_r(y) = (2\pi)^{-\frac{n}{2}}\int_{\mathbb{R}^n}f_r(x)e^{ix\cdot y}\,dx. \]
This shows that to take the Fourier transform of a function in $L^2(\mathbb{R}^n)$, it suffices to take the limit as $r \to \infty$ in $L^2(\mathbb{R}^n)$ of $(2\pi)^{-\frac{n}{2}}\int_{\mathbb{R}^n}f_r(x)e^{-ix\cdot y}\,dx$. A similar procedure works for the inverse Fourier transform.
Note this reduces to the earlier definition in case $f \in L^1(\mathbb{R}^n)$. Now consider the convolution of a function in $L^2$ with one in $L^1$.
\[ F^{-1}(h*f) = (2\pi)^{n/2}F^{-1}h\,F^{-1}f, \]
\[ F(h*f) = (2\pi)^{n/2}Fh\,Ff, \]
and
\[ \|h*f\|_2 \le \|h\|_2\|f\|_1 . \quad (13.2.10) \]
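Hence $\int|h(x-y)||f(y)|\,dy < \infty$ a.e. $x$ and
\[ x \to \int h(x-y)f(y)\,dy \]
and letting $\phi \in \mathcal{G}$,
\[ \int F(h_r*f)(\phi)\,dx \equiv \int(h_r*f)(F\phi)\,dx \]
\[ = (2\pi)^{-n/2}\int\int\int h_r(x-y)f(y)e^{-ix\cdot t}\phi(t)\,dt\,dy\,dx \]
\[ = (2\pi)^{-n/2}\int\int\left(\int h_r(x-y)e^{-i(x-y)\cdot t}\,dx\right)f(y)e^{-iy\cdot t}\,dy\,\phi(t)\,dt \]
\[ = (2\pi)^{n/2}\int Fh_r(t)Ff(t)\phi(t)\,dt. \]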
notation.
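Definition 13.2.20 $f \in \mathcal{S}$, the Schwartz class, if $f \in C^\infty(\mathbb{R}^n)$ and for all positive integers $N$,
\[ \rho_N(f) < \infty \]
where
\[ \rho_N(f) = \sup\{(1 + |x|^2)^N|D^\alpha f(x)| : x \in \mathbb{R}^n, |\alpha| \le N\}. \]
Thus $f \in \mathcal{S}$ if and only if $f \in C^\infty(\mathbb{R}^n)$ and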
Also note that if $f \in \mathcal{S}$, then $p(f) \in \mathcal{S}$ for any polynomial, $p$ with $p(0) = 0$ and that
\[ \mathcal{S} \subseteq L^p(\mathbb{R}^n)\cap L^\infty(\mathbb{R}^n) \]
for any $p \ge 1$. To see this assertion about the $p(f)$, it suffices to consider the case of the product of two elements of the Schwartz class. If $f, g \in \mathcal{S}$, then $D^\alpha(fg)$ is a finite sum of derivatives of $f$ times derivatives of $g$. Therefore, $\rho_N(fg) < \infty$ for all $N$. You may wonder about examples of things in $\mathcal{S}$. Clearly any function in $C_c^\infty(\mathbb{R}^n)$ is in $\mathcal{S}$. However there are other functions in $\mathcal{S}$. For example $e^{-|x|^2}$ is in $\mathcal{S}$ as you can verify for yourself and so is any function from $\mathcal{G}$. Note also that the density of $C_c^\infty(\mathbb{R}^n)$ in $L^p(\mathbb{R}^n)$ shows that $\mathcal{S}$ is dense in $L^p(\mathbb{R}^n)$ for every $p$.
Recall the Fourier transform of a function in $L^1(\mathbb{R}^n)$ is given by
\[ Ff(t) \equiv (2\pi)^{-n/2}\int_{\mathbb{R}^n}e^{-it\cdot x}f(x)\,dx. \]
Therefore, this gives the Fourier transform for $f \in \mathcal{S}$. The nice property which $\mathcal{S}$ has in common with $\mathcal{G}$ is that the Fourier transform and its inverse map $\mathcal{S}$ one to one onto $\mathcal{S}$. This means I could have presented the whole of the above theory in terms of $\mathcal{S}$ rather than in terms of $\mathcal{G}$. However, it is more technical.
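\[ = i(2\pi)^{-n/2}\int_{\mathbb{R}^n}e^{it\cdot x}x^{e_j}f(x)\,dx. \]
Now $x^{e_j}f(x) \in \mathcal{S}$ and so one can continue in this way and take derivatives indefinitely. Thus $F^{-1}f \in C^\infty(\mathbb{R}^n)$ and from the above argument,
\[ D^\alpha F^{-1}f(t) = (2\pi)^{-n/2}\int_{\mathbb{R}^n}e^{it\cdot x}(ix)^\alpha f(x)\,dx. \]
To complete showing $F^{-1}f \in \mathcal{S}$,
\[ t^\beta D^\alpha F^{-1}f(t) = (2\pi)^{-n/2}\int_{\mathbb{R}^n}e^{it\cdot x}t^\beta(ix)^\alpha f(x)\,dx. \]
where the boundary term vanishes because $f \in \mathcal{S}$. Returning to 13.2.14, use the fact that $|e^{ia}| = 1$ to conclude
\[ |t^\beta D^\alpha F^{-1}f(t)| \le C\int_{\mathbb{R}^n}|D^\beta((ix)^\alpha f(x))|\,dx < \infty. \]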
Proof: The first claim follows from the fact that $F$ and $F^{-1}$ are inverses of each other on $\mathcal{G}^*$ which was established above. For the second, let $\psi \in \mathcal{S}$. Then $\psi = F\left(F^{-1}\psi\right)$. Thus $F$ maps $\mathcal{S}$ onto $\mathcal{S}$. If $F\psi = 0$, then do $F^{-1}$ to both sides to conclude $\psi = 0$. Thus $F$ is one to one and onto. Similarly, $F^{-1}$ is one to one and onto.
13.2.5 Convolution
To begin with it is necessary to discuss the meaning of $\phi f$ where $f \in \mathcal{G}^*$ and $\phi \in \mathcal{G}$. What should it mean? First suppose $f \in L^p(\mathbb{R}^n)$ or measurable with polynomial growth. Then $\phi f$ also has these properties. Hence, it should be the case that $\phi f(\psi) = \int_{\mathbb{R}^n}\phi f\psi\,dx = \int_{\mathbb{R}^n}f(\phi\psi)\,dx$. This motivates the following definition.
\[ F(f*\phi) = (2\pi)^{n/2}F\phi\,Ff,\quad F^{-1}(f*\phi) = (2\pi)^{n/2}F^{-1}\phi\,F^{-1}f \]
\[ f*\phi \equiv (2\pi)^{n/2}F^{-1}(F\phi\,Ff) \in \mathcal{G}^* \]
\[ F^{-1}(f*\phi) = (2\pi)^{n/2}F^{-1}\phi\,F^{-1}f. \quad (13.2.16) \]
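Proof: Note that 13.2.15 follows from Definition 13.2.24 and both assertions hold for $f \in \mathcal{G}$. Consider 13.2.16. Here is a simple formula involving a pair of functions in $\mathcal{G}$.
\[ \left(\psi*F^{-1}F^{-1}\phi\right)(x) \]
\[ = \int\int\int\psi(x-y)e^{iy\cdot y_1}e^{iy_1\cdot z}\phi(z)\,dz\,dy_1\,dy\,(2\pi)^{-n} \]
\[ = \int\int\int\psi(x-y)e^{-iy\cdot\tilde{y}_1}e^{-i\tilde{y}_1\cdot z}\phi(z)\,dz\,d\tilde{y}_1\,dy\,(2\pi)^{-n} \]
\[ = (\psi*FF\phi)(x). \]
Now for $\psi \in \mathcal{G}$,
\[ (2\pi)^{n/2}F\left(F^{-1}\phi\,F^{-1}f\right)(\psi) \equiv (2\pi)^{n/2}\left(F^{-1}\phi\,F^{-1}f\right)(F\psi) \equiv \]
\[ (2\pi)^{n/2}F^{-1}f\left(F^{-1}\phi\,F\psi\right) \equiv (2\pi)^{n/2}f\left(F^{-1}\left(F^{-1}\phi\,F\psi\right)\right) = \]
\[ f\left((2\pi)^{n/2}F^{-1}\left(\left(FF^{-1}F^{-1}\phi\right)(F\psi)\right)\right) \equiv \]
\[ f\left(\psi*F^{-1}F^{-1}\phi\right) = f(\psi*FF\phi) \quad (13.2.17) \]
Also
\[ (2\pi)^{n/2}F^{-1}(F\phi\,Ff)(\psi) \equiv (2\pi)^{n/2}(F\phi\,Ff)\left(F^{-1}\psi\right) \equiv \]
\[ (2\pi)^{n/2}Ff\left(F\phi\,F^{-1}\psi\right) \equiv (2\pi)^{n/2}f\left(F\left(F\phi\,F^{-1}\psi\right)\right) = \]
\[ = f\left(F\left((2\pi)^{n/2}\left(F\phi\,F^{-1}\psi\right)\right)\right) \]
\[ = f\left(F\left((2\pi)^{n/2}\left(F^{-1}(FF\phi)\left(F^{-1}\psi\right)\right)\right)\right) = f\left(F\left(F^{-1}(FF\phi*\psi)\right)\right) \]
\[ f(FF\phi*\psi) = f(\psi*FF\phi). \quad (13.2.18) \]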
The last line follows from the following.
F F (x y) (y) dy = F (x y) F (y) dy
= F (x y) F (y) dy
= (x y) F F (y) dy.
13.3 Exercises
1. For $f \in L^1(\mathbb{R}^n)$, show that if $F^{-1}f \in L^1$ or $Ff \in L^1$, then $f$ equals a continuous bounded function a.e.
\[ \lim_{r\to0+}g(x+r) \equiv g(x+) \]
exists,
\[ \lim_{r\to0+}g(x-r) \equiv g(x-) \]
exists and there exist constants $K, \delta > 0$ and $r \in (0,1]$ such that for $|x-y| < \delta$,
\[ |g(x+) - g(y)| < K|x-y|^r \]
for $y > x$ and
\[ |g(x-) - g(y)| < K|x-y|^r \]
for $y < x$. Show that under these conditions,
\[ \lim_{r\to\infty}\frac{2}{\pi}\int_0^\infty\frac{\sin(ur)}{u}\left(\frac{g(x-u) + g(x+u)}{2}\right)du = \frac{g(x+) + g(x-)}{2}. \]
7. Let $g \in L^1(\mathbb{R})$ and suppose $g$ is locally Holder continuous from the right and from the left at $x$. Show that then
\[ \lim_{R\to\infty}\frac{1}{2\pi}\int_{-R}^Re^{ixt}\int_{-\infty}^\infty e^{-ity}g(y)\,dy\,dt = \frac{g(x+) + g(x-)}{2}. \]
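Assume that $g$ has exponential growth as above and is Holder continuous from the right and from the left at $t$. Pick $\gamma > \eta$. Show that
\[ \lim_{R\to\infty}\frac{1}{2\pi}\int_{-R}^Re^{\gamma t}e^{iyt}Lg(\gamma + iy)\,dy = \frac{g(t+) + g(t-)}{2}. \]
and is called the complex inversion integral for Laplace transforms. It can be used to find inverse Laplace transforms. Hint:
\[ \frac{1}{2\pi}\int_{-R}^Re^{\gamma t}e^{iyt}Lg(\gamma + iy)\,dy = \frac{1}{2\pi}\int_{-R}^Re^{\gamma t}e^{iyt}\int_0^\infty e^{-(\gamma+iy)u}g(u)\,du\,dy. \]
Now use Fubini's theorem and do the integral from $-R$ to $R$ to get this equal to
\[ \frac{e^{\gamma t}}{\pi}\int_{-\infty}^\infty e^{-\gamma u}\overline{g}(u)\frac{\sin(R(t-u))}{t-u}\,du \]
where $\overline{g}$ is the zero extension of $g$ off $[0,\infty)$. Then this equals
\[ \frac{e^{\gamma t}}{\pi}\int_{-\infty}^\infty e^{-\gamma(t-u)}\overline{g}(t-u)\frac{\sin(Ru)}{u}\,du \]
which equals
\[ \frac{2e^{\gamma t}}{\pi}\int_0^\infty\frac{\overline{g}(t-u)e^{-\gamma(t-u)} + \overline{g}(t+u)e^{-\gamma(t+u)}}{2}\,\frac{\sin(Ru)}{u}\,du \]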
Show both $\|\cdot\|_{k,2}$ and $|||\cdot|||_{k,2}$ are norms on $\mathcal{G}$ and that they are equivalent. These are Sobolev space norms. For which values of $k$ does the second norm make sense? How about the first norm?
\[ |||f|||_{k,2} \equiv \left(\int|Ff(x)|^2(1 + |x|^2)^k\,dx\right)^{\frac{1}{2}}. \]
Then show $\mu$ is a Borel measure which is inner and outer regular and show there exists $\{g_m\}$ such that $g_m \in \mathcal{G}$ and $g_m \to Ff$ in $L^2(\mu)$. Thus $g_m = Ff_m$, $f_m \in \mathcal{G}$ because $F$ maps $\mathcal{G}$ onto $\mathcal{G}$. Then by Problem 10, $\{f_m\}$ is Cauchy in the norm $\|\cdot\|_{k,2}$.
So
\[ \int|Ff(x)|\,dx = \int|Ff(x)|(1 + |x|^2)^{\frac{k}{2}}(1 + |x|^2)^{-\frac{k}{2}}\,dx. \]
13. Let $u \in \mathcal{G}$. Then $Fu \in \mathcal{G}$ and so, in particular, it makes sense to form the integral,
\[ \int_{\mathbb{R}}Fu(x', x_n)\,dx_n \]
constant such that $F(\gamma u)(x')$ equals this constant times the above integral. Hint: By the dominated convergence theorem
\[ \int_{\mathbb{R}}Fu(x', x_n)\,dx_n = \lim_{\varepsilon\to0}\int_{\mathbb{R}}e^{-(\varepsilon x_n)^2}Fu(x', x_n)\,dx_n. \]
Now use the definition of the Fourier transform and Fubini's theorem as required in order to obtain the desired relationship.
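14. Let $h(x) = \left(\int_0^xe^{-t^2}\,dt\right)^2 + \left(\int_0^1\frac{e^{-x^2(1+t^2)}}{1+t^2}\,dt\right)$. Show that $h'(x) = 0$ and $h(0) = \pi/4$. Then let $x \to \infty$ to conclude that $\int_0^\infty e^{-t^2}\,dt = \sqrt{\pi}/2$. Show that $\int_{-\infty}^\infty e^{-t^2}\,dt = \sqrt{\pi}$ and that $\int_{-\infty}^\infty e^{-ct^2}\,dt = \sqrt{\frac{\pi}{c}}$.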
17. For $f \in L^1(\mathbb{R}^n)$ and $c$ a nonzero real number, show $Ff(ct) = Fg(t)$ where $g(x) = f\left(\frac{x}{c}\right)$.
18. Suppose that $f \in L^1(\mathbb{R})$ and that $\int|x||f(x)|\,dx < \infty$. Find a way to use the Fourier transform of $f$ to compute $\int xf(x)\,dx$.
Fourier Series
Obviously such a sequence of partial sums may or may not converge at a particular
value of x.
These series have been important in applied math since the time of Fourier who was an officer in Napoleon's army. He was interested in studying the flow of heat in cannons and invented the concept to aid him in his study. Since that time, Fourier series and the mathematical problems related to their convergence have motivated the development of modern methods in analysis. As recently as the mid 1960's a problem related to convergence of Fourier series was solved for the first time and the solution of this problem was a big surprise.¹ This chapter is on the classical theory of convergence of Fourier series.
If you can approximate a function $f$ with an expression of the form
\[ \sum_{k=-\infty}^\infty c_ke^{ikx} \]
then the function must have the property $f(x + 2\pi) = f(x)$ because this is true of every term in the above series. More generally, here is a definition.
¹ The question was whether the Fourier series of a function in $L^2$ converged a.e. to the function. It turned out that it did, to the surprise of many because it was known that the Fourier series of a function in $L^1$ does not necessarily converge to the function a.e. The problem was solved by Carleson in 1965.
As just explained, Fourier series are useful for representing periodic functions and no other kind of function. There is no loss of generality in studying only functions which are periodic of period $2\pi$. Indeed, if $f$ is a function which has period $T$, you can study this function in terms of the function, $g(x) \equiv f\left(\frac{Tx}{2\pi}\right)$ where $g$ is periodic of period $2\pi$.
Definition 14.1.2 For $f \in L^1([-\pi,\pi])$ ($f$ measurable and $\int_{-\pi}^\pi|f(t)|\,dt < \infty$) and $f$ periodic on $\mathbb{R}$, define the Fourier series of $f$ as
\[ \sum_{k=-\infty}^\infty c_ke^{ikx}, \quad (14.1.1) \]
where
\[ c_k \equiv \frac{1}{2\pi}\int_{-\pi}^\pi f(y)e^{-iky}\,dy. \quad (14.1.2) \]
The $n$th partial sum of the Fourier series is
\[ S_n(f)(x) \equiv \sum_{k=-n}^nc_ke^{ikx}. \quad (14.1.3) \]
It may be interesting to see where this formula came from. Suppose then that
\[ f(x) = \sum_{k=-\infty}^\infty c_ke^{ikx}, \]
multiply both sides by $e^{-imx}$ and take the integral $\int_{-\pi}^\pi$, so that
\[ \int_{-\pi}^\pi f(x)e^{-imx}\,dx = \int_{-\pi}^\pi\sum_{k=-\infty}^\infty c_ke^{ikx}e^{-imx}\,dx. \]
Now switch the sum and the integral on the right side even though there is absolutely no reason to believe this makes any sense. Then
\[ \int_{-\pi}^\pi f(x)e^{-imx}\,dx = \sum_{k=-\infty}^\infty c_k\int_{-\pi}^\pi e^{ikx}e^{-imx}\,dx = c_m\int_{-\pi}^\pi1\,dx = 2\pi c_m \]
because $\int_{-\pi}^\pi e^{ikx}e^{-imx}\,dx = 0$ if $k \neq m$. It is formal manipulations of the sort just presented which suggest that Definition 14.1.2 might be interesting.
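For a finite trigonometric sum the formal manipulation is legitimate, and the orthogonality relation together with formula 14.1.2 can be checked directly. The sketch below uses a midpoint rule and a made-up sample function; the names `integrate`, `inner`, `fourier_coefficient`, and `sample` are illustrative only.

```python
import cmath
import math

def integrate(func, a, b, steps=2000):
    """Midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (j + 0.5) * h) for j in range(steps)) * h

def inner(k, m):
    """Integral over [-pi, pi] of exp(i k x) exp(-i m x)."""
    return integrate(lambda x: cmath.exp(1j * k * x) * cmath.exp(-1j * m * x),
                     -math.pi, math.pi)

def fourier_coefficient(f, k):
    """c_k = (1 / 2 pi) * integral over [-pi, pi] of f(y) exp(-i k y), as in 14.1.2."""
    return integrate(lambda y: f(y) * cmath.exp(-1j * k * y),
                     -math.pi, math.pi) / (2.0 * math.pi)

def sample(x):
    """Hypothetical trigonometric sum with c_0 = 3, c_2 = 1 - 2i, c_{-1} = 0.5."""
    return 3 + (1 - 2j) * cmath.exp(2j * x) + 0.5 * cmath.exp(-1j * x)

print(inner(2, 2), inner(2, 5))
print([fourier_coefficient(sample, k) for k in (-1, 0, 1, 2)])
```

The first line printed should be close to $2\pi$ and $0$, and the second should recover the coefficients built into `sample`.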
14.1. DEFINITION AND BASIC PROPERTIES 333
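Letting $c_k \equiv \alpha_k + i\beta_k$
\[ S_nf(x) = \frac{1}{2\pi}\int_{-\pi}^\pi f(y)\,dy + 2\sum_{k=1}^n[\alpha_k\cos kx - \beta_k\sin kx] \]
where
\[ c_k = \frac{1}{2\pi}\int_{-\pi}^\pi f(y)e^{-iky}\,dy = \frac{1}{2\pi}\int_{-\pi}^\pi f(y)(\cos ky - i\sin ky)\,dy \]
which shows that
\[ \alpha_k = \frac{1}{2\pi}\int_{-\pi}^\pi f(y)\cos(ky)\,dy,\quad \beta_k = \frac{-1}{2\pi}\int_{-\pi}^\pi f(y)\sin(ky)\,dy \]
The function,
\[ D_n(t) \equiv \frac{1}{2\pi}\sum_{k=-n}^ne^{ikt} \]
is called the Dirichlet Kernel.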
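Therefore,
\[ 2\pi D_n(t)\sin\left(\frac{t}{2}\right) = \sin\left(\frac{t}{2}\right) + 2\sum_{k=1}^n\sin\left(\frac{t}{2}\right)\cos(kt) \]
\[ = \sin\left(\frac{t}{2}\right) + \sum_{k=1}^n\left[\sin\left(\left(k + \frac{1}{2}\right)t\right) - \sin\left(\left(k - \frac{1}{2}\right)t\right)\right] \]
\[ = \sin\left(\left(n + \frac{1}{2}\right)t\right) \]
where the easily verified trig. identity $\cos(a)\sin(b) = \frac{1}{2}(\sin(a+b) - \sin(a-b))$ is used to get to the second line.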
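The telescoping computation gives $D_n(t) = \frac{\sin((n+\frac{1}{2})t)}{2\pi\sin(t/2)}$ whenever $\sin(t/2) \neq 0$. A quick numerical comparison of the defining sum with this closed form is sketched below; the function names are chosen here for illustration.

```python
import math

def dirichlet_sum(n, t):
    """D_n(t) from the definition: (1/2pi) * sum of exp(ikt) for k = -n..n.

    The terms for k and -k combine to 2 cos(kt), so the sum is real.
    """
    return (1.0 + 2.0 * sum(math.cos(k * t) for k in range(1, n + 1))) / (2.0 * math.pi)

def dirichlet_closed(n, t):
    """Closed form sin((n + 1/2) t) / (2 pi sin(t / 2)), valid when sin(t/2) != 0."""
    return math.sin((n + 0.5) * t) / (2.0 * math.pi * math.sin(t / 2.0))

for n in (1, 2, 3):
    for t in (0.3, 1.0, 2.5):
        print(n, t, dirichlet_sum(n, t), dirichlet_closed(n, t))
```

At $t = 0$ the defining sum gives $D_n(0) = \frac{2n+1}{2\pi}$, which is the height of the central bump.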
Here is a picture of the Dirichlet kernels for n = 1, 2, and 3
[Figure: graphs of the Dirichlet kernels $y = D_n(x)$ for $n = 1, 2, 3$, plotted for $x$ from $-3$ to $3$.]
Note they are not nonnegative but there is a large central positive bump which
gets larger as n gets larger.
It is not reasonable to expect a Fourier series to converge to the function at every point. To see this, change the value of the function at a single point in $(-\pi,\pi)$ and extend to keep the modified function periodic. Then the Fourier series of the modified function is the same as the Fourier series of the original function and so if pointwise convergence did take place, it no longer does. However, it is possible to prove an interesting theorem about pointwise convergence of Fourier series. This is done next.
14.2. THE RIEMANN LEBESGUE LEMMA 335
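Therefore,
\[ S_nf(x) - \frac{f(x+) + f(x-)}{2} = \frac{1}{\pi}\int_0^\pi\frac{\sin\left(\left(n + \frac{1}{2}\right)y\right)}{\sin\left(\frac{y}{2}\right)}\left[\frac{f(x-y) - f(x-) + f(x+y) - f(x+)}{2}\right]dy. \quad (14.3.10) \]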
14.3. DINIS CRITERION FOR CONVERGENCE 337
The function $y \to \frac{y}{2\sin\left(\frac{y}{2}\right)}$ is bounded on $(0,\pi)$ so the result is in $L^1([0,\pi])$.
The following corollary is obtained immediately from the above proof with minor modifications.
The following corollary gives an easy to check condition for the Fourier series to converge to the mid point of the jump.
\[ \lim_{n\to\infty}S_nf(x) = \frac{f(x+) + f(x-)}{2}. \quad (14.3.15) \]
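Proof: The condition 14.3.14 clearly implies Dini's condition, 14.3.7. This is because for $0 < y < \delta$
\[ \left|\frac{f(x-y) - f(x-) + f(x+y) - f(x+)}{y}\right| \le 2Ky^{\theta-1} \]
and so
\[ \int_\varepsilon^\pi\left|\frac{f(x-y) - f(x-) + f(x+y) - f(x+)}{y}\right|dy \]
\[ \le \int_\varepsilon^\delta2Ky^{\theta-1}\,dy + \frac{1}{\delta}\int_\delta^\pi|f(x-y) - f(x-) + f(x+y) - f(x+)|\,dy. \]
Now
\[ \int_\varepsilon^\delta2Ky^{\theta-1}\,dy = 2K\frac{\delta^\theta - \varepsilon^\theta}{\theta} \]
which converges to $2K\frac{\delta^\theta}{\theta}$ as $\varepsilon \to 0$. Thus
\[ \lim_{\varepsilon\to0+}\int_\varepsilon^\pi\left|\frac{f(x-y) - f(x-) + f(x+y) - f(x+)}{y}\right|dy \]
exists and so, from the monotone convergence theorem the function
\[ y \to \frac{f(x-y) - f(x-) + f(x+y) - f(x+)}{y} \]
is in $L^1([0,\pi])$. This is the Dini condition.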
As pointed out by Apostol [3], where you can read these theorems presented in the context of the Riemann integral, this is a very surprising result because even though the Fourier coefficients depend on the values of the function on all of $[-\pi,\pi]$, the convergence properties depend in this theorem on very local behavior of the function.
where $G(x) \equiv G(a)$ for all $x < a$. Thus $G_h(a) = G(a)$. Also, from the fundamental theorem of calculus, $G_h'(t) \ge 0$ and is a continuous function of $t$. Also it is clear that $\lim_{h\to0}G_h(t) = G(t-)$ for all $t \in [a,b]$. Letting $F(t) \equiv \int_a^tf(s)\,ds$,
\[ \int_a^bG_h(s)f(s)\,ds = F(t)G_h(t)|_a^b - \int_a^bF(t)G_h'(t)\,dt. \quad (14.4.17) \]
Now letting $m = \min\{F(t) : t \in [a,b]\}$ and $M = \max\{F(t) : t \in [a,b]\}$, since $G_h'(t) \ge 0$,
\[ m\int_a^bG_h'(t)\,dt \le \int_a^bF(t)G_h'(t)\,dt \le M\int_a^bG_h'(t)\,dt. \]
Therefore, if $\int_a^bG_h'(t)\,dt \neq 0$,
\[ m \le \frac{\int_a^bF(t)G_h'(t)\,dt}{\int_a^bG_h'(t)\,dt} \le M \]
and so, by the intermediate value theorem applied to the continuous function $F$,
\[ \int_a^bF(t)G_h'(t)\,dt = F(t_h)\int_a^bG_h'(t)\,dt \]
for some $t_h \in [a,b]$. This is true even if $\int_a^bG_h'(t)\,dt = 0$ because in this case, the left side equals 0. Since $G_h' \ge 0$ and is continuous, it must equal 0 and so the right side is also 0. Therefore, substituting for
\[ \int_a^bF(t)G_h'(t)\,dt \]
in 14.4.17,
\[ \int_a^bG_h(s)f(s)\,ds = \left[F(t)G_h(t)|_a^b\right] - F(t_h)\int_a^bG_h'(t)\,dt \]
The above lemma will be used in the following lemma from Apostol [3].
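Lemma 14.4.3 Let $G$ be increasing. Then for $\delta > 0$,
\[ \lim_{\mu\to\infty}\int_0^\delta G(y)\frac{\sin(\mu y)}{y}\,dy = \frac{\pi}{2}G(0+) \]
Proof: Let $0 < h < \delta$ then $\int_0^\delta G(y)\frac{\sin(\mu y)}{y}\,dy =$
\[ \int_0^h(G(y) - G(0+))\frac{\sin(\mu y)}{y}\,dy + G(0+)\int_0^h\frac{\sin(\mu y)}{y}\,dy + \int_h^\delta G(y)\frac{\sin(\mu y)}{y}\,dy \]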
From the mean value theorem above, the rst integral equals
h
sin (y)
(G (h) G (0+)) dy
0 y
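This integral converges as $\mu \to \infty$ to $\int_0^\infty\frac{\sin y}{y}\,dy = \frac{\pi}{2}$. Just change the variable and use Problem 6 on Page 233. See also Problem 19 on Page 358 below. Therefore, if $h$ is chosen small enough, the first term is bounded by $\varepsilon/3$. Fix such an $h$. Then as $\mu \to \infty$ the second term converges to $\frac{\pi}{2}G(0+)$. The last term converges to 0 by the Riemann Lebesgue lemma. Therefore, fixing $h$ as described,
\[ \left|\int_0^\delta G(y)\frac{\sin(\mu y)}{y}\,dy - \frac{\pi}{2}G(0+)\right| \le \left|\int_0^h(G(y) - G(0+))\frac{\sin(\mu y)}{y}\,dy\right| \]
\[ + \left|G(0+)\int_0^h\frac{\sin(\mu y)}{y}\,dy - \frac{\pi}{2}G(0+)\right| + \left|\int_h^\delta G(y)\frac{\sin(\mu y)}{y}\,dy\right| \]
\[ < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} \]
provided $\mu$ is large enough.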
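The limit in Lemma 14.4.3 can be observed numerically. The sketch below assumes a particular increasing function, $G(y) = 1 + y$ with $G(0+) = 1$, takes $\delta = 1$, and writes the large parameter as `mu`; these choices and the midpoint rule are illustrative assumptions, not part of the lemma.

```python
import math

def G(y):
    """Sample increasing function with G(0+) = 1 (an illustrative choice)."""
    return 1.0 + y

def dirichlet_integral(mu, delta=1.0, steps=20000):
    """Midpoint-rule value of the integral over [0, delta] of G(y) sin(mu y) / y."""
    h = delta / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h            # midpoints avoid the removable point y = 0
        total += G(y) * math.sin(mu * y) / y
    return total * h

for mu in (10.0, 50.0, 200.0):
    print(mu, dirichlet_integral(mu), math.pi / 2.0 * G(0.0))
```

The printed values should approach $\frac{\pi}{2}$ as `mu` grows, with an error of the order $1/\mu$.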
Definition 14.4.4 Let $f : [a,b] \to \mathbb{C}$ be a function. Then $f$ is of bounded variation if
\[ \sup\left\{\sum_{i=1}^n|f(t_i) - f(t_{i-1})| : a = t_0 < \cdots < t_n = b\right\} \equiv V(f,[a,b]) < \infty \]
where the sums are taken over all possible lists, $\{a = t_0 < \cdots < t_n = b\}$. The symbol, $V(f,[a,b])$ is known as the total variation on $[a,b]$.
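Each sum in the definition is a lower bound for $V(f,[a,b])$, and such sums are easy to compute for a given list of points. The sketch below evaluates them for two sample functions, $\sin$ on $[0,2\pi]$ (total variation $4$) and $t \to e^{it}$ on $[0,2\pi]$ (total variation $2\pi$); the helper names and the use of uniform partitions are choices made here for illustration.

```python
import cmath
import math

def variation_sum(f, partition):
    """Sum of |f(t_i) - f(t_{i-1})| over consecutive points of the partition."""
    return sum(abs(f(partition[i]) - f(partition[i - 1]))
               for i in range(1, len(partition)))

def uniform_partition(a, b, n):
    """The list a = t_0 < t_1 < ... < t_n = b with equal spacing."""
    return [a + (b - a) * i / n for i in range(n + 1)]

two_pi = 2.0 * math.pi
for n in (3, 4, 1000):
    print(n, variation_sum(math.sin, uniform_partition(0.0, two_pi, n)))

# A complex valued example: the sums are perimeters of inscribed polygons.
circle = lambda t: cmath.exp(1j * t)
for n in (6, 100):
    print(n, variation_sum(circle, uniform_partition(0.0, two_pi, n)), two_pi)
```

No single partition sum exceeds the total variation, so the printed values stay below $4$ and $2\pi$ respectively, up to rounding.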
14.4. JORDANS CRITERION 341
\[ |f(t_i) - f(t_{i-1})| \ge \max\left(|\operatorname{Re}f(t_i) - \operatorname{Re}f(t_{i-1})|, |\operatorname{Im}f(t_i) - \operatorname{Im}f(t_{i-1})|\right) \]
and
\[ \lim_{n\to\infty}S_nf(x) = \frac{f(x+) + f(x-)}{2}. \quad (14.4.18) \]
Proof: First note that from Definition 14.4.4, $\lim_{y\to x-}\operatorname{Re}f(y)$ exists because $\operatorname{Re}f$ is the difference of two increasing functions. Similarly this limit will exist for $\operatorname{Im}f$ by the same reasoning, and limits of the form $\lim_{y\to x+}$ will also exist. Then
\[ S_nf(x) - \frac{f(x+) + f(x-)}{2} = \int_{-\pi}^\pi D_n(y)\left(f(x-y) - \frac{f(x+) + f(x-)}{2}\right)dy \]
\[ = \int_0^\pi D_n(y)[(f(x+y) - f(x+)) + (f(x-y) - f(x-))]\,dy. \]
Now the Dirichlet kernel, $D_n(y)$ is a constant multiple of
\[ \sin((n + 1/2)y)/\sin(y/2) \]
and so the Riemann Lebesgue lemma implies
\[ \lim_{n\to\infty}\int_\delta^\pi D_n(y)[(f(x+y) - f(x+)) + (f(x-y) - f(x-))]\,dy = 0. \]
by Corollary 14.4.6.
It is known that neither the Jordan criterion nor the Dini criterion implies the
other. See Problem 21.
It follows that
1 x
( ) n x
F (x) = ak eikx etk = a0 dt + lim ak eikt dt
ik n
k=0 k=n,k=0
n k x
(1) i
= lim eikt dt
n k
k=n,k=0
( )
n
(1) i
k
sin xk cos xk + (1)
k
= lim +i
n k k k
k=n,k=0
and so
\[ \frac{\pi^2}{8} = \sum_{k=1}^\infty\frac{1}{(2k-1)^2} \]
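The value $\frac{\pi^2}{8}$ for the sum of reciprocals of odd squares is easy to test with partial sums. This is only a sanity check; the function name is a choice made here.

```python
import math

def odd_square_partial_sum(n):
    """Sum of 1 / (2k - 1)^2 for k = 1, ..., n."""
    return sum(1.0 / (2 * k - 1) ** 2 for k in range(1, n + 1))

for n in (10, 1000, 100000):
    print(n, odd_square_partial_sum(n), math.pi ** 2 / 8.0)
```

The tail after $n$ terms is roughly $\frac{1}{4n}$, so the agreement improves slowly as $n$ grows.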
which fails to converge anywhere because the $k^{th}$ term fails to converge to 0. This is in spite of the fact that $f$ has a derivative away from 0.
However, it is possible to prove some theorems which let you differentiate a Fourier series term by term. Here is one such theorem.
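it follows the Fourier series of $f'$ is
\[ \sum_{k=-\infty}^\infty a_kike^{ikx}. \]
If $k = 0$, $b_0 = \frac{1}{2\pi}(f(\pi) - f(-\pi)) = 0$ by periodicity of $f$. For $k \neq 0$,
\[ b_k = \frac{1}{2\pi}\int_{-\pi}^\pi f'(t)\left(\int_{-\pi}^t-ike^{-iks}\,ds + (-1)^k\right)dt \]
Since $\int_{-\pi}^\pi f'(t)\,dt = 0$, this equals
\[ \frac{1}{2\pi}\int_{-\pi}^\pi f'(t)\left(\int_{-\pi}^t-ike^{-iks}\,ds\right)dt \]
as claimed.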
Note the conclusion of this theorem is only about the Fourier series of $f'$. It does not say the Fourier series of $f'$ converges pointwise to $f'$. However, if $f'$ satisfies a Dini condition, then this will also occur. For example, if $f'$ has a bounded derivative at every point, then by the mean value theorem $|f'(x) - f'(y)| \le K|x-y|$, and this is enough to show the Fourier series converges to $f'(x)$.
This means that for all $\varepsilon > 0$ there exists a $p_n(\varepsilon)$ such that
\[ \|f - p_n\|_\infty < \varepsilon. \]
Definition 14.7.2 Recall the $n^{th}$ partial sum of the Fourier series $S_nf(x)$ is given by
\[ S_nf(x) = \int_{-\pi}^\pi D_n(x-y)f(y)\,dy = \int_{-\pi}^\pi D_n(t)f(x-t)\,dt \]
The $n^{th}$ Fejer mean, $\sigma_nf(x)$ is the average of the first $n$ of the $S_nf(x)$. Thus
\[ \sigma_{n+1}f(x) \equiv \frac{1}{n+1}\sum_{k=0}^nS_kf(x) = \int_{-\pi}^\pi\left(\frac{1}{n+1}\sum_{k=0}^nD_k(t)\right)f(x-t)\,dt \]
As was the case with the Dirichlet kernel, the Fejer kernel has some properties.
Proof: Part 1.) is obvious because Fn+1 is the average of functions for which
this is true.
Part 2.) is also obvious for the same reason as Part 1.). Part 3.) is obvious
because it is true for Dn in place of Fn+1 and then taking the average yields the
same sort of sum.
The last statements in 4.) are obvious from the formula which is the only hard
part of 4.).
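\[ F_{n+1}(t) = \frac{1}{(n+1)\sin\left(\frac{t}{2}\right)2\pi}\sum_{k=0}^n\sin\left(\left(k + \frac{1}{2}\right)t\right) \]
\[ = \frac{1}{(n+1)\sin^2\left(\frac{t}{2}\right)2\pi}\sum_{k=0}^n\sin\left(\left(k + \frac{1}{2}\right)t\right)\sin\left(\frac{t}{2}\right) \]
Using the identity $2\sin(a)\sin(b) = \cos(a-b) - \cos(a+b)$ with $a = \left(k + \frac{1}{2}\right)t$ and $b = \frac{t}{2}$, it follows
\[ F_{n+1}(t) = \frac{1}{(n+1)\sin^2\left(\frac{t}{2}\right)4\pi}\sum_{k=0}^n(\cos(kt) - \cos((k+1)t)) \]
\[ = \frac{1 - \cos((n+1)t)}{(n+1)\sin^2\left(\frac{t}{2}\right)4\pi} \]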
[Figure: graphs of the Fejer kernels, $y$ against $t$ for $t$ from $-3$ to $3$.]
Note how these kernels are nonnegative, unlike the Dirichlet kernels. Also there
is a large bump in the center which gets increasingly large as n gets larger. The
fact these kernels are nonnegative is what is responsible for the superior ability of
the Fejer means to approximate a continuous function.
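The closed form for the Fejer kernel and its nonnegativity can be compared against the average of Dirichlet kernels numerically. The sketch below takes $F_{n+1}$ to be the average of $D_0, \cdots, D_n$, as in Definition 14.7.2; the function names are illustrative.

```python
import math

def dirichlet(k, t):
    """D_k(t) = sin((k + 1/2) t) / (2 pi sin(t / 2)), for sin(t/2) != 0."""
    return math.sin((k + 0.5) * t) / (2.0 * math.pi * math.sin(t / 2.0))

def fejer_average(n, t):
    """F_{n+1}(t) as the average of D_0(t), ..., D_n(t)."""
    return sum(dirichlet(k, t) for k in range(n + 1)) / (n + 1)

def fejer_closed(n, t):
    """F_{n+1}(t) = (1 - cos((n + 1) t)) / ((n + 1) sin^2(t / 2) 4 pi)."""
    return (1.0 - math.cos((n + 1) * t)) / ((n + 1) * math.sin(t / 2.0) ** 2 * 4.0 * math.pi)

for n in (1, 2, 5):
    for t in (0.2, 1.0, 3.0):
        print(n, t, fejer_average(n, t), fejer_closed(n, t))
```

The closed form is a quotient of nonnegative quantities, which is the nonnegativity noted above, whereas the individual values `dirichlet(k, t)` change sign.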
14.7. WAYS OF APPROXIMATING FUNCTIONS 349
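\[ = \left|\int_{-\pi}^\pi(f(x) - f(x-y))F_{n+1}(y)\,dy\right| \le \int_{-\pi}^\pi|f(x) - f(x-y)|F_{n+1}(y)\,dy \]
\[ = \int_{-\delta}^\delta|f(x) - f(x-y)|F_{n+1}(y)\,dy + \int_\delta^\pi|f(x) - f(x-y)|F_{n+1}(y)\,dy \]
\[ + \int_{-\pi}^{-\delta}|f(x) - f(x-y)|F_{n+1}(y)\,dy \]
Since $F_{n+1}$ is even and $|f|$ is continuous and periodic, hence bounded by some constant $M$ the above is dominated by
\[ \int_{-\delta}^\delta|f(x) - f(x-y)|F_{n+1}(y)\,dy + 4M\int_\delta^\pi F_{n+1}(y)\,dy \]
Now choose $\delta$ such that for all $x$, it follows that if $|y| < \delta$ then
\[ \|f - \sigma_{n+1}f\|_\infty \le \varepsilon/2 + \frac{8M\pi}{(n+1)\sin^2\left(\frac{\delta}{2}\right)4\pi} < \varepsilon \]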
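\[ = \int_{-\pi}^\pi\left(f(\theta) - \sum_{k=-n}^nb_ke^{ik\theta}\right)\overline{\left(f(\theta) - \sum_{l=-n}^nb_le^{il\theta}\right)}\,d\theta \]
\[ = \int_{-\pi}^\pi\left(|f(\theta)|^2 + \left(\sum_{k=-n}^nb_ke^{ik\theta}\right)\left(\sum_{l=-n}^n\overline{b_l}e^{-il\theta}\right) - f(\theta)\sum_{l=-n}^n\overline{b_l}e^{-il\theta} - \overline{f(\theta)}\sum_{k=-n}^nb_ke^{ik\theta}\right)d\theta \]
\[ = \int_{-\pi}^\pi|f(\theta)|^2\,d\theta + \sum_{k,l}b_k\overline{b_l}\int_{-\pi}^\pi e^{ik\theta}e^{-il\theta}\,d\theta - 2\pi\sum_l\overline{b_l}a_l - 2\pi\sum_kb_k\overline{a_k} \]
Then adding and subtracting $2\pi\sum_k|a_k|^2$,
\[ = \int_{-\pi}^\pi|f(\theta)|^2\,d\theta - 2\pi\sum_k|a_k|^2 + 2\pi\sum_k|b_k|^2 - 2\pi\sum_l\overline{b_l}a_l - 2\pi\sum_kb_k\overline{a_k} + 2\pi\sum_k|a_k|^2 \]
\[ = \int_{-\pi}^\pi|f(\theta)|^2\,d\theta - 2\pi\sum_k|a_k|^2 + 2\pi\sum_k(b_k - a_k)\overline{(b_k - a_k)} \]
\[ = \int_{-\pi}^\pi|f(\theta)|^2\,d\theta - 2\pi\sum_k|a_k|^2 + 2\pi\sum_k|b_k - a_k|^2 \]
Therefore, to make
\[ \int_{-\pi}^\pi\left|f(\theta) - \sum_{k=-n}^nb_ke^{ik\theta}\right|^2d\theta \]
as small as possible for all choices of $b_k$, one should let $b_k = a_k$, the $k^{th}$ Fourier coefficient. Stated another way,
\[ \int_{-\pi}^\pi\left|f(\theta) - \sum_{k=-n}^nb_ke^{ik\theta}\right|^2d\theta \ge \int_{-\pi}^\pi|f(\theta) - S_nf(\theta)|^2\,d\theta \]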
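Also,
\[ \int_{-\pi}^\pi f(\theta)\overline{S_nf(\theta)}\,d\theta = \sum_{k=-n}^n\overline{a_k}\int_{-\pi}^\pi f(\theta)e^{-ik\theta}\,d\theta \]
\[ = \sum_{k=-n}^n\overline{a_k}\overbrace{\int_{-\pi}^\pi f(\theta)e^{-ik\theta}\,d\theta}^{=2\pi a_k} = 2\pi\sum_{k=-n}^n|a_k|^2 \]
Similarly,
\[ \int_{-\pi}^\pi\overline{f(\theta)}S_nf(\theta)\,d\theta = 2\pi\sum_{k=-n}^n|a_k|^2 \]
Therefore,
\[ 0 \le \int_{-\pi}^\pi(f(\theta) - S_nf(\theta))\overline{(f(\theta) - S_nf(\theta))}\,d\theta \]
\[ = \int_{-\pi}^\pi|f(\theta)|^2 + |S_nf(\theta)|^2 - f(\theta)\overline{S_nf(\theta)} - \overline{f(\theta)}S_nf(\theta)\,d\theta \]
\[ = \int_{-\pi}^\pi|f(\theta)|^2 - |S_nf(\theta)|^2\,d\theta \]
showing
\[ 2\pi\sum_{k=-n}^n|a_k|^2 = \int_{-\pi}^\pi|S_nf(\theta)|^2\,d\theta \le \int_{-\pi}^\pi|f(\theta)|^2\,d\theta \quad (14.7.21) \]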
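As a concrete instance of 14.7.21, take the sample function $f(\theta) = \theta$ on $(-\pi,\pi)$ (a choice made here for illustration). Integration by parts gives $a_0 = 0$ and $a_k = \frac{i(-1)^k}{k}$ for $k \neq 0$, so $|a_k|^2 = \frac{1}{k^2}$, while $\int_{-\pi}^\pi\theta^2\,d\theta = \frac{2\pi^3}{3}$. The sketch below compares the two sides.

```python
import math

def bessel_left_side(n):
    """2 pi * sum of |a_k|^2 over 0 < |k| <= n, with |a_k|^2 = 1 / k^2 for f(theta) = theta."""
    return 2.0 * math.pi * sum(2.0 / k ** 2 for k in range(1, n + 1))

RIGHT_SIDE = 2.0 * math.pi ** 3 / 3.0    # integral of theta^2 over [-pi, pi]

for n in (1, 10, 1000, 10000):
    print(n, bessel_left_side(n), RIGHT_SIDE)
```

The left side increases with $n$ and stays below the right side; the gap shrinks like $\frac{4\pi}{n}$.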
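and so $g\mathcal{X}_{[-\pi,\pi]}$ could be replaced with $g\mathcal{X}_{K_n}$ if necessary in the above inequality involving $\varepsilon$.
Extending $g$ to be $2\pi$ periodic, it follows
\[ \|f - \sigma_{n+1}f\|_{L^2([-\pi,\pi])} \le \|f - g\|_{L^2([-\pi,\pi])} + \|g - \sigma_{n+1}g\|_{L^2([-\pi,\pi])} + \|\sigma_{n+1}(g - f)\|_{L^2([-\pi,\pi])}. \quad (14.7.22) \]
\[ \le \int_{-\pi}^\pi F_n(y)\left(\int_{-\pi}^\pi|g(x-y) - f(x-y)|^2\,dx\right)^{1/2}dy \]
\[ = \int_{-\pi}^\pi F_n(y)\,dy\,\|g - f\|_{L^2([-\pi,\pi])} = \|g - f\|_{L^2([-\pi,\pi])} \]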
14.8 Exercises
1. Suppose $f$ has infinitely many derivatives and is also periodic with period $2\pi$. Let the Fourier series of $f$ be
\[ \sum_{k=-\infty}^\infty a_ke^{ik\theta} \]
Show that
\[ \lim_{k\to\infty}k^ma_k = \lim_{k\to-\infty}k^ma_k = 0 \]
for every $m \in \mathbb{N}$.
2. Let $f$ be a continuous function defined on $[-\pi,\pi]$. Show there exists a polynomial, $p$ such that $\|p - f\| < \varepsilon$ where
\[ \|g\| \equiv \sup\{|g(x)| : x \in [-\pi,\pi]\}. \]
Extend this result to an arbitrary interval. This is another approach to the Weierstrass approximation theorem. Hint: First find a linear function, $ax + b = y$ such that $f - y$ has the property that it has the same value at both ends of $[-\pi,\pi]$. Therefore, you may consider this as the restriction to $[-\pi,\pi]$ of a continuous periodic function, $F$. Now find a trig polynomial,
\[ \sigma(x) \equiv a_0 + \sum_{k=1}^na_k\cos kx + b_k\sin kx \]
such that $\|\sigma - F\| < \frac{\varepsilon}{3}$. Recall 14.1.4. Now consider the power series of the trig functions making use of the error estimate for the remainder after $m$ terms.
3. The inequality established above,
\[ 2\pi\sum_{k=-n}^n|a_k|^2 = \int_{-\pi}^\pi|S_nf(\theta)|^2\,d\theta \le \int_{-\pi}^\pi|f(\theta)|^2\,d\theta \]
is called Bessel's inequality. Use this inequality to give an easy proof that for all $f \in L^2([-\pi,\pi])$,
\[ \lim_{n\to\infty}\int_{-\pi}^\pi f(x)e^{inx}\,dx = 0. \]
Recall that in the Riemann Lebesgue lemma $|f| \in L^1((a,b])$ so while this exercise is easier, it lacks the generality of the earlier proof. Explain why this is less general.
4. Let $f(x) = x$ for $x \in (-\pi,\pi)$ and extend to make the resulting function defined on $\mathbb{R}$ and periodic of period $2\pi$. Find the Fourier series of $f$. Verify the Fourier series converges to the midpoint of the jump and use this series to find a nice formula for $\frac{\pi}{4}$. Hint: For the last part consider $x = \frac{\pi}{2}$.
6. Let $f(x) = \cos x$ for $x \in (0,\pi)$ and define $f(x) \equiv -\cos x$ for $x \in (-\pi,0)$. Now extend this function to make it $2\pi$ periodic. Find the Fourier series of $f$.
where $\alpha_k$ are the Fourier coefficients of $f$ and $\beta_k$ are the Fourier coefficients of $g$.
9. Recall the partial summation formula, called the Dirichlet formula which says that
\[ \sum_{k=p}^qa_kb_k = A_qb_q - A_{p-1}b_p + \sum_{k=p}^{q-1}A_k(b_k - b_{k+1}). \]
Here $A_q \equiv \sum_{k=1}^qa_k$. Also recall Dirichlet's test which says that if $\lim_{k\to\infty}b_k = 0$, $A_k$ are bounded, and $\sum|b_k - b_{k+1}|$ converges, then $\sum a_kb_k$ converges. Show the partial sums of $\sum_k\sin kx$ are bounded for each $x \in \mathbb{R}$. Using this fact and the Dirichlet test above, obtain some theorems which will state that $\sum_ka_k\sin kx$ converges for all $x$.
10. Let $\{a_n\}$ be a sequence of positive numbers having the property that
\[ \lim_{n\to\infty}na_n = 0 \]
11. The problem in Apostol's book mentioned in Problem 10 does not require $na_n$ to be decreasing and is as follows. Let $\{a_k\}_{k=1}^\infty$ be a decreasing sequence of nonnegative numbers which satisfies $\lim_{n\to\infty}na_n = 0$. Then
\[ \sum_{k=1}^\infty a_k\sin(kx) \]
which you can establish by taking the imaginary part of a geometric series of the form $\sum_{k=1}^q\left(e^{ix}\right)^k$ or else the approach used above to find a formula for the Dirichlet kernel. Now define
If x > 1/q, then q > 1/x and you use the top line of 14.8.24 picking m such
that
1 1
q > m 1.
x x
Then in this case,
q m
b (k) 2
a sin (kx) kx + 3ap
k k x
k=p k=p
b (p) x (m p) + 6ap (m + 1)
( )
1 b (p)
b (p) x + 6 (m + 1) 25b (p) .
x m+1
Therefore, the partial sums of the series, ak sin kx form a uniformly Cauchy
sequence and must converge uniformly on (0, ). Now explain why this implies
the series converges uniformly on R.
12. Suppose $f(x) = \sum_{k=1}^{\infty} a_k \sin kx$ and that the convergence is uniform. Recall
something like this holds for power series. Is it reasonable to suppose that
$f'(x) = \sum_{k=1}^{\infty} a_k k \cos kx$? Explain.
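13. Suppose $|u_k(x)| \le K_k$ for all $x \in D$ where
$$\sum_{k=-\infty}^{\infty} K_k = \lim_{n\to\infty} \sum_{k=-n}^{n} K_k < \infty.$$
Show that $\sum_{k=-\infty}^{\infty} u_k(x)$ converges uniformly on $D$ in the sense
that for all $\varepsilon > 0$, there exists $N$ such that whenever $n > N$,
$$\left|\sum_{k=-\infty}^{\infty} u_k(x) - \sum_{k=-n}^{n} u_k(x)\right| < \varepsilon$$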
be the Fourier series for $f$. Then from the definition of $a_k$, show that for
$k \ne 0$, $a_k = \frac{1}{ik} a_k'$ where $a_k'$ is the Fourier coefficient of $f'$. Now use
Bessel's inequality to argue that $\sum_{k=-\infty}^{\infty} |a_k'|^2 < \infty$ and then show this implies
$\sum |a_k| < \infty$. You might want to use the Cauchy Schwarz inequality to do this
part. Then using the version of the Weierstrass M test given in Problem 13
obtain uniform convergence of the Fourier series to $f$.
for all $\theta \in \mathbb{R}$. Hint: Note the function $\arccos$ is continuous and maps $[-1, 1]$
onto $[0, \pi]$. Using this, show you can define $g$ a continuous function on $[-1, 1]$
by $g(\cos\theta) = h(\theta)$ for $\theta$ on $[0, \pi]$. Now use the Weierstrass approximation
theorem on $[-1, 1]$.
16. Show that if $f$ is any odd $2\pi$ periodic function, then its Fourier series can
be simplified to an expression of the form
$$\sum_{n=1}^{\infty} b_n \sin(nx)$$
where
$$S_k \equiv \sum_{j=1}^{k} a_j.$$
Show that if $\sum_{k=1}^{\infty} a_k$ converges then $\lim_{n\to\infty} \sigma_n$ also exists and equals the
same thing. Next find an example where, although $\sum_{k=1}^{\infty} a_k$ fails to converge,
$\lim_{n\to\infty} \sigma_n$ does exist. This summability method is called Cesàro summability.
Recall the Fejér means were obtained in just this way.
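A standard example for the second part is the series $1 - 1 + 1 - 1 + \cdots$. Here is a short Python sketch which computes the averages of the partial sums. It assumes the usual definition $\sigma_n \equiv \frac{1}{n} \sum_{k=1}^{n} S_k$, which is the formula missing from the statement above; the function name is illustrative only.

```python
def cesaro_means(terms):
    """Return [sigma_1, ..., sigma_n] where sigma_n = (S_1 + ... + S_n) / n."""
    partial, running, means = 0.0, 0.0, []
    for n, a in enumerate(terms, start=1):
        partial += a          # S_n, the n-th partial sum
        running += partial    # S_1 + ... + S_n
        means.append(running / n)
    return means

# The partial sums of 1 - 1 + 1 - ... oscillate between 1 and 0,
# so the series diverges, but the averages settle down.
grandi = [(-1) ** k for k in range(1000)]
sigma = cesaro_means(grandi)

# For a convergent series the averages approach the ordinary sum (slowly).
geometric = [0.5 ** k for k in range(1, 1001)]
sigma_geo = cesaro_means(geometric)
```

The second computation illustrates the first claim of the exercise: the averages of a convergent series tend to its sum, although typically only at rate $1/n$.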
18. Let $0 < r < 1$ and for $f$ a continuous periodic function of period $2\pi$ consider
$$A_r f(\theta) \equiv \sum_{k=-\infty}^{\infty} r^{|k|} a_k e^{ik\theta}$$
$$\lim_{r\to 1-} A_r f(\theta) = f(\theta).$$
Hint: You need to find a kernel and write as the integral of the kernel convolved
with $f$. Then consider properties of this kernel as was done with the
Fejér kernel.
Finally show
$$\lim_{R\to\infty} \int_0^R \frac{\sin u}{u}\, du = \frac{\pi}{2} \equiv \int_0^\infty \frac{\sin u}{u}\, du.$$
This is a very important improper integral.
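You can see the value $\pi/2$ emerge numerically. The sketch below uses composite Simpson's rule, which is my choice of quadrature and not something from the text; the function name and the step size are likewise illustrative. The integrand is extended by the value 1 at $u = 0$.

```python
import math

def sine_integral(R, steps_per_unit=100):
    """Composite Simpson approximation of the integral of sin(u)/u over [0, R]."""
    n = 2 * max(1, int(R * steps_per_unit / 2))  # even number of subintervals
    h = R / n

    def g(u):
        # sin(u)/u has a removable singularity at 0 with limiting value 1.
        return 1.0 if u == 0.0 else math.sin(u) / u

    total = g(0.0) + g(R)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

approx = sine_integral(200.0)
gap = abs(approx - math.pi / 2)  # shrinks roughly like 1/R as R grows
```

The slow $1/R$ decay of the gap reflects the fact that the integral converges only conditionally.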
21. Show that neither the Jordan nor the Dini criterion for pointwise convergence
implies the other criterion. That is, find an example of a function for which
Jordan's condition implies pointwise convergence but not Dini's and then find
a function for which Dini works but Jordan does not. Hint: You might try
considering something like $y = [\ln(x)]^{-1}$ for $x > 0$, $x$ near 0, to get something
for which Jordan works but Dini does not. For the other part, try something
like $x \sin(1/x)$.
Part III
Further Topics
Metric Spaces And General
Topological Spaces
$$d(x, y) = d(y, x)$$
$$d(x, y) \ge 0 \text{ and } d(x, y) = 0 \text{ if and only if } x = y$$
$$d(x, y) \le d(x, z) + d(z, y).$$
You can check that $\mathbb{R}^n$ and $\mathbb{C}^n$ are metric spaces with $d(x, y) = |x - y|$. However,
there are many others. The definitions of open and closed sets are the same
for a metric space as they are for $\mathbb{R}^n$.
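To get a feel for the three axioms, you can test a candidate distance function on a handful of sample points. This is only a sketch in Python with names of my own choosing, and passing the test on a finite sample is of course not a proof. The squared Euclidean distance is included because it is symmetric and positive but fails the triangle inequality.

```python
import itertools
import math

def violates_metric_axioms(d, points, tol=1e-12):
    """Return the name of the first axiom that fails on `points`, or None."""
    for x, y in itertools.product(points, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:
            return "symmetry"
        if d(x, y) < -tol or ((d(x, y) <= tol) != (x == y)):
            return "positivity"
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return "triangle inequality"
    return None

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared(x, y):
    # Not a metric: with collinear points 0, 1, 2 one gets 4 > 1 + 1.
    return euclidean(x, y) ** 2

sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.5, 1.5)]
euclid_result = violates_metric_axioms(euclidean, sample)
squared_result = violates_metric_axioms(squared, sample)
```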
Lemma 15.1.3 In a metric space, $X$, every ball, $B(x, r)$, is open. A set is closed
if and only if it contains all its limit points. If $p$ is a limit point of $S$, then there
exists a sequence of distinct points of $S$, $\{x_n\}$, such that $\lim_{n\to\infty} x_n = p$.
Theorem 15.1.4 Suppose $(X, d)$ is a metric space. Then the sets $\{B(x, r) : r > 0, x \in X\}$ satisfy
$$\cup \{B(x, r) : r > 0, x \in X\} = X \tag{15.1.1}$$
If $p \in B(x, r_1) \cap B(z, r_2)$, there exists $r > 0$ such that
$$B(p, r) \subseteq B(x, r_1) \cap B(z, r_2). \tag{15.1.2}$$
Proof: Observe that the union of these balls includes the whole space, $X$, so
15.1.1 is obvious. Consider 15.1.2. Let $p \in B(x, r_1) \cap B(z, r_2)$. Consider
$$r \equiv \min\left(r_1 - d(x, p),\ r_2 - d(z, p)\right)$$
and suppose $y \in B(p, r)$. Then
$$d(y, x) \le d(y, p) + d(p, x) < r_1 - d(x, p) + d(x, p) = r_1$$
and so $B(p, r) \subseteq B(x, r_1)$. By similar reasoning, $B(p, r) \subseteq B(z, r_2)$. This proves
the theorem.
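The radius chosen in the proof can be tried out in the plane. The following Python sketch computes $r$ for two overlapping discs and then samples points of $B(p, r)$ to confirm they land in both discs. The helper names and the random sampling are illustrative assumptions of mine, not part of the argument.

```python
import math
import random

def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

def inner_radius(p, x, r1, z, r2):
    """The radius r = min(r1 - d(x, p), r2 - d(z, p)) used in the proof."""
    assert dist(p, x) < r1 and dist(p, z) < r2, "p must lie in both balls"
    return min(r1 - dist(x, p), r2 - dist(z, p))

def sample_ball(p, r, rng):
    """A random point y with d(y, p) < r."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    radius = 0.999 * r * rng.random()
    return (p[0] + radius * math.cos(angle), p[1] + radius * math.sin(angle))

x, r1 = (0.0, 0.0), 2.0
z, r2 = (1.5, 0.0), 1.0
p = (1.0, 0.2)
r = inner_radius(p, x, r1, z, r2)

rng = random.Random(0)
samples = [sample_ball(p, r, rng) for _ in range(1000)]
all_inside = all(dist(y, x) < r1 and dist(y, z) < r2 for y in samples)
```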
Let $K$ be a closed set. This means $K^C \equiv X \setminus K$ is an open set. Let $p$ be a
limit point of $K$. If $p \in K^C$, then since $K^C$ is open, there exists $B(p, r) \subseteq K^C$. But
this contradicts $p$ being a limit point because there are no points of $K$ in this ball.
Hence all limit points of $K$ must be in $K$.
Suppose next that $K$ contains its limit points. Is $K^C$ open? Let $p \in K^C$.
Then $p$ is not a limit point of $K$. Therefore, there exists $B(p, r)$ which contains at
most finitely many points of $K$. Since $p \notin K$, it follows that by making $r$ smaller if
necessary, $B(p, r)$ contains no points of $K$. That is, $B(p, r) \subseteq K^C$ showing $K^C$ is
open. Therefore, $K$ is closed.
Suppose now that $p$ is a limit point of $S$. Let $x_1 \in (S \setminus \{p\}) \cap B(p, 1)$. If
$x_1, \cdots, x_k$ have been chosen, let
$$r_{k+1} \equiv \min\left\{ d(p, x_i),\ i = 1, \cdots, k,\ \frac{1}{k+1} \right\}.$$
Let $x_{k+1} \in (S \setminus \{p\}) \cap B(p, r_{k+1})$. This proves the lemma.
Lemma 15.1.5 If $\{x_n\}$ is a Cauchy sequence in a metric space, $X$, and if some
subsequence, $\{x_{n_k}\}$, converges to $x$, then $\{x_n\}$ converges to $x$. Also if a sequence
converges, then it is a Cauchy sequence.
Proof: Note first that $n_k \ge k$ because in a subsequence, the indices, $n_1, n_2, \cdots$
are strictly increasing. Let $\varepsilon > 0$ be given and let $N$ be such that for $k > N$,
$d(x, x_{n_k}) < \varepsilon/2$ and for $m, n \ge N$, $d(x_m, x_n) < \varepsilon/2$. Pick $k > n$. Then if $n > N$,
$$d(x_n, x) \le d(x_n, x_{n_k}) + d(x_{n_k}, x) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Finally, suppose $\lim_{n\to\infty} x_n = x$. Then there exists $N$ such that if $n > N$, then
$d(x_n, x) < \varepsilon/2$. It follows that for $m, n > N$,
$$d(x_n, x_m) \le d(x_n, x) + d(x, x_m) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
This proves the lemma.
15.2. COMPACTNESS IN METRIC SPACE 365
Example 15.2.2 Let $X$ be any infinite set and define $d(x, y) = 1$ if $x \ne y$ while
$d(x, y) = 0$ if $x = y$.
You should verify the details that this is a metric space because it satisfies the
axioms of a metric. The set $X$ is closed and bounded because its complement is
$\emptyset$ which is clearly open because every point of $\emptyset$ is an interior point. (There are
none.) Also $X$ is bounded because $X = B(x, 2)$. However, $X$ is clearly not compact
because $\{B(x, \frac{1}{2}) : x \in X\}$ is a collection of open sets whose union contains $X$ but
since they are all disjoint and nonempty, there is no finite subset of these whose
union contains $X$. In fact $B(x, \frac{1}{2}) = \{x\}$.
From this example it is clear something more than closed and bounded is needed.
If you are not familiar with the issues just discussed, ignore them and continue.
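The two facts about balls used in the example can be seen concretely. The Python sketch below works with a finite stand-in for $X$, so it can only illustrate the shape of the balls; the failure of compactness genuinely needs $X$ to be infinite. The names are mine.

```python
def discrete_metric(x, y):
    return 0 if x == y else 1

def ball(center, radius, space, d):
    """The open ball B(center, radius) inside the finite sample `space`."""
    return {y for y in space if d(center, y) < radius}

space = set(range(10))  # finite stand-in for the infinite set X

# Every ball of radius 1/2 is a single point, while a ball of radius 2 is everything.
small_balls = {x: ball(x, 0.5, space, discrete_metric) for x in space}
big_ball = ball(0, 2, space, discrete_metric)
```

Since each $B(x, \frac{1}{2})$ covers only the point $x$, a subcover must keep one ball per point, which is why no finite subcover exists when $X$ is infinite.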
Definition 15.2.3 In any metric space, a set $E$ is totally bounded if for every $\varepsilon > 0$
there exists a finite set of points $\{x_1, \cdots, x_n\}$ such that
$$E \subseteq \cup_{i=1}^{n} B(x_i, \varepsilon).$$
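For a finite set of points, such a collection of centers can be produced greedily: go through the points and keep one as a new center whenever it is not already within $\varepsilon$ of a center chosen earlier. Here is a minimal Python sketch of that idea. The greedy procedure and the names are my own illustration; `math.dist` requires Python 3.8 or later.

```python
import math

def greedy_epsilon_net(points, eps, d=math.dist):
    """Choose centers so that every point is within eps of some center."""
    centers = []
    for p in points:
        # Keep p only if no existing center already covers it.
        if all(d(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

# An 11 x 11 grid in the unit square.
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
net = greedy_epsilon_net(grid, 0.35)

covered = all(any(math.dist(p, c) < 0.35 for c in net) for p in grid)
separated = all(math.dist(a, b) >= 0.35
                for i, a in enumerate(net) for b in net[i + 1:])
```

The centers produced this way are at least $\varepsilon$ apart from one another, which is the same separation idea used later to show that a sequentially compact space is totally bounded.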
The following proposition tells which sets in a metric space are compact. First
here is an interesting lemma.
Lemma 15.2.4 Let $X$ be a metric space and suppose $D$ is a countable dense subset
of $X$. In other words, it is being assumed $X$ is a separable metric space. Consider
the open sets of the form $B(d, r)$ where $r$ is a positive rational number and $d \in D$.
Denote this countable collection of open sets by $\mathcal{B}$. Then every open set is the union
of sets of $\mathcal{B}$. Furthermore, if $\mathcal{C}$ is any collection of open sets, there exists a countable
subset, $\{U_n\} \subseteq \mathcal{C}$ such that $\cup_n U_n = \cup \mathcal{C}$.
Proof: Let $U$ be an open set and let $x \in U$. Let $B(x, \delta) \subseteq U$. Then by density of
$D$, there exists $d \in D \cap B(x, \delta/4)$. Now pick $r \in \mathbb{Q} \cap (\delta/4, 3\delta/4)$ and consider $B(d, r)$.
Clearly, $B(d, r)$ contains the point $x$ because $r > \delta/4$. Is $B(d, r) \subseteq B(x, \delta)$? If so,
this proves the lemma because $x$ was an arbitrary point of $U$. Suppose $z \in B(d, r)$.
Then
$$d(z, x) \le d(z, d) + d(d, x) < r + \frac{\delta}{4} < \frac{3\delta}{4} + \frac{\delta}{4} = \delta.$$
Now let $\mathcal{C}$ be any collection of open sets. Each set in this collection is the union
of countably many sets of $\mathcal{B}$. Let $\mathcal{B}'$ denote the sets of $\mathcal{B}$ which are contained in
some set of $\mathcal{C}$. Thus $\cup \mathcal{B}' = \cup \mathcal{C}$. Then for each $B \in \mathcal{B}'$, pick $U_B \in \mathcal{C}$ such that
$B \subseteq U_B$. Then $\{U_B : B \in \mathcal{B}'\}$ is a countable collection of sets of $\mathcal{C}$ whose union
equals $\cup \mathcal{C}$. Therefore, this proves the lemma.
Proposition 15.2.5 Let $(X, d)$ be a metric space. Then the following are equivalent.
$$(X, d) \text{ is compact,} \tag{15.2.3}$$
$$(X, d) \text{ is sequentially compact,} \tag{15.2.4}$$
$$(X, d) \text{ is complete and totally bounded.} \tag{15.2.5}$$
Proof: Suppose 15.2.3 and let $\{x_k\}$ be a sequence. Suppose $\{x_k\}$ has no
convergent subsequence. If this is so, then no value of the sequence is repeated
more than finitely many times. Also $\{x_k\}$ has no limit point because if it did,
there would exist a subsequence which converges. To see this, suppose $p$ is a limit
point of $\{x_k\}$. Then in $B(p, 1)$ there are infinitely many points of $\{x_k\}$. Pick one
called $x_{k_1}$. Now if $x_{k_1}, x_{k_2}, \cdots, x_{k_n}$ have been picked with $x_{k_i} \in B(p, 1/i)$, consider
$B(p, 1/(n+1))$. There are infinitely many points of $\{x_k\}$ in this ball also. Pick
$x_{k_{n+1}}$ such that $k_{n+1} > k_n$. Then $\{x_{k_n}\}_{n=1}^{\infty}$ is a subsequence which converges to $p$
and it is assumed this does not happen. Thus $\{x_k\}$ has no limit points. It follows
the set
$$C_n = \overline{\{x_k : k \ge n\}}$$
is a closed set because it has no limit points and if
$$U_n = C_n^C,$$
then
$$X = \cup_{n=1}^{\infty} U_n$$
but there is no finite subcovering, because no value of the sequence is repeated more
than finitely many times. Note $x_k$ is not in $U_n$ whenever $k > n$. This contradicts
compactness of $(X, d)$. This shows 15.2.3 implies 15.2.4.
Now suppose 15.2.4 and let $\{x_n\}$ be a Cauchy sequence. Is $\{x_n\}$ convergent?
By sequential compactness $x_{n_k} \to x$ for some subsequence. By Lemma 15.1.5 it
follows that $\{x_n\}$ also converges to $x$ showing that $(X, d)$ is complete. If $(X, d)$ is
not totally bounded, then there exists $\varepsilon > 0$ for which there is no $\varepsilon$ net. Hence there
exists a sequence $\{x_k\}$ with $d(x_k, x_l) \ge \varepsilon$ for all $l \ne k$. By Lemma 15.1.5 again,
this contradicts 15.2.4 because no subsequence can be a Cauchy sequence and so no
subsequence can converge. This shows 15.2.4 implies 15.2.5.
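Now suppose 15.2.5. What about 15.2.4? Let $\{p_n\}$ be a sequence and let
$\{x_i^n\}_{i=1}^{m_n}$ be a $2^{-n}$ net for $n = 1, 2, \cdots$. Let
$$B_n \equiv B\left(x_{i_n}^n, 2^{-n}\right)$$
$$p_{n_k} \in B_k.$$
Then if $k \ge l$,
$$d(p_{n_k}, p_{n_l}) \le \sum_{i=l}^{k-1} d\left(p_{n_{i+1}}, p_{n_i}\right) < \sum_{i=l}^{k-1} 2^{-(i-1)} < 2^{-(l-2)}.$$
$$D = \cup_{n=1}^{\infty} D_n.$$
such that $\cup \widetilde{\mathcal{C}} = \cup \mathcal{C}$. If $\mathcal{C}$ admits no finite subcover, then neither does $\widetilde{\mathcal{C}}$ and there
exists $p_n \in X \setminus \cup_{k=1}^{n} U_k$. Then since $X$ is sequentially compact, there is a subsequence
$\{p_{n_k}\}$ such that $\{p_{n_k}\}$ converges. Say
$$p = \lim_{k\to\infty} p_{n_k}.$$
All but finitely many points of $\{p_{n_k}\}$ are in $X \setminus \cup_{k=1}^{n} U_k$. Therefore $p \in X \setminus \cup_{k=1}^{n} U_k$
for each $n$. Hence
$$p \notin \cup_{k=1}^{\infty} U_k$$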
$$|z - 0| \le |z - x_j| + |x_j| < 1 + r.$$
($x \in [-r, r)^n$ means $x_i \in [-r, r)$ for each $i$.) Now define $\mathcal{S}$ to be all cubes of the form
$$\prod_{k=1}^{n} [a_k, b_k)$$
where
$$a_k = -r + i 2^{-p} r, \quad b_k = -r + (i + 1) 2^{-p} r,$$
for $i \in \{0, 1, \cdots, 2^{p+1} - 1\}$. Thus $\mathcal{S}$ is a collection of $\left(2^{p+1}\right)^n$ non overlapping
cubes whose union equals $[-r, r)^n$ and whose diameters are all equal to $2^{-p} r \sqrt{n}$. Now
choose $p$ large enough that the diameter of these cubes is less than $\varepsilon$. This yields a
contradiction because one of the cubes must contain infinitely many points of $\{a_i\}$.
This proves the lemma.
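The counting in this argument is easy to confirm by listing the cubes. Here is a small Python sketch which enumerates the half open cubes subdividing $[-r, r)^n$, reports their common diameter $2^{-p} r \sqrt{n}$, and finds the first $p$ for which that diameter drops below a given $\varepsilon$. The function names are illustrative and are not part of the text.

```python
import itertools
import math

def dyadic_cubes(r, p, n):
    """Yield each cube as a list of n intervals (a_k, b_k), meaning [a_k, b_k)."""
    side = r * 2.0 ** (-p)
    for multi in itertools.product(range(2 ** (p + 1)), repeat=n):
        yield [(-r + i * side, -r + (i + 1) * side) for i in multi]

def cube_diameter(r, p, n):
    return r * 2.0 ** (-p) * math.sqrt(n)

def level_for(eps, r, n):
    """Smallest p for which the cubes have diameter less than eps."""
    p = 0
    while cube_diameter(r, p, n) >= eps:
        p += 1
    return p

cubes = list(dyadic_cubes(1.0, 2, 2))   # (2^(2+1))^2 = 64 squares
p_needed = level_for(0.1, 1.0, 2)
```

Since there are only finitely many cubes at each level, infinitely many points of a bounded sequence must crowd into one of them, which is the pigeonhole step the proof relies on.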
The next theorem is called the Heine Borel theorem and it characterizes the
compact sets in Rn .
Proof: First it is shown $f(X)$ is compact. Suppose $\mathcal{C}$ is a set of open sets whose
union contains $f(X)$. Then since $f$ is continuous $f^{-1}(U)$ is open for all $U \in \mathcal{C}$.
Therefore, $\{f^{-1}(U) : U \in \mathcal{C}\}$ is a collection of open sets whose union contains $X$.
Since $X$ is compact, it follows finitely many of these, $\{f^{-1}(U_1), \cdots, f^{-1}(U_p)\}$
contains $X$ in their union. Therefore, $f(X) \subseteq \cup_{k=1}^{p} U_k$ showing $f(X)$ is compact
as claimed.
Now since $f(X)$ is compact, Theorem 15.2.7 implies $f(X)$ is closed and bounded.
Therefore, it contains its inf and its sup. Thus $f$ achieves both a maximum and a
minimum.
Proof: Suppose this is not true and that $f$ is continuous but not uniformly
continuous. Then there exists $\varepsilon > 0$ such that for all $\delta > 0$ there exist points, $p_\delta$
and $q_\delta$ such that $d(p_\delta, q_\delta) < \delta$ and yet $d(f(p_\delta), f(q_\delta)) \ge \varepsilon$. Let $p_n$ and $q_n$ be
the points which go with $\delta = 1/n$. By Proposition 15.2.5 $\{p_n\}$ has a convergent
subsequence, $\{p_{n_k}\}$ converging to a point, $x \in X$. Since $d(p_n, q_n) < \frac{1}{n}$, it follows
that $q_{n_k} \to x$ also. Therefore,
Definition 15.3.4 If every finite subset of a collection of sets has nonempty intersection,
the collection has the finite intersection property.
Proof: First I show each compact set is closed. Let $K$ be a nonempty compact
set and suppose $p \notin K$. Then for each $x \in K$, let $V_x = B(x, d(p, x)/3)$ and
$U_x = B(p, d(p, x)/3)$ so that $U_x$ and $V_x$ have empty intersection. Then since $K$
is compact, there are finitely many $V_x$ which cover $K$, say $V_{x_1}, \cdots, V_{x_n}$. Then let
$U = \cap_{i=1}^{n} U_{x_i}$. It follows $p \in U$ and $U$ has empty intersection with $K$. In fact $U$ has
empty intersection with $\cup_{i=1}^{n} V_{x_i}$. Since $U$ is an open set and $p \in K^C$ is arbitrary,
it follows $K^C$ is an open set.
Consider now the claim about the intersection. If this were not so, $\cup \{F^C : F \in \mathcal{F}\} = X$
and so, in particular, picking some $F_0 \in \mathcal{F}$, $\{F^C : F \in \mathcal{F}\}$ would be an open
property. To see this, note that if $x \in F_0$, then it must fail to be in some $F_k$ and so
it is not in $\cap_{k=0}^{m} F_k$. Since this is true for every $x$ it follows $\cap_{k=0}^{m} F_k = \emptyset$.
Theorem 15.3.6 Let $X_i$ be a compact metric space with metric $d_i$. Then $\prod_{i=1}^{m} X_i$
is also a compact metric space with respect to the metric, $d(x, y) \equiv \max_i (d_i(x_i, y_i))$.
Proof: This is most easily seen from sequential compactness. Let $\{x^k\}_{k=1}^{\infty}$
be a sequence of points in $\prod_{i=1}^{m} X_i$. Consider the $i^{th}$ component of $x^k$, $x_i^k$. It
follows $\{x_i^k\}$ is a sequence of points in $X_i$ and so it has a convergent subsequence.
Compactness of $X_1$ implies there exists a subsequence of $x^k$, denoted by $\{x^{k_1}\}$ such
that
$$\lim_{k_1\to\infty} x_1^{k_1} \to x_1 \in X_1.$$
Now there exists a further subsequence, denoted by $\{x^{k_2}\}$ such that in addition to
this, $x_2^{k_2} \to x_2 \in X_2$. After taking $m$ such subsequences, there exists a subsequence,
$\{x^l\}$ such that $\lim_{l\to\infty} x_i^l = x_i \in X_i$ for each $i$. Therefore, letting $x \equiv (x_1, \cdots, x_m)$,
$x^l \to x$ in $\prod_{i=1}^{m} X_i$. This proves the theorem.
$$f(x) \in B(a, M)$$
Uniform equicontinuity is like saying that the whole set of functions, $\mathcal{A}$, is uniformly
continuous on $K$ uniformly for $f \in \mathcal{A}$. The version of the Ascoli Arzela
theorem I will present here is the following.
$$\lim_{l\to\infty} \rho_K(f_{k_l}, f) = 0.$$
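Lemma 15.4.5 If $K$ is a compact subset of $\mathbb{R}^n$, then there exists $D \equiv \{x_k\}_{k=1}^{\infty} \subseteq K$
such that $D$ is dense in $K$. Also, for every $\varepsilon > 0$ there exists a finite set of points,
$\{x_1, \cdots, x_m\} \subseteq K$, called an $\varepsilon$ net such that
$$\cup_{i=1}^{m} B(x_i, \varepsilon) \supseteq K.$$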
Otherwise, pick
$$x_2^m \in K \setminus B\left(x_1^m, 1/m\right).$$
Otherwise, pick
Continue this way until the process stops, say at $N(m)$. It must stop because
if it didn't, there would be a convergent subsequence due to the compactness of
$K$. Ultimately all terms of this convergent subsequence would be closer than $1/m$,
violating the manner in which they are chosen. Then $D = \cup_{m=1}^{\infty} \cup_{k=1}^{N(m)} \{x_k^m\}$. This
$$\lim_{m\to\infty} \rho_K(g_m, g) = 0.$$
Next I show that $\{g_m\}$ converges at every point of $K$. Let $x \in K$ and let $\varepsilon > 0$ be
given. Choose $x_k$ such that for all $f \in \mathcal{A}$,
$$d(f(x_k), f(x)) < \frac{\varepsilon}{3}.$$
I can do this by the equicontinuity. Now if $p, q$ are large enough, say $p, q \ge M$,
$$d(g_p(x_k), g_q(x_k)) < \frac{\varepsilon}{3}.$$
Therefore, for $p, q \ge M$,
$$d(g_p(x), g_q(x)) \le d(g_p(x), g_p(x_k)) + d(g_p(x_k), g_q(x_k)) + d(g_q(x_k), g_q(x)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
It follows that $\{g_m(x)\}$ is a Cauchy sequence having values in $X$. Therefore, it converges.
Let $g(x)$ be the name of the thing it converges to.
Let $\varepsilon > 0$ be given and pick $\delta > 0$ such that whenever $x, y \in K$ and $|x - y| < \delta$,
it follows $d(f(x), f(y)) < \frac{\varepsilon}{3}$ for all $f \in \mathcal{A}$. Now let $\{x_1, \cdots, x_m\}$ be a $\delta$ net for
$K$ as in Lemma 15.4.5. Since there are only finitely many points in this net, it
follows that there exists $N$ such that for all $p, q \ge N$,
$$d(g_q(x_i), g_p(x_i)) < \frac{\varepsilon}{3}$$
for all $x_i \in \{x_1, \cdots, x_m\}$. Therefore, for arbitrary $x \in K$, pick $x_i \in \{x_1, \cdots, x_m\}$ such
that $|x_i - x| < \delta$. Then
$$d(g_q(x), g_p(x)) \le d(g_q(x), g_q(x_i)) + d(g_q(x_i), g_p(x_i)) + d(g_p(x_i), g_p(x)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
Since $N$ does not depend on the choice of $x$, it follows this sequence $\{g_m\}$ is uniformly
Cauchy. That is, for every $\varepsilon > 0$, there exists $N$ such that if $p, q \ge N$,
then
$$\rho_K(g_p, g_q) < \varepsilon.$$
15.4. ASCOLI ARZELA THEOREM 373
Now let $p$ satisfy 15.4.6 for all $x$ whenever $p > N$. Also pick $\delta > 0$ such that if
$|x - y| < \delta$, then
$$d(g_p(x), g_p(y)) < \frac{\varepsilon}{3}.$$
Then if $|x - y| < \delta$,
$$d(g(x), g(y)) \le d(g(x), g_p(x)) + d(g_p(x), g_p(y)) + d(g_p(y), g(y)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
Since $\varepsilon$ was arbitrary, this shows that $g$ is continuous.
It only remains to verify that $\rho_K(g, g_k) \to 0$. But this follows from 15.4.6. This
proves the lemma.
With these lemmas, it is time to prove Theorem 15.4.4.
Proof of Theorem 15.4.4: Let $D = \{x_k\}$ be the countable dense set of $K$
guaranteed by Lemma 15.4.5 and let $\{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), \cdots\}$ be a
subsequence of $\mathbb{N}$ such that
be a subsequence of $\{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), \cdots\}$ which has the property
that
$$\lim_{k\to\infty} f_{(2,k)}(x_2) \text{ exists.}$$
Definition 15.5.3 A set $X$ together with such a collection of its subsets satisfying
15.5.7-15.5.9 is called a topological space. $\tau$ is called the topology or set of open sets
of $X$.
[Figure: Hausdorff. Open sets $U$ and $V$ around the points $p$ and $q$.]
Note that if the topological space is Hausdorff, then this definition is equivalent
to requiring that every open set containing $p$ contains infinitely many points from
$E$. Why?
Theorem 15.5.6 A subset, $E$, of $X$ is closed if and only if it contains all its limit
points.
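[Figure: Regular. Open sets $U$ and $V$ around the point $p$ and the closed set $C$.]
[Figure: Normal. Open sets $U$ and $V$ around the closed sets $C$ and $K$.]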
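Proof: Let $\mathcal{C}$ denote all the closed sets which contain $E$. Then $\mathcal{C}$ is nonempty
because $X \in \mathcal{C}$.
$$\left(\cap \{A : A \in \mathcal{C}\}\right)^C = \cup \{A^C : A \in \mathcal{C}\},$$
an open set which shows that $\cap \mathcal{C}$ is a closed set and is the smallest closed set which
contains $E$.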
Proof: Let $x \in \overline{E}$ and suppose that $x \notin E$. If $x$ is not a limit point either, then
there exists an open set, $U$, containing $x$ which does not intersect $E$. But then $U^C$
is a closed set which contains $E$ which does not contain $x$, contrary to the definition
that $\overline{E}$ is the intersection of all closed sets containing $E$. Therefore, $x$ must be a
limit point of $E$ after all.
Now $E \subseteq \overline{E}$ so suppose $x$ is a limit point of $E$. Is $x \in \overline{E}$? If $H$ is a closed set
containing $E$, which does not contain $x$, then $H^C$ is an open set containing $x$ which
contains no points of $E$ other than $x$ negating the assumption that $x$ is a limit point
of $E$.
The following is the definition of continuity in terms of general topological spaces.
It is really just a generalization of the $\varepsilon$-$\delta$ definition of continuity given in calculus.
Definition 15.5.13 Let $(X, \tau)$ and $(Y, \eta)$ be two topological spaces and let $f : X \to Y$.
$f$ is continuous at $x \in X$ if whenever $V$ is an open set of $Y$ containing $f(x)$,
there exists an open set $U \in \tau$ such that $x \in U$ and $f(U) \subseteq V$. $f$ is continuous if
$f^{-1}(V) \in \tau$ whenever $V \in \eta$.
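On a finite set the condition that inverse images of open sets are open can be checked mechanically. The Python sketch below represents a topology as a set of frozensets and a function as a dictionary. The representation, the names, and the two small spaces are assumptions made only for illustration.

```python
def preimage(f, V, X):
    """The set of points of X which f sends into V."""
    return frozenset(x for x in X if f[x] in V)

def is_continuous(f, X, tau, eta):
    """True if the preimage of every set in eta belongs to tau."""
    return all(preimage(f, V, X) in tau for V in eta)

X = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

Y = frozenset({"a", "b"})
eta = {frozenset(), frozenset({"a"}), Y}

f = {1: "a", 2: "b", 3: "b"}   # preimage of {"a"} is {1}, which is open
g = {1: "b", 2: "b", 3: "a"}   # preimage of {"a"} is {3}, which is not open

f_ok = is_continuous(f, X, tau, eta)
g_ok = is_continuous(g, X, tau, eta)
```

Notice that continuity depends on both topologies: the same function $g$ becomes continuous if $\tau$ is enlarged to contain $\{3\}$.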
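$$x = (x_1, \cdots, x_n).$$
Then $x_i \in A_i \cap B_i$ for each $i$. Therefore, $x \in \prod_{i=1}^{n} A_i \cap B_i \in \mathcal{B}$ and
$\prod_{i=1}^{n} A_i \cap B_i \subseteq \prod_{i=1}^{n} A_i$.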
The definition of compactness is also considered for a general topological space.
This is given next.
Theorem 15.5.18 If $(X, \tau)$ is a Hausdorff space, then every compact subset must
also be a closed set.
Proof: Suppose $p \notin K$. For each $x \in K$, there exist open sets, $U_x$ and $V_x$ such
that
$$x \in U_x, \quad p \in V_x,$$
and
$$U_x \cap V_x = \emptyset.$$
If $K$ is assumed to be compact, there are finitely many of these sets, $U_{x_1}, \cdots, U_{x_m}$
which cover $K$. Then let $V \equiv \cap_{i=1}^{m} V_{x_i}$. It follows that $V$ is an open set containing
$p$ which has empty intersection with each of the $U_{x_i}$. Consequently, $V$ contains no
points of $K$ and $p$ is therefore not a limit point of $K$. This proves the theorem.
A useful construction when dealing with locally compact Hausdorff spaces is the
notion of the one point compactification of the space.
Definition 15.5.19 Suppose $(X, \tau)$ is a locally compact Hausdorff space. Then let
$\widetilde{X} \equiv X \cup \{\infty\}$ where $\infty$ is just the name of some point which is not in $X$ which is
called the point at infinity. A basis for the topology $\widetilde{\tau}$ for $\widetilde{X}$ is
$$\tau \cup \left\{K^C \text{ where } K \text{ is a compact subset of } X\right\}.$$
The complement is taken with respect to $\widetilde{X}$ and so the open sets, $K^C$ are basic open
sets which contain $\infty$.
The reason this is called a compactification is contained in the next lemma.
Lemma 15.5.20 If $(X, \tau)$ is a locally compact Hausdorff space, then $(\widetilde{X}, \widetilde{\tau})$ is a
compact Hausdorff space. Also if $U$ is an open set of $\widetilde{\tau}$, then $U \setminus \{\infty\}$ is an open
set of $\tau$.
Proof: Since $(X, \tau)$ is a locally compact Hausdorff space, it follows $(\widetilde{X}, \widetilde{\tau})$ is
a Hausdorff topological space. The only case which needs checking is the one of
$p \in X$ and $\infty$. Since $(X, \tau)$ is locally compact, there exists an open set of $\tau$, $U$
having compact closure which contains $p$. Then $p \in U$ and $\infty \in \overline{U}^C$ and these are
disjoint open sets containing the points, $p$ and $\infty$ respectively. Now let $\mathcal{C}$ be an
open cover of $\widetilde{X}$ with sets from $\widetilde{\tau}$. Then $\infty$ must be in some set, $U_\infty$ from $\mathcal{C}$, which
must contain a set of the form $K^C$ where $K$ is a compact subset of $X$. Then there
exist sets from $\mathcal{C}$, $U_1, \cdots, U_r$ which cover $K$. Therefore, a finite subcover of $\widetilde{X}$ is
$U_1, \cdots, U_r, U_\infty$.
To see the last claim, suppose $U$ contains $\infty$ since otherwise there is nothing to
show. Notice that if $C$ is a compact set, then $X \setminus C$ is an open set. Therefore, if
$x \in U \setminus \{\infty\}$, and if $\widetilde{X} \setminus C$ is a basic open set contained in $U$ containing $\infty$, then
if $x$ is in this basic open set of $\widetilde{X}$, it is also in the open set $X \setminus C \subseteq U \setminus \{\infty\}$. If $x$
is not in any basic open set of the form $\widetilde{X} \setminus C$ then $x$ is contained in an open set of
$\tau$ which is contained in $U \setminus \{\infty\}$. Thus $U \setminus \{\infty\}$ is indeed open in $\tau$. This proves
the lemma.
15.6. CONNECTED SETS 379
Definition 15.5.21 If every finite subset of a collection of sets has nonempty intersection,
the collection has the finite intersection property.
Theorem 15.5.22 Let $\mathcal{K}$ be a set whose elements are compact subsets of a Hausdorff
topological space, $(X, \tau)$. Suppose $\mathcal{K}$ has the finite intersection property. Then
$\emptyset \ne \cap \mathcal{K}$.
Lemma 15.5.23 Let $(X, \tau)$ be a topological space and let $\mathcal{B}$ be a basis for $\tau$. Then
$K$ is compact if and only if every open cover of basic open sets admits a finite
subcover.
$$S = A \cup B, \quad A, B \ne \emptyset, \quad \text{and } \overline{A} \cap B = \overline{B} \cap A = \emptyset.$$
In this case, the sets $A$ and $B$ are said to separate $S$. A set is connected if it is not
separated.
One of the most important theorems about connected sets is the following.
Theorem 15.6.2 Suppose $U$ and $V$ are connected sets having nonempty intersection.
Then $U \cup V$ is also connected.
It follows one of these sets must be empty since otherwise, $U$ would be separated.
It follows that $U$ is contained in either $A$ or $B$. Similarly, $V$ must be contained in
either $A$ or $B$. Since $U$ and $V$ have nonempty intersection, it follows that both $V$
and $U$ are contained in one of the sets, $A, B$. Therefore, the other must be empty
and this shows $U \cup V$ cannot be separated and is therefore, connected.
The intersection of connected sets is not necessarily connected as is shown by
the following picture.
Proof: To do this you show $f(X)$ is not separated. Suppose to the contrary
that $f(X) = A \cup B$ where $A$ and $B$ separate $f(X)$. Then consider the sets, $f^{-1}(A)$
and $f^{-1}(B)$. If $z \in f^{-1}(B)$, then $f(z) \in B$ and so $f(z)$ is not a limit point of
$A$. Therefore, there exists an open set, $U$ containing $f(z)$ such that $U \cap A = \emptyset$.
But then, the continuity of $f$ implies that $f^{-1}(U)$ is an open set containing $z$ such
that $f^{-1}(U) \cap f^{-1}(A) = \emptyset$. Therefore, $f^{-1}(B)$ contains no limit points of $f^{-1}(A)$.
Similar reasoning implies $f^{-1}(A)$ contains no limit points of $f^{-1}(B)$. It follows
that $X$ is separated by $f^{-1}(A)$ and $f^{-1}(B)$, contradicting the assumption that $X$
was connected.
An arbitrary set can be written as a union of maximal connected sets called
connected components. This is the concept of the next definition.
Definition 15.6.4 Let $S$ be a set and let $p \in S$. Denote by $C_p$ the union of all
connected subsets of $S$ which contain $p$. This is called the connected component
determined by $p$.
$$S \equiv \{t \in [x, y] : [x, t] \subseteq A\}$$
$$(l, l + \delta) \cap B = \emptyset$$
Theorem 15.6.7 Let $U$ be an open set in $\mathbb{R}$. Then there exist countably many
disjoint open sets, $\{(a_i, b_i)\}_{i=1}^{\infty}$ such that $U = \cup_{i=1}^{\infty} (a_i, b_i)$.
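When $U$ is given as a finite union of open intervals, the disjoint intervals in the theorem are just the connected components of $U$ and they can be found by sorting and merging. Here is a minimal Python sketch of this finite special case. It is not the proof of the theorem, and the function name is my own.

```python
def disjoint_components(intervals):
    """Merge open intervals (a, b) into the disjoint open intervals of their union."""
    merged = []
    for a, b in sorted(i for i in intervals if i[0] < i[1]):
        if merged and a < merged[-1][1]:
            # (a, b) overlaps the last component, so extend that component.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            # No overlap. Note (0, 2) and (2, 3) stay separate because
            # the point 2 is not in the union of the open intervals.
            merged.append((a, b))
    return merged

components = disjoint_components([(0, 1), (0.5, 2), (2, 3), (5, 6), (5.5, 5.7)])
```

The strict inequality in the overlap test is what keeps the components open and disjoint.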
You can verify that this set of points considered as a metric space with the metric
from $\mathbb{R}^2$ is not locally connected or arcwise connected but is connected.
Proof: Suppose not. Then it achieves two different values, $k$ and $l \ne k$. Then
$\Omega = f^{-1}(l) \cup f^{-1}(\{m \in \mathbb{Z} : m \ne l\})$ and these are disjoint nonempty open sets
which separate $\Omega$. To see they are open, note
$$f^{-1}(\{m \in \mathbb{Z} : m \ne l\}) = f^{-1}\left(\cup_{m \ne l} \left(m - \frac{1}{6}, m + \frac{1}{6}\right)\right)$$
15.7 Exercises
1. Let $V$ be an open set in $\mathbb{R}^n$. Show there is an increasing sequence of
open sets, $\{U_m\}$, such that for all $m \in \mathbb{N}$, $\overline{U_m} \subseteq U_{m+1}$, $\overline{U_m}$ is compact, and
$V = \cup_{m=1}^{\infty} U_m$.
Thus we have two metric spaces here although they involve the same sets of
points. Show the identity map is continuous and has a continuous inverse.
Show that $\mathbb{R}$ with the metric, $\rho$, is not complete while $\mathbb{R}$ with the usual metric
is complete. The first part of this problem shows the two metric spaces are
homeomorphic. (That is what it is called when there is a one to one onto
continuous map having continuous inverse between two topological spaces.)
Thus completeness is not a topological property although it will likely be
referred to as such.
8. Prove the Heine Borel theorem as follows. First show $[a, b]$ is compact in $\mathbb{R}$.
Next show that $\prod_{i=1}^{n} [a_i, b_i]$ is compact. Use this to verify that compact sets
are exactly those which are closed and bounded.
9. Give an example of a metric space in which closed and bounded subsets are
not necessarily compact. Hint: Let $X$ be any infinite set and let $d(x, y) = 1$
if $x \ne y$ and $d(x, y) = 0$ if $x = y$. Show this is a metric space. What about
$B(x, 2)$?
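12. Let $(X, d)$ be a metric space where $d$ is a bounded metric. Let $\mathcal{C}$ denote the
collection of closed subsets of $X$. For $A, B \in \mathcal{C}$, define
13. Using 12, suppose $(X, d)$ is a compact metric space. Show $(\mathcal{C}, \rho)$ is a complete
metric space. Hint: Show first that if $W_n \downarrow W$ where $W_n$ is closed, then
$\rho(W_n, W) \to 0$. Now let $\{A_n\}$ be a Cauchy sequence in $\mathcal{C}$. Then if $\varepsilon > 0$
there exists $N$ such that when $m, n \ge N$, then $\rho(A_n, A_m) < \varepsilon$. Therefore, for
each $n \ge N$,
$$(A_n)_\varepsilon \supseteq \overline{\cup_{k=n}^{\infty} A_k}.$$
Let $A \equiv \cap_{n=1}^{\infty} \overline{\cup_{k=n}^{\infty} A_k}$. By the first part, there exists $N_1 > N$ such that for
$n \ge N_1$,
$$\rho\left(\overline{\cup_{k=n}^{\infty} A_k}, A\right) < \varepsilon, \text{ and } (A_n)_\varepsilon \supseteq \overline{\cup_{k=n}^{\infty} A_k}.$$
$$(A_n)_\varepsilon \supseteq \overline{\cup_{k=n}^{\infty} A_k} \supseteq A.$$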
14. In the situation of the last two problems, let $X$ be a compact metric space.
Show $(\mathcal{C}, \rho)$ is compact. Hint: Let $D_n$ be a $2^{-n}$ net for $X$. Let $\mathcal{K}_n$ denote
finite unions of sets of the form $\overline{B(p, 2^{-n})}$ where $p \in D_n$. Show $\mathcal{K}_n$ is a
$2^{-(n-1)}$ net for $(\mathcal{C}, \rho)$.
16. Explain why $\mathcal{L}(V, W)$ is always a complete normed vector space whenever
$V, W$ are finite dimensional normed vector spaces for any choice of norm for
$\mathcal{L}(V, W)$. Also explain why every closed and bounded subset of $\mathcal{L}(V, W)$ is
sequentially compact for any choice of norm on this space.
17. Let $L \in \mathcal{L}(V, V)$ where $V$ is a finite dimensional normed vector space. Define
$$e^L \equiv \sum_{k=0}^{\infty} \frac{L^k}{k!}$$
Explain the meaning of this infinite sum and show it converges in $\mathcal{L}(V, V)$ for
any choice of norm on this space. Now tell how to define $\sin(L)$.
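The partial sums of this series can be computed directly for a matrix. The Python sketch below does so with plain lists, assuming the usual convention that the series includes the identity term $L^0/0!$. The function names and the choice of 30 terms are illustrative. For the matrix generating rotations of the plane, the limit is known in closed form, which gives something to compare against.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(L, terms=30):
    """Partial sum I + L + L^2/2! + ... + L^(terms-1)/(terms-1)!."""
    n = len(L)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # L^0/0!
    total = [row[:] for row in term]
    for k in range(1, terms):
        # Turn L^(k-1)/(k-1)! into L^k/k! by one multiplication and one division.
        term = [[v / k for v in row] for row in mat_mul(term, L)]
        total = [[t + v for t, v in zip(tr, vr)] for tr, vr in zip(total, term)]
    return total

# L satisfies L^2 = -I, so e^L = cos(1) I + sin(1) L.
L = [[0.0, -1.0], [1.0, 0.0]]
E = mat_exp(L)
```

Splitting the same series into its odd and even powers suggests how $\sin(L)$ should be defined.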
18. Let $X$ be a finite dimensional normed vector space, real or complex. Show
that $X$ is separable. Hint: Let $\{v_i\}_{i=1}^{n}$ be a basis and define a map from $\mathbb{F}^n$ to
$X$, $\theta$, as follows. $\theta\left(\sum_{k=1}^{n} x_k e_k\right) \equiv \sum_{k=1}^{n} x_k v_k$. Show $\theta$ is continuous and has
a continuous inverse. Now let $D$ be a countable dense set in $\mathbb{F}^n$ and consider
$\theta(D)$.
where
$$\|f\| \equiv \sup\{|f(x)| : x \in X\}$$
and
$$\rho_\alpha(f) \equiv \sup\left\{\frac{|f(x) - f(y)|}{|x - y|^{\alpha}} : x, y \in X,\ x \ne y\right\}.$$
Show that $(C^{\alpha}(X; \mathbb{R}^n), \|\cdot\|_{\alpha})$ is a complete normed linear space. This is
called a Hölder space. What would this space consist of if $\alpha > 1$?
21. Let $\{f_n\}_{n=1}^{\infty} \subseteq C^{\alpha}(X; \mathbb{R}^n)$ where $X$ is a compact subset of $\mathbb{R}^p$ and suppose
$$\|f_n\|_{\alpha} \le M$$
for all $n$. Show there exists a subsequence, $n_k$, such that $f_{n_k}$ converges in
$C(X; \mathbb{R}^n)$. The given sequence is precompact when this happens. (This also
shows the embedding of $C^{\alpha}(X; \mathbb{R}^n)$ into $C(X; \mathbb{R}^n)$ is a compact embedding.)
Hint: You might want to use the Ascoli Arzela theorem.
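22. Let $f : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ be continuous and bounded and let $x_0 \in \mathbb{R}^n$. If
$$x : [0, T] \to \mathbb{R}^n$$
$$x' = f(t, x), \quad t \in [0, T]$$
$$x(0) = x_0.$$
$$\{x : |x - x_0| \le r\}$$
where this is the usual norm coming from the dot product. Let $P : \mathbb{R}^n \to D(x_0, r)$
be defined by
$$P(x) \equiv \begin{cases} x & \text{if } x \in D(x_0, r) \\ x_0 + r \dfrac{x - x_0}{|x - x_0|} & \text{if } x \notin D(x_0, r) \end{cases}$$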
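The map $P$ leaves points of the closed ball alone and pulls every other point radially back to the boundary. Here is a short Python sketch of $P$ for vectors stored as lists. The function name is mine, and the sketch illustrates only the definition of $P$, not the rest of the exercise.

```python
import math

def project_to_ball(x, x0, r):
    """The map P: identity on D(x0, r), radial projection onto it elsewhere."""
    diff = [a - b for a, b in zip(x, x0)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm <= r:
        return list(x)
    return [b + r * c / norm for b, c in zip(x0, diff)]

outside = project_to_ball([3.0, 4.0], [0.0, 0.0], 1.0)   # lands on the unit circle
inside = project_to_ball([0.1, 0.2], [0.0, 0.0], 1.0)    # unchanged
```

In the first call the point $(3, 4)$ has norm 5, so it is scaled back to $(3/5, 4/5)$.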
390 MEASURE THEORY AND TOPOLOGY
$$\mu(V \setminus K) < \varepsilon$$
3. Then the measurable sets $\mathcal{S}$ contain the Borel sets and also $\mu$ is inner regular
on every open set and for every $E \in \mathcal{S}$ with $\mu(E) < \infty$.
Proof: First we establish 1 and 2 and use them to establish the last assertion.
Consider 2. Suppose it is not true. Then there exists an open set $V$ having $\mu(V) < \infty$
but for all $K \subseteq V$, $\mu(V \setminus K) \ge \varepsilon$ for some $\varepsilon > 0$. By inner regularity on open
sets, there exists $K_1 \subseteq V$, $K_1$ compact, such that $\mu(K_1) \ge \varepsilon/2$. Now by assumption,
$\mu(V \setminus K_1) \ge \varepsilon$ and so by inner regularity on open sets again, there exists compact
$K_2 \subseteq V \setminus K_1$ such that $\mu(K_2) \ge \varepsilon/2$. Continuing this way, there is a sequence of
disjoint compact sets contained in $V$, $\{K_i\}$ such that $\mu(K_i) \ge \varepsilon/2$.
[Figure: the disjoint compact sets $K_1$, $K_2$, $K_3$, $K_4$ inside $V$.]
Define
$$\mathcal{S}_1 = \{E \in \mathcal{P}(\Omega) : E \cap K \in \mathcal{S}\}$$
for all compact $K$.
First it will be shown the compact sets are in $\mathcal{S}$. From this it will follow the
closed sets are in $\mathcal{S}_1$. Then you show $\mathcal{S}_1 = \mathcal{S}$. Thus $\mathcal{S}_1 = \mathcal{S}$ is a $\sigma$ algebra and so it
contains the Borel sets. Finally you show the inner regularity assertion.
Claim 1: Compact sets are in $\mathcal{S}$.
Proof of claim: Let $V$ be an open set with $\mu(V) < \infty$. I will show that for $C$
compact,
$$\mu(V) \ge \mu(V \setminus C) + \mu(V \cap C).$$
Here is a diagram to help keep things straight.
[Figure: diagram of the sets $H$, $V$, and $C$.]
Since $\varepsilon$ is arbitrary, this shows the compact sets are in $\mathcal{S}$. This proves the claim.
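As discussed above, this verifies the closed sets are in $\mathcal{S}_1$. If $\mathcal{S}_1$ is a $\sigma$ algebra,
this will show that $\mathcal{S}_1$ contains the Borel sets. Thus I first show $\mathcal{S}_1$ is a $\sigma$ algebra.
To see that $\mathcal{S}_1$ is closed with respect to taking complements, let $E \in \mathcal{S}_1$ and $K$
a compact set.
$$K = (E^C \cap K) \cup (E \cap K).$$
Then from the fact, just established, that the compact sets are in $\mathcal{S}$,
$$E^C \cap K = K \setminus (E \cap K) \in \mathcal{S}.$$
$$K \cap \cup_{n=1}^{\infty} E_n = \cup_{n=1}^{\infty} K \cap E_n \in \mathcal{S}$$
Let $E \in \mathcal{S}_1$ and let $V$ be an open set with $\mu(V) < \infty$ and choose $K \subseteq V$ such
that $\mu(V \setminus K) < \varepsilon$. Then since $E \in \mathcal{S}_1$, it follows $E \cap K, E^C \cap K \in \mathcal{S}$ and so
$$\mu(V) \le \mu(V \setminus E) + \mu(V \cap E) \le \overbrace{\mu(K \setminus E) + \mu(K \cap E)}^{\text{The two sets are disjoint and in } \mathcal{S}} + 2\varepsilon = \mu(K) + 2\varepsilon \le \mu(V) + 3\varepsilon$$
$$\mu(V) = \mu(V \setminus E) + \mu(V \cap E)$$
$$\mu(S) = \mu(S \cap E) + \mu(S \setminus E).$$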
(S) + (V ) = (V \ E) + (V E) (S \ E) + (S E).
$$\mu(F) = \sup\{\mu(K) : K \subseteq F\}$$
16.1. BOREL MEASURES 393
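for all $F \in \mathcal{S}$ with $\mu(F) < \infty$. It might help to refer to the following crude picture
to keep things straight. It also might not help. I am not sure.
[Figure: crude picture of the sets $U \setminus F$, $F$, $K$, $V^C$, and $V$.]
Let $\mu(F) < \infty$ and let $U$ be an open set, $U \supseteq F$, $\mu(U) < \infty$. Let $V$ be open,
$V \supseteq U \setminus F$, and
$$\mu(V \setminus (U \setminus F)) < \varepsilon.$$
(This can be obtained as follows, because $\mu$ is a measure on $\mathcal{S}$.
$$\mu(V) = \mu(U \setminus F) + \mu(V \setminus (U \setminus F))$$
Thus from the outer regularity of $\mu$, 1 above, there exists $V$ such that it contains
$U \setminus F$ and
$$\mu(U \setminus F) + \varepsilon > \mu(V),$$
and so
$$\mu(V \setminus (U \setminus F)) = \mu(V) - \mu(U \setminus F) < \varepsilon.)$$
Also,
$$V \setminus (U \setminus F) = V \cap \left(U \cap F^C\right)^C = V \cap \left[U^C \cup F\right] = (V \cap F) \cup \left(V \cap U^C\right) \supseteq V \cap F$$
and so
$$\mu(V \cap F) \le \mu(V \setminus (U \setminus F)) < \varepsilon.$$
Since $V \supseteq U \cap F^C$, $V^C \subseteq U^C \cup F$ so $U \cap V^C \subseteq U \cap F = F$. Hence $U \cap V^C$ is a
$$\mu(F) = \mu(V \cap F) + \mu(F \setminus V) < \varepsilon + \mu(F \setminus V) \le \varepsilon + \mu(U \cap V^C) \le 2\varepsilon + \mu(K \cap V^C).$$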
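A measure space, $(\Omega, \mathcal{S}, \mu)$ is $\sigma$ finite if there exist measurable sets, $\Omega_i$ with $\mu(\Omega_i) < \infty$
and $\Omega = \cup_{i=1}^{\infty} \Omega_i$.
Lemma 16.1.6 If $(\Omega, \mathcal{S}, \mu)$ is $\sigma$ finite then there exist disjoint measurable sets,
$\{B_n\}$ such that $\mu(B_n) < \infty$ and $\cup_{n=1}^{\infty} B_n = \Omega$.
The following lemma deals with the outer measure generated by a measure
which is $\sigma$ finite. It says that if the given measure is $\sigma$ finite and complete then no
new measurable sets are gained by going to the induced outer measure and then
considering the measurable sets in the sense of Caratheodory.
Lemma 16.1.7 Let $(\Omega, \mathcal{S}, \mu)$ be any measure space and let $\overline{\mu} : \mathcal{P}(\Omega) \to [0, \infty]$ be
the outer measure induced by $\mu$. Then $\overline{\mu}$ is an outer measure as claimed and if $\overline{\mathcal{S}}$
is the set of $\overline{\mu}$ measurable sets in the sense of Caratheodory, then $\overline{\mathcal{S}} \supseteq \mathcal{S}$ and $\overline{\mu} = \mu$
on $\mathcal{S}$. Furthermore, if $\mu$ is $\sigma$ finite and $(\Omega, \mathcal{S}, \mu)$ is complete, then $\overline{\mathcal{S}} = \mathcal{S}$.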
(S) > (T ) = (T E) + (T \ E)
(T E) + (T \ E)
(S E) + (S \ E) .
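Next consider the claim about not getting any new sets from the outer measure
in the case the measure space is $\sigma$ finite and complete.
Suppose first $F \in \overline{\mathcal{S}}$ and $\overline{\mu}(F) < \infty$. Then there exists $E \in \mathcal{S}$ such that $E \supseteq F$
and $\mu(E) = \overline{\mu}(F)$. Since $\overline{\mu}(F) < \infty$,
$$\overline{\mu}(E \setminus F) = \overline{\mu}(E) - \overline{\mu}(F) = 0.$$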
16.2. REGULAR MEASURES 395
E = (E \ F ) F
If the measure is both outer and inner regular, it is called regular. Such a measure,
if it is complete, is referred to as a Radon measure.
Theorem 16.2.2 Let $\Omega$ be a separable complete metric space, and let $\mu$ be a finite
measure ($\mu(\Omega) < \infty$) defined on the Borel sets $\mathcal{B}(\Omega)$. Then $\mu$ is regular.
First I will show that $\mu$ is almost regular. This will not require completeness of the
metric space. I will also say $\mu$ is almost regular on $A$ if the above conditions hold for
$A$.
Claim 1: $\mu$ is almost regular.
because
$$H = \cap_{n=1}^{\infty} \left\{x : \operatorname{dist}(x, H) < \frac{1}{n}\right\}.$$
In words, $\mu$ is almost regular on open and closed sets. Let
It follows that since is a nite measure, then for all n large enough,
( )
j=1 Fj \ j=1 Hj <
n
( )
j=1 Vj \
j=1 Fj (Vj \ Fj ) <
j=1
Thus G is a algebra. It contains the open sets because is inner almost regular on
open sets and obviously outer regular on an open set. Therefore G contains B ().
Next we use the separability and completeness of the metric space to go from
almost regular to regular.
Claim 2: Let $C$ be a closed set. Then
$$\mu(C) = \sup \{ \mu(K) : K \subseteq C \text{ and } K \text{ is compact} \}.$$
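Proof: Let $\{a_k\}_{k=1}^{\infty}$ be a dense subset of $\Omega$. Thus $\left\{ B\left(a_k, \frac{1}{n}\right) \right\}_{k,n}$ cover $C$. It follows there exists $m_n$ such that
$$\mu\left( C \setminus \cup_{k=1}^{m_n} B\left(a_k, \frac{1}{n}\right) \right) \equiv \mu(C \setminus C_n) < \frac{\varepsilon}{2^n}.$$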
Let
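$$K \equiv C \cap \left( \cap_{n=1}^{\infty} C_n \right)$$
$$\mu(C \setminus K) = \mu\left( \cup_{n=1}^{\infty} (C \setminus C_n) \right) \le \sum_{n=1}^{\infty} \mu(C \setminus C_n) < \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon.$$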
Corollary 16.2.3 Let $\Omega$ be a complete metric space in which the closures of balls are compact, $\Omega \neq \emptyset$. Also let $\mu$ be a Borel measure which is finite on compact sets. Then $\mu$ must be regular.
$$A_n \equiv B(x_0, n) \setminus B(x_0, n - 1),$$
$x_0 \in \Omega$, and let
$$B_n = B(x_0, n + 1) \setminus \overline{B(x_0, n - 2)}$$
Thus the $A_n$ are disjoint and have union equal to $\Omega$ and the $B_n$ are open sets having finite measure which contain the respective $A_n$. Also, for $E \subseteq A_n$,
$$\mu(E) = \mu_{B_n}(E)$$
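By the above theorem, each $\mu_{B_n}$ is regular. Let $E$ be any Borel set with $l < \mu(E)$. Then for $n$ large enough,
$$l < \sum_{k=1}^{n} \mu(E \cap A_k) = \sum_{k=1}^{n} \mu_{B_k}(E \cap A_k)$$
$$\mu(K) = \sum_{k=1}^{n} \mu(K_k) = \sum_{k=1}^{n} \mu_{B_k}(K_k) > r \sum_{k=1}^{n} \mu_{B_k}(E \cap A_k) > l$$
It follows that
$$\mu(E) + \varepsilon > \mu(V)$$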
Lemma 16.3.2 Let $X$ be a locally compact Hausdorff space, and let $K$ be a compact subset of the open set $V$. Then there exists an open set $U$ such that $\overline{U}$ is compact and $K \subseteq U \subseteq \overline{U} \subseteq V$.
Proof: To begin with, here is a claim. This claim is obvious in the case of a
metric space but requires some proof in this more general case.
Claim: If $k \in K$ then there exists an open set $U_k$ containing $k$ such that $\overline{U_k}$ is contained in $V$.
Proof of claim: Since $X$ is locally compact, there exists a basis of open sets whose closures are compact, $\mathcal{U}$. Denote by $\mathcal{C}$ the set of all $U \in \mathcal{U}$ which contain
16.3. LOCALLY COMPACT HAUSDORFF SPACES 399
$k$ and let $\mathcal{C}'$ denote the set of all closures of these sets of $\mathcal{C}$ intersected with the closed set $V^C$. Thus $\mathcal{C}'$ is a collection of compact sets. I will argue that there are finitely many of the sets of $\mathcal{C}'$ which have empty intersection. If not, then $\mathcal{C}'$ has the finite intersection property and so there exists a point $p$ in all of them. Since $X$ is a Hausdorff space, there exist disjoint basic open sets from $\mathcal{U}$, $A$, $B$ such that $k \in A$ and $p \in B$. Therefore, $p \notin \overline{A}$ contrary to the above requirement that $p$ be in all such sets. It follows there are sets $A_1, \cdots, A_m$ in $\mathcal{C}$ such that
$$V^C \cap \overline{A_1} \cap \cdots \cap \overline{A_m} = \emptyset$$
$$U = \cup_{i=1}^{r} U_{k_i}$$
so it follows
$$\overline{U} = \cup_{i=1}^{r} \overline{U_{k_i}}$$
and so $K \subseteq U \subseteq \overline{U} \subseteq V$ and $\overline{U}$ is a compact set.
Urysohn's lemma is a fundamental result. This lemma really applies to normal topological spaces but we will need a version of it which holds for locally compact Hausdorff spaces.
Theorem 16.3.3 (Urysohn) Let $(X, \tau)$ be a locally compact Hausdorff space and let $K \subseteq V$ where $K$ is compact and $V$ is open. Then there exists $g : X \to [0, 1]$ such that $g$ is continuous, $g(x) = 1$ on $K$ and there exists an open set $U$ having compact closure such that $g(x) = 0$ if $x \notin U$ and $K \subseteq U \subseteq \overline{U} \subseteq V$.
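Proof: By Lemma 16.3.2, there exists an open set $U$ having compact closure which also contains $K$ such that $\overline{U} \subseteq V$. Let $D \equiv \{r_n\}_{n=1}^{\infty}$ be the rational numbers in $(0, 1)$. Using Lemma 16.3.2 again, choose $V_{r_1}$ an open set such that
$$K \subseteq V_{r_1} \subseteq \overline{V}_{r_1} \subseteq U.$$
Suppose $V_{r_1}, \cdots, V_{r_k}$ have been chosen, and list the rational numbers
$$r_1, \cdots, r_k$$
in order,
$$r_{l_1} < r_{l_2} < \cdots < r_{l_k} \text{ for } \{l_1, \cdots, l_k\} = \{1, \cdots, k\}.$$
If $r_{k+1} > r_{l_k}$ then letting $p = r_{l_k}$, let $V_{r_{k+1}}$ satisfy
$$\overline{V}_p \subseteq V_{r_{k+1}} \subseteq \overline{V}_{r_{k+1}} \subseteq U.$$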
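If $r_{k+1} \in (r_{l_i}, r_{l_{i+1}})$, let $p = r_{l_i}$ and let $q = r_{l_{i+1}}$. Then let $V_{r_{k+1}}$ satisfy
$$\overline{V}_p \subseteq V_{r_{k+1}} \subseteq \overline{V}_{r_{k+1}} \subseteq V_q.$$
If $r_{k+1} < r_{l_1}$, let $p = r_{l_1}$ and let $V_{r_{k+1}}$ satisfy
$$K \subseteq V_{r_{k+1}} \subseteq \overline{V}_{r_{k+1}} \subseteq V_p.$$
Thus there exist open sets $V_r$ for each $r \in \mathbb{Q} \cap (0, 1)$ with the property that if $r < s$ are two rational numbers,
$$K \subseteq V_r \subseteq \overline{V}_r \subseteq V_s \subseteq \overline{V}_s \subseteq U.$$
Now let
$$f(x) = \min\left( \inf\{t \in D : x \in V_t\}, 1 \right), \quad f(x) \equiv 1 \text{ if } x \notin \cup_{t \in D} V_t.$$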
Theorem 16.3.5 (Urysohn's lemma for metric space) Let $H$ be a closed subset of an open set $U$ in a metric space, $(X, d)$. Then there exists a continuous function, $g : X \to [0, 1]$ such that $g(x) = 1$ for all $x \in H$ and $g(x) = 0$ for all $x \notin U$.
Proof: If $x \notin C$, a closed set, then $\operatorname{dist}(x, C) > 0$ because if not, there would exist a sequence of points of $C$ converging to $x$ and it would follow that $x \in C$. Therefore, $\operatorname{dist}(x, H) + \operatorname{dist}(x, U^C) > 0$ for all $x \in X$. Now define a continuous function $g$ as
$$g(x) \equiv \frac{\operatorname{dist}(x, U^C)}{\operatorname{dist}(x, H) + \operatorname{dist}(x, U^C)}.$$
It is easy to see this verifies the conclusions of the theorem.
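The formula for $g$ can be checked numerically in a simple case. The following is a minimal sketch, assuming $X = \mathbb{R}$ with the usual metric, $H = [0, 1]$ and $U = (-1, 2)$; these sets and all function names are illustrative choices, not part of the text.

```python
def dist_to_closed_interval(x, lo, hi):
    """Distance from the point x to the closed interval [lo, hi]."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0


def dist_to_complement(x, lo, hi):
    """Distance from x to the closed set of all points outside (lo, hi)."""
    if x <= lo or x >= hi:
        return 0.0
    return min(x - lo, hi - x)


def urysohn(x, H=(0.0, 1.0), U=(-1.0, 2.0)):
    """g(x) = dist(x, U^C) / (dist(x, H) + dist(x, U^C)) for H = [0, 1], U = (-1, 2)."""
    d_H = dist_to_closed_interval(x, *H)
    d_UC = dist_to_complement(x, *U)
    # The denominator is positive because no point lies in both H and U^C.
    return d_UC / (d_H + d_UC)
```

With these assumed sets, the function equals 1 on $[0, 1]$, equals 0 outside $(-1, 2)$, and interpolates in between, for instance it takes the value 1/2 at $x = -0.5$.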
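where $\Omega$ denotes the whole topological space considered. Also for $\phi \in C_c(\Omega)$, $K \prec \phi$ if
$$\phi(\Omega) \subseteq [0, 1] \text{ and } \phi(K) = 1,$$
and $\phi \prec V$ if
$$\phi(\Omega) \subseteq [0, 1] \text{ and } \operatorname{spt}(\phi) \subseteq V.$$
$$K \subseteq V = \cup_{i=1}^{n} V_i, \quad V_i \text{ open}.$$
for all $x \in K$.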
$$W_i \subseteq U_i \subseteq V_i$$
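If $x$ is such that $\sum_{j=1}^{n} \phi_j(x) = 0$, then $x \notin \cup_{i=1}^{n} \overline{U_i}$. Consequently $\gamma(y) = 0$ for all $y$ near $x$ and so $\psi_i(y) = 0$ for all $y$ near $x$. Hence $\psi_i$ is continuous at such $x$. If $\sum_{j=1}^{n} \phi_j(x) \neq 0$, this situation persists near $x$ and so $\psi_i$ is continuous at such points. Therefore $\psi_i$ is continuous. If $x \in K$, then $\gamma(x) = 1$ and so $\sum_{j=1}^{n} \psi_j(x) = 1$. Clearly $0 \le \psi_i(x) \le 1$ and $\operatorname{spt}(\psi_j) \subseteq V_j$.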
The following corollary won't be needed immediately but is very interesting just the same.
Proof: Keep $V_i$ the same but replace $V_j$ with $\widetilde{V_j} \equiv V_j \setminus H$. Now in the proof above, applied to this modified collection of open sets, if $j \neq i$, $\phi_j(x) = 0$ whenever $x \in H$. Therefore, $\psi_i(x) = 1$ on $H$.
and if $Lf \ge 0$ whenever $f \ge 0$.
Example 16.4.2 Let $\Omega$ be $\mathbb{N}$, the positive integers, with the usual metric space topology $|x - y| \equiv d(x, y)$. Then every function defined on $\Omega$ is continuous. Thus $C_c(\Omega)$ consists of those functions $f$ which vanish for all $n$ large enough. For such functions, let $Lf \equiv \sum_{k=1}^{\infty} f(k)$ where the sum is well defined because all but finitely many terms equal 0.
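The functional of Example 16.4.2 is easy to experiment with. Here is a small sketch, assuming a finitely supported function on the positive integers is stored as a dictionary from integers to values (absent keys mean the value 0); the representation and the helper names `add` and `scale` are illustrative assumptions.

```python
def L(f):
    """Lf = sum over k of f(k); finite because f has only finitely many nonzero values."""
    return sum(f.values())


def add(f, g):
    """Pointwise sum of two finitely supported functions."""
    return {k: f.get(k, 0) + g.get(k, 0) for k in set(f) | set(g)}


def scale(c, f):
    """Pointwise multiple c * f."""
    return {k: c * v for k, v in f.items()}


f = {1: 2, 3: 5}   # f(1) = 2, f(3) = 5, zero elsewhere
g = {3: 1, 7: 4}   # g(3) = 1, g(7) = 4, zero elsewhere
```

In this sketch `L(add(f, g))` agrees with `L(f) + L(g)`, `L(scale(3, f))` agrees with `3 * L(f)`, and `L` is nonnegative on functions with nonnegative values, which are exactly the properties required of a positive linear functional.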
16.4. POSITIVE LINEAR FUNCTIONALS 403
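Example 16.4.3 Let $\Omega = \mathbb{R}^n$ with the usual metric space topology given by
$$|x - y| \equiv \left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2}.$$
For $f \in C_c(\mathbb{R}^n)$, define
$$Lf \equiv \int \cdots \int f(x_1, \cdots, x_n) \, dx_1 \cdots dx_n,$$
this being the ordinary iterated Riemann integral used in beginning calculus.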
The Riesz representation theorem shows that positive linear functionals of this
sort correspond to measures and the Lebesgue integrals which result extend the
functionals. In the second example, the measure which results will be Lebesgue
measure.
$\mu$ is complete, (16.4.3)
$\mu(K) < \infty$ for all $K$ compact, (16.4.4)
$$\mu(F) = \sup\{\mu(K) : K \subseteq F, K \text{ compact}\},$$
for all $F$ open and for all $F \in \mathcal{S}$ with $\mu(F) < \infty$,
$$\mu(F) = \inf\{\mu(V) : V \supseteq F, V \text{ open}\}$$
The plan is to define an outer measure and then to show that it, together with the $\sigma$ algebra of sets measurable in the sense of Caratheodory, satisfies the conclusions of the theorem. Always, $K$ will be a compact set and $V$ will be an open set.
Proof: First it is necessary to verify $\mu$ is well defined because there are two descriptions of it on open sets. Suppose then that $\mu_1(V) \equiv \inf\{\mu(U) : U \supseteq V \text{ and } U \text{ is open}\}$. It is required to verify that $\mu_1(V) = \mu(V)$ where $\mu$ is given as $\sup\{Lf : f \prec V\}$. If $U \supseteq V$, then $\mu(U) \ge \mu(V)$ directly from the definition. Hence from the definition of $\mu_1$, it follows $\mu_1(V) \ge \mu(V)$. On the other hand, $V \supseteq V$ and so $\mu_1(V) \le \mu(V)$. This verifies $\mu$ is well defined.
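It remains to show that $\mu$ is an outer measure. Let $V = \cup_{i=1}^{\infty} V_i$ and let $f \prec V$. Then $\operatorname{spt}(f) \subseteq \cup_{i=1}^{n} V_i$ for some $n$. Let $\psi_i \prec V_i$, $\sum_{i=1}^{n} \psi_i = 1$ on $\operatorname{spt}(f)$.
$$Lf = \sum_{i=1}^{n} L(f \psi_i) \le \sum_{i=1}^{n} \mu(V_i) \le \sum_{i=1}^{\infty} \mu(V_i).$$
Hence
$$\mu(V) \le \sum_{i=1}^{\infty} \mu(V_i)$$
since $f \prec V$ is arbitrary. Now let $E = \cup_{i=1}^{\infty} E_i$. Is $\mu(E) \le \sum_{i=1}^{\infty} \mu(E_i)$? Without loss of generality, it can be assumed $\mu(E_i) < \infty$ for each $i$ since if not so, there is nothing to prove. Let $V_i \supseteq E_i$ with $\mu(E_i) + \varepsilon 2^{-i} > \mu(V_i)$.
$$\mu(E) \le \mu\left( \cup_{i=1}^{\infty} V_i \right) \le \sum_{i=1}^{\infty} \mu(V_i) \le \varepsilon + \sum_{i=1}^{\infty} \mu(E_i).$$
Since $\varepsilon$ was arbitrary, $\mu(E) \le \sum_{i=1}^{\infty} \mu(E_i)$.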
[Figure: the compact set $K$ contained in the open set $V_\alpha$, on which $g > \alpha$.]
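Then $h \le 1$ on $V_\alpha$ while $g \alpha^{-1} \ge 1$ on $V_\alpha$ and so $g \alpha^{-1} \ge h$ which implies $L(g \alpha^{-1}) \ge Lh$ and that therefore, since $L$ is linear,
$$Lg \ge \alpha Lh.$$
$$Lg \ge \alpha \mu(V_\alpha) \ge \alpha \mu(K).$$
Letting $\alpha \uparrow 1$ yields $Lg \ge \mu(K)$. This proves the first part of the lemma. The second assertion follows from this and Theorem 16.3.3. If $K$ is given, let
$$K \prec g \prec \Omega$$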
$$A \subseteq U_1, \quad B \subseteq V_1$$
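From Lemma 16.4.7 $\mu(A \cup B) < \infty$ and so there exists an open set $W$ such that
$$W \supseteq A \cup B, \quad \mu(A \cup B) + \varepsilon > \mu(W).$$
$$U \supseteq A, \quad V \supseteq B, \quad U \cap V = \emptyset, \text{ and } \mu(A \cup B) + \varepsilon \ge \mu(W) \ge \mu(U \cup V).$$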
Lemma 16.4.9 Let $f \in C_c(\Omega)$, $f(\Omega) \subseteq [0, 1]$. Then $\mu(\operatorname{spt}(f)) \ge Lf$. Also, every open set $V$ satisfies
$$\mu(V) = \sup\{\mu(K) : K \subseteq V\}.$$
$$\operatorname{spt}(f) \subseteq V$$
Finally, let $V$ be open and let $l < \mu(V)$. Then from the definition of $\mu$, there exists $f \prec V$ such that $L(f) > l$. Therefore, $l < \mu(\operatorname{spt}(f)) \le \mu(V)$ and so this shows the claim about inner regularity of the measure on an open set.
This has now verified the conditions of Lemma 16.1.4. It follows $\mu$ is inner regular on sets of finite measure and outer regular on all sets, also that the $\sigma$ algebra of measurable sets contains the Borel sets.
It remains to show $\mu$ satisfies 16.4.5.
Lemma 16.4.10 $\int f \, d\mu = Lf$ for all $f \in C_c(\Omega)$.
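Proof: Let $f \in C_c(\Omega)$, $f$ real-valued, and suppose $f(\Omega) \subseteq [a, b]$. Choose $t_0 < a$ and let $t_0 < t_1 < \cdots < t_n = b$, $t_i - t_{i-1} < \varepsilon$. Let
$$h_i \prec V_i, \quad \sum_{i=1}^{n} h_i(x) = 1 \text{ on } \operatorname{spt}(f).$$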
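$$Lf = L\left( \sum_{i=1}^{n} f h_i \right) \le L\left( \sum_{i=1}^{n} h_i (t_i + \varepsilon) \right) = \sum_{i=1}^{n} (t_i + \varepsilon) L(h_i)$$
$$= \sum_{i=1}^{n} (|t_0| + t_i + \varepsilon) L(h_i) - |t_0| L\left( \sum_{i=1}^{n} h_i \right).$$
Now note that $|t_0| + t_i + \varepsilon \ge 0$ and so from the definition of $\mu$ and Lemma 16.4.7, this is no larger than
$$\sum_{i=1}^{n} (|t_0| + t_i + \varepsilon) \mu(V_i) - |t_0| \mu(\operatorname{spt}(f))$$
$$\le \sum_{i=1}^{n} (|t_0| + t_i + \varepsilon) \left( \mu(E_i) + \varepsilon/n \right) - |t_0| \mu(\operatorname{spt}(f))$$
$$\le |t_0| \overbrace{\sum_{i=1}^{n} \mu(E_i)}^{\mu(\operatorname{spt}(f))} + |t_0| \varepsilon + \sum_{i=1}^{n} t_i \mu(E_i) + \varepsilon (|t_0| + |b|) + \sum_{i=1}^{n} t_i \frac{\varepsilon}{n} + \varepsilon \sum_{i=1}^{n} \mu(E_i) + \varepsilon^2 - |t_0| \mu(\operatorname{spt}(f)).$$
From 16.4.7 and 16.4.6, the first and last terms cancel. Therefore this is no larger than
$$(2|t_0| + |b| + \mu(\operatorname{spt}(f)) + \varepsilon) \varepsilon + \sum_{i=1}^{n} t_{i-1} \mu(E_i) + \varepsilon \mu(\operatorname{spt}(f)) + (|t_0| + |b|) \varepsilon$$
$$\le \int f \, d\mu + (2|t_0| + |b| + 2\mu(\operatorname{spt}(f)) + \varepsilon) \varepsilon + (|t_0| + |b|) \varepsilon$$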
Thus $\mu_1(K) \le \mu_2(K)$ for all $K$. Similarly, the inequality can be reversed and so it follows the two measures are equal on compact sets. By the assumption of inner regularity on open sets, the two measures are also equal on all open sets. By outer regularity, they are equal on all sets of $\mathcal{S}$.
An important example of a locally compact Hausdorff space is any metric space in which the closures of balls are compact. For example, $\mathbb{R}^n$ with the usual metric is an example of this. Not surprisingly, more can be said in this important special case.
Theorem 16.4.11 Let $(\Omega, \tau)$ be a metric space in which the closures of the balls are compact and let $L$ be a positive linear functional defined on $C_c(\Omega)$. Then there exists a measure representing the positive linear functional which satisfies all the conclusions of Theorem 16.3.3 and in addition the property that $\mu$ is regular. The same conclusion follows if $(\Omega, \tau)$ is a compact Hausdorff space.
Proof: Let $\mu$ and $\mathcal{S}$ be as described in Theorem 16.4.4. The outer regularity comes automatically as a conclusion of Theorem 16.4.4. It remains to verify inner regularity. Let $F \in \mathcal{S}$ and let $l < k < \mu(F)$. Now let $z \in \Omega$ and $\Omega_n = B(z, n)$ for $n \in \mathbb{N}$. Thus $F \cap \Omega_n \uparrow F$. It follows that for $n$ large enough,
$$k < \mu(F \cap \Omega_n) \le \mu(F).$$
Therefore, taking the infimum over all $V$ containing $K$ implies $\mu_1(K) \le \mu_2(K)$. Reversing the argument shows $\mu_1(K) = \mu_2(K)$. This also implies the two measures are equal on all open sets because they are both inner regular on open sets. It is being assumed the two measures are regular. Now let $F \in \mathcal{S}_1$ with $\mu_1(F) < \infty$. Then there exist sets $H$, $G$ such that $H \subseteq F \subseteq G$, $H$ is the countable union of compact sets and $G$ is a countable intersection of open sets such that $\mu_1(G) = \mu_1(H)$, which implies $\mu_1(G \setminus H) = 0$. Now $G \setminus H$ can be written as the countable intersection of sets of the form $V_k \setminus K_k$ where $V_k$ is open, $\mu_1(V_k) < \infty$ and $K_k$ is compact. From what was just shown, $\mu_2(V_k \setminus K_k) = \mu_1(V_k \setminus K_k)$ so it follows $\mu_2(G \setminus H) = 0$ also. Since $\mu_2$ is complete, and $G$ and $H$ are in $\mathcal{S}_2$, it follows $F \in \mathcal{S}_2$ and $\mu_2(F) = \mu_1(F)$. Now for arbitrary $F$ possibly having $\mu_1(F) = \infty$, consider
Now let $k \in \mathbb{N}$ and $B_k = B(0, k)$. It was just shown that for $E$ Borel,
(Bk E) = (Bk E) .
where Ekn F . By the outer regularity of , there exists a Borel set Fkn Ekn such
that (Fkn ) = (Ekn ). In fact Fkn can be assumed to be a G set. Let
$$t_n(\omega) \equiv \sum_{k=1}^{P_n} c_k^n \mathcal{X}_{F_k^n}(\omega).$$
Lemma 16.5.2 Every open set in $\mathbb{R}^n$ is the countable disjoint union of half open boxes of the form
$$\prod_{i=1}^{n} (a_i, a_i + 2^{-k}]$$
where $a_i = l 2^{-k}$ for some integers, $l$, $k$. The sides of these boxes are of equal length. One could also have half open boxes of the form
$$\prod_{i=1}^{n} [a_i, a_i + 2^{-k})$$
Proof: Let
$$\mathcal{C}_k = \{\text{All half open boxes } \prod_{i=1}^{n} (a_i, a_i + 2^{-k}] \text{ where}$$
p Bk B .
Hence B is the desired countable disjoint collection of half open boxes whose union
is $U$. The last assertion about the other type of half open rectangle is obvious.
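The construction behind Lemma 16.5.2 can be tried out in one dimension. The sketch below is an illustration under stated assumptions rather than the proof itself: it works only with an open interval $(a, b)$ in $\mathbb{R}$, stops at a finite level `max_level`, and uses exact rational arithmetic; the function name and the greedy level-by-level selection are choices made for this example.

```python
from fractions import Fraction
from math import floor


def dyadic_intervals(a, b, max_level):
    """Disjoint intervals (l/2^k, (l+1)/2^k], k <= max_level, contained in (a, b).

    Each returned pair (left, right) stands for the half open interval (left, right].
    """
    a, b = Fraction(a), Fraction(b)
    chosen = []
    for k in range(max_level + 1):
        step = Fraction(1, 2 ** k)
        l = floor(a / step)  # index of the first candidate at this level
        while l * step < b:
            left, right = l * step, (l + 1) * step
            # (left, right] lies inside the open interval (a, b)
            inside = a <= left and right < b
            # dyadic intervals are nested or disjoint, so containment is the only overlap
            already_covered = any(cl <= left and right <= cr for cl, cr in chosen)
            if inside and not already_covered:
                chosen.append((left, right))
            l += 1
    return chosen
```

For the interval $(0, 1)$ and `max_level = 3` this selects $(0, 1/2]$, $(1/2, 3/4]$ and $(3/4, 7/8]$, of total length $7/8$; letting the level grow without bound exhausts the open interval, which is the content of the lemma in this special case.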
Now what does Lebesgue measure do to a rectangle, $\prod_{i=1}^{n} (a_i, b_i]$?
Lemma 16.5.3 Let $R = \prod_{i=1}^{n} [a_i, b_i]$, $R_0 = \prod_{i=1}^{n} (a_i, b_i)$. Then
$$m_n(R_0) = m_n(R) = \prod_{i=1}^{n} (b_i - a_i).$$
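for $i = 1, \cdots, n$ and consider functions $g_i^k$ and $f_i^k$ having the following graphs.
[Figure: graphs of $f_i^k$ and $g_i^k$ over $\mathbb{R}$, each with maximum value 1; the horizontal axis is marked at $a_i$, $a_i + \frac{1}{k}$, $b_i - \frac{1}{k}$, $b_i$ for $f_i^k$ and at $a_i - \frac{1}{k}$, $a_i$, $b_i$, $b_i + \frac{1}{k}$ for $g_i^k$.]
Let
$$g^k(x) = \prod_{i=1}^{n} g_i^k(x_i), \quad f^k(x) = \prod_{i=1}^{n} f_i^k(x_i).$$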
n
f k dmn = f k (bi ai 2/k).
i=1
and so
$$m_n(H) = m_n(B(0, R)) - m_n(B(0, R) \setminus H) = m_n(H + x)$$
Therefore, $m_n(x + H) = m_n(H)$ as claimed. If $H$ is not bounded, consider $H_m \equiv B(0, m) \cap H$. Then $m_n(x + H_m) = m_n(H_m)$. Passing to the limit as $m \to \infty$ yields the result in general.
$$m_n(E) = m_n(x + E)$$
Proof: Suppose $m_n(E) < \infty$. By regularity of the measure, there exist sets $G$, $H$ such that $G$ is a countable intersection of open sets, $H$ is a countable union of compact sets, $m_n(G \setminus H) = 0$, and $G \supseteq E \supseteq H$. Now $m_n(G) = m_n(G + x)$ and $m_n(H) = m_n(H + x)$ which follows from Lemma 16.5.4 applied to the sets which are either intersected to form $G$ or unioned to form $H$. Now
$$x + H \subseteq x + E \subseteq x + G$$
and both $x + H$ and $x + G$ are measurable because they are either countable unions or countable intersections of measurable sets. Furthermore,
$$m_n(x + G \setminus x + H) = m_n(x + G) - m_n(x + H) = m_n(G) - m_n(H) = 0$$
16.6. CHANGE OF VARIABLES 413
$$m_n(E) = m_n(H) = m_n(x + H) \le m_n(x + E) \le m_n(x + G) = m_n(G) = m_n(E).$$
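[Figure: the set $L(Q)$ drawn against the $e_k$ (horizontal) and $e_j$ (vertical) axes, divided into the parts $S_1$ and $S_2$.]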
Denote by $S_1$ those points of $L(Q)$ such that also $x_j < 1$. This is a Borel set because it is the intersection of an open set with a Borel set. Let $S_2$ denote those points of $L(Q)$ for which $x_j \ge 1$. Again, this is the intersection of Borel sets and is therefore Borel. Then, as suggested by the picture, it follows from translation invariance of Lebesgue measure, proved above, and elementary geometry, that
$$m_n(L(Q)) = m_n(S_1) + m_n(S_2) = m_n(S_1) + m_n(S_2 - e_j) = m_n(Q) = |\det(L)| \, m_n(Q)$$
Now consider a half open rectangle $R$ having all equal sides of length $2^{-m}$, of the sort described in Lemma 16.5.2. Then there exists a vector $v$ and an integer $m$ such that
$$(R - v) = 2^{-m} Q$$
Hence, from translation invariance and the formula $\det(aL) = a^n \det(L)$,
$$m_n(L(R)) = m_n(L(R - v)) = m_n\left( 2^{-m} LQ \right) = 2^{-mn} |\det(L)| \, m_n(Q) = |\det(L)| \, m_n(R)$$
It follows from Lemma 16.5.2 that whenever $L$ is an elementary matrix and $V$ is open,
$$m_n(LV) = |\det(L)| \, m_n(V) \quad (16.6.10)$$
16.6.10 is also true for any elementary $L$ if $V$ is replaced with a compact set $K$. Let $K$ be compact and contained in an open set $V$ having finite measure. Then
$$m_n(LK) + |\det(L)| \, m_n(V \setminus K) = m_n(LK) + m_n(L(V \setminus K)) = m_n(LV) = |\det(L)| \, m_n(V),$$
and so
$$m_n(LK) = |\det(L)| \, m_n(V) - |\det(L)| \, m_n(V \setminus K) = |\det(L)| \, m_n(K).$$
Now let $E$ be an arbitrary Lebesgue measurable set with $m_n(E) < \infty$. Then there exists $F$, a countable union of an increasing sequence of compact sets $\{K_k\}$ contained in $E$, and $G$, a countable intersection of a decreasing sequence $\{V_k\}$ of open sets containing $E$, such that
$$m_n(G \setminus F) = 0, \quad F \subseteq E \subseteq G$$
16.7. FUBINIS THEOREM 415
Then it also follows from the above that $L(G \setminus F)$ is Lebesgue measurable because it is the intersection of open sets $\{L(V_k \setminus K_k)\}$ and that
$$m_n(L(G \setminus F)) = \lim_{k \to \infty} m_n(L(V_k \setminus K_k)) = |\det(L)| \lim_{k \to \infty} m_n(V_k \setminus K_k) = |\det(L)| \, m_n(G \setminus F) = 0$$
It follows that $L(F) \subseteq L(E) \subseteq L(G)$ and the two ends are Lebesgue measurable with
$$m_n(L(G \setminus F)) = m_n(LG \setminus LF) = 0,$$
and so $L(E)$ is Lebesgue measurable. Also the desired formula must hold for $G$ and $F$ and therefore,
$$|\det(L)| \, m_n(E) = |\det(L)| \, m_n(F) = m_n(L(F)) \le m_n(L(E)) \le m_n(L(G)) = |\det(L)| \, m_n(G) = |\det(L)| \, m_n(E)$$
and so all the inequalities are equal signs and
$$|\det(L)| \, m_n(E) = m_n(L(E)).$$
If $E$ is an arbitrary Lebesgue measurable set, apply the above result to $E \cap B(0, k)$ and then let $k \to \infty$. Since every invertible matrix is a product of elementary matrices, it follows that the above formula holds for any invertible $L$.
Can we relax the requirement that $L$ be invertible? If $L$ is an arbitrary $n \times n$ matrix, then there are elementary matrices $E_j$ such that $L = E_1 \cdots E_r R$ where $R$ is in row reduced echelon form. If $R = I$, then this just exhibits $L$ as a product of elementary matrices. Otherwise, $R$ maps $\mathbb{R}^n$ to $\operatorname{span}(e_1, \cdots, e_{n-1})$ which is clearly a closed set of Lebesgue measure 0.
Consider this second case that $L$ is not invertible. Then if $E$ is any Lebesgue measurable set, $LE \subseteq E_1 \cdots E_r \left( \mathbb{R}^{n-1} \right)$ and this second set is closed with Lebesgue measure zero because, from the above,
$$m_n\left( E_1 \cdots E_r \left( \mathbb{R}^{n-1} \right) \right) = \prod_{i=1}^{r} |\det(E_i)| \, m_n\left( \mathbb{R}^{n-1} \right) = 0.$$
Therefore $LE$, being a subset of this set of measure zero, is also Lebesgue measurable and has measure 0. Also $m_n(E) |\det(L)| = 0$ so the formula continues to hold for all $L$, invertible or not.
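The formula $m_n(L(E)) = |\det(L)| \, m_n(E)$ can be sanity checked in the plane. The following is a minimal sketch, assuming $n = 2$ and $E$ the unit square, whose image under a linear map is a parallelogram; the area is computed with the shoelace formula, and all function names are illustrative.

```python
def apply(L, p):
    """Apply the 2 x 2 matrix L (given as two rows) to the point p."""
    (a, b), (c, d) = L
    x, y = p
    return (a * x + b * y, c * x + d * y)


def polygon_area(points):
    """Area of a polygon with vertices listed in cyclic order (shoelace formula)."""
    total = 0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def det2(L):
    """Determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = L
    return a * d - b * c


def image_area_of_unit_square(L):
    """Area of L(Q) where Q is the unit square."""
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return polygon_area([apply(L, p) for p in corners])
```

In this sketch a shear such as `[[1, 5], [0, 1]]` leaves the area equal to 1, a row swap also gives 1, and a singular matrix such as `[[1, 2], [2, 4]]` collapses the square to a segment of area 0, matching the non-invertible case discussed above.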
$dx_{i_j} = dm_1$.
Proof: First, why does it make sense? It obviously makes sense for any $E$ of the form
$$E = \prod_{i=1}^{n} U_i$$
where $U_i$ is an open set. Let $\mathcal{G}$ denote the Borel sets $E$ for which
$$\int \cdots \int \mathcal{X}_{E \cap R_p} \, dx_{i_1} \cdots dx_{i_n}$$
makes sense. Here $R_p = (-p, p)^n$. If you have a finite disjoint union $\cup_{i=1}^{k} E_i$, then the iterated integral makes sense because it is just the integral of a finite sum of indicator functions $\mathcal{X}_{R_p \cap E_i}$ and for each of these, the iterated integral makes sense. It follows from the monotone convergence theorem that the iterated integral makes sense for $\cup_{i=1}^{\infty} E_i$ where each $E_i \in \mathcal{G}$. Now if $E \in \mathcal{G}$, $E^C \cap R_p = \mathbb{R}^n \cap R_p \setminus (E \cap R_p)$. Each iterated integral makes sense for $\mathcal{X}_{E \cap R_p}$ and each makes sense for $\mathcal{X}_{R_p}$. Therefore, each makes sense for the difference $\mathcal{X}_{R_p} - \mathcal{X}_{E \cap R_p} = \mathcal{X}_{E^C \cap R_p}$. Thus $\mathcal{G}$ contains the open rectangles and if $\mathcal{K}$ denotes these open rectangles, $\mathcal{G} \supseteq \sigma(\mathcal{K})$ by Dynkin's lemma. However, $\sigma(\mathcal{K})$ equals the Borel sets. Hence $\mathcal{G}$ equals the Borel sets.
Now define
$$\mu(E) \equiv \int \cdots \int \mathcal{X}_E \, dx_{i_1} \cdots dx_{i_n}$$
and passing to a limit using a sequence of nonnegative simple functions and using the monotone convergence theorem multiple times, we obtain Fubini's theorem.
where the iterated integral makes sense because each iterate is measurable.
16.8 Exercises
1. Let $\Omega = \mathbb{N}$, the natural numbers, and let $d(p, q) = |p - q|$, the usual distance in $\mathbb{R}$. Show that for $(\Omega, d)$ the closures of the balls are compact. Now let $\Lambda f \equiv \sum_{k=1}^{\infty} f(k)$ whenever $f \in C_c(\Omega)$. Show this is a well defined positive linear functional on the space $C_c(\Omega)$. Describe the measure of the Riesz representation theorem which results from this positive linear functional. What if $\Lambda(f) = f(1)$? What measure would result from this functional? Which functions are measurable?
2. Verify that $\overline{\mu}$ defined in Lemma 16.1.7 is an outer measure.
3. Let $F : \mathbb{R} \to \mathbb{R}$ be increasing and right continuous. Let $\Lambda f \equiv \int f \, dF$ where the integral is the Riemann Stieltjes integral of $f \in C_c(\mathbb{R})$. Show the measure $\mu$ from the Riesz representation theorem satisfies
$$\mu([a, b]) = F(b) - F(a-), \quad \mu((a, b]) = F(b) - F(a),$$
$$\mu([a, a]) = F(a) - F(a-).$$
Hint: You might want to review the material on Riemann Stieltjes integrals presented in the Preliminary part of the notes.
4. Suppose $\Omega$ is a metric space and $\mu$, $\nu$ are two Borel measures with the property that they are finite on every ball and that they are equal on every open set. Show they must be equal on every Borel set. Hint: Let $\mathcal{G}$ denote those Borel sets $E$ such that $\mu(E \cap B) = \nu(E \cap B)$ for $B$ an open ball. Show $\mathcal{G}$ is closed with respect to countable disjoint unions and complements and contains the $\pi$ system consisting of the open sets. Then consider the lemma on $\pi$ systems. Let $B = B(p, n)$, $n = 1, 2, \cdots$.
5. Let $\Omega$ be a metric space with the closed balls compact and suppose $\mu$ is a measure defined on the Borel sets of $\Omega$ which is finite on compact sets. Show there exists a unique Radon measure, $\overline{\mu}$, which equals $\mu$ on the Borel sets.
6. Random vectors (variables) are measurable functions $X$, mapping a probability space, $(\Omega, P, \mathcal{F})$, to $\mathbb{R}^n$ (sometimes, although not in this problem, a Banach space). Thus $X(\omega) \in \mathbb{R}^n$ for each $\omega \in \Omega$ and $P$ is a probability measure defined on the sets of $\mathcal{F}$, a $\sigma$ algebra of subsets of $\Omega$. For $E$ a Borel set in $\mathbb{R}^n$, define
$$\mu(E) \equiv P\left( X^{-1}(E) \right) \equiv \text{probability that } X \in E.$$
Show this is a well defined measure on the Borel sets of $\mathbb{R}^n$ and use Problem 5 to obtain a Radon measure, $\lambda_X$, defined on a $\sigma$ algebra of sets of $\mathbb{R}^n$ including the Borel sets such that for $E$ a Borel set, $\lambda_X(E) =$ Probability that $(X \in E)$.
7. For $X$ a random variable defined above and $\lambda_X$ the Radon measure just defined, suppose $h : \mathbb{R}^p \to \mathbb{R}$ is Borel measurable and in $L^1(\mathbb{R}^p, \lambda_X)$. Then
$$\int_{\Omega} h(X(\omega)) \, dP = \int_{\mathbb{R}^p} h(x) \, d\lambda_X.$$
for all G. Then explain why this also holds for all Cc (Rp ) . Now
apply the Riesz representation theorem to conclude that Y = X .
9. Suppose $X$ and $Y$ are metric spaces having compact closed balls. Show
$$(X \times Y, d_{X \times Y})$$
is also a metric space which has the closures of balls compact. Here
Let
$$\mathcal{A} \equiv \{E \times F : E \text{ is a Borel set in } X, F \text{ is a Borel set in } Y\}.$$
Show $\sigma(\mathcal{A})$, the smallest $\sigma$ algebra containing $\mathcal{A}$, contains the Borel sets. Hint: Show every open set in a metric space which has closed balls compact can be obtained as a countable union of compact sets. Next show this implies every open set can be obtained as a countable union of open sets of the form $U \times V$ where $U$ is open in $X$ and $V$ is open in $Y$.
Would it work the same if you used $\int_0^{\infty} \mu([f \ge t]) \, dt$? Explain.
13. The Riemann integral is only defined for bounded functions which are defined on a bounded interval. If either of these two criteria is not satisfied, then the integral is not the Riemann integral. Suppose $f$ is Riemann integrable on a bounded interval, $[a, b]$. Show that it must also be Lebesgue integrable with respect to one dimensional Lebesgue measure and the two integrals coincide. Give a theorem in which the improper Riemann integral coincides with a suitable Lebesgue integral. (There are many such situations; just find one.) Note that $\int_0^{\infty} \frac{\sin x}{x} \, dx$ is a valid improper Riemann integral but is not a Lebesgue integral. Why?
Next let $\mathcal{F}$ consist of those sets for which $\mu$ is outer regular and also inner regular with closed replacing compact in the definition of inner regular. Finally show that if $C$ is a closed set, then
$$\mu(C \setminus C_n) < \varepsilon / 2^n.$$
Then consider $K \equiv \cap_n C_n$.
lim ([fn ]) = 0
n
16. Let $K$ be a compact subset of $\mathbb{R}$ having no isolated points. Show that there exists an increasing continuous function $g$ such that $g$ is constant on every connected component of $K^C$ and has values between 0 and 1. If $J$, $L$ are two components, $J < L$, then the value of $g$ on $J$ is strictly less than its value on $L$. Hint: Let the components be $\{(a_k, b_k)\}$. Let $a$ be the first point of $K$ and $b$ be the last. Let $g_0$ be piecewise linear, increasing and continuous going from 0 to the left of $a$ to 1 to the right of $b$. Let $g_1$ equal $\frac{1}{2}(g_0(a_1) + g_0(b_1))$ on $(a_1, b_1)$ and adjust to make it piecewise linear and increasing going from 0 to 1. Next adjust $g_1$ in a similar way to make it constant on $(a_2, b_2)$. Continue this way. Estimate $\|g_k - g_{k-1}\|$ in terms of $g_{k-1}(b_k) - g_{k-1}(a_k)$ and observe and use that the intervals $(g_{k-1}(a_k), g_{k-1}(b_k))$ are disjoint.
17. Show that if $K$ is any compact subset of $\mathbb{R}$ which has no isolated points, there exists a Radon measure $\mu$ which has the properties $\mu(K) = 1$, $\mu(E) = \mu(E \cap K)$, and if $H$ is a proper compact subset of $K$, then $\mu(H) < 1$. Also, $\mu(p) = 0$ whenever $p$ is a point.
18. Let $K$ be an arbitrary compact subset of $\mathbb{R}$. Then there exists a Radon measure $\mu$ which has the properties $\mu(K) = 1$, $\mu(E) = \mu(E \cap K)$, and if $H$ is a compact subset of $K$, then $\mu(H) < 1$.
Extension Theorems
17.1 Algebras
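First of all, here is the definition of an algebra and theorems which tell how to recognize one when you see it. An algebra is like a $\sigma$ algebra except it is only closed with respect to finite unions.
Lemma 17.1.2 Suppose $\mathcal{R}$ and $\mathcal{E}$ are subsets of $\mathcal{P}(Z)$ such that $\mathcal{E}$ is defined as the set of all finite disjoint unions of sets of $\mathcal{R}$. Suppose also
$$\emptyset, Z \in \mathcal{R}$$
$$A \cap B \in \mathcal{R} \text{ whenever } A, B \in \mathcal{R},$$
$$A \setminus B \in \mathcal{E} \text{ whenever } A, B \in \mathcal{R}.$$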
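$$E_1 = \cup_{i=1}^{m} R_i, \quad E_2 = \cup_{j=1}^{n} R_j'$$
where the $R_i$ are disjoint sets in $\mathcal{R}$ and the $R_j'$ are disjoint sets in $\mathcal{R}$. Then
$$E_1 \cap E_2 = \cup_{i=1}^{m} \cup_{j=1}^{n} R_i \cap R_j'$$
which is clearly an element of $\mathcal{E}$ because no two of the sets in the union can intersect and by assumption they are all in $\mathcal{R}$. Thus by induction, finite intersections of sets of $\mathcal{E}$ are in $\mathcal{E}$. Consider the difference of two elements of $\mathcal{E}$ next.
If $E = \cup_{i=1}^{n} R_i \in \mathcal{E}$,
$$E_1 \setminus E_2 = E_1 \cap E_2^C \in \mathcal{E}$$
$$E_1 \cup E_2 = (E_1 \setminus E_2) \cup E_2 \in \mathcal{E}$$
because $E_1 \setminus E_2$ consists of a finite disjoint union of sets of $\mathcal{R}$ and these sets must be disjoint from the sets of $\mathcal{R}$ whose union yields $E_2$ because $(E_1 \setminus E_2) \cap E_2 = \emptyset$. This proves the lemma.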
The following corollary is particularly helpful in verifying the conditions of the
above lemma.
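$$A \times B \cap C \times D = (A \cap C) \times (B \cap D) \in \mathcal{R}$$
by assumption.
$$A \times B \setminus (C \times D) = A \times \overbrace{(B \setminus D)}^{\in \mathcal{E}_2} \cup \overbrace{(A \setminus C)}^{\in \mathcal{E}_1} \times \overbrace{(D \cap B)}^{\in \mathcal{R}_2}$$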
17.2. CARATHEODORY EXTENSION THEOREM 423
$$= (A \times Q) \cup (P \times R)$$
where $Q \in \mathcal{E}_2$, $P \in \mathcal{E}_1$, and $R \in \mathcal{R}_2$.
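The decomposition of a difference of rectangles into two disjoint pieces can be verified on small finite sets. This is only an illustrative check, assuming finite sets stand in for the sets $A, B, C, D$; the function names are hypothetical.

```python
from itertools import product


def rect(X, Y):
    """The rectangle X x Y as a set of ordered pairs."""
    return set(product(X, Y))


def rectangle_difference(A, B, C, D):
    """Split (A x B) \\ (C x D) into the pieces A x (B \\ D) and (A \\ C) x (D n B)."""
    piece1 = rect(A, B - D)
    piece2 = rect(A - C, D & B)
    return piece1, piece2


A, B = {1, 2, 3}, {"a", "b", "c"}
C, D = {2, 3, 4}, {"b", "c", "d"}
piece1, piece2 = rectangle_difference(A, B, C, D)
```

For these sample sets the two pieces are disjoint and their union equals `rect(A, B) - rect(C, D)`, which is the identity used in the proof.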
then
$$\mu_0(E) = \sum_{i=1}^{\infty} \mu_0(E_i)$$
while $\mu_0(\Omega) < \infty$.
In this definition, $\mu_0$ is trying to be a measure and acts like one whenever possible. Under these conditions, $\mu_0$ can be extended uniquely to a complete measure, $\mu$, defined on a $\sigma$ algebra of sets containing $\mathcal{E}$ such that $\mu$ agrees with $\mu_0$ on $\mathcal{E}$. The following is the main result.
$$\mu(E) = \mu_0(E)$$
for all $E \in \mathcal{E}$. Also if $\nu$ is any such measure which agrees with $\mu_0$ on $\mathcal{E}$, then $\nu = \mu$ on $\sigma(\mathcal{E})$, the $\sigma$ algebra generated by $\mathcal{E}$.
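$$\mu(S_i) + \frac{\varepsilon}{2^i} \ge \sum_{j=1}^{\infty} \mu_0(E_{ij}).$$
Then
$$\mu(S) \le \sum_i \sum_j \mu_0(E_{ij}) \le \sum_i \left( \mu(S_i) + \frac{\varepsilon}{2^i} \right) = \sum_i \mu(S_i) + \varepsilon.$$
$$(\Omega, \mathcal{S}, \mu)$$
it follows
$$\mu(A) + \varepsilon > \sum_{i=1}^{\infty} \mu_0(E_i \cap A) \ge \mu_0(A)$$
$$\mu(S) + \varepsilon > \sum_{i=1}^{\infty} \mu(E_i).$$
Then
$$\mu(S) \le \mu(S \cap A) + \mu(S \setminus A)$$
$$\le \mu\left( \cup_{i=1}^{\infty} E_i \setminus A \right) + \mu\left( \cup_{i=1}^{\infty} (E_i \cap A) \right)$$
$$\le \sum_{i=1}^{\infty} \mu(E_i \setminus A) + \sum_{i=1}^{\infty} \mu(E_i \cap A) = \sum_{i=1}^{\infty} \mu(E_i) < \mu(S) + \varepsilon.$$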
Lemma 17.2.3 Let $M$ be a metric space with the closed balls compact and suppose $\mu$ is a measure defined on the Borel sets of $M$ which is finite on compact sets. Then there exists a unique Radon measure, $\overline{\mu}$, which equals $\mu$ on the Borel sets. In particular $\mu$ must be both inner and outer regular on all Borel sets.
Proof: Define a positive linear functional, $\Lambda(f) = \int f \, d\mu$. Let $\overline{\mu}$ be the Radon measure which comes from the Riesz representation theorem for positive linear functionals (Theorem 16.4.11). Thus for all $f$ continuous,
$$\int f \, d\mu = \int f \, d\overline{\mu}.$$
and so the two measures coincide on all open sets. Every compact set is a countable intersection of open sets and so the two measures coincide on all compact sets. Now let $B(a, n)$ be a ball of radius $n$ and let $E$ be a Borel set contained in this ball. Then by regularity of $\overline{\mu}$ there exist sets $F$, $G$ such that $G$ is a countable intersection of open sets and $F$ is a countable union of compact sets such that $F \subseteq E \subseteq G$ and $\overline{\mu}(G \setminus F) = 0$. Now $\mu(G) = \overline{\mu}(G)$ and $\mu(F) = \overline{\mu}(F)$. Thus
$$\overline{\mu}(G \setminus F) + \overline{\mu}(F) = \overline{\mu}(G) = \mu(G) = \mu(G \setminus F) + \mu(F)$$
and so $\mu(G \setminus F) = \overline{\mu}(G \setminus F)$. It follows
The main tool in the study of products of compact topological spaces is the Alexander subbasis theorem which is presented next. Recall a set is compact if every basic open cover admits a finite subcover. This was pretty easy to prove. However, there is a much smaller set of open sets called a subbasis which has this property. The proof of this result is much harder.
Definition 17.3.2 $\mathcal{S}$ is called a subbasis for the topology $\tau$ if the set $\mathcal{B}$ of finite intersections of sets of $\mathcal{S}$ is a basis for the topology, $\tau$.
Theorem 17.3.3 Let $(X, \tau)$ be a topological space and let $\mathcal{S} \subseteq \tau$ be a subbasis for $\tau$. Then if $H \subseteq X$, $H$ is compact if and only if every open cover of $H$ consisting entirely of sets of $\mathcal{S}$ admits a finite subcover.
Proof: The only if part is obvious because the subbasic sets are themselves open. If every basic open cover admits a finite subcover then the set in question is compact. Suppose then that $H$ is a subset of $X$ having the property that subbasic open covers admit finite subcovers. Is $H$ compact? Assume this is not so. Then what was just observed about basic covers implies there exists a basic open cover of $H$, $\mathcal{O}$, which admits no finite subcover. Let $\mathcal{F}$ be defined as
The assumption is that $\mathcal{F}$ is nonempty. Partially order $\mathcal{F}$ by set inclusion and use the Hausdorff maximal principle to obtain a maximal chain, $\mathcal{C}$, of such open covers and let
$$\mathcal{D} = \cup \mathcal{C}.$$
If $\mathcal{D}$ admits a finite subcover, then since $\mathcal{C}$ is a chain and the finite subcover has only finitely many sets, some element of $\mathcal{C}$ would also admit a finite subcover, contrary to the definition of $\mathcal{F}$. Therefore, $\mathcal{D}$ admits no finite subcover. If $\mathcal{D}'$ properly contains $\mathcal{D}$ and $\mathcal{D}'$ is a basic open cover of $H$, then $\mathcal{D}'$ has a finite subcover of $H$ since otherwise, $\mathcal{C}$ would fail to be a maximal chain, being properly contained in $\mathcal{C} \cup \{\mathcal{D}'\}$. Every set of $\mathcal{D}$ is of the form
$$U = \cap_{i=1}^{m} B_i, \quad B_i \in \mathcal{S}$$
17.3. THE TYCHONOFF THEOREM 427
because they are all basic open sets. If it is the case that for all $U \in \mathcal{D}$ one of the $B_i$ is found in $\mathcal{D}$, then replace each such $U$ with the subbasic set from $\mathcal{D}$ containing it. But then this would be a subbasic open cover of $H$ which by assumption would admit a finite subcover contrary to the properties of $\mathcal{D}$. Therefore, one of the sets of $\mathcal{D}$, denoted by $U$, has the property that
$$U = \cap_{i=1}^{m} B_i, \quad B_i \in \mathcal{S}$$
and no $B_i$ is in $\mathcal{D}$. Thus $\mathcal{D} \cup \{B_i\}$ admits a finite subcover, for each of the above $B_i$, because it is strictly larger than $\mathcal{D}$. Let this finite subcover corresponding to $B_i$ be denoted by
$$V_1^i, \cdots, V_{m_i}^i, B_i$$
Consider
$$\{U, V_j^i, \ j = 1, \cdots, m_i, \ i = 1, \cdots, m\}.$$
If $p \in H \setminus \cup\{V_j^i\}$, then $p \in B_i$ for each $i$ and so $p \in U$. This is therefore a finite subcover of $\mathcal{D}$ contradicting the properties of $\mathcal{D}$. Therefore, $\mathcal{F}$ must be empty and this proves the theorem.
Let $I$ be a set and suppose for each $i \in I$, $(X_i, \tau_i)$ is a nonempty topological space. The Cartesian product of the $X_i$, denoted by $\prod_{i \in I} X_i$, consists of the set of all choice functions defined on $I$ which select a single element of each $X_i$. Thus $f \in \prod_{i \in I} X_i$ means for every $i \in I$, $f(i) \in X_i$. The axiom of choice says $\prod_{i \in I} X_i$ is nonempty. Let
$$P_j(A) = \prod_{i \in I} B_i$$
Proof: By the Alexander subbasis theorem, the theorem will be proved if every subbasic open cover admits a finite subcover. Therefore, let $\mathcal{O}$ be a subbasic open
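cover of $\prod_{i \in I} X_i$. Let
Thus $\mathcal{O}_j$ consists of those sets of $\mathcal{O}$ which have a possibly proper subset of $X_i$ only in the slot $i = j$. Let
$$\pi_j \mathcal{O}_j = \{A : P_j(A) \in \mathcal{O}_j\}.$$
Thus $\pi_j \mathcal{O}_j$ picks out those proper open subsets of $X_j$ which occur in $\mathcal{O}_j$.
If no $\pi_j \mathcal{O}_j$ covers $X_j$, then by the axiom of choice, there exists
$$f \in \prod_{i \in I} X_i \setminus \cup \pi_i \mathcal{O}_i$$
Therefore, $f(j) \notin \cup \pi_j \mathcal{O}_j$ for each $j \in I$. Now $f$ is a point of $\prod_{i \in I} X_i$ and so $f \in P_k(A) \in \mathcal{O}$ for some $k$. However, this is a contradiction as it was shown that $f(k)$ is not an element of $A$. ($A$ is one of the sets whose union makes up $\cup \pi_k \mathcal{O}_k$.) This contradiction shows that for some $j$, $\pi_j \mathcal{O}_j$ covers $X_j$. Thus
$$X_j = \cup \pi_j \mathcal{O}_j$$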
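Lemma 17.4.2 The sets $\mathcal{E}$, $\mathcal{E}_J$ defined above form an algebra of sets of $\prod_{t \in I} M_t$.
Proof: First consider $\mathcal{R}_J$. If $\mathbf{A}, \mathbf{B} \in \mathcal{R}_J$, then $\mathbf{A} \cap \mathbf{B} \in \mathcal{R}_J$ also. Is $\mathbf{A} \setminus \mathbf{B}$ a finite disjoint union of sets of $\mathcal{R}_J$? It suffices to verify that $\pi_J(\mathbf{A} \setminus \mathbf{B})$ is a finite disjoint union of $\pi_J(\mathcal{R}_J)$. Let $|J|$ denote the number of indices in $J$. If $|J| = 1$, then it is obvious that $\pi_J(\mathbf{A} \setminus \mathbf{B})$ is a finite disjoint union of sets of $\pi_J(\mathcal{R}_J)$. In fact, letting $J = (t)$ and the $t^{th}$ entry of $\mathbf{A}$ is $A$ and the $t^{th}$ entry of $\mathbf{B}$ is $B$, then the $t^{th}$ entry of $\mathbf{A} \setminus \mathbf{B}$ is $A \setminus B$, a Borel set of $M_t$, a finite disjoint union of Borel sets of $M_t$.
Suppose then that for $\mathbf{A}, \mathbf{B}$ sets of $\mathcal{R}_J$, $\pi_J(\mathbf{A} \setminus \mathbf{B})$ is a finite disjoint union of sets of $\pi_J(\mathcal{R}_J)$ for $|J| \le n$, and consider $J = (t_1, \cdots, t_n, t_{n+1})$. Let the $t_i^{th}$ entry of $\mathbf{A}$ and $\mathbf{B}$ be respectively $A_i$ and $B_i$. It follows that $\pi_J(\mathbf{A} \setminus \mathbf{B})$ has the following in the entries for $J$
$$(A_1 \times A_2 \times \cdots \times A_n \times A_{n+1}) \setminus (B_1 \times B_2 \times \cdots \times B_n \times B_{n+1})$$
Letting $A$ represent $A_1 \times A_2 \times \cdots \times A_n$ and $B$ represent $B_1 \times B_2 \times \cdots \times B_n$, this is of the form
$$A \times (A_{n+1} \setminus B_{n+1}) \cup (A \setminus B) \times (A_{n+1} \cap B_{n+1})$$
By induction, $(A \setminus B)$ is the finite disjoint union of sets of $\mathcal{R}_{(t_1, \cdots, t_n)}$. Therefore, the above is the finite disjoint union of sets of $\mathcal{R}_J$. It follows that $\mathcal{E}_J$ is an algebra.
Now suppose $\mathbf{A}, \mathbf{B} \in \mathcal{R}$. Then for some finite set $J$, both are in $\mathcal{R}_J$. Then from what was just shown,
$$\mathbf{A} \setminus \mathbf{B} \in \mathcal{E}_J \subseteq \mathcal{E}, \quad \mathbf{A} \cap \mathbf{B} \in \mathcal{R}.$$
By Lemma 17.1.2 on Page 421 this shows $\mathcal{E}$ is an algebra.
With this preparation, here is the Kolmogorov extension theorem. In the statement and proof of the theorem, $F_i$, $G_i$, and $E_i$ will denote Borel sets. Any list of indices from $I$ will always be assumed to be taken in order. Thus, if $J \subseteq I$ and $J = (t_1, \cdots, t_n)$, it will always be assumed $t_1 < t_2 < \cdots < t_n$.
Theorem 17.4.3 For each finite set
$$J = (t_1, \cdots, t_n) \subseteq I,$$
suppose there exists a Borel probability measure, $\nu_J = \nu_{t_1 \cdots t_n}$, defined on the Borel sets of $\prod_{t \in J} M_t$ such that the following consistency condition holds. If
$$(t_1, \cdots, t_n) \subseteq (s_1, \cdots, s_p),$$
then
$$\nu_{t_1 \cdots t_n}(F_{t_1} \times \cdots \times F_{t_n}) = \nu_{s_1 \cdots s_p}\left( G_{s_1} \times \cdots \times G_{s_p} \right) \quad (17.4.1)$$
where if $s_i = t_j$, then $G_{s_i} = F_{t_j}$ and if $s_i$ is not equal to any of the indices, $t_k$, then $G_{s_i} = M_{s_i}$. Then for $\mathcal{E}$ defined in Definition 17.4.1, there exists a probability measure, $P$, and a $\sigma$ algebra $\mathcal{F} = \sigma(\mathcal{E})$ such that
$$\left( \prod_{t \in I} M_t, P, \mathcal{F} \right)$$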
430 EXTENSION THEOREMS
is a probability space. Also there exist measurable functions, Xs : tI Mt Ms
dened as
Xs x xs
for each s I such that for each (t1 tn ) I,
t1 tn (Ft1 Ftn ) = P ([Xt1 Ft1 ] [Xtn Ftn ])
( )
n
= P (Xt1 , , Xtn ) Ftj = P Ft (17.4.2)
j=1 tI
where Ft = Mt for every t / {t1 tn } and Fti is a Borel set. Also if f is a non-
negative
( function
) of nitely many variables, xt1 , , xtn , measurable with respect to
n
B j=1 Mtj , then f is also measurable with respect to F and
f (xt1 , , xtn ) d t1 tn
Mt1 Mtn
=
f (xt1 , , xtn ) dP (17.4.3)
tI Mt
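Proof: Let $\mathcal{E}$ be the algebra of sets defined in Definition 17.4.1. I want to define a measure on $\mathcal{E}$. For $\mathbf{F} \in \mathcal{E}$, there exists $J$ such that $\mathbf{F}$ is the finite disjoint union of sets of $\mathcal{R}_J$. Define
$$P_0(\mathbf{F}) \equiv \nu_J(\pi_J(\mathbf{F}))$$
Then $P_0$ is well defined because of the consistency condition on the measures $\nu_J$. $P_0$ is clearly finitely additive because the $\nu_J$ are measures and one can pick $J$ as large as desired to include all $t$ where there may be something other than $M_t$. Also, from the definition,
$$P_0(\Omega) \equiv P_0\Big(\prod_{t \in I} M_t\Big) = \nu_{t_1}(M_{t_1}) = 1.$$
Next I will show $P_0$ is a finite measure on $\mathcal{E}$. After this it is only a matter of using the Caratheodory extension theorem to get the existence of the desired probability measure $P$.
Claim: Suppose $\mathbf{E}^n$ is in $\mathcal{E}$ and suppose $\mathbf{E}^n \downarrow \emptyset$. Then $P_0(\mathbf{E}^n) \downarrow 0$.
Proof of the claim: If not, there exists a sequence such that although $\mathbf{E}^n \downarrow \emptyset$, $P_0(\mathbf{E}^n) \downarrow \varepsilon > 0$. Let $\mathbf{E}^n \in \mathcal{E}_{J_n}$. Thus it is a finite disjoint union of sets of $\mathcal{R}_{J_n}$. By regularity of the measures $\nu_J$, there exists a compact set $\mathbf{K}_{J_n} \subseteq \mathbf{E}^n$ such that
$$\nu_{J_n}(\pi_{J_n}(\mathbf{K}_{J_n})) + \frac{\varepsilon}{2^{n+2}} > \nu_{J_n}(\pi_{J_n}(\mathbf{E}^n))$$
Thus
$$P_0(\mathbf{K}_{J_n}) + \frac{\varepsilon}{2^{n+2}} \equiv \nu_{J_n}(\pi_{J_n}(\mathbf{K}_{J_n})) + \frac{\varepsilon}{2^{n+2}} > \nu_{J_n}(\pi_{J_n}(\mathbf{E}^n)) \equiv P_0(\mathbf{E}^n)$$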
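The interesting thing about these $\mathbf{K}_{J_n}$ is: they have the finite intersection property. Here is why.
$$\varepsilon \le P_0\big(\cap_{k=1}^m \mathbf{K}_{J_k}\big) + P_0\big(\mathbf{E}^m \setminus \cap_{k=1}^m \mathbf{K}_{J_k}\big)$$
$$\le P_0\big(\cap_{k=1}^m \mathbf{K}_{J_k}\big) + P_0\big(\cup_{k=1}^m \mathbf{E}^k \setminus \mathbf{K}_{J_k}\big)$$
$$< P_0\big(\cap_{k=1}^m \mathbf{K}_{J_k}\big) + \sum_{k=1}^{\infty} \frac{\varepsilon}{2^{k+2}} < P_0\big(\cap_{k=1}^m \mathbf{K}_{J_k}\big) + \varepsilon/2,$$
and so $P_0\big(\cap_{k=1}^m \mathbf{K}_{J_k}\big) > \varepsilon/2$. Now this yields a contradiction, because this finite intersection property implies the intersection of all the $\mathbf{K}_{J_k}$ is nonempty, contradicting $\mathbf{E}^n \downarrow \emptyset$ since each $\mathbf{K}_{J_n}$ is contained in $\mathbf{E}^n$.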
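With the claim, it follows $P_0$ is a measure on $\mathcal{E}$. Here is why: If $\mathbf{E} = \cup_{k=1}^{\infty} \mathbf{E}^k$ where $\mathbf{E}, \mathbf{E}^k \in \mathcal{E}$, then $(\mathbf{E} \setminus \cup_{k=1}^n \mathbf{E}^k) \downarrow \emptyset$ and so
$$P_0\big(\cup_{k=1}^n \mathbf{E}^k\big) \to P_0(\mathbf{E}).$$
Hence if the $\mathbf{E}^k$ are disjoint, $P_0\big(\cup_{k=1}^n \mathbf{E}^k\big) = \sum_{k=1}^n P_0(\mathbf{E}^k) \to P_0(\mathbf{E})$. Thus for disjoint $\mathbf{E}^k$ having $\cup_k \mathbf{E}^k = \mathbf{E} \in \mathcal{E}$,
$$P_0\big(\cup_{k=1}^{\infty} \mathbf{E}^k\big) = \sum_{k=1}^{\infty} P_0(\mathbf{E}^k).$$
Now to conclude the proof, apply the Caratheodory extension theorem to obtain $P$, a probability measure which extends $P_0$ to a $\sigma$ algebra which contains $\sigma(\mathcal{E})$, the sigma algebra generated by $\mathcal{E}$, with $P = P_0$ on $\mathcal{E}$. Thus for $\mathbf{E}_J \in \mathcal{E}$, $P(\mathbf{E}_J) = P_0(\mathbf{E}_J) = \nu_J(\pi_J(\mathbf{E}_J))$.
Next, let $\big(\prod_{t \in I} M_t, \mathcal{F}, P\big)$ be the probability space and for $\mathbf{x} \in \prod_{t \in I} M_t$ let $X_t(\mathbf{x}) = x_t$, the $t^{th}$ entry of $\mathbf{x}$. It follows $X_t$ is measurable (also continuous) because if $U$ is open in $M_t$, then $X_t^{-1}(U)$ has a $U$ in the $t^{th}$ slot and $M_s$ everywhere else for $s \ne t$. Thus inverse images of open sets are measurable. Also, letting $J$ be a finite subset of $I$ and for $J = (t_1, \cdots, t_n)$, and $F_{t_1}, \cdots, F_{t_n}$ Borel sets in $M_{t_1}, \cdots, M_{t_n}$ respectively, it follows $\mathbf{F}_J$, where $\mathbf{F}_J$ has $F_{t_i}$ in the $t_i^{th}$ entry, is in $\mathcal{E}$ and therefore,
$$= P\Big(\prod_{t \in I} F_t\Big) = \int \mathcal{X}_{\prod_{t \in I} F_t}(\mathbf{x}) \, dP$$
$$= \int \mathcal{X}_{F}(x_{t_1}, \cdots, x_{t_n}) \, dP \tag{17.4.5}$$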
Lemma 17.4.4 Let $J$ be a finite subset of $I$. Then $U$ is a Borel set in $\prod_{t \in J} M_t$ if and only if there exists a Borel set, $U'$ in $\prod_{t \in J} M_t'$ such that $U = U' \cap \prod_{t \in J} M_t$.
Proof: A subbasis for the topology of $[-\infty, \infty]$ is sets of the form $[-\infty, a)$ and $(a, \infty]$. Also a subbasis for the topology of $[-\infty, \infty]^n$ is sets of the form $\prod_{i=1}^n [-\infty, a_i)$ and $\prod_{i=1}^n (a_i, \infty]$. Similarly, a subbasis for the topology of $(-\infty, \infty)^n$ consists of sets of the form $\prod_{i=1}^n (-\infty, a_i)$ and $\prod_{i=1}^n (a_i, \infty)$. Thus the basic open sets of $\prod_{t \in J} M_t$ are of the form $U' \cap \prod_{t \in J} M_t$ where $U'$ is a basic open set in $\prod_{t \in J} M_t'$. It follows the open sets of $\prod_{t \in J} M_t$ are of the form $U' \cap \prod_{t \in J} M_t$ where $U'$ is open in $\prod_{t \in J} M_t'$. Let $\mathcal{F}$ denote those Borel sets of $\prod_{t \in J} M_t$ which are of the form $U' \cap \prod_{t \in J} M_t$ for $U'$ a Borel set in $\prod_{t \in J} M_t'$. Then as just shown, $\mathcal{F}$ contains the $\pi$ system of open sets in $\prod_{t \in J} M_t$. Let $\mathcal{G}$ denote those Borel sets of $\prod_{t \in J} M_t$ which are of the desired form. It is clearly closed with respect to complements and countable disjoint unions. Hence $\mathcal{G}$ equals the Borel sets of $\prod_{t \in J} M_t$.
Maybe this diagram will help to keep the argument straight.
[Diagram: the set $U$ inside $M$ and its compactification $M'$]
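then
$$\nu_{t_1 \cdots t_n}(F_{t_1} \times \cdots \times F_{t_n}) = \nu_{s_1 \cdots s_p}(G_{s_1} \times \cdots \times G_{s_p}) \tag{17.4.6}$$
where if $s_i = t_j$, then $G_{s_i} = F_{t_j}$ and if $s_i$ is not equal to any of the indices $t_k$, then $G_{s_i} = M_{s_i}$. Then for $\mathcal{E}$ defined as in Definition 17.4.1, adjusted so that $\pm\infty$ never appears as any endpoint of any interval, there exists a probability measure $P$ and a $\sigma$ algebra $\mathcal{F} = \sigma(\mathcal{E})$ such that
$$\Big(\prod_{t \in I} M_t, P, \mathcal{F}\Big)$$
is a probability space. Also there exist measurable functions $X_s : \prod_{t \in I} M_t \to M_s$ defined as
$$X_s \mathbf{x} \equiv x_s$$
for each $s \in I$ such that for each $(t_1 \cdots t_n) \subseteq I$,
where $F_t = M_t$ for every $t \notin \{t_1 \cdots t_n\}$ and $F_{t_i}$ is a Borel set. Also if $f$ is a nonnegative function of finitely many variables, $x_{t_1}, \cdots, x_{t_n}$, measurable with respect to $\mathcal{B}\big(\prod_{j=1}^n M_{t_j}\big)$, then $f$ is also measurable with respect to $\mathcal{F}$ and
$$\int_{M_{t_1} \times \cdots \times M_{t_n}} f(x_{t_1}, \cdots, x_{t_n}) \, d\nu_{t_1 \cdots t_n} = \int_{\prod_{t \in I} M_t} f(x_{t_1}, \cdots, x_{t_n}) \, dP \tag{17.4.8}$$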
Proof: Using Lemma 17.4.4, extend each measure $\nu_J$ to $M_t'$, defined by adding in the points $\pm\infty$ at the ends, by letting $\nu_J'(E) \equiv \nu_J\big(E \cap \prod_{t \in J} M_t\big)$ for all $E \in \mathcal{B}\big(\prod_{t \in J} M_t'\big)$. Then apply Theorem 17.4.3 to these extended measures and use the definition of the extensions of each $\nu_J$ to replace each $M_t'$ with $M_t$ everywhere it occurs.
As a special case, you can obtain a version of product measure for possibly infinitely many factors. Suppose in the context of the above theorem that $\nu_t$ is a probability measure defined on the Borel sets of $M_t \equiv \mathbb{R}^{n_t}$ for $n_t$ a positive integer, and let the measures $\nu_{t_1 \cdots t_n}$ be defined on the Borel sets of $\prod_{i=1}^n M_{t_i}$ by
$$\nu_{t_1 \cdots t_n}(E) \equiv \overbrace{(\nu_{t_1} \times \cdots \times \nu_{t_n})}^{\text{product measure}}(E).$$
Then these measures satisfy the necessary consistency condition and so the Kolmogorov extension theorem given above can be applied to obtain a measure $P$ defined on $\big(\prod_{t \in I} M_t, \mathcal{F}\big)$ and measurable functions $X_s : \prod_{t \in I} M_t \to M_s$ such that for $F_{t_i}$ a Borel set in $M_{t_i}$,
$$P\Big((X_{t_1}, \cdots, X_{t_n}) \in \prod_{i=1}^n F_{t_i}\Big) = \nu_{t_1 \cdots t_n}(F_{t_1} \times \cdots \times F_{t_n})$$
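The claim that product measures satisfy the consistency condition can be illustrated on a toy case. The sketch below is an assumption-laden stand-in: the measures `nu[t]` live on the two-point set `{0, 1}` rather than on $\mathbb{R}^{n_t}$, and the weights are invented for illustration. It checks that inserting the whole space in an extra slot does not change the measure of a rectangle.

```python
from itertools import product
from fractions import Fraction

# Hypothetical one-factor probability measures nu_t on the finite set {0, 1},
# indexed by t = 1, 2, 3 (stand-ins for Borel measures on M_t).
points = (0, 1)
nu = {
    1: {0: Fraction(1, 2), 1: Fraction(1, 2)},
    2: {0: Fraction(1, 3), 1: Fraction(2, 3)},
    3: {0: Fraction(1, 4), 1: Fraction(3, 4)},
}

def nu_product(indices, rectangle):
    """Product measure of F_{t_1} x ... x F_{t_n} for the given index list."""
    total = Fraction(0)
    for x in product(*rectangle):
        weight = Fraction(1)
        for t, x_t in zip(indices, x):
            weight *= nu[t][x_t]
        total += weight
    return total

F1, F3 = {1}, {0}
lhs = nu_product((1, 3), (F1, F3))                    # nu_{13}(F_1 x F_3)
rhs = nu_product((1, 2, 3), (F1, set(points), F3))    # nu_{123}(F_1 x M_2 x F_3)
```

Here `lhs` and `rhs` agree because the factor contributed by the full space $M_2$ has total mass 1.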
17.5 Independence
The concept of independence is probably the main idea which separates probability
from analysis and causes some of us to struggle to understand what is going on. In
what follows, recall that a Banach space is a complete normed vector space. These
are discussed more elsewhere in the book.
$$P\Big(\bigcap_{k=1}^m A_{i_k}\Big) = \prod_{k=1}^m P(A_{i_k}).$$
Each of these events defines a rather simple $\sigma$ algebra, $\big(A_i, A_i^C, \emptyset, \Omega\big)$, denoted by $\mathcal{F}_i$. Now the following lemma is interesting because it motivates a more general notion of independent $\sigma$ algebras.
$$P\Big(\bigcap_{k=1}^m B_{i_k}\Big) = \prod_{k=1}^m P(B_{i_k}).$$
Proof: The proof is by induction on the number $l$ of the $B_{i_k}$ which are not equal to $A_{i_k}$. First suppose $l = 0$. Then the above assertion is true by assumption. Suppose it is so for some $l$ and there are $l + 1$ sets not equal to $A_{i_k}$. If any equals $\emptyset$ there is nothing to show. Both sides equal 0. If any equals $\Omega$, there is also nothing to show. You can ignore that set in both sides and then you have by induction the two sides are equal because you have no more than $l$ sets different than $A_{i_k}$. The only remaining case is where some $B_{i_k} = A_{i_k}^C$. Say $B_{i_{m+1}} = A_{i_{m+1}}^C$ for simplicity.
$$P\Big(\bigcap_{k=1}^{m+1} B_{i_k}\Big) = P\Big(A_{i_{m+1}}^C \cap \bigcap_{k=1}^m B_{i_k}\Big) = P\Big(\bigcap_{k=1}^m B_{i_k}\Big) - P\Big(A_{i_{m+1}} \cap \bigcap_{k=1}^m B_{i_k}\Big)$$
Then by induction,
$$= \prod_{k=1}^m P(B_{i_k}) - P(A_{i_{m+1}}) \prod_{k=1}^m P(B_{i_k}) = \prod_{k=1}^m P(B_{i_k}) \big(1 - P(A_{i_{m+1}})\big)$$
$$= P\big(A_{i_{m+1}}^C\big) \prod_{k=1}^m P(B_{i_k}) = \prod_{k=1}^{m+1} P(B_{i_k})$$
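The statement just proved can be checked exhaustively on a small example. The sketch below assumes a hypothetical probability space of three independent biased coin flips (the biases are invented), lets $A_i$ be the event that flip $i$ is 1, and tests the product formula for every choice of $B_i$ from the four-element $\sigma$ algebra generated by $A_i$.

```python
from itertools import product
from fractions import Fraction

# Hypothetical finite probability space: three independent biased coin flips.
bias = (Fraction(1, 2), Fraction(1, 3), Fraction(3, 5))   # assumed values of P(A_i)
omega = list(product((0, 1), repeat=3))

def prob(event):
    """P(event) for a set of outcomes under the product weights."""
    total = Fraction(0)
    for w in event:
        weight = Fraction(1)
        for p, bit in zip(bias, w):
            weight *= p if bit else 1 - p
        total += weight
    return total

A = [{w for w in omega if w[i] == 1} for i in range(3)]   # A_i = [flip i is 1]
full = set(omega)

def choices(i):
    """The four members of the sigma algebra generated by A_i."""
    return [A[i], full - A[i], set(), full]

lemma_holds = all(
    prob(B0 & B1 & B2) == prob(B0) * prob(B1) * prob(B2)
    for B0 in choices(0) for B1 in choices(1) for B2 in choices(2)
)
```

Exact rational arithmetic via `Fraction` avoids any floating point tolerance in the comparison.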
Definition 17.5.3 If $\{\mathcal{F}_i\}_{i \in I}$ is any set of $\sigma$ algebras contained in $\mathcal{F}$, they are said to be independent if whenever $A_{i_k} \in \mathcal{F}_{i_k}$ for $k = 1, 2, \cdots, m$, then
$$P\Big(\bigcap_{k=1}^m A_{i_k}\Big) = \prod_{k=1}^m P(A_{i_k}).$$
A set of random variables $\{X_i\}_{i \in I}$ is independent if the $\sigma$ algebras $\{\sigma(X_i)\}_{i \in I}$ are independent $\sigma$ algebras. Here $\sigma(X)$ denotes the smallest $\sigma$ algebra such that $X$ is measurable. Thus $\sigma(X) = \{X^{-1}(U) : U \text{ is a Borel set}\}$. More generally, $\sigma(X_i : i \in I)$ is the smallest $\sigma$ algebra such that each $X_i$ is measurable.
Observation 17.5.4 Recall that $\sigma(X)$ was the smallest $\sigma$ algebra such that $X$ is measurable with respect to $\sigma(X)$. That is, $\sigma(X)$ must contain $X^{-1}(U)$ for every open $U$. If $\mathcal{S}$ denotes the Borel sets $B$ such that $X^{-1}(B) \in \sigma(X)$, it follows easily that $\mathcal{S}$ is a $\sigma$ algebra and also contains the open sets. Hence $\mathcal{S}$ must contain the Borel sets. Hence
$$\{X^{-1}(U) : U \text{ is a Borel set}\}$$
must be contained in $\sigma(X)$. However, such sets described above also constitute a $\sigma$ algebra and $X$ is measurable with respect to this $\sigma$ algebra. Hence it contains $\sigma(X)$.
Note that by Lemma 17.5.2 you can consider independent events in terms of independent $\sigma$ algebras. That is, a set of independent events can always be considered as events taken from a set of independent $\sigma$ algebras. This is a more general notion because here the $\sigma$ algebras might have infinitely many sets in them.
Lemma 17.5.5 Suppose the set of random variables, $\{X_i\}_{i \in I}$ is independent. Also suppose $I_1 \subseteq I$ and $j \notin I_1$. Then the $\sigma$ algebras $\sigma(X_i : i \in I_1)$, $\sigma(X_j)$ are independent $\sigma$ algebras.
$$P(A \cap B) = P\Big(\bigcap_{k=1}^m X_k^{-1}(B_k) \cap B\Big) = P(B) \prod_{k=1}^m P\big(X_k^{-1}(B_k)\big)$$
$$= P(B) \, P\Big(\bigcap_{k=1}^m X_k^{-1}(B_k)\Big).$$
Thus K is contained in
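Now $\mathcal{G}$ is closed with respect to complements and countable disjoint unions. Here is why: If each $A_i \in \mathcal{G}$ and the $A_i$ are disjoint,
$$P\big((\cup_{i=1}^{\infty} A_i) \cap B\big) = P\big(\cup_{i=1}^{\infty} (A_i \cap B)\big)$$
$$= \sum_i P(A_i \cap B) = \sum_i P(A_i) P(B)$$
$$= P(B) \sum_i P(A_i) = P(B) \, P\big(\cup_{i=1}^{\infty} A_i\big)$$
If $A \in \mathcal{G}$,
$$P\big(A^C \cap B\big) + P(A \cap B) = P(B)$$
and so
$$P\big(A^C \cap B\big) = P(B) - P(A \cap B) = P(B) - P(A) P(B)$$
$$= P(B)(1 - P(A)) = P(B) \, P\big(A^C\big).$$
Therefore, from the lemma on $\pi$ systems, Lemma 9.2.2 on Page 219, it follows $\mathcal{G} \supseteq \sigma(\mathcal{K}) = \sigma(X_i : i \in I_1)$.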
Notation 17.5.6 In probability, it is standard to write $E(X)$ in place of
$$\int_{\Omega} X \, dP.$$
This is also referred to as the expectation.
Lemma 17.5.7 If $\{X_k\}_{k=1}^r$ are independent random variables having values in $Z$ a separable metric space, and if $g_k$ is a Borel measurable function, then $\{g_k(X_k)\}_{k=1}^r$
More generally, the above formula holds if it is only known that each $X_i \in L^1(\Omega; \mathbb{R})$ and
$$\prod_{i=1}^r X_i \in L^1(\Omega; \mathbb{R}).$$
Proof: First consider the claim about $\{g_k(X_k)\}_{k=1}^r$. Letting $O$ be an open set in $Z$,
$$(g_k \circ X_k)^{-1}(O) = X_k^{-1}\big(g_k^{-1}(O)\big) = X_k^{-1}(\text{Borel set}) \in \sigma(X_k).$$
It follows $(g_k \circ X_k)^{-1}(E)$ is in $\sigma(X_k)$ whenever $E$ is Borel because the sets whose inverse images are measurable include the Borel sets. Thus $\sigma(g_k \circ X_k) \subseteq \sigma(X_k)$ and this proves the first part of the lemma.
Let $X_1 = \sum_{i=1}^m c_i \mathcal{X}_{E_i}$, $X_2 = \sum_{j=1}^m d_j \mathcal{X}_{F_j}$ where $P(E_i \cap F_j) = P(E_i) P(F_j)$. Then
$$\int X_1 X_2 \, dP = \sum_{i,j} d_j c_i P(E_i) P(F_j) = \Big(\int X_1 \, dP\Big)\Big(\int X_2 \, dP\Big)$$
Next suppose there are $m$ of these independent bounded random variables. Then $\prod_{i=2}^m X_i \in \sigma(X_2, \cdots, X_m)$ and by Lemma 17.5.5 the two random variables $X_1$ and $\prod_{i=2}^m X_i$ are independent. Hence from the above and induction,
$$\int \prod_{i=1}^m X_i \, dP = \int X_1 \prod_{i=2}^m X_i \, dP = \int X_1 \, dP \int \prod_{i=2}^m X_i \, dP = \prod_{i=1}^m \int X_i \, dP$$
Now consider the last claim. Replace each $X_i$ with $X_i^n$ where this is just a truncation of the form
$$X_i^n \equiv \begin{cases} X_i & \text{if } |X_i| \le n \\ n & \text{if } X_i > n \\ -n & \text{if } X_i < -n \end{cases}$$
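$$\prod_{i=1}^n B_i = \bigcap_{i=1}^n \pi_i^{-1}(B_i), \text{ a Borel set.}$$
Thus
$$\mathbf{X}^{-1}\Big(\prod_{i=1}^n B_i\Big) = \bigcap_{i=1}^n X_i^{-1}(B_i) \in \sigma(X_1, \cdots, X_n)$$
If $\mathcal{G}$ denotes the Borel sets $F \subseteq \prod_{i=1}^n Z_i$ such that $\mathbf{X}^{-1}(F) \in \sigma(X_1, \cdots, X_n)$, then $\mathcal{G}$ is clearly a $\sigma$ algebra which contains the open sets. Hence $\mathcal{G} = \mathcal{B}$, the Borel sets of $\prod_{i=1}^n Z_i$. This shows that $\sigma(\mathbf{X}) \subseteq \sigma(X_1, \cdots, X_n)$. Next we observe that $\sigma(\mathbf{X})$ is a $\sigma$ algebra with the property that each $X_i$ is measurable with respect to $\sigma(\mathbf{X})$. This follows from $X_i^{-1}(B_i) = \mathbf{X}^{-1}\big(\prod_{j=1}^n A_j\big) \in \sigma(\mathbf{X})$, where each $A_j = Z_j$ except for $A_i = B_i$. Since $\sigma(X_1, \cdots, X_n)$ is defined as the smallest such $\sigma$ algebra, it follows that $\sigma(\mathbf{X}) \supseteq \sigma(X_1, \cdots, X_n)$.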
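Maybe this would be a good place to put a really interesting result known as the Doob Dynkin lemma. This amazing result is illustrated with the following diagram in which $\mathbf{X} = (X_1, \cdots, X_m)$. By Proposition 17.5.8 $\sigma(\mathbf{X}) = \sigma(X_1, \cdots, X_m)$.
[Diagram: $X$ maps $(\Omega, \sigma(\mathbf{X}))$ to $F$; $\mathbf{X}$ maps $(\Omega, \sigma(\mathbf{X}))$ to $\big(\prod_{i=1}^m E_i, \mathcal{B}(\prod_{i=1}^m E_i)\big)$; $g$ maps $\big(\prod_{i=1}^m E_i, \mathcal{B}(\prod_{i=1}^m E_i)\big)$ to $F$.]
You start with $X$ and can write it as the composition $g \circ \mathbf{X}$ provided $X$ is $\sigma(\mathbf{X})$ measurable.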
17.6 Exercises
1. A random variable $X : (\Omega, \mathcal{F}, P) \to \mathbb{R}$ is said to be normally distributed with mean $\mu$ and variance $\sigma^2 > 0$ if for all $E$ a Borel set in $\mathbb{R}$,
$$P([X \in E]) = \int_E \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \, dx.$$
Use the Kolmogorov extension theorem to show that there exists a probability space and random variables $\{\xi_i\}_{i=1}^{\infty}$ defined on this space such that each $\xi_i$ is normally distributed with mean 0 and variance 1 such that also the random variables $\xi_i$ are independent. Hint: For $i_1 < \cdots < i_m$, and $E$ a Borel set of $\mathbb{R}^m$, define
$$\nu_{i_1 \cdots i_m}(E) \equiv \int_E \frac{1}{\big(\sqrt{2\pi}\big)^m} e^{-\frac{1}{2}\left(x_{i_1}^2 + \cdots + x_{i_m}^2\right)} \, dm_m.$$
Show that this satisfies the necessary consistency condition for the Kolmogorov extension theorem.
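As a hedged sketch of where the hint leads (one possible route, not necessarily the intended solution), note that on a measurable rectangle the density factors, so Fubini's theorem gives a product of one dimensional integrals:

```latex
\nu_{s_1 \cdots s_p}\left(G_{s_1} \times \cdots \times G_{s_p}\right)
  = \prod_{k=1}^{p} \int_{G_{s_k}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx ,
\qquad
\int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 1 .
```

Every slot with $G_{s_k} = \mathbb{R}$ therefore contributes a factor of 1, and what remains is the corresponding product over the indices $t_j$, which is the form required by the consistency condition on rectangles.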
2. Show using the Hausdorff maximal theorem that there exists a maximal sequence of orthonormal functions $\{g_i\}$ in $L^2(0, \infty)$ which is countable. Then show that if $f \in L^2(0, \infty)$,
$$||f||_{L^2(0,\infty)}^2 = \sum_{i=1}^{\infty} |(f, g_i)|^2.$$
converges in $L^2(\Omega)$. Thus the above series yields a random variable for each
t. W (t) is called the Wiener process or one dimensional Brownian motion.
Much more can be said about it.
Banach Spaces
The following remarkable result is called the Baire category theorem. To get an
idea of its meaning, imagine you draw a line in the plane. The complement of this
line is an open set and is dense because every point, even one on the line, is a limit
point of this open set. Now draw another line. The complement of the two lines
is still open and dense. Keep drawing lines and looking at the complements of the
union of these lines. You always have an open set which is dense. Now what if there
were countably many lines? The Baire category theorem implies the complement
of the union of these lines is dense. In particular it is nonempty. Thus you cannot
write the plane as a countable union of lines. This is a rather rough description of
this very important theorem. The precise statement and proof follow.
Theorem 18.1.2 Let $(X, d)$ be a complete metric space and let $\{U_n\}_{n=1}^{\infty}$ be a sequence of open subsets of $X$ satisfying $\overline{U_n} = X$ ($U_n$ is dense). Then $D \equiv \cap_{n=1}^{\infty} U_n$ is a dense subset of $X$.
r0 p
p 1
rn < 2n,
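$$\lim_{n \to \infty} p_n = p_{\infty}.$$
Since all but finitely many terms of $\{p_n\}$ are in $\overline{B(p_m, r_m)}$, it follows that $p_{\infty} \in \overline{B(p_m, r_m)}$ for each $m$. Therefore,
$$p_{\infty} \in \cap_{m=1}^{\infty} \overline{B(p_m, r_m)} \subseteq \cap_{i=1}^{\infty} U_i \cap B(p, r_0).$$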
Proof: If every $F_i$ had empty interior, then each $F_i^C$ would be a dense open set. Therefore, from Theorem 18.1.2, it would follow that
$$\emptyset = \big(\cup_{i=1}^{\infty} F_i\big)^C = \cap_{i=1}^{\infty} F_i^C \ne \emptyset.$$
18.1. THEOREMS BASED ON BAIRE CATEGORY 443
implies
$$\lim_{n \to \infty} f(x_n) = f(x).$$
Theorem 18.1.4 Let $X$ and $Y$ be two normed linear spaces and let $L : X \to Y$ be linear ($L(ax + by) = aL(x) + bL(y)$ for $a, b$ scalars and $x, y \in X$). The following are equivalent
a.) $L$ is continuous at 0
b.) $L$ is continuous
c.) There exists $K > 0$ such that $||Lx||_Y \le K ||x||_X$ for all $x \in X$ ($L$ is bounded).
$$L(x_n - x) = Lx_n - Lx \to 0$$
Hence
$$||Lx|| \le \frac{1}{\delta} ||x||.$$
Note that from Theorem 18.1.4 $||L||$ is well defined because of part c.) of that Theorem.
The next lemma follows immediately from the definition of the norm and the assumption that $L$ is linear.
Lemma 18.1.6 With $||L||$ defined in 18.1.1, $\mathcal{L}(X, Y)$ is a normed linear space. Also $||Lx|| \le ||L|| \, ||x||$.
Therefore, multiplying both sides by $||x||$, $||Lx|| \le ||L|| \, ||x||$. This is obviously a linear space. It remains to verify the operator norm really is a norm. First of all, if $||L|| = 0$, then $Lx = 0$ for all $||x|| \le 1$. It follows that for any $x \ne 0$, $0 = L\Big(\frac{x}{||x||}\Big)$
This shows the operator norm is really a norm as hoped. This proves the lemma.
For example, consider the space of linear transformations defined on $\mathbb{R}^n$ having values in $\mathbb{R}^m$. The fact the transformation is linear automatically imparts continuity to it. You should give a proof of this fact. Recall that every such linear transformation can be realized in terms of matrix multiplication.
Thus, in finite dimensions the algebraic condition that an operator is linear is sufficient to imply the topological condition that the operator is continuous. The situation is not so simple in infinite dimensional spaces such as $C(X; \mathbb{R}^n)$. This explains the imposition of the topological condition of continuity as a criterion for membership in $\mathcal{L}(X, Y)$ in addition to the algebraic condition of linearity.
$$Lx = \lim_{n \to \infty} L_n x.$$
Also $L$ is continuous. To see this, note that $\{||L_n||\}$ is a Cauchy sequence of real numbers because $\big| ||L_n|| - ||L_m|| \big| \le ||L_n - L_m||$. Hence there exists $K > \sup\{||L_n|| : n \in \mathbb{N}\}$. Thus, if $x \in X$,
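Theorem 18.1.8 Let $X$ be a Banach space and let $Y$ be a normed linear space. Let $\{L_{\alpha}\}_{\alpha \in \Lambda}$ be a collection of elements of $\mathcal{L}(X, Y)$. Then one of the following happens.
a.) $\sup\{||L_{\alpha}|| : \alpha \in \Lambda\} < \infty$
b.) There exists a dense $G_{\delta}$ set, $D$, such that for all $x \in D$,
$$\sup\{||L_{\alpha} x|| : \alpha \in \Lambda\} = \infty.$$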
But then, since $L_{\alpha}$ is continuous, this situation persists for all $y$ sufficiently close to $x$, say for all $y \in B(x, \delta)$. Then $B(x, \delta) \subseteq U_n$ which shows $U_n$ is open.
Case b.) is obtained from Theorem 18.1.2 if each $U_n$ is dense.
The other case is that for some $n$, $U_n$ is not dense. If this occurs, there exists $x_0$ and $r > 0$ such that for all $x \in B(x_0, r)$, $||L_{\alpha} x|| \le n$ for all $\alpha$. Now if $y \in B(0, r)$, $x_0 + y \in B(x_0, r)$. Consequently, for all such $y$, $||L_{\alpha}(x_0 + y)|| \le n$. This implies that for all $\alpha \in \Lambda$ and $||y|| < r$,
Theorem 18.1.9 Let $X$ and $Y$ be Banach spaces, let $L \in \mathcal{L}(X, Y)$, and suppose $L$ is onto. Then $L$ maps open sets onto open sets.
Then
$$\overline{L(B(0, b))} \subseteq L(B(0, 2b)).$$
Proof of Lemma 18.1.10: Let $y \in \overline{L(B(0, b))}$. There exists $x_1 \in B(0, b)$ such that $||y - Lx_1|| < \frac{a}{2}$. Now this implies
Thus $2y - 2Lx_1 \in \overline{L(B(0, b))}$ just like $y$ was. Therefore, there exists $x_2 \in B(0, b)$ such that $||2y - 2Lx_1 - Lx_2|| < a/2$. Hence $||4y - 4Lx_1 - 2Lx_2|| < a$, and there exists $x_3 \in B(0, b)$ such that $||4y - 4Lx_1 - 2Lx_2 - Lx_3|| < a/2$. Continuing in this way, there exist $x_1, x_2, x_3, x_4, \ldots$ in $B(0, b)$ such that
$$\Big|\Big| 2^n y - \sum_{i=1}^n 2^{n-(i-1)} L(x_i) \Big|\Big| < a$$
which implies
$$\Big|\Big| y - \sum_{i=1}^n 2^{-(i-1)} L(x_i) \Big|\Big| = \Big|\Big| y - L\Big(\sum_{i=1}^n 2^{-(i-1)} (x_i)\Big) \Big|\Big| < 2^{-n} a \tag{18.1.2}$$
Now consider the partial sums of the series, $\sum_{i=1}^{\infty} 2^{-(i-1)} x_i$.
$$\Big|\Big| \sum_{i=m}^n 2^{-(i-1)} x_i \Big|\Big| \le b \sum_{i=m}^{\infty} 2^{-(i-1)} = b \, 2^{-m+2}.$$
Therefore, these partial sums form a Cauchy sequence and so since $X$ is complete, there exists $x = \sum_{i=1}^{\infty} 2^{-(i-1)} x_i$. Letting $n \to \infty$ in 18.1.2 yields $||y - Lx|| = 0$. Now
$$||x|| = \lim_{n \to \infty} \Big|\Big| \sum_{i=1}^n 2^{-(i-1)} x_i \Big|\Big| \le \lim_{n \to \infty} \sum_{i=1}^n 2^{-(i-1)} ||x_i|| < \lim_{n \to \infty} \sum_{i=1}^n 2^{-(i-1)} b = 2b.$$
The reason for the last inclusion is that from the above, if $y_1 \in B(-y, r)$ and $y_2 \in B(y, r)$, there exists $x_n, z_n \in B(0, n_0)$ such that
$$Lx_n \to y_1, \quad Lz_n \to y_2.$$
Therefore,
$$||x_n + z_n|| \le 2n_0$$
and so $(y_1 + y_2) \in \overline{L(B(0, 2n_0))}$.
By Lemma 18.1.10, $\overline{L(B(0, 2n_0))} \subseteq L(B(0, 4n_0))$ which shows
Hence
$$Lx \in B(Lx, ar) \subseteq L(U).$$
which shows that every point, $Lx \in LU$, is an interior point of $LU$ and so $LU$ is open. This proves the theorem.
This theorem is surprising because it implies that if $|\cdot|$ and $||\cdot||$ are two norms with respect to which a vector space $X$ is a Banach space such that $|\cdot| \le K ||\cdot||$, then there exists a constant $k$, such that $||\cdot|| \le k |\cdot|$. This can be useful because sometimes it is not clear how to compute $k$ when all that is needed is its existence. To see the open mapping theorem implies this, consider the identity map $\text{id}\, x = x$. Then $\text{id} : (X, ||\cdot||) \to (X, |\cdot|)$ is continuous and onto. Hence id is an open map which implies $\text{id}^{-1}$ is continuous. Theorem 18.1.4 gives the existence of the constant $k$.
There are other ways to give a norm for $X \times Y$. For example, you could define
$$||(x, y)|| = ||x|| + ||y||$$
Proof: The only axiom for a norm which is not obvious is the triangle inequality. Therefore, consider
It is obvious $X \times Y$ is a vector space from the above definition. This proves the lemma.
Lemma 18.1.14 If $X$ and $Y$ are Banach spaces, then $X \times Y$ with the norm and vector space operations defined in Definition 18.1.12 is also a Banach space.
Proof: The only thing left to check is that the space is complete. But this follows from the simple observation that $\{(x_n, y_n)\}$ is a Cauchy sequence in $X \times Y$ if and only if $\{x_n\}$ and $\{y_n\}$ are Cauchy sequences in $X$ and $Y$ respectively. Thus if $\{(x_n, y_n)\}$ is a Cauchy sequence in $X \times Y$, it follows there exist $x$ and $y$ such that $x_n \to x$ and $y_n \to y$. But then from the definition of the norm, $(x_n, y_n) \to (x, y)$.
Note the distinction between closed and continuous. If the operator is closed the assertion that $y = Lx$ only follows if it is known that the sequence $\{Lx_n\}$ converges. In the case of a continuous operator, the convergence of $\{Lx_n\}$ follows from the assumption that $x_n \to x$. It is not always the case that a mapping which is closed is necessarily continuous. Consider the function $f(x) = \tan(x)$ if $x$ is not an odd multiple of $\frac{\pi}{2}$ and $f(x) \equiv 0$ at every odd multiple of $\frac{\pi}{2}$. Then the graph is closed and the function is defined on $\mathbb{R}$ but it clearly fails to be continuous. Of course this function is not linear. You could also consider the map,
$$\frac{d}{dx} : \big\{ y \in C^1([0, 1]) : y(0) = 0 \big\} \equiv D \to C([0, 1]),$$
where the norm is the uniform norm on $C([0, 1])$, $||y||_{\infty}$. If $y \in D$, then
$$y(x) = \int_0^x y'(t) \, dt.$$
Therefore, if $\frac{dy_n}{dx} \to f \in C([0, 1])$ and if $y_n \to y$ in $C([0, 1])$ it follows that
$$y_n(x) = \int_0^x \frac{dy_n(t)}{dx} \, dt \quad \longrightarrow \quad y(x) = \int_0^x f(t) \, dt$$
and so by the fundamental theorem of calculus $f(x) = y'(x)$ and so the mapping is closed. It is obviously not continuous because it takes $y(x)$ and $y(x) + \frac{1}{n} \sin(nx)$ to two functions which are far from each other even though these two functions are very close in $C([0, 1])$. Furthermore, it is not defined on the whole space, $C([0, 1])$.
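The failure of continuity can be seen numerically. The sketch below approximates uniform norms on $[0,1]$ by sampling a grid, which is an assumption of the illustration (the helper `sup_norm` and the choice `n = 1000` are not from the text): the perturbation $\frac{1}{n}\sin(nx)$ is tiny in $C([0,1])$, while its derivative $\cos(nx)$ has uniform norm 1.

```python
import math

def sup_norm(h, samples=2001):
    """Approximate the uniform norm of h on [0, 1] by sampling a grid."""
    return max(abs(h(k / (samples - 1))) for k in range(samples))

n = 1000  # assumed frequency; any large n behaves similarly

def perturbation(x):
    return math.sin(n * x) / n       # (1/n) sin(nx), which vanishes at x = 0

def derivative(x):
    return math.cos(n * x)           # derivative of the perturbation, cos(nx)

size_of_input_change = sup_norm(perturbation)    # at most 1/n
size_of_output_change = sup_norm(derivative)     # equals 1, attained at x = 0
```

So two functions in $D$ that differ by at most $1/n$ in the uniform norm are sent by $d/dx$ to functions whose distance is 1, no matter how large $n$ is.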
The next theorem, the closed graph theorem, gives conditions under which closed
implies continuous.
By Theorem 18.1.4 on Page 443, this shows L is continuous and proves the theorem.
The following corollary is quite useful. It shows how to obtain a new norm on
the domain of a closed operator such that the domain with this new norm becomes
a Banach space.
Proof: If $\{x_n\}$ is a Cauchy sequence in $D$ with this new norm, it follows both $\{x_n\}$ and $\{Lx_n\}$ are Cauchy sequences and therefore, they converge. Since $L$ is closed, $x_n \to x$ and $Lx_n \to Lx$ for some $x \in D$. Thus $||x_n - x||_D \to 0$.
$x \le x$ for all $x \in \mathcal{F}$.
If $x \le y$ and $y \le z$ then $x \le z$.
$\mathcal{C} \subseteq \mathcal{F}$ is said to be a chain if every two elements of $\mathcal{C}$ are related. This means that if $x, y \in \mathcal{C}$, then either $x \le y$ or $y \le x$. Sometimes a chain is called a totally ordered set. $\mathcal{C}$ is said to be a maximal chain if whenever $\mathcal{D}$ is a chain containing $\mathcal{C}$, $\mathcal{D} = \mathcal{C}$.
18.2. HAHN BANACH THEOREM 451
The most common example of a partially ordered set is the power set of a given set with $\subseteq$ being the relation. It is also helpful to visualize partially ordered sets as trees. Two points on the tree are related if they are on the same branch of the tree and one is higher than the other. Thus two points on different branches would not be related although they might both be larger than some point on the trunk. You might think of many other things which are best considered as partially ordered sets. Think of food for example. You might find it difficult to determine which of two favorite pies you like better although you may be able to say very easily that you would prefer either pie to a dish of lard topped with whipped cream and mustard. The following theorem is equivalent to the axiom of choice. For a discussion of this, see the appendix on the subject.
Theorem 18.2.2 (Hausdorff Maximal Principle) Let $\mathcal{F}$ be a nonempty partially ordered set. Then there exists a maximal chain.
Definition 18.2.3 Let $X$ be a real vector space. $\rho : X \to \mathbb{R}$ is called a gauge function if
$$\rho(x + y) \le \rho(x) + \rho(y),$$
$$\rho(ax) = a\rho(x) \text{ if } a \ge 0. \tag{18.2.4}$$
Suppose $M$ is a subspace of $X$ and $z \notin M$. Suppose also that $f$ is a linear real-valued function having the property that $f(x) \le \rho(x)$ for all $x \in M$. Consider the problem of extending $f$ to $M \oplus \mathbb{R}z$ such that if $F$ is the extended function, $F(y) \le \rho(y)$ for all $y \in M \oplus \mathbb{R}z$ and $F$ is linear. Since $F$ is to be linear, it suffices to determine how to define $F(z)$. Letting $a > 0$, it is required to define $F(z)$ such that the following hold for all $x, y \in M$.
$$\overbrace{F(x)}^{f(x)} + aF(z) = F(x + az) \le \rho(x + az),$$
$$\overbrace{F(y)}^{f(y)} - aF(z) = F(y - az) \le \rho(y - az). \tag{18.2.5}$$
Now if these inequalities hold for all $y/a$, they hold for all $y$ because $M$ is given to be a subspace. Therefore, multiplying by $a^{-1}$, 18.2.4 implies that what is needed is to choose $F(z)$ such that for all $x, y \in M$,
$$f(x) + F(z) \le \rho(x + z), \quad f(y) - \rho(y - z) \le F(z)$$
and that if $F(z)$ can be chosen in this way, this will satisfy 18.2.5 for all $x, y$ and the problem of extending $f$ will be solved. Hence it is necessary to choose $F(z)$ such that for all $x, y \in M$
$$f(y) - \rho(y - z) \le F(z) \le \rho(x + z) - f(x). \tag{18.2.6}$$
Is there any such number between $f(y) - \rho(y - z)$ and $\rho(x + z) - f(x)$ for every pair $x, y \in M$? This is where $f(x) \le \rho(x)$ on $M$ and that $f$ is linear is used.
For $x, y \in M$,
$$\rho(x + z) - f(x) - [f(y) - \rho(y - z)]$$
$$= \rho(x + z) + \rho(y - z) - (f(x) + f(y))$$
$$\ge \rho(x + y) - f(x + y) \ge 0.$$
Therefore there exists a number between
$$\sup\{f(y) - \rho(y - z) : y \in M\}$$
and
$$\inf\{\rho(x + z) - f(x) : x \in M\}$$
Choose $F(z)$ to satisfy 18.2.6. This has proved the following lemma.
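A concrete instance may help to see the interval of admissible values for $F(z)$. The example below is hypothetical and not from the text: $X = \mathbb{R}^2$, $M$ is the first coordinate axis, $f(t, 0) = t$, $z = (0, 1)$, and the gauge function is the $\ell^1$ norm. The supremum and infimum are taken over a finite sample of integer points of $M$; in this example they coincide with the true values, since $t - |t| - 1 \le -1$ and $|t| + 1 - t \ge 1$ for all real $t$.

```python
# Hypothetical example in X = R^2: M is the first coordinate axis,
# f(t, 0) = t, z = (0, 1), and the gauge function rho is the l^1 norm.
def rho(v):
    return abs(v[0]) + abs(v[1])

def f(x):
    return x[0]          # linear on M, and f(x) <= rho(x) there

z = (0, 1)
M_sample = [(t, 0) for t in range(-50, 51)]   # integer points of M

# sup of f(y) - rho(y - z) and inf of rho(x + z) - f(x) over the sample
lower = max(f(y) - rho((y[0] - z[0], y[1] - z[1])) for y in M_sample)
upper = min(rho((x[0] + z[0], x[1] + z[1])) - f(x) for x in M_sample)
```

Any value of $F(z)$ in the interval from `lower` to `upper` gives a linear extension $F(t, s) = t + sF(z)$ dominated by this gauge function, which also shows the extension need not be unique.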
Lemma 18.2.4 Let $M$ be a subspace of $X$, a real linear space, and let $\rho$ be a gauge function on $X$. Suppose $f : M \to \mathbb{R}$ is linear, $z \notin M$, and $f(x) \le \rho(x)$ for all $x \in M$. Then $f$ can be extended to $M \oplus \mathbb{R}z$ such that, if $F$ is the extended function, $F$ is linear and $F(x) \le \rho(x)$ for all $x \in M \oplus \mathbb{R}z$.
Theorem 18.2.5 (Hahn Banach theorem) Let $X$ be a real vector space, let $M$ be a subspace of $X$, let $f : M \to \mathbb{R}$ be linear, let $\rho$ be a gauge function on $X$, and suppose $f(x) \le \rho(x)$ for all $x \in M$. Then there exists a linear function, $F : X \to \mathbb{R}$, such that
a.) $F(x) = f(x)$ for all $x \in M$
b.) $F(x) \le \rho(x)$ for all $x \in X$.
$$(V, g) \le (W, h)$$
means
$$V \subseteq W \text{ and } h(x) = g(x) \text{ if } x \in V.$$
By Theorem 18.2.2, there exists a maximal chain, $\mathcal{C} \subseteq \mathcal{F}$. Let $Y = \cup\{V : (V, g) \in \mathcal{C}\}$ and let $h : Y \to \mathbb{R}$ be defined by $h(x) = g(x)$ where $x \in V$ and $(V, g) \in \mathcal{C}$. This is well defined because if $x \in V_1$ and $V_2$ where $(V_1, g_1)$ and $(V_2, g_2)$ are both in the chain, then since $\mathcal{C}$ is a chain, the two elements are related. Therefore, $g_1(x) = g_2(x)$. Also $h$ is linear because if $ax + by \in Y$, then $x \in V_1$ and $y \in V_2$ where $(V_1, g_1)$ and $(V_2, g_2)$ are elements of $\mathcal{C}$. Therefore, letting $V$ denote the larger of the two $V_i$, and $g$ be the function that goes with $V$, it follows $ax + by \in V$ where $(V, g) \in \mathcal{C}$.
Therefore,
Also, $h(x) = g(x) \le \rho(x)$ for any $x \in Y$ because for such $x$, $x \in V$ where $(V, g) \in \mathcal{C}$.
Is $Y = X$? If not, there exists $z \in X \setminus Y$ and there exists an extension of $h$ to $Y \oplus \mathbb{R}z$ using Lemma 18.2.4. Letting $\overline{h}$ denote this extended function contradicts the maximality of $\mathcal{C}$. Indeed, $\mathcal{C} \cup \{(Y \oplus \mathbb{R}z, \overline{h})\}$ would be a longer chain. This proves the Hahn Banach theorem.
This is the original version of the theorem. There is also a version of this theorem
for complex vector spaces which is based on a trick.
$$\text{Re}\, f(x + y) - i \,\text{Re}\, f(i(x + y)) = f(x + y)$$
$$= f(x) + f(y)$$
Actually, $|h(x)| \le K ||x||$. The reason for this is that $h(-x) = -h(x) \le K ||-x|| = K ||x||$ and therefore, $h(x) \ge -K ||x||$. Let
If c is a real scalar,
Now
Thus
Definition 18.2.8 Let $X$ and $Y$ be Banach spaces and suppose $L \in \mathcal{L}(X, Y)$. Then define the adjoint map in $\mathcal{L}(Y', X')$, denoted by $L^*$, by
$$L^* y^*(x) \equiv y^*(Lx)$$
for all $y^* \in Y'$.
Proof:
Definition 18.3.1 A Banach space is uniformly convex if whenever $||x_n||, ||y_n|| \le 1$ and $||x_n + y_n|| \to 2$, it follows that $||x_n - y_n|| \to 0$.
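A finite dimensional illustration of the definition, offered as a sketch with hypothetical vectors: in $\mathbb{R}^2$ with the max norm, the constant sequences $x_n = (1, 1)$ and $y_n = (1, -1)$ have norm 1 and $||x_n + y_n|| = 2$, yet $||x_n - y_n|| = 2$, so that norm is not uniformly convex. With the Euclidean norm, the parallelogram law $||u+v||^2 + ||u-v||^2 = 2||u||^2 + 2||v||^2$ forces $||u - v||$ to be small whenever unit vectors have $||u + v||$ close to 2.

```python
import math

def norm2(v):
    return math.hypot(v[0], v[1])            # Euclidean norm on R^2

def norm_inf(v):
    return max(abs(v[0]), abs(v[1]))         # max norm on R^2

x, y = (1.0, 1.0), (1.0, -1.0)
s = (x[0] + y[0], x[1] + y[1])               # x + y = (2, 0)
d = (x[0] - y[0], x[1] - y[1])               # x - y = (0, 2)

# Max norm: unit vectors with ||x + y|| = 2 and yet ||x - y|| = 2.
max_norm_values = (norm_inf(x), norm_inf(y), norm_inf(s), norm_inf(d))

# Euclidean norm: nearby unit vectors u, v; the parallelogram law ties
# ||u + v|| and ||u - v|| together.
theta = 1e-3
u, v = (1.0, 0.0), (math.cos(theta), math.sin(theta))
sum_norm = norm2((u[0] + v[0], u[1] + v[1]))
diff_norm = norm2((u[0] - v[0], u[1] - v[1]))
```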
You can show that uniform convexity implies strict convexity. There are various other things which can also be shown. See the exercises for some of these. In this section, it will be shown that the $L^p$ spaces are examples of uniformly convex spaces. This involves some inequalities known as Clarkson's inequalities. Before presenting these, here are the backwards Hölder inequality and the backwards Minkowski inequality.
Lemma 18.3.2 Let $0 < p < 1$ and let $f, g$ be measurable functions. Also
$$\int |g|^{p/(p-1)} \, d\mu < \infty, \quad \int |f|^p \, d\mu < \infty$$
This makes sense because, due to the hypothesis on $g$ it must be the case that $g$ equals 0 only on a set of measure zero, since $p/(p-1) < 0$. Then
$$\int |f|^p \, d\mu \le \Big(\int |fg| \, d\mu\Big)^p \Big(\int \Big(\frac{1}{|g|^p}\Big)^{1/(1-p)} d\mu\Big)^{1-p}$$
$$= \Big(\int |fg| \, d\mu\Big)^p \Big(\int |g|^{p/(p-1)} \, d\mu\Big)^{1-p}$$
Proof: If (|f | + |g|) d = 0 then there is nothing to prove so assume this is
not zero.
p p1
(|f | + |g|) d = (|f | + |g|) (|f | + |g|) d
p p p
(|f | + |g|) |f | + |g| and so
( )p/p1
p1
(|f | + |g|) d < .
( ( [ )1/p ]
)p/p1 )(p1)/p ( )1/p (
p1 p p
(|f | + |g|) |f | d + |g| d
( )(p1)/p [( )1/p ( )1/p ]
p p p
= (|f | + |g|) |f | d + |g| d
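Proof: It is clear that, since $p \ge 2$, the inequality holds for $t = 0$ and $t = 1$. Thus it suffices to consider only $t \in (0, 1)$. Let $x = 1/t$. Then, dividing by $t^p$, the inequality holds if and only if
$$\Big(\frac{x + 1}{2}\Big)^p + \Big(\frac{x - 1}{2}\Big)^p \le \frac{1}{2}(1 + x^p)$$
for all $x \ge 1$. Let
$$f(x) = \frac{1}{2}(1 + x^p) - \Big(\Big(\frac{x + 1}{2}\Big)^p + \Big(\frac{x - 1}{2}\Big)^p\Big)$$
Then $f(1) = 0$ and
$$f'(x) = \frac{p}{2} x^{p-1} - \Big(\frac{p}{2}\Big(\frac{x + 1}{2}\Big)^{p-1} + \frac{p}{2}\Big(\frac{x - 1}{2}\Big)^{p-1}\Big)$$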
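Proof: One of $|w|, |z|$ is larger. Say $|z| \ge |w|$. Then dividing both sides of the proposed inequality by $|z|^p$ it suffices to verify that for all complex $t$ having $|t| \le 1$,
$$\Big|\frac{1 + t}{2}\Big|^p + \Big|\frac{1 - t}{2}\Big|^p \le \frac{1}{2}(|t|^p + 1)$$
It is $2^{-p}$ times
$$\big((1 + r\cos\theta)^2 + r^2 \sin^2(\theta)\big)^{p/2} + \big((1 - r\cos\theta)^2 + r^2 \sin^2(\theta)\big)^{p/2}$$
$$= \big(1 + r^2 + 2r\cos\theta\big)^{p/2} + \big(1 + r^2 - 2r\cos\theta\big)^{p/2},$$
a continuous periodic function for $\theta \in \mathbb{R}$ which achieves its maximum value when $\theta = 0$. This follows from the first derivative test from calculus. Therefore, for $|t| \le 1$,
$$\Big|\frac{1 + t}{2}\Big|^p + \Big|\frac{1 - t}{2}\Big|^p \le \Big|\frac{1 + |t|}{2}\Big|^p + \Big|\frac{1 - |t|}{2}\Big|^p \le \frac{1}{2}(1 + |t|^p)$$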
Lemma 18.3.7 For $1 < p < 2$, the following inequality holds for all $t \in [0, 1]$.
$$\Big|\frac{1 + t}{2}\Big|^q + \Big|\frac{1 - t}{2}\Big|^q \le \Big(\frac{1}{2} + \frac{1}{2}|t|^p\Big)^{q/p}$$
where here $1/p + 1/q = 1$ so $q > 2$.
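Before reading the proof, one can probe the inequality of Lemma 18.3.7 numerically. The sketch below samples $t$ on a grid in $[0, 1]$ for a few exponents $p$ in $(1, 2)$ and records the largest value of the left side minus the right side; the grid size and the sample exponents are arbitrary choices, and a numerical check of this kind supports the statement but is not a proof.

```python
def lhs(t, q):
    return abs((1 + t) / 2) ** q + abs((1 - t) / 2) ** q

def rhs(t, p, q):
    return (0.5 + 0.5 * abs(t) ** p) ** (q / p)

def worst_gap(p, steps=1000):
    """Largest value of lhs - rhs over a grid of t in [0, 1]."""
    q = p / (p - 1)                      # conjugate exponent, 1/p + 1/q = 1
    return max(lhs(k / steps, q) - rhs(k / steps, p, q) for k in range(steps + 1))

gaps = {p: worst_gap(p) for p in (1.1, 1.5, 1.9)}   # sample exponents in (1, 2)
```

Equality holds at $t = 0$ and $t = 1$, so the largest gap is zero up to rounding error, which is why a comparison against a small tolerance is appropriate.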
18.3. UNIFORM CONVEXITY OF LP 461
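Proof: First note that if $t = 0$ or $1$, the inequality holds. Next observe that the map $s \to \frac{1 - s}{1 + s}$ maps $(0, 1)$ onto $(0, 1)$. Replace $t$ with $(1 - s)/(1 + s)$. Then you get
$$\Big|\frac{1}{s + 1}\Big|^q + \Big|\frac{s}{s + 1}\Big|^q \le \Big(\frac{1}{2} + \frac{1}{2}\Big|\frac{1 - s}{s + 1}\Big|^p\Big)^{q/p}$$
Multiplying both sides by $(1 + s)^q$, this is equivalent to showing that for all $s \in (0, 1)$,
$$1 + s^q \le \big((1 + s)^p\big)^{q/p} \Big(\frac{1}{2} + \frac{1}{2}\Big|\frac{1 - s}{s + 1}\Big|^p\Big)^{q/p}$$
$$= \Big(\frac{1}{2}\Big)^{q/p} \big((1 + s)^p + (1 - s)^p\big)^{q/p}$$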
From the above observation about the binomial coefficients, the above is larger than
$$\sum_{k=1}^{\infty} \Big[\binom{p}{2k} s^{2k} - \binom{p - 1}{2k - 1} s^{q(2k-1)}\Big]$$
It remains to show the $k^{th}$ term in the above sum is nonnegative. Now $q(2k - 1) > 2k$ for all $k \ge 1$ because $q > 2$. Then since $0 < s < 1$
$$\binom{p}{2k} s^{2k} - \binom{p - 1}{2k - 1} s^{q(2k-1)} \ge s^{2k} \Big(\binom{p}{2k} - \binom{p - 1}{2k - 1}\Big)$$
f + g p f g p
d + d
2 2
( ( ) )q/p ( ( ) )q/p
f + g q p/q f g q p/q
= d + d
2 2
Now p/q < 1 and so the backwards Minkowski inequality applies. Thus
( ( ) )q/p
f + g q f g q p/q
2 + 2 d
Now with these Clarkson inequalities, it is not hard to show that all the Lp
spaces are uniformly convex.
n
Next suppose 1 < p < 2 and fn +g
2 p 1. Then from the second Clarkson
L
inequality
q ( )q/p
fn + gn q
+ fn gn 1 ||fn ||p p + 1 ||gn ||p p 1
2 p 2 p 2 L
2 L
L L
A (x) max
|x (x)| (18.4.9)
x A
A (x + y) A (x) + A (y) ,
A (ax) = |a| A (x) .
BA (x , r) {y X : A (y x ) < r} . (18.4.12)
A (x + y ) (x ) + A (y ) ,
A (ax ) = |a| A (x ) .
Proof: The two assertions are very similar. I will verify the one for the weak
topology. The union of these sets, BA (x, r) for x X and r > 0 is all of X. Now
suppose z is contained in the intersection of two of these sets. Say
r A (z x) > C (y z) A (y z)
and so
r > A (y z) + A (z x) A (y x)
which shows y BA (x, r) . Similar reasoning shows y BA1 (x1 , r1 ) and so
Therefore, the weak topology consists of the union of all sets of the form BA (x, r).
Definition 18.4.2 Let $I$ be a set and suppose for each $i \in I$, $(X_i, \tau_i)$ is a nonempty topological space. The Cartesian product of the $X_i$, denoted by $\prod_{i \in I} X_i$, consists of the set of all choice functions defined on $I$ which select a single element of each $X_i$. Thus $f \in \prod_{i \in I} X_i$ means for every $i \in I$, $f(i) \in X_i$. The axiom of choice says $\prod_{i \in I} X_i$ is nonempty. Let
$$P_j(A) = \prod_{i \in I} B_i$$
Theorem 18.4.4 Let $B'$ be the closed unit ball in $X'$. Then $B'$ is compact in the weak $*$ topology.
is compact in the product topology where the topology on $\overline{B(0, ||x||)}$ is the usual topology of $\mathbb{F}$. Recall $P$ is the set of functions which map a point, $x \in X$ to a point in $\overline{B(0, ||x||)}$. Therefore, $B' \subseteq P$. Also the basic open sets in the weak $*$ topology on $B'$ are obtained as the intersection of basic open sets in the product topology of $P$ to $B'$ and so it suffices to show $B'$ is a closed subset of $P$. Suppose then that $f \in P \setminus B'$. Since $|f(x)| \le ||x||$ for each $x$, it follows $f$ cannot be linear. There are two ways this can happen. One way is that for some $x, y$
$$f(x + y) \ne f(x) + f(y)$$
and $f(x_n) = g(x_n)$. It follows that $|f(x) - g(x)| < 2r$. Since $r$ is arbitrary, this implies $f(x) = g(x)$. It is routine to verify the triangle inequality from the easy to establish inequality,
$$\frac{x}{1 + x} + \frac{y}{1 + y} \ge \frac{x + y}{1 + x + y},$$
18.4. WEAK AND WEAK TOPOLOGIES 467
Proof: By Theorem 18.4.5, $K$ is a metric space for the metric described there and it is compact. Therefore by the characterization of compact metric spaces, Proposition 15.2.5 on Page 366, $K$ is sequentially compact. This proves the corollary.
\[ Jx(f) \equiv f(x) \]
and let $X$ be reflexive so that $J$ is onto. Then $J$ is a homeomorphism of ($X$, weak topology) and ($X''$, weak* topology). This means $J$ is one to one, onto, and both $J$ and $J^{-1}$ are continuous.
Thus $B_f(x, r)$ is a subbasic set for the weak topology on $X$. I claim that
where $B_f(Jx, r)$ is a subbasic set for the weak* topology. If $y \in B_f(x, r)$, then $\rho_f(Jy - Jx) = \rho_f(x - y) < r$ and so $JB_f(x, r) \subseteq B_f(Jx, r)$. Now if $x^{**} \in B_f(Jx, r)$, then since $X$ is reflexive, there exists $y \in X$ such that $Jy = x^{**}$ and so
\[ \rho_f(y - x) = \rho_f(Jy - Jx) < r \]
showing that $JB_f(x, r) = B_f(Jx, r)$. A typical subbasic set in the weak* topology is of the form $B_f(Jx, r)$. Thus $J$ maps the subbasic sets of the weak topology to the subbasic sets of the weak* topology. Therefore, $J$ is a homeomorphism as claimed.
The following is an easy corollary.
Corollary 18.4.8 If $X$ is a reflexive Banach space, then the closed unit ball is weakly compact.
Proof: This follows from Theorem 18.4.5 and Lemma 18.4.7. Lemma 18.4.7
implies J (K) is compact in X . Then since X is separable in the weak topology,
X is separable in the weak topology and so there is a metric, d on J (K) which
delivers the weak topology on J (K). Let d (x, y) d (Jx, Jy) . Then
J id J 1
(K, d ) (J (K) , d ) (J (K) , weak ) (K, weak )
Lemma 18.4.10 Let $Y$ be a closed subspace of a Banach space $X$ and let $y \in X \setminus Y$. Then there exists $x^* \in X'$ such that $x^*(Y) = 0$ but $x^*(y) \ne 0$.
|f (x + y)| k x + y
Proof: Let $Y$ be the closed subspace of the reflexive space $X$. Consider the following diagram
\[
\begin{array}{ccc}
Y'' & \xrightarrow{\ i^{**}\ \text{1-1}\ } & X'' \\
Y' & \xleftarrow{\ i^{*}\ \text{onto}\ } & X' \\
Y & \xrightarrow{\ i\ } & X
\end{array}
\]
This diagram follows from Theorem 18.2.10 on Page 455, the theorem on adjoints. Now let $y^{**} \in Y''$. Then $i^{**}y^{**} = J_X(y)$ because $X$ is reflexive. I want to show that $y \in Y$. If it is not in $Y$ then since $Y$ is closed, there exists $x^* \in X'$ such that $x^*(y) \ne 0$ but $x^*(Y) = 0$. Then $i^*x^* = 0$. Hence
\[ 0 = y^{**}(i^*x^*) = i^{**}y^{**}(x^*) = J(y)(x^*) = x^*(y) \ne 0, \]
\[ y^{**}(i^*x^*) = i^{**}y^{**}(x^*) = J_X(y)(x^*) = x^*(y) = x^*(iy) = i^*x^*(y) = J_Y(y)(i^*x^*) \]
Theorem 18.4.12 (Eberlein Smulian) The closed unit ball in a reflexive Banach space $X$ is weakly sequentially compact. By this is meant that if $\{x_n\}$ is contained in the closed unit ball, there exists a subsequence $\{x_{n_k}\}$ and $x \in X$ such that for all $x^* \in X'$,
\[ x^*(x_{n_k}) \to x^*(x). \]
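Proof: Let $\{x_n\} \subseteq B \equiv \overline{B(0, 1)}$. Let $Y$ be the closure of the linear span of $\{x_n\}$. Thus $Y$ is separable. It is reflexive because it is a closed subspace of a reflexive space so the above lemma applies. By the Banach Alaoglu theorem, the closed unit ball $B^*$ in $Y'$ is weak* compact. Also by Theorem 18.4.5, $B^*$ is a metric space with a suitable metric.
\[
\begin{array}{rccc}
 & B^{**} \subseteq Y'' & \xrightarrow{\ i^{**}\ \text{1-1}\ } & X'' \\
\text{weakly separable} & B^{*} \subseteq Y' & \xleftarrow{\ i^{*}\ \text{onto}\ } & X' \\
\text{separable} & B \subseteq Y & \xrightarrow{\ i\ } & X
\end{array}
\]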
Thus $B^*$ is complete and totally bounded with respect to this metric and it follows that $B^*$ with the weak* topology is separable. This implies $Y'$ is also separable in the weak* topology. To see this, let $\{y_n^*\} \equiv D$ be a weak* dense set in $B^*$ and let $y^* \in Y'$. Let $p$ be a large enough positive rational number that $y^*/p \in B^*$. Then if $A$ is any finite set from $Y$, there exists $y_n^* \in D$ such that $\rho_A(y^*/p - y_n^*) < \varepsilon/p$. It follows $p y_n^* \in B_A(y^*, \varepsilon)$ showing that rational multiples of $D$ are weak* dense in $Y'$. Since $Y$ is reflexive, the weak and weak* topologies on $Y'$ coincide and so $Y'$ is weakly separable. Since $Y'$ is weakly separable, Corollary 18.4.6 implies $B^{**}$, the closed unit ball in $Y''$, is weak* sequentially compact. Then by Lemma 18.4.7 $B$, the unit ball in $Y$, is weakly sequentially compact. It follows there exists a subsequence $x_{n_k}$ of the sequence $\{x_n\}$ and a point $x \in Y$, such that for all $f \in Y'$,
\[ f(x_{n_k}) \to f(x). \]
Now if $x^* \in X'$, and $i$ is the inclusion map of $Y$ into $X$,
\[ x^*(x_{n_k}) = i^*x^*(x_{n_k}) \to i^*x^*(x) = x^*(x) \]
which shows $x_{n_k}$ converges weakly and this shows the unit ball in $X$ is weakly sequentially compact.
Corollary 18.4.13 Let $\{x_n\}$ be any bounded sequence in a reflexive Banach space $X$. Then there exists $x \in X$ and a subsequence $\{x_{n_k}\}$ such that for all $x^* \in X'$,
18.5 Exercises
1. Is $\mathbb{N}$ a $G_\delta$ set? What about $\mathbb{Q}$? What about a countable dense subset of a complete metric space?
2. Let $f : \mathbb{R} \to \mathbb{C}$ be a function. Define the oscillation of a function in $B(x, r)$ by $\omega_r f(x) = \sup\{|f(z) - f(y)| : y, z \in B(x, r)\}$. Define the oscillation of the function at the point $x$ by $\omega f(x) = \lim_{r \to 0} \omega_r f(x)$. Show $f$ is continuous at $x$ if and only if $\omega f(x) = 0$. Then show the set of points where $f$ is continuous is a $G_\delta$ set (try $U_n = \{x : \omega f(x) < \frac{1}{n}\}$). Does there exist a function continuous at only the rational numbers? Does there exist a function continuous at every irrational and discontinuous elsewhere? Hint: Suppose
$D$ is any countable set, $D = \{d_i\}_{i=1}^{\infty}$, and define the function $f_n(x)$ to equal zero for every $x \notin \{d_1, \dots, d_n\}$ and $2^{-n}$ for $x$ in this finite set. Then consider $g(x) \equiv \sum_{n=1}^{\infty} f_n(x)$. Show that this series converges uniformly.
3. Let $f \in C([0, 1])$ and suppose $f'(x)$ exists. Show there exists a constant $K$ such that $|f(x) - f(y)| \le K|x - y|$ for all $y \in [0, 1]$. Let $U_n = \{f \in C([0, 1])$ such that for each $x \in [0, 1]$ there exists $y \in [0, 1]$ such that $|f(x) - f(y)| > n|x - y|\}$. Show that $U_n$ is open and dense in $C([0, 1])$ where for $f \in C([0, 1])$,
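where
\[ a_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikx} f(x)\,dx. \]
Show
\[ S_n f(x) = \int_{-\pi}^{\pi} D_n(x - y) f(y)\,dy \]
where
\[ D_n(t) = \frac{\sin\left(\left(n + \frac{1}{2}\right)t\right)}{2\pi \sin\left(\frac{t}{2}\right)}. \]
Verify that $\int_{-\pi}^{\pi} D_n(t)\,dt = 1$. Also show that if $g \in L^1(\mathbb{R})$, then
\[ \lim_{a \to \infty} \int_{\mathbb{R}} g(x) \sin(ax)\,dx = 0. \]
This last is called the Riemann Lebesgue lemma. Hint: For the last part, assume first that $g \in C_c^{\infty}(\mathbb{R})$ and integrate by parts. Then exploit density of the set of functions in $L^1(\mathbb{R})$.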
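Before proving that the Dirichlet kernel integrates to one, it can help to see the claim numerically. The following is only a sketch under stated assumptions: the function names and the choice of a composite midpoint rule are mine and are not part of the problem. For a trigonometric polynomial such as $D_n$, the midpoint rule over a full period is exact up to rounding as long as the number of subintervals exceeds $n$.

```python
import math

def dirichlet_kernel(n, t):
    """D_n(t) = sin((n + 1/2) t) / (2 pi sin(t / 2)), extended by its limit at t = 0."""
    s = math.sin(t / 2.0)
    if abs(s) < 1e-12:
        # As t -> 0 the quotient tends to (2n + 1) / (2 pi).
        return (2 * n + 1) / (2.0 * math.pi)
    return math.sin((n + 0.5) * t) / (2.0 * math.pi * s)

def midpoint_rule(f, a, b, steps):
    """Composite midpoint approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def kernel_mass(n, steps=4000):
    """Approximate the integral of D_n over [-pi, pi]."""
    return midpoint_rule(lambda t: dirichlet_kernel(n, t), -math.pi, math.pi, steps)
```

Calling `kernel_mass(n)` for a few small values of `n` should return numbers indistinguishable from 1 in floating point, which is what the exercise asks you to verify by hand.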
7. It turns out that the Fourier series sometimes converges to the function pointwise. Suppose $f$ is $2\pi$ periodic and Hölder continuous. That is $|f(x) - f(y)| \le K|x - y|^{\theta}$ where $\theta \in (0, 1]$. Show that if $f$ is like this, then the Fourier series converges to $f$ at every point. Next modify your argument to show that if at every point $x$, $|f(x+) - f(y)| \le K|x - y|^{\theta}$ for $y$ close enough to $x$ and larger than $x$ and $|f(x-) - f(y)| \le K|x - y|^{\theta}$ for every $y$ close enough to $x$ and smaller than $x$, then $S_n f(x) \to \frac{f(x+) + f(x-)}{2}$, the midpoint of the jump of the function. Hint: Use Problem 6.
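To see the midpoint-of-the-jump phenomenon in a concrete case, here is a small sketch. The example function $f(x) = e^x$ on $(-\pi, \pi)$, extended to be $2\pi$ periodic, is my own choice and does not come from the text. Its coefficients work out to $a_k = (-1)^k \sinh(\pi) / (\pi(1 - ik))$, so at the jump $x = \pi$ the terms for $k$ and $-k$ combine to $2\sinh(\pi)/(\pi(1 + k^2))$, and the partial sums should approach $(e^{\pi} + e^{-\pi})/2$.

```python
import math

def partial_sum_at_pi(n):
    """S_n f(pi) for f(x) = exp(x) on (-pi, pi), extended 2*pi periodically.

    Uses the closed form of the coefficients, so no numerical integration
    is involved: the k and -k terms at x = pi add up to
    2 sinh(pi) / (pi (1 + k^2)).
    """
    total = 1.0  # contribution of k = 0, before the common factor
    for k in range(1, n + 1):
        total += 2.0 / (1.0 + k * k)
    return math.sinh(math.pi) / math.pi * total

# Midpoint of the jump at x = pi: (f(pi-) + f(pi+)) / 2 with f(pi+) = exp(-pi).
midpoint = (math.exp(math.pi) + math.exp(-math.pi)) / 2.0
```

The convergence here is slow, with an error of order $1/n$, which is typical at a jump.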
8. Let $Y = \{f$ such that $f$ is continuous, defined on $\mathbb{R}$, and $2\pi$ periodic$\}$. Define $\|f\|_Y = \sup\{|f(x)| : x \in [-\pi, \pi]\}$. Show that $(Y, \|\cdot\|_Y)$ is a Banach space. Let $x \in \mathbb{R}$ and define $L_n(f) = S_n f(x)$. Show $L_n \in Y'$ but $\lim_{n \to \infty} \|L_n\| = \infty$. Show that for each $x \in \mathbb{R}$, there exists a dense $G_\delta$ subset of $Y$ such that for $f$ in this set, $|S_n f(x)|$ is unbounded. Finally, show there is a dense $G_\delta$ subset of $Y$ having the property that $|S_n f(x)|$ is unbounded on the rational numbers. Hint: To do the first part, let $f(y)$ approximate $\operatorname{sgn}(D_n(x - y))$. Here $\operatorname{sgn} r = 1$ if $r > 0$, $-1$ if $r < 0$ and $0$ if $r = 0$. This rules out one possibility of the uniform boundedness principle. After this, show the countable intersection of dense $G_\delta$ sets must also be a dense $G_\delta$ set.
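9. Let $\alpha \in (0, 1]$. Define, for $X$ a compact subset of $\mathbb{R}^p$,
where
\[ \|f\| \equiv \sup\{|f(x)| : x \in X\} \]
and
\[ \rho_\alpha(f) \equiv \sup\left\{ \frac{|f(x) - f(y)|}{|x - y|^{\alpha}} : x, y \in X,\ x \ne y \right\}. \]
Show that $(C^{\alpha}(X; \mathbb{R}^n), \|\cdot\|_\alpha)$ is a complete normed linear space. This is called a Hölder space. What would this space consist of if $\alpha > 1$?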
10. Let $X$ be the Hölder functions which are periodic of period $2\pi$. Define $L_n f(x) = S_n f(x)$ where $L_n : X \to Y$ for $Y$ given in Problem 8. Show $\|L_n\|$ is bounded independent of $n$. Conclude that $L_n f \to f$ in $Y$ for all $f \in X$. In other words, for the Hölder continuous and $2\pi$ periodic functions, the Fourier series converges to the function uniformly. Hint: $L_n f(x)$ is given by
\[ L_n f(x) = \int_{-\pi}^{\pi} D_n(y) f(x - y)\,dy \]
where $f(x - y) = f(x) + g(x, y)$ where $|g(x, y)| \le C|y|^{\alpha}$. Use the fact the Dirichlet kernel integrates to one to write
\[ \left| \int_{-\pi}^{\pi} D_n(y) f(x - y)\,dy \right| \le \overbrace{\left| \int_{-\pi}^{\pi} D_n(y) f(x)\,dy \right|}^{= |f(x)|} + C \left| \int_{-\pi}^{\pi} \sin\left(\left(n + \frac{1}{2}\right)y\right) \left( g(x, y) / \sin(y/2) \right) dy \right| \]
Show the functions $y \to g(x, y) / \sin(y/2)$ are bounded in $L^1$ independent of $x$ and get a uniform bound on $\|L_n\|$. Now use a similar argument to show $\{L_n f\}$ is equicontinuous in addition to being uniformly bounded. In doing this you might proceed as follows. Show
\[ |L_n f(x) - L_n f(x')| \le \left| \int_{-\pi}^{\pi} D_n(y) \left( f(x - y) - f(x' - y) \right) dy \right| \]
\[ \le \|f\|_\alpha |x - x'|^{\alpha} + \left| \int_{-\pi}^{\pi} \sin\left(\left(n + \frac{1}{2}\right)y\right) \left( \frac{f(x - y) - f(x) - (f(x' - y) - f(x'))}{\sin\left(\frac{y}{2}\right)} \right) dy \right| \]
Then split this last integral into two cases, one for $|y| < \eta$ and one where $|y| \ge \eta$. If $L_n f$ fails to converge to $f$ uniformly, then there exists $\varepsilon > 0$ and a subsequence $n_k$ such that $\|L_{n_k} f - f\|_\infty \ge \varepsilon$ where this is the norm in $Y$ or equivalently the sup norm on $[-\pi, \pi]$. By the Arzela Ascoli theorem, there is a further subsequence $L_{n_{k_l}} f$ which converges uniformly on $[-\pi, \pi]$. But by Problem 7 $L_n f(x) \to f(x)$.
11. Let $X$ be a normed linear space and let $M$ be a convex open set containing 0. Define
\[ \rho(x) = \inf\left\{ t > 0 : \frac{x}{t} \in M \right\}. \]
Show $\rho$ is a gauge function defined on $X$. This particular example is called a Minkowski functional. It is of fundamental importance in the study of locally convex topological vector spaces. A set $M$ is convex if $\lambda x + (1 - \lambda)y \in M$ whenever $\lambda \in [0, 1]$ and $x, y \in M$.
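A concrete instance may help with the definition of the Minkowski functional. The sketch below approximates $\rho(x)$ by bisection for a set described only through a membership test. Everything named here is hypothetical and chosen for illustration: the helper `minkowski_functional`, the ellipse `in_ellipse`, and the search bound `hi`. Bisection is justified because, for $M$ open and convex with $0 \in M$, the set of $t > 0$ with $x/t \in M$ is an interval of the form $(\rho(x), \infty)$.

```python
import math

def minkowski_functional(in_set, x, hi=1e6, tol=1e-10):
    """Approximate rho(x) = inf{t > 0 : x / t in M} by bisection.

    in_set(point) is a membership test for M, assumed open, convex and
    containing 0. hi is an assumed upper bound with x / hi in M.
    """
    if not in_set(tuple(c / hi for c in x)):
        raise ValueError("x / hi is not in M; increase hi")
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_set(tuple(c / mid for c in x)):
            hi = mid  # x / mid is in M, so rho(x) <= mid
        else:
            lo = mid  # x / mid is outside M, so rho(x) >= mid
    return hi

def in_ellipse(p):
    """Open ellipse u^2 / 4 + v^2 < 1, an illustrative choice of M."""
    u, v = p
    return u * u / 4.0 + v * v < 1.0
```

For this ellipse the functional has the closed form $\rho(u, v) = \sqrt{u^2/4 + v^2}$, so `minkowski_functional(in_ellipse, (2.0, 0.0))` should be close to 1. Comparing $\rho(x + y)$ with $\rho(x) + \rho(y)$ for a few points is a quick way to see the subadditivity required of a gauge function.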
12. The Hahn Banach theorem can be used to establish separation theorems. Let $M$ be an open convex set containing 0. Let $x \notin M$. Show there exists $x^* \in X'$ such that $\operatorname{Re} x^*(x) \ge 1 > \operatorname{Re} x^*(y)$ for all $y \in M$. Hint: If $y \in M$, $\rho(y) < 1$. Show this. If $x \notin M$, $\rho(x) \ge 1$. Try $f(\alpha x) = \alpha \rho(x)$ for $\alpha \in \mathbb{R}$. Then extend $f$ to the whole space using the Hahn Banach theorem and call the result $F$, show $F$ is continuous, then fix it so $F$ is the real part of $x^* \in X'$.
13. A Banach space is said to be strictly convex if whenever $\|x\| = \|y\|$ and $x \ne y$, then
\[ \left\| \frac{x + y}{2} \right\| < \|x\|. \]
such a duality map exists. The duality map is an attempt to duplicate some of the features of the Riesz map in Hilbert space. This Riesz map is the map which takes a Hilbert space to its dual defined as follows.
The Riesz representation theorem for Hilbert space says this map is onto. Hint: For an arbitrary Banach space, let
\[ F(x) \equiv \left\{ x^* : \|x^*\| \le \|x\| \text{ and } x^*(x) = \|x\|^2 \right\} \]
14. Prove the following theorem which is an improved version of the open mapping theorem, [12]. Let $X$ and $Y$ be Banach spaces and let $A \in \mathcal{L}(X, Y)$. Then the following are equivalent.
\[ AX = Y, \]
\[ A \text{ is an open map.} \]
Note this gives the equivalence between A being onto and A being an open
map. The open mapping theorem says that if A is onto then it is open.
16. A Banach space is uniformly convex if whenever $\|x_n\|, \|y_n\| \le 1$ and $\|x_n + y_n\| \to 2$, it follows that $\|x_n - y_n\| \to 0$. Show uniform convexity implies strict convexity (See Problem 13). Hint: Suppose it is not strictly convex. Then there exist $\|x\|$ and $\|y\|$ both equal to 1 and $\left\| \frac{x_n + y_n}{2} \right\| = 1$; consider $x_n \equiv x$ and $y_n \equiv y$, and use the conditions for uniform convexity to get a contradiction. It can be shown that $L^p$ is uniformly convex whenever $\infty > p > 1$. See Hewitt and Stromberg [28] or Ray [38].
17. Show that a closed subspace of a reflexive Banach space is reflexive. This is done in the chapter. However, try to do it yourself.
and $f(x) = \|x\|$. See Problem 16 for the definition of uniform convexity. Now by the weak convergence, you can argue that if $x \ne 0$, $f(x_n / \|x_n\|) \to f(x / \|x\|)$. You also might try to show this in the special case where $\|x_n\| = \|x\| = 1$.
20. Suppose $L \in \mathcal{L}(X, Y)$ and $M \in \mathcal{L}(Y, Z)$. Show $ML \in \mathcal{L}(X, Z)$ and that $(ML)^* = L^* M^*$.
Hilbert Spaces
Note that 19.1.2 and 19.1.3 imply $(x, ay + bz) = \overline{a}(x, y) + \overline{b}(x, z)$. Such a vector space is called an inner product space.
The Cauchy Schwarz inequality is fundamental for the study of inner product spaces.
and so $(x, 0) = 0$. Thus, it can be assumed $y \ne 0$. Then from the axioms of the inner product,
\[ F(t) = \|x\|^2 + 2t \operatorname{Re}(x, \omega y) + t^2 \|y\|^2 \ge 0. \]
This yields
\[ \|x\|^2 + 2t|(x, y)| + t^2 \|y\|^2 \ge 0. \]
Since this inequality holds for all $t \in \mathbb{R}$, it follows from the quadratic formula that
Proof: All the axioms are obvious except the triangle inequality. To verify this,
\begin{align*}
\|x + y\|^2 \equiv (x + y, x + y) &\le \|x\|^2 + \|y\|^2 + 2 \operatorname{Re}(x, y) \\
&\le \|x\|^2 + \|y\|^2 + 2|(x, y)| \\
&\le \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| = (\|x\| + \|y\|)^2.
\end{align*}
In Hilbert space, one can define a projection map onto closed convex nonempty sets.
\[ \|y_n - x + y_m - x\|^2 = 4\left( \left\| \frac{y_n + y_m}{2} - x \right\|^2 \right) \]
Since $\|x - y_n\| \to \lambda$, this shows $\{y_n - x\}$ is a Cauchy sequence. Thus also $\{y_n\}$ is a Cauchy sequence. Since $H$ is complete, $y_n \to y$ for some $y \in H$ which must be in $K$ because $K$ is closed. Therefore
Let $Px = y$.
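[Figure: the convex set $K$ containing the points $z$ and $y$, a point $x$ outside $K$, and the angle $\theta$ at $z$ between $x - z$ and $y - z$.]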
Condition 19.1.6 says the angle $\theta$ shown in the diagram is always obtuse. Remember from calculus, the sign of $x \cdot y$ is the same as the sign of the cosine of the included angle between $x$ and $y$. Thus, in finite dimensions, the conclusion of this corollary says that $z = Px$ exactly when the indicated angle is obtuse. Surely the picture suggests this is reasonable.
The inequality 19.1.6 is an example of a variational inequality and this corollary characterizes the projection of $x$ onto $K$ as the solution of this variational inequality.
Proof of Corollary: Let $z \in K$ and let $y \in K$ also. Since $K$ is convex, it follows that if $t \in [0, 1]$,
\[ z + t(y - z) = (1 - t)z + ty \in K. \]
Furthermore, every point of $K$ can be written in this way. (Let $t = 1$ and $y \in K$.) Therefore, $z = Px$ if and only if for all $y \in K$ and $t \in [0, 1]$,
\[ \|x - (z + t(y - z))\|^2 = \|(x - z) - t(y - z)\|^2 \ge \|x - z\|^2 \]
for all $t \in [0, 1]$ and $y \in K$ if and only if for all $t \in [0, 1]$ and $y \in K$
\[ \|x - z\|^2 + t^2 \|y - z\|^2 - 2t \operatorname{Re}(x - z, y - z) \ge \|x - z\|^2 \]
if and only if for all $t \in [0, 1]$,
\[ t^2 \|y - z\|^2 - 2t \operatorname{Re}(x - z, y - z) \ge 0. \tag{19.1.7} \]
Now this is equivalent to 19.1.7 holding for all $t \in (0, 1)$. Therefore, dividing by $t \in (0, 1)$, 19.1.7 is equivalent to
\[ t \|y - z\|^2 - 2 \operatorname{Re}(x - z, y - z) \ge 0 \]
for all $t \in (0, 1)$ which is equivalent to 19.1.6. This proves the corollary.
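The variational inequality can be checked numerically in a simple finite dimensional case. The sketch below assumes $H = \mathbb{R}^n$ with the usual dot product and takes $K$ to be a box, where the projection is just coordinatewise clipping. The box, the helper names, and the random sampling of points $y \in K$ are all illustrative choices of mine, not part of the text.

```python
import random

def project_onto_box(x, lower, upper):
    """Projection of x onto the box K = product of [lower_i, upper_i]."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def dot(u, v):
    """Real inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def max_variational_gap(x, lower, upper, samples=1000, seed=0):
    """Largest sampled value of (x - z, y - z) over y in K, where z = Px.

    The corollary says this quantity is never positive.
    """
    rng = random.Random(seed)
    z = project_onto_box(x, lower, upper)
    x_minus_z = [xi - zi for xi, zi in zip(x, z)]
    worst = float("-inf")
    for _ in range(samples):
        y = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        y_minus_z = [yi - zi for yi, zi in zip(y, z)]
        worst = max(worst, dot(x_minus_z, y_minus_z))
    return worst
```

For example, with $x = (2, -3, 0.25)$ and $K = [0, 1]^3$, the projection is $(1, 0, 0.25)$ and every sampled inner product is nonpositive, in agreement with 19.1.6.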
Corollary 19.1.10 Let $K$ be a nonempty convex closed subset of a Hilbert space $H$. Then the projection map $P$ is continuous. In fact,
\[ |Px - Py| \le |x - y|. \]
19.1. BASIC THEORY 481
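\[ \operatorname{Re}(x' - Px', Px - Px') \le 0, \quad \operatorname{Re}(x - Px, Px' - Px) \le 0 \]
Hence
\begin{align*}
0 &\le \operatorname{Re}(x - Px, Px - Px') - \operatorname{Re}(x' - Px', Px - Px') \\
&= \operatorname{Re}(x - x', Px - Px') - |Px - Px'|^2
\end{align*}
and so
\[ |Px - Px'|^2 \le |x - x'|\,|Px - Px'|. \]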
Proof: Let $K \subseteq B(0, R)$ and let $P$ be the projection map onto $K$. Then consider the map $f \circ P$ which maps $B(0, R)$ to $B(0, R)$ and is continuous. By the Brouwer fixed point theorem for balls, this map has a fixed point. Thus there exists $x$ such that
\[ f \circ P(x) = x \]
Now the equation also requires $x \in K$ and so $P(x) = x$. Hence $f(x) = x$.
The case where the closed convex set is a closed subspace is of special importance
and in this case the above corollary implies the following.
\[ (x - z, y) = 0 \tag{19.1.8} \]
and
\[ \|x\|^2 = \|x - Px\|^2 + \|Px\|^2. \tag{19.1.9} \]
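where the denominator is not equal to zero because the $x_j$ form a basis and so
\[ x_{k+1} \notin \operatorname{span}(x_1, \dots, x_k) = \operatorname{span}(u_1, \dots, u_k) \]
Thus by induction,
Also, $x_{k+1} \in \operatorname{span}(u_1, \dots, u_k, u_{k+1})$ which is seen easily by solving 19.2.11 for $x_{k+1}$ and it follows
If $l \le k$,
\begin{align*}
(u_{k+1} \cdot u_l) &= C\left( (x_{k+1} \cdot u_l) - \sum_{j=1}^{k} (x_{k+1} \cdot u_j)(u_j \cdot u_l) \right) \\
&= C\left( (x_{k+1} \cdot u_l) - \sum_{j=1}^{k} (x_{k+1} \cdot u_j)\,\delta_{lj} \right) \\
&= C\left( (x_{k+1} \cdot u_l) - (x_{k+1} \cdot u_l) \right) = 0.
\end{align*}
The vectors $\{u_j\}_{j=1}^{n}$ generated in this way are therefore an orthonormal basis because each vector has unit length.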
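The inductive construction above is the Gram Schmidt process, and it translates directly into a short routine. The sketch below is only an illustration for real vectors stored as lists of floats. The function names and the tolerance used to detect linear dependence are my own assumptions.

```python
import math

def inner(x, y):
    """Real inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Return orthonormal u_1, ..., u_n with the same successive spans as the input.

    Each new vector is x_{k+1} minus its components along u_1, ..., u_k,
    divided by its length, following the inductive step in the proof.
    """
    basis = []
    for x in vectors:
        w = list(x)
        for u in basis:
            c = inner(x, u)  # coefficient (x_{k+1} . u_j)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(inner(w, w))
        if norm < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append([wi / norm for wi in w])
    return basis
```

Applying `gram_schmidt` to three independent vectors in $\mathbb{R}^3$ and then forming all inner products $(u_i \cdot u_j)$ should reproduce the identity matrix up to rounding.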
Consider the second claim about finite dimensional subspaces. Without loss of generality, assume $\{x_1, \dots, x_n\}$ is linearly independent. If it is not, delete vectors until a linearly independent set is obtained. Then by the first part, $\operatorname{span}(x_1, \dots, x_n) = \operatorname{span}(u_1, \dots, u_n) \equiv M$ where the $u_i$ are an orthonormal set of vectors. Suppose $\{y_k\} \subseteq M$ and $y_k \to y \in H$. Is $y \in M$? Let
\[ y_k \equiv \sum_{j=1}^{n} c_j^k u_j \]
Then let $c^k \equiv (c_1^k, \dots, c_n^k)^T$. Then
\begin{align*}
|c^k - c^l|^2 \equiv \sum_{j=1}^{n} |c_j^k - c_j^l|^2 &= \left( \sum_{j=1}^{n} (c_j^k - c_j^l) u_j, \sum_{j=1}^{n} (c_j^k - c_j^l) u_j \right) \\
&= \|y_k - y_l\|^2
\end{align*}
which shows $\{c^k\}$ is a Cauchy sequence in $\mathbb{F}^n$ and so it converges to $c \in \mathbb{F}^n$. Thus
\[ y = \lim_{k \to \infty} y_k = \lim_{k \to \infty} \sum_{j=1}^{n} c_j^k u_j = \sum_{j=1}^{n} c_j u_j \in M. \]
Theorem 19.2.2 Let $M$ be the span of $\{u_1, \dots, u_n\}$ in a Hilbert space $H$ and let $y \in H$. Then $Py$ is given by
\[ Py = \sum_{k=1}^{n} (y, u_k) u_k \tag{19.2.12} \]
Proof:
\begin{align*}
\left( y - \sum_{k=1}^{n} (y, u_k) u_k, u_p \right) &= (y, u_p) - \sum_{k=1}^{n} (y, u_k)(u_k, u_p) \\
&= (y, u_p) - (y, u_p) = 0
\end{align*}
It follows that
\[ \left( y - \sum_{k=1}^{n} (y, u_k) u_k, u \right) = 0 \]
\[ = |y|^2 - 2 \sum_{k=1}^{n} |(y, u_k)|^2 + \sum_{k=1}^{n} |(y, u_k)|^2 \]
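Formula 19.2.12 is easy to try out in $\mathbb{R}^3$. The sketch below assumes the $u_k$ are orthonormal and real, and the helper names are hypothetical. It computes the projection and also the squared distance $|y|^2 - \sum_k |(y, u_k)|^2$ which appears at the end of the computation above.

```python
def inner(x, y):
    """Real inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def project(y, orthonormal):
    """Py = sum over k of (y, u_k) u_k for an orthonormal list u_1, ..., u_n."""
    p = [0.0] * len(y)
    for u in orthonormal:
        c = inner(y, u)
        p = [pi + c * ui for pi, ui in zip(p, u)]
    return p

def distance_squared(y, orthonormal):
    """|y|^2 minus the sum of |(y, u_k)|^2, the squared distance from y to the span."""
    return inner(y, y) - sum(inner(y, u) ** 2 for u in orthonormal)
```

With $u_1 = (1, 0, 0)$, $u_2 = (0, 0.6, 0.8)$ and $y = (2, 1, 3)$, the projection is $(2, 1.8, 2.4)$, the residual $y - Py$ is orthogonal to both $u_k$, and the squared distance equals 1.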
Thus the $ij$th entry of this matrix is $(x_i, x_j)$. This is sometimes called the Gram matrix. Also define $G(x_1, \dots, x_n)$ as the determinant of this matrix, also called the Gram determinant.
\[
G(x_1, \dots, x_n) \equiv \begin{vmatrix} (x_1, x_1) & \cdots & (x_1, x_n) \\ \vdots & & \vdots \\ (x_n, x_1) & \cdots & (x_n, x_n) \end{vmatrix} \tag{19.2.15}
\]
\[ d^2 = \frac{G(x_1, \dots, x_n, y)}{G(x_1, \dots, x_n)}. \tag{19.2.16} \]
Proof: By Theorem 19.2.1 $M$ is a closed subspace of $H$. Let $\sum_{k=1}^{n} \alpha_k x_k$ be the element of $M$ which is closest to $y$. Then by Corollary 19.1.13,
\[ \left( y - \sum_{k=1}^{n} \alpha_k x_k, x_p \right) = 0 \]
\[ (y, x_p) = \sum_{k=1}^{n} (x_p, x_k)\,\alpha_k, \quad p = 1, 2, \dots, n \tag{19.2.17} \]
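\[ d^2 + y_x^T \alpha \tag{19.2.19} \]
in which
\[ y_x^T \equiv ((y, x_1), \dots, (y, x_n)), \quad \alpha^T \equiv (\alpha_1, \dots, \alpha_n). \]
Then 19.2.17 and 19.2.18 imply the following system
\[
\begin{pmatrix} G(x_1, \dots, x_n) & 0 \\ y_x^T & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ d^2 \end{pmatrix} = \begin{pmatrix} y_x \\ \|y\|^2 \end{pmatrix}
\]
By Cramer's rule,
\begin{align*}
d^2 &= \frac{\det \begin{pmatrix} G(x_1, \dots, x_n) & y_x \\ y_x^T & \|y\|^2 \end{pmatrix}}{\det \begin{pmatrix} G(x_1, \dots, x_n) & 0 \\ y_x^T & 1 \end{pmatrix}} \\
&= \frac{\det \begin{pmatrix} G(x_1, \dots, x_n) & y_x \\ y_x^T & \|y\|^2 \end{pmatrix}}{\det(G(x_1, \dots, x_n))} \\
&= \frac{\det(G(x_1, \dots, x_n, y))}{\det(G(x_1, \dots, x_n))} = \frac{G(x_1, \dots, x_n, y)}{G(x_1, \dots, x_n)}
\end{align*}
and this proves the theorem.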
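The quotient of Gram determinants in 19.2.16 can be tested on a small example in $\mathbb{R}^3$. The sketch below is an illustration only: it assumes real vectors, and the determinant routine (Gaussian elimination with partial pivoting) and all names are my own choices rather than anything from the text.

```python
def inner(x, y):
    """Real inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-14:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap changes the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def gram_determinant(vectors):
    """G(x_1, ..., x_n): determinant of the matrix with entries (x_i, x_j)."""
    return det([[inner(u, v) for v in vectors] for u in vectors])

def distance_squared_to_span(vectors, y):
    """d^2 = G(x_1, ..., x_n, y) / G(x_1, ..., x_n), assuming independent x_i."""
    return gram_determinant(list(vectors) + [y]) / gram_determinant(vectors)
```

Take $x_1 = (1, 0, 0)$, $x_2 = (1, 1, 0)$ and $y = (3, 4, 5)$. The span is the plane where the third coordinate vanishes, so the squared distance should be 25, and the two Gram determinants are 1 and 25.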
Theorem 19.3.2 In any separable Hilbert space $H$, there exists a countable orthonormal set $S = \{x_i\}$ such that the span of these vectors is dense in $H$. Furthermore, if $\operatorname{span}(S)$ is dense, then for $x \in H$,
\[ x = \sum_{i=1}^{\infty} (x, x_i) x_i \equiv \lim_{n \to \infty} \sum_{i=1}^{n} (x, x_i) x_i. \tag{19.3.20} \]
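\[ \|x\|^2 + \sum_{k=1}^{n} |c_k|^2 - \sum_{k=1}^{n} c_k \overline{(x, x_k)} - \sum_{k=1}^{n} \overline{c_k} (x, x_k). \]
This equals
\[ \|x\|^2 + \sum_{k=1}^{n} |c_k - (x, x_k)|^2 - \sum_{k=1}^{n} |(x, x_k)|^2 \]
\[ \ge \|x\|^2 - \sum_{k=1}^{n} |(x, x_k)|^2 \]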
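Now since $\operatorname{span}(S)$ is dense, there exists $n$ large enough that for some choice of constants $c_k$,
\[ \left\| x - \sum_{k=1}^{n} c_k x_k \right\|^2 < \varepsilon. \]
\[ \{x_1, \dots, x_n\} \subseteq S. \]
Then if $x \in H$
\[ \left\| x - \sum_{k=1}^{n} c_k x_k \right\|^2 \ge \left\| x - \sum_{i=1}^{n} (x, x_i) x_i \right\|^2 = \|x\|^2 - \sum_{k=1}^{n} |(x, x_k)|^2. \]
If $S$ is countable and $\operatorname{span}(S)$ is dense, then letting $\{x_i\}_{i=1}^{\infty} = S$, 19.3.20 follows.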
This is a Hilbert space because of the theorem which states the $L^p$ spaces are complete, Theorem 12.2.2 on Page 287. An example of an orthonormal set of functions in $L^2(0, 2\pi)$ is
\[ \phi_n(x) \equiv \frac{1}{\sqrt{2\pi}} e^{inx} \]
for $n$ an integer. Is it true that the span of these functions is dense in $L^2(0, 2\pi)$?
Theorem 19.4.1 Let $S = \{\phi_n\}_{n \in \mathbb{Z}}$. Then $\operatorname{span}(S)$ is dense in $L^2(0, 2\pi)$.
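The claim that the functions $\phi_n$ form an orthonormal set in $L^2(0, 2\pi)$ can be checked numerically. The following sketch approximates the inner product $\int_0^{2\pi} f \overline{g}\,dx$ with a midpoint rule. The function names and the number of sample points are assumptions made for the illustration.

```python
import cmath
import math

def phi(n, x):
    """phi_n(x) = exp(i n x) / sqrt(2 pi)."""
    return cmath.exp(1j * n * x) / math.sqrt(2.0 * math.pi)

def l2_inner(f, g, steps=2048):
    """Midpoint approximation of the integral over (0, 2 pi) of f times conj(g)."""
    h = 2.0 * math.pi / steps
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h).conjugate() for k in range(steps))
```

For instance, `l2_inner(lambda x: phi(3, x), lambda x: phi(3, x))` should be very close to 1, while replacing one of the indices by a different integer should give a value very close to 0.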
19.4. FOURIER SERIES, AN EXAMPLE 489
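\[ \sum_{k=-m}^{m} c_k z^k \]
for $m$ chosen large enough. This algebra separates the points of $T$ because it contains the function $p(z) = z$. It annihilates no point of $T$ because it contains the constant function 1. Furthermore, it has the property that for $f \in A$, $\overline{f} \in A$. By the Stone Weierstrass approximation theorem, Theorem 11.3.1 on Page 278, $A$ is dense in $C(T)$. Now for $g \in C_c(0, 2\pi)$, extend $g$ to all of $\mathbb{R}$ to be $2\pi$ periodic. Then letting $G(e^{it}) \equiv g(t)$, it follows $G$ is well defined and continuous on $T$. Therefore, there exists $H \in A$ such that for all $t \in \mathbb{R}$,
\[ \left| H(e^{it}) - G(e^{it}) \right| < \varepsilon^2 / 2\pi. \]
Thus $H(e^{it})$ is of the form
\[ H(e^{it}) = \sum_{k=-m}^{m} c_k (e^{it})^k = \sum_{k=-m}^{m} c_k e^{ikt} \in \operatorname{span}(S). \]
Let $h(t) = \sum_{k=-m}^{m} c_k e^{ikt}$. Then
\begin{align*}
\left( \int_0^{2\pi} |g - h|^2\,dx \right)^{1/2} &\le \left( \int_0^{2\pi} \max\{|g(t) - h(t)| : t \in [0, 2\pi]\}^2\,dx \right)^{1/2} \\
&= \left( \int_0^{2\pi} \max\left\{ \left| G(e^{it}) - H(e^{it}) \right| : t \in [0, 2\pi] \right\}^2 dx \right)^{1/2} \\
&< \left( \int_0^{2\pi} \frac{\varepsilon^2}{2\pi} \right)^{1/2} = \varepsilon.
\end{align*}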
\[ \lim_{h \to 0} \frac{S(h)x - x}{h} \equiv Ax \]
The assertion that $t \to S(t)x_0$ is continuous and that $S(t) \in \mathcal{L}(H, H)$ is not sufficient to say there is a bound on $\|S(t)\|$ for all $t$. Also the assertion that for each $x_0$,
\[ \lim_{t \to 0+} S(t)x_0 = x_0 \]
is not the same as saying that $S(t) \to I$ in $\mathcal{L}(H, H)$. It is a much weaker assertion. The next theorem gives information on the growth of $\|S(t)\|$. It turns out it has exponential growth.
Proof: If this is not true, then there exists $t_n \in [0, T]$ such that $\|S(t_n)\| \ge n$. That is the operators $S(t_n)$ are not uniformly bounded. By the uniform boundedness principle, Theorem 18.1.8, there exists $x \in H$ such that $\|S(t_n)x\|$ is not bounded. However, this is impossible because it is given that $t \to S(t)x$ is continuous on $[0, T]$ and so $t \to \|S(t)x\|$ must achieve its maximum on this compact set.
Now here is the main result for growth of $\|S(t)\|$.
Theorem 19.5.3 For $M$ described in Lemma 19.5.2, there exists $\alpha$ such that
\[ \|S(t)\| \le M e^{\alpha t}. \]
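Proof: First note $D(A) \ne \emptyset$. In fact $0 \in D(A)$. It follows from Theorem 19.5.3 that for all $\lambda$ large enough, one can define a Laplace transform,
\[ R(\lambda)x \equiv \int_0^{\infty} e^{-\lambda t} S(t)x\,dt \in H. \]
Here the integral is the ordinary improper Riemann integral. I claim each of these is in $D(A)$.
\[ \frac{S(h) \int_0^{\infty} e^{-\lambda t} S(t)x\,dt - \int_0^{\infty} e^{-\lambda t} S(t)x\,dt}{h} \]
Using the semigroup property and changing the variables in the first of the above integrals, this equals
\begin{align*}
&= \frac{1}{h} \left( e^{\lambda h} \int_h^{\infty} e^{-\lambda t} S(t)x\,dt - \int_0^{\infty} e^{-\lambda t} S(t)x\,dt \right) \\
&= \frac{1}{h} \left( \left( e^{\lambda h} - 1 \right) \int_0^{\infty} e^{-\lambda t} S(t)x\,dt - e^{\lambda h} \int_0^{h} e^{-\lambda t} S(t)x\,dt \right)
\end{align*}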
Corollary 19.5.6 Let $S(t)$ be a continuous semigroup and let $A$ be its generator. Then for $0 < a < b$ and $x \in D(A)$
\[ S(b)x - S(a)x = \int_a^b S(t)Ax\,dt \]
and also for $t > 0$ you can take the derivative from the left,
\[ \lim_{h \to 0+} \frac{S(t)x - S(t - h)x}{h} = S(t)Ax \]
Proof: Letting $y^* \in H'$,
\[ y^*\left( \int_a^b S(t)Ax\,dt \right) = \int_a^b y^*\left( S(t) \lim_{h \to 0} \frac{S(h)x - x}{h} \right) dt \]
The difference quotients are bounded because they converge to $Ax$. Therefore, from the dominated convergence theorem,
\begin{align*}
y^*\left( \int_a^b S(t)Ax\,dt \right) &= \lim_{h \to 0} \int_a^b y^*\left( S(t) \frac{S(h)x - x}{h} \right) dt \\
&= \lim_{h \to 0} y^*\left( \int_a^b S(t) \frac{S(h)x - x}{h}\,dt \right) \\
&= \lim_{h \to 0} y^*\left( \frac{1}{h} \int_{a+h}^{b+h} S(t)x\,dt - \frac{1}{h} \int_a^b S(t)x\,dt \right) \\
&= \lim_{h \to 0} y^*\left( \frac{1}{h} \int_b^{b+h} S(t)x\,dt - \frac{1}{h} \int_a^{a+h} S(t)x\,dt \right) \\
&= y^*(S(b)x - S(a)x)
\end{align*}
Since $y^*$ is arbitrary, this proves the first part. Now from what was just shown, if $t > 0$ and $h$ is small enough,
\[ \frac{S(t)x - S(t - h)x}{h} = \frac{1}{h} \int_{t-h}^{t} S(s)Ax\,ds \]
Theorem 19.5.7 Suppose $A$ is a densely defined linear operator which has the property that for all $\lambda > 0$,
\[ (\lambda I - A)^{-1} \in \mathcal{L}(H, H) \]
19.5. GENERAL THEORY OF CONTINUOUS SEMIGROUPS 495
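which means that $\lambda I - A : D(A) \to H$ is one to one and onto with continuous inverse. Suppose also that for all $n \in \mathbb{N}$,
\[ \left\| \left( (\lambda I - A)^{-1} \right)^n \right\| \le \frac{M}{\lambda^n}. \tag{19.5.22} \]
Then there exists a continuous semigroup $S(t)$ which has $A$ as its generator and satisfies $\|S(t)\| \le M$ and $A$ is closed. In fact letting
\[ S_\lambda(t) \equiv \exp\left( t\left( -\lambda + \lambda^2 (\lambda I - A)^{-1} \right) \right) \]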
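Thus
\[ \|S_\lambda(t)\| \le e^{-\lambda t} M e^{\lambda t} = M \]
The series converges uniformly on any finite interval thanks to the Weierstrass $M$ test. Thus $t \to S_\lambda(t)$ is continuous and it is also routine to verify the semigroup identity. Clearly $\lim_{t \to 0} S_\lambda(t)x = x$. It is also the case that $S_\lambda(t)$ is generated by $-\lambda + \lambda^2 (\lambda I - A)^{-1} = A_\lambda$. This is easy to show from differentiating the power series which has a continuous derivative. Thus
\begin{align*}
&(-\lambda)\, e^{-\lambda t} \sum_{k=0}^{\infty} \frac{t^k \left( \lambda^2 (\lambda I - A)^{-1} \right)^k}{k!} x + e^{-\lambda t} \sum_{k=0}^{\infty} \frac{t^k \left( \lambda^2 (\lambda I - A)^{-1} \right)^{k+1}}{k!} x \\
&= \left( -\lambda + \lambda^2 (\lambda I - A)^{-1} \right) S_\lambda(t)x = S_\lambda(t) \left( -\lambda + \lambda^2 (\lambda I - A)^{-1} \right) x
\end{align*}
Now let $t \to 0+$ to obtain $\left( -\lambda + \lambda^2 (\lambda I - A)^{-1} \right) x = A_\lambda x$.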
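In finite dimensions the bounded operators $A_\lambda = -\lambda + \lambda^2 (\lambda I - A)^{-1}$ can be computed directly, and it is a standard fact about these approximations that $A_\lambda x \to Ax$ as $\lambda \to \infty$. The sketch below illustrates this for a $2 \times 2$ matrix. The choice $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, which generates rotations and satisfies $\|(\lambda I - A)^{-1}\| \le 1/\lambda$, as well as the function names, are my own assumptions for the example.

```python
def yosida(A, lam):
    """Return A_lam = -lam I + lam^2 (lam I - A)^{-1} for a 2 x 2 matrix A."""
    (a, b), (c, d) = A
    # Entries of lam I - A.
    m00, m01, m10, m11 = lam - a, -b, -c, lam - d
    determinant = m00 * m11 - m01 * m10
    # Inverse of a 2 x 2 matrix by the adjugate formula.
    inv = [[m11 / determinant, -m01 / determinant],
           [-m10 / determinant, m00 / determinant]]
    return [[(-lam if i == j else 0.0) + lam * lam * inv[i][j] for j in range(2)]
            for i in range(2)]

def approximation_error(A, lam):
    """Largest entrywise difference between A_lam and A."""
    A_lam = yosida(A, lam)
    return max(abs(A_lam[i][j] - A[i][j]) for i in range(2) for j in range(2))
```

For the rotation generator the error behaves like $1/\lambda$: the diagonal entries of $A_\lambda$ are $-\lambda/(\lambda^2 + 1)$ and the off diagonal entries are $\pm \lambda^2/(\lambda^2 + 1)$.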
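Claim: For $\lambda, \mu > 0$, $(\lambda I - A)^{-1}$ and $(\mu I - A)^{-1}$ commute.
Proof of claim: Suppose
\[ y = (\mu I - A)^{-1} (\lambda I - A)^{-1} x \tag{19.5.25} \]
\[ z = (\lambda I - A)^{-1} (\mu I - A)^{-1} x \tag{19.5.26} \]
Hence
\[ (\mu I - A) z = (\mu - \lambda) z + (\lambda I - A) z \in D(A). \]
Similarly
\[ (\mu I - A) y,\ (\lambda I - A) y \in D(A). \]
From 19.5.25
\[ (\lambda I - A)(\mu I - A) y = x \]
\begin{align*}
x &= (\mu I - A)(\lambda I - A) z \\
&= ((\mu - \lambda) I + (\lambda I - A))(\lambda I - A) z \\
&= (\mu - \lambda)(\lambda I - A) z + (\lambda I - A)^2 z \\
&= (\lambda I - A)(\mu - \lambda) z + (\lambda I - A)(\lambda I - A) z \\
&= (\lambda I - A)(\mu - \lambda) z + (\lambda I - A)((\lambda - \mu) I + (\mu I - A)) z \\
&= (\lambda I - A)(\mu - \lambda) z + (\lambda I - A)((\lambda - \mu) z + (\mu I - A) z) \\
&= (\lambda I - A)(\mu - \lambda) z + (\lambda I - A)(\lambda - \mu) z + (\lambda I - A)(\mu I - A) z \\
&= (\lambda I - A)(\mu I - A) z
\end{align*}
Thus
\[ x = (\lambda I - A)(\mu I - A) z = (\lambda I - A)(\mu I - A) y \]
and so $z = y$. This proves the claim.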
It follows from the description of $S_\lambda(t)$ that $S_\lambda(t)$ and $S_\mu(s)$ commute and also $A_\lambda$ commutes with $S_\mu(t)$ for any $t$.
I want to show that for each $x \in D(A)$,
Since $A_\lambda$ commutes with $S_\mu(r)$, the following formula follows from 19.5.24.
t
= (S (t r) S (r) A x S (t r) A S (r) x) dr
0
t
||S (t r) S (r) (A x A x)|| dr
0
M 2 t ||A x A x|| M 2 t (||A x Ax|| + ||Ax A x||)
( )
||Ax|| ||Ax||
+ tM 2
Hence whenever $\lambda, \mu$ large enough, $\|S_\lambda(t)x - S_\mu(t)x\|$ is small. Thus $S_\lambda(t)x$ converges uniformly on finite intervals to something denoted by $S(t)x$. Therefore, $t \to S(t)x$ is continuous for each $x \in D(A)$ and also
so that $S(t)$ can be extended to a continuous linear map, still called $S(t)$, defined on all of $H$ which also satisfies $\|S(t)\| \le M$ since $D(A)$ is dense in $H$. If $x$ is arbitrary, let $y \in D(A)$ be close to $x$. Then
\[ \|S(t)x - S_\lambda(t)x\| \le \|S(t)x - S(t)y\| + \|S(t)y - S_\lambda(t)y\| + \|S_\lambda(t)y - S_\lambda(t)x\| \]
and so
\[ \lim_{t \to 0+} \|S(t)x - x\| = 0 \]
By the uniform convergence just shown, there exists $\lambda$ large enough that for all $t \in [0, \delta]$,
\[ \|S(t)x - S_\lambda(t)x\| < \varepsilon. \]
Then
\begin{align*}
\limsup_{t \to 0+} \|S(t)x - x\| &\le \limsup_{t \to 0+} \left( \|S(t)x - S_\lambda(t)x\| + \|S_\lambda(t)x - x\| \right) \\
&\le \limsup_{t \to 0+} \left( \varepsilon + \|S_\lambda(t)x - x\| \right)
\end{align*}
Thus letting $B$ denote the generator of $S(t)$, $D(A) \subseteq D(B)$ and $A = B$ on $D(A)$. It only remains to verify $D(A) = D(B)$.
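To do this, let $\lambda > 0$ and consider the following where $y \in H$ is arbitrary.
\[ (\lambda I - B)^{-1} y = (\lambda I - B)^{-1} \left( (\lambda I - A)(\lambda I - A)^{-1} y \right) \]
Now $(\lambda I - A)^{-1} y \in D(A) \subseteq D(B)$ and $A = B$ on $D(A)$ and so
\[ (\lambda I - A)(\lambda I - A)^{-1} y = (\lambda I - B)(\lambda I - A)^{-1} y \]
which implies,
\[ (\lambda I - B)^{-1} y = (\lambda I - B)^{-1} \left( (\lambda I - B)(\lambda I - A)^{-1} y \right) = (\lambda I - A)^{-1} y \]
Recall from Proposition 19.5.5 that an arbitrary element of $D(B)$ is of the form $(\lambda I - B)^{-1} y$ and this has shown every such vector is in $D(A)$, in fact it equals $(\lambda I - A)^{-1} y$. Hence $D(B) \subseteq D(A)$ which shows $A$ generates $S(t)$ and this proves the first half of the theorem.
Next suppose $A$ is the generator of a semigroup $S(t)$ having $\|S(t)\| \le M$. Then by Proposition 19.5.5 for all $\lambda > 0$, $(\lambda I - A)$ is onto and
\[ (\lambda I - A)^{-1} = \int_0^{\infty} e^{-\lambda t} S(t)\,dt \]
thus
\begin{align*}
\left\| \left( (\lambda I - A)^{-1} \right)^n \right\| &= \left\| \int_0^{\infty} \cdots \int_0^{\infty} e^{-\lambda(t_1 + \cdots + t_n)} S(t_1 + \cdots + t_n)\,dt_1 \cdots dt_n \right\| \\
&\le \int_0^{\infty} \cdots \int_0^{\infty} e^{-\lambda(t_1 + \cdots + t_n)} M\,dt_1 \cdots dt_n = \frac{M}{\lambda^n}.
\end{align*}
This proves the theorem.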
\[ y' - \Lambda y = g(t) \]
provided $t \to g(t)$ is $C^1$.
\[ y' - \Lambda y = g, \quad y(0) = y_0 \in D(\Lambda) \]
and it is given by
\[ y(t) = S(t)y_0 + \int_0^t S(t - s) g(s)\,ds. \tag{19.5.28} \]
This solution is continuous having continuous derivative and has values in $D(\Lambda)$.
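Formula 19.5.28 can be tried out in the simplest possible setting, where $H = \mathbb{R}$ and the generator is multiplication by a number $a$, so that $S(t) = e^{at}$. The sketch below uses $a = -2$, $g(t) = \cos t$ and $y_0 = 1$, all of which are illustrative choices of mine, and compares the formula (with the integral done by a midpoint rule) against the solution of $y' = ay + g$ found by elementary methods, $y(t) = \frac{3}{5} e^{-2t} + \frac{2\cos t + \sin t}{5}$.

```python
import math

def mild_solution(t, a=-2.0, y0=1.0, g=math.cos, steps=4000):
    """y(t) = S(t) y0 + integral over [0, t] of S(t - s) g(s) ds with S(t) = exp(a t)."""
    h = t / steps
    integral = h * sum(math.exp(a * (t - (k + 0.5) * h)) * g((k + 0.5) * h)
                       for k in range(steps))
    return math.exp(a * t) * y0 + integral

def exact_solution(t):
    """Closed form solution of y' = -2 y + cos(t), y(0) = 1."""
    return 0.6 * math.exp(-2.0 * t) + (2.0 * math.cos(t) + math.sin(t)) / 5.0
```

The two functions should agree to within the quadrature error, for example at $t = 1.5$.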
Proof: First I show the following claim.
Claim: $\int_0^t S(t - s) g(s)\,ds \in D(\Lambda)$ and
\[ \Lambda\left( \int_0^t S(t - s) g(s)\,ds \right) = S(t)g(0) - g(t) + \int_0^t S(t - s) g'(s)\,ds \]
\[ = \frac{1}{h} \int_{-h}^{0} S(t - s) g(s + h)\,ds + \int_0^{t-h} S(t - s) \frac{g(s + h) - g(s)}{h}\,ds - \frac{1}{h} \int_{t-h}^{t} S(t - s) g(s)\,ds \]
Using the estimate in Theorem 19.5.3 on Page 490 and the dominated convergence theorem, the limit as $h \to 0$ of the above equals
\[ S(t)g(0) - g(t) + \int_0^t S(t - s) g'(s)\,ds \]
\begin{align*}
S(t)\Lambda y_0 &= S(t) \lim_{h \to 0} \frac{S(h)y_0 - y_0}{h} \\
&= \lim_{h \to 0} \frac{S(t + h) - S(t)}{h} y_0 \\
&= \lim_{h \to 0} \frac{S(h)S(t)y_0 - S(t)y_0}{h} \tag{19.5.29}
\end{align*}
Since this limit exists, the last limit in the above exists and equals
\[ \Lambda S(t) y_0 \tag{19.5.30} \]
\begin{align*}
\frac{y(t + h) - y(t)}{h} &= \frac{S(t + h) - S(t)}{h} y_0 + \frac{1}{h} \left( \int_0^{t+h} S(t - s + h) g(s)\,ds - \int_0^t S(t - s) g(s)\,ds \right) \\
&= \frac{S(t + h) - S(t)}{h} y_0 + \frac{1}{h} \int_t^{t+h} S(t - s + h) g(s)\,ds \\
&\quad + \frac{1}{h} \left( S(h) \int_0^t S(t - s) g(s)\,ds - \int_0^t S(t - s) g(s)\,ds \right)
\end{align*}
From the claim and 19.5.29, 19.5.30 the limit of the right side is
\begin{align*}
&\Lambda S(t)y_0 + g(t) + \Lambda\left( \int_0^t S(t - s) g(s)\,ds \right) \\
&= \Lambda\left( S(t)y_0 + \int_0^t S(t - s) g(s)\,ds \right) + g(t)
\end{align*}
Hence
\[ y'(t) = \Lambda y(t) + g(t) \]
and from the formula, $y'$ is continuous since by the claim and 19.5.30 it also equals
\[ S(t)\Lambda y_0 + g(t) + S(t)g(0) - g(t) + \int_0^t S(t - s) g'(s)\,ds \]
which is continuous. The claim and 19.5.30 also shows $y(t) \in D(\Lambda)$. This proves the existence part of the lemma.
It remains to prove the uniqueness part. It suffices to show that if
\[ y' - \Lambda y = 0, \quad y(0) = 0 \]
and $y$ is $C^1$ having values in $D(\Lambda)$, then $y = 0$. Suppose then that $y$ is this way. Letting $0 < s < t$,
\[ \frac{d}{ds} \left( S(t - s) y(s) \right) \equiv \lim_{h \to 0} \left( S(t - s - h) \frac{y(s + h) - y(s)}{h} - \frac{S(t - s) y(s) - S(t - s - h) y(s)}{h} \right) \]
provided the limit exists. Since $y'$ exists and $y(s) \in D(\Lambda)$, this equals
\[ S(t - s) y'(s) - S(t - s) \Lambda y(s) = 0. \]
Let $y^* \in H'$. This has shown that on the open interval $(0, t)$ the function $s \to y^*(S(t - s) y(s))$ has a derivative equal to 0. Also from continuity of $S$ and $y$, this function is continuous on $[0, t]$. Therefore, it is constant on $[0, t]$ by the mean value theorem. At $s = 0$, this function equals 0. Therefore, it equals 0 on $[0, t]$. Thus for fixed $s > 0$ and letting $t > s$, $y^*(S(t - s) y(s)) = 0$. Now let $t$ decrease toward $s$. Then $y^*(y(s)) = 0$ and since $y^*$ was arbitrary, it follows $y(s) = 0$. This proves uniqueness.
Then since $D(A)$ is dense, there exists a unique element of $H$ denoted by $A^* y$ such that
\[ (Ax, y) = (x, A^* y) \]
for all $x \in D(A)$.
(Ax, a) = (x, b)
and so $|(Ax, a)| \le C|x|$ for all $x \in D(A)$ which shows $a \in D(A^*)$ and
\[ (x, A^* a) = (x, b) \]
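and $S^{\perp}$ is always a closed subspace. Also $\tau$ and $\perp$ commute. The reason for this is that $[x, y] \in (\tau V)^{\perp}$ means that
\[ -(x, b) + (y, a) = 0 \]
for all $[a, b] \in V$ and $[x, y] \in \tau(V^{\perp})$ means $[-y, x] \in V^{\perp}$ so for all $[a, b] \in V$,
\[ -(y, a) + (x, b) = 0 \]
which says the same thing. It is also clear that $\tau \circ \tau$ has the effect of multiplication by $-1$.
It follows from the above description of the graph of $A^*$ that even if $G(A)$ were not closed it would still be the case that $G(A^*)$ is closed.
Why is $D(A^*)$ dense? Suppose $z \in D(A^*)^{\perp}$. Then for all $y \in D(A^*)$ so that $[y, A^* y] \in G(A^*)$, it follows $[z, 0] \in G(A^*)^{\perp} = \left( (\tau G(A))^{\perp} \right)^{\perp} = \tau G(A)$ but this implies
\[ [0, z] \in -G(A) \]
and so $z = -A0 = 0$. Thus $D(A^*)$ must be dense since there is no nonzero vector in $D(A^*)^{\perp}$.
Since $A$ is a closed operator, meaning $G(A)$ is closed in $H \times H$, it follows from the above formula that
\begin{align*}
G\left( (A^*)^* \right) &= \left( \tau G(A^*) \right)^{\perp} = \left( \tau \left( (\tau G(A))^{\perp} \right) \right)^{\perp} \\
&= \left( (\tau \tau G(A))^{\perp} \right)^{\perp} = \left( (-G(A))^{\perp} \right)^{\perp} = \left( G(A)^{\perp} \right)^{\perp} = G(A)
\end{align*}
and so $(A^*)^* = A$.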
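Now consider the final claim. First let $y \in D(A^*) = D(\lambda I - A^*)$. Then letting $x \in H$ be arbitrary,
\[ \left( x, \left( (\lambda I - A)(\lambda I - A)^{-1} \right)^* y \right) = \left( (\lambda I - A)(\lambda I - A)^{-1} x, y \right) = \left( x, \left( (\lambda I - A)^{-1} \right)^* (\lambda I - A^*) y \right) \]
Thus
\[ \left( (\lambda I - A)(\lambda I - A)^{-1} \right)^* = I = \left( (\lambda I - A)^{-1} \right)^* (\lambda I - A^*) \tag{19.5.32} \]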
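Hence
\[ (x, y) = \left( x, (\lambda I - A^*) \left( (\lambda I - A)^{-1} \right)^* y \right). \]
and $(\lambda I - A^*)$ is one to one and onto with continuous inverse. Finally, from the above,
\[ \left( (\lambda I - A^*)^{-1} \right)^n = \left( \left( (\lambda I - A)^{-1} \right)^* \right)^n = \left( \left( (\lambda I - A)^{-1} \right)^n \right)^*. \]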
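Proof: First suppose $S(t)$ is also a bounded semigroup, $\|S(t)\| \le M$. From Lemma 19.5.10 $A^*$ is closed and densely defined. It follows from the Hille Yosida theorem, Theorem 19.5.7 that
\[ \left\| \left( (\lambda I - A)^{-1} \right)^n \right\| \le \frac{M}{\lambda^n} \]
From Lemma 19.5.10 and the fact the adjoint of a bounded linear operator preserves the norm,
\[ \frac{M}{\lambda^n} \ge \left\| \left( \left( (\lambda I - A)^{-1} \right)^n \right)^* \right\| = \left\| \left( \left( (\lambda I - A)^{-1} \right)^* \right)^n \right\| = \left\| \left( (\lambda I - A^*)^{-1} \right)^n \right\| \]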
(( )k )
1
t
tk 2n (n I A)
= lim e n x, y
n k!
k=0
(( )k )
1
t t
k
2n (n I A)
= lim x, e n y
n k!
k=0
Thus $-\alpha I + A$ generates $e^{-\alpha t} S(t)$ and it follows from the first part that $-\alpha I + A^*$ generates $e^{-\alpha t} S^*(t)$. Thus
\begin{align*}
-\alpha x + A^* x &= \lim_{h \to 0+} \frac{e^{-\alpha h} S^*(h)x - x}{h} \\
&= \lim_{h \to 0+} \left( e^{-\alpha h} \frac{S^*(h)x - x}{h} + \frac{e^{-\alpha h} - 1}{h} x \right) \\
&= -\alpha x + \lim_{h \to 0+} \frac{S^*(h)x - x}{h}
\end{align*}
showing that $A^*$ generates $S^*(t)$. It follows from Proposition 19.5.5 that $A^*$ is closed and densely defined. It is obvious $S^*(t)$ is a semigroup. Why is it continuous? This also follows from the first part of the argument which establishes that
\[ e^{-\alpha t} S^*(t) \]
Then since $D(A)$ is dense, there exists a unique element of $H'$ denoted by $A^* y^*$ such that
\[ A^*(y^*)(x) = y^*(Ax) \]
for all $x \in D(A)$.
For S H H, dene S by
Here is why 19.5.34 is so. For $[x^*, A^* x^*] \in G(A^*)$ it follows that for all $y \in D(A)$
\[ x^*(Ay) = A^* x^*(y) \]
\[ -x^*(Ay) + A^* x^*(y) = 0 \]
which is what it means to say $[x^*, A^* x^*] \in (\tau G(A))^{\perp}$. This shows
\[ G(A^*) \subseteq (\tau G(A))^{\perp} \]
To obtain the other inclusion, let $[a^*, b^*] \in (\tau G(A))^{\perp}$. This means that for all $[x, Ax] \in G(A)$,
\[ -a^*(Ax) + b^*(x) = 0 \]
In other words, for all $x \in D(A)$,
and $S^{\perp}$ is always a closed subspace. Also $\tau$ and $\perp$ commute. The reason for this is that $[x^*, y^*] \in (\tau V)^{\perp}$ means that
\[ -x^*(b) + y^*(a) = 0 \]
for all $[a, b] \in V$ and $[x^*, y^*] \in \tau(V^{\perp})$ means $[-y^*, x^*] \in -(V^{\perp}) = V^{\perp}$ so for all $[a, b] \in V$,
\[ -y^*(a) + x^*(b) = 0 \]
which says the same thing. It is also clear that $\tau \circ \tau$ has the effect of multiplication by $-1$. If $V \subseteq H' \times H'$, the argument for commuting $\perp$ and $\tau$ is similar.
It follows from the above description of the graph of $A^*$ that even if $G(A)$ were not closed it would still be the case that $G(A^*)$ is closed.
Why is $D(A^*)$ dense? If it is not dense, then by a typical application of the Hahn Banach theorem, there exists $y^{**} \in H''$ such that $y^{**}(D(A^*)) = 0$ but $y^{**} \ne 0$. Since $H$ is reflexive, there exists $y \in H$ such that $x^*(y) = 0$ for all $x^* \in D(A^*)$. Thus
\[ [y, 0] \in G(A^*)^{\perp} = \left( (\tau G(A))^{\perp} \right)^{\perp} = \tau G(A) \]
Then 19.5.35 and 19.5.36 show $\lambda I - A^*$ is one to one and onto from $D(A^*)$ to $H'$ and
\[ (\lambda I - A^*)^{-1} = \left( (\lambda I - A)^{-1} \right)^*. \]
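Proof: First suppose $S(t)$ is also a bounded semigroup, $\|S(t)\| \le M$. From Lemma 19.5.13 $A^*$ is closed and densely defined. It follows from the Hille Yosida theorem, Theorem 19.5.7 that
\[ \left\| \left( (\lambda I - A)^{-1} \right)^n \right\| \le \frac{M}{\lambda^n} \]
From Lemma 19.5.13 and the fact the adjoint of a bounded linear operator preserves the norm,
\[ \frac{M}{\lambda^n} \ge \left\| \left( \left( (\lambda I - A)^{-1} \right)^n \right)^* \right\| = \left\| \left( \left( (\lambda I - A)^{-1} \right)^* \right)^n \right\| = \left\| \left( (\lambda I - A^*)^{-1} \right)^n \right\| \]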
(( )k )
1
tk 2n (n I A)
= lim en t x (y)
n k!
k=0
(( )k )
1
t t k
2
n (n I A)
= lim x
e n
y
n k!
k=0
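Thus $-\alpha I + A$ generates $e^{-\alpha t} S(t)$ and it follows from the first part that $-\alpha I + A^*$ generates the semigroup $e^{-\alpha t} S^*(t)$. Thus
\begin{align*}
-\alpha x + A^* x &= \lim_{h \to 0+} \frac{e^{-\alpha h} S^*(h)x - x}{h} \\
&= \lim_{h \to 0+} \left( e^{-\alpha h} \frac{S^*(h)x - x}{h} + \frac{e^{-\alpha h} - 1}{h} x \right) \\
&= -\alpha x + \lim_{h \to 0+} \frac{S^*(h)x - x}{h}
\end{align*}
showing that $A^*$ generates $S^*(t)$. It follows from Proposition 19.5.5 that $A^*$ is closed and densely defined. It is obvious $S^*(t)$ is a semigroup. Why is it continuous? This also follows from the first part of the argument which establishes that
\[ t \to e^{-\alpha t} S^*(t)x \]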
19.6 Exercises
1. For $f, g \in C([0,1])$ let $(f, g) = \int_0^1 f(x)\overline{g(x)}\,dx$. Is this an inner product space? Is it a Hilbert space? What does the Cauchy Schwarz inequality say in this context?
2. Suppose the following conditions hold.
$$(x, x) \ge 0, \quad (19.6.37)$$
These are the same conditions for an inner product except it is no longer required that $(x, x) = 0$ if and only if $x = 0$. Does the Cauchy Schwarz inequality hold in the following form?
$$|(x, y)| \le (x, x)^{1/2}(y, y)^{1/2}.$$
$$S \equiv \{x \in X : \|x\| = 1\}.$$
Next suppose $(X, \|\cdot\|)$ is a real normed linear space and the parallelogram identity holds. Can it be concluded there exists an inner product $(\cdot, \cdot)$ such that $\|x\| = (x, x)^{1/2}$?
6. Let $K$ be a closed, bounded and convex set in $\mathbb{R}^n$ and let $f : K \to \mathbb{R}^n$ be continuous and let $y \in \mathbb{R}^n$. Show using the Brouwer fixed point theorem there exists a point $x \in K$ such that $P(y - f(x) + x) = x$. Next show that $(y - f(x), z - x) \le 0$ for all $z \in K$. The existence of this $x$ is known as Browder's lemma and it has great significance in the study of certain types of nonlinear operators. Now suppose $f : \mathbb{R}^n \to \mathbb{R}^n$ is continuous and satisfies
$$\lim_{|x|\to\infty}\frac{(f(x), x)}{|x|} = \infty.$$
Show using Browder's lemma that $f$ is onto.
7. Show that every inner product space is uniformly convex. This means that if $x_n, y_n$ are vectors whose norms are no larger than 1 and if $\|x_n + y_n\| \to 2$, then $\|x_n - y_n\| \to 0$.
8. Let $H$ be separable and let $S$ be an orthonormal set. Show $S$ is countable. Hint: How far apart are two elements of the orthonormal set?
12. Show $S$ is a maximal orthonormal set if and only if span$(S)$ is dense in $H$, where span$(S)$ is defined as
$$\text{span}(S) \equiv \{\text{all finite linear combinations of elements of } S\}.$$
$$x = \sum_{n=1}^{\infty}(x, x_n)x_n \equiv \lim_{N\to\infty}\sum_{n=1}^{N}(x, x_n)x_n$$
and $\|x\|^2 = \sum_{i=1}^{\infty}|(x, x_i)|^2$. Also show $(x, y) = \sum_{n=1}^{\infty}(x, x_n)\overline{(y, x_n)}$. Hint: For the last part of this, you might proceed as follows. Show that
$$((x, y)) \equiv \sum_{n=1}^{\infty}(x, x_n)\overline{(y, x_n)}$$
is a well defined inner product on the Hilbert space which delivers the same norm as the original inner product. Then you could verify that there exists a formula for the inner product in terms of the norm and conclude the two inner products, $(\cdot,\cdot)$ and $((\cdot,\cdot))$ must coincide.
14. Suppose $X$ is an infinite dimensional Banach space and suppose
$$\{x_1 \cdots x_n\}$$
are linearly independent with $\|x_i\| = 1$. By Problem 9 span$(x_1 \cdots x_n) \equiv X_n$ is a closed linear subspace of $X$. Now let $z \notin X_n$ and pick $y \in X_n$ such that $\|z - y\| \le 2\,\text{dist}(z, X_n)$ and let
$$x_{n+1} = \frac{z - y}{\|z - y\|}.$$
Show the sequence $\{x_k\}$ satisfies $\|x_n - x_k\| \ge 1/2$ whenever $k < n$. Now show the unit ball $\{x \in X : \|x\| \le 1\}$ in a normed linear space is compact if and only if $X$ is finite dimensional. Hint:
$$\left\|\frac{z - y}{\|z - y\|} - x_k\right\| = \left\|\frac{z - y - x_k\|z - y\|}{\|z - y\|}\right\|.$$
15. Show that if $A$ is a self adjoint operator on a Hilbert space and $Ay = \lambda y$ for $\lambda$ a complex number and $y \neq 0$, then $\lambda$ must be real. Also verify that if $A$ is self adjoint and $Ax = \mu x$ while $Ay = \lambda y$, then if $\mu \neq \lambda$, it must be the case that $(x, y) = 0$.
16. Theorem 19.5.8 gives the existence and uniqueness for an evolution equation of the form
$$y' - \Lambda y = g,\quad y(0) = y_0 \in H$$
where $g$ is in $C^1(0, \infty; H)$ for $H$ a Banach space. Recall $\Lambda$ was the generator of a continuous semigroup, $S(h)$. Generalize this to an equation of the form
$$y' - \Lambda y = g + Ly,\quad y(0) = y_0 \in H$$
Representation Theorems
By Holder's inequality,
$$|\Lambda g| \le \left(\int_\Omega 1^2\,d\lambda\right)^{1/2}\left(\int_\Omega |g|^2\,d(\lambda + \mu)\right)^{1/2} = \lambda(\Omega)^{1/2}\,\|g\|_2$$
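The plan is to show $h$ is real and nonnegative at least a.e. Therefore, consider the set where $\operatorname{Im} h$ is positive.
$$E = \{x \in \Omega : \operatorname{Im} h(x) > 0\},$$
$$\int_{E_n}(\operatorname{Im} h)\,d(\lambda + \mu) \ge \frac{1}{n}(\lambda + \mu)(E_n)$$
where
$$E_n \equiv \left\{x : \operatorname{Im} h(x) \ge \frac{1}{n}\right\}.$$
Thus $(\lambda + \mu)(E_n) = 0$ and since $E = \cup_{n=1}^{\infty}E_n$, it follows $(\lambda + \mu)(E) = 0$. A similar argument shows that for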
Let $f(x) = \sum_{i=1}^{\infty}h^i(x)$ and use the Monotone Convergence theorem in 20.1.4 to let $n \to \infty$ and conclude
$$\lambda(E) = \int_E f\,d\mu.$$
$f \in L^1(\Omega, \mu)$ because $\lambda$ is finite.
Similarly, the set where $g$ is larger than $f$ has measure zero. This proves the theorem.
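Case where it is not necessarily true that $\lambda \ll \mu$.
In this case, let $N = [h \ge 1]$ and let $g = X_N$. Then
$$\lambda(N) = \int_N h\,d(\mu + \lambda) \ge \mu(N) + \lambda(N).$$
$$\lambda_\perp(E) \equiv \lambda(E \cap N)$$
Also,
$$\lambda_{||}(E) = \lambda(E) - \lambda_\perp(E) \equiv \lambda(E) - \lambda(E \cap N) = \lambda\left(E \cap N^C\right).$$
Suppose $\lambda_{||}(E) > 0$. Therefore, since $h < 1$ on $N^C$,
$$\lambda_{||}(E) = \lambda\left(E \cap N^C\right) = \int_{E \cap N^C} h\,d(\mu + \lambda)$$
$$< \mu\left(E \cap N^C\right) + \lambda\left(E \cap N^C\right) = \mu(E) + \lambda_{||}(E),$$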
$$S_n \cap S_m = \emptyset,\quad \cup_{n=1}^{\infty}S_n = \Omega,$$
20.1. RADON NIKODYM THEOREM 519
and $\mu(S_n), \lambda(S_n) < \infty$. Then there exists $f \ge 0$, where $f$ is measurable, and
$$\lambda(E) = \int_E f\,d\mu$$
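For a measure space with finitely many points, the conclusion $\lambda(E) = \int_E f\,d\mu$ can be checked directly, since $f$ is then just the ratio of point masses. The following is only an illustrative sketch under that assumption; the function names and the sample masses are hypothetical and do not come from the text.

```python
from fractions import Fraction

def radon_nikodym_density(lam, mu):
    """Return f with lam(E) = sum over w in E of f[w] * mu[w].

    lam and mu map each point of a finite set to a nonnegative point
    mass. Absolute continuity is required: mu[w] == 0 forces lam[w] == 0.
    """
    f = {}
    for w in mu:
        if mu[w] == 0:
            if lam[w] != 0:
                raise ValueError("lam is not absolutely continuous w.r.t. mu")
            f[w] = Fraction(0)  # any value works on a mu-null point
        else:
            f[w] = Fraction(lam[w]) / Fraction(mu[w])
    return f

def integrate(f, mu, E):
    """Integral of f over the finite set E with respect to mu."""
    return sum(f[w] * mu[w] for w in E)

# Hypothetical sample data on the three-point set {a, b, c}.
mu = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(0)}
lam = {"a": Fraction(3), "b": Fraction(1), "c": Fraction(0)}
f = radon_nikodym_density(lam, mu)
```

Exact rational arithmetic is used so that the identity can be compared with `==` rather than up to rounding.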
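$$\mathcal{S}_n \equiv \{E \cap S_n : E \in \mathcal{S}\}.$$
$$f^{-1}((a, \infty]) = \cup_{n=1}^{\infty}f_n^{-1}((a, \infty]) \in \mathcal{S}.$$
Also, for $E \in \mathcal{S}$,
$$\lambda(E) = \sum_{n=1}^{\infty}\lambda(E \cap S_n) = \sum_{n=1}^{\infty}\int X_{E \cap S_n}(x)f_n(x)\,d\mu$$
$$= \sum_{n=1}^{\infty}\int X_{E \cap S_n}(x)f(x)\,d\mu$$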
(Ek ) = lim (E Sn ) = 0.
n
Hence $\mu([f_1 - f_2 > 0]) \le \sum_{k=1}^{\infty}\mu(E_k) = 0$. Therefore, $\lambda([f_1 - f_2 > 0]) = 0$ also. Similarly
$$(\mu + \lambda)([f_1 - f_2 < 0]) = 0.$$
This version of the Radon Nikodym theorem will suffice for most applications, but more general versions are available. To see one of these, one can read the treatment in Hewitt and Stromberg [28]. This involves the notion of decomposable measure spaces, a generalization of $\sigma$ finite.
Not surprisingly, there is a simple generalization of the Lebesgue decomposition part of Theorem 20.1.2.
$$\lambda_\perp + \lambda_{||} = \lambda,\quad \lambda_{||} \ll \mu,\quad \lambda_\perp(E) = \lambda(E \cap N) = \lambda_\perp(E \cap N).$$
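such that $\lambda^i = \lambda^i_\perp + \lambda^i_{||}$, a set of $\mu^i$ measure zero, $N_i \in \mathcal{S}_i$ such that for all $E \in \mathcal{S}_i$, $\lambda^i_\perp(E) = \lambda^i(E \cap N_i)$ and $\lambda^i_{||} \ll \mu^i$. Define for $E \in \mathcal{S}$
$$\lambda_\perp(E) \equiv \sum_i \lambda^i_\perp(E \cap \Omega_i),\quad \lambda_{||}(E) \equiv \sum_i \lambda^i_{||}(E \cap \Omega_i),\quad N \equiv \cup_i N_i.$$
and
$$\lambda_\perp(E) \equiv \sum_i \lambda^i_\perp(E \cap \Omega_i) = \sum_i \lambda^i(E \cap \Omega_i \cap N_i)$$
$$= \sum_i \lambda(E \cap \Omega_i \cap N) = \lambda(E \cap N).$$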
20.2. VECTOR MEASURES 521
The decomposition is unique because of the uniqueness of the $\lambda^i_{||}$ and $\lambda^i_\perp$ and the observation that some other decomposition must coincide with the given one on the $\Omega_i$.
Definition 20.2.1 Let $(V, \|\cdot\|)$ be a normed linear space and let $(\Omega, \mathcal{S})$ be a measure space. A function $\mu : \mathcal{S} \to V$ is a vector measure if $\mu$ is countably additive. That is, if $\{E_i\}_{i=1}^{\infty}$ is a sequence of disjoint sets of $\mathcal{S}$,
$$\mu\left(\cup_{i=1}^{\infty}E_i\right) = \sum_{i=1}^{\infty}\mu(E_i).$$
Note that it makes sense to take finite sums because it is given that $\mu$ has values in a vector space in which vectors can be summed. In the above, $\mu(E_i)$ is a vector. It might be a point in $\mathbb{R}^n$ or in any other vector space. In many of the most important applications, it is a vector in some sort of function space which may be infinite dimensional. The infinite sum has the usual meaning. That is
$$\sum_{i=1}^{\infty}\mu(E_i) = \lim_{n\to\infty}\sum_{i=1}^{n}\mu(E_i)$$
The next theorem may seem a little surprising. It states that, if finite, the total variation is a nonnegative measure.
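On a finite set the total variation can be computed by brute force. The sketch below assumes the definition used in the proofs of this section, namely the supremum of $\sum_{F \in \pi(E)}\|\mu(F)\|$ over finite partitions $\pi(E)$ of $E$, and takes a complex measure given by point masses. All names and the sample values are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition

def total_variation(point_mass, E):
    """Maximum over partitions pi(E) of the sum of |mu(F)| for F in pi(E)."""
    E = list(E)
    if not E:
        return 0.0
    return max(
        sum(abs(sum(point_mass[w] for w in block)) for block in partition)
        for partition in set_partitions(E)
    )

# Hypothetical complex measure on a four-point set.
mu = {"a": 1 + 1j, "b": -2, "c": 3j, "d": -1 - 1j}
```

By the triangle inequality the maximum is attained at the partition into single points, so in this discrete setting the total variation of a set is the sum of the moduli of its point masses, and it is additive over disjoint sets.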
Proof: Consider the last claim. Let $a < |\mu|(A)$ and let $\pi(A)$ be a partition of $A$ such that
$$a < \sum_{F \in \pi(A)}\|\mu(F)\|.$$
Since this is true for all such $a$, it follows $|\mu|(B) \ge |\mu|(A)$ as claimed.
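Let $\{E_j\}_{j=1}^{\infty}$ be a sequence of disjoint sets of $\mathcal{S}$ and let $E_\infty = \cup_{j=1}^{\infty}E_j$. Then letting $a < |\mu|(E_\infty)$, it follows from the definition of total variation there exists a partition of $E_\infty$, $\pi(E_\infty) = \{A_1, \cdots, A_n\}$ such that
$$a < \sum_{i=1}^{n}\|\mu(A_i)\|.$$
Also,
$$A_i = \cup_{j=1}^{\infty}A_i \cap E_j$$
and so by the triangle inequality, $\|\mu(A_i)\| \le \sum_{j=1}^{\infty}\|\mu(A_i \cap E_j)\|$. Therefore, by the above, and either Fubini's theorem or Lemma 1.3.4 on Page 26
$$a < \sum_{i=1}^{n}\overbrace{\sum_{j=1}^{\infty}\|\mu(A_i \cap E_j)\|}^{\ge \|\mu(A_i)\|} = \sum_{j=1}^{\infty}\sum_{i=1}^{n}\|\mu(A_i \cap E_j)\| \le \sum_{j=1}^{\infty}|\mu|(E_j)$$
because $\{A_i \cap E_j\}_{i=1}^{n}$ is a partition of $E_j$.
Since $a$ is arbitrary, this shows
$$|\mu|\left(\cup_{j=1}^{\infty}E_j\right) \le \sum_{j=1}^{\infty}|\mu|(E_j).$$
If the sets, $E_j$ are not disjoint, let $F_1 = E_1$ and if $F_n$ has been chosen, let $F_{n+1} \equiv E_{n+1} \setminus \cup_{i=1}^{n}E_i$. Thus the sets, $F_i$ are disjoint and $\cup_{i=1}^{\infty}F_i = \cup_{i=1}^{\infty}E_i$. Therefore,
$$|\mu|\left(\cup_{j=1}^{\infty}E_j\right) = |\mu|\left(\cup_{j=1}^{\infty}F_j\right) \le \sum_{j=1}^{\infty}|\mu|(F_j) \le \sum_{j=1}^{\infty}|\mu|(E_j)$$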
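The passage from arbitrary sets $E_j$ to disjoint sets $F_j$ with the same union is a purely set theoretic step. A minimal sketch with finite Python sets follows; the function name is hypothetical.

```python
def disjointify(sets):
    """Return F_1, F_2, ... where F_1 = E_1 and F_{n+1} is E_{n+1}
    with every point of E_1, ..., E_n removed."""
    seen = set()
    result = []
    for E in sets:
        result.append(set(E) - seen)  # keep only the points not met so far
        seen |= set(E)
    return result

E = [{1, 2}, {2, 3}, {1, 4}, {3}]
F = disjointify(E)
```

Each $F_j$ is contained in $E_j$, which is what allows the monotonicity of the total variation to be used in the last inequality.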
Such a partition exists because of the definition of the total variation. Consider the sets which are contained in either of $\pi(E_1)$ or $\pi(E_2)$; it follows this collection of sets is a partition of $E_1 \cup E_2$ denoted by $\pi(E_1 \cup E_2)$. Then by the above inequality and the definition of total variation,
$$|\mu|(E_1 \cup E_2) \ge \sum_{F \in \pi(E_1 \cup E_2)}\|\mu(F)\| > |\mu|(E_1) + |\mu|(E_2) - 2\epsilon,$$
Since $n$ is arbitrary,
$$|\mu|\left(\cup_{j=1}^{\infty}E_j\right) = \sum_{j=1}^{\infty}|\mu|(E_j)$$
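it follows that
$$|\mu|(E) \le \epsilon + \sum_{F \in \pi(E)}|\mu(F)| = \epsilon + \sum_{F \in \pi(E)}\left|\sum_{i=1}^{n}\mu(F \cap E_i)\right|$$
$$\le \epsilon + \sum_{i=1}^{n}\sum_{F \in \pi(E)}|\mu(F \cap E_i)| \le \epsilon + \sum_{i=1}^{n}|\mu|(E_i)$$
Then $|\mu|(\Omega) < \infty$.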
Here 20 is just a nice sized number. No effort is made to be delicate in this argument. Also note that $\mu(E) \in \mathbb{C}$ because it is given that $\mu$ is a complex measure. Consider the following picture consisting of two lines in the complex plane having slopes 1 and -1 which intersect at the origin, dividing the complex plane into four closed sets, $R_1, R_2, R_3$, and $R_4$ as shown.
[Figure: the two lines through the origin bound four closed sectors, with $R_1$ on the right, $R_2$ on top, $R_3$ on the left and $R_4$ at the bottom.]
Let $\pi_i$ consist of those sets, $A$ of $\pi(E)$ for which $\mu(A) \in R_i$. Thus, some sets, $A$ of $\pi(E)$ could be in two of the $\pi_i$ if $\mu(A)$ is on one of the intersecting lines. This is not important. The thing which is important is that if $\mu(A) \in R_1$ or $R_3$, then $\frac{\sqrt{2}}{2}|\mu(A)| \le |\operatorname{Re}(\mu(A))|$ and if $\mu(A) \in R_2$ or $R_4$ then $\frac{\sqrt{2}}{2}|\mu(A)| \le |\operatorname{Im}(\mu(A))|$ and $\operatorname{Re}(z)$ has the same sign for $z$ in $R_1$ and $R_3$ while $\operatorname{Im}(z)$ has the same sign for $z$ in $R_2$ or $R_4$. Then by 20.2.6, it follows that for some $i$,
$$\sum_{F \in \pi_i}|\mu(F)| > 5(1 + |\mu(E)|). \quad (20.2.7)$$
Define $D \equiv E \setminus C$.
and so
$$1 < \frac{5}{2} + \frac{3}{2}|\mu(E)| < |\mu(D)|.$$
Now since $|\mu|(E) = \infty$, it follows from Theorem 20.2.5 that $\infty = |\mu|(E) \le |\mu|(C) + |\mu|(D)$ and so either $|\mu|(C) = \infty$ or $|\mu|(D) = \infty$. If $|\mu|(C) = \infty$, let $B = C$ and $A = D$. Otherwise, let $B = D$ and $A = C$. This proves the claim.
Now suppose $|\mu|(\Omega) = \infty$. Then from the claim, there exist $A_1$ and $B_1$ such that $|\mu|(B_1) = \infty$, $|\mu(B_1)|, |\mu(A_1)| > 1$, and $A_1 \cup B_1 = \Omega$. Let $B_1 \equiv \Omega \setminus A_1$ play the same role as $\Omega$ and obtain $A_2, B_2 \subseteq B_1$ such that $|\mu|(B_2) = \infty$, $|\mu(B_2)|, |\mu(A_2)| > 1$, and $A_2 \cup B_2 = B_1$. Continue in this way to obtain a sequence of disjoint sets, $\{A_i\}$ such that $|\mu(A_i)| > 1$. Then since $\mu$ is a measure,
$$\mu\left(\cup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}\mu(A_i)$$
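Therefore, each of
$$\frac{|\operatorname{Re}\lambda| + \operatorname{Re}\lambda}{2},\ \frac{|\operatorname{Re}\lambda| - \operatorname{Re}(\lambda)}{2},\ \frac{|\operatorname{Im}\lambda| + \operatorname{Im}\lambda}{2},\ \text{and}\ \frac{|\operatorname{Im}\lambda| - \operatorname{Im}(\lambda)}{2}$$
is a finite measure on $\mathcal{S}$. It is also clear that each of these finite measures is absolutely continuous with respect to $\mu$ and so there exist unique nonnegative functions in $L^1(\Omega)$, $f_1, f_2, g_1, g_2$ such that for all $E \in \mathcal{S}$,
$$\frac{1}{2}(|\operatorname{Re}\lambda| + \operatorname{Re}\lambda)(E) = \int_E f_1\,d\mu,$$
$$\frac{1}{2}(|\operatorname{Re}\lambda| - \operatorname{Re}\lambda)(E) = \int_E f_2\,d\mu,$$
$$\frac{1}{2}(|\operatorname{Im}\lambda| + \operatorname{Im}\lambda)(E) = \int_E g_1\,d\mu,$$
$$\frac{1}{2}(|\operatorname{Im}\lambda| - \operatorname{Im}\lambda)(E) = \int_E g_2\,d\mu.$$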
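[Figure: the unit disk centered at $(0, 0)$ and a ball $B(p, r)$ centered at $p$.]
because it is closer to $p$ than $r$. (Refer to the picture.) However, this contradicts the assumption of the lemma. It follows $\mu(E) = 0$. Since the set of complex numbers, $z$ such that $|z| > 1$ is an open set, it equals the union of countably many balls, $\{B_i\}_{i=1}^{\infty}$. Therefore,
$$\mu\left(f^{-1}(\{z \in \mathbb{C} : |z| > 1\})\right) = \mu\left(\cup_{k=1}^{\infty}f^{-1}(B_k)\right) \le \sum_{k=1}^{\infty}\mu\left(f^{-1}(B_k)\right) = 0.$$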
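Corollary 20.2.8 Let $\lambda$ be a complex vector measure with $|\lambda|(\Omega) < \infty$ (see the footnote). Then there exists a unique $f \in L^1(\Omega)$ such that $\lambda(E) = \int_E f\,d|\lambda|$. Furthermore, $|f| = 1$ for $|\lambda|$ a.e. This is called the polar decomposition of $\lambda$.
Footnote: As proved above, the assumption that $|\lambda|(\Omega) < \infty$ is redundant.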
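For a complex measure carried by finitely many points, the polar decomposition can be written down explicitly: the total variation has point masses equal to the moduli, and $f$ is the unimodular phase. The sketch below makes that finite support assumption; the names and sample values are hypothetical.

```python
def polar_decomposition(point_mass):
    """Return (tv, f): the point masses of the total variation and the
    density f, so that point_mass[w] == f[w] * tv[w] with |f[w]| == 1."""
    tv = {w: abs(z) for w, z in point_mass.items()}
    # On a point of total variation zero f is arbitrary; 1 keeps |f| = 1.
    f = {w: (z / abs(z) if z != 0 else 1) for w, z in point_mass.items()}
    return tv, f

# Hypothetical complex measure on a four-point set.
lam = {"a": 3 + 4j, "b": -2j, "c": 0, "d": -1.5}
tv, f = polar_decomposition(lam)
```

Summing `f[w] * tv[w]` over the points of a set $E$ recovers $\lambda(E)$, which is the discrete form of $\lambda(E) = \int_E f\,d|\lambda|$.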
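Proof: First note that $\lambda \ll |\lambda|$ and so such an $L^1$ function exists and is unique. It is required to show $|f| = 1$ a.e. If $|\lambda|(E) \neq 0$,
$$\left|\frac{\lambda(E)}{|\lambda|(E)}\right| = \left|\frac{1}{|\lambda|(E)}\int_E f\,d|\lambda|\right| \le 1.$$
$$\sum_{i=1}^{m}|\lambda(F_i)| = \sum_{i=1}^{m}\left|\int_{F_i}f\,d|\lambda|\right| \le \sum_{i=1}^{m}\int_{F_i}|f|\,d|\lambda|$$
$$\le \sum_{i=1}^{m}\int_{F_i}\left(1 - \frac{1}{n}\right)d|\lambda| = \sum_{i=1}^{m}\left(1 - \frac{1}{n}\right)|\lambda|(F_i) = |\lambda|(E_n)\left(1 - \frac{1}{n}\right),$$
which shows $|\lambda|(E_n) = 0$. Hence $|\lambda|([|f| < 1]) = 0$ because $[|f| < 1] = \cup_{n=1}^{\infty}E_n$. This proves Corollary 20.2.8.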
Then
$$|\lambda|(E) = \int_E |h|\,d\mu.$$
Proof: From Corollary 20.2.8 there exists $g$ such that $|g| = 1$, $|\lambda|$ a.e. and for all $E \in \mathcal{S}$
$$\lambda(E) = \int_E g\,d|\lambda| = \int_E h\,d\mu.$$
20.3. REPRESENTATION THEOREMS FOR THE DUAL SPACE OF LP 529
It follows $\overline{g}h \ge 0$ a.e. and $|\overline{g}| = 1$. Therefore, $|h| = |\overline{g}h| = \overline{g}h$. It follows from the above, that
$$|\lambda|(E) = \int_E d|\lambda| = \int_E \overline{g}h\,d\mu = \int_E |h|\,d\mu$$
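This function satisfies $\|h\|_q = \|\Lambda\|$ where $\|\Lambda\|$ is the operator norm of $\Lambda$.
Now let $\lambda(E) = \Lambda(X_E)$. Since this is a finite measure space $X_E$ is an element of $L^p(\Omega)$ and so it makes sense to write $\Lambda(X_E)$. In fact $\lambda$ is a complex measure having finite total variation. Let $A_1, \cdots, A_n$ be a partition of $\Omega$.
$$\|X_{F_n} - X_F\|_p \to 0.$$
Therefore, by continuity of $\Lambda$,
$$\lambda(F) = \Lambda(X_F) = \lim_{n\to\infty}\Lambda(X_{F_n}) = \lim_{n\to\infty}\sum_{k=1}^{n}\Lambda(X_{E_k}) = \sum_{k=1}^{\infty}\lambda(E_k).$$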
from Theorem 7.5.6 on Page 175 upon breaking $f$ up into positive and negative parts of real and imaginary parts. In fact this theorem gives uniform convergence. Then
$$\Lambda(f) = \lim_{n\to\infty}\Lambda(s_n) = \lim_{n\to\infty}\int hs_n\,d\mu = \int hf\,d\mu,$$
the first equality holding because of continuity of $\Lambda$, the second following from 20.3.9 and the third holding by the dominated convergence theorem.
This is a very nice formula but it still has not been shown that $h \in L^q(\Omega)$.
Let $E_n = \{x : |h(x)| \le n\}$. Thus $|hX_{E_n}| \le n$. Then
$$= \|hX_{E_n}\|_q^{q/p}$$
Therefore, since $q - \frac{q}{p} = 1$, it follows that
Now that $h$ has been shown to be in $L^q(\Omega)$, it follows from 20.3.9 and the density of the simple functions, Theorem 12.4.1 on Page 290, that
$$\Lambda f = \int hf\,d\mu$$
a.e. and so $\|f\|_\infty + \|g\|_\infty$ serves as one of the constants, $M$ in the definition of $\|f + g\|_\infty$. Therefore,
$$\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty.$$
Next let $c$ be a number. Then $|cf(x)| = |c||f(x)| \le |c|\,\|f\|_\infty$ and so $\|cf\|_\infty \le |c|\,\|f\|_\infty$. Therefore since $c$ is arbitrary, $\|f\|_\infty = \|c(1/c)f\|_\infty \le \left|\frac{1}{c}\right|\|cf\|_\infty$ which implies $|c|\,\|f\|_\infty \le \|cf\|_\infty$. Thus $\|\cdot\|_\infty$ is a norm as claimed.
To verify completeness, let $\{f_n\}$ be a Cauchy sequence in $L^\infty(\Omega)$ and use the above claim to get the existence of a set of measure zero, $E_{nm}$ such that for all $x \notin E_{nm}$,
$$|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty$$
Let $E = \cup_{n,m}E_{nm}$. Thus $\mu(E) = 0$ and for each $x \notin E$, $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence in $\mathbb{C}$. Let
$$f(x) = \begin{cases} 0 & \text{if } x \in E \\ \lim_{n\to\infty}f_n(x) & \text{if } x \notin E \end{cases} = \lim_{n\to\infty}X_{E^C}(x)f_n(x).$$
and $F = \cup_{n=1}^{\infty}F_n$, it follows $\mu(F) = 0$ and that for $x \notin F \cup E$,
$$\|f_m - f_n\|_\infty < \epsilon.$$
Then, if $x \notin E$,
Hence $\|f - f_n\|_\infty < \epsilon$ for all $n$ large enough. This proves the theorem.
The next theorem is the Riesz representation theorem for $\left(L^1(\Omega)\right)'$.
$$E = \{x : |h(x)| \ge \|\Lambda\| + \epsilon\}.$$
Let $|k| = 1$ and $hk = |h|$. Since the measure space is finite, $k \in L^1(\Omega)$. As in Theorem 20.3.1 let $\{s_n\}$ be a sequence of simple functions converging to $k$ in $L^1(\Omega)$, and pointwise. It follows from the construction in Theorem 7.5.6 on Page 175 that it can be assumed $|s_n| \le 1$. Therefore
$$\Lambda(kX_E) = \lim_{n\to\infty}\Lambda(s_nX_E) = \lim_{n\to\infty}\int_E hs_n\,d\mu = \int_E hk\,d\mu$$
where the last equality holds by the Dominated Convergence theorem. Therefore,
$$\|\Lambda\|\mu(E) \ge |\Lambda(kX_E)| = \left|\int_\Omega hkX_E\,d\mu\right| = \int_E |h|\,d\mu \ge (\|\Lambda\| + \epsilon)\mu(E).$$
It follows that $\mu(E) = 0$. Since $\epsilon > 0$ was arbitrary, $\|\Lambda\| \ge \|h\|_\infty$. Since $h \in L^\infty(\Omega)$, the density of the simple functions in $L^1(\Omega)$ and 20.3.11 imply
$$\Lambda f = \int_\Omega hf\,d\mu,\quad \|\Lambda\| \ge \|h\|_\infty. \quad (20.3.12)$$
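This proves the existence part of the theorem. To verify uniqueness, suppose $h_1$ and $h_2$ both represent $\Lambda$ and let $f \in L^1(\Omega)$ be such that $|f| \le 1$ and $f(h_1 - h_2) = |h_1 - h_2|$. Then
$$0 = \Lambda f - \Lambda f = \int (h_1 - h_2)f\,d\mu = \int |h_1 - h_2|\,d\mu.$$
Thus $h_1 = h_2$. Finally,
$$\|\Lambda\| = \sup\left\{\left|\int hf\,d\mu\right| : \|f\|_1 \le 1\right\} \le \|h\|_\infty \le \|\Lambda\|$$
by 20.3.12.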
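The equality of the operator norm of $\Lambda$ with the norm of its representing function can be seen concretely on a finite measure space with finitely many points of positive mass. The sketch below is an illustration under that assumption, with hypothetical names and data: it builds a function $f$ of norm one in $L^p$ at which $\Lambda f = \int hf\,d\mu$ equals $\|h\|_q$, taking $q = \infty$ when $p = 1$.

```python
def lp_norm(f, w, p):
    """Norm of f in L^p for point weights w; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(x) for x, wi in zip(f, w) if wi > 0)
    return sum(abs(x) ** p * wi for x, wi in zip(f, w)) ** (1.0 / p)

def functional(h, w, f):
    """Lambda(f) = integral of h * f with respect to the weights w."""
    return sum(hi * fi * wi for hi, fi, wi in zip(h, f, w))

def norming_function(h, w, p):
    """A function f with L^p norm 1 and Lambda(f) = ||h||_q.

    Assumes h is not identically zero and every weight is positive.
    """
    if p == 1:
        # concentrate f on a point where |h| is largest
        j = max(range(len(h)), key=lambda i: abs(h[i]))
        f = [0j] * len(h)
        f[j] = h[j].conjugate() / (abs(h[j]) * w[j])
        return f
    q = p / (p - 1.0)
    scale = lp_norm(h, w, q) ** (q - 1.0)
    return [abs(hi) ** (q - 2.0) * hi.conjugate() / scale if hi != 0 else 0j
            for hi in h]

# Hypothetical data: three points with weights w and a complex function h.
h = [3 + 4j, -1j, 2]
w = [0.5, 1.0, 2.0]
```

For $p > 1$ the choice $f = |h|^{q-2}\overline{h}/\|h\|_q^{q-1}$ is the usual extremal function for Holder's inequality; for $p = 1$ the supremum is attained here only because the space is discrete.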
Next these results are extended to the $\sigma$ finite case.
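Lemma 20.3.5 Let $(\Omega, \mathcal{S}, \mu)$ be a measure space and suppose there exists a measurable function, $r$ such that $r(x) > 0$ for all $x$, there exists $M$ such that $|r(x)| < M$ for all $x$, and $\int r\,d\mu < \infty$. Then for
$$\Lambda \in (L^p(\Omega, \mu))',\ p \ge 1,$$
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Thus $\tilde{\mu}$ is a finite measure on $\mathcal{S}$. For $\Lambda \in (L^p(\mu))'$, define
$$\tilde{\Lambda} \in (L^p(\tilde{\mu}))'$$
by
$$\tilde{\Lambda}(g) \equiv \Lambda\left(r^{1/p}g\right)$$
This really is in $(L^p(\tilde{\mu}))'$ because
$$\left|\tilde{\Lambda}(g)\right| \equiv \left|\Lambda\left(r^{1/p}g\right)\right| \le \|\Lambda\|\left(\int\left|r^{1/p}g\right|^p d\mu\right)^{1/p} = \|\Lambda\|\,\|g\|_{L^p(\tilde{\mu})}$$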
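Therefore, by Theorems 20.3.4 and 20.3.1 there exists a unique $h \in L^q(\tilde{\mu})$ which represents $\tilde{\Lambda}$. Here $q = \infty$ if $p = 1$ and satisfies $1/q + 1/p = 1$ otherwise. Thus for $g \in L^p(\tilde{\mu})$,
$$\Lambda\left(r^{1/p}g\right) = \tilde{\Lambda}(g) = \int hgr\,d\mu = \int\left(r^{1/q}h\right)\left(r^{1/p}g\right)d\mu$$
For $f \in L^p(\mu)$, it follows $f = r^{1/p}\left(r^{-1/p}f\right) = r^{1/p}g$ and $r^{-1/p}f \in L^p(\tilde{\mu})$. Thus from the above,
$$\Lambda(f) = \Lambda\left(r^{1/p}\left(r^{-1/p}f\right)\right) = \int\left(r^{1/q}h\right)\left(r^{1/p}r^{-1/p}f\right)d\mu = \int\left(r^{1/q}h\right)f\,d\mu$$
Since $h \in L^q(\tilde{\mu})$, it follows $r^{1/q}h \in L^q(\mu)$. This is true even in the case that $p = 1$ so $q = \infty$ because $r$ is bounded. It follows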
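$$\left\|r^{1/q}h\right\|^q_{L^q(\mu)} = \int\left|r^{1/q}h\right|^{q-2}\overline{\left(r^{1/q}h\right)}\left(r^{1/q}h\right)d\mu = \Lambda\left(\left|r^{1/q}h\right|^{q-2}\overline{\left(r^{1/q}h\right)}\right)$$
$$\le \|\Lambda\|\left(\int\left(\left|r^{1/q}h\right|^{q-1}\right)^p d\mu\right)^{1/p} = \|\Lambda\|\left\|r^{1/q}h\right\|^{q/p}_{L^q(\mu)}$$
and so
$$\left\|r^{1/q}h\right\|_{L^q(\mu)} \le \|\Lambda\|.$$
Now
$$\|\Lambda\| \equiv \sup_{\|f\|_{L^p(\mu)} \le 1}\left|\int r^{1/q}hf\,d\mu\right| \le \left\|r^{1/q}h\right\|_{L^q(\mu)} \le \|\Lambda\|$$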
$$\Lambda \in (L^p(\Omega, \mu))',\ p \ge 1.$$
$$\frac{1}{p} + \frac{1}{q} = 1.$$
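$$1 < \mu(\Omega_n) < \infty,\quad \cup_{n=1}^{\infty}\Omega_n = \Omega.$$
Define
$$r(x) = \sum_{n=1}^{\infty}\frac{1}{n^2}X_{\Omega_n}(x)\,\mu(\Omega_n)^{-1},\quad \tilde{\mu}(E) = \int_E r\,d\mu.$$
Thus
$$\int_\Omega r\,d\mu = \tilde{\mu}(\Omega) = \sum_{n=1}^{\infty}\frac{1}{n^2} < \infty$$
so $\tilde{\mu}$ is a finite measure. The above lemma gives the existence part of the conclusion of the theorem. Uniqueness is done as before. This proves the theorem.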
With the Riesz representation theorem, it is easy to show that $L^p(\Omega)$, $p > 1$ is a reflexive Banach space. Recall Definition 18.2.14 on Page 457 for the definition.
for all g Lr (). From Theorem 20.3.6 r is one to one, onto, continuous and
linear. By the open map theorem, 1 r is also one to one, onto, and continuous
( r equals the representor of ). Thus r is also one to one, onto, and continuous
by Corollary 18.2.11. Now observe that J = p 1 q
q . To see this, let z (L ) ,
p
y (L ) ,
p 1
q ( q z )(y ) = ( p z )(y )
= z ( p y )
= ( q z )( p y )d,
J( q z )(y ) = y ( q z )
= ( p y )( q z )d.
Therefore p 1 q p
q = J on q (L ) = L . But the two maps are onto and so J is
also onto.
20.4. THE DUAL SPACE OF L () 537
$$|\nu|(\Omega) + |\lambda|(\Omega) = \|\nu\| + \|\lambda\|.$$
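Suppose now that $\{\nu_n\}$ is a Cauchy sequence. For each $E \in \mathcal{F}$,
$$|\nu_n(E) - \nu_m(E)| \le \|\nu_n - \nu_m\|$$
and so the sequence of complex numbers $\nu_n(E)$ converges. That to which it converges is called $\nu(E)$. Then it is obvious that $\nu(E)$ is finitely additive. Why is $|\nu|$ finite? Since $\|\cdot\|$ is a norm, it follows that there exists a constant $C$ such that for all $n$,
$$|\nu_n|(\Omega) < C$$
Let $\pi(\Omega)$ be any partition. Then
$$\sum_{A \in \pi(\Omega)}|\nu(A)| = \lim_{n\to\infty}\sum_{A \in \pi(\Omega)}|\nu_n(A)| \le C.$$
Hence $\nu \in BV(\Omega)$. Let $\epsilon > 0$ be given and let $N$ be such that if $n, m > N$, then
$$\|\nu_n - \nu_m\| < \epsilon/2.$$
Pick any such $n$. Then choose $\pi(\Omega)$ such that
$$|\nu - \nu_n|(\Omega) - \epsilon/2 < \sum_{A \in \pi(\Omega)}|\nu(A) - \nu_n(A)|$$
$$= \lim_{m\to\infty}\sum_{A \in \pi(\Omega)}|\nu_m(A) - \nu_n(A)| \le \liminf_{m\to\infty}|\nu_n - \nu_m|(\Omega) \le \epsilon/2$$
It follows that
$$\lim_{n\to\infty}\|\nu - \nu_n\| = 0.$$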
Lemma 20.4.4 The above definition of the integral with respect to a finitely additive measure in $BV(\Omega; \mu)$ is well defined.
Proof: First consider the claim about the integral being well defined on the simple functions. This is clearly true if it is required that the $c_k$ are distinct and the $E_k$ also disjoint having union equal to $\Omega$. Thus define the integral of a simple function in this manner. First write the simple function as
$$\sum_{k=1}^{n}c_kX_{E_k}$$
where the $c_k$ are the values of the simple function. Then use the above formula to define the integral. Next suppose the $E_k$ are disjoint but the $c_k$ are not necessarily distinct. Let the distinct values of the $c_k$ be $a_1, \cdots, a_m$
$$\int\sum_k c_kX_{E_k}\,d\nu = \int\sum_j a_j\sum_{i:c_i = a_j}X_{E_i}\,d\nu = \sum_j a_j\,\nu\left(\cup_{i:c_i = a_j}E_i\right)$$
$$= \sum_j a_j\sum_{i:c_i = a_j}\nu(E_i) = \sum_k c_k\nu(E_k)$$
and so the same formula for the integral of a simple function is obtained in this case also. Now consider two simple functions
$$s = \sum_{k=1}^{n}a_kX_{E_k},\quad t = \sum_{j=1}^{m}b_jX_{F_j}$$
where the $a_k$ and $b_j$ are the distinct values of the simple functions. Then from what was just shown,
$$\int(s + t)\,d\nu = \int\left(\sum_{k=1}^{n}\sum_{j=1}^{m}a_kX_{E_k \cap F_j} + \sum_{j=1}^{m}\sum_{k=1}^{n}b_jX_{E_k \cap F_j}\right)d\nu$$
$$= \int\sum_{j,k}\left(a_kX_{E_k \cap F_j} + b_jX_{E_k \cap F_j}\right)d\nu$$
$$= \sum_{j,k}(a_k + b_j)\,\nu(E_k \cap F_j)$$
$$= \sum_{k=1}^{n}\sum_{j=1}^{m}a_k\nu(E_k \cap F_j) + \sum_{j=1}^{m}\sum_{k=1}^{n}b_j\nu(E_k \cap F_j)$$
$$= \sum_{k=1}^{n}a_k\nu(E_k) + \sum_{j=1}^{m}b_j\nu(F_j)$$
$$= \int s\,d\nu + \int t\,d\nu$$
Thus the integral is linear on simple functions so, in particular, the formula given in the above definition is well defined regardless.
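The independence of the integral from the way a simple function is written can be checked mechanically on a finite set, where every finitely additive set function is given by point masses. This is a sketch under that assumption, with hypothetical names; a simple function is passed as a list of pairs $(c_k, E_k)$ whose sets may overlap.

```python
from fractions import Fraction

def nu(point_mass, E):
    """Finitely additive set function given by (possibly negative) point masses."""
    return sum(point_mass[w] for w in E)

def integral(simple, point_mass):
    """Integral of the sum of c_k X_{E_k}, passed as a list of (c_k, E_k) pairs."""
    return sum(c * nu(point_mass, E) for c, E in simple)

def canonical(simple, omega):
    """The same function rewritten with distinct values on disjoint sets covering omega."""
    level_sets = {}
    for w in omega:
        value = sum(c for c, E in simple if w in E)  # value of the function at w
        level_sets.setdefault(value, set()).add(w)
    return list(level_sets.items())

omega = {1, 2, 3, 4}
mass = {1: Fraction(1, 2), 2: Fraction(-3), 3: Fraction(2), 4: Fraction(5)}
s = [(2, {1, 2}), (3, {2, 3})]   # overlapping sets
t = [(-1, {3, 4}), (4, {1})]
```

Here `canonical(s + t, omega)` is the representation of $s + t$ with distinct values on disjoint sets, so comparing its integral with `integral(s, mass) + integral(t, mass)` is exactly the linearity statement proved above.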
So what about the definition for $f \in L^\infty(\Omega; \mu)$? Since $f \in L^\infty$, there is a set of $\mu$ measure zero $N$ such that on $N^C$ there exists a sequence of simple functions which converges uniformly to $f$ on $N^C$. Consider $s_n$ and $s_m$. As in the above, they can be written as
$$\sum_{k=1}^{p}c_k^nX_{E_k},\quad \sum_{k=1}^{p}c_k^mX_{E_k}$$
respectively, where the $E_k$ are disjoint having union equal to $\Omega$. Then by uniform convergence, if $m, n$ are sufficiently large, $|c_k^n - c_k^m| < \epsilon$ or else the corresponding $E_k$ is contained in $N$, a set of $\nu$ measure 0 thanks to $\nu \ll \mu$. Hence
$$\left|\int s_n\,d\nu - \int s_m\,d\nu\right| = \left|\sum_{k=1}^{p}(c_k^n - c_k^m)\,\nu(E_k)\right|$$
$$\le \sum_{k=1}^{p}|c_k^n - c_k^m||\nu(E_k)| \le \epsilon\|\nu\|$$
and so the integrals of these simple functions converge. Similar reasoning shows that the definition is not dependent on the choice of approximating sequence.
Note also that for $s$ simple,
$$\left|\int s\,d\nu\right| \le \|s\|_{L^\infty}|\nu|(\Omega) = \|s\|_{L^\infty}\|\nu\|$$
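Thus each $T_\nu$ is in $(L^\infty(\Omega; \mu))'$.
Here is the representation theorem, due to Kantorovitch, for the dual of $L^\infty(\Omega; \mu)$.
Theorem 20.4.6 Let $\theta : BV(\Omega; \mu) \to (L^\infty(\Omega; \mu))'$ be given by $\theta(\nu) \equiv T_\nu$. Then $\theta$ is one to one, onto and preserves norms.
Proof: It was shown in the above lemma that $\theta$ maps into $(L^\infty(\Omega; \mu))'$. It is obvious that $\theta$ is linear. Why does it preserve norms? From the above lemma,
where $\text{sgn}(\nu(A))$ is defined to be a complex number of modulus 1 such that $\text{sgn}(\nu(A))\,\nu(A) = |\nu(A)|$ and
$$f(\omega) = \sum_{A \in \pi(\Omega)}\text{sgn}(\nu(A))X_A(\omega).$$
$$= |T_\nu(f)| = |\theta(\nu)(f)| \le \|\theta(\nu)\| \le \|\nu\|$$
$$\int s\,d\nu = \sum_{k=1}^{n}c_k\nu(E_k) = \sum_{k=1}^{n}c_k\Lambda(X_{E_k}) = \Lambda\left(\sum_{k=1}^{n}c_kX_{E_k}\right) = \Lambda(s)$$
and so $\theta$ is onto.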
Observe that Clarkson's inequalities imply $L^p$ is uniformly convex for all $p > 1$. Uniformly convex spaces have a very nice property which is described in the following lemma. Roughly, this property is that any element of the dual space achieves its norm at some point of the closed unit ball.
Lemma 20.5.3 Let $X$ be uniformly convex and let $\Lambda \in X'$. Then there exists $x \in X$ such that
$$\|x\| = 1,\quad \Lambda(x) = \|\Lambda\|.$$
xn = |e
wn e xn |.
||||
(xn ) = | (xn ) |
2
and = 0.
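Claim $\left\|\frac{x_n + x_m}{2}\right\| \to 1$ as $n, m \to \infty$.
Proof of Claim: Let $n, m$ be large enough that $\Lambda(x_n), \Lambda(x_m) \ge \|\Lambda\| - \frac{\epsilon}{2}$ where $0 < \epsilon$. Then $\|x_n + x_m\| \neq 0$ because if it equals 0, then $x_n = -x_m$ so $\Lambda(x_n) = -\Lambda(x_m)$ but both $\Lambda(x_n)$ and $\Lambda(x_m)$ are positive. Therefore consider $\frac{x_n + x_m}{\|x_n + x_m\|}$, a vector of norm 1. Thus,
$$\|\Lambda\| \ge \left|\Lambda\left(\frac{x_n + x_m}{\|x_n + x_m\|}\right)\right| \ge \frac{2\|\Lambda\| - \epsilon}{\|x_n + x_m\|}.$$
Hence
$$\|x_n + x_m\|\,\|\Lambda\| \ge 2\|\Lambda\| - \epsilon.$$
Since $\epsilon > 0$ is arbitrary, $\lim_{n,m\to\infty}\|x_n + x_m\| = 2$. This proves the claim.
By uniform convexity, $\{x_n\}$ is Cauchy and $x_n \to x$, $\|x\| = 1$. Thus $\Lambda(x) = \lim_{n\to\infty}\Lambda(x_n) = \|\Lambda\|$.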
The proof of the Riesz representation theorem will be based on the following lemma which says that if you can show a directional derivative exists, then it can be used to represent a functional.
Lemma 20.5.4 (McShane) Let $X$ be a complex normed linear space and let $\Lambda \in X'$. Suppose there exists $x \in X$, $\|x\| = 1$ with $\Lambda x = \|\Lambda\| \neq 0$. Let $y \in X$ and let $\psi_y(t) = \|x + ty\|$ for $t \in \mathbb{R}$. Suppose $\psi_y'(0)$ exists for each $y \in X$. Then for all $y \in X$,
$$\psi_y'(0) + i\psi_{-iy}'(0) = \|\Lambda\|^{-1}\Lambda(y).$$
20.5. NON FINITE CASE 543
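Proof: Suppose first that $\|\Lambda\| = 1$. Then since $\Lambda(x) = 1$, $\Lambda(y - \Lambda(y)x) = 0$ and so
$$\Lambda(x + t(y - \Lambda(y)x)) = \Lambda(x) = 1 = \|\Lambda\|.$$
Therefore, $\|x + t(y - \Lambda(y)x)\| \ge 1$ since otherwise $\|x + t(y - \Lambda(y)x)\| = r < 1$ and so
$$\Lambda\left((x + t(y - \Lambda(y)x))\frac{1}{r}\right) = \frac{1}{r}\Lambda(x) = \frac{1}{r}$$
which would imply that $\|\Lambda\| > 1$.
Also for small $t$, $|\Lambda(y)t| < 1$, and so
$$= \psi_y'(0) + i\psi_{-iy}'(0).$$
This proves the lemma when $\|\Lambda\| = 1$. For arbitrary $\Lambda \neq 0$, let $\Lambda(x) = \|\Lambda\|$, $\|x\| = 1$. Then from above, if $\Lambda_1(y) \equiv \|\Lambda\|^{-1}\Lambda(y)$, $\|\Lambda_1\| = 1$ and so from what was just shown,
$$\Lambda_1(y) = \frac{\Lambda(y)}{\|\Lambda\|} = \psi_y'(0) + i\psi_{-iy}'(0)$$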
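Now here are some short observations. For $t \in \mathbb{R}$, $p > 1$, and $x, y \in \mathbb{C}$, $x \neq 0$
$$\lim_{t\to 0}\frac{|x + ty|^p - |x|^p}{t} = p|x|^{p-2}(\operatorname{Re}x\operatorname{Re}y + \operatorname{Im}x\operatorname{Im}y)$$
$$= p|x|^{p-2}\operatorname{Re}(\overline{x}y) \quad (20.5.16)$$
Also from convexity of $f(r) = r^p$, for $|t| < 1$,
$$\left||x + ty|^p - |x|^p\right| \le \left|(|x| + |t||y|)^p - |x|^p\right|$$
$$= \left|(1 + |t|)^p\left[\frac{|x| + |t||y|}{1 + |t|}\right]^p - |x|^p\right|$$
$$\le \left|(1 + |t|)^p\left(\frac{|x|^p}{1 + |t|} + \frac{|t||y|^p}{1 + |t|}\right) - |x|^p\right|$$
$$\le \left|(1 + |t|)^{p-1}(|x|^p + |t||y|^p) - |x|^p\right|$$
$$\le \left|\left((1 + |t|)^{p-1} - 1\right)|x|^p\right| + 2^{p-1}|t||y|^p$$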
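Formula 20.5.16 can be sanity checked numerically by comparing the difference quotient for a small real $t$ with the closed form $p|x|^{p-2}\operatorname{Re}(\overline{x}y)$. The following is only a numerical illustration with hypothetical function names; it proves nothing and assumes $x \neq 0$.

```python
def directional_derivative(x, y, p):
    """Right side of 20.5.16: p |x|^(p-2) Re(conj(x) y), for x != 0."""
    return p * abs(x) ** (p - 2) * (x.conjugate() * y).real

def difference_quotient(x, y, p, t):
    """(|x + t y|^p - |x|^p) / t for a real t != 0."""
    return (abs(x + t * y) ** p - abs(x) ** p) / t

# Hypothetical sample values.
x = 1 + 2j
y = -0.5 + 3j
```

With these values $\operatorname{Re}(\overline{x}y) = \operatorname{Re}x\operatorname{Re}y + \operatorname{Im}x\operatorname{Im}y = 5.5$, and the quotient approaches the closed form from either side as $t \to 0$.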
The above lemma and uniform convexity of $L^p$ can be used to prove a general version of the Riesz representation theorem next. This version makes no assumption that the measure space is $\sigma$ finite. Let $p > 1$ and let $\eta : L^q \to (L^p)'$ be defined by
$$\eta(g)(f) = \int_\Omega gf\,d\mu. \quad (20.5.18)$$
Theorem 20.5.5 (Riesz representation theorem $p > 1$) The map $\eta$ is 1-1, onto, continuous, and
$$\|\eta g\| = \|g\|,\quad \|\eta\| = 1.$$
Proof: Obviously $\eta$ is linear. Suppose $\eta g = 0$. Then $0 = \int gf\,d\mu$ for all $f \in L^p$. Let $f = |g|^{q-2}\overline{g}$. Then $f \in L^p$ and so $0 = \int|g|^q\,d\mu$. Hence $g = 0$ and $\eta$ is 1-1. That $\eta g \in (L^p)'$ is obvious from the Holder inequality. In fact,
Note 1
p 1 = 1q . Therefore,
p
if (0) = ||g|| q |g(x)|p2 Re(ig(x)f(x))d.
Definition 20.6.1 $f \in C_0(X)$ means that for every $\epsilon > 0$ there exists a compact set $K$ such that $|f(x)| < \epsilon$ whenever $x \notin K$. Recall the norm on this space is
If
| (f )| C ||f ||
then
|f | C ||f ||
R (f ) = f + f
(cf ) = cf ,
if c 0 while
(cf )+ = c(f ),
if c < 0 and
(cf ) = (c)f +,
20.6. THE DUAL SPACE OF C0 (X) 547
Proof: The first two assertions are easy to see so consider the third.
For fj C0+ (X) , there exists gi C0 (X) such that |gi | fi and
| 1 g1 + 2 g2 | |g1 | + |g2 | f1 + f2
Therefore,
Also, if f 0,
f = R f = (f ) 0.
Therefore, $\Lambda$ is a positive linear functional on $C_0(X)$. In particular, it is a positive linear functional on $C_c(X)$. By Theorem 16.4.4 on Page 403, there exists a unique measure $\mu$ such that
$$\Lambda f = \int_X f\,d\mu$$
for all $f \in C_c(X)$. This measure is inner regular on all open sets and on all measurable sets having finite measure. In fact, it is actually a finite measure.
Lemma 20.6.4 Let $L \in C_0(X)'$ as above. Then letting $\mu$ be the Radon measure just described, it follows $\mu$ is finite and
Proof: First of all, why is $\|\Lambda\| = \|L\|$? From 20.6.20 it follows $\|\Lambda\| \le \|L\|$. But also
$$|Lg| \le \lambda(|g|) = \Lambda(|g|) \le \|\Lambda\|\,\|g\|$$
and so by definition of the operator norm, $\|L\| \le \|\Lambda\|$.
Now $X$ is an open set and so
$$\mu(X) = \sup\{\Lambda f : f \prec X\}$$
and so
$$\mu(X) \le \|L\|.$$
Now since Cc (X) is dense in C0 (X) , there exists f Cc (X) such that ||f || 1
and
|f | + > |||| = ||L||
Then also f X and so
Since $\epsilon$ is arbitrary, this shows $\|L\| = \mu(X)$. This proves the lemma.
What follows is the Riesz representation theorem for $C_0(X)'$.
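Theorem 20.6.5 Let $L \in (C_0(X))'$ for $X$ a locally compact Hausdorff space. Then there exists a finite Radon measure $\mu$ and a function $\sigma \in L^\infty(X, \mu)$ such that for all $f \in C_0(X)$,
$$L(f) = \int_X f\sigma\,d\mu.$$
Furthermore,
$$\mu(X) = \|L\|,\quad |\sigma| = 1 \text{ a.e.}$$
and if
$$\nu(E) \equiv \int_E \sigma\,d\mu$$
then $\mu = |\nu|$.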
Proof: From the above there exists a unique Radon measure $\mu$ such that for all $f \in C_c(X)$,
$$\Lambda f = \int_X f\,d\mu$$
Since $\mu$ is both inner and outer regular thanks to it being finite, $C_c(X)$ is dense in $L^1(X, \mu)$. (See Theorem 12.5.3 for more than is needed.) Therefore $L$ extends uniquely to an element of $(L^1(X, \mu))'$, $\tilde{L}$. By the Riesz representation theorem for $L^1$ for finite measure spaces, there exists a unique $\sigma \in L^\infty(X, \mu)$ such that for all $f \in L^1(X, \mu)$,
$$\tilde{L}f = \int_X f\sigma\,d\mu$$
which shows from Lemma 20.2.7 that || 1 a.e. But also, choosing f1 appropri-
ately, ||f1 || 1, and letting Lf1 = |Lf1 | ,
|| d +
X
where $\sigma\,d|\nu|$ is the polar decomposition of the complex measure $\nu$. Then with this convention, the above representation is
$$L(f) = \int_X f\,d\nu,\quad |\nu|(X) = \|L\|.$$
Also note that at most one $\nu$ can represent $L$. If there were two of them $\nu_i$, $i = 1, 2$, then $\nu_1 - \nu_2$ would represent 0 and so $|\nu_1 - \nu_2|(X) = 0$. Hence $\nu_1 = \nu_2$, at least on every Borel set.
Also $C_0(X)$ will denote the space of continuous functions, $f$, defined on $X$ such that in the topology of $\tilde{X}$, $\lim_{x\to\infty}f(x) = 0$. For this space of functions, $\|f\|_0 \equiv \sup\{|f(x)| : x \in X\}$ is a norm which makes this into a Banach space. Then the generalization is the following corollary.
Corollary 20.7.1 Let $L \in (C_0(X))'$ where $X$ is a locally compact Hausdorff space. Then there exists $\sigma \in L^\infty(X, \mu)$ for $\mu$ a finite Radon measure such that for all $f \in C_0(X)$,
$$L(f) = \int_X f\sigma\,d\mu.$$
Proof: Let
$$\tilde{D} \equiv \left\{f \in C\left(\tilde{X}\right) : f(\infty) = 0\right\}.$$
Thus $\tilde{D}$ is a closed subspace of the Banach space $C\left(\tilde{X}\right)$. Let $\theta : C_0(X) \to \tilde{D}$ be defined by
$$\theta f(x) = \begin{cases} f(x) & \text{if } x \in X, \\ 0 & \text{if } x = \infty. \end{cases}$$
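$$L_1g = \int_{\tilde{X}} g\sigma_1\,d\mu_1.$$
$$\mathcal{S} \equiv \{E \setminus \{\infty\} : E \in \mathcal{S}_1\}$$
The following lemma says that the difference of regular complex measures is also regular.
$$|\lambda_1|(V \setminus K) + |\lambda_2|(V \setminus K) < \epsilon.$$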
20.9 Exercises
1. Suppose $\mu$ is a vector measure having values in $\mathbb{R}^n$ or $\mathbb{C}^n$. Can you show that $|\mu|$ must be finite? Hint: You might define for each $e_i$, one of the standard basis vectors, the real or complex measure, $\mu_{e_i}$ given by $\mu_{e_i}(E) \equiv e_i \cdot \mu(E)$. Why would this approach not yield anything for an infinite dimensional normed linear space in place of $\mathbb{R}^n$?
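2. The Riesz representation theorem of the $L^p$ spaces can be used to prove a very interesting inequality. Let $r, p, q \in (1, \infty)$ satisfy
$$\frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1.$$
Then
$$\frac{1}{q} = 1 + \frac{1}{r} - \frac{1}{p} > \frac{1}{r}$$
and so $r > q$. Let $\theta \in (0, 1)$ be chosen so that $\theta r = q$. Then also, using $1/p + 1/p' = 1$,
$$\frac{1}{r} = \left(1 - \frac{1}{p'}\right) + \frac{1}{q} - 1 = \frac{1}{q} - \frac{1}{p'}$$
and so
$$\frac{\theta}{q} = \frac{1}{q} - \frac{1}{p'}$$
which implies $p'(1 - \theta) = q$. Now let $f \in L^p(\mathbb{R}^n)$, $g \in L^q(\mathbb{R}^n)$, $f, g \ge 0$. Justify the steps in the following argument using what was just shown that $\theta r = q$ and $p'(1 - \theta) = q$. Let
$$h \in L^{r'}(\mathbb{R}^n).\quad \left(\frac{1}{r} + \frac{1}{r'} = 1\right)$$
$$\left|\int f * g(x)h(x)\,dx\right| = \left|\int\int f(y)g(x - y)h(x)\,dx\,dy\right|$$
$$\le \int\int|f(y)||g(x - y)|^{\theta}|g(x - y)|^{1-\theta}|h(x)|\,dy\,dx$$
$$\le \int\left(\int\left(|g(x - y)|^{1-\theta}|h(x)|\right)^{r'}dx\right)^{1/r'}\left(\int\left(|f(y)||g(x - y)|^{\theta}\right)^{r}dx\right)^{1/r}dy$$
$$\le \left[\int\left(\int\left(|g(x - y)|^{1-\theta}|h(x)|\right)^{r'}dx\right)^{p'/r'}dy\right]^{1/p'}\left[\int\left(\int\left(|f(y)||g(x - y)|^{\theta}\right)^{r}dx\right)^{p/r}dy\right]^{1/p}$$
$$\le \left[\int\left(\int\left(|g(x - y)|^{1-\theta}|h(x)|\right)^{p'}dy\right)^{r'/p'}dx\right]^{1/r'}\left[\int|f(y)|^p\left(\int|g(x - y)|^{\theta r}dx\right)^{p/r}dy\right]^{1/p}$$
$$= \left[\int|h(x)|^{r'}\left(\int|g(x - y)|^{(1-\theta)p'}dy\right)^{r'/p'}dx\right]^{1/r'}\|g\|_q^{q/r}\|f\|_p$$
$$= \|g\|_q^{q/r}\|g\|_q^{q/p'}\|f\|_p\|h\|_{r'} = \|g\|_q\|f\|_p\|h\|_{r'}. \quad (20.9.23)$$
Young's inequality says that
$$\|f * g\|_r \le \|g\|_q\|f\|_p. \quad (20.9.24)$$
Therefore $\|f * g\|_r \le \|g\|_q\|f\|_p$. How does this inequality follow from the above computation? Does 20.9.23 continue to hold if $r, p, q$ are only assumed to be in $[1, \infty]$? Explain. Does 20.9.24 hold even if $r, p$, and $q$ are only assumed to lie in $[1, \infty]$?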
3. Suppose $(\Omega, \mu, \mathcal{S})$ is a finite measure space and that $\{f_n\}$ is a sequence of functions which converge weakly to 0 in $L^p(\Omega)$. This means that
$$\int_\Omega f_ng\,d\mu \to 0$$
for every $g \in L^{p'}(\Omega)$. Suppose also that $f_n(x) \to 0$ a.e. Show that then $f_n \to 0$ in $L^{p-\epsilon}(\Omega)$ for every $p > \epsilon > 0$.
4. Give an example of a sequence of functions in $L^\infty(-\pi, \pi)$ which converges weak $*$ to zero but which does not converge pointwise a.e. to zero. Convergence weak $*$ to 0 means that for every $g \in L^1(-\pi, \pi)$, $\int_{-\pi}^{\pi}g(t)f_n(t)\,dt \to 0$. Hint: First consider $g \in C_c^\infty(-\pi, \pi)$ and maybe try something like $f_n(t) = \sin(nt)$. Do integration by parts.
5. Let $\lambda$ be a real vector measure on the measure space $(\Omega, \mathcal{F})$. That is $\lambda$ has values in $\mathbb{R}$. The Hahn decomposition says there exist measurable sets $P, N$ such that
$$P \cup N = \Omega,\quad P \cap N = \emptyset,$$
and for each $F \subseteq P$, $\lambda(F) \ge 0$ and for each $F \subseteq N$, $\lambda(F) \le 0$. These sets $P, N$ are called the positive set and the negative set respectively. Show the existence of the Hahn decomposition. Also explain how this decomposition is unique in the sense that if $P', N'$ is another Hahn decomposition, then $(P' \setminus P) \cup (P \setminus P')$ has measure zero, a similar formula holding for $N, N'$. When you have the Hahn decomposition, as just described, you define $\lambda^+(E) \equiv \lambda(E \cap P)$, $\lambda^-(E) \equiv -\lambda(E \cap N)$. This is sometimes called the Hahn Jordan decomposition. Hint: This is pretty easy if you use the polar decomposition above.
6. The Hahn decomposition holds for measures which have values in $(-\infty, \infty]$. Let $\lambda$ be such a measure which is defined on a $\sigma$ algebra of sets $\mathcal{F}$. This is not a vector measure because the set on which it has values is not a vector space. Thus this case is not included in the above discussion. $N \in \mathcal{F}$ is called
tn sup { (E) : E Nn } 0.
+ (E) (E P ) , (E) (E N ) .
and from Problem 6 let $A_n \subseteq E_n$ be a negative set such that $\lambda(A_n) \le \lambda(E_n)$. Then $N_{n+1} \equiv N_n \cup A_n$. If $t_n$ does not converge to 0 explain why there exists a set having measure $-\infty$ which is not allowed. Thus $t_n \to 0$. Let $N = \cup_{n=1}^{\infty}N_n$ and explain why $P \equiv N^C$ must be positive due to $t_n \to 0$.
8. What if $\lambda$ has values in $[-\infty, \infty)$. Prove there exists a Hahn decomposition for $\lambda$ as in the above problem. Why do we not allow $\lambda$ to have values in $[-\infty, \infty]$? Hint: You might want to consider $-\lambda$.
9. Suppose $X$ is a Banach space and let $X'$ denote its dual space. A sequence $\{x_n^*\}_{n=1}^{\infty}$ in $X'$ is said to converge weak $*$ to $x^* \in X'$ if for every $x \in X$,
(E) = 1 if 0 E and 0 if 1
/ E.
Explain why this is well defined. Next explain why $\lambda_X$ can be considered a
Radon probability measure by completion. Explain why X G if
X () dX
Rn
11. Using the above problem, the characteristic function of this measure (random variable) is
$$\phi_X(y) \equiv \int_{\mathbb{R}^n}e^{ix \cdot y}\,d\lambda_X$$
Show this always exists for any such random variable and is continuous. Next show that for two random variables $X, Y$, $\lambda_X = \lambda_Y$ if and only if $\phi_X(y) = \phi_Y(y)$ for all $y$. In other words, show the distribution measures are the same if and only if the characteristic functions are the same. A lot more can be concluded by looking at characteristic functions of this sort. The important thing about these characteristic functions is that they always exist, unlike moment generating functions.
12. It was shown above that if $\Lambda \in X'$ where $X$ is a uniformly convex Banach space, then there exists $x \in X$, $\|x\| = 1$, and $\Lambda(x) = \|\Lambda\|$. Show that this $x$ must be unique. Hint: Recall that uniform convexity implies strict convexity.
Differentiation With Respect To General Radon Measures
[Figure: the balls $B_x$ and $B_y$ with centers $x, y$ and radii $r_x, r_y$, the ball $B_a$ with center $a$ and radius $r$, and the points $p_x$, $p_y$.]
The other case is that ||y a|| ||x a|| < 0. Then in this case 21.1.1 reduces
to
( )
||x y|| 1
r (||x a|| ||y a||)
||x a|| ||x a||
( )
||x y|| ||y a||
= r 1+
||x a|| ||x a||
r
(||x y|| ||x a|| + ||y a||)
||x a||
r r
(ry (r + rx ) + ry ) (ry r)
rx + r r +r
( x )
r r 1 1
(rx r) 1 rx rx = r
rx + r rx + rx +1
Finally, in the case of the balls Bi having centers at xi , let Pxi be the expression
a+r ||xxii a
a|| . Then (Pxi a) r
1
is on the unit sphere having center 0. Furthermore,
(Pxi a) r1 (Pyi a) r1 = r1 ||Pxi Pyi || r1 r 1 = 1 .
+1 +1
How many points on the unit sphere can be pairwise this far apart? This set is
compact and so there exists an 1
+1 net having L (n, ) points. Thus m cannot be
any larger than L (n, ).
The above lemma has to do with balls which are relatively large intersecting a
given ball. Next is a lemma which has to do with relatively small balls intersecting
a given ball.
Lemma 21.1.2 Let $B$ be a ball having radius $r$ and suppose $B$ has nonempty
intersection with the balls $B_1, \cdots, B_m$ having radii $r_1, \cdots, r_m$ respectively. Suppose
$\alpha, \gamma > 1$ and the $r_i$ are comparable with $r$ in the sense that $\frac{1}{\gamma} r \le r_i \le \alpha r$. Let $B_i'$
have the same center as $B_i$ with radius equal to $r_i' = \beta r_i$ for some $\beta < 1$. If the $B_i'$
are disjoint, then there exists a constant $M(n, \alpha, \beta, \gamma)$ such that $m \le M(n, \alpha, \beta, \gamma)$.
Letting $\alpha = 10$, $\beta = 1/3$, $\gamma = 4/3$, it follows that $m \le 60^n$.
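Proof: Let the volume of a ball of radius $r$ be given by $\alpha(n) r^n$ where $\alpha(n)$
depends on the norm used and on the dimension $n$ as indicated. The idea is to
enlarge $B$ till it swallows all the $B_i'$. Then, since they are disjoint and their radii
are not too small, there can't be too many of them.
This can be done for a single $B_i'$ by enlarging the radius of $B$ to $r + r_i + r_i'$.
[Figure: the balls $B_i'$, $B_i$, and $B$.]
Then to get all the $B_i$, you would just enlarge the radius of $B$ to $r + \alpha r + \beta \alpha r =
(1 + \alpha + \alpha\beta) r$. Then, using the inequality which makes $r_i$ comparable to $r$, it follows
that
$$\sum_{i=1}^m \alpha(n) \left(\frac{\beta}{\gamma} r\right)^n \le \sum_{i=1}^m \alpha(n) (\beta r_i)^n \le \alpha(n) (1 + \alpha + \alpha\beta)^n r^n$$
Therefore,
$$m \left(\frac{\beta}{\gamma}\right)^n \le (1 + \alpha + \alpha\beta)^n$$
and so $m \le (1 + \alpha + \alpha\beta)^n \left(\frac{\gamma}{\beta}\right)^n \equiv M(n, \alpha, \beta, \gamma)$.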
From now on, let $\alpha = 10$ and let $\beta = 1/3$ and $\gamma = 4/3$. Then
$$M(n, \alpha, \beta, \gamma) \le \left(\frac{172}{3}\right)^n \le 60^n$$
Thus $m \le 60^n$.
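The arithmetic behind the constant can be checked with exact fractions. This is only a sketch: the helper name `small_ball_bound_base` is made up for the illustration, and the formula coded is the base $(1 + \alpha + \alpha\beta)\gamma/\beta$ of the bound $m \le M(n, \alpha, \beta, \gamma)$ read off from the proof of Lemma 21.1.2.

```python
from fractions import Fraction

def small_ball_bound_base(alpha, beta, gamma):
    """Base c of the bound m <= c**n, where c = (1 + alpha + alpha*beta) * gamma / beta."""
    return (1 + alpha + alpha * beta) * gamma / beta

# Values fixed in the text: alpha = 10, beta = 1/3, gamma = 4/3.
base = small_ball_bound_base(Fraction(10), Fraction(1, 3), Fraction(4, 3))
```

With these values the base is 172/3, which is a little less than 60, so the estimate $m \le 60^n$ leaves some room.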
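Now here is the Besicovitch covering theorem.
Theorem 21.1.3 There exists a constant $N_n$, depending only on $n$ with the
following property. If $\mathcal{F}$ is any collection of nonempty balls in $\mathbb{R}^n$ with
$$\sup\{\operatorname{diam}(B) : B \in \mathcal{F}\} < D < \infty$$
and if $A$ is the set of centers of the balls in $\mathcal{F}$, then there exist subsets of $\mathcal{F}$, $\mathcal{H}_1, \cdots,
\mathcal{H}_{N_n}$, such that each $\mathcal{H}_i$ is a countable collection of disjoint balls from $\mathcal{F}$ (possibly
empty) and
$$A \subseteq \cup_{i=1}^{N_n} \cup \{B : B \in \mathcal{H}_i\}.$$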
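Lemma 21.1.4 In the situation of Theorem 21.1.3, suppose the set of centers $A$ is
bounded. Define a sequence of balls from $\mathcal{F}$, $\{B_j\}_{j=1}^J$ where $J \le \infty$ such that
$$r(B_1) \ge \frac{3}{4} \sup\{r(B) : B \in \mathcal{F}\} \qquad (21.1.2)$$
and if
$$A_m \equiv A \setminus (\cup_{i=1}^m B_i) \ne \emptyset, \qquad (21.1.3)$$
then $B_{m+1} \in \mathcal{F}$ is chosen with center in $A_m$ such that
$$r_{m+1} \equiv r(B_{m+1}) \ge \frac{3}{4} \sup\{r : B(a, r) \in \mathcal{F},\ a \in A_m\}. \qquad (21.1.4)$$
Then letting $B_j = B(a_j, r_j)$, this sequence satisfies
$$A \subseteq \cup_{i=1}^J B_i, \quad r(B_k) \le \frac{4}{3} r(B_j) \text{ for } j < k, \quad \{B(a_j, r_j/3)\}_{j=1}^J \text{ are disjoint.} \qquad (21.1.5)$$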
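The selection rule in Lemma 21.1.4 can be tried out on a finite family of balls in the plane. The following is a minimal sketch under stated assumptions: the balls are closed Euclidean balls, the family is finite and randomly generated with a fixed seed, and the largest available radius is always chosen, which is one admissible way to meet the 3/4 condition. The function name `greedy_ball_sequence` is hypothetical.

```python
import math
import random

def greedy_ball_sequence(balls):
    """Select balls as in Lemma 21.1.4 from a finite list of (center, radius) pairs.

    At each step the ball of largest radius whose center is not yet covered
    is chosen; this satisfies the 3/4-of-the-supremum requirement.
    """
    chosen = []
    uncovered = list(balls)  # balls whose centers are not yet covered
    while uncovered:
        best = max(uncovered, key=lambda b: b[1])
        chosen.append(best)
        center, radius = best
        # Keep only balls whose centers lie outside the closed ball just chosen.
        uncovered = [b for b in uncovered if math.dist(b[0], center) > radius]
    return chosen

rng = random.Random(0)
balls = [((rng.random(), rng.random()), 0.05 + 0.2 * rng.random()) for _ in range(200)]
chosen = greedy_ball_sequence(balls)
```

On such a family the three conclusions of 21.1.5 can be checked directly: every center is covered, later radii are at most 4/3 of earlier ones, and the balls shrunk by a factor of 3 are pairwise disjoint.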
Proof: Consider the second inequality. First note the sets $A_m$ form a decreasing
sequence. Thus, from the definition of $B_j$, for $j < k$,
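Proof: For each $x \in Z$, there exists a ball $B(x,r)$ with $\mu(B(x,r)) = 0$. Let $\mathcal{C}$
be the collection of these balls. Since $\mathbb{R}^n$ has a countable basis, a countable subset,
$\widetilde{\mathcal{C}}$, of $\mathcal{C}$ also covers $Z$. Let
$$\widetilde{\mathcal{C}} = \{B_i\}_{i=1}^\infty.$$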
Theorem 21.2.2 Let $\mu$ be a Radon measure and let $f \in L^1(\mathbb{R}^n, \mu)$. Then for
a.e. $x \notin Z$,
$$\lim_{r \to 0} \frac{1}{\mu(B(x, r))} \int_{B(x,r)} |f(y) - f(x)|\, d\mu(y) = 0$$
Proof: First consider the following claim which is a weak type estimate of the
same sort used when differentiating with respect to Lebesgue measure.
Claim 1: The following inequality holds for $N_n$ the constant of the Besicovitch
covering theorem.
$$\overline{\mu}([Mf > \varepsilon]) \le N_n \varepsilon^{-1} ||f||_1$$
Proof: First note $[Mf > \varepsilon] \cap Z = \emptyset$ and without loss of generality, you can
assume $\overline{\mu}([Mf > \varepsilon]) > 0$. Next, for each $x \in [Mf > \varepsilon]$ there exists a ball $B_x =
B(x, r_x)$ with $r_x \le 1$ and
$$\mu(B_x)^{-1} \int_{B(x,r_x)} |f|\, d\mu > \varepsilon.$$
Let $\mathcal{F}$ be this collection of balls so that $[Mf > \varepsilon]$ is the set of centers of balls of $\mathcal{F}$.
By the Besicovitch covering theorem,
$$[Mf > \varepsilon] \subseteq \cup_{i=1}^{N_n} \cup \{B : B \in \mathcal{G}_i\}$$
Nn
([M f > ]) ( {B : B Gi })
i=1
Nn
([M f > ])
< = ([M f > ]),
i=1
Nn
21.2. FUNDAMENTAL THEOREM OF CALCULUS FOR RADON MEASURES 565
and
$$\lim_{r \to 0} \frac{1}{\mu(B(x,r))} \int_{B(x,r)} g(y)\, d\mu(y) = g(x). \qquad (21.2.6)$$
21.2.6 follows from the above and the triangle inequality. This proves the claim.
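Now let $g \in C_c(\mathbb{R}^n)$ and $x \notin Z$. Then from the above observations about
continuous functions,
$$\overline{\mu}\left(\left[x \notin Z : \limsup_{r \to 0} \frac{1}{\mu(B(x, r))} \int_{B(x,r)} |f(y) - f(x)|\, d\mu(y) > \varepsilon\right]\right) \qquad (21.2.7)$$
$$\le \overline{\mu}\left(\left[x \notin Z : \limsup_{r \to 0} \frac{1}{\mu(B(x, r))} \int_{B(x,r)} |f(y) - g(y)|\, d\mu(y) > \frac{\varepsilon}{2}\right]\right)$$
$$+\ \overline{\mu}\left(\left[x \notin Z : |g(x) - f(x)| > \frac{\varepsilon}{2}\right]\right)$$
$$\le \overline{\mu}\left(\left[M(f - g) > \frac{\varepsilon}{2}\right]\right) + \overline{\mu}\left(\left[|f - g| > \frac{\varepsilon}{2}\right]\right) \qquad (21.2.8)$$
Now
$$\int_{[|f - g| > \varepsilon/2]} |f - g|\, d\mu \ge \frac{\varepsilon}{2}\, \mu\left(\left[|f - g| > \frac{\varepsilon}{2}\right]\right)$$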
Proof: If $f$ is replaced by $f \mathcal{X}_{B(0,k)}$ then the conclusion 21.2.9 holds for all
$x \notin F_k$ where $F_k$ is a set of $\mu$ measure 0. Letting $k = 1, 2, \cdots$, and $F \equiv \cup_{k=1}^\infty F_k$, it
follows that $F$ is a set of measure zero and for any $x \notin F$, and $k \in \{1, 2, \cdots\}$, 21.2.9
holds if $f$ is replaced by $f \mathcal{X}_{B(0,k)}$. Picking any such $x$, and letting $k > |x| + 1$, this
shows
$$\lim_{r \to 0} \frac{1}{\mu(B(x, r))} \int_{B(x,r)} |f(y) - f(x)|\, d\mu(y)$$
$$= \lim_{r \to 0} \frac{1}{\mu(B(x, r))} \int_{B(x,r)} \left|f \mathcal{X}_{B(0,k)}(y) - f \mathcal{X}_{B(0,k)}(x)\right| d\mu(y) = 0.$$
measures and this shows that an integral with respect to $\mu$ can be written as an
iterated integral in terms of the measure $\alpha$ and the slicing measures, $\nu_x$. This is
like going backwards in the construction of product measure. One starts with a
measure, $\mu$, defined on the Cartesian product and produces $\alpha$ and an infinite family
of slicing measures from it whereas in the construction of product measure, one
starts with two measures and obtains a new measure on a $\sigma$ algebra of subsets of
the Cartesian product of two spaces. First here are two technical lemmas.
21.3. SLICING MEASURES 567
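$$x \to \int_{\mathbb{R}^m} f(x, y)\, d\nu_x(y) \text{ is } \alpha \text{ measurable} \qquad (21.3.11)$$
and
$$\int_{\mathbb{R}^{n+m}} f(x, y)\, d\mu = \int_{\mathbb{R}^n} \left(\int_{\mathbb{R}^m} f(x, y)\, d\nu_x(y)\right) d\alpha(x). \qquad (21.3.12)$$
If $\widehat{\nu}_x$ is any other collection of Radon measures satisfying 21.3.11 and 21.3.12, then
$\widehat{\nu}_x = \nu_x$ for $\alpha$ a.e. $x$.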
Proof:
$$\alpha_0(E) = \mu(E \times \mathbb{R}^m).$$
Thus $\alpha_0$ is a finite Borel measure and so it is finite on compact sets. Lemma 17.2.3
implies the existence of the Radon measure $\alpha$ extending $\alpha_0$.
Uniqueness of $\nu_x$
where $a_i$ and $b_i$ are rational. Thus there are countably many such sets. Then from
the conclusion of the theorem, if $x_0 \notin N \cup \widehat{N}$,
$$\frac{1}{\alpha(B(x_0, r))} \int_{B(x_0,r)} \int_{\mathbb{R}^m} \mathcal{X}_A(y)\, d\nu_x(y)\, d\alpha$$
$$= \frac{1}{\alpha(B(x_0, r))} \int_{B(x_0,r)} \int_{\mathbb{R}^m} \mathcal{X}_A(y)\, d\widehat{\nu}_x(y)\, d\alpha,$$
and by the Lebesgue Besicovitch Differentiation theorem, there exists a set of $\alpha$
measure zero, $E_A$, such that if $x_0 \notin E_A \cup N \cup \widehat{N}$, then the limit in the above exists
as $r \to 0$ and yields
$$\nu_{x_0}(A) = \widehat{\nu}_{x_0}(A).$$
Letting $E$ denote the union of all the sets $E_A$ for $A$ as described above, it follows
that $E$ is a set of measure zero and if $x_0 \notin E \cup N \cup \widehat{N}$ then $\nu_{x_0}(A) = \widehat{\nu}_{x_0}(A)$ for
all such sets $A$. But every open set can be written as a disjoint union of sets of this
form and so for all such $x_0$, $\nu_{x_0}(V) = \widehat{\nu}_{x_0}(V)$ for all $V$ open. By Lemma 21.3.2
this shows the two measures are equal and proves the uniqueness assertion for $\nu_x$.
It remains to show the existence of the measures $\nu_x$.
Existence of $\nu_x$
It is obvious from the formula that the map from $f \in C_c(\mathbb{R}^m)$ to $L^1(\alpha)$ given
by $f \to h_f$ is linear. However, this is not sufficiently specific because functions
in $L^1(\alpha)$ are only determined a.e. However, for $h_f \in L^1(\alpha)$, you can specify a
particular representative $\alpha$ a.e. By the fundamental theorem of calculus,
$$\widehat{h}_f(x) \equiv \lim_{r \to 0} \frac{1}{\alpha(B(x, r))} \int_{B(x,r)} h_f(z)\, d\alpha(z) \qquad (21.3.14)$$
exists off some set of measure zero $Z_f$. Note that since this involves the integral
over a ball, it does not matter which representative of $h_f$ is placed in the formula.
Therefore, $\widehat{h}_f(x)$ is well defined pointwise for all $x$ not in some set of measure zero.
Since $\widehat{h}_f = h_f$ a.e. it follows that $\widehat{h}_f$ is well defined and will work in the formula
21.3.13. Let
$$Z = \cup \{Z_f : f \in D\}$$
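$$\left|\int_{\mathbb{R}^n} g(x) (h_f(x) - h_{f'}(x))\, d\alpha\right| \le ||f - f'||_\infty \int_{\mathbb{R}^{n+m}} |g(x)|\, d\mu$$
Dividing by $\alpha(B(z, r))$, and then taking a limit as $r \to 0$, it follows that for all
$z \notin Z_f \cup Z_{f'}$,
$$|h_f(z) - h_{f'}(z)| \le ||f - f'||_\infty,$$
Also, if $\alpha(B(z, r)) > 0$ for all $r > 0$, then for all $r > 0$,
$$\left|\frac{1}{\alpha(B(z, r))} \int_{B(z,r)} (h_f(x) - h_{f'}(x))\, d\alpha\right| \le ||f - f'||_\infty$$
It follows that for $f \in C_c(\mathbb{R}^m)^+$ arbitrary and $z \notin Z$,
$$\limsup_{r \to 0} \frac{1}{\alpha(B(z, r))} \int_{B(z,r)} h_f(x)\, d\alpha - \liminf_{r \to 0} \frac{1}{\alpha(B(z, r))} \int_{B(z,r)} h_f(x)\, d\alpha$$
$$= \limsup_{r \to 0} \frac{1}{\alpha(B(z, r))} \int_{B(z,r)} (h_f(x) - h_{f'}(x))\, d\alpha(x)$$
$$-\ \liminf_{r \to 0} \frac{1}{\alpha(B(z, r))} \int_{B(z,r)} (h_f(x) - h_{f'}(x))\, d\alpha(x)$$
$$\le \limsup_{r \to 0} \left|\frac{1}{\alpha(B(z, r))} \int_{B(z,r)} (h_f(x) - h_{f'}(x))\, d\alpha(x)\right|$$
$$+\ \liminf_{r \to 0} \left|\frac{1}{\alpha(B(z, r))} \int_{B(z,r)} (h_f(x) - h_{f'}(x))\, d\alpha(x)\right|$$
$$\le 2 ||f - f'||_\infty$$
and since $f'$ is arbitrary, it follows that the limit of 21.3.14 holds for all $f \in C_c(\mathbb{R}^m)$
whenever $z \notin Z$, the above set of measure zero.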
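Now for $f$ an arbitrary real valued function of $C_c(\mathbb{R}^m)$, simply apply the above
result to positive and negative parts to obtain $h_f \equiv h_{f^+} - h_{f^-}$ and $\widehat{h}_f \equiv \widehat{h}_{f^+} - \widehat{h}_{f^-}$.
Then it follows that for all $f \in C_c(\mathbb{R}^m)$ and $g \in C_c(\mathbb{R}^n)$,
$$\int_{\mathbb{R}^{n+m}} g(x) f(y)\, d\mu = \int_{\mathbb{R}^n} g(x) \widehat{h}_f(x)\, d\alpha.$$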
Then dividing by $\alpha(B(x, r))$ and taking a limit as $r \to 0$, it follows that for
$\alpha$ a.e. $x$, $1 = \nu_x(\mathbb{R}^m)$, so these $\nu_x$ are probability measures. Letting $g_k(x) \uparrow
\mathcal{X}_A(x)$, $f_k(y) \uparrow \mathcal{X}_B(y)$ for $A, B$ open, it follows that 21.3.15 is valid for $g(x)$
replaced with $\mathcal{X}_A(x)$ and $f(y)$ replaced with $\mathcal{X}_B(y)$.
Now let $\mathcal{G}$ denote the Borel sets $F$ of $\mathbb{R}^{n+m}$ such that
$$\int_{\mathbb{R}^{n+m}} \mathcal{X}_F(x, y)\, d\mu(x, y) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} \mathcal{X}_F(x, y)\, d\nu_x(y)\, d\alpha(x)$$
and that all the integrals make sense. As just explained, this includes all Borel
sets of the form $F = A \times B$ where $A, B$ are open. It is clear that $\mathcal{G}$ is closed with
respect to countable disjoint unions and complements, while sets of the form $A \times B$
for $A, B$ open form a $\pi$ system. Therefore, by Lemma 9.2.2, $\mathcal{G}$ contains the Borel
sets which is the smallest $\sigma$ algebra which contains such products of open sets. It
follows from the usual approximation with simple functions that if $f \ge 0$ and is
Borel measurable, then
$$\int_{\mathbb{R}^{n+m}} f(x, y)\, d\mu(x, y) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} f(x, y)\, d\nu_x(y)\, d\alpha(x)$$
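It follows
$$\int_{\mathbb{R}^n} \int_{\mathbb{R}^m} g(x, y)\, d\nu_x(y)\, d\alpha(x) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} h(x, y)\, d\nu_x(y)\, d\alpha(x)$$
and so, since for $\alpha$ a.e. $x$, $y \to g(x, y)$ and $y \to h(x, y)$ are $\nu_x$ measurable with
$$0 = \int_{\mathbb{R}^m} (h(x, y) - g(x, y))\, d\nu_x(y)$$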
21.4 Exercises
1. Suppose $U$ is an open set in $\mathbb{R}^n$ and $h : U \to \mathbb{R}^n$ is a function which satisfies
the following conditions.
$$m_n(h(U \setminus A)) = 0 \qquad (21.4.18)$$
Here $m_n$ is Lebesgue measure. Show, using the Besicovitch covering theorem
that if $T \subseteq U$ and $m_n(T) = 0$, then $m_n(h(T)) = 0$ also. If $m_n$ were replaced
with an arbitrary Radon measure, would this result still hold? Explain. Hint:
You might want to first consider $T_k \equiv \{x \in T : ||Dh(x)|| \le k\}$, show $h(T_k)$
has measure zero and then let $k \to \infty$.
J (x) dmn mn (Dh (x) B (0, r (1 )))
B(x,r)
= |det Dh (x)| mn (B (0,r (1 )))
n
= (1 ) |det Dh (x)| mn (B (0,r))
Next show the change of variables formula is valid for any $f \ge 0$ and Borel
measurable,
$$\int_V f(y)\, dm_n = \int_U f(h(x)) |\det Dh(x)|\, dm_n$$
1
under the assumptions 21.4.16 - 21.4.18 with $Dh(x)$ existing for all $x \in A$.
Finally extend this result to only require $f$ is Lebesgue measurable. Note that
$x \to f(h(x))$ is not known to be measurable but $x \to f(h(x)) |\det Dh(x)|$
will be measurable. This last part will involve completeness of Lebesgue measure
along with regularity.
8. Suppose $h : U \to \mathbb{R}^n$ is a function which satisfies the following conditions.
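(a) Using Problem 9, explain why for each $x \in E$, there exist $B_r(x) \in \mathcal{F}$
such that
$$\mu(S(x, r)) \equiv \mu\left(\overline{B(x,r)} \setminus B(x,r)\right) = 0$$
$$E \subseteq \cup_{i=1}^{N_n} \cup \{B : B \in \mathcal{H}_i\}$$
Then explain why, for some $i$, there are finitely many sets of $\mathcal{H}_i$, $B_1, \cdots, B_{m_1}$
such that
$$\mu\left(\cup_{i=1}^{m_1} B_i \cap E\right) = \sum_{i=1}^{m_1} \mu(B_i \cap E) > \frac{\mu(E)}{N_n + 1}.$$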
(c) Letting $E_1 = E \setminus \left(\cup_{i=1}^{m_1} B_i\right)$, explain why
$$\mu(E_1) \le \frac{N_n}{N_n + 1} \mu(E)$$
(d) Now let $\mathcal{F}_2$ be the sets of $\mathcal{F}_1$, if any, which have empty intersection with
$\cup \{B_i : B_i \in \mathcal{H}_i\}$. Then let $E_1$ play the role of $E$ in the above argument
and let $\mathcal{F}_2$ play the role of $\mathcal{F}$. Thus there exist finitely many sets of
$\mathcal{F}_2$, $B_{m_1+1}, \cdots, B_{m_2}$ which are disjoint and if $E_2$ consists of those points
of $E_1$ which are not covered by these balls, then
$$\mu(E_2) \le \frac{N_n}{N_n + 1} \mu(E_1) \le \left(\frac{N_n}{N_n + 1}\right)^2 \mu(E)$$
Continuing this way, explain why $\mu(E_n) \to 0$ and why the disjoint balls
just constructed have the property that $\mu(E \setminus \cup_{i=1}^\infty B_i) = 0$.
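The geometric decay in part (d) can be made concrete. The sketch below uses exact fractions and a made-up value of the constant, 5 in place of $N_n$, purely for illustration; the helper name `steps_until_below` is hypothetical. It counts how many rounds of the construction are needed before the factor $(N_n/(N_n+1))^k$ drops below a given tolerance.

```python
from fractions import Fraction

def steps_until_below(n_const, tol):
    """Smallest k with (n_const / (n_const + 1))**k < tol, computed exactly."""
    ratio = Fraction(n_const, n_const + 1)
    k, bound = 0, Fraction(1)
    while bound >= tol:
        bound *= ratio
        k += 1
    return k

# Illustrative constant only; the true Besicovitch constant depends on n.
k = steps_until_below(5, Fraction(1, 1000))
```

Since the ratio is strictly less than 1 for every value of the constant, the loop always terminates, which is the content of the claim that the measure of the uncovered part tends to 0.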
11. In the above problem, you don't need to have $E$ bounded. Explain why
you can eliminate this assumption. Hint: Let $r_n$ be an increasing sequence of
positive real numbers such that $\mu(S(0, r_n)) = 0$. Then let $E_0 = E \cap B(0, r_0)$
and $E_n = E \cap \left(B(0, r_n) \setminus B(0, r_{n-1})\right)$. Also let $\mathcal{F}_n$ be those sets of $\mathcal{F}$ which
are contained in $B(0, r_n) \setminus B(0, r_{n-1})$. Then apply the above result.
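12. For $X$ a random variable having values in $\mathbb{R}^n$, denote by $\lambda_X$ the Radon measure
satisfying $\lambda_X(E) \equiv P(X \in E)$ for every Borel $E$. Now suppose $X, Y$
are random variables having values in $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively. First explain
why there exist unique probability measures, denoted as $\lambda_{X|y}$ and $\lambda_{Y|x}$ such
that whenever $E$ is a Borel set in $\mathbb{R}^{n+m}$,
$$\int_{\mathbb{R}^{n+m}} \mathcal{X}_E\, d\lambda_{(X,Y)} = \int_{\mathbb{R}^m} \int_{\mathbb{R}^n} \mathcal{X}_E\, d\lambda_{X|y}\, d\lambda_Y = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} \mathcal{X}_E\, d\lambda_{Y|x}\, d\lambda_X$$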
578 THE BOCHNER INTEGRAL
In words, $U_k^n$ is the set of points of $X$ which are as close to $a_k$ as they are to any
of the $a_l$ for $l \le n$.
$$B_k^n \equiv x^{-1}(U_k^n), \quad D_k^n \equiv B_k^n \setminus \left(\cup_{i=1}^{k-1} B_i^n\right), \quad D_1^n \equiv B_1^n,$$
and
$$x_n(s) \equiv \sum_{k=1}^n a_k \mathcal{X}_{D_k^n}(s).$$
Thus $x_n(s)$ is a closest approximation to $x(s)$ from $\{a_k\}_{k=1}^n$ and so $x_n(s) \to x(s)$
because $\{a_n\}_{n=1}^\infty$ is dense in $x(\Omega)$. Furthermore, $x_n$ is measurable because each $D_k^n$
is measurable.
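The closest-point construction can be sketched numerically in the simplest case where the values lie in $[0,1]$. Everything named here is an assumption for the illustration: the dense sequence is an enumeration of the dyadic rationals, the function being approximated is $s \mapsto s^2$, and ties are broken in favor of the smallest index, which mirrors removing the earlier sets $B_i^n$ from $B_k^n$.

```python
def dyadic_points(levels):
    """Enumerate 0, 1, 1/2, 1/4, 3/4, 1/8, ... up to the given dyadic level."""
    pts = [0.0, 1.0]
    for lev in range(1, levels + 1):
        pts.extend(j / 2 ** lev for j in range(1, 2 ** lev, 2))
    return pts

def nearest_point_approx(value, points, n):
    """Return a_k with the smallest index k among the first n points minimizing |value - a_k|."""
    best = points[0]
    for a in points[1:n]:
        if abs(value - a) < abs(value - best):  # strict: ties keep the earlier index
            best = a
    return best

points = dyadic_points(4)  # 17 points, spacing 1/16 on [0, 1]
errors = [abs(s * s - nearest_point_approx(s * s, points, len(points)))
          for s in (i / 100 for i in range(101))]
```

For each fixed $s$ the approximation takes only finitely many values as a function of $s$, and the error shrinks as more points of the dense sequence are allowed, which is the pointwise convergence used in the proof.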
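Since $(\Omega, \mathcal{S}, \mu)$ is $\sigma$ finite, there exists $\Omega_n \uparrow \Omega$ with $\mu(\Omega_n) < \infty$. Let
$$y_n(s) \equiv \mathcal{X}_{\Omega_n}(s)\, x_n(s).$$
Then $y_n(s) \to x(s)$ for each $s$ because for any $s$, $s \in \Omega_n$ if $n$ is large enough. Also
$y_n$ is a simple function because it equals 0 off a set of finite measure.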
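Now suppose that $x$ is strongly measurable. Then some sequence of simple
functions, $\{x_n\}$, converges pointwise to $x$. Then $x_n^{-1}(W)$ is measurable for every
open set $W$ because it is just a finite union of measurable sets. Thus, $x_n^{-1}(W)$ is
measurable for every Borel set $W$. This follows by considering
$$\{W : x_n^{-1}(W) \text{ is measurable}\}$$
and observing this is a $\sigma$ algebra which contains the open sets. Since $X$ is a metric
space, it follows that if $U$ is an open set in $X$, there exists a sequence of open sets,
$\{V_n\}$ which satisfies
$$\overline{V}_n \subseteq U, \quad \overline{V}_n \subseteq V_{n+1}, \quad U = \cup_{n=1}^\infty V_n.$$
Then
$$x^{-1}(V_m) \subseteq \bigcup_{n < \infty} \bigcap_{k \ge n} x_k^{-1}(V_m) \subseteq x^{-1}\left(\overline{V}_m\right).$$
This implies
$$x^{-1}(U) = \bigcup_{m < \infty} x^{-1}(V_m) \subseteq \bigcup_{m < \infty} \bigcup_{n < \infty} \bigcap_{k \ge n} x_k^{-1}(V_m) \subseteq \bigcup_{m < \infty} x^{-1}\left(\overline{V}_m\right) \subseteq x^{-1}(U).$$
Since
$$x^{-1}(U) = \bigcup_{m < \infty} \bigcup_{n < \infty} \bigcap_{k \ge n} x_k^{-1}(V_m),$$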
22.1. STRONG AND WEAK MEASURABILITY 579
The next lemma is interesting for its own sake. Roughly it says that if a Banach
space is separable, then the unit ball in the dual space is weak $*$ separable. This
will be used to prove Pettis's theorem, one of the major theorems in this subject
which relates weak measurability to strong measurability.
Lemma 22.1.4 If $X$ is a separable Banach space with $B'$ the closed unit ball in
$X'$, then there exists a sequence $\{f_n\}_{n=1}^\infty \equiv D' \subseteq B'$ with the property that for every
$x \in X$,
$$||x|| = \sup_{f \in D'} |f(x)|$$
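A finite-dimensional picture may help with the statement of Lemma 22.1.4. The sketch below takes $X = \mathbb{R}^2$ with the Euclidean norm and, as an assumption made only for this illustration, uses the functionals $x \mapsto x \cdot (\cos\theta, \sin\theta)$ at equally spaced angles as a stand-in for a countable subset of the dual unit ball. The function name `sup_over_functionals` is hypothetical.

```python
import math

def sup_over_functionals(x, num_directions):
    """Max of |f_theta(x)| over unit functionals f_theta(x) = x . (cos theta, sin theta)."""
    best = 0.0
    for k in range(num_directions):
        theta = 2 * math.pi * k / num_directions
        best = max(best, abs(x[0] * math.cos(theta) + x[1] * math.sin(theta)))
    return best

x = (3.0, 4.0)                      # Euclidean norm 5
approx_norm = sup_over_functionals(x, 360)
```

Each functional has norm one, so the maximum never exceeds $||x||$, and as the angles become dense the maximum approaches $||x||$; the lemma asserts that a countable family with this property exists in every separable Banach space.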
Proof: Let $\{a_k\}$ be a countable dense set in $X$, and consider the mapping
$$\phi_n : B' \to \mathbb{F}^n$$
given by
$$\phi_n(f) \equiv (f(a_1), \cdots, f(a_n)).$$
Then $\phi_n(B')$ is contained in a compact subset of $\mathbb{F}^n$ because $|f(a_k)| \le ||a_k||$.
Therefore, there exists a countable dense subset of $\phi_n(B')$, $\{\phi_n(f_k)\}_{k=1}^\infty$. Then pick
$h_k^j \in H \cap B'$ such that $\lim_{j \to \infty} ||f_k - h_k^j|| = 0$. Then $\{\phi_n(h_k^j), k, j\}$ must also be
dense in $\phi_n(B')$. Let $D_n' = \{h_k^j, k, j\}$. Define
$$D' \equiv \cup_{k=1}^\infty D_k'.$$
Note that for each $x \in X$, there exists $f_x \in B'$ such that $f_x(x) = ||x||$. From the
construction,
$$||a_m|| = \sup\{|f(a_m)| : f \in D'\}$$
because $f_{a_m}(a_m)$ is the limit of numbers $f(a_m)$ for $f \in D_m' \subseteq D'$. Therefore, for $x$
arbitrary,
$$= \cap_{f \in D'} \{s : |f(x(s) - a)| \le r\}$$
$$= \cap_{f \in D'} \{s : |f(x(s)) - f(a)| \le r\}$$
$$= \cap_{f \in D'} (f \circ x)^{-1}\left(\overline{B(f(a), r)}\right)$$
Corollary 22.1.6 Let $X$ be a separable Banach space and let $\mathcal{B}(X)$ denote the
$\sigma$ algebra of Borel sets. Let $H$ be a dense subset of $X'$. Then $\mathcal{B}(X) = \sigma(H) \equiv \mathcal{F}$,
the smallest $\sigma$ algebra of subsets of $X$ which has the property that every function,
$x^* \in H$ is measurable.
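Proof: First I need to show $\mathcal{F}$ contains open balls because then $\mathcal{F}$ will contain
the open sets and hence the Borel sets. As noted above, it suffices to show $\mathcal{F}$
contains closed balls. Let $D'$ be those functionals in $B'$ defined in Lemma 22.1.4.
Then
$$\{x : ||x - a|| \le r\} = \left\{x : \sup_{x^* \in D'} |x^*(x - a)| \le r\right\}$$
$$= \cap_{x^* \in D'} \{x : |x^*(x - a)| \le r\}$$
$$= \cap_{x^* \in D'} \{x : |x^*(x) - x^*(a)| \le r\}$$
$$= \cap_{x^* \in D'}\, x^{*-1}\left(\overline{B(x^*(a), r)}\right) \in \sigma(H)$$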
m=1 f
1
(Vm ) = f 1 (U ).
It follows f 1 (U ) F because it equals the expression in the middle which is
measurable. Now let W . Since B is countable, W = n=1 Un for some sets
Un B. Hence
f 1 (W ) =
n=1 f
1
(Un ) F .
Note that the same conclusion would hold for any topological space with the
property that for any open set U, it has such a sequence of Vk attached to it as in
22.1.2.
$$= \cap_{i=1}^m \{s : |x_i^*(x(s) - z)| < r\}$$
$$= \cap_{i=1}^m \{s : |x_i^*(x(s)) - x_i^*(z)| < r\}$$
which is measurable because each $x_i^* \circ x$ is given to be measurable.
Next suppose $x^{-1}(U) \in \mathcal{F}$ whenever $U$ is weakly open. Then in particular this
holds when $U = B_{x^*}(z, r)$ for arbitrary $x^*$. Hence
$$\{s : x(s) \in B_{x^*}(z, r)\} \in \mathcal{F}.$$
Lemma 22.1.10 Let B be the closed unit ball in X. If X is separable, there exists
a sequence {xm }
m=1 D B with the property that for all y X ,
Proof: Let
{xk }
k=1
It remains to verify this works. Let y X . Then there exists y such that
By density, there exists one of the xk from the countable dense subset of X such
that also
|xk (y)| > ||y || , ||xk y || < .
Now xk (y) k (B) and so there exists x Dk D such that
|y (x)| ||y || 2.
The next theorem is another version of the Pettis theorem. First here is a
definition.
a contradiction.
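then
$$\sum_{k=1}^n a_k \mu(E_k) = 0.$$
Let $f \in X'$. Then
$$f\left(\sum_{k=1}^n a_k \mathcal{X}_{E_k}(s)\right) = \sum_{k=1}^n f(a_k) \mathcal{X}_{E_k}(s) = 0$$
and, therefore,
$$0 = \int_\Omega \left(\sum_{k=1}^n f(a_k) \mathcal{X}_{E_k}(s)\right) d\mu = \sum_{k=1}^n f(a_k) \mu(E_k) = f\left(\sum_{k=1}^n a_k \mu(E_k)\right).$$
Since $f \in X'$ is arbitrary, and $X'$ separates the points of $X$, it follows that
$$\sum_{k=1}^n a_k \mu(E_k) = 0$$
as claimed.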
It follows easily from this proposition that $\int_\Omega d\mu$ is well defined and linear on
simple functions.
Definition 22.2.3 A strongly measurable function $x$ is Bochner integrable if there
exists a sequence of simple functions $x_n$ converging to $x$ pointwise and satisfying
$$\int_\Omega ||x_n(s) - x_m(s)||\, d\mu \to 0 \text{ as } m, n \to \infty. \qquad (22.2.4)$$
If $x$ is Bochner integrable, define
$$\int_\Omega x(s)\, d\mu \equiv \lim_{n \to \infty} \int_\Omega x_n(s)\, d\mu. \qquad (22.2.5)$$
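For a simple function the integral is just the finite sum $\sum_k a_k \mu(E_k)$. The following sketch works this out for values in $\mathbb{R}^2$ with the Euclidean norm; the data and the helper names `integrate_simple` and `norm` are hypothetical, chosen only to illustrate the definition and the triangle inequality that holds for it.

```python
import math

def integrate_simple(values, measures):
    """Integral of sum_k a_k * X_{E_k}: the vector sum_k a_k * mu(E_k), E_k disjoint."""
    dim = len(values[0])
    return tuple(sum(a[i] * m for a, m in zip(values, measures)) for i in range(dim))

def norm(v):
    """Euclidean norm on R^d."""
    return math.sqrt(sum(c * c for c in v))

# Illustrative simple function: values a_k and measures mu(E_k) of disjoint sets.
values = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
measures = [0.5, 0.25, 0.5]

integral = integrate_simple(values, measures)
integral_of_norm = sum(norm(a) * m for a, m in zip(values, measures))
```

Here the two horizontal contributions cancel, so the norm of the integral is much smaller than the integral of the norm; cancellation of this kind is why the inequality is generally strict.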
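Theorem 22.2.4 The Bochner integral is well defined and if $x$ is Bochner integrable
and $f \in X'$,
$$f\left(\int_\Omega x(s)\, d\mu\right) = \int_\Omega f(x(s))\, d\mu \qquad (22.2.6)$$
and
$$\left\|\int_\Omega x(s)\, d\mu\right\| \le \int_\Omega ||x(s)||\, d\mu. \qquad (22.2.7)$$
Also, the Bochner integral is linear. That is, if $a, b$ are scalars and $x, y$ are two
Bochner integrable functions, then
$$\int_\Omega (a x(s) + b y(s))\, d\mu = a \int_\Omega x(s)\, d\mu + b \int_\Omega y(s)\, d\mu \qquad (22.2.8)$$
Also
$$\int_\Omega ||x(s)||\, d\mu < \infty$$
when $x$ is Bochner integrable.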
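Proof: First it is shown that the triangle inequality holds on simple functions
and that the limit in 22.2.5 exists. Thus, if $x$ is given by 22.2.3 (simple) with the
$E_k$ disjoint,
$$\left\|\int_\Omega x(s)\, d\mu\right\| = \left\|\int_\Omega \sum_{k=1}^n a_k \mathcal{X}_{E_k}(s)\, d\mu\right\| = \left\|\sum_{k=1}^n a_k \mu(E_k)\right\|$$
$$\le \sum_{k=1}^n ||a_k|| \mu(E_k) = \int_\Omega \sum_{k=1}^n ||a_k|| \mathcal{X}_{E_k}(s)\, d\mu = \int_\Omega ||x(s)||\, d\mu$$
which shows the triangle inequality holds on simple functions. This implies
$$\left\|\int_\Omega x_n(s)\, d\mu - \int_\Omega x_m(s)\, d\mu\right\| = \left\|\int_\Omega (x_n(s) - x_m(s))\, d\mu\right\| \le \int_\Omega ||x_n(s) - x_m(s)||\, d\mu$$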
$$\le \varepsilon/2 + \varepsilon/2$$
if $m$ and $n$ are chosen large enough. Since $\varepsilon$ is arbitrary, this shows the limit is the
same for both sequences and demonstrates the Bochner integral is well defined.
It remains to verify the triangle inequality on Bochner integrable functions and
the claim about passing a continuous linear functional inside the integral. Let $x$ be
Bochner integrable and let $x_n$ be a sequence which satisfies the conditions of the
definition. Define
$$y_n(s) \equiv \begin{cases} x_n(s) & \text{if } ||x_n(s)|| \le 2 ||x(s)||, \\ 0 & \text{if } ||x_n(s)|| > 2 ||x(s)||. \end{cases} \qquad (22.2.9)$$
Thus
$$y_n(s) = x_n(s) \mathcal{X}_{[||x_n|| \le 2||x||]}(s).$$
If $x(s) = 0$ then $y_n(s) = 0$ for all $n$. If $||x(s)|| > 0$ then for all $n$ large enough,
$y_n(s) = x_n(s)$.
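Thus,
$$f\left(\int_\Omega x\, d\mu\right) = \lim_{n \to \infty} f\left(\int_\Omega y_n\, d\mu\right) = \lim_{n \to \infty} \int_\Omega f(y_n)\, d\mu = \int_\Omega f(x)\, d\mu,$$
the last equation holding from the dominated convergence theorem and 22.2.10 and
22.2.11. This shows 22.2.6. To verify 22.2.7,
$$\left\|\int_\Omega x(s)\, d\mu\right\| = \lim_{n \to \infty} \left\|\int_\Omega y_n(s)\, d\mu\right\| \le \lim_{n \to \infty} \int_\Omega ||y_n(s)||\, d\mu = \int_\Omega ||x(s)||\, d\mu$$
where the last equation follows from the dominated convergence theorem and 22.2.10,
22.2.11.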
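It remains to verify 22.2.8. Let $f \in X'$. Then from 22.2.6
$$f\left(\int_\Omega (a x(s) + b y(s))\, d\mu\right) = \int_\Omega (a f(x(s)) + b f(y(s)))\, d\mu$$
$$= a \int_\Omega f(x(s))\, d\mu + b \int_\Omega f(y(s))\, d\mu$$
$$= f\left(a \int_\Omega x(s)\, d\mu + b \int_\Omega y(s)\, d\mu\right).$$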
Thus
$$\int_\Omega ||x||\, d\mu \le \liminf_{n \to \infty} \int_\Omega ||x_n||\, d\mu < \infty$$
Using 22.2.16 it follows $y_n$ satisfies 22.2.14, converges pointwise to $x$ and then from
the dominated convergence theorem 22.2.15 holds.
Here is a simple corollary.
Proof: From Theorem 22.2.5 there is a sequence of simple functions $\{y_n\}$ having
the properties listed in that theorem. Then consider $\{L y_n\}$ which converges
pointwise to $Lx$. Since $L$ is continuous and linear,
$$\int_\Omega ||L y_n - L x||_Y\, d\mu \le ||L|| \int_\Omega ||y_n - x||_X\, d\mu$$
Also
$$||x||_{L^p(\Omega;X)} \equiv ||x||_p \equiv \left(\int_\Omega ||x(s)||^p\, d\mu\right)^{1/p}. \qquad (22.3.17)$$
Proof: Let
$$g_N(s) \equiv \sum_{n=1}^N ||x_{n+1}(s) - x_n(s)||_X$$
$$\sum_{n=1}^\infty ||x_{n+1} - x_n||_p < \infty.$$
Let
$$g(s) = \lim_{N \to \infty} g_N(s) = \sum_{n=1}^\infty ||x_{n+1}(s) - x_n(s)||_X.$$
exists because
$$x_{N+1}(s) = x_{N+1}(s) - x_1(s) + x_1(s) = \sum_{n=1}^N (x_{n+1}(s) - x_n(s)) + x_1(s).$$
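$$||x_{N+1}(s) - x_{M+1}(s)||_X \le \sum_{n=M+1}^N ||x_{n+1}(s) - x_n(s)||_X \le \sum_{n=M+1}^\infty ||x_{n+1}(s) - x_n(s)||_X$$
which shows that $\{x_{N+1}(s)\}_{N=1}^\infty$ is a Cauchy sequence. Now let
$$x(s) \equiv \begin{cases} \lim_{N \to \infty} x_N(s) & \text{if } s \notin E, \\ 0 & \text{if } s \in E. \end{cases}$$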
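if $s \notin E$ and $f(x(s)) = 0$ if $s \in E$. Therefore, $f \circ x$ is measurable because it is the
limit of the measurable functions,
$$f \circ x_N \mathcal{X}_{E^C}.$$
It remains to show $x \in L^p(\Omega; X)$. This follows from the above and the triangle
inequality. Thus, for $N$ large enough,
$$\left(\int_\Omega ||x(s)||^p\, d\mu\right)^{1/p} \le \left(\int_\Omega ||x_N(s)||^p\, d\mu\right)^{1/p} + \left(\int_\Omega ||x(s) - x_N(s)||^p\, d\mu\right)^{1/p}$$
$$\le \left(\int_\Omega ||x_N(s)||^p\, d\mu\right)^{1/p} + \varepsilon < \infty.$$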
and apply Lemma 22.3.2. The pointwise convergence of this subsequence was estab-
lished in the proof of this lemma. This proves the theorem because if a subsequence
of a Cauchy sequence converges, then the Cauchy sequence must also converge.
22.3. THE SPACES LP (; X) 593
Observation 22.3.4 If the measure space is Lebesgue measure then you have continuity
of translation in $L^p(\mathbb{R}^n; X)$ in the usual way. More generally, for $\mu$ a Radon
measure on $\Omega$ a locally compact Hausdorff space, $C_c(\Omega; X)$ is dense in $L^p(\Omega; X)$.
Here $C_c(\Omega; X)$ is the space of continuous $X$ valued functions which have compact
support in $\Omega$. The proof of this little observation follows immediately from approximating
with simple functions and then applying the appropriate considerations to
the simple functions.
Clearly Fatou's lemma and the monotone convergence theorem make no sense
for functions with values in a Banach space but the dominated convergence theorem
holds in this setting.
Theorem 22.3.5 If $x$ is strongly measurable and $x_n(s) \to x(s)$ a.e. with
$$||x_n(s)|| \le g(s) \text{ a.e.}$$
where $g \in L^1(\Omega)$, then $x$ is Bochner integrable and
$$\int_\Omega x(s)\, d\mu = \lim_{n \to \infty} \int_\Omega x_n(s)\, d\mu.$$
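Proof: $||x_n(s) - x(s)|| \le 2g(s)$ a.e. so by the usual dominated convergence
theorem,
$$0 = \lim_{n \to \infty} \int_\Omega ||x_n(s) - x(s)||\, d\mu.$$
Also,
$$\int_\Omega ||x_n(s) - x_m(s)||\, d\mu \le \int_\Omega ||x_n(s) - x(s)||\, d\mu + \int_\Omega ||x_m(s) - x(s)||\, d\mu,$$
and so $\{x_n\}$ is a Cauchy sequence in $L^1(\Omega; X)$. Therefore, by Theorem 22.3.3, there
exists $y \in L^1(\Omega; X)$ and a subsequence $x_{n'}$ satisfying
$$x_{n'}(s) \to y(s) \text{ a.e. and in } L^1(\Omega; X).$$
But $x(s) = \lim_{n' \to \infty} x_{n'}(s)$ a.e. and so $x(s) = y(s)$ a.e. Hence
$$\int_\Omega ||x(s)||\, d\mu = \int_\Omega ||y(s)||\, d\mu < \infty$$
which shows that $x$ is Bochner integrable. Finally, since the integral is linear,
$$\left\|\int_\Omega x(s)\, d\mu - \int_\Omega x_n(s)\, d\mu\right\| = \left\|\int_\Omega (x(s) - x_n(s))\, d\mu\right\|$$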
(() < )
fn () f ()
F E, (F ) < ,
d (f () , xk ) d (f () , fn ()) + d (fn () , xk )
1 1
< 2r + r = r
m m
( ( ))
Thus fn1 (B (xk , r)) f 1 B xk , m1
r and so is in the left side. Thus
the two sets are equal. Now the set on the left in 22.3.18 is measurable because it
is a countable union of measurable sets. This proves the claim since
[ ]
1
: d (fn () , f ())
m
C
Ek(m)m .
m=1
Hence Ek(m
C
0 )m0
so
$$d(f(\omega), f_n(\omega)) < 1/m_0 < \varepsilon$$
for all $n > k(m_0)$. This holds for all $\omega \in F^C$ and so $f_n$ converges uniformly to $f$ on
$F^C$.
Now if $E \ne \emptyset$, consider $\{\mathcal{X}_{E^C} f_n\}_{n=1}^\infty$. Then $\mathcal{X}_{E^C} f_n$ is measurable and the sequence
converges pointwise to $\mathcal{X}_{E^C} f$ everywhere. Therefore, from the first part, there
exists a set of measure less than $\varepsilon$, $F$ such that on $F^C$, $\{\mathcal{X}_{E^C} f_n\}$ converges uniformly
to $\mathcal{X}_{E^C} f$. Therefore, on $(E \cup F)^C$, $\{f_n\}$ converges uniformly to $f$.
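Now here is the Vitali convergence theorem and a definition.
Definition 22.3.7 Let $\mathcal{A} \subseteq L^1(\Omega; X)$. Then $\mathcal{A}$ is said to be uniformly integrable
if for every $\varepsilon > 0$ there exists $\delta > 0$ such that whenever $\mu(E) < \delta$, it follows
$$\int_E ||f||_X\, d\mu < \varepsilon$$
for all $f \in \mathcal{A}$.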
Proof: Let $\varepsilon > 0$ be given. Then by uniform integrability there exists $\delta > 0$
such that if $\mu(E) < \delta$ then
$$\int_E ||f_n||\, d\mu < \varepsilon/3.$$
By Fatou's lemma the same inequality holds for $f$. Also Fatou's lemma shows
$f \in L^1(\Omega; X)$, $f$ being measurable because of Theorem 22.1.7.
By Egoroff's theorem, Theorem 22.3.6, there exists a set of measure less than $\delta$,
$E$ such that the convergence of $\{f_n\}$ to $f$ is uniform off $E$. Therefore,
$$\int_\Omega ||f - f_n||\, d\mu \le \int_E (||f||_X + ||f_n||_X)\, d\mu + \int_{E^C} ||f - f_n||_X\, d\mu$$
$$< \frac{2\varepsilon}{3} + \int_{E^C} \frac{\varepsilon}{(\mu(\Omega) + 1)\, 3}\, d\mu < \varepsilon$$
if $n$ is large enough.
Note that a convenient way to achieve uniform integrability is to simply say $\{f_n\}$
is bounded in $L^p(\Omega; X)$ for some $p > 1$. This follows from Hölder's inequality.
$$\int_E ||f_n||\, d\mu \le \left(\int_E d\mu\right)^{1/p'} \left(\int_\Omega ||f_n||^p\, d\mu\right)^{1/p}.$$
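The Hölder estimate above can be checked on a finite measure space. The sketch below is illustrative only: the four-point space, its weights, the scalar values standing in for $||f_n(s)||$, and the helper names `integral_over` and `holder_bound` are all made up for this example, with $p = 2$.

```python
from itertools import combinations

def integral_over(weights, values, subset):
    """Integral of |f| over the set E given by `subset` (a tuple of indices)."""
    return sum(weights[i] * abs(values[i]) for i in subset)

def holder_bound(weights, values, subset, p):
    """mu(E)**(1/p') * ||f||_p, where 1/p + 1/p' = 1 and p > 1."""
    p_conj = p / (p - 1)
    mu_e = sum(weights[i] for i in subset)
    lp_norm = sum(w * abs(v) ** p for w, v in zip(weights, values)) ** (1 / p)
    return mu_e ** (1 / p_conj) * lp_norm

weights = [0.1, 0.2, 0.3, 0.4]     # measure of each point (hypothetical)
values = [4.0, -1.0, 2.0, 0.5]     # values of f at each point (hypothetical)
subsets = [c for k in range(5) for c in combinations(range(4), k)]
```

The bound depends on $E$ only through $\mu(E)^{1/p'}$, so a uniform bound on the $L^p$ norms makes the integrals over $E$ small uniformly in $n$ once $\mu(E)$ is small.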
For such a simple function, you can assume the Ek are disjoint and then
m m
xn L1 (,L1 (B)) = ak L1 (B) (Ek ) = |ak | d (Ek )
k=1 k=1 B
= |ak (t)| d (t) XEk (s) d (s)
B
= |xn | dd
B
Now consider 22.4.20. Since limm xm (s) = x (s) in L1 (B) , it follows from
Fatous lemma that
||xn x||L1 (,L1 (B)) lim inf ||xn xm ||L1 (,L1 (B)) <
m
and so
x (s) = y (s) in L1 (B) a.e. s
In particular, for a.e. s, it follows that
But ( )
xn (s) d (t) = xn (s, t) d a.e. t.
Therefore
( )
lim xn (s, t) d x (s) d (t) d = 0. (22.4.21)
n
B
Theorem 22.4.1 Let X = L1 (B) where (B, F, ) is a nite measure space and
let x L1 (; X). Then there exists a measurable representative, y L1 ( B),
such that
x (s) = y (s, ) a.e. s in , the equation in L1 (B) ,
and ( )
y (s, t) d = x (s) d (t) a.e. t.
F :SX
whenever $\{E_i\}_{i=1}^\infty$ is a sequence of disjoint elements of $\mathcal{S}$. For $F$ a vector measure,
$$|F|(A) \equiv \sup\left\{\sum_{G \in \pi(A)} ||F(G)|| : \pi(A) \text{ is a partition of } A\right\}.$$
22.5. VECTOR MEASURES 599
This is the same definition that was given in the case where $F$ would have values
in $\mathbb{C}$, the only difference being the fact that now $F$ has values in a general Banach
space $X$ as the vector space of values of the vector measure. Recall that a partition
of $A$ is a finite set, $\{F_1, \cdots, F_m\} \subseteq \mathcal{S}$ such that $\cup_{i=1}^m F_i = A$. The same theorem
about $|F|$ proved in the case of complex valued measures holds in this context with
the same proof. For completeness, it is included here.
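For a vector measure concentrated on finitely many atoms, the supremum in the definition of $|F|$ can be computed by brute force over all partitions. The sketch below assumes four atoms with values in $\mathbb{R}^2$ under the Euclidean norm; the atom values and the helper names `set_partitions` and `total_variation` are hypothetical. By the triangle inequality the finest partition should achieve the maximum, and the code checks that.

```python
import math

def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part

def total_variation(atom_values):
    """Max over partitions of the sum of ||F(block)||, F(block) = sum of atom values."""
    best = 0.0
    for partition in set_partitions(list(range(len(atom_values)))):
        total = 0.0
        for block in partition:
            fx = sum(atom_values[i][0] for i in block)
            fy = sum(atom_values[i][1] for i in block)
            total += math.hypot(fx, fy)
        best = max(best, total)
    return best

atoms = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (3.0, -4.0)]  # F({i}) for each atom
variation = total_variation(atoms)
```

In this example $||F(A)||$ for the whole set is the norm of $(3, -2)$, which is far smaller than the total variation, showing how cancellation is removed by passing to partitions.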
Proof: Let $E_1$ and $E_2$ be sets of $\mathcal{S}$ such that $E_1 \cap E_2 = \emptyset$ and let $\{A_1^i \cdots A_{n_i}^i\} =
\pi(E_i)$, a partition of $E_i$ which is chosen such that
$$|F|(E_i) - \varepsilon < \sum_{j=1}^{n_i} ||F(A_j^i)|| \quad i = 1, 2.$$
Consider the sets which are contained in either of $\pi(E_1)$ or $\pi(E_2)$, it follows this
collection of sets is a partition of $E_1 \cup E_2$ which is denoted here by $\pi(E_1 \cup E_2)$.
Then by the above inequality and the definition of total variation,
$$|F|(E_1 \cup E_2) \ge \sum_{G \in \pi(E_1 \cup E_2)} ||F(G)|| > |F|(E_1) + |F|(E_2) - 2\varepsilon,$$
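Also,
$$A_i = \cup_{j=1}^\infty A_i \cap E_j$$
and so by the triangle inequality, $||F(A_i)|| \le \sum_{j=1}^\infty ||F(A_i \cap E_j)||$. Therefore, by
the above,
$$|F|(E_\infty) - \varepsilon < \sum_{i=1}^n \overbrace{\sum_{j=1}^\infty ||F(A_i \cap E_j)||}^{\ge ||F(A_i)||}$$
$$= \sum_{j=1}^\infty \sum_{i=1}^n ||F(A_i \cap E_j)||$$
$$\le \sum_{j=1}^\infty |F|(E_j)$$
because $\{A_i \cap E_j\}_{i=1}^n$ is a partition of $E_j$.
Since $\varepsilon > 0$ is arbitrary, this shows
$$|F|(\cup_{j=1}^\infty E_j) \le \sum_{j=1}^\infty |F|(E_j).$$
Also, 22.5.23 implies that whenever the $E_i$ are disjoint, $|F|(\cup_{j=1}^n E_j) \ge \sum_{j=1}^n |F|(E_j)$.
Therefore,
$$\sum_{j=1}^\infty |F|(E_j) \ge |F|(\cup_{j=1}^\infty E_j) \ge |F|(\cup_{j=1}^n E_j) \ge \sum_{j=1}^n |F|(E_j).$$
Since $n$ is arbitrary,
$$|F|(\cup_{j=1}^\infty E_j) = \sum_{j=1}^\infty |F|(E_j)$$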
Definition 22.5.3 A Banach space is said to have the Radon Nikodym property if
whenever
$(\Omega, \mathcal{S}, \mu)$ is a finite measure space
$F : \mathcal{S} \to X$ is a vector measure with $|F|(\Omega) < \infty$
$F \ll \mu$
then one may conclude there exists $g \in L^1(\Omega; X)$ such that
$$F(E) = \int_E g(s)\, d\mu$$
for all $E \in \mathcal{S}$.
Some Banach spaces have the Radon Nikodym property and some don't. No
attempt is made to give a complete answer to the question of which Banach spaces
have this property but the next theorem gives examples of many spaces which do.
Theorem 22.5.4 Suppose $X'$ is a separable dual space. Then $X'$ has the Radon
Nikodym property.
which contradicts 22.5.25 because B (p, r) was given to have empty intersection with
B (0, ||x||). Therefore, |F | (E) = 0 as hoped.
( Now F \)B (0, ||x||) can be covered by
countably many such balls and so |F | F \ B (0, ||x||) = 0.
Denote the exceptional set of measure zero by $N_x$. By Theorem 22.1.13, $X$ is
separable. Letting $D$ be a dense, countable subset of $X$, define
$$N_1 \equiv \cup_{x \in D} N_x.$$
Thus
$$|F|(N_1) = 0.$$
For any $E \in \mathcal{S}$, $x, y \in D$, and $a, b \in \mathbb{F}$,
$$\int_E f_{ax+by}(s)\, d|F| = F(E)(ax + by) = a F(E)(x) + b F(E)(y)$$
$$= \int_E (a f_x(s) + b f_y(s))\, d|F|. \qquad (22.5.26)$$
Since 22.5.26 holds for all E S, it follows
for |F
| a.e. s and x, y D. Let D consist of all nite linear combinations of the
m
form i=1 ai xi where ai is a rational point of F and xi D. If
m
ai xi D,
i=1
|F | (N2 ) = 0
This is well dened because if x and y are elements of D, the above claim and
22.5.27 imply
y (s) = h
(x y ) (s) ||x y ||.
hx (s) h
Using 22.5.27, the dominated convergence theorem may be applied to conclude that
for xn x, with xn D,
hx (s) d |F | = lim x (s) d |F | = lim F (E) (xn ) = F (E) (x). (22.5.28)
h n
E n E n
|hx (s)| ||x|| , hax+by (s) = ahx (s) + bhy (s), (22.5.29)
Therefore,
|| (s)|| d |F | <
22.6. THE RIESZ REPRESENTATION THEOREM 603
so L1 (; X ). By 22.2.6, if E S,
( )
hx (s) d |F | = (s) (x) d |F | = (s) d |F | (x). (22.5.30)
E E E
Corollary 22.5.5 Any separable reflexive Banach space has the Radon Nikodym
property.
It is not necessary to assume separability in the above corollary. For the proof
of a more general result, consult Vector Measures by Diestel and Uhl, [11].
||x||Y = ||x||X .
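Theorem 22.6.2 Let $X$ be any Banach space and let $(\Omega, \mathcal{S}, \mu)$ be a finite measure
space. Let $p \ge 1$ and let $1/p + 1/p' = 1$. (If $p = 1$, $p' \equiv \infty$.) Then $L^{p'}(\Omega; X')$ is
isometric to a subspace of $(L^p(\Omega; X))'$. Also, for $g \in L^{p'}(\Omega; X')$,
$$\sup_{||f||_p \le 1} \left|\int_\Omega g(s)(f(s))\, d\mu\right| = ||g||_{p'}.$$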
Proof: First observe that for $f \in L^p(\Omega; X)$ and $g \in L^{p'}(\Omega; X')$,
$$s \to g(s)(f(s))$$
by
$$\theta g(f) \equiv \int_\Omega g(s)(f(s))\, d\mu.$$
Hölder's inequality implies
$$||\theta g|| \le ||g||_{p'} \qquad (22.6.31)$$
and it is also clear that $\theta$ is linear. Next it is required to show
$$||\theta g|| = ||g||.$$
m
i=1 Ei = .
Then $||g|| \in L^{p'}(\Omega)$. Let $\varepsilon > 0$ be given. By the scalar Riesz representation
theorem, there exists $h \in L^p(\Omega)$ such that $||h||_p = 1$ and
$$\int_\Omega ||g(s)||_{X'} h(s)\, d\mu \ge ||g||_{L^{p'}(\Omega;X')} - \varepsilon.$$
Also
||g|| |g (f )| = g (s) (f (s)) d
m
( )
||ci ||X / ||h||L1 () h (s) XEi (s) d
i=1
||g (s)||X h (s) d h (s) / ||h||L1 () d
||g||Lp (;X ) 2.
Since $\varepsilon$ was arbitrary,
$$||\theta g|| \ge ||g|| \qquad (22.6.32)$$
and from 22.6.31 this shows equality holds in 22.6.32 whenever $g$ is a simple function.
In general, let $g \in L^{p'}(\Omega; X')$ and let $g_n$ be a sequence of simple functions
converging to $g$ in $L^{p'}(\Omega; X')$. Then
Theorem 22.6.3 If $X$ is a Banach space and $X'$ has the Radon Nikodym property,
then if $(\Omega, \mathcal{S}, \mu)$ is a finite measure space,
$$(L^p(\Omega; X))' \cong L^{p'}(\Omega; X')$$
Lemma 22.6.4 $F$ defined above is a vector measure with values in $X'$ and $|F|(\Omega) < \infty$.
1/p
||l|| sup ||XE () x||Lp (;X) ||l|| (E) .
||x||1
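Let $\{E_i\}_{i=1}^\infty$ be a sequence of disjoint elements of $\mathcal{S}$ and let $E = \cup_{n < \infty} E_n$.
$$\left|F(E)(x) - \sum_{k=1}^n F(E_k)(x)\right| = \left|l(\mathcal{X}_E(\cdot) x) - \sum_{i=1}^n l(\mathcal{X}_{E_i}(\cdot) x)\right| \qquad (22.6.33)$$
$$\le ||l|| \left\|\mathcal{X}_E(\cdot) x - \sum_{i=1}^n \mathcal{X}_{E_i}(\cdot) x\right\|_{L^p(\Omega;X)}$$
$$\le ||l||\, \mu\left(\bigcup_{k > n} E_k\right)^{1/p} ||x||.$$
Since $\mu(\Omega) < \infty$,
$$\lim_{n \to \infty} \mu\left(\bigcup_{k > n} E_k\right)^{1/p} = 0$$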
k>n
Thus
n
n
n
+ ||F (Hi )|| < l (XHi () xi ) ||l|| XHi () xi
i=1 i=1 i=1 Lp (;X)
( )1/p
n
1/p
||l|| XHi (s) d = ||l|| () .
i=1
1/p
Since the partition was arbitrary, this shows |F | () ||l|| () and this proves
the lemma.
Continuing with the proof of Theorem 22.6.3, note that
$$F \ll \mu.$$
Since $X'$ has the Radon Nikodym property, there exists $g \in L^1(\Omega; X')$ such that
$$F(E) = \int_E g(s)\, d\mu.$$
n n
= F (Ei ) (xi ) = g (s) (xi ) d. (22.6.34)
i=1 i=1 Ei
Let
$$G_n \equiv \{s : ||g(s)||_{X'} \le n\}$$
and let
$$j : L^p(G_n; X) \to L^p(\Omega; X)$$
be given by
$$jh(s) = \begin{cases} h(s) & \text{if } s \in G_n, \\ 0 & \text{if } s \notin G_n. \end{cases}$$
Thus $j$ is the zero extension off of $G_n$. Letting $h$ be a simple function in $L^p(G_n; X)$,
$$j^* l(h) = l(jh) = \int_{G_n} g(s)(h(s))\, d\mu. \qquad (22.6.36)$$
Since the simple functions are dense in $L^p(G_n; X)$, and $g \in L^{p'}(G_n; X')$, it follows
22.6.36 holds for all $h \in L^p(G_n; X)$. By Theorem 22.6.2,
Therefore $g \in L^{p'}(\Omega; X')$ and since simple functions are dense in $L^p(\Omega; X)$, 22.6.35
holds for all $h \in L^p(\Omega; X)$. Thus $l = \theta g$ and the theorem is proved because, by
Theorem 22.6.2, $||l|| = ||g||$ and the mapping $\theta$ is onto because $l$ was arbitrary.
As in the scalar case, everything generalizes to the case of $\sigma$ finite measure
spaces. The proof is almost identical.
$$\frac{1}{p} + \frac{1}{p'} = 1.$$
Proof: First suppose $r$ exists as described. Also, to save on notation and to
emphasize the similarity with the scalar case, denote the norm in the various spaces
by $|\cdot|$. Define a new measure $\widetilde{\mu}$, according to the rule
$$\widetilde{\mu}(E) \equiv \int_E r\, d\mu. \qquad (22.6.37)$$
Then
p p1 p p
||f ||Lp (e) = r f rd = ||f ||Lp ()
and so is one to one and in fact preserves norms. I claim that also is onto. To
1
see this, let g Lp (; X, e) and consider the function, r p g. Then
p1 p p p
r g d = |g| rd = |g| de <
1
( 1 )
Thus r p g Lp (; X, ) and r p g = g showing that is onto as claimed. Thus
is one to one, onto, and preserves norms. Consider the diagram below which is
descriptive of the situation in which must be one to one and onto.
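[Diagram: top row $h,\ L^{p'}(\tilde\mu)$; $L^p(\tilde\mu)',\ \tilde\Lambda$; $L^p(\mu)',\ \Lambda$, linked by $\eta^*$. Bottom row $L^p(\tilde\mu)$ and $L^p(\mu)$, linked by $\eta$.]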
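The norm identity for the change of measure can be checked numerically. The following is only a sketch under simplifying assumptions that are not part of the text: the measure space is a finite set of points with weights `mu`, the functions are scalar valued rather than Banach space valued, and the map is taken to be $\eta f = r^{-1/p} f$ from $L^p(\mu)$ to $L^p(\tilde\mu)$ where $\tilde\mu$ has density $r$ with respect to $\mu$. All names are hypothetical.

```python
def lp_norm(f, weights, p):
    """(sum |f_i|^p * w_i)^(1/p) for a scalar function on a finite measure space."""
    return sum(abs(v) ** p * w for v, w in zip(f, weights)) ** (1.0 / p)


def eta(f, r, p):
    """Assumed form of the map: eta f = r^(-1/p) f."""
    return [ri ** (-1.0 / p) * v for v, ri in zip(f, r)]


# Hypothetical data: five points with positive weights and a positive density r.
mu = [0.5, 2.0, 1.0, 0.25, 3.0]
r = [2.0, 0.1, 1.0, 4.0, 0.5]
mu_tilde = [ri * mi for ri, mi in zip(r, mu)]  # mu_tilde(E) = integral of r over E
f = [1.0, -2.0, 0.5, 3.0, -1.5]
p = 3.0

norm_f = lp_norm(f, mu, p)                       # norm of f in L^p(mu)
norm_eta_f = lp_norm(eta(f, r, p), mu_tilde, p)  # norm of eta f in L^p(mu_tilde)
```

The two norms agree up to rounding because the factor $r^{-1}$ produced by raising $r^{-1/p}$ to the power $p$ cancels the density $r$ point by point.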
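Then for $\Lambda \in L^p(\mu)'$, there exists a unique $\tilde\Lambda \in L^p(\tilde\mu)'$ such that $\eta^*\tilde\Lambda = \Lambda$, $\|\tilde\Lambda\| = \|\Lambda\|$. By the Riesz representation theorem for finite measure spaces, there exists a unique $h \in L^{p'}(\tilde\mu) \equiv L^{p'}(\Omega; X', \tilde\mu)$ which represents $\tilde\Lambda$ in the manner described in the Riesz representation theorem. Thus $\|h\|_{L^{p'}(\tilde\mu)} = \|\tilde\Lambda\| = \|\Lambda\|$ and for all $f \in L^p(\mu)$,
$$\Lambda(f) = \eta^*\tilde\Lambda(f) = \tilde\Lambda(\eta f) = \int h(\eta f)\,d\tilde\mu = \int r\,h\left(r^{-1/p} f\right) d\mu = \int r^{1/p'} h(f)\,d\mu.$$
Now
$$\int \left| r^{1/p'} h \right|^{p'} d\mu = \int |h|^{p'} r\,d\mu = \|h\|^{p'}_{L^{p'}(\tilde\mu)} < \infty.$$
Thus $\left\| r^{1/p'} h \right\|_{L^{p'}(\mu)} = \|h\|_{L^{p'}(\tilde\mu)} = \|\tilde\Lambda\| = \|\Lambda\|$ and $r^{1/p'} h$ represents $\Lambda$ in the appropriate way. If $p = 1$, then $1/p' \equiv 0$. Now consider the existence of $r$. Since the measure space is $\sigma$ finite, there exist $\{\Omega_n\}$ disjoint, each having positive measure and their union equals $\Omega$. Then define
$$r(\omega) \equiv \sum_{n=1}^\infty \frac{1}{n^2}\,\mu(\Omega_n)^{-1} \mathcal{X}_{\Omega_n}(\omega).$$
This proves the Lemma.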
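A weight of the form $r = \sum_n n^{-2} \mu(\Omega_n)^{-1} \mathcal{X}_{\Omega_n}$ has finite integral because each piece $\Omega_n$ contributes exactly $n^{-2}$. The sketch below checks this on a made-up example in which the pieces have measure $\mu(\Omega_n) = n$; that choice, and the function names, are assumptions for illustration only.

```python
import math


def integral_of_r(num_pieces, piece_measure):
    """Partial sum of the integral of r over the first num_pieces sets Omega_n.

    On Omega_n the weight r equals 1 / (n^2 * mu(Omega_n)), so the integral
    over Omega_n is that value times mu(Omega_n), which is 1 / n^2.
    """
    total = 0.0
    for n in range(1, num_pieces + 1):
        mu_n = piece_measure(n)
        r_on_piece = 1.0 / (n ** 2 * mu_n)
        total += r_on_piece * mu_n
    return total


# Hypothetical sigma finite space: disjoint pieces with mu(Omega_n) = n.
partial = integral_of_r(100000, lambda n: float(n))
limit = math.pi ** 2 / 6  # value of the full series sum of 1 / n^2
```

The partial sums increase to $\pi^2/6$ whatever the measures of the pieces are, so the measure with density $r$ is finite even though $\mu$ itself is only $\sigma$ finite.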
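Theorem 22.6.6 (Riesz representation theorem) Let $(\Omega, \mathcal{S}, \mu)$ be $\sigma$ finite and let $X'$ have the Radon Nikodym property. Then for
$$\Lambda \in (L^p(\Omega; X, \mu))', \quad p \ge 1$$
there exists a unique $h \in L^q(\Omega, X', \mu)$, $L^\infty(\Omega, X', \mu)$ if $p = 1$, such that
$$\Lambda f = \int_\Omega h(f)\,d\mu.$$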
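Definition 23.1.1 For a set $E$, denote by $r(E)$ the number which is half the diameter of $E$. Thus
$$r(E) \equiv \frac{1}{2} \sup\{|x - y| : x, y \in E\} \equiv \frac{1}{2}\operatorname{diam}(E).$$
Let $E \subseteq \mathbb{R}^n$.
$$\mathcal{H}^s_\delta(E) \equiv \inf\Big\{ \sum_{j=1}^\infty \beta(s)(r(C_j))^s : E \subseteq \cup_{j=1}^\infty C_j,\ \operatorname{diam}(C_j) \le \delta \Big\}$$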
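To see how the exponent $s$ affects the sums appearing in this infimum, one can evaluate them for one particular family of covers. The sketch below covers $[0, 1]$ by $N$ equal intervals, so each covering set has $r(C_j) = 1/(2N)$, and returns the resulting sum. It gives only an upper bound for the infimum over covers with diameter at most $1/N$, not the infimum itself. The constant `beta = 2.0` is an assumed placeholder used for every $s$.

```python
def cover_sum(s, num_intervals, beta):
    """Sum of beta * r(C_j)^s over a cover of [0, 1] by equal intervals."""
    radius = 0.5 / num_intervals  # r(C_j) is half the diameter 1 / num_intervals
    return num_intervals * beta * radius ** s


beta = 2.0  # hypothetical constant, not a value taken from the text

# s = 1: the sum does not depend on how fine the cover is.
sums_s1 = [cover_sum(1.0, n, beta) for n in (1, 10, 1000)]

# s = 2: the sum shrinks as the cover is refined.
sums_s2 = [cover_sum(2.0, n, beta) for n in (1, 10, 1000)]

# s = 1/2: the sum grows as the cover is refined.
sums_s_half = [cover_sum(0.5, n, beta) for n in (1, 10, 1000)]
```

For this family the sum equals $\beta N^{1-s} 2^{-s}$, which is constant for $s = 1$, tends to $0$ for $s > 1$, and tends to $\infty$ for $s < 1$.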
$$\sum_{j=1}^\infty \beta(s)(r(C_j^i))^s - \varepsilon/2^i < \mathcal{H}^s_\delta(E_i)$$
Theorem 23.1.4 Let $\mu$ be an outer measure on the subsets of $(X, d)$, a metric space. If
$$\mu(A \cup B) = \mu(A) + \mu(B)$$
whenever $\operatorname{dist}(A, B) > 0$, then the $\sigma$ algebra of measurable sets contains the Borel sets.
Proof: It suffices to show that closed sets are in $\mathcal{S}$, the $\sigma$-algebra of measurable sets, because then the open sets are also in $\mathcal{S}$ and consequently $\mathcal{S}$ contains the Borel sets. Let $K$ be closed and let $S$ be a subset of $X$. Is $\mu(S) \ge \mu(S \cap K) + \mu(S \setminus K)$? It suffices to assume $\mu(S) < \infty$. Let
$$K_n \equiv \Big\{x : \operatorname{dist}(x, K) \le \frac{1}{n}\Big\}.$$
Since $x \to \operatorname{dist}(x, K)$ is continuous, it follows $K_n$ is closed. By the assumption of the theorem,
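If $\lim_{n\to\infty} \mu((K_n \setminus K) \cap S) = 0$ then the theorem will be proved because this limit along with 23.1.2 implies $\lim_{n\to\infty} \mu(S \setminus K_n) = \mu(S \setminus K)$ and then taking a limit in 23.1.1, $\mu(S) \ge \mu(S \cap K) + \mu(S \setminus K)$ as desired. Therefore, it suffices to establish this limit.
Since $K$ is closed, a point $x \notin K$ must be at a positive distance from $K$ and so
$$K_n \setminus K = \cup_{k=n}^\infty K_k \setminus K_{k+1}.$$
Therefore
$$\mu(S \cap (K_n \setminus K)) \le \sum_{k=n}^\infty \mu(S \cap (K_k \setminus K_{k+1})). \quad (23.1.3)$$
If
$$\sum_{k=1}^\infty \mu(S \cap (K_k \setminus K_{k+1})) < \infty, \quad (23.1.4)$$
$$\sum_{k=1}^M \mu(S \cap (K_k \setminus K_{k+1})) = \sum_{k \text{ even},\ k \le M} \mu(S \cap (K_k \setminus K_{k+1})) + \sum_{k \text{ odd},\ k \le M} \mu(S \cap (K_k \setminus K_{k+1})). \quad (23.1.5)$$
By the construction, the distance between any pair of sets $S \cap (K_k \setminus K_{k+1})$ for different even values of $k$ is positive and the distance between any pair of sets $S \cap (K_k \setminus K_{k+1})$ for different odd values of $k$ is positive. Therefore,
$$\sum_{k \text{ even},\ k \le M} \mu(S \cap (K_k \setminus K_{k+1})) + \sum_{k \text{ odd},\ k \le M} \mu(S \cap (K_k \setminus K_{k+1}))$$
$$\le \mu\Big(\bigcup_{k \text{ even}} S \cap (K_k \setminus K_{k+1})\Big) + \mu\Big(\bigcup_{k \text{ odd}} S \cap (K_k \setminus K_{k+1})\Big) \le 2\mu(S) < \infty$$
and so for all $M$, $\sum_{k=1}^M \mu(S \cap (K_k \setminus K_{k+1})) \le 2\mu(S)$, showing 23.1.4.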
With the above theorem, the following theorem is easy to obtain.
Theorem 23.1.5 The $\sigma$ algebra of $\mathcal{H}^s$ measurable sets contains the Borel sets and $\mathcal{H}^s$ has the property that for all $E \subseteq \mathbb{R}^n$, there exists a Borel set $F \supseteq E$ such that $\mathcal{H}^s(F) = \mathcal{H}^s(E)$.
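$$\mathcal{H}^s_\delta(A \cup B) + \varepsilon > \sum_{j=1}^\infty \beta(s)(r(C_j))^s.$$
Thus
$$\mathcal{H}^s_\delta(A \cup B) + \varepsilon > \sum_{j \in J_1} \beta(s)(r(C_j))^s + \sum_{j \in J_2} \beta(s)(r(C_j))^s$$
where
$$J_1 = \{j : C_j \cap A \ne \emptyset\}, \quad J_2 = \{j : C_j \cap B \ne \emptyset\}.$$
Recall $\operatorname{dist}(A, B) = 2\delta_0$, $J_1 \cap J_2 = \emptyset$. It follows
$$\mathcal{H}^s_\delta(A \cup B) + \varepsilon > \mathcal{H}^s_\delta(A) + \mathcal{H}^s_\delta(B).$$
Letting $\delta \to 0$, and noting $\varepsilon > 0$ was arbitrary, yields
$$\mathcal{H}^s(A \cup B) \ge \mathcal{H}^s(A) + \mathcal{H}^s(B).$$
Equality holds because $\mathcal{H}^s$ is an outer measure. By Caratheodory's criterion, $\mathcal{H}^s$ is a Borel measure.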
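To verify the second assertion, note first there is no loss of generality in letting $\mathcal{H}^s(E) < \infty$. Let
$$E \subseteq \cup_{j=1}^\infty C_j, \quad r(C_j) < \delta,$$
and
$$\mathcal{H}^s_\delta(E) + \delta > \sum_{j=1}^\infty \beta(s)(r(C_j))^s.$$
Let
$$F_\delta = \cup_{j=1}^\infty \overline{C_j}.$$
Thus $F_\delta \supseteq E$ and
$$\mathcal{H}^s_\delta(E) \le \mathcal{H}^s_\delta(F_\delta) \le \sum_{j=1}^\infty \beta(s)\big(r(\overline{C_j})\big)^s = \sum_{j=1}^\infty \beta(s)(r(C_j))^s < \delta + \mathcal{H}^s_\delta(E).$$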
23.2 $\mathcal{H}^n$ And $m_n$
Next I will compare $\mathcal{H}^n$ and $m_n$. To do this, recall the following covering theorem of Corollary 10.3.6 on Page 248.
In the next lemma, the balls are the usual balls taken with respect to the usual
distance in Rn .
Now letting $E$ be Borel, it follows from the outer regularity of $m_n$ there exists a decreasing sequence of open sets, $\{V_i\}$, containing $E$ such that $m_n(V_i) \to m_n(E)$. Then from the above,
Now let $B_i$ be a ball having radius equal to $\operatorname{diam}(C_i) = 2r(C_i)$ which contains $C_i$. It follows
$$m_n(B_i) = \alpha(n)\,2^n r(C_i)^n = \frac{\alpha(n)\,2^n}{\beta(n)}\,\beta(n)\,r(C_i)^n$$
which implies
$$1 > \sum_{i=1}^\infty \beta(n)\,r(C_i)^n = \sum_{i=1}^\infty \frac{\beta(n)}{\alpha(n)\,2^n}\,m_n(B_i) = \infty,$$
a contradiction.
Proof: I will show $\mathcal{H}^n$ is a positive multiple of $m_n$ for any choice of $\beta(n)$. Define
$$k = \frac{m_n(Q_0)}{\mathcal{H}^n(Q_0)}$$
where $Q_0 = [0, 1)^n$ is the half open unit cube in $\mathbb{R}^n$. I will show $k\mathcal{H}^n(E) = m_n(E)$ for any Lebesgue measurable set. When this is done, it will follow that by adjusting $\beta(n)$ the multiple can be taken to be 1.
Let $Q = \prod_{i=1}^n [a_i, a_i + 2^{-k})$ be a half open box where $a_i = l2^{-k}$. Thus $Q_0$ is the union of $(2^k)^n$ of these identical half open boxes. By translation invariance of $\mathcal{H}^n$ and $m_n$,
$$(2^k)^n \mathcal{H}^n(Q) = \mathcal{H}^n(Q_0) = \frac{1}{k}\,m_n(Q_0) = \frac{1}{k}\,(2^k)^n m_n(Q).$$
Therefore, $k\mathcal{H}^n(Q) = m_n(Q)$ for any such half open box and by translation invariance, for the translation of any such half open box. It follows from Lemma 16.5.2 that $k\mathcal{H}^n(U) = m_n(U)$ for all open sets. It follows immediately, since every compact set is the countable intersection of open sets, that $k\mathcal{H}^n = m_n$ on compact sets. Therefore, they are also equal on all closed sets because every closed set is the countable union of compact sets. Now let $F$ be an arbitrary Lebesgue measurable set. I will show that $F$ is $\mathcal{H}^n$ measurable and that $k\mathcal{H}^n(F) = m_n(F)$. Let $F_l = B(0, l) \cap F$. Then there exists $H$, a countable union of compact sets, and $G$, a countable intersection of open sets, such that
$$H \subseteq F_l \subseteq G \quad (23.2.6)$$
$$m_n(G \setminus H) = k\mathcal{H}^n(G \setminus H) = 0. \quad (23.2.7)$$
This inequality may seem obvious at first, but it is not. The reason is that there are sets which are not subsets of any ball having the same diameter as the set. For example, consider an equilateral triangle.
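The equilateral triangle can be worked out explicitly. For an acute triangle the smallest enclosing disk is the circumscribed circle, a standard fact used here without proof, so it is enough to compare the circumradius with half the diameter of the vertex set. The sketch below does this for side length 1; the helper names are hypothetical.

```python
import math
from itertools import combinations


def diameter(points):
    """Largest pairwise distance in a finite set of plane points."""
    return max(math.dist(a, b) for a, b in combinations(points, 2))


def circumradius(a, b, c):
    """Circumradius of a triangle from R = (product of side lengths) / (4 * area)."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0
    return ab * bc * ca / (4.0 * area)


# Vertices of an equilateral triangle with side length 1.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]

half_diameter = diameter(triangle) / 2.0    # r(E) for the triangle
enclosing_radius = circumradius(*triangle)  # radius of the smallest enclosing disk
```

Here `half_diameter` is $1/2$ while `enclosing_radius` is $1/\sqrt{3} \approx 0.577$, so no disk of diameter 1 contains the triangle.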
m=1
Let
$$S_k = \cup_{m=1}^{N_k} E_m^k \times (-c_m^k, c_m^k).$$
Then $(x, y) \in S_k$ if and only if $f(x) > 0$ and $|y| < s_k(x) \le f(x)$. It follows that $S_k \subseteq S_{k+1}$ and
$$S = \cup_{k=1}^\infty S_k.$$
APi x x
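and these are Borel measurable functions of $P_i x$. Also, if $\{A_i\}$ is a disjoint sequence of sets in $\mathcal{G}$ then
$$m\big((\cup_i A_i \cap R_k)_{P_i x}\big) = \sum_i m\big((A_i \cap R_k)_{P_i x}\big)$$
$$A_{P_i x} = \{y \in \mathbb{R} : P_i x + y e_i \in A\}$$
$$P_i x \to m(A_{P_i x})$$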
Proof: The first assertion is obvious from the definition. The Borel measurability of $S(A, e_i)$ follows from the definition and Lemmas 23.3.2 and 23.3.1. To show Formula 23.3.8,
$$m_n(S(A, e_i)) = \int_{P_i \mathbb{R}^n} \int_{-2^{-1} m(A_{P_i x})}^{2^{-1} m(A_{P_i x})} dx_i\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n$$
$$= \int_{P_i \mathbb{R}^n} m(A_{P_i x})\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n = m(A).$$
P i Rn
$$x_1 = P_i x_1 + y_1 e_i, \quad x_2 = P_i x_2 + y_2 e_i.$$
For $x \in A$ define
$$l(x) = \sup\{y : P_i x + y e_i \in A\},$$
$$g(x) = \inf\{y : P_i x + y e_i \in A\}.$$
If $|y_1 - y_2| \le |l(x_2) - g(x_1)|$, then we use the same argument but let
Since $x_1, x_2$ are arbitrary elements of $S(A, e_i)$ and $\varepsilon$ is arbitrary, this proves 23.3.9.
The next lemma says that if $A$ is already symmetric with respect to the $j$th direction, then this symmetry is not destroyed by taking $S(A, e_i)$.
Proof: By definition,
$$P_j x + e_j x_j \in S(A, e_i)$$
if and only if
$$|x_i| < 2^{-1} m\big(A_{P_i(P_j x + e_j x_j)}\big).$$
Now
$$x_i \in A_{P_i(P_j x + e_j x_j)}$$
if and only if
$$x_i \in A_{P_i(P_j x + (-x_j) e_j)}$$
$$P_j x + e_j x_j \in S(A, e_i)$$
if and only if
$$|x_i| < 2^{-1} m\big(A_{P_i(P_j x + (-x_j) e_j)}\big)$$
if and only if
$$P_j x + (-x_j) e_j \in S(A, e_i).$$
(If $x \in A_n \setminus B(0, r(A_n))$, then $-x \in A_n \setminus B(0, r(A_n))$ and so $\operatorname{diam}(A_n) \ge 2|x| > \operatorname{diam}(A_n)$.) Therefore,
Then
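cluded because $\mathcal{H}^n_\delta \le \mathcal{H}^n$ and Lemma 23.2.2. Using $m_n(B(0, 1)) = \mathcal{H}^n(B(0, 1))$ again,
$$\mathcal{H}^n(B(0, 1)) = \mathcal{H}^n(\cup_i B_i) \le \sum_{i=1}^\infty \beta(n)\,r(B_i)^n$$
$$= \frac{\beta(n)}{\alpha(n)} \sum_{i=1}^\infty \alpha(n)\,r(B_i)^n = \frac{\beta(n)}{\alpha(n)} \sum_{i=1}^\infty m_n(B_i)$$
$$= \frac{\beta(n)}{\alpha(n)}\,m_n(\cup_i B_i) = \frac{\beta(n)}{\alpha(n)}\,m_n(B(0, 1)) = \frac{\beta(n)}{\alpha(n)}\,\mathcal{H}^n(B(0, 1))$$
which implies $\alpha(n) \le \beta(n)$ and so the two are equal. This proves that if $\alpha(n) = \beta(n)$, then $\mathcal{H}^n = m_n$ on the measurable sets of $\mathbb{R}^n$.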
This gives another way to think of Lebesgue measure which is a particularly nice
way because it is coordinate free, depending only on the notion of distance.
For $s < n$, note that $\mathcal{H}^s$ is not a Radon measure because it will not generally be finite on compact sets. For example, let $n = 2$ and consider $\mathcal{H}^1(L)$ where $L$ is a line segment joining $(0, 0)$ to $(1, 0)$. Then $\mathcal{H}^1(L)$ is no smaller than $\mathcal{H}^1(L)$ when $L$ is considered a subset of $\mathbb{R}^1$, $n = 1$. Thus by what was just shown, $\mathcal{H}^1(L) \ge 1$. Hence $\mathcal{H}^1([0, 1] \times [0, 1]) = \infty$. The situation is this: $L$ is a one-dimensional object inside
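$$p\,\Gamma(p) = \Gamma(p + 1),$$
$$\Gamma(p)\,\Gamma(q) = \left( \int_0^1 x^{p-1}(1 - x)^{q-1}\,dx \right) \Gamma(p + q),$$
$$\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}.$$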
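Next
$$\Gamma(p)\,\Gamma(q) = \int_0^\infty e^{-t} t^{p-1}\,dt \int_0^\infty e^{-s} s^{q-1}\,ds = \int_0^\infty \int_0^\infty e^{-(t+s)} t^{p-1} s^{q-1}\,dt\,ds$$
$$= \int_0^\infty \int_s^\infty e^{-u} (u - s)^{p-1} s^{q-1}\,du\,ds = \int_0^\infty \int_0^u e^{-u} (u - s)^{p-1} s^{q-1}\,ds\,du$$
$$= \int_0^\infty \int_0^1 e^{-u} (u - ux)^{p-1} (ux)^{q-1} u\,dx\,du$$
$$= \int_0^\infty \int_0^1 e^{-u} u^{p+q-1} (1 - x)^{p-1} x^{q-1}\,dx\,du$$
$$= \Gamma(p + q) \left( \int_0^1 x^{p-1} (1 - x)^{q-1}\,dx \right).$$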
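The product formula for the gamma function can be sanity checked numerically for particular exponents. The sketch below approximates $\int_0^1 x^{p-1}(1-x)^{q-1}\,dx$ with a midpoint rule and compares both sides using `math.gamma` from the Python standard library. The exponents $p = 2.5$, $q = 3$ and the step count are arbitrary choices made so that the integrand is bounded.

```python
import math


def beta_integral(p, q, steps=100000):
    """Midpoint-rule approximation of the integral of x^(p-1) (1-x)^(q-1) on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** (p - 1.0) * (1.0 - x) ** (q - 1.0)
    return total * h


p, q = 2.5, 3.0  # arbitrary test exponents, both at least 1
lhs = math.gamma(p) * math.gamma(q)
rhs = beta_integral(p, q) * math.gamma(p + q)
```

The two sides agree to within the quadrature error, which is far below $10^{-6}$ for this step count.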
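It remains to find $\Gamma\left(\frac{1}{2}\right)$.
$$\Gamma\left(\frac{1}{2}\right) = \int_0^\infty e^{-t} t^{-1/2}\,dt = \int_0^\infty e^{-u^2} \frac{1}{u}\,2u\,du = 2 \int_0^\infty e^{-u^2}\,du$$
Now
$$\left( \int_0^\infty e^{-x^2}\,dx \right)^2 = \int_0^\infty e^{-x^2}\,dx \int_0^\infty e^{-y^2}\,dy = \int_0^\infty \int_0^\infty e^{-(x^2 + y^2)}\,dx\,dy$$
$$= \int_0^\infty \int_0^{\pi/2} e^{-r^2} r\,d\theta\,dr = \frac{\pi}{4}$$
and so
$$\Gamma\left(\frac{1}{2}\right) = 2 \int_0^\infty e^{-u^2}\,du = \sqrt{\pi}.$$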
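The value $\Gamma(1/2) = \sqrt{\pi}$ can be confirmed numerically in two ways: directly with `math.gamma`, and by approximating $2\int_0^\infty e^{-u^2}\,du$. In the sketch below the infinite range is truncated at $u = 10$, an assumption that is harmless because the discarded tail is smaller than $e^{-100}$.

```python
import math


def half_gaussian_integral(upper=10.0, steps=100000):
    """Midpoint-rule approximation of the integral of exp(-u^2) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += math.exp(-u * u)
    return total * h


gamma_half_from_integral = 2.0 * half_gaussian_integral()
gamma_half_from_library = math.gamma(0.5)
root_pi = math.sqrt(math.pi)
```

Both computed values match $\sqrt{\pi} \approx 1.7725$ to well within the quadrature tolerance.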
Next let $n$ be a positive integer.
Theorem 23.4.2 $\alpha(n) = \pi^{n/2} (\Gamma(n/2 + 1))^{-1}$ where $\Gamma(s)$ is the gamma function
$$\Gamma(s) = \int_0^\infty e^{-t} t^{s-1}\,dt.$$
From now on, in the definition of Hausdorff measure, it will always be the case that $\beta(s) = \alpha(s)$. As shown above, this is the right value for $\beta(s)$ when $s$ is a positive integer because this yields the important result that Hausdorff measure is the same as Lebesgue measure. Note the formula $\pi^{s/2} (\Gamma(s/2 + 1))^{-1}$ makes sense for any $s \ge 0$.
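The formula $\pi^{s/2}(\Gamma(s/2+1))^{-1}$ is easy to evaluate, and for small integers it reproduces the familiar volumes of unit balls: length 2 of $(-1, 1)$, area $\pi$ of the unit disk, and volume $4\pi/3$ of the unit ball in three dimensions. The following sketch uses `math.gamma`; the function name `alpha` is chosen to mirror the notation and is otherwise hypothetical.

```python
import math


def alpha(s):
    """Evaluate pi^(s/2) / Gamma(s/2 + 1) for s >= 0."""
    return math.pi ** (s / 2.0) / math.gamma(s / 2.0 + 1.0)


length_of_interval = alpha(1)   # unit ball in dimension 1 is (-1, 1)
area_of_disk = alpha(2)         # unit disk in the plane
volume_of_ball = alpha(3)       # unit ball in three dimensions
fractional_value = alpha(0.5)   # the formula also makes sense for non-integer s
```

For non-integer $s$ the value has no interpretation as a volume here; it is simply the normalizing constant used in the definition.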
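Since $P$ preserves lengths, it follows $P$ is one to one on $P(\mathbb{R}^n)$ and $P^{-1}$ also preserves lengths on $P(\mathbb{R}^n)$. Replacing each $C_j$ with $C_j \cap (PA)$,
$$\mathcal{H}^n_\delta(PA) + \varepsilon > \sum_{j=1}^\infty \alpha(n)\,r(C_j \cap (PA))^n = \sum_{j=1}^\infty \alpha(n)\,r\big(P^{-1}(C_j \cap (PA))\big)^n \ge \mathcal{H}^n_\delta(A).$$
Thus $\mathcal{H}^n_\delta(PA) \ge \mathcal{H}^n_\delta(A)$.
Now let $A \subseteq \cup_{j=1}^\infty C_j$, $\operatorname{diam}(C_j) \le \delta$, and
$$\mathcal{H}^n_\delta(A) + \varepsilon \ge \sum_{j=1}^\infty \alpha(n)(r(C_j))^n.$$
Then
$$\mathcal{H}^n_\delta(A) + \varepsilon \ge \sum_{j=1}^\infty \alpha(n)(r(C_j))^n = \sum_{j=1}^\infty \alpha(n)(r(PC_j))^n \ge \mathcal{H}^n_\delta(PA).$$
$$\mathcal{H}^n(FA) = \mathcal{H}^n(RUA)$$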
The purpose of this appendix is to prove the equivalence between the axiom of choice, the Hausdorff maximal theorem, and the well-ordering principle. The Hausdorff maximal theorem and the well-ordering principle are very useful but a little hard to believe; so, it may be surprising that they are equivalent to the axiom of choice. First it is shown that the axiom of choice implies the Hausdorff maximal theorem, a remarkable theorem about partially ordered sets.
A nonempty set is partially ordered if there exists a partial order, $\preceq$, satisfying
$$x \preceq x$$
and
$$\text{if } x \preceq y \text{ and } y \preceq z \text{ then } x \preceq z.$$
An example of a partially ordered set is the set of all subsets of a given set and $\preceq\ \equiv\ \subseteq$. Note that two elements in a partially ordered set may not be related. In other words, just because $x, y$ are in the partially ordered set, it does not follow that either $x \preceq y$ or $y \preceq x$. A subset of a partially ordered set, $\mathcal{C}$, is called a chain if $x, y \in \mathcal{C}$ implies that either $x \preceq y$ or $y \preceq x$. If either $x \preceq y$ or $y \preceq x$ then $x$ and $y$ are described as being comparable. A chain is also called a totally ordered set. $\mathcal{C}$ is a maximal chain if whenever $\tilde{\mathcal{C}}$ is a chain containing $\mathcal{C}$, it follows the two chains are equal. In other words $\mathcal{C}$ is a maximal chain if there is no strictly larger chain.
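For a finite partially ordered set these notions can be checked by brute force. The sketch below takes the subsets of $\{1, 2, 3\}$ ordered by inclusion and tests whether a given list of subsets is a chain and whether it is maximal. It is only an illustration on a finite example, where no form of the axiom of choice is needed; all function names are hypothetical.

```python
from itertools import combinations


def powerset(base):
    """All subsets of base, as frozensets."""
    items = list(base)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def comparable(a, b):
    """Two sets are comparable under inclusion if one contains the other."""
    return a <= b or b <= a


def is_chain(sets):
    """A chain is a family in which every pair of members is comparable."""
    return all(comparable(a, b) for a, b in combinations(sets, 2))


def is_maximal_chain(chain, family):
    """A chain is maximal if no further member of family can be added to it."""
    if not is_chain(chain):
        return False
    return not any(c not in chain and is_chain(list(chain) + [c]) for c in family)


family = powerset({1, 2, 3})
full_chain = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]
short_chain = [frozenset(), frozenset({1, 2})]
not_a_chain = [frozenset({1}), frozenset({2})]
```

Here `full_chain` is a maximal chain, `short_chain` is a chain that is not maximal because $\{1\}$ can be inserted, and `not_a_chain` fails because $\{1\}$ and $\{2\}$ are not comparable.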
Lemma A.0.7 Let $\mathcal{F}$ be a nonempty partially ordered set with partial order $\preceq$. Then assuming the axiom of choice, there exists a maximal chain in $\mathcal{F}$.
$$g(\mathcal{C}) = \mathcal{C} \cup \{f(\mathcal{C})\}.$$
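Thus $g(\mathcal{C}) \supsetneq \mathcal{C}$ and $g(\mathcal{C}) \setminus \mathcal{C} = \{f(\mathcal{C})\} = \{\text{a single element of } \mathcal{F}\}$. A subset $\mathcal{T}$ of $\mathcal{X}$ is called a tower if
$$\emptyset \in \mathcal{T},$$
$$\mathcal{C} \in \mathcal{T} \text{ implies } g(\mathcal{C}) \in \mathcal{T},$$
and if $\mathcal{S} \subseteq \mathcal{T}$ is totally ordered with respect to set inclusion, then
$$\cup \mathcal{S} \in \mathcal{T}.$$
Here $\mathcal{S}$ is a chain with respect to set inclusion whose elements are chains.
Note that $\mathcal{X}$ is a tower. Let $\mathcal{T}_0$ be the intersection of all towers. Thus, $\mathcal{T}_0$ is a tower, the smallest tower. Are any two sets in $\mathcal{T}_0$ comparable in the sense of set inclusion so that $\mathcal{T}_0$ is actually a chain? Let $\mathcal{C}_0$ be a set of $\mathcal{T}_0$ which is comparable to every set of $\mathcal{T}_0$. Such sets exist, $\emptyset$ being an example. Let
$$\mathcal{B} \equiv \{\mathcal{D} \in \mathcal{T}_0 : \mathcal{D} \supsetneq \mathcal{C}_0 \text{ and } f(\mathcal{C}_0) \notin \mathcal{D}\}.$$
[Diagram: the chains $\mathcal{C}_0$ and $\mathcal{D}$ together with the elements $f(\mathcal{C}_0)$ and $f(\mathcal{D})$.]
Hence if $f(\mathcal{D}) \notin \mathcal{C}_0$, then $\mathcal{D} \supseteq \mathcal{C}_0$. If $\mathcal{D} = \mathcal{C}_0$, then $f(\mathcal{D}) = f(\mathcal{C}_0) \in g(\mathcal{D})$ so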
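Lemma A.0.8 The Hausdorff maximal principle implies every nonempty set can be well-ordered.
$$(S_2, \le_2) \text{ is well-ordered}$$
and if
$$y \in S_2 \setminus S_1 \text{ then } x \le_2 y \text{ for all } x \in S_1,$$
and if $\le_1$ is the well order of $S_1$ then the two orders are consistent on $S_1$. Then observe that $\preceq$ is a partial order on $\mathcal{F}$. By the Hausdorff maximal principle, let $\mathcal{C}$ be a maximal chain in $\mathcal{F}$ and let
$$X_\infty \equiv \cup \mathcal{C}.$$
$$x \le_1 z \text{ whenever } x \in X_\infty.$$
Then let
$$\tilde{\mathcal{C}} = \{S \in \mathcal{C} \text{ or } X_\infty \cup \{z\}\}.$$
Then $\tilde{\mathcal{C}}$ is a strictly larger chain than $\mathcal{C}$, contradicting maximality of $\mathcal{C}$. Thus $X \setminus X_\infty = \emptyset$ and this shows $X$ is well-ordered by $\le$. This proves the lemma.
With these two lemmas the main result follows.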
Proof: It only remains to prove that the well-ordering principle implies the axiom of choice. Let $I$ be a nonempty set and let $X_i$ be a nonempty set for each $i \in I$. Let $X = \cup\{X_i : i \in I\}$ and well order $X$. Let $f(i)$ be the smallest element of $X_i$. Then
$$f \in \prod_{i \in I} X_i.$$
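The construction of the choice function can be imitated on a small example. In the sketch below the sets $X_i$ are finite sets of integers and the well order on their union is the ordering $0, 1, -1, 2, -2, \ldots$ of the integers, encoded by a sort key. Both the example data and the particular well order are assumptions made for illustration. In the finite case no appeal to the well-ordering principle is needed, so this shows only the shape of the definition $f(i) = $ smallest element of $X_i$.

```python
def well_order_key(z):
    """Key for the well order 0, 1, -1, 2, -2, ... on the integers."""
    return (abs(z), z < 0)


def choice_function(indexed_sets, key):
    """Pick from each nonempty set its smallest element in the given well order."""
    return {i: min(members, key=key) for i, members in indexed_sets.items()}


# Hypothetical indexed family of nonempty sets.
family = {
    "a": {-3, 5, -2},
    "b": {-1, 1},
    "c": {7},
}

f = choice_function(family, well_order_key)
```

Each value `f[i]` lies in the corresponding set, which is exactly the statement that $f$ belongs to the product of the $X_i$.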
where the xk are vectors of , then if $c \neq 0$ this contradicts the condition that z
is not a finite linear combination of vectors of . Therefore, c = 0 and now all the
ck must equal zero because it was just shown is linearly independent. It follows
C { {z}} is a strictly larger chain than C and this is a contradiction. Therefore,
is a Hamel basis as claimed. This proves the theorem.
Bibliography
[6] Bartle R.G., A Modern Theory of Integration, Grad. Studies in Math., Amer.
Math. Society, Providence, RI, 2000.
[7] Bartle R. G. and Sherbert D.R. Introduction to Real Analysis third edi-
tion, Wiley 2000.
[9] Davis H. and Snider A., Vector Analysis Wm. C. Brown 1995.
[11] Diestel J. and Uhl J., Vector Measures, American Math. Society, Providence, R.I., 1977.
[12] Dontchev A.L. The Graves theorem Revisited, Journal of Convex Analysis,
Vol. 3, 1996, No.1, 45-53.
[16] Evans L.C. and Gariepy, Measure Theory and Fine Properties of Functions,
CRC Press, 1992.
[37] Noble B. and Daniel J. Applied Linear Algebra, Prentice Hall, 1977.
[38] Ray W.O. Real Analysis, Prentice-Hall, 1988.
[39] Rudin, W., Principles of mathematical analysis, McGraw Hill third edition
1976
[40] Rudin W., Real and Complex Analysis, third edition, McGraw-Hill, 1987.
[41] Salas S. and Hille E., Calculus One and Several Variables, Wiley 1990.