LECTURE NOTES ON DIFFERENTIABILITY (MA 108A)

1. Definition of derivative
Recall the definition of derivative for real valued functions:
\[ f'(x) := \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}, \]
if the limit exists. Writing
\[ \varphi(h) = \begin{cases} \frac{f(x+h)-f(x)}{h} - f'(x) & h \neq 0 \\ 0 & h = 0, \end{cases} \]
we see that the definition of derivative implies that $\varphi(h)$ is a continuous function at $0$, while it is clearly a continuous function elsewhere (as the differentiable function $f$ is continuous). Moreover we have the equation
\[ f(x+h) = f(x) + f'(x)h + h\varphi(h). \]
We can now generalize this idea to obtain a more general definition of derivative. Notice that we need the domain and range of $f$ to be normed vector spaces: otherwise we can't add (if we don't have a vector space) or talk about continuity (if we don't have norms).
Definition 1.1. Let $X$ and $Y$ be normed vector spaces, and $U \subseteq X$ open, $f : U \to Y$. We say $f$ is differentiable at $x \in U$ if there exists a bounded linear map $Df(x) \in L(X, Y)$ (we use the notation $L(X, Y)$ for the space of bounded linear maps from $X$ to $Y$) and a continuous function $\varphi : V \to Y$, where $V$ is an open neighbourhood of $0 \in X$, with $\varphi(0) = 0$, such that
\[ f(x+h) = f(x) + (Df(x))h + \|h\|\varphi(h) \]
for all $h \in V$. (Note $V$ must be chosen such that $x + V = \{x + v : v \in V\} \subseteq U$.)
We can also relate this to the original limit definition of $f'$, by the following lemma.
Lemma 1.2. If $f : U \to Y$ is differentiable at $x$, then for all $h \in X$ we have
\[ Df(x)h = \lim_{t \to 0} \frac{f(x+th) - f(x)}{t}, \tag{1} \]
where $t$ is chosen in $\mathbb{R}$.
Proof. Let us assume $t > 0$ (for $t < 0$ the argument is the same). By definition of derivative we have for $t$ small enough (so $th \in V$)
\[ f(x+th) = f(x) + (Df(x))(th) + \|th\|\varphi(th), \]
or by rearranging and using linearity
\[ (Df(x))h = \frac{f(x+th) - f(x)}{t} - \|h\|\varphi(th). \]
Now we can take the limit as $t \to 0$ on both sides (as the left hand side is constant, it has a limit) to get the desired result (note $\lim_{t \to 0} \|h\|\varphi(th) = \|h\|\varphi(0) = 0$ as $\varphi$ is continuous). $\square$
It is not true however that if the limit (1) exists for all $h$ then the function $f$ is differentiable. Indeed consider $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = \frac{xy}{\sqrt{x^2 + y^2}}$ and $f(0, 0) = 0$. Then $f$ is continuous and for $h = (h_1, h_2)$ we see
\[ \lim_{t \to 0} \frac{f(0 + (h_1, h_2)t) - f(0)}{t} = \lim_{t \to 0} \frac{h_1 h_2 t^2}{t \sqrt{h_1^2 t^2 + h_2^2 t^2}} = \lim_{t \to 0} \frac{h_1 h_2}{\sqrt{h_1^2 + h_2^2}} = f(h_1, h_2), \]
but $f$ is not differentiable, as this candidate derivative is not a linear function of $h$.
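As a quick numerical illustration (a sketch, not part of the original notes; numpy and the helper names are our own), one can check that the limit (1) exists in every direction for this $f$ but fails additivity in $h$, so it cannot come from a linear map $Df(0)$:

```python
import numpy as np

def f(x, y):
    # The counterexample: f(x, y) = xy / sqrt(x^2 + y^2), with f(0, 0) = 0.
    r = np.hypot(x, y)
    return 0.0 if r == 0 else x * y / r

def limit_quotient(h1, h2, t=1e-9):
    # The difference quotient (f(th) - f(0)) / t from Lemma 1.2.
    return (f(t * h1, t * h2) - f(0.0, 0.0)) / t

print(limit_quotient(1, 0))  # 0.0
print(limit_quotient(0, 1))  # 0.0
print(limit_quotient(1, 1))  # ~0.7071 = f(1, 1), not 0.0 + 0.0, so the map
                             # h -> (directional derivative) is not additive
```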
Corollary 1.3. If $Df$ exists at $x \in U$ then $Df(x)$ is unique. Moreover, in this case $f$ is continuous at $x$.
Proof. $Df(x)$ is unique as we can give an explicit formula for it (namely (1)). $f$ is continuous as
\[ \lim_{h \to 0} f(x+h) = \lim_{h \to 0} \big( f(x) + (Df(x))h + \|h\|\varphi(h) \big) = f(x). \qquad \square \]
Definition 1.4. We say $f : U \to Y$ is differentiable if for all $x \in U$ the function $f$ is differentiable at $x$. In this case we can view $Df$ as a map $Df : U \to L(X, Y)$. If $Df$ is a continuous function we say $f \in C^1$.
Let us consider some examples:
- $f : \mathbb{R} \to \mathbb{R}$: In this case the derivative as defined here is identical to the usual derivative.
- $f : \mathbb{R}^n \to \mathbb{R}$: In this case the derivative is given by a linear map from $\mathbb{R}^n$ to $\mathbb{R}$. All linear maps $v \in L(\mathbb{R}^n, \mathbb{R})$ can be written as the inner product with the vector $(v e_1, v e_2, \ldots, v e_n)$, where the $e_j$ (the vector with all zeros and a single $1$ at the $j$th coordinate) form the standard basis of $\mathbb{R}^n$. Now $Df(x)e_j = \frac{\partial f}{\partial x_j}(x)$ by the limit expression, so $Df(x)$ is the linear map defined by taking the inner product with the gradient. Note that the gradient may exist while $Df$ does not, as in the case of $f(x, y) = xy/\sqrt{x^2 + y^2}$ at the origin.
- $f : \mathbb{R}^n \to \mathbb{R}^m$: Let us write $f = (f_1, f_2, \ldots, f_m)$, where $f_j = \pi_j \circ f$ is the projection of $f$ on the $j$th coordinate. Now we can write $Df(x)$ as an $m \times n$ matrix, with coefficients $\left(\frac{\partial f_i}{\partial x_j}\right)$, i.e.
\[ Df(x) = J = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix} \]
We call this matrix the Jacobian matrix (for $m = n$, its determinant occurs in the formula for the change of variables for multivariate integrals).
The function $f \in C^1$ if and only if $\partial f_i / \partial x_j$ is continuous for all $i$ and $j$. Indeed if $f \in C^1$, then $Df$ is a continuous map to $L(\mathbb{R}^n, \mathbb{R}^m)$, which is homeomorphic to $\mathbb{R}^{nm}$ (as all normed vector spaces of equal finite dimension are homeomorphic). And we have seen that $g = (g_1, \ldots, g_t) : M \to \mathbb{R}^t$ is continuous if and only if the $g_r$ are continuous for all $r$; thus in this case we see that $Df$ is continuous if and only if the entries $\partial f_i / \partial x_j$ of the Jacobian matrix are continuous.
The statement also includes the claim that $f$ is differentiable if all partial derivatives are continuous. This follows from the remark that the mean value theorem (below) is actually still valid with $Df$ replaced by the matrix of partial derivatives if the partial derivatives are continuous, so this matrix then satisfies the definition of the derivative.
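As a concrete illustration (a sketch, not part of the notes; the helper `numerical_jacobian`, the step size and the example function are all assumptions), each column of the Jacobian can be approximated by the limit of Lemma 1.2 in a coordinate direction:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    # Approximate the m x n Jacobian column by column with forward
    # differences, i.e. the limit of Lemma 1.2 with h = e_j and t = eps.
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    cols = [(np.asarray(f(x + eps * e), dtype=float) - fx) / eps
            for e in np.eye(x.size)]
    return np.column_stack(cols)

# Example: f(x1, x2) = (x1*x2, x1 + x2^2) has Jacobian [[x2, x1], [1, 2*x2]].
f = lambda v: np.array([v[0] * v[1], v[0] + v[1] ** 2])
print(numerical_jacobian(f, [2.0, 3.0]))  # approximately [[3, 2], [1, 6]]
```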
2. Basic Properties
Several of the properties of the ordinary derivative generalize. Indeed we have
Proposition 2.1. Let $U \subseteq X$ be open, and $f, g : U \to Y$ be differentiable at $x \in U$. Then $f + g$ is also differentiable at $x$ and $D(f+g)(x) = Df(x) + Dg(x)$. Moreover if $c \in \mathbb{R}$ then $cf$ is differentiable at $x$ and $D(cf)(x) = c(Df(x))$.
Proof. The proof is left as an exercise to the reader. $\square$
As a consequence of this proposition we see that the space of differentiable functions is a linear subspace of the space of continuous functions, and that $D$ is a linear map from the space of differentiable functions to $L(X, Y)$. The natural question whether $D$ is a bounded map depends on the norm on the space of differentiable functions (this is an infinite dimensional space, so there might exist inequivalent norms). If we use the $\|\cdot\|_\infty$ norm, $D$ is in general not bounded (for example the derivative of $\sin(x^2)$ is unbounded). However there exist other norms with respect to which $D$ is bounded (though this often involves looking at subspaces of the space of differentiable functions).
The product rule (Leibniz rule) also generalizes.
Theorem 2.2. Let $f_1, f_2 : U \to \mathbb{R}$ be two maps differentiable at $x \in U$. Then the function $f_1 f_2 : U \to \mathbb{R}$ is also differentiable at $x$ and
\[ D(f_1 f_2)(x) = f_2(x)(Df_1)(x) + f_1(x)(Df_2)(x). \]
Note the right hand side is indeed a linear map, where we regard $f_1(x)$ and $f_2(x)$ as scalar multiplications.
Proof. Define the functions $\varphi_j$ by the equation
\[ f_j(x+h) = f_j(x) + (Df_j(x))h + \|h\|\varphi_j(h). \]
In particular we see that the $\varphi_j$ are continuous functions with $\varphi_j(0) = 0$.
We consider the following identity, which follows by expanding the different terms:
\begin{align*}
f_1(x+h)f_2(x+h) &- f_1(x)f_2(x) - f_2(x)(Df_1)(x)h - f_1(x)(Df_2)(x)h \\
&= f_1(x+h)\,\big[f_2(x+h) - f_2(x) - Df_2(x)h - \|h\|\varphi_2(h)\big] \\
&\quad + \big(f_2(x) + Df_2(x)h\big)\,\big[f_1(x+h) - f_1(x) - Df_1(x)h - \|h\|\varphi_1(h)\big] \\
&\quad + \|h\|\big(\varphi_2(h)f_1(x+h) + \varphi_1(h)f_2(x) + \varphi_1(h)Df_2(x)h\big) \\
&\quad + (Df_1(x)h)(Df_2(x)h).
\end{align*}
On the right hand side we find that the first two terms vanish, while the remaining part is $\|h\|$ times a continuous function which is zero at zero (in particular $|(Df_1(x)h)(Df_2(x)h)| \leq \|Df_1\|\|Df_2\|\|h\|^2$ is $\|h\|$ times a continuous function which vanishes at zero).
Thus the proposed derivative satisfies the definition of derivative. $\square$
And of course the chain rule also still holds.
Theorem 2.3. Let $f : U \to V$ and $g : V \to Z$ (with $U \subseteq X$ and $V \subseteq Y$) be differentiable at $x$, respectively $f(x)$. Then $g \circ f$ is differentiable at $x$ and
\[ D(g \circ f)(x) = Dg(f(x)) \circ Df(x). \]
Proof. The proof is along similar lines as the proof of the product rule. Let $\varphi_f$ and $\varphi_g$ be the $\varphi$'s in the definition of derivative of $f$, respectively $g$. Thus we have
\begin{align*}
(g \circ f)(x+h) - (g \circ f)(x) &- (Dg(f(x)) \circ Df(x))h \\
&= Dg(f(x))\big(f(x+h) - f(x) - Df(x)h\big) + \|f(x+h) - f(x)\|\,\varphi_g(f(x+h) - f(x)) \\
&= \|h\|\,Dg(f(x))\varphi_f(h) + \big\|Df(x)h + \|h\|\varphi_f(h)\big\|\,\varphi_g(f(x+h) - f(x)),
\end{align*}
where in the first step we used the definition of derivative for $g$ and the fact that we can make $f(x+h) - f(x)$ arbitrarily small by choosing $h$ small enough, and in the second step we used the definition of derivative of $f$. By linearity of $Dg(f(x))$ and the fact that $Df(x)$ is a bounded linear operator we see that the remaining part is $\|h\|$ times a continuous function which vanishes at $h = 0$. In particular we see that the proposed derivative indeed satisfies the criteria of the definition. $\square$
As a corollary we obtain that differentiation commutes with composition with linear maps.
Corollary 2.4. Let $f : U \to Y$ be differentiable at $x$, and $\lambda : Y \to Z$ be a bounded linear map. Then
\[ D(\lambda \circ f)(x) = \lambda \circ Df(x). \]
Proof. Notice that $D\lambda(y) = \lambda$ (for all $y \in Y$) as
\[ \lambda(y+h) - \lambda(y) = \lambda h, \]
so $\lambda$ satisfies the conditions of the definition of derivative (with $\varphi(y) = 0$). The corollary now follows by applying the chain rule. $\square$
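The chain rule lends itself to the same kind of finite-difference sanity check (a sketch with assumed example maps, not part of the notes): the Jacobian of $g \circ f$ should match the product of the Jacobians.

```python
import numpy as np

def jac(f, x, eps=1e-6):
    # Forward-difference Jacobian, as in the Section 1 sketch.
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    return np.column_stack([(np.asarray(f(x + eps * e)) - fx) / eps
                            for e in np.eye(x.size)])

f = lambda v: np.array([np.sin(v[0]), v[0] * v[1]])  # f : R^2 -> R^2
g = lambda w: np.array([w[0] ** 2 + w[1]])           # g : R^2 -> R
x = np.array([0.5, -1.0])

lhs = jac(lambda v: g(f(v)), x)          # D(g o f)(x)
rhs = jac(g, f(x)) @ jac(f, x)           # Dg(f(x)) Df(x)
print(np.allclose(lhs, rhs, atol=1e-4))  # True
```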
3. Mean Value theorem
From now on we will assume that $Y$ is a Banach space, as this will guarantee the existence of certain integrals. We need a few properties of integration, which we will not prove.
- Let $f : [a, b] \to V$ be a continuous function from a closed bounded interval in $\mathbb{R}$ to a Banach space $V$; then the integral $\int_a^b f(t)\,dt$ exists. Indeed integration becomes a bounded linear map $\int_a^b \in L(C([a, b], V), V)$ with norm $\|\int_a^b\| = |b - a|$. This last sentence just says that $\int (f + g) = \int f + \int g$, $\int cf = c \int f$ and $\|\int_a^b f(t)\,dt\| \leq |b - a| \|f\|_\infty$.
- If $V = L(X, Y)$ for some vector spaces $X$ and $Y$ (and $Y$ a Banach space, otherwise $V$ is not a Banach space) then for all $x \in X$ we have
\[ \int_a^b (f(t)(x))\,dt = \left( \int_a^b f(t)\,dt \right)(x). \]
This property holds for finite sums (indeed, if $F_i \in L(X, Y)$ then the expression $\sum_i (F_i x) = (\sum_i F_i)x$ is the definition of $\sum_i F_i$), and it is preserved in the limiting process which defines integration.
- (The fundamental theorem of calculus) If $g : [a, b] \to V$ is continuously differentiable then
\[ \int_a^b Dg(t)\,dt = g(b) - g(a). \]
The mean value theorem for functions $f : \mathbb{R} \to \mathbb{R}$ says that if $x, y \in \mathbb{R}$ then there exists a $\xi$ between $x$ and $x + y$ such that
\[ f(x+y) - f(x) = f'(\xi)y. \]
The following theorem is a generalization of this theorem.
Theorem 3.1. Let $U \subseteq X$ be open, $f : U \to Y$ a $C^1$ function, where $Y$ is a Banach space. Assume the line segment $\{x + ty : t \in [0, 1]\} \subseteq U$; then
\[ f(x+y) - f(x) = \left( \int_0^1 Df(x+ty)\,dt \right)(y). \]
The pretty expression $f'(\xi)$ has been replaced by an integral. In the univariate case we could use the intermediate value theorem to show this integral equals the derivative at one point of the segment. However in the multivariate case this is no longer possible, hence the ugly expression. However, for many applications it does not matter, as the integral is as easily bounded as $f'(\xi)$.
Proof. Let $g(t) := f(x + ty)$; then $g$ is a continuously differentiable function $g : [0, 1] \to Y$. We get
\[ g(1) - g(0) = \int_0^1 Dg(t)\,dt = \int_0^1 Df(x+ty)(y)\,dt = \left( \int_0^1 Df(x+ty)\,dt \right)(y), \]
where in the first and third step we used two of the properties of integration, and in the second step we used the chain rule for differentiation. $\square$
The following corollaries look much nicer (though remember the original theorem, as it is more general and you sometimes need that generality).
Corollary 3.2. Under the same conditions as in the theorem we have
\[ \|f(x+y) - f(x)\| \leq \|y\| \sup_{v \in \text{line}} \|Df(v)\|, \]
where $\text{line} = \{x + ty : t \in [0, 1]\}$ is the line segment between $x$ and $x + y$.
Proof. This follows from the mean value theorem by observing that integration is a bounded operator. Indeed we get
\[ \|f(x+y) - f(x)\| = \left\| \left( \int_0^1 Df(x+ty)\,dt \right)(y) \right\| \leq \left\| \int_0^1 Df(x+ty)\,dt \right\| \|y\| \leq \sup_{t \in [0,1]} \|Df(x+ty)\| \, \|y\|. \]
Here the first inequality is just using the definition of operator norm, while the final inequality follows from the fact that $\int_0^1$ is a bounded operator of norm $1$. $\square$
The following corollary might sometimes be useful, especially if you plug in $L = Df(x)$.
Corollary 3.3. With conditions as in the mean value theorem, let $L : X \to Y$ be a bounded linear map. Then
\[ \|f(x+y) - f(x) - Ly\| \leq \|y\| \sup_{v \in \text{line}} \|Df(v) - L\|. \]
Proof. Apply the previous corollary to $g(x) = f(x) - Lx$. $\square$
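Corollary 3.2 is also easy to probe numerically. The sketch below (the example function and the sampling of the supremum are assumptions, not from the notes) compares $\|f(x+y) - f(x)\|$ with $\|y\|$ times a sampled supremum of the operator norm of $Df$ along the segment:

```python
import numpy as np

def jac(f, x, eps=1e-6):
    # Forward-difference Jacobian as before.
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    return np.column_stack([(np.asarray(f(x + eps * e)) - fx) / eps
                            for e in np.eye(x.size)])

f = lambda v: np.array([np.sin(v[0]) * v[1], np.exp(-v[0] ** 2) + v[1] ** 2])
x, y = np.array([0.2, 1.0]), np.array([0.7, -0.5])

lhs = np.linalg.norm(f(x + y) - f(x))
# Sample the operator (spectral) norm of Df along the segment x + t*y.
sup = max(np.linalg.norm(jac(f, x + t * y), 2) for t in np.linspace(0, 1, 101))
print(lhs <= np.linalg.norm(y) * sup)  # True: the bound of Corollary 3.2
```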
4. Higher derivatives
Recall that for a $C^1$ function $f$, $Df : U \to L(X, Y)$ is some continuous function. Hence it is natural to consider whether we can differentiate this function as well, and, if so, what we can say about the derivatives. The range of $D(Df)$ would be $L(X, L(X, Y))$, so let us first study this space, and its higher generalizations, a bit.
Definition 4.1. We define the space of $n$-multilinear bounded functions $L^n(X, Y)$ by $L^1(X, Y) = L(X, Y)$ and $L^n(X, Y) = L(X, L^{n-1}(X, Y))$ for $n \geq 2$. (Do not confuse $L^n(X, Y)$ with $L^p$-spaces, which are spaces of functions with a $p$-norm on them; those will not be considered in this class.)
For elements $\lambda \in L^n(X, Y)$ we write
\[ \lambda(v_1)(v_2) \cdots (v_n) = \lambda(v_1, v_2, \ldots, v_n). \]
Notice that the elements of $L^n$ are multilinear in the sense that for each index $1 \leq k \leq n$ and $\lambda \in L^n(X, Y)$ we have
\[ \lambda(v_1, \ldots, v_{k-1}, v_k + v'_k, v_{k+1}, \ldots, v_n) = \lambda(v_1, \ldots, v_k, \ldots, v_n) + \lambda(v_1, \ldots, v'_k, \ldots, v_n) \]
and for each $c \in \mathbb{R}$ we have
\[ \lambda(v_1, \ldots, v_{k-1}, c v_k, v_{k+1}, \ldots, v_n) = c\,\lambda(v_1, \ldots, v_k, \ldots, v_n). \]
We define the norm of a multilinear operator as
\[ \|\lambda\| = \sup_{\|y_1\| = \cdots = \|y_n\| = 1} \|\lambda(y_1, \ldots, y_n)\|; \]
this coincides with the norm we would have gotten if we just used the usual operator norm on $L(X, L^{n-1}(X, Y))$.
Definition 4.2. For $\lambda \in L^n(X, Y)$ we get $\lambda(v_1, \ldots, v_r) \in L^{n-r}(X, Y)$, via
\[ \lambda(v_1, \ldots, v_r)(v_{r+1}, \ldots, v_n) = \lambda(v_1, \ldots, v_n). \]
We say that a multilinear function $\lambda \in L^n(X, Y)$ is symmetric if for all permutations $\sigma \in S_n$ we have
\[ \lambda(v_1, \ldots, v_n) = \lambda(v_{\sigma(1)}, \ldots, v_{\sigma(n)}). \]
Let us give some examples of multilinear maps:
- Let us consider $L^n(\mathbb{R}, \mathbb{R})$. Linear maps $\lambda \in L(\mathbb{R}, \mathbb{R})$ are just of the form $\lambda(y) = cy$ for some constant $c$. Bilinear maps are maps such that $\lambda(x) = (y \mapsto cy)$ for some $c$ which depends on $x$; as this map must be linear in $x$, we get that $L^2(\mathbb{R}, \mathbb{R})$ consists of the maps of the form $\lambda(x, y) = cxy$ for some $c \in \mathbb{R}$. Continuing in this way we see the elements of $L^n(\mathbb{R}, \mathbb{R})$ are all of the form $\lambda(x_1, \ldots, x_n) = c\,x_1 \cdots x_n$ for some constant $c \in \mathbb{R}$.
- If we consider the $n$-dimensional determinant function as a function of the $n$ column vectors of a matrix, it is an $n$-linear function. In fact it is an anti-symmetric $n$-linear function.
In the case of $L^2(X, Y)$ a bilinear function is symmetric if $\lambda(v, w) = \lambda(w, v)$.
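Multilinearity of the determinant in a single column is easy to verify numerically (a quick sketch; the random test vectors are of course assumptions):

```python
import numpy as np

# det is linear in its first column:
#   det(v + c*w | a | b) = det(v | a | b) + c * det(w | a | b).
rng = np.random.default_rng(0)
a, b, v, w = rng.standard_normal((4, 3))
c = 2.5

lhs = np.linalg.det(np.column_stack([v + c * w, a, b]))
rhs = (np.linalg.det(np.column_stack([v, a, b]))
       + c * np.linalg.det(np.column_stack([w, a, b])))
print(np.isclose(lhs, rhs))  # True
```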
We get the following important theorem.
Theorem 4.3. Let $U$ be open in $X$ and $f : U \to Y$ be twice continuously differentiable (i.e. $f \in C^1$ and $Df \in C^1$ as well). Then for each $x \in U$, the bilinear map $D^2 f = D(D(f))$ is symmetric.
Proof. Let $x \in U$ and choose $r > 0$ such that $B_r(x) \subseteq U$. Let $v, w \in B_{r/2}(0) \subseteq X$ and set $g(x) = f(x + v) - f(x)$. Then
\[ Dg(x) = Df(x+v) - Df(x) = \left( \int_0^1 D^2 f(x+vt)\,dt \right) v. \]
Now we have
\begin{align*}
f(x+v+w) &- f(x+w) - f(x+v) + f(x) - D^2 f(x)(v, w) \\
&= g(x+w) - g(x) - D^2 f(x)(v, w) \\
&= \left( \int_0^1 Dg(x+tw)\,dt \right) w - D^2 f(x)(v, w) \\
&= \left( \int_0^1 \left( \int_0^1 D^2 f(x + wt + vs)\,ds \right) v\,dt \right) w - D^2 f(x)(v, w) \\
&= \left( \int_0^1 \left( \int_0^1 \big( D^2 f(x + wt + vs) - D^2 f(x) \big)\,ds \right) v\,dt \right) w.
\end{align*}
Thus we obtain the bound
\[ \|f(x+v+w) - f(x+w) - f(x+v) + f(x) - D^2 f(x)(v, w)\| \leq \|v\| \|w\| \sup_{s, t \in [0,1]} \|D^2 f(x + wt + vs) - D^2 f(x)\|. \]
In the same vein we get
\[ \|f(x+v+w) - f(x+w) - f(x+v) + f(x) - D^2 f(x)(w, v)\| \leq \|v\| \|w\| \sup_{s, t \in [0,1]} \|D^2 f(x + wt + vs) - D^2 f(x)\|. \]
Using the triangle inequality we thus find
\[ \|D^2 f(x)(v, w) - D^2 f(x)(w, v)\| \leq 2 \|v\| \|w\| \sup_{s, t \in [0,1]} \|D^2 f(x + wt + vs) - D^2 f(x)\|. \]
The result now follows from the following lemma. $\square$
Lemma 4.4. Let $\lambda \in L^2(X, Y)$ and assume there exists a continuous function $\varphi$ defined on an open neighbourhood $V$ of $(0, 0) \in X \times X$, such that $\varphi(0, 0) = 0$ and such that for all $(v, w) \in V$
\[ \|\lambda(v, w)\| \leq \|v\| \|w\| \|\varphi(v, w)\|; \]
then $\lambda = 0$.
Proof. Indeed, we find for any $v, w \in X$ that (for small enough $s \in \mathbb{R}_{>0}$)
\[ \|\lambda(v, w)\| = \frac{1}{s^2} \|\lambda(sv, sw)\| \leq \frac{1}{s^2} \|sv\| \|sw\| \|\varphi(sv, sw)\| = \|v\| \|w\| \|\varphi(sv, sw)\|. \]
Taking the limit $s \to 0$ we obtain that $\|\lambda(v, w)\| = 0$, so $\lambda = 0$ identically. $\square$
Let us now consider what this means for a function $f : U \to \mathbb{R}$ with $U \subseteq \mathbb{R}^n$. Thus $Df : U \to L(\mathbb{R}^n, \mathbb{R})$. As indicated in the previous section there is an identification of $L(\mathbb{R}^n, \mathbb{R})$ with $\mathbb{R}^n$, via $v \in \mathbb{R}^n \mapsto \langle v, \cdot \rangle \in L(\mathbb{R}^n, \mathbb{R})$, with inverse $\lambda \in L(\mathbb{R}^n, \mathbb{R}) \mapsto (\lambda(e_1), \ldots, \lambda(e_n)) \in \mathbb{R}^n$. Under this identification we get $Df = \operatorname{grad}(f)$. But we also saw that we could identify the derivative of a function $U \to \mathbb{R}^n$ with an $n \times n$ matrix. Indeed in this case we get the matrix
\[ D^2 f = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_1} \\ \frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_n} & \frac{\partial^2 f}{\partial x_2 \partial x_n} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix}. \]
This matrix should be considered an element of $L^2(\mathbb{R}^n, \mathbb{R})$ via $M(x_1, x_2) = x_1^T M x_2$, for $x_1, x_2 \in \mathbb{R}^n$ column vectors. This matrix is called the Hessian matrix (of $f$ at $x$). The symmetry of the second derivative then corresponds to the symmetry of this matrix, i.e. $D^2 f = (D^2 f)^T$, which corresponds to the commutativity of the partial derivatives (i.e. $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$). And the commutativity of partial derivatives, as long as the second derivatives are continuous, is a well-known fact.
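A quick finite-difference check of this symmetry (a sketch; the example function and step size are assumptions, not from the notes):

```python
import numpy as np

f = lambda x, y: np.sin(x * y) + x ** 3 * y  # an assumed smooth example
h = 1e-4

def d2f_xy(x, y):
    # d/dx (df/dy) by central differences
    dfdy = lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdy(x + h, y) - dfdy(x - h, y)) / (2 * h)

def d2f_yx(x, y):
    # d/dy (df/dx) by central differences
    dfdx = lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (dfdx(x, y + h) - dfdx(x, y - h)) / (2 * h)

print(d2f_xy(0.5, 1.2), d2f_yx(0.5, 1.2))  # the two mixed partials agree
```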
Let us now consider higher order derivatives.
Definition 4.5. We say a function $f : U \to Y$ is $n$ times continuously differentiable if $f$ is differentiable and $Df$ is $(n-1)$ times continuously differentiable. We write $f \in C^n$. Note that $D^n f : U \to L^n(X, Y)$.
Note that by definition (of how we write the arguments of multilinear maps, and the inductive definition of $D^n$) we have
\[ D^{p+q} f(x)(v_1, v_2, \ldots, v_{p+q}) = D^p D^q f(x)(v_1, \ldots, v_p)(v_{p+1}, \ldots, v_{p+q}). \]
We can also, given the vectors $v_{p+1}, \ldots, v_{p+q} \in X$, consider the function $g(\cdot) = D^q f(\cdot)(v_{p+1}, \ldots, v_{p+q})$, which is a function from $U$ to $Y$ (as $D^q f(\cdot)$ is a function $U \to L^q(X, Y)$, so that if we plug in $q$ vectors we get a result in $Y$). And we can differentiate this function again. So we get $D^p(D^q f(x)(v_{p+1}, \ldots, v_{p+q})) \in L^p(X, Y)$.
The following lemma relates these two ways of calculating derivatives.
Lemma 4.6. Let $f \in C^{p+q}$ and $v_j \in X$. Then
\[ D^{p+q} f(x)(v_1, \ldots, v_{p+q}) = D^p D^q f(x)(v_1, \ldots, v_p)(v_{p+1}, \ldots, v_{p+q}) = D^p(D^q f(x)(v_{p+1}, \ldots, v_{p+q}))(v_1, \ldots, v_p). \]
Proof. We will prove this for $p = 1$; for higher $p$ it follows by induction. Define $g : U \to Y$ via $g(x) = D^q f(x)(v_2, \ldots, v_{q+1})$; then $g$ is the composition of two maps, $D^q f : U \to L^q(X, Y)$ and $\epsilon : L^q(X, Y) \to Y$, where $\epsilon$ is the evaluation at the points $v_2, \ldots, v_{q+1}$. Now by the chain rule we have $Dg(x) = D\epsilon(D^q f(x)) \circ D(D^q f)(x)$. A quick calculation shows $\epsilon$ is a linear map, so $D\epsilon = \epsilon$. The resulting formula is exactly the one we need. $\square$
Theorem 4.7. Let $U$ be open in $X$ and $f : U \to Y$ be $C^n$. Then for each $x \in U$, the multilinear map $D^n f$ is symmetric.
Proof. The proof is by induction, where as base case we take the case $n = 2$ (which was the previous theorem).
Let $g = D^{n-2} f(\cdot)(v_3, \ldots, v_n)$; then $g \in C^2$, so we find $D^2 g(v_1, v_2) = D^2 g(v_2, v_1)$. Therefore we find
\[ D^n f(v_1, v_2, v_3, \ldots, v_n) = D^2 g(v_1, v_2) = D^2 g(v_2, v_1) = D^n f(v_2, v_1, v_3, \ldots, v_n). \]
Moreover for any permutation $\sigma \in S_{n-1}$ (acting on $\{2, 3, \ldots, n\}$) we find that
\[ D^{n-1} f(v_2, \ldots, v_n) = D^{n-1} f(v_{\sigma(2)}, \ldots, v_{\sigma(n)}), \]
thus
\[ D^n f(v_1, v_2, \ldots, v_n) = D(D^{n-1} f)(v_1)(v_2, \ldots, v_n) = D(D^{n-1} f)(v_1)(v_{\sigma(2)}, \ldots, v_{\sigma(n)}) = D^n f(v_1, v_{\sigma(2)}, \ldots, v_{\sigma(n)}). \]
The two kinds of permutations we have now handled (interchanging $1$ and $2$, and all permutations fixing $1$) generate the full symmetric group $S_n$, so the theorem is proven. $\square$
5. Taylor's theorem
Taylor's theorem gives an approximation of a function $f$ by looking at a very simple function with the same derivatives as $f$ at some point. For functions $f : \mathbb{R} \to \mathbb{R}$ these very simple functions are polynomials. For functions $f : \mathbb{R}^n \to \mathbb{R}$ these functions will be multivariate polynomials.
Theorem 5.1. Let $U \subseteq X$ be open, and $f : U \to Y$ be of class $C^n$. Let $x \in U$ and $y \in X$ such that the line $\{x + ty : t \in [0, 1]\}$ is completely contained in $U$. Denote by $y^{(k)}$ the $k$-tuple $(y, y, \ldots, y)$. Then
\[ f(x+y) = \frac{f(x)}{0!} + \frac{Df(x)y}{1!} + \frac{D^2 f(x)y^{(2)}}{2!} + \cdots + \frac{D^{n-1} f(x)y^{(n-1)}}{(n-1)!} + R_n, \]
where the remainder term $R_n$ is given by
\[ R_n = \left( \int_0^1 \frac{(1-t)^{n-1}}{(n-1)!}\, D^n f(x+ty)\,dt \right) y^{(n)}. \]
Note that the zeroth order ($n = 1$) Taylor approximation is just the mean value theorem, and the first order ($n = 2$) Taylor approximation is closely related to the definition of derivative (only the remainder term is written differently).
Proof. We prove the theorem by induction on $n$. The base case $n = 1$ is Theorem 3.1. Now suppose the theorem holds for $n = p - 1$, and suppose that $f$ is of class $C^p$. Then certainly $f$ is of class $C^{p-1}$, so we get
\[ f(x+y) = \frac{f(x)}{0!} + \frac{Df(x)y}{1!} + \frac{D^2 f(x)y^{(2)}}{2!} + \cdots + \frac{D^{p-2} f(x)y^{(p-2)}}{(p-2)!} + R_{p-1}, \]
where
\[ R_{p-1} = \left( \int_0^1 \frac{(1-t)^{p-2}}{(p-2)!}\, D^{p-1} f(x+ty)\,dt \right) y^{(p-1)}. \]
Now we can rewrite the remainder term as
\begin{align*}
R_{p-1} &= \left( \int_0^1 \frac{(1-t)^{p-2}}{(p-2)!}\, D^{p-1} f(x)\,dt \right) y^{(p-1)} + \left( \int_0^1 \frac{(1-t)^{p-2}}{(p-2)!} \big( D^{p-1} f(x+ty) - D^{p-1} f(x) \big)\,dt \right) y^{(p-1)} \\
&= \frac{D^{p-1} f(x)y^{(p-1)}}{(p-1)!} + \left( \int_0^1 \frac{(1-t)^{p-2}}{(p-2)!} \big( D^{p-1} f(x+ty) - D^{p-1} f(x) \big)\,dt \right) y^{(p-1)}.
\end{align*}
Now we apply the mean value theorem to the remaining integral to see
\[ R_{p-1} = \frac{D^{p-1} f(x)y^{(p-1)}}{(p-1)!} + \left( \int_0^1 \frac{(1-t)^{p-2}}{(p-2)!} \left( \int_0^t D^p f(x+uy)\,y\,du \right) dt \right) y^{(p-1)}. \]
Interchanging the $u$ and $t$ integrals (which is allowed as we are integrating a continuous function over a bounded compact set, namely the triangle determined by the possible $u$ and $t$ values), and calculating the $t$ integral, now gives
\begin{align*}
R_{p-1} &= \frac{D^{p-1} f(x)y^{(p-1)}}{(p-1)!} + \left( \int_0^1 \int_u^1 \frac{(1-t)^{p-2}}{(p-2)!}\, D^p f(x+uy)\,dt\,du \right) y^{(p)} \\
&= \frac{D^{p-1} f(x)y^{(p-1)}}{(p-1)!} + \left( \int_0^1 \frac{(1-u)^{p-1}}{(p-1)!}\, D^p f(x+uy)\,du \right) y^{(p)},
\end{align*}
as desired. $\square$
We can rewrite the error term in Taylor's theorem to get an even better approximation (i.e. one term more).
Corollary 5.2. Notation as in Taylor's theorem above. We have the formula
\[ f(x+y) = \frac{f(x)}{0!} + \frac{Df(x)y}{1!} + \frac{D^2 f(x)y^{(2)}}{2!} + \cdots + \frac{D^n f(x)y^{(n)}}{n!} + \epsilon_n(y), \]
where the remainder term $\epsilon_n(y)$ is bounded by
\[ \|\epsilon_n(y)\| \leq \frac{\sup_{0 \leq t \leq 1} \|D^n f(x+ty) - D^n f(x)\|}{n!} \|y\|^n, \]
and in particular we have
\[ \lim_{y \to 0} \frac{\|\epsilon_n(y)\|}{\|y\|^n} = 0. \]
Proof. We note that the calculations above show
\[ \epsilon_n(y) = \left( \int_0^1 \frac{(1-t)^{n-1}}{(n-1)!} \big( D^n f(x+ty) - D^n f(x) \big)\,dt \right) y^{(n)}; \]
the first bound then follows from the estimate
\[ \|\epsilon_n(y)\| \leq \left( \int_0^1 \frac{(1-t)^{n-1}}{(n-1)!}\,dt \right) \sup_{t \in [0,1]} \|D^n f(x+ty) - D^n f(x)\| \, \|y\|^n \]
together with $\int_0^1 \frac{(1-t)^{n-1}}{(n-1)!}\,dt = \frac{1}{n!}$. By continuity of $D^n f$ we then see that the limit expression for $y \to 0$ holds. $\square$
Let us now consider what it means for functions $f : \mathbb{R}^n \to \mathbb{R}$. First of all, we can generalize arguments used before to see that
\[ D^p f(x)(e_{i_1}, e_{i_2}, \ldots, e_{i_p}) = \frac{\partial}{\partial x_{i_1}} \frac{\partial}{\partial x_{i_2}} \cdots \frac{\partial}{\partial x_{i_p}} f(x), \]
where the $e_j$ denote the standard basis vectors of $\mathbb{R}^n$. By multilinearity we therefore see that
\[ D^p f(x)(y_1, y_2, \ldots, y_p) = \sum_{(i_1, \ldots, i_p) \in \{1, \ldots, n\}^p} y_{1 i_1} y_{2 i_2} \cdots y_{p i_p} \frac{\partial}{\partial x_{i_1}} \frac{\partial}{\partial x_{i_2}} \cdots \frac{\partial}{\partial x_{i_p}} f(x) \]
(here $y_k$ is the vector in $\mathbb{R}^n$ with coordinates $y_k = (y_{k1}, y_{k2}, \ldots, y_{kn})$). So it seems that the different terms in Taylor's formula are sums of $n^p$ terms. However we can use the symmetry of the derivatives to see that several of the terms are identical (at least if $y_1 = y_2 = \cdots = y_p$). Indeed the value of $y_{1 i_1} y_{2 i_2} \cdots y_{p i_p} \frac{\partial}{\partial x_{i_1}} \cdots \frac{\partial}{\partial x_{i_p}} f(x)$ only depends on the number of $1$s, $2$s, etc. in the vector $(i_1, \ldots, i_p)$, not on their order in that case. If we define
Definition 5.3. We define $\binom{p}{k_1\,k_2\,\cdots\,k_n} = p!/(k_1!\,k_2!\cdots k_n!)$, which is the number of different ways to put $k_1$ balls of color $1$, $k_2$ balls of color $2$, etc. (with $p = k_1 + \cdots + k_n$ the total number of balls) in a row,
we get
\[ D^p f(x)(y, y, \ldots, y) = \sum_{k} \binom{p}{k_1\,k_2\,\cdots\,k_n} y_1^{k_1} y_2^{k_2} \cdots y_n^{k_n} \frac{\partial^{k_1}}{\partial x_1^{k_1}} \frac{\partial^{k_2}}{\partial x_2^{k_2}} \cdots \frac{\partial^{k_n}}{\partial x_n^{k_n}} f(x), \]
where the sum is over all vectors $k \in \mathbb{Z}^n_{\geq 0}$ with $k_1 + k_2 + \cdots + k_n = p$.
In particular we see that this gives a polynomial term in the different coordinates, so the right hand side in Taylor's approximation becomes a polynomial. Let us end this section by taking $n = 2$ and writing down explicitly the third order Taylor approximation:
\begin{align*}
f(x_1 + y_1, x_2 + y_2) = f(x) &+ y_1 \frac{\partial f}{\partial x_1} + y_2 \frac{\partial f}{\partial x_2} + \frac{y_1^2}{2} \frac{\partial^2 f}{\partial x_1^2} + y_1 y_2 \frac{\partial^2 f}{\partial x_1 \partial x_2} + \frac{y_2^2}{2} \frac{\partial^2 f}{\partial x_2^2} \\
&+ \frac{y_1^3}{6} \frac{\partial^3 f}{\partial x_1^3} + \frac{y_1^2 y_2}{2} \frac{\partial^3 f}{\partial x_1^2 \partial x_2} + \frac{y_1 y_2^2}{2} \frac{\partial^3 f}{\partial x_1 \partial x_2^2} + \frac{y_2^3}{6} \frac{\partial^3 f}{\partial x_2^3} + \epsilon_3(y),
\end{align*}
where all derivatives are evaluated at $x = (x_1, x_2)$.
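This expansion is easy to test numerically. In the sketch below (the example $f(x_1, x_2) = e^{x_1} \sin(x_2)$, the expansion point and the step are all assumptions, not from the notes) the error behaves like $\|y\|^4$, as Corollary 5.2 predicts:

```python
from math import exp, sin, cos

# Third order Taylor expansion of f(x1, x2) = exp(x1)*sin(x2) at (0, 1);
# every partial derivative in x1 reproduces exp(x1), so at x1 = 0 all
# derivatives are sin(x2) or cos(x2) up to sign.
f = lambda x1, x2: exp(x1) * sin(x2)
x1, x2 = 0.0, 1.0
y1, y2 = 0.01, -0.02
s, c = sin(x2), cos(x2)

taylor = (f(x1, x2) + y1 * s + y2 * c
          + y1**2 / 2 * s + y1 * y2 * c - y2**2 / 2 * s
          + y1**3 / 6 * s + y1**2 * y2 / 2 * c
          - y1 * y2**2 / 2 * s - y2**3 / 6 * c)
print(f(x1 + y1, x2 + y2) - taylor)  # ~1e-8, i.e. of order |y|^4
```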
6. Extremal Values
Recall that for a differentiable function $f : \mathbb{R} \to \mathbb{R}$ you can locate extremal values (maxima and minima) by looking where the derivative vanishes. The kind of extremal value is then determined by the second derivative (if it exists): if the second derivative is positive, you have a minimum; if it is negative, you have a maximum; and otherwise you do not know (it could be a maximum, a minimum or an inflection point).
A similar statement can be made for multivariate functions. Let us first define extremal points.
Definition 6.1. Let $f : X \to \mathbb{R}$ be a function, defined on some metric space $X$. We say $f$ has a local maximum at $x \in X$ if there exists a neighborhood $U \subseteq X$ of $x$ such that $f(y) \leq f(x)$ for all $y \in U$. Moreover $f$ has a local minimum whenever $-f$ has a local maximum.
The following theorem shows that extrema only occur at points with vanishing derivative.
Theorem 6.2. Let $U \subseteq X$ be an open subset of a normed vector space. If $f : U \to \mathbb{R}$ is differentiable at $x \in U$ and has an extremal value (i.e. maximum or minimum) at $x$, then $Df(x) = 0$.
Proof. Without loss of generality we suppose $f$ has a maximum at $x$ (and is differentiable at $x$). We use the definition of derivative: there exists a continuous function $\varphi : V \to \mathbb{R}$ for some open neighborhood $V$ of $0 \in X$, with $\varphi(0) = 0$, such that
\[ f(x+y) = f(x) + Df(x)y + \|y\|\varphi(y). \]
Now for some neighbourhood $W$ of $0$ we have the inequality $f(x+y) \leq f(x)$ for $y \in W$. So for $y \in W \cap V$ we find
\[ Df(x)y \leq -\|y\|\varphi(y) \leq \|y\||\varphi(y)|; \]
applying this to both $y$ and $-y$ gives $|Df(x)y| \leq \|y\|\big(|\varphi(y)| + |\varphi(-y)|\big)$. We now employ a lemma akin to Lemma 4.4 used in the proof of the symmetry of $D^2 f(x)$ to conclude $Df(x) = 0$. Indeed the result follows from
Lemma 6.3. Let $\lambda \in L(X, Y)$ and assume there exists a continuous function $\varphi$ defined on an open neighbourhood $V$ of $0 \in X$, such that $\varphi(0) = 0$ and such that for all $v \in V$
\[ \|\lambda(v)\| \leq \|v\| \|\varphi(v)\|; \]
then $\lambda = 0$.
Proof. Indeed, we find for any $v \in X$ that (for small enough $s \in \mathbb{R}_{>0}$; we need $sv \in V$)
\[ \|\lambda(v)\| = \frac{1}{s} \|\lambda(sv)\| \leq \frac{1}{s} \|sv\| \|\varphi(sv)\| = \|v\| \|\varphi(sv)\|. \]
Taking the limit $s \to 0$ we obtain that $\|\lambda(v)\| = 0$, so $\lambda = 0$ identically. $\square$
Of course the converse of this theorem does not hold: you can have functions with vanishing derivative which have no local extremum (take $f(x) = x^3$ at $0$). In the case of $\mathbb{R} \to \mathbb{R}$ we could look at the sign of the second derivative to see whether, and what kind of, extremal point we have. It requires some thought to determine what the equivalent of "positive" is for elements of $L^2(X, \mathbb{R})$.
Definition 6.4. Let $X$ be a normed vector space. We say $A \in L^2(X, \mathbb{R})$ is positive definite if $A(x, x) \geq 0$ for all $x \in X$, with equality only if $x = 0$. We say $A$ is highly positive definite if $\inf_{x \in X, \|x\| = 1} A(x, x) > 0$.
For finite dimensional vector spaces $X$ the concept of positive definite is the same as highly positive definite (as the closed unit ball $\overline{B_1(0)}$ is a compact set in this case). An example of a positive definite operator which is not highly positive definite is $A \in L^2(\ell^2, \mathbb{R})$ given by $A(x, y) = \sum_{n \in \mathbb{N}} x_n y_n / n$.
Recall that we could write elements of $L^2(\mathbb{R}^n, \mathbb{R})$ as symmetric square matrices. A symmetric square matrix is positive definite if and only if all of its eigenvalues are positive (as the matrix is symmetric, all the eigenvalues are real). This condition becomes especially simple for $2 \times 2$ matrices, where a symmetric matrix is positive definite if and only if both its trace and its determinant are positive.
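A small sketch (the matrix is an assumed example) showing that the eigenvalue test and the $2 \times 2$ trace/determinant shortcut agree:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # an assumed symmetric example

eig_test = bool(np.all(np.linalg.eigvalsh(A) > 0))      # all eigenvalues > 0
tr_det_test = np.trace(A) > 0 and np.linalg.det(A) > 0  # 2x2 shortcut
print(eig_test, tr_det_test)  # True True
```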
An important application of positive definite bilinear operators is that they correspond to inner products. Indeed if $A$ is symmetric and positive definite then $\langle x, y \rangle := A(x, y)$ defines an inner product on $X$ (and vice versa, for any inner product $\langle \cdot, \cdot \rangle$ the bilinear map $A(x, y) := \langle x, y \rangle$ is positive definite).
We however are now more interested in the following theorem.
Theorem 6.5. Let $U \subseteq X$ be an open subset of a normed vector space. If $f : U \to \mathbb{R}$ is twice differentiable at $x \in U$, $Df(x) = 0$ and $D^2 f(x)$ is highly positive definite, then $f$ has a local minimum at $x$. If $-D^2 f(x)$ is highly positive definite, then $f$ has a local maximum at $x$.
Proof. We only prove this for $D^2 f$ being continuous at $x$; it is true even if $D^2 f$ is not continuous, but the proof becomes more tedious.
Now we can use Taylor's theorem to write
\[ f(x+y) = f(x) + \frac{Df(x)(y)}{1!} + \frac{D^2 f(x)(y, y)}{2!} + \epsilon(y), \]
where $\epsilon(y)$ is a continuous function satisfying $\lim_{y \to 0} \epsilon(y)/\|y\|^2 = 0$. Note that $Df(x) = 0$, so the second term on the right hand side vanishes. Now let $\delta = \inf_{y \in X, \|y\| = 1} D^2 f(x)(y, y)$; then $\delta > 0$ and $D^2 f(x)(y, y) \geq \delta \|y\|^2$ for all $y \in X$. We find that if $y$ is small enough then $|\epsilon(y)|/\|y\|^2 < \delta/2$, so we obtain
\[ f(x+y) > f(x) + \frac{D^2 f(x)(y, y)}{2!} - \delta\|y\|^2/2 \geq f(x) + \frac{\delta\|y\|^2}{2!} - \delta\|y\|^2/2 = f(x). \]
Thus we see that $x$ is a local minimum. The case of a local maximum is proven in the same way. $\square$
In the higher dimensional case there is a much broader range of possible behaviours at points where $Df(x) = 0$. Apart from the maxima and minima, a typical example is the saddle point for functions $f : \mathbb{R}^2 \to \mathbb{R}$. A saddle point is a point where the derivative vanishes and $D^2 f(x)$ has one eigenvalue greater than zero and one less than zero. The typical shape is that of $f(x, y) = x^2 - y^2$ at the origin, which looks like a saddle (the origin is a maximum in one direction and a minimum in the other). Another practical example: if a function $f$ gives the height of a mountain range, then a pass is a saddle point of $f$.
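The classification can be done mechanically from the eigenvalues of the Hessian; here is a sketch for the saddle example above (the Hessian of $f(x, y) = x^2 - y^2$ is constant and entered by hand):

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2 (constant in this case).
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
ev = np.linalg.eigvalsh(H)

if np.all(ev > 0):
    print("local minimum")
elif np.all(ev < 0):
    print("local maximum")
elif ev.min() < 0 < ev.max():
    print("saddle point")  # printed here: eigenvalues are 2 and -2
else:
    print("inconclusive (a zero eigenvalue)")
```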
If you want to know the extremal values of a function $f$, it is still often best to just calculate the derivative, solve $Df(x) = 0$, and then consider whether $f$ has extremal values at those points. Indeed you have to solve $n$ equations in $n$ variables, so you expect a discrete set of points as solutions (unless some equations are dependent on each other); so the amount of work should remain limited.
