November 6, 2013
Example 1.1 Evaluate
$$I = \int_0^1 \frac{dx}{\sqrt{4-2x-x^2}}.$$
The integrand is close to that of $\int \frac{dx}{\sqrt{1-x^2}}$, which equals $\sin^{-1}x$ up to a constant, so we attempt to use this known integral. By completing the square we may write
$$4-2x-x^2 = 5-(x+1)^2 = 5\left[1-\left(\frac{x+1}{\sqrt{5}}\right)^2\right],$$
so, making the substitution $t = \frac{x+1}{\sqrt{5}}$, $dx = \sqrt{5}\,dt$, where $t:\frac{1}{\sqrt{5}} \to \frac{2}{\sqrt{5}}$, we have
$$I = \int_{1/\sqrt{5}}^{2/\sqrt{5}} \frac{\sqrt{5}\,dt}{\sqrt{5}\sqrt{1-t^2}} = \int_{1/\sqrt{5}}^{2/\sqrt{5}} \frac{dt}{\sqrt{1-t^2}} = \sin^{-1}\frac{2}{\sqrt{5}} - \sin^{-1}\frac{1}{\sqrt{5}}.$$
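A quick numerical sanity check of this closed form (a sympy sketch added for illustration; `Integral.evalf` does the quadrature numerically):

```python
import sympy as sp

x = sp.symbols('x')
# Numerically evaluate I = ∫_0^1 dx / sqrt(4 - 2x - x^2)
I_num = sp.Integral(1/sp.sqrt(4 - 2*x - x**2), (x, 0, 1)).evalf()
# Closed form obtained above by completing the square
closed = (sp.asin(2/sp.sqrt(5)) - sp.asin(1/sp.sqrt(5))).evalf()
print(I_num, closed)  # the two values agree
```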
Now let us recall the technique of integration by parts, which is in many respects the soul of analysis. Integration by parts is the integral form of the product rule for derivatives. Since $(fg)' = f'g + fg'$, we have
$$f(x)g(x) = \int g(x)f'(x)\,dx + \int f(x)g'(x)\,dx.$$
Similarly, for definite integrals we have
$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b g(x)f'(x)\,dx,$$
or, in terms of differentials, we can rewrite the preceding formula as
$$\int_a^b f(x)\,dg(x) = f(x)g(x)\Big|_a^b - \int_a^b g(x)\,df(x).$$
However, there is no general rule to tell us how to split an integrand into a product $f(x)g'(x)$.
Example 1.3 Now let us consider $I_n = \int x^n e^x\,dx$ where $n = 1, 2, 3, \dots$. Using integration by parts,
$$I_n = \int x^n\,de^x = x^n e^x - \int e^x\,dx^n = x^n e^x - n\int x^{n-1}e^x\,dx = x^n e^x - nI_{n-1},$$
which gives an induction formula. Repeating the use of integration by parts, one can eventually work out the result. For example, since $I_1 = \int xe^x\,dx = (x-1)e^x + C$,
$$I_2 = x^2 e^x - 2I_1 = \left[x^2 - 2(x-1)\right]e^x + C$$
and
$$I_3 = x^3 e^x - 3I_2 = \left[x^3 - 3\left(x^2 - 2(x-1)\right)\right]e^x + C,$$
etc.
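The reduction formula can be checked mechanically (a sympy sketch, added here as an illustration):

```python
import sympy as sp

x = sp.symbols('x')

def I(n):
    """I_n = ∫ x^n e^x dx (antiderivative, constant omitted)."""
    return sp.integrate(x**n * sp.exp(x), x)

# Check the induction formula I_n = x^n e^x - n I_{n-1} for a few n
for n in range(1, 5):
    assert sp.simplify(I(n) - (x**n * sp.exp(x) - n * I(n - 1))) == 0
print("reduction formula verified for n = 1..4")
```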
Example 1.4 Consider $I_n = \int \cos^n x\,dx$ where $n$ is a non-negative integer. Split the integrand $\cos^n x$ into $\cos^{n-1}x\cos x = \cos^{n-1}x(\sin x)'$, and perform integration by parts. Then
$$I_n = \int \cos^{n-1}x(\sin x)'\,dx = \int \cos^{n-1}x\,d\sin x$$
$$= \cos^{n-1}x\sin x - \int \sin x\,d\cos^{n-1}x$$
$$= \cos^{n-1}x\sin x - (n-1)\int \sin x\cos^{n-2}x(-\sin x)\,dx$$
$$= \cos^{n-1}x\sin x + (n-1)\int \sin^2 x\cos^{n-2}x\,dx.$$
Writing $\sin^2 x = 1 - \cos^2 x$, the last integral is $I_{n-2} - I_n$, so that
$$I_n = \frac{n-1}{n}I_{n-2} + \frac{1}{n}\cos^{n-1}x\sin x,$$
which reduces the calculation of $I_n$ to $I_0$ or $I_1$, both easy to evaluate. For example,
$$\int_0^{\pi/2}\cos^n x\,dx = \frac{n-1}{n}\int_0^{\pi/2}\cos^{n-2}x\,dx + \frac{1}{n}\cos^{n-1}x\sin x\Big|_0^{\pi/2} = \frac{n-1}{n}\int_0^{\pi/2}\cos^{n-2}x\,dx = \cdots$$
$$= \begin{cases} \dfrac{n-1}{n}\cdot\dfrac{n-3}{n-2}\cdots\displaystyle\int_0^{\pi/2}\cos x\,dx & \text{if } n \text{ is odd},\\[2mm] \dfrac{n-1}{n}\cdot\dfrac{n-3}{n-2}\cdots\displaystyle\int_0^{\pi/2}dx & \text{if } n \text{ is even}.\end{cases}$$
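This reduction (the Wallis formulae) is easy to test against direct integration; the following sketch, added for illustration, compares the two:

```python
import math
import sympy as sp

x = sp.symbols('x')

def wallis(n):
    """∫_0^{π/2} cos^n x dx computed via the reduction formula."""
    if n == 0:
        return math.pi / 2
    if n == 1:
        return 1.0
    return (n - 1) / n * wallis(n - 2)

# Compare against direct symbolic integration for several n
for n in range(6):
    exact = float(sp.integrate(sp.cos(x)**n, (x, 0, sp.pi/2)))
    assert abs(exact - wallis(n)) < 1e-12
print("reduction formula verified for n = 0..5")
```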
The same device works when integrating by parts returns the original integral. For instance, for $I = \int e^x\cos x\,dx$, two integrations by parts give
$$I = e^x\cos x + \int e^x\sin x\,dx = e^x\cos x + e^x\sin x - I,$$
so that
$$2I = e^x\cos x + e^x\sin x + C.$$
Such an equation is called an $n$-th order differential equation. If $n = 1$, then it is called a first order differential equation. Thus a first order differential equation has the general form $y' = f(x,y)$, or implicitly $F(x,y,y') = 0$.
A function $y = \varphi(x)$ defined on some interval $J$ is called a solution of (2.1) if it satisfies the equation identically on $J$. In general we need $n$ conditions, which appear as initial conditions. More precisely, an initial condition for the $n$-th order differential equation (2.1) may be formulated as prescribing the values of $y, y', \dots, y^{(n-1)}$ at one point. An important special case is the first order linear equation
$$y' + p(x)y = q(x).$$
For a separable equation $y' = a(x)b(y)$, integrating
$$\int\frac{dy}{b(y)} = \int a(x)\,dx$$
gives solutions of the separable equation implicitly. If $y_0$ is a root of $b(y) = 0$, then clearly the constant function $y = y_0$ is also a solution. For instance, a general solution may take the implicit form
$$(x^2-1)(y^2-1) = C.$$
The constant functions $y = 1$ and $y = -1$ are solutions, but are already included in the above general form with $C = 0$.
Example 2.2 Find the solution to $(1+e^x)yy' = e^x$ satisfying the initial condition $y(0) = 1$.
The equation is separable:
$$y\,dy = \frac{e^x}{1+e^x}\,dx.$$
After integration we obtain the general solution
$$\frac{1}{2}y^2 = \ln(1+e^x) + C.$$
To match the initial condition, we set $x = 0$ and $y = 1$ in the general solution to determine the constant $C = \frac{1}{2} - \ln 2$, so that $\frac{1}{2}y^2 = \ln(1+e^x) + \frac{1}{2} - \ln 2$. After simplification we have
$$y^2 = \ln\left[\frac{e(1+e^x)^2}{4}\right].$$
Some differential equations of first order can be transformed by proper substitutions into separable equations.
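The closed form of Example 2.2 is easy to verify directly (a sympy sketch, added editorially):

```python
import sympy as sp

x = sp.symbols('x')
# Claimed solution of (1 + e^x) y y' = e^x with y(0) = 1
y = sp.sqrt(sp.log(sp.E * (1 + sp.exp(x))**2 / 4))
residual = (1 + sp.exp(x)) * y * y.diff(x) - sp.exp(x)
assert sp.simplify(residual) == 0          # satisfies the ODE
assert sp.simplify(y.subs(x, 0)) == 1      # satisfies y(0) = 1
print("solution of Example 2.2 verified")
```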
2.2 Homogeneous equations
Consider a first order differential equation $\frac{dy}{dx} = f(x,y)$. If the function $f(x,y)$ (of two variables) is homogeneous, i.e. $f(x,y) = h(\frac{y}{x})$ where $h$ is a function of one variable, then we can make the substitution $u(x) = \frac{y(x)}{x}$, so that $y = xu$. The product rule gives $\frac{dy}{dx} = u + x\frac{du}{dx}$, and the equation may be written as
$$u + x\frac{du}{dx} = h(u),$$
which is separable.
Example 2.4 Find general solutions to $xy' = \sqrt{x^2-y^2} + y$. The equation, after dividing both sides by $x$, is homogeneous:
$$y' = \sqrt{1-\left(\frac{y}{x}\right)^2} + \frac{y}{x},$$
so we make the substitution $u = \frac{y}{x}$ and the equation becomes
$$u + x\frac{du}{dx} = \sqrt{1-u^2} + u.$$
Rearrange the equation: $\frac{du}{\sqrt{1-u^2}} = \frac{dx}{x}$. Integrating both sides, we obtain
$$\sin^{-1}u = \ln|x| + C,$$
or, in terms of $y$, general solutions are given by $\sin^{-1}(\frac{y}{x}) = \ln|x| + C$, together with the solutions $\frac{y}{x} = 1$ and $\frac{y}{x} = -1$.
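The implicit solution can be spot-checked numerically, solving for $y = x\sin(\ln x + C)$ on a branch where $x > 0$ and $\cos(\ln x + C) \geq 0$ (a small editorial sketch):

```python
import math

C = 0.3
def y(x):            # from sin^{-1}(y/x) = ln x + C, taking x > 0
    return x * math.sin(math.log(x) + C)

# Check x y' = sqrt(x^2 - y^2) + y numerically at a sample point
x0, h = 1.2, 1e-6
dydx = (y(x0 + h) - y(x0 - h)) / (2 * h)   # central difference
lhs = x0 * dydx
rhs = math.sqrt(x0**2 - y(x0)**2) + y(x0)
assert abs(lhs - rhs) < 1e-6
print("implicit solution checks out at x =", x0)
```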
Some differential equations of first order can be transformed into homogeneous ones by simple substitutions. For example, consider the following type of first order differential equations:
$$\frac{dy}{dx} = f\left(\frac{a_1x+b_1y+c_1}{a_2x+b_2y+c_2}\right).$$
If $c_1 = c_2 = 0$ then the equation is homogeneous, so we consider the case where $c_1$ or $c_2$ does not vanish. If
$$\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} = 0$$
and $b_1 \neq 0$, then we make the substitution $u(x) = a_1x + b_1y(x)$ to transform the equation into a separable one. For the case where
$$\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} \neq 0,$$
we may choose $(\alpha,\beta)$ solving $a_1\alpha + b_1\beta + c_1 = 0$ and $a_2\alpha + b_2\beta + c_2 = 0$, and substitute $t = x - \alpha$, $z(t) = y(x) - \beta$; therefore, by the chain rule,
$$\frac{dz}{dt} = \frac{dy}{dx}.$$
The differential equation we are interested in becomes
$$\frac{dz}{dt} = f\left(\frac{a_1t+b_1z}{a_2t+b_2z}\right),$$
which is homogeneous.
After the homogeneous substitution the equation takes the form
$$u + t\frac{du}{dt} = \frac{2u^2}{(1+u)^2},$$
which is separable. Rearrange the equation:
$$t\frac{du}{dt} = \frac{2u^2 - u(1+u)^2}{(1+u)^2} = -\frac{u(1+u^2)}{(1+u)^2},$$
and separate the variables to obtain
$$\frac{(1+u)^2}{u(1+u^2)}\,du = -\frac{dt}{t}. \qquad (2.2)$$
Since
$$\int\frac{(1+u)^2}{u(1+u^2)}\,du = \int\left(\frac{1}{u} + \frac{2}{1+u^2}\right)du = \ln|u| + 2\tan^{-1}u,$$
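The partial-fraction step and the antiderivative can both be checked symbolically (an editorial sympy sketch):

```python
import sympy as sp

u = sp.symbols('u')
integrand = (1 + u)**2 / (u * (1 + u**2))
# Partial fractions: (1+u)^2 / (u(1+u^2)) = 1/u + 2/(1+u^2)
assert sp.simplify(sp.apart(integrand) - (1/u + 2/(1 + u**2))) == 0
# Antiderivative: ln|u| + 2 arctan u (up to a constant)
F = sp.log(u) + 2 * sp.atan(u)
assert sp.simplify(F.diff(u) - integrand) == 0
print("partial fractions and antiderivative verified")
```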
2.3 Linear differential equations of first order
Consider a linear differential equation of first order
$$\frac{dy}{dx} + p(x)y = q(x) \qquad (2.3)$$
where $p$ and $q$ are two continuous functions. The corresponding homogeneous equation $\frac{dz}{dx} + p(x)z = 0$ is separable, and has the general solution
$$z(x) = Ce^{-\int p(x)dx},$$
where $\int p(x)dx$ is a primitive of $p(x)$ and $C$ is an arbitrary constant. It follows that $z(x)e^{\int p(x)dx}$ is a constant, so that
$$\frac{d}{dx}\left(z(x)e^{\int p(x)dx}\right) = 0,$$
which is in turn equivalent to the homogeneous equation $z' + p(x)z = 0$.
Next we consider the inhomogeneous equation (2.3). The previous discussion suggests considering the derivative of $y(x)e^{\int p(x)dx}$; by employing the product rule for derivatives, we obtain
$$\frac{d}{dx}\left(y(x)e^{\int p(x)dx}\right) = e^{\int p(x)dx}\left(\frac{dy}{dx} + p(x)y\right) = q(x)e^{\int p(x)dx}. \qquad (2.4)$$
The function $e^{\int p(x)dx}$, which is multiplied by $y$ to form $ye^{\int p(x)dx}$, is called an integrating factor for the inhomogeneous equation (2.3).
We may describe the above procedure for obtaining general solutions of first order linear differential equations as follows; it contains an idea that can be applied in other situations, and is thus worth learning.
Observe that $z(x) = e^{-\int p(x)dx}$ is a non-trivial solution to the corresponding homogeneous equation $z' + p(x)z = 0$. In order to obtain the general solution to the inhomogeneous equation (2.3), we make use of the solution $z(x)$ by making the substitution
$$u(x) = \frac{y(x)}{z(x)} \qquad (2.6)$$
(a standard substitution whenever $z$ is a known function related to the differential equation of interest; we will use this idea in several instances later on), and turn (2.3) into a differential equation in $u$. Of course, according to the explicit form of $z(x)$ we have $u(x) = y(x)e^{\int p(x)dx}$, and (2.4) just says that
$$u' = q(x)e^{\int p(x)dx}.$$
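The integrating-factor recipe is mechanical enough to script. The following sketch (added editorially; the example $y' + y/x = x$ is ours) carries out exactly the steps above:

```python
import sympy as sp

x, C = sp.symbols('x C')

def solve_linear_first_order(p, q):
    """General solution of y' + p(x) y = q(x) via an integrating factor."""
    mu = sp.exp(sp.integrate(p, x))          # integrating factor e^{∫p dx}
    y = (sp.integrate(mu * q, x) + C) / mu   # y = ( ∫ μ q dx + C ) / μ
    return sp.simplify(y)

# Example: y' + y/x = x  has general solution  y = x^2/3 + C/x
y = solve_linear_first_order(1/x, x)
print(y)
assert sp.simplify(y.diff(x) + y/x - x) == 0
```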
3.1 Structure of general solutions to linear differential equations
Let us first describe the structure of solutions to linear differential equations. Recall that the general linear differential equation of order $n$ is an equation that can be written as
$$a_n(x)y^{(n)} + \cdots + a_1(x)y' + a_0(x)y = f(x), \qquad (3.1)$$
where the $a_i$ are continuous functions (on some interval) and $a_n \neq 0$.
Suppose $y_p$ is a particular solution of (3.1); then clearly $y$ is a solution to (3.1) if and only if $y - y_p$ is a solution to the corresponding homogeneous linear differential equation of $n$-th order.
For a homogeneous second order equation with a known non-trivial solution $z$ of
$$p(x)y'' + q(x)y' + r(x)y = 0, \qquad (3.3)$$
we substitute $y = z(x)u(x)$. Rearranging the resulting equation and using the fact that $z$ is a solution to (3.3), we obtain
$$p(x)z(x)u'' + \left(2p(x)z'(x) + q(x)z(x)\right)u' = 0, \qquad (3.4)$$
which is a homogeneous differential equation of first order for $u'$.
Example 3.1 Verify that $z(x) = \frac{1}{x}$ is a solution to
$$xy'' + 2(1-x)y' - 2y = 0.$$
Since $z' = -x^{-2}$ and $z'' = 2x^{-3}$, we can easily see that $z$ is a solution. Making the substitution $y(x) = \frac{1}{x}u(x)$ in the equation, we obtain a differential equation for $u$:
$$x\cdot\frac{1}{x}u'' + \left(2x\cdot(-x^{-2}) + 2(1-x)\cdot\frac{1}{x}\right)u' = 0,$$
that is, $u'' - 2u' = 0$. Setting $w = u'$ gives
$$w' - 2w = 0,$$
which is separable and has the general solution $w(x) = C_1e^{2x}$. Integrating $w$ (and absorbing the constant factor into $C_1$), we obtain
$$u(x) = \int w(x)\,dx = C_1e^{2x} + C_2,$$
so that
$$y(x) = \frac{C_1e^{2x} + C_2}{x} \qquad (3.5)$$
is the general solution, where $C_1$ and $C_2$ are arbitrary constants.
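The general solution (3.5) can be verified directly (an editorial sympy sketch):

```python
import sympy as sp

x, C1, C2 = sp.symbols('x C1 C2')
y = (C1 * sp.exp(2*x) + C2) / x
# Check that y solves x y'' + 2(1 - x) y' - 2 y = 0
residual = x * y.diff(x, 2) + 2*(1 - x) * y.diff(x) - 2*y
assert sp.simplify(residual) == 0
print("general solution (3.5) verified")
```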
Example 3.2 Find the general solution to the inhomogeneous linear equation
$$xy'' + 2(1-x)y' - 2y = 12x.$$
We have found the general solution to the corresponding homogeneous equation, which is given by (3.5); thus, according to the structure of solutions to linear equations, we only need to find a particular solution. Since the coefficients of the equation are all polynomials in $x$, we may look for a solution of the form $y(x) = ax + b$, where $a, b$ are constants. Plugging $y = ax+b$, $y' = a$ and $y'' = 0$ into the equation gives
$$2(1-x)a - 2(ax+b) = (2a-2b) - 4ax = 12x,$$
so we should have $2a - 2b = 0$ and $-4a = 12$, so that $a = -3$ and $b = -3$. Thus $y_0(x) = -3x - 3$ is a particular solution, and the general solution is thus given by
$$y(x) = \frac{C_1e^{2x} + C_2}{x} - 3x - 3.$$
3.2 Linear ODE with constant coefficients
For a homogeneous linear ODE with constant coefficients,
$$y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1y' + a_0y = 0, \qquad (3.6)$$
where $a_{n-1}, \dots, a_0$ are constants, we can construct its general solution if we can find the roots of the auxiliary equation
$$m^n + a_{n-1}m^{n-1} + \cdots + a_1m + a_0 = 0. \qquad (3.7)$$
The auxiliary equation comes from the following observation. Since the derivative of $e^{mx}$ is $me^{mx}$, it is reasonable to search for a solution $y = e^{mx}$. Substituting $y^{(k)} = m^ke^{mx}$ into (3.6) we have
$$\left(m^n + a_{n-1}m^{n-1} + \cdots + a_1m + a_0\right)e^{mx} = 0,$$
thus $e^{mx}$ is a solution if and only if $m$ is a root of (3.7), as long as $m$ is real. If $m = \alpha + i\beta$ is a complex root of the auxiliary equation, then, since the coefficients $a_{n-1}, \dots, a_0$ are real numbers, $\bar{m} = \alpha - i\beta$ is also a root. Now the complex functions $e^{mx}$ and $e^{\bar{m}x}$ both satisfy the differential equation (3.6), so the real and imaginary parts of
$$e^{mx} = e^{\alpha x}\cos(\beta x) + ie^{\alpha x}\sin(\beta x)$$
(Euler's formula) are solutions of (3.6); i.e. if $m = \alpha + i\beta$ is a complex root of the auxiliary equation, then
$$y_1(x) = e^{\alpha x}\cos(\beta x) \quad\text{and}\quad y_2(x) = e^{\alpha x}\sin(\beta x)$$
are a pair of linearly independent solutions of (3.6).
If $m$ is a repeated root of the auxiliary equation with multiplicity $k \geq 2$, then $e^{mx}, xe^{mx}, \dots, x^{k-1}e^{mx}$ are solutions. A similar conclusion is valid for complex roots. We are therefore able to construct $n$ linearly independent solutions of (3.6) from the roots of the auxiliary equation.
Example 3.3 Consider the harmonic motion described by
$$\frac{d^2y}{dx^2} + \omega^2y = 0,$$
where $\omega \neq 0$ is real. The auxiliary equation is $m^2 + \omega^2 = 0$, which has two complex roots $m = i\omega$ and $\bar{m} = -i\omega$. So we have two independent solutions $\cos\omega x$ and $\sin\omega x$, and the general solution
$$y(x) = A\cos\omega x + B\sin\omega x,$$
where $A, B$ are arbitrary constants.
Example 3.4 Solve the equation
$$\frac{d^3y}{dx^3} - 4\frac{d^2y}{dx^2} + \frac{dy}{dx} + 6y = 0.$$
The auxiliary equation
$$m^3 - 4m^2 + m + 6 = 0$$
has roots $-1, 2, 3$, so the general solution is
$$y(x) = C_1e^{-x} + C_2e^{2x} + C_3e^{3x}.$$
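The roots and the resulting solutions of Example 3.4 can be confirmed symbolically (an editorial sketch):

```python
import sympy as sp

x, m = sp.symbols('x m')
y = sp.Function('y')

# Auxiliary equation of y''' - 4y'' + y' + 6y = 0
roots = sp.solve(m**3 - 4*m**2 + m + 6, m)
assert sorted(roots) == [-1, 2, 3]

# Each root r gives a solution e^{r x}
ode = y(x).diff(x, 3) - 4*y(x).diff(x, 2) + y(x).diff(x) + 6*y(x)
for r in (-1, 2, 3):
    assert sp.simplify(ode.subs(y(x), sp.exp(r*x)).doit()) == 0
print("roots and solutions of Example 3.4 verified")
```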
The situation for second order differential equations with constant coefficients is particularly simple. Consider the homogeneous linear equation
$$\frac{d^2y}{dx^2} + a\frac{dy}{dx} + by = 0, \qquad (3.8)$$
where $a, b$ are two real numbers, with auxiliary equation
$$m^2 + am + b = 0.$$
Proof. Note that $a = -(m_1+m_2)$ and $b = m_1m_2$, where $m_1, m_2$ are the roots of the auxiliary equation. We consider cases 1) and 2) first. In this case $e^{m_1x}$ is a solution, so we make the substitution $y(x) = u(x)e^{m_1x}$ in the differential equation. Since
$$y' = \left(u' + m_1u\right)e^{m_1x} \quad\text{and}\quad y'' = \left(u'' + 2m_1u' + m_1^2u\right)e^{m_1x},$$
we obtain
$$u'' + 2m_1u' + m_1^2u + a\left(u' + m_1u\right) + bu = 0.$$
Using the fact that $m_1$ is a root and that $a = -(m_1+m_2)$, we have
$$u'' - (m_2 - m_1)u' = 0.$$
Thus, if $m_2 - m_1 \neq 0$,
$$u'(x) = C_1e^{(m_2-m_1)x},$$
and integrating this equation (absorbing the constant factor into $C_1$) we obtain
$$u(x) = C_1e^{(m_2-m_1)x} + C_2,$$
while if $m_2 = m_1$ then $u'' = 0$ and
$$u(x) = C_1 + C_2x.$$
Example 3.6 Solve the differential equation
$$\frac{d^2y}{dx^2} - 2\frac{dy}{dx} + 5y = 0.$$
The auxiliary equation $m^2 - 2m + 5 = 0$ has complex roots $m = 1 + 2i$ and $\bar{m} = 1 - 2i$, so the general solution is
$$y(x) = e^x\left(C_1\cos 2x + C_2\sin 2x\right).$$
Example 3.7 Next consider the inhomogeneous equation
$$\frac{d^2y}{dx^2} + 4y = \sin 3x,$$
whose corresponding homogeneous equation $\frac{d^2y}{dx^2} + 4y = 0$ has auxiliary equation $m^2 + 4 = 0$ with two complex roots $\pm 2i$. Since $\sin 3x$ is the imaginary part of $e^{3ix}$, and $3i$ is not a root of the auxiliary equation, we search for a particular solution $y_p(x) = A\sin 3x$. Plugging it into the equation we find $A = -\frac{1}{5}$. Hence the general solution is
$$y(x) = C_1\cos 2x + C_2\sin 2x - \frac{1}{5}\sin 3x.$$
Example 3.8 Consider
$$\frac{d^2y}{dx^2} + 4\frac{dy}{dx} + 4y = \sin 3x.$$
The auxiliary equation $m^2 + 4m + 4 = (m+2)^2 = 0$ has the repeated root $-2$. Searching for a particular solution $y_p = A\cos 3x + B\sin 3x$ and comparing the coefficients of $\cos 3x$ and $\sin 3x$, we get
$$-9A + 12B + 4A = 0 \quad\text{and}\quad -9B - 12A + 4B = 1.$$
Thus
$$A = -\frac{12}{169}, \qquad B = -\frac{5}{169}.$$
The general solution is
$$y(x) = (C_1x + C_2)e^{-2x} - \frac{12}{169}\cos 3x - \frac{5}{169}\sin 3x.$$
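The coefficients in Example 3.8 are easy to slip on, so a direct check is worthwhile (an editorial sympy sketch):

```python
import sympy as sp

x = sp.symbols('x')
# Claimed particular solution of y'' + 4y' + 4y = sin 3x
yp = -sp.Rational(12, 169)*sp.cos(3*x) - sp.Rational(5, 169)*sp.sin(3*x)
residual = yp.diff(x, 2) + 4*yp.diff(x) + 4*yp - sp.sin(3*x)
assert sp.expand(residual) == 0
print("particular solution of Example 3.8 verified")
```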
Example 3.9 Let us now consider
$$\frac{d^2y}{dx^2} + 4y = \sin 2x.$$
We have seen that $\sin 2x$ is a solution to the corresponding homogeneous equation, so we look for a particular solution
$$y_p(x) = Ax\cos 2x + Bx\sin 2x.$$
Then $B = 0$ and $A = -\frac{1}{4}$, so the general solution is
$$y(x) = C_1\cos 2x + C_2\sin 2x - \frac{1}{4}x\cos 2x.$$
Example 3.10 Find a particular solution to
$$\frac{d^2y}{dx^2} + 4y = \sin x + \sin 2x.$$
By simple inspection, $y_1 = \frac{1}{3}\sin x$ is a particular solution to
$$\frac{d^2y}{dx^2} + 4y = \sin x,$$
and we know from the previous example that $y_2 = -\frac{1}{4}x\cos 2x$ is a particular solution to
$$\frac{d^2y}{dx^2} + 4y = \sin 2x.$$
Thus
$$y_p = \frac{1}{3}\sin x - \frac{1}{4}x\cos 2x$$
is a particular solution.
Consider now
$$\frac{d^2y}{dx^2} - 3\frac{dy}{dx} + 2y = f(x),$$
where $f(x)$ is a given function. The auxiliary equation $m^2 - 3m + 2 = 0$ has two real roots $1$ and $2$, so the general solution to the corresponding homogeneous equation is $C_1e^x + C_2e^{2x}$.
1) Suppose $f(x) = \sin x$, which is the imaginary part of $e^{ix}$. Since $i$ is not a root of the auxiliary equation, we may search for a particular solution $y_p = A\sin x + B\cos x$ (but not just $A\sin x$, which would not work, since differentiation mixes sines and cosines). Feeding $y_p$, $y_p' = A\cos x - B\sin x$ and $y_p'' = -y_p$ into the differential equation gives
$$(A + 3B)\sin x + (B - 3A)\cos x = \sin x.$$
Set $A + 3B - 1 = 0$ and $B - 3A = 0$, and solve the system to obtain $A = \frac{1}{10}$ and $B = \frac{3}{10}$. The general solution is given by
$$y = C_1e^x + C_2e^{2x} + \frac{1}{10}\sin x + \frac{3}{10}\cos x.$$
2) $f(x) = e^{3x}$. Since $3$ is not a root of the auxiliary equation, we search for a particular solution $y_p = Ae^{3x}$; feeding it into the differential equation gives $(9A - 9A + 2A)e^{3x} = e^{3x}$, so $A = \frac{1}{2}$. If instead $f(x) = e^{2x}$, then $2$ is a root of the auxiliary equation, so we try $y_p = Axe^{2x}$; feeding it into the differential equation gives
$$(4A - 3A - 1)e^{2x} = 0,$$
so $A = 1$, and the general solution to
$$\frac{d^2y}{dx^2} - 3\frac{dy}{dx} + 2y = e^{2x}$$
is given by
$$y = C_1e^x + C_2e^{2x} + xe^{2x}.$$
3) $f(x) = xe^{2x}$. Since $2$ is a root of the auxiliary equation, we may search for a particular solution of the form $y_p = (Ax^2 + Bx)e^{2x}$ (we have included $Bxe^{2x}$ as well, since $e^{2x}$ is a solution to the homogeneous equation, but $xe^{2x}$ is not).
4) $f(x) = e^x\sin x$, which is the imaginary part of $e^{(1+i)x}$; since $1+i$ is not a root of the auxiliary equation, we may search for a particular solution $y_p = (A\cos x + B\sin x)e^x$.
5) $f(x) = \sin^2 x$. Since $\sin^2 x = \frac{1}{2} - \frac{1}{2}\cos 2x$, we may attempt a particular solution of the form $y_p = A + B\cos 2x + C\sin 2x$.
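Undetermined coefficients are easy to slip on, so it is worth confirming the numbers in cases 1) and 2) by substitution; note that the residual for case 1) only vanishes with $A = \frac{1}{10}$, $B = \frac{3}{10}$ (an editorial sympy sketch):

```python
import sympy as sp

x = sp.symbols('x')
# Case 1): particular solution of y'' - 3y' + 2y = sin x
yp = sp.sin(x)/10 + 3*sp.cos(x)/10
residual = yp.diff(x, 2) - 3*yp.diff(x) + 2*yp - sp.sin(x)
assert sp.expand(residual) == 0

# Case 2), resonant variant: y'' - 3y' + 2y = e^{2x} has yp = x e^{2x}
yp2 = x * sp.exp(2*x)
residual2 = yp2.diff(x, 2) - 3*yp2.diff(x) + 2*yp2 - sp.exp(2*x)
assert sp.expand(residual2) == 0
print("particular solutions verified")
```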
In short we may write $A = (a_{ij})$, where $a_{ij}$ is the entry in the $i$-th row and $j$-th column. If $m = n$, then $A$ is called a square matrix.
Let us concentrate on $2\times 2$ matrices. You will learn the general theory of matrices in linear algebra (topics in your paper Mathematics I).
First of all, we have elementary operations among $2\times 2$ matrices: if
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix},$$
then $A \pm B = (a_{ij} \pm b_{ij})$ and $\lambda A = (\lambda a_{ij})$. The more interesting operation is the multiplication of two matrices, defined as follows:
$$AB = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}.$$
We will use $I$ to denote the identity matrix
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
It is trivial that $IA = AI = A$ for any $2\times 2$ matrix $A$. Clearly $\det(I) = 1$.
Given a $2\times 2$ matrix $A$, we say a $2\times 2$ matrix $B$ (if one exists) is an inverse matrix of $A$ if $AB = BA = I$. Since $\det(AB) = \det(A)\det(B)$, a necessary condition for the existence of an inverse matrix of $A$ is that $\det(A) \neq 0$. It turns out this condition is also sufficient.
Theorem 4.2 Let $A = (a_{ij})$ be a $2\times 2$ matrix. Then $A$ has an inverse matrix if and only if $\det(A) \neq 0$. In this case the inverse matrix is unique, and is thus denoted by $A^{-1}$, given by
$$A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.$$
Proof. By a direct computation we can see that $A^{-1}$ defined as above is an inverse matrix. If $B$ is an inverse of $A$, then
$$B = B(AA^{-1}) = (BA)A^{-1} = IA^{-1} = A^{-1},$$
so the inverse matrix is unique.
We observe that $\det(A) = 0$, i.e. $a_{11}a_{22} = a_{21}a_{12}$, means the two column vectors $\begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix}$ and $\begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix}$ are proportional, that is, they are linearly dependent.
Let us consider $\mathbb{R}^2$ as the vector space of column vectors $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ (also considered as a $2\times 1$ matrix). Let $A = (a_{ij})$ be a $2\times 2$ matrix. Then we associate with $A$ a linear mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$, also denoted by $A$, defined by
$$Av = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} a_{11}v_1 + a_{12}v_2 \\ a_{21}v_1 + a_{22}v_2 \end{pmatrix}.$$
Proposition 4.3 Let $A = (a_{ij})$ be a $2\times 2$ matrix.
1) The linear system $Av = 0$ has non-zero solutions if and only if $\det(A) = 0$.
2) The linear system $Av = \lambda v$ has a solution $v \neq 0$ if and only if $\lambda$ is an eigenvalue of $A$, that is, $\det(A - \lambda I) = 0$ (which is called the characteristic equation of $A$). In this case, $v$ is called an eigenvector (corresponding to the eigenvalue $\lambda$).
A square matrix $A = (a_{ij})$ is diagonal if $a_{ij} = 0$ for any $i \neq j$.
Theorem 4.4 Suppose a $2\times 2$ matrix $A = (a_{ij})$ has distinct real eigenvalues $\lambda_1$ and $\lambda_2$, and let $v_i = \begin{pmatrix} v_{1i} \\ v_{2i} \end{pmatrix}$ be corresponding eigenvectors with eigenvalues $\lambda_i$ ($i = 1, 2$). Let
$$P = (v_1, v_2) = \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}.$$
Then
$$P^{-1}AP = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$
Proof. First we show that $P$ is invertible, which is equivalent to $v_1$ and $v_2$ being linearly independent. Suppose $\alpha v_1 + \beta v_2 = 0$, so that $\alpha Av_1 + \beta Av_2 = 0$, hence $\alpha\lambda_1v_1 + \beta\lambda_2v_2 = 0$. Multiplying $\alpha v_1 + \beta v_2 = 0$ by $\lambda_1$ and subtracting, it follows that $(\lambda_2 - \lambda_1)\beta v_2 = 0$, so that $\beta = 0$, and similarly $\alpha = 0$. Therefore $v_1$ and $v_2$ are linearly independent, and $P^{-1}$ exists.
By definition $Av_i = \lambda_iv_i$, so $AP = (Av_1, Av_2) = (\lambda_1v_1, \lambda_2v_2) = P\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$, and multiplying on the left by $P^{-1}$ gives the claim.
Example 4.5 Find all the eigenvalues and eigenvectors of the following matrices:
$$A = \begin{pmatrix} 2 & 1 \\ 6 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} \qquad\text{and}\qquad C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
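These can be checked with sympy (an editorial sketch; the matrices are typeset as they appear in the notes, so any signs lost in the original scan are not restored here):

```python
import sympy as sp

A = sp.Matrix([[2, 1], [6, 3]])
B = sp.Matrix([[2, 1], [0, 2]])
C = sp.Matrix([[0, 1], [1, 0]])

print(A.eigenvals())  # det(A) = 0 and trace = 5, so eigenvalues 0 and 5
print(B.eigenvals())  # repeated eigenvalue 2
print(C.eigenvals())  # eigenvalues 1 and -1
for M in (A, B, C):
    for lam, mult, vecs in M.eigenvects():
        for v in vecs:
            assert M * v == lam * v   # definition of an eigenvector
```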
A second order differential equation
$$\frac{d^2y}{dt^2} + a\frac{dy}{dt} + by = f(t)$$
is equivalent to the system
$$\frac{dx}{dt} = -ax - by + f(t), \qquad \frac{dy}{dt} = x.$$
In terms of matrix notation, it can be written as
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} -a & -b \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} f(t) \\ 0 \end{pmatrix}.$$
Example 5.1 Solve the following initial value problem:
$$\frac{dx}{dt} = 3x + y, \quad x(0) = 1; \qquad \frac{dy}{dt} = 6x + 4y, \quad y(0) = 1.$$
From the first equation, substitute $y = \frac{dx}{dt} - 3x$ into the second equation to obtain
$$\frac{d^2x}{dt^2} - 3\frac{dx}{dt} = 6x + 4\frac{dx}{dt} - 12x,$$
so $x$ solves the homogeneous linear equation of second order
$$\frac{d^2x}{dt^2} - 7\frac{dx}{dt} + 6x = 0,$$
whose auxiliary equation has roots $1$ and $6$, so $x(t) = C_1e^t + C_2e^{6t}$. Since $x(0) = 1$ and $x'(0) = y(0) + 3x(0) = 4$, we have
$$C_1 + C_2 = 1, \qquad C_1 + 6C_2 = 4.$$
Thus $C_2 = \frac{3}{5}$ and $C_1 = \frac{2}{5}$, and
$$x(t) = \frac{2}{5}e^t + \frac{3}{5}e^{6t}, \qquad y(t) = -\frac{4}{5}e^t + \frac{9}{5}e^{6t}.$$
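The pair $(x(t), y(t))$ is straightforward to verify against both equations and both initial conditions (an editorial sympy sketch):

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Rational(2, 5)*sp.exp(t) + sp.Rational(3, 5)*sp.exp(6*t)
y = -sp.Rational(4, 5)*sp.exp(t) + sp.Rational(9, 5)*sp.exp(6*t)

assert sp.simplify(x.diff(t) - (3*x + y)) == 0
assert sp.simplify(y.diff(t) - (6*x + 4*y)) == 0
assert x.subs(t, 0) == 1 and y.subs(t, 0) == 1
print("Example 5.1 solution verified")
```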
We next describe another method, which is contained in the following theorem.
Theorem 5.2 Consider the system of linear equations with constant coefficients
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$
Suppose $A = (a_{ij})$ has distinct eigenvalues $\lambda_1$ and $\lambda_2$ with corresponding eigenvectors $v_k = \begin{pmatrix} v_{1k} \\ v_{2k} \end{pmatrix}$ ($k = 1, 2$). Then the general solution of the system is given by
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = C_1e^{\lambda_1t}v_1 + C_2e^{\lambda_2t}v_2.$$
Proof. Let $P = (v_1, v_2)$. We know that $P^{-1}$ exists. The system may be written as
$$\begin{pmatrix} x \\ y \end{pmatrix}' = A\begin{pmatrix} x \\ y \end{pmatrix} = AP\,P^{-1}\begin{pmatrix} x \\ y \end{pmatrix}.$$
Let $z(t) = P^{-1}\begin{pmatrix} x(t) \\ y(t) \end{pmatrix}$. Then
$$\frac{d}{dt}z(t) = P^{-1}\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = P^{-1}APz(t) = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} z_1(t) \\ z_2(t) \end{pmatrix}.$$
That is,
$$z_1'(t) = \lambda_1z_1(t) \quad\text{and}\quad z_2'(t) = \lambda_2z_2(t),$$
so that $z_k(t) = C_ke^{\lambda_kt}$ ($k = 1, 2$). Hence
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = Pz(t) = (v_1, v_2)\begin{pmatrix} z_1(t) \\ z_2(t) \end{pmatrix} = C_1e^{\lambda_1t}v_1 + C_2e^{\lambda_2t}v_2.$$
Example 5.5 Solve the system
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} 2 & 5 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$
Example 5.6 Solve the initial value problem for the linear system
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ -4 & 6 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}, \qquad x(0) = 1, \quad y(0) = 1.$$
The characteristic equation $\lambda^2 - 8\lambda + 16 = (\lambda-4)^2 = 0$ has the repeated root $4$, so $x(t) = e^{4t}$, $y(t) = 2e^{4t}$ is a solution to the system. Taking into account the initial condition, we may set $x(t) = (At+1)e^{4t}$ and $y(t) = (Bt+1)e^{4t}$, and feed them into the system to obtain $A = -1$ and $B = -2$. Thus the solution to the initial value problem is given by
$$x(t) = (1-t)e^{4t}, \qquad y(t) = (1-2t)e^{4t}.$$
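The answer to this initial value problem can be verified by substitution; in the check below the coefficient matrix is taken as $\begin{pmatrix}2 & 1\\ -4 & 6\end{pmatrix}$, the lower-left sign being the choice that yields the repeated root $4$ stated in the example (an editorial sympy sketch):

```python
import sympy as sp

t = sp.symbols('t')
x = (1 - t)*sp.exp(4*t)
y = (1 - 2*t)*sp.exp(4*t)

# System: x' = 2x + y, y' = -4x + 6y; repeated eigenvalue 4
assert sp.simplify(x.diff(t) - (2*x + y)) == 0
assert sp.simplify(y.diff(t) - (-4*x + 6*y)) == 0
assert x.subs(t, 0) == 1 and y.subs(t, 0) == 1
print("Example 5.6 solution verified")
```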
6.1 Computations of partial derivatives
Let us begin with a (real) function of two variables, $u = f(x,y)$, defined on an open subset such as an open disk, and begin with the partial derivatives of $f$. By saying a subset $U$ of $\mathbb{R}^2$ (resp. $\mathbb{R}^n$) is an open subset, we mean that for any point $p \in U$ there is an open disk (resp. an open ball in $\mathbb{R}^n$) $B_p(r)$ centered at $p$ with radius $r > 0$ such that $B_p(r) \subset U$.
Holding $y = y_0$ constant, consider $f(x,y_0)$ as a function of $x$. If its derivative (in $x$) exists at $x_0$, i.e. if
$$\lim_{x\to x_0}\frac{f(x,y_0) - f(x_0,y_0)}{x - x_0}$$
exists, then this limit is called the partial derivative of $f$ in $x$ at $(x_0,y_0)$, denoted by one of the following notations:
$$\frac{\partial u}{\partial x},\ \frac{\partial f(x_0,y_0)}{\partial x};\quad u_x,\ f_x(x_0,y_0);\quad D_xu,\ D_xf(x_0,y_0).$$
It was C. G. J. Jacobi who first proposed to use the symbol $\partial$ instead of $d$ for partial derivatives. Similarly we may introduce the partial derivative in $y$, denoted by $\frac{\partial u}{\partial y}$ etc. The definition of partial derivatives applies as well to functions of three variables, and to functions of several variables.
Example 6.1 Find the partial derivatives of $u = y^x$. Holding $y$ constant, $u$ is an exponential function and $\frac{\partial u}{\partial x} = y^x\ln y$, while holding $x$ constant it is a power function, so that $\frac{\partial u}{\partial y} = xy^{x-1}$.
Example 6.2 Find the partial derivatives of $u = \frac{x}{x^2+y^2+z^2}$. The results are
$$\frac{\partial u}{\partial x} = -\frac{2x^2}{(x^2+y^2+z^2)^2} + \frac{1}{x^2+y^2+z^2} = \frac{y^2+z^2-x^2}{(x^2+y^2+z^2)^2},$$
$$\frac{\partial u}{\partial y} = -\frac{2xy}{(x^2+y^2+z^2)^2}, \qquad \frac{\partial u}{\partial z} = -\frac{2xz}{(x^2+y^2+z^2)^2}.$$
Example 6.3 Let $u = yf(x^2-y^2)$, where $y \neq 0$ and $f(t)$ is a differentiable function with continuous derivative $f'(t)$. Then
$$\frac{1}{x}\frac{\partial u}{\partial x} + \frac{1}{y}\frac{\partial u}{\partial y} = \frac{u}{y^2}.$$
In fact, according to the chain rule we have
$$\frac{\partial u}{\partial x} = 2xyf'(x^2-y^2), \qquad \frac{\partial u}{\partial y} = f(x^2-y^2) - 2y^2f'(x^2-y^2),$$
so that
$$\frac{1}{x}\frac{\partial u}{\partial x} + \frac{1}{y}\frac{\partial u}{\partial y} = \frac{1}{y}f(x^2-y^2) = \frac{u}{y^2}.$$
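The identity of Example 6.3 holds for an arbitrary differentiable $f$, which sympy can confirm with an undetermined function (an editorial sketch):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Function('f')
u = y * f(x**2 - y**2)

lhs = u.diff(x)/x + u.diff(y)/y
assert sp.simplify(lhs - u/y**2) == 0
print("identity of Example 6.3 verified")
```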
Suppose $u = f(x,y)$ has partial derivatives $\frac{\partial u}{\partial x}$ and $\frac{\partial u}{\partial y}$ on an open subset, so that $\frac{\partial u}{\partial x}$ and $\frac{\partial u}{\partial y}$ are functions of the variables $x$ and $y$. Suppose that the partial derivative of $\frac{\partial u}{\partial x}$ in $x$ exists; then $\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x}\right)$ is called a second order partial derivative of $u$, denoted by $\frac{\partial^2 u}{\partial x^2}$ or by any of the following:
$$\frac{\partial^2 f(x_0,y_0)}{\partial x^2};\quad u_{xx},\ f_{xx}(x_0,y_0);\quad D^2_{xx}u,\ D^2_{xx}f(x_0,y_0).$$
Similarly $\frac{\partial}{\partial y}\left(\frac{\partial u}{\partial x}\right)$ is denoted by $\frac{\partial^2 u}{\partial y\partial x}$ etc. Higher order derivatives can be defined inductively.
Example 6.4 Find the partial derivatives of $u = \tan^{-1}\frac{y}{x}$ up to second order. In fact,
$$\frac{\partial u}{\partial x} = -\frac{y}{x^2+y^2}, \qquad \frac{\partial u}{\partial y} = \frac{x}{x^2+y^2},$$
and
$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial}{\partial x}\left(-\frac{y}{x^2+y^2}\right) = \frac{2xy}{(x^2+y^2)^2},$$
$$\frac{\partial^2 u}{\partial x\partial y} = \frac{\partial}{\partial y}\left(-\frac{y}{x^2+y^2}\right) = \frac{y^2-x^2}{(x^2+y^2)^2} = \frac{\partial^2 u}{\partial y\partial x},$$
$$\frac{\partial^2 u}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{x}{x^2+y^2}\right) = -\frac{2xy}{(x^2+y^2)^2}.$$
In particular, $u$ solves the Laplace equation
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$
We can carry on to find
$$\frac{\partial^3 u}{\partial y\partial x^2} = \frac{\partial}{\partial y}\left(\frac{2xy}{(x^2+y^2)^2}\right) = \frac{2x^3 - 6xy^2}{(x^2+y^2)^3}, \quad\text{etc.}$$
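Both the Laplace equation and the equality of the mixed partials in Example 6.4 can be checked symbolically (an editorial sympy sketch):

```python
import sympy as sp

x, y = sp.symbols('x y')
u = sp.atan(y/x)

assert sp.simplify(u.diff(x, 2) + u.diff(y, 2)) == 0   # Laplace equation
mixed_xy = u.diff(x).diff(y)
mixed_yx = u.diff(y).diff(x)
assert sp.simplify(mixed_xy - mixed_yx) == 0           # mixed partials agree
print("Example 6.4 verified")
```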
Lemma 6.5 Suppose that $u = f(x,y)$, defined on an open subset $U$, has first order partial derivatives $\frac{\partial u}{\partial x}$ and $\frac{\partial u}{\partial y}$ which are continuous functions on $U$. Let $(x_0,y_0) \in U$, $\Delta x = x - x_0$, $\Delta y = y - y_0$ and $\Delta u = f(x,y) - f(x_0,y_0)$. Then
$$\Delta u = \frac{\partial f(x_0,y_0)}{\partial x}\Delta x + \frac{\partial f(x_0,y_0)}{\partial y}\Delta y + \varepsilon, \qquad (6.1)$$
where $\varepsilon/\sqrt{\Delta x^2+\Delta y^2} \to 0$ as $\sqrt{\Delta x^2+\Delta y^2} \to 0$. That is, $\varepsilon$ is small in comparison with $\Delta x$ and $\Delta y$; thus the main part of the increment $\Delta u$ at $(x_0,y_0)$ is
$$\frac{\partial f(x_0,y_0)}{\partial x}\Delta x + \frac{\partial f(x_0,y_0)}{\partial y}\Delta y,$$
which is linear in the increments $(\Delta x, \Delta y)$ of the independent variables, and is called the first order differential of $f$ at $(x_0,y_0)$.
Proof. Write $\Delta u = \left[f(x,y) - f(x_0,y)\right] + \left[f(x_0,y) - f(x_0,y_0)\right]$. If $\Delta x = 0$ we interpret
$$\frac{f(x,y) - f(x_0,y)}{\Delta x} = 0,$$
and similarly if $\Delta y = 0$,
$$\frac{f(x_0,y) - f(x_0,y_0)}{\Delta y} = 0.$$
Then, since
$$\frac{\Delta x}{\sqrt{\Delta x^2+\Delta y^2}}, \qquad \frac{\Delta y}{\sqrt{\Delta x^2+\Delta y^2}}$$
are bounded, as $\sqrt{\Delta x^2+\Delta y^2} \to 0$ we have
$$\left(\frac{f(x,y) - f(x_0,y)}{\Delta x} - \frac{\partial f(x_0,y)}{\partial x}\right)\frac{\Delta x}{\sqrt{\Delta x^2+\Delta y^2}} \to 0,$$
and
$$\left(\frac{f(x_0,y) - f(x_0,y_0)}{\Delta y} - \frac{\partial f(x_0,y_0)}{\partial y}\right)\frac{\Delta y}{\sqrt{\Delta x^2+\Delta y^2}} \to 0.$$
Since the partial derivatives are continuous,
$$\frac{\partial f(x_0,y)}{\partial x} - \frac{\partial f(x_0,y_0)}{\partial x} \to 0$$
as $\sqrt{\Delta x^2+\Delta y^2} \to 0$. Putting these facts together, we may conclude that
$$\frac{\varepsilon}{\sqrt{\Delta x^2+\Delta y^2}} \to 0 \quad\text{as}\quad \sqrt{\Delta x^2+\Delta y^2} \to 0.$$
The first order differential of $u = f(x,y)$, denoted by $du$ or $df$, is defined as
$$df = \frac{\partial f(x,y)}{\partial x}dx + \frac{\partial f(x,y)}{\partial y}dy.$$
The function
$$z = f(x_0,y_0) + \frac{\partial f(x_0,y_0)}{\partial x}(x-x_0) + \frac{\partial f(x_0,y_0)}{\partial y}(y-y_0)$$
is the linear approximation of $z = f(x,y)$ near the point $(x_0,y_0)$. The above linear equation in $(x,y,z)$ represents the tangent plane to the surface graph of the function $z = f(x,y)$ at the point $(x_0,y_0,f(x_0,y_0))$. We will return to this topic in the following lectures.
Lemma 6.6 (Chain rule for functions of two variables) Suppose that $f(x,y)$ is a function on an open subset $U \subset \mathbb{R}^2$ with continuous partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, and suppose $x = \varphi(t)$ and $y = \psi(t)$ are two differentiable functions on an interval $(a,b)$ such that $(\varphi(t),\psi(t)) \in U$ for every $t \in (a,b)$. Let $F(t) = f(x(t),y(t))$. Then $F$ is differentiable on $(a,b)$ and
$$F'(t) = \frac{\partial f}{\partial x}\varphi'(t) + \frac{\partial f}{\partial y}\psi'(t). \qquad (6.2)$$
Suppose we make the change of variables $x = \varphi(s,t)$ and $y = \psi(s,t)$, and assume that $\varphi$ and $\psi$ have continuous partial derivatives. Consider $F(s,t) = f(\varphi(s,t),\psi(s,t))$. Holding $t$ constant and applying the chain rule (6.2) to the variable $s$, we obtain
$$\frac{\partial F}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial\varphi}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial\psi}{\partial s},$$
and similarly
$$\frac{\partial F}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial\varphi}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial\psi}{\partial t}.$$
In terms of matrices, the chain rule may be put into a neat form:
$$\left(\frac{\partial F}{\partial s}, \frac{\partial F}{\partial t}\right) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)\begin{pmatrix} \frac{\partial\varphi}{\partial s} & \frac{\partial\varphi}{\partial t} \\ \frac{\partial\psi}{\partial s} & \frac{\partial\psi}{\partial t} \end{pmatrix};$$
the $2\times 2$ matrix on the right-hand side is called the first order total derivative (or the Jacobian matrix) of the transformation $x = \varphi(s,t)$, $y = \psi(s,t)$, denoted by $D(\varphi,\psi)$.
If all the functions involved have continuous higher order partial derivatives, then we may repeat the use of the chain rule.
Example 6.7 Let $F(s,t) = f(\varphi(s,t),\psi(s,t))$. Evaluate $\frac{\partial^2 F}{\partial s\partial t}$.
By definition $\frac{\partial^2 F}{\partial s\partial t} = \frac{\partial}{\partial s}\left(\frac{\partial F}{\partial t}\right)$, so that
$$\frac{\partial^2 F}{\partial s\partial t} = \frac{\partial}{\partial s}\left(\frac{\partial f}{\partial x}\frac{\partial\varphi}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial\psi}{\partial t}\right)$$
$$= \frac{\partial}{\partial s}\left(\frac{\partial f}{\partial x}\right)\frac{\partial\varphi}{\partial t} + \frac{\partial f}{\partial x}\frac{\partial^2\varphi}{\partial s\partial t} + \frac{\partial}{\partial s}\left(\frac{\partial f}{\partial y}\right)\frac{\partial\psi}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial^2\psi}{\partial s\partial t} \quad\text{[product rule]}$$
$$= \left(\frac{\partial^2 f}{\partial x^2}\frac{\partial\varphi}{\partial s} + \frac{\partial^2 f}{\partial y\partial x}\frac{\partial\psi}{\partial s}\right)\frac{\partial\varphi}{\partial t} + \frac{\partial f}{\partial x}\frac{\partial^2\varphi}{\partial s\partial t} \quad\left[\text{chain rule applied to } \tfrac{\partial f}{\partial x}\right]$$
$$\quad + \left(\frac{\partial^2 f}{\partial x\partial y}\frac{\partial\varphi}{\partial s} + \frac{\partial^2 f}{\partial y^2}\frac{\partial\psi}{\partial s}\right)\frac{\partial\psi}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial^2\psi}{\partial s\partial t} \quad\left[\text{chain rule applied to } \tfrac{\partial f}{\partial y}\right].$$
Here the important thing to keep in mind when working with higher partial derivatives is that the symbols $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are again functions of $x$ and $y$, hence of $s$ and $t$, so we have to apply the chain rule to these functions as well.
As a direct consequence of the chain rule, we can show that first order differentials are invariant under substitutions. To be more precise, if $F(s,t) = f(\varphi(s,t),\psi(s,t))$, then
$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy = \frac{\partial F}{\partial s}ds + \frac{\partial F}{\partial t}dt,$$
where
$$dx = \frac{\partial\varphi}{\partial s}ds + \frac{\partial\varphi}{\partial t}dt, \qquad dy = \frac{\partial\psi}{\partial s}ds + \frac{\partial\psi}{\partial t}dt.$$
The invariance of first differentials under change of variables is useful in evaluating partial derivatives; more importantly, it implies that differentials of functions are globally defined objects which do not depend on the coordinates we use to evaluate them.
Let us write down the chain rule for functions of several variables.
Suppose that $f(x_1,\dots,x_m)$ is a function of $m$ variables which has continuous partial derivatives. Consider the change of variables given by
$$x_1 = \varphi_1(t_1,\dots,t_n), \quad\dots,\quad x_m = \varphi_m(t_1,\dots,t_n), \qquad (6.3)$$
where $n \in \mathbb{N}$ and $\varphi_1,\dots,\varphi_m$ are functions of $(t_1,\dots,t_n)$ which have continuous partial derivatives $\frac{\partial\varphi_i}{\partial t_j}$. Let
$$F(t_1,\dots,t_n) = f(\varphi_1(t_1,\dots,t_n),\dots,\varphi_m(t_1,\dots,t_n)).$$
Then
$$\frac{\partial F}{\partial t_j} = \frac{\partial f}{\partial x_1}\frac{\partial\varphi_1}{\partial t_j} + \cdots + \frac{\partial f}{\partial x_m}\frac{\partial\varphi_m}{\partial t_j}, \qquad (6.4)$$
where $j = 1,\dots,n$. In terms of matrix notation, the chain rule may be put in the following form:
$$\left(\frac{\partial F}{\partial t_1},\dots,\frac{\partial F}{\partial t_n}\right) = \left(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_m}\right)\begin{pmatrix} \frac{\partial\varphi_1}{\partial t_1} & \cdots & \frac{\partial\varphi_1}{\partial t_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial\varphi_m}{\partial t_1} & \cdots & \frac{\partial\varphi_m}{\partial t_n} \end{pmatrix}, \qquad (6.5)$$
where the $m\times n$ matrix on the right-hand side of (6.5) is called the first order total derivative associated with the transformation (6.3), denoted by $D(\varphi_1,\dots,\varphi_m)$. A careful study of total derivatives for vector valued functions such as (6.3) will be the topic of Part A Multi-Variable Calculus (Trinity Term in your second year).
Example 6.8 Consider $u = x^y$ where $x = \varphi(t)$ and $y = \psi(t)$, so that $u(t) = \varphi(t)^{\psi(t)}$. According to the chain rule,
$$u'(t) = \frac{\partial u}{\partial x}\varphi'(t) + \frac{\partial u}{\partial y}\psi'(t) = \varphi'(t)\,yx^{y-1} + \psi'(t)\,x^y\ln x = \varphi'\psi\varphi^{\psi-1} + \psi'\varphi^{\psi}\ln\varphi.$$
Example 6.9 Let $u = f(x,y,z)$ have continuous partial derivatives, and let $x = \eta + \zeta$, $y = \xi + \zeta$ and $z = \xi + \eta$. Work out the matrix of the first order total derivative of the transformation:
$$\begin{pmatrix} \frac{\partial x}{\partial\xi} & \frac{\partial x}{\partial\eta} & \frac{\partial x}{\partial\zeta} \\ \frac{\partial y}{\partial\xi} & \frac{\partial y}{\partial\eta} & \frac{\partial y}{\partial\zeta} \\ \frac{\partial z}{\partial\xi} & \frac{\partial z}{\partial\eta} & \frac{\partial z}{\partial\zeta} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},$$
so, by the chain rule,
$$\left(\frac{\partial u}{\partial\xi}, \frac{\partial u}{\partial\eta}, \frac{\partial u}{\partial\zeta}\right) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} = \left(\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z},\ \frac{\partial f}{\partial x} + \frac{\partial f}{\partial z},\ \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\right),$$
that is,
$$\frac{\partial u}{\partial\xi} = \frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}, \qquad \frac{\partial u}{\partial\eta} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}, \qquad \frac{\partial u}{\partial\zeta} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}.$$
Using the chain rule again, we have
$$\frac{\partial^2 u}{\partial\eta\partial\xi} = \frac{\partial}{\partial\eta}\left(\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\right) = \frac{\partial^2 f}{\partial y\partial x} + \frac{\partial^2 f}{\partial y\partial z} + \frac{\partial^2 f}{\partial z\partial x} + \frac{\partial^2 f}{\partial z^2}.$$
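The first-derivative formulas can be confirmed for a sample smooth $f$; in this sketch (added editorially) the transformation is taken as $x = \eta+\zeta$, $y = \xi+\zeta$, $z = \xi+\eta$, the choice that reproduces the $0$/$1$ Jacobian matrix shown:

```python
import sympy as sp

xi, eta, zeta, X, Y, Z = sp.symbols('xi eta zeta X Y Z')
# A sample smooth f to test the chain-rule formulas
f = X**2 * Y + sp.sin(Z) * X

x, y, z = eta + zeta, xi + zeta, xi + eta   # assumed transformation
u = f.subs({X: x, Y: y, Z: z})

fy_plus_fz = (f.diff(Y) + f.diff(Z)).subs({X: x, Y: y, Z: z})
assert sp.simplify(u.diff(xi) - fy_plus_fz) == 0    # ∂u/∂ξ = f_y + f_z
fx_plus_fz = (f.diff(X) + f.diff(Z)).subs({X: x, Y: y, Z: z})
assert sp.simplify(u.diff(eta) - fx_plus_fz) == 0   # ∂u/∂η = f_x + f_z
print("chain rule of Example 6.9 verified for a sample f")
```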
The idea used in the previous example can be applied to evaluating derivatives of implicit functions. Suppose that $y = y(x)$ is a function of $x$ implicitly given by an equation
$$F(x,y) = 0.$$
In order to solve for $y$ from the equation, determining a function $y = y(x)$ at least locally, we need to impose some conditions. Let us assume that the partial derivatives of $F$ (with $x, y$ considered as independent variables) exist and are continuous. To find the derivative $\frac{dy}{dx}$, we differentiate both sides of the equation $F(x,y) = 0$ in $x$, keeping in mind that $y = y(x)$ is a function of $x$. Then
$$\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{dy}{dx} = 0.$$
The left-hand side is the result of applying the chain rule to $F$ with $x = x$ and $y = y(x)$. In order to be able to solve for $\frac{dy}{dx}$ we need to assume that $F_y \neq 0$, and we obtain
$$\frac{dy}{dx} = -\frac{F_x}{F_y}.$$
In fact, the condition $F_y \neq 0$ ensures that the equation $F(x,y) = 0$ determines locally a function $y = y(x)$.
The previous idea applies to implicit functions of several variables. For example, if $z = z(x,y)$ is a function implicitly given by the equation
$$F(x,y,z) = 0,$$
and if the partial derivatives of $F$ (considering $x, y, z$ as independent variables) are continuous and $F_z \neq 0$, then, taking the derivative in $x$ while holding $y$ constant, we obtain
$$F_x + F_z\frac{\partial z}{\partial x} = 0, \qquad (6.6)$$
so that
$$\frac{\partial z}{\partial x} = -\frac{F_x}{F_z}.$$
Similarly we have
$$\frac{\partial z}{\partial y} = -\frac{F_y}{F_z}.$$
To compute the second partial derivatives, we continue the same procedure. Taking the derivative of both sides of (6.6) in $y$, we obtain
$$F_{xy} + F_{xz}\frac{\partial z}{\partial y} + \left(F_{zy} + F_{zz}\frac{\partial z}{\partial y}\right)\frac{\partial z}{\partial x} + F_z\frac{\partial^2 z}{\partial x\partial y} = 0,$$
and solving for $\frac{\partial^2 z}{\partial x\partial y}$ we obtain
$$\frac{\partial^2 z}{\partial x\partial y} = -\frac{F_{xy} + F_{xz}\frac{\partial z}{\partial y} + \left(F_{zy} + F_{zz}\frac{\partial z}{\partial y}\right)\frac{\partial z}{\partial x}}{F_z},$$
etc., though the results become increasingly complicated.
Finally we mention that the same idea applies to systems of several equations in several variables. For example, from the system
$$F(x,y,z) = 0, \qquad G(x,y,z) = 0, \qquad (6.7)$$
we hope to solve for $y$ and $z$ in terms of the variable $x$, thus $y = y(x)$ and $z = z(x)$. Saying that $y(x)$ and $z(x)$ are solutions means that if we substitute $(y,z)$ in the system (6.7) by $(y(x),z(x))$, then
$$F(x,y(x),z(x)) = 0, \qquad G(x,y(x),z(x)) = 0 \qquad (6.8)$$
hold identically over the range of $x$. Therefore, taking derivatives in $x$ on both sides of the equations and employing the chain rule, we have
$$F_x + F_y\frac{dy}{dx} + F_z\frac{dz}{dx} = 0, \qquad G_x + G_y\frac{dy}{dx} + G_z\frac{dz}{dx} = 0, \qquad (6.9)$$
which is a linear system in $(\frac{dy}{dx}, \frac{dz}{dx})$, and can be put in matrix form, namely
$$\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix}\begin{pmatrix} \frac{dy}{dx} \\ \frac{dz}{dx} \end{pmatrix} = -\begin{pmatrix} F_x \\ G_x \end{pmatrix}. \qquad (6.10)$$
We may solve for $\frac{dy}{dx}$ and $\frac{dz}{dx}$ as long as
$$\det\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix} = F_yG_z - F_zG_y \neq 0. \qquad (6.11)$$
[In fact, near a point $(x,y,z)$ where $F_yG_z - F_zG_y \neq 0$, one can show that $(y,z)$ can be solved from the system (6.7) at least locally, which is part of the conclusion of the so-called Implicit Function Theorem. The proper formulation and proof of this theorem will be a topic for the Part A option Multi-Variable Calculus in Hilary term.] Indeed, since under the condition (6.11) the matrix
$$\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix}$$
is invertible, we have
$$\begin{pmatrix} \frac{dy}{dx} \\ \frac{dz}{dx} \end{pmatrix} = -\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix}^{-1}\begin{pmatrix} F_x \\ G_x \end{pmatrix} = -\frac{1}{F_yG_z - F_zG_y}\begin{pmatrix} G_z & -F_z \\ -G_y & F_y \end{pmatrix}\begin{pmatrix} F_x \\ G_x \end{pmatrix} = \frac{1}{F_yG_z - F_zG_y}\begin{pmatrix} G_xF_z - G_zF_x \\ G_yF_x - F_yG_x \end{pmatrix}.$$
Hence
$$\frac{dy}{dx} = \frac{G_xF_z - G_zF_x}{F_yG_z - F_zG_y} = -\det\begin{pmatrix} F_x & F_z \\ G_x & G_z \end{pmatrix}\Big/\det\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix}$$
and
$$\frac{dz}{dx} = \frac{F_xG_y - F_yG_x}{F_yG_z - F_zG_y} = -\det\begin{pmatrix} F_y & F_x \\ G_y & G_x \end{pmatrix}\Big/\det\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix}.$$
Example 6.11 Let $y = y(x)$ and $z = z(x)$ be the functions satisfying the equations
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \qquad\text{and}\qquad x + y + z = 0.$$
Find the derivatives $\frac{dy}{dx}$ and $\frac{dz}{dx}$.
We may differentiate the equations in $x$, keeping in mind that $y$ and $z$ are functions of $x$; according to the chain rule,
$$\frac{2x}{a^2} + \frac{2y}{b^2}\frac{dy}{dx} + \frac{2z}{c^2}\frac{dz}{dx} = \frac{d}{dx}1 = 0 \qquad (6.12)$$
and
$$1 + \frac{dy}{dx} + \frac{dz}{dx} = \frac{d}{dx}0 = 0. \qquad (6.13)$$
From (6.13) we obtain
$$\frac{dz}{dx} = -1 - \frac{dy}{dx},$$
and substituting this into (6.12) gives
$$\frac{x}{a^2} + \frac{y}{b^2}\frac{dy}{dx} + \frac{z}{c^2}\left(-1 - \frac{dy}{dx}\right) = 0,$$
from which we may solve for $\frac{dy}{dx}$; hence
$$\frac{dy}{dx} = \frac{\frac{z}{c^2} - \frac{x}{a^2}}{\frac{y}{b^2} - \frac{z}{c^2}}, \qquad \frac{dz}{dx} = \frac{\frac{y}{b^2} - \frac{x}{a^2}}{\frac{z}{c^2} - \frac{y}{b^2}},$$
at those points where $\frac{y}{b^2} - \frac{z}{c^2} \neq 0$.
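Implicit differentiation of this kind is exactly the linear-system computation described above, and sympy can carry it out directly (an editorial sketch):

```python
import sympy as sp

x = sp.symbols('x')
a, b, c = sp.symbols('a b c', positive=True)
y = sp.Function('y')
z = sp.Function('z')

eq1 = x**2/a**2 + y(x)**2/b**2 + z(x)**2/c**2 - 1
eq2 = x + y(x) + z(x)

# Differentiate both constraints and solve the linear system for y', z'
sol = sp.solve([eq1.diff(x), eq2.diff(x)],
               [y(x).diff(x), z(x).diff(x)])

dy_expected = (z(x)/c**2 - x/a**2) / (y(x)/b**2 - z(x)/c**2)
assert sp.simplify(sol[y(x).diff(x)] - dy_expected) == 0
print("dy/dx of Example 6.11 verified")
```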
The symbol $\nabla$. In the $n$-dimensional Euclidean space $\mathbb{R}^n$, the symbol $\nabla$ means total differentiation. Under the Cartesian coordinate system $(x_1,\dots,x_n)$, $\nabla$ denotes the total derivative
$$\nabla = \left(\frac{\partial}{\partial x_1},\dots,\frac{\partial}{\partial x_n}\right).$$
When $\nabla$ is applied to a function $f(x_1,\dots,x_n)$ with continuous partial derivatives, $\nabla f$ means the total derivative
$$\nabla f = \left(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\right),$$
called the gradient (vector field) of $f$. $\nabla f$ may be considered a function taking values in $\mathbb{R}^n$ (such a function is called a vector-valued function, and is also called a vector field on $\mathbb{R}^n$ in this special case where the number of component functions equals the dimension $n$ of $\mathbb{R}^n$).
On the other hand, if
$$u = (u_1,\dots,u_n)$$
is a function of $n$ variables defined on $U \subset \mathbb{R}^n$, taking values in $\mathbb{R}^n$ (such a vector valued function $u$ is called a vector field on $U$), then we may take the dot product of $\nabla$ and $u$ to obtain a real valued function:
$$\nabla\cdot u = \left(\frac{\partial}{\partial x_1},\dots,\frac{\partial}{\partial x_n}\right)\cdot\left(u_1,\dots,u_n\right) = \frac{\partial u_1}{\partial x_1} + \cdots + \frac{\partial u_n}{\partial x_n},$$
which is called the divergence of the vector field $u$.
We have seen that if $f(x_1,\dots,x_n)$ is a scalar function on $U \subset \mathbb{R}^n$, then its gradient $\nabla f$ is a vector field, so we may take the dot product of $\nabla$ with $\nabla f$ to obtain
$$\nabla\cdot\nabla f = \left(\frac{\partial}{\partial x_1},\dots,\frac{\partial}{\partial x_n}\right)\cdot\left(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\right) = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_n^2},$$
which is called the Laplacian of $f$, denoted by $\Delta f$. Thus we introduce the second order differential operator
$$\Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2},$$
called the Laplace operator on $\mathbb{R}^n$. We extend the operation of $\nabla$ to vector valued functions as follows. Suppose
$$f = (f_1,\dots,f_m).$$
Curl operator $\nabla \times$. In 3-dimensional Euclidean space $\mathbb{R}^3$, besides the dot product there is another multiplication between two vectors, called the cross product. Recall that, under the Cartesian coordinate system $(x, y, z)$, if $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$, then the cross product $a \times b$ is defined by

$$a \times b = \begin{vmatrix} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} i - \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} j + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} k = (a_2 b_3 - a_3 b_2,\; a_3 b_1 - a_1 b_3,\; a_1 b_2 - a_2 b_1),$$

where $i, j, k$ are the standard basis of $\mathbb{R}^3$: $i = (1,0,0)$, $j = (0,1,0)$ and $k = (0,0,1)$. $a \times b$ is the unique vector which is perpendicular to both $a$ and $b$, obeying the right-hand rule, with magnitude $|a \times b| = |a||b|\sin\theta(a,b)$, where $0 \le \theta(a,b) \le \pi$ is the angle between $a$ and $b$.

We apply this definition by replacing $a$ with $\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)$ and $b$ with a vector field $u = (u_1, u_2, u_3)$, where $u_1, u_2, u_3$ are functions on $U \subseteq \mathbb{R}^3$ with continuous partial derivatives, and define the curl of the vector field $u$ by

$$\nabla \times u = \begin{vmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ u_1 & u_2 & u_3 \end{vmatrix} = \left(\frac{\partial u_3}{\partial y} - \frac{\partial u_2}{\partial z},\; \frac{\partial u_1}{\partial z} - \frac{\partial u_3}{\partial x},\; \frac{\partial u_2}{\partial x} - \frac{\partial u_1}{\partial y}\right).$$

$\nabla \times u$ is again a vector field on $U \subseteq \mathbb{R}^3$, also called the vorticity of $u$.
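The three operators are easy to experiment with symbolically. The sketch below (using sympy, an assumption of this example, with an arbitrary test function and test field) implements grad, div and curl from the formulas above, and checks the classical identities $\nabla \times (\nabla f) = 0$ and $\nabla \cdot (\nabla \times u) = 0$.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def grad(f):
    return sp.Matrix([f.diff(x), f.diff(y), f.diff(z)])

def div(u):
    return u[0].diff(x) + u[1].diff(y) + u[2].diff(z)

def curl(u):
    # Components read off the determinant definition of the curl.
    return sp.Matrix([u[2].diff(y) - u[1].diff(z),
                      u[0].diff(z) - u[2].diff(x),
                      u[1].diff(x) - u[0].diff(y)])

f = x**2 * y + sp.sin(z)                       # arbitrary scalar test function
u = sp.Matrix([y*z, x*z**2, x*y*sp.exp(z)])    # arbitrary vector test field

assert curl(grad(f)).applyfunc(sp.simplify) == sp.zeros(3, 1)
assert sp.simplify(div(curl(u))) == 0
```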
The determinant of the first-order total derivative (the Jacobian matrix),

$$\det \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\[4pt] \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u},$$

is called the Jacobian of the transformation, which will be denoted by $\frac{\partial(x,y)}{\partial(u,v)}$, i.e.

$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\[4pt] \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix},$$

which is the density of area elements in the new coordinate system $(u,v)$ in the following sense. Suppose the transformation $u = u(x,y)$, $v = v(x,y)$ sends a domain $U$ in the $xy$-plane one-to-one and onto a domain $D$ in the $uv$-plane. Then

$$\iint_U f(x,y)\,dx\,dy = \iint_D f(x(u,v), y(u,v)) \left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv.$$

That is to say, under the transformation $(x,y) \to (u,v)$, the area element $dx\,dy$ in the $xy$-plane is equivalent to $\left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv$, where $du\,dv$ is the area element in the $uv$-plane.
Example 6.13 (Parabolic coordinate system) The coordinates $(u,v)$ given by the relations

$$x = \frac{1}{2}(u^2 - v^2), \qquad y = uv$$

are called the parabolic coordinates in the plane. The Jacobian matrix and Jacobian are given by

$$\begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\[4pt] \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix} = \begin{pmatrix} u & -v \\ v & u \end{pmatrix}, \qquad \frac{\partial(x,y)}{\partial(u,v)} = u^2 + v^2,$$

and

$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} = \frac{1}{u^2 + v^2}\left(\frac{\partial^2}{\partial u^2} + \frac{\partial^2}{\partial v^2}\right).$$
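Both claims of Example 6.13 can be verified symbolically. The sketch below uses sympy (an assumption of this example) and checks the Laplacian relation on the test function $f = X^2 + Y^2$, whose Cartesian Laplacian is $4$; the choice of test function is arbitrary.

```python
import sympy as sp

u, v = sp.symbols('u v', positive=True)
x = (u**2 - v**2) / 2
y = u * v

# Jacobian of the parabolic coordinates.
J = sp.Matrix([[x.diff(u), x.diff(v)],
               [y.diff(u), y.diff(v)]])
assert sp.simplify(J.det()) == u**2 + v**2

# Check (F_uu + F_vv)/(u^2 + v^2) == Cartesian Laplacian, here 4.
F = x**2 + y**2
assert sp.simplify((F.diff(u, 2) + F.diff(v, 2)) / (u**2 + v**2)) == 4
```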
For the polar coordinates $x = r\cos\theta$, $y = r\sin\theta$, the Jacobian matrix of the inverse transformation $(x,y) \mapsto (r,\theta)$ is given by

$$\begin{pmatrix} \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} \\[4pt] \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} \end{pmatrix} = \begin{pmatrix} \frac{x}{\sqrt{x^2+y^2}} & \frac{y}{\sqrt{x^2+y^2}} \\[4pt] -\frac{y}{x^2+y^2} & \frac{x}{x^2+y^2} \end{pmatrix},$$

so that its Jacobian $\frac{\partial(r,\theta)}{\partial(x,y)} = \frac{1}{\sqrt{x^2+y^2}} = \frac{1}{r}$. Hence

$$\frac{\partial(x,y)}{\partial(r,\theta)} \cdot \frac{\partial(r,\theta)}{\partial(x,y)} = 1.$$
If $f(x,y)$ is a function with continuous partial derivatives, and $F(r,\theta) = f(r\cos\theta, r\sin\theta)$, then

$$\begin{cases} \dfrac{\partial F}{\partial r} = \cos\theta\,\dfrac{\partial f}{\partial x} + \sin\theta\,\dfrac{\partial f}{\partial y}, \\[8pt] \dfrac{\partial F}{\partial \theta} = -r\sin\theta\,\dfrac{\partial f}{\partial x} + r\cos\theta\,\dfrac{\partial f}{\partial y}. \end{cases} \tag{6.14}$$

It is also useful to express $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ in terms of $\frac{\partial F}{\partial r}$ and $\frac{\partial F}{\partial \theta}$, which can be achieved by solving the above linear system for $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$:

$$\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) = \left(\frac{\partial F}{\partial r}, \frac{\partial F}{\partial \theta}\right) \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}^{-1} = \frac{1}{r}\left(\frac{\partial F}{\partial r}, \frac{\partial F}{\partial \theta}\right) \begin{pmatrix} r\cos\theta & r\sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = \left(\cos\theta\,\frac{\partial F}{\partial r} - \frac{\sin\theta}{r}\frac{\partial F}{\partial \theta},\; \sin\theta\,\frac{\partial F}{\partial r} + \frac{\cos\theta}{r}\frac{\partial F}{\partial \theta}\right);$$

that is,

$$\begin{cases} \dfrac{\partial f}{\partial x} = \cos\theta\,\dfrac{\partial F}{\partial r} - \dfrac{\sin\theta}{r}\dfrac{\partial F}{\partial \theta}, \\[8pt] \dfrac{\partial f}{\partial y} = \sin\theta\,\dfrac{\partial F}{\partial r} + \dfrac{\cos\theta}{r}\dfrac{\partial F}{\partial \theta}. \end{cases} \tag{6.15}$$

Recall that

$$\Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}.$$
We wish to work out the Laplace operator in the polar coordinate system. To this end, we continue to compute the second-order partial derivatives. In fact,

$$\begin{aligned}
\frac{\partial^2 f}{\partial r^2} &= \frac{\partial}{\partial r}\left(\cos\theta\,\frac{\partial f}{\partial x} + \sin\theta\,\frac{\partial f}{\partial y}\right) = \cos\theta\,\frac{\partial}{\partial r}\frac{\partial f}{\partial x} + \sin\theta\,\frac{\partial}{\partial r}\frac{\partial f}{\partial y} \\
&= \cos\theta\left(\cos\theta\,\frac{\partial^2 f}{\partial x^2} + \sin\theta\,\frac{\partial^2 f}{\partial y \partial x}\right) + \sin\theta\left(\cos\theta\,\frac{\partial^2 f}{\partial x \partial y} + \sin\theta\,\frac{\partial^2 f}{\partial y^2}\right) \\
&= \cos^2\theta\,\frac{\partial^2 f}{\partial x^2} + 2\sin\theta\cos\theta\,\frac{\partial^2 f}{\partial x \partial y} + \sin^2\theta\,\frac{\partial^2 f}{\partial y^2}
\end{aligned}$$

and

$$\begin{aligned}
\frac{\partial^2 f}{\partial \theta^2} &= \frac{\partial}{\partial \theta}\left(-r\sin\theta\,\frac{\partial f}{\partial x} + r\cos\theta\,\frac{\partial f}{\partial y}\right) \\
&= -r\cos\theta\,\frac{\partial f}{\partial x} - r\sin\theta\,\frac{\partial}{\partial \theta}\frac{\partial f}{\partial x} - r\sin\theta\,\frac{\partial f}{\partial y} + r\cos\theta\,\frac{\partial}{\partial \theta}\frac{\partial f}{\partial y} \\
&= -r\cos\theta\,\frac{\partial f}{\partial x} - r\sin\theta\left(-r\sin\theta\,\frac{\partial^2 f}{\partial x^2} + r\cos\theta\,\frac{\partial^2 f}{\partial x \partial y}\right) \\
&\qquad - r\sin\theta\,\frac{\partial f}{\partial y} + r\cos\theta\left(-r\sin\theta\,\frac{\partial^2 f}{\partial y \partial x} + r\cos\theta\,\frac{\partial^2 f}{\partial y^2}\right) \\
&= r^2\sin^2\theta\,\frac{\partial^2 f}{\partial x^2} - 2r^2\sin\theta\cos\theta\,\frac{\partial^2 f}{\partial x \partial y} + r^2\cos^2\theta\,\frac{\partial^2 f}{\partial y^2} - r\cos\theta\,\frac{\partial f}{\partial x} - r\sin\theta\,\frac{\partial f}{\partial y}.
\end{aligned}$$

Hence

$$\frac{\partial^2 f}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} - \frac{\cos\theta}{r}\frac{\partial f}{\partial x} - \frac{\sin\theta}{r}\frac{\partial f}{\partial y} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} - \frac{1}{r}\frac{\partial f}{\partial r};$$

in other words,

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2} + \frac{1}{r}\frac{\partial f}{\partial r}. \tag{6.17}$$

Therefore, under the polar coordinate system $(r,\theta)$ the Laplace operator is

$$\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2} + \frac{1}{r}\frac{\partial}{\partial r}. \tag{6.18}$$
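Formula (6.18) can be sanity-checked symbolically. The sketch below (using sympy, an assumption of this example) applies the polar form of the Laplacian to an arbitrary polynomial test function and compares it with the Cartesian Laplacian expressed in polar coordinates.

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = r * sp.cos(th)
y = r * sp.sin(th)

def laplacian_polar(F):
    # Formula (6.18).
    return F.diff(r, 2) + F.diff(th, 2) / r**2 + F.diff(r) / r

# Arbitrary polynomial test function, written in both coordinate systems.
X, Y = sp.symbols('X Y')
f = X**3 * Y
lap_cart = (f.diff(X, 2) + f.diff(Y, 2)).subs({X: x, Y: y})

F = f.subs({X: x, Y: y})
assert sp.simplify(laplacian_polar(F) - lap_cart) == 0
```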
In the cylindrical coordinate system,

$$x = r\cos\theta, \qquad y = r\sin\theta, \qquad z = z,$$

and the Jacobian $\frac{\partial(x,y,z)}{\partial(r,\theta,z)} = r$. The inverse transformation is given by

$$r = \sqrt{x^2 + y^2}, \qquad \tan\theta = \frac{y}{x}, \qquad z = z.$$

The Laplace operator in the cylindrical coordinates is

$$\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial z^2}. \tag{6.19}$$
Finally, let us consider the Laplace operator in $\mathbb{R}^3$,

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2},$$

and we wish to write the Laplace operator in the spherical coordinate system. Suppose $f(x,y,z)$ has continuous derivatives up to second order. First, we use the cylindrical coordinates $x = r\cos\theta$, $y = r\sin\theta$, $z = z$. Then, according to (6.17),

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2}. \tag{6.21}$$

Next, we use the change of variables $z = \rho\cos\varphi$ and $r = \rho\sin\varphi$. Notice that $(\rho, \varphi)$ are the polar coordinates for $(z, r)$; thus, according to (6.14),

$$\begin{cases} \dfrac{\partial f}{\partial \rho} = \cos\varphi\,\dfrac{\partial f}{\partial z} + \sin\varphi\,\dfrac{\partial f}{\partial r}, \\[8pt] \dfrac{\partial f}{\partial \varphi} = -\rho\sin\varphi\,\dfrac{\partial f}{\partial z} + \rho\cos\varphi\,\dfrac{\partial f}{\partial r}, \end{cases} \tag{6.22}$$

and, applying (6.17) to the pair $(z, r)$,

$$\Delta f = \frac{\partial^2 f}{\partial r^2} + \frac{\partial^2 f}{\partial z^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2} = \frac{\partial^2 f}{\partial \rho^2} + \frac{1}{\rho}\frac{\partial f}{\partial \rho} + \frac{1}{\rho^2}\frac{\partial^2 f}{\partial \varphi^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2}. \tag{6.24}$$

On the other hand, by solving for $\frac{\partial f}{\partial r}$ from (6.22) we have

$$\frac{\partial f}{\partial r} = \sin\varphi\,\frac{\partial f}{\partial \rho} + \frac{\cos\varphi}{\rho}\frac{\partial f}{\partial \varphi}, \tag{6.25}$$

and therefore, since $r = \rho\sin\varphi$,

$$\begin{aligned}
\Delta f &= \frac{\partial^2 f}{\partial \rho^2} + \frac{1}{\rho}\frac{\partial f}{\partial \rho} + \frac{1}{\rho^2}\frac{\partial^2 f}{\partial \varphi^2} + \frac{1}{\rho\sin\varphi}\left(\sin\varphi\,\frac{\partial f}{\partial \rho} + \frac{\cos\varphi}{\rho}\frac{\partial f}{\partial \varphi}\right) + \frac{1}{\rho^2\sin^2\varphi}\frac{\partial^2 f}{\partial \theta^2} \\
&= \frac{\partial^2 f}{\partial \rho^2} + \frac{1}{\rho^2}\frac{\partial^2 f}{\partial \varphi^2} + \frac{1}{\rho^2\sin^2\varphi}\frac{\partial^2 f}{\partial \theta^2} + \frac{2}{\rho}\frac{\partial f}{\partial \rho} + \frac{\cot\varphi}{\rho^2}\frac{\partial f}{\partial \varphi}.
\end{aligned}$$

That is, under the spherical coordinate system the Laplace operator in $\mathbb{R}^3$ can be written as

$$\Delta = \frac{\partial^2}{\partial \rho^2} + \frac{1}{\rho^2}\frac{\partial^2}{\partial \varphi^2} + \frac{1}{\rho^2\sin^2\varphi}\frac{\partial^2}{\partial \theta^2} + \frac{2}{\rho}\frac{\partial}{\partial \rho} + \frac{\cot\varphi}{\rho^2}\frac{\partial}{\partial \varphi}. \tag{6.26}$$
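Formula (6.26) can likewise be verified symbolically. The sketch below (using sympy, an assumption of this example) compares the spherical form against the Cartesian Laplacian on an arbitrary polynomial test function.

```python
import sympy as sp

rho, phi, th = sp.symbols('rho phi theta', positive=True)
x = rho * sp.sin(phi) * sp.cos(th)
y = rho * sp.sin(phi) * sp.sin(th)
z = rho * sp.cos(phi)

def laplacian_spherical(F):
    # Formula (6.26).
    return (F.diff(rho, 2) + F.diff(phi, 2) / rho**2
            + F.diff(th, 2) / (rho**2 * sp.sin(phi)**2)
            + 2 * F.diff(rho) / rho
            + sp.cot(phi) * F.diff(phi) / rho**2)

# Arbitrary polynomial test function with Cartesian Laplacian 2Z + 2.
X, Y, Z = sp.symbols('X Y Z')
f = X**2 * Z + Y**2
lap_cart = (f.diff(X, 2) + f.diff(Y, 2) + f.diff(Z, 2)).subs({X: x, Y: y, Z: z})

F = f.subs({X: x, Y: y, Z: z})
assert sp.simplify(laplacian_spherical(F) - lap_cart) == 0
```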
6.6 Some simple partial differential equations

An equation involving several variables, an unknown function and its partial derivatives is called a partial differential equation (abbreviated as PDE, or PDEs in the plural).

Example 6.14 $z = x^2 + y^2$ is a solution to the following PDE,

$$x\frac{\partial z}{\partial y} - y\frac{\partial z}{\partial x} = 0,$$

such that $z = y^2$ when $x = 0$.

Example 6.15 $z = y - x^2$ is a solution to the following PDE,

$$x\frac{\partial z}{\partial x} + (y + x^2)\frac{\partial z}{\partial y} = z,$$

which satisfies the condition that $z = y - 4$ when $x = 2$.
Example 6.16 (The heat equation) Consider the one-dimensional heat equation

$$\frac{\partial u(x,t)}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 u(x,t)}{\partial x^2},$$

where $\sigma > 0$ is a constant.

By inspection, we can see that the Gaussian probability density

$$u(x,t) = \frac{1}{\sqrt{2\pi\sigma^2 t}}\,e^{-\frac{x^2}{2\sigma^2 t}} \quad\text{for } t > 0$$

is a positive solution to the heat equation for $t > 0$.
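The claim that the Gaussian solves the heat equation reduces to a routine differentiation, which can be checked symbolically (the sketch below uses sympy, an assumption of this example):

```python
import sympy as sp

x, t, sigma = sp.symbols('x t sigma', positive=True)
u = sp.exp(-x**2 / (2 * sigma**2 * t)) / sp.sqrt(2 * sp.pi * sigma**2 * t)

# u_t - (sigma^2/2) u_xx should vanish identically for t > 0.
residual = u.diff(t) - sp.Rational(1, 2) * sigma**2 * u.diff(x, 2)
assert sp.simplify(residual) == 0
```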
Let us search for solutions $u(x,t)$ which are separable. To this end, make the substitution $u(x,t) = g(x)h(t)$. Since

$$\frac{\partial u(x,t)}{\partial t} = g(x)h'(t) \quad\text{and}\quad \frac{\partial^2 u(x,t)}{\partial x^2} = g''(x)h(t),$$

the heat equation becomes

$$g(x)h'(t) = \frac{\sigma^2}{2}g''(x)h(t).$$

Separate the variables to obtain

$$\frac{h'(t)}{h(t)} = \frac{\sigma^2}{2}\frac{g''(x)}{g(x)}.$$

Since $\frac{h'(t)}{h(t)}$ depends only on $t$, while the equation implies that it also depends only on $x$, $\frac{h'(t)}{h(t)}$ must be a constant function. Similarly, $\frac{g''(x)}{g(x)}$ is a constant independent of $x$ and $t$. Therefore we must have

$$\frac{h'(t)}{h(t)} = \frac{\sigma^2}{2}\frac{g''(x)}{g(x)} = \lambda,$$

where $\lambda$ is a constant. The heat equation is thus transformed into a system of linear ODEs:

$$h'(t) = \lambda h(t), \qquad g''(x) = \frac{2\lambda}{\sigma^2}g(x).$$

The first ODE has general solution $h(t) = C_1 e^{\lambda t}$, and the second ODE has general solution:

1. If $\lambda > 0$, then
$$g(x) = C_2 e^{\sqrt{2\lambda/\sigma^2}\,x} + C_3 e^{-\sqrt{2\lambda/\sigma^2}\,x}.$$

2. If $\lambda < 0$, then
$$g(x) = C_2 \cos\left(\sqrt{-2\lambda/\sigma^2}\,x\right) + C_3 \sin\left(\sqrt{-2\lambda/\sigma^2}\,x\right).$$

3. If $\lambda = 0$, then
$$g(x) = C_2 + C_3 x.$$
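Any product $g(x)h(t)$ from the three cases solves the heat equation. The sketch below (sympy assumed; the choice $\lambda = -\sigma^2 k^2/2$ and the constants are arbitrary) checks case 2:

```python
import sympy as sp

x, t, sigma, k = sp.symbols('x t sigma k', positive=True)

# Case 2 (lambda < 0): take lambda = -sigma^2 k^2 / 2, so sqrt(-2*lambda/sigma^2) = k,
# with arbitrary constants C2 = 1, C3 = 3.
lam = -sigma**2 * k**2 / 2
u = sp.exp(lam * t) * (sp.cos(k * x) + 3 * sp.sin(k * x))

residual = u.diff(t) - sp.Rational(1, 2) * sigma**2 * u.diff(x, 2)
assert sp.simplify(residual) == 0
```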
Consider a parameterized plane curve

$$x = x(t), \qquad y = y(t),$$

where $t \in (a,b)$ (some interval) is a parameter. Since $\frac{dy}{dx} = \frac{y'(t)}{x'(t)}$, the tangent line to the curve at a point $(x(t_0), y(t_0))$ has a parameterization

$$x = x(t_0) + x'(t_0)(t - t_0), \qquad y = y(t_0) + y'(t_0)(t - t_0),$$

which represents the line passing through $(x(t_0), y(t_0))$ with direction vector $(x'(t_0), y'(t_0))$.

A curve in $\mathbb{R}^2$ can also be described implicitly by an equation such as $F(x,y) = 0$. You should be familiar with the standard quadratic curves such as circles, ellipses, parabolas and hyperbolas (for a revision you may refer to Richard Earl's notes). For example, consider the ellipse

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1,$$

which has a parameterization defined by

$$x = a\cos t, \qquad y = b\sin t,$$

where $0 \le t < 2\pi$. A tangent vector at $(a\cos t, b\sin t)$ is thus given by $(-a\sin t, b\cos t)$.
A parameterized curve in the space $\mathbb{R}^3$ may be described by a vector-valued function of one variable $t$, i.e. a mapping $t \mapsto \gamma(t)$, where

$$\gamma(t) = (x(t), y(t), z(t)).$$

The tangent vector to the curve at $\gamma(t_0)$ is the vector $\gamma'(t_0) = (x'(t_0), y'(t_0), z'(t_0))$, and the line tangent to the curve at $(x(t_0), y(t_0), z(t_0))$ is given by the equations

$$x = x(t_0) + x'(t_0)(t - t_0),$$
$$y = y(t_0) + y'(t_0)(t - t_0),$$
$$z = z(t_0) + z'(t_0)(t - t_0).$$

Relabeling the variables, the graph of $z = f(x,y)$ is a parameterized surface defined by the mapping $(u,v) \mapsto (u, v, f(u,v))$, with $(u,v)$ as two parameters. In general, a mapping $(u,v) \mapsto \sigma(u,v)$, where $(u,v)$ runs through an open subset $U \subseteq \mathbb{R}^2$, is called a parameterized surface in the space $\mathbb{R}^3$. The mapping, or the parameterized surface, is often written as

$$\sigma(u,v) = (x(u,v), y(u,v), z(u,v)). \tag{7.4}$$

When $(u,v)$ runs through a subset $U$, then $(x(u,v), y(u,v), z(u,v))$ draws out a surface in the space: the image of $U$ under the mapping (7.4).
A surface $S$ may be described by an equation

$$F(x,y,z) = 0, \tag{7.5}$$

where, in order to avoid technical difficulty, we assume that $\nabla F \neq 0$. For example, the sphere $x^2 + y^2 + z^2 = R^2$ has a parameterized representation in terms of the spherical coordinates $(\varphi, \theta)$ [notice that the equation of the sphere in spherical coordinates takes the simple form $\rho = R$].

Recall that $\nabla F = (F_x, F_y, F_z)$ is the gradient vector field of $F$; so if $\gamma$ is a curve lying on $S$ through $P = (x_0, y_0, z_0) = \gamma(0)$, we may rewrite (7.10) as

$$\nabla F(x_0, y_0, z_0) \cdot \gamma'(0) = 0,$$

which says the tangent vector $\gamma'(0)$ is perpendicular to the gradient of $F$. Since $\gamma$ can be any curve on the surface $S$, $\gamma'(0)$ can be any vector tangent to the surface $S$ at $P$, so (7.7) means that any vector tangent to the surface $S$ at $P$ is perpendicular to the gradient vector $\nabla F(x_0, y_0, z_0)$. Therefore all tangent vectors to $S$ at the point $P$ lie in the plane passing through $P$ perpendicular to $\nabla F(x_0, y_0, z_0)$, which is called the tangent plane to $S$ at $P$. We therefore call $\nabla F(x_0, y_0, z_0)$ a normal vector to the surface $S$ at $P$.

Suppose that $(x,y,z)$ belongs to the tangent plane at $P$, so that $(x-x_0, y-y_0, z-z_0)$ lies in the tangent plane; it must then be perpendicular to the normal vector $\nabla F(x_0, y_0, z_0)$, thus

$$F_x(x_0,y_0,z_0)(x-x_0) + F_y(x_0,y_0,z_0)(y-y_0) + F_z(x_0,y_0,z_0)(z-z_0) = 0.$$
Consider, on the other hand, a parameterized surface $S$ described by a vector-valued function of two parameters $(u,v)$:

$$\sigma(u,v) = (x(u,v), y(u,v), z(u,v)).$$

Fix $(u_0, v_0)$, let $P = \sigma(u_0, v_0)$, and consider the curves $\gamma_1(u) = \sigma(u, v_0)$ and $\gamma_2(v) = \sigma(u_0, v)$, where $u$ (resp. $v$) is considered as a parameter. Then both curves $\gamma_1$ and $\gamma_2$ lie on the surface and pass through $P$, and the tangent vectors $\gamma_1'(u_0)$ and $\gamma_2'(v_0)$ are two tangent vectors to the parameterized surface $S$ at $P$. Thus, by definition, $\gamma_1'(u_0) \times \gamma_2'(v_0)$ (the cross product of the two vectors $\gamma_1'(u_0)$ and $\gamma_2'(v_0)$) is a vector perpendicular to both $\gamma_1'(u_0)$ and $\gamma_2'(v_0)$ [Geometry I, Prelims], and therefore $\gamma_1'(u_0) \times \gamma_2'(v_0)$ is a normal vector to the surface $S$. On the other hand, by the definition of partial derivatives,

$$\gamma_1' = \frac{\partial \sigma}{\partial u} = \left(\frac{\partial x}{\partial u}, \frac{\partial y}{\partial u}, \frac{\partial z}{\partial u}\right), \qquad \gamma_2' = \frac{\partial \sigma}{\partial v} = \left(\frac{\partial x}{\partial v}, \frac{\partial y}{\partial v}, \frac{\partial z}{\partial v}\right),$$

where $\sigma(u,v) = (x(u,v), y(u,v), z(u,v))$. Thus $\frac{\partial \sigma}{\partial u} \times \frac{\partial \sigma}{\partial v}$ is a normal vector to the parameterized surface $S$, which, according to the definition of the cross product, is given by

$$\frac{\partial \sigma}{\partial u} \times \frac{\partial \sigma}{\partial v} = \begin{vmatrix} i & j & k \\ \frac{\partial x}{\partial u} & \frac{\partial y}{\partial u} & \frac{\partial z}{\partial u} \\ \frac{\partial x}{\partial v} & \frac{\partial y}{\partial v} & \frac{\partial z}{\partial v} \end{vmatrix}.$$

The tangent plane to $S$ at $P = \sigma(u_0, v_0)$ therefore has the equation

$$\left(\frac{\partial \sigma}{\partial u}(u_0, v_0) \times \frac{\partial \sigma}{\partial v}(u_0, v_0)\right) \cdot (r - \sigma(u_0, v_0)) = 0, \tag{7.11}$$

where $r = (x,y,z)$ is the position vector of a general point in the tangent plane. In terms of the Cartesian coordinates, (7.11) can be written, by working out the dot product, as

$$\begin{vmatrix} x - x_0 & y - y_0 & z - z_0 \\ \frac{\partial x}{\partial u} & \frac{\partial y}{\partial u} & \frac{\partial z}{\partial u} \\ \frac{\partial x}{\partial v} & \frac{\partial y}{\partial v} & \frac{\partial z}{\partial v} \end{vmatrix} = 0, \tag{7.12}$$

where the partial derivatives are evaluated at $(u_0, v_0)$, and $(x_0, y_0, z_0) = \sigma(u_0, v_0)$.
Example 7.2 The sphere with radius $R > 0$ may be described implicitly by the equation

$$x^2 + y^2 + z^2 = R^2,$$

so, with $f(x,y,z) = x^2 + y^2 + z^2 - R^2$, a normal vector to the tangent plane at $(x_0, y_0, z_0)$ is $\nabla f(x_0,y_0,z_0) = 2(x_0, y_0, z_0)$, which has the same direction as the position vector, and the tangent plane has the equation

$$x_0(x - x_0) + y_0(y - y_0) + z_0(z - z_0) = 0.$$

Since the point $(x_0, y_0, z_0)$ lies on the sphere, the equation can be simplified to

$$x_0 x + y_0 y + z_0 z = R^2.$$

The sphere may also be given, via the spherical coordinates, as the parameterized surface

$$x = R\sin\varphi\cos\theta, \qquad y = R\sin\varphi\sin\theta, \qquad z = R\cos\varphi,$$

where $0 \le \varphi \le \pi$ and $0 \le \theta < 2\pi$; hence, by (7.12), the tangent plane at a point $(x_0, y_0, z_0)$ has the equation

$$\begin{vmatrix} x - x_0 & y - y_0 & z - z_0 \\ R\cos\varphi\cos\theta & R\cos\varphi\sin\theta & -R\sin\varphi \\ -R\sin\varphi\sin\theta & R\sin\varphi\cos\theta & 0 \end{vmatrix} = R^2 \begin{vmatrix} x - x_0 & y - y_0 & z - z_0 \\ \cos\varphi\cos\theta & \cos\varphi\sin\theta & -\sin\varphi \\ -\sin\varphi\sin\theta & \sin\varphi\cos\theta & 0 \end{vmatrix} = 0.$$
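As a quick check of Example 7.2 (sympy assumed, not part of the notes), the normal $\frac{\partial \sigma}{\partial \varphi} \times \frac{\partial \sigma}{\partial \theta}$ of the parameterized sphere should be parallel to the position vector, so their cross product must vanish:

```python
import sympy as sp

R, phi, th = sp.symbols('R phi theta', positive=True)
sigma = sp.Matrix([R * sp.sin(phi) * sp.cos(th),
                   R * sp.sin(phi) * sp.sin(th),
                   R * sp.cos(phi)])

# Normal vector: cross product of the two coordinate tangent vectors.
n = sigma.diff(phi).cross(sigma.diff(th))

# Parallel to the position vector <=> cross product with it is zero.
assert n.cross(sigma).applyfunc(sp.simplify) == sp.zeros(3, 1)
```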
Let

$$\gamma(t) = (x_0 + v_1 t,\; y_0 + v_2 t,\; z_0 + v_3 t)$$

be the line passing through $(x_0, y_0, z_0)$ with direction $v = (v_1, v_2, v_3)$. Then the derivative

$$\frac{d}{dt}\Big|_{t=0} F \circ \gamma(t) = \nabla F(x_0, y_0, z_0) \cdot v = v_1 F_x + v_2 F_y + v_3 F_z$$

is called the directional derivative of $F$ in the direction $v$, written

$$D_v F = \nabla F \cdot v.$$

By definition,

$$D_v F(x_0, y_0, z_0) = \lim_{t \to 0} \frac{F(x_0 + v_1 t,\, y_0 + v_2 t,\, z_0 + v_3 t) - F(x_0, y_0, z_0)}{t}.$$
The previous discussion can be stated as the following.

Proposition 7.3 Suppose that $F(x,y,z)$ is a function on an open subset $U \subseteq \mathbb{R}^3$ with continuous partial derivatives, and $\gamma(t)$ is a parameterized curve in $U$ with tangent vector $v$ at $\gamma(0)$, i.e. $\gamma'(0) = v$. Then

$$\frac{d}{dt}\Big|_{t=0} F \circ \gamma(t) = D_v F(\gamma(0)) = \nabla F(\gamma(0)) \cdot v. \tag{7.13}$$
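Proposition 7.3 can be checked on a concrete example (sympy assumed; the test function, direction and base point below are arbitrary choices, not from the notes):

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
F = x**2 * y + sp.sin(z)          # arbitrary test function
v = (2, -1, 3)                    # direction vector
p = {x: 1, y: 2, z: 0}            # base point

# Left-hand side of (7.13): differentiate F along the line gamma(t) = p + t v.
gamma = {x: 1 + v[0]*t, y: 2 + v[1]*t, z: 0 + v[2]*t}
lhs = F.subs(gamma).diff(t).subs(t, 0)

# Right-hand side: grad F . v at the base point.
rhs = sum(F.diff(s).subs(p) * vi for s, vi in zip((x, y, z), v))
assert sp.simplify(lhs - rhs) == 0
```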
8 Taylor's theorem

Suppose that $f(x)$ is a function defined on $[a,b]$ with derivatives of all orders. For a given natural number $n$ we search for a polynomial in $(x-a)$ of degree $n$,

$$p_n(x) = a_0 + a_1(x-a) + \cdots + a_n(x-a)^n,$$

such that $f(x)$ agrees with $p_n(x)$ up to $n$th-order derivatives at $a$, that is, $f^{(k)}(a) = p_n^{(k)}(a)$ for $k = 0, 1, \ldots, n$. Since $p_n^{(k)}(a) = k!\,a_k$ for $k = 0, \ldots, n$, we obtain $a_k = \frac{1}{k!}f^{(k)}(a)$, and therefore

$$p_n(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n, \tag{8.1}$$

which is called the Taylor expansion (of order $n$) of $f$ at the point $a$. We have the following theorem, which will be proved in Prelims Analysis II in Hilary term.

Theorem 8.1 (Taylor's theorem for a one-variable function) Suppose $f(x)$ has derivatives at $a$ up to $n$th order. Then

$$f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + o((x-a)^n) \tag{8.2}$$

as $x \to a$ [the right-hand side is called the Taylor expansion of $f$ at $a$ with Peano's remainder]. That is,

$$\lim_{x \to a} \frac{f(x) - p_n(x)}{(x-a)^n} = 0.$$
Taylor's theorem says the Taylor expansion of $n$th order is a good approximation of $f$ near $a$ up to $(x-a)^n$.

We can get a better estimate for the difference $f(x) - p_n(x)$ if $f$ has derivatives on $[a,b]$ up to $(n+1)$th order. Namely, we have

Theorem 8.2 (Taylor's theorem) Suppose $f(x)$ has derivatives on $[a,b]$ up to $(n+1)$th order. Then for any $x \in (a,b]$ there is $\xi \in (a,x)$ such that

$$f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-a)^{n+1}.$$

For example, we can easily see that

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n \frac{x^{2n}}{(2n)!} + \cdots, \qquad x \in (-\infty, \infty).$$
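The cosine expansion can be tried numerically. The sketch below (plain Python, not part of the notes) sums the Taylor polynomial and compares it with the library cosine at a few arbitrary points:

```python
import math

def cos_taylor(x, n):
    """Taylor polynomial of cos at 0, truncated after the x^(2n) term."""
    return sum((-1)**k * x**(2*k) / math.factorial(2*k) for k in range(n + 1))

# The expansion converges to cos x for every real x.
for x in (0.5, 2.0, -3.0):
    assert abs(cos_taylor(x, 20) - math.cos(x)) < 1e-12
```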
Let us now consider a function $f(x,y)$ of two variables defined on an open subset $U$. Suppose $(x_0, y_0) \in U$. We search for a Taylor-type expansion of $f(x,y)$ near $(x_0, y_0)$. Let us assume that $f$ has continuous partial derivatives up to order $n$. Let $(x,y) \in U$ be close to $(x_0, y_0)$, so that the line segment

$$\gamma(t) = (x_0, y_0) + t(x - x_0, y - y_0)$$

(where $t \in [0,1]$) between $(x_0, y_0)$ and $(x,y)$ lies in $U$. Consider the one-variable function $g(t) = f(\gamma(t))$. By the chain rule,

$$g'(0) = \nabla f(x_0, y_0) \cdot (x - x_0, y - y_0) = \frac{\partial f(x_0,y_0)}{\partial x}(x - x_0) + \frac{\partial f(x_0,y_0)}{\partial y}(y - y_0),$$
so that, by differentiating in $t$ again, we obtain

$$g''(t) = \frac{\partial^2 f(\gamma(t))}{\partial x^2}(x-x_0)^2 + 2\frac{\partial^2 f(\gamma(t))}{\partial x \partial y}(x-x_0)(y-y_0) + \frac{\partial^2 f(\gamma(t))}{\partial y^2}(y-y_0)^2,$$

and from this we can see the pattern for the $k$th derivative, namely

$$g^{(k)}(t) = \sum_{\substack{i+j=k \\ i,j \ge 0}} \binom{k}{i} \frac{\partial^k f(\gamma(t))}{\partial x^i \partial y^j}(x-x_0)^i (y-y_0)^j,$$

and therefore

$$g^{(k)}(0) = \sum_{\substack{i+j=k \\ i,j \ge 0}} \frac{k!}{i!j!} \frac{\partial^k f(x_0,y_0)}{\partial x^i \partial y^j}(x-x_0)^i (y-y_0)^j. \tag{8.4}$$

Applying Taylor's theorem to $g$ gives an expansion of $f$ valid as $(x,y) \to (x_0, y_0)$. If $f$ has partial derivatives on $U$ up to $(n+1)$th order, and the segment between $(x_0, y_0)$ and $(x,y)$ lies in $U$, then there is $\tau \in (0,1)$ (depending on $n$, $(x_0,y_0)$, $(x,y)$ and the function $f$) such that

$$f(x,y) = f(x_0,y_0) + \sum_{k=1}^n \sum_{\substack{i+j=k \\ i,j \ge 0}} \frac{1}{i!j!} \frac{\partial^k f(x_0,y_0)}{\partial x^i \partial y^j}(x-x_0)^i (y-y_0)^j + \sum_{\substack{i+j=n+1 \\ i,j \ge 0}} \frac{1}{i!j!} \frac{\partial^{n+1} f(\xi)}{\partial x^i \partial y^j}(x-x_0)^i (y-y_0)^j, \tag{8.6}$$

where

$$\xi = \tau(x_0, y_0) + (1 - \tau)(x, y).$$

The right-hand side of (8.6) is called the Taylor expansion of the two-variable function $f(x,y)$ at $(x_0,y_0)$. To memorize this formula, you should compare it with the binomial expansion

$$(a+b)^k = \sum_{\substack{i+j=k \\ i,j \ge 0}} \frac{k!}{i!j!} a^i b^j,$$

which corresponds to the $k$th-derivative term in the Taylor expansion. But notice that the combinatorial numbers in the binomial expansion are $\frac{k!}{i!j!}$, while in the Taylor expansion they turn out to be $\frac{1}{i!j!}$ (the $k!$ is cancelled by the $\frac{1}{k!}$ in Taylor's theorem applied to $g$).

It is particularly interesting to take $n = 1$. For simplicity, suppose $U = B_R(x_0,y_0)$ is an open disk centered at $(x_0,y_0)$ with radius $R > 0$, and suppose all first and second partial derivatives of $f$ are continuous on $U$. Then for any $(x,y) \in U$ there is $\xi \in U$ on the segment from $(x_0,y_0)$ to $(x,y)$ such that

$$f(x,y) = f(x_0,y_0) + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0) + \frac{1}{2}\left[f_{xx}(\xi)(x-x_0)^2 + 2f_{xy}(\xi)(x-x_0)(y-y_0) + f_{yy}(\xi)(y-y_0)^2\right].$$

For functions of several variables there is an analogous expansion, whose coefficients involve the multinomial numbers

$$\binom{n}{i_1, \ldots, i_k} = \frac{n!}{i_1! \cdots i_k!}.$$
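Formula (8.6) is easy to implement. The sketch below (sympy assumed; the test function and expansion point are arbitrary choices) builds the order-2 Taylor polynomial and checks numerically that the error is of higher order than $h^2$:

```python
import sympy as sp
from math import factorial

x, y = sp.symbols('x y')
f = sp.exp(x) * sp.sin(y)         # arbitrary smooth test function
x0, y0 = 0, sp.pi / 4

def taylor2(f, x0, y0, n):
    """Two-variable Taylor polynomial of order n at (x0, y0), following (8.6)."""
    p = 0
    for k in range(n + 1):
        for i in range(k + 1):
            j = k - i
            coeff = f.diff(x, i).diff(y, j).subs({x: x0, y: y0})
            p += coeff * (x - x0)**i * (y - y0)**j / (factorial(i) * factorial(j))
    return p

# The order-2 polynomial approximates f with an error O(h^3) for smooth f.
p2 = taylor2(f, x0, y0, 2)
h = 1e-4
err = (f - p2).subs({x: x0 + h, y: y0 + h})
assert abs(float(err)) < 1e-10
```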
9 Critical points

In this part we apply Taylor's theorem to the study of multi-variable functions near critical points. For simplicity, we concentrate on two-variable functions, though the techniques we are going to develop apply to functions of several variables with the necessary modifications.

First of all we introduce the notions of local extrema. Let $f(x,y)$ be a function defined on a subset $A \subseteq \mathbb{R}^2$. Then a point $(x_0,y_0) \in A$ is a local maximum (resp. local minimum) of $f$ if there is an open ball $B_r(x_0,y_0) \subseteq A$ for some $r > 0$ such that

$$f(x,y) \le f(x_0,y_0) \qquad \forall (x,y) \in B_r(x_0,y_0) \tag{9.1}$$

(resp.

$$f(x,y) \ge f(x_0,y_0) \qquad \forall (x,y) \in B_r(x_0,y_0)). \tag{9.2}$$

On the other hand, we say $(x_0,y_0) \in A$ is a (global) maximum (resp. (global) minimum) if $f(x,y) \le f(x_0,y_0)$ (resp. $f(x,y) \ge f(x_0,y_0)$) for every $(x,y) \in A$. We should note that a global maximum (or a global minimum) of a function is not necessarily a local one: for example, consider the function $f(x,y) = x^2 + y^2$ defined on the closed unit disk $A = \{(x,y) : x^2 + y^2 \le 1\}$. Then every point on the unit circle is a global maximum, but not a local one.
Theorem 9.1 (Fermat) Suppose that $f(x,y)$, defined on an open subset $U$, has continuous partial derivatives, and $(x_0,y_0) \in U$ is a local maximum (or a local minimum). Then

$$\frac{\partial f(x_0,y_0)}{\partial x} = \frac{\partial f(x_0,y_0)}{\partial y} = 0. \tag{9.3}$$

That is, the gradient vector $\nabla f(x_0,y_0) = 0$.

Proof. Consider the local maximum case. There is $\delta > 0$ such that $B_\delta(x_0,y_0) \subseteq U$ and (9.1) holds. Take any unit vector $v = (v_1,v_2)$ and let $\gamma(t) = (x_0,y_0) + tv$. Consider the one-variable function $g(t) = f(\gamma(t))$. Then $g(t) \le g(0)$ for any $t \in (-\delta, \delta)$, and $g'(0)$ exists by the chain rule. On the other hand,

$$g'(0) = \lim_{\substack{t \to 0 \\ t > 0}} \frac{g(t) - g(0)}{t} \le 0$$

and

$$g'(0) = \lim_{\substack{t \to 0 \\ t < 0}} \frac{g(t) - g(0)}{t} \ge 0,$$

so we must have $g'(0) = 0$. But $g'(0)$ is just the directional derivative of $f$ in the direction $v = (v_1,v_2)$, so that

$$D_v f(x_0,y_0) = \frac{\partial f(x_0,y_0)}{\partial x}v_1 + \frac{\partial f(x_0,y_0)}{\partial y}v_2 = 0$$

for any unit vector $(v_1,v_2)$, which yields (9.3).
Any point $(x_0,y_0)$ such that $\nabla f(x_0,y_0) = 0$ is called a critical (or stationary) point. Fermat's theorem says local extrema must be stationary points. Therefore we search for local extrema among the stationary points. Taylor's expansion allows us to say more about whether a stationary point is a local extreme point or not.

To this end, we have to look at the remainder term which appears in Taylor's expansion, i.e. the term

$$f_{xx}(\xi)(x-x_0)^2 + 2f_{xy}(\xi)(x-x_0)(y-y_0) + f_{yy}(\xi)(y-y_0)^2.$$

By considering the quadratic form $a\lambda^2 + 2c\lambda\mu + b\mu^2$, whose discriminant is $4(c^2 - ab)$, we have the following:

1) If $c^2 - ab < 0$ and $a > 0$ (so $b > 0$ as well), then $a\lambda^2 + 2c\lambda\mu + b\mu^2 \ge 0$;

2) If $c^2 - ab < 0$ and $a < 0$ (so $b < 0$ as well), then $a\lambda^2 + 2c\lambda\mu + b\mu^2 \le 0$.
Together with Taylor's expansion we are now in a position to derive further information about stationary points.

Theorem 9.3 Suppose that $f(x,y)$, defined on an open subset $U$, has continuous derivatives up to second order, and suppose $(x_0,y_0) \in U$ is a critical point: $\nabla f(x_0,y_0) = 0$.

1) If

$$\left(\frac{\partial^2 f(x_0,y_0)}{\partial x \partial y}\right)^2 - \frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2} < 0, \qquad \frac{\partial^2 f(x_0,y_0)}{\partial x^2} > 0, \tag{9.4}$$

then $(x_0,y_0)$ is a local minimum.

2) If

$$\left(\frac{\partial^2 f(x_0,y_0)}{\partial x \partial y}\right)^2 - \frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2} < 0, \qquad \frac{\partial^2 f(x_0,y_0)}{\partial x^2} < 0, \tag{9.5}$$

then $(x_0,y_0)$ is a local maximum.

Proof. Since all partial derivatives up to second order are continuous, we can choose a small $\delta > 0$ so that the open disk $B_\delta(x_0,y_0) \subseteq U$ and (9.4) (resp. (9.5)) holds not only at $a = (x_0,y_0)$ but also at every point of $B_\delta(a)$. For any $x \in B_\delta(a)$, according to Taylor's theorem, there is $\xi \in B_\delta(a)$ (though depending on $x$) such that the remainder quadratic form at $\xi$ is non-negative (resp. non-positive), which proves the claim.

If

$$\left(\frac{\partial^2 f(x_0,y_0)}{\partial x \partial y}\right)^2 - \frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2} = 0, \tag{9.6}$$

then, based only on the information about the first and second partial derivatives at $(x_0,y_0)$, we cannot determine the sign of the remainder term appearing in the Taylor expansion, so in this case we are unable to tell whether $(x_0,y_0)$ is a local extreme point or not.

On the other hand, if

$$\left(\frac{\partial^2 f(x_0,y_0)}{\partial x \partial y}\right)^2 - \frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2} > 0, \tag{9.7}$$

then, by continuity, the same inequality remains true on a small disk near $(x_0,y_0)$, and thus the quadratic form is indefinite, i.e. it can take both positive and negative values. So in this case the stationary point $(x_0,y_0)$ is not a local extreme point; such a critical point is called a saddle point.
Example 9.4 Consider $f(x,y) = \sin x + \sin y - \sin(x+y)$. Find the maximum and minimum values of $f$ on the triangle enclosed by the $x$-axis, the $y$-axis and the line $x + y = 2\pi$.

The triangle is bounded and closed, and $f$ is continuous, so $f$ achieves its maximum and minimum values. A global extremum must lie either on the boundary of the triangle, i.e. $x = 0$, $0 \le y \le 2\pi$; $y = 0$, $0 \le x \le 2\pi$; $x + y = 2\pi$, $0 \le x, y \le 2\pi$; or in the interior of the triangle. In the latter case, a global extreme point must be a local one, hence must be a critical point of $f$. Hence we first locate the possible critical points inside the triangle by solving the system

$$\frac{\partial f}{\partial x} = \cos x - \cos(x+y) = 0, \qquad \frac{\partial f}{\partial y} = \cos y - \cos(x+y) = 0,$$

to obtain the only critical point $\left(\frac{2\pi}{3}, \frac{2\pi}{3}\right)$, with $f\left(\frac{2\pi}{3}, \frac{2\pi}{3}\right) = \frac{3\sqrt{3}}{2}$. On the other hand, on the boundary $f(x,y) = 0$, so $\left(\frac{2\pi}{3}, \frac{2\pi}{3}\right)$ is the global maximum.

Since

$$\frac{\partial^2 f}{\partial x^2} = -\sin x + \sin(x+y), \qquad \frac{\partial^2 f}{\partial y^2} = -\sin y + \sin(x+y), \qquad \frac{\partial^2 f}{\partial x \partial y} = \sin(x+y),$$

at $\left(\frac{2\pi}{3}, \frac{2\pi}{3}\right)$ the discriminant is

$$D = \sin^2\frac{4\pi}{3} - \left(-\sin\frac{2\pi}{3} + \sin\frac{4\pi}{3}\right)^2 = \frac{3}{4} - 3 = -\frac{9}{4} < 0,$$

and

$$\frac{\partial^2 f}{\partial x^2}\left(\frac{2\pi}{3}, \frac{2\pi}{3}\right) = -\sin\frac{2\pi}{3} + \sin\frac{4\pi}{3} = -\sqrt{3} < 0,$$

so $\left(\frac{2\pi}{3}, \frac{2\pi}{3}\right)$ is indeed a local maximum.
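Every step of Example 9.4 can be replayed symbolically (sympy assumed, not part of the notes):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x) + sp.sin(y) - sp.sin(x + y)

# The interior critical point found in Example 9.4.
cp = {x: 2*sp.pi/3, y: 2*sp.pi/3}
assert f.diff(x).subs(cp) == 0
assert f.diff(y).subs(cp) == 0
assert f.subs(cp) == 3*sp.sqrt(3)/2

# Second derivative test: discriminant < 0 and f_xx < 0 => local maximum.
D = (f.diff(x, y)**2 - f.diff(x, 2)*f.diff(y, 2)).subs(cp)
assert sp.simplify(D) == -sp.Rational(9, 4)
assert f.diff(x, 2).subs(cp) == -sp.sqrt(3)
```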
There is a generalization to functions of several variables. To this end we have to borrow a notion about symmetric matrices from linear algebra. We say an $n \times n$ symmetric matrix $A = (a_{ij})$ (where $a_{ij} = a_{ji}$ for any pair $(i,j)$) is positive definite (resp. negative definite) if

$$Av \cdot v = \sum_{i,j=1}^n a_{ij} v_i v_j > 0 \quad (\text{resp.} < 0) \qquad \forall v = (v_1, \ldots, v_n) \in \mathbb{R}^n,\; v \neq 0. \tag{9.8}$$

The matrix of second-order partial derivatives of $f$, called the Hessian matrix, is

$$D^2 f = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}. \tag{9.9}$$

Theorem 9.5 Suppose $f(x)$ is a function of $n$ variables $x = (x_1, \ldots, x_n)$, defined on an open subset $U \subseteq \mathbb{R}^n$, which has continuous partial derivatives up to second order. Let $a = (a_1, \ldots, a_n)$ be a critical point: $\nabla f(a) = 0$.

1) If the Hessian matrix $D^2 f(a)$ is positive definite, then $a$ is a local minimum of $f$.

2) If the Hessian matrix $D^2 f(a)$ is negative definite, then $a$ is a local maximum of $f$.

The proof follows from a discussion via Taylor's expansion at the critical point $a$.
10 Lagrange's multipliers

In this part we develop a method of locating relative (constrained) local extrema. Let us first consider the question with three variables. Let $f(x,y,z)$ be a function defined on a subset $U \subseteq \mathbb{R}^3$. We wish to locate the local extrema of $f(x,y,z)$ subject to the constraint

$$F(x,y,z) = 0. \tag{10.1}$$

We say $(x_0,y_0,z_0) \in U$ is a (relative) local minimum subject to (10.1) if $F(x_0,y_0,z_0) = 0$ and there is a small ball $B$ centered at $(x_0,y_0,z_0)$ with radius $\delta > 0$ such that $f(x,y,z) \ge f(x_0,y_0,z_0)$ for every $(x,y,z) \in B$ which satisfies (10.1).

Theorem 10.1 Let $f(x,y,z)$ and $F(x,y,z)$ be two functions on an open subset $U \subseteq \mathbb{R}^3$. Suppose that both functions $f$ and $F$ have continuous partial derivatives, and that the gradient vector field $\nabla F \neq 0$ on $U$. Let $(x_0,y_0,z_0) \in U$ be a local maximum or local minimum of $f(x,y,z)$ subject to the constraint (10.1). Then there is a real number $\lambda$ such that $\nabla f(x_0,y_0,z_0) = \lambda \nabla F(x_0,y_0,z_0)$.
Proof. By assumption, $(x_0,y_0,z_0) \in S$ is a local maximum or minimum of the restriction of the function $f$ to the level surface $S : F(x,y,z) = 0$. Given any differentiable curve $\gamma(t) = (x(t), y(t), z(t))$ lying on the surface $S$ and passing through $(x_0,y_0,z_0)$, i.e.

$$F \circ \gamma(t) = 0 \quad \forall t \in (-\delta, \delta), \qquad \gamma(0) = (x_0,y_0,z_0),$$

consider $h(t) = f(\gamma(t))$. Then by the definition of relative local extrema, $0$ is a local maximum or minimum of the function $h(t)$. Therefore, by Fermat's theorem, $h'(0) = 0$. On the other hand, according to the chain rule,

$$h'(0) = \nabla f(\gamma(0)) \cdot \gamma'(0) = 0,$$

which means that $\nabla f(x_0,y_0,z_0)$ is perpendicular to $\gamma'(0)$. Since $\gamma(t)$ is an arbitrary curve lying on the surface $S$ passing through $(x_0,y_0,z_0)$, $\gamma'(0)$ can be any tangent vector to $S$ at $(x_0,y_0,z_0)$. Therefore $\nabla f(x_0,y_0,z_0)$ must be perpendicular to the tangent plane of $S$ at $(x_0,y_0,z_0)$. It follows that either $\nabla f(x_0,y_0,z_0) = 0$, or $\nabla f(x_0,y_0,z_0) \neq 0$ is normal to $S$ at $(x_0,y_0,z_0)$. On the other hand, a normal vector to $S$ at $(x_0,y_0,z_0)$ is $\nabla F(x_0,y_0,z_0)$; therefore $\nabla f(x_0,y_0,z_0)$ and $\nabla F(x_0,y_0,z_0)$ are parallel. Since $\nabla F(x_0,y_0,z_0) \neq 0$, there is $\lambda$ such that $\nabla f(x_0,y_0,z_0) = \lambda \nabla F(x_0,y_0,z_0)$.

As a by-product, we have proved that if $(x_0,y_0,z_0) \in S$ is a relative local maximum or minimum of $f(x,y,z)$ along $S$ (i.e. satisfying the constraint (10.1)), then $\nabla f(x_0,y_0,z_0)$ is perpendicular to the level surface $S : F(x,y,z) = 0$.
According to the previous theorem, in order to find the constrained extrema of $f$ we should look among those $(x,y,z) \in U$ and real numbers $\lambda$ which satisfy the system

$$\begin{cases} \nabla f(x,y,z) = \lambda \nabla F(x,y,z), \\ F(x,y,z) = 0. \end{cases} \tag{10.2}$$

[We often assume that $\nabla F(x,y,z) \neq 0$.] Of course we are interested in those $(x,y,z) \in U$ for which there is a real number $\lambda$ solving the system (10.2). In practice, we need to solve for $(x,y,z)$, but there is no need to know the explicit value of $\lambda$. The constant $\lambda$, introduced here to help us locate the relative extrema, is called a Lagrange multiplier.

Introduce the function $G(x,y,z,\lambda) = f(x,y,z) - \lambda F(x,y,z)$. Then the system (10.2) may be written as

$$\frac{\partial G}{\partial x} = \frac{\partial G}{\partial y} = \frac{\partial G}{\partial z} = \frac{\partial G}{\partial \lambda} = 0,$$

which means a solution $(x,y,z,\lambda)$ to (10.2) is just a critical point of $G(x,y,z,\lambda)$.
Example 10.2 Maximize $f(x,y,z) = x + y$ subject to the constraint $x^2 + y^2 + z^2 = 1$.

To use the method of Lagrange multipliers, set

$$G(x,y,z,\lambda) = x + y - \lambda\left(x^2 + y^2 + z^2 - 1\right),$$

so the system (10.2) becomes

$$1 - 2\lambda x = 0, \qquad 1 - 2\lambda y = 0, \qquad -2\lambda z = 0, \qquad x^2 + y^2 + z^2 = 1.$$

The first equation implies that $\lambda \neq 0$, so the third equation gives $z = 0$, and from the first and second equations we obtain $x = y = \frac{1}{2\lambda}$. Substituting these into the constraint gives

$$\left(\frac{1}{2\lambda}\right)^2 + \left(\frac{1}{2\lambda}\right)^2 + 0^2 = 1,$$

so that $2\lambda = \pm\sqrt{2}$. Thus there are two possible relative extrema, $\left(\sqrt{\tfrac{1}{2}}, \sqrt{\tfrac{1}{2}}, 0\right)$ and $\left(-\sqrt{\tfrac{1}{2}}, -\sqrt{\tfrac{1}{2}}, 0\right)$. Since the sphere $S : x^2 + y^2 + z^2 = 1$ is compact (bounded and closed), and the function $f(x,y,z) = x + y$ is continuous, $f$ must achieve its maximum and minimum values on $S$ [we will prove this kind of statement in Prelims Analysis II]. Therefore the maximum of $f$ subject to the constraint is

$$f\left(\sqrt{\tfrac{1}{2}}, \sqrt{\tfrac{1}{2}}, 0\right) = \sqrt{2},$$

while

$$f\left(-\sqrt{\tfrac{1}{2}}, -\sqrt{\tfrac{1}{2}}, 0\right) = -\sqrt{2}$$

is the constrained minimum value of $f$ over the unit sphere.
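The system (10.2) for Example 10.2 can also be handed to a symbolic solver (sympy assumed, not part of the notes):

```python
import sympy as sp

x, y, z, lam = sp.symbols('x y z lam', real=True)
f = x + y
F = x**2 + y**2 + z**2 - 1

# grad f = lam * grad F together with the constraint F = 0.
eqs = [f.diff(s) - lam * F.diff(s) for s in (x, y, z)] + [F]
sols = sp.solve(eqs, [x, y, z, lam], dict=True)

# The constrained extreme values of f are +/- sqrt(2).
values = {sp.simplify(f.subs(s)) for s in sols}
assert values == {sp.sqrt(2), -sp.sqrt(2)}
```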
To conclude our discussion, let us describe the general form of the Lagrange multipliers. Suppose that $f(x_1, \ldots, x_n)$ and $F_1(x_1, \ldots, x_n), \ldots, F_k(x_1, \ldots, x_n)$ are functions of $n$ variables defined on an open subset $U \subseteq \mathbb{R}^n$, where $n, k \in \mathbb{N}$, and that $f, F_1, \ldots, F_k$ have continuous partial derivatives. Then at a local extremum of $f(x_1, \ldots, x_n)$ subject to the constraints

$$F_1(x_1, \ldots, x_n) = 0, \quad \ldots, \quad F_k(x_1, \ldots, x_n) = 0,$$

at which the gradients $\nabla F_1, \ldots, \nabla F_k$ are linearly independent, there are real numbers $\lambda_1, \ldots, \lambda_k$ (the Lagrange multipliers) such that

$$\nabla f = \lambda_1 \nabla F_1 + \cdots + \lambda_k \nabla F_k.$$
Example 10.3 Find the extreme points of $f(x,y,z) = x + y + z$ subject to the conditions $x^2 + y^2 = 2$ and $y^2 + z^2 = 2$.

Construct the function

$$G(x,y,z,\lambda_1,\lambda_2) = x + y + z - \lambda_1\left(x^2 + y^2 - 2\right) - \lambda_2\left(y^2 + z^2 - 2\right).$$

In order to locate the extreme points, we want to solve the system

$$\frac{\partial G}{\partial x} = 1 - 2\lambda_1 x = 0, \qquad \frac{\partial G}{\partial y} = 1 - 2(\lambda_1 + \lambda_2)y = 0, \qquad \frac{\partial G}{\partial z} = 1 - 2\lambda_2 z = 0,$$

together with the two constraints. Hence $x = \frac{1}{2\lambda_1}$, $z = \frac{1}{2\lambda_2}$ and $y = \frac{1}{2(\lambda_1 + \lambda_2)}$, and subtracting the constraints forces $x^2 = z^2$. The case $x = -z$ would give $\lambda_2 = -\lambda_1$, contradicting the second equation, so $x = z$ and $\lambda_1 = \lambda_2$, whence $y = \frac{1}{4\lambda_1}$ and the first constraint reads

$$\left(\frac{1}{2\lambda_1}\right)^2 + \left(\frac{1}{4\lambda_1}\right)^2 = 2,$$

which leads to the solutions $\lambda_1 = \pm\frac{1}{4}\sqrt{\frac{5}{2}}$. Thus the possible constrained extreme points are

$$\left(2\sqrt{\tfrac{2}{5}},\; \sqrt{\tfrac{2}{5}},\; 2\sqrt{\tfrac{2}{5}}\right) \quad\text{and}\quad \left(-2\sqrt{\tfrac{2}{5}},\; -\sqrt{\tfrac{2}{5}},\; -2\sqrt{\tfrac{2}{5}}\right),$$

at which the function $f$ achieves its relative maximum and minimum values, $\sqrt{10}$ and $-\sqrt{10}$, respectively.
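As with the previous example, the two-constraint system of Example 10.3 can be solved symbolically (sympy assumed, not part of the notes):

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z l1 l2', real=True)
f = x + y + z
F1 = x**2 + y**2 - 2
F2 = y**2 + z**2 - 2

G = f - l1 * F1 - l2 * F2
eqs = [G.diff(s) for s in (x, y, z)] + [F1, F2]
sols = sp.solve(eqs, [x, y, z, l1, l2], dict=True)

# f attains sqrt(10) and -sqrt(10) at the two constrained extreme points.
values = {sp.simplify(f.subs(s)) for s in sols}
assert values == {sp.sqrt(10), -sp.sqrt(10)}
```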