J. L. Selden
Chapter 1. Multivariable Calculus
In this chapter, we make the step from single-variable calculus to multivariable calculus by introducing the notions of (scalar) functions of more than one variable, partial differentiation, multiple integrals and vector functions.
[Figure 1.1: the graph of f(x) = x², the standard parabola, plotted for −5 ≤ x ≤ 5.]
3. $h : \mathbb{R}^3 \to \mathbb{R}$, where $h(x, y, z) = x^2 + y^2 + z^2$. The names of the input variables are immaterial: for any triple $(a, b, c)$ we have $h(a, b, c) = a^2 + b^2 + c^2$.
Definition 1.3 For $f : \mathbb{R} \to \mathbb{R}$, we form the graph of f by taking all pairs (x, y) in the plane such that $y = f(x)$.
To make things concrete, let's take the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$. The graph of f is all pairs (x, y) such that $y = x^2$. We know this as the standard parabola in the plane. By plotting the graph of f in the plane, we can "see" the function, illustrated in Figure 1.1. Using geometric jargon, the graph is generally called a curve. We can use the very same trick with a function of two real variables.
Definition 1.4 For $h : \mathbb{R}^2 \to \mathbb{R}$, we form the graph of h by taking all triples (x, y, z) in space such that $z = h(x, y)$.
In this case, we can sometimes visualize the graph of h. In general, the resulting graph of
a function of two variables will be very difficult to “see” directly. The geometric jargon for
a graph of a function of two variables is a surface.
[Figure 1.2: the graph of a function of two variables, a surface plotted over −5 ≤ x, y ≤ 5.]
[Figure 1.3: the three coordinate planes x = 0, y = 0 and z = 0.]
Planes are a very important part of multivariable function theory, both for visualizing
the graphs of functions as well as for calculus (as we’ll see later). Of particular interest are
the following three planes and their parallel translations.
Definition 1.6 A coordinate plane is a plane gotten by setting one of the three coordinates,
x, y, z, equal to 0. The xy-plane is given by the equation z = 0. The xz-plane is given by
the equation y = 0. The yz-plane is given by the equation x = 0. See Figure 1.3.
Along with these three planes, we are also interested in their parallel translations, by which
we simply mean planes of the form z = c where c is a constant, and similarly for x and y.
We make use of these planes in the following way.
Definition 1.7 A cross-section of a graph is the intersection of the graph with a given
plane.
The idea behind a cross-section is quite simple. It may be difficult to visualize the graph
of a given function, but if we look at cross-sections, we get a “snapshot” of the surface. If
we look at enough of these snapshots, we can reconstruct the surface.
Example 1.8 Let’s find the cross-section of h by the plane z = 2. We do this algebraically
by observing that in order for a point (x, y, z) to be both in the plane as well as on the
graph, it must satisfy the system of equations:
\[
\begin{cases} z = x^2 + y^2, \\ z = 2. \end{cases}
\]
Therefore, we can eliminate z, which gives $x^2 + y^2 = 2$. This is the equation of a circle with radius $\sqrt{2}$. So the cross-section of the graph of h by the plane z = 2 is a circle.
To find out more about the shape of the graph of h, we can find its cross-section with
another plane, for example, x = 2. Again, we can use algebra to find an equation for the
resulting intersection (curve). We require that (x, y, z) satisfy
\[
\begin{cases} z = x^2 + y^2, \\ x = 2. \end{cases}
\]
Eliminating x gives $z = 4 + y^2$, so this cross-section is a parabola in the plane x = 2.
Cross-sections are useful for any surface, not just those that are given by functions of
two variables. For example, we could use cross-sections with a sphere, which is not the
graph of a function of two variables. However, when we are dealing with a graph of a
function of two variables, we assign extra importance to cross-sections with planes parallel
to the xy-plane, i.e., z = c for a constant c.
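The cross-section computations above are easy to check with a computer algebra system. A minimal sketch using SymPy (my choice of tool, not something the text prescribes), substituting each plane equation into $z = x^2 + y^2$:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

# Surface z = h(x, y) = x^2 + y^2.
h = x**2 + y**2

# Cross-section by the horizontal plane z = 2: eliminating z gives
# x^2 + y^2 = 2, a circle of radius sqrt(2). Points of the form
# sqrt(2)(cos t, sin t) all lie on it.
for t in (0, sp.pi / 3, sp.pi):
    px, py = sp.sqrt(2) * sp.cos(t), sp.sqrt(2) * sp.sin(t)
    assert sp.simplify(h.subs({x: px, y: py}) - 2) == 0

# Cross-section by the vertical plane x = 2: substituting x = 2 gives
# z = 4 + y^2, a parabola in that plane.
assert sp.expand(h.subs(x, 2)) == y**2 + 4
```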
[Figure 1.6: contour plots. Left: the circular level curves of f(x, y) = x² + y². Right: the parabolic level curves of f(x, y) = 2x + y².]
Examples 1.10 1. The contours of the function $f(x, y) = x^2 + y^2$ are all of the form $x^2 + y^2 = c$. Therefore, all the contours (or level curves) of f are circles, but with varying radii. See Figure 1.6, left plot.
2. For the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = 2x + y^2$, the contours are given by the equation $2x + y^2 = c$. I leave it to you to convince yourself that these are parabolas. See Figure 1.6, right plot.
3. A practical example of contours can be found on any topographic (or contour) map.
The curves plotted on the map are level curves of the elevation function, i.e., if you
were to walk along one of the curves, you would never go up or down, but would
instead remain always at the same elevation.
So far, we have only dealt with visualizing functions of two variables. What about
functions of three variables? You might be tempted toward the idea of graphs, but a quick
look back and the pattern becomes obvious. For functions of three variables, the graphs
would require four variables. This makes the thought of graphing a function of three
variables unhelpful. However, there is nothing wrong with using an idea inspired by the
notion of level curves. Therefore, we find that our tool for visualizing functions of three
variables is via their level surfaces.
We will not often try to find the level surfaces of a given function; rather, we will usually
try to realize a given surface as a level surface for some function. For instance, consider
the following.
Example 1.12 The equation of a sphere in $\mathbb{R}^3$ is $x^2 + y^2 + z^2 = c$, where $\sqrt{c}$ is the radius.
If we consider the function f (x, y, z) = x2 + y 2 + z 2 , it pretty quickly becomes apparent that
spheres are the level surfaces of this function. Later on, we’ll be able to use this information
to our advantage.
There is a special class of level surfaces which we have already encountered, just with a different name. Instead of level surfaces, we called these surfaces the graphs of a function of two variables. Recall that if $f : \mathbb{R}^2 \to \mathbb{R}$, then the graph of f is the set of (x, y, z) in space such that $z = f(x, y)$. To see the connection, we define the function $G : \mathbb{R}^3 \to \mathbb{R}$ by $G(x, y, z) = z - f(x, y)$. For c = 0, the level surface for G is the set of all (x, y, z) such that $G(x, y, z) = z - f(x, y) = 0$, i.e., such that $z = f(x, y)$. Therefore, the graph of f is identical to the level surface $G(x, y, z) = 0$.
One could ask the question the other way around, too. In other words, if you have
a level surface, is it the graph of some function of two variables? The sphere provides a
negative answer. The level surface x2 + y 2 + z 2 = 1 cannot be represented by a function of
two variables since there would be two possible z values for each pair (x, y). However, this level surface can be realized as the union of the graphs of two different functions. (This is exactly analogous to the case of representing a circle using two functions.) If we solve for z, we have
\[
z = \pm\sqrt{1 - x^2 - y^2}.
\]
If we define $f_+(x, y) = \sqrt{1 - x^2 - y^2}$ and $f_-(x, y) = -\sqrt{1 - x^2 - y^2}$, it is clear (after a little thought, perhaps) that the union of the graphs of $f_+$ and $f_-$ is the whole sphere.
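This two-graph description of the sphere can be sanity-checked numerically. A small sketch (the helper names `f_plus` and `f_minus` are mine, mirroring $f_+$ and $f_-$ above):

```python
import math

# The two graph functions covering the unit sphere: z = ±sqrt(1 - x^2 - y^2).
def f_plus(x, y):
    return math.sqrt(1.0 - x**2 - y**2)

def f_minus(x, y):
    return -math.sqrt(1.0 - x**2 - y**2)

# Every point on either graph satisfies the sphere equation x^2 + y^2 + z^2 = 1.
for x, y in [(0.0, 0.0), (0.5, 0.5), (0.6, -0.3)]:
    for f in (f_plus, f_minus):
        z = f(x, y)
        assert abs(x**2 + y**2 + z**2 - 1.0) < 1e-12

# A single function cannot do the job: at (0.5, 0.5) the sphere contains
# two distinct z-values, one from each hemisphere.
assert f_plus(0.5, 0.5) != f_minus(0.5, 0.5)
```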
Polar Coordinates
In the plane, we have been using (x, y) to describe the location of a point. The idea
behind polar coordinates is to use two other numbers to describe the location of a point.
One number, r, is the distance from that point to the origin (0, 0), and the other number, θ, is the angle that a ray emanating from the origin and passing through that point makes with the x-axis. To make things precise, we require $r \geq 0$ and $0 \leq \theta < 2\pi$. It is possible to describe how the two coordinate systems relate to each other. If a point is described in polar coordinates $(r, \theta)$, then the corresponding Cartesian coordinates are $x = r\cos(\theta)$ and $y = r\sin(\theta)$. Note that the Pythagorean theorem tells us that $r^2 = x^2 + y^2$.
Examples 1.13 Let’s try representing functions expressed in terms of x and y in polar
coordinates.
1. $f(x, y) = x^2 + y^2$. This becomes $f(r, \theta) = r^2$. Notice that the function depended on both x and y in Cartesian coordinates, but only on r in polar coordinates. This is the whole point of using a different coordinate system: to make things easier.
2. $g(x, y) = xy$. This becomes $g(r, \theta) = r^2\cos(\theta)\sin(\theta)$. This certainly doesn't look any better in polar coordinates, so it's unlikely that it would make our lives easier.
Polar coordinates are great whenever there are circles showing up in the problem.
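The conversion is mechanical enough to script. A minimal sketch (the helper name `polar_to_cartesian` is my own) that checks the relation $r^2 = x^2 + y^2$ and the fact that $f(x, y) = x^2 + y^2$ depends only on r:

```python
import math

def polar_to_cartesian(r, theta):
    """Convert polar (r, theta) to Cartesian via x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

# The Pythagorean relation r^2 = x^2 + y^2 holds for every point.
for r, theta in [(1.0, 0.3), (2.5, math.pi / 4), (0.7, 5.0)]:
    x, y = polar_to_cartesian(r, theta)
    assert abs(x**2 + y**2 - r**2) < 1e-12

# f(x, y) = x^2 + y^2 becomes r^2 in polar coordinates: the value is the
# same for every angle theta on a circle of fixed radius.
r = math.sqrt(2.0)
values = {round(polar_to_cartesian(r, t)[0]**2 + polar_to_cartesian(r, t)[1]**2, 9)
          for t in (0.0, 1.0, 2.0)}
assert values == {2.0}
```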
Cylindrical Coordinates
Now we move into space and the different ways of representing points there. The first alternative is known as cylindrical coordinates. Basically, we use polar coordinates to replace x and y and leave z as it was. The three cylindrical coordinates are r, θ and z, where $r \geq 0$, $0 \leq \theta < 2\pi$, and z can be any real number. The equations for representing Cartesian coordinates in terms of cylindrical coordinates are
\[
x = r\cos(\theta), \qquad y = r\sin(\theta), \qquad z = z.
\]
Example 1.14 Given the function $f(x, y, z) = x^2 + y^2 + y + z$, we can represent it in cylindrical coordinates as $f(r, \theta, z) = r^2 + r\sin(\theta) + z$.
A graphic illustration of cylindrical coordinates is given in the left plot of Figure 1.7.
Spherical Coordinates
In spherical coordinates, we use one distance coordinate and two angular coordinates. We denote them by ρ, θ and φ, where $\rho \geq 0$, $0 \leq \theta < 2\pi$ and $0 \leq \varphi \leq \pi$. The equations for representing Cartesian coordinates in terms of spherical coordinates are
\[
x = \rho\cos(\theta)\sin(\varphi), \qquad y = \rho\sin(\theta)\sin(\varphi), \qquad z = \rho\cos(\varphi).
\]
Examples 1.15 Let's try representing functions expressed in terms of x, y and z in spherical coordinates.
A graphic illustration of spherical coordinates is given in the right plot of Figure 1.7.
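Both space coordinate systems can be exercised the same way. A short sketch (helper names are mine) checking that ρ is the distance to the origin, and that Example 1.14 converts correctly:

```python
import math

def spherical_to_cartesian(rho, theta, phi):
    """x = rho cos(theta) sin(phi), y = rho sin(theta) sin(phi), z = rho cos(phi)."""
    return (rho * math.cos(theta) * math.sin(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(phi))

def cylindrical_to_cartesian(r, theta, z):
    """x = r cos(theta), y = r sin(theta), z = z."""
    return r * math.cos(theta), r * math.sin(theta), z

# In spherical coordinates rho is the distance to the origin:
# x^2 + y^2 + z^2 = rho^2 for every (theta, phi).
for rho, theta, phi in [(1.0, 0.2, 0.9), (3.0, 2.5, math.pi / 2)]:
    x, y, z = spherical_to_cartesian(rho, theta, phi)
    assert abs(x**2 + y**2 + z**2 - rho**2) < 1e-12

# Example 1.14: f(x, y, z) = x^2 + y^2 + y + z equals r^2 + r sin(theta) + z.
r, theta, z = 1.5, 0.8, -2.0
x, y, _ = cylindrical_to_cartesian(r, theta, z)
assert abs((x**2 + y**2 + y + z) - (r**2 + r * math.sin(theta) + z)) < 1e-12
```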
x = 3. We have
\[
\lim_{\Delta x \to 0} \frac{f(3 + \Delta x, 2) - f(3, 2)}{\Delta x},
\]
which is the slope of the tangent line to the curve z = f (x, 2) at the point x = 3. Un-
fortunately, this only gives us a seemingly small amount of information about the surface,
since it applies only to a particular curve on the surface. So if we are interested only in
this curve, we have all the information we need. However, what about the behaviour near
(3, 2) but not on this curve?
Of course, we can do the same thing with the cross-section of the graph of f by the
plane x = 3. This cross-section is described by the curve z = f (3, y), which lies on the
graph and also passes through the point (3, 2). Again, we can use single-variable calculus
to find the slope of the tangent line to this curve at the point y = 2. It is equal to
\[
\lim_{\Delta y \to 0} \frac{f(3, 2 + \Delta y) - f(3, 2)}{\Delta y}.
\]
All right, so we have information about the behaviour of f along a different curve, but does
this really help us understand the behaviour of f as a function of two variables? YES! As
it turns out, these two slopes are all we need to find in order to know the local behaviour
of f as a function of two variables. We’ll talk more about this later. For now, we formalize
these notions and set notation.
The key to partial di↵erentiation is only allowing one of the variables to change while
holding the rest constant. As a result, we’ll be able to use almost all of the rules and tricks
from single variable calculus.
\[
\frac{\partial f}{\partial y}(x_0, y_0) := \lim_{\Delta y \to 0} \frac{f(x_0, y_0 + \Delta y) - f(x_0, y_0)}{\Delta y}.
\]
Take note of the fact that if these limits exist, then the result is a number that depends on
the point (x0 , y0 ). In the standard way, we define a function using the above definitions.
If the limits exist at each point in R2 , then we can define the partial derivative functions
\[
\frac{\partial f}{\partial x} : \mathbb{R}^2 \to \mathbb{R} \qquad \text{and} \qquad \frac{\partial f}{\partial y} : \mathbb{R}^2 \to \mathbb{R}.
\]
\[
\frac{\partial f}{\partial x}(x, y, z) := \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y, z) - f(x, y, z)}{\Delta x}.
\]
The partial derivative of f with respect to y is
\[
\frac{\partial f}{\partial y}(x, y, z) := \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y, z) - f(x, y, z)}{\Delta y}.
\]
The partial derivative of f with respect to z is
\[
\frac{\partial f}{\partial z}(x, y, z) := \lim_{\Delta z \to 0} \frac{f(x, y, z + \Delta z) - f(x, y, z)}{\Delta z}.
\]
Provided the limits exist for all points in space, each of these is a function from R3 to R.
Before we concern ourselves any more with the meaning of the partial derivative and
what it tells us about the function, let’s get some practice computing them explicitly.
\[
\frac{\partial f}{\partial x}(1, 4) := \lim_{\Delta x \to 0} \frac{f(1 + \Delta x, 4) - f(1, 4)}{\Delta x}
= \lim_{\Delta x \to 0} \frac{(1 + \Delta x)^2 + 16 - (1 + 16)}{\Delta x}
= \lim_{\Delta x \to 0} \frac{1 + 2\Delta x + \Delta x^2 + 16 - 17}{\Delta x}
= \lim_{\Delta x \to 0} (2 + \Delta x) = 2.
\]
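The limit above can be approximated on a computer with a finite difference quotient; for $f(x, y) = x^2 + y^2$ the quotient equals $2 + \Delta x$ exactly, so the error shrinks linearly as $\Delta x \to 0$. A sketch (helper names are mine):

```python
def f(x, y):
    return x**2 + y**2

def partial_x(f, x0, y0, dx=1e-6):
    """Forward difference quotient (f(x0 + dx, y0) - f(x0, y0)) / dx."""
    return (f(x0 + dx, y0) - f(x0, y0)) / dx

# The limit computed by hand above is exactly 2.
approx = partial_x(f, 1.0, 4.0)
assert abs(approx - 2.0) < 1e-5

# Shrinking dx improves the approximation (the error is exactly dx here,
# up to floating-point rounding).
assert abs(partial_x(f, 1.0, 4.0, dx=1e-3) - 2.0) > abs(approx - 2.0)
```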
2. In single variable calculus, we were able to get the derivative function directly using
rules, e.g., the product rule, the power rule, the quotient rule. Can we do something
similar with partial derivatives? YES. BUT we have to be careful since there are too
many variables floating around to be sloppy. Let’s find the partial derivative function
of f (x, y) = x2 + y 2 with respect to x. To do so, we treat y AS A CONSTANT, and
then use the rules with x. We have
\[
\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}(x^2 + y^2) = 2x,
\]
Since y is constant with respect to x, $\partial y/\partial x = 0$. Note that it is now easy to find the value of the partial derivative of f with respect to x at the point (1, 4).
If you are having any trouble with thinking of y as a constant here, just consider the function $f(x) = x^2 + \pi^2$. What is $f'$? It's just 2x. So treat y just like you treated π.
4. This is a good one for emphasizing the particulars of partial differentiation. Let $h : \mathbb{R}^3 \to \mathbb{R}$ be defined as $h(x, y, z) = x^y + y^z + z^x$. Find the three partial derivatives of h.
\[
\frac{\partial h}{\partial x} = \frac{\partial}{\partial x}(x^y + y^z + z^x) = yx^{y-1} + \ln(z)\,z^x,
\]
\[
\frac{\partial h}{\partial y} = \frac{\partial}{\partial y}(x^y + y^z + z^x) = \ln(x)\,x^y + zy^{z-1},
\]
\[
\frac{\partial h}{\partial z} = \frac{\partial}{\partial z}(x^y + y^z + z^x) = \ln(y)\,y^z + xz^{x-1}.
\]
Just remember that as much as you would like to think of y and z as variables, when
you are taking a partial derivative with respect to x, they are nothing but constants
and should be treated as such.
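A computer algebra system applies exactly the same rule, holding the other variables fixed, so it can confirm the three partial derivatives of h. A sketch using SymPy (my choice of tool):

```python
import sympy as sp

x, y, z = sp.symbols("x y z", positive=True)
h = x**y + y**z + z**x

# Differentiating with respect to one variable treats the others as constants,
# exactly as in the text.
assert sp.simplify(sp.diff(h, x) - (y * x**(y - 1) + sp.log(z) * z**x)) == 0
assert sp.simplify(sp.diff(h, y) - (sp.log(x) * x**y + z * y**(z - 1))) == 0
assert sp.simplify(sp.diff(h, z) - (sp.log(y) * y**z + x * z**(x - 1))) == 0
```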
Since the result of partial differentiation with respect to a given variable is again a function on the same domain, we can consider its partial derivative with respect to any of the variables. For example, if $f(x, y) = 2xy^2 + y^7$, then
\[
\frac{\partial f}{\partial y} = 4xy + 7y^6.
\]
We can then take the partial derivative of this function with respect to x, which gives
\[
\frac{\partial}{\partial x}(4xy + 7y^6) = 4y,
\]
or, with respect to y,
\[
\frac{\partial}{\partial y}(4xy + 7y^6) = 4x + 42y^5.
\]
These are examples of higher order partial derivatives. The difference between these and higher order ordinary derivatives is the fact that we have a choice of which variables to differentiate with respect to. The possibilities for second order partial derivatives are as follows:
\[
\frac{\partial^2 f}{\partial x^2} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \qquad
\frac{\partial^2 f}{\partial y^2} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right), \qquad
\frac{\partial^2 f}{\partial x \partial y} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \qquad
\frac{\partial^2 f}{\partial y \partial x} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right).
\]
The last two are called mixed partial derivatives. For us, it is safe to assume that these
last two partial derivatives will be equal. However, that is an assumption and one should
be aware that it is not always true. The following theorem tells us when the mixed partial
derivatives are equal. We present it without the slightest bit of proof.
Theorem 1.19 If $\dfrac{\partial^2 f}{\partial x \partial y}$ and $\dfrac{\partial^2 f}{\partial y \partial x}$ are continuous functions on all of $\mathbb{R}^2$, then
\[
\frac{\partial^2 f}{\partial x \partial y}(x, y) = \frac{\partial^2 f}{\partial y \partial x}(x, y).
\]
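Theorem 1.19 is easy to spot-check for a particular smooth function. A sketch using SymPy, with a function of my own choosing (not one from the text):

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = sp.sin(x * y) + x**3 * y**2   # smooth everywhere, so Theorem 1.19 applies

fxy = sp.diff(f, y, x)  # first d/dy, then d/dx
fyx = sp.diff(f, x, y)  # first d/dx, then d/dy
assert sp.simplify(fxy - fyx) == 0
```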
The graph of a function of two variables is a surface in space. The linear function
which models surfaces (the way a line models a curve) is a plane. Therefore, one of the
purposes of multivariable calculus is to find “the best linear approximation” of the surface.
We call this ideal plane a tangent plane. To determine the tangent plane to the graph of
a function $f : \mathbb{R}^2 \to \mathbb{R}$ at a point $(x_0, y_0)$, we must identify it amongst all the other planes
which pass through the point in space given by $(x_0, y_0, f(x_0, y_0))$. You will recall that a plane is uniquely determined by two distinct intersecting lines. The tangent lines we obtain through partial differentiation provide two such lines for the tangent plane. That is the reason why
we only need to use two distinct cross-sections to know the local behaviour of a graph.
We choose the planes x = x0 and y = y0 because they are easier for computations. The
resulting equation for the tangent plane at the point (x0 , y0 , f (x0 , y0 )) is
\[
z = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0).
\]
The right-hand side of this equation is the first three terms in the (multivariable) Taylor
series expansion of the function f . When (x, y) is near (x0 , y0 ), we can use the tangent
plane approximation with a certain degree of confidence, i.e., for (x, y) near (x0 , y0 ), we
have
\[
f(x, y) \approx f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0).
\]
A graphic illustration of the tangent plane is given in Figure 1.8.
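As a quick numerical illustration of the approximation, here is a sketch reusing $f(x, y) = x^2 + y^2$ with its partial derivatives $2x$ and $2y$ hard-coded (the helper names are mine):

```python
def f(x, y):
    return x**2 + y**2

def tangent_plane(x, y, x0, y0):
    """z = f(x0, y0) + fx(x0, y0)(x - x0) + fy(x0, y0)(y - y0), with fx = 2x, fy = 2y."""
    return f(x0, y0) + 2 * x0 * (x - x0) + 2 * y0 * (y - y0)

x0, y0 = 1.0, 4.0
# Near (x0, y0) the plane approximates the surface well...
near = abs(f(1.01, 4.02) - tangent_plane(1.01, 4.02, x0, y0))
# ...and the error grows as we move away from (x0, y0).
far = abs(f(2.0, 5.0) - tangent_plane(2.0, 5.0, x0, y0))
assert near < 1e-3
assert far > near
```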
This brings us to a related topic concerning the local behaviour of a function. Our
cross-sections (and resulting partial derivatives) have told us about the rate of change of
f in the x direction and in the y direction, but what about in other directions? We could
use the tangent plane again, but that is a bit too general for what we’re after. Instead,
we take this opportunity to introduce vectors into the game. Directions in space can be
determined using vectors. So that the magnitude of the vector does not affect the result
(since we’re only interested in the direction), we consider only unit vectors, i.e., vectors of
length 1.
Let’s consider a function f : R2 ! R, the point (x0 , y0 ), and a vector ai + bj. We want
to know how f changes at (x0 , y0 ) in the direction of the unit vector ai + bj. It sounds
rough, but in fact, it will be very easy. Let $\Delta t$ be a very small positive number. Starting at the point $(x_0, y_0)$, we go a short distance in the direction of $a\mathbf{i} + b\mathbf{j}$, i.e., to the point whose coordinates are $(x_0 + a\,\Delta t, y_0 + b\,\Delta t)$. We can measure how the function f differs between these two points, i.e.,
\[
f(x_0 + a\,\Delta t, y_0 + b\,\Delta t) - f(x_0, y_0).
\]
Since $\Delta t$ is the only thing that is changing here, in essence, we are in a single variable situation and can therefore consider the difference quotient
\[
\lim_{\Delta t \to 0} \frac{f(x_0 + a\,\Delta t, y_0 + b\,\Delta t) - f(x_0, y_0)}{\Delta t}.
\]
Using the tangent plane approximation from above, we can write
\[
f(x_0 + a\,\Delta t, y_0 + b\,\Delta t) \approx f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(a\,\Delta t) + \frac{\partial f}{\partial y}(x_0, y_0)(b\,\Delta t).
\]
This allows us to rewrite the limit as
\[
\lim_{\Delta t \to 0} \frac{\frac{\partial f}{\partial x}(x_0, y_0)(a\,\Delta t) + \frac{\partial f}{\partial y}(x_0, y_0)(b\,\Delta t)}{\Delta t}
= a\,\frac{\partial f}{\partial x}(x_0, y_0) + b\,\frac{\partial f}{\partial y}(x_0, y_0).
\]
This expression is called the directional derivative of f at $(x_0, y_0)$ in the direction of $a\mathbf{i} + b\mathbf{j}$. An illustration is given in Figure 1.9. If we denote the vector $a\mathbf{i} + b\mathbf{j}$ by $\mathbf{u}$, this directional derivative is sometimes written as
\[
\frac{\partial f}{\partial \mathbf{u}}(x_0, y_0).
\]
However, the important thing to notice here is that the directional derivative can be determined by the vector $\mathbf{u}$ and the partial derivatives with respect to x and y. If we consider
the vector given by
@f @f
(x0 , y0 )i + (x0 , y0 )j,
@x @y
then ✓ ◆
@f @f @f
(x0 , y0 ) = (x0 , y0 )i + (x0 , y0 )j · u,
@u @x @y
where we have taken the dot product of the two vectors on the right-hand side. All of this
can be generalized to three variables and so we make the following definitions.
Definition 1.20 Let $f : \mathbb{R}^3 \to \mathbb{R}$. The gradient of f, denoted by $\vec{\nabla} f$, at the point $(x_0, y_0, z_0)$ is the vector given by
\[
\vec{\nabla} f(x_0, y_0, z_0) := \frac{\partial f}{\partial x}(x_0, y_0, z_0)\,\mathbf{i} + \frac{\partial f}{\partial y}(x_0, y_0, z_0)\,\mathbf{j} + \frac{\partial f}{\partial z}(x_0, y_0, z_0)\,\mathbf{k}.
\]
Definition 1.21 Let $f : \mathbb{R}^3 \to \mathbb{R}$. For a given point $(x_0, y_0, z_0)$ and a given unit vector $\mathbf{u} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$, the directional derivative of f at $(x_0, y_0, z_0)$ in the direction of $\mathbf{u}$ is
\[
\frac{\partial f}{\partial \mathbf{u}}(x_0, y_0, z_0) = \vec{\nabla} f(x_0, y_0, z_0) \cdot \mathbf{u}.
\]
\[
\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}(x^2 + y^2 + z^2) = 2x, \qquad
\frac{\partial f}{\partial y} = \frac{\partial}{\partial y}(x^2 + y^2 + z^2) = 2y, \qquad
\frac{\partial f}{\partial z} = \frac{\partial}{\partial z}(x^2 + y^2 + z^2) = 2z.
\]
At the point (1, 2, 3), the partial derivatives have the values
\[
\frac{\partial f}{\partial x}(1, 2, 3) = 2, \qquad \frac{\partial f}{\partial y}(1, 2, 3) = 4, \qquad \frac{\partial f}{\partial z}(1, 2, 3) = 6.
\]
So the gradient of f at (1, 2, 3) is given by the vector
\[
\vec{\nabla} f(1, 2, 3) = 2\mathbf{i} + 4\mathbf{j} + 6\mathbf{k}.
\]
The fact that the directional derivative of f at the point (1, 2, 3) in the direction of u
is equal to 0 tells us something interesting about the function f . Roughly speaking, it
says that f does not change near the point (1, 2, 3) if we move in the direction of u.
To see that this is not the case in every direction, let's find the directional derivative of f at the point (1, 2, 3) in the direction of the unit vector $\mathbf{v} = (2\mathbf{i} + 3\mathbf{j} + \mathbf{k})/\sqrt{14}$. Conveniently, we have already found the gradient of f at (1, 2, 3), so this computation is quicker. We get
\[
\frac{\partial f}{\partial \mathbf{v}}(1, 2, 3) = \vec{\nabla} f(1, 2, 3) \cdot \mathbf{v}
= (2\mathbf{i} + 4\mathbf{j} + 6\mathbf{k}) \cdot (2\mathbf{i} + 3\mathbf{j} + \mathbf{k})/\sqrt{14}
= (4 + 12 + 6)/\sqrt{14} = 22/\sqrt{14}.
\]
\[
f(x + a, y + b) \approx f(x, y) + \frac{\partial f}{\partial x}(x, y)\,a + \frac{\partial f}{\partial y}(x, y)\,b.
\]
This is the chain rule for this situation. Let’s put everything down in the form of a theorem.
\[
\frac{dw}{dt}(t) = \frac{\partial f}{\partial x}(x(t), y(t))\,\frac{dx}{dt}(t) + \frac{\partial f}{\partial y}(x(t), y(t))\,\frac{dy}{dt}(t).
\]
\[
\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt}.
\]
The problem I have with this formula is the ambiguity with which w is treated. Really, w
does not depend upon x and y as variables, only upon t as a variable. So taking a partial
derivative of w with respect to x is unnatural. However, some people find it handy for
remembering the chain rule. If you remember how to interpret the terms correctly, then
you won’t run in to any problems here and the two formulas say exactly the same thing.
In the second, the term $\partial w/\partial x$ is to be understood as follows: treat w as a function of
x and y and find its partial derivative with respect to the “variable” x. This is nothing
other than the partial derivative of f with respect to x where we have to remember to let
$\partial f/\partial x$ act on the pair (x(t), y(t)) (look back at the single-variable chain rule and you'll see
a similar thing!).
Examples 1.25 Let’s do some examples in which we can use the chain rule.
1. Let $f(x, y) = x\sin(y)$ and suppose that x and y in turn depend upon t as follows: $x(t) = t^2$ and $y(t) = 2t + 1$. As before, we define the composition $w(t) := f(x(t), y(t))$.
If we want to find the derivative of w with respect to t, the chain rule tells us that
\[
\frac{dw}{dt}(t) = \frac{\partial f}{\partial x}(x(t), y(t))\,\frac{dx}{dt}(t) + \frac{\partial f}{\partial y}(x(t), y(t))\,\frac{dy}{dt}(t).
\]
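We can let a computer algebra system confirm that the chain rule agrees with substituting first and then differentiating. A sketch of this example using SymPy (my choice of tool):

```python
import sympy as sp

t = sp.symbols("t", real=True)
x_sym, y_sym = sp.symbols("x y", real=True)

f = x_sym * sp.sin(y_sym)
x_t = t**2
y_t = 2 * t + 1

# Chain rule: dw/dt = f_x(x(t), y(t)) x'(t) + f_y(x(t), y(t)) y'(t).
fx = sp.diff(f, x_sym).subs({x_sym: x_t, y_sym: y_t})
fy = sp.diff(f, y_sym).subs({x_sym: x_t, y_sym: y_t})
chain = fx * sp.diff(x_t, t) + fy * sp.diff(y_t, t)

# Direct route: substitute first, then differentiate w(t) = f(x(t), y(t)).
direct = sp.diff(f.subs({x_sym: x_t, y_sym: y_t}), t)

assert sp.simplify(chain - direct) == 0
```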
We don’t want to have to rely upon geometric intuition to find normal vectors, so we
need to develop some theory. In fact, it is not difficult to explicitly construct the unit
normal vector to a surface at every point. We already have all of the tools needed to do
so.
I have suppressed the dependence upon the point (x, y) to make things look nicer. The unit normal vector is therefore given by
\[
\mathbf{n} = \frac{\mathbf{u} \times \mathbf{v}}{|\mathbf{u} \times \mathbf{v}|}
= \frac{-\frac{\partial f}{\partial x}\,\mathbf{i} - \frac{\partial f}{\partial y}\,\mathbf{j} + \mathbf{k}}{\sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 + 1}}.
\]
It may be good to stress that this vector depends on the coordinates x and y, which makes
it a vector function. Let’s check this formula against the hemisphere, since we know what
the answer should be.
Example 1.28 We have $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) := \sqrt{4 - x^2 - y^2}$ and we want to find a formula for the unit normal vector n(x, y). The formula requires the partial derivatives of f, so we compute them:
\[
\frac{\partial f}{\partial x}(x, y) = \frac{1}{2}(4 - x^2 - y^2)^{-1/2}(-2x) = \frac{-x}{\sqrt{4 - x^2 - y^2}},
\]
\[
\frac{\partial f}{\partial y}(x, y) = \frac{1}{2}(4 - x^2 - y^2)^{-1/2}(-2y) = \frac{-y}{\sqrt{4 - x^2 - y^2}}.
\]
Therefore,
\[
\mathbf{n}(x, y) = \frac{\left(\dfrac{x}{\sqrt{4 - x^2 - y^2}}\right)\mathbf{i} + \left(\dfrac{y}{\sqrt{4 - x^2 - y^2}}\right)\mathbf{j} + \mathbf{k}}{\sqrt{\left(\dfrac{-x}{\sqrt{4 - x^2 - y^2}}\right)^2 + \left(\dfrac{-y}{\sqrt{4 - x^2 - y^2}}\right)^2 + 1}}
= \frac{x\,\mathbf{i} + y\,\mathbf{j} + \sqrt{4 - x^2 - y^2}\,\mathbf{k}}{\sqrt{x^2 + y^2 + 4 - x^2 - y^2}}
= \frac{1}{2}(x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}) = \frac{1}{2}\mathbf{r},
\]
where we have used the fact that on the graph of f, we know that $z = \sqrt{4 - x^2 - y^2}$. This is exactly what we expected to get.
As a check for your understanding, find the unit normal vector for the xy-plane, i.e.,
the plane z = f (x, y) where f (x, y) := 0.
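The normal-vector formula can be checked numerically against Example 1.28. A sketch (the helper name `unit_normal` is mine) that also covers the xy-plane case just posed:

```python
import math

def unit_normal(fx, fy):
    """n = (-fx i - fy j + k) / sqrt(fx^2 + fy^2 + 1) for a graph z = f(x, y)."""
    mag = math.sqrt(fx**2 + fy**2 + 1.0)
    return (-fx / mag, -fy / mag, 1.0 / mag)

# Hemisphere f(x, y) = sqrt(4 - x^2 - y^2): fx = -x/z and fy = -y/z with z = f(x, y).
x, y = 0.6, 1.0
z = math.sqrt(4.0 - x**2 - y**2)
n = unit_normal(-x / z, -y / z)

# Example 1.28 predicts n = (x, y, z)/2, i.e. r/2 on the sphere of radius 2.
expected = (x / 2.0, y / 2.0, z / 2.0)
assert all(abs(a - b) < 1e-12 for a, b in zip(n, expected))

# For the xy-plane, f = 0: the unit normal is simply k.
assert unit_normal(0.0, 0.0) == (0.0, 0.0, 1.0)
```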
The directional derivative of f gives us information about how the function f changes in
the direction of v. So the gradient of f holds all of the information about how the function
f changes.
Example 1.29 Let’s consider a concrete example.
Let f (x, y, z) = x2 + y 2 + z 2 . We can easily compute the gradient of f :
\[
\vec{\nabla} f = 2x\,\mathbf{i} + 2y\,\mathbf{j} + 2z\,\mathbf{k} = 2\mathbf{r}.
\]
So the gradient of f is equal to twice the radial vector r. To understand the geometric
significance of this, let’s consider a level surface of f . Recall that a level surface of f is the
set of points (x, y, z) in space which satisfy the equation
x2 + y 2 + z 2 = c,
where c is a constant. We recognize this surface as the sphere of radius $\sqrt{c}$ centered at the
origin. As we have seen before, the unit outward normal to the sphere is r/|r|. So the
gradient of f points in the same direction as the unit normal to any level surface of f .
Is this just a coincidence, or a special case? The answer is no. This is another general
property of the gradient, but let’s look at a class of surfaces for which we already have an
explicit formula for the normal: graphs.
Consider the graph of a function f(x, y). Recall that we can think of graphs as level surfaces in the following way. The graph is given by all points (x, y, z) in space which satisfy $z = f(x, y)$. If we define the function $G(x, y, z) = z - f(x, y)$, then the graph of f is the same as the level surface $G(x, y, z) = 0$. We already have a formula for the unit normal for
the graph of f. It is
\[
\mathbf{n} = \frac{-\frac{\partial f}{\partial x}\,\mathbf{i} - \frac{\partial f}{\partial y}\,\mathbf{j} + \mathbf{k}}{\sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 + 1}}.
\]
How is this connected with a gradient? By analogy with the previous example, we should
expect the gradient of the function G (not the gradient of f !) to point in the same direction
as n. The gradient of G is
\[
\vec{\nabla} G = \frac{\partial}{\partial x}(z - f(x, y))\,\mathbf{i} + \frac{\partial}{\partial y}(z - f(x, y))\,\mathbf{j} + \frac{\partial}{\partial z}(z - f(x, y))\,\mathbf{k}
= -\frac{\partial f}{\partial x}\,\mathbf{i} - \frac{\partial f}{\partial y}\,\mathbf{j} + \mathbf{k}.
\]
This vector field clearly points in the same direction as n at every point. This means that
the gradient of G gives us the normal direction to the level surfaces of G. So we could, in
fact, define the unit normal to a level surface of G as¹
\[
\mathbf{n} := \frac{\vec{\nabla} G}{|\vec{\nabla} G|}.
\]
This definition works for general level surfaces, rather than just for surfaces given as graphs.
¹Notice that this gives the same normal vector field as before in the case of graphs.
To gain some more insight into the gradient and level surfaces, let’s connect these two
notions for the gradient, i.e., the gradient in its role for finding directional derivatives and
in its role as the normal direction to a level surface. Let $G : \mathbb{R}^3 \to \mathbb{R}$ and let's fix a point
(x0 , y0 , z0 ) in space. Then G(x0 , y0 , z0 ) = c for some constant c, which means that the point
(x0 , y0 , z0 ) lies on the level surface G(x, y, z) = c. The unit normal to this level surface at
the point (x0 , y0 , z0 ) is given by the vector
\[
\frac{\vec{\nabla} G(x_0, y_0, z_0)}{|\vec{\nabla} G(x_0, y_0, z_0)|}.
\]
Recall that the directional derivative of G at (x0 , y0 , z0 ) in the direction of a unit vector v
is equal to
\[
\vec{\nabla} G(x_0, y_0, z_0) \cdot \mathbf{v}.
\]
If v is any unit vector which is tangent to the level surface G = c, then the directional
derivative of G in the direction of v is equal to 0 (since $\vec{\nabla} G$ is normal to this surface).
This makes sense for another reason. A level surface of G is the set of points for which the
function G has the same value, i.e., where the function doesn’t change. This is precisely
what a directional derivative of 0 means.
By contrast, the dot product of $\vec{\nabla} G$ and v is maximal if v points in the same direction as the gradient of G. This maximizes the directional derivative of G and means that the direction of greatest change of G is given by the gradient of G.
A simple practical example might be helpful. Consider a topographic map describing a
hill. The curves on the map give the contours, which are precisely the level curves of the
height function. The gradient of the height function is normal to these level curves and
represents the direction of greatest change of the height function. Moving in tangential
directions to the level curves means we don’t change our height and therefore the directional
derivative is 0. However, if we want to change our altitude in the fastest way possible, we
would choose to follow the gradient of the height function.
Example 1.30 Let’s examine an example with these new insights. Let f : R3 ! R be
given by f (x, y, z) := x2 + y 2 + z 1. First, we compute the gradient of f :
~ = 2xi + 2yj + k.
rf
Consider the point (1, 2, 1). Since f (1, 2, 1) = 12 + 22 + 1 1 = 5, the point (1, 2, 1) lies
on the level surface of f described by f (x, y, z) := x2 + y 2 + z 1 = 3. It follows that the
vector
~ (1, 2, 1) = 2i + 4j + k
rf
is normal to the level surface of f at (1, 2, 1). It’s not a unit vector, but we could simply
divide it by its magnitude
p to fix that
p problem. This also gives us the direction of greatest
increase of f and 22 + 42 + 12 = 21 is the maximum rate of change of f at (1, 2, 1).
\[
f(x) - f(y) = f'(\eta)(x - y), \quad \text{where } \eta \text{ is some value between } x \text{ and } y. \tag{1.5.1}
\]
We immediately see that h(0) = f (y) and h(1) = f (x). Since h is a scalar function we can
apply (1.5.1) to obtain
\[
h'(t) = \vec{\nabla} f(y + t(x - y)) \cdot (x - y) \tag{1.5.2}
\]
and therefore
\[
f(x) - f(y) = \vec{\nabla} f(y + \eta(x - y)) \cdot (x - y). \tag{1.5.3}
\]
This is the equivalent of the mean value theorem for a function of several variables; observe that (1.5.3) is indeed similar to (1.5.1), but the derivative is replaced by the directional derivative.
We immediately see that h(0) = f (y) and h(1) = f (x). Since h is a scalar function we can
apply (1.5.4) to h(t), which gives
\[
h(1) = h(0) + h'(0) + \frac{1}{2}h''(\eta), \quad \text{with } \eta \in [0, 1].
\]
Recalling (1.5.2) we see that we only need to evaluate $h''(\eta)$. In order to do this we develop the expression in (1.5.2) and differentiate it term by term. We use the notation $x = x_1\mathbf{i} + x_2\mathbf{j} + x_3\mathbf{k}$ and $y = y_1\mathbf{i} + y_2\mathbf{j} + y_3\mathbf{k}$ in order to simplify the expression:
\[
h''(t) = \frac{d}{dt}\left( \vec{\nabla} f(y + t(x - y)) \cdot (x - y) \right)
= \frac{d}{dt}\left( \sum_{j=1}^{3} \frac{\partial f}{\partial x_j}(y + t(x - y))(x_j - y_j) \right)
= \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{\partial^2 f}{\partial x_i \partial x_j}(y + t(x - y))(x_i - y_i)(x_j - y_j).
\]
Studying the last term of the right-hand side, we observe that this can be written as a product between a vector and a matrix. To see this, define the 3×3 matrix $H(x) = (h_{ij}(x))$ where the elements are defined by
\[
h_{ij}(x) := \frac{\partial^2 f}{\partial x_i \partial x_j}(x).
\]
\[
f(x) = f(y) + \vec{\nabla} f(y) \cdot (x - y) + \frac{1}{2} \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{\partial^2 f}{\partial x_j \partial x_i}(y + \eta(x - y))(x_i - y_i)(x_j - y_j)
= f(y) + \vec{\nabla} f(y) \cdot (x - y) + \frac{1}{2}(x - y)^T H(y + \eta(x - y))(x - y).
\]
The second order Taylor polynomial T(x) of the function f at a point $\bar{x}$ may be written
\[
T(x) = f(\bar{x}) + \vec{\nabla} f(\bar{x}) \cdot (x - \bar{x}) + \frac{1}{2}(x - \bar{x})^T H(\bar{x})(x - \bar{x}).
\]
The function T(x) is a good approximation of f(x) in a neighbourhood of $\bar{x}$. Indeed, for smooth functions this approximation is one order better than the tangent plane approximation at $\bar{x}$.
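To see the claimed improvement concretely, here is a numerical sketch with a function of my own choosing, $f(x, y) = e^x\cos(y)$, expanded around $\bar{x} = (0, 0)$, where $f = 1$, $\vec{\nabla} f = (1, 0)$ and the Hessian is diag(1, −1):

```python
import math

def f(x, y):
    return math.exp(x) * math.cos(y)

def taylor2(x, y):
    """T(x) = f(xbar) + grad.(x - xbar) + (1/2)(x - xbar)^T H (x - xbar) at (0, 0)."""
    return 1.0 + x + 0.5 * (x**2 - y**2)

def tangent_plane(x, y):
    """First-order (tangent plane) approximation at (0, 0)."""
    return 1.0 + x

# Near the expansion point the second-order polynomial beats the tangent plane.
x, y = 0.2, 0.1
err2 = abs(f(x, y) - taylor2(x, y))
err1 = abs(f(x, y) - tangent_plane(x, y))
assert err2 < err1
assert err2 < 1e-3
```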
1.5.2 Optimisation
In many applications in science, engineering and finance we are faced with the problem of finding extremal values of a function in some subset of $\mathbb{R}^n$, that is, the points in some set where the function takes its maximum or minimum values. Here we will consider the framework where $\Omega \subset \mathbb{R}^n$, $n \in \mathbb{N}$, and the problem then reads: find $x \in \Omega$ such that
\[
f(x) \leq f(y) \quad \text{for all } y \in \Omega.
\]
It is of course a relevant question to ask if such a problem always has a solution. This
question is answered in the following theorem that we state without proof.
Theorem 1.32 If $f : \Omega \to \mathbb{R}$ is a differentiable function and either Ω is a closed and bounded subset of $\mathbb{R}^n$, or $f \to \infty$ as $|x| \to \infty$, then there is a minimum point $x \in \Omega$ such that
\[
f(x) \leq f(y) \quad \text{for all } y \in \Omega.
\]
Observe that it is essential that Ω is closed and bounded to guarantee the existence of the minimum, or, for unbounded domains, that $f \to \infty$ as $|x| \to \infty$.
Examples 1.33 1. The function $f : (0, 1) \to \mathbb{R}$ with f(x) = x does not have a minimum point in (0, 1). In this case Ω = (0, 1) is not closed and $f \to 0$ as $x \to 0$. However, the lower bound f = 0 is not attained for any x.
2. The function $f : [1, \infty) \to \mathbb{R}$ with f(x) = 1/x does not have a minimum point in $[1, \infty)$. In this case $\Omega = [1, \infty)$ is not bounded.
3. Even if Ω is unbounded, f may have a minimum point. In particular, if f(x) increases to infinity as |x| increases, one can always restrict the search for a minimum to a bounded set.
When solving the minimisation problem two situations can arise:
1. the minimum value is attained at an interior point x of Ω;
2. the minimum value is attained at a point x on the boundary, ∂Ω, of Ω. We may define the boundary as the points x in Ω such that for every ε > 0 there exist points y not in Ω such that $|x - y| \leq \varepsilon$.
When we solve a minimisation problem we must typically consider the two possibilities
separately and first find all local minima in the interior of the domain and then study the
function on the boundary and find local extrema there. The minimum point is then the
point from the two searches that results in the smallest value of f (x).
Examples 1.34 Throughout this section we will use the following minimisation problem
to illustrate the theory. Find x ∈ Ω realising the minimum

min_{x ∈ Ω} f(x),

where f(x, y) = x² + y³/3 - y²/2 and Ω is the closed disc {(x, y) : x² + y² ≤ 4}.
Let λ₁, …, λₙ denote the eigenvalues of the Hessian matrix H(x) at a critical point x.
Three cases can occur:
1. If λᵢ > 0 for 1 ≤ i ≤ n, then H(x) is positive definite and the critical point
x is a strict local minimum. If λᵢ ≥ 0, with λⱼ = 0 for some j, the critical point is
said to be a degenerate local minimum.
2. If λᵢ < 0 for 1 ≤ i ≤ n, then H(x) is negative definite and the critical point
x is a strict local maximum. If λᵢ ≤ 0, with λⱼ = 0 for some j, the critical point is
said to be a degenerate local maximum.
3. If neither of the above cases holds, i.e. H(x) has eigenvalues of both signs, then we
say that the critical point is a saddle point.
It follows that to identify local extrema in Ω we first find the critical points by solving the
equation ∇f(x) = 0 for x ∈ Ω. Then we classify the critical points as local minima, local
maxima or saddle points using the spectrum of the Hessian.
Solving ∇f(x) = 0 for x ∈ Ω in our example, we find two critical points in Ω, x₁ = (0, 0)
and x₂ = (0, 1).
To classify these points we compute the Hessian matrix

$$H(x) = \begin{pmatrix} 2 & 0 \\ 0 & 2y - 1 \end{pmatrix}.$$

It follows that

$$H(x_1) = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix},$$

and since the eigenvalues of a diagonal matrix coincide with the entries on the diagonal,
we conclude that x₁ is a saddle point. Similarly,

$$H(x_2) = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},$$

and since both eigenvalues are positive, x₂ is a strict local minimum.
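The classification can be checked with a short computation. This is an illustrative Python sketch (the function names are our own): for the running example f(x, y) = x² + y³/3 - y²/2 the Hessian is diagonal, and for a symmetric 2×2 matrix [[a, b], [b, c]] the eigenvalues are ((a + c) ± √((a − c)² + 4b²))/2.

```python
import math

def hessian_eigenvalues(x, y):
    # Hessian of f(x, y) = x**2 + y**3/3 - y**2/2 is [[2, 0], [0, 2*y - 1]];
    # closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]].
    a, b, c = 2.0, 0.0, 2.0 * y - 1.0
    d = math.sqrt((a - c) ** 2 + 4 * b ** 2)
    return ((a + c - d) / 2, (a + c + d) / 2)

def classify(x, y, tol=1e-12):
    lo, hi = hessian_eigenvalues(x, y)
    if lo > tol:
        return "strict local minimum"
    if hi < -tol:
        return "strict local maximum"
    if lo < -tol and hi > tol:
        return "saddle point"
    return "degenerate"

print(classify(0.0, 0.0))  # x1 = (0, 0): eigenvalues -1 and 2 -> saddle point
print(classify(0.0, 1.0))  # x2 = (0, 1): eigenvalues 1 and 2 -> strict local minimum
```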
Parametrised boundary
In this case the points on ∂Ω are given by a function γ : ℝⁿ⁻¹ → ℝⁿ such that the
mapping γ : R → ∂Ω, from some parameter set R ⊂ ℝⁿ⁻¹, defines ∂Ω. Then we may find
the extrema on the boundary by solving the minimisation problem

min_{s ∈ R} f(γ(s)).
Alternatively, the boundary may be given implicitly by an equation g(x) = 0, and we then
minimise f(x) under the constraint that g(x) = 0. This means that in the minimisation
we are only allowed to consider such x for which g(x) = 0. In order to include the
constraint we introduce an auxiliary function known as the Lagrangian

L(x, λ) := f(x) + λ g(x),
where λ ∈ ℝ is an additional unknown. We may then find the critical (or stationary)
points of the Lagrangian by solving the set of equations obtained by setting

∇L(x, λ) = 0

and

∂L/∂λ (x, λ) = 0.
Observe that by the definition of L(x, λ) the second equation is simply the constraint
g(x) = 0. In mechanics f(x) is often some energy, and λ then corresponds to the “virtual”
force required to make the system stay on the “trajectory” g(x) = 0. Considering our
example we get
1.6. AN INTERLUDE ON VECTOR FIELDS 33
Examples 1.39

L(x, λ) := f(x) + λ g(x) = x² + (1/3)y³ - (1/2)y² + λ(x² + y² - 4).
To find the stationary points we need to solve

∂L/∂x = 2x + 2λx = 0,
∂L/∂y = y² - y + 2λy = 0,
∂L/∂λ = x² + y² - 4 = 0.
First observe that if λ = 0 the first two equations are exactly those we solved to find the
critical points in the interior, and those points are not on the boundary. Hence λ ≠ 0.
Then assume λ ≠ -1. By the first equation x = 0, and then by the constraint (third
equation) y = ±2. It is then immediate from the second equation that x = 0, y = 2,
λ = -1/2 and x = 0, y = -2, λ = 3/2 are solutions. For λ = -1 we see that the first
equation is always satisfied. Using this value in the second equation we obtain the
solutions y = 0 and y = 3, of which only y = 0 is compatible with the third equation,
leading to x = ±2. Discarding the Lagrange multiplier we see that we have identified the
same four critical points as before: (2, 0), (0, 2), (-2, 0), (0, -2).
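Each candidate (x, y, λ) should make all three stationarity equations vanish, which is quick to verify. A small checking sketch in Python (the residual function is our own naming):

```python
# Stationarity residuals of L = x**2 + y**3/3 - y**2/2 + lam*(x**2 + y**2 - 4).
def residuals(x, y, lam):
    dL_dx = 2 * x + 2 * lam * x
    dL_dy = y ** 2 - y + 2 * lam * y
    dL_dlam = x ** 2 + y ** 2 - 4
    return (dL_dx, dL_dy, dL_dlam)

candidates = [(0.0, 2.0, -0.5), (0.0, -2.0, 1.5), (2.0, 0.0, -1.0), (-2.0, 0.0, -1.0)]
for c in candidates:
    assert all(abs(r) < 1e-12 for r in residuals(*c)), c
print("all four candidates are stationary points of the Lagrangian")
```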
Putting it together
Once we have found and classified the critical points in the interior, and identified the
points on the boundary at which a local extremum may be attained in the constrained
configuration, we are ready to find the point minimising f(x) over Ω. Assume that those
points form the set C = {x₁, …, x_m}, if there are m candidates for the local extremum.
The minimum (or maximum) is then obtained by taking the minimum of f over the set C:

min_{x ∈ C} f(x).
Examples 1.40 In our example we have found an interior local minimum at (0, 1) and
critical points on the boundary at (0, ±2), (±2, 0). If we evaluate the function f at these
points we find that the minimum f = -14/3 is attained at the point (0, -2). At the local
minimum in the interior the function takes the value f(0, 1) = -1/6.
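The final comparison can be done in exact arithmetic. A short Python sketch using the standard library's `fractions` module (not part of the notes) confirms the values above:

```python
from fractions import Fraction

# Compare f at the interior local minimum and the four boundary candidates,
# using exact rational arithmetic to avoid rounding.
def f(x, y):
    x, y = Fraction(x), Fraction(y)
    return x ** 2 + y ** 3 / 3 - y ** 2 / 2

candidates = [(0, 1), (0, 2), (0, -2), (2, 0), (-2, 0)]
best = min(candidates, key=lambda p: f(*p))
print(best, f(*best))  # -> (0, -2) -14/3
```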
A vector field in three dimensions is a function that assigns to each vector in ℝ³ another
vector in ℝ³, i.e.,
F : ℝ³ → ℝ³.
Let’s take a moment to review vector notation. In these notes, boldface letters represent
vectors. When handwriting things, we use an arrow over the top of the symbol to denote
that it’s a vector. We express any vector v as a linear combination of the Cartesian unit
vectors i, j, and k, i.e., given v, there are three unique real numbers v1 , v2 and v3 such
that v = v1 i + v2 j + v3 k. In R3 , any point in space can also be identified with a unique
vector. For example, the point (x0 , y0 , z0 ) is identified with the vector r = x0 i + y0 j + z0 k.
This means that we can think of vectors in R3 as points in R3 and vice versa. This duality
will play an important role in the structure of the theory to come.
Some examples of vector functions:

r(x, y, z) := xi + yj + zk,
F(x, y, z) := i + x²yj + cos(xyz)k,
G(x, y, z) := yi + xj,
F(x, y, z) := i + 2j - 13k.
A practical example of a vector field is the function which assigns to each point in
the Earth’s atmosphere a vector pointing in the direction of the wind at that point and
having a magnitude equal to the wind speed at that point. The information contained
in this function not only tells us about the speed of the wind at each point, but also the
direction. That is the key to vector fields. They contain two pieces of information, direction
and magnitude, because those are exactly the two pieces of information that determine a
vector.
1. Consider the vector function r : ℝ³ → ℝ³ given by

r(x, y, z) := xi + yj + zk.

This one is relatively easy to picture. At each point (x, y, z), we take the vector from
the origin to the point and translate it so that its tail is at the point. For example,
at the point (1, 0, 0), we take the vector i and translate it so that its tail is at
(1, 0, 0). At the point (1, 1, 1), we take the vector i + j + k and move it until its tail
is at (1, 1, 1). You get the picture now. If we consider the restriction r(x, y) := xi + yj,
viewed as a map ℝ² → ℝ², we can visualize the vector field in two dimensions, which
is easier. See the left plot of Figure 1.10 (produced using the matlab script vectors.m).
Figure 1.10: Left: the vector field r(x, y) := xi + yj. Right: the vector field F(x, y) :=
yi + xj.
2. Consider the vector function F : ℝ³ → ℝ³ given by

F(x, y, z) := yi + xj.

Notice that since there is no k component, the resulting vector field is just a
vertical translation of what happens in the xy-plane. To the point (1, 0, z), for any
z, we attach the vector j. To the point (0, 1, z), for any z, we attach the vector i,
and so on. See the right plot of Figure 1.10.
3. Consider the constant vector function F : ℝ³ → ℝ³ given by

F(x, y, z) := i + 2j - 13k.
Given a vector function F : ℝ³ → ℝ³, we can take its dot product with some fixed vector
v. The resulting function, F(x, y, z) · v, is a scalar function. It might look horrible, but
we will have to get used to such things. Another scalar function that we can get from a
vector function is the norm of the vector function, i.e., |F(x, y, z)|. Let's look at some
examples.
Notice that for most points the value of this scalar function will be nonzero. However,
there are points where the value is 0, and it would be worth your time to think a little
bit about what that would mean geometrically. Consider for instance F(x, y, z) :=
yi + xj and r(x, y) := xi + yj. What is the value of F · r? Can you deduce it
from the plots of Figure 1.10?
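One way to explore the question numerically, before reasoning geometrically, is to evaluate the dot product at sample points. This sketch (our own, not from the notes) compares F · r = yx + xy with the closed form 2xy:

```python
import random

# Evaluate F . r with F(x, y) = y*i + x*j and r(x, y) = x*i + y*j at random
# sample points and compare with the closed form 2*x*y, which vanishes
# exactly where the two plotted fields are perpendicular.
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    dot = y * x + x * y          # F . r = F1*r1 + F2*r2
    assert dot == 2 * x * y
print("F . r agrees with 2*x*y at all sample points")
```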
Consider the norm |r(x, y, z)| = √(x² + y² + z²). This function returns the distance
from the point (x, y, z) to the origin. What are the level surfaces of this function?
The formula gives the answer to you in an algebraic form, but can you explain the
shape of the level surfaces on a purely geometric basis?
These are the sort of games that we will play with vector functions. The notation can
be a little bit intimidating at first, but if you simply keep a clear head and apply the
definitions, things will be all right.
We may regard the gradient symbol as a formal vector of differential operators:

∇ := (∂/∂x) i + (∂/∂y) j + (∂/∂z) k.
We can then formally apply it to a vector field using the operations we know from linear
algebra to obtain other differential operators that may be applied to vector fields (recall
that the gradient is applied to scalar functions). Let F : ℝ³ → ℝ³. If we first consider the
dot product we obtain what is known as the divergence operator:

∇ · F = ((∂/∂x) i + (∂/∂y) j + (∂/∂z) k) · (F₁i + F₂j + F₃k) = ∂F₁/∂x + ∂F₂/∂y + ∂F₃/∂z.
These di↵erential operators are very important in Mathematical Physics and you will see
them repeatedly in coming courses.
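The divergence formula is easy to probe numerically: approximate each partial derivative by a central difference. A Python sketch under that assumption (function names are our own); for F(x, y, z) = (x, y, z) the exact divergence is 1 + 1 + 1 = 3 everywhere.

```python
def divergence(F, x, y, z, h=1e-5):
    # Central-difference approximations of dF1/dx, dF2/dy and dF3/dz.
    dF1_dx = (F(x + h, y, z)[0] - F(x - h, y, z)[0]) / (2 * h)
    dF2_dy = (F(x, y + h, z)[1] - F(x, y - h, z)[1]) / (2 * h)
    dF3_dz = (F(x, y, z + h)[2] - F(x, y, z - h)[2]) / (2 * h)
    return dF1_dx + dF2_dy + dF3_dz

F = lambda x, y, z: (x, y, z)
print(divergence(F, 1.0, 2.0, 3.0))  # approximately 3
```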